Hi Tim,

Thanks for confirming that. That question had confused me for a long time, so I really appreciate it.
As for the other question, I still don't know whether Model A or Model B is correct. I'm still confused.

Thanks,
Alex

2018-05-02 13:53 GMT-07:00 Tim Robertson <[email protected]>:

> Thanks Alex,
>
> Yes, looking at that code I believe you are correct - the memStore scanner
> is appended after the block scanners.
> The block scanners may or may not see hits in the block cache when they
> read. If they don't get a hit, they'll open the block from the underlying
> HFile(s).
>
> On Wed, May 2, 2018 at 10:41 PM, Xi Yang <[email protected]> wrote:
>
> > Hi Tim,
> >
> > Thank you for the detailed explanation. Yes, that really helps me! I really
> > appreciate it!
> >
> > But I am still confused about the sequence:
> >
> > I've read this code in *HStore.getScanners*:
> >
> >     // TODO this used to get the store files in descending order,
> >     // but now we get them in ascending order, which I think is
> >     // actually more correct, since memstore get put at the end.
> >     List<StoreFileScanner> sfScanners =
> >         StoreFileScanner.getScannersForStoreFiles(storeFilesToScan,
> >             cacheBlocks, usePread, isCompaction, false, matcher, readPt);
> >     List<KeyValueScanner> scanners = new ArrayList<>(sfScanners.size() + 1);
> >     scanners.addAll(sfScanners);
> >     // Then the memstore scanners
> >     scanners.addAll(memStoreScanners);
> >
> > Does it mean that this step:
> >
> > *2) It looks in the memstore to see if there are any writes still in
> > memory ready to flush down to the HFiles that needs merged with the data
> > read in 1)*
> >
> > comes after the following step?
> >
> > *c) the data is read from the opened block*
> >
> > Here is an explanation of the images I drew before, so that we don't need
> > the images:
> >
> > When a read request comes in
> >
> > Model A
> >
> >    1. Get the scanners (including StoreScanner and MemStoreScanner).
> >       MemStoreScanner is the last one
> >    2. Begin with the first StoreScanner
> >    3. Try to get the block from the BlockCache of the StoreScanner
> >    4. Try to get the block from the HFile of the StoreScanner
> >    5. Go to the next StoreScanner
> >    6. Loop #2 - #5 until all StoreScanners have been used
> >    7. Try to get the block from the memStore
> >
> > Model B
> >
> >    1. Try to get the block from the BlockCache; if that fails, go to #2
> >    2. Get the scanners (including StoreScanner and MemStoreScanner).
> >       MemStoreScanner is the last one
> >    3. Begin with the first StoreScanner
> >    4. Try to get the block from the HFile of the StoreScanner
> >    5. Go to the next StoreScanner
> >    6. Loop #4 - #5 until all StoreScanners have been used
> >    7. Try to get the block from the memStore
> >
> > Thanks,
> > Alex
> >
> > 2018-05-02 1:04 GMT-07:00 Tim Robertson <[email protected]>:
> >
> > > Hi Alex,
> > >
> > > I'm not sure I fully follow your question without the images but I'll
> > > try and help.
> > >
> > > When a read request comes in, my understanding of the order of
> > > execution is as follows (perhaps someone can verify this):
> > >
> > > 1) It looks in the block cache for the cells (this is a read only cache
> > > containing recently read data)
> > > 2) It looks in the memstore to see if there are any writes still in
> > > memory ready to flush down to the HFiles that needs merged with the
> > > data read in 1)
> > > 3) Only if not found it starts locating the data from HFiles (note,
> > > there can be multiple files per region until major compaction runs
> > > which merges into 1 per column family, discarding stale data where
> > > possible)
> > >   a) It uses bloom filters and the block cache indexes to locate the
> > > target blocks (these are part of the HFiles, but read into memory when
> > > the region servers start)
> > >   b) those target blocks are then opened and occupy space on the block
> > > cache on the region server (possibly evicting other blocks)
> > >   c) the data is read from the opened block
> > >
> > > Does that help at all?
> > >
> > > Thanks,
> > > Tim
> > >
> > > On Wed, May 2, 2018 at 9:49 AM, Xi Yang <[email protected]> wrote:
> > >
> > > > OK, I got it. I've understood Q2 with your help, thanks!
> > > >
> > > > It seems I have to find some other way to draw my images. Here is the
> > > > updated version of Q1:
> > > >
> > > > Q1
> > > >
> > > > I found that HFileScannerImpl.getCachedBlock(...) gets a block from
> > > > the BlockCache. This cached block is used by StoreFileScanner. Does
> > > > that mean the read model is like:
> > > >
> > > > *Model A*
> > > >
> > > > When a read request comes in
> > > >
> > > >    1. Read 1st Store:
> > > >       a. read BlockCache
> > > >       b. read HFile
> > > >    2. Read 2nd Store:
> > > >       a. read BlockCache
> > > >       b. read HFile
> > > >    3. ......
> > > >    4. Read Memstore
> > > >
> > > > Or is there only one BlockCache, and all read requests go through it
> > > > first, like:
> > > >
> > > > *Model B:*
> > > >
> > > > When a read request comes in
> > > >
> > > >    1. Read BlockCache
> > > >    2. Read 1st Store -> read HFile
> > > >    3. Read 2nd Store -> read HFile
> > > >    4. ....
> > > >    5. Read Memstore
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > 2018-05-01 20:04 GMT-07:00 Josh Elser <[email protected]>:
> > > >
> > > > > FYI, the mailing list strips images.
> > > > >
> > > > > There is only one BlockCache per RS. Not sure if that answers your
> > > > > Q1 in entirety though.
> > > > >
> > > > > Q2. The "Block" in "BlockCache" are the blocks that make up the
> > > > > HBase HFiles in HDFS. Data in the Memstore does not yet exist in
> > > > > HFiles on HDFS. Additionally, Memstore is already in memory; no
> > > > > need to have a different cache to accomplish the same thing :)
> > > > >
> > > > > On 5/1/18 9:25 PM, Xi Yang wrote:
> > > > >
> > > > >> Sorry to bother you guys. May I ask 2 questions about HBase?
> > > > >>
> > > > >> Q1
> > > > >>
> > > > >> I found that |HFileScannerImpl.getCachedBlock(...)| gets a block
> > > > >> from the BlockCache. This cached block is used by
> > > > >> |StoreFileScanner|. Does that mean the read model is like:
> > > > >>
> > > > >> *Model A*
> > > > >>
> > > > >> Or is there only one BlockCache, and all read requests go through
> > > > >> it first, like:
> > > > >>
> > > > >> *Model B:*
> > > > >>
> > > > >> Q2
> > > > >> If the data has been read from the Memstore, will it be put in the
> > > > >> BlockCache to accelerate the read process next time?
> > > > >>
> > > > >> Thanks,
> > > > >> Alex
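
For anyone reading this thread later, here is a minimal toy sketch in Java of the ordering discussed above: the store file scanners are listed first and the memstore scanner is appended last (as in the quoted HStore.getScanners snippet), all store file scanners share one block cache (Josh's "one BlockCache per RS"), and each store file scanner falls back to its HFile on a cache miss (Tim's latest reply). Every class and method name here (ReadPathSketch, ToyScanner, ToyStoreFileScanner, ToyMemStoreScanner, demo) is hypothetical and is not part of HBase. The simple loop over scanners is also an assumption made for illustration; the sketch does not model how HBase merges results across scanners.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are real HBase classes.
class ReadPathSketch {

    // Anything that can be asked for a row; records what it did in the trace.
    interface ToyScanner {
        String read(String row, List<String> trace);
    }

    // Stands in for a store file scanner: shared block cache first, then the HFile.
    static final class ToyStoreFileScanner implements ToyScanner {
        private final String name;
        private final Map<String, String> hfile;      // pretend on-disk data
        private final Map<String, String> blockCache; // shared across scanners

        ToyStoreFileScanner(String name, Map<String, String> hfile,
                            Map<String, String> blockCache) {
            this.name = name;
            this.hfile = hfile;
            this.blockCache = blockCache;
        }

        @Override
        public String read(String row, List<String> trace) {
            String cacheKey = name + "/" + row;
            String cached = blockCache.get(cacheKey);
            if (cached != null) {
                trace.add(name + ": block cache hit");
                return cached;
            }
            trace.add(name + ": block cache miss, read from HFile");
            String data = hfile.get(row);
            if (data != null) {
                blockCache.put(cacheKey, data); // later reads can hit the cache
            }
            return data;
        }
    }

    // Stands in for the memstore scanner: data is already in memory, no block cache.
    static final class ToyMemStoreScanner implements ToyScanner {
        private final Map<String, String> memstore;

        ToyMemStoreScanner(Map<String, String> memstore) {
            this.memstore = memstore;
        }

        @Override
        public String read(String row, List<String> trace) {
            trace.add("memstore: read in-memory data");
            return memstore.get(row);
        }
    }

    // Mirrors the ordering in the quoted snippet: store file scanners, then memstore.
    static List<ToyScanner> getScanners(List<ToyScanner> storeFileScanners,
                                        ToyScanner memStoreScanner) {
        List<ToyScanner> scanners = new ArrayList<>(storeFileScanners.size() + 1);
        scanners.addAll(storeFileScanners);
        scanners.add(memStoreScanner);
        return scanners;
    }

    // Reads the same row twice and returns the combined trace of both reads.
    static List<String> demo() {
        Map<String, String> blockCache = new HashMap<>(); // one cache for all files
        Map<String, String> hfile1 = new HashMap<>();
        hfile1.put("r1", "old value");
        Map<String, String> hfile2 = new HashMap<>();
        hfile2.put("r1", "newer value");
        Map<String, String> memstore = new HashMap<>();
        memstore.put("r1", "unflushed value");

        List<ToyScanner> storeFileScanners = new ArrayList<>();
        storeFileScanners.add(new ToyStoreFileScanner("hfile-1", hfile1, blockCache));
        storeFileScanners.add(new ToyStoreFileScanner("hfile-2", hfile2, blockCache));
        List<ToyScanner> scanners =
            getScanners(storeFileScanners, new ToyMemStoreScanner(memstore));

        List<String> trace = new ArrayList<>();
        for (int pass = 0; pass < 2; pass++) {
            for (ToyScanner scanner : scanners) {
                scanner.read("r1", trace);
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this toy, the first pass misses the shared cache in both store file scanners and falls back to the HFiles, while the second pass hits the cache in both. The memstore entry never touches the block cache, which matches Josh's answer to Q2. The cache lookup happens inside each store file scanner rather than as a single step in front of all of them, so the sketch is closer to Model A than to Model B.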
