Thanks Alex,

Yes, looking at that code, I believe you are correct: the memstore scanners are appended after the store file scanners. The store file scanners may or may not get hits in the block cache when they read. If they don't get a hit, they read the block from the underlying HFile(s).
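The read path discussed in this thread can be sketched in Java. This is a minimal illustration under stated assumptions, not HBase code: `ReadPathSketch`, `Scanner`, `storeFileScanner`, `memStoreScanner`, `getScanners` and `read` are hypothetical names. It assumes a single block cache shared by all store files on the region server (Josh's point that there is one BlockCache per RS). Each store file scanner consults that cache before falling back to its HFile, and the memstore scanner is appended last, as in `HStore.getScanners`. One simplification to keep in mind: real HBase does not drain the scanners one after another. It merges them by key through a heap (`KeyValueHeap`), chooses cell versions by timestamp, and can skip files entirely using bloom filters. The sketch simply asks each scanner in list order and lets the last non-null answer win.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReadPathSketch {
    /** Hypothetical, simplified scanner: returns the value for a row, or null. */
    public interface Scanner {
        String get(String row);
    }

    /** Hypothetical single block cache shared by every store file on one region server. */
    public static final Map<String, String> BLOCK_CACHE = new HashMap<>();
    /** Records where each lookup was served from, to show the order of events. */
    public static final List<String> TRACE = new ArrayList<>();

    /** Stand-in for a StoreFileScanner: block cache first, then the HFile on a miss. */
    public static Scanner storeFileScanner(String hfile, Map<String, String> hfileContents) {
        return row -> {
            // Simplification: one row per "block". HBase's BlockCacheKey uses HFile name + block offset.
            String cacheKey = hfile + ":" + row;
            if (BLOCK_CACHE.containsKey(cacheKey)) {
                TRACE.add("cache " + hfile);
                return BLOCK_CACHE.get(cacheKey);
            }
            TRACE.add("hfile " + hfile);
            String value = hfileContents.get(row);
            if (value != null) {
                BLOCK_CACHE.put(cacheKey, value); // the loaded block now occupies cache space
            }
            return value;
        };
    }

    /** Stand-in for the memstore scanner: already in memory, never goes through the block cache. */
    public static Scanner memStoreScanner(Map<String, String> memstore) {
        return row -> {
            TRACE.add("memstore");
            return memstore.get(row);
        };
    }

    /** Mirrors the list built in HStore.getScanners: store files first (oldest first), memstore last. */
    public static List<Scanner> getScanners(List<Scanner> storeFileScanners, Scanner memStore) {
        List<Scanner> scanners = new ArrayList<>(storeFileScanners.size() + 1);
        scanners.addAll(storeFileScanners);
        scanners.add(memStore);
        return scanners;
    }

    /** Asks every scanner in list order; in this sketch later scanners hold newer data, so they win. */
    public static String read(List<Scanner> scanners, String row) {
        String result = null;
        for (Scanner s : scanners) {
            String v = s.get(row);
            if (v != null) {
                result = v;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Scanner f1 = storeFileScanner("hfile1", Map.of("r1", "v1-old"));
        Scanner f2 = storeFileScanner("hfile2", Map.of("r1", "v1-mid"));
        Scanner mem = memStoreScanner(Map.of("r1", "v1-new"));
        List<Scanner> scanners = getScanners(List.of(f1, f2), mem);

        System.out.println(read(scanners, "r1") + " " + TRACE);
        TRACE.clear();
        System.out.println(read(scanners, "r1") + " " + TRACE);
    }
}
```

Running `main` prints `v1-new [hfile hfile1, hfile hfile2, memstore]` and then `v1-new [cache hfile1, cache hfile2, memstore]`. The second read of the store files is served from the shared cache, while the memstore is consulted directly both times.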
On Wed, May 2, 2018 at 10:41 PM, Xi Yang <alex.xi.y...@gmail.com> wrote:

> Hi Tim,
>
> Thank you for the detailed explanation. Yes, that really helps me! I really
> appreciate it!
>
> But I'm still confused about the sequence. I've read this code in
> *HStore.getScanners*:
>
>     // TODO this used to get the store files in descending order,
>     // but now we get them in ascending order, which I think is
>     // actually more correct, since memstore get put at the end.
>     List<StoreFileScanner> sfScanners = StoreFileScanner.getScannersForStoreFiles(storeFilesToScan,
>         cacheBlocks, usePread, isCompaction, false, matcher, readPt);
>     List<KeyValueScanner> scanners = new ArrayList<>(sfScanners.size() + 1);
>     scanners.addAll(sfScanners);
>     // Then the memstore scanners
>     scanners.addAll(memStoreScanners);
>
> Does it mean that this step:
>
> *2) It looks in the memstore to see if there are any writes still in memory,
> ready to flush down to the HFiles, that need to be merged with the data read in 1)*
>
> comes after the following step?
>
> *c) The data is read from the opened block.*
>
> Here is an explanation of the images I drew before, so that we don't need
> the images.
>
> When a read request comes in:
>
> Model A
>
> 1. Get the scanners (including StoreScanner and MemStoreScanner).
>    MemStoreScanner is the last one.
> 2. Begin with the first StoreScanner.
> 3. Try to get the block from the BlockCache of the StoreScanner.
> 4. Try to get the block from the HFile of the StoreScanner.
> 5. Go to the next StoreScanner.
> 6. Loop #2 - #5 until all StoreScanners have been used.
> 7. Try to get the block from the memStore.
>
> Model B
>
> 1. Try to get the block from the BlockCache; if that fails, go to #2.
> 2. Get the scanners (including StoreScanner and MemStoreScanner).
>    MemStoreScanner is the last one.
> 3. Begin with the first StoreScanner.
> 4. Try to get the block from the HFile of the StoreScanner.
> 5. Go to the next StoreScanner.
> 6. Loop #4 - #5 until all StoreScanners have been used.
> 7. Try to get the block from the memStore.
>
> Thanks,
> Alex
>
> 2018-05-02 1:04 GMT-07:00 Tim Robertson <timrobertson...@gmail.com>:
>
> > Hi Alex,
> >
> > I'm not sure I fully follow your question without the images, but I'll try
> > and help.
> >
> > When a read request comes in, my understanding of the order of execution is
> > as follows (perhaps someone can verify this):
> >
> > 1) It looks in the block cache for the cells (this is a read-only cache
> > containing recently read data).
> > 2) It looks in the memstore to see if there are any writes still in memory,
> > ready to flush down to the HFiles, that need to be merged with the data read in 1).
> > 3) Only if not found does it start locating the data from HFiles (note, there
> > can be multiple files per region until major compaction runs, which merges
> > them into 1 per column family, discarding stale data where possible).
> >   a) It uses bloom filters and the block cache indexes to locate the target
> >   blocks (these are part of the HFiles, but read into memory when the region
> >   servers start).
> >   b) Those target blocks are then opened and occupy space in the block
> >   cache on the region server (possibly evicting other blocks).
> >   c) The data is read from the opened block.
> >
> > Does that help at all?
> >
> > Thanks,
> > Tim
> >
> > On Wed, May 2, 2018 at 9:49 AM, Xi Yang <alex.xi.y...@gmail.com> wrote:
> >
> > > OK, I got it. I've understood Q2 with your help, thanks!
> > >
> > > It seems like I have to use some other way to draw my images. Here is the
> > > updated version of Q1:
> > >
> > > Q1
> > >
> > > I found that HFileScannerImpl.getCachedBlock(...) gets blocks from the
> > > BlockCache. This CachedBlock is used by StoreFileScanner. Does that mean
> > > the read model is like:
> > >
> > > *Model A*
> > >
> > > When a read request comes in:
> > >
> > > 1. Read 1st Store:
> > >    a. read BlockCache
> > >    b. read HFile
> > > 2. Read 2nd Store:
> > >    a. read BlockCache
> > >    b. read HFile
> > > 3. ......
> > > 4. Read Memstore
> > >
> > > Or is there only one BlockCache, and all read requests go through it
> > > first, like:
> > >
> > > *Model B:*
> > >
> > > When a read request comes in:
> > >
> > > 1. Read BlockCache
> > > 2. Read 1st Store -> read HFile
> > > 3. Read 2nd Store -> read HFile
> > > 4. ....
> > > 5. Read Memstore
> > >
> > > Thanks,
> > > Alex
> > >
> > > 2018-05-01 20:04 GMT-07:00 Josh Elser <els...@apache.org>:
> > >
> > > > FYI, the mailing list strips images.
> > > >
> > > > There is only one BlockCache per RS. Not sure if that answers your Q1 in
> > > > entirety though.
> > > >
> > > > Q2. The "Block" in "BlockCache" are the blocks that make up the HBase
> > > > HFiles in HDFS. Data in the Memstore does not yet exist in HFiles on HDFS.
> > > > Additionally, Memstore is already in memory; no need to have a different
> > > > cache to accomplish the same thing :)
> > > >
> > > > On 5/1/18 9:25 PM, Xi Yang wrote:
> > > >
> > > > > Sorry to bother you guys. May I ask 2 questions about HBase?
> > > > >
> > > > > Q1
> > > > >
> > > > > I found that HFileScannerImpl.getCachedBlock(...) gets blocks from the
> > > > > BlockCache. This CachedBlock is used by StoreFileScanner. Does that mean
> > > > > the read model is like:
> > > > >
> > > > > *Model A*
> > > > >
> > > > > Or is there only one BlockCache, and all read requests go through
> > > > > it first, like:
> > > > >
> > > > > *Model B:*
> > > > >
> > > > > Q2
> > > > > If the data has been read from the Memstore, will it be put in the
> > > > > BlockCache to accelerate the read process next time?
> > > > >
> > > > > Thanks,
> > > > > Alex
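Josh's answer to Q2 (memstore data is never placed in the BlockCache, because the cache holds HFile blocks and the memstore is already in memory), together with Tim's point that newly opened blocks can evict others, can be illustrated with a small LRU cache. This is a minimal sketch under assumptions: `BlockCacheSketch`, `LruBlockCacheSketch`, `readHFileBlock` and `readMemStore` are hypothetical names, the block offsets are arbitrary, and a plain access-ordered `LinkedHashMap` stands in for HBase's `LruBlockCache`, which is more elaborate (for example, it separates single-access, multi-access and in-memory blocks).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BlockCacheSketch {
    /** Hypothetical fixed-capacity LRU cache of HFile blocks, keyed by "file@offset". */
    public static class LruBlockCacheSketch extends LinkedHashMap<String, byte[]> {
        private final int maxBlocks;

        public LruBlockCacheSketch(int maxBlocks) {
            super(16, 0.75f, true); // accessOrder = true: iteration runs from least to most recently used
            this.maxBlocks = maxBlocks;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
            return size() > maxBlocks; // evict the least recently used block once over capacity
        }
    }

    public static final LruBlockCacheSketch CACHE = new LruBlockCacheSketch(2);

    /** Only HFile block reads go through (and populate) the block cache. */
    public static byte[] readHFileBlock(String hfile, long offset) {
        String key = hfile + "@" + offset;
        byte[] block = CACHE.get(key);
        if (block == null) {
            block = new byte[64]; // stand-in for a block read from HDFS
            CACHE.put(key, block);
        }
        return block;
    }

    /** Memstore reads are served from memory directly; nothing is added to the block cache. */
    public static String readMemStore(Map<String, String> memstore, String row) {
        return memstore.get(row);
    }

    public static void main(String[] args) {
        readHFileBlock("hfile1", 0);
        readHFileBlock("hfile1", 65536);
        readMemStore(Map.of("r1", "v1"), "r1");
        System.out.println(CACHE.keySet()); // [hfile1@0, hfile1@65536]
        readHFileBlock("hfile2", 0); // cache is full, so the least recently used block is evicted
        System.out.println(CACHE.keySet()); // [hfile1@65536, hfile2@0]
    }
}
```

The memstore read leaves the cache untouched, and loading a third block into the two-block cache evicts `hfile1@0`, the least recently used entry.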