Hi Tim,

Thanks for confirming the answer to that question. It had confused me for a
long time. I really appreciate it.


As for the other question, I still don't know whether Model A or Model B is
correct. I'm still confused.
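
To make the question concrete, here is a minimal Java sketch of how I
currently read your description: the block cache lookup happens inside each
store file scanner's read, with the HFile as the fallback, and the memstore
scanner sits last in the list, as in the HStore.getScanners snippet quoted
below. If that reading is right, it looks closer to Model A than Model B.
Everything in it (ReadPathSketch, KvScanner, FileScanner, MemScanner, the
string keys) is a hypothetical stand-in, not HBase's real API, and the plain
loop over scanners is only an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only. Every class and method here is hypothetical, not HBase's API.
class ReadPathSketch {

    // A single block cache shared by all file scanners (one per region server).
    static final Map<String, String> blockCache = new HashMap<>();

    // Records where each lookup was served from.
    static final List<String> trace = new ArrayList<>();

    interface KvScanner {
        String read(String key);
    }

    // Stand-in for a store file scanner: try the block cache, fall back to the file.
    static class FileScanner implements KvScanner {
        private final String name;
        private final Map<String, String> hfile; // pretend on-disk data

        FileScanner(String name, Map<String, String> hfile) {
            this.name = name;
            this.hfile = hfile;
        }

        @Override
        public String read(String key) {
            String cacheKey = name + "/" + key;
            String cached = blockCache.get(cacheKey);
            if (cached != null) {
                trace.add(name + ":cache");
                return cached;
            }
            String value = hfile.get(key);
            if (value != null) {
                trace.add(name + ":hfile");
                blockCache.put(cacheKey, value); // cache it for the next read
            }
            return value;
        }
    }

    // Stand-in for the memstore scanner: already in memory, no block cache involved.
    static class MemScanner implements KvScanner {
        private final Map<String, String> memstore;

        MemScanner(Map<String, String> memstore) {
            this.memstore = memstore;
        }

        @Override
        public String read(String key) {
            String value = memstore.get(key);
            if (value != null) {
                trace.add("memstore");
            }
            return value;
        }
    }

    // Same ordering as the quoted snippet: file scanners first, memstore scanner last.
    static List<KvScanner> getScanners(List<FileScanner> files, MemScanner mem) {
        List<KvScanner> scanners = new ArrayList<>(files.size() + 1);
        scanners.addAll(files);
        scanners.add(mem);
        return scanners;
    }

    // Asks every scanner in list order. The plain loop is a simplification;
    // how the real scanners are merged is not modeled here.
    static List<String> read(List<KvScanner> scanners, String key) {
        List<String> values = new ArrayList<>();
        for (KvScanner s : scanners) {
            String v = s.read(key);
            if (v != null) {
                values.add(v);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<FileScanner> files = new ArrayList<>();
        files.add(new FileScanner("f1", Map.of("row1", "a")));
        files.add(new FileScanner("f2", Map.of("row1", "b")));
        List<KvScanner> scanners = getScanners(files, new MemScanner(Map.of("row1", "c")));

        read(scanners, "row1");
        System.out.println(trace); // [f1:hfile, f2:hfile, memstore]
        trace.clear();
        read(scanners, "row1");
        System.out.println(trace); // [f1:cache, f2:cache, memstore]
    }
}
```

In this toy version the first read goes to each file, and the second read is
served from the shared cache, while the memstore is consulted directly both
times. Is that roughly what the real read path does?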


Thanks,
Alex

2018-05-02 13:53 GMT-07:00 Tim Robertson <[email protected]>:

> Thanks Alex,
>
> Yes, looking at that code I believe you are correct - the memStore scanner
> is appended after the block scanners.
> The block scanners may or may not see hits in the block cache when they
> read. If they don't get a hit, they'll open the block from the underlying
> HFile(s).
>
>
>
> On Wed, May 2, 2018 at 10:41 PM, Xi Yang <[email protected]> wrote:
>
> > Hi Tim,
> >
> > Thank you for the detailed explanation. Yes, that really helps me! I
> > really appreciate it!
> >
> >
> > But I am still confused about the sequence:
> >
> > I've read this code in *HStore.getScanners*:
> >
> >
> >     // TODO this used to get the store files in descending order,
> >     // but now we get them in ascending order, which I think is
> >     // actually more correct, since memstore get put at the end.
> >     List<StoreFileScanner> sfScanners =
> >         StoreFileScanner.getScannersForStoreFiles(storeFilesToScan,
> >             cacheBlocks, usePread, isCompaction, false, matcher, readPt);
> >     List<KeyValueScanner> scanners = new ArrayList<>(sfScanners.size() + 1);
> >     scanners.addAll(sfScanners);
> >     // Then the memstore scanners
> >     scanners.addAll(memStoreScanners);
> >
> >
> > Does it mean that this step:
> >
> >
> > *2) It looks in the memstore to see if there are any writes still in
> > memory ready to flush down to the HFiles that need to be merged with the
> > data read in 1)*
> >
> > comes after the following step?
> >
> > *c) the data is read from the opened block *
> >
> >
> >
> >
> > Here is an explanation of the images I drew before, so that we don't need
> > the images:
> >
> > When a read request comes in:
> > Model A
> >
> >    1. Get the scanners (including StoreScanner and MemStoreScanner).
> >    MemStoreScanner is the last one
> >    2. Begin with the first StoreScanner
> >    3. Try to get the block from the BlockCache of the StoreScanner
> >    4. Try to get the block from the HFile of the StoreScanner
> >    5. Go to the next StoreScanner
> >    6. Repeat #2 - #5 until all StoreScanners have been used
> >    7. Try to get the block from the memStore
> >
> >
> > Model B
> >
> >    1. Try to get the block from the BlockCache; if that fails, go to #2
> >    2. Get the scanners (including StoreScanner and MemStoreScanner).
> >    MemStoreScanner is the last one
> >    3. Begin with the first StoreScanner
> >    4. Try to get the block from the HFile of the StoreScanner
> >    5. Go to the next StoreScanner
> >    6. Repeat #4 - #5 until all StoreScanners have been used
> >    7. Try to get the block from the memStore
> >
> >
> >
> > Thanks,
> > Alex
> >
> >
> > 2018-05-02 1:04 GMT-07:00 Tim Robertson <[email protected]>:
> >
> > > Hi Alex,
> > >
> > > I'm not sure I fully follow your question without the images but I'll
> > > try and help.
> > >
> > > When a read request comes in, my understanding of the order of
> > > execution is as follows (perhaps someone can verify this):
> > >
> > > 1) It looks in the block cache for the cells (this is a read-only cache
> > > containing recently read data)
> > > 2) It looks in the memstore to see if there are any writes still in
> > > memory ready to flush down to the HFiles that need to be merged with the
> > > data read in 1)
> > > 3) Only if not found it starts locating the data from HFiles (note,
> > > there can be multiple files per region until major compaction runs,
> > > which merges them into 1 per column family, discarding stale data where
> > > possible)
> > >   a) It uses bloom filters and the block cache indexes to locate the
> > > target blocks (these are part of the HFiles, but read into memory when
> > > the region servers start)
> > >   b) those target blocks are then opened and occupy space on the block
> > > cache on the region server (possibly evicting other blocks)
> > >   c) the data is read from the opened block
> > >
> > > Does that help at all?
> > >
> > > Thanks,
> > > Tim
> > >
> > >
> > >
> > > On Wed, May 2, 2018 at 9:49 AM, Xi Yang <[email protected]> wrote:
> > >
> > > > OK, I got it. I understand Q2 now with your help, thanks!
> > > >
> > > >
> > > >
> > > > It seems I have to find some other way to share my images. Here is the
> > > > updated version of Q1:
> > > >
> > > >
> > > > Q1
> > > >
> > > > I found that HFileScannerImpl.getCachedBlock(...) gets the block from
> > > > BlockCache. This cached block is used by StoreFileScanner. Does that
> > > > mean the read model is like:
> > > >
> > > > *Model A*
> > > >
> > > > When a read request comes in:
> > > >
> > > >    1. Read 1st Store:
> > > >    a. read BlockCache
> > > >    b. read HFile
> > > >    2. Read 2nd Store:
> > > >    a. read BlockCache
> > > >    b. read HFile
> > > >    3. ......
> > > >    4. Read Memstore
> > > >
> > > >
> > > >
> > > > Or is there only one BlockCache that all read requests go through
> > > > first, like:
> > > >
> > > > *Model B:*
> > > >
> > > > When a read request comes in:
> > > >
> > > >    1. Read BlockCache
> > > >    2. Read 1st Store -> read HFile
> > > >    3. Read 2nd Store -> read HFile
> > > >    4. ....
> > > >    5. Read Memstore
> > > >
> > > >
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > >
> > > >
> > > > 2018-05-01 20:04 GMT-07:00 Josh Elser <[email protected]>:
> > > >
> > > > > FYI, the mailing list strips images.
> > > > >
> > > > > There is only one BlockCache per RS. Not sure if that answers your
> > > > > Q1 in entirety though.
> > > > >
> > > > > Q2. The "Block" in "BlockCache" are the blocks that make up the
> > > > > HBase HFiles in HDFS. Data in the Memstore does not yet exist in
> > > > > HFiles on HDFS. Additionally, Memstore is already in memory; no need
> > > > > to have a different cache to accomplish the same thing :)
> > > > >
> > > > > On 5/1/18 9:25 PM, Xi Yang wrote:
> > > > >
> > > > >> Sorry to bother you guys. May I ask 2 questions about HBase?
> > > > >>
> > > > >> Q1
> > > > >>
> > > > >> I found that |HFileScannerImpl.getCachedBlock(...)| gets the block
> > > > >> from BlockCache. This cached block is used by |StoreFileScanner|.
> > > > >> Does that mean the read model is like:
> > > > >>
> > > > >> *Model A*
> > > > >>
> > > > >> Or is there only one BlockCache that all read requests go through
> > > > >> first, like:
> > > > >>
> > > > >> *Model B:*
> > > > >>
> > > > >> Q2
> > > > >> If data has been read from the Memstore, will it be put in the
> > > > >> BlockCache to speed up the next read?
> > > > >>
> > > > >>
> > > > >> Thanks,
> > > > >> Alex
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> >
>
