Re: Fwd:

Tim Robertson Wed, 02 May 2018 01:05:32 -0700

Hi Alex,

I'm not sure I fully follow your question without the images, but I'll try
to help.


When a read request comes in, my understanding of the order of execution is
as follows (perhaps someone can verify this):

1) It looks in the block cache for the cells (this is a read-only cache
containing recently read data)
2) It looks in the memstore for any writes still held in memory and not yet
flushed to the HFiles, which need to be merged with the data read in 1)
3) Only if the data is not found does it start locating it in the HFiles
(note: there can be multiple files per column family in a region until a
major compaction runs and merges them into one, discarding stale data where
possible)
  a) It uses bloom filters and the HFile block indexes to locate the target
blocks (these are part of the HFiles, but are read into memory when the
region servers start)
  b) those target blocks are then loaded and occupy space in the block
cache on the region server (possibly evicting other blocks)
  c) the data is read from the loaded block
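As a rough illustration of steps 3a to 3c, here is a toy Python model of locating a row inside one HFile. Everything in it is hypothetical and simplified, not HBase code. `ToyHFile` is an invented class. An exact Python `set` stands in for the bloom filter; a real bloom filter can return false positives but never false negatives. The block index is just the sorted list of each block's first row key, searched with `bisect`.

```python
import bisect


class ToyHFile:
    """Hypothetical stand-in for one HFile: sorted (row, value) pairs split into blocks."""

    def __init__(self, rows, block_size=2):
        rows = sorted(rows)
        # Each block holds up to block_size consecutive (row, value) pairs.
        self.blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
        # Exact set standing in for a bloom filter (a real one may give false positives).
        self.bloom = {row for row, _ in rows}
        # Block index: the first row key of every block, in sorted order.
        self.block_index = [block[0][0] for block in self.blocks]

    def locate_block(self, row):
        """Step 3a: return the index of the only block that could hold `row`, or None."""
        if row not in self.bloom:
            return None  # the filter says the row is definitely not in this file
        # The target block is the last one whose first key is <= row.
        i = bisect.bisect_right(self.block_index, row) - 1
        return i if i >= 0 else None

    def read(self, row):
        """Steps 3b/3c: in HBase the block would come through the block cache;
        here it is read directly from self.blocks and scanned."""
        i = self.locate_block(row)
        if i is None:
            return None
        for r, value in self.blocks[i]:
            if r == row:
                return value
        return None


f = ToyHFile([("r1", "a"), ("r3", "b"), ("r5", "c")])
print(f.block_index)  # ['r1', 'r5']
print(f.read("r3"))   # b
print(f.read("r2"))   # None (rejected by the filter)
```

Because the index stores only first keys, this toy model never loads more than one block per file for a single-row lookup. That is why the index plus filter keeps step 3b cheap.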

Does that help at all?

Thanks,
Tim
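Josh's reply further down notes that there is only one BlockCache per RegionServer. Read against Models A and B, one interpretation is that there is a single shared cache, but each store file scan consults it block by block. That would explain why `getCachedBlock(...)` shows up in the per-store scan path.

The sketch below is a toy, not HBase code. `ToyBlockCache` and `scan_store_files` are invented names. Entries are keyed by (HFile name, block offset), and a plain LRU policy stands in for whatever eviction policy the real cache uses.

```python
from collections import OrderedDict


class ToyBlockCache:
    """Hypothetical LRU cache: one instance shared by every store on a region server."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # (hfile_name, block_offset) -> block contents
        self.hits = 0
        self.misses = 0

    def get_block(self, hfile_name, offset, load_block):
        key = (hfile_name, offset)
        if key in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(key)  # mark as most recently used
            return self.blocks[key]
        self.misses += 1
        block = load_block(hfile_name, offset)  # stands in for an HDFS read
        self.blocks[key] = block
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return block


def scan_store_files(cache, store_files, load_block):
    """Each store file scanner asks the same shared cache for its own block."""
    return [cache.get_block(name, offset, load_block) for name, offset in store_files]


if __name__ == "__main__":
    def fake_hdfs_read(name, offset):
        return f"{name}@{offset}"

    cache = ToyBlockCache(capacity=2)
    files = [("cf1-hfile", 0), ("cf2-hfile", 0)]  # hypothetical file names
    scan_store_files(cache, files, fake_hdfs_read)  # two misses
    scan_store_files(cache, files, fake_hdfs_read)  # two hits
    print(cache.hits, cache.misses)  # 2 2
```

Keying the shared cache by file and offset lets blocks from different stores coexist in one structure while each store still performs its own lookups.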



On Wed, May 2, 2018 at 9:49 AM, Xi Yang <[email protected]> wrote:

> OK, got it. With your help I now understand Q2, thanks!
>
>
>
> It seems I have to find another way to draw my images. Here is the
> updated version of Q1:
>
>
> Q1
>
> I found that HFileScannerImpl.getCachedBlock(...) gets blocks from the
> BlockCache, and this cached block is used by StoreFileScanner. Does that
> mean the read model is like this:
>
> *Model A*
>
> When a read request comes in:
>
>    1. Read 1st Store:
>    a. read BlockCache
>    b. read HFile
>    2. Read 2nd Store:
>    a. read BlockCache
>    b. read HFile
>    3. ......
>    4. Read Memstore
>
>
>
> Or is there only one BlockCache that every read request goes through
> first, like this:
>
> *Model B:*
>
> When a read request comes in:
>
>    1. Read BlockCache
>    2. Read 1st Store -> read HFile
>    3. Read 2nd Store -> read HFile
>    4. ....
>    5. Read Memstore
>
>
> 
>
> Thanks,
> Alex
>
>
>
> 2018-05-01 20:04 GMT-07:00 Josh Elser <[email protected]>:
>
> > FYI, the mailing list strips images.
> >
> > There is only one BlockCache per RS (RegionServer). I'm not sure if that
> > answers your Q1 in its entirety, though.
> >
> > Q2. The "Block" in "BlockCache" refers to the blocks that make up the
> > HBase HFiles in HDFS. Data in the Memstore does not yet exist in HFiles
> > on HDFS. Additionally, the Memstore is already in memory; there is no
> > need for a separate cache to accomplish the same thing :)
> >
> > On 5/1/18 9:25 PM, Xi Yang wrote:
> >
> >> Sorry to bother you guys. May I ask 2 questions about HBase?
> >>
> >> Q1
> >>
> >> I found that HFileScannerImpl.getCachedBlock(...) gets blocks from the
> >> BlockCache, and this cached block is used by StoreFileScanner. Does that
> >> mean the read model is like this:
> >>
> >> *Model A*
> >>
> >> Or is there only one BlockCache that every read request goes through
> >> first, like this:
> >>
> >> *Model B:*
> >>
> >> 
> >> Q2
> >> If data is read from the Memstore, will it be put in the BlockCache to
> >> speed up the next read?
> >>
> >> 
> >> Thanks,
> >> Alex
> >>
> >> 
> >>
> >
>
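Josh's answer to Q2 can also be sketched in a few lines. This toy model uses invented names and a single store, and treats each flushed file as one block for brevity. Reads served from the memstore never touch the cache; data can end up in the block cache only after `flush()` turns the memstore into an immutable file. The per-store dict standing in for the cache is a simplification of the one shared BlockCache per RegionServer. A real read also merges memstore and store-file scanners by key and version rather than checking them newest-first like this.

```python
class ToyStore:
    """Hypothetical single store: an in-memory memstore plus immutable flushed files."""

    def __init__(self):
        self.memstore = {}     # recent writes, already in memory
        self.hfiles = []       # flushed files, oldest first; each dict is one "block"
        self.block_cache = {}  # file index -> block; only ever filled from HFile reads

    def put(self, row, value):
        self.memstore[row] = value

    def flush(self):
        """Write the memstore out as a new immutable file and clear it."""
        if self.memstore:
            self.hfiles.append(dict(self.memstore))
            self.memstore = {}

    def get(self, row):
        if row in self.memstore:
            return self.memstore[row]  # served from memory; nothing is cached
        for file_id in reversed(range(len(self.hfiles))):  # newest file first
            block = self.block_cache.get(file_id)
            if block is None:
                block = self.hfiles[file_id]  # stands in for reading the block from HDFS
                self.block_cache[file_id] = block
            if row in block:
                return block[row]
        return None


s = ToyStore()
s.put("r1", "v1")
print(s.get("r1"), s.block_cache)  # v1 {}
s.flush()
print(s.get("r1"), list(s.block_cache))  # v1 [0]
```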
