Re: Block-Cache and usage

Aaron McCurry Thu, 20 Mar 2014 05:16:24 -0700

On Wed, Mar 19, 2014 at 1:57 PM, Ravikumar Govindarajan <
[email protected]> wrote:

> One obvious case is a cache-hit scenario, where instead of using the
> block-cache, there is a fairly heavy round-trip to data-node. It is also
> highly likely that the data-node might have evicted the hot-pages due to
> other active reads.

Or writes.  The normal behavior in the Linux filesystem cache is to cache
newly written data and evict the oldest data from memory.  So during merges
(or any other writes from other Hadoop processes) the Linux filesystem will
unload pages that you might be using.
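
To make the contrast concrete, here is a minimal sketch of a process-local
block cache. It is an illustration only, not Blur's actual implementation:
the class name, the LRU eviction policy, and the fetch_block callable are
all assumptions, and the 8KB block size is simply the figure quoted in this
thread. The point is that blocks held here are evicted only by this
process's own reads, not by unrelated writes on the data-node.

```python
from collections import OrderedDict

BLOCK_SIZE = 8 * 1024  # assumed block size, taken from the 8KB figure in this thread


class BlockCache:
    """Process-local LRU cache of fixed-size file blocks (illustrative only)."""

    def __init__(self, capacity_blocks, fetch_block):
        self.capacity = capacity_blocks
        # Hypothetical callable: (path, block_index) -> bytes.
        # In a real system this would be the round-trip to the data-node.
        self.fetch_block = fetch_block
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read_block(self, path, block_index):
        key = (path, block_index)
        if key in self.blocks:
            # Cache hit: no remote round-trip; mark as most recently used.
            self.hits += 1
            self.blocks.move_to_end(key)
            return self.blocks[key]
        # Cache miss: fetch remotely, then cache the block.
        self.misses += 1
        data = self.fetch_block(path, block_index)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            # Evict the least recently used block.
            self.blocks.popitem(last=False)
        return data
```

Reading the same block twice through read_block calls fetch_block only
once; the second read is served from memory in the shard-server process.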

>

> How much of cache-hit happens in Blur? Will I be correct in saying that
> repeated terms occurring in search only will benefit block-cache?
>

No.  Typically the hit-to-miss ratio is very high; it's a metric that is
recorded in Blur (you can access it via the Blur shell by running the top
command).  It's not unusual to see hits in the 5000-10000/s range with a
block size of 64KB and misses occurring at the same time between 10-20/s.
This has a lot to do with how Lucene stores its indexes: they are highly
compressed files (although not compressed with a generic compression
scheme).
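
As a rough worked example of what those rates imply, the sketch below
turns hits/s and misses/s into a hit ratio. The function name is
hypothetical and the inputs are just the endpoints of the ranges quoted
above, not measurements from a particular cluster.

```python
def hit_ratio(hits_per_sec, misses_per_sec):
    """Fraction of block reads served from the cache."""
    total = hits_per_sec + misses_per_sec
    return hits_per_sec / total if total else 0.0


# Endpoints of the ranges mentioned in this thread.
low = hit_ratio(5000, 20)    # fewest hits, most misses
high = hit_ratio(10000, 10)  # most hits, fewest misses
print(f"{low:.4f} {high:.4f}")
```

With those inputs the ratio comes out at roughly 99.6% to 99.9%, so only
a small fraction of block reads pay the round-trip to the data-node.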

Let me know if you have any other questions.

Aaron

>
> --
> Ravi
>
>
> On Wed, Mar 19, 2014 at 11:06 PM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > I was looking at block-cache code and trying to understand why we need
> > it.
> >
> > We divide the file into blocks of 8KB and write to hadoop. While reading,
> > we only read in batches of 8KB and store in block-cache
> >
> > This is a form of read-ahead caching on the client-side[shard-server]. Am
> > I correct in understanding?
> >
> > Recent releases of hadoop have a notion of read-ahead caching in
> > data-node itself. The default value is 4MB but I believe it can also be
> > configured to whatever is needed.
> >
> > What are the advantages of a block-cache vis-a-vis data-node read-ahead
> > cache?
> >
> > I also am not familiar with hadoop IO sub-system as to whether it's
> > correct and performant to do read-aheads in data-nodes for a use-case
> > like lucene.
> >
> > Can someone help me?
> >
> > --
> > Ravi
> >
> >
> >
>
