One obvious case is the cache-hit scenario: instead of a fairly heavy
round-trip to the data-node, the block-cache serves the read locally. It is
also quite likely that the data-node has already evicted the hot pages from
its own page cache due to other active reads.

What kind of cache-hit rate does Blur typically see? Would I be correct in
saying that only terms repeated across searches will benefit from the
block-cache?
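To make my question concrete, here is a toy model of how I picture the
block-cache behaving: a fixed pool of 8KB block slots with LRU eviction,
keyed by (file, block index). The class and field names are mine for
illustration, not Blur's actual code; only a repeated read of the same
block ever hits, while a one-off scan over distinct blocks gets no benefit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a shard-server block-cache (illustrative names, not
// Blur's real classes): fixed slot count, 8KB blocks, LRU eviction.
class BlockCache {
    static final int BLOCK_SIZE = 8 * 1024;
    private final int capacity;
    int hits = 0, misses = 0;
    private final Map<String, byte[]> slots;

    BlockCache(int capacity) {
        this.capacity = capacity;
        // accessOrder=true makes LinkedHashMap iterate in LRU order
        this.slots = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> e) {
                return size() > BlockCache.this.capacity;
            }
        };
    }

    // Read one 8KB-aligned block of a file; only a repeated read of the
    // same (file, block) pair can hit the cache.
    byte[] read(String file, long offset) {
        String key = file + "@" + (offset / BLOCK_SIZE);
        byte[] block = slots.get(key);
        if (block != null) { hits++; return block; }
        misses++;
        block = new byte[BLOCK_SIZE]; // stand-in for a data-node round-trip
        slots.put(key, block);
        return block;
    }
}

public class BlockCacheDemo {
    public static void main(String[] args) {
        BlockCache cache = new BlockCache(4);
        // A repeated term: the same block read twice -> 1 miss, then 1 hit.
        cache.read("seg0.tim", 0);
        cache.read("seg0.tim", 0);
        // A one-off scan over distinct blocks: all misses, no benefit.
        for (int i = 1; i <= 4; i++) cache.read("seg0.doc", i * 8192L);
        System.out.println("hits=" + cache.hits + " misses=" + cache.misses);
        // prints "hits=1 misses=5"
    }
}
```

If that model is right, the cache only pays off when query traffic keeps
re-touching the same term/posting blocks, which is what my question above
is really asking.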

--
Ravi


On Wed, Mar 19, 2014 at 11:06 PM, Ravikumar Govindarajan <
[email protected]> wrote:

> I was looking at block-cache code and trying to understand why we need it.
>
> We divide the file into 8KB blocks and write them to Hadoop. While
> reading, we read in 8KB batches and store each block in the block-cache.
>
> This looks like a form of read-ahead caching on the client side [the
> shard-server]. Is my understanding correct?
>
> Recent releases of Hadoop have a notion of read-ahead caching in the
> data-node itself. The default value is 4MB, but I believe it can be
> configured to whatever is needed.
>
> What are the advantages of a block-cache vis-a-vis data-node read-ahead
> cache?
>
> I am also not familiar enough with the Hadoop IO sub-system to know
> whether it is correct and performant to do read-aheads in data-nodes for
> a use-case like Lucene.
>
> Can someone help me?
>
> --
> Ravi
>
>
>
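For anyone following along: as far as I can tell, the data-node read-ahead
mentioned above is controlled by `dfs.datanode.readahead.bytes` in
hdfs-site.xml, with a 4MB default. Treat the exact property name as my
reading of hdfs-default.xml in recent releases, not gospel; a sketch:

```xml
<!-- hdfs-site.xml: per-data-node read-ahead window (default 4MB).
     Property name taken from hdfs-default.xml; verify against your
     Hadoop version before relying on it. -->
<property>
  <name>dfs.datanode.readahead.bytes</name>
  <value>8388608</value> <!-- e.g. bump to 8MB -->
</property>
```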
