One obvious case is the cache-hit scenario: without the block-cache, every such read becomes a fairly heavy round-trip to the data-node. It is also quite likely that the data-node will have evicted the hot pages by then, due to other active reads.
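To make sure I understand the hit/miss difference, here is a rough sketch of what I have in mind (hypothetical code, not actual Blur internals; the class and the naive in-heap map are only for illustration):

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.fs.FSDataInputStream;

/**
 * Sketch only: a hit in the shard-server block cache is a local memory
 * lookup, while a miss is a positioned read that goes over the network
 * to a data-node.
 */
public class BlockCacheSketch {
  private static final int BLOCK_SIZE = 8 * 1024; // 8KB, as in the quoted mail below

  // naive in-heap cache keyed by block index; the real cache is presumably
  // smarter about eviction and memory placement
  private final Map<Long, byte[]> cache = new ConcurrentHashMap<Long, byte[]>();

  public byte[] readBlock(FSDataInputStream in, long blockIndex) throws IOException {
    byte[] hit = cache.get(blockIndex);
    if (hit != null) {
      return hit; // cache hit: no round trip to the data-node
    }
    // cache miss: one network read per 8KB block
    byte[] block = new byte[BLOCK_SIZE];
    in.readFully(blockIndex * BLOCK_SIZE, block); // PositionedReadable
    cache.put(blockIndex, block);
    return block;
  }
}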
How often do cache hits actually happen in Blur? Would I be correct in saying that only terms repeated across searches benefit from the block-cache?

--
Ravi

On Wed, Mar 19, 2014 at 11:06 PM, Ravikumar Govindarajan <
[email protected]> wrote:

> I was looking at the block-cache code and trying to understand why we need it.
>
> We divide the file into blocks of 8KB and write them to Hadoop. While reading,
> we read in batches of 8KB and store them in the block-cache.
>
> This is a form of read-ahead caching on the client side [shard-server]. Am
> I correct in this understanding?
>
> Recent releases of Hadoop have a notion of read-ahead caching in the data-node
> itself. The default value is 4MB, but I believe it can also be configured to
> whatever is needed.
>
> What are the advantages of a block-cache vis-a-vis the data-node read-ahead
> cache?
>
> I am also not familiar enough with the Hadoop IO sub-system to know whether it is
> correct and performant to do read-aheads in data-nodes for a use case like
> Lucene.
>
> Can someone help me?
>
> --
> Ravi
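P.S. For the data-node read-ahead mentioned in the quoted mail, I believe the relevant property is dfs.datanode.readahead.bytes (default roughly 4MB). A minimal snippet just to pin down the name; in practice this goes into hdfs-site.xml on the data-nodes rather than client code:

import org.apache.hadoop.conf.Configuration;

public class ReadAheadKnob {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Data-node-side setting; shown here only to make the property name concrete.
    // e.g. bump the read-ahead from the ~4MB default to 8MB
    conf.setLong("dfs.datanode.readahead.bytes", 8L * 1024 * 1024);
    System.out.println(conf.get("dfs.datanode.readahead.bytes"));
  }
}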
