I was looking at the block-cache code and trying to understand why we need it.

We divide the file into 8KB blocks and write them to Hadoop. While reading,
we read only in 8KB batches and store them in the block cache.
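
To make sure I'm describing the same thing, here is a minimal sketch of
what I mean (names are hypothetical, not the actual shard-server code):
reads go through fixed 8KB blocks, and each fetched block is cached so
nearby offsets hit memory instead of the underlying file.

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class BlockCacheSketch {
        private static final int BLOCK_SIZE = 8 * 1024; // 8KB blocks, as described
        private static final int MAX_BLOCKS = 1024;     // arbitrary cache capacity

        // Simple LRU cache keyed by block number.
        private final Map<Long, byte[]> cache =
            new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, byte[]> e) {
                    return size() > MAX_BLOCKS;
                }
            };

        private final RandomAccessFile file;

        public BlockCacheSketch(RandomAccessFile file) {
            this.file = file;
        }

        // Read one byte at an absolute offset, going through the cache.
        public int readByte(long offset) throws IOException {
            long blockId = offset / BLOCK_SIZE;
            byte[] block = cache.get(blockId);
            if (block == null) {
                // Cache miss: fetch the whole 8KB block in one go.
                block = loadBlock(blockId);
                cache.put(blockId, block);
            }
            return block[(int) (offset % BLOCK_SIZE)] & 0xFF;
        }

        // Fetch a full block from the underlying file (stand-in for an HDFS read).
        private byte[] loadBlock(long blockId) throws IOException {
            byte[] buf = new byte[BLOCK_SIZE];
            long start = blockId * BLOCK_SIZE;
            int len = (int) Math.min(BLOCK_SIZE, file.length() - start);
            file.seek(start);
            file.readFully(buf, 0, len);
            return buf;
        }
    }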

This is a form of read-ahead caching on the client side (the shard
server). Is my understanding correct?

Recent releases of Hadoop have a notion of read-ahead caching in the
DataNode itself. The default value is 4MB, but I believe it can be
configured to whatever is needed.
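
If I'm reading the HDFS configuration right, the knob is
dfs.datanode.readahead.bytes (the 4MB default above), set in
hdfs-site.xml; something like this should change it, though the
property name is from memory, so worth double-checking:

    <!-- hdfs-site.xml: raise DataNode read-ahead from the 4MB default -->
    <property>
      <name>dfs.datanode.readahead.bytes</name>
      <value>8388608</value> <!-- 8MB, for example -->
    </property>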

What are the advantages of a block cache vis-a-vis the DataNode
read-ahead cache?

I am also not familiar enough with the Hadoop IO subsystem to know
whether it is correct and performant to do read-aheads in DataNodes for
a use case like Lucene.

Can someone help me?

--
Ravi
