[
https://issues.apache.org/jira/browse/HADOOP-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tom White updated HADOOP-1398:
------------------------------
Attachment: hadoop-blockcache.patch
Here is an initial implementation - feedback would be much appreciated.
BlockFSInputStream reads an FSInputStream in a block-oriented manner, and caches
blocks. There's also a BlockMapFile.Reader that uses a BlockFSInputStream to
read the MapFile data. HStore uses a BlockMapFile.Reader to read the first
HStoreFile - at startup and after compaction. New HStoreFiles produced after
memcache flushes are read using a regular reader in order to keep memory use
fixed. Currently block caching is configured by the hbase properties
hbase.hstore.blockCache.maxSize (defaults to 0, i.e. no cache) and
hbase.hstore.blockCache.blockSize (defaults to 64KB). (It would be desirable to
make caches configurable on a per-column-family basis - the current approach is
just a stopgap.)
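To make the read path concrete, here is a rough sketch of the idea - not the
patch itself. PositionedReadable below is a simplified stand-in for
FSInputStream, and apart from the two config keys mentioned above all names
are illustrative:
{code:java}
// A minimal sketch of the block-cache read path, for discussion only.
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class BlockCacheSketch {

  /** Simplified stand-in for the positioned-read part of FSInputStream. */
  public interface PositionedReadable {
    int read(long position, byte[] buffer, int offset, int length)
        throws IOException;
  }

  private final PositionedReadable in;
  private final int blockSize;                 // hbase.hstore.blockCache.blockSize
  private final LinkedHashMap<Long, byte[]> cache;

  public BlockCacheSketch(PositionedReadable in, final int blockSize,
      final long maxCacheSize) {               // hbase.hstore.blockCache.maxSize
    this.in = in;
    this.blockSize = blockSize;
    final int maxBlocks = (int) (maxCacheSize / blockSize);
    // An access-ordered LinkedHashMap gives simple LRU eviction.
    this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
      protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        return size() > maxBlocks;
      }
    };
  }

  /** Reads one byte at the given position, consulting the cache first. */
  public int read(long position) throws IOException {
    long blockId = position / blockSize;
    byte[] block = cache.get(blockId);
    if (block == null) {                       // miss: fetch the whole block
      block = new byte[blockSize];
      int n = in.read(blockId * blockSize, block, 0, blockSize);
      if (n <= 0) {
        return -1;                             // EOF
      }
      cache.put(blockId, block);               // NB: short trailing blocks
    }                                          // are not handled in this sketch
    return block[(int) (position % blockSize)] & 0xff;
  }
}
{code}
Note that with a maxCacheSize of 0 the map evicts every entry as soon as it is
inserted, which matches the "no cache" default above.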
I've also had to push details of the block caching implementation up to
MapFile.Reader, which is undesirable. The problem is that the streams are
opened in the constructor of SequenceFile.Reader, which is called by the
constructor of MapFile.Reader, so there is no opportunity to set the final
fields blockSize and maxBlockCacheSize on a subclass of MapFile.Reader before
the stream is opened. I think the proper solution is to have an explicit open
method on SequenceFile.Reader, but I'm not sure about the impact, since it
would be an incompatible change. Perhaps this could be done in conjunction with
HADOOP-2604?
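To illustrate the ordering problem and the shape of the proposed fix, here is
a minimal sketch; these classes are illustrative stand-ins, not the real
SequenceFile.Reader/MapFile.Reader types:
{code:java}
abstract class BaseReader {
  protected Object stream;

  // Current shape (problematic): the base constructor opens the stream,
  // so the overridable hook runs before subclass final fields are assigned:
  //   BaseReader() { this.stream = openStream(); }

  // Proposed shape: construction and opening are separate steps.
  public void open() throws java.io.IOException {
    this.stream = openStream();
  }

  protected abstract Object openStream() throws java.io.IOException;
}

class BlockCachedReader extends BaseReader {
  private final int blockSize;          // analogous to blockSize/maxBlockCacheSize

  BlockCachedReader(int blockSize) {
    this.blockSize = blockSize;         // safe: open() has not run yet
  }

  protected Object openStream() {
    // blockSize is guaranteed to be initialized when this hook runs
    return "block-cached stream, blockSize=" + blockSize;
  }
}
{code}
Callers would then do new BlockCachedReader(64 * 1024).open(), which is the
incompatible part: existing code that relies on the constructor opening the
stream would need updating.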
> Add in-memory caching of data
> -----------------------------
>
> Key: HADOOP-1398
> URL: https://issues.apache.org/jira/browse/HADOOP-1398
> Project: Hadoop
> Issue Type: New Feature
> Components: contrib/hbase
> Reporter: Jim Kellerman
> Priority: Trivial
> Attachments: hadoop-blockcache.patch
>
>
> Bigtable provides two in-memory caches: one for row/column data and one for
> disk blocks.
> The size of each cache should be configurable, data should be loaded lazily,
> and the cache managed by an LRU mechanism.
> One complication of the block cache is that all data is read through a
> SequenceFile.Reader, which ultimately reads data off disk via an RPC proxy
> for ClientProtocol. This would imply that the block caching would have to be
> pushed down to either the DFSClient or SequenceFile.Reader.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.