[ https://issues.apache.org/jira/browse/HADOOP-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom White updated HADOOP-1398:
------------------------------

    Attachment: hadoop-blockcache.patch

Here is an initial implementation - feedback would be much appreciated.

BlockFSInputStream reads an FSInputStream in a block-oriented manner and caches blocks. There is also a BlockMapFile.Reader that uses a BlockFSInputStream to read the MapFile data. HStore uses a BlockMapFile.Reader to read the first HStoreFile, both at startup and after compaction. New HStoreFiles produced after memcache flushes are read using a regular reader in order to keep memory use fixed.

Currently block caching is configured by the hbase properties hbase.hstore.blockCache.maxSize (defaults to 0 - no cache) and hbase.hstore.blockCache.blockSize (defaults to 64k). (It would be desirable to make caches configurable on a per-column-family basis; the current way is just a stopgap.)

I've also had to push details of the block caching implementation up to MapFile.Reader, which is undesirable. The problem is that the streams are opened in the constructor of SequenceFile.Reader, which is called by the constructor of MapFile.Reader, so there is no opportunity to set the final fields blockSize and maxBlockCacheSize on a subclass of MapFile.Reader before the stream is opened. I think the proper solution is to have an explicit open method on SequenceFile.Reader, but I'm not sure about the impact of this since it would be an incompatible change. Perhaps do it in conjunction with HADOOP-2604?

> Add in-memory caching of data
> -----------------------------
>
>                 Key: HADOOP-1398
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1398
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>            Reporter: Jim Kellerman
>            Priority: Trivial
>         Attachments: hadoop-blockcache.patch
>
>
> Bigtable provides two in-memory caches: one for row/column data and one for disk blocks.
> The size of each cache should be configurable, data should be loaded lazily, and the cache managed by an LRU mechanism.
> One complication of the block cache is that all data is read through a SequenceFile.Reader, which ultimately reads data off of disk via an RPC proxy for ClientProtocol. This implies that the block caching would have to be pushed down to either the DFSClient or SequenceFile.Reader.
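To make the block-oriented caching concrete, here is a minimal, self-contained sketch of how a stream like BlockFSInputStream might fetch and cache fixed-size blocks with LRU eviction (the eviction policy the issue description asks for). The class name, field names, and the LinkedHashMap-based eviction are illustrative assumptions rather than the patch's actual code, and end-of-file handling of the short final block is ignored for brevity.

{code:java}
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.hadoop.fs.FSInputStream;

// Illustrative sketch only - not the BlockFSInputStream from the patch.
public class BlockCachingSketch {
  private final FSInputStream in;
  private final int blockSize;
  private final int maxBlocks;
  private final Map<Long, byte[]> cache;

  public BlockCachingSketch(FSInputStream in, long maxCacheSize, int blockSize) {
    this.in = in;
    this.blockSize = blockSize;
    this.maxBlocks = (int) (maxCacheSize / blockSize);
    // accessOrder=true gives least-recently-used iteration order, so
    // removeEldestEntry implements a simple LRU eviction policy.
    this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        return size() > maxBlocks;
      }
    };
  }

  /** Reads up to len bytes at pos, fetching whole blocks through the cache. */
  public int read(long pos, byte[] buf, int off, int len) throws IOException {
    long blockNumber = pos / blockSize;          // block containing pos
    int blockOffset = (int) (pos % blockSize);   // offset of pos in that block
    byte[] block = cache.get(blockNumber);
    if (block == null) {                         // cache miss: load whole block
      block = new byte[blockSize];
      in.readFully(blockNumber * blockSize, block, 0, blockSize);
      cache.put(blockNumber, block);
    }
    int n = Math.min(len, blockSize - blockOffset);
    System.arraycopy(block, blockOffset, buf, off, n);
    return n;
  }
}
{code}

Whole-block fetches mean a single cache entry can satisfy many small positioned reads, which is the point of pushing caching beneath the MapFile reader.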
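For completeness, a hedged example of reading the two properties with the defaults described above; the property names and default values come from the comment, while the use of HBaseConfiguration and the surrounding branching are illustrative assumptions.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Illustrative only: reads the two properties with the defaults
// described above (0 = no cache, 64k blocks).
Configuration conf = new HBaseConfiguration();
long maxCacheSize = conf.getLong("hbase.hstore.blockCache.maxSize", 0);
int blockSize = conf.getInt("hbase.hstore.blockCache.blockSize", 64 * 1024);
if (maxCacheSize > 0) {
  // open the store file with a block-caching BlockMapFile.Reader
} else {
  // fall back to a regular MapFile.Reader
}
{code}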
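The constructor-ordering problem is general Java behaviour and can be shown in isolation: a superclass constructor always runs before the subclass assigns its final fields, so anything the superclass constructor does (such as opening a stream) cannot see those fields. The class and method names below are simplified stand-ins, not Hadoop's actual API.

{code:java}
class Reader {
  Reader() {
    openStream();  // runs before any subclass field assignments
  }

  void openStream() {
    // open the underlying stream
  }
}

class BlockCachingReader extends Reader {
  private final int blockSize;

  BlockCachingReader(int blockSize) {
    // the implicit super() call has already run openStream() by the
    // time this assignment executes
    this.blockSize = blockSize;
  }

  @Override
  void openStream() {
    // blockSize still holds its default value (0) here, so a correctly
    // sized block cache cannot be set up at this point
  }
}
{code}

An explicit open method invoked after the constructors complete would give the subclass a chance to assign its fields first, which is the change proposed above.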