[ 
https://issues.apache.org/jira/browse/HADOOP-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-1398:
------------------------------

    Attachment: hadoop-blockcache.patch

Here is an initial implementation - feedback would be much appreciated.

BlockFSInputStream reads a FSInputStream in a block-oriented manner, and caches 
blocks. There's also a BlockMapFile.Reader that uses a BlockFSInputStream to 
read the MapFile data. HStore uses a BlockMapFile.Reader to read the first 
HStoreFile - at startup and after compaction. New HStoreFiles produced after 
memcache flushes are read using a regular reader in order to keep memory use 
fixed. Block caching is currently configured by the HBase properties 
hbase.hstore.blockCache.maxSize (defaults to 0, i.e. no cache) and 
hbase.hstore.blockCache.blockSize (defaults to 64KB). (It would be desirable to 
make caches configurable on a per-column-family basis; the current approach is 
just a stopgap.)
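To illustrate the caching scheme, here is a minimal, hypothetical sketch of an LRU block cache keyed by block index (the class name, method names, and constructor parameters are mine for illustration, not code from the patch); the real BlockFSInputStream would consult such a cache before falling back to the underlying FSInputStream:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an LRU block cache, parameterized analogously to
// hbase.hstore.blockCache.blockSize and hbase.hstore.blockCache.maxSize.
// Not the actual patch code.
public class BlockCacheSketch {
    private final int blockSize;
    private final Map<Long, byte[]> cache;

    public BlockCacheSketch(final int blockSize, final long maxSize) {
        this.blockSize = blockSize;
        final long maxBlocks = maxSize / blockSize;
        // accessOrder=true makes LinkedHashMap iterate in LRU order;
        // removeEldestEntry evicts once the cache exceeds maxBlocks entries.
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    // Map a file position to the index of the block containing it.
    public long blockIndexFor(long pos) {
        return pos / blockSize;
    }

    public byte[] get(long blockIdx) {
        return cache.get(blockIdx);
    }

    public void put(long blockIdx, byte[] block) {
        cache.put(blockIdx, block);
    }
}
```

A read at an arbitrary position would be rounded down to a block boundary, the whole block fetched and cached, and the requested bytes copied out of it.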

I've also had to push details of the block caching implementation up to 
MapFile.Reader, which is undesirable. The problem is that the streams are 
opened in the constructor of SequenceFile.Reader, which is called by the 
constructor of MapFile.Reader, so there is no opportunity to set the final 
fields blockSize and maxBlockCacheSize on a subclass of MapFile.Reader before 
the stream is opened. I think the proper solution is to have an explicit open 
method on SequenceFile.Reader, but I'm not sure about the impact of this since 
it would be an incompatible change. Perhaps this could be done in conjunction 
with HADOOP-2604?
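The ordering problem can be demonstrated in isolation (the class names below are illustrative, not the actual Hadoop API): a virtual method invoked from a superclass constructor runs before the subclass's final fields are assigned, so the open logic observes their default values:

```java
// Illustrative sketch of the constructor-ordering problem (hypothetical
// names, not the Hadoop API). The superclass constructor "opens the stream"
// via an overridable method; at that point the subclass's final blockSize
// field has not been assigned yet and still reads as 0.
public class OpenOrderDemo {
    static int blockSizeSeenDuringOpen = -1;

    static class Reader {
        Reader() {
            openStream(); // runs before any subclass field assignment
        }
        void openStream() { }
    }

    static class BlockReader extends Reader {
        final int blockSize;
        BlockReader(int blockSize) {
            super();                    // openStream() has already run here...
            this.blockSize = blockSize; // ...so it observed the default value 0
        }
        @Override
        void openStream() {
            blockSizeSeenDuringOpen = blockSize;
        }
    }
}
```

An explicit open method would sidestep this, since configuration could happen between construction and opening the stream.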

> Add in-memory caching of data
> -----------------------------
>
>                 Key: HADOOP-1398
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1398
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>            Reporter: Jim Kellerman
>            Priority: Trivial
>         Attachments: hadoop-blockcache.patch
>
>
> Bigtable provides two in-memory caches: one for row/column data and one for 
> disk block caches.
> The size of each cache should be configurable, data should be loaded lazily, 
> and the cache managed by an LRU mechanism.
> One complication of the block cache is that all data is read through a 
> SequenceFile.Reader which ultimately reads data off of disk via a RPC proxy 
> for ClientProtocol. This would imply that the block caching would have to be 
> pushed down to either the DFSClient or SequenceFile.Reader

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.