[ https://issues.apache.org/jira/browse/HADOOP-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559203#action_12559203 ]
Bryan Duxbury commented on HADOOP-2604:
---------------------------------------

I think it would make sense for us to maintain both a bloom filter and an index on row keys. That way, you can check the filter first to decide if you should check the index. Even if there have been deletions in the region that damage the filter, the index will still answer your question pretty quickly. We can maintain (re-create) the filter during compactions.

I think we would see huge gains from having an always-on bloom filter, especially for sparser row spaces.

> [hbase] Create an HBase-specific MapFile implementation
> -------------------------------------------------------
>
>                 Key: HADOOP-2604
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2604
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>            Reporter: Bryan Duxbury
>            Priority: Minor
>
> Today, HBase uses the Hadoop MapFile class to store data persistently to
> disk. This is convenient, as it's already done (and maintained by other
> people :). However, it's beginning to look like there might be possible
> performance benefits to be had from doing an HBase-specific implementation of
> MapFile that incorporated some precise features.
> This issue should serve as a place to track discussion about what features
> might be included in such an implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
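
For illustration, the filter-then-index lookup proposed in the comment could look roughly like the sketch below. This is not HBase or Hadoop code: the class FilteredRowIndex, its methods, the double-hashing scheme, and the in-memory TreeMap standing in for the on-disk row-key index are all hypothetical, chosen only to show the idea. A delete leaves stale bits in the filter, so the index still has the final say, and compact() re-creates the filter from the surviving keys.

```java
import java.util.BitSet;
import java.util.TreeMap;

/** Hypothetical sketch: a Bloom filter consulted before a sorted row-key index. */
final class FilteredRowIndex {
    private final int numBits;
    private final int numHashes;
    private BitSet filter;
    // Stand-in for the on-disk index: row key -> offset of the row in the file.
    private final TreeMap<String, Long> index = new TreeMap<>();
    private int indexProbes = 0; // counts how often the index was actually consulted

    FilteredRowIndex(int numBits, int numHashes) {
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.filter = new BitSet(numBits);
    }

    // Double hashing: derive the i-th bit position from two base hashes.
    private int bitFor(String rowKey, int i) {
        int h1 = rowKey.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so successive probes differ
        return Math.floorMod(h1 + i * h2, numBits);
    }

    private void addToFilter(BitSet target, String rowKey) {
        for (int i = 0; i < numHashes; i++) {
            target.set(bitFor(rowKey, i));
        }
    }

    private boolean mightContain(String rowKey) {
        for (int i = 0; i < numHashes; i++) {
            if (!filter.get(bitFor(rowKey, i))) {
                return false; // definitely absent
            }
        }
        return true; // present, or a false positive
    }

    void put(String rowKey, long offset) {
        index.put(rowKey, offset);
        addToFilter(filter, rowKey);
    }

    // Bits cannot be cleared safely, so a delete leaves the filter stale.
    void delete(String rowKey) {
        index.remove(rowKey);
    }

    /** Returns the offset for rowKey, or null if the row is not present. */
    Long get(String rowKey) {
        if (!mightContain(rowKey)) {
            return null; // filter says no: skip the index entirely
        }
        indexProbes++;
        return index.get(rowKey); // the index gives the authoritative answer
    }

    // Re-create the filter from the keys that survive, as during a compaction.
    void compact() {
        BitSet rebuilt = new BitSet(numBits);
        for (String rowKey : index.keySet()) {
            addToFilter(rebuilt, rowKey);
        }
        filter = rebuilt;
    }

    int indexProbes() {
        return indexProbes;
    }
}
```

In this sketch, a get() for a deleted row still reaches the index until compact() runs; after the rebuild the filter rejects it without touching the index. That matches the behavior described in the comment, though a real implementation would also have to size the filter and choose its hash functions with care.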