[ https://issues.apache.org/jira/browse/HADOOP-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559655#action_12559655 ]

Doug Cutting commented on HADOOP-2604:
--------------------------------------

> Exclude column family name from the file [ ... ]

The column family name could be stored in the SequenceFile's metadata, no? MapFile's constructors don't currently permit one to specify metadata, but that'd be easy to add.

> There is some indication that the existing MapFile implementation is
> optimized for streaming access [ ... ]

It shouldn't be. The problem is that mapreduce, what's primarily used to benchmark and debug Hadoop, doesn't do any random access. So it's easy for random-access-related performance problems to sneak into MapFile and HDFS. Both Nutch and HBase depend on efficient random access from Hadoop, primarily through MapFile.

We need a good random-access benchmark that someone regularly executes. Perhaps one could be added to the sort benchmark suite, since that is regularly run by Yahoo!? Or someone else could start running regular HBase benchmarks on a grid somewhere?

> [hbase] Create an HBase-specific MapFile implementation
> -------------------------------------------------------
>
>                 Key: HADOOP-2604
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2604
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>            Reporter: Bryan Duxbury
>            Priority: Minor
>
> Today, HBase uses the Hadoop MapFile class to store data persistently to
> disk. This is convenient, as it's already done (and maintained by other
> people :). However, it's beginning to look like there might be possible
> performance benefits to be had from doing an HBase-specific implementation of
> MapFile that incorporated some precise features.
> This issue should serve as a place to track discussion about what features
> might be included in such an implementation.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.