[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033295#comment-13033295 ]
Mikhail Bautin commented on HBASE-3857:
---------------------------------------

Hi St.Ack,

Thank you for all the feedback!

To scan an HFile in the new format we don't even need the root index.
Each block is self-sufficient in that the header contains all the
information necessary to decode the block, except the compression type,
which is found in the trailer. We could create an "HFile fix" tool that
would rebuild the block index if necessary. In HFile format v1, however,
if the block index is corrupt, we would not be able to read any data
blocks at all. So I don't see how HFile format v2 is more brittle than v1.

Implementation update: a load test
(org.apache.hadoop.hbase.manual.HBaseTest) is successfully running on a
5-node cluster, and I see some 2-level indexes being created with 5-15
root-level entries so far (with the max index block size set to 128K),
as well as some compound ROW Bloom filters.

Regards,
--Mikhail

> Change the HFile Format
> -----------------------
>
>                 Key: HBASE-3857
>                 URL: https://issues.apache.org/jira/browse/HBASE-3857
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Liyin Tang
>            Assignee: Mikhail Bautin
>         Attachments: hfile_format_v2_design_draft_0.1.pdf
>
>
> In order to support HBASE-3763 and HBASE-3856, we need to change the format
> of the HFile. The new format proposal is attached here. Thanks for Mikhail
> Bautin for the documentation.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
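
The comment's argument that self-describing blocks let a tool rebuild the block index without reading the old one can be illustrated with a small sketch. This is not the HFile v2 reader or its real header layout: the class name BlockScanSketch, the 4-byte magic, the 4-byte payload-length field, and the IndexEntry type are all hypothetical, chosen only to show the idea of walking a file block by block using nothing but each block's header. Payloads are treated as opaque bytes, since decoding them would also need the compression type from the trailer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative only: a toy file made of self-describing blocks.
 * The header layout (4-byte magic, 4-byte payload length) is an assumption
 * for this sketch and is NOT the real HFile v2 block header.
 */
public class BlockScanSketch {
    static final int MAGIC = 0x424C4B31; // "BLK1", hypothetical marker
    static final int HEADER_SIZE = 8;    // magic + payload length

    /** One rebuilt index entry: where a block starts and how long its payload is. */
    static final class IndexEntry {
        final int offset;
        final int payloadLength;

        IndexEntry(int offset, int payloadLength) {
            this.offset = offset;
            this.payloadLength = payloadLength;
        }
    }

    /** Appends one block (header followed by payload) to the buffer. */
    static void writeBlock(ByteBuffer out, byte[] payload) {
        out.putInt(MAGIC);
        out.putInt(payload.length);
        out.put(payload);
    }

    /**
     * Walks the blocks front to back using only their headers, so no
     * pre-existing index is consulted. Stops at the first position that
     * does not hold a complete, valid block.
     */
    static List<IndexEntry> rebuildIndex(ByteBuffer file) {
        List<IndexEntry> index = new ArrayList<>();
        int limit = file.limit();
        int pos = 0;
        while (pos + HEADER_SIZE <= limit) {
            int magic = file.getInt(pos);
            int len = file.getInt(pos + 4);
            // Reject bad magic, negative lengths, and payloads that run past the end.
            if (magic != MAGIC || len < 0 || len > limit - pos - HEADER_SIZE) {
                break;
            }
            index.add(new IndexEntry(pos, len));
            pos += HEADER_SIZE + len;
        }
        return index;
    }

    public static void main(String[] args) {
        ByteBuffer file = ByteBuffer.allocate(64);
        writeBlock(file, "row1".getBytes(StandardCharsets.UTF_8));
        writeBlock(file, "row22".getBytes(StandardCharsets.UTF_8));
        file.flip(); // limit now marks the end of the written data

        for (IndexEntry e : rebuildIndex(file)) {
            System.out.println("block at offset " + e.offset
                    + ", payload length " + e.payloadLength);
        }
    }
}
```

In this toy layout a corrupt or missing index costs nothing but a sequential pass over the file, which is the property the comment contrasts with format v1, where a corrupt block index leaves the data blocks unreadable.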