[
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029615#comment-13029615
]
stack commented on HBASE-3857:
------------------------------
Design looks excellent.
A few comments:
+ It looks like it will be self-migrating in that it can read version 1 HFiles.
That's great.
+ You say
"Block type, a sequence of bytes equivalent to version 1's 'magic records'". Is
this the case? The magic was supposed to be a sequence you could search to
pick up the parse again after hitting a bad patch of corrupted data. You seem
to instead start blocks with a type?
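
For reference, a minimal sketch of the resync idea behind the v1 magic: after
hitting corrupted bytes, scan forward for the next occurrence of the magic
sequence and resume parsing there. The MAGIC value and the function names are
placeholders for illustration, not taken from the design doc, and a real reader
would also have to cope with the magic bytes appearing by chance inside a
payload.

```python
from typing import Iterator

# Placeholder magic sequence, used only for this illustration.
MAGIC = b"DATABLK*"


def resync(data: bytes, bad_pos: int, magic: bytes = MAGIC) -> int:
    """Return the offset of the next magic at or after bad_pos, or -1 if none."""
    return data.find(magic, bad_pos)


def block_starts(data: bytes, magic: bytes = MAGIC) -> Iterator[int]:
    """Yield every offset at which the magic sequence occurs."""
    pos = data.find(magic)
    while pos != -1:
        yield pos
        # Continue searching after the magic just found.
        pos = data.find(magic, pos + len(magic))
```

If blocks start with a type rather than one fixed searchable sequence, a scan
like this would need to know every valid type value.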
+ How are blocks sized now? Are we still cutting blocks off at the first KV
boundary after we go past the configured hfile block size -- e.g. 64k -- or
is the block cutoff instead determined by the fill of the bloom filter
array or the index?
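
To be concrete about what I mean by cutting at the first KV boundary, here is a
minimal sketch. It assumes a block's size is just the sum of its key and value
lengths (no headers, no compression), and cut_blocks is a hypothetical name,
not anything from the codebase.

```python
from typing import Iterable, List, Tuple

KeyValue = Tuple[bytes, bytes]


def cut_blocks(kvs: Iterable[KeyValue], block_size: int) -> List[List[KeyValue]]:
    """Group key/values into blocks, closing a block at the first KV
    boundary after its accumulated size reaches or passes block_size."""
    blocks: List[List[KeyValue]] = []
    current: List[KeyValue] = []
    current_bytes = 0
    for key, value in kvs:
        # A KV is never split across blocks, so a block may overshoot.
        current.append((key, value))
        current_bytes += len(key) + len(value)
        if current_bytes >= block_size:
            blocks.append(current)
            current = []
            current_bytes = 0
    if current:
        # Flush the final, possibly undersized, block.
        blocks.append(current)
    return blocks
```

With this rule the configured size is a soft target: every block except
possibly the last is at least block_size bytes.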
+ I think I know what the following refers to in the diagram, "Version 2 root
index, stored in the data block index section of the file" -- it's kept in the
'load-on-open section', right?
+ Can we have an example of how root, intermediate and leaf indices interrelate?
What's in the root, intermediate, and leaf indices? Are intermediates
optional? At what boundary do they cut in? Are leaf indices optional too?
What are these? Indices into the data block?
+ For this description:
"• Offset (long)
   o This offset may point to a data block or to a deeper-level index block.
 • On-disk size (int)
 • Key (a serialized byte array)
   o Key (VInt)
   o Key bytes"
You are using a vint to specify key size. We didn't do that in v1? You have a
good implementation (it was costly, IIRC, using Hadoop's)?
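
A rough sketch of how I read the entry layout quoted above: offset as an
8-byte long, on-disk size as a 4-byte int, then a variable-length key size
followed by the key bytes. The big-endian byte order and the base-128 varint
used here are assumptions for illustration only; this is NOT the encoding
Hadoop's WritableUtils uses, and the function names are hypothetical.

```python
import struct
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int, 7 bits per byte, low-order group first."""
    if n < 0:
        raise ValueError("varint sketch handles non-negative values only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, position just past the varint)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_index_entry(offset: int, on_disk_size: int, key: bytes) -> bytes:
    """offset (long) + on-disk size (int) + key size (varint) + key bytes."""
    return struct.pack(">qi", offset, on_disk_size) + encode_varint(len(key)) + key


def decode_index_entry(buf: bytes, pos: int = 0) -> Tuple[int, int, bytes, int]:
    """Return (offset, on_disk_size, key, position just past the entry)."""
    offset, on_disk_size = struct.unpack_from(">qi", buf, pos)
    key_len, pos = decode_varint(buf, pos + 12)
    return offset, on_disk_size, bytes(buf[pos:pos + key_len]), pos + key_len
```

For keys shorter than 128 bytes the size costs one byte here instead of four,
which is the saving; the decode loop is where the per-entry cost shows up.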
+ Is a 'root index block' the same as a 'Root Data Index' (from the diagram)?
+ "• entryOffsets: the “secondary index” of offsets of entries in the block, to
facilitate a quick binary search on the key (numEntries int values)"
Is this worth the bother? A binary search of an in-memory data structure? How
many entries are you thinking there will be in these blocks?
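
For context, a minimal sketch of what I understand the entryOffsets array to
buy: entries are variable length, so without the offsets you cannot jump to the
i-th key, and with them you can binary search the serialized block directly.
The entry layout here (4-byte big-endian key length followed by key bytes) and
all function names are assumptions for illustration, not the layout in the
design doc.

```python
import struct
from typing import List, Tuple


def build_block(keys: List[bytes]) -> Tuple[bytes, List[int]]:
    """Serialize sorted keys back to back; return (payload, entry_offsets)."""
    payload = bytearray()
    entry_offsets: List[int] = []
    for key in keys:
        entry_offsets.append(len(payload))  # where this entry starts
        payload += struct.pack(">i", len(key)) + key
    return bytes(payload), entry_offsets


def key_at(payload: bytes, entry_offsets: List[int], i: int) -> bytes:
    """Read the i-th key using the secondary index of offsets."""
    start = entry_offsets[i]
    (key_len,) = struct.unpack_from(">i", payload, start)
    return payload[start + 4:start + 4 + key_len]


def find_floor_entry(payload: bytes, entry_offsets: List[int], target: bytes) -> int:
    """Index of the last entry whose key is <= target, or -1 if there is none."""
    lo, hi = 0, len(entry_offsets) - 1
    found = -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if key_at(payload, entry_offsets, mid) <= target:
            found = mid
            lo = mid + 1
        else:
            hi = mid - 1
    return found
```

Whether this beats a linear scan depends on the entry count per block, hence
the question.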
+1
> Change the HFile Format
> -----------------------
>
> Key: HBASE-3857
> URL: https://issues.apache.org/jira/browse/HBASE-3857
> Project: HBase
> Issue Type: New Feature
> Reporter: Liyin Tang
> Assignee: Mikhail Bautin
> Attachments: hfile_format_v2_design_draft_0.1.pdf
>
>
> In order to support HBASE-3763 and HBASE-3856, we need to change the format
> of the HFile. The new format proposal is attached here. Thanks to Mikhail
> Bautin for the documentation.