[ 
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029615#comment-13029615
 ] 

stack commented on HBASE-3857:
------------------------------

Design looks excellent.

A few comments:

+ It looks like it will be self-migrating in that it can read version 1 hfiles.  
That's great.
+ You say "Block type, a sequence of bytes equivalent to version 1's "magic 
records"".  Is this the case?  The magic was supposed to be a sequence you 
could search for to pick up the parse again after hitting a bad patch of 
corrupted data.  You seem to instead start blocks with a type?
+ How are blocks sized now?  Are we still cutting blocks off at the first KV 
boundary after we go past the configured hfile block size -- e.g. 64k -- or is 
the block cutoff instead determined by the fill of the bloom filter array or 
the index?
+ I think I know what the following refers to in the diagram, "Version 2 root 
index, stored in the data block index section of the file" -- it's kept in the 
'load-on-open section', right?
+ Can we have an example of how root, intermediate and leaf indices 
interrelate?  What's in the root, intermediate, and leaf indices?  Are 
intermediates optional?  At what boundary do they cut in?  Are leaf indices 
optional too?  What are these?  Indices into the data block?
+ For this description:
"• Offset (long)
   o This offset may point to a data block or to a deeper-level index block.
 • On-disk size (int)
 • Key (a serialized byte array)
   o Key (VInt)
   o Key bytes"

You are using a vint to specify the key size.  We didn't do that in v1?  You 
have a good implementation (it was costly, IIRC, using Hadoop's).
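
To make sure I am reading the entry layout right, here is a rough sketch of 
how one such index entry could be serialized.  The class and method names are 
made up for illustration, and the vint here is a plain 7-bits-per-byte 
encoding that I am assuming only for the example; it is not Hadoop's 
WritableUtils encoding and not necessarily what the design intends.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch of one index entry: offset (long), on-disk size (int),
// key length (vint), key bytes. Not taken from the design doc or from HBase.
final class IndexEntrySketch {
    // Assumed varint: 7 payload bits per byte, high bit set on all but the
    // last byte. This is NOT the Hadoop WritableUtils vint format.
    static void putVInt(ByteBuffer buf, int value) {
        while ((value & ~0x7F) != 0) {
            buf.put((byte) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        buf.put((byte) value);
    }

    static byte[] encodeEntry(long offset, int onDiskSize, byte[] key) {
        // 8 bytes offset + 4 bytes size + at most 5 bytes vint + key bytes.
        ByteBuffer buf = ByteBuffer.allocate(8 + 4 + 5 + key.length);
        buf.putLong(offset);       // where the referenced block starts
        buf.putInt(onDiskSize);    // on-disk size of the referenced block
        putVInt(buf, key.length);  // key length as a variable-length int
        buf.put(key);              // raw key bytes
        return Arrays.copyOf(buf.array(), buf.position());
    }
}
```

With this encoding a key shorter than 128 bytes costs one length byte instead 
of a fixed four, which is where the saving over a plain int would come from.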

+ Is a 'root index block' the same as a 'Root Data Index' (from the diagram)?
+ "• entryOffsets: the "secondary index" of offsets of entries in the block, to 
facilitate a quick binary search on the key (numEntries-int values)"

Is this worth the bother?  A binary search of an in-memory data structure?  
How many entries are you thinking there will be in these blocks?
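
For reference, this is roughly the lookup I understand entryOffsets to enable. 
It is only a sketch under my own assumptions: the names are hypothetical, the 
entries are treated as bare variable-length keys packed back to back in a byte 
array, and the last entry is assumed to run to the end of the array.

```java
// Hypothetical sketch: binary search over variable-length keys in a block,
// using an array of entry start offsets as the "secondary index".
final class SecondaryIndexSketch {
    // Unsigned lexicographic comparison of data[from, to) against key.
    static int compare(byte[] data, int from, int to, byte[] key) {
        int n = Math.min(to - from, key.length);
        for (int i = 0; i < n; i++) {
            int a = data[from + i] & 0xFF;
            int b = key[i] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        return (to - from) - key.length;
    }

    // Returns the index of the entry equal to key, or -(insertionPoint + 1)
    // if no entry matches (same convention as java.util.Arrays.binarySearch).
    static int search(byte[] data, int[] entryOffsets, byte[] key) {
        int lo = 0;
        int hi = entryOffsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            // Assumption: an entry ends where the next one starts, and the
            // last entry ends at the end of the block.
            int end = (mid + 1 < entryOffsets.length)
                    ? entryOffsets[mid + 1] : data.length;
            int c = compare(data, entryOffsets[mid], end, key);
            if (c < 0) {
                lo = mid + 1;
            } else if (c > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -(lo + 1);
    }
}
```

Without the offsets array, finding entry i means walking the variable-length 
entries from the start of the block, so the search degrades to a linear scan.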

 
+1



> Change the HFile Format
> -----------------------
>
>                 Key: HBASE-3857
>                 URL: https://issues.apache.org/jira/browse/HBASE-3857
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Liyin Tang
>            Assignee: Mikhail Bautin
>         Attachments: hfile_format_v2_design_draft_0.1.pdf
>
>
> In order to support HBASE-3763 and HBASE-3856, we need to change the format 
> of the HFile. The new format proposal is attached here. Thanks to Mikhail 
> Bautin for the documentation. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
