[
https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032656#comment-13032656
]
Mikhail Bautin commented on HBASE-3857:
---------------------------------------
I will try to answer the rest of the questions:
> + You say "Block type, a sequence of bytes equivalent to version 1's
> "magic records"". Is this the case? The magic was supposed to be a sequence
> you could search to pick up the parse again after hitting a bad patch of
> corrupted data. You seem to instead start blocks with a type?
In our design, the magic record is a serialized representation of the block
type. I did not see any logic in version 1 that searches for a magic record
after hitting a block of bad data, so I did not implement it in version 2. I
am not sure which specific data corruption cases this might help fix.
> + How are blocks sized now? Are we still cutting blocks off at the first KV
> boundary after we go past the configured hfile block size (e.g. 64k), or is
> the block cutoff instead determined by fill of the bloom filter array or the
> index?
The blocks are sized the same way as before. Block cutoff happens
independently for regular data blocks and for inline blocks (Bloom blocks and
leaf data index blocks). When a normal data block fills up, we give all
registered "inline block writers" a chance to insert their next block into the
stream. The Bloom filter writer can queue filled-up blocks until its next
chance to write them, and the block index writer's chunks can only fill up on
a data block boundary.
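
To illustrate the interleaving described above, here is a minimal sketch. It is not the actual HBase writer code: the InlineWriter interface, the QueueingBloomWriter class, and the string block labels are hypothetical names invented for this example, and block sizes are simulated with plain integers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class InlineBlockSketch {
    /** Hypothetical interface: a writer that may emit a block between data blocks. */
    interface InlineWriter {
        boolean hasBlockReady();
        String nextBlock();
    }

    /** Hypothetical Bloom writer that queues full chunks until it is given a chance to write. */
    static class QueueingBloomWriter implements InlineWriter {
        private final Queue<String> ready = new ArrayDeque<>();
        private final int keysPerChunk;
        private int keysInChunk = 0;
        private int chunkId = 0;

        QueueingBloomWriter(int keysPerChunk) {
            this.keysPerChunk = keysPerChunk;
        }

        /** A chunk can fill up at any key, independently of data block boundaries. */
        void addKey() {
            if (++keysInChunk == keysPerChunk) {
                ready.add("BLOOM" + chunkId++);
                keysInChunk = 0;
            }
        }

        public boolean hasBlockReady() {
            return !ready.isEmpty();
        }

        public String nextBlock() {
            return ready.remove();
        }
    }

    /** Returns the order in which blocks land in the simulated output stream. */
    static List<String> write(int[] kvSizes, int blockSize, QueueingBloomWriter bloom) {
        List<String> stream = new ArrayList<>();
        int bytesInBlock = 0;
        int dataId = 0;
        for (int size : kvSizes) {
            bytesInBlock += size;
            bloom.addKey();
            // Cut the data block at the first KV boundary at or past the configured size.
            if (bytesInBlock >= blockSize) {
                stream.add("DATA" + dataId++);
                bytesInBlock = 0;
                // Only now do inline writers get a chance to flush queued blocks.
                while (bloom.hasBlockReady()) {
                    stream.add(bloom.nextBlock());
                }
            }
        }
        if (bytesInBlock > 0) {
            stream.add("DATA" + dataId);
        }
        while (bloom.hasBlockReady()) {
            stream.add(bloom.nextBlock());
        }
        return stream;
    }
}
```

In this sketch a Bloom chunk that fills up in the middle of a data block simply waits in the queue, so inline blocks only ever appear between data blocks.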
> + I think I know what the following refers to in the diagram, "Version 2
> root index, stored in the data block index section of the file": it's kept
> in the 'load-on-open section', right?
This should have been "Version 2 root index, stored in the load-on-open
section of the file". Thanks for catching this. I will fix this in the spec.
> + • Offset (long)
> o For this description "This offset may point to a data block or to a
> deeper-level index block.
> • On-disk size (int)
> • Key (a serialized byte array)
> o Key (VInt)
> o Key bytes"
> You are using vint specifying key size. We didn't do that in v1? You have a
> good implementation (was costly IIRC using hadoops').
Actually, version 1 already uses VInt to store the block index, because it
uses Bytes.writeByteArray, which stores the length as a VInt. We decided to
keep the root-level block index format similar to the version 1 block index
format, since it gets de-serialized into a byte[][], a long[], and an int[]
anyway.
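
As a rough illustration of this entry layout (offset as a long, on-disk size as an int, then a length-prefixed key) and of reading it back into three parallel arrays, here is a self-contained sketch. Several things in it are assumptions: the variable-length integer is a generic 7-bits-per-byte encoding and is NOT Hadoop's VInt wire format, the entry count is passed in by the caller rather than read from the file, and the class and method names are made up for this example.

```java
import java.nio.ByteBuffer;

public class RootIndexSketch {
    /** Parallel arrays, mirroring the byte[][] / long[] / int[] mentioned above. */
    static final class RootIndex {
        final byte[][] keys;
        final long[] offsets;
        final int[] onDiskSizes;

        RootIndex(byte[][] keys, long[] offsets, int[] onDiskSizes) {
            this.keys = keys;
            this.offsets = offsets;
            this.onDiskSizes = onDiskSizes;
        }
    }

    /** Generic varint for non-negative ints (an assumption, not Hadoop's VInt encoding). */
    static void putVarInt(ByteBuffer buf, int v) {
        while ((v & ~0x7F) != 0) {
            buf.put((byte) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        buf.put((byte) v);
    }

    static int getVarInt(ByteBuffer buf) {
        int result = 0;
        int shift = 0;
        int b;
        do {
            b = buf.get() & 0xFF;
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    /** Writes each entry as: offset (long), on-disk size (int), key length (varint), key bytes. */
    static byte[] serialize(long[] offsets, int[] onDiskSizes, byte[][] keys) {
        int capacity = 0;
        for (byte[] k : keys) {
            capacity += 8 + 4 + 5 + k.length; // 5 = maximum varint length for an int
        }
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        for (int i = 0; i < keys.length; i++) {
            buf.putLong(offsets[i]);
            buf.putInt(onDiskSizes[i]);
            putVarInt(buf, keys[i].length);
            buf.put(keys[i]);
        }
        byte[] out = new byte[buf.position()];
        buf.flip();
        buf.get(out);
        return out;
    }

    /** Reads numEntries entries back into parallel arrays. */
    static RootIndex deserialize(byte[] data, int numEntries) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        byte[][] keys = new byte[numEntries][];
        long[] offsets = new long[numEntries];
        int[] sizes = new int[numEntries];
        for (int i = 0; i < numEntries; i++) {
            offsets[i] = buf.getLong();
            sizes[i] = buf.getInt();
            keys[i] = new byte[getVarInt(buf)];
            buf.get(keys[i]);
        }
        return new RootIndex(keys, offsets, sizes);
    }
}
```

Since the whole structure is expanded into arrays on load, lookups at the root level can search the in-memory arrays, and the serialized form never needs to be searched directly.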
> + Is a 'root index block' the same as a 'Root Data Index' (from the diagram)?
The Root Data Index is one particular instance of a root index block. We use
the same "root index block" format for the data index root level, the meta
index (which is always single-level), and the Bloom index (also single-level).
For intermediate and leaf-level blocks we use another "non-root index block"
format that allows binary search directly over the serialized data structure.
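
The following sketch shows one way a serialized index block can be binary-searched in place, without first de-serializing it into arrays. The layout used here is an assumption made for illustration (an entry count, then numEntries + 1 relative entry offsets, then entries of offset, on-disk size, and key bytes), as are the class and method names and the use of unsigned lexicographic byte comparison for keys.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class NonRootIndexSketch {
    /**
     * Assumed layout: numEntries (int), then numEntries + 1 relative offsets (int each,
     * the last one marking the end of the entries), then the entries themselves, each
     * being blockOffset (long), onDiskSize (int), key bytes.
     */
    static byte[] build(byte[][] sortedKeys, long[] blockOffsets, int[] onDiskSizes) {
        int n = sortedKeys.length;
        int entriesLen = 0;
        for (byte[] k : sortedKeys) {
            entriesLen += 8 + 4 + k.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 * (n + 1) + entriesLen);
        buf.putInt(n);
        int rel = 0;
        for (byte[] k : sortedKeys) {
            buf.putInt(rel);
            rel += 8 + 4 + k.length;
        }
        buf.putInt(rel); // end marker, so the last key's length can be derived
        for (int i = 0; i < n; i++) {
            buf.putLong(blockOffsets[i]);
            buf.putInt(onDiskSizes[i]);
            buf.put(sortedKeys[i]);
        }
        return buf.array();
    }

    /** Returns the index of the last entry whose key is <= searchKey, or -1 if none. */
    static int locate(byte[] block, byte[] searchKey) {
        ByteBuffer buf = ByteBuffer.wrap(block);
        int n = buf.getInt(0);
        int entriesStart = 4 + 4 * (n + 1);
        int lo = 0;
        int hi = n - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            // The secondary index gives the byte range of entry 'mid' directly.
            int start = entriesStart + buf.getInt(4 + 4 * mid);
            int end = entriesStart + buf.getInt(4 + 4 * (mid + 1));
            int cmp = Arrays.compareUnsigned(
                block, start + 12, end, searchKey, 0, searchKey.length);
            if (cmp <= 0) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    /** Reads the block offset stored in the given entry. */
    static long blockOffsetAt(byte[] block, int entry) {
        ByteBuffer buf = ByteBuffer.wrap(block);
        int n = buf.getInt(0);
        int entriesStart = 4 + 4 * (n + 1);
        return buf.getLong(entriesStart + buf.getInt(4 + 4 * entry));
    }
}
```

Because keys are variable-length, the table of entry offsets is what makes random access to the i-th entry possible; without it, a reader would have to scan the entries sequentially.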
> + "• entryOffsets: the “secondary index” of offsets of entries in the
> block, to facilitate a quick binary search on the key (numEntries int
> values)"
> Is this worth the bother? A binary search of in-memory data structure? How
> many entries are you thinking there will be in these blocks?
After discussing this with Nicolas, we decided not to change the data block
format, because in our case there are somewhere between 10 and 500 key/value
pairs per data block, so binary search does not offer much benefit compared to
the current linear search, and the read time is dominated by input/output
anyway.
Hope this helps. Please let me know if you have any further questions or
concerns about the HFile format v2.
Thanks!
--Mikhail
> Change the HFile Format
> -----------------------
>
> Key: HBASE-3857
> URL: https://issues.apache.org/jira/browse/HBASE-3857
> Project: HBase
> Issue Type: New Feature
> Reporter: Liyin Tang
> Assignee: Mikhail Bautin
> Attachments: hfile_format_v2_design_draft_0.1.pdf
>
>
> In order to support HBASE-3763 and HBASE-3856, we need to change the format
> of the HFile. The new format proposal is attached here. Thanks to Mikhail
> Bautin for the documentation.