[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909063#comment-14909063
 ] 

Ben Lau commented on HBASE-14283:
---------------------------------

As Matteo mentions, we discussed this issue briefly with him and Stack about a 
week ago or so.  [~mbertozzi] I thought we had asked if updating the major 
version would be fine and the answer was in the affirmative but I might've 
misunderstood or misunderstood the degree of affirmation.  (IIRC the reason was 
something along the lines of cluster operators updating 
HFile.FORMAT_VERSION_KEY in their config being a desirable property of the 2nd 
rolling upgrade or something.)

In any case though the reason we initially were going to do a 3.x but moved to 
a 4.0 was because (1) major versions (but not in HFiles?) generally denote 
level of backward compatibility and the new HFiles produced in this patch 
cannot be read by an HFileV3.X reader (2) the patch requires enough changes to 
assumptions in the serialization code (eg regarding header size or block cache) 
that it doesn't seem appropriate as a minor version change and (3) if we are 
following the rules we've set for ourselves, the changes in HConstants alone 
(annotated as Stable) mean this patch should be going into HBase 2.0 
(admittedly rules can always be bent).

Updating only the minor version because the format change currently fixes only 
1 bug (as opposed to 10 bugs or adding a new feature) seems to be the wrong way 
to think of versioning, IMO.  If our concern is that we would like a fix for 
this bug in 1.3 and not wait until 2.0, we could also commit a shorter term fix 
for 1.3 that just always reads the block header or does an optimistic read and 
falls back on reading the block header if the read fails from block size 
expectations (configurable, optimistic off by default).  In combination with an 
expected size correction for index blocks (perhaps not for bloom filter blocks 
since that fix is a messy addition and also violates some API abstraction 
layers in StoreFile) it might be fine in most scenarios, especially if the 
cluster operator is allowed to change the inline block chunking config for the 
cluster that needs to do reverse scans.  Within Yahoo internally we will 
probably go with a bandaid fix like this for now so that users can use reverse 
scans and still get 'ok' even if not max performance.  (Also sub-100% 
performance is better than getting exceptions about block sizes ;) )

If people would be okay with dividing it up this way-- short term fix (no HFile 
changes) for 1.3 and a longer term fix (HFileV4) for HBase 2.0 I could create a 
separate ticket with the newest patch as a starting point for HFileV4 and 
submit another patch for this ticket that implements a configurable 'choose 
your tradeoff' fix as described in the previous paragraph.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --------------------------------------------------------------
>
>                 Key: HBASE-14283
>                 URL: https://issues.apache.org/jira/browse/HBASE-14283
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ben Lau
>            Assignee: Ben Lau
>         Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, 
> hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.    
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to