[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975523#comment-14975523 ]
Andrew Purtell commented on HBASE-14283:
----------------------------------------

I'll back mine out because the build is broken again (two CellUtil#matchingTimestamps)

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --------------------------------------------------------------
>
>                 Key: HBASE-14283
>                 URL: https://issues.apache.org/jira/browse/HBASE-14283
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ben Lau
>            Assignee: Ben Lau
>             Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
>         Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch,
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch,
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch,
> HBASE-14283-reupload-master.patch, HBASE-14283-v2.patch, HBASE-14283.patch,
> hbase-14283_add.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf
> level index blocks. The reason is because the seekBefore() call calculates
> the previous data block’s size by assuming data blocks are contiguous which
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile
> version change, but is only performant for 1 and 2-level indexes and not 3+.
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug. But the fix should be
> similar to the case of inline index blocks. The reason I haven’t made the
> change yet is I want to confirm that you guys would be fine with me revising
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and
> getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the
> HFileReader class doesn’t have a reference to the bloom filters (and hence
> their indices) and only constructs the IO streams and hence has no way to
> know where the bloom blocks are in the HFile. It seems that the HFile.Reader
> bloom method comments state that they “know nothing about how that metadata
> is structured” but I do not know if that is a requirement of the abstraction
> (why?) or just an incidental current property.
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’
> field in the block header in the next HFile version, so that seekBefore()
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
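
For readers following the issue description, here is a rough Java sketch of why the contiguity assumption breaks. It is a toy model, not HBase code: the SeekBeforeSketch class, its Block type, the method names, and the block sizes are all made up for illustration. It only shows that subtracting offsets overstates the previous data block's size once an inline leaf-index (or bloom) block sits between two data blocks, whereas a size recorded per block does not.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only. Nothing here is HBase's HFile reader API; every name is hypothetical.
final class SeekBeforeSketch {
    enum BlockType { DATA, LEAF_INDEX, BLOOM_CHUNK }

    // One block in the file: where it starts and how many bytes it occupies.
    static final class Block {
        final BlockType type;
        final long offset;
        final int onDiskSize;

        Block(BlockType type, long offset, int onDiskSize) {
            this.type = type;
            this.offset = offset;
            this.onDiskSize = onDiskSize;
        }
    }

    // Lays blocks out back to back, in the given order, starting at offset 0.
    static List<Block> layout(BlockType[] types, int[] sizes) {
        List<Block> blocks = new ArrayList<>();
        long offset = 0;
        for (int i = 0; i < types.length; i++) {
            blocks.add(new Block(types[i], offset, sizes[i]));
            offset += sizes[i];
        }
        return blocks;
    }

    // Contiguity assumption: the previous data block ends where the current one starts.
    static long naivePrevSize(Block prevData, Block currentData) {
        return currentData.offset - prevData.offset;
    }

    // Alternative: use a size stored for the block itself (for example in an index entry).
    static long recordedPrevSize(Block prevData) {
        return prevData.onDiskSize;
    }

    // Assumed layout: a 50-byte inline leaf-index block between two 100-byte data blocks.
    static List<Block> sampleFile() {
        return layout(
            new BlockType[] { BlockType.DATA, BlockType.LEAF_INDEX, BlockType.DATA },
            new int[] { 100, 50, 100 });
    }

    public static void main(String[] args) {
        List<Block> file = sampleFile();
        Block prev = file.get(0);
        Block current = file.get(2);
        System.out.println("naive size:    " + naivePrevSize(prev, current));
        System.out.println("recorded size: " + recordedPrevSize(prev));
    }
}
```

In this made-up layout the offset subtraction also counts the 50 bytes of the inline index block, so a read sized that way would run past the end of the previous data block.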