[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277182#comment-13277182 ]
Phabricator commented on HBASE-5104:
------------------------------------

stack has commented on the revision "[jira] [HBASE-5104] Provide a reliable intra-row pagination mechanism".

lgtm

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/client/Get.java:212 Will this be accurate if rows are inserted (or deleted) in the meantime?
  src/main/java/org/apache/hadoop/hbase/client/Get.java:201 This is great. One day we should do size-based too.
  src/main/java/org/apache/hadoop/hbase/client/Get.java:472 Why not just write out our version as 3? To save some bytes on the wire?
  src/main/java/org/apache/hadoop/hbase/client/Scan.java:102 Don't Scan and Get share a common ancestor?
  src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java:647 Thanks for doing this.
  src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java:452 You need to add the below to each classified test for classification to work:

    @org.junit.Rule
    public org.apache.hadoop.hbase.ResourceCheckerJUnitRule cu =
      new org.apache.hadoop.hbase.ResourceCheckerJUnitRule();

REVISION DETAIL
  https://reviews.facebook.net/D2799

To: madhuvaidya, lhofhansl, Kannan, tedyu, stack, todd, JIRA, jxcn01, mbautin
Cc: jxcn01, Liyin

> Provide a reliable intra-row pagination mechanism
> -------------------------------------------------
>
>                 Key: HBASE-5104
>                 URL: https://issues.apache.org/jira/browse/HBASE-5104
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Madhuwanti Vaidya
>         Attachments: D2799.1.patch, D2799.2.patch, D2799.3.patch,
> D2799.4.patch, D2799.5.patch, D2799.6.patch,
> jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-04-16_12_39_42.patch,
> testFilterList.rb
>
>
> Addendum:
> Doing pagination (retrieving at most "limit" number of KVs at a particular
> "offset") is currently supported via the ColumnPaginationFilter. However, it
> is not a very clean way of supporting pagination.
> Some of the problems with it are:
> * Normally, one would expect a query with (Filter(A) AND Filter(B)) to have
> same results as (query with Filter(A)) INTERSECT (query with Filter(B)). This
> is not the case for ColumnPaginationFilter as its internal state gets updated
> depending on whether or not Filter(A) returns TRUE/FALSE for a particular
> cell.
> * When this Filter is used in combination with other filters (e.g., doing AND
> with another filter using FilterList), the behavior of the query depends on
> the order of filters in the FilterList. This is not ideal.
> * ColumnPaginationFilter is a stateful filter which ends up counting multiple
> versions of the cell as separate values even if another filter upstream or
> the ScanQueryMatcher is going to reject the value for other reasons.
> Seems like we need a reliable way to do pagination. The particular use case
> that prompted this JIRA is pagination within the same rowKey. For example,
> for a given row key R, get columns with prefix P, starting at offset X (among
> columns which have prefix P) and limit Y. Some possible fixes might be:
> 1) enhance ColumnPrefixFilter to support another constructor which supports
> limit/offset.
> 2) Support pagination (limit/offset) at the Scan/Get API level (rather than
> as a filter) [Like SQL].
>
> Original Post:
> Thanks Jiakai Liu for reporting this issue and doing the initial
> investigation. Email from Jiakai below:
> Assuming that we have an index column family with the following entries:
> "tag0:001:thread1"
> ...
> "tag1:001:thread1"
> "tag1:002:thread2"
> ...
> "tag1:010:thread10"
> ...
> "tag2:001:thread1"
> "tag2:005:thread5"
> ...
> To get threads with "tag1" in range [5, 10), I tried the following code:
>     ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes("tag1"));
>     ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */);
>     FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
>     filters.addFilter(filter1);
>     filters.addFilter(filter2);
>     Get get = new Get(USER);
>     get.addFamily(COLUMN_FAMILY);
>     get.setMaxVersions(1);
>     get.setFilter(filters);
> Somehow it didn't work as expected. It returned the entries as if filter1
> were not set.
> Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases.
> The FilterList filter does not handle this return code properly (it treats it
> as INCLUDE).
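
The order dependence described in the addendum can be illustrated with a toy model in plain Java. This is only a sketch under stated assumptions: it uses no HBase classes, the class name IntraRowPaginationSketch and its helpers (prefixThenPage, pageThenPrefix, sampleColumns) are hypothetical, and plain strings stand in for column qualifiers. It does not reproduce the SEEK_NEXT_USING_HINT handling in FilterList; it only shows why counting cells toward offset/limit before the prefix check gives a different page than counting only the cells that match the prefix.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: strings stand in for column qualifiers, no HBase classes are used.
class IntraRowPaginationSketch {

    // Hypothetical helper: apply offset/limit among the columns that match the
    // prefix, which is the behavior the issue asks for.
    static List<String> prefixThenPage(List<String> sortedColumns, String prefix,
                                       int offset, int limit) {
        List<String> page = new ArrayList<>();
        int matched = 0; // counts only columns that carry the prefix
        for (String column : sortedColumns) {
            if (!column.startsWith(prefix)) {
                continue;
            }
            if (matched >= offset && page.size() < limit) {
                page.add(column);
            }
            matched++;
        }
        return page;
    }

    // Hypothetical helper: count every column toward offset/limit first and only
    // then check the prefix, modeling a stateful pagination step that runs ahead
    // of the prefix check.
    static List<String> pageThenPrefix(List<String> sortedColumns, String prefix,
                                       int offset, int limit) {
        List<String> page = new ArrayList<>();
        int seen = 0; // counts all columns, whether or not they match
        for (String column : sortedColumns) {
            boolean inWindow = seen >= offset && seen < offset + limit;
            seen++;
            if (inWindow && column.startsWith(prefix)) {
                page.add(column);
            }
        }
        return page;
    }

    // Assumed sample data: tag0, tag1 and tag2, each with entries 001 through 010,
    // generated in sorted order.
    static List<String> sampleColumns() {
        List<String> columns = new ArrayList<>();
        for (int tag = 0; tag < 3; tag++) {
            for (int i = 1; i <= 10; i++) {
                columns.add(String.format("tag%d:%03d:thread%d", tag, i, i));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<String> columns = sampleColumns();
        System.out.println("prefix, then page: " + prefixThenPage(columns, "tag1", 5, 5));
        System.out.println("page, then prefix: " + pageThenPrefix(columns, "tag1", 5, 5));
    }
}
```

With the assumed sample data, prefixThenPage returns the five "tag1" columns from 006 through 010, while pageThenPrefix returns nothing, because its window of five cells falls entirely inside the "tag0" columns. Doing limit/offset at the Scan/Get level, as in possible fix 2), corresponds to the first helper.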