[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255008#comment-13255008 ]
Hadoop QA commented on HBASE-5104:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12522843/jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-04-16_12_39_42.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 11 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1541//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1541//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1541//console

This message is automatically generated.

> Provide a reliable intra-row pagination mechanism
> -------------------------------------------------
>
>                 Key: HBASE-5104
>                 URL: https://issues.apache.org/jira/browse/HBASE-5104
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Madhuwanti Vaidya
>         Attachments: D2799.1.patch, D2799.2.patch, D2799.3.patch,
> jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-04-16_12_39_42.patch,
> testFilterList.rb
>
>
> Addendum:
> Doing pagination (retrieving at most "limit" number of KVs at a particular
> "offset") is currently supported via the ColumnPaginationFilter. However, it
> is not a very clean way of supporting pagination.
> Some of the problems with it are:
> * Normally, one would expect a query with (Filter(A) AND Filter(B)) to have
> the same results as (query with Filter(A)) INTERSECT (query with Filter(B)).
> This is not the case for ColumnPaginationFilter, as its internal state gets
> updated depending on whether Filter(A) returns TRUE or FALSE for a
> particular cell.
> * When this Filter is used in combination with other filters (e.g., doing AND
> with another filter using FilterList), the behavior of the query depends on
> the order of filters in the FilterList. This is not ideal.
> * ColumnPaginationFilter is a stateful filter which ends up counting multiple
> versions of the cell as separate values even if another filter upstream or
> the ScanQueryMatcher is going to reject the value for other reasons.
> Seems like we need a reliable way to do pagination. The particular use case
> that prompted this JIRA is pagination within the same rowKey. For example,
> for a given row key R, get columns with prefix P, starting at offset X (among
> columns which have prefix P) and limit Y. Some possible fixes might be:
> 1) Enhance ColumnPrefixFilter to support another constructor which supports
> limit/offset.
> 2) Support pagination (limit/offset) at the Scan/Get API level (rather than
> as a filter) [Like SQL].
> Original Post:
> Thanks Jiakai Liu for reporting this issue and doing the initial
> investigation. Email from Jiakai below:
> Assuming that we have an index column family with the following entries:
> "tag0:001:thread1"
> ...
> "tag1:001:thread1"
> "tag1:002:thread2"
> ...
> "tag1:010:thread10"
> ...
> "tag2:001:thread1"
> "tag2:005:thread5"
> ...
> To get threads with "tag1" in range [5, 10), I tried the following code:
>     ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes("tag1"));
>     ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */);
>     FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
>     filters.addFilter(filter1);
>     filters.addFilter(filter2);
>     Get get = new Get(USER);
>     get.addFamily(COLUMN_FAMILY);
>     get.setMaxVersions(1);
>     get.setFilter(filters);
> Somehow it didn't work as expected. It returned the entries as if filter1
> were not set.
> Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases.
> The FilterList filter does not handle this return code properly (it treats
> it as INCLUDE).
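
The order dependence described in the issue can be reproduced with a small, self-contained model. The sketch below does not use the HBase API. CountingPager, scan and sampleRow are hypothetical names introduced for illustration, the row is filled with ten made-up columns per tag, and the short-circuiting AND is an assumption about how the filters are combined, not a copy of FilterList's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class IntraRowPaginationSketch {

    /** Hypothetical stateful pager: counts every column it is shown. */
    public static final class CountingPager implements Predicate<String> {
        private final int limit;
        private final int offset;
        private int seen = 0;

        public CountingPager(int limit, int offset) {
            this.limit = limit;
            this.offset = offset;
        }

        @Override
        public boolean test(String column) {
            int index = seen++; // state changes on every call, pass or fail
            return index >= offset && index < offset + limit;
        }
    }

    /** Assumed model of an AND list: filters run in order, first rejection wins. */
    public static List<String> scan(List<String> columns, List<Predicate<String>> filters) {
        List<String> kept = new ArrayList<>();
        for (String column : columns) {
            boolean keep = true;
            for (Predicate<String> filter : filters) {
                if (!filter.test(column)) {
                    keep = false;
                    break; // later filters never see this column
                }
            }
            if (keep) {
                kept.add(column);
            }
        }
        return kept;
    }

    /** Made-up row: tag0..tag2, ten sorted columns each. */
    public static List<String> sampleRow() {
        List<String> columns = new ArrayList<>();
        for (int tag = 0; tag <= 2; tag++) {
            for (int i = 1; i <= 10; i++) {
                columns.add(String.format("tag%d:%03d:thread%d", tag, i, i));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<String> row = sampleRow();
        Predicate<String> prefix = c -> c.startsWith("tag1");

        // Same two filters, same limit/offset, different order in the list.
        List<String> prefixFirst = scan(row, List.of(prefix, new CountingPager(5, 5)));
        List<String> pagerFirst = scan(row, List.of(new CountingPager(5, 5), prefix));

        System.out.println("prefix first: " + prefixFirst);
        System.out.println("pager first:  " + pagerFirst);
    }
}
```

In this model, putting the prefix check first yields tag1:006 through tag1:010, because the pager only counts columns that already matched the prefix. Putting the pager first yields nothing: it spends its window on tag0:006 through tag0:010, which the prefix check then rejects. Applying limit/offset only to columns that survive every other check, as in the two proposed fixes, would remove that dependence on order.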