[ https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723166#comment-13723166 ]

Ted Yu commented on HBASE-9079:
-------------------------------

For TestFuzzyAndColumnRangeFilter, please add the license header.

Can you provide a trunk patch so that Hadoop QA can run against it?
{code}
+        FilterList filterList = new FilterList(Lists.<Filter>newArrayList(fuzzyRowFilter, columnRangeFilter));
{code}
Can you also run the test with the two filters above in the reverse order, so that we know
correctness doesn't depend on the ordering of the Filters? That is, both orders should be tested.
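One way to cover both orders without copying the test body is to loop over every ordering of the filter list. Below is a minimal, HBase-free sketch; orderings is a hypothetical helper (not an HBase or Guava API), and strings stand in for the filter instances. In the real test each ordering would be wrapped in a FilterList as in the snippet above and checked with the same assertions.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical test helper (not an HBase API): every ordering of the given filters.
class FilterOrderings {
  static <T> List<List<T>> orderings(List<T> items) {
    List<List<T>> result = new ArrayList<>();
    if (items.isEmpty()) {
      result.add(new ArrayList<T>()); // exactly one ordering of nothing
      return result;
    }
    for (int i = 0; i < items.size(); i++) {
      List<T> rest = new ArrayList<>(items);
      T head = rest.remove(i); // pick each item in turn as the first filter
      for (List<T> tail : orderings(rest)) {
        tail.add(0, head);
        result.add(tail);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Strings stand in for the filter instances; the real test would build a FilterList
    // from each ordering and run the same scan assertions against it.
    for (List<String> order : orderings(Arrays.asList("fuzzyRowFilter", "columnRangeFilter"))) {
      System.out.println(order);
    }
  }
}
{code}

For two filters this prints [fuzzyRowFilter, columnRangeFilter] and then [columnRangeFilter, fuzzyRowFilter]. With n filters it yields n! orderings, so it is only practical for small lists.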

Indentation is off: it should be two spaces per level of indentation.
{code}
+                LOG.info("Got rk: " + Bytes.toStringBinary(kv.getRow()) + " cq: " + Bytes.toStringBinary(kv.getQualifier()));
{code}
Lines should be at most 100 characters long.

In getNextKeyHint():
{code}
     for (Filter filter : filters) {
+      if (seekHintFilter != null && seekHintFilter != filter) {
+        //get hint from the filter that was responsible for the
+        //SEEK_NEXT_USING_HINT code
+        continue;
{code}
Does the above if block mean that only one Filter's seek hint would be respected?
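A small HBase-free model may help frame the question. All names in it are hypothetical: rowSet stands in for FuzzyRowFilter, minColumn for a ColumnRangeFilter with only a lower bound, and SavedHintList for the patched MUST_PASS_ALL FilterList, which remembers the first filter that does not return INCLUDE and asks only that filter for a hint. The quoted block appears to fall back to consulting every filter when seekHintFilter is null; the model simply returns no hint in that case.

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// HBase-free model of the patched FilterList; every name here is hypothetical.
class SeekHintModel {
  // Only the two return codes this example needs.
  enum Code { INCLUDE, SEEK_NEXT_USING_HINT }

  // A key is a (row, column) pair, ordered by row first, then by column.
  static final class Cell implements Comparable<Cell> {
    final String row;
    final String col;

    Cell(String row, String col) { this.row = row; this.col = col; }

    @Override public int compareTo(Cell o) {
      int c = row.compareTo(o.row);
      return c != 0 ? c : col.compareTo(o.col);
    }

    @Override public String toString() { return row + "/" + col; }
  }

  interface HintFilter {
    Code filterKeyValue(Cell c);
    Cell getNextKeyHint(Cell c);
  }

  // Stand-in for FuzzyRowFilter: accepts a fixed set of rows, hints the next accepted row.
  static HintFilter rowSet(String... rows) {
    final TreeSet<String> accepted = new TreeSet<>(Arrays.asList(rows));
    return new HintFilter() {
      @Override public Code filterKeyValue(Cell c) {
        return accepted.contains(c.row) ? Code.INCLUDE : Code.SEEK_NEXT_USING_HINT;
      }
      @Override public Cell getNextKeyHint(Cell c) {
        String next = accepted.higher(c.row);
        return next == null ? null : new Cell(next, ""); // "" = first key on that row
      }
    };
  }

  // Stand-in for ColumnRangeFilter with only a lower bound.
  static HintFilter minColumn(final String min) {
    return new HintFilter() {
      @Override public Code filterKeyValue(Cell c) {
        return c.col.compareTo(min) < 0 ? Code.SEEK_NEXT_USING_HINT : Code.INCLUDE;
      }
      @Override public Cell getNextKeyHint(Cell c) { return new Cell(c.row, min); }
    };
  }

  // MUST_PASS_ALL list that remembers which filter asked for the seek.
  static final class SavedHintList {
    private final List<HintFilter> filters;
    private HintFilter seekHintFilter;

    SavedHintList(List<HintFilter> filters) { this.filters = filters; }

    Code filterKeyValue(Cell c) {
      for (HintFilter f : filters) {
        Code code = f.filterKeyValue(c);
        if (code != Code.INCLUDE) {
          seekHintFilter = f; // the first filter that does not INCLUDE decides
          return code;
        }
      }
      return Code.INCLUDE;
    }

    Cell getNextKeyHint(Cell c) {
      HintFilter f = seekHintFilter;
      seekHintFilter = null; // reset the saved state after use
      return f == null ? null : f.getNextKeyHint(c);
    }
  }

  // Returns the hint the list would give for cell c, or "none" if it does not seek.
  static String hintFor(List<HintFilter> order, Cell c) {
    SavedHintList list = new SavedHintList(order);
    if (list.filterKeyValue(c) != Code.SEEK_NEXT_USING_HINT) {
      return "none";
    }
    Cell hint = list.getNextKeyHint(c);
    return hint == null ? "none" : hint.toString();
  }

  public static void main(String[] args) {
    HintFilter fuzzy = rowSet("r1", "r3");
    HintFilter columns = minColumn("b");
    Cell c = new Cell("r2", "a"); // rejected by both filters
    System.out.println(hintFor(Arrays.asList(fuzzy, columns), c)); // r3/
    System.out.println(hintFor(Arrays.asList(columns, fuzzy), c)); // r2/b
  }
}
{code}

With rows r1 and r3 accepted and a minimum column of b, the cell r2/a (rejected by both filters) gets the hint r3/ (the first key of row r3) when the row filter comes first, but r2/b when the column filter comes first. So in this model only one hint is respected per call, and which one depends on filter order; both hints are safe here, which is what testing both orders should confirm.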
                
> FilterList getNextKeyHint skips rows that should be included in the results
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-9079
>                 URL: https://issues.apache.org/jira/browse/HBASE-9079
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 0.94.10
>            Reporter: Viral Bajaria
>         Attachments: TestFail.patch, TestSuccess.patch
>
>
> I hit a weird issue, possibly a bug, and can reproduce the error consistently. The 
> problem arises when a FilterList has two filters that each implement the 
> getNextKeyHint method.
> In the current implementation, StoreScanner calls matcher.getNextKeyHint() 
> whenever it gets SEEK_NEXT_USING_HINT. This in turn calls 
> filter.getNextKeyHint(), where the filter at this stage is a FilterList. 
> FilterList's implementation iterates through all the filters and keeps the 
> max KeyValue that it sees. All is fine if only one of the filters wrapped in 
> the FilterList implements getNextKeyHint, but if several of them do, things 
> get weird.
> For example:
> - Create two filters, a FuzzyRowFilter and a ColumnRangeFilter; both 
> implement getNextKeyHint.
> - Wrap them in a FilterList with MUST_PASS_ALL.
> - FuzzyRowFilter seeks to the correct first row and then passes it to 
> ColumnRangeFilter, which returns the SEEK_NEXT_USING_HINT code.
> - Now, when getNextKeyHint is called on the FilterList, it calls 
> FuzzyRowFilter's getNextKeyHint first, which returns what the next row should 
> be. Since FilterList keeps the max hint, that next-row hint wins, whereas we 
> actually want ColumnRangeFilter to give the seek hint.
> - The above behavior skips data that should be returned, which I have 
> verified by using a RowFilter with RegexStringComparator.
> I updated FilterList to record which filter returns SEEK_NEXT_USING_HINT; 
> getNextKeyHint then invokes the method on that saved filter and resets the 
> state. I tested it with my current queries and it works, but I still need to 
> run the entire test suite to make sure I have not introduced any regression. 
> I also need to figure out what the behavior should be when the operator is 
> MUST_PASS_ONE, though I doubt it should be any different.
> Is my understanding that this is a bug correct? Or am I oversimplifying it and 
> ignoring something very important? If the explanation is hard to follow, I 
> can open a JIRA and upload a patch against 0.94 head.
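The scenario in the description can be reproduced in a simplified, HBase-free model. This is a sketch under stated assumptions, not HBase code: all names are hypothetical, keys are (row, column) pairs sorted row-first, rowSet plays FuzzyRowFilter (its hint is always the next accepted row, even when the current row matches), and minColumn plays a ColumnRangeFilter with only a lower bound. scan() runs a MUST_PASS_ALL list over a 3x3 grid, taking either the max hint across all filters, as the description says FilterList does, or only the hint of the filter that asked for the seek, as proposed.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// HBase-free model of the reported scan; every name here is hypothetical.
class HintScanModel {
  // A key is a (row, column) pair, ordered by row first, then by column.
  static final class Cell implements Comparable<Cell> {
    final String row;
    final String col;

    Cell(String row, String col) { this.row = row; this.col = col; }

    @Override public int compareTo(Cell o) {
      int c = row.compareTo(o.row);
      return c != 0 ? c : col.compareTo(o.col);
    }

    @Override public String toString() { return row + "/" + col; }
  }

  interface HintFilter {
    boolean wantsSeek(Cell c);   // true models SEEK_NEXT_USING_HINT, false models INCLUDE
    Cell getNextKeyHint(Cell c); // may be null when there is nothing further to seek to
  }

  // Stand-in for FuzzyRowFilter: its hint is the next accepted row, even if c's row matches.
  static HintFilter rowSet(String... rows) {
    final TreeSet<String> accepted = new TreeSet<>(Arrays.asList(rows));
    return new HintFilter() {
      @Override public boolean wantsSeek(Cell c) { return !accepted.contains(c.row); }
      @Override public Cell getNextKeyHint(Cell c) {
        String next = accepted.higher(c.row);
        return next == null ? null : new Cell(next, ""); // "" = first key on that row
      }
    };
  }

  // Stand-in for ColumnRangeFilter with only a lower bound.
  static HintFilter minColumn(final String min) {
    return new HintFilter() {
      @Override public boolean wantsSeek(Cell c) { return c.col.compareTo(min) < 0; }
      @Override public Cell getNextKeyHint(Cell c) { return new Cell(c.row, min); }
    };
  }

  // Scans data through a MUST_PASS_ALL list. With useSavedFilter == false the hint is the max
  // over all filters (current behavior); otherwise only the seeking filter's hint is used.
  static List<String> scan(NavigableSet<Cell> data, List<HintFilter> filters,
      boolean useSavedFilter) {
    List<String> out = new ArrayList<>();
    Cell cur = data.isEmpty() ? null : data.first();
    while (cur != null) {
      HintFilter seeker = null;
      for (HintFilter f : filters) {
        if (f.wantsSeek(cur)) {
          seeker = f; // first filter that does not INCLUDE, as in MUST_PASS_ALL
          break;
        }
      }
      if (seeker == null) {
        out.add(cur.toString());
        cur = data.higher(cur);
        continue;
      }
      Cell hint = null;
      if (useSavedFilter) {
        hint = seeker.getNextKeyHint(cur);
      } else {
        for (HintFilter f : filters) {
          Cell h = f.getNextKeyHint(cur);
          if (h != null && (hint == null || h.compareTo(hint) > 0)) {
            hint = h;
          }
        }
      }
      // Fall back to the next cell when there is no usable hint, so the scan always advances.
      cur = (hint == null || hint.compareTo(cur) <= 0) ? data.higher(cur) : data.ceiling(hint);
    }
    return out;
  }

  static NavigableSet<Cell> grid(String[] rows, String[] cols) {
    NavigableSet<Cell> data = new TreeSet<>();
    for (String r : rows) {
      for (String c : cols) {
        data.add(new Cell(r, c));
      }
    }
    return data;
  }

  public static void main(String[] args) {
    NavigableSet<Cell> data = grid(new String[] {"r1", "r2", "r3"}, new String[] {"a", "b", "c"});
    List<HintFilter> filters = Arrays.asList(rowSet("r1", "r3"), minColumn("b"));
    System.out.println(scan(data, filters, false)); // [r3/b, r3/c]
    System.out.println(scan(data, filters, true));  // [r1/b, r1/c, r3/b, r3/c]
  }
}
{code}

With rows r1 and r3 accepted and columns from b onward, the max-hint scan returns only [r3/b, r3/c]: the hint for r1/a jumps straight to row r3, skipping r1/b and r1/c. The saved-filter scan returns [r1/b, r1/c, r3/b, r3/c], and it does so in either filter order.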

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
