[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731447#comment-13731447
 ] 

Lars Hofhansl commented on HBASE-9079:
--------------------------------------

>From your example the problem is (1) "FuzzyRow includes the filter and says 
>move on to ColumnRange" paired with (2) "FuzzyRow returns the next row that we 
>should use".
Even though the FuzzyRowInclude said we should include the row the call to 
getNextKeyHint() returns a non-null KV.

So from that angle we should only consult the filters that we actually called, 
which your patch does correctly.

Now, what if FuzzyRowFilter had been the one that returned 
SEEK_NEXT_USING_HINT. Then that filter would be the one to provide the 
getNextKeyHint as well, and might skip right past the KV you want for the 
ColumnRangeFilter. You are saying in that case FuzzyRowFilter did not INCLUDE 
it, and thus it would be correct use its getNextKeyHint?

And because we're short-circuiting the and when we encounter 
SEEK_NEXT_USING_HINT, we can safely jump to that KV. OK. Seems that is correct. 
Just making sure...

In that case I only have one further comment:
getNextKeyHint on FilterList is only called when SEEK_NEXT_USING_HINT is 
returned. If this FilterList is MUST_PASS_ALL the seekHintFilter must not be 
null, correct? So we could simplify like this:
{code}
@@ -332,9 +337,15 @@
   @Override
   public KeyValue getNextKeyHint(KeyValue currentKV) {
     KeyValue keyHint = null;
+    if (operator == Operator.MUST_PASS_ALL) {
+      keyHint = seekHintFilter.getNextKeyHint(currentKV);
+      return keyHint;
+    }
+
     for (Filter filter : filters) {
       KeyValue curKeyHint = filter.getNextKeyHint(currentKV);
-      if (curKeyHint == null && operator == Operator.MUST_PASS_ONE) {
+      if (curKeyHint == null) {
         // If we ever don't have a hint and this is must-pass-one, then no hint
         return null;
       }
@@ -344,13 +355,7 @@
           keyHint = curKeyHint;
           continue;
         }
-        // There is an existing hint
-        if (operator == Operator.MUST_PASS_ALL &&
-            KeyValue.COMPARATOR.compare(keyHint, curKeyHint) < 0) {
-          // If all conditions must pass, we can keep the max hint
-          keyHint = curKeyHint;
-        } else if (operator == Operator.MUST_PASS_ONE &&
-            KeyValue.COMPARATOR.compare(keyHint, curKeyHint) > 0) {
+        if (KeyValue.COMPARATOR.compare(keyHint, curKeyHint) > 0) {
           // If any condition can pass, we need to keep the min hint
           keyHint = curKeyHint;
         }
{code}
And then also reset the seekHintFilter in the reset() method.
                
> FilterList getNextKeyHint skips rows that should be included in the results
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-9079
>                 URL: https://issues.apache.org/jira/browse/HBASE-9079
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 0.94.10
>            Reporter: Viral Bajaria
>             Fix For: 0.98.0, 0.95.2, 0.94.12
>
>         Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch
>
>
> I hit a weird issue/bug and am able to reproduce the error consistently. The 
> problem arises when FilterList has two filters where each implements the 
> getNextKeyHint method.
> The way the current implementation works is, StoreScanner will call 
> matcher.getNextKeyHint() whenever it gets a SEEK_NEXT_USING_HINT. This in 
> turn will call filter.getNextKeyHint() which at this stage is of type 
> FilterList. The implementation in FilterList iterates through all the filters 
> and keeps the max KeyValue that it sees. All is fine if you wrap filters in 
> FilterList in which only one of them implements getNextKeyHint. but if 
> multiple of them implement then that's where things get weird.
> For example:
> - create two filters: one is FuzzyRowFilter and second is ColumnRangeFilter. 
> Both of them implement getNextKeyHint
> - wrap them in FilterList with MUST_PASS_ALL
> - FuzzyRowFilter will seek to the correct first row and then pass it to 
> ColumnRangeFilter which will return the SEEK_NEXT_USING_HINT code.
> - Now in FilterList when getNextKeyHint is called, it calls the one on 
> FuzzyRow first which basically says what the next row should be. While in 
> reality we want the ColumnRangeFilter to give the seek hint.
> - The above behavior skips data that should be returned, which I have 
> verified by using a RowFilter with RegexStringComparator.
> I updated the FilterList to maintain state on which filter returns the 
> SEEK_NEXT_USING_HINT and in getNextKeyHint, I invoke the method on the saved 
> filter and reset that state. I tested it with my current queries and it works 
> fine but I need to run the entire test suite to make sure I have not 
> introduced any regression. In addition to that I need to figure out what 
> should be the behavior when the opeation is MUST_PASS_ONE, but I doubt it 
> should be any different.
> Is my understanding of it being a bug correct ? Or am I trivializing it and 
> ignoring something very important ? If it's tough to wrap your head around 
> the explanation, then I can open a JIRA and upload a patch against 0.94 head.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to