[ 
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732119#comment-13732119
 ] 

Lars Hofhansl commented on HBASE-9079:
--------------------------------------

Did you get a chance to test this with real data, [~viralbajaria]?
                
> FilterList getNextKeyHint skips rows that should be included in the results
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-9079
>                 URL: https://issues.apache.org/jira/browse/HBASE-9079
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 0.94.10
>            Reporter: Viral Bajaria
>            Assignee: Viral Bajaria
>             Fix For: 0.98.0, 0.95.2, 0.94.11
>
>         Attachments: 9079-0.94-v2.txt, 9079-trunk-v2.txt, 
> HBASE-9079-0.94.patch, HBASE-9079-trunk.patch
>
>
> I hit a weird issue/bug and am able to reproduce the error consistently. The 
> problem arises when FilterList has two filters where each implements the 
> getNextKeyHint method.
> The way the current implementation works is, StoreScanner will call 
> matcher.getNextKeyHint() whenever it gets a SEEK_NEXT_USING_HINT. This in 
> turn will call filter.getNextKeyHint() which at this stage is of type 
> FilterList. The implementation in FilterList iterates through all the filters 
> and keeps the max KeyValue that it sees. All is fine if you wrap filters in 
> FilterList in which only one of them implements getNextKeyHint, but if 
> more than one implements it, that's where things get weird.
> For example:
> - create two filters: one is FuzzyRowFilter and second is ColumnRangeFilter. 
> Both of them implement getNextKeyHint
> - wrap them in FilterList with MUST_PASS_ALL
> - FuzzyRowFilter will seek to the correct first row and then pass it to 
> ColumnRangeFilter which will return the SEEK_NEXT_USING_HINT code.
> - Now when getNextKeyHint is called on FilterList, it first calls the one on 
> FuzzyRowFilter, which says what the next row should be, while in reality we 
> want ColumnRangeFilter to give the seek hint.
> - The above behavior skips data that should be returned, which I have 
> verified by using a RowFilter with RegexStringComparator.
> I updated the FilterList to maintain state on which filter returns the 
> SEEK_NEXT_USING_HINT and in getNextKeyHint, I invoke the method on the saved 
> filter and reset that state. I tested it with my current queries and it works 
> fine but I need to run the entire test suite to make sure I have not 
> introduced any regression. In addition to that I need to figure out what 
> the behavior should be when the operation is MUST_PASS_ONE, but I doubt it 
> should be any different.
> Is my understanding of it being a bug correct? Or am I trivializing it and 
> ignoring something very important? If it's tough to wrap your head around 
> the explanation, then I can open a JIRA and upload a patch against 0.94 head.
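
The hint selection described in the report can be illustrated with a toy model. This is only a sketch under stated assumptions, not HBase code: keys are plain "row/column" strings compared lexicographically, and HintFilter, RowHintFilter, ColumnFromFilter, maxHint and hintFromSeekingFilter are hypothetical names standing in for the real filters and for FilterList. The proposed patch saves the seeking filter as state; the sketch simply re-asks each filter, which shows the same idea.

```java
import java.util.Arrays;
import java.util.List;

public class SeekHintSketch {
    enum ReturnCode { INCLUDE, SEEK_NEXT_USING_HINT }

    /** Hypothetical stand-in for a filter; keys are "row/column" strings. */
    interface HintFilter {
        ReturnCode filterKey(String key);
        String nextKeyHint(String key);
    }

    /** Toy row filter: accepts the current key, hints at the next row it wants. */
    static final class RowHintFilter implements HintFilter {
        private final String nextRow;
        RowHintFilter(String nextRow) { this.nextRow = nextRow; }
        public ReturnCode filterKey(String key) { return ReturnCode.INCLUDE; }
        public String nextKeyHint(String key) { return nextRow + "/"; }
    }

    /** Toy column filter: asks to seek forward to minColumn within the same row. */
    static final class ColumnFromFilter implements HintFilter {
        private final String minColumn;
        ColumnFromFilter(String minColumn) { this.minColumn = minColumn; }
        public ReturnCode filterKey(String key) {
            String column = key.substring(key.indexOf('/') + 1);
            return column.compareTo(minColumn) < 0
                    ? ReturnCode.SEEK_NEXT_USING_HINT : ReturnCode.INCLUDE;
        }
        public String nextKeyHint(String key) {
            String row = key.substring(0, key.indexOf('/'));
            return row + "/" + minColumn;
        }
    }

    /** Behavior described in the report: keep the largest hint of all filters. */
    static String maxHint(List<HintFilter> filters, String key) {
        String best = null;
        for (HintFilter f : filters) {
            String hint = f.nextKeyHint(key);
            if (hint != null && (best == null || hint.compareTo(best) > 0)) {
                best = hint;
            }
        }
        return best;
    }

    /** Proposed behavior: use only the filter that asked for the seek. */
    static String hintFromSeekingFilter(List<HintFilter> filters, String key) {
        for (HintFilter f : filters) {
            if (f.filterKey(key) == ReturnCode.SEEK_NEXT_USING_HINT) {
                return f.nextKeyHint(key);
            }
        }
        return null; // no filter requested a seek
    }

    public static void main(String[] args) {
        List<HintFilter> filters = Arrays.<HintFilter>asList(
                new RowHintFilter("r3"), new ColumnFromFilter("c3"));
        String current = "r1/c1";
        // Max of all hints jumps to the next row and skips r1/c3 onwards.
        System.out.println("max hint:     " + maxHint(filters, current));
        // Hint from the seeking filter stays inside row r1.
        System.out.println("seeking hint: " + hintFromSeekingFilter(filters, current));
    }
}
```

In this toy setup the max-based choice lands on row r3 and passes over the remaining columns of r1, while asking only the filter that returned SEEK_NEXT_USING_HINT keeps the scan in r1 at column c3.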
