Re: FilterList and SingleColumnValueFilter

Paul Ambrose Fri, 18 Dec 2009 08:34:45 -0800

Ugh.  I am afraid not.  

The two changes that I am advocating (that could break someone else, which is
of course problematic) are:


1)  SingleColumnValueFilter.filterKeyValue(KeyValue keyValue)
When the column name does not match, the return value should be NEXT_ROW
rather than INCLUDE.  As mentioned earlier, when the filter is called by
FilterList, an INCLUDE return value stops FilterList from evaluating the
remaining filters for that KeyValue.  That is problematic because
matchedColumn is later checked in filterRow and will always be false for
the filters that were never evaluated.

2) FilterList.filterKeyValue(KeyValue v) returns SKIP and I do not know why.
In the case of MUST_PASS_ALL, a filter not returning an INCLUDE
should result in a NEXT_ROW (not SKIP) being returned, and at the bottom,
an INCLUDE should always be returned (rather than a SKIP).
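
To make point 1 concrete, here is a minimal, self-contained Java sketch. It
does not use the HBase classes: ReturnCode, Cell, ToyColumnValueFilter and
mustPassOne are hypothetical stand-ins that only model the behavior described
above (a MUST_PASS_ONE loop that stops at the first INCLUDE, and a column
filter whose mismatch return code is configurable). It is an illustration
under those assumptions, not the actual HBase implementation.

```java
import java.util.Arrays;
import java.util.List;

public class FilterListSketch {
    // Hypothetical stand-in for the filter return codes discussed above.
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    // Hypothetical stand-in for a KeyValue: just a column name and a value.
    static final class Cell {
        final String column;
        final String value;

        Cell(String column, String value) {
            this.column = column;
            this.value = value;
        }
    }

    // Toy model of a single-column equality filter. The code returned on a
    // column-name mismatch is a parameter so both variants can be compared.
    static final class ToyColumnValueFilter {
        private final String column;
        private final String expected;
        private final ReturnCode onColumnMismatch;
        boolean matchedColumn = false;

        ToyColumnValueFilter(String column, String expected, ReturnCode onColumnMismatch) {
            this.column = column;
            this.expected = expected;
            this.onColumnMismatch = onColumnMismatch;
        }

        ReturnCode filterKeyValue(Cell cell) {
            if (!cell.column.equals(column)) {
                return onColumnMismatch; // not this filter's column
            }
            matchedColumn = true;
            // Assumption for this toy: a value mismatch rejects the row.
            return cell.value.equals(expected) ? ReturnCode.INCLUDE : ReturnCode.NEXT_ROW;
        }
    }

    // Toy MUST_PASS_ONE loop: stops as soon as one filter returns INCLUDE,
    // so later filters never see the cell.
    static ReturnCode mustPassOne(List<ToyColumnValueFilter> filters, Cell cell) {
        for (ToyColumnValueFilter f : filters) {
            if (f.filterKeyValue(cell) == ReturnCode.INCLUDE) {
                return ReturnCode.INCLUDE;
            }
        }
        return ReturnCode.SKIP;
    }

    // Feeds one cell for column "b" through two filters (on "a" and on "b")
    // and reports whether the second filter ever saw its own column.
    static boolean secondFilterMatchedColumn(ReturnCode onColumnMismatch) {
        ToyColumnValueFilter first = new ToyColumnValueFilter("a", "1", onColumnMismatch);
        ToyColumnValueFilter second = new ToyColumnValueFilter("b", "2", onColumnMismatch);
        mustPassOne(Arrays.asList(first, second), new Cell("b", "2"));
        return second.matchedColumn;
    }

    public static void main(String[] args) {
        System.out.println("mismatch -> INCLUDE:  "
                + secondFilterMatchedColumn(ReturnCode.INCLUDE));
        System.out.println("mismatch -> NEXT_ROW: "
                + secondFilterMatchedColumn(ReturnCode.NEXT_ROW));
    }
}
```

In this toy model the second filter's matchedColumn stays false when a column
mismatch returns INCLUDE, because the loop exits before reaching it, and
becomes true when the mismatch returns NEXT_ROW. How the real filterRow() uses
matchedColumn is not modeled here.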

Here is a dumb question.  A while ago, I tried to add my own filter to the
server, but I could not get it going without adding an entry in
HbaseObjectWritable.addToMap().  Should I be able to add a filter without
this step?  If so, I am content to have my own version of
SingleColumnValueFilter and FilterList and not risk breaking others (though
I do think the code is incorrect).



On Dec 17, 2009, at 10:27 AM, stack wrote:

> On Tue, Dec 15, 2009 at 10:42 PM, Paul Ambrose <[email protected]> wrote:
> 
>> Hey Michael,
>> 
>> If hbase-2037 will make it into 0.20.3, I am fine.
>> 
> 
> Grand.
> 
> Will hbase-2037 fix both issues you describe? (Have you tried it I wonder?)
> 
> St.Ack
> 
> 
> 
>> If not, I would greatly appreciate you breaking it out for 0.20.3.
>> 
>> 
> 
> 
> 
> 
>> Thanks,
>> Paul
>> 
>> 
>> 
>> On Dec 15, 2009, at 10:28 PM, stack wrote:
>> 
>>> Paul:
>>> 
>>> I can apply the fix from hbase-2037... I can break it out of the posted
>>> patch that's up there.  Just say the word.
>>> 
>>> St.Ack
>>> 
>>> 
>>> On Tue, Dec 15, 2009 at 4:17 PM, Ram Kulbak <[email protected]> wrote:
>>> 
>>>> Hi Paul,
>>>> 
>>>> I've encountered the same problem. I think it's fixed as part of
>>>> https://issues.apache.org/jira/browse/HBASE-2037
>>>> 
>>>> Regards,
>>>> Yoram
>>>> 
>>>> 
>>>> 
>>>> On Wed, Dec 16, 2009 at 10:45 AM, Paul Ambrose <[email protected]> wrote:
>>>> 
>>>>> I ran into some problems with FilterList and SingleColumnValueFilter.
>>>>> 
>>>>> I created a FilterList with MUST_PASS_ONE and two
>>>>> SingleColumnValueFilters (each testing equality on a different column)
>>>>> and queried some trivial data:
>>>>> 
>>>>> http://pastie.org/744890
>>>>> 
>>>>> The problems that I encountered were twofold:
>>>>> 
>>>>> SingleColumnValueFilter.filterKeyValue() returns ReturnCode.INCLUDE
>>>>> if the column names do not match. If FilterList is employed, then when
>>>>> the first Filter returns INCLUDE (because the column names do not
>>>>> match), no more filters for that KeyValue are evaluated.  That is
>>>>> problematic because when filterRow() is finally called for those
>>>>> filters, matchedColumn is never found to be true because they were not
>>>>> invoked (FilterList exits its iteration over the filters as soon as the
>>>>> name-mismatch INCLUDE is returned).
>>>>> The fix (at least for this scenario) is for
>>>>> SingleColumnValueFilter.filterKeyValue() to return ReturnCode.NEXT_ROW
>>>>> (rather than INCLUDE).
>>>>> 
>>>>> The second problem is at the bottom of FilterList.filterKeyValue(),
>>>>> where ReturnCode.SKIP is returned if MUST_PASS_ONE is the operator,
>>>>> rather than always returning ReturnCode.INCLUDE and then leaving the
>>>>> final filter decision to be made by the call to filterRow().  I am sure
>>>>> there is a good reason for returning SKIP in other scenarios, but it is
>>>>> problematic in mine.
>>>>> 
>>>>> Feedback would be much appreciated.
>>>>> 
>>>>> Paul
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
>>
