Re: FilterList and SingleColumnValueFilter

bmdevelopment Fri, 18 Dec 2009 13:46:05 -0800

Hi,

Yes, I'll be testing FilterLists in my program over the next few weeks, so I will come back with my results and recommendations afterwards. :)

Thanks, enjoy the weekend.


stack wrote:

Maybe you two smart fellas can between you make a recommendation and a
patch?
Thanks lads,
St.Ack

On Fri, Dec 18, 2009 at 11:44 AM, bmdevelopment <bmdevelopm...@gmail.com> wrote:

Hi,
Fyi, I came across similar issues when working on HBASE-1975.
The return values did not seem to be correct to me either, but when I began
changing them it seemed to lead to quite involved changes in the SCVF and
Filter unit tests - something I wanted to avoid.
In the end, I tried to keep the changes to SCVF as simple as possible.
At one point, I did also attempt my own version of SCVF and ran into the
same issue of having to use HbaseObjectWritable.addToMap().

Now I am beginning to use MUST_PASS_ALL and MUST_PASS_ONE FilterLists of
SCVFs, maybe similar to what Paul is doing in his original mail. So, if it
is not working as expected, I will probably need a fix in the near future as
well.

Thanks
Jeremiah


Paul Ambrose wrote:

Ugh.  I am afraid not.
The two changes that I am advocating (that could break someone else, which
is of course problematic) are:

1)  SingleColumnValueFilter.filterKeyValue(KeyValue keyValue)
When the column name does not match, the return value should be NEXT_ROW
rather than INCLUDE.  As mentioned earlier, when called by FilterList, the
INCLUDE return value discontinues further filter evaluation for a given
KeyValue in FilterList.  That is problematic because matchedColumn is later
checked in filterRow and will always be false for unevaluated filters.

2) FilterList.filterKeyValue(KeyValue v) returns SKIP and I do not know why.
In the case of MUST_PASS_ALL, a filter not returning an INCLUDE should
result in a NEXT_ROW (not SKIP) being returned, and at the bottom, an
INCLUDE should always be returned (rather than a SKIP).
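
The second change can be read as a small combination rule. The sketch below is only an illustration of that rule as worded above, not code from HBase or from any patch: the class name, the method name combineMustPassAll, and the local ReturnCode enum are all made up for the example.

```java
import java.util.Arrays;
import java.util.List;

class MustPassAllSketch {
    // Local stand-in for the filter return codes named in this thread (hypothetical enum).
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    /**
     * Proposed MUST_PASS_ALL rule as described in point 2:
     * any filter that does not return INCLUDE yields NEXT_ROW,
     * and if every filter returned INCLUDE the result is INCLUDE.
     */
    static ReturnCode combineMustPassAll(List<ReturnCode> perFilterCodes) {
        for (ReturnCode code : perFilterCodes) {
            if (code != ReturnCode.INCLUDE) {
                return ReturnCode.NEXT_ROW; // proposed: NEXT_ROW rather than SKIP
            }
        }
        return ReturnCode.INCLUDE; // proposed: INCLUDE at the bottom rather than SKIP
    }

    public static void main(String[] args) {
        System.out.println(combineMustPassAll(
            Arrays.asList(ReturnCode.INCLUDE, ReturnCode.INCLUDE)));
        System.out.println(combineMustPassAll(
            Arrays.asList(ReturnCode.INCLUDE, ReturnCode.SKIP)));
    }
}
```

The rule takes the per-filter codes as a list only to keep the example small; a real FilterList would call each filter in turn.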

Here is a dumb question.  A while ago, I tried to add my own filter to the
server, but I could not get it going without adding an entry in
HbaseObjectWritable.addToMap().  Should I be able to add a filter without
this step?  If so, I am content to have my own version of the
SingleColumnValueFilter and FilterList and not risk breaking others (though
I do think the code is incorrect).



On Dec 17, 2009, at 10:27 AM, stack wrote:

 On Tue, Dec 15, 2009 at 10:42 PM, Paul Ambrose <pambr...@mac.com> wrote:

 Hey Michael,

If hbase-2037 will make it into 0.20.3, I am fine.

 Grand.

Will hbase-2037 fix both issues you describe? (Have you tried it I
wonder?)

St.Ack



 If not, I would greatly appreciate you breaking it out for 0.20.3.


 Thanks,

Paul



On Dec 15, 2009, at 10:28 PM, stack wrote:

 Paul:

I can apply the fix from hbase-2037... I can break it out of the posted
patch that's up there.  Just say the word.

St.Ack


On Tue, Dec 15, 2009 at 4:17 PM, Ram Kulbak <ram.kul...@gmail.com> wrote:

Hi Paul,

I've encountered the same problem. I think it's fixed as part of
https://issues.apache.org/jira/browse/HBASE-2037

Regards,
Yoram



On Wed, Dec 16, 2009 at 10:45 AM, Paul Ambrose <pambr...@mac.com> wrote:
I ran into some problems with FilterList and SingleColumnValueFilter.

I created a FilterList with MUST_PASS_ONE and two SingleColumnValueFilters
(each testing equality on a different column) and queried some trivial data:

http://pastie.org/744890

The problems that I encountered were two-fold:

SingleColumnValueFilter.filterKeyValue() returns ReturnCode.INCLUDE if the
column names do not match.  If FilterList is employed, then when the first
Filter returns INCLUDE (because the column names do not match), no more
filters for that KeyValue are evaluated.  That is problematic because when
filterRow() is finally called for those filters, matchedColumn is never
found to be true because they were not invoked (due to FilterList exiting
from the filter list iteration when the name-mismatch INCLUDE was returned).

The fix (at least for this scenario) is for
SingleColumnValueFilter.filterKeyValue() to return ReturnCode.NEXT_ROW
(rather than INCLUDE).

The second problem is at the bottom of FilterList.filterKeyValue(), where
ReturnCode.SKIP is returned if MUST_PASS_ONE is the operator, rather than
always returning ReturnCode.INCLUDE and then leaving the final filter
decision to be made by the call to filterRow().  I am sure there is a good
reason for returning SKIP in other scenarios, but it is problematic in mine.
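
The first problem can be reproduced with a toy model. The sketch below is not HBase code: ColumnEquals is a hypothetical stand-in for SingleColumnValueFilter, rowPasses is a hypothetical stand-in for a MUST_PASS_ONE FilterList evaluating one row, and the model assumes the row verdict comes only from filterRow(), ignoring how the scanner handles the list's own return code. It only shows how stopping at the first INCLUDE keeps the second filter from ever seeing its column.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ShortCircuitDemo {
    enum ReturnCode { INCLUDE, NEXT_ROW }

    /** Toy stand-in for SingleColumnValueFilter: equality test on one column. */
    static class ColumnEquals {
        final String column;
        final String expected;
        final ReturnCode onOtherColumn; // code returned when the column name differs
        boolean matchedColumn = false;

        ColumnEquals(String column, String expected, ReturnCode onOtherColumn) {
            this.column = column;
            this.expected = expected;
            this.onOtherColumn = onOtherColumn;
        }

        ReturnCode filterKeyValue(String col, String value) {
            if (!column.equals(col)) return onOtherColumn;
            if (!expected.equals(value)) return ReturnCode.NEXT_ROW;
            matchedColumn = true;
            return ReturnCode.INCLUDE;
        }

        /** true means "filter this row out". */
        boolean filterRow() { return !matchedColumn; }
    }

    /** Toy MUST_PASS_ONE: stop at the first INCLUDE per key-value, then ask filterRow(). */
    static boolean rowPasses(List<ColumnEquals> filters, Map<String, String> row) {
        for (Map.Entry<String, String> kv : row.entrySet()) {
            for (ColumnEquals f : filters) {
                if (f.filterKeyValue(kv.getKey(), kv.getValue()) == ReturnCode.INCLUDE) {
                    break; // remaining filters never see this key-value
                }
            }
        }
        for (ColumnEquals f : filters) {
            if (!f.filterRow()) return true; // one passing filter is enough
        }
        return false;
    }

    /** Row {a=1, b=2} against "a == 9 OR b == 2"; the row should pass. */
    static boolean run(ReturnCode onOtherColumn) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("a", "1");
        row.put("b", "2");
        List<ColumnEquals> filters = Arrays.asList(
            new ColumnEquals("a", "9", onOtherColumn),
            new ColumnEquals("b", "2", onOtherColumn));
        return rowPasses(filters, row);
    }

    public static void main(String[] args) {
        System.out.println("INCLUDE on mismatch:  row passes = " + run(ReturnCode.INCLUDE));
        System.out.println("NEXT_ROW on mismatch: row passes = " + run(ReturnCode.NEXT_ROW));
    }
}
```

In this model the row is wrongly dropped when a column-name mismatch returns INCLUDE, and it passes when the mismatch returns NEXT_ROW.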

Feedback would be much appreciated.

Paul
