My earlier suggestion of having SCVF.filterKeyValue() not return INCLUDE on column name mismatches was incorrect, because INCLUDE is appropriate when SCVF is used without FilterList (in the case of MUST_PASS_ONE). I think the fix is to have FilterList evaluate all the filters and not bail early when an INCLUDE is found. I will continue to play with it.
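To make the idea concrete, here is a minimal standalone sketch of the behaviour I am describing. It does not use the real HBase classes: Cell, RowFilter, ColumnEquals and PassOneList are hypothetical stand-ins for KeyValue, Filter, SingleColumnValueFilter and a MUST_PASS_ONE FilterList, and the evaluateAll flag is my own invention to switch between the early exit and the proposed behaviour.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only; none of these types are the actual HBase API.
final class FilterListSketch {
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    /** Hypothetical stand-in for a KeyValue: a column name and a value. */
    static final class Cell {
        final String column;
        final String value;
        Cell(String column, String value) { this.column = column; this.value = value; }
    }

    /** Hypothetical stand-in for the Filter interface. */
    interface RowFilter {
        ReturnCode filterKeyValue(Cell cell);
        boolean filterRow(); // true means "drop the row"
    }

    /** Simplified SCVF: returns INCLUDE on a column name mismatch. */
    static final class ColumnEquals implements RowFilter {
        private final String column;
        private final String expected;
        private boolean matched; // only set if this filter actually sees its column

        ColumnEquals(String column, String expected) {
            this.column = column;
            this.expected = expected;
        }

        @Override public ReturnCode filterKeyValue(Cell cell) {
            if (!cell.column.equals(column)) return ReturnCode.INCLUDE; // not my column
            if (cell.value.equals(expected)) { matched = true; return ReturnCode.INCLUDE; }
            return ReturnCode.NEXT_ROW;
        }

        @Override public boolean filterRow() { return !matched; }
    }

    /** Simplified MUST_PASS_ONE list; evaluateAll toggles the proposed fix. */
    static final class PassOneList {
        private final List<RowFilter> filters;
        private final boolean evaluateAll;

        PassOneList(List<RowFilter> filters, boolean evaluateAll) {
            this.filters = filters;
            this.evaluateAll = evaluateAll;
        }

        ReturnCode filterKeyValue(Cell cell) {
            ReturnCode result = ReturnCode.SKIP;
            for (RowFilter f : filters) {
                if (f.filterKeyValue(cell) == ReturnCode.INCLUDE) {
                    result = ReturnCode.INCLUDE;
                    if (!evaluateAll) return result; // early exit: later filters never see the cell
                }
            }
            return result;
        }

        /** The row is dropped only if every filter wants it dropped. */
        boolean filterRow() {
            for (RowFilter f : filters) {
                if (!f.filterRow()) return false;
            }
            return true;
        }
    }

    /** Feeds one row (a=1, b=2) through two filters; this driver ignores the per-cell codes. */
    static boolean rowDropped(boolean evaluateAll) {
        PassOneList list = new PassOneList(
            Arrays.<RowFilter>asList(new ColumnEquals("a", "x"), new ColumnEquals("b", "2")),
            evaluateAll);
        list.filterKeyValue(new Cell("a", "1"));
        list.filterKeyValue(new Cell("b", "2"));
        return list.filterRow();
    }

    public static void main(String[] args) {
        System.out.println("early exit drops row: " + rowDropped(false));
        System.out.println("evaluate all drops row: " + rowDropped(true));
    }
}
```

In this toy, the second filter should accept the row on b=2, but with the early exit it never sees that cell, because the first filter's mismatch INCLUDE ends the loop. With every filter evaluated, the row is kept.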
On Dec 18, 2009, at 1:45 PM, bmdevelopment wrote:
> Hi,
> Yes, I'll be doing testing on FilterLists in my program in the next few weeks, so will come back with my results afterwards and my recommendations as well. :)
> Thanks, enjoy the weekend.
>
> stack wrote:
>> Maybe you two smart fellas can between you make a recommendation and a patch?
>> Thanks lads,
>> St.Ack
>> On Fri, Dec 18, 2009 at 11:44 AM, bmdevelopment <[email protected]> wrote:
>>> Hi,
>>> Fyi, I came across similar issues when working on HBASE-1975.
>>> The return values did not seem to be correct to me either, but when I began changing them it seemed to lead to quite involved changes in the SCVF and Filter unit tests - something I wanted to avoid.
>>> In the end, I tried to keep the changes to SCVF as simple as possible.
>>> At one point, I did also attempt my own version of SCVF and ran into the same issue of having to use HbaseObjectWritable.addToMap().
>>>
>>> Now I am beginning to use MUST_PASS_ALL and MUST_PASS_ONE FilterList of SCVFs - maybe similar to what Paul is doing in his original mail. So, if it is not working as expected, I will probably need this in the near future as well.
>>>
>>> Thanks
>>> Jeremiah
>>>
>>> Paul Ambrose wrote:
>>>> Ugh. I am afraid not.
>>>> The two changes that I am advocating (that could break someone else, which is of course problematic) are:
>>>>
>>>> 1) SingleColumnValueFilter.filterKeyValue(KeyValue keyValue)
>>>> When the column name does not match, the return value should be NEXT_ROW, rather than INCLUDE. As mentioned earlier, when called by FilterList, the INCLUDE return value discontinues further filter evaluation for a given KeyValue in FilterList. That is problematic because matchedColumn is later checked in filterRow and will always be false for unevaluated filters.
>>>>
>>>> 2) FilterList.filterKeyValue(KeyValue v) returns SKIP and I do not know why.
>>>> In the case of MUST_PASS_ALL, a filter not returning an INCLUDE should result in a NEXT_ROW (not SKIP) being returned, and at the bottom, an INCLUDE should always be returned (rather than a SKIP).
>>>>
>>>> Here is a dumb question. A while ago, I tried to add my own filter to the server, but I could not get it going without adding an entry in HbaseObjectWritable.addToMap(). Should I be able to add a filter without this step? If so, I am content to have my own version of the SingleColumnValueFilter and FilterList and not risk breaking others (though I do think the code is incorrect).
>>>>
>>>> On Dec 17, 2009, at 10:27 AM, stack wrote:
>>>>
>>>> On Tue, Dec 15, 2009 at 10:42 PM, Paul Ambrose <[email protected]> wrote:
>>>>> Hey Michael,
>>>>>> If hbase-2037 will make it into 0.20.3, I am fine.
>>>>>>
>>>>>> Grand.
>>>>> Will hbase-2037 fix both issues you describe? (Have you tried it I wonder?)
>>>>>
>>>>> St.Ack
>>>>>
>>>>> If not, I would greatly appreciate you breaking it out for 0.20.3.
>>>>>>
>>>>> Thanks,
>>>>>> Paul
>>>>>>
>>>>>> On Dec 15, 2009, at 10:28 PM, stack wrote:
>>>>>>
>>>>>> Paul:
>>>>>>> I can apply the fix from hbase-2037... I can break it out of the posted patch that's up there. Just say the word.
>>>>>>>
>>>>>>> St.Ack
>>>>>>>
>>>>>>> On Tue, Dec 15, 2009 at 4:17 PM, Ram Kulbak <[email protected]> wrote:
>>>>>>> Hi Paul,
>>>>>>>> I've encountered the same problem. I think it's fixed as part of
>>>>>>>> https://issues.apache.org/jira/browse/HBASE-2037
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Yoram
>>>>>>>>
>>>>>>>> On Wed, Dec 16, 2009 at 10:45 AM, Paul Ambrose <[email protected]> wrote:
>>>>>>>>> I ran into some problems with FilterList and SingleColumnValueFilter.
>>>>>>>>> I created a FilterList with MUST_PASS_ONE and two SingleColumnValueFilters (each testing equality on a different column) and queried some trivial data:
>>>>>>>>> http://pastie.org/744890
>>>>>>>>>
>>>>>>>>> The problems that I encountered were two-fold:
>>>>>>>>>
>>>>>>>>> SingleColumnValueFilter.filterKeyValue() returns ReturnCode.INCLUDE if the column names do not match. If FilterList is employed, then when the first Filter returns INCLUDE (because the column names do not match), no more filters for that KeyValue are evaluated. That is problematic because when filterRow() is finally called for those filters, matchedColumn is never found to be true because they were not invoked (due to FilterList exiting from the filterList iteration when the name-mismatch INCLUDE was returned). The fix (at least for this scenario) is for SingleColumnValueFilter.filterKeyValue() to return ReturnCode.NEXT_ROW (rather than INCLUDE).
>>>>>>>>>
>>>>>>>>> The second problem is at the bottom of FilterList.filterKeyValue() where ReturnCode.SKIP is returned if MUST_PASS_ONE is the operator, rather than always returning ReturnCode.INCLUDE and then leaving the final filter decision to be made by the call to filterRow(). I am sure there is a good reason for returning SKIP in other scenarios, but it is problematic in mine.
>>>>>>>>>
>>>>>>>>> Feedback would be much appreciated.
>>>>>>>>>
>>>>>>>>> Paul
