On Fri, Dec 18, 2009 at 8:33 AM, Paul Ambrose <pambr...@mac.com> wrote:
> Ugh. I am afraid not. > > The two changes that I am advocating (that could break someone else, which > is > of course problematic) are: > > 1) SingleColumnValueFilter.filterKeyValue(KeyValue keyValue) > When the column name does not match, the return value should be NEXT_ROW, > rather than INCLUDE. As mentioned earlier, when called by FilterList, > the INCLUDE return value discontinues further filter evaluation for a given > KeyValue > in FilterList. That is problematic because matchedColumn is later checked > in filterRow > and will always be false for unevaluated filters. > > Should SCVF work this way? Is it broke the way it currently works in your opinion? We could just be specific in the SCVF class javadoc that it does not include columns whose name do not match. SCVF is kinda popular as filters go (I think): i.e. someone is probably using it. > 2) FilterList.filterKeyValue(KeyValue v) returns SKIP and I do not know > why. > In the case of MUST_PASS_ALL, a filter not returning an INCLUDE > should result in a NEXT_ROW (not SKIP) being returned, and at the bottom, > an INCLUDE should always be returned (rather than a SKIP). > OK. If you make the change, tests break? Do the breaking tests help explain why they work the way they do? Paul, filters are a little tricky. All the permutations I'm sure have not been excercised so for sure there are holes in how filter rules are evaluated. It sounds like 2) above is a bug -- fixed by hbase-2037? -- but that the way SCVF works may be more lax than you'd like? If they are 'broke', please make a patch to fix. Please add tests that illustrate the fix made. > > Here is a dumb question. A while ago, I tried to add my own filter to the > server, but I could not get it going without adding an entry in > HbaseObjectWritable.addToMap(). Should I be able to add a filter without > this step? If so, I am content to have my own version of the > SingleColumnValueFilter > and FilterList and not risk breaking others (though I do think the code is > incorrect). > > Yeah, it looks like it. What should happen is that we should support passing class name. I thought we did but it doesn't look like it. If no such code, then we should look for a String of the class name and use reflection instantiating the passed filter. St.Ack > > > On Dec 17, 2009, at 10:27 AM, stack wrote: > > > On Tue, Dec 15, 2009 at 10:42 PM, Paul Ambrose <pambr...@mac.com> wrote: > > > >> Hey Michael, > >> > >> If hbase-2037 will make it into 0.20.3, I am fine. > >> > > > > Grand. > > > > Will hbase-2037 fix both issues you describe? (Have you tried it I > wonder?) > > > > St.Ack > > > > > > > >> If not, I would greatly appreciate you breaking it out for 0.20.3. > >> > >> > > > > > > > > > >> Thanks, > >> Paul > >> > >> > >> > >> On Dec 15, 2009, at 10:28 PM, stack wrote: > >> > >>> Paul: > >>> > >>> I can apply the fix from hbase-2037... I can break it out of the posted > >>> patch thats up there. Just say the word. > >>> > >>> St.Ack > >>> > >>> > >>> On Tue, Dec 15, 2009 at 4:17 PM, Ram Kulbak <ram.kul...@gmail.com> > >> wrote: > >>> > >>>> Hi Paul, > >>>> > >>>> I've encountered the same problem. I think its fixed as part of > >>>> https://issues.apache.org/jira/browse/HBASE-2037 > >>>> > >>>> Regards, > >>>> Yoram > >>>> > >>>> > >>>> > >>>> On Wed, Dec 16, 2009 at 10:45 AM, Paul Ambrose <pambr...@mac.com> > >> wrote: > >>>> > >>>>> I ran into some problems with FilterList and SingleColumnValueFilter. > >>>>> > >>>>> I created a FilterList with MUST_PASS_ONE and two > >>>> SingleColumnValueFilters > >>>>> (each testing equality on a different columns) and query some trivial > >>>> data: > >>>>> > >>>>> http://pastie.org/744890 > >>>>> > >>>>> The problem that I encountered were two-fold: > >>>>> > >>>>> SingleColumnValueFilter.filterKeyValues() returns ReturnCode.INCLUDE > >>>>> if the column names do not match. If FilterList is employed, then > when > >>>> the > >>>>> first Filter returns INCLUDE (because the column names do not match), > >> no > >>>>> more filters for that KeyValue are evaluated. That is problematic > >>>> because > >>>>> when filterRow() is finally called for those filters, matchedColumn > is > >>>>> never > >>>>> found to be true because they were not invoked (due to FilterList > >> exiting > >>>>> from > >>>>> the filterList iteration when the name mismatched INCLUDE was > >> returned). > >>>>> The fix (at least for this scenario) is for > >>>>> SingleColumnValueFilter.filterKeyValues() to > >>>>> return ReturnCode.NEXT_ROW (rather than INCLUDE). > >>>>> > >>>>> The second problem is at the bottom of FilterList.filterKeyValue() > >>>>> where ReturnCode.SKIP is returned if MUST_PASS_ONE is the operator, > >>>>> rather than always returning ReturnCode.INCLUDE and then leaving the > >>>>> final filter decision to be made by the call to filterRow(). I am > >> sure > >>>>> there is a good > >>>>> reason for returning SKIP in other scenarios, but it is problematic > in > >>>>> mine. > >>>>> > >>>>> Feedback would be much appreciated. > >>>>> > >>>>> Paul > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>> > >> > >> > >