I am sort of on the fence about having separate filter methods like you indicated... A unified call makes things simpler, but potentially at the cost of speed. Of course the HotSpot JIT can really kick some ass inlining the KeyValue calls, so maybe it won't matter.
If you look at the github proto-implementation, you'll see that I actually cache the results of some of these calls, so we can avoid calling the filter over and over when we know it will return the same result. I call this 'stickyNextRow'. It should help quite a bit, since it keeps us skipping to the next row once we have failed any test that indicates we need to be on the next row.

The change is here:
http://github.com/ryanobjc/hbase/commit/c61219222a58b25df7bda2ffc1a60029a03450eb

Let me know if I should make adjustments. I'm starting by reimplementing some common filters and building tests for them.

-ryan

On Wed, May 27, 2009 at 7:34 AM, Erik Holstad <[email protected]> wrote:
> Looks good!
> Some thoughts:
> The way I saw the filter working was as a complement to the regular checks
> that are row/(family)/qualifier/TTL.
> Since we are parsing the KeyValue in the matcher I thought that we would do
> the filter there too, so that after
> every regular check we would have the filter check, so we would have a
>
> rowFilter(byte[] buffer, int offset, int length),
> familyFilter(byte[] buffer, int offset, int length),
> qualifierFilter(byte[] buffer, int offset, int length) and
> valueFilter(byte[] buffer, int offset, int length),
>
> to make us early out as soon as possible and not have to reparse the
> KeyValue, since the filter is going to add
> a cost by just being there as it is.
>
> Erik
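For anyone following along, here is a rough sketch of the 'stickyNextRow' idea in Java. It is not the code from the commit linked above; the `ReturnCode` enum, the `Filter` interface, and the `StickyFilter` wrapper are all hypothetical names made up for illustration. The only point is that once the filter says "go to the next row", that verdict is remembered and the filter is not consulted again until the row key changes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StickyNextRowSketch {

    /** Hypothetical filter verdicts; an assumption, not the real HBase API. */
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    /** Hypothetical unified filter call that sees the row key bytes in place. */
    interface Filter {
        ReturnCode filterKeyValue(byte[] buffer, int rowOffset, int rowLength);
    }

    /** Wraps a filter and keeps a NEXT_ROW verdict "sticky" until the row changes. */
    static final class StickyFilter {
        private final Filter delegate;
        private boolean stickyNextRow = false;
        private byte[] currentRow = null;
        int delegateCalls = 0; // counts real filter invocations, for the demo only

        StickyFilter(Filter delegate) {
            this.delegate = delegate;
        }

        ReturnCode check(byte[] buffer, int rowOffset, int rowLength) {
            byte[] row = Arrays.copyOfRange(buffer, rowOffset, rowOffset + rowLength);
            if (!Arrays.equals(row, currentRow)) {
                // New row: forget the cached verdict.
                currentRow = row;
                stickyNextRow = false;
            }
            if (stickyNextRow) {
                // The filter already rejected this row; skip without calling it again.
                return ReturnCode.NEXT_ROW;
            }
            delegateCalls++;
            ReturnCode rc = delegate.filterKeyValue(buffer, rowOffset, rowLength);
            if (rc == ReturnCode.NEXT_ROW) {
                stickyNextRow = true;
            }
            return rc;
        }
    }

    public static void main(String[] args) {
        // Toy filter: reject every row whose key starts with 'a'.
        StickyFilter filter = new StickyFilter(
            (buf, off, len) -> buf[off] == 'a' ? ReturnCode.NEXT_ROW : ReturnCode.INCLUDE);

        // Three KeyValues from row "a", then one from row "b".
        for (String rowKey : new String[] {"a", "a", "a", "b"}) {
            byte[] row = rowKey.getBytes(StandardCharsets.UTF_8);
            System.out.println(rowKey + " -> " + filter.check(row, 0, row.length));
        }
        System.out.println("delegate calls: " + filter.delegateCalls);
    }
}
```

With the four KeyValues in `main`, the wrapped filter is consulted only twice: once for row "a" (the next two checks are answered from the cached verdict) and once for row "b". A real scanner would presumably compare row keys in place rather than copying them as this sketch does.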
