I am sort of on the fence about having separate filter methods like you
indicated... A unified call makes things simpler, but at the
potential cost of speed.  Of course the HotSpot JIT can really kick some ass
inlining the KeyValue calls, so maybe it won't matter.
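
To make the tradeoff concrete, here is a rough sketch of the split style,
using the four method names from your mail.  Everything else is made up for
illustration: the boolean return type (true = filter it out), the SplitFilter
and SplitMatcher names, and the {offset, length} pairs standing in for what
the matcher would parse out of the KeyValue.

```java
/** Hypothetical split-style filter; the boolean return type is an assumption. */
interface SplitFilter {
    boolean rowFilter(byte[] buffer, int offset, int length);
    boolean familyFilter(byte[] buffer, int offset, int length);
    boolean qualifierFilter(byte[] buffer, int offset, int length);
    boolean valueFilter(byte[] buffer, int offset, int length);
}

/** Hypothetical matcher that runs one filter check per parsed component. */
class SplitMatcher {
    /**
     * parts[i] = {offset, length} for row, family, qualifier, value.
     * Returns true as soon as any component is filtered out; the
     * short-circuit || means later checks are never called (the early out).
     */
    static boolean filtered(SplitFilter f, byte[] buf, int[][] parts) {
        return f.rowFilter(buf, parts[0][0], parts[0][1])
            || f.familyFilter(buf, parts[1][0], parts[1][1])
            || f.qualifierFilter(buf, parts[2][0], parts[2][1])
            || f.valueFilter(buf, parts[3][0], parts[3][1]);
    }
}
```

The unified style would collapse the four methods into a single call per
KeyValue, which is simpler to implement but loses the per-component early out.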

If you look at the github proto-implementation, you'll see that I actually
cache the results of some of these calls, so we can avoid calling the
filter over and over when we know it will return the same result.  I call
this 'stickyNextRow' - it should help quite a bit, since it will keep us
skipping to the next row once we have failed any test that indicates we need
to be on the next row.
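
Roughly, the idea looks like the sketch below.  This is not the code from the
commit, just a minimal illustration: SketchFilter, StickyRowMatcher and the
filterCalls counter are hypothetical names, and I am assuming a filter that
returns true when the row should be skipped.

```java
import java.util.Arrays;

/** Hypothetical filter interface; returns true if the row should be skipped. */
interface SketchFilter {
    boolean filterRow(byte[] buffer, int offset, int length);
}

/** Remembers a "skip to next row" decision until the row key changes. */
class StickyRowMatcher {
    private final SketchFilter filter;
    private byte[] currentRow;      // copy of the last row key seen
    private boolean stickyNextRow;  // true once the current row was rejected
    int filterCalls;                // real filter invocations, for illustration

    StickyRowMatcher(SketchFilter filter) {
        this.filter = filter;
    }

    /** Returns true if the key whose row is buffer[offset, offset+length) should be skipped. */
    boolean skip(byte[] buffer, int offset, int length) {
        byte[] row = Arrays.copyOfRange(buffer, offset, offset + length);
        if (!Arrays.equals(row, currentRow)) {
            // New row: forget the cached decision.
            currentRow = row;
            stickyNextRow = false;
        }
        if (stickyNextRow) {
            return true; // cached rejection, the filter is not called again
        }
        filterCalls++;
        if (filter.filterRow(buffer, offset, length)) {
            stickyNextRow = true; // stick until the row changes
            return true;
        }
        return false;
    }
}
```

Only the rejection is cached here; a row that passes still hits the filter on
every key, since later checks on the same row may yet fail.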

The change is here:
http://github.com/ryanobjc/hbase/commit/c61219222a58b25df7bda2ffc1a60029a03450eb

Let me know if I should make adjustments.  I'm starting by reimplementing
some common filters and building tests for them.

-ryan

On Wed, May 27, 2009 at 7:34 AM, Erik Holstad <[email protected]> wrote:

> Looks good!
> Some thoughts:
> The way I saw the filter working was as a complement to the regular checks
> that are row/(family)/qualifier/TTL.
> Since we are parsing the KeyValue in the matcher I thought that we would do
> the filter there too, so that after
> every regular check we would have the filter check, so we would have a
>
> rowFilter(byte[] buffer, int offset, int length),
> familyFilter(byte[] buffer, int offset, int length),
> qualifierFilter(byte[] buffer, int offset, int length) and
> valueFilter(byte[] buffer, int offset, int length),
>
> to let us early out as soon as possible and not have to reparse the
> KeyValue, since the filter is going to add
> a cost just by being there as it is.
>
> Erik
>
