Re: Filter use cases

Jonathan Gray Tue, 26 May 2009 23:40:13 -0700

This sounds like a good initial approach for a new filter interface.

+1 on moving forward with what you propose, allowing for modifications as
we reimplement and integrate.


Good stuff, Ryan!

JG

On Tue, May 26, 2009 11:28 pm, Ryan Rawson wrote:
> Hi all,
>
>
> With HBASE-1304, it's time to normalize and review our filter API.
>
>
> Here are a few givens:
> - all calls must be byte[] buffer, int offset, int length
> - maybe we can have calls for KeyValue (which encodes all parts of the key
> & value as per the name)
> - we'd like to get rid of the calls:
> --   boolean filterRow(final SortedMap<byte [], Cell> columns);
> --   boolean filterRow(final List<KeyValue> results);
> These calls are expensive, and there is no reason to have them.
>
>
> Here is a proposal, imagine a filter will see this sequence of calls:
> - reset()
> - filterRowKey(byte[],int,int) - true to include row, false to skip to
> next row
> - filterKeyValue(KeyValue) - true to include key/value, false to skip
> -- can choose to filter on family, qualifier, value, anything really.
> - filterRow() - true to include entire row, false to post-hoc veto row
>
>
> In this case one could implement the "filterIfColumnMissing" feature of
> ColumnValueFilter by carrying state and returning false from filterRow()
> to veto the row based on the columns/values we didn't see.
>
> In any of these cases, all these functions will be called quite
> frequently, so efficiency of the code is paramount.  It's probable that
> filterRowKey() will be 'cached' by the calling code, but filterKeyValue()
> is called for nearly every single value we would normally return (ie: it's
> applied _AFTER_ column matching and version and timestamp and delete
> tracking).
>
> The goal is to:
> (a) make the implementation easy and performant
> (b) make the API normative and easy to code for
> (c) make everything work
>
>
> Thoughts?
> -ryan
>
>
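
For readers following the thread, here is a rough Java sketch of the call sequence proposed above, together with a stateful filter in the spirit of the "filterIfColumnMissing" idea. It is only an illustration under stated assumptions, not the HBase API: the names Kv, RowFilter, RequireColumnFilter and FilterDriver are hypothetical, Kv is a simplified stand-in for KeyValue, and "true means include" follows the proposal as written in the quoted message.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for KeyValue; holds only the fields this sketch needs.
final class Kv {
    final byte[] row, qualifier, value;
    Kv(String row, String qualifier, String value) {
        this.row = row.getBytes(StandardCharsets.UTF_8);
        this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
        this.value = value.getBytes(StandardCharsets.UTF_8);
    }
}

// Hypothetical interface mirroring the proposed calls; true always means "include".
interface RowFilter {
    void reset();                                              // called before each row
    boolean filterRowKey(byte[] buf, int offset, int length);  // false skips the row
    boolean filterKeyValue(Kv kv);                             // false skips the key/value
    boolean filterRow();                                       // false vetoes the row post hoc
}

// Carries state across one row and vetoes the row in filterRow() if the
// required column was never seen with the expected value.
final class RequireColumnFilter implements RowFilter {
    private final byte[] qualifier, expected;
    private boolean matched;

    RequireColumnFilter(String qualifier, String expected) {
        this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
        this.expected = expected.getBytes(StandardCharsets.UTF_8);
    }

    public void reset() { matched = false; }

    public boolean filterRowKey(byte[] buf, int offset, int length) { return true; }

    public boolean filterKeyValue(Kv kv) {
        if (Arrays.equals(kv.qualifier, qualifier) && Arrays.equals(kv.value, expected)) {
            matched = true;
        }
        return true; // keep every key/value; the row-level decision is deferred
    }

    public boolean filterRow() { return matched; }
}

// Hypothetical caller that drives a filter through the proposed sequence for one row.
final class FilterDriver {
    // Returns the key/values kept for the row, or an empty list if the row
    // was skipped by filterRowKey() or vetoed by filterRow().
    static List<Kv> scanRow(RowFilter f, byte[] rowKey, List<Kv> kvs) {
        f.reset();
        if (!f.filterRowKey(rowKey, 0, rowKey.length)) {
            return new ArrayList<>();
        }
        List<Kv> kept = new ArrayList<>();
        for (Kv kv : kvs) {
            if (f.filterKeyValue(kv)) {
                kept.add(kv);
            }
        }
        return f.filterRow() ? kept : new ArrayList<>();
    }
}
```

Because the veto happens only in filterRow(), the caller has to buffer a row's key/values until the row is complete, which is the cost of supporting a "column missing" check in this scheme.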
