This sounds like a good initial approach for a new filter interface. +1 on moving forward with what you propose, allowing for modifications as we reimplement and integrate.
Good stuff, Ryan!

JG

On Tue, May 26, 2009 11:28 pm, Ryan Rawson wrote:
> Hi all,
>
> With HBASE-1304, it's time to normalize and review our filter API.
>
> Here are a few givens:
> - all calls must be byte[], int offset, int length
> - maybe we can have calls for KeyValue (which encodes all parts of the
>   key & value, as per the name)
> - we'd like to get rid of the calls:
> -- boolean filterRow(final SortedMap<byte [], Cell> columns);
> -- boolean filterRow(final List<KeyValue> results);
> These calls are expensive, and there is no reason to have them.
>
> Here is a proposal. Imagine a filter will see this sequence of calls:
> - reset()
> - filterRowKey(byte[],int,int) - true to include row, false to skip to
>   next row
> - filterKeyValue(KeyValue) - true to include key/value, false to skip
> -- can choose to filter on family, qualifier, value, anything really.
> - filterRow() - true to include entire row, false to post-hoc veto row
>
> In this case one could implement the "filterIfColumnMissing" feature of
> ColumnValueFilter by carrying state and returning false from filterRow()
> to veto the row based on the columns/values we didn't see.
>
> In any of these cases, all these functions will be called quite
> frequently, so efficiency of the code is paramount. It's probable that
> filterRowKey() will be 'cached' by the calling code, but filterKeyValue()
> is called for nearly every single value we would normally return (i.e.
> it's applied _AFTER_ column matching and version and timestamp and delete
> tracking).
>
> The goal is to:
> (a) make the implementation easy and performant
> (b) make the API normative and easy to code for
> (c) make everything work
>
> Thoughts?
> -ryan
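
To make the proposed call sequence concrete, here is a rough sketch of how the "filterIfColumnMissing" case might look under it. This is only an illustration, not the actual HBase API: SketchKeyValue, SketchFilter, RequireColumnFilter and FilterDriver are hypothetical names made up for this example, and SketchKeyValue is a tiny stand-in for the real KeyValue class. The return values follow the convention in the proposal (true to include, false to skip or veto).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for KeyValue; holds only what this sketch needs. */
final class SketchKeyValue {
    final byte[] row;
    final byte[] qualifier;
    final byte[] value;

    SketchKeyValue(String row, String qualifier, String value) {
        this.row = row.getBytes(StandardCharsets.UTF_8);
        this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
        this.value = value.getBytes(StandardCharsets.UTF_8);
    }
}

/** Hypothetical interface mirroring the proposed sequence of calls. */
interface SketchFilter {
    /** Called before each row so the filter can clear per-row state. */
    void reset();

    /** True to include the row, false to skip to the next row. */
    boolean filterRowKey(byte[] buffer, int offset, int length);

    /** True to include this key/value, false to skip it. */
    boolean filterKeyValue(SketchKeyValue kv);

    /** True to include the entire row, false to veto it after the fact. */
    boolean filterRow();
}

/** Carries state across a row and vetoes rows missing a required column. */
final class RequireColumnFilter implements SketchFilter {
    private final byte[] requiredQualifier;
    private boolean sawRequiredColumn;

    RequireColumnFilter(String requiredQualifier) {
        this.requiredQualifier = requiredQualifier.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void reset() {
        sawRequiredColumn = false;
    }

    @Override
    public boolean filterRowKey(byte[] buffer, int offset, int length) {
        return true; // this filter never rejects on row key alone
    }

    @Override
    public boolean filterKeyValue(SketchKeyValue kv) {
        if (Arrays.equals(kv.qualifier, requiredQualifier)) {
            sawRequiredColumn = true; // remember it for filterRow()
        }
        return true; // keep every key/value; the veto happens per row
    }

    @Override
    public boolean filterRow() {
        return sawRequiredColumn; // false vetoes rows lacking the column
    }
}

/** Hypothetical caller that applies a filter to one row in the proposed order. */
final class FilterDriver {
    static List<SketchKeyValue> scanRow(SketchFilter filter, byte[] rowKey,
                                        List<SketchKeyValue> keyValues) {
        filter.reset();
        if (!filter.filterRowKey(rowKey, 0, rowKey.length)) {
            return Collections.emptyList();
        }
        List<SketchKeyValue> kept = new ArrayList<>();
        for (SketchKeyValue kv : keyValues) {
            if (filter.filterKeyValue(kv)) {
                kept.add(kv);
            }
        }
        return filter.filterRow() ? kept : Collections.<SketchKeyValue>emptyList();
    }
}
```

Note that the filter here does no allocation per key/value, which matters given how often filterKeyValue() would be called. The driver is where a real implementation could cache the filterRowKey() result, as the proposal suggests.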
