On Thu, Dec 3, 2015 at 12:54 PM, Jerry He <[email protected]> wrote:

> Thanks. Stack.
> I will look into the code more as well.
> Do you think Column only Bloom Filter will help more with this SCAN +
> explicit columns case and with space saving?
>

Come again Jerry. Column-only? (It has to have a row on it, right?). And
how do we get space savings?

There is a bloom at the start of every row already, to speed deletes. IIRC,
we always read this first before we do anything. Perhaps we could beef it up
with more than just delete?

St.Ack

> Jerry
>
> On Thu, Dec 3, 2015 at 9:01 AM, Stack <[email protected]> wrote:
>
> > On Wed, Dec 2, 2015 at 10:01 PM, Jerry He <[email protected]> wrote:
> >
> > > Thanks for the response. You got my question correctly.
> > > If we are scanning the rows one by one and we have the requested
> > > column in the column tracker, we have the row+column to look up in
> > > the bloom filter, don't we? We may not be able to filter out the file
> > > scanners upfront. But may at the later time and lower level to skip
> > > something?
> > >
> >
> > <I've not looked at the code>You are right. If more than one explicit
> > column specified, we could do a bloom check for the second and so on
> > since we'd have the current row to hand. It could make for a nice
> > speedup for scans of many explicit columns traversing a dataset that is
> > sparsely populated.</I've not looked at the code>.
> >
> > St.Ack
> >
> > > Jerry
> > >
> > > On Mon, Nov 30, 2015 at 10:55 PM, Stack <[email protected]> wrote:
> > >
> > > > On Mon, Nov 30, 2015 at 9:56 AM, Jerry He <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi, experts
> > > > >
> > > > > HBASE supports ROWCOL bloom filter. ROW+COL would be the bloom
> > > > > key. In most of the documentations, it says only GET would
> > > > > benefit. For multi-column as well.
> > > > >
> > > > > If I do scan with StartRow and EndRow, and also specify columns.
> > > > > Would ROWCOL bloom filter provide any benefit in anyway?
> > > > >
> > > >
> > > > If I understand your question properly, the answer is no. While we
> > > > might have a set of columns to check in the bloom, we'd not know the
> > > > set of rows between start and end row and so would not be able to
> > > > formulate a query against the ROW+COL bloom filter.
> > > >
> > > > St.Ack
> > > >
> > > > > Thank you.
> > > > >
> > > > > Jerry
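
To make the point in the thread concrete, here is a toy sketch in Python (not HBase code). It assumes a made-up bloom key of row + zero byte + column and an in-memory dict standing in for a store file; `ToyBloom`, `rowcol_key`, `build_file` and `scan` are all hypothetical names. It shows why the bloom cannot be queried up front for a range scan (the rows in the range are not known), and how a per-row probe becomes possible once the scan has the current row to hand.

```python
import hashlib


class ToyBloom:
    """Tiny Bloom filter; illustrative only, not HBase's implementation."""

    def __init__(self, num_bits=4096, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(key))


def rowcol_key(row, col):
    # Assumed key layout for this sketch: row bytes, zero separator, column bytes.
    return row + b"\x00" + col


def build_file(cells):
    """cells maps (row, col) -> value. Returns the cells plus a ROW+COL bloom."""
    bloom = ToyBloom()
    for row, col in cells:
        bloom.add(rowcol_key(row, col))
    return cells, bloom


def scan(cells, bloom, start_row, stop_row, columns):
    """Scan [start_row, stop_row) for explicit columns, probing the bloom per row."""
    rows = sorted({r for r, _ in cells if start_row <= r < stop_row})
    results, skipped = [], 0
    for row in rows:  # the row is only known once the scan reaches it
        for col in columns:
            if not bloom.might_contain(rowcol_key(row, col)):
                skipped += 1  # definitely absent, so no lookup for this cell
                continue
            if (row, col) in cells:  # a bloom hit can still be a false positive
                results.append((row, col, cells[(row, col)]))
    return results, skipped


# Sparse data: each row holds only one of the two requested columns.
cells = {(b"r1", b"a"): 1, (b"r2", b"b"): 2, (b"r3", b"a"): 3}
store, bloom = build_file(cells)
results, skipped = scan(store, bloom, b"r1", b"r4", [b"a", b"b"])
```

In this toy, `skipped` counts the row+column probes the bloom ruled out. Whether a real scanner would save anything comparable depends on how the region server seeks within a store file, which this sketch does not model.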
