Sure, Michael! I have a post table with a "keyphrases" column that is sparsely populated: most posts do not have keyphrases. The requirement is that for any query against our index, which may return many results (usually O(1000)), I need to quickly retrieve the keyphrases for every post that matches the query. The information has to come back in a few seconds at most, so I do not want to issue thousands of calls to HBase. Instead I am building a bloom filter that tests whether a post has keyphrases, and I access HBase only if it answers in the affirmative. Given that < 1% of posts have keyphrases, this cuts the access time by roughly 2 orders of magnitude.
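To make the idea concrete, here is a rough sketch in Python rather than against the real HBase client. Everything named here is an assumption for illustration: `BloomFilter`, `build_filter`, `keyphrases_for_results`, the 1% false-positive target, and the `fetch_keyphrases` callback, which stands in for a single-row HBase lookup. The `scan_rows` iterable stands in for the one-time table scan.

```python
import hashlib
import math


class BloomFilter:
    """Minimal bloom filter over string keys (illustrative, not tuned)."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        n = max(1, expected_items)
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def build_filter(scan_rows, expected_items):
    """One-time pass over (post_id, keyphrases) pairs, e.g. at server startup."""
    bloom = BloomFilter(expected_items)
    for post_id, keyphrases in scan_rows:
        if keyphrases:
            bloom.add(post_id)
    return bloom


def keyphrases_for_results(result_ids, bloom, fetch_keyphrases):
    """Call the (hypothetical) store lookup only for posts the filter lets through."""
    found = {}
    for post_id in result_ids:
        if post_id in bloom:
            phrases = fetch_keyphrases(post_id)
            if phrases:  # a false positive simply yields nothing
                found[post_id] = phrases
    return found
```

A bloom filter never gives a false negative, so no post with keyphrases is skipped. The only cost of a false positive is one wasted lookup.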
On Mon, Feb 8, 2010 at 4:07 PM, Stack <[email protected]> wrote:

> On Mon, Feb 8, 2010 at 3:04 PM, Boris Aleksandrovsky <[email protected]>
> wrote:
> > Thanks. This is a one-time scan (per server runtime) in order to build
> > bloomfilters to speed up access to that table; so definitely not in the
> > query runtime :-)
>
> Can you say more about the above project of yours Boris? It sounds
> interesting.
> St.Ack

--
Thanks,
Boris
http://twitter.com/baleksan
http://www.linkedin.com/in/baleksan
