RE: How HBase perform per-column scan?

From: Liu, Raymond
Sent: Sun, 10 Mar 2013 22:13:25 -0700

Hmm, I don't mean querying the bloom filter directly. I mean that the StoreFileScanner
will query the ROWCOL bloom filter to see whether it needs a seek or not. And I guess
this would be performed on every row, without needing to specify the row keys?
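Raymond's reading can be made concrete with a toy model in plain Java (no HBase classes; every name below is hypothetical). It assumes a scan that has explicitly asked for one qualifier, so that for each row it reaches, the scanner consults each store file's ROWCOL bloom filter before seeking to (row, qualifier). An exact `Set` stands in for the bloom filter. The model shows that the check can save seeks into individual files, but it is keyed on a row the scanner has already reached, so every row is still visited.

```java
import java.util.*;

// Toy model of a per-row ROWCOL check during a scan; all names are hypothetical.
class RowColCheckModel {
    // Stand-in for one store file: the sorted row keys it holds, plus an exact
    // set of "row:qualifier" pairs (a real bloom filter may give false positives).
    record StoreFile(SortedSet<String> rows, Set<String> rowColBloom) {}

    // Counters collected by the simulated scan.
    record ScanStats(int rowsVisited, int seeksSkipped, List<String> matches) {}

    static ScanStats scanForQualifier(List<StoreFile> files, String qualifier) {
        // Rows are discovered from the files themselves, merged in sorted order.
        SortedSet<String> allRows = new TreeSet<>();
        for (StoreFile f : files) allRows.addAll(f.rows());

        int visited = 0;
        int skipped = 0;
        List<String> matches = new ArrayList<>();
        for (String row : allRows) {
            visited++;
            boolean found = false;
            for (StoreFile f : files) {
                if (!f.rowColBloom().contains(row + ":" + qualifier)) {
                    skipped++;      // "definitely absent": no seek into this file
                } else {
                    found = true;   // would seek to (row, qualifier) in this file
                }
            }
            if (found) matches.add(row);
        }
        return new ScanStats(visited, skipped, matches);
    }
}
```

With one matching row out of four, `rowsVisited` is still 4: the filter trims seeks, not the walk over rows.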



> ROWCOL bloom says whether, for a given row (rowkey), a given column (qualifier)
> is present in an HFile or not. But the user doesn't know the rowkeys. He
> wants all the rows with column 'x'.
> 
> -Anoop-
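Anoop's point follows from what a ROWCOL bloom filter stores. Below is a minimal bloom filter sketch in plain Java, keyed on row and qualifier together. It is not HBase's implementation: the bit-array size, the number of hash functions and the double-hashing scheme are all assumptions for illustration. Its only query takes both a rowkey and a qualifier, so "all rows with column 'x'" cannot be asked of it without already having a list of candidate rowkeys.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.zip.CRC32;

// Minimal ROWCOL-style bloom filter sketch; sizes and hashing are illustrative only.
class RowColBloom {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    RowColBloom(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // The key is the row and the qualifier together, so both are needed to query.
    private static byte[] key(String row, String qualifier) {
        return (row + "\u0000" + qualifier).getBytes(StandardCharsets.UTF_8);
    }

    // Double hashing: index_i = h1 + i * h2 (mod numBits).
    private int[] indexes(byte[] key) {
        int h1 = java.util.Arrays.hashCode(key);
        CRC32 crc = new CRC32();
        crc.update(key);
        int h2 = (int) crc.getValue();   // second hash for double hashing
        int[] idx = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            idx[i] = Math.floorMod(h1 + i * h2, numBits);
        }
        return idx;
    }

    void add(String row, String qualifier) {
        for (int i : indexes(key(row, qualifier))) bits.set(i);
    }

    // false => definitely absent; true => possibly present (false positives allowed).
    boolean mightContain(String row, String qualifier) {
        for (int i : indexes(key(row, qualifier))) {
            if (!bits.get(i)) return false;
        }
        return true;
    }
}
```

To answer Yun's question with such a filter, a caller would still have to loop over every rowkey and call `mightContain(row, "x")`, which amounts to the full scan Anoop describes.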
> 
> ________________________________________
> From: Liu, Raymond [raymond....@intel.com]
> Sent: Monday, March 11, 2013 7:43 AM
> To: user@hbase.apache.org
> Subject: RE: How HBase perform per-column scan?
> 
> Just curious, wouldn't a ROWCOL bloom filter work for this case?
> 
> Best Regards,
> Raymond Liu
> 
> >
> > As said above, you will need a full table scan on that CF.
> > As Ted said, consider taking a look at your schema design.
> >
> > -Anoop-
> >
> >
> > On Sun, Mar 10, 2013 at 8:10 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > bq. physically column family should be able to perform efficiently (storage layer
> > >
> > > When you scan a row, data for different column families would be
> > > brought into memory (if you don't utilize HBASE-5416). Take a look at:
> > >
> > > https://issues.apache.org/jira/browse/HBASE-5416?focusedCommentId=13541258&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13541258
> > >
> > > which was based on the settings described in:
> > >
> > > https://issues.apache.org/jira/browse/HBASE-5416?focusedCommentId=13541191&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13541191
> > >
> > > This boils down to your schema design. If possible, consider
> > > extracting column C into its own column family.
> > >
> > > Cheers
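A toy model of why Ted's suggestion helps, assuming (as Yun notes) that each column family is stored separately: if C lives in its own family, a scan restricted to that family only reads that family's store, which holds cells for just the few rows that have C. This is plain Java with hypothetical names, not HBase's storage code.

```java
import java.util.*;

// Toy model: one sorted store per column family (names are hypothetical).
class FamilyStores {
    // family -> (row -> (qualifier -> value)), rows kept sorted like an HFile.
    private final Map<String, TreeMap<String, TreeMap<String, String>>> stores = new HashMap<>();

    void put(String row, String family, String qualifier, String value) {
        stores.computeIfAbsent(family, f -> new TreeMap<>())
              .computeIfAbsent(row, r -> new TreeMap<>())
              .put(qualifier, value);
    }

    // Scans a single family and returns how many cells were read from its store.
    int cellsReadScanningFamily(String family) {
        int cells = 0;
        for (TreeMap<String, String> columns : stores.getOrDefault(family, new TreeMap<>()).values()) {
            cells += columns.size();
        }
        return cells;
    }

    // Rows that have at least one cell in the given family, in sorted order.
    List<String> rowsWithFamily(String family) {
        return new ArrayList<>(stores.getOrDefault(family, new TreeMap<>()).keySet());
    }
}
```

With 1,000 rows carrying two cells each in family `d` and a single row carrying C in family `c`, scanning `c` reads 1 cell while scanning `d` reads 2,000.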
> > >
> > > On Sun, Mar 10, 2013 at 7:14 AM, PG <pengyunm...@gmail.com> wrote:
> > >
> > > > Hi, Ted and Anoop, thanks for your notes.
> > > > I am talking about columns rather than column families, since
> > > > physically column family should be able to perform efficiently
> > > > (storage layer, CFs are stored separately). But columns of the
> > > > same column family may be mixed physically, and that makes
> > > > filtering on a column value hard... So I want to know whether there
> > > > is any mechanism in HBase that works for this...
> > > > Regards,
> > > > Yun
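Yun's concern can be shown with another toy model (hypothetical names, not HBase code). Within one column family, cells are kept sorted by row and then by qualifier, so the cells of column C sit between the cells of the other columns of each row, and a scan for C in that family has to step past all of them. The key layout here is simplified to row, a separator, then qualifier.

```java
import java.util.*;

// Toy model of one column family's sorted cells (layout simplified; names hypothetical).
class SingleFamilyStore {
    private static final char SEP = '\u0000';
    // Keys sort by row first, then qualifier, so the columns of a row are adjacent
    // and different columns are interleaved across rows.
    private final TreeMap<String, String> cells = new TreeMap<>();

    void put(String row, String qualifier, String value) {
        cells.put(row + SEP + qualifier, value);
    }

    // Result of a scan for one qualifier: matching rows and how many cells were examined.
    record Result(List<String> rows, int cellsExamined) {}

    Result scanForQualifier(String qualifier) {
        List<String> rows = new ArrayList<>();
        int examined = 0;
        for (String key : cells.keySet()) {
            examined++;
            int cut = key.indexOf(SEP);
            if (key.substring(cut + 1).equals(qualifier)) {
                rows.add(key.substring(0, cut));
            }
        }
        return new Result(rows, examined);
    }

    // Sorted keys rendered as "row/qualifier", to show the interleaving.
    List<String> layout() {
        List<String> out = new ArrayList<>();
        for (String key : cells.keySet()) out.add(key.replace(SEP, '/'));
        return out;
    }
}
```

For three rows where only `r2` has C, the layout is `r1/A, r1/B, r2/A, r2/B, r2/C, r3/A`, and the scan examines all six cells to return `r2`.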
> > > >
> > > > On Mar 10, 2013, at 10:01 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > > > Hi, Yun:
> > > > > Take a look at HBASE-5416 (Improve performance of scans with some
> > > > > kind of filters), which is in the 0.94.5 release.
> > > > >
> > > > > In your case, you can use a filter which specifies column C as
> > > > > the essential family.
> > > > > Here I interpret column C as a column family.
> > > > >
> > > > > Cheers
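A rough model of the idea behind HBASE-5416 as Ted describes it, in plain Java rather than the HBase client API. All names are hypothetical, and the real region scanner is considerably more involved. For each row, only the essential family (the one the filter reads) is loaded first; the remaining families are loaded only for rows the filter accepts. The cell counts show why this pays off when few rows have C.

```java
import java.util.*;
import java.util.function.Predicate;

// Toy model of a scan with an "essential" column family (hypothetical names).
class EssentialFamilyScan {
    // A row: its key plus, per family, the number of cells stored in that row.
    record Row(String key, Map<String, Integer> cellsPerFamily) {}

    record Outcome(List<String> matchedRows, int cellsLoaded) {}

    static Outcome scan(List<Row> rows, String essentialFamily,
                        Predicate<Row> filter, boolean loadOnDemand) {
        List<String> matched = new ArrayList<>();
        int loaded = 0;
        for (Row row : rows) {
            if (loadOnDemand) {
                // Phase 1: read only the essential family and evaluate the filter.
                loaded += row.cellsPerFamily().getOrDefault(essentialFamily, 0);
                if (!filter.test(row)) continue;
                // Phase 2: the row matched, so read the other families too.
                for (Map.Entry<String, Integer> e : row.cellsPerFamily().entrySet()) {
                    if (!e.getKey().equals(essentialFamily)) loaded += e.getValue();
                }
            } else {
                // Without on-demand loading, every family of every row is read.
                for (int n : row.cellsPerFamily().values()) loaded += n;
                if (!filter.test(row)) continue;
            }
            matched.add(row.key());
        }
        return new Outcome(matched, loaded);
    }
}
```

With 100 rows holding 10 cells each in family `d` and one row also holding C in family `c`, the on-demand scan loads 11 cells and the eager scan 1,001, with the same single match. In the real client this roughly corresponds to a filter on family C (for example `SingleColumnValueFilter`) plus on-demand family loading (`Scan#setLoadColumnFamiliesOnDemand` in releases that carry HBASE-5416); check the API docs of your release before relying on those names.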
> > > > >
> > > > > On Sat, Mar 9, 2013 at 11:11 AM, yun peng <pengyunm...@gmail.com> wrote:
> > > > >
> > > > >> Hi, All,
> > > > >> I want to find all existing values for a given column in an HBase
> > > > >> table. Would that result in a full-table scan in HBase? For
> > > > >> example, given a column C, the table has a very large number of
> > > > >> rows, of which only a few (say, only 1 row) have non-empty values
> > > > >> for column C. Would HBase still use a full table scan to find this
> > > > >> row? Or does HBase have any optimization for this kind of query?
> > > > >> Thanks...
> > > > >> Regards
> > > > >> Yun
> > > > >>
> > > >
> > >