Hi Vincent,

What I did was also add a custom getSplits() implementation to the TableInputFormat. When the splits are determined, I mask out the regions that contain no key of interest. Since the region start and end keys form a total order, I can safely assume that if I only scan the last few thousand entries, I can skip the regions before them. Of course, if your keys are completely random, or the rows of interest are spread across every region, then this is futile.
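The region-masking idea could be sketched roughly like this - a self-contained Java sketch with a hypothetical Region record standing in for the real split/region metadata (the actual TableInputFormat API is different); it just drops sorted regions whose key range ends before the first key of interest:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitMask {
    // Hypothetical stand-in for one region/split: key range [startKey, endKey)
    record Region(String startKey, String endKey) {}

    // Keep only regions that may contain keys >= firstKeyOfInterest.
    // Regions are assumed sorted by start key, forming a total order.
    static List<Region> maskSplits(List<Region> regions, String firstKeyOfInterest) {
        List<Region> kept = new ArrayList<>();
        for (Region r : regions) {
            // By HBase convention an empty end key marks the last region.
            boolean lastRegion = r.endKey().isEmpty();
            if (lastRegion || r.endKey().compareTo(firstKeyOfInterest) > 0) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
            new Region("", "b"), new Region("b", "m"), new Region("m", ""));
        // Only keys >= "k" are of interest: the first region can be skipped.
        System.out.println(maskSplits(regions, "k").size()); // prints 2
    }
}
```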

Lars

Vincent Poon (vinpoon) wrote:
Thanks for the reply.  I have been using ColumnValueFilter, but was
wondering if there was a faster solution, as it seems ColumnValueFilter
must apply the filter across the entire row range (in my case I need to
scan the entire table, with millions of rows).  I also tried using
indirect queries - scanning down Col A and then using the rowIds to get
the cells under Col B.  This works until the number of values under
Col A becomes very large.

Vincent
-----Original Message-----
From: Ryan Rawson [mailto:ryano...@gmail.com]
Sent: Thursday, April 09, 2009 6:34 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Scan across multiple columns

Check out the org.apache.hadoop.hbase.filter package.  The
ColumnValueFilter might be of help specifically.

The other solution is to do it client side.
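Doing it client side could look roughly like this - a self-contained Java sketch where plain maps stand in for scan results (the column and row names are just the ones from the example below, and the intersection approach is one possible technique, not a specific HBase API):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ClientSideFilter {
    // Keep only the row ids that have a value in every required column.
    // Each map is column -> (rowId -> cell value), standing in for one scan.
    static Set<String> rowsWithAllColumns(Map<String, Map<String, String>> columns) {
        Set<String> rows = null;
        for (Map<String, String> cells : columns.values()) {
            if (rows == null) {
                rows = new TreeSet<>(cells.keySet());   // start from the first column
            } else {
                rows.retainAll(cells.keySet());         // intersect with each other column
            }
        }
        return rows == null ? Set.of() : rows;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        table.put("colA", Map.of("row1", "x", "row3", "x"));
        table.put("colB", Map.of("row1", "x", "row2", "x", "row3", "x"));
        System.out.println(rowsWithAllColumns(table)); // prints [row1, row3]
    }
}
```

The intersection shrinks as each column is processed, so ordering the columns from sparsest to densest keeps the working set small.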

-ryan

On Thu, Apr 9, 2009 at 2:45 PM, Vincent Poon (vinpoon)
<vinp...@cisco.com>wrote:

Say I want to scan down a table that looks like this:

         Col A    Col B
row1       x        x
row2                x
row3       x        x

Normally a scanner would return all three rows, but what's the best way to scan so that only row1 and row3 are returned, i.e. only the rows with data in both columns?

Thanks,
Vincent
