Re: why does the scanner api have only startRow and stopRow and not also a count? was: Improving HBase scanner

Gary Helmling Tue, 04 May 2010 09:10:57 -0700

You can always add a PageFilter to your Scan instance to achieve this:
http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/filter/PageFilter.html


Just be aware that you should still count rows on the client side if you
want to strictly enforce a given limit.  Since the filter is applied
independently on each region server, the client can still receive more
rows than the page size.
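
To illustrate why the client-side count is still needed, here is a rough
sketch in plain Java. It does not use any HBase classes: scanRegion,
scanAllRegions and takeAtMost are hypothetical helpers that only simulate
the behaviour described above, assuming each region applies the page limit
on its own and the client sees the regions' results one after another. In
real code the filter would be attached with something like
scan.setFilter(new PageFilter(pageSize)) and the loop would run over the
ResultScanner instead of a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of per-region page limits; no HBase classes are used.
final class PageLimitSketch {

    // Simulates one region applying the page limit on its own:
    // it hands back at most pageSize of the rows it hosts.
    static List<String> scanRegion(List<String> regionRows, int pageSize) {
        int end = Math.min(pageSize, regionRows.size());
        return new ArrayList<>(regionRows.subList(0, end));
    }

    // Simulates what the client receives: each region's limited result,
    // concatenated in region order.
    static List<String> scanAllRegions(List<List<String>> regions, int pageSize) {
        List<String> received = new ArrayList<>();
        for (List<String> region : regions) {
            received.addAll(scanRegion(region, pageSize));
        }
        return received;
    }

    // Client-side count: stop consuming once 'limit' rows have been taken.
    static List<String> takeAtMost(Iterable<String> rows, int limit) {
        List<String> page = new ArrayList<>();
        for (String row : rows) {
            if (page.size() >= limit) {
                break; // strict limit reached, ignore the rest
            }
            page.add(row);
        }
        return page;
    }

    public static void main(String[] args) {
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("row-a1", "row-a2", "row-a3"),
                Arrays.asList("row-b1", "row-b2", "row-b3"));
        int pageSize = 2;

        List<String> received = scanAllRegions(regions, pageSize);
        System.out.println(received.size());                // prints 4
        System.out.println(takeAtMost(received, pageSize)); // prints [row-a1, row-a2]
    }
}
```

With two regions and a page size of 2, the simulated client gets 4 rows
back, so the loop has to stop on its own after the second one.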

--gh


On Tue, May 4, 2010 at 12:02 PM, TuX RaceR <tuxrace...@gmail.com> wrote:

> Hi HBase users,
>
> A question related to the previous one: if we want to limit the amount of
> data retrieved by a scanner, can we tell it to stop scanning once a given
> number of rows is reached?
> If I look at another KV store (Cassandra), the equivalent of the scan API
> there uses a KeyRange object, see
> http://wiki.apache.org/cassandra/API
>
> Attribute    Type    Default  Required  Description
> -----------  ------  -------  --------  ---------------------------------------------------
> start_key    string  n/a      N         The first key in the inclusive KeyRange.
> end_key      string  n/a      N         The last key in the inclusive KeyRange.
> start_token  string  n/a      N         The first token in the exclusive KeyRange.
> end_token    string  n/a      N         The last token in the exclusive KeyRange.
> count        i32     100      Y         The total number of keys to permit in the KeyRange.
>
>
>     Would it be useful (performance-wise) to have a 'count' parameter
>     too, or would it be redundant, being equivalent to ending the scan
>     loop on the application side once the desired number of rows is reached?
>
>
>
>     Thanks
>
>
>     TuX
>
>
>
