Re: why does the scanner api have only startRow and stopRow and not also a count? was: Improving HBase scanner

Ryan Rawson Tue, 04 May 2010 21:11:55 -0700

Also, HBase is not a key-value store. With an ordered index you can
retrieve successive rows, whereas true key-value stores don't promise
any relation to the 'next' key, if there is even such an option.  E.g.:
Berkeley DB.  And yes, I know about OrderPreservingPartitioner, but
I've also heard it's a bad idea to use it :-)
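
To make the ordered-index point concrete, here is a minimal sketch that uses a plain java.util.TreeMap as a stand-in for a sorted table. The class and method names (OrderedScanSketch, scan) are hypothetical and this is not the HBase client API; it only illustrates that a sorted index makes "the next N rows from startRow" a well-defined request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for a table with a sorted row index (NOT the HBase API).
class OrderedScanSketch {

    // Return up to `count` row keys that are >= startRow, in key order.
    static List<String> scan(TreeMap<String, String> table, String startRow, int count) {
        List<String> keys = new ArrayList<>();
        // tailMap(startRow, true) is a sorted view starting at startRow (inclusive).
        for (Map.Entry<String, String> e : table.tailMap(startRow, true).entrySet()) {
            if (keys.size() >= count) {
                break; // stop as soon as the requested number of rows is reached
            }
            keys.add(e.getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("a", "1");
        table.put("c", "2");
        table.put("e", "3");
        table.put("g", "4");
        // "b" is not a stored key, yet the scan still has a defined starting point.
        System.out.println(scan(table, "b", 2));
    }
}
```

A hash-based map offers no equivalent of tailMap, which is the distinction being drawn here.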


-ryan

On Tue, May 4, 2010 at 10:18 AM, TuX RaceR <[email protected]> wrote:
> Thanks a lot Gary: I had missed this one
> cheers
> TuX
>
>
> Gary Helmling wrote:
>>
>> You can always add a PageFilter to your Scan instance to achieve this:
>>
>> http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/filter/PageFilter.html
>>
>> Just be aware that you should still count on the client side if you want
>> to strictly limit the result to a given size.  Since the filter is applied
>> independently on each regionserver, the client can still receive more
>> items than the page size.
>>
>> --gh
>>
>>
>> On Tue, May 4, 2010 at 12:02 PM, TuX RaceR <[email protected]> wrote:
>>
>>
>>>
>>> Hi HBase users,
>>>
>>> question related to the previous one: if we want to limit the amount of
>>> data retrieved by a scanner, can we tell it to stop scanning once a given
>>> number of rows is reached?
>>> If I look at another KV store (Cassandra), the equivalent of the scan API
>>> uses a KeyRange object, see
>>> http://wiki.apache.org/cassandra/API
>>>
>>> Attribute    Type    Default  Required  Description
>>> -----------  ------  -------  --------  ---------------------------------------------------
>>> start_key    string  n/a      N         The first key in the inclusive KeyRange.
>>> end_key      string  n/a      N         The last key in the inclusive KeyRange.
>>> start_token  string  n/a      N         The first token in the exclusive KeyRange.
>>> end_token    string  n/a      N         The last token in the exclusive KeyRange.
>>> count        i32     100      Y         The total number of keys to permit in the KeyRange.
>>>
>>>
>>>    Would it be useful (performance-wise) to have a 'count' parameter
>>>    too, or would it be useless, being equivalent to ending the scan loop
>>>    on the application side once the desired number of rows is reached?
>>>
>>>
>>>
>>>    Thanks
>>>
>>>
>>>    TuX
>>>
>>>
>>>
>>>
>>
>>
>
>
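
Gary's caveat about PageFilter can be illustrated with a small simulation. In the real client the filter would be attached with something like scan.setFilter(new PageFilter(pageSize)); the sketch below does not call HBase at all. It models each region as a sorted list, applies the page limit to each region independently, and shows why the client still needs its own counter. All names here (PageLimitSketch, scanRegion, scanAllRegions, scanWithClientLimit) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a per-region page limit (NOT the HBase API).
class PageLimitSketch {

    // "Server side": a region applies the page limit on its own,
    // with no knowledge of what other regions have already returned.
    static List<String> scanRegion(List<String> regionRows, int pageSize) {
        int end = Math.min(pageSize, regionRows.size());
        return new ArrayList<>(regionRows.subList(0, end));
    }

    // Client that trusts the filter alone: it collects whatever every region sends.
    static List<String> scanAllRegions(List<List<String>> regions, int pageSize) {
        List<String> out = new ArrayList<>();
        for (List<String> region : regions) {
            out.addAll(scanRegion(region, pageSize));
        }
        return out;
    }

    // Client that also counts: it stops once pageSize rows have been collected.
    static List<String> scanWithClientLimit(List<List<String>> regions, int pageSize) {
        List<String> out = new ArrayList<>();
        for (List<String> region : regions) {
            for (String row : scanRegion(region, pageSize)) {
                if (out.size() >= pageSize) {
                    return out;
                }
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("row01", "row02", "row03"),
                Arrays.asList("row04", "row05", "row06"),
                Arrays.asList("row07", "row08"));
        // Each of the three regions contributes up to 2 rows.
        System.out.println(scanAllRegions(regions, 2).size());
        // The client-side counter enforces the strict limit.
        System.out.println(scanWithClientLimit(regions, 2).size());
    }
}
```

With three regions and a page size of 2, the filter-only client ends up with 6 rows while the counting client ends up with 2, which is the over-delivery described above.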
