And on the Scan, as I wrote in my answer, which is really convenient. Not convinced on using bytes as a value for caching... it would also be more complicated.
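For example, a minimal sketch of both knobs discussed below, assuming the 0.20-era client API (the table name and caching values are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Scan;

    public class CachingExample {
      public static void main(String[] args) throws IOException {
        // "mytable" is an illustrative table name.
        HTable table = new HTable(new HBaseConfiguration(), "mytable");

        // Per-table default: scanners opened from this HTable fetch
        // 100 rows per next() round trip.
        table.setScannerCaching(100);

        // Per-scan override: this one scan fetches 500 rows at a time,
        // regardless of the table-level setting.
        Scan scan = new Scan();
        scan.setCaching(500);
      }
    }
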
J-D

On Fri, Nov 20, 2009 at 1:45 PM, Ryan Rawson <[email protected]> wrote:
> You can set it on a per-HTable basis: HTable.setScannerCaching(int);
>
> On Fri, Nov 20, 2009 at 1:43 PM, Dave Latham <[email protected]> wrote:
>> I have some tables with large rows and some tables with very small rows, so
>> I keep my default scanner caching at 1 row, but have to remember to set it
>> higher when scanning tables with smaller rows. It would be nice to have a
>> default that did something reasonable across tables.
>>
>> Would it make sense to set scanner caching as a count of bytes rather than
>> a count of rows? That would make it similar to the write buffer for batches
>> of puts, which gets flushed based on size rather than a fixed number of
>> Puts. Then there could be some default value which should provide decent
>> performance out of the box.
>>
>> Dave
>>
>> On Fri, Nov 20, 2009 at 12:35 PM, Gary Helmling <[email protected]> wrote:
>>
>>> To set this per scan you should be able to do:
>>>
>>> Scan s = new Scan();
>>> s.setCaching(...);
>>>
>>> (I think this works anyway)
>>>
>>> The other thing that I've found useful is using a PageFilter on scans:
>>>
>>> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/filter/PageFilter.html
>>>
>>> I believe this is applied independently on each region server (?) so you
>>> still need to do your own counting while iterating the results, but it can
>>> be used to early out on the server side separately from the scanner
>>> caching value.
>>>
>>> --gh
>>>
>>> On Fri, Nov 20, 2009 at 3:04 PM, stack <[email protected]> wrote:
>>>
>>>> There is this in the configuration:
>>>>
>>>> <property>
>>>>   <name>hbase.client.scanner.caching</name>
>>>>   <value>1</value>
>>>>   <description>Number of rows that will be fetched when calling next
>>>>   on a scanner if it is not served from memory. Higher caching values
>>>>   will enable faster scanners but will eat up more memory and some
>>>>   calls of next may take longer and longer times when the cache is
>>>>   empty.
>>>>   </description>
>>>> </property>
>>>>
>>>> Being able to do it per Scan sounds like something we should add.
>>>>
>>>> St.Ack
>>>>
>>>> On Fri, Nov 20, 2009 at 11:43 AM, Adam Silberstein
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>> Is there a way to specify a limit on the number of returned records for
>>>>> a scan? I don't see any way to do this when building the scan. If there
>>>>> is, that would be great. If not, what about when iterating over the
>>>>> result? If I exit the loop when I reach my limit, will that approximate
>>>>> such a limit? I guess my real question is about how scan is implemented
>>>>> in the client, i.e. how many records are returned from HBase at a time
>>>>> as I iterate through the scan result. If I want 1,000 records and 100
>>>>> get returned at a time, then I'm in good shape. On the other hand, if I
>>>>> want 10 records and get 100 at a time, it's a bit wasteful, though the
>>>>> waste is bounded.
>>>>>
>>>>> Thanks,
>>>>> Adam
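And since Adam asked how the client behaves when you stop early: a minimal sketch combining his early-exit loop with Gary's PageFilter suggestion, assuming the 0.20 client API (the table name, limit, and caching value are illustrative). PageFilter bounds what each region server returns, but the client-side count is still needed because the filter is applied per region:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PageFilter;

    public class LimitedScan {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        long limit = 1000;

        Scan scan = new Scan();
        scan.setCaching(100);                   // rows fetched per next() RPC
        scan.setFilter(new PageFilter(limit));  // server-side early out, per region

        ResultScanner scanner = table.getScanner(scan);
        try {
          long count = 0;
          for (Result r : scanner) {
            // process r ...
            if (++count >= limit) {
              break;                            // client-side cap across regions
            }
          }
        } finally {
          scanner.close();                      // releases the open scanner
        }
      }
    }

With caching at 100 and a limit of 10, the first next() call still pulls 100 rows, so the waste Adam mentions is bounded at one batch.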
