And on the Scan, as I wrote in my answer, which is really convenient. Not convinced on using bytes as a value for caching... it would also be more complicated.
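For example, a minimal sketch of both knobs discussed below, assuming the 0.20-era client API (the table name and caching values are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Scan;

    public class CachingExample {
      public static void main(String[] args) throws IOException {
        // "mytable" is an illustrative table name.
        HTable table = new HTable(new HBaseConfiguration(), "mytable");

        // Per-table default: scanners opened from this HTable fetch
        // 100 rows per next() round trip.
        table.setScannerCaching(100);

        // Per-scan override: this one scan fetches 500 rows at a time,
        // regardless of the table-level setting.
        Scan scan = new Scan();
        scan.setCaching(500);
      }
    }
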
J-D

On Fri, Nov 20, 2009 at 1:45 PM, Ryan Rawson <[email protected]> wrote:
> You can set it on a per-HTable basis: HTable.setScannerCaching(int);
>
> On Fri, Nov 20, 2009 at 1:43 PM, Dave Latham <[email protected]> wrote:
>> I have some tables with large rows and some tables with very small rows, so
>> I keep my default scanner caching at 1 row, but have to remember to set it
>> higher when scanning tables with smaller rows. It would be nice to have a
>> default that did something reasonable across tables.
>>
>> Would it make sense to set scanner caching as a count of bytes rather than
>> a count of rows? That would make it similar to the write buffer for batches
>> of puts, which gets flushed based on size rather than a fixed number of
>> Puts. Then there could be some default value which should provide decent
>> performance out of the box.
>>
>> Dave
>>
>> On Fri, Nov 20, 2009 at 12:35 PM, Gary Helmling <[email protected]> wrote:
>>
>>> To set this per scan you should be able to do:
>>>
>>> Scan s = new Scan();
>>> s.setCaching(...);
>>>
>>> (I think this works anyway)
>>>
>>> The other thing that I've found useful is using a PageFilter on scans:
>>>
>>> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/filter/PageFilter.html
>>>
>>> I believe this is applied independently on each region server (?) so you
>>> still need to do your own counting while iterating the results, but it can
>>> be used to early out on the server side separately from the scanner
>>> caching value.
>>>
>>> --gh
>>>
>>> On Fri, Nov 20, 2009 at 3:04 PM, stack <[email protected]> wrote:
>>>
>>>> There is this in the configuration:
>>>>
>>>> <property>
>>>>   <name>hbase.client.scanner.caching</name>
>>>>   <value>1</value>
>>>>   <description>Number of rows that will be fetched when calling next
>>>>   on a scanner if it is not served from memory. Higher caching values
>>>>   will enable faster scanners but will eat up more memory and some
>>>>   calls of next may take longer and longer times when the cache is
>>>>   empty.
>>>>   </description>
>>>> </property>
>>>>
>>>> Being able to do it per Scan sounds like something we should add.
>>>>
>>>> St.Ack
>>>>
>>>> On Fri, Nov 20, 2009 at 11:43 AM, Adam Silberstein
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>> Is there a way to specify a limit on the number of returned records for
>>>>> a scan? I don't see any way to do this when building the scan. If there
>>>>> is, that would be great. If not, what about when iterating over the
>>>>> result? If I exit the loop when I reach my limit, will that approximate
>>>>> such a limit? I guess my real question is about how scan is implemented
>>>>> in the client, i.e. how many records are returned from HBase at a time
>>>>> as I iterate through the scan result. If I want 1,000 records and 100
>>>>> get returned at a time, then I'm in good shape. On the other hand, if I
>>>>> want 10 records and get 100 at a time, it's a bit wasteful, though the
>>>>> waste is bounded.
>>>>>
>>>>> Thanks,
>>>>> Adam
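And since Adam asked how the client behaves when you stop early: a minimal sketch combining his early-exit loop with Gary's PageFilter suggestion, assuming the 0.20 client API (the table name, limit, and caching value are illustrative). PageFilter bounds what each region server returns, but the client-side count is still needed because the filter is applied per region:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PageFilter;

    public class LimitedScan {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        long limit = 1000;

        Scan scan = new Scan();
        scan.setCaching(100);                   // rows fetched per next() RPC
        scan.setFilter(new PageFilter(limit));  // server-side early out, per region

        ResultScanner scanner = table.getScanner(scan);
        try {
          long count = 0;
          for (Result r : scanner) {
            // process r ...
            if (++count >= limit) {
              break;                            // client-side cap across regions
            }
          }
        } finally {
          scanner.close();                      // releases the open scanner
        }
      }
    }

With caching at 100 and a limit of 10, the first next() call still pulls 100 rows, so the waste Adam mentions is bounded at one batch.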
