The speed gains will be shocking.  Right now you can expect a 5-100x speedup,
and soon it should be more like 10-200x.

I found that 0.19 had a 200ms floor in my tests, and 0.20 so far has blown
past that.  HBASE-1304, still in progress, is showing much promise.  Please
stay tuned!

These are very exciting times for HBase... Soon HBase will have no SPOF apart
from HDFS, and it will be performant as well.

If you are feeling brave, try Hadoop 0.20 and hbase-trunk.  Standard
developer-preview caveats apply: support is somewhat limited, since the code
behind any bug you hit may already be in the middle of a rewrite.

Having said that, I use HBase 0.20-trunk in production.  I'm also a
committer, so YMMV.
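
As an aside, on the scanner-caching suggestion from earlier in this thread:
you can raise the default client-side caching cluster-wide via hbase-site.xml.
This is just a sketch -- verify the property name against the hbase-default.xml
shipped with your release:

```xml
<!-- hbase-site.xml (client side): number of rows fetched per scanner RPC.
     Raising this from the default trades memory for fewer round trips. -->
<property>
  <name>hbase.client.scanner.caching</name>
  <value>100</value>
</property>
```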

Good luck!
-ryan

On Thu, May 28, 2009 at 7:06 PM, Xinan Wu <wuxi...@gmail.com> wrote:

> Ryan,
>
> Thanks for the reply. I tried tweaking scanner caching but did not
> change the speed much. The test I ended up doing was just getScanner()
> and then immediately scanner.close(), without issuing scanner.next()...
>
> Anyway, it's good to know HBase 0.20 may improve the speed. Is slow
> scanner a known issue with hbase < 0.19 too? (I am using 0.19.2/3, but
> am just curious...)
>
> Xinan
>
> On Thu, May 28, 2009 at 6:56 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> > Hi,
> >
> > You should consider setting scanner caching to reduce the number of
> > server round trips.
> >
> > But slow scanners are a known problem with 0.19.  HBase 0.20 aims to fix
> > this substantially.  Shocking speed gains will hopefully be par for the
> > course.
> >
> > -ryan
> >
> > On Thu, May 28, 2009 at 6:47 PM, Xinan Wu <wuxi...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I've been experimenting with row scanning in HBase recently, following
> >> advice from
> >> http://devblog.streamy.com/2009/04/23/hbase-row-key-design-for-paging-limit-offset-queries/
> >>
> >> One thing I notice is htable.getScanner() function call is very slow...
> >>
> >> My table schema is very simple: an integer (as 4 binary bytes) as the
> >> rowKey, and a single column family.
> >>
> >> If I store 100 records in the same row with different columns, I can
> >> get all of them with a single API call, at about 350 requests per
> >> second (but paging would not be very scalable as the number of records
> >> grows).
> >>
> >> If I store 100 records in 100 different rows (with sort-key appended
> >> to rowKey), then I can use scanner to get them (and paging would be
> >> more scalable). However, getScanner() call takes about 60 ms to
> >> return, and subsequent scanner.next() calls are very fast. Overall,
> >> this gives me only 15 requests per second.
> >>
> >> My dev box is ubuntu 8.04 2.4GHz Quad, 4GB mem, pretty typical one.
> >>
> >> Does anyone have experience with slow scanner creation? Any suggestions?
> >>
> >> Xinan
> >>
> >
>
