I set it to 10,000 - the job ran in 44 seconds (compared to 29 seconds reading from HDFS), so a speed-up of 7x or so.
Thanks again,

- Dmitry

On Tue, Mar 16, 2010 at 9:28 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> Out of interest... to what did you set it and what was the speed-up like?
>
> J-D
>
> On Tue, Mar 16, 2010 at 9:26 PM, Dmitry Chechik <dmi...@tellapart.com> wrote:
> > That did it. Thanks!
> >
> > On Tue, Mar 16, 2010 at 9:14 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >> Did you set scanner caching higher?
> >>
> >> J-D
> >>
> >> On Tue, Mar 16, 2010 at 9:10 PM, Dmitry <dmi...@tellapart.com> wrote:
> >> > Hi all,
> >> >
> >> > I'm trying to analyse some issues with HBase performance in a mapreduce.
> >> >
> >> > I'm running a mapreduce which reads a table and just writes it out to HDFS.
> >> > The table is small, roughly ~400M of data and 18M rows.
> >> > I've pre-split the table into 32 regions, so that I'm not running into the
> >> > problem of having one region server serve the entire table.
> >> >
> >> > I'm running an HBase cluster with:
> >> > - 16 region servers (each on the same machine as a Hadoop tasktracker and datanode).
> >> > - 1 master (on the same machine as the Hadoop jobtracker and namenode).
> >> > - Zookeeper quorum of just 1 machine (on the same machine as the master).
> >> >
> >> > I have LZO compression enabled for both HBase and Hadoop.
> >> >
> >> > Running this job takes about 5-6 minutes.
> >> >
> >> > Running a mapreduce reading the exact same set of data from a SequenceFile
> >> > on HDFS takes only about 1 minute.
> >> >
> >> > What else can I do to try to diagnose this?
> >> >
> >> > Thanks,
> >> >
> >> > - Dmitry
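For readers finding this thread later: the fix being discussed is raising scanner caching so each RPC to a region server returns a batch of rows instead of one (the old default). A minimal sketch of the job setup, assuming the HBase 0.20-era MapReduce API - the table name and mapper class here are illustrative, not from the thread:

```java
// Sketch: set scanner caching on the Scan before wiring it into the job.
// Requires the HBase and Hadoop jars on the classpath and a running cluster.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class ScanCachingExample {

  // Illustrative mapper; the thread's job just writes rows back out to HDFS.
  static class ExportMapper extends TableMapper<ImmutableBytesWritable, Result> {
    // map() omitted
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(HBaseConfiguration.create(), "table-export");

    Scan scan = new Scan();
    scan.setCaching(10000);      // fetch 10,000 rows per RPC, as in the thread
    scan.setCacheBlocks(false);  // avoid churning the block cache on a full scan

    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, ExportMapper.class,
        ImmutableBytesWritable.class, Result.class, job);
    // ... configure output format / reducer, then job.waitForCompletion(true);
  }
}
```

The same value can be set cluster-wide via the `hbase.client.scanner.caching` property, but setting it on the Scan keeps the large batch size scoped to this one bulk-read job.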