Out of interest... to what did you set it and what was the speed-up like?

J-D
On Tue, Mar 16, 2010 at 9:26 PM, Dmitry Chechik <dmi...@tellapart.com> wrote:
> That did it. Thanks!
>
> On Tue, Mar 16, 2010 at 9:14 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>
>> Did you set scanner caching higher?
>>
>> J-D
>>
>> On Tue, Mar 16, 2010 at 9:10 PM, Dmitry <dmi...@tellapart.com> wrote:
>> > Hi all,
>> >
>> > I'm trying to analyse some issues with HBase performance in a mapreduce.
>> >
>> > I'm running a mapreduce which reads a table and just writes it out to HDFS.
>> > The table is small, roughly ~400M of data and 18M rows.
>> > I've pre-split the table into 32 regions, so that I'm not running into the
>> > problem of having one region server serve the entire table.
>> >
>> > I'm running an HBase cluster with:
>> > - 16 region servers (each on the same machine as a Hadoop tasktracker and
>> >   datanode).
>> > - 1 master (on the same machine as the Hadoop jobtracker and namenode.)
>> > - Zookeeper quorum of just 1 machine (on the same machine as the master).
>> >
>> > I have LZO compression enabled for both HBase and Hadoop.
>> >
>> > Running this job takes about 5-6 minutes.
>> >
>> > Running a mapreduce reading the exact same set of data from a SequenceFile
>> > on HDFS takes only about 1 minute.
>> >
>> > What else can I do to try to diagnose this?
>> >
>> > Thanks,
>> >
>> > - Dmitry
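
For anyone finding this thread later: the scanner caching J-D asks about controls how many rows each scanner next() call pulls back from a region server, and in the HBase client it can be set per scan with Scan.setCaching(int) or through the hbase.client.scanner.caching property. The sketch below is not HBase code. It is plain Java arithmetic that estimates how many scanner round trips the 18M rows from the original message would need, assuming each call returns up to `caching` rows and ignoring the extra open/close calls per region. The class name, the method name and the values 1, 100 and 1000 are illustrative assumptions; the thread does not say which value was actually used.

```java
// Rough estimate only: models one scanner round trip per batch of `caching` rows.
// It does not talk to HBase and ignores per-region scanner open/close calls.
final class ScannerCachingEstimate {

    /** Round trips needed to stream `rows` rows when each call returns up to `caching` rows. */
    static long estimateRpcs(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 18_000_000L;              // row count quoted in the thread
        int[] cachingValues = {1, 100, 1000}; // hypothetical settings, not from the thread
        for (int caching : cachingValues) {
            System.out.println("caching=" + caching + " -> ~"
                    + estimateRpcs(rows, caching) + " scanner round trips");
        }
    }
}
```

With one row per call the scan needs on the order of 18 million round trips, which would go some way towards explaining a gap against a SequenceFile read; how much of the 5-6 minutes it really accounts for depends on the cluster and is not something this estimate can tell you.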