I set it to 10,000 - the job ran in 44 seconds (compared to 29 seconds reading from HDFS), so a speed-up of 7x or so.
Thanks again,

- Dmitry

On Tue, Mar 16, 2010 at 9:28 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> Out of interest... to what did you set it and what was the speed-up like?
>
> J-D
>
> On Tue, Mar 16, 2010 at 9:26 PM, Dmitry Chechik <dmi...@tellapart.com> wrote:
> > That did it. Thanks!
> >
> > On Tue, Mar 16, 2010 at 9:14 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >> Did you set scanner caching higher?
> >>
> >> J-D
> >>
> >> On Tue, Mar 16, 2010 at 9:10 PM, Dmitry <dmi...@tellapart.com> wrote:
> >> > Hi all,
> >> >
> >> > I'm trying to analyse some issues with HBase performance in a mapreduce.
> >> >
> >> > I'm running a mapreduce which reads a table and just writes it out to HDFS.
> >> > The table is small, roughly ~400M of data and 18M rows.
> >> > I've pre-split the table into 32 regions, so that I'm not running into the
> >> > problem of having one region server serve the entire table.
> >> >
> >> > I'm running an HBase cluster with:
> >> > - 16 region servers (each on the same machine as a Hadoop tasktracker and datanode).
> >> > - 1 master (on the same machine as the Hadoop jobtracker and namenode).
> >> > - Zookeeper quorum of just 1 machine (on the same machine as the master).
> >> >
> >> > I have LZO compression enabled for both HBase and Hadoop.
> >> >
> >> > Running this job takes about 5-6 minutes.
> >> >
> >> > Running a mapreduce reading the exact same set of data from a SequenceFile
> >> > on HDFS takes only about 1 minute.
> >> >
> >> > What else can I do to try to diagnose this?
> >> >
> >> > Thanks,
> >> >
> >> > - Dmitry
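For readers finding this thread later: the fix being discussed is raising scanner caching so each RPC to a region server returns a batch of rows instead of one (the old default). A minimal sketch of the job setup, assuming the HBase 0.20-era MapReduce API - the table name and mapper class here are illustrative, not from the thread:

```java
// Sketch: set scanner caching on the Scan before wiring it into the job.
// Requires the HBase and Hadoop jars on the classpath and a running cluster.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class ScanCachingExample {

  // Illustrative mapper; the thread's job just writes rows back out to HDFS.
  static class ExportMapper extends TableMapper<ImmutableBytesWritable, Result> {
    // map() omitted
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(HBaseConfiguration.create(), "table-export");

    Scan scan = new Scan();
    scan.setCaching(10000);      // fetch 10,000 rows per RPC, as in the thread
    scan.setCacheBlocks(false);  // avoid churning the block cache on a full scan

    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, ExportMapper.class,
        ImmutableBytesWritable.class, Result.class, job);
    // ... configure output format / reducer, then job.waitForCompletion(true);
  }
}
```

The same value can be set cluster-wide via the `hbase.client.scanner.caching` property, but setting it on the Scan keeps the large batch size scoped to this one bulk-read job.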