On Wed, Dec 2, 2009 at 4:37 PM, Vasilis Liaskovitis <vlias...@gmail.com> wrote:
> Hi Todd,
>
> thanks for the reply.
>
> > This is seen reasonably often, and could be partly due to missed
> > configuration changes. A few things to check:
> >
> > - Did you increase the number of tasks per node from the default? If you
> > have a reasonable number of disks/cores, you're going to want to run a
> > lot more than 2 map and 2 reduce tasks on each node.
>
> For all tests so far, I have increased
> mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum to the number of cores per
> tasktracker/node (12 cores per node).

Sounds about right.

> I've also set mapred.map.tasks and mapred.reduce.tasks to a prime
> close to the number of nodes, i.e. 8 (though the recommendation for
> mapred.map.tasks is "a prime several times greater than the number of
> hosts").

Yeah, the documentation for those variables is known to be silly and is
corrected in trunk. For a small cluster, set mapred.map.tasks to several
times the number of map slots in your cluster, and set mapred.reduce.tasks
to either a couple less than the number of reduce slots, or a couple less
than twice the number of reduce slots. Primeness really has nothing to do
with it.

The above are rules of thumb, and not always the best - in particular,
mapred.map.tasks can get ignored based on other settings, block size,
etc. I usually leave it unspecified and let it just do a task per block
for most workloads.

> > - Have you tuned any other settings? If you google around you can find
> > some guides for configuration tuning that should help squeeze some
> > performance out of your cluster.
>
> I am reusing JVMs. I also enabled default codec compression (native
> zlib, I think) for intermediate map outputs. This decreased iowait
> times for some datasets, but idle time is still significant even with
> compression. I wonder if LZO compression would have better results -
> less overall execution time and perhaps less idle time?

Yes, LZO is almost always better than gzip for this use case.

> I also increased io.sort.mb (set to half the JVM heap size), though I am
> not sure how that has affected performance yet. If other parameters could
> be significant here, let me know. Would increasing the number of I/O
> streams (io.sort.factor, I think) help with a not-so-beefy disk system
> per node?
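For concreteness, here's a rough mapred-site.xml sketch pulling the above
together. Treat the values as illustrative for your 8-node, 12-core-per-node
cluster, not tuned numbers, and note that LzoCodec assumes you've installed
the hadoop-gpl-compression (hadoop-lzo) native libraries separately:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>12</value>  <!-- one map slot per core, as you have now -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>12</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>94</value>  <!-- a couple less than 8 nodes * 12 slots = 96 -->
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>100</value>  <!-- half of the default 200 MB child heap; raise
                             along with mapred.child.java.opts -->
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>25</value>  <!-- default is 10; fewer merge passes per sort -->
  </property>

On io.sort.factor: it trades more open streams per merge against fewer
passes over the disk, so a modest bump is usually safe. Whether it helps
depends on how often your maps actually spill.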
> If you can recommend a specific tutorial/guide/blog for performance
> tuning, feel free to share (though I suspect there may be many out
> there).

http://www.cloudera.com/blog/2009/03/30/configuration-parameters-what-can-you-just-ignore/
has some tips. These mailing list archives will probably be useful as well.

> > There are several patches that aren't in 0.20.1 but will be in 0.21 that
> > help performance. These aren't eligible for backport into 0.20, since
> > point releases are for bug fixes only. Some are eligible for backporting
> > into Cloudera's distro (or Yahoo's) and may show up in our next release
> > (CDH3), which should be available first in January for those who like to
> > live on the edge.
>
> OK, thanks. I'll try to check out 0.21 or a Cloudera distro at some
> point. I wonder if there's a centralized svn/git somewhere if I want to
> build from source. Or do I need to somehow combine all the subprojects
> (hadoop-common, hadoop-mapred and hadoop-hdfs)?

I have a git repo up at http://github.com/toddlipcon/hadoop-meta which I
occasionally use. It may be out of date, though, as the "stitching
together" process recently changed and I haven't updated it recently.
In general, running trunk is a recipe for disaster - you're much better
off running 0.20 unless you're a developer.

Thanks
-Todd

> thanks again,
>
> - Vasilis
>
> > Thanks,
> > -Todd
> >
> > On Wed, Dec 2, 2009 at 12:22 PM, Vasilis Liaskovitis
> > <vlias...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I am using hadoop-0.20.1 to run terasort and randsort benchmarking
> >> tests on a small 8-node Linux cluster. Most runs show fairly low
> >> (<50%) core utilization in the map and reduce phases, as well as
> >> heavy I/O phases. There is usually a large fraction of runtime for
> >> which cores are idling and disk I/O traffic is not heavy.
> >>
> >> On average, for the duration of a terasort run, I get 20-30% CPU
> >> utilization, 10-30% iowait time, and the remaining 40-70% is idle
> >> time. This is data collected with mpstat for the duration of the run
> >> across the cores of a specific node. This utilization behaviour holds
> >> symmetrically for all tasktracker/data nodes. (The namenode cores and
> >> I/O are mostly idle, so there doesn't seem to be a bottleneck in the
> >> namenode.)
> >>
> >> I am looking for an explanation for the significant idle time in the
> >> runs. Could it have something to do with misconfigured network/RPC
> >> latency hadoop parameters? For example, I have tried increasing
> >> mapred.heartbeats.in.second from 100 to 1000, but that didn't help.
> >> The network bandwidth (1 GigE card on each node) is not saturated
> >> during the runs, according to my netstat results.
> >>
> >> Have other people noticed significant CPU idle times that can't be
> >> explained by I/O traffic?
> >>
> >> Is it reasonable to always expect decreasing idle times as the
> >> terasort dataset scales on the same cluster? I've only tried 2 small
> >> datasets of 40GB and 64GB, but core utilizations didn't increase
> >> with the runs done so far.
> >>
> >> Yahoo's paper on terasort (http://sortbenchmark.org/Yahoo2009.pdf)
> >> mentions several performance optimizations, some of which seem
> >> relevant to idle times. I am wondering which, if any, of the Yahoo
> >> patches are part of the hadoop-0.20.1 distribution.
> >>
> >> Would it be a good idea to try a development version of hadoop to
> >> resolve this issue?
> >>
> >> thanks,
> >>
> >> - Vasilis