Did you try using Hadoop Vaidya or Karmasphere to diagnose the problem?
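Another quick OS-level check worth running on one slow node and one fast node: compare the CPU frequency governors. A mismatched governor (e.g. "ondemand" on half the nodes, "performance" on the other half) is a common cause of exactly this kind of half-cluster throttling. This is only a hedged sketch; the sysfs path is standard on that 2.6.32 kernel, but `check_governors` is just an illustrative helper name.

```shell
check_governors() {
    # Print a count of per-core frequency governors on this node.
    # A node stuck on "ondemand" or "powersave" while others run
    # "performance" would show low CPU% under the same load.
    out=$(cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor 2>/dev/null \
          | sort | uniq -c)
    if [ -n "$out" ]; then
        echo "$out"
    else
        echo "cpufreq not exposed on this kernel"
    fi
}

check_governors
```

You could run it over ssh against one node from each half (e.g. `ssh node01 ... ; ssh node17 ...`) and diff the output.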
Jie

On Fri, Mar 16, 2012 at 5:23 PM, GUOJUN Zhu <guojun_...@freddiemac.com> wrote:
>
> We have a weird performance problem with a Hadoop job on our cluster. We
> have a 32-node experimental cluster of blades (2 hex-core CPUs each), one
> dedicated job tracker, and one dedicated namenode, running Cloudera's CDH3
> (0.20.2-cdh3u3, 03b655719d13929bd68bb2c2f9cee615b389cea9). All nodes were
> built with the same kickstart script, all on Red Hat 6.1 (Linux he3lxvd607
> 2.6.32-131.0.15.el6.x86_64 #1 SMP Tue May 10 15:42:40 EDT 2011 x86_64
> x86_64 x86_64 GNU/Linux).
>
> When we run our job (~300 tasks), all tasks fire off at once, so about 10
> tasks per node. We observe that the upper half of the nodes (nodes 17-32)
> have an average load close to 10, with the CPU about 50% utilized.
> However, the lower half (nodes 1-16) does not utilize the CPU fully: load
> is about 1-3 and CPU is <10%. In the final metrics, a map task in the
> lower half shows about the same "CPU time spent (ms)" count as one in the
> upper half. So it seems that something throttles the tasks in the lower
> half (1-16). We checked for differences between the two sets of nodes in
> every aspect we could think of and found none.
>
> Our job uses the old mapred API. It has a quite modest input (<1G for 300
> maps) and very tiny output. The intermediate output from the maps is
> larger (maybe 10x the input). The slow part is actually within the map,
> where we convert the input format into some classes before we can do the
> real calculation.
>
> We then physically swapped the blades in 1-16 with the blades in 17-32.
> We still see the under-utilization in the (new) nodes 1-16. So it looks
> more like some configuration issue in Hadoop or the system.
>
> We have run out of ideas. Any suggestions are highly appreciated.
>
> When we run terasort or word-count, they seem to use all nodes equally.
>
> Zhu, Guojun
> Modeling Sr Graduate
> 571-3824370
> guojun_...@freddiemac.com
> Financial Engineering
> Freddie Mac