Hi Steve. This sounds like a classic case of uneven data distribution among the
reducers. Most of your data is probably going to the few reducers that are
taking many hours. You may want to adjust your key and/or partitioning strategy
to distribute the data more evenly among the reducers. If you're using a
hash-based partitioning strategy, think about using a prime number of reducers.
A prime reducer count tends to spread keys more evenly under hash partitioning,
because patterns in the hash codes are less likely to share a factor with it,
and this alone may get you pretty far.

I have no idea what your workflow or cluster configuration is like, but 300
reducers for 300 mappers doesn't sound right. Try a (prime) number of reducers
that's roughly equal to 95% of the total reducer slots allocated on the cluster
and go from there. Usually, the cluster should be configured for fewer reducers
than mappers. If you have 12 cores per node (HT off), try 8 mappers and 3
reducers per node.
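
To make the prime suggestion concrete, here is a minimal sketch in plain Java
(no Hadoop dependency). It uses the same arithmetic as Hadoop's default
HashPartitioner, (hashCode & Integer.MAX_VALUE) % numReduceTasks. Everything
else is an assumption for illustration: the class and method names are made up,
the key hash codes are a deliberately bad pattern (multiples of 10), and the
120 reducer slots come from 40 nodes times the 3 reducers per node suggested
above. Your real keys will behave differently, so treat this as a way to reason
about the effect, not as a measurement.

```java
import java.util.BitSet;

// Illustrative only: class and method names are hypothetical.
class ReducerCountSketch {

    // Same arithmetic as Hadoop's default HashPartitioner.
    static int partition(int hashCode, int numReducers) {
        return (hashCode & Integer.MAX_VALUE) % numReducers;
    }

    // Counts how many reducers receive at least one key when the key hash
    // codes are 0, stride, 2*stride, ... (an assumed, deliberately bad pattern).
    static int usedPartitions(int numReducers, int stride, int numKeys) {
        BitSet used = new BitSet(numReducers);
        for (int i = 0; i < numKeys; i++) {
            used.set(partition(i * stride, numReducers));
        }
        return used.cardinality();
    }

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Largest prime that is <= 95% of the reducer slots (at least 1).
    static int suggestedReducers(int totalReducerSlots) {
        int candidate = totalReducerSlots * 95 / 100;
        while (candidate >= 2 && !isPrime(candidate)) {
            candidate--;
        }
        return Math.max(candidate, 1);
    }

    public static void main(String[] args) {
        // 300 reducers: hash codes that are multiples of 10 reach only 30 of them.
        System.out.println(usedPartitions(300, 10, 3000)); // prints 30
        // 293 is prime, so the same keys reach every reducer.
        System.out.println(usedPartitions(293, 10, 3000)); // prints 293
        // Assumed cluster: 40 nodes * 3 reducer slots per node = 120 slots.
        System.out.println(suggestedReducers(40 * 3));     // prints 113
    }
}
```

With 300 reducers the patterned keys pile onto 30 partitions, while 293 reducers
spread the same keys across all of them. If your hash codes are already well
mixed, the difference will be much smaller.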

Good luck!

Chuck


From: Steve Lewis [mailto:lordjoe2...@gmail.com]
Sent: Wednesday, August 28, 2013 7:48 PM
To: mapreduce-user
Subject: Some jobs seem to run forever

I am running a Hadoop job on a 40-node cluster with about 300 map tasks and
about 300 reduce tasks. Most tasks complete within 20 minutes, but a few,
typically fewer than 10, run for many hours.
If they complete, I see nothing to suggest that the number of bytes read or
written, or the number of records read or written, is significantly different
from tasks that run much faster. I sometimes see multiple attempts, usually
only two, and the cluster is doing nothing else.

Any suggested tuning?


</pre>