Hi Steve,

This sounds like a classic case of uneven data distribution among the reducers. Most of your data is probably going to the 10 or so reducers that are taking many hours. You may want to adjust your key and/or partitioning strategy to distribute the data more evenly across the reducers. If you're using a hash-based partitioning strategy, think about using a prime number of reducers. A prime modulus tends to spread keys more evenly when the hash values share a common factor or follow a regular pattern, and this alone may get you pretty far.

I have no idea what your workflow or cluster configuration is like, but 300 reducers for 300 mappers doesn't sound right. Try a (prime) number of reducers roughly equal to 95% of the total reducer slots allocated on the cluster and go from there. Usually, the cluster should be configured for fewer reducers than mappers. If you have 12 cores per node (HT off), try 8 mappers and 3 reducers per node.
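For what it's worth, here is a rough Python sketch of that sizing rule, plus a quick way to check how a sample of keys would spread across reducers. The function names are made up for illustration, and `zlib.crc32` is only a stand-in for whatever hash your job's partitioner really uses, so treat the per-reducer counts as indicative only.

```python
from collections import Counter
import zlib


def is_prime(n):
    """Trial-division primality test; fine for reducer-count-sized numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def largest_prime_at_most(n):
    """Return the largest prime p with p <= n."""
    for candidate in range(n, 1, -1):
        if is_prime(candidate):
            return candidate
    raise ValueError("no prime <= %d" % n)


def suggested_reducers(nodes, reducer_slots_per_node, percent=95):
    """Largest prime not exceeding `percent`% of the cluster's reducer slots."""
    total_slots = nodes * reducer_slots_per_node
    target = total_slots * percent // 100  # integer math avoids float rounding
    return largest_prime_at_most(target)


def partition_counts(keys, num_reducers):
    """Count how many sample keys land on each reducer.

    ASSUMPTION: crc32 stands in for the job's real key hash; the modulo step
    mirrors what a hash-style partitioner does.
    """
    return Counter(zlib.crc32(k.encode("utf-8")) % num_reducers for k in keys)


if __name__ == "__main__":
    # 40 nodes with 3 reducer slots each: 120 slots, 95% is 114
    print(suggested_reducers(40, 3))  # 113
```

If you feed `partition_counts` a sample of your real map output keys and one reducer's count dwarfs the rest, a different reducer count won't help much; that points at a hot key, and the key itself or the partitioner needs to change.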
Good luck!
Chuck

From: Steve Lewis [mailto:lordjoe2...@gmail.com]
Sent: Wednesday, August 28, 2013 7:48 PM
To: mapreduce-user
Subject: Some jobs seem to run forever

I have an issue that I am running a hadoop job on a 40 node cluster with about 300 Map tasks and about 300 reduce tasks. Most tasks complete within 20 minutes but a few, typically less than 10 run for many hours. If they complete I see nothing to suggest that the number of bytes read or written or the number of records read or written is significantly different from tasks that run much faster. I sometimes see multiple attempts - usually only two and the cluster is doing nothing else. Any suggested tuning?