[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kumar Vavilapalli reassigned MAPREDUCE-3226:
--------------------------------------------------

    Assignee: Vinod Kumar Vavilapalli

> Few reduce tasks hanging in a gridmix-run
> -----------------------------------------
>
>                 Key: MAPREDUCE-3226
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3226
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, task
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.0
>
>
> In a gridmix run with ~1000 jobs, one job is getting stuck because of 2-3
> hanging reducers. All of them are stuck after downloading all map outputs
> and have the following thread dump.
> {code}
> "EventFetcher for fetching Map Completion Events" daemon prio=10 tid=0xa325fc00 nid=0x1ca4 waiting on condition [0xa315c000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.mapreduce.task.reduce.EventFetcher.run(EventFetcher.java:71)
> "main" prio=10 tid=0x080ed400 nid=0x1c71 in Object.wait() [0xf73a2000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
>         at java.lang.Thread.join(Thread.java:1143)
>         - locked <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
>         at java.lang.Thread.join(Thread.java:1196)
>         at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:135)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:367)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:147)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:142)
> {code}
> Thanks to [~karams] for helping track
> this down.
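
The dump shows the main thread blocked in an untimed Thread.join() on the EventFetcher while the EventFetcher itself is asleep inside its run loop. The report does not state the root cause, so the following is only a hypothetical sketch of one pattern that produces a dump of this shape: a polling thread that swallows the interrupt and keeps looping, so a join on it never returns. It is not Hadoop's code; the names JoinHangSketch, Poller, shutDown and stopsWithin are invented for illustration, and the explicit stop flag is just one possible way to make the loop terminate.

```java
public class JoinHangSketch {

    /** Hypothetical polling thread; stands in for a thread like the one in the dump. */
    static class Poller extends Thread {
        private volatile boolean stopped = false;
        private final boolean honorsStop;

        Poller(boolean honorsStop) {
            this.honorsStop = honorsStop;
            setDaemon(true); // daemon, so a stuck poller does not keep the JVM alive
        }

        @Override
        public void run() {
            // Loop until the stop flag is set (only if this poller honors it).
            while (!(honorsStop && stopped)) {
                try {
                    Thread.sleep(50); // appears as TIMED_WAITING (sleeping) in a dump
                } catch (InterruptedException e) {
                    // The interrupt is swallowed here. Without the stop flag the
                    // loop simply continues, and a joiner waits forever.
                }
            }
        }

        /** Assumed shutdown hook: set the flag first, then interrupt the sleep. */
        void shutDown() {
            stopped = true;
            interrupt();
        }
    }

    /** Starts the poller, asks it to stop, and reports whether it exited in time. */
    static boolean stopsWithin(Poller p, long timeoutMs) {
        p.start();
        p.shutDown();
        try {
            p.join(timeoutMs); // bounded join, so this sketch itself cannot hang
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !p.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("honors stop flag: " + stopsWithin(new Poller(true), 1000));
        System.out.println("swallows interrupt: " + stopsWithin(new Poller(false), 300));
    }
}
```

Running main prints `honors stop flag: true` and then `swallows interrupt: false`. With an untimed join() in place of join(timeoutMs), the second case would block the caller indefinitely, which matches the WAITING state of "main" in the dump above.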