[ http://issues.apache.org/jira/browse/HADOOP-704?page=comments#action_12448627 ]

Mahadev konar commented on HADOOP-704:
--------------------------------------

Should we have a max_retries limit for fetching a given map output from a given TaskTracker? Once the limit is reached, we could fail the fetch and ask the JobTracker to re-execute the map on a different node.
Also, could this crash be related to the Jetty 6 upgrade?

> Reduce hangs at 33%
> -------------------
>
>                 Key: HADOOP-704
>                 URL: http://issues.apache.org/jira/browse/HADOOP-704
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.8.0
>            Reporter: Nigel Daley
>
> I have a MR job that is hanging when the reduce reaches 33%.
> Both the map and reduce are no-ops. The single reducer is continuously
> trying to retrieve output from a TaskTracker that seems to have a crashed
> "Acceptor 50060" thread. (Note the thread crash does not seem to be logged
> anywhere). The thread dump of the TaskTracker is as follows:
>
> "[EMAIL PROTECTED]" daemon prio=1 tid=0x085abd68 nid=0x5b37 waiting on condition [0x4e979000..0x4e979f30]
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:462)
>         at java.lang.Thread.run(Thread.java:595)
> "org.apache.hadoop.io.ObjectWritable Connection Culler" daemon prio=1 tid=0x0809fe18 nid=0x5b34 waiting on condition [0x4f1e5000..0x4f1e5eb0]
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.ipc.Client$ConnectionCuller.run(Client.java:388)
> "IPC Server handler 1 on 50050" daemon prio=1 tid=0x085b5d30 nid=0x57f8 in Object.wait() [0x4eafd000..0x4eafd130]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x5553ee10> (a java.util.LinkedList)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:490)
>         - locked <0x5553ee10> (a java.util.LinkedList)
> "IPC Server handler 0 on 50050" daemon prio=1 tid=0x085b57b0 nid=0x57f7 in Object.wait() [0x4eb7e000..0x4eb7e1b0]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x5553ee10> (a java.util.LinkedList)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:490)
>         - locked <0x5553ee10> (a java.util.LinkedList)
> "IPC Server listener on 50050" daemon prio=1 tid=0x083884d8 nid=0x57f6 runnable [0x4ebfe000..0x4ebff030]
>         at sun.nio.ch.PollArrayWrapper.poll0(Native Method)
>         at sun.nio.ch.PollArrayWrapper.poll(PollArrayWrapper.java:100)
>         at sun.nio.ch.PollSelectorImpl.doSelect(PollSelectorImpl.java:56)
>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>         - locked <0x5553f3e8> (a sun.nio.ch.Util$1)
>         - locked <0x5553f3d8> (a java.util.Collections$UnmodifiableSet)
>         - locked <0x5553f150> (a sun.nio.ch.PollSelectorImpl)
>         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
>         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:224)
> "btpool0-1 - Invalidator - /" prio=1 tid=0x08239ac0 nid=0x57f2 waiting on condition [0x4edfe000..0x4edfef30]
>         at java.lang.Thread.sleep(Native Method)
>         at org.mortbay.jetty.servlet.AbstractSessionManager$SessionScavenger.run(AbstractSessionManager.java:933)
>         at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:475)
> "taskCleanup" daemon prio=1 tid=0x0810fd60 nid=0x57ed in Object.wait() [0x4f6c0000..0x4f6c0e30]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x554dc650> (a java.util.ArrayList)
>         at java.lang.Object.wait(Object.java:474)
>         at org.apache.hadoop.mapred.TaskTracker$BlockingQueue.take(TaskTracker.java:783)
>         - locked <0x554dc650> (a java.util.ArrayList)
>         at org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:143)
>         at java.lang.Thread.run(Thread.java:595)
> "Low Memory Detector" daemon prio=1 tid=0x509a54a8 nid=0x57ea runnable [0x00000000..0x00000000]
> "CompilerThread1" daemon prio=1 tid=0x509a40c0 nid=0x57e9 waiting on condition [0x00000000..0x506793d8]
> "CompilerThread0" daemon prio=1 tid=0x509a3138 nid=0x57e8 waiting on condition [0x00000000..0x506fa258]
> "AdapterThread" daemon prio=1 tid=0x509a2170 nid=0x57e7 waiting on condition [0x00000000..0x00000000]
> "Signal Dispatcher" daemon prio=1 tid=0x509a13e0 nid=0x57e6 runnable [0x00000000..0x00000000]
> "Finalizer" daemon prio=1 tid=0x50998880 nid=0x57e5 in Object.wait() [0x5087d000..0x5087dfb0]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x554dca70> (a java.lang.ref.ReferenceQueue$Lock)
>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
>         - locked <0x554dca70> (a java.lang.ref.ReferenceQueue$Lock)
>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
>         at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
> "Reference Handler" daemon prio=1 tid=0x509983b8 nid=0x57e4 in Object.wait() [0x508fe000..0x508fee30]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x554c4450> (a java.lang.ref.Reference$Lock)
>         at java.lang.Object.wait(Object.java:474)
>         at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>         - locked <0x554c4450> (a java.lang.ref.Reference$Lock)
> "main" prio=1 tid=0x0805e608 nid=0x57d2 in Object.wait() [0xdfffc000..0xdfffcd08]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x554dc7b0> (a [I)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:436)
>         - locked <0x554dc7b0> (a [I)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:720)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1374)
> "VM Thread" prio=1 tid=0x50996028 nid=0x57e3 runnable
> "GC task thread#0 (ParallelGC)" prio=1 tid=0x08078cc8 nid=0x57df runnable
> "GC task thread#1 (ParallelGC)" prio=1 tid=0x080798d0 nid=0x57e0 runnable
> "GC task thread#2 (ParallelGC)" prio=1 tid=0x0807a4c0 nid=0x57e1 runnable
> "GC task thread#3 (ParallelGC)" prio=1 tid=0x0807b0b0 nid=0x57e2 runnable
> "VM Periodic Task Thread" prio=1 tid=0x509a6a10 nid=0x57eb waiting on condition

--
This message is automatically generated by JIRA.
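
To make the max_retries suggestion in the comment above more concrete, here is a minimal sketch of a bounded fetch loop. It is an illustration under assumptions, not Hadoop code: FetchRetrySketch, MapOutputFetcher and fetchWithLimit are hypothetical names, the host string is a placeholder, and the real reduce-side copy logic is not shown in this thread. A real version would probably also wait between attempts; that is left out to keep the sketch short.

```java
import java.io.IOException;

/** Hypothetical sketch of a bounded fetch loop; none of these names exist in Hadoop. */
public class FetchRetrySketch {

    /** Hypothetical stand-in for copying one map output from one TaskTracker. */
    public interface MapOutputFetcher {
        void fetch(String host, int mapId) throws IOException;
    }

    /**
     * Makes at most maxRetries attempts to fetch a map output from a host.
     * Returns true once a fetch succeeds, or false when the limit is reached,
     * so the caller can stop polling a dead host and report the failure.
     */
    public static boolean fetchWithLimit(MapOutputFetcher fetcher, String host,
                                         int mapId, int maxRetries) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                fetcher.fetch(host, mapId);
                return true; // output copied, nothing more to do
            } catch (IOException e) {
                // Assumption: any IOException counts as one failed attempt.
                System.err.println("fetch of map " + mapId + " from " + host
                        + " failed (attempt " + attempt + " of " + maxRetries + ")");
            }
        }
        return false; // limit reached; the caller decides what to report
    }

    public static void main(String[] args) {
        // A fetcher that always fails, like a TaskTracker whose HTTP acceptor is gone.
        MapOutputFetcher dead = (host, mapId) -> {
            throw new IOException("connection refused");
        };
        boolean ok = fetchWithLimit(dead, "tracker-host:50060", 0, 3);
        System.out.println(ok ? "fetched" : "gave up, map should be re-executed elsewhere");
    }
}
```

When fetchWithLimit returns false, the reducer would stop retrying that TaskTracker and report the map output as lost, so the JobTracker can schedule the map on another node. How that report is delivered is outside this sketch.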
