Reduce hangs at 33%
-------------------
Key: HADOOP-704
URL: http://issues.apache.org/jira/browse/HADOOP-704
Project: Hadoop
Issue Type: Bug
Components: mapred
Affects Versions: 0.8.0
Reporter: Nigel Daley
I have a MapReduce job that hangs when the reduce reaches 33%.
Both the map and the reduce are no-ops. The single reducer keeps trying to
retrieve output from a TaskTracker whose "Acceptor 50060" thread seems to have
crashed. (Note that the thread crash does not seem to be logged anywhere.)
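Since the crash left no trace in the logs, one general way to make such thread deaths visible is a JVM-wide uncaught-exception handler. The following is only a minimal sketch, not what the TaskTracker or Jetty actually does: the class name ThreadDeathLogger and the simulated thread are hypothetical, and the handler only fires if the thread really dies from an uncaught Throwable (it will not fire if the thread's own code catches the error and simply returns).

```java
public class ThreadDeathLogger {
    // Name of the most recent thread that died from an uncaught Throwable
    // (kept only so the effect can be observed in this sketch).
    public static volatile String lastDeadThread = null;

    /** Installs a JVM-wide handler that reports any thread killed by an uncaught Throwable. */
    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            public void uncaughtException(Thread t, Throwable e) {
                lastDeadThread = t.getName();
                System.err.println("Thread \"" + t.getName() + "\" died: " + e);
                e.printStackTrace();
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        // Hypothetical stand-in for a crashing acceptor thread.
        Thread acceptor = new Thread(new Runnable() {
            public void run() {
                throw new IllegalStateException("simulated crash");
            }
        }, "Acceptor 50060 (simulated)");
        acceptor.start();
        acceptor.join();
        System.out.println("last dead thread: " + lastDeadThread);
    }
}
```

The handler runs on the dying thread before it terminates, so by the time join() returns the death has already been reported.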
The thread dump of the TaskTracker is as follows:
"[EMAIL PROTECTED]" daemon prio=1 tid=0x085abd68 nid=0x5b37 waiting on condition [0x4e979000..0x4e979f30]
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:462)
at java.lang.Thread.run(Thread.java:595)
"org.apache.hadoop.io.ObjectWritable Connection Culler" daemon prio=1 tid=0x0809fe18 nid=0x5b34 waiting on condition [0x4f1e5000..0x4f1e5eb0]
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.ipc.Client$ConnectionCuller.run(Client.java:388)
"IPC Server handler 1 on 50050" daemon prio=1 tid=0x085b5d30 nid=0x57f8 in Object.wait() [0x4eafd000..0x4eafd130]
at java.lang.Object.wait(Native Method)
- waiting on <0x5553ee10> (a java.util.LinkedList)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:490)
- locked <0x5553ee10> (a java.util.LinkedList)
"IPC Server handler 0 on 50050" daemon prio=1 tid=0x085b57b0 nid=0x57f7 in Object.wait() [0x4eb7e000..0x4eb7e1b0]
at java.lang.Object.wait(Native Method)
- waiting on <0x5553ee10> (a java.util.LinkedList)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:490)
- locked <0x5553ee10> (a java.util.LinkedList)
"IPC Server listener on 50050" daemon prio=1 tid=0x083884d8 nid=0x57f6 runnable [0x4ebfe000..0x4ebff030]
at sun.nio.ch.PollArrayWrapper.poll0(Native Method)
at sun.nio.ch.PollArrayWrapper.poll(PollArrayWrapper.java:100)
at sun.nio.ch.PollSelectorImpl.doSelect(PollSelectorImpl.java:56)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0x5553f3e8> (a sun.nio.ch.Util$1)
- locked <0x5553f3d8> (a java.util.Collections$UnmodifiableSet)
- locked <0x5553f150> (a sun.nio.ch.PollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
at org.apache.hadoop.ipc.Server$Listener.run(Server.java:224)
"btpool0-1 - Invalidator - /" prio=1 tid=0x08239ac0 nid=0x57f2 waiting on condition [0x4edfe000..0x4edfef30]
at java.lang.Thread.sleep(Native Method)
at org.mortbay.jetty.servlet.AbstractSessionManager$SessionScavenger.run(AbstractSessionManager.java:933)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:475)
"taskCleanup" daemon prio=1 tid=0x0810fd60 nid=0x57ed in Object.wait() [0x4f6c0000..0x4f6c0e30]
at java.lang.Object.wait(Native Method)
- waiting on <0x554dc650> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:474)
at org.apache.hadoop.mapred.TaskTracker$BlockingQueue.take(TaskTracker.java:783)
- locked <0x554dc650> (a java.util.ArrayList)
at org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:143)
at java.lang.Thread.run(Thread.java:595)
"Low Memory Detector" daemon prio=1 tid=0x509a54a8 nid=0x57ea runnable [0x00000000..0x00000000]
"CompilerThread1" daemon prio=1 tid=0x509a40c0 nid=0x57e9 waiting on condition [0x00000000..0x506793d8]
"CompilerThread0" daemon prio=1 tid=0x509a3138 nid=0x57e8 waiting on condition [0x00000000..0x506fa258]
"AdapterThread" daemon prio=1 tid=0x509a2170 nid=0x57e7 waiting on condition [0x00000000..0x00000000]
"Signal Dispatcher" daemon prio=1 tid=0x509a13e0 nid=0x57e6 runnable [0x00000000..0x00000000]
"Finalizer" daemon prio=1 tid=0x50998880 nid=0x57e5 in Object.wait() [0x5087d000..0x5087dfb0]
at java.lang.Object.wait(Native Method)
- waiting on <0x554dca70> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
- locked <0x554dca70> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
"Reference Handler" daemon prio=1 tid=0x509983b8 nid=0x57e4 in Object.wait() [0x508fe000..0x508fee30]
at java.lang.Object.wait(Native Method)
- waiting on <0x554c4450> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:474)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
- locked <0x554c4450> (a java.lang.ref.Reference$Lock)
"main" prio=1 tid=0x0805e608 nid=0x57d2 in Object.wait() [0xdfffc000..0xdfffcd08]
at java.lang.Object.wait(Native Method)
- waiting on <0x554dc7b0> (a [I)
at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:436)
- locked <0x554dc7b0> (a [I)
at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:720)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1374)
"VM Thread" prio=1 tid=0x50996028 nid=0x57e3 runnable
"GC task thread#0 (ParallelGC)" prio=1 tid=0x08078cc8 nid=0x57df runnable
"GC task thread#1 (ParallelGC)" prio=1 tid=0x080798d0 nid=0x57e0 runnable
"GC task thread#2 (ParallelGC)" prio=1 tid=0x0807a4c0 nid=0x57e1 runnable
"GC task thread#3 (ParallelGC)" prio=1 tid=0x0807b0b0 nid=0x57e2 runnable
"VM Periodic Task Thread" prio=1 tid=0x509a6a10 nid=0x57eb waiting on condition
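The thread dump above contains no thread whose name starts with "Acceptor", which is consistent with the suspicion that the "Acceptor 50060" thread has died. A check like that can be done mechanically on a saved dump. The following is a rough sketch only: the class and method names are hypothetical, and it assumes the HotSpot-style layout shown above, where each thread entry begins with the thread name in double quotes.

```java
import java.util.ArrayList;
import java.util.List;

public class DumpThreadNames {
    /** Returns the quoted thread names that open each thread entry in the dump text. */
    public static List<String> threadNames(String dump) {
        List<String> names = new ArrayList<String>();
        for (String line : dump.split("\n")) {
            String s = line.trim();
            if (s.startsWith("\"")) {
                int end = s.indexOf('"', 1); // closing quote of the thread name
                if (end > 1) {
                    names.add(s.substring(1, end));
                }
            }
        }
        return names;
    }

    /** True if any thread in the dump has a name starting with the given prefix. */
    public static boolean hasThreadStartingWith(String dump, String prefix) {
        for (String name : threadNames(dump)) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A short excerpt in the same layout as the dump in this report.
        String dump =
              "\"IPC Server listener on 50050\" daemon prio=1 tid=0x083884d8 nid=0x57f6 runnable\n"
            + "at sun.nio.ch.PollArrayWrapper.poll0(Native Method)\n"
            + "\"main\" prio=1 tid=0x0805e608 nid=0x57d2 in Object.wait()\n";
        System.out.println(threadNames(dump));
        System.out.println("Acceptor present: " + hasThreadStartingWith(dump, "Acceptor"));
    }
}
```

Comparing the name list against a dump from a healthy TaskTracker would show which threads are missing, without relying on any one expected name.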