Reduce hangs at 33%
-------------------

                 Key: HADOOP-704
                 URL: http://issues.apache.org/jira/browse/HADOOP-704
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.8.0
            Reporter: Nigel Daley


I have a MR job that is hanging when the reduce reaches 33%.

Both the map and reduce are no-ops.  The single reducer continuously tries 
to retrieve output from a TaskTracker whose "Acceptor 50060" thread appears 
to have crashed: no such thread is present in the thread dump.  (Note that 
the thread crash does not seem to be logged anywhere.)  
The thread dump of the TaskTracker is as follows:

"[EMAIL PROTECTED]" daemon prio=1 tid=0x085abd68 nid=0x5b37 waiting on 
condition [0x4e979000..0x4e979f30]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:462)
        at java.lang.Thread.run(Thread.java:595)

"org.apache.hadoop.io.ObjectWritable Connection Culler" daemon prio=1 
tid=0x0809fe18 nid=0x5b34 waiting on condition [0x4f1e5000..0x4f1e5eb0]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.ipc.Client$ConnectionCuller.run(Client.java:388)

"IPC Server handler 1 on 50050" daemon prio=1 tid=0x085b5d30 nid=0x57f8 in 
Object.wait() [0x4eafd000..0x4eafd130]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x5553ee10> (a java.util.LinkedList)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:490)
        - locked <0x5553ee10> (a java.util.LinkedList)

"IPC Server handler 0 on 50050" daemon prio=1 tid=0x085b57b0 nid=0x57f7 in 
Object.wait() [0x4eb7e000..0x4eb7e1b0]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x5553ee10> (a java.util.LinkedList)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:490)
        - locked <0x5553ee10> (a java.util.LinkedList)

"IPC Server listener on 50050" daemon prio=1 tid=0x083884d8 nid=0x57f6 runnable 
[0x4ebfe000..0x4ebff030]
        at sun.nio.ch.PollArrayWrapper.poll0(Native Method)
        at sun.nio.ch.PollArrayWrapper.poll(PollArrayWrapper.java:100)
        at sun.nio.ch.PollSelectorImpl.doSelect(PollSelectorImpl.java:56)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x5553f3e8> (a sun.nio.ch.Util$1)
        - locked <0x5553f3d8> (a java.util.Collections$UnmodifiableSet)
        - locked <0x5553f150> (a sun.nio.ch.PollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
        at org.apache.hadoop.ipc.Server$Listener.run(Server.java:224)

"btpool0-1 - Invalidator - /" prio=1 tid=0x08239ac0 nid=0x57f2 waiting on 
condition [0x4edfe000..0x4edfef30]
        at java.lang.Thread.sleep(Native Method)
        at org.mortbay.jetty.servlet.AbstractSessionManager$SessionScavenger.run(AbstractSessionManager.java:933)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:475)

"taskCleanup" daemon prio=1 tid=0x0810fd60 nid=0x57ed in Object.wait() 
[0x4f6c0000..0x4f6c0e30]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x554dc650> (a java.util.ArrayList)
        at java.lang.Object.wait(Object.java:474)
        at org.apache.hadoop.mapred.TaskTracker$BlockingQueue.take(TaskTracker.java:783)
        - locked <0x554dc650> (a java.util.ArrayList)
        at org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:143)
        at java.lang.Thread.run(Thread.java:595)

"Low Memory Detector" daemon prio=1 tid=0x509a54a8 nid=0x57ea runnable 
[0x00000000..0x00000000]

"CompilerThread1" daemon prio=1 tid=0x509a40c0 nid=0x57e9 waiting on condition 
[0x00000000..0x506793d8]

"CompilerThread0" daemon prio=1 tid=0x509a3138 nid=0x57e8 waiting on condition 
[0x00000000..0x506fa258]

"AdapterThread" daemon prio=1 tid=0x509a2170 nid=0x57e7 waiting on condition 
[0x00000000..0x00000000]

"Signal Dispatcher" daemon prio=1 tid=0x509a13e0 nid=0x57e6 runnable 
[0x00000000..0x00000000]

"Finalizer" daemon prio=1 tid=0x50998880 nid=0x57e5 in Object.wait() 
[0x5087d000..0x5087dfb0]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x554dca70> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked <0x554dca70> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=1 tid=0x509983b8 nid=0x57e4 in Object.wait() 
[0x508fe000..0x508fee30]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x554c4450> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:474)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x554c4450> (a java.lang.ref.Reference$Lock)

"main" prio=1 tid=0x0805e608 nid=0x57d2 in Object.wait() 
[0xdfffc000..0xdfffcd08]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x554dc7b0> (a [I)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:436)
        - locked <0x554dc7b0> (a [I)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:720)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1374)

"VM Thread" prio=1 tid=0x50996028 nid=0x57e3 runnable

"GC task thread#0 (ParallelGC)" prio=1 tid=0x08078cc8 nid=0x57df runnable

"GC task thread#1 (ParallelGC)" prio=1 tid=0x080798d0 nid=0x57e0 runnable

"GC task thread#2 (ParallelGC)" prio=1 tid=0x0807a4c0 nid=0x57e1 runnable

"GC task thread#3 (ParallelGC)" prio=1 tid=0x0807b0b0 nid=0x57e2 runnable

"VM Periodic Task Thread" prio=1 tid=0x509a6a10 nid=0x57eb waiting on condition
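
One way to confirm that a given thread is missing from a dump like the one 
above is to list the quoted thread names and search them.  The following is 
only a minimal sketch, not part of Hadoop: the class name ThreadDumpCheck and 
its methods are hypothetical, and it assumes that every thread header line 
starts with the thread name in double quotes, as in the dump above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Hadoop class): scans the text of a JVM thread dump.
public class ThreadDumpCheck {
    // Assumption: a thread header line begins with the quoted thread name.
    // Stack frames are indented, so they never match at the start of a line.
    private static final Pattern HEADER =
            Pattern.compile("^\"([^\"]*)\"", Pattern.MULTILINE);

    // Returns the thread names in the order they appear in the dump.
    static List<String> threadNames(String dump) {
        List<String> names = new ArrayList<>();
        Matcher m = HEADER.matcher(dump);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // True if any thread name contains the given fragment, e.g. "Acceptor".
    static boolean hasThread(String dump, String fragment) {
        for (String name : threadNames(dump)) {
            if (name.contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two headers copied from the dump above, used as sample input.
        String sample =
                "\"IPC Server listener on 50050\" daemon prio=1 tid=0x083884d8\n"
              + "        at sun.nio.ch.PollArrayWrapper.poll0(Native Method)\n"
              + "\n"
              + "\"main\" prio=1 tid=0x0805e608 nid=0x57d2 in Object.wait()\n"
              + "        at java.lang.Object.wait(Native Method)\n";
        System.out.println("threads: " + threadNames(sample));
        System.out.println("has Acceptor: " + hasThread(sample, "Acceptor"));
    }
}
```

Run against the full dump, a check of this kind would be expected to report 
no thread whose name contains "Acceptor", which matches the observation that 
the reducer cannot fetch output from port 50060.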


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
