Hi,
The job froze after the filesystem hung on a machine that had already
completed a map task successfully.
Is there a flag that enables rescheduling of such a task?
jstack output from the JobTracker:

"SocketListener0-2" prio=10 tid=0x08916000 nid=0x4a4f runnable [0x4d05c000..0x4d05ce30]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at org.mortbay.util.LineInput.fill(LineInput.java:469)
	at org.mortbay.util.LineInput.fillLine(LineInput.java:547)
	at org.mortbay.util.LineInput.readLineBuffer(LineInput.java:293)
	at org.mortbay.util.LineInput.readLineBuffer(LineInput.java:277)
	at org.mortbay.http.HttpRequest.readHeader(HttpRequest.java:238)
	at org.mortbay.http.HttpConnection.readRequest(HttpConnection.java:861)
	at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:907)
	at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
	at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
	at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
	at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

   Locked ownable synchronizers:
	- None

"SocketListener0-1" prio=10 tid=0x4da8c800 nid=0xeeb runnable [0x4d266000..0x4d2670b0]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at org.mortbay.util.LineInput.fill(LineInput.java:469)
	at org.mortbay.util.LineInput.fillLine(LineInput.java:547)
	at org.mortbay.util.LineInput.readLineBuffer(LineInput.java:293)
	at org.mortbay.util.LineInput.readLineBuffer(LineInput.java:277)
	at org.mortbay.http.HttpRequest.readHeader(HttpRequest.java:238)
	at org.mortbay.http.HttpConnection.readRequest(HttpConnection.java:861)
	at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:907)
	at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
	at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
	at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
	at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

"IPC Server listener on 54311" daemon prio=10 tid=0x4df70400 nid=0xe86 runnable [0x4d9fe000..0x4d9feeb0]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
	- locked <0x54fb4320> (a sun.nio.ch.Util$1)
	- locked <0x54fb4310> (a java.util.Collections$UnmodifiableSet)
	- locked <0x54fb40b8> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
	at org.apache.hadoop.ipc.Server$Listener.run(Server.java:296)

   Locked ownable synchronizers:
	- None

"IPC Server Responder" daemon prio=10 tid=0x4da22800 nid=0xe85 runnable [0x4db75000..0x4db75e30]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
	- locked <0x54fdddd0> (a sun.nio.ch.Util$1)
	- locked <0x54fdce10> (a java.util.Collections$UnmodifiableSet)
	- locked <0x54fdcc18> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:455)

   Locked ownable synchronizers:
	- None

"RMI TCP Accept-0" daemon prio=10 tid=0x4da13400 nid=0xe31 runnable [0x4de55000..0x4de56130]
   java.lang.Thread.State: RUNNABLE
	at java.net.PlainSocketImpl.socketAccept(Native Method)
	at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
	- locked <0x54f6dae0> (a java.net.SocksSocketImpl)
	at java.net.ServerSocket.implAccept(ServerSocket.java:453)
	at java.net.ServerSocket.accept(ServerSocket.java:421)
	at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
	at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
	at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
	at java.lang.Thread.run(Thread.java:619)

   Locked ownable synchronizers:
	- None

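In case it helps anyone reproducing this, a dump with the same kind of information (thread state, stack frames, "- locked" monitors and "Locked ownable synchronizers") can also be collected from inside a running JVM through the standard java.lang.management API. This is only a rough sketch, not part of Hadoop and not how jstack itself works; the class name ThreadDumpSketch is hypothetical, and the output only approximates the jstack layout above (no tid/nid or native addresses).

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: builds a jstack-like dump of all live threads in this JVM.
public class ThreadDumpSketch {

    public static String dumpThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Request lock details only when the JVM supports reporting them.
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            if (info == null) {
                continue; // defensive: skip threads that disappeared during the dump
            }
            sb.append('"').append(info.getThreadName()).append('"')
              .append(" id=").append(info.getThreadId()).append('\n');
            sb.append("   java.lang.Thread.State: ").append(info.getThreadState()).append('\n');
            StackTraceElement[] stack = info.getStackTrace();
            MonitorInfo[] locked = info.getLockedMonitors();
            for (int depth = 0; depth < stack.length; depth++) {
                sb.append("\tat ").append(stack[depth]).append('\n');
                // Monitors acquired at this frame, like jstack's "- locked <...>" lines.
                for (MonitorInfo m : locked) {
                    if (m.getLockedStackDepth() == depth) {
                        sb.append("\t- locked ").append(m).append('\n');
                    }
                }
            }
            sb.append("   Locked ownable synchronizers:\n");
            LockInfo[] syncs = info.getLockedSynchronizers();
            if (syncs.length == 0) {
                sb.append("\t- None\n");
            }
            for (LockInfo l : syncs) {
                sb.append("\t- ").append(l).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpThreads());
    }
}
```

Running jstack against the JobTracker pid remains the simpler option; the sketch is only useful when the dump has to be logged from within the process itself.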
-Sagar