Hi all,

We are running hadoop-0.19.1 on about 200 nodes. On many of the slaves, Child processes keep running even after their job is done.
Here is an example; this process has been running since August 09!

1000 24625 1 0 Aug09 ? 00:00:38 (...java... classpath) org.apache.hadoop.mapred.Child 127.0.0.1 55998 attempt_200908081205_0054_r_000093_0 441920924

The jstack output for the process is:

2009-11-12 14:58:59
Full thread dump Java HotSpot(TM) Server VM (11.0-b15 mixed mode):

"Attach Listener" daemon prio=10 tid=0x08168400 nid=0x457a waiting on condition [0x00000000..0x00000000]
   java.lang.Thread.State: RUNNABLE

"Thread-2" daemon prio=10 tid=0x08170400 nid=0x60f8 waiting for monitor entry [0xa33ad000..0xa33adfd0]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3085)
        - waiting to lock <0xa84d12a8> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3054)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:942)
        - locked <0xa84cba48> (a org.apache.hadoop.hdfs.DFSClient$LeaseChecker)
        at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:209)
        - locked <0xa84cba60> (a org.apache.hadoop.hdfs.DFSClient)
        at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:264)
        at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1413)
        - locked <0xa84a1e00> (a org.apache.hadoop.fs.FileSystem$Cache)
        at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:236)
        at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:221)
        - locked <0xa84a26f0> (a org.apache.hadoop.fs.FileSystem$ClientFinalizer)

"SIGTERM handler" daemon prio=10 tid=0x08176800 nid=0x60f6 in Object.wait() [0xa35ad000..0xa35ae0d0]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xa84a26f0> (a org.apache.hadoop.fs.FileSystem$ClientFinalizer)
        at java.lang.Thread.join(Thread.java:1143)
        - locked <0xa84a26f0> (a org.apache.hadoop.fs.FileSystem$ClientFinalizer)
        at java.lang.Thread.join(Thread.java:1196)
        at java.lang.ApplicationShutdownHooks.run(ApplicationShutdownHooks.java:79)
        at java.lang.Shutdown.runHooks(Shutdown.java:89)
        at java.lang.Shutdown.sequence(Shutdown.java:133)
        at java.lang.Shutdown.exit(Shutdown.java:178)
        - locked <0xa4556020> (a java.lang.Class for java.lang.Shutdown)
        at java.lang.Terminator$1.handle(Terminator.java:35)
        at sun.misc.Signal$1.run(Signal.java:195)
        at java.lang.Thread.run(Thread.java:619)

"Comm thread for attempt_200908081205_0054_r_000093_0" daemon prio=10 tid=0x083f0000 nid=0x6049 waiting for monitor entry [0xa35fe000..0xa35ff050]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Shutdown.java:178)
        - waiting to lock <0xa4556020> (a java.lang.Class for java.lang.Shutdown)
        at java.lang.Runtime.exit(Runtime.java:90)
        at java.lang.System.exit(System.java:906)
        at org.apache.hadoop.mapred.Task$1.run(Task.java:430)
        at java.lang.Thread.run(Thread.java:619)

"Thread for syncLogs" daemon prio=10 tid=0xa39cc800 nid=0x6041 waiting for monitor entry [0xa38a3000..0xa38a3fd0]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Shutdown.java:178)
        - waiting to lock <0xa4556020> (a java.lang.Class for java.lang.Shutdown)
        at java.lang.Runtime.exit(Runtime.java:90)
        at java.lang.System.exit(System.java:906)
        at org.apache.hadoop.mapred.Child$1.run(Child.java:84)

"Low Memory Detector" daemon prio=10 tid=0x0811c800 nid=0x603e runnable [0x00000000..0x00000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x0811a400 nid=0x603d waiting on condition [0x00000000..0xa3bfe5c8]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x08118000 nid=0x603c waiting on condition [0x00000000..0xa3df5608]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x08116800 nid=0x603b runnable [0x00000000..0xa3e46d90]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x08104000 nid=0x603a in Object.wait() [0xa3e97000..0xa3e97e50]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xa84887a0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked <0xa84887a0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x08102800 nid=0x6039 in Object.wait() [0xa3ee8000..0xa3ee8fd0]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xa84a93c0> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0xa84a93c0> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x0805b000 nid=0x6033 in Object.wait() [0xb7dc6000..0xb7dc7298]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xa84cff68> (a java.util.LinkedList)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3025)
        - locked <0xa84cff68> (a java.util.LinkedList)
        - locked <0xa84d12a8> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3105)
        - locked <0xa84d12a8> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3054)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
        at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:102)
        - locked <0xa84cffd0> (a org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
        at org.apache.hadoop.mapred.Child.main(Child.java:158)

"VM Thread" prio=10 tid=0x080ff000 nid=0x6038 runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x08062400 nid=0x6034 runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x08063800 nid=0x6035 runnable

"GC task thread#2 (ParallelGC)" prio=10 tid=0x08065000 nid=0x6036 runnable

"GC task thread#3 (ParallelGC)" prio=10 tid=0x08066400 nid=0x6037 runnable

"VM Periodic Task Thread" prio=10 tid=0x0811e400 nid=0x603f waiting on condition

JNI global references: 738

It seems the process is blocked in the DFS client: "main" holds the DFSOutputStream lock <0xa84d12a8> while waiting in flushInternal, and the ClientFinalizer shutdown hook ("Thread-2") is blocked waiting for that same lock, so the exit started by the SIGTERM handler never completes. Can anyone tell me how to avoid this?

Best Regards,
Ted Xu
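
P.S. To make my reading of the dump concrete, here is a minimal Java sketch of the lock pattern as I understand it. It uses no Hadoop code at all. The class name BlockedCloseDemo, the method reproduce(), and the variables streamLock and ackQueue are hypothetical stand-ins I made up: streamLock plays the DFSOutputStream monitor, ackQueue plays the LinkedList that "main" waits on, "writer" plays "main", and "closer" plays "Thread-2". It only illustrates why the second close() can never get the lock; it is not a claim about what DFSClient does internally.

```java
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;

public class BlockedCloseDemo {

    /** Returns the state of the "closer" thread once it is stuck (or after 5 s). */
    static Thread.State reproduce() {
        // Hypothetical stand-ins for the two monitors seen in the dump.
        final Object streamLock = new Object();
        final LinkedList<String> ackQueue = new LinkedList<>();
        final CountDownLatch holding = new CountDownLatch(1);

        // Plays "main": takes the stream lock, then waits on the queue.
        // wait() releases only the ackQueue monitor, not streamLock.
        Thread writer = new Thread(() -> {
            synchronized (streamLock) {
                synchronized (ackQueue) {
                    ackQueue.add("unacknowledged packet");
                    holding.countDown();
                    try {
                        while (!ackQueue.isEmpty()) {
                            ackQueue.wait(); // nobody ever notifies
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        }, "writer");
        writer.setDaemon(true); // let the JVM exit despite the stuck thread
        writer.start();

        // Plays "Thread-2": a second close() that needs the same stream lock.
        Thread closer = new Thread(() -> {
            synchronized (streamLock) {
                // not reached while the writer is still waiting
            }
        }, "closer");
        closer.setDaemon(true);

        try {
            holding.await(); // writer now holds streamLock
            closer.start();
            long deadline = System.currentTimeMillis() + 5000;
            while (closer.getState() != Thread.State.BLOCKED
                    && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reproducing", e);
        }
        return closer.getState();
    }

    public static void main(String[] args) {
        System.out.println("closer state: " + reproduce());
    }
}
```

In this sketch the closer thread ends up BLOCKED on the monitor, the same state "Thread-2" shows in the dump. Both threads are daemons here only so that the demo itself can exit; in the real Child process the exit path is what is stuck.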
