Hi all,

We are using hadoop-0.19.1 on about 200 nodes. We find that many slaves keep
their Child processes alive even after the job is done.

Here is an example; this process has been running since August 9!


> 1000     24625     1  0 Aug09 ?        00:00:38 (...java... classpath) org.apache.hadoop.mapred.Child 127.0.0.1 55998 attempt_200908081205_0054_r_000093_0 441920924


jstack output for the process is:


> 2009-11-12 14:58:59
> Full thread dump Java HotSpot(TM) Server VM (11.0-b15 mixed mode):
>
> "Attach Listener" daemon prio=10 tid=0x08168400 nid=0x457a waiting on condition [0x00000000..0x00000000]
>    java.lang.Thread.State: RUNNABLE
>
> "Thread-2" daemon prio=10 tid=0x08170400 nid=0x60f8 waiting for monitor entry [0xa33ad000..0xa33adfd0]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3085)
>         - waiting to lock <0xa84d12a8> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3054)
>         at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:942)
>         - locked <0xa84cba48> (a org.apache.hadoop.hdfs.DFSClient$LeaseChecker)
>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:209)
>         - locked <0xa84cba60> (a org.apache.hadoop.hdfs.DFSClient)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:264)
>         at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1413)
>         - locked <0xa84a1e00> (a org.apache.hadoop.fs.FileSystem$Cache)
>         at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:236)
>         at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:221)
>         - locked <0xa84a26f0> (a org.apache.hadoop.fs.FileSystem$ClientFinalizer)
>
> "SIGTERM handler" daemon prio=10 tid=0x08176800 nid=0x60f6 in Object.wait() [0xa35ad000..0xa35ae0d0]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0xa84a26f0> (a org.apache.hadoop.fs.FileSystem$ClientFinalizer)
>         at java.lang.Thread.join(Thread.java:1143)
>         - locked <0xa84a26f0> (a org.apache.hadoop.fs.FileSystem$ClientFinalizer)
>         at java.lang.Thread.join(Thread.java:1196)
>         at java.lang.ApplicationShutdownHooks.run(ApplicationShutdownHooks.java:79)
>         at java.lang.Shutdown.runHooks(Shutdown.java:89)
>         at java.lang.Shutdown.sequence(Shutdown.java:133)
>         at java.lang.Shutdown.exit(Shutdown.java:178)
>         - locked <0xa4556020> (a java.lang.Class for java.lang.Shutdown)
>         at java.lang.Terminator$1.handle(Terminator.java:35)
>         at sun.misc.Signal$1.run(Signal.java:195)
>         at java.lang.Thread.run(Thread.java:619)
>
> "Comm thread for attempt_200908081205_0054_r_000093_0" daemon prio=10 tid=0x083f0000 nid=0x6049 waiting for monitor entry [0xa35fe000..0xa35ff050]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.lang.Shutdown.exit(Shutdown.java:178)
>         - waiting to lock <0xa4556020> (a java.lang.Class for java.lang.Shutdown)
>         at java.lang.Runtime.exit(Runtime.java:90)
>         at java.lang.System.exit(System.java:906)
>         at org.apache.hadoop.mapred.Task$1.run(Task.java:430)
>         at java.lang.Thread.run(Thread.java:619)
>
> "Thread for syncLogs" daemon prio=10 tid=0xa39cc800 nid=0x6041 waiting for monitor entry [0xa38a3000..0xa38a3fd0]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.lang.Shutdown.exit(Shutdown.java:178)
>         - waiting to lock <0xa4556020> (a java.lang.Class for java.lang.Shutdown)
>         at java.lang.Runtime.exit(Runtime.java:90)
>         at java.lang.System.exit(System.java:906)
>         at org.apache.hadoop.mapred.Child$1.run(Child.java:84)
>
> "Low Memory Detector" daemon prio=10 tid=0x0811c800 nid=0x603e runnable [0x00000000..0x00000000]
>    java.lang.Thread.State: RUNNABLE
>
> "CompilerThread1" daemon prio=10 tid=0x0811a400 nid=0x603d waiting on condition [0x00000000..0xa3bfe5c8]
>    java.lang.Thread.State: RUNNABLE
>
> "CompilerThread0" daemon prio=10 tid=0x08118000 nid=0x603c waiting on condition [0x00000000..0xa3df5608]
>    java.lang.Thread.State: RUNNABLE
>
> "Signal Dispatcher" daemon prio=10 tid=0x08116800 nid=0x603b runnable [0x00000000..0xa3e46d90]
>    java.lang.Thread.State: RUNNABLE
>
> "Finalizer" daemon prio=10 tid=0x08104000 nid=0x603a in Object.wait() [0xa3e97000..0xa3e97e50]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0xa84887a0> (a java.lang.ref.ReferenceQueue$Lock)
>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
>         - locked <0xa84887a0> (a java.lang.ref.ReferenceQueue$Lock)
>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
>         at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>
> "Reference Handler" daemon prio=10 tid=0x08102800 nid=0x6039 in Object.wait() [0xa3ee8000..0xa3ee8fd0]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0xa84a93c0> (a java.lang.ref.Reference$Lock)
>         at java.lang.Object.wait(Object.java:485)
>         at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>         - locked <0xa84a93c0> (a java.lang.ref.Reference$Lock)
>
> "main" prio=10 tid=0x0805b000 nid=0x6033 in Object.wait() [0xb7dc6000..0xb7dc7298]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0xa84cff68> (a java.util.LinkedList)
>         at java.lang.Object.wait(Object.java:485)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3025)
>         - locked <0xa84cff68> (a java.util.LinkedList)
>         - locked <0xa84d12a8> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3105)
>         - locked <0xa84d12a8> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3054)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
>         at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:102)
>         - locked <0xa84cffd0> (a org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>         at org.apache.hadoop.mapred.Child.main(Child.java:158)
>
> "VM Thread" prio=10 tid=0x080ff000 nid=0x6038 runnable
>
> "GC task thread#0 (ParallelGC)" prio=10 tid=0x08062400 nid=0x6034 runnable
>
> "GC task thread#1 (ParallelGC)" prio=10 tid=0x08063800 nid=0x6035 runnable
>
> "GC task thread#2 (ParallelGC)" prio=10 tid=0x08065000 nid=0x6036 runnable
>
> "GC task thread#3 (ParallelGC)" prio=10 tid=0x08066400 nid=0x6037 runnable
>
> "VM Periodic Task Thread" prio=10 tid=0x0811e400 nid=0x603f waiting on condition
>
> JNI global references: 738
>
It seems the process is blocked inside the DFS client. Can anyone tell me how
to avoid this?
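Reading the dump, the threads form a chain rather than a classic lock cycle: "main" holds the DFSOutputStream lock <0xa84d12a8> while waiting in flushInternal() on the LinkedList <0xa84cff68>; "Thread-2", the FileSystem$ClientFinalizer shutdown hook, is blocked trying to lock that same DFSOutputStream; the "SIGTERM handler" holds the java.lang.Shutdown lock <0xa4556020> while join()ing that hook; and both System.exit() callers ("Comm thread ..." and "Thread for syncLogs") are blocked on the Shutdown lock. No other thread in the dump is inside DFSClient, so nothing appears to be left to notify that list. The chain can be reduced to a minimal Java sketch, assuming plain monitors can stand in for the Hadoop objects. STREAM, ACK_QUEUE, "writer" and "closer" are hypothetical names, not Hadoop code, and the sketch deliberately avoids System.exit() and real shutdown hooks so that it terminates instead of hanging.

```java
import java.util.LinkedList;

/**
 * Hypothetical reduction of the lock chain in the jstack dump (not Hadoop code).
 * STREAM stands in for the DFSOutputStream monitor <0xa84d12a8>;
 * ACK_QUEUE stands in for the LinkedList <0xa84cff68> that "main" waits on.
 */
final class ShutdownHangSketch {
    static final Object STREAM = new Object();
    static final LinkedList<Object> ACK_QUEUE = new LinkedList<>();

    /** Returns the state of the "closer" thread after giving it 500 ms to close the stream. */
    static Thread.State demo() {
        // Like "main" in flushInternal(): holds the stream lock, waits for acks nobody sends.
        Thread writer = new Thread(() -> {
            synchronized (STREAM) {
                synchronized (ACK_QUEUE) {
                    while (ACK_QUEUE.isEmpty()) {
                        try {
                            ACK_QUEUE.wait(); // releases ACK_QUEUE only; STREAM stays locked
                        } catch (InterruptedException e) {
                            return; // used only to clean up the sketch
                        }
                    }
                }
            }
        }, "writer");
        // Like "Thread-2" (the ClientFinalizer hook): needs the same stream lock to close it.
        Thread closer = new Thread(() -> {
            synchronized (STREAM) {
                // closeInternal() would run here
            }
        }, "closer");
        writer.setDaemon(true);
        closer.setDaemon(true);
        try {
            writer.start();
            while (writer.getState() != Thread.State.WAITING) {
                Thread.sleep(10); // wait until writer is parked inside ACK_QUEUE.wait()
            }
            closer.start();
            // The real SIGTERM handler calls join() with no timeout, so it never returns.
            closer.join(500);
            Thread.State observed = closer.getState();
            writer.interrupt(); // let the sketch finish: writer exits and releases STREAM
            writer.join();
            closer.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("closer state after 500 ms: " + demo());
    }
}
```

Running it should print `closer state after 500 ms: BLOCKED`. In the real Child process there is no timeout and no interrupt, so the SIGTERM handler's join() never returns and every later System.exit() call waits on the Shutdown lock indefinitely.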

Best Regards,

Ted Xu
