[ https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267257#comment-13267257 ]
Roman K commented on GIRAPH-169:
--------------------------------

I reproduced the problem even in a simpler case, with only 1 worker in a pseudo-distributed environment:

hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 10 -v -V 1000 -w 1

I took a full thread dump of the "hung" child process using jstack (below is the meaningful part, with the GC threads omitted), but I have not figured out the problem yet:

--------------------------------------------------------------------------------
"pool-1-thread-1" prio=10 tid=0x00007f0398539000 nid=0x2218 waiting on condition [0x00007f0356d87000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000000fe1613a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:662)

"pool-2-thread-1" prio=10 tid=0x00007f03984ed000 nid=0x2213 runnable [0x00007f035728c000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x00000000fe1880f0> (a sun.nio.ch.Util$2)
        - locked <0x00000000fe188100> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000fe1880a8> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
        at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:333)
        - locked <0x00000000fe188110> (a org.apache.hadoop.ipc.Server$Listener$Reader)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

"LeaseChecker" daemon prio=10 tid=0x00007f039847a800 nid=0x21fa waiting on condition [0x00007f035758f000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1376)
        at java.lang.Thread.run(Thread.java:662)

"Thread for syncLogs" daemon prio=10 tid=0x00007f0398479000 nid=0x21eb waiting on condition [0x00007f0357b9a000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.Child$3.run(Child.java:139)

"Low Memory Detector" daemon prio=10 tid=0x00007f039809c000 nid=0x21e2 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f0398099800 nid=0x21e1 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f0398096800 nid=0x21e0 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f0398094800 nid=0x21df runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007f0398078000 nid=0x21de in Object.wait() [0x00007f0394af9000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000fe158540> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
        - locked <0x00000000fe158540> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x00007f0398076000 nid=0x21dd in Object.wait() [0x00007f0394bfa000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000fe160070> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x00000000fe160070> (a java.lang.ref.Reference$Lock)
--------------------------------------------------------------------------------

> How to close all child when a job finished?
> -------------------------------------------
>
>                 Key: GIRAPH-169
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-169
>             Project: Giraph
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 0.2.0
>         Environment: sles 11 x64,jdk 1.6,hadoop 0.20.205.0,1 Master and 8 slaves,
>            Reporter: Jianfeng Qian
>            Priority: Minor
>
> I ran pagerank at hadoop 0.20.205.0. When the job finished,the child in
> slaves didn't quit immediately and sometimes they never quit and I have to
> kill them.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira