[ 
https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267257#comment-13267257
 ] 

Roman K commented on GIRAPH-169:
--------------------------------

I reproduced the problem in a simpler setup, with only 1 worker in a
pseudo-distributed environment:

hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 10 -v -V 1000 -w 1

I took a full thread dump of the "hung" child process with jstack. The
relevant part, with the GC threads omitted, is below, but I have not been able
to identify the cause yet:

--------------------------------------------------------------------------------
"pool-1-thread-1" prio=10 tid=0x00007f0398539000 nid=0x2218 waiting on condition [0x00007f0356d87000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000fe1613a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:662)

"pool-2-thread-1" prio=10 tid=0x00007f03984ed000 nid=0x2213 runnable [0x00007f035728c000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x00000000fe1880f0> (a sun.nio.ch.Util$2)
        - locked <0x00000000fe188100> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000fe1880a8> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
        at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:333)
        - locked <0x00000000fe188110> (a org.apache.hadoop.ipc.Server$Listener$Reader)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

"LeaseChecker" daemon prio=10 tid=0x00007f039847a800 nid=0x21fa waiting on condition [0x00007f035758f000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1376)
        at java.lang.Thread.run(Thread.java:662)

"Thread for syncLogs" daemon prio=10 tid=0x00007f0398479000 nid=0x21eb waiting on condition [0x00007f0357b9a000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.Child$3.run(Child.java:139)

"Low Memory Detector" daemon prio=10 tid=0x00007f039809c000 nid=0x21e2 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f0398099800 nid=0x21e1 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f0398096800 nid=0x21e0 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f0398094800 nid=0x21df runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007f0398078000 nid=0x21de in Object.wait() [0x00007f0394af9000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000fe158540> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
        - locked <0x00000000fe158540> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x00007f0398076000 nid=0x21dd in Object.wait() [0x00007f0394bfa000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000fe160070> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x00000000fe160070> (a java.lang.ref.Reference$Lock)

---------------------------------------------------------------------------------------
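
One observation that may help narrow this down: among the threads shown, "pool-1-thread-1" and "pool-2-thread-1" are the only ones not marked daemon. The first is an idle ThreadPoolExecutor worker parked in LinkedBlockingQueue.take(), and the second is running org.apache.hadoop.ipc.Server$Listener$Reader.run. A JVM does not exit while a non-daemon thread is alive, so an executor that is never shut down would be consistent with this dump. Whether that is the actual cause here is not established. The sketch below only illustrates the mechanism in isolation; the class IdlePoolThreadDemo and its helper methods are hypothetical and are not Giraph or Hadoop code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class, not part of Giraph or Hadoop.
public class IdlePoolThreadDemo {

    /** Runs one trivial task and returns the pool thread that executed it. */
    static Thread startIdleWorker(ExecutorService pool) throws Exception {
        return pool.submit(new Callable<Thread>() {
            public Thread call() {
                return Thread.currentThread();
            }
        }).get();
    }

    /** Polls until the thread reaches the given state or the timeout expires. */
    static boolean waitForState(Thread t, Thread.State state, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (t.getState() != state) {
            if (System.currentTimeMillis() > deadline) {
                return false;
            }
            Thread.sleep(10);
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        Thread worker = startIdleWorker(pool);

        // With no more work queued, the worker parks in LinkedBlockingQueue.take().
        boolean parked = waitForState(worker, Thread.State.WAITING, 5000);
        System.out.println(worker.getName() + " daemon=" + worker.isDaemon()
                + " parked=" + parked);

        // Without this call the non-daemon worker would keep the JVM alive
        // after main() returns.
        pool.shutdown();
        worker.join(5000);
        System.out.println(worker.getName() + " alive=" + worker.isAlive());
    }
}
```

Commenting out the pool.shutdown() call and taking a jstack dump of the resulting process should show a worker thread in the same WAITING (parking) state as "pool-1-thread-1" above.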
                
> How to close all child when a job finished?
> -------------------------------------------
>
>                 Key: GIRAPH-169
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-169
>             Project: Giraph
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 0.2.0
>         Environment: SLES 11 x64, JDK 1.6, Hadoop 0.20.205.0, 1 master and 8 slaves
>            Reporter: Jianfeng Qian
>            Priority: Minor
>
> I ran PageRank on Hadoop 0.20.205.0. When the job finished, the child
> processes on the slaves did not quit immediately; sometimes they never quit
> and I had to kill them.
