[ 
https://issues.apache.org/jira/browse/TEZ-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111968#comment-14111968
 ] 

Rajesh Balamohan commented on TEZ-1494:
---------------------------------------

Added a small test case at https://github.com/rajeshbalamohan/tez-1494 that 
can be run from a local VM to reproduce the issue. With 
-Dtez.shuffle-vertex-manager.enable.auto-parallel=false, the DAG succeeds. 
I initially thought it was due to slow-start kicking in too early, but it 
appears to be a problem where the downstream vertex connected via a broadcast 
edge is not updated when the parallelism is changed. 
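
To illustrate the suspected failure mode, here is a minimal toy model in plain 
Java. It is not Tez code: the class name, method name, and counts are 
hypothetical, and it assumes the consumer keeps expecting the original number 
of inputs after the producer count has been reduced. A timed poll() stands in 
for the blocking take() seen in the stack trace so that the sketch terminates 
instead of hanging.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy model only; names and numbers are hypothetical, not taken from Tez.
class BroadcastHangSketch {

    // Returns how many inputs the consumer received before it would have blocked.
    static int consume(int expectedInputs, int deliveredInputs, long timeoutMs) {
        LinkedBlockingQueue<Integer> completedInputs = new LinkedBlockingQueue<>();
        // Producers deliver one completed input each.
        for (int i = 0; i < deliveredInputs; i++) {
            completedInputs.add(i);
        }
        int received = 0;
        try {
            while (received < expectedInputs) {
                // A plain take() would park forever here once the queue is empty.
                Integer next = completedInputs.poll(timeoutMs, TimeUnit.MILLISECONDS);
                if (next == null) {
                    break; // timed out: this is where the real consumer would hang
                }
                received++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }

    public static void main(String[] args) {
        // Consumer and producers agree on the count: all inputs arrive.
        System.out.println(consume(4, 4, 100)); // prints 4
        // Producer count was reduced but the consumer still expects 4.
        System.out.println(consume(4, 2, 100)); // prints 2
    }
}
```

In the second call the consumer stops at 2 of 4 expected inputs; with an 
untimed take() it would wait indefinitely, which matches the shape of the 
thread dump below but does not by itself confirm the root cause.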

> DAG hangs waiting for ShuffleManager.getNextInput()
> ---------------------------------------------------
>
>                 Key: TEZ-1494
>                 URL: https://issues.apache.org/jira/browse/TEZ-1494
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>              Labels: performance
>         Attachments: TEZ-1494-DAG.dot
>
>
> Attaching the DAG and the stack trace of the hung process.  
> Thread 30071: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=2043 (Interpreted frame)
>  - java.util.concurrent.LinkedBlockingQueue.take() @bci=29, line=442 (Interpreted frame)
>  - org.apache.tez.runtime.library.shuffle.common.impl.ShuffleManager.getNextInput() @bci=67, line=610 (Interpreted frame)
>  - org.apache.tez.runtime.library.common.readers.UnorderedKVReader.moveToNextInput() @bci=26, line=176 (Interpreted frame)
>  - org.apache.tez.runtime.library.common.readers.UnorderedKVReader.next() @bci=30, line=117 (Interpreted frame)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
