[ https://issues.apache.org/jira/browse/TEZ-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112543#comment-14112543 ]
Siddharth Seth commented on TEZ-1494:
-------------------------------------

We initialize a vertex once all of the following conditions are met: 1) its initializer is complete, 2) its edges are set up, and 3) its parallelism is not -1. All three conditions hold for Reducer3, so it gets initialized and allows Map5 (a dependent vertex) to start.

We currently have no way of knowing whether a vertex will change its parallelism, and therefore whether we should block waiting for such a change. Alternatively, we'll have to update the downstream tasks with the new parallelism information. That may be the better way to handle this, since parallelism could potentially change multiple times later on.

> DAG hangs waiting for ShuffleManager.getNextInput()
> ---------------------------------------------------
>
> Key: TEZ-1494
> URL: https://issues.apache.org/jira/browse/TEZ-1494
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Labels: performance
> Attachments: TEZ-1494-DAG.dot
>
>
> Attaching the DAG and the stack trace of the hung process.
> Thread 30071: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=2043 (Interpreted frame)
>  - java.util.concurrent.LinkedBlockingQueue.take() @bci=29, line=442 (Interpreted frame)
>  - org.apache.tez.runtime.library.shuffle.common.impl.ShuffleManager.getNextInput() @bci=67, line=610 (Interpreted frame)
>  - org.apache.tez.runtime.library.common.readers.UnorderedKVReader.moveToNextInput() @bci=26, line=176 (Interpreted frame)
>  - org.apache.tez.runtime.library.common.readers.UnorderedKVReader.next() @bci=30, line=117 (Interpreted frame)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
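The initialization rule and the suspected failure mode in the comment can be modelled in a few lines of Java. This is a hypothetical sketch, not Tez code: `VertexSketch`, `DownstreamConsumerSketch`, `onUpstreamParallelismChanged`, and the change of Reducer3's parallelism from 4 to 2 are illustrative assumptions. It also assumes the hang comes from Map5 waiting for a number of inputs derived from Reducer3's original parallelism. A bounded `poll` stands in for the unbounded `LinkedBlockingQueue.take()` in the stack trace so the demo terminates.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Entry point first so single-file source mode (java Tez1494Sketch.java) picks it up.
final class Tez1494Sketch {
    // Hypothetical scenario: Reducer3 initializes, Map5 starts, then Reducer3 shrinks.
    static boolean run(boolean propagateChange) {
        VertexSketch reducer3 = new VertexSketch();
        reducer3.initializerComplete = true;
        reducer3.edgesSetUp = true;
        reducer3.parallelism = 4;
        if (!reducer3.canInitialize()) {
            throw new IllegalStateException("Reducer3 should be initializable");
        }
        // Map5 (dependent vertex) starts and fixes how many inputs it expects.
        DownstreamConsumerSketch map5 = new DownstreamConsumerSketch(reducer3.parallelism);

        // Reducer3 later changes its parallelism (assumed 4 -> 2 for illustration).
        reducer3.parallelism = 2;
        if (propagateChange) {
            map5.onUpstreamParallelismChanged(reducer3.parallelism);
        }
        // Only the tasks that actually run produce outputs.
        for (int i = 0; i < reducer3.parallelism; i++) {
            map5.onInputReady("Reducer3-task-" + i);
        }
        return map5.drain(100);
    }

    public static void main(String[] args) {
        System.out.println("without update: " + run(false));
        System.out.println("with update:    " + run(true));
    }
}

final class VertexSketch {
    static final int UNDEFINED_PARALLELISM = -1;

    boolean initializerComplete;
    boolean edgesSetUp;
    int parallelism = UNDEFINED_PARALLELISM;

    // The three conditions listed in the comment.
    boolean canInitialize() {
        return initializerComplete && edgesSetUp && parallelism != UNDEFINED_PARALLELISM;
    }
}

final class DownstreamConsumerSketch {
    private int expectedInputs;
    private int received;
    private final BlockingQueue<String> inputs = new LinkedBlockingQueue<>();

    DownstreamConsumerSketch(int expectedInputs) {
        this.expectedInputs = expectedInputs;
    }

    void onInputReady(String id) {
        inputs.add(id);
    }

    // The comment's alternative: push new parallelism to already-running tasks.
    void onUpstreamParallelismChanged(int newParallelism) {
        expectedInputs = newParallelism;
    }

    // Returns true if every expected input arrived; false if it gave up waiting.
    boolean drain(long timeoutMs) {
        try {
            while (received < expectedInputs) {
                String next = inputs.poll(timeoutMs, TimeUnit.MILLISECONDS);
                if (next == null) {
                    return false; // with take() this wait would never end
                }
                received++;
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Running `main` prints `false` for the path without the update (the consumer is still waiting for two outputs that will never be produced) and `true` once the new parallelism is pushed downstream, which corresponds to the comment's preferred direction of updating downstream tasks rather than blocking initialization.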