[ 
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189606#comment-14189606
 ] 

Rajesh Balamohan commented on TEZ-1547:
---------------------------------------

Corner case in ImmediateStartVertexManager:

1. onVertexStarted() gets called and is in the middle of populating
srcVertexConfigured. Assume it has to add 2 items to srcVertexConfigured
and has added only 1 of them so far.
2. In the meantime, onVertexStateUpdated() gets called with
COMPLETELY_CONFIGURED for the item already in srcVertexConfigured.
3. In this case, canScheduleTasks() would return true (without being aware of 
the 2nd item that is yet to be populated in srcVertexConfigured).
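4. If the source vertex corresponding to the 2nd item later changes its
parallelism, the DAG can hang indefinitely, since tasks have already been
scheduled without waiting for that source to be configured.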
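The interleaving in steps 1-3 can be reproduced with a deliberately simplified model. The class below is hypothetical: it is not the Tez ImmediateStartVertexManager and does not use the VertexManagerPlugin API. It assumes srcVertexConfigured holds one Boolean per source vertex and that canScheduleTasks() only inspects the entries present so far. The vertex names srcA and srcB are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of the race described above.
 * Not Tez code: class and method names are illustrative only.
 */
class NaiveSourceTracker {
    // Source vertex name -> whether COMPLETELY_CONFIGURED has been seen.
    private final Map<String, Boolean> srcVertexConfigured = new HashMap<>();

    // Models one iteration of the onVertexStarted() population loop.
    void registerSource(String vertexName) {
        srcVertexConfigured.put(vertexName, false);
    }

    // Models onVertexStateUpdated() delivering COMPLETELY_CONFIGURED.
    void onSourceConfigured(String vertexName) {
        if (srcVertexConfigured.containsKey(vertexName)) {
            srcVertexConfigured.put(vertexName, true);
        }
    }

    // Flawed check: only inspects the entries added so far, so a
    // partially populated map can look "fully configured".
    boolean canScheduleTasks() {
        for (boolean configured : srcVertexConfigured.values()) {
            if (!configured) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        NaiveSourceTracker tracker = new NaiveSourceTracker();
        tracker.registerSource("srcA");      // step 1: 1 of 2 sources added
        tracker.onSourceConfigured("srcA");  // step 2: notification arrives
        System.out.println(tracker.canScheduleTasks()); // step 3: prints true
        tracker.registerSource("srcB");      // 2nd source added too late
        System.out.println(tracker.canScheduleTasks()); // prints false
    }
}
```

By the time srcB is added, the earlier true result may already have been acted on, which is the window the steps above describe.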

{code}
e.g. log:

2014-10-29 20:38:46,172 INFO [AsyncDispatcher event handler] impl.ImmediateStartVertexManager: Task count in Map_7: 1
2014-10-29 20:38:46,173 INFO [AsyncDispatcher event handler] impl.ImmediateStartVertexManager: Received configured notification : COMPLETELY_CONFIGURED for vertex: Map_7
2014-10-29 20:38:46,173 INFO [AsyncDispatcher event handler] impl.ImmediateStartVertexManager: Starting 10 in Map_5
2014-10-29 20:38:46,173 INFO [AsyncDispatcher event handler] impl.ImmediateStartVertexManager: Task count in Reducer_3: 2
...
...
2014-10-29 20:39:18,682 INFO [AsyncDispatcher event handler] vertexmanager.ShuffleVertexManager: Reduce auto parallelism for vertex: Reducer_3 to 1 from 2 . Expected output: 0 based on actual output: 0 from 1 vertex manager events.  desiredTaskInputSize: 104857600 max slow start tasks:0.1 num sources completed:1
{code}

In short, a check should be added in scheduleTasks() to ensure that
onVertexStarted() has completely populated srcVertexConfigured before any
tasks are scheduled.
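A minimal sketch of that check follows. It again uses a hypothetical model rather than the real ImmediateStartVertexManager or VertexManagerPlugin API. It assumes a flag (registrationComplete, an invented name) is set once the population loop in onVertexStarted() finishes, and that scheduling is retried at that point so notifications received during population are not lost. finishRegistration() and hasScheduledTasks() are likewise illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed guard; not Tez code.
 * Tasks are scheduled only after registration of ALL sources has
 * finished and every registered source is COMPLETELY_CONFIGURED.
 */
class GuardedSourceTracker {
    private final Map<String, Boolean> srcVertexConfigured = new HashMap<>();
    // Set only when the onVertexStarted()-style population loop is done.
    private boolean registrationComplete = false;
    private boolean tasksScheduled = false;

    // One iteration of the population loop; keeps an earlier "true".
    void registerSource(String vertexName) {
        srcVertexConfigured.putIfAbsent(vertexName, false);
    }

    // End of the population loop. Re-check, because notifications may
    // already have arrived while the map was being filled.
    void finishRegistration() {
        registrationComplete = true;
        scheduleTasks();
    }

    // Assumes notifications only arrive for this vertex's own sources,
    // so recording one before registerSource() is safe.
    void onSourceConfigured(String vertexName) {
        srcVertexConfigured.put(vertexName, true);
        scheduleTasks();
    }

    boolean canScheduleTasks() {
        // The added check: a partially populated map is never trusted.
        return registrationComplete && !srcVertexConfigured.containsValue(false);
    }

    void scheduleTasks() {
        if (!tasksScheduled && canScheduleTasks()) {
            tasksScheduled = true; // real code would start the tasks here
        }
    }

    boolean hasScheduledTasks() {
        return tasksScheduled;
    }

    public static void main(String[] args) {
        GuardedSourceTracker tracker = new GuardedSourceTracker();
        tracker.registerSource("srcA");
        tracker.onSourceConfigured("srcA");              // arrives mid-population
        System.out.println(tracker.hasScheduledTasks()); // false: guard holds
        tracker.registerSource("srcB");
        tracker.finishRegistration();
        System.out.println(tracker.hasScheduledTasks()); // false: srcB pending
        tracker.onSourceConfigured("srcB");
        System.out.println(tracker.hasScheduledTasks()); // true
    }
}
```

The retry in finishRegistration() matters as much as the flag. Without it, a guard that rejects early notifications could itself leave tasks unscheduled if every source reported COMPLETELY_CONFIGURED before population finished.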


> Make use of state change notifier in VertexManagerPlugins
> ---------------------------------------------------------
>
>                 Key: TEZ-1547
>                 URL: https://issues.apache.org/jira/browse/TEZ-1547
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>         Attachments: TEZ-1547.1.patch, TEZ-1547.3.patch, TEZ-1547.4.patch, 
> TEZ-1547.5.patch, TEZ-1547.6.patch, TEZ-1547.7.patch
>
>
> Instead of the various APIs like onVertexStarted, simple notifications could 
> be sent.
> Some existing APIs could end up being deprecated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
