Adrian Nicoara created TEZ-4060:
-----------------------------------

             Summary: NoOpVertexManager schedules tasks that are not ready to run
                 Key: TEZ-4060
                 URL: https://issues.apache.org/jira/browse/TEZ-4060
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.9.1
            Reporter: Adrian Nicoara


During recovery, vertices that have already been reconfigured are assigned a 
NoOpVertexManager:
[https://github.com/apache/tez/blob/8395a9560a131799f1af49b26e1f10f12ef48752/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java#L2689-L2711]

[https://github.com/apache/tez/blob/8395a9560a131799f1af49b26e1f10f12ef48752/tez-dag/src/main/java/org/apache/tez/dag/app/RecoveryParser.java#L970-L972]

The NoOpVertexManager directly schedules tasks upon being started:

[https://github.com/apache/tez/blob/8395a9560a131799f1af49b26e1f10f12ef48752/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java#L4628]

However, for a large graph, all vertices can end up configured and started 
before many of their inputs have been generated. This applies to vertices 
that are not attached to the roots.

As a result, tasks are scheduled before they are ready to run, and they keep 
failing until their inputs are generated.
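
To illustrate the difference, here is a minimal, self-contained sketch. It 
does not use any Tez classes: ToyVertex, noOpOnVertexStarted, 
gatedOnVertexStarted and gatedOnSourceTaskCompleted are hypothetical names 
that only model the behaviour described above, under the assumption that a 
vertex is ready once all of its source tasks have completed.

```java
import java.util.ArrayList;
import java.util.List;

public class SchedulingSketch {

    /** Hypothetical stand-in for a vertex; this is not a Tez class. */
    static final class ToyVertex {
        final int numTasks;
        final int numSourceTasks;        // upstream tasks this vertex reads from
        int completedSourceTasks = 0;
        final List<Integer> scheduled = new ArrayList<>();

        ToyVertex(int numTasks, int numSourceTasks) {
            this.numTasks = numTasks;
            this.numSourceTasks = numSourceTasks;
        }

        /** Assumption: inputs are ready once every source task has completed. */
        boolean inputsReady() {
            return completedSourceTasks >= numSourceTasks;
        }

        void scheduleAll() {
            for (int i = 0; i < numTasks; i++) {
                scheduled.add(i);
            }
        }

        /** Number of tasks that were scheduled while inputs are still missing. */
        int prematurelyScheduled() {
            return inputsReady() ? 0 : scheduled.size();
        }
    }

    /** Models the reported behaviour: schedule everything as soon as the vertex starts. */
    static void noOpOnVertexStarted(ToyVertex v) {
        v.scheduleAll();
    }

    /** Models a dependency-aware manager: on start, schedule only if inputs exist. */
    static void gatedOnVertexStarted(ToyVertex v) {
        if (v.inputsReady()) {
            v.scheduleAll();
        }
    }

    /** Models a dependency-aware manager reacting to a completed source task. */
    static void gatedOnSourceTaskCompleted(ToyVertex v) {
        v.completedSourceTasks++;
        if (v.inputsReady() && v.scheduled.isEmpty()) {
            v.scheduleAll();
        }
    }

    public static void main(String[] args) {
        ToyVertex recovered = new ToyVertex(3, 2);
        noOpOnVertexStarted(recovered);
        System.out.println("no-op, premature tasks: " + recovered.prematurelyScheduled());

        ToyVertex gated = new ToyVertex(3, 2);
        gatedOnVertexStarted(gated);
        System.out.println("gated, premature tasks: " + gated.prematurelyScheduled());
    }
}
```

In this toy model the gated variant schedules nothing until 
gatedOnSourceTaskCompleted has been called once per source task, whereas the 
no-op variant schedules every task immediately.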

In addition to bypassing input dependency checking, which is generally done in 
VertexManagerPlugin#onSourceTaskCompleted, we lose the ability to execute 
custom logic within our own VertexManagerPlugins that is needed for the 
configuration of downstream vertices. This is because we communicate some 
graph configuration metadata through global objects that are populated 
through calls to VertexManagerPlugin#onVertexStateUpdated.
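
The second problem can be sketched in the same spirit. The code below is a toy 
model, not our actual plugin and not the Tez API: PARTITION_HINTS, ToyManager, 
CustomManager, NoOpManager, the string state "CONFIGURED" and the value 8 are 
all hypothetical. It only shows that shared metadata stays empty when the 
manager that would populate it is replaced by one that ignores state updates.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class SharedMetadataSketch {

    /** Hypothetical global object carrying graph configuration metadata. */
    static final Map<String, Integer> PARTITION_HINTS = new ConcurrentHashMap<>();

    /** Hypothetical, simplified callback interface; not the Tez plugin API. */
    interface ToyManager {
        void onVertexStateUpdated(String vertexName, String state);
    }

    /** Models a custom manager that publishes metadata when a vertex is configured. */
    static final class CustomManager implements ToyManager {
        @Override
        public void onVertexStateUpdated(String vertexName, String state) {
            if ("CONFIGURED".equals(state)) {
                PARTITION_HINTS.put(vertexName, 8); // 8 is an arbitrary example value
            }
        }
    }

    /** Models the replacement manager: the update is received but nothing is published. */
    static final class NoOpManager implements ToyManager {
        @Override
        public void onVertexStateUpdated(String vertexName, String state) {
            // Intentionally empty: custom logic is never executed.
        }
    }

    /** What a downstream vertex would look up when configuring itself. */
    static Optional<Integer> hintFor(String vertexName) {
        return Optional.ofNullable(PARTITION_HINTS.get(vertexName));
    }

    public static void main(String[] args) {
        new NoOpManager().onVertexStateUpdated("v1", "CONFIGURED");
        System.out.println("after no-op manager: " + hintFor("v1").isPresent());

        new CustomManager().onVertexStateUpdated("v1", "CONFIGURED");
        System.out.println("after custom manager: " + hintFor("v1").isPresent());
    }
}
```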



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
