Adrian Nicoara created TEZ-4060:
-----------------------------------
Summary: NoOpVertexManager schedules tasks that are not ready to run
Key: TEZ-4060
URL: https://issues.apache.org/jira/browse/TEZ-4060
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.9.1
Reporter: Adrian Nicoara
During recovery, vertices which have already been reconfigured get assigned a
NoOpVertexManager:
[https://github.com/apache/tez/blob/8395a9560a131799f1af49b26e1f10f12ef48752/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java#L2689-L2711]
[https://github.com/apache/tez/blob/8395a9560a131799f1af49b26e1f10f12ef48752/tez-dag/src/main/java/org/apache/tez/dag/app/RecoveryParser.java#L970-L972]
The NoOpVertexManager directly schedules tasks upon being started:
[https://github.com/apache/tez/blob/8395a9560a131799f1af49b26e1f10f12ef48752/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java#L4628]
However, in a large graph, all vertices can end up configured and started
before many of their inputs (for vertices that are not attached to the roots)
have been generated.
As a result, tasks get scheduled before they are ready to run, and they keep
failing until their inputs have been generated.
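To make the ordering problem concrete, here is a minimal, self-contained Java sketch. It deliberately does not use the Tez API: SimpleVertexManager, EagerManager, GatedManager and the rule that a task needs every upstream task to finish are hypothetical simplifications. EagerManager stands in for the described NoOpVertexManager behavior (schedule as soon as the vertex starts), while GatedManager stands in for the kind of input dependency check a real plugin would perform in VertexManagerPlugin#onSourceTaskCompleted.

```java
import java.util.ArrayList;
import java.util.List;

public class VertexSchedulingSketch {

    // Hypothetical, simplified manager interface; NOT the Tez VertexManagerPlugin API.
    // Each callback returns the indices of tasks that become schedulable at that point.
    interface SimpleVertexManager {
        List<Integer> onVertexStarted();
        List<Integer> onSourceTaskCompleted();
    }

    // Models the behavior described for NoOpVertexManager: schedule every task on start,
    // regardless of how many upstream (source) tasks have finished.
    static class EagerManager implements SimpleVertexManager {
        private final int numTasks;

        EagerManager(int numTasks) { this.numTasks = numTasks; }

        @Override
        public List<Integer> onVertexStarted() { return allTasks(numTasks); }

        @Override
        public List<Integer> onSourceTaskCompleted() { return new ArrayList<>(); }
    }

    // Gates scheduling on input readiness: tasks are released only once the vertex has
    // started AND every source task has completed (assumed all-to-all dependency).
    static class GatedManager implements SimpleVertexManager {
        private final int numTasks;
        private final int numSourceTasks;
        private int completedSourceTasks = 0;
        private boolean started = false;
        private boolean scheduled = false;

        GatedManager(int numTasks, int numSourceTasks) {
            this.numTasks = numTasks;
            this.numSourceTasks = numSourceTasks;
        }

        @Override
        public List<Integer> onVertexStarted() {
            started = true;
            return maybeSchedule();
        }

        @Override
        public List<Integer> onSourceTaskCompleted() {
            completedSourceTasks++;
            return maybeSchedule();
        }

        // Schedules all tasks exactly once, and only when both conditions hold.
        private List<Integer> maybeSchedule() {
            if (started && !scheduled && completedSourceTasks >= numSourceTasks) {
                scheduled = true;
                return allTasks(numTasks);
            }
            return new ArrayList<>();
        }
    }

    static List<Integer> allTasks(int n) {
        List<Integer> tasks = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            tasks.add(i);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // A vertex with 4 tasks reading from 3 upstream tasks, none of which has finished.
        SimpleVertexManager eager = new EagerManager(4);
        SimpleVertexManager gated = new GatedManager(4, 3);

        System.out.println("eager on start: " + eager.onVertexStarted().size()); // 4
        System.out.println("gated on start: " + gated.onVertexStarted().size()); // 0
        gated.onSourceTaskCompleted();
        gated.onSourceTaskCompleted();
        System.out.println("gated after 3rd source: " + gated.onSourceTaskCompleted().size()); // 4
    }
}
```

The gate here is intentionally coarse (all sources, all tasks at once); a real plugin could release tasks incrementally or per input. The point is only that scheduling is driven by source-task completions rather than by the vertex start event alone.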
In addition to bypassing input dependency checking, which is normally done in
VertexManagerPlugin#onSourceTaskCompleted, we lose the ability to execute
custom logic in our own VertexManagerPlugins that is needed to configure
downstream vertices. This is because we communicate some graph configuration
metadata through global objects that are populated by calls to
VertexManagerPlugin#onVertexStateUpdated.
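A second hypothetical sketch, again independent of the Tez API, shows why substituting a no-op manager also drops side effects that downstream configuration relies on. GRAPH_METADATA stands in for the global objects described in the issue, PublishingManager for a custom plugin that populates them from its onVertexStateUpdated callback, and NoOpManager for the manager assigned during recovery; the State enum and the "partitions-ready" value are illustrative only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StateUpdateSketch {
    // Hypothetical subset of vertex states (not Tez's VertexState enum).
    enum State { CONFIGURED, RUNNING, SUCCEEDED }

    // Hypothetical process-wide store standing in for the "global objects".
    static final Map<String, String> GRAPH_METADATA = new ConcurrentHashMap<>();

    interface StateListener {
        void onVertexStateUpdated(String vertexName, State state);
    }

    // Custom plugin: publishes metadata that downstream configuration depends on.
    static class PublishingManager implements StateListener {
        @Override
        public void onVertexStateUpdated(String vertexName, State state) {
            if (state == State.CONFIGURED) {
                GRAPH_METADATA.put(vertexName, "partitions-ready");
            }
        }
    }

    // Stand-in for the no-op manager used during recovery: ignores the callback entirely.
    static class NoOpManager implements StateListener {
        @Override
        public void onVertexStateUpdated(String vertexName, State state) {
            // intentionally empty
        }
    }

    // Hypothetical downstream check that only succeeds if the metadata was published.
    static boolean canConfigureDownstream(String upstreamVertex) {
        return GRAPH_METADATA.containsKey(upstreamVertex);
    }

    public static void main(String[] args) {
        new PublishingManager().onVertexStateUpdated("v1", State.CONFIGURED);
        System.out.println(canConfigureDownstream("v1")); // true

        GRAPH_METADATA.clear();
        new NoOpManager().onVertexStateUpdated("v1", State.CONFIGURED);
        System.out.println(canConfigureDownstream("v1")); // false
    }
}
```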