[
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191625#comment-14191625
]
Siddharth Seth commented on TEZ-1547:
-------------------------------------
bq. Its actually already done that way to make the common case easier. An
incompletely configured vertex will implicitly send the notification when its
completely configured. Calling these API's is optional for the common case.
Yep. Was mentioning setParallelism as an alternate. I think the explicit
approach is better.
bq. Done this way because there are many path out of INITING. But all of those
immediately go trigger startVertex(). So it was easier to put it there without
any drawbacks. This would result in this and RUNNING coming one after another
except for cases like auto-reduce/multiple changes in parallelism etc.
I think it'll be better to send this out as early as possible if it's really a
reconfiguration update - form within setParallelism itself (only way to
configure a vertex), and onReconfiguringVertex in case the user specifies
intent, but doesn't actually call setParallelism.
In terms of the event name - RECONFIGURED would be better than
COMPLETELY_CONFIGURED IMO. COMPLETELY_CONFIGURED prevents adding support for
re-configuration after a vertex has started. While that's not planned at the
moment, this at least doesn't get in the way of such a change.
Also, should we drop PARALLELISM_UPDATED - since that's a subset of
RECONFIGURED.
The rest...
- It's still possible for COMPLETELY_CONFIGURED to go out twice
(ShuffleVertexManger can generate this). onVertexStarted, ShuffleVertexManager
schedules and call doneWithReconf (event goes out and sets flag to false).
maybeSend.. will end up re-sending the event. VertexImpl should guard against
this.
- if (fromVertexManager && canInitVertex()) - Should probably be after the if
(parallelismSet check). Multiple calls to setParallelism surfacing as an error
before an attempt to reconfigure a configured vertex.
- In terms of the incompatible change mentioned in the previous comment - yep,
MRInputSplitDistributor users will be fine. I'd ideally like to confirm with
Hive/Pig on their custom vertexManagers and how they set up parallelism
initially (-1, or a fixed value).
- VertexManagerPlugin.onVertexStateUpdated (and
InputInitializerPluginContext.onVertexStateUpdated) - JavaDoc should ideally
mention that these events may come on a separate thread, and the user code
should account for this. Follow up jira.
- ImmediateStartVertexManager. Does it make sense to rename these (pre-commit)
to something like StartOnSourcesConfiguredVertexManager - since that's the
current behaviour.
- "if (getContext().getVertexNumTasks(srcVertex) > 0)" - Is this because
vertices with parallelism set to -1 would end up not sending a
SRC_VERTEX_STARTED event to downstream vertices, and hence onVertexStarted
isn't invoked ?
- "if (srcVertexConfigured.put(stateUpdate.getVertexName(),
true).booleanValue() == false) {" Is this required, or just guarding against
duplicate notifications for the future?
- Exception handling change in VertexImpl - the Vertex should move into FAILED
state instead of KILELD.
- VertexManagerUserCodeErrorTransition needs to handle different states -
RUNNING (tasks scheduled) vs INITED/INITING. Also needs to call
vertex.finished() - and send out an event to the DAG, otherwise the DAG will
end up hanging. Ref: RootInputInitFailedTransition
- VertexManager - the isComplete() checks are no longer required. It's harmless
leaving them there as long as we're mindful of what gets blocked eventually. If
leaving them in place, please add a comment on the variable indicating eventual
intent, for folks browsing the code.
> Make use of state change notifier in VertexManagerPlugins
> ---------------------------------------------------------
>
> Key: TEZ-1547
> URL: https://issues.apache.org/jira/browse/TEZ-1547
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: TEZ-1547.1.patch, TEZ-1547.3.patch, TEZ-1547.4.patch,
> TEZ-1547.5.patch, TEZ-1547.6.patch, TEZ-1547.7.patch, TEZ-1547.8.patch
>
>
> Instead of the various APIs like onVertexStarted, simple notifications could
> be sent.
> Some existing APIs could end up being deprecated.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)