[
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191181#comment-14191181
]
Bikas Saha commented on TEZ-1547:
---------------------------------
Fixed the race condition. Added test cases for that.
bq. vertexReconfigurationPlanned - In the JavaDoc, mentioning that
scheduleTasks / addInputInitilizerEvents does not qualify as re-configuration
Done.
bq. public void doneReconfiguringVertex() - The same could be achieved by
assuming reconfiguration
Its actually already done that way to make the common case easier. An
incompletely configured vertex will implicitly send the notification when its
completely configured. Calling these API's is optional for the common case. The
new API's are needed only when a fully configured vertex is re-configured. E.g.
Shuffle vertex parallelism is set and auto-parallel can choose to not change
the parallelism and setParallelism() will not be invoked. However,
doneReconfiguringVertex() must be invoked to let the vertex know that there is
not going to be any further reconfiguring and it sends the notification at that
time. This is also going to enable us to quickly add multiple invocations of
setParallelism() without encountering the current issues because the
doneReconfiguringVertex() can be used to notify the vertex that reconfiguration
is finally complete.
bq. public void vertexManagerDone(); - Do we need to add this right now.
This has been added for multiple reasons, not just for event unregistering.
Recovery needs to know when a vertex manager is done doing all things like
sending events etc. However, given that without the locking fix this is not
going to work. And given that its an optional API and things work without it, I
am going to back it out for this patch.
bq. onVertexStateUpdated should throw Exception
Done. Those patches came in while this was in progress. So now its my problem :)
bq. COMPLETELY_CONFIGURED - This is only sent out after startVertex has been
invoked in VertexImpl
Done this way because there are many path out of INITING. But all of those
immediately go trigger startVertex(). So it was easier to put it there without
any drawbacks. This would result in this and RUNNING coming one after another
except for cases like auto-reduce/multiple changes in parallelism etc.
bq. ImmediateStartVertexManager. SourceInfo no longer required
Removed. I think it was that way because only in those cases the
ImmediateStartManager is effective. If there is an SG edge then the shuffle
vertex manager gets used. Doing it this way removes need for special casing in
both vertex managers and also re-enabled the min/max=0 case for perf in the
shuffle vertex manager.
bq. VertexImpl "if (fromVertexManager && canInitVertex()) {" - this check
should ideally be af
Done. This is my last comment where I mentioned it needs to go inside the try
block.
bq. Typo: defune
Will do.
bq. MRInputSplitDistributor and the correct number of tasks up front
>From what I see only MRInputAMSplitGenerator results in setParallelism to be
>called. In which case numTasks will be -1. The distributor does not result in
>setParallelism to be called. If needed, we could change the
>RootInputVertexManager to use the new APIs and make this check pass. But dont
>think thats needed now.
bq. s/unregisterForVertexStatusUpdates/unregisterForVertexStateUpdates
Will do
bq. Does everything on the VertexManager need to be synchronized ?
IMO its fine to go with heavy handed synchronization for now since this is not
a high throughput API and it allows VMs to safely call methods. We should have
done this much earlier.
bq. on InputReadyVertexManager. That is intentional ? Nothing required,
Right. Nothing required. This is the fully conservative scheduler.
Will post a revised patch soon.
> Make use of state change notifier in VertexManagerPlugins
> ---------------------------------------------------------
>
> Key: TEZ-1547
> URL: https://issues.apache.org/jira/browse/TEZ-1547
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: TEZ-1547.1.patch, TEZ-1547.3.patch, TEZ-1547.4.patch,
> TEZ-1547.5.patch, TEZ-1547.6.patch, TEZ-1547.7.patch
>
>
> Instead of the various APIs like onVertexStarted, simple notifications could
> be sent.
> Some existing APIs could end up being deprecated.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)