[
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193495#comment-14193495
]
Siddharth Seth commented on TEZ-1547:
-------------------------------------
bq. Both are under the writeLock. So they cannot interleave. If done() happens
first then it will fail the startTime check and no event will be sent. Then
startVertex will send the event because byManager==false. If start happens
first then event will not be sent since byManager==true. When done() is called
then it will see startTime>0 and send the event once. In anycase, I can use an
AtomicBoolean to make things really clear.
Here's the scenario where it can happen.
1. User sets intentToConfigure (vertexToBeReconfiguredByManager = true)
2. vertexManager.onVertexStarted(pendingReportedSrcCompletions) invoked.
3. The user code invokes doneReconfiguringVertex as part of this. Event goes
out, vertexToBeReconfiguredByManager set to false (The write lock is taken by
then same thread; otherwise this would be a deadlock)
4. onVertexStarted returns
5. maybeSendConfiguredEvent invoked - since vertexToBeReconfiguredByManager is
false, the event goes out again.
The latest patch contains an explicit check on whether the event has been sent
before, which will take care of this.
bq. In terms of the incompatible change mentioned in the previous comment
[~rohini], [~hagleitn] - need some info on Pig/Hive VertexManager usage. When
using a Pig/Hive custom vertexManager, will the parallelism always be set to -1
at DAG construction time, when these vertexManagers invoke setParallelism ?
bq. The difference between setParallelism and startVertex is 2 events on the
dispatcher queue that are sent back to back. setParallelism->immediately sends
READY_TO_INIT-> immediately triggers START-> sends CONFIGURED_COMPLETE. So time
difference would be negligibly small. Sending it in startVertex makes it future
proof to any code changes prior to start.
I don't think time difference is the consideration here. A CONFIGURED event
should go out as early as possible. Even if sending it in the start transition
- why is it being sent after invoking onVertexStarted ? If the event is going
out after onVertexStarted - it should be called STARTED.
bq. I had initially thought of reconfigured but its not necessary that the
vertex was reconfigured in many case and so the name would be misleading.
Please see my last comment about STARTED. The reason that RECONFIGURED may not
be valid is that we're sending an event which indicates that the vertex has
started, but calling it FULLY_CONFIGURED.
How about this, CONFIGURED - goes out when parallelism / edge configs are
changed (setParallelism), and the current event gets renamed to STARTED.
FULLY_CONFIGURED is a very restrictive name IMO for future changes.
On the ImmediateStartVertexManager - we can iterate in TEZ-1522, which should
make it possible for the VertexManagers to be oblivious of what's happening in
other Vertices, except when absolutely required (ShuffleVertexManager providing
data updates).
I'd be a little more comfortable using TEZ-1522 (and plugins) to fix the
scheduling, and late start issues that we're seeing - for 0.5.2. Your call on
this being safe for 0.5.2 though.
> Make use of state change notifier in VertexManagerPlugins
> ---------------------------------------------------------
>
> Key: TEZ-1547
> URL: https://issues.apache.org/jira/browse/TEZ-1547
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: TEZ-1547.1.patch, TEZ-1547.3.patch, TEZ-1547.4.patch,
> TEZ-1547.5.patch, TEZ-1547.6.patch, TEZ-1547.7.patch, TEZ-1547.8.patch,
> TEZ-1547.9.patch
>
>
> Instead of the various APIs like onVertexStarted, simple notifications could
> be sent.
> Some existing APIs could end up being deprecated.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)