[ 
https://issues.apache.org/jira/browse/TEZ-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188956#comment-14188956
 ] 

Siddharth Seth commented on TEZ-1703:
-------------------------------------

Comments on the patch.

{code}
-            DAGTerminationCause.VERTEX_FAILURE,
-            vertexEvent.getVertexTerminationCause() == null ? VertexTerminationCause.OTHER_VERTEX_FAILURE
-                : vertexEvent.getVertexTerminationCause());
+            DAGTerminationCause.VERTEX_FAILURE, VertexTerminationCause.OTHER_VERTEX_FAILURE);
{code}
Is this change needed so that not every vertex gets the same termination cause
as the first vertex to fail?
We should remove getVertexTerminationCause in a follow-up JIRA, since it seems
to be of no use.

{code}
+        String diagnosticMsg = "Vertex failed/killed due to VertexManagerPlugin/EdgeManagerPlugin failed. "
{code}
Will InputInitializer failures never go through this transition? It may be
better to build this message from the SOURCE information available in the
exception.
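
A minimal sketch of what deriving the message from the source could look like.
The Source enum, UserCodeFailure class and diagnosticFor method below are
hypothetical stand-ins for illustration, not the actual Tez types touched by
the patch:

```java
// Hypothetical stand-ins for illustration; not the actual Tez classes.
final class DiagnosticSketch {

  /** Where the failing user code was running (an assumed set of sources). */
  enum Source { VERTEX_MANAGER, EDGE_MANAGER, INPUT_INITIALIZER }

  /** Wraps an exception thrown by user code together with its source. */
  static final class UserCodeFailure extends RuntimeException {
    private final Source source;

    UserCodeFailure(Source source, Throwable cause) {
      super(cause);
      this.source = source;
    }

    Source getSource() {
      return source;
    }
  }

  /** Builds the diagnostic from the recorded source instead of a fixed string. */
  static String diagnosticFor(UserCodeFailure failure) {
    String component;
    switch (failure.getSource()) {
      case VERTEX_MANAGER:
        component = "VertexManagerPlugin";
        break;
      case EDGE_MANAGER:
        component = "EdgeManagerPlugin";
        break;
      case INPUT_INITIALIZER:
        component = "InputInitializer";
        break;
      default:
        component = "user code";
        break;
    }
    return "Vertex failed/killed because " + component + " failed.";
  }

  public static void main(String[] args) {
    UserCodeFailure failure =
        new UserCodeFailure(Source.INPUT_INITIALIZER, new IllegalStateException("boom"));
    System.out.println(diagnosticFor(failure));
  }
}
```

With something along these lines, an InputInitializer failure that reaches the
same transition would not be reported as a plugin failure.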

There are some race conditions possible in the InputInitializer.
Prior to the patch:
- It's possible for events/notifications to be sent to a completed initializer,
since the initializers and events are handled in separate threads. The
setComplete() and isComplete checks aren't sufficient to avoid this.
- Ideally, completed initializers should just handle these events gracefully,
but that's not something that Tez can guarantee. We need to handle such
situations, likely in a separate JIRA.

With the patch:
It's possible for an INITIALIZER_FAILED event to go out after an
INITIALIZER_SUCCESS goes out. Sequence: T1: initializer running. T2:
eventReceived/VertexUpdateReceived throws an exception. T1: completes (the
event may have been partially handled, which triggers completion of
initialize()).
Similarly, it's possible to get INITIALIZER_SUCCEEDED messages after an
INITIALIZER_FAILED message (in FAILED or a similar state). This isn't as
harmful.
This means we could end up getting INITIALIZER_FAILED messages in the INITED /
RUNNING and possibly other states.

The state machine in VertexImpl will need to change to handle INITIALIZER_FAILED
in some more states, and fail the vertex.
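
A rough sketch of the intended behaviour, using a deliberately tiny model. The
State and Event names below are simplified and hypothetical; VertexImpl's real
state machine has many more states and events, and ignoring a late failure in
terminal states is an assumption of this sketch:

```java
// Hypothetical, heavily simplified model; not VertexImpl's actual state machine.
final class VertexStateSketch {

  enum State { INITIALIZING, INITED, RUNNING, SUCCEEDED, FAILED }

  enum Event { INITIALIZER_SUCCEEDED, INITIALIZER_FAILED, START, COMPLETED }

  private State state = State.INITIALIZING;

  State getState() {
    return state;
  }

  void handle(Event event) {
    switch (event) {
      case INITIALIZER_FAILED:
        // A failure can arrive after success was already reported, so it is
        // accepted in INITED and RUNNING as well and fails the vertex.
        if (state == State.INITIALIZING || state == State.INITED || state == State.RUNNING) {
          state = State.FAILED;
        }
        // Assumption: terminal states (SUCCEEDED, FAILED) ignore it.
        break;
      case INITIALIZER_SUCCEEDED:
        // A late success after a failure is harmless and is ignored.
        if (state == State.INITIALIZING) {
          state = State.INITED;
        }
        break;
      case START:
        if (state == State.INITED) {
          state = State.RUNNING;
        }
        break;
      case COMPLETED:
        if (state == State.RUNNING) {
          state = State.SUCCEEDED;
        }
        break;
      default:
        break;
    }
  }

  public static void main(String[] args) {
    VertexStateSketch vertex = new VertexStateSketch();
    vertex.handle(Event.INITIALIZER_SUCCEEDED);
    vertex.handle(Event.START);
    vertex.handle(Event.INITIALIZER_FAILED); // late failure while RUNNING
    System.out.println(vertex.getState());
  }
}
```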

> Exception handling for InputInitializer
> ---------------------------------------
>
>                 Key: TEZ-1703
>                 URL: https://issues.apache.org/jira/browse/TEZ-1703
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.1
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-1703-2.patch, TEZ-1703.patch
>
>
> For handleInputInitializerEvent - this should be fairly straightforward to 
> handle. At the moment this is an inline call from within the AsyncDispatcher, 
> and will end up causing a RuntimeException. The RuntimeException can be 
> changed to an AMUserCodeException, which will take care of this.
> For onVertexStateUpdated, this eventually gets invoked from within 
> RootInputInitializerManager. Catching exceptions there and sending a 
> RootInputInitializerFailedEvent should be enough to fix this? It may require 
> some state machine changes to handle this event in a few more states.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
