[ https://issues.apache.org/jira/browse/TEZ-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188956#comment-14188956 ]
Siddharth Seth commented on TEZ-1703:
-------------------------------------

Comments on the patch.
{code}
- DAGTerminationCause.VERTEX_FAILURE,
- vertexEvent.getVertexTerminationCause() == null ? VertexTerminationCause.OTHER_VERTEX_FAILURE
- : vertexEvent.getVertexTerminationCause());
+ DAGTerminationCause.VERTEX_FAILURE, VertexTerminationCause.OTHER_VERTEX_FAILURE);
{code}
Is this required so that all vertices don't get the same termination cause as the first vertex to fail? We should remove getVertexTerminationCause in a follow-up JIRA, since it seems to be of no use.

{code}
+ String diagnosticMsg = "Vertex failed/killed due to VertexManagerPlugin/EdgeManagerPlugin failed. "
{code}
Will InputInitializer failures never go through this transition? It may be better to build this message from the SOURCE information available in the exception.

There are some race conditions possible in the InputInitializer.
Prior to the patch:
- It's possible for events/notifications to be sent to a completed initializer, since the initializers and the events are handled in separate threads. The setComplete() and isComplete() checks aren't sufficient to avoid this.
- Ideally, completed initializers should just handle these events gracefully, but that's not something Tez can guarantee. We need to handle such situations, likely in a separate JIRA.

With the patch, it's possible for an INITIALIZER_FAILED event to go out after an INITIALIZER_SUCCEEDED event goes out. Sequence:
T1: initializer running.
T2: eventReceived/VertexUpdateReceived throws an exception.
T1: completes (the event could be partially handled, which triggers completion of initialize()).

Similarly, it's possible to get INITIALIZER_SUCCEEDED messages after an INITIALIZER_FAILED message (in a FAILED or similar state). This isn't as harmful.
This means we could end up getting INITIALIZER_FAILED messages in the INITED / RUNNING and possibly other states.
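One way to keep an INITIALIZER_FAILED from following an INITIALIZER_SUCCEEDED (or the reverse) is to make the terminal transition a single atomic step, rather than a separate isComplete() check followed by setComplete(). The Java sketch below is hypothetical: `InitializerOutcomeGuard`, its methods, and the `Outcome` enum are illustrative names, not Tez classes. It only guarantees that one terminal event is emitted; it does not stop late events from reaching a completed initializer, which the comment notes still needs separate handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

class InitializerOutcomeGuard {

    // Informal event names taken from the comment; not actual Tez event classes.
    enum Outcome { INITIALIZER_SUCCEEDED, INITIALIZER_FAILED }

    // Set exactly once, by whichever thread reaches a terminal outcome first.
    private final AtomicBoolean completed = new AtomicBoolean(false);

    // Stand-in for the dispatcher: records the terminal events actually sent.
    private final List<Outcome> sent = new ArrayList<>();

    // Thread T1: initialize() returned normally.
    boolean onInitializeReturned() {
        return emitOnce(Outcome.INITIALIZER_SUCCEEDED);
    }

    // Thread T2: eventReceived / onVertexStateUpdated threw.
    // (Real code would attach the cause to the failure event; this sketch ignores it.)
    boolean onEventHandlingFailed(Throwable cause) {
        return emitOnce(Outcome.INITIALIZER_FAILED);
    }

    // compareAndSet collapses "if (!isComplete()) { setComplete(); send(); }" into one
    // atomic step, so a FAILED can never follow a SUCCEEDED (or the reverse).
    private boolean emitOnce(Outcome outcome) {
        if (!completed.compareAndSet(false, true)) {
            return false; // a terminal event already went out; drop this one
        }
        synchronized (sent) {
            sent.add(outcome);
        }
        return true;
    }

    List<Outcome> sentEvents() {
        synchronized (sent) {
            return new ArrayList<>(sent);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        InitializerOutcomeGuard guard = new InitializerOutcomeGuard();
        Thread t1 = new Thread(guard::onInitializeReturned);
        Thread t2 = new Thread(() -> guard.onEventHandlingFailed(new RuntimeException("boom")));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Exactly one event is printed; which one depends on thread scheduling.
        System.out.println(guard.sentEvents());
    }
}
```

Whichever thread wins the compareAndSet decides the outcome; the loser's event is silently dropped, which is the "handle gracefully" behaviour the comment asks for on the emitting side.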
The state machine in VertexImpl will need to change to handle INITIALIZER_FAILED in more states and fail the vertex.

> Exception handling for InputInitializer
> ---------------------------------------
>
> Key: TEZ-1703
> URL: https://issues.apache.org/jira/browse/TEZ-1703
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.1
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: TEZ-1703-2.patch, TEZ-1703.patch
>
>
> For handleInputInitializerEvent - this should be fairly straightforward to
> handle. At the moment this is an inline call from within the AsyncDispatcher,
> and will end up causing a RuntimeException. The RuntimeException can be
> changed to an AMUserCodeException, which will take care of this.
> For onVertexStateUpdated, this eventually gets invoked from within
> RootInputInitializerManager. Catching exceptions there and sending a
> RootInputInitializerFailedEvent should be enough to fix this? May require
> some state machine changes to handle this event in a few more states.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
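A minimal sketch of the two changes discussed above: exceptions from initializer user code become an INITIALIZER_FAILED event instead of escaping the dispatcher thread, and that event fails the vertex in INITED and RUNNING as well as INITIALIZING. Everything here is hypothetical: the class, the reduced state set, and the `UserCode` interface are illustrative, and the real VertexImpl defines its transitions through a state machine factory rather than plain EnumSets.

```java
import java.util.EnumSet;
import java.util.Set;

class VertexInitFailureSketch {

    // Reduced, illustrative subset of vertex states (not the full VertexImpl set).
    enum VertexState { NEW, INITIALIZING, INITED, RUNNING, SUCCEEDED, FAILED, KILLED }

    // Only the event discussed here; every other event type is omitted.
    enum VertexEventType { INITIALIZER_FAILED }

    // Hypothetical hook for initializer user code, e.g. handleInputInitializerEvent.
    interface UserCode {
        void run() throws Exception;
    }

    // States where INITIALIZER_FAILED fails the vertex. INITED and RUNNING are included
    // because a failure can race with, and arrive after, a successful initialization.
    private static final Set<VertexState> FAIL_ON_INITIALIZER_FAILURE =
            EnumSet.of(VertexState.INITIALIZING, VertexState.INITED, VertexState.RUNNING);

    // Terminal states: a late INITIALIZER_FAILED is ignored rather than treated as invalid.
    private static final Set<VertexState> TERMINAL =
            EnumSet.of(VertexState.SUCCEEDED, VertexState.FAILED, VertexState.KILLED);

    private VertexState state;
    private String diagnostics = "";

    VertexInitFailureSketch(VertexState initial) {
        this.state = initial;
    }

    VertexState handle(VertexEventType type, String cause) {
        if (type == VertexEventType.INITIALIZER_FAILED) {
            if (FAIL_ON_INITIALIZER_FAILURE.contains(state)) {
                diagnostics = "Vertex failed due to InputInitializer failure: " + cause;
                state = VertexState.FAILED;
            } else if (!TERMINAL.contains(state)) {
                // e.g. NEW: no initializer should be running yet, so treat it as a bug.
                throw new IllegalStateException("Invalid event " + type + " in state " + state);
            }
            // Terminal state: drop the late event and keep the existing outcome.
        }
        return state;
    }

    // Runs initializer user code and converts any exception into an INITIALIZER_FAILED
    // event, instead of letting a RuntimeException escape from the dispatcher thread.
    VertexState invokeInitializer(UserCode userCode) {
        try {
            userCode.run();
        } catch (Exception e) {
            return handle(VertexEventType.INITIALIZER_FAILED, String.valueOf(e.getMessage()));
        }
        return state;
    }

    VertexState state() { return state; }

    String diagnostics() { return diagnostics; }

    public static void main(String[] args) {
        VertexInitFailureSketch running = new VertexInitFailureSketch(VertexState.RUNNING);
        running.invokeInitializer(() -> { throw new Exception("bad input split"); });
        System.out.println(running.state() + " / " + running.diagnostics());
        // prints: FAILED / Vertex failed due to InputInitializer failure: bad input split

        VertexInitFailureSketch done = new VertexInitFailureSketch(VertexState.SUCCEEDED);
        System.out.println(done.handle(VertexEventType.INITIALIZER_FAILED, "late"));
        // prints: SUCCEEDED
    }
}
```

Ignoring the event in terminal states mirrors the comment's point that a late, out-of-order message is not harmful once the vertex outcome is already decided, while NEW still rejects it as a genuine bug.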