[ https://issues.apache.org/jira/browse/TEZ-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510319#comment-14510319 ]
Hitesh Shah commented on TEZ-2303:
----------------------------------

The DAG is being sent the recover event before all services are started. This will start generating events (both to the dispatcher and to history/recovery, etc.). If an error occurs, the shutdownHandler is invoked. This will hit issues because the services will not have started.

This should unregister with the RM under normal circumstances. Maybe a separate jira to handle the diagnostics in the following section:

{code}
} catch (IOException e) {
  LOG.error("Error occurred when trying to recover data from previous attempt."
      + " Shutting down AM", e);
  this.state = DAGAppMasterState.ERROR;
  this.taskSchedulerEventHandler.setShouldUnregisterFlag();
  shutdownHandler.shutdown();
  return;
}
{code}

Is there a way to hold off accepting connections from clients until after the DAG is recovered? Not starting only that service also has problems, as I believe the YarnSchedulerService depends on it for the host:port info.

> ConcurrentModificationException while processing recovery
> ---------------------------------------------------------
>
>                 Key: TEZ-2303
>                 URL: https://issues.apache.org/jira/browse/TEZ-2303
>             Project: Apache Tez
>          Issue Type: Bug
>   Affects Versions: 0.6.0
>            Reporter: Jason Lowe
>            Assignee: Jeff Zhang
>         Attachments: TEZ-2303-1.patch, TEZ-2303-2.patch
>
>
> Saw a Tez AM log a few ConcurrentModificationException messages while trying
> to recover from a previous attempt that crashed. Exception details to follow.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)