[ https://issues.apache.org/jira/browse/TEZ-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488219#comment-14488219 ]
Jason Lowe commented on TEZ-2303:
---------------------------------

{noformat}
2015-04-09 19:36:11,231 INFO [main] app.RecoveryParser: Recovering from event, eventType=VERTEX_INITIALIZED, event=vertexName=scope-1973, vertexId=vertex_1428329756093_168563_1_43, initRequestedTime=1428606011138, initedTime=1428606011166, numTasks=769, processorName=null, additionalInputsCount=0
2015-04-09 19:36:11,231 INFO [main] impl.VertexImpl: Setting vertexManager to ShuffleVertexManager for vertex_1428329756093_168563_1_43 [scope-1973]
2015-04-09 19:36:11,242 INFO [main] vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings minFrac:0.25 maxFrac:0.75 auto:false desiredTaskIput:104857600 minTasks:1
2015-04-09 19:36:11,251 WARN [IPC Server handler 0 on x] ipc.Server: IPC Server handler 0 on x, call org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus from x Call#1965 Retry#0
java.util.ConcurrentModificationException
        at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:394)
        at java.util.LinkedHashMap$ValueIterator.next(LinkedHashMap.java:409)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.getRunningTasks(VertexImpl.java:892)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.getVertexProgress(VertexImpl.java:988)
        at org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:694)
        at org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:62)
        at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:98)
        at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
{noformat}

Looks like a client trying to obtain status from the new attempt is sneaking in and walking the list of tasks as the recovery process is building that list.

> ConcurrentModificationException while processing recovery
> ---------------------------------------------------------
>
>                 Key: TEZ-2303
>                 URL: https://issues.apache.org/jira/browse/TEZ-2303
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Jason Lowe
>
> Saw a Tez AM log a few ConcurrentModificationException messages while trying
> to recover from a previous attempt that crashed. Exception details to follow.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
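
The race described in the comment above (a status call iterating a LinkedHashMap's values while recovery is still adding tasks) can be illustrated with a minimal sketch. This is not Tez code and not necessarily how TEZ-2303 was resolved: the class RecoveryRaceSketch, its fields, and the methods addTask, countRunning, and unguardedWalkFails are hypothetical names invented for illustration. The first method reproduces the fail-fast iterator behavior deterministically in a single thread; the read/write lock shows one possible way, assumed here, to keep readers out while the map is being built.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RecoveryRaceSketch {
    // Hypothetical stand-in for a vertex's task map (task id -> state); not Tez code.
    private final Map<String, String> tasks = new LinkedHashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // "Recovery" side: adds a task while holding the write lock.
    public void addTask(String taskId, String state) {
        lock.writeLock().lock();
        try {
            tasks.put(taskId, state);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // "Status" side: walks the task values while holding the read lock,
    // so the walk cannot overlap with addTask.
    public int countRunning() {
        lock.readLock().lock();
        try {
            int running = 0;
            for (String state : tasks.values()) {
                if ("RUNNING".equals(state)) {
                    running++;
                }
            }
            return running;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Reproduces the failure mode without threads: a structural modification
    // between two next() calls makes the fail-fast iterator throw.
    public static boolean unguardedWalkFails() {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("task_0", "RUNNING");
        map.put("task_1", "RUNNING");
        Iterator<String> it = map.values().iterator();
        it.next();
        map.put("task_2", "NEW"); // new key, so the map's modification count changes
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded walk failed: " + unguardedWalkFails());

        RecoveryRaceSketch vertex = new RecoveryRaceSketch();
        vertex.addTask("task_0", "RUNNING");
        vertex.addTask("task_1", "RUNNING");
        vertex.addTask("task_2", "NEW");
        System.out.println("running tasks: " + vertex.countRunning());
    }
}
```

In a real multi-threaded run the exception is timing-dependent, which would be consistent with seeing only "a few" such messages in the AM log; the single-threaded reproduction above only demonstrates the iterator contract, not the actual interleaving in the AM.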