[ 
https://issues.apache.org/jira/browse/TEZ-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488219#comment-14488219
 ] 

Jason Lowe commented on TEZ-2303:
---------------------------------

{noformat}
2015-04-09 19:36:11,231 INFO [main] app.RecoveryParser: Recovering from event, eventType=VERTEX_INITIALIZED, event=vertexName=scope-1973, vertexId=vertex_1428329756093_168563_1_43, initRequestedTime=1428606011138, initedTime=1428606011166, numTasks=769, processorName=null, additionalInputsCount=0
2015-04-09 19:36:11,231 INFO [main] impl.VertexImpl: Setting vertexManager to ShuffleVertexManager for vertex_1428329756093_168563_1_43 [scope-1973]
2015-04-09 19:36:11,242 INFO [main] vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings minFrac:0.25 maxFrac:0.75 auto:false desiredTaskIput:104857600 minTasks:1
2015-04-09 19:36:11,251 WARN [IPC Server handler 0 on x] ipc.Server: IPC Server handler 0 on x, call org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus from x Call#1965 Retry#0
java.util.ConcurrentModificationException
        at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:394)
        at java.util.LinkedHashMap$ValueIterator.next(LinkedHashMap.java:409)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.getRunningTasks(VertexImpl.java:892)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.getVertexProgress(VertexImpl.java:988)
        at org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:694)
        at org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:62)
        at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:98)
        at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
{noformat}

Looks like a client calling getDAGStatus on the new attempt is sneaking in and 
iterating over the vertex's tasks (VertexImpl.getRunningTasks) while the 
recovery process is still building that collection.
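
For illustration, the failure mode described above can be reproduced in miniature. The sketch below is not Tez code and not a proposed patch: the class TaskMapRaceSketch, its fields, and its methods are hypothetical stand-ins. It only assumes, from the stack trace, that the tasks live in a LinkedHashMap whose values are iterated. The first method triggers the fail-fast iterator deterministically in one thread; the rest shows one possible guard, a read/write lock shared by the writer and the status reader.

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a vertex that owns a task map; not the real VertexImpl.
class TaskMapRaceSketch {
    private final Map<Integer, String> tasks = new LinkedHashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Shows the failure without threads: adding an entry while iterating
    // the values makes the fail-fast iterator throw on its next step.
    static boolean unguardedIterationFails() {
        Map<Integer, String> unguarded = new LinkedHashMap<>();
        unguarded.put(0, "RUNNING");
        unguarded.put(1, "RUNNING");
        try {
            for (String state : unguarded.values()) {
                // Simulates recovery adding a task mid-iteration.
                unguarded.put(unguarded.size(), "NEW");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Writer side (the role recovery plays): takes the write lock.
    void addTask(int id, String state) {
        lock.writeLock().lock();
        try {
            tasks.put(id, state);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reader side (the role of a status request): iterates under the read lock.
    int countRunning() {
        lock.readLock().lock();
        try {
            int running = 0;
            for (String state : tasks.values()) {
                if ("RUNNING".equals(state)) {
                    running++;
                }
            }
            return running;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Runs a writer thread and a polling reader concurrently, then
    // returns the final number of RUNNING tasks.
    static int guardedConcurrentRun(int taskCount) {
        TaskMapRaceSketch vertex = new TaskMapRaceSketch();
        Thread recovery = new Thread(() -> {
            for (int i = 0; i < taskCount; i++) {
                vertex.addTask(i, "RUNNING");
            }
        });
        recovery.start();
        while (recovery.isAlive()) {
            vertex.countRunning(); // safe only because both sides share the lock
        }
        try {
            recovery.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return vertex.countRunning();
    }

    public static void main(String[] args) {
        System.out.println("unguarded iteration failed: " + unguardedIterationFails());
        System.out.println("guarded final count: " + guardedConcurrentRun(5000));
    }
}
```

Other options would be returning a snapshot copy of the map taken under a lock, or rejecting status calls until recovery completes. Which of these fits the AM is an open question here.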

> ConcurrentModificationException while processing recovery
> ---------------------------------------------------------
>
>                 Key: TEZ-2303
>                 URL: https://issues.apache.org/jira/browse/TEZ-2303
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Jason Lowe
>
> Saw a Tez AM log a few ConcurrentModificationException messages while trying 
> to recover from a previous attempt that crashed.  Exception details to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
