[ https://issues.apache.org/jira/browse/AMBARI-25613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275849#comment-17275849 ]

Suraj Naik commented on AMBARI-25613:
-------------------------------------

[~aonishuk], [~ihorlukianov], please review my PR.

> Concurrent Host Modification exception while sending INSTALL/START Host request
> -------------------------------------------------------------------------------
>
>                 Key: AMBARI-25613
>                 URL: https://issues.apache.org/jira/browse/AMBARI-25613
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.7.6
>            Reporter: Suraj Naik
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> java.lang.RuntimeException: START Host request submission failed: java.lang.RuntimeException: Update Host request submission failed: java.util.ConcurrentModificationException
> at org.apache.ambari.server.topology.AmbariContext.startHost(AmbariContext.java:497)
> at org.apache.ambari.server.topology.ClusterTopologyImpl.startHost(ClusterTopologyImpl.java:268)
> at org.apache.ambari.server.topology.tasks.StartHostTask.runTask(StartHostTask.java:51)
> at org.apache.ambari.server.topology.tasks.TopologyHostTask.run(TopologyHostTask.java:55)
> at org.apache.ambari.server.topology.HostOfferResponse$1.run(HostOfferResponse.java:85)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Update Host request submission failed: java.util.ConcurrentModificationException
> at org.apache.ambari.server.controller.internal.HostComponentResourceProvider$4.invoke(HostComponentResourceProvider.java:865)
> at org.apache.ambari.server.controller.internal.HostComponentResourceProvider$4.invoke(HostComponentResourceProvider.java:852)
> at org.apache.ambari.server.controller.internal.AbstractResourceProvider.invokeWithRetry(AbstractResourceProvider.java:465)
> at org.apache.ambari.server.controller.internal.AbstractResourceProvider.modifyResources(AbstractResourceProvider.java:346)
> at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.doUpdateResources(HostComponentResourceProvider.java:852)
> at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.start(HostComponentResourceProvider.java:492)
> at org.apache.ambari.server.topology.AmbariContext.startHost(AmbariContext.java:494)
> at org.apache.ambari.server.topology.ClusterTopologyImpl.startHost(ClusterTopologyImpl.java:268)
> at org.apache.ambari.server.topology.tasks.StartHostTask.runTask(StartHostTask.java:51)
> at org.apache.ambari.server.topology.tasks.TopologyHostTask.run(TopologyHostTask.java:55)
> at org.apache.ambari.server.topology.HostOfferResponse$1.run(HostOfferResponse.java:85)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.ConcurrentModificationException: NA
> at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
> at java.util.HashMap$EntryIterator.next(HashMap.java:1479)
> at java.util.HashMap$EntryIterator.next(HashMap.java:1477)
> at java.util.HashMap.putMapEntries(HashMap.java:512)
> at java.util.HashMap.<init>(HashMap.java:490)
> at org.apache.ambari.server.topology.HostRequest.getPhysicalTaskMapping(HostRequest.java:458)
> at org.apache.ambari.server.topology.LogicalRequest.getStageSummaries(LogicalRequest.java:286)
> at org.apache.ambari.server.topology.TopologyManager.getPendingHostComponents(TopologyManager.java:823)
> at org.apache.ambari.server.utils.StageUtils.getClusterHostInfo(StageUtils.java:306)
> at org.apache.ambari.server.controller.AmbariManagementControllerImpl.doStageCreation(AmbariManagementControllerImpl.java:2788)
> at org.apache.ambari.server.controller.AmbariManagementControllerImpl.addStages(AmbariManagementControllerImpl.java:3513)
> at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.updateHostComponents(HostComponentResourceProvider.java:707)
> at org.apache.ambari.server.controller.internal.HostComponentResourceProvider$4.invoke(HostComponentResourceProvider.java:857)
> at org.apache.ambari.server.controller.internal.HostComponentResourceProvider$4.invoke(HostComponentResourceProvider.java:852)
> at org.apache.ambari.server.controller.internal.AbstractResourceProvider.invokeWithRetry(AbstractResourceProvider.java:465)
> at org.apache.ambari.server.controller.internal.AbstractResourceProvider.modifyResources(AbstractResourceProvider.java:346)
> at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.doUpdateResources(HostComponentResourceProvider.java:852)
> at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.start(HostComponentResourceProvider.java:492)
> at org.apache.ambari.server.topology.AmbariContext.startHost(AmbariContext.java:494)
> at org.apache.ambari.server.topology.ClusterTopologyImpl.startHost(ClusterTopologyImpl.java:268)
> at org.apache.ambari.server.topology.tasks.StartHostTask.runTask(StartHostTask.java:51)
> at org.apache.ambari.server.topology.tasks.TopologyHostTask.run(TopologyHostTask.java:55)
> at org.apache.ambari.server.topology.HostOfferResponse$1.run(HostOfferResponse.java:85)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>  
>  
> My teammate [~ramkrishna] did some analysis on this one by adding logs and
> latches, and found that although installation and registration run in
> parallel, each thread fetches the entire cluster's view of the current
> physical tasks. So while one thread is registering a host, another thread
> can call getPhysicalTaskMapping() at the same time. That method copies a
> HashMap (the HashMap.<init> frame in the trace above) which registration
> may be modifying concurrently, so the copy fails with a
> ConcurrentModificationException (CME).
>  
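
A minimal sketch of the race described above, for illustration only. It assumes, based on the HashMap.<init> -> putMapEntries frames in the trace, that getPhysicalTaskMapping() copies a plain java.util.HashMap; the class PhysicalTaskMappingRace and its methods are hypothetical and are not Ambari's HostRequest code. To keep the result deterministic, the registration that would normally run on another thread is simulated by a callback invoked in the middle of the copy. The second case swaps in a ConcurrentHashMap as one possible remedy; the actual change in the PR may differ.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PhysicalTaskMappingRace {

    // Hypothetical stand-in for copying a task mapping: iterate the source
    // map entry by entry, as the HashMap copy constructor does internally
    // (HashMap.<init> -> putMapEntries -> EntryIterator.next in the trace).
    // The "registration" callback simulates another thread adding a task
    // in the middle of the copy, so the interleaving is deterministic.
    static Map<Long, Long> copyWhileRegistering(Map<Long, Long> source, Runnable registration) {
        Map<Long, Long> copy = new HashMap<>();
        boolean registered = false;
        for (Map.Entry<Long, Long> e : source.entrySet()) {
            if (!registered) {
                registration.run(); // the "concurrent" modification happens here, once
                registered = true;
            }
            copy.put(e.getKey(), e.getValue());
        }
        return copy;
    }

    // Returns true if copying the map while a registration sneaks in
    // throws ConcurrentModificationException.
    static boolean copyThrows(Map<Long, Long> source) {
        try {
            copyWhileRegistering(source, () -> source.put(99L, 990L));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Fills a map with a few hypothetical task-id -> physical-task-id entries.
    static <M extends Map<Long, Long>> M seed(M m) {
        m.put(1L, 10L);
        m.put(2L, 20L);
        m.put(3L, 30L);
        return m;
    }

    public static void main(String[] args) {
        // Plain HashMap: its fail-fast iterator detects the modification.
        System.out.println("HashMap throws CME: " + copyThrows(seed(new HashMap<>())));
        // ConcurrentHashMap: weakly consistent iterator, never throws CME.
        System.out.println("ConcurrentHashMap throws CME: " + copyThrows(seed(new ConcurrentHashMap<>())));
    }
}
```

Running main should print "HashMap throws CME: true" followed by "ConcurrentHashMap throws CME: false". ConcurrentHashMap iterators are weakly consistent, so they never throw ConcurrentModificationException, but a copy taken mid-registration may or may not include the newly registered task. If callers need an exact snapshot, guarding both the registration writes and the copy with the same lock is the usual alternative.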



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
