[ https://issues.apache.org/jira/browse/AMBARI-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16108078#comment-16108078 ]
Hudson commented on AMBARI-21593:
---------------------------------

FAILURE: Integrated in Jenkins build Ambari-trunk-Commit #7828 (See [https://builds.apache.org/job/Ambari-trunk-Commit/7828/])
AMBARI-21593 : AMS stopped after RU [AMS distributed mode with 2 (avijayan: [http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=c7b350b678b82bae1c0834744249cb534fed18f1])
* (edit) ambari-metrics/ambari-metrics-timelineservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/metrics/timeline/availability/MetricCollectorHAController.java


> RU: AMS stopped after RU [AMS distributed mode]
> -----------------------------------------------
>
>                 Key: AMBARI-21593
>                 URL: https://issues.apache.org/jira/browse/AMBARI-21593
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-metrics
>    Affects Versions: 2.5.2
>            Reporter: Aravindan Vijayan
>            Assignee: Aravindan Vijayan
>            Priority: Blocker
>             Fix For: 2.5.2
>
>         Attachments: AMBARI-21593.patch
>
>
> *PROBLEM*
> When 2 metric collectors are started simultaneously, both of them fail to
> start.
> *BUG*
> There is a race condition in the Metric Collector HA controller
> initialization, introduced by AMBARI-20179. When a Helix controller
> instance finds that the /ambari-metrics-collector znode exists but a child
> node does not exist, it deletes the entire znode and recreates it. If
> another controller instance initializes at the same time, each instance can
> end up cancelling the work of the other.
> *FIX*
> Do not delete and recreate the znode. Instead, wait and retry for a few
> seconds to check whether /ambari-metrics-collector has been fully
> initialized.
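
The wait-and-retry approach described under *FIX* can be illustrated with a minimal sketch. This is not the code from AMBARI-21593.patch: the class name `ZnodeInitWaiter`, the method `waitUntilInitialized`, and its parameters are hypothetical, and the znode check is modelled as a plain `BooleanSupplier` so the example runs without ZooKeeper or Helix. In a real controller the supplier would presumably ask ZooKeeper whether the expected child of /ambari-metrics-collector exists.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical helper, for illustration only; not taken from the Ambari patch.
class ZnodeInitWaiter {

    /**
     * Polls the given check up to maxAttempts times, sleeping delayMillis
     * between attempts. Returns true as soon as the check passes and false
     * if it never passes or the thread is interrupted. It only reads state;
     * it never deletes or recreates anything, so two instances starting at
     * the same time cannot undo each other's work.
     */
    static boolean waitUntilInitialized(BooleanSupplier isInitialized,
                                        int maxAttempts,
                                        long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isInitialized.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and give up waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated peer: the child node becomes visible on the third check.
        AtomicInteger checks = new AtomicInteger();
        BooleanSupplier childExists = () -> checks.incrementAndGet() >= 3;

        boolean ready = waitUntilInitialized(childExists, 5, 10L);
        System.out.println("initialized=" + ready + " after " + checks.get() + " checks");
    }
}
```

With the simulated check above, the call returns true on the third attempt. If the check never succeeds within the attempt budget, the method returns false and the caller can report a startup failure instead of deleting the shared znode.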