[ https://issues.apache.org/jira/browse/AMBARI-22848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Swapan Shridhar updated AMBARI-22848:
-------------------------------------
    Fix Version/s: 2.7.0 (was: 2.6.2)

> Blueprint database inconsistency should be caught by Ambari DB consistency checker
> ----------------------------------------------------------------------------------
>
>                 Key: AMBARI-22848
>                 URL: https://issues.apache.org/jira/browse/AMBARI-22848
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.5.0
>            Reporter: Robert Nettleton
>            Assignee: Robert Nettleton
>            Priority: Critical
>             Fix For: 2.7.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> We've seen some Blueprint deployments fail after an upgrade to Ambari
> (2.5.2/2.6) causes older configuration to be reset. The typical sequence is:
> 1. User deploys a cluster via Blueprints with a version of Ambari older than
> 2.5/2.6.
> 2. Cluster deployment fails, and either the user doesn't realize the
> deployment has failed, or works through the manual configuration changes
> required to get the failed services up and running.
> 3. Things run fine, sometimes for quite a while.
> 4. User upgrades ambari-server to Ambari 2.5 or Ambari 2.6.
> 5. Upon the restart of ambari-server, some services fail due to invalid or
> old configuration.
>
> The root cause of this problem is that the Blueprints TopologyManager class
> will attempt to "replay" any failed requests. This behavior was originally
> implemented to allow a Blueprints install to continue working even if
> ambari-server is stopped and restarted.
>
> Since the original Blueprint deployment failed, the Ambari Server database is
> in an inconsistent state, which causes the Blueprints TopologyManager to
> attempt a replay of various configuration tasks. As a result, the
> TopologyManager sends configuration updates from the Blueprint's
> configuration sections, which by now may be quite out of date, as the cluster
> may have changed over time while being administered. This in turn causes some
> services to fail, as the older configuration may not match the current
> environment.
>
> The ambari-server upgrade mechanism should be modified to include integrity
> checks on the Blueprint-related tables in the database. In particular, if a
> Blueprint deployment is detected, at the very least the "clusterconfig" table
> needs to be checked, to ensure that at least one configuration type's version
> has a {code:java}version_tag{code} of "TOPOLOGY_RESOLVED". If no
> configuration versions are found to have a tag of "TOPOLOGY_RESOLVED", then
> the ambari-server upgrade should fail with the appropriate messages, to allow
> the user to make the manual changes required to resolve the problem, usually
> by applying a workaround.
>
> Having this check at ambari-server upgrade time seems like the correct way to
> move forward, as it will detect this problem more quickly, and will keep
> users from accidentally moving forward with an upgrade that would overwrite
> the cluster's configuration with older configuration items.
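
The proposed check can be sketched in a few lines of Java. This is only an illustration of the rule described above, not Ambari code: the class BlueprintConsistencyCheck and its methods are hypothetical, and the sketch assumes the caller has already determined whether the cluster was deployed via a Blueprint and has read the version_tag values from the "clusterconfig" table.

```java
import java.util.List;

// Hypothetical helper, not part of Ambari: models the proposed upgrade-time check.
final class BlueprintConsistencyCheck {

    // Tag value named in the issue description.
    static final String TOPOLOGY_RESOLVED = "TOPOLOGY_RESOLVED";

    private BlueprintConsistencyCheck() {
    }

    /**
     * Returns true if the upgrade may proceed.
     *
     * @param blueprintDeployment whether a Blueprint deployment was detected (assumed input)
     * @param versionTags         version_tag values read from the clusterconfig table (assumed input)
     */
    static boolean isConsistent(boolean blueprintDeployment, List<String> versionTags) {
        if (!blueprintDeployment) {
            // The check only applies to clusters deployed via Blueprints.
            return true;
        }
        // At least one configuration version must carry the TOPOLOGY_RESOLVED tag.
        return versionTags.contains(TOPOLOGY_RESOLVED);
    }

    /** Fails with a descriptive message when the check does not pass. */
    static void verifyOrFail(boolean blueprintDeployment, List<String> versionTags) {
        if (!isConsistent(blueprintDeployment, versionTags)) {
            throw new IllegalStateException(
                "Blueprint deployment detected, but no clusterconfig entry has version_tag "
                    + TOPOLOGY_RESOLVED + "; resolve this manually before upgrading.");
        }
    }
}
```

An upgrade step could call verifyOrFail(...) before making any schema or configuration changes, so that an inconsistent Blueprint deployment stops the upgrade rather than triggering a replay after restart.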