[ https://issues.apache.org/jira/browse/AMBARI-15714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Myroslav Papirkovskyi updated AMBARI-15714:
-------------------------------------------
    Status: Patch Available  (was: Open)

> Express Upgrade hung after FAILED step is retried
> -------------------------------------------------
>
>                 Key: AMBARI-15714
>                 URL: https://issues.apache.org/jira/browse/AMBARI-15714
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.2.2
>            Reporter: Myroslav Papirkovskyi
>            Assignee: Myroslav Papirkovskyi
>            Priority: Critical
>             Fix For: 2.2.2
>
>         Attachments: AMBARI-15714.patch
>
>
> *Steps:*
> # Start Express Upgrade from HDP 2.4.0.0 to 2.4.2.0-130
> # Continue until the "backup the Hive Metastore" message appears, then hit Proceed
> # Stop the Ambari agent on one of the HBase RegionServer hosts (os-r7-kwjvku-ambari-eu-4-4.novalocal on the current cluster)
> # Wait for EU to report failure (status HOLDING_TIMEDOUT)
> # Start ambari-agent on the RS host and wait 60 seconds for the heartbeat to become operational
> # Retry the failed step in the EU wizard
>
> *Result*
> EU hangs
> ambari-server logs report the following:
> {code}
> 05 Apr 2016 08:09:26,526 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:157 - Heartbeat lost from host os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,563 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component SECONDARY_NAMENODE on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,566 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component HISTORYSERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,568 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component HIVE_METASTORE on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,571 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component WEBHCAT_SERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,574 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component HIVE_SERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,577 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component OOZIE_SERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,580 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component ZOOKEEPER_SERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,583 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component DRPC_SERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,585 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component KAFKA_BROKER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,589 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component SPARK_JOBHISTORYSERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,592 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component METRICS_GRAFANA on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,595 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component DATANODE on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,597 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component NFS_GATEWAY on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,600 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component NODEMANAGER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,603 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component HBASE_REGIONSERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,606 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component SUPERVISOR on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,609 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component FLUME_HANDLER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,611 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component SPARK_THRIFTSERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,615 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component METRICS_MONITOR on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,618 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component HST_AGENT on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:27,571 INFO [ambari-action-scheduler] ActionScheduler:702 - Host:os-r7-kwjvku-ambari-eu-4-4.novalocal, role:HBASE_REGIONSERVER, actionId:13-29 timed out
> 05 Apr 2016 08:09:28,750 INFO [ambari-action-scheduler] ActionScheduler:702 - Host:os-r7-kwjvku-ambari-eu-4-4.novalocal, role:HBASE_REGIONSERVER, actionId:13-29 timed out
> 05 Apr 2016 08:09:28,751 WARN [ambari-action-scheduler] ActionScheduler:704 - Host:os-r7-kwjvku-ambari-eu-4-4.novalocal, role:HBASE_REGIONSERVER, actionId:13-29 expired
> 05 Apr 2016 08:09:28,789 ERROR [ambari-action-scheduler] ServiceComponentHostImpl:1030 - Can't handle ServiceComponentHostEvent event at current state, serviceComponentName=HBASE_REGIONSERVER, hostName=os-r7-kwjvku-ambari-eu-4-4.novalocal, currentState=UNKNOWN, eventType=HOST_SVCCOMP_OP_FAILED, event=EventType: HOST_SVCCOMP_OP_FAILED
> 05 Apr 2016 08:09:28,789 WARN [ambari-action-scheduler] ActionScheduler:806 - Unable to transition to failed state.
> org.apache.ambari.server.state.fsm.InvalidStateTransitionException: Invalid event: HOST_SVCCOMP_OP_FAILED at UNKNOWN
>     at org.apache.ambari.server.state.fsm.StateMachineFactory.doTransition(StateMachineFactory.java:297)
>     at org.apache.ambari.server.state.fsm.StateMachineFactory.access$300(StateMachineFactory.java:39)
>     at org.apache.ambari.server.state.fsm.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:440)
>     at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.handleEvent(ServiceComponentHostImpl.java:1025)
>     at org.apache.ambari.server.actionmanager.ActionScheduler.transitionToFailedState(ActionScheduler.java:789)
>     at org.apache.ambari.server.actionmanager.ActionScheduler.processInProgressStage(ActionScheduler.java:710)
>     at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:289)
>     at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:196)
>     at java.lang.Thread.run(Thread.java:745)
> 05 Apr 2016 08:09:28,790 INFO [ambari-action-scheduler] ActionScheduler:717 - Removing command from queue, host=os-r7-kwjvku-ambari-eu-4-4.novalocal, commandId=13-29
> 05 Apr 2016 08:09:59,571 WARN [qtp-ambari-agent-418] SecurityFilter:103 - Request https://os-r7-kwjvku-ambari-eu-4-5.novalocal:8440/ca doesn't match any pattern.
> 05 Apr 2016 08:09:59,571 WARN [qtp-ambari-agent-418] SecurityFilter:62 - This request is not allowed on this port: https://os-r7-kwjvku-ambari-eu-4-5.novalocal:8440/ca
> 5 Apr 2016 08:10:00,761 INFO [qtp-ambari-agent-418] HeartBeatHandler:400 - agentOsType = centos7
> 05 Apr 2016 08:10:00,983 INFO [qtp-ambari-agent-418] HostImpl:285 - Received host registration, host=[hostname=os-r7-kwjvku-ambari-eu-4-4,fqdn=os-r7-kwjvku-ambari-eu-4-4.novalocal,domain=novalocal,architecture=x86_64,processorcount=2,physicalprocessorcount=2,osname=centos,osversion=7.0.1406,osfamily=redhat,memory=16269820,uptime_hours=2,mounts=(available=24033216,mountpoint=/,used=2165564,percent=9%,size=26198780,device=/dev/vda1,type=xfs)(available=8119336,mountpoint=/dev,used=0,percent=0%,size=8119336,device=devtmpfs,type=devtmpfs)(available=8134900,mountpoint=/dev/shm,used=8,percent=1%,size=8134908,device=tmpfs,type=tmpfs)(available=8093468,mountpoint=/run,used=41440,percent=1%,size=8134908,device=tmpfs,type=tmpfs)(available=234937192,mountpoint=/grid/0,used=9839132,percent=5%,size=257899908,device=/dev/vdb,type=ext4)] , registrationTime=1459843800761, agentVersion=2.2.2.0
> 05 Apr 2016 08:10:00,983 INFO [qtp-ambari-agent-418] TopologyManager:316 - TopologyManager.onHostRegistered: Entering
> 05 Apr 2016 08:10:00,984 INFO [qtp-ambari-agent-418] TopologyManager:318 - TopologyManager.onHostRegistered: host = os-r7-kwjvku-ambari-eu-4-4.novalocal is already associated with the cluster or is currently being processed
> 05 Apr 2016 08:10:01,127 INFO [qtp-ambari-agent-418] HeartBeatHandler:467 - Recovery configuration set to RecoveryConfig{, type=AUTO_START, maxCount=6, windowInMinutes=60, retryGap=5, maxLifetimeCount=1024, disabledComponents=, enabledComponents=METRICS_COLLECTOR}
> 05 Apr 2016 08:10:02,289 WARN [ambari-action-scheduler] ActionScheduler:695 - Detected ambari-agent restart during command execution.The command has been aborted.Execution command details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:03,489 WARN [ambari-action-scheduler] ActionScheduler:695 - Detected ambari-agent restart during command execution.The command has been aborted.Execution command details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:04,685 WARN [ambari-action-scheduler] ActionScheduler:695 - Detected ambari-agent restart during command execution.The command has been aborted.Execution command details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:05,911 WARN [ambari-action-scheduler] ActionScheduler:695 - Detected ambari-agent restart during command execution.The command has been aborted.Execution command details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:07,124 WARN [ambari-action-scheduler] ActionScheduler:695 - Detected ambari-agent restart during command execution.The command has been aborted.Execution command details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:08,336 WARN [ambari-action-scheduler] ActionScheduler:695 - Detected ambari-agent restart during command execution.The command has been aborted.Execution command details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:09,570 WARN [ambari-action-scheduler] ActionScheduler:695 - Detected ambari-agent restart during command execution.The command has been aborted.Execution command details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:10,857 WARN [ambari-action-scheduler] ActionScheduler:695 - Detected ambari-agent restart during command execution.The command has been aborted.Execution command details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:12,066 WARN [ambari-action-scheduler] ActionScheduler:695 - Detected ambari-agent restart during command execution.The command has been aborted.Execution command details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:13,295 WARN [ambari-action-scheduler] ActionScheduler:695 - Detected ambari-agent restart during command execution.The command has been aborted.Execution command details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:14,284 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component HISTORYSERVER of service MAPREDUCE2 of cluster cl1 has changed from UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,290 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component WEBHCAT_SERVER of service HIVE of cluster cl1 has changed from UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,294 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component KAFKA_BROKER of service KAFKA of cluster cl1 has changed from UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,298 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component SPARK_JOBHISTORYSERVER of service SPARK of cluster cl1 has changed from UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,303 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component METRICS_GRAFANA of service AMBARI_METRICS of cluster cl1 has changed from UNKNOWN to STARTED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,307 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component DATANODE of service HDFS of cluster cl1 has changed from UNKNOWN to STARTED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,313 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component NFS_GATEWAY of service HDFS of cluster cl1 has changed from UNKNOWN to STARTED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,316 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component HBASE_REGIONSERVER of service HBASE of cluster cl1 has changed from UNKNOWN to STARTED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,321 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component SUPERVISOR of service STORM of cluster cl1 has changed from UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,325 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component OOZIE_SERVER of service OOZIE of cluster cl1 has changed from UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,329 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component METRICS_MONITOR of service AMBARI_METRICS of cluster cl1 has changed from UNKNOWN to STARTED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,333 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component NODEMANAGER of service YARN of cluster cl1 has changed from UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,337 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component HIVE_METASTORE of service HIVE of cluster cl1 has changed from UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,343 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component HIVE_SERVER of service HIVE of cluster cl1 has changed from UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,348 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component ZOOKEEPER_SERVER of service ZOOKEEPER of cluster cl1 has changed from UNKNOWN to STARTED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,351 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component SPARK_THRIFTSERVER of service SPARK of cluster cl1 has changed from UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,354 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component DRPC_SERVER of service STORM of cluster cl1 has changed from UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,359 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component SECONDARY_NAMENODE of service HDFS of cluster cl1 has changed from UNKNOWN to STARTED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,363 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component FLUME_HANDLER of service FLUME of cluster cl1 has changed from UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,366 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605 - State of service component HST_AGENT of service SMARTSENSE of cluster cl1 has changed from UNKNOWN to STARTED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,499 WARN [ambari-action-scheduler] ActionScheduler:695 - Detected ambari-agent restart during command execution.The command has been aborted.Execution command details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:15,706 WARN [ambari-action-scheduler] ActionScheduler:695 - Detected ambari-agent restart during command execution.The command has been aborted.Execution command details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:16,894 WARN [ambari-action-scheduler] ActionScheduler:695 - Detected ambari-agent restart during command execution.The command has been aborted.Execution command details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:18,151 WARN [ambari-action-scheduler] ActionScheduler:695 - Detected ambari-agent restart during command execution.The command has been aborted.Execution command details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)