[ https://issues.apache.org/jira/browse/AMBARI-24201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Onischuk updated AMBARI-24201:
-------------------------------------
    Attachment: AMBARI-24201.patch

> Command reschedule does not work causing blueprint deployments to timeout
> ---------------------------------------------------------------------------
>
>                 Key: AMBARI-24201
>                 URL: https://issues.apache.org/jira/browse/AMBARI-24201
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>            Priority: Major
>             Fix For: 2.7.0
>
>         Attachments: AMBARI-24201.patch, AMBARI-24201.patch, AMBARI-24201.patch
>
>
> When a stage times out or delivery fails during a blueprint install, the
> server usually reschedules the running command by sending a cancel command
> along with a repeated execution command.
> The bug is that the agent cancels the command that needs to be newly scheduled.
>
>
> 2018-06-27 01:34:58,105 WARN [agent-message-retry-0] MessageEmitter:255 - Reschedule execution command emitting, retry: 1, messageId: 19
>
>
>
> ..., u'cancelCommands': [{u'commandType': u'CANCEL_COMMAND', u'target_task_id': 145, u'reason': u'Stage timeout'}]}}, u'requiredConfigTimestamp': 1530060845474}
> INFO 2018-06-27 01:34:58,121 ActionQueue.py:115 - Canceling command with taskId = 145
> INFO 2018-06-27 01:34:58,121 ActionQueue.py:134 - Canceling EXECUTION_COMMAND for service ZOOKEEPER and role ZOOKEEPER_CLIENT with taskId 145
> WARNING 2018-06-27 01:34:58,121 CustomServiceOrchestrator.py:129 - Unable to find process associated with taskId = 145
> INFO 2018-06-27 01:34:58,122 ActionQueue.py:103 - Adding EXECUTION_COMMAND for role ZOOKEEPER_CLIENT for service ZOOKEEPER of cluster_id 2 to the queue.
> INFO 2018-06-27 01:34:58,122 security.py:135 - Event to server at /reports/responses (correlation_id=870): {'status': 'OK', 'messageId': '19'}
> INFO 2018-06-27 01:34:58,142 __init__.py:57 - Event from server at /user/ (correlation_id=870): {u'status': u'OK'}
> INFO 2018-06-27 01:34:59,293 ActionQueue.py:238 - Executing command with id = 10-0, taskId = 145 for role = ZOOKEEPER_CLIENT of cluster_id 2.
> INFO 2018-06-27 01:34:59,294 security.py:135 - Event to server at /reports/commands_status (correlation_id=871): {'clusters': {u'2': [{'status': 'IN_PROGRESS', 'taskId': 145, 'tmpout': '/var/lib/ambari-agent/data/output-145.txt', 'roleCommand': u'INSTALL', 'structuredOut': '/var/lib/ambari-agent/data/structured-out-145.json', 'clusterId': u'2', 'serviceName': u'ZOOKEEPER', 'role': u'ZOOKEEPER_CLIENT', 'actionId': u'10-0', 'tmperr': '/var/lib/ambari-agent/data/errors-145.txt'}]}}
> INFO 2018-06-27 01:34:59,295 ActionQueue.py:279 - Command execution metadata - taskId = 145, retry enabled = True, max retry duration (sec) = 1200, log_output = True
> INFO 2018-06-27 01:34:59,296 ActionQueue.py:285 - Command with taskId = 145 canceled
> ERROR 2018-06-27 01:34:59,296 ActionQueue.py:221 - Exception while processing EXECUTION_COMMAND command
> Traceback (most recent call last):
>   File "/usr/lib/ambari-agent/lib/ambari_agent/ActionQueue.py", line 214, in process_command
>     self.execute_command(command)
>   File "/usr/lib/ambari-agent/lib/ambari_agent/ActionQueue.py", line 354, in execute_command
>     commandresult['stdout'] += '\n\nCommand completed successfully!\n' if status == self.COMPLETED_STATUS else '\n\nCommand failed after ' + str(numAttempts) + ' tries\n'
> UnboundLocalError: local variable 'commandresult' referenced before assignment
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
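
The log above suggests two separate problems: a cancel marker for taskId 145 outlives the cancel and hits the rescheduled copy of the same command, and the canceled path then reaches the summary line before `commandresult` is ever assigned. The following is a minimal sketch of that reading, not Ambari's actual code and not the attached patch. `ToyActionQueue`, `clear_cancel_marker`, and `run_ok` are hypothetical names introduced for illustration only.

```python
class ToyActionQueue:
    """Hypothetical stand-in for the agent's queue; not Ambari's real class."""

    COMPLETED_STATUS = "COMPLETED"
    FAILED_STATUS = "FAILED"

    def __init__(self):
        self.queue = []
        self.canceled_task_ids = set()

    def cancel(self, task_id):
        # Drop queued copies and remember the id so a running copy can stop.
        self.queue = [c for c in self.queue if c["taskId"] != task_id]
        self.canceled_task_ids.add(task_id)

    def put(self, command, clear_cancel_marker):
        # A rescheduled command reuses the canceled taskId. Clearing the stale
        # marker on enqueue is an assumed remedy, shown here as a toggle.
        if clear_cancel_marker:
            self.canceled_task_ids.discard(command["taskId"])
        self.queue.append(command)

    def execute_next(self, max_attempts=3):
        command = self.queue.pop(0)
        task_id = command["taskId"]
        # Bind these before the retry loop so the summary code below never
        # touches an unassigned local, even if the loop exits immediately.
        command_result = {"stdout": "", "exitcode": 1}
        status = self.FAILED_STATUS
        num_attempts = 0
        while num_attempts < max_attempts:
            if task_id in self.canceled_task_ids:
                command_result["stdout"] += "Command with taskId = %d canceled" % task_id
                break
            num_attempts += 1
            command_result = command["run"]()
            if command_result["exitcode"] == 0:
                status = self.COMPLETED_STATUS
                break
        if status == self.COMPLETED_STATUS:
            command_result["stdout"] += "\n\nCommand completed successfully!\n"
        else:
            command_result["stdout"] += "\n\nCommand failed after %d tries\n" % num_attempts
        return status, command_result


def run_ok():
    # Hypothetical command body that always succeeds.
    return {"stdout": "installed", "exitcode": 0}
```

In this sketch, calling `cancel(145)` and then `put({"taskId": 145, "run": run_ok}, clear_cancel_marker=False)` reproduces the symptom in the log: the fresh command is treated as canceled and never runs, though without the `UnboundLocalError` because the result is bound up front. With `clear_cancel_marker=True` the rescheduled command executes normally.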