[ https://issues.apache.org/jira/browse/SLIDER-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ouchengeng updated SLIDER-1232:
-------------------------------
    Description:
When apps run as docker containers, AgentProviderService tells slider-agent that it is in docker mode, so DockerManager.py can install and start the docker containers.
For stop_command, however, the provider does not tell the agent that it is in docker mode; that is, the docker entry is missing from the command.
This causes the following exception.
```
ERROR 2017-07-05 05:14:25,830 CustomServiceOrchestrator.py:169 - Caught an exception while executing command: <type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'startswith'
Traceback (most recent call last):
  File "/h8/hadoop/yarn/local/usercache/df/appcache/application_1499081562804_0093/filecache/10/slider-agent.tar.gz/slider-agent/agent/CustomServiceOrchestrator.py", line 115, in runCommand
    script_path = self.resolve_script_path(self.base_dir, script, script_type)
  File "/h8/hadoop/yarn/local/usercache/df/appcache/application_1499081562804_0093/filecache/10/slider-agent.tar.gz/slider-agent/agent/CustomServiceOrchestrator.py", line 196, in resolve_script_path
    path = os.path.realpath(posixpath.join(base_dir, script))
  File "/usr/lib64/python2.7/posixpath.py", line 75, in join
    if b.startswith('/'):
AttributeError: 'NoneType' object has no attribute 'startswith'
INFO 2017-07-05 05:14:25,831 ActionQueue.py:188 - Stop command received
INFO 2017-07-05 05:14:25,932 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [],
 'reports': [{'actionId': u'16-1',
              'clusterName': u'cgou',
              'exitcode': 1,
              'reportResult': True,
              'role': u'MATCHER',
              'roleCommand': u'STOP',
              'serviceName': u'cgou',
              'status': 'FAILED',
              'stderr': "Caught an exception while executing command: <type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'startswith'",
              'stdout': "Caught an exception while executing command: <type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'startswith'",
              'structuredOut': '{}',
              'taskId': 16}]}
```
In AgentProviderService.java there are addInstallCommand/addInstallDockerCommand and addStartCommand/addStartDockerCommand methods, but the corresponding stop method for docker is missing. That is why slider-agent/ActionQueue.py cannot recognize this command properly, which causes the exception above.

> Provider misses docker entry in stop_command when apps are docker containers
> ----------------------------------------------------------------------------
>
>                 Key: SLIDER-1232
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1232
>             Project: Slider
>          Issue Type: Bug
>          Components: agent-provider
>    Affects Versions: Slider 0.91, Slider 0.92
>            Reporter: ouchengeng
>            Assignee: ouchengeng
>            Priority: Critical
>             Fix For: Slider 0.92
>
>
> When apps run as docker containers, AgentProviderService tells slider-agent
> that it is in docker mode, so DockerManager.py can install and start the
> docker containers.
> For stop_command, however, the provider does not tell the agent that it is in
> docker mode; that is, the docker entry is missing from the command.
> This causes the following exception.
> ```
> ERROR 2017-07-05 05:14:25,830 CustomServiceOrchestrator.py:169 - Caught an exception while executing command: <type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'startswith'
> Traceback (most recent call last):
>   File "/h8/hadoop/yarn/local/usercache/df/appcache/application_1499081562804_0093/filecache/10/slider-agent.tar.gz/slider-agent/agent/CustomServiceOrchestrator.py", line 115, in runCommand
>     script_path = self.resolve_script_path(self.base_dir, script, script_type)
>   File "/h8/hadoop/yarn/local/usercache/df/appcache/application_1499081562804_0093/filecache/10/slider-agent.tar.gz/slider-agent/agent/CustomServiceOrchestrator.py", line 196, in resolve_script_path
>     path = os.path.realpath(posixpath.join(base_dir, script))
>   File "/usr/lib64/python2.7/posixpath.py", line 75, in join
>     if b.startswith('/'):
> AttributeError: 'NoneType' object has no attribute 'startswith'
> INFO 2017-07-05 05:14:25,831 ActionQueue.py:188 - Stop command received
> INFO 2017-07-05 05:14:25,932 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [],
>  'reports': [{'actionId': u'16-1',
>               'clusterName': u'cgou',
>               'exitcode': 1,
>               'reportResult': True,
>               'role': u'MATCHER',
>               'roleCommand': u'STOP',
>               'serviceName': u'cgou',
>               'status': 'FAILED',
>               'stderr': "Caught an exception while executing command: <type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'startswith'",
>               'stdout': "Caught an exception while executing command: <type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'startswith'",
>               'structuredOut': '{}',
>               'taskId': 16}]}
> ```
> In AgentProviderService.java there are
> addInstallCommand/addInstallDockerCommand and
> addStartCommand/addStartDockerCommand methods, but the corresponding stop
> method for docker is missing. That is why slider-agent/ActionQueue.py cannot
> recognize this command properly, which causes the exception above.
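
The failure mode described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the slider-agent code: the command is modelled as a plain dict, the key names `docker` and `script` and the helpers `is_docker_command`, `resolve_script_path` and `route_command` are hypothetical. It shows how a stop command with neither a docker entry nor a script ends up passing `None` into a path join, and how an explicit check turns that into a clearer error.

```python
import posixpath


def is_docker_command(command):
    """Return True when the command dict carries a non-empty docker entry.

    The key name 'docker' is an assumption made for this sketch.
    """
    return bool(command.get("docker"))


def resolve_script_path(base_dir, script):
    """Join base_dir and script, failing early when no script was supplied."""
    if script is None:
        # Without this check, posixpath.join(base_dir, None) raises an
        # error deep inside the standard library, as in the traceback above.
        raise ValueError("command has neither a docker entry nor a script")
    return posixpath.join(base_dir, script)


def route_command(command, base_dir):
    """Choose the docker path or the script path for a command (hypothetical)."""
    if is_docker_command(command):
        return ("docker", command["roleCommand"])
    return ("script", resolve_script_path(base_dir, command.get("script")))
```

With this routing, a STOP command that carries a docker entry never reaches the script path lookup, which is roughly what a stop counterpart to addInstallDockerCommand/addStartDockerCommand on the provider side would enable.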
-- This message was sent by Atlassian JIRA (v6.4.14#64029)