On Mon, Dec 5, 2016 at 2:14 PM, Ophir Etzion <op...@foursquare.com> wrote:
> it still doesn't work
>
> the error becomes:
> INFO 2016-12-05 22:11:10,947 PythonExecutor.py:97 - stop command output:
> err: shell-init: error retrieving current directory: getcwd: cannot access
> parent directories: No such file or directory
>
> This could be the race condition Gour was talking about, where the container
> gets cleaned up before the agent has a chance to stop gracefully.
>
> On Fri, Dec 2, 2016 at 2:58 PM, Gour Saha <gs...@hortonworks.com> wrote:
>
> > Billie, this is a good catch.
> >
> > Ophir, I think you should make this small change and try your app stop
> > again to see if it works.
> >
> > -Gour
> >
> > On 12/2/16, 10:13 AM, "Billie Rinaldi" <billie.rina...@gmail.com> wrote:
> >
> > >This subprocess.Popen does appear to be missing an env=env parameter:
> > >https://github.com/apache/incubator-slider/blob/develop/slider-agent/src/main/python/agent/PythonExecutor.py#L153
> > >
> > >On Fri, Dec 2, 2016 at 9:30 AM, Ophir Etzion <op...@foursquare.com> wrote:
> > >
> > >> 1. you can't see the PYTHONPATH issue. you can see there is no setting of
> > >> the PYTHONPATH that you can see in the START command.
> > >> 2. thanks for letting me know about release_timeout_secs but for my app I
> > >> don't care if the containers die, the stop command sends an udp packet
> > >> elsewhere.
> > >>
> > >> here is the output for START where you can see the PYTHONPATH being set:
> > >> INFO 2016-11-30 17:50:32,361 AgentToggleLogger.py:40 - Running command
> > >> ['/usr/bin/python',
> > >>  '-S',
> > >>  u'/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package/scripts/enable_presto_worker_component.py',
> > >>  u'START',
> > >>  '/export/hda3/data/log/hadoop-yarn/container/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/command-4.json',
> > >>  '/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package',
> > >>  '/export/hda3/data/log/hadoop-yarn/container/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/structured-out-4.json',
> > >>  'INFO',
> > >>  '/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091']
> > >> INFO 2016-11-30 17:50:32,361 AgentToggleLogger.py:40 - Setting env:
> > >> PYTHONPATH to
> > >> /export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/10/slider-agent.tar.gz/slider-agent/jinja2:/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/10/slider-agent.tar.gz/slider-agent
> > >> INFO 2016-11-30 17:50:32,463 AgentToggleLogger.py:40 - Queue result:
> > >> {'componentStatus': [],
> > >>  'reports': [{'actionId': u'4-1',
> > >>               'clusterName': u'enable-presto-worker_cluster_a',
> > >>               'exitcode': 777,
> > >>               'reportResult': True,
> > >>               'role': u'NODE',
> > >>               'roleCommand': u'START',
> > >>               'serviceName': u'enable-presto-worker_cluster_a',
> > >>               'status': 'IN_PROGRESS',
> > >>               'stderr': '',
> > >>               'stdout': "2016-11-30 17:50:32,455 - Directory['/data/appdata/enable_presto_worker/data/var/run'] {'recursive': True}",
> > >>               'structuredOut': '{}',
> > >>               'taskId': 4}]}
> > >>
> > >> On Fri, Dec 2, 2016 at 11:51 AM, Gour Saha <gs...@hortonworks.com> wrote:
> > >>
> > >> > Also keep in mind - if your application needs to run something useful when
> > >> > the stop cmd is initiated then you need to set an appropriate value to
> > >> > site.global.app_container.release_timeout_secs. Otherwise kill signals are
> > >> > sent to the agent containers via YARN (almost immediately) and the
> > >> > containers don't get time for graceful shutdown.
> > >> >
> > >> > -Gour
> > >> >
> > >> > On 12/2/16, 8:29 AM, "Billie Rinaldi" <billie.rina...@gmail.com> wrote:
> > >> >
> > >> > >It looks like the Traceback stack for the stop command output is truncated
> > >> > >in the logs you pasted. I only see the first line of the Traceback:
> > >> > >INFO 2016-11-30 18:07:03,919 PythonExecutor.py:97 - stop command output:
> > >> > > err: Traceback (most recent call last):
> > >> > >   File "/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package/scripts/enable_presto_worker_component.py", line 23, in <module>
> > >> > >     from resource_management import *
> > >> > >
> > >> > >So I cannot see the PYTHONPATH error you're talking about. If you paste the
> > >> > >entire Traceback that might tell us more.
> > >> > >
> > >> > >Billie
> > >> > >
> > >> > >On Fri, Dec 2, 2016 at 7:19 AM, Ophir Etzion <op...@foursquare.com> wrote:
> > >> > >
> > >> > >> it does implement a STOP command that does something useful.
> > >> > >> it fails because the PYTHONPATH isn't set like it is in different commands.
> > >> > >>
> > >> > >> On Thu, Dec 1, 2016 at 10:38 PM, Gour Saha <gs...@hortonworks.com> wrote:
> > >> > >>
> > >> > >> > Does enable_presto_worker_component.py support/implement a STOP command?
> > >> > >> >
> > >> > >> > Does your application need to run something useful when the stop cmd is
> > >> > >> > initiated?
> > >> > >> >
> > >> > >> > -Gour
> > >> > >> >
> > >> > >> > On 11/30/16, 10:58 AM, "Ophir Etzion" <op...@foursquare.com> wrote:
> > >> > >> >
> > >> > >> > >Hi,
> > >> > >> > >
> > >> > >> > >I hope I'm writing to the correct mailing list. please direct me elsewhere
> > >> > >> > >if this is not the correct place to write to.
> > >> > >> > >
> > >> > >> > >I've written a simple custom slider application and the STOP script fails
> > >> > >> > >due to what seems like a slider issue of not setting the PYTHONPATH when
> > >> > >> > >running the stop command.
> > >> > >> > >
> > >> > >> > >I will probably debug to see what goes on in CustomServiceOrchestrator and
> > >> > >> > >why it doesn't set the env variables there but I'll only do it in a couple
> > >> > >> > >of weeks.
> > >> > >> > >I wanted to ask if anyone noticed something like this before I look into it
> > >> > >> > >further.
> > >> > >> > >
> > >> > >> > >in the agent log it looks like this:
> > >> > >> > >
> > >> > >> > >INFO 2016-11-30 18:07:03,894 ActionQueue.py:173 - Running command:
> > >> > >> > >{u'roleCommand': u'STOP', u'clusterName': u'enable-presto-worker_cluster_a',
> > >> > >> > >u'componentName': u'NODE', u'hostname': u'fsak20.prod.foursquare.com',
> > >> > >> > >u'hostLevelParams': {u'java_home': u'/data/loko/infrastructure-jdk8/current/bin/',
> > >> > >> > >u'container_id': u'container_e468_1479830316320_64974_01_000091'},
> > >> > >> > >u'commandType': u'EXECUTION_COMMAND', u'roleParams': {u'auto_restart': u'false'},
> > >> > >> > >u'serviceName': u'enable-presto-worker_cluster_a', u'role': u'NODE',
> > >> > >> > >u'commandParams': {u'record_config': u'true',
> > >> > >> > >u'service_package_folder': u'${AGENT_WORK_ROOT}/work/app/definition/package',
> > >> > >> > >u'script': u'scripts/enable_presto_worker_component.py', u'schema_version': u'2.0',
> > >> > >> > >u'command_timeout': u'600', u'script_type': u'PYTHON'}, u'taskId': 5,
> > >> > >> > >u'yarnDockerMode': False, u'commandId': '5-1', u'containers': [],
> > >> > >> > >u'configurations': {u'global': {u'security_enabled': u'false',
> > >> > >> > >u'app_container_id': u'container_e468_1479830316320_64974_01_000091',
> > >> > >> > >u'data_dir': u'/data/appdata/enable_presto_worker/data',
> > >> > >> > >u'app_name': u'enable_presto_worker.py',
> > >> > >> > >u'app_root': u'${AGENT_WORK_ROOT}/app/install',
> > >> > >> > >u'app_log_dir': u'${AGENT_LOG_ROOT}',
> > >> > >> > >u'app_pid_dir': u'${AGENT_WORK_ROOT}/app/run', u'app_container_tag': u'2',
> > >> > >> > >u'pid_file': u'${AGENT_WORK_ROOT}/app/run/component.pid',
> > >> > >> > >u'app_install_dir': u'${AGENT_WORK_ROOT}/app/install',
> > >> > >> > >u'app_input_conf_dir': u'${AGENT_WORK_ROOT}/propagatedconf',
> > >> > >> > >u'state_monitor_port': u'9990'}}}
> > >> > >> > >INFO 2016-11-30 18:07:03,896 CustomServiceOrchestrator.py:329 - Storing
> > >> > >> > >applied config: {u'global': {u'app_container_id': u'container_e468_1479830316320_64974_01_000091',
> > >> > >> > >             u'app_container_tag': u'2',
> > >> > >> > >             u'app_input_conf_dir': u'/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/propagatedconf',
> > >> > >> > >             u'app_install_dir': u'/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/app/install',
> > >> > >> > >             u'app_log_dir': u'/data/log/hadoop-yarn/container/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091',
> > >> > >> > >             u'app_name': u'enable_presto_worker.py',
> > >> > >> > >             u'app_pid_dir': u'/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/app/run',
> > >> > >> > >             u'app_root': u'/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/app/install',
> > >> > >> > >             u'data_dir': u'/data/appdata/enable_presto_worker/data',
> > >> > >> > >             u'pid_file': u'/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/app/run/component.pid',
> > >> > >> > >             u'security_enabled': u'false',
> > >> > >> > >             u'state_monitor_port': u'9990'}}
> > >> > >> > >INFO 2016-11-30 18:07:03,898 PythonExecutor.py:152 - command str:
> > >> > >> > > /usr/bin/python -S
> > >> > >> > >/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package/scripts/enable_presto_worker_component.py
> > >> > >> > >STOP
> > >> > >> > >/export/hda3/data/log/hadoop-yarn/container/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/command-5.json
> > >> > >> > >/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package
> > >> > >> > >/export/hda3/data/log/hadoop-yarn/container/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091/structured-out-5.json
> > >> > >> > >INFO
> > >> > >> > >/export/hdj3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/container_e468_1479830316320_64974_01_000091
> > >> > >> > >INFO 2016-11-30 18:07:03,919 PythonExecutor.py:97 - stop command output:
> > >> > >> > > err: Traceback (most recent call last):
> > >> > >> > >   File "/export/hdk3/yarn/nm/usercache/hive/appcache/application_1479830316320_64974/filecache/11/enable_presto_worker.zip/package/scripts/enable_presto_worker_component.py", line 23, in <module>
> > >> > >> > >     from resource_management import *
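
For reference, the change Billie suggests in the thread amounts to passing the prepared environment to subprocess.Popen. What follows is only a minimal sketch of that pattern, not the actual PythonExecutor.py code. run_with_env and the /tmp/slider-agent path are hypothetical names made up for illustration, and the sketch is written for Python 3, whereas the agent in this thread runs on Python 2.

```python
import os
import subprocess
import sys


def run_with_env(command, extra_env):
    """Run command with the parent's environment plus extra_env.

    Hypothetical helper for illustration only; returns (returncode, stdout, stderr).
    """
    # Start from a copy of the parent's environment so PATH etc. are preserved.
    env = os.environ.copy()
    env.update(extra_env)
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,  # the parameter the thread identifies as missing on the stop path
    )
    out, err = proc.communicate()
    return proc.returncode, out.decode(), err.decode()


if __name__ == "__main__":
    # Assumed, made-up library location; the real one is the slider-agent
    # directory shown in the "Setting env: PYTHONPATH to" log line.
    agent_lib = "/tmp/slider-agent"
    child = [sys.executable, "-c", "import os; print(os.environ.get('PYTHONPATH'))"]
    rc, out, err = run_with_env(child, {"PYTHONPATH": agent_lib})
    print(rc, out.strip())
```

When env is left out, Popen hands the child the parent's environment unchanged, so a PYTHONPATH that was only written into a local dict never reaches the script and an import such as resource_management cannot be resolved.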