On Mon, Dec 5, 2016 at 2:14 PM, Ophir Etzion <op...@foursquare.com> wrote:

> it still doesn't work
>
> the error becomes:
> INFO 2016-12-05 22:11:10,947 PythonExecutor.py:97 - stop command output:
>  err: shell-init: error retrieving current directory: getcwd: cannot access
> parent directories: No such file or directory
>

This could be the race condition Gour was talking about, where the
container gets cleaned up before the agent has a chance to stop gracefully.
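
As a rough illustration of the two points in this thread (this is not the actual Slider agent code), here is a minimal Python 3 sketch. It passes an explicit environment to subprocess.Popen, as in Billie's env=env suggestion. It also falls back to a directory that still exists, since the getcwd error above typically appears when a shell starts in a working directory that has already been removed. The helper name run_script, the temp-directory fallback, and the example PYTHONPATH value are assumptions made for the example.

```python
import os
import subprocess
import sys
import tempfile


def run_script(command, extra_env, cwd=None):
    """Run command with extra_env merged over the current environment.

    Hypothetical helper for illustration; not part of the Slider agent.
    """
    env = os.environ.copy()
    env.update(extra_env)

    # If the requested working directory is missing (for example because
    # the container directory was already cleaned up), fall back to a
    # directory that is known to exist.
    if cwd is None or not os.path.isdir(cwd):
        cwd = tempfile.gettempdir()

    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,  # without this, the child only inherits the parent's env
        cwd=cwd,
    )
    out, err = proc.communicate()
    return proc.returncode, out.decode(), err.decode()


if __name__ == "__main__":
    # The PYTHONPATH value here is a placeholder, not a real agent path.
    code, out, err = run_script(
        [sys.executable, "-c", "import os; print(os.environ['PYTHONPATH'])"],
        {"PYTHONPATH": "/tmp/hypothetical"},
    )
    print(code, out.strip(), err.strip())
```

Whether the real fix in PythonExecutor.py needs the cwd handling as well is an open question; the sketch only shows that both env and cwd can be set explicitly on the child process.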


>
> On Fri, Dec 2, 2016 at 2:58 PM, Gour Saha <gs...@hortonworks.com> wrote:
>
> > Billie, this is a good catch.
> >
> > Ophir, I think you should make this small change and try your app stop
> > again to see if it works.
> >
> > -Gour
> >
> > On 12/2/16, 10:13 AM, "Billie Rinaldi" <billie.rina...@gmail.com> wrote:
> >
> > >This subprocess.Popen does appear to be missing an env=env parameter:
> > >https://github.com/apache/incubator-slider/blob/develop/slider-agent/src/main/python/agent/PythonExecutor.py#L153
> > >
> > >On Fri, Dec 2, 2016 at 9:30 AM, Ophir Etzion <op...@foursquare.com>
> > wrote:
> > >
> > >> 1. You can't see the PYTHONPATH issue directly. What you can see is
> > >> that the PYTHONPATH setting that appears for the START command is
> > >> missing for STOP.
> > >> 2. Thanks for letting me know about release_timeout_secs, but for my
> > >> app I don't care if the containers die; the stop command sends a UDP
> > >> packet elsewhere.
> > >>
> > >> here is the output for START where you can see the PYTHONPATH being
> set:
> > >> INFO 2016-11-30 17:50:32,361 AgentToggleLogger.py:40 - Running command
> > >> ['/usr/bin/python',
> > >>  '-S',
> > >>
> > >>u'/export/hdk3/yarn/nm/usercache/hive/appcache/
> > application_1479830316320_
> > >> 64974/filecache/11/enable_presto_worker.zip/package/
> > >> scripts/enable_presto_worker_component.py',
> > >>  u'START',
> > >>  '/export/hda3/data/log/hadoop-yarn/container/application_
> > >> 1479830316320_64974/container_e468_1479830316320_64974_01_
> > >> 000091/command-4.json',
> > >>
> > >>'/export/hdk3/yarn/nm/usercache/hive/appcache/
> application_1479830316320_
> > >> 64974/filecache/11/enable_presto_worker.zip/package',
> > >>  '/export/hda3/data/log/hadoop-yarn/container/application_
> > >> 1479830316320_64974/container_e468_1479830316320_64974_01_
> > >> 000091/structured-out-4.json',
> > >>  'INFO',
> > >>
> > >>'/export/hdj3/yarn/nm/usercache/hive/appcache/
> application_1479830316320_
> > >> 64974/container_e468_1479830316320_64974_01_000091']
> > >> INFO 2016-11-30 17:50:32,361 AgentToggleLogger.py:40 - Setting env:
> > >> PYTHONPATH to
> > >> /export/hdj3/yarn/nm/usercache/hive/appcache/
> application_1479830316320_
> > >> 64974/filecache/10/slider-agent.tar.gz/slider-agent/
> > >> jinja2:/export/hdj3/yarn/nm/usercache/hive/appcache/
> > >> application_1479830316320_64974/filecache/10/slider-
> > >> agent.tar.gz/slider-agent
> > >> INFO 2016-11-30 17:50:32,463 AgentToggleLogger.py:40 - Queue result:
> > >> {'componentStatus': [],
> > >>  'reports': [{'actionId': u'4-1',
> > >>               'clusterName': u'enable-presto-worker_cluster_a',
> > >>               'exitcode': 777,
> > >>               'reportResult': True,
> > >>               'role': u'NODE',
> > >>               'roleCommand': u'START',
> > >>               'serviceName': u'enable-presto-worker_cluster_a',
> > >>               'status': 'IN_PROGRESS',
> > >>               'stderr': '',
> > >>               'stdout': "2016-11-30 17:50:32,455 -
> > >> Directory['/data/appdata/enable_presto_worker/data/var/run']
> > >>{'recursive':
> > >> True}",
> > >>               'structuredOut': '{}',
> > >>               'taskId': 4}]}
> > >>
> > >> On Fri, Dec 2, 2016 at 11:51 AM, Gour Saha <gs...@hortonworks.com>
> > >>wrote:
> > >>
> > >> > Also keep in mind - if your application needs to run something useful
> > >> > when the stop cmd is initiated then you need to set an appropriate
> > >> > value to site.global.app_container.release_timeout_secs. Otherwise
> > >> > kill signals are sent to the agent containers via YARN (almost
> > >> > immediately) and the containers don't get time for graceful shutdown.
> > >> >
> > >> > -Gour
> > >> >
> > >> >
> > >> >
> > >> > On 12/2/16, 8:29 AM, "Billie Rinaldi" <billie.rina...@gmail.com>
> > >>wrote:
> > >> >
> > >> > >It looks like the Traceback stack for the stop command output is
> > >> > >truncated in the logs you pasted. I only see the first line of the
> > >> > >Traceback:
> > >> > >INFO 2016-11-30 18:07:03,919 PythonExecutor.py:97 - stop command
> > >>output:
> > >> > > err: Traceback (most recent call last):
> > >> > >  File
> > >> > >"/export/hdk3/yarn/nm/usercache/hive/appcache/
> > >> application_1479830316320_
> > >> > >64974/filecache/11/enable_presto_worker.zip/package/
> > >> > >scripts/enable_presto_worker_component.py",
> > >> > >line 23, in <module>
> > >> > >    from resource_management import *
> > >> > >
> > >> > >So I cannot see the PYTHONPATH error you're talking about. If you
> > >> > >paste the entire Traceback that might tell us more.
> > >> > >
> > >> > >Billie
> > >> > >
> > >> > >On Fri, Dec 2, 2016 at 7:19 AM, Ophir Etzion <op...@foursquare.com
> >
> > >> > wrote:
> > >> > >
> > >> > >> It does implement a STOP command that does something useful.
> > >> > >> It fails because the PYTHONPATH isn't set the way it is for other
> > >> > >> commands.
> > >> > >>
> > >> > >> On Thu, Dec 1, 2016 at 10:38 PM, Gour Saha <
> gs...@hortonworks.com>
> > >> > >>wrote:
> > >> > >>
> > >> > >> > Does enable_presto_worker_component.py support/implement a
> STOP
> > >> > >>command?
> > >> > >> >
> > >> > >> > Does your application need to run something useful when the
> stop
> > >>cmd
> > >> > >>is
> > >> > >> > initiated?
> > >> > >> >
> > >> > >> > -Gour
> > >> > >> >
> > >> > >> > On 11/30/16, 10:58 AM, "Ophir Etzion" <op...@foursquare.com>
> > >>wrote:
> > >> > >> >
> > >> > >> > >Hi,
> > >> > >> > >
> > >> > >> > >I hope I'm writing to the correct mailing list. please direct
> me
> > >> > >> elsewhere
> > >> > >> > >if this is not the correct place to write to.
> > >> > >> > >
> > >> > >> > >I've written a simple custom slider application and the STOP
> > >>script
> > >> > >> fails
> > >> > >> > >due to what seems like a slider issue of not setting the
> > >>PYTHONPATH
> > >> > >>when
> > >> > >> > >running the stop command.
> > >> > >> > >
> > >> > >> > >I will probably debug to see what goes on in
> > >> > >>CustomServiceOrchestrator
> > >> > >> and
> > >> > >> > >why it doesn't set the env variables there but I'll only do it
> > >>in a
> > >> > >> couple
> > >> > >> > >of weeks.
> > >> > >> > >I wanted to ask if anyone noticed something like this before I
> > >>look
> > >> > >>into
> > >> > >> > >it
> > >> > >> > >further.
> > >> > >> > >
> > >> > >> > >in the agent log it looks like this:
> > >> > >> > >
> > >> > >> > >INFO 2016-11-30 18:07:03,894 ActionQueue.py:173 - Running
> > >>command:
> > >> > >> > >{u'roleCommand': u'STOP', u'clusterName':
> > >> > >> > >u'enable-presto-worker_cluster_a', u'componentName': u'NODE',
> > >> > >> > u'hostname':
> > >> > >> > >u'fsak20.prod.foursquare.com', u'hostLevelParams':
> > >>{u'java_home':
> > >> > >> > >u'/data/loko/infrastructure-jdk8/current/bin/',
> > u'container_id':
> > >> > >> > >u'container_e468_1479830316320_64974_01_000091'},
> > >>u'commandType':
> > >> > >> > >u'EXECUTION_COMMAND', u'roleParams': {u'auto_restart':
> > >>u'false'},
> > >> > >> > >u'serviceName': u'enable-presto-worker_cluster_a', u'role':
> > >> u'NODE',
> > >> > >> > >u'commandParams': {u'record_config': u'true',
> > >> > >>u'service_package_folder':
> > >> > >> > >u'${AGENT_WORK_ROOT}/work/app/definition/package', u'script':
> > >> > >> > >u'scripts/enable_presto_worker_component.py',
> > u'schema_version':
> > >> > >> u'2.0',
> > >> > >> > >u'command_timeout': u'600', u'script_type': u'PYTHON'},
> > >>u'taskId':
> > >> 5,
> > >> > >> > >u'yarnDockerMode': False, u'commandId': '5-1', u'containers':
> > >>[],
> > >> > >> > >u'configurations': {u'global': {u'security_enabled': u'false',
> > >> > >> > >u'app_container_id': u'container_e468_
> > >> 1479830316320_64974_01_000091'
> > >> > ,
> > >> > >> > >u'data_dir': u'/data/appdata/enable_presto_worker/data',
> > >> > u'app_name':
> > >> > >> > >u'enable_presto_worker.py', u'app_root':
> > >> > >> > >u'${AGENT_WORK_ROOT}/app/install',
> > >> > >> > >u'app_log_dir': u'${AGENT_LOG_ROOT}', u'app_pid_dir':
> > >> > >> > >u'${AGENT_WORK_ROOT}/app/run', u'app_container_tag': u'2',
> > >> > >>u'pid_file':
> > >> > >> > >u'${AGENT_WORK_ROOT}/app/run/component.pid',
> > u'app_install_dir':
> > >> > >> > >u'${AGENT_WORK_ROOT}/app/install', u'app_input_conf_dir':
> > >> > >> > >u'${AGENT_WORK_ROOT}/propagatedconf', u'state_monitor_port':
> > >> > >>u'9990'}}}
> > >> > >> > >INFO 2016-11-30 18:07:03,896 CustomServiceOrchestrator.py:329
> -
> > >> > >>Storing
> > >> > >> > >applied config: {u'global': {u'app_container_id':
> > >> > >> > >u'container_e468_1479830316320_64974_01_000091',
> > >> > >> > >             u'app_container_tag': u'2',
> > >> > >> > >             u'app_input_conf_dir':
> > >> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> > >> > >> > application_1479830316320_6
> > >> > >> >
> > >>>4974/container_e468_1479830316320_64974_01_000091/propagatedconf',
> > >> > >> > >             u'app_install_dir':
> > >> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> > >> > >> > application_1479830316320_6
> > >> > >> > >4974/container_e468_1479830316320_64974_01_000091/
> app/install',
> > >> > >> > >             u'app_log_dir':
> > >> > >> > >u'/data/log/hadoop-yarn/container/application_
> > >> > >> > 1479830316320_64974/containe
> > >> > >> > >r_e468_1479830316320_64974_01_000091',
> > >> > >> > >             u'app_name': u'enable_presto_worker.py',
> > >> > >> > >             u'app_pid_dir':
> > >> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> > >> > >> > application_1479830316320_6
> > >> > >> > >4974/container_e468_1479830316320_64974_01_000091/app/run',
> > >> > >> > >             u'app_root':
> > >> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> > >> > >> > application_1479830316320_6
> > >> > >> > >4974/container_e468_1479830316320_64974_01_000091/
> app/install',
> > >> > >> > >             u'data_dir': u'/data/appdata/enable_presto_
> > >> > worker/data',
> > >> > >> > >             u'pid_file':
> > >> > >> > >u'/export/hdj3/yarn/nm/usercache/hive/appcache/
> > >> > >> > application_1479830316320_6
> > >> > >> > >4974/container_e468_1479830316320_64974_01_000091/
> > >> > >> app/run/component.pid',
> > >> > >> > >             u'security_enabled': u'false',
> > >> > >> > >             u'state_monitor_port': u'9990'}}
> > >> > >> > >INFO 2016-11-30 18:07:03,898 PythonExecutor.py:152 - command
> > >>str:
> > >> > >> > > /usr/bin/python -S
> > >> > >> > >/export/hdk3/yarn/nm/usercache/hive/appcache/
> > >> > >> > application_1479830316320_649
> > >> > >> > >74/filecache/11/enable_presto_worker.zip/package/
> > >> > >> > scripts/enable_presto_wor
> > >> > >> > >ker_component.py
> > >> > >> > >STOP
> > >> > >> > >/export/hda3/data/log/hadoop-yarn/container/application_
> > >> > >> > 1479830316320_6497
> > >> > >> > >4/container_e468_1479830316320_64974_01_000091/command-5.json
> > >> > >> > >/export/hdk3/yarn/nm/usercache/hive/appcache/
> > >> > >> > application_1479830316320_649
> > >> > >> > >74/filecache/11/enable_presto_worker.zip/package
> > >> > >> > >/export/hda3/data/log/hadoop-yarn/container/application_
> > >> > >> > 1479830316320_6497
> > >> > >> > >4/container_e468_1479830316320_64974_01_000091/
> > >> structured-out-5.json
> > >> > >> > >INFO
> > >> > >> > >/export/hdj3/yarn/nm/usercache/hive/appcache/
> > >> > >> > application_1479830316320_649
> > >> > >> > >74/container_e468_1479830316320_64974_01_000091
> > >> > >> > >INFO 2016-11-30 18:07:03,919 PythonExecutor.py:97 - stop
> command
> > >> > >>output:
> > >> > >> > > err: Traceback (most recent call last):
> > >> > >> > >  File
> > >> > >> > >"/export/hdk3/yarn/nm/usercache/hive/appcache/
> > >> > >> > application_1479830316320_64
> > >> > >> > >974/filecache/11/enable_presto_worker.zip/package/
> > >> > >> > scripts/enable_presto_wo
> > >> > >> > >rker_component.py",
> > >> > >> > >line 23, in <module>
> > >> > >> > >    from resource_management import *
> > >> > >> >
> > >> > >> >
> > >> > >>
> > >> >
> > >> >
> > >>
> >
> >
>
