If you look for the container ID in the nodemanager log on the host where
the container was running, you should be able to see when the container
stopped and was cleaned up. Looks like it even logs when it deletes the
container directories.
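
In case it helps, here is a rough sketch of that search in Python. It only does a plain substring match on the container ID and makes no assumption about the wording of the nodemanager messages; the function names and the log path in the usage note are hypothetical, so substitute whatever your installation uses.

```python
import io


def find_container_lines(log_lines, container_id):
    """Return (line_number, text) pairs for each line that mentions container_id.

    log_lines can be any iterable of strings, e.g. an open file object.
    """
    matches = []
    for number, line in enumerate(log_lines, start=1):
        if container_id in line:
            matches.append((number, line.rstrip("\n")))
    return matches


def print_container_lines(log_path, container_id):
    """Print every matching line of the log at log_path (path chosen by the caller)."""
    with io.open(log_path, encoding="utf-8", errors="replace") as log_file:
        for number, text in find_container_lines(log_file, container_id):
            print("%d: %s" % (number, text))
```

For example, print_container_lines("/path/to/nodemanager.log", "container_1467829690678_0022_01_000003") would list every line for that container in order, which should make it easy to compare the cleanup timestamp with the 10:57:36 STOP command in the agent log.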

On Thu, Jul 7, 2016 at 2:04 PM, Sarthak Kukreti <skuk...@ncsu.edu> wrote:

> kafka.py is still present in the filecache directory; it's just the
> "container_1467829690678_0022_01_000003" directory that seems to be
> deleted before the runCommand() call.
>
> - Sarthak
>
> On Thu, Jul 7, 2016 at 12:35 PM, Billie Rinaldi
> <billie.rina...@gmail.com> wrote:
> > I think that
> > /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition
> > is linked to
> > /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/113/slider-kafka-package-1.0.0.zip,
> > so those are actually the same directory. I am not sure why it is saying
> > kafka.py does not exist when stopping the container; it definitely should
> > not clean up that directory while a container is still running. Can you
> > verify that app/definition/package/scripts/kafka.py exists for one of the
> > containers that is running?
> >
> > On Thu, Jul 7, 2016 at 11:50 AM, Sarthak Kukreti <skuk...@ncsu.edu> wrote:
> >
> >> Hello!
> >>
> >> I am trying to use Slider to distribute an application over a YARN
> >> cluster. While attempting to use "slider flex" to decrease the number
> >> of containers allocated for the application (using the kafka
> >> app-package as reference), I came across the following error:
> >>
> >> ERROR 2016-07-07 10:57:36,461 CustomServiceOrchestrator.py:169 -
> >> Caught an exception while executing command: <class
> >> 'AgentException.AgentException'>: 'Script
> >> /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition/package/scripts/kafka.py
> >> does not exist'
> >> Traceback (most recent call last):
> >>   File
> >> "/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/71/slider-agent.tar.gz/slider-agent/agent/CustomServiceOrchestrator.py",
> >> line 115, in runCommand
> >>     script_path = self.resolve_script_path(self.base_dir, script,
> >> script_type)
> >>   File
> >> "/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/71/slider-agent.tar.gz/slider-agent/agent/CustomServiceOrchestrator.py",
> >> line 199, in resolve_script_path
> >>     raise AgentException(message)
> >> AgentException: 'Script
> >> /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition/package/scripts/kafka.py
> >> does not exist'
> >>
> >>
> >> (It seems the directory is cleaned up before the command runs.)
> >>
> >> Additionally, I tried adding debug prints in the
> >> CustomServiceOrchestrator to see what base directory is used for
> >> invoking the script and found that the base directories for STATUS and
> >> STOP command differ:
> >>
> >> STATUS command:
> >>
> >> INFO 2016-07-07 10:56:31,323 AgentToggleLogger.py:40 - Adding
> >> STATUS_COMMAND for service kc of cluster kc to the queue.
> >> INFO 2016-07-07 10:56:31,327 CustomServiceOrchestrator.py:114 - Base
> >> dir:
> >> /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/113/slider-kafka-package-1.0.0.zip/package
> >>
> >>
> >> STOP command:
> >>
> >> INFO 2016-07-07 10:57:36,455 AgentToggleLogger.py:40 - Adding
> >> EXECUTION_COMMAND for service kc of cluster kc to the queue.
> >> INFO 2016-07-07 10:57:36,456 Controller.py:251 - Attempting to
> >> gracefully stop the application ...
> >> INFO 2016-07-07 10:57:36,458 ActionQueue.py:134 - Package received:
> >> INFO 2016-07-07 10:57:36,458 ActionQueue.py:140 - Executing command
> >> with id = 4-1 for role = Hello of cluster kc
> >> INFO 2016-07-07 10:57:36,460 ActionQueue.py:170 - Running command:
> >> {u'roleCommand': u'STOP', u'clusterName': u'kc', u'componentName':
> >> u'Hello', u'hostname': u'192.168.1.195', u'hostLevelParams':
> >> {u'java_home':
> >> u'/Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home/',
> >> u'container_id': u'container_1467829690678_0022_01_000003'},
> >> u'commandType': u'EXECUTION_COMMAND', u'roleParams': {u'auto_restart':
> >> u'false'}, u'serviceName': u'kc', u'role': u'Hello', u'commandParams':
> >> {u'record_config': u'true', u'service_package_folder':
> >> u'${AGENT_WORK_ROOT}/work/app/definition/package', u'script':
> >> u'scripts/kafka.py', u'schema_version': u'2.0', u'command_timeout':
> >> u'600', u'script_type': u'PYTHON'}, u'taskId': 4, u'commandId':
> >> u'4-1', u'containers': [], u'configurations': {u'global':
> >> {u'security_enabled': u'false', u'app_container_id':
> >> u'container_1467829690678_0022_01_000003', u'listen_port': u'52508',
> >> u'app_root': u'${AGENT_WORK_ROOT}/app/install', u'app_log_dir':
> >> u'${AGENT_LOG_ROOT}', u'kc_version': u'1.0.0', u'app_pid_dir':
> >> u'${AGENT_WORK_ROOT}/app/run', u'app_container_tag': u'2',
> >> u'pid_file': u'${AGENT_WORK_ROOT}/app/run/kc.pid', u'app_install_dir':
> >> u'${AGENT_WORK_ROOT}/app/install', u'app_user': u'sarthakk',
> >> u'app_input_conf_dir': u'${AGENT_WORK_ROOT}/propagatedconf'},
> >> u'server': {}}}
> >> INFO 2016-07-07 10:57:36,461 CustomServiceOrchestrator.py:114 - Base
> >> dir:
> >> /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition/package
> >>
> >> For some reason, the STOP command attempts to pick up the script from
> >> the container-specific location, whereas the STATUS command goes through
> >> an entirely different path (though I am not sure whether this is the
> >> cause of the issue). Any further pointers for debugging this would be
> >> really helpful.
> >>
> >> (For reference, platform: OS X, Python 2.7.11)
> >>
> >> Thank you
> >> Sarthak
> >>
>
