Hi All:
Thanks to Billie Rinaldi's help and some code diving, I've been able to get slider to advance the jmemcached task to the running state. However, I don't think jmemcached is actually running (e.g. port 11211 is not found by netstat or lsof). I suspect something has gone awry while slider launches the container, and I could use a bit of guidance on how to track this down.

If I understand correctly, slider is launching 2 containers: an application master and the one that is supposed to run jmemcached. I think the jmemcached container is failing and relaunching frequently under a different container ID, and that the container ID suffix is climbing (e.g. I think it was container_1493146167513_0001_01_000008 when I first checked; now the container name suffix ends in 0155).

In any case, I'm trying to understand what is happening. Looking at http://quickstart.cloudera:8042/node/containerlogs/container_1493146167513_0001_01_000008/root/slider-agent.log/?start=0, I see the following:

INFO 2017-04-25 19:31:51,440 main.py:85 - loglevel=logging.INFO
INFO 2017-04-25 19:31:51,440 main.py:96 - Newloglevel=logging.DEBUG
INFO 2017-04-25 19:31:51,440 main.py:242 - Using AGENT_WORK_ROOT = /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/root/appcache/application_1493146167513_0001/container_1493146167513_0001_01_000008
INFO 2017-04-25 19:31:51,441 main.py:243 - Using AGENT_LOG_ROOT = /var/log/hadoop-yarn/containers/application_1493146167513_0001/container_1493146167513_0001_01_000008

[Omitting logs for brevity. Note that the container name below might be getting corrupted to now read container_1493146167513_0001_01_000008___MEMCACHED]

INFO 2017-04-25 19:31:51,478 NetUtil.py:67 - DEBUG: Trying to connect to the server at https://quickstart.cloudera:34593/ws/v1/slider/agents/
INFO 2017-04-25 19:31:51,478 NetUtil.py:38 - Connecting to the following url https://quickstart.cloudera:34593/ws/v1/slider/agents/
INFO 2017-04-25 19:31:51,536 NetUtil.py:45 - Calling url received 200
DEBUG 2017-04-25 19:31:51,537 Controller.py:63 - Initializing Controller RPC thread.
INFO 2017-04-25 19:31:51,539 Controller.py:137 - Registering with the server at https://quickstart.cloudera:40959/ws/v1/slider/agents/container_1493146167513_0001_01_000008___MEMCACHED/register with data '{"actualState": 0, "logFolders": {}, "agentVersion": "1", "allocatedPorts": {}, "timestamp": 1493148711538, "expectedState": 0, "tags": "", "responseId": -1, "publicHostname": "quickstart.cloudera", "label": "container_1493146167513_0001_01_000008___MEMCACHED"}'
INFO 2017-04-25 19:31:51,539 security.py:89 - SSL Connect being called.. connecting to the server
INFO 2017-04-25 19:31:51,604 security.py:51 - SSL connection established. Two-way SSL authentication is turned off on the server.
INFO 2017-04-25 19:31:51,607 Controller.py:180 - Unable to connect to: https://quickstart.cloudera:40959/ws/v1/slider/agents/container_1493146167513_0001_01_000008___MEMCACHED/register
Traceback (most recent call last):
  File "/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/root/appcache/application_1493146167513_0001/filecache/21/slider-agent.tar.gz/slider-agent/agent/Controller.py", line 139, in registerWithServer
    regResp = json.loads(response)
  File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

Any idea how I can better debug and analyze this?
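The traceback shows json.loads(response) raising ValueError, which only says that the response body was not JSON, not what it actually contained. A minimal sketch of one way to narrow this down is to wrap the decode step and report the raw body on failure. The function name describe_register_response and the sample inputs are hypothetical and are not part of the slider agent; the empty body and the HTML page are only guesses at what a non-JSON response might look like.

```python
import json


def describe_register_response(raw, preview_chars=200):
    """Decode a registration response, or report what was actually received.

    Hypothetical helper for illustration; not part of the slider agent.
    """
    try:
        return {"ok": True, "body": json.loads(raw)}
    except (TypeError, ValueError) as exc:
        # ValueError covers "No JSON object could be decoded" on Python 2.7
        # and json.JSONDecodeError (a ValueError subclass) on Python 3.
        text = raw if isinstance(raw, str) else repr(raw)
        return {
            "ok": False,
            "error": str(exc),
            "length": len(text),
            "preview": text[:preview_chars],
        }


if __name__ == "__main__":
    # Sample inputs are assumptions, not bodies observed in the log above.
    for sample in ("", "<html>error page</html>", '{"responseId": 0}'):
        print(describe_register_response(sample))
```

Logging the length and a short preview of the body next to the exception would show whether the server returned an empty body, an error page, or something else entirely.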
With best regards:
Bill

________________________________
From: Foolish Ewe <foolish...@hotmail.com>
Sent: Wednesday, April 19, 2017 1:35 AM
To: dev@slider.incubator.apache.org
Subject: Trying to get he memcached example to run for a new build

Hello All:

I'm trying to run the memcached example locally, but it fails pretty quickly and claims "protobuf-java-2.5.0.jar does not exist". Consider the following script (test-slider.sh):

#!/bin/bash
# this does not work when run during docker-build, we need to do this once we are logged in
echo "memcached built, registering memcached with slider and then running it"
/usr/lib/slider/bin/slider install-package --package /tmp/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.0.zip --name jmemcached --debug
echo "Listing applications - before starting jmemcached"
/usr/lib/slider/bin/slider list --manager localhost:8032
# startup memcached properly, yarn task name will be mymemcached
/usr/lib/slider/bin/slider create mymemcached --template /tmp/mybuild/incubator-slider/app-packages/memcached/appConfig.json --resources /tmp/mybuild/incubator-slider/app-packages/memcached/resources-default.json --manager localhost:8032 --debug
echo "Listing applications - after starting jmemcached"
/usr/lib/slider/bin/slider list --manager localhost:8032
echo "Finished $0"

I am having difficulty locating the logs after the failure. Any idea where they would be? yarn logs cannot find them, and I'm not having luck with the name node.
I did modify the yarn-site.xml read at startup to include the lines:

<!-- Begin modifications for debugging slider -->
<!-- 60 minutes after a failure to see what is left in the directory-->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>3600</value>
</property>
<!--time before the process gets a -9 (Should it be 30 seconds?)-->
<property>
  <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
  <value>3600000</value>
</property>
<!-- End modifications for debugging slider -->

Regarding the error I'm seeing: slider install-package appears to work correctly from what I can tell, but slider create encounters run-time errors when launched; it seems to be looking for protobuf-java-2.5.0.jar. I tried making fat jars for the memcached application and the various slider packages, to no avail. Any ideas how to resolve these errors reported in the resource manager:

Application application_1492557996590_0002 failed 2 times due to AM Container for appattempt_1492557996590_0002_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://quickstart.cloudera:8088/proxy/application_1492557996590_0002/Then, click on links to logs of each attempt.
Diagnostics: File file:/root/.slider/cluster/mymemcached/tmp/application_1492557996590_0002/am/lib/protobuf-java-2.5.0.jar does not exist
java.io.FileNotFoundException: File file:/root/.slider/cluster/mymemcached/tmp/application_1492557996590_0002/am/lib/protobuf-java-2.5.0.jar does not exist
  at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:542)
  at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:755)
  at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:532)
  at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:425)
  at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
  at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
  at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
  at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
  at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
  at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
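The missing jar is named with a file: URI and the stack trace goes through RawLocalFileSystem, which suggests the path is being resolved on the node's local filesystem rather than on HDFS. As a rough sketch only, the path can be pulled out of the diagnostics text and checked on the host that ran the failed attempt. The helper name missing_local_path and the regular expression are assumptions made for this illustration; they are not part of slider or Hadoop.

```python
import os
import re

# Matches the "File <uri> does not exist" text seen in the diagnostics above.
MISSING_FILE_RE = re.compile(r"File (\S+) does not exist")


def missing_local_path(diagnostics):
    """Return the local path named in the diagnostics.

    Returns None when no match is found or the URI scheme is not file:.
    Hypothetical helper for illustration only.
    """
    match = MISSING_FILE_RE.search(diagnostics)
    if match is None:
        return None
    uri = match.group(1)
    if not uri.startswith("file:"):
        return None
    return uri[len("file:"):]


if __name__ == "__main__":
    diag = (
        "File file:/root/.slider/cluster/mymemcached/tmp/"
        "application_1492557996590_0002/am/lib/protobuf-java-2.5.0.jar"
        " does not exist"
    )
    path = missing_local_path(diag)
    state = "exists" if os.path.exists(path) else "is missing"
    print("%s %s on this host" % (path, state))
```

Running the check as the user that owns the NodeManager process, rather than as root, may give a different answer if the directory under /root is not readable by that user; that is a guess, not something the logs above confirm.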
The test-slider.sh script's output is as follows:

# bash -x /tmp/mybuild/test_slider.sh
+ echo 'memcached built, registering memcached with slider and then running it'
memcached built, registering memcached with slider and then running it
+ /usr/lib/slider/bin/slider install-package --package /tmp/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.0.zip --name jmemcached --debug
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/slider/lib/slider-core-0.60.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/slider/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2017-04-19 01:10:44,080 [main] INFO client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2017-04-19 01:10:44,243 [main] ERROR main.ServiceLauncher - Unable to access supplied pkg file at /tmp/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.0.zip
2017-04-19 01:10:44,247 [main] INFO util.ExitUtil - Exiting with status 40
+ echo 'Listing applications - before starting jmemcached'
Listing applications - before starting jmemcached
+ /usr/lib/slider/bin/slider list --manager localhost:8032
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/slider/lib/slider-core-0.60.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/slider/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2017-04-19 01:10:46,066 [main] INFO client.RMProxy - Connecting to ResourceManager at localhost/127.0.0.1:8032
2017-04-19 01:10:46,217 [main] INFO util.ExitUtil - Exiting with status 0
+ /usr/lib/slider/bin/slider create mymemcached --template /tmp/mybuild/incubator-slider/app-packages/memcached/appConfig.json --resources /tmp/mybuild/incubator-slider/app-packages/memcached/resources-default.json --manager localhost:8032 --debug
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/slider/lib/slider-core-0.60.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/slider/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2017-04-19 01:10:48,070 [main] INFO client.RMProxy - Connecting to ResourceManager at localhost/127.0.0.1:8032
2017-04-19 01:10:48,647 [main] INFO agent.AgentClientProvider - Validating app definition /tmp/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.1.zip
2017-04-19 01:10:48,648 [main] INFO agent.AgentUtils - Reading metainfo at /tmp/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.1.zip
2017-04-19 01:10:48,675 [main] INFO tools.SliderUtils - Reading metainfo.xml of size 2202
2017-04-19 01:10:48,893 [main] INFO client.SliderClient - No credentials requested
2017-04-19 01:10:48,938 [main] INFO agent.AgentUtils - Reading metainfo at /tmp/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.1.zip
2017-04-19 01:10:48,939 [main] INFO tools.SliderUtils - Reading metainfo.xml of size 2202
2017-04-19 01:10:48,976 [main] INFO launch.AbstractLauncher - Log include patterns:
2017-04-19 01:10:48,976 [main] INFO launch.AbstractLauncher - Log exclude patterns:
2017-04-19 01:10:49,861 [main] INFO slideram.SliderAMClientProvider - Loading all dependencies for AM.
2017-04-19 01:10:49,862 [main] INFO tools.SliderUtils - Loading all dependencies from /usr/lib/slider/lib
2017-04-19 01:10:51,324 [main] INFO agent.AgentClientProvider - Automatically uploading the agent tarball at file:/root/.slider/cluster/mymemcached/tmp/application_1492557996590_0003/agent
2017-04-19 01:10:51,361 [main] INFO agent.AgentClientProvider - Validating app definition /tmp/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.1.zip
2017-04-19 01:10:51,361 [main] INFO agent.AgentUtils - Reading metainfo at /tmp/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.1.zip
2017-04-19 01:10:51,364 [main] INFO tools.SliderUtils - Reading metainfo.xml of size 2202
2017-04-19 01:10:51,429 [main] INFO Configuration.deprecation - slider.registry.path is deprecated. Instead, use hadoop.registry.zk.root
2017-04-19 01:10:51,436 [main] INFO launch.AppMasterLauncher - Submitting application to Resource Manager
2017-04-19 01:10:51,480 [main] INFO impl.YarnClientImpl - Submitted application application_1492557996590_0003
2017-04-19 01:10:51,484 [main] INFO util.ExitUtil - Exiting with status 0
+ echo 'Listing applications - after starting jmemcached'
Listing applications - after starting jmemcached
+ /usr/lib/slider/bin/slider list --manager localhost:8032
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/slider/lib/slider-core-0.60.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/slider/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2017-04-19 01:10:53,893 [main] INFO client.RMProxy - Connecting to ResourceManager at localhost/127.0.0.1:8032
mymemcached ACCEPTED application_1492557996590_0003
2017-04-19 01:10:54,186 [main] INFO util.ExitUtil - Exiting with status 0
+ echo 'Finished /tmp/mybuild/test_slider.sh'
Finished /tmp/mybuild/test_slider.sh

Thanks:
Bill