Hi All:

Thanks to Billie Rinaldi's help and some code diving, I've been able to get
Slider to advance the jmemcached task to the running state.  However, I don't
think jmemcached is actually running: port 11211 does not show up in netstat
or lsof.

I suspect something goes wrong when Slider launches the container, and I could
use some guidance on how to track this down.


If I understand correctly, Slider launches two containers: the application
master (AM) and the one that is supposed to run jmemcached.  I think the
jmemcached container keeps failing and being relaunched under a new container
ID, because the container ID suffix keeps climbing (I think it was
container_1493146167513_0001_01_000008 when I first checked; now the suffix is
0155).  To understand what is happening, I looked at

http://quickstart.cloudera:8042/node/containerlogs/container_1493146167513_0001_01_000008/root/slider-agent.log/?start=0

and saw the following:
INFO 2017-04-25 19:31:51,440 main.py:85 - loglevel=logging.INFO
INFO 2017-04-25 19:31:51,440 main.py:96 - Newloglevel=logging.DEBUG
INFO 2017-04-25 19:31:51,440 main.py:242 - Using AGENT_WORK_ROOT = /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/root/appcache/application_1493146167513_0001/container_1493146167513_0001_01_000008
INFO 2017-04-25 19:31:51,441 main.py:243 - Using AGENT_LOG_ROOT = /var/log/hadoop-yarn/containers/application_1493146167513_0001/container_1493146167513_0001_01_000008

[Logs omitted for brevity.  Note that the container name below may have been
corrupted; it now reads container_1493146167513_0001_01_000008___MEMCACHED.]

INFO 2017-04-25 19:31:51,478 NetUtil.py:67 - DEBUG: Trying to connect to the server at https://quickstart.cloudera:34593/ws/v1/slider/agents/
INFO 2017-04-25 19:31:51,478 NetUtil.py:38 - Connecting to the following url https://quickstart.cloudera:34593/ws/v1/slider/agents/
INFO 2017-04-25 19:31:51,536 NetUtil.py:45 - Calling url received 200
DEBUG 2017-04-25 19:31:51,537 Controller.py:63 - Initializing Controller RPC thread.
INFO 2017-04-25 19:31:51,539 Controller.py:137 - Registering with the server at https://quickstart.cloudera:40959/ws/v1/slider/agents/container_1493146167513_0001_01_000008___MEMCACHED/register with data '{"actualState": 0, "logFolders": {}, "agentVersion": "1", "allocatedPorts": {}, "timestamp": 1493148711538, "expectedState": 0, "tags": "", "responseId": -1, "publicHostname": "quickstart.cloudera", "label": "container_1493146167513_0001_01_000008___MEMCACHED"}'
INFO 2017-04-25 19:31:51,539 security.py:89 - SSL Connect being called.. connecting to the server
INFO 2017-04-25 19:31:51,604 security.py:51 - SSL connection established. Two-way SSL authentication is turned off on the server.
INFO 2017-04-25 19:31:51,607 Controller.py:180 - Unable to connect to: https://quickstart.cloudera:40959/ws/v1/slider/agents/container_1493146167513_0001_01_000008___MEMCACHED/register
Traceback (most recent call last):
  File "/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/root/appcache/application_1493146167513_0001/filecache/21/slider-agent.tar.gz/slider-agent/agent/Controller.py", line 139, in registerWithServer
    regResp = json.loads(response)
  File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
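The traceback shows that the agent's json.loads() call in registerWithServer received something that is not JSON, so the first thing worth seeing is the raw body the AM returns. Note also that the agent's first request went to port 34593 and the failing register call went to port 40959. The sketch below assumes two things: the register endpoint accepts the same JSON payload the agent logged as a POST, and it needs no client certificate (the log says two-way SSL is off). classify_body and fetch_register_body are hypothetical helpers, not part of Slider.

```shell
#!/usr/bin/env bash
# Sketch only: both helper names are hypothetical, not part of Slider.

# Print a rough guess at what kind of body came back, based on its first
# non-whitespace character. Python 2.7's json.loads() raises
# "No JSON object could be decoded" for empty or non-JSON text.
classify_body() {
  local body="$1"
  # Strip leading whitespace.
  local trimmed="${body#"${body%%[![:space:]]*}"}"
  case "$trimmed" in
    "")        echo "empty" ;;
    "{"*|"["*) echo "json-like" ;;
    "<"*)      echo "html/xml" ;;
    *)         echo "other" ;;
  esac
}

# POST a registration payload and print the raw response body.
# Assumption: the endpoint takes a JSON POST without a client certificate.
# -k skips verification of the AM's server certificate.
fetch_register_body() {
  local url="$1" payload="$2"
  curl -s -k -X POST -H 'Content-Type: application/json' -d "$payload" "$url"
}
```

For example, call fetch_register_body with the register URL and the data string from the log above, print the first few hundred bytes, and pass the result to classify_body. An empty body or an HTML error page would each explain the ValueError.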

Any ideas on how I can debug and analyze this further?
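Because the container ID suffix keeps climbing, counting the container log directories that the NodeManager keeps for the application gives a quick measure of how often the MEMCACHED container is being relaunched. This is a minimal sketch. list_container_logs is a hypothetical helper, and its default log root is an assumption: it is the parent of the AGENT_LOG_ROOT shown in the agent log, and on other installs it comes from yarn.nodemanager.log-dirs.

```shell
#!/usr/bin/env bash
# Sketch: list the per-container log directories kept on this node for one
# application. list_container_logs is a hypothetical helper; the default root
# is taken from the AGENT_LOG_ROOT in the agent log and is an assumption.
list_container_logs() {
  local app_id="$1"
  local log_root="${2:-/var/log/hadoop-yarn/containers}"
  local app_dir="$log_root/$app_id"
  if [ ! -d "$app_dir" ]; then
    echo "no local logs under $app_dir" >&2
    return 1
  fi
  local d
  # Glob results are sorted, so zero-padded container numbers come out in order.
  for d in "$app_dir"/container_*; do
    [ -d "$d" ] && basename "$d"
  done
  return 0
}
```

For example, `list_container_logs application_1493146167513_0001 | wc -l` gives the number of containers started so far on this node. Each listed directory should hold that attempt's slider-agent.log.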

With best regards:

Bill
________________________________
From: Foolish Ewe <foolish...@hotmail.com>
Sent: Wednesday, April 19, 2017 1:35 AM
To: dev@slider.incubator.apache.org
Subject: Trying to get the memcached example to run for a new build

Hello All:


I'm trying to run the memcached example locally, but it fails quickly and
reports that "protobuf-java-2.5.0.jar does not exist".


Consider the following script (test_slider.sh):

#!/bin/bash

# this does not work when run during docker-build; we need to do this once we are logged in

echo "memcached built, registering memcached with slider and then running it"

/usr/lib/slider/bin/slider install-package \
  --package /tmp/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.0.zip \
  --name jmemcached --debug

echo "Listing applications - before starting jmemcached"

/usr/lib/slider/bin/slider list --manager localhost:8032

# startup memcached properly, yarn task name will be mymemcached

/usr/lib/slider/bin/slider create mymemcached \
  --template /tmp/mybuild/incubator-slider/app-packages/memcached/appConfig.json \
  --resources /tmp/mybuild/incubator-slider/app-packages/memcached/resources-default.json \
  --manager localhost:8032 --debug

echo "Listing applications - after starting jmemcached"

/usr/lib/slider/bin/slider list --manager localhost:8032

echo "Finished $0"

I am having difficulty locating the logs after the failure; any idea where
they would be?  "yarn logs" cannot find them, and I'm not having any luck with
the NameNode either.  I modified the yarn-site.xml read at startup to include
the following lines:

  <!-- Begin modifications for debugging slider -->

  <!-- keep container directories for 60 minutes after a failure so we can see what is left in them -->
  <property>
    <name>yarn.nodemanager.delete.debug-delay-sec</name>
    <value>3600</value>
  </property>

  <!-- time before the process gets a kill -9 (3600000 ms = 60 minutes; should it be 30 seconds?) -->
  <property>
    <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
    <value>3600000</value>
  </property>

  <!-- End modifications for debugging slider -->
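One likely reason "yarn logs" comes back empty is that it reads aggregated logs. It typically finds nothing unless yarn.log-aggregation-enable is true, and that setting defaults to false. Without aggregation, the logs stay on the NodeManager's local disk under yarn.nodemanager.log-dirs. The small sketch below checks what a yarn-site.xml actually sets. get_yarn_prop is hypothetical, and it assumes each <name> and <value> sits on its own line, as in the fragment above. The /etc/hadoop/conf path in the usage note is also an assumption about where this install keeps its configuration.

```shell
#!/usr/bin/env bash
# Sketch: print the <value> of one property from a Hadoop *-site.xml file.
# get_yarn_prop is a hypothetical helper. It assumes <name> and <value> are on
# separate lines, as in the fragment above; it is not a real XML parser.
get_yarn_prop() {
  local file="$1" name="$2"
  awk -v want="$name" '
    /<name>/  { line = $0; gsub(/.*<name>|<\/name>.*/, "", line); cur = line }
    /<value>/ { line = $0; gsub(/.*<value>|<\/value>.*/, "", line)
                if (cur == want) { print line; found = 1; exit } }
    END { exit !found }
  ' "$file"
}
```

For example: `get_yarn_prop /etc/hadoop/conf/yarn-site.xml yarn.log-aggregation-enable || echo "not set (default: false)"`.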


Regarding the error I'm seeing: slider install-package appears to work
correctly as far as I can tell, but slider create hits run-time errors when
launched; it seems to be looking for protobuf-java-2.5.0.jar.  I tried making
fat jars for the memcached application and the various Slider packages, to no
avail.  Any ideas how to resolve these errors reported by the ResourceManager?
Application application_1492557996590_0002 failed 2 times due to AM Container for appattempt_1492557996590_0002_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://quickstart.cloudera:8088/proxy/application_1492557996590_0002/ Then, click on links to logs of each attempt.
Diagnostics: File file:/root/.slider/cluster/mymemcached/tmp/application_1492557996590_0002/am/lib/protobuf-java-2.5.0.jar does not exist
java.io.FileNotFoundException: File file:/root/.slider/cluster/mymemcached/tmp/application_1492557996590_0002/am/lib/protobuf-java-2.5.0.jar does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:542)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:755)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:532)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:425)
	at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
	at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
	at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
	at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
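The missing file sits under the AM's lib/ staging directory. The client log further down shows the AM dependencies being loaded from /usr/lib/slider/lib, so a cheap first check is whether a protobuf-java jar is there at all. The staged path also uses the file: scheme, which suggests the client staged to the local filesystem rather than HDFS; `hdfs getconf -confKey fs.defaultFS` shows which filesystem the client is configured with. The helper below, check_am_jar, is hypothetical and not part of Slider.

```shell
#!/usr/bin/env bash
# Sketch: report which jars matching a glob pattern exist in a directory.
# check_am_jar is a hypothetical helper; pass the pattern quoted so the
# calling shell does not expand it first.
check_am_jar() {
  local lib_dir="$1" pattern="$2"
  local f found=0
  # $pattern is deliberately unquoted so it is expanded as a glob here.
  for f in "$lib_dir"/$pattern; do
    if [ -f "$f" ]; then
      echo "found: $f"
      found=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no $pattern in $lib_dir" >&2
    return 1
  fi
}
```

For example: `check_am_jar /usr/lib/slider/lib 'protobuf-java-*.jar'`.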


The output of test_slider.sh is as follows:

# bash -x /tmp/mybuild/test_slider.sh

+ echo 'memcached built, registering memcached with slider and then running it'

memcached built, registering memcached with slider and then running it

+ /usr/lib/slider/bin/slider install-package --package /tmp/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.0.zip --name jmemcached --debug

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/lib/slider/lib/slider-core-0.60.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/lib/slider/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

2017-04-19 01:10:44,080 [main] INFO  client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032

2017-04-19 01:10:44,243 [main] ERROR main.ServiceLauncher - Unable to access supplied pkg file at /tmp/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.0.zip

2017-04-19 01:10:44,247 [main] INFO  util.ExitUtil - Exiting with status 40

+ echo 'Listing applications - before starting jmemcached'

Listing applications - before starting jmemcached

+ /usr/lib/slider/bin/slider list --manager localhost:8032

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/lib/slider/lib/slider-core-0.60.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/lib/slider/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

2017-04-19 01:10:46,066 [main] INFO  client.RMProxy - Connecting to ResourceManager at localhost/127.0.0.1:8032

2017-04-19 01:10:46,217 [main] INFO  util.ExitUtil - Exiting with status 0

+ /usr/lib/slider/bin/slider create mymemcached --template /tmp/mybuild/incubator-slider/app-packages/memcached/appConfig.json --resources /tmp/mybuild/incubator-slider/app-packages/memcached/resources-default.json --manager localhost:8032 --debug

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/lib/slider/lib/slider-core-0.60.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/lib/slider/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

2017-04-19 01:10:48,070 [main] INFO  client.RMProxy - Connecting to ResourceManager at localhost/127.0.0.1:8032

2017-04-19 01:10:48,647 [main] INFO  agent.AgentClientProvider - Validating app definition /tmp/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.1.zip

2017-04-19 01:10:48,648 [main] INFO  agent.AgentUtils - Reading metainfo at /tmp/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.1.zip

2017-04-19 01:10:48,675 [main] INFO  tools.SliderUtils - Reading metainfo.xml of size 2202

2017-04-19 01:10:48,893 [main] INFO  client.SliderClient - No credentials requested

2017-04-19 01:10:48,938 [main] INFO  agent.AgentUtils - Reading metainfo at /tmp/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.1.zip

2017-04-19 01:10:48,939 [main] INFO  tools.SliderUtils - Reading metainfo.xml of size 2202

2017-04-19 01:10:48,976 [main] INFO  launch.AbstractLauncher - Log include patterns:

2017-04-19 01:10:48,976 [main] INFO  launch.AbstractLauncher - Log exclude patterns:

2017-04-19 01:10:49,861 [main] INFO  slideram.SliderAMClientProvider - Loading all dependencies for AM.

2017-04-19 01:10:49,862 [main] INFO  tools.SliderUtils - Loading all dependencies from /usr/lib/slider/lib

2017-04-19 01:10:51,324 [main] INFO  agent.AgentClientProvider - Automatically uploading the agent tarball at file:/root/.slider/cluster/mymemcached/tmp/application_1492557996590_0003/agent

2017-04-19 01:10:51,361 [main] INFO  agent.AgentClientProvider - Validating app definition /tmp/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.1.zip

2017-04-19 01:10:51,361 [main] INFO  agent.AgentUtils - Reading metainfo at /tmp/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.1.zip

2017-04-19 01:10:51,364 [main] INFO  tools.SliderUtils - Reading metainfo.xml of size 2202

2017-04-19 01:10:51,429 [main] INFO  Configuration.deprecation - slider.registry.path is deprecated. Instead, use hadoop.registry.zk.root

2017-04-19 01:10:51,436 [main] INFO  launch.AppMasterLauncher - Submitting application to Resource Manager

2017-04-19 01:10:51,480 [main] INFO  impl.YarnClientImpl - Submitted application application_1492557996590_0003

2017-04-19 01:10:51,484 [main] INFO  util.ExitUtil - Exiting with status 0

+ echo 'Listing applications - after starting jmemcached'

Listing applications - after starting jmemcached

+ /usr/lib/slider/bin/slider list --manager localhost:8032

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/lib/slider/lib/slider-core-0.60.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/lib/slider/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

2017-04-19 01:10:53,893 [main] INFO  client.RMProxy - Connecting to ResourceManager at localhost/127.0.0.1:8032

mymemcached                       ACCEPTED  application_1492557996590_0003

2017-04-19 01:10:54,186 [main] INFO  util.ExitUtil - Exiting with status 0

+ echo 'Finished /tmp/mybuild/test_slider.sh'

Finished /tmp/mybuild/test_slider.sh
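Two things stand out in this output. First, install-package did not actually succeed: it logged "Unable to access supplied pkg file" for jmemcached-1.0.0.zip and exited with status 40. Second, slider create then validated jmemcached-1.0.1.zip, presumably because appConfig.json points at that file, so the template refers to a different package version than the one the script tries to install. A pre-flight check along these lines, run before install-package, would catch both problems. preflight is a hypothetical helper. Rather than assume a particular appConfig.json key, it takes the first quoted path ending in .zip.

```shell
#!/usr/bin/env bash
# Sketch of a pre-flight check for test_slider.sh. preflight is a
# hypothetical helper: it verifies the package zip is readable and that the
# first *.zip path found in appConfig.json has the same file name.
preflight() {
  local pkg="$1" app_config="$2"
  if [ ! -r "$pkg" ]; then
    echo "package not readable: $pkg" >&2
    return 1
  fi
  local ref
  # First run of non-quote characters ending in .zip (no key name assumed).
  ref=$(grep -oE '[^"]*\.zip' "$app_config" | head -n 1)
  if [ "$(basename "$ref")" != "$(basename "$pkg")" ]; then
    echo "appConfig.json refers to ${ref:-no zip}, not $(basename "$pkg")" >&2
    return 1
  fi
  echo "ok: $pkg"
}
```

In test_slider.sh this would be called as `preflight <package zip> <appConfig.json> || exit 1`, using the same two paths passed to install-package and create.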


Thanks:


Bill
