Hello All:

I was able to get this working after some code diving, log reading, rtfm, etc.


The following issues came up. I'm not sure whether the lessons I learned are of 
general interest, but here they are:

  1.  Slider needs a ZooKeeper instance for the YARN registry service to work. 
I haven't determined which parameters are optional, but setting the 
hadoop.registry.zk.quorum property on the command line was definitely needed. 
The full command is:
     *   slider create jmemcached --template 
/home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached/appConfig.json
 --resources 
/home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached/resources-default.json
 --manager yarn-rm.cluster.mycompany.com:8032 --debug --zkhosts 
zknode.zkcluster.mycompany.com:2181 --zkpath /slider_test/clustername/ -D 
hadoop.registry.zk.quorum=zknode.zkcluster.mycompany.com:2181 -D 
yarn.nodemanager.delete.debug-delay-sec=3600 -D 
yarn.nodemanager.sleep-delay-before-sigkill.ms=3600000
  2.  Slider can be installed in a user directory and run as an ordinary user 
for testing; root permissions are not needed (great if you need to avoid 
pestering your ops team).
  3.  If the launched application (e.g. memcached in this case) fails quickly, 
it can be really hard to view the memcached container's logs.  I had to try 
many times before I was quick enough to view them, and initially I was confused 
and thought the registration had failed.  In case you are wondering why it 
failed: the java_home setting for the docker image was not consistent with our 
actual cluster.  The yarn node manager settings didn't seem to help that much 
with this.  Are there any better hints for reading the logs of failed 
containers?
  4.  It would be nice if the Container statistics page showed both the 
container and node that is running it (although the link gives a hint, it would 
be nice to see it).
  5.  I had to manually remove the files installed in HDFS during testing to 
allow a clean test run.
  6.  The log4j.properties and log4j-server.properties in the installation 
slider/conf directory are useful.  Since I'm running out of my home directory I 
was able to edit them, but if the rpm install was used, normal users might need 
to escalate privileges to edit them.  I'm testing a version of the client that 
lets us replace the client log4j.properties with a user-defined one (for 
testing).
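
For item 1, here is a minimal sketch of how the create command could be assembled so that the ZooKeeper quorum is defined once and reused for both --zkhosts and hadoop.registry.zk.quorum. The variable names (ZK_QUORUM, RM_ADDRESS, APP_DIR) and the function slider_create_cmd are illustrative placeholders, not part of Slider, and the host names are the sanitized ones from the command above. It only prints the command, so it can be inspected before being run:

```shell
#!/bin/sh
# Illustrative placeholders (assumptions): substitute your cluster's values.
ZK_QUORUM="zknode.zkcluster.mycompany.com:2181"
RM_ADDRESS="yarn-rm.cluster.mycompany.com:8032"
APP_DIR="/home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached"

# Print (not run) the create command for application $1. The quorum is
# passed twice: once as --zkhosts and once as the registry property.
slider_create_cmd() {
  printf '%s ' slider create "$1" \
    --template "$APP_DIR/appConfig.json" \
    --resources "$APP_DIR/resources-default.json" \
    --manager "$RM_ADDRESS" --debug \
    --zkhosts "$ZK_QUORUM" --zkpath /slider_test/clustername/ \
    -D "hadoop.registry.zk.quorum=$ZK_QUORUM" \
    -D yarn.nodemanager.delete.debug-delay-sec=3600 \
    -D yarn.nodemanager.sleep-delay-before-sigkill.ms=3600000
  echo
}

slider_create_cmd jmemcached
```

Keeping the quorum in one variable avoids the case where --zkhosts points at the real servers while the registry client falls back to another value.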

With best regards:

Bill


________________________________
From: Foolish Ewe <foolish...@hotmail.com>
Sent: Wednesday, May 3, 2017 1:47 AM
To: dev@slider.incubator.apache.org
Subject: How to port a working slider and memcached example from docker image 
to a cluster?

I have a version of the memcached example running on a docker image, and now 
I'd like to port that to a real cluster (to get a working starting point for 
the actual service I want to run in Slider).

I suspect the configuration issues could be in the ZooKeeper or YARN service 
registry configuration.

Running the following (sanitized) commands:


slider install-package --package 
/home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.1.zip
 --name jmemcached --debug --replacepkg

slider create jmemcached --template 
/home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached/appConfig.json
 --resources 
/home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached/resources-default.json
 --manager rm.yarn.cluster.mycompany.com:8032 --debug --zkhosts 
zookeeper.cluster.mycompany.com:2181 --zkpath /slider_test/clustername/


I'm seeing failed ZooKeeper connections to localhost:2181 in the AM logs:

2017-05-02 16:16:07,992 [main-SendThread(localhost:2181)] WARN  
zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing 
socket connection and attempting reconnect

java.net.ConnectException: Connection refused


How can I tweak the connection string?
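
Before tweaking anything, it may help to confirm which ZooKeeper servers are actually reachable from the host in question. A rough sketch, assuming nc (netcat) is installed and that the servers answer the ruok four-letter command (newer ZooKeeper releases may require it to be whitelisted). The function names split_quorum and zk_probe are made up for this example:

```shell
#!/bin/sh
# Split a comma-separated quorum string into one host:port per line.
split_quorum() {
  printf '%s\n' "$1" | tr ',' '\n'
}

# Send "ruok" to one host:port; a healthy server is expected to answer "imok".
# Assumes nc is available; -w 2 sets a 2 second timeout.
zk_probe() {
  host=${1%:*}
  port=${1##*:}
  printf 'ruok' | nc -w 2 "$host" "$port"
}

# Print the servers that would be probed: the (sanitized) quorum and the
# localhost default that shows up in the AM log.
split_quorum "zookeeper.cluster.mycompany.com:2181,localhost:2181"
```

To actually probe, loop over the output, for example: for hp in $(split_quorum "$QUORUM"); do zk_probe "$hp"; echo; done. A refused connection on localhost:2181 next to an imok from the real server would suggest the AM is not picking up the intended quorum.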



If I look at slider/conf/slider-client.xml, I am still using the default 
configuration and see the following setting:

    <property>
      <name>hadoop.registry.zk.quorum</name>
      <value>@ZK-QUORUM</value>
    </property>

First off, I'm not sure what the @ZK-QUORUM syntax means. Overriding it with a 
connection string for a single host provides no relief from the dreaded 
symptom.
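
The @ZK-QUORUM value reads like a template placeholder that was never filled in, though that is a guess and not something confirmed here. A small sketch for checking whether a config file still carries it; the function name has_zk_placeholder and the CONF_FILE variable are hypothetical, and the default path assumes the slider/conf layout mentioned above:

```shell
#!/bin/sh
# Succeeds (exit 0) if the given file still contains the literal @ZK-QUORUM.
has_zk_placeholder() {
  grep -q '@ZK-QUORUM' "$1"
}

# CONF_FILE is a hypothetical override; the default is the path from the text.
CONF_FILE="${CONF_FILE:-slider/conf/slider-client.xml}"

if [ -f "$CONF_FILE" ] && has_zk_placeholder "$CONF_FILE"; then
  echo "$CONF_FILE: hadoop.registry.zk.quorum may still be the @ZK-QUORUM placeholder"
else
  echo "$CONF_FILE: no @ZK-QUORUM placeholder found (or file missing)"
fi
```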



The AM logs look like:

2017-05-02 16:16:07,401 [main] INFO  appmaster.SliderAppMaster - Registry 
service username =fooolish_ewe

2017-05-02 16:16:07,462 [main] INFO  appmaster.SliderAppMaster - Service Record

ServiceRecord{description='Slider Application Master'; external endpoints: {{
  "api" : "http://",
  "addressType" : "uri",
  "protocolType" : "webui",
  "addresses" : [ {
    "uri" : "http://cluster.mycompany.com:42734"
  } ]
}; {
  "api" : "classpath:org.apache.slider.management",
  "addressType" : "uri",
  "protocolType" : "REST",
  "addresses" : [ {
    "uri" : "http://cluster.mycompany.com:42734/ws/v1/slider/mgmt"
  } ]
}; {
  "api" : "classpath:org.apache.slider.publisher",
  "addressType" : "uri",
  "protocolType" : "REST",
  "addresses" : [ {
    "uri" : "http://cluster.mycompany.com:42734/ws/v1/slider/publisher"
  } ]
}; {
  "api" : "classpath:org.apache.slider.registry",
  "addressType" : "uri",
  "protocolType" : "REST",
  "addresses" : [ {
    "uri" : "http://cluster.mycompany.com:42734/ws/v1/slider/registry"
  } ]
}; {
  "api" : "classpath:org.apache.slider.publisher.configurations",
  "addressType" : "uri",
  "protocolType" : "REST",
  "addresses" : [ {
    "uri" : "http://cluster.mycompany.com:42734/ws/v1/slider/publisher/slider"
  } ]
}; {
  "api" : "classpath:org.apache.slider.publisher.exports",
  "addressType" : "uri",
  "protocolType" : "REST",
  "addresses" : [ {
    "uri" : "http://cluster.mycompany.com:42734/ws/v1/slider/publisher/exports"
  } ]
}; }; internal endpoints: {{
  "api" : "classpath:org.apache.slider.agents.secure",
  "addressType" : "uri",
  "protocolType" : "REST",
  "addresses" : [ {
    "uri" : "https://cluster.mycompany.com:40466/ws/v1/slider/agents"
  } ]
}; {
  "api" : "classpath:org.apache.slider.agents.oneway",
  "addressType" : "uri",
  "protocolType" : "REST",
  "addresses" : [ {
    "uri" : "https://cluster.mycompany.com:59141/ws/v1/slider/agents"
  } ]
}; }, attributes: {"yarn:id"="application_1492599342357_0064" 
"yarn:persistence"="application" }}

2017-05-02 16:16:07,992 [main-SendThread(localhost:2181)] WARN  
zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing 
socket connection and attempting reconnect

java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

[Several repetitions of the previous error omitted for clarity and then...]

2017-05-02 16:16:12,877 [780172372@qtp-747004588-0] ERROR webapp.Dispatcher - 
error handling URI: /slideram

java.lang.reflect.InvocationTargetException

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)

at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)

at 
com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)

at 
com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)

at 
com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)

at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)

at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)

at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)

at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)

at 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)

at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)

at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)

at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)

at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)

at 
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164)

at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)

at 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1286)

at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)

at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)

at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)

at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)

at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)

at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)

at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)

at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)

at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)

at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)

at org.mortbay.jetty.Server.handle(Server.java:326)

at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)

at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)

at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)

at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)

at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)

at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)

at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Caused by: java.lang.NullPointerException

at 
org.apache.slider.providers.AbstractProviderService.buildEndpointDetails(AbstractProviderService.java:352)

at 
org.apache.slider.providers.AbstractProviderService.buildMonitorDetails(AbstractProviderService.java:337)

at 
org.apache.slider.providers.agent.AgentProviderService.buildMonitorDetails(AgentProviderService.java:810)

at 
org.apache.slider.server.appmaster.web.view.IndexBlock.addProviderServiceOptions(IndexBlock.java:129)

at 
org.apache.slider.server.appmaster.web.view.IndexBlock.doIndex(IndexBlock.java:85)

at 
org.apache.slider.server.appmaster.web.view.IndexBlock.render(IndexBlock.java:60)

at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)

at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)

at org.apache.hadoop.yarn.webapp.View.render(View.java:235)

at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)

at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)

at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)

at 
org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)

at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)

at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)

at 
org.apache.slider.server.appmaster.web.SliderAMController.index(SliderAMController.java:47)

... 39 more

2017-05-02 16:16:13,495 [main-SendThread(localhost:2181)] WARN  
zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing 
socket connection and attempting reconnect

java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

[More repetitions of the previous error deleted]

2017-05-02 16:16:22,474 [main] ERROR curator.ConnectionState - Connection timed 
out for connection string (localhost:2181) and timeout (15000) / elapsed (18944)

org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = 
ConnectionLoss

at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)

at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)

at 
org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)

at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:457)

at 
org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:239)

at 
org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:234)

at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)

at 
org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230)

at 
org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:215)

at 
org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:42)

at 
org.apache.hadoop.registry.client.impl.zk.CuratorService.zkDelete(CuratorService.java:673)

at 
org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.delete(RegistryOperationsService.java:160)

at 
org.apache.slider.server.services.yarnregistry.YarnRegistryViewForProviders.putService(YarnRegistryViewForProviders.java:186)

at 
org.apache.slider.server.services.yarnregistry.YarnRegistryViewForProviders.registerSelf(YarnRegistryViewForProviders.java:224)

at 
org.apache.slider.server.appmaster.SliderAppMaster.registerServiceInstance(SliderAppMaster.java:1084)

at 
org.apache.slider.server.appmaster.SliderAppMaster.createAndRunCluster(SliderAppMaster.java:885)

at 
org.apache.slider.server.appmaster.SliderAppMaster.runService(SliderAppMaster.java:525)

at 
org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)

at 
org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)

at 
org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)

at 
org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)

at 
org.apache.slider.server.appmaster.SliderAppMaster.main(SliderAppMaster.java:2240)

2017-05-02 16:16:23,403 [main-SendThread(localhost:2181)] WARN  
zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing 
socket connection and attempting reconnect

java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

2017-05-02 16:16:24,504 [main-SendThread(localhost:2181)] WARN  
zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing 
socket connection and attempting reconnect

java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
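
When scanning an AM log like this one, the connection string the client actually used can be pulled out of the Curator timeout message. A minimal sketch with sed; the function name zk_conn_from_log is invented for the example, and the pattern assumes the message wording shown in the log above:

```shell
#!/bin/sh
# Print each distinct value found in "connection string (...)" in file $1.
zk_conn_from_log() {
  sed -n 's/.*connection string (\([^)]*\)).*/\1/p' "$1" | sort -u
}

# Demo on the message text from the log above; prints localhost:2181.
sample=$(mktemp)
echo 'ERROR curator.ConnectionState - Connection timed out for connection string (localhost:2181) and timeout (15000) / elapsed (18944)' > "$sample"
zk_conn_from_log "$sample"
rm -f "$sample"
```

If this prints localhost:2181 while --zkhosts names a different server, the registry client is presumably reading its quorum from somewhere other than the command-line option.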


With best regards:


Bill
