Hello All:
I was able to get this working after some code diving, log reading, RTFM, etc. The following issues came up. I'm not sure whether the lessons I learned are of general interest, but here they are:

1. Slider needs a ZooKeeper instance for the YARN registry service to work. I haven't determined which parameters are optional, but setting the hadoop.registry.zk.quorum property on the command line was definitely needed. The command that worked is:

   slider create jmemcached \
     --template /home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached/appConfig.json \
     --resources /home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached/resources-default.json \
     --manager yarn-rm.cluster.mycompany.com:8032 \
     --debug \
     --zkhosts zknode.zkcluster.mycompany.com:2181 \
     --zkpath /slider_test/clustername/ \
     -D hadoop.registry.zk.quorum=zknode.zkcluster.mycompany.com:2181 \
     -D yarn.nodemanager.delete.debug-delay-sec=3600 \
     -D yarn.nodemanager.sleep-delay-before-sigkill.ms=3600000

2. Slider can be installed in a user directory and run as a normal user for testing; root permissions are not needed (great if you need to avoid pestering your ops team).

3. If the launched application (e.g. memcached in this case) fails quickly, it can be really hard to view the memcached container's logs. I had to try many times before I was quick enough to view the logs; initially I was confused and thought the registration had failed. In case you are wondering why it failed, the java_home setting for the docker image was not consistent with our actual cluster. The yarn node manager settings didn't seem to help much with this. Are there any better hints for reading the logs of failed containers?

4. It would be nice if the Container statistics page showed both the container and the node that is running it (the link gives a hint, but it would be nice to see it directly).

5. I had to manually remove the files installed in HDFS during testing to allow a clean test run.

6. The log4j.properties and log4j-server.properties in the installation's slider/conf directory are useful. Since I'm running out of my home directory I was able to edit them, but if the rpm install was used, normal users might need to escalate privileges to edit them. I'm testing a version of the client that lets us replace the client log4j.properties with a user-defined one (for testing).

With best regards:
Bill

________________________________
From: Foolish Ewe <foolish...@hotmail.com>
Sent: Wednesday, May 3, 2017 1:47 AM
To: dev@slider.incubator.apache.org
Subject: How to port a working slider and memcached example from docker image to a cluster?

I have a version of the memcached example running on a docker image, and now I'd like to port that to a real cluster (to get a working starting point for the actual service I want to run in slider). I suspect the configuration issues could be in the ZooKeeper or YARN service registry configuration.

Running the following (sanitized) commands:

   slider install-package \
     --package /home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached/jmemcached-1.0.1.zip \
     --name jmemcached \
     --debug \
     --replacepkg

   slider create jmemcached \
     --template /home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached/appConfig.json \
     --resources /home/foolish_ewe/mybuild/incubator-slider/app-packages/memcached/resources-default.json \
     --manager rm.yarn.cluster.mycompany.com:8032 \
     --debug \
     --zkhosts zookeeper.cluster.mycompany.com:2181 \
     --zkpath /slider_test/clustername/

I'm seeing failed ZooKeeper connections to localhost:2181 in the AM logs:

2017-05-02 16:16:07,992 [main-SendThread(localhost:2181)] WARN zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused

How can I tweak the connection string?
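When the AM reports "Connection refused", it can help to first rule out a plain network problem by checking that each host in the intended quorum string accepts TCP connections from the node in question. The following is a minimal Python sketch of such a check, not anything shipped with Slider: the function names are made up for illustration, and it assumes the quorum is a comma-separated list of host:port entries, falling back to ZooKeeper's default client port 2181 when no port is given.

```python
import socket


def parse_quorum(quorum):
    """Split a 'host1:2181,host2:2181' string into (host, port) pairs."""
    pairs = []
    for entry in quorum.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.rpartition(":")
        if not host:
            # No explicit port given: assume ZooKeeper's default client port.
            host, port = entry, "2181"
        pairs.append((host, int(port)))
    return pairs


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_quorum(quorum, timeout=3.0):
    """Map each 'host:port' entry of the quorum string to its reachability."""
    return {
        f"{host}:{port}": is_reachable(host, port, timeout)
        for host, port in parse_quorum(quorum)
    }
```

For example, check_quorum("zookeeper.cluster.mycompany.com:2181") run on a cluster node would show whether that node can reach the ZooKeeper host at all. A successful connect only shows that the port is open, not that ZooKeeper is healthy. If the real hosts are reachable while the AM still tries localhost:2181, the problem is more likely which connection string the AM is given than the network.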
If I look at slider/conf/slider-client.xml, I am still using the default configuration and see the following setting:

<property>
  <name>hadoop.registry.zk.quorum</name>
  <value>@ZK-QUORUM</value>
</property>

First off, I'm not sure what the @ZK-QUORUM syntax means. Overriding this with a connection string naming a single host provides no relief from the dreaded symptom. The AM logs look like:

2017-05-02 16:16:07,401 [main] INFO appmaster.SliderAppMaster - Registry service username =fooolish_ewe
2017-05-02 16:16:07,462 [main] INFO appmaster.SliderAppMaster - Service Record ServiceRecord{description='Slider Application Master'; external endpoints: {{ "api" : "http://", "addressType" : "uri", "protocolType" : "webui", "addresses" : [ { "uri" : "http://cluster.mycompany.com:42734" } ] }; { "api" : "classpath:org.apache.slider.management", "addressType" : "uri", "protocolType" : "REST", "addresses" : [ { "uri" : "http://cluster.mycompany.com:42734/ws/v1/slider/mgmt" } ] }; { "api" : "classpath:org.apache.slider.publisher", "addressType" : "uri", "protocolType" : "REST", "addresses" : [ { "uri" : "http://cluster.mycompany.com:42734/ws/v1/slider/publisher" } ] }; { "api" : "classpath:org.apache.slider.registry", "addressType" : "uri", "protocolType" : "REST", "addresses" : [ { "uri" : "http://cluster.mycompany.com:42734/ws/v1/slider/registry" } ] }; { "api" : "classpath:org.apache.slider.publisher.configurations", "addressType" : "uri", "protocolType" : "REST", "addresses" : [ { "uri" : "http://cluster.mycompany.com:42734/ws/v1/slider/publisher/slider" } ] }; { "api" : "classpath:org.apache.slider.publisher.exports", "addressType" : "uri", "protocolType" : "REST", "addresses" : [ { "uri" : "http://cluster.mycompany.com:42734/ws/v1/slider/publisher/exports" } ] }; }; internal endpoints: {{ "api" : "classpath:org.apache.slider.agents.secure", "addressType" : "uri", "protocolType" : "REST", "addresses" : [ { "uri" : "https://cluster.mycompany.com:40466/ws/v1/slider/agents" } ] }; { "api" : "classpath:org.apache.slider.agents.oneway", "addressType" : "uri", "protocolType" : "REST", "addresses" : [ { "uri" : "https://cluster.mycompany.com:59141/ws/v1/slider/agents" } ] }; }, attributes: {"yarn:id"="application_1492599342357_0064" "yarn:persistence"="application" }}
2017-05-02 16:16:07,992 [main-SendThread(localhost:2181)] WARN zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

[Several repetitions of the previous error omitted for clarity and then...]

2017-05-02 16:16:12,877 [780172372@qtp-747004588-0] ERROR webapp.Dispatcher - error handling URI: /slideram
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
    at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
    at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1286)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.NullPointerException
    at org.apache.slider.providers.AbstractProviderService.buildEndpointDetails(AbstractProviderService.java:352)
    at org.apache.slider.providers.AbstractProviderService.buildMonitorDetails(AbstractProviderService.java:337)
    at org.apache.slider.providers.agent.AgentProviderService.buildMonitorDetails(AgentProviderService.java:810)
    at org.apache.slider.server.appmaster.web.view.IndexBlock.addProviderServiceOptions(IndexBlock.java:129)
    at org.apache.slider.server.appmaster.web.view.IndexBlock.doIndex(IndexBlock.java:85)
    at org.apache.slider.server.appmaster.web.view.IndexBlock.render(IndexBlock.java:60)
    at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)
    at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
    at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
    at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
    at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
    at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
    at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
    at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
    at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
    at org.apache.slider.server.appmaster.web.SliderAMController.index(SliderAMController.java:47)
    ... 39 more
2017-05-02 16:16:13,495 [main-SendThread(localhost:2181)] WARN zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

[More repetitions of the previous error deleted]

2017-05-02 16:16:22,474 [main] ERROR curator.ConnectionState - Connection timed out for connection string (localhost:2181) and timeout (15000) / elapsed (18944)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)
    at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
    at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:457)
    at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:239)
    at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:234)
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
    at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230)
    at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:215)
    at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:42)
    at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkDelete(CuratorService.java:673)
    at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.delete(RegistryOperationsService.java:160)
    at org.apache.slider.server.services.yarnregistry.YarnRegistryViewForProviders.putService(YarnRegistryViewForProviders.java:186)
    at org.apache.slider.server.services.yarnregistry.YarnRegistryViewForProviders.registerSelf(YarnRegistryViewForProviders.java:224)
    at org.apache.slider.server.appmaster.SliderAppMaster.registerServiceInstance(SliderAppMaster.java:1084)
    at org.apache.slider.server.appmaster.SliderAppMaster.createAndRunCluster(SliderAppMaster.java:885)
    at org.apache.slider.server.appmaster.SliderAppMaster.runService(SliderAppMaster.java:525)
    at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
    at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
    at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
    at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
    at org.apache.slider.server.appmaster.SliderAppMaster.main(SliderAppMaster.java:2240)
2017-05-02 16:16:23,403 [main-SendThread(localhost:2181)] WARN zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-05-02 16:16:24,504 [main-SendThread(localhost:2181)] WARN zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

With best regards:
Bill
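
The AM log above names the ZooKeeper endpoint actually being tried in two places: the SendThread(...) thread name on the ClientCnxn warnings, and the "connection string (...)" part of the Curator timeout error. A small Python sketch for pulling those out of a saved log is shown below. It is only an illustration: the function name is hypothetical, and the two patterns are taken from the lines quoted in this message, so other Hadoop or Curator versions may format them differently.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this thread.
SEND_THREAD = re.compile(r"-SendThread\(([^)]+)\)")
CURATOR = re.compile(r"connection string \(([^)]+)\)")


def connect_strings(lines):
    """Count the ZooKeeper endpoints that appear in AM log lines."""
    seen = Counter()
    for line in lines:
        for pattern in (SEND_THREAD, CURATOR):
            for match in pattern.finditer(line):
                seen[match.group(1)] += 1
    return seen


# Usage sketch (the file name is an assumption):
#   with open("slider-am.log") as log:
#       print(connect_strings(log))
```

If the only endpoint that shows up is localhost:2181 while --zkhosts names a different host, that suggests the registry client inside the AM is not picking up the intended quorum and is falling back to a default.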