Filed https://issues.apache.org/jira/browse/SLIDER-1227. Thanks for everyone's time!
On Wed, May 10, 2017 at 12:17 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:

> Thanks Billie for the role group explanation; it seems like a good feature
> to have!
>
> Thinking a bit about the role group, here are my thoughts:
>
> 1. It seems the user will either specify explicit component names for each
>    component (as in my case) OR will have a group like ZOOKEEPER, where
>    Slider generates unique component names like ZOOKEEPER_n (n=1,2..)
> 2. If #1 is true, then I feel it would be a cleaner design to explicitly
>    introduce a new entity called the group. In metaInfo.xml, one
>    <component> element can have either <name> or <group>, but not both.
>    1. When <name> is specified, the code should not parse for a group
>    2. When <group> is specified, the code will generate individual
>       names and expect the group back from the container name
> 3. This will not just solve the currently reported problem but will also
>    make it clearer for Slider users to understand and implement groups.
>
> I will create the JIRA and add these thoughts to it ...
>
> Thanks
>
>
> On Wed, May 10, 2017 at 7:45 AM, Billie Rinaldi <billie.rina...@gmail.com>
> wrote:
>
>> The role group is used in the unique component names feature. This allows
>> you to specify, say, 3 instances of a component SOLR, and you will get
>> components SOLR1, SOLR2, and SOLR3. SOLR would be the group for these 3
>> components. This is helpful for apps like ZooKeeper, HBase, and Kafka that
>> like component instances to be distinguishable by unique IDs. It will be
>> even more helpful once RegistryDNS is available, which will give each a
>> unique hostname.
>>
>> Yeah, we could think about possible ways of solving this problem. Please
>> open a ticket along the lines of "allow LABEL_MAKER in component names OR
>> document that it should not be used."
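For the "document that it should not be used" option, a check at config time could look roughly like the sketch below. This is only an illustration, not Slider code: the class ComponentNameCheck and both method names are hypothetical, and the only detail taken from the thread is the separator value "___".

```java
// Hypothetical sketch: reject component names that contain the label
// separator, so that label parsing can never split a name in the wrong place.
class ComponentNameCheck {
    // Same value as LABEL_MAKER quoted from AgentProviderService in this thread.
    static final String LABEL_MAKER = "___";

    // Returns true when the name is non-empty and does not contain the separator.
    static boolean isValidComponentName(String name) {
        return name != null && !name.isEmpty() && !name.contains(LABEL_MAKER);
    }

    // Fails fast with a readable message instead of a later NPE.
    static void requireValidComponentName(String name) {
        if (!isValidComponentName(name)) {
            throw new IllegalArgumentException(
                "Invalid component name '" + name + "': must be non-empty and must not contain '"
                    + LABEL_MAKER + "'");
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidComponentName("pdx__svt"));     // true (only two underscores)
        System.out.println(isValidComponentName("solo___super")); // false (contains the separator)
    }
}
```

Where such a check would be called (cluster creation, config upload, flex) is left open here; that is a design decision for the ticket.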
>>
>> On Tue, May 9, 2017 at 7:33 PM, Manoj Samel <manojsamelt...@gmail.com>
>> wrote:
>>
>> > I think I found out what causes the NPE above in 0.92 and why it works
>> > in version 0.80.
>> >
>> > The component name (a.k.a. role name) is "solo___super", i.e. it has 3 "_".
>> >
>> > In 0.92, it seems a new concept of "Role Group" was introduced, which was
>> > not present in 0.80.
>> >
>> > In 0.92 - AgentProviderService.java
>> >
>> >   private static final String LABEL_MAKER = "___";
>> >   ...
>> >   private String getRoleName(String label) {
>> >     int index1 = label.indexOf(LABEL_MAKER);
>> >     int index2 = label.lastIndexOf(LABEL_MAKER);
>> >     if (index1 == index2) {
>> >       return label.substring(index1 + LABEL_MAKER.length());
>> >     } else {
>> >       return label.substring(index1 + LABEL_MAKER.length(), index2);
>> >     }
>> >   }
>> >
>> >   private String getRoleGroup(String label) {
>> >     return label.substring(label.lastIndexOf(LABEL_MAKER) +
>> >         LABEL_MAKER.length());
>> >   }
>> >
>> > So when the real role name contains 3 "_", e.g. "solo___super",
>> > getRoleName on the container label returns just "solo" and not
>> > "solo___super", and that bad role name can cause the NPE.
>> >
>> > The same role name works in 0.80 because in 0.80 there is no concept of
>> > roleGroup.
>> >
>> > In 0.80 - AgentProviderService.java
>> >
>> >   private String getRoleName(String label) {
>> >     return label.substring(label.indexOf(LABEL_MAKER) +
>> >         LABEL_MAKER.length());
>> >   }
>> >
>> > So in 0.80, getRoleName returns the correct role name "solo___super"
>> > from the container label.
>> >
>> >
>> > 1) I tried to understand what the roleGroup is and what its usage is, but
>> >    could not locate any doc. Can someone give a few lines of explanation?
>> > 2) Should this be considered a bug in 0.92? If not, and if you think
>> >    LABEL_MAKER should not be used in any role names, at least a clear doc
>> >    AND a clear check when accepting config files will help. I.e.
>> >    if LABEL_MAKER should not be used in any role names, then Slider 0.92
>> >    should give an error such as "invalid role name" when creating a
>> >    cluster or accepting configs during any other operation.
>> >
>> > Thanks in advance,
>> >
>> >
>> > On Tue, Apr 11, 2017 at 6:09 PM, Manoj Samel <manojsamelt...@gmail.com>
>> > wrote:
>> >
>> > > Hi
>> > >
>> > > Running Slider 0.92 on CDH 5.5.1 (which is Hadoop 2.6), with Kerberos.
>> > >
>> > > I am deploying an application with multiple components. The components
>> > > start but fail to heartbeat to the Slider AM. The Slider AM log shows an
>> > > NPE at the container heartbeat URLs, as below.
>> > >
>> > > I have attached the complete Slider AM log.
>> > >
>> > > 2017-04-12 00:44:05,741 [2011871076@qtp-814377348-5] INFO agent.AgentProviderService - Handling registration: responseId=-1
>> > > timestamp=1491957845550
>> > > label=container_e95_1476898378926_91401_01_000003___solo___super
>> > > hostname=node1078
>> > > expectedState=INIT
>> > > actualState=INIT
>> > > appVersion=null
>> > >
>> > > 2017-04-12 00:44:05,741 [2011871076@qtp-814377348-5] INFO agent.AgentProviderService - label: container_e95_1476898378926_91401_01_000003___solo___super
>> > > pkg: null
>> > > 2017-04-12 00:44:05,741 [2011871076@qtp-814377348-5] INFO agent.AgentProviderService - Registration response: RegistrationResponse{response=OK, responseId=0, statusCommands=null}
>> > > 2017-04-12 00:44:05,871 [Socket Reader #1 for port 32120] INFO ipc.Server - Auth successful for slideradmin@BIGDATA (auth:SIMPLE)
>> > > 2017-04-12 00:44:05,873 [Socket Reader #1 for port 32120] INFO authorize.ServiceAuthorizationManager - Authorization successful for slideradmin@BIGDATA (auth:TOKEN) for protocol=interface org.apache.slider.server.appmaster.rpc.SliderClusterProtocolPB
>> > > 2017-04-12 00:44:15,749 [1005856666@qtp-814377348-7] ERROR mortbay.log - /ws/v1/slider/agents/container_e95_1476898378926_91401_01_000002___pdx__svt___ten85/heartbeat
>> > > java.lang.NullPointerException
>> > >     at org.apache.slider.providers.agent.AgentProviderService.handleHeartBeat(AgentProviderService.java:1090)
>> > >     at org.apache.slider.server.appmaster.web.rest.agent.AgentResource.heartbeat(AgentResource.java:98)
>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > >     at java.lang.reflect.Method.invoke(Method.java:497)
>> > >     at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>> > >     at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>> > >     at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>> > >     at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>> > >     at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>> > >     at com.sun.jersey.server.impl.uri.rules.SubLocatorRule.accept(SubLocatorRule.java:134)
>> > >     at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>> > >     at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>> > >     at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>> > >     at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>> > >     at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
>> > >     at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
>> > >     at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
>> > >     at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
>> > >     at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
>> > >     at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
>> > >     at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
>> > >     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>> > >     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>> > >     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
>> > >     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>> > >     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>> > >     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>> > >     at org.mortbay.jetty.Server.handle(Server.java:326)
>> > >     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>> > >     at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
>> > >     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
>> > >     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>> > >     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>> > >     at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>> > >     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>> > > 2017-04-12 00:44:15,750 [2011871076@qtp-814377348-5] ERROR mortbay.log - /ws/v1/slider/agents/container_e95_1476898378926_91401_01_000004___pdx__svt___ten83/heartbeat
>> > > java.lang.NullPointerException ....
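
For anyone who wants to reproduce the parsing difference outside of Slider, the two getRoleName variants quoted above can be run side by side on the label from the AM log. The following is a minimal standalone sketch: the method bodies are copied from the snippets in this thread, while the class name LabelParsingDemo and the 080/092 suffixes on the method names are made up for the demo.

```java
// Standalone reproduction of the label parsing discussed in this thread.
class LabelParsingDemo {
    static final String LABEL_MAKER = "___";

    // 0.92 behavior: text between the first and the last separator.
    static String getRoleName092(String label) {
        int index1 = label.indexOf(LABEL_MAKER);
        int index2 = label.lastIndexOf(LABEL_MAKER);
        if (index1 == index2) {
            return label.substring(index1 + LABEL_MAKER.length());
        } else {
            return label.substring(index1 + LABEL_MAKER.length(), index2);
        }
    }

    // 0.92 behavior: text after the last separator.
    static String getRoleGroup092(String label) {
        return label.substring(label.lastIndexOf(LABEL_MAKER) + LABEL_MAKER.length());
    }

    // 0.80 behavior: everything after the first separator.
    static String getRoleName080(String label) {
        return label.substring(label.indexOf(LABEL_MAKER) + LABEL_MAKER.length());
    }

    public static void main(String[] args) {
        // Label taken from the registration log above; the component is "solo___super".
        String label = "container_e95_1476898378926_91401_01_000003___solo___super";
        System.out.println(getRoleName092(label));  // solo
        System.out.println(getRoleGroup092(label)); // super
        System.out.println(getRoleName080(label));  // solo___super
    }
}
```

With 0.92 parsing, the role name comes back as "solo", which does not match any configured component; a lookup by that name would plausibly return null, which fits the NPE in handleHeartBeat, though the exact failing line is not confirmed here.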