Thanks, Gour!

Any idea when 1.0.0 will be available?



On Fri, Sep 30, 2016 at 7:23 AM, Gour Saha <gs...@hortonworks.com> wrote:

> I think you are hitting this -
> https://issues.apache.org/jira/browse/SLIDER-1169
>
>
> On 9/29/16, 10:21 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>
> >Hi
> >
> >Slider version 0.80 on a secure cluster.
> >
> >In my xxx-site.xml files, hadoop.registry.zk.quorum is set as follows:
> >    <property>
> >      <name>hadoop.registry.zk.quorum</name>
> >      <value>zk1_host:2181,zk2_host:2181,zk3_host:2181</value>
> >    </property>
> >
> >However, it appears that the Slider AM uses only the first ZK host to
> >connect to the registry, and fails when that host happens to be down.
> >
> >In the Slider AM log:
> >
> >2016-09-30 02:27:27,279 [main] INFO  appmaster.SliderAppMaster - Loading slider-server.xml at file:/foo/yarn/local/usercache/xx/appcache/application_1474675565244_3660/container_e80_1474675565244_3660_01_000001/confdir/slider-server.xml
> >2016-09-30 02:27:27,285 [main] INFO  appmaster.SliderAppMaster - AM configuration:
> >dfs.namenode.kerberos.principal=hdfs/_HOST@ABC
> >hadoop.registry.zk.quorum=zk1_host:2181
> >hadoop.registry.zk.root=/registry
> >
> >Note: the log shows only the first host, not the full quorum string of
> >three host:port pairs.
> >
> >Later in the log, it tries to connect to ZK1, but since ZK1 is down the
> >connection fails. As a result, the AM fails to start any components.
> >
> >
> >2016-09-29 23:32:49,768 [main] INFO  appmaster.SliderAppMaster - Service YarnRegistry in state YarnRegistry: STARTED  Connection="fixed ZK quorum "zk1_host:2181" " root="/registry" security disabled
> >2016-09-29 23:32:49,774 [main-SendThread(bds0211.svc.eng.pdx.wd:2181)] WARN  zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> >java.net.ConnectException: Connection refused
> >        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> >        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> >        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> >
> >I would expect that if the connection to ZK1 failed, then ZK2, ZK3, etc.
> >would be tried. That's what the ZK quorum is for.
> >
> >Looking into the code, I see that this last "Connection" string comes
> >from org.apache.hadoop.registry.client.impl.zk.CuratorService.java
> >
> >In it, supplyBindingInformation() builds the string that is printed in
> >the log message.
> >
> >public BindingInformation supplyBindingInformation() {
> >    BindingInformation binding = new BindingInformation();
> >    String connectString = buildConnectionString();
> >    binding.ensembleProvider = new FixedEnsembleProvider(connectString);
> >    binding.description =
> >        "fixed ZK quorum \"" + connectString + "\"";
> >    return binding;
> >}
> >
> >protected String buildConnectionString() {
> >    return getConfig().getTrimmed(KEY_REGISTRY_ZK_QUORUM,
> >        DEFAULT_REGISTRY_ZK_QUORUM);
> >}
> >
> >The object returned by getConfig() is an
> >org.apache.hadoop.conf.Configuration.
> >
> >It's not clear why the value of hadoop.registry.zk.quorum supplied in
> >the config gets trimmed to the first host only. Is this the expected
> >behavior, or a bug?
> >
> >It isn't possible to guarantee that the first ZooKeeper in the quorum will
> >always be reachable. I would expect multiple nodes in the quorum to be
> >tried for connection.
> >
> >
> >Any thoughts would be appreciated ...
>
>
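
For anyone debugging the same symptom, here is a minimal sketch for checking which entries of a quorum string are reachable from the AM host. It assumes a plain TCP connect is a good enough proxy for a ZooKeeper server being up. The class QuorumCheck and its methods are hypothetical helpers written for this note, not part of Slider or Hadoop. Also note that Configuration.getTrimmed() only strips surrounding whitespace from the value, so that call on its own would not reduce the quorum string to a single host.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Slider or Hadoop.
public class QuorumCheck {

    /** Splits "host1:2181,host2:2181" into individual host:port entries. */
    static List<String> parseQuorum(String quorum) {
        List<String> entries = new ArrayList<>();
        for (String part : quorum.trim().split(",")) {
            String entry = part.trim();
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isReachable(String hostPort, int timeoutMs) {
        int idx = hostPort.lastIndexOf(':');
        if (idx < 0) {
            return false; // no port given
        }
        String host = hostPort.substring(0, idx);
        try (Socket socket = new Socket()) {
            int port = Integer.parseInt(hostPort.substring(idx + 1));
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException | IllegalArgumentException e) {
            // Covers refused connections, timeouts, unknown hosts and bad ports.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder quorum taken from the message above; pass a real one as args[0].
        String quorum = args.length > 0
                ? args[0]
                : "zk1_host:2181,zk2_host:2181,zk3_host:2181";
        for (String entry : parseQuorum(quorum)) {
            System.out.println(entry + " reachable=" + isReachable(entry, 2000));
        }
    }
}
```

Running it with the value from your xxx-site.xml as the first argument prints one line per quorum entry, which makes it easy to compare what the config contains against what the AM log reports.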
