Thanks for the information, Miguel. I suggest using the following in yarn-site.xml:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm.marathon.mesos</value>
</property>

Santosh

On Wed, Feb 3, 2016 at 11:12 AM, <miguel.berna...@accenture.com> wrote:

> Hi Santosh,
>
> I am running Myriad using Marathon. I am using Mesos-DNS as well, and the
> yarn-site.xml and mapred-site.xml are exactly the same for both the RM and
> the NM. I only turned off my RM from the MapR Control Console, which
> explains my error logs below. My understanding is that maybe I should
> remove the rm1, rm2, and rm3 information from the yarn-site file, but I
> want to make sure I am doing it correctly.
>
> Here is the command I sent to start the RM from Marathon. I am able to
> see Myriad running via this link in my environment:
> http://rm.marathon.mesos:8192/#/
>
> {
>   "id": "rm",
>   "instances": 1,
>   "cpus": 0.2,
>   "mem": 2048,
>   "cmd": "env && export YARN_RESOURCEMANAGER_OPTS=-Dyarn.resourcemanager.hostname=rm.marathon.mesos && yarn resourcemanager",
>   "ports": [ 0 ]
> }
>
> Here is the yarn-site.xml file from our MapR cluster:
>
> root@`hostname`:/home/diuser# cat /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/yarn-site.xml
> <?xml version="1.0"?>
> <!--
>   Licensed under the Apache License, Version 2.0 (the "License");
>   you may not use this file except in compliance with the License.
>   You may obtain a copy of the License at
>
>     http://www.apache.org/licenses/LICENSE-2.0
>
>   Unless required by applicable law or agreed to in writing, software
>   distributed under the License is distributed on an "AS IS" BASIS,
>   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>   See the License for the specific language governing permissions and
>   limitations under the License. See accompanying LICENSE file.
> -->
> <configuration>
>   <!-- Resource Manager HA Configs -->
>   <property>
>     <name>yarn.resourcemanager.ha.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.recovery.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.cluster-id</name>
>     <value>yarn-dicluster.techlabs.accenture.com</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.ha.rm-ids</name>
>     <value>rm1,rm2,rm3</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.ha.id</name>
>     <value>rm1</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.zk-address</name>
>     <value>otherhostname041:5181,otherhostname042:5181,otherhostname043:5181</value>
>   </property>
>
>   <!-- Configuration for rm1 -->
>   <property>
>     <name>yarn.resourcemanager.scheduler.address.rm1</name>
>     <value>otherhostname041:8030</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
>     <value>otherhostname041:8031</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.address.rm1</name>
>     <value>otherhostname041:8032</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.admin.address.rm1</name>
>     <value>otherhostname041:8033</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.webapp.address.rm1</name>
>     <value>otherhostname041:8088</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.webapp.https.address.rm1</name>
>     <value>otherhostname041:8090</value>
>   </property>
>
>   <!-- Configuration for rm2 -->
>   <property>
>     <name>yarn.resourcemanager.scheduler.address.rm2</name>
>     <value>otherhostname042:8030</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
>     <value>otherhostname042:8031</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.address.rm2</name>
>     <value>otherhostname042:8032</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.admin.address.rm2</name>
>     <value>otherhostname042:8033</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.webapp.address.rm2</name>
>     <value>otherhostname042:8088</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.webapp.https.address.rm2</name>
>     <value>otherhostname042:8090</value>
>   </property>
>
>   <!-- Configuration for rm3 -->
>   <property>
>     <name>yarn.resourcemanager.scheduler.address.rm3</name>
>     <value>otherhostname043:8030</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.resource-tracker.address.rm3</name>
>     <value>otherhostname043:8031</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.address.rm3</name>
>     <value>otherhostname043:8032</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.admin.address.rm3</name>
>     <value>otherhostname043:8033</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.webapp.address.rm3</name>
>     <value>otherhostname043:8088</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.webapp.https.address.rm3</name>
>     <value>otherhostname043:8090</value>
>   </property>
>
>   <!-- :::CAUTION::: DO NOT EDIT ANYTHING ON OR ABOVE THIS LINE -->
>
>   <property>
>     <name>yarn.nodemanager.resource.cpu-vcores</name>
>     <value>${nodemanager.resource.cpu-vcores}</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.resource.memory-mb</name>
>     <value>${nodemanager.resource.memory-mb}</value>
>   </property>
>   <!-- These options enable dynamic port assignment by Mesos -->
>   <property>
>     <name>yarn.nodemanager.address</name>
>     <value>${myriad.yarn.nodemanager.address}</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.webapp.address</name>
>     <value>${myriad.yarn.nodemanager.webapp.address}</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.webapp.https.address</name>
>     <value>${myriad.yarn.nodemanager.webapp.address}</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.localizer.address</name>
>     <value>${myriad.yarn.nodemanager.localizer.address}</value>
>   </property>
>   <!-- Configure the Myriad scheduler here -->
>   <property>
>     <name>yarn.resourcemanager.scheduler.class</name>
>     <value>org.apache.myriad.scheduler.yarn.MyriadFairScheduler</value>
>     <description>One can configure other schedulers as well from the
>     following list: org.apache.myriad.scheduler.yarn.MyriadCapacityScheduler,
>     org.apache.myriad.scheduler.yarn.MyriadFifoScheduler</description>
>   </property>
> </configuration>
>
> Here is the mapred-site.xml from the cluster as well:
>
> root@`hostname`:/home/diuser# cat /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/mapred-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <!--
>   Licensed under the Apache License, Version 2.0 (the "License");
>   you may not use this file except in compliance with the License.
>   You may obtain a copy of the License at
>
>     http://www.apache.org/licenses/LICENSE-2.0
>
>   Unless required by applicable law or agreed to in writing, software
>   distributed under the License is distributed on an "AS IS" BASIS,
>   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>   See the License for the specific language governing permissions and
>   limitations under the License. See accompanying LICENSE file.
> -->
>
> <!-- Put site-specific property overrides in this file.
> -->
>
> <configuration>
>   <property>
>     <name>mapreduce.jobhistory.address</name>
>     <value>otherhostname043:10020</value>
>   </property>
>   <property>
>     <name>mapreduce.jobhistory.webapp.address</name>
>     <value>otherhostname043:19888</value>
>   </property>
>   <!--
>   <property>
>     <name>mapreduce.framework.name</name>
>     <value>yarn-tez</value>
>   </property>
>   -->
>
>   <!-- This option enables dynamic port assignment by Mesos -->
>   <property>
>     <name>mapreduce.shuffle.port</name>
>     <value>${myriad.mapreduce.shuffle.port}</value>
>   </property>
> </configuration>
>
> The issue arises when I try to run a job from one of the nodes in the
> cluster. I get the error below in the log
> $YARN_HOME/logs/yarn-mapr-nodemanager-`hostname`.log. You can see that it
> is reading the rm1, rm2, and rm3 entries from yarn-site.xml.
>
> 2016-02-02 15:33:52,116 INFO org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider: Failing over to rm1
> 2016-02-02 15:33:52,117 INFO org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking nodeHeartbeat of class ResourceTrackerPBClientImpl over rm1 after 27 fail over attempts. Trying to fail over after sleeping for 17384ms.
> java.net.ConnectException: Call From `hostname`/10.1.194.49 to otherhostname041:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
>     at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1482)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>     at com.sun.proxy.$Proxy28.nodeHeartbeat(Unknown Source)
>     at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
>     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy29.nodeHeartbeat(Unknown Source)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:622)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:708)
>     at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:374)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1531)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1448)
>     ... 12 more
> 2016-02-02 15:34:09,502 INFO org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> 2016-02-02 15:34:09,503 INFO org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking nodeHeartbeat of class ResourceTrackerPBClientImpl over rm2 after 28 fail over attempts. Trying to fail over after sleeping for 35554ms.
> java.net.ConnectException: Call From `hostname`/10.1.194.49 to otherhostname042:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
>     at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1482)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>     at com.sun.proxy.$Proxy28.nodeHeartbeat(Unknown Source)
>     at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
>     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy29.nodeHeartbeat(Unknown Source)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:622)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:708)
>     at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:374)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1531)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1448)
>     ... 12 more
> 2016-02-02 15:34:45,058 INFO org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider: Failing over to rm3
> 2016-02-02 15:34:45,060 INFO org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking nodeHeartbeat of class ResourceTrackerPBClientImpl over rm3 after 29 fail over attempts. Trying to fail over after sleeping for 17219ms.
> java.net.ConnectException: Call From `hostname`/10.1.194.49 to otherhostname043:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
>     at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1482)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>     at com.sun.proxy.$Proxy28.nodeHeartbeat(Unknown Source)
>     at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
>     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy29.nodeHeartbeat(Unknown Source)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:622)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:708)
>     at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:374)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1531)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1448)
>     ... 12 more
> 2016-02-02 15:35:02,280 INFO org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider: Failing over to rm1
> 2016-02-02 15:35:02,281 WARN org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking class org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat over rm1. Not retrying because failovers (30) exceeded maximum allowed (30)
>
> Miguel Bernadin
> Accenture Technology Labs – System Engineering
> Contact: W (408) 817-2742 | M (631) 835-6345 | miguel.berna...@accenture.com
>
> On 2/3/16, 10:49 AM, "Santosh Marella" <smare...@maprtech.com> wrote:
>
> >Hi Miguel,
> >
> >  Are you running the YARN cluster using Myriad (I assume so)? How did you
> >launch your RM: manually, using Marathon, or using Warden? How do the NMs
> >discover where the RM is? Perhaps you can paste your yarn-site.xml from
> >the RM node and one of your NM nodes.
> >
> >Thanks,
> >Santosh
> >
> >On Wed, Feb 3, 2016 at 10:36 AM, <miguel.berna...@accenture.com> wrote:
> >
> >> Hello guys,
> >>
> >> I wanted to know if anyone with a MapR environment can share their
> >> yarn-site.xml and mapred-site.xml files with me. When running the
> >> terasort job, it looks like it's looking for rm1, rm2, and rm3. I
> >> modified the file in place rather than taking it from the template.
> >>
> >> Miguel Bernadin
> >> Accenture Technology Labs – System Engineering
> >> Contact: W (408) 817-2742 | M (631) 835-6345 | miguel.berna...@accenture.com
> >>
> >> ________________________________
> >>
> >> This message is for the designated recipient only and may contain
> >> privileged, proprietary, or otherwise confidential information. If you have
> >> received it in error, please notify the sender immediately and delete the
> >> original. Any other use of the e-mail by you is prohibited. Where allowed
> >> by local law, electronic communications with Accenture and its affiliates,
> >> including e-mail and instant messaging (including content), may be scanned
> >> by our systems for the purposes of information security and assessment of
> >> internal compliance with Accenture policy.
> >>
> >> ______________________________________________________________________________________
> >>
> >> www.accenture.com
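To make the suggestion concrete, here is a sketch of what the RM-related portion of yarn-site.xml could look like with the HA entries (ha.enabled, ha.rm-ids, ha.id, and the per-rm address blocks) removed. The hostname, placeholders, and scheduler class are taken from the files quoted above; treat this as an illustration rather than a tested config, since the exact set of properties to keep may vary with your MapR and Myriad versions:

```xml
<!-- Sketch: single-RM yarn-site.xml for a Myriad-launched RM behind Mesos-DNS.
     No HA block; YARN derives the RM addresses (e.g. the resource-tracker
     address on port 8031) from yarn.resourcemanager.hostname. -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm.marathon.mesos</value>
  </property>

  <!-- Myriad placeholders, unchanged from the original file -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>${nodemanager.resource.cpu-vcores}</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>${nodemanager.resource.memory-mb}</value>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>${myriad.yarn.nodemanager.address}</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.myriad.scheduler.yarn.MyriadFairScheduler</value>
  </property>
</configuration>
```

With yarn.resourcemanager.ha.enabled absent (or false), the NodeManagers should resolve the resource tracker at rm.marathon.mesos:8031 via Mesos-DNS instead of cycling through rm1, rm2, and rm3, which is exactly the failover loop the log above shows.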