[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625980#comment-14625980 ]
Bolke de Bruin commented on SPARK-9019:
---------------------------------------

Tracing this down, it seems that the tokens are not being set on the AM container in yarn.Client, which is required according to http://aajisaka.github.io/hadoop-project/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html. Something like this:

    ByteBuffer fsTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());
    amContainer.setTokens(fsTokens);

in createContainerLaunchContext of yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala.

> spark-submit fails on yarn with kerberos enabled
> ------------------------------------------------
>
>                 Key: SPARK-9019
>                 URL: https://issues.apache.org/jira/browse/SPARK-9019
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 1.5.0
>         Environment: Hadoop 2.6 with YARN and kerberos enabled
>            Reporter: Bolke de Bruin
>              Labels: kerberos, spark-submit, yarn
>
> It is not possible to run jobs using spark-submit on yarn with a kerberized cluster.
>
> Commandline:
> /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py
>
> Fails with:
> 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380
> 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380.
> 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380
> 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
> 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
> 15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
> 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470
> 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
> 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470)
> 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
> 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
> 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
> 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
> java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>         at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1472)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>         at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
>         at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>         at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
>         at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
>         at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
>         at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:73)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend.getDriverLogUrls(YarnClusterSchedulerBackend.scala:73)
>         at org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:1993)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:544)
>         at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>         at py4j.Gateway.invoke(Gateway.java:214)
>         at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
>         at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
>         at py4j.GatewayConnection.run(GatewayConnection.java:207)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
>         at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
>         at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1438)
>         ... 30 more
>
> If not using --principal and --keytab the same error shows.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
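The fix suggested in the comment above serializes the delegation tokens into a ByteBuffer and attaches that buffer to the AM ContainerLaunchContext. The JDK-only sketch below illustrates the wrapping pattern without Hadoop on the classpath; the class name TokenBufferSketch and the writeTokens helper are hypothetical stand-ins (Spark's actual code would use Hadoop's Credentials.writeTokenStorageToStream and DataOutputBuffer, whose getData()/getLength() pair is exactly what ByteBuffer.wrap(data, 0, length) consumes):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

public class TokenBufferSketch {
    // Hypothetical stand-in for Credentials.writeTokenStorageToStream:
    // serialize a token count followed by length-prefixed token blobs.
    static byte[] writeTokens(byte[][] tokens) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dob = new DataOutputStream(bos);
        dob.writeInt(tokens.length);
        for (byte[] t : tokens) {
            dob.writeInt(t.length);
            dob.write(t);
        }
        dob.flush();
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[][] tokens = {
            "hdfs-delegation-token".getBytes(),
            "rm-token".getBytes()
        };
        byte[] data = writeTokens(tokens);

        // Mirrors the suggested fix in yarn.Client's createContainerLaunchContext:
        //   ByteBuffer fsTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());
        //   amContainer.setTokens(fsTokens);
        // Wrapping (data, 0, length) hands the AM exactly the serialized bytes,
        // which YARN then makes available to the launched container.
        ByteBuffer fsTokens = ByteBuffer.wrap(data, 0, data.length);
        System.out.println("token buffer bytes: " + fsTokens.remaining());
    }
}
```

Without this step the AM container starts with no delegation tokens, so its first RPC to a kerberized service fails with the "Client cannot authenticate via:[TOKEN, KERBEROS]" error seen in the log above.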