[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680109#comment-14680109 ]

Thomas Graves commented on SPARK-9019:
--------------------------------------

We forgot to close out the JIRA. This was fixed by SPARK-8988. See the comments in the PR if you are interested.

spark-submit fails on yarn with kerberos enabled

Key: SPARK-9019
URL: https://issues.apache.org/jira/browse/SPARK-9019
Project: Spark
Issue Type: Bug
Components: Spark Submit
Affects Versions: 1.5.0
Environment: Hadoop 2.6 with YARN and kerberos enabled
Reporter: Bolke de Bruin
Labels: kerberos, spark-submit, yarn
Attachments: debug-log-spark-1.5-fail, spark-submit-log-1.5.0-fail

It is not possible to run jobs using spark-submit on yarn with a kerberized cluster.

Commandline:
/usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py

Fails with:
15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380
15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380.
15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380
15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470)
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
    at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
    at
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662211#comment-14662211 ]

Steve Loughran commented on SPARK-9019:
---------------------------------------

# If this problem exists (I don't have a test setup right now, for various reasons), then it is a regression from 1.3.
# Like Thomas says, RM client tokens should get down to the AM automatically. If, however, these tokens are needed in the containers, then a delegation token is going to be needed; presumably that is what this patch does. However, that token will expire, and then a new one is needed; SPARK-5342 was meant to address that. It should be creating the tokens and providing them on demand. Something is playing up there.

Regarding the patch, I don't know how well it would work in an RM-HA environment. Someone who understands the details of HA YARN would need to look at it.
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633623#comment-14633623 ]

Thomas Graves commented on SPARK-9019:
--------------------------------------

The RM delegation token is only needed if the application is doing things like submitting other applications, killing applications, etc. Oozie uses this to launch jobs. We do not need to acquire it just to run the Spark application on YARN. Are you doing something special to try to launch another job?
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633807#comment-14633807 ]

Bolke de Bruin commented on SPARK-9019:
---------------------------------------

As mentioned in the PR, the traces from the different clusters in the PR (not here) are from running the Pi example. My analysis shows that there has been a behavior change between Spark 1.3.0 and Spark 1.5. It could be helpful if someone else did the same with debug logging turned on for the application and compared the result with mine.
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632341#comment-14632341 ]

Bolke de Bruin commented on SPARK-9019:
---------------------------------------

I have created PR-#7489 for this issue.
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632340#comment-14632340 ]

Apache Spark commented on SPARK-9019:
-------------------------------------

User 'bolkedebruin' has created a pull request for this issue:
https://github.com/apache/spark/pull/7489
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631367#comment-14631367 ]

Bolke de Bruin commented on SPARK-9019:
---------------------------------------

Ok. I tracked the log entry down to a missing ResourceManager delegation token. To get rid of it, the following needs to be added to Client.scala in prepareLocalStorage, with the relevant imports of course. I am currently testing whether this solves the final issues as well; if so, I will prepare a patch.

    // Imports needed by the added lines
    import org.apache.hadoop.io.Text
    import org.apache.hadoop.security.SecurityUtil
    import org.apache.hadoop.yarn.conf.YarnConfiguration
    import org.apache.hadoop.yarn.util.ConverterUtils

    obtainTokenForHBase(hadoopConf, credentials)

    // Request an RM delegation token and add it to the application's credentials
    logInfo("Requesting RM delegation token")
    val rmAddress = hadoopConf.getSocketAddr(YarnConfiguration.RM_ADDRESS,
      YarnConfiguration.DEFAULT_RM_ADDRESS, YarnConfiguration.DEFAULT_RM_PORT)
    val renewer = SecurityUtil.getServerPrincipal(
      hadoopConf.get(YarnConfiguration.RM_PRINCIPAL), rmAddress.getHostName)
    val protoToken = yarnClient.getRMDelegationToken(new Text(renewer))
    val token = ConverterUtils.convertFromYarn(protoToken, rmAddress)
    credentials.addToken(new Text(token.getService), token)
    logInfo(s"RM delegation token added: service ${token.getService} with renewer ${renewer} " +
      s"host was ${rmAddress.getHostName} and principal ${hadoopConf.get(YarnConfiguration.RM_PRINCIPAL)}")
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629432#comment-14629432 ]

Bolke de Bruin commented on SPARK-9019:
---------------------------------------

I tried running this on an updated environment; however, it still fails, although the behavior is a bit different now. The task is now accepted but stays in the running state forever without executing anything. Please note that the trace below was captured without keytab usage, but with an authorized user (kinit admin/admin).

15/07/16 04:27:34 DEBUG Client: getting client out of cache: org.apache.hadoop.ipc.Client@53abb73
15/07/16 04:27:34 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(ReviveOffers,false) from Actor[akka://sparkDriver/deadLetters]
15/07/16 04:27:34 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(ReviveOffers,false)
15/07/16 04:27:34 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (1.632126 ms) AkkaMessage(ReviveOffers,false) from Actor[akka://sparkDriver/deadLetters]
15/07/16 04:27:34 DEBUG AbstractService: Service org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl is started
15/07/16 04:27:34 DEBUG AbstractService: Service org.apache.hadoop.yarn.client.api.impl.YarnClientImpl is started
15/07/16 04:27:34 DEBUG Client: The ping interval is 6 ms.
15/07/16 04:27:34 DEBUG Client: Connecting to node6.local/10.79.10.6:8050
15/07/16 04:27:34 DEBUG UserGroupInformation: PrivilegedAction as:admin (auth:SIMPLE) from:org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
15/07/16 04:27:34 DEBUG SaslRpcClient: Sending sasl message state: NEGOTIATE
15/07/16 04:27:34 DEBUG SaslRpcClient: Received SASL message state: NEGOTIATE
    auths { method: TOKEN mechanism: DIGEST-MD5 protocol: serverId: default challenge: realm=\default\,nonce=\wjgFp9L22uDJt41FNtY9M8CP/T+dswfBoF48r9+s\,qop=\auth\,charset=utf-8,algorithm=md5-sess }
    auths { method: KERBEROS mechanism: GSSAPI protocol: rm serverId: node6.local }
15/07/16 04:27:34 DEBUG SaslRpcClient: Get token info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:org.apache.hadoop.yarn.security.client.ClientRMSecurityInfo$2@69990fa7
15/07/16 04:27:34 DEBUG RMDelegationTokenSelector: Looking for a token with service 10.79.10.6:8050
15/07/16 04:27:34 DEBUG RMDelegationTokenSelector: Token kind is YARN_AM_RM_TOKEN and the token's service name is
15/07/16 04:27:34 DEBUG RMDelegationTokenSelector: Token kind is HIVE_DELEGATION_TOKEN and the token's service name is
15/07/16 04:27:34 DEBUG RMDelegationTokenSelector: Token kind is TIMELINE_DELEGATION_TOKEN and the token's service name is 10.79.10.6:8188
15/07/16 04:27:34 DEBUG RMDelegationTokenSelector: Token kind is HDFS_DELEGATION_TOKEN and the token's service name is 10.79.10.4:8020
15/07/16 04:27:34 DEBUG UserGroupInformation: PrivilegedActionException as:admin (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/16 04:27:34 DEBUG UserGroupInformation: PrivilegedAction as:admin (auth:SIMPLE) from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
15/07/16 04:27:34 WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/16 04:27:34 DEBUG UserGroupInformation: PrivilegedActionException as:admin (auth:SIMPLE) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/16 04:27:34 DEBUG Client: closing ipc connection to node6.local/10.79.10.6:8050: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:680)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:730)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625980#comment-14625980 ]

Bolke de Bruin commented on SPARK-9019:
---------------------------------------

Tracing this down, it seems that the tokens are not being set on the container in yarn.Client, which is required according to http://aajisaka.github.io/hadoop-project/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html. Something like this:

    ByteBuffer fsTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());
    amContainer.setTokens(fsTokens);

in createContainerLaunchContext of yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626005#comment-14626005 ]

Sean Owen commented on SPARK-9019:
----------------------------------

Same as SPARK-8851?
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626043#comment-14626043 ]

Bolke de Bruin commented on SPARK-9019:
---------------------------------------

Will try in a few minutes; however, it did not happen only when using keytabs, but also when using the user's own credentials.
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626183#comment-14626183 ]

Bolke de Bruin commented on SPARK-9019:
---------------------------------------

[~srowen] Unfortunately, the patch from SPARK-8851 did not solve the issue. The trace remains the same.
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626113#comment-14626113 ]

Bolke de Bruin commented on SPARK-9019:
---------------------------------------

Now with debug info (not yet with patch):

15/07/14 11:03:49 DEBUG UserGroupInformation: PrivilegedAction as:yx66jx (auth:SIMPLE) from:org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
15/07/14 11:03:49 DEBUG SaslRpcClient: Sending sasl message state: NEGOTIATE
15/07/14 11:03:49 DEBUG SaslRpcClient: Received SASL message state: NEGOTIATE
    auths { method: TOKEN mechanism: DIGEST-MD5 protocol: serverId: default challenge: realm=\default\,nonce=\XXX\,qop=\auth\,charset=utf-8,algorithm=md5-sess }
    auths { method: KERBEROS mechanism: GSSAPI protocol: rm serverId: lxhnl002.ad.ing.net }
15/07/14 11:03:49 DEBUG SaslRpcClient: Get token info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:org.apache.hadoop.yarn.security.client.ClientRMSecurityInfo$2@5c53714b
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Looking for a token with service 10.111.114.16:8032
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is YARN_AM_RM_TOKEN and the token's service name is
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is HIVE_DELEGATION_TOKEN and the token's service name is
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is TIMELINE_DELEGATION_TOKEN and the token's service name is 10.111.114.16:8188
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is HDFS_DELEGATION_TOKEN and the token's service name is 10.111.114.16:8020
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is HDFS_DELEGATION_TOKEN and the token's service name is 10.111.114.17:8020
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is HDFS_DELEGATION_TOKEN and the token's service name is ha-hdfs:hdpnlcb
15/07/14 11:03:49 DEBUG UserGroupInformation: PrivilegedActionException as:yx66jx (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/14 11:03:49 DEBUG UserGroupInformation: PrivilegedAction as:yx66jx (auth:SIMPLE) from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
15/07/14 11:03:49 WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/14 11:03:49 DEBUG UserGroupInformation: PrivilegedActionException as:yx66jx (auth:SIMPLE) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]

auth:SIMPLE is what worries me.
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626256#comment-14626256 ]

Bolke de Bruin commented on SPARK-9019:
---------------------------------------

And some more debugging information. Please note the selected auth:SIMPLE method.

15/07/14 11:03:45 INFO ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
15/07/14 11:03:45 DEBUG Shell: setsid exited with exit code 0
15/07/14 11:03:45 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(value=[Rate of successful kerberos logins and latency (milliseconds)], about=, valueName=Time, type=DEFAULT, always=false, sampleName=Ops)
15/07/14 11:03:45 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(value=[Rate of failed kerberos logins and latency (milliseconds)], about=, valueName=Time, type=DEFAULT, always=false, sampleName=Ops)
15/07/14 11:03:45 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(value=[GetGroups], about=, valueName=Time, type=DEFAULT, always=false, sampleName=Ops)
15/07/14 11:03:45 DEBUG MetricsSystemImpl: UgiMetrics, User and group related metrics
15/07/14 11:03:45 DEBUG Groups: Creating new Groups object
15/07/14 11:03:45 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
15/07/14 11:03:45 DEBUG NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
15/07/14 11:03:45 DEBUG NativeCodeLoader: java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
15/07/14 11:03:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/14 11:03:45 DEBUG PerformanceAdvisory: Falling back to shell based
15/07/14 11:03:45 DEBUG JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
15/07/14 11:03:45 DEBUG Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=30; warningDeltaMs=5000
15/07/14 11:03:45 DEBUG YarnSparkHadoopUtil: running as user: yx66jx
15/07/14 11:03:45 DEBUG UserGroupInformation: hadoop login
15/07/14 11:03:45 DEBUG UserGroupInformation: hadoop login commit
15/07/14 11:03:45 DEBUG UserGroupInformation: using kerberos user:null
15/07/14 11:03:45 DEBUG UserGroupInformation: using local user:UnixPrincipal: yx66jx
15/07/14 11:03:45 DEBUG UserGroupInformation: Using user: UnixPrincipal: yx66jx with name yx66jx
15/07/14 11:03:45 DEBUG UserGroupInformation: User entry: yx66jx
15/07/14 11:03:45 DEBUG UserGroupInformation: UGI loginUser:yx66jx (auth:KERBEROS)
15/07/14 11:03:45 DEBUG UserGroupInformation: PrivilegedAction as:yx66jx (auth:SIMPLE) from:org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
15/07/14 11:03:46 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1436783220608_0085_01
15/07/14 11:03:46 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
15/07/14 11:03:46 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = true
15/07/14 11:03:46 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
15/07/14 11:03:46 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
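The contrast pointed out in the comment above (a login user reported as auth:KERBEROS, followed by a PrivilegedAction running as auth:SIMPLE) can be pulled out of a long debug log mechanically. The following is a minimal Python sketch, not part of Spark or Hadoop: the regular expression and the function names `auth_events` and `auth_mismatches` are assumptions for illustration, written against the line shapes quoted in this thread. It only surfaces the relevant lines; whether such a mismatch is the actual cause of the failure is not established here.

```python
import re

# Matches the three UserGroupInformation line shapes seen in this thread.
# The pattern is an assumption based on the quoted log lines only.
UGI_LINE = re.compile(
    r"UserGroupInformation: "
    r"(UGI loginUser|PrivilegedAction as|PrivilegedActionException as)"
    r":(\S+) \(auth:(\w+)\)"
)


def auth_events(log_lines):
    """Return (event, user, auth_method) for every matching log line."""
    events = []
    for line in log_lines:
        match = UGI_LINE.search(line)
        if match:
            events.append(match.groups())
    return events


def auth_mismatches(events):
    """Return (user, login_auth, action_auth) where an action's auth
    method differs from the one reported for that user's login."""
    login_auth = {}
    mismatches = []
    for event, user, auth in events:
        if event == "UGI loginUser":
            login_auth[user] = auth
        elif user in login_auth and auth != login_auth[user]:
            mismatches.append((user, login_auth[user], auth))
    return mismatches


if __name__ == "__main__":
    sample = [
        "15/07/14 11:03:45 DEBUG UserGroupInformation: UGI loginUser:yx66jx (auth:KERBEROS)",
        "15/07/14 11:03:45 DEBUG UserGroupInformation: PrivilegedAction as:yx66jx (auth:SIMPLE) "
        "from:org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)",
    ]
    for user, login, action in auth_mismatches(auth_events(sample)):
        print(f"{user}: login auth {login}, action auth {action}")
```

Fed a full ApplicationMaster log split into lines, the same two functions list every user whose actions run under a different auth method than the login reported.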
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626300#comment-14626300 ] Bolke de Bruin commented on SPARK-9019: ---
It might be that we have a configuration issue (but I'm not sure):
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Looking for a token with service 10.111.114.16:8032
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is YARN_AM_RM_TOKEN and the token's service name is
I think that should match.
spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with: 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT 15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380. 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470) 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/ 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms. 
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731) at org.apache.hadoop.ipc.Client.call(Client.java:1472) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source) at
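The two RMDelegationTokenSelector debug lines quoted in the comment above suggest a lookup by token kind and service address. A minimal Python model of such a lookup is sketched below. It is not the Hadoop implementation: the `Token` class, the `select_token` function and the `RM_DELEGATION_TOKEN` kind name are assumptions made for illustration, and the real selector may apply further rules. Under these assumptions it shows why a token of kind YARN_AM_RM_TOKEN with the service name as printed in the log would not be selected for service 10.111.114.16:8032.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumption: the kind name a delegation-token selector would look for.
RM_DELEGATION_KIND = "RM_DELEGATION_TOKEN"


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a security token: only kind and service."""
    kind: str
    service: str


def select_token(service: str,
                 tokens: Iterable[Token],
                 wanted_kind: str = RM_DELEGATION_KIND) -> Optional[Token]:
    """Return the first token whose kind and service both match, else None."""
    for token in tokens:
        if token.kind == wanted_kind and token.service == service:
            return token
    return None


# Values taken from the debug lines in the comment; the service name is
# modeled as an empty string because nothing follows "service name is".
credentials = [Token(kind="YARN_AM_RM_TOKEN", service="")]
print(select_token("10.111.114.16:8032", credentials))
```

In this model the lookup fails on both counts: the kind differs and the service string does not equal the ResourceManager address. Whether that is the actual cause of the failure here is not established by the log lines alone.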
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626574#comment-14626574 ] Bolke de Bruin commented on SPARK-9019: --- Can this be related to YARN-3103? spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with: 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT 15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380. 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470. 
15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470) 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/ 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms. 
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731) at org.apache.hadoop.ipc.Client.call(Client.java:1472) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475) at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92) at
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625349#comment-14625349 ] Bolke de Bruin commented on SPARK-9019: ---
Please note that the keytab was successfully uploaded:
15/07/13 22:48:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/13 22:48:18 INFO yarn.Client: Attempting to login to the Kerberos using principal: sparkjob and keytab: /etc/security/keytabs/sparkjob.keytab
15/07/13 22:48:18 INFO security.UserGroupInformation: Login successful for user sparkjob using keytab file /etc/security/keytabs/sparkjob.keytab
15/07/13 22:48:18 INFO yarn.Client: Successfully logged into the KDC.
spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with: 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT 15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380. 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470. 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470) 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/ 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms. 
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731) at org.apache.hadoop.ipc.Client.call(Client.java:1472) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
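The logs in this thread show a successful keytab login on the client followed by an authentication failure and a ResourceManager failover in the driver. When triaging similar output, it can help to pull those three markers out of a long, run-together log. The sketch below is one possible way to do that, not part of Spark or Hadoop: the `PATTERNS` table and the `scan` function are hypothetical names, and the regular expressions are derived only from the message texts quoted above.

```python
import re
from typing import Dict, List

# Patterns based on the message texts quoted in this thread.
PATTERNS = {
    "keytab_login_ok": re.compile(
        r"Login successful for user (\S+) using keytab file (\S+)"),
    "auth_failure": re.compile(
        r"Client cannot authenticate via:\[([^\]]*)\]"),
    "rm_failover": re.compile(r"Failing over to (\S+)"),
}


def scan(log_text: str) -> Dict[str, List[str]]:
    """Collect every match of each pattern, even if log entries share a line."""
    hits: Dict[str, List[str]] = {name: [] for name in PATTERNS}
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(log_text):
            hits[name].append(match.group(0))
    return hits


# Sample text assembled from entries quoted in the thread.
SAMPLE = (
    "15/07/13 22:48:18 INFO security.UserGroupInformation: Login successful "
    "for user sparkjob using keytab file /etc/security/keytabs/sparkjob.keytab "
    "15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting "
    "to the server : org.apache.hadoop.security.AccessControlException: "
    "Client cannot authenticate via:[TOKEN, KERBEROS] "
    "15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: "
    "Failing over to rm2"
)

for name, found in scan(SAMPLE).items():
    print(name, len(found))
```

Seeing a login success together with a later "Client cannot authenticate" entry, as here, indicates that the client-side login worked while a later connection to the ResourceManager did not authenticate.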