Hello all — tl;dr: I'm having trouble running spark-shell from my laptop (or any other non-cluster-affiliated machine), and I think the problem boils down to usernames. Can I convince Spark/Scala that I'm someone other than $USER?
A bit of background: our cluster is CDH 5.4.8, installed with Cloudera Manager 5.5. We use LDAP, and my login on all Hadoop-affiliated machines (including the gateway boxes we use for running scheduled work) is 'matt.tenenbaum'. When I run spark-shell on one of those machines, everything is fine:

    [matt.tenenbaum@remote-machine ~]$ HADOOP_CONF_DIR=/etc/hadoop/conf SPARK_HOME=spark-1.6.0-bin-hadoop2.6 spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn --deploy-mode client

Everything starts up correctly, I get a scala prompt, the SparkContext and SQL context are correctly initialized, and I'm off to the races:

    16/04/01 23:27:00 INFO session.SessionState: Created local directory: /tmp/35b58974-dad5-43c6-9864-43815d101ca0_resources
    16/04/01 23:27:00 INFO session.SessionState: Created HDFS directory: /tmp/hive/matt.tenenbaum/35b58974-dad5-43c6-9864-43815d101ca0
    16/04/01 23:27:00 INFO session.SessionState: Created local directory: /tmp/matt.tenenbaum/35b58974-dad5-43c6-9864-43815d101ca0
    16/04/01 23:27:00 INFO session.SessionState: Created HDFS directory: /tmp/hive/matt.tenenbaum/35b58974-dad5-43c6-9864-43815d101ca0/_tmp_space.db
    16/04/01 23:27:00 INFO repl.SparkILoop: Created sql context (with Hive support)..
    SQL context available as sqlContext.

    scala> 1 + 41
    res0: Int = 42

    scala> sc
    res1: org.apache.spark.SparkContext = org.apache.spark.SparkContext@4e9bd2c8

I am running 1.6 from a downloaded tgz file, rather than the spark-shell made available to the cluster from CDH. I can copy that tgz to my laptop, grab a copy of the cluster configurations, and in a perfect world I would then be able to run everything the same way:

    [matt@laptop ~]$ HADOOP_CONF_DIR=path/to/hadoop/conf SPARK_HOME=spark-1.6.0-bin-hadoop2.6 spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn --deploy-mode client

Notice that two things are different:

1. My local username on my laptop is 'matt', which does not match my name on the remote machine.
2. The Hadoop configs live somewhere other than /etc/hadoop/conf.

Alas, #1 proves fatal because of cluster permissions (there is no /user/matt/ in HDFS, and 'matt' is not a valid LDAP user). In the initialization logging output, I can see it fail in the expected way:

    16/04/01 16:37:19 INFO yarn.Client: Setting up container launch context for our AM
    16/04/01 16:37:19 INFO yarn.Client: Setting up the launch environment for our AM container
    16/04/01 16:37:19 INFO yarn.Client: Preparing resources for our AM container
    16/04/01 16:37:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    16/04/01 16:37:21 ERROR spark.SparkContext: Error initializing SparkContext.
    org.apache.hadoop.security.AccessControlException: Permission denied: user=matt, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
        at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
        at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
        at (... etc ...)

Fine. In other circumstances I've told Hadoop explicitly who I am by setting HADOOP_USER_NAME. Maybe that works here?

    [matt@laptop ~]$ HADOOP_USER_NAME=matt.tenenbaum HADOOP_CONF_DIR=soma-conf SPARK_HOME=spark-1.6.0-bin-hadoop2.6 spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn --deploy-mode client

Eventually that fails too, but not for the same reason.
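(As a sanity check on whether HADOOP_USER_NAME is actually being picked up, something like the following can be pasted at the scala> prompt. This is just a sketch, and it assumes the cluster uses simple authentication rather than Kerberos, since as far as I know that is the only mode in which UserGroupInformation honours HADOOP_USER_NAME.)

    // Which identity do the Hadoop client libraries think I have?
    // UserGroupInformation is on the classpath because the Spark
    // assembly bundles the Hadoop client.
    import org.apache.hadoop.security.UserGroupInformation

    val ugi = UserGroupInformation.getCurrentUser
    println(ugi.getShortUserName)
    // prints "matt" on the laptop without HADOOP_USER_NAME set,
    // and "matt.tenenbaum" once it is exported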
Setting HADOOP_USER_NAME is sufficient to allow initialization to get past the access-control problems, and I can see it request a new application from the cluster:

    16/04/01 16:43:08 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
    16/04/01 16:43:08 INFO yarn.Client: Setting up container launch context for our AM
    16/04/01 16:43:08 INFO yarn.Client: Setting up the launch environment for our AM container
    16/04/01 16:43:08 INFO yarn.Client: Preparing resources for our AM container
    ... [resource uploads happen here] ...
    16/04/01 16:46:16 INFO spark.SecurityManager: Changing view acls to: matt,matt.tenenbaum
    16/04/01 16:46:16 INFO spark.SecurityManager: Changing modify acls to: matt,matt.tenenbaum
    16/04/01 16:46:16 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(matt, matt.tenenbaum); users with modify permissions: Set(matt, matt.tenenbaum)
    16/04/01 16:46:16 INFO yarn.Client: Submitting application 30965 to ResourceManager
    16/04/01 16:46:16 INFO impl.YarnClientImpl: Submitted application application_1451332794331_30965
    16/04/01 16:46:17 INFO yarn.Client: Application report for application_1451332794331_30965 (state: ACCEPTED)
    16/04/01 16:46:17 INFO yarn.Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: root.matt_dot_tenenbaum
         start time: 1459554373844
         final status: UNDEFINED
         tracking URL: http://resource-manager:8088/proxy/application_1451332794331_30965/
         user: matt.tenenbaum
    16/04/01 16:46:19 INFO yarn.Client: Application report for application_1451332794331_30965 (state: ACCEPTED)

but this AM never switches state from ACCEPTED to RUNNING. Eventually it times out and kills the AM:

    16/04/01 16:50:14 INFO yarn.Client: Application report for application_1451332794331_30965 (state: FAILED)
    16/04/01 16:50:14 INFO yarn.Client:
         client token: N/A
         diagnostics: Application application_1451332794331_30965 failed 2 times due to AM Container for appattempt_1451332794331_30965_000002 exited with exitCode: 10
    For more detailed output, check application tracking page:http://resource-manager:8088/proxy/application_1451332794331_30965/Then, click on links to logs of each attempt.
    Diagnostics: Exception from container-launch.
    Container id: container_e43_1451332794331_30965_02_000001
    Exit code: 10
    Stack trace: ExitCodeException exitCode=10:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
        at org.apache.hadoop.util.Shell.run(Shell.java:460)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:293)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

    Shell output: main : command provided 1
    main : user is yarn
    main : requested yarn user is matt.tenenbaum

    Container exited with a non-zero exit code 10
    Failing this attempt. Failing the application.
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: root.matt_dot_tenenbaum
         start time: 1459554373844
         final status: FAILED
         tracking URL: http://resource-manager:8088/cluster/app/application_1451332794331_30965
         user: matt.tenenbaum
    16/04/01 16:50:15 ERROR spark.SparkContext: Error initializing SparkContext.
    org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
        at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
        at $line3.$read$$iwC$$iwC.<init>(<console>:15)
        at $line3.$read$$iwC.<init>(<console>:24)
        at $line3.$read.<init>(<console>:26)
        at $line3.$read$.<init>(<console>:30)
        at $line3.$read$.<clinit>(<console>)
        at $line3.$eval$.<init>(<console>:7)
        at $line3.$eval$.<clinit>(<console>)
        at $line3.$eval.$print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:125)
        at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
        at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
        at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
        at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
        at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
        at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
        at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
        at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

In the end, I'm left at a scala prompt but (obviously) without sc or sqlContext:

    <console>:16: error: not found: value sqlContext
           import sqlContext.implicits._
                  ^
    <console>:16: error: not found: value sqlContext
           import sqlContext.sql
                  ^

    scala>

A bit of googling and reading on Stack Overflow suggests this all boils down to the SecurityManager, and to the difference between running on the remote machine, where the shell user matches the expected Hadoop user (so spark.SecurityManager sees Set(matt.tenenbaum)), and running on my laptop, where the SecurityManager sees Set(matt, matt.tenenbaum). I also tried manually setting the SPARK_IDENT_STRING and USER environment variables to "matt.tenenbaum", but that doesn't change the outcome.

Am I even on the right track? Is this because of a mismatch between who I am on my laptop and who the cluster wants me to be? Is there any way to convince my local spark-shell invocation that I'm "matt.tenenbaum", not "matt"?

Thank you for reading this far, and for any suggestions
-mt
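PS: For what it's worth, my (possibly wrong) reading of the Spark 1.6 source is that the two names in that ACL set come from the JVM's user.name property plus Utils.getCurrentUserName(), and that the latter consults a SPARK_USER environment variable before falling back to Hadoop's UserGroupInformation. Roughly the following, as a sketch of my understanding rather than the actual Spark code:

    import org.apache.hadoop.security.UserGroupInformation

    // Approximately what I believe Utils.getCurrentUserName() does:
    // an explicit SPARK_USER wins, otherwise Hadoop's UGI (which is
    // where HADOOP_USER_NAME comes in) decides who I am.
    def currentUserName(): String =
      sys.env.get("SPARK_USER")
        .getOrElse(UserGroupInformation.getCurrentUser.getShortUserName)

    // The SecurityManager then seems to seed its ACLs with both the
    // OS-level user and the name above, which would explain
    // Set(matt, matt.tenenbaum) on the laptop versus
    // Set(matt.tenenbaum) on the gateway box.
    val defaultAclUsers = Set(System.getProperty("user.name", ""), currentUserName())

If that reading is right, SPARK_USER might be a more relevant variable than SPARK_IDENT_STRING or USER, but I may well be misreading the code.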