Hello all —

tl;dr: I’m having trouble running spark-shell from my laptop (or any other
non-cluster-affiliated machine), and I think it boils down to usernames.
Can I convince Spark that I’m someone other than $USER?

A bit of background: our cluster is CDH 5.4.8, installed with Cloudera
Manager 5.5. We use LDAP, and my login on all hadoop-affiliated machines
(including the gateway boxes we use for running scheduled work) is
‘matt.tenenbaum’. When I run spark-shell on one of those machines,
everything is fine:

[matt.tenenbaum@remote-machine ~]$ HADOOP_CONF_DIR=/etc/hadoop/conf \
    SPARK_HOME=spark-1.6.0-bin-hadoop2.6 \
    spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn --deploy-mode client

Everything starts up correctly: I get a scala prompt, the SparkContext and
SQL context are initialized, and I’m off to the races:

16/04/01 23:27:00 INFO session.SessionState: Created local directory:
/tmp/35b58974-dad5-43c6-9864-43815d101ca0_resources
16/04/01 23:27:00 INFO session.SessionState: Created HDFS directory:
/tmp/hive/matt.tenenbaum/35b58974-dad5-43c6-9864-43815d101ca0
16/04/01 23:27:00 INFO session.SessionState: Created local directory:
/tmp/matt.tenenbaum/35b58974-dad5-43c6-9864-43815d101ca0
16/04/01 23:27:00 INFO session.SessionState: Created HDFS directory:
/tmp/hive/matt.tenenbaum/35b58974-dad5-43c6-9864-43815d101ca0/_tmp_space.db
16/04/01 23:27:00 INFO repl.SparkILoop: Created sql context (with Hive
support)..
SQL context available as sqlContext.

scala> 1 + 41
res0: Int = 42

scala> sc
res1: org.apache.spark.SparkContext = org.apache.spark.SparkContext@4e9bd2c8

I am running 1.6 from a downloaded tgz file, rather than the spark-shell
CDH makes available on the cluster. I can copy that tgz to my laptop, grab
a copy of the cluster configurations, and in a perfect world I would then
be able to run everything the same way:

[matt@laptop ~]$ HADOOP_CONF_DIR=path/to/hadoop/conf \
    SPARK_HOME=spark-1.6.0-bin-hadoop2.6 \
    spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn --deploy-mode client

Notice there are two things that are different:

   1. My local username on my laptop is ‘matt’, which does not match my
   username on the cluster machines.
   2. The Hadoop configs live somewhere other than /etc/hadoop/conf (I
   copied them down into a local directory, roughly as sketched below).
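
Pulling the configs down is nothing fancy; something like the following is
all it takes (the destination path is arbitrary, it just has to match
whatever HADOOP_CONF_DIR points at):

[matt@laptop ~]$ rsync -a matt.tenenbaum@remote-machine:/etc/hadoop/conf/ path/to/hadoop/conf/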

Alas, #1 proves fatal because of cluster permissions (there is no
/user/matt/ in HDFS, and ‘matt’ is not a valid LDAP user). In the
initialization logging output, I can see it fail in the expected way:

16/04/01 16:37:19 INFO yarn.Client: Setting up container launch
context for our AM
16/04/01 16:37:19 INFO yarn.Client: Setting up the launch environment
for our AM container
16/04/01 16:37:19 INFO yarn.Client: Preparing resources for our AM container
16/04/01 16:37:20 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
16/04/01 16:37:21 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: Permission denied:
user=matt, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
    at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
    at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
    at (... etc ...)
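
For what it’s worth, a quick check from one of the gateway boxes confirms
the missing home directory; the stock hdfs CLI simply reports that no such
path exists:

[matt.tenenbaum@remote-machine ~]$ hdfs dfs -ls /user/matt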

Fine. In other circumstances I’ve told Hadoop explicitly who I am by
setting HADOOP_USER_NAME. Maybe that works here?

[matt@laptop ~]$ HADOOP_USER_NAME=matt.tenenbaum \
    HADOOP_CONF_DIR=soma-conf SPARK_HOME=spark-1.6.0-bin-hadoop2.6 \
    spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn --deploy-mode client

Eventually that fails too, but not for the same reason. Setting
HADOOP_USER_NAME is enough to get initialization past the access-control
problems, and I can see it request a new application from the cluster:

16/04/01 16:43:08 INFO yarn.Client: Will allocate AM container, with
896 MB memory including 384 MB overhead
16/04/01 16:43:08 INFO yarn.Client: Setting up container launch
context for our AM
16/04/01 16:43:08 INFO yarn.Client: Setting up the launch environment
for our AM container
16/04/01 16:43:08 INFO yarn.Client: Preparing resources for our AM container
... [resource uploads happen here] ...
16/04/01 16:46:16 INFO spark.SecurityManager: Changing view acls to:
matt,matt.tenenbaum
16/04/01 16:46:16 INFO spark.SecurityManager: Changing modify acls to:
matt,matt.tenenbaum
16/04/01 16:46:16 INFO spark.SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view
permissions: Set(matt, matt.tenenbaum); users with modify permissions:
Set(matt, matt.tenenbaum)
16/04/01 16:46:16 INFO yarn.Client: Submitting application 30965 to
ResourceManager
16/04/01 16:46:16 INFO impl.YarnClientImpl: Submitted application
application_1451332794331_30965
16/04/01 16:46:17 INFO yarn.Client: Application report for
application_1451332794331_30965 (state: ACCEPTED)
16/04/01 16:46:17 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: root.matt_dot_tenenbaum
     start time: 1459554373844
     final status: UNDEFINED
     tracking URL:
http://resource-manager:8088/proxy/application_1451332794331_30965/
     user: matt.tenenbaum
16/04/01 16:46:19 INFO yarn.Client: Application report for
application_1451332794331_30965 (state: ACCEPTED)

but the AM never switches from ACCEPTED to RUNNING. Eventually the attempt
times out and the AM is killed:

16/04/01 16:50:14 INFO yarn.Client: Application report for
application_1451332794331_30965 (state: FAILED)
16/04/01 16:50:14 INFO yarn.Client:
     client token: N/A
     diagnostics: Application application_1451332794331_30965 failed 2
times due to AM Container for appattempt_1451332794331_30965_000002
exited with  exitCode: 10
For more detailed output, check application tracking
page:http://resource-manager:8088/proxy/application_1451332794331_30965/Then,
click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e43_1451332794331_30965_02_000001
Exit code: 10
Stack trace: ExitCodeException exitCode=10:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
    at org.apache.hadoop.util.Shell.run(Shell.java:460)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
    at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:293)
    at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Shell output: main : command provided 1
main : user is yarn
main : requested yarn user is matt.tenenbaum

Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: root.matt_dot_tenenbaum
     start time: 1459554373844
     final status: FAILED
     tracking URL:
http://resource-manager:8088/cluster/app/application_1451332794331_30965
     user: matt.tenenbaum
16/04/01 16:50:15 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended!
It might have been killed or unable to launch application master.
    at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
    at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
    at 
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
    at 
org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
    at $line3.$read$$iwC$$iwC.<init>(<console>:15)
    at $line3.$read$$iwC.<init>(<console>:24)
    at $line3.$read.<init>(<console>:26)
    at $line3.$read$.<init>(<console>:30)
    at $line3.$read$.<clinit>(<console>)
    at $line3.$eval$.<init>(<console>:7)
    at $line3.$eval$.<clinit>(<console>)
    at $line3.$eval.$print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at 
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at 
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at 
org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:125)
    at 
org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
    at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
    at 
org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
    at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
    at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
    at 
org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
    at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
    at 
org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
    at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
    at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
    at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at 
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

In the end, I’m left at a scala prompt but (obviously) without sc or
sqlContext:

<console>:16: error: not found: value sqlContext
         import sqlContext.implicits._
                ^
<console>:16: error: not found: value sqlContext
         import sqlContext.sql
                ^

scala>

A bit of googling and reading on Stack Overflow suggests that this all
boils down to Spark’s SecurityManager, and the difference between running
on the remote machine, where the shell user matches the expected Hadoop
user (so the SecurityManager sees Set(matt.tenenbaum)), and running on my
laptop, where it sees Set(matt, matt.tenenbaum). I also tried manually
setting the SPARK_IDENT_STRING and USER environment variables to
“matt.tenenbaum”, but that doesn’t change the outcome.
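
For reference, that attempt looked roughly like this (HADOOP_USER_NAME still
set; I’m honestly not sure either of the extra variables is consulted on
this code path, so treat it as a record of what I tried rather than a
recommendation):

[matt@laptop ~]$ SPARK_IDENT_STRING=matt.tenenbaum USER=matt.tenenbaum \
    HADOOP_USER_NAME=matt.tenenbaum HADOOP_CONF_DIR=soma-conf \
    SPARK_HOME=spark-1.6.0-bin-hadoop2.6 \
    spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn --deploy-mode client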

Am I even on the right track? Is this because of a mismatch between who I
am on my laptop and who the cluster wants me to be? Is there any way to
convince my local spark-shell invocation that I’m “matt.tenenbaum”, not
“matt”?

Thank you for reading this far, and for any suggestions
-mt
