Github user skonto commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20945#discussion_r178971058
  
    --- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
 ---
    @@ -506,6 +506,10 @@ private[spark] class MesosClusterScheduler(
           options ++= Seq("--class", desc.command.mainClass)
         }
     
    +    desc.conf.getOption("spark.mesos.proxyUser").foreach { v =>
    +      options ++= Seq("--proxy-user", v)
    --- End diff --
    
    Yes, this is how Spark runs on Mesos so far: it runs in client mode, and the Mesos dispatcher is only used to launch the driver in client mode. It's up to DC/OS to make sure things run safely.
    DC/OS does not use the spark-submit code; it submits directly to the dispatcher's REST API, because spark-submit is not convenient for submitting jobs from outside the cluster. That submission is secured according to DC/OS security capabilities, e.g. HTTPS access.
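    To make the "submit directly to the dispatcher's REST API" path concrete, here is a hedged sketch of such a request. The host, port, jar URL, class name, version, and user names are hypothetical placeholders; the payload shape follows Spark's REST submission protocol (`CreateSubmissionRequest`), and `spark.mesos.proxyUser` is the property this diff reads.

    ```shell
    # Hypothetical HTTPS-secured dispatcher endpoint (placeholder host/port).
    DISPATCHER="https://dispatcher.example.com:7077"

    # CreateSubmissionRequest payload, as a sketch; all concrete values
    # (jar URL, class, version, users, master URL) are assumptions.
    PAYLOAD='{
      "action": "CreateSubmissionRequest",
      "appResource": "https://example.com/app.jar",
      "mainClass": "org.example.Main",
      "appArgs": [],
      "clientSparkVersion": "2.3.0",
      "environmentVariables": {"SPARK_USER": "userX"},
      "sparkProperties": {
        "spark.app.name": "example",
        "spark.master": "mesos://mesos-master.example.com:5050",
        "spark.mesos.proxyUser": "userY"
      }
    }'

    # Print the payload; the actual POST (commented out, since there is no
    # live dispatcher in this sketch) would go to the create endpoint:
    echo "$PAYLOAD"
    # curl -X POST -H "Content-Type: application/json" \
    #      -d "$PAYLOAD" "$DISPATCHER/v1/submissions/create"
    ```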
    Also, pure Mesos does not have the infrastructure in place (such as a secret store or HDFS) to support this story; YARN has it, as it assumes HDFS to manage secrets.
    Now, back to the scenario you describe: let's assume I don't use the DC/OS CLI, which submits to the REST API and avoids spark-submit (for example, I could submit from a node in my cluster, more or less like YARN).
    Then I would use `spark-submit --proxy-user Y`, but first I would have to log in to my node as user X, who can impersonate other users and who can create a TGT for that purpose. I would upload user X's ticket cache (which I can point to with KRB5CCNAME) to the cluster-wide accessible ticket store. I would run spark-submit with the env var SPARK_USER=X set, which would launch, via the dispatcher, a container running the Spark driver in client mode, and that container would mount the TGT from the secret store. The OS user for the container would be X.
    In there, the driver is launched in client mode impersonating user Y.
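    The steps in that scenario can be sketched as shell commands. This is a hedged illustration only: the ticket-cache path, realm, dispatcher URL, main class, and user names are hypothetical, and the final command is echoed rather than executed since there is no live cluster or KDC here.

    ```shell
    # 1. As user X (who may impersonate others), obtain a TGT; the cache
    #    location is exported via KRB5CCNAME. Path and realm are placeholders.
    export KRB5CCNAME=/tmp/krb5cc_userX
    # kinit userX@EXAMPLE.COM    # would contact the KDC; skipped in this sketch

    # 2. Tell Spark which user owns the submission.
    export SPARK_USER=userX

    # 3. Submit through the dispatcher in cluster mode, asking the driver
    #    to impersonate user Y via --proxy-user (URL/class are placeholders).
    SUBMIT_CMD=(spark-submit
      --master mesos://dispatcher.example.com:7077
      --deploy-mode cluster
      --proxy-user userY
      --class org.example.Main
      app.jar)

    # Echo the command instead of running it: illustration only.
    echo "${SUBMIT_CMD[@]}"
    ```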
    
    Note: check this PR: https://github.com/apache/spark/pull/20967, where a customer was trying to access a cluster remotely and spark-submit was not a good fit. It is hard to try to unify all deployment environments, and that is why DC/OS uses the CLI; I also think that in the future the scheduler code should live outside the project.


---
