Github user skonto commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20945#discussion_r178971058

    --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala ---
    @@ -506,6 +506,10 @@ private[spark] class MesosClusterScheduler(
           options ++= Seq("--class", desc.command.mainClass)
         }
     
    +    desc.conf.getOption("spark.mesos.proxyUser").foreach { v =>
    +      options ++= Seq("--proxy-user", v)
    --- End diff --

Yes, this is how Spark has run on Mesos so far: everything runs in client mode, and the Mesos dispatcher is only used to launch the driver in client mode. It is up to DC/OS to make sure things run safely. DC/OS does not use the spark-submit code; it submits directly to the dispatcher's REST API, because spark-submit is not convenient for submitting jobs from outside the cluster (a sketch of such a direct REST submission is at the end of this comment). That submission is secured according to DC/OS's security capabilities, e.g. HTTPS access. Also, pure Mesos does not have the infrastructure in place (a secret store or HDFS) to support this story; YARN does, since it assumes HDFS for managing secrets.

Now back to the scenario you describe. Let's assume I don't use the DC/OS CLI, which submits to the REST API and avoids spark-submit (for example, I could do this from a node in my cluster, more or less like YARN). Then I would use `spark-submit --proxy-user Y`, but first I would have to log in to my node as user X, who can impersonate other users and can create a TGT for that purpose. I would upload user X's ticket cache (which I can point to with `KRB5CCNAME`) to the cluster-wide accessible ticket store. I would then run spark-submit with the env var `SPARK_USER=X`, which would launch, via the dispatcher, a container running the Spark driver in client mode, and that container would mount the TGT from the secret store. The OS user for the container would be X, and in there the driver is launched in client mode impersonating user Y (the second sketch below shows roughly what `--proxy-user` amounts to on the driver side).

Note: check this PR: https://github.com/apache/spark/pull/20967, where a customer was trying to access a cluster remotely and spark-submit was not a good fit. It is hard to unify all deployment environments; that is why DC/OS uses the CLI, and I think in the future the scheduler code should be out of the project.
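To make the direct-submission path concrete, here is a minimal sketch of posting a job straight to the dispatcher's REST endpoint, roughly what the DC/OS CLI does instead of invoking spark-submit. The host, port, jar URL, main class, and Spark version are placeholder assumptions, not values from this PR; the JSON shape follows the `CreateSubmissionRequest` format used by Spark's REST submission protocol.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

object RestSubmitSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder dispatcher address; use whatever host/port your
    // MesosClusterDispatcher actually listens on.
    val url = new URL("http://dispatcher.example.com:7077/v1/submissions/create")

    // Same JSON shape that spark-submit's REST client sends. The jar URL,
    // main class, and version below are made-up example values.
    val payload =
      """{
        |  "action": "CreateSubmissionRequest",
        |  "appResource": "https://example.com/jars/my-job.jar",
        |  "mainClass": "com.example.MyJob",
        |  "appArgs": [],
        |  "clientSparkVersion": "2.3.0",
        |  "environmentVariables": { "SPARK_USER": "X" },
        |  "sparkProperties": {
        |    "spark.app.name": "proxy-user-example",
        |    "spark.mesos.proxyUser": "Y"
        |  }
        |}""".stripMargin

    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    conn.getOutputStream.write(payload.getBytes(StandardCharsets.UTF_8))
    println(s"Dispatcher responded with HTTP ${conn.getResponseCode}")
  }
}
```

With the diff in this PR, a `spark.mesos.proxyUser` entry in `sparkProperties` ends up in `desc.conf` and is forwarded as `--proxy-user` on the driver's command line.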
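For the impersonation step itself, here is a minimal sketch of what `--proxy-user` roughly amounts to on the driver side, using Hadoop's `UserGroupInformation`. The names X and Y are the placeholder users from the scenario above, and this is an illustration, not the exact SparkSubmit code.

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.security.UserGroupInformation

object ProxyUserSketch {
  def main(args: Array[String]): Unit = {
    // User X: the container's Kerberos identity, obtained from the mounted
    // TGT that KRB5CCNAME points at.
    val realUser = UserGroupInformation.getCurrentUser

    // User Y: the user being impersonated ("Y" is a placeholder name).
    val proxyUser = UserGroupInformation.createProxyUser("Y", realUser)

    // Everything run inside doAs is attributed to Y, subject to the
    // cluster's proxy-user ACLs (hadoop.proxyuser.X.hosts / .groups / .users).
    proxyUser.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        println(s"Running as: ${UserGroupInformation.getCurrentUser.getUserName}")
      }
    })
  }
}
```

This only works if the cluster's Hadoop configuration allows X to impersonate Y; that is the piece YARN deployments get from their HDFS/Hadoop setup and plain Mesos does not provide out of the box.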