Github user skonto commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20945#discussion_r178971058
  
    --- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
 ---
    @@ -506,6 +506,10 @@ private[spark] class MesosClusterScheduler(
           options ++= Seq("--class", desc.command.mainClass)
         }
     
    +    desc.conf.getOption("spark.mesos.proxyUser").foreach { v =>
    +      options ++= Seq("--proxy-user", v)
    --- End diff --
    
    Yes, this is how Spark runs on Mesos so far: it runs in client mode, and the Mesos dispatcher is only used to launch the driver in client mode. It's up to DC/OS to make sure things run safely.
    DC/OS does not use the spark-submit code; it submits directly to the dispatcher's REST API, because spark-submit is not convenient for submitting jobs from outside the cluster. That submission is secured according to DC/OS security capabilities, e.g. HTTPS access.
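    To make the "submit directly to the dispatcher's REST API" path concrete, here is a hedged sketch of such a request. The host, port, jar URL, class name, version, and user names are hypothetical placeholders; the payload shape follows Spark's REST submission protocol (`CreateSubmissionRequest`), and `spark.mesos.proxyUser` is the property this diff reads.

    ```shell
    # Hypothetical HTTPS-secured dispatcher endpoint (placeholder host/port).
    DISPATCHER="https://dispatcher.example.com:7077"

    # CreateSubmissionRequest payload, as a sketch; all concrete values
    # (jar URL, class, version, users, master URL) are assumptions.
    PAYLOAD='{
      "action": "CreateSubmissionRequest",
      "appResource": "https://example.com/app.jar",
      "mainClass": "org.example.Main",
      "appArgs": [],
      "clientSparkVersion": "2.3.0",
      "environmentVariables": {"SPARK_USER": "userX"},
      "sparkProperties": {
        "spark.app.name": "example",
        "spark.master": "mesos://mesos-master.example.com:5050",
        "spark.mesos.proxyUser": "userY"
      }
    }'

    # Print the payload; the actual POST (commented out, since there is no
    # live dispatcher in this sketch) would go to the create endpoint:
    echo "$PAYLOAD"
    # curl -X POST -H "Content-Type: application/json" \
    #      -d "$PAYLOAD" "$DISPATCHER/v1/submissions/create"
    ```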
    Also, pure Mesos does not have the infrastructure in place (such as a secret store or HDFS) to support this story; YARN has it, as it assumes HDFS to manage secrets.
    Now, back to the scenario you describe: let's assume I don't use the DC/OS CLI, which submits to the REST API and avoids spark-submit (for example, I could submit from a node in my cluster, more or less like YARN).
    Then I would use `spark-submit --proxy-user Y`, but first I would have to log in to my node as user X, who can impersonate other users and who can create a TGT for that purpose. I would upload user X's ticket cache (which I can point to with KRB5CCNAME) to the cluster-wide accessible ticket store. I would run spark-submit with the env var SPARK_USER=X set, which would launch, via the dispatcher, a container running the Spark driver in client mode, and that container would mount the TGT from the secret store. The OS user for the container would be X.
    In there, the driver is launched in client mode impersonating user Y.
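    The steps in that scenario can be sketched as shell commands. This is a hedged illustration only: the ticket-cache path, realm, dispatcher URL, main class, and user names are hypothetical, and the final command is echoed rather than executed since there is no live cluster or KDC here.

    ```shell
    # 1. As user X (who may impersonate others), obtain a TGT; the cache
    #    location is exported via KRB5CCNAME. Path and realm are placeholders.
    export KRB5CCNAME=/tmp/krb5cc_userX
    # kinit userX@EXAMPLE.COM    # would contact the KDC; skipped in this sketch

    # 2. Tell Spark which user owns the submission.
    export SPARK_USER=userX

    # 3. Submit through the dispatcher in cluster mode, asking the driver
    #    to impersonate user Y via --proxy-user (URL/class are placeholders).
    SUBMIT_CMD=(spark-submit
      --master mesos://dispatcher.example.com:7077
      --deploy-mode cluster
      --proxy-user userY
      --class org.example.Main
      app.jar)

    # Echo the command instead of running it: illustration only.
    echo "${SUBMIT_CMD[@]}"
    ```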
    
    Note: check this PR: https://github.com/apache/spark/pull/20967, where a customer was trying to access a cluster remotely and spark-submit was not a good fit. It is hard to try to unify all deployment environments, and that is why DC/OS uses the CLI; I also think that in the future the scheduler code should live outside the project.


---
