[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962440#comment-15962440 ]
Michael Gummelt commented on SPARK-16742:
-----------------------------------------

Hi [~vanzin],

[~ganger85] and Strat.io are pulling back their Mesos Kerberos implementation for now, and we at Mesosphere are about to submit a PR to upstream our implementation. I have a few questions I'd like to run by you to make sure that PR goes smoothly.

1) I've been following your comments on the Spark Standalone Kerberos PR: https://github.com/apache/spark/pull/17530. It looks like your concern is that in *cluster mode*, the keytab is written to a file on the host running the driver and is owned by the user of the Spark Worker, which is the same for every job, so jobs submitted by multiple users can read each other's keytabs. In *client mode*, the delegation tokens are written to a file ({{HADOOP_TOKEN_FILE_LOCATION}}) on the host running the executor, which suffers from the same problem as the keytab does in cluster mode. The upshot is that a Kerberos-authenticated user submitting a job would be unaware that their credentials are being leaked to other users. Is this an accurate description of the issue?

2) I understand that YARN writes delegation tokens via {{amContainer.setTokens()}}, which ultimately results in the delegation token being written to a file owned by the submitting user. However, since the "submitting user" is a Kerberos user, not a Unix user, I'm assuming that {{hadoop.security.auth_to_local}} is what maps the Kerberos user to the Unix user who runs the ApplicationMaster and owns that file. Is that correct?

To avoid the shared-file problem for delegation tokens, our Mesos implementation currently has the executor issue an RPC call to fetch the delegation token from the driver. There is therefore no need for at-rest encryption, and if in-motion encryption is in the user's threat model, they can run Spark with SSL.
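For readers unfamiliar with {{hadoop.security.auth_to_local}}, here is a minimal sketch of the kind of mapping it performs: a Kerberos principal is reduced to a short Unix user name, roughly what Hadoop's DEFAULT rule does for principals in the local realm. The realm name EXAMPLE.COM and the function name are illustrative assumptions, not Hadoop code:

```python
import re

def auth_to_local(principal: str) -> str:
    """Illustrative stand-in for hadoop.security.auth_to_local: strip the
    local realm from a Kerberos principal to get the Unix user name,
    mimicking Hadoop's DEFAULT rule. EXAMPLE.COM is an assumed realm."""
    return re.sub(r"@EXAMPLE\.COM$", "", principal)

# The Kerberos user alice@EXAMPLE.COM becomes the Unix user "alice",
# who would then own the ApplicationMaster process and its token file.
print(auth_to_local("alice@EXAMPLE.COM"))  # alice
```

In real deployments the rules live in {{core-site.xml}} and use Hadoop's RULE syntax, which can also rewrite service principals; the sketch only covers the simple strip-the-realm case.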
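The fetch-over-RPC idea above can be sketched as follows. This is a toy model using Python's stdlib XML-RPC as a stand-in for Spark's internal RPC layer; the function names ({{serve_token}}, {{fetch_token}}) are hypothetical and not Spark APIs. The point is only that the token stays in the driver's memory and crosses to the executor over the wire, never touching a shared local file:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def serve_token(token: str):
    """Driver side (hypothetical): hold the delegation token in memory and
    hand it out over RPC instead of writing it to local disk."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(lambda: token, "get_token")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]  # port 0 -> OS picks a free port

def fetch_token(port: int) -> str:
    """Executor side (hypothetical): pull the token from the driver at
    startup rather than reading HADOOP_TOKEN_FILE_LOCATION."""
    with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
        return proxy.get_token()

server, port = serve_token("HDFS_DELEGATION_TOKEN_0")
received = fetch_token(port)
server.shutdown()
print(received)  # HDFS_DELEGATION_TOKEN_0
```

With this shape, at-rest protection is moot because nothing is at rest, and wire protection reduces to enabling SSL/TLS on the RPC channel, as noted above.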
We avoid the shared-file problem for keytabs entirely, because there's no need to distribute the keytab, at least in client mode. Unlike YARN, the driver and the equivalent of the "ApplicationMaster" in Mesos are one and the same: they live in the same process, the {{spark-submit}} process.

We're probably going to punt on cluster mode for now, just for simplicity, but we should be able to solve this in cluster mode as well, because unlike Standalone, and much like YARN, Mesos controls which user the driver runs as.

What do you think of the above approach? If you see any blockers, I would very much appreciate teasing those out now rather than during the PR. Thanks!

> Kerberos support for Spark on Mesos
> -----------------------------------
>
>                 Key: SPARK-16742
>                 URL: https://issues.apache.org/jira/browse/SPARK-16742
>             Project: Spark
>          Issue Type: New Feature
>          Components: Mesos
>            Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos. We'll be
> contributing it to Apache Spark soon.
> Mesosphere design doc:
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code:
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)