Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21669#discussion_r223101077

    --- Diff: docs/security.md ---
    @@ -722,6 +722,67 @@ with encryption, at least.
     The Kerberos login will be periodically renewed using the provided credentials,
     and new delegation tokens for supported services will be created.
    +
    +## Secure Interaction with Kubernetes
    +
    +When talking to Hadoop-based services behind Kerberos, Spark needs to obtain delegation tokens
    +so that non-local processes can authenticate. In Kubernetes, these delegation tokens are stored in Secrets
    +that are shared by the Driver and its Executors. There are three ways of submitting a Kerberos job.
    +
    +In all cases you must define the environment variable `HADOOP_CONF_DIR`.
    +It is also important to note that the KDC needs to be visible from inside the containers if the user
    +uses a local krb5 file.
    +
    +If a user wishes to use a remote HADOOP_CONF directory containing the Hadoop configuration files, or
    +a remote krb5 file, this can be achieved by mounting a pre-defined ConfigMap as a volume in the
    +desired location and pointing to it via the appropriate configs. This method is useful for users who
    +wish to avoid rebuilding their Docker images and instead point to a ConfigMap that they can modify.
    +This strategy is supported via the pod-template feature.
    +
    +1. Submitting with a `kinit` that stores a TGT in the local ticket cache:
    +```bash
    +/usr/bin/kinit -kt <keytab_file> <username>/<krb5 realm>
    +/opt/spark/bin/spark-submit \
    +  --deploy-mode cluster \
    +  --class org.apache.spark.examples.HdfsTest \
    +  --master k8s://<KUBERNETES_MASTER_ENDPOINT> \
    +  --conf spark.executor.instances=1 \
    +  --conf spark.app.name=spark-hdfs \
    +  --conf spark.kubernetes.container.image=spark:latest \
    +  --conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \
    +  local:///opt/spark/examples/jars/spark-examples_<VERSION>-SNAPSHOT.jar \
    +  <HDFS_FILE_LOCATION>
    +```
    +2. Submitting with a local keytab and principal

--- End diff --

So if I understand the code correctly, this mode just replaces the need to run `kinit`. Unlike the use of this option in YARN and Mesos, you do not get token renewal, right? That can be a little confusing for users coming from one of those environments.

I've sent #22624, which abstracts some of the code used by Mesos and YARN to make it more reusable. It could probably be used by k8s too with some modifications. It could also be enhanced to include more functionality, specifically having the submission client obtain delegation tokens when running in cluster mode without a keytab. That code currently lives in YARN's `Client.scala`, but it could also be refactored so that k8s can use it to create delegation tokens for the cluster-mode driver.
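To make the comparison concrete, a hedged sketch of what a keytab-based submission (mode 2) might look like, modeled on the `kinit` example above. The option names `spark.kubernetes.kerberos.keytab` and `spark.kubernetes.kerberos.principal` are illustrative assumptions, not confirmed names from this PR:

```bash
# Hedged sketch: submit with a keytab and principal instead of running kinit
# first. Spark reads the keytab itself, so no TGT needs to be in the local
# ticket cache. The two kerberos.* config names below are assumptions.
/opt/spark/bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.HdfsTest \
  --master k8s://<KUBERNETES_MASTER_ENDPOINT> \
  --conf spark.executor.instances=1 \
  --conf spark.app.name=spark-hdfs \
  --conf spark.kubernetes.container.image=spark:latest \
  --conf spark.kubernetes.kerberos.keytab=<KEYTAB_FILE> \
  --conf spark.kubernetes.kerberos.principal=<PRINCIPAL> \
  --conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \
  local:///opt/spark/examples/jars/spark-examples_<VERSION>-SNAPSHOT.jar \
  <HDFS_FILE_LOCATION>
```

As noted above, unlike YARN and Mesos, supplying the keytab this way only replaces the `kinit` step; it does not by itself give you periodic token renewal on k8s.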