Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21669#discussion_r215470727

--- Diff: docs/security.md ---
@@ -722,6 +722,62 @@
 with encryption, at least. The Kerberos login will be periodically
 renewed using the provided credentials, and new delegation tokens for supported
 services will be created.
 
+## Secure Interaction with Kubernetes
+
+When talking to Hadoop-based services behind Kerberos, Spark needs to obtain delegation tokens
+so that non-local processes can authenticate. In Kubernetes, these delegation tokens are stored in
+Secrets that are shared by the Driver and its Executors. As such, there are three ways of submitting
+a Kerberos job:
+
+1. Submitting with `kinit`, which stores a TGT in the local ticket cache:
+```bash
+/usr/bin/kinit -kt <keytab_file> <username>/<krb5 realm>
+/opt/spark/bin/spark-submit \
+  --deploy-mode cluster \
+  --class org.apache.spark.examples.HdfsTest \
+  --master k8s://<KUBERNETES_MASTER_ENDPOINT> \
+  --conf spark.executor.instances=1 \
+  --conf spark.app.name=spark-hdfs \
+  --conf spark.kubernetes.container.image=spark:latest \
+  --conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \
+  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar \
+  <HDFS_FILE_LOCATION>
+```
+
+2. Submitting with a local keytab and principal:
+```bash
+/opt/spark/bin/spark-submit \
+  --deploy-mode cluster \
+  --class org.apache.spark.examples.HdfsTest \
+  --master k8s://<KUBERNETES_MASTER_ENDPOINT> \
+  --conf spark.executor.instances=1 \
+  --conf spark.app.name=spark-hdfs \
+  --conf spark.kubernetes.container.image=spark:latest \
+  --conf spark.kubernetes.kerberos.keytab=<KEYTAB_FILE> \
+  --conf spark.kubernetes.kerberos.principal=<PRINCIPAL> \
+  --conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \
+  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar \
+  <HDFS_FILE_LOCATION>
+```
+
+3.
Submitting with pre-populated secrets already existing within the namespace

--- End diff --

Agreed, I should be clearer. This is a Secret containing the delegation token, which will be mounted onto the driver and executors and pointed to via the environment variable `HADOOP_TOKEN_FILE_LOCATION`.
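As a sketch of how this third mode could look, the Secret name, item key, and the `spark.kubernetes.kerberos.tokensecret.*` configuration keys below are illustrative assumptions, not the PR's final option names:

```shell
# Hypothetical example: the Secret name, item key, and tokensecret.* config
# keys are illustrative assumptions, not the PR's final API.

# Store a previously obtained delegation token in a Kubernetes Secret
# within the job's namespace.
kubectl create secret generic hadoop-token-secret \
  --from-file=hadoop.token=/tmp/hadoop.token \
  --namespace=<NAMESPACE>

# Point the submission at the pre-populated Secret; inside the pods the
# mounted token file would be exposed to Spark through the
# HADOOP_TOKEN_FILE_LOCATION environment variable.
/opt/spark/bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.HdfsTest \
  --master k8s://<KUBERNETES_MASTER_ENDPOINT> \
  --conf spark.executor.instances=1 \
  --conf spark.app.name=spark-hdfs \
  --conf spark.kubernetes.container.image=spark:latest \
  --conf spark.kubernetes.kerberos.tokensecret.name=hadoop-token-secret \
  --conf spark.kubernetes.kerberos.tokensecret.itemkey=hadoop.token \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar \
  <HDFS_FILE_LOCATION>
```

Unlike modes 1 and 2, no keytab or ticket cache is needed at submission time here; the token must simply have been obtained and stored in the Secret beforehand.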
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org