Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21669#discussion_r223101077
  
    --- Diff: docs/security.md ---
    @@ -722,6 +722,67 @@ with encryption, at least.
     The Kerberos login will be periodically renewed using the provided credentials, and new delegation
     tokens for supported services will be created.
     
    +## Secure Interaction with Kubernetes
    +
    +When talking to Hadoop-based services behind Kerberos, Spark needs to obtain delegation tokens
    +so that non-local processes can authenticate. In Kubernetes, these delegation tokens are stored in
    +Secrets that are shared by the Driver and its Executors. As such, there are three ways of submitting
    +a Kerberos job:
    +
    +In all cases you must define the environment variable `HADOOP_CONF_DIR`.
    +It is also important to note that the KDC needs to be visible from inside the containers if the user
    +uses a local krb5 file.
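    +
    +For illustration, a minimal sketch of the required environment (the paths below are hypothetical
    +examples, not defaults shipped with Spark):
    +```bash
    +# Hypothetical paths; point these at your own Hadoop client configs and krb5 file.
    +# The KDC named in krb5.conf must be reachable from inside the containers.
    +export HADOOP_CONF_DIR=/etc/hadoop/conf
    +export KRB5_CONFIG=/etc/krb5.conf
    +```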
    +
    +If a user wishes to use a remote HADOOP_CONF directory that contains the Hadoop configuration files,
    +or a remote krb5 file, this can be achieved by mounting a pre-defined ConfigMap as a volume at the
    +desired location, which you can then point to via the appropriate configs. This method is useful for
    +those who wish not to rebuild their Docker images, but instead to point to a ConfigMap that they can
    +modify. This strategy is supported via the pod-template feature; see the sketch below.
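    +
    +For example, a minimal sketch of this approach (the ConfigMap and file names are hypothetical, and
    +`spark.kubernetes.driver.podTemplateFile` is assumed to be the config exposed by the pod-template
    +feature):
    +```bash
    +# Publish the local Hadoop client configuration as a ConfigMap (names are examples).
    +kubectl create configmap hadoop-conf --from-file=/etc/hadoop/conf
    +
    +# A driver pod template that mounts the ConfigMap where HADOOP_CONF_DIR will point.
    +cat > driver-pod-template.yaml <<'EOF'
    +apiVersion: v1
    +kind: Pod
    +spec:
    +  containers:
    +    - name: spark-kubernetes-driver
    +      volumeMounts:
    +        - name: hadoop-conf
    +          mountPath: /etc/hadoop/conf
    +  volumes:
    +    - name: hadoop-conf
    +      configMap:
    +        name: hadoop-conf
    +EOF
    +```
    +The template would then be passed to `spark-submit` via the assumed
    +`spark.kubernetes.driver.podTemplateFile` config.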
    +
    +1. Submitting with a `kinit` that stores a TGT in the local ticket cache:
    +```bash
    +/usr/bin/kinit -kt <keytab_file> <username>@<krb5 realm>
    +/opt/spark/bin/spark-submit \
    +    --deploy-mode cluster \
    +    --class org.apache.spark.examples.HdfsTest \
    +    --master k8s://<KUBERNETES_MASTER_ENDPOINT> \
    +    --conf spark.executor.instances=1 \
    +    --conf spark.app.name=spark-hdfs \
    +    --conf spark.kubernetes.container.image=spark:latest \
    +    --conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \
    +    local:///opt/spark/examples/jars/spark-examples_<VERSION>-SNAPSHOT.jar \
    +    <HDFS_FILE_LOCATION>
    +```
    +2. Submitting with a local keytab and principal:
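    +As an illustrative sketch only (the config names `spark.kubernetes.kerberos.keytab` and
    +`spark.kubernetes.kerberos.principal` are assumed by analogy with
    +`spark.kubernetes.kerberos.krb5location` above, not confirmed by this diff):
    +```bash
    +/opt/spark/bin/spark-submit \
    +    --deploy-mode cluster \
    +    --class org.apache.spark.examples.HdfsTest \
    +    --master k8s://<KUBERNETES_MASTER_ENDPOINT> \
    +    --conf spark.executor.instances=1 \
    +    --conf spark.app.name=spark-hdfs \
    +    --conf spark.kubernetes.container.image=spark:latest \
    +    --conf spark.kubernetes.kerberos.keytab=<KEYTAB_FILE> \
    +    --conf spark.kubernetes.kerberos.principal=<PRINCIPAL> \
    +    --conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \
    +    local:///opt/spark/examples/jars/spark-examples_<VERSION>-SNAPSHOT.jar \
    +    <HDFS_FILE_LOCATION>
    +```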
    --- End diff --
    
    So if I understand the code correctly, this mode just replaces the need to run `kinit`. Unlike
    the use of this option in YARN and Mesos, you do not get token renewal, right? That can be a
    little confusing to users who are coming from one of those environments.
    
    I've sent #22624, which abstracts some of the code used by Mesos and YARN to make it more
    reusable. It could probably be used by k8s too, with some modifications.
    
    That could also be enhanced to include more functionality - specifically, getting delegation
    tokens by the submission client when running in cluster mode without a keytab. That code
    currently lives in YARN's `Client.scala`, but it could also be refactored so that k8s could use
    it to create delegation tokens for the cluster-mode driver.

