Github user gaborgsomogyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23195#discussion_r238596372

--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark-
 
 See [Application Submission Guide](submitting-applications.html) for more details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increase security in a cluster. For a detailed
+description of these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against a Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured
+  via Spark parameters and may not need a JAAS login configuration (Spark can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, see
+  [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. This is enabled by default
+  but can be turned off with `spark.security.credentials.kafka.enabled`. When
+  `spark.kafka.bootstrap.servers` is set, Spark looks for authentication information in the following
+  order and chooses the first available to log in:
+  - **JAAS login configuration**
+  - **Keytab file**, such as,
+
+        ./bin/spark-submit \
+          --keytab <KEYTAB_FILE> \
+          --principal <PRINCIPAL> \
+          --conf spark.kafka.bootstrap.servers=<KAFKA_SERVERS> \
+          ...
+
+  - **Kerberos credential cache**, such as,
+
+        ./bin/spark-submit \
+          --conf spark.kafka.bootstrap.servers=<KAFKA_SERVERS> \
+          ...
+
+  Spark supports the following authentication protocols to obtain token:
--- End diff --

> "Spark supports"

Maybe `Spark can be configured to use` is a better phrase.

> explaining each option here is not really that helpful

* I think the list must be kept (maybe without explanation), because the existence of an authentication protocol in Kafka doesn't mean Spark is prepared to use it.
* With the explanation I wanted to give a high-level feeling of what it roughly does; Kafka's doc is there for a deeper look.

I'm neutral on removing them. Should we?
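
As an aside: the diff lists **JAAS login configuration** first but gives no example for it, unlike the other two options. If we keep the list, a sketch could look like the following (illustrative only: the JAAS file contents, the `/path/to/jaas.conf` path, and the keytab/principal values are placeholders, and `--driver-java-options` plus `spark.executor.extraJavaOptions` is just one common way to point the JVMs at a JAAS file):

```shell
# Hypothetical jaas.conf for a Kerberos-secured Kafka client
# (keytab path and principal are placeholders):
#
#   KafkaClient {
#     com.sun.security.auth.module.Krb5LoginModule required
#     useKeyTab=true
#     storeKey=true
#     keyTab="/etc/security/keytabs/client.keytab"
#     principal="client@EXAMPLE.COM";
#   };

# Point both the driver and the executors at the JAAS file:
./bin/spark-submit \
  --driver-java-options "-Djava.security.auth.login.config=/path/to/jaas.conf" \
  --conf spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/path/to/jaas.conf \
  --conf spark.kafka.bootstrap.servers=<KAFKA_SERVERS> \
  ...
```

That would make the three bullets symmetric, though it may be more detail than this section needs.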