Github user gaborgsomogyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23195#discussion_r238596372

--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark-
 
 See [Application Submission Guide](submitting-applications.html) for more details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increase security in a cluster. For a detailed
+description of these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against a Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured
+  via Spark parameters and may not need a JAAS login configuration (Spark can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, see
+  [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. This is enabled by default
+  but can be turned off with `spark.security.credentials.kafka.enabled`. When
+  `spark.kafka.bootstrap.servers` is set, Spark looks for authentication information in the following
+  order and chooses the first available to log in:
+  - **JAAS login configuration**
+  - **Keytab file**, such as,
+
+        ./bin/spark-submit \
+          --keytab <KEYTAB_FILE> \
+          --principal <PRINCIPAL> \
+          --conf spark.kafka.bootstrap.servers=<KAFKA_SERVERS> \
+          ...
+
+  - **Kerberos credential cache**, such as,
+
+        ./bin/spark-submit \
+          --conf spark.kafka.bootstrap.servers=<KAFKA_SERVERS> \
+          ...
+
+  Spark supports the following authentication protocols to obtain token:
--- End diff --

> "Spark supports"

Maybe `Spark can be configured to use` is a better phrase.

> explaining each option here is not really that helpful

* I think the list must be kept (maybe without explanation), because the existence of an authentication protocol in Kafka doesn't mean Spark is prepared to use it.
* With the explanation I wanted to give a high-level feeling of what it roughly does; Kafka's doc is there for a deeper look.

I'm neutral on removing them. Should we?
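
As an aside: the diff lists **JAAS login configuration** first but gives no example for it, unlike the other two options. If we keep the list, a sketch could look like the following (illustrative only: the JAAS file contents, the `/path/to/jaas.conf` path, and the keytab/principal values are placeholders, and `--driver-java-options` plus `spark.executor.extraJavaOptions` is just one common way to point the JVMs at a JAAS file):

```shell
# Hypothetical jaas.conf for a Kerberos-secured Kafka client
# (keytab path and principal are placeholders):
#
#   KafkaClient {
#     com.sun.security.auth.module.Krb5LoginModule required
#     useKeyTab=true
#     storeKey=true
#     keyTab="/etc/security/keytabs/client.keytab"
#     principal="client@EXAMPLE.COM";
#   };

# Point both the driver and the executors at the JAAS file:
./bin/spark-submit \
  --driver-java-options "-Djava.security.auth.login.config=/path/to/jaas.conf" \
  --conf spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/path/to/jaas.conf \
  --conf spark.kafka.bootstrap.servers=<KAFKA_SERVERS> \
  ...
```

That would make the three bullets symmetric, though it may be more detail than this section needs.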