Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23195#discussion_r239177994
  
    --- Diff: docs/structured-streaming-kafka-integration.md ---
    @@ -624,3 +624,199 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark-
     
     See [Application Submission Guide](submitting-applications.html) for more details about submitting
     applications with external dependencies.
    +
    +## Security
    +
    +Kafka 0.9.0.0 introduced several features that increase security in a cluster. For a detailed
    +description of these features, see the [Kafka security docs](http://kafka.apache.org/documentation.html#security).
    +
    +It's worth noting that security is optional and turned off by default.
    +
    +Spark supports the following ways to authenticate against a Kafka cluster:
    +- **Delegation token (introduced in Kafka broker 1.1.0)**
    +- **JAAS login configuration**
    +
    +### Delegation token
    +
    +With delegation tokens, the application can be configured via Spark parameters and may not need
    +a JAAS login configuration (Spark can use Kafka's dynamic JAAS configuration feature). For further
    +information about delegation tokens, see the [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token).
    +
    +The process is initiated by Spark's Kafka delegation token provider. When `spark.kafka.bootstrap.servers` is set,
    +Spark considers the following login options, in order of preference:
    +- **JAAS login configuration**
    +- **Keytab file**, such as,
    +
    +      ./bin/spark-submit \
    +          --keytab <KEYTAB_FILE> \
    +          --principal <PRINCIPAL> \
    +          --conf spark.kafka.bootstrap.servers=<KAFKA_SERVERS> \
    +          ...
    +
    +- **Kerberos credential cache**, such as,
    +
    +      ./bin/spark-submit \
    +          --conf spark.kafka.bootstrap.servers=<KAFKA_SERVERS> \
    +          ...
    +
    +The Kafka delegation token provider can be turned off by setting `spark.security.credentials.kafka.enabled` to `false` (default: `true`).
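    +
    +For example, a minimal sketch of disabling the provider at submit time (only the
    +`spark.security.credentials.kafka.enabled` property itself comes from this document):
    +
    +      ./bin/spark-submit \
    +          --conf spark.security.credentials.kafka.enabled=false \
    +          ...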
    +
    +Spark can be configured to use the following authentication protocols to obtain a token (the
    +chosen protocol must match the Kafka broker configuration; a submit-time sketch follows the list):
    +- **SASL SSL (default)**
    +- **SSL**
    +- **SASL PLAINTEXT (for testing)**
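    +
    +As a sketch of selecting the protocol at submit time (the property name
    +`spark.kafka.security.protocol` is an assumption here, not taken from this document; check the
    +Spark configuration reference, and keep the value in sync with the broker):
    +
    +      ./bin/spark-submit \
    +          --conf spark.kafka.security.protocol=SASL_SSL \
    +          ...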
    +
    +After obtaining the delegation token successfully, Spark distributes it across nodes and renews it
    +accordingly. The delegation token uses the `SCRAM` login module for authentication, so the
    +appropriate `sasl.mechanism` has to be configured on the source/sink:
    +
    +<div class="codetabs">
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +
    +// Setting on Kafka Source for Streaming Queries
    --- End diff --
    
    I think having just one example should be enough.
    
    Is `SCRAM-SHA-512` the only possible value? I think you mentioned different values before. If
    this needs to match the broker's configuration, that needs to be mentioned.
    
    Separately, it would be nice to think about having an external config for this so people don't
    need to hardcode this kind of thing in their code...
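    
    For reference, a minimal sketch of the source-side setting (assumes an existing `spark`
    session; the servers and topic are placeholders, and the mechanism value must match whatever
    SCRAM mechanism the brokers have enabled):
    
        val df = spark
          .readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
          // must match the broker, e.g. SCRAM-SHA-256 or SCRAM-SHA-512
          .option("kafka.sasl.mechanism", "SCRAM-SHA-512")
          .option("subscribe", "topic1")
          .load()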

