Hi all,
We currently have a new direct stream connector, thanks to work by Cody and
others on SPARK-12177.

However, it can't be used in secure clusters that require Kerberos
authentication, because Kafka currently doesn't support delegation
tokens (KAFKA-1696 <https://issues.apache.org/jira/browse/KAFKA-1696>).
Unfortunately, very little work has been done on that JIRA, so, in my
opinion, folks who want to use secure Kafka (with Kerberos, which is the
norm) are stuck, because Spark Streaming can't consume from it today.

The right way is, of course, to get delegation tokens in Kafka, but honestly
I don't know if that's happening in the near future. I am wondering if we
should consider something to remedy this. For example, we could come up
with a receiver-based connector, built on the new Kafka consumer API, that'd
support Kerberos authentication. It wouldn't require delegation tokens since
only a very small number of executors would be talking to Kafka. Of course,
anyone who cares about high throughput and the other direct connector
benefits would still have to use the direct connector. Another thing we
could do is ship the keytab to the executors in the direct connector, so
delegation tokens are not required, but that would be a pretty compromising
solution, and I'd prefer not doing that.
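
To make the receiver idea a bit more concrete, here is a minimal sketch of
the consumer settings such a receiver might build before creating a new-API
consumer. The class name, method, and the "kafka" service name are
hypothetical and only for illustration; the property keys are the ones the
new consumer uses for SASL/Kerberos, and I'm assuming a JAAS file with a
keytab is already available on the few hosts that run receivers.

```java
import java.util.Properties;

// Hypothetical helper (not part of Spark or Kafka): collects the settings a
// receiver would hand to the new Kafka consumer on a Kerberos-secured cluster.
final class KerberosConsumerConfig {

    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Authenticate over SASL (Kerberos) without TLS; SASL_SSL is the
        // encrypted alternative.
        props.setProperty("security.protocol", "SASL_PLAINTEXT");
        // Must match the service principal the brokers run as; "kafka" is an
        // assumption here.
        props.setProperty("sasl.kerberos.service.name", "kafka");
        props.setProperty("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```

The receiver would pass these properties to the consumer constructor, and
the executor JVM would need java.security.auth.login.config pointing at a
JAAS file with a KafkaClient section. Only the receiver executors log in
this way, which is why no delegation tokens are involved.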

What do folks think? Would love to hear your thoughts, especially about the
receiver.

Thanks!
Mark
