Hi all,

I'm having an issue with dynamic configuration of inter-broker SSL certificates
(in Kafka 2.3.0) that I'm hoping someone can give me some insight on. I've
previously posted something similar to the Users mailing list, but I think this
needs input from developers familiar with how Kafka's networking code works.

I'm trying to use two-way SSL authentication for inter-broker communication,
with short-lived SSL certificates that are rotated frequently without needing a
broker restart. So, on each broker in my cluster, I periodically generate a new
certificate keystore file and set the
"listener.name.interbroker.ssl.keystore.location" broker config property to
point at it. (I'm using inter.broker.listener.name=INTERBROKER.)
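For reference, the per-broker update described above can be issued with the
kafka-configs.sh tool. The sketch below only builds the argument list; the
bootstrap address, broker id, keystore path and helper names are hypothetical.
It relies on listener-prefixed configs using the lowercase listener name
(INTERBROKER becomes listener.name.interbroker...). If a new keystore uses a
different password, the ssl.keystore.password and ssl.key.password entries
would presumably need to go in the same alter request.

```python
from typing import List


def listener_config_name(listener_name: str, config: str) -> str:
    # Listener-prefixed broker configs use the lowercase listener name,
    # e.g. INTERBROKER -> listener.name.interbroker.ssl.keystore.location
    return f"listener.name.{listener_name.lower()}.{config}"


def keystore_update_command(bootstrap: str, broker_id: int,
                            listener_name: str, keystore_path: str) -> List[str]:
    # Hypothetical helper: builds (but does not run) a per-broker
    # dynamic config update pointing the listener at a new keystore file.
    config = listener_config_name(listener_name, "ssl.keystore.location")
    return [
        "bin/kafka-configs.sh",
        "--bootstrap-server", bootstrap,
        "--entity-type", "brokers",
        "--entity-name", str(broker_id),
        "--alter",
        "--add-config", f"{config}={keystore_path}",
    ]


# Usage (not executed here):
#   import subprocess
#   cmd = keystore_update_command("broker1:9093", 1, "INTERBROKER",
#                                 "/etc/kafka/keystore-new.jks")
#   subprocess.run(cmd, check=True)
```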

Setting this property works fine, and everything appears OK: manually
connecting to the inter-broker listener shows that it's correctly serving the
new certificate. But if I ever restart a broker after the original certificate
has expired (the one the broker started up with, which is no longer configured
anywhere), communication failures between brokers start to occur. My logs fill
up with messages like this:

[2019-07-22 03:57:43,605] INFO [SocketServer brokerId=1] Failed authentication with 10.224.70.3 (SSL handshake failed) (org.apache.kafka.common.network.Selector)

A bit of extra logging injected into the code tells me that the failures are
caused by out-of-date SSL certificates being used, so it seems some network
components inside Kafka are still stuck on the old settings. This sounds
similar to the behaviour described in KAFKA-8336
(https://issues.apache.org/jira/browse/KAFKA-8336), but that issue is marked as
fixed in 2.3.0.
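When checking which certificate a connection actually received, comparing its
notAfter date against the current time is a quick test for staleness. A minimal
Python sketch, assuming the certificate is the dict returned by
ssl.SSLSocket.getpeercert() on a verified connection (that dict is empty when
verification is disabled); the helper name cert_expired is hypothetical and not
part of Kafka.

```python
import ssl
import time
from typing import Optional


def cert_expired(peer_cert: dict, now: Optional[float] = None) -> bool:
    # peer_cert is assumed to be the dict form from getpeercert(), where
    # notAfter is a string such as "Jul 22 03:57:43 2019 GMT".
    not_after = ssl.cert_time_to_seconds(peer_cert["notAfter"])
    if now is None:
        now = time.time()
    # Expired once the current time is past the notAfter timestamp.
    return not_after < now
```

For example, cert_expired({"notAfter": "Jul 22 03:57:43 2019 GMT"}) returns
True for any current time, since that date has already passed.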

I've confirmed that all the SslChannelBuilders and SslFactories appear to be
reconfigured correctly when the dynamic setting is changed. I've also tried
closing all existing KafkaChannels on a reconfiguration event, to force them to
reopen with the new certificates, but the problem persists.

Does anyone have any idea what other components may be hanging around and still
using the old certificates?

Thanks,
Michael
