[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sönke Liebau updated ZOOKEEPER-4790:
------------------------------------
    Description: 
Currently, enabling Quorum TLS makes the server validate the SANs of the 
client certificates of connecting quorum peers against their reverse DNS names. 

 We have seen this cause issues when running in Kubernetes, because IP 
addresses resolve to multiple DNS names when ZooKeeper pods participate in 
multiple services. In this scenario CoreDNS returns a random one out of the 
list of possible hostnames that match the IP - so when ZooKeeper does a reverse 
lookup on the IP, it is a matter of chance whether it gets a name that is 
contained in the cert. 

Since `InetAddress.getHostName()` returns a single String, only that one DNS 
name is checked against the cert. 
This usually resolves itself after a few minutes, when the reverse lookup 
happens to return a different hostname that matches the certificate, but this 
is less than ideal.
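
To illustrate the failure mode described above, here is a minimal Java sketch. 
It assumes the reverse lookup goes through `InetAddress.getHostName()`; the 
class name, the `nameInSans` helper, and all hostnames are made up for 
illustration and are not taken from the ZooKeeper code base.

```java
import java.net.InetAddress;
import java.util.List;
import java.util.Set;

final class ReverseLookupSketch {

    // InetAddress.getHostName() yields exactly one name per call, even when
    // several PTR records exist for the address.
    static String reverseLookup(InetAddress peer) {
        return peer.getHostName();
    }

    // Hypothetical stand-in for the hostname check: case-insensitive exact
    // match of the single resolved name against the DNS SANs (no wildcards).
    static boolean nameInSans(String resolvedName, Set<String> dnsSans) {
        for (String san : dnsSans) {
            if (san.equalsIgnoreCase(resolvedName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up names: one pod IP that is a member of two services.
        List<String> ptrNames = List.of(
                "zk-0.zk-headless.example.svc.cluster.local",
                "zk-0.zk-client.example.svc.cluster.local");
        Set<String> dnsSans = Set.of("zk-0.zk-headless.example.svc.cluster.local");

        // Whichever name the resolver happens to return decides the outcome.
        for (String name : ptrNames) {
            System.out.println(name + " -> " + nameInSans(name, dnsSans));
        }
    }
}
```

Only one of the two names is in the cert, so the same peer is accepted or 
rejected depending on which name the lookup returns.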

This has caused issues in the Strimzi operator as well (see [this 
issue|https://github.com/strimzi/strimzi-kafka-operator/issues/3099]) - they 
solved this by adding pretty much anything they could find that might be 
relevant to the SAN, plus a few wildcards on top of that.

This is both error-prone and adds little real security, since "this 
certificate matches the connecting peer" shouldn't automatically mean "this 
peer should be allowed to connect".
 
 There are two (probably more) ways to fix this:

# Retrieve _all_ reverse entries and check against all of them
# The ZK server could verify the SAN against the list of servers 
({{server.N}} in the config). A peer should be able to connect on the 
quorum port if and only if at least one SAN matches at least one of the listed 
servers.

I'd argue that the second option is the better one, especially since the Java 
API doesn't even seem to have the option of retrieving all DNS entries, but 
also because it better matches the expressed intent of the ZK admin.
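
A rough Java sketch of how the second option could look, under stated 
assumptions: the SAN collection has the shape returned by 
`X509Certificate.getSubjectAlternativeNames()`, and the configured hosts are 
the host parts of the server list in the config. `SanAllowListSketch`, 
`namesFromSans`, and `isAllowedPeer` are hypothetical names, not existing 
ZooKeeper code, and matching is exact and case-insensitive only (no wildcard 
handling).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

final class SanAllowListSketch {

    // GeneralName type tags as defined in RFC 5280.
    private static final int SAN_TYPE_DNS = 2;
    private static final int SAN_TYPE_IP = 7;

    // Extracts dNSName and iPAddress entries from a SAN collection in the
    // format of X509Certificate.getSubjectAlternativeNames(), which is null
    // when the certificate has no SAN extension.
    static List<String> namesFromSans(Collection<List<?>> sans) {
        List<String> names = new ArrayList<>();
        if (sans == null) {
            return names;
        }
        for (List<?> entry : sans) {
            if (entry.size() < 2) {
                continue;
            }
            Object type = entry.get(0);
            Object value = entry.get(1);
            if (type instanceof Integer && value instanceof String) {
                int tag = (Integer) type;
                if (tag == SAN_TYPE_DNS || tag == SAN_TYPE_IP) {
                    names.add((String) value);
                }
            }
        }
        return names;
    }

    // Allows the peer if at least one SAN equals at least one configured
    // server host; no reverse DNS lookup is involved.
    static boolean isAllowedPeer(Collection<List<?>> sans, Set<String> configuredHosts) {
        for (String name : namesFromSans(sans)) {
            for (String host : configuredHosts) {
                if (host.equalsIgnoreCase(name)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With this approach the decision depends only on the certificate and the 
configured server list, so the outcome is the same on every connection attempt.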

Additionally, it would be nice to have a "disable client hostname verification" 
option that still leaves server hostname verification enabled. Strictly 
speaking, this is a separate issue, though; I'd be happy to spin that out into 
a ticket of its own.



  was:
Currently, enabling Quorum TLS makes the server validate the SANs of the 
client certificates of connecting quorum peers against their reverse DNS names. 

 We have seen this cause issues when running in Kubernetes, because IP 
addresses resolve to multiple DNS names when ZooKeeper pods participate in 
multiple services.

Since `InetAddress.getHostName()` returns a single String, it becomes a 
game of chance which DNS name is checked against the cert. This has caused 
issues in the Strimzi operator as well (see [this 
issue|https://github.com/strimzi/strimzi-kafka-operator/issues/3099]) - they 
solved this by adding pretty much anything they could find that might be 
relevant to the SAN, plus a few wildcards on top of that.

This is both error-prone and adds little real security, since "this 
certificate matches the connecting peer" shouldn't automatically mean "this 
peer should be allowed to connect".
 
 There are two (probably more) ways to fix this:

# Retrieve _all_ reverse entries and check against all of them
# The ZK server could verify the SAN against the list of servers 
({{server.N}} in the config). A peer should be able to connect on the 
quorum port if and only if at least one SAN matches at least one of the listed 
servers.

I'd argue that the second option is the better one, especially since the Java 
API doesn't even seem to have the option of retrieving all DNS entries, but 
also because it better matches the expressed intent of the ZK admin.

Additionally, it would be nice to have a "disable client hostname verification" 
option that still leaves server hostname verification enabled. Strictly 
speaking, this is a separate issue, though; I'd be happy to spin that out into 
a ticket of its own.




> TLS Quorum hostname verification breaks in some scenarios
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-4790
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4790
>             Project: ZooKeeper
>          Issue Type: Improvement
>    Affects Versions: 3.9.1
>            Reporter: Sönke Liebau
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
