[ https://issues.apache.org/jira/browse/ZOOKEEPER-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sönke Liebau updated ZOOKEEPER-4790:
------------------------------------
    Description: 
Currently, enabling Quorum TLS makes the server validate the SANs in the client certificates of connecting quorum peers against their reverse DNS names.

We have seen this cause issues when running in Kubernetes, because IP addresses resolve to multiple DNS names when ZooKeeper pods participate in multiple services. In this scenario CoreDNS returns a random one out of the list of possible hostnames that match the IP - so when ZooKeeper does a reverse lookup on the IP it becomes a game of chance whether it gets a name that is contained in the cert.
Since `InetAddress.getHostName()` returns a single String, it basically becomes a game of chance which DNS name is checked against the cert.
This usually shakes itself loose after a few minutes, when the hostname returned by the reverse lookup randomly changes and all of a sudden matches the certificate... but this is less than ideal.

This has caused issues in the Strimzi operator as well (see [this issue|https://github.com/strimzi/strimzi-kafka-operator/issues/3099]) - they solved this by adding pretty much anything they could find that might be relevant to the SAN, and a few wildcards on top of that.

This is both error-prone and adds no meaningful security, since "This certificate matches the connecting peer" shouldn't automatically mean "this peer should be allowed to connect".

There are two (probably more) ways to fix this:
# Retrieve _all_ reverse entries and check against all of them
# The ZK server could verify the SAN against the list of servers ({{server.N}} in the config). A peer should be able to connect on the quorum port if and only if at least one SAN matches at least one of the listed servers.

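Option 2 could look roughly like the sketch below. This is only an illustration under stated assumptions, not ZooKeeper code: the class {{QuorumSanCheck}} and its methods are hypothetical, matching is exact (no wildcard handling), names are compared case-insensitively with a trailing dot stripped, and the configured hosts are assumed to have already been extracted from the {{server.N}} entries.

```java
import java.security.cert.CertificateParsingException;
import java.security.cert.X509Certificate;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, not part of ZooKeeper: illustrates option 2 only. */
final class QuorumSanCheck {
    // GeneralName type tags used by X509Certificate.getSubjectAlternativeNames()
    private static final int SAN_DNS_NAME = 2;
    private static final int SAN_IP_ADDRESS = 7;

    private QuorumSanCheck() {}

    /** Lower-cases a name and strips one trailing dot so FQDN spellings compare equal. */
    static String normalize(String name) {
        String n = name.trim().toLowerCase(Locale.ROOT);
        return n.endsWith(".") ? n.substring(0, n.length() - 1) : n;
    }

    /** Collects the dNSName and iPAddress SAN entries of a peer certificate. */
    static Set<String> subjectAltNames(X509Certificate cert) throws CertificateParsingException {
        Set<String> names = new HashSet<>();
        Collection<List<?>> sans = cert.getSubjectAlternativeNames();
        if (sans == null) {
            return names; // certificate carries no SAN extension
        }
        for (List<?> entry : sans) {
            int type = (Integer) entry.get(0);
            if (type == SAN_DNS_NAME || type == SAN_IP_ADDRESS) {
                names.add(normalize((String) entry.get(1)));
            }
        }
        return names;
    }

    /**
     * Allows the peer if and only if at least one SAN equals at least one
     * configured quorum host. No reverse DNS lookup is involved.
     */
    static boolean isPeerAllowed(Collection<String> certSans, Collection<String> configuredHosts) {
        Set<String> allowed = new HashSet<>();
        for (String host : configuredHosts) {
            allowed.add(normalize(host));
        }
        for (String san : certSans) {
            if (allowed.contains(normalize(san))) {
                return true;
            }
        }
        return false;
    }
}
```

A caller would pass {{subjectAltNames(peerCert)}} together with the host parts of the configured servers to {{isPeerAllowed}}. Because the decision depends only on the certificate and the static config, the result no longer varies with whichever name the reverse lookup happens to return.
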
I'd argue that the second option is the better one, especially since the Java API doesn't even seem to have the option of retrieving all DNS entries, but also because it better matches the expressed intent of the ZK admin.

Additionally, it would be nice to have a "disable client hostname verification" option that still leaves server hostname verification enabled. Strictly speaking this is a separate issue though; I'd be happy to spin that out into a ticket of its own.

  was:
Currently, enabling Quorum TLS makes the server validate the SANs in the client certificates of connecting quorum peers against their reverse DNS names.

We have seen this cause issues when running in Kubernetes, because IP addresses resolve to multiple DNS names when ZooKeeper pods participate in multiple services.
Since `InetAddress.getHostName()` returns a single String, it basically becomes a game of chance which DNS name is checked against the cert.

This has caused issues in the Strimzi operator as well (see [this issue|https://github.com/strimzi/strimzi-kafka-operator/issues/3099]) - they solved this by adding pretty much anything they could find that might be relevant to the SAN, and a few wildcards on top of that.

This is both error-prone and adds no meaningful security, since "This certificate matches the connecting peer" shouldn't automatically mean "this peer should be allowed to connect".

There are two (probably more) ways to fix this:
# Retrieve _all_ reverse entries and check against all of them
# The ZK server could verify the SAN against the list of servers ({{server.N}} in the config). A peer should be able to connect on the quorum port if and only if at least one SAN matches at least one of the listed servers.

I'd argue that the second option is the better one, especially since the Java API doesn't even seem to have the option of retrieving all DNS entries, but also because it better matches the expressed intent of the ZK admin.

Additionally, it would be nice to have a "disable client hostname verification" option that still leaves server hostname verification enabled. Strictly speaking this is a separate issue though; I'd be happy to spin that out into a ticket of its own.


> TLS Quorum hostname verification breaks in some scenarios
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-4790
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4790
>             Project: ZooKeeper
>          Issue Type: Improvement
>    Affects Versions: 3.9.1
>            Reporter: Sönke Liebau
>            Priority: Minor
>
> Currently, enabling Quorum TLS makes the server validate the SANs in the
> client certificates of connecting quorum peers against their reverse DNS
> names.
> We have seen this cause issues when running in Kubernetes, because IP
> addresses resolve to multiple DNS names when ZooKeeper pods participate in
> multiple services. In this scenario CoreDNS returns a random one out of the
> list of possible hostnames that match the IP - so when ZooKeeper does a
> reverse lookup on the IP it becomes a game of chance whether it gets a name
> that is contained in the cert.
> Since `InetAddress.getHostName()` returns a single String, it basically
> becomes a game of chance which DNS name is checked against the cert.
> This usually shakes itself loose after a few minutes, when the hostname
> returned by the reverse lookup randomly changes and all of a sudden matches
> the certificate... but this is less than ideal.
> This has caused issues in the Strimzi operator as well (see [this
> issue|https://github.com/strimzi/strimzi-kafka-operator/issues/3099]) - they
> solved this by adding pretty much anything they could find that might be
> relevant to the SAN, and a few wildcards on top of that.
> This is both error-prone and adds no meaningful security, since "This
> certificate matches the connecting peer" shouldn't automatically mean "this
> peer should be allowed to connect".
>
> There are two (probably more) ways to fix this:
> # Retrieve _all_ reverse entries and check against all of them
> # The ZK server could verify the SAN against the list of servers
> ({{server.N}} in the config). A peer should be able to connect on the
> quorum port if and only if at least one SAN matches at least one of the
> listed servers.
> I'd argue that the second option is the better one, especially since the Java
> API doesn't even seem to have the option of retrieving all DNS entries, but
> also because it better matches the expressed intent of the ZK admin.
> Additionally, it would be nice to have a "disable client hostname
> verification" option that still leaves server hostname verification enabled.
> Strictly speaking this is a separate issue though; I'd be happy to spin that
> out into a ticket of its own.



--
This message was sent by Atlassian Jira (v8.20.10#820010)