Hi all,
A couple of days ago, a user reported being unable to perform a master-failover,
citing a GnuTLS handshake error on the master-candidate side. The error was
consistently reproducible and caused by the client never sending its own
certificate to the master's noded, as evidenced by a pcap trace obtained by the
user. A couple of tests run by the user indicated that this is an issue with
pycurl: both gnutls-cli and openssl s_client could connect properly to the
master's noded using the master candidate's client.pem.
Eventually, the user was able to perform a master-failover after renewing the
cluster keys and was kind enough to mail us the old key material to
examine. I was able to reproduce the behaviour locally, using openssl
s_server with server.pem as a mock server and a small C program
(attached¹) as a mock ganeti RPC client.
The following is an analysis of what caused this error.
Note that the user is running Debian and this is probably the same as
Debian bug #755554, which also involved ganeti (but was filed against
pycurl).
¹ The C program must be compiled against the GnuTLS variant of libcurl.
On Debian this means having libcurl4-gnutls-dev installed.
TLS client certificate authentication
-------------------------------------
During the TLS handshake, the server may request that the client send a
certificate, indicating which Certification Authorities it trusts for
client authentication. This is done using a separate handshake message
(Certificate Request), sent in the same flight as the Server Hello and
containing the supported key and signature types along with the
Distinguished Names of the trusted Certification Authorities. The CA DNs are sent in X.509
ASN.1 encoding, specifying the components to match, their type and
value. The client will then consult its own certificate list to check if
it has a certificate of supported type issued by one of the indicated
CAs and send it back using a Certificate TLS message.
In this particular case the DN present in the server's Certificate
Request message is:
0000 31 1b 30 19 06 03 55 04 03 13 12 67 61 6e 65 74 1.0...U....ganet
0010 69 2e 65 78 61 6d 70 6c 65 2e 63 6f 6d i.example.com
This translates to a DN with CN=ganeti.example.com. The attribute type (CN, OID
2.5.4.3) is encoded as 0x55 0x04 0x03, and its value as the 18-byte (0x12)
printableString (0x13) "ganeti.example.com". However, the client sent an
empty Certificate response, despite the fact that client.pem is indeed issued by
CN=ganeti.example.com. Taking a closer look at client.pem in its DER-encoded
form[1] and extracting the issuer DN from it, we see:
0000 31 1b 30 19 06 03 55 04 03 0c 12 67 61 6e 65 74 1.0...U....ganet
0010 69 2e 65 78 61 6d 70 6c 65 2e 63 6f 6d i.example.com
There is a single byte difference at offset 0x09, which indicates the CN value
type: the server's certificate has a type of 0x13 (printableString),
while the client's certificate uses 0x0c (utf8String). This difference
is the actual cause of the problem and is probably related to issue
#853.
[1] sed -n -e '/BEGIN CERTIFICATE/,/END CERTIFICATE/p' client.pem | \
    grep -v '^---' | base64 -d | hd
How GnuTLS handles Certificate Requests
---------------------------------------
(The following analysis uses the source of GnuTLS 3.3.15 from the 3.3.15-2
Debian package)
Looking at select_client_cert() (lib/auth/cert.c:638), we see that GnuTLS
selects the client certificate to send using two different mechanisms:
- The first one is callback-based and triggered when a Certificate Request is
received and decoded. Among other things, it allows the user to choose a
certificate interactively.
- If there are no callbacks installed, it calls find_x509_cert() which tries
to guess the correct certificate to use based on internal state only.
Looking at cURL's code (lib/vtls/gtls.c), it looks like cURL does not install
any callbacks and relies on GnuTLS's built-in behavior, thus falling back to
find_x509_cert(). This can also be confirmed using gdb on an actual
process.
Looking at find_x509_cert() (lib/auth/cert.c:206) more closely, we see
that it basically comes down to extracting the DNs from the Certificate
Request and from its own candidate certificates and doing a memcmp() on them.
However, due to this 1-byte difference, the memcmp() will fail and
discard the certificate as a no-match.
Note that GnuTLS is not at fault here: RFC 5280
(https://www.ietf.org/rfc/rfc5280.txt) § 4.1.2.6 mandates that:
(a) When the subject of the certificate is a CA, the subject
field MUST be encoded in the same way as it is encoded in the
issuer field (Section 4.1.2.4) in all certificates issued by
the subject CA. Thus, if the subject CA encodes attributes
in the issuer fields of certificates that it issues using the
TeletexString, BMPString, or UniversalString encodings, then
the subject field of certificates issued to that CA MUST use
the same encoding.
What this basically means is that since server.pem encodes its Subject as a
printableString, all client.pem files _must_ have their Issuer fields encoded
as printableStrings (and vice versa).
Conclusion
----------
Unfortunately I didn't have the time to examine how Ganeti generates and
distributes client.pem to the nodes. The fact that this bug is rarely seen and
does not affect all clusters or nodes suggests that it is probably
distribution-specific and has been fixed with issue #853. However, we have to
make sure that:
- Certificate generation works properly with respect to RFC 5280, i.e. generated
certificates always have their Issuer encoded in the same way as the
CA's Subject. If they don't, then the library used to sign the CSRs
is at fault and the issue must be further investigated.
- We never touch/re-encode the client certificates on distribution. I assume
this has been fixed with #853.
Q & A
-----
Q: Why did gnutls-cli work?
A: gnutls-cli installs a callback, cert_callback() (src/cli.c:536), to
set the client certificate without consulting the DNs, thus never
falling back to find_x509_cert() and its memcmp().
Q: Why does /usr/bin/curl proper (the binary) work?
A: /usr/bin/curl in Debian is linked against the OpenSSL variant of libcurl. It
looks like OpenSSL does DN comparison in a different way and is not affected.
Cheers,
Apollon
#include <stdio.h>
#include <curl/curl.h>

/* Mock Ganeti RPC client: connects to https://localhost:1811/ and offers
 * client.pem as its client certificate. */
int main(void) {
    CURL *curl;
    CURLcode res;

    curl_global_init(CURL_GLOBAL_DEFAULT);
    curl = curl_easy_init();
    if (curl) {
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 0L);
        /* Verify the server against server.pem, but skip the hostname check. */
        curl_easy_setopt(curl, CURLOPT_CAINFO, "server.pem");
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 1L);
        /* Client certificate to present when the server requests one. */
        curl_easy_setopt(curl, CURLOPT_SSLCERT, "client.pem");
        curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
        curl_easy_setopt(curl, CURLOPT_URL, "https://localhost:1811/");
        curl_easy_setopt(curl, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1);

        res = curl_easy_perform(curl);
        if (res != CURLE_OK)
            fprintf(stderr, "curl_easy_perform() failed: %s\n",
                    curl_easy_strerror(res));
        curl_easy_cleanup(curl);
    }
    curl_global_cleanup();
    return 0;
}