Hi all,

A couple of days ago, a user reported inability to perform a master-failover,
citing a GnuTLS handshake error on the master-candidate side. The error was
consistently reproducible and caused by the client never sending its own
certificate to the master's noded, as evidenced by a pcap trace obtained by the
user. A couple of tests run by the user indicated that this is an issue with
pycurl: both gnutls-cli and openssl s_client could connect properly to the
master's noded using the master candidate's client.pem.

The user was eventually able to perform a master-failover after renewing the
cluster keys and was kind enough to mail us the old key material to
examine. I was able to reproduce the behaviour locally, using openssl
s_server with server.pem as a mock server and a small C program
(attached¹) as a mock ganeti RPC client.

The following is an analysis of what caused this error.

Note that the user is running Debian and this is probably the same as 
Debian bug #755554, which also involved ganeti (but was filed against 
pycurl).

¹ The C program must be compiled against the GnuTLS variant of libcurl.  
  On Debian this means having libcurl4-gnutls-dev installed.

TLS client certificate authentication
-------------------------------------

During the TLS handshake, the server may request a certificate from the
client and indicate which Certification Authorities it trusts for client
authentication. This is done using a dedicated handshake message
(Certificate Request), sent in the same flight as the Server Hello. It
contains the supported certificate and signature types and the
Distinguished Names of the trusted Certification Authorities. The CA DNs
are sent in X.509 ASN.1 (DER) encoding, specifying the components to
match, their type and value. The client will then consult its own
certificate list to check if it has a certificate of a supported type
issued by one of the indicated CAs and send it back using a Certificate
TLS message.
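
To make the layout of the CA list concrete, here is a minimal sketch in C.
It assumes the wire format from RFC 5246 § 7.4.4 (a 2-byte total length,
then one 2-byte length prefix per DER-encoded DN). count_ca_names() is a
hypothetical helper written for this mail, not a GnuTLS or cURL function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the DistinguishedName entries in a
 * certificate_authorities vector (RFC 5246, section 7.4.4).
 * Layout assumed: 2-byte big-endian total length, then for each entry a
 * 2-byte big-endian length followed by that many bytes of DER-encoded DN.
 * Returns the number of entries, or -1 if the buffer is malformed. */
int count_ca_names(const unsigned char *buf, size_t len)
{
	size_t total, pos = 2;
	int count = 0;

	assert(buf != NULL);
	if (len < 2)
		return -1;
	total = ((size_t)buf[0] << 8) | buf[1];
	if (total != len - 2)
		return -1;

	while (pos < len) {
		size_t dn_len;

		if (len - pos < 2)
			return -1;
		dn_len = ((size_t)buf[pos] << 8) | buf[pos + 1];
		pos += 2;
		if (dn_len == 0 || dn_len > len - pos)
			return -1;
		pos += dn_len;	/* skip over the DER-encoded DN itself */
		count++;
	}
	return count;
}
```

Each DN skipped by the loop is an opaque DER blob at this level; the
client has to compare it against the Issuer of its own certificates.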

In this particular case the DN present in the server's Certificate
Request message is:

0000   31 1b 30 19 06 03 55 04 03 13 12 67 61 6e 65 74  1.0...U....ganet
0010   69 2e 65 78 61 6d 70 6c 65 2e 63 6f 6d           i.example.com

This translates to a DN with CN=ganeti.example.com. The key (CN) is specified
using the OID bytes 0x55 0x04 0x03 (2.5.4.3), and its value is specified as the
18-byte (0x12) printableString (0x13) "ganeti.example.com". However, the client
sent an empty Certificate response, despite the fact that client.pem is indeed
issued by CN=ganeti.example.com. Taking a closer look at client.pem in its
DER-encoded form[1] and extracting the issuer DN from it, we see:

0000   31 1b 30 19 06 03 55 04 03 0c 12 67 61 6e 65 74  1.0...U....ganet
0010   69 2e 65 78 61 6d 70 6c 65 2e 63 6f 6d           i.example.com

There is a single byte difference at offset 0x09, which indicates the CN value
type: the server's certificate has a type of 0x13 (printableString), 
while the client's certificate uses 0x0c (utf8String). This difference 
is the actual cause of the problem and is probably related to issue 
#853.

[1] sed -n -e '/BEGIN CERTIFICATE/,/END CERTIFICATE/p' client.pem | \
      grep -v '^---' | base64 -d | hd
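
The two dumps can also be decoded programmatically. The following is a
minimal sketch, assuming only the simple layout seen above (one SET holding
one SEQUENCE of OID plus value, all with short-form lengths). The names
rdn_value_tag(), string_type_name(), server_rdn and client_rdn are made up
for this example.

```c
#include <assert.h>
#include <stddef.h>

/* CN RDN from the server's Certificate Request (value tag 0x13). */
const unsigned char server_rdn[] = {
	0x31, 0x1b, 0x30, 0x19, 0x06, 0x03, 0x55, 0x04, 0x03, 0x13, 0x12,
	0x67, 0x61, 0x6e, 0x65, 0x74, 0x69, 0x2e, 0x65, 0x78,
	0x61, 0x6d, 0x70, 0x6c, 0x65, 0x2e, 0x63, 0x6f, 0x6d
};

/* Issuer CN RDN extracted from client.pem (value tag 0x0c). */
const unsigned char client_rdn[] = {
	0x31, 0x1b, 0x30, 0x19, 0x06, 0x03, 0x55, 0x04, 0x03, 0x0c, 0x12,
	0x67, 0x61, 0x6e, 0x65, 0x74, 0x69, 0x2e, 0x65, 0x78,
	0x61, 0x6d, 0x70, 0x6c, 0x65, 0x2e, 0x63, 0x6f, 0x6d
};

/* Return the ASN.1 tag of the attribute value in a single-attribute RDN,
 * or -1 if the buffer is not SET { SEQUENCE { OID, value } } with
 * short-form (< 128) lengths that add up to exactly len bytes. */
int rdn_value_tag(const unsigned char *der, size_t len)
{
	size_t value_pos;

	assert(der != NULL);
	if (len < 6)
		return -1;
	if (der[0] != 0x31 || der[1] >= 0x80 || (size_t)der[1] != len - 2)
		return -1;	/* outer SET */
	if (der[2] != 0x30 || der[3] >= 0x80 || (size_t)der[3] != len - 4)
		return -1;	/* inner SEQUENCE */
	if (der[4] != 0x06 || der[5] >= 0x80)
		return -1;	/* attribute type OID */
	value_pos = 6 + (size_t)der[5];
	if (value_pos + 2 > len)
		return -1;
	if (der[value_pos + 1] >= 0x80 ||
	    value_pos + 2 + der[value_pos + 1] != len)
		return -1;	/* value must fill the rest of the buffer */
	return der[value_pos];
}

/* Human-readable name for the two string types discussed here. */
const char *string_type_name(int tag)
{
	switch (tag) {
	case 0x0c: return "UTF8String";
	case 0x13: return "PrintableString";
	default:   return "other";
	}
}
```

Calling rdn_value_tag() on server_rdn yields 0x13 and on client_rdn yields
0x0c, i.e. the same text stored under two different ASN.1 string types.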

How GnuTLS handles Certificate Requests
---------------------------------------

(The following analysis uses the source of GnuTLS 3.3.15 from the 3.3.15-2
 Debian package)

Looking at select_client_cert() (lib/auth/cert.c:638), we see that GnuTLS
selects the client certificate to send using two different mechanisms:

 - The first one is callback-based and triggered when a Certificate Request is
   received and decoded. Its purpose is (among other things) to allow e.g. the
   user to choose a certificate interactively.

 - If there are no callbacks installed, it calls find_x509_cert() which tries
   to guess the correct certificate to use based on internal state only.

Looking at cURL's code (lib/vtls/gtls.c), it looks like cURL does not install
any callbacks and relies on GnuTLS's built-in behavior, thus falling back to
find_x509_cert(). This can also be confirmed using gdb on an actual 
process.

Looking at find_x509_cert() (lib/auth/cert.c:206) more closely, we see
that it comes down to extracting the DNs from the Certificate Request
and the issuer DNs from its own candidate certificates and doing a
memcmp() on them. Because of the 1-byte difference shown above, the
memcmp() fails and the certificate is discarded as a no-match.
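
The effect of such a byte-wise comparison can be sketched in a few lines of
C. This is not the GnuTLS code, only an illustration of the same idea;
dn_equal_bytes(), first_mismatch(), requested_dn and issuer_dn are names
invented for this example, and the byte arrays are the two dumps from the
previous section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* DN requested by the server (CN as printableString, tag 0x13). */
const unsigned char requested_dn[] = {
	0x31, 0x1b, 0x30, 0x19, 0x06, 0x03, 0x55, 0x04, 0x03, 0x13, 0x12,
	0x67, 0x61, 0x6e, 0x65, 0x74, 0x69, 0x2e, 0x65, 0x78,
	0x61, 0x6d, 0x70, 0x6c, 0x65, 0x2e, 0x63, 0x6f, 0x6d
};

/* Issuer DN of client.pem (CN as utf8String, tag 0x0c). */
const unsigned char issuer_dn[] = {
	0x31, 0x1b, 0x30, 0x19, 0x06, 0x03, 0x55, 0x04, 0x03, 0x0c, 0x12,
	0x67, 0x61, 0x6e, 0x65, 0x74, 0x69, 0x2e, 0x65, 0x78,
	0x61, 0x6d, 0x70, 0x6c, 0x65, 0x2e, 0x63, 0x6f, 0x6d
};

/* Byte-for-byte comparison: equal only if length and content match. */
int dn_equal_bytes(const unsigned char *a, size_t a_len,
		   const unsigned char *b, size_t b_len)
{
	assert(a != NULL && b != NULL);
	return a_len == b_len && memcmp(a, b, a_len) == 0;
}

/* Offset of the first differing byte within the common prefix,
 * or -1 if the common prefix is identical. */
long first_mismatch(const unsigned char *a, size_t a_len,
		    const unsigned char *b, size_t b_len)
{
	size_t i, n = a_len < b_len ? a_len : b_len;

	assert(a != NULL && b != NULL);
	for (i = 0; i < n; i++)
		if (a[i] != b[i])
			return (long)i;
	return -1;
}
```

With these inputs dn_equal_bytes() returns 0 and first_mismatch() returns
9, the offset of the string type tag, even though both DNs read as
CN=ganeti.example.com.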

Note that GnuTLS is not at fault here: RFC 5280 
(https://www.ietf.org/rfc/rfc5280.txt) § 4.1.2.6 mandates that:

   (a)  When the subject of the certificate is a CA, the subject
        field MUST be encoded in the same way as it is encoded in the
        issuer field (Section 4.1.2.4) in all certificates issued by
        the subject CA.  Thus, if the subject CA encodes attributes
        in the issuer fields of certificates that it issues using the
        TeletexString, BMPString, or UniversalString encodings, then
        the subject field of certificates issued to that CA MUST use
        the same encoding.

What this means is that since server.pem encodes its Subject as a
printableString, every client.pem issued by it _must_ have its Issuer field
encoded as a printableString as well (and vice versa).

Conclusion
----------

Unfortunately I didn't have the time to examine how Ganeti generates and
distributes client.pem to the nodes. The fact that this bug is rarely seen and
does not affect all clusters or nodes suggests that it is probably
distribution-specific and has been fixed with issue #853. However, we have to
make sure that:

 - Certificate generation works properly wrt. RFC 5280, i.e. generated 
   certificates always have their Issuer encoded in the same way as the 
   CA's Subject. If it doesn't, then the library used to sign the CSRs 
   is at fault and the issue must be further investigated.

 - We never touch/re-encode the client certificates on distribution. I assume
   this has been fixed with #853.

Q & A
-----

Q: Why did gnutls-cli work?
A: gnutls-cli installs a callback, cert_callback() (src/cli.c:536), to
   set the client certificate without consulting the DNs, thus never
   falling back to find_x509_cert() and its memcmp().

Q: Why does /usr/bin/curl proper (the binary) work?
A: /usr/bin/curl in Debian is linked against the OpenSSL variant of libcurl. It
   looks like OpenSSL does DN comparison in a different way and is not affected.

Cheers,
Apollon

/* Attachment: mock Ganeti RPC client used to reproduce the handshake failure.
 * Build against the GnuTLS variant of libcurl (libcurl4-gnutls-dev on Debian). */
#include <stdio.h>
#include <curl/curl.h>

int main(void) {
	CURL *curl;
	CURLcode res;

	curl_global_init(CURL_GLOBAL_DEFAULT);

	curl = curl_easy_init();

	if (curl) {
		curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 0L);
		/* Verify the server against server.pem, but skip the hostname check */
		curl_easy_setopt(curl, CURLOPT_CAINFO, "server.pem");
		curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
		curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 1L);
		/* Client certificate (and key) to present when the server asks for one */
		curl_easy_setopt(curl, CURLOPT_SSLCERT, "client.pem");
		curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
		/* Mock server (openssl s_server) listening on the noded port */
		curl_easy_setopt(curl, CURLOPT_URL, "https://localhost:1811/");
		curl_easy_setopt(curl, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1);
		res = curl_easy_perform(curl);
		if (res != CURLE_OK)
			fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
		curl_easy_cleanup(curl);
	}
	curl_global_cleanup();
	return 0;
}
