Ciao Guido!

On 14:34 Tue 26 May     , Guido Trotter wrote:
> Thanks for the very detailed report.
> 
> Is there anything that you see we could change/improve in Ganeti for
> this, or we need to live with the fact that the bug caused this issue
> and a renewal makes it go away?

Let me recap (mostly to make sure that I've understood node-security 
correctly). We have the following facts regarding the use of SSL 
certificates in Ganeti:

 a) server.pem is self-signed, shared between all nodes and used by 
    noded's server.

 b) client.pem is also self-signed, generated on each node during node 
    join or renew-crypto and is used by the master as a client 
    certificate to connect to the node daemons. It is also used by a 
    prospective master to initiate a master-failover by connecting to 
    other nodes' node daemons.

 c) Both the client and the server side involve Python code: noded is 
    pure Python and RPC on the master is currently invoked from Python 
    job code. noded's server implementation relies explicitly on OpenSSL 
    (via pyOpenSSL), while the RPC client code uses pycurl with whatever 
    underlying TLS library curl uses on the given platform; for Debian 
    this is GnuTLS.
 
 d) Certificate generation is always using OpenSSL. server.pem is 
    (re-)generated on the master, while client.pem is generated by and 
    on each node separately. This means that client.pem and server.pem 
    are generated by potentially different OpenSSL versions.
 
 e) SSL client authentication on the noded is *not* performed using the 
    X.509 CA model; instead, the client certificate fingerprint is 
    compared against a list of known-good master(-candidate) certificate 
    fingerprints.

¹ excluding RAPI clients.
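
To make point e) concrete, here is a minimal sketch of fingerprint-based 
client authentication. It is not Ganeti's actual code: the function 
names, the choice of SHA-1 and the colon-separated output format are all 
assumptions made for illustration, and the input is simply the 
DER-encoded certificate as bytes.

```python
import hashlib
import hmac


def fingerprint(der_bytes):
    """Return a colon-separated, upper-case SHA-1 digest of a certificate.

    Assumption: SHA-1 and this output format are illustrative choices,
    not necessarily what Ganeti stores.
    """
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def is_known_master(der_bytes, known_fingerprints):
    """Accept the client only if its fingerprint is on the whitelist.

    No CA chain is consulted; the whitelist is the only trust anchor.
    """
    presented = fingerprint(der_bytes)
    return any(hmac.compare_digest(presented, known)
               for known in known_fingerprints)
```

Note that this check can only run if the client sends its certificate 
in the first place, which is exactly the step that fails in the 
scenario described below.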

The problem arises from the fact that server.pem and client.pem are two 
completely unrelated certificates, i.e. one is not signed by the other.  
We (implicitly) rely on the fact that they bear the same Subject and 
Issuer DN to force the client SSL implementation to send its certificate 
over to the noded for authentication. However, the certificate 
subject/issuer is a typed value and GnuTLS expects both the type *and* 
the value to match.

Unfortunately, we have reached a point where OpenSSL seems to have 
changed defaults: wheezy's OpenSSL (1.0.1e) generates certificates with 
PrintableString subjects, while jessie's OpenSSL (1.0.1k) generates 
certificates with UTF8String subjects. Furthermore, we seem to have no 
control whatsoever over the encoding used, at least at pyOpenSSL's 
level.
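
The effect of the string type can be illustrated with a small sketch 
that DER-encodes the same text once as a PrintableString and once as a 
UTF8String. The tag values (0x13 and 0x0C) are the standard ASN.1 
universal tags; the helper name and the CN value are made up for the 
example, and the helper only handles short strings.

```python
PRINTABLE_STRING = 0x13  # ASN.1 universal tag 19
UTF8_STRING = 0x0C       # ASN.1 universal tag 12


def der_string(tag, text):
    """DER-encode a short string as tag byte, length byte, content."""
    data = text.encode("utf-8")
    if len(data) > 127:
        raise ValueError("sketch only handles short-form lengths")
    return bytes([tag, len(data)]) + data


cn = "node.example.com"  # hypothetical CN, for illustration only
old_style = der_string(PRINTABLE_STRING, cn)
new_style = der_string(UTF8_STRING, cn)

# The content bytes are identical for plain ASCII text...
print(old_style[2:] == new_style[2:])
# ...but the full encodings differ in the leading tag byte.
print(old_style == new_style)
```

Any comparison that looks at the encoded name rather than the decoded 
text will therefore treat the two subjects as different, even though 
they read the same.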

So, in a mixed-distribution cluster, failover from a wheezy master to a 
jessie master will not be possible, unless server.pem is first replaced 
with one generated on a jessie node. Note that gnt-cluster renew-crypto 
will likely not work in that case, because the master will still 
generate a server.pem with a different OpenSSL version than most other 
nodes in the cluster. Renewal will only work if all nodes have been 
upgraded to the same OpenSSL version and were just using "old" 
certificates.

What can we do about it? The way I see it, there are two solutions:

 i) Implement a proper CA, have all client certificates signed by that 
    CA and use that same CA for client authentication, together with the 
    fingerprint whitelist. Signing the certificate would make sure that 
    the Subject DN sent by the server would always match the Issuer DN 
    of client.pem, as per RFC 5280. This is obviously a high development 
    cost solution, and partly redundant because we don't need to rely on 
    a CA to perform authentication.

 ii) Collect *all node certificates* and use all of them as "trusted 
     CAs". As discussed above, when we set server.pem as the "client CA" 
     (HttpBase._CreateSocket), we do so exploiting the fact that it 
     shares the same CN with all clients. Instead of collecting just the 
     fingerprints in ssconf_master_candidate_certificates, we can 
     collect the certificates themselves and load them all as trusted 
     CAs using Context.set_client_ca_list()[1]. This will make sure that 
     each client will send its own certificate regardless of how the 
     Subject DN is encoded, and will also allow things like setting a 
     different CN for each node. This can work in parallel with the 
     fingerprint whitelist, but maybe later we can completely drop the 
     latter and rely on normal SSL verification rules (not sure about 
     the security implications, it's just a suggestion).
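
A rough sketch of what solution ii) could look like with pyOpenSSL is 
given here. The two function names are hypothetical and this is not a 
patch against HttpBase._CreateSocket; it only uses 
crypto.load_certificate(), X509.get_subject() and 
Context.set_client_ca_list(), and assumes the node certificates are 
available as PEM strings.

```python
def load_node_certificates(pem_blobs):
    """Parse PEM-encoded node certificates into pyOpenSSL X509 objects."""
    # Imported lazily so the rest of the sketch loads without pyOpenSSL.
    from OpenSSL import crypto
    return [crypto.load_certificate(crypto.FILETYPE_PEM, pem)
            for pem in pem_blobs]


def advertise_client_cas(ctx, certs):
    """Advertise every node certificate's Subject as an acceptable CA.

    ctx is expected to be an OpenSSL.SSL.Context. Because the names are
    taken from the certificates themselves, each client sees its own
    Issuer DN in the list, in whatever encoding it was generated with.
    """
    names = [cert.get_subject() for cert in certs]
    ctx.set_client_ca_list(names)
    return names
```

In this sketch the list only influences which certificate the client 
chooses to send; accepting or rejecting it would still be left to the 
existing fingerprint check.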

[1] 
https://pythonhosted.org/pyOpenSSL/api/ssl.html#OpenSSL.SSL.Context.set_client_ca_list

I think solution ii) above is the easiest and most straightforward to 
implement. It is also more correct from an SSL standpoint than what we 
have today. Helga, any opinions on this?

Cheers,
Apollon
