Ciao Guido!
On 14:34 Tue 26 May , Guido Trotter wrote:
> Thanks for the very detailed report.
>
> Is there anything that you see we could change/improve in Ganeti for
> this, or we need to live with the fact that the bug caused this issue
> and a renewal makes it go away?
Let me recap (mostly to make sure that I've understood node-security
correctly). We have the following facts regarding the use of SSL
certificates in Ganeti:
a) server.pem is self-signed, shared between all nodes and used by
noded's server.
b) client.pem is also self-signed, generated on each node during node
join or renew-crypto and is used by the master as a client
certificate to connect to the node daemons. It is also used by a
prospective master to initiate a master-failover by connecting to
other nodes' node daemons.
c) Both the client and the server side involve Python code: noded is pure
Python and RPC on the master is currently invoked from Python job
code. noded's server implementation relies explicitly on OpenSSL
(via pyopenssl), while the RPC client code uses pycurl with whatever
underlying TLS library curl uses on the given platform; for Debian
this is GnuTLS.
d) Certificate generation always uses OpenSSL. server.pem is
(re-)generated on the master, while client.pem is generated by and
on each node separately. This means that client.pem and server.pem
are generated by potentially different OpenSSL versions.
e) SSL client authentication on the noded is *not* performed using the
X.509 CA model; instead, the client certificate fingerprint is
compared against a list of known-good master(-candidate) certificate
fingerprints.
¹ excluding RAPI clients.
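To illustrate point e), here is a minimal sketch of fingerprint-based client authentication. The digest algorithm (SHA-256), the function names and the placeholder certificate bytes are assumptions made for the example; they are not taken from Ganeti's code.

```python
import hashlib
import hmac


def fingerprint(der_cert: bytes) -> str:
    """Return a hex digest of the certificate's DER bytes (SHA-256 is assumed here)."""
    return hashlib.sha256(der_cert).hexdigest()


def is_known_candidate(der_cert: bytes, known_fingerprints) -> bool:
    """Accept the peer only if its fingerprint is in the known-good list.

    No CA chain is consulted: the fingerprint list alone decides.
    """
    presented = fingerprint(der_cert)
    return any(hmac.compare_digest(presented, known)
               for known in known_fingerprints)


# Placeholder byte strings standing in for real DER-encoded client.pem contents.
master_cert = b"placeholder-der-bytes-of-master-client.pem"
stranger_cert = b"placeholder-der-bytes-of-unknown-client.pem"

known = {fingerprint(master_cert)}
print(is_known_candidate(master_cert, known))
print(is_known_candidate(stranger_cert, known))
```

Note that a check like this can only run once the client has actually sent its certificate, which is exactly the step that fails in the scenario described below.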
The problem arises from the fact that server.pem and client.pem are two
completely unrelated certificates, i.e. one is not signed by the other.
We (implicitly) rely on the fact that they bear the same Subject and
Issuer DN to force the client SSL implementation to send its certificate
over to the noded for authentication. However, the certificate
subject/issuer is a typed value and GnuTLS expects both the type *and*
the value to match.
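A small sketch of why the type matters: the same CN encoded as a PrintableString (ASN.1 tag 0x13) and as a UTF8String (tag 0x0C) yields different DER bytes, so a byte-wise DN comparison fails even though the text is identical. The CN value and the helper name are made up for the example.

```python
PRINTABLE_STRING = 0x13  # ASN.1 universal tag for PrintableString
UTF8_STRING = 0x0C       # ASN.1 universal tag for UTF8String


def der_string(tag: int, text: str) -> bytes:
    """DER-encode a short string as tag, length, value (short-form length only)."""
    data = text.encode("utf-8")
    if len(data) > 127:
        raise ValueError("this sketch only handles strings up to 127 bytes")
    return bytes([tag, len(data)]) + data


cn = "cluster.example.com"  # hypothetical CN, for illustration only
printable = der_string(PRINTABLE_STRING, cn)
utf8 = der_string(UTF8_STRING, cn)

print(printable.hex())
print(utf8.hex())
print("same characters:", printable[2:] == utf8[2:])
print("same DER bytes: ", printable == utf8)
```

Only the leading tag byte differs, but that is enough for a strict comparison of the encoded names to treat the two DNs as different.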
Unfortunately, we have reached a point where OpenSSL seems to have
changed defaults: wheezy's OpenSSL (1.0.1e) generates certificates with
PrintableString subjects, while jessie's OpenSSL (1.0.1k) generates
certificates with UTF8String subjects. Furthermore, we seem to have no
control whatsoever over the encoding used, at least at pyOpenSSL's level.
So, in a mixed-distribution cluster, failover from a wheezy master to a
jessie master will not be possible, unless server.pem is first replaced
with one generated on a jessie node. Note that gnt-cluster renew-crypto
will likely not work in that case, because the master will still
generate a server.pem with a different OpenSSL version than most other
nodes in the cluster. Renewal will only work if all nodes have been
upgraded to the same OpenSSL version and were just using "old"
certificates.
What can we do about it? The way I see it, there are two solutions:
i) Implement a proper CA, have all client certificates signed by that
CA and use that same CA for client authentication, together with the
fingerprint whitelist. Signing the certificate would make sure that
the Subject DN sent by the server would always match the Issuer DN
of client.pem, as per RFC 5280. This is obviously a high development
cost solution, and partly redundant because we don't need to rely on
a CA to perform authentication.
ii) Collect *all node certificates* and use all of them as "trusted
CAs". As discussed above, when we set server.pem as the "client CA"
(HttpBase._CreateSocket), we do so exploiting the fact that it
shares the same CN with all clients. Instead of collecting just the
fingerprints in ssconf_master_candidate_certificates, we can
collect the certificates themselves and load them all as trusted
CAs using Context.set_client_ca_list()[1]. This will make sure that
each client sends its own certificate regardless of how the
Subject DN is encoded, and will also allow things like setting a
different CN for each node. This can work in parallel with the
fingerprint whitelist, but maybe later we can completely drop the
latter and rely on normal SSL verification rules (not sure about
the security implications, it's just a suggestion).
[1]
https://pythonhosted.org/pyOpenSSL/api/ssl.html#OpenSSL.SSL.Context.set_client_ca_list
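A rough sketch of what solution ii) could look like with pyOpenSSL. The function names are hypothetical and this is not how HttpBase._CreateSocket is written today; it only shows the calls involved, assuming the node certificates are available as PEM strings.

```python
def load_pem_certificates(pem_blobs):
    """Parse PEM strings into pyOpenSSL X509 objects (needs pyOpenSSL installed)."""
    # Imported lazily so that the helper below carries no hard dependency.
    from OpenSSL import crypto
    return [crypto.load_certificate(crypto.FILETYPE_PEM, pem)
            for pem in pem_blobs]


def advertise_node_certificates(ctx, certs):
    """Advertise every node certificate's subject as an acceptable client CA.

    set_client_ca_list() takes X509Name objects and only controls the names
    sent in the server's certificate request; the actual authentication is
    assumed to remain the fingerprint check, running in parallel.
    """
    ctx.set_client_ca_list([cert.get_subject() for cert in certs])


# Possible use on the noded side (node_pems is a hypothetical list of the
# collected client.pem contents):
#
#   from OpenSSL import SSL
#   ctx = SSL.Context(SSL.SSLv23_METHOD)
#   advertise_node_certificates(ctx, load_pem_certificates(node_pems))
```

Since each client then finds the subject of its own certificate in the advertised list, the encoding used by whichever OpenSSL version generated server.pem should no longer matter.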
I think solution ii) above is the easiest and most straightforward to
implement. It is also more correct from an SSL standpoint than what we
have today. Helga, any opinions on this?
Cheers,
Apollon