On Sat, Dec 30, 2017 at 2:28 AM, Antoine Pitrou <solip...@pitrou.net> wrote:
> On Fri, 29 Dec 2017 21:54:46 +0100
> Christian Heimes <christ...@python.org> wrote:
>>
>> On the other hand ssl module is currently completely broken. It converts
>> hostnames from bytes to text with 'idna' codec in some places, but not
>> in all. The SSLSocket.server_hostname attribute and callback function
>> SSLContext.set_servername_callback() are decoded as U-label.
>> Certificate's common name and subject alternative name fields are not
>> decoded and therefore A-labels. The *must* stay A-labels because
>> hostname verification is only defined in terms of A-labels. We even had
>> a security issue once, because partial wildcard like 'xn*.example.org'
>> must not match IDN hosts like 'xn--bcher-kva.example.org'.
>>
>> In issue [2] and PR [3], we all agreed that the only sensible fix is to
>> make 'SSLContext.server_hostname' an ASCII text A-label.
>
> What are the changes in API terms? If I'm calling wrap_socket(), can I
> pass `server_hostname='straße'` and it will IDNA-encode it? Or do I
> have to encode it myself? If the latter, it seems like we are putting
> the burden of protocol compliance on users.

Part of what makes this confusing is that there are actually three intertwined issues here. (Also, anything that deals with Unicode *or* SSL/TLS is automatically confusing, and this is about both!)

Issue 1: Python's built-in IDNA implementation is wrong (implements IDNA 2003, not IDNA 2008).

Issue 2: The ssl module insists on using Python's built-in IDNA implementation whether you want it to or not.

Issue 3: Also, the ssl module has a separate bug that means client-side cert validation has never worked for any IDNA domain.

Issue 1 is potentially a security issue, because it means that in a small number of cases, Python will misinterpret a domain name. IDNA 2003 and IDNA 2008 are very similar, but there are 4 characters that are interpreted differently, with ß being one of them. Fixing this is a big job, though, and doesn't really have anything to do with the ssl module -- for example, socket.getaddrinfo("straße.de", 80) and sock.connect(("straße.de", 80)) also do the wrong thing. Christian's not proposing to fix this here. It's issues 2 and 3 that he's proposing to fix.

Issue 2 is a problem because it makes it impossible to work around issue 1, even for users who know what they're doing. In the socket module, you can avoid Python's automagical IDNA handling by doing the encoding manually, and then calling socket.getaddrinfo("strasse.de", 80) or socket.getaddrinfo("xn--strae-oqa.de", 80), whichever you prefer. In the ssl module, this doesn't work.

There are two places where ssl uses hostnames. In client mode, the user specifies the server_hostname that they want to see a certificate for, and then the module runs this through Python's IDNA machinery *even if* it's already properly encoded in ASCII. And in server mode, when the user has specified an SNI callback so they can find out which certificate an incoming client connection is looking for, the module runs the incoming name through Python's IDNA machinery before handing it to user code.

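To make the Issue 1 discrepancy and the socket-level workaround concrete, here is a minimal sketch using only the built-in "idna" codec. The variable names are illustrative, and the IDNA 2008 A-label is simply the one quoted in this thread, written out by hand rather than computed:

```python
# Sketch only: shows what Python's built-in "idna" codec (IDNA 2003) does.
# Variable names are illustrative and not part of any API.

# IDNA 2003 maps "ß" to "ss" before encoding, so the name silently changes.
builtin_encoding = "straße.de".encode("idna")

# The IDNA 2008 A-label for the same name, as quoted in this thread.
# It is written out by hand here; the built-in codec cannot produce it.
manual_a_label = "xn--strae-oqa.de"

# A name that is already pure ASCII passes through the codec unchanged,
# which is why encoding it yourself works as a workaround for socket calls.
passthrough = manual_a_label.encode("idna")

print(builtin_encoding)
print(passthrough)
```

The first value comes out as b'strasse.de', i.e. a different domain from the one the user typed, while the hand-encoded A-label survives untouched.
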
In both cases, the right thing to do would be to just pass through the ASCII A-label versions, so savvy users can do whatever they want with them. (This also matches the general design principle around IDNA, which assumes that the pretty Unicode U-labels are used only for UI purposes, and everything internal uses A-labels.)

Issue 3 is just a silly bug that needs to be fixed, but it's tangled up here because the fix is the same as for Issue 2: the reason client-side cert validation has never worked is that we've been taking the A-label from the server's certificate and checking if it matches the U-label we expect, and of course it never does because we're comparing strings in different encodings. If we consistently converted everything to A-labels as soon as possible and kept it that way, then this bug would never have happened.

What makes it tricky is that on both the client and the server, fixing this is actually user-visible.

On the client, checking sslsock.server_hostname used to always show a U-label, but if we stop using U-labels internally then this doesn't make sense. Fortunately, since this case has never worked at all, fixing it shouldn't cause any problems.

On the server, the obvious fix would be to start passing A-label-encoded names to the servername_callback, instead of U-label-encoded names. Unfortunately, this is a bit trickier, because this *has* historically worked (AFAIK) for IDNA names, so long as they didn't use one of the four magic characters that changed meaning between IDNA 2003 and IDNA 2008. But we do still need to do something. For example, right now, it's impossible to use the ssl module to implement a web server at https://straße.de, because incoming connections will use SNI to say that they expect a cert for "xn--strae-oqa.de", and then the ssl module will freak out and throw an exception instead of invoking the servername callback.

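Both the Issue 3 mismatch and the straße.de failure can be reproduced with the built-in codec alone. This is a rough sketch, not the ssl module's actual matching code: to_a_label is a hypothetical helper, and because it relies on the built-in codec it inherits the IDNA 2003 behaviour described above.

```python
def to_a_label(hostname):
    """Return hostname as a lowercase ASCII A-label string.

    Hypothetical helper for illustration. It uses the built-in "idna"
    codec, so it follows IDNA 2003 rules for non-ASCII input.
    """
    try:
        hostname.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII input: encode each label to its A-label form.
        return hostname.encode("idna").decode("ascii")
    # Already ASCII: leave it alone apart from case folding.
    return hostname.lower()


cert_name = "xn--bcher-kva.example.org"  # A-label, as found in a certificate
expected = "bücher.example.org"          # U-label, as typed by the user

# Comparing an A-label against a U-label can never succeed.
naive_match = cert_name == expected

# Converting both sides to A-labels first makes the comparison meaningful.
a_label_match = to_a_label(cert_name) == to_a_label(expected)

# The built-in codec refuses to decode the IDNA 2008 A-label for straße.de,
# because re-encoding the result under IDNA 2003 rules does not round-trip.
try:
    b"xn--strae-oqa.de".decode("idna")
    decode_failed = False
except UnicodeError:
    decode_failed = True

print(naive_match, a_label_match, decode_failed)
```

The failed decode at the end is the same kind of error that keeps the servername callback from ever being invoked for such a name.
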
It's ugly, but probably the simplest thing is to add a new function like set_servername_callback2 that uses the A-label, and then redefine set_servername_callback as a deprecated compatibility shim:

    def set_servername_callback(self, cb):
        def shim_cb(sslobj, servername, sslctx):
            # The new-style callback receives an ASCII A-label (or None);
            # decode it to a U-label to preserve the old behaviour.
            if servername is not None:
                servername = servername.encode("ascii").decode("idna")
            return cb(sslobj, servername, sslctx)
        self.set_servername_callback2(shim_cb)

We can bikeshed what the new name should be. Maybe set_sni_callback? Or set_server_hostname_callback, since the corresponding client-mode argument is server_hostname?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev