On Mon, 2016-08-29 at 15:25 +0200, Andy Polyakov wrote:
> First of all. *Everything* that is said below (and what modifications in
> question are about) applies to non-ASCII passwords. ASCII-only passwords
> are not affected by this and keep working as they were.
> 
> > > commit 647ac8d3d7143e3721d55e1f57730b6f26e72fc9
> > > 
> > > OpenSSL versions before 1.1.0 didn't convert non-ASCII
> > > UTF8 PKCS#12 passwords to Unicode correctly.
> > > 
> > > To correctly decrypt older files, if MAC verification fails
> > > with the supplied password attempt to use the broken format
> > > which is compatible with earlier versions of OpenSSL.
> > > 
> > > Reviewed-by: Richard Levitte <levi...@openssl.org>
> > 
> > Hm, this sounds like something that other crypto libraries also ought
> > to try to work around.
> > 
> > Please could you elaborate on the specific problem, and/or show a test
> > case?
> 
> Note that there is 80-test_pkcs12.t that refers to shibboleth.pfx.
Thanks.

> > I'm not quite sure how to interpret the patch itself. You pass the
> > password through OPENSSL_asc2uni() and then OPENSSL_uni2utf8() — which
> > is essentially converting ISO8859-1 to UTF-8.
> 
> You make it sound like these two are called one after another. But
> that's not what happens. It *tries* to access data with passwords
> converted with OPENSSL_asc2uni *or* OPENSSL_utf82uni, effectively
> independently of each other, in order to see which one is right. So that
> you can *transparently* access old broken data, as well as
> specification-compliant one.

Hm... at line 541 of pkcs12.c we call PKCS12_verify_mac(…mpass…) with the
original provided password. Which is in the locale-native character set,
is it not? No attempt to convert to anything known.

Then if that *fails* we do indeed convert it via OPENSSL_asc2uni() and
OPENSSL_uni2utf8() to make 'badpass' and try again:

    } else if (!PKCS12_verify_mac(p12, mpass, -1)) {
        /*
         * May be UTF8 from previous version of OpenSSL:
         * convert to a UTF8 form which will translate
         * to the same Unicode password.
         */
        unsigned char *utmp;
        int utmplen;
        utmp = OPENSSL_asc2uni(mpass, -1, NULL, &utmplen);
        if (utmp == NULL)
            goto end;
        badpass = OPENSSL_uni2utf8(utmp, utmplen);
        OPENSSL_free(utmp);
        if (!PKCS12_verify_mac(p12, badpass, -1)) {
            BIO_printf(bio_err, "Mac verify error: invalid password?\n");
            ERR_print_errors(bio_err);
            goto end;
        } else {
            BIO_printf(bio_err, "Warning: using broken algorithm\n");
            if (!twopass)
                cpass = badpass;
        }

The shibboleth.pfx test seems to pass on the *first* call to
PKCS12_verify_mac() — it *isn't* testing this fallback. If I add a space
to the end of the correct password and stick a breakpoint on
PKCS12_verify_mac(), I see the following calls:

#0  PKCS12_verify_mac (p12=0x956e40, pass=0x956a30 "σύνθημα γνώρισμα ", passlen=-1)
    at crypto/pkcs12/p12_mutl.c:152
#1  0x0000000000425567 in pkcs12_main (argc=0, argv=0x7fffffffdd90)
    at apps/pkcs12.c:541

And then it converts that string from ISO8859-1 (which it wasn't) to
UTF-8, and calls PKCS12_verify_mac() again:

#0  PKCS12_verify_mac (p12=0x956e40, pass=0x9597e0 "Ï\302\203Ï\302\215νθημα γνÏ\302\216Ï\302\201ιÏ\302\203μα ", passlen=-1)
    at crypto/pkcs12/p12_mutl.c:152
#1  0x00000000004255fc in pkcs12_main (argc=0, argv=0x7fffffffdd90)
    at apps/pkcs12.c:554

> > So, if my password is "naïve". In UTF-8 that's 6e 61 c3 af 76 65, which
> > is the correct sequence of bytes to use for the password?
> 
> 00 6e 00 61 00 ef 00 76 00 65, big-endian UTF-16.

Is that conversion done inside PKCS12_verify_mac()? Because we definitely
aren't passing UTF-16-BE into PKCS12_verify_mac().

> > And you now convert that sequence of bytes to 6e 61 c3 83 c2 af 76 65
> > by assuming it's ISO8859-1 (which would be 'naïve') and converting to
> > UTF-8?
> 
> I don't follow. Why would it have to be converted to this sequence?

That's what it's doing. Changing the example above and showing the same
breakpoints as they get hit again...

Starting program: /home/dwmw2/git/openssl/apps/openssl pkcs12 -noout -password pass:naïve -in test/shibboleth.pfx
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, PKCS12_verify_mac (p12=0x956e30, pass=0x956a30 "naïve", passlen=-1)
    at crypto/pkcs12/p12_mutl.c:152
152         if (p12->mac == NULL) {
(gdb) x/7bx pass
0x956a30:       0x6e    0x61    0xc3    0xaf    0x76    0x65    0x00
(gdb) c
Continuing.

Breakpoint 1, PKCS12_verify_mac (p12=0x956e30, pass=0x959960 "naïve", passlen=-1)
    at crypto/pkcs12/p12_mutl.c:152
152         if (p12->mac == NULL) {
(gdb) x/8bx pass
0x959960:       0x6e    0x61    0xc3    0x83    0xc2    0xaf    0x76    0x65
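Just to spell out what that fallback actually does to the bytes (this is
my own illustration, not code from the tree): OPENSSL_asc2uni() followed
by OPENSSL_uni2utf8() amounts to transcoding whatever bytes it was handed
from ISO8859-1 to UTF-8, something like:

/*
 * Illustration only, not OpenSSL source: the net byte-level effect of
 * OPENSSL_asc2uni() + OPENSSL_uni2utf8() on the password string.
 * 'out' must have room for 2*len bytes; returns the output length.
 */
#include <stddef.h>

static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[o++] = in[i];                  /* ASCII passes through */
        } else {
            out[o++] = 0xc0 | (in[i] >> 6);    /* 0xc2 or 0xc3 */
            out[o++] = 0x80 | (in[i] & 0x3f);
        }
    }
    return o;
}

Feed that the UTF-8 bytes 6e 61 c3 af 76 65 and you get precisely the
6e 61 c3 83 c2 af 76 65 seen at the second breakpoint above.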
> > So... what was the bug that was actually being worked around?
> 
> 6e 61 c3 af 76 65 was expanded to 00 6e 00 61 00 c3 00 af 00 76 00 65,
> in violation of specification.

OK, that makes sense, because PKCS12_verify_mac() is going to convert that
last byte sequence I showed above into UTF16-BE, making it:

  006e 0061 00c3 00af 0076 0065

> > That older versions were converting from the local charset to UTF-8
> > twice in a row? So you've implemented a "convert ISO8859-1 to UTF-8"
> > fallback which will cope with that in *some* locales but not all...?
> 
> Yeah, something like that. Note that if you have created PKCS#12 file
> with a non-UTF8 locale, it's not given that you can read it with another
> locale, UTF-8 or not. This was *always* the case. And that's *not* what
> we try to address here. We assume that you don't change locale when you
> upgrade OpenSSL version. Ultimate goal is to make it compliant with
> specification and therefore interoperable with other compliant
> implementations. But we can't just switch, because then it stops
> interoperate with older OpenSSL versions.

Hm, words fail me. Well, that's not entirely true. But *polite* words fail
me...

Let me try to understand this... you have always ignored, and still
ignore, the actual LC_CTYPE which tells you the character set in which the
password was provided by the user.

You *used* to convert it through your horribly misnamed OPENSSL_asc2uni()
function, which actually converts ISO8859-1 to UTF16BE by simply expanding
it and inserting a 0x00 before each byte of the original. So effectively
you were "assuming" the input was ISO8859-1.

Now you assume it's UTF-8, and convert it with OPENSSL_utf82uni(). (And
now what happens when the locale input *isn't* valid UTF-8, because it's a
legacy 8-bit charset? That conversion should fail, right?)

Your latest workaround (from which I started this thread) is addressing
the first problem *purely* for the case of the UTF-8 locale. It checks for
the "we treated UTF-8 as if it were ISO8859-1" case, which is what leads
to the 006e 0061 00c3 00af 0076 0065 example you gave above.

But you *haven't* actually implemented any workaround for the other whole
set of "we treated locale XXXXX as if it were ISO8859-1" bugs that your
original code had. Or the whole set of "we treated locale XXXXX as if it
were UTF-8" bugs that the new code has.

So for other applications to try to read OpenSSL's PKCS#12 files, what we
need to do is first convert the sane Unicode rendition of the password
into the native locale charset (e.g. Windows-1252), then take that
bytestream and *pretend* it's ISO8859-1, and convert it to UTF-16BE to
check the MAC. Then if that fails, take the same Windows-1252 bytestream
and *pretend* it's UTF-8, and convert *that* to UTF-16BE to see if it
works.

So... let's have a password 'nałve', in an ISO8859-2 environment where
that is represented by the bytes 6e 61 b3 76 65.

First I should try the correct version according to the spec:
  006e 0061 0142 0076 0065

Then we try the "OpenSSL assumed it was ISO8859-1" version:
  006e 0061 00b3 0076 0065

And finally we try the "OpenSSL assumed it was UTF-8" version:
  006e 0061 ... actually you *can't* convert that byte sequence "from"
UTF-8, since it isn't valid UTF-8. So what will new OpenSSL do here,
again?
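To make that recipe concrete, here is a rough sketch of the three
candidate UTF-16BE passwords an interoperating application would have to
generate and try in turn against the MAC. It's my own illustration with
made-up helper names, not any real API: it assumes the caller has both the
password's actual Unicode code points and the raw bytes in the locale
charset, it handles BMP code points only with a toy UTF-8 decoder (1- and
2-byte sequences), and it omits the trailing 00 00 terminator which the
PKCS#12 KDF appends, so that the output matches the byte sequences quoted
above.

#include <stdint.h>
#include <stddef.h>

static size_t put16be(unsigned char *out, size_t o, uint16_t cp)
{
    out[o] = (unsigned char)(cp >> 8);
    out[o + 1] = (unsigned char)cp;
    return o + 2;
}

/* 1. Spec-correct: the real Unicode code points as big-endian UTF-16. */
size_t cand_correct(const uint16_t *cps, size_t n, unsigned char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < n; i++)
        o = put16be(out, o, cps[i]);
    return o;
}

/* 2. What old OpenSSL did: treat the locale bytes as ISO8859-1, i.e.
 *    prepend 0x00 to every byte (which is all OPENSSL_asc2uni() does). */
size_t cand_as_latin1(const unsigned char *loc, size_t n, unsigned char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < n; i++)
        o = put16be(out, o, loc[i]);
    return o;
}

/* 3. What 1.1.0 assumes: treat the locale bytes as UTF-8.  Returns 0
 *    when they aren't valid UTF-8, as with the ISO8859-2 'nałve' case. */
size_t cand_as_utf8(const unsigned char *loc, size_t n, unsigned char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < n; ) {
        if (loc[i] < 0x80) {
            o = put16be(out, o, loc[i]);
            i += 1;
        } else if ((loc[i] & 0xe0) == 0xc0 && i + 1 < n &&
                   (loc[i + 1] & 0xc0) == 0x80) {
            o = put16be(out, o, ((loc[i] & 0x1f) << 6) | (loc[i + 1] & 0x3f));
            i += 2;
        } else {
            return 0;   /* invalid UTF-8, or needs a fuller decoder */
        }
    }
    return o;
}

For the 'nałve' example those produce exactly the three sequences above,
and the third helper fails outright on 6e 61 b3 76 65, which brings us
back to the question of what new OpenSSL does with such a password.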
> Is this helpful?

On the whole, I wish I hadn't asked.... but I think we need to sort it
out.

For a start, *please* can you kill those horribly misnamed
OPENSSL_asc2uni() and OPENSSL_utf82uni() functions. Especially the former.
Having a function which is misnamed both in what it takes as input *and*
in what it gives as output is particularly impressive.

It was a stupid thing for Windows to do, to misuse the term 'Unicode' to
mean UCS-2LE (and later UTF-16LE, when it retroactively changed its mind).
For you to then abuse 'uni' to mean UTF-16BE, something *different* to
what Windows abuses the term for, is really not very helpful at all! :)

-- 
dwmw2