Re: strange results after setting utf8 -subj in openssl ca command
Look at this example:

$ openssl x509 -subject -nameopt oneline,-esc_msb,utf8 -noout -in 13/13_cert.pem
... CN = 13#ტესტერიN13

$ openssl x509 -subject -noout -in 13/13_cert.pem
... CN=13#\xE1\x83\xA2\xE1\x83\x94\xE1\x83\xA1\xE1\x83\xA2\xE1\x83\x94\xE1\x83\xA0\xE1\x83\x98N13

This certificate was signed by openssl ca without changing subject, and openssl req did not use BMPString and UCS-2 in this case. CN string contains Georgian letters but numbers are in ASCII so it is UTF-8 in fact.

So why openssl ca decides to use BMPString format? Looks like 1-byte code strings can be used without violating ASN.1 standard

----- Original Message -----
From: Dave Thompson dthomp...@prinpay.com
To: openssl-users@openssl.org
Cc:
Sent: Monday, July 30, 2012 6:47 AM
Subject: RE: strange results after setting utf8 -subj in openssl ca command

> From: owner-openssl-us...@openssl.org On Behalf Of Pica Pica Contact
> Sent: Saturday, 28 July, 2012 14:41

Note that X.509 certs (and ASN.1 generally) don't actually support UTF8. They support several 1-byte codes (some now obsolete), BMPString which is 2-byte UCS-2, and UniversalString which is 4-byte UCS-4. I believe OpenSSL selects the smallest of these into which the specified (Unicode) codepoints fit, which in this case is UCS-2.

> After adding -nameopt oneline,-esc_msb,utf8 result looks fine

That should translate the Unicode to UTF8 and output it, and assuming your terminal handles UTF8 then yes it will be good

> I call X509_NAME_oneline() function inside my application to get CN string, and application fails to convert number from CN field to integer, because X509_NAME_oneline() returns
> /CN=\x003\x000\x000\x000\x000\x00#
> instead of CN=3#

I'm pretty sure _oneline is what x509 -text without -nameopt uses.

> Probably I should use X509_NAME_print_ex(),

Or if you only want CN, you could get the raw CN item and its value out of the name structure which in OpenSSL is STACK_OF(X509_NAME_ENTRY).

> but I have doubts if this string encoding is correct and how it would work with other software. For example, certtool from GnuTLS outputs subject string in this way:
> $ certtool -i --infile 3.pem
> ...skipped...
> Subject: CN=#003300300030003000300023044204350441044210e210d410e110e24e2d56fd
> ...skipped...

That apparently is dumping the UCS-2 bytes. Compare to above.

> There are no such problems in openssl req, I can set UTF8 strings with numbers in certificate requests and resulting certificate is ok for me, but I need to ignore subject from certificate requests and set my own value
> Is it possible to fix openssl ca command somehow to encode numbers in UTF8 strings as strings, not numbers?

'ca' can only encode ASN.1 strings in the ways defined by ASN.1. You must decode them accordingly.

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    openssl-users@openssl.org
Automated List Manager                           majord...@openssl.org
Re: strange results after setting utf8 -subj in openssl ca command
On Sun, Jul 29, 2012, Dave Thompson wrote:

> Note that X.509 certs (and ASN.1 generally) don't actually support UTF8. They support several 1-byte codes (some now obsolete), BMPString which is 2-byte UCS-2, and UniversalString which is 4-byte UCS-4. I believe OpenSSL selects the smallest of these into which the specified (Unicode) codepoints fit, which in this case is UCS-2.

There is a UTF8String type which has been about for some time.

OpenSSL for certificate requests uses the smallest of a set of types determined by the string_mask option in openssl.cnf. This is set to utf8only in OpenSSL 1.0.0 and later.

Steve.
--
Dr Stephen N. Henson. OpenSSL project core developer.
Commercial tech support now available see: http://www.openssl.org
RE: strange results after setting utf8 -subj in openssl ca command
> From: owner-openssl-us...@openssl.org On Behalf Of Pica Pica Contact
> Sent: Monday, 30 July, 2012 13:47

> Look at this example:
<snip>
> This certificate was signed by openssl ca without changing subject, and openssl req did not use BMPString and UCS-2 in this case. CN string contains Georgian letters but numbers are in ASCII so it is UTF-8 in fact.

You're probably right. (To be positive, I'd check the req directly, not the x509 into which it is copied, because the copy *could* change the encoding as long as it doesn't change the canonical value. But I'd be surprised if it did. OTOH I've been surprised before.)

On rechecking I am reminded there *is* an ASN.1 type UTF8String, which I had forgotten when I answered before. Sorry for the misstatement.

> So why openssl ca decides to use BMPString format? Looks like 1-byte code strings can be used without violating ASN.1 standard

So that is a valid question. (Well, pedantically UTF8 is a variable-byte code, not a 1-byte code, but it's clear what you mean.)

I've definitely looked at some code, but I don't remember exactly where (or when), that chooses based on the chars needed, something like: if all are printable use PrintableString, else if all are 1-byte use GeneralString, else if all are 2-byte/BMP use BMPString, else use UniversalString. I'm guessing logic like that was used, and it wouldn't choose UTF8 even though UTF8 can represent all Unicode. You'll probably have to read the source or debug, unless someone else chips in.

If you don't need all the features of 'ca', like database and CRLs, you could try 'x509 -req -CA*' and see if it's different on this point. That is a separate implementation of nearly-identical function.
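The selection rule recalled in the message above can be sketched in Python. This is only an illustration of the "smallest type that fits" idea as described there, not OpenSSL's actual code (which also depends on the string_mask setting); the helper name `smallest_string_type` and the exact thresholds are assumptions made for the example.

```python
# Characters permitted in an ASN.1 PrintableString.
PRINTABLE = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    " '()+,-./:=?"
)


def smallest_string_type(text: str) -> str:
    """Hypothetical helper: pick the narrowest type per the recalled rule."""
    if all(ch in PRINTABLE for ch in text):
        return "PrintableString"
    highest = max(ord(ch) for ch in text)
    if highest <= 0xFF:
        return "GeneralString"    # the "1-byte" case in the description
    if highest <= 0xFFFF:
        return "BMPString"        # 2-byte UCS-2
    return "UniversalString"      # 4-byte UCS-4


for cn in ("JohnSmith", "12345#JohnSmith", "3#\u0442\u0435\u0441\u0442"):
    print(cn, "->", smallest_string_type(cn))
```

Note that a rule of this shape never returns UTF8String, and that '#' is outside the PrintableString alphabet, so even an all-ASCII number#UserName value falls through to the second branch.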
RE: strange results after setting utf8 -subj in openssl ca command
> From: owner-openssl-us...@openssl.org On Behalf Of Pica Pica Contact
> Sent: Saturday, 28 July, 2012 14:41

> My application uses X.509 certificates with commonName field set to following format: number#UserName,
> Everything is ok when UserName is in ascii, but when I sign new certificates using

<snip: ca ... -subj ... -utf8>

> and subject contains non-ASCII characters in UTF-8 encoding, the resulting certificate's CN looks this way:
> $ openssl x509 -in 3.pem -subject -noout
> subject= /CN=\x003\x000\x000\x000\x000\x00#\x04B\x045\x04A\x04B\x10\xE2
> \x10\xD4\x10\xE1\x10\xE2N-V\xFD
> Looks like string 3 is literally encoded as a sequence of bytes with corresponding decimal values, not as sequence of ASCII codes for characters 3, 0, 0,...

Nope. \xHH is exactly two hex digits for one byte. You have:

'\x00' '3' '\x00' '0' ... '\x00' '#' '\x04' 'B' '\x04' '5' ...

That is obviously the UCS-2 (BMPString) encoding of:

U+0033=digit3 U+0030=digit0,repeated4times U+0023=NumberSign
U+0442=Cyrillic.SmallTE U+0435 U+0441 U+0442
U+10E2=Georgian.LetterTar U+10D4 U+10E1 U+10E2
U+4E2D=CJK.something U+56FD=CJK.something

Note that X.509 certs (and ASN.1 generally) don't actually support UTF8. They support several 1-byte codes (some now obsolete), BMPString which is 2-byte UCS-2, and UniversalString which is 4-byte UCS-4. I believe OpenSSL selects the smallest of these into which the specified (Unicode) codepoints fit, which in this case is UCS-2.

> After adding -nameopt oneline,-esc_msb,utf8 result looks fine

That should translate the Unicode to UTF8 and output it, and assuming your terminal handles UTF8 then yes it will be good

> I call X509_NAME_oneline() function inside my application to get CN string, and application fails to convert number from CN field to integer, because X509_NAME_oneline() returns
> /CN=\x003\x000\x000\x000\x000\x00#
> instead of CN=3#

I'm pretty sure _oneline is what x509 -text without -nameopt uses.

> Probably I should use X509_NAME_print_ex(),

Or if you only want CN, you could get the raw CN item and its value out of the name structure which in OpenSSL is STACK_OF(X509_NAME_ENTRY).

> but I have doubts if this string encoding is correct and how it would work with other software. For example, certtool from GnuTLS outputs subject string in this way:
> $ certtool -i --infile 3.pem
> ...skipped...
> Subject: CN=#003300300030003000300023044204350441044210e210d410e110e24e2d56fd
> ...skipped...

That apparently is dumping the UCS-2 bytes. Compare to above.

> There are no such problems in openssl req, I can set UTF8 strings with numbers in certificate requests and resulting certificate is ok for me, but I need to ignore subject from certificate requests and set my own value
> Is it possible to fix openssl ca command somehow to encode numbers in UTF8 strings as strings, not numbers?

'ca' can only encode ASN.1 strings in the ways defined by ASN.1. You must decode them accordingly.
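The reading of the \xHH output given in this message can be checked with a short Python sketch. It is not part of OpenSSL; the helper name `unescape_oneline` is made up for illustration, and it assumes the value contains no literal backslashes other than the \xHH escapes. It turns the escaped string back into bytes and decodes them as big-endian UCS-2.

```python
import re


def unescape_oneline(value: str) -> bytes:
    """Hypothetical helper: turn \\xHH escapes back into raw bytes."""
    out = bytearray()
    pos = 0
    for match in re.finditer(r"\\x([0-9A-Fa-f]{2})", value):
        # Characters between escapes were printed literally, one byte each.
        out += value[pos:match.start()].encode("latin-1")
        out.append(int(match.group(1), 16))
        pos = match.end()
    out += value[pos:].encode("latin-1")
    return bytes(out)


# The CN value exactly as printed by 'openssl x509 -subject' in this thread.
escaped = (
    r"\x003\x000\x000\x000\x000\x00#\x04B\x045\x04A\x04B"
    r"\x10\xE2\x10\xD4\x10\xE1\x10\xE2N-V\xFD"
)

raw = unescape_oneline(escaped)
text = raw.decode("utf-16-be")  # BMPString is big-endian, 2 bytes per char

print(raw.hex())
print(text)
```

The hex printed for `raw` is the same byte sequence that certtool shows after `CN=#`, which supports the point that both tools are displaying one and the same BMPString value.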
strange results after setting utf8 -subj in openssl ca command
My application uses X.509 certificates with commonName field set to following format: number#UserName, for example 12345#JohnSmith

Everything is ok when UserName is in ascii, but when I sign new certificates using this command, for example:

openssl ca -config ca_config.txt -subj /CN=3#тестტესტ中国 -utf8 -batch -notext -out 3.pem -in /tmp/CSR-file

and subject contains non-ASCII characters in UTF-8 encoding, the resulting certificate's CN looks this way:

$ openssl x509 -in 3.pem -subject -noout
subject= /CN=\x003\x000\x000\x000\x000\x00#\x04B\x045\x04A\x04B\x10\xE2\x10\xD4\x10\xE1\x10\xE2N-V\xFD

Looks like string 3 is literally encoded as a sequence of bytes with corresponding decimal values, not as sequence of ASCII codes for characters 3, 0, 0,...

After adding -nameopt oneline,-esc_msb,utf8 result looks fine

$ openssl x509 -in 0/0_cert.pem -subject -nameopt oneline,-esc_msb,utf8 -noout
subject= CN = 3#тестტესტ中国

I call X509_NAME_oneline() function inside my application to get CN string, and application fails to convert number from CN field to integer, because X509_NAME_oneline() returns
/CN=\x003\x000\x000\x000\x000\x00#
instead of CN=3#

Probably I should use X509_NAME_print_ex(), but I have doubts if this string encoding is correct and how it would work with other software. For example, certtool from GnuTLS outputs subject string in this way:

$ certtool -i --infile 3.pem
...skipped...
Subject: CN=#003300300030003000300023044204350441044210e210d410e110e24e2d56fd
...skipped...

There are no such problems in openssl req, I can set UTF8 strings with numbers in certificate requests and resulting certificate is ok for me, but I need to ignore subject from certificate requests and set my own value

Is it possible to fix openssl ca command somehow to encode numbers in UTF8 strings as strings, not numbers?
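For the number#UserName parsing itself, one possible approach is to decode the raw CN bytes according to their ASN.1 string type before splitting on '#'. The Python sketch below only illustrates that idea; the names `CODECS` and `parse_cn` are hypothetical, and how the application obtains the raw bytes and the type name from the certificate is assumed and left out.

```python
from typing import Tuple

# ASN.1 string type -> Python codec for the value bytes.
CODECS = {
    "PrintableString": "ascii",
    "IA5String": "ascii",
    "UTF8String": "utf-8",
    "BMPString": "utf-16-be",        # 2-byte UCS-2, big-endian
    "UniversalString": "utf-32-be",  # 4-byte UCS-4, big-endian
}


def parse_cn(value: bytes, asn1_type: str) -> Tuple[int, str]:
    """Hypothetical helper: split a 'number#UserName' CN into its parts."""
    text = value.decode(CODECS[asn1_type])
    number, sep, user = text.partition("#")
    if not sep:
        raise ValueError("CN is not in number#UserName form: %r" % text)
    return int(number), user


# The same CN bytes that certtool printed as hex in this thread.
bmp_value = bytes.fromhex(
    "003300300030003000300023044204350441044210e210d410e110e24e2d56fd"
)
print(parse_cn(bmp_value, "BMPString"))
print(parse_cn(b"12345#JohnSmith", "UTF8String"))
```

Decoding first means the numeric prefix is parsed the same way whether the CA wrote the CN as BMPString or as UTF8String.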