On 12/16/2011 6:47 PM, Erwann Abalea wrote:
On 16/12/2011 16:29, Jakob Bohm wrote:
On 12/16/2011 3:22 PM, Erwann Abalea wrote:
On 16/12/2011 15:07, Jakob Bohm wrote:
I think we may have a bug here; would anyone from the core team
care to comment on this?

The apparent bug:

When enforcing the "match" policy for a DN part, openssl reports an
error if the CSR has used a different string type for the field but the
correct value. (The naively expected behavior is to recognize that the
strings are identical and use the configured encoding for the resulting
cert.)
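For illustration, here is a minimal sketch of the comparison I would
naively expect, using OpenSSL's ASN1_STRING_to_UTF8() to bring both
values to a common encoding first; dn_values_match is my own name, not
an existing OpenSSL function:

#include <string.h>
#include <openssl/asn1.h>
#include <openssl/crypto.h>

/* Hypothetical helper (not an existing OpenSSL API): compare two
 * DN attribute values by character content rather than by raw DER
 * encoding.  Returns 1 on match, 0 on mismatch, -1 on error. */
static int dn_values_match(ASN1_STRING *a, ASN1_STRING *b)
{
    unsigned char *ua = NULL, *ub = NULL;
    int la, lb, ret;

    /* Normalize both values (PrintableString, T61String, BMPString,
     * UTF8String, ...) to UTF-8 before comparing. */
    la = ASN1_STRING_to_UTF8(&ua, a);
    lb = ASN1_STRING_to_UTF8(&ub, b);
    if (la < 0 || lb < 0)
        ret = -1;
    else
        ret = (la == lb && memcmp(ua, ub, (size_t)la) == 0);

    OPENSSL_free(ua);
    OPENSSL_free(ub);
    return ret;
}

(As far as I can tell, ASN1_STRING_to_UTF8() simply treats a T61String
as ISO 8859-1, which sidesteps the T.61 questions discussed further
down but matches common practice.)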

Do you expect the "openssl ca" tool to apply the complete X.520 comparison rules before checking the policy?
Not unless there are OpenSSL functions to do the work.

Otherwise I just expect it to apply the character set conversions it uses for its
other operations (such as reading the config file or displaying DNs).

Fair.
I personally use the openssl command line tools to have a quick CA, not a full-featured one. The API is complete enough to let you code this.
But you're right, it would be fair to have consistent behaviour.

3. Validating a certificate whose issuing CA certificate specifies path
constraints where the issued certificate satisfies the path constraint
except for the exact choice of string type.

NameConstraints is a set of constraints imposed on the semantic value of the name elements, not on their encoding (string type, double-spacing, case differences, etc.).
The question was how the OpenSSL code (library and command line) handles
the scenario; your answer seems to indicate that it is indeed supposed to
compare the semantic character sequence, not the encoding.

That's what X.509 and X.520 impose. An algorithm is described in X.520 for name comparisons.
I understand, but does OpenSSL implement that?
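For reference, the preparation X.520 prescribes before a caseIgnoreMatch
comparison amounts to roughly this: remove leading and trailing spaces,
collapse inner runs of spaces to a single space, and compare
case-insensitively. A rough sketch, assuming ASCII-only case folding
(x520_prepare is a made-up name):

#include <ctype.h>
#include <stddef.h>

/* Rough approximation of X.520 string preparation before a
 * caseIgnoreMatch: drop leading/trailing spaces, collapse inner
 * runs of spaces to one space, fold case.  ASCII-only sketch;
 * the real rules cover the full character repertoire. */
static size_t x520_prepare(char *dst, const char *src)
{
    size_t n = 0;
    int pending_space = 0;

    while (*src == ' ')
        src++;                          /* leading spaces */
    for (; *src; src++) {
        if (*src == ' ') {
            pending_space = 1;          /* maybe inner, maybe trailing */
        } else {
            if (pending_space)
                dst[n++] = ' ';         /* one space per inner run */
            pending_space = 0;
            dst[n++] = (char)tolower((unsigned char)*src);
        }
    }
    dst[n] = '\0';                      /* trailing spaces dropped */
    return n;
}

Two attribute values would then be considered equal if their prepared
forms compare equal byte for byte.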

T.61 has no "well defined" bidirectional mapping with UTF8.
That said, T.61 was withdrawn before 1993 (IIRC) and shouldn't be used.

According to RFC1345, T.61 has a well defined mapping to named
characters also found in UNICODE.  Some of those are encoded
as two bytes in T.61 (using a modifier+base char scheme), the
rest as one byte.  That is what I mean by a bidirectional mapping
to a small (sub)set of UNICODE: most UNICODE code points cannot be
mapped to T.61, but the rest map both ways.
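To make the modifier+base scheme concrete: in T.61 the non-spacing
mark (bytes 0xC1-0xCF) comes before the letter it modifies, while the
equivalent UNICODE combining character comes after it. A partial
decoding sketch (the table covers only a handful of marks; a complete
one would be generated from the RFC1345 data):

#include <stdint.h>
#include <stddef.h>

/* Partial table: T.61 non-spacing mark -> UNICODE combining
 * character.  Only a handful of entries; 0 = not handled here. */
static uint32_t t61_combining(unsigned char c)
{
    switch (c) {
    case 0xC1: return 0x0300;   /* grave accent */
    case 0xC2: return 0x0301;   /* acute accent */
    case 0xC3: return 0x0302;   /* circumflex */
    case 0xC4: return 0x0303;   /* tilde */
    case 0xC8: return 0x0308;   /* diaeresis/umlaut */
    case 0xCB: return 0x0327;   /* cedilla */
    default:   return 0;
    }
}

/* Decode T.61 bytes into UNICODE code points.  The mark precedes
 * the base character in T.61 but follows it in UNICODE, so one
 * pending mark is buffered.  Sketch only: other single bytes are
 * passed through unchanged, which is fine for the ASCII range but
 * glosses over the 0xA0-0xFF block (and the ambiguities discussed
 * below).  Returns the number of code points written. */
static size_t t61_decode(const unsigned char *in, size_t len,
                         uint32_t *out)
{
    size_t i, n = 0;
    uint32_t pending = 0;

    for (i = 0; i < len; i++) {
        uint32_t comb = t61_combining(in[i]);
        if (comb != 0 && pending == 0) {
            pending = comb;             /* wait for the base char */
        } else {
            out[n++] = in[i];           /* the base character */
            if (pending) {
                out[n++] = pending;     /* combining mark after it */
                pending = 0;
            }
        }
    }
    return n;
}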

I'm not finished with the reading of T.61 (1988 edition), but here's what I found:
- 0xA6 is the '#' character, 0xA8 is the '¤' character (generic currency), but those characters can also be obtained with 0x23 and 0x24, respectively (Figure 2, note 4). Later in the same document, 0x23 and 0x24 are declared as "not used". This is both ambiguous and not bidirectional.
As you quote it (I don't have a copy), this sounds like 0x23 and 0x24
should not be emitted when encoding, but should be accepted when
decoding.
 - 0x7F and 0xFF are not defined, and are not defined as "not used".
RFC1345 seems to indicate that 0x7F maps to U+007F DEL.
- 0xC9 was the umlaut diacritical mark in the 1980 edition, which is still tolerated in the 1988 edition, but the tables don't clearly define 0xC9 (and again, don't define it as "not used"). 0xC8 is declared as "diaeresis or umlaut mark". As I don't have the 1980 edition, I don't know if that was already the case.
- Nothing is said about what happens if a diacritical mark is encoded without a base character.
RFC1345 seems to indicate that certain diacritical marks must always be
followed by a base character (which may be 0x20, space), and that the
others never are.
This is consistent with the behavior of mechanical teletypes and
typewriters: diacritics are implemented as overtyping "dead keys"
that place the diacritic on the paper but do not advance the print head,
thus causing the next character to be combined with it.

These are ambiguities.

Annexes define control sequences (longer than 2 bytes), graphical characters, configurable character sets, and presentation functions (selection of page format; character sizes and attributes (bold/italic/underline); line settings (vertical and horizontal spacing)). I doubt everything can be mapped to UTF8.
Most of those would be inapplicable to the encoding of X.500 strings;
configurable character sets sound like an ISO-2022-like mechanism
useful for encoding an even larger subset of UNICODE, as do graphical
characters.

However, none of those features were mentioned in the still-available
secondary references I looked at (RFC1345 and Wikipedia), so they are
unlikely to be accepted or emitted by any current implementation of T.61.

Constructing a mapping table from the data in RFC1345 or other
sources is left as an exercise for the reader (cheat hint: Maybe
IBM included such a table in ICU or unicode.org included one in
its data files).
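Another possible cheat: some iconv implementations already ship a T.61
converter (glibc, for instance, lists "T.61-8BIT" in iconv -l), so the
table can be borrowed rather than built. A quick test sketch, assuming
such a converter is present; whether it agrees with RFC1345 on the
ambiguous code points above would need checking:

#include <iconv.h>
#include <stdio.h>

int main(void)
{
    /* "T.61-8BIT" is the glibc name; other iconv implementations
     * may use a different name or lack the converter entirely. */
    iconv_t cd = iconv_open("UTF-8", "T.61-8BIT");
    if (cd == (iconv_t)-1) {
        perror("iconv_open");           /* no T.61 converter here */
        return 1;
    }

    char in[] = { (char)0xC8, 'u' };    /* umlaut mark + 'u' */
    char out[16];
    char *ip = in, *op = out;
    size_t il = sizeof(in), ol = sizeof(out);

    if (iconv(cd, &ip, &il, &op, &ol) == (size_t)-1)
        perror("iconv");
    else
        printf("%.*s\n", (int)(sizeof(out) - ol), out);

    iconv_close(cd);
    return 0;
}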

I think only a subset of T.61 is taken into consideration. But I haven't looked at the hinted files.

Sounds like it.  RFC1345 is a historic listing of character sets encountered
on the European part of the early Internet, with machine-readable tables of
each such encoding in terms of two-character abbreviations from a historic
ISO standard (fortunately re-documented within the RFC itself).

RFC1345 obviously predates both the IANA charset registry and the other
current catalogs of character set encodings.

