------- Additional Comments From zack at codesourcery dot com  2004-12-16 02:16 
-------
Subject: Re: gcc and UCN in identifiers: bug PR 9449

Al Simons <[EMAIL PROTECTED]> writes:

> Hi, Zack.
>
> I'm looking into adding UCN support for identifiers into the HP
> C/C++ compiler, and wondered if there is any new status on your
> implementation / design?  We'd like to do things the same way if at
> all possible.

I don't intend to implement this feature until the C committee, the
C++ committee, and the Unicode committee all agree on which Unicode
character sequences are legitimate in identifiers and what sort of
canonicalization is to be performed.  As long as there is no
agreement, implementation of this feature risks indeterminacy in
shared library ABIs.

Suppose that the identifier "get_length_in_Ångstroms" is part of a
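shared library's public interface.  The Å might be U+212B ANGSTROM SIGN,
U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE, or the two-code-point
sequence U+0041 U+030A (A followed by COMBINING RING ABOVE).  Suppose
further that the person who implemented the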
shared library used a text editor that generates NFD, so the library
header reads U+0041 U+030A.  But their compiler normalizes to NFC on
input, so the name in the shared library's symbol table reads U+00C5.
Now someone comes along with a compiler that does no normalization
whatsoever and tries to use the library.  They're going to get a link
error and they're not going to know why.  Worse, if someone recompiles
the library with a compiler that chose to normalize to NFD, its ABI
silently changes.
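The divergence can be reproduced with Python's standard `unicodedata` module.  The sketch below is only an illustration: it assumes that a hypothetical compiler emits an identifier's UTF-8 bytes unchanged as its symbol name, optionally after NFC or NFD normalization.  It does not model any real compiler's name mangling.  Without normalization the three spellings give three different symbols.  NFC maps all three to U+00C5 and NFD maps all three to U+0041 U+030A, but the NFC and NFD symbols still differ from each other.

```python
import unicodedata

# Three ways to spell the "Å" in get_length_in_Ångstroms.
SPELLINGS = {
    "U+212B (ANGSTROM SIGN)": "\u212B",
    "U+00C5 (precomposed)": "\u00C5",
    "U+0041 U+030A (decomposed)": "A\u030A",
}

def symbol(name, form=None):
    """Return the bytes a hypothetical compiler might write to the symbol
    table: the UTF-8 encoding of the identifier, normalized to `form`
    ("NFC" or "NFD") first, or left untouched when form is None."""
    if form is not None:
        name = unicodedata.normalize(form, name)
    return name.encode("utf-8")

if __name__ == "__main__":
    for label, a in SPELLINGS.items():
        ident = f"get_length_in_{a}ngstroms"
        print(label)
        for form in (None, "NFC", "NFD"):
            print(f"  {form or 'none':4}: {symbol(ident, form)!r}")
```

In this model, a library built by an NFC compiler and a client built by a non-normalizing compiler agree only when the header happened to use U+00C5.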

Joseph Myers insists that this situation cannot arise, because
C99/C++'s lists of valid Unicode code points in identifiers exclude
all combining forms.  But if I enforce those rules, users will hate the
compiler, because their text editors will generate what looks like
perfectly fine text and then the compiler will barf on it.  And I am
not prepared to trust that every editor on the planet will adhere to
C99/C++'s rules.  And even if I were, we'd still have the problem of
the C99 and C++ lists not being identical.

> There is a link in the bug report that appears to be broken; any
> chance you can hook it back up?
>
> <http://www.codesourcery.com/lists?2:mss:1481:danfdfbkjoaahbcmmeam>

My best guess is that this is now
<http://www.codesourcery.com/archives/cxx-abi-dev/msg00676.html>.
This is mostly about how to mangle non-ASCII characters in identifiers
to get them past limited linkers, and doesn't offer any help with the
problems I described above.

zw


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9449