> I am the opposite of an expert on this topic.  But in fact gcc does
> appear to have code related to (A), (B), and (C).  I repeat those
> choices here from Paul's original e-mail:
>
>
>   A.  Convert everything to UCNs in basic source characters as soon
>       as possible, that is, in translation phase 1.  (This is what
>       C++ requires, apparently.)
>
>   B.  Use native encodings where possible, UCNs otherwise.
>
>   C.  Convert everything to wide characters as soon as possible
>       using an internal encoding that encompasses the entire source
>       character set and all UCNs.
>
>
> Now, see libcpp/charset.c.  See the -finput-charset= option.  To me
> that looks like code which does something related to (A), (B), or (C).

It does.  I think (A) is the best fit for the code that we have in
libcpp at the moment.  Right now we translate characters as we read
them into an intermediate format that covers as much as possible
(IIRC).  Going with (A) would also make sure that we handle what C++
requires.
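
For concreteness, here is a rough sketch of what (A) amounts to, assuming
the input is UTF-8.  This is not the libcpp code; decode_utf8 and
utf8_to_ucns are names made up for illustration.  It also simplifies by
passing every ASCII byte through unchanged, although a few ASCII
characters are not strictly in the basic source character set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at S, with N
   bytes available.  Stores the code point in *CP and returns the number
   of bytes consumed, or 0 if the sequence is malformed or truncated.  */
static size_t
decode_utf8 (const unsigned char *s, size_t n, unsigned long *cp)
{
  size_t len, i;
  unsigned long c, min;

  if (n == 0)
    return 0;
  if (s[0] < 0x80)
    {
      *cp = s[0];
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0)
    { len = 2; c = s[0] & 0x1F; min = 0x80; }
  else if ((s[0] & 0xF0) == 0xE0)
    { len = 3; c = s[0] & 0x0F; min = 0x800; }
  else if ((s[0] & 0xF8) == 0xF0)
    { len = 4; c = s[0] & 0x07; min = 0x10000; }
  else
    return 0;

  if (n < len)
    return 0;
  for (i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;
      c = (c << 6) | (s[i] & 0x3F);
    }
  /* Reject overlong forms, surrogates, and values past U+10FFFF.  */
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;
  *cp = c;
  return len;
}

/* Hypothetical phase-1 pass: copy the NUL-terminated UTF-8 string IN to
   OUT, replacing each non-ASCII character with \uXXXX or \UXXXXXXXX.
   Returns 0 on success, -1 on malformed input or if OUT is too small.  */
int
utf8_to_ucns (const char *in, char *out, size_t outsize)
{
  const unsigned char *s;
  size_t n, used = 0;

  assert (in != NULL && out != NULL);
  if (outsize == 0)
    return -1;
  s = (const unsigned char *) in;
  n = strlen (in);
  out[0] = '\0';

  while (n > 0)
    {
      unsigned long cp;
      size_t len = decode_utf8 (s, n, &cp);
      int w;

      if (len == 0)
        return -1;
      if (cp < 0x80)
        w = snprintf (out + used, outsize - used, "%c", (int) cp);
      else if (cp <= 0xFFFF)
        w = snprintf (out + used, outsize - used, "\\u%04lX", cp);
      else
        w = snprintf (out + used, outsize - used, "\\U%08lX", cp);
      if (w < 0 || (size_t) w >= outsize - used)
        return -1;
      used += (size_t) w;
      s += len;
      n -= len;
    }
  return 0;
}
```

Everything after such a pass sees only ASCII plus UCNs, which is the
appeal of (A); the cost is that the original spelling has to be
recovered somehow for diagnostics and for stringification.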

I'm also not as much of an expert as I'd have liked to be when dealing
with this in the first place.

-eric
