Re: gcc and utf-8 source

Edward H. Trager Fri, 12 Nov 2004 11:33:18 -0800

Hi, Egmont,

The example you quote from Markus' page is "source code" written in ASCII,
with only the contents of a C-style static string in UTF-8.
There is no problem with this code!

However, if you try to write some code like this:

void Écrire(const char *myCString);   // Function name has Latin-1 chars *in UTF-8 encoding*
void åå(const char *myCString);     // Function name has Chinese chars *in UTF-8 encoding*

... instead of:

void myWriteFunction(const char *myCString); // Function name *limited to basic ASCII Latin*

... then you will run into trouble, not only with GCC but probably with
other compilers as well.

So:

1. Keep your code (all parts of it that are actually parsed by the compiler)
   limited to ASCII.  (Most people suggest that the code be in English, with
   English comments, for world-wide comprehension.)

2. Although the strings in your program can be in any encoding you want,
   UTF-8 certainly makes the most sense.

I have real-life production code containing message strings encoded in UTF-8
that compiles and executes just fine on numerous platforms.  I have never had
a problem with this code with either GCC or Intel's ICC on Linux, GCC on other
free *nix platforms like FreeBSD and OpenBSD, or Sun's Forte compiler on
Solaris 8.  I *never* use special compiler #pragmas, nor do I resort to
wide-character (wchar_t) strings.  I always just use UTF-8 encoding in simple
C-style "char *" strings or, for C++ code, in the standard C++ std::string
class.

- Ed Trager

On Friday 2004.11.12 18:45:08 +0100, Egmont Koblinger wrote:
> Hi,
> 
> I was reading Markus's page and found the example:
>   printf("%ls\n", L"Schöne Grüße");
> and noticed that gcc always interprets the source code according to Latin-1.
> 
> Then I googled a bit and found this reported to the gcc folks by Markus:
> http://sources.redhat.com/ml/libc-alpha/2000-09/msg00337.html
> 
> However, this happened four years ago, and I haven't found more recent
> pieces of information on this topic.
> 
> So my questions:
> 
>  - Is there a proper solution where I can write my source code in UTF-8?
>    I have linux with gcc 3.3.4 and it's not necessary for the code to be
>    portable to older or different systems.
> 
>  - Some people were discussing a cpp #pragma charset. Is it already
>    implemented? If yes, where can I find docs about it?
> 
>  - Does recompiling gcc with --enable-c-mbchar solve this issue? Will gcc
>    then honour my locale settings? Is it a stable, ready-for-production-use
>    option of gcc?
> 
>  - Are there any applications which are known to miscompile with a c-mbchar
>    gcc if I have non-Latin-1 (e.g. Latin-2 or UTF-8) locale settings?
> 
> 
> 
> thanks,
> 
> Egmont
> 
> --
> Linux-UTF8:   i18n of Linux on all levels
> Archive:      http://mail.nl.linux.org/linux-utf8/
> 
> 
> 
