Re: gcc and utf-8 source

Bruno Haible Fri, 12 Nov 2004 10:44:07 -0800

Egmont Koblinger wrote:
> I was reading Markus's page and found the example:
>   printf("%ls\n", L"Schöne Grüße");
> and noticed that gcc always interprets the source code according to
> Latin-1.


gcc-3.4's documentation contains the following:

`-fexec-charset=CHARSET'
     Set the execution character set, used for string and character
     constants.  The default is UTF-8.  CHARSET can be any encoding
     supported by the system's `iconv' library routine.

`-fwide-exec-charset=CHARSET'
     Set the wide execution character set, used for wide string and
     character constants.  The default is UTF-32 or UTF-16, whichever
     corresponds to the width of `wchar_t'.  As with
     `-ftarget-charset', CHARSET can be any encoding supported by the
     system's `iconv' library routine; however, you will have problems
     with encodings that do not fit exactly in `wchar_t'.

`-finput-charset=CHARSET'
     Set the input character set, used for translation from the
     character set of the input file to the source character set used
     by GCC. If the locale does not specify, or GCC cannot get this
     information from the locale, the default is UTF-8. This can be
     overridden by either the locale or this command line option.
     Currently the command line option takes precedence if there's a
     conflict. CHARSET can be any encoding supported by the system's
     `iconv' library routine.

and these options work fine for me.

However, these gcc options are normally not usable for portable programs.
This is because

  1) For  printf("%s\n", "Schöne Grüße");

     Many Linux users work in a UTF-8 locale, many others work in a
     pre-Unicode locale. Do you want to ship two executables, one
     produced with -fexec-charset=UTF-8 and one with
     -fexec-charset=ISO-8859-2 ?

  2) For  printf("%ls\n", L"Schöne Grüße");

     On Solaris, FreeBSD and others, the wide character encoding is
     locale dependent and not documented. Therefore there is no good
     choice for the -fwide-exec-charset option that you could make.

The portable solution is to use gettext:

     printf("%s\n", gettext ("Schoene Gruesse"));
or   printf("%s\n", gettext ("Greetings"));

This works on all platforms, with all compilers, and furthermore allows
the program to be localized.

OTOH, if you limit yourself to Linux systems and don't want your
programs to be portable or internationalized, you can now use form 2),
the wide string literal, since glibc's wchar_t encoding is not locale
dependent.

Bruno


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
