On Wednesday, June 30th, 2004, 19:41, "Elvis Presley" wrote:
> To: the Unicode experts at linux-utf8:
>
> Hi! Good morning,
>
> This should work, because the literal in printf() is a multi-byte
> character string.

We should all assume that the original source had UTF-8 characters in it,
shouldn't we? Then yes, it ought to work, provided you run it in a
UTF-8-savvy environment. In fact, whatever encoding you use, the program
will "work" provided that encoding is supported. Note, however, that the
encoding is required to support 'à' and '?' at the same time, a
requirement that restricts the choice significantly.

> But it didn't.
>
> I have two scenarios, neither is Linux.
>
> 1) A gcc based IDE called "Dev-4 C++" for Windows 98.

Let's first note that gcc delivers console binaries. That is a definitive
showstopper: the Windows 9x console cannot display 'à' and '?' at the same
time (OK, there is a hack for ?, but it does not scale).

> The editor in
> this program will not let me enter Greek letters, neither in 8859-7,
> nor utf-8. On Copy & Paste, it seems to convert everything to 8859-1,
> so I can write a really nice 'hello' program in Latin-1.

Complain to the author of this program. It ought to allow it, because
Windows is Unicode-enabled, and it is possible to write Unicode
applications (not console ones, but graphical ones) that use the whole
set.

> 2) gcc running in the cygwin emulator.
<snip>
> My question is this:
>
> I assume the compiler uses the locale to determine the character set
> to use for the string literals.

It might. But I am not sure GCC does. Anyway, there are two ways to act
here:

- One is what you wrote: change the locale of the compiler, so that it can
  figure out what is intended from the codes you gave it and process them
  accordingly. Given the profusion of encodings available out there, I
  understand GCC's position is NOT to do that. Rather, GCC pretends it
  knows nothing (more exactly, it only takes care of multibyte encodings,
  to avoid misinterpreting a 5C byte that is the second byte of a
  Shift-JIS or Big5 character as the beginning of an escape sequence) and
  simply transmits the characters unchanged from source to binary. Which
  brings us to...

- The other is what you probably had in mind: change the locale while
  _executing_, to have the correct characters displayed on the screen.
  Doing so requires, according to the C standard, a call to setlocale()
  to have any effect at all (and I notice there is no such call in "Your
  First Program"). But read on...

> Might I be able to set the locale in cygwin to 'something-utf8'?

No. Cygwin's setlocale() does nothing. It is a stub, and it refuses to
honour any locale except the basic "C" one. Furthermore, I do not believe
there is a setting that makes Windows NT's console (or any of its APIs in
general, with the exception of MultiByteToWideChar) accept UTF-8 as the
encoding of its input parameters. The only doable thing (but a useless
one) is to convince it to send UTF-8 "characters" to the screen, where
they would be displayed using the current setting, i.e. as garbage: in
effect, the reverse of what should really happen.
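For what it is worth, here is a minimal sketch of what such a "hello"
program could look like with the setlocale() call added. It assumes the
source file really is saved as UTF-8 and that it runs in a UTF-8-savvy
environment (a recent Linux console, say); the file name and the Catalan
text are only illustrations, not anything from the original program:

    /* hello.c -- minimal sketch; assumes a UTF-8 source file and a
       UTF-8-capable terminal.  Not the original "Your First Program". */
    #include <stdio.h>
    #include <locale.h>

    int main(void)
    {
        /* Without this call the program stays in the "C" locale,
           as the C standard prescribes. */
        setlocale(LC_ALL, "");

        /* To the compiler this literal is just a sequence of bytes;
           if the source is UTF-8, the bytes stored are UTF-8 too. */
        printf("Bon dia, món!\n");   /* 'ó' is two bytes, C3 B3 */
        return 0;
    }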
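And the pass-through behaviour I described can be checked by dumping the
bytes the compiler actually stored for a literal. Another rough sketch
(the literal is again just an example); it prints "C3 A0" if the source
was saved as UTF-8, and "E0" if it was Latin-1:

    /* bytes.c -- sketch: show the bytes the compiler kept for 'à'. */
    #include <stdio.h>

    int main(void)
    {
        const unsigned char s[] = "à";
        size_t i;

        for (i = 0; s[i] != '\0'; i++)
            printf("%02X ", (unsigned)s[i]);
        printf("\n");
        return 0;
    }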
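As for the NT console itself, the only sane route I can see goes through
the wide-character API: convert the UTF-8 text with MultiByteToWideChar()
and hand the result to WriteConsoleW(). A rough sketch of what I mean
(NT/2000/XP only, and only when stdout really is a console, not a pipe;
the string and the buffer size are arbitrary):

    /* wcon.c -- sketch: UTF-8 in the source, UTF-16 to the NT console. */
    #include <windows.h>

    int main(void)
    {
        const char *utf8 = "Bon dia, m\xC3\xB3n!\r\n"; /* "món" as UTF-8 bytes */
        wchar_t wide[64];
        DWORD written;
        int len;

        /* Convert the UTF-8 bytes, including the final '\0', to UTF-16. */
        len = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, 64);
        if (len > 0) {
            HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
            /* len counts the terminating L'\0'; do not write it. */
            WriteConsoleW(out, wide, (DWORD)(len - 1), &written, NULL);
        }
        return 0;
    }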
> Wouldn't it be nice if all computers were utf-8 by default, but I
> know, legacy issues... I don't think this Unicode thing will ever
> work, well, maybe in a hundred years. But by then, the world will be
> speaking English.

I do not believe it will. At least, not the English you or I are speaking
(and certainly not English as it is spoken these days in England). If we
do all end up speaking the same language around the world, it will merely
be some pidgin, probably English-based, but rather different (and to get
an idea of what it may look like, you have to travel to the South China
Sea, where this linguistic shift has already been going on for several
decades).

Antoine

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/