On Wednesday, June 30th, 2004, 19:41, "Elvis Presley" wrote:
> To: the Unicode experts at linux-utf8:
>
> Hi! Good morning,
>
> This should work, because the literal in printf() is a multi-byte
> character string.

We should all assume that the original source had UTF-8 characters in it,
shouldn't we? Then yes, it ought to work, provided you run it in a
UTF-8-savvy environment. In fact, whatever encoding you use, the program
will "work" provided that encoding is supported. Note, however, that the
encoding is required to support 'à' and '?' at the same time, a
requirement that restricts the choice significantly.

> But it didn't.
>
> I have two scenarios, neither is Linux.
>
> 1) A gcc based IDE called "Dev-4 C++" for Windows 98.

Let's first note that gcc delivers console binaries. That is a definitive
showstopper: the Windows 9x console cannot display 'à' and '?' at the same
time (OK, there is a hack for ?, but it does not scale).

> The editor in
> this program will not let me enter Greek letters, neither in 8859-7,
> nor utf-8. On Copy & Paste, it seems to convert everything to 8859-1,
> so I can write a really nice 'hello' program in Latin-1.

Complain to the author of this program. It ought to allow it, because
Windows is Unicode-enabled, and it is possible to write Unicode
applications (not console ones, but graphical ones) that use the whole
set.

> 2) gcc running in the cygwin emulator.
<snip>
> My question is this:
>
> I assume the compiler uses the locale to determine the character set
> to use for the string literals.

It might. But I am not sure GCC does. Anyway, there are two ways to act
here:

- One is what you wrote: change the locale of the compiler, so that it can
  figure out what is intended from the codes you gave it and process them
  accordingly. Given the profusion of encodings available out there, I
  understand GCC's position is NOT to do that. Rather, GCC pretends it
  knows nothing (more exactly, it only takes care of multibyte encodings,
  to avoid misinterpreting a 5C byte that is the second byte of a
  Shift-JIS or Big5 character as the beginning of an escape sequence) and
  simply transmits the characters unchanged from source to binary. Which
  brings us to...

- The other is what you probably had in mind: change the locale while
  _executing_, to have the correct characters displayed on the screen.
  Doing so requires, according to the C standard, a call to setlocale()
  to have any effect at all (and I notice there is no such call in "Your
  First Program"). But read on...

> Might I be able to set the locale in cygwin to 'something-utf8'?

No. Cygwin's setlocale() does nothing. It is a stub, and it refuses to
honour any locale except the basic "C" one. Furthermore, I do not believe
there is a setting that makes Windows NT's console (or any of its APIs in
general, with the exception of MultiByteToWideChar) accept UTF-8 as the
encoding of its input parameters. The only doable thing (but a useless
one) is to convince it to send UTF-8 "characters" to the screen, where
they would be displayed using the current setting, i.e. as garbage: in
effect, the reverse of what should really happen.
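For what it is worth, here is a minimal sketch of what such a "hello"
program could look like with the setlocale() call added. It assumes the
source file really is saved as UTF-8 and that it runs in a UTF-8-savvy
environment (a recent Linux console, say); the file name and the Catalan
text are only illustrations, not anything from the original program:

    /* hello.c -- minimal sketch; assumes a UTF-8 source file and a
       UTF-8-capable terminal.  Not the original "Your First Program". */
    #include <stdio.h>
    #include <locale.h>

    int main(void)
    {
        /* Without this call the program stays in the "C" locale,
           as the C standard prescribes. */
        setlocale(LC_ALL, "");

        /* To the compiler this literal is just a sequence of bytes;
           if the source is UTF-8, the bytes stored are UTF-8 too. */
        printf("Bon dia, món!\n");   /* 'ó' is two bytes, C3 B3 */
        return 0;
    }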
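And the pass-through behaviour I described can be checked by dumping the
bytes the compiler actually stored for a literal. Another rough sketch
(the literal is again just an example); it prints "C3 A0" if the source
was saved as UTF-8, and "E0" if it was Latin-1:

    /* bytes.c -- sketch: show the bytes the compiler kept for 'à'. */
    #include <stdio.h>

    int main(void)
    {
        const unsigned char s[] = "à";
        size_t i;

        for (i = 0; s[i] != '\0'; i++)
            printf("%02X ", (unsigned)s[i]);
        printf("\n");
        return 0;
    }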
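As for the NT console itself, the only sane route I can see goes through
the wide-character API: convert the UTF-8 text with MultiByteToWideChar()
and hand the result to WriteConsoleW(). A rough sketch of what I mean
(NT/2000/XP only, and only when stdout really is a console, not a pipe;
the string and the buffer size are arbitrary):

    /* wcon.c -- sketch: UTF-8 in the source, UTF-16 to the NT console. */
    #include <windows.h>

    int main(void)
    {
        const char *utf8 = "Bon dia, m\xC3\xB3n!\r\n"; /* "món" as UTF-8 bytes */
        wchar_t wide[64];
        DWORD written;
        int len;

        /* Convert the UTF-8 bytes, including the final '\0', to UTF-16. */
        len = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, 64);
        if (len > 0) {
            HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
            /* len counts the terminating L'\0'; do not write it. */
            WriteConsoleW(out, wide, (DWORD)(len - 1), &written, NULL);
        }
        return 0;
    }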
> Wouldn't it be nice if all computers were utf-8 by default, but I
> know, legacy issues... I don't think this Unicode thing will ever
> work, well, maybe in a hundred years. But by then, the world will be
> speaking English.

I do not believe it will. At least, not the English you or I are speaking
(and certainly not English as it is spoken these days in England). If we
do all end up speaking the same language around the world, it will merely
be some pidgin, probably English-based, but rather different (and to get
an idea of what it may look like, you have to travel to the South China
Sea, where this linguistic shift has already been going on for several
decades).

Antoine

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/