Re: X11R7.5 and C.UTF-8
On Thu, 3 Dec 2009, Eric Blake wrote:

> Thomas Dickey writes:
>>> This means that characters 0..127 have to be treated as ASCII, but
>
> No, it means that portable characters and control characters must be
> < 128. ASCII meets this characteristic, but so does EBCDIC, as well as
> UTF-8. The C locale also implies that you can manipulate bytes >= 128
> in the naive manner, so long as you don't care about characters
> embedded in those bytes. And what do you know - ASCII, EBCDIC, and
> UTF-8 all meet this property, too.
>
>>> beyond that an implementation can do what it wants. And on Cygwin 1.7,
>>> plain "C" actually does imply UTF-8, which happily is
>>> backward-compatible with ASCII.
>>
>> That's an interpretation that so far hasn't been blessed by the standards
>> people. Any discussion of this topic should mention that, as a caveat.
>
> Actually, the standards people HAVE spoken - and they agreed with our
> interpretation. POSIX was INTENTIONALLY written with the intent that a
> UTF-8 encoding is valid for the C locale, for the same reason that it
> was written that an EBCDIC encoding is valid for the C locale. These
> emails from the Austin Group (the folks that write POSIX) are telling:
>
> https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=12982

This is basically your email on the matter.

> https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=13012
>
> But they also admitted that there is still more work needed in POSIX to
> make this intent clearly codified (for example, that control characters
> must be single bytes < 128).

But they have not actually agreed with you yet.

-- 
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://x.cygwin.com/docs/
FAQ:                   http://x.cygwin.com/docs/faq/
Re: X11R7.5 and C.UTF-8
Thomas Dickey writes:

> > This means that characters 0..127 have to be treated as ASCII, but

No, it means that portable characters and control characters must be
< 128. ASCII meets this characteristic, but so does EBCDIC, as well as
UTF-8. The C locale also implies that you can manipulate bytes >= 128 in
the naive manner, so long as you don't care about characters embedded in
those bytes. And what do you know - ASCII, EBCDIC, and UTF-8 all meet
this property, too.

> > beyond that an implementation can do what it wants. And on Cygwin 1.7,
> > plain "C" actually does imply UTF-8, which happily is
> > backward-compatible with ASCII.
>
> That's an interpretation that so far hasn't been blessed by the standards
> people. Any discussion of this topic should mention that, as a caveat.

Actually, the standards people HAVE spoken - and they agreed with our
interpretation. POSIX was INTENTIONALLY written with the intent that a
UTF-8 encoding is valid for the C locale, for the same reason that it was
written that an EBCDIC encoding is valid for the C locale. These emails
from the Austin Group (the folks that write POSIX) are telling:

https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=12982

https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=13012

But they also admitted that there is still more work needed in POSIX to
make this intent clearly codified (for example, that control characters
must be single bytes < 128).

-- 
Eric Blake
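
Eric's point about handling bytes >= 128 "in the naive manner" can be
illustrated for the UTF-8 case with a small sketch. The helper names
below are hypothetical and do not come from POSIX or Cygwin. The sketch
relies on one known property of UTF-8: every byte of a multibyte
sequence is >= 0x80, so a byte-wise search for a portable character
such as '/' can never match inside another character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if every byte of s is below 128, i.e. s stays within the
 * single-byte range that holds the portable and control characters. */
static int all_bytes_below_128(const char *s)
{
    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if (*p >= 0x80)
            return 0;
    return 1;
}

/* Naive byte-wise search for a portable delimiter (hypothetical helper).
 * Returns the byte offset of the first match, or -1 if there is none.
 * For UTF-8 input this is safe without decoding, because bytes below
 * 128 never occur as part of a multibyte sequence. */
static ptrdiff_t find_delim(const char *s, char delim)
{
    assert(s != NULL);
    const char *hit = strchr(s, delim);
    return hit ? hit - s : -1;
}
```

For example, find_delim("caf\xc3\xa9/x", '/') locates the slash after
the two-byte encoding of "é" without knowing anything about UTF-8.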
Re: X11R7.5 and C.UTF-8
On Dec 3 13:16, Andy Koppe wrote:
> 2009/12/3 Thomas Dickey:
> >> From
> >> http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html,
> >> §7.2:
> >>
> >> "The tables in Locale Definition describe the characteristics and
> >> behavior of the POSIX locale for data consisting entirely of
> >> characters from the portable character set and the control character
> >> set. For other characters, the behavior is unspecified."
> >>
> >> This means that characters 0..127 have to be treated as ASCII, but
> >> beyond that an implementation can do what it wants. And on Cygwin 1.7,
> >> plain "C" actually does imply UTF-8, which happily is
> >> backward-compatible with ASCII.
> >
> > That's an interpretation that so far hasn't been blessed by the standards
> > people. Any discussion of this topic should mention that, as a caveat.
>
> Fair point. It also means that apps are entitled to assume that "C"
> supports no more than ASCII, which is why Cygwin 1.7's default locale
> is C.UTF-8. A default locale setting based on the user's language
> selection would be better, but we don't have that (yet?).

Try the attached. Note: It has a hidden "--testloop" option...

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

#define WINVER 0x0600
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <getopt.h>
#include <windows.h>

#define VERSION "1.0"

extern char *__progname;

void version () __attribute__ ((noreturn));
void usage (FILE *, int) __attribute__ ((noreturn));

void
version ()
{
  printf ("%s (Cygwin) %s\n", __progname, VERSION);
  exit (0);
}

void
usage (FILE * stream, int status)
{
  fprintf (stream,
	   "\n\
Usage: %s [-suU] [-l LCID]\n\
\n\
Return POSIX LANG identifier corresponding to a locale, default is the\n\
system default locale\n\
Possible options are:\n\
\n\
  -s, --system     return LANG for the system's default locale\n\
  -u, --user       return LANG for the current user's default locale\n\
  -l, --lcid LCID  return LANG for the LCID given as argument\n\
  -U, --UTF-8      always attach .UTF-8 to LANG\n\
  -h, --help       this text\n\
  -V, --version    print the version of %s and exit\n",
	   __progname, __progname);
  exit (status);
}

struct option longopts[] = {
  {"system", no_argument, NULL, 's'},
  {"user", no_argument, NULL, 'u'},
  {"lcid", required_argument, NULL, 'l'},
  {"UTF-8", no_argument, NULL, 'U'},
  {"help", no_argument, NULL, 'h'},
  {"version", no_argument, NULL, 'V'},
  {"testloop", no_argument, NULL, 'T'},
  {0, no_argument, NULL, 0}
};
char *opts = "dsul:UhV";

/* Print the POSIX LANG value for the Windows locale LCID.  With UTF set,
   ".UTF-8" is always appended; with TEST set, a descriptive line is
   printed instead.  Returns 0 on success, 2 for an unknown locale. */
int
getlocale (LCID lcid, bool utf, bool test)
{
  UINT codepage;
  char iso639[10];
  char iso3166[10];

  if (!GetLocaleInfo (lcid, LOCALE_IDEFAULTANSICODEPAGE | LOCALE_RETURN_NUMBER,
		      (char *) &codepage, sizeof codepage)
      || !GetLocaleInfo (lcid, LOCALE_SISO639LANGNAME, iso639, 10)
      || !GetLocaleInfo (lcid, LOCALE_SISO3166CTRYNAME, iso3166, 10))
    {
      if (!test)
	fprintf (stderr, "%s: Non existant locale\n", __progname);
      return 2;
    }
  if (utf)
    codepage = 0;
  if (test)
    {
      char cty[256];
      char lang[256];
      GetLocaleInfo (lcid, LOCALE_SENGCOUNTRY, cty, 256);
      GetLocaleInfo (lcid, LOCALE_SENGLANGUAGE, lang, 256);
      printf ("0x%04x=\"%s_%s\", %s (%s)\n",
	      (unsigned) lcid, iso639, iso3166, lang, cty);
    }
  else
    printf ("LANG=\"%s_%s%s\"\n", iso639, iso3166, codepage ? "" : ".UTF-8");
  return 0;
}

/* Table of date-related locale info types, paired with their names. */
#define d(X) {X, #X}
struct dl
{
  LCTYPE t;
  const char *s;
} dlist[] = {
  d(LOCALE_SLONGDATE), d(LOCALE_SSHORTDATE), d(LOCALE_STIMEFORMAT),
  d(LOCALE_SYEARMONTH), d(LOCALE_S1159), d(LOCALE_S2359),
  d(LOCALE_SDAYNAME1), d(LOCALE_SDAYNAME2), d(LOCALE_SDAYNAME3),
  d(LOCALE_SDAYNAME4), d(LOCALE_SDAYNAME5), d(LOCALE_SDAYNAME6),
  d(LOCALE_SDAYNAME7),
  d(LOCALE_SABBREVDAYNAME1), d(LOCALE_SABBREVDAYNAME2),
  d(LOCALE_SABBREVDAYNAME3), d(LOCALE_SABBREVDAYNAME4),
  d(LOCALE_SABBREVDAYNAME5), d(LOCALE_SABBREVDAYNAME6),
  d(LOCALE_SABBREVDAYNAME7),
  d(LOCALE_SMONTHNAME1), d(LOCALE_SMONTHNAME2), d(LOCALE_SMONTHNAME3),
  d(LOCALE_SMONTHNAME4), d(LOCALE_SMONTHNAME5), d(LOCALE_SMONTHNAME6),
  d(LOCALE_SMONTHNAME7), d(LOCALE_SMONTHNAME8), d(LOCALE_SMONTHNAME9),
  d(LOCALE_SMONTHNAME10), d(LOCALE_SMONTHNAME11), d(LOCALE_SMONTHNAME12),
  d(LOCALE_SMONTHNAME13),
  d(LOCALE_SABBREVMONTHNAME1), d(LOCALE_SABBREVMONTHNAME2),
  d(LOCALE_SABBREVMONTHNAME3), d(LOCALE_SABBREVMONTHNAME4),
  d(LOCALE_SABBREVMONTHNAME5), d(LOCALE_SABBREVMONTHNAME6),
  d(LOCALE_SABBREVMONTHNAME7), d(LOCALE_SABBREVMONTHNAME8),
  d(LOCALE_SABBREVMONTHNAME9), d(LOCALE_SABBREVMONTHNAME10),
  d(LOCALE_SABBREVMONTHNAME11), d(LOCALE_SABBREVMONTHNAME12),
  d(LOCALE_SABBREVMONTHNAME13),
  { 0, NULL }
};

int
main (int argc, char **argv)
{
  int opt;
  LCID lcid = LOCALE_SYSTEM_DEFAULT;
  bool utf = false;
  bool test = false;
  bool dates = false;

  while ((opt = getopt_long (
Re: X11R7.5 and C.UTF-8
2009/12/3 Thomas Dickey:
>> From
>> http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html,
>> §7.2:
>>
>> "The tables in Locale Definition describe the characteristics and
>> behavior of the POSIX locale for data consisting entirely of
>> characters from the portable character set and the control character
>> set. For other characters, the behavior is unspecified."
>>
>> This means that characters 0..127 have to be treated as ASCII, but
>> beyond that an implementation can do what it wants. And on Cygwin 1.7,
>> plain "C" actually does imply UTF-8, which happily is
>> backward-compatible with ASCII.
>
> That's an interpretation that so far hasn't been blessed by the standards
> people. Any discussion of this topic should mention that, as a caveat.

Fair point. It also means that apps are entitled to assume that "C"
supports no more than ASCII, which is why Cygwin 1.7's default locale
is C.UTF-8. A default locale setting based on the user's language
selection would be better, but we don't have that (yet?).

Andy
Re: X11R7.5 and C.UTF-8
On Thu, 3 Dec 2009, Andy Koppe wrote:

> 2009/12/3 Linda Walsh:
>> C.UTF_8 doesn't exist.
> ...
>> You can't have "C" and "UTF-8", because C means no encoding (default).
>> UTF-8 IS an encoding, so they are mutually exclusive.
>
> From
> http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html,
> §7.2:
>
> "The tables in Locale Definition describe the characteristics and
> behavior of the POSIX locale for data consisting entirely of
> characters from the portable character set and the control character
> set. For other characters, the behavior is unspecified."
>
> This means that characters 0..127 have to be treated as ASCII, but
> beyond that an implementation can do what it wants. And on Cygwin 1.7,
> plain "C" actually does imply UTF-8, which happily is
> backward-compatible with ASCII.

That's an interpretation that so far hasn't been blessed by the standards
people. Any discussion of this topic should mention that, as a caveat.

ymmv

-- 
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net
Re: X11R7.5 and C.UTF-8
On Dec 3 07:48, Andy Koppe wrote:
> 2009/12/3 Linda Walsh:
> > C.UTF_8 doesn't exist.
>
> Well, guess what: it does in Cygwin 1.7, and it's the default locale.

Not exactly. The default locale is C.UTF-8. You can also use C.UTF8 or
C.utf-8 or C.utf8, but not C.UTF_8 or C.utf_8.

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
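
The spellings Corinna lists can be captured in a short check. This is
only a sketch: charset_is_utf8() is a hypothetical helper, not Cygwin's
or newlib's actual parser. It assumes that matching is case-insensitive
with an optional hyphen, which covers the four accepted names above but
would also accept mixed-case forms the message does not mention. It
also ignores "@modifier" suffixes.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper, not Cygwin's actual code: does the charset part
 * of a locale name (the text after the '.') spell UTF-8 in one of the
 * accepted ways?  Assumption: case-insensitive, hyphen optional. */
static int charset_is_utf8(const char *locale)
{
    assert(locale != NULL);
    const char *dot = strchr(locale, '.');
    if (dot == NULL)
        return 0;                       /* no charset suffix at all */
    const char *cs = dot + 1;
    return strcasecmp(cs, "UTF-8") == 0 || strcasecmp(cs, "UTF8") == 0;
}
```

With this, "C.UTF-8", "C.UTF8", "C.utf-8" and "C.utf8" are accepted,
while "C.UTF_8" and "C.utf_8" are rejected because of the underscore.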
Re: X11R7.5 and C.UTF-8
2009/12/3 Linda Walsh:
> C.UTF_8 doesn't exist.

Well, guess what: it does in Cygwin 1.7, and it's the default locale.
And it's also in the next Debian:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776. Cygwin 1.7
also supports "C.ISO-8859-1", "C.CP1252", ...

> Might want to try 'Console' instead of using mintty. Not perfect either, but
> fewer compatibility problems that I've noticed.

Care to provide examples, so they can be fixed? Or are you just bitter
about having to tick a box to switch backspace to ^H?

'Console' is better for native Windows programs, because, well, it's a
console, whereas mintty is more suited for Unix programs, because it's
an xterm-compatible tty.

> You can't have "C" and "UTF-8", because C means no encoding (default).
> UTF-8 IS an encoding, so they are mutually exclusive.

From
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html,
§7.2:

"The tables in Locale Definition describe the characteristics and
behavior of the POSIX locale for data consisting entirely of
characters from the portable character set and the control character
set. For other characters, the behavior is unspecified."

This means that characters 0..127 have to be treated as ASCII, but
beyond that an implementation can do what it wants. And on Cygwin 1.7,
plain "C" actually does imply UTF-8, which happily is
backward-compatible with ASCII.

Not that that is much to do with "C.UTF-8", which is a separate locale
in any case. The meaning of locale strings is up to the OS, e.g. with
the Windows C runtime you get stuff like "English_United States.1252".
And 'C.' on Cygwin is intended to mean "the semantics of the C locale,
but with the specified charset".

However, since the 'C.' format is unlikely to be recognised by remote
systems, it's recommended to set a "real" locale such as 'en_US.UTF-8'.

> I don't
> know under what circumstances "C" might imply UTF-8. If the definition
> of "C" changes? It might be easier than changing "c" (as used in physics).

How droll.

Andy
Re: X11R7.5 and C.UTF-8
Linda Walsh wrote:
> C.UTF_8 doesn't exist.

You're wrong. Please read the whole of this thread -- and the last two
months' worth of cygwin-developers.

> mintty is broken.

No, it isn't. It just doesn't work the way *you* expect it to.

> Might want to try 'Console' instead of using mintty. Not perfect either,
> but fewer compatibility problems that I've noticed.
>
> Examples of valid LANG values:
> C, ca_FR, en_US, fr_FR, it_IT, nl_NL, wa...@euro
>
> You can't have "C" and "UTF-8", because C means no encoding (default).

No, it doesn't. "C" means "POSIX" and is defined here:
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html

Note how all the glyphs are defined in terms of character NAMES, not
hexadecimal values? That's because "C", all by itself, just doesn't
SPECIFY any encoding. You're still allowed to HAVE one -- in fact, you
ALWAYS have one. On most systems, that has historically been the plain
ASCII 7-bit encoding; many others used the EBCDIC encoding and were not
considered in violation of the POSIX "C" locale specification. Now, many
systems are starting to use the UTF-8 encoding by default, even in the
"C" locale.

"C"/"POSIX" locale (without an additional .ENCODING suffix) is
encoding-AGNOSTIC, that's all. So, you're allowed to add an .ENCODING
suffix to force a specific encoding if you like, without violating
POSIX. (And your system is also allowed, in that case, to IGNORE that
.ENCODING suffix, and still be Posix-compliant IIUC, so it's rather a
hole in the spec IMO).

> UTF-8 IS an encoding, so they are mutually exclusive. I don't
> know under what circumstances "C" might imply UTF-8.

Whenever the platform decides to use UTF-8 as its default encoding,
which is perfectly acceptable according to Posix. Cygwin-1.7 has decided
to do that. So, on cygwin-1.7, "C" implies .UTF-8. X11R7.5 doesn't yet
know that, without outside help (e.g. explicitly setting $LANG to
"C.UTF-8" by default, so that XWin "knows" about the new default
behavior).

> If the definition
> of "C" changes? It might be easier than changing "c" (as used in physics).
>
> My understanding of locale issues is also limited and subject to change or
> re-education...

Uhm, yeah.

--
Chuck
Re: X11R7.5 and C.UTF-8
Ken Brown wrote:
> On 10/28/2009 6:07 PM, Andy Koppe wrote:
>> 2009/10/28 Ken Brown:
>>> Maybe my terminology is wrong. But if you start mintty with no .minttyrc
>>> and with LANG unset, mintty will set LANG=C.UTF-8.
>>
>> Yep. That's primarily for emacs' benefit, which parses the locale env
>> variables itself instead of using setlocale(LC_CTYPE, ""), thereby
>> missing out on Cygwin's default locale.
>
> Andy,
>
> I've sent a report about this to the emacs-devel list
> (http://lists.gnu.org/archive/html/emacs-devel/2009-11/threads.html#01216).
> But I don't have a good understanding of locale issues. Could you take a
> look and see if what I said is accurate or if more should be said?

C.UTF_8 doesn't exist.

mintty is broken.

Might want to try 'Console' instead of using mintty. Not perfect either,
but fewer compatibility problems that I've noticed.

Examples of valid LANG values:
C, ca_FR, en_US, fr_FR, it_IT, nl_NL, wa...@euro

You can't have "C" and "UTF-8", because C means no encoding (default).
UTF-8 IS an encoding, so they are mutually exclusive. I don't
know under what circumstances "C" might imply UTF-8. If the definition
of "C" changes? It might be easier than changing "c" (as used in physics).

My understanding of locale issues is also limited and subject to change or
re-education... :-)
Re: X11R7.5 and C.UTF-8
On 11/28/2009 8:34 AM, Andy Koppe wrote:
> 2009/11/28 Ken Brown:
>> On 10/28/2009 6:07 PM, Andy Koppe wrote:
>>> 2009/10/28 Ken Brown:
>>>> Maybe my terminology is wrong. But if you start mintty with no .minttyrc
>>>> and with LANG unset, mintty will set LANG=C.UTF-8.
>>>
>>> Yep. That's primarily for emacs' benefit, which parses the locale env
>>> variables itself instead of using setlocale(LC_CTYPE, ""), thereby
>>> missing out on Cygwin's default locale.
>>
>> Andy,
>>
>> I've sent a report about this to the emacs-devel list
>> (http://lists.gnu.org/archive/html/emacs-devel/2009-11/threads.html#01216).
>> But I don't have a good understanding of locale issues. Could you take a
>> look and see if what I said is accurate or if more should be said?
>
> Thanks Ken, I think you've got that all correct, including pointing
> the finger at mule-cmds.el as the suspect. I'll keep an eye on that
> thread.
>
> One more thing that might be worth mentioning is
> 'nl_langinfo(CODESET)' for enquiring about the character encoding.
> (It's actually being used in a couple of places in the emacs sources
> already, in fns.c and w32proc.c, but I don't know what significance
> those files have.)

w32proc.c doesn't get compiled in the Cygwin build, but fns.c does. The
call to nl_langinfo(CODESET) is in the definition of the locale-info
function, which provides a way for emacs to determine the CODESET. I've
passed this on to the emacs-devel list.

Thanks for the help.

Ken
Re: X11R7.5 and C.UTF-8
2009/11/28 Ken Brown:
> On 10/28/2009 6:07 PM, Andy Koppe wrote:
>>
>> 2009/10/28 Ken Brown:
>>>
>>> Maybe my terminology is wrong. But if you start mintty with no .minttyrc
>>> and with LANG unset, mintty will set LANG=C.UTF-8.
>>
>> Yep. That's primarily for emacs' benefit, which parses the locale env
>> variables itself instead of using setlocale(LC_CTYPE, ""), thereby
>> missing out on Cygwin's default locale.
>
> Andy,
>
> I've sent a report about this to the emacs-devel list
> (http://lists.gnu.org/archive/html/emacs-devel/2009-11/threads.html#01216).
> But I don't have a good understanding of locale issues. Could you take a
> look and see if what I said is accurate or if more should be said?

Thanks Ken, I think you've got that all correct, including pointing
the finger at mule-cmds.el as the suspect. I'll keep an eye on that
thread.

One more thing that might be worth mentioning is
'nl_langinfo(CODESET)' for enquiring about the character encoding.
(It's actually being used in a couple of places in the emacs sources
already, in fns.c and w32proc.c, but I don't know what significance
those files have.)

Andy
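
For reference, the approach Andy describes, letting the C library pick
the locale and then asking it for the encoding, might look like the
sketch below. It uses only the standard setlocale() and POSIX
nl_langinfo() calls; the wrapper name codeset_from_environment() is
made up for illustration. What it returns depends entirely on the
platform and environment, so no particular result is claimed here.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Adopt the locale named by the environment (LC_ALL, LC_CTYPE, LANG)
 * and return the name of the character encoding the C library chose.
 * Returns NULL if the environment names a locale the library rejects. */
static const char *codeset_from_environment(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return NULL;
    const char *cs = nl_langinfo(CODESET);
    assert(cs != NULL);    /* POSIX: nl_langinfo() never returns NULL */
    return cs;
}
```

Because the library applies its own default when no locale variable is
set, a program written this way picks up an implementation default such
as the one discussed in this thread, whereas a program that parses the
environment variables itself does not.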
Re: X11R7.5 and C.UTF-8
On 10/28/2009 6:07 PM, Andy Koppe wrote:
> 2009/10/28 Ken Brown:
>> Maybe my terminology is wrong. But if you start mintty with no .minttyrc
>> and with LANG unset, mintty will set LANG=C.UTF-8.
>
> Yep. That's primarily for emacs' benefit, which parses the locale env
> variables itself instead of using setlocale(LC_CTYPE, ""), thereby
> missing out on Cygwin's default locale.

Andy,

I've sent a report about this to the emacs-devel list
(http://lists.gnu.org/archive/html/emacs-devel/2009-11/threads.html#01216).
But I don't have a good understanding of locale issues. Could you take a
look and see if what I said is accurate or if more should be said?

Thanks.

Ken
Re: X11R7.5 and C.UTF-8
2009/11/3 Jon TURNEY:
> On second look, this patch doesn't seem to be quite right, as it makes the
> en_US.UTF-8 compose sequences available in C.UTF-8 (which is not the case in
> the C locale).

I think that's ok. The compose sequences don't make sense in an ASCII
locale, since ASCII doesn't contain composed characters. Yet they can
be very useful in a UTF-8 locale, so it would be a shame to remove
them. Also, the en_US.UTF-8 compose sequences aren't actually
English-specific, since the vast majority of non-English UTF-8 locales
use the same sequences.

Andy
Re: X11R7.5 and C.UTF-8
On 29/10/2009 20:20, Andy Koppe wrote:
> 2009/10/29 Jon TURNEY:
>> I've put a patch in bugzilla [1] which can be applied to
>> /usr/share/X11/locale to temporarily repair this problem.
>>
>> This needs to be looked at more deeply, though, as I'm not sure I've fully
>> understood what that locale data is being used for, or specified C.UTF-8
>> correctly.
>>
>> [1] http://sourceware.org/bugzilla/show_bug.cgi?id=10870
>
> I think the patch makes plenty of sense in mapping C.UTF-8 to
> en_US.UTF-8, because most other UTF-8 locales are also mapped to
> en_US.UTF-8, i.e. from X's perspective they're not actually
> language-specific.

On second look, this patch doesn't seem to be quite right, as it makes
the en_US.UTF-8 compose sequences available in C.UTF-8 (which is not the
case in the C locale).

> More generally, there's the issue that Cygwin allows any combination
> of language and charset, whereas X has a fixed list of permitted
> combinations. Cygwin also supports many charsets that aren't supported
> by X (and vice versa). In particular, X only supports a few of the
> Windows/DOS codepages. But I guess unsupported locales will just have
> to be a case of "don't do that"?

Yes. Treating XSupportsLocale() returning false as a fatal error as the
Xserver currently does is wrong, I would say, unless the application has
very specific requirement, though.
Re: X11R7.5 and C.UTF-8
2009/10/29 Jon TURNEY:
> I've put a patch in bugzilla [1] which can be applied to
> /usr/share/X11/locale to temporarily repair this problem.
>
> This needs to be looked at more deeply, though, as I'm not sure I've fully
> understood what that locale data is being used for, or specified C.UTF-8
> correctly.
>
> [1] http://sourceware.org/bugzilla/show_bug.cgi?id=10870

I think the patch makes plenty of sense in mapping C.UTF-8 to
en_US.UTF-8, because most other UTF-8 locales are also mapped to
en_US.UTF-8, i.e. from X's perspective they're not actually
language-specific.

More generally, there's the issue that Cygwin allows any combination
of language and charset, whereas X has a fixed list of permitted
combinations. Cygwin also supports many charsets that aren't supported
by X (and vice versa). In particular, X only supports a few of the
Windows/DOS codepages. But I guess unsupported locales will just have
to be a case of "don't do that"?

Andy
Re: X11R7.5 and C.UTF-8
On 29/10/2009 15:01, Jon TURNEY wrote:
> On 29/10/2009 14:37, Ken Brown wrote:
>> $ LANG=C.UTF-8 ./Xlocale.exe
>> Setting locale from LANG succeeded
>> Locale is C.UTF-8
>> XSupportsLocale returned false
>
> Okay, well this makes sense now :-(
>
> Appropriate data needs to exist in /usr/share/X11/locale for the C.UTF-8
> locale, but it doesn't at the moment.
>
> Let me see if I can find it :-)

I've put a patch in bugzilla [1] which can be applied to
/usr/share/X11/locale to temporarily repair this problem.

This needs to be looked at more deeply, though, as I'm not sure I've
fully understood what that locale data is being used for, or specified
C.UTF-8 correctly.

[1] http://sourceware.org/bugzilla/show_bug.cgi?id=10870
Re: X11R7.5 and C.UTF-8
On 29/10/2009 14:37, Ken Brown wrote:
> On 10/29/2009 9:42 AM, Jon TURNEY wrote:
>> On 29/10/2009 00:07, Andy Koppe wrote:
>>> 2009/10/28 Jon TURNEY:
>>>> On 28/10/2009 14:22, Ken Brown wrote:
>>>>> X11R7.5 doesn't like the (default) locale C.UTF-8. If I start the
>>>>> server with 'LANG=C.UTF-8 /usr/bin/startxwin.bat', the server exits
>>>>> immediately, and the log has complaints about the locale. If I
>>>>> instead use 'LANG=en_US.UTF-8', there's no problem. I've attached
>>>>> both logs and cygcheck output.
>>>>
>>>> Thanks for the bug report. I'm afraid I'm not immediately able to
>>>> reproduce this, though, using the command you give.
>>>
>>> You might have LC_ALL or LC_CTYPE set, which would override LANG. Or
>>> perhaps startxwin.bat overrides things somewhere along the way? To
>>> avoid all that, you could try invoking Xwin directly with LC_ALL set,
>>> which is top dog among locale variables.
>>>
>>> LC_ALL=C.UTF-8 xwin -multiwindow&
>>>
>>> It fails with en.UTF-8 too (which also is a legal Cygwin locale), but
>>> it works with en_US.UTF-8.
>>
>> Nope, I don't have LC_ALL or LC_CTYPE set
>>
>> This is pretty curious, since all XSupportsLocale() should be doing
>> effectively is checking if setlocale (LC_ALL, NULL) returns a name it
>> understands.
>>
>> Perhaps you can try the attached small test program.
>
> $ LANG=C.UTF-8 ./Xlocale.exe
> Setting locale from LANG succeeded
> Locale is C.UTF-8
> XSupportsLocale returned false
>
> $ LANG=en_US.UTF-8 ./Xlocale.exe
> Setting locale from LANG succeeded
> Locale is en_US.UTF-8
> XSupportsLocale returned true
>
> $ unset LANG
>
> $ ./Xlocale.exe
> Setting locale from LANG succeeded
> Locale is C
> XSupportsLocale returned true
>
> $ uname -a
> CYGWIN_NT-5.1 markov 1.7.0(0.214/5/3) 2009-10-03 14:33 i686 Cygwin

I suppose I should show you mine, then

$ LANG=C.UTF-8 ./Xlocale
Setting locale from LANG failed
Locale is C
XSupportsLocale returned true

$ LANG=en_US.UTF-8 ./Xlocale
Setting locale from LANG succeeded
Locale is en_US.UTF-8
XSupportsLocale returned true

$ unset LANG

$ ./Xlocale
Setting locale from LANG succeeded
Locale is C
XSupportsLocale returned true

$ uname -a
CYGWIN_NT-5.1 byron 1.7.0(0.212/5/3) 2009-09-11 01:25 i686 Cygwin

Okay, well this makes sense now :-(

Appropriate data needs to exist in /usr/share/X11/locale for the C.UTF-8
locale, but it doesn't at the moment.

Let me see if I can find it :-)
Re: X11R7.5 and C.UTF-8
On 29/10/2009 13:56, Corinna Vinschen wrote:
> On Oct 29 13:42, Jon TURNEY wrote:
>> I haven't been following the discussion about C.UTF-8 closely, but
>> curiously, for me at least, this test program shows that
>> setlocale(LC_ALL, "") fails with LANG=C.UTF-8 (so that doesn't
>> actually seem to be a valid locale, although if it's the default it
>> probably doesn't make much difference), but this means that a
>> subsequent setlocale(LC_ALL, NULL) just returns "C"
>
> What version of Cygwin 1.7 are you using? The change to newlib, which
> allows to specify C.UTF-8 as locale is from 2009-09-29, so Cygwin
> 1.7.0-62 from 2009-10-03 allows to specify this locale. The change
> which makes C.UTF-8 Cygwin's default locale is from 2009-10-09, so this
> change is only in Cygwin from CVS, or in developer snapshots from past
> that date.

Thanks for the clarification.

j...@byron ~
$ cygcheck -c cygwin
Cygwin Package Information
Package              Version        Status
cygwin               1.7.0-62       OK

j...@byron ~
$ uname -a
CYGWIN_NT-5.1 byron 1.7.0(0.212/5/3) 2009-09-11 01:25 i686 Cygwin

Oops!
Re: X11R7.5 and C.UTF-8
On 10/29/2009 9:42 AM, Jon TURNEY wrote:
> On 29/10/2009 00:07, Andy Koppe wrote:
>> 2009/10/28 Jon TURNEY:
>>> On 28/10/2009 14:22, Ken Brown wrote:
>>>> X11R7.5 doesn't like the (default) locale C.UTF-8. If I start the
>>>> server with 'LANG=C.UTF-8 /usr/bin/startxwin.bat', the server exits
>>>> immediately, and the log has complaints about the locale. If I
>>>> instead use 'LANG=en_US.UTF-8', there's no problem. I've attached
>>>> both logs and cygcheck output.
>>>
>>> Thanks for the bug report. I'm afraid I'm not immediately able to
>>> reproduce this, though, using the command you give.
>>
>> You might have LC_ALL or LC_CTYPE set, which would override LANG. Or
>> perhaps startxwin.bat overrides things somewhere along the way? To
>> avoid all that, you could try invoking Xwin directly with LC_ALL set,
>> which is top dog among locale variables.
>>
>> LC_ALL=C.UTF-8 xwin -multiwindow&
>>
>> It fails with en.UTF-8 too (which also is a legal Cygwin locale), but
>> it works with en_US.UTF-8.
>
> Nope, I don't have LC_ALL or LC_CTYPE set
>
> This is pretty curious, since all XSupportsLocale() should be doing
> effectively is checking if setlocale (LC_ALL, NULL) returns a name it
> understands.
>
> Perhaps you can try the attached small test program.

$ LANG=C.UTF-8 ./Xlocale.exe
Setting locale from LANG succeeded
Locale is C.UTF-8
XSupportsLocale returned false

$ LANG=en_US.UTF-8 ./Xlocale.exe
Setting locale from LANG succeeded
Locale is en_US.UTF-8
XSupportsLocale returned true

$ unset LANG

$ ./Xlocale.exe
Setting locale from LANG succeeded
Locale is C
XSupportsLocale returned true

$ uname -a
CYGWIN_NT-5.1 markov 1.7.0(0.214/5/3) 2009-10-03 14:33 i686 Cygwin

Ken
Re: X11R7.5 and C.UTF-8
On Oct 29 13:42, Jon TURNEY wrote:
> I haven't been following the discussion about C.UTF-8 closely, but
> curiously, for me at least, this test program shows that
> setlocale(LC_ALL, "") fails with LANG=C.UTF-8 (so that doesn't
> actually seem to be a valid locale, although if it's the default it
> probably doesn't make much difference), but this means that a
> subsequent setlocale(LC_ALL, NULL) just returns "C"

What version of Cygwin 1.7 are you using? The change to newlib, which
allows to specify C.UTF-8 as locale is from 2009-09-29, so Cygwin
1.7.0-62 from 2009-10-03 allows to specify this locale. The change
which makes C.UTF-8 Cygwin's default locale is from 2009-10-09, so this
change is only in Cygwin from CVS, or in developer snapshots from past
that date.

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
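
The behaviour Jon observed, where a rejected locale name leaves the
process in "C", follows from how setlocale() is specified: a failed
request returns NULL and changes nothing, and a later query with a NULL
argument reports the locale still in effect. The sketch below shows
that pattern. It is not Jon's attached test program, which is not
reproduced in this thread, and locale_after_request() is a name chosen
for illustration.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Try to switch to the named locale, then report which locale is in
 * effect.  *accepted is set to 1 if the C library accepted the name,
 * 0 otherwise.  After a rejection the previous locale (initially "C")
 * stays active. */
static const char *locale_after_request(const char *name, int *accepted)
{
    assert(name != NULL && accepted != NULL);
    *accepted = setlocale(LC_ALL, name) != NULL;
    return setlocale(LC_ALL, NULL);     /* query only, changes nothing */
}
```

Passing "" as the name asks the library to use the environment, which
is the case Jon describes with LANG=C.UTF-8 on a Cygwin build that
predates the newlib change. Which names a given C library accepts
varies between implementations.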
Re: X11R7.5 and C.UTF-8
On 29/10/2009 00:07, Andy Koppe wrote:
> 2009/10/28 Jon TURNEY:
>> On 28/10/2009 14:22, Ken Brown wrote:
>>> X11R7.5 doesn't like the (default) locale C.UTF-8. If I start the
>>> server with 'LANG=C.UTF-8 /usr/bin/startxwin.bat', the server exits
>>> immediately, and the log has complaints about the locale. If I
>>> instead use 'LANG=en_US.UTF-8', there's no problem. I've attached
>>> both logs and cygcheck output.
>>
>> Thanks for the bug report.
>>
>> I'm afraid I'm not immediately able to reproduce this, though, using
>> the command you give.
>
> You might have LC_ALL or LC_CTYPE set, which would override LANG. Or
> perhaps startxwin.bat overrides things somewhere along the way? To
> avoid all that, you could try invoking Xwin directly with LC_ALL set,
> which is top dog among locale variables.
>
> LC_ALL=C.UTF-8 xwin -multiwindow&
>
> It fails with en.UTF-8 too (which also is a legal Cygwin locale), but
> it works with en_US.UTF-8.

Nope, I don't have LC_ALL or LC_CTYPE set

This is pretty curious, since all XSupportsLocale() should be doing
effectively is checking if setlocale (LC_ALL, NULL) returns a name it
understands.

Perhaps you can try the attached small test program.

I haven't been following the discussion about C.UTF-8 closely, but
curiously, for me at least, this test program shows that
setlocale(LC_ALL, "") fails with LANG=C.UTF-8 (so that doesn't actually
seem to be a valid locale, although if it's the default it probably
doesn't make much difference), but this means that a subsequent
setlocale(LC_ALL, NULL) just returns "C"

Possibly C.UTF-8 needs adding to /usr/share/X11/locale/locale.alias and
locale.dir.

In any case, it's probably also a bug that the Xserver considers
XSupportsLocale() failure a critical error, rather than continuing with
a warning, but I'd like to get to the bottom of this first...

>> The significant change is probably that libX11 is no longer built
>> with X_LOCALE (so that libX11 uses the native locale support rather
>> than its own).
>> Exactly why this would cause a problem, I don't know.
>
> Hmm, that sounds like it should have improved matters if anything.

Indeed :-)

Attachment: Xlocale.c (Description: application/itunes-itlp)
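The setlocale() behaviour described in this message can be sketched in a few lines of C. This is not the attached Xlocale.c, which is not reproduced in the archive; it is a minimal illustration that leaves out the XSupportsLocale() call (that needs Xlib), and the helper name set_and_report_locale is made up for this example.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: try to switch every locale category to `name`
 * ("" means "take it from LC_ALL/LC_xxx/LANG in the environment").
 * Returns 1 on success and 0 on failure. In both cases `buf` receives
 * the name that setlocale(LC_ALL, NULL) reports afterwards, i.e. the
 * locale that is really in effect.
 */
static int set_and_report_locale(const char *name, char *buf, size_t buflen)
{
    /* setlocale() returns NULL and changes nothing if `name` is rejected. */
    int ok = (setlocale(LC_ALL, name) != NULL);

    /* A NULL second argument only queries; it never changes the locale. */
    const char *current = setlocale(LC_ALL, NULL);

    if (buflen > 0) {
        strncpy(buf, current ? current : "", buflen - 1);
        buf[buflen - 1] = '\0';
    }
    return ok;
}
```

Because a program starts in the "C" locale, a rejected request made at startup leaves the query reporting "C", which matches the observation above. Which names are rejected is up to the C library, so the same LANG value can give different results on different systems or Cygwin versions.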
Re: X11R7.5 and C.UTF-8
2009/10/28 Jon TURNEY:
> On 28/10/2009 14:22, Ken Brown wrote:
>>
>> X11R7.5 doesn't like the (default) locale C.UTF-8. If I start the
>> server with 'LANG=C.UTF-8 /usr/bin/startxwin.bat', the server exits
>> immediately, and the log has complaints about the locale. If I instead
>> use 'LANG=en_US.UTF-8', there's no problem. I've attached both logs and
>> cygcheck output.
>
> Thanks for the bug report.
>
> I'm afraid I'm not immediately able to reproduce this, though, using the
> command you give.

You might have LC_ALL or LC_CTYPE set, which would override LANG. Or
perhaps startxwin.bat overrides things somewhere along the way? To avoid
all that, you could try invoking Xwin directly with LC_ALL set, which is
top dog among locale variables.

LC_ALL=C.UTF-8 xwin -multiwindow&

It fails with en.UTF-8 too (which also is a legal Cygwin locale), but it
works with en_US.UTF-8.

> The significant change is probably that libX11 is no longer built with
> X_LOCALE (so that libX11 uses the native locale support rather than its
> own).
> Exactly why this would cause a problem, I don't know.

Hmm, that sounds like it should have improved matters if anything.

Andy
Re: X11R7.5 and C.UTF-8
On 28/10/2009 14:22, Ken Brown wrote:
> X11R7.5 doesn't like the (default) locale C.UTF-8. If I start the
> server with 'LANG=C.UTF-8 /usr/bin/startxwin.bat', the server exits
> immediately, and the log has complaints about the locale. If I instead
> use 'LANG=en_US.UTF-8', there's no problem. I've attached both logs and
> cygcheck output.

Thanks for the bug report.

I'm afraid I'm not immediately able to reproduce this, though, using the
command you give.

On 28/10/2009 21:49, Andy Koppe wrote:
> Xwin 1.6.x had no problem with "C.UTF-8".

The significant change is probably that libX11 is no longer built with
X_LOCALE (so that libX11 uses the native locale support rather than its
own). Exactly why this would cause a problem, I don't know.
Re: X11R7.5 and C.UTF-8
> Xwin 1.6.x had no problem with "C.UTF-8".

Actually it's libX11 that makes the difference: Xwin 1.7.1 is fine after
downgrading libX11 from 1.3.2-1 to 1.2.2-2.

Andy
Re: X11R7.5 and C.UTF-8
Thomas Dickey wrote:
> On Wed, 28 Oct 2009, Ken Brown wrote:
>
>> X11R7.5 doesn't like the (default) locale C.UTF-8. If I start the server
>
> technically speaking, there's "no such locale" as C.UTF-8,
> so I'd not expect portable code to accept it ("C" and "UTF-8" are
> mutually exclusive).

No, actually they are not. The "C" or "POSIX" locale is defined entirely
in terms of character values -- not hexadecimal equivalents. That is,
"the set alpha shall contain 'a', 'b'..." etc. The standard actually
doesn't require that an implementation specify the encoding in which
those character values are represented at all.

You can, if you want, use 'HEX_CHAR', 'OCTAL_CHAR', and 'DECIMAL_CHAR'
representations -- which implicitly require a specific encoding -- but
the standard defines the 'C' locale entirely in terms of CHAR and
CHARSYMBOL, which are encoding-agnostic.

http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_03

Personally, I think it's a hole in the standard that it doesn't actually
talk about "the POSIX locale with encoding Y" -- but then, they don't
want to show preference between ASCII and EBCDIC, so UTF-8 sneaks in
there.

--
Chuck
Re: X11R7.5 and C.UTF-8
2009/10/28 Ken Brown:
> Maybe my terminology is wrong. But if you start mintty with no .minttyrc
> and with LANG unset, mintty will set LANG=C.UTF-8.

Yep. That's primarily for emacs' benefit, which parses the locale env
variables itself instead of using setlocale(LC_CTYPE, ""), thereby
missing out on Cygwin's default locale.

(http://www.opengroup.org/onlinepubs/007908799/xbd/envvar.html says that
if LC_ALL, LC_CTYPE, and LANG are all either unset or empty, the
implementation-dependent default locale shall be used. For Cygwin 1.7,
the default locale uses UTF-8 and not ASCII as assumed by emacs. It
works correctly in vim.)

Andy
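The lookup order mentioned in this thread (LC_ALL first, then LC_CTYPE, then LANG, with unset and empty treated alike) can be written down as a small C function. This is only a sketch of the rule for the LC_CTYPE category, not code from emacs, mintty, or Cygwin; the names is_set and ctype_locale_from_env are invented for the example.

```c
#include <stddef.h>

/* POSIX treats a locale variable that is unset or empty as not present. */
static int is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/*
 * Hypothetical helper: given the values of LC_ALL, LC_CTYPE and LANG,
 * return the one that governs the LC_CTYPE category, or NULL when none
 * of them is set. NULL means "use the implementation's default locale";
 * it does not mean ASCII.
 */
static const char *ctype_locale_from_env(const char *lc_all,
                                         const char *lc_ctype,
                                         const char *lang)
{
    if (is_set(lc_all))
        return lc_all;      /* LC_ALL overrides everything else */
    if (is_set(lc_ctype))
        return lc_ctype;    /* then the category-specific variable */
    if (is_set(lang))
        return lang;        /* LANG is the fallback for all categories */
    return NULL;            /* nothing set: implementation default */
}
```

A caller would typically pass getenv("LC_ALL"), getenv("LC_CTYPE") and getenv("LANG"). A program that parses the variables itself has no portable way to learn what the default is when the result is NULL, which is one reason to prefer setlocale(LC_CTYPE, "") and let the C library decide.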
Re: X11R7.5 and C.UTF-8
2009/10/28 Thomas Dickey:
>> X11R7.5 doesn't like the (default) locale C.UTF-8. If I start the server
>
> technically speaking, there's "no such locale" as C.UTF-8,
> so I'd not expect portable code to accept it ("C" and "UTF-8" are
> mutually exclusive).

Technically speaking, portable code should make no assumption whatsoever
about the locale string. The meaning of that string is up to the OS, and
portable code should be using POSIX interfaces such as the multibyte
conversion functions or nl_langinfo to get at its meaning.

"C.UTF-8" is a language-neutral locale with a UTF-8 charset. It is also
being introduced by Debian:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776.

Xwin 1.6.x had no problem with "C.UTF-8".

Andy
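As a rough illustration of the nl_langinfo approach suggested above, the following C sketch asks the C library for the character encoding instead of parsing the locale string. The function names are hypothetical, and the exact spelling of the codeset name ("UTF-8" here) is an assumption that holds on common systems but is not guaranteed by POSIX.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper: adopt the LC_CTYPE locale requested by the
 * environment and return the name of the character encoding it implies.
 * The returned string belongs to the C library and may be overwritten
 * by later nl_langinfo() or setlocale() calls.
 */
static const char *codeset_from_environment(void)
{
    /* If the requested locale is rejected, the current one stays active. */
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}

/* Hypothetical convenience check; assumes the codeset is spelled "UTF-8". */
static int environment_uses_utf8(void)
{
    return strcmp(codeset_from_environment(), "UTF-8") == 0;
}
```

With this approach the program never needs to know whether the locale was called "C.UTF-8", "en_US.UTF-8", or something else entirely; it only sees the encoding that the system associates with that name.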
Re: X11R7.5 and C.UTF-8
On 10/28/2009 5:23 PM, Thomas Dickey wrote:
> On Wed, 28 Oct 2009, Ken Brown wrote:
>
>> X11R7.5 doesn't like the (default) locale C.UTF-8. If I start the server
>
> technically speaking, there's "no such locale" as C.UTF-8,
> so I'd not expect portable code to accept it ("C" and "UTF-8" are
> mutually exclusive).

Maybe my terminology is wrong. But if you start mintty with no .minttyrc
and with LANG unset, mintty will set LANG=C.UTF-8. Trying to then start
the X server via startxwin.bat or startxwin.sh leads to the error I
reported. The error did not occur in X11R7.4.

There's been a lot of discussion in the various cygwin lists leading to
the decision that C.UTF-8 should be the default.

Ken
Re: X11R7.5 and C.UTF-8
On Wed, 28 Oct 2009, Ken Brown wrote:

> X11R7.5 doesn't like the (default) locale C.UTF-8. If I start the server

technically speaking, there's "no such locale" as C.UTF-8,
so I'd not expect portable code to accept it ("C" and "UTF-8" are
mutually exclusive).

> with 'LANG=C.UTF-8 /usr/bin/startxwin.bat', the server exits
> immediately, and the log has complaints about the locale. If I instead
> use 'LANG=en_US.UTF-8', there's no problem. I've attached both logs and
> cygcheck output.

--
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net