On Dec 24 18:40, Andrey ``Bass'' Shcheglov wrote:
> Hi,
>
> I'm running Cygwin 2.2.0 on an English Windows 8.1 box:
>
> > CYGWIN_NT-6.3 UNIT-725 2.2.0(0.289/5/3) 2015-08-03 12:51 x86_64 Cygwin
>
> Windows regional settings are set to Russian/Russia.
>
> In the absence of any settings in bashrc/bash_profile, the `locale` command
> outputs the following:
>
> > LANG=ru_RU
> > LC_CTYPE="ru_RU"
> > LC_NUMERIC="ru_RU"
> > LC_TIME="ru_RU"
> > LC_COLLATE="ru_RU"
> > LC_MONETARY="ru_RU"
> > LC_MESSAGES="ru_RU"
> > LC_ALL=
>
> This is perfectly fine, except that "no charset" in the locale output
> means "ISO charset", which is ISO-8859-5 for Russian/Russia and has
> never been used (historically, DOS used CP866, Windows used the CP1251
> ANSI codepage, and various Unices stuck to KOI8-R before the rise of the
> Unicode era).
Well, not quite. Cygwin is following Linux here:

  linux$ locale -av
  [...]
  locale: ru_RU           archive: /usr/lib/locale/locale-archive
  ----------------------------------------------------------------------
      title | Russian locale for Russia
     source | RAP
    address | Sankt Jorgens Alle 8, DK-1615 Kobenhavn V, Danmark
      email | bug-glibc-loca...@gnu.org
   language | Russian
  territory | Russia
   revision | 1.0
       date | 2000-06-29
    codeset | ISO-8859-5

  cygwin$ locale -av
  [...]
  locale: ru_RU           archive: /mnt/c/WINDOWS/system32/KERNEL32.DLL
  ----------------------------------------------------------------------
   language | Russian
  territory | Russia
    codeset | ISO-8859-5

> Cygwin docs state that
>
> > Starting with Cygwin 1.7.2, the default character set is determined by the
> > default Windows ANSI codepage for this language and territory.

You didn't read on:

  Cygwin uses a character set which is the typical Unix-equivalent to the
  Windows ANSI codepage. For instance:
  [...]

> which is not true in my case (Windows ANSI codepage for Cyrillic is
> CP1251, not ISO-8859-5!).

To rephrase the above: Cygwin only uses the ANSI codepage to look up the
equivalent default Linux codeset. Maybe the documentation is a bit fuzzy,
but it doesn't say the charset is set *to* the Windows ANSI charset; it just
*uses* that information to compute the equivalent Linux codeset and sets
the codeset to that.

> Surprisingly, for Belarusian (a.k.a.
> Belorussian, an Eastern Slavic language very close to Russian) "be_BY"
> locale the default charset is indeed CP1251, which is in accordance with
> both the documentation and common sense.

See the docs:

  The default charset of the "be_BY" locale (Belarusian/Belarus) is
  CP1251. With the "@latin" modifier it's UTF-8.

Just as on Linux.

> Despite that, $(locale -u) returns "en_GB", even though all regional
> settings are set to Russian/Russia. I believe this is not correct,
> either, and needs to be fixed.
The locale is taken directly from the Windows system function
GetUserDefaultUILanguage() for the -u option(*), and from
GetUserDefaultLCID() for the -f option(**). This value is then fed into
the Windows function GetLocaleInfo()(***) to fetch the language and
territory codes, and that's what locale -u/-f prints. So it looks like
you're using a UK English system with only the region settings changed
to Russia.

In general, UTF-8 is the preferred codeset, so setting LANG to ru_RU.utf8
(locale -fU should work for you) is the better choice.


Corinna

(*) https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=winsup/utils/locale.cc;h=fadf3f3dacedad6474c92aabe826620b2677e494;hb=HEAD#l805
(**) https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=winsup/utils/locale.cc;h=fadf3f3dacedad6474c92aabe826620b2677e494;hb=HEAD#l812
(***) https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=winsup/utils/locale.cc;h=fadf3f3dacedad6474c92aabe826620b2677e494;hb=HEAD#l114

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
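The suggestion to set LANG to the UTF-8 variant of the regional-format locale can be made persistent in ~/.bashrc. This is a minimal sketch, assuming bash and Cygwin's locale(1) with the -f and -U options mentioned in the reply. The `uname -o` guard and the C.UTF-8 fallback for non-Cygwin hosts are assumptions added so the same file can be shared with Linux machines; they are not part of the thread.

```shell
# ~/.bashrc sketch: prefer the UTF-8 variant of the user's regional-format locale.
if [ "$(uname -o 2>/dev/null)" = "Cygwin" ]; then
    # Cygwin's locale -f reports the locale derived from the Windows user
    # default LCID (the region settings, e.g. ru_RU); -U appends ".UTF-8".
    LANG=$(locale -fU)
else
    # Assumption: on non-Cygwin hosts, fall back to C.UTF-8.
    LANG=C.UTF-8
fi
export LANG
```

With the region set to Russia as in the original report, this should give ru_RU.UTF-8 instead of plain ru_RU (and its ISO-8859-5 default), whatever UI language locale -u reports.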