On 2021/01/25 06:03, Ariel Burbaickij via Cygwin wrote:
It says the following:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=
but why would it matter in the scenario where the user switches the layout
explicitly him-/herself?
----
Because the OS (the keyboard driver) needs to know what mapping
is used on the keyboard, so that when you press a key,
the keyboard driver sends the keycode with the correct meaning to
programs.
The keys on your keyboard _inherently_ have no meaning. Their
meaning is assigned by the locale settings, so that pressing a key
sends the intended character to a program.
If you create your own layout, you need to create a *custom*
mapping in POSIX. Cygwin just uses the POSIX standard; it didn't
create its own system of mappings or meanings.
On Mon, 25 Jan 2021 13:46:48 +0100
Ariel Burbaickij wrote:
Hello Cygwin,
I tried to find some files from the command line prompt which are
named using various non-Latin (Russian, Hebrew, Arabic) and
non-default Latin (German) layouts under Windows 10 Enterprise using
a recent Cygwin version, and the outcome is that instead of the
letters I see control characters of the type: \263\320\321 (Unicode
numeric values of the letters?). Any ideas what is happening here and
how correct functionality can be restored?
---
Note that the characters you type are one thing; how a program
interprets those characters is determined by the "locale" settings.
Your locale is using UTF-8, so you need to set your terminal
to interpret Unicode as well. I don't know much about Win10, but in
Microsoft's cmd.exe, "chcp" changes the code page. The code page for
UTF-8 is 65001, so in such a terminal you could type:
    chcp<Enter>               # this should say something like:
    Active code page: 437     # your number may be different
    # Remember it so you can switch back to your initial code page
    # (or just close the cmd window).
To switch to UTF-8, type:
    chcp 65001
That will make that console window interpret output as UTF-8.
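To illustrate why the code page matters, here is a rough Python sketch
(not anything Cygwin or cmd.exe runs; the sample letter and the choice
of cp437 as the "wrong" code page are assumptions for the example). The
same two bytes come out as different characters depending on which
encoding the terminal assumes:

```python
# Sketch: the same bytes, read under two different code pages.
# The sample text and the cp437 code page are illustrative assumptions.
text = "ж"                      # Cyrillic letter, U+0436
raw = text.encode("utf-8")      # two bytes: 0xd0 0xb6

as_utf8 = raw.decode("utf-8")   # what a UTF-8 terminal would show
as_cp437 = raw.decode("cp437")  # what a code page 437 terminal would show

print(" ".join(f"{b:02x}" for b in raw))
print(repr(as_utf8), repr(as_cp437))
```

Only the UTF-8 reading gives back the original letter; the other one
gives two unrelated characters, one per byte.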
Note, I'm not sure that will be all of your problems.
"\263" (octal, i.e. 0xb3) is not valid for the 1st byte of a UTF-8
character. Valid first bytes of a single UTF-8 char (in hex):
00-7f, c2-df, e0-ef, f0-f4.
So if you see something like 0xb3 in the 1st byte of a Unicode
character, you know it can't be valid there: 80-bf only ever occur as
continuation bytes (part of UTF-8's self-synchronizing feature).
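The byte ranges above can be checked with a small Python sketch. The
function name classify_utf8_byte is made up for this example, and it
only looks at one byte at a time, so it does not validate whole
sequences. It is applied to the three octal values from the original
question:

```python
def classify_utf8_byte(b: int) -> str:
    """Say what role a single byte value can play in well-formed UTF-8."""
    if b <= 0x7F:
        return "ascii"          # complete one-byte character
    if 0x80 <= b <= 0xBF:
        return "continuation"   # never valid as a first byte
    if 0xC2 <= b <= 0xDF:
        return "lead-2"         # starts a two-byte character
    if 0xE0 <= b <= 0xEF:
        return "lead-3"         # starts a three-byte character
    if 0xF0 <= b <= 0xF4:
        return "lead-4"         # starts a four-byte character
    return "invalid"            # c0, c1, f5-ff never appear in UTF-8

# The escapes \263 \320 \321 are octal byte values.
sample = [0o263, 0o320, 0o321]
roles = [(f"{b:02x}", classify_utf8_byte(b)) for b in sample]
for hex_value, role in roles:
    print(hex_value, role)
```

So b3 is a continuation byte, while d0 and d1 are first bytes of
two-byte characters (the range used for Cyrillic, among others), which
suggests the output was cut or shown starting in the middle of a
character.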
A very useful utility for displaying all Unicode characters
and which of your installed fonts can display them can be
found at:
https://www.babelstone.co.uk/Software/BabelMap.html
Unzip it into a folder and put a link to it somewhere that is
easy to access.
Hope this helps.