On 2021/01/25 06:03, Ariel Burbaickij via Cygwin wrote:
It says the following:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=

but why would it matter in the scenario where the user switches the layout
explicitly him-/herself?
----
   Because the OS (the keyboard driver) needs to know what mapping
is used on the keyboard, so that when you press a key,
the keyboard driver sends the keycode with the correct meaning to
programs.

   The keys on your keyboard _inherently_ have no meaning.  They have
an "assigned" meaning, assigned by the locale settings, so that they can
send those characters to a program.

   If you create your own layout, you need to create a *custom*
mapping in POSIX.  Cygwin didn't create its own system: it just uses
the POSIX standard, and it doesn't create the mapping or the meanings.
On Mon, 25 Jan 2021 13:46:48 +0100
Ariel Burbaickij wrote:
Hello Cygwin,
I tried to find some files from the command line prompt which are
named using various non-Latin (Russian, Hebrew, Arabic) and
non-default Latin (German) layouts under Windows 10 Enterprise using
a recent Cygwin version. The outcome is that instead of the
letters I see control characters of the type \263\320\321 (Unicode
numeric value of the letters?). Any ideas what happens here and how
correct functionality can be restored?
---
   Note that the characters you type are one thing.  How a program
interprets those characters is determined by the "locale" settings.

   The locale is using UTF-8, so you need to set your terminal
to interpret UTF-8 as well.  I don't know much about Win10, but in the
Microsoft cmd.exe program, "chcp" changes the code page.  The code page
for UTF-8 is 65001, so in such a terminal you could type:

chcp<Enter>                # this should say something like:
Active code page: 801      # your number may be different

# Remember it to switch back to your initial code page (or just
#  close the cmd window).

To switch to UTF-8, type:

chcp 65001

That makes the console interpret program output as UTF-8.
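
To illustrate why the code page matters, here is a small Python sketch
(not something from the original report). The file name is a made-up
example, and cp437 merely stands in for whatever non-UTF-8 code page a
console might be using; both are assumptions. The same bytes come out
as the intended letters only when they are decoded as UTF-8:

```python
# Hypothetical file name: the Cyrillic word "fayl" plus ".txt",
# written with escapes so the source stays pure ASCII.
name = "\u0444\u0430\u0439\u043b.txt"

# What a UTF-8 locale puts on the wire: two bytes per Cyrillic letter.
raw = name.encode("utf-8")


def decode_as(data: bytes, encoding: str) -> str:
    """Decode bytes the way a terminal using `encoding` would read them."""
    return data.decode(encoding, errors="replace")


as_utf8 = decode_as(raw, "utf-8")    # terminal agrees with the locale
as_cp437 = decode_as(raw, "cp437")   # terminal uses a legacy code page (assumed)

# ascii() keeps the printout safe on any console encoding.
print("bytes    :", raw.hex(" "))
print("as UTF-8 :", ascii(as_utf8))
print("as cp437 :", ascii(as_cp437))
```

The bytes never change; only the interpretation does, which is what
"chcp 65001" adjusts on the console side.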

Note, I'm not sure that will solve all of your problems.
"\263" (octal for 0xb3) is not valid as the 1st byte of a UTF-8
sequence. Valid first bytes of a single UTF-8 char (in hex):
00-7f, c2-df, e0-ef, f0-f4.
So if you see something like 0xb3 as the 1st byte of a character,
you know it can't be the start of one: bytes 80-bf only ever appear
as continuation bytes (part of UTF-8's self-synchronizing feature).
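
As a rough check of the byte ranges above, the following Python sketch
classifies each of the three reported bytes. The function name
utf8_byte_role is made up for this example; the ranges are the ones
listed above plus 80-bf for continuation bytes.

```python
def utf8_byte_role(b: int) -> str:
    """Say what role a single byte value can play in well-formed UTF-8."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b <= 0x7F:
        return "single-byte (ASCII)"
    if b <= 0xBF:
        return "continuation"          # 80-bf: never starts a character
    if 0xC2 <= b <= 0xDF:
        return "lead of 2-byte sequence"
    if 0xE0 <= b <= 0xEF:
        return "lead of 3-byte sequence"
    if 0xF0 <= b <= 0xF4:
        return "lead of 4-byte sequence"
    return "invalid"                   # c0, c1, f5-ff never occur


# The escapes from the report, written as octal literals.
reported = [0o263, 0o320, 0o321]

for b in reported:
    print(f"\\{b:03o} = 0x{b:02x}: {utf8_byte_role(b)}")
```

Running it shows that \263 is 0xb3, a continuation byte, while \320
and \321 are 0xd0 and 0xd1, both lead bytes of 2-byte sequences.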

A very useful utility for displaying all Unicode characters,
and for seeing which of your installed fonts can display them, can be
found at:

https://www.babelstone.co.uk/Software/BabelMap.html

Unzip it into a folder and put a link to it where it is
easy to access.


Hope this helps.

--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
