Wait a sec, what specifically do you mean by "... Cygwin just uses the
POSIX standard..."? The POSIX standard for what, and how does it interfere
with getting the current layout and mapping from the OS?
And what do you mean by "... So you need to set your terminal to
interpret unicode..."? My terminal here is Cygwin Terminal. cmd.exe at
least handles Russian and German just fine; it does not handle Arabic and
Hebrew, but that, I am pretty sure, is because right-to-left writing needs
some additional handling. Notepad++(!) already handles all input types
just fine, as do all the other programs tested so far. So what,
specifically, are these supposed big OS-side secrets that Cygwin cannot
get to here?

Best Regards
Ariel Burbaickij


On Mon, Jan 25, 2021 at 9:21 PM L A Walsh <cyg...@tlinx.org> wrote:

> On 2021/01/25 06:03, Ariel Burbaickij via Cygwin wrote:
> > It says following:
> > LANG=en_US.UTF-8
> > LC_CTYPE="en_US.UTF-8"
> > LC_NUMERIC="en_US.UTF-8"
> > LC_TIME="en_US.UTF-8"
> > LC_COLLATE="en_US.UTF-8"
> > LC_MONETARY="en_US.UTF-8"
> > LC_MESSAGES="en_US.UTF-8"
> > LC_ALL=
> >
> > but why would it matter in the scenario where the user switches the
> > layout explicitly him-/herself?
> >
> ----
>     Because the OS (the keyboard driver) needs to know what mapping
> is used on the keyboard, so that when you press a key,
> the keyboard driver sends the keycode with the correct meaning to
> programs.
>
>     The keys on your keyboard, _inherently_ have no meaning.  They have
> an "assigned" meaning as assigned by the locale settings so they can
> send those characters to a program.
>
>     If you create your own layout, you need to create a *custom*
> mapping in POSIX.  Cygwin just uses the POSIX standard, it doesn't
> create the mapping or the meanings.
>
>  (what cygwin uses -- cygwin didn't create its own system, it uses
> the POSIX standard).
> > On Mon, 25 Jan 2021 13:46:48 +0100
> > Ariel Burbaickij wrote:
> >
> >> Hello Cygwin,
> >> I tried to find some files from the command line prompt which are
> >> named using various non-Latin (Russian, Hebrew, Arabic) and
> >> non-default Latin (German) layouts under Windows 10 Enterprise using
> >> recent cygwin version and the outcome is that instead of representing
> >> letters I see control characters of the type: \263\320\321  (Unicode
> >> numeric value of the letters?). Any ideas what happens here and how
> >> correct functionality can be restored?
> >>
> ---
>     Note that the characters you type are one thing.  How a program
> interprets those characters is determined by the "locale" settings.
>
>     The locale is using UTF-8.  So you need to set your terminal
> to interpret Unicode.  I don't know much about Win10, but in the Microsoft
> cmd.exe program, "chcp" changes the code page.  The code page for UTF-8 is
> 65001, so in such a terminal you could type:
>
> chcp<Enter>                # this should say something like:
> Active code page: 801      # your number may be different
>
> # Remember it to switch back to your initial code page (or just
> #  close the cmd window).
>
> To switch to UTF-8, type:
>
> chcp 65001
>
> That will interpret output as UTF-8 in that program.
>
> Note, I'm not sure that will be all of your problems.
> "\263" is not valid for the 1st byte of a UTF-8 string. Valid
> first bytes of a single UTF-8 char (in hex):
> 00-7f, c2-cf, d0-df, e0-ef, f0-f4.
> "\263" is octal for 0xb3, which falls in the continuation-byte range
> 80-bf.  So if you see something like 0xb3 as the 1st byte of a Unicode
> character, you know it cannot start a character (part of UTF-8's
> self-synchronizing feature).
>
> A very useful utility for displaying all unicode characters
> and what character sets you have that can display them can be
> found at:
>
> https://www.babelstone.co.uk/Software/BabelMap.html
>
> Unzip it into a folder and put a link to it where it is
> easy to access.
>
>
> Hope this helps.
>
>
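
The lead-byte rule from the quoted message can be checked against the
reported output \263\320\321 with a short Python sketch. This is only an
illustration: the helper names classify_utf8_byte and parse_octal_escapes
are made up for this example, and it assumes the escapes shown are plain
octal byte values.

```python
def classify_utf8_byte(b: int) -> str:
    """Classify one byte by the role it can play in UTF-8."""
    if b <= 0x7F:
        return "ascii"          # a complete single-byte character
    if 0x80 <= b <= 0xBF:
        return "continuation"   # may only follow a lead byte
    if 0xC2 <= b <= 0xDF:
        return "lead-2"         # starts a 2-byte sequence
    if 0xE0 <= b <= 0xEF:
        return "lead-3"         # starts a 3-byte sequence
    if 0xF0 <= b <= 0xF4:
        return "lead-4"         # starts a 4-byte sequence
    return "invalid"            # 0xC0, 0xC1 and 0xF5-0xFF are never used


def parse_octal_escapes(text: str) -> bytes:
    """Turn a string such as r'\263\320\321' into the bytes it denotes."""
    return bytes(int(part, 8) for part in text.split("\\") if part)


raw = parse_octal_escapes(r"\263\320\321")
for b in raw:
    print(f"\\{b:03o} = 0x{b:02x}: {classify_utf8_byte(b)}")

# A strict decoder rejects the sequence because it starts with a
# continuation byte instead of a lead byte.
try:
    raw.decode("utf-8")
    print("valid UTF-8")
except UnicodeDecodeError:
    print("not valid UTF-8")
```

The first byte comes out as a continuation byte and the next two as lead
bytes with nothing following them, so these three bytes are not one
well-formed UTF-8 character as they stand.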
--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
