Re: Cygwin, Unicode and "long" path names

2021-06-25 Thread Doug Henderson via Cygwin
On Fri, 25 Jun 2021 at 19:55, Vadim wrote:
>
> Ah, this beautiful topic. Windows 7 x64.
>
> This summary was written as a postscript; tests and findings are below:
>
> 1) Cygwin limits individual names to 255 bytes, while Windows seems to
> count UTF-16 characters and works fine: a 256-byte name of 108 characters works.
>
> Basically, this becomes a bytes vs characters story.
>
> 2) Bash file name auto-expansion detects the file of that name, but it
> gets truncated to 255 bytes. find's behaviour is the same ("No such file
> or directory" due to trying to access a non-existing truncated name)
>
> 2.1) If you try to correct the above mistake by adding truncated
> characters, then the program (cat) will complain about "File name too long"
>
> 2.2) If there exists a folder with a 255-byte name, equal to the
> truncated name, then "find ." will do a listing on that folder twice
> (effectively hiding the long-named folder from tools without leaving an
> error message)
>
> 3) UNC paths get the same treatment: File name too long.
>
> I expected Cygwin to handle these names without problems just like
> Windows, Explorer, cmd etc. do. Is this particular problem new or known?
> All I could find on the mailing list is around the time when Cygwin
> hadn't yet implemented Unicode support (UTF-8?), ~2004-2008.
>
> These names were created by youtube-dl.exe executed from within Cygwin.
>
> - Vadim

I believe this is the result of the difference between Pascal-type
strings, which have a length byte followed by data bytes, and C-type
strings, which have data bytes followed by a zero byte (or worse, in
the case of two-byte characters, data words followed by a zero word).

For single-byte characters, both Pascal and C styles use 256 bytes to
hold a 255-character string. Using the 255 length limit without
accounting for the trailing zero byte could account for some of the
observed problem.

More likely, the problem relates to double-byte character sets. For
double-byte characters, 255 bytes of UTF-16 characters, or more likely
255 bytes of MBCS (multi-byte character set) or DBCS (double-byte
character set), can encode to more or fewer than 255 UTF-8 bytes
depending on the average bytes per character of the UTF-8 encoding. This
could account for the failure to handle all bytes of the NTFS file name
when converted to UTF-8. Ported Linux programs may fail to allocate
a large enough encoding buffer, leading to the observed truncation.
The same applies to 510 bytes containing 255 words of DBCS characters.

youtube-dl.exe is basically a Windows Python 3 program with
C extensions. Python 3 properly handles Unicode and the encoding and
decoding of the aforementioned character encodings.

I would look for library functions which decode NTFS file names into
UTF-8 names, verify their correctness, and trace how their output is
used through the system. I think this means that the 255-byte limit
cannot be used at all in any Cygwin program that handles international
file names. Unfortunately, that sounds like a lot of work. In theory, if
all 255 characters in a file name component required 4-byte UTF-8
encodings, this would require about 1024 bytes. And that does not even
touch on emojis, where a single displayed emoji can expand to as much
as 35 or so bytes! That basically means the end of static allocation
for file and directory names and name component buffers. That may be a
major job in the Cygwin kernel, not to mention all the available packages!
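A quick check of the sizing arithmetic, as a POSIX sh sketch (the function names are invented for illustration). It assumes the NTFS per-component limit is 255 UTF-16 code units, as described in the original report. A character that needs 4 UTF-8 bytes also takes two UTF-16 units, so no unit costs more than 3 bytes, and the worst case for one component is 765 bytes (766 with a terminating NUL) rather than about 1024. Multi-code-point emoji still show why counting characters is hopeless: the four-person family emoji alone is 25 bytes.

```shell
#!/bin/sh
# Worst-case UTF-8 size of one NTFS name component, assuming the limit is
# counted in UTF-16 code units. A BMP character (e.g. CJK) is one unit and
# at most 3 UTF-8 bytes; a non-BMP character is two units (a surrogate
# pair) and 4 UTF-8 bytes. Either way a unit never costs more than 3 bytes.
max_utf8_bytes_for_units() {
  echo $(( $1 * 3 ))
}

# One displayed emoji made of seven code points:
# MAN, ZWJ, WOMAN, ZWJ, GIRL, ZWJ, BOY.
# Written as octal byte escapes so any POSIX printf can produce it.
family_emoji() {
  printf '\360\237\221\250\342\200\215\360\237\221\251\342\200\215\360\237\221\247\342\200\215\360\237\221\246'
}

echo "255 units: at most $(max_utf8_bytes_for_units 255) bytes, plus 1 for NUL"
echo "family emoji: $(( $(family_emoji | wc -c) )) bytes"
```

Running it prints the 765-byte bound and the 25-byte size of the emoji.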


HTH
Doug

-- 
Doug Henderson, Calgary, Alberta, Canada - from gmail.com

-- 
Problem reports:  https://cygwin.com/problems.html
FAQ:  https://cygwin.com/faq/
Documentation:https://cygwin.com/docs.html
Unsubscribe info: https://cygwin.com/ml/#unsubscribe-simple


Cygwin, Unicode and "long" path names

2021-06-25 Thread Vadim

Ah, this beautiful topic. Windows 7 x64.

This summary was written as a postscript; tests and findings are below:

1) Cygwin limits individual names to 255 bytes, while Windows seems to
count UTF-16 characters and works fine: a 256-byte name of 108 characters works.


Basically, this becomes a bytes vs characters story.

2) Bash file name auto-expansion detects the file of that name, but it 
gets truncated to 255 bytes. find's behaviour is the same ("No such file 
or directory" due to trying to access a non-existing truncated name)


2.1) If you try to correct the above mistake by adding truncated 
characters, then the program (cat) will complain about "File name too long"


2.2) If there exists a folder with a 255-byte name, equal to the 
truncated name, then "find ." will do a listing on that folder twice 
(effectively hiding the long-named folder from tools without leaving an 
error message)


3) UNC paths get the same treatment: File name too long.

I expected Cygwin to handle these names without problems just like 
Windows, Explorer, cmd etc. do. Is this particular problem new or known? 
All I could find on the mailing list is around the time when Cygwin 
hadn't yet implemented Unicode support (UTF-8?), ~2004-2008.


These names were created by youtube-dl.exe executed from within Cygwin.

- Vadim

---

This file name is 255 bytes long and works:

s123點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt


This one is 256 bytes and works perfectly normally in Windows (Explorer;
in cmd I can paste it and run "dir" on it, although cmd shows [] block
characters), but not in the Cygwin terminal (I used s123/s1234 as a
prefix for easy auto-expansion):


s1234點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt



If I try to use tab completion in the terminal (mintty, bash), the
problem becomes apparent ("xt" is missing at the end):


$ cat s1234點半蘋果新聞報道\ 
字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場\ 
O記轉介律政司︱新巴車長被判不小心駕駛罪成 ︱深圳賽格大樓離奇劇晃\ 
民眾慌忙逃走︱蘋果日報\ Apple\ Daily\ #香港新聞.t
cat: 's1234點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.t': No such file or directory



However, with one fewer byte it expands properly:

$ cat s123點半蘋果新聞報道\ 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場\ 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃\ 
民眾慌忙逃走︱蘋果日報\ Apple\ Daily\ #香港新聞.txt

hello


NAME_MAX? Yes, 255 bytes. Why then does the full file/folder name work
in Windows? This is the full name (a folder), 257 bytes:


20210518_9點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞


And it can get longer! In fact, I can bump the total path to 396 bytes,
or "Column 249" as Notepad++ counts the characters (the individual folder
name is 359 bytes or 211 characters, "column 212"):


D:/abcdefgh/Local_TEMP/cygwinunicode/1_123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789020210518_9點半蘋果新聞報道 
字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞



NTFS allows up to 255 UTF-16 code units in an individual path component,
and this seems to match the Windows tooling: cmd and Explorer can browse
these folders fine, but a file inside such a folder pushes the total
beyond the limit, and you run into the usual 'total path too long' problem.
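The bytes-versus-characters distinction can be measured directly. Below is a POSIX sh sketch, with invented function names, that counts a name in UTF-8 bytes (the unit of the 255-byte limit Cygwin applies here) and in UTF-16 code units (the unit of the NTFS per-component limit). It assumes the input is valid UTF-8 and only inspects lead bytes, without validating the sequences.

```shell
#!/bin/sh
# Hypothetical helpers comparing a file name's length in UTF-8 bytes with
# its length in UTF-16 code units. The input is assumed to be valid UTF-8.

utf8_bytes() {
  # wc -c counts raw bytes; $(( )) drops padding some wc versions add
  echo $(( $(printf '%s' "$1" | wc -c) ))
}

utf16_units() (
  n=0
  # od prints every byte of the name as an unsigned decimal number
  for b in $(printf '%s' "$1" | od -An -v -tu1); do
    if [ "$b" -lt 128 ]; then
      n=$((n + 1))      # ASCII byte: one UTF-16 unit
    elif [ "$b" -ge 240 ]; then
      n=$((n + 2))      # lead byte of a 4-byte sequence: a surrogate pair
    elif [ "$b" -ge 192 ]; then
      n=$((n + 1))      # lead byte of a 2- or 3-byte sequence: one unit
    fi                  # 128-191 are continuation bytes and add nothing
  done
  echo "$n"
)

# Report which 255 limit a name exceeds: bytes (Cygwin) or units (NTFS).
check_name() (
  bytes=$(utf8_bytes "$1")
  units=$(utf16_units "$1")
  if [ "$bytes" -le 255 ]; then cygwin=ok; else cygwin=too-long; fi
  if [ "$units" -le 255 ]; then ntfs=ok; else ntfs=too-long; fi
  echo "bytes=$bytes units=$units cygwin=$cygwin ntfs=$ntfs"
)
```

For example, a name made of 86 CJK characters (3 UTF-8 bytes each) is 258 bytes but only 86 UTF-16 units, so check_name reports it as too long for the byte limit and fine for the NTFS one, the same pattern as the 256-byte name above.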


Whether you manually add the missing "xt" to the tab-completed name or
use UNC paths, the result is the same:


$ cat s1234點半蘋果新聞報道\ 
字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場\ 
O記轉介律政司︱新巴車長被判不小心駕駛罪成 ︱深圳賽格大樓離奇劇晃\ 
民眾慌忙逃走︱蘋果日報\ Apple\ Daily\ #香港新聞.txt
cat: 's1234點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt': File name too long
$ cat '\\?\D:\abcdefgh\Local_TEMP\cygwinunicode\20210518_9點半蘋果新聞報道 
字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt'
cat: '\\?\D:\abcdefgh\Local_TEMP\cygwinunicode\20210518_9點半蘋果新聞報道 
字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt': File name too long





Re: libtool with mingw hangs building openocd in func_convert_core_msys_to_w32

2021-06-25 Thread Dietmar May via Cygwin
The build completes successfully by replacing the "cmd /c | sed" 
construct with simply:


func_convert_core_msys_to_w32_result=$1

so no path translation takes place.

The function then becomes:

func_convert_core_msys_to_w32 ()
{
  $debug_cmd

  func_convert_core_msys_to_w32_result=$1
}


[no subject]

2021-06-25 Thread Megdelawit Mazengia via Cygwin
ftp.is.co.za
Addis Ababa



Re: getclip and putclip garble unicode characters

2021-06-25 Thread Brian Inglis

On 2021-06-25 12:01, Thomas Wolff wrote:
> On 24.06.2021 at 08:35, Andrey Repin via Cygwin wrote:
>> Greetings, Миронов Леонид Владимирович!
>>
>>> getclip and putclip from cygutils-extra garble unicode characters:
>>> non-latin characters copied to clipboard in windows are replaced with
>>> question marks when retrieved with getclip in cygwin, and non-latin
>>> characters copied to clipboard using putclip are pasted it in windows
>>> looking like utf-8 displayed in cp1252 but can be retrieved with getclip
>>> exactly as pasted, so it looks like the problem is not in the way the data
>>> is copied but in the way cygwin and windows communicate text encoding to
>>> each other. LC_CTYPE=en_US.UTF-8, windows ANSI codepage is set to
>>> cp1251 - 1251, not 1252.
>>
>> This looks like you are using a program incapable of dealing with unicode
>> clipboard. To achieve better results, switch your input language/keyboard to
>> matching language before copying text from application. I.e. switch to
>> Russian then copy text, then check what is returned by getclip.
>> But then, why LC_CTYPE is en_US?
> getclip and putclip are just broken, they don't even work in a pure
> UTF-8 environment.
> Already noticed 9 years ago...
> https://sourceware.org/legacy-ml/cygwin/2012-03/msg00648.html
> including a script-based replacement.


Just cat [<>] /dev/clipboard: recent Windows changes may have affected 
Windows<->X copy and paste transparency.
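"cat [<>] /dev/clipboard" is shorthand for writing with cat > /dev/clipboard and reading with cat < /dev/clipboard, bypassing getclip and putclip. Below is a minimal POSIX sh sketch of stand-ins built on that idea. The function names and the CLIPBOARD_DEV override (which lets the functions be tried against a plain file outside Cygwin) are hypothetical additions, and whether non-Latin text survives the round trip through the real device is an assumption to verify under your own locale.

```shell
#!/bin/sh
# Hypothetical stand-ins for putclip/getclip using Cygwin's /dev/clipboard.
# CLIPBOARD_DEV is an invented override so the functions can be exercised
# against an ordinary file where /dev/clipboard does not exist.
clip_put() {
  cat > "${CLIPBOARD_DEV:-/dev/clipboard}"
}

clip_get() {
  cat < "${CLIPBOARD_DEV:-/dev/clipboard}"
}

# Usage on Cygwin, in a UTF-8 locale:
#   printf 'Привет, κόσμος\n' | clip_put
#   clip_get
```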


--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]



Re: getclip and putclip garble unicode characters

2021-06-25 Thread Thomas Wolff



On 24.06.2021 at 08:35, Andrey Repin via Cygwin wrote:
> Greetings, Миронов Леонид Владимирович!
>
>> getclip and putclip from cygutils-extra garble unicode characters:
>> non-latin characters copied to clipboard in windows are replaced with
>> question marks when retrieved with getclip in cygwin, and non-latin
>> characters copied to clipboard using putclip are pasted it in windows
>> looking like utf-8 displayed in cp1252 but can be retrieved with getclip
>> exactly as pasted, so it looks like the problem is not in the way the data
>> is copied but in the way cygwin and windows communicate text encoding to
>> each other. LC_CTYPE=en_US.UTF-8, windows ANSI codepage is set to
>> cp1251 - 1251, not 1252.
>
> This looks like you are using a program incapable of dealing with unicode
> clipboard. To achieve better results, switch your input language/keyboard to
> matching language before copying text from application. I.e. switch to
> Russian then copy text, then check what is returned by getclip.
> But then, why LC_CTYPE is en_US?

getclip and putclip are just broken, they don't even work in a pure
UTF-8 environment.
Already noticed 9 years ago...
https://sourceware.org/legacy-ml/cygwin/2012-03/msg00648.html

including a script-based replacement.
Thomas



Re: libtool with mingw hangs building openocd in func_convert_core_msys_to_w32

2021-06-25 Thread Jonathan Yong via Cygwin

On 6/25/21 2:34 PM, Dietmar May via Cygwin wrote:
> ./configure --disable-werror --disable-doxygen-pdf --enable-ftdi
> --enable-jlink --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32




Don't set --build, you are building on Cygwin, not MSYS.
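The advice above, expressed as a hypothetical guard (the function is invented for illustration): when `uname -s` reports Cygwin, a --build value naming a MinGW triplet is flagged. Without --build, configure runs config.guess to detect the Cygwin build system by itself, and --host=x86_64-w64-mingw32 alone is enough to make it look for the MinGW cross tools.

```shell
#!/bin/sh
# Hypothetical guard: complain when --build names a MinGW triplet while the
# shell running configure is Cygwin. Arguments: the output of `uname -s`,
# then the --build value (empty if none was given).
check_build_option() {
  case $1 in
    CYGWIN*)
      case $2 in
        *mingw32*)
          echo "omit --build: this is Cygwin, not MSYS" >&2
          return 1 ;;
      esac ;;
  esac
  return 0
}

# The configure line from the report, without --build:
#   ./configure --disable-werror --disable-doxygen-pdf --enable-ftdi \
#     --enable-jlink --host=x86_64-w64-mingw32
```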






libtool with mingw hangs building openocd in func_convert_core_msys_to_w32

2021-06-25 Thread Dietmar May via Cygwin

SUMMARY

func_convert_core_msys_to_w32 in /usr/share/libtool/build-aux/ltmain.sh
has an extraneous '/' in the call to

( cmd //c echo "$1" )

causing make to hang indefinitely when configured with

--build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32


The project builds successfully on MSYS2 with mingw-w64-x86_64-gcc
installed, as well as on git-for-windows-sdk (which uses MSYS2). MSYS2
has the same '//c' call, but the compiler is on the PATH, so no
cross-compilation configuration is needed (and apparently this function
is not invoked).



DETAILS

func_convert_core_msys_to_w32() in the generated libtool script, when 
configured using --build and --host for mingw, expands to:


cmd //c echo ... | sed

//c is not a valid option to cmd.exe, and causes cmd.exe to hang 
indefinitely. This is reproducible from the command line:


cmd //c echo .libs/ | /usr/bin/sed -e 's/[ ]*$//' -e 's|*|\\|g;s|/|\\|g;s|\\||g'


ps aux shows cmd.exe, with sed at cmd.exe's PID + 1. kill is the only
way to terminate it.


By changing "cmd //c" to "cmd /c", the command completes successfully.
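The doubled slash is presumably there for MSYS, which rewrites arguments that look like POSIX paths: a bare /c would be converted to a drive path, and //c is the usual way to pass a literal /c to cmd.exe from an MSYS shell. Cygwin does no such rewriting, so cmd.exe receives //c unchanged. A hypothetical helper (not libtool's code) that picks the spelling from `uname -s`, where MSYS and MinGW shells report names starting with MSYS or MINGW and Cygwin reports CYGWIN:

```shell
#!/bin/sh
# Hypothetical helper: choose how to spell cmd.exe's /c switch, given the
# output of `uname -s`. MSYS turns a leading "//" into "/" (and would
# otherwise treat "/c" as a path); Cygwin passes arguments unchanged.
cmd_c_switch() {
  case $1 in
    MSYS*|MINGW*) echo '//c' ;;
    *) echo '/c' ;;
  esac
}

# Usage: cmd "$(cmd_c_switch "$(uname -s)")" echo "$1"
```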


/usr/share/libtool/build-aux/ltmain.sh is the template, which contains 
the code:


# func_convert_core_msys_to_w32 ARG
# Convert file name or path ARG from MSYS format to w32 format. Return
# result in func_convert_core_msys_to_w32_result.
func_convert_core_msys_to_w32 ()
{
  $debug_cmd

  # awkward: cmd appends spaces to result
  func_convert_core_msys_to_w32_result=`( cmd //c echo "$1" ) 2>/dev/null |
    $SED -e 's/[ ]*$//' -e "$sed_naive_backslashify"`
}
#end: func_convert_core_msys_to_w32

I've been able to get past this problem by editing this file and running 
configure again.


Unfortunately, make aborts at a later point with a different (but 
perhaps related?) error:


func_to_tool_file src/.libs/libopenocd.libcmd

func_convert_file_msys_to_w32 src/.libs/libopenocd.libcmd
func_convert_core_msys_to_w32 src/.libs/libopenocd.libcmd
func_convert_file_check src/.libs/libopenocd.libcmd 
src\\.libs\\libopenocd.libcmd

func_execute_cmds $AR $AR_FLAGS $oldlib$oldobjs~$RANLIB $tool_oldlib exit $?
 exit $?w_eval x86_64-w64-mingw32-ar cru src/.libs/libopenocd.a 
@src\\.libs\\libopenocd.libcmd
func_quote_for_expand x86_64-w64-mingw32-ar cru src/.libs/libopenocd.a 
@src\\.libs\\libopenocd.libcmd
func_notquiet x86_64-w64-mingw32-ar cru src/.libs/libopenocd.a 
@src\\.libs\\libopenocd.libcmd
func_echo x86_64-w64-mingw32-ar cru src/.libs/libopenocd.a 
@src\\.libs\\libopenocd.libcmd



libtool: link: x86_64-w64-mingw32-ar cru src/.libs/libopenocd.a 
@src\\.libs\\libopenocd.libcmd


: No such file or directory\.libs\libopenocd.libcmd
make[2]: *** [Makefile:2811: src/libopenocd.la] Error 1

The file *is* there:

$ ls src/.libs
libopenocd.lax  libopenocd.libcmd

Running the command directly completes with no errors:

$ x86_64-w64-mingw32-ar cru src/.libs/libopenocd.a @src\\.libs\\libopenocd.libcmd

$
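Two details in the trace above, the ': No such file or directory\.libs\libopenocd.libcmd' line (where the error text seems to overwrite the start of the path) and the ' exit $?w_eval' fragment, look like what a terminal shows when a string contains a carriage return. That is a guess, not something established in this thread. It would fit the conversion step: cmd.exe ends its echo output with CRLF, command substitution strips only the trailing newline, and the sed call removes only spaces. A POSIX sh sketch of a cleanup that also drops trailing CRs (the function name is hypothetical):

```shell
#!/bin/sh
# Hypothetical cleanup for text captured from cmd.exe: remove trailing
# spaces and carriage returns. cmd.exe ends lines with CRLF, and command
# substitution strips only the newline, so the CR would otherwise remain.
strip_cmd_trailer() (
  s=$1
  cr=$(printf '\r')      # command substitution keeps a lone CR
  while :; do
    case $s in
      *"$cr"|*' ') s=${s%?} ;;   # drop one trailing CR or space
      *) break ;;
    esac
  done
  printf '%s\n' "$s"
)
```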


One thing I don't understand is why libtool converts paths to
Windows format to run inside a Cygwin shell. The command completes
successfully *if no path conversion occurs*, so why bother?


$ x86_64-w64-mingw32-ar cru src/.libs/libopenocd.a @src/.libs/libopenocd.libcmd


$

Is this a holdover from 13-year-old MinGW behavior? Or is it somehow
related to running autotools in a cmd.exe environment (like Microsoft's
original NT "POSIX" subsystem, a port of GNU commands to run "natively"
under cmd.exe)?


Can libtool just ditch all of the back and forth path conversions, and 
simplify all of this?



REPRODUCING

Install mingw64-x86_64-gcc-g++, autoconf, autoconf2.1, autoconf2.5,
automake and pkg-config in Cygwin.


I believe that will pull in all required dependencies.


git clone https://git.code.sf.net/p/openocd/code

cd openocd

./bootstrap

./configure --disable-werror --disable-doxygen-pdf --enable-ftdi \
  --enable-jlink --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32


make # or make -j8





RE: getclip and putclip garble unicode characters

2021-06-25 Thread Миронов Леонид Владимирович via Cygwin
As far as copying from Cygwin to Windows is concerned, it happens in exactly
the same way in all Windows programs I tried pasting data into: Word, Outlook,
Chrome, the console, you name it. Changing the Windows keyboard language has no
effect either; Windows still stubbornly treats the clipboard contents as cp1252
(I don't quite see how that is supposed to help anyway: data on the clipboard
is not limited to one single-byte codepage).

At first I missed that, when copying from Windows to Cygwin, getclip actually
gets the data in cp1251 (the Windows ANSI codepage), so Cyrillic characters can
at least be recovered with iconv, but other non-Latin characters, e.g. Greek,
are replaced with question marks and are lost, although in Windows everything
can be pasted back without issues, again regardless of the program and keyboard
language.

So, in a nutshell: when copy-pasting from Cygwin (putclip) to Windows, Unicode
text is treated as cp1252, while when copy-pasting from Windows to Cygwin
(getclip), it is treated as cp1251.
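The message above notes that Cyrillic text from getclip can at least be recovered with iconv. Assuming getclip really returns Windows-1251 bytes, as described, that step is a single iconv call; the function name below is made up. Characters outside CP1251, such as Greek, have already become '?' by this point and cannot be recovered this way.

```shell
#!/bin/sh
# Re-decode text that arrived as Windows-1251 (CP1251) bytes into UTF-8.
from_cp1251() {
  iconv -f CP1251 -t UTF-8
}

# Usage on Cygwin: getclip | from_cp1251
```

For example, the six CP1251 bytes 0xCF 0xF0 0xE8 0xE2 0xE5 0xF2 decode to "Привет".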

Sorry for top-posting.

-----Original Message-----
From: Andrey Repin
Sent: Thursday, June 24, 2021 9:36 AM
To: Миронов Леонид Владимирович; cygwin@cygwin.com
Subject: Re: getclip and putclip garble unicode characters

Greetings, Миронов Леонид Владимирович!

> getclip and putclip from cygutils-extra garble unicode characters:
> non-latin characters copied to clipboard in windows are replaced with 
> question marks when retrieved with getclip in cygwin, and non-latin 
> characters copied to clipboard using putclip are pasted it in windows 
> looking like utf-8 displayed in cp1252 but can be retrieved with 
> getclip exactly as pasted, so it looks like the problem is not in the 
> way the data is copied but in the way cygwin and windows communicate 
> text encoding to each other. LC_CTYPE=en_US.UTF-8, windows ANSI codepage is 
> set to cp1251 - 1251, not 1252.

This looks like you are using a program incapable of dealing with unicode 
clipboard. To achieve better results, switch your input language/keyboard to 
matching language before copying text from application. I.e. switch to Russian 
then copy text, then check what is returned by getclip.
But then, why LC_CTYPE is en_US?


--
With best regards,
Andrey Repin
Thursday, June 24, 2021 9:33:54

Sorry for my terrible english...
