Re: getclip and putclip garble unicode characters
Replying to myself...

Mark Geisert wrote:
> Hi Leonid (?),
>
> Миронов Леонид Владимирович via Cygwin wrote:
>> getclip and putclip from cygutils-extra garble unicode characters:
>> non-latin characters copied to clipboard in windows are replaced with
>> question marks when retrieved with getclip in cygwin, and non-latin
>> characters copied to clipboard using putclip are pasted in windows
>> looking like utf-8 displayed in cp1252 but can be retrieved with
>> getclip exactly as pasted, so it looks like the problem is not in the
>> way the data is copied but in the way cygwin and windows communicate
>> text encoding to each other. LC_CTYPE=en_US.UTF-8, windows ANSI
>> codepage is set to cp1251 - 1251, not 1252.
>
> Thanks for the report.  I will investigate.

I believe I have a local testcase similar to your report: if I select a region of text in a message displayed from the Cygwin mailing list digest, and that message has Cyrillic characters in it, getclip replaces those characters with '?' on output.

Since Thomas suggested an alternative, using 'cat < /dev/clipboard', I tried that as well and see that here UTF-8 is output and the Cyrillic characters are intact.

So I've modified getclip to understand what MS calls CF_UNICODETEXT from the clipboard and to convert it to UTF-8 for output. Thus my new getclip can duplicate what the alternative does. (What getclip could understand previously was CF_TEXT ("normal" ANSI characters) or CYGWIN_NATIVE (an internal Cygwin format that makes your putclip + getclip example work).)

How about I generate a test version of the cygutils package with this updated getclip and you can see if it solves your issue?

Stay tuned,

..mark

--
Problem reports: https://cygwin.com/problems.html
FAQ: https://cygwin.com/faq/
Documentation: https://cygwin.com/docs.html
Unsubscribe info: https://cygwin.com/ml/#unsubscribe-simple
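[Editor's illustration] The conversion described in this message, from CF_UNICODETEXT to UTF-8, can be sketched roughly as follows. This is not the actual getclip patch (getclip is written in C and talks to the Windows clipboard API); it only assumes that a CF_UNICODETEXT payload is UTF-16LE text ending in a NUL code unit. The function name `unicodetext_to_utf8` is hypothetical.

```python
def unicodetext_to_utf8(raw: bytes) -> bytes:
    """Convert a CF_UNICODETEXT-style payload to UTF-8.

    Assumption: the payload is UTF-16LE and ends at the first NUL code
    unit; anything after that terminator is ignored.
    """
    # Only look at whole 16-bit code units.
    end = len(raw) - (len(raw) % 2)
    for i in range(0, end, 2):
        if raw[i:i + 2] == b"\x00\x00":
            end = i  # stop at the terminator
            break
    # Malformed UTF-16 (e.g. a lone surrogate) becomes U+FFFD instead of
    # raising, so the result can always be encoded as UTF-8.
    text = raw[:end].decode("utf-16-le", errors="replace")
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = "Привет αβγ".encode("utf-16-le") + b"\x00\x00"
    print(unicodetext_to_utf8(sample).decode("utf-8"))
```

Unlike a conversion through the ANSI code page, this path never has to drop characters, which would explain why Cyrillic and other non-Latin text survives when the Unicode clipboard format is used.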
Re: getclip and putclip garble unicode characters
On 2021-06-25 12:01, Thomas Wolff wrote:
> On 24.06.2021 at 08:35, Andrey Repin via Cygwin wrote:
>> Greetings, Миронов Леонид Владимирович!
>>
>>> getclip and putclip from cygutils-extra garble unicode characters:
>>> non-latin characters copied to clipboard in windows are replaced with
>>> question marks when retrieved with getclip in cygwin, and non-latin
>>> characters copied to clipboard using putclip are pasted in windows
>>> looking like utf-8 displayed in cp1252 but can be retrieved with
>>> getclip exactly as pasted, so it looks like the problem is not in the
>>> way the data is copied but in the way cygwin and windows communicate
>>> text encoding to each other. LC_CTYPE=en_US.UTF-8, windows ANSI
>>> codepage is set to cp1251 - 1251, not 1252.
>>
>> This looks like you are using a program incapable of dealing with
>> unicode clipboard.
>> To achieve better results, switch your input language/keyboard to the
>> matching language before copying text from the application. I.e.
>> switch to Russian, then copy the text, then check what is returned by
>> getclip.
>> But then, why is LC_CTYPE en_US?
>
> getclip and putclip are just broken, they don't even work in a pure
> UTF-8 environment. Already noticed 9 years ago...
> https://sourceware.org/legacy-ml/cygwin/2012-03/msg00648.html
> including a script-based replacement.

Just cat [<>] /dev/clipboard: recent Windows changes may have affected Windows<->X copy and paste transparency.

--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]
Re: getclip and putclip garble unicode characters
On 24.06.2021 at 08:35, Andrey Repin via Cygwin wrote:
> Greetings, Миронов Леонид Владимирович!
>
>> getclip and putclip from cygutils-extra garble unicode characters:
>> non-latin characters copied to clipboard in windows are replaced with
>> question marks when retrieved with getclip in cygwin, and non-latin
>> characters copied to clipboard using putclip are pasted in windows
>> looking like utf-8 displayed in cp1252 but can be retrieved with
>> getclip exactly as pasted, so it looks like the problem is not in the
>> way the data is copied but in the way cygwin and windows communicate
>> text encoding to each other. LC_CTYPE=en_US.UTF-8, windows ANSI
>> codepage is set to cp1251 - 1251, not 1252.
>
> This looks like you are using a program incapable of dealing with
> unicode clipboard.
> To achieve better results, switch your input language/keyboard to the
> matching language before copying text from the application. I.e.
> switch to Russian, then copy the text, then check what is returned by
> getclip.
> But then, why is LC_CTYPE en_US?

getclip and putclip are just broken, they don't even work in a pure UTF-8 environment. Already noticed 9 years ago...
https://sourceware.org/legacy-ml/cygwin/2012-03/msg00648.html
including a script-based replacement.

Thomas
RE: getclip and putclip garble unicode characters
As far as copying from cygwin to windows is concerned, it happens in exactly the same way in all windows programs I tried pasting data to - word, outlook, chrome, console, you name it. Changing the windows keyboard language has no effect either; windows still stubbornly treats the clipboard contents as cp1252 (I don't quite see how it is supposed to help - data on the clipboard is not limited to one single-byte codepage anyway).

At first I missed that when copying from windows to cygwin, getclip actually gets the data in cp1251 (the windows ANSI codepage), so cyrillic characters can at least be recovered with iconv. But non-cyrillic non-latin characters, e.g. greek, are replaced with question marks and are lost, although in windows everything can be pasted back without issues, again regardless of the program and keyboard language.

So in a nutshell: when copy-pasting from cygwin putclip to windows, unicode is treated as cp1252, while when copy-pasting from windows to cygwin getclip, unicode is treated as cp1251.

Sorry for top-posting.

-----Original Message-----
From: Andrey Repin
Sent: Thursday, June 24, 2021 9:36 AM
To: Миронов Леонид Владимирович; cygwin@cygwin.com
Subject: Re: getclip and putclip garble unicode characters

Greetings, Миронов Леонид Владимирович!

> getclip and putclip from cygutils-extra garble unicode characters:
> non-latin characters copied to clipboard in windows are replaced with
> question marks when retrieved with getclip in cygwin, and non-latin
> characters copied to clipboard using putclip are pasted in windows
> looking like utf-8 displayed in cp1252 but can be retrieved with
> getclip exactly as pasted, so it looks like the problem is not in the
> way the data is copied but in the way cygwin and windows communicate
> text encoding to each other. LC_CTYPE=en_US.UTF-8, windows ANSI
> codepage is set to cp1251 - 1251, not 1252.

This looks like you are using a program incapable of dealing with unicode clipboard.
To achieve better results, switch your input language/keyboard to the matching language before copying text from the application. I.e. switch to Russian, then copy the text, then check what is returned by getclip.
But then, why is LC_CTYPE en_US?

--
With best regards,
Andrey Repin
Thursday, June 24, 2021 9:33:54

Sorry for my terrible english...
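[Editor's illustration] The two symptoms summarized in this message can be reproduced in a small Python sketch. It only models the encodings named in the thread (UTF-8, cp1252, cp1251) and does not touch the real clipboard. The helper names are hypothetical, and the use of `errors="replace"` to get question marks is an assumption that mimics the reported behavior; the conversion Windows actually performs may differ in detail.

```python
def putclip_view(text: str) -> str:
    """What a reader sees when UTF-8 bytes are interpreted as cp1252."""
    # A few byte values are unassigned in Python's cp1252 table, so use
    # errors="replace" to avoid an exception on arbitrary input.
    return text.encode("utf-8").decode("cp1252", errors="replace")


def getclip_bytes(text: str, ansi_codepage: str = "cp1251") -> bytes:
    """Bytes left after Unicode text is squeezed into an ANSI code page."""
    # Characters missing from the code page become b"?" (assumed model).
    return text.encode(ansi_codepage, errors="replace")


def recover(raw: bytes, ansi_codepage: str = "cp1251") -> str:
    """Roughly the effect of running iconv from the ANSI code page."""
    return raw.decode(ansi_codepage)


if __name__ == "__main__":
    print(putclip_view("Привет"))                # mojibake: two chars per letter
    print(recover(getclip_bytes("Привет αβγ")))  # Cyrillic survives, Greek is lost
```

In this model the Cyrillic letters survive the cp1251 round trip, while the Greek letters are already replaced before any recovery step can run, which matches the observation that iconv helps only for Cyrillic.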
Re: getclip and putclip garble unicode characters
Greetings, Миронов Леонид Владимирович!

> getclip and putclip from cygutils-extra garble unicode characters:
> non-latin characters copied to clipboard in windows are replaced with
> question marks when retrieved with getclip in cygwin, and non-latin
> characters copied to clipboard using putclip are pasted in windows
> looking like utf-8 displayed in cp1252 but can be retrieved with
> getclip exactly as pasted, so it looks like the problem is not in the
> way the data is copied but in the way cygwin and windows communicate
> text encoding to each other. LC_CTYPE=en_US.UTF-8, windows ANSI
> codepage is set to cp1251 - 1251, not 1252.

This looks like you are using a program incapable of dealing with unicode clipboard.
To achieve better results, switch your input language/keyboard to the matching language before copying text from the application. I.e. switch to Russian, then copy the text, then check what is returned by getclip.
But then, why is LC_CTYPE en_US?

--
With best regards,
Andrey Repin
Thursday, June 24, 2021 9:33:54

Sorry for my terrible english...
Re: getclip and putclip garble unicode characters
Hi Leonid (?),

Миронов Леонид Владимирович via Cygwin wrote:
> getclip and putclip from cygutils-extra garble unicode characters:
> non-latin characters copied to clipboard in windows are replaced with
> question marks when retrieved with getclip in cygwin, and non-latin
> characters copied to clipboard using putclip are pasted in windows
> looking like utf-8 displayed in cp1252 but can be retrieved with
> getclip exactly as pasted, so it looks like the problem is not in the
> way the data is copied but in the way cygwin and windows communicate
> text encoding to each other. LC_CTYPE=en_US.UTF-8, windows ANSI
> codepage is set to cp1251 - 1251, not 1252.

Thanks for the report.  I will investigate.

..mark
getclip and putclip garble unicode characters
getclip and putclip from cygutils-extra garble unicode characters: non-latin characters copied to clipboard in windows are replaced with question marks when retrieved with getclip in cygwin, and non-latin characters copied to clipboard using putclip are pasted in windows looking like utf-8 displayed in cp1252 but can be retrieved with getclip exactly as pasted, so it looks like the problem is not in the way the data is copied but in the way cygwin and windows communicate text encoding to each other.

LC_CTYPE=en_US.UTF-8, windows ANSI codepage is set to cp1251 - 1251, not 1252.