Re: getclip and putclip garble unicode characters

2021-07-05 Thread Mark Geisert

Replying to myself...

Mark Geisert wrote:
> Hi Leonid (?),
>
> Миронов Леонид Владимирович via Cygwin wrote:
>> getclip and putclip from cygutils-extra garble unicode characters: non-latin
>> characters copied to clipboard in windows are replaced with question marks when
>> retrieved with getclip in cygwin, and non-latin characters copied to clipboard
>> using putclip are pasted in windows looking like utf-8 displayed in cp1252
>> but can be retrieved with getclip exactly as pasted, so it looks like the
>> problem is not in the way the data is copied but in the way cygwin and windows
>> communicate text encoding to each other. LC_CTYPE=en_US.UTF-8, windows ANSI
>> codepage is set to cp1251 - 1251, not 1252.
>
> Thanks for the report.  I will investigate.


I believe I have a local testcase similar to your report: If I select a region of 
text on a message displayed from the Cygwin mailing list digest, and that message 
has Cyrillic characters in it, getclip replaces those characters with '?' on output.


Since Thomas suggested an alternative, using 'cat < /dev/clipboard', I tried that 
as well and see that here UTF-8 is output and the Cyrillic characters are intact.


So I've modified getclip to understand what MS calls CF_UNICODETEXT from the 
clipboard and have it converted to UTF-8 for output.  Thus my new getclip can 
duplicate what the alternative does.  (What getclip could understand previously 
was CF_TEXT ("normal" ANSI characters) or CYGWIN_NATIVE (an internal Cygwin format 
that makes your putclip + getclip example work)).
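
The conversion described above can be sketched roughly as follows. This is only an illustration in Python, not the actual getclip change: the function name unicodetext_to_utf8 is hypothetical, and the sketch assumes the clipboard data has already been fetched as raw bytes. What it relies on is that CF_UNICODETEXT data is UTF-16LE text terminated by a NUL code unit.

```python
def unicodetext_to_utf8(raw: bytes) -> bytes:
    """Convert a CF_UNICODETEXT-style buffer (UTF-16LE, NUL-terminated) to UTF-8.

    Hypothetical helper for illustration; the real getclip is written in C.
    """
    # Work on whole 16-bit code units only; ignore a stray trailing byte.
    usable = len(raw) - (len(raw) % 2)
    units = [raw[i:i + 2] for i in range(0, usable, 2)]

    # The clipboard text ends at the first NUL code unit, if there is one.
    try:
        end = units.index(b"\x00\x00")
    except ValueError:
        end = len(units)

    # Decode the UTF-16LE text, then re-encode it as UTF-8 for output.
    text = b"".join(units[:end]).decode("utf-16-le", errors="replace")
    return text.encode("utf-8")


if __name__ == "__main__":
    # Simulated clipboard contents with Cyrillic and Greek characters.
    sample = "Привет, αβγ".encode("utf-16-le") + b"\x00\x00"
    print(unicodetext_to_utf8(sample).decode("utf-8"))
```

Because the text never passes through the ANSI codepage here, characters outside cp1251 or cp1252 are not turned into '?'.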


How about I generate a test version of the cygutils package with this updated 
getclip and you can see if it solves your issue?

Stay tuned,

..mark

--
Problem reports:  https://cygwin.com/problems.html
FAQ:  https://cygwin.com/faq/
Documentation:  https://cygwin.com/docs.html
Unsubscribe info: https://cygwin.com/ml/#unsubscribe-simple


Re: getclip and putclip garble unicode characters

2021-06-25 Thread Brian Inglis

On 2021-06-25 12:01, Thomas Wolff wrote:
> On 24.06.2021 at 08:35, Andrey Repin via Cygwin wrote:
>> Greetings, Миронов Леонид Владимирович!
>>
>>> getclip and putclip from cygutils-extra garble unicode characters:
>>> non-latin characters copied to clipboard in windows are replaced with
>>> question marks when retrieved with getclip in cygwin, and non-latin
>>> characters copied to clipboard using putclip are pasted in windows
>>> looking like utf-8 displayed in cp1252 but can be retrieved with getclip
>>> exactly as pasted, so it looks like the problem is not in the way the data
>>> is copied but in the way cygwin and windows communicate text encoding to
>>> each other. LC_CTYPE=en_US.UTF-8, windows ANSI codepage is set to
>>> cp1251 - 1251, not 1252.
>>
>> This looks like you are using a program incapable of dealing with unicode
>> clipboard. To achieve better results, switch your input language/keyboard to
>> matching language before copying text from application. I.e. switch to
>> Russian then copy text, then check what is returned by getclip.
>> But then, why is LC_CTYPE en_US?
>
> getclip and putclip are just broken, they don't even work in a pure
> UTF-8 environment.
> Already noticed 9 years ago...
> https://sourceware.org/legacy-ml/cygwin/2012-03/msg00648.html
> including a script-based replacement.


Just cat [<>] /dev/clipboard: recent Windows changes may have affected 
Windows<->X copy and paste transparency.


--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]



Re: getclip and putclip garble unicode characters

2021-06-25 Thread Thomas Wolff



On 24.06.2021 at 08:35, Andrey Repin via Cygwin wrote:
> Greetings, Миронов Леонид Владимирович!
>
>> getclip and putclip from cygutils-extra garble unicode characters:
>> non-latin characters copied to clipboard in windows are replaced with
>> question marks when retrieved with getclip in cygwin, and non-latin
>> characters copied to clipboard using putclip are pasted in windows
>> looking like utf-8 displayed in cp1252 but can be retrieved with getclip
>> exactly as pasted, so it looks like the problem is not in the way the data
>> is copied but in the way cygwin and windows communicate text encoding to
>> each other. LC_CTYPE=en_US.UTF-8, windows ANSI codepage is set to cp1251 -
>> 1251, not 1252.
>
> This looks like you are using a program incapable of dealing with unicode
> clipboard. To achieve better results, switch your input language/keyboard to
> matching language before copying text from application. I.e. switch to
> Russian then copy text, then check what is returned by getclip.
> But then, why is LC_CTYPE en_US?

getclip and putclip are just broken, they don't even work in a pure
UTF-8 environment.
Already noticed 9 years ago...
https://sourceware.org/legacy-ml/cygwin/2012-03/msg00648.html
including a script-based replacement.
Thomas



RE: getclip and putclip garble unicode characters

2021-06-25 Thread Миронов Леонид Владимирович via Cygwin
As far as copying from cygwin to windows is concerned, it happens in exactly 
the same way in all windows programs I tried pasting data to - word, outlook, 
chrome, console, you name it. Changing windows keyboard language has no effect 
either, windows still stubbornly treats clipboard contents as cp1252 (don't 
quite see how it is supposed to help - data on the clipboard is not limited to 
one single-byte codepage anyway). 

At first I missed that when copying from windows to cygwin getclip actually 
gets data in cp1251 (windows ANSI codepage), thus cyrillic characters can be at 
least recovered with iconv, but non-cyrillic non-latin characters - e.g. greek, 
are replaced with question marks and are lost although in windows everything 
can be pasted back without issues, again regardless of the program and keyboard 
language.

So in a nutshell, when copy-pasting from cygwin putclip to windows unicode is 
treated as cp1252 while copy-pasting from windows to cygwin getclip unicode is 
treated as cp1251.
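
The two symptoms summarized above can be reproduced with plain codec round trips. The following Python sketch only models the reported behaviour and is not taken from the getclip or putclip sources; the function names are hypothetical, and cp1251 is assumed as the ANSI codepage because that is the setting given in the report.

```python
def putclip_as_seen_by_windows(text: str) -> str:
    """Model of the putclip symptom: UTF-8 bytes get interpreted as cp1252."""
    return text.encode("utf-8").decode("cp1252", errors="replace")


def getclip_from_windows(text: str, ansi_codepage: str = "cp1251") -> bytes:
    """Model of the getclip symptom: text arrives in the ANSI codepage.

    Characters that the codepage cannot represent become '?'.
    """
    return text.encode(ansi_codepage, errors="replace")


def recover(raw: bytes, ansi_codepage: str = "cp1251") -> str:
    """Roughly what 'iconv -f cp1251 -t utf-8' does with getclip output."""
    return raw.decode(ansi_codepage)


if __name__ == "__main__":
    # Cyrillic pasted from putclip shows up as cp1252 mojibake.
    print(putclip_as_seen_by_windows("Привет"))
    # Cyrillic survives the cp1251 round trip and can be recovered.
    print(recover(getclip_from_windows("Привет")))
    # Greek is not in cp1251, so it is already lost before recovery.
    print(recover(getclip_from_windows("αβγ")))
```

In this model the Cyrillic text is recoverable because cp1251 contains it, while the Greek text is replaced before any conversion can be attempted, which matches the observation that iconv helps only for Cyrillic.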

Sorry for top-posting.

-----Original Message-----
From: Andrey Repin  
Sent: Thursday, June 24, 2021 9:36 AM
To: Миронов Леонид Владимирович ; cygwin@cygwin.com
Subject: Re: getclip and putclip garble unicode characters

Greetings, Миронов Леонид Владимирович!

> getclip and putclip from cygutils-extra garble unicode characters:
> non-latin characters copied to clipboard in windows are replaced with 
> question marks when retrieved with getclip in cygwin, and non-latin 
> characters copied to clipboard using putclip are pasted in windows
> looking like utf-8 displayed in cp1252 but can be retrieved with 
> getclip exactly as pasted, so it looks like the problem is not in the 
> way the data is copied but in the way cygwin and windows communicate 
> text encoding to each other. LC_CTYPE=en_US.UTF-8, windows ANSI codepage is 
> set to cp1251 - 1251, not 1252.

This looks like you are using a program incapable of dealing with unicode 
clipboard. To achieve better results, switch your input language/keyboard to 
matching language before copying text from application. I.e. switch to Russian 
then copy text, then check what is returned by getclip.
But then, why is LC_CTYPE en_US?


--
With best regards,
Andrey Repin
Thursday, June 24, 2021 9:33:54

Sorry for my terrible english...



Re: getclip and putclip garble unicode characters

2021-06-24 Thread Andrey Repin via Cygwin
Greetings, Миронов Леонид Владимирович!

> getclip and putclip from cygutils-extra garble unicode characters:
> non-latin characters copied to clipboard in windows are replaced with
> question marks when retrieved with getclip in cygwin, and non-latin
> characters copied to clipboard using putclip are pasted in windows
> looking like utf-8 displayed in cp1252 but can be retrieved with getclip
> exactly as pasted, so it looks like the problem is not in the way the data
> is copied but in the way cygwin and windows communicate text encoding to
> each other. LC_CTYPE=en_US.UTF-8, windows ANSI codepage is set to cp1251 - 
> 1251, not 1252.

This looks like you are using a program incapable of dealing with unicode
clipboard. To achieve better results, switch your input language/keyboard to
matching language before copying text from application. I.e. switch to
Russian then copy text, then check what is returned by getclip.
But then, why is LC_CTYPE en_US?


-- 
With best regards,
Andrey Repin
Thursday, June 24, 2021 9:33:54

Sorry for my terrible english...



Re: getclip and putclip garble unicode characters

2021-06-23 Thread Mark Geisert

Hi Leonid (?),

Миронов Леонид Владимирович via Cygwin wrote:

> getclip and putclip from cygutils-extra garble unicode characters: non-latin
> characters copied to clipboard in windows are replaced with question marks when
> retrieved with getclip in cygwin, and non-latin characters copied to clipboard
> using putclip are pasted in windows looking like utf-8 displayed in cp1252
> but can be retrieved with getclip exactly as pasted, so it looks like the
> problem is not in the way the data is copied but in the way cygwin and windows
> communicate text encoding to each other. LC_CTYPE=en_US.UTF-8, windows ANSI
> codepage is set to cp1251 - 1251, not 1252.


Thanks for the report.  I will investigate.

..mark



getclip and putclip garble unicode characters

2021-06-23 Thread Миронов Леонид Владимирович via Cygwin
getclip and putclip from cygutils-extra garble unicode characters: non-latin 
characters copied to clipboard in windows are replaced with question marks when 
retrieved with getclip in cygwin, and non-latin characters copied to clipboard 
using putclip are pasted in windows looking like utf-8 displayed in cp1252
but can be retrieved with getclip exactly as pasted, so it looks like the 
problem is not in the way the data is copied but in the way cygwin and windows 
communicate text encoding to each other. LC_CTYPE=en_US.UTF-8, windows ANSI 
codepage is set to cp1251 - 1251, not 1252.

