[issue22999] Copying emoji to Windows clipboard corrupts string in Python 3.3 and up

2014-12-05 Thread Cees Timmerman

New submission from Cees Timmerman:

# http://stackoverflow.com/a/25678113/819417
def copy(data):
    if not isinstance(data, unicode):
        data = data.decode('mbcs')
    OpenClipboard(None)
    EmptyClipboard()
    hCd = GlobalAlloc(GMEM_DDESHARE, 2 * (len(data) + 1))
    pchData = GlobalLock(hCd)
    wcscpy(ctypes.c_wchar_p(pchData), data)
    GlobalUnlock(hCd)
    SetClipboardData(CF_UNICODETEXT, hCd)
    CloseClipboard()

Emoji  (\U0001f400) is copied as  (\U0001f4cb), or . turns to  
(note the period).

It works fine in Python 3.2.5.

--
components: Unicode
messages: 232188
nosy: Cees.Timmerman, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: Copying emoji to Windows clipboard corrupts string in Python 3.3 and up
type: behavior
versions: Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22999
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22999] Copying emoji to Windows clipboard corrupts string in Python 3.3 and up

2014-12-05 Thread Cees Timmerman

Cees Timmerman added the comment:

A copy of my test program is at
https://gist.github.com/CTimmerman/133cb80100357dde92d8

--
Added file: http://bugs.python.org/file37366/test_clipboard_win.py




[issue22999] Copying emoji to Windows clipboard corrupts string in Python 3.3 and up

2014-12-05 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

(you swapped the unicode values: \U0001f4cb is copied as \U0001f400)

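On Windows, the string representation changed in 3.3 (PEP 393). See
https://docs.python.org/3/whatsnew/3.3.html: len() now always returns 1 for a
non-BMP character. Before 3.3, Windows builds stored strings as UTF-16, so
len() returned 2 for such a character and 2 * (len(data) + 1) happened to be
large enough.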

The call to GlobalAlloc should be sized from the number of wchar_t units
rather than from len(data), something like
len(data.encode('utf-16')) + 2
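
To make the size mismatch concrete, here is a minimal sketch, assuming Python 3.3 or later. The helper `utf16_units` is a made-up name for this illustration, not something from the report. It uses the utf-16-le codec only so that no byte order mark is counted.

```python
def utf16_units(text):
    """Count the 16-bit code units needed to store text as UTF-16
    (no byte order mark, no terminator)."""
    return len(text.encode('utf-16-le')) // 2

data = '\U0001f4cb.'          # one non-BMP character followed by a period

print(len(data))              # 2: code points, which is what len() counts in 3.3+
print(utf16_units(data))      # 3: wchar_t units on Windows (surrogate pair + period)

# Bytes reserved by the original GlobalAlloc call versus the bytes that
# wcscpy writes, including the terminating NUL
allocated = 2 * (len(data) + 1)
needed = 2 * (utf16_units(data) + 1)
print(allocated, needed)      # 6 8
```

With these numbers the copy writes two bytes past the end of the allocated block, which is consistent with the truncated or corrupted text described in the report.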

--
nosy: +amaury.forgeotdarc
resolution:  -> not a bug
status: open -> closed




[issue22999] Copying emoji to Windows clipboard corrupts string in Python 3.3 and up

2014-12-05 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

Better to use the utf-16-le encoding:
  len(data.encode('utf-16-le')) + 2
otherwise the encoded bytes start with the BOM (b'\xff\xfe').
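
The difference between the two codecs, and one way the sizing could be applied to the reported function, might look like the sketch below. Several things here are assumptions rather than part of the thread: the helper name `clipboard_buffer_size` is hypothetical, the constants are written out with their Win32 values, `ctypes.memmove` replaces `wcscpy`, and `GMEM_MOVEABLE` replaces `GMEM_DDESHARE`. The `copy` function only works on Windows and is a sketch under these assumptions, not a verified fix.

```python
import codecs
import ctypes

text = '\U0001f4cb'
with_bom = text.encode('utf-16')
without_bom = text.encode('utf-16-le')

# The plain 'utf-16' codec prepends a byte order mark in native byte order
print(with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True
print(len(with_bom), len(without_bom))                             # 6 4

CF_UNICODETEXT = 13      # Win32 clipboard format for UTF-16 text
GMEM_MOVEABLE = 0x0002   # assumption: used here instead of GMEM_DDESHARE

def clipboard_buffer_size(data):
    """Bytes to request: UTF-16 payload plus a 2-byte NUL terminator."""
    return len(data.encode('utf-16-le')) + 2

def copy(data):
    """Put a str on the Windows clipboard (Windows only, Python 3)."""
    user32 = ctypes.windll.user32
    kernel32 = ctypes.windll.kernel32
    # Declare pointer-sized types so handles are not truncated on 64-bit
    kernel32.GlobalAlloc.restype = ctypes.c_void_p
    kernel32.GlobalAlloc.argtypes = [ctypes.c_uint, ctypes.c_size_t]
    kernel32.GlobalLock.restype = ctypes.c_void_p
    kernel32.GlobalLock.argtypes = [ctypes.c_void_p]
    kernel32.GlobalUnlock.argtypes = [ctypes.c_void_p]
    user32.OpenClipboard.argtypes = [ctypes.c_void_p]
    user32.SetClipboardData.restype = ctypes.c_void_p
    user32.SetClipboardData.argtypes = [ctypes.c_uint, ctypes.c_void_p]

    encoded = data.encode('utf-16-le') + b'\x00\x00'
    if not user32.OpenClipboard(None):
        raise OSError('OpenClipboard failed')
    try:
        user32.EmptyClipboard()
        handle = kernel32.GlobalAlloc(GMEM_MOVEABLE, clipboard_buffer_size(data))
        if not handle:
            raise MemoryError('GlobalAlloc failed')
        pointer = kernel32.GlobalLock(handle)
        # Copy the already-encoded bytes, terminator included
        ctypes.memmove(pointer, encoded, len(encoded))
        kernel32.GlobalUnlock(handle)
        # On success the system takes ownership of the memory block
        user32.SetClipboardData(CF_UNICODETEXT, handle)
    finally:
        user32.CloseClipboard()

print(clipboard_buffer_size(text))   # 6
```

Copying pre-encoded bytes keeps the allocation size and the number of bytes written tied to the same value, so the two cannot drift apart the way len(data) and the wcscpy output did.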
