[issue14304] Implement utf-8-bmp codec

Serhiy Storchaka Mon, 16 Apr 2012 07:59:34 -0700

Serhiy Storchaka <[email protected]> added the comment:

Example:


>>> '\u0100'
'Ā'
>>> '\u0100\U00010000'
'\u0100\U00010000'
>>> print('\u0100')
Ā
>>> print('\u0100\U00010000')
Traceback (most recent call last):
  File "<pyshell#33>", line 1, in <module>
    print('\u0100\U00010000')
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 1-1: Non-BMP character not supported in Tk

But I think this is too specific a problem, and a dedicated codec is too
specific a solution. It would be better if IDLE itself escaped the string in
the most appropriate way.

def utf8bmp_encode(s):
    return ''.join(c if ord(c) <= 0xffff else '\\U%08x' % ord(c)
                   for c in s).encode('utf-8')

or

import re

def utf8bmp_encode(s):
    return re.sub('[^\x00-\uffff]', lambda m: '\\U%08x' % ord(m.group()),
                  s).encode('utf-8')
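
The escaping step can also be kept separate from the UTF-8 encoding, so
that the display code works on str rather than bytes. The following is only
a rough sketch of that idea: the name escape_non_bmp is hypothetical and is
not part of IDLE or the standard library, and it assumes a Python build
where each non-BMP character is a single code point.

```python
import re

# Hypothetical helper (not an IDLE or stdlib API): matches any single
# character outside the Basic Multilingual Plane (code point > 0xFFFF).
_NON_BMP = re.compile(r'[^\x00-\uffff]')

def escape_non_bmp(s):
    # Replace each non-BMP character with its \UXXXXXXXX escape and
    # leave every BMP character untouched.
    return _NON_BMP.sub(lambda m: '\\U%08x' % ord(m.group()), s)

# The result contains only BMP characters, so it can be passed to a
# display layer that cannot handle non-BMP characters.
print(escape_non_bmp('a\U00010000'))
```

Calling .encode('utf-8') on the result gives the same bytes as the
utf8bmp_encode variants above.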

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue14304>
_______________________________________