[issue13333] utf-7 inconsistent with surrogates

Martin v. Löwis Thu, 03 Nov 2011 10:28:57 -0700

Martin v. Löwis <[email protected]> added the comment:

RFC 2152 talks about encoding 16-bit Unicode, and clarifies:

 Surrogate pairs (UTF-16) are converted by treating each half 
 of the pair as a separate 16 bit quantity (i.e., no special
 treatment).

So lone surrogates clearly should be supported.

This text could be interpreted as saying that decoding surrogate pairs should 
also keep them (rather than combining them). However, the RFC also assumes that 
the decoded form will use 16-bit code units; for Python, I think we should 
continue combining surrogate pairs on decoding UTF-7 when we find them.
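
To make the distinction concrete, here is a rough sketch of both directions.
The helpers to_surrogates and utf7_shifted are hypothetical names written for
this illustration only; they are not part of the codec. They build the shifted
UTF-7 form by hand, treating each 16-bit unit separately as the RFC describes.
The decoding side uses Python's built-in "utf-7" codec, and whether the lone
surrogate decodes depends on the Python version in use, so that case is
guarded.

```python
import base64

def to_surrogates(cp):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    cp -= 0x10000
    return 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)

def utf7_shifted(units):
    """Encode 16-bit units as one shifted block: '+', unpadded base64, '-'."""
    raw = b"".join(u.to_bytes(2, "big") for u in units)
    return b"+" + base64.b64encode(raw).rstrip(b"=") + b"-"

high, low = to_surrogates(0x10000)
pair_bytes = utf7_shifted([high, low])   # both halves, no special treatment
lone_bytes = utf7_shifted([high])        # a lone high surrogate

# Decoding the pair with the built-in codec yields one combined code point.
combined = pair_bytes.decode("utf-7")

# Whether a lone surrogate is accepted depends on the Python version.
try:
    lone = lone_bytes.decode("utf-7")
except UnicodeDecodeError:
    lone = None
```

With combining on decode, the pair comes back as a single character of length
1 rather than as two separate surrogate code points.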

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue13333>
_______________________________________