[issue18814] Add tools for "cleaning" surrogate escaped strings

Nick Coghlan Sun, 24 Aug 2014 01:24:50 -0700

Nick Coghlan added the comment:

The purpose of these changes is to provide tools specifically for working with
surrogate-escaped data, not for working with arbitrary lone Unicode surrogates.

"escaped_surrogates" is not defined by the Unicode spec; it's defined by the
behaviour of the surrogateescape error handler, which lets us tunnel arbitrary
bytes through str objects and reproduce them faithfully at the far end. On
reflection, I think codecs would be a better home than string (since that's
where the error handler is defined), but it doesn't belong in unicodedata.
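For illustration, here is the tunnelling behaviour in a minimal sketch using only the standard codecs machinery. Undecodable bytes 0x80-0xFF become the lone surrogates U+DC80-U+DCFF on decode, and the same handler turns them back into the original bytes on encode. That fixed range is the set the name "escaped_surrogates" refers to.

```python
# Arbitrary bytes that are not valid UTF-8 (0xff and 0xfe never occur in UTF-8).
raw = b"caf\xc3\xa9 \xff\xfe"

# Decoding with surrogateescape maps each undecodable byte 0xXY to the lone
# surrogate U+DCXY instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")
assert text == "caf\u00e9 \udcff\udcfe"

# Encoding with the same handler turns those surrogates back into the
# original bytes, so the data survives the trip through str unchanged.
assert text.encode("utf-8", errors="surrogateescape") == raw
```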

I'd be OK with changing the name of the clean function to 
"clean_escaped_surrogates".
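As a rough sketch of the intended scope, assuming (hypothetically) that clean_escaped_surrogates replaces each escaped surrogate with a caller-supplied string, it could look like the following. The constant name, the signature and the U+FFFD default are illustrative assumptions, not the patch under review. Note that it only touches U+DC80-U+DCFF, leaving other lone surrogates alone.

```python
# Hypothetical constant: every code point surrogateescape can produce when
# decoding (bytes 0x80-0xFF map to U+DC80-U+DCFF; bytes below 0x80 are
# never escaped).
ESCAPED_SURROGATES = "".join(map(chr, range(0xDC80, 0xDD00)))


def clean_escaped_surrogates(s: str, replacement: str = "\ufffd") -> str:
    """Hypothetical: replace surrogate-escaped bytes in `s` with `replacement`.

    Lone surrogates outside U+DC80-U+DCFF (e.g. U+D800) did not come from
    surrogateescape, so they are deliberately left untouched.
    """
    table = dict.fromkeys(map(ord, ESCAPED_SURROGATES), replacement)
    return s.translate(table)


dirty = b"abc\xff".decode("utf-8", "surrogateescape")  # 'abc\udcff'
clean = clean_escaped_surrogates(dirty)
assert clean == "abc\ufffd"
clean.encode("utf-8")  # strict encoding no longer raises UnicodeEncodeError
assert clean_escaped_surrogates("\ud800") == "\ud800"  # not an escaped byte
assert clean_escaped_surrogates(dirty, "") == "abc"    # or drop them entirely
```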

Needing redecode is not a bug: it's baked into the WSGI spec in PEP 3333. I 
would be OK with providing it in wsgiref rather than the codecs or string 
modules, but I think we should provide it somewhere.
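For context: under PEP 3333, environ values on Python 3 are native str objects holding the request bytes decoded as ISO-8859-1. An application that wants UTF-8 text therefore has to encode back to bytes and decode again. The redecode helper below is a hypothetical name and signature used only to sketch that operation. The encode step uses surrogateescape so that a string which already carries escaped bytes (for example, one read from os.environ) also round-trips.

```python
def redecode(s: str, from_encoding: str, to_encoding: str,
             errors: str = "strict") -> str:
    """Hypothetical helper: reinterpret the bytes behind `s`.

    `s` is assumed to come from decoding some bytes with `from_encoding`;
    those bytes are recovered and decoded again with `to_encoding`.
    """
    raw = s.encode(from_encoding, "surrogateescape")
    return raw.decode(to_encoding, errors)


# PATH_INFO as a PEP 3333 server presents the UTF-8 path b"/caf\xc3\xa9":
# one str character per byte, decoded as ISO-8859-1.
path_info = b"/caf\xc3\xa9".decode("latin-1")
assert redecode(path_info, "latin-1", "utf-8") == "/caf\u00e9"

# Invalid UTF-8 can be preserved losslessly instead of raising.
assert redecode("/\xff", "latin-1", "utf-8", "surrogateescape") == "/\udcff"
```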

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18814>
_______________________________________
_______________________________________________
Python-bugs-list mailing list