Nick Coghlan added the comment: The purpose of these changes is to provide tools specifically for working with surrogate-escaped data, not for working with arbitrary lone Unicode surrogates.
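To make the distinction concrete, here is a small illustration using only the built-in surrogateescape error handler. Undecodable bytes 0x80..0xFF come back as code points U+DC80..U+DCFF, and encoding with the same handler reproduces the original bytes. A lone surrogate outside that range (U+D800 here) is not escaped data, so the handler refuses to encode it. The `is_escaped_surrogate` helper is a hypothetical name used only for this sketch.

```python
def is_escaped_surrogate(ch: str) -> bool:
    # Hypothetical helper: True only for the code points that the
    # surrogateescape handler produces (U+DC80..U+DCFF).
    return "\udc80" <= ch <= "\udcff"


raw = b"caf\xc3\xa9 \xff\xfe"  # valid UTF-8 for "caf\u00e9 ", then two undecodable bytes
text = raw.decode("utf-8", errors="surrogateescape")
print(ascii(text))  # 'caf\xe9 \udcff\udcfe'

# Encoding with the same handler reproduces the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")
assert restored == raw

# An arbitrary lone surrogate outside U+DC80..U+DCFF is not escaped data,
# so surrogateescape cannot turn it back into a byte.
lone = "\ud800"
assert not is_escaped_surrogate(lone)
try:
    lone.encode("utf-8", errors="surrogateescape")
except UnicodeEncodeError:
    print("cannot encode", ascii(lone))
```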
"escaped_surrogates" is not defined by the Unicode spec; it's defined by the behaviour of the surrogateescape error handler, which lets us tunnel arbitrary bytes through str objects and reproduce them faithfully at the far end. On reflection, I think codecs would be a better home than string (as that's where the error handler is defined), but it doesn't belong in unicodedata.

I'd be OK with changing the name of the clean function to "clean_escaped_surrogates".

Needing redecode is not a bug: it's baked into the WSGI spec in PEP 3333. I would be OK with providing it in wsgiref rather than the codecs or string modules, but I think we should provide it somewhere.

Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18814>
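For illustration only, the two helpers discussed in this comment might look roughly like the sketch below. The names, signatures and defaults are assumptions for this example, not the API proposed on the issue. `redecode` assumes the WSGI case from PEP 3333, where environ values are native strings decoded as latin-1, so re-encoding as latin-1 recovers the original bytes before decoding them with the real encoding. `clean_escaped_surrogates` assumes "clean" means replacing U+DC80..U+DCFF with U+FFFD.

```python
def clean_escaped_surrogates(s: str, replacement: str = "\ufffd") -> str:
    # Hypothetical sketch: replace each surrogate-escaped code point
    # (U+DC80..U+DCFF) with `replacement`; all other characters pass through.
    return "".join(replacement if "\udc80" <= ch <= "\udcff" else ch for ch in s)


def redecode(s: str, encoding: str, errors: str = "strict",
             source_encoding: str = "latin-1") -> str:
    # Hypothetical sketch: undo the server's decode (PEP 3333 environ values
    # are latin-1 native strings), then decode the recovered bytes properly.
    return s.encode(source_encoding).decode(encoding, errors)


# A server decoded the UTF-8 bytes b"/caf\xc3\xa9" as latin-1.
path_info = "/caf\u00c3\u00a9"
print(ascii(redecode(path_info, "utf-8")))  # '/caf\xe9'

# Invalid UTF-8 survives as escaped surrogates, which can then be cleaned.
bad = redecode("/\u00ff", "utf-8", errors="surrogateescape")
print(ascii(bad))  # '/\udcff'
print(ascii(clean_escaped_surrogates(bad)))  # '/\ufffd'
```

Passing errors="surrogateescape" to `redecode` keeps the byte-tunnelling guarantee intact, and cleaning is a separate, explicit step for when the text is about to leave the program as display text.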