Nick Coghlan added the comment:

The redecode thing is a distraction from my core concern here, so I've split that out to issue #22264, a separate RFE for a "wsgiref.fix_encoding" function.

For this issue, my main concern is the function to *clean* a string of escaped binary data, so it can be displayed easily, or otherwise purged of the escaped characters. Preserving the data by default is good, but you have to know a *lot* about how Python 3 works in order to be able to figure out how to clean it out. For that, not knowing Unicode in general isn't the problem: it's not knowing PEP 383.

If we forget the idea of exposing the constant with the escaped values (I agree that's not very useful), it suggests "codecs.clean_surrogate_escapes" as a possible name:

    import re

    # Helper to ensure a string contains no escaped surrogates
    # This allows it to be safely encoded without surrogateescape
    _extended_ascii = bytes(range(128, 256))
    _escaped_surrogates = _extended_ascii.decode('ascii', errors='surrogateescape')
    _match_escaped = re.compile('[{}]'.format(_escaped_surrogates))

    def clean_surrogate_escapes(s, repl='\ufffd'):
        return _match_escaped.sub(repl, s)

A more efficient implementation in C would also be fine; this is just an easy way to define the exact semantics.

(I also just noticed that, unlike other error handlers, surrogateescape and surrogatepass do not have corresponding codecs.surrogateescape_errors and codecs.surrogatepass_errors functions.)

----------
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18814>
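
To make the intended semantics concrete, here is a minimal usage sketch of the proposed helper. It is only an illustration: "codecs.clean_surrogate_escapes" does not exist in the standard library, so the function is defined locally, and the character class is written as the explicit range U+DC80..U+DCFF (which is my shorthand for the same set of escaped surrogates that PEP 383 produces for bytes 128..255). The file name used as input is made up.

```python
import re

# Assumption: an explicit range standing in for the 128 escaped surrogates
# that the surrogateescape handler produces for bytes 0x80..0xFF.
_match_escaped = re.compile('[\udc80-\udcff]')

def clean_surrogate_escapes(s, repl='\ufffd'):
    """Replace every escaped surrogate in s with repl."""
    return _match_escaped.sub(repl, s)

# Hypothetical input: a Latin-1 encoded name decoded as ASCII.
raw = 'caf\xe9.txt'.encode('latin-1')
text = raw.decode('ascii', errors='surrogateescape')

# The undecodable byte is preserved and round-trips with surrogateescape...
assert text.encode('ascii', errors='surrogateescape') == raw

# ...but a strict encode of the same string fails.
try:
    text.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# After cleaning, the string can be encoded without surrogateescape.
cleaned = clean_surrogate_escapes(text)
cleaned_bytes = cleaned.encode('utf-8')
```

Note that cleaning is lossy by design: once the escaped byte is replaced, the original data can no longer be recovered, which is why preserving it remains the default.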