Nick Coghlan added the comment:

The error handler is called "surrogateescape". That means 
"convert_surrogateescape" is always only a single step away from thinking "I 
want to remove the smuggled bytes from a surrogateescape'd string", without 
needing to assume any knowledge on the part of the user other than the name of 
the error handler and the fact that it is used to smuggle arbitrary bytes 
through the Python 3 str type.

Getting from "this string was decoded with the surrogateescape handler and may 
contain smuggled bytes" to "filter_non_utf8_data" as the relevant cleanup 
function is a much bigger leap that requires more assumed knowledge on the part 
of the user, and also one that confuses the conceptual purpose of the function 
(cleaning up the output of the surrogateescape error handler to ensure it is a 
pure Unicode string) with the internal details of the proposed approach to 
implementing that cleanup operation (encoding to UTF-8 with surrogateescape, 
and then decoding again with a different error handler).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18814>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to