Terry J. Reedy added the comment:

The core of the patch is a wrapper that traps UnicodeDecodeErrors, corrects the strings, and re-decodes. A Python version might look like:

    def unicodeFromTclStringAndSize(s, size):
        try:
            return <PyUnicode_DecodeUTF8(s, size, NULL)>
        except UnicodeDecodeError:
            if b'\xc0\x80' in s:
                # bytes.replace returns a new object, so rebind s;
                # the corrected string is shorter, so use its new length
                s = s.replace(b'\xc0\x80', b'\x00')
                return <PyUnicode_DecodeUTF8(s, len(s), NULL)>
            else:
                raise

This is used in a couple of additional wrappers, and all direct decode calls are replaced with wrappers. New tests are added.

Overall, a great idea, and I want to see this patch in 3.4. But how many of the replacement sites are exercised by the tests?

There are a few changes that seem unrelated to nulls, which might have been left for another patch. Example:

-#if TCL_UTF_MAX==3
         return PyUnicode_FromKindAndData(
-            PyUnicode_2BYTE_KIND, Tcl_GetUnicode(value),
+            sizeof(Tcl_UniChar), Tcl_GetUnicode(value),
             Tcl_GetCharLength(value));
-#else
-        return PyUnicode_FromKindAndData(
-            PyUnicode_4BYTE_KIND, Tcl_GetUnicode(value),
-            Tcl_GetCharLength(value));
-#endif

Do you know if this code block is tested?

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20368>
_______________________________________
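
For reference, the trap-correct-redecode idea described in the comment can be written as runnable pure Python. This is only a sketch of the technique, not the patch itself: the name decode_tcl_utf8 is hypothetical, and bytes.decode('utf-8') stands in for the C-level PyUnicode_DecodeUTF8 call.

```python
def decode_tcl_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes, tolerating the two-byte form b'\\xc0\\x80' for NUL.

    Hypothetical helper for illustration; it is not part of _tkinter.
    """
    try:
        # Fast path: strict UTF-8 decodes without any correction.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        if b"\xc0\x80" not in data:
            # The failure has some other cause, so let it propagate.
            raise
        # bytes.replace returns a new object; keep the corrected copy.
        fixed = data.replace(b"\xc0\x80", b"\x00")
        # A second failure here means the data is invalid for another reason.
        return fixed.decode("utf-8")
```

Because the correction only runs after a decode failure, well-formed input pays no extra cost, and input that is invalid for any other reason still raises UnicodeDecodeError.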