[issue16585] surrogateescape broken w/ multibytecodecs' encode
Walter Dörwald added the comment:

And returning bytes is documented in PEP 383, as an extension to the PEP 293 machinery:

> To convert non-decodable bytes, a new error handler ([2]) surrogateescape is introduced, which produces these surrogates. On encoding, the error handler converts the surrogate back to the corresponding byte. This error handler will be used in any API that receives or produces file names, command line arguments, or environment variables.
>
> The error handler interface is extended to allow the encode error handler to return byte strings immediately, in addition to returning Unicode strings which then get encoded again (also see the discussion below).

--
nosy: +doerwalter

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16585
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
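The round trip described in the quoted PEP 383 passage can be sketched as follows. This is only a minimal illustration using the built-in utf-8 codec; the sample byte string is made up for the example.

```python
# 0x80 is not valid as a standalone byte in UTF-8, so a strict decode would fail.
raw = b"file\x80name"

# On decoding, surrogateescape maps each undecodable byte 0xNN
# to the lone surrogate U+DCNN instead of raising an error.
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))

# On encoding, the same handler converts the surrogate back to the original byte.
restored = text.encode("utf-8", "surrogateescape")
print(restored == raw)
```

The encode step is the one where the handler hands bytes, not a Unicode string, back to the codec.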
[issue16585] surrogateescape broken w/ multibytecodecs' encode
Changes by Antoine Pitrou <pit...@free.fr>:

--
assignee: docs@python ->
components: +Library (Lib) -Documentation, Interpreter Core, Unicode
[issue16585] surrogateescape broken w/ multibytecodecs' encode
Roundup Robot added the comment:

New changeset 5c88c72dec60 by Benjamin Peterson in branch '3.3':
support encoding error handlers that return bytes (closes #16585)
http://hg.python.org/cpython/rev/5c88c72dec60

New changeset 2181c37977d3 by Benjamin Peterson in branch 'default':
merge 3.3 (#16585)
http://hg.python.org/cpython/rev/2181c37977d3

--
nosy: +python-dev
resolution: -> fixed
stage: needs patch -> committed/rejected
status: open -> closed
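What the first changeset enables can be sketched with a user-defined handler that returns bytes to a multibyte codec. The handler, its registered name "demo.qmark_bytes", and the choice of shift_jis are assumptions made for this illustration, not part of the changeset; interpreters without the fix are expected to raise the TypeError from the report instead.

```python
import codecs

def question_mark_bytes(exc):
    """Hypothetical handler: replace each unencodable character with b'?'."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    replacement = b"?" * (exc.end - exc.start)
    # Returning bytes (not str) asks the codec to copy them to the output as-is.
    return replacement, exc.end

# "demo.qmark_bytes" is a made-up name for this example.
codecs.register_error("demo.qmark_bytes", question_mark_bytes)

# A lone surrogate cannot be encoded by shift_jis, so the handler is invoked.
encoded = "abc\udc80def".encode("shift_jis", "demo.qmark_bytes")
print(encoded)
```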
[issue16585] surrogateescape broken w/ multibytecodecs' encode
Roundup Robot added the comment:

New changeset 777aabdff35a by Benjamin Peterson in branch '3.3':
document that encoding error handlers may return bytes (#16585)
http://hg.python.org/cpython/rev/777aabdff35a
[issue16585] surrogateescape broken w/ multibytecodecs' encode
New submission from Philip Jenvey:

surrogateescape claims to be implemented by all standard Python codecs:

http://docs.python.org/3/library/codecs.html#codec-base-classes

However it fails w/ multibytecodecs on encode:

Python 3.2.3+ (3.2:eb999002916c, Oct 26 2012, 16:11:03)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> '\u30fb'.encode('gb18030')
b'\x819\xa79'
>>> '\u30fb\udc80'.encode('gb18030', 'surrogateescape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: encoding error handler must return (unicode, int) tuple

The problem is that multibytecodec.c forces error handler return results to always be unicode, and surrogateescape returns bytes here. (surrogatepass similarly returns bytes, but it claims to be utf-8 only.)

The error handler spec seems to imply that error handlers should always return unicode, because "The encoder will encode the replacement":

http://docs.python.org/3/library/codecs.html#codecs.register_error

but obviously that's not really the case: some codecs special-case bytes results and copy them directly to the output, e.g.:

http://hg.python.org/cpython/file/ce3f0399ea33/Objects/unicodeobject.c#l6305

--
components: Interpreter Core
messages: 176711
nosy: pjenvey
priority: normal
severity: normal
status: open
title: surrogateescape broken w/ multibytecodecs' encode
versions: Python 3.2, Python 3.3
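The failing call from the report can be wrapped in a small script to check a given interpreter. This is a sketch, and the helper name try_report_call is hypothetical. It assumes that an interpreter with the fix returns the encoded bytes, and that one without it raises the TypeError shown above.

```python
def try_report_call():
    """Re-run the failing call from the report and describe the outcome."""
    text = "\u30fb\udc80"
    try:
        return text.encode("gb18030", "surrogateescape")
    except TypeError as exc:
        # Interpreters without the fix reject the bytes returned by the handler.
        return "TypeError: {}".format(exc)

result = try_report_call()
print(result)
```

When the call succeeds, the output should be the gb18030 encoding of U+30FB followed by the escaped byte 0x80.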
[issue16585] surrogateescape broken w/ multibytecodecs' encode
Changes by Serhiy Storchaka <storch...@gmail.com>:

--
components: +Unicode
nosy: +benjamin.peterson, ezio.melotti, haypo, lemburg, pitrou, serhiy.storchaka
type: -> behavior
versions: +Python 3.4
[issue16585] surrogateescape broken w/ multibytecodecs' encode
Benjamin Peterson added the comment:

Codecs should be fixed to accept bytes from the error handler and the definition in the docs loosened. Returning bytes seems to be useful.
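The difference between the two return types can be sketched with the ascii codec, one of the codecs that already special-cased bytes results. Both handlers and their registered names are made up for this illustration.

```python
import codecs

def as_text(exc):
    # str replacement: the codec encodes it again before writing it out.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    return "?", exc.end

def as_bytes(exc):
    # bytes replacement: copied to the output verbatim, even though
    # 0xFF is not something the ascii codec could produce itself.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    return b"\xff", exc.end

# Hypothetical handler names, registered only for this example.
codecs.register_error("demo.as_text", as_text)
codecs.register_error("demo.as_bytes", as_bytes)

from_text = "a\u30fbb".encode("ascii", "demo.as_text")
from_bytes = "a\u30fbb".encode("ascii", "demo.as_bytes")
print(from_text, from_bytes)
```

A bytes result lets a handler emit output that the codec could not have encoded from any replacement string, which is what surrogateescape relies on.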
[issue16585] surrogateescape broken w/ multibytecodecs' encode
Changes by Serhiy Storchaka <storch...@gmail.com>:

--
assignee: -> docs@python
components: +Documentation
nosy: +docs@python
stage: -> needs patch