[issue16585] surrogateescape broken w/ multibytecodecs' encode

2012-12-02 Thread Walter Dörwald

Walter Dörwald added the comment:

And returning bytes is documented in PEP 383, as an extension to the PEP 293 
machinery:

To convert non-decodable bytes, a new error handler ([2]) "surrogateescape" 
is introduced, which produces these surrogates. On encoding, the error handler 
converts the surrogate back to the corresponding byte. This error handler will 
be used in any API that receives or produces file names, command line 
arguments, or environment variables.

The error handler interface is extended to allow the encode error handler to 
return byte strings immediately, in addition to returning Unicode strings which 
then get encoded again (also see the discussion below).
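
As a rough illustration of the extended interface quoted above, the sketch below registers an encode error handler that returns bytes instead of a string. The handler name "example-qmark-bytes" and the function qmark_bytes are made up for this example, and it assumes an interpreter whose ascii codec accepts bytes from an error handler:

```python
import codecs

def qmark_bytes(exc):
    """Hypothetical handler: replace each unencodable character with b'?'."""
    if isinstance(exc, UnicodeEncodeError):
        # Returning bytes asks the codec to copy them to the output directly,
        # rather than encoding a replacement string a second time.
        replacement = b"?" * (exc.end - exc.start)
        return (replacement, exc.end)
    raise exc

# The name is arbitrary; it only has to match the errors= argument below.
codecs.register_error("example-qmark-bytes", qmark_bytes)

encoded = "a\u20acb".encode("ascii", "example-qmark-bytes")
print(encoded)  # b'a?b'
```

The second tuple element is the position at which the encoder resumes, so returning exc.end skips exactly the characters that could not be encoded.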

--
nosy: +doerwalter

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16585
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16585] surrogateescape broken w/ multibytecodecs' encode

2012-12-02 Thread Antoine Pitrou

Changes by Antoine Pitrou <pit...@free.fr>:


--
assignee: docs@python -> 
components: +Library (Lib) -Documentation, Interpreter Core, Unicode




[issue16585] surrogateescape broken w/ multibytecodecs' encode

2012-12-02 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 5c88c72dec60 by Benjamin Peterson in branch '3.3':
support encoding error handlers that return bytes (closes #16585)
http://hg.python.org/cpython/rev/5c88c72dec60

New changeset 2181c37977d3 by Benjamin Peterson in branch 'default':
merge 3.3 (#16585)
http://hg.python.org/cpython/rev/2181c37977d3

--
nosy: +python-dev
resolution:  -> fixed
stage: needs patch -> committed/rejected
status: open -> closed




[issue16585] surrogateescape broken w/ multibytecodecs' encode

2012-12-02 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 777aabdff35a by Benjamin Peterson in branch '3.3':
document that encoding error handlers may return bytes (#16585)
http://hg.python.org/cpython/rev/777aabdff35a

--




[issue16585] surrogateescape broken w/ multibytecodecs' encode

2012-11-30 Thread Philip Jenvey

New submission from Philip Jenvey:

surrogateescape claims to be implemented by all standard Python codecs

http://docs.python.org/3/library/codecs.html#codec-base-classes

However it fails w/ multibytecodecs on encode:

Python 3.2.3+ (3.2:eb999002916c, Oct 26 2012, 16:11:03) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "\u30fb".encode('gb18030')
b'\x819\xa79'
>>> "\u30fb\udc80".encode('gb18030', 'surrogateescape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: encoding error handler must return (unicode, int) tuple

The problem being that multibytecodec.c forces error handler return results to 
always be unicode and surrogateescape returns bytes here.

(surrogatepass also similarly returns bytes but it claims to be utf-8 only)
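
A small script along these lines can check whether a given interpreter is affected. It is only a sketch: the helper name try_encode is invented here, and it assumes the failure shows up as the TypeError reported above. Where the codec accepts bytes from the handler, the second call would be expected to return the encoded text followed by the escaped byte 0x80:

```python
def try_encode(text, encoding="gb18030"):
    """Return the encoded bytes, or the TypeError raised by an affected codec."""
    try:
        return text.encode(encoding, "surrogateescape")
    except TypeError as exc:
        # An affected multibyte codec rejects the bytes that the
        # surrogateescape handler returns.
        return exc

plain = try_encode("\u30fb")           # no surrogate, handler is never called
escaped = try_encode("\u30fb\udc80")   # lone surrogate triggers the handler

print(plain)
print(escaped)
```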

The error handler spec seems to imply that error handlers should always return 
unicode, because "The encoder will encode the replacement"

http://docs.python.org/3/library/codecs.html#codecs.register_error

but obviously that's not really the case: some codecs special case bytes 
results and copy them directly to the output, e.g.:

http://hg.python.org/cpython/file/ce3f0399ea33/Objects/unicodeobject.c#l6305
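
For comparison, here is a minimal sketch of the round trip with a codec that does accept bytes from the handler, using utf-8. The sample bytes are arbitrary:

```python
raw = b"name\xff.txt"  # 0xff can never appear in valid UTF-8

# Decoding maps the undecodable byte to the lone surrogate U+DCFF.
text = raw.decode("utf-8", "surrogateescape")

# Encoding hands the surrogate to the handler, which returns the original
# byte; the codec copies that byte straight into the output.
roundtrip = text.encode("utf-8", "surrogateescape")

print(ascii(text))       # 'name\udcff.txt'
print(roundtrip == raw)  # True
```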

--
components: Interpreter Core
messages: 176711
nosy: pjenvey
priority: normal
severity: normal
status: open
title: surrogateescape broken w/ multibytecodecs' encode
versions: Python 3.2, Python 3.3




[issue16585] surrogateescape broken w/ multibytecodecs' encode

2012-11-30 Thread Serhiy Storchaka

Changes by Serhiy Storchaka <storch...@gmail.com>:


--
components: +Unicode
nosy: +benjamin.peterson, ezio.melotti, haypo, lemburg, pitrou, serhiy.storchaka
type:  -> behavior
versions: +Python 3.4




[issue16585] surrogateescape broken w/ multibytecodecs' encode

2012-11-30 Thread Benjamin Peterson

Benjamin Peterson added the comment:

Codecs should be fixed to accept bytes from the error handler and the 
definition in the docs loosened. Returning bytes seems to be useful.

--




[issue16585] surrogateescape broken w/ multibytecodecs' encode

2012-11-30 Thread Serhiy Storchaka

Changes by Serhiy Storchaka <storch...@gmail.com>:


--
assignee:  -> docs@python
components: +Documentation
nosy: +docs@python
stage:  -> needs patch
