Nick Coghlan <[email protected]> added the comment:
Some further comments after getting back up to speed with the actual status of
this problem (i.e. that we had issues with the error checking and reporting in
the original 3.2 commit).
1. I agree with the position that the codecs module itself is intended to be a
type neutral codec registry. It encodes and decodes things, but shouldn't
actually care about the types involved. If that is currently not the case in
3.x, it needs to be fixed.
This type neutrality was blurred in 2.x by the fact that it only implemented
str->str translations, and even further obscured by the coupling to the
.encode() and .decode() convenience APIs. The fact that the type neutrality of
the registry itself is currently broken in 3.x is a *regression*, not an
improvement. (The convenience APIs, on the other hand, are definitely *not*
type neutral, and aren't intended to be.)
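As a rough illustration of what "type neutral" means here, the module-level codecs.encode() and codecs.decode() functions can drive text <-> text, binary <-> binary and text <-> binary codecs alike. This is only a sketch, and it assumes a CPython 3 build that ships the 'rot_13' and 'bz2_codec' codecs:

```python
import codecs

# text <-> text transform: str in, str out
rotated = codecs.encode("hello", "rot_13")

# binary <-> binary transform: bytes in, bytes out
packed = codecs.encode(b"hello" * 10, "bz2_codec")
unpacked = codecs.decode(packed, "bz2_codec")

# character encoding: str in, bytes out
raw = codecs.encode("hello", "ascii")
```

The registry functions do not care which of the three kinds of codec they are given. Only the str/bytes convenience methods are meant to be restrictive.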
2. To assist in producing nice error messages, and to allow restrictions to be
enforced on type-specific convenience APIs, the CodecInfo objects should grow
additional state as MAL suggests. To avoid redundancy (and inaccurate
overspecification), my suggested colour for that particular bikeshed is:

  Character encoding codec:
    .decoded_format = 'text'
    .encoded_format = 'binary'

  Binary transform codec:
    .decoded_format = 'binary'
    .encoded_format = 'binary'

  Text transform codec:
    .decoded_format = 'text'
    .encoded_format = 'text'

I suggest using the fuzzy format labels mainly due to the existence of the
buffer API - most codec operations that consume binary data will accept
anything that implements the buffer API, so referring specifically to 'bytes'
in error messages would be inaccurate.
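The three combinations could be modelled roughly as follows. This is a minimal sketch, not the real CodecInfo: the CodecFormats tuple and the describe() helper are hypothetical names used only for illustration, while the two field names are the ones suggested in this comment.

```python
from collections import namedtuple

# Hypothetical stand-in for the extra state suggested for CodecInfo
CodecFormats = namedtuple("CodecFormats", "decoded_format encoded_format")

CHARACTER_ENCODING = CodecFormats(decoded_format="text", encoded_format="binary")
BINARY_TRANSFORM = CodecFormats(decoded_format="binary", encoded_format="binary")
TEXT_TRANSFORM = CodecFormats(decoded_format="text", encoded_format="text")

def describe(formats):
    """Render the fuzzy labels in 'decoded <-> encoded' form."""
    return "{} <-> {}".format(formats.decoded_format, formats.encoded_format)
```

Because the labels are deliberately fuzzy, 'binary' covers bytes, bytearray, memoryview and anything else that supports the buffer API.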
The convenience APIs can then emit errors like:

  'a'.encode('rot_13') ==>
    CodecLookupError: text <-> binary codec expected ('rot_13' is text <-> text)

  'a'.decode('rot_13') ==>
    CodecLookupError: text <-> binary codec expected ('rot_13' is text <-> text)

  'a'.transform('bz2') ==>
    CodecLookupError: text <-> text codec expected ('bz2' is binary <-> binary)

  'a'.transform('ascii') ==>
    CodecLookupError: text <-> text codec expected ('ascii' is text <-> binary)

  b'a'.transform('ascii') ==>
    CodecLookupError: binary <-> binary codec expected ('ascii' is text <-> binary)

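A convenience API could build those messages with a check along these lines. This is a sketch under stated assumptions: CodecLookupError is the exception name used in the examples and does not exist in the standard library, and the FORMATS table and check_codec() helper are hypothetical stand-ins for state that would really live on the CodecInfo objects.

```python
class CodecLookupError(LookupError):
    """Hypothetical exception, named after the messages in this comment."""

# Hypothetical lookup table: name -> (decoded_format, encoded_format)
FORMATS = {
    "ascii": ("text", "binary"),
    "rot_13": ("text", "text"),
    "bz2": ("binary", "binary"),
}

def check_codec(name, decoded, encoded):
    """Raise CodecLookupError unless the codec has the expected formats."""
    actual = FORMATS[name]
    if actual != (decoded, encoded):
        raise CodecLookupError(
            "{} <-> {} codec expected ({!r} is {} <-> {})".format(
                decoded, encoded, name, *actual
            )
        )

# What str.encode() might do before calling the codec
try:
    check_codec("rot_13", "text", "binary")
except CodecLookupError as exc:
    message = str(exc)
```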
For backwards compatibility with 3.2, codecs that do not specify their formats
should be treated as character encoding codecs (i.e. decoded format is 'text',
encoded format is 'binary').
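One way to get that fallback is a getattr() with defaults. In this sketch, codec_formats() is a hypothetical helper, and decoded_format / encoded_format are the attributes suggested in this comment rather than attributes CodecInfo is known to provide:

```python
import codecs
from types import SimpleNamespace

def codec_formats(info):
    """Return (decoded_format, encoded_format), defaulting to a character encoding."""
    return (
        getattr(info, "decoded_format", "text"),
        getattr(info, "encoded_format", "binary"),
    )

# A codec that declares nothing is treated as text <-> binary
legacy = codec_formats(codecs.lookup("utf-8"))

# A stand-in object that declares itself as a binary transform
declared = codec_formats(
    SimpleNamespace(decoded_format="binary", encoded_format="binary")
)
```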
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue7475>
_______________________________________