Marc-Andre Lemburg <m...@egenix.com> added the comment:

While it's probably OK to fix the codecs, there's an issue which makes this difficult, at least for the utf-8 codec:

The marshal module uses utf-8 to write Unicode objects, and these must be able to store the full range of supported UCS2/UCS4 code points, including lone surrogates. If the utf-8 codec were changed to raise an error for these, marshal would no longer be able to write/read Unicode objects that contain them. It is likely that other existing Python code (outside the std lib) also relies on this ability.

Changing this would only be possible in 3.1. The marshal module would then also have to be changed to use a different encoding which does support encoding lone surrogates.

See issue 3297 for a discussion of UTF-8/16 vs. UCS2/4, the implications, motivations, etc.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue3672>
_______________________________________
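
For illustration only, here is a minimal sketch of the marshal dependency discussed in this comment. It assumes a later Python 3 interpreter in which the strict utf-8 codec rejects lone surrogates and a "surrogatepass" error handler is available. That is not the codec behaviour at the time of this comment, and the helper name strict_utf8_rejects is hypothetical.

```python
import marshal

# A lone high surrogate: a code point that is not valid on its own in UTF-8.
lone = "\ud800"


def strict_utf8_rejects(text):
    """Return True if the strict utf-8 codec refuses to encode `text`."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


# Assumption: on a later Python 3, the strict codec raises for lone surrogates.
rejected = strict_utf8_rejects(lone)

# The "surrogatepass" error handler lets the codec emit the surrogate anyway.
passed_through = lone.encode("utf-8", "surrogatepass")

# marshal must still be able to write and read such strings unchanged.
round_tripped = marshal.loads(marshal.dumps(lone))

print(rejected, passed_through, round_tripped == lone)
```

The point of the sketch is only that marshal needs some encoding path that preserves lone surrogates, whichever way the strict codec is made to behave.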