Serhiy Storchaka added the comment:
> btw If anyone can find the place in the code (sorry I tried and failed!)
> where str.encode('utf-8', error=X) is resulting in an implicit call to the
> equivalent of decode(defaultencoding, errors=strict) (as suggested by the
> exception message) I think it'll be easier to discuss the details of fixing.
There is no single place. Search for the lines "str = PyUnicode_FromObject(str);"
in Modules/_codecsmodule.c.
> But that's not what happens - it *silently works* (is a no-op) as long as you
> happen to be using ASCII characters so this so-called 'programming bug' will
> go unnoticed by most programmers (and authors of third party library code you
> might be relying on!)... but the moment a non-ascii character get introduced
> suddenly you'll get an exception, maybe in some library code you rely on but
> can't fix.
The problem is that encoding an ASCII str to UTF-8 is a legal operation in some
circumstances and a programming bug in others. There is no way to distinguish
these two cases automatically.
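
To make the behaviour concrete, here is a rough sketch that emulates on
Python 3 what Python 2's str.encode() effectively does. The helper name
py2_str_encode is hypothetical (it is not a CPython function), and the sketch
assumes the default encoding is ASCII, as it is in a stock Python 2:

```python
# Emulation (runs on Python 3) of what Python 2's str.encode() effectively did.
# py2_str_encode is a hypothetical helper, not a real CPython function.

def py2_str_encode(data, encoding="utf-8", errors="strict"):
    # Step 1: the byte string is implicitly coerced to unicode using the
    # default encoding (assumed ASCII here) with strict error handling.
    # The caller's `errors` argument is NOT applied to this step.
    text = data.decode("ascii", "strict")
    # Step 2: the encode that the caller actually asked for.
    return text.encode(encoding, errors)

# Pure ASCII input: the round trip is a silent no-op.
ascii_result = py2_str_encode(b"hello")

# Non-ASCII input: the implicit decode fails, even with errors="replace".
try:
    py2_str_encode(b"caf\xc3\xa9", errors="replace")
    failure = None
except UnicodeDecodeError as exc:
    failure = exc
```

Both calls look the same to the interpreter: a byte string is handed to an
encoder. Only the data differs, which is why the legal case and the buggy case
cannot be told apart automatically.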
As a non-English speaker, I am familiar with the problems you described. This
is a bug in the design of Python 2, and the only solution is to use Python 3.
You can experiment with your idea, but I'm afraid the patch will be more
difficult than you expect and will break the tests. I should warn you that even
if your experiment is quite successful, there is little chance it will be
accepted into 2.7. This is more like a new feature than a bug fix. Programs that
depend on this feature will be incompatible with previous bugfix releases. It is
unlikely to help migration to Python 3; rather, it would encourage writing code
that is incompatible with Python 3.
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue26369>
_______________________________________