[issue26369] unicode.decode and str.encode are unnecessarily confusing for non-ascii

Serhiy Storchaka Thu, 19 May 2016 10:55:24 -0700

Serhiy Storchaka added the comment:

> btw If anyone can find the place in the code (sorry I tried and failed!) 
> where str.encode('utf-8', error=X) is resulting in an implicit call to the 
> equivalent of decode(defaultencoding, errors=strict) (as suggested by the 
> exception message) I think it'll be easier to discuss the details of fixing.


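There is no single place. Search for the lines "str = PyUnicode_FromObject(str);" in
Modules/_codecsmodule.c. Those calls coerce a byte-string argument to unicode
using the default encoding with strict error handling, and that coercion is the
implicit decode you are seeing.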
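A short Python 3 sketch can model this coercion. `py2_str_encode` is a hypothetical helper, not part of any library. It assumes the stock Python 2 default encoding of `'ascii'`. It shows why passing an `errors` value does not help: that argument only reaches the encode step, while the implicit decode is always strict.

```python
# Hypothetical helper modelling Python 2's str.encode() on a byte string.
# Assumption: sys.getdefaultencoding() is 'ascii', as in a stock Python 2.
PY2_DEFAULT_ENCODING = "ascii"


def py2_str_encode(data: bytes, encoding: str, errors: str = "strict") -> bytes:
    """Model of Python 2 ``str.encode(encoding, errors)``.

    The byte string is first coerced to unicode with the default encoding
    and *strict* error handling (the role PyUnicode_FromObject plays in
    Modules/_codecsmodule.c). Only the second step, the real encode,
    sees the caller's ``errors`` value.
    """
    text = data.decode(PY2_DEFAULT_ENCODING, "strict")  # implicit, always strict
    return text.encode(encoding, errors)


# Pure-ASCII input "silently works": the call is effectively a no-op.
print(py2_str_encode(b"abc", "utf-8", "ignore"))  # b'abc'

# Non-ASCII input fails in the implicit decode, even with errors="ignore".
try:
    py2_str_encode("\u00e9".encode("utf-8"), "utf-8", "ignore")
except UnicodeDecodeError as exc:
    print(type(exc).__name__, exc.encoding)
```

Both calls have the same shape, so the failure depends only on the data. This is why it tends to appear late, often in library code.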

> But that's not what happens - it *silently works* (is a no-op) as long as you 
> happen to be using ASCII characters so this so-called 'programming bug' will 
> go unnoticed by most programmers (and authors of third party library code you 
> might be relying on!)... but the moment a non-ascii character get introduced 
> suddenly you'll get an exception, maybe in some library code you rely on but 
> can't fix.

The problem is that encoding an ASCII str to UTF-8 is a legal operation in some
circumstances and a programming bug in others. There is no way to distinguish
these two cases automatically.

As a non-English speaker, I am familiar with the problems you describe. This is a
flaw in the design of Python 2, and the only solution is to use Python 3.
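For comparison, Python 3 separates the two types. Only `str` has `.encode()` and only `bytes` has `.decode()`, so calling the method in the wrong direction fails at once, whatever the content. The sketch below demonstrates this. `misuse_is_caught` is a hypothetical helper written for illustration only.

```python
def misuse_is_caught(value) -> bool:
    """Return True if calling the 'wrong-direction' codec method raises.

    For bytes the wrong method is .encode(); for str it is .decode().
    In Python 3 neither method exists on that type, so AttributeError is raised.
    """
    method = "encode" if isinstance(value, bytes) else "decode"
    try:
        getattr(value, method)("utf-8")
    except AttributeError:
        return True
    return False


# Pure-ASCII and non-ASCII samples behave identically: the bug surfaces early.
for sample in (b"abc", "\u00e9".encode("utf-8"), "abc", "\u00e9"):
    print(repr(sample), misuse_is_caught(sample))

# The correct directions still work as expected.
assert "\u00e9".encode("utf-8").decode("utf-8") == "\u00e9"
```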

You can experiment with your idea, but I'm afraid the patch will be more
difficult than you expect and will break tests. I should warn you that even if
your experiment is quite successful, there is little chance it will be accepted
into 2.7. This is more like a new feature than a bug fix: programs that depend
on it would be incompatible with previous bugfix releases. It is also unlikely
to help migration to Python 3; rather, it would encourage writing code that is
incompatible with Python 3.

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue26369>
_______________________________________