Terry J. Reedy added the comment:

When discussing problematic behavior, one should specify the OS and the exact
Python version, including the bugfix number. If at all possible, one should
use the latest bugfix release. 2.7.3 came out 10+ months before the original
report. I do not presume without evidence that it has the same behavior as
2.7.2. The recently released 2.7.4 has another year of bugfixes, so it might
also behave differently.

Looking again at the original report, I see that the false issue of lost 
encoding obscured a real problem from me: ord(u'€') is 8364, not 128. Does 
2.7.4 make the same error for that input? What does it do with u"こんにちは"?
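
For reference, a minimal Python 3 sketch of where the two numbers might come 
from. The choice of cp1252 is only an assumption about a typical Western 
Windows locale; the report does not say which codec was in use.

```python
# Python 3 sketch. cp1252 is an assumed locale codec, not taken from the report.
euro = '\u20ac'                        # EURO SIGN, written as an escape

code_point = ord(euro)                 # the answer ord() should give
cp1252_bytes = euro.encode('cp1252')   # the single byte cp1252 uses for it

print(code_point)                      # 8364
print(list(cp1252_bytes))              # [128]
```

If that assumption holds, 128 looks like a cp1252 byte value being treated as 
a code point rather than a random number.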

(Note: on the Windows console, both keying and viewing unicode chars are 
problematic, apparently more so than with the *nix consoles. If I could not 
paste u"こんにちは", I would most likely just key 
u'\u3053\u3093\u306b\u3061\u306f'.)
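
When keying the escapes by hand is tedious, they can be generated on a machine 
where the characters do display. A small sketch, written for Python 3; the 
variable names are just illustrative.

```python
# Python 3 sketch: produce the \uXXXX form of a string with the
# standard 'unicode_escape' codec.
text = '\u3053\u3093\u306b\u3061\u306f'   # the greeting used above

# unicode_escape yields ASCII-only bytes; decode them to get a str
# that can be pasted into a unicode literal.
escaped = text.encode('unicode_escape').decode('ascii')

print(escaped)   # \u3053\u3093\u306b\u3061\u306f
```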

I believe the underlying problem is that a Python 2 program is a stream of 
bytes while a Python 3 program is a stream of unicode codepoints. So in Python 
2, a unicode literal has to be encoded to bytes before being decoded back to 
unicode codepoints in a unicode string object.

David, I presume this is why you say we cannot just toss out the encoding to 
bytes. I presume that you are also suggesting that the encoding and subsequent 
decoding are done with different codecs because of locale issues. Might 
IOBinding.encoding be miscalculated?
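
To illustrate what a codec mismatch would do, here is a hypothetical 
simulation in Python 3. It does not use IDLE's code at all; round_trip() is a 
made-up helper, and the cp1252/latin-1 pair is only a guess at what a 
mismatch might look like.

```python
# Hypothetical simulation of an encode/decode mismatch (Python 3).
# round_trip() and the codec names are assumptions, not IDLE internals.
def round_trip(text, encode_with, decode_with):
    """Encode text to bytes with one codec, then decode with another."""
    return text.encode(encode_with).decode(decode_with)

euro = '\u20ac'

same = round_trip(euro, 'cp1252', 'cp1252')     # matching codecs
mixed = round_trip(euro, 'cp1252', 'latin-1')   # mismatched codecs

print(ord(same))    # 8364
print(ord(mixed))   # 128
```

With matching codecs the round trip is lossless; with this particular 
mismatch the result is the one-character string u'\x80', which would explain 
an ord() of 128.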

For ascii codepoints, the encoding and decoding is typically a null operation. 
This means that \u#### escapes, as opposed to non-ascii codepoints, should not 
get mangled before being interpreted during the creation of the unicode object. 
Using such escapes is one solution to the problem.
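
A sketch of why the escapes are safe, again in Python 3 and again assuming a 
cp1252/latin-1 mismatch purely for illustration. The helper 
mismatched_round_trip() is hypothetical; ast.literal_eval stands in for the 
interpreter evaluating the literal.

```python
import ast

# Hypothetical stand-in for source text being encoded with one codec
# and decoded with another (codec pair is an assumption).
def mismatched_round_trip(source):
    return source.encode('cp1252').decode('latin-1')

escaped_source = "u'\\u20ac'"   # literal spelled with an ASCII-only escape
raw_source = "u'\u20ac'"        # literal containing the actual character

# The escape is pure ASCII, so both codecs leave it untouched.
from_escaped = ast.literal_eval(mismatched_round_trip(escaped_source))
# The raw character is re-interpreted byte by byte and gets mangled.
from_raw = ast.literal_eval(mismatched_round_trip(raw_source))

print(ord(from_escaped))   # 8364
print(ord(from_raw))       # 128
```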

Another is to use Python 3. That *is* the generic answer to many Python 2.x 
unicode problems. In 3.3.1:
>>> u"こんにちは"
'こんにちは'
problem solved ;-).

In other words, fixing 2.7-only unicode bugs has fairly low priority in 
general. However, if there is an easy fix here that Roger thinks is safe, it 
can be applied.

----------
resolution:  -> invalid
stage:  -> committed/rejected

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17348>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
