Bugs item #1436532, was opened at 2006-02-22 10:45
Message generated for change (Comment added) made by loewis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1436532&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: IDLE
Group: Python 2.4
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: James (hover_boy)
Assigned to: Martin v. Löwis (loewis)
Summary: length of unicode string changes print behaviour
Initial Comment:
Python 2.4.2 and IDLE (with Courier New font) on XP
and the following code saved as a UTF-8 file
if __name__ == "__main__":
    print "零 一 二 三 四 五 六 七 八"
    print "零 一 二 三 四 五 六 七 八 九 十"
results in...
IDLE 1.1.2
>>> ================================ RESTART ================================
>>>
é›¶ ä¸€ äºŒ ä¸‰ å›› äº” å…­ ä¸ƒ å…«
零 一 二 三 四 五 六 七 八 九 十
>>>
----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis)
Date: 2006-07-23 21:42
Message:
Logged In: YES
user_id=21627
This is not a bug. The program should not attempt to print
byte strings, since it cannot know what the encoding of the
byte strings is. Instead, the program should use Unicode
strings, such as
print u"八八八八八八八八八八八八八八八八八八八八八八"
If you attempt to print byte strings, they have to be in the
encoding of stdout, or else the behaviour is unspecified.
In my installation/locale, sys.stdout.encoding is cp1250.
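To illustrate the point about the encoding of stdout: a byte string can only be "in the encoding of stdout" if that encoding can represent every character in it. A minimal sketch, assuming Python 3 (where str is the Unicode type, unlike the Python 2.4 discussed here); fits_encoding is a hypothetical helper, not part of IDLE or the standard library, and cp1250 is used only because it is the value reported above:

```python
def fits_encoding(text, encoding):
    """Return True if every character of text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# The CJK numeral has no representation in cp1250, so no byte string
# in that encoding could ever stand for it.
print(fits_encoding("八", "cp1250"))  # False
print(fits_encoding("八", "utf-8"))   # True
print(fits_encoding("é", "cp1250"))   # True
```

This only checks representability; it says nothing about what IDLE or Tk will actually render.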
IDLE's OutputWindow.write has this code:
    # Tk assumes that byte strings are Latin-1;
    # we assume that they are in the locale's encoding
    if isinstance(s, str):
        try:
            s = unicode(s, IOBinding.encoding)
        except UnicodeError:
            # some other encoding; let Tcl deal with it
            pass
Of the strings specified in the source file, only strings
2..5 decode properly as cp1250; the others don't, so they
are passed unchanged to Tcl, which then assumes they are
UTF-8 (with some fallback). The strings that look
"incorrect" are actually printed out as designed: using
sys.stdout.encoding.
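The effect of that decoding step can be reproduced outside IDLE. The following is a rough sketch, not IDLE's code: it is written for Python 3, with bytes standing in for the Python 2 str type. It assumes the two strings in the original report were the numerals 零 to 八 and 零 to 十 (reconstructed from their UTF-8 bytes), that the submitter's locale encoding was cp1252, and that Tcl falls back to UTF-8 as described above. show_like_output_window is a hypothetical name.

```python
def show_like_output_window(data, locale_encoding):
    """Return the text a reader would see for a printed byte string."""
    try:
        # Mirrors the quoted step: unicode(s, IOBinding.encoding)
        return data.decode(locale_encoding)
    except UnicodeError:
        # Assumption: the raw bytes reach Tcl, which treats them as UTF-8.
        return data.decode("utf-8", "replace")


SHORT = "零 一 二 三 四 五 六 七 八"
LONG = SHORT + " 九 十"

short_shown = show_like_output_window(SHORT.encode("utf-8"), "cp1252")
long_shown = show_like_output_window(LONG.encode("utf-8"), "cp1252")

# Every byte of SHORT is defined in cp1252, so decoding succeeds and the
# result is mojibake. LONG contains bytes that cp1252 leaves undefined
# (0x9D, 0x8D, 0x81), so decoding fails and the UTF-8 fallback applies.
print(short_shown == SHORT)  # False
print(long_shown == LONG)    # True
```

Under these assumptions, which string displays correctly depends on which byte values it happens to contain, not on its length.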
----------------------------------------------------------------------
Comment By: Kurt B. Kaiser (kbk)
Date: 2006-07-23 07:33
Message:
Logged In: YES
user_id=149084
I don't have a font installed which will print
those characters. When I load your sample file,
I see print statements which include unicode
characters like \u5341. The printed output
contains the same unicode characters as the
input program. Maybe Martin has an idea.
----------------------------------------------------------------------
Comment By: James (hover_boy)
Date: 2006-03-22 16:21
Message:
Logged In: YES
user_id=1458491
I've attached an example file to demonstrate the problem
better.
It seems not to be the length but something else, which I
haven't figured out yet.
I've also added the encoding comment and tried changing the
default encoding in sitecustomize.py from latin-1 to utf-8,
but neither seems to work.
thanks,
James.
XP professional, SP2, english
----------------------------------------------------------------------
Comment By: James (hover_boy)
Date: 2006-03-22 16:12
Message:
Logged In: YES
user_id=1458491
----------------------------------------------------------------------
Comment By: Terry J. Reedy (tjreedy)
Date: 2006-03-06 02:44
Message:
Logged In: YES
user_id=593130
I am fairly ignorant of unicode and encodings, but I am
surprised you got anything coherent without an encoding
cookie comment at the top (see manual). Have you tried
that? Other questions that might help someone answer:
What specific XP version? SP2 installed? Country version?
Your results for
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'mbcs'
What happens if you reverse the order of the print
statements? (I.e., is it really the shorter string that
does not work, or just the first?)
I don't know enough to know if this is really a bug. If
you don't get an answer here, you might try for more info
on python-list/comp.lang.python
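For reference, the encoding cookie mentioned above is a comment on the first or second line of the source file. A small sketch of how such a declaration can be read back, assuming Python 3, where tokenize.detect_encoding is available (it does not exist in Python 2.4, and the no-cookie default also differs between the two versions); declared_encoding is a hypothetical helper:

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding that the tokenizer detects for source_bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _ = tokenize.detect_encoding(readline)
    return encoding


utf8_source = b"# -*- coding: utf-8 -*-\nprint('x')\n"
latin1_source = b"# -*- coding: latin-1 -*-\nprint('x')\n"

print(declared_encoding(utf8_source))    # utf-8
print(declared_encoding(latin1_source))  # iso-8859-1
```

The cookie only tells the compiler how to read the source file; it does not change how printed byte strings are interpreted on output.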
----------------------------------------------------------------------