Kurt Mueller wrote:
Hi,
On a Linux system with Python 2.5.1 I see the
following behaviour, which I do not understand:
case 1
python -c 'a="ä"; print a ; print a.center(6,"-") ; b=unicode(a, "utf8"); print
b.center(6,"-")'
ä
--ä--
--ä---
case 2
----- a UnicodeEncodeError in this case:
python -c 'a="ä"; print a ; print a.center(20,"-") ; b=unicode(a, "utf8"); print
b.center(20,"-")' | cat
Traceback (most recent call last):
File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 9:
ordinal not in range(128)
ä
--ä--
The behaviour changes if I pipe the output to another program or to a file.
Also, centering is not correct with string a, but it is correct with string b.
Could somebody please explain this to me?
Thanks in advance
========================================================
Let me add to the confusion:
=======================================================================
stevet:> python -c 'a="ä"; print a ; print a.center(6,"-") ;
b=unicode(a, "utf8"); print b.center(6,"-")'
ä
--ä---
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python2.5/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 0:
unexpected end of data
stevet:>
stevet:> python -c 'a="ä"; print a ; print a.center(20,"-") ;
b=unicode(a, "utf8"); print b.center(20,"-")' | cat
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python2.5/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 0:
unexpected end of data
ä
---------ä----------
stevet:>
stevet:> python -c 'a="ä"; print a ; print a.center(20,"-") ;
b=unicode(a, "utf8"); print b.center(20,"-")'
ä
---------ä----------
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python2.5/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 0:
unexpected end of data
stevet:>
=======================================================================
I'm using Python 2.5.2 on Linux Slackware 10.2
Line wraps (if showing) are Email induced.
Your first command bombs for me at the unicode() call.
a is centered (even field width).
The second command (with a pipe) bombs for me at unicode() too.
a is centered (even field width).
The third command (no pipe) also bombs at unicode().
a is centered (even field width).
In no case does 'b' print.
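
One possible reading of the difference between the two sets of results: the tracebacks above show the shell passing the single byte 0xe4 for "ä", while a UTF-8 terminal would pass two bytes. That the two machines differ in exactly this way is an assumption, not something the output proves. The sketch below uses explicit byte strings so that it runs on a current Python 3; on 2.5 a plain "..." string plays the role of bytes. The variable names are made up for illustration.

```python
# Sketch only: compares centering on bytes with centering on text.

char = u"\xe4"                          # a-umlaut as a text (unicode) string
utf8_bytes = char.encode("utf-8")       # two bytes: 0xc3 0xa4
latin1_bytes = char.encode("latin-1")   # one byte: 0xe4

# center() pads by length; for byte strings that is bytes, not characters.
padded_utf8 = utf8_bytes.center(6, b"-")      # 2 dashes + 2 bytes + 2 dashes
padded_latin1 = latin1_bytes.center(6, b"-")  # 2 dashes + 1 byte + 3 dashes
padded_text = char.center(6, u"-")            # 2 dashes + 1 char + 3 dashes

print(repr(padded_utf8))
print(repr(padded_latin1))
print(len(padded_text))

# Decoding the lone Latin-1 byte as UTF-8 fails, as in the tracebacks above.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)
```

The UTF-8 byte string gets only four dashes because the character already counts as two, which would explain a result that looks one column short on screen.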
If I put the code into a file (zz, a dummy file) and run python zz,
I get:
File zz:
-------
a="ä"
print a
print a.center(20,"-")
b=unicode(a, "utf8")
print b.center(20,"-")
----
Output:
------
stevet:> py zz
File "zz", line 2
SyntaxError: Non-ASCII character '\xe4' in file zz on line 2, but no
encoding declared; see http://www.python.org/peps/pep-0263.html for details
stevet:>
------
It doesn't like "ä".
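
Following the PEP 263 pointer in that error message, a version of zz with an encoding declaration might look like the sketch below. It assumes the file is actually saved as UTF-8 and run on a terminal that can display "ä"; the name centered is added here for illustration. The single-argument print(...) form is used so the same text is accepted by both 2.x and 3.x.

```python
# -*- coding: utf-8 -*-
# Hypothetical corrected zz; assumes the file itself is saved as UTF-8.

a = u"ä"                        # unicode literal, decoded using the declared encoding
centered = a.center(20, u"-")   # pads by characters, not bytes
print(centered)
```

With the declaration in place the literal is decoded when the file is compiled, so no separate unicode(a, "utf8") call is needed.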
Python is cooking print. It is disallowing the full 8-bit set in the
print routine. (Yes, yes: ASCII is 128 characters (0 to 127), high bit off.
But the full 8-bit set allows the high bit to be used 'undefined', AKA use
at your own risk.)
Look for special handling routines, docs, whatever.
Search for 8-bit binary, 8-bit ASCII, etc.
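
For the UnicodeEncodeError in the piped case, a likely explanation is that print has to turn a unicode object into bytes, and when the output is not a terminal Python 2 has no terminal encoding to use and falls back to ASCII. One way around that is to encode explicitly before writing. The sketch below assumes UTF-8 as the target encoding, and encode_for_output is a made-up helper name; it is written to run on a current Python 3.

```python
def encode_for_output(text, encoding="utf-8"):
    """Encode text to bytes explicitly instead of relying on print's default."""
    return text.encode(encoding)

line = u"\xe4".center(20, u"-")   # 9 dashes, a-umlaut, 10 dashes

# What the ASCII fallback amounts to: the character at index 9 cannot be encoded.
try:
    line.encode("ascii")
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = exc.start
print(failed_at)

# Explicit encoding does not depend on where stdout is connected.
data = encode_for_output(line)
print(repr(data))
```

The resulting bytes can then be written to a pipe or file without print having to guess an encoding.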
Steve