Kurt Mueller wrote:
Hi,


On a Linux system with Python 2.5.1 I see the
following behaviour, which I do not understand:



case 1
python -c 'a="ä"; print a ; print a.center(6,"-") ; b=unicode(a, "utf8"); print b.center(6,"-")'
ä
--ä--
--ä---


case 2
----- a UnicodeEncodeError in this case:
python -c 'a="ä"; print a ; print a.center(20,"-") ; b=unicode(a, "utf8"); print b.center(20,"-")' | cat
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 9: ordinal not in range(128)
ä
--ä--


The behaviour changes if I pipe the output to another program or to a file.
Also, centering is not correct with the string a, but it is correct with the string b.



Could somebody please explain this to me?




Thanks in advance
========================================================

Let me add to the confusion:
=======================================================================
stevet:> python -c 'a="ä"; print a ; print a.center(6,"-") ; b=unicode(a, "utf8"); print b.center(6,"-")'
ä
--ä---
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 0: unexpected end of data
stevet:>


stevet:> python -c 'a="ä"; print a ; print a.center(20,"-") ; b=unicode(a, "utf8"); print b.center(20,"-")' | cat
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 0: unexpected end of data
ä
---------ä----------
stevet:>


stevet:> python -c 'a="ä"; print a ; print a.center(20,"-") ; b=unicode(a, "utf8"); print b.center(20,"-")'
ä
---------ä----------
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 0: unexpected end of data
stevet:>

=======================================================================

I'm using Python 2.5.2 on Linux Slackware 10.2
Line wraps (if showing) are Email induced.

Your first command fails for me at the unicode() call.
  a is centered (even field length)
The second command fails for me at the unicode() call. (Has a pipe)
  a is centered (even field length)
The third command fails for me at the unicode() call. (No pipe)
  a is centered (even field length)

In no case does the 'b' print.
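
A possible reading of the two sets of results, offered as a sketch rather than a diagnosis: byte 0xe4 is "ä" in Latin-1, while UTF-8 encodes "ä" as the two bytes 0xc3 0xa4. If my terminal sends Latin-1 and Kurt's sends UTF-8, that would explain both why unicode(a, "utf8") fails here and why a.center() comes out one dash short for him, since center() pads by len() and a byte string counts bytes. The variable names below are made up for illustration, and the b"" literals assume Python 2.6 or later (they do not parse on 2.5).

```python
# -*- coding: utf-8 -*-
# Sketch: the same character as Latin-1 bytes and as UTF-8 bytes.
latin1_bytes = b"\xe4"      # "ä" in ISO-8859-1: one byte
utf8_bytes = b"\xc3\xa4"    # "ä" in UTF-8: two bytes

# A lone 0xe4 announces a multi-byte UTF-8 sequence that never arrives,
# so decoding it as UTF-8 raises UnicodeDecodeError.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decoded with the matching codec, both byte strings give the same character.
text = utf8_bytes.decode("utf-8")
same_text = latin1_bytes.decode("latin-1")

# center() pads according to len(): 2 for the UTF-8 bytes, 1 for the text.
bytes_padded = utf8_bytes.center(6, b"-")   # 2 dashes on each side
text_padded = text.center(6, u"-")          # 2 dashes left, 3 right

print(decode_failed, text == same_text, len(bytes_padded), len(text_padded))
```

Both padded values are six units long, but the byte string shows as only five columns on a UTF-8 terminal because its two middle bytes render as one character.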


If I put the code into a file (zz, a dummy file) and run python zz,
I get:

File zz:
-------
a="ä"
print a
print a.center(20,"-")
b=unicode(a, "utf8")
print b.center(20,"-")
----

Output:
------
stevet:> py zz
  File "zz", line 2
SyntaxError: Non-ASCII character '\xe4' in file zz on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
stevet:>
------
It doesn't like "ä"
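
The SyntaxError points at PEP 263: a source file that contains non-ASCII bytes needs an encoding declaration on its first or second line. Below is a minimal sketch of a revised zz, assuming the file is saved as UTF-8 (if the editor writes Latin-1, the declaration would have to say latin-1 instead). The names padded and encoded are mine, not from the original file.

```python
# -*- coding: utf-8 -*-
# Hypothetical revision of zz; assumes the file itself is saved as UTF-8.
a = u"ä"                          # unicode literal, decoded using the declared coding
padded = a.center(20, u"-")       # pads by characters: 9 dashes, "ä", 10 dashes
encoded = padded.encode("utf-8")  # explicit bytes, independent of terminal or pipe

print(len(padded), len(encoded))
```

Starting from a unicode literal avoids the separate unicode(a, "utf8") step, so there is no byte string whose encoding has to be guessed.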

Python is cooking the print output. It is disallowing the full ASCII set in the print routine. (Yes, yes: ASCII is 128 characters (0 to 127), high bit off. But the full 8-bit set allows the high bit to be used, 'undefined', AKA use at your own risk.)
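
For what it's worth, the traceback in case 2 names the 'ascii' codec. That suggests print encodes a unicode object before writing it, and that the encoding falls back to ASCII when stdout is a pipe instead of a terminal. The sketch below reproduces the failure position without any pipe, assuming that fallback. The variable names are illustrative only, and the "except ... as" syntax assumes Python 2.6 or later.

```python
# Sketch: what encoding the centered unicode string to ASCII would do.
centered = u"\xe4".center(20, u"-")   # same value as b.center(20, "-") in case 2

try:
    centered.encode("ascii")
    error_position = None
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that cannot be encoded.
    error_position = exc.start

# Encoding explicitly with a codec that covers the character succeeds.
utf8_output = centered.encode("utf-8")

print(error_position, len(utf8_output))
```

If that assumption holds, writing centered.encode("utf-8") to stdout instead of the unicode object itself should behave the same with or without a pipe.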

Look for special handling routines, docs, whatever.
Search for 8-bit binary, 8-bit ASCII, etc.

Steve

