Re: string processing question

norseman Thu, 30 Apr 2009 10:52:15 -0700

Kurt Mueller wrote:

Hi,



on a Linux system and python 2.5.1 I have the
following behaviour which I do not understand:



case 1

python -c 'a="ä"; print a ; print a.center(6,"-") ; b=unicode(a, "utf8"); print b.center(6,"-")'

ä
--ä--
--ä---


case 2
----- a UnicodeEncodeError in this case:

python -c 'a="ä"; print a ; print a.center(20,"-") ; b=unicode(a, "utf8"); print b.center(20,"-")' | cat

Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 9: ordinal not in range(128)
ä
--ä--


The behaviour changes if I pipe the output to another program or to a file,
and centering is not correct with the string a, but it is correct with the string b.



Could somebody please explain this to me?




Thanks in advance

========================================================

Let me add to the confusion:
=======================================================================

stevet:> python -c 'a="ä"; print a ; print a.center(6,"-") ;b=unicode(a, "utf8"); print b.center(6,"-")'

ä
--ä---
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 0: unexpected end of data

stevet:>

stevet:> python -c 'a="ä"; print a ; print a.center(20,"-") ;b=unicode(a, "utf8"); print b.center(20,"-")' | cat

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 0: unexpected end of data

ä
---------ä----------
stevet:>

stevet:> python -c 'a="ä"; print a ; print a.center(20,"-") ;b=unicode(a, "utf8"); print b.center(20,"-")'

ä
---------ä----------
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 0: unexpected end of data

stevet:>

=======================================================================

I'm using Python 2.5.2 on Linux Slackware 10.2
Line wraps (if showing) are Email induced.

Your first line bombs for me at unicode.
  a is centered (even number field len)
The second line bombs for me at unicode. (Has a pipe)
  a is centered (even number field len)
The third  line bombs for me at unicode. (No pipe)
  a is centered (even number field len)

In no case does the 'b' print.
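
A guess at why my runs differ from yours (an assumption, not something I have verified on either machine): the byte 0xe4 in my tracebacks suggests my terminal sends "ä" as a single Latin-1 byte, while yours sends the two-byte UTF-8 form. The sketch below uses Python 3 syntax, where bytes plays the role of the 2.x str and str the role of the 2.x unicode; the variable names are mine.

```python
# Python 3 syntax: bytes stands in for the 2.x str, str for the 2.x unicode.
utf8_bytes = u"\xe4".encode("utf-8")      # a-umlaut as a UTF-8 terminal would send it
latin1_bytes = u"\xe4".encode("latin-1")  # a-umlaut as a Latin-1 terminal would send it

# center() on bytes counts bytes, so the two-byte UTF-8 form gets less padding.
centered_utf8 = utf8_bytes.center(6, b"-")
centered_latin1 = latin1_bytes.center(6, b"-")

# Decoding first makes center() count characters instead of bytes.
centered_text = utf8_bytes.decode("utf-8").center(6, u"-")

# The lone Latin-1 byte is not valid UTF-8, hence a UnicodeDecodeError.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# ascii() keeps the output printable even on an ASCII-only stream.
print(len(utf8_bytes), len(latin1_bytes))
print(centered_utf8, centered_latin1)
print(ascii(centered_text), decode_failed)
```

If that guess is right, decoding with the encoding the terminal really uses (latin-1 on my side) instead of "utf8" should let the 'b' line through.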


If I put the code into a file (zz, a dummy file) and run python zz, I get:

File zz:
-------
a="ä"
print a
print a.center(20,"-")
b=unicode(a, "utf8")
print b.center(20,"-")
----

Output:
------
stevet:> py zz
  File "zz", line 2

SyntaxError: Non-ASCII character '\xe4' in file zz on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

stevet:>
------
It doesn't like "ä".
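
The PEP named in that error asks for an encoding declaration in a comment on the first or second line of the file. A minimal sketch of the effect, in Python 3 syntax and assuming the file holds the Latin-1 byte 0xe4 (the source is built as a bytes literal here only so the raw byte is visible; on disk it would simply be the first line of zz):

```python
# Python 3 syntax. The source is built as bytes so the raw 0xe4 byte is explicit.
source = b'# -*- coding: latin-1 -*-\na = "\xe4"\n'

# compile() honours the coding line when handed bytes, much as the
# interpreter does when it reads a source file.
namespace = {}
exec(compile(source, "zz", "exec"), namespace)

value = namespace["a"]
# One character, not one undecodable byte; ascii() keeps the output printable.
print(len(value), ascii(value))
```

The declared encoding has to match what the editor actually wrote to disk; latin-1 here is an assumption based on the '\xe4' in the error message.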

Python is cooking print. It is disallowing the full ASCII set in the print routine. (Yes, yes: ASCII is 128 characters (0->127), high bit off. But the full 8-bit set allows the high bits to be used 'undefined', AKA use at your own risk.)


Look for special handling routines/docs whatever.
Seek  8Bit binary,  8Bit ASCII, etc..
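
On the pipe case: as far as I know, Python 2 leaves sys.stdout.encoding unset when the output is not a terminal, and print then falls back to the 'ascii' codec, which fits the UnicodeEncodeError in the original post. The sketch below imitates that with in-memory streams rather than a real pipe. It is Python 3 syntax, the stream names are made up for illustration, and it is a model of the behaviour, not the interpreter's actual print code.

```python
import io

# Python 3 syntax; text is what the 2.x code calls a unicode object.
text = u"\xe4".center(20, u"-")

# A stream that can only encode ASCII, standing in for a piped stdout.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    ascii_stream.write(text)
    ascii_stream.flush()
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Choosing the encoding explicitly gives the same bytes whether or not
# the output is piped.
raw = io.BytesIO()
utf8_stream = io.TextIOWrapper(raw, encoding="utf-8")
utf8_stream.write(text)
utf8_stream.flush()
piped_bytes = raw.getvalue()

print(ascii_failed, len(text), len(piped_bytes))
```

In 2.x terms the same idea would be to encode explicitly before printing, for example print b.center(20, u"-").encode("utf8"), so nothing is left to the default codec.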

Steve

--
http://mail.python.org/mailman/listinfo/python-list
