Re: Re: problems with Â character

Bengt Richter Tue, 22 Mar 2005 20:20:43 -0800

On Tue, 22 Mar 2005 20:09:55 -0600, "John Roth" <[EMAIL PROTECTED]> wrote:


>I had this problem recently. It turned out that something
>had encoded a unicode string into utf-8. When I found
>the culprit and fixed the underlying design issue, it went away.
>
>John Roth
>
>
>
>"jdonnell" <[EMAIL PROTECTED]> wrote in message 
>news:[EMAIL PROTECTED]
>I have a mysql database with characters like  Â  Â  Â» in it. I'm
>trying to write a python script to remove these, but I'm having a
>really hard time.
>
>These strings are coming out as type 'str' not 'unicode' so I tried to
>just
>
>record[4].replace('Â', '')
>
>but this does nothing. However the following code works
>
>#!/usr/bin/python
>
>s = 'aaaaa Â aaa'
>print type(s)
>print s
>print s.find('Â')
>
>This returns
><type 'str'>
>aaaaa Â aaa
>6
>
>The other odd thing is that the Â character shows up as two spaces if
>I print it to the terminal from mysql, but it shows up as Â when I
>print from the simple script above.
>What am I doing wrong?
>
What encodings are involved? 
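
One way to check John's explanation, as a minimal sketch in Python 3 syntax (the sessions in this thread are Python 2, and the variable names here are only illustrative): the UTF-8 encoding of » is the two bytes C2 BB, and reading those bytes as Latin-1 gives Â followed by ».

```python
# Python 3 sketch; all names are illustrative, not from the original post.
original = "\u00bb"                      # the single character '»'
utf8_bytes = original.encode("utf-8")    # two bytes: 0xC2 0xBB
misread = utf8_bytes.decode("latin-1")   # 'Â»': 0xC2 shows up as a stray 'Â'

# Reversing the two steps recovers the intended text.
repaired = misread.encode("latin-1").decode("utf-8")

# str.replace returns a new string, so the result has to be kept;
# calling it without using the return value changes nothing.
cleaned = misread.replace("\u00c2", "")

print(ascii(utf8_bytes), ascii(misread), ascii(repaired), ascii(cleaned))
```

If the data really is UTF-8 that was read as Latin-1, the round trip in `repaired` is probably safer than deleting the Â characters, since it also restores any other non-ASCII characters.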

This is from idle on windows, which seems to display latin-1 source ok:
 ----
 >>> "Latin-1:Â»\n".decode('latin-1')
 u'Latin-1:\xc2\xbb\n'
 >>> "Latin-1:Â»\n".decode('latin-1').encode('cp437', 'replace')
 'Latin-1:?\xaf\n'
 >>> "Latin-1:Â»\n".decode('latin-1').encode('cp437', 'ignore')
 'Latin-1:\xaf\n'
 >>> u'Latin-1:\xc2\xbb\n'.encode('cp437','replace')
 'Latin-1:?\xaf\n'
 >>> 
 ----
Now this is in an NT4 console window with code page 437:

 ----
 >>> u'Latin-1:\xc2\xbb\n'.encode('cp437','replace')
 'Latin-1:?\xaf\n'
 >>> import sys
 >>> sys.stdout.write(u'Latin-1:\xc2\xbb\n'.encode('cp437','replace'))
 Latin-1:?»
 ----

Notice that the interactive prompt echoes the repr of the result, which is why
the byte appears as the escape \xaf. The character itself is there and displays
as » when written un-repr'd via sys.stdout.write.
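
The same distinction can be sketched in Python 3 syntax (an assumption, since the sessions here are Python 2). An in-memory `io.BytesIO` stands in for the cp437 console, and the names are illustrative only.

```python
import io

# Encode the same text as in the sessions above.
encoded = "Latin-1:\u00c2\u00bb\n".encode("cp437", "replace")

# What an interactive prompt would echo: the repr, with \xaf escaped.
echoed = repr(encoded)

# Stand-in for a cp437 console: the raw bytes are written as-is,
# and a cp437 display would render byte 0xAF as '»'.
console = io.BytesIO()
console.write(encoded)
on_screen = console.getvalue().decode("cp437")

print(echoed)
print(ascii(on_screen))
```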

For the heck of it:

 >>> sys.stdout.write(u'Latin-1:\xc2\xbb\n'.encode('cp437','xmlcharrefreplace'))
 Latin-1:&#194;»

I don't know if this is going to get through to your screen ;-)
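
For comparison, here are the error handlers used above side by side, as a rough sketch in Python 3 syntax (an assumption; the sessions above are Python 2). cp437 has a slot for » (0xAF) but none for Â, so only the Â is affected.

```python
text = "Latin-1:\u00c2\u00bb\n"   # same text as u'Latin-1:\xc2\xbb\n'

# The default 'strict' handler refuses to encode the unmappable character.
try:
    text.encode("cp437")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

replaced = text.encode("cp437", "replace")            # 'Â' becomes '?'
ignored = text.encode("cp437", "ignore")              # 'Â' is dropped
xmlref = text.encode("cp437", "xmlcharrefreplace")    # 'Â' becomes '&#194;'

for name, value in [("replace", replaced), ("ignore", ignored),
                    ("xmlcharrefreplace", xmlref)]:
    print(name, repr(value))
```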

Regards,
Bengt Richter
-- 
http://mail.python.org/mailman/listinfo/python-list
