Re: [2.5.1] ShiftJIS to Unicode?

Mark Tolonen Thu, 27 Nov 2008 10:00:33 -0800

"Gilles Ganault" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]

On Thu, 27 Nov 2008 01:00:28 +0000, MRAB <[EMAIL PROTECTED]>
wrote:

No problem here:


>>> import urllib
>>> data = urllib.urlopen("http://www.amazon.co.jp/").read()
>>> decoded_data = data.decode("shift-jis")
>>>

This is correct. You should read in the whole page and convert it to Unicode immediately.
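Here is a minimal Python 3 sketch of "decode first, then process" (not the poster's code; the thread itself uses Python 2.5). It assumes the page is Shift-JIS, and `TITLE_RE` and `extract_title` are hypothetical stand-ins for the poster's `firsttry` pattern and surrounding code, which the thread does not show. The bytes are decoded once, where they enter the program, and the regular expression then runs on the decoded `str`.

```python
import re

# Hypothetical pattern standing in for the poster's ``firsttry`` regex,
# which the thread does not show.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(raw_page: bytes, encoding: str = "shift_jis") -> str:
    """Decode the raw bytes once, then search the decoded (Unicode) text."""
    text = raw_page.decode(encoding)  # decode at the boundary, not after searching
    match = TITLE_RE.search(text)
    # group(1) is already a str here, so no further .decode() is needed.
    return match.group(1).strip() if match else ""


if __name__ == "__main__":
    # Simulate a page fetched from the network: Shift-JIS encoded bytes.
    page = "<html><title> 本のタイトル </title></html>".encode("shift_jis")
    print(ascii(extract_title(page)))  # ascii() avoids console encoding issues
```

Because the search runs on decoded text, every later step (stripping, storing, printing) works with Unicode strings only.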


Thanks, but it seems like some pages contain ShiftJIS mixed with some
other code page, and Python complains when trying to display this. I
ended up not displaying the string, and just sending it directly to
the database:

========
title = None
m = firsttry.search(the_page)
if m:
    try:
        title = m.group(1).decode('shift-jis').strip()

You should not search the raw data and decode it later... decode the data when first brought into the program and do all processing in Unicode.

    except UnicodeEncodeError:
        title = m.group(1).decode('iso8859-1').strip()
    except:
        title = ""
else:
    m = secondtry.search(the_page)
    if m:
        try:
            title = m.group(1).decode('shift-jis').strip()
        except UnicodeEncodeError:
            title = m.group(1).decode('iso8859-1').strip()
        except:
            title = ""
    else:
        print "Nothing found for ISBN %s" % isbn

if title:
    #UnicodeEncodeError: 'charmap' codec can't encode characters in position 49-55: character maps to <undefined>
    #print "Found : %s" % title
    print "Found stuff"

Note here that you are getting an "encode" error. When trying to print the data, Python will try to encode the Unicode data using the terminal's default encoding, which I suspect is not Shift-JIS.
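A minimal Python 3 sketch of that failure and one way around it. It assumes a Western-European Windows console code page such as cp1252, which Python implements as a 'charmap' codec and which produces the same kind of message quoted in the code comment above. `printable` is a hypothetical helper that escapes characters the console cannot encode instead of raising.

```python
import sys
from typing import Optional


def printable(text: str, encoding: Optional[str] = None) -> str:
    """Return *text* with characters the target encoding cannot represent
    replaced by backslash escapes, so printing it cannot raise UnicodeEncodeError."""
    encoding = encoding or sys.stdout.encoding or "ascii"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


if __name__ == "__main__":
    title = "日本語の本"
    try:
        # cp1252 has no Japanese characters, so this raises UnicodeEncodeError.
        title.encode("cp1252")
    except UnicodeEncodeError as exc:
        print("encode failed:", exc.reason)
    # Escaped output such as \u65e5... instead of a crash.
    print("Found : %s" % printable(title, "cp1252"))
```

The data itself is fine; only the conversion for display fails, which is why sending the Unicode string straight to the database works.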


-Mark


sql = 'INSERT INTO books (title) VALUES (?)'
cursor.execute(sql,(title,))
========
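One detail in the quoted code: decoding bytes (`str.decode` in Python 2, `bytes.decode` in Python 3) reports undecodable input with `UnicodeDecodeError`, not `UnicodeEncodeError`, so the `except UnicodeEncodeError:` branches never handle a decoding failure; the bare `except:` catches it instead and the title becomes "". Also, ISO-8859-1 maps every byte to some character, so decoding with it never fails; for Japanese text it silently yields mojibake. The Python 3 sketch below tries candidate encodings in order and catches the right exception. The default list is an assumption: cp932 (Microsoft's extension of Shift-JIS) is included because pages labelled Shift-JIS sometimes use its extra characters, which may or may not explain the "mixed code page" pages mentioned above. `decode_with_fallback` is a hypothetical helper name.

```python
from typing import Iterable


def decode_with_fallback(raw: bytes,
                         encodings: Iterable[str] = ("shift_jis", "cp932")) -> str:
    """Decode *raw* with the first candidate encoding that succeeds."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:  # decoding failures raise UnicodeDecodeError
            continue
    # Last resort (an assumption, not from the thread): decode as Shift-JIS and
    # substitute U+FFFD for undecodable bytes instead of discarding the title.
    return raw.decode("shift_jis", errors="replace")


if __name__ == "__main__":
    sample = "新しい本".encode("shift_jis")
    print(ascii(decode_with_fallback(sample)))
```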

Thank you
--
http://mail.python.org/mailman/listinfo/python-list


