On Thu, 27 Nov 2008 01:00:28 +0000, MRAB <[EMAIL PROTECTED]>
wrote:
>No problem here:
>
> >>> import urllib
> >>> data = urllib.urlopen("http://www.amazon.co.jp/").read()
> >>> decoded_data = data.decode("shift-jis")
> >>>

Thanks, but it seems that some pages contain Shift-JIS mixed with some
other encoding, and Python raises a UnicodeEncodeError when I try to
print the decoded string. I ended up not displaying the string, and
just sending it directly to the database:

========
title = None
m = firsttry.search(the_page)
if m:
        try:
                title = m.group(1).decode('shift-jis').strip()
        except UnicodeDecodeError:  # decode() raises UnicodeDecodeError
                title = m.group(1).decode('iso8859-1').strip()
        except:
                title = ""
else:
        m = secondtry.search(the_page)
        if m:
                try:
                        title = m.group(1).decode('shift-jis').strip()
                except UnicodeDecodeError:  # decode() raises UnicodeDecodeError
                        title = m.group(1).decode('iso8859-1').strip()
                except:
                        title = ""
        else:
                print "Nothing found for ISBN %s" % isbn

if title:
        #UnicodeEncodeError: 'charmap' codec can't encode characters in
        #position 49-55: character maps to <undefined>
        #print "Found : %s" % title
        print "Found stuff"

sql = 'INSERT INTO books (title) VALUES (?)'
cursor.execute(sql,(title,))
========
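
For what it's worth, here is a rough sketch of how the decode and the print could be handled separately. The helper names decode_title and printable are made up for illustration, and trying "cp932" (Microsoft's superset of Shift-JIS) as a second encoding is only a guess at what those pages might contain. Note that an iso8859-1 fallback never raises, since every byte maps to some character, so it can silently produce garbage; the sketch falls back to a lossy Shift-JIS decode instead.

```python
import sys

def decode_title(raw, encodings=("shift-jis", "cp932")):
    """Decode a byte string, trying each encoding strictly in turn.

    If none of them decodes cleanly, fall back to Shift-JIS with
    undecodable bytes replaced by U+FFFD instead of raising.
    """
    for enc in encodings:
        try:
            return raw.decode(enc).strip()
        except UnicodeDecodeError:
            continue
    return raw.decode("shift-jis", "replace").strip()

def printable(text, encoding=None):
    """Return text with characters the console can't encode turned into '?'."""
    # sys.stdout.encoding can be None (e.g. when output is piped).
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, "replace").decode(encoding)
```

With something like this, the title stored in the database stays the full decoded string, and only the copy passed to print is degraded, e.g. print("Found : %s" % printable(title)).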

Thank you
--
http://mail.python.org/mailman/listinfo/python-list
