On Thu, 27 Nov 2008 01:00:28 +0000, MRAB <[EMAIL PROTECTED]> wrote:

>No problem here:
>
> >>> import urllib
> >>> data = urllib.urlopen("http://www.amazon.co.jp/").read()
> >>> decoded_data = data.decode("shift-jis")
> >>>
Thanks, but it seems that some pages contain Shift-JIS mixed with some other code page, and Python raises an error when I try to print the result. I ended up not displaying the string and just sending it directly to the database:

========
title = None
m = firsttry.search(the_page)
if m:
    try:
        title = m.group(1).decode('shift-jis').strip()
    # str.decode() raises UnicodeDecodeError (not UnicodeEncodeError)
    # on bytes that are not valid Shift-JIS; fall back to Latin-1
    except UnicodeDecodeError:
        title = m.group(1).decode('iso8859-1').strip()
    except:
        title = ""
else:
    m = secondtry.search(the_page)
    if m:
        try:
            title = m.group(1).decode('shift-jis').strip()
        except UnicodeDecodeError:
            title = m.group(1).decode('iso8859-1').strip()
        except:
            title = ""
    else:
        print "Nothing found for ISBN %s" % isbn

if title:
    # Printing fails on the console with:
    # UnicodeEncodeError: 'charmap' codec can't encode characters in position 49-55: character maps to <undefined>
    #print "Found : %s" % title
    print "Found stuff"
    sql = 'INSERT INTO books (title) VALUES (?)'
    cursor.execute(sql, (title,))
========

Thank you
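For anyone reading this on Python 3, here is a minimal sketch of the same fallback idea factored into one helper, plus a way to print the title without the 'charmap' error. The names decode_with_fallback and printable are hypothetical. Trying cp932 (Microsoft's extended Shift-JIS) before Latin-1 is my own assumption: a page that looks like Shift-JIS "mixed with some other code page" may really be cp932, which adds characters such as circled digits that a strict shift_jis decoder rejects.

```python
import sys


def decode_with_fallback(raw, encodings=("shift_jis", "cp932"), last_resort="iso8859-1"):
    """Decode bytes with the first codec in `encodings` that accepts them.

    If every codec raises UnicodeDecodeError, decode with `last_resort`.
    iso8859-1 maps all 256 byte values, so that last step never raises,
    but the result may be mojibake rather than real Japanese text.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this codec; try the next one
    return raw.decode(last_resort)


def printable(text, console_encoding=None):
    """Return `text` with characters the console cannot show replaced by '?'.

    Printing such characters directly is what raises the
    "'charmap' codec can't encode characters" error on some consoles.
    """
    encoding = console_encoding or sys.stdout.encoding or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)


if __name__ == "__main__":
    raw = "日本語".encode("shift_jis")  # hypothetical scraped bytes
    title = decode_with_fallback(raw)
    print("Found : %s" % printable(title))
```

The database would still receive the full Unicode `title`; only the copy shown on the console is lossy.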