On May 25, 6:07 pm, Paul Boddie <p...@boddie.org.uk> wrote:
> On 25 May, 17:39, someone <petshm...@googlemail.com> wrote:
> > Hi,
> >
> > reading content of webpage (encoded in utf-8) with urllib2, I can't
> > get parsed data into DB
> >
> > Exception:
> >
> >   File "/usr/lib/python2.5/site-packages/pyPgSQL/PgSQL.py", line 3111, in execute
> >     raise OperationalError, msg
> > libpq.OperationalError: ERROR: invalid UTF-8 byte sequence detected
> > near byte 0xe4
> >
> > I've already checked several python unicode tutorials, but I have no
> > idea how to solve my problem.
>
> With pyPgSQL, there are a few tricks that you have to take into
> account:
>
> 1. With PostgreSQL, it would appear advantageous to create databases
> using the "-E unicode" option.

Hi, the DB is in UTF8.

> 2. When connecting, use the client_encoding and unicode_results
> arguments for the connect function call:
>
>   connection = PgSQL.connect(client_encoding="utf-8",
>                              unicode_results=1)

If I do unicode_results=1, then there are exceptions in other places,
e.g. urllib.urlencode(values) can't encode the values.

> 3. After connecting, it appears necessary to set the client encoding
> explicitly:
>
>   connection.cursor().execute("set client_encoding to unicode")

I've tried this as well, but I still get exceptions.

> I'd appreciate any suggestions which improve on the above, but what
> this should allow you to do is to present Unicode objects to the
> database and to receive such objects from queries. Whether you can
> relax this and pass UTF-8-encoded strings instead of Unicode objects
> is not something I can guarantee, but it's usually recommended that
> you manipulate Unicode objects in your program where possible, and
> here you should be able to let pyPgSQL deal with the encodings
> preferred by the database.

Thanks for your suggestions! Sadly, I can't solve my problem...

Pet

> Paul

--
http://mail.python.org/mailman/listinfo/python-list
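
The advice quoted above (work with Unicode objects inside the program and
convert only at the boundaries) can be sketched roughly as follows. This is
only an illustration, not code from the thread: to_unicode and encode_values
are hypothetical helpers, and the fallback to ISO-8859-1 rests on the
assumption that the fetched page is not really UTF-8 (a lone 0xe4 byte is
"ä" in ISO-8859-1), which nobody in the thread has confirmed. Encoding the
values back to UTF-8 byte strings before calling urlencode is one way around
the failure that Python 2's urllib.urlencode hits on non-ASCII Unicode values.

```python
try:
    from urllib import urlencode          # Python 2, as used in the thread
except ImportError:
    from urllib.parse import urlencode    # Python 3

TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def to_unicode(raw, encodings=("utf-8", "iso-8859-1")):
    """Decode raw bytes, trying each candidate encoding in order.

    Hypothetical helper: the candidate list is an assumption, not
    something taken from the thread.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of %r could decode the data" % (encodings,))


def encode_values(values, encoding="utf-8"):
    """Return a copy of values with Unicode values encoded to byte strings.

    Hypothetical helper: it leaves non-text values untouched so that
    urlencode only ever sees byte strings or plain ASCII-safe objects.
    """
    encoded = {}
    for key, value in values.items():
        if isinstance(value, TEXT_TYPE):
            value = value.encode(encoding)
        encoded[key] = value
    return encoded


# Bytes as they might come back from the web page (assumed ISO-8859-1 here).
page_bytes = b"M\xe4dchen"
text = to_unicode(page_bytes)          # Unicode object for use in the program
query = urlencode(encode_values({"q": text}))
```

Under these assumptions, the decoded Unicode object is what would be passed
on to the database layer, while the byte-string copy of the dictionary is
used only for urlencode.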