On Mon, 18 Feb 2008 22:24:56 -0800 (PST), J Peyret wrote
> [...]
> You are right, I am confused about unicode. Guilty as charged.

You should read http://www.amk.ca/python/howto/unicode to clear up some of your confusion.

> [...]
> Also doesn't help that I am not sure what encoding is used in the
> data file that I'm using.

That is, incidentally, the direct cause of the error message below.

> [...]
> <class 'psycopg2.ProgrammingError'>
> invalid byte sequence for encoding "UTF8": 0x92
> HINT: This error can also happen if the byte sequence does not match
> the encoding expected by the server, which is controlled by
> "client_encoding".

What this error message means is that you've given the database a byte string in an unknown encoding, but you're pretending (by default, i.e. by not telling the database otherwise) that the string is utf-8 encoded. The database is encountering a byte that should never appear in a valid utf-8 encoded byte string, so it's raising this error, because your string is meaningless as utf-8 encoded text. This is not surprising, since you don't know the encoding of the string. Well, now we know it's not utf-8.

> column is a varchar(2000) and the "guilty characters" are those used
> in my posting.

I doubt that. The error message is complaining about a byte with the value 0x92. That byte appeared nowhere in the string you posted, so the error message must have been caused by a different string.

Now for the solution of your problem: If you don't care what the encoding of your byte string is and you simply want to treat it as binary data, you should use client_encoding "latin-1" or "iso8859_1" (they're different names for the same thing). Since latin-1 simply maps the bytes 0 to 255 to unicode code points 0 to 255, you can store any byte string in the database, and get the same byte string back from the database. (The same is not true for utf-8 since not every random string of bytes is a valid utf-8 encoded string.)

Hope this helps,

--
Carsten Haese
http://informixdb.sourceforge.net
--
http://mail.python.org/mailman/listinfo/python-list
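
The two encoding claims in the reply above (latin-1 round-trips every byte, while a stray 0x92 is not valid utf-8) can be checked without a database. The following is a minimal sketch in present-day Python 3 rather than the Python of the original thread, and the `sample` byte string is hypothetical, made up only to contain a 0x92 byte. The cp1252 line is a guess at one common source of such a byte, not something the thread establishes.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# latin-1 maps byte N to code point N, so decoding cannot fail and
# encoding the decoded text gives back exactly the original bytes.
text = all_bytes.decode("latin-1")
round_trip_ok = text.encode("latin-1") == all_bytes
code_points_match = [ord(ch) for ch in text] == list(range(256))

# Hypothetical data containing the byte from the error message.
sample = b"It\x92s"

# 0x92 has the bit pattern of a utf-8 continuation byte, but here it
# follows a plain ASCII byte, so a strict utf-8 decoder rejects it.
try:
    sample.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Assumption: if the file came from a Windows tool, cp1252 is a plausible
# candidate; there, 0x92 is U+2019 (right single quotation mark).
cp1252_guess = sample.decode("cp1252")

print(round_trip_ok, code_points_match, utf8_ok, repr(cp1252_guess))
```

The same round-trip reasoning is why a latin-1 client encoding lets arbitrary bytes through unchanged: no byte value is invalid, so nothing is rejected, but nothing is interpreted correctly either unless the data really is latin-1.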