> You have to know the original encoding (I mean, the one used for the csv
> file), else there's nothing you can do. Then it's just a matter of
> decoding (to unicode) then encoding (to utf8), ie (if your source is in
> latin1):
>
> utf_string = latin1_string.decode("latin1").encode("utf8")
The OP mentioned using 'pgdb', which I assumed to mean he is using
PostgreSQL and the PygreSQL DB module. If that is the case, then
PostgreSQL has an optional parameter called 'client_encoding'. If this is
set within the postgres db, or as part of the db session, the db will
accept the incoming data 'as is' and do the conversion internally, saving
this step and giving a bit of a performance boost; the client (python)
application doesn't need to be concerned about it at all.

As you so correctly point out Bruno, you do need to know the original
encoding. My comments above just simplify the db update process.

This part of the manual might be helpful:
http://www.postgresql.org/docs/8.1/static/multibyte.html

If 'pgdb' != PostgreSQL then please accept my apologies for this
intrusion in this thread.

g.
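P.S. Just to make the idea concrete, here's a rough, untested sketch.
The connection details, table name ('people') and column are all made
up, and I'm assuming PygreSQL's DB-API module (pgdb) as the driver:

    import pgdb

    # made-up connection details -- adjust for your own setup
    con = pgdb.connect(database="mydb")
    cur = con.cursor()

    # tell the server what encoding the client is sending; postgres
    # then converts to the server-side encoding (e.g. UTF8) itself,
    # so the python code never has to decode/encode anything
    cur.execute("SET client_encoding TO 'LATIN1'")

    # latin1 bytes straight from the csv file, passed through untouched
    latin1_bytes = "caf\xe9"   # 'cafe' with an accented e, in latin1
    cur.execute("INSERT INTO people (name) VALUES (%(name)s)",
                {"name": latin1_bytes})
    con.commit()
    con.close()

Note that SET only lasts for the current session, so it has to be
issued again on each new connection.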