> You have to know the original encoding (I mean, the one used for the csv
> file), else there's nothing you can do. Then it's just a matter of
> decoding (to unicode) then encoding (to utf8), ie (if your source is in
> latin1):
>
> utf_string = latin1_string.decode("latin1").encode("utf8")
The OP mentioned using 'pgdb', which I assumed to mean he is using
PostgreSQL and the PygreSQL DB module. If that is the case, then
PostgreSQL has an optional parameter called 'client_encoding'. If this is
set within the postgres db, or as part of the db session, the db will
accept the incoming data 'as is' and do the conversion internally, saving
this step and giving a bit of a performance boost; the client (python)
application doesn't need to be concerned about it at all.

As you so correctly point out Bruno, you do need to know the original
encoding. My comments above just simplify the db update process.

This part of the manual might be helpful:
http://www.postgresql.org/docs/8.1/static/multibyte.html

If 'pgdb' != PostgreSQL then please accept my apologies for this
intrusion in this thread.

g.
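P.S. Just to make the idea concrete, here's a rough, untested sketch.
The connection details, table name ('people') and column are all made
up, and I'm assuming PygreSQL's DB-API module (pgdb) as the driver:

    import pgdb

    # made-up connection details -- adjust for your own setup
    con = pgdb.connect(database="mydb")
    cur = con.cursor()

    # tell the server what encoding the client is sending; postgres
    # then converts to the server-side encoding (e.g. UTF8) itself,
    # so the python code never has to decode/encode anything
    cur.execute("SET client_encoding TO 'LATIN1'")

    # latin1 bytes straight from the csv file, passed through untouched
    latin1_bytes = "caf\xe9"   # 'cafe' with an accented e, in latin1
    cur.execute("INSERT INTO people (name) VALUES (%(name)s)",
                {"name": latin1_bytes})
    con.commit()
    con.close()

Note that SET only lasts for the current session, so it has to be
issued again on each new connection.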