On 2018-03-24 11:21:09 +1100, Chris Angelico wrote:
> On Sat, Mar 24, 2018 at 11:11 AM, Steven D'Aprano
> <steve+comp.lang.pyt...@pearwood.info> wrote:
> > On Fri, 23 Mar 2018 07:46:16 -0700, Tobiah wrote:
> >> If I changed my database tables to all be UTF-8 would this work cleanly
> >> without any decoding?
> >
> > Not reliably or safely. It will appear to work so long as you have only
> > pure ASCII strings from the database, and then crash when you don't:
> >
> > py> text_from_database = u"hello wörld".encode('latin1')
> > py> print text_from_database
> > hello w�rld
[...]
> 
> If the database has been configured to use UTF-8 (as mentioned, that's
> "utf8mb4" in MySQL), you won't get that byte sequence back. You'll get
> back valid UTF-8.

Actually (with Python 3 and mysql.connector), you'll get back str values,
not bytes encoded in UTF-8 or Latin-1. You don't have to decode
them because the driver has already done it.
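
A quick way to see the "driver already decoded it" behaviour without a
MySQL server at hand is a minimal sketch with the standard library's
sqlite3 module. Note that this is a stand-in, not mysql.connector: the
table name and the sample text are made up for illustration, and the
only point is that a Python 3 DB-API driver hands back str for a text
column.

```python
import sqlite3

# Stand-in driver: sqlite3 needs no server. The table name "greetings"
# and the sample text are hypothetical, chosen only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE greetings (body TEXT)")
conn.execute("INSERT INTO greetings (body) VALUES (?)", ("hello wörld",))

row = conn.execute("SELECT body FROM greetings").fetchone()
value = row[0]

# The driver returns a str; there is nothing left to decode.
print(type(value), value)
conn.close()
```

With mysql.connector the cursor calls look much the same, but I haven't
included them here since they need a running server to try out.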

So as a Python programmer, you don't care what character set the
database uses internally, as this is almost completely hidden from you.
(The one aspect that isn't hidden is of course the set of characters
that you can store in a character field: obviously, you can't store
Chinese characters in a latin1 field.)
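
The limitation can be illustrated in plain Python with the codecs
machinery. This is only an analogy: the helper name fits_in_charset is
hypothetical, and Python's "latin-1" codec is not guaranteed to match
the server's latin1 character set byte for byte. It does show why some
text simply has no representation in a single-byte charset.

```python
def fits_in_charset(text, charset):
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True

print(fits_in_charset("hello wörld", "latin-1"))  # True
print(fits_in_charset("你好", "latin-1"))          # False
print(fits_in_charset("你好", "utf-8"))            # True
```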

If you are using Python 2, manual encoding and decoding may be necessary.
(AFAICS the OP still hasn't stated which Python version he uses.)
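
For the case where a driver does hand back raw bytes, manual decoding
could look roughly like the sketch below. It is written in Python 3
syntax, the helper name decode_column is hypothetical, and it assumes
you know which character set the connection uses. Decoding with the
wrong one either fails or silently produces garbage.

```python
def decode_column(raw, charset):
    """Turn bytes from a driver into str; pass str values through untouched."""
    if isinstance(raw, bytes):
        return raw.decode(charset)
    return raw

utf8_bytes = "hello wörld".encode("utf-8")
latin1_bytes = "hello wörld".encode("latin-1")

# Decoding with the matching charset recovers the original text.
print(decode_column(utf8_bytes, "utf-8"))
print(decode_column(latin1_bytes, "latin-1"))

# Latin-1 bytes are not valid UTF-8 here, so this decode fails loudly.
try:
    decode_column(latin1_bytes, "utf-8")
except UnicodeDecodeError as exc:
    print("wrong charset:", exc.reason)
```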

        hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | h...@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>

-- 
https://mail.python.org/mailman/listinfo/python-list
