Re: Convert a list with wrong encoding to utf8

Gregory Ewing Thu, 14 Feb 2019 23:35:26 -0800

[email protected] wrote:

I just tried:


names = tuple( [s.encode('latin1').decode('utf8') for s in names] )

but I get
UnicodeEncodeError('latin-1', 'Άκης Τσιάμης', 0, 4, 'ordinal not in range(256)')


This suggests that the string you're getting from the database *has*
already been correctly decoded, and there is no need to go through the
latin1 re-coding step.
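
To illustrate the reasoning, here is a minimal sketch using the name from
the error message. The variable names are my own, and the mojibake string is
simulated rather than taken from your database:

```python
# Text that has already been decoded correctly (taken from the error message).
good = 'Άκης Τσιάμης'

# Simulated mojibake: the UTF-8 bytes of the name wrongly decoded as Latin-1.
mojibake = good.encode('utf8').decode('latin1')

# The latin1 re-coding step repairs genuine mojibake ...
repaired = mojibake.encode('latin1').decode('utf8')

# ... but it fails on correctly decoded text, because Greek letters have
# code points above 255 and so cannot be encoded as Latin-1.
try:
    good.encode('latin1')
    recoding_failed = False
except UnicodeEncodeError:
    recoding_failed = True

print(repaired == good, recoding_failed)
```

So a UnicodeEncodeError at the encode('latin1') step is a sign that the
string already contains real Greek characters.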

What do you get if you do

   print(names)

immediately *before* trying to re-code them?
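
If the terminal's own encoding might be muddying what you see, printing
each string with ascii() shows every non-ASCII character as an escape
instead. A small sketch, with a made-up names tuple standing in for whatever
your query actually returns:

```python
# Hypothetical data: one correctly decoded name, and the same name as
# mojibake (its UTF-8 bytes decoded as Latin-1).
names = (
    'Άκης',
    'Άκης'.encode('utf8').decode('latin1'),
)

# ascii() escapes all non-ASCII characters, so the result does not depend
# on how the terminal renders Greek text.
shown = [ascii(s) for s in names]
for line in shown:
    print(line)

# Prints:
# '\u0386\u03ba\u03b7\u03c2'
# '\xce\x86\xce\xba\xce\xb7\xcf\x82'
```

Escapes in the \u03xx range are real Greek letters; runs of \xce and \xcf
followed by another \x escape are the typical look of UTF-8 bytes that were
decoded as Latin-1.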

What *may* be happening is that most of your data is stored in the
database encoded as utf-8, but some of it is actually using a different
encoding, and you're getting confused by the resulting inconsistencies.

I suggest you look carefully at *all* the names in the list, straight
after getting them from the database. If some of them look okay and
some of them look like mojibake, then you have bad data in the database
in the form of inconsistent encodings.
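
One rough way to check the whole list is sketched below. The helper names
looks_like_mojibake and repair are my own invention, and the test is only a
heuristic: it assumes the bad entries are UTF-8 bytes that were decoded as
Latin-1, which may not be what actually happened to your data.

```python
def looks_like_mojibake(s):
    """Heuristic: True if s round-trips through latin1 -> utf8 to different text."""
    try:
        fixed = s.encode('latin1').decode('utf8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either s has characters outside Latin-1 (so it was decoded properly),
        # or its Latin-1 bytes are not valid UTF-8 (so it is not this kind of
        # mojibake).
        return False
    # Plain ASCII round-trips unchanged and is not counted as mojibake.
    return fixed != s


def repair(names):
    """Re-code only the entries that look like mojibake; leave the rest alone."""
    return tuple(
        s.encode('latin1').decode('utf8') if looks_like_mojibake(s) else s
        for s in names
    )


# Hypothetical mixed data: one good name, one mojibake copy, one ASCII name.
names = ('Άκης', 'Άκης'.encode('utf8').decode('latin1'), 'John')
for s in names:
    print(ascii(s), looks_like_mojibake(s))

print(repair(names) == ('Άκης', 'Άκης', 'John'))
```

If this flags some entries and not others, that points to inconsistent
encodings in the stored data. Fixing the data at the source is likely to be
more reliable than patching it up on every read.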

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list
