On Fri, 18 Aug 2017 10:14 am, John Nagle wrote:

>      I'm cleaning up some data which has text description fields from
> multiple sources. Some are in UTF-8. Some are in WINDOWS-1252.
> And some are in some other character set. So I have to examine and
> sanity check each field in a database dump, deciding which character
> set best represents what's there.
> 
>     Here's a hard case:
> 
>   g1 = bytearray(b'\\"Perfect Gift Idea\\"\x9d Each time')

py> unicodedata.name(b'\x9d'.decode('macroman'))
'LATIN SMALL LETTER U WITH GRAVE'

Doesn't seem too likely.

This may help:

http://i18nqa.com/debug/bug-double-conversion.html


There's always the possibility that it's just junk, or mojibake from some other
source, so it might not correspond to anything sensible in any extended-ASCII
character set.
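One way to narrow it down is to try each candidate codec on the problem byte and
see which ones even decode it. A minimal sketch (Python 3; the codec list is
just illustrative, not exhaustive):

```python
import unicodedata

sample = b'\\"Perfect Gift Idea\\"\x9d Each time'
pos = sample.index(0x9D)  # position of the suspect byte

for codec in ('utf-8', 'cp1252', 'macroman', 'latin-1'):
    try:
        text = sample.decode(codec)
    except UnicodeDecodeError as e:
        # utf-8 rejects a lone continuation byte; cp1252 leaves 0x9D unassigned
        print(codec, '->', 'FAILED:', e.reason)
    else:
        # all the codecs here that succeed are single-byte, so the
        # index into the bytes matches the index into the text
        ch = text[pos]
        print(codec, '->', repr(ch), unicodedata.name(ch, '<unnamed>'))
```

Note that Python's cp1252 codec raises in strict mode on 0x9D, while latin-1
silently maps it to the C1 control character U+009D, which has no name; neither
result looks any more plausible than MacRoman's ù here.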




-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list