On Thu, Aug 17, 2017 at 8:15 PM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> On 2017-08-18 01:53, Chris Angelico wrote:
>> So here's an insane theory: something attempted to lower-case the byte
>> stream as if it were ASCII. If you ignore the high bit, 0xC5 looks
>> like 0x45 or "E", which lower-cases by having 32 added to it, yielding
>> 0xE5. Reversing this transformation yields sane data for several of
>> your strings - they then decode as UTF-8:
>>
>> miguel Ángel santos
>
>
> I think that's:
>
> miguel ángel santos

It would be if it had been lower-cased correctly. The UTF-8 for á is
\xc3\xa1, not \xe3\x81 (ironically the add-32 method still works in
this particular case; it was just added to the wrong byte).
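
For anyone who wants to poke at the theory, here is a minimal sketch of
it. The function names are made up for illustration, and the "buggy"
lower-casing is only a guess at what the unknown software did, not a
known implementation. The undo step is a heuristic: it would also mangle
legitimate bytes in the 0xE1-0xFA range, such as the lead bytes of
three-byte UTF-8 sequences.

```python
def buggy_ascii_lower(data: bytes) -> bytes:
    """Hypothetical model of the bug: ignore the high bit when deciding
    whether a byte is an ASCII capital (0x41-0x5A), then add 32 to it."""
    return bytes(b + 0x20 if 0x41 <= (b & 0x7F) <= 0x5A else b for b in data)


def undo_high_bit_damage(data: bytes) -> bytes:
    """Heuristic reversal: subtract 32 from high-bit bytes that the model
    above could have produced (0xE1-0xFA). Real ASCII letters are left
    lower-cased, since their original case cannot be recovered."""
    return bytes(b - 0x20 if 0xE1 <= b <= 0xFA else b for b in data)


original = "Miguel Ángel Santos".encode("utf-8")
damaged = buggy_ascii_lower(original)
repaired = undo_high_bit_damage(damaged)

print(damaged)                    # b'miguel \xe3\x81ngel santos'
print(repaired.decode("utf-8"))   # miguel Ángel santos

# A correct lower-casing adds 32 to the continuation byte instead:
print("miguel ángel santos".encode("utf-8"))  # b'miguel \xc3\xa1ngel santos'
```

The damaged bytes are no longer valid UTF-8 (0xE3 announces a three-byte
sequence, but "n" is not a continuation byte), which fits data that
refuses to decode until the transformation is reversed.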
-- 
https://mail.python.org/mailman/listinfo/python-list
