On Thu, Aug 17, 2017 at 8:15 PM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> On 2017-08-18 01:53, Chris Angelico wrote:
>> So here's an insane theory: something attempted to lower-case the byte
>> stream as if it were ASCII. If you ignore the high bit, 0xC5 looks
>> like 0x45 or "E", which lower-cases by having 32 added to it, yielding
>> 0xE5. Reversing this transformation yields sane data for several of
>> your strings - they then decode as UTF-8:
>>
>> miguel Ángel santos
>
>
> I think that's:
>
> miguel ángel santos

It would be if it had been lower-cased correctly. The UTF-8 for á is \xc3\xa1, not \xe3\x81 (ironically, the add-32 method still works in this particular case; it was just added to the wrong byte).
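
For anyone who wants to poke at this, here is a minimal Python 3 sketch of the theory. The function name ascii_lower_ignoring_high_bit is made up for illustration, and the masking logic is only a guess at what the unknown tool did, not a reconstruction of its actual code.

```python
def ascii_lower_ignoring_high_bit(data: bytes) -> bytes:
    """Hypothetical model of the bug: mask off the high bit of each byte,
    and if the result looks like ASCII 'A'-'Z', add 32 to the original byte."""
    out = bytearray()
    for b in data:
        if 0x41 <= (b & 0x7F) <= 0x5A:
            b += 0x20  # at most 0xDA + 0x20 = 0xFA, so still fits in a byte
        out.append(b)
    return bytes(out)


upper = "\u00c1".encode("utf-8")   # "Á" -> b'\xc3\x81'
lower = "\u00e1".encode("utf-8")   # "á" -> b'\xc3\xa1'
mangled = ascii_lower_ignoring_high_bit(upper)

print(upper, lower, mangled)

# The bogus lower-casing adds 32 to the lead byte (0xC3 -> 0xE3) ...
assert mangled == b"\xe3\x81"
# ... whereas correct lower-casing adds 32 to the continuation byte instead.
assert lower == bytes([upper[0], upper[1] + 0x20])
```

Under that assumption, Á comes out as \xe3\x81, while the correct á differs from Á only in the second byte (0x81 + 0x20 == 0xa1).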