On Thu, Aug 17, 2017 at 6:52 PM, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> On Thu, Aug 17, 2017 at 6:30 PM, John Nagle <na...@animats.com> wrote:
>> A few more cases:
>>
>> bytearray(b'miguel \xe3\x81ngel santos')
>
> If that were b'\xc3\x81' it would be Á in UTF-8 which would fit the
> rest of the name.
>
>> bytearray(b'\xe5\x81ukasz zmywaczyk')
>
> If that were b'\xc5\x81' it would be Ł in UTF-8 which would fit the
> rest of the name.
>
> I suspect the others contain similar errors. I don't know if it's the
> result of some form of Mojibake or maybe just transcription errors.

Oh shit, I think I know what happened. In ASCII you can lower-case
letters by just adding 32 (0x20) to them. Somebody tried to do that
here, applied it to the non-ASCII bytes too, and fucked up the
encoding: 0xC3 + 0x20 = 0xE3 and 0xC5 + 0x20 = 0xE5. That's why all
the ASCII letters in the strings are lower-case while these ones
aren't.
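
Here is a rough sketch that reproduces the damage. It is only a guess
at what the original code did: naive_lower is a hypothetical name, and
I am assuming it added 0x20 byte by byte to ASCII A-Z and to the
Latin-1 upper-case range 0xC0-0xDE without decoding the UTF-8 first.

```python
def naive_lower(data: bytes) -> bytes:
    """Hypothetical buggy lower-casing that works on raw bytes.

    Assumption: 0x20 is added to ASCII A-Z and to the Latin-1
    upper-case range 0xC0-0xDE (except 0xD7, the multiplication sign),
    with no regard for UTF-8 multi-byte sequences.
    """
    out = bytearray()
    for b in data:
        if 0x41 <= b <= 0x5A or (0xC0 <= b <= 0xDE and b != 0xD7):
            b += 0x20
        out.append(b)
    return bytes(out)


# UTF-8 encodings of the names as they presumably looked originally.
miguel = "Miguel Ángel Santos".encode("utf-8")
lukasz = "Łukasz Zmywaczyk".encode("utf-8")

print(naive_lower(miguel))  # b'miguel \xe3\x81ngel santos'
print(naive_lower(lukasz))  # b'\xe5\x81ukasz zmywaczyk'
```

Both results match the byte strings John posted, which fits the
theory but doesn't prove it.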
-- 
https://mail.python.org/mailman/listinfo/python-list
