On Fri, Aug 18, 2017 at 4:38 PM, Paul Rubin <no.email@nospam.invalid> wrote:
> John Nagle <na...@animats.com> writes:
>> Since, as someone pointed out, there was UTF-8 which had been
>> run through an ASCII-type lower casing algorithm
>
> I spent a few minutes figuring out if some of the mysterious 0x81's
> could be from ASCII-lower-casing some Unicode combining characters, but
> the numbers didn't seem to work out. Might still be worth looking for
> in some other cases.

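They can't be from anything like that. Lower-casing in ASCII consists
of adding 32 (0x20) to the byte values for 'A' through 'Z' (0x41 to
0x5A), which is the same as setting bit 5 on them. Subtracting 32 from
0x81 gives 0x61, which is the lower-case letter 'a', not an upper-case
letter, so a lower-caser would never have added 32 to it. And bit 5
isn't set in 0x81, so even a lower-caser that blindly set that bit on
every byte couldn't have produced it. So there's no way that UTF-8 +
dumb lowercasing could give you 0x81.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list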
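
The arithmetic above can be checked by brute force over all 256 byte
values. This is only a minimal sketch: ascii_lower and dumb_lower are
hypothetical helpers written for illustration, assuming the "ASCII-type
lower casing" either adds 32 to 'A'-'Z' or blindly sets bit 5 on every
byte. Nothing in the thread confirms which algorithm was actually used.

```python
def ascii_lower(data: bytes) -> bytes:
    """Add 32 to the bytes for 'A'-'Z' (0x41-0x5A); leave all others alone."""
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in data)


def dumb_lower(data: bytes) -> bytes:
    """Cruder variant (an assumption): set bit 5 (0x20) on every byte."""
    return bytes(b | 0x20 for b in data)


# Which input bytes could each function turn into 0x81?
sources_careful = [b for b in range(256) if ascii_lower(bytes([b]))[0] == 0x81]
sources_dumb = [b for b in range(256) if dumb_lower(bytes([b]))[0] == 0x81]

print([hex(b) for b in sources_careful])  # only 0x81 itself maps to 0x81
print([hex(b) for b in sources_dumb])     # no byte maps to 0x81
```

Under either assumption, a 0x81 in the output has to have been a 0x81
in the input already; the lower-casing step cannot have created it.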