On Fri, Aug 18, 2017 at 4:24 PM, John Nagle <na...@animats.com> wrote:
> I'm coming around to the idea that some of these snippets
> have been previously mis-converted, which is why they make no sense.
> Since, as someone pointed out, there was UTF-8 which had been
> run through an ASCII-type lower-casing algorithm, that's a reasonable
> assumption. Thanks for looking at this, everyone. If a string won't
> parse as either UTF-8 or Windows-1252, I'm just going to convert the
> bogus stuff to the Unicode replacement character. I might remove
> 0x9d chars, since that never seems to affect readability.
That sounds like a good plan. Unless you can pin down a single coherent
encoding (even a broken one, like "UTF-8, then add 32 to everything
between 0xC1 and 0xDA"), all you can do is decode strings individually.
There just isn't enough context to do anything smarter than flipping
unparseable bytes to U+FFFD.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
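The fallback described upthread could be sketched roughly like this. `decode_lenient` is a hypothetical helper name, not anything from the original thread; it strips 0x9D bytes, tries UTF-8, then Windows-1252, and finally replaces anything unparseable with U+FFFD, matching the plan John describes:

```python
def decode_lenient(raw: bytes) -> str:
    """Hypothetical sketch of the decoding plan discussed above:
    try UTF-8, fall back to Windows-1252, and map anything that
    still won't parse to the Unicode replacement character."""
    # Drop 0x9D first; as noted upthread, removing it rarely
    # affects readability.
    raw = raw.replace(b"\x9d", b"")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        # cp1252 leaves a few code points undefined (0x81, 0x8D,
        # 0x8F, 0x90), so this can still fail.
        return raw.decode("windows-1252")
    except UnicodeDecodeError:
        pass
    # Neither encoding fits: flip undecodable bytes to U+FFFD.
    return raw.decode("utf-8", errors="replace")
```

For example, `decode_lenient(b"caf\xe9")` falls through to the Windows-1252 branch and yields `"café"`, while a byte sequence valid in neither encoding comes back with U+FFFD in place of the bogus bytes.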