Chris Angelico writes:

 > Can anyone give an example of a current in-use system encoding that
 > would have [ASCII bytes in non-ASCII text]?

Shift JIS and Big5. (Both can have bytes < 128 inside multibyte characters.) I don't know if Big5 is still in use as the default encoding anywhere, but Shift JIS is, although its use is decreasing.

For both of those, once you encounter a non-ASCII byte you can just switch over, and none of the previous text was mis-decoded. But that's only if you *know* the language was Japanese (respectively Chinese). Remember, there is no encoding that can be distinguished from ISO 8859-1 (and several other Latin encodings) simply based on the bytes found, since it uses all 256 bytes.

 > How likely is it that you'd get even one line of text that purports
 > to be ASCII?

Program source code where the higher-level functions (likely to contain literal strings) come late in the file is frequently misdetected based on the earlier bytes.

Steve
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ZB2LM3KYLQ34DHA276SPZA73BHJBRQMF/
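
For anyone who wants to check the claims about Shift JIS, Big5 and ISO 8859-1 themselves, here is a minimal sketch using only the codecs bundled with CPython. The sample characters ("ソ" and "功"), the helper name `ascii_range_bytes`, and the fake source line are illustrative assumptions, not taken from the message above.

```python
# Sketch only: sample characters and names are illustrative assumptions.

def ascii_range_bytes(text, encoding):
    """Return the bytes below 0x80 produced when encoding `text`."""
    return [b for b in text.encode(encoding) if b < 0x80]

# Each sample is a single non-ASCII character, so any byte below 0x80 in its
# encoded form must be part of a multibyte sequence.
sjis_hits = ascii_range_bytes("ソ", "shift_jis")
big5_hits = ascii_range_bytes("功", "big5")
print("shift_jis:", [hex(b) for b in sjis_hits])
print("big5:     ", [hex(b) for b in big5_hits])

# An ASCII-only prefix decodes identically as ASCII and as Shift JIS, so
# switching codecs at the first non-ASCII byte loses nothing decoded so far.
source = "print('hello')  # ".encode("ascii") + "ソ".encode("shift_jis")
prefix = source[: source.index(b"#")]
same_prefix = prefix.decode("ascii") == prefix.decode("shift_jis")
print("prefix decodes identically:", same_prefix)

# ISO 8859-1 maps every byte value to a character, so decoding never fails
# and the bytes alone cannot rule it out.
latin1_text = bytes(range(256)).decode("latin-1")

# The same Shift JIS bytes therefore also decode "successfully" as Latin-1,
# just into the wrong characters.
mojibake = "ソ".encode("shift_jis").decode("latin-1")
print("latin-1 reading of Shift JIS bytes:", repr(mojibake))
```

With these codecs, both sample characters should report a byte in the ASCII range, which is why a detector that only looks for bytes >= 128 cannot find the character boundaries in such text.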