Steven D'Aprano <st...@pearwood.info>:

> On Thu, 30 Mar 2017 07:29:48 +0300, Marko Rauhamaa wrote:
>> I'd expect not having to deal with Unicode decoding exceptions with
>> arbitrary input.
>
> That's just silly. If you have *arbitrary* bytes, not all
> byte-sequences are valid Unicode, so you have to expect decoding
> exceptions, if you're processing text.

The input is not in my control, and bailing out may not be an option:

   $ echo $'aa\n\xdd\naa' | grep aa
   aa
   aa
   $ echo $'\xdd' | python2 -c 'import sys; sys.stdin.read(1)'
   $ echo $'\xdd' | python3 -c 'import sys; sys.stdin.read(1)'
   Traceback (most recent call last):
     File "<string>", line 1, in <module>
     File "/usr/lib64/python3.5/codecs.py", line 321, in decode
       (result, consumed) = self._buffer_decode(data, self.errors, final)
   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdd in position 0: invalid continuation byte

Note that "grep" is also locale-aware.

>> There recently was a related debate on the Guile mailing list. Like
>> Python3, Guile2 is sensitive to illegal UTF-8 on the command line and
>> in the standard streams. An emacs developer was urging Guile
>> developers to follow emacs's example and support a superset of UTF-8
>> and Unicode where all byte strings can be bijectively mapped into
>> text.
>
> I'd like to read that. Got a link?

<URL: http://lists.gnu.org/archive/html/guile-user/2017-02/msg00054.html>


Marko
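
For comparison, Python 3 can approximate the "every byte string maps to text" behaviour with the "surrogateescape" error handler (PEP 383). The following is only a minimal sketch, not a claim about how grep or emacs do it; the helper names decode_lossless and encode_lossless are made up for illustration, and the sample bytes mirror the grep example above.

```python
def decode_lossless(raw: bytes) -> str:
    # Hypothetical helper: undecodable bytes become lone surrogates
    # (0xdd -> U+DCDD) instead of raising UnicodeDecodeError.
    return raw.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    # Hypothetical helper: the inverse mapping, turning the escaped
    # surrogates back into the original bytes.
    return text.encode("utf-8", errors="surrogateescape")


# Same input as the grep example: a stray 0xdd byte between two "aa" lines.
raw = b"aa\n\xdd\naa\n"

text = decode_lossless(raw)
matching = [line for line in text.splitlines() if "aa" in line]
roundtrip = encode_lossless(text)

print(matching)
print(roundtrip == raw)
```

The same handler can be attached to the standard streams, for example by wrapping sys.stdin.buffer in io.TextIOWrapper with errors="surrogateescape", or the script can read sys.stdin.buffer directly and stay in bytes throughout.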