Re: Python under PowerShell adds characters

Marko Rauhamaa Wed, 29 Mar 2017 22:48:42 -0700

Steven D'Aprano <[email protected]>:

> On Thu, 30 Mar 2017 07:29:48 +0300, Marko Rauhamaa wrote:
>> I'd expect not having to deal with Unicode decoding exceptions with
>> arbitrary input.
>
> That's just silly. If you have *arbitrary* bytes, not all
> byte-sequences are valid Unicode, so you have to expect decoding
> exceptions, if you're processing text.


The input is not in my control, and bailing out may not be an option:

   $ echo $'aa\n\xdd\naa' | grep aa
   aa
   aa
   $ echo $'\xdd' | python2 -c 'import sys; sys.stdin.read(1)'
   $ echo $'\xdd' | python3 -c 'import sys; sys.stdin.read(1)'
   Traceback (most recent call last):
     File "<string>", line 1, in <module>
     File "/usr/lib64/python3.5/codecs.py", line 321, in decode
       (result, consumed) = self._buffer_decode(data, self.errors, final)
   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdd in position 0: invalid continuation byte

Note that "grep" is also locale-aware.
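
For comparison, here is a minimal sketch of one way to get grep-like tolerance in Python 3, assuming the "surrogateescape" error handler (PEP 383) is acceptable for the job. The helper name read_text_lossless and the simulated input fake_stdin are made up for illustration; a real program would wrap sys.stdin.buffer instead.

```python
import io


def read_text_lossless(raw_stream, encoding="utf-8"):
    """Decode a binary stream as text without raising on invalid bytes.

    Hypothetical helper: each undecodable byte 0xNN is mapped to the
    lone surrogate U+DCNN instead of triggering UnicodeDecodeError.
    """
    wrapper = io.TextIOWrapper(
        raw_stream, encoding=encoding, errors="surrogateescape"
    )
    return wrapper.read()


# Stand-in for sys.stdin.buffer, carrying the same bytes as the
# echo $'aa\n\xdd\naa' example (assumption: UTF-8 locale).
fake_stdin = io.BytesIO(b"aa\n\xdd\naa\n")

text = read_text_lossless(fake_stdin)

# A grep-like filter now works on every line, including around the bad byte.
matching = [line for line in text.splitlines() if "aa" in line]
print(matching)
```

The same handler can be requested for the real standard streams without touching the code, for example by setting PYTHONIOENCODING=utf-8:surrogateescape in the environment. The default behaviour stays as shown in the traceback above.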

>> There recently was a related debate on the Guile mailing list. Like
>> Python3, Guile2 is sensitive to illegal UTF-8 on the command line and
>> in the standard streams. An emacs developer was urging Guile
>> developers to follow emacs's example and support a superset of UTF-8
>> and Unicode where all byte strings can be bijectively mapped into
>> text.
>
> I'd like to read that. Got a link?

<URL:
http://lists.gnu.org/archive/html/guile-user/2017-02/msg00054.html>
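
To illustrate what such a mapping looks like in practice, here is a small sketch using Python 3's own "surrogateescape" handler rather than the emacs scheme discussed in that thread. It only shows that every byte string survives a trip through text and back; the variable names are illustrative.

```python
# Every possible byte value, most of which do not form valid UTF-8 here.
all_bytes = bytes(range(256))

# Decoding never raises: invalid bytes become lone surrogates U+DC80..U+DCFF.
as_text = all_bytes.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = as_text.encode("utf-8", errors="surrogateescape")

print(round_tripped == all_bytes)
```

Note that this is one-directional: every byte string maps to a distinct str, but not every str (for instance one holding other lone surrogates) can be encoded back, so it is weaker than a true bijection.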


Marko
