Re: Python 3.1.1 bytes decode with replace bug

Benjamin Kaplan Sat, 24 Oct 2009 13:48:18 -0700

On Sat, Oct 24, 2009 at 1:09 PM, Joe <[email protected]> wrote:
> The Python 3.1.1 documentation has the following example:
>
>>>> b'\x80abc'.decode("utf-8", "strict")
> Traceback (most recent call last):
>  File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
>>>> b'\x80abc'.decode("utf-8", "replace")
> '\ufffdabc'
>>>> b'\x80abc'.decode("utf-8", "ignore")
> 'abc'
>
> Strict and Ignore appear to work as per the documentation but replace
> does not.  Instead of replacing the values it fails:
>
>>>> b'\x80abc'.decode('utf-8', 'replace')
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "p:\SW64\Python.3.1.1\lib\encodings\cp437.py", line 19, in encode
>    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 1: character maps to <undefined>
>
> Is this a known bug with 3.1.1?
>


It's not a bug. The problem isn't even the decode statement. Python
successfully creates the Unicode string '\ufffdabc' and then tries to
print it to the screen, so it has to convert it to cp437 (your console
encoding). That conversion fails because cp437 has no character for
U+FFFD. That's why the traceback mentions the cp437 file and not the
utf-8 file.
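A minimal sketch that separates the two steps, assuming the console encoding is cp437 as in the report (the variable names are illustrative only). Decoding with "replace" succeeds; the error only appears when the result is encoded for the console. The traceback above reports position 1 rather than 0 because the interactive prompt prints the repr, whose opening quote occupies position 0.

```python
raw = b'\x80abc'

# Step 1: decoding with errors="replace" works; the invalid byte 0x80
# becomes U+FFFD REPLACEMENT CHARACTER.
text = raw.decode("utf-8", "replace")

# Step 2: showing the result on a cp437 console means encoding it to cp437.
# cp437 has no mapping for U+FFFD, so this is the step that raises.
try:
    text.encode("cp437")
    console_error = None
except UnicodeEncodeError as exc:
    console_error = exc  # keep the exception; 'exc' is unbound after the block

# Two ways to display the string on such a console without an error:
safe_repr = ascii(text)  # escape non-ASCII characters as \uXXXX
cp437_safe = text.encode("cp437", "replace").decode("cp437")  # U+FFFD -> '?'

print(safe_repr)
print(cp437_safe)
```

Which workaround fits depends on whether you want to see the exact code point (ascii()) or just a readable approximation ('?').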

-- 
http://mail.python.org/mailman/listinfo/python-list
