On Sat, Oct 24, 2009 at 1:09 PM, Joe <joesalm...@hotmail.com> wrote:
> The Python 3.1.1 documentation has the following example:
>
> >>> b'\x80abc'.decode("utf-8", "strict")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0:
> unexpected code byte
> >>> b'\x80abc'.decode("utf-8", "replace")
> '\ufffdabc'
> >>> b'\x80abc'.decode("utf-8", "ignore")
> 'abc'
>
> Strict and Ignore appear to work as per the documentation but replace
> does not. Instead of replacing the values it fails:
>
> >>> b'\x80abc'.decode('utf-8', 'replace')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "p:\SW64\Python.3.1.1\lib\encodings\cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in
> position 1: character maps to <undefined>
>
> If this a known bug with 3.1.1?
>
It's not a bug. The problem isn't even the decode statement. Python
successfully creates the str '\ufffdabc', and then the interactive prompt
tries to print it to the screen. To do that it has to encode it to cp437
(your console encoding), and cp437 has no character for U+FFFD, so that
step fails. That's why the traceback points into encodings\cp437.py and
reports a UnicodeEncodeError, not a UnicodeDecodeError from the utf-8
codec.

--
http://mail.python.org/mailman/listinfo/python-list
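To see the two steps separately, here's a small sketch (assuming Python 3 and using cp437 only because that's the console encoding in the traceback above). The decode on its own succeeds; only encoding the result for the console fails. Incidentally, the "position 1" in the original error is because the prompt echoes repr() of the result, and position 0 of that is the opening quote. The last two lines show two ways of displaying the string that a cp437 console can handle.

```python
raw = b'\x80abc'

# Step 1: decoding with errors="replace" works. 0x80 is not valid UTF-8,
# so it is replaced by U+FFFD REPLACEMENT CHARACTER.
text = raw.decode("utf-8", "replace")

# Step 2: the failure only happens when that str is encoded for the console.
# cp437 has no mapping for U+FFFD, so this raises UnicodeEncodeError.
try:
    text.encode("cp437")
except UnicodeEncodeError as exc:
    # ascii() keeps this diagnostic printable on any console
    print("cp437 cannot encode:", ascii(exc.object[exc.start:exc.end]))

# Ways to display the result on a cp437 console:
print(ascii(text))  # escapes non-ASCII characters as \uXXXX
print(text.encode("cp437", "replace").decode("cp437"))  # U+FFFD becomes '?'
```

Either way, the string you got back from decode() was correct all along; it's only the printing that needs care.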