2009/4/28 Hrvoje Niksic <hrvoje.nik...@avl.com>:
> Lino Mastrodomenico wrote:
>>
>> Since this byte sequence [b'\xed\xb3\xbf'] doesn't represent a valid
>> character when decoded with UTF-8, it should simply be considered an
>> invalid UTF-8 sequence of three bytes and decoded to
>> '\udced\udcb3\udcbf' (*not* '\udcff').
>
> "Should be considered" or "will be considered"? Python 3.0's UTF-8 decoder
> happily accepts it and returns u'\udcff':
>
> >>> b'\xed\xb3\xbf'.decode('utf-8')
> '\udcff'

Only for the new utf-8b encoding (if Martin agrees), while the existing
utf-8 is fine as is (or at least waaay outside the scope of this PEP).

-- 
Lino Mastrodomenico
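
For readers following along on a later interpreter, here is a minimal sketch of the behaviour being discussed. It assumes Python 3.1 or newer and uses the `surrogateescape` error handler as a stand-in for the "utf-8b" encoding named in this thread; that mapping is an assumption of the example, not something stated in the messages above. The variable names are illustrative only.

```python
# Byte sequence from the thread: what a lenient UTF-8 encoder would
# produce for the lone surrogate U+DCFF.
raw = b'\xed\xb3\xbf'

# Assumption: errors='surrogateescape' plays the role of "utf-8b" here.
# Each undecodable byte 0xNN is mapped to the lone surrogate U+DCNN.
escaped = raw.decode('utf-8', errors='surrogateescape')
print(ascii(escaped))

# Encoding with the same handler turns each U+DCNN back into byte 0xNN,
# so the original three bytes are recovered.
round_tripped = escaped.encode('utf-8', errors='surrogateescape')
print(round_tripped == raw)

# Under the same handler, '\udcff' stands for the single byte 0xFF,
# so decoding the three bytes to '\udcff' could not round-trip.
single = '\udcff'.encode('utf-8', errors='surrogateescape')
print(single)

# Strict decoding: the 3.0 decoder quoted above accepted the sequence;
# whether your interpreter does is reported here rather than assumed.
try:
    raw.decode('utf-8')
    strict_result = 'accepted'
except UnicodeDecodeError:
    strict_result = 'rejected'
print(strict_result)
```

The third step is the reason for preferring '\udced\udcb3\udcbf': if the three bytes decoded to '\udcff', encoding that string again would yield b'\xff' rather than the original bytes.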