Martin v. Löwis <mar...@v.loewis.de> added the comment:

> Your name will end up being partially escaped as surrogate:
>
> 'L\udcf6wis'
>
> Further processing will fail

That depends on the further processing, no?

> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'latin-1' codec can't encode character '\udcf6' in
> position 1: ordinal not in range(256)

Where did you get this error from?

> It doesn't work if an application tries to work *with* the data,
> e.g. tries to convert it

Converting it to what?

> parse it

Parsing will work fine.

> decode it

It's a string. You shouldn't decode it.

> The reason is
> that information included by the use of the 'surrogateescape'
> error handler is lost along the way and this then causes data
> corruption.

And how would that not happen if it was bytes? The problems you describe
were one of the primary motivations to switch to Unicode: it's *byte*
strings that have these problems.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8603>
_______________________________________
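
For illustration, here is a minimal sketch of the behavior under discussion. It assumes the name arrived as the Latin-1 bytes b'L\xf6wis' and was decoded as UTF-8 with the 'surrogateescape' error handler. Those inputs and all variable names are assumptions made for the example, not details taken from the report.

```python
# Assumed input: "Löwis" encoded as Latin-1. The byte 0xF6 is not valid UTF-8.
raw = b"L\xf6wis"

# 'surrogateescape' maps the undecodable byte 0xF6 to the lone surrogate U+DCF6.
name = raw.decode("utf-8", errors="surrogateescape")
print(ascii(name))

# Ordinary string operations (a stand-in for "parsing") still work.
parts = name.split("w")
starts_with_l = name.startswith("L")

# A strict encode cannot represent the lone surrogate and raises.
try:
    name.encode("latin-1")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# Encoding with the same error handler restores the original bytes.
restored = name.encode("utf-8", errors="surrogateescape")
```

In this sketch, whether further processing fails depends on the step: the string operations and the 'surrogateescape' encode succeed, while the strict encode raises UnicodeEncodeError.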