Martin v. Löwis <mar...@v.loewis.de> added the comment:

> Your name will end up being partially escaped as surrogate:
> 
> 'L\udcf6wis'
> 
> Further processing will fail

That depends on the further processing, no?
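Whether further processing fails depends on what it does with the escaped characters. Here is a minimal sketch. It assumes the name arrived as Latin-1 bytes and was decoded as UTF-8 with the 'surrogateescape' handler. The byte string is an illustrative assumption, not taken from the report. Encoding back with the same handler restores the original bytes exactly:

```python
# Hypothetical raw bytes: "Löwis" encoded as Latin-1, where 0xF6 is "ö".
raw = b"L\xf6wis"

# Decoding as UTF-8 with surrogateescape does not fail: the undecodable
# byte 0xF6 is mapped to the lone surrogate U+DCF6.
name = raw.decode("utf-8", "surrogateescape")
print(repr(name))  # 'L\udcf6wis'

# Encoding with the same codec and error handler restores the exact bytes,
# so no information has been lost.
restored = name.encode("utf-8", "surrogateescape")
print(restored == raw)  # True
```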

> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'latin-1' codec can't encode character '\udcf6' in position 1: ordinal not in range(256)

Where did you get this error from?
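The report doesn't say where this error came from. One way to reproduce the quoted traceback is to encode the escaped string with the strict 'latin-1' codec, though that is an assumption. If the output boundary passes 'surrogateescape' instead, the original byte is written back:

```python
name = "L\udcf6wis"

# Strict encoding: a lone surrogate has no Latin-1 representation.
message = None
try:
    name.encode("latin-1")
except UnicodeEncodeError as exc:
    message = str(exc)
print(message)

# With surrogateescape, U+DCF6 is turned back into the byte 0xF6.
restored = name.encode("latin-1", "surrogateescape")
print(restored)  # b'L\xf6wis'
```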

> It doesn't work if an application tries to work *with* the data,
> e.g. tries to convert it

Converting it to what?

> parse it

Parsing will work fine.
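Ordinary string operations ignore the lone surrogate. The following sketch parses a hypothetical passwd-style record that contains the escaped name:

```python
# Hypothetical record; the name field carries the escaped byte as U+DCF6.
line = "L\udcf6wis:1000:/home/L\udcf6wis"

# Splitting, indexing and prefix tests treat U+DCF6 like any other character.
fields = line.split(":")
print(fields[1])                        # 1000
print(fields[2].startswith("/home/"))   # True
```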

> decode it

It's a string. You shouldn't decode it.
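A Python 3 `str` has no `.decode()` method. The real encoding of the original bytes may only become known later. Latin-1 is assumed here purely for illustration. In that case, recover the bytes with the same error handler and decode them again:

```python
name = "L\udcf6wis"
print(hasattr(name, "decode"))  # False: Python 3 str has no .decode()

# Recover the original bytes exactly, then decode them with the
# (assumed) correct codec.
raw = name.encode("utf-8", "surrogateescape")
fixed = raw.decode("latin-1")
print(fixed)  # Löwis
```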

> The reason is
> that information included by the use of the 'surrogateescape'
> error handler is lost along the way and this then causes data
> corruption.

And how would that not happen if it were bytes? The problems you describe
were among the primary motivations for switching to Unicode: it's *byte*
strings that have these problems.
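The next sketch shows the same failure with no `str` involved. It assumes that the hypothetical Latin-1 bytes `b'L\xf6wis'` reach a consumer that expects UTF-8:

```python
raw = b"L\xf6wis"  # hypothetical Latin-1 bytes, kept as bytes throughout

# A consumer that assumes UTF-8 fails on the raw bytes just as well.
error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc

# The offending byte and its position are the same ones that
# surrogateescape would have escaped.
print(error.start, hex(raw[error.start]))  # 1 0xf6
```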

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8603>
_______________________________________
_______________________________________________
Python-bugs-list mailing list