Glenn Linderman a écrit :
If there is going to be a required transformation from de novo strings
to funny-encoded strings, then why not make one that people can actually
see and compare and decode from the displayable form, by using
displayable characters instead of lone surrogates?
The problem with your "escape character" scheme is that the meaning is lost with
slicing of the strings, which is a very common operation.
I though half-surrogates were illegal in well formed Unicode. I confess
to being weak in this area. By "legitimate" above I meant things like
half-surrogates which, like quarks, should not occur alone?
"Illegal" just means violating the accepted rules. In this case, the
accepted rules are those enforced by the file system (at the bytes or
str API levels), and by Python (for the str manipulations). None of
those rules outlaw lone surrogates. [...]
Python could as well *specify* that lone surrogates are illegal, as their
meaning is undefined by Unicode. If this rule is respected language-wise, there
is no ambiguity. It might be unrealistic on windows, though.
This rule could even be specified only for strings that represent filesystem
paths. Sure, they are the same type as other strings, but the programmer usually
knows if a given string is intended to be a path or not.
Baptiste
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com