Glenn Linderman wrote:


If there is going to be a required transformation from de novo strings to funny-encoded strings, then why not make one that people can actually see and compare and decode from the displayable form, by using displayable characters instead of lone surrogates?


The problem with your "escape character" scheme is that the meaning is lost with slicing of the strings, which is a very common operation.
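To make the slicing problem concrete, here is a minimal sketch. The "?E9" escape notation is purely hypothetical (it is not Glenn's exact proposal), and the file name is made up; the lone-surrogate side uses Python's real "surrogateescape" error handler.

```python
# Hypothetical visible-escape scheme: the undecodable byte 0xE9 is spelled
# as the three displayable characters "?E9" (this notation is an assumption).
escaped = "caf?E9.txt"

# Lone-surrogate scheme: the same byte becomes ONE code point, U+DCE9.
surrogate = b"caf\xe9.txt".decode("utf-8", errors="surrogateescape")

# A slice can land in the middle of the multi-character escape, leaving a
# truncated "?E" whose original meaning can no longer be recovered.
escaped_head = escaped[:5]

# A slice of the surrogate form can only fall between code points, so the
# escaped byte is either wholly inside the slice or wholly outside it.
surrogate_head = surrogate[:4]
roundtrip = surrogate_head.encode("utf-8", errors="surrogateescape")

print(repr(escaped_head))
print(roundtrip)
```

With a single code point per undecodable byte, every slice still round-trips to a prefix of the original bytes.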


I thought half-surrogates were illegal in well-formed Unicode. I confess
to being weak in this area. By "legitimate" above I meant things like
half-surrogates which, like quarks, should not occur alone?

"Illegal" just means violating the accepted rules. In this case, the accepted rules are those enforced by the file system (at the bytes or str API levels), and by Python (for the str manipulations). None of those rules outlaw lone surrogates. [...]


Python could just as well *specify* that lone surrogates are illegal, as their meaning is undefined by Unicode. If this rule is respected throughout the language, there is no ambiguity. It might be unrealistic on Windows, though.

This rule could even be specified only for strings that represent filesystem paths. Sure, they are the same type as other strings, but the programmer usually knows if a given string is intended to be a path or not.
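As a rough sketch of what such a path-only rule could look like: the helper names below are hypothetical, nothing like them exists in the standard library, and the check assumes a Python where str is a sequence of code points (3.3 or later), so that any code point in U+D800..U+DFFF is necessarily a lone surrogate.

```python
def has_lone_surrogate(s: str) -> bool:
    """Return True if s contains any code point in the surrogate range."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


def check_user_path(path: str) -> str:
    """Hypothetical guard: reject programmer-supplied paths with surrogates.

    Under the rule suggested above, only the decoding layer would be allowed
    to produce lone surrogates, so one found here signals an error.
    """
    if has_lone_surrogate(path):
        raise ValueError("lone surrogate in path: %r" % path)
    return path


# An ordinary non-ASCII name passes; a surrogate-escaped one is rejected.
check_user_path("caf\u00e9.txt")
try:
    check_user_path("caf\udce9.txt")
except ValueError as exc:
    print(exc)
```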

Baptiste
