On 5/25/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> On 5/25/07, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > If we allowed an underscore as a mixed-script separator
> > (allowing "def get_原料(self):"), does this let us get away
> > with otherwise banning mixed-scripts?
>
> I wondered that, until seeing that it wouldn't really solve the
> problem anyhow.  It is possible to write entire words (such as "allow"
> or "scope") in multiple scripts.  (Unicode calls these "whole script
> confusables".)  You can't stop that without banning one of the scripts
> entirely, which would disenfranchise users of some languages.
>
> So I think the least-bad solution is to say "OK, we won't allow these
> potentially confusable characters unless you were expecting them."
>
> And once we have a way to say "I'm expecting Cyrillic", we might as
> well let the user specify exactly what they're expecting, and make
> their own decisions on what is likely to be needed vs likely to be
> confused.

Indeed, the whole-script confusables do create significant holes, but
I think the best solution is still to ban mixed-scripts and accept
that it's only a "75% solution".

Using an "I'm expecting Cyrillic" flag makes it harder for those who
need Cyrillic AND still leaves them vulnerable to the same problem
we're trying to protect ourselves from.

A more extreme solution would be to introduce a symbol type that
converts whole-script confusables to a canonical form (as well as
mixed-script confusables, if we don't ban them).  For practicality it
would have to coerce any unicode it was compared with for equality,
and probably not support sorting.

--
Adam Olsen, aka Rhamphoryncus
_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
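
To make the two ideas in this message concrete, here is a rough sketch,
not a proposal for the actual implementation.  It assumes a script can
be approximated by the first word of a character's Unicode name (the
stdlib has no Script property lookup), and it uses a tiny hand-written
confusables table.  The names script_of, is_allowed, CANONICAL and
Symbol are all hypothetical.

```python
import unicodedata


def script_of(ch):
    """Approximate a character's script from its Unicode name (heuristic)."""
    if ch.isdigit() or ch == "_":
        return None  # treat digits and underscores as script-neutral
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def is_allowed(identifier):
    """Ban mixed scripts, with underscore acting as a script separator."""
    for chunk in identifier.split("_"):
        scripts = {s for s in map(script_of, chunk) if s is not None}
        if len(scripts) > 1:
            return False
    return True


# Hypothetical, deliberately tiny table mapping a few Cyrillic letters to
# their Latin look-alikes.  A real table would be far larger.
CANONICAL = str.maketrans({
    "\u0455": "s", "\u0441": "c", "\u043e": "o", "\u0440": "p", "\u0435": "e",
})


class Symbol:
    """Compares by canonical form; deliberately defines no ordering."""

    def __init__(self, text):
        self.text = text
        self.canonical = text.translate(CANONICAL)

    def __eq__(self, other):
        if isinstance(other, str):
            other = Symbol(other)  # coerce plain strings before comparing
        if not isinstance(other, Symbol):
            return NotImplemented
        return self.canonical == other.canonical

    def __hash__(self):
        # Consistent among Symbols only; a raw str that compares equal
        # may still hash differently.
        return hash(self.canonical)
```

With this sketch, is_allowed("get_原料") passes and is_allowed("get原料")
is rejected, but an all-Cyrillic spelling of "scope" still passes,
which is the hole that makes the ban only a partial solution.  The
Symbol type closes that hole for the letters in its table, at the cost
of coercing every string it is compared with.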
