> I'd suggest restricting identifiers under the rules of UTS-39,
> profile 2, "Highly Restrictive". This limits mixing of scripts
> in a single identifier; you can't mix Hebrew and ASCII, for example,
> which prevents problems with mixing right to left and left to right
> scripts. Domain names have similar restrictions.

That sounds interesting; however, I cannot find the document you refer
to. In TR 39 (also called Unicode Technical Standard #39), at

http://unicode.org/reports/tr39/

there is no mention of numbered profiles, or of "Highly Restrictive".

Looking at the document, it seems section 3.1, "General Security Profile
for Identifiers", might apply. IIUC, xidmodifications.txt would have to
be taken into account.

I'm not quite sure what that means; apparently, a number of characters
(listed as restricted) should not be used in identifiers. OTOH, it also
adds HYPHEN-MINUS and KATAKANA MIDDLE DOT, which surely shouldn't apply
to Python identifiers, no? (At least HYPHEN-MINUS already has a meaning
in Python, and cannot possibly be part of an identifier.)

Also, mixed-script detection might be considered, but it is not clear
to me how to interpret the algorithm in section 5, plus it says that
this is just one of the possible algorithms.

Finally, Confusable Detection is difficult to perform on a single
identifier: it seems you need two of them to find out whether they
are confusable.

In any case, I added this as an open issue to the PEP.

Regards,
Martin
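
P.S. To make the last two points concrete, here is a rough sketch, not
the TR 39 algorithm. It assumes that the first word of a character's
Unicode name (LATIN, HEBREW, CYRILLIC, ...) can stand in for the real
Script property, which the unicodedata module does not expose. The
confusables table is a three-entry toy, not the data file that TR 39
ships. All function names are made up for illustration.

```python
import unicodedata


def rough_scripts(identifier):
    """Return the set of script-like name prefixes used in identifier.

    ASSUMPTION: the first word of the Unicode character name is used as
    a crude substitute for the Script property. Digits and the
    underscore are treated as script-neutral and skipped.
    """
    scripts = set()
    for ch in identifier:
        if ch == "_" or ch.isdigit():
            continue
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(identifier):
    """A single identifier is enough to decide this check."""
    return len(rough_scripts(identifier)) > 1


# Toy table (hypothetical): three Cyrillic letters that look like Latin ones.
TOY_CONFUSABLES = {"\u0430": "a", "\u0435": "e", "\u043e": "o"}


def skeleton(identifier):
    """Map each character to its look-alike, using the toy table only."""
    return "".join(TOY_CONFUSABLES.get(ch, ch) for ch in identifier)


def confusable(a, b):
    """Needs two identifiers: distinct strings with the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)
```

Note the difference in shape: is_mixed_script() takes one identifier
and could run when the identifier is tokenized, while confusable() needs
a pair, so it would have to compare each new name against the names
already seen.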