On 6/6/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:

> No. The point is that people want to use their current tools; they
> may not be able to easily specify normalization.
> Please look through the list (I've already done so; I'm speaking from
> detailed examination of the data) and state what compatibility
> characters you want to keep.

I cannot really speak for code points I'm not familiar with, but I
wouldn't use any of the ones I do know in identifiers. The only
compatibility characters in ID_Continue I have used myself are, I
think, halfwidth katakana and fullwidth alphanumerics. Examples:

タ -> タ    # halfwidth katakana
ｘ -> x    # fullwidth alphabetic
１ -> 1    # fullwidth numeric

Practically speaking, I won't be using such things in my code. I don't
like them, but if it's more pragmatic to allow them, then I guess it
can't be helped.

There are some cases where users might in the future want to make a
distinction between "compatibility" characters, such as these:

http://en.wikipedia.org/wiki/Mathematical_alphanumeric_symbols

If some day everyone writes their TeX using such characters, then it
would make sense to allow and distinguish them in Python, too. For
this reason I think that a compatibility transformation, if any,
should only be applied to characters where there is a practical reason
to do so; in other cases punting (= a syntax error) is safest. When in
doubt, refuse the temptation to guess.

> As a daily user of several Japanese input methods, I can tell you it
> would be a massive pain in the ass if Python doesn't convert those,
> and errors would be an on-the-minute-every-minute annoyance.

I use two Japanese input methods (MS IME and scim/anthy), but only the
latter one daily. When I type text that mixes Japanese and other
languages, I switch the input mode off when not typing Japanese. For
code that uses a lot of Japanese this may not be convenient, but then
you would want to set your input method to use ASCII for ASCII anyway,
since that would still be required in literals (0x１５ or "ａ" won't
work) and punctuation (a「15」。foo=(5、6) won't work).
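For reference, here is a small sketch of what that folding looks like in
practice, using the standard-library unicodedata module (assuming NFKC,
the normalization form under discussion in this thread, is what gets
applied to identifiers):

```python
import unicodedata

# NFKC maps each compatibility character to its canonical counterpart.
pairs = [
    ("\uff80", "halfwidth katakana TA"),    # タ -> タ
    ("\uff58", "fullwidth Latin small x"),  # ｘ -> x
    ("\uff11", "fullwidth digit one"),      # １ -> 1
    ("\U0001d400", "mathematical bold A"),  # 𝐀 -> A
]
for ch, desc in pairs:
    folded = unicodedata.normalize("NFKC", ch)
    print(f"{desc}: {ch!r} -> {folded!r}")

# Note that ordinary string comparison does NOT apply this folding;
# halfwidth ｶｷ and fullwidth カキ are distinct strings, and only
# become equal after explicit NFKC normalization. This is the gap
# between identifier identity and string identity discussed below.
assert "\uff76\uff77" != "\u30ab\u30ad"
assert unicodedata.normalize("NFKC", "\uff76\uff77") == "\u30ab\u30ad"
```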
Code that mixes fullwidth and halfwidth alphanumerics also looks
horrible, but that's just a coding-style issue :-)

> > Unicode, and adding extra equivalences (whether it's "FoO" == "foo",
> > "ｶｷ" == "カキ" or "Ａ１２３" == "A123") is surprising.
>
> How many Japanese documents do you deal with on a daily basis?

Far fewer than you, as I don't live in Japan. I read a fair amount but
don't type long texts in Japanese. When I do type, I usually use
fullwidth alphanumerics except for foreign words that aren't acronyms,
e.g. ＦＢＩ but not alphabet. For code, consistently using ASCII for
ASCII would be the most predictable rule (TOOWTDI). You have to go out
of your way to type halfwidth katakana, and it isn't really useful in
identifiers IMHO.

> They are treated as font variants, not different characters, by *all*
> users.

I think programmers in general expect identifier identity to behave
the same way as string identity; in this respect they are a special
class of users. (Those who use case-insensitive programming languages
have all my sympathy :-)

> I would like this code to return "KK". This might be an unpleasant
> surprise, once, and there would need to be a warning on the box for
> distribution in Japan (and other cultures with compatibility
> decompositions).

This won't have a big impact if you apply it only to carefully
selected code points, and done that way it sounds like a viable
choice. Asking your students for input, as you suggested, is surely a
good idea.

_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
