Christopher Barker writes:

 > Would a proposal to switch the normalization to NFC only have any
 > hope of being accepted?
Hope, yes.  Counting you, it's been proposed twice. :-)  I don't know
whether it would get through.  We know this won't affect the stdlib,
since that's restricted to ASCII.  I suppose we could trawl PyPI and
GitHub for "compatibles" (the Unicode term for the characters that
the "K" normalizations fold together).

 > For example, in writing math we often use different scripts to mean
 > different things (e.g. TeX's Blackboard Bold). So if I were to use
 > some of the Unicode Mathematical Alphanumeric Symbols, I wouldn't
 > want them to get normalized.

Independent of the question of the normalization of Python
identifiers, I think using those characters this way is a bad idea.
In fact, I think adding these symbols to Unicode was a bad idea; they
should be handled at a higher level in the linguistic stack (by
semantic markup).

You're confusing two things here.  In Unicode, a script is a
collection of characters used for a specific language, typically a
set of Unicode blocks (more or less; there are a lot of Han
ideographs that are recognizable as such to Japanese readers but are
not part of the repertoire of the Japanese script).  That is, the
characters of one script are *different* from characters in other
scripts that happen to look like them.

Blackboard Bold is more like what we would usually call a "font": the
(math) italic "x" and the (math) bold italic "x" are the same "x",
but one denotes a scalar and the other a vector in many math books.
A roman "R" probably denotes the statistical application, an italic
"R" the reaction function in a game-theory model, and a Blackboard
Bold "R" the set of real numbers.  But these are all the same
character.

It's a bad idea to rely on different (Unicode) scripts whose
characters share glyphs to look different from each other, unless you
"own" the fonts to be used.  As far as I know there's no way for a
Python program to specify the font to be used to display itself,
though. :-)  It's also a UX problem.
At a slightly higher layer in the stack, I'm used to using Japanese
input methods to input sigma and pi, which produce characters in the
Greek block, and at least the upper-case forms that denote sum and
product have separate characters in the Mathematical Operators block.
I understand why people who literally write mathematics in Greek
might want those not normalized, but I sure am going to keep using
"Greek sigma", not "math sigma"!  The probability that I'm going to
have a Greek upper-case sigma in my papers is nil; the probability of
a summation symbol, near unity.  But the summation symbol is not
easily available: I have to scroll through all the preceding Unicode
blocks to find Mathematical Operators.  So I am perfectly happy with
upper-case Greek sigma in that role (as is XeTeX!!).

And the thing is, of course those Greek letters really are Greek
letters: they were chosen because pi is the homophone of p, which is
the first letter of "product", and sigma is the homophone of s, which
is the first letter of "sum".  Å for Ångström is similar: it's the
initial letter of a Swedish name.

Sure, we could fix the input methods (and search methods!! -- people
are going to input the character they know that corresponds to the
glyph *they* see, not the bit pattern the *CPU* sees).  But that's as
bad as trying to fix mail clients.  Not worth the effort, because I'm
pretty sure you're going to fail -- it's one of those "you'll have to
pry this crappy software that annoys admins around the world from my
cold dead fingers" issues, which is why their devs refuse to fix
them.

Steve

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5GHPVNJLLOKBYPE7FSU5766XYP6IJPEK/
Code of Conduct: http://python.org/psf/codeofconduct/