[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

Jim J. Jewett Tue, 16 Nov 2021 16:24:34 -0800

Stephen J. Turnbull wrote:
> Christopher Barker writes:

> > For example, in writing math we often use different scripts to mean
> > different things (e.g. TeX's Blackboard Bold).  So if I were to use
> > some of the Unicode Mathematical Alphanumeric Symbols, I wouldn't
> > want them to get normalized.


Agreed, for careful writers.  But Stephen's answer about people using the wrong 
one and expecting it to work means that normalization is probably the lesser of 
evils for most people, and the ones who don't want it normalized are more 
likely to be able to specify custom processing when it is important enough.  
(The compatibility characters aren't normalized in strings, largely because 
that should still be possible.)

> In fact, I think adding these symbols to Unicode was a bad idea; they
> should be handled at a higher level in the linguistic stack (by
> semantic markup).

When I was a math student, these were clearly different symbols, with much less 
relation to each other than a mere case difference. 
 So by the Unicode consortium's goals, they are independent characters that 
should each be defined.  I admit that isn't ideal for most use cases outside of 
math, but ... supporting those other cases is what compatibility normalization 
is for. 

> It's also a UX problem.  At slightly higher layer in the stack, I'm
> used to using Japanese input methods to input sigma and pi which
> produce characters in the Greek block, and at least the upper case
> forms that denote sum and product have separate characters in the math
> operators block.  I understand why people who literally write
> mathematics in Greek might want those not normalized, but I sure am
> going to keep using "Greek sigma", not "math sigma"!  The probability
> that I'm going to have a Greek uppercase sigma in my papers is nil,
> the probability of a summation symbol near unity.  But the summation
> symbol is not easily available, I have to scroll through all the
> preceding Unicode blocks to find Mathematical Operators.  So I am
> perfectly happy with uppercase Greek sigma for that role (as is
> XeTeX!!)

I think that is mostly a backwards compatibility problem; XeTeX itself had to 
worry about compatibility with TeX (which preceded Unicode) and with the fonts 
actually available and then with earlier versions of XeTeX.

-jJ
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/JNFLAQUKNCWCJSMBNJZGHVD5ZELOTU6G/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

Reply via email to