[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

Stephen J. Turnbull Tue, 16 Nov 2021 20:04:02 -0800

Executive summary:

I guess the bottom line is that I'm sympathetic to both the NFC and
NFKC positions.

I think that wetware is such that people will rarely go to the trouble
of picking out a letter-like symbol from a palette.  In my environment
that's not going to happen at all, because I use Japanese phonetic
input to get most symbols ("sekibun" = integral, "siguma" = sigma).
And I don't use calligraphic R for the real line: I use
\newcommand{\R}{{\cal R}}, except on a physical whiteboard, where I
use blackboard bold (go figure that one out!).  So to my mind the
letter-like block in Unicode is a failed experiment.

Jim J. Jewett writes:

 > When I was a math student, these were clearly different symbols,
 > with much less relation to each other than a mere case difference.

Arguable.  The letter-like symbols block has script (cursive),
blackboard bold, and Fraktur versions of R.  I've seen all of them, as
well as plain Roman, bold, italic, and bold italic faces, used to denote
the real line, and I've personally used most of them for that purpose
depending on availability of fonts and input methods and on the medium
(i.e., computer text vs. handwritten).  I've also seen several of them
used for reaction functions or spaces thereof in game theory (although
blackboard bold and Fraktur seem to be used only for the real
line).  Clearly the common denominator is the uppercase Latin letter
"R", and the glyph being recognizably "R" is necessary and sufficient
for each of those purposes.  The story for uppercase sigma as sum is
somewhat similar: sum is far from the only use of that letter,
although I don't know of any other operator symbol for sum over a set
or series (outside of programming languages, which I think we can
discount).
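
For concreteness, here is a minimal sketch of how the two normalization
forms treat the letter-like R variants.  It assumes only the standard
library's unicodedata module; the "variants" dict and the "fold" helper
are illustrative names, not anything from the pre-PEP.  NFC leaves each
symbol alone, NFKC folds all of them to plain "R", and because Python
applies NFKC to identifiers when parsing, the last lines show a
blackboard-bold name ending up bound as "R".

```python
import unicodedata

# Letter-like "R" variants discussed above (illustrative selection).
variants = {
    "script": "\u211b",
    "Fraktur (black-letter)": "\u211c",
    "blackboard bold (double-struck)": "\u211d",
}


def fold(ch, form):
    """Return ch normalized under the given form ("NFC" or "NFKC")."""
    return unicodedata.normalize(form, ch)


for label, ch in variants.items():
    print(label, "|", unicodedata.name(ch),
          "| NFC:", repr(fold(ch, "NFC")),
          "| NFKC:", repr(fold(ch, "NFKC")))

# Python normalizes identifiers with NFKC while parsing, so assigning
# to the double-struck R binds the plain name "R".
namespace = {}
exec("\u211d = 'real line'", namespace)
print("R" in namespace, "\u211d" in namespace)
```

Under NFC the three symbols stay distinct identifiers in principle; under
NFKC (what Python does today) they are all just "R".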

I agree that we should consider math to be a separate language, but it
doesn't have a consistent script independent of the origins of the
symbols.  Even today none of my engineering and economics students can
type any symbols except those in the JIS repertoire, which they type
by original name ("siguma", "ramuda", "arefu", "yajirushi" == arrow,
etc.).  "sekibun" == integration does bring up the integral sign in at
least some modern input methods, but it doesn't have a script name,
while "kasann" == addition does not bring up sigma, although "siguma"
does, and "essu" brings up sigma -- but only in "ASCII emoji" strings,
go figure.  I have seen students use fullwidth R for the real line,
but distinguishing that from ordinary R is a deprecated compatibility
feature of Unicode (and of Japanese practice -- even in very formal
university documents such as grade reports for a final doctoral
examination I've seen numbers and names containing mixed half-width
and full-width ASCII).
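
A small sketch of the mixed-width case, again assuming only the
standard unicodedata module.  The sample string is made up for
illustration: a fullwidth "R" and fullwidth "2" followed by ordinary
ASCII "D2".  NFC keeps the width distinction, NFKC erases it.

```python
import unicodedata

# Hypothetical sample: fullwidth R, fullwidth 2, then ASCII "D2".
mixed = "\uff32\uff12D2"

nfc = unicodedata.normalize("NFC", mixed)
nfkc = unicodedata.normalize("NFKC", mixed)

print(unicodedata.name(mixed[0]))  # official name of the first character
print(repr(nfc))    # width distinction preserved
print(repr(nfkc))   # everything folded to ASCII
print(nfc == mixed, nfkc == "R2D2")
```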

So I think "letter-like" was a reasonable idea (I'm pretty sure this
block goes back to the '90s but I'm too lazy to check), but it hasn't
turned out well, and I doubt it ever will.

 > So by the Unicode consortium's goals, they are independent
 > characters that should each be defined.  I admit that isn't ideal
 > for most use cases outside of math,

I don't think it even makes sense *inside* of math for the letter-like
symbols.  The nature of math means that any "R" will be grabbed for
something whose name starts with "r" as soon as that's convenient.
Something like the integral sign (which is a stretched "S" for "sum"),
OK -- although category theory uses that for "ends" which still don't
look anything like integrals even if you turn them inside out, rotate
90 degrees, and paint them blue.

 > > It's also a UX problem.  At slightly higher layer in the stack, I'm
 > > used to using Japanese input methods to input sigma and pi which
 > > produce characters in the Greek block, and at least the upper case
 > > forms that denote sum and product have separate characters in the math
 > > operators block.
 > 
 > I think that is mostly a backwards compatibility problem; XeTeX
 > itself had to worry about compatibility with TeX (which preceded
 > Unicode) and with the fonts actually available and then with
 > earlier versions of XeTeX.

IMO, the analogy fails because the backward compatibility issue for
Unicode is in the wetware, not in the software.
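
On the quoted point about sigma and pi, here is a hedged sketch of
where the software side stands, using only the standard unicodedata
module (the "pairs" dict is an illustrative name).  Unlike the
letter-like R variants, the n-ary operators have no compatibility
mapping to the Greek letters, so NFKC does not unify them, and as
math symbols they are not valid in Python identifiers at all.

```python
import unicodedata

# (Greek capital letter, n-ary operator from the math operators block)
pairs = {
    "sigma / summation": ("\u03a3", "\u2211"),
    "pi / product": ("\u03a0", "\u220f"),
}

for label, (letter, operator) in pairs.items():
    folded = unicodedata.normalize("NFKC", operator)
    print(label)
    print("  letter:  ", unicodedata.name(letter),
          unicodedata.category(letter), letter.isidentifier())
    print("  operator:", unicodedata.name(operator),
          unicodedata.category(operator), operator.isidentifier())
    print("  NFKC folds operator to letter:", folded == letter)
```

So whichever character the input method happens to produce decides
whether the source even compiles, which is the wetware problem again.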

Steve

Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YTIIFIF75RMWP5J3GCSXWVXSUP5SX7AA/