FYI, Markus Kuhn sent the following comments to the authors of PDUTR #25.
They are of sufficient general interest to warrant discussion on
the list.
A./
PS: I've refreshed the HTML, so people should have fewer problems
reading the equations in section 5 on Win2K or XP.
X-Mailer: exmh version 2.3+CL 01/14/2001 with nmh-1.0.4
To: Barbara Beeton [EMAIL PROTECTED], Asmus Freytag [EMAIL PROTECTED],
Murray Sargent III [EMAIL PROTECTED]
cc: [EMAIL PROTECTED] (linux-utf8)
Subject: PDUTR #25: Unicode Support for Mathematics
X-URL: http://www.cl.cam.ac.uk/~mgk25/
Date: Thu, 03 Jan 2002 17:11:50 +
From: Markus Kuhn [EMAIL PROTECTED]
Dear Unicode Maths team,
I've read with enthusiasm your draft document
http://www.unicode.org/unicode/reports/tr25/
and have great hopes that this project for Unicode Plain Text Encoding
of Mathematics will progress well and be widely implemented once it is
finished!
I thought (from comp.text.sgml discussions in the early 1990s) that it
was in general widely accepted that SGML is in practice far too
inconvenient for entering mathematical text and that any math DTD will
not lead naturally to intuitive and consistent keyboard entry
techniques, which is why I always considered MathML more an academic
exercise than anything that I would ever really want to use to get work
done. MathML has never been anywhere near being a potential competitor
for TeX.
I therefore observe with great interest that Unicode plans to treat
mathematics as just yet another complex script (like Indic, etc.), in a
way such that finally authors of SGML/XML document type definitions and
style sheets will not have to make much further provisions for support
of mathematics than for example define a single element for marking a
displayed equation. Also the prospect of being able to search for
mathematical formula fragments with web search engines is exciting.
A few comments on the current draft:
- It is not yet clear how white-space is to be handled. In TeX,
the math mode has a lot of heuristics for adding white space where
mathematical typographic tradition finds it convenient, for example
around every operator. It has often been observed that scientific papers
written in Word have far inferior mathematical spacing to
papers written in TeX, because TeX's heuristic algorithms do
far better than an inexperienced author. However, these heuristics
fail frequently, and more often than desirable, TeX users have to
manually override the math spacing with \, and the like.
Your current text does not yet make it clear whether the additional
white space used around mathematical operators will be added by the
rendering engine and font (as in TeX) or will be encoded in the plain
text. I suspect encoding the whitespace in the plaintext is ultimately
preferable, as it will ensure more control in a portable way, even
though that means that typographic beginners will be more likely
to produce ugly formulas. Heuristics like TeX's would have to become
part of the keyboard entry and style checking mechanisms of the
editor (like the Word spell checker), not of the rendering engine.
This should hopefully make results more predictable across a wide
range of rendering engines.
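
A minimal sketch of what such an editor-side heuristic could look like, as
opposed to spacing applied by the renderer. The operator set, the function
name add_operator_spacing and the choice of U+205F MEDIUM MATHEMATICAL SPACE
are assumptions made for illustration; the draft does not prescribe any of
them. The point is only that the spacing ends up encoded in the plain text:

```python
import re

# Assumption: U+205F MEDIUM MATHEMATICAL SPACE is the space the editor encodes.
MEDIUM_MATH_SPACE = "\u205f"
# Assumption: a small illustrative set of binary operators: + = − × ≤ ≥
OPERATORS = "+=\u2212\u00d7\u2264\u2265"

_OPS = re.escape(OPERATORS)
# An operator counts as binary only if the character directly before it is
# neither white space nor another operator, and something follows it.
_BINARY_OP = re.compile(rf"(?<=[^\s{_OPS}])\s*([{_OPS}])\s*(?=\S)")


def add_operator_spacing(formula: str) -> str:
    """Return the formula with explicit spaces encoded around binary operators.

    Hypothetical editor helper: white space already typed around an operator
    is replaced, so running the function twice gives the same result.
    """
    return _BINARY_OP.sub(MEDIUM_MATH_SPACE + r"\1" + MEDIUM_MATH_SPACE, formula)


if __name__ == "__main__":
    # Show the code points so the inserted spaces are visible.
    print([hex(ord(c)) for c in add_operator_spacing("a+b=c")])
```

A leading sign, as in "−x+y", is left alone, so only the "+" receives spaces.
An author who disagrees with the heuristic can simply edit the spaces, since
they are ordinary characters in the text.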
- On section 5.1 Recognizing Mathematical Expressions:
With intra-formula white-space being encoded in the plain text, and
variables typically being written in the Plane 1 math characters, there
should never be a need to explicitly delimit mathematical formulas
from normal text, as for the rendering engine, they would just be
normal text. In other words, it would be desirable if your proposal
did not make section 5.1 necessary.
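
To illustrate why Plane 1 variables need no delimiters, here is a small
sketch that maps ASCII letters to their mathematical italic counterparts and
recognizes such characters by code point alone. The function names are
hypothetical; the code points are those of the Mathematical Alphanumeric
Symbols block (U+1D400..U+1D7FF):

```python
# Bounds of the Mathematical Alphanumeric Symbols block in Plane 1.
MATH_ALNUM_FIRST = 0x1D400
MATH_ALNUM_LAST = 0x1D7FF

MATH_ITALIC_CAPITAL_A = 0x1D434
MATH_ITALIC_SMALL_A = 0x1D44E


def to_math_italic(letter: str) -> str:
    """Map one ASCII letter to its mathematical italic character."""
    if len(letter) != 1:
        raise ValueError("expected a single character")
    if letter == "h":
        # Italic small h is not in Plane 1; it is U+210E PLANCK CONSTANT.
        return "\u210e"
    if "A" <= letter <= "Z":
        return chr(MATH_ITALIC_CAPITAL_A + ord(letter) - ord("A"))
    if "a" <= letter <= "z":
        return chr(MATH_ITALIC_SMALL_A + ord(letter) - ord("a"))
    raise ValueError("not an ASCII letter")


def is_math_alphanumeric(ch: str) -> bool:
    """True if ch lies in the Mathematical Alphanumeric Symbols block."""
    return MATH_ALNUM_FIRST <= ord(ch) <= MATH_ALNUM_LAST


if __name__ == "__main__":
    x = to_math_italic("x")
    print(hex(ord(x)), is_math_alphanumeric(x), is_math_alphanumeric("x"))
```

Note that a check based only on the Plane 1 block misses the few letters,
such as italic h, that were encoded earlier in the BMP.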
- What is missing at the moment is a mechanism for handling matrices,
commutative diagrams and similar tabular arrangements of inline
formulas. Most markup languages and rendering engines already have
very sophisticated mechanisms for the layout of tables. I think
the best approach would be to simply use or slightly extend the
already available table mechanism to encode matrices. All that Unicode
has to add is a combining modifier corresponding to TeX's \left and
\right command that instructs a delimiter glyph to grow with the
height of the text in between, which could include an inline table with
centered alignment. Don't duplicate what the existing table engines
already provide. In that light, I would reconsider the need for the
briefly mentioned align-over operator.
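
For reference, this is the TeX behaviour meant here, shown in LaTeX: the
parentheses produced by \left and \right grow to the height of the enclosed
array, which plays the role of the inline table with centered columns. The
matrix entries are of course only an example:

```latex
\[
  A = \left( \begin{array}{cc}
        a_{11} & a_{12} \\
        a_{21} & a_{22}
      \end{array} \right)
\]
```
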
Using the table mechanism of the higher markup language has numerous
advantages:
- the DTD keeps control over where matrices are allowed (e.g., only in
displayed equations, but not inline and not in headings or
keyword lists)
- layout and cut-and-paste selection in tables is a very complex process