Fwd: PDUTR #25: Unicode Support for Mathematics

2002-01-04 Thread Asmus Freytag

FYI, Markus Kuhn sent the following comments to the authors of PDUTR #25.
They are of sufficient general interest to warrant discussion on
the list.

A./

PS: I've refreshed the HTML so there should be fewer problems for people
to read the equations in section 5 on Win2K or XP.

X-Mailer: exmh version 2.3+CL 01/14/2001 with nmh-1.0.4
To: Barbara Beeton [EMAIL PROTECTED], Asmus Freytag [EMAIL PROTECTED],
 Murray Sargent III [EMAIL PROTECTED]
cc: [EMAIL PROTECTED] (linux-utf8)
Subject: PDUTR #25: Unicode Support for Mathematics
X-URL: http://www.cl.cam.ac.uk/~mgk25/
Date: Thu, 03 Jan 2002 17:11:50 +
From: Markus Kuhn [EMAIL PROTECTED]

Dear Unicode Maths team,

I've read with enthusiasm your draft document

   http://www.unicode.org/unicode/reports/tr25/

and have great hopes that this project for Unicode Plain Text Encoding
of Mathematics will progress well and be widely implemented once it is
finished!

I thought (from comp.text.sgml discussions in the early 1990s) that it
was in general widely accepted that SGML is in practice far too
inconvenient for entering mathematical text and that any math DTD will
not lead naturally to intuitive and consistent keyboard entry
techniques, which is why I always considered MathML more an academic
exercise than anything that I would ever really want to use to get work
done. MathML has never been anywhere near being a potential competitor
for TeX.

I therefore observe with great interest that Unicode plans to treat
mathematics as just yet another complex script (like Indic, etc.), in a
way such that finally authors of SGML/XML document type definitions and
style sheets will not have to make much further provisions for support
of mathematics than for example define a single element for marking a
displayed equation. Also the prospect of being able to search for
mathematical formula fragments with web search engines is exciting.
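
To make the search prospect a little more concrete, here is a minimal
sketch, assuming that formula variables are written with the Plane 1
mathematical alphanumerics and that an indexer folds them to plain letters
with NFKC normalization. The function name fold_math_alphanumerics is
hypothetical and is not taken from the draft; a real search engine may
well index such text differently.

```python
import unicodedata


def fold_math_alphanumerics(text: str) -> str:
    """Fold styled math letters to their plain counterparts via NFKC.

    Assumption: the indexer treats a styled variable and the plain
    letter as the same search term.
    """
    return unicodedata.normalize("NFKC", text)


# Look the characters up by name rather than hard-coding code points.
x = unicodedata.lookup("MATHEMATICAL ITALIC SMALL X")
y = unicodedata.lookup("MATHEMATICAL ITALIC SMALL Y")

formula = f"{x} + {y} = 1"

print(hex(ord(x)))                       # a code point beyond the BMP
print(fold_math_alphanumerics(formula))  # plain-letter form for indexing
```

Folding discards the distinction between, say, italic and bold variables,
so an index built this way trades precision for recall.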

A few comments on the current draft:

   - It is not yet clear how white space is to be handled. In TeX,
 math mode has a lot of heuristics for adding white space where
 mathematical typographic tradition finds it convenient, for example
 around every operator. It has often been observed that scientific papers
 written in Word have far inferior mathematical spacing to
 papers written in TeX, because TeX's heuristic algorithms are
 far better than an inexperienced author. However, these heuristics
 fail frequently, and more often than desirable, TeX users have to
 manually override the math spacing with \, and the like.

 Your current text does not yet make it clear whether the additional
 white space used around mathematical operators will be added by the
 rendering engine and font (as in TeX) or will be encoded in the plain
 text. I suspect encoding the white space in the plain text is ultimately
 preferable, as it will ensure more control in a portable way, even
 though that means that typographic beginners will be more likely
 to produce ugly formulas. Heuristics like TeX's would have to become
 part of the keyboard entry and style checking mechanisms of the
 editor (like the Word spell checker), not of the rendering engine.
 This should hopefully make results more predictable across a wide
 range of rendering engines.

   - On section 5.1 Recognizing Mathematical Expressions:
 With intra-formula white space being encoded in the plain text, and
 variables typically being written in the Plane 1 math characters, there
 should never be a need to explicitly delimit mathematical formulas
 from normal text, since for the rendering engine they would just be
 normal text. In other words, it would be desirable if your proposal
 made section 5.1 unnecessary.

   - What is missing at the moment is a mechanism for handling matrices,
 commutative diagrams and similar tabular arrangements of inline
 formulas. Most markup languages and rendering engines already have
 very sophisticated mechanisms for the layout of tables. I think
 the best approach would be to simply use or slightly extend the
 already available table mechanism to encode matrices. All that Unicode
 has to add is a combining modifier corresponding to TeX's \left and
 \right commands that instructs a delimiter glyph to grow with the
 height of the text in between, which could include an inline table with
 centered alignment. Don't duplicate what the existing table engines
 already provide. In that light, I would reconsider the need for the
 briefly mentioned align-over operator.

 Using the table mechanism of the higher markup language has numerous
 advantages:

   - the DTD keeps control over where matrices are allowed (e.g., only in
 displayed equations, but not inline and not in headings or
 keyword lists)

   - layout and cut&paste selection in tables is a very complex process
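
As a rough illustration of the editor-side spacing idea in the first
comment above, the following sketch inserts an explicit space character
around every character of Unicode general category Sm (math symbol). It
assumes that U+2009 THIN SPACE is an acceptable spacing character; the
name space_operators is hypothetical, and nothing here is taken from the
draft or from TeX's actual rules.

```python
import unicodedata

THIN_SPACE = "\u2009"  # assumption: any explicit space character could be used


def space_operators(formula: str, space: str = THIN_SPACE) -> str:
    """Surround each math symbol (category Sm) with an explicit space."""
    out = []
    for ch in formula:
        if unicodedata.category(ch) == "Sm":
            out.append(space + ch + space)
        else:
            out.append(ch)
    return "".join(out)


spaced = space_operators("a+b=c")
print([hex(ord(ch)) for ch in spaced])  # shows where the spaces were inserted
```

This is deliberately naive: it spaces unary operators the same way as
binary ones, and it misses the ASCII hyphen-minus, which is category Pd
rather than Sm. Those are exactly the cases where an author would still
have to override a heuristic by hand.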

Re: PDUTR #25: Unicode Support for Mathematics

2001-12-29 Thread Asmus Freytag

At 12:34 AM 12/28/01 -0600, [EMAIL PROTECTED] wrote:
If you want to define text/math, and provide the disappearing parentheses
and precedence tables and everything, then that's fine, but I don't see
why it should be part of Unicode, any more than full music rendering is part
of Unicode. It's a higher level protocol. IMO, section 5 should not be part
of a Unicode draft report for that reason.

This opinion is shared by others, and the best place for the information in
section 5 will certainly be discussed when the current *proposed* draft is
being reviewed for advancement to *draft* status. In the meantime, I'd like to
comment on your analogy.

The analogies to full music rendering are not as close as they appear at
first glance. If you look at a mathematical, scientific or technical paper
you will find a substantial amount of mathematical notation appearing as part
of ordinary text lines, including, at times, headings and titles, in other
words, strings that often are part of databases. (**)

The same is not true for music laid out on staff.

Secondly, a large subset of mathematical formulae can be expressed very 
directly in a linear, text-like fashion, even though the remainder do 
require fairly heavyweight markup to display correctly.

Again, this is not strictly analogous to musical notation.

The convention proposed in section 5 is clearly a lightweight markup 
protocol. The disappearing parens in themselves are borderline in that 
regard - in fact the mechanism is not far removed from some of the complex 
script cases, where characters may or may not be invisible depending on 
context. And giving operators some properties is not so far removed from
FRACTION SLASH. However, to be workable, the proposed convention needs 
subscript and superscript operators, and ultimately a convention of 
applying certain decorations (limits, as well as combining accents) to 
both individual characters and groups of characters. These are the aspects 
that most clearly appear to cross the line into the realm of markup protocols.
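
To illustrate what is meant by disappearing parentheses, here is a toy
sketch. It assumes that they are grouping parentheses which delimit an
operand in the linear form and are dropped once the fraction is built up.
The names strip_grouping_parens and split_fraction are hypothetical, and
the code is not the algorithm of section 5; it handles a single top-level
"/" and nothing else.

```python
def strip_grouping_parens(operand: str) -> str:
    """Drop one pair of outer parentheses if it encloses the whole operand."""
    if len(operand) >= 2 and operand[0] == "(" and operand[-1] == ")":
        depth = 0
        for i, ch in enumerate(operand):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0 and i != len(operand) - 1:
                    return operand  # first "(" closes early, e.g. "(a)(b)"
        return operand[1:-1]
    return operand


def split_fraction(linear: str):
    """Split at the first top-level "/" into (numerator, denominator).

    Returns None if there is no "/" outside parentheses.
    """
    depth = 0
    for i, ch in enumerate(linear):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "/" and depth == 0:
            return (strip_grouping_parens(linear[:i]),
                    strip_grouping_parens(linear[i + 1:]))
    return None


print(split_fraction("(a+b)/(c+d)"))
```

The parentheses in "(a+b)/(c+d)" only group the operands, so a built-up
fraction no longer needs them, whereas in "(a)(b)/c" the numerator keeps
its parentheses because no single pair encloses all of it.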

On the other hand, the proposed 'markup' itself consists of the kinds of 
things that one would use in a plain text fallback, e.g. when communicating 
an equation by e-mail. One can conclude that the proposed convention is a 
'renderable plain text fallback' that happens to cover a large subset of 
commonly used mathematical notation. It is therefore a very different beast 
from MathML, which is a full-fledged markup protocol, able to cover 
practically everything and only barely human readable in source form.

As such it occupies a novel middle ground between the plain Unicode (with 
script rules) and full-fledged markup schemes.

A./

PS: Disclaimer: while I am a co-author of the TR, the credits for inventing 
the scheme described in section 5 belong entirely to Murray Sargent, who 
will undoubtedly have his own things to say about it.

PPS: I did not elaborate on why the fact (marked with (**) above) that 
limited amounts of mathematical notation end up in database strings is 
significant. The reason is that such strings are ultimately plain text that 
need to be rendered in the absence of heavy duty markup protocols. A 
convention that implements its own plain-text fallback has great advantages.




Re: PDUTR #25: Unicode Support for Mathematics

2001-12-27 Thread starner

If you want to define text/math, and provide the disappearing parentheses
and precedence tables and everything, then that's fine, but I don't see
why it should be part of Unicode, any more than full music rendering is part
of Unicode. It's a higher level protocol. IMO, section 5 should not be part
of a Unicode draft report for that reason.