Re: RFC 1345 mnemonics table not consistent with Unicode 3.2.0

Ben Finney Sat, 25 Aug 2007 19:14:49 -0700

[People, please don't send me copies of list messages by mail. I'm
subscribed to the list and read it via a non-mail interface.]

"Doug Ewell" <[EMAIL PROTECTED]> writes:

> Ben Finney <ben plus ietf at benfinney dot id dot au> wrote:
> 
> > The issue remains that the informational RFC presents useful
> > mnemonics for many characters, and there doesn't appear to be such
> > a thing from Unicode or ISO. That's the point of an update to RFC
> > 1345: it serves a purpose that I can't see served comparably well
> > elsewhere.
> 
> You might not find much enthusiasm in the character-encoding community
> for the mnemonics published in RFC 1345, and later as the so-called
> "repertoiremap" in ISO/IEC TR 14652.  These have been widely
> criticized for their incompleteness, (real or perceived)
> arbitrariness, and lack of extensibility to scripts not already
> covered.

Thanks for this. I agree that, for *encoding* and *naming*, the
mnemonics aren't much use anymore: we have superior encodings and
Unicode names, and the shortcomings you (correctly) ascribe to the
mnemonics in RFC 1345 make them a poor fit for those purposes.

The "repertoiremap" of ISO/IEC TR 14652 is apparently meant to be for
character transmission and translation only. It seems more extensible
for that purpose than the mnemonic approach in RFC 1345.

There is one specific application of the RFC 1345 mnemonics for which
I've not seen a superior reference: direct character *input* at a
keyboard using an input method program. There are numerous programs
(e.g. Emacs, SCIM) that support the RFC 1345 character mnemonic table
as an input method for typing key sequences to input the corresponding
characters.
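
As a rough illustration of that kind of input method, here is a
minimal Python sketch. The three entries are the ones mentioned in
this thread (a with acute, c with cedilla, and a Greek letter marked
by an asterisk); the names MNEMONICS and expand are hypothetical, and
the greedy two-key matching is a simplifying assumption, not how any
particular program works.

```python
# Illustrative subset of mnemonic entries; not the full RFC 1345 table.
MNEMONICS = {
    "a'": "\u00e1",  # LATIN SMALL LETTER A WITH ACUTE
    "c,": "\u00e7",  # LATIN SMALL LETTER C WITH CEDILLA
    "a*": "\u03b1",  # GREEK SMALL LETTER ALPHA
}


def expand(keys: str, table=MNEMONICS, width: int = 2) -> str:
    """Replace each known two-key mnemonic with its character, left to right."""
    out = []
    i = 0
    while i < len(keys):
        chunk = keys[i:i + width]
        if chunk in table:
            # Known mnemonic: emit the mapped character, consume both keys.
            out.append(table[chunk])
            i += width
        else:
            # Unknown sequence: pass the single key through unchanged.
            out.append(keys[i])
            i += 1
    return "".join(out)


print(expand("fac,ade"))
```

A real input method would normally require an explicit prefix or
compose key before a mnemonic, so that ordinary text such as "music,"
is not rewritten by accident.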

> Most people will agree that "a plus apostrophe" makes a handy
> mnemonic for "a with acute," and "c plus comma" works well for "c
> with cedilla," but the system tends to break down rather quickly
> after that, with Greek letters identified by an asterisk, Cyrillic
> by an equal sign, Hebrew by a capital letter and plus sign, Arabic
> by a small letter and plus sign, etc.

So long as the table follows some kind of system (and the definition
of the RFC 1345 character mnemonic table does at least explain the
scheme it uses for those character sets), it is still useful as a
means of remembering short, discrete mnemonics for a large set of
characters.

> There are numerous exceptions to these guidelines, especially when
> the letters in question don't map cleanly to Basic Latin, and a
> large number of non-ideographic characters have no mnemonic at all,
> even some that were defined in ISO 10646 at the time RFC 1345 was
> published.

Yes, the system does have its limits; a mnemonic table cannot
reasonably expect to map mnemonic pure-ASCII keyboard characters to
*every* set of characters in ISO 10646. But with those limits
acknowledged, the mnemonic system can be useful for those character
sets where there *is* a reasonable expectation of such a mapping.

> That is why you are unlikely to find an update to RFC 1345 that
> brings the mnemonics up to date with 10646/Unicode: the task is
> almost impossible, given the limitations of the system.

Indeed. My initial comment was merely that even the characters that
*are* covered by the mnemonic table are not in accord with the current
Unicode data. To the extent that the character mnemonic table is
useful, it is surely undermined if the data are wrong.
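
The kind of consistency check I have in mind could be sketched as
below, assuming the table is available as (mnemonic, code point,
name) rows. TABLE and find_mismatches are hypothetical names, and the
last row is a deliberately wrong entry added only to show a mismatch
being reported; it is not taken from RFC 1345. Python's unicodedata
module exposes a Unicode 3.2.0 database as unicodedata.ucd_3_2_0.

```python
import unicodedata

# Hypothetical rows: (mnemonic, code point, character name as listed).
TABLE = [
    ("a'", 0x00E1, "LATIN SMALL LETTER A WITH ACUTE"),
    ("c,", 0x00E7, "LATIN SMALL LETTER C WITH CEDILLA"),
    ("a*", 0x03B1, "GREEK SMALL LETTER ALPHA"),
    # Deliberately wrong name, for demonstration only.
    ("??", 0x00E1, "LATIN SMALL LETTER A ACUTE"),
]


def find_mismatches(table, ucd=unicodedata.ucd_3_2_0):
    """Return rows whose listed name differs from the name in the given UCD."""
    mismatches = []
    for mnemonic, code_point, listed_name in table:
        # name() returns the default (None) for unnamed or unassigned code points.
        actual = ucd.name(chr(code_point), None)
        if actual != listed_name:
            mismatches.append((mnemonic, code_point, listed_name, actual))
    return mismatches


for mnemonic, code_point, listed, actual in find_mismatches(TABLE):
    print(f"{mnemonic!r} U+{code_point:04X}: table says {listed!r}, UCD says {actual!r}")
```

Passing the unicodedata module itself as the ucd argument would run
the same check against whatever Unicode version the Python build
ships with.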

> The motivation for inventing these mnemonics seems to have been to
> specify characters "in a coded character set independent way," which
> was perhaps a sensible goal in 1992 when the Universal Character Set
> was quite a bit less universal.

I'm beginning to see where the gap in understanding lies; I've been
approaching this discussion caring *only* about the character mnemonic
table in RFC 1345, whereas others have (reasonably) approached the
discussion in the context of the entire RFC document and its apparent
purpose.

> Today, however, virtually all non-10646 character sets are mapped to
> 10646 code points, not to alphabetic mnemonics.

This is true for the purpose of *encoding*, but for the purpose of
*input* at a non-remapped largely-ASCII keyboard, input method
programs certainly do map ASCII mnemonic sequences to non-ASCII
characters.

> Almost any character that can be found in a national or industry
> charset can be found in 10646.  The need for a notation independent
> of 10646 has passed.

I think the domain of keyboard character input clearly needs brief
mnemonic ASCII sequences, not numeric ordinals or descriptive
character names, to map to the desired characters.

Thanks very much for the discussion; it's becoming clearer now. Two
further questions:

I'd like to discuss this with the people who made the original RFC
1345 character mnemonic table. How would I get in touch with the
authors of RFC 1345?

It wasn't my intention to write a new discussion draft, but since my
purpose differs significantly from the broad purpose of RFC 1345, a
new draft aimed at the purpose I have in mind may be warranted. What
should I read (URLs please) before doing so?

-- 
 \         "If we don't believe in freedom of expression for people we |
  `\         despise, we don't believe in it at all."  -- Noam Chomsky |
_o__)                                                                  |
Ben Finney

_______________________________________________
Ietf mailing list
Ietf@ietf.org
https://www1.ietf.org/mailman/listinfo/ietf
