Re: Private Use proposals (long)

2002-05-24 Thread Doug Ewell

Michael Everson everson at evertype dot com wrote:

 At 08:51 -0700 2002-05-21, Doug Ewell wrote:
 (Deseret and Shavian were encoded in ConScript; whether that helped
 get them into Unicode or not, I don't know.)

 Certainly not. They were examined on their merits just like anything
else.

Of course they were.  By "helped" I didn't mean that the characters
wouldn't otherwise have been worthy of encoding, but that the CSUR
assignments might have resulted in additional usage, which in turn got
the attention of the UTC and/or WG2.

I'm trying to examine the passage in TUS 3.0, Section 13.5 (p. 323)
which seems to have caught Mr. Overington's fancy:

  "Promotion of Private-Use Characters.

  In future versions of the Unicode Standard, some characters that have
  been defined by one vendor or another in the Corporate Use subarea may
  be encoded elsewhere as regular Unicode characters if their usage is
  widespread enough that they become candidates for general use.  The
  code positions in the Private Use Area are permanently reserved for
  private use -- no assignment to a particular set of characters will
  ever be endorsed by the Unicode Consortium."

Ignoring the last sentence, because we all seem to be on board with
that, I think the image of the PUA that may have emerged from this is
that of a test bed for proposed characters.  In this scenario,
characters are encoded in the PUA *so that* they will gain increased
usage, *so that* the UTC will take note of the increased usage and
respond by promoting the character to Unicode.  (I think the use of
the word "promotion" in the 13.5 subhead is turning out to be a bad
idea, as it implies a simple and straightforward progression.)

As I mentioned earlier, as far as I know no script or character has
followed this path deliberately -- that is, been encoded in the PUA for
the express purpose of satisfying Unicode's "widespread usage"
requirement.  Of course, we all know (don't we?) that a script or
character must satisfy many other criteria as well.  Deseret and Shavian
obviously did satisfy those criteria, as well as being judged to have
sufficiently widespread usage.

Those additional criteria -- not frequency of usage -- are what will
prevent additional Latin ligatures from being promoted to Unicode.

To answer (I hope) some of William's other points:

 Well, the ideas are not intended to be quasi-official.  Just one end
 user of the Unicode system seeking to use the Private Use Area to
 good effect and putting forward ideas to other end users who might
 like to consider using some of the facilities suggested.

Hooray for that.  The PUA is there for just that purpose.  However, in
the spirit of using Unicode, please also respect the character-glyph
model, which says (among other things) that a ligature is a glyph
requiring a font rendering, not a character requiring a code point.

 Now, the fact is that Michael suggested a feature named ZERO WIDTH
 LIGATOR specifically for the purpose of ligation and it appears that
 that suggestion has not been accepted, but that a shared solution
 with a code point that can also mean something else has been decided
 upon.  Now, I do not know the details of all of this and I certainly
 hope to study the matter more, yet, as someone who is not a linguist
 as such but an inventor and programmer, I have a concern that using
 one code point for two types of meaning rather than one code point
 for each type of meaning is what I call a software unicorn.  The
 concept of a software unicorn can be read about on
 http://www.users.globalnet.co.uk/~ngo/euto0008.htm if anyone is
 interested.

I gather from the article that a software unicorn is an unlikely,
perhaps impossible, situation that nevertheless must be handled because
it cannot be completely ruled out.  Lots of defensive code gets
written to handle such situations, often with a comment like:

    default:  // this can't happen, but...
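A Python rendering of that defensive idiom (the function and its ranges are invented purely for illustration):

```python
def classify(code_point: int) -> str:
    """Classify a code point; a hypothetical example of defensive dispatch."""
    if 0xE000 <= code_point <= 0xF8FF:
        return "private-use"
    if 0 <= code_point <= 0x10FFFF:
        return "regular"
    # This "can't happen" for well-formed input, but we guard anyway --
    # the classic defence against a software unicorn.
    raise ValueError(f"not a Unicode code point: {code_point:#x}")
```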

In this context, I think William is saying that it's risky to overload
ZWJ to handle Latin ligation because we can't completely rule out the
possibility that we might need ZWJ to join Latin characters the way it
currently joins Arabic characters.  This concern can probably be put to
rest by reading the description in Section 13.2 of the Unicode 3.1
Technical Report (UAX #27).  The description carefully spells out the
relationship between cursively connected and ligated renditions and
the roles ZWJ and ZWNJ play in determining the rendition to be used.
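A sketch of what that means at the code-point level; whether anything visibly ligates depends entirely on the font and renderer:

```python
ZWJ = "\u200D"   # ZERO WIDTH JOINER: request a more connected rendering
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER: request a less connected rendering

# Per UAX #27, ZWJ between two letters asks the renderer to prefer a
# ligated (or cursive) form if one is available; ZWNJ asks it not to.
prefer_ligature = "c" + ZWJ + "t"
prevent_ligature = "c" + ZWNJ + "t"

# Either way the underlying letters are unchanged: these are rendering
# hints, not new characters.
assert prefer_ligature.replace(ZWJ, "") == "ct"
assert prevent_ligature.replace(ZWNJ, "") == "ct"
```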

 As to strong opposition to encoding additional presentation forms
 for alphabetic characters, well, we live in a democratic society and
 if some people who would like to produce quality printing feel that
 using a TrueType fount with some ligature characters does what they
 want and harms no one else, what exactly is the objection?

Ah, but it *isn't* harmless.  It causes problems for normalization.  For
homework tonight, read UAX #15, Unicode Normalization Forms.  The key
point for our discussion is that 
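The normalization problem with encoding ligatures as characters can be seen with the compatibility ligatures Unicode already has: under NFKC they decompose to their component letters.

```python
import unicodedata

# U+FB01 LATIN SMALL LIGATURE FI carries a compatibility decomposition,
# so NFKC folds it to plain "fi" and the ligature character is lost.
assert unicodedata.normalize("NFKC", "\uFB01") == "fi"

# NFC leaves compatibility characters alone, so the two spellings of the
# same text still compare unequal after NFC normalization.
assert unicodedata.normalize("NFC", "\uFB01") != "fi"
```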

Courtyard Codes and the Private Use Area (derives from Re: Encoding of symbols and a lock/unlock pre-proposal)

2002-05-24 Thread William Overington

Peter Constable included the following in his post.

As for
PUA, many people have their own plans regarding U+F300..U+F3FF. For my own
part, my plans for U+F300..U+F3FF almost certainly do not involve padlock
symbols.

Thank you for your email.

As is well known, the Unicode Consortium will not endorse any code point
allocations in the Private Use Area and everyone has the right to allocate
none, some or all code points in the Private Use Area as he or she chooses,
and to publish them if he or she so chooses.

This is an interesting situation.  If one views the situation from the
inside looking out, then there can be no certainty about the intended
meaning of a Private Use Area code point used in a Unicode plain text
file on the basis of examining the code points alone.

However, if one views the situation from the outside looking in, a somewhat
different situation arises.

Suppose that I define a .eut file format to be structurally a Unicode plain
text file with the added feature that all code points that are within the
Unicode Private Use Area are defined to have the meanings which I give them
in my eutocode set of code point allocations.

So, a .eut file could be a rigorously defined file format, just as is .bmp
or .png.  If a wordprocessing package were to have a selection option for
reading in files of a .eut format, then there would be no confusion
whatsoever about the meaning of, say, a U+E707 character: it would be a ct
ligature.
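A minimal sketch of such a reader is easy to imagine.  Everything below is hypothetical except the U+E707 "ct" ligature assignment mentioned in the post:

```python
# Hypothetical sketch of a ".eut" reader: structurally a plain-text
# Unicode file, plus a private agreement about what PUA code points
# mean.  The table is invented for illustration; only U+E707 = "ct"
# ligature comes from the post itself.
EUTOCODE = {
    "\uE707": "ct",  # ct ligature
}

def decode_eut(text: str) -> str:
    """Replace agreed PUA code points with plain-text fallbacks."""
    return "".join(EUTOCODE.get(ch, ch) for ch in text)
```

A word processor with a ".eut" import option would consult the same table at rendering time, choosing a ligature glyph instead of the fallback string.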

Now, suppose I define a .uto file format to be structurally a Unicode plain
text file with the added feature that all code points that are within the
U+F3.. block of the Private Use Area have the meanings of a set of codes
called Courtyard Codes, and all other code points that are within the
Private Use Area have an undefined meaning, unless a sequence of some of the
Courtyard Codes has indicated from which type tray all subsequent Private
Use Area codes which are not in the U+F3.. block are to be regarded as
coming.

A wordprocessing package could be programmed by its manufacturer to accept
input in .uto file format, with accuracy of meaning for every code point
used in the file, even if some Private Use Area code points were used to
have two different meanings in two parts of the same document.



I like to imagine an analogy for the way that Unicode code points can be
defined: a large kitchen table, which is plane 0.  Onto most parts of
the table, pieces of coloured paper are laid, always taking care that no
piece overlaps any other, so that the table surface is only ever covered
by one thickness of paper.  About one tenth of the table is an area
called the Private Use Area, and here paper can be piled; perhaps 500
sheets could be piled upon it.  So, if someone asks of some particular
place on the table, "What colour is the paper?", then for parts of the
table not in the Private Use Area, the colour can be stated.  For the
Private Use Area, however, the colour cannot be stated with certainty:
it depends upon which piece of paper is being viewed at any one time.

Suppose, however, that the people placing paper onto the Private Use
Area agree amongst themselves that they like the look of one nice yellow
square that takes up a small part of the area, and will voluntarily
avoid placing any paper on top of it.  One would then end up with a
Private Use Area that has coloured paper piled up all over it, except
for one small area where there is a yellow square.  The net effect would
be that the area covered by the yellow square is as uniquely defined, as
to the colour of paper upon it, as anywhere outside the Private Use
Area.

Now, the question that naturally arises is as follows.  Will all end users
agree to keep the U+F3.. area only for the Courtyard Codes?  Who knows?  I
suggest, however, that it is possible that they will, because I hope
that, when they consider the matter, people will feel that it is to
their own advantage to do so.

I feel that if everybody who wishes to make definitions in the Private
Use Area learns of the existence of the Courtyard Codes, and finds that
the features they could provide are extremely useful and may, in time,
become built into widely used software packages, then they might well
do so.

What would this take?

Ease of use.  Where a wordprocessing package or a desktop publishing package
or whatever has an option for reading in a Unicode plain text file it would
also have an option for reading in a .uto file.  The Courtyard Codes would
need to be well defined, publicly available, free to use and free of legal
entanglements.  Please note that I have chosen the name "Courtyard
Codes" for the system as "Courtyard" and "Codes" are two English words,
not words specially coined.  I got the idea of using the word Courtyard 

Please help: problem with Netscape 6.x

2002-05-24 Thread Magda Danish (Unicode)

Please reply directly to [EMAIL PROTECTED]
Thanks.
 
Magda

-Original Message- 


Date/Time:Fri May 24 04:02:24 EDT 2002

Contact:  [EMAIL PROTECTED]

Report Type:  General question

Text of the report is appended below:

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

I'm developing an internet application for user to input multi-lingual data. 
It works fine using internet explorer
5.x and above. However, it does not work for Netscape even for version 6.x.

1. All my web pages set the charset to UTF-8:

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
...
</head>

2. My database is set to accept UTF-8 or Unicode characters.

3. My browser encoding selects UTF-8.

IE 5.x: View > Encoding > Auto-Select or Unicode (UTF-8).
Netscape 6.x: View > Character Coding > Auto-Detect > Auto-Detect (All) or
Unicode (UTF-8).

Problem description (in Netscape):
When I retrieved data (in a form such as &#x3B1; -- my non-English data
is stored in the database in this form) from my database and displayed
it in a page, it displayed correctly.  I proceeded to edit some fields,
leaving other fields as they were (even for multi-lingual data).  When I
posted the data and navigated to the next page, the non-English fields
appeared as '?'.  I've spent days trying to figure out the problem, but
to no avail.  I'm really at my wit's end and really need any help you
can offer.

I got to know your email address while surfing the net for a solution
and saw your articles.  I hope that you really can help.

Thanks,
Geok Hu

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
(End of Report)







Re: Courtyard Codes and the Private Use Area (derives from Re: Encoding of symbols and a lock/unlock pre-proposal)

2002-05-24 Thread Philipp Reichmuth

WO U+F3A2 PLEASE LIGATE THE NEXT TWO CHARACTERS
WO U+F3A3 PLEASE LIGATE THE NEXT THREE CHARACTERS
WO U+F3A4 PLEASE LIGATE THE NEXT FOUR CHARACTERS

While I don't think this discussion of various PUA allocations should
continue much further, it's probably a lot better to introduce the
already-discussed ZERO WIDTH LIGATOR in such a form that X ZWL Y
produces the XY ligature, X ZWL Y ZWL Z the XYZ ligature, and so on.  It
saves you a lot of hassle with longer ligatures.
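ZERO WIDTH LIGATOR was never actually encoded; to make the advantage concrete, here is a sketch using an arbitrary PUA code point as a stand-in (entirely hypothetical):

```python
# Stand-in for the never-encoded ZERO WIDTH LIGATOR; the PUA value is
# arbitrary and purely for illustration.
ZWL = "\uE000"

def ligature_groups(text: str) -> list[str]:
    """Split text into the runs a renderer would try to ligate.

    With X ZWL Y ZWL Z, the ligature's length falls out of the text
    itself -- no need for separate two-, three-, four-character codes.
    """
    groups: list[str] = []
    run = ""
    i = 0
    while i < len(text):
        if text[i] == ZWL and i + 1 < len(text):
            run += text[i + 1]  # the ligator binds on the next character
            i += 2
        else:
            if run:
                groups.append(run)
            run = text[i]
            i += 1
    if run:
        groups.append(run)
    return groups
```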

WO U+F3A8 PLEASE SWASH THE NEXT PRINTABLE ITEM
WO U+F3A9 PLEASE ALTERNATIVE SWASH THE NEXT PRINTABLE ITEM

Does this belong in a character-based encoding system at all?  This is
better solved by markup.  If you are defining your own file formats
anyway, include a sensible markup system there; then you don't have to
clutter the PUA and restrict its use.  What if you've got more than two
swash forms, by the way?

WO U+F3C0 PLAIN - ITALIC:=false; BOLD:=false;
WO ...
WO U+F3FF 192 POINT

Again, markup is the better solution. And, to be honest, it's a bit of
a waste of space on the mailing list, don't you think?

WO I hope that these Courtyard Codes will be of interest to end users.

I don't really think so. They don't offer very much that well-known
typesetting systems don't implement already in their own fashion.

Philipp







  Philipp  mailto:[EMAIL PROTECTED]
___
Stay the patient course / Of little worth is your ire / The network is down





Re: Courtyard Codes and the Private Use Area (derives from Re:Encoding of symbols and a lock/unlock pre-proposal)

2002-05-24 Thread Michael Everson

William,

Your Courtyard codes are a form of formatting markup. Why not use XML?

Everyone else will do.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Contributions from meeting 42 - Charts, Resolutions and latestdocument register

2002-05-24 Thread Michael Everson

Some new WG2 documents:

N2492 http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2492.pdf
Charts - 10646-2 AMD1
Freytag
2002-05-22

N2491 http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2491.pdf
Charts - 10646-1 AMD2
Freytag
2002-05-22

N2454 http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2454.doc
Dublin Meeting 42 Resolutions
Ksar
2002-05-23

N2450 http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2450.htm
Partial document register - 2190 - 2472 in reverse
Ksar
2002-05-23


Mike Ksar
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Adding characters with decompositions (was Re: Private Use proposals)

2002-05-24 Thread David Hopwood

-BEGIN PGP SIGNED MESSAGE-

Doug Ewell wrote:
 [...] Beyond a certain point in time (defined as Unicode 3.1), no new
 canonical or compatibility equivalences can be defined.

Huh? What about the compatibility ideographs U+FA30..FA6A, added in 3.2?
Or U+2047 DOUBLE QUESTION MARK, also added in 3.2?

A correct statement of the policy is that no newly assigned character can
*canonically* decompose to *two characters* unless it is added to the
composition exclusion list.
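Both 3.2 examples can be verified with Python's unicodedata module (assuming an interpreter whose Unicode database is 3.2 or later):

```python
import unicodedata

# U+FA30, a CJK compatibility ideograph added in Unicode 3.2, has a
# *canonical* decomposition -- but to a single character, U+4FAE, which
# the stability policy still permits.
assert unicodedata.normalize("NFC", "\uFA30") == "\u4FAE"

# U+2047 DOUBLE QUESTION MARK, also added in 3.2, has only a
# *compatibility* decomposition: NFC leaves it alone, NFKC expands it.
assert unicodedata.normalize("NFC", "\u2047") == "\u2047"
assert unicodedata.normalize("NFKC", "\u2047") == "??"
```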

- -- 
David Hopwood [EMAIL PROTECTED]

Home page  PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip


-BEGIN PGP SIGNATURE-
Version: 2.6.3i
Charset: noconv

iQEVAwUBPO6mzjkCAxeYt5gVAQHBswf/dFDQapAcGOWu+lVn47ZIz+azsachQALj
aws8rBZ5v9vSPT6KGVxdTVpPRtGEiLtNcFr5mEGnvQBaQ/eYBqb+xFnXVxPbxa4u
Vks8wRKTY6cEAlhzNFI/3da9Y9cb77PgAhtJVIniZwbDaqkI1a0K/y9DwJvXSl6O
r3RC6J55L2k/B+jbT7JvacRpvrKwOGwvQUiec+krs2u0D0Z64lCUjVAqgVTv+AYQ
1x1blsyqeEZSzH02W5q//JzFHI7+AADd3O1OZpzi3lUiNNTR7NZDBjtX3D4OEuLv
AvzkywQCaPbr7Sgl4Y/6uJ6Zz4/FzPS6B1FjoyfE62rGlSTEpvadvA==
=BfyB
-END PGP SIGNATURE-




Re: Courtyard Codes and the Private Use Area (derives from Re: Encoding of symbols and a lock/unlock pre-proposal)

2002-05-24 Thread John H. Jenkins


On Friday, May 24, 2002, at 08:06 AM, Philipp Reichmuth wrote:

 WO U+F3A2 PLEASE LIGATE THE NEXT TWO CHARACTERS
 WO U+F3A3 PLEASE LIGATE THE NEXT THREE CHARACTERS
 WO U+F3A4 PLEASE LIGATE THE NEXT FOUR CHARACTERS

 While I don't think this discussion of various PUA allocations should
 continue very further, it's probably a lot better to introduce the
 already-discussed ZERO WIDTH LIGATOR in such a form that X ZWL Y
 produces the XY ligature, X ZWL Y ZWL Z the XYZ ligature and so on. It
 saves you a lot of hassle with longer ligatures.



Zero width ligator was rejected.  Zero-width joiner can be used to mark 
ligation points where they are absolutely necessary; where they are merely 
stylistic preferences, they belong in markup.

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/





Re: Courtyard Codes and the Private Use Area (derives from Re: Encoding of symbols and a lock/unlock pre-proposal)

2002-05-24 Thread Michael \(michka\) Kaplan

From: John H. Jenkins [EMAIL PROTECTED]
Sent: Friday, May 24, 2002 1:54 PM

 On Friday, May 24, 2002, at 08:06 AM, Philipp Reichmuth wrote:

  WO U+F3A2 PLEASE LIGATE THE NEXT TWO CHARACTERS
  WO U+F3A3 PLEASE LIGATE THE NEXT THREE CHARACTERS
  WO U+F3A4 PLEASE LIGATE THE NEXT FOUR CHARACTERS
 
  While I don't think this discussion of various PUA allocations should
  continue very further, it's probably a lot better to introduce the
  already-discussed ZERO WIDTH LIGATOR in such a form that X ZWL Y
  produces the XY ligature, X ZWL Y ZWL Z the XYZ ligature and so on. It
  saves you a lot of hassle with longer ligatures.
 
 

 Zero width ligator was rejected.  Zero-width joiner can be used to mark
 ligation points where they are absolutely necessary; where they are merely
 stylistic preferences, they belong in markup.

But with that said, I have to agree with Philipp -- the PUA discussion
really needs to end.

William, please start thinking of the PUA as the city dump. Everyone is glad
it is there when you have to stick something somewhere, but no one really
talks about it much and no one *ever* wants to take things out of it and
strew it on their nice, clean characters.

:-)


MichKa

Michael Kaplan
Trigeminal Software, Inc.  -- http://www.trigeminal.com/





Language name questions

2002-05-24 Thread Deborah Goldsmith

Hi,

I am trying to determine the names of a few languages in their own 
language. This is for a list of language names that a user can select, 
like:

English
Français
日本語

and so on.

I need answers to some particular questions, but if someone could point 
me at a book or web site, then that would be even better.

Here are the languages I'm trying to pin down:

Hungarian: magyar or magyarul?
Slovak: Slovenský?
Slovenian: Slovenski? Slovensko?

Any help would be gratefully appreciated!

Deborah Goldsmith
Manager, Fonts & Unicode
Apple Computer, Inc.
[EMAIL PROTECTED]





Re: Language name questions

2002-05-24 Thread Mark Davis

For the ICU data for those, look at:

http://oss.software.ibm.com/cgi-bin/icu/lx/en_US/utf-8/?_=hu
http://oss.software.ibm.com/cgi-bin/icu/lx/en_US/utf-8/?_=sk
http://oss.software.ibm.com/cgi-bin/icu/lx/en_US/utf-8/?_=sl

(Note: there are sometimes oddities with the server; 'refresh' if you
get a blank screen.)

Mark
__

http://www.macchiato.com

 “Eppur si muove”
- Original Message -
From: Deborah Goldsmith [EMAIL PROTECTED]
To: Unicode List [EMAIL PROTECTED]
Sent: Friday, May 24, 2002 16:04
Subject: Language name questions


 Hi,

 I am trying to determine the names of a few languages in their own
 language. This is for a list of language names that a user can
select,
 like:

 English
 Français
 日本語

 and so on.

 I need answers to some particular questions, but if someone could
point
 me at a book or web site, then that would be even better.

 Here are the languages I'm trying to pin down:

 Hungarian: magyar or magyarul?
 Slovak: Slovenský?
 Slovenian: Slovenski? Slovensko?

 Any help would be gratefully appreciated!

 Deborah Goldsmith
 Manager, Fonts & Unicode
 Apple Computer, Inc.
 [EMAIL PROTECTED]








Re: Language name questions

2002-05-24 Thread Deborah Goldsmith

On Friday, May 24, 2002, at 05:43 PM, Mark Davis wrote:

 http://oss.software.ibm.com/cgi-bin/icu/lx/en_US/utf-8/?_=sk

This has Slovenčina, but we've also seen Slovenský.

Deborah





Re: Language name questions

2002-05-24 Thread Mark Davis

I'll forward it to our localization people, and see what they say.

Mark
__

http://www.macchiato.com

 Eppur si muove
- Original Message -
From: Deborah Goldsmith [EMAIL PROTECTED]
To: Mark Davis [EMAIL PROTECTED]
Cc: Unicode List [EMAIL PROTECTED]
Sent: Friday, May 24, 2002 18:26
Subject: Re: Language name questions


On Friday, May 24, 2002, at 05:43 PM, Mark Davis wrote:

 http://oss.software.ibm.com/cgi-bin/icu/lx/en_US/utf-8/?_=sk

This has Slovenčina, but we've also seen Slovenský.

Deborah







Re: Courtyard Codes and the Private Use Area

2002-05-24 Thread Doug Ewell

Michael (michka) Kaplan michka at trigeminal dot com wrote:

 William, please start thinking of the PUA as the city dump. Everyone
 is glad it is there when you have to stick something somewhere, but
 no one really talks about it much and no one *ever* wants to take
 things out of it and strew it on their nice, clean characters.

Oh, that's going a bit far.  I think it's more like an attic or
basement, a handy place to store your old baseball card collection and
other personal items that don't belong anywhere else, but definitely not
a place you'd want to invite the neighbors or make the center of
attention of your house.

-Doug Ewell
 Fullerton, California





Re: Language name questions

2002-05-24 Thread Doug Ewell

Deborah Goldsmith goldsmit at apple dot com wrote:

 Here are the languages I'm trying to pin down:

 Hungarian: magyar or magyarul?
 Slovak: Slovenský?
 Slovenian: Slovenski? Slovensko?

FWIW, the language menu in my Ericsson T28 World phone offers Magyar
(Hungarian), Slovenčina (Slovak), and Slovenski (Slovenian).

-Doug Ewell
 Fullerton, California





N2476 a hoax?

2002-05-24 Thread Doug Ewell

A new JTC1/SC2/WG2 document, ostensibly from the Unicode Technical
Committee, was posted on the WG2 web site this past week.  The URL is:

http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2476.pdf

This document is so far removed from the stated position of the UTC, and
so far below its normal editorial standards, that I believe it was
submitted by some other organization that signed the UTC’s name to it as
a hoax, perhaps in an attempt to lend it credibility.

I’m not normally much of a conspiracy theorist, so I'm admittedly
stepping into unfamiliar territory here.

The document, N2476, is titled "Variants and CJK Unified Ideographs in
ISO/IEC 10646-1 and -2", which is quite a broad category.  It turns out
to be about inventing some sort of equivalence classes among Han
characters so that they can be considered "the same" in certain
contexts.  Anyone can create this type of equivalence class for their
own personal use, of course, but N2476 proposes that the IRG be
instructed to develop a classification scheme that would have some
sense of being officially sanctioned.

This is at odds with what I have heard from the most prominent CJK
experts on this list, that such equivalences are too dependent on
context and writer's intent to belong in a character encoding standard.

For starters, the paper is signed "Unicode Techncial Committee."  Under
what circumstances would any member of the UTC release a paper with the
UTC’s own name misspelled?  There are other editorial mishaps:

 ... end-users may want text to be treated is equivalent...

 There are situations were some users...

which are not at all up to the usual standards of a UTC document.

But enough nitpicking; it’s the content that really makes me think this
document is a spoof.  Check this justification for creating an
equivalence class between simplified and traditional Han characters:

 To give one instance which has been of some importance in early
 2002, most users want simplified and traditional Chinese to be
 the same in internationalized domain names.

"Most users" is both overstated and unsubstantiated.  Several
representatives from the Chinese, Taiwanese, and Hong Kong domain-name
industry made this claim on the Internationalized Domain Name (IDN)
mailing list.  The topic became known simply as TC/SC and, for over a
month, was more frequently and persistently discussed than any other
topic.  It got to the point where the domain-name representatives
organized a chain-letter campaign, resulting in over 300 messages --
many identically worded, and from previously silent contributors --
insisting that the IDN architecture must implement TC/SC equivalence
or be a complete failure.

 Latin domain names, after all, are case insensitive.
 Www.Unicode.Org resolves to the same address as
 www.unicode.org.

UTC members have repeatedly stated that TC/SC equivalence is not at all
comparable to Latin case mapping.

 The inability to provide for [TC/SC equivalence] very nearly
 prevented Chinese from being used in internationalized domain
 names.

No, it didn’t.  That was a counterproposal made by the Chinese
domain-name representatives, who claimed that prohibiting Han characters
for now would give the relevant bodies more time to develop a proper
TC/SC mapping solution (implying that the problem was solvable at all,
an opinion disputed by many).

 Programmers and users are being increasingly frustrated that as
 ISO/IEC 10646 becomes more pervasive, they are increasingly
 compelled to deal with a large number of variant characters some
 of which are only subtly different from each other and which
 cannot be automatically equated.

The UTC would never refer to ISO/IEC 10646 as "pervasive" or talk of
programmers and users being "compelled" to deal with variant characters,
nor would it make such an emotional appeal that such variants should be
automatically equated.  Note the lack of standard UTC/WG2 terminology;
if this were the UTC talking, you would be reading about canonical and
compatibility equivalents and normalization.  This passage also hints at
the author’s lack of awareness that similar equivalence issues exist for
scripts other than Han.

 It is vitally important that data be provided to allow
 developers, protocols, and other standards to deal with Han
 variants.

I have never before seen an official UTC paper that claimed it was
"vitally important" to solve a given problem.  Individual submissions,
yes.

 What is needed, however, is something that allows at the least for
 a first-order approximation of equivalence ... it would be up to
 the authors of the individual application, protocol, or standard
 to determine whether this were acceptable or not.

And what if the authors decide the IRG-developed approach is not
acceptable?  What are they expected to do then?  Again, the reader is
invited to contrast this passage, in both form and content, with any
other that has been issued from the UTC in the past.

On the very same day (2002-05-08) that N2476 was published,