Re: Unicode/10646 History (was Re: [idn] An ignorant question about TC<-> SC)

Mark Davis Thu, 01 Nov 2001 07:33:58 -0800


—————

Δός μοι ποῦ στῶ, καὶ κινῶ τὴν γῆν — 
Ἀρχιμήδης
[http://www.macchiato.com]

----- Original Message -----
From: "Eric Brunner-Williams in Portland Maine" <[EMAIL PROTECTED]>
To: "Mark Davis" <[EMAIL PROTECTED]>
Cc: "John C Klensin" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>
Sent: Wednesday, October 31, 2001 17:59
Subject: Re: Unicode/10646 History (was Re: [idn] An ignorant question about
TC<-> SC)

> Mark,
>
> As you're feeling your oats.
>
> >                                                                  ... but
> > there are some inaccuracies that should not be left hanging.
>
> I wish the UTC felt that way about a) making any recommendation as a
> body concerning any attempt by another body to utilize the work
> product of the UTC, and b) specifically making an operational suggestion
> that does not appear to be accompanied by operational data, or a non-naive
> understanding of the dns.

I'm not trying to be argumentative, but I don't really understand your (a).
Perhaps you could restate it.

As to (b), the UTC is clearly not an expert on DNS, nor does it pretend to
be. It was asked to help with the character encoding / language issues with
IDN. I have tried to be very careful about stating when something was a
communication from the UTC; otherwise my contributions to this list have
been individual. Moreover, I don't pretend to be knowledgeable at all about
DNS, and never have. I have tried to stick to character issues that people
raise here.

I'm sorry if that was not clear.

>
> ...
>
> > We are very curious as to the origin of this "printer company" story; where
> > did you hear this?
>
> That's "printer consortia". Private, with $$ to sit at the table. Like W3C
> now, but worse.
>
> It was my impression in 1991, at UTR#4-time, at 2.0-time, and at UTR#8-time.
>
> It [Unicode] didn't spring out of the operating system industry. So which
> peripheral device consortia is left? Disks? Modems?
>
> It is still my impression today. Having writ XPG/1, and XPG/4.2, and knowing
> most of the P1003.x participants of the 90's, and the business and technical
> managements of Bull, ICL, Siemens, Olivetti, Nixdorf, IBM, Sun, SGI, and HP
> during the 90's -- I really don't care how much you'd prefer to believe that
> the non-printer lines of business considered Unicode as strategic. I watched
> them staff-up for the iso2022 period, and down-staff, and staff-up for the
> File-System-Safe period, and down-staff. "Them" means Unix vendor business
> core unit, not printer unit.
>
> Just because we (first Apple, then MicroSoft, then us Unix vendors) all got
> within a transform format of Unicode by 1996 doesn't mean that Unicode was
> a core value proposition. At HP HP-15 was vastly more important, and at SMI
> the multi-byte code-path for sort(1) was simply abominable, HP-UX 10.0 and
> Solaris 2.4 and 2.5 -- when we multi-byted (UTF-8'd the tty, the file system,
> and all the user libs and apps).

We have never thought of ourselves as being limited to printer companies;
nor did I or anyone else I know represent the Unicode consortium as such.
For example, my own background is from working on the Mac OS; Unicode did
not at all come out of the printer end of Apple -- it came out of the OS,
and certainly was considered important. But it was clear that the Microsoft
people also worked on Windows, and very early on it was important enough for
them to make it the core of NT -- which has now become the mainstream
version of Windows. IBM was also involved early -- and the people there did
not come from their printer group. The bulk of the people got involved
simply because they had had to deal with mixtures of national standards in
programming, and knew how incredibly convoluted it was.

Here is the membership at various times -- people can judge for themselves
whether these are all printer companies.

1987) Apple & Xerox

1990) Apple, GO, IBM, Metaphor, NeXT, Microsoft, Sun, RLG

1992) Adobe, Aldus, Apple, Borland, Digital, Ecological Linguistics, GO,
IBM, Lotus, Microsoft, NeXT, Novell, Pacific Rim Connections, RLG, Sun,
WordPerfect, Xerox (this was as of the start of the year; I think HP joined
during the year).

2001) Adobe Systems, Inc.
      Apple Computer, Inc.
      Basis Technology Corporation
      Compaq Computer Corporation
      Government of India Ministry of Information Technology
      Government of Pakistan, National Language Authority
      Hewlett-Packard Company
      IBM Corporation
      Justsystem Corporation
      Microsoft Corporation
      NCR Corporation
      Oracle Corporation
      PeopleSoft, Inc.
      Progress Software Corporation
      The Research Libraries Group, Inc. (RLG)
      Reuters, Ltd.
      RWS Group, LLC
      SAP AG
      Sun Microsystems, Inc.
      Sybase, Inc.
      Trigeminal Software, Inc.
      Unisys Corporation

(Associate Members)

      Agfa Monotype Corporation
      Beijing Zhong Yi Electronics Co.
      BMC Software, Inc.
      Booz, Allen, & Hamilton, Inc.
      Cable & Wireless HKT Limited
      CDAC-Centre for Development of Advanced Computing
      China Electronic Information Technology Ltd.
      The Church of Jesus Christ of Latter-day Saints
      Columbia University
      Data Research Associates
      DecoType, Inc.
      Endeavor Information Systems, Inc.
      eNIC Corporation
      epixtech, Inc.
      Ericsson Mobile Communications
      Ex Libris, Inc.
      GlobalMentor, Inc.
      GlobalSight Corporation
      The Government of Tamil Nadu, India
      iDNS
      i-EMAIL.net Pte Ltd
      Innovative Interfaces, Inc.
      Internet Mail Consortium
      Langoo.com
      Language Analysis Systems, Inc.
      Language Technology Research Center
      Netscape Communications
      Nokia
      Nortel Networks
      Novell
      OCLC, Inc.
      Openwave Systems, Inc.
      Optio Software
      Palm, Inc.
      Production First Software
      The Royal Library, Sweden
      SAS Institute, Inc.
      SHARE
      Siebel Systems
      SIL International
      SIRSI Corporation
      SLANGSOFT
      Software AG
      StarTV - Satellite Television Asia Region Ltd.
      Symbian, Ltd.
      Uniscape, Inc.
      Verisign Global Registry Services
      VTLS, Inc.
      WALID, Inc.
      WordWalla, Inc.
      Yet Another Society

>
> > I only take the time to correct some of the items above because a mistaken
> > impression of the process of development of Unicode and ISO 10646 might lead
> > people to have a mistaken impression of the quality of Unicode and 10646, or
> > the organizations behind them.
>
> Jeez. I made mistakes in XPG/1. I let Nixdorf shave bits off a pid for an
> early SMP-which-processor identifier. I made mistakes in Spec1170 also.
> You guys act as if you never goofed -- and you guys actually put Klingon,
> Esperanto and Pharonic Egyptian in your runqueue -- ahead of living languages.

Sigh. I don't know of anyone in the consortium who considers it perfect.
Moreover, in the very note you are commenting on, I wrote:

"Yes, Unicode/10646 is what we have now. It is not perfect -- those of us
who worked on it from the beginning know the warts the best!"

As to your examples, you might try to get your facts correct. Klingon, for
example, had been proposed to the consortium many times, but was never in
the "runqueue", and was finally formally rejected. See
http://www.unicode.org/unicode/alloc/Pipeline.html. Esperanto is not a
separate script -- it is Latin with a few extra characters, characters that
are used in other languages in any event. By "Pharonic Egyptian", I assume
you mean Egyptian Hieroglyphics. We do have a proposal on the floor for
that, because of a great deal of scholarly interest in a number of
countries.

The order in which scripts and characters are dealt with depends on the
interests of the companies, countries, and individuals involved in the
consortium. The consortium as a whole generally doesn't get involved until
there is a complete proposal on the floor. By the way, many proposals come
from outside the consortium -- you don't need to be a member if you have
some particular script or language that you would like to make a proposal
for.

>
> We could replace Unicode as the basis for work on internationalizing the dns.
> It wouldn't be a lark, but at least we already know what _not_ to do when
> creating encoding for identifiers, and in terms of incongruity, it couldn't
> be worse than ACE-for-labels, something-else-for-everything-else.
>
> To quote from the report of the 1996 invitational workshop (not one was a
> Unix implementor)
>
>    E: The IAB should encourage the IRTF to create a research group to
>    explore the open issues of character sets on the Internet. This group
>    should set its sights much higher than this workshop did.
>
> That was a sound recommendation. It hasn't been acted on. It should have
> been, and it still should be in the present.
>
> Just to keep things honest, from rfc2044:
>
>    UTF-8 was originally a project of the X/Open Joint
>    Internationalization Group XOJIG with the objective to specify a File
>    System Safe UCS Transformation Format [FSS-UTF] that is compatible
>    with UNIX systems, supporting multilingual text in a single encoding.
>    The original authors were Gary Miller, Greger Leijonhufvud and John
>    Entenmann.
>
> I replaced John Entenmann (SMI) on XOJIG in 1995, and Gary Miller (IBM)
> and I came out with our code-set independent architectures for our two
> respective operating systems (AIX and Solaris) in 1995/6.

I guess this accusation of dishonesty on my part is based upon what I wrote
in:

"* There was no UTF-8: the first UTF (UTF-1) was developed by Unicode to
encapsulate the merged Unicode/10646 in a sequence of compatible bytes. That
UTF was superseded by the superior UTF-8, but the principles were the same."

I did not say, and had no intention of implying, that UTF-8 was developed by
the Unicode Consortium. As you say, it came from X/Open. And as I said, it
was "superior" to the UTF-1 that had preceded it -- the consortium had no
objections at all to deprecating UTF-1 and replacing it by UTF-8.

>
> We have bigger fish to fry than the ruffled feathers of Unicadettes, or
> anyone else for that matter.

"Our ruffled feathers"?

>
> Anyone who wants to say they heard it from me is free to so state.
>
> Some days I look at national standards and think, is that really any worse
> than this mess?
>
> Eric
>
>
