Re: RE: Tags and the Private Use Area

2001-05-03 Thread Rick McGowan


Peter said:

>2. How do I get software X to know how to process my PUA characters, or how
>do I document my characters for others to understand my data?

Michael replied...

> In principle it would work, if the OSes are being written to handle user
> editing of such things. Ten euros sez they ain't.

Well, I believe that at one time it was possible, with the precursor to Mac
OS X, to change the behavior & properties in the "Cocoa" text system.  It's
merely a matter of loading or re-loading the data tables.  I don't know if
the data format is published...

In an object-oriented system, you could pretty easily over-ride only part  
of the data dealing with the PUA by funnelling property requests in the PUA  
range to a set of user-functions, or to user-loadable data blocks.  (On  
Mac OS X for instance, the "User Defaults" mechanism should be easily  
adaptable with load/unload hooks for this and would work on a per-machine,  
per-user or per-app basis, or all three.)

Mark Leisher's publicly available "UCData" project also allows re-loading  
the property data with explicit load and unload functions.  Overloading the  
utility functions to filter PUA codes to another data set would be  
in-principle rather easy.  E.g., I mean something as trivial as this at the  
top of _ucprop_lookup():

    if (_userDataLoaded && is_PUA_Char(code)) {
        return _ucprop_lookup_PUA(code, n);
    }

and providing a set of load functions for user data.  This project code  
already uses the Unicode Data file as-is for its initial input.  PUA data  
could be used on a per-installation basis by simply adding data to the data  
file.
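
To make the idea concrete, here is a minimal sketch in C of such a filter
with explicit load and unload functions.  It is not UCData's actual code:
pua_prop, pua_load(), pua_unload(), lookup_props() and the builtin_lookup()
stand-in are all hypothetical names invented for illustration.  Only the
three private-use ranges are taken from the Unicode Standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical property record for one private-use code point. */
typedef struct {
    uint32_t code;   /* code point */
    uint32_t props;  /* application-defined property bit mask */
} pua_prop;

static const pua_prop *user_table = NULL;  /* sorted by code, ascending */
static size_t user_count = 0;

/* The three private-use ranges defined by the Unicode Standard. */
int is_pua_char(uint32_t code)
{
    return (code >= 0xE000 && code <= 0xF8FF)         /* BMP PUA  */
        || (code >= 0xF0000 && code <= 0xFFFFD)       /* Plane 15 */
        || (code >= 0x100000 && code <= 0x10FFFD);    /* Plane 16 */
}

/* Install a caller-owned table; it must stay valid until pua_unload(). */
void pua_load(const pua_prop *table, size_t count)
{
    assert(table != NULL || count == 0);
    user_table = table;
    user_count = count;
}

void pua_unload(void)
{
    user_table = NULL;
    user_count = 0;
}

/* Binary search in the user table; 0 means "no properties known". */
static uint32_t pua_lookup(uint32_t code)
{
    size_t lo = 0, hi = user_count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (user_table[mid].code == code)
            return user_table[mid].props;
        if (user_table[mid].code < code)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}

/* Stand-in for the library's own lookup (assumption: it exists elsewhere). */
static uint32_t builtin_lookup(uint32_t code)
{
    (void)code;
    return 0;
}

/* PUA requests go to the user data when loaded; all else is untouched. */
uint32_t lookup_props(uint32_t code)
{
    if (user_table != NULL && is_pua_char(code))
        return pua_lookup(code);
    return builtin_lookup(code);
}
```

The same hook could be switched per machine, per user or per application
simply by choosing which table is handed to pua_load().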

Rick

RE: Tags and the Private Use Area

2001-05-03 Thread Curtis Clark


At 03:09 AM 5/3/01, Marco Cimarosti wrote:

>The PUA is (or might be) used for, e.g.: [...]

I have been following this thread with some amusement, and I have noticed 
that one use of the PUA has been overlooked: the Microsoft Symbol Font 
area. In Windows systems, this gives access to a literally limitless range 
of dingbats, artworks, and other "non-script" uses, 256 glyphs at a time. 
Many programs written for Windows (e.g. PowerPoint) look to "symbol fonts" 
for things such as bullets. The beauty of this area is its *lack* of 
semantics and repeatability and any other constraint except that it exists 
and is available.

The alternative of course would be to put these glyphs in the U+0021 - 
U+00FF range.
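
For readers who have not met this convention, a minimal sketch in C follows.
It assumes the usual Windows practice, as I understand it, of placing the
glyphs of a symbol font at U+F020 through U+F0FF, i.e. the 8-bit glyph index
plus 0xF000.  The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SYMBOL_PUA_BASE 0xF000u  /* assumed base of the symbol-font area */

/* Map an 8-bit symbol-font index (0x20..0xFF) to its PUA code point. */
uint32_t symbol_to_pua(uint8_t glyph_index)
{
    assert(glyph_index >= 0x20);
    return SYMBOL_PUA_BASE + glyph_index;
}

/* Inverse mapping; returns -1 for anything outside U+F020..U+F0FF. */
int pua_to_symbol(uint32_t code)
{
    if (code < 0xF020 || code > 0xF0FF)
        return -1;
    return (int)(code - SYMBOL_PUA_BASE);
}
```

Nothing in the mapping says what the glyph at a given index looks like,
which is exactly the lack of semantics described above.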


-- 
Curtis Clark  http://www.csupomona.edu/~jcclark/
Biological Sciences Department Voice: (909) 869-4062
California State Polytechnic University  FAX: (909) 869-4078
Pomona CA 91768-4032  USA  [EMAIL PROTECTED]

RE: Tags and the Private Use Area

2001-05-03 Thread Michael Everson

At 09:44 +0800 2001-05-03, [EMAIL PROTECTED] wrote:

>2. How do I get software X to know how to process my PUA characters, or how
>do I document my characters for others to understand my data?

That's a good one. A very good one. Is there a way to define these 
using the Unicode properties? Sorting could be done, but I don't know 
about editing the database for local use. In principle it would work, 
if the OSes are being written to handle user editing of such things.

Ten euros sez they ain't.

>3. Is there a need for some protocol to tag data (either internal to the
>data, as William suggested, or as metadata) to let a recipient know either what
>my PUA characters mean, or where to find documentation that explains that?

I don't think so. I think this is pseudo-encoding.

-- 
Michael Everson  **  Everson Gunn Teoranta  **   http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Mob +353 86 807 9169 ** Fax +353 1 478 2597 ** Vox +353 1 478 2597
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire

RE: Tags and the Private Use Area

2001-05-03 Thread Peter_Constable

Marco wrote:

>So, if William drops it, I will take the challenge -- at the risk of
>repeating things that others and myself already wrote.
>
>The PUA is (or might be) used for, e.g

People, there are three distinct issues here:

1. Are there legitimate uses for the PUA?

2. How do I get software X to know how to process my PUA characters, or how
do I document my characters for others to understand my data?

3. Is there a need for some protocol to tag data (either internal to the
data, as William suggested, or as metadata) to let a recipient know either what
my PUA characters mean, or where to find documentation that explains that?

I think there is no debate about 1. Marco and others have given lists of
valid scenarios.

Regarding 3, a variety of objections have been made to William's suggestion:
- this is metadata and does not belong internal to the data
- use of PUA characters to create a protocol creates a circular problem of
documenting PUA usage and does not solve anything
- some type of markup protocol could be an appropriate mechanism for doing
this, but UTC will not establish this kind of protocol
- this is not the right forum to discuss higher-level protocols

I think that item 2 is the one thing that isn't getting discussed here, but
which is probably in greatest need of discussion.

>IMHO, It would be more interesting (and less impacting Unicode policies) to
>discuss *what* this "PUA semantics" data could look like.

Bingo!

>Let me add that, however, all this subject is *not* exactly the
>highest-priority need that I ever heard. I personally can live even with an
>"undefined PUA", and wouldn't spend my time in developing such a thing.

Lest we think this is unimportant, I will mention that I have heard of at
least one linguist who has created a hacked Unicode (rather than e.g.
hacked cp1252) font in order to get commercial software to give the desired
shaping behaviour with their as-yet-unencoded characters. In this case, I
understand that they were given strong health warnings: "Don't give this to
anybody else lest we start getting garbage data disseminated." It won't
surprise me if these things start cropping up without those efforts to keep
it contained.

- Peter

---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>

RE: Tags and the Private Use Area

2001-05-03 Thread Marco Cimarosti

William Overington wrote:
> Kenneth Whistler wrote:
> > Among other things, you have yet to meet the challenge
> > by Michael Kaplan to provide a convincing case for their
> > requirement.
> 
> Oh, there was no need.  Michael stated his challenge as a 
> "put up, or shut up" challenge [...]

I am probably not the only one who feels that this is not the way of
discussing things.

I think that Michael did not state any "put up or shut up" challenge, but
rather made a very sensible objection to the whole subject of this thread:
"what is it for?". I think that such an objection should be answered
politely, rather than haughtily refused.

So, if William drops it, I will take the challenge -- at the risk of
repeating things that others and myself already wrote.

The PUA is (or might be) used for, e.g.:

1) linguistic research (e.g. handling texts in unencoded or unencodable
ancient scripts);

2) recreational linguistics (e.g. constructed scripts and the like);

3) encoding research (e.g. experimenting with interim encodings while
preparing proposals);

4) orthography development (e.g. special characters tried out for as-yet
unwritten living languages);

5) interim encodings for non-linguistic notations (e.g. people who need
labanotation to discuss dance over the Internet).

Of course, every single person can be involved in more than one project for
each one of the disciplines above, e.g.: a scholar may study (or teach) both
hieroglyphics and cuneiform; a "game master" may be discussing several role
games in ConLangs mailing lists, etc.

After a few years surfing in linguistic-related forums, I noticed that the
same names tend to occur in disciplines 1 to 4, and I wouldn't be surprised
if some of these people are also interested in point 5.

All this is to say that, yes, there may be a latent need for exchanging PUA
encodings and, consequently, to define some sort of protocol to attach the
intended meaning to the otherwise meaningless PUA codepoints.

Such a protocol can be private or public but, clearly, it *cannot* be a part
of the Unicode Standard, because this would contradict the basic statement
that everybody can do whatever they want with the PUA.

One can imagine that, in a distant future, Unicode could choose to
"reference" such a protocol as a "related information", but no more.
However, before such a thing can happen, there must be something to
reference...

I think that the discussion is currently focusing on the wrong thing. It is not
so important how a certain text file will declare its "PUA semantics": after
all, there will never be *one* method for doing this (text that has a MIME
header will presumably use it; rich text will have its own means; mark-up
languages may add a tag for this, etc.).

IMHO, it would be more interesting (and have less impact on Unicode policies) to
discuss *what* this "PUA semantics" data could look like. Will it be a
UniData-like file? Or will it be an XML-based file? Will it include a
default font? Which kind of font? And how will all this material be used:
will programmers manually download it and package it in their applications?
Or will it be automatically downloaded and installed à la plug-in?
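
As one possible answer to the first question, here is a minimal sketch in C
of reading a UnicodeData-style line for a private-use character.  It
assumes the familiar semicolon-separated layout (code point; name; general
category; ...) and reads only those first three fields.  The pua_entry
type, the function name and the sample character are hypothetical, and
only the BMP private-use range is accepted, for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record holding the first three fields of one line. */
typedef struct {
    unsigned long code;  /* code point, U+E000..U+F8FF */
    char name[64];       /* character name */
    char category[4];    /* general category, e.g. "Lo" */
} pua_entry;

/* Parse e.g. "E000;EXAMPLE PRIVATE LETTER A;Lo;0;L;;;;;N;;;;;".
 * Returns 1 on success, 0 if the line is malformed or not in the BMP PUA. */
int parse_pua_line(const char *line, pua_entry *out)
{
    char *end;
    const char *name, *sep, *cat;
    size_t namelen, catlen;
    unsigned long code;

    assert(line != NULL && out != NULL);

    code = strtoul(line, &end, 16);          /* field 1: hex code point */
    if (end == line || *end != ';')
        return 0;
    if (code < 0xE000 || code > 0xF8FF)
        return 0;

    name = end + 1;                          /* field 2: name */
    sep = strchr(name, ';');
    if (sep == NULL || sep == name)
        return 0;
    namelen = (size_t)(sep - name);
    if (namelen >= sizeof out->name)
        return 0;

    cat = sep + 1;                           /* field 3: general category */
    catlen = strcspn(cat, ";\r\n");
    if (catlen == 0 || catlen >= sizeof out->category)
        return 0;

    out->code = code;
    memcpy(out->name, name, namelen);
    out->name[namelen] = '\0';
    memcpy(out->category, cat, catlen);
    out->category[catlen] = '\0';
    return 1;
}
```

A file of such lines could be shipped alongside a font, whatever mechanism
is eventually used to point at it.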

Let me add that, however, all this subject is *not* exactly the
highest-priority need that I ever heard of. I personally can live even with an
"undefined PUA", and wouldn't spend my time in developing such a thing. If
someone else wishes to start such a work, I would certainly try to keep
myself informed about their progress -- but I would not like to follow
*every* single step of the discussion on *this* mailing list.

_ Marco

Re: Tags and the Private Use Area

2001-05-02 Thread Rick McGowan


William Overington wrote, responding to Ken:

> As I have not claimed that any such case actually exists at the
> present time, then the challenge is null and void and I have no need to
> answer it.

Hmmm...  I still think, personally speaking, that you're going through a
lot of effort that appears basically pointless because there isn't any
problem.  If you could illustrate the existence of a real and more-or-less  
serious problem, I would probably take the discussion seriously.

At this point, I still don't see an actual problem that is affecting lots  
of people, so there's a lot of list traffic being spent to not much effect.

> I must also convince the Unicode Consortium that it has the power to
> implement private use area support tags even if the Unicode Consortium
> were to accept that private use area support tags were needed.

Why must you do that?  What would be the point of convincing them that
they can do something to support a non-problem?  It sounds like perhaps
you have a personal agenda that requires such a thing as a springboard.
That doesn't necessarily translate into a world-wide problem with the PUA.

> I happen to think that the Unicode Consortium arguably might have the
> power to implement private use area support tags if it chose to do so.

Well, honestly, the Consortium, like any organization, can do lots of
things, but they may not be interested in doing some things that they
could do.  I suspect they may have bigger and more immediately
problematic fish to fry -- like the living scripts of South & Southeast
Asia that still need to be encoded.  Or the worldwide lack of a decent
and complete language tagging standard...

> If the issue of capability to implement were resolved in favour
> of the view that the Unicode Consortium does indeed have the capability
> to implement private use area support tags in a non-private use area
> of the unicode code point space, then the issue of whether to implement
> or not to implement and if to implement in what manner to implement
> would become a normal Unicode Technical Committee process.

I would assert already that UTC could in fact implement such a thing.  I
believe they should not do so because that would undermine the freedom of  
action within the PUA itself.  If you took a straw poll of UTC members, I  
suspect you would find little or no favor for adding such support tags
for just that reason -- aside from the fact that no _need_ has yet been
demonstrated.

If you want to push the issue and get an actual response from UTC, I
suggest you submit a document with a proposal to UTC.  Instructions are
on the web site.

Rick

Re: Tags and the Private Use Area

2001-05-02 Thread William Overington


Kenneth Whistler wrote:

Among other things, you have yet to meet the challenge by Michael
Kaplan to provide a convincing case for their requirement.

end quote

Oh, there was no need.  Michael stated his challenge as a "put up, or shut
up" challenge on the matter of stating an actual example of a clash between
two actual existing uses of the private use area.  A "put up, or shut up"
challenge relates to someone being asked to justify something that he or she
has stated.  As I have not claimed that any such case actually exists at the
present time, then the challenge is null and void and I have no need to
answer it.  I did not wish to seem less than diplomatic in my response so I
answered upon the scientific content of the challenge rather than commenting
on its validity as a challenge.

I can seek to provide a convincing case for their requirement.  Yet, there
is an additional matter that I need to address as well, for I must also convince
the Unicode Consortium that it has the power to implement private use area
support tags even if the Unicode Consortium were to accept that private use
area support tags were needed.  I happen to think that the Unicode
Consortium arguably might have the power to implement private use area
support tags if it chose to do so.  Thus far, those who have expressed an
opinion are clear that it does not have that power.  This is a different
issue than the issue of submitting a proposal for the system to be
implemented.  If the issue of capability to implement were resolved in
favour of the view that the Unicode Consortium does indeed have the
capability to implement private use area support tags in a non-private use
area of the unicode code point space, then the issue of whether to implement
or not to implement and if to implement in what manner to implement would
become a normal Unicode Technical Committee process.

Ken mentions that I had written as follows:

There would be a protocol saying that, in a plain unicode text file, but
not in a rich text file,

end quote

Ken then responded as follows:

This distinction already creates a problem for your proposal. Rich text
contains chunks of plain text, and introducing a bunch of tag characters
and a protocol for using them which have to be kept out of rich text, but
which can be in plain text, creates a filtering and transducement problem.
That would introduce a problem, rather than eliminating a problem.

end quote

I based my idea on the protocols for the language tags of plane 14, where
there are protocols for using the tags in this manner.

William Overington

2 May 2001

Re: Tags and the Private Use Area

2001-05-02 Thread William Overington


Mike Ayers wrote:

I'd like to point out that I consider it a Good Thing not to have a
classification system.  Should I choose to use PUA characters, I don't want
any application that I didn't write attempting to interpret their meaning,
since I may use them for anything (wasn't it you, William, who was working
on a soft processor which used PUA codepoints as instructions?) and I don't
want to waste time describing the usage for other applications (if the usage
can even be described).

end quote

In that case, should you choose to use PUA characters you may well not want
any application that you didn't write attempting to interpret their meaning.

So you need to

not let anyone who might try to have an application that you did not write
attempt to interpret their meaning get a physical copy of the file

and you need to

prohibit anyone whom you do allow to have a physical copy of the file from
trying to have an application that you did not write attempt to interpret
their meaning using your rights under intellectual property law.

If that is possible then so be it.  In any case, if an application not
written by you did actually end up trying to read your file, then if you
were not using private use area support tags then it would not know how to
interpret the private use area codes that it encountered.  For you, with
your declared choice, that would be fine.  I am not suggesting that the use
of private use area support tags should be obligatory for people using
private use area codes in a plain unicode text file, I am only suggesting an
additional facility for users of unicode.

You need not waste time describing the usage for other applications as use
of private use area support tags would be an entirely optional facility for
people to use if they so wish and not use if they do not so wish.

Yes, I am the one researching a soft processor which uses PUA
codepoints as instructions.  It is called a uniengine.  The word uniengine
is generic with the meaning previously posted, so any set of codes that I
publish regarding a particular uniengine will need the uniengine to have
a specific name, which I shall need to coin.  I am thinking in terms of
licensing the technology and publishing the meanings of the codes so
that people know what the codes each mean.  I would use private use area
support tags to help the processor know when an in-line graphic was
encountered if I could.

Yet, the situation of a software application seeking to read in a file in
plain unicode text where that text contains one or more characters from the
private use area is going to be a widely practiced activity.  How widely
practiced I do not know, but systems such as Word 97 have facilities to read
in plain text files and output plain text files.  I suggest that future word
processing packages, from a variety of manufacturers, may well have a
selection box choice for plain unicode text for reading in files and plain
unicode text for writing out files.

When reading in a unicode plain text file, any encounter of a private use
code will need to be managed by the software, and even if the user operating
the computer knows that it is whatever, be it for example symbols for early
chemistry or symbols for ballet, then the computer system will need to know
as well.  A font for early chemistry symbols may well not contain the
ordinary English alphabet within the font.  So, once loaded as a file and
internally converted into the internal rich text format of the
wordprocessing software there may well be lots of work for the operator to
carry out changing the font of the private use area characters to early
chemistry symbols.  Even if the word processor had a facility for a global
changing of the font of the private use area characters to the early
chemistry font, that facility would not be suitable if the document
contained even non-code-clashing uses of two private use area fonts.  If
code clashing occurred, then the chances for ambiguity would be huge.
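
The distinction between clashing and non-clashing uses can be stated
precisely.  Below is a minimal sketch in C, assuming each private use is
described by an inclusive code range plus a font name.  The pua_assignment
type, the function names and the font names used with it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one private use of a PUA range. */
typedef struct {
    uint32_t first;    /* first code point, inclusive */
    uint32_t last;     /* last code point, inclusive */
    const char *font;  /* font intended for this range */
} pua_assignment;

/* Two assignments clash when their code ranges overlap. */
int pua_clash(const pua_assignment *a, const pua_assignment *b)
{
    assert(a != NULL && b != NULL);
    return a->first <= b->last && b->first <= a->last;
}

/* Pick the font for one code point.  Returns NULL when no assignment
 * covers it, or when more than one does (the ambiguous, clashing case). */
const char *pua_font_for(uint32_t code, const pua_assignment *list, size_t n)
{
    const char *found = NULL;
    size_t i;

    assert(list != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (code >= list[i].first && code <= list[i].last) {
            if (found != NULL)
                return NULL;  /* two assignments claim this code */
            found = list[i].font;
        }
    }
    return found;
}
```

With non-clashing ranges the font can be chosen character by character;
with clashing ranges the code point alone cannot settle it.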

When writing out, private use area support codes could be added in to the
unicode plain text file produced, or not added in to the unicode plain text
file produced as the user chose from the choice presented in the drop down
menu of the selection box in the Save As section of the wordprocessor.

William Overington

2 May 2001

Re: Tags and the Private Use Area

2001-05-02 Thread Michael Everson


At 14:01 -0700 2001-05-01, Kenneth Whistler wrote:

>If, on the other hand, it were just a matter of ensuring interoperability
>of private Blissymbolics implementations, then you could get endorsement
>by the Blissymbolics Institute.

BCI (Blissymbolics Communication International) will be playing with 
a PUA implementation to ensure that what we are doing works, but the 
intention is to have Blissymbols encoded in the SMP.

Re: Tags and the Private Use Area

2001-05-01 Thread Asmus Freytag

At 07:53 PM 5/1/01 +0100, William Overington wrote:
>Asmus continues:
>
>Since such scheme(s) support only some particular
>usage (or set of usages) of the private use area,
>the consortium would no longer be neutral towards
>*any and all* uses of the Private Use Area.
>
>end quote
>
>This is the core sentence of the posting for me.  The question is as
>follows.
>
>Does such a scheme support only some particular usage (or set of usages) of
>the private use area?

If you want to discuss set theory here, the set of all usages includes - of 
course - all usages that do *not* make use of the extra information that 
you are trying to provide in your protocol. (I used the word 'scheme' in my 
posting). This set is always larger than the set of all usages that *do* 
make use of the scheme.

Since the latter set can only be a proper subset of the total set, 
once the Consortium adds any characters specifically for use in the kind of 
protocol that you describe, it would have shown a preference over other 
users of the PUA who either don't use any protocol or use a set of PUA 
characters for the same purpose using a different protocol not recognized 
by the Consortium. In other words:

>Asmus Freytag wrote:
>
>This would violate the neutrality that the Unicode
>Consortium is bound to observe when it comes to
>uses of the Private Use Area. [Since] By encoding characters
>it would implicitly endorse the scheme (or series of
>schemes) designed to use these characters.
>
>end quote

Couldn't have said it better myself ;-)
A./

Re: Tags and the Private Use Area

2001-05-01 Thread Kenneth Whistler


William Overington perorated:

> Asmus continues:
> 
> Going further and outlining a protocol for such a
> thing is even worse - if done by the Unicode Consortium.
> However, it would be fine for any other organization
> to define the protocol - but that organization could
> not assign any special non-private characters.
> 
> end quote
> 
> If the Unicode Consortium found that it could act in this matter within
> limits then definition of a protocol would be all but an essential part of
> any such action.

But precisely as Asmus has stated, the Unicode Consortium (or more
properly the Unicode Technical Committee) would not define any
such protocol for private use characters. It is entirely up to
external organizations to engage in such work if they choose to do
so.

> 
> What I have in mind here is a set of private use area support tags, perhaps
> located in plane 14, better located in plane 0 if a contiguous block of 128
> unused codes in a reasonable place not within the private use area could be
> found.

This would be a request for encoding of standardized characters, which
the Unicode Technical Committee would, of course, have to decide upon.
However, judging by my experience in the Unicode Technical Committee and
the feedback so far on this list (and the feedback received on the
language tag characters in Plane 14, which were "deprecated on birth"), it
is rather unlikely that any such proposal would be approved by the
UTC.

Among other things, you have yet to meet the challenge by Michael
Kaplan to provide a convincing case for their requirement.

> 
> There would be a protocol saying that, in a plain unicode text file, but not
> in a rich text file, 

This distinction already creates a problem for your proposal. Rich text
contains chunks of plain text, and introducing a bunch of tag characters
and a protocol for using them which have to be kept out of rich text, but
which can be in plain text, creates a filtering and transducement problem.
That would introduce a problem, rather than eliminating a problem.

> certain information related to the meanings ascribed to
> any private use area codes used *may, but need not* be included using these
> tags.  However, *if* that information is included using these tags, then
> this format of providing that information *must* be used.

... by a process that chooses to honor that protocol. But that would be
outside the scope of the Unicode Standard and nothing that one could depend
on a process conformant to the Unicode Standard to be following, merely by
virtue of that conformance claim.

> The protocols
> could be carefully designed so that no limiting presumption whatsoever as to
> the nature of the usages of the private use area that were capable of being
> described using the protocols were made by the protocols.
> 
> I suggest that such a facility would be useful and would provide a sound
> basis for the future.

I don't think so. Frankly I don't think things would be any better than with
the kind of plain text alternatives that have already been suggested.

Or, if you are convinced that there really is sufficient reason and demand
to automate the processing, an alternative is simply to provide for
a PUAconventions.xml file, which would contain the information you are
suggesting for the protocol. Point at the appropriate PUAconventions.xml
file, and you get the equivalent of trying to bury such information in plain
text files, without actually touching the plain text files or requiring
any additions to the Unicode Standard.
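
Purely as an illustration of what one entry in such a file might carry,
here is a minimal sketch in C that writes a single entry.  The element and
attribute names (range, first, last, description, url) are invented for
this sketch, since no such format has been defined, and
pua_write_entry() is a hypothetical helper.  Values containing characters
that would need XML escaping are refused rather than escaped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write one hypothetical conventions entry into buf.
 * Returns 1 on success, 0 if a value contains '"', '<' or '&'
 * or if the entry does not fit into the buffer. */
int pua_write_entry(char *buf, size_t size,
                    unsigned long first, unsigned long last,
                    const char *description, const char *url)
{
    int n;

    assert(buf != NULL && size > 0);
    assert(description != NULL && url != NULL);

    if (strpbrk(description, "\"<&") != NULL || strpbrk(url, "\"<&") != NULL)
        return 0;

    n = snprintf(buf, size,
                 "<range first=\"%04lX\" last=\"%04lX\""
                 " description=\"%s\" url=\"%s\"/>",
                 first, last, description, url);
    return n >= 0 && (size_t)n < size;
}
```

The plain text file itself stays untouched; only the side file grows.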

> 
> The alternative is either chaos or the need to use a protocol put forward by
> an organization other than the Unicode Consortium or by an individual. 

Exactly.

> Yet
> such a protocol would not have had the benefit of the full procedures of the
> Unicode Consortium in its drafting and would have no standing above any
> informal agreement amongst some users that it might receive and certainly
> not the endorsement of the Unicode Consortium.

It would have the standing of whatever organization chose to standardize
such a protocol. Which is entirely appropriate.

If you were expecting such a protocol to receive worldwide acceptance and
be usable on the Internet, then you should anticipate that you would have
to get it approved by the IETF as an Internet Standard.

If, on the other hand, it were just a matter of ensuring interoperability
of private Blissymbolics implementations, then you could get endorsement
by the Blissymbolics Institute.

And so on.

> 
> So, I ask a question.  Is there a formal method for the matter as to whether
> the Unicode Consortium has the powers to do what I suggest above to be
> formally decided please?

See:

http://www.unicode.org/pending/proposals.html

But note that the proposals that the Unicode Technical Committee
invites are related to the encoding of *characters*, rather than
the development of higher-level protocols.

--Ken

RE: Tags and the Private Use Area

2001-05-01 Thread Ayers, Mike



> From: William Overington [mailto:[EMAIL PROTECTED]]
> 
> Can there be found a possible usage that such a scheme would 
> not support?
> Finding just one would resolve the question.

I suspect that the whole issue is covered by Gödel's
Incompleteness theorem, which says (approximately) that any mathematical
system above a certain complexity cannot be fully mathematically described
(characterized) within itself.  Any scheme to describe PUA usage involving
only PUA characters cannot be distinguished with certainty from random use
of PUA characters.  A scheme which used non-PUA characters could work, but,
as has been stated many times, will not happen.

I'd like to point out that I consider it a Good Thing not to have a
classification system.  Should I choose to use PUA characters, I don't want
any application that I didn't write attempting to interpret their meaning,
since I may use them for anything (wasn't it you, William, who was working
on a soft processor which used PUA codepoints as instructions?) and I don't
want to waste time describing the usage for other applications (if the usage
can even be described).


/|/|ike

Re: Tags and the Private Use Area

2001-05-01 Thread William Overington


Asmus Freytag wrote:

This would violate the neutrality that the Unicode
Consortium is bound to observe when it comes to
uses of the Private Use Area. By encoding characters
it would implicitly endorse the scheme (or series of
schemes) designed to use these characters.

end quote

I have read the posting several times.  The first time through I did not
agree with what was written.  When I read it again later, I found that I
then believed that the posting is absolutely correct, that my suggestion was
incorrect and that I had a deeper understanding of the issues involved.  A
third reading put me back to disagreeing!  Thus, for me, from my reading of
it, the situation seems very finely balanced.  I mention this because I
would like, with permission, to comment on what is written in the posting
without necessarily either agreeing or disagreeing in total.

My first comment is that I am now by no means certain that my suggestion as
to what I felt that the Unicode Consortium could reasonably do in this
matter is valid.  Yet I am not quite sure that the Unicode Consortium could
not act if it so chose, within limits.

Asmus continues:

Since such scheme(s) support only some particular
usage (or set of usages) of the private use area,
the consortium would no longer be neutral towards
*any and all* uses of the Private Use Area.

end quote

This is the core sentence of the posting for me.  The question is as
follows.

Does such a scheme support only some particular usage (or set of usages) of
the private use area?

I find the phrase "or set of usages" particularly informative.

Let us consider the set of all possible usages.

Can there be found a possible usage that such a scheme would not support?
Finding just one would resolve the question.

However, since the "not finding" of one would be no proof that such a scheme
could not exist, can it be proved mathematically that there is no possible
usage that such a scheme would not support?  For, if it could be so proved,
then, unless there are also other reasons, the Unicode Consortium might
indeed *have* the power to act if it so chooses.  This would mean that
interested people might be able to develop a system within the private use
area with the prospect of it being moved from the private use area and
promoted to the status of being a part of the unicode standard if it were
found useful.

Asmus continues:

Going further and outlining a protocol for such a
thing is even worse - if done by the Unicode Consortium.
However, it would be fine for any other organization
to define the protocol - but that organization could
not assign any special non-private characters.

end quote

If the Unicode Consortium found that it could act in this matter within
limits then definition of a protocol would be all but an essential part of
any such action.

What I have in mind here is a set of private use area support tags, perhaps
located in plane 14, better located in plane 0 if a contiguous block of 128
unused codes in a reasonable place not within the private use area could be
found.

There would be a protocol saying that, in a plain unicode text file, but not
in a rich text file, certain information related to the meanings ascribed to
any private use area codes used *may, but need not* be included using these
tags.  However, *if* that information is included using these tags, then
this format of providing that information *must* be used.  The protocols
could be carefully designed so that no limiting presumption whatsoever as to
the nature of the usages of the private use area that were capable of being
described using the protocols were made by the protocols.

It is a matter for consideration as to quite how much information could be
included in such protocols without making any limiting presumption as to
usage of the private use area, but it might well be possible to provide an
amount sufficient to avoid ambiguity and to allow two or more overlapping
uses of the private use areas to be used in different parts of the same
document.  At the very least, if an ordinary language comment could be added
that would be helpful.  If a Uniform Resource Locator could be added as an
optional element, then good.

I feel that such protocols permitting the optional use of a font name if the
private use area codes so described are to be regarded as displayable
characters using a particular font does not violate neutrality as to whether
any particular defined use of a private use area character so defined by any
particular member of the unicode user community is to be a displayable
character or a non-displayable character.  It is well known that some uses
of the private use area can be for displayable characters and some for
non-displayable characters.  The unicode specification recognizes this in
the specification, so providing facilities such that any particular member
of the unicode user community may inform people as to what he or she has
chosen to do in any particular circumstance surely cannot be

Re: Tags and the Private Use Area

2001-05-01 Thread William Overington


Michael Kaplan invited me to give an actual scenario that requires private
use area support tags and an associated protocol.  Well, the short answer is
that I am unable to find one at the present time with my present level of
knowledge.

I thought about how I might find such an actual scenario with the facilities
before me.  I have access to a PC with Windows 95 which has Word 97 on it.
This enables one to create a Word document, select a font that contains
private use area characters, use Insert Symbol to add such a symbol or
symbols to the Word document, save the Word document then Save as HTML then
View HTML Source.  From the HTML source code produced one can find the
decimal values of the codes.  I tried the Junicode font and the Times New
Roman font, the latter because I remembered someone in this list writing
some months ago that the Microsoft Times New Roman font on their website
(which is the font that I am using, having obtained it so as to support on
this local machine the use of unicode in my 1456 object code system that is
on www.users.globalnet.co.uk/~ngo which is our family webspace in England)
has an extra character in it.  Might this extra character, the one that says
OBJ in a dotted box, just possibly clash with the Junicode lady in the
Junicode font?  An interesting situation is that the OBJ character came out
as &#65532; which is U+FFFC which is not in the private use area and I
learned some more about unicode by following that up in the unicode
specification (chapter 13 of version 3.0 under Specials, Replacement
Characters).  I found that the dialogue box of the Insert Symbol facility in
Word 97 has near its top right corner a text box that will actually display
the text "Private Use Area" when the "selection cursor" for the symbol that
one is considering is on a symbol that is from the private use area.  Does
what I have here termed the "selection cursor" have a proper Microsoft name?
I like to get the parlance of features of Microsoft products right.  I now
notice that Arial also has the OBJ character, so perhaps OBJ is not the
additional character to which the poster from some months ago referred?  I
also notice that both the Times New Roman and the Arial have fi and fl
ligatures twice, once under Private Use Area and once under Alphabetic
Presentation Forms.  Junicode does not appear to have either fi or fl under
Private Use Area but does have ff, fi, fl, ffi and ffl under Alphabetic
Presentation Forms.  The Junicode ffl placed in a Word document and then
formatted as Arial gives a black outline box.  The Arial fi from the Private
Use Area placed in a Word document and then formatted as Junicode gave a
black outline box.  The black outline box seems to represent "unknown to
this font".  The use of the fi and fl from the Alphabetic Presentation Forms
section of Times New Roman, Arial and Junicode all carry through to the
other two fonts, regardless of which font one uses to insert the characters
in the first place.

The decimal codes for ff, fi, fl, ffi and ffl come out as 64256, 64257,
64258, 64259 and 64260 respectively, and these turn out to be U+FB00 through
to FB04 inclusive.
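
The decimal-to-hexadecimal step above can be checked mechanically. What follows is only a minimal sketch: the function name describe is made up for illustration, and the range U+E000 to U+F8FF is taken to be the Private Use Area of plane 0.

```python
# Sketch: convert decimal character references (as seen in HTML source)
# to U+ notation and report whether each lies in the plane 0 Private Use Area.
# The name "describe" is hypothetical, chosen only for this illustration.

PUA_START, PUA_END = 0xE000, 0xF8FF  # plane 0 Private Use Area


def describe(decimal_code):
    """Return (U+XXXX label, True if the code is in the plane 0 PUA)."""
    label = "U+%04X" % decimal_code
    return label, PUA_START <= decimal_code <= PUA_END


if __name__ == "__main__":
    # The five ligature codes and the OBJ code mentioned in the text.
    for code in (64256, 64257, 64258, 64259, 64260, 65532):
        label, is_pua = describe(code)
        print(code, label, "PUA" if is_pua else "not PUA")
```

None of the six codes falls in the Private Use Area, which agrees with the observation that the ligatures found this way are the Alphabetic Presentation Forms ones.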

Michael continues:

If there is no such scenario, then why not involve your obviously fine
intellect in some of those real problems? In other words, help clear the
backlog of work rather than try to create work without proof that it is even
needed?  Once we clear all of those things up, we will be bored and can
certainly move on to all of the theoretical matters that might be out there.

end quote

Well, as they say, I am grateful to the honourable gentleman for the remarks
in the first part of his question.

Two points arise.  One is that I happen, as a user of unicode, to regard
what is to be done to support the use of the private use area as a real
problem.  Secondly, are there any of the problems to which you refer where
there is an opportunity for people who do not represent the organizations
that are members of the Unicode Consortium to participate?  I have not seen
any such opportunities advertised in the mailing list, though I have not
been a subscriber to the list for very long.

William Overington

1 May 2001

Re: Provenance of the Unicode Standard and of statements (derives from Re: Tags and the Private Use Area)

2001-04-30 Thread Kenneth Whistler

William Overington wrote:

> What please is the IETF?

Internet Engineering Task Force. As Rick pointed out, peruse:

http://www.ietf.org/

> 
> Ken continues:
> 
> But anyone who comes to the [EMAIL PROTECTED] list looking
> to actually develop and establish a standard protocol involving
> Unicode is looking in the wrong place.
> 
> end quote
> 
> Well, maybe.  I notice that Ken writes using the email address
> [EMAIL PROTECTED] and that on the unicode website he is listed as a Technical
> Director of Unicode with the email address [EMAIL PROTECTED] with a mention
> of Sybase Inc. noted there.
> 
> So, when Ken states the sentence above, is that Ken writing as a private
> individual expressing a purely personal opinion, or Ken writing as a
> representative of Sybase Inc. or Ken writing as a Technical Director of the
> Unicode Consortium stating official Unicode Consortium policy?
> 
> I feel that that is an important issue that needs to be clarified.

I post personal opinions on this list.

When I am posting notes that represent an official Sybase position,
I post them on the relevant lists and sign myself as the Sybase
representative to the UTC and to L2.

When I respond in my capacity as one of the Technical Directors of
the Unicode Consortium (which is usually in direct email responses
to inquiries that come in via [EMAIL PROTECTED]), I sign my mail
appropriately.

Otherwise I'm not a big fan of officious-looking signatures on email,
and just sign myself "Ken".  

> May I suggest that there exists scope for considerable confusion as to the
> provenance of a statement made on this list where members of the unicode
> user community may well not know who are the directors of the Unicode
> consortium.

Well, sure, but I think most participants on this list know how
it generally works -- as Rick pointed out. None of the Unicode officers
or UTC representatives come to this open discussion list trying to
push official positions on everyone. Official policies are the
province of the Unicode website, the Unicode Standard itself, and
the meetings of the Unicode Technical Committee.

> I am genuinely confused by this situation.  Ken is a Technical Director of
> the Unicode Consortium and has the [EMAIL PROTECTED] email address.  He
> writes using the email address [EMAIL PROTECTED] and does not state that he is
> a Technical Director of the Unicode Consortium in this posting.  Ken makes
> statements about what is appropriate posting in this list.  Knowing that Ken
> is a Technical Director of the Unicode Consortium makes me feel that I
> should treat what he says as if it is an official ruling of the Unicode
> Consortium that that is how this list is to be used.  Yet is that a correct
> interpretation?  Is Ken just happily and in a friendly manner only seeking
> to express a personal view?

The latter.

My posting on this topic was not an official statement from the Unicode
Consortium or any other organization I may represent. I was merely
trying to point out that as a matter of history and practice the
[EMAIL PROTECTED] discussion list does not develop protocols, and
since what you were presenting and the way you presented it seemed to
invite participation in the development of a protocol, I was pointing
you to the kind of forum where protocol development *is* in scope and
is the focus of various email discussion lists.

> If people cannot legitimately and welcomely
> discuss such issues here then surely all that will happen is that someone
> will start an alt. newsgroup and the discussions will take place there.

I'm not trying to chase you off the list.

Only Sarasvati could do that, and as Rick pointed out, that only happens
when people blatantly violate her rules for participation on the
list.

--Ken

Re: Tags and the Private Use Area

2001-04-29 Thread James Kass

David Starner wrote:

> 
> Character set information must go along with every non-Latin-1
> webpage already, and most word processor formats already carry along
> huge quantities of data, such that just adding the information
> shouldn't be hard at all. 
>  

The charset declaration in HTML header is just one line, like
saying charset=utf-8.  The concern was that someone would
expect TUS 3.0 in its entirety to be included in every file,
as an extreme example.

Since the PUA is part of Unicode, it is covered when the 
character set is specified as utf-8 in HTML.

> Intelligent software caches the file and loads it up from the cache;
> the number of distinct uses for the PUA any one person will run
> across is probably low enough to cache every one permanently. Dumb
> software will do the TeX thing and say "File not found. Please enter
> alternate PUA reference for 'Klingon at http://www.kli.org/klingon.xml':".  
> Note that there's already precedent in XML for stuff like this; XML
> includes a URL to find the doctype that's needed to validate it. 
> 

My impression is that the typical Klingon user (if they
used the Klingon script rather than the romanization) might
well have dozens or even hundreds of files using the ConScript
PUA encoding.  This could be true of any PUA user group.
The user files could also be many in format, *.TXT, *.HTM, 
*.DBF, *.EML, *.etc.   

Rather than specifying a structure requiring caches and on-line
sessions, it might be better to just leave things be and let 
authors and users work implementation issues out privately.

Common sense should indicate to a publisher that some kind of
info or pointers to same would be a good idea.

Best regards,

James Kass.

Re: Tags and the Private Use Area

2001-04-29 Thread David Starner


On Sun, Apr 29, 2001 at 03:14:23AM -0500, David Starner wrote:
> Why? For HTTP, all it would take is a line like 
> "Content-Unicode-PUA: Klingon; ref=http://www.kli.org/klingon.xml";
> where http://www.kli.org/klingon.xml is the definition.
> 
> Considering the amount of stuff a web 0 has to support, I don't
  browser
> see why this would be a 0 cost for anyone. Nothing says lynx
  ^^^ huge
> or Opera has to support it, but the heavyweight browser (IE, Mozilla)
> wouldn't have any reason not to. 
> 
> > Therefore, communities that share a well 
> > defined set of characters are better off if they can be standardized.
> 
> Well, duh. From the 0 to this thread, I don't think there's
 ^^^ responses

[Misspelled words were replaced with 0's, through a local mishap. I 
guess they weren't misspelled any more . . .]

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and 
laughs at me. In fact, I'd be rather honored." - Joseph_Greg

Re: Tags and the Private Use Area

2001-04-29 Thread David Starner

On Sun, Apr 29, 2001 at 01:26:18AM -0700, James Kass wrote:
> To store all such information in each relevant file using
> non-BMP characters does seem a bit much.  Even without
> any new representations, providing this data in each file
> might work if the user had only one or two such files, 
> but wouldn't most users favoring a PUA encoding have 
> many files?

Character set information must go along with every non-Latin-1
webpage already, and most word processor formats already carry along
huge quantities of data, such that just adding the information
shouldn't be hard at all. 

> Earlier, someone brought up the idea that the format of
> the tag could include an active link to download additional
> data.  If the tag must be in each file's header, what happens 
> if a user is looking at files off-line?  Does the system read 
> the header of the file, determine that data is required on-line,
> and then prompt the user to connect?  Every time that file
> or a similar file is opened?

Intelligent software caches the file and loads it up from the cache;
the number of distinct uses for the PUA any one person will run
across is probably low enough to cache every one permanently. Dumb
software will do the TeX thing and say "File not found. Please enter
alternate PUA reference for 'Klingon at http://www.kli.org/klingon.xml':".  
Note that there's already precedent in XML for stuff like this; XML
includes a URL to find the doctype that's needed to validate it. 
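
The caching behaviour described above might look roughly like the following. This is a sketch under assumptions, not a description of any existing software: the class name PuaDefinitionCache, the on-disk layout and the injected fetch function are all invented for illustration.

```python
# Sketch: fetch a PUA definition once, then serve it from a local cache
# so that off-line use keeps working. All names here are hypothetical.
import hashlib
import os


class PuaDefinitionCache:
    def __init__(self, cache_dir, fetch):
        self.cache_dir = cache_dir
        self.fetch = fetch  # callable: url -> bytes; may raise OSError

    def _path(self, url):
        # One cache file per URL, named by a hash of the URL.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest)

    def get(self, url):
        path = self._path(url)
        if os.path.exists(path):
            with open(path, "rb") as f:
                return f.read()
        data = self.fetch(url)  # only reached on a cache miss
        os.makedirs(self.cache_dir, exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        return data
```

Passing the fetch function in keeps the sketch independent of any particular network library. "Dumb" software in the sense above would simply let the error from fetch reach the user as a prompt.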

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and 
laughs at me. In fact, I'd be rather honored." - Joseph_Greg

Re: Tags and the Private Use Area

2001-04-29 Thread James Kass

John Cowan wrote:

> 
> >   This file uses characters assigned
> > to the Private Use Area of Unicode according to the
> > PUA scheme published at (URL).  In order to view this
> > document, it will be necessary to obtain and install
> > the (font-name) font from (URL of font provider).  
> > 
> 
> Well, this is fine if all you want to do is render the document.
> If you want to *process* the document, though, you
> need to have information on the properties of the PUA
> characters relevant to the document.
> 
> However, I agree that no new representations are needed
> for this.  It is sufficient just to extend the 3.x
> UnicodeData and *Properties files.
> 

The ability to correctly display text is important.

Anything beyond that would perhaps be better stored as
part of the PUA scheme itself at the referenced URL.  
(This could be in plain text format designed to be used
to extend the UnicodeData files.)

Or, in the case of TTF/OTF, there's a table within the font
(GDEF = glyph definition) which allows some rudimentary
properties for glyphs.  (This font table isn't yet widely
supported.)

To store all such information in each relevant file using
non-BMP characters does seem a bit much.  Even without
any new representations, providing this data in each file
might work if the user had only one or two such files, 
but wouldn't most users favoring a PUA encoding have 
many files?

Earlier, someone brought up the idea that the format of
the tag could include an active link to download additional
data.  If the tag must be in each file's header, what happens 
if a user is looking at files off-line?  Does the system read 
the header of the file, determine that data is required on-line,
and then prompt the user to connect?  Every time that file
or a similar file is opened?

Maybe it would be best to leave it incumbent upon a file's
author to provide any necessary information or pointers.  
If someone has accessed a file which uses the PUA and can't 
read it, it may well be that the contents of that file are 
supposed to be every bit as private as the Unicode area used.

Best regards,

James Kass.

Re: Tags and the Private Use Area

2001-04-29 Thread David Starner

On Sat, Apr 28, 2001 at 11:38:30PM -0700, Asmus Freytag wrote:
> Someone (for example IETF or W3C) who is in the business of defining 
> general protocols for text interchange built on top of the Unicode Standard 
> would probably want to be very careful about issues relating to the private 
> use area. There are three options:
> a) The safest thing is to prohibit the use of the private use area 
> altogether - this maximizes the success of any interchange.

Maximizing the success of interchange is not as important as being 
able to communicate what you want. If it were, then we should all
use ASCII, because that's about all that will reliably display almost
everywhere. 

> b) In the future, there may be a web-scalable way to characterize the 
> private use area assignments - in that case they could be built into the 
> protocols. The interchange would be definite, but at a considerable cost to 
> everyone.

Why? For HTTP, all it would take is a line like 
"Content-Unicode-PUA: Klingon; ref=http://www.kli.org/klingon.xml";
where http://www.kli.org/klingon.xml is the definition.

Considering the amount of stuff a web browser has to support, I don't
see why this would be a huge cost for anyone. Nothing says lynx
or Opera has to support it, but the heavyweight browser (IE, Mozilla)
wouldn't have any reason not to. 
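
For a sense of how little a receiving program would need, here is a sketch that parses the header value suggested above. Content-Unicode-PUA is only the header proposed in this message, not an existing HTTP header, and the function name parse_pua_header is invented for the example.

```python
# Sketch: parse the value of the *proposed* "Content-Unicode-PUA" header,
# e.g. 'Klingon; ref=http://www.kli.org/klingon.xml'.
# Assumption: parameters are separated by ';' and written as key=value,
# so a URL that itself contains ';' would need quoting rules not shown here.


def parse_pua_header(value):
    """Return (scheme name, dict of parameters)."""
    parts = [p.strip() for p in value.split(";") if p.strip()]
    if not parts:
        raise ValueError("empty header value")
    name = parts[0]
    params = {}
    for part in parts[1:]:
        key, sep, val = part.partition("=")
        if sep:  # ignore fragments that are not key=value
            params[key.strip().lower()] = val.strip().strip('"')
    return name, params


if __name__ == "__main__":
    print(parse_pua_header("Klingon; ref=http://www.kli.org/klingon.xml"))
```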

> Therefore, communities that share a well 
> defined set of characters are better off if they can be standardized.

Well, duh. From the responses to this thread, I don't think there's
any evidence that people are using the PUA for standardizable
characters and not working on getting them standardized. There's
apparently two different sets of people using the PUA: people who
are working on getting something standardized and (a) need to use
it now or (b) need to check their implementation, and people using
codes that won't be standardized (logos, conscripts, Han variants).
Telling the latter group they're just out of luck will produce more
character sets and kludges than trying to support them, at least to
the point of not banning all use of the PUA.

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and 
laughs at me. In fact, I'd be rather honored." - Joseph_Greg

Re: Tags and the Private Use Area

2001-04-29 Thread Asmus Freytag

>William Overington wrote:
>
> > However, there is something that I feel that the Unicode
> > Consortium could do, if it so wished, without violating
> > that rule.  I suggest that the Consortium could,
> > if it so chooses, encode one or more regular unicode
> > characters together with a protocol so that an author of
> > a file of unicode plain text that uses any of the codes of
> > the PUA could, if and only if that author
> > chooses to so state, state in a file of plain unicode text
> > what meaning the author of that file places upon any
> > PUA characters that the author uses.

This would violate the neutrality that the Unicode
Consortium is bound to observe when it comes to
uses of the Private Use Area. By encoding characters
it would implicitly endorse the scheme (or series of
schemes) designed to use these characters.

Since such scheme(s) support only some particular
usage (or set of usages) of the private use area,
the consortium would no longer be neutral towards
*any and all* uses of the Private Use Area.

Going further and outlining a protocol for such a
thing is even worse - if done by the Unicode Consortium.
However, it would be fine for any other organization
to define the protocol - but that organization could
not assign any special non-private characters.

I do believe we are going in circles here, and
lengthy ones to boot.

A./

Re: Tags and the Private Use Area

2001-04-29 Thread Asmus Freytag


Why Unicode will never endorse certain proposals
--

By making the Private Use Area "private", the Unicode Consortium imposed on 
itself a restriction to stay absolutely neutral on the use of these 
characters. In other words, it cannot promote or appear to be promoting the 
use of this area for any one *particular* purpose. Nor can the Consortium 
endorse, or appear to be endorsing, any particular method of identifying 
the repertoire or usage of these characters. Doing so, would change the 
nature of the private use area from something that is private and outside 
the scope of the Consortium to something that is a formalized code 
extension technique.

Why everyone else is free to do what they want
--

By definition, this restriction does *not* apply to any other organization 
not involved in maintaining the standard. For example, vendors, user 
groups, and individuals are quite within their rights to propose particular 
assignments or even to define higher level protocols that regulate the use 
of the private use area, as long as these apply to *those users, and only 
those* that subscribe to that assignment or higher level protocol.

Why certain things may or may not be advisable
--

Someone (for example IETF or W3C) who is in the business of defining 
general protocols for text interchange built on top of the Unicode Standard 
would probably want to be very careful about issues relating to the private 
use area. There are three options:
a) The safest thing is to prohibit the use of the private use area 
altogether - this maximizes the success of any interchange.
b) In the future, there may be a web-scalable way to characterize the 
private use area assignments - in that case they could be built into the 
protocols. The interchange would be definite, but at a considerable cost to 
everyone.
c) Some protocols may be designed to cover any form of plain-text without 
loss. Such protocols would need to allow unrestricted use of the private 
use area, but success of interchange would depend on outside negotiation.

Why interchanging private use characters won't work
---

Because of our growing dependency on internet and web protocols, data 
interchange among a community of users who rely on a common set of private 
use characters seems hopeless without the existence (and widespread 
implementation) of option b. However, if it simply involves the use of a 
common font, option c would work as well (with distribution of the common 
font being the outside negotiation). Anything more complex would run into 
the need to customize editors, browsers, databases etc. in ways that 
probably wouldn't be possible or not uniformly successful.

Since option b increases implementation costs for everyone, it is not 
likely to be supported everywhere. Therefore, communities that share a well 
defined set of characters are better off if they can be standardized.

Re: Tags and the Private Use Area

2001-04-28 Thread John Cowan


James Kass scripsit:

>   This file uses characters assigned
> to the Private Use Area of Unicode according to the
> PUA scheme published at (URL).  In order to view this
> document, it will be necessary to obtain and install
> the (font-name) font from (URL of font provider).  
> 

Well, this is fine if all you want to do is render the document.
If you want to *process* the document, though, you
need to have information on the properties of the PUA
characters relevant to the document.

However, I agree that no new representations are needed
for this.  It is sufficient just to extend the 3.x
UnicodeData and *Properties files.
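
As a rough illustration of extending the UnicodeData file, the sketch below reads rows in the semicolon-separated UnicodeData layout (code point, name, general category, combining class, bidi category, and so on) and lets user-supplied rows answer for the Private Use Area. The function names and the private row are invented for the example; only the Latin row is real data.

```python
# Sketch: overlay user-supplied PUA rows on standard UnicodeData rows.
# Function names and the private sample row are hypothetical.

STANDARD_ROWS = ["0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"]
PRIVATE_ROWS = ["E000;EXAMPLE PRIVATE LETTER;Lo;0;L;;;;;N;;;;;"]


def parse_unicode_data(lines):
    """Map code point -> (name, general category, bidi category)."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # tolerate blanks and comments in user files
        fields = line.split(";")
        table[int(fields[0], 16)] = (fields[1], fields[2], fields[4])
    return table


def lookup(code, standard, private):
    """Prefer private rows inside U+E000..U+F8FF, else the standard table."""
    if 0xE000 <= code <= 0xF8FF and code in private:
        return private[code]
    return standard.get(code)


if __name__ == "__main__":
    std = parse_unicode_data(STANDARD_ROWS)
    priv = parse_unicode_data(PRIVATE_ROWS)
    print(lookup(0x0041, std, priv))
    print(lookup(0xE000, std, priv))
```

The private rows are consulted only inside the PUA range, so a user file cannot override the properties of standard characters.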

-- 
John Cowan   [EMAIL PROTECTED]
One art/there is/no less/no more/All things/to do/with sparks/galore
--Douglas Hofstadter

Re: Tags and the Private Use Area

2001-04-28 Thread James Kass

William Overington wrote:

> However, there is something that I feel that the Unicode 
> Consortium could do, if it so wished, without violating 
> that rule.  I suggest that the Unicode Consortium could, 
> if it so chooses, encode one or more regular unicode 
> characters together with a protocol so that an author of 
> a file of unicode plain text that uses any of the codes of 
> the private use area could, if and only if that author 
> chooses to so state, state in a file of plain unicode text 
> what meaning the author of that file places upon any 
> private use area characters that the author uses.

Suppose that the BMP of Unicode could be used for this
purpose.  In other words, why create additional
characters in order to note necessary information for
distribution with a PUA file?

We might agree in principle that it would be a good idea
for anyone publishing material using the PUA to include
a note to that effect.  Such a note could appear at the
beginning of the file/document and could use any
character from the BMP.  A modest example follows:

  This file uses characters assigned
to the Private Use Area of Unicode according to the
PUA scheme published at (URL).  In order to view this
document, it will be necessary to obtain and install
the (font-name) font from (URL of font provider).  

Now, the above example uses English, but the advantage
of being able to use any BMP character within the "tag"
or "note" is that any other modern language could be
used, like Russian, Japanese, or Esperanto.

This approach may offer some advantages:

1)  It would work right away.
2)  It would provide the essential information.
3)  It would not need to be endorsed by any organization.
4)  No additional characters would be required.
5)  It doesn't attempt to fix anything which isn't broken.
6)  Software applications don't have to be re-written.
7)  It is human-readable.
8)  It is simple.
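
A publisher who wanted to automate the note could do so in a few lines. This is only a sketch: the function names are invented, the wording is the example note above, and the check covers the plane 0 Private Use Area (U+E000 to U+F8FF) only.

```python
# Sketch: prepend the plain-text note when a document uses PUA characters.
# Function names are hypothetical; the note text follows the example above.


def uses_pua(text):
    """True if any character is in the plane 0 Private Use Area."""
    return any(0xE000 <= ord(ch) <= 0xF8FF for ch in text)


def with_pua_note(text, scheme_url, font_name, font_url):
    """Return text unchanged, or with the note in front if it uses the PUA."""
    if not uses_pua(text):
        return text
    note = (
        "This file uses characters assigned to the Private Use Area of "
        "Unicode according to the PUA scheme published at %s.  In order "
        "to view this document, it will be necessary to obtain and install "
        "the %s font from %s." % (scheme_url, font_name, font_url)
    )
    return note + "\n\n" + text
```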

Best regards,

James Kass.

Cutting to the chase (was Re: Tags and the Private Use Area)

2001-04-28 Thread Michael \(michka\) Kaplan

From: "William Overington" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Saturday, April 28, 2001 7:44 AM
Subject: Re: Tags and the Private Use Area

> The quote is an excerpt from a sentence.

Well, you did manage to go on for quite a bit. Since you were able to pick
apart things I said at both the very start and then again at the very end,
it seems that you read the whole message that I posted. So I will assume that you
read the middle part, which included a polite form of the traditional "put
up, or shut up" type challenge: I asked you to come up with an exact
customer scenario rather than a desire to stretch the Unicode standard in a
direction that is only theoretically useful but has no true and actual
customer.

I might have been too subtle, so I will state the "challenge" (such as it
is) more clearly, so that you cannot ignore it in favor of 5000 words on
superfluous procedural rules of what e-mail address should be used. :-)

The CHALLENGE:

Give an ACTUAL scenario that requires this thing you wish to see discussed.
There is more than enough in the way of actual scenarios that anything which
does not have such a scenario becomes slightly less important than
EVERYTHING ELSE in which folks here would have an interest.

If there is no such scenario, then why not involve your obviously fine
intellect in some of those real problems? In other words, help clear the
backlog of work rather than try to create work without proof that it is even
needed? Once we clear all of those things up, we will be bored and can
certainly move on to all of the theoretical matters that might be out there.

Not too much to ask, is it? :-)

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/

Re: Tags and the Private Use Area

2001-04-28 Thread William Overington


Michael Kaplan wrote:

Let's consider the fact that what you are looking for is summarized at the
end of your message: "I hope to gain fairly widespread agreement within the
unicode user community."

end quote

The quote is an excerpt from a sentence.  The whole sentence is as follows.

The suggestion is open for discussion and I hope to gain fairly widespread
agreement within the unicode user community.

Michael continues:

I submit that this very desire is a violation of the entire spirit of the
PUA, which is about PRIVATE USE and thus widespread acceptance is neither
needed nor desired. You can attribute the frustration you are feeling as due
to this single reason more than any other.

end quote

In everyday life there are laws, specifications and agreements.  These all
contain rules.  There exist the concepts of "working within the letter of
the rules", "working within the spirit of the rules", "working within the
letter and the spirit of the rules" and "working within the letter but not
the spirit of the rules".

I feel that the concept of the spirit of the rules is very important:
however, I feel that the spirit of the rules cannot possibly be such that it
*contradicts* the letter of the rules.  The rules for the private use area
specifically include as an example of possible use " or they could be
published as vendor-specific character assignments available to applications
and end users."  The letter of the specification is that there could be
publication.  The letter of the specification is that this publication could
be by way of trade.  Publication does not require any agreement, it can be
unilateral action.  I feel that my hope to gain fairly widespread agreement
within the unicode user community is well within both the letter and the
spirit of the specification.

As to my feeling, well you surprised me there!  I am feeling no frustration
whatsoever.  I have had a good week of research on a fascinating topic, I
have had the benefit of reading the views of top experts on the unicode
system as they debated the issues that arose.  It is as if I have been given
the privilege to spend a week as a guest in the common room of a top
university debating with top scholars on an aspect of world class leading
edge research.  I have learned much.  I had seen tags previously in passing
but, as a result of the matter being raised, I have learned more of them.  I
feel that I now have a broad understanding of what action is needed to solve
the problem.  I have a (basic) understanding of the technical issues and
also have become aware of the policy issues and the fascinating way that the
potential for chaos has been recognized but will possibly not be acted upon
officially until chaos occurs.  I am reminded of the Millennium bug and the
fact that that was envisaged as a potential cause of chaos during the 1990s
and the way that that was acted on before 1 January 2000 rather than after 1
January 2000.  I remember the way that, having heard talk of the Millennium
bug during the 1990s I was startled when Channel 4 News on the television
here in the United Kingdom announced early in 1998 that the Millennium bug
had struck!  A man had tried to buy something in a shop with his newly
issued replacement credit card.  The expiry date was 01/00 and the shop
could not get the online automated electronic system to issue an
authorization code for the purchase transaction and had to use a manual
credit card machine with the multilayer pieces of paper.  I am absolutely
fascinated by the way that the Unicode Consortium, having recognized that
the specification opens a window to potential chaos, appears to prefer to
wait until the chaos actually happens and is then reported back before even
starting a process of considering what to do about it.

As you raise the issue of feelings I mention that a search on the web for
Myers Briggs type indicator is interesting.  It is sometimes called
Myers-Briggs type indicator, using a hyphen.  The web site
www.new-oceans.co.uk is a good site for the Myers Briggs type indicator.
The Myers Briggs type indicator is based on the teachings of Carl Gustav
Jung.  It is fascinating and will hopefully give an insight into how
different people, based on their personalities, can view matters in entirely
different ways.

Another interesting aspect of psychology related to feelings is the
Yerkes-Dodson law.  I wish it were more widely known about.  So, the matter
of feelings having been raised I take the opportunity to mention it here so
that anyone interested might like to search the web and hopefully enjoy what
they find.  A serendipitous link to follow up perhaps?

William Overington

28 April 2001

Provenance of the Unicode Standard and of statements (derives from Re: Tags and the Private Use Area)

2001-04-28 Thread William Overington


Kenneth Whistler wrote:

And there have been a couple of no-doubt frustrating responses already.

end quote

No, not frustrating at all.  I have found it fascinating.  I am seeking to
participate in world class leading edge research work and the number of
contributions to this thread, the variety of opinion, the matters raised and
the potential to learn from the pointers given has been pleasing,
fascinating and very helpful.

Ken continues:

I would like to uplevel briefly here and suggest why the people
on this list are not engaging in the details of Mr. Overington's
proposals so much as questioning the need for such a protocol,
arguing the premises, talking about the role of metadata, and so
on.

end quote

Well, most of the 650 recipients of this list do not participate in most
discussions.  I feel that some people will only respond to a posting in a
list if they feel that they disagree or wish to make some particular
additional point.  If they agree, then they might just say "fine" to
themselves and spend their time on something else rather than feel a need to
send a posting that just says, "I agree".  I am not suggesting that all or
indeed most or even any of the recipients of this list agree with the
suggestion that I made in my document.  Many may not even have looked at it.

When putting forward new ideas an inventor should perhaps not expect an
immediate response.  I feel that I will have done well if, of the 650
recipients on this list, some have filed the suggestion that I made in the
document of 26 April 2001 under private use area and made a mental note that
my suggestion exists, just in case one day a file coded using it turns up,
and maybe made a note that there is a suggestion about the use of U+12
and U+100020..U+10007F that has been sent round and that, if they
themselves are ever going to make use of the private use area for defining
characters then, at that time, they will take into consideration the
knowledge that that suggestion has been made and might be in use somewhere,
and will make their own decision as to whether to in effect tacitly agree to
it to the limited extent of avoiding *clashing codes* with it, even though
no one else outside any organization for which they work is even aware that
the decision to avoid clashing codes with my suggestion has been made so
that the organization cannot in any way whatsoever be seen to be
endorsing my suggestion.

I am content.  I have sent out my idea as it stands and many of the key
companies using unicode may possibly have made a note that the document
exists.  I have placed in this posting the URL of our family webspace, so if
they want to check whether the idea is still about then they will be able to
seek to check at the website if they wish.

Ken continues later:

One thing the Unicode discussion list doesn't do is develop
protocols. That is the kind of work that instead often takes place
on temporary Working Group discussion lists in the IETF.

end quote

What please is the IETF?

Ken continues:

While Mr. Overington's initial proposals were couched in terms
of character encoding, it soon became clear to the list and to
him that we weren't talking about standardizing any characters,
but instead a proposal for particular private uses of PUA
characters -- something the UTC and WG2 cannot and will not
endorse, precisely because they *are* private use characters.

end quote

I learned about the idea of using characters within protocols within a
plain unicode text file when the discussion turned towards the matter of
tags.  I am a relative newcomer to unicode and am on the learning curve.

The Unicode Consortium cannot and will not endorse a proposal for particular
private uses of PUA characters.  That has not been an issue within this
thread.  I knew that situation before the thread started.

However, there is something that I feel that the Unicode Consortium could
do, if it so wished, without violating that rule.  I suggest that the
Unicode Consortium could, if it so chooses, encode one or more regular
unicode characters together with a protocol so that an author of a file of
unicode plain text that uses any of the codes of the private use area could,
if and only if that author chooses to so state, state in a file of plain
unicode text what meaning the author of that file places upon any private
use area characters that the author uses.

If the Unicode Consortium were to consider making such definitions, then
perhaps I might suggest, for purposes of clarifying what I mean and
providing some examples just in this discussion, that there are, at the
present time, three broad possibilities.

1.  Define U+E0002 and use the existing tag characters.

2.  Promote my suggestion to codes U+E0102 and U+E0120..U+E017F.

3.  Something else.
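
To make possibility 1 concrete, here is a minimal Python sketch of how the existing tag characters mirror printable ASCII at an offset of 0xE0000. The function names are invented for this illustration, and U+E0002 appears only because it is the code point suggested above; it is not a defined character, and nothing here is an agreed protocol.

```python
import unicodedata

TAG_OFFSET = 0xE0000  # existing tag characters mirror ASCII at this offset


def to_tags(text):
    """Map printable ASCII (U+0020..U+007E) onto U+E0020..U+E007E."""
    for ch in text:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
    return "".join(chr(ord(ch) + TAG_OFFSET) for ch in text)


def from_tags(tags):
    """Inverse of to_tags: recover the ASCII text from a run of tag characters."""
    return "".join(chr(ord(ch) - TAG_OFFSET) for ch in tags)


# Hypothetical introducer taken from possibility 1; not a defined character.
PROPOSED_INTRODUCER = "\U000E0002"

payload = PROPOSED_INTRODUCER + to_tags("{Symbols used in early chemistry}")
print([f"U+{ord(c):04X}" for c in payload[:4]])
print(unicodedata.name(payload[1]))  # name of the first real tag character
print(from_tags(payload[1:]))
```

The introducer is kept separate from the tag run so that a reader of the file can skip the whole annotation without understanding it.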

Now I fully accept that the Unicode Consortium may not wish to do anything
whatsoever about this matter either now or ever and I am not saying or even
suggesting that it should.  That is a matter for

Re: Tags and the Private Use Area

2001-04-26 Thread Kenneth Whistler

William Overington wrote:

> I have updated my suggestion.  Here is the latest version for discussion.
...
> Specific protocols to use with such tagging can be devised.
...
> The suggestion is open for discussion and I hope to gain fairly widespread
> agreement within the unicode user community.

And there have been a couple of no-doubt frustrating responses already.

I would like to uplevel briefly here and suggest why the people
on this list are not engaging in the details of Mr. Overington's
proposals so much as questioning the need for such a protocol,
arguing the premises, talking about the role of metadata, and so
on.

The Unicode discussion list is focussed first-and-foremost on the
Unicode Standard itself. The discussants here often discuss additional
characters or scripts to be added to the standard, particular
implementation issues for some character or script, the details
of algorithms needed for implementing the Unicode Standard, and
then love to go OT to discuss interesting issues relating to
languages, scripts, etymologies and such.

One thing the Unicode discussion list doesn't do is develop
protocols. That is the kind of work that instead often takes place
on temporary Working Group discussion lists in the IETF.

While Mr. Overington's initial proposals were couched in terms
of character encoding, it soon became clear to the list and to
him that we weren't talking about standardizing any characters,
but instead a proposal for particular private uses of PUA
characters -- something the UTC and WG2 cannot and will not
endorse, precisely because they *are* private use characters.

And as has become clear in Mr. Overington's latest statement of
what he is proposing, this is really a proposal for a protocol:
a specification of a method for communicating particular interpretations
of rationally segmented portions of the PUA.

As such, this ([EMAIL PROTECTED]) is probably the wrong forum
to be trying to discuss, modify, and gain working consensus on
such a protocol proposal. It just isn't that kind of forum. 
[EMAIL PROTECTED] doesn't "work on" specific documents as a group, 
with the aim of publishing them as standard protocols for general 
usage. There is no program of work and no moderator whose job
it is to attempt to solicit and capture consensus and move a
document towards final form.

The mechanism that is more appropriate to that would be to
take the proposal, rework it as an Internet Draft,
solicit commentary on that document, and then try to develop
consensus *within the IETF* to progress such a document to
a standard protocol.

Of course in such a forum any proposal like this would also
face questions regarding justification and alternatives. And
those might be equally frustrating there.

But anyone who comes to the [EMAIL PROTECTED] list looking
to actually develop and establish a standard protocol involving
Unicode is looking in the wrong place.

--Ken

Re: Tags and the Private Use Area

2001-04-26 Thread Peter_Constable



On 04/27/2001 03:23:36 AM unicode-bounce wrote:

>From: "William Overington" <[EMAIL PROTECTED]>
>
>> I have updated my suggestion.  Here is the latest version for discussion.
>
>Let's consider the fact that what you are looking for is summarized at the
>end of your message: "I hope to gain fairly widespread agreement within the
>unicode user community." I submit that this very desire is a violation of
>the entire spirit of the PUA, which is about PRIVATE USE and thus widespread
>acceptance is neither needed nor desired.

Nor possible. Not practically possible to get all potential 6G+ members of
the community to agree (or even a minute fraction), and even if that were
possible, not possible to know that they all agree (I'd guess it's only
possible among perhaps around 0.1% of the potential community at best).



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>



Re: Tags and the Private Use Area

2001-04-26 Thread Michael \(michka\) Kaplan


From: "William Overington" <[EMAIL PROTECTED]>

> I have updated my suggestion.  Here is the latest version for discussion.

Let's consider the fact that what you are looking for is summarized at the
end of your message: "I hope to gain fairly widespread agreement within the
unicode user community." I submit that this very desire is a violation of
the entire spirit of the PUA, which is about PRIVATE USE and thus widespread
acceptance is neither needed nor desired. You can attribute the frustration
you are feeling as due to this single reason more than any other.

Unicode, like any organization, can grow in response to real need. In fact,
it has done so in even core architectural ways in the past. But I would tend
to say that the way growth is being suggested here is not in the
best interests of Unicode or the "Unicode user community."

So actually, I have a better suggestion at this point. One that would meet
the burden of the test that Rick McGowan has made and also one that would
satisfy the crotchety folks like me who insist that this is really not a route
that any of the fine minds on the Unicode List should even be considering.

(What the unfine minds do is of course their own business, but I do not
classify any of the people in this particular conversation as being in that
category!)

Let us wait and find an ACTUAL example of a TRUE situation where a PUA
encoding is needed and the existing mechanism is not enough. Let the brave
soul who has been forced by the circumstances of fate to deal with this
complex issue come forward and explain their circumstance and the reason
that the existing PUA mechanism, which requires a mutual understanding and a
private agreement, is so inadequate.

No one in this group is UNREASONABLE. But I do think a lot more mindshare is
going to a problem that is THEORETICAL rather than real. And all of us can
probably find useful ways to use Unicode as it stands and then as real needs
come up we can find ways to extend Unicode to meet those real needs. There
are more than enough problems to solve that actually exist that it is almost
insulting that we are off inventing problems that we think might be
important but for which no clear-cut case of need has been made.

This will be my final plea here, as even though I do not classify myself as
one of those "fine minds" that I referred to earlier I do have many real
problems with actual scripts that I have true customers for, and I think
they deserve my attention much more than the problems that we are concerned
may exist.

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/

Re: Tags and the Private Use Area

2001-04-26 Thread William Overington


I have updated my suggestion.  Here is the latest version for discussion.

Let there exist the idea that there is U+12 (PUA INTERPRETATION TAG) and
a set of private use area tag characters (U+100020..U+10007F) all of
which code points are in the upper private use area.
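
The arithmetic behind the suggested tag set can be sketched in Python: each character in the range U+0020..U+007F is shifted up by 0x100000 so that it lands in U+100020..U+10007F, which lies inside the plane 16 private use area (U+100000..U+10FFFD). The function names are invented for this sketch, and the introducer character is left out because its code point did not survive intact in this copy of the message.

```python
PUA_TAG_OFFSET = 0x100000                 # shifts U+0020..U+007F to U+100020..U+10007F
PLANE16_PUA = range(0x100000, 0x10FFFE)   # U+100000..U+10FFFD inclusive


def to_pua_tags(text):
    """Encode characters U+0020..U+007F as the suggested private use area tags."""
    out = []
    for ch in text:
        if not 0x20 <= ord(ch) <= 0x7F:
            raise ValueError(f"outside U+0020..U+007F: {ch!r}")
        out.append(chr(ord(ch) + PUA_TAG_OFFSET))
    return "".join(out)


def from_pua_tags(tags):
    """Decode a run of the suggested private use area tags back to plain text."""
    out = []
    for ch in tags:
        cp = ord(ch)
        if not 0x100020 <= cp <= 0x10007F:
            raise ValueError(f"not a suggested PUA tag: U+{cp:06X}")
        out.append(chr(cp - PUA_TAG_OFFSET))
    return "".join(out)


encoded = to_pua_tags("(E000..E2FF,E700..E7FF)")
assert all(ord(c) in PLANE16_PUA for c in encoded)
print(" ".join(f"U+{ord(c):06X}" for c in encoded[:5]))
print(from_pua_tags(encoded))
```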

May I suggest that mention is made that, where displayed for analysis
purposes, these private use area tags should be displayed as yellow on a red
background.  Ordinary unicode tags displayed for analysis are not specified
to be displayed in any specific colour but some people might like to display
them as white on blue so as not to conflict visually with these suggested
private use area tags.

Naturally this definition within the private use area is not an absolute
definition and the Unicode Consortium is not being asked to endorse it nor
would they, by their own statement.  All that could be reasonably sought is
that the practice and such protocols that are expressed using such private
use area tags are so well thought out and designed by interested users that
most users will wish to use them for most applications where private use
area characters are used.  It cannot be expected that most users will agree
to such a system, yet one can always hope.

Specific protocols to use with such tagging can be devised.

I put forward the idea that, in a file of plain unicode text that contains
characters from the private use area, information about the character set or
sets to which private use area codes refer may, if so desired, be included
within the file (before the use of any character to which the information
relates) by including the U+12 character followed by a number of private
use area tag characters from the set of private use area tag characters
(U+100020..U+10007F) which express one or more groups of characters in
the following formats.

A Uniform Resource Locator of a font file.

For example,

http://www.somewebsite.net/oldchem.ttf

A Uniform Resource Locator of a description file of the characters within
square brackets.

For example,

[http://www.somewebsite.net/oldchem.htm]

A comment about the characters in natural language within wavy brackets.

For example,

{Symbols used in early chemistry}

A list specifying the parts of the private use areas to which this
description refers within round brackets.

For example,

(E000..E2FF,E700..E7FF)
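
As a rough illustration of how software might read the four formats above, here is a Python sketch that works on the payload after the tag characters have been converted back to ordinary text. It rests on assumptions the suggestion does not state: that the groups simply follow one another, that a bare font URL ends at whitespace or at the next opening bracket, and that the bracketed groups do not nest. The names `parse_payload` and `parse_ranges` are invented for the sketch.

```python
import re

# One alternative per format; exactly one named group matches each time.
FIELD = re.compile(
    r"\[(?P<description>[^\]]*)\]"    # [URL of a description file]
    r"|\{(?P<comment>[^}]*)\}"        # {natural language comment}
    r"|\((?P<ranges>[^)]*)\)"         # (list of private use area ranges)
    r"|(?P<font>[^\[{(\s]+)"          # bare URL of a font file
)


def parse_ranges(spec):
    """Turn 'E000..E2FF,E700..E7FF' into [(0xE000, 0xE2FF), (0xE700, 0xE7FF)]."""
    result = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, _, last = part.partition("..")
        # A single code point such as 'E000' is treated as a one-element range.
        result.append((int(first, 16), int(last or first, 16)))
    return result


def parse_payload(payload):
    """Collect the font URLs, description URLs, comments and ranges in order."""
    info = {"font": [], "description": [], "comment": [], "ranges": []}
    for match in FIELD.finditer(payload):
        kind = match.lastgroup
        value = match.group(kind)
        if kind == "ranges":
            info["ranges"].extend(parse_ranges(value))
        else:
            info[kind].append(value)
    return info


example = (
    "http://www.somewebsite.net/oldchem.ttf"
    "[http://www.somewebsite.net/oldchem.htm]"
    "{Symbols used in early chemistry}"
    "(E000..E2FF,E700..E7FF)"
)
print(parse_payload(example))
```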

The name of the font to be used is always expressed as a full Uniform
Resource Locator using the private use area tag codes, though a software
package using the data may, if it wishes to take the risk, simply use the
file name at the very end of the said Uniform Resource Locator and search
for that file name in its own local font directory without accessing the
internet.
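
The local lookup described in the previous paragraph might be sketched in Python as follows. The function names and the idea of passing the font directory as a parameter are assumptions made for the illustration. The risk mentioned above is visible in the code: a local file that merely shares the name is accepted without any check that it is the same font.

```python
import posixpath
from pathlib import Path
from urllib.parse import urlsplit


def font_file_name(font_url):
    """Return the file name at the very end of the URL, or '' if there is none."""
    return posixpath.basename(urlsplit(font_url).path)


def local_font_candidate(font_url, font_dir):
    """Look for the URL's file name in a local font directory.

    Returns the local path if such a file exists, otherwise None.
    No attempt is made to verify that the local file matches the remote one.
    """
    name = font_file_name(font_url)
    if not name:
        return None
    candidate = Path(font_dir) / name
    return candidate if candidate.is_file() else None


url = "http://www.somewebsite.net/oldchem.ttf"
print(font_file_name(url))
print(local_font_candidate(url, "fonts"))  # "fonts" is a placeholder directory
```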

The suggestion is open for discussion and I hope to gain fairly widespread
agreement within the unicode user community.

William Overington

26 April 2001

Re: Tags and the Private Use Area

2001-04-25 Thread Rick McGowan


Eric Muller quoted from a Seybold Report, but... I think it's out of date.  
 Actually, I'm not talking about the "Gaiji Problem".  It's a well-known  
special case of needing things that aren't in the standard one is using;  
but it's a private need.

As long as the system you're using lets you make a character & font for  
your own purposes, you can use it.  Most of those proprietary systems do  
so.  But most such existing systems start with something that is far less  
complete than Unihan, and hence have greater need for utilizing home-grown  
gaiji.

I would assert that given the 20,000 Unihan characters originally encoded,  
topped off with what has been recently encoded, there should now be almost  
zero need in any of the Han-using countries for any such Gaiji as "many  
'unofficial' Kanji characters, mistakes and misinterpretations, and  
seldom-used Kanji passed down for generations".  Most of those things are  
already covered by Unicode, and in fact FILLING THAT GAP has been the  
primary purpose of the most recent tens of thousands of additional Han  
characters.

Company logos are a different matter, of course, but rarely if ever need  
to be publicly transmitted as characters.

So, while Japanese customers might have at one time needed to use  
something like the PUA, nowadays they shouldn't need to.  In any case, the  
Gaiji of any one installation are installation-specific.  In the past, they  
have never been transmissible, so that doesn't demonstrate any need for  
widespread transmission of PUA characters.

Rick

Re: Tags and the Private Use Area

2001-04-25 Thread John Cowan


Rick McGowan scripsit:

> I'm looking for a problem to which all of these engineering solutions are  
> being proposed and discussed.  I don't yet see anything that needs to be  
> solved.  I see a theoretically chaotic situation, not an actually chaotic  
> situation.

Well, I wanted to start the CSUR well in advance of actual usage,
and encouraged everyone and his brother to register their scripts,
*so that* code clash (at least within the conlang community)
would never come into existence at all.

-- 
John Cowan   [EMAIL PROTECTED]
One art/there is/no less/no more/All things/to do/with sparks/galore
--Douglas Hofstadter

Re: Tags and the Private Use Area

2001-04-25 Thread Eric Muller



Rick McGowan wrote:

> I'm looking for a problem to which all of these engineering solutions are
> being proposed and discussed.  I don't yet see anything that needs to be
> solved.  I see a theoretically chaotic situation, not an actually chaotic
> situation.

Here is a quote from the November 27, 2000 Seybold Report on Publishing
Systems, from the article 'The Second Wave of Japanese Desktop Publishing':

    Gaiji are Kanji characters outside the current JIS and Unicode
    encoding sets and are not included in a standard font. They comprise
    many "unofficial" Kanji characters, mistakes and misinterpretations,
    and seldom-used Kanji passed down for generations, long before
    printing presses and governments created standards. These Gaiji
    characters are widely used in people- and place-names. To this day,
    they are a reason for publishers to hang on to their proprietary
    systems.

I personally don't know enough about Japanese to say if this is indeed
a character collection problem, or only a glyph variant problem; I suspect
it is a combination of both.

Like many on this list, I am entirely of the opinion that every character
is either currently in Unicode or on the way for a future version. However
good the process may be to add characters, it still remains that there
is a lag and something has to be done in the meantime; until three weeks
ago, there were about 40,000 characters I may have a need for, and my only
Unicode-compatible solution was to use the PUA. Until Unicode 3.2 is out,
there are characters in JIS X 0213:2000 which are not in Unicode. And all
the descriptions of Gaiji I have heard suggest that there are characters
that will not make it for a long time.

In other words, for a Japanese publisher, it seems that the PUA is something
you have to use every day, in almost every document. Would that qualify
for your search?

Eric.

Re: Tags and the Private Use Area

2001-04-25 Thread Rick McGowan

David Starner <[EMAIL PROTECTED]> wrote:

> Most of the PUA usages seem to be stuff the UTC refuses to encode
> (Apple's logo, Klingon

First a correction: UTC has not yet, unfortunately, actually *refused* to  
encode Klingon.  It still sits on the books.  (I think they should formally  
refuse to encode Klingon, but that's my personal opinion.)

But let me re-phrase my original argument.  My question is not:

"Has anyone made a survey of things that people might use?"

but rather

"Are people widely exchanging, for example, characters that
are listed in the ConScript registry?"

I don't think they are.  At least, they're not doing it in a widespread  
enough manner to be a noticeable problem that needs to be fixed.

I'm looking for a problem to which all of these engineering solutions are  
being proposed and discussed.  I don't yet see anything that needs to be  
solved.  I see a theoretically chaotic situation, not an actually chaotic  
situation.

But, maybe I'm just not in the groove, and I don't know what communities  
are using these PUA characters.  So I'm asking about instances and examples  
of actual usage.

Rick

Re: Tags and the Private Use Area

2001-04-25 Thread David Starner

On Wed, Apr 25, 2001 at 09:16:43AM -0700, Rick McGowan wrote:
> For the most part, it's been my impression that actual PUA usages are very  
> localized and platform-specific, and the characters tend not to leak all  
> over the place.  If end-users have a demonstrable need to widely  
> communicate some set of characters, I would think they might first consider  
> them as candidates for standardization; not as evidence that arcane  
> regulatory mechanisms need to be engineered for the PUA.

Most of the PUA usages seem to be stuff the UTC refuses to encode
(Apple's logo, Klingon (I've seen it used on the web with the 
ConScript encoding, though I didn't have a font that would display 
it), etc.), not stuff that would be encoded if submitted, with the
exception of the MathML stuff, which was proposed and encoded.

As for the suggestion to use another encoding marker (x-mike-pua1),
that has the problem that non-PUA parts of the page can't be 
displayed and that doesn't work for untagged Unicode only systems 
(dict protocol, for example).

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and 
laughs at me. In fact, I'd be rather honored." - Joseph_Greg

Re: Tags and the Private Use Area

2001-04-25 Thread Rick McGowan


There has been a lot of recent discussion about various uses of the PUA.

Can someone point to widespread instances of confusion and chaos right now  
over PUA usage?  I don't think there are any.

It seems to me there's a lot of effort being expended to engineer the  
regulation of something that hasn't been shown to be a problem in the  
world, or in any need of regulation.

For the most part, it's been my impression that actual PUA usages are very  
localized and platform-specific, and the characters tend not to leak all  
over the place.  If end-users have a demonstrable need to widely  
communicate some set of characters, I would think they might first consider  
them as candidates for standardization; not as evidence that arcane  
regulatory mechanisms need to be engineered for the PUA.

Most things that most people need are already encoded.  I don't see people  
coming to this list with existing collections of entities that are not  
encoded, and yet need to be widely transmitted and stored.  (Of course, I  
am not referring to unencoded minority scripts, many of which UTC already  
knows about.)

If such collections were widespread, I'm sure UTC would like to hear about them.

Rick
