Re: RE: Tags and the Private Use Area
Peter said: >2. How do I get software X to know how to process my PUA characters, or how >do I document my characters for others to understand my data? Michael replied... > In principle it would work, if the OSes are being written to handle user > editing of such things. Ten euros sez they ain't. Well, at one time, I believe, it was possible with the precursor to Mac OS X to change the behavior & properties in the "Cocoa" text system. It's merely a matter of loading or re-loading the data tables. I don't know if the data format is published... In an object-oriented system, you could pretty easily override only the part of the data dealing with the PUA by funnelling property requests in the PUA range to a set of user functions, or to user-loadable data blocks. (On Mac OS X, for instance, the "User Defaults" mechanism should be easily adaptable with load/unload hooks for this and would work on a per-machine, per-user or per-app basis, or all three.) Mark Leisher's publicly available "UCData" project also allows re-loading the property data with explicit load and unload functions. Overloading the utility functions to filter PUA codes to another data set would be rather easy in principle. I mean something as trivial as this at the top of _ucprop_lookup(): if (_userDataLoaded && is_PUA_Char(code)) { return _ucprop_lookup_PUA(code, n); } and providing a set of load functions for user data. This project code already uses the Unicode Data file as-is for its initial input. PUA data could be used on a per-installation basis by simply adding data to the data file. Rick
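The filtering Rick describes can be sketched in a few lines of C. This is a minimal, self-contained illustration under stated assumptions, not UCData's actual code: the names is_pua_char, pua_entry, load_user_pua_data, unload_user_pua_data and prop_lookup are hypothetical, properties are modelled as a user-defined bitmask, and builtin_props is a stand-in for the library's real tables. The only fixed facts used are the three Private Use Area ranges.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if cp lies in one of the Private Use Areas:
   U+E000..U+F8FF (BMP), U+F0000..U+FFFFD (plane 15),
   U+100000..U+10FFFD (plane 16). */
bool is_pua_char(uint32_t cp)
{
    return (cp >= 0xE000 && cp <= 0xF8FF) ||
           (cp >= 0xF0000 && cp <= 0xFFFFD) ||
           (cp >= 0x100000 && cp <= 0x10FFFD);
}

/* Hypothetical user-supplied record: a code point and a bitmask of
   property flags whose meaning the user defines. */
typedef struct {
    uint32_t cp;
    uint32_t props;
} pua_entry;

static const pua_entry *user_table = NULL; /* NULL means "not loaded" */
static size_t user_table_len = 0;

void load_user_pua_data(const pua_entry *table, size_t len)
{
    user_table = table;
    user_table_len = len;
}

void unload_user_pua_data(void)
{
    user_table = NULL;
    user_table_len = 0;
}

/* Stand-in for the library's built-in tables; always reports no properties. */
static uint32_t builtin_props(uint32_t cp)
{
    (void)cp;
    return 0;
}

/* Returns true if cp has every property bit in mask. While user data is
   loaded, PUA code points are answered from it and never reach the
   built-in tables, mirroring the "_userDataLoaded && is_PUA_Char" guard. */
bool prop_lookup(uint32_t cp, uint32_t mask)
{
    if (user_table != NULL && is_pua_char(cp)) {
        for (size_t i = 0; i < user_table_len; i++) {
            if (user_table[i].cp == cp)
                return (user_table[i].props & mask) == mask;
        }
        return false; /* PUA code point the user data does not describe */
    }
    return (builtin_props(cp) & mask) == mask;
}
```

Unloading simply drops the pointer, so PUA requests fall back to the built-in (here empty) data; a real implementation would also need to decide what the loaded table's flags mean.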
RE: Tags and the Private Use Area
At 03:09 AM 5/3/01, Marco Cimarosti wrote: >The PUA is (or might be) used for, e.g.: [...] I have been following this thread with some amusement, and I have noticed that one use of the PUA has been overlooked: the Microsoft Symbol Font area. In Windows systems, this gives access to a literally limitless range of dingbats, artworks, and other "non-script" uses, 256 glyphs at a time. Many programs written for Windows (e.g. PowerPoint) look to "symbol fonts" for things such as bullets. The beauty of this area is its *lack* of semantics and repeatability and any other constraint except that it exists and is available. The alternative, of course, would be to put these glyphs in the U+0021 - U+00FF range. -- Curtis Clark http://www.csupomona.edu/~jcclark/ Biological Sciences Department Voice: (909) 869-4062 California State Polytechnic University FAX: (909) 869-4078 Pomona CA 91768-4032 USA [EMAIL PROTECTED]
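The "256 glyphs at a time" corresponds, as far as I know, to the usual convention for Windows symbol-encoded fonts: each byte 0x20..0xFF of the old 8-bit font is exposed at U+F000 + byte, that is, in U+F020..U+F0FF inside the BMP Private Use Area. Here is a minimal C sketch of that mapping, assuming this convention. The function names are hypothetical. For example, byte 0xB7, the bullet in the Symbol font, would come out as U+F0B7.

```c
#include <stdint.h>

/* Maps a byte of legacy 8-bit symbol-font text to a PUA code point using
   the U+F000 + byte convention. Bytes below 0x20 (controls) fall outside
   the U+F020..U+F0FF range and are reported as 0. */
uint32_t symbol_byte_to_pua(uint8_t b)
{
    return b >= 0x20 ? 0xF000u + b : 0;
}

/* Inverse mapping: returns the original byte, or -1 if cp is not in
   U+F020..U+F0FF. */
int pua_to_symbol_byte(uint32_t cp)
{
    if (cp < 0xF020u || cp > 0xF0FFu)
        return -1;
    return (int)(cp - 0xF000u);
}
```

Because the mapping is a fixed offset, the same code point means "glyph 0xB7 of whatever symbol font is selected", which is exactly the lack of semantics Curtis describes.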
RE: Tags and the Private Use Area
At 09:44 +0800 2001-05-03, [EMAIL PROTECTED] wrote: >2. How do I get software X to know how to process my PUA characters, or how >do I document my characters for others to understand my data? That's a good one. A very good one. Is there a way to define these using the Unicode properties? Sorting could be done, but I don't know about editing the database for local use. In principle it would work, if the OSes are being written to handle user editing of such things. Ten euros sez they ain't. >3. Is there a need for some protocol to tag data (either internal to the >data, as William suggested, or as metadata) to a recipient know either what >my PUA characters mean, or where to find documentation that explains that? I don't think so. I think this is pseudo-encoding. -- Michael Everson ** Everson Gunn Teoranta ** http://www.egt.ie 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland Mob +353 86 807 9169 ** Fax +353 1 478 2597 ** Vox +353 1 478 2597 27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire
RE: Tags and the Private Use Area
Marco wrote: >So, if William drops it, I will take the challenge -- at the risk of >repeating things that others and myself already wrote. > >The PUA is (or might be) used for, e.g.: [...] People, there are three distinct issues here: 1. Are there legitimate uses for the PUA? 2. How do I get software X to know how to process my PUA characters, or how do I document my characters for others to understand my data? 3. Is there a need for some protocol to tag data (either internal to the data, as William suggested, or as metadata) to let a recipient know either what my PUA characters mean, or where to find documentation that explains that? I think there is no debate about 1. Marco and others have given lists of valid scenarios. Regarding 3, a variety of objections have been made to William's suggestion: - this is metadata and does not belong internal to the data - use of PUA characters to create a protocol creates a circular problem of documenting PUA usage and does not solve anything - some type of markup protocol could be an appropriate mechanism for doing this, but UTC will not establish this kind of protocol - this is not the right forum to discuss higher-level protocols I think that item 2 is the one thing that isn't getting discussed here, but which is probably in greatest need of discussion. >IMHO, It would be more interesting (and less impacting Unicode policies) to >discuss *what* this "PUA semantics" data could look like. Bingo! >Let me add that, however, all this subject is *not* exactly the >highest-priority need that I ever heard. I personally can live even with an >"undefined PUA", and wouldn't spend my time in developing such a thing. Lest we think this is unimportant, I will mention that I have heard of at least one linguist who has created a hacked Unicode (rather than e.g. hacked cp1252) font in order to get commercial software to give the desired shaping behaviour with their as-yet-unencoded characters. 
In this case, I understand that they were given strong health warnings: "Don't give this to anybody else lest we start getting garbage data disseminated." It won't surprise me if these things start cropping up without those efforts to keep it contained. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: <[EMAIL PROTECTED]>
RE: Tags and the Private Use Area
William Overington wrote: > Kenneth Whistler wrote: > > Among other things, you have yet to meet the challenge > > by Michael Kaplan to provide a convincing case for their > > requirement. > > Oh, there was no need. Michael stated his challenge as a > "put up, or shut up" challenge [...] I am probably not the only one who feels that this is not the way to discuss things. I think that Michael did not state any "put up or shut up" challenge, but rather made a very sensible objection to the whole subject of this thread: "what is it for?". I think that such an objection should be answered politely, rather than haughtily refused. So, if William drops it, I will take the challenge -- at the risk of repeating things that others and I have already written. The PUA is (or might be) used for, e.g.: 1) linguistic research (e.g. handling texts in unencoded or unencodable ancient scripts); 2) recreational linguistics (e.g. constructed scripts and the like); 3) encoding research (e.g. experimenting with interim encodings while preparing proposals); 4) orthography development (e.g. special characters being tried out for as-yet unwritten living languages); 5) interim encodings for non-linguistic notations (e.g. people who need labanotation to discuss dance over the Internet). Of course, every single person can be involved in more than one project for each one of the disciplines above, e.g.: a scholar may study (or teach) both hieroglyphics and cuneiform; a "game master" may be discussing several role-playing games on ConLang mailing lists, etc. After a few years of surfing linguistics-related forums, I noticed that the same names tend to occur in disciplines 1 to 4, and I wouldn't be surprised if some of these people are also interested in point 5. All this is to say that, yes, there may be a latent need to exchange PUA encodings and, consequently, to define some sort of protocol to attach the intended meaning to the otherwise meaningless PUA codepoints. 
Such a protocol can be private or public but, clearly, it *cannot* be a part of the Unicode Standard, because this would contradict the basic statement that everybody can do whatever they want with the PUA. One can imagine that, in a distant future, Unicode could choose to "reference" such a protocol as "related information", but no more. However, before such a thing can happen, there must be something to be referenced... I think that the discussion is currently focusing on the wrong thing. It is not so important how a certain text file will declare its "PUA semantics": after all, there will never be *one* method for doing this (text that has a MIME header will presumably use it; rich text will have its own means; mark-up languages may add a tag for this, etc.). IMHO, it would be more interesting (and would have less impact on Unicode policies) to discuss *what* this "PUA semantics" data could look like. Will it be a UniData-like file? Or will it be an XML-based file? Will it include a default font? Which kind of font? And how will all this material be used: will programmers manually download it and package it in their applications? Or will it be automatically downloaded and installed à la plug-in? Let me add, however, that this whole subject is *not* exactly the highest-priority need I have ever heard of. I personally can live even with an "undefined PUA", and wouldn't spend my time developing such a thing. If someone else wishes to start such work, I would certainly try to keep myself informed about their progress -- but I would not like to follow *every* single step of the discussion on *this* mailing list. _ Marco
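To make Marco's first option concrete, here is a minimal C sketch of reading one record of a hypothetical "PUA semantics" file modelled on UnicodeData.txt. The assumptions are loud ones: only the first three semicolon-separated fields (code point, name, two-letter General_Category) are used, and the file is taken to describe PUA code points only. The record layout, the pua_record type, parse_pua_record and the sample names are all invented for illustration; no such file format has been agreed.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One entry of a hypothetical "PUA semantics" file that borrows the first
   three fields of UnicodeData.txt: code point, name, General_Category. */
typedef struct {
    uint32_t cp;
    char name[64];
    char category[3];
} pua_record;

/* Parses a line such as "E000;SOME NAME;So;..." into rec.
   Returns 0 on success, or -1 if the line is malformed or the code point
   is outside the Private Use Areas (such a file only describes PUA use). */
int parse_pua_record(const char *line, pua_record *rec)
{
    char *end;
    unsigned long cp = strtoul(line, &end, 16);
    if (end == line || *end != ';')
        return -1;
    if (!((cp >= 0xE000 && cp <= 0xF8FF) ||
          (cp >= 0xF0000 && cp <= 0xFFFFD) ||
          (cp >= 0x100000 && cp <= 0x10FFFD)))
        return -1;

    const char *name = end + 1;
    const char *semi = strchr(name, ';');
    if (semi == NULL || (size_t)(semi - name) >= sizeof rec->name)
        return -1;

    const char *cat = semi + 1;
    if (strcspn(cat, ";\r\n") != 2) /* General_Category is two letters */
        return -1;

    rec->cp = (uint32_t)cp;
    memcpy(rec->name, name, (size_t)(semi - name));
    rec->name[semi - name] = '\0';
    memcpy(rec->category, cat, 2);
    rec->category[2] = '\0';
    return 0;
}
```

Reusing the UnicodeData.txt layout has the advantage Rick mentions elsewhere in the thread: existing property loaders could read such a file with little change. An XML-based file would be easier to extend with a default font or documentation URL.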
Re: Tags and the Private Use Area
William Overington wrote, responding to Ken: > As I have not claimed that any such case actually exists at the > present time, then the challenge is null and void and I have no need to > answer it. Hmmm... I still think, personally speaking, that you're going through a lot of effort that appears basically pointless because there isn't any problem. If you could illustrate the existence of a real and more-or-less serious problem, I would probably take the discussion seriously. At this point, I still don't see an actual problem that is affecting lots of people, so there's a lot of list traffic being spent to not much effect. > I must also convince the Unicode Consortium that it has the power to > implement private use area support tags even if the Unicode Consortium > were to accept that private use area support tags were needed. Why must you do that? What would be the point of convincing them that they can do something to support a non-problem? It sounds like perhaps you have a personal agenda that requires such a thing as a springboard. That doesn't necessarily translate into a world-wide problem with the PUA. > I happen to think that the Unicode Consortium arguably might have the > power to implement private use area support tags if it chose to do so. Well, honestly, the Consortium, like any organization, can do lots of things, but they may not be interested in doing some things that they could do. I suspect they may have bigger and more immediately problematic fish to fry -- like the living scripts of South & Southeast Asia that still need to be encoded. Or the worldwide lack of a decent and complete language tagging standard... 
> If the issue of capability to implement were resolved in favour > of the view that the Unicode Consortium does indeed have the capability > to implement private use area support tags in a non-private use area > of the unicode code point space, then the issue of whether to implement > or not to implement and if to implement in what manner to implement > would become a normal Unicode Technical Committee process. I would assert already that UTC could in fact implement such a thing. I believe they should not do so because that would undermine the freedom of action within the PUA itself. If you took a straw poll of UTC members, I suspect you would find little or no favor for adding such support tags for just that reason -- aside from the fact that no _need_ has yet been demonstrated. If you want to push the issue and get an actual response from UTC, I suggest you submit a document with a proposal to UTC. Instructions are on the web site. Rick
Re: Tags and the Private Use Area
Kenneth Whistler wrote: Among other things, you have yet to meet the challenge by Michael Kaplan to provide a convincing case for their requirement. end quote Oh, there was no need. Michael stated his challenge as a "put up, or shut up" challenge on the matter of stating an actual example of a clash between two actual existing uses of the private use area. A "put up, or shut up" challenge relates to someone being asked to justify something that he or she has stated. As I have not claimed that any such case actually exists at the present time, then the challenge is null and void and I have no need to answer it. I did not wish to seem less than diplomatic in my response so I answered upon the scientific content of the challenge rather than commenting on its validity as a challenge. I can seek to provide a convincing case for their requirement. Yet, there is an additional matter that I need to do as well, for I must also convince the Unicode Consortium that it has the power to implement private use area support tags even if the Unicode Consortium were to accept that private use area support tags were needed. I happen to think that the Unicode Consortium arguably might have the power to implement private use area support tags if it chose to do so. Thus far, those who have expressed an opinion are clear that it does not have that power. This is a different issue than the issue of submitting a proposal for the system to be implemented. If the issue of capability to implement were resolved in favour of the view that the Unicode Consortium does indeed have the capability to implement private use area support tags in a non-private use area of the unicode code point space, then the issue of whether to implement or not to implement and if to implement in what manner to implement would become a normal Unicode Technical Committee process. 
Ken mentions that I had written as follows: There would be a protocol saying that, in a plain unicode text file, but not in a rich text file, end quote Ken then responded as follows: This distinction already creates a problem for your proposal. Rich text contains chunks of plain text, and introducing a bunch of tag characters and a protocol for using them which have to be kept out of rich text, but which can be in plain text, creates a filtering and transducement problem. That would introduce a problem, rather than eliminating a problem. end quote I based my idea on the protocols for the language tags of plane 14, where there are protocols for using the tags in this manner. William Overington 2 May 2001
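William's model, the Plane 14 language tags, works roughly as follows: U+E0001 LANGUAGE TAG is followed by tag characters at U+E0020..U+E007E, each mirroring a printable ASCII character (tag = U+E0000 + ASCII byte). Here is a minimal C sketch of encoding and decoding such a sequence. It illustrates only that existing mechanism; the proposed private use area support tags were never encoded. The function names are hypothetical, and the sketch ignores U+E007F CANCEL TAG and any scoping rules.

```c
#include <stddef.h>
#include <stdint.h>

#define LANGUAGE_TAG 0xE0001u

/* Writes U+E0001 followed by the tag-character form of each ASCII byte of
   lang (U+E0000 + byte) into out. Returns the number of code points
   written, or 0 if out is too small or lang has a byte outside 0x20..0x7E. */
size_t encode_language_tag(const char *lang, uint32_t *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return 0;
    out[n++] = LANGUAGE_TAG;
    for (const char *p = lang; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (c < 0x20 || c > 0x7E || n == cap)
            return 0;
        out[n++] = 0xE0000u + c;
    }
    return n;
}

/* Decodes a sequence produced above back into a NUL-terminated ASCII
   string. Returns 0 on success, -1 if the sequence is malformed or buf
   (which needs len bytes) is too small. */
int decode_language_tag(const uint32_t *in, size_t len, char *buf, size_t cap)
{
    if (len == 0 || in[0] != LANGUAGE_TAG || cap < len)
        return -1;
    for (size_t i = 1; i < len; i++) {
        if (in[i] < 0xE0020u || in[i] > 0xE007Eu)
            return -1;
        buf[i - 1] = (char)(in[i] - 0xE0000u);
    }
    buf[len - 1] = '\0';
    return 0;
}
```

The decoder also shows the filtering Ken objects to: any process converting plain text to rich text would need a step like this to find such tag runs and strip or translate them.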
Re: Tags and the Private Use Area
Mike Ayers wrote: I'd like to point out that I consider it a Good Thing not to have a classification system. Should I choose to use PUA characters, I don't want any application that I didn't write attempting to interpret their meaning, since I may use them for anything (wasn't it you, William, who was working on a soft processor which used PUA codepoints as instructions?) and I don't want to waste time describing the usage for other applications (if the usage can even be described). end quote In that case, should you choose to use PUA characters, you may well not want any application that you didn't write attempting to interpret their meaning. Then you would need to keep physical copies of the file away from anyone who might run such an application on it, and to use your rights under intellectual property law to prohibit anyone to whom you do give a copy from running one on it. If that is possible, then so be it. In any case, if an application not written by you did actually end up trying to read your file, then if you were not using private use area support tags it would not know how to interpret the private use area codes that it encountered. For you, with your declared choice, that would be fine. I am not suggesting that the use of private use area support tags should be obligatory for people using private use area codes in a plain unicode text file, I am only suggesting an additional facility for users of unicode. You need not waste time describing the usage for other applications, as use of private use area support tags would be an entirely optional facility for people to use if they so wish and not use if they do not so wish. Yes, I am the one researching a soft processor which uses PUA codepoints as instructions. It is called a uniengine. 
The word uniengine is generic with the meaning previously posted, so any set of codes that I publish regarding a particular uniengine will need for the uniengine to have a specific name, which I shall need to coin. I am thinking in terms of licensing the technology and publishing the meanings of the codes so that people know what the codes each mean. I would use private use area support tags to help the processor know when an in-line graphic was encountered if I could. Yet software applications reading in plain unicode text files that contain one or more characters from the private use area are going to be common. How common I do not know, but systems such as Word 97 have facilities to read in plain text files and output plain text files. I suggest that future word processing packages, from a variety of manufacturers, may well have a selection box choice for plain unicode text for reading in files and plain unicode text for writing out files. When reading in a unicode plain text file, any encounter of a private use code will need to be managed by the software, and even if the user operating the computer knows what the characters are, be it for example symbols for early chemistry or symbols for ballet, the computer system will need to know as well. A font for early chemistry symbols may well not contain the ordinary English alphabet. So, once the file is loaded and internally converted into the internal rich text format of the wordprocessing software, there may well be lots of work for the operator to carry out changing the font of the private use area characters to early chemistry symbols. Even if the word processor had a facility for a global changing of the font of the private use area characters to the early chemistry font, that facility would not be suitable if the document contained even non-code-clashing uses of two private use area fonts. 
If code clashing occurred, then the chances for ambiguity would be huge. When writing out, private use area support codes could be added in to the unicode plain text file produced, or not added in to the unicode plain text file produced as the user chose from the choice presented in the drop down menu of the selection box in the Save as section of the wordprocessor. William Overington 2 May 2001
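The first step a word processor would need for William's scenario, finding where the private use area characters are so that a font can be applied to them, is easy to sketch. Here is a minimal C sketch, assuming the text has already been decoded to an array of code points; next_pua_run is a hypothetical name. It can only report *where* the PUA runs are, not which convention or font each run belongs to, which is precisely the missing information William wants tags to carry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Private Use Areas: BMP, plane 15 and plane 16. */
static bool is_pua(uint32_t cp)
{
    return (cp >= 0xE000 && cp <= 0xF8FF) ||
           (cp >= 0xF0000 && cp <= 0xFFFFD) ||
           (cp >= 0x100000 && cp <= 0x10FFFD);
}

/* Finds the next maximal run of PUA code points in text[start..len).
   On success stores the run as [*run_start, *run_end) and returns true;
   returns false when no PUA code point remains. */
bool next_pua_run(const uint32_t *text, size_t len, size_t start,
                  size_t *run_start, size_t *run_end)
{
    size_t i = start;
    while (i < len && !is_pua(text[i]))
        i++;
    if (i >= len)
        return false;
    *run_start = i;
    while (i < len && is_pua(text[i]))
        i++;
    *run_end = i;
    return true;
}
```

A global font change amounts to applying one font to every run this finds; with two PUA conventions in one document, nothing in the plain text tells the two kinds of run apart.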
Re: Tags and the Private Use Area
At 14:01 -0700 2001-05-01, Kenneth Whistler wrote: >If, on the other hand, it were just a matter of ensuring interoperability >of private Blissymbolics implementations, then you could get endorsement >by the Blissymbolics Institute. BCI (Blissymbolics Communication International) will be playing with a PUA implementation to ensure that what we are doing works, but the intention is to have Blissymbols encoded in the SMP. -- Michael Everson
Re: Tags and the Private Use Area
At 07:53 PM 5/1/01 +0100, William Overington wrote: >Asmus continues: > >Since such scheme(s) support only some particular >usage (or set of usages) of the private use area, >the consortium would no longer be neutral towards >*any and all* uses of the Private Use Area. > >end quote > >This is the core sentence of the posting for me. The question is as >follows. > >Does such a scheme support only some particular usage (or set of usages) of >the private use area? If you want to discuss set theory here, the set of all usages includes - of course - all usages that do *not* make use of the extra information that you are trying to provide in your protocol. (I used the word 'scheme' in my posting). This set is always larger than the set of all usages that *do* make use of the scheme. Since the latter set can only be a proper subset of the total set, once the Consortium adds any characters specifically for use in the kind of protocol that you describe, it would have shown a preference over other users of the PUA who either don't use any protocol or use a set of PUA characters for the same purpose using a different protocol not recognized by the Consortium. In other words: >Asmus Freytag wrote: > >This would violate the neutrality that the Unicode >Consortium is bound to observe when it comes to >uses of the Private Use Area. [Since] By encoding characters >it would implicitly endorse the scheme (or series of >schemes) designed to use these characters. > >end quote Couldn't have said it better myself ;-) A./
Re: Tags and the Private Use Area
William Overington perorated: > Asmus continues: > > Going further and outlining a protocol for such a > thing is even worse - if done by the Unicode Consortium. > However, it would be fine for any other organization > to define the protocol - but that organization could > not assign any special non-private characters. > > end quote > > If the Unicode Consortium found that it could act in this matter within > limits then definition of a protocol would be all but an essential part of > any such action. But precisely as Asmus has stated, the Unicode Consortium (or more properly the Unicode Technical Committee) would not define any such protocol for private use characters. It is entirely up to external organizations to engage in such work if they choose to do so. > > What I have in mind here is a set of private use area support tags, perhaps > located in plane 14, better located in plane 0 if a contiguous block of 128 > unused codes in a reasonable place not within the private use area could be > found. This would be a request for encoding of standardized characters, which the Unicode Technical Committee would, of course, have to decide upon. However, judging by my experience in the Unicode Technical Committee and the feedback so far on this list (and the feedback received on the language tag characters in Plane 14, which were "deprecated on birth"), it is rather unlikely that any such proposal would be approved by the UTC. Among other things, you have yet to meet the challenge by Michael Kaplan to provide a convincing case for their requirement. > > There would be a protocol saying that, in a plain unicode text file, but not > in a rich text file, This distinction already creates a problem for your proposal. Rich text contains chunks of plain text, and introducing a bunch of tag characters and a protocol for using them which have to be kept out of rich text, but which can be in plain text, creates a filtering and transducement problem.
That would introduce a problem, rather than eliminating a problem. > certain information related to the meanings ascribed to > any private use area codes used *may, but need not* be included using these > tags. However, *if* that information is included using these tags, then > this format of providing that information *must* be used. ... by a process that chooses to honor that protocol. But that would be outside the scope of the Unicode Standard and nothing that one could depend on a process conformant to the Unicode Standard to be following, merely by virtue of that conformance claim. > The protocols > could be carefully designed so that no limiting presumption whatsoever as to > the nature of the usages of the private use area that were capable of being > described using the protocols were made by the protocols. > > I suggest that such a facility would be useful and would provide a sound > basis for the future. I don't think so. Frankly I don't think things would be any better than with the kind of plain text alternatives that have already been suggested. Or, if you are convinced that there really is sufficient reason and demand to automate the processing, an alternative is simply to provide for a PUAconventions.xml file, which would contain the information you are suggesting for the protocol. Point at the appropriate PUAconventions.xml file, and you get the equivalent of trying to bury such information in plain text files, without actually touching the plain text files or requiring any additions to the Unicode Standard. > > The alternative is either chaos or the need to use a protocol put forward by > an organization other than the Unicode Consortium or by an individual. Exactly. > Yet > such a protocol would not have had the benefit of the full procedures of the > Unicode Consortium in its drafting and would have no standing above any > informal agreement amongst some users that it might receive and certainly > not the endorsement of the Unicode Consortium. 
It would have the standing of whatever organization chose to standardize such a protocol. Which is entirely appropriate. If you were expecting such a protocol to receive worldwide acceptance and be usable on the Internet, then you should anticipate that you would have to get it approved by the IETF as an Internet Standard. If, on the other hand, it were just a matter of ensuring interoperability of private Blissymbolics implementations, then you could get endorsement by the Blissymbolics Institute. And so on. > > So, I ask a question. Is there a formal method for the matter as to whether > the Unicode Consortium has the powers to do what I suggest above to be > formally decided please? See: http://www.unicode.org/pending/proposals.html But note that the proposals that the Unicode Technical Committee invites are related to the encoding of *characters*, rather than the development of higher-level protocols. --Ken
RE: Tags and the Private Use Area
> From: William Overington [mailto:[EMAIL PROTECTED]] > > Can there be found a possible usage that such a scheme would > not support? > Finding just one would resolve the question. I suspect that the whole issue is covered by Gödel's Incompleteness theorem, which says (approximately) that any mathematical system above a certain complexity cannot be fully mathematically described (characterized) within itself. Any scheme to describe PUA usage involving only PUA characters cannot be distinguished with certainty from random use of PUA characters. A scheme which used non-PUA characters could work, but, as has been stated many times, will not happen. I'd like to point out that I consider it a Good Thing not to have a classification system. Should I choose to use PUA characters, I don't want any application that I didn't write attempting to interpret their meaning, since I may use them for anything (wasn't it you, William, who was working on a soft processor which used PUA codepoints as instructions?) and I don't want to waste time describing the usage for other applications (if the usage can even be described). /|/|ike
Re: Tags and the Private Use Area
Asmus Freytag wrote: This would violate the neutrality that the Unicode Consortium is bound to observe when it comes to uses of the Private Use Area. By encoding characters it would implicitly endorse the scheme (or series of schemes) designed to use these characters. end quote I have read the posting several times. The first time through I did not agree with what was written. When I read it again later, I found that I then believed that the posting is absolutely correct, that my suggestion was incorrect and that I had a deeper understanding of the issues involved. A third reading put me back to disagreeing! Thus, for me, from my reading of it, the situation seems very finely balanced. I mention this because I would like, with permission, to comment on what is written in the posting without necessarily either agreeing or disagreeing in total. My first comment is that I am now by no means certain that my suggestion as to what I felt that the Unicode Consortium could reasonably do in this matter is valid. Yet I am not quite sure that the Unicode Consortium could not act if it so chose, within limits. Asmus continues: Since such scheme(s) support only some particular usage (or set of usages) of the private use area, the consortium would no longer be neutral towards *any and all* uses of the Private Use Area. end quote This is the core sentence of the posting for me. The question is as follows. Does such a scheme support only some particular usage (or set of usages) of the private use area? I find the phrase "or set of usages" particularly informative. Let us consider the set of all possible usages. Can there be found a possible usage that such a scheme would not support? Finding just one would resolve the question. However, since the "not finding" of one would be no proof that such a scheme could not exist, can it be proved mathematically that there is no possible usage that such a scheme would not support? 
For, if it could be so proved, then, unless there are also other reasons, the Unicode Consortium might indeed *have* the power to act if it so chooses. This would mean that interested people might be able to develop a system within the private use area with the prospect of it being moved from the private use area and promoted to the status of being a part of the unicode standard if it were found useful. Asmus continues: Going further and outlining a protocol for such a thing is even worse - if done by the Unicode Consortium. However, it would be fine for any other organization to define the protocol - but that organization could not assign any special non-private characters. end quote If the Unicode Consortium found that it could act in this matter within limits then definition of a protocol would be all but an essential part of any such action. What I have in mind here is a set of private use area support tags, perhaps located in plane 14, better located in plane 0 if a contiguous block of 128 unused codes in a reasonable place not within the private use area could be found. There would be a protocol saying that, in a plain unicode text file, but not in a rich text file, certain information related to the meanings ascribed to any private use area codes used *may, but need not* be included using these tags. However, *if* that information is included using these tags, then this format of providing that information *must* be used. The protocols could be carefully designed so that no limiting presumption whatsoever as to the nature of the usages of the private use area that were capable of being described using the protocols were made by the protocols. 
It is a matter for consideration as to quite how much information could be included in such protocols without making any limiting presumption as to usage of the private use area, but it might well be possible to provide an amount sufficient to avoid ambiguity and to allow two or more overlapping uses of the private use areas to be used in different parts of the same document. At the very least, if an ordinary-language comment could be added, that would be helpful. If a Uniform Resource Locator could be added as an optional element, then good. I feel that such protocols, permitting the optional use of a font name if the private use area codes so described are to be regarded as displayable characters using a particular font, do not violate neutrality as to whether any particular defined use of a private use area character so defined by any particular member of the unicode user community is to be a displayable character or a non-displayable character. It is well known that some uses of the private use area can be for displayable characters and some for non-displayable characters. The unicode specification recognizes this, so providing facilities such that any particular member of the unicode user community may inform people as to what he or she has chosen to do in any particular circumstance surely cannot be
Re: Tags and the Private Use Area
Michael Kaplan invited me to give an actual scenario that requires private use area support tags and an associated protocol. Well, the short answer is that I am unable to find one at the present time with my present level of knowledge. I thought about how I might find such an actual scenario with the facilities before me. I have access to a PC with Windows 95 which has Word 97 on it. This enables one to create a Word document, select a font that contains private use area characters, use Insert Symbol to add such a symbol or symbols to the Word document, save the Word document, then Save as HTML, then View HTML Source. From the HTML source code produced one can find the decimal values of the codes. I tried the Junicode font and the Times New Roman font, the latter because I remembered someone in this list writing some months ago that the Microsoft Times New Roman font on their website (which is the font that I am using, having obtained it so as to support on this local machine the use of unicode in my 1456 object code system that is on www.users.globalnet.co.uk/~ngo which is our family webspace in England) has an extra character in it. Might this extra character, the one that says OBJ in a dotted box, just possibly clash with the Junicode lady in the Junicode font? An interesting situation is that the OBJ character came out as decimal 65532, which is U+FFFC, which is not in the private use area, and I learned some more about unicode by following that up in the unicode specification (chapter 13 of version 3.0 under Specials, Replacement Characters). I found that the dialogue box of the Insert Symbol facility in Word 97 has near its top right corner a text box that will actually display the text "Private Use Area" when the "selection cursor" for the symbol that one is considering is on a symbol that is from the private use area. Does what I have here termed the "selection cursor" have a proper Microsoft name? I like to get the parlance of features of Microsoft products right. 
I now notice that Arial also has the OBJ character, so perhaps OBJ is not the additional character to which the poster from some months ago referred? I also notice that both the Times New Roman and the Arial have fi and fl ligatures twice, once under Private Use Area and once under Alphabetic Presentation Forms. Junicode does not appear to have either fi or fl under Private Use Area but does have ff, fi, fl, ffi and ffl under Alphabetic Presentation Forms. The Junicode ffl placed in a Word document and then formatted as Arial gives a black outline box. The Arial fi from the Private Use Area placed in a Word document and then formatted as Junicode gave a black outline box. The black outline box seems to represent "unknown to this font". The fi and fl from the Alphabetic Presentation Forms section of Times New Roman, Arial and Junicode all carry through to the other two fonts, regardless of which font one uses to insert the characters in the first place. The decimal codes for ff, fi, fl, ffi and ffl come out as 64256, 64257, 64258, 64259 and 64260 respectively, and these turn out to be U+FB00 through U+FB04 inclusive. Michael continues: If there is no such scenario, then why not involve your obviously fine intellect in some of those real problems? In other words, help clear the backlog of work rather than try to create work without proof that it is even needed? Once we clear all of those things up, we will be bored and can certainly move on to all of the theoretical matters that might be out there. end quote Well, as they say, I am grateful to the honourable gentleman for the remarks in the first part of his question. Two points arise. One is that I happen, as a user of unicode, to regard what is to be done to support the use of the private use area as a real problem. 
Secondly, are there any of the problems to which you refer where there is an opportunity for people who do not represent the organizations that are members of the Unicode Consortium to participate? I have not seen any such opportunities advertised in the mailing list, though I have not been a subscriber to the list for very long. William Overington 1 May 2001
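The manual steps in the message above (reading decimal values out of the saved HTML source, converting them to U+ notation, and checking whether each falls in the Private Use Area) can be sketched in Python. This is a minimal sketch, assuming the HTML contains decimal references of the form &#NNNNN; (hexadecimal &#x...; references are not handled); the helper names are illustrative only.

```python
import re
from typing import List, Tuple

# Private Use Area ranges defined by the Unicode Standard.
PUA_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]

def is_pua(cp: int) -> bool:
    """True if the code point lies in any Private Use Area range."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

# Matches decimal numeric character references such as &#64257;
NCR = re.compile(r"&#(\d+);")

def describe_ncrs(html: str) -> List[Tuple[int, str, bool]]:
    """Return (decimal value, U+XXXX notation, is_pua) for each decimal reference."""
    results = []
    for m in NCR.finditer(html):
        cp = int(m.group(1))
        results.append((cp, f"U+{cp:04X}", is_pua(cp)))
    return results

for dec, notation, pua in describe_ncrs("&#64256;&#65532;&#57344;"):
    print(dec, notation, "PUA" if pua else "not PUA")
```

Run on these three sample references, it reports U+FB00 and U+FFFC as outside the PUA and U+E000 as inside it, which matches the observation above that the OBJ character is not a private use character.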
Re: Provenance of the Unicode Standard and of statements (derives from Re: Tags and the Private Use Area)
William Overington wrote: > What please is the IETF? Internet Engineering Task Force. As Rick pointed out, peruse: http://www.ietf.org/ > > Ken continues: > > But anyone who comes to the [EMAIL PROTECTED] list looking > to actually develop and establish a standard protocol involving > Unicode is looking in the wrong place. > > end quote > > Well, maybe. I notice that Ken writes using the email address > [EMAIL PROTECTED] and that on the unicode website he is listed as a Technical > Director of Unicode with the email address [EMAIL PROTECTED] with a mention > of Sybase Inc. noted there. > > So, when Ken states the sentence above, is that Ken writing as a private > individual expressing a purely personal opinion, or Ken writing as a > representative of Sybase Inc. or Ken writing as a Technical Director of the > Unicode Consortium stating official Unicode Consortium policy? > > I feel that that is an important issue that needs to be clarified. I post personal opinions on this list. When I am posting notes that represent an official Sybase position, I post them on the relevant lists and sign myself as the Sybase representative to the UTC and to L2. When I respond in my capacity as one of the Technical Directors of the Unicode Consortium (which is usually in direct email responses to inquiries that come in via [EMAIL PROTECTED]), I sign my mail appropriately. Otherwise I'm not a big fan of officious-looking signatures on email, and just sign myself "Ken". > May I suggest that there exists scope for considerable confusion as to the > provenance of a statement made on this list where members of the unicode > user community may well not know who are the directors of the Unicode > consortium. Well, sure, but I think most participants on this list know how it generally works -- as Rick pointed out. None of the Unicode officers or UTC representatives come to this open discussion list trying to push official positions on everyone. 
Official policies are the provenance of the Unicode website, the Unicode Standard itself, and the meetings of the Unicode Technical Committee. > I am genuinely confused by this situation. Ken is a Technical Director of > the Unicode Consortium and has the [EMAIL PROTECTED] email address. He > writes using the email address [EMAIL PROTECTED] and does not state that he is > a Technical Director of the Unicode Consortium in this posting. Ken makes > statements about what is appropriate posting in this list. Knowing that Ken > is a Technical Director of the Unicode Consortium makes me feel that I > should treat what he says as if it is an official ruling of the Unicode > Consortium that that is how this list is to be used. Yet is that a correct > interpretation? Is Ken just happily and in a friendly manner only seeking > to express a personal view? The latter. My posting on this topic was not an official statement from the Unicode Consortium or any other organization I may represent. I was merely trying to point out that as a matter of history and practice the [EMAIL PROTECTED] discussion list does not develop protocols, and since what you were presenting and the way you presented it seemed to invite participation in the development of a protocol, I was pointing you to the kind of forum where protocol development *is* in scope and is the focus of various email discussion lists. > If people cannot legitimately and welcomely > discuss such issues here then surely all that will happen is that someone > will start an alt. newsgroup and the discussions will take place there. I'm not trying to chase you off the list. Only Sarasvati could do that, and as Rick pointed out, that only happens when people blatantly violate her rules for participation on the list. --Ken
Re: Tags and the Private Use Area
David Starner wrote: > > Character set information must go along with every non-Latin-1 > webpage already, and most word processor formats already carry along > huge quantities of data, such that just adding the information > shouldn't be hard at all. > The charset declaration in an HTML header is just one line, like saying charset=utf-8. The concern was that someone would expect TUS 3.0 in its entirety to be included in every file, as an extreme example. Since the PUA is part of Unicode, it is covered when the character set is specified as utf-8 in HTML. > Intelligent software caches the file and loads it up from the cache; > the number of distinct uses for the PUA any one person will run > across is probably low enough to cache every one permanently. Dumb > software will do the TeX thing and say "File not found. Please enter > alternate PUA reference for 'Klingon at http://www.kli.org/klingon.xml':". > Note that there's already precedent in XML for stuff like this; XML > includes a URL to find the doctype that's needed to validate it. > My impression is that the typical Klingon user (if they used the Klingon script rather than the romanization) might well have dozens or even hundreds of files using the ConScript PUA encoding. This could be true of any PUA user group. The user files could also come in many formats, *.TXT, *.HTM, *.DBF, *.EML, *.etc. Rather than specifying a structure requiring caches and on-line sessions, it might be better to just leave things be and let authors and users work implementation issues out privately. Common sense should indicate to a publisher that some kind of info or pointers to same would be a good idea. Best regards, James Kass.
Re: Tags and the Private Use Area
On Sun, Apr 29, 2001 at 03:14:23AM -0500, David Starner wrote: > Why? For HTTP, all it would take is a line like > "Content-Unicode-PUA: Klingon; ref=http://www.kli.org/klingon.xml"; > where http://www.kli.org/klingon.xml is the definition. > > Considering the amount of stuff a web 0 [^^^ browser] has to support, I don't > see why this would be a 0 [^^^ huge] cost for anyone. Nothing says lynx > or Opera has to support it, but the heavyweight browser (IE, Mozilla) > wouldn't have any reason not to. > > > Therefore, communities that share a well > > defined set of characters are better off if they can be standardized. > > Well, duh. From the 0 [^^^ responses] to this thread, I don't think there's [Misspelled words were replaced with 0's, through a local mishap. I guess they weren't misspelled any more . . .] -- David Starner - [EMAIL PROTECTED] Pointless website: http://dvdeug.dhis.org "I don't care if Bill personally has my name and reads my email and laughs at me. In fact, I'd be rather honored." - Joseph_Greg
Re: Tags and the Private Use Area
On Sun, Apr 29, 2001 at 01:26:18AM -0700, James Kass wrote: > To store all such information in each relevant file using > non-BMP characters does seem a bit much. Even without > any new representations, providing this data in each file > might work if the user had only one or two such files, > but wouldn't most users favoring a PUA encoding have > many files? Character set information must go along with every non-Latin-1 webpage already, and most word processor formats already carry along huge quantities of data, such that just adding the information shouldn't be hard at all. > Earlier, someone brought up the idea that the format of > the tag could include an active link to download additional > data. If the tag must be in each file's header, what happens > if a user is looking at files off-line? Does the system read > the header of the file, determine that data is required on-line, > and then prompt the user to connect? Every time that file > or a similar file is opened? Intelligent software caches the file and loads it up from the cache; the number of distinct uses for the PUA any one person will run across is probably low enough to cache every one permanently. Dumb software will do the TeX thing and say "File not found. Please enter alternate PUA reference for 'Klingon at http://www.kli.org/klingon.xml':". Note that there's already precedent in XML for stuff like this; XML includes a URL to find the doctype that's needed to validate it. -- David Starner - [EMAIL PROTECTED] Pointless website: http://dvdeug.dhis.org "I don't care if Bill personally has my name and reads my email and laughs at me. In fact, I'd be rather honored." - Joseph_Greg
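David's contrast between "intelligent" and "dumb" software can be illustrated with a small cache of PUA scheme definitions keyed by their reference URL. This is a hypothetical sketch: the class name, the `fetch` callback (standing in for whatever retrieval the software actually performs) and the error text are assumptions for illustration, not any existing browser or library API.

```python
from typing import Callable, Dict, Optional

class PuaDefinitionCache:
    """Hypothetical cache of PUA scheme definitions, keyed by reference URL.

    `fetch` stands in for the software's own retrieval; it returns the
    definition text, or None when the definition cannot be obtained
    (for example, when the user is off-line).
    """

    def __init__(self, fetch: Callable[[str], Optional[str]]):
        self._fetch = fetch
        self._store: Dict[str, str] = {}

    def get(self, name: str, url: str) -> str:
        # Intelligent path: serve a previously fetched definition from the cache.
        if url in self._store:
            return self._store[url]
        definition = self._fetch(url)
        if definition is None:
            # Dumb path: nothing cached and nothing fetched, so ask the user.
            raise LookupError(
                f"File not found. Please enter alternate PUA reference for "
                f"'{name} at {url}':"
            )
        # Keep the definition permanently so later off-line opens still work.
        self._store[url] = definition
        return definition
```

Once a definition has been fetched, opening further files that use the same scheme needs no connection, which is the point David makes about the small number of distinct PUA uses any one person meets.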
Re: Tags and the Private Use Area
John Cowan wrote: > > > This file uses characters assigned > > to the Private Use Area of Unicode according to the > > PUA scheme published at (URL). In order to view this > > document, it will be necessary to obtain and install > > the (font-name) font from (URL of font provider). > > > > Well, this is fine if all you want to do is render the document. > If you want to *process* the document, though, you > need to have information on the properties of the PUA > characters relevant to the document. > > However, I agree that no new representations are needed > for this. It is sufficient just to extend the 3.x > UnicodeData and *Properties files. > The ability to correctly display text is important. Anything beyond that would perhaps be better stored as part of the PUA scheme itself at the referenced URL. (This could be in plain text format designed to be used to extend the UnicodeData files.) Or, in the case of TTF/OTF, there's a table within the font (GDEF = glyph definition) which allows some rudimentary properties for glyphs. (This font table isn't yet widely supported.) To store all such information in each relevant file using non-BMP characters does seem a bit much. Even without any new representations, providing this data in each file might work if the user had only one or two such files, but wouldn't most users favoring a PUA encoding have many files? Earlier, someone brought up the idea that the format of the tag could include an active link to download additional data. If the tag must be in each file's header, what happens if a user is looking at files off-line? Does the system read the header of the file, determine that data is required on-line, and then prompt the user to connect? Every time that file or a similar file is opened? Maybe it would be best to leave it incumbent upon a file's author to provide any necessary information or pointers. 
If someone has accessed a file which uses the PUA and can't read it, it may well be that the contents of that file are supposed to be every bit as private as the Unicode area used. Best regards, James Kass.
Re: Tags and the Private Use Area
On Sat, Apr 28, 2001 at 11:38:30PM -0700, Asmus Freytag wrote: > Someone (for example IETF or W3C) who is in the business of defining > general protocols for text interchange built on top of the Unicode Standard > would probably want to be very careful about issues relating to the private > use area. There are three options: > a) The safest thing is to prohibit the use of the private use area > altogether - this maximizes the success of any interchange. Maximizing the success of interchange is not as important as being able to communicate what you want. If it were, then we should all use ASCII, because that's about all that will reliably display almost everywhere. > b) In the future, there may be a web-scalable way to characterize the > private use area assignments - in that case they could be built into the > protocols. The interchange would be definite, but at a considerable cost to > everyone. Why? For HTTP, all it would take is a line like "Content-Unicode-PUA: Klingon; ref=http://www.kli.org/klingon.xml"; where http://www.kli.org/klingon.xml is the definition. Considering the amount of stuff a web browser has to support, I don't see why this would be a huge cost for anyone. Nothing says lynx or Opera has to support it, but the heavyweight browsers (IE, Mozilla) wouldn't have any reason not to. > Therefore, communities that share a well > defined set of characters are better off if they can be standardized. Well, duh. From the responses to this thread, I don't think there's any evidence that people are using the PUA for standardizable characters and not working on getting them standardized. There are apparently two different sets of people using the PUA: people who are working on getting something standardized and (a) need to use it now or (b) need to check their implementation, and people using codes that won't be standardized (logos, conscripts, Han variants). 
Telling the latter group they're just out of luck will produce more character sets and kludges than trying to support them, at least to the point of not banning all use of the PUA. -- David Starner - [EMAIL PROTECTED] Pointless website: http://dvdeug.dhis.org "I don't care if Bill personally has my name and reads my email and laughs at me. In fact, I'd be rather honored." - Joseph_Greg
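A receiving program would have to pick apart the header line David suggests. That header is only a proposal in this thread, not a registered HTTP field, so the parser below is a hypothetical sketch that assumes a `Name; key=value` layout after the field name.

```python
from typing import Dict, Tuple

def parse_pua_header(line: str) -> Tuple[str, Dict[str, str]]:
    """Parse a line of the proposed form 'Content-Unicode-PUA: Name; key=value'.

    Returns the scheme name and a dict of lower-cased parameter names.
    """
    # Split only at the first colon, so colons inside a URL are preserved.
    field, sep, value = line.partition(":")
    if not sep or field.strip().lower() != "content-unicode-pua":
        raise ValueError("not a Content-Unicode-PUA header")
    parts = [p.strip() for p in value.split(";")]
    name, params = parts[0], {}
    for part in parts[1:]:
        key, eq, val = part.partition("=")
        if eq:  # skip empty or malformed parameters
            params[key.strip().lower()] = val.strip()
    return name, params

name, params = parse_pua_header(
    "Content-Unicode-PUA: Klingon; ref=http://www.kli.org/klingon.xml"
)
print(name, params["ref"])
```

The `ref` value would then be handed to whatever retrieval or caching step the software uses to obtain the scheme definition.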
Re: Tags and the Private Use Area
>William Overington wrote: > > > However, there is something that I feel that the Unicode > > Consortium could do, if it so wished, without violating > > that rule. I suggest that the Consortium could, > > if it so chooses, encode one or more regular unicode > > characters together with a protocol so that an author of > > a file of unicode plain text that uses any of the codes of > > the PUA could, if and only if that author > > chooses to so state, state in a file of plain unicode text > > what meaning the author of that file places upon any > > PUA characters that the author uses. This would violate the neutrality that the Unicode Consortium is bound to observe when it comes to uses of the Private Use Area. By encoding characters it would implicitly endorse the scheme (or series of schemes) designed to use these characters. Since such scheme(s) support only some particular usage (or set of usages) of the private use area, the consortium would no longer be neutral towards *any and all* uses of the Private Use Area. Going further and outlining a protocol for such a thing is even worse - if done by the Unicode Consortium. However, it would be fine for any other organization to define the protocol - but that organization could not assign any special non-private characters. I do believe we are going in circles here, and lengthy ones to boot. A./
Re: Tags and the Private Use Area
Why Unicode will never endorse certain proposals By making the Private Use Area "private", the Unicode Consortium imposed on itself a restriction to stay absolutely neutral on the use of these characters. In other words, it cannot promote or appear to be promoting the use of this area for any one *particular* purpose. Nor can the Consortium endorse, or appear to be endorsing, any particular method of identifying the repertoire or usage of these characters. Doing so would change the nature of the private use area from something that is private and outside the scope of the Consortium to something that is a formalized code extension technique. Why everyone else is free to do what they want -- By definition, this restriction does *not* apply to any other organization not involved in maintaining the standard. For example, vendors, user groups, and individuals are quite within their rights to propose particular assignments or even to define higher level protocols that regulate the use of the private use area, as long as these apply to *those users, and only those* that subscribe to that assignment or higher level protocol. Why certain things may or may not be advisable -- Someone (for example IETF or W3C) who is in the business of defining general protocols for text interchange built on top of the Unicode Standard would probably want to be very careful about issues relating to the private use area. There are three options: a) The safest thing is to prohibit the use of the private use area altogether - this maximizes the success of any interchange. b) In the future, there may be a web-scalable way to characterize the private use area assignments - in that case they could be built into the protocols. The interchange would be definite, but at a considerable cost to everyone. c) Some protocols may be designed to cover any form of plain-text without loss. 
Such protocols would need to allow unrestricted use of the private use area, but success of interchange would depend on outside negotiation. Why interchanging private use characters won't work --- Because of our growing dependency on internet and web protocols, data interchange among a community of users who rely on a common set of private use characters seems hopeless without the existence (and widespread implementation) of option b. However, if it simply involves the use of a common font, option c would work as well (with distribution of the common font being the outside negotiation). Anything more complex would run into the need to customize editors, browsers, databases etc. in ways that probably wouldn't be possible, or wouldn't be uniformly successful. Since option b increases implementation costs for everyone, it is not likely to be supported everywhere. Therefore, communities that share a well defined set of characters are better off if they can be standardized.
Re: Tags and the Private Use Area
James Kass scripsit: > This file uses characters assigned > to the Private Use Area of Unicode according to the > PUA scheme published at (URL). In order to view this > document, it will be necessary to obtain and install > the (font-name) font from (URL of font provider). > Well, this is fine if all you want to do is render the document. If you want to *process* the document, though, you need to have information on the properties of the PUA characters relevant to the document. However, I agree that no new representations are needed for this. It is sufficient just to extend the 3.x UnicodeData and *Properties files. -- John Cowan [EMAIL PROTECTED] One art/there is/no less/no more/All things/to do/with sparks/galore --Douglas Hofstadter
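John's suggestion of simply extending the UnicodeData-style files can be sketched as an overlay: properties for PUA code points come from a private supplement in the same semicolon-separated layout, and everything else falls back to standard data (here Python's `unicodedata` module). This is a minimal sketch: only the General_Category field (the third field) is read, the sample entry and its name are hypothetical, and entries outside the PUA are ignored by assumption so that a private file cannot redefine standard characters.

```python
import unicodedata
from typing import Dict

# Private Use Area ranges defined by the Unicode Standard.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

def is_pua(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

def load_pua_supplement(text: str) -> Dict[int, str]:
    """Read UnicodeData.txt-style lines; return {code point: General_Category}."""
    categories: Dict[int, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        cp = int(fields[0], 16)  # field 0: code point in hexadecimal
        if is_pua(cp):  # assumption: non-PUA entries are ignored
            categories[cp] = fields[2]  # field 2: General_Category
    return categories

def general_category(ch: str, supplement: Dict[int, str]) -> str:
    """Consult the private supplement first, then the standard data."""
    return supplement.get(ord(ch), unicodedata.category(ch))

# Hypothetical supplement entry declaring U+E000 as an other letter (Lo).
sample = "E000;EXAMPLE PRIVATE LETTER A;Lo;0;L;;;;;N;;;;;\n"
supplement = load_pua_supplement(sample)
print(general_category("\ue000", supplement))  # Lo (from the supplement)
print(general_category("\ue001", supplement))  # Co (standard private use value)
```

Other fields (bidi class, case mappings and so on) could be read the same way by index.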
Re: Tags and the Private Use Area
William Overington wrote: > However, there is something that I feel that the Unicode > Consortium could do, if it so wished, without violating > that rule. I suggest that the Unicode Consortium could, > if it so chooses, encode one or more regular unicode > characters together with a protocol so that an author of > a file of unicode plain text that uses any of the codes of > the private use area could, if and only if that author > chooses to so state, state in a file of plain unicode text > what meaning the author of that file places upon any > private use area characters that the author uses. Suppose that the BMP of Unicode could be used for this purpose. In other words, why create additional characters in order to note necessary information for distribution with a PUA file? We might agree in principle that it would be a good idea for anyone publishing material using the PUA to include a note to that effect. Such a note could appear at the beginning of the file/document and could use any character from the BMP. A modest example follows: This file uses characters assigned to the Private Use Area of Unicode according to the PUA scheme published at (URL). In order to view this document, it will be necessary to obtain and install the (font-name) font from (URL of font provider). Now, the above example uses English, but the advantage of being able to use any BMP character within the "tag" or "note" is that any other modern language could be used, like Russian, Japanese, or Esperanto. This approach may offer some advantages: 1) It would work right away. 2) It would provide the essential information. 3) It would not need to be endorsed by any organization. 4) No additional characters would be required. 5) It doesn't attempt to fix anything which isn't broken. 6) Software applications don't have to be re-written. 7) It is human-readable. 8) It is simple. Best regards, James Kass.
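The note James proposes can be produced mechanically, reusing his wording. The sketch below assumes the note is wanted only when the text actually contains PUA characters; the function name is hypothetical, and the scheme URL, font name and font URL are placeholders the author would supply.

```python
# Private Use Area ranges defined by the Unicode Standard.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

def is_pua(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

# Wording taken from the example note in the message above.
NOTE_TEMPLATE = (
    "This file uses characters assigned to the Private Use Area of Unicode "
    "according to the PUA scheme published at {scheme_url}. In order to view "
    "this document, it will be necessary to obtain and install the "
    "{font_name} font from {font_url}.\n\n"
)

def add_pua_note(text: str, scheme_url: str, font_name: str, font_url: str) -> str:
    """Prepend a human-readable note if, and only if, the text uses PUA characters."""
    if not any(is_pua(ord(ch)) for ch in text):
        return text
    note = NOTE_TEMPLATE.format(
        scheme_url=scheme_url, font_name=font_name, font_url=font_url
    )
    return note + text

print(add_pua_note("\ue000", "http://example.org/scheme",
                   "ExampleFont", "http://example.org/font"))
```

Because the note is ordinary BMP text, it could equally be written in Russian, Japanese or Esperanto by changing the template.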
Cutting to the chase (was Re: Tags and the Private Use Area)
From: "William Overington" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Saturday, April 28, 2001 7:44 AM Subject: Re: Tags and the Private Use Area > The quote is an excerpt from a sentence. Well, you did manage to go on for quite a bit. Since you were able to pick apart things I said at both the very start and the very end, I can tell that you read the whole message that I posted, so I will assume that you read the middle part, which included a polite form of the traditional "put up, or shut up" type challenge: I asked you to come up with an exact customer scenario rather than a desire to stretch the Unicode standard in a direction that is only theoretically useful but has no true and actual customer. I might have been too subtle, so I will state the "challenge" (such as it is) more clearly, so that you cannot ignore it in favor of 5000 words on superfluous procedural rules of what e-mail address should be used. :-) The CHALLENGE: Give an ACTUAL scenario that requires this thing you wish to see discussed. There is more than enough in the way of actual scenarios that anything which does not have such a scenario becomes slightly less important than EVERYTHING ELSE in which folks here would have an interest. If there is no such scenario, then why not involve your obviously fine intellect in some of those real problems? In other words, help clear the backlog of work rather than try to create work without proof that it is even needed? Once we clear all of those things up, we will be bored and can certainly move on to all of the theoretical matters that might be out there. Not too much to ask, is it? :-) MichKa Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/
Re: Tags and the Private Use Area
Michael Kaplan wrote: Let's consider the fact that what you are looking for is summarized at the end of your message: "I hope to gain fairly widespread agreement within the unicode user community." end quote The quote is an excerpt from a sentence. The whole sentence is as follows. The suggestion is open for discussion and I hope to gain fairly widespread agreement within the unicode user community. Michael continues: I submit that this very desire is a violation of the entire spirit of the PUA, which is about PRIVATE USE and thus widespread acceptance is neither needed nor desired. You can attribute the frustration you are feeling as due to this single reason more than any other. end quote In everyday life there are laws, specifications and agreements. These all contain rules. There exist the concepts of "working within the letter of the rules", "working within the spirit of the rules", "working within the letter and the spirit of the rules" and "working within the letter but not the spirit of the rules". I feel that the concept of the spirit of the rules is very important: however, I feel that the spirit of the rules cannot possibly be such that it *contradicts* the letter of the rules. The rules for the private use area specifically include as an example of possible use " or they could be published as vendor-specific character assignments available to applications and end users." The letter of the specification is that there could be publication. The letter of the specification is that this publication could be by way of trade. Publication does not require any agreement, it can be unilateral action. I feel that my hope to gain fairly widespread agreement within the unicode user community is well within both the letter and the spirit of the specification. As to my feeling, well you surprised me there! I am feeling no frustration whatsoever. 
I have had a good week of research on a fascinating topic; I have had the benefit of reading the views of top experts on the unicode system as they debated the issues that arose. It is as if I have been given the privilege to spend a week as a guest in the common room of a top university debating with top scholars on an aspect of world class leading edge research. I have learned much. I had seen tags previously in passing but, as a result of the matter being raised, I have learned more of them. I feel that I now have a broad understanding of what action is needed to solve the problem. I have a (basic) understanding of the technical issues and also have become aware of the policy issues and the fascinating way that the potential for chaos has been recognized but will possibly not be acted upon officially until chaos occurs. I am reminded of the Millennium bug and the fact that that was envisaged as a potential cause of chaos during the 1990s and the way that that was acted on before 1 January 2000 rather than after 1 January 2000. I remember the way that, having heard talk of the Millennium bug during the 1990s, I was startled when Channel 4 News on the television here in the United Kingdom announced early in 1998 that the Millennium bug had struck! A man had tried to buy something in a shop with his newly issued replacement credit card. The expiry date was 01/00 and the shop could not get the online automated electronic system to issue an authorization code for the purchase transaction and had to use a manual credit card machine with the multilayer pieces of paper. I am absolutely fascinated by the way that the Unicode Consortium, having recognized that the specification opens a window to potential chaos, appears to prefer to wait until the chaos actually happens and is reported back before even starting a process of considering what to do about it. As you raise the issue of feelings I mention that a search on the web for Myers Briggs type indicator is interesting. 
It is sometimes called Myers-Briggs type indicator, using a hyphen. The web site www.new-oceans.co.uk is a good site for the Myers Briggs type indicator. The Myers Briggs type indicator is based on the teachings of Carl Gustav Jung. It is fascinating and will hopefully give an insight into how different people, based on their personalities, can view matters in entirely different ways. Another interesting aspect of psychology related to feelings is the Yerkes-Dodson law. I wish it were more widely known about. So, the matter of feelings having been raised I take the opportunity to mention it here so that anyone interested might like to search the web and hopefully enjoy what they find. A serendipitous link to follow up perhaps? William Overington 28 April 2001
Provenance of the Unicode Standard and of statements (derives from Re: Tags and the Private Use Area)
Kenneth Whistler wrote: And there have been a couple of no-doubt frustrating responses already. end quote No, not frustrating at all. I have found it fascinating. I am seeking to participate in world class leading edge research work and the number of contributions to this thread, the variety of opinion, the matters raised and the potential to learn from the pointers given has been pleasing, fascinating and very helpful. Ken continues: I would like to uplevel briefly here and suggest why the people on this list are not engaging in the details of Mr. Overington's proposals so much as questioning the need for such a protocol, arguing the premises, talking about the role of metadata, and so on. end quote Well, most of the 650 recipients of this list do not participate in most discussions. I feel that some people will only respond to a posting in a list if they feel that they disagree or wish to make some particular additional point. If they agree, then they might just say "fine" to themselves and spend their time on something else rather than feel a need to send a posting that just says, "I agree". I am not suggesting that all or indeed most or even any of the recipients of this list agree with the suggestion that I made in my document. Many may not even have looked at it. When putting forward new ideas an inventor should perhaps not expect an immediate response. 
I feel that I will have done well if, of the 650 recipients on this list, some have filed the suggestion that I made in the document of 26 April 2001 under private use area and made a mental note that my suggestion exists, just in case one day a file coded using it turns up, and maybe made a note that there is a suggestion about the use of U+12 and U+100020 to U+10007F that has been sent round and that, if they themselves are ever going to make use of the private use area for defining characters then, at that time, they will take into consideration the knowledge that that suggestion has been made and might be in use somewhere, and will make their own decision as to whether to in effect tacitly agree to it to the limited extent of avoiding *clashing codes* with it, even though no one else outside any organization for which they work is even aware that the decision to avoid clashing codes with my suggestion has been made, so that the organization cannot in any way whatsoever be seen to be endorsing my suggestion. I am content. I have sent out my idea as it stands and many of the key companies using unicode may possibly have made a note that the document exists. I have placed in this posting the URL of our family webspace, so if they want to check whether the idea is still about then they will be able to seek to check at the website if they wish. Ken continues later: One thing the Unicode discussion list doesn't do is develop protocols. That is the kind of work that instead often takes place on temporary Working Group discussion lists in the IETF. end quote What please is the IETF? Ken continues: While Mr. Overington's initial proposals were couched in terms of character encoding, it soon became clear to the list and to him that we weren't talking about standardizing any characters, but instead a proposal for particular private uses of PUA characters -- something the UTC and WG2 cannot and will not endorse, precisely because they *are* private use characters. 
end quote I learned about the idea of using characters within protocols inside a plain unicode text file when the discussion turned towards the matter of tags. I am a relative newcomer to unicode and am on the learning curve. The Unicode Consortium cannot and will not endorse a proposal for particular private uses of PUA characters. That has not been an issue within this thread; I knew that situation before the thread started. However, there is something that I feel the Unicode Consortium could do, if it so wished, without violating that rule. I suggest that the Unicode Consortium could, if it so chooses, encode one or more regular unicode characters together with a protocol, so that the author of a plain unicode text file that uses any codes of the private use area could, if and only if that author chooses to do so, state within that file what meaning the author places upon any private use area characters used. If the Unicode Consortium were to consider making such definitions, then, purely to clarify what I mean and to provide some examples for this discussion, I might suggest that there are at present three broad possibilities. 1. Define U+E0002 and use the existing tag characters. 2. Promote my suggestion to codes U+E0102 and U+E0120 to U+E017F. 3. Something else. Now I fully accept that the Unicode Consortium may not wish to do anything whatsoever about this matter, either now or ever, and I am not saying or even suggesting that it should. That is a matter for
Re: Tags and the Private Use Area
William Overington wrote: > I have updated my suggestion. Here is the latest version for discussion. ... > Specific protocols to use with such tagging can be devised. ... > The suggestion is open for discussion and I hope to gain fairly widespread > agreement within the unicode user community. And there have been a couple of no-doubt frustrating responses already. I would like to uplevel briefly here and suggest why the people on this list are not engaging in the details of Mr. Overington's proposals so much as questioning the need for such a protocol, arguing the premises, talking about the role of metadata, and so on. The Unicode discussion list is focussed first-and-foremost on the Unicode Standard itself. The discussants here often discuss additional characters or scripts to be added to the standard, particular implementation issues for some character or script, the details of algorithms needed for implementing the Unicode Standard, and then love to go OT to discuss interesting issues relating to languages, scripts, etymologies and such. One thing the Unicode discussion list doesn't do is develop protocols. That is the kind of work that instead often takes place on temporary Working Group discussion lists in the IETF. While Mr. Overington's initial proposals were couched in terms of character encoding, it soon became clear to the list and to him that we weren't talking about standardizing any characters, but instead a proposal for particular private uses of PUA characters -- something the UTC and WG2 cannot and will not endorse, precisely because they *are* private use characters. And as has become clear in Mr. Overington's latest statement of what he is proposing, this is really a proposal for a protocol: a specification of a method for communicating particular interpretations of rationally segmented portions of the PUA. 
As such, this ([EMAIL PROTECTED]) is probably the wrong forum to be trying to discuss, modify, and gain working consensus on such a protocol proposal. It just isn't that kind of forum. [EMAIL PROTECTED] doesn't "work on" specific documents as a group, with the aim of publishing them as standard protocols for general usage. There is no program of work and no moderator whose job it is to attempt to solicit and capture consensus and move a document towards final form. The mechanism that is more appropriate to that would be to take the proposal, rework it as an Internet Draft, solicit commentary on that document, and then try to develop consensus *within the IETF* to progress such a document to a standard protocol. Of course in such a forum any proposal like this would also face questions regarding justification and alternatives. And those might be equally frustrating there. But anyone who comes to the [EMAIL PROTECTED] list looking to actually develop and establish a standard protocol involving Unicode is looking in the wrong place. --Ken
Re: Tags and the Private Use Area
On 04/27/2001 03:23:36 AM unicode-bounce wrote: >From: "William Overington" <[EMAIL PROTECTED]> > >> I have updated my suggestion. Here is the latest version for discussion. > >Lets consider the fact that what you are looking for is summarized at the >end of your message: "I hope to gain fairly widespread agreement within the >unicode user community." I submit that this very desire is a violation of >the entire spirit of the PUA, which is about PRIVATE USE and thus widespread >acceptance is neither needed nor desired. Nor possible. Not practically possible to get all potential 6G+ members of the community to agree (or even a minute fraction), and even if that were possible, not possible to know that they all agree (I'd guess it's only possible among perhaps around 0.1% of the potential community at best). - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: <[EMAIL PROTECTED]>
Re: Tags and the Private Use Area
From: "William Overington" <[EMAIL PROTECTED]> > I have updated my suggestion. Here is the latest version for discussion. Let's consider the fact that what you are looking for is summarized at the end of your message: "I hope to gain fairly widespread agreement within the unicode user community." I submit that this very desire is a violation of the entire spirit of the PUA, which is about PRIVATE USE, and thus widespread acceptance is neither needed nor desired. You can attribute the frustration you are feeling to this single reason more than any other. Unicode, like any organization, can grow in response to real need. In fact, it has done so in even core architectural ways in the past. But I would tend to see the growth being suggested here as not in the best interests of Unicode or the "Unicode user community." So actually, I have a better suggestion at this point, one that would meet the burden of the test that Rick McGowan has set and also satisfy the crotchety folks like me who insist that this is really not a route that any of the fine minds on the Unicode List should even be considering. (What the unfine minds do is of course their own business, but I do not classify any of the people in this particular conversation as being in that category!) Let us wait and find an ACTUAL example of a TRUE situation where a PUA encoding is needed and the existing mechanism is not enough. Let the brave soul who has been forced by the circumstances of fate to deal with this complex issue come forward and explain their circumstance and why the existing PUA mechanism, which requires a mutual understanding and a private agreement, is so inadequate. No one in this group is UNREASONABLE. But I do think a lot more mindshare is going to a problem that is THEORETICAL rather than real. 
And all of us can probably find useful ways to use Unicode as it stands, and then, as real needs come up, we can find ways to extend Unicode to meet them. There are more than enough problems that actually exist to solve that it is almost insulting that we are off inventing problems that we think might be important but for which no clear-cut case of need has been made. This will be my final plea here: even though I do not classify myself as one of those "fine minds" that I referred to earlier, I do have many real problems with actual scripts that I have true customers for, and I think they deserve my attention much more than the problems that we are concerned may exist. MichKa Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/
Re: Tags and the Private Use Area
I have updated my suggestion. Here is the latest version for discussion. Let there exist the idea that there is U+12 (PUA INTERPRETATION TAG) and a set of private use area tag characters (U+100020 to U+10007F) all of which code points are in the upper private use area. May I suggest that mention is made that, where displayed for analysis purposes, these private use area tags should be displayed as yellow on a red background. Ordinary unicode tags displayed for analysis are not specified to be displayed in any specific colour but some people might like to display them as white on blue so as not to conflict visually with these suggested private use area tags. Naturally this definition within the private use area is not an absolute definition and the Unicode Consortium is not being asked to endorse it nor would they, by their own statement. All that could be reasonably sought is that the practice and such protocols that are expressed using such private use area tags are so well thought out and designed by interested users that most users will wish to use them for most applications where private use area characters are used. It cannot be expected that most users will agree to such a system, yet one can always hope. Specific protocols to use with such tagging can be devised. I put forward the idea that, in a file of plain unicode text that contains characters from the private use area, information about the character set or sets to which private use area codes refer may, if so desired, be included within the file (before the use of any character to which the information relates) by including the U+12 character followed by a number of private use area tag characters from the set of private use area tag characters (U+100020 to U+10007F) which express one or more groups of characters in the following formats. A Uniform Resource Locator of a font file. 
For example, http://www.somewebsite.net/oldchem.ttf A Uniform Resource Locator of a description file of the characters within square brackets. For example, [http://www.somewebsite.net/oldchem.htm] A comment about the characters in natural language within wavy brackets. For example, {Symbols used in early chemistry} A list specifying the parts of the private use areas to which this description refers within round brackets. For example, (E000..E2FF,E700..E7FF) The name of the font to be used is always expressed as a full Uniform Resource Locator using the private use area tag codes, though a software package using the data may, if it wishes to take the risk, simply use the file name at the very end of the said Uniform Resource Locator and search for that file name in its own local font directory without accessing the internet. The suggestion is open for discussion and I hope to gain fairly widespread agreement within the unicode user community. William Overington 26 April 2001
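The format described above can be sketched in Python. This is a minimal illustration, not part of the proposal, and it rests on assumptions the post does not spell out: that each PUA tag character from U+100020 to U+10007F stands for the ASCII character 0x20 to 0x7F with the same low bits (offset 0x100000), mirroring the standard tag characters; that the introducer is the code point given in the post as U+12; and that groups may be written back to back or separated by spaces. The function names are hypothetical. Only the bracket conventions (bare font URL, square brackets for a description file, wavy brackets for a comment, round brackets for ranges) and the file-name fallback come from the text.

```python
import posixpath
import re
from urllib.parse import urlparse

# Introducer code point exactly as written in the post ("U+12"); treat it as
# a placeholder constant, not a confirmed assignment.
PUA_INTERPRETATION_TAG = 0x12
# Assumed mapping: PUA tag character = TAG_OFFSET + ASCII code, so that
# U+100020..U+10007F mirror 0x20..0x7F (as U+E0020..U+E007F do for real tags).
TAG_OFFSET = 0x100000

# One alternative per group type; a bare run of non-bracket, non-space
# characters is taken to be the font-file URL.
GROUP_RE = re.compile(
    r"\[([^\]]*)\]"        # [description-file URL]
    r"|\{([^}]*)\}"        # {natural-language comment}
    r"|\(([^)]*)\)"        # (code point ranges)
    r"|([^\[\]{}()\s]+)"   # bare font-file URL
)


def encode_pua_tags(payload: str) -> str:
    """Prefix the introducer and map each ASCII character to a PUA tag."""
    out = [chr(PUA_INTERPRETATION_TAG)]
    for ch in payload:
        if not 0x20 <= ord(ch) <= 0x7F:
            raise ValueError(f"cannot be expressed as a PUA tag: {ch!r}")
        out.append(chr(TAG_OFFSET + ord(ch)))
    return "".join(out)


def decode_pua_tags(text: str) -> str:
    """Return the ASCII payload after the first introducer ('' if absent).

    Decoding stops at the first character outside U+100020..U+10007F.
    """
    start = text.find(chr(PUA_INTERPRETATION_TAG))
    if start < 0:
        return ""
    chars = []
    for ch in text[start + 1:]:
        cp = ord(ch) - TAG_OFFSET
        if not 0x20 <= cp <= 0x7F:
            break
        chars.append(chr(cp))
    return "".join(chars)


def parse_ranges(spec: str) -> list[tuple[int, int]]:
    """Parse 'E000..E2FF,E700..E7FF' into inclusive (first, last) pairs."""
    ranges = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("..")
        ranges.append((int(first, 16), int(last or first, 16)))
    return ranges


def parse_pua_description(payload: str) -> dict[str, list]:
    """Sort a decoded payload into the four group types of the proposal."""
    info = {"fonts": [], "descriptions": [], "comments": [], "ranges": []}
    for desc, comment, ranges, font in GROUP_RE.findall(payload):
        if desc:
            info["descriptions"].append(desc)
        elif comment:
            info["comments"].append(comment)
        elif ranges:
            info["ranges"].extend(parse_ranges(ranges))
        elif font:
            info["fonts"].append(font)
    return info


def font_file_name(url: str) -> str:
    """Local fallback: the file name at the very end of the font URL."""
    return posixpath.basename(urlparse(url).path)


if __name__ == "__main__":
    payload = ("http://www.somewebsite.net/oldchem.ttf"
               "[http://www.somewebsite.net/oldchem.htm]"
               "{Symbols used in early chemistry}"
               "(E000..E2FF,E700..E7FF)")
    info = parse_pua_description(decode_pua_tags(encode_pua_tags(payload)))
    print(info["comments"], font_file_name(info["fonts"][0]))
```

Decoding to ASCII first and parsing second keeps the bracket grammar separate from the choice of tag range in this sketch: carrying the same payload in the standard tag characters U+E0020 to U+E007F would only change TAG_OFFSET to 0xE0000.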
Re: Tags and the Private Use Area
Eric Muller quoted from a Seybold Report, but... I think it's out of date. Actually, I'm not talking about the "Gaiji Problem". It's a well-known special case of needing things that aren't in the standard one is using; but it's a private need. As long as the system you're using lets you make a character & font for your own purposes, you can use it. Most of those proprietary systems do so. But most such existing systems start with something that is far less complete than Unihan, and hence have greater need for utilizing home-grown gaiji. I would assert that given the 20,000 Unihan characters originally encoded, topped off with what has been recently encoded, there should now be almost zero need in any of the Han-using countries for any such Gaiji as "many 'unofficial' Kanji characters, mistakes and misinterpretations, and seldom-used Kanji passed down for generations". Most of those things are already covered by Unicode, and in fact FILLING THAT GAP has been the primary purpose of the most recent tens of thousands of additional Han characters. Company logos are a different matter, of course, but rarely if ever need to be publicly transmitted as characters. So, while Japanese customers might have at one time needed to use something like the PUA, nowadays they shouldn't need to. In any case, the Gaiji of any one installation are installation-specific. In the past, they have never been transmissible, so that doesn't demonstrate any need for widespread transmission of PUA characters. Rick
Re: Tags and the Private Use Area
Rick McGowan scripsit: > I'm looking for a problem to which all of these engineering solutions are > being proposed and discussed. I don't yet see anything that needs to be > solved. I see a theoretically chaotic situation, not an actually chaotic > situation. Well, I wanted to start the CSUR well in advance of actual usage, and encouraged everyone and his brother to register their scripts, *so that* code clash (at least within the conlang community) would never come into existence at all. -- John Cowan [EMAIL PROTECTED] One art/there is/no less/no more/All things/to do/with sparks/galore --Douglas Hofstadter
Re: Tags and the Private Use Area
Rick McGowan wrote: I'm looking for a problem to which all of these engineering solutions are being proposed and discussed. I don't yet see anything that needs to be solved. I see a theoretically chaotic situation, not an actually chaotic situation. Here is a quote from the November 27, 2000 Seybold Report on Publishing Systems, from the article 'The Second Wave of Japanese Desktop Publishing': Gaiji are Kanji characters outside the current JIS and Unicode encoding sets and are not included in a standard font. They comprise many "unofficial" Kanji characters, mistakes and misinterpretations, and seldom-used Kanji passed down for generations, long before printing presses and governments created standards. These Gaiji characters are widely used in people- and place-names. To this day, they are a reason for publishers to hang on to their proprietary systems. I personally don't know enough about Japanese to say if this is indeed a character collection problem, or only a glyph variant problem; I suspect it is a combination of both. Like many on this list, I am entirely of the opinion that every character is either currently in Unicode or on the way for a feature version. However good the process may be to add characters, it still remains that there is a lag and something has to be done in the meantime; until three weeks ago, there were about 40,000 characters I may have a need for, and my only Unicode-compatible solution was to use the PUA. Until Unicode 3.2 is out, there are characters in JIS X 0213:2000 which are not in Unicode. And all the descriptions of Gaiji I have heard suggest that there are characters that will not make it for a long time. In other words, for a Japanese publisher, it seems that the PUA is something you have to use every day, in almost every document. Would that qualify for your search? Eric.
Re: Tags and the Private Use Area
David Starner <[EMAIL PROTECTED]> wrote: > Most of the PUA usages seem to be stuff the UTC refuses to encode > (Apple's logo, Klingon First a correction: UTC has not yet, unfortunately, actually *refused* to encode Klingon. It still sits on the books. (I think they should formally refuse to encode Klingon, but that's my personal opinion.) But let me re-phrase my original argument. My question is not: "Has anyone made a survey of things that people might use?" but rather "Are people widely exchanging, for example, characters that are listed in the ConScript registry?" I don't think they are. At least, they're not doing it in a widespread enough manner to be a noticeable problem that needs to be fixed. I'm looking for a problem to which all of these engineering solutions are being proposed and discussed. I don't yet see anything that needs to be solved. I see a theoretically chaotic situation, not an actually chaotic situation. But, maybe I'm just not in the groove, and I don't know what communities are using these PUA characters. So I'm asking about instances and examples of actual usage. Rick
Re: Tags and the Private Use Area
On Wed, Apr 25, 2001 at 09:16:43AM -0700, Rick McGowan wrote: > For the most-part, it's been my impression that actual PUA usages are very > localized and platform-specific, and the characters tend not to leak all > over the place. If end-users have a demonstrable need to widely > communicate some set of characters, I would think they might first consider > them as candidates for standardization; not as evidence that arcane > regulatory mechanisms need to be engineered for the PUA. Most of the PUA usages seem to be stuff the UTC refuses to encode (Apple's logo, Klingon (I've seen it used on the web with the ConScript encoding, though I didn't have a font that would display it), etc.), not stuff that would be encoded if submitted, with the exception of the MathML stuff, which was proposed and encoded. As for the suggestion to use another encoding marker (x-mike-pua1), it has the problem that the non-PUA parts of the page can't be displayed, and it doesn't work for untagged, Unicode-only systems (the dict protocol, for example). -- David Starner - [EMAIL PROTECTED] Pointless website: http://dvdeug.dhis.org "I don't care if Bill personally has my name and reads my email and laughs at me. In fact, I'd be rather honored." - Joseph_Greg
Re: Tags and the Private Use Area
There has been a lot of recent discussion about various uses of the PUA. Can someone point to widespread instances of confusion and chaos right now over PUA usage? I don't think there is any. It seems to me there's a lot of effort being expended to engineer the regulation of something that hasn't been shown to be a problem in the world, or in any need of regulation. For the most part, it's been my impression that actual PUA usages are very localized and platform-specific, and the characters tend not to leak all over the place. If end-users have a demonstrable need to widely communicate some set of characters, I would think they might first consider them as candidates for standardization; not as evidence that arcane regulatory mechanisms need to be engineered for the PUA. Most things that most people need are already encoded. I don't see people coming to this list with existing collections of entities that are not encoded, and yet need to be widely transmitted and stored. (Of course, I am not referring to unencoded minority scripts, many of which UTC already knows about.) If such collections were widespread, I'm sure UTC would like to hear about them. Rick