RE: [unicode] Re: FW: Inappropriate Proposals FAQ
On 07/05/2002 03:00:35 PM Marco Cimarosti wrote: David Possin wrote: But, if something is silently ignored, then somebody has discovered something that nobody wants to touch. I have observed this several times now; the latest incident was in the Chromatic Font Research thread, with 2 cases: Aztec glyphs: [...] Silence. Funny. I interpreted that silence the opposite way: very positively. I didn't expect any immediate action, and the absence of denials made me feel the information I passed on was not totally pointless. Anyway, even if the silence actually meant "Who cares?", it doesn't bother me, because I think this is NOT an issue for Unicode... I think Marco has got this right. Let's suppose Aztec writing gets deciphered and there are cases of the same shape with different colouring meaning different things. Let's further suppose that we determine that the difference in semantics isn't akin to the ways in which colouring of English text might conceivably be used (e.g. for emphasis) but is really fundamental. Let's also further suppose that, taking all things into consideration, we really do consider this text and come to the conclusion that the best solution is one that's purely text-based (i.e. no markup or other higher-level protocol). We're nowhere near having made all these conclusions, but let's just suppose. So, we identify two things that are minimally contrastive: a red-and-white-whatsit, and a blue-green-and-yellow-whatsit. They are two entities and each gets a codepoint. That's an encoding issue. How they get rendered isn't an encoding issue. Of course, at that point, we'd want to consider how to deal with chromatic issues in text rendering where chromaticity is inherent to the character and not a matter of user discretion (for which formatting is appropriate). But we are not yet at the point of knowing that is even necessary.
And since it would clearly not be a trivial problem to solve (it's not finding a way to do it that's hard -- it's the huge amount of secondary implications), I think the silence amounts to a reaction that we are neither ready to cross that bridge nor need to at this time -- in fact, it's not yet certain that we will ever need to -- and that in the meantime there are more immediate and real concerns to be dealt with. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
RE: [unicode] Re: FW: Inappropriate Proposals FAQ
Marco, I see your point; you are probably right.

Peter, I agree with you that color or other attributes are not Unicode issues when each entity has a different meaning. Each just gets its own codepoint. I was just trying to draw out something useful from a rather useless long thread that wasted a lot of time.

What I am trying to understand is where exactly the color (or smell, or sound) information gets added to the code. Does the font developer add this information to the glyphs, and does the rendering engine then process it correctly? The two glyphs look the same otherwise, so how would the rendering engine know what to do without the attribute information? Does a mechanism for attributes already exist when a glyph gets sent from the font to the rendering engine?

I know that this Aztec writing system will probably never be encoded, but I like to think in advance about possible solutions for when a related issue might pop up some day, even if I then push them into the back of my brain. Looking at the amount of time wasted already, a few thoughts about the only usable issue in the thread shouldn't be a waste of time.

So, if it is a font issue, how would these attributes be stored with the font? Or does the rendering engine look up descriptive attributes assigned to each Unicode character? That would mean the attribute is defined in the Unicode Standard as part of the character description, and thus a Unicode issue after all? Hmm - full circle - chicken or egg?

Dave

Dave Possin, Globalization Consultant, www.Welocalize.com, http://groups.yahoo.com/group/locales/
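Peter's answer implies that the text itself carries only the two distinct code points, and that any colour comes from the font and the renderer. One way to picture Dave's question about where the colour is stored is a font-side table keyed by code point, loosely in the spirit of OpenType's COLR/CPAL colour tables (which postdate this thread). The sketch below is a hypothetical in-memory model, not the real table format; the Private Use code points, glyph IDs and colours are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical code points: no "whatsit" is encoded in Unicode, so two
# Private Use Area values stand in for the two minimally contrastive characters.
RED_WHITE_WHATSIT = 0xE000
BLUE_GREEN_YELLOW_WHATSIT = 0xE001


@dataclass(frozen=True)
class Layer:
    glyph_id: int  # outline to paint (hypothetical glyph IDs)
    rgba: tuple    # colour used for this layer


# Monochrome mapping: both characters share the same outline (glyph 10).
CMAP = {RED_WHITE_WHATSIT: 10, BLUE_GREEN_YELLOW_WHATSIT: 10}

# Colour data lives in the font, keyed by code point, not in the text stream.
COLOR_LAYERS = {
    RED_WHITE_WHATSIT: [
        Layer(10, (255, 0, 0, 255)),
        Layer(11, (255, 255, 255, 255)),
    ],
    BLUE_GREEN_YELLOW_WHATSIT: [
        Layer(10, (0, 0, 255, 255)),
        Layer(11, (0, 128, 0, 255)),
        Layer(12, (255, 255, 0, 255)),
    ],
}


def layers_for(code_point: int, color_capable: bool = True) -> list:
    """Return the layers a renderer would paint for one code point."""
    if color_capable and code_point in COLOR_LAYERS:
        return COLOR_LAYERS[code_point]
    # Fallback: a monochrome renderer paints the shared outline in the text colour.
    return [Layer(CMAP[code_point], (0, 0, 0, 255))]


text = chr(RED_WHITE_WHATSIT) + chr(BLUE_GREEN_YELLOW_WHATSIT)
for ch in text:
    print(f"U+{ord(ch):04X}", layers_for(ord(ch)))
```

The distinction survives in the data either way: with a monochrome renderer both characters look alike, but the stored code points still differ, which is Peter's point that encoding and rendering are separate questions.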
RE: [unicode] Re: FW: Inappropriate Proposals FAQ
On 07/17/2002 02:06:11 PM David Possin wrote: I was just trying to draw out something useful from a rather useless long thread that wasted a lot of time. I certainly won't object to any attempt to redeem value from what may have seemed worthless. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
RE: Inappropriate Proposals FAQ
At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote: Unicode is a character set. Period. Well, maybe. But in a much broader sense than the character sets it subsumes in its listings. Each character has numerous properties in Unicode, whereas they generally don't in legacy character sets. Maybe Unicode is more of a shared set of rules that apply to low-level data structures surrounding text and its algorithms than a character set. The Unicode consortium very wisely keeps its focus narrow. It provides a mechanism for specifying characters. Not for manipulating them, not for describing them, not for making them twinkle. All true, except for some special cases (BOM, bidi issues and algorithms, vertical variants, etc.). I'm not saying those shouldn't be in there, just that they are useful only in conjunction with algorithms that are explicit (bidi) or assumed (upper case/lower case, vertical/horizontal), etc. In many cases, these algorithms are not well known, even amongst the cognoscenti, or generally available in nice libraries. Anyone for an open source Japanese word splitting library? (I know not taking a look at ICU before I press send is going to come back to haunt me on this, but if it is in there, then substitute something that isn't :) Barry Caplan www.i18n.com
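Barry's point that Unicode characters carry properties which feed algorithms such as bidi and case mapping can be seen with Python's standard `unicodedata` module. This is only an illustration: the module reflects whatever UCD version ships with the Python build (far newer than Unicode 3.0), and the choice of properties collected in the hypothetical `describe` helper is arbitrary.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few UCD properties that Unicode algorithms consume."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),        # General Category
        "bidi": unicodedata.bidirectional(ch),       # input to the Bidi Algorithm
        "mirrored": bool(unicodedata.mirrored(ch)),  # glyph mirrors in right-to-left runs
        "upper": ch.upper(),                         # full case mapping (may change length)
    }


for ch in ["A", "\u05D0", "(", "\u00DF"]:
    print(f"U+{ord(ch):04X}", describe(ch))
```

The last sample shows why case mapping is an algorithm rather than a one-to-one table lookup: U+00DF uppercases to the two letters "SS".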
RE: Inappropriate Proposals FAQ
-Original Message- From: Barry Caplan [mailto:[EMAIL PROTECTED]] At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote: Unicode is a character set. Period. Each character has numerous properties in Unicode, whereas they generally don't in legacy character sets. Each character, or some characters? Maybe Unicode is more of a shared set of rules that apply to low-level data structures surrounding text and its algorithms than a character set. Sounds like the start of a philosophical debate. If Unicode is described as a set of rules, we'll be in a world of hurt. The Unicode consortium very wisely keeps its focus narrow. It provides a mechanism for specifying characters. Not for manipulating them, not for describing them, not for making them twinkle. All true, except for some special cases (BOM, bidi issues and algorithms, vertical variants, etc.). I'm not saying those shouldn't be in there, just that they are useful only in conjunction with algorithms that are explicit (bidi) or assumed (upper case/lower case, vertical/horizontal), etc. <humour> Why mess up a nice clean statement simply because of a few hard facts? </humour> I choose to look at this stuff as the exceptions that make the rule. (On a serious note, these exceptions are exactly what make writing some sort of "is and isn't" FAQ pretty darned hard. I can't very well say that Unicode manipulates characters "given certain historical/legacy conditions and under duress." If I did, people would be scurrying around trying to figure out how to foment the duress.)
Status update re. Inappropriate Proposals FAQ
I'm nearly done playing catchup after vacation and hope to begin extracting concepts for the FAQ next week. Thanks to all who've submitted input, as conflicting and varied as it is/was. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Wednesday, July 03, 2002 8:53 I would like to once again suggest that we refocus this 'FAQ' AWAY from a repetition of the Principles and Procedures document maintained by WG2 and containing the explanation of what constitutes a valid *formal* proposal.
Hmm, this evolved into an editorial when I wasn't looking :) was: RE: Inappropriate Proposals FAQ
At 05:13 PM 7/12/2002 -0400, Suzanne M. Topping wrote: -Original Message- From: Barry Caplan [mailto:[EMAIL PROTECTED]] At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote: Unicode is a character set. Period. Each character has numerous properties in Unicode, whereas they generally don't in legacy character sets. Each character, or some characters? For all intents and purposes, each character. Section 4.5 of my Unicode 3.0 book says "The Unicode Character Database on the CD-ROM defines a General Category for all Unicode characters." So, each character has at least one attribute. One could easily say that each character also has an attribute for isUpperCase of either true or false, and so on. There are usually no corresponding features in other character sets. Maybe Unicode is more of a shared set of rules that apply to low-level data structures surrounding text and its algorithms than a character set. Sounds like the start of a philosophical debate. Not really. I have been giving presentations for years, and I have seen many others give similar presentations. A common definition of character set is a list of characters you are interested in, assigned to codepoints. That fits most legacy character sets pretty well, but Unicode is sooo much more than that. If Unicode is described as a set of rules, we'll be in a world of hurt. Yeah, one of the heaviest books I own is Unicode 3.0. I keep it on a low shelf so the book of rules describing Unicode doesn't fall on me, for just that reason. This is earthquake country, after all :) I choose to look at this stuff as the exceptions that make the rule. I don't really know if it is possible to break down Unicode into more fundamental units if you started over. Its complexity is inherent in the nature of the task. My own interest is more in getting things done with data and algorithms that use the type of material represented by the Unicode Standard than in the arcana of the standard itself.
So it doesn't bother me so much that there are exceptions - as long as we have the exceptions that everyone agrees on, that is fine by me, because it means my data and at least some of my algorithms are likely to be preservable across systems. (On a serious note, these exceptions are exactly what make writing some sort of "is and isn't" FAQ pretty darned hard. <humor> Be careful what you ask for :) </humor> I can't very well say that Unicode manipulates characters "given certain historical/legacy conditions and under duress." Why not? It is true. But what if we took a look at it from a different point of view, that the standard is an agreed-upon set of rules and building blocks for text-oriented algorithms? Would people start to publish algorithms that extend the base data provided, so we don't have to reinvent wheels all the time? I'm just brainstorming here; this is all just coming to me now. If I were to stand in front of a college comp sci class, where the future is all ahead of the students, what proportion of time would I want to invest in how much they knew about legacy encodings versus how much I could inspire them to build from and extend what Unicode provides them? Seriously, most of the folks on this list that I know personally, and I include myself in this category, are approaching or past the halfway point in our careers. What would we want the folks who are just starting their careers now to know about Unicode and do with it by the time they reach the end of theirs, long after we have stopped working? For many applications, people are not going to specialize in i18n/l10n issues. They need to know what the appropriate text-based building blocks are, and how they can expand on them while still building whatever they are working on. Unicode at least hints at this with the bidi algorithm. Moving forward, should other algorithms be codified into Unicode, or as separate standards or de facto standards? I am thinking of a Japanese word splitting algorithm.
There are proprietary products that do this today with reasonable but not perfect results. Are they good enough that the rules can be encoded into a standard? If so, then someone would build an open implementation, and then there would always be this building block available for people to use. I am sure everyone on this list can think of their own favorite algorithms of this type, based on the part of Unicode that interests them the most. My point is that the raw information already in Unicode *does* suggest the next level of usage, and the repeated newbie questions that inspired this thread suggest the need for a comprehensive solution at a higher level than a character set provides. Maybe part of this means including, or at least facilitating, the description of low-level text handling algorithms. If I did, people would be scurrying around trying to figure out how to foment the duress.) The accomplishments of the Unicode
What Unicode Is (was RE: Inappropriate Proposals FAQ)
Suzanne responded: Maybe Unicode is more of a shared set of rules that apply to low-level data structures surrounding text and its algorithms than a character set. Sounds like the start of a philosophical debate. If Unicode is described as a set of rules, we'll be in a world of hurt. (On a serious note, these exceptions are exactly what make writing some sort of "is and isn't" FAQ pretty darned hard.

Hmm. Since the discussion, which started out trying to specify a few examples of what kinds of entities would be inappropriate to proffer for encoding as Unicode characters, seems to be in danger of mutating into the recurrent "What is Unicode?" question, perhaps it's time to start a new thread for the latter.

And now for some ontological ground rules. When trying to decide what a thing is, it helps not to use an attribute nominatively, since that encourages people to privately visualize the noun the attribute is applied to, but to do so in different ways -- and then to argue past each other because they are, in the end, talking about different things. "Unicode" is used attributively of a number of things, and if we are going to start arguing/discussing what it is, it would be better to lay out the alternative "it"s a little more specifically first.

1. The Unicode *Consortium* is a standardization organization. It started out with a charter to produce a single standard, but along the way has expanded that charter, in response to the desire of its membership. In addition to The Unicode Standard, it has now adopted a terminology that refers to some of its other publications as Unicode Technical Standards [UTS], of which two formally exist now: UTS #6 SCSU, and UTS #10 Unicode Collation Algorithm [UCA]. It is important to keep this straight, because some people, when they say "Unicode", are talking about the *organization*, rather than the Unicode Standard per se. And when people talk about the standard, they are generally referring to The Unicode Standard, but the Unicode Consortium is actually responsible for several standards.

2. The Unicode *Standard* itself is a very complex standard, consisting of many pieces now. To keep track of just what something like "The Unicode Standard, Version 3.2" means, we now have to keep web pages enumerating all the parts exactly -- like components in an assemble-your-own-furniture kit. See: http://www.unicode.org/unicode/standard/versions/ In any one particular version, the Unicode Standard now consists of a book publication, some number of web publications (referred to as Unicode Standard Annexes [UAX]), and a large number of contributory data files -- some normative and some informative, some data and some documentation. These definitions, including the exact list of contributory data files and their versions, are themselves under tight control by the Unicode Technical Committee, as they constitute the very *definition* of the Unicode Standard. It is not by accident that the version definitions now start off with the following wording: "The Unicode Standard, Version 3.2.0 is defined by the following list..." and so on for earlier versions.

3. The Unicode *Book* is a periodic publication, constituting the central document for any given version of the Unicode *Standard*, but is by no means the entire standard. The book, in turn, is very complex, consisting of many chapters and parts, some of which constitute tightly controlled, normative specification, and some of which is informative, editorial content. The book now also exists in an online version (PDF files): http://www.unicode.org/unicode/uni2book/u2.html which is *almost* identical to the published hardcover book, but not quite. (The Introduction is slightly restructured, the online glossary is restructured and has been added to, the charts are constructed slightly differently and have introductory pages of their own, etc.)

4. The Unicode *CCS* [coded character set] is the mapping of the set of abstract characters contained in the Unicode repertoire (at any given version) to a bunch of code points in the Unicode codespace (0x0000..0x10FFFF). Technically speaking, it is the Unicode *CCS* which is synchronized closely with ISO/IEC 10646, rather than the Unicode *Standard*. 10646 and the Unicode CCS have exactly the same coded characters (at various key synchronization points in their joint publication histories), but the *text* of the ISO/IEC 10646 standard doesn't look anything like the *text* of the Unicode Standard, and the Unicode Standard [sensum #2 above] contains all kinds of material, both textual and data, that goes far beyond the scope of 10646. There are other standards produced by some national bodies that are effectively just translations of 10646 (GB 13000 in China, JIS X 0221 in Japan), but the Unicode Standard is nothing like those.

Finally, the attribute "Unicode ..." can be applied to all kinds of other things characteristic of the Unicode Standard, including algorithms for the
Re: What Unicode Is (was RE: Inappropriate Proposals FAQ)
At 03:54 PM 7/12/2002 -0700, Kenneth Whistler wrote: Suzanne responded: Maybe Unicode is more of a shared set of rules that apply to low-level data structures surrounding text and its algorithms than a character set. O.K., so now, before asserting or denying that "Unicode ..." is a shared set of rules, it would be helpful to pin down first what you are referring to. That might make the ensuing debate more fruitful. Actually, it was me, not Suzanne, who called Unicode a shared set of rules. As Ferris Bueller once said, "I'll take the heat for this." I was aware of all of the uses of "Unicode" that you listed. I have no quarrels with any of them. They do point to the fact that the word is overloaded with definitions, which means that readers have to choose the appropriate one from context. The context of the statement above is that the Unicode referred to is the Standard and all associated documentation. Not Unicode the Consortium, which manages the Standard. Not Unicode the way of life :) I did intend to throw open a debate about the long-term future of Unicode the Standard and, by extension, Unicode the Consortium. Since Suzanne is writing the "What is Unicode and what is not Unicode" FAQ, I think the answer to that is going to be very definitely colored by the answer to the related question "What will Unicode become?", e.g. Unicode 6.0, 7.0, 8.0, etc. See my previous message, subject line "Hmm, this evolved into an editorial when I wasn't looking :)", for some thoughts on that subject. Barry Caplan www.i18n.com
Re: Hmm, this evolved into an editorial when I wasn't looking :) was: RE: Inappropriate Proposals FAQ
Barry Caplan wrote: At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote: Unicode is a character set. Period. Each character has numerous properties in Unicode, whereas they generally don't in legacy character sets. Each character, or some characters? For all intents and purposes, each character. So, each character has at least one attribute. Yes. The implications of the Unicode Character Database include the determination that the UTC has normatively assigned properties (multiple) to all Unicode encoded characters. Actually, it is a little more subtle than that. There are some properties which accrue to code points. The General Category and the Bidirectional Category are good examples, since they constitute enumerated partitions of the entire codespace, and APIs need to return meaningful values for any code point, including unassigned ones. Other properties accrue more directly to characters, per se. They attach to the abstract character, and get associated with a code point more indirectly by virtue of the encoding of that character. The numeric value of a character would be a good example of this. No one expects an unassigned code point or an assigned dingbat character or a left bracket to have a numeric value property (except perhaps a future generation of Unicabbalists). There are no corresponding features in other character sets usually. Correct. Before the development of the Unicode Standard, character encoding committees tended to leave property assignments either up to implementations (considering them obvious) or up to standardization committees whose charter was character processing -- e.g., SC22/WG15 POSIX in the ISO context. The development of a universal character encoding necessitated changing that, bringing character property development and standardization under the same roof as character encoding. Note that not everyone agrees about that, however.
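Ken's distinction between properties of code points and properties of characters can be illustrated with Python's `unicodedata` module: the General Category lookup is total over the codespace (unassigned code points come back as "Cn"), while the numeric value exists only for some encoded characters. This is a sketch under assumptions: U+0378 is assumed to be still unassigned in the UCD version bundled with your Python, and the helper names are hypothetical. Python's `unicodedata.bidirectional` documents returning an empty string when no value is defined, so it is not used here to illustrate the bidi half of Ken's point.

```python
import unicodedata


def general_category(cp: int) -> str:
    """Code point property: defined for every code point, assigned or not."""
    return unicodedata.category(chr(cp))


def numeric_value(cp: int):
    """Character property: only some encoded characters have one; else None."""
    return unicodedata.numeric(chr(cp), None)


samples = {
    0x0035: "DIGIT FIVE",
    0x00BD: "VULGAR FRACTION ONE HALF",
    0x005B: "LEFT SQUARE BRACKET",
    0x2756: "BLACK DIAMOND MINUS WHITE X (a dingbat)",
    0x0378: "unassigned code point",
}
for cp, label in samples.items():
    print(f"U+{cp:04X} {label}: gc={general_category(cp)} numeric={numeric_value(cp)}")
```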
We are still having some rather vigorous disagreements in SC22 about who owns the problem of standardization of character properties. A common definition of character set is a list of characters you are interested in, assigned to codepoints. That fits most legacy character sets pretty well, but Unicode is sooo much more than that. That is roughly the distinction I was drawing between the Unicode CCS and the Unicode Standard. But what if we took a look at it from a different point of view, that the standard is an agreed-upon set of rules and building blocks for text-oriented algorithms? Would people start to publish algorithms that extend the base data provided, so we don't have to reinvent wheels all the time? Well, the Unicode Standard isn't that, although it contains both formal and informal algorithms for accomplishing various tasks with text, and even more general guidelines for how to do things. The members of the Unicode Technical Committee are always casting about for areas of Unicode implementation behavior where commonly defined, public algorithms would be mutually beneficial for everyone's implementations and would assist general interoperability with Unicode data. To date, it seems to me that the members, as well as other participants in the larger effort of implementing the Unicode Standard, have been rather generous in contributing time and brainpower to this development of public algorithms. The fact that ICU is an Open Source development effort is enormously helpful in this regard. If I were to stand in front of a college comp sci class, where the future is all ahead of the students, what proportion of time would I want to invest in how much they knew about legacy encodings versus how much I could inspire them to build from and extend what Unicode provides them? This problem, of Unicode in the computer science curriculum, intrigues me -- and I don't think it has received enough attention on this list.
One of my concerns is that even now it seems to me that CS curricula not only don't teach enough about Unicode -- they basically don't teach much about characters, or text handling, or anything in the field of internationalization. It just isn't an area that people get Ph.D.s in or do research in, and it tends to get overlooked in people's education until they go out, get a job in industry and discover that in the *real* world of software development, they have to learn about that stuff to make software work in real products. (Just like they have to do a lot of seat-of-the-pants learning about a lot of other topics: building, maintaining, and bug-fixing for large, legacy systems; software life cycle; large team cooperative development process; backwards compatibility -- almost nothing is really built from scratch!) The major work ahead is no longer in the context of building a character standard. Time is fast approaching to decide to keep it small and apply a bit of polish, or focus on the use and usage of what is already there in Unicode by those who
RE: Inappropriate Proposals FAQ
Apologies for the delayed response to this thread, I've been out of town. -Original Message- From: William Overington [mailto:[EMAIL PROTECTED]] Sent: Friday, July 05, 2002 10:22 AM For the avoidance of doubt I am not saying that the Unicode Technical Committee should necessarily accept such items as your furniture idea for encoding, I am simply saying that any decision as to what may be encoded and what shall and what shall not be encoded should be made by the Unicode Technical Committee on the basis of the scientific situation at the time that an encoding proposal is formally considered. I feel that it would be a major error for the Unicode Consortium to publish a FAQ document which prejudices the fair consideration of characters based upon new technologies which may arise in the future. While your thoughts on executing the floor plan idea are truly gobsmacking, I have to confess that I'd raised the concept precisely because it is -not- an appropriate issue for Unicode. Unicode is a character set. Period. When setting out on any endeavor, you have to be clear on what the intent is. If you want to go to the park and have a picnic, you set out parameters for that activity. If you allow a bunch of people to stop you along the way to buy shoes, see a movie, visit their aunt in the hospital, and get the oil changed in their car, you probably won't be able to accomplish the initial goal. If you develop a program for creating room layouts using graphics of furniture and architectural details, you probably shouldn't include modules to manage the drug histories of AIDS victims. That doesn't make tracking drug histories of AIDS victims unimportant, it means that they aren't a logical set of requirements to add to a room layout program. The Unicode consortium very wisely keeps its focus narrow. It provides a mechanism for specifying characters. Not for manipulating them, not for describing them, not for making them twinkle.
You clearly have widely ranging ideas for unique text and symbol applications. It would be great if you could channel that energy into developing ideas for a manipulation layer that could take Unicode characters, manipulate them, and deliver them in a cross-platform portable way which would allow them to be displayed and used in the ways that you envision. As recent discussions on this list have shown, Unicode is just one piece in the puzzle. Font and rendering issues for many languages remain serious stumbling blocks for actual use, even though the characters themselves are encoded. Any work you could do toward advancement of a manipulation layer that would ease the task of rendering characters as they are actually needed and used would be a tremendous boon. I would imagine that you would find a reasonable level of interest from a wide range of communities; font developers, bidi word processing developers, accessibility experts, minority script advocates, etc. I'll bet that some of the regular old Unicodies might even want to listen in. It would be sad if your energy and enthusiasm were dampened by the repeated denials you receive through this list. The ideas you generate are interesting, and often worth investigation. However, they are not appropriate additions to Unicode. I'm setting up a new group which can hopefully act as an appropriate venue for these types of discussions. As soon as I come up with a decent name for it, I'll send off an invitation with instructions for joining to the Unicode list. All the best, Suzanne Topping BizWonk Inc. [EMAIL PROTECTED]
RE: Chromatic text. (follows from Re: [unicode] Re: FW: Inappropriate Proposals FAQ)
William Overington wrote: The problem (if there is one!) is only for font technology. Ethiopian writing: [...] The capability to do the same electronically would be well received. /Daniel. Same for this one: Unicode's task was to provide a code point for the Ethiopic full stop, and they did. Whether the corresponding glyph is colored or not is a problem for fonts and word processors. Well, may I please suggest that the issue is one for Unicode as well as for font technology? [...] Of course you can. But my feeling is that you already *did* suggest this, many, many times. I interpret your post as one more lengthy repetition of your well-known opinion: differences between plain text and rich text should not exist: they should be eliminated by incorporating the mark-up in the encoding. I think that it is your right to repeat your opinions as many times as you want. Nevertheless, I find that repeating opinions which are already well known to everybody is *useless* and *boring*. _ Marco
Re: Chromatic text. (follows from Re: [unicode] Re: FW: Inappropriate Proposals FAQ)
Marco Cimarosti wrote as follows. Of course you can. But my feeling is that you already *did* suggest this, many, many times. Actually I was trying in the posting upon which you comment to suggest that, even if people do not agree with me about having colour codes in a plain text file, they might perhaps consider as a separate issue the adding into regular Unicode of a zero width operator whose use would be to indicate that a character, such as U+1362, should be decorated chromatically. This would mean that a sequence U+1362 ZWJ ZWCDO could be used in documents, which would give a chromatically decorated glyph with a chromatic font yet would just give U+1362 as a monochrome character if the font did not recognize the U+1362 ZWJ ZWCDO sequence. I interpret your post as one more lengthy repetition of your well-known opinion: differences between plain text and rich text should not exist: they should be eliminated by incorporating the mark-up in the encoding. Actually, that is not my opinion. My opinion is that splitting text files into just two categories, either plain text or markup, is not sufficient, but that there should perhaps be more categories or, if there are but two categories, that the dividing line between them should be in a different place. I tend to base the essential dividing line upon whether the encoding of the file of code points is meaningful if one tries to compute the effect of a code point upon the system as simply the effect of that code point as it stands, without having to have software recognize a character such as < and determine that a markup bubble is being entered, and then to have to read in several more characters within the markup bubble before taking any action as a result of the first character in the sequence (that is, the < character) being read.
That distinction means that each Unicode character is processed as it is received within the main loop of the program, without the receiving of a character putting the processing into an inner loop within a markup bubble, within which bubble ordinary Unicode character codes which are read have a different meaning than in the Unicode specification. To me, such a distinction means that people who are using lower cost, more generally available software packages, might by such an approach be able in the not too distant future to use files in a non-proprietary portable format and get much better results than just using monochrome traditional plain text. Perhaps some sort of consensus over nomenclature for three categories of text file could occur, namely plain text in the manner which you like it, plain text in the manner in which I like it and markup. Maybe plain text, enhanced text and markup would be suitable names. How do people feel about that please? It is unfortunately the case in discussions that when someone disagrees with an idea that is put forward that he or she is more likely to respond in public than if he or she agrees with an idea which is put forward, or has simply read about the idea and just notes it as an interesting possibility. This can have the effect that many people may agree with an idea or at least not be against it yet make no comment, perhaps giving an impression that an idea is not well received at large when in fact that is not necessarily the case. William Overington 8 July 2002
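The distinction William draws, between code points that each take effect as they are read and a markup "bubble" that must be read to its end before anything happens, can be sketched in Python. This is a minimal illustration, not his implementation: `<` and `>` are assumed to delimit the bubble as in HTML/XML, and the handler table, which maps his Private Use Area code U+F3E2 RED to a colour change, is hypothetical.

```python
def process_per_code_point(text, handlers):
    """Each code point takes effect as soon as it is read (no look-ahead)."""
    actions = []
    for ch in text:
        # Code points found in the table act immediately;
        # everything else is treated as an ordinary character to display.
        actions.append(handlers.get(ch, ("glyph", ch)))
    return actions


def process_with_markup(text):
    """A '<' opens a markup bubble; nothing is done until its '>' is read."""
    actions = []
    i = 0
    while i < len(text):
        if text[i] == "<":
            # Inner loop: read ahead to the end of the bubble.
            # str.index raises ValueError if the bubble is never closed.
            end = text.index(">", i)
            actions.append(("tag", text[i + 1:end]))
            i = end + 1
        else:
            actions.append(("glyph", text[i]))
            i += 1
    return actions


# Hypothetical handler table: U+F3E2 (a Private Use Area code) means "red".
HANDLERS = {"\uf3e2": ("set_colour", "red")}

print(process_per_code_point("\uf3e2A", HANDLERS))  # [('set_colour', 'red'), ('glyph', 'A')]
print(process_with_markup("<red>A"))                # [('tag', 'red'), ('glyph', 'A')]
```

Both functions hand the renderer the same information; the sketch only shows where the parsing work happens, not which model is preferable.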
RE: Chromatic text. (follows from Re: [unicode] Re: FW: Inappropriate Proposals FAQ)
William Overington wrote: Actually I was trying in the posting upon which you comment to suggest that, even if people do not agree with me about having colour codes in a plain text file, they might perhaps consider as a separate issue the adding into regular Unicode of a zero width operator whose use would be to indicate that a character, such as U+1362, should be decorated chromatically. Come on, William!! Adding such a zero width operator *is* having color in plain text! And adding such zero width operators *is* inserting mark up in plain text! I interpret your post as one more lengthy repetition of your well-known opinion: differences between plain text and rich text should not exist: they should be eliminated by incorporating the mark-up in the encoding. Actually, that is not my opinion. No, I know. This is my explanation of my perception of your explanation of your opinion. Now I am not sure what your perception of my explanation of my perception of your explanation of your opinion might be. Gentlemen, communication is such a difficult art! [...] Perhaps some sort of consensus over nomenclature for three categories of text file could occur, namely plain text in the manner which you like it, plain text in the manner in which I like it and markup. Maybe plain text, enhanced text and markup would be suitable names. How do people feel about that please? I would suggest proletarian text, middle-class text and capitalist text, if I wasn't so scared that someone could take it seriously. It is unfortunately the case in discussions that when someone disagrees with an idea that is put forward that he or she is more likely to respond in public than if he or she agrees with an idea which is put forward, or has simply read about the idea and just notes it as an interesting possibility. 
This can have the effect that many people may agree with an idea or at least not be against it yet make no comment, perhaps giving an impression that an idea is not well received at large when in fact that is not necessarily the case. Yes, definitely a difficult art. _ Marco
Chromatic text. (follows from Re: [unicode] Re: FW: Inappropriate Proposals FAQ)
The problem (if there is one!) is only for font technology. Ethiopian writing: [...] The capability to do the same electronically would be well received. /Daniel. Same for this one: Unicode's task was to provide a code point for the Ethiopic full stop, and they did. Whether the corresponding glyph is colored or not is a problem for fonts and word processors. Well, may I please suggest that the issue is one for Unicode as well as for font technology? Firstly, for the avoidance of doubt in the matter, whereas I am an advocate for adding codes into Unicode for effects for organizing and controlling data in ways which some people consider should be done only by markup methods, I am hoping that, without that aspect of my research prejudicing the matter, readers might consider the possibility of adding into regular Unicode some operators for use in ZWJ sequences for requesting that a chromatically decorated glyph of the 'operated upon' regular Unicode character be produced if the font can provide it, yet otherwise that a monochrome ordinary glyph be provided. May I please refer to the following document. http://www.users.globalnet.co.uk/~ngo/courtfor.htm In that document I wrote as follows. quote Here are some codes for use in ZWJ sequences of the form U+ ZWJ U+F3DC and U+ ZWJ U+F3DD so as to provide facilities to indicate to a chromatic font that a colour decorated version of U+ is requested, where U+ represents any Unicode character where such usage would be meaningful. This facility is provided in anticipation of the possibility of chromatic fonts being introduced at some time in the future. U+F3DC ZERO WIDTH DECORATION OPERATOR OF THE FIRST KIND U+F3DD ZERO WIDTH DECORATION OPERATOR OF THE SECOND KIND end quote May I please refer to the following document. http://www.users.globalnet.co.uk/~ngo/courtcol.htm In that document I wrote as follows.
quote U+F3E0 BLACK U+F3E1 BROWN U+F3E2 RED U+F3E3 ORANGE U+F3E4 YELLOW end quote So, it would be the case that in order to set some text in black one would use U+F3E0 then the text and in order to set some text in red one would use U+F3E2 then the text. In order to set some text in black including a character U+1362 in black with red flourishes one would use U+F3E0 then the text which precedes the U+1362 character and then U+1362 ZWJ U+F3DC which should do the job perfectly well as the chromatic font would be set up so that the decoration of the first kind operator worked in black and red. More generally, for other chromatic characters from other applications where the colours are not specific, then the chromatic colours can be changed before using the ZWJ sequence. Quoting from the same document. quote Colour changing is by a specially devised method which will hopefully be efficient in practice. Upon receiving one of the 18 codes to change colour, the system presumes on a temporary basis that the new colour is to become the foreground colour, which is what it will usually be. However, the previous foreground colour is stored. If a command to set one of the other four colours is received, then the foreground colour is used for that purpose, with the foreground colour being replaced by the previous foreground colour. This means that only one code point is needed to change the foreground colour and two code points are needed to change the contents of any of the other colour registers. The decoration colours are intended to be ready for the possible introduction of chromatic fonts at some future date. 
U+F3AC SET NEW BACKGROUND COLOUR U+F3AD SET NEW FIRST DECORATION COLOUR U+F3AE SET NEW SECOND DECORATION COLOUR U+F3AF SET NEW THIRD DECORATION COLOUR end quote In using chromatic font technology I suggest that the specific colours could either be built into the glyph or could just be foreground colour, background colour, first decoration colour, second decoration colour, third decoration colour, in abstract terms with the specific colours being supplied by the rendering software. This would enable some glyphs, such as those for the Ethiopic manuscripts, to be specified as black and red within the font, and for some glyphs, such as those for an ornament, to be specified by the rendering software. As regards the possibility of including such code points as I have mentioned above in regular Unicode, well, there are various levels. There is the issue of whether codes such as U+F3E2 RED above should be promoted as some people feel that they are markup and should not be included in regular Unicode. Well, that is an issue and I quite accept that I am currently in the minority over that issue, though I would ask that readers might look into the matter of the use of such codes in DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) broadcasting of multimedia where text in Unicode files will be processed by Java programs which have been broadcast to television sets, before taking a definitive view on the matter, and indeed at
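The colour-register scheme and the decoration-operator fallback quoted above can be sketched as a small Python state machine. It is a hedged reading of the quoted description, not William's code: only the five colour codes and four register codes quoted here are included (the other 13 colour codes are not listed), the initial register values are assumptions, and the `chromatic_font` set standing in for a font's supported sequences is hypothetical. Note also that U+E000 to U+F8FF is the Private Use Area, so these code points carry this meaning only by private agreement.

```python
ZWJ = "\u200d"

# Codes as quoted above. Only five of the 18 colour codes are listed there.
COLOUR_CODES = {"\uf3e0": "black", "\uf3e1": "brown", "\uf3e2": "red",
                "\uf3e3": "orange", "\uf3e4": "yellow"}
REGISTER_CODES = {"\uf3ac": "background", "\uf3ad": "decoration1",
                  "\uf3ae": "decoration2", "\uf3af": "decoration3"}
DECORATION_OPERATORS = {"\uf3dc": 1, "\uf3dd": 2}  # first and second kind


class ColourState:
    def __init__(self):
        # Initial register values are assumptions, not part of the quoted scheme.
        self.registers = {"foreground": "black", "background": "white",
                          "decoration1": "red", "decoration2": "black",
                          "decoration3": "black"}
        self.previous_foreground = "black"

    def colour_code(self, colour):
        # A colour code tentatively becomes the new foreground; the old one is kept.
        self.previous_foreground = self.registers["foreground"]
        self.registers["foreground"] = colour

    def register_code(self, register):
        # A register code moves that tentative colour into the named register
        # and restores the previous foreground.
        self.registers[register] = self.registers["foreground"]
        self.registers["foreground"] = self.previous_foreground


def render(text, chromatic_font=frozenset()):
    """Return a list of drawing actions.

    chromatic_font is a hypothetical set of (character, operator kind) pairs
    for which a font has a chromatically decorated glyph.
    """
    state = ColourState()
    actions = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in COLOUR_CODES:
            state.colour_code(COLOUR_CODES[ch])
        elif ch in REGISTER_CODES:
            state.register_code(REGISTER_CODES[ch])
        elif (i + 2 < len(text) and text[i + 1] == ZWJ
              and text[i + 2] in DECORATION_OPERATORS):
            kind = DECORATION_OPERATORS[text[i + 2]]
            if (ch, kind) in chromatic_font:
                actions.append(("decorated", ch, kind, dict(state.registers)))
            else:
                # Fallback: plain monochrome glyph; ZWJ and operator draw nothing.
                actions.append(("glyph", ch, state.registers["foreground"]))
            i += 3
            continue
        else:
            actions.append(("glyph", ch, state.registers["foreground"]))
        i += 1
    return actions


# Black text, then U+1362 with a first-kind decoration request.
sample = "\uf3e0\u1362" + ZWJ + "\uf3dc"
print(render(sample))                   # font without the sequence
print(render(sample, {("\u1362", 1)}))  # font with a decorated glyph
```

Without a font entry for the sequence the sketch falls back to the plain monochrome glyph, which is the behaviour described for fonts that do not recognize the sequence.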
Re: Inappropriate Proposals FAQ
Suzanne M. Topping wrote as follows. I see the need for perhaps two entries: one which states clearly what Unicode is NOT, and another which lists a few examples of inappropriate proposals and why they would not be considered. This section would probably refer to the what Unicode isn't entry for support of the whys. I have a few ideas for fictional proposals to use as examples (my room layout idea, and Mark's 3-D Mr. Potato Head representation), but I could use another one or two if anyone feels creative. The closer to being believable, the better, I suppose. (An alternative would be to use real-life proposals, and state why they were not accepted, but I thought it more politic to keep it fictional...) Well, having seen your furniture and room layout idea, presented on the Unicode list, I figured out the method to use to enable your room layout idea to be produced, using the technique, novel as far as I know, of allowing a glyph to contain some software which could be obeyed by the rendering system so as to rotate the points of the Bézier curves of the contours of the glyphs of the items of furniture. This seems to me to be something of a breakthrough in the possibilities for fonts, as including software inside a font which could be obeyed by the rendering system would allow a rendering system to be customized from within a font. It would seem a pity to restrict the future development of the concept by a Unicode Consortium issued FAQ document stating that Unicode will not encode such symbols when it seems that it would be relatively straightforward to implement such fonts. The font would need to contain the software that is to be obeyed. This could be organized so as to be accessed when a glyph is selected, with a central place within the font to store any subroutines called from within the software of the individual glyphs.
If this software were in some appropriate portable software format, then the specification of the font format would perhaps not be that difficult; it could be part of an advanced font format that supports both chromatic font information and software in the fonts. For example, the software in the font could be specified to be written in 1456 object code. http://www.users.globalnet.co.uk/~ngo/1456.htm 1456 object code already supports double precision floating point items, integers, characters, strings, complex numbers and quaternions as standard types. Groups are also supported as a type experimentally. Consideration of this concept of software within the font has led to consideration of how the position and rotation angle of the individual items of furniture could be set to an initial position from within the document and also as to how they could be adjusted by the end user using facilities set up from within the document, and this has led to the idea of having the document be able to open and customize a control panel, which control panel could contain buttons and scrollbars and so on and also a polar scrollbar for continuous rotational adjustment. It would seem, given the fact that 1456 object code supports quaternions and also has some functions of a quaternion variable built in as standard, that this could be extended to three-dimensional rotations quite straightforwardly for applications that could use three-dimensional rotations. This is the sort of computational power which I feel that multimedia should be able to utilize, by including Unicode codes directly in a text file, so that the rendering system produces the control panel as instructed by the Unicode codes. This seems to be directly permissible within the definition of character in Annex B of the ISO document which was discussed recently, though perhaps not within the definition of character used by the Unicode Consortium at the present time.
I feel that such ideas should not be thrown out by the Unicode Consortium publishing a FAQ document which would prevent it considering for inclusion glyphs in regular Unicode which could make good use of such technological advances. For the avoidance of doubt I am not saying that the Unicode Technical Committee should necessarily accept such items as your furniture idea for encoding, I am simply saying that any decision as to what may be encoded and what shall and what shall not be encoded should be made by the Unicode Technical Committee on the basis of the scientific situation at the time that an encoding proposal is formally considered. I feel that it would be a major error for the Unicode Consortium to publish a FAQ document which prejudices the fair consideration of characters based upon new technologies which may arise in the future. William Overington 5 July 2002
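Setting aside whether such software belongs in a font, the geometric step described above, rotating the points of a glyph's Bézier contours, is simple because Bézier curves are affine-invariant: rotating the control points rotates the whole curve. The Python sketch below shows plain 2-D rotation only; the function names and sample coordinates are hypothetical, and it does not attempt the 1456 object code or the quaternion-based 3-D extension.

```python
import math


def rotate_points(points, angle_degrees, centre=(0.0, 0.0)):
    """Rotate (x, y) control points counter-clockwise about centre."""
    theta = math.radians(angle_degrees)
    c, s = math.cos(theta), math.sin(theta)
    cx, cy = centre
    return [(cx + c * (x - cx) - s * (y - cy),
             cy + s * (x - cx) + c * (y - cy)) for x, y in points]


def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bézier segment at parameter t in [0, 1]."""
    u = 1.0 - t
    weights = (u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3)
    controls = (p0, p1, p2, p3)
    x = sum(w * p[0] for w, p in zip(weights, controls))
    y = sum(w * p[1] for w, p in zip(weights, controls))
    return (x, y)


# One contour segment of a hypothetical furniture glyph, in font units.
segment = [(0, 0), (100, 200), (300, 300), (400, 0)]
turned = rotate_points(segment, 30, centre=(200, 150))

# Rotating a point on the original curve gives the same result as
# evaluating the curve through the rotated control points.
on_curve = rotate_points([cubic_bezier(*segment, 0.3)], 30, centre=(200, 150))[0]
print(on_curve, cubic_bezier(*turned, 0.3))
```

Because only the control points need transforming, a renderer never has to rotate the rasterized outline itself.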
Re: Inappropriate Proposals FAQ
William, For the gods' sake rein in those hares. Interchange protocols for architectural computer-aided design already exist. Character encoding does not apply to anything like that, because there aren't any characters. Object code has nothing to do with character encoding. Your caveat, that you are saying that any decision as to what may be encoded and what shall and what shall not be encoded should be made by the Unicode Technical Committee on the basis of the scientific situation at the time that an encoding proposal is formally considered. I feel that it would be a major error for the Unicode Consortium to publish a FAQ document which prejudices the fair consideration of characters based upon new technologies which may arise in the future. is completely unnecessary. We know quite well what we are doing. We are hoping that with diligent study you will figure it out and get on board. But as Ken has said there is no scientific theory left to puzzle out. There may be arguments as to what specific symbols we wish to add (some people hate them, some people like them) and there the question is one of usage and the semantics of the symbols in general. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: [unicode] Re: FW: Inappropriate Proposals FAQ
Resending this email because for some reason my membership in the Unicode list got deleted. --- Rick McGowan [EMAIL PROTECTED] wrote: Suzanne T asked: Can people from the review committee give me some hard and fast rules for when something is thrown out? --snip-- One rule of thumb that people can also use: if an off-the-cuff proposal for a thing doesn't fly on the Unicode list, it is unlikely to fly in UTC. Rick I have noticed something else in this aspect. If an idea gets bashed on the Unicode list it won't make it, and almost all the time I agree that it shouldn't make it. But, if something is silently ignored, then somebody has discovered something that nobody wants to touch. I have observed this several times now; the latest incident was in the Chromatic Font Research thread, with 2 cases: Aztec glyphs: Some of the glyphs are identical in shape and form, but a certain colored area changes the meaning if a different color is applied. When Michael Everson asked for proof, both Marco Cimarosti and I sent him links to websites that state this color issue. Silence. Ethiopian writing: Daniel Yacob described the usage of red dots, accents, and words in that writing system, nobody except WO followed up with the significance of Daniel's statements. Silence, even though he wrote The capability to do the same electronically would be well received. /Daniel. I see two valid possible proposals here to add a color attribute to a character. What will happen if a need for these characters is discovered, a consortium with the necessary background is formed, and the UTC receives an orderly proposal? Between all the arguing and mile-long emails nobody actually saw this possibility, or wanted to see the valid issues for a proposal. I believe it is necessary to invest some thought into what a color implementation would mean for Unicode, not for a holly with red berries, but for a real writing system. The silence after these valid statements were made disturbs me.
Dave. Dave Possin Globalization Consultant www.Welocalize.com http://groups.yahoo.com/group/locales/
RE: [unicode] Re: FW: Inappropriate Proposals FAQ
David Possin wrote: But, if something is silently ignored, then somebody has discovered something that nobody wants to touch. I have observed this several times now; the latest incident was in the Chromatic Font Research thread, with 2 cases: Aztec glyphs: [...] Silence. Funny. I interpreted that silence the opposite way: very positively. I didn't expect any immediate action, and the absence of denials made me feel the information I passed was not totally pointless. Anyway, even if the silence actually meant "Who cares?", it doesn't bother me, because I think this is NOT an issue for Unicode. Unicode encodes characters, not glyphs! Whether the glyphs representing those characters are colored or not, the only problem for Unicode would be having four-color printing for a future edition of the Unicode Standard... The problem (if there is one!) is only for font technology. Ethiopian writing: [...] The capability to do the same electronically would be well received. /Daniel. Same for this one: Unicode's task was to provide a code point for the Ethiopic full stop, and they did. Whether the corresponding glyph is colored or not is a problem for fonts and word processors. <WHISPERING> However, there has been one case when Unicode's silence disturbed me, and this was when I (et al.) raised a real encoding problem, although certainly a minor one. It had to do with encoding repha out of context (repha is one of the contextual glyphs of letter RA in some Indic scripts). </WHISPERING> _ Marco
Re: [unicode] Re: FW: Inappropriate Proposals FAQ
At 10:37 am -0700 2002-07-05, David Possin wrote: Aztec glyphs: Some of the glyphs are identical in shape and form, but a certain colored area changes the meaning if a different color is applied. When Michael Everson asked for proof, both Marco Cimarosti and I sent him links to websites that state this color issue. Silence. What did you want me to say? Aztec hasn't been fully deciphered yet, it seems. I did mention Budge's use of a black line over rubricked Egyptian text. If we encountered a script in which colour was really intrinsic we might have to deal with it, but in the real world such a convention would be pretty unstable. How would you carve your name into a tree with a knife if you had no ink with you and D was black but d was blue? Ethiopian writing: Daniel Yacob described the usage of red dots, accents, and words in that writing system, nobody except WO followed up with the significance of Daniel's statements. Silence, even though he wrote The capability to do the same electronically would be well received. Would markup not do? I see two valid possible proposals here to add a color attribute to a character. What will happen if a need for these characters is discovered, a consortium with the necessary background is formed, and the UTC receives an orderly proposal? In Quark I can add colour attributes to a character for printing. We would consider an orderly proposal on its merits. -- Michael Everson *** Everson Typography *** http://www.evertype.com
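The position that colour can live outside the character stream ("Would markup not do?") can be illustrated with a minimal Python sketch of a rich-text model: the plain text holds only code points (here a few Ethiopic syllables followed by U+1362 ETHIOPIC FULL STOP), and colour is stored separately as attribute runs, in the general spirit of styled runs in word processors. The `RichText` and `Run` classes are hypothetical and do not describe how any particular program stores styling.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """A styled range [start, end) of code-point indices."""
    start: int
    end: int
    attrs: dict = field(default_factory=dict)


@dataclass
class RichText:
    text: str  # plain text: only code points, never changed by styling
    runs: list = field(default_factory=list)

    def set_colour(self, start, end, colour):
        self.runs.append(Run(start, end, {"colour": colour}))

    def colour_at(self, index, default="black"):
        # Later runs override earlier ones; unstyled text uses the default.
        colour = default
        for run in self.runs:
            if run.start <= index < run.end and "colour" in run.attrs:
                colour = run.attrs["colour"]
        return colour


# Three Ethiopic syllables followed by U+1362 ETHIOPIC FULL STOP.
doc = RichText("\u1230\u120b\u121d\u1362")
stop = doc.text.index("\u1362")
doc.set_colour(stop, stop + 1, "red")
print(doc.text == "\u1230\u120b\u121d\u1362", doc.colour_at(stop), doc.colour_at(0))
```

The character sequence is identical whether or not the full stop is shown in red; only the attribute run differs.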
RE: FW: Inappropriate Proposals FAQ
William Overington wrote: As for the script regarding the cipher in relation to a movie. [...] Which one? Doug mentioned three movies: Men in Black II, Indiana Jones and the Temple of Doom, Atlantis: The Lost Empire. _ Marco
Re: FW: Inappropriate Proposals FAQ
Marco Cimarosti wrote as follows. William Overington wrote: As for the script regarding the cipher in relation to a movie. [...] Which one? Doug mentioned three movies: Men in Black II, Indiana Jones and the Temple of Doom, Atlantis: The Lost Empire. _ Marco The Men in Black II one to which Ken referred in the post to which I was replying. William Overington 4 July 2002
Re: FW: Inappropriate Proposals FAQ
At 05:22 +0100 2002-07-04, William Overington wrote: Has the Unicode Technical Committee in fact ever discussed the possibility of whether or not to encode chess diagrams in regular Unicode at all? No, but I suspect that if they were to do so they would quickly come to the conclusion that markup was best for describing the chess board into which the chess characters were arranged. One script which the Unicode Technical Committee has turned down is the Phaistos Disk Script. There is a document, which if I may say so is of high quality, about the proposal for encoding of the Phaistos Disk Script into Unicode, into Plane 1. It was rejected. Is this really for a purported reason that the script is only found on one item at present? That, and the fact that it hasn't been deciphered. How about discussing the Phaistos Disk Script in this new FAQ document? How about not? General aphorisms are better in this case. As for the script regarding the cipher in relation to a movie. I feel that the characters are very elegant and I noticed in particular that the characters are non-contiguous within themselves. The point made was that the writing system was cumbersome and inefficient qua writing system. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: FW: Inappropriate Proposals FAQ
William Overington had written: the proposal for encoding of the Phaistos Disk Script into Unicode [...] was rejected. Is this really for a purported reason that the script is only found on one item at present? Michael Everson wrote: That, and the fact that it hasn't been deciphered. Which implies that you really cannot tell what constitutes a character, in that script, nor its writing-direction. William Overington had written: I noticed in particular that the characters are non-contiguous within themselves. This is not enough to decide what constitutes a character. E.g. the Cyrillic character Ы is non-contiguous; in the Latin script, when f and l are ligated, they still constitute two characters. Best wishes, Otto Stolz
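The point that a glyph's shape does not determine how many characters it represents can be checked with Python's standard `unicodedata` module. Results reflect the Unicode version bundled with the Python build, and the `describe` helper is only for illustration.

```python
import unicodedata


def describe(s):
    """Return the number of code points in s and their Unicode names."""
    return len(s), [unicodedata.name(ch) for ch in s]


# Ы is drawn as two separate pieces but is a single character.
print(describe("\u042b"))  # (1, ['CYRILLIC CAPITAL LETTER YERU'])

# The fl ligature is one glyph for two characters. U+FB02 exists for
# compatibility, and NFKC normalization maps it back to the two letters.
print(describe("\ufb02"))  # (1, ['LATIN SMALL LIGATURE FL'])
print(unicodedata.normalize("NFKC", "\ufb02"))  # fl
```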
Re: FW: Inappropriate Proposals FAQ
At 17:07 +0200 2002-07-04, Otto Stolz wrote: Michael Everson wrote: That, and the fact that it hasn't been deciphered. Which implies that you really cannot tell what constitutes a character, in that script, nor its writing-direction. Actually it appears that the writing direction is into the faces of the characters, and that the impression in the clay indicates that this was the likely direction of printing. And as far as the repertoire is concerned, it is small and each character is easily differentiated from the others. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: FW: Inappropriate Proposals FAQ
On Thursday, July 4, 2002, at 09:07 AM, Otto Stolz wrote: Michael Everson wrote: That, and the fact that it hasn't been deciphered. Which implies that you really cannot tell what constitutes a character, in that script, nor its writing-direction. Actually, you can't even tell *that* it's a script, not for sure. But if it *is* writing, then the nature of the characters seems fairly unambiguous as the various signs are self-contained and don't break down into smaller pieces. It would appear to be a syllabary. Also IIRC the writing direction has been deduced by determining the order in which the characters were stamped into the clay (as indicated by overlaps). I should mention that the proposals for the encoding of the Phaistos disc are the only proposals made to the UTC and WG2 which contain the entire known corpus of writing with that script as a part of the proposal. :-) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: FW: Inappropriate Proposals FAQ
Otto Stolz Otto dot Stolz at uni dash konstanz dot de wrote: I noticed in particular that the characters are non-contiguous within themselves. This is not enough to decide what constitutes a character. E.g. the Cyrillic character Ы is non-contiguous; in the Latin script, when f and l are ligated, they still constitute two characters. For that matter, the dot over lowercase i and j make those glyphs non-contiguous as well. -Doug Ewell Fullerton, California
FW: Inappropriate Proposals FAQ
I realized that I should probably turn an off-list discussion back to the list, as it's illustrating an area of difficulty. (See the bottom of this note for a partial discussion of what writing systems could/would be considered.) In the appropriate use FAQ entry, how the heck can we state what is and isn't a suitable writing system for inclusion? Fictional scripts in some cases would be considered, in other cases would not. Historical scripts would in some cases be considered, in other cases not. Can people from the review committee give me some hard and fast rules for when something is thrown out? Is there some way of listing or detailing the criteria in a way which potential readers could determine where their script or character stands? It could perhaps be presented in a table or matrix form, so that people could look through the criteria and say yep, it's fictional, yep people currently use it, no, there are no fonts yet etc. Or maybe a decision tree would be better, where the criteria fork. I guess the first thing I need to collect is the criteria... Here are some starters to get the ball rolling:
--Is this an entire script or additions to an existing script?
--Is the script fictional?
--Is the script in use? (as determined how???)
--Does the character(s) already exist in some other part of the standard?
--Is there a compelling reason for including a character which would normally not be considered, due to legacy support issues?
--Is the character a precomposed ligature which can be encoded using a sequence of existing characters (possibly joined by ZWJ's)?
--Is the character a precomposed accented character which can be composed using an existing character and one or more existing combining diacritics?
--Is the character a clone of an existing character whose sole purpose is making a *logical* differentiation from some existing characters (e.g., hex digits looking identical to existing characters 0..9 and A...F; or a symbol for meter looking identical to Latin m)?
--Is the character a clone of an existing character whose sole purpose is making a *graphical* differentiation from some existing characters (e.g., a Serbian letter t, disunified from Russian on the basis that italics looks different in the two languages)?
--Is the character really a presentation glyph for a shape that can be obtained using regular characters in conjunction with ZWJ or ZWNJ?
Again, additions, suggestions, or other help appreciated, Suzanne
-----Original Message-----
At 15:35 -0400 2002-07-02, Suzanne M. Topping wrote: Apologies, I was sloppy with my phrasing; of course a script is written. What I really meant was that the script is in current use in some sort of written form, as opposed to existing somewhere, historically, without being in current use. Fiction is written within history. Jonathan Swift made up an alphabet once I think in Gulliver's Travels. But it exists only in that book. Tengwar is different. It has to be used by people who want to interchange data safely with it. Tengwar meets this criterion. Klingon didn't. Is there some sort of metric associated with the concept of people who want...? Is there some sort of a threshold of number of people? Certainly not. The number of users of Old Permic, for instance, is probably a few dozen or less -- specialists. That doesn't mean Old Permic doesn't deserve encoding. Numbers aren't really in the criteria. --
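The checklist item about precomposed accented characters can be illustrated with Python's `unicodedata` module: an existing precomposed letter is canonically equivalent to its base letter plus a combining mark, and a base-plus-mark sequence can be formed from existing characters even where no precomposed code point exists. The helper `uses_existing_characters` is hypothetical; it only checks that every code point is assigned and that the marks are combining marks, and it is not a substitute for the formal proposal criteria.

```python
import unicodedata


def uses_existing_characters(base, marks):
    """True if base + marks uses only assigned code points and every mark is a combining mark."""
    sequence = base + "".join(marks)
    if any(unicodedata.category(ch) == "Cn" for ch in sequence):
        return False  # contains an unassigned code point
    return all(unicodedata.category(m).startswith("M") for m in marks)


# é (U+00E9) is canonically equivalent to e + COMBINING ACUTE ACCENT (U+0301).
print(unicodedata.normalize("NFD", "\u00e9") == "e\u0301")  # True
print(unicodedata.normalize("NFC", "e\u0301") == "\u00e9")  # True

# q + combining acute needs no new precomposed character to be represented.
print(uses_existing_characters("q", ["\u0301"]))  # True
print(uses_existing_characters("q", ["x"]))       # False: "x" is not a mark
```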
Re: FW: Inappropriate Proposals FAQ
Suzanne T asked: Can people from the review committee give me some hard and fast rules for when something is thrown out? There's only one hard and fast rule that I know: when a majority of UTC members vote to NOT encode something. I think the criteria that UTC representatives use to determine their votes would be along the lines of Suzanne's checklist. She might be able to come up with a list of questions such that some specific negative or positive answers would lead one to the conclusion that it is or isn't worth submitting a proposal for some specified thing. One rule of thumb that people can also use: if an off-the-cuff proposal for a thing doesn't fly on the Unicode list, it is unlikely to fly in UTC. Rick
Re: FW: Inappropriate Proposals FAQ
This looks like a lot of work and it looks like it duplicates a lot of the work in the submitting new proposals section of instructions on our website and in the standard. We are getting a large number of *informal* suggestions for proposals that are more or less clearly inappropriate and spend some amount of time on the list dealing with them. However, neither UTC nor ISO/IEC JTC1/SC2/WG2 are receiving a large number of inappropriate *formal* proposals. In fact, both groups receive few proposals that don't have active support or involvement of people active in either or both bodies - and those people don't need an FAQ. I submit that, while an interesting exercise, such a FAQ is a solution in search of a problem. Having a FAQ will not do anything to keep enthusiasts from storming in with their latest brain child - since it's the nature of these informal suggestions that people never read the FAQ (any FAQ) before hitting the send button. (OK, maybe some do). At the same time, there's the risk that we maintain TWO sets of information on the same topic (what's an acceptable proposal), with all the maintenance issues and issues of which text is the binding one. I suggest that we either not do a FAQ, or do a very simple one of a few (three?) proposals that have definitely failed (or would definitely fail) as illustrations, just to establish the idea that there are inappropriate proposals, and then firmly point to the *official* document that sets out the criteria for creating a *formal* proposal. My favorite examples are
Klingon (or any of the Latin ciphers/ movie scripts)
Hexadecimal digits or 'Decimal separator'
Any proposal asking for the rearrangement/removal of characters
The decimal separator is a clear example of coding something that is already encoded using a different model and coding a character purely by function, which Unicode tends to be leery of, esp. if it can't be visually distinguished from existing characters.
I would definitely NOT like to see a detailed typology of proposals in a FAQ, with detailed script classifications etc. A./ At 12:59 PM 7/3/02 -0400, Suzanne M. Topping wrote: [...]
Re: FW: Inappropriate Proposals FAQ
At 12:59 -0400 2002-07-03, Suzanne M. Topping wrote: Can people from the review committee give me some hard and fast rules for when something is thrown out? I suspect we cannot. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: FW: Inappropriate Proposals FAQ
On Wednesday, July 3, 2002, at 11:57 AM, Asmus Freytag wrote: Klingon (or any of the Latin ciphers/ movie scripts) I'd say Klingon *and* one of the Latin ciphers. Klingon is almost worth a FAQ in itself. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: FW: Inappropriate Proposals FAQ
Suzanne, Can people from the review committee give me some hard and fast rules for when something is thrown out? As Michael Everson indicated, the answer to this is probably not. However, perhaps the most important thing for serious script proposers to do, to see if what they are concerned about might be acceptable, is to consult the Roadmap: http://www.unicode.org/roadmaps/ If a script is listed there in the Roadmap for the BMP or for Plane 1, then people can be assured that interested members of the encoding committees have *already* made a tentative determination that the script is suitable for encoding, although a proposal may not actually exist yet, and of course, there are no guarantees until the committees actually do the work on fully filled-out formal proposals. But if a script, like the MIIB BurgerKing cipher mentioned today, or chess diagram notation, is missing from the Roadmap, there is probably a *good* reason for it not to be there, and people should think twice (and then again) before they start proposing it for encoding in Unicode. --Ken Another missing example: The voice which shook the earth, from Chapter IV, verse 44 of LIBER LIBERI vel LAPIDIS LAZULI ADUMBRATIO KABBALÆ ÆGYPTIORUM, one of the Holy Books of Thelema: http://www.nuit.org/thelema/Library/HolyBooks/LibVII.html Disclaimer: The UTC New Scripts committee does not discriminate among script applicants on the basis of race, color, gender, religion, sexual orientation, national or ethnic origin, age, disability, or veteran status. However, if they are risible, we reserve the right to laugh. ;-)
RE: Inappropriate Proposals FAQ
Marco Cimarosti recently said:
- No presentation glyphs for shapes that can already be obtained using regular characters in conjunction with ZWJ or ZWNJ.
Why not just presentation glyphs in general? We seem to have queries about Indian conjuncts fairly frequently.
Some more suggestions (some of which have been covered from other angles already):
- No scripts with a limited body of text in existence. (No need to exchange or analyse on computer.) E.g. Phaistos disk script.
- No scripts which are poorly understood, where it is not clear what the characters are. E.g. Rongo-rongo.
- No symbols that are just a picture of something with no other meaning, e.g. a dog. (These tend not to have a fixed conventional form.)
- No symbols that are only used in diagrams rather than running text, e.g. electrical component symbols.
- No personal, idiosyncratic or company logos. E.g. the artist when he was not known as Prince.
- No archaic styles of existing characters. E.g. dotless j.
- No control codes for fancy text. E.g. begin bold.
- No characters that can be obtained by using a different font with existing characters and have no semantic difference from the existing characters.
- No proposals to rename existing characters. (But a clarifying note might be added.)
- No proposals to reposition existing characters, e.g. so they sort better.
- No proposals for a newly invented character on the grounds that putting it in the standard would help promote its use. (Significant usage must come first.)
Tim
--
Tim Partridge. Any opinions expressed are mine only and not those of my employer
RE: Inappropriate Proposals FAQ
Timothy Partridge included the restriction - No archaic styles of existing characters. E.g. dotless j. as something inappropriate. Question: how does one code up (presumably with markup) a caret over a jk pair in a math expression? The dot on the j should be missing for this case, but how does one communicate that to a font if there's no code for a dotless j? It seems that dotless j is needed for some mathematical purposes. Thanks Murray
Re: Inappropriate Proposals FAQ
On Wednesday, July 3, 2002, at 02:23 PM, Murray Sargent wrote: as something inappropriate. Question: how does one code up (presumably with markup) a caret over a jk pair in a math expression? The dot on the j should be missing for this case, but how does one communicate that to a font if there's no code for a dotless j? It seems that dotless j is needed for some mathematical purposes. The glyph is; the character isn't. There are also accented j's which are based on a dotless-j. The way we do it is include a glyph called dotlessj in the font, and have the tables set up so that whenever j is found with an accent, dotlessj is substituted. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
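John's description of the font-side fix can be sketched as a toy model. This is an illustration only, not how any real font engine is implemented: the glyph names `dotlessj` and `uniXXXX` are hypothetical, and a real font would express the rule in its own substitution tables. The point is that the text keeps an ordinary j (or a precomposed ĵ, which decomposes to j plus a combining circumflex); only the glyph changes when a mark above follows.

```python
import unicodedata

# Hypothetical glyph name for the dotless form. A real font would encode this
# substitution in its layout tables rather than in application code.
DOTLESS_J_GLYPH = "dotlessj"
ABOVE = 230  # canonical combining class of marks placed above the base


def mark_above_follows(text: str, start: int) -> bool:
    """True if the run of combining marks beginning at `start` contains a mark above."""
    pos = start
    while pos < len(text) and unicodedata.combining(text[pos]):
        if unicodedata.combining(text[pos]) == ABOVE:
            return True
        pos += 1
    return False


def glyphs_for(text: str) -> list[str]:
    """Map text to glyph names, using the dotless glyph for j when a mark above follows it."""
    decomposed = unicodedata.normalize("NFD", text)  # U+0135 becomes j + U+0302
    glyphs = []
    for idx, ch in enumerate(decomposed):
        if ch == "j" and mark_above_follows(decomposed, idx + 1):
            glyphs.append(DOTLESS_J_GLYPH)
        else:
            glyphs.append(f"uni{ord(ch):04X}")
    return glyphs


print(glyphs_for("jk"))             # ['uni006A', 'uni006B']
print(glyphs_for("\u0135"))         # ['dotlessj', 'uni0302']
print(glyphs_for("j\u0323"))        # ['uni006A', 'uni0323'] (dot below keeps the dot)
print(glyphs_for("j\u0323\u0302"))  # ['dotlessj', 'uni0323', 'uni0302']
```

Scanning the whole run of marks, rather than only the next character, keeps the rule working when a below mark such as U+0323 sits between the j and the accent.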
RE: Inappropriate Proposals FAQ
I would NOT like to see our committees' hands tied by taking this list as more than guidelines. I understand that it is for an FAQ, but there should be text therein to emphasize that these are not binding.

At 19:10 + 2002-07-03, Timothy Partridge wrote:
Why not just presentation glyphs in general? We seem to have queries about Indian conjuncts fairly frequently. Some more suggestions (some of which have been covered from other angles already):
- No scripts with a limited body of text in existence. (No need to exchange or analyse on computer.) E.g. Phaistos disk script.
If the Phaistos disk were bilingual and deciphered, it could be added even if there were only one document. Why not?
- No scripts which are poorly understood, where it is not clear what the characters are. E.g. Rongo-rongo.
True.
- No symbols that are just a picture of something with no other meaning, e.g. a dog. (These tend not to have a fixed conventional form.)
For instance, Blissymbols has a dog symbol in it. Granted, Blissymbols is a separate script, so maybe that isn't so convincing. But what if a series of hotel symbols, with things like NO SMOKING, NO DOGS, and GUIDE DOGS, were added? Those do have some sort of real semantic even though the glyphs may vary.
- No symbols that are only used in diagrams rather than running text, e.g. electrical component symbols.
Probably unassailable.
- No personal, idiosyncratic or company logos. E.g. the artist when he was not known as Prince.
This IS a rule.
- No archaic styles of existing characters. E.g. dotless j.
There are some archaic characters already encoded, and N'Ko is going to have two of them. Probably.
- No control codes for fancy text. E.g. begin bold.
We have BEGIN SLUR in Western Music already. Might have use for BEGIN and END CARTOUCHE in Egyptian -- or might not. Research continues.
- No characters that can be obtained by using a different font with existing characters and have no semantic difference from the existing characters.
Such as?
- No proposals to rename existing characters. (But a clarifying note might be added.)
This IS a rule.
- No proposals to reposition existing characters, e.g. so they sort better.
This IS a rule.
- No proposals for a newly invented character on the grounds that putting it in the standard would help promote its use. (Significant usage must come first.)
We did encode the GREEK KAI SYMBOL, and when I proposed it, I hoped that it would promote its use. Why? Because I saw a lot of hand-painted signage in Greece which used it, but machine-printed signage which used the AMPERSAND instead. I thought that was pretty unfortunate. But I DIDN'T invent it. It is centuries old!
Playing devil's advocate here, just a bit.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com
Re: FW: Inappropriate Proposals FAQ
At 12:17 -0700 2002-07-03, Kenneth Whistler wrote: If a script is listed there in the Roadmap for the BMP or for Plane 1, then people can be assured that interested members of the encoding committees have *already* made a tentative determination that the script is suitable for encoding, although a proposal may not actually exist yet, and of course, there are no guarantees until the committees actually do the work on fully filled-out formal proposals. Indeed, we might end up unifying Javanese and Balinese for instance. We don't know just now. But if a script, like the MIIB BurgerKing cipher mentioned today, or chess diagram notation, is missing from the Roadmap, there is probably a *good* reason for it not to be there, and people should think twice (and then again) before they start proposing it for encoding in Unicode. Well... a year ago we hadn't heard of N'Ko. Things turn up all the time. The voice which shook the earth, from Chapter IV, verse 44 of LIBER LIBERI vel LAPIDIS LAZULI ADUMBRATIO KABBALÆ ÆGYPTIORUM, one of the Holy Books of Thelema: http://www.nuit.org/thelema/Library/HolyBooks/LibVII.html Wow. Or indeed UAOAU -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Inappropriate Proposals FAQ
At 15:17 -0600 2002-07-03, John H. Jenkins wrote: On Wednesday, July 3, 2002, at 02:23 PM, Murray Sargent wrote: as something inappropriate. Question: how does one code up (presumably with markup) a caret over a jk pair in a math expression? The dot on the j should be missing for this case, but how does one communicate that to a font if there's no code for a dotless j? It seems that dotless j is needed for some mathematical purposes. The glyph is; the character isn't. There are also accented j's which are based on a dotless-j. The way we do it is include a glyph called dotlessj in the font, and have the tables set up so that whenever j is found with an accent, dotlessj is substituted. This is a very good answer and should be in the FAQ. There may be a dotless j as a character in one of the Nordic phonetic alphabets. But even if there were, it would be wrong to use it for a decomposed Esperanto J WITH CIRCUMFLEX. -- Michael Everson *** Everson Typography *** http://www.evertype.com
RE: Inappropriate Proposals FAQ
I would like to once again suggest that we refocus this 'FAQ' AWAY from a repetition of the Principles and Procedures document maintained by WG2 and containing the explanation of what constitutes a valid *formal* proposal. AWAY from any attempt to cover *all* aspects that could make a proposal inappropriate, and away from any schema for a complete classification of the universe of possible proposals. TOWARDS a set of a few -easily understood and not contentious- examples of things that have been ruled out of bounds - with a clear pointer to the formal document with its typology of scripts. (By all means, point prominently to the roadmap as well). Doing anything else will take a lot of work, both initially and in constantly tweaking it; cause a lot of confusion (if it contains many items that are in fact in a gray zone) and can weaken our understanding of which set of 'rules' are the ones we really operate under. A./ On Wed, 3 Jul 2002 23:24:01 +0100 Michael Everson [EMAIL PROTECTED] wrote: I would NOT like to see our committees' hands tied by taking this list as more than guidelines. I understand that it is for an FAQ but there should be text therein to emphasize that these are not binding.
Re: FW: Inappropriate Proposals FAQ
Ken Whistler wrote as follows. But if a script, like the MIIB BurgerKing cipher mentioned today, or chess diagram notation, is missing from the Roadmap, there is probably a *good* reason for it not to be there, and people should think twice (and then again) before they start proposing it for encoding in Unicode. Well, there may or may not be a good reason for something missing from the Roadmap, one good reason possibly being that nobody has suggested something before. I make this point since you mention chess diagram notation. I have recently published some Private Use Area code point allocations for chess diagrams. http://www.users.globalnet.co.uk/~ngo/chess.htm http://www.users.globalnet.co.uk/~ngo/chess2.htm http://www.users.globalnet.co.uk/~ngo/golden.htm I am well aware that some people are not interested in my publishing of Private Use Area allocations, but I am stating the matter here since you mentioned chess diagram encoding and said that there is probably a good reason for it not being in the Roadmap. Has the Unicode Technical Committee in fact ever discussed the possibility of whether or not to encode chess diagrams in regular Unicode at all? Just because something has not arisen before and has never been discussed by members of the 1 dollars a year for a vote group does not in any way make that matter questionable in any way whatsoever. One script which the Unicode Technical Committee has turned down is the Phaistos Disk Script. There is a document, which if I may say so is of high quality, about the proposal for encoding of the Phaistos Disk Script into Unicode, into Plane 1. It was rejected. Is this really for a purported reason that the script is only found on one item at present? I find that a strange thing. I feel that that proposal should be looked at again by the committee. How about discussing the Phaistos Disk Script in this new FAQ document? As for the script regarding the cipher in relation to a movie. 
I feel that the characters are very elegant and I noticed in particular that the characters are non-contiguous within themselves. Examining the gif file in some detail I find that a lot of care has been taken in preparing those characters and I wonder if anyone knows what are the design influences underlying the design please. Is the designer perhaps reading this list and perhaps would like to comment please? William Overington 4 July 2002
Inappropriate Proposals FAQ
As no good deed goes unpunished, my suggestion re. an FAQ entry regarding inappropriate candidates for encoding resulted in my being asked to begin a draft. I see the need for perhaps two entries: one which states clearly what Unicode is NOT, and another which lists a few examples of inappropriate proposals and why they would not be considered. This section would probably refer to the "what Unicode isn't" entry for support of the whys. I have a few ideas for fictional proposals to use as examples (my room layout idea, and Mark's 3-D Mr. Potato Head representation), but I could use another one or two if anyone feels creative. The closer to being believable, the better, I suppose. (An alternative would be to use real-life proposals, and state why they were not accepted, but I thought it more politic to keep it fictional...) I'm also looking for key points to include in the "what Unicode isn't" section, and would appreciate input. I'm particularly looking for issues that have created ongoing repetitive arguments, since the goal of the FAQ entries is to help eliminate them. Thanks in advance for your input, Suzanne Topping BizWonk Inc. [EMAIL PROTECTED]
Re: Inappropriate Proposals FAQ
But would not using rejected proposals (as well as the fictional ones) be closer to the truth and therefore more accurate? John from:Suzanne M. Topping [EMAIL PROTECTED] date:Tue, 02 Jul 2002 15:01:16 to: [EMAIL PROTECTED] subject: Re: Inappropriate Proposals FAQ (An alternative would be to use real-life proposals, and state why they were not accepted, but I thought it more politic to keep it fictional...)
RE: Inappropriate Proposals FAQ
Suzanne M. Topping wrote: I have a few ideas for fictional proposals to use as examples (my room layout idea, and Mark's 3-D Mr. Potato Head representation), but I could use another one or two if anyone feels creative.
Today I don't feel very creative, perhaps because deliberately inventing bad ideas does not appeal too much to my creativeness. :-) But perhaps I have some suggestions for the less creative part of the FAQ, which is: listing the existing policies for excluding some classes of proposals. In my understanding, a few such policies are:
- No precomposed ligatures which can be encoded using a sequence of existing characters (possibly joined by ZWJs);
- No precomposed accented characters which can be composed using an existing character and one or more existing combining diacritics;
- No clones of existing characters whose sole purpose is making a *logical* differentiation from some existing characters (e.g., hex digits looking identical to existing characters 0..9 and A...F; or a symbol for meter looking identical to Latin m);
- No clones of existing characters whose sole purpose is making a *graphical* differentiation from some existing characters (e.g., a Serbian letter t, disunified from Russian on the basis that italics look different in the two languages);
- No presentation glyphs for shapes that can already be obtained using regular characters in conjunction with ZWJ or ZWNJ.
Marco
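The first two of Marco's policies can be illustrated with Unicode normalization, using Python's standard `unicodedata` module; the helper name `nfc_code_points` is illustrative only. Under NFC, a base letter plus combining mark that already has a precomposed character collapses to that character, while a combination with no precomposed form simply stays a sequence, and that sequence is its encoding. Compatibility decomposition (NFKD) likewise shows that the legacy fi ligature is only a compatibility form of f + i.

```python
import unicodedata


def nfc_code_points(seq: str) -> list[str]:
    """Return the code points of the NFC form of `seq` as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", seq)]


# j + COMBINING CIRCUMFLEX ACCENT composes to the existing U+0135.
print(nfc_code_points("j\u0302"))  # ['U+0135']

# q + COMBINING ACUTE ACCENT has no precomposed form; the sequence is the encoding.
print(nfc_code_points("q\u0301"))  # ['U+0071', 'U+0301']

# LATIN SMALL LIGATURE FI (U+FB01) is a compatibility character for f + i.
print(unicodedata.normalize("NFKD", "\ufb01"))  # fi
```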
Re: Inappropriate Proposals FAQ
I have a few ideas:
- Fictional scripts that would probably be rejected, such as the script of the Codex Seraphinianus.
- A "fictional" Hanzi (specifically, a Hanzi made up of the "woman" radical plus the character for "walk"), which I am attaching a crude image of. The proposer either (1) used this character in a novel once (or has seen it used in a novel), or (2) wants to use it as a symbol for the length unit of the new system of measurement he invented.
Re: Inappropriate Proposals FAQ
At 12:38 -0400 2002-07-02, ÇÎÅZÅZÅZÅZ ÇÎÅZÅZÅZ wrote: I have a few ideas: Fictional scripts that would probably be rejected, such as the script of the Codex Seraphinianus Certainly not. Tengwar and Cirth are certain to be encoded. The Codex script would probably not be encoded because it occurs in only one manuscript and is undeciphered. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Inappropriate Proposals FAQ
How about symbols from electronics and hydraulics? Schematic symbols. Wm Seán Glen ----- Original Message ----- From: Suzanne M. Topping To: Unicode (E-mail) Sent: Tuesday, 02 July, 2002 7:01 Subject: Inappropriate Proposals FAQ I have a few ideas for fictional proposals to use as examples (my room layout idea, and Mark's 3-D Mr. Potato Head representation), but I could use another one or two if anyone feels creative. Thanks in advance for your input, Suzanne Topping BizWonk Inc. [EMAIL PROTECTED]
Re: Inappropriate Proposals FAQ
At 10:01 AM 7/2/2002 -0400, Suzanne M. Topping wrote: I have a few ideas for fictional proposals to use as examples (my room layout idea, and Mark's 3-D Mr. Potato Head representation), but I could use another one or two if anyone feels creative. The closer to being believable, the better, I suppose. (An alternative would be to use real-life proposals, and state why they were not accepted, but I thought it more politic to keep it fictional...) There was a discussion last year about a symbol to represent pi/2 or pi/4 or something like that. If you want to fictionalize that to some other fraction of a mathematical constant, that might work (e/2 perhaps?) Barry Caplan www.i18n.com