Re: Hmm, this evolved into an editorial when I wasn't looking :) was: RE: Inappropriate Proposals FAQ
Barry Caplan wrote:

> >> At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote:
> >> >Unicode is a character set. Period.
> >>
> >> Each character has numerous
> >> properties in Unicode, whereas they generally don't in legacy
> >> character sets.
> >
> >Each character, or some characters?
>
> For all intents and purposes, each character.
> So, each character has at least one attribute.

Yes. The implications of the Unicode Character Database include the determination that the UTC has normatively assigned properties (multiple) to all Unicode encoded characters.

Actually, it is a little more subtle than that. There are some properties which accrue to code points. The General Category and the Bidirectional Category are good examples, since they constitute enumerated partitions of the entire codespace, and APIs need to return meaningful values for any code point, including unassigned ones.

Other properties accrue more directly to characters, per se. They attach to the abstract character, and get associated with a code point more indirectly by virtue of the encoding of that character. The numeric value of a character would be a good example of this. No one expects an unassigned code point or an assigned dingbat character or a left bracket to have a numeric value property (except perhaps a future generation of Unicabbalists).

> There are no corresponding features in other character sets usually.

Correct. Before the development of the Unicode Standard, character encoding committees tended to leave property assignments either up to implementations (considering them obvious) or up to standardization committees whose charter was "character processing" -- e.g. SC22/WG15 POSIX in the ISO context.

The development of a Universal character encoding necessitated changing that, bringing character property development and standardization under the same roof as character encoding. Note that not everyone agrees about that, however.
We are still having some rather vigorous disagreements in SC22 about who "owns" the problem of standardization of character properties. > A common definition of "character set" is a list of character > you are interested in assigned to codepoints. That fits most > legacy character sets pretty well, but Unicode is sooo much > more than that. Roughly the distinction I was drawing between "the Unicode CCS" and "the Unicode Standard". > But what if we took a look at it from a different point of view, > that the standard is a agreed upon set of rules and building > blocks for text oriented algorithms? Would people start to > publish algorithms that extend on the base data provided so > we don't have to reinvent wheels all the time? Well the "Unicode Standard" isn't that, although it contains both formal and informal algorithms for accomplishing various tasks with text, and even more general "guidelines" for how to do things. The members of the Unicode Technical Committee are always casting about for areas of Unicode implementation behavior where commonly defined, public algorithms would be mutually beneficial for everyone's implementations and would assist general interoperability with Unicode data. To date, it seems to me that the members, as well as other participants in the larger effort of implementing the Unicode Standard, have been rather generous in contributing time and brainpower to this development of public algorithms. The fact that ICU is an Open Source development effort is enormously helpful in this regard. > If I were to stand in front of a college comp sci class, > where the future is all ahead of the students, what proportion > of time would I want to invest in how much they knew about legacy > encodings versus how much I could inspire them to build from and > extend what Unicode provides them? This problem, of Unicode in the computer science curriculum, intrigues me -- and I don't think it has received enough attention on this list. 
One of my concerns is that even now it seems to be that CS curricula not only don't teach enough about Unicode -- they basically don't teach much about characters, or text handling, or anything in the field of internationalization. It just isn't an area that people get Ph.D.'s in or do research in, and it tends to get overlooked in people's education until they go out, get a job in industry and discover that in the *real* world of software development, they have to learn about that stuff to make software work in real products. (Just like they have to do a lot of seat-of-the-pants learning about a lot of other topics: building, maintaining, and bug-fixing for large, legacy systems; software life cycle; large team cooperative development process; backwards compatibility -- almost nothing is really built from scratch!) > > The major work ahead is no longer in the context of building > a character standard. Time is fast approaching to decide to keep > it small and apply a bit of polish, or focus on the use and
Re: What Unicode Is (was RE: Inappropriate Proposals FAQ)
At 03:54 PM 7/12/2002 -0700, Kenneth Whistler wrote:
>Suzanne responded:
>
>> > Maybe Unicode is more of a shared set of rules that apply to
>> > low level data structures surrounding text and its algorithms
>> > than a character set.
>
>O.k., so now before asserting or denying that "Unicode ... is
>a shared set of rules", it would be helpful to pin down
>first what you are referring to. That might make the ensuing
>debate more fruitful.

Actually, it was me, not Suzanne, who called "Unicode" a shared set of rules. As Ferris Bueller once said, "I'll take the heat for this."

I was aware of all of the uses of Unicode that you listed. I have no quarrels with any of them. They do point to the fact that the word is overloaded with definitions, which means that readers have to choose the appropriate one from the context.

The context of the statement above is that the "Unicode" referred to is the Standard, and all associated documentation. Not Unicode the Consortium, which manages the Standard. Not Unicode the way of life :)

I did intend to throw open a debate about the long term future of Unicode the Standard and by extension Unicode the Consortium.

Since Suzanne is writing "What is Unicode and is not Unicode FAQ", I think the answer to that is going to be very definitely colored by the answer to the related question "What will Unicode become?", e.g. Unicode 6.0, 7.0, 8.0, etc.

See my previous msg, subject line: "Hmm, this evolved into an editorial when I wasn't looking :) " for some thoughts on that subject.

Barry Caplan
www.i18n.com
What Unicode Is (was RE: Inappropriate Proposals FAQ)
Suzanne responded:

> > Maybe Unicode is more of a shared set of rules that apply to
> > low level data structures surrounding text and its algorithms
> > than a character set.
>
> Sounds like the start of a philosophical debate.
>
> If Unicode is described as a set of rules, we'll be in a world of hurt.

> (On a serious note, these exceptions are exactly what make writing some
> sort of "is and isn't" FAQ pretty darned hard.

Hmm. Since the discussion which started out trying to specify a few examples of what kinds of entities would be inappropriate to proffer for encoding as Unicode characters seems to be in danger of mutating into the recurrent "What is Unicode?" question, perhaps it's time to start a new thread for the latter.

And now for some ontological ground rules. When trying to decide what a "thing" is, it helps not to use an attribute nominatively, since that encourages people to privately visualize the noun the attribute is applied to, but to do so in different ways -- and then to argue past each other because they are, in the end, talking about different things.

"Unicode" is used attributively of a number of things, and if we are going to start arguing/discussing what "it" is, it would be better to lay out the alternative "it"s a little more specifically first.

1. The Unicode *Consortium* is a standardization organization. It started out with a charter to produce a single standard, but along the way has expanded that charter, in response to the desire of its membership. In addition to "The Unicode Standard", it now has adopted a terminology that refers to some of its other publications as "Unicode Technical Standards" [UTS], of which two formally exist now: UTS #6 SCSU, and UTS #10 Unicode Collation Algorithm [UCA].

It is important to keep this straight, because some people, when they say "Unicode" are talking about the *organization*, rather than the Unicode Standard per se.
And when people talk about "the standard", they are generally referring to "The Unicode Standard", but the Unicode Consortium is actually responsible for several standards. 2. The Unicode *Standard* itself is a very complex standard, consisting of many pieces now. To keep track of just what something like "The Unicode Standard, Version 3.2" means, we now have to keep web pages enumerating all the parts exactly -- like components in an assemble-your-own-furniture kit. See: http://www.unicode.org/unicode/standard/versions/ In any one particular version, the Unicode Standard now consists of a book publication, some number of web publications (referred to as Unicode Standard Annexes [UAX]), and a large number of contributory data files -- some normative and some informative, some data and some documentation. These definitions, including the exact list of contributory data files and their versions, are themselves under tight control by the Unicode Technical Committee, as they constitute the very *definition* of the Unicode Standard. It is not by accident that the version definitions start off now with the following wording: "The Unicode Standard, Version 3.2.0 is defined by the following list..." and so on for earlier versions. 3. The Unicode *Book* is a periodic publication, constituting the central document for any given version of the Unicode *Standard*, but is by no means the entire standard. The book, in turn, is very complex, consisting of many chapters and parts, some of which constitute tightly controlled, normative specification, and some of which is informative, editorial content. The "book" now also exists in an online version (pdf files): http://www.unicode.org/unicode/uni2book/u2.html which is *almost* identical to the published hardcover book, but not quite. (The Introduction is slightly restructured, the online glossary is restructured and has been added to, the charts are constructed slightly differently and have introductory pages of their own, etc.) 4. 
The Unicode *CCS* [coded character set] is the mapping of the set of abstract characters contained in the Unicode repertoire (at any given version) to a bunch of code points in the Unicode codespace (0x0000..0x10FFFF).

Technically speaking, it is the Unicode *CCS* which is synchronized closely with ISO/IEC 10646, rather than the Unicode *Standard*. 10646 and the Unicode CCS have exactly the same coded characters (at various key synchronization points in their joint publication histories), but the *text* of the ISO/IEC 10646 standard doesn't look anything like the *text* of the Unicode Standard, and the Unicode Standard [sensu #2 above] contains all kinds of material, both textual and data, that goes far beyond the scope of 10646.

There are other standards produced by some national bodies that are effectively just translations of 10646 (GB 13000 in China, JIS X 0221 in Japan), but the Unicode Standard is nothing like those.

Finally, the attribute "Unicode ..." can be applied to all kinds of other "things" characteristic of the Unicode Sta
Hmm, this evolved into an editorial when I wasn't looking :) was: RE: Inappropriate Proposals FAQ
At 05:13 PM 7/12/2002 -0400, Suzanne M. Topping wrote:
>> -Original Message-
>> From: Barry Caplan [mailto:[EMAIL PROTECTED]]
>>
>> At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote:
>> >Unicode is a character set. Period.
>>
>> Each character has numerous
>> properties in Unicode, whereas they generally don't in legacy
>> character sets.
>
>Each character, or some characters?

For all intents and purposes, each character. Chapter 4.5 of my Unicode 3.0 book says "The Unicode Character Database on the CDROM defines a General Category for all Unicode characters".

So, each character has at least one attribute. One could easily say that each character also has an attribute for "isUpperCase" of either true or false, and so on.

There are no corresponding features in other character sets usually.

>> Maybe Unicode is more of a shared set of rules that apply to
>> low level data structures surrounding text and its algorithms
>> than a character set.
>
>Sounds like the start of a philosophical debate.

Not really. I have been giving presentations for years, and I have seen many others give similar presentations. A common definition of "character set" is a list of characters you are interested in assigned to codepoints. That fits most legacy character sets pretty well, but Unicode is sooo much more than that.

>If Unicode is described as a set of rules, we'll be in a world of hurt.

Yeah, one of the heaviest books I own is Unicode 3.0. I keep it on a low shelf so the book of rules describing Unicode doesn't fall on me for just that reason. This is earthquake country after all :)

>I choose to look at this stuff as the exceptions that make the rule.

I don't really know if it is possible to break down Unicode into more fundamental units if you started over. Its complexity is inherent in the nature of the task.
My own interest is more in getting things done with data and algorithms that use the type of material represented by the Unicode standard, more so than the arcana of the standard itself. So it doesn't bother me so much that there are exceptions - as long as we have the exceptions that everyone agrees on, that is fine by me because it means my data and at least some of my algorithms are likely to be preservable across systems.

>(On a serious note, these exceptions are exactly what make writing some
>sort of "is and isn't" FAQ pretty darned hard.

Be careful what you ask for :)

>I can't very well say
>that Unicode manipulates characters given certain historical/legacy
>conditions and under duress.

Why not? It is true.

But what if we took a look at it from a different point of view, that the standard is an agreed-upon set of rules and building blocks for text oriented algorithms? Would people start to publish algorithms that extend on the base data provided so we don't have to reinvent wheels all the time?

I'm just brainstorming here, this is all just coming to me now.

If I were to stand in front of a college comp sci class, where the future is all ahead of the students, what proportion of time would I want to invest in how much they knew about legacy encodings versus how much I could inspire them to build from and extend what Unicode provides them?

Seriously, most of the folks on this list that I know personally, and I include myself in this category, are approaching or past the halfway point in our careers. What would we want the folks who are just starting their careers now to know about Unicode and do with it by the time they reach the end of theirs, long after we have stopped working?

For many applications, people are not going to specialize in i18n/l10n issues. They need to know what the appropriate text-based building blocks are, and how they can expand on them while still building whatever they are working on.
Unicode at least hints at this with the bidi algorithm. Moving forward, should other algorithms be codified into Unicode, or as separate standards or de facto standards?

I am thinking of "Japanese word splitting algorithm". There are proprietary products that do this today with reasonable but not perfect results. Are they good enough that the rules can be encoded into a standard? If so, then someone would build an open implementation, and then there would always be this building block available for people to use.

I am sure everyone on this list can think of their own favorite algorithms of this type, based on the part of Unicode that interests you the most.

My point is that the raw information already in Unicode *does* suggest the next level of usage, and the repeated newbie questions that inspired this thread suggest the need for a comprehensive solution at a higher level than a character set provides. Maybe part of this means including or at least facilitating the description of low-level text handling algorithms.

>If I did, people would be scurrying around
>trying to figure out how to foment the duress.)

The acc
Status update re. Inappropriate Proposals FAQ
I'm nearly done playing catchup after vacation and hope to begin extracting concepts for the FAQ next week. Thanks to all who've submitted input, as conflicting and varied as it is/was. > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, July 03, 2002 8:53 > > > I would like to once again suggest that we refocus this 'FAQ' > > AWAY from a repetition of the "Principles and Procedures" > document maintained > by WG2 and containing the explanation of what constitutes a > valid *formal* > proposal.
RE: Inappropriate Proposals FAQ
> -Original Message- > From: Barry Caplan [mailto:[EMAIL PROTECTED]] > > At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote: > >Unicode is a character set. Period. > > Each character has numerous > properties in Unicode, whereas they generally don't in legacy > character sets. Each character, or some characters? > Maybe Unicode is more of a shared set of rules that apply to > low level data structures surrounding text and its algorithms > then a character set. Sounds like the start of a philosophical debate. If Unicode is described as a set of rules, we'll be in a world of hurt. > The Unicode consortium very wisely keeps it's focus narrow. > It provides > >a mechanism for specifying characters. Not for manipulating them, not > >for describing them, not for making them twinkle. > > All true, except for some special cases (BOM, bidi issues and > algoirthms, vertical variants, etc).Not saying those > shouldn't be in there, just that they are useful only in the > use of algorithms that are explicit (bi-di) or assumed (upper > case/lower case, vertical/horizontal) etc. Why mess up a nice clean statement simply because of a few hard facts? I choose to look at this stuff as the exceptions that make the rule. (On a serious note, these exceptions are exactly what make writing some sort of "is and isn't" FAQ pretty darned hard. I can't very well say that Unicode manipulates characters given certain historical/legacy conditions and under duress. If I did, people would be scurrying around trying to figure out how to foment the duress.)
RE: Inappropriate Proposals FAQ
At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote:
>Unicode is a character set. Period.

Well, maybe. But in a much broader sense than the character sets it subsumes in its listings. Each character has numerous properties in Unicode, whereas they generally don't in legacy character sets.

Maybe Unicode is more of a shared set of rules that apply to low level data structures surrounding text and its algorithms than a character set.

>The Unicode consortium very wisely keeps its focus narrow. It provides
>a mechanism for specifying characters. Not for manipulating them, not
>for describing them, not for making them twinkle.

All true, except for some special cases (BOM, bidi issues and algorithms, vertical variants, etc). Not saying those shouldn't be in there, just that they are useful only in the use of algorithms that are explicit (bi-di) or assumed (upper case/lower case, vertical/horizontal) etc.

In many cases, these algorithms are not well known, even amongst the cognoscenti, or generally available in nice libraries. Anyone for an open source Japanese word splitting library? (I know not taking a look at ICU before I press send is going to come back to haunt me on this, but if it is in there, then substitute something that isn't :)

Barry Caplan
www.i18n.com
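The claim that each character carries properties, and that some behavior (case, directionality) only shows up through algorithms, can be checked against the Unicode Character Database. What follows is a minimal sketch, assuming Python's standard `unicodedata` module (the UCD version it bundles depends on the Python release); the helper name `describe` is made up for illustration.

```python
import unicodedata

def describe(ch):
    """Return a few UCD-derived properties for one character (illustrative helper)."""
    return {
        "category": unicodedata.category(ch),      # General Category, e.g. 'Ll'
        "bidi": unicodedata.bidirectional(ch),     # Bidirectional class, e.g. 'R'
        "numeric": unicodedata.numeric(ch, None),  # None when there is no numeric value
        "upper": ch.upper(),                       # full uppercase mapping
    }

# Latin letter, digit, left bracket, Hebrew alef, German sharp s
for ch in ("a", "5", "[", "\u05d0", "\u00df"):
    print(f"U+{ord(ch):04X}", describe(ch))
```

Every character in the sample gets a General Category and a bidirectional class, while the numeric value is present only for the digit. The uppercase of U+00DF comes back as two characters, which is one reason case conversion behaves more like an algorithm over property data than like a lookup in a character list.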
RE: Inappropriate Proposals FAQ
Apologies for the delayed response to this thread; I've been out of town.

-Original Message-
> From: William Overington [mailto:[EMAIL PROTECTED]]
> Sent: Friday, July 05, 2002 10:22 AM
>
> For the avoidance of doubt I am not saying that the Unicode Technical
> Committee should necessarily accept such items as your
> furniture idea for
> encoding, I am simply saying that any decision as to what may
> be encoded and
> what shall and what shall not be encoded should be made by the Unicode
> Technical Committee on the basis of the scientific situation
> at the time
> that an encoding proposal is formally considered. I feel
> that it would be a
> major error for the Unicode Consortium to publish a FAQ document which
> prejudices the fair consideration of characters based upon
> new technologies
> which may arise in the future.

While your thoughts on executing the floor plan idea are truly gobsmacking, I have to confess that I'd raised the concept precisely because it is -not- an appropriate issue for Unicode.

Unicode is a character set. Period.

When setting out on any endeavor, you have to be clear on what the intent is. If you want to go to the park and have a picnic, you set out parameters for that activity. If you allow a bunch of people to stop you along the way to buy shoes, see a movie, visit their aunt in the hospital, and get the oil changed in their car, you probably won't be able to accomplish the initial goal.

If you develop a program for creating room layouts using graphics of furniture and architectural details, you probably shouldn't include modules to manage the drug histories of AIDS victims. That doesn't make tracking drug histories of AIDS victims unimportant; it means that they aren't a logical set of requirements to add to a room layout program.

The Unicode consortium very wisely keeps its focus narrow. It provides a mechanism for specifying characters. Not for manipulating them, not for describing them, not for making them twinkle.
You clearly have widely ranging ideas for unique text and symbol applications. It would be great if you could channel that energy into developing ideas for a manipulation layer that could take Unicode characters, manipulate them, and deliver them in a cross-platform portable way which would allow them to be displayed and used in the ways that you envision. As recent discussions on this list have shown, Unicode is just one piece in the puzzle. Font and rendering issues for many languages remain serious stumbling blocks for actual use, even though the characters themselves are encoded. Any work you could do toward advancement of a manipulation layer that would ease the task of rendering characters as they are actually needed and used would be a tremendous boon. I would imagine that you would find a reasonable level of interest from a wide range of communities; font developers, bidi word processing developers, accessibility experts, minority script advocates, etc. I'll bet that some of the regular old Unicodies might even want to listen in. It would be sad if your energy and enthusiasm were dampened by the repeated denials you receive through this list. The ideas you generate are interesting, and often worth investigation. However, they are not appropriate additions to Unicode. I'm setting up a new group which can hopefully act as an appropriate venue for these types of discussions. As soon as I come up with a decent name for it, I'll send off an invitation with instructions for joining to the Unicode list. All the best, Suzanne Topping BizWonk Inc. [EMAIL PROTECTED]
Re: Inappropriate Proposals FAQ
William,

For the gods' sake rein in those hares.

Interchange protocols for architectural computer-aided design already exist. Character encoding does not apply to anything like that, because there aren't any characters. Object code has nothing to do with character encoding.

Your caveat, that you are saying that

>any decision as to what may be encoded and what shall and what shall
>not be encoded should be made by the Unicode Technical Committee on
>the basis of the scientific situation at the time that an encoding
>proposal is formally considered. I feel that it would be a major
>error for the Unicode Consortium to publish a FAQ document which
>prejudices the fair consideration of characters based upon new
>technologies which may arise in the future.

is completely unnecessary. We know quite well what we are doing. We are hoping that with diligent study you will figure it out and get on board. But as Ken has said, there is no scientific theory left to puzzle out. There may be arguments as to what specific symbols we wish to add (some people hate them, some people like them) and there the question is one of usage and the semantics of the symbols in general.
--
Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Inappropriate Proposals FAQ
Suzanne M. Topping wrote as follows. >I see the need for perhaps two entries: one which states clearly what >Unicode is NOT, and another which lists a few examples of innapropriate >proposals and why they would not be considered. This section would >probably refer to the "what Unicode isn't" entry for support of the >"why"s. > >I have a few ideas for fictional proposals to use as examples (my room >layout idea, and Mark's 3-D Mr. Potato Head representation), but I could >use another one or two if anyone feels creative. The closer to being >believable, the better, I suppose. (An alternative would be to use >real-life proposals, and state why they were not accepted, but I thought >it more politic to keep it fictional...) Well, having seen your furniture and room layout idea, presented in the Unicode list, I figured out the method to use to enable your room layout idea to be produced, using the technique, novel as far as I know, of allowing a glyph to contain some software which could be obeyed by the rendering system so as to rotate the points of the Bézier curves of the contours of the glyphs of the items of furniture. This seems to me to be something of a breakthrough in the possibilities for fonts, as including software inside a font which could be obeyed by the rendering system would allow a rendering system to be customized from within a font. It would seem a pity to restrict the future development of the concept by a Unicode Consortium issued FAQ document stating that Unicode will not encode such symbols when it seems that it would be relatively straightforward to implement such fonts. The font would need to contain the software that is to be obeyed. This could be organized so as to be accessed when a glyph is selected, with a central place within the font to store any subroutines called from within the software of the individual glyphs. 
If this software were in some appropriate portable software format, then the specification of the font format would perhaps not be that difficult; it could be part of an advanced font format that supports both chromatic font information and software in the fonts. For example, the software in the font could be specified to be written in 1456 object code.

http://www.users.globalnet.co.uk/~ngo/1456.htm

1456 object code already supports double precision floating point items, integers, characters, strings, complex numbers and quaternions as standard types. Groups are also supported as a type experimentally.

Consideration of this concept of software within the font has led to consideration of how the position and rotation angle of the individual items of furniture could be set to an initial position from within the document, and also of how they could be adjusted by the end user using facilities set up from within the document. This has led to the idea of having the document be able to open and customize a control panel, which could contain buttons and scrollbars and so on, and also a polar scrollbar for continuous rotational adjustment. It would seem, given the fact that 1456 object code supports quaternions and also has some functions of a quaternion variable built in as standard, that this could be extended to three-dimensional rotations quite straightforwardly for applications that could use three-dimensional rotations.

This is the sort of computational power which I feel that multimedia should be able to utilize, by including Unicode codes directly in a text file, so that the rendering system produces the control panel as instructed by the Unicode codes. This seems to be directly permissible within the definition of character in Annex B of the ISO document which was discussed recently, though perhaps not within the definition of character used by the Unicode Consortium at the present time.
I feel that such ideas should not be thrown out by the Unicode Consortium publishing a FAQ document which would prevent it considering for inclusion glyphs in regular Unicode which could make good use of such technological advances. For the avoidance of doubt I am not saying that the Unicode Technical Committee should necessarily accept such items as your furniture idea for encoding, I am simply saying that any decision as to what may be encoded and what shall and what shall not be encoded should be made by the Unicode Technical Committee on the basis of the scientific situation at the time that an encoding proposal is formally considered. I feel that it would be a major error for the Unicode Consortium to publish a FAQ document which prejudices the fair consideration of characters based upon new technologies which may arise in the future. William Overington 5 July 2002
RE: Inappropriate Proposals FAQ
I would like to once again suggest that we refocus this 'FAQ' AWAY from a repetition of the "Principles and Procedures" document maintained by WG2 and containing the explanation of what constitutes a valid *formal* proposal. AWAY from any attempt to cover *all* aspects that could make a proposal inappropriate, and away from any schema for a complete classification of the universe of possible proposals. TOWARDS a set of a few -easily understood and not contentious- examples of things that have been ruled out of bounds - with a clear pointer to the formal document with its typology of scripts. (By all means, point prominently to the roadmap as well). Doing anything else will take a lot of work, both initially and in constantly tweaking it; cause a lot of confusion (if it contains many items that are in fact in a gray zone) and can weaken our understanding of which set of 'rules' are the ones we really operate under. A./ On Wed, 3 Jul 2002 23:24:01 +0100 Michael Everson <[EMAIL PROTECTED]> wrote: I would NOT like to see our committees' hands tied by taking this list as more than guidelines. I understand that it is for an FAQ but there should be text therein to emphasize that these are not binding.
Re: Inappropriate Proposals FAQ
At 15:17 -0600 2002-07-03, John H. Jenkins wrote: >On Wednesday, July 3, 2002, at 02:23 PM, Murray Sargent wrote: > >>as something inappropriate. Question: how does one code up (presumably >>with markup) a caret over a jk pair in a math expression? The dot on the >>j should be missing for this case, but how does one communicate that to >>a font if there's no code for a dotless j? It seems that dotless j is >>needed for some mathematical purposes. >> > >The glyph is; the character isn't. There are also accented j's >which are based on a dotless-j. The way we do it is include a glyph >called "dotlessj" in the font, and have the tables set up so that >whenever "j" is found with an accent, dotlessj is substituted. This is a very good answer and should be in the FAQ. There may be a dotless j as a character in one of the Nordic phonetic alphabets. But even if there were, it would be wrong to use it for a decomposed Esperanto J WITH CIRCUMFLEX. -- Michael Everson *** Everson Typography *** http://www.evertype.com
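The substitution John describes can be sketched in a few lines. This is only an illustration under stated assumptions: the glyph names (`dotlessj`, `circumflexcomb`) and the function `shape` are hypothetical and do not correspond to any real font table format. The decomposition and combining-class lookups use Python's standard `unicodedata` module.

```python
import unicodedata

# Hypothetical glyph names; a real font defines its own.
GLYPH_NAMES = {"j": "j", "k": "k", "\u0302": "circumflexcomb"}

def shape(text):
    """Map characters to glyph names, using 'dotlessj' when j is followed by an accent above."""
    # Decompose first, so U+0135 (j with circumflex) becomes j + U+0302.
    chars = unicodedata.normalize("NFD", text)
    glyphs = []
    for i, ch in enumerate(chars):
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        # Canonical combining class 230 means the mark is placed above the base.
        if ch == "j" and nxt and unicodedata.combining(nxt) == 230:
            glyphs.append("dotlessj")
        else:
            glyphs.append(GLYPH_NAMES.get(ch, f"uni{ord(ch):04X}"))
    return glyphs

print(shape("\u0135"))  # precomposed j with circumflex
print(shape("jk"))      # plain j keeps its dot
```

The character data stays an ordinary `j` plus a combining mark throughout; only the glyph chosen for display changes, which is the sense in which the glyph is needed and the character is not.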
RE: Inappropriate Proposals FAQ
I would NOT like to see our committees' hands tied by taking this list as more than guidelines. I understand that it is for an FAQ but there should be text therein to emphasize that these are not binding.

At 19:10 + 2002-07-03, Timothy Partridge wrote:

>Why not just presentation glyphs in general? We seem to have queries about
>Indian conjuncts fairly frequently.
>
>Some more suggestions (some of which have been covered from other angles already)
>
>- No scripts with a limited body of text in existence. (No need to exchange
>or analyse on computer.) E.g. Phaistos disk script

If the Phaistos disk were bilingual and deciphered, it could be added even if there were only one document. Why not?

>- No scripts which are poorly understood and it is not clear as to what the
>characters are. E.g. Rongo-rongo.

True.

>- No symbols that are just a picture of something with no other meaning e.g.
>a dog. (These tend not to have a fixed conventional form.)

For instance, Blissymbols has a dog symbol in it. Granted, Blissymbols is a separate script so maybe that isn't so convincing. But what if a series of hotel symbols were added, with things like NO SMOKING, NO DOGS, GUIDE DOGS? Those do have some sort of real semantics even though the glyphs may vary.

>- No symbols that are only used in diagrams rather than running text. e.g.
>electrical component symbols.

Probably unassailable.

>- No personal, idiosyncratic or company logos. E.g. the artist when he was
>not known as Prince.

This IS a rule.

>- No archaic styles of existing characters. E.g. dotless j.

There are some archaic characters already encoded, and N'Ko is going to have two of them. Probably.

>- No control codes for fancy text. E.g. begin bold

We have BEGIN SLUR in Western Music already. Might have use for BEGIN and END CARTOUCHE in Egyptian -- or might not. Research continues.

>- No characters that can be obtained by using a different font with existing
>characters and have no semantic difference from the existing characters.

Such as?

>- No proposals to rename existing characters. (But a clarifying note
>might be added.)

This IS a rule.

>- No proposals to reposition existing characters, e.g. so they sort better.

This IS a rule.

>- No proposals for a newly invented character since putting it in the
>standard would help promote its use. (Significant usage must come first.)

We did encode the GREEK KAI SYMBOL, and when I proposed it, I hoped that it would promote its use. Why? Because I saw a lot of hand-painted signage in Greece which used it, but machine-printed signage which used the AMPERSAND instead. I thought that was pretty unfortunate. But I DIDN'T invent it. It is centuries old!

Playing devil's advocate here, just a bit.

--
Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Inappropriate Proposals FAQ
On Wednesday, July 3, 2002, at 02:23 PM, Murray Sargent wrote:

> as something inappropriate. Question: how does one code up (presumably
> with markup) a caret over a jk pair in a math expression? The dot on the
> j should be missing for this case, but how does one communicate that to
> a font if there's no code for a dotless j? It seems that dotless j is
> needed for some mathematical purposes.
>

The glyph is; the character isn't. There are also accented j's which are based on a dotless-j. The way we do it is to include a glyph called "dotlessj" in the font, and have the tables set up so that whenever "j" is found with an accent, dotlessj is substituted.

==
John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
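The substitution described above can be sketched roughly as follows. This is only an illustration in Python, not how any particular font or rendering engine does it: real fonts carry the rule in their layout tables, and the `shape` function, the `uniXXXX` glyph names, and the restriction to marks of combining class 230 (above) are all assumptions made for the example. Only `unicodedata.combining` is a real library call.

```python
import unicodedata

ABOVE = 230  # canonical combining class of marks that attach above the base


def shape(text):
    """Map each character to a hypothetical glyph name.

    A 'j' that is immediately followed by a combining mark placed above
    is mapped to the 'dotlessj' glyph; everything else gets a generic
    'uniXXXX' name derived from its code point.
    """
    glyphs = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "j" and nxt and unicodedata.combining(nxt) == ABOVE:
            glyphs.append("dotlessj")
        else:
            glyphs.append(f"uni{ord(ch):04X}")
    return glyphs


# j + COMBINING CIRCUMFLEX ACCENT, then k
print(shape("j\u0302k"))
```

The encoded text still contains an ordinary "j"; only the glyph chosen for display changes, which is the distinction between character and glyph drawn in the reply.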
RE: Inappropriate Proposals FAQ
Timothy Partridge included the restriction - No archaic styles of existing characters. E.g. dotless j. as something inappropriate. Question: how does one code up (presumably with markup) a caret over a jk pair in a math expression? The dot on the j should be missing for this case, but how does one communicate that to a font if there's no code for a dotless j? It seems that dotless j is needed for some mathematical purposes. Thanks Murray
RE: Inappropriate Proposals FAQ
Marco Cimarosti recently said:

> - No presentation glyphs for shapes that can already be obtained using
> regular characters in conjunction with ZWJ or ZWNJ.

Why not just presentation glyphs in general? We seem to have queries about Indian conjuncts fairly frequently.

Some more suggestions (some of which have been covered from other angles already)

- No scripts with a limited body of text in existence. (No need to exchange or analyse on computer.) E.g. Phaistos disk script

- No scripts which are poorly understood and it is not clear as to what the characters are. E.g. Rongo-rongo.

- No symbols that are just a picture of something with no other meaning e.g. a dog. (These tend not to have a fixed conventional form.)

- No symbols that are only used in diagrams rather than running text. e.g. electrical component symbols.

- No personal, idiosyncratic or company logos. E.g. the artist when he was not known as Prince.

- No archaic styles of existing characters. E.g. dotless j.

- No control codes for fancy text. E.g. begin bold

- No characters that can be obtained by using a different font with existing characters and have no semantic difference from the existing characters.

- No proposals to rename existing characters. (But a clarifying note might be added.)

- No proposals to reposition existing characters, e.g. so they sort better.

- No proposals for a newly invented character since putting it in the standard would help promote its use. (Significant usage must come first.)

Tim

--
Tim Partridge. Any opinions expressed are mine only and not those of my employer
Re: Inappropriate Proposals FAQ
At 10:01 AM 7/2/2002 -0400, Suzanne M. Topping wrote:
>I have a few ideas for fictional proposals to use as examples (my room
>layout idea, and Mark's 3-D Mr. Potato Head representation), but I could
>use another one or two if anyone feels creative. The closer to being
>believable, the better, I suppose. (An alternative would be to use
>real-life proposals, and state why they were not accepted, but I thought
>it more politic to keep it fictional...)

There was a discussion last year about a symbol to represent pi/2 or pi/4 or something like that. If you want to fictionalize that to some other fraction of a mathematical constant, that might work (e/2 perhaps?)

Barry Caplan
www.i18n.com
Re: Inappropriate Proposals FAQ
How about symbols from electronics and hydraulics? Schematic symbols.

Wm Seán Glen

- Original Message -
From: Suzanne M. Topping
To: Unicode (E-mail)
Sent: Tuesday, 02 July, 2002 7:01
Subject: Inappropriate Proposals FAQ

I have a few ideas for fictional proposals to use as examples (my room layout idea, and Mark's 3-D Mr. Potato Head representation), but I could use another one or two if anyone feels creative.

Thanks in advance for your input,
Suzanne Topping
BizWonk Inc.
[EMAIL PROTECTED]
Re: Inappropriate Proposals FAQ
At 12:38 -0400 2002-07-02, [sender name garbled in archive] wrote:
>I have a few ideas:
>
>Fictional scripts that would probably be rejected, such as the
>script of the Codex Seraphinianus

Certainly not. Tengwar and Cirth are certain to be encoded. The Codex script would probably not be encoded because it occurs in only one manuscript and is undeciphered.

--
Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Inappropriate Proposals FAQ
I have a few ideas: Fictional scripts that would probably be rejected, such as the script of the Codex Seraphinianus A "fictional" Hanzi (specifically, a Hanzi made up of the "woman" radical plus the character for "walk"), which I am attaching a crude image of. The proposer either (1) used this character in a novel once (or has seen it used in a novel), or (2) he wants to use it as a symbol for the length unit of the new system of measurement he invented.
RE: Inappropriate Proposals FAQ
Suzanne M. Topping wrote:
> I have a few ideas for fictional proposals to use as examples (my room
> layout idea, and Mark's 3-D Mr. Potato Head representation),
> but I could use another one or two if anyone feels creative.

Today I don't feel very creative, perhaps because deliberately inventing bad ideas does not appeal much to my creativity. :-)

But perhaps I have some suggestions for the less creative part of the FAQ, which is: listing the existing policies for excluding some classes of proposals. In my understanding, a few such policies are:

- No precomposed ligatures which can be encoded using a sequence of existing characters (possibly joined by ZWJs);

- No precomposed "accented characters" which can be composed using an existing character and one or more existing combining diacritics;

- No clones of existing characters whose sole purpose is making a *logical* differentiation from some existing characters (e.g., hex digits looking identical to existing characters "0..9" and "A...F"; or a symbol for "meter" looking identical to Latin "m");

- No clones of existing characters whose sole purpose is making a *graphical* differentiation from some existing characters (e.g., a Serbian letter "t", disunified from Russian on the basis that italics looks different in the two languages);

- No presentation glyphs for shapes that can already be obtained using regular characters in conjunction with ZWJ or ZWNJ.

_ Marco
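The second policy in the list above (no new precomposed accented characters) can be illustrated with a small sketch using Python's standard `unicodedata` module. The helper name `describe` is made up for the example. It normalizes two base-plus-circumflex sequences to NFC: one has a precomposed form from legacy encodings, the other does not and simply stays a two-code-point sequence, which is the intended way to represent it.

```python
import unicodedata


def describe(seq):
    """Return the code points of seq after NFC normalization."""
    composed = unicodedata.normalize("NFC", seq)
    return [f"U+{ord(ch):04X}" for ch in composed]


# j + COMBINING CIRCUMFLEX ACCENT: a precomposed letter already exists.
print(describe("j\u0302"))

# k + COMBINING CIRCUMFLEX ACCENT: no precomposed letter; the sequence stands.
print(describe("k\u0302"))
```

Both strings are equally valid text. The absence of a single code point for the second one is not a gap that a proposal would need to fill.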
Re: Inappropriate Proposals FAQ
But would not using rejected proposals (as well as the fictional ones) be closer to the truth and therefore more accurate?

John

> from: "Suzanne M. Topping" <[EMAIL PROTECTED]>
> date: Tue, 02 Jul 2002 15:01:16
> to: [EMAIL PROTECTED]
> subject: Re: Inappropriate Proposals FAQ
>
> (An alternative would be to use
> real-life proposals, and state why they were not accepted, but I thought
> it more politic to keep it fictional...)
>