Re: [Fwd: Re: Unicode Devanagari Font in Mozilla]
Prabhat Hegde wrote,

> * Indic scripts do not have any standard (as in published/registered/
> recognized) font encoding that i know of.

ISCII is the standard for Indic scripts.

> * Indian language web-sites mis-use the charset tag "x-user-defined".

So do some non-Indian language web sites.

Best regards,
James Kass.
Re: ?? Unicode ??
>How do I use Unicode characters?
>Which text editors support Unicode?
>To use Unicode, what do I need to do, download, or upgrade?

1) Most Web browsers can be set to show pages which are in Unicode.
2) Unicode character codes are useful in JavaScript.
3) Some word processing programs have an option to save text files in Unicode.

> Thanks
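The three answers above can be made concrete with a short Python sketch (Python here stands in for any Unicode-aware environment; the same "\u00e9" escape syntax also works in JavaScript string literals):

```python
# "\u00e9" is the escape for U+00E9, LATIN SMALL LETTER E WITH ACUTE.
s = "caf\u00e9"
print(ord(s[-1]) == 0xE9)         # True

# "Saving text in Unicode" just means encoding it, e.g. as UTF-8:
data = s.encode("utf-8")
print(data)                       # b'caf\xc3\xa9'
print(data.decode("utf-8") == s)  # True
```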
Re: Hmm, this evolved into an editorial when I wasn't looking :) was: RE: Inappropriate Proposals FAQ
Barry Caplan wrote:

> >> At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote:
> >> >Unicode is a character set. Period.
> >>
> >> Each character has numerous properties in Unicode, whereas they
> >> generally don't in legacy character sets.
> >
> >Each character, or some characters?
>
> For all intents and purposes, each character.
> So, each character has at least one attribute.

Yes. The implications of the Unicode Character Database include the determination that the UTC has normatively assigned properties (multiple) to all Unicode encoded characters.

Actually, it is a little more subtle than that. There are some properties which accrue to code points. The General Category and the Bidirectional Category are good examples, since they constitute enumerated partitions of the entire codespace, and APIs need to return meaningful values for any code point, including unassigned ones.

Other properties accrue more directly to characters per se. They attach to the abstract character, and get associated with a code point more indirectly, by virtue of the encoding of that character. The numeric value of a character would be a good example of this. No one expects an unassigned code point or an assigned dingbat character or a left bracket to have a numeric value property (except perhaps a future generation of Unicabbalists).

> There are no corresponding features in other character sets usually.

Correct. Before the development of the Unicode Standard, character encoding committees tended to leave property assignments either up to implementations (considering them obvious) or up to standardization committees whose charter was "character processing" -- e.g. SC22/WG15 POSIX in the ISO context. The development of a universal character encoding necessitated changing that, bringing character property development and standardization under the same roof as character encoding. Note that not everyone agrees about that, however.
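Ken's distinction between code-point properties and character properties can be seen directly in Python's unicodedata module (a convenient mirror of the Unicode Character Database; U+0378 is chosen here as an unassigned code point, which could change in a future Unicode version):

```python
import unicodedata

# General Category partitions the entire codespace, so it is defined
# even for unassigned code points:
print(unicodedata.category("A"))       # 'Lu' -- Letter, uppercase
print(unicodedata.category("\u0378"))  # 'Cn' -- unassigned

# Numeric value, by contrast, accrues only to characters that have one:
print(unicodedata.numeric("5"))        # 5.0
print(unicodedata.numeric("\u00bd"))   # 0.5 -- VULGAR FRACTION ONE HALF
try:
    unicodedata.numeric("A")           # a plain letter has no numeric value
except ValueError:
    print("no numeric value for 'A'")
```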
We are still having some rather vigorous disagreements in SC22 about who "owns" the problem of standardization of character properties.

> A common definition of "character set" is a list of characters
> you are interested in assigned to codepoints. That fits most
> legacy character sets pretty well, but Unicode is sooo much
> more than that.

Roughly the distinction I was drawing between "the Unicode CCS" and "the Unicode Standard".

> But what if we took a look at it from a different point of view,
> that the standard is an agreed-upon set of rules and building
> blocks for text oriented algorithms? Would people start to
> publish algorithms that extend on the base data provided so
> we don't have to reinvent wheels all the time?

Well, the "Unicode Standard" isn't that, although it contains both formal and informal algorithms for accomplishing various tasks with text, and even more general "guidelines" for how to do things. The members of the Unicode Technical Committee are always casting about for areas of Unicode implementation behavior where commonly defined, public algorithms would be mutually beneficial for everyone's implementations and would assist general interoperability with Unicode data. To date, it seems to me that the members, as well as other participants in the larger effort of implementing the Unicode Standard, have been rather generous in contributing time and brainpower to this development of public algorithms. The fact that ICU is an Open Source development effort is enormously helpful in this regard.

> If I were to stand in front of a college comp sci class,
> where the future is all ahead of the students, what proportion
> of time would I want to invest in how much they knew about legacy
> encodings versus how much I could inspire them to build from and
> extend what Unicode provides them?

This problem, of Unicode in the computer science curriculum, intrigues me -- and I don't think it has received enough attention on this list.
One of my concerns is that even now it seems to me that CS curricula not only don't teach enough about Unicode -- they basically don't teach much about characters, or text handling, or anything in the field of internationalization. It just isn't an area that people get Ph.D.'s in or do research in, and it tends to get overlooked in people's education until they go out, get a job in industry, and discover that in the *real* world of software development, they have to learn about that stuff to make software work in real products. (Just like they have to do a lot of seat-of-the-pants learning about a lot of other topics: building, maintaining, and bug-fixing for large, legacy systems; software life cycle; large team cooperative development process; backwards compatibility -- almost nothing is really built from scratch!)

> The major work ahead is no longer in the context of building
> a character standard. Time is fast approaching to decide to keep
> it small and apply a bit of polish, or focus on the use and
Re: [OpenType] Proposal: Ligatures w/ ZWJ in OpenType
The mechanism proposed by John to handle ZWJ/ZWNJ makes the implicit assumption that those characters are transformed into glyphs (via the usual 'cmap' mechanism) and that this is the avenue to transfer the intent of those characters to the shaping code in the font (i.e. some kind of ligature lookup). I'd like to revisit that assumption.

The ZWJ/ZWNJ characters are formatting characters. Their function is definitely different from the function of the "regular" characters (such as "A"): they are a way to control the rendering of regular characters around them, and to express that control in plain text. The debate so far shows that there is no strong objection to that mechanism by itself. In an environment richer than plain text, there is obviously the possibility that this control could be expressed by means other than characters.

In the OpenType world, and in particular in the interface between the layout engine and the shaping code in fonts, we have more than plain text, or rather plain glyphs; we also have a description of which features should be applied to which glyphs. So instead of having glyphs that stand for ZWJ/ZWNJ, can we use these features?

In fact, we already do that every day. For example, an InDesign user can insert the two characters x and y, and apply a ligature feature (let's say 'dlig') to them. It seems to me that this is just what ZWJ is about. So InDesign could do the following given the character sequence x ZWJ y: map it to the glyph sequence cmap(x) cmap(y), with 'dlig' applied on those two glyphs. This 'dlig' application takes precedence over one via the UI, i.e. it happens regardless of whether the user requested 'dlig' explicitly. The ZWJ character is simply not mapped to the glyph stream, since the feature application does the job of ZWJ.

We can handle ZWNJ in the same way: the sequence x ZWNJ y is transformed to the glyph sequence cmap(x) cmap(y), with 'dlig' not applied on those two glyphs.
This 'dlig' non-application takes precedence over one via the UI, i.e. 'dlig' is not applied to these two glyphs regardless of whether the user requested 'dlig' explicitly. [Maybe a better way of thinking about the precedence stuff is to think entirely in markup terms: ... x ZWNJ y ... is transformed into the glyph stream ... cmap(x) cmap(y) ..., i.e. dlig is off on the pair x y; hold your objection that a feature is applied to a position rather than a range for a minute.]

With this approach, we gain two things. First, not having a "formatting" glyph for ZWJ is IMHO a huge conceptual win, even bigger than not having a "formatting" character ZWJ would be. Second, what John's proposal did not mention (or maybe I missed it) is that it's not just the ligature features that have to deal with this glyph, it is all the features; compound that by all the formatting characters, and you will start to understand Paul's reaction.

It's interesting to note that this approach can be applied to other formatting characters as well. Either their intent can be achieved by the layout engine alone, without the help of the font, in which case there is no need to show anything to the code in the font; no glyph and no feature are a consequence of those characters. Or their intent needs the help of the font, and the OpenType way to ask for this help is to apply (or not apply) features.

All that takes care of selecting a ligature, but it does not quite take care of selecting cursive forms. I can see how we could define 'dlig' to do that (or define a 'zwj' feature that invokes the ligature lookups plus some single substitution lookup), but I am not sure I am happy with that. In fact, I am not sure I am happy with that clause in Unicode. Eric.

[About the features applied to ranges rather than positions: think about it and it should be obvious 8-) It does not make sense to apply a ligature at a position; what makes sense is to apply a ligature on a range.
Think about 1->n substitutions; whatever lookups apply to the source glyph should also apply to all the replacement glyphs - ranges again. I even believe that this approach is compatible with the current OpenType spec. More details on demand.]
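Eric's precedence rule can be sketched in a few lines of Python (a model only: cmap here is any character-to-glyph function, and a real layout engine operates on glyph runs with per-range feature sets, not on strings):

```python
ZWJ, ZWNJ = "\u200d", "\u200c"

def shape(text, cmap):
    """Map characters to glyphs, turning ZWJ/ZWNJ into 'dlig' overrides
    on the neighbouring glyph pair instead of emitting formatting glyphs.
    Returns (glyph, override) pairs: override is True (force the
    ligature), False (suppress it), or None (honor the UI setting)."""
    glyphs = []
    pending = None  # override to attach to the next glyph
    for ch in text:
        if ch in (ZWJ, ZWNJ):
            override = (ch == ZWJ)
            if glyphs:
                # mark the preceding glyph too, so the override covers
                # the pair on both sides of the joiner
                glyphs[-1] = (glyphs[-1][0], override)
            pending = override
        else:
            glyphs.append((cmap(ch), pending))
            pending = None
    return glyphs

# x ZWJ y: no glyph for ZWJ; 'dlig' is forced on the pair instead.
print(shape("x\u200dy", ord))  # [(120, True), (121, True)]
```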
Re: What Unicode Is (was RE: Inappropriate Proposals FAQ)
At 03:54 PM 7/12/2002 -0700, Kenneth Whistler wrote:

>Suzanne responded:
>
>> > Maybe Unicode is more of a shared set of rules that apply to
>> > low level data structures surrounding text and its algorithms
>> > than a character set.
>
>O.k., so now before asserting or denying that "Unicode ... is
>a shared set of rules", it would be helpful to pin down
>first what you are referring to. That might make the ensuing
>debate more fruitful.

Actually, it was me, not Suzanne, who called "Unicode" a shared set of rules. As Ferris Bueller once said, "I'll take the heat for this." I was aware of all of the uses of Unicode that you listed. I have no quarrels with any of them. They do point to the fact that the word is overloaded with definitions, which means that readers have to choose the appropriate one from the context. The context of the statement above is that the "Unicode" referred to is the Standard, and all associated documentation. Not Unicode the Consortium, which manages the Standard. Not Unicode the way of life :)

I did intend to throw open a debate about the long term future of Unicode the Standard and, by extension, Unicode the Consortium. Since Suzanne is writing a "what Unicode is and is not" FAQ, I think the answer to that is going to be very definitely colored by the answer to the related question "What will Unicode become?", e.g. Unicode 6.0, 7.0, 8.0, etc. See my previous msg, subject line: "Hmm, this evolved into an editorial when I wasn't looking :)" for some thoughts on that subject.

Barry Caplan www.i18n.com
What Unicode Is (was RE: Inappropriate Proposals FAQ)
Suzanne responded:

> > Maybe Unicode is more of a shared set of rules that apply to
> > low level data structures surrounding text and its algorithms
> > than a character set.
>
> Sounds like the start of a philosophical debate.
>
> If Unicode is described as a set of rules, we'll be in a world of hurt.
> (On a serious note, these exceptions are exactly what make writing some
> sort of "is and isn't" FAQ pretty darned hard.

Hmm. Since the discussion which started out trying to specify a few examples of what kinds of entities would be inappropriate to proffer for encoding as Unicode characters seems to be in danger of mutating into the recurrent "What is Unicode?" question, perhaps it's time to start a new thread for the latter.

And now for some ontological ground rules. When trying to decide what a "thing" is, it helps not to use an attribute nominatively, since that encourages people to privately visualize the noun the attribute is applied to, but to do so in different ways -- and then to argue past each other because they are, in the end, talking about different things. "Unicode" is used attributively of a number of things, and if we are going to start arguing/discussing what "it" is, it would be better to lay out the alternative "it"s a little more specifically first.

1. The Unicode *Consortium* is a standardization organization. It started out with a charter to produce a single standard, but along the way has expanded that charter, in response to the desire of its membership. In addition to "The Unicode Standard", it has now adopted a terminology that refers to some of its other publications as "Unicode Technical Standards" [UTS], of which two formally exist now: UTS #6 SCSU, and UTS #10 Unicode Collation Algorithm [UCA]. It is important to keep this straight, because some people, when they say "Unicode", are talking about the *organization*, rather than the Unicode Standard per se.
And when people talk about "the standard", they are generally referring to "The Unicode Standard", but the Unicode Consortium is actually responsible for several standards. 2. The Unicode *Standard* itself is a very complex standard, consisting of many pieces now. To keep track of just what something like "The Unicode Standard, Version 3.2" means, we now have to keep web pages enumerating all the parts exactly -- like components in an assemble-your-own-furniture kit. See: http://www.unicode.org/unicode/standard/versions/ In any one particular version, the Unicode Standard now consists of a book publication, some number of web publications (referred to as Unicode Standard Annexes [UAX]), and a large number of contributory data files -- some normative and some informative, some data and some documentation. These definitions, including the exact list of contributory data files and their versions, are themselves under tight control by the Unicode Technical Committee, as they constitute the very *definition* of the Unicode Standard. It is not by accident that the version definitions start off now with the following wording: "The Unicode Standard, Version 3.2.0 is defined by the following list..." and so on for earlier versions. 3. The Unicode *Book* is a periodic publication, constituting the central document for any given version of the Unicode *Standard*, but is by no means the entire standard. The book, in turn, is very complex, consisting of many chapters and parts, some of which constitute tightly controlled, normative specification, and some of which is informative, editorial content. The "book" now also exists in an online version (pdf files): http://www.unicode.org/unicode/uni2book/u2.html which is *almost* identical to the published hardcover book, but not quite. (The Introduction is slightly restructured, the online glossary is restructured and has been added to, the charts are constructed slightly differently and have introductory pages of their own, etc.) 4. 
The Unicode *CCS* [coded character set] is the mapping of the set of abstract characters contained in the Unicode repertoire (at any given version) to a bunch of code points in the Unicode codespace (0x0..0x10FFFF). Technically speaking, it is the Unicode *CCS* which is synchronized closely with ISO/IEC 10646, rather than the Unicode *Standard*. 10646 and the Unicode CCS have exactly the same coded characters (at various key synchronization points in their joint publication histories), but the *text* of the ISO/IEC 10646 standard doesn't look anything like the *text* of the Unicode Standard, and the Unicode Standard [sensum #2 above] contains all kinds of material, both textual and data, that goes far beyond the scope of 10646. There are other standards produced by some national bodies that are effectively just translations of 10646 (GB 13000 in China, JIS X 0221 in Japan), but the Unicode Standard is nothing like those. Finally, the attribute "Unicode ..." can be applied to all kinds of other "things" characteristic of the Unicode Sta
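The codespace bounds Ken cites are easy to check in any Unicode-aware language; in Python:

```python
# The Unicode codespace runs from U+0000 through U+10FFFF.
last = chr(0x10FFFF)          # the last code point is a valid one
print(len(last))              # 1
try:
    chr(0x110000)             # one past the end is not a code point at all
except ValueError:
    print("0x110000 is outside the Unicode codespace")
```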
Hmm, this evolved into an editorial when I wasn't looking :) was: RE: Inappropriate Proposals FAQ
At 05:13 PM 7/12/2002 -0400, Suzanne M. Topping wrote:

>> -Original Message-
>> From: Barry Caplan [mailto:[EMAIL PROTECTED]]
>>
>> At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote:
>> >Unicode is a character set. Period.
>>
>> Each character has numerous
>> properties in Unicode, whereas they generally don't in legacy
>> character sets.
>
>Each character, or some characters?

For all intents and purposes, each character. Chapter 4.5 of my Unicode 3.0 book says "The Unicode Character Database on the CDROM defines a General Category for all Unicode characters". So, each character has at least one attribute. One could easily say that each character also has an attribute for "isUpperCase" of either true or false, and so on. There are no corresponding features in other character sets usually.

>> Maybe Unicode is more of a shared set of rules that apply to
>> low level data structures surrounding text and its algorithms
>> than a character set.
>
>Sounds like the start of a philosophical debate.

Not really. I have been giving presentations for years, and I have seen many others give similar presentations. A common definition of "character set" is a list of characters you are interested in, assigned to codepoints. That fits most legacy character sets pretty well, but Unicode is sooo much more than that.

>If Unicode is described as a set of rules, we'll be in a world of hurt.

Yeah, one of the heaviest books I own is Unicode 3.0. I keep it on a low shelf so the book of rules describing Unicode doesn't fall on me for just that reason. This is earthquake country after all :)

>I choose to look at this stuff as the exceptions that make the rule.

I don't really know if it is possible to break down Unicode into more fundamental units if you started over. Its complexity is inherent in the nature of the task.
My own interest is more in getting things done with data and algorithms that use the type of material represented by the Unicode standard, more so than the arcana of the standard itself. So it doesn't bother me so much that there are exceptions - as long as we have the exceptions that everyone agrees on, that is fine by me, because it means my data and at least some of my algorithms are likely to be preservable across systems.

>(On a serious note, these exceptions are exactly what make writing some
>sort of "is and isn't" FAQ pretty darned hard.

Be careful what you ask for :)

>I can't very well say
>that Unicode manipulates characters given certain historical/legacy
>conditions and under duress.

Why not? It is true. But what if we took a look at it from a different point of view, that the standard is an agreed-upon set of rules and building blocks for text oriented algorithms? Would people start to publish algorithms that extend on the base data provided so we don't have to reinvent wheels all the time? I'm just brainstorming here, this is all just coming to me now.

If I were to stand in front of a college comp sci class, where the future is all ahead of the students, what proportion of time would I want to invest in how much they knew about legacy encodings versus how much I could inspire them to build from and extend what Unicode provides them? Seriously, most of the folks on this list that I know personally, and I include myself in this category, are approaching or past the halfway point in our careers. What would we want the folks who are just starting their careers now to know about Unicode and do with it by the time they reach the end of theirs, long after we have stopped working? For many applications, people are not going to specialize in i18n/l10n issues. They need to know what the appropriate text-based building blocks are, and how they can expand on them while still building whatever they are working on.
Unicode at least hints at this with the bidi algorithm. Moving forward, should other algorithms be codified into Unicode, or as separate standards or de facto standards? I am thinking of a "Japanese word splitting algorithm". There are proprietary products that do this today with reasonable but not perfect results. Are they good enough that the rules can be encoded into a standard? If so, then someone would build an open implementation, and then there would always be this building block available for people to use. I am sure everyone on this list can think of their own favorite algorithms of this type, based on the part of Unicode that interests you the most. My point is that the raw information already in Unicode *does* suggest the next level of usage, and the repeated newbie questions that inspired this thread suggest the need for a comprehensive solution at a higher level than a character set provides. Maybe part of this means including, or at least facilitating the description of, low-level text handling algorithms.

>If I did, people would be scurrying around
>trying to figure out how to foment the duress.)

The acc
Status update re. Inappropriate Proposals FAQ
I'm nearly done playing catchup after vacation and hope to begin extracting concepts for the FAQ next week. Thanks to all who've submitted input, as conflicting and varied as it is/was. > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, July 03, 2002 8:53 > > > I would like to once again suggest that we refocus this 'FAQ' > > AWAY from a repetition of the "Principles and Procedures" > document maintained > by WG2 and containing the explanation of what constitutes a > valid *formal* > proposal.
re smelly fonts (was: Saying characters out loud (derives from hash, pound, octothorpe?))
I saw the movie Polyester in Odorama. A John Waters film with Divine. They handed out scratch and sniff cards and the movie flashed numbers at the right time for you to scratch the numbered dot on the card, which released the odor. It was a great technique and effect. They played a few tricks on the audience as well, substituting unexpected smells for the ones you were anticipating. It was great. Anyone for scratch and sniff fonts? It could be a problem though to scratch the character of the smoking Frenchman, where smoking is prohibited... ;-) Hey, too bad we don't have a Gun character. Scratch it, and the gun goes off and becomes the smoking gun everyone is looking for! It could be the first animated, noisy, smelly font! (Probably shoots the dot off the dotted i, making it Turkish.) ;-) (OK, it's Friday!)

David Possin wrote:
> OK, while we are at it: smelly fonts, anyone?
> (actually I can imagine how some fonts smell)
> Dave
> --- Barry Caplan <[EMAIL PROTECTED]> wrote:
> > At 09:43 AM 7/12/2002 -0400, Suzanne M. Topping wrote:
> > >> -Original Message-
> > >> From: David Possin [mailto:[EMAIL PROTECTED]]
> > >>
> > >> so now we have a chromatic audio attribute for each character?
> > >
> > >Don't be ridiculous. Sounds don't have chroma.
> > >
> > >There will however be a need for tone and accent variation so that
> > >proper localization can be executed.
> > >
> > >;^P
> >
> > I have been dreaming of the idea of synaesthetic applications for
> > years but haven't come up with a way to do it yet. But sounds
> > absolutely will need chroma, that much I know. And when you "say it
> > with feeling", the fonts will literally be perceived as "feeling"
> >
> > Such an application better not be written for Windows, because the
> > "blue screen of death" will be felt rather than seen :)
> >
> > Barry Caplan
> > www.i18n.com
>
> =
> Dave Possin
> Globalization Consultant
> www.Welocalize.com
> http://groups.yahoo.com/group/locales/
>
> __
> Do You Yahoo!?
> Sign up for SBC Yahoo! Dial - First Month Free
> http://sbc.yahoo.com

--
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
RE: Inappropriate Proposals FAQ
> -Original Message-
> From: Barry Caplan [mailto:[EMAIL PROTECTED]]
>
> At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote:
> >Unicode is a character set. Period.
>
> Each character has numerous
> properties in Unicode, whereas they generally don't in legacy
> character sets.

Each character, or some characters?

> Maybe Unicode is more of a shared set of rules that apply to
> low level data structures surrounding text and its algorithms
> than a character set.

Sounds like the start of a philosophical debate. If Unicode is described as a set of rules, we'll be in a world of hurt.

> >The Unicode consortium very wisely keeps its focus narrow. It provides
> >a mechanism for specifying characters. Not for manipulating them, not
> >for describing them, not for making them twinkle.
>
> All true, except for some special cases (BOM, bidi issues and
> algorithms, vertical variants, etc). Not saying those
> shouldn't be in there, just that they are useful only in the
> use of algorithms that are explicit (bi-di) or assumed (upper
> case/lower case, vertical/horizontal) etc.

Why mess up a nice clean statement simply because of a few hard facts? I choose to look at this stuff as the exceptions that make the rule. (On a serious note, these exceptions are exactly what make writing some sort of "is and isn't" FAQ pretty darned hard. I can't very well say that Unicode manipulates characters given certain historical/legacy conditions and under duress. If I did, people would be scurrying around trying to figure out how to foment the duress.)
RE: Saying characters out loud (derives from hash, pound, octothorpe?)
OK, while we are at it: smelly fonts, anyone? (actually I can imagine how some fonts smell) Dave --- Barry Caplan <[EMAIL PROTECTED]> wrote: > At 09:43 AM 7/12/2002 -0400, Suzanne M. Topping wrote: > > >> -Original Message- > >> From: David Possin [mailto:[EMAIL PROTECTED]] > >> > >> so now we have a chromatic audio attribute for each character? > > > >Don't be ridiculous. Sounds don't have chroma. > > > >There will however be a need for tone and accent variation so that > >proper localization can be executed. > > > >;^P > > I have been dreaming of the idea of synaesthetic applications for > years but haven't come up with a way to do it yet. But sounds > absolutely will need chroma, that much I know. And when you "say it > with feeling", the fonts will literally be perceived as "feeling" > > Such an application better not be written for Windows, because the > "blue screen of death" will be felt rather than seen :) > > Barry Caplan > www.i18n.com > > = Dave Possin Globalization Consultant www.Welocalize.com http://groups.yahoo.com/group/locales/ __ Do You Yahoo!? Sign up for SBC Yahoo! Dial - First Month Free http://sbc.yahoo.com
RE: Saying characters out loud (derives from hash, pound, octothorpe?)
At 09:43 AM 7/12/2002 -0400, Suzanne M. Topping wrote: >> -Original Message- >> From: David Possin [mailto:[EMAIL PROTECTED]] >> >> so now we have a chromatic audio attribute for each character? > >Don't be ridiculous. Sounds don't have chroma. > >There will however be a need for tone and accent variation so that >proper localization can be executed. > >;^P I have been dreaming of the idea of synaesthetic applications for years but haven't come up with a way to do it yet. But sounds absolutely will need chroma, that much I know. And when you "say it with feeling", the fonts will literally be perceived as "feeling" Such an application better not be written for Windows, because the "blue screen of death" will be felt rather than seen :) Barry Caplan www.i18n.com
RE: Inappropriate Proposals FAQ
At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote:

>Unicode is a character set. Period.

Well, maybe. But in a much broader sense than the character sets it subsumes in its listings. Each character has numerous properties in Unicode, whereas they generally don't in legacy character sets. Maybe Unicode is more of a shared set of rules that apply to low level data structures surrounding text and its algorithms than a character set.

>The Unicode consortium very wisely keeps its focus narrow. It provides
>a mechanism for specifying characters. Not for manipulating them, not
>for describing them, not for making them twinkle.

All true, except for some special cases (BOM, bidi issues and algorithms, vertical variants, etc). Not saying those shouldn't be in there, just that they are useful only in the use of algorithms that are explicit (bi-di) or assumed (upper case/lower case, vertical/horizontal) etc. In many cases, these algorithms are not well known, even amongst the cognoscenti, or generally available in nice libraries. Anyone for an open source Japanese word splitting library? (I know not taking a look at ICU before I press send is going to come back to haunt me on this, but if it is in there, then substitute something that isn't :)

Barry Caplan www.i18n.com
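One of the special cases Barry lists, the BOM, makes a compact example of the kind of small, shareable algorithm he is talking about. A minimal Python sniffing sketch (the function name and return convention are mine; note the UTF-32 BOMs must be tested before the UTF-16 ones, since BOM_UTF32_LE begins with the same two bytes as BOM_UTF16_LE):

```python
import codecs

def split_bom(data: bytes):
    """Return (encoding_or_None, payload) after any leading BOM."""
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data

print(split_bom(b"\xef\xbb\xbfhello"))  # ('utf-8', b'hello')
print(split_bom(b"plain"))              # (None, b'plain')
```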
?? Unicode ??
How do I use Unicode characters? Which text editors support Unicode? To use Unicode, what do I need to do, download, or upgrade? Thanks
[Fwd: Re: Unicode Devanagari Font in Mozilla]
resend.

Original Message
Subject: Re: Unicode Devanagari Font in Mozilla
Date: Thu, 11 Jul 2002 20:58:03 -0700 (PDT)
From: Prabhat Hegde <[EMAIL PROTECTED]>
Reply-To: Prabhat Hegde <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]

hi dipali,
There are numerous changes needed to position/shape Devanagari and other Indic text. Additional changes are needed to support caret and selection operations. And finally, it would depend on the nature of the font that you use (intelligent vs. dumb). To complicate matters further:
* Indic scripts do not have any standard (as in published/registered/recognized) font encoding that i know of.
* Indian language web-sites mis-use the charset tag "x-user-defined".

Please look at:
http://bugzilla.mozilla.org/show_bug.cgi?id=85204

And let me know if you need additional info.
prabhat.

>Date: Thu, 11 Jul 2002 22:45:47 +0530 (IST)
>From: Dipali Choudhary
>Subject: Unicode Devanagari Font in Mozilla
>To: [EMAIL PROTECTED]
>
>Hello,
>
> I am a newbie in the area. I am using Mozilla 0.7 on Linux 7.2. I can
>see Devanagari text in it, but there is a problem of shifted matras.
>What do I need to do to correctly position it?
>
>Every time Mozilla is using the default Devanagari font for showing the
>characters. What should I do to change the default font?
>
>Thanks in advance
>
>regards,
>dipali
>
> Dipali Choudhary
> M.Tech. CSE dept.
> IIT Bombay.
RE: Saying characters out loud (derives from hash, pound, octothorpe?)
> -Original Message- > From: David Possin [mailto:[EMAIL PROTECTED]] > > so now we have a chromatic audio attribute for each character? Don't be ridiculous. Sounds don't have chroma. There will however be a need for tone and accent variation so that proper localization can be executed. ;^P
Re: *Why* are precomposed characters required for "backward compatibility"?
>From: David Hopwood <[EMAIL PROTECTED]> >For all of these characters, use as a spacing diacritic is actually much >less common than any of the other uses listed above. Even when they are used >to represent accents, it is usually as a fallback representation of a combining >accent, not as a true spacing accent. > >So, there would have been no practical problem with disunifying spacing >circumflex, grave, and tilde from the above US-ASCII characters, so that the >preferred representation of all spacing diacritics would have been the >combining diacritic applied to U+0020. Apart from the problems Kenneth Whistler mentioned. You would get the same problems with the ISO 8859-1 spacing accents, but fewer people use them than those in ASCII. One problem is that some characters can be used both as an accent and as a normal base character, and some characters for which Unicode defines a decomposition are not regarded as composed characters in some countries. So in some contexts it is wrong to decompose characters that would be fine to decompose in others. That is one reason I prefer NFC: it does not decompose characters. > >> For a lot of text handling, precomposed characters are much easier to >> handle, especially when the combining character comes after instead of >> before the base character. > >I thought you said approximately the opposite in relation to T.61 above :-) > Sorry, got the last part wrong in my haste. I meant it is easier when the combining character comes before the base character. Dan
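The NFC/NFD distinction Dan is describing can be seen directly with Python's standard `unicodedata` module; this is just a minimal sketch showing that NFD splits a precomposed letter into base plus combining accent, while NFC puts it back together:

```python
import unicodedata

# U+00E9 (precomposed "é") round-trips through NFD and NFC.
precomposed = "\u00e9"                        # é as a single code point
nfd = unicodedata.normalize("NFD", precomposed)
nfc = unicodedata.normalize("NFC", nfd)

print([f"U+{ord(c):04X}" for c in nfd])       # ['U+0065', 'U+0301']  e + combining acute
print([f"U+{ord(c):04X}" for c in nfc])       # ['U+00E9']
```

Note that NFC still decomposes (then recomposes) internally, so the handful of characters excluded from composition will not round-trip to their precomposed forms; the simple case above does.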
Re: What is TISI character Code?
Dear Sreedhar, for the Thai industrial standard character set, it's TIS-620. To make your apps support Thai, please consider conforming to these standards: TIS 620-2533 (1990) Standard for Thai Character Codes for Computers, UDC 681.3.04:003.62, ISBN 974-606-153-4; TIS 820-2538 (1995) Layout of Thai Character Keys on Computer Keyboards, UDC 681.3.02:003.62, ISBN 974-607-416-4; TIS 1566-2541 (1998) Thai Input/Output Methods for Computers, ICS 35.060, ISBN 974-607-898-4. Thai Industrial Standards Institute, Ministry of Industry http://www.tisi.go.th [EMAIL PROTECTED] regards, Art Sreedhar.M wrote: > Hi, > I would like to make my application compatible with the Thai language. In > that connection I heard the term TISI character code. That's why I want to know > about the TISI character code. Please let me know if anybody has an idea > regarding this. > Thanks in advance. > With regards, > Sreedhar M.
Re: What is TISI character Code?
Sreedhar.M wrote: > I would like to make my application compatible with the Thai language. In > that connection I heard the term TISI character code. That's why I want to know > about the TISI character code. Please let me know if anybody has an idea > regarding this. TISI is the name of the standards organization in Thailand, the Thai Industrial Standards Institute. The character set name is TIS-620. It's an 8-bit character set that extends 7-bit ASCII with Thai characters. See: http://www.nectec.or.th/it-standards/ -- Samphan Raruenrom Information Research and Development Division, National Electronics and Computer Technology Center, Thailand. http://www.nectec.or.th/home/index.html
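Since TIS-620 is an 8-bit superset of ASCII with the Thai letters in the high range, you can see the mapping with a one-liner; this is a minimal sketch using Python's built-in "tis-620" codec (one of the documented aliases of the `tis_620` encoding):

```python
# In TIS-620 the Thai block starts at 0xA1, which maps to
# U+0E01 THAI CHARACTER KO KAI; bytes below 0x80 are plain ASCII.
ko_kai = b"\xa1".decode("tis-620")
ascii_part = b"ABC".decode("tis-620")

print(f"U+{ord(ko_kai):04X}")   # U+0E01
print(ascii_part)               # ABC (ASCII passes through unchanged)
```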
What is TISI character Code?
Hi, I would like to make my application compatible with the Thai language. In that connection I heard the term TISI character code. That's why I want to know about the TISI character code. Please let me know if anybody has an idea regarding this. Thanks in advance. With regards, Sreedhar M.
Re: Proposal: Ligatures w/ ZWJ in OpenType
James Kass carved in stone: > There are a multitude of special cases such as paleontology, ... Ooof. Paleography. A kind person called my attention to this gaffe off-list. Thank you, kind person. Best regards, James Kass.
RE: Unicode Devanagari Font in Mozilla
Dipali Choudhary asked > Every time Mozilla is using default devanagari font for showing the > characters. What should I do to change default font? > Mozilla does not seem to allow you to choose a font for Devanagari. Edit > Preferences... > Category > Appearance > Fonts brings up a list of languages/scripts, but it does not include Devanagari. Unicode is on the list, so you could try changing the font(s) for that. Alan Wood http://www.alanwood.net (Unicode, special characters, pesticide names)
Re: Proposal: Ligatures w/ ZWJ in OpenType
>From Unicode 3.1 (online) ( http://www.unicode.org/unicode/reports/tr27/index.html ) U+200D ZERO WIDTH JOINER The intended semantic is to produce a more connected rendering of adjacent characters than would otherwise be the case, if possible. In particular: 1. If the two characters could form a ligature, but do not normally, ZWJ requests that the ligature be used. 2. Otherwise, if either of the characters could cursively connect, but do not normally, ZWJ requests that each of the characters take a cursive-connection form where possible. • In a sequence like <X, ZWJ, Y>, where a cursive form exists for X, but not for Y, the presence of ZWJ requests a cursive form for X. 3. Otherwise, where neither a ligature nor a cursive connection is available, the ZWJ has no effect. Starting with Unicode 3.0.1, the definitions of ZWJ and ZWNJ were expanded to allow for greater control over ligature formation. A reason given for this is: "In some orthographies the same letters may either ligate or not, depending on the intended reading". > Thus, what John Hudson is wanting to do is to have "f" + ZWJ + "i" be > required to make the "fi" ligature by using the feature. Any font > that does not have OpenType support, or some other smart font > rendering, would ignore this and not render the ligature. Right. And any older font lacking a no-width, no-contour glyph for ZWJ would probably display a null box between the "f" and the "i". > Another example: "a" + ZWJ + combining acute + ZWJ + "e" would be > required to produce an "ae" ligature with the combining acute over the > a portion of the ligature. Is this reasonable? AFAICT, ZWJ is not appropriate for combining glyphs like the combining acute diacritic. "a" + combining acute + ZWJ + "e" might be reasonably expected to produce what you've described. > Asmus is correct in needing to consider other languages. Saying that > the ZWJ causes Arabic to ligate would not be correct.
It already is > defined to cause correct contextual shaping (isol, initial, medial, final) > forms. In fact, LAM + ZWJ + ALEF breaks the required ligature > formation because it sticks something in the middle of the context and > proves what the Unicode book says, "in some systems they may break > up ligatures by interrupting the character sequence required to form > the ligature." Should font vendors then have to not only code the normal > ligature formation, but also have to code shaping rules to make the ZWJ > work as well? > Yes, if font vendors want to provide this level of support. According to recent posts on the Unicode list, some font vendors are already doing this because of Unicode's recommendations on the subject. (Please see "Implementation Notes" under "Controlling Ligatures" in TR27 linked above.) As far as 'interrupting the sequence on some systems', the Unicode Standard may simply be referring to older, non-compliant systems which don't ignore these formatting characters where appropriate and/or have not yet implemented full support for Unicode 3.0.1 and up. So, this is already a complicated nightmare for shaping engine implementers. Sometimes the character should be ignored, but other times it needs to be a mandatory part of a look-up. Font developers seeking to follow the Unicode guidelines seem to be doing so on a 'by gosh and by golly' basis. John Hudson's proposal offers sensible parameters along with intuitive justification. Using 'rlig' for ZWJ based ligation is a clear choice. If an author takes the trouble to insert a ZWJ, a ligature is required if possible. Best regards, James Kass.
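To make the encodings discussed in this thread concrete, here is a minimal Python sketch of the two sequences; note that Python only builds the character sequences, and whether a ligature actually appears (or is broken) is entirely up to the font and shaping engine doing the rendering:

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# John Hudson's case: explicitly request the "fi" ligature.
fi_request = "f" + ZWJ + "i"

# The Arabic case: a ZWJ *between* LAM (U+0644) and ALEF (U+0627)
# interrupts the character sequence required to form the mandatory
# lam-alef ligature on systems that treat ZWJ as part of the context.
lam_alef_broken = "\u0644" + ZWJ + "\u0627"

print([f"U+{ord(c):04X}" for c in fi_request])
# ['U+0066', 'U+200D', 'U+0069']
```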
Re: Definition of character: Exegesis of SC2 nomenclature
Hello, I wrote: > And which character most resembles a Frenchman smoking his cigarette? Marco Cimarosti wrote: > I need to know, NOW! PLEASE! U+A232 He has his beret. Ciao, Otto