Re: OT: OED
On Monday, April 29, 2002, at 10:05 AM, Patrick Rourke wrote: In the US, nearly all University libraries have the standard edition, and many good high school libraries (for 14-18 year-olds) have the compact edition (smaller typography). It's also available on CD for Windows. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Greek Extended: question: missing glyphs?
On Monday, April 29, 2002, at 08:37 PM, Pim Rietbroek wrote:

Hello, Please forgive me if this question has been raised before: I am a newbie on this list. I am looking into the Unicode standard for the encoding of Classical Greek. While both the Greek and the Greek Extended ranges of the current Unicode Standard seem to cover most of the essentials, it looks strange to me that some Greek Extended glyphs have not been defined. They are: 1) GREEK CAPITAL LETTER UPSILON WITH PSILI 2) GREEK CAPITAL LETTER UPSILON WITH PSILI AND VARIA 3) GREEK CAPITAL LETTER UPSILON WITH PSILI AND OXIA 4) GREEK CAPITAL LETTER UPSILON WITH PSILI AND PERISPOMENI

Reserved means don't use it. Yes, they're missing as precomposed forms, but you can always represent them using combining sequences. No, there's no point in asking for them. Unicode cannot add new precomposed accented Latin, Greek, or Cyrillic letters because it would screw up normalization. Use the actual upsilon capital letter followed by the appropriate breathing and accent marks.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
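To make the combining-sequence point concrete, here is a small later-era illustration in Python; the unicodedata module and the Python framing are an addition for this archive, not part of the original email:

```python
import unicodedata

# Capital upsilon with psili (smooth breathing) as a combining sequence:
# U+03A5 GREEK CAPITAL LETTER UPSILON + U+0313 COMBINING COMMA ABOVE.
psili = "\u03A5\u0313"
# No precomposed form exists (the code point slot is reserved),
# so NFC normalization leaves the sequence decomposed:
print(unicodedata.normalize("NFC", psili) == psili)   # True

# Contrast with the dasia (rough breathing), which *does* have a
# precomposed form, U+1F59 GREEK CAPITAL LETTER UPSILON WITH DASIA:
dasia = "\u03A5\u0314"
print(unicodedata.normalize("NFC", dasia) == "\u1F59")  # True
```

This is exactly why the four psili combinations can never be added as precomposed characters: normalization behavior for existing sequences is frozen.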
Re: sources for plane 2 characters?
I don't know. I'll bring it up next week at the IRG meeting. On Tuesday, April 30, 2002, at 08:02 PM, Thomas Chan wrote: Hi all, I was looking at the plane 2 characters in the March 15, 2001 version of the unihan.txt file, and found five that did not have an IRG source: U+20957, U+221EC, U+22FDD, U+24FB9, and U+2A13A. (The last one, U+2A13A, however, has kIRGHanyuDaZidian and kIRGKangXi information showing that it can be found in those dictionaries. Still, shouldn't there be an IRG source for it?) Where are the first four from? Thanks, Thomas Chan [EMAIL PROTECTED] == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: on U+7384 (was Re: Synthetic scripts (was: Re: Private Use Agreements
On Friday, May 10, 2002, at 06:29 PM, John Cowan wrote:

What is this about Qing taboo characters? Can someone point me to an explanation (in English)? Thanks.

The whole idea of taboo forms stems from the fact that there are certain ideographs one could not use because, typically, they're part of the personal name of someone important. So one deliberately distorts them when writing them. Such a thing is very much time-bound. Using a character from the personal name of the *current* emperor is a big deal, but using one from the personal name of an emperor five hundred years dead from an entirely different dynasty is no biggie. So the Qing dictionary, the KangXi, would have some taboo forms which would later become untaboo (especially now, of course, since nobody does that kind of thing anymore).

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Han Radical-Stroke Index
On Monday, May 13, 2002, at 04:02 AM, William Overington wrote: In chapter 15 of the Unicode specification is the statement that the Han Radical-Stroke Index is available as a separate file. I have tried to find it on the web site with no success. Is this file available on the web site please? The current version is at http://www.unicode.org/charts/Unihan3.2.pdf. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: CJK Unified Ideographs Extension B
On Monday, May 13, 2002, at 04:21 AM, William Overington wrote:

I have been looking at the characters in the CJK Unified Ideographs Extension B document. These are the characters from U+20000 through to U+2A6DF, which, as I understand it, are the rarer CJK characters.

Actually, this is not quite true. The vast majority are rare, of course, and none of them are exactly *common*, but how rare they are depends on what you're writing. A small number, for example, are from HK SCS and reflect current needs for Hong Kong, including general-purpose Cantonese writing. (One is generally not supposed to write Cantonese, even if one speaks it, hence the lag in getting some Cantonese-specific characters added.)

I wonder if any of the people who read this list who understand the languages involved might please like to say what any one or two of these characters, of their choice, mean please, just as a matter of general cultural interest for people who see these characters in the Unicode specification and, though not themselves knowledgeable of the languages, find the characters interesting for their artistry and history.

My personal favorite is U+233B4, which means a tree stump. (It's formed by taking the tree radical and moving the cross-bar to the top of the character instead of having it in the middle.) U+20C43 is a Cantonese-specific character meaning thin or flat. Altogether, eighteen characters from Extension B currently have a kDefinition entry in Unihan.txt.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
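As an aside for readers who want to poke at the data themselves, here is a sketch of how one might count those kDefinition entries, assuming a copy of Unihan.txt in its usual tab-separated "U+xxxxx[tab]field[tab]value" layout (the helper function and sample lines are illustrations, not part of the original email):

```python
def count_extb_definitions(unihan_lines):
    """Count Extension B code points (U+20000..U+2A6DF) with a kDefinition."""
    count = 0
    for line in unihan_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 3 and parts[1] == "kDefinition":
            cp = int(parts[0][2:], 16)  # "U+233B4" -> 0x233B4
            if 0x20000 <= cp <= 0x2A6DF:
                count += 1
    return count

# A tiny illustrative sample in the Unihan.txt format:
sample = [
    "# comment line",
    "U+233B4\tkDefinition\ttree stump",
    "U+20C43\tkDefinition\tthin; flat (Cantonese)",
    "U+4E00\tkDefinition\tone",   # not Extension B
]
print(count_extb_definitions(sample))  # 2
```

In practice you would pass `open("Unihan.txt", encoding="utf-8")` instead of the sample list.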
Re: Additional Deseret letters
On Sunday, May 19, 2002, at 01:18 AM, Michael Everson wrote: At 16:48 -0700 2002-05-18, Doug Ewell wrote: I discovered on the updated Pipeline page that the apocryphal Deseret letters OI and EW were approved by UTC on 2002-05-02 for a future version of Unicode. This is news to me. They were omitted originally because they were considered ligatures. Has there been a new paper and proposal? Yes. WG2 documents N2473 and N2474 (when they show up, which should be shortly) deal with the issue. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Additional Deseret letters
On Saturday, May 18, 2002, at 05:48 PM, Doug Ewell wrote: Are these the same characters (and glyphs) that were described in John Jenkins' original Deseret proposal, and displayed -- perhaps accidentally -- in the chart accompanying the Deseret proposal for the ConScript Unicode Registry? Yes, they are. Ken Beesley of Xerox Research Center Europe is aware of their use in handwritten materials and argues that treating them as mere ligatures is insufficient. This will be WG2 document N2474. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Bengali script - where is khanda ta?
On Tuesday, May 21, 2002, at 01:03 PM, Somnath Kundu wrote:

Is that same for Unicode, i.e., Ta + Halant + Halant - Khanda Ta, and how does Uniscribe handle this case? In other words, how can I write Khanda Ta in Unicode?

Forgive me, but this is a pet peeve of mine. How something is done in Unicode and how it's done in Uniscribe are *NOT* the same thing.

The reason for my posting was that I found the Code2000 font some days ago and installed a Bangla keyboard driver manually, found on my MSDN Win2k CD, on 2k/XP to type some Bangla letters, but was not able to type Khanda Ta. (The glyph is also probably missing in that font).

I don't think that Code2000 is an OpenType font, which means it won't have the ancillary glyphs and data needed to do full, proper support of many languages and scripts.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
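For reference, the representation generally recommended in that era was, as I understand it, the sequence TA + VIRAMA (halant) + ZERO WIDTH JOINER; a dedicated character, U+09CE BENGALI LETTER KHANDA TA, was only encoded later (Unicode 4.1). A sketch of the period-correct sequence:

```python
# Khanda ta as a combining sequence, per the recommendation of the era:
# U+09A4 BENGALI LETTER TA + U+09CD BENGALI SIGN VIRAMA + U+200D ZWJ.
# (Unicode later added a dedicated U+09CE BENGALI LETTER KHANDA TA.)
khanda_ta = "\u09A4\u09CD\u200D"
print([hex(ord(ch)) for ch in khanda_ta])  # ['0x9a4', '0x9cd', '0x200d']
```

Whether a given renderer actually displays the khanda ta glyph for this sequence depends, as the email says, on the shaping engine and font, not on Unicode itself.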
Re: Bengali script - where is khanda ta?
On Tuesday, May 21, 2002, at 09:01 PM, James Kass wrote: John H. Jenkins wrote: I don't think that Code2000 is an OpenType font, which means it won't have the ancillary glyphs and data needed to do full proper support of many languages and scripts. Code2000 is an OpenType font with fairly good OpenType coverage for Bengali presentation forms as well as coverage for several other scripts. I gladly stand corrected, then. Good job (as always), James! == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Courtyard Codes and the Private Use Area (derives from Re: Encoding of symbols and a lock/unlock pre-proposal)
On Friday, May 24, 2002, at 08:06 AM, Philipp Reichmuth wrote:

WO U+F3A2 PLEASE LIGATE THE NEXT TWO CHARACTERS
WO U+F3A3 PLEASE LIGATE THE NEXT THREE CHARACTERS
WO U+F3A4 PLEASE LIGATE THE NEXT FOUR CHARACTERS

While I don't think this discussion of various PUA allocations should continue much further, it's probably a lot better to introduce the already-discussed ZERO WIDTH LIGATOR in such a form that X ZWL Y produces the XY ligature, X ZWL Y ZWL Z the XYZ ligature, and so on. It saves you a lot of hassle with longer ligatures.

Zero width ligator was rejected. Zero-width joiner can be used to mark ligation points where they are absolutely necessary; where they are merely stylistic preferences, they belong in markup.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: N2476 a hoax?
prevented Chinese from being used in internationalized domain names.

No, it didn't. That was a counterproposal made by the Chinese domain-name representatives, who claimed that prohibiting Han characters for now would give the relevant bodies more time to develop a proper TC/SC mapping solution (implying that the problem was solvable at all, an opinion disputed by many).

Mea culpa. I stated the facts as I understood them, and I appear to have misunderstood them. In any event, while I (for one) would argue that TC/SC equivalence is not the same as English case-folding, my understanding was that there was a body of people who argued otherwise. The existence of such a body and an acknowledgment of their desire is different from agreement with them.

At the same time, I *do* agree that it is possible to define, on a purely character level, a function which allows a first-order approximation to SC/TC equivalence. And I think it's a legitimate concern for companies and individuals that some mechanism be in place so that two domain names which are TC/SC equivalents aren't registered by competing organizations; Unicode's own ideal Chinese domain name would be a case in point. Whether this is done via TC/SC folding or via someone asking to register domain name X and being told, "Oh, by the way, you also need to register domain names Y and Z while you're at it," is irrelevant.

Programmers and users are being increasingly frustrated that as ISO/IEC 10646 becomes more pervasive, they are increasingly compelled to deal with a large number of variant characters, some of which are only subtly different from each other and which cannot be automatically equated. The UTC would never refer to ISO/IEC 10646 as "pervasive"

Why not? Isn't it?

or talk of programmers and users being compelled to deal with variant characters,

Why not?

nor would it make such an emotional appeal that such variants should be automatically equated.

Why not?
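A character-level, first-order TC/SC folding of the kind described might look like the following sketch. The three-entry mapping is purely illustrative; a real table would be derived from Unihan-style variant data and would still only be an approximation:

```python
# Minimal sketch of a character-level, first-order TC -> SC folding.
# The mapping below is a tiny illustrative sample, not real Unihan data.
TC_TO_SC = {
    "\u570B": "\u56FD",  # 國 -> 国
    "\u9F8D": "\u9F99",  # 龍 -> 龙
    "\u6771": "\u4E1C",  # 東 -> 东
}

def fold_tc_to_sc(s):
    """Fold traditional forms to simplified, character by character."""
    return "".join(TC_TO_SC.get(ch, ch) for ch in s)

print(fold_tc_to_sc("\u4E2D\u570B\u6771"))  # 中國東 -> 中国东
```

Such a fold lets two domain-name strings be compared for TC/SC equivalence, which is exactly the first-order service the text argues applications may (or may not) find acceptable.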
Note the lack of standard UTC/WG2 terminology; if this were the UTC talking, you would be reading about canonical and compatibility equivalents and normalization.

No, if it were Ken Whistler or Mark Davis writing the document, you would probably get this language. :-) More seriously, why do compatibility or canonical equivalents or the UTC's version of normalization come into it? The whole point here is that we are dealing with a different category of equivalent than the standard currently covers. The further issue of a normalized Han (Cleanihan) is also orthogonal.

This passage also hints at the author's lack of awareness that similar equivalence issues exist for scripts other than Han.

You may see the hint there; I certainly don't. In any event, I would argue that the problem is a lot worse for Han than for any other script in Unicode of which I'm aware. What is needed, however, is something that allows at the least for a first-order approximation of equivalence; it would be up to the authors of the individual application, protocol, or standard to determine whether this were acceptable or not.

And what if the authors decide the IRG-developed approach is not acceptable? What are they expected to do then?

Whatever they want. We are repeatedly getting requests from people who are asking us how to handle Han variants, Doug, and we currently have no answer at all beyond pointing them to the rather limited data which is in Unihan.txt. (Indeed, many of the requests are coming from people who ask, "How come the data in Unihan.txt is so crappy?") We want to solve this problem. At the same time, if Basis or Microsoft or someone else with the resources to develop their own solution wants to use their own solution, we don't preclude them from doing that.

On the very same day (2002-05-08) that N2476 was published, a new Proposed Draft Technical Report (PDUTR #30) titled "Character Foldings" was also published.
PDUTR #30, available on the Unicode Web site, deals with several different types of mappings between characters -- mappings that involve digraphs and trigraphs, removal of diacritical marks, mappings between Hiragana and Katakana, mappings between European, Arabic, and Indic digits, and so on. NOWHERE in this document is there the slightest mention of TC/SC mappings. Isn't that a bit strange?

No, not really. There is sometimes a tendency for people who work on UTC documents to have a subconscious Han/everything-else dichotomy as they work.

If the UTC were really driving the issue of TC/SC mapping, wouldn't they have at least given it a brief mention in a "Character Foldings" proposal?

I would have hoped so, but evidently that didn't happen. That the UTC is concerned about SC/TC data and other Han equivalences is, in any event, already a part of the public record.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
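For instance, one of the foldings PDUTR #30 does cover, Hiragana to Katakana, is for the core syllables a fixed code point offset. A hedged sketch (the function name and range handling are this archive's illustration, not PDUTR #30's actual data tables):

```python
# Hiragana -> Katakana folding: the core syllables U+3041..U+3096 map to
# their Katakana counterparts at a fixed offset of 0x60 (U+30A1..U+30F6).
def hiragana_to_katakana(s):
    return "".join(
        chr(ord(ch) + 0x60) if 0x3041 <= ord(ch) <= 0x3096 else ch
        for ch in s
    )

print(hiragana_to_katakana("\u3072\u3089\u304C\u306A"))  # ひらがな -> ヒラガナ
```

Nothing remotely this simple exists for TC/SC, which is part of why its omission from a foldings report is less surprising than it first appears.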
Re: Normalisation and font technology
On Wednesday, May 29, 2002, at 10:55 AM, John Hudson wrote:

In particular, I think it is a mistake to resolve display of character-level decompositions by relying on the presence of glyph-space substitution or positioning features in fonts, simply because most users have very few fonts that are capable of doing this.

Agreed; Apple's current solution is a better-than-nothing one, but not really what's best in the long run IMHO. BTW, does FontLab 4 auto-generate OT layout data from the Unicode repertoire of a font?

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Normalisation and font technology
On Wednesday, May 29, 2002, at 01:57 PM, John Hudson wrote: Thank you. My main concern was that someone might think that this is a reasonable model for handling this, and it wasn't immediately clear that Apple did not consider this, in fact, to be an appropriate long term solution. Hm. There aren't not too many negatives in that last sentence, making it not undifficult for some who who hadn't insufficient sleep last night not to be unable to parse it incorrectly. I think. I'm sorry. I'm very tired today. :-) Yes. Apple does not consider this an ideal long-term solution. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Unicode and the digital divide. (derives from Re: Towards some more Private Use Area code points for ligatures.)
everywhere. These are not rhetorical questions, I really would genuinely like to know. I am quite happy to accept that perhaps a solution to the problem has been found, yet wonder whether that solution is, as of today, only available to people who are on one side of a digital divide.

Again, there are many digital divides. There are things that work on Windows and not on Macs. There are things that work on Macs and not on Windows. There are things that work with InDesign that don't work with Word. There are things that work with Word and not with InDesign. There are things that work with Windows XP that don't work with Windows 98, and things that work with Windows 98 that don't work with DOS 3.0. There are things that work with Mac OS X that don't work with Mac OS 9, and things that work with Mac OS 9 that don't work with Mac OS 6.8.3 of venerable memory. Don't put yourself in the position of arguing that it's wrong to innovate. Innovation in the IT industry always creates a digital divide.

If there really are problems which my list will cause then I will be happy to add a note stating the problem. Yet I am very concerned that I may be in effect being told here that Unicode is only really intended for people with the very latest equipment using expensive solutions that are only realistically available to rich corporations.

*sigh* Unicode has from the beginning been designed with the assumption that it would require rendering engines capable of complex typesetting. We've always known that. It's taken longer to get them to market than we would have thought and liked, but they're showing up now. It's a bar we've always had to cross, however, if not for Latin ligatures, then for Arabic and Devanagari, and so on. The advantage of this is (ideally) that once you get a system capable of doing Arabic or Devanagari or Latin ligatures, you get a system capable of doing all of them. That, at least, was the goal.
My thinking is that the existence of the list, (and hopefully, the list having been distributed in this discussion group, many people will be aware of its existence, and may perhaps have even filed a copy for possible future reference), will hopefully make the availability of such ligatures in founts more widespread and will also hopefully influence people who make software packages, such as relatively inexpensive electronic book publishing packages, to build in a feature so that such ligatures may be accessed from a TrueType fount.

1) People who make fonts already know about ligatures.

2) The set of ligatures appropriate for Latin typography is very much font-specific. Zapfino has dozens of Latin ligatures because it's a calligraphic font. Courier should have none because it's a monospaced font.

3) People who write book publishing packages already build in features to access Latin ligatures. Microsoft Word is not a good program to use for publishing books.

I feel that the Unicode system should be available for all, not just for people who are on the money side of the digital divide.

It's a nice goal. It isn't a realistic one, however.

Now, a question on my part. You're using the term digital divide, but you're not defining it very well. Could you tell me:

a) What the digital divide really is from your perspective -- that is, what OS is on one side and what OS on the other?

b) What are the relative numbers of people with systems on both sides?

If, say, your divide were to be between Mac OS 6 or earlier and Mac OS 7 or later (the point at which Apple adopted TrueType as its primary font technology), then there are likely 99.99% of all Mac users on the 7-or-later side of the divide. Do you see what I'm asking here?

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Unicode and the digital divide.
On Friday, May 31, 2002, at 10:11 AM, Doug Ewell wrote: Respefully, Nice one, Doug. Unfortunately, on my system, that collides with the ConScript version of Shavian which I have installed, so I got something unexpected. ☹ Which makes your point. As the Good Book says, He that hath ears to hear, let him hear. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Unicode and the digital divide.
On Friday, May 31, 2002, at 02:38 PM, Kenneth Whistler wrote:

The issue is *NOT* hardware. Take a look at www.dell.com. The very, very bottom-end system, a Dimension 2200 desktop, comes these days with a 1.3GHz Intel Celeron chip, oodles of multimegabytes of SDRAM, a 20- to 40GB hard drive, and 4MB of video memory. That machine, which can jump circles around even a top-of-the-line PC of just a few years ago, is listed at a base price of $669. These machines are now approaching supercomputer capabilities, at Radio Shack everyday consumer electronics prices. And if you can't afford one yourself, you can rent access to one. The issue is *NOT* the OS. All Dell PCs come pre-loaded with MS Windows XP right now. And guess what -- all that Unicode functionality is packed right under the hood in XP, waiting to go.

And for the record, for slightly more you can get a low-end iMac with Mac OS X -- again, a Unicode-capable OS.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Q: How many enumerated characters in Unicode?
On Wednesday, June 5, 2002, at 07:27 AM, Adam Twardoch wrote:

Oh, thank you! I needed that figure to make a point why you cannot make a single TrueType font covering all of the Unicode range. I knew it was way more than 65,536, but it's better to quote a precise figure :)

Ah, but the figure Ken gave you isn't enough anyway, for two reasons:

1) Some scripts (e.g., south Asian scripts) will require additional glyphs for proper display.

2) For Han in particular, one shape does not fit all. You'll need multiple locale-specific glyphs for a number of characters.

In real life, you can ignore (2) by simply issuing a locale-specific version of a font, but there's no real way to get around (1).

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
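A rough later-era way to see that the character count alone already exceeds the 65,536-glyph limit of a single TrueType font (the exact count depends on the Unicode version baked into your Python interpreter, so this is an illustration, not the precise 2002 figure Ken quoted):

```python
import unicodedata

# Count code points the local Unicode database assigns a character to,
# excluding unassigned (Cn), surrogates (Cs), and private use (Co).
assigned = sum(
    1 for cp in range(0x110000)
    if unicodedata.category(chr(cp)) not in ("Cn", "Cs", "Co")
)
# A TrueType font indexes glyphs with a 16-bit value, capping it at 65,536
# glyphs -- and that is before extra presentation forms are even counted.
print(assigned > 65536)  # True
```

And as the email notes, the real glyph requirement is larger still, since complex scripts need many glyphs per character and Han needs locale-specific variants.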
Re: Hong Kong Supplementary Character Set
On Tuesday, June 11, 2002, at 03:32 PM, Steve Watt wrote: Could someone explain the relationship of the two tags, kIRG_HSource and kHKSCS in the unihan.txt file on the Unicode site? Basically (at the moment), kIRG_HSource is a subset of kHKSCS. They also come via different routes. kIRG_HSource is a listing of those characters where the HKSAR submitted source information to the IRG. So far, all these are in HK SCS, but we can't guarantee this will be the case in the future. The latter comes via the HKSAR's official mapping tables. What would be the approved way to create a conversion table from Windows 950 (with HKSCS) to Unicode? Er, doesn't MS provide one somewhere? == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
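For what it's worth, later tooling makes the code page 950 + HKSCS conversion question easy to explore: Python, for example, ships a big5hkscs codec covering the HKSCS extensions to Big Five. A small sketch (the codec name is Python's; Microsoft separately publishes its own mapping tables):

```python
# Python's standard "big5hkscs" codec handles Big Five plus the Hong Kong
# Supplementary Character Set, the base repertoire behind code page 950.
ideograph_one = "\u4E00"  # U+4E00, the first ideograph in Big Five (0xA440)
encoded = ideograph_one.encode("big5hkscs")
print(encoded)                                    # b'\xa4@'
print(encoded.decode("big5hkscs") == ideograph_one)  # True
```

For authoritative mappings one would still want the HKSAR's official tables, which is where the kHKSCS data in Unihan.txt comes from.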
Re: Chess symbols, ZWJ, Opentype and holly type ornaments.
On Thursday, June 20, 2002, at 03:25 PM, Kenneth Whistler wrote: I think what a number of people on the list have been hinting -- or openly stating -- is that prolixity is not a virtue on an email list when trying to convey one's ideas. IOW, brevity's wit's soul. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: (long) Re: Chromatic font research
On Saturday, June 29, 2002, at 06:41 AM, James Kass wrote:

This is a display issue rather than an encoding one. Unicode already provides for the correct encoding of the ct ligature with the ZWJ character. Anyone wishing to correctly display the ct ligature might need to use a work-around. Substituting PUA code points by private agreement is one workable method.

I must point out that for English (and a lot of other languages), the use of ZWJ to control ligation is considered improper. The ZWJ technique for requesting ligatures is intended to be limited to cases where the word is spelled incorrectly if *not* ligated (and similarly ZWNJ is intended to prevent ligature formation where that would make the word spelled incorrectly). The kind and degree of ligation in English is generally considered a stylistic issue and is best left to higher-level protocols. Thus saith Unicode 3.2.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
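The mechanics of the ZWJ/ZWNJ hints themselves are just character insertions; a minimal sketch of the two sequences under discussion (whether a renderer honors the hints is, of course, up to the font and layout engine):

```python
# U+200D ZERO WIDTH JOINER requests a ligature at this point;
# U+200C ZERO WIDTH NON-JOINER prohibits one. Both are invisible hints.
ZWJ = "\u200D"
ZWNJ = "\u200C"

ligate_ct = "c" + ZWJ + "t"   # ask the font to form a ct ligature here
no_fi = "f" + ZWNJ + "i"      # suppress the default fi ligature here
print(len(ligate_ct), len(no_fi))  # 3 3
```

Note that both strings remain three characters of plain text; only the rendering, never the content, is affected.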
Re: (long) Re: Chromatic font research
Hmm. Disregard the last message from me. It isn't ct you're replacing. See how annoying this all is? :-) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: (long) Re: Chromatic font research
On Saturday, June 29, 2002, at 03:01 PM, [EMAIL PROTECTED] wrote: On 06/28/2002 11:34:35 PM Doug Ewell wrote: sigh / OK, here are the details... OK, now I know the cha of events that he was referrg to, and I'm def itely cled to agree that it was complete cocidence. It is trivial, fact, to disprove the hypothesis that the experiment supposedly proved. Will you guys *please* stop sending me email with the Shavian letter CHURCH everywhere the Latin letters ct should be? It's most distracting. :-) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)
On Sunday, June 30, 2002, at 05:31 AM, James Kass wrote:

Can you please point me to a URL for Unicode 3.2 ligature control? This link (March 2002): http://www.unicode.org/unicode/reports/tr28/ ...glosses over Latin ligatures, suggesting that mark-up should be used in some cases and ZWJ in others.

The precise language of the TR is:

"Ligatures and Latin Typography (addition): It is the task of the rendering system to select a ligature (where ligatures are possible) as part of the task of creating the most pleasing line layout. Fonts that provide more ligatures give the rendering system more options. However, defining the locations where ligatures are possible cannot be done by the rendering system, because there are many languages in which this depends not on simple letter pair context but on the meaning of the word in question. ZWJ and ZWNJ are to be used for the latter task, marking the non-regular cases where ligatures are required or prohibited. This is different from selecting a degree of ligation for stylistic reasons. Such selection is best done with style markup. See Unicode Technical Report #20, Unicode in XML and other Markup Languages, for more information."

That seems pretty clear to me. If you want a ct ligature in your document because you think it looks cool, then you use some higher-level protocol. The "looks cool" factor simply doesn't apply unless you know what font you're dealing with, because ct looks cool in some fonts, but not others.

In real Latin typography, the set of ligatures available with a typeface varies from font to font. Type designers add ligatures (or not) depending on their esthetic sense of what looks good and how the letters interact with one another. From a type design perspective, a monospaced font like Courier should have no ligatures; they don't make sense.
A rich book font like Adobe Minion Pro will have a fairly large but standard set, and a calligraphic font like Linotype's Zapfino will have a huge and imaginative set.

The programs that provide ligature control do so by means of having the user select a range of text and then changing the level of ligation. The type formats like OpenType or AAT support this by allowing the type designer to categorize ligatures as common, rare, required, and so on. Thus, if I'm typesetting a document in Adobe InDesign, I'll select text and turn rare ligatures on, and thus see the ct ligature, if it exists in the font and if the type designer has designated it a rare ligature.

To be frank, turning on an optional ct ligature throughout a document by means of inserting ZWJ everywhere you want it to take place makes as much sense in that model -- the model that Western typography uses for languages such as English -- as having the user insert an <i>/</i> pair around every letter they want in italics.

Remember, Unicode is aiming at encoding *plain text*. For the bulk of Latin-based languages, ligation control is simply not a matter of *plain text* -- that is, the message is still perfectly correct whether ligatures are on or off. There are some exceptional cases. The ZWJ/ZWNJ mechanism is available for such exceptional cases.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: (long) Re: Chromatic font research
On Monday, July 1, 2002, at 05:31 AM, Michael Everson wrote:

I must point out that for English (and a lot of other languages), the use of ZWJ to control ligation is considered improper. The ZWJ technique for requesting ligatures is intended to be limited to cases where the word is spelled incorrectly if *not* ligated

What! No! Look at my paper and the examples of Runic and Old Hungarian and Irish. There are examples where ligation is used on a nonce-basis, not having anything to do with global ligation or correctness.

Michael, I was very careful to say "English (and a lot of other languages)". And, by and large, the software which supports ligation doesn't compel global on-or-off, so that nonce-bases are supported.

What's frustrating for me about this never-ending discussion is that it always seems to come down to the stupid ct-ligature in English. I have a book that uses it *everywhere* and it gets *really* annoying. :-(

I have sitting in front of me a reprint of a nineteenth-century reproduction of the 1611 King James Version of the Bible. It uses the ct ligature in the headers, but not in the text. (It also uses the long-s, by the way.) But if someone were to come to me and ask how you would use plain text to reproduce this text, I'd tell them you can't, or shouldn't -- trying to reproduce precisely the visual appearance of a text isn't a job for plain text. Period.

I also have a font on my machine based on the handwriting of Hermann Zapf. It's a gorgeous font with a huge, idiosyncratic set of ligatures. It doesn't make sense to have the user (or software) insert ZWJ all over the place on the off-chance that the text will end up being set with Zapfino to make sure that these ligatures form correctly.

Our system fonts are set, moreover, to do fi- and fl-ligature formation automatically (well, most of them are). That's because it's the appropriate default behavior for most Latin-based languages. (Not all. I know that.)
Where this behavior is *not* appropriate, there are mechanisms, including the ZWJ/ZWNJ one, which can override the default behavior. This means that file names, menus, dialog boxes, email, and so on all do the most-nearly-correct thing without having to be told to.

(and similarly ZWNJ is intended to prevent ligature formation where that would make the word spelled incorrectly). The kind and degree of ligation in English is generally considered a stylistic issue and is best left to higher-level protocols. Thus saith Unicode 3.2.

It doesn't go so far as to say what you did. Maybe Book needs to check the text some on this point. We should have consensus.

No, the bit about spelling is simply my attempt to state informally the idea that Unicode 3.2 is attempting to convey. Let's just have a quick survey here to see if there's consensus:

1) In Latin typography, ligature formation is generally a stylistic choice. There are exceptions, and these exceptions are more or less common depending on the precise language being represented.

2) Where ligature formation *is* a stylistic choice, it should not be controlled in plain text but by some sort of higher-level mechanism. Such a mechanism should allow the default formation of ligatures with the ability for the user to override the default behavior.

3) Where ligature formation is *not* a stylistic choice, the ZWJ/ZWNJ mechanism is an appropriate one to provide ligation control.

4) The precise set of ligatures in a Latin typeface is design-specific. A typeface should not be required to include a set of ligatures which do not make aesthetic sense for the overall design.

This last point, by the way, is the one which is the big sticking point for the large type foundries that I've spoken to.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures
On Monday, July 1, 2002, at 10:16 AM, Michael Everson wrote:

Some nice person just said to me privately:

Michael Everson wrote: In my paper http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2317.pdf I raised a lot of questions about exceptions and the use of these. I don't think they were ever all answered. My other papers, N2141 and N2147, show a number of examples of ligation which is not particularly predictable. That's what ZWJ is supposed to be for.

That's because some people (not to mention any ad-hominem names; there is more than one) are more interested in saying "This is a simple problem, and the rendering systems of the future (or my Mac today) will handle it automatically" than in answering the complex linguistic and orthographic questions you raised.

For the record, I (at least) have never asserted that Mac (or any other) system software will ever gain the ability to handle ligation on a completely automatic basis. In any event, the ZWJ/ZWNJ mechanism has no advantage over any higher-level protocol when it comes to software support, since it's all being done via AAT/OpenType/Graphite or something similar in any event.

I guess one thing that's frustrating for me personally in this perennial discussion is the creation of this false dichotomy, that ligation control either *must* be in plain text or *must* be expressly forbidden in plain text. I would agree, Michael, that your arguments that some degree of ligation control belongs in plain text were unanswerable. You did a good job there. But at the same time, I've never heard you argue that the only way to turn ligatures on or off is in plain text.

I feel compelled to reiterate my own feelings on the subject: Ligation in Latin text is generally a matter of stylistic preference, and depends on the specific typeface being used and its set of available ligatures. There are exceptions, and these should be handled via the ZWJ/ZWNJ mechanism.
Where ligation is merely a matter of stylistic preference, however, it should be handled by some other mechanism which can take the specific capacities of a typeface into consideration. System and other software can (and should) provide default ligation which the user should be able to override. And under no circumstances should new Latin ligatures be added to Unicode. Personally I think your ZERO-WIDTH LIGATOR papers are among the best of all your Unicode-related papers. I agreed with the decision to unify the ligation function with ZWJ rather than creating a new character, but your arguments about Latin, Greek, Runic, Old Hungarian, etc. ligation were thorough and unassailable. Thank you, nice person. It's nice to know that someone else looked at the argument and came up with the same conclusion that I did. For the record, Michael, this was the general feeling of the UTC when the matter was debated there. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)
On Monday, July 1, 2002, at 06:28 AM, James Kass wrote: John H. Jenkins wrote: That seems pretty clear to me. If you want a ct ligature in your document because you think it looks cool, then you use some higher-level protocol. The looks cool factor simply doesn't apply unless you know what font you're dealing with, because ct looks cool in some fonts, but not others. It's enough that an author would want a ct ligature to appear in text, the motivation for the desire isn't relevant. Authors who want to specify a certain ligature know about font selection. Au contraire, because of the italic analog. I may *want* a particular word to be in italics, but that doesn't mean that the italics belong in plain text. It is not the goal of Unicode to allow the complete representation of an author's intent in plain text. I can't typeset Alice in Wonderland in plain text. I'm sorry, but the Mouse's tail would simply get in the way. There's another level of problem here, too. What if it isn't the author's intent, but an artifact of the particular typesetter? One problem with TR28 is that it is worded so that it appears to be in addition to earlier guidelines. This implies that the examples used in TR27, for one, are still valid. In TR27, font developers are urged to add things like f+ZWJ+i to existing tables where f+i is already present. And for the record, Apple is doing that. Another problem with TR28 is that its date is earlier than the date on TR27. This suggests that TR27 is more current. This may be a point for clarification in TR28. Another issue is that a search of the Unicode site for controlling ligatures gives TR27 as a hit, but not TR28. Having slept on this, I concur that it might be cool to be able to turn on or turn off ligatures over a range of text or an entire file using a higher level protocol. However, options should be preserved for the user. 
Ligature selection is a task for the author/typesetter at the fundamental level; it should not be completely left to the rendering system. Er, James. I've never said it should. The rendering system should have the ability to do default ligation. The user should be able to override that behavior. That's what happens on systems I see. If they do ligation at *all*, they have a default behavior which can be overridden. The programs that provide ligature control do so by means of having the user select a range of text and then changing the level of ligation. The type formats like OpenType or AAT support this by allowing the type designer to categorize ligatures as common, rare, required, and so on. Thus, if I'm typesetting a document in Adobe InDesign, I'll select text, and turn rare ligatures on and thus see the ct ligature, if it exists in the font and if the type designer has designated it a rare ligature. That's a lot of ifs and it leaves too much to chance. When an author determines that, for instance, a ct ligature is required, there needs to be a method to encode it which is unambiguous. ZWJ fits the bill. I'll repeat a point that I've made over and over and over. The ct ligature does not exist in and of itself. It is a part of a typeface. It doesn't make sense in general to ask for the formation of a ct ligature without any reference to the typeface you're using. The implication of what you're saying is that Latin typefaces should be *required* to have a ct ligature on the off chance that the author of text determines that it's required in a particular context. That gives most type designers the heebie jeebies. It's bad enough that Adobe and Apple are making them stick useless fi and fl ligatures in their fonts. In any event, if an author determines that a ct ligature is honestly and absolutely *required* in a particular context (as opposed to being desirable), then the ZWJ mechanism exists. 
To be frank, turning on an optional ct ligature throughout a document by means of inserting ZWJ everywhere you want it to take place makes as much sense in that model (the model that Western typography uses for languages such as English) as having the user insert an <i>/</i> pair around every letter they want in italics. Not at all. This is apples and oranges. The italic tags operate upon every character in the enclosed string equally. Using a similar ligature tag would be expected to make ligatures wherever possible within the enclosed string according to the user's system's ability to render ligatures... irrespective of the author's intent. Depending upon the system, the same run of text could be expressed with no ligatures at all in a monospaced font or as scriptio continua in a handwriting font. Er, you've just made my point, haven't you? The typeface makes a difference. If you're ever in a situation where the typeface of the originator may be different from the typeface of the receiver, you've lost the ability to say whether or not ligatures will be formed.
Re: Radicals in CNS 11643-1992, Plane 1, Rows 7,8,9
On Monday, July 1, 2002, at 03:10 PM, Torsten Mohrin wrote: What should I do with these characters when converting CNS to Unicode? Mapping to regular Han? Are there compatibility ideographs for round-trip conversion? Use the KangXi radicals in the KangXi radical block (U+2Fxx). == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)
On Monday, July 1, 2002, at 02:08 PM, Asmus Freytag wrote: At 11:34 AM 6/30/02 -0600, John H. Jenkins wrote: Remember, Unicode is aiming at encoding *plain text*. For the bulk of Latin-based languages, ligation control is simply not a matter of *plain text*; that is, the message is still perfectly correct whether ligatures are on or off. There are some exceptional cases. The ZWJ/ZWNJ is available for such exceptional cases. Remember also that the simplistic model you present already breaks down for German, since the same character pair may or may not allow ligation depending on the content and meaning of the text - features that in the Unicode model are relegated to *plain* text. *sigh* I'm clearly not expressing myself well here. I'm trying to state the general rule. Each time I do, I say there are exceptions. German is an excellent example of an exception. Michael's exceptional cases are exceptional cases. We put ZWJ/ZWNJ in charge of plain-text ligature formation to handle these cases. I'm fine with that. Turkish is another exception, BTW, where the typical fi ligature of Latin typography should not be formed. The issue -- as I see it -- is not whether or not *any* ligature control belongs in plain text, or whether or not mandatory/prohibited ligation points should be marked in plain text. I'm not aware of anyone who is arguing against that position. We started out with a discussion of whether or not we should add more Latin ligatures (whether in the PUA or elsewhere) so that people can, in essence, create a plain-text representation of an older book where such were more common. (And, as always, if my memory is inaccurate please feel free to correct me here.) This is not an appropriate use of plain text IMHO. I do not believe, moreover, that the ZWJ/ZWNJ mechanism is appropriate for this sort of thing. This is rich text, and other ligation controls should be used. 
Therefore, I would be much happier if the discussion of the 'standard' case wasn't as anglo-centric and allowed more directly for the fact that while fonts are in control of what ligatures are provided, layout engines may be in control of what and how many optional ligatures to use, the text (!) must be in control of where ligatures are mandatory or prohibited. Which is what Unicode 3.2 says. (You said it very nicely here, though.) (The standard case, BTW, seems to be Anglo-centric largely because this is an English-speaking list and people always seem to start out with the ct ligature they'd like to put in words like respectfully. Sorry about that.) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures
On Monday, July 1, 2002, at 01:03 PM, Tex Texin wrote: The discussion refers to other ways of influencing a font with respect to ligature and I don't recall ever seeing a way to do this. What kinds of products have these abilities? It's a pretty common feature of desktop publishing applications: Quark, FrameMaker, InDesign. TextEdit, the default text editor on Mac OS X, does it, but it's not at all common at the low end of things. I wouldn't be surprised if it showed up in Word eventually, however. In FrameMaker, which I happen to have open at the moment, you do it by turning pair kerning on and off. InDesign has a menu that lets you select degree of ligation. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures
On Tuesday, July 2, 2002, at 06:51 AM, Michael Everson wrote: That is absolutely true. I have never argued that the only way to turn ligatures on or off is in plain text. I saw that there were difficult edge cases and sought blessing for the ZWJ/ZWNJ mechanism to handle them, and won the day. But it would certainly be my view that those should only be used where predictable ligation does not occur. A Runic font which had an AAT/OpenType/Graphite ligatures-on mechanism would, in my view, be inappropriate, because ligation is unusual in Runic, never the norm, and should only be used on a case-by-case basis. Runic fonts should have the ZWJ pairs encoded in the glyph tables. Alas, but that's technically impossible. Both OT and AAT (I'm not sure about Graphite) require that single characters map to single glyphs, which are then processed. (In OT, of course, you are also supposed to do some preprocessing in character space, but that doesn't solve this problem.) It would be nice to have a cmap format which maps multiple characters to single glyphs initially. The way we deal with this is to have the ligatures with the ZWJ inserted as part of a ligature table which is on by default and which isn't revealed to the UI so that the user can't turn them off. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures
On Tuesday, July 2, 2002, at 09:49 AM, Michael Everson wrote: At 09:41 -0600 2002-07-02, John H. Jenkins wrote: Alas, but that's technically impossible. Both OT and AAT (I'm not sure about Graphite) require that single characters map to single glyphs, which are then processed. Hm? How do you handle the decomposed sequence A + COMBINING ACUTE? Surely that is a sequence of characters mapping to a single glyph. Same process. In OT, of course, you could count on the glyph being prenormalized (but this only works for stuff already in Unicode), or you could use the GPOS table to properly form the accented form on-the-fly. But neither technology allows the decomposed sequence to be mapped directly to a single glyph. Just goes to show that I don't make proper Unicode fonts yet because the tools just aren't up to snuff. We're working on it. :-) (In OT, of course, you are also supposed to do some preprocessing in character space, but that doesn't solve this problem.) It would be nice to have a cmap format which maps multiple characters to single glyphs initially. I always thought there was. Now I'm really confused as to how I would make a complex Indic syllable. Same sort of thing. You put the glyph in the font and the instructions for what sequence forms it in the GSUB or morx table. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
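The A + COMBINING ACUTE case above can be checked at the character level with Python's unicodedata module. This is not font machinery, just a sketch of the equivalence the font tables have to honor:

```python
import unicodedata

decomposed = "A\u0301"   # LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT
precomposed = "\u00C1"   # LATIN CAPITAL LETTER A WITH ACUTE

# NFC normalization recombines the decomposed pair into the
# precomposed character...
assert unicodedata.normalize("NFC", decomposed) == precomposed
# ...and NFD decomposes it again.
assert unicodedata.normalize("NFD", precomposed) == decomposed
# Either way the font sees one code point or two; in OT and AAT each
# character maps to a glyph first, and combination happens afterward
# in glyph space (GSUB/GPOS or morx), not in the cmap.
```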
Re: ZWJ and Latin Ligatures
On Tuesday, July 2, 2002, at 10:55 AM, Marco Cimarosti wrote: I mean: isn't this two-step mapping: code point -> glyph ID, component glyph IDs -> ligature glyph ID, functionally equivalent to a hypothetical one-step mapping? component code points -> ligature glyph ID Am I missing something? Functionally, the two are equivalent. There are, however, two subtle differences: 1) If you map directly from multiple characters to a single glyph, you don't have to include glyphs in your font for all the pieces if they're never supposed to appear by themselves. As an extreme example, if I implemented astral character support via ligating surrogate pairs, I'd need to include glyphs for the unpaired surrogates. As it is, Windows and the Mac *do* support mapping paired surrogates directly to glyphs, so you don't need these extra glyphs which are never seen. 2) A mapping directly from multiple characters to single glyphs expressly makes the process something not to percolate up to the UI. The indirect process means that there are some actions in glyph space which *are* optional and which the user can turn on and off, and others which aren't. In OpenType, this is less of an issue since this was always the case and applications are expected to do the UI work themselves. In AAT, we originally assumed (back in the days of the Technology That Must Not Be Named) that all layout features are optional and can be turned on and off, and that the UI would always reflect the entire suite of available features. We had to rewrite our tools to allow for required actions which cannot be turned off. Poor Michael is saddled with older versions of our tools which are hard to use and don't let him do this. We're working on getting newer and better ones to him. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
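Marco's functional equivalence can be sketched with toy tables in Python. The glyph IDs are invented for illustration, not taken from any real font; real fonts do step 1 in the cmap and step 2 in GSUB or morx:

```python
# Toy tables -- glyph IDs are made up, purely illustrative.
CMAP = {"f": 10, "i": 11}        # step 1: code point -> glyph ID
LIGA = {(10, 11): 30}            # step 2: glyph ID pair -> ligature glyph

def two_step(text):
    """cmap first, then ligate in glyph space (the OT/AAT model)."""
    glyphs = [CMAP[c] for c in text]
    out, i = [], 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if len(pair) == 2 and pair in LIGA:
            out.append(LIGA[pair]); i += 2
        else:
            out.append(glyphs[i]); i += 1
    return out

# The hypothetical one-step cmap: character *sequences* map straight
# to glyph IDs, so pieces that never appear alone (e.g. unpaired
# surrogates) would need no glyphs of their own.
ONE_STEP = {("f", "i"): 30, ("f",): 10, ("i",): 11}

def one_step(text):
    out, i = [], 0
    while i < len(text):
        pair = tuple(text[i:i + 2])
        if len(pair) == 2 and pair in ONE_STEP:
            out.append(ONE_STEP[pair]); i += 2
        else:
            out.append(ONE_STEP[(text[i],)]); i += 1
    return out

assert two_step("fi") == one_step("fi") == [30]
assert two_step("if") == one_step("if") == [11, 10]
```

The outputs are identical; the differences Jenkins lists are about what the font must contain and what the UI is allowed to expose, not about the glyph stream produced.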
Re: ZWJ and Latin Ligatures
On Tuesday, July 2, 2002, at 11:39 AM, John Cowan wrote: 1) If you map directly from multiple characters to a single glyph, you don't have to include glyphs in your font for all the pieces if they're never supposed to appear by themselves. As an extreme example, if I implemented astral character support via ligating surrogate pairs, I'd need to include glyphs for the unpaired surrogates. More precisely, you need to have glyph *indexes* that are never mapped to glyphs. The actual outlines themselves don't need to exist, AFAIK. True. I tend to avoid that, because if something goes wrong and the system attempts to actually *display* one of these virtual glyphs, disaster would ensue. (Dave Opstad and I have had long debates on the safety of doing this.) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures
On Tuesday, July 2, 2002, at 12:51 PM, Marco Cimarosti wrote: The next step could be standardizing the values of the glyph indexes, so that the entire GSUB/morx table can be copied in from a template, and type designers can concentrate on drawing the outlines. The typical approach these days is for the tools that provide advanced layout table support to be keyed to glyph name. Apple's tools allow glyph name, glyph number, or Unicode code point as glyph identifiers. As you say, it makes it possible to cut-and-paste source files and is very handy. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: FW: Inappropriate Proposals FAQ
On Wednesday, July 3, 2002, at 11:57 AM, Asmus Freytag wrote: Klingon (or any of the Latin ciphers/ movie scripts) I'd say Klingon *and* one of the Latin ciphers. Klingon is almost worth a FAQ in itself. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Inappropriate Proposals FAQ
On Wednesday, July 3, 2002, at 02:23 PM, Murray Sargent wrote: as something inappropriate. Question: how does one code up (presumably with markup) a caret over a jk pair in a math expression? The dot on the j should be missing for this case, but how does one communicate that to a font if there's no code for a dotless j? It seems that dotless j is needed for some mathematical purposes. The glyph is; the character isn't. There are also accented j's which are based on a dotless-j. The way we do it is include a glyph called dotlessj in the font, and have the tables set up so that whenever j is found with an accent, dotlessj is substituted. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
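The dotlessj substitution Jenkins describes can be mimicked at the character level. A sketch, assuming a font whose tables swap in a dotless base glyph under any combining mark; the glyph names ("dotlessj", "dotlessi") are illustrative conventions, not a particular font's:

```python
import unicodedata

# When "j" (or "i") carries a combining mark, substitute a dotless
# base glyph before the accent is attached.  In a real font this rule
# lives in the GSUB/morx tables; here it is plain Python.
def base_glyph_for(char, next_char):
    next_is_mark = bool(next_char) and unicodedata.combining(next_char) != 0
    if char == "j" and next_is_mark:
        return "dotlessj"
    if char == "i" and next_is_mark:
        return "dotlessi"
    return char

text = "j\u0302"   # j + COMBINING CIRCUMFLEX ACCENT (as in a math j-hat)
glyphs = [base_glyph_for(c, text[k + 1] if k + 1 < len(text) else "")
          for k, c in enumerate(text) if unicodedata.combining(c) == 0]
assert glyphs == ["dotlessj"]
```

This is why the *glyph* is needed but the *character* is not: the dotless form only ever appears as an artifact of rendering an accented j.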
Re: FW: Inappropriate Proposals FAQ
On Thursday, July 4, 2002, at 09:07 AM, Otto Stolz wrote: Michael Everson wrote: That, and the fact that it hasn't been deciphered. Which implies that you really cannot tell what constitutes a character, in that script, nor its writing-direction. Actually, you can't even tell *that* it's a script, not for sure. But if it *is* writing, then the nature of the characters seems fairly unambiguous as the various signs are self-contained and don't break down into smaller pieces. It would appear to be a syllabary. Also IIRC the writing direction has been deduced by determining the order in which the characters were stamped into the clay (as indicated by overlaps). I should mention that the proposals for the encoding of the Phaistos disc are the only proposals made to the UTC and WG2 which contain the entire known corpus of writing with that script as a part of the proposal. :-) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: The pointless thread continues
On Friday, July 5, 2002, at 08:54 AM, John Hudson wrote: Actually, this isn't nonsense. A single buggy font is quite capable of crashing an operating system. Obviously the damage is not permanent, presuming one is able to get the system started in safe mode and remove the offending font. I've seen some spectacularly nasty fonts over the years, as have many of my colleagues (including engineers in the type group at Apple, so this isn't simply a Windows issue). C'est vrai. One of the fonts we used to print Unicode 2.0 killed *all* text display on the system if it were to be used with ATSUI. It was kind of cool, actually. We actually have a font zoo stashed away full of pathological fonts which have been known to do all kinds of interesting things if someone should be foolish enough to install them. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: How do I encode HTML documents in old languages ſuch as 17th century Swediſh in Unicode?
On Wednesday, July 3, 2002, at 11:10 AM, Stefan Persson wrote: There is a big problem in the current Unicode ſtandard, ſince Fraktur letters aren't ſupported in any ſuitable manner. Aargh! Medial long-s! Run away! Run away! :-) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)
On Saturday, July 6, 2002, at 03:42 AM, James Kass wrote: We certainly agree that ligature use is a choice. I think we diverge on just what kind of choice is involved. You consider that ligature use is generally similar to bold or italic choices. I consider use of ligatures to be more akin to differences in spelling. If you're quoting from a source which used the word "fount", it is wrong to change it to "font". And, if you're quoting from a source which used "hæmoglobin", anything other than "hæmoglobin" is incorrect. If the source used "&c.", it should never be changed to "etc.". So, if the source used the ct ligature... I see your point, but I think we're to the stage where we'll just have to agree to disagree. We *do* agree that ligation is a choice, but you're quite accurate in your assessment of where precisely we diverge. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: [OpenType] Proposal: Ligatures w/ ZWJ in OpenType
On Saturday, July 6, 2002, at 04:11 PM, John Hudson wrote: There are going to be documents containing this character -- and ZWNJ -- and fonts that do not contain these characters may display them with .notdef glyphs. The only solution is system or application intelligence that is able to ensure that no attempt is made to display glyphs for these characters. This issue seems to have already been resolved in MS text processing, at least as far as I have tested it in WordPad. I have inserted a ZWJ character in a string of text using a standard PS Type 1 font, and the character is treated as a zero-width, no outline control character. Well, by default no attempt is made to display glyphs for these characters. (Somebody may have a show invisibles or equivalent on. BTW, does OT have a show invisibles feature? I'm too lazy to check right now.) We also have a list of invisible characters which should, ordinarily, be left undisplayed including ZWJ, ZWNJ, the bidi overrides, and so on. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
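The "list of invisible characters which should, ordinarily, be left undisplayed" can be sketched as follows. The specific set here is an illustrative subset of my own choosing (Unicode later formalized this idea as the Default_Ignorable_Code_Point property); the email does not enumerate the actual list:

```python
# Format/control characters a renderer should leave undisplayed by
# default, rather than showing .notdef boxes.  Illustrative subset.
INVISIBLES = {
    0x200C,  # ZERO WIDTH NON-JOINER
    0x200D,  # ZERO WIDTH JOINER
    0x200E,  # LEFT-TO-RIGHT MARK
    0x200F,  # RIGHT-TO-LEFT MARK
    0x202A,  # LEFT-TO-RIGHT EMBEDDING
    0x202B,  # RIGHT-TO-LEFT EMBEDDING
    0x202C,  # POP DIRECTIONAL FORMATTING
    0x202D,  # LEFT-TO-RIGHT OVERRIDE
    0x202E,  # RIGHT-TO-LEFT OVERRIDE
    0xFEFF,  # ZERO WIDTH NO-BREAK SPACE (BOM)
}

def displayable(text: str) -> str:
    """Drop default-invisible format characters that the current font
    has no glyph for, before glyph lookup."""
    return "".join(c for c in text if ord(c) not in INVISIBLES)

assert displayable("fi\u200Dnal") == "final"
```

A "show invisibles" mode would skip this filter and render placeholder glyphs instead.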
Re: Proposal: Ligatures w/ ZWJ in OpenType
On Monday, July 15, 2002, at 09:58 AM, Doug Ewell wrote: No, what bothers me is that the ZWJ/ZWNJ ligation scheme is starting to look just like the DOA (deprecated on arrival) Plane 14 language tags. In each case, Unicode has created a mechanism to solve a genuine (if limited) need, but then told us -- officially or unofficially -- that we should not use it, or that it is reserved for use with special protocols which are never defined or mentioned again. I'm not sure I agree with you here. The position of the UTC is not that ZWJ should never be used and we're sorry we added it, which is the case of the Plane 14 language tags. It's that the ZWJ should not be the primary mechanism for providing ligature support in many cases. That's as far as it goes. The UTC may have intended that ZWJ ligation be used only in rare and exceptional circumstances, but UAX #27, revised section 13.2 doesn't say that. The latest word is the Unicode 3.2 document, not the Unicode 3.1 document. It says: Ligatures and Latin Typography (addition) It is the task of the rendering system to select a ligature (where ligatures are possible) as part of the task of creating the most pleasing line layout. Fonts that provide more ligatures give the rendering system more options. However, defining the locations where ligatures are possible cannot be done by the rendering system, because there are many languages in which this depends not on simple letter pair context but on the meaning of the word in question. ZWJ and ZWNJ are to be used for the latter task, marking the non-regular cases where ligatures are required or prohibited. This is different from selecting a degree of ligation for stylistic reasons. Such selection is best done with style markup. See Unicode Technical Report #20, Unicode in XML and other Markup Languages for more information. 
It says that ZWJ and ZWNJ *may be used* to request ligation or non-ligation, and that font vendors should add ZWJ to their ligature mapping tables as appropriate. It does acknowledge that some fonts won't (or shouldn't) include glyphs for every possible ligature, and never claims that they must (or should). It specifically does *not* say that ZWJ ligation is to be restricted to certain orthographies, or to cases where ligation changes the meaning of the text. This is correct. Nor is this changed in Unicode 3.2. The goal is to make the ZWJ mechanism available to people who feel it is appropriate to meet their needs, but to try to inform them that in the majority of cases, a higher-level protocol would be better. Adobe doesn't have to revise InDesign, for example, to insert ZWJ all over when a user selects text and turns optional ligatures on. OTOH, the hope is that if ligatures are available InDesign will honor the ZWJ-marked ones, even if ligation has been turned off. John Hudson has recommended what seems a reasonable way to handle this in OT. Apple will be releasing new versions of its font tools in the near future, and the documentation will include a recommendation for how this can be done with AAT. We've been revising our own fonts as the opportunity presents itself to support ZWJ as well. (The system and ATSUI-savvy applications require no revision.) The push-back coming from the font community on the issue has to do mostly with the communications problem that they weren't aware of it in as timely a fashion as would have been best, and the concern that font developers and application/OS developers will be forced to add ligature support where they have felt it inappropriate in the past. ZWJ/ZWNJ for ligation control is part of Unicode. It is not always the best solution, but it is *a* solution, and should be available to the user without restriction or discouragement. It's discouraged when it's inappropriate. It isn't deprecated. 
There are numerous places where Unicode provides multiple ways of representing something. In this instance, Unicode is trying to delineate where a particular mechanism is appropriate and where inappropriate. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
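The hoped-for behavior in this exchange (stylistic ligation turned off, but ZWJ-marked ligatures still honored) can be sketched in Python. The ligature table and the "f_i"/"c_t" glyph names are made up for illustration; real implementations do this in the shaping engine against the font's tables:

```python
ZWJ = "\u200D"

# Made-up ligature table: both the plain pair and the ZWJ-marked
# triple map to the same ligature glyph, as TR27 urged font vendors
# to set up.
LIGATURES = {("f", "i"): "f_i", ("f", ZWJ, "i"): "f_i",
             ("c", "t"): "c_t", ("c", ZWJ, "t"): "c_t"}

def shape(text, stylistic_ligatures=True):
    out, i = [], 0
    while i < len(text):
        tri = tuple(text[i:i + 3])
        pair = tuple(text[i:i + 2])
        if len(tri) == 3 and tri[1] == ZWJ and tri in LIGATURES:
            out.append(LIGATURES[tri]); i += 3   # ZWJ-marked: always honored
        elif stylistic_ligatures and len(pair) == 2 and pair in LIGATURES:
            out.append(LIGATURES[pair]); i += 2  # discretionary ligation
        elif text[i] == ZWJ:
            i += 1                               # unmatched joiner: invisible
        else:
            out.append(text[i]); i += 1
    return out

# With stylistic ligation off, only the ZWJ-marked ct forms:
assert shape("respec" + ZWJ + "t fi", stylistic_ligatures=False) == \
       ["r", "e", "s", "p", "e", "c_t", " ", "f", "i"]
```

The point of the sketch: the higher-level setting governs the discretionary pairs, while the plain-text ZWJ request survives either setting, which is exactly the division of labor Unicode 3.2 describes.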
Re: Missing character glyph
On Tuesday, July 30, 2002, at 08:58 PM, Doug Ewell wrote: Have Last Resort symbols been devised for all the blocks in Unicode, including the new ones like Tagalog? Neither Mark Leisher's page nor the Apple typography page contains a complete list. Yes. It covers all of Unicode 3.2; but the font has been entirely redesigned. We really need to update our documentation. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Digraphs as Distinct Logical Units
On Friday, August 9, 2002, at 03:54 AM, Andrew C. West wrote: And in China, historically the personal names of emperors (for emperors read dictators) have been tabooed An Ideographic Taboo Variation Indicator has been approved by the UTC for addition to the standard to handle precisely this kind of situation (see http://www.unicode.org/unicode/alloc/Pipeline.html). It works on the theory that you rarely need to know the precise *form* of the taboo variant, just that a taboo form is being used. There was some disagreement in WG2 about its utility, however, and there is the problem that, as you note, some taboo variants have already been encoded. It's currently scheduled to be reconsidered by the UTC. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Taboo Variants
On Friday, August 9, 2002, at 11:38 AM, Andrew C. West wrote: My point is that if the commonly encountered taboo variants are already encoded in CJK-B, then either the other taboo variants should also be added to CJK-B or they could be *described* using IDCs. Encoding them was a mistake, pure and simple. We didn't monitor the IRG well enough in the CJK-B encoding process, or we would have objected to this kind of cruft. And describing them is a valid approach. It depends on what's more important to you: the appearance (which IDS's are better at), or the semantic (which is explicit with the TVS). Adding a taboo variant selector does make a difference, because then there'll be more than one way to reference the same character. Well, yes and no. Even though we've already got taboo variants encoded, we have no way to flag in a text that the purpose they're serving is taboo variants. The interesting thing about the taboo variants is precisely that meaning: This is character X written in a deliberately distorted way. You identified the taboo variants you found in Ext B not based on anything in the standard, but because of your outside knowledge. A student encountering them in a text may well be stymied until she goes to her professor. Meanwhile, multiple encodings of the same Han character are *already* a major problem. This is one reason why the UTC is determined to be stricter in the future to keep it from continuing to happen. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Keys. (derives from Re: Sequences of combining characters.)
On Friday, September 27, 2002, at 09:52 AM, [EMAIL PROTECTED] wrote: I doubt there's anyone on this list that always agrees with me I think you're wrong, there, Peter. I *never* disagree with you. :-) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: script or block detection needed for Unicode fonts
On Saturday, September 28, 2002, at 03:19 PM, David Starner wrote: On Sat, Sep 28, 2002 at 01:19:58PM -0700, Murray Sargent wrote: Michael Everson said: I don't understand why a particular bit has to be set in some table. Why can't the OS just accept what's in the font? The main reason is performance. If an application has to check the font cmap for every character in a file, it slows down reading the file. Try, for example, opening a file for which you have no font coverage in Mozilla on Linux. It will open every font on the system looking for the missing characters, and it will take quite a while, accompanied by much disk thrashing to find they aren't there. This just seems wildly inefficient to me, but then I'm coming from an OS where this isn't done. The app doesn't keep track of whether or not a particular font can draw a particular character; that's handled at display time. If a particular font doesn't handle a particular character, then a fallback mechanism is invoked by the system, which caches the necessary data. I really don't see why an application needs to check every character as it reads in a file to make sure it can be drawn with the set font. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
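The display-time fallback Jenkins describes can be caricatured in a few lines of Python. The "fonts" here are just coverage sets and the module-level cache stands in for the system's cached fallback data; all names are invented for the sketch:

```python
# Toy model of display-time font fallback.  Each missing character is
# resolved at most once and then cached, instead of the application
# rescanning every installed font per character at file-load time.
FONTS = {
    "Times": set("abcdefghijklmnopqrstuvwxyz"),
    "Symbol": set("αβγ"),
    "LastResort": None,   # None = claims coverage of everything
}

_fallback_cache = {}

def font_for(char, preferred="Times"):
    coverage = FONTS[preferred]
    if coverage is not None and char in coverage:
        return preferred                     # common case: no scan at all
    if char not in _fallback_cache:          # scan once, cache the answer
        for name, cov in FONTS.items():
            if cov is None or char in cov:
                _fallback_cache[char] = name
                break
    return _fallback_cache[char]

assert font_for("a") == "Times"
assert font_for("α") == "Symbol"
assert font_for("已") == "LastResort"
```

Nothing here requires the application to touch a cmap while reading the file; coverage only matters when a character is actually drawn.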
Re: Mac Unicode question
On Tuesday, October 1, 2002, at 08:42 AM, Alan Wood wrote: I don't think anyone replied to this. As far as I know, these are the only applications for Mac OS 9 that can use Windows TrueType fonts: On X, any (non-Classic) application can use Windows TrueType fonts. Carbon applications which do not explicitly use ATSUI or MLTE are limited in how much of the font they can use. Cocoa apps are pretty much able to do anything. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: is this a symbol of anything? CJK?
On Thursday, October 10, 2002, at 02:29 PM, Tex Texin wrote: It looks close to several cjk characters, so I wasn't sure. I think it's a variant turtle ideograph. :-) (Nothing bad, so far as I know.) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Manchu/Mongolian in Unicode
On Sunday, October 13, 2002, at 12:26 PM, Tom Gewecke wrote: The latest Mac OS X upgrade has fonts that include the classic Mongolian/Manchu range, 1800-18AF. Well, yes, but they're not ready for prime time. They're included because of PRC requirements which expect the glyphs but don't really insist that they do the right thing. The same is true of Tibetan. Even the PRC's own fonts have this problem. This is an unfortunate bind we were put in and I hope we can correct it in a not-too-distant release. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: the carnival of lost souls
It's Carnival of Souls, actually. http://us.imdb.com/Title?0055830 is the original version, made by a fellow whose stock-in-trade was those old movies they used to show in high school to teach hygiene and the like. He shot it in something like a week while he was supposed to be on vacation, mostly in Lawrence, Kansas, and Salt Lake City, using the abandoned spa on the Great Salt Lake, Saltair, as a major set. Now, do you think I could have gotten any *more* off-topic than that? On Tuesday, October 15, 2002, at 06:43 AM, John Cowan wrote: Pavla OR Francis Frazier scripsit: the carnival of lost souls What an expression! Almost makes me want to view the poster to see what inspired it... Googling suggests that this is the title of a film, but the Internet Movie Database (imdb.com) knows it not. -- My corporate data's a mess! John Cowan It's all semi-structured, no less.http://www.ccil.org/~cowan But I'll be carefree [EMAIL PROTECTED] Using XSLThttp://www.reutershealth.com In an XML DBMS. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Sorting on number of strokes for Traditional Chinese
The Unihan database has total stroke count for many (but not all) characters. It may provide an adequate first-order set of data for a pure stroke-based ordering in TC. On Tuesday, October 15, 2002, at 12:02 PM, Magda Danish (Unicode) wrote: -Original Message- Date/Time: Tue Oct 15 05:13:41 EDT 2002 Contact: [EMAIL PROTECTED] Report Type: Other Question, Problem, or Feedback To whom it concerns, I wonder whether Unicode provides us a way to do sorting on number of strokes for Traditional Chinese characters. This is urgent, please advise. regards Tony -- (End of Report) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
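For readers who want to try the first-order ordering described above, here is a minimal Python sketch keyed on kTotalStrokes-style values. The stroke counts are a small hand-copied sample rather than data read from Unihan.txt, and the code-point fallback for unknown or tied characters is an assumption of this sketch, not part of any standard collation.

```python
# Sketch: sort Traditional Chinese characters by total stroke count.
# The counts below are a tiny hand-copied sample for illustration only.

stroke_counts = {
    "一": 1,   # U+4E00
    "人": 2,   # U+4EBA
    "中": 4,   # U+4E2D
    "漢": 14,  # U+6F22
    "龍": 16,  # U+9F8D
}

def by_strokes(text):
    """Order characters by stroke count, falling back to code point
    for characters missing from the data or sharing a count."""
    return sorted(text, key=lambda c: (stroke_counts.get(c, 99), ord(c)))

print("".join(by_strokes("龍中漢人一")))  # → 一人中漢龍
```

As the follow-up message notes, a real implementation would also need some subsort for characters with equal stroke counts, and different dictionaries disagree on how to do that.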
Re: Sorting on number of strokes for Traditional Chinese
On Wednesday, October 16, 2002, at 04:14 AM, Marco Cimarosti wrote: The next step is knowing *which* strokes make up each character, in order to properly sort characters having the same stroke number. There's no consistency there. Different dictionaries use different subsorts once you get beyond the stroke-count level. The five-stroke-type classification used by the PRC is a fairly recent innovation and not universally used. Is there any online source for such data? Even for smaller sets than Unicode CJK. Not that I'm aware. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: ct, fj and blackletter ligatures
On Saturday, November 2, 2002, at 02:59 PM, Doug Ewell wrote: Using ZWJ to control ligation is admittedly a new concept, and it may not have been taken up yet by many vendors, but that seems like a really poor reason to discourage the Unicode approach. Proprietary layout features in OT-savvy apps like InDesign might get the job done, but wouldn't it be better if app vendors and font vendors would follow the Unicode Standard recommendation? You never know, it might even reduce the number of requests to encode ligatures. Remember, though, that ZWJ is *not* the preferred Unicode way to support things like a discretionary ct ligature in Latin text. The standard says that the preferred way to handle this is through higher-level protocols. I know that you and I disagree about the extent to which ligation control belongs in plain text, but the standard clearly allows both approaches. The ZWJ mechanism is not *the* Unicode approach. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
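For concreteness, this is what the ZWJ mechanism being debated looks like at the character level: a sketch (not the higher-level-protocol route the standard prefers) that inserts U+200D ZERO WIDTH JOINER between two letters as a ligation request that a renderer is free to ignore.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

def request_ligature(a, b):
    """Ask, but not demand, that a renderer ligate two letters.
    A font with no ct ligature simply ignores the ZWJ."""
    return a + ZWJ + b

s = request_ligature("c", "t")
print([hex(ord(ch)) for ch in s])  # → ['0x63', '0x200d', '0x74']
```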
Re: ct, fj and blackletter ligatures
On Tuesday, November 5, 2002, at 02:18 AM, William Overington wrote: Well, I suppose it depends upon what one means by a file format that supports Unicode. The TrueType format does not support the ZWJ method and thus does not provide means to access unencoded glyphs by transforming certain strings of Unicode characters into them. TrueType fonts are perfectly capable of supporting ligatures. OpenType, AAT, and Graphite all use TrueType fonts, and all support ligatures. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: ct, fj and blackletter ligatures
On Thursday, November 7, 2002, at 09:40 AM, [EMAIL PROTECTED] wrote: As for providing a notification dialog to say that the text contains c, ZWJ, t but that the font doesn't support it, there are no existing mechanisms to support that at present, but it hasn't been demonstrated that there really is any need, and I really don't expect vendors will be hearing too many complaints from users. Actually, you *could* do it on a Mac if you really wanted to. I'm not sure why you would, however. One of the advantages of the ZWJ mechanism for requesting ligatures is that if the request is impossible to fulfill, it can be ignored. For discretionary ligatures like ct, this is the appropriate response. (Matters are a bit more complicated for required ligatures, of course.) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Info: Apple OSX Font Tools Suite 1.0.0 Released
Cupertino 11/8/02: Today the Apple Font Group released its new suite of Unix command line font tools for OSX. These can be downloaded free from http://developer.apple.com/fonts/. The automatically installed 4.8 MB package includes the tools, user documentation, and a 60-page tutorial. To use this package, you need to be running OSX 10.2. Everything is automatically configured by the installer. You just add fonts to taste. Working with text sources for many of the tables in an sfnt font structure is a powerful and efficient way to develop, debug and manage font sources. E.g. use ftxdumperfuser to solve cmap and postname glitches once and for all in .ttf, .otf and CFF format fonts. With this release, Apple has converted its text dump formats to XML and will be continuing to refine the XML formats in future releases. No previous experience of Unix is necessary as the 60-page tutorial takes you step-by-step through useful font editing processes with an accompanying set of ready-worked live demo files. Applications in The Font Tool Suite are: * ftxanalyzer * ftxdiff * ftxdumperfuser * ftxenhancer * ftxinstalledfonts * ftxruler * ftxvalidator Documents included: * The Apple Font Tool Suite Manual (51 pages) * Tool Quick Reference (8 pages) * Tutorial (62 pages) * Tutorial Command Summary (8 pages) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Info: Apple OSX Font Tools Suite 1.0.0 Released
Try control-clicking on the link and then selecting Save link to disk from the popup menu. On Tuesday, November 12, 2002, at 09:55 AM, Dean Snyder wrote: At 4:49 PM John H. Jenkins wrote: Cupertino 11/8/02: Today the Apple Font Group released its new suite of Unix command line font tools for OSX. These can be downloaded free from http://developer.apple.com/fonts/. The actual download URL is: http://developer.apple.com/fonts/FontToolsv1.0.dmg But I can't get it to download with any browser I've tried (IE, Opera, Mozilla) - they all display the binary disk image as garbled text instead of downloading it to disk. (I've fiddled with download helper preferences for .dmg files but that hasn't helped. Is the .0.dmg file name termination confusing the browsers?) Respectfully, Dean A. Snyder Scholarly Technology Specialist Center For Scholarly Resources, Sheridan Libraries Garrett Room, MSE Library, 3400 N. Charles St. The Johns Hopkins University Baltimore, Maryland, USA 21218 office: 410 516-6850 mobile: 410 245-7168 fax: 410-516-6229 Digital Hammurabi: www.jhu.edu/digitalhammurabi Initiative for Cuneiform Encoding: www.jhu.edu/ice == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: N2515: Request for Roadmap - plane 3
On Tuesday, November 12, 2002, at 09:03 AM, Andrew C. West wrote: BTW, what is CJK Unified Ideographs Extension C intended to include ? Surely not any more ordinary Han ideographs - with over 70,000 ideographs already encoded, there can't be so many genuine ideographs that still need encoding as to warrant a whole new plane. However there is a real need to encode oracle bone characters and other ancient epigraphic forms of Han ideographs. Is this (hopefully) what Extension C is intended for ? Nope. We're still doing modern stuff. It is unlikely in the extreme that we'll actually *need* a whole plane for new ideographs. Extension C is currently big enough, however, that if we were to accommodate it via separate encoding of everything we'd use up the rest of Plane 2. And there's still no end in sight. To some extent, we're having to deal with massive turtle--er, fecal matter being dumped uncritically into the bin consisting largely of things which are obviously variants of existing characters. This we will deal with to an extent by using variation selectors. (Many of Unicode's proposed additions are unofficial simplifications which will also be handled via variation selectors.) Beyond that, it is incredible just how many obscure characters there are once you start looking for them. The PRC's submission includes large numbers of place names, for example, and I dread to think how many more of *those* there may be. HKSAR has come up with more Cantonese- or Hong Kong-specific characters. The only non-Mandarin dialect to receive *any* attention at all is Cantonese, and despite the efforts of the HKSAR that's been rather unsystematic. Unicode's proposed characters include a few Cantonese-specific ones that we were able to dig up without much effort. And all this leaves out stuff like cute names for Hong Kong race horses, frogs-in-wells, and things like that.
All in all, I wouldn't be surprised if there were as many as ten thousand or so genuinely distinct characters in modern use which have yet to be encoded. And there are a number of borderline cases from pre-modern texts where it looks like it's probably a variant but it may not be. (Of course, I also estimated the total number of genuine Han ideographs to be under eighty thousand, which just goes to show how much *I* know.) Oracle bone forms and other older versions of the Han ideographs are something we haven't even got a good model for how to handle yet. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: N2515: Request for Roadmap - plane 3
On Wednesday, November 13, 2002, at 03:22 AM, Andrew C. West wrote: On Wed, 13 Nov 2002 02:03:27 -0800 (PST), John H. Jenkins wrote: Nope. We're still doing modern stuff. Well, there's no rush, just as long as you get round to it sometime ... how about reserving a plane now anyway ? Because there's no indication that we'll need a full plane, basically. All in all, I wouldn't be surprised if there were as many as ten thousand or so genuinely distinct characters in modern use which have yet to be encoded. I'm really sceptical about this. Is there anywhere where I can see the proposals for CJK-C additions ? http://www.cse.cuhk.edu.hk/~irg/irg/extc/CJK_Ext_C.htm == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: The result of the plane 14 tag characters review.
On Wednesday, November 13, 2002, at 12:07 AM, George W Gerrity wrote: In an effort to unify all characters and pictographs, the decision was made to unify CJK characters by suppressing most variant forms. That turns out to be the single greatest objection from users -- especially Japanese -- and somehow we need a low-level way of indicating the target language in the context of multilingual text. The plane 14 tags seem to be appropriate to do this, giving a hint to the font engine as to a good choice of alternate glyphs, where available. A couple of points. 1) There are two kinds of variant problems coming out of Unihan. The objections based on these two problems run, respectively: Japanese readers will be forced to read Japanese text with Chinese glyphs! and Mr. Watanabe won't be able to insert the variant glyph for his name that he prefers into a document! The first objection is, and always has been, a non-issue, and is the only aspect of the problem that the Plane 14 tags could hope to deal with. The issue is not a language one, but a locale one, to begin with. Moreover, the typical practice in Japanese typography (at least) is to use Japanese-preferred glyphs even when displaying Chinese text. Japanese users do *not* expect the text to switch back-and-forth between Chinese and Japanese glyphs as the language varies. Given this, the best solution to the problem is to use fonts aimed at the specific locale. This means that a Japanese user who goes to read her email at an Internet café in Hong Kong may see things she doesn't expect, true, but it really handles 99.99+% of the problem. I should note that as Unicode-based systems are becoming more common in Japan, such as Windows XP and Mac OS X, there is less concern being expressed on this point. The second objection could not be solved by the Plane 14 tags.
The two solutions that are possible are to separately encode every glyphic variant which someone, somewhere, sometime may find necessary to distinguish in plain text, or to use variant markers. It is the latter solution which the UTC has adopted. 2) From a technical standpoint, the Plane 14 tags do not really lend themselves to use with the main complex script font engines available. I don't know enough about Graphite to really speak to it, but in the case of OpenType and AAT it is true that protocols are already available to use Japanese/SC/TC/Korean/Vietnamese glyphs for a run of text. These existing protocols, however, depend on information external to the text itself. To keep the information internal to the text, or, more accurately, internal to the glyph stream, one would have to have the ability to enter a state once a certain character (or glyph) is encountered and remain in that state indefinitely. Neither OpenType nor AAT allows this. OpenType does not use a state engine internal to the glyph stream for processing, and AAT resets the state at the beginning of each line. What would have to happen is that the rendering engine would have to find these characters within the text stream, massage the text data so as to remove them and mark the text with the equivalent higher-level information, and then render the result. The problem here is that the libraries such as Uniscribe and ATSUI which provide Unicode rendering do not deal with the text as a whole (at least, this is definitely true with ATSUI and is probably true with Uniscribe, although I don't know for sure). That is, the Plane 14 tag may be found in the first paragraph of the text, but when the client hands the text off to the library, they may hand off only a later portion because that's all that needs to be drawn. The library then does not have access to this information and will not render the text correctly.
This basically means that the onus is on the client to detect the presence of these tags in the text and make appropriate adjustments when it hands off the text to Uniscribe or ATSUI for rendering. As such, there is no real advantage gained by having these tags embedded directly in the text over having them in the same layer as font, point size, and other typographic preferences. Indeed, it becomes inconvenient to have them in a different layer as it means that the client has to do *two* levels of processing to derive this information, rather than just one. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
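For readers who have not seen the mechanism under discussion, here is a sketch of how a Plane 14 language tag is constructed: U+E0001 LANGUAGE TAG followed by tag characters cloned from ASCII at U+E0000 plus the ASCII code point. This illustrates the encoding only; as the message above argues, rendering engines generally do not act on these tags.

```python
# Sketch: build a Plane 14 language tag for "ja" (Japanese).
# U+E0001 is LANGUAGE TAG; each ASCII letter of the tag value is
# shifted into the tag-character range at U+E0000 + its code point.

def language_tag(code):
    return "\U000E0001" + "".join(chr(0xE0000 + ord(c)) for c in code)

tag = language_tag("ja")
print([hex(ord(c)) for c in tag])  # → ['0xe0001', '0xe006a', '0xe0061']
```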
Re: ATSUI for MacOS9
On Tuesday, November 19, 2002, at 10:33 PM, Theodore H. Smith wrote: I'd like to know if ATSUI can be used for MacOS9. The ATSUI demo for OSX works perfectly, but the ATSUI demo for OS9, can't do horizontal hit testing. :o( ATSUI should work fine on Mac OS 9. (It was introduced with 8.5, after all.) Why not? Is this a bug in the demo, or a bug in ATSUI for OS9? Does ATSUI for Carbon on OS9 work if ATSUI for Classic OS9 doesn't? I really don't know. ATSUI for Carbon will be a later, better version than ATSUI for classic, however. If anyone knows ATSUI well, could you please contact me so I can ask a few more questions? Thanks a lot. You could send my questions to me and I can have them circulated to the proper people. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Why isn't my character displaying
On Friday, November 29, 2002, at 05:23 AM, Theodore H. Smith wrote: What is wrong? Is it something to do with font fallbacks? I am not touching font fallbacks at all. All I did was set the FontID for my ATSUStyle object, to that for Monaco plain. I'm a bit stuck here, can someone help? I thought ATSUI is meant to fill in the missing fonts, automatically??? So why isn't it? ATSUI *can* fill in the missing fonts automatically, but you have to tell it to. You call ATSUSetTransientFontMatching() for your layout object.
Re: Unihan Mandarin Readings
Is it possible to regenerate the Unihan database with the correct secondary Mandarin readings ? Certainly in the Unicode 4.0 time-frame we can improve things. I can't make any guarantees, however. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Unihan Mandarin Readings
On Tuesday, December 3, 2002, at 03:17 AM, Andrew C. West wrote: BTW, is it possible for Unicode to provide a Unihan.xml version of the Unihan database ? The first thing I do is convert the Unihan.txt file into XML format for ease of processing. As a rule, we tend to stick to older formats so that people don't have to rewrite their perl scripts and other parsers. I know you're asking if we could add an XML format *in addition* to the non-XML one, but given the size of Unihan.txt, that isn't likely. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
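As an illustration of the sort of conversion Andrew describes doing for himself, here is a minimal sketch that turns tab-delimited Unihan.txt lines (U+XXXX, field name, value) into XML. The element and attribute names are invented for this example; they are not any official Unicode format.

```python
# Sketch: convert tab-delimited Unihan.txt-style lines into simple XML.
# Element/attribute names here are illustrative, not an official schema.
import xml.sax.saxutils as su

def unihan_to_xml(lines):
    out = ["<unihan>"]
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cp, field, value = line.rstrip("\n").split("\t", 2)
        out.append('  <entry cp="%s" field="%s">%s</entry>'
                   % (cp, field, su.escape(value)))
    out.append("</unihan>")
    return "\n".join(out)

sample = ["# a comment line", "U+6F22\tkMandarin\tHAN4"]
print(unihan_to_xml(sample))
```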
Re: CJK fonts
On Wednesday, December 11, 2002, at 08:27 AM, Raymond Mercier wrote: For example, the simplified form of the character Han itself (U+6C49) is given the Pinyin reading Yi, the traditional form U+6F22 is the correct reading Han. Have you reported this? BTW, there's the official Unihan lookup Web page at http://www.unicode.org/charts/unihan.html. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Small Latin Letter m with Macron
On Wednesday, January 15, 2003, at 01:35 PM, Kenneth Whistler wrote: Handwritten forms and arbitrary manuscript abbreviations should not be encoded as characters. The text should just be represented as m + m. Then, if you wish to *render* such text in a font which mimics this style of handwriting and uses such abbreviations, then you would need the font to ligate mm sequences into a *glyph* showing an m with an overbar. Remembering, of course, to use ZWNJ to mark places where this ligature may not be used. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
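The ZWNJ marking mentioned above can be sketched directly: inserting U+200C ZERO WIDTH NON-JOINER between two letters tells a ligating font to keep them separate. The helper below is illustrative only.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

def block_ligature(text, pair="mm"):
    """Insert ZWNJ inside each occurrence of `pair` so that a font
    which ligates those letters leaves them unligated here."""
    return text.replace(pair, pair[0] + ZWNJ + pair[1])

print(ascii(block_ligature("summa")))  # → 'sum\u200cma'
```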
Re: newbie 18030 font question
On Thursday, January 16, 2003, at 12:25 PM, Stefan Persson wrote: I assume that you mean GB18030, right? Due to a change in Chinese laws, Apple and Microsoft had to make fonts supporting all those characters available. You may download those fonts from the companies' respective home pages. Well, not from Apple's, anyway. Several GB18030 fonts come with Mac OS X 10.2, but we don't have a license to make them freely downloadable. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Small Latin Letter m with Macron
On Thursday, January 16, 2003, at 01:29 PM, Timothy Partridge wrote: Yes, especially early printing of Latin documents. See for example Gutenberg's bibles. Well, for that matter, even current editions of Spenser's _Faerie Queene_ will use the occasional õ for on, and so on. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: unicode in Mac
On Sunday, January 26, 2003, at 10:13 AM, Raymond Mercier wrote: Given a plain text unicode file, with the opening byte FEFF, and which displays correctly in Notepad on a PC. What facility is available on a Mac to make this file display correctly ? I am trying to help a colleague, who has MAC OS IX, and I need to tell him what font will cover Greek and Extended Greek. Do you mean Mac OS X, or Mac OS 9? For the former, TextEdit would work fine. If your friend is on Mac OS X 10.2 or later, the system font, Lucida Grande, has a full set of glyphs for Greek and Extended Greek. Otherwise any of the free Greek fonts on the Internet would work. On Mac OS 9, the situation is a bit grimmer, as there aren't many Unicode-savvy applications. SUE would be one option. You should be able to find it using Google. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: 4701
On Saturday, February 1, 2003, at 01:39 PM, Thomas Chan wrote: And the website of the Pearl River (www.pearlriver.com) department store in New York City says "lamb"! unihan.txt says that U+7F8A is "sheep, goat; KangXi radical 123". Stolen from Mathews, as it happens. On Google, "year of the goat" has the lead. Systran has sheep. KangXi says (if I'm understanding it correctly) something like "animal with curved horns." (It's more complex than that, but I think I caught the essence.) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: VS vs. P14 (was Re: Indic Devanagari Query)
On Thursday, February 6, 2003, at 08:47 AM, Andrew C. West wrote: There are also a number of other auspicious characters, such as fu2 (U+798F) good fortune that may be found written in a hundred variant forms as a decorative motif. Ah, but decorative motifs are not plain text. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: traditional vs simplified chinese
On Thursday, February 13, 2003, at 07:18 AM, Marco Cimarosti wrote: 3) All other characters listed in Unihan.txt are *both* Traditional and Simplified. Actually, this is not quite true. Even though the current set of traditional/simplified data is much better than it's ever been, we still have cases where new simplified forms have been created and encoded while their traditional counterparts have not, and considerably more cases where traditional forms have theoretical simplifications which have not been encoded. The best you can say is that if a character has a traditional variant (but no simplified variant), it's simplified, and if it has a simplified variant (and no traditional variant), it's traditional, and if it has both, it's both. Anyway, I don't see how this information could be of any use for any purpose... There are some ideographs (e.g., anything with the bone radical) which have a different appearance in simplified and traditional Chinese, even though the two have been unified in Unicode. Identifying a text as simplified vs. traditional could help in automatic font selection. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
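The classification rule stated above can be written out directly. The sketch below uses hand-copied sample data standing in for the Unihan kTraditionalVariant/kSimplifiedVariant fields; treating unmarked characters as usable in both contexts follows Marco's point 3, as qualified in the reply.

```python
# Sketch: classify a character as simplified/traditional/both from
# whether Unihan lists a traditional and/or simplified variant for it.
# Sample data is hand-copied for illustration.

variants = {
    # char: (has_traditional_variant, has_simplified_variant)
    "汉": (True,  False),  # simplified; traditional form is 漢
    "漢": (False, True),   # traditional; simplified form is 汉
    "中": (False, False),  # identical in both
}

def classify(ch):
    has_trad, has_simp = variants.get(ch, (False, False))
    if has_trad and has_simp:
        return "both"        # maps to something else in each direction
    if has_trad:
        return "simplified"  # it has a traditional counterpart
    if has_simp:
        return "traditional" # it has a simplified counterpart
    return "both"            # unmarked: usable in either context

print(classify("汉"), classify("漢"), classify("中"))  # → simplified traditional both
```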
Re: Converting old TrueType fonts to Unicode
On Friday, February 14, 2003, at 01:12 PM, John Hudson wrote: Another option for re-encoding fonts is to hack the font cmap table itself. The easiest way to do this is probably with Just van Rossum's TTX tool. See http://sourceforge.net/projects/fonttools/. This is a Python-based open source tool that decompiles TTF and OTF fonts to a human-readable XML file, which can then be edited and recompiled to a font. I have used this tool for a variety of purposes, but do not have any experience working on fonts with supplementary plane codepoints, so cannot verify its usefulness for this purpose. For people on Mac OS X, there is a set of tools available for download from http://developer.apple.com/fonts/ which, like TTX, can decompile table from TrueType and OpenType fonts and let the user edit the results. These *do* support astral characters. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Everson Mono
On Saturday, February 15, 2003, at 07:22 PM, [EMAIL PROTECTED] wrote: You could pick up the old TTFDUMP.EXE program from Microsoft Typography developer's web pages at http://www.microsoft.com/typography/creators.htm This utility can dump any or all of the tables in a TTF/OTF into a plain text file which is human-readable. Once the cmap table information has been dumped, you can import the text into your process and process it. (It only works on Plane Zero fonts.) And you can get ftxdumperfuser at Apple's site http://developer.apple.com/fonts, which works on Mac OS X and can handle the astral planes. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Finding a font that contains a particular character
On Monday, February 17, 2003, at 09:36 AM, Alan Wood wrote: Someone recently asked how to find a font that contains a particular Unicode character. I don't have an easy answer, but TrueType Explorer (for Windows) may help: On the Mac, BTW (Mac OS X 10.2 or later), you can either use the character palette (in the keyboard menu) or install Apple's font tools http://developer.apple.com/fonts and use ftxinstalledfonts with the -U option. Both of these work with astral characters. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: [OpenType] PS glyph `phi' vs `phi1'
On Wednesday, February 19, 2003, at 04:13 PM, Werner LEMBERG wrote: I have to correct myself, fortunately. After looking into the printed version of Unicode 2.0 I see that the glyphs of 03D5 and 03C6 in the file U0370.pdf are exchanged. Your assumption is correct that the annotation in Unicode 3.2 is wrong. I'm sorry, but you've lost me here. The Unicode 3.2 text states: quote With Unicode 3.0 and the concurrent second edition of ISO/IEC 10646-1, the representative glyphs for U+03C6 GREEK SMALL LETTER PHI and U+03D5 GREEK PHI SYMBOL were swapped. In ordinary Greek text, the character U+03C6 is used exclusively, although this character has considerable glyphic variation, sometimes represented with a glyph more like the representative glyph shown for U+03C6 (the loopy form) and less often with a glyph more like the representative glyph shown for U+03D5 (the straight form). For mathematical and technical use, the straight form of the small phi is an important symbol and needs to be consistently distinguishable from the loopy form. The straight form phi glyph is used as the representative glyph for the symbol phi at U+03D5 to satisfy this distinction. The reversed assignment of representative glyphs in versions of the Unicode Standard prior to Unicode 3.0 had the problem that the character explicitly identified as the mathematical symbol did not have the straight form of the character that is the preferred glyph for that use. Furthermore, it made it unnecessarily difficult for general purpose fonts supporting ordinary Greek text to also add support for Greek letters used as mathematical symbols. This resulted from the fact that many of those fonts already used the loopy form glyph for U+03C6, as preferred for Greek body text; to support the phi symbol as well, they would have had to disrupt glyph choices already optimized for Greek text.
When mapping symbol sets or SGML entities to the Unicode Standard, it is important to make sure that codes or entities that require the straight form of the phi symbol be mapped to U+03D5 and not to U+03C6. Mapping to the latter should be reserved for codes or entities that represent the small phi as used in ordinary Greek text. Fonts used primarily for Greek text may use either glyph form for U+03C6, but fonts that also intend to support technical use of the Greek letters should use the loopy form to ensure appropriate contrast with the straight form used for U+03D5. /quote What annotation in 3.2 do you feel is incorrect? == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
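The mapping advice in the quoted text can be shown as a small table. The entity names below are purely illustrative placeholders, not taken from any actual SGML entity set; the point is only which code point each form of phi should map to.

```python
# Sketch: map illustrative entity names for the two forms of phi to the
# code points the quoted Unicode 3.2 text prescribes. Entity names are
# hypothetical, invented for this example.
entity_map = {
    "greek_text_phi": "\u03C6",  # GREEK SMALL LETTER PHI (Greek body text)
    "straight_phi":   "\u03D5",  # GREEK PHI SYMBOL (math/technical use)
}

print(hex(ord(entity_map["straight_phi"])))  # → 0x3d5
```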
Re: The display of *kholam* on PCs
to shudder at the thought of designing a state table. But not everything in a 'morx' requires a state table. Ligature support is *really* easy to do, and has been for years and years. The fact of the matter is that the bulk of the font designers out there don't even *know* that there's a way to add ligature support to fonts on the Mac. We've tried to get the word out, but obviously we haven't succeeded. Still, when and where people have come to us to ask for help, we've done what we could to provide it. Frankly, few people have come. The best long-term solution is for Apple to follow through on their promise to support OpenType Layout features, so that we have a genuinely cross-platform font solution. As I say, we've been careful not to make public promises in any detail on this issue. I'm not aware of any time when we've said more than that we're hoping to make OT to AAT layout table conversion possible using our tools. We really can't commit ourselves on this. Given the fact that many application developers are basically echoing the same sentiment (why waste money developing for the Mac when I can get 90% of the same customer base without spending the money), I'm not sure it's entirely a matter of it being our fault, however. Certainly I'm not sure that the best long-term solution to having competing OSes is for everybody to simply switch over to Windows, either. The best *short-term* solution is for someone to tell them that if they're interested, they can contact us directly and we'll see what we can work out. We could probably work out AAT support for their specific font without too much trouble. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: FAQ entry (was: Looking for information on the UnicodeData file)
On Friday, March 7, 2003, at 04:26 AM, Pim Blokland wrote: Oh, in that case I must say I think the UnicodeData.txt file doesn't do a very good job. For instance, the Danish ae (U+00E6) is not designated a ligature, but the Dutch ij (U+0133) is, even though the a and e are clearly fused together, while the i and j aren't. John's description is a general one of what the character names mean. They are not, however, systematic or entirely consistent, nor are they expected to be, since different people speaking different languages often have different perceptions of what a symbol is. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Encoding: Unicode Quarterly Newsletter
I certainly think it would look good published with a leather cover, onion-skin paper, and gilt edges, yes. First we have to have Ken divide it into verses, though. On Tuesday, March 11, 2003, at 01:19 PM, Yung-Fong Tang wrote: Hope they can reduce the weight next time by changing the type of paper. My Bible is about 500 pages (about 1500+ pages) more than the unicode 3.0 standard but only 50% as thick. Same as my Chinese/English dictionary. Otto Stolz wrote: Kenneth Whistler wrote: we can calculate the weight as being *approximately* 9.05 pounds (avoirdupois) [or 10.99 troy pounds]. Apparently a weighty publication, that forthcoming Unicode standard... Cheers, Otto Stolz == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Unicode not in Quark 6
On Saturday, June 21, 2003, at 10:06 PM, Jungshik Shin wrote: PostgreSQL seems to be available for Mac OS X. See http://www.postgresql.org/ and http://developer.apple.com/internet/macosx/postgres.html MySQL is also available for Mac OS X (http://developer.apple.com/internet/macosx/osdb.html). I'm not sure of the status of Unicode support, but it seems to be fine if you're not worrying about collating or similar services. It's what's used at the moment to host the Unihan database, for example. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: French group separators
On Monday, July 7, 2003, at 4:08 PM, Frank da Cruz wrote: Of course. But without two spaces you have greater ambiguity, at least in English: In Mr. Roberts, what is the function of the period? Don't call me Mr.  Roberts is my name. Don't call me Mr. Roberts is my name. IIRC the English prefer to say Mr Roberts. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: French group separators
On Monday, July 7, 2003, at 4:38 PM, Michael Everson wrote: At 16:22 -0600 2003-07-07, John H. Jenkins wrote: IIRC the English prefer to say Mr Roberts. The, ahem, Irish too. ;-) Well, to be frank, I'm sure that the Welsh, Scots, and Manx probably do, too. (Did I leave anybody out *this* time?) I just don't read many books, alas, printed in Ireland, Wales, Scotland, or on the Isle of Man. :-( == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: missing .GIF's for ideographs on unicode.org?
On Thursday, July 17, 2003, at 12:00 AM, Richard Cook wrote: I'm guessing this just hasn't been implemented yet. You are guessing correctly. Once some of the dust settles from my day job, I expect I can get to this. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Last Resort Glyphs (was: About the European MES-2 subset)
On Saturday, July 19, 2003, at 1:15 PM, Michael Everson wrote: So fonts containing these glyphs could be designed to display these glyphs, in a way similar to the current assignment of control pictures. Um, that's what the Last Resort font does, outside of Unicode encoding space. (I don't think PUA characters are used, actually, but I could be wrong.) No, it uses the actual Unicode characters, and just has a huge cmap that maps everything in Unicode to the glyph for its block. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
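Conceptually, that cmap is just a fallback lookup from code point to block. A minimal Python sketch of the idea (the block ranges below are a tiny illustrative subset, and the real font maps to glyphs rather than names):

```python
# Sketch of the Last Resort approach: every code point maps to a single
# glyph representing its Unicode block. Only a few blocks shown here.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xE000, 0xF8FF, "Private Use Area"),
]

def block_glyph(cp: int) -> str:
    """Return the block-level fallback glyph name for a code point."""
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Unknown Block"

print(block_glyph(ord("中")))  # CJK Unified Ideographs
```

Because the mapping is per-block rather than per-character, one font with a few hundred glyphs can give visible feedback for every code point in Unicode.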
Re: Last Resort Glyphs (was: About the European MES-2 subset)
On Sunday, July 20, 2003, at 7:37 AM, Philippe Verdy wrote: Mostly for documentation purposes, but also in most systems that want to be more informative to users missing a font for a particular script. Michael also judged it to be useful enough to create such a font for Apple, and Apple thought it would be useful for its Mac users. Er, no. Apple thought it would be useful for its Mac users and commissioned Michael to make the glyphs. (And I personally think he's done an excellent job.) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Karen Language Representation in Unicode
On Sunday, July 20, 2003, at 7:38 AM, [EMAIL PROTECTED] wrote: Heather Batterham wrote on 07/20/2003 06:46:16 AM: The second interest I have is in the development of word processing tools that utilize the contents of Unicode. I use a Macintosh with OS X installed. The basic language packages are very good but they do not have the Burmese script included. The only working font implementation for Burmese script that I know of is one that we have (in beta), implemented using Graphite rendering. It's available at http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=GraphiteFonts. We could probably help you get it to work on Mac OS X. Meanwhile, Xenotype claims to have a Burmese language kit for Mac OS X (http://www.xenotypetech.com/osxBurmese.html), although nobody at Apple has seen it, so we can't confirm that it works as advertised. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: About the European MES-2 subset
On Friday, July 18, 2003, at 4:45 PM, Michael (michka) Kaplan wrote: A question mark is a sign of a bad conversion from Unicode (to a code page that did not contain the character). This would likely happen on the Mac too rather than the Last Resort font, wouldn't it? MS Explorer on the Mac converts Unicode to the old Mac scripts, which it then renders. That's why you see all the question marks when the page is viewed with MS Explorer. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: proposal for a creative commons character
On Jun 15, 2004, at 2:22 PM, [EMAIL PROTECTED] wrote: Michael Tiemann scripsit: Without getting greedy, I'd like to propose the adoption of the (cc) symbol in whatever way would be most expedient (so that creative commons authors can identify their work more appropriately), and leave for later the question of the other symbols. It's a logo. We normally don't do logos. To be a little less terse: in the case of symbols like this, the strong preference is not to encode them as a means of encouraging their use. John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: number of bytes for simplified chinese
On Jun 27, 2004, at 11:37 PM, Duraivel wrote: Hi, I would like to know the number of bytes required for the simplified Chinese language. Can we represent all the characters of simplified Chinese in Unicode using just two bytes? No. It will take up to four bytes per character, whether you're using UTF-8, UTF-16, or UTF-32. John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
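The "up to four bytes" claim is easy to check with Python's codecs; 中 is a common BMP character, while U+20000 is the first character in CJK Extension B, on plane 2:

```python
# Byte lengths of a BMP character and a plane-2 character in the three
# Unicode encoding forms (little-endian variants, no BOM).
for ch in ("中", "\U00020000"):
    print(f"U+{ord(ch):04X}:",
          len(ch.encode("utf-8")), "bytes UTF-8,",
          len(ch.encode("utf-16-le")), "bytes UTF-16,",
          len(ch.encode("utf-32-le")), "bytes UTF-32")
```

BMP ideographs take three bytes in UTF-8 and two in UTF-16, but plane-2 ideographs take four bytes in every encoding form.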
Re: Looking for transcription or transliteration standards latin- arabic
Jul 2, 2004 11:17 AM Chris Harvey Perhaps one could think of Ha Tinh as the English word for the city, like Rome (English) for Roma (Italian), or Tokyo (English) for Tōkyō (English transliteration of Japanese), or Kahnawake (English/French) for Kahnawà:ke (Mohawk). Or Peking for Běijīng. :-) John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Chinese Simplified - How many bytes
Jul 6, 2004 3:10 AM Duraivel Hi, I browsed through the ICU library and it looks similar to the gettext library which GNU provides, with more functionality added. But we are developing our product on Qt, which has its own translations, so I don't want to use another library for translations. Also there is a class QString which says it takes care of byte issues. Basically it is overloaded and acts accordingly for a two-byte Unicode character set. It also states that QString supports Chinese (simplified). I am not getting how two bytes can support simplified Chinese. Is it true that, to represent simplified Chinese programmatically, two bytes will do? Unicode in the UTF-16 encoding will cover almost all the simplified Chinese characters people use today in two bytes. There are occasional exceptions which will require four bytes. John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
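The two-byte/four-byte split is visible if you look at the raw UTF-16 code units. This sketch uses Python rather than Qt, but QString's 16-bit units behave the same way: a BMP character is one unit, and a supplementary character becomes a surrogate pair of two units:

```python
# Decode a string into its 16-bit UTF-16 code units.
import struct

def utf16_units(ch: str) -> list:
    """Return the UTF-16 (big-endian) code units of a string as ints."""
    data = ch.encode("utf-16-be")
    return [u for (u,) in struct.iter_unpack(">H", data)]

print([hex(u) for u in utf16_units("汉")])          # one unit: ['0x6c49']
print([hex(u) for u in utf16_units("\U00020000")])  # surrogate pair: ['0xd840', '0xdc00']
```

Code that assumes one unit per character will miscount string lengths for the rare plane-2 ideographs, even though it works for everyday simplified Chinese text.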
Re: Unicode v. 4 font software for Mac
Jul 15, 2004 12:13 PM David Branner I have tried AsiaFont Studio 4 and FontLab, but they are not compatible with version 4 of the Unicode Standard and hence are not suitable for my purposes. I assume that by saying they're not compatible, you mean that they don't support characters off of the BMP. If this is the problem, you can use Apple's tool ftxdumperfuser to alter the cmap after FontLab has generated it. Apple's font tool suite is available at http://developer.apple.com/fonts. (Alternatively, if you give a glyph a name of the form u followed by its hexadecimal code point, e.g., u20000, I'm told that the latest version of FontLab will generate an appropriate cmap entry for it, but I don't know for sure.) John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Unicode v. 4 font software for Mac
Jul 15, 2004 2:54 PM David Branner I assume that by saying they're not compatible, you mean that they don't support characters off of the BMP. They can neither generate such characters nor (apparently) open fonts that contain such characters. Then move the non-BMP characters to the PUA using ftxdumperfuser (or remove their Unicode mappings altogether), run FontLab, and afterwards re-add (or re-shift) the Unicode mappings with the same tool. John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Problem with accented characters
On Aug 23, 2004, at 3:34 PM, Doug Ewell wrote: Deborah Goldsmith goldsmit at apple dot com wrote: FYI, by far the largest source of text in NFD (decomposed) form in Mac OS X is the file system. File names are stored this way (for historical reasons), so anything copied from a file name is in (a slightly altered form of) NFD. Slightly altered? Yes, the specification for the Mac file system was frozen before NFD had been developed by the UTC, so it isn't exactly the same. But it's close. John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
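For anyone wanting to see the difference, Python's unicodedata module converts between the two forms (this is standard NFD, not Apple's slightly altered file-system variant):

```python
# NFD decomposes precomposed characters; NFC recomposes them. A file
# name copied out of the Mac file system will typically look like the
# decomposed form below.
import unicodedata

nfc = "café"                             # ends in precomposed U+00E9
nfd = unicodedata.normalize("NFD", nfc)  # ends in 'e' + combining U+0301

print([hex(ord(c)) for c in nfc])  # ['0x63', '0x61', '0x66', '0xe9']
print([hex(ord(c)) for c in nfd])  # ['0x63', '0x61', '0x66', '0x65', '0x301']
assert unicodedata.normalize("NFC", nfd) == nfc
```

The two strings render identically but compare unequal byte-for-byte, which is exactly why text copied from file names can trip up naive string comparisons.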
Re: Arial Unicode MS
On Dec 6, 2004, at 10:23 AM, Johannes Bergerhausen wrote: From some discussions here I learned that Arial Unicode MS contains about 50,000 glyphs, which is about the number of characters encoded in Unicode 2.0, and that it was last shipped bundled with Office for Windows 2003. A pan-Unicode font is a beautiful idea. Why did Microsoft/Monotype stop the development of further versions? The TrueType and OpenType font formats do not allow a font to contain more than about 65,000 glyphs. Since there are well over 65,000 characters in Unicode, plus the additional glyphic forms that would be necessary for proper support of various scripts, it is no longer possible to produce a single font like Arial Unicode MS. There are other issues -- making a single typeface which covers all the scripts in Unicode and has a common esthetic design is really not possible; loading a huge font can consume a significant chunk of the resources on a system, most of which is wasted; and so on.
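The 65,000 figure comes from the glyph count being stored as an unsigned 16-bit integer in the font's 'maxp' table. A rough Python check of the arithmetic (the exact count of assigned code points depends on the Unicode data shipped with your Python build):

```python
# A TrueType/OpenType font tops out at 0xFFFF glyphs because numGlyphs
# in the 'maxp' table is a uint16. Count the assigned code points in
# this Python build's Unicode database to see why one font can't cover
# all of Unicode.
import sys
import unicodedata

GLYPH_LIMIT = 0xFFFF  # 65,535

assigned = sum(
    1 for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) != "Cn"  # 'Cn' = unassigned
)
print(assigned, "assigned code points vs. a glyph limit of", GLYPH_LIMIT)
```

And that count doesn't even include the extra contextual and ligature glyphs a shaping engine needs for scripts like Arabic or Devanagari.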
Re: IUC27 Unicode, Cultural Diversity, and Multilingual Computing / Africa is forgotten once again.
On Dec 8, 2004, at 3:57 PM, Patrick Andries wrote: Azzedine Ait Khelifa wrote: Hello All, The subject of this conference is really interesting and very useful. But once again Africa is forgotten. I want to know if we can have the same conference, Africa-oriented, scheduled? If not, what should we do to have this conference scheduled in a city accessible to the African community (like Paris)? If this is possible, I would also add « and with much more content in a language understood in Africa and the host country: French ». Well, as with everything else associated with Unicode, feel free to volunteer.
Re: US-ASCII (was: Re: Invalid UTF-8 sequences)
On Dec 10, 2004, at 1:25 PM, Tim Greenwood wrote: Is that like the 'Please RSVP' that I see all too often? Or should that not be excused? Or -- my own personal favorite -- in the year AD 2004.
Re: Simplified Chinese radical set in Unihan
As you say, the main problem is that there are so many different possible sets. Some will be proprietary, which would limit their usefulness, although there would, I believe, otherwise be no objection to their inclusion. If you can come up with a reasonably standard set and reasonably consistent data across several dictionaries referencing it, I'm sure there'd be no objection to including it. On Dec 16, 2004, at 2:19 PM, Erik Peterson wrote: Hello, I've found many uses for the Unihan data file over the past few years. It's a great source of information. One potential addition that I've wanted is a field listing the simplified Chinese radical for at least the simplified Chinese characters, like what exists in the Xinhua Zidian (Xinhua Dictionary) and other mainland Chinese dictionaries. I was wondering if this has been discussed before? Some potential difficulties I could see include the fact that mainland dictionaries use a variety of different radical schemes. The most standard one that I can find is the Chinese Academy of Social Sciences (CASS) set with 189 different radicals. Even for dictionaries that use this set, the ordering is often different. Could the radical set also be proprietary in some way? Anyway, I was curious. I've been working on something like this myself that I could also contribute when it's farther along. Regards, Erik Peterson
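For anyone experimenting with such a field, the Unihan.txt format is simple tab-separated text: code point, field name, value. A simplified-radical field could plausibly reuse the "radical.extra-strokes" shape of the existing kRSUnicode field. The sample lines and values below are illustrative, not copied from Unihan.txt:

```python
# Parse Unihan-style tab-separated lines into (code point, field, value).
SAMPLE = """\
U+4E2D\tkRSUnicode\t2.3
U+6C49\tkRSUnicode\t85.2
"""

def parse_unihan(text):
    """Yield (code point, field, value) triples, skipping comments."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cp, field, value = line.split("\t", 2)
        yield int(cp[2:], 16), field, value  # strip the "U+" prefix

for cp, field, value in parse_unihan(SAMPLE):
    radical, _, strokes = value.partition(".")
    print(f"{chr(cp)}: radical {radical}, +{strokes} strokes")
```

A hypothetical simplified-radical field (say, one keyed to the 189-radical CASS scheme mentioned above) would drop into the same parser with only a new field name.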