Re: OT: OED
On Monday, April 29, 2002, at 10:05 AM, Patrick Rourke wrote: In the US, nearly all University libraries have the standard edition, and many good high school libraries (for 14-18 year-olds) have the compact edition (smaller typography). It's also available on CD for Windows. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Greek Extended: question: missing glyphs?
On Monday, April 29, 2002, at 08:37 PM, Pim Rietbroek wrote:

Hello, Please forgive me if this question has been raised before: I am a newbie on this list. I am looking into the Unicode standard for the encoding of Classical Greek. While both the Greek and the Greek Extended ranges of the current Unicode Standard seem to cover most of the essentials, it looks strange to me that some Greek Extended glyphs have not been defined. They are: 1) GREEK CAPITAL LETTER UPSILON WITH PSILI 2) GREEK CAPITAL LETTER UPSILON WITH PSILI AND VARIA 3) GREEK CAPITAL LETTER UPSILON WITH PSILI AND OXIA 4) GREEK CAPITAL LETTER UPSILON WITH PSILI AND PERISPOMENI

Reserved means don't use it. Yes, they're missing as precomposed forms, but you can always represent them using combining sequences. No, there's no point in asking for them. Unicode cannot add new precomposed accented Latin, Greek, or Cyrillic letters because it would screw up normalization. Use the actual upsilon capital letter followed by the appropriate breathing and accent marks.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
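To make the combining-sequence point concrete, here is a small later-era illustration in Python; the unicodedata module and the Python framing are an addition for this archive, not part of the original email:

```python
import unicodedata

# Capital upsilon with psili (smooth breathing) as a combining sequence:
# U+03A5 GREEK CAPITAL LETTER UPSILON + U+0313 COMBINING COMMA ABOVE.
psili = "\u03A5\u0313"
# No precomposed form exists (the code point slot is reserved),
# so NFC normalization leaves the sequence decomposed:
print(unicodedata.normalize("NFC", psili) == psili)   # True

# Contrast with the dasia (rough breathing), which *does* have a
# precomposed form, U+1F59 GREEK CAPITAL LETTER UPSILON WITH DASIA:
dasia = "\u03A5\u0314"
print(unicodedata.normalize("NFC", dasia) == "\u1F59")  # True
```

This is exactly why the four psili combinations can never be added as precomposed characters: normalization behavior for existing sequences is frozen.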
Re: sources for plane 2 characters?
I don't know. I'll bring it up next week at the IRG meeting. On Tuesday, April 30, 2002, at 08:02 PM, Thomas Chan wrote: Hi all, I was looking at the plane 2 characters in the March 15, 2001 version of the unihan.txt file, and found five that did not have an IRG source: U+20957, U+221EC, U+22FDD, U+24FB9, and U+2A13A. (The last one, U+2A13A, however, has kIRGHanyuDaZidian and kIRGKangXi information showing that it can be found in those dictionaries. Still, shouldn't there be an IRG source for it?) Where are the first four from? Thanks, Thomas Chan [EMAIL PROTECTED] == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: on U+7384 (was Re: Synthetic scripts (was: Re: Private Use Agreements
On Friday, May 10, 2002, at 06:29 PM, John Cowan wrote:

What is this about Qing taboo characters? Can someone point me to an explanation (in English)? Thanks.

The whole idea of taboo forms stems from the fact that there are certain ideographs one could not use because, typically, they're part of the personal name of someone important. So one deliberately distorts them when writing them. Such a thing is very much time-bound. Using a character from the personal name of the *current* emperor is a big deal, but using one from the personal name of an emperor five hundred years dead from an entirely different dynasty is no biggie. So the Qing dictionary, the KangXi, would have some taboo forms which would later become untaboo (especially now, of course, since nobody does that kind of thing anymore).

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Han Radical-Stroke Index
On Monday, May 13, 2002, at 04:02 AM, William Overington wrote: In chapter 15 of the Unicode specification is the statement that the Han Radical-Stroke Index is available as a separate file. I have tried to find it on the web site with no success. Is this file available on the web site please? The current version is at http://www.unicode.org/charts/Unihan3.2.pdf. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: CJK Unified Ideographs Extension B
On Monday, May 13, 2002, at 04:21 AM, William Overington wrote:

I have been looking at the characters in the CJK Unified Ideographs Extension B document. These are the characters from U+20000 through to U+2A6DF, which, as I understand it, are the rarer CJK characters.

Actually, this is not quite true. The vast majority are rare, of course, and none of them are exactly *common*, but how rare they are depends on what you're writing. A small number, for example, are from HK SCS and reflect current needs for Hong Kong, including general-purpose Cantonese writing. (One is generally not supposed to write Cantonese, even if one speaks it, hence the lag in getting some Cantonese-specific characters added.)

I wonder if any of the people who read this list who understand the languages involved might please like to say what any one or two of these characters, of their choice, mean please, just as a matter of general cultural interest for people who see these characters in the Unicode specification and, though not themselves knowledgeable of the languages, find the characters interesting for their artistry and history.

My personal favorite is U+233B4, which means a tree stump. (It's formed by taking the tree radical and moving the cross-bar to the top of the character instead of having it in the middle.) U+20C43 is a Cantonese-specific character meaning thin or flat. Altogether, eighteen characters from Extension B currently have a kDefinition entry in Unihan.txt.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
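As an aside for readers who want to poke at the data themselves, here is a sketch of how one might count those kDefinition entries, assuming a copy of Unihan.txt in its usual tab-separated "U+xxxxx[tab]field[tab]value" layout (the helper function and sample lines are illustrations, not part of the original email):

```python
def count_extb_definitions(unihan_lines):
    """Count Extension B code points (U+20000..U+2A6DF) with a kDefinition."""
    count = 0
    for line in unihan_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 3 and parts[1] == "kDefinition":
            cp = int(parts[0][2:], 16)  # "U+233B4" -> 0x233B4
            if 0x20000 <= cp <= 0x2A6DF:
                count += 1
    return count

# A tiny illustrative sample in the Unihan.txt format:
sample = [
    "# comment line",
    "U+233B4\tkDefinition\ttree stump",
    "U+20C43\tkDefinition\tthin; flat (Cantonese)",
    "U+4E00\tkDefinition\tone",   # not Extension B
]
print(count_extb_definitions(sample))  # 2
```

In practice you would pass `open("Unihan.txt", encoding="utf-8")` instead of the sample list.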
Re: Additional Deseret letters
On Sunday, May 19, 2002, at 01:18 AM, Michael Everson wrote: At 16:48 -0700 2002-05-18, Doug Ewell wrote: I discovered on the updated Pipeline page that the apocryphal Deseret letters OI and EW were approved by UTC on 2002-05-02 for a future version of Unicode. This is news to me. They were omitted originally because they were considered ligatures. Has there been a new paper and proposal? Yes. WG2 documents N2473 and N2474 (when they show up, which should be shortly) deal with the issue. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Additional Deseret letters
On Saturday, May 18, 2002, at 05:48 PM, Doug Ewell wrote: Are these the same characters (and glyphs) that were described in John Jenkins' original Deseret proposal, and displayed -- perhaps accidentally -- in the chart accompanying the Deseret proposal for the ConScript Unicode Registry? Yes, they are. Ken Beesley of Xerox Research Center Europe is aware of their use in handwritten materials and argues that treating them as mere ligatures is insufficient. This will be WG2 document N2474. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Bengali script - where is khanda ta?
On Tuesday, May 21, 2002, at 01:03 PM, Somnath Kundu wrote:

Is that same for Unicode, i.e., Ta + Halant + Halant - Khanda Ta, and how does Uniscribe handle this case? In other words, how can I write Khanda Ta in Unicode?

Forgive me, but this is a pet peeve of mine. How something is done in Unicode and how it's done in Uniscribe are *NOT* the same thing.

The reason for my posting was that I found the Code2000 font some days ago and installed a Bangla keyboard driver manually, found on my MSDN Win2k CD, on 2k/XP to type some Bangla letters, but was not able to type Khanda Ta. (The glyph is also probably missing in that font).

I don't think that Code2000 is an OpenType font, which means it won't have the ancillary glyphs and data needed to do full, proper support of many languages and scripts.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
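For reference, the representation generally recommended in that era was, as I understand it, the sequence TA + VIRAMA (halant) + ZERO WIDTH JOINER; a dedicated character, U+09CE BENGALI LETTER KHANDA TA, was only encoded later (Unicode 4.1). A sketch of the period-correct sequence:

```python
# Khanda ta as a combining sequence, per the recommendation of the era:
# U+09A4 BENGALI LETTER TA + U+09CD BENGALI SIGN VIRAMA + U+200D ZWJ.
# (Unicode later added a dedicated U+09CE BENGALI LETTER KHANDA TA.)
khanda_ta = "\u09A4\u09CD\u200D"
print([hex(ord(ch)) for ch in khanda_ta])  # ['0x9a4', '0x9cd', '0x200d']
```

Whether a given renderer actually displays the khanda ta glyph for this sequence depends, as the email says, on the shaping engine and font, not on Unicode itself.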
Re: Bengali script - where is khanda ta?
On Tuesday, May 21, 2002, at 09:01 PM, James Kass wrote: John H. Jenkins wrote: I don't think that Code2000 is an OpenType font, which means it won't have the ancillary glyphs and data needed to do full proper support of many languages and scripts. Code2000 is an OpenType font with fairly good OpenType coverage for Bengali presentation forms as well as coverage for several other scripts. I gladly stand corrected, then. Good job (as always), James! == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Courtyard Codes and the Private Use Area (derives from Re: Encoding of symbols and a lock/unlock pre-proposal)
On Friday, May 24, 2002, at 08:06 AM, Philipp Reichmuth wrote:

WO U+F3A2 PLEASE LIGATE THE NEXT TWO CHARACTERS
WO U+F3A3 PLEASE LIGATE THE NEXT THREE CHARACTERS
WO U+F3A4 PLEASE LIGATE THE NEXT FOUR CHARACTERS

While I don't think this discussion of various PUA allocations should continue much further, it's probably a lot better to introduce the already-discussed ZERO WIDTH LIGATOR in such a form that X ZWL Y produces the XY ligature, X ZWL Y ZWL Z the XYZ ligature, and so on. It saves you a lot of hassle with longer ligatures.

Zero width ligator was rejected. Zero-width joiner can be used to mark ligation points where they are absolutely necessary; where they are merely stylistic preferences, they belong in markup.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: N2476 a hoax?
prevented Chinese from being used in internationalized domain names.

No, it didn't. That was a counterproposal made by the Chinese domain-name representatives, who claimed that prohibiting Han characters for now would give the relevant bodies more time to develop a proper TC/SC mapping solution (implying that the problem was solvable at all, an opinion disputed by many).

Mea culpa. I stated the facts as I understood them, and I appear to have misunderstood them. In any event, while I (for one) would argue that TC/SC equivalence is not the same as English case-folding, my understanding was that there was a body of people who argued otherwise. The existence of such a body and an acknowledgment of their desire is different from agreement with them.

At the same time, I *do* agree that it is possible to define, on a purely character level, a function which allows a first-order approximation to SC/TC equivalence. And I think it's a legitimate concern for companies and individuals that some mechanism be in place so that two domain names which are TC/SC equivalents aren't registered by competing organizations; Unicode's own ideal Chinese domain name would be a case in point. Whether this is done via TC/SC folding or via someone asking to register domain name X and being told, "Oh, by the way, you also need to register domain names Y and Z while you're at it," is irrelevant.

Programmers and users are being increasingly frustrated that as ISO/IEC 10646 becomes more pervasive, they are increasingly compelled to deal with a large number of variant characters, some of which are only subtly different from each other and which cannot be automatically equated. The UTC would never refer to ISO/IEC 10646 as "pervasive"

Why not? Isn't it?

or talk of programmers and users being compelled to deal with variant characters,

Why not?

nor would it make such an emotional appeal that such variants should be automatically equated.

Why not?
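A character-level, first-order TC/SC folding of the kind described might look like the following sketch. The three-entry mapping is purely illustrative; a real table would be derived from Unihan-style variant data and would still only be an approximation:

```python
# Minimal sketch of a character-level, first-order TC -> SC folding.
# The mapping below is a tiny illustrative sample, not real Unihan data.
TC_TO_SC = {
    "\u570B": "\u56FD",  # 國 -> 国
    "\u9F8D": "\u9F99",  # 龍 -> 龙
    "\u6771": "\u4E1C",  # 東 -> 东
}

def fold_tc_to_sc(s):
    """Fold traditional forms to simplified, character by character."""
    return "".join(TC_TO_SC.get(ch, ch) for ch in s)

print(fold_tc_to_sc("\u4E2D\u570B\u6771"))  # 中國東 -> 中国东
```

Such a fold lets two domain-name strings be compared for TC/SC equivalence, which is exactly the first-order service the text argues applications may (or may not) find acceptable.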
Note the lack of standard UTC/WG2 terminology; if this were the UTC talking, you would be reading about canonical and compatibility equivalents and normalization.

No, if it were Ken Whistler or Mark Davis writing the document, you would probably get this language. :-) More seriously, why do compatibility or canonical equivalents or the UTC's version of normalization come into it? The whole point here is that we are dealing with a different category of equivalent than the standard currently covers. The further issue of a normalized Han (Cleanihan) is also orthogonal.

This passage also hints at the author's lack of awareness that similar equivalence issues exist for scripts other than Han.

You may see the hint there; I certainly don't. In any event, I would argue that the problem is a lot worse for Han than for any other script in Unicode of which I'm aware. What is needed, however, is something that allows at the least for a first-order approximation of equivalence; it would be up to the authors of the individual application, protocol, or standard to determine whether this were acceptable or not.

And what if the authors decide the IRG-developed approach is not acceptable? What are they expected to do then?

Whatever they want. We are repeatedly getting requests from people who are asking us how to handle Han variants, Doug, and we currently have no answer at all beyond pointing them to the rather limited data which is in Unihan.txt. (Indeed, many of the requests are coming from people who ask, "How come the data in Unihan.txt is so crappy?") We want to solve this problem. At the same time, if Basis or Microsoft or someone else with the resources to develop their own solution wants to use their own solution, we don't preclude them from doing that.

On the very same day (2002-05-08) that N2476 was published, a new Proposed Draft Technical Report (PDUTR #30) titled "Character Foldings" was also published.
PDUTR #30, available on the Unicode Web site, deals with several different types of mappings between characters -- mappings that involve digraphs and trigraphs, removal of diacritical marks, mappings between Hiragana and Katakana, mappings between European, Arabic, and Indic digits, and so on. NOWHERE in this document is there the slightest mention of TC/SC mappings. Isn't that a bit strange?

No, not really. There is sometimes a tendency for people who work on UTC documents to have a subconscious Han/everything-else dichotomy as they work.

If the UTC were really driving the issue of TC/SC mapping, wouldn't they have at least given it a brief mention in a "Character Foldings" proposal?

I would have hoped so, but evidently that didn't happen. That the UTC is concerned about SC/TC data and other Han equivalences is, in any event, already a part of the public record.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
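For instance, one of the foldings PDUTR #30 does cover, Hiragana to Katakana, is for the core syllables a fixed code point offset. A hedged sketch (the function name and range handling are this archive's illustration, not PDUTR #30's actual data tables):

```python
# Hiragana -> Katakana folding: the core syllables U+3041..U+3096 map to
# their Katakana counterparts at a fixed offset of 0x60 (U+30A1..U+30F6).
def hiragana_to_katakana(s):
    return "".join(
        chr(ord(ch) + 0x60) if 0x3041 <= ord(ch) <= 0x3096 else ch
        for ch in s
    )

print(hiragana_to_katakana("\u3072\u3089\u304C\u306A"))  # ひらがな -> ヒラガナ
```

Nothing remotely this simple exists for TC/SC, which is part of why its omission from a foldings report is less surprising than it first appears.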
Re: Normalisation and font technology
On Wednesday, May 29, 2002, at 10:55 AM, John Hudson wrote:

In particular, I think it is a mistake to resolve display of character-level decompositions by relying on the presence of glyph-space substitution or positioning features in fonts, simply because most users have very few fonts that are capable of doing this.

Agreed; Apple's current solution is a better-than-nothing one, but not really what's best in the long run IMHO. BTW, does FontLab 4 auto-generate OT layout data from the Unicode repertoire of a font?

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Normalisation and font technology
On Wednesday, May 29, 2002, at 01:57 PM, John Hudson wrote: Thank you. My main concern was that someone might think that this is a reasonable model for handling this, and it wasn't immediately clear that Apple did not consider this, in fact, to be an appropriate long term solution. Hm. There aren't not too many negatives in that last sentence, making it not undifficult for some who who hadn't insufficient sleep last night not to be unable to parse it incorrectly. I think. I'm sorry. I'm very tired today. :-) Yes. Apple does not consider this an ideal long-term solution. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Unicode and the digital divide. (derives from Re: Towards some more Private Use Area code points for ligatures.)
everywhere. These are not rhetorical questions, I really would genuinely like to know. I am quite happy to accept that perhaps a solution to the problem has been found, yet wonder whether that solution is, as of today, only available to people who are on one side of a digital divide.

Again, there are many digital divides. There are things that work on Windows and not on Macs. There are things that work on Macs and not on Windows. There are things that work with InDesign that don't work with Word. There are things that work with Word and not with InDesign. There are things that work with Windows XP that don't work with Windows 98, and things that work with Windows 98 that don't work with DOS 3.0. There are things that work with Mac OS X that don't work with Mac OS 9, and things that work with Mac OS 9 that don't work with Mac OS 6.8.3 of venerable memory. Don't put yourself in the position of arguing that it's wrong to innovate. Innovation in the IT industry always creates a digital divide.

If there really are problems which my list will cause then I will be happy to add a note stating the problem. Yet I am very concerned that I may be in effect being told here that Unicode is only really intended for people with the very latest equipment using expensive solutions that are only realistically available to rich corporations.

*sigh* Unicode has from the beginning been designed with the assumption that it would require rendering engines capable of complex typesetting. We've always known that. It's taken longer to get them to market than we would have thought and liked, but they're showing up now. It's a bar we've always had to cross, however, if not for Latin ligatures, then for Arabic and Devanagari, and so on. The advantage of this is (ideally) that once you get a system capable of doing Arabic or Devanagari or Latin ligatures, you get a system capable of doing all of them. That, at least, was the goal.
My thinking is that the existence of the list, (and hopefully, the list having been distributed in this discussion group, many people will be aware of its existence, and may perhaps have even filed a copy for possible future reference), will hopefully make the availability of such ligatures in founts more widespread and will also hopefully influence people who make software packages, such as relatively inexpensive electronic book publishing packages, to build in a feature so that such ligatures may be accessed from a TrueType fount.

1) People who make fonts already know about ligatures.

2) The set of ligatures appropriate for Latin typography is very much font-specific. Zapfino has dozens of Latin ligatures because it's a calligraphic font. Courier should have none because it's a monospaced font.

3) People who write book publishing packages already build in features to access Latin ligatures. Microsoft Word is not a good program to use for publishing books.

I feel that the Unicode system should be available for all, not just for people who are on the money side of the digital divide.

It's a nice goal. It isn't a realistic one, however.

Now, a question on my part. You're using the term digital divide, but you're not defining it very well. Could you tell me:

a) What the digital divide really is from your perspective -- that is, what OS is on one side and what OS on the other?

b) What are the relative numbers of people with systems on both sides?

If, say, your divide were to be between Mac OS 6 or earlier and Mac OS 7 or later (the point at which Apple adopted TrueType as its primary font technology), then there are likely 99.99% of all Mac users on the 7-or-later side of the divide. Do you see what I'm asking here?

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Unicode and the digital divide.
On Friday, May 31, 2002, at 10:11 AM, Doug Ewell wrote: Respefully, Nice one, Doug. Unfortunately, on my system, that collides with the ConScript version of Shavian which I have installed, so I got something unexpected. ☹ Which makes your point. As the Good Book says, He that hath ears to hear, let him hear. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Unicode and the digital divide.
On Friday, May 31, 2002, at 02:38 PM, Kenneth Whistler wrote:

The issue is *NOT* hardware. Take a look at www.dell.com. The very, very bottom-end system, a Dimension 2200 desktop, comes these days with a 1.3GHz Intel Celeron chip, oodles of multimegabytes of SDRAM, a 20- to 40GB hard drive, and 4MB of video memory. That machine, which can jump circles around even a top-of-the-line PC of just a few years ago, is listed at a base price of $669. These machines are now approaching supercomputer capabilities, at Radio Shack everyday consumer electronics prices. And if you can't afford one yourself, you can rent access to one. The issue is *NOT* the OS. All Dell PCs come pre-loaded with MS Windows XP right now. And guess what -- all that Unicode functionality is packed right under the hood in XP, waiting to go.

And for the record, for slightly more you can get a low-end iMac with Mac OS X -- again, a Unicode-capable OS.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Q: How many enumerated characters in Unicode?
On Wednesday, June 5, 2002, at 07:27 AM, Adam Twardoch wrote:

Oh, thank you! I needed that figure to make a point why you cannot make a single TrueType font covering all of the Unicode range. I knew it was way more than 65,536, but it's better to quote a precise figure :)

Ah, but the figure Ken gave you isn't enough anyway, for two reasons:

1) Some scripts (e.g., south Asian scripts) will require additional glyphs for proper display.

2) For Han in particular, one shape does not fit all. You'll need multiple locale-specific glyphs for a number of characters.

In real life, you can ignore (2) by simply issuing a locale-specific version of a font, but there's no real way to get around (1).

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
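A rough later-era way to see that the character count alone already exceeds the 65,536-glyph limit of a single TrueType font (the exact count depends on the Unicode version baked into your Python interpreter, so this is an illustration, not the precise 2002 figure Ken quoted):

```python
import unicodedata

# Count code points the local Unicode database assigns a character to,
# excluding unassigned (Cn), surrogates (Cs), and private use (Co).
assigned = sum(
    1 for cp in range(0x110000)
    if unicodedata.category(chr(cp)) not in ("Cn", "Cs", "Co")
)
# A TrueType font indexes glyphs with a 16-bit value, capping it at 65,536
# glyphs -- and that is before extra presentation forms are even counted.
print(assigned > 65536)  # True
```

And as the email notes, the real glyph requirement is larger still, since complex scripts need many glyphs per character and Han needs locale-specific variants.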
Re: Hong Kong Supplementary Character Set
On Tuesday, June 11, 2002, at 03:32 PM, Steve Watt wrote: Could someone explain the relationship of the two tags, kIRG_HSource and kHKSCS in the unihan.txt file on the Unicode site? Basically (at the moment), kIRG_HSource is a subset of kHKSCS. They also come via different routes. kIRG_HSource is a listing of those characters where the HKSAR submitted source information to the IRG. So far, all these are in HK SCS, but we can't guarantee this will be the case in the future. The latter comes via the HKSAR's official mapping tables. What would be the approved way to create a conversion table from Windows 950 (with HKSCS) to Unicode? Er, doesn't MS provide one somewhere? == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
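For what it's worth, later tooling makes the code page 950 + HKSCS conversion question easy to explore: Python, for example, ships a big5hkscs codec covering the HKSCS extensions to Big Five. A small sketch (the codec name is Python's; Microsoft separately publishes its own mapping tables):

```python
# Python's standard "big5hkscs" codec handles Big Five plus the Hong Kong
# Supplementary Character Set, the base repertoire behind code page 950.
ideograph_one = "\u4E00"  # U+4E00, the first ideograph in Big Five (0xA440)
encoded = ideograph_one.encode("big5hkscs")
print(encoded)                                    # b'\xa4@'
print(encoded.decode("big5hkscs") == ideograph_one)  # True
```

For authoritative mappings one would still want the HKSAR's official tables, which is where the kHKSCS data in Unihan.txt comes from.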
Re: Chess symbols, ZWJ, Opentype and holly type ornaments.
On Thursday, June 20, 2002, at 03:25 PM, Kenneth Whistler wrote: I think what a number of people on the list have been hinting -- or openly stating -- is that prolixity is not a virtue on an email list when trying to convey one's ideas. IOW, brevity's wit's soul. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: (long) Re: Chromatic font research
On Saturday, June 29, 2002, at 06:41 AM, James Kass wrote:

This is a display issue rather than an encoding one. Unicode already provides for the correct encoding of the ct ligature with the ZWJ character. Anyone wishing to correctly display the ct ligature might need to use a work-around. Substituting PUA code points by private agreement is one workable method.

I must point out that for English (and a lot of other languages), the use of ZWJ to control ligation is considered improper. The ZWJ technique for requesting ligatures is intended to be limited to cases where the word is spelled incorrectly if *not* ligated (and similarly ZWNJ is intended to prevent ligature formation where that would make the word spelled incorrectly). The kind and degree of ligation in English is generally considered a stylistic issue and is best left to higher-level protocols. Thus saith Unicode 3.2.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
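The mechanics of the ZWJ/ZWNJ hints themselves are just character insertions; a minimal sketch of the two sequences under discussion (whether a renderer honors the hints is, of course, up to the font and layout engine):

```python
# U+200D ZERO WIDTH JOINER requests a ligature at this point;
# U+200C ZERO WIDTH NON-JOINER prohibits one. Both are invisible hints.
ZWJ = "\u200D"
ZWNJ = "\u200C"

ligate_ct = "c" + ZWJ + "t"   # ask the font to form a ct ligature here
no_fi = "f" + ZWNJ + "i"      # suppress the default fi ligature here
print(len(ligate_ct), len(no_fi))  # 3 3
```

Note that both strings remain three characters of plain text; only the rendering, never the content, is affected.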
Re: (long) Re: Chromatic font research
Hmm. Disregard the last message from me. It isn't ct you're replacing. See how annoying this all is? :-) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: (long) Re: Chromatic font research
On Saturday, June 29, 2002, at 03:01 PM, [EMAIL PROTECTED] wrote: On 06/28/2002 11:34:35 PM Doug Ewell wrote: sigh / OK, here are the details... OK, now I know the cha of events that he was referrg to, and I'm def itely cled to agree that it was complete cocidence. It is trivial, fact, to disprove the hypothesis that the experiment supposedly proved. Will you guys *please* stop sending me email with the Shavian letter CHURCH everywhere the Latin letters ct should be? It's most distracting. :-) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)
On Sunday, June 30, 2002, at 05:31 AM, James Kass wrote:

Can you please point me to a URL for Unicode 3.2 ligature control? This link (March 2002): http://www.unicode.org/unicode/reports/tr28/ ...glosses over Latin ligatures, suggesting that mark-up should be used in some cases and ZWJ in others.

The precise language of the TR is:

"Ligatures and Latin Typography (addition): It is the task of the rendering system to select a ligature (where ligatures are possible) as part of the task of creating the most pleasing line layout. Fonts that provide more ligatures give the rendering system more options. However, defining the locations where ligatures are possible cannot be done by the rendering system, because there are many languages in which this depends not on simple letter pair context but on the meaning of the word in question. ZWJ and ZWNJ are to be used for the latter task, marking the non-regular cases where ligatures are required or prohibited. This is different from selecting a degree of ligation for stylistic reasons. Such selection is best done with style markup. See Unicode Technical Report #20, Unicode in XML and other Markup Languages, for more information."

That seems pretty clear to me. If you want a ct ligature in your document because you think it looks cool, then you use some higher-level protocol. The "looks cool" factor simply doesn't apply unless you know what font you're dealing with, because ct looks cool in some fonts, but not others.

In real Latin typography, the set of ligatures available with a typeface varies from font to font. Type designers add ligatures (or not) depending on their esthetic sense of what looks good and how the letters interact with one another. From a type design perspective, a monospaced font like Courier should have no ligatures; they don't make sense.
A rich book font like Adobe Minion Pro will have a fairly large but standard set, and a calligraphic font like Linotype's Zapfino will have a huge and imaginative set.

The programs that provide ligature control do so by means of having the user select a range of text and then changing the level of ligation. The type formats like OpenType or AAT support this by allowing the type designer to categorize ligatures as common, rare, required, and so on. Thus, if I'm typesetting a document in Adobe InDesign, I'll select text and turn rare ligatures on, and thus see the ct ligature, if it exists in the font and if the type designer has designated it a rare ligature.

To be frank, turning on an optional ct ligature throughout a document by means of inserting ZWJ everywhere you want it to take place makes as much sense in that model -- the model that Western typography uses for languages such as English -- as having the user insert an <i>/</i> pair around every letter they want in italics.

Remember, Unicode is aiming at encoding *plain text*. For the bulk of Latin-based languages, ligation control is simply not a matter of *plain text* -- that is, the message is still perfectly correct whether ligatures are on or off. There are some exceptional cases. The ZWJ/ZWNJ mechanism is available for such exceptional cases.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: (long) Re: Chromatic font research
On Monday, July 1, 2002, at 05:31 AM, Michael Everson wrote:

I must point out that for English (and a lot of other languages), the use of ZWJ to control ligation is considered improper. The ZWJ technique for requesting ligatures is intended to be limited to cases where the word is spelled incorrectly if *not* ligated

What! No! Look at my paper and the examples of Runic and Old Hungarian and Irish. There are examples where ligation is used on a nonce-basis, not having anything to do with global ligation or correctness.

Michael, I was very careful to say "English (and a lot of other languages)". And, by and large, the software which supports ligation doesn't compel global on-or-off, so that nonce-bases are supported.

What's frustrating for me about this never-ending discussion is that it always seems to come down to the stupid ct-ligature in English. I have a book that uses it *everywhere* and it gets *really* annoying. :-(

I have sitting in front of me a reprint of a nineteenth-century reproduction of the 1611 King James Version of the Bible. It uses the ct ligature in the headers, but not in the text. (It also uses the long-s, by the way.) But if someone were to come to me and ask how you would use plain text to reproduce this text, I'd tell them you can't, or shouldn't -- trying to reproduce precisely the visual appearance of a text isn't a job for plain text. Period.

I also have a font on my machine based on the handwriting of Hermann Zapf. It's a gorgeous font with a huge, idiosyncratic set of ligatures. It doesn't make sense to have the user (or software) insert ZWJ all over the place on the off-chance that the text will end up being set with Zapfino to make sure that these ligatures form correctly.

Our system fonts are set, moreover, to do fi- and fl-ligature formation automatically (well, most of them are). That's because it's the appropriate default behavior for most Latin-based languages. (Not all. I know that.)
Where this behavior is *not* appropriate, there are mechanisms, including the ZWJ/ZWNJ one, which can override the default behavior. This means that file names, menus, dialog boxes, email, and so on all do the most-nearly-correct thing without having to be told to.

(and similarly ZWNJ is intended to prevent ligature formation where that would make the word spelled incorrectly). The kind and degree of ligation in English is generally considered a stylistic issue and is best left to higher-level protocols. Thus saith Unicode 3.2.

It doesn't go so far as to say what you did. Maybe Book needs to check the text some on this point. We should have consensus.

No, the bit about spelling is simply my attempt to state informally the idea that Unicode 3.2 is attempting to convey. Let's just have a quick survey here to see if there's consensus:

1) In Latin typography, ligature formation is generally a stylistic choice. There are exceptions, and these exceptions are more or less common depending on the precise language being represented.

2) Where ligature formation *is* a stylistic choice, it should not be controlled in plain text but by some sort of higher-level mechanism. Such a mechanism should allow the default formation of ligatures with the ability for the user to override the default behavior.

3) Where ligature formation is *not* a stylistic choice, the ZWJ/ZWNJ mechanism is an appropriate one to provide ligation control.

4) The precise set of ligatures in a Latin typeface is design-specific. A typeface should not be required to include a set of ligatures which do not make aesthetic sense for the overall design.

This last point, by the way, is the one which is the big sticking point for the large type foundries that I've spoken to.

== John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures
On Monday, July 1, 2002, at 10:16 AM, Michael Everson wrote:

Some nice person just said to me privately:

Michael Everson wrote: In my paper http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2317.pdf I raised a lot of questions about exceptions and the use of these. I don't think they were ever all answered. My other papers, N2141 and N2147, show a number of examples of ligation which is not particularly predictable. That's what ZWJ is supposed to be for.

That's because some people (not to mention any ad-hominem names; there is more than one) are more interested in saying "This is a simple problem, and the rendering systems of the future (or my Mac today) will handle it automatically" than in answering the complex linguistic and orthographic questions you raised.

For the record, I (at least) have never asserted that Mac (or any other) system software will ever gain the ability to handle ligation on a completely automatic basis. In any event, the ZWJ/ZWNJ mechanism has no advantage over any higher-level protocol when it comes to software support, since it's all being done via AAT/OpenType/Graphite or something similar in any event.

I guess one thing that's frustrating for me personally in this perennial discussion is the creation of this false dichotomy, that ligation control either *must* be in plain text or *must* be expressly forbidden in plain text. I would agree, Michael, that your arguments that some degree of ligation control belongs in plain text were unanswerable. You did a good job there. But at the same time, I've never heard you argue that the only way to turn ligatures on or off is in plain text.

I feel compelled to reiterate my own feelings on the subject: Ligation in Latin text is generally a matter of stylistic preference, and depends on the specific typeface being used and its set of available ligatures. There are exceptions, and these should be handled via the ZWJ/ZWNJ mechanism.
Where ligation is merely a matter of stylistic preference, however, it should be handled by some other mechanism which can take the specific capacities of a typeface into consideration. System and other software can (and should) provide default ligation which the user should be able to override. And under no circumstances should new Latin ligatures be added to Unicode. Personally I think your ZERO-WIDTH LIGATOR papers are among the best of all your Unicode-related papers. I agreed with the decision to unify the ligation function with ZWJ rather than creating a new character, but your arguments about Latin, Greek, Runic, Old Hungarian, etc. ligation were thorough and unassailable. Thank you, nice person. It's nice to know that someone else looked at the argument and came up with the same conclusion that I did. For the record, Michael, this was the general feeling of the UTC when the matter was debated there. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)
On Monday, July 1, 2002, at 06:28 AM, James Kass wrote: John H. Jenkins wrote: That seems pretty clear to me. If you want a ct ligature in your document because you think it looks cool, then you use some higher-level protocol. The looks cool factor simply doesn't apply unless you know what font you're dealing with, because ct looks cool in some fonts, but not others. It's enough that an author would want a ct ligature to appear in text, the motivation for the desire isn't relevant. Authors who want to specify a certain ligature know about font selection. Au contraire, because of the italic analog. I may *want* a particular word to be in italics, but that doesn't mean that the italics belong in plain text. It is not the goal of Unicode to allow the complete representation of an author's intent in plain text. I can't typeset Alice in Wonderland in plain text. I'm sorry, but the Mouse's tail would simply get in the way. There's another level of problem here, too. What if it isn't the author's intent, but an artifact of the particular typesetter? One problem with TR28 is that it is worded so that it appears to be in addition to earlier guidelines. This implies that the examples used in TR27, for one, are still valid. In TR27, font developers are urged to add things like f+ZWJ+i to existing tables where f+i is already present. And for the record, Apple is doing that. Another problem with TR28 is that its date is earlier than the date on TR27. This suggests that TR27 is more current. This may be a point for clarification in TR28. Another issue is that a search of the Unicode site for controlling ligatures gives TR27 as a hit, but not TR28. Having slept on this, I concur that it might be cool to be able to turn on or turn off ligatures over a range of text or an entire file using a higher level protocol. However, options should be preserved for the user. 
Ligature selection is a task for the author/typesetter at the fundamental level; it should not be completely left to the rendering system. Er, James. I've never said it should. The rendering system should have the ability to do default ligation. The user should be able to override that behavior. That's what happens on systems I see. If they do ligation at *all*, they have a default behavior which can be overridden. The programs that provide ligature control do so by means of having the user select a range of text and then changing the level of ligation. The type formats like OpenType or AAT support this by allowing the type designer to categorize ligatures as common, rare, required, and so on. Thus, if I'm typesetting a document in Adobe InDesign, I'll select text, and turn rare ligatures on and thus see the ct ligature, if it exists in the font and if the type designer has designated it a rare ligature. That's a lot of ifs and it leaves too much to chance. When an author determines that, for instance, a ct ligature is required, there needs to be a method to encode it which is unambiguous. ZWJ fits the bill. I'll repeat a point that I've made over and over and over. The ct ligature does not exist in and of itself. It is a part of a typeface. It doesn't make sense in general to ask for the formation of a ct ligature without any reference to the typeface you're using. The implication of what you're saying is that Latin typefaces should be *required* to have a ct ligature on the off chance that the author of text determines that it's required in a particular context. That gives most type designers the heebie jeebies. It's bad enough that Adobe and Apple are making them stick useless fi and fl ligatures in their fonts. In any event, if an author determines that a ct ligature is honestly and absolutely *required* in a particular context (as opposed to being desirable), then the ZWJ mechanism exists. 
To be frank, turning on an optional ct ligature throughout a document by means of inserting ZWJ everywhere you want it to take place makes as much sense in that model (the model that Western typography uses for languages such as English) as having the user insert an <i>/</i> pair around every letter they want in italics. Not at all. This is apples and oranges. The italic tags operate upon every character in the enclosed string equally. Using a similar ligature tag would be expected to make ligatures wherever possible within the enclosed string according to the user's system's ability to render ligatures... irrespective of the author's intent. Depending upon the system, the same run of text could be expressed with no ligatures at all in a monospaced font or as scriptio continua in a handwriting font. Er, you've just made my point, haven't you? The typeface makes a difference. If you're ever in a situation where the typeface of the originator may be different from the typeface of the receiver, you've lost the ability to say whether or not ligatures will be formed.
Re: Radicals in CNS 11643-1992, Plane 1, Rows 7,8,9
On Monday, July 1, 2002, at 03:10 PM, Torsten Mohrin wrote: What should I do with these characters when converting CNS to Unicode? Mapping to regular Han? Are there compatibility ideographs for round-trip conversion? Use the KangXi radicals in the KangXi radical block (U+2Fxx). == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)
On Monday, July 1, 2002, at 02:08 PM, Asmus Freytag wrote: At 11:34 AM 6/30/02 -0600, John H. Jenkins wrote: Remember, Unicode is aiming at encoding *plain text*. For the bulk of Latin-based languages, ligation control is simply not a matter of *plain text*; that is, the message is still perfectly correct whether ligatures are on or off. There are some exceptional cases. The ZWJ/ZWNJ is available for such exceptional cases. Remember also that the simplistic model you present already breaks down for German, since the same character pair may or may not allow ligation depending on the content and meaning of the text - features that in the Unicode model are relegated to *plain* text. *sigh* I'm clearly not expressing myself well here. I'm trying to state the general rule. Each time I do, I say there are exceptions. German is an excellent example of an exception. Michael's exceptional cases are exceptional cases. We put ZWJ/ZWNJ in charge of plain-text ligature formation to handle these cases. I'm fine with that. Turkish is another exception, BTW, where the typical fi ligature of Latin typography should not be formed. The issue -- as I see it -- is not whether or not *any* ligature control belongs in plain text, or whether or not mandatory/prohibited ligation points should be marked in plain text. I'm not aware of anyone who is arguing against that position. We started out with a discussion of whether or not we should add more Latin ligatures (whether in the PUA or elsewhere) so that people can, in essence, create a plain-text representation of an older book where such were more common. (And, as always, if my memory is inaccurate please feel free to correct me here.) This is not an appropriate use of plain text IMHO. I do not believe, moreover, that the ZWJ/ZWNJ mechanism is appropriate for this sort of thing. This is rich text, and other ligation controls should be used. 
Therefore, I would be much happier if the discussion of the 'standard' case wasn't as anglo-centric and allowed more directly for the fact that while fonts are in control of what ligatures are provided, layout engines may be in control of what and how many optional ligatures to use, the text (!) must be in control of where ligatures are mandatory or prohibited. Which is what Unicode 3.2 says. (You said it very nicely here, though.) (The standard case, BTW, seems to be Anglo-centric largely because this is an English-speaking list and people always seem to start out with the ct ligature they'd like to put in words like respectfully. Sorry about that.) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures
On Monday, July 1, 2002, at 01:03 PM, Tex Texin wrote: The discussion refers to other ways of influencing a font with respect to ligature and I don't recall ever seeing a way to do this. What kinds of products have these abilities? It's a pretty common feature of desktop publishing applications: Quark, FrameMaker, InDesign. TextEdit, the default text editor on Mac OS X, does it, but it's not at all common at the low end of things. I wouldn't be surprised if it showed up in Word eventually, however. In FrameMaker, which I happen to have open at the moment, you do it by turning pair kerning on and off. InDesign has a menu that lets you select degree of ligation. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures
On Tuesday, July 2, 2002, at 06:51 AM, Michael Everson wrote: That is absolutely true. I have never argued that the only way to turn ligatures on or off is in plain text. I saw that there were difficult edge cases and sought blessing for the ZWJ/ZWNJ mechanism to handle them, and won the day. But it would certainly be my view that those should only be used where predictable ligation does not occur. A Runic font which had an AAT/OpenType/Graphite ligatures-on mechanism would, in my view, be inappropriate, because ligation is unusual in Runic, never the norm, and should only be used on a case-by-case basis. Runic fonts should have the ZWJ pairs encoded in the glyph tables. Alas, but that's technically impossible. Both OT and AAT (I'm not sure about Graphite) require that single characters map to single glyphs, which are then processed. (In OT, of course, you are also supposed to do some preprocessing in character space, but that doesn't solve this problem.) It would be nice to have a cmap format which maps multiple characters to single glyphs initially. The way we deal with this is to have the ligatures with the ZWJ inserted as part of a ligature table which is on by default and which isn't revealed to the UI so that the user can't turn them off. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures
On Tuesday, July 2, 2002, at 09:49 AM, Michael Everson wrote: At 09:41 -0600 2002-07-02, John H. Jenkins wrote: Alas, but that's technically impossible. Both OT and AAT (I'm not sure about Graphite) require that single characters map to single glyphs, which are then processed. Hm? How do you handle the decomposed sequence A + COMBINING ACUTE? Surely that is a sequence of characters mapping to a single glyph. Same process. In OT, of course, you could count on the glyph being prenormalized (but this only works for stuff already in Unicode), or you could use the GPOS table to properly form the accented form on-the-fly. But neither technology allows the decomposed sequence to be mapped directly to a single glyph. Just goes to show that I don't make proper Unicode fonts yet because the tools just aren't up to snuff. We're working on it. :-) (In OT, of course, you are also supposed to do some preprocessing in character space, but that doesn't solve this problem.) It would be nice to have a cmap format which maps multiple characters to single glyphs initially. I always thought there was. Now I'm really confused as to how I would make a complex Indic syllable. Same sort of thing. You put the glyph in the font and the instructions for what sequence forms it in the GSUB or morx table. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
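The A + COMBINING ACUTE case above can be checked at the character level with Python's unicodedata module. This is not font machinery, just a sketch of the equivalence the font tables have to honor:

```python
import unicodedata

decomposed = "A\u0301"   # LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT
precomposed = "\u00C1"   # LATIN CAPITAL LETTER A WITH ACUTE

# NFC normalization recombines the decomposed pair into the
# precomposed character...
assert unicodedata.normalize("NFC", decomposed) == precomposed
# ...and NFD decomposes it again.
assert unicodedata.normalize("NFD", precomposed) == decomposed
# Either way the font sees one code point or two; in OT and AAT each
# character maps to a glyph first, and combination happens afterward
# in glyph space (GSUB/GPOS or morx), not in the cmap.
```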
Re: ZWJ and Latin Ligatures
On Tuesday, July 2, 2002, at 10:55 AM, Marco Cimarosti wrote: I mean: isn't this two-step mapping: code point -> glyph ID, component glyph IDs -> ligature glyph ID, functionally equivalent to a hypothetical one-step mapping? component code points -> ligature glyph ID Am I missing something? Functionally, the two are equivalent. There are, however, two subtle differences: 1) If you map directly from multiple characters to a single glyph, you don't have to include glyphs in your font for all the pieces if they're never supposed to appear by themselves. As an extreme example, if I implemented astral character support via ligating surrogate pairs, I'd need to include glyphs for the unpaired surrogates. As it is, Windows and the Mac *do* support mapping paired surrogates directly to glyphs, so you don't need these extra glyphs which are never seen. 2) A mapping directly from multiple characters to single glyphs expressly makes the process something not to percolate up to the UI. The indirect process means that there are some actions in glyph space which *are* optional and which the user can turn on and off, and others which aren't. In OpenType, this is less of an issue since this was always the case and applications are expected to do the UI work themselves. In AAT, we originally assumed (back in the days of the Technology That Must Not Be Named) that all layout features are optional and can be turned on and off, and that the UI would always reflect the entire suite of available features. We had to rewrite our tools to allow for required actions which cannot be turned off. Poor Michael is saddled with older versions of our tools which are hard to use and don't let him do this. We're working on getting newer and better ones to him. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
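Marco's functional equivalence can be sketched with toy tables in Python. The glyph IDs are invented for illustration, not taken from any real font; real fonts do step 1 in the cmap and step 2 in GSUB or morx:

```python
# Toy tables -- glyph IDs are made up, purely illustrative.
CMAP = {"f": 10, "i": 11}        # step 1: code point -> glyph ID
LIGA = {(10, 11): 30}            # step 2: glyph ID pair -> ligature glyph

def two_step(text):
    """cmap first, then ligate in glyph space (the OT/AAT model)."""
    glyphs = [CMAP[c] for c in text]
    out, i = [], 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if len(pair) == 2 and pair in LIGA:
            out.append(LIGA[pair]); i += 2
        else:
            out.append(glyphs[i]); i += 1
    return out

# The hypothetical one-step cmap: character *sequences* map straight
# to glyph IDs, so pieces that never appear alone (e.g. unpaired
# surrogates) would need no glyphs of their own.
ONE_STEP = {("f", "i"): 30, ("f",): 10, ("i",): 11}

def one_step(text):
    out, i = [], 0
    while i < len(text):
        pair = tuple(text[i:i + 2])
        if len(pair) == 2 and pair in ONE_STEP:
            out.append(ONE_STEP[pair]); i += 2
        else:
            out.append(ONE_STEP[(text[i],)]); i += 1
    return out

assert two_step("fi") == one_step("fi") == [30]
assert two_step("if") == one_step("if") == [11, 10]
```

The outputs are identical; the differences Jenkins lists are about what the font must contain and what the UI is allowed to expose, not about the glyph stream produced.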
Re: ZWJ and Latin Ligatures
On Tuesday, July 2, 2002, at 11:39 AM, John Cowan wrote: 1) If you map directly from multiple characters to a single glyph, you don't have to include glyphs in your font for all the pieces if they're never supposed to appear by themselves. As an extreme example, if I implemented astral character support via ligating surrogate pairs, I'd need to include glyphs for the unpaired surrogates. More precisely, you need to have glyph *indexes* that are never mapped to glyphs. The actual outlines themselves don't need to exist, AFAIK. True. I tend to avoid that, because if something goes wrong and the system attempts to actually *display* one of these virtual glyphs, disaster would ensue. (Dave Opstad and I have had long debates on the safety of doing this.) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures
On Tuesday, July 2, 2002, at 12:51 PM, Marco Cimarosti wrote: The next step could be standardizing the values of the glyph indexes, so that the entire GSUB/morx table can be copied in from a template, and type designers can concentrate on drawing the outlines. The typical approach these days is for the tools that provide advanced layout table support to be keyed to glyph name. Apple's tools allow glyph name, glyph number, or Unicode code point as glyph identifiers. As you say, it makes it possible to cut-and-paste source files and is very handy. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: FW: Inappropriate Proposals FAQ
On Wednesday, July 3, 2002, at 11:57 AM, Asmus Freytag wrote: Klingon (or any of the Latin ciphers/ movie scripts) I'd say Klingon *and* one of the Latin ciphers. Klingon is almost worth a FAQ in itself. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Inappropriate Proposals FAQ
On Wednesday, July 3, 2002, at 02:23 PM, Murray Sargent wrote: as something inappropriate. Question: how does one code up (presumably with markup) a caret over a jk pair in a math expression? The dot on the j should be missing for this case, but how does one communicate that to a font if there's no code for a dotless j? It seems that dotless j is needed for some mathematical purposes. The glyph is; the character isn't. There are also accented j's which are based on a dotless-j. The way we do it is include a glyph called dotlessj in the font, and have the tables set up so that whenever j is found with an accent, dotlessj is substituted. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
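The dotlessj substitution Jenkins describes can be mimicked at the character level. A sketch, assuming a font whose tables swap in a dotless base glyph under any combining mark; the glyph names ("dotlessj", "dotlessi") are illustrative conventions, not a particular font's:

```python
import unicodedata

# When "j" (or "i") carries a combining mark, substitute a dotless
# base glyph before the accent is attached.  In a real font this rule
# lives in the GSUB/morx tables; here it is plain Python.
def base_glyph_for(char, next_char):
    next_is_mark = bool(next_char) and unicodedata.combining(next_char) != 0
    if char == "j" and next_is_mark:
        return "dotlessj"
    if char == "i" and next_is_mark:
        return "dotlessi"
    return char

text = "j\u0302"   # j + COMBINING CIRCUMFLEX ACCENT (as in a math j-hat)
glyphs = [base_glyph_for(c, text[k + 1] if k + 1 < len(text) else "")
          for k, c in enumerate(text) if unicodedata.combining(c) == 0]
assert glyphs == ["dotlessj"]
```

This is why the *glyph* is needed but the *character* is not: the dotless form only ever appears as an artifact of rendering an accented j.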
Re: FW: Inappropriate Proposals FAQ
On Thursday, July 4, 2002, at 09:07 AM, Otto Stolz wrote: Michael Everson wrote: That, and the fact that it hasn't been deciphered. Which implies that you really cannot tell what constitutes a character, in that script, nor its writing-direction. Actually, you can't even tell *that* it's a script, not for sure. But if it *is* writing, then the nature of the characters seems fairly unambiguous as the various signs are self-contained and don't break down into smaller pieces. It would appear to be a syllabary. Also IIRC the writing direction has been deduced by determining the order in which the characters were stamped into the clay (as indicated by overlaps). I should mention that the proposals for the encoding of the Phaistos disc are the only proposals made to the UTC and WG2 which contain the entire known corpus of writing with that script as a part of the proposal. :-) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: The pointless thread continues
On Friday, July 5, 2002, at 08:54 AM, John Hudson wrote: Actually, this isn't nonsense. A single buggy font is quite capable of crashing an operating system. Obviously the damage is not permanent, presuming one is able to get the system started in safe mode and remove the offending font. I've seen some spectacularly nasty fonts over the years, as have many of my colleagues (including engineers in the type group at Apple, so this isn't simply a Windows issue). C'est vrai. One of the fonts we used to print Unicode 2.0 killed *all* text display on the system if it were to be used with ATSUI. It was kind of cool, actually. We actually have a font zoo stashed away full of pathological fonts which have been known to do all kinds of interesting things if someone should be foolish enough to install them. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: How do I encode HTML documents in old languages ſuch as 17th century Swediſh in Unicode?
On Wednesday, July 3, 2002, at 11:10 AM, Stefan Persson wrote: There is a big problem in the current Unicode ſtandard, ſince Fraktur letters aren't ſupported in any ſuitable manner. Aargh! Medial long-s! Run away! Run away! :-) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)
On Saturday, July 6, 2002, at 03:42 AM, James Kass wrote: We certainly agree that ligature use is a choice. I think we diverge on just what kind of choice is involved. You consider that ligature use is generally similar to bold or italic choices. I consider use of ligatures to be more akin to differences in spelling. If you're quoting from a source which used the word "fount", it is wrong to change it to "font". And, if you're quoting from a source which used "hæmoglobin", anything other than "hæmoglobin" is incorrect. If the source used "&c.", it should never be changed to "etc.". So, if the source used the ct ligature... I see your point, but I think we're to the stage where we'll just have to agree to disagree. We *do* agree that ligation is a choice, but you're quite accurate in your assessment of where precisely we diverge. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: [OpenType] Proposal: Ligatures w/ ZWJ in OpenType
On Saturday, July 6, 2002, at 04:11 PM, John Hudson wrote: There are going to be documents containing this character -- and ZWNJ -- and fonts that do not contain these characters may display them with .notdef glyphs. The only solution is system or application intelligence that is able to ensure that no attempt is made to display glyphs for these characters. This issue seems to have already been resolved in MS text processing, at least as far as I have tested it in WordPad. I have inserted a ZWJ character in a string of text using a standard PS Type 1 font, and the character is treated as a zero-width, no outline control character. Well, by default no attempt is made to display glyphs for these characters. (Somebody may have a show invisibles or equivalent on. BTW, does OT have a show invisibles feature? I'm too lazy to check right now.) We also have a list of invisible characters which should, ordinarily, be left undisplayed including ZWJ, ZWNJ, the bidi overrides, and so on. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
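The "list of invisible characters which should, ordinarily, be left undisplayed" can be sketched as follows. The specific set here is an illustrative subset of my own choosing (Unicode later formalized this idea as the Default_Ignorable_Code_Point property); the email does not enumerate the actual list:

```python
# Format/control characters a renderer should leave undisplayed by
# default, rather than showing .notdef boxes.  Illustrative subset.
INVISIBLES = {
    0x200C,  # ZERO WIDTH NON-JOINER
    0x200D,  # ZERO WIDTH JOINER
    0x200E,  # LEFT-TO-RIGHT MARK
    0x200F,  # RIGHT-TO-LEFT MARK
    0x202A,  # LEFT-TO-RIGHT EMBEDDING
    0x202B,  # RIGHT-TO-LEFT EMBEDDING
    0x202C,  # POP DIRECTIONAL FORMATTING
    0x202D,  # LEFT-TO-RIGHT OVERRIDE
    0x202E,  # RIGHT-TO-LEFT OVERRIDE
    0xFEFF,  # ZERO WIDTH NO-BREAK SPACE (BOM)
}

def displayable(text: str) -> str:
    """Drop default-invisible format characters that the current font
    has no glyph for, before glyph lookup."""
    return "".join(c for c in text if ord(c) not in INVISIBLES)

assert displayable("fi\u200Dnal") == "final"
```

A "show invisibles" mode would skip this filter and render placeholder glyphs instead.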
Re: Proposal: Ligatures w/ ZWJ in OpenType
On Monday, July 15, 2002, at 09:58 AM, Doug Ewell wrote: No, what bothers me is that the ZWJ/ZWNJ ligation scheme is starting to look just like the DOA (deprecated on arrival) Plane 14 language tags. In each case, Unicode has created a mechanism to solve a genuine (if limited) need, but then told us -- officially or unofficially -- that we should not use it, or that it is reserved for use with special protocols which are never defined or mentioned again. I'm not sure I agree with you here. The position of the UTC is not that ZWJ should never be used and we're sorry we added it, which is the case of the Plane 14 language tags. It's that the ZWJ should not be the primary mechanism for providing ligature support in many cases. That's as far as it goes. The UTC may have intended that ZWJ ligation be used only in rare and exceptional circumstances, but UAX #27, revised section 13.2 doesn't say that. The latest word is the Unicode 3.2 document, not the Unicode 3.1 document. It says: Ligatures and Latin Typography (addition) It is the task of the rendering system to select a ligature (where ligatures are possible) as part of the task of creating the most pleasing line layout. Fonts that provide more ligatures give the rendering system more options. However, defining the locations where ligatures are possible cannot be done by the rendering system, because there are many languages in which this depends not on simple letter pair context but on the meaning of the word in question. ZWJ and ZWNJ are to be used for the latter task, marking the non-regular cases where ligatures are required or prohibited. This is different from selecting a degree of ligation for stylistic reasons. Such selection is best done with style markup. See Unicode Technical Report #20, Unicode in XML and other Markup Languages for more information. 
It says that ZWJ and ZWNJ *may be used* to request ligation or non-ligation, and that font vendors should add ZWJ to their ligature mapping tables as appropriate. It does acknowledge that some fonts won't (or shouldn't) include glyphs for every possible ligature, and never claims that they must (or should). It specifically does *not* say that ZWJ ligation is to be restricted to certain orthographies, or to cases where ligation changes the meaning of the text. This is correct. Nor is this changed in Unicode 3.2. The goal is to make the ZWJ mechanism available to people who feel it is appropriate to meet their needs, but to try to inform them that in the majority of cases, a higher-level protocol would be better. Adobe doesn't have to revise InDesign, for example, to insert ZWJ all over when a user selects text and turns optional ligatures on. OTOH, the hope is that if ligatures are available InDesign will honor the ZWJ-marked ones, even if ligation has been turned off. John Hudson has recommended what seems a reasonable way to handle this in OT. Apple will be releasing new versions of its font tools in the near future, and the documentation will include a recommendation for how this can be done with AAT. We've been revising our own fonts as the opportunity presents itself to support ZWJ as well. (The system and ATSUI-savvy applications require no revision.) The push-back coming from the font community on the issue has to do mostly with the communications problem that they weren't aware of it in as timely a fashion as would have been best, and the concern that font developers and application/OS developers will be forced to add ligature support where they have felt it inappropriate in the past. ZWJ/ZWNJ for ligation control is part of Unicode. It is not always the best solution, but it is *a* solution, and should be available to the user without restriction or discouragement. It's discouraged when it's inappropriate. It isn't deprecated. 
There are numerous places where Unicode provides multiple ways of representing something. In this instance, Unicode is trying to delineate where a particular mechanism is appropriate and where inappropriate. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
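The hoped-for behavior in this exchange (stylistic ligation turned off, but ZWJ-marked ligatures still honored) can be sketched in Python. The ligature table and the "f_i"/"c_t" glyph names are made up for illustration; real implementations do this in the shaping engine against the font's tables:

```python
ZWJ = "\u200D"

# Made-up ligature table: both the plain pair and the ZWJ-marked
# triple map to the same ligature glyph, as TR27 urged font vendors
# to set up.
LIGATURES = {("f", "i"): "f_i", ("f", ZWJ, "i"): "f_i",
             ("c", "t"): "c_t", ("c", ZWJ, "t"): "c_t"}

def shape(text, stylistic_ligatures=True):
    out, i = [], 0
    while i < len(text):
        tri = tuple(text[i:i + 3])
        pair = tuple(text[i:i + 2])
        if len(tri) == 3 and tri[1] == ZWJ and tri in LIGATURES:
            out.append(LIGATURES[tri]); i += 3   # ZWJ-marked: always honored
        elif stylistic_ligatures and len(pair) == 2 and pair in LIGATURES:
            out.append(LIGATURES[pair]); i += 2  # discretionary ligation
        elif text[i] == ZWJ:
            i += 1                               # unmatched joiner: invisible
        else:
            out.append(text[i]); i += 1
    return out

# With stylistic ligation off, only the ZWJ-marked ct forms:
assert shape("respec" + ZWJ + "t fi", stylistic_ligatures=False) == \
       ["r", "e", "s", "p", "e", "c_t", " ", "f", "i"]
```

The point of the sketch: the higher-level setting governs the discretionary pairs, while the plain-text ZWJ request survives either setting, which is exactly the division of labor Unicode 3.2 describes.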
Re: Missing character glyph
On Tuesday, July 30, 2002, at 08:58 PM, Doug Ewell wrote: Have Last Resort symbols been devised for all the blocks in Unicode, including the new ones like Tagalog? Neither Mark Leisher's page nor the Apple typography page contains a complete list. Yes. It covers all of Unicode 3.2; but the font has been entirely redesigned. We really need to update our documentation. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Digraphs as Distinct Logical Units
On Friday, August 9, 2002, at 03:54 AM, Andrew C. West wrote: And in China, historically the personal names of emperors (for emperors read dictators) have been tabooed An Ideographic Taboo Variation Indicator has been approved by the UTC for addition to the standard to handle precisely this kind of situation (see http://www.unicode.org/unicode/alloc/Pipeline.html). It works on the theory that you rarely need to know the precise *form* of the taboo variant, just that a taboo form is being used. There was some disagreement in WG2 about its utility, however, and there is the problem that, as you note, some taboo variants have already been encoded. It's currently scheduled to be reconsidered by the UTC. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Taboo Variants
On Friday, August 9, 2002, at 11:38 AM, Andrew C. West wrote: My point is that if the commonly encountered taboo variants are already encoded in CJK-B, then either the other taboo variants should also be added to CJK-B or they could be *described* using IDCs. Encoding them was a mistake, pure and simple. We didn't monitor the IRG well enough in the CJK-B encoding process, or we would have objected to this kind of cruft. And describing them is a valid approach. It depends on what's more important to you: the appearance (which IDS's are better at), or the semantic (which is explicit with the TVS). Adding a taboo variant selector does make a difference, because then there'll be more than one way to reference the same character. Well, yes and no. Even though we've already got taboo variants encoded, we have no way to flag in a text that the purpose they're serving is taboo variants. The interesting thing about the taboo variants is precisely that meaning: This is character X written in a deliberately distorted way. You identified the taboo variants you found in Ext B not based on anything in the standard, but because of your outside knowledge. A student encountering them in a text may well be stymied until she goes to her professor. Meanwhile, multiple encodings of the same Han character are *already* a major problem. This is one reason why the UTC is determined to be stricter in the future to keep it from continuing to happen. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Keys. (derives from Re: Sequences of combining characters.)
On Friday, September 27, 2002, at 09:52 AM, [EMAIL PROTECTED] wrote: I doubt there's anyone on this list that always agrees with me I think you're wrong, there, Peter. I *never* disagree with you. :-) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: script or block detection needed for Unicode fonts
On Saturday, September 28, 2002, at 03:19 PM, David Starner wrote: On Sat, Sep 28, 2002 at 01:19:58PM -0700, Murray Sargent wrote: Michael Everson said: I don't understand why a particular bit has to be set in some table. Why can't the OS just accept what's in the font? The main reason is performance. If an application has to check the font cmap for every character in a file, it slows down reading the file. Try, for example, opening a file for which you have no font coverage in Mozilla on Linux. It will open every font on the system looking for the missing characters, and it will take quite a while, accompanied by much disk thrashing to find they aren't there. This just seems wildly inefficient to me, but then I'm coming from an OS where this isn't done. The app doesn't keep track of whether or not a particular font can draw a particular character; that's handled at display time. If a particular font doesn't handle a particular character, then a fallback mechanism is invoked by the system, which caches the necessary data. I really don't see why an application needs to check every character as it reads in a file to make sure it can be drawn with the set font. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
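The display-time fallback Jenkins describes can be caricatured in a few lines of Python. The "fonts" here are just coverage sets and the module-level cache stands in for the system's cached fallback data; all names are invented for the sketch:

```python
# Toy model of display-time font fallback.  Each missing character is
# resolved at most once and then cached, instead of the application
# rescanning every installed font per character at file-load time.
FONTS = {
    "Times": set("abcdefghijklmnopqrstuvwxyz"),
    "Symbol": set("αβγ"),
    "LastResort": None,   # None = claims coverage of everything
}

_fallback_cache = {}

def font_for(char, preferred="Times"):
    coverage = FONTS[preferred]
    if coverage is not None and char in coverage:
        return preferred                     # common case: no scan at all
    if char not in _fallback_cache:          # scan once, cache the answer
        for name, cov in FONTS.items():
            if cov is None or char in cov:
                _fallback_cache[char] = name
                break
    return _fallback_cache[char]

assert font_for("a") == "Times"
assert font_for("α") == "Symbol"
assert font_for("已") == "LastResort"
```

Nothing here requires the application to touch a cmap while reading the file; coverage only matters when a character is actually drawn.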
Re: Mac Unicode question
On Tuesday, October 1, 2002, at 08:42 AM, Alan Wood wrote: I don't think anyone replied to this. As far as I know, these are the only applications for Mac OS 9 that can use Windows TrueType fonts: On X, any (non-Classic) application can use Windows TrueType fonts. Carbon applications which do not explicitly use ATSUI or MLTE are limited in how much of the font they can use. Cocoa apps are pretty much able to do anything. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: is this a symbol of anything? CJK?
On Thursday, October 10, 2002, at 02:29 PM, Tex Texin wrote: It looks close to several cjk characters, so I wasn't sure. I think it's a variant turtle ideograph. :-) (Nothing bad, so far as I know.) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Manchu/Mongolian in Unicode
On Sunday, October 13, 2002, at 12:26 PM, Tom Gewecke wrote: The latest Mac OS X upgrade has fonts that include the classic Mongolian/Manchu range, 1800-18AF. Well, yes, but they're not ready for prime time. They're included because of PRC requirements which expect the glyphs but don't really insist that they do the right thing. The same is true of Tibetan. Even the PRC's own fonts have this problem. This is an unfortunate bind we were put in and I hope we can correct it in a not-too-distant release. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: the carnival of lost souls
It's Carnival of Souls, actually. http://us.imdb.com/Title?0055830 is the original version, made by a fellow whose stock-in-trade was those old movies they used to show in high school to teach hygiene and the like. He shot it in something like a week while he was supposed to be on vacation, mostly in Lawrence, Kansas, and Salt Lake City, using the abandoned spa on the Great Salt Lake, Saltair, as a major set. Now, do you think I could have gotten any *more* off-topic than that? On Tuesday, October 15, 2002, at 06:43 AM, John Cowan wrote: Pavla OR Francis Frazier scripsit: the carnival of lost souls What an expression! Almost makes me want to view the poster to see what inspired it... Googling suggests that this is the title of a film, but the Internet Movie Database (imdb.com) knows it not. -- My corporate data's a mess! John Cowan It's all semi-structured, no less.http://www.ccil.org/~cowan But I'll be carefree [EMAIL PROTECTED] Using XSLThttp://www.reutershealth.com In an XML DBMS. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Sorting on number of strokes for Traditional Chinese
The Unihan database has total stroke count for many (but not all) characters. It may provide an adequate first-order set of data for a pure stroke-based ordering in TC. On Tuesday, October 15, 2002, at 12:02 PM, Magda Danish (Unicode) wrote: -Original Message- Date/Time: Tue Oct 15 05:13:41 EDT 2002 Contact: [EMAIL PROTECTED] Report Type: Other Question, Problem, or Feedback To whom it concerns, I wonder whether Unicode provides us a way to do sorting on number of strokes for Traditional Chinese characters. This is urgent, please advise. regards Tony -- (End of Report) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
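For readers who want to try the first-order ordering described above, here is a minimal Python sketch keyed on kTotalStrokes-style values. The stroke counts are a small hand-copied sample rather than data read from Unihan.txt, and the code-point fallback for unknown or tied characters is an assumption of this sketch, not part of any standard collation.

```python
# Sketch: sort Traditional Chinese characters by total stroke count.
# The counts below are a tiny hand-copied sample for illustration only.

stroke_counts = {
    "一": 1,   # U+4E00
    "人": 2,   # U+4EBA
    "中": 4,   # U+4E2D
    "漢": 14,  # U+6F22
    "龍": 16,  # U+9F8D
}

def by_strokes(text):
    """Order characters by stroke count, falling back to code point
    for characters missing from the data or sharing a count."""
    return sorted(text, key=lambda c: (stroke_counts.get(c, 99), ord(c)))

print("".join(by_strokes("龍中漢人一")))  # → 一人中漢龍
```

As the follow-up message notes, a real implementation would also need some subsort for characters with equal stroke counts, and different dictionaries disagree on how to do that.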
Re: Sorting on number of strokes for Traditional Chinese
On Wednesday, October 16, 2002, at 04:14 AM, Marco Cimarosti wrote: The next step is knowing *which* strokes make up each character, in order to properly sort characters having the same stroke number. There's no consistency there. Different dictionaries use different subsorts once you get beyond the stroke-count level. The five-stroke-type classification used by the PRC is a fairly recent innovation and not universally used. Is there any online source for such data? Even for smaller sets than Unicode CJK. Not that I'm aware. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: ct, fj and blackletter ligatures
On Saturday, November 2, 2002, at 02:59 PM, Doug Ewell wrote: Using ZWJ to control ligation is admittedly a new concept, and it may not have been taken up yet by many vendors, but that seems like a really poor reason to discourage the Unicode approach. Proprietary layout features in OT-savvy apps like InDesign might get the job done, but wouldn't it be better if app vendors and font vendors would follow the Unicode Standard recommendation? You never know, it might even reduce the number of requests to encode ligatures. Remember, though, that ZWJ is *not* the preferred Unicode way to support things like a discretionary ct ligature in Latin text. The standard says that the preferred way to handle this is through higher-level protocols. I know that you and I disagree about the extent to which ligation control belongs in plain text, but the standard clearly allows both approaches. The ZWJ mechanism is not *the* Unicode approach. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
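For concreteness, this is what the ZWJ mechanism being debated looks like at the character level: a sketch (not the higher-level-protocol route the standard prefers) that inserts U+200D ZERO WIDTH JOINER between two letters as a ligation request that a renderer is free to ignore.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

def request_ligature(a, b):
    """Ask, but not demand, that a renderer ligate two letters.
    A font with no ct ligature simply ignores the ZWJ."""
    return a + ZWJ + b

s = request_ligature("c", "t")
print([hex(ord(ch)) for ch in s])  # → ['0x63', '0x200d', '0x74']
```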
Re: ct, fj and blackletter ligatures
On Tuesday, November 5, 2002, at 02:18 AM, William Overington wrote: Well, I suppose it depends upon what one means by a file format that supports Unicode. The TrueType format does not support the ZWJ method and thus does not provide means to access unencoded glyphs by transforming certain strings of Unicode characters into them. TrueType fonts are perfectly capable of supporting ligatures. OpenType, AAT, and Graphite all use TrueType fonts, and all support ligatures. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: ct, fj and blackletter ligatures
On Thursday, November 7, 2002, at 09:40 AM, [EMAIL PROTECTED] wrote: As for providing a notification dialog to say that the text contains c, ZWJ, t but that the font doesn't support it, there are no existing mechanisms to support that at present, but it hasn't been demonstrated that there really is any need, and I really don't expect vendors will be hearing too many complaints from users. Actually, you *could* do it on a Mac if you really wanted to. I'm not sure why you would, however. One of the advantages of the ZWJ mechanism for requesting ligatures is that if the request is impossible to fulfill, it can be ignored. For discretionary ligatures like ct, this is the appropriate response. (Matters are a bit more complicated for required ligatures, of course.) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Info: Apple OSX Font Tools Suite 1.0.0 Released
Cupertino 11/8/02: Today the Apple Font Group released its new suite of Unix command line font tools for OSX. These can be downloaded free from http://developer.apple.com/fonts/. The automatically installed 4.8 MB package includes the tools, user documentation, and a 60-page tutorial. To use this package, you need to be running OSX 10.2. Everything is automatically configured by the installer. You just add fonts to taste. Working with text sources for many of the tables in an sfnt font structure is a powerful and efficient way to develop, debug and manage font sources. E.g. use ftxdumperfuser to solve cmap and postname glitches once and for all in .ttf, .otf and CFF format fonts. With this release, Apple has converted its text dump formats to XML and will be continuing to refine the XML formats in future releases. No previous experience of Unix is necessary as the 60-page tutorial takes you step-by-step through useful font editing processes with an accompanying set of ready-worked live demo files. Applications in The Font Tool Suite are: * ftxanalyzer * ftxdiff * ftxdumperfuser * ftxenhancer * ftxinstalledfonts * ftxruler * ftxvalidator Documents included: * The Apple Font Tool Suite Manual (51 pages) * Tool Quick Reference (8 pages) * Tutorial (62 pages) * Tutorial Command Summary (8 pages) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Info: Apple OSX Font Tools Suite 1.0.0 Released
Try control-clicking on the link and then selecting Save link to disk from the popup menu. On Tuesday, November 12, 2002, at 09:55 AM, Dean Snyder wrote: At 4:49 PM John H. Jenkins wrote: Cupertino 11/8/02: Today the Apple Font Group released its new suite of Unix command line font tools for OSX. These can be downloaded free from http://developer.apple.com/fonts/. The actual download URL is: http://developer.apple.com/fonts/FontToolsv1.0.dmg But I can't get it to download with any browser I've tried (IE, Opera, Mozilla) - they all display the binary disk image as garbled text instead of downloading it to disk. (I've fiddled with download helper preferences for .dmg files but that hasn't helped. Is the .0.dmg file name termination confusing the browsers?) Respectfully, Dean A. Snyder Scholarly Technology Specialist Center For Scholarly Resources, Sheridan Libraries Garrett Room, MSE Library, 3400 N. Charles St. The Johns Hopkins University Baltimore, Maryland, USA 21218 office: 410 516-6850 mobile: 410 245-7168 fax: 410-516-6229 Digital Hammurabi: www.jhu.edu/digitalhammurabi Initiative for Cuneiform Encoding: www.jhu.edu/ice == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: N2515: Request for Roadmap - plane 3
On Tuesday, November 12, 2002, at 09:03 AM, Andrew C. West wrote: BTW, what is CJK Unified Ideographs Extension C intended to include ? Surely not any more ordinary Han ideographs - with over 70,000 ideographs already encoded, there can't be so many genuine ideographs that still need encoding as to warrant a whole new plane. However there is a real need to encode oracle bone characters and other ancient epigraphic forms of Han ideographs. Is this (hopefully) what Extension C is intended for ? Nope. We're still doing modern stuff. It is unlikely in the extreme that we'll actually *need* a whole plane for new ideographs. Extension C is currently big enough, however, that if we were to accommodate it via separate encoding of everything we'd use up the rest of Plane 2. And there's still no end in sight. To some extent, we're having to deal with massive turtle--er, fecal matter being dumped uncritically into the bin consisting largely of things which are obviously variants of existing characters. This we will deal with to an extent by using variation selectors. (Many of Unicode's proposed additions are unofficial simplifications which will also be handled via variation selectors.) Beyond that, it is incredible just how many obscure characters there are once you start looking for them. The PRC's submission includes large numbers of place names, for example, and I dread to think how many more of *those* there may be. HKSAR has come up with more Cantonese- or Hong Kong-specific characters. The only non-Mandarin dialect to receive *any* attention at all is Cantonese, and despite the efforts of the HKSAR that's been rather unsystematic. Unicode's proposed characters include a few Cantonese-specific ones that we were able to dig up without much effort. And all this leaves out stuff like cute names for Hong Kong race horses, frogs-in-wells, and things like that.
All in all, I wouldn't be surprised if there were as many as ten thousand or so genuinely distinct characters in modern use which have yet to be encoded. And there are a number of borderline cases from pre-modern texts where it looks like it's probably a variant but it may not be. (Of course, I also estimated the total number of genuine Han ideographs to be under eighty thousand, which just goes to show how much *I* know.) Oracle bone forms and other older versions of the Han ideographs are something we haven't even got a good model for how to handle yet. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: N2515: Request for Roadmap - plane 3
On Wednesday, November 13, 2002, at 03:22 AM, Andrew C. West wrote: On Wed, 13 Nov 2002 02:03:27 -0800 (PST), John H. Jenkins wrote: Nope. We're still doing modern stuff. Well, there's no rush, just as long as you get round to it sometime ... how about reserving a plane now anyway ? Because there's no indication that we'll need a full plane, basically. All in all, I wouldn't be surprised if there were as many as ten thousand or so genuinely distinct characters in modern use which have yet to be encoded. I'm really sceptical about this. Is there anywhere where I can see the proposals for CJK-C additions ? http://www.cse.cuhk.edu.hk/~irg/irg/extc/CJK_Ext_C.htm == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: The result of the plane 14 tag characters review.
On Wednesday, November 13, 2002, at 12:07 AM, George W Gerrity wrote: In an effort to unify all characters and pictographs, the decision was made to unify CJK characters by suppressing most variant forms. That turns out to be the single greatest objection from users -- especially Japanese -- and somehow we need a low-level way of indicating the target language in the context of multilingual text. The plane 14 tags seem to be appropriate to do this, giving a hint to the font engine as to a good choice of alternate glyphs, where available. A couple of points. 1) There are two kinds of variant problems coming out of Unihan. The objections based on these two problems run, respectively: Japanese readers will be forced to read Japanese text with Chinese glyphs! and Mr. Watanabe won't be able to insert the variant glyph for his name that he prefers into a document! The first objection is, and always has been, a non-issue, and is the only aspect of the problem that the Plane 14 tags could hope to deal with. The issue is not a language one, but a locale one, to begin with. Moreover, the typical practice in Japanese typography (at least) is to use Japanese-preferred glyphs even when displaying Chinese text. Japanese users do *not* expect the text to switch back-and-forth between Chinese and Japanese glyphs as the language varies. Given this, the best solution to the problem is to use fonts aimed at the specific locale. This means that a Japanese user who goes to read her email at an Internet café in Hong Kong may see things she doesn't expect, true, but it really handles 99.99+% of the problem. I should note that as Unicode-based systems are becoming more common in Japan, such as Windows XP and Mac OS X, there is less concern being expressed on this point. The second objection could not be solved by the Plane 14 tags.
The two solutions that are possible are to separately encode every glyphic variant which someone, somewhere, sometime may find necessary to distinguish in plain text, or to use variant markers. It is the latter solution which the UTC has adopted. 2) From a technical standpoint, the Plane 14 tags do not really lend themselves to use with the main complex script font engines available. I don't know enough about Graphite to really speak to it, but in the case of OpenType and AAT it is true that protocols are already available to use Japanese/SC/TC/Korean/Vietnamese glyphs for a run of text. These existing protocols, however, depend on information external to the text itself. To keep the information internal to the text, or, more accurately, internal to the glyph stream, one would have to have the ability to enter a state once a certain character (or glyph) is encountered and remain in that state indefinitely. Neither OpenType nor AAT allows this. OpenType does not use a state engine internal to the glyph stream for processing, and AAT resets the state at the beginning of each line. What would have to happen is that the rendering engine would have to find these characters within the text stream, massage the text data so as to remove them and mark the text with the equivalent higher-level information, and then render the result. The problem here is that the libraries such as Uniscribe and ATSUI which provide Unicode rendering do not deal with the text as a whole (at least, this is definitely true with ATSUI and is probably true with Uniscribe, although I don't know for sure). That is, the Plane 14 tag may be found in the first paragraph of the text, but when the client hands the text off to the library, they may hand off only a later portion because that's all that needs to be drawn. The library then does not have access to this information and will not render the text correctly.
This basically means that the onus is on the client to detect the presence of these tags in the text and make appropriate adjustments when it hands off the text to Uniscribe or ATSUI for rendering. As such, there is no real advantage gained by having these tags embedded directly in the text over having them in the same layer as font, point size, and other typographic preferences. Indeed, it becomes inconvenient to have them in a different layer as it means that the client has to do *two* levels of processing to derive this information, rather than just one. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
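For readers who have not seen the mechanism under discussion, here is a sketch of how a Plane 14 language tag is constructed: U+E0001 LANGUAGE TAG followed by tag characters cloned from ASCII at U+E0000 plus the ASCII code point. This illustrates the encoding only; as the message above argues, rendering engines generally do not act on these tags.

```python
# Sketch: build a Plane 14 language tag for "ja" (Japanese).
# U+E0001 is LANGUAGE TAG; each ASCII letter of the tag value is
# shifted into the tag-character range at U+E0000 + its code point.

def language_tag(code):
    return "\U000E0001" + "".join(chr(0xE0000 + ord(c)) for c in code)

tag = language_tag("ja")
print([hex(ord(c)) for c in tag])  # → ['0xe0001', '0xe006a', '0xe0061']
```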
Re: ATSUI for MacOS9
On Tuesday, November 19, 2002, at 10:33 PM, Theodore H. Smith wrote: I'd like to know if ATSUI can be used for MacOS9. The ATSUI demo for OSX works perfectly, but the ATSUI demo for OS9, can't do horizontal hit testing. :o( ATSUI should work fine on Mac OS 9. (It was introduced with 8.5, after all.) Why not? Is this a bug in the demo, or a bug in ATSUI for OS9? Does ATSUI for Carbon on OS9 work if ATSUI for Classic OS9 doesn't? I really don't know. ATSUI for Carbon will be a later, better version than ATSUI for classic, however. If anyone knows ATSUI well, could you please contact me so I can ask a few more questions? Thanks a lot. You could send my questions to me and I can have them circulated to the proper people. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Why isn't my character displaying
On Friday, November 29, 2002, at 05:23 AM, Theodore H. Smith wrote: What is wrong? Is it something to do with font fallbacks? I am not touching font fallbacks at all. All I did was set the FontID for my ATSUStyle object, to that for Monaco plain. I'm a bit stuck here, can someone help? I thought ATSUI is meant to fill in the missing fonts, automatically??? So why isn't it? ATSUI *can* fill in the missing fonts automatically, but you have to tell it to. You call ATSUSetTransientFontMatching() for your layout object.
Re: Unihan Mandarin Readings
Is it possible to regenerate the Unihan database with the correct secondary Mandarin readings ? Certainly in the Unicode 4.0 time-frame we can improve things. I can't make any guarantees, however. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Unihan Mandarin Readings
On Tuesday, December 3, 2002, at 03:17 AM, Andrew C. West wrote: BTW, is it possible for Unicode to provide a Unihan.xml version of the Unihan database ? The first thing I do is convert the Unihan.txt file into XML format for ease of processing. As a rule, we tend to stick to older formats so that people don't have to rewrite their perl scripts and other parsers. I know you're asking if we could add an XML format *in addition* to the non-XML one, but given the size of Unihan.txt, that isn't likely. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
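As an illustration of the sort of conversion Andrew describes doing for himself, here is a minimal sketch that turns tab-delimited Unihan.txt lines (U+XXXX, field name, value) into XML. The element and attribute names are invented for this example; they are not any official Unicode format.

```python
# Sketch: convert tab-delimited Unihan.txt-style lines into simple XML.
# Element/attribute names here are illustrative, not an official schema.
import xml.sax.saxutils as su

def unihan_to_xml(lines):
    out = ["<unihan>"]
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cp, field, value = line.rstrip("\n").split("\t", 2)
        out.append('  <entry cp="%s" field="%s">%s</entry>'
                   % (cp, field, su.escape(value)))
    out.append("</unihan>")
    return "\n".join(out)

sample = ["# a comment line", "U+6F22\tkMandarin\tHAN4"]
print(unihan_to_xml(sample))
```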
Re: CJK fonts
On Wednesday, December 11, 2002, at 08:27 AM, Raymond Mercier wrote: For example, the simplified form of the character Han itself (U+6C49) is given the Pinyin reading Yi, the traditional form U+6F22 is the correct reading Han. Have you reported this? BTW, there's the official Unihan lookup Web page at http://www.unicode.org/charts/unihan.html. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Small Latin Letter m with Macron
On Wednesday, January 15, 2003, at 01:35 PM, Kenneth Whistler wrote: Handwritten forms and arbitrary manuscript abbreviations should not be encoded as characters. The text should just be represented as m + m. Then, if you wish to *render* such text in a font which mimics this style of handwriting and uses such abbreviations, then you would need the font to ligate mm sequences into a *glyph* showing an m with an overbar. Remembering, of course, to use ZWNJ to mark places where this ligature may not be used. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
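The ZWNJ marking mentioned above can be sketched directly: inserting U+200C ZERO WIDTH NON-JOINER between two letters tells a ligating font to keep them separate. The helper below is illustrative only.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

def block_ligature(text, pair="mm"):
    """Insert ZWNJ inside each occurrence of `pair` so that a font
    which ligates those letters leaves them unligated here."""
    return text.replace(pair, pair[0] + ZWNJ + pair[1])

print(ascii(block_ligature("summa")))  # → 'sum\u200cma'
```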
Re: newbie 18030 font question
On Thursday, January 16, 2003, at 12:25 PM, Stefan Persson wrote: I assume that you mean GB18030, right? Due to a change in Chinese laws, Apple and Microsoft had to make fonts supporting all those characters available. You may download those fonts from the companies' respective home pages. Well, not from Apple's, anyway. Several GB18030 fonts come with Mac OS X 10.2, but we don't have a license to make them freely downloadable. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Small Latin Letter m with Macron
On Thursday, January 16, 2003, at 01:29 PM, Timothy Partridge wrote: Yes, especially early printing of Latin documents. See for example Gutenberg's bibles. Well, for that matter, even current editions of Spenser's _Faerie Queene_ will use the occasional õ for on, and so on. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: unicode in Mac
On Sunday, January 26, 2003, at 10:13 AM, Raymond Mercier wrote: Given a plain text unicode file, with the opening byte FEFF, and which displays correctly in Notepad on a PC. What facility is available on a Mac to make this file display correctly ? I am trying to help a colleague, who has MAC OS IX, and I need to tell him what font will cover Greek and Extended Greek. Do you mean Mac OS X, or Mac OS 9? For the former, TextEdit would work fine. If your friend is on Mac OS X 10.2 or later, the system font, Lucida Grande, has a full set of glyphs for Greek and Extended Greek. Otherwise any of the free Greek fonts on the Internet would work. On Mac OS 9, the situation is a bit grimmer, as there aren't many Unicode-savvy applications. SUE would be one option. You should be able to find it using Google. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: 4701
On Saturday, February 1, 2003, at 01:39 PM, Thomas Chan wrote: And the website of the Pearl River (www.pearlriver.com) department store in New York City says "lamb"! unihan.txt says that U+7F8A is "sheep, goat; KangXi radical 123". Stolen from Mathews, as it happens. On Google, "year of the goat" has the lead. Systran has sheep. KangXi says (if I'm understanding it correctly) something like "animal with curved horns." (It's more complex than that, but I think I caught the essence.) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: VS vs. P14 (was Re: Indic Devanagari Query)
On Thursday, February 6, 2003, at 08:47 AM, Andrew C. West wrote: There are also a number of other auspicious characters, such as fu2 (U+798F) good fortune that may be found written in a hundred variant forms as a decorative motif. Ah, but decorative motifs are not plain text. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: traditional vs simplified chinese
On Thursday, February 13, 2003, at 07:18 AM, Marco Cimarosti wrote: 3) All other characters listed in Unihan.txt are *both* Traditional and Simplified. Actually, this is not quite true. Even though the current set of traditional/simplified data is much better than it's ever been, we still have cases where new simplified forms have been created and encoded while their traditional counterparts have not, and considerably more cases where traditional forms have theoretical simplifications which have not been encoded. The best you can say is that if a character has a traditional variant (but no simplified variant), it's simplified, and if it has a simplified variant (and no traditional variant), it's traditional, and if it has both, it's both. Anyway, I don't see how this information could be of any use for any purpose... There are some ideographs (e.g., anything with the bone radical) which have a different appearance in simplified and traditional Chinese, even though the two have been unified in Unicode. Identifying a text as simplified vs. traditional could help in automatic font selection. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
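The classification rule stated above can be written out directly. The sketch below uses hand-copied sample data standing in for the Unihan kTraditionalVariant/kSimplifiedVariant fields; treating unmarked characters as usable in both contexts follows Marco's point 3, as qualified in the reply.

```python
# Sketch: classify a character as simplified/traditional/both from
# whether Unihan lists a traditional and/or simplified variant for it.
# Sample data is hand-copied for illustration.

variants = {
    # char: (has_traditional_variant, has_simplified_variant)
    "汉": (True,  False),  # simplified; traditional form is 漢
    "漢": (False, True),   # traditional; simplified form is 汉
    "中": (False, False),  # identical in both
}

def classify(ch):
    has_trad, has_simp = variants.get(ch, (False, False))
    if has_trad and has_simp:
        return "both"        # maps to something else in each direction
    if has_trad:
        return "simplified"  # it has a traditional counterpart
    if has_simp:
        return "traditional" # it has a simplified counterpart
    return "both"            # unmarked: usable in either context

print(classify("汉"), classify("漢"), classify("中"))  # → simplified traditional both
```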
Re: Converting old TrueType fonts to Unicode
On Friday, February 14, 2003, at 01:12 PM, John Hudson wrote: Another option for re-encoding fonts is to hack the font cmap table itself. The easiest way to do this is probably with Just van Rossum's TTX tool. See http://sourceforge.net/projects/fonttools/. This is a Python-based open source tool that decompiles TTF and OTF fonts to a human-readable XML file, which can then be edited and recompiled to a font. I have used this tool for a variety of purposes, but do not have any experience working on fonts with supplementary plane codepoints, so cannot verify its usefulness for this purpose. For people on Mac OS X, there is a set of tools available for download from http://developer.apple.com/fonts/ which, like TTX, can decompile table from TrueType and OpenType fonts and let the user edit the results. These *do* support astral characters. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Everson Mono
On Saturday, February 15, 2003, at 07:22 PM, [EMAIL PROTECTED] wrote: You could pick up the old TTFDUMP.EXE program from Microsoft Typography developer's web pages at http://www.microsoft.com/typography/creators.htm This utility can dump any or all of the tables in a TTF/OTF into a plain text file which is human-readable. Once the cmap table information has been dumped, you can import the text into your process and process it. (It only works on Plane Zero fonts.) And you can get ftxdumperfuser at Apple's site http://developer.apple.com/fonts, which works on Mac OS X and can handle the astral planes. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Finding a font that contains a particular character
On Monday, February 17, 2003, at 09:36 AM, Alan Wood wrote: Someone recently asked how to find a font that contains a particular Unicode character. I don't have an easy answer, but TrueType Explorer (for Windows) may help: On the Mac, BTW (Mac OS X 10.2 or later), you can either use the character palette (in the keyboard menu) or install Apple's font tools http://developer.apple.com/fonts and use ftxinstalledfonts with the -U option. Both of these work with astral characters. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: [OpenType] PS glyph `phi' vs `phi1'
On Wednesday, February 19, 2003, at 04:13 PM, Werner LEMBERG wrote: I have to correct myself, fortunately. After looking into the printed version of Unicode 2.0 I see that the glyphs of 03D5 and 03C6 in the file U0370.pdf are exchanged. Your assumption is correct that the annotation in Unicode 3.2 is wrong. I'm sorry, but you've lost me here. The Unicode 3.2 text states: quote With Unicode 3.0 and the concurrent second edition of ISO/IEC 10646-1, the representative glyphs for U+03C6 GREEK SMALL LETTER PHI and U+03D5 GREEK PHI SYMBOL were swapped. In ordinary Greek text, the character U+03C6 is used exclusively, although this character has considerable glyphic variation, sometimes represented with a glyph more like the representative glyph shown for U+03C6 (the loopy form) and less often with a glyph more like the representative glyph shown for U+03D5 (the straight form). For mathematical and technical use, the straight form of the small phi is an important symbol and needs to be consistently distinguishable from the loopy form. The straight form phi glyph is used as the representative glyph for the symbol phi at U+03D5 to satisfy this distinction. The reversed assignment of representative glyphs in versions of the Unicode Standard prior to Unicode 3.0 had the problem that the character explicitly identified as the mathematical symbol did not have the straight form of the character that is the preferred glyph for that use. Furthermore, it made it unnecessarily difficult for general purpose fonts supporting ordinary Greek text to also add support for Greek letters used as mathematical symbols. This resulted from the fact that many of those fonts already used the loopy form glyph for U+03C6, as preferred for Greek body text; to support the phi symbol as well, they would have had to disrupt glyph choices already optimized for Greek text.
When mapping symbol sets or SGML entities to the Unicode Standard, it is important to make sure that codes or entities that require the straight form of the phi symbol be mapped to U+03D5 and not to U+03C6. Mapping to the latter should be reserved for codes or entities that represent the small phi as used in ordinary Greek text. Fonts used primarily for Greek text may use either glyph form for U+03C6, but fonts that also intend to support technical use of the Greek letters should use the loopy form to ensure appropriate contrast with the straight form used for U+03D5. /quote What annotation in 3.2 do you feel is incorrect? == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
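The mapping advice in the quoted text can be shown as a small table. The entity names below are purely illustrative placeholders, not taken from any actual SGML entity set; the point is only which code point each form of phi should map to.

```python
# Sketch: map illustrative entity names for the two forms of phi to the
# code points the quoted Unicode 3.2 text prescribes. Entity names are
# hypothetical, invented for this example.
entity_map = {
    "greek_text_phi": "\u03C6",  # GREEK SMALL LETTER PHI (Greek body text)
    "straight_phi":   "\u03D5",  # GREEK PHI SYMBOL (math/technical use)
}

print(hex(ord(entity_map["straight_phi"])))  # → 0x3d5
```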
Re: The display of *kholam* on PCs
to shudder at the thought of designing a state table. But not everything in a 'morx' requires a state table. Ligature support is *really* easy to do, and has been for years and years. The fact of the matter is that the bulk of the font designers out there don't even *know* that there's a way to add ligature support to fonts on the Mac. We've tried to get the word out, but obviously we haven't succeeded. Still, when and where people have come to us to ask for help, we've done what we could to provide it. Frankly, few people have come. The best long-term solution is for Apple to follow through on their promise to support OpenType Layout features, so that we have a genuinely cross-platform font solution. As I say, we've been careful not to make public promises in any detail on this issue. I'm not aware of any time when we've said more than that we're hoping to make OT to AAT layout table conversion possible using our tools. We really can't commit ourselves on this. Given the fact that many application developers are basically echoing the same sentiment (why waste money developing for the Mac when I can get 90% of the same customer base without spending the money), I'm not sure it's entirely a matter of it being our fault, however. Certainly I'm not sure that the best long-term solution to having competing OSes is for everybody to simply switch over to Windows, either. The best *short-term* solution is for someone to tell them that if they're interested, they can contact us directly and we'll see what we can work out. We could probably work out AAT support for their specific font without too much trouble. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: FAQ entry (was: Looking for information on the UnicodeData file)
On Friday, March 7, 2003, at 04:26 AM, Pim Blokland wrote: Oh, in that case I must say I think the UnicodeData.txt file doesn't do a very good job. For instance, the Danish ae (U+00E6) is not designated a ligature, but the Dutch ij (U+0133) is, even though the a and e are clearly fused together, while the i and j aren't. John's description is a general one of what the character names mean. They are not, however, systematic or entirely consistent, nor are they expected to be, since different people speaking different languages often have different perceptions of what a symbol is. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Encoding: Unicode Quarterly Newsletter
I certainly think it would look good published with a leather cover, onion-skin paper, and gilt edges, yes. First we have to have Ken divide it into verses, though. On Tuesday, March 11, 2003, at 01:19 PM, Yung-Fong Tang wrote: Hope they can reduce the weight next time by changing the type of paper. My Bible is about 500 pages (about 1500+ pages) more than the unicode 3.0 standard but only 50% as thick. Same as my Chinese/English dictionary. Otto Stolz wrote: Kenneth Whistler wrote: we can calculate the weight as being *approximately* 9.05 pounds (avoirdupois) [or 10.99 troy pounds]. Apparently a weighty publication, that forthcoming Unicode standard... Cheers, Otto Stolz == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Unicode not in Quark 6
On Saturday, June 21, 2003, at 10:06 PM, Jungshik Shin wrote: PostgreSQL seems to be available for Mac OS X. See http://www.postgresql.org/ and http://developer.apple.com/internet/macosx/postgres.html MySQL is also available for Mac OS X (http://developer.apple.com/internet/macosx/osdb.html). I'm not sure of the status of Unicode support, but it seems to be fine if you're not worrying about collating or similar services. It's what's used at the moment to host the Unihan database, for example. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: French group separators
On Monday, July 7, 2003, at 4:08 PM, Frank da Cruz wrote: Of course. But without two spaces you have greater ambiguity, at least in English: In Mr. Roberts, what is the function of the period? Don't call me Mr.  Roberts is my name. Don't call me Mr. Roberts is my name. IIRC the English prefer to say Mr Roberts. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: French group separators
On Monday, July 7, 2003, at 4:38 PM, Michael Everson wrote: At 16:22 -0600 2003-07-07, John H. Jenkins wrote: IIRC the English prefer to say Mr Roberts. The, ahem, Irish too. ;-) Well, to be frank, I'm sure that the Welsh, Scots, and Manx probably do, too. (Did I leave anybody out *this* time?) I just don't read many books, alas, printed in Ireland, Wales, Scotland, or on the Isle of Man. :-( == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: missing .GIF's for ideographs on unicode.org?
On Thursday, July 17, 2003, at 12:00 AM, Richard Cook wrote: I'm guessing this just hasn't been implemented yet. You are guessing correctly. Once some of the dust settles from my day job, I expect I can get to this. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Last Resort Glyphs (was: About the European MES-2 subset)
On Saturday, July 19, 2003, at 1:15 PM, Michael Everson wrote: So fonts containing these glyphs could be designed to display these glyphs, in a way similar to the current assignment of control pictures. Um, that's what the Last Resort font does, outside of Unicode encoding space. (I don't think PUA characters are used, actually, but I could be wrong.) No, it uses the actual Unicode characters, and just has a huge cmap that maps everything in Unicode to the glyph for its block. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
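Conceptually, that cmap is just a fallback lookup from code point to block. A minimal Python sketch of the idea (the block ranges below are a tiny illustrative subset, and the real font maps to glyphs rather than names):

```python
# Sketch of the Last Resort approach: every code point maps to a single
# glyph representing its Unicode block. Only a few blocks shown here.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xE000, 0xF8FF, "Private Use Area"),
]

def block_glyph(cp: int) -> str:
    """Return the block-level fallback glyph name for a code point."""
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Unknown Block"

print(block_glyph(ord("中")))  # CJK Unified Ideographs
```

Because the mapping is per-block rather than per-character, one font with a few hundred glyphs can give visible feedback for every code point in Unicode.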
Re: Last Resort Glyphs (was: About the European MES-2 subset)
On Sunday, July 20, 2003, at 7:37 AM, Philippe Verdy wrote: Mostly for documentation purposes, but also in most systems that want to be more informative to users missing a font for a particular script. Michael also judged it to be useful enough to create such a font for Apple, and Apple thought it would be useful for its Mac users. Er, no. Apple thought it would be useful for its Mac users and commissioned Michael to make the glyphs. (And I personally think he's done an excellent job.) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Karen Language Representation in Unicode
On Sunday, July 20, 2003, at 7:38 AM, [EMAIL PROTECTED] wrote: Heather Batterham wrote on 07/20/2003 06:46:16 AM: The second interest I have is in the development of word processing tools that utilize the contents of Unicode. I use a Macintosh with OS X installed. The basic language packages are very good but they do not have the Burmese script included. The only working font implementation for Burmese script that I know of is one that we have (in beta), implemented using Graphite rendering. It's available at http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=GraphiteFonts. We could probably help you get it to work on Mac OS X. Meanwhile, Xenotype claims to have a Burmese language kit for Mac OS X (http://www.xenotypetech.com/osxBurmese.html), although nobody at Apple has seen it, so we can't confirm that it works as advertised. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: About the European MES-2 subset
On Friday, July 18, 2003, at 4:45 PM, Michael (michka) Kaplan wrote: A question mark is a sign of a bad conversion from Unicode (to a code page that did not contain the character). This would likely happen on the Mac too rather than the Last Resort font, wouldn't it? MS Explorer on the Mac converts Unicode to the old Mac scripts, which it then renders. That's why you see all the question marks when the page is viewed with MS Explorer. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: proposal for a creative commons character
On Jun 15, 2004, at 2:22 PM, [EMAIL PROTECTED] wrote: Michael Tiemann scripsit: Without getting greedy, I'd like to propose the adoption of the (cc) symbol in whatever way would be most expedient (so that creative commons authors can identify their work more appropriately), and leave for later the question of the other symbols. It's a logo. We normally don't do logos. To be a little less terse: in the case of symbols like this, the strong preference is not to encode them as a means of encouraging their use. John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: number of bytes for simplified chinese
On Jun 27, 2004, at 11:37 PM, Duraivel wrote: Hi, I would like to know the number of bytes required for the simplified Chinese language. Can we represent all the characters of simplified Chinese in Unicode using just two bytes? No. It will take up to four bytes per character, whether you're using UTF-8, UTF-16, or UTF-32. John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
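The "up to four bytes" claim is easy to check with Python's codecs; 中 is a common BMP character, while U+20000 is the first character in CJK Extension B, on plane 2:

```python
# Byte lengths of a BMP character and a plane-2 character in the three
# Unicode encoding forms (little-endian variants, no BOM).
for ch in ("中", "\U00020000"):
    print(f"U+{ord(ch):04X}:",
          len(ch.encode("utf-8")), "bytes UTF-8,",
          len(ch.encode("utf-16-le")), "bytes UTF-16,",
          len(ch.encode("utf-32-le")), "bytes UTF-32")
```

BMP ideographs take three bytes in UTF-8 and two in UTF-16, but plane-2 ideographs take four bytes in every encoding form.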
Re: Looking for transcription or transliteration standards latin- arabic
Jul 2, 2004 11:17 AM Chris Harvey Perhaps one could think of Ha Tinh as the English word for the city, like Rome (English) for Roma (Italian), or Tokyo (English) for Tōkyō (English transliteration of Japanese), or Kahnawake (English/French) for Kahnawà:ke (Mohawk). Or Peking for Běijīng. :-) John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Chinese Simplified - How many bytes
Jul 6, 2004 3:10 AM Duraivel Hi, I browsed through the ICU library and it looks similar to the gettext library which GNU provides, with more functionality added. But we are developing our product on Qt, which has its own translations, so I don't want to use another library for translations. Also there is a class QString which says it takes care of byte issues. Basically it is overloaded and acts accordingly for a two-byte Unicode character set. It also states that QString supports Chinese (simplified). I am not getting how two bytes can support simplified Chinese. Is it true that, to represent simplified Chinese programmatically, two bytes will do? Unicode in the UTF-16 encoding will cover almost all the simplified Chinese characters people use today in two bytes. There are occasional exceptions which will require four bytes. John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
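The two-byte/four-byte split is visible if you look at the raw UTF-16 code units. This sketch uses Python rather than Qt, but QString's 16-bit units behave the same way: a BMP character is one unit, and a supplementary character becomes a surrogate pair of two units:

```python
# Decode a string into its 16-bit UTF-16 code units.
import struct

def utf16_units(ch: str) -> list:
    """Return the UTF-16 (big-endian) code units of a string as ints."""
    data = ch.encode("utf-16-be")
    return [u for (u,) in struct.iter_unpack(">H", data)]

print([hex(u) for u in utf16_units("汉")])          # one unit: ['0x6c49']
print([hex(u) for u in utf16_units("\U00020000")])  # surrogate pair: ['0xd840', '0xdc00']
```

Code that assumes one unit per character will miscount string lengths for the rare plane-2 ideographs, even though it works for everyday simplified Chinese text.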
Re: Unicode v. 4 font software for Mac
Jul 15, 2004 12:13 PM David Branner I have tried AsiaFont Studio 4 and FontLab, but they are not compatible with version 4 of the Unicode Standard and hence are not suitable for my purposes. I assume that by saying they're not compatible, you mean that they don't support characters off of the BMP. If this is the problem, you can use Apple's tool ftxdumperfuser to alter the cmap after FontLab has generated it. Apple's font tool suite is available at http://developer.apple.com/fonts. (Alternatively, if you give a glyph a name of the form u followed by its hexadecimal code point, e.g., u20000, I'm told that the latest version of FontLab will generate an appropriate cmap entry for it, but I don't know for sure.) John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Unicode v. 4 font software for Mac
Jul 15, 2004 2:54 PM David Branner I assume that by saying they're not compatible, you mean that they don't support characters off of the BMP. They can neither generate such characters nor (apparently) open fonts that contain such characters. Then move the non-BMP characters to the PUA using ftxdumperfuser (or remove their Unicode mappings altogether), run FontLab, and afterwards re-add (or re-shift) the Unicode mappings with the same tool. John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Problem with accented characters
On Aug 23, 2004, at 3:34 PM, Doug Ewell wrote: Deborah Goldsmith goldsmit at apple dot com wrote: FYI, by far the largest source of text in NFD (decomposed) form in Mac OS X is the file system. File names are stored this way (for historical reasons), so anything copied from a file name is in (a slightly altered form of) NFD. Slightly altered? Yes, the specification for the Mac file system was frozen before NFD had been developed by the UTC, so it isn't exactly the same. But it's close. John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
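For anyone wanting to see the difference, Python's unicodedata module converts between the two forms (this is standard NFD, not Apple's slightly altered file-system variant):

```python
# NFD decomposes precomposed characters; NFC recomposes them. A file
# name copied out of the Mac file system will typically look like the
# decomposed form below.
import unicodedata

nfc = "café"                             # ends in precomposed U+00E9
nfd = unicodedata.normalize("NFD", nfc)  # ends in 'e' + combining U+0301

print([hex(ord(c)) for c in nfc])  # ['0x63', '0x61', '0x66', '0xe9']
print([hex(ord(c)) for c in nfd])  # ['0x63', '0x61', '0x66', '0x65', '0x301']
assert unicodedata.normalize("NFC", nfd) == nfc
```

The two strings render identically but compare unequal byte-for-byte, which is exactly why text copied from file names can trip up naive string comparisons.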
Re: Arial Unicode MS
On Dec 6, 2004, at 10:23 AM, Johannes Bergerhausen wrote: From some discussions here I learned that Arial Unicode MS contains about 50,000 glyphs, which is about the number of characters encoded in Unicode 2.0, and that it was last shipped bundled with Office for Windows 2003. A pan-Unicode font is a beautiful idea. Why did Microsoft/Monotype stop the development of further versions? The TrueType and OpenType font formats do not allow a font to contain more than about 65,000 glyphs. Since there are well over 65,000 characters in Unicode, plus the additional glyphic forms that would be necessary for proper support of various scripts, it is no longer possible to produce a single font like Arial Unicode MS. There are other issues -- making a single typeface which covers all the scripts in Unicode and has a common esthetic design is really not possible; loading a huge font can consume a significant chunk of the resources on a system, most of which is wasted; and so on.
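The 65,000 figure comes from the glyph count being stored as an unsigned 16-bit integer in the font's 'maxp' table. A rough Python check of the arithmetic (the exact count of assigned code points depends on the Unicode data shipped with your Python build):

```python
# A TrueType/OpenType font tops out at 0xFFFF glyphs because numGlyphs
# in the 'maxp' table is a uint16. Count the assigned code points in
# this Python build's Unicode database to see why one font can't cover
# all of Unicode.
import sys
import unicodedata

GLYPH_LIMIT = 0xFFFF  # 65,535

assigned = sum(
    1 for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) != "Cn"  # 'Cn' = unassigned
)
print(assigned, "assigned code points vs. a glyph limit of", GLYPH_LIMIT)
```

And that count doesn't even include the extra contextual and ligature glyphs a shaping engine needs for scripts like Arabic or Devanagari.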
Re: IUC27 Unicode, Cultural Diversity, and Multilingual Computing / Africa is forgotten once again.
On Dec 8, 2004, at 3:57 PM, Patrick Andries wrote: Azzedine Ait Khelifa wrote: Hello All, The subject of this conference is really interesting and very useful. But once again Africa is forgotten. I want to know if we can have the same conference, Africa-oriented, scheduled? If not, what should we do to have this conference scheduled in a city accessible to the African community (like Paris)? If this is possible, I would also add « and with much more content in a language understood in Africa and the host country: French ». Well, as with everything else associated with Unicode, feel free to volunteer.
Re: US-ASCII (was: Re: Invalid UTF-8 sequences)
On Dec 10, 2004, at 1:25 PM, Tim Greenwood wrote: Is that like the 'Please RSVP' that I see all too often? Or should that not be excused? Or -- my own personal favorite -- in the year AD 2004.
Re: Simplified Chinese radical set in Unihan
As you say, the main problem is that there are so many different possible sets. Some will be proprietary, which would limit their usefulness, although there would, I believe, otherwise be no objection to their inclusion. If you can come up with a reasonably standard set and reasonably consistent data across several dictionaries referencing it, I'm sure there'd be no objection to including it. On Dec 16, 2004, at 2:19 PM, Erik Peterson wrote: Hello, I've found many uses for the Unihan data file over the past few years. It's a great source of information. One potential addition that I've wanted is a field listing the simplified Chinese radical for at least the simplified Chinese characters, like what exists in the Xinhua Zidian (Xinhua Dictionary) and other mainland Chinese dictionaries. I was wondering if this has been discussed before? Some potential difficulties I could see include the fact that mainland dictionaries use a variety of different radical schemes. The most standard one that I can find is the Chinese Academy of Social Sciences (CASS) set with 189 different radicals. Even for dictionaries that use this set, the ordering is often different. Could the radical set also be proprietary in some way? Anyway, I was curious. I've been working on something like this myself that I could also contribute when it's farther along. Regards, Erik Peterson
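For anyone experimenting with such a field, the Unihan.txt format is simple tab-separated text: code point, field name, value. A simplified-radical field could plausibly reuse the "radical.extra-strokes" shape of the existing kRSUnicode field. The sample lines and values below are illustrative, not copied from Unihan.txt:

```python
# Parse Unihan-style tab-separated lines into (code point, field, value).
SAMPLE = """\
U+4E2D\tkRSUnicode\t2.3
U+6C49\tkRSUnicode\t85.2
"""

def parse_unihan(text):
    """Yield (code point, field, value) triples, skipping comments."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cp, field, value = line.split("\t", 2)
        yield int(cp[2:], 16), field, value  # strip the "U+" prefix

for cp, field, value in parse_unihan(SAMPLE):
    radical, _, strokes = value.partition(".")
    print(f"{chr(cp)}: radical {radical}, +{strokes} strokes")
```

A hypothetical simplified-radical field (say, one keyed to the 189-radical CASS scheme mentioned above) would drop into the same parser with only a new field name.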