Re: ZWJ and Latin Ligatures

2002-07-18 Thread Eric Muller

[EMAIL PROTECTED] wrote:

>This was one of the basic
>design criteria in order to ensure that support for a script could be added
>by building a font using tools accessible to people with less than
>C-programming skills and without requiring any re-write of software.
>
Actually, the goal of "easily add shaping for a new orthography" and 
the goal of "do not duplicate in all fonts what really belongs to an 
orthography" are not as incompatible as we paint them. For example, 
there could be a plug-in mechanism (and those plug-ins could be written 
in a special-purpose language) for Uniscribe.

Eric.






Re: ZWJ and Latin Ligatures

2002-07-18 Thread Peter_Constable


On 07/17/2002 11:48:28 AM John Hudson wrote:

>>In Graphite, character sequences get mapped into glyph sequences one-to-one
>>via the cmap...

>Presumably, though, making ligature lookups dependent on ZWJ in Graphite --
>as in OpenType -- relies on the ZWJ character actually being painted.

Yes, if I understand you: the font developer would create rules in the
Graphite Description Language (which get compiled into state tables) that
look for glyph sequences that would include a virtual glyph corresponding
to ZWJ. So, a sequence of glyphs gLtnSmS gZWJ gLtnSmT would be transformed
into a different sequence gLtnLigSmSSmT. The virtual glyph that was in the
initial glyph string doesn't survive and so doesn't get rasterised, but it
does have to be part of the original glyph sequence.
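
To make the substitution concrete, here is a rough Python model of such a rule. It is only a sketch: the rule table and the function name apply_ligature_rules are made up for illustration, and this is neither GDL syntax nor how the Graphite engine is implemented. The glyph names are the ones used above.

```python
# Hypothetical rule table: glyph-name pattern -> replacement ligature glyph.
LIGATURE_RULES = {
    ("gLtnSmS", "gZWJ", "gLtnSmT"): "gLtnLigSmSSmT",
}


def apply_ligature_rules(glyphs, rules=LIGATURE_RULES):
    """Replace each matching run of glyphs with its ligature, left to right."""
    out = []
    i = 0
    while i < len(glyphs):
        for pattern, ligature in rules.items():
            if tuple(glyphs[i:i + len(pattern)]) == pattern:
                out.append(ligature)      # gZWJ is consumed by the rule
                i += len(pattern)
                break
        else:
            out.append(glyphs[i])         # no rule matched at this position
            i += 1
    return out
```

Without gZWJ in the input the rule never matches, which is why the virtual glyph has to be in the initial glyph string even though it is never drawn.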

>One of the issues I am looking at with ZWJ is treatment of that character
>as an unpainted control character in some systems and applications. If a
>glyph is never painted, it doesn't matter how many font lookups include the
>glyph: nothing is going to happen.

In principle, it would be possible with Graphite to have a layer (in the
app, say) that converted a character sequence like < ... s, ZWJ, t ... >
into a sequence without ZWJ -- < ... s, t ... > -- and also set a feature,
and then have a font that has a glyph mapping gLtnSmS + gLtnSmT ->
gLtnLigSmSSmT that operates only if that feature is set. That would go
counter to the general philosophy of Graphite (which is like AAT and unlike
OT in this regard) which fundamentally assumes that the shaping behaviour
is entirely encapsulated in the font and not in any way dependent upon
behaviours being hard-wired into the software. This was one of the basic
design criteria in order to ensure that support for a script could be added
by building a font using tools accessible to people with less than
C-programming skills and without requiring any re-write of software.
Something like this could perhaps be considered as a special case, though,
if it is to become conventional programming practice for applications to
suppress ZWJ from strings that it asks the system to draw.
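
A minimal sketch of that hypothetical layering, in Python. The names strip_zwj and shape, and the idea of recording the "feature" as a set of string offsets, are assumptions made for illustration; Graphite defines no such interface.

```python
ZWJ = "\u200D"


def strip_zwj(text):
    """Remove ZWJ and remember where it stood.

    Returns the stripped text and the set of indices i (into the stripped
    text) such that a ZWJ originally sat between characters i-1 and i.
    """
    stripped = []
    joins = set()
    for ch in text:
        if ch == ZWJ:
            if stripped:
                joins.add(len(stripped))
        else:
            stripped.append(ch)
    return "".join(stripped), joins


def shape(text, joins, ligatures):
    """Form a two-character ligature only where the join flag is set."""
    glyphs = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and (i + 1) in joins and pair in ligatures:
            glyphs.append(ligatures[pair])
            i += 2
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs
```

The font-side mapping (s + t -> ligature) stays in the ligatures table, but whether it fires now depends on information the application has to pass alongside the string, which is exactly the dependency Graphite normally tries to avoid.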

Perhaps this represents an area that would benefit from a TR or something
from the Consortium spelling out what is considered best practice. It's the
kind of thing that really should be standardised across implementations.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>







Re: ZWJ and Latin Ligatures

2002-07-17 Thread John Hudson

At 08:43 AM 17-07-02, [EMAIL PROTECTED] wrote:

>In Graphite, character sequences get mapped into glyph sequences one-to-one
>via the cmap, just as in OT and AAT. From that point, what happens and what
>can happen is completely at the discretion of the font developer. They can
>decide to always form certain ligatures, always form some ligatures but
>make others dependent upon ZWJ, or they may make some/all dependent upon a
>user-selected font feature (assumes the software exposes a UI for
>selecting the features that the font supports). Of course, nothing happens
>unless the font developer included rules that make it happen.

Presumably, though, making ligature lookups dependent on ZWJ in Graphite -- 
as in OpenType -- relies on the ZWJ character actually being painted. One 
of the issues I am looking at with ZWJ is treatment of that character as an 
unpainted control character in some systems and applications. If a glyph is 
never painted, it doesn't matter how many font lookups include the glyph: 
nothing is going to happen.

MS Office apps treat ZWJ as a control character by default and will not 
paint the glyph. However, users have the option to turn on visual display 
of control characters -- ZWJ, ZWNJ, LTR, RTL etc. -- in order to be able to 
see them and to figure out what is affecting text in what ways. My guess is 
that, at this point, all those ZWJ lookup sequences in the font would 
suddenly become active, ironically achieving the opposite of what the 
'display control characters' function is intended to achieve, since the ZWJ 
glyphs, where processed by lookups, would disappear and be replaced by 
ligatures. This is actually quite funny.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

Language must belong to the Other -- to my linguistic community
as a whole -- before it can belong to me, so that the self comes to its
unique articulation in a medium which is always at some level
indifferent to it.  - Terry Eagleton





Re: ZWJ and Latin Ligatures

2002-07-17 Thread Peter_Constable


On 07/02/2002 10:41:24 AM "John H. Jenkins" wrote:

[I've been out of the loop for the past couple of weeks -- just now
catching up on this.]


>Alas, but that's technically impossible.  Both OT and AAT (I'm not sure
>about Graphite) require that single characters map to single glyphs, which
>are then processed.  (In OT, of course, you are also supposed to do some
>preprocessing in character space, but that doesn't solve this problem.)
>It would be nice to have a cmap format which maps multiple characters to
>single glyphs initially.
>
>The way we deal with this is to have the ligatures with the ZWJ inserted
>as part of a ligature table which is on by default and which isn't
>revealed to the UI so that the user can't turn them off.

In Graphite, character sequences get mapped into glyph sequences one-to-one
via the cmap, just as in OT and AAT. From that point, what happens and what
can happen is completely at the discretion of the font developer. They can
decide to always form certain ligatures, always form some ligatures but
make others dependent upon ZWJ, or they may make some/all dependent upon a
user-selected font feature (assumes the software exposes a UI for
selecting the features that the font supports). Of course, nothing happens
unless the font developer included rules that make it happen.
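
The three options can be modelled roughly as below. This is a Python sketch under stated assumptions, not Graphite code: the glyph names, the CMAP and RULES tables and the feature name "hist" are all invented for the example.

```python
ZWJ = "\u200D"

# Step 1 table: one character -> one glyph (hypothetical glyph names).
CMAP = {"f": "f", "i": "i", "s": "s", "t": "t", "c": "c", ZWJ: "zwj"}

# Step 2 rules: (glyph pattern, ligature glyph, required feature or None).
RULES = [
    (("f", "i"), "f_i", None),          # always formed
    (("s", "zwj", "t"), "s_t", None),   # formed only when the text has ZWJ
    (("c", "t"), "c_t", "hist"),        # formed only if the user enables "hist"
]


def shape(text, features=frozenset()):
    """Map characters to glyphs one-to-one, then apply the font's rules."""
    glyphs = [CMAP.get(ch, ".notdef") for ch in text]
    out = []
    i = 0
    while i < len(glyphs):
        for pattern, ligature, feature in RULES:
            if feature is not None and feature not in features:
                continue
            if tuple(glyphs[i:i + len(pattern)]) == pattern:
                out.append(ligature)
                i += len(pattern)
                break
        else:
            out.append(glyphs[i])
            i += 1
    return out
```

With an empty RULES list the function returns the cmap output unchanged, which is the "nothing happens unless the font developer included rules" case.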



- Peter









Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-07 Thread John H. Jenkins


On Saturday, July 6, 2002, at 03:42 AM, James Kass wrote:

>
> We certainly agree that ligature use is a choice.  I think we diverge
> on just what kind of choice is involved.  You consider that ligature
> use is generally similar to bold or italic choices.  I consider use of
> ligatures to be more akin to differences in spelling.  If you're
> quoting from a source which used the word "fount", it is wrong to
> change it to "font".  And, if you're quoting from a source which
> used "hæmoglobin", anything other than "hæmoglobin" is incorrect.
> If the source used "&c.", it should never be changed to "etc.".
> So, if the source used the "ct" ligature...
>
>

I see your point, but I think we're to the stage where we'll just have to 
agree to disagree.  We *do* agree that ligation is a choice, but you're 
quite accurate in your assessment of where precisely we diverge.

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/





Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-06 Thread Asmus Freytag

At 08:06 PM 7/4/02 +0300, John Hudson wrote:
>>But ligature prohibition is a quite regular feature of German orthography 
>>and any Unicode-based system that intends to provide generic support for 
>>Latin script use, should be able to support it. As the prohibition is on 
>>a case-by-case and word-by-word basis, it has to be marked in the text.
>
>The specific requirements of some languages with ligature restrictions, 
>e.g. Turkish, are supported in the OpenType 'language system' model, which 
>enables different layout behaviour to be associated with different 
>orthographic systems. Unfortunately, the German rules really require 
>dictionary support to be properly implemented, since the rules for 
>word-internal ligature use/prohibition are intimately linked to spelling 
>and cannot be algorithmically arrived at.

Once you support prohibited ligatures via ZWNJ, you can then extend the 
ligaturing model to (somewhat more) historical documents, by using Fraktur 
fonts that provide more ligatures. Similar prohibition rules that are 
spelling based apply there.

This is still a far cry from trying to capture irregular printed editions 
or manuscripts. I would be in favor of using markup for such (academic) 
cases, since it is for a specialized need, not for standard rendition of a 
text in a given font and writing system.

A./




Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-06 Thread Asmus Freytag

All your other good points noted:

At 02:57 PM 7/1/02 -0600, John H. Jenkins wrote:
>>Therefore, I would be much happier if the discussion of the 'standard' 
>>case wasn't as anglo-centric and allowed more directly for the fact that 
>>while fonts are in control of what ligatures are provided, layout engines 
>>may be in control of what and how many optional ligatures to use, the 
>>text (!) must be in control of where ligatures are mandatory or prohibited.
>
>Which is what Unicode 3.2 says.  (You said it very nicely here, though.)
>
>(The standard case, BTW, seems to be Anglo-centric largely because this is 
>an English-speaking list and people always seem to start out with the "ct" 
>ligature they'd like to put in words like "respectfully."  Sorry about that.)

I guess I'm just trying to hold the list up to a higher standard.

A./




Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-06 Thread James Kass


Kenneth Whistler wrote,

> 
> > Another problem with TR28 is that its date is earlier than the date
> > on TR27.  This suggests that TR27 is more current.
> 
> I don't understand this claim.
> 

After misreading the dates and writing the letter last Monday,
the internet connection was lost here for four days!

I stand corrected (or, at least, better informed) on several items 
and am thankful to those who contributed to this thread.

Best regards,

James Kass.







Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-06 Thread James Kass


John H. Jenkins wrote,

>
> There's another level of problem here, too.  What if it isn't the author's
> intent, but an artifact of the particular typesetter?

When making an electronic reproduction of a specific text, a purist will
even duplicate any typographical errors found in the source.

>>...  However, options should be preserved
>> for the user.  Ligature selection is a task for the author/typesetter
>> at the fundamental level; it should not be completely left to the
>> rendering system.
>>
>
> Er, James.  I've never said it should.  The rendering system should have
> the ability to do default ligation.  The user should be able to override
> that behavior.  That's what happens on systems I see.  If they do ligation
> at *all*, they have a default behavior which can be overridden.

Sorry to have misunderstood you.

IMO, ligation should be off by default, and users should be able to
enable it.  I expect my browser to display the "fi" ligature whenever
the string "f+ZWJ+i" is encountered.  If the presentation form is
used, naturally the ligature should display.  However, if the original
author simply entered "f+i", the string would be expected.  This is
because it isn't the browser's job to guess which orthographic rules
apply.  And, it isn't acceptable for a computer to alter a person's
input without permission, even for display purposes.
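
For reference, the three encodings being contrasted here can be inspected with Python's standard unicodedata module. This is just an illustration of what is in the text; the variable names and the describe helper are made up, and the snippet says nothing about what a browser actually renders.

```python
import unicodedata

ZWJ = "\u200D"

plain = "fi"                   # two letters, no ligation requested
requested = "f" + ZWJ + "i"    # ligation requested in the text itself
presentation = "\uFB01"        # the compatibility presentation form


def describe(text):
    """Return the Unicode character names that make up a string."""
    return [unicodedata.name(ch) for ch in text]


# The presentation form folds to plain "fi" under compatibility
# normalization, so the fact that a ligature was used is lost.
folded_presentation = unicodedata.normalize("NFKC", presentation)

# ZWJ has no decomposition, so the request survives normalization.
folded_requested = unicodedata.normalize("NFKC", requested)
```

Here folded_presentation is equal to plain, while folded_requested is still the three-character sequence with ZWJ in the middle.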

> I'll repeat a point that I've made over and over and over.
>
> The "ct" ligature does not exist in and of itself.  It is a part of a
> typeface.  It doesn't make sense in general to ask for the formation of a
> "ct" ligature without any reference to the typeface you're using.
>

Sorry, I'm still missing it here.  Is this like saying the ligature "&" does
not exist in and of itself; it is a part of a typeface?  Or, like stating
that it doesn't make sense to ask for Hindi text without any reference
to the font?

> The implication of what you're saying is that Latin typefaces should be
> *required* to have a "ct" ligature on the off chance that the author of
> text determines that it's "required" in a particular context.  That gives
> most type designers the heebie jeebies.  It's bad enough that Adobe and
> Apple are making them stick useless "fi" and "fl" ligatures in their fonts.

Doug Ewell already answered this.

> In any event, if an author determines that a "ct" ligature is honestly and
> absolutely *required* in a particular context (as opposed to being
> desirable), then the ZWJ mechanism exists.

And, if an author determines that the browser or mark-up of choice
doesn't have global ligature options, the ZWJ mechanism still exists?

>>> To be frank, turning on an optional "ct" ligature throughout a document
>>> by
>>> means of inserting ZWJ everywhere you want it to take place makes as much
>>> sense in that model—the model that Western typography uses for languages
>>> such as English—as having the user insert a  pair around every
>>> letter they want in italics.
>>
>> Not at all.  This is apples and oranges.  The italic tags operate upon
>> every character in the enclosed string equally.  Using a similar ligature
>> tag would be expected to make ligatures wherever possible within the
>> enclosed string according to the user system's ability to render
>> ligatures... irrespective of the author's intent.  Depending upon the
>> system, the same run of text could be expressed with no ligatures
>> at all in a monospaced font or as scriptio continua in a handwriting
>> font.
>>
> Er, you've just made my point, haven't you?  The typeface makes a
> difference.  If you're ever in a situation where the typeface of the
> originator may be different from the typeface of the receiver, you've lost
> the ability to say whether or not ligatures should be used in a particular
> context.  Or do you want a "ct" ligature in Courier?

I'd never disagree that the typeface makes a difference.  On the contrary,
I'm wondering if you haven't just made my point.  If the ZWJ mechanism
is used, then the ability to say whether or not ligatures should be used is
directly encoded in the text and can't possibly be lost.

> If I want to reproduce, say, my reproduction of the 1611 KJV, it's equally
> incorrect to use a sans-serif typeface.  Actually, technically, my
> reproduction is already doing something very naughty by this standard,
> since the *real* 1611 KJV was in blackletter.

If a font isn't specified by the author, the author may be assured that
somebody someplace is reading KJ in a Bazooka Joe typeface.  If I were
to post an HTML version of the 1611 KJV, in my intro would appear a
note advising that if the reader wanted to see the display in the same
flavor as the source, they should download and install a specific font
or fonts along with links.

> The precise reproduction of the appearance of a text is *NOT* possible in
> plain text.  It is *NOT* the intention of Unicode to make it possible.

This is understood without plain text italic shouting.  That's what HTML
is for (e

Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-06 Thread James Kass


- Original Message - 
From: "Asmus Freytag" <[EMAIL PROTECTED]>
Sent: Monday, July 01, 2002 1:08 PM

> Therefore, I would be much happier if the discussion of the 
> 'standard' case wasn't as anglo-centric and allowed more directly 
> for the fact that while fonts are in control of what ligatures 
> are provided, layout engines may be in control of what and how 
> many optional ligatures to use, the text (!) must be in control of 
> where ligatures are mandatory or prohibited.

Well stated.

Perhaps the best 'higher level protocol' is the mentation of the
author.

Best regards,

James Kass.







Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-05 Thread Michael Everson

At 19:53 +0300 2002-07-04, John Hudson wrote:

>Well, we need and have (in OpenType and AAT) a general purpose 
>mechanism for typesetting texts employing ligatures as deemed fit by 
>the professional typographer. The expectation of such a mechanism is 
>that layout is applied to 'normal' text to render that text 
>according to the norms of particular typographic traditions, 
>publishing house styles, etc.. It should not be necessary to edit 
>the text, inserting ZWJ all over the place, in order to achieve this 
>result.

It *is* necessary for some ligatures in some scripts. Let's say that 
there is in the entire corpus of Ogham three ligatures of RUIS RUIS. 
We don't want to encode that as a separate character, and we don't 
want it to be on by default since there could be numerous other 
examples of RUIS side-by-side with RUIS. But a disgustingly complete 
font could take the ZWJ into account for the ligature, which could be 
used by people wanting to typeset the non-standard but extant 
ligature. ZWJ forces unusual ligatures if the font supports them, and 
ZWNJ breaks them where not.

>There are, however, kinds of documents in which the presence or 
>absence of ligatures is best determined by the author of the 
>document, and for that reason the ZWJ provides a means for the 
>author to specify ligation in plain text.

As I have said.

>But it seems to me that such documents are the exception rather than the norm

This is certainly true. That's why ZWJ should not be preferred for 
non-exceptional kinds of ligation. But some scripts like Hungarian 
Runic and Germanic Runic have a fairly large set which are used from 
time to time and irregularly.

>(a particular set of ligatures involving the lowercase f have been a 
>normative aspect of European typography for more than 500 years; in 
>my profession they are not considered optional or discretionary in 
>the setting of running text at typical sizes).

Except for Turkish and Azerbaijani of course. :-)

>Documents using ZWJ can only be reliably rendered in particular fonts.

Well, the same holds true for all ligatures.

>For example, there is no reason why I should not include the 
>sequence 'p ZWJ q' in a document, but unless I have a font 
>containing a pq ligature I will not be able to render the sequence 
>as intended by the author.

But if you were changing the font, it would be available for those 
fonts which had it, and ignored for those fonts which didn't.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-04 Thread Doug Ewell

John Hudson  wrote:

> Documents using ZWJ can only be reliably rendered in particular
> fonts. For example, there is no reason why I should not include the
> sequence 'p ZWJ q' in a document, but unless I have a font containing
> a pq ligature I will not be able to render the sequence as intended
> by the author.

But the whole point of including ZWJ is to make the sequence "as
connected as possible," which implies "and no more so."  If you code the
sequence "f ZWJ l" that should tell the renderer to use an fl ligature
*if* one exists in the font, and to just use f l if it doesn't.  Some
fonts will be able to display the ligature, others won't.  For "p ZWJ q"
every font will just display p q.  That's completely consistent with
both the wording and the intent of ligation-by-ZWJ.

There's nothing about using ZWJ to form ligatures that requires every
font to contain all possible ligatures, nor anything that requires the
renderer to switch fonts in mid-word and hunt around for a font that
contains the requested ligature.  If it's not there, it's not there.
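
The fallback behaviour described here is easy to sketch. The Python below is an illustration only; the render function and the dict standing in for "the ligatures this font contains" are assumptions, not any real renderer's API.

```python
ZWJ = "\u200D"


def render(text, font_ligatures):
    """Use a ligature for x ZWJ y if the font has one; otherwise draw x y.

    font_ligatures maps a two-character string to a ligature glyph name.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i + 1:i + 2] == ZWJ and i + 2 < len(text):
            pair = text[i] + text[i + 2]
            if pair in font_ligatures:
                out.append(font_ligatures[pair])
                i += 3
                continue
        if text[i] != ZWJ:       # an unused ZWJ simply draws nothing
            out.append(text[i])
        i += 1
    return out
```

For example, render("f\u200Dl", {"fl": "f_l"}) gives the single ligature glyph, while the same text with an empty ligature table, or "p\u200Dq" in any ordinary font, falls back to the two separate letters.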

-Doug Ewell
 Fullerton, California





RE: ZWJ and Latin Ligatures

2002-07-04 Thread John Hudson

At 19:55 7/2/2002, Marco Cimarosti wrote:

>The OpenType specs published on the Adobe site state that table GSUB has a
>subtable to handle ligatures ("LookupType 4: Ligature Substitution
>Subtable": http://partners.adobe.com/asn/developer/opentype/gsub.html#LSF1).
>
>It says that "A Ligature Substitution (LigatureSubst) subtable identifies
>ligature substitutions where a single glyph replaces multiple glyphs"
>(multiple *glyphs*, not multiple characters).
>
>OK: literally speaking, it is true that OT maps single characters to single
>glyphs, but then it maps multiple glyphs to ligature glyphs, so what's the
>difference?
>
>I mean: isn't this two-step mapping:
>
> code point -> glyph ID
> component glyph ID's -> ligature glyph ID
>
>functionally equivalent to an hypothetical one-step mapping?
>
> component code points -> ligature glyph ID
>
>Am I missing something?

Well, what you might be missing is layout feature support, in which case 
the sequences described might be 'dysfunctionally equivalent'. It is useful 
to maintain a distinction between direct (cmap) and indirect (GSUB lookup) 
glyph-to-character mapping, because of the system and application support 
issues involved in getting the latter to work.
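
A small Python sketch of why the two-step mapping is only equivalent when the second step actually runs. The glyph IDs and the apply_gsub switch are invented for the example; real OpenType tables are binary structures, not dicts.

```python
# Step 1 (cmap): code point -> glyph ID. IDs are hypothetical.
CMAP = {"f": 10, "i": 11}

# Step 2 (GSUB ligature lookup): component glyph IDs -> ligature glyph ID.
GSUB_LIGATURES = {(10, 11): 42}


def shape(text, apply_gsub=True):
    """Return glyph IDs for text; apply_gsub models layout feature support."""
    glyphs = [CMAP[ch] for ch in text]
    if not apply_gsub:
        return glyphs             # system or app without layout support
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in GSUB_LIGATURES:
            out.append(GSUB_LIGATURES[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out
```

With apply_gsub=True the result for "fi" is the ligature glyph, the same as a hypothetical one-step table would give; with apply_gsub=False only the cmap step happens and the components come back unligated.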

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]






Re: ZWJ and Latin Ligatures

2002-07-04 Thread John Hudson

At 18:49 7/2/2002, Michael Everson wrote:

>>Alas, but that's technically impossible.  Both OT and AAT (I'm not sure 
>>about Graphite) require that single characters map to single glyphs, 
>>which are then processed.
>
>Hm? How do you handle the decomposed sequence A + COMBINING ACUTE? Surely 
>that is a sequence of characters mapping to a single glyph.

Nope. That's two characters mapped to two glyphs that might be represented by

a) character level mapping of two characters to a single character 
represented by a single glyph;
b) character level mapping of two characters to a single character 
represented in glyph level processing by two glyphs;
c) glyph level mapping of two glyphs to a single glyph;
d) glyph level positioning of two glyphs to form a single typeform (grapheme).

There may be other variations.

The CMAP table maps individual glyphs to one or more characters. It cannot 
map sequences of characters to glyphs, or sequences of glyphs to characters.
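
Option (a) is essentially what Unicode canonical composition provides for this pair, and it can be tried with Python's standard unicodedata module. The helper name to_composed is an assumption for the example; whether a given layout engine normalizes before the cmap lookup is a separate question that this snippet does not answer.

```python
import unicodedata

decomposed = "A\u0301"    # LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT


def to_composed(text):
    """Character-level mapping: compose sequences where Unicode defines one."""
    return unicodedata.normalize("NFC", text)


composed = to_composed(decomposed)    # one character, U+00C1

# Going the other way yields the two-character sequence again, which is
# the starting point for options (c) and (d) at the glyph level.
round_trip = unicodedata.normalize("NFD", composed)
```

After composition the cmap only ever sees a single character, so the one-character-to-one-glyph restriction is never in play for this case.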

>>(In OT, of course, you are also supposed to do some preprocessing in 
>>character space, but that doesn't solve this problem.)  It would be nice 
>>to have a cmap format which maps multiple characters to single glyphs 
>>initially.
>
>I always thought there was. Now I'm really confused as to how I would make 
>a complex Indic syllable.

See http://www.microsoft.com/typography/developers/opentype/default.htm, 
especially the section on Uniscribe in the 'Overview' part, which includes 
a step-by-step analysis of the shaping of a Sanskrit word.

The AAT approach is, of course, a bit different, because the 
character-level re-ordering takes place at the glyph level along with 
everything else.

John Hudson


Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]






Re: ZWJ and Latin Ligatures

2002-07-04 Thread John Hudson

At 18:41 7/2/2002, John H. Jenkins wrote:

>Alas, but that's technically impossible.  Both OT and AAT (I'm not sure 
>about Graphite) require that single characters map to single glyphs, which 
>are then processed.  (In OT, of course, you are also supposed to do some 
>preprocessing in character space, but that doesn't solve this problem.)
>It would be nice to have a cmap format which maps multiple characters to 
>single glyphs initially.
>
>The way we deal with this is to have the ligatures with the ZWJ inserted 
>as part of a ligature table which is on by default and which isn't 
>revealed to the UI so that the user can't turn them off.

That would be possible in OpenType using the Required Ligatures <rlig> 
feature, which is the same feature used in Arabic for the lam-alif 
ligature. It is certainly feasible, and I cannot think of a good reason not 
to use this approach.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]






Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-04 Thread John Hudson

At 23:08 7/1/2002, Asmus Freytag wrote:

>Remember also that the simplistic model you present already breaks down 
>for German, since the same character pair may or may not allow ligation 
>depending on the content and meaning of the text - features that in the 
>Unicode model are relegated to *plain* text.
>
>Therefore, I would be much happier if the discussion of the 'standard' 
>case wasn't as anglo-centric and allowed more directly for the fact that 
>while fonts are in control of what ligatures are provided, layout engines 
>may be in control of what and how many optional ligatures to use, the text 
>(!) must be in control of where ligatures are mandatory or prohibited.
>
>I don't know of a case where a mandatory ligature of two characters is 
>sometimes prohibited, which means that for all practical cases, mandatory 
>ligatures, like LAM-ALIF tend to also be handled by the layout engine. But 
>ligature prohibition is a quite regular feature of German orthography and 
>any Unicode-based system that intends to provide generic support for Latin 
>script use, should be able to support it. As the prohibition is on a 
>case-by-case and word-by-word basis, it has to be marked in the text.

The specific requirements of some languages with ligature restrictions, 
e.g. Turkish, are supported in the OpenType 'language system' model, which 
enables different layout behaviour to be associated with different 
orthographic systems. Unfortunately, the German rules really require 
dictionary support to be properly implemented, since the rules for 
word-internal ligature use/prohibition are intimately linked to spelling 
and cannot be algorithmically arrived at.
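
To illustrate the dictionary dependence, here is a toy Python sketch that inserts ZWNJ at morpheme boundaries. Everything in it is an assumption made for the example: the two-entry SEGMENTATION table stands in for a real German morphological dictionary, and the LIGATURE_PAIRS set is just a plausible list of f-ligatures.

```python
ZWNJ = "\u200C"

# Letter pairs a font might ligate (assumed for the example).
LIGATURE_PAIRS = {"ff", "fi", "fl"}

# Toy stand-in for a dictionary that knows where morphemes join.
SEGMENTATION = {
    "Auflage": ("Auf", "lage"),
    "Kaufleute": ("Kauf", "leute"),
}


def mark_prohibited_ligatures(word):
    """Insert ZWNJ where a ligature pair would straddle a morpheme boundary.

    Words missing from the dictionary are returned unchanged, because the
    boundary cannot be derived from the spelling alone.
    """
    parts = SEGMENTATION.get(word)
    if parts is None:
        return word
    out = parts[0]
    for part in parts[1:]:
        if out[-1] + part[0] in LIGATURE_PAIRS:
            out += ZWNJ
        out += part
    return out
```

The lookup table is the whole point: nothing in the letters "fl" themselves tells the function whether they belong to one morpheme or two.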

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]






Re: ZWJ and Latin Ligatures

2002-07-04 Thread John Hudson

At 22:03 7/1/2002, Tex Texin wrote:

>In following this thread, I am trying to find where, in a non-plain text
>product, I have the ability to make two characters into a ligature or
>cursively connected. (The latter I guess I could do with a wholesale
>font change.) For example, I looked at Microsoft Word and found that I
>can make the text shimmer and sparkle and have either marching red or
>black ants (When will Unicode have characters to do those?) but I don't
>see how to control ligatures.
>
>As someone who is not a high-end typographer, I don't recall ever having
>the ability to change ligaturing without replacing characters. I also do
>not ever recall having the need to, which I understand is part of the
>rationale for the Unicode policy. I am not trying to argue one way or
>the other. The discussion refers to other ways of influencing a font
>with respect to ligature and I don't recall ever seeing a way to do
>this. What kinds of products have these abilities?

Adobe InDesign and Photoshop have this ability using OpenType fonts (and 
InDesign will do automatic substitution of f-ligs based on glyph names for 
older fonts). Paul Nelson at MS is working on adding Latin typographic 
layout support to Uniscribe, so that this kind of ligature control will be 
available to any app that wants to make use of it.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]






Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-04 Thread John Hudson

At 14:31 6/30/2002, James Kass wrote:

>Sounds like a giant step backwards from Unicode 3.0.1  (March 2002)
>http://www.unicode.org/unicode/standard/versions/Unicode3.0.1.html
>(see section "Controlling Ligatures")
>
>This page clearly states that ZWJ is proper for controlling the
>formation of Latin ligatures and even uses f+ZWJ+i as an example.
>
>Unicode 3.1 (May 2002) uses the same examples:
>http://www.unicode.org/unicode/reports/tr27/index.html
>
>Can you please point me to a URL for Unicode 3.2 ligature control?
>This link (March 2002):
>http://www.unicode.org/unicode/reports/tr28/
>...glosses over Latin ligatures suggesting that mark-up should be
>used in some cases and ZWJ in others.
>
>Because of the reasons cited in that last link, IMHO ligature control
>is best performed by the author of a document and ZWJ still seems
>to be the most straightforward method.

Well, we need and have (in OpenType and AAT) a general purpose mechanism 
for typesetting texts employing ligatures as deemed fit by the professional 
typographer. The expectation of such a mechanism is that layout is applied 
to 'normal' text to render that text according to the norms of particular 
typographic traditions, publishing house styles, etc.. It should not be 
necessary to edit the text, inserting ZWJ all over the place, in order to 
achieve this result.

There are, however, kinds of documents in which the presence or absence of 
ligatures is best determined by the author of the document, and for that 
reason the ZWJ provides a means for the author to specify ligation in plain 
text. But it seems to me that such documents are the exception rather than 
the norm (a particular set of ligatures involving the lowercase f have been 
a normative aspect of European typography for more than 500 years; in my 
profession they are not considered optional or discretionary in the setting 
of running text at typical sizes). Documents using ZWJ can only be reliably 
rendered in particular fonts. For example, there is no reason why I should 
not include the sequence 'p ZWJ q' in a document, but unless I have a font 
containing a pq ligature I will not be able to render the sequence as 
intended by the author.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]






Re: ZWJ and Latin Ligatures

2002-07-02 Thread John Cowan

Michael Everson scripsit:

> I have to confess I don't understand what you are talking about at 
> all. Get me them tools, John!

Ligature tables at a high level tell you things like "The glyph 'a'
and the glyph 'acute accent' should be merged to form the glyph
'aacute'."  Internally, though, it reads more like "A #502 followed
by a #397 should be replaced by a #929", where the numbers (or
names, in some contexts) *represent* the actual glyph outlines.
You could write "#202 followed by #999 becomes SHAVIAN PEEP glyph"
without there being any actual outlines for #202 or #999, but as
John says, if something actually called for a #202 to be imaged,
the rendering software would go belly-up.
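
The substitution just described can be sketched roughly as follows. The glyph numbers are the ones used in the message; the ligature glyph ID 1000, the outline strings, and the function names are assumptions made for illustration only.

```python
# Outlines exist only for some glyph IDs; 202 and 999 are "virtual".
OUTLINES = {
    502: "<outline a>",
    397: "<outline acute accent>",
    929: "<outline aacute>",
    1000: "<outline SHAVIAN PEEP>",  # hypothetical ID for the ligature glyph
}

# Ligature rules: a pair of glyph IDs is replaced by a single glyph ID.
RULES = {(502, 397): 929, (202, 999): 1000}


def substitute(glyph_ids, rules=RULES):
    """Replace every matching pair of glyph IDs, scanning left to right."""
    out, i = [], 0
    while i < len(glyph_ids):
        pair = tuple(glyph_ids[i:i + 2])
        if pair in rules:
            out.append(rules[pair])
            i += 2
        else:
            out.append(glyph_ids[i])
            i += 1
    return out


def image(glyph_ids, outlines=OUTLINES):
    """Look up outlines; raises KeyError if a virtual glyph has to be drawn."""
    return [outlines[g] for g in glyph_ids]
```

Here `image(substitute([202, 999]))` succeeds because the virtual glyphs are replaced before imaging, whereas `image([202])` fails, which is the "belly-up" case.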

I hope this helps.

-- 
John Cowan                          [EMAIL PROTECTED]
At times of peril or dubitation,    http://www.ccil.org/~cowan
Perform swift circular ambulation,  http://www.reutershealth.com
With loud and high-pitched ululation.




Re: ZWJ and Latin Ligatures

2002-07-02 Thread John H. Jenkins


On Tuesday, July 2, 2002, at 12:51 PM, Marco Cimarosti wrote:

> The next step could be standardizing the values of the glyph indexes, so
> that the entire "GSUB"/"morx" table can be copied in from a template, and
> type designers can concentrate on drawing the outlines.
>

The typical approach these days is for the tools that provide advanced 
layout table support to be keyed to glyph name.  Apple's tools allow glyph 
name, glyph number, or Unicode code point as glyph identifiers.  As you 
say, it makes it possible to cut-and-paste source files and is very handy.
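
As a rough illustration of why name-keyed sources can be copied between fonts, here is a sketch that resolves glyph names to font-specific glyph IDs. The rule format and the function `compile_rules` are hypothetical and do not reflect the syntax of Apple's tools or any other real tool.

```python
# A rule source keyed by glyph name can be shared between fonts.
RULE_SOURCE = [(("f", "i"), "f_i"), (("f", "l"), "f_l")]


def compile_rules(rule_source, glyph_order):
    """Resolve glyph names to this font's glyph IDs (index in glyph_order)."""
    gid = {name: index for index, name in enumerate(glyph_order)}
    return {
        tuple(gid[name] for name in components): gid[ligature]
        for components, ligature in rule_source
    }
```

The same `RULE_SOURCE` compiles to different numeric tables for fonts whose glyphs are stored in a different order, so only the names need to agree.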

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/





Re: ZWJ and Latin Ligatures

2002-07-02 Thread Michael Everson

At 12:15 -0600 2002-07-02, John H. Jenkins wrote:
>On Tuesday, July 2, 2002, at 11:39 AM, John Cowan wrote:
>
>>
>>>1) If you map directly from multiple characters to a single glyph, you
>>>don't have to include glyphs in your font for all the "pieces" if they're
>>>never supposed to appear by themselves.  As an extreme example, if I
>>>implemented astral character support via ligating surrogate pairs, I'd
>>>need to include glyphs for the unpaired surrogates.
>>
>>More precisely, you need to have glyph *indexes* that are never mapped
>>to glyphs.  The actual outlines themselves don't need to exist, AFAIK.
>>
>
>True.  I tend to avoid that, because if something goes wrong and the 
>system attempts to actually *display* one of these virtual glyphs, 
>disaster would ensue.  (Dave Opstad and I have had long debates on 
>the safety of doing this.)

I have to confess I don't understand what you are talking about at 
all. Get me them tools, John!
-- 

Michael Everson *** Everson Typography *** http://www.evertype.com




RE: ZWJ and Latin Ligatures

2002-07-02 Thread Marco Cimarosti

John Cowan wrote:
> More precisely, you need to have glyph *indexes* that are never mapped
> to glyphs.  The actual outlines themselves don't need to exist, AFAIK.

Yes, of course. E.g., I guess that the ZWJ "glyph" can be a pseudo-index
which doesn't actually index anything.

The next step could be standardizing the values of the glyph indexes, so
that the entire "GSUB"/"morx" table can be copied in from a template, and
type designers can concentrate on drawing the outlines.
:-)

_ Marco




Re: ZWJ and Latin Ligatures

2002-07-02 Thread John H. Jenkins


On Tuesday, July 2, 2002, at 11:39 AM, John Cowan wrote:

>
>> 1) If you map directly from multiple characters to a single glyph, you 
>> don't have to include glyphs in your font for all the "pieces" if they're
>> never supposed to appear by themselves.  As an extreme example, if I
>> implemented astral character support via ligating surrogate pairs, I'd
>> need to include glyphs for the unpaired surrogates.
>
> More precisely, you need to have glyph *indexes* that are never mapped
> to glyphs.  The actual outlines themselves don't need to exist, AFAIK.
>

True.  I tend to avoid that, because if something goes wrong and the 
system attempts to actually *display* one of these virtual glyphs, 
disaster would ensue.  (Dave Opstad and I have had long debates on the 
safety of doing this.)

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/





Re: ZWJ and Latin Ligatures

2002-07-02 Thread John Cowan

John H. Jenkins scripsit:

> 1) If you map directly from multiple characters to a single glyph, you
> don't have to include glyphs in your font for all the "pieces" if they're 
> never supposed to appear by themselves.  As an extreme example, if I 
> implemented astral character support via ligating surrogate pairs, I'd 
> need to include glyphs for the unpaired surrogates.  

More precisely, you need to have glyph *indexes* that are never mapped
to glyphs.  The actual outlines themselves don't need to exist, AFAIK.

-- 
John Cowan   http://www.ccil.org/~cowan   [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all.  There are
no words left to express his staggerment, since Men changed the language that
they learned of elves in the days when all the world was wonderful. --The Hobbit




Re: ZWJ and Latin Ligatures

2002-07-02 Thread John H. Jenkins


On Tuesday, July 2, 2002, at 10:55 AM, Marco Cimarosti wrote:

> I mean: isn't this two-step mapping:
>
>   code point -> glyph ID
>   component glyph ID's -> ligature glyph ID
>
> functionally equivalent to a hypothetical one-step mapping?
>
>   component code points -> ligature glyph ID
>
> Am I missing something?
>

Functionally, the two are equivalent.  There are, however, two subtle 
differences:

1) If you map directly from multiple characters to a single glyph, you
don't have to include glyphs in your font for all the "pieces" if they're 
never supposed to appear by themselves.  As an extreme example, if I 
implemented astral character support via ligating surrogate pairs, I'd 
need to include glyphs for the unpaired surrogates.  As it is, Windows and 
the Mac *do* support mapping paired surrogates directly to glyphs, so you 
don't need these extra glyphs which are never seen.

2) A mapping directly from multiple characters to single glyphs expressly 
makes the process something not to percolate up to the UI.  The indirect 
process means that there are some actions in glyph space which *are* 
optional and which the user can turn on and off, and others which aren't.
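
To make the surrogate example in point 1 concrete, here is a small sketch using the standard UTF-16 arithmetic. The cmap contents and the glyph ID are hypothetical; the point is only that a single cmap entry keyed by the combined code point makes glyphs for the individual surrogates unnecessary.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into one supplementary code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Hypothetical cmap keyed by code point: one entry covers the whole pair,
# so no glyphs are needed for the unpaired surrogates themselves.
CMAP = {0x10400: 7}


def glyph_for_pair(high, low, cmap=CMAP):
    """Map a surrogate pair directly to a glyph ID."""
    return cmap[combine_surrogates(high, low)]
```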

In OpenType, this is less of an issue since this was always the case and 
applications are expected to do the UI work themselves.  In AAT, we 
originally assumed (back in the days of the Technology That Must Not Be 
Named) that all layout features are optional and can be turned on and off, 
and that the UI would always reflect the entire suite of available 
features.  We had to rewrite our tools to allow for required actions which 
cannot be turned off.

Poor Michael is saddled with older versions of our tools which are hard to 
use and don't let him do this.  We're working on getting newer and better 
ones to him.

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/





RE: ZWJ and Latin Ligatures

2002-07-02 Thread Marco Cimarosti

Michael Everson wrote:
> At 09:41 -0600 2002-07-02, John H. Jenkins wrote:
> 
> >Alas, but that's technically impossible.  Both OT and AAT (I'm not 
> >sure about Graphite) require that single characters map to single 
> >glyphs, which are then processed.

I am confused by this statement; perhaps some expert in fonts can help me
check my understanding.

The OpenType specs published on the Adobe site state that table GSUB has a
subtable to handle ligatures ("LookupType 4: Ligature Substitution
Subtable": http://partners.adobe.com/asn/developer/opentype/gsub.html#LSF1).

It says that "A Ligature Substitution (LigatureSubst) subtable identifies
ligature substitutions where a single glyph replaces multiple glyphs"
(multiple *glyphs*, not multiple characters).

OK: literally speaking, it is true that OT maps single characters to single
glyphs, but then it maps multiple glyphs to ligature glyphs, so what's the
difference?

I mean: isn't this two-step mapping:

code point -> glyph ID
component glyph ID's -> ligature glyph ID

functionally equivalent to a hypothetical one-step mapping?

component code points -> ligature glyph ID

Am I missing something?
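
The question can be made concrete with a small sketch. It assumes hypothetical glyph IDs and a cmap in which no two characters share a glyph, and it derives the one-step table mechanically from the two-step tables; none of the names below come from the OpenType specification.

```python
ZWJ = "\u200d"

# Hypothetical glyph IDs; this sketch assumes no two characters share a glyph.
CMAP = {"f": 10, "i": 11, ZWJ: 12}            # code point -> glyph ID
LIGATURES = {(10, 11): 40, (10, 12, 11): 40}  # component IDs -> ligature ID


def longest_match(seq, table, default):
    """Greedy left-to-right replacement of the longest key found in table."""
    out, i = [], 0
    longest = max(len(key) for key in table)
    while i < len(seq):
        for n in range(longest, 0, -1):
            key = tuple(seq[i:i + n])
            if len(key) == n and key in table:
                out.append(table[key])
                i += n
                break
        else:  # no rule matched at this position
            out.append(default(seq[i]))
            i += 1
    return out


def two_step(text):
    """code point -> glyph ID, then component glyph IDs -> ligature glyph ID."""
    glyph_ids = [CMAP[ch] for ch in text]
    return longest_match(glyph_ids, LIGATURES, lambda gid: gid)


# Derive the one-step table by translating glyph IDs back to characters.
INVERSE = {gid: ch for ch, gid in CMAP.items()}
DIRECT = {tuple(INVERSE[g] for g in seq): lig for seq, lig in LIGATURES.items()}


def one_step(text):
    """component code points -> ligature glyph ID, falling back to the cmap."""
    return longest_match(list(text), DIRECT, lambda ch: CMAP[ch])
```

Under these assumptions `two_step` and `one_step` return the same glyph IDs for any text covered by the cmap; the sketch says nothing about fonts where several characters map to one glyph.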

_ Marco




Re: ZWJ and Latin Ligatures

2002-07-02 Thread John H. Jenkins


On Tuesday, July 2, 2002, at 09:49 AM, Michael Everson wrote:

> At 09:41 -0600 2002-07-02, John H. Jenkins wrote:
>
>> Alas, but that's technically impossible.  Both OT and AAT (I'm not sure 
>> about Graphite) require that single characters map to single glyphs, 
>> which are then processed.
>
> Hm? How do you handle the decomposed sequence A + COMBINING ACUTE? Surely 
> that is a sequence of characters mapping to a single glyph.
>

Same process.  In OT, of course, you could count on the glyph being 
prenormalized (but this only works for stuff already in Unicode), or you 
could use the GPOS table to properly form the accented form on-the-fly.

But neither technology allows the decomposed sequence to be mapped 
directly to a single glyph.
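
The "prenormalized" route can be illustrated with Python's standard `unicodedata` module; the helper name `precompose` is just for this sketch. Composition only succeeds where Unicode already encodes a precomposed character, which is the limitation noted above.

```python
import unicodedata


def precompose(text):
    """Normalize to NFC so that encoded precomposed characters are used."""
    return unicodedata.normalize("NFC", text)


# A + COMBINING ACUTE ACCENT has a precomposed form (U+00C1).
a_acute = precompose("A\u0301")

# q + COMBINING ACUTE ACCENT has none, so it stays as two characters and
# the accent would have to be positioned by the font (e.g. via GPOS).
q_acute = precompose("q\u0301")
```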

> Just goes to show that I don't make proper Unicode fonts yet because the 
> tools just aren't up to snuff.
>

We're working on it.  :-)

>> (In OT, of course, you are also supposed to do some preprocessing in 
>> character space, but that doesn't solve this problem.)  It would be nice 
>> to have a cmap format which maps multiple characters to single glyphs 
>> initially.
>
> I always thought there was. Now I'm really confused as to how I would 
> make a complex Indic syllable.
>

Same sort of thing.  You put the glyph in the font and the instructions 
for what sequence forms it in the GSUB or morx table.

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/





Re: ZWJ and Latin Ligatures

2002-07-02 Thread John H. Jenkins


On Tuesday, July 2, 2002, at 06:51 AM, Michael Everson wrote:

> That is absolutely true. I have never argued that the only way to turn 
> ligatures on or off is in plain text. I saw that there were difficult 
> edge cases and sought blessing for the ZWJ/ZWNJ mechanism to handle them, 
> and won the day. But it would certainly be my view that those should 
> only be used where predictable ligation does not occur. A Runic font 
> which had an AAT/OpenType/Graphite ligatures-on mechanism would, in my 
> view, be inappropriate, because ligation is unusual in Runic, never the 
> norm, and should only be used on a case-by-case basis. Runic fonts should 
> have the ZWJ pairs encoded in the glyph tables.
>
>>

Alas, but that's technically impossible.  Both OT and AAT (I'm not sure 
about Graphite) require that single characters map to single glyphs, which 
are then processed.  (In OT, of course, you are also supposed to do some 
preprocessing in character space, but that doesn't solve this problem.)  
It would be nice to have a cmap format which maps multiple characters to 
single glyphs initially.

The way we deal with this is to have the ligatures with the ZWJ inserted 
as part of a ligature table which is on by default and which isn't 
revealed to the UI so that the user can't turn them off.
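
A minimal sketch of that arrangement, with made-up feature names and a made-up `Feature` record (these are not actual AAT feature types or selectors): the ZWJ ligature table is on by default and absent from the menu, so user choices cannot disable it.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    on_by_default: bool
    exposed_to_ui: bool


# Hypothetical feature list for one font.
FEATURES = [
    Feature("zwj_ligatures", on_by_default=True, exposed_to_ui=False),
    Feature("common_ligatures", on_by_default=True, exposed_to_ui=True),
    Feature("rare_ligatures", on_by_default=False, exposed_to_ui=True),
]


def ui_menu(features=FEATURES):
    """Names of the features a user is allowed to toggle."""
    return [f.name for f in features if f.exposed_to_ui]


def active_features(user_choices, features=FEATURES):
    """Apply user choices, ignoring any choice for a feature hidden from the UI."""
    active = []
    for f in features:
        if f.exposed_to_ui:
            enabled = user_choices.get(f.name, f.on_by_default)
        else:
            enabled = f.on_by_default
        if enabled:
            active.append(f.name)
    return active
```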

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/





Re: ZWJ and Latin Ligatures

2002-07-02 Thread Michael Everson

At 09:41 -0600 2002-07-02, John H. Jenkins wrote:

>Alas, but that's technically impossible.  Both OT and AAT (I'm not 
>sure about Graphite) require that single characters map to single 
>glyphs, which are then processed.

Hm? How do you handle the decomposed sequence A + COMBINING ACUTE? 
Surely that is a sequence of characters mapping to a single glyph.

Just goes to show that I don't make proper Unicode fonts yet because 
the tools just aren't up to snuff.

>(In OT, of course, you are also supposed to do some preprocessing in 
>character space, but that doesn't solve this problem.)  It would be 
>nice to have a cmap format which maps multiple characters to single 
>glyphs initially.

I always thought there was. Now I'm really confused as to how I would 
make a complex Indic syllable.

>The way we deal with this is to have the ligatures with the ZWJ 
>inserted as part of a ligature table which is on by default and 
>which isn't revealed to the UI so that the user can't turn them off.

I am not sure I understand, but then I haven't been able to make use 
of the AAT ligature tables yet. ;-)
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: ZWJ and Latin Ligatures

2002-07-02 Thread Michael Everson

At 11:00 -0600 2002-07-01, John H. Jenkins wrote:

>I guess one thing that's frustrating for me personally in this 
>perennial discussion is the creation of this false dichotomy, that 
>ligation control either *must* be in plain text or *must* be 
>expressly forbidden in plain text.  I would agree, Michael, that 
>your arguments that some degree of ligation control belongs in plain 
>text were unanswerable.  You did a good job there.  But at the same 
>time, I've never heard you argue that the only way to turn ligatures 
>on or off is in plain text.

That is absolutely true. I have never argued that the only way to 
turn ligatures on or off is in plain text. I saw that there were 
difficult edge cases and sought blessing for the ZWJ/ZWNJ mechanism 
to handle them, and won the day. But it would certainly be my view 
that those should only be used where predictable ligation does not 
occur. A Runic font which had an AAT/OpenType/Graphite ligatures-on 
mechanism would, in my view, be inappropriate, because ligation is 
unusual in Runic, never the norm, and should only be used on a 
case-by-case basis. Runic fonts should have the ZWJ pairs encoded in 
the glyph tables.

>And under no circumstances should new Latin ligatures be added to Unicode.

I agree.

I wonder if it wouldn't be useful at some stage for me to pick the 
best bits out of my papers and do them up as a Unicode Technical Note.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-01 Thread Kenneth Whistler

James Kass said:

> One problem with TR28 is that it is worded so that it appears to
> be "in addition" to earlier guidelines. 

It is. The way this works is as follows: The original decision
about the ZWJ as request for ligation was documented in the
Unicode 3.0.1 update notice. That documentation was rolled forward
into UAX #27 (Unicode 3.1), where it was explicitly cast as text
to replace the Unicode 3.0 text on p. 318 re Controlling Ligatures,
including an update of the example table. The additional text in
UAX #28 is just that -- an *addition* to the Unicode 3.1 text,
not a replacement for it.

This will all become more apparent when we can finally publish
Unicode 4.0, which will roll all of the textual additions, once
again, into a single published document.

> This implies that the examples
> used in TR27, for one, are still valid.

They are.

>  In TR27, font developers are
> urged to add things like "f+ZWJ+i" to existing tables where "f+i"
> is already present.

That recommendation still stands -- and, as John pointed out,
is being implemented by vendors.
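
One way to picture that recommendation is a helper that, for each existing ligature entry, adds a variant with a ZWJ glyph between the components. The glyph name `zwj`, the table layout, and the function name are assumptions for this sketch, not the format of any real font table.

```python
ZWJ_GLYPH = "zwj"  # hypothetical glyph name for U+200D


def add_zwj_variants(ligatures, zwj=ZWJ_GLYPH):
    """Return a copy of the table with 'x zwj y' added wherever 'x y' exists."""
    extended = dict(ligatures)
    for components, ligature in ligatures.items():
        with_zwj = []
        for name in components:
            with_zwj += [name, zwj]
        # Drop the trailing zwj; keep any entry the font already defines.
        extended.setdefault(tuple(with_zwj[:-1]), ligature)
    return extended
```

Applied to a table containing only `("f", "i") -> "f_i"`, this adds `("f", "zwj", "i") -> "f_i"` and leaves the original entry in place.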

> Another problem with TR28 is that its date is earlier than the date
> on TR27.  This suggests that TR27 is more current.

I don't understand this claim.

The date on UAX #27 is: 2001-05-16

The date on UAX #28 is: 2002-03-07

Please check that you are referring to the most recent (and only
valid) versions of each.

Otherwise, regarding the substance of this thread, I find myself
in violent agreement with John, who it seems to me is quite ably
stating the case for the current treatment as decided by the UTC.

--Ken




Re: ZWJ and Latin Ligatures

2002-07-01 Thread John H. Jenkins


On Monday, July 1, 2002, at 01:03 PM, Tex Texin wrote:

> The discussion refers to other ways of influencing a font
> with respect to ligature and I don't recall ever seeing a way to do
> this. What kinds of products have these abilities?
>

It's a pretty common feature of desktop publishing applications—Quark, 
FrameMaker, InDesign.  TextEdit, the default text editor on Mac OS X, does 
it, but it's not at all common at the low end of things.  I wouldn't be 
surprised if it showed up in Word eventually, however.

In FrameMaker, which I happen to have open at the moment, you do it by 
turning pair kerning on and off.  InDesign has a menu that lets you select 
degree of ligation.

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/





Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-01 Thread John H. Jenkins


On Monday, July 1, 2002, at 02:08 PM, Asmus Freytag wrote:

> At 11:34 AM 6/30/02 -0600, John H. Jenkins wrote:
>> Remember, Unicode is aiming at encoding *plain text*.  For the bulk of 
>> Latin-based languages, ligation control is simply not a matter of *plain 
>> text*—that is, the message is still perfectly correct whether ligatures 
>> are on or off.  There are some exceptional cases.  The ZWJ/ZWNJ is 
>> available for such exceptional cases.
>
> Remember also that the simplistic model you present already breaks down 
> for German, since the same character pair may or may not allow ligation 
> depending on the content and meaning of the text - features that in the 
> Unicode model are relegated to *plain* text.
>

*sigh*  I'm clearly not expressing myself well here.

I'm trying to state the general rule.  Each time I do, I say there are 
exceptions.  German is an excellent example of an exception.  Michael's 
exceptional cases are exceptional cases.  We put ZWJ/ZWNJ in charge of 
plain-text ligature formation to handle these cases.  I'm fine with that.

Turkish is another exception, BTW, where the typical "fi" ligature of 
Latin typography should not be formed.

The issue -- as I see it -- is not whether or not *any* ligature control 
belongs in plain text, or whether or not mandatory/prohibited ligation 
points should be marked in plain text.  I'm not aware of anyone who is 
arguing against that position.

We started out with a discussion of whether or not we should add more 
Latin ligatures (whether in the PUA or elsewhere) so that people can, in 
essence, create a plain-text representation of an older book where such 
were more common.  (And, as always, if my memory is inaccurate please feel 
free to correct me here.)  This is not an appropriate use of plain text 
IMHO.  I do not believe, moreover, that the ZWJ/ZWNJ mechanism is 
appropriate for this sort of thing.  This is rich text, and other ligation 
controls should be used.

> Therefore, I would be much happier if the discussion of the 'standard' 
> case wasn't as anglo-centric and allowed more directly for the fact that 
> while fonts are in control of what ligatures are provided, layout engines 
> may be in control of what and how many optional ligatures to use, the 
> text (!) must be in control of where ligatures are mandatory or 
> prohibited.
>

Which is what Unicode 3.2 says.  (You said it very nicely here, though.)

(The standard case, BTW, seems to be Anglo-centric largely because this is 
an English-speaking list and people always seem to start out with the "ct" 
ligature they'd like to put in words like "respectfully."  Sorry about 
that.)

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/





Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-01 Thread Asmus Freytag

At 11:34 AM 6/30/02 -0600, John H. Jenkins wrote:
>Remember, Unicode is aiming at encoding *plain text*.  For the bulk of 
>Latin-based languages, ligation control is simply not a matter of *plain 
>text*—that is, the message is still perfectly correct whether ligatures 
>are on or off.  There are some exceptional cases.  The ZWJ/ZWNJ is 
>available for such exceptional cases.

Remember also that the simplistic model you present already breaks down for 
German, since the same character pair may or may not allow ligation 
depending on the content and meaning of the text - features that in the 
Unicode model are relegated to *plain* text.

Therefore, I would be much happier if the discussion of the 'standard' case 
wasn't as anglo-centric and allowed more directly for the fact that while 
fonts are in control of what ligatures are provided, layout engines may be 
in control of what and how many optional ligatures to use, the text (!) 
must be in control of where ligatures are mandatory or prohibited.

I don't know of a case where a mandatory ligature of two characters is 
sometimes prohibited, which means that for all practical cases, mandatory 
ligatures, like LAM-ALIF tend to also be handled by the layout engine. But 
ligature prohibition is a quite regular feature of German orthography and 
any Unicode-based system that intends to provide generic support for Latin 
script use, should be able to support it. As the prohibition is on a 
case-by-case and word-by-word basis, it has to be marked in the text.
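
As an illustration of marking a prohibition in the text, the sketch below inserts ZERO WIDTH NON-JOINER at a chosen position. The example word "Auflage" (Auf + lage, where an "fl" ligature would span the morpheme boundary) is an assumption of this sketch and does not come from the thread; the function name is likewise illustrative.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def prohibit_ligature(word, boundary):
    """Insert ZWNJ before word[boundary] so no ligature forms across it."""
    return word[:boundary] + ZWNJ + word[boundary:]


# Auf + lage: the f and l belong to different morphemes.
marked = prohibit_ligature("Auflage", 3)
```

Because the mark travels with the word itself, any font or layout engine that honours ZWNJ will leave "f" and "l" unligated here while still ligating them elsewhere.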

A./




Re: ZWJ and Latin Ligatures

2002-07-01 Thread Tex Texin

In following this thread, I am trying to find where, in a non-plain text
product, I have the ability to make two characters into a ligature or
cursively connected. (The latter I guess I could do with a wholesale
font change.) For example, I looked at Microsoft Word and found that I
can make the text shimmer and sparkle and have either marching red or
black ants (When will Unicode have characters to do those?) but I don't
see how to control ligatures.

As someone who is not a high-end typographer, I don't recall ever having
the ability to change ligaturing without replacing characters. I also do
not ever recall having the need to, which I understand is part of the
rationale for the Unicode policy. I am not trying to argue one way or
the other. The discussion refers to other ways of influencing a font
with respect to ligature and I don't recall ever seeing a way to do
this. What kinds of products have these abilities?

tex
-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-01 Thread John H. Jenkins


On Monday, July 1, 2002, at 06:28 AM, James Kass wrote:

>
> John H. Jenkins wrote:
>
>> That seems pretty clear to me.  If you want a "ct" ligature in your
>> document because you think it "looks cool," then you use some 
>> higher-level
>> protocol.  The "looks cool" factor simply doesn't apply unless you know
>> what font you're dealing with, because "ct" "looks cool" in some fonts,
>> but not others.
>
> It's enough that an author would want a "ct" ligature to appear in text,
> the motivation for the desire isn't relevant.  Authors who want to
> specify a certain ligature know about font selection.
>

Au contraire, because of the italic analog.  I may *want* a particular 
word to be in italics, but that doesn't mean that the italics belong in 
plain text.

It is not the goal of Unicode to allow the complete representation of an 
author's intent in plain text.  I can't typeset "Alice in Wonderland" in 
plain text.  I'm sorry, but the Mouse's tail would simply get in the way.

There's another level of problem here, too.  What if it isn't the author's 
intent, but an artifact of the particular typesetter?

> One problem with TR28 is that it is worded so that it appears to
> be "in addition" to earlier guidelines.  This implies that the examples
> used in TR27, for one, are still valid.  In TR27, font developers are
> urged to add things like "f+ZWJ+i" to existing tables where "f+i"
> is already present.
>

And for the record, Apple is doing that.

> Another problem with TR28 is that its date is earlier than the date
> on TR27.  This suggests that TR27 is more current.
>

This may be a point for clarification in TR28.

> Another issue is that a search of the Unicode site for "controlling
> ligatures" gives TR27 as a hit, but not TR28.
>
> Having slept on this, I concur that it might be "cool" to be able to
> turn on or turn off ligatures over a range of text or an entire file
> using a higher level protocol.  However, options should be preserved
> for the user.  Ligature selection is a task for the author/typesetter
> at the fundamental level; it should not be completely left to the
> rendering system.
>

Er, James.  I've never said it should.  The rendering system should have 
the ability to do default ligation.  The user should be able to override 
that behavior.  That's what happens on systems I see.  If they do ligation 
at *all*, they have a default behavior which can be overridden.

>> The programs that provide ligature control do so by means of having the
>> user select a range of text and then changing the level of ligation.  The
>> type formats like OpenType or AAT support this by allowing the type
>> designer to categorize ligatures as "common," "rare," "required," and so
>> on.  Thus, if I'm typesetting a document in Adobe InDesign, I'll select
>> text, and turn "rare" ligatures on and thus see the "ct" ligature, if it
>> exists in the font and if the type designer has designated it a "rare"
>> ligature.
>
> That's a lot of ifs and it leaves too much to chance.  When an author
> determines that, for instance, a "ct" ligature is required, there needs
> to be a method to encode it which is unambiguous.  ZWJ fits the bill.
>

I'll repeat a point that I've made over and over and over.

The "ct" ligature does not exist in and of itself.  It is a part of a 
typeface.  It doesn't make sense in general to ask for the formation of a 
"ct" ligature without any reference to the typeface you're using.

The implication of what you're saying is that Latin typefaces should be 
*required* to have a "ct" ligature on the off chance that the author of 
text determines that it's "required" in a particular context.  That gives 
most type designers the heebie jeebies.  It's bad enough that Adobe and 
Apple are making them stick useless "fi" and "fl" ligatures in their fonts.

In any event, if an author determines that a "ct" ligature is honestly and 
absolutely *required* in a particular context (as opposed to being 
desirable), then the ZWJ mechanism exists.

>> To be frank, turning on an optional "ct" ligature throughout a document 
>> by
>> means of inserting ZWJ everywhere you want it to take place makes as much
> sense in that model -- the model that Western typography uses for languages
> such as English -- as having the user insert a <i></i> pair around every
>> letter they want in italics.
>
> Not at all.  This is apples and oranges.  The italic tags operate upon
> every character in the enclosed string equally.  Using a similar ligature
> tag would be expected to make ligatures wherever possible within the
> enclosed string according to the user system's ability to render
> ligatures... irrespective of the author's intent.  Depending upon the
> system, the same run of text could be expressed with no ligatures
> at all in a monospaced font or as scriptio continua in a handwriting
> font.
>

Er, you've just made my point, haven't you?  The typeface makes a 
difference.  If you're ever in a situation where the typeface 

Re: ZWJ and Latin Ligatures

2002-07-01 Thread John H. Jenkins


On Monday, July 1, 2002, at 10:16 AM, Michael Everson wrote:

> Some nice person just said to me privately:
>
>> Michael Everson wrote:
>>
>>>  In my paper http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2317.pdf I raised
>>>  a lot of questions about exceptions and the use of these. I don't
>>>  think they were ever all answered. My other papers, N2141 and N2147,
>>>  show a number of examples of ligation which is not particularly
>>>  predictable. That's what ZWJ is supposed to be for.
>>
>> That's because some people (not to mention any ad-hominem names; there
>> is more than one) are more interested in saying "This is a simple
>> problem, and the rendering systems of the future (or my Mac today) will
>> handle it automatically" than in answering the complex linguistic and
>> orthographic questions you raised.
>>

For the record, I (at least) have never asserted that Mac (or any other) 
system software will ever gain the ability to handle ligation on a 
completely automatic basis.  In any event, the ZWJ/ZWNJ mechanism has no 
advantage over any higher-level protocol when it comes to software support, 
since it's all being done via AAT/OpenType/Graphite or something similar 
in any event.

I guess one thing that's frustrating for me personally in this perennial 
discussion is the creation of this false dichotomy, that ligation control 
either *must* be in plain text or *must* be expressly forbidden in plain 
text.  I would agree, Michael, that your arguments that some degree of 
ligation control belongs in plain text were unanswerable.  You did a good 
job there.  But at the same time, I've never heard you argue that the only 
way to turn ligatures on or off is in plain text.

I feel compelled to reiterate my own feelings on the subject:  Ligation in 
Latin text is generally a matter of stylistic preference, and depends on 
the specific typeface being used and its set of available ligatures.  
There are exceptions, and these should be handled via the ZWJ/ZWNJ 
mechanism.  Where ligation is merely a matter of stylistic preference, 
however, it should be handled by some other mechanism which can take the 
specific capacities of a typeface into consideration.  System and other 
software can (and should) provide default ligation which the user should 
be able to override.

And under no circumstances should new Latin ligatures be added to Unicode.

>> Personally I think your ZERO-WIDTH LIGATOR papers are among the best of
>> all your Unicode-related papers.  I agreed with the decision to unify
>> the ligation function with ZWJ rather than creating a new character, but
>> your arguments about Latin, Greek, Runic, Old Hungarian, etc. ligation
>> were thorough and unassailable.
>
> Thank you, nice person. It's nice to know that someone else looked at the 
> argument and came up with the same conclusion that I did.
>

For the record, Michael, this was the general feeling of the UTC when the 
matter was debated there.

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/





Re: ZWJ and Latin Ligatures

2002-07-01 Thread Michael Everson

Some nice person just said to me privately:

>Michael Everson wrote:
>
>>  In my paper http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2317.pdf I raised
>>  a lot of questions about exceptions and the use of these. I don't
>>  think they were ever all answered. My other papers, N2141 and N2147,
>>  show a number of examples of ligation which is not particularly
>>  predictable. That's what ZWJ is supposed to be for.
>
>That's because some people (not to mention any ad-hominem names; there
>is more than one) are more interested in saying "This is a simple
>problem, and the rendering systems of the future (or my Mac today) will
>handle it automatically" than in answering the complex linguistic and
>orthographic questions you raised.
>
>Personally I think your ZERO-WIDTH LIGATOR papers are among the best of
>all your Unicode-related papers.  I agreed with the decision to unify
>the ligation function with ZWJ rather than creating a new character, but
>your arguments about Latin, Greek, Runic, Old Hungarian, etc. ligation
>were thorough and unassailable.

Thank you, nice person. It's nice to know that someone else looked at 
the argument and came up with the same conclusion that I did.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-01 Thread James Kass


John H. Jenkins wrote:

> That seems pretty clear to me.  If you want a "ct" ligature in your
> document because you think it "looks cool," then you use some higher-level
> protocol.  The "looks cool" factor simply doesn't apply unless you know
> what font you're dealing with, because "ct" "looks cool" in some fonts,
> but not others.

It's enough that an author would want a "ct" ligature to appear in text;
the motivation for the desire isn't relevant.  Authors who want to
specify a certain ligature know about font selection.

One problem with TR28 is that it is worded so that it appears to
be "in addition" to earlier guidelines.  This implies that the examples
used in TR27, for one, are still valid.  In TR27, font developers are
urged to add things like "f+ZWJ+i" to existing tables where "f+i"
is already present.

Another problem with TR28 is that its date is earlier than the date
on TR27.  This suggests that TR27 is more current.

Another issue is that a search of the Unicode site for "controlling
ligatures" gives TR27 as a hit, but not TR28.

Having slept on this, I concur that it might be "cool" to be able to
turn on or turn off ligatures over a range of text or an entire file
using a higher level protocol.  However, options should be preserved
for the user.  Ligature selection is a task for the author/typesetter
at the fundamental level; it should not be completely left to the
rendering system.

> The programs that provide ligature control do so by means of having the
> user select a range of text and then changing the level of ligation.  The
> type formats like OpenType or AAT support this by allowing the type
> designer to categorize ligatures as "common," "rare," "required," and so
> on.  Thus, if I'm typesetting a document in Adobe InDesign, I'll select
> text, and turn "rare" ligatures on and thus see the "ct" ligature, if it
> exists in the font and if the type designer has designated it a "rare"
> ligature.

That's a lot of ifs and it leaves too much to chance.  When an author
determines that, for instance, a "ct" ligature is required, there needs
to be a method to encode it which is unambiguous.  ZWJ fits the bill.
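
A minimal sketch of what that encoding looks like in plain text, assuming
Python 3.  The helper names join_at and strip_joiners are made up for
illustration; only the code point U+200D ZERO WIDTH JOINER comes from the
Standard.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def join_at(text, index):
    """Insert ZWJ immediately before the character at `index`."""
    return text[:index] + ZWJ + text[index:]


def strip_joiners(text):
    """Remove every ZWJ, recovering the underlying letters."""
    return text.replace(ZWJ, "")


marked = join_at("act", 2)  # request a ligature between "c" and "t"
print([f"U+{ord(ch):04X}" for ch in marked])
print(strip_joiners(marked) == "act")
```

The request travels with the text itself, so it survives any channel that
carries plain Unicode, and a process that does not care about ligation
(searching, say) can drop the joiner and get the original letters back.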

> To be frank, turning on an optional "ct" ligature throughout a document by
> means of inserting ZWJ everywhere you want it to take place makes as much
> sense in that model—the model that Western typography uses for languages
> such as English—as having the user insert an <i></i> pair around every
> letter they want in italics.

Not at all.  This is apples and oranges.  The italic tags operate upon
every character in the enclosed string equally.  Using a similar ligature
tag would be expected to make ligatures wherever possible within the
enclosed string according to the user system's ability to render
ligatures... irrespective of the author's intent.  Depending upon the
system, the same run of text could be expressed with no ligatures
at all in a monospaced font or as scriptio continua in a handwriting
font.

Furthermore, ZWJ doesn't require proprietary software or proprietary
rich text formats which are often not exchangeable.

> Remember, Unicode is aiming at encoding *plain text*.  For the bulk of
> Latin-based languages, ligation control is simply not a matter of *plain
> text*—that is, the message is still perfectly correct whether ligatures
> are on or off.  There are some exceptional cases.  The ZWJ/ZWNJ is
> available for such exceptional cases.

Three cheers for plain text!  But, we disagree about 'perfectly correct'.
If an author is reproducing an older document in which the "ct"
ligature is used, rendering the "ct" string rather than the ligature
is not faithful to the source.  (Even though it might be semantically
equivalent—it is merely approximately correct...)

How about "Encyclopædia Britannica"?  That's plain text enough.
It's the title of a book; it isn't italic, bold, blue, or green.  To cite
from "Encyclopedia" or "Encyclopaedia" would be correct, but not
perfectly so.

Unicode provides the long "s" form, which is arguably a presentation
form.  Users have the option of directly encoding the long s form
where it is either appropriate or desired.  Trusting something like
long-s-substitution to a higher protocol is not desirable because of
exceptional cases like "Malmesbury" in which the final "s" is used
medially.  Fortunately, since the long s is a Unicode character, no
one has to resort to higher protocols.  Likewise for the "oe" ligature
and other Latin ligatures which are directly covered by Unicode.
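
To illustrate the point, here is a rough sketch, assuming Python 3, of the
kind of automatic rule a higher protocol might apply.  The rule in
naive_long_s (every "s" that is not word-final becomes long s) is a
deliberate oversimplification invented for this example, not a description
of any real rendering system or of historical practice.

```python
import re

LONG_S = "\u017F"  # LATIN SMALL LETTER LONG S


def naive_long_s(word):
    """Replace each lowercase 's' that is followed by a word character."""
    return re.sub(r"s(?=\w)", LONG_S, word)


print(naive_long_s("congress"))    # medial s replaced, final s kept
print(naive_long_s("Malmesbury"))  # the rule also rewrites this medial s
```

The rule cannot know that "Malmesbury" is an exception.  An author who
encodes U+017F only where it belongs, and leaves the plain "s" elsewhere,
needs no such rule at all.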

"Onomatopoeia" and "Onomatopœia" are the same in one sense, much
like "font" and "fount".  Yet both pairs are also different.  Unicoders
have the option of specifying the "oe" ligature in plain text at the
fundamental level.  It is suggested that the Standard be consistent
with regard to Latin ligatures in this respect and preserve the use
of ZWJ for this purpose.
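
A small check, assuming Python 3 and its standard unicodedata module, of
how the Standard itself treats these characters.  The helper name
compat_fold is made up for illustration.

```python
import unicodedata


def compat_fold(text):
    """Apply compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", text)


fi_ligature = "\uFB01"            # LATIN SMALL LIGATURE FI
with_oe = "Onomatop\u0153ia"      # U+0153 LATIN SMALL LIGATURE OE
with_ae = "Encyclop\u00E6dia"     # U+00E6 LATIN SMALL LETTER AE

print(compat_fold(fi_ligature) == "fi")
print(compat_fold(with_oe) == with_oe)
print(compat_fold(with_ae) == with_ae)
```

U+FB01 is a compatibility character and folds to plain "fi", while U+0153
and U+00E6 have no decomposition and survive normalization.  So the
distinction between "Onomatopoeia" and "Onomatopœia" is preserved in plain
text, whereas a directly encoded "fi" ligature is not.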

Best regards,

James Kass.






Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-01 Thread Michael Everson

At 11:34 -0600 2002-06-30, John H. Jenkins wrote:
>
>Remember, Unicode is aiming at encoding *plain text*.  For the bulk 
>of Latin-based languages, ligation control is simply not a matter of 
>*plain text*-that is, the message is still perfectly correct whether 
>ligatures are on or off.  There are some exceptional cases.  The 
>ZWJ/ZWNJ is available for such exceptional cases.

In my paper http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2317.pdf I raised 
a lot of questions about exceptions and the use of these. I don't 
think they were ever all answered. My other papers, N2141 and N2147, 
show a number of examples of ligation which is not particularly 
predictable. That's what ZWJ is supposed to be for.

"Ligation is a normal if sometimes unpredictable feature in the 
following European scripts: Armenian, Cyrillic, Greek, Ogham, Old 
Church Slavonic, Old Hungarian, Runic."

-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-06-30 Thread John H. Jenkins


On Sunday, June 30, 2002, at 05:31 AM, James Kass wrote:

> Can you please point me to a URL for Unicode 3.2 ligature control?
> This link (March 2002):
> http://www.unicode.org/unicode/reports/tr28/
> ...glosses over Latin ligatures suggesting that mark-up should be
> used in some cases and ZWJ in others.
>

The precise language of the TR is:



Ligatures and Latin Typography (addition)

It is the task of the rendering system to select a ligature (where 
ligatures are possible) as part of the task of creating the most pleasing 
line layout. Fonts that provide more ligatures give the rendering system 
more options.

However, defining the locations where ligatures are possible cannot be 
done by the rendering system, because there are many languages in which 
this depends not on simple letter pair context but on the meaning of the 
word in question. 

ZWJ and ZWNJ are to be used for the latter task, marking the non-regular 
cases where ligatures are required or prohibited. This is different from 
selecting a degree of ligation for stylistic reasons. Such selection is 
best done with style markup. See Unicode Technical Report #20, “Unicode in 
XML and other Markup Languages” for more information.



That seems pretty clear to me.  If you want a "ct" ligature in your 
document because you think it "looks cool," then you use some higher-level 
protocol.  The "looks cool" factor simply doesn't apply unless you know 
what font you're dealing with, because "ct" "looks cool" in some fonts, 
but not others.

In real Latin typography, the set of ligatures available with a typeface 
varies from font to font.  Type designers add ligatures (or not) depending 
on their esthetic sense of what looks good and how the letters interact 
with one another.  From a type design perspective, a monospaced font like 
Courier should have no ligatures; they don't make sense.  A rich book font 
like Adobe Minion Pro will have a fairly large but standard set, and a 
calligraphic font like Linotype's Zapfino will have a huge and imaginative 
set.

The programs that provide ligature control do so by means of having the 
user select a range of text and then changing the level of ligation.  The 
type formats like OpenType or AAT support this by allowing the type 
designer to categorize ligatures as "common," "rare," "required," and so 
on.  Thus, if I'm typesetting a document in Adobe InDesign, I'll select 
text, and turn "rare" ligatures on and thus see the "ct" ligature, if it 
exists in the font and if the type designer has designated it a "rare" 
ligature.
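
The model described above can be sketched roughly as follows, assuming
Python 3.  The font table, the bracketed glyph names, and the function
apply_ligature_levels are all hypothetical.  In OpenType the categories
correspond approximately to the registered feature tags "rlig", "liga" and
"dlig"; the sketch uses the plain words instead.

```python
# Hypothetical font: the type designer has sorted ligatures into levels.
FONT_LIGATURES = {
    "required": {},
    "common": {"fi": "<fi>", "fl": "<fl>"},
    "rare": {"ct": "<ct>", "st": "<st>"},
}


def apply_ligature_levels(text, enabled_levels):
    """Return a glyph list for `text`, ligating pairs from enabled levels."""
    active = {}
    for level in enabled_levels:
        active.update(FONT_LIGATURES.get(level, {}))
    glyphs, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in active:
            glyphs.append(active[pair])  # one ligature glyph for two letters
            i += 2
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs


print(apply_ligature_levels("fact", {"common"}))
print(apply_ligature_levels("fact", {"common", "rare"}))
```

Note that the text being shaped is the same string in both calls.  Only
the setting attached to the selected range changes, and the result depends
on what this particular font happens to provide.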

To be frank, turning on an optional "ct" ligature throughout a document by 
means of inserting ZWJ everywhere you want it to take place makes as much 
sense in that model—the model that Western typography uses for languages 
such as English—as having the user insert an <i></i> pair around every 
letter they want in italics.

Remember, Unicode is aiming at encoding *plain text*.  For the bulk of 
Latin-based languages, ligation control is simply not a matter of *plain 
text*—that is, the message is still perfectly correct whether ligatures 
are on or off.  There are some exceptional cases.  The ZWJ/ZWNJ is 
available for such exceptional cases.
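
As a rough illustration of the division of labour in the TR language
quoted above, here is a toy shaper, assuming Python 3.  The function
shape_with_joiners, the font_pairs set and the bracketed glyph names are
invented for the example and do not describe any real rendering system.

```python
ZWJ, ZWNJ = "\u200D", "\u200C"


def shape_with_joiners(text, font_pairs, style_ligatures=False):
    """Toy shaper: ZWJ requests a ligature, ZWNJ blocks one, and the
    style setting decides every pair that carries no joiner."""
    glyphs, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch in (ZWJ, ZWNJ):
            i += 1  # joiners are never drawn themselves
            continue
        # Explicit request in the text: letter, ZWJ, letter.
        if i + 2 < len(text) and text[i + 1] == ZWJ and ch + text[i + 2] in font_pairs:
            glyphs.append("<" + ch + text[i + 2] + ">")
            i += 3
            continue
        # Stylistic choice from markup: adjacent letters only, so a ZWNJ
        # sitting between them prevents the match.
        if style_ligatures and i + 1 < len(text) and ch + text[i + 1] in font_pairs:
            glyphs.append("<" + ch + text[i + 1] + ">")
            i += 2
            continue
        glyphs.append(ch)
        i += 1
    return glyphs


font = {"ct", "ff"}
print(shape_with_joiners("act", font))                        # style off
print(shape_with_joiners("ac" + ZWJ + "t", font))             # exceptional case
print(shape_with_joiners("shelf" + ZWNJ + "ful", font, True)) # ligature blocked
```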

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/