ct, fj and blackletter ligatures

2002-11-01 Thread Thomas Lotze
Hi,

the Alphabetic Presentation Forms starting at U+FB00 contain a number of
ligatures for the Latin script, among them the more common ones like fi and
fl, but also rather exotic ones like st.

However, I find there are a couple of other ligatures in use, namely the
ct ligature (for instance to be found in Adobe Garamond), the fj
ligature, and a couple of ligatures common in blackletter typesetting:
among them ch, ck and tz. Would it be a good idea to propose these
ligatures for inclusion in Unicode?

Cheers, Thomas

-- 
Thomas Lotze

thomas.lotze at gmx.net  http://www.thomas-lotze.de/




FW: ct, fj and blackletter ligatures

2002-11-01 Thread Dominikus Scherkl
> rather exotic ones like st.
This one is VERY common in German.
But anyway - ligatures are deprecated, you shouldn't use them.
And no new ligatures will be added.
It's up to a font (or display renderer) to ligate characters.
-- 
Dominikus Scherkl
[EMAIL PROTECTED]




Re: ct, fj and blackletter ligatures

2002-11-01 Thread John Cowan
Thomas Lotze scripsit:

> the Alphabetic Presentation Forms starting at U+FB00 contain a number of
> ligatures for the Latin script, among them the more common ones like fi and
> fl, but also rather exotic ones like st.

Those exist basically for compatibility and round-tripping with non-Unicode
character sets.  Their use is discouraged.  No more will be encoded.
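
The compatibility status of these ligatures can be seen in their decompositions. The following is a minimal sketch, assuming Python's standard `unicodedata` module; the helper name `decompose_ligatures` is made up for illustration, and NFKC also folds many other compatibility characters, so it is not a ligature-only filter.

```python
import unicodedata

# The seven Latin ligatures at the start of the Alphabetic Presentation
# Forms block (U+FB00..U+FB06).
LATIN_LIGATURES = [chr(cp) for cp in range(0xFB00, 0xFB07)]

def decompose_ligatures(text: str) -> str:
    """Replace compatibility characters by their plain equivalents (NFKC)."""
    return unicodedata.normalize("NFKC", text)

for ch in LATIN_LIGATURES:
    # Print code point, character name, and the decomposed letters.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {decompose_ligatures(ch)!r}")
```

A text containing U+FB01 therefore compares equal to the plain spelling once both sides are normalized this way.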

(FAQkeeper, this or something like it should go in the Unicode FAQ.
The ligature_digraph page doesn't really address the question directly.)

-- 
My corporate data's a mess! John Cowan
It's all semi-structured, no less.  http://www.ccil.org/~cowan
But I'll be carefree[EMAIL PROTECTED]
Using XSLT  http://www.reutershealth.com
In an XML DBMS.




Re: ct, fj and blackletter ligatures

2002-11-01 Thread William Overington
The matter of ligatures arises fairly often in this discussion forum, often
in relation to German Fraktur, but also in relation to English printing of
the 18th Century and the use of fj in Norwegian.

In relation to regular Unicode the policy is that no more ligatures are to
be encoded.  My own view is that this should change.  However, that is
unlikely to do so.

Earlier this year, following from a posting about Fraktur ligatures, I
produced some encodings for ligatures using the Private Use Area.  I have
published them on the web at the following place.

http://www.users.globalnet.co.uk/~ngo/golden.htm

These are my own Private Use Area code point allocations for various
ligatures.  They are not in any way a standard yet they are a consistent set
which may be useful to those who wish to use them.  The only use I know of
any of them in a published font is in the Code2000 font, produced by James
Kass.  James uses the code points of this set for ct, fj and ffj in his
Code2000 font.

I feel that it might well be of interest to you, for your background
knowledge, to have a look at the encodings which I have produced, yet I
mention that these Private Use Area encodings are a matter of some
controversy.  Using them could lead to documents existing which could not be
text sorted alphabetically, or spellchecked.  However, if someone is just
wishing to produce a print out of some text with some ligatures in the text,
then the golden ligatures collection can be useful.  There seems to be a lot
of theoretical possibilities for doing ligatures with Unicode fonts using
advanced font technology using the latest computers, yet if, say, someone
wants to set and print out a page of Fraktur, that possibility does not
seem, as far as I know, to be a practically achievable result at the present
time using a piece of text encoded in regular Unicode using a font which
uses only regular Unicode encoding.  Indeed, it seems more likely that one
would need to use a Fraktur font with ligatures encoded with a code number
below 255, that is, a font which is not Unicode compatible.  The golden
ligatures collection is Unicode compatible, though, as I say, it is not a
standard.  It is just one person's self-published writing.  I like to think
of it as an artform, much as if I had produced a painting and placed a copy
of the painting on the web.  That is, it exists, it may be interesting to
people, yet it does not in any way prevent anyone else from doing something
different and it does not require anyone else to take any notice of it, yet
it is a cultural item in the world of art.

So, it depends what one is wanting to do.  If your enquiry is solely in
relation to formal encoding of ligatures in regular Unicode, then the golden
ligatures collection will be of no use to you.  However, if you are
producing a black letter font as part of your studies and would like to
encode ligatures, then the golden ligatures collection might perhaps be of
interest to you.  For example, if such a font were encoded using advanced
font technology, then the golden ligatures collection code points would not
be the way to approach the problem, though they could, if you so chose, be
used to provide an additional way of accessing the glyphs for people who
were trying to produce printouts using, say, a Windows 95 or a Windows 98
system.  If, however, such a font were produced as an ordinary TrueType
font, then you would need code points in order to access the ligature
glyphs, one code point for each glyph.  In order to
be Unicode compatible, those code points would need to be in the Private Use
Area range of U+E000 to U+F8FF.  There is essentially complete freedom of
choice as to which code points to use, though the lower part is perhaps best
due to the suggestions about Private Use Area usage in the Unicode
specification.  However, the golden ligatures collection of code points is
there for your consideration if you wish.

Within my collection of code point allocations, ct is U+E707, fj is U+E70B,
ch is U+E708, ck is U+E709, tz is U+E70F.
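
As a quick sanity check, the allocations listed above can be tested against the Private Use Area range. This is only a sketch: the dictionary repeats the code points from this posting (a private, non-standard allocation), and the names `GOLDEN_LIGATURES` and `in_bmp_pua` are invented for the example.

```python
import unicodedata

# Code points as given in the posting above; one person's private
# allocation, not part of any standard.
GOLDEN_LIGATURES = {
    "ct": 0xE707,
    "ch": 0xE708,
    "ck": 0xE709,
    "fj": 0xE70B,
    "tz": 0xE70F,
}

def in_bmp_pua(cp: int) -> bool:
    """True if cp lies in the BMP Private Use Area, U+E000..U+F8FF."""
    return 0xE000 <= cp <= 0xF8FF

for letters, cp in GOLDEN_LIGATURES.items():
    # General category "Co" means private use; such characters carry no name
    # and no decomposition, so software cannot tell what they stand for.
    print(letters, f"U+{cp:04X}", in_bmp_pua(cp), unicodedata.category(chr(cp)))
```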

These are all in the following document.

http://www.users.globalnet.co.uk/~ngo/ligature.htm

The ffj is encoded at U+E773 in the following document.

http://www.users.globalnet.co.uk/~ngo/ligatur2.htm

There are some black letter ligature encodings including pp at U+E76C and
ppe at U+E77E in the following document.

http://www.users.globalnet.co.uk/~ngo/ligatur5.htm

The Private Use Area is described in Chapter 13, section 13.5 of the Unicode
specification.  There is a file named ch13.pdf available from one of the
pages in the http://www.unicode.org website.

The main index page of our family web site is as follows.

http://www.users.globalnet.co.uk/~ngo

William Overington

2 November 2002


Re: ct, fj and blackletter ligatures

2002-11-02 Thread Michael Everson
At 07:18 + 2002-11-02, William Overington wrote:

> These are my own Private Use Area code point allocations for various
> ligatures.  They are not in any way a standard yet they are a consistent set
> which may be useful to those who wish to use them.  The only use I know of
> any of them in a published font is in the Code2000 font, produced by James
> Kass.  James uses the code points of this set for ct, fj and ffj in his
> Code2000 font.


James, if you would kindly take these crap out of your font we could 
put an end to this silliness.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: ct, fj and blackletter ligatures

2002-11-02 Thread Mark Davis
If you (or anyone else) have an idea for a Q&A for the FAQ, just write it up
and submit it on http://www.unicode.org/unicode/reporting.html.

I see a great many promising Q&A's go by on this list; it would really help
to get some volunteers to clean them up a bit and submit them. (They don't
have to be formatted; just plain text in the style of any of the existing
Q&A's.)

Mark
__
http://www.macchiato.com
►  “Eppur si muove” ◄






Re: ct, fj and blackletter ligatures

2002-11-02 Thread Peter_Constable
On 11/02/2002 01:18:43 AM "William Overington" wrote:

>The matter of ligatures arises fairly often in this discussion forum

Mostly because there is a regular flow of newcomers who haven't yet 
learned about the Standard in detail and who fail to check the FAQ page 
before raising the issue, or because of others who simply keep coming back 
to it -- whether it's because they just don't get it, are bored, or what, 
I don't know.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>





Re: ct, fj and blackletter ligatures

2002-11-02 Thread Thomas Lotze
On Sat, 2 Nov 2002 07:18:43 -
"William Overington" <[EMAIL PROTECTED]> wrote:

> In relation to regular Unicode the policy is that no more ligatures
> are to be encoded.  My own view is that this should change.  However,
> that is unlikely to do so.

I agree with you. Ligatures may have semantics that can be composed from
characters already Unicode encoded, but they are separate glyphs whose
shape cannot be inferred from that of others but has to be designed
separately and stored somewhere in a font.

> http://www.users.globalnet.co.uk/~ngo/golden.htm

Thanks, I'll look at that.

> There seems to be a lot
> of theoretical possibilities for doing ligatures with Unicode fonts
> using advanced font technology using the latest computers,

That's something I don't understand; in fact I wondered about this
before when I read Dominikus' posting where he says "It's up to a font
(or display renderer) to ligate characters."

After all, it's the typographer's decision whether to use a ligature in
a given circumstance; using ligatures is not as simple as blindly
replacing all occurrences of a character sequence by the ligature. So, a
font cannot possibly be made responsible for ligature handling since it
doesn't know where to use a ligature; that knowledge can only reside
within the typeset document. The font can only contain the glyph shape
of the ligature, to be used where the typographer thinks it's
appropriate.

> Indeed, it seems more likely that one
> would need to use a Fraktur font with ligatures encoded with a code
> number below 255,

Why below 255?

Thank you for your interesting response.

Cheers, Thomas

-- 
Thomas Lotze

[EMAIL PROTECTED]  http://www.thomas-lotze.de/





Re: ct, fj and blackletter ligatures

2002-11-02 Thread Thomas Lotze
On Sat, 2 Nov 2002 10:38:36 +
Michael Everson <[EMAIL PROTECTED]> wrote:

> At 07:18 + 2002-11-02, William Overington wrote:
> 
> >These are my own Private Use Area code point allocations for various
> >ligatures.  They are not in any way a standard yet they are a
> >consistent set which may be useful to those who wish to use them. 
> >The only use I know of any of them in a published font is in the
> >Code2000 font, produced by James Kass.  James uses the code points of
> >this set for ct, fj and ffj in his Code2000 font.
> 
> James, if you would kindly take these crap out of your font we could 
> put an end to this silliness.

Excuse a silly question or two, but I'm rather new to Unicode. Why
shouldn't he be allowed to use the Private Use Area just as he
personally sees fit? How would you approach the ligature problem
instead? 

Cheers, Thomas

-- 
Thomas Lotze

[EMAIL PROTECTED]  http://www.thomas-lotze.de/





Re: ct, fj and blackletter ligatures

2002-11-02 Thread jameskass

Michael Everson wrote,

> James, if you would kindly take these crap out of your font we could 
> put an end to this silliness.

If I want to encode the ct ligature, I can use "c" + ZWJ + "t".  But,
if I want to display the ct ligature on the tools available here, U+E707
is the only option.

If someone sends me a file containing the "ct" ligature encoded using
the ZWJ method, I can open the file in an editor and globally replace
all instances of "c" + ZWJ + "t" with U+E707 and display the file as
the author intended. 
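
The global replacement described here can be sketched in a few lines of Python. U+E707 is the private ct code point discussed in this thread, not a standard character, and the function names are hypothetical.

```python
ZWJ = "\u200d"      # ZERO WIDTH JOINER
PUA_CT = "\ue707"   # private ct ligature code point from this thread

def zwj_to_pua(text: str) -> str:
    """For display only: replace c + ZWJ + t by the private ligature code point."""
    return text.replace("c" + ZWJ + "t", PUA_CT)

def pua_to_zwj(text: str) -> str:
    """For interchange: restore the standard character sequence."""
    return text.replace(PUA_CT, "c" + ZWJ + "t")

source = "a" + "c" + ZWJ + "t" + "ion"
display = zwj_to_pua(source)
print([f"U+{ord(ch):04X}" for ch in display])
```

Keeping the ZWJ form as the stored text and converting only for display means the file stays searchable and can be converted back without loss.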

(In response to your kind advice on matters PUA, and the good advice
of others, I am removing certain presentation forms from the PUA
of Code2000, though.  The Telugu forms, for instance, have already
been removed and won't be directly visible in the next release.)

William Overington wrote,

> The matter of ligatures arises fairly often in this discussion forum, often
> in relation to German Fraktur, but also in relation to English printing of
> the 18th Century and the use of fj in Norwegian.

The "ct" ligature, for one, isn't confined to English typography and isn't 
limited to the 18th century.

Spelman's "Of the Law Terms: ...", a 17th century publication, uses the
"ct" ligature.

The title page of Alexander of Rhodes' catechism in Latin and Quoc-Ngu,
printed in Rome, 1649, uses the "ct" ligature in the Latin portions of
the text.  One presumes that if the Viet portion of the text had included
any "ct" strings, they, too, would have been ligated.

(For the long-s fans, the word "Missionario" on the title page uses
long-s followed by final-s.  Anyone seeking orthotypographic rules
for medieval text should remember that even spelling wasn't well
standardized [or standardised].  Shakespeare, for example, didn't 
know how to spell his own last name.*)

A couple of 16th century books shown in "INTRODUCCION A LA
HISTORIA DEL LIBRO Y DE LAS BIBLIOTECAS" use the "ct" ligature
in French (figura 64, Geofroy Tory de Bourges, Aux Studieux &
bons Lecteurs dit & donne humble Salut.) and in Spanish (figura 72,
DOCTRINA / CHRISTIANA,EN LENGVA ME / xicana muy necessaria : ...)

Best regards,

James Kass,

* Neither did Queen Elizabeth I  - (heh heh)





Re: ct, fj and blackletter ligatures

2002-11-02 Thread jameskass

Thomas Lotze wrote,

> ... Why
> shouldn't he be allowed to use the Private Use Area just as he
> personally sees fit? 

Many "Unicoders" regard the PUA as some kind of a "Phantom Zone"
into which all of the "bad glyphs" are banished forever, never
to again be mentioned in "polite society".

Others consider the "Private" in PUA to be a misnomer, considering
it to be more "Public".  In other words, it's a Free Zone reserved by
the consortium for open use.  Many users use the PUA for temporary
work-around solutions to display issues.

> How would you approach the ligature problem
> instead? 
> 

Ideally, ligation should be handled by the font and operating system
based upon 1) author's wishes, 2) user's wishes, 3) computer's wishes.
(Where 1 rules yet 2 can over-ride 1.  And 3 can't over-ride either.)

In order to preserve important aspects of text processing, including
spelling validation and sorting/indexing, the Unicode Standard uses
some invisible, no-width formatting characters.  The ZWJ (zero-width
joiner), for example, requests the OS and font to provide a connected
or joined glyph in substitution for the string in the display, if such 
a glyph is available in the font.  So, the string "c" plus ZWJ 
plus "t" would be expected to render a "ct" ligature in display if 
possible.
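
For readers who have not met the ZWJ before, a small Python illustration of the sequence just described may help. It only inspects the characters; whether a ligature is actually drawn depends on the font and rendering system.

```python
import unicodedata

ZWJ = "\u200d"

# "action" with an explicit request to ligate c and t.
word = "a" + "c" + ZWJ + "t" + "ion"

print(unicodedata.name(ZWJ))        # ZERO WIDTH JOINER
print(unicodedata.category(ZWJ))    # Cf (a format character)
print(len(word), len("action"))     # 7 6: the joiner is a real character in the text
```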

One popular, upcoming  method of providing such substitutions involves
OpenType technology.  Eagerly awaited advances in OpenType have been
occurring frequently of late, and perhaps more complete ligation support
for Latin typography will appear soon.

Best regards,

James Kass.




Re: ct, fj and blackletter ligatures

2002-11-02 Thread Thomas Lotze
On Sat, 02 Nov 2002 16:06:53 +
[EMAIL PROTECTED] wrote:

> The ZWJ (zero-width
> joiner), for example, requests the OS and font to provide a connected
> or joined glyph in substitution for the string in the display, if such
> a glyph is available in the font.

In the meantime, I found out about ZWJ (this one could be mentioned in
the FAQ, BTW). Now I agree that it is preferable not to use ligature
code points in documents. However, this isn't a matter of principle, it
just avoids having to resolve ligatures into their constituents when,
eg, searching documents, and requires instead ignoring the ZWJ, which is
easier to do.
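
Ignoring the joiner when searching could look like the sketch below. The helper names are invented, and treating ZWNJ the same way as ZWJ is an assumption made here for symmetry; it would not be appropriate for scripts where these characters affect meaning.

```python
# Map ZWNJ (U+200C) and ZWJ (U+200D) to None so str.translate deletes them.
JOINERS = {0x200C: None, 0x200D: None}

def strip_joiners(text: str) -> str:
    """Remove zero-width joiner and non-joiner characters."""
    return text.translate(JOINERS)

def contains(haystack: str, needle: str) -> bool:
    """Substring search that ignores ligature hints on both sides."""
    return strip_joiners(needle) in strip_joiners(haystack)

doc = "The rea" + "c\u200dt" + "ion was swift."
print("reaction" in doc)          # False: the joiner breaks a naive search
print(contains(doc, "reaction"))  # True
```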

Regardless of how the document is coded, the fact remains that ligature
glyph shapes have to be stored in the font, at some code point. IMO,
they might just as well be given official code points instead of being
banned to the PUA. A side effect of making them official could be that
their use and, more importantly, their being provided in fonts in the
first place, is encouraged, which would be good for the quality of
computer typesetting.

> One popular, upcoming  method of providing such substitutions involves
> OpenType technology.

As long as this doesn't mean ligature support in conjunction with
Unicode becomes (at least in practice) restricted to Opentype fonts,
this seems to be a good approach indeed.

Cheers, Thomas

-- 
Thomas Lotze

[EMAIL PROTECTED]  http://www.thomas-lotze.de/





Re: ct, fj and blackletter ligatures

2002-11-02 Thread Thomas Lotze
On Sat, 02 Nov 2002 17:21:06 +
[EMAIL PROTECTED] wrote:

> This is possible because, other than the cmap 
> (character-to-glyph mapping) table, all of the other tables in
> the font use a glyph index [...] internally.
> 
> Such glyphs, since they can't be directly called, are only accessible
> via so-called "smart font" technology, like OpenType, AAT, and
> Graphite.

How does this compare to unmapped glyphs in Type1 fonts, which can be
made accessible by re-encoding the font? Are they hidden at a deeper
level, or is it essentially the same thing? Do they get glyph names so a
program that can parse the font file can identify and use them even
though they are not mapped?

Cheers, Thomas

-- 
Thomas Lotze

[EMAIL PROTECTED]  http://www.thomas-lotze.de/





RE: ct, fj and blackletter ligatures

2002-11-02 Thread Carl W. Brown
Thomas,

It seems that the private use area is abused.  If you are sending characters
between two systems that are not a part of the Unicode standard then you can
use the private use area with agreed code points.

With ligatures you scan the text and identify ligature pairs.  The resultant
text is then printed or displayed using new code points that are internal to
that system.  There is a code point range U+FDD0 to U+FDEF that can be used
for internal processes.
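
The range mentioned here differs from the Private Use Area: U+FDD0..U+FDEF are noncharacters, intended for internal use and not for interchange. A small sketch of how one might tell the two apart, using Python's `unicodedata` (the function name is made up):

```python
import unicodedata

def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

for cp in (0xFDD0, 0xFDEF, 0xFFFE, 0xE707):
    # Noncharacters report category "Cn"; PUA code points report "Co".
    print(f"U+{cp:04X}", is_noncharacter(cp), unicodedata.category(chr(cp)))
```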

Carl








Re: ct, fj and blackletter ligatures

2002-11-02 Thread John Hudson
At 07:24 11/2/2002, Thomas Lotze wrote:


> On Sat, 2 Nov 2002 07:18:43 -
> "William Overington" <[EMAIL PROTECTED]> wrote:
>
> > In relation to regular Unicode the policy is that no more ligatures
> > are to be encoded.  My own view is that this should change.  However,
> > that is unlikely to do so.
>
> I agree with you. Ligatures may have semantics that can be composed from
> characters already Unicode encoded, but they are separate glyphs whose
> shape cannot be inferred from that of others but has to be designed
> separately and stored somewhere in a font.


Thomas, please go and read the FAQ and the relevant parts of the Unicode 
Standard before you start agreeing with William. Yes, ligatures are 
separate glyphs, but not every glyph in a font needs to be encoded. A ct 
ligature is a variant glyph representation of the characters c and t; it 
does not need to be encoded, because it is possible to display the 
character sequence ct with a ligature using font layout features. Unicode 
is a *character* encoding standard, not a glyph encoding scheme. As 
previously noted, the handful of Latin ligatures included in the Alphabetic 
Presentation Forms block are included only for backwards compatibility with 
non-Unicode standards that did not have a good character/glyph distinction.

Please also note, and this is very important, that using Private Use Area 
codepoints for elements that are meant to represent sequences of characters 
in normal text, such as ligatures, is a REALLY BAD IDEA. This has been 
explained to William dozens of times, but he appears to be too wrapped up 
in his own erroneous brilliance to listen to reason. If you use PUA 
codepoints for glyph variants in text, you immediately lose all the 
benefits of a clean character/glyph distinction: you cannot sort text, you 
cannot spellcheck text, you cannot search text, and you have absolutely no 
guarantee that another user is going to be able to correctly display your text.
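
The loss of searching and sorting can be shown with a short Python sketch. It uses the private code point U+E707 from this thread purely as an example, and sorts by code point, which is a simplification of real collation.

```python
import unicodedata

PUA_CT = "\ue707"              # private ct code point discussed in this thread
plain = "action"
pua = "a" + PUA_CT + "ion"     # same word with the ligature as a PUA character

print("ct" in plain)           # True
print("ct" in pua)             # False: the letters c and t are no longer in the text

# Normalization has no decomposition for a private use character.
print(unicodedata.normalize("NFKC", pua) == pua)   # True

# The PUA spelling sorts after "adze" instead of between "abbey" and "adze".
print(sorted(["adze", plain, "abbey"]))
print(sorted(["adze", pua, "abbey"]) == ["abbey", "adze", pua])   # True
```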

Let me put it another way. Think about the paradigm you are working within 
if you encode every glyph variant in a font. I see you standing in front of 
a tray of metal type, hunting and picking for the little bit of lead that 
*looks* correct. You pick up a bit of metal that has a ct ligature on the 
end of it, and you put it in your composing stick. The semantic 
relationship of that piece of metal to the letters c and t exists only in 
your mind. The piece of metal is dumb: it carries no meaning. That is the 
paradigm you are working in if you are typesetting text on a computer using 
PUA codepoints for glyph variants. A PUA codepoint in a stream of text is 
as meaningless as the piece of metal with a ligature on the end. You are 
applying an analogue, metal type paradigm to digital text processing, and 
in the process you are losing most of the benefits of using a computer. 
Does that make any sense?

If you are interested in learning more about font layout features for glyph 
variants, and how a smart font format like OpenType works with the Unicode 
Standard, you might find this article at the Microsoft Typography website 
useful:

http://www.microsoft.com/typography/developers/opentype/default.htm


John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force.  - Michael Apostolis, 1467




Re: ct, fj and blackletter ligatures

2002-11-02 Thread John Hudson
At 10:55 11/2/2002, Thomas Lotze wrote:


> How does this compare to unmapped glyphs in Type1 fonts, which can be
> made accessible by re-encoding the font? Are they hidden at a deeper
> level, or is it essentially the same thing? Do they get glyph names so a
> program that can parse the font file can identify and use them even
> though they are not mapped?


Unencoded glyphs in OpenType fonts (which use the TrueType sfnt table 
structure but may contain either TrueType or PostScript outlines) have no 
entries in the cmap table. It is possible to hack a font and add cmap table 
entries for such glyphs, so there is a parallel to re-encoding a Type 1 font.
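
A toy model may make the idea of unencoded glyphs concrete. Nothing below reads a real font file: the list and dictionary merely stand in for a font's glyph set and cmap table, and all names are hypothetical.

```python
# Toy model only: real fonts store this information in binary sfnt tables.
glyph_order = [".notdef", "c", "t", "c_t"]   # glyph index = position in the list
cmap = {0x0063: "c", 0x0074: "t"}            # code point -> glyph name

def unencoded_glyphs(glyph_order, cmap):
    """Glyphs (other than .notdef) that no code point maps to."""
    encoded = set(cmap.values())
    return [g for g in glyph_order if g != ".notdef" and g not in encoded]

def add_cmap_entry(cmap, code_point, glyph_name):
    """Return a copy of the cmap with one extra mapping (the 'hack')."""
    patched = dict(cmap)
    patched[code_point] = glyph_name
    return patched

print(unencoded_glyphs(glyph_order, cmap))       # ['c_t']
patched = add_cmap_entry(cmap, 0xE707, "c_t")    # PUA code point from this thread
print(unencoded_glyphs(glyph_order, patched))    # []
```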

Yes, variant glyphs will have glyph names (unless, e.g.  a format 3 'post' 
table is used, in which case no glyphs have names), and these can be 
parsed. For example, Adobe InDesign parses the names of some standard 
ligatures (ff fi fl ffi ffl) regardless of font format, so is able to do 
ligature substitutions for these without relying on glyph substitution 
lookups in the font. For information about glyph naming and its 
relationship to Unicode, see 
http://partners.adobe.com/asn/developer/typeforum/unicodegn.html
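
To illustrate how glyph names can be parsed, here is a simplified Python sketch. It assumes the common conventions that a suffix after a period is dropped, ligature components are joined by underscores, and "uniXXXX" or "uXXXXXX" names spell out code points. The tiny `AGL_EXCERPT` table is only an illustrative stand-in for the full glyph list, and the function is not what InDesign or any other application actually does.

```python
import re

# Tiny illustrative excerpt; a real glyph list has thousands of entries.
AGL_EXCERPT = {
    "c": "c", "f": "f", "i": "i", "j": "j", "t": "t",
    "fi": "\ufb01",   # the name "fi" denotes the compatibility ligature U+FB01
}

def glyph_name_to_text(name):
    """Map a glyph name to the character string it represents, or None."""
    base = name.split(".", 1)[0]              # drop a suffix such as ".alt"
    out = []
    for part in base.split("_"):              # ligature components
        if part in AGL_EXCERPT:
            out.append(AGL_EXCERPT[part])
        elif re.fullmatch(r"uni(?:[0-9A-F]{4})+", part):
            digits = part[3:]
            out.append("".join(chr(int(digits[i:i + 4], 16))
                               for i in range(0, len(digits), 4)))
        elif re.fullmatch(r"u[0-9A-F]{4,6}", part):
            out.append(chr(int(part[1:], 16)))
        else:
            return None                       # unknown component
    return "".join(out)

for name in ("c_t", "c_t.alt", "uni00630074", "f_j", "fi", "mystery"):
    print(name, "->", repr(glyph_name_to_text(name)))
```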

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]





Re: ct, fj and blackletter ligatures

2002-11-02 Thread John Hudson
At 09:22 11/2/2002, Thomas Lotze wrote:


> In the meantime, I found out about ZWJ (this one could be mentioned in
> the FAQ, BTW). Now I agree that it is preferable not to use ligature
> code points in documents. However, this isn't a matter of principle, it
> just avoids having to resolve ligatures into their constituents when,
> eg, searching documents, and requires instead ignoring the ZWJ, which is
> easier to do.


It should be noted that using ZWJ is a valid way to encode the desirability 
of a ligature in plain text, but it is far from being a guarantee of 
displaying such a ligature. There are a lot of fonts out there with glyph 
substitution lookups that will correctly display something like a ct 
ligature using layout features (discretionary, controlled by the user) in 
OT savvy apps like Adobe InDesign, but will do so only for the sequence 
c+t. Ironically, the sequence c+ZWJ+t is more likely *not* to display as a 
ligature, since the ZWJ interferes with the sequence recognised by the font 
lookups.
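
Why the ZWJ gets in the way can be shown with a toy substitution routine. This is a deliberately simplified model, not the actual OpenType lookup mechanism: glyphs are represented by strings, and the lookup tables are hypothetical.

```python
ZWJ = "\u200d"

def apply_ligatures(text, lookups):
    """Greedy left-to-right substitution; the longest matching sequence wins.

    Returns a list of glyph names; unligated characters stand for themselves.
    """
    keys = sorted(lookups, key=len, reverse=True)
    glyphs = []
    i = 0
    while i < len(text):
        for seq in keys:
            if text.startswith(seq, i):
                glyphs.append(lookups[seq])
                i += len(seq)
                break
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs

basic = {"ct": "c_t"}                               # font knows only c+t
with_zwj = {"ct": "c_t", "c" + ZWJ + "t": "c_t"}    # font also knows c+ZWJ+t

print(apply_ligatures("act", basic))                # ['a', 'c_t']
print(apply_ligatures("a" + "c" + ZWJ + "t", basic))     # no ligature formed
print(apply_ligatures("a" + "c" + ZWJ + "t", with_zwj))  # ['a', 'c_t']
```

With the first table, the joiner sits between c and t and the sequence no longer matches; only a font that lists the ZWJ sequence as well produces the ligature in both cases.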

I think some font developers will begin including additional ligature 
lookups using ZWJ, but I suspect that the majority will not. Most font 
developers are focused on markets in which users do not encode ligature 
preferences in plain text, and in which the use or non-use of ligatures is 
a typographical decision independent of the authorship of a document. Most 
font developers have never heard of ZWJ. Nor, come to think of it, have 
most users.

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]





Re: ct, fj and blackletter ligatures

2002-11-02 Thread Doug Ewell
John Hudson  wrote:

> It should be noted that using ZWJ is a valid way to encode the
> desirability of a ligature in plain text, but it is far from being a
> guarantee of displaying such a ligature. There are a lot of fonts out
> there with glyph substitution lookups that will correctly display
> something like a ct ligature using layout features (discretionary,
> controlled by the user) in OT savvy apps like Adobe InDesign, but
> will do so only for the sequence c+t.  Ironically, the sequence
> c+ZWJ+t is more likely *not* to display as a ligature, since the ZWJ
> interferes with the sequence recognised by the font lookups.

Using ZWJ to control ligation is admittedly a new concept, and it may
not have been taken up yet by many vendors, but that seems like a really
poor reason to discourage the Unicode approach.

Proprietary layout features in OT-savvy apps like InDesign might get the
job done, but wouldn't it be better if app vendors and font vendors
would follow the Unicode Standard recommendation?  You never know, it
might even reduce the number of requests to encode ligatures.

-Doug Ewell
 Fullerton, California





Re: ct, fj and blackletter ligatures

2002-11-02 Thread John Hudson
At 14:59 11/2/2002, Doug Ewell wrote:


> > It should be noted that using ZWJ is a valid way to encode the
> > desirability of a ligature in plain text, but it is far from being a
> > guarantee of displaying such a ligature. There are a lot of fonts out
> > there with glyph substitution lookups that will correctly display
> > something like a ct ligature using layout features (discretionary,
> > controlled by the user) in OT savvy apps like Adobe InDesign, but
> > will do so only for the sequence c+t.  Ironically, the sequence
> > c+ZWJ+t is more likely *not* to display as a ligature, since the ZWJ
> > interferes with the sequence recognised by the font lookups.
>
> Using ZWJ to control ligation is admittedly a new concept, and it may
> not have been taken up yet by many vendors, but that seems like a really
> poor reason to discourage the Unicode approach.
>
> Proprietary layout features in OT-savvy apps like InDesign might get the
> job done, but wouldn't it be better if app vendors and font vendors
> would follow the Unicode Standard recommendation?  You never know, it
> might even reduce the number of requests to encode ligatures.


But using ZWJ still requires proprietary layout features in OT-savvy apps. 
Whenever you're mapping from a character sequence to an unencoded glyph 
variant, you need a higher level protocol than is provided in a character 
encoding standard. c+ZWJ+t needs to be resolved to a ct ligature, and that 
requires a font level lookup and a line layout engine that understands and 
implements such lookups.

The useful thing that the ZWJ does is to provide authors with a means to 
explicitly request ligation in a plain text document, but this is a *very* 
unusual requirement outside of a couple of obscure scripts (e.g. Old 
Hungarian). When someone sits down to type a typical English language 
document (or German, or French, etc.), he generally does not think about 
ligatures at all: he types his language in the commonly accepted manner, 
inputting character codes. Ligatures are a *typographical* feature of the 
Latin script, not an orthographic one. Authors, by which I mean any user creating 
a document, do not think about ligatures unless they happen to be 
specialists, e.g. palaeographers, working with documents in which features 
such as ligatures are germane to the content. In probably 98% of all Latin 
script documents, standard ligature substitution should be automatic and 
global, which is what current smart font and professional layout 
applications provide (of course, users can turn ligatures off, globally or 
on an individual basis).

There are hundreds of thousands of books and periodicals published every 
year, almost all of them using standard ligatures in accordance with 500 
years of typographical convention. I suggest that, at most, a handful of 
authors gave any thought to ligation while they were writing their texts, 
either because they are specialists working in document studies, or because 
they feel the need to meddle in the typographer's job.

So, from this it should be obvious that we need automatic ligature 
substitution in higher level protocols using smart font features, because 
most documents that should have ligatures when published will not have them 
encoded in the text source. This is the reality of publishing, and it is 
the market on which the vast majority of font developers are focused. In 
this market, ZWJ ligation is rightly viewed as unnecessary except for a 
very small minority of specialist uses that will probably require 
specialist fonts from specialist foundries.

Far from discouraging my colleagues from supporting ZWJ in font lookups, 
I've made proposals on the OpenType list suggesting how this might be done 
in a way that is easy, does not conflict with current ligature 
implementations, and respects authorial intent (by using the Required 
Ligatures <rlig> feature instead of the optional Standard Ligatures <liga> 
feature). I don't expect many people to do this, since ZWJ ligation 
provides a solution to something that almost no one sees as a problem.

Finally, I don't think it is right to characterise ZWJ ligation as 'the 
Unicode approach' if the implication is that the approach taken in smart 
font technologies like OpenType, AAT and Graphite is somehow anti-Unicode. 
All these technologies are based on Unicode text processing, and the 
Unicode character/glyph model is the basis of the higher level protocols 
that they provide.

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force.  - Michael Apostolis, 1467




Re: ct, fj and blackletter ligatures

2002-11-02 Thread Thomas Lotze
On Sat, 02 Nov 2002 11:41:47 -0700
John Hudson <[EMAIL PROTECTED]> wrote:

> Ironically, the sequence c+ZWJ+t is more likely *not* to display as a 
> ligature, since the ZWJ interferes with the sequence recognised by the
> font lookups.

Does this mean that it is indeed common practice to replace every last
occurrence of a character sequence by the corresponding ligature with
these fonts? At least in the German language, this would be disastrous:
there are many cases where character sequences occur but the ligature is
not allowed, e.g. if two words are combined into one, the last letter of
the first one does not form a ligature with the first letter of the
second even if such a ligature exists. Don't know about other languages,
though.

Cheers, Thomas

-- 
Thomas Lotze

[EMAIL PROTECTED]  http://www.thomas-lotze.de/





Re: ct, fj and blackletter ligatures

2002-11-02 Thread Thomas Lotze
On Sat, 02 Nov 2002 11:19:41 -0700
John Hudson <[EMAIL PROTECTED]> wrote:

> If you use PUA 
> codepoints for glyph variants in text, you immediately lose all the 
> benefits of a clean character/glyph distinction:

I understand that perfectly well, and now that I've learnt about ZWJ I
don't see any reason anymore to represent a ligature in a document by a
(PUA or deprecated standard) UV instead of its constituents, connected
by ZWJs. OTOH, that's a matter of document coding style, not one of font
encoding. I don't see any harm in assigning standard UVs to ligatures
other than that users who don't understand the difference between font
encoding and text encoding will be encouraged to use them in documents.
However, the statement that Unicode is meant as a character encoding
instead of as a glyph encoding is clear enough for me.

> If you are interested in learning more about font layout features for
> glyph variants, and how a smart font format like OpenType works with
> the Unicode Standard, you might find this article at the Microsoft
> Typography website useful:
> 
> http://www.microsoft.com/typography/developers/opentype/default.htm

I'll take a look at it.

Thank you for your detailed and helpful response.

Cheers, Thomas

-- 
Thomas Lotze

[EMAIL PROTECTED]  http://www.thomas-lotze.de/





Re: ct, fj and blackletter ligatures

2002-11-02 Thread John Cowan
Thomas Lotze scripsit:

> Regardless of how the document is coded, the fact remains that ligature
> glyph shapes have to be stored in the font, at some code point. 

No, this is an error.  It is not the case that every glyph in the font
must correspond to a single Unicode character.  Some glyphs may very
well be invoked by the font engine in order to render a sequence of
Unicode characters -- modern fonts contain tables that indicate when
this should be done.

By the same token, a single character may have multiple glyphs to be
used in different circumstances, again based on the font's tables.
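
As a rough illustration of the point above (a toy model only, not any real
font format; the names CMAP, LIGATURES and shape are made up for this
sketch), a font can be thought of as a character-to-glyph map plus a
substitution table whose ligature glyphs have no code point of their own:

```python
# Toy model of a font. Nothing here reflects an actual font file format;
# all names are hypothetical and chosen for illustration only.
CMAP = {ch: ch for ch in "abcdefghijklmnopqrstuvwxyz"}  # character -> glyph name

# Glyph sequence -> single ligature glyph. None of these ligature glyphs
# appears as a value in CMAP, so none is reachable from a single code point.
LIGATURES = {("f", "f", "i"): "f_f_i", ("f", "i"): "f_i", ("c", "t"): "c_t"}


def shape(text):
    """Map characters to glyphs, then substitute ligatures (longest match first)."""
    glyphs = [CMAP[ch] for ch in text]
    longest = max(len(seq) for seq in LIGATURES)
    out = []
    i = 0
    while i < len(glyphs):
        for n in range(longest, 1, -1):
            seq = tuple(glyphs[i:i + n])
            if len(seq) == n and seq in LIGATURES:
                out.append(LIGATURES[seq])
                i += n
                break
        else:
            # No ligature starts here: emit the plain glyph.
            out.append(glyphs[i])
            i += 1
    return out


print(shape("fact"))
print(shape("office"))
```

The text itself stays as plain letters; only the glyph list changes.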

-- 
John Cowan    http://www.ccil.org/~cowan   <[EMAIL PROTECTED]>
"Any legal document draws most of its meaning from context.  A telegram
that says 'SELL HUNDRED THOUSAND SHARES IBM SHORT' (only 190 bits in
5-bit Baudot code plus appropriate headers) is as good a legal document
as any, even sans digital signature." --me




Re: ct, fj and blackletter ligatures

2002-11-02 Thread John Hudson
At 15:53 11/2/2002, Thomas Lotze wrote:


> > Ironically, the sequence c+ZWJ+t is more likely *not* to display as a
> > ligature, since the ZWJ interferes with the sequence recognised by the
> > font lookups.
>
> Does this mean that it is indeed common practice to replace every last
> occurrence of a character sequence by the corresponding ligature with
> these fonts? At least in the German language, this would be disastrous:
> there are many cases where character sequences occur but the ligature is
> not allowed, e.g. if two words are combined into one, the last letter of
> the first one does not form a ligature with the first letter of the
> second even if such a ligature exists. Don't know about other languages,
> though.


German is indeed a special case, and there are various ideas for how best 
to handle German ligation. Clearly, inserting ZWJ where one wanted ligation 
-- or, perhaps, ZWNJ where one didn't want it -- is an option. Using ZWNJ 
is probably a better solution, if one went this route, since it would work 
with existing ligature implementations in OpenType, AAT and Graphite: i.e. 
it would prevent ligatures from forming. However, expecting German users to 
manually enter ZWJ or ZWNJ in their documents seems highly impractical, so 
an automated dictionary-driven system seems to be required. This is getting 
outside my area of expertise.
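
To make the dictionary-driven idea a little more concrete, here is a minimal
sketch. It assumes a hypothetical dictionary that records where the second
part of a compound begins, and a hypothetical set of letter pairs that the
font ligates by default; a real system would need proper morphological data.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Hypothetical dictionary: compound word -> index at which the second part starts.
# "Auflage" = "Auf" + "lage", "Schilfinsel" = "Schilf" + "insel".
COMPOUND_BOUNDARIES = {"Auflage": 3, "Schilfinsel": 6}

# Letter pairs the font is assumed to ligate when left alone.
LIGATING_PAIRS = {"ff", "fi", "fl"}


def suppress_boundary_ligatures(word):
    """Insert ZWNJ where a default ligature would straddle a compound boundary."""
    split = COMPOUND_BOUNDARIES.get(word)
    if split is None:
        return word  # not a known compound: leave ligation to the font
    pair = word[split - 1:split + 1]
    if pair in LIGATING_PAIRS:
        return word[:split] + ZWNJ + word[split:]
    return word


for w in ("Auflage", "Schilfinsel", "finden"):
    print(repr(suppress_boundary_ligatures(w)))
```

Because only ZWNJ is inserted, this would work with existing ligature
lookups, which simply fail to match across the extra character.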

For other languages, it is typical for a set of standard ligatures (usually 
those involving f followed by an ascending form: fb ffb ff fh ffh fi ffi fj 
ffj fk ffk fl ffl) to be on by default because these ligatures are not 
merely stylistic but preserve word shape integrity by reducing white space 
between the letters while avoiding distracting collisions. The well known 
exception to this is found in the typography of those Turkic languages that 
employ a dotless as well as a dotted i: for these languages fi and ffi 
ligature formation needs to be suppressed, or special ligatures need to be 
provided that do not remove the dot of the i. OpenType includes a Language 
System tag that allows a layout feature such as Standard Ligatures <liga> 
to have different lookups for different writing systems.
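
A sketch of what language-specific lookups amount to, in plain Python rather
than in any font format. The table layout and function names are invented for
this example, and the use of "TRK" as the key for Turkish is an assumption
about how one might label the entry:

```python
# Hypothetical per-language ligature sets. "dflt" stands for the default
# behaviour; the "TRK" entry drops every ligature that would swallow the
# dot of the i.
LIGATURES_BY_LANGUAGE = {
    "dflt": {"ff", "fi", "fl", "ffi", "ffl"},
    "TRK": {"ff", "fl", "ffl"},
}


def ligatures_for(lang):
    """Return the ligature set for a language, falling back to the default."""
    return LIGATURES_BY_LANGUAGE.get(lang, LIGATURES_BY_LANGUAGE["dflt"])


def segment(word, lang="dflt"):
    """Split a word into ligature clusters and single letters (longest match first)."""
    candidates = sorted(ligatures_for(lang), key=len, reverse=True)
    out = []
    i = 0
    while i < len(word):
        for lig in candidates:
            if word.startswith(lig, i):
                out.append(lig)
                i += len(lig)
                break
        else:
            out.append(word[i])
            i += 1
    return out


print(segment("office"))
print(segment("office", "TRK"))
```

The same character string yields different glyph clusters depending only on
the language selected, which is the effect described above.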

Stylistic ligatures such as ct and st, are typically handled in a separate 
feature, which is not on by default. Obviously in fraktur fonts the 
designer might decide to include ligatures like ch and ck in the standard set.

John Hudson





Re: ct, fj and blackletter ligatures

2002-11-03 Thread Peter_Constable
On 11/02/2002 10:22:52 AM Thomas Lotze wrote:

>Regardless of how the document is coded, the fact remains that ligature
>glyph shapes have to be stored in the font, at some code point.

No, they do not. For instance, in recent versions of Times New Roman, you 
will find 208 glyphs that are not encoded at some code point. They are 
accessible via lookup rules in various OpenType tables within the font, 
but not via the cmap table (it's the cmap that determines what is directly 
encoded at some code point).
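
The distinction can be shown with toy data (the glyph names and the tiny cmap
below are invented; they are not taken from Times New Roman or any real
font). A glyph is "unencoded" when it is in the font's glyph list but is not
the target of any cmap entry:

```python
# Invented stand-ins for a font's glyph list and its cmap table.
GLYPH_ORDER = [".notdef", "c", "t", "f", "i", "f_i", "c_t", "a.smallcap"]
CMAP = {0x63: "c", 0x74: "t", 0x66: "f", 0x69: "i"}  # code point -> glyph name


def unencoded_glyphs(glyph_order, cmap):
    """Return the glyphs that no code point maps to, in font order."""
    encoded = set(cmap.values())
    return [name for name in glyph_order if name not in encoded]


print(unencoded_glyphs(GLYPH_ORDER, CMAP))
```

With a real font one would feed in the actual glyph list and cmap; I believe
the fontTools library exposes these through TTFont.getGlyphOrder() and
TTFont.getBestCmap(), but the sketch above does not depend on it.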



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>





Re: ct, fj and blackletter ligatures

2002-11-03 Thread Peter_Constable
On 11/02/2002 04:43:40 PM Thomas Lotze wrote:

>I don't see any harm in assigning standard UVs to ligatures
>other than that users who don't understand the difference between font
>encoding and text encoding will be encouraged to use them in documents.

I don't consider that harm insignificant. Also, what's the benefit? The 
only benefit I can see is a very temporary one: older software does not 
support rendering via smart-font technologies, thus can only display 
glyphs that are directly encoded. Even in that situation, I'd probably 
just use one of Adobe's expert set fonts (old wine for old wine skins) 
rather than create new fonts with ligatures encoded as PUA characters.



- Peter







Re: ct, fj and blackletter ligatures

2002-11-03 Thread Peter_Constable
On 11/02/2002 03:59:53 PM "Doug Ewell" wrote:

>Using ZWJ to control ligation is admittedly a new concept, and it may
>not have been taken up yet by many vendors, but that seems like a really
>poor reason to discourage the Unicode approach.

I think not all vendors are entirely happy with it, at least in some 
details.



- Peter







Re: ct, fj and blackletter ligatures

2002-11-03 Thread Peter_Constable
On 11/02/2002 08:24:06 AM Thomas Lotze wrote:

>> Indeed, it seems more likely that one
>> would need to use a Fraktur font with ligatures encoded with a code
>> number below 255,
>
>Why below 255?

It's a good question, why below 255. It indicates a lack of understanding 
of how fonts work -- at least TrueType fonts on Windows, which I'm pretty 
sure is what William is most concerned with. The glyphs that get encoded 
in TrueType fonts for Windows have always been encoded using Unicode, and 
not all in the range U+0020..U+00FF. Even the characters in the Western 
codepage are encoded with codepoints ranging from U+0020 to U+2122 
(decimal 32 to 8,482).
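
This is easy to check, assuming that "the Western codepage" means Windows
code page 1252 and using Python's built-in cp1252 codec as the mapping:

```python
def cp1252_code_points():
    """Unicode code points of every defined cp1252 byte from 0x20 to 0xFF."""
    # The handful of bytes that the codec leaves undefined are skipped.
    chars = bytes(range(0x20, 0x100)).decode("cp1252", errors="ignore")
    return [ord(ch) for ch in chars]


points = cp1252_code_points()
print(f"lowest:  U+{min(points):04X}")
print(f"highest: U+{max(points):04X}")
print("above U+00FF:", sorted(f"U+{p:04X}" for p in points if p > 0xFF))
```

The highest value comes from the trade mark sign, which sits at byte 0x99 in
that code page.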


- Peter







Re: ct, fj and blackletter ligatures

2002-11-03 Thread Peter_Constable
On 11/02/2002 10:06:53 AM jameskass wrote:

>Many "Unicoders" regard the PUA as some kind of a "Phantom Zone"
>into which all of the "bad glyphs" are banished forever, never
>to again be mentioned in "polite society".

That's not how I would characterise the situation at all. It's that 
they're tired of repeated suggestions by certain individuals either to use 
the PUA for things that are not appropriate for character encoding at all 
in the first place, or to encourage widespread agreement on private-use 
assignments, or more likely both.



- Peter







Re: ct, fj and blackletter ligatures

2002-11-03 Thread John H. Jenkins

On Saturday, November 2, 2002, at 02:59 PM, Doug Ewell wrote:


> Using ZWJ to control ligation is admittedly a new concept, and it may
> not have been taken up yet by many vendors, but that seems like a really
> poor reason to discourage the Unicode approach.
>
> Proprietary layout features in OT-savvy apps like InDesign might get the
> job done, but wouldn't it be better if app vendors and font vendors
> would follow the Unicode Standard recommendation?  You never know, it
> might even reduce the number of requests to encode ligatures.


Remember, though, that the Unicode approach is that ZWJ is *not* the 
preferred Unicode way to support things like a discretionary ct 
ligature in Latin text.  The standard says that the preferred way to 
handle this is through higher-level protocols.

I know that you and I disagree about the extent to which ligation control 
belongs in plain text, but the standard clearly allows both approaches. 
The ZWJ mechanism is not *the* Unicode approach.

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://www.tejat.net/




Re: ct, fj and blackletter ligatures

2002-11-03 Thread Doug Ewell
John H. Jenkins  wrote:

> Remember, though, that the Unicode approach is that ZWJ is *not* the
> preferred Unicode way to support things like a discretionary ct
> ligature in Latin text.  The standard says that the preferred way to
> handle this is through higher-level protocols.
>
> I know that you and I disagree about the extent to which ligation control
> belongs in plain text, but the standard clearly allows both
> approaches.  The ZWJ mechanism is not *the* Unicode approach.

Once again, I have done a poor job of expressing myself on this topic.
Sometimes misunderstandings are the speaker's fault, sometimes the
listener's, and sometimes both.  In this case it is clearly my fault for
not communicating well.

I should not have implied that ZWJ was the only way to effect ligation
in Unicode Latin text, or that the user (or even the software) should
have to insert ZWJ everywhere ligatures are desired.  Rendering
subsystems can certainly use their own judgement to ligate or not.

The way I read the ZWJ in regard to ligation is as a request to the
renderer to override the default, in effect saying, "Look, dammit, I
want a ligature here."  The renderer (possibly influenced by the
capability of the font) still has the right to decline that request.

Let's consider our good old friend, the "ct" ligature.  Courier is a
good example of a font that had better darned well *not* have a ct
ligature; it would just look too weird.  Helvetica (≈ Arial) might or
might not have a "ct" ligature, but rendering systems using Helvetica
probably would not use it by default.  If Baskerville is used instead,
the chances of using the ligature by default might be somewhat higher.

(Note that I am deliberately avoiding the question of "default modes" of
fonts, or any mention of specific font technologies.  Also note that I
am steering way clear of the language-dependent "fi" ligature.)

So if the text contains the letters "ct", a Courier rendition definitely
would not ligate them by default, and a Helvetica rendition probably
would not, but a Baskerville rendition might.  This is all up to the
designers of the font and rendering engine, of course.  (Please, if you
are a font designer and know that one of these examples is wrong, be
gentle and just treat them as examples.)

Now, if the text contains c + ZWJ + t, that should tell the renderer
that the user would really, really like to see a ligature if possible.
In the case of Courier, it *isn't* possible, so you still get a "c" and
a "t".  In the case of Helvetica and Baskerville, assuming those fonts
have a "ct" ligature, the default (whatever it was) should be overridden
and the ligature should be displayed.

The same thing is true for ZWNJ.  That is, if the default behavior for
Baskerville is to ligate "ct", then c + ZWNJ + t should result in two
discrete letters.  Now, we know that fonts and renderers already do this
without being told, because ZWNJ breaks up the combination that would
otherwise be ligated, and that behavior (while accidental) is correct.

My point is that, if fonts and renderers are *also* breaking up
potential ligatures because of an intervening ZWJ, that is NOT correct
according to Unicode.  The accidental, naïve behavior that does the
right thing for ZWNJ does not do the right thing for ZWJ.

This is what I am proposing be changed: fonts and/or rendering engines
(wherever the intelligence lies, depending on the vendor technology)
should be updated to recognize "letter + ZWJ + letter" (and similar
combinations of 3 or more letters) as a request to ligate the characters
if possible.
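
Here is a small sketch of the difference, with a made-up "font" that owns ct
and fi ligature glyphs but only applies fi by default. It is a toy model of
the behavior I am describing, not a description of how any actual rendering
engine or font technology works; ligatures are shown in angle brackets.

```python
ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

FONT_LIGATURES = {"ct", "fi"}  # ligature glyphs the toy font contains
DEFAULT_ON = {"fi"}            # ligatures the toy renderer forms unasked


def naive_render(text):
    """Match adjacent letters only, so ZWJ and ZWNJ both break up a ligature."""
    out, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in FONT_LIGATURES:
            out.append(f"<{pair}>")
            i += 2
        else:
            if text[i] not in (ZWJ, ZWNJ):
                out.append(text[i])
            i += 1
    return out


def render(text):
    """Treat letter + ZWJ + letter as a request to ligate, if the font can."""
    out, i = [], 0
    while i < len(text):
        if i + 2 < len(text) and text[i + 1] == ZWJ:
            requested = text[i] + text[i + 2]
            if requested in FONT_LIGATURES:
                out.append(f"<{requested}>")
                i += 3
                continue
        pair = text[i:i + 2]
        if pair in DEFAULT_ON:
            out.append(f"<{pair}>")
            i += 2
            continue
        if text[i] not in (ZWJ, ZWNJ):
            out.append(text[i])
        i += 1
    return out


print(naive_render("c" + ZWJ + "t"))  # the request is lost
print(render("c" + ZWJ + "t"))        # the request is honored
print(render("f" + ZWNJ + "i"))       # the default is overridden
print(render("x" + ZWJ + "y"))        # no such ligature, so the request is declined
```

Note that render() still leaves plain "ct" unligated, because in this toy
font that ligature is off by default.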

I am *not* suggesting that fonts and rendering engines and intelligent
text processing tools like InDesign be stripped of all power to control
ligation.  They are probably in an excellent position to do so.  (I
wish, oh how I wish, that Microsoft Word had some facility for
generating ligatures.)  And I am *not* suggesting that user overrides of
the default ligation behavior be limited to inserting ZWJ or ZWNJ.  If
programs like InDesign give the user a convenient option to turn
ligation on and off, globally or locally, more power to them.  What I am
suggesting is that the Unicode ZWJ and ZWNJ *also* be honored as a way
to control ligation.  That is how I read the Unicode Standard.

-Doug Ewell
 Fullerton, California





Re: ct, fj and blackletter ligatures

2002-11-03 Thread John Hudson
At 15:09 11/3/2002, Doug Ewell wrote:


> This is what I am proposing be changed: fonts and/or rendering engines
> (wherever the intelligence lies, depending on the vendor technology)
> should be updated to recognize "letter + ZWJ + letter" (and similar
> combinations of 3 or more letters) as a request to ligate the characters
> if possible.
>
> I am *not* suggesting that fonts and rendering engines and intelligent
> text processing tools like InDesign be stripped of all power to control
> ligation.  They are probably in an excellent position to do so.  (I
> wish, oh how I wish, that Microsoft Word had some facility for
> generating ligatures.)  And I am *not* suggesting that user overrides of
> the default ligation behavior be limited to inserting ZWJ or ZWNJ.  If
> programs like InDesign give the user a convenient option to turn
> ligation on and off, globally or locally, more power to them.  What I am
> suggesting is that the Unicode ZWJ and ZWNJ *also* be honored as a way
> to control ligation.  That is how I read the Unicode Standard.


I basically agree with you, Doug, and my proposal for handling ZWJ ligation 
in OpenType would provide exactly what you describe, if implemented in 
fonts and supported by rendering engines. There are, however, a number of 
issues that need to be resolved. In order for a font lookup sequence 
involving ZWJ to be processed during layout, a *glyph* for the ZWJ 
character has to be painted in the glyph string, since font lookups work at 
the glyph level. Because ZWJ already had a function as a control character, 
e.g. in Indic script processing, prior to being pressganged into service 
for ligation, existing implementations do not paint a glyph for this 
character unless the user invokes an option to display control characters, 
e.g. in MS Office. In order to permit the latter option, these characters, 
if they are supported in a font at all, are represented by a special glyph: 
a vertical bar on a zero-width advance, with a little x at the top. This obviously 
presents various problems, and should be a warning to the UTC to avoid 
repurposing characters that have already been implemented for other 
purposes: such implementations might not be compatible with the intended 
new purpose.

So we have a quandary: do we stop treating ZWJ as a control character and 
always paint a glyph so that it can be used in lookup sequences? If we do 
this, we run the risk of a visible glyph appearing in text anywhere that a 
font does not provide a ligature glyph or lookup sequence. Do we avoid this 
by making the ZWJ glyph a blank, zero-width glyph? If we do this, we can no 
longer use current methods to provide users with the option of displaying 
control characters (I can think of various ways to solve this particular 
problem, including glyph substitution, e.g. a 'Control Display Forms' 
layout feature that would map the blank glyphs to visible forms). We also 
lose the ability to kern the glyphs on either side of the ZWJ if a ligature 
is not available (this could be solved with a lot of contextual kerning 
data, but that would be a serious pain). I'm not saying that any of these 
problems are insoluble, or that software developers should not rewrite all 
their existing rendering engines and rethink their approach to control 
characters in order to implement ZWJ ligation. I just think people should 
be aware that supporting ZWJ ligation is considerably more difficult than 
it would have been if, for example, Michael Everson's initial proposal for 
a separate Zero-Width Ligator had been accepted. Implementing something new 
is a lot easier than completely changing an existing implementation for a 
character whose purpose has suddenly been redefined. The more widely 
implemented Unicode becomes, the more the UTC will need to consider the 
impact of their decisions on existing implementations.

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]





Re: ct, fj and blackletter ligatures

2002-11-04 Thread William Overington
Thomas Lotze asked.

>Why below 255?

I don't know for certain but I suspect that it is that font designers do
this so that people can use an application such as Microsoft Paint to
produce an illustration using the font.  In the absence of regular Unicode
code points for the ligatures, a font designer has either to use the Private
Use Area and be Unicode compatible or make a non-Unicode compatible font, if
the font designer wishes people to be able to have direct access to the
ligature characters.

There is an interesting experiment which one can try if one wishes.

At the http://www.waldenfont.com website there are various Fraktur fonts for
sale.  There is a bundle of sample fonts available for download which have
only some of the letters and ligatures in the fonts.  The Gutenberg font has
the ppe ligature within it and indeed a number of other ligatures and
abbreviations and, in fact, a complete set of ten digit characters.

There is the manual gbpmanual.pdf available for download as well.  On page
14 of that document the ppe ligature is listed as being at 0171.

If on a PC one installs the sample Gutenberg font and then starts the
Microsoft Paint program and draws some text, selecting the Gutenberg font,
if one holds down the Alt key and keys 0171 using the digit keys at the far
right of the keyboard, hopefully the ppe ligature in the Gutenberg font will
appear on the screen.
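
For reference, here is what that key sequence nominally enters. This sketch
assumes a PC set up with the Western code page (1252), where Alt plus a
four-digit code starting with 0 selects that position in the code page; the
function name is made up for the example.

```python
def ansi_slot(alt_code, codepage="cp1252"):
    """Character that the given code page assigns to an Alt+0nnn key code."""
    return bytes([alt_code]).decode(codepage)


ch = ansi_slot(171)
print(f"Alt+0171 -> U+{ord(ch):04X} {ch!r}")
```

So the text actually stored is the character normally at that position, and
it is only the font that draws the ppe ligature there.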

In fact Paint only allows text up to 72 point.  However, if one uses
WordPad, then one can make the text something like 200 point in size if one
wishes and use the Print Screen facility to copy the display image onto the
clipboard.  One can then paste the image from the clipboard into Paint so
that one then has a 200 point Gutenberg ppe ligature in the Paint program.

There are some articles about using WordPad and Paint to produce graphic
effects with large characters and gold textures and so on in our family
webspace, together with the gold texture file and some other texture files
too.

http://www.users.globalnet.co.uk/~ngo

William Overington

4 November 2002





Re: ct, fj and blackletter ligatures

2002-11-04 Thread Thomas Lotze
William Overington wrote:

> I don't know for certain but I suspect that it is that font designers
> do this so that people can use an application such as Microsoft Paint
> to produce an illustration using the font.  In the absence of regular
> Unicode code points for the ligatures, a font designer has either to
> use the Private Use Area and be Unicode compatible or make a
> non-Unicode compatible font, if the font designer wishes people to be
> able to have direct access to the ligature characters.

Judging from what I've learned by now, this is not true: If a font
designer wants to make a Unicode-compatible font, he has to use a font
file format that supports Unicode, and those formats provide means to
access unencoded glyphs by transforming certain strings of Unicode
characters into them. And if I understand it correctly, Unicode
compliance can only be achieved with all of compliant documents, fonts,
and renderers. So there appears to be no need for direct accessibility
of ligatures, alternates etc.

So far the theory is very clear, and as far as plain text is concerned,
seems to be directly applicable. However, if I have a typeset document,
say in PDF format, then I might need something stronger than a means of
suggesting ligation or variant glyphs if I can't be entirely sure of the
behaviour of the rendering engine. The lines would get scrambled if a
rendering engine chose to, say, not ligate two characters that were
supposed to be ligated when the document was typeset, thereby using more
space for the two of them than was reserved for the ligature. Does one
have to rely on the behaviour of the rendering engine in that case, or
does it make sense to call presentation forms directly in already
typeset documents? What about searchability of those documents?
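
On the searchability question, here is a minimal sketch of what I have in
mind, assuming the text can be extracted as Unicode at all. It relies on the
compatibility decompositions that the presentation-form ligatures carry
(Python's unicodedata module applies them with NFKC), and the function names
are my own:

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def search_key(text):
    """Fold presentation-form ligatures to letters and drop joiner controls."""
    folded = unicodedata.normalize("NFKC", text)
    return folded.replace(ZWJ, "").replace(ZWNJ, "")


def contains(haystack, needle):
    """Substring search that ignores how ligation was encoded."""
    return search_key(needle) in search_key(haystack)


print(contains("e\ufb03cient", "efficient"))   # U+FB03 is the ffi ligature
print(contains("Auf\u200clage", "Auflage"))    # ZWNJ inside the word
```

A ligature living in the PUA has no such decomposition, so it would not be
found this way.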

> There are some articles about using WordPad and Paint to produce
> graphic effects with large characters and gold textures and so on in
> our family webspace, together with the gold texture file and some
> other texture files too.

And what's the relevance to Unicode of that?

Cheers, Thomas

-- 
Thomas Lotze

thomas.lotze at gmx.net  http://www.thomas-lotze.de/




Re: ct, fj and blackletter ligatures

2002-11-04 Thread Peter_Constable
On 11/04/2002 06:11:35 AM Thomas Lotze wrote:

>So far the theory is very clear, and as far as plain text is concerned,
>seems to be directly applicable. However, if I have a typeset document,
>say in PDF format...

If you've got a PDF document, it is encoded entirely in terms of glyphs. 
There is no problem here. A rendering engine can't choose not to ligate 
two characters that were supposed to be ligated when the document was 
typeset -- PDF doesn't work that way.



- Peter







Re: ct, fj and blackletter ligatures

2002-11-05 Thread William Overington
Thomas Lotze wrote as follows.

>William Overington wrote:
>
>> I don't know for certain but I suspect that it is that font designers
>> do this so that people can use an application such as Microsoft Paint
>> to produce an illustration using the font.  In the absence of regular
>> Unicode code points for the ligatures, a font designer has either to
>> use the Private Use Area and be Unicode compatible or make a
>> non-Unicode compatible font, if the font designer wishes people to be
>> able to have direct access to the ligature characters.
>
>Judging from what I've learned by now, this is not true: If a font
>designer wants to make a Unicode-compatible font, he has to use a font
>file format that supports Unicode, and those formats provide means to
>access unencoded glyphs by transforming certain strings of Unicode
>characters into them.

Well, I suppose it depends upon what one means by a file format that
supports Unicode.  The TrueType format does not support the ZWJ method and
thus does not "provide means to access unencoded glyphs by transforming
certain strings of Unicode characters into them".  I am unsure as to
whether, in formal terms, TrueType is "a file format that supports Unicode"
as it does not allow the ZWJ sequences to be recognized.  Please note that
my sentence did have "if the font designer wishes people to be able to have
direct access to the ligature characters".  However, certainly, a font
designer using an advanced font format may well not wish people to be able
to have direct access to the ligature characters.  The paragraph was
replying to your question as to why someone who wants to set and print out a
page of Fraktur at present is in practice likely to have to use a font with
the ligatures encoded with code points less than 255.  Please know that I am
not seeking to be pedantic over the meaning of the phrase "a file format
that supports Unicode", it is just that I get the impression that you might
possibly have not quite understood that some font formats widely used for
Unicode encoded characters, such as the TrueType format, do not support the
ZWJ glyph substitution process or, in fact, any glyph substitution process,
such as noticing the two letter ct sequence and substituting a ct ligature
glyph within the font.

>And if I understand it correctly, Unicode
>compliance can only be achieved with all of compliant documents, fonts,
>and renderers. So there appears to be no need for direct accessibility
>of ligatures, alternates etc.

I said compatible, I did not say compliant and did not mean compliant.  I
was meaning compatible, in the sense that, if one wishes to produce a font
using the TrueType format and that font is to include glyphs for ligatures
such as ct and ppe, how does one do it so that the method used does not
conflict with Unicode.  Using Private Use Area code points avoids conflicting
with the regular Unicode code points used for other characters.
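
As a small aside, text that uses such code points can at least be recognised
mechanically, since Private Use characters have their own general category
("Co"). The following is only a sketch, with a function name made up for the
purpose:

```python
import unicodedata


def private_use_chars(text):
    """List (index, code point) for every Private Use character in the text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]


print(private_use_chars("a\ue000b"))   # one Private Use character at index 1
print(private_use_chars("fi\ufb01"))   # U+FB01 is a regular, encoded ligature
```

What any such character means is, of course, known only to whoever made the
private agreement.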

>> There are some articles about using WordPad and Paint to produce
>> graphic effects with large characters and gold textures and so on in
>> our family webspace, together with the gold texture file and some
>> other texture files too.
>
>And what's the relevance to Unicode of that?
>

Well, in direct terms probably nothing.  However, as this is a widely
distributed mailing list it might be that some readers, having read about
the matter of using ligature characters in Paint and the way that one needs
a font with code points less than 255 in order to access the ligature
characters from Paint, might like to have a go at producing such graphics,
so, having available some articles on the matter, I mentioned them.

If one considers the Gutenberg sample font, the ct ligature is available as
well, at Alt 0201 using Paint.  One could use Wordpad to get the character
as well.  Yet, suppose that one has an advanced format font with a ct glyph
within it yet where the font does not provide a direct code point access
glyph, but only allows a ct ligature to be displayed using a combination of
computer hardware and software which supports the advanced font format.  How
is one going to get that ct ligature to display if one does not have access
to that hardware and software combination?  Now certainly the attempt has
been made to trivialise the matter by reference to very very old computer
systems, yet here the problems arise with PCs manufactured in 1999.

May I add that this posting is trying to be helpful to answer questions
which you have posed, I am not seeking to reopen the discussion of whether
the Unicode Technical Committee should encode any more precomposed
ligatures.  I raised that issue before the August 2002 meeting of the
Unicode Technical Committee, the committee discussed the matter at the
meeting, formed a consensus view and that consensus view was minuted and the
minutes have been published.  It is simply a matter that the Unicode
Technical Committee is not going to encode any more ligatures, I have my
golden ligatures c

Re: ct, fj and blackletter ligatures

2002-11-05 Thread Thomas Lotze
William Overington wrote:

> Well, I suppose it depends upon what one means by a file format that
> supports Unicode.

In my reply, I understood by that term a font which both uses Unicode
code points and employs Unicode control character mechanisms. Only in
conjunction with these mechanisms does the policy not to encode alternate
presentation forms and ligatures work well.

> The TrueType format does not support the ZWJ method and
> thus does not "provide means to access unencoded glyphs by
> transforming certain strings of Unicode characters into them".

What about OpenType with TT outlines?

> Please note that
> my sentence did have "if the font designer wishes people to be able to
> have direct access to the ligature characters".

Sure. My point was that talking about Unicode compliance only makes
sense if you have both a Unicode compliant font and rendering engine.
AIUI, in that case you don't need more direct glyph access than you get
by Unicode strings including control characters.
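
As a rough illustration of access "by Unicode strings including control
characters", the sketch below builds such a ligature request in plain text.
It is a minimal Python example; the function names and the pair list are
invented here for illustration, and whether a ligature actually appears
still depends on the font and the rendering engine.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def request_ligatures(text, pairs=("ct", "st")):
    """Insert ZWJ inside each listed letter pair to request a ligature.

    `pairs` is a hypothetical list chosen for this example.
    """
    for pair in pairs:
        text = text.replace(pair, pair[0] + ZWJ + pair[1])
    return text


def strip_joiners(text):
    """Recover the plain letter sequence by dropping every ZWJ."""
    return text.replace(ZWJ, "")


marked = request_ligatures("act first")
print([unicodedata.name(ch) for ch in marked[:4]])
print(strip_joiners(marked) == "act first")
```

The text stays searchable and spell-checkable as ordinary letters once the
joiners are stripped, which is the usual argument for this approach over
Private Use Area code points.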

> Using Private Use Area code points  [for ligatures] avoids conflicting
> with the regular Unicode code points used for other characters.

It does avoid conflicting with regular UVs. However, since Unicode is
meant as a character encoding rather than a glyph encoding, it conflicts
with the concept of Unicode. The comparison may be far-fetched, but
encoding things that are not to be Unicode-encoded reminds me of
so-called XML documents that have an opening and a closing tag and
binary data in between. It's the same kind of defeating the purpose of a
standard through a backdoor.

> Yet, suppose that one has an advanced format font with a ct glyph
> within it yet where the font does not provide a direct code point
> access glyph, but only allows a ct ligature to be displayed using a
> combination of computer hardware

Why hardware? When talking about screen representations and file
generation, no hardware has a say in the matter. (I could imagine a
printer directly handling intelligent font formats, though.)

> and software which supports the advanced font format.  How
> is one going to get that ct ligature to display if one does not have
> access to that hardware and software combination?

No problem. Either you find some way to access the glyphs by their glyph
name or sequential position in the font rather than by code point, or
you just have to live with the fact that in order to get some feature,
you need software that provides it. Another comparison: this reminds me
of ASCII graphics where one tries to get graphics effects without having
graphical capabilities. It works to a certain extent but is a workaround
at best.

> Now certainly the attempt has
> been made to trivialise the matter by reference to very very old
> computer systems, yet here the problems arise with PCs manufactured in
> 1999.

What is the relevance of the hardware? Am I missing something here?

> if people choose to ignore the
> golden ligatures collection, then that is up to them and if people
> choose to use the golden ligatures collection then that too is up to
> them. 

Actually, I find your collection useful, but more from the angle of
typography than from that of Unicode.

> I have also had great enjoyment in preparing the golden
> ligatures collection in that I have learned a lot about ligatures
> which were used by printers in days gone by.

I know that kind of enjoyment; typography is rather dangerous to other
interests ;o)

Cheers, Thomas

-- 
Thomas Lotze

thomas.lotze at gmx.net  http://www.thomas-lotze.de/




Re: ct, fj and blackletter ligatures

2002-11-05 Thread John Cowan
Thomas Lotze scripsit:

> Another comparison: this reminds me
> of ASCII graphics where one tries to get graphics effects without having
> graphical capabilities. It works to a certain extent but is a workaround
> at best.

FIGlet is a rendering engine (and associated font format) that uses
ASCII graphics to render Unicode-encoded characters.  See
http://www.figlet.org .

-- 
John Cowan <[EMAIL PROTECTED]> http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_




RE: ct, fj and blackletter ligatures

2002-11-05 Thread Kent Karlsson

> German is indeed a special case, and there are various ideas 
> for how best 
> to handle German ligation. Clearly, inserting ZWJ where one 
> wanted ligation 
> -- or, perhaps, ZWNJ where one didn't want it -- is an 
> option. Using ZWNJ is probably a better solution,

Why would not SOFT HYPHEN be appropriate here?  If this does
occur between words, a SOFT HYPHEN would be appropriate anyway.
(Of course, that does not mean that they are often inserted,
but more likely than a ZWNJ, which in addition may interfere
negatively with automatic hyphenation, while SHY interferes
positively with automatic hyphenation.)

This is not to say that SOFT HYPHEN must prevent ligatures:

Firstly, the claim that there must be no ligation over subword
boundaries is made only for German.  I've never heard such a claim
for Swedish, which uses compound words just as much as German.
So I don't see why SOFT HYPHEN (when invisible) should prevent
ligation in the graphical sense (the shapes for the original
letters are joined), with the possible exception of German.

Secondly, even if there is no ligature in the graphical sense
there can still be a ligature in the technical sense (a single
glyph) but where the shapes of the original letters are separated.
The latter may be needed for the case of "no ligature" (in the
graphical sense), to prevent an aesthetically displeasing overlap.
So there may still be a need for a ligature (in the technical
sense) over word boundaries (which may include a SOFT HYPHEN)
also for German.

> For other languages, it is typical for a set of standard 
> ligatures (usually 
> those involving f followed by an ascending form: fb ffb ff fh 
> ffh fi ffi fj 
> ffj fk ffk fl ffl) to be on by default because these 

Yes, please. (ft and fft are strangely missing...) For some fonts
there may be even more cases (like gj and tt) where ligation is needed to
make what otherwise would have been ugly overlaps look good.

> ligatures are not 
> merely stylistic but preserve word shape integrity by 
> reducing white space 
> between the letters while avoiding distracting collisions. 
> The well known 
> exception to this is found in the typography of those Turkic 
> languages that 
> employ a dotless as well as a dotted i: for these languages 
> fi and ffi 
> ligature formation needs to be suppressed, or special 
> ligatures need to be 
> provided that do not remove the dot of the i. OpenType 

I would agree mostly, but with some formulation modifications.

If there would be no overlap, there is no pressing need
for a ligature.  If there would be overlap, use a ligature
(in the technical sense) to form a ligature (in the graphical
sense) or use a ligature (in the technical sense) to form
disjoint shapes, depending on factors such as those you mention.

/Kent K





Re: ct, fj and blackletter ligatures

2002-11-05 Thread Peter_Constable
On 11/05/2002 03:18:55 AM "William Overington" wrote:

>I am unsure as to
>whether, in formal terms, TrueType is "a file format that supports
>Unicode"

Absolutely. Every TrueType font on Windows has always made use of Unicode; 
every TrueType font shipped by vendors like Microsoft has conformed to the 
Unicode standard. Lots of text has been rendered for years using TrueType 
fonts in a Unicode-conformant manner.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>





Re: ct, fj and blackletter ligatures

2002-11-05 Thread John H. Jenkins

On Tuesday, November 5, 2002, at 02:18 AM, William Overington wrote:


> Well, I suppose it depends upon what one means by a file format that
> supports Unicode.  The TrueType format does not support the ZWJ method
> and thus does not "provide means to access unencoded glyphs by transforming
> certain strings of Unicode characters into them".

TrueType fonts are perfectly capable of supporting ligatures.  
OpenType, AAT, and Graphite all use TrueType fonts, and all support 
ligatures.

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://www.tejat.net/




Re: ct, fj and blackletter ligatures

2002-11-05 Thread Radovan Garabik
On Tue, Nov 05, 2002 at 04:35:35PM +0100, Kent Karlsson wrote:

> Firstly, the claim that there must be no ligation over subword
> boundaries is made only for German. 

It is also valid for Slovak and Czech.


-- 
 ---
| Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__garabik @ melkor.dnp.fmph.uniba.sk |
 ---
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!




Re: ct, fj and blackletter ligatures

2002-11-06 Thread William Overington
John Hudson wrote as follows.

>At 02:18 11/5/2002, William Overington wrote:

Not at 02:18, it was 09:18.

>
>>Well, I suppose it depends upon what one means by a file format that
>>supports Unicode.  The TrueType format does not support the ZWJ method and
>>thus does not "provide means to access unencoded glyphs by transforming
>>certain strings of Unicode characters into them".
>
>All three of the current 'smart font' formats are extensions of the
>TrueType file format. Structurally, the only difference between a TrueType
>font and an OpenType font is the presence of *optional* layout tables that
>support glyph substitution and positioning. Officially, the only difference
>is the presence of a digital signature.
>
>>I am unsure as to
>>whether, in formal terms, TrueType is "a file format that supports
>>Unicode"
>>as it does not allow the ZWJ sequences to be recognized.
>
>Of course TrueType allows ZWJ sequences to be recognised. ZWJ is a
>character that can appear in Unicode text and in the Unicode cmap of a
>TrueType font. If a font does not contain a ligature for the sequence, or
>does not contain layout information to render the sequence as a ligature,
>the text is still processed according to the Unicode Standard, i.e. nothing
>happens.

I am thinking here of ordinary TrueType fonts on a Windows 95 platform and
on a Windows 98 platform.  I was under the impression that the reason that
an ordinary TrueType font will not process a ZWJ sequence on those platforms
was that both the operating system and the ordinary TrueType font do not
have the capabilities to process ZWJ sequences.  My understanding is that
even an OpenType font with ZWJ sequence facilities will not work on a
Windows 95 or Windows 98 platform.  However, I thought that the ordinary
TrueType format would not support ZWJ sequences in itself and that not only
would a later operating system be needed but that also an OpenType font
would be needed and that an ordinary TrueType format would not be able to do
the job.  Was I wrong in that thinking?  My experience of fonts is very
limited.  I have tried making a few example TrueType fonts using the Softy
shareware facility and I wonder whether I have got it wrong as to what an
ordinary TrueType font will do when an ordinary TrueType font is made with
an expensive professional font making program.

>To say that a font only supports Unicode if it can process and
>render as a ligature every usage of the ZWJ character is foolish: every
>font would have to contain glyphs and substitution lookups to support every
>potential use of ZWJ in every possible
>c+ZWJi+i+ZWJi+r+ZWJi+c+ZWJi+u+ZWJi+m+ZWJi+s+ZWJi+t+ZWJi+a+ZWJi+n+ZWJi+c+ZWJ
i+e.

I have had a long think about this.

Suppose that a sequence of Unicode characters in a plain text file is mostly
in English and  has the sequence c ZWJ t in it at various places.

Suppose that the font is an advanced format font which does not have a
special glyph for the sequence c ZWJ t yet will simply render it as ct just
as if the ligature had not been requested.

As far as I know, there is no requirement in Unicode that the rendering
system should notify, perhaps using an Alert dialogue box or similar, the
end user that the ZWJ request has been "made yet not fulfilled".

Can an advanced format font supply such a message to the rendering system
for onward notification of the end user?

It seems to me that having the ZWJ mechanism in the Unicode Standard yet
having no reporting mechanism if a specific request is not fulfilled is
unfortunate.  As a font could have its own set of ZWJ sequences which it
recognizes, anything from an empty set to a set consisting of a full
complement of ligatures for Fraktur, it seems to me that whilst every font
would certainly not have to contain glyphs and substitution lookups to
support every potential use of ZWJ in every possible circumstance it would
not be unreasonable to hope that fonts could have a standardized reporting
mechanism as to whether a request for a particular ZWJ sequence has been
fulfilled.  Also, perhaps there could be a method for asking a font to
please display all its ZWJ sequences and their results.

Now it might be that some advanced font formats can do such things, I do not
know at present.

While on this topic, perhaps a standardized method of a font reporting that
it has no glyph for a character which it is asked to render might be a good
idea.  I am aware that a black line box could be displayed, yet in a long
document, one of those might easily slip past a general viewing of the text
in a printshop.  Also, perhaps some method of asking a font to declare a
list of the code points for which it has a specific glyph would be helpful.
Again, perhaps some advanced font formats have these abilities, I do not
know at present.

There seems to be a gap between the Unicode Technical Committee encoding
characters into a file and the process of making sure that the desired text
is rendered correctly on an end user's pla

RE: ct, fj and blackletter ligatures

2002-11-06 Thread Kent Karlsson


> > Firstly, the claim that there must be no ligation over subword
> > boundaries is made only for German. 
> 
> It is also valid for Slovak and Czech.

ok.


I still wonder a bit why.  It does not help the reader in
any significant way, esp. when many different words are
spelled the same quite regardless of ligation. Could it be
a heritage from Fraktur (where ligatures are used a lot)? Or
is/was it just a convenience when setting text in metal,
so that it would be a bit easier for the typesetter (and
colleagues) to do hyphenation during or after the (initial)
typesetting?  (Subword boundaries are likely hyphenation
points, whereas occurrences of ff, fi etc. elsewhere are
unlikely hyphenation points.) It would then be easy to
just put in a hyphen, without worrying about the letters
(or about their typeface; the hyphen would not vary much
with typeface). The latter reason should not apply to
digital typesetting, at least as long as one stays away
from the compatibility ligature characters and instead
lets the font do the ligation automatically.

Speculating
/Kent K





Re: ct, fj and blackletter ligatures

2002-11-06 Thread John Hudson
At 04:05 11/6/2002, William Overington wrote:


>I am thinking here of ordinary TrueType fonts on a Windows 95 platform and
>on a Windows 98 platform.


Sorry. I thought this was a discussion about Unicode.


>However, I thought that the ordinary
>TrueType format would not support ZWJ sequences in itself and that not only
>would a later operating system be needed but that also an OpenType font
>would be needed and that an ordinary TrueType format would not be able to do
>the job.  Was I wrong in that thinking?


The Unicode cmap table has been part of the TrueType specification since the 
earliest days. This means that any Unicode character, including ZWJ, can be 
supported in any TrueType font. In order to perform display time glyph 
level substitution for sequences involving that character, optional layout 
tables are necessary (along with, of course, Unicode text processing and a 
layout engine that uses the optional font tables). But my earlier point was 
that doing nothing  when you encounter a ZWJ character in text is a 
perfectly valid implementation of the Unicode Standard. Not every font is 
required to have glyph support for every possible ZWJ sequence, so the 
Unicode support in an 'ordinary TrueType font' that does not include the 
optional layout tables cannot be judged deficient for not supporting any 
ZWJ ligation+ZWJ+. Oops, watch out for that n. ligature!
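
The point that "doing nothing" is a valid response to ZWJ can be sketched
with a toy layout step. This is only an illustration in Python: the cmap
and ligature dictionaries are hypothetical stand-ins for real font tables,
and the function is not how any actual layout engine is written.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def shape(text, cmap, ligatures):
    """Map characters to glyph names, a toy stand-in for a layout engine.

    cmap:      dict of character -> glyph name (hypothetical font data)
    ligatures: dict of (glyph, glyph) -> ligature glyph name, standing in
               for the optional layout tables; may be empty
    """
    glyphs = []
    i = 0
    while i < len(text):
        # A ZWJ between two letters requests a ligature for that pair.
        if i + 2 < len(text) and text[i + 1] == ZWJ:
            pair = (cmap.get(text[i], ".notdef"),
                    cmap.get(text[i + 2], ".notdef"))
            if pair in ligatures:
                glyphs.append(ligatures[pair])
                i += 3
                continue
        if text[i] != ZWJ:  # an unfulfilled ZWJ simply produces no glyph
            glyphs.append(cmap.get(text[i], ".notdef"))
        i += 1
    return glyphs


cmap = {"a": "a", "c": "c", "t": "t"}
print(shape("ac\u200dt", cmap, {}))                    # no layout data
print(shape("ac\u200dt", cmap, {("c", "t"): "c_t"}))   # with a ct ligature
```

With an empty ligature dictionary the text comes out as the plain glyphs
a, c, t; with the extra entry the same text yields a and c_t. Both results
are legitimate renderings of the same character sequence.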

>>To say that a font only supports Unicode if it can process and
>>render as a ligature every usage of the ZWJ character is foolish: every
>>font would have to contain glyphs and substitution lookups to support every
>>potential use of ZWJ in every possible
>>c+ZWJi+i+ZWJi+r+ZWJi+c+ZWJi+u+ZWJi+m+ZWJi+s+ZWJi+t+ZWJi+a+ZWJi+n+ZWJi+c+ZWJ
>>i+e.
>
>I have had a long think about this.


Oh dear.


>Suppose that a sequence of Unicode characters in a plain text file is mostly
>in English and  has the sequence c ZWJ t in it at various places.
>
>Suppose that the font is an advanced format font which does not have a
>special glyph for the sequence c ZWJ t yet will simply render it as ct just
>as if the ligature had not been requested.
>
>As far as I know, there is no requirement in Unicode that the rendering
>system should notify, perhaps using an Alert dialogue box or similar, the
>end user that the ZWJ request has been "made yet not fulfilled".
>
>Can an advanced format font supply such a message to the rendering system
>for onward notification of the end user?


The font doesn't need to do this, because an application can produce such 
an alert based on the presence of 'unresolved' ZWJ glyphs in a document. It 
seems to me that the processing required to do this is a lot of work to go 
through to inform the user of something that he's probably either already 
noticed or doesn't care about anyway. In any case, it would be easy to see 
which ZWJ sequences had been rendered as ligatures and which not by 
toggling a 'Display Control Characters' option.
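
For what it is worth, an application-level check of this kind could look
roughly like the Python sketch below. The set of supported pairs is a
hypothetical input (a real application would have to derive it from the
font's layout tables), and only simple letter-ZWJ-letter requests are
considered.

```python
import re
from collections import Counter

ZWJ = "\u200d"  # ZERO WIDTH JOINER
# One letter, ZWJ, one letter. The lookahead lets overlapping requests
# such as f ZWJ f ZWJ i be counted pair by pair.
ZWJ_PAIR = re.compile(r"(?=(\w)" + ZWJ + r"(\w))")


def unresolved_zwj(text, supported):
    """Count ZWJ requests per line that the font cannot fulfil.

    supported: set of two-letter strings the font has ligatures for
               (hypothetical input for this sketch).
    Returns a Counter keyed by (pair, line_number).
    """
    report = Counter()
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in ZWJ_PAIR.finditer(line):
            pair = match.group(1) + match.group(2)
            if pair not in supported:
                report[(pair, line_no)] += 1
    return report


text = "a fac\u200dt\nthe firs\u200dt ac\u200dt"
report = unresolved_zwj(text, supported={"st"})
for (pair, line_no), count in sorted(report.items()):
    print(f"line {line_no}: {pair} ligature requested {count} time(s), not available")
```

The whole document is scanned once and the findings are collected into a
single report, so an application could show one summary rather than one
alert per occurrence.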

>While on this topic, perhaps a standardized method of a font reporting that
>it has no glyph for a character which it is asked to render might be a good
>idea.  I am aware that a black line box could be displayed, yet in a long
>document, one of those might easily slip past a general viewing of the text
>in a printshop.


There is a standardized method for all sfnt fonts. This is the inclusion of 
the .notdef glyph, which is a requirement of the TrueType specification. As 
you note, in many fonts this appears as a black line box, but it can take 
any form, some of which are easier to spot in proofreading. Adobe use a box 
with an X across it. I use a box with a large question mark in it.
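
A proofreading aid that does not rely on spotting the .notdef glyph by eye
could be sketched as follows. This is an illustrative Python fragment, not
part of any font specification: the cmap is passed in as a plain dictionary
(with the fontTools library it could plausibly be obtained from
TTFont(path).getBestCmap(), but that dependency is left out here), and the
function name is made up.

```python
def missing_glyphs(text, cmap):
    """List characters that would fall back to the .notdef glyph.

    cmap: dict of code point -> glyph name (hypothetical font data).
    Returns a list of (line_number, column, 'U+XXXX') tuples.
    """
    missing = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) not in cmap:
                missing.append((line_no, col, f"U+{ord(ch):04X}"))
    return missing


cmap = {ord(ch): ch for ch in "abc "}
for line_no, col, code in missing_glyphs("ab\nc\u00e1b", cmap):
    print(f"line {line_no}, column {col}: no glyph for {code}")
```

A real tool would also need to decide how to treat control characters such
as ZWJ, which a font may legitimately leave unmapped.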

Here's an exercise for your enthusiasm, William: devise the form of the 
perfect .notdef glyph. It needs to unambiguously indicate that a glyph is 
missing, i.e. it should not be something that can easily be mistaken for a 
dingbat, and it needs to be easy to spot in proofreading in both print and 
onscreen (some applications, e.g. Adobe InDesign, make the latter a bit 
easier by applying colour highlight to the .notdef glyph).

>There seems to be a gap between the Unicode Technical Committee encoding
>characters into a file and the process of making sure that the desired text
>is rendered correctly on an end user's platform with good provenance.  I
>feel that that issue needs to be addressed.  Hopefully the Unicode Technical
>Committee will wish to take that task upon itself.  If not, then perhaps
>some other process can be found of codifying a standard method.


This is an implementation issue and is the responsibility of individual 
system and application developers. If you over-define a standard, by 
creating lots of little rules for every possible eventuality in 
implementation, you'll find that people will not implement your standard.

>>That's even more moronic than saying that a font has to contain a glyph for
>>every character in Unicode in order to support the standard.
>
>I did not write that.


I didn't say you did. This is often and erroneously cited as 

Re: ct, fj and blackletter ligatures

2002-11-06 Thread Thomas Lotze
William Overington wrote:

> Also, perhaps there could be a method for asking a font to
> please display all its ZWJ sequences and their results.
[...]
> Now it might be that some advanced font formats can do such things, I
> do not know at present.
[...]
> Also, perhaps some method of asking a font to declare a
> list of the code points for which it has a specific glyph would be
> helpful. Again, perhaps some advanced font formats have these
> abilities, I do not know at present.

A font is not a program that executes machine code. A font is data, and
a program only in the sense that, e.g., a PostScript font contains data
in the PostScript programming language. Hence, a font can't 'do' this,
only an application using the font can. Such an application will extract
information it is interested in from the font, and do with it as its
purpose commands. A font viewer application might display to the user
tables of all glyphs, ligatures, and variations; the information (at
least that about the total glyphs) is there in the font by definition.

Similarly, it's a matter of the rendering engine to notify the
application that requested the rendition (and thereby the user) that it
didn't find all the glyphs it was looking for in the font. Maybe neither
Unicode nor the font format specifications require such a behaviour, but
one could of course write a rendering engine that exhibits it, whatever
fonts it works on.

Cheers, Thomas

-- 
Thomas Lotze

thomas.lotze at gmx.net  http://www.thomas-lotze.de/




RE: ct, fj and blackletter ligatures

2002-11-06 Thread Dominikus Scherkl
> > > Firstly, the claim that there must be no ligation over subword
> > > boundaries is made only for German. 
> > 
> > It is also valid for Slovak and Czech.
> I still wonder a bit why.

There are wonderful words in German like "Wachstube"
this could mean "guards room" (Wach-Stube, so "st" may be
ligated) or "wax tube" (Wachs-Tube, so an "st"-ligature
would force misreadings).
In this rare case both readings make sense, but there are
many more where a displaced ligature would simply lead to
misreadings where syllables are gathered the wrong way
which don't make any sense at all.
-- 
Dominikus Scherkl
[EMAIL PROTECTED]




Re: ct, fj and blackletter ligatures

2002-11-06 Thread Peter_Constable
On 11/06/2002 05:05:17 AM "William Overington" wrote:

>I am thinking here of ordinary TrueType fonts on a Windows 95 platform
>and on a Windows 98 platform.

So, by "ordinary" you mean a TTF with a cmap table but no GSUB or other 
tables that perform glyph transformations (though fonts containing such 
tables are just as much TrueType fonts as fonts that do not -- and some 
fonts with such tables were part of some versions of Win 98).
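
Whether a given file is "ordinary" in this sense can be checked by reading
the table directory at the start of the font file. The Python sketch below
uses only the standard library; the choice of tags treated as "glyph
transformation tables" (GSUB and GPOS for OpenType, mort and morx for AAT,
Silf for Graphite) is an assumption made for this example, and font
collections (.ttc) are not handled.

```python
import struct

# Tags assumed here to indicate glyph transformation support.
LAYOUT_TAGS = {"GSUB", "GPOS", "morx", "mort", "Silf"}


def sfnt_tables(data):
    """Return the table tags listed in an sfnt (TrueType/OpenType) header."""
    # Offset table: sfntVersion (4 bytes), then numTables, searchRange,
    # entrySelector, rangeShift (2 bytes each), all big-endian.
    _version, num_tables = struct.unpack_from(">4sH", data, 0)
    tags = []
    for i in range(num_tables):
        # Each 16-byte record holds: tag, checksum, offset, length.
        tag, _checksum, _offset, _length = struct.unpack_from(
            ">4sIII", data, 12 + 16 * i)
        tags.append(tag.decode("latin-1"))
    return tags


def is_ordinary(tags):
    """True for a font with a cmap but no glyph transformation tables."""
    return "cmap" in tags and not LAYOUT_TAGS & set(tags)
```

In practice the bytes would come from reading a font file in binary mode
and passing the contents to sfnt_tables().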


>  I was under the impression that the reason that
>an ordinary TrueType font will not process a ZWJ sequence on those
>platforms was that both the operating system and the ordinary TrueType
>font do not have the capabilities to process ZWJ sequences.

Given your definition of "ordinary TrueType font", glyph transformations 
are not possible, by definition. But your definition isn't all that 
relevant: fonts that contain tables to perform glyph transformations can 
be used on *any* flavour of Win32 (or other platforms), given appropriate 
software.

It is true that the western versions of neither Win95 nor Win98 had 
OS-level capability of applying tables inside TrueType fonts to perform 
glyph transformations. The Mideast version of at least Win98 (not sure 
about Win95) did make use of some such tables (at that time, the 
technology was known as TrueType Open). But any application software on 
any platform could make use of such tables, provided the software is 
written to do so.

You'll probably come back to say, "But I was talking about 'ordinary 
TrueType fonts'." If you insist on an invalid assumption, there's no way 
to argue against it. It's like saying, "software with a character-mode UI 
is not capable of displaying bitmap graphics" -- true, but irrelevant.


>  My understanding is that
>even an OpenType font with ZWJ sequence facilities will not work on a
>Windows 95 or Windows 98 platform.

It can, given software that knows how to process such sequences to do 
glyph substitutions.



>As far as I know, there is no requirement in Unicode that the rendering
>system should notify, perhaps using an Alert dialogue box or similar, the
>end user that the ZWJ request has been "made yet not fulfilled".
>
>Can an advanced format font supply such a message to the rendering system
>for onward notification of the end user?

Yes, actually, there is a built-in feedback mechanism: the font provides 
to the rendering system outline data for the c glyph and the t glyph, and 
the rendering system rasterises those outlines in consecutive order, so 
the user sees a glyph sequence "ct" rather than the ct ligature. This 
feedback mechanism even works on older systems.

A font implementer could make use of this built in capability to provide 
even more explicit information: a font feature might be used to cause 
invisible characters to be displayed in some way (similar to seeing a 
raised circle for the non-breaking space when you set Word to show 
non-printing characters).

If you really want a dialog box to popup providing notification to the 
user, I'm wondering how many times as the file is opened and a page is 
rendered you'd like this popup to appear? 17 times if there are 17 
instances of < c, ZWJ, t > that are not rendered as a ct ligature? Not on 
my system, thank you.


>Also, perhaps some method of asking a font to declare a
>list of the code points for which it has a specific glyph would be
>helpful.

Software simply needs to inspect the cmap table. No new mechanism is 
needed for this. You're enumerating solutions that need to be built for 
problems that don't exist.
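
To make that concrete: once the cmap has been read, presenting the covered
code points is a few lines of ordinary code. The Python sketch below is an
illustration only; it assumes the cmap keys are already available as
integers and its function names are invented.

```python
def coverage_ranges(code_points):
    """Collapse code points into sorted inclusive (start, end) runs."""
    ranges = []
    for cp in sorted(set(code_points)):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)  # extend the current run
        else:
            ranges.append((cp, cp))           # start a new run
    return ranges


def format_ranges(ranges):
    """Render runs in the usual U+XXXX notation."""
    return ", ".join(
        f"U+{a:04X}" if a == b else f"U+{a:04X}..U+{b:04X}"
        for a, b in ranges
    )


print(format_ranges(coverage_ranges([0x41, 0x42, 0x43, 0x61, 0xFB01])))
```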



>There seems to be a gap between the Unicode Technical Committee encoding
>characters into a file and the process of making sure that the desired
>text is rendered correctly on an end user's platform with good provenance.

It is not the job of the Unicode Technical Committee to define guidelines 
or review implementations for rendering of text.


>I
>feel that that issue needs to be addressed.  Hopefully the Unicode
>Technical Committee will wish to take that task upon itself.

I assure you, they will not.


- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>





RE: ct, fj and blackletter ligatures

2002-11-06 Thread John Hudson
At 08:04 11/6/2002, Dominikus Scherkl wrote:


>There are wonderful words in German like "Wachstube"
>this could mean "guards room" (Wach-Stube, so "st" may be
>ligated) or "wax tube" (Wachs-Tube, so an "st"-ligature
>would force misreadings).
>In this rare case both readings make sense, but there are
>many more where a displaced ligature would simply lead to
>misreadings where syllables are gathered the wrong way
>which don't make any sense at all.


In the case of Wachstube, using an st ligature would only 'force a 
misreading' if the correct reading were 'wax tube'. One could equally well 
argue that using an st ligature would reinforce a *correct* reading of 
'guards room', in which case the ligature should perhaps not be prohibited 
but encouraged because it removes the ambiguity of the ligatureless spelling.

Of course, all this depends on the notion that readers automatically 
associate ligature formation with syllable construction, which I don't 
think is at all certain. Things like the ct and st ligatures are oddities, 
in that they are not standard elements of Latin script typography in any 
language. Consider, instead, f-ligatures, which are standard for most 
languages and which have a functional purpose in preserving good wordshape. 
I don't believe that English readers encountering an fb ligature in the 
middle of the compound word 'goofball' are confused about where the 
syllables, and hence the subwords, end and begin. Indeed, the point of 
having the ligature is so that the reader's attention will not be drawn to 
the sequence. Competent readers do not notice standard ligatures. Plenty of 
people read hundreds of books during their lives without even knowing that 
ligatures exist.

Are the German ligation rules backed up by any empirical studies of the 
ways in which competent German readers read? Or is it a convention based on 
grammatical theory without any reference to the mechanics of reading?

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force.  - Michael Apostolis, 1467




Re: ct, fj and blackletter ligatures

2002-11-07 Thread William Overington
Peter Constable wrote as follows.

>You'll probably come back to say, "But I was talking about 'ordinary
>TrueType fonts'."

No I won't.  It's not my personality type to do so.  Have a look at the
Myers Briggs Type Indicator for personality type, the key message is that
not everybody has the same personality type.

I may argue a point if I consider it right to do so, but I do not argue
something just for the sake of arguing or because of some notion of not
being willing to lose face or something like that in accepting that I did
not previously know something.

I mean, that is pointless and is a waste of time.  Anyway, it is not my
nature to be like that.

So, I did not know the correct situation and you have helped me by
explaining more about it.  Thank you.

>If you insist on an invalid assumption, there's no way
>to argue against it. It's like saying, "software with a character-mode UI
>is not capable of displaying bitmap graphics" -- true, but irrelevant.

But I won't, it's not my personality to do so.

I genuinely did not understand and I am grateful to you for explaining the
matter to me.

>If you really want a dialog box to popup providing notification to the
>user, I'm wondering how many times as the file is opened and a page is
>rendered you'd like this popup to appear?

Once.  A notification in a dialogue box that the problem exists with a
button to click for further detailed information as to which character or
characters, how many times for each, and on which pages and lines.

>17 times if there are 17
>instances of < c, ZWJ, t > that are not rendered as a ct ligature?

No, just the once.

>Not on
>my system, thank you.

Certainly not!

Thank you for explaining the matter about the TrueType fonts.

William Overington

7 November 2002




FW: ct, fj and blackletter ligatures

2002-11-07 Thread Dominikus Scherkl
Hello.

> > There are wonderful words in German like "Wachstube"
> > this could mean "guards room" (Wach-Stube, so "st" may be
> > ligated) or "wax tube" (Wachs-Tube, so an "st"-ligature
> > would force misreadings).
> In the case of Wachstube, using an st ligature would only 'force a 
> misreading' if the correct reading were 'wax tube'.
yes, so if this meaning was intended, it's very bad to ligate.

If you don't like this example, how about 'fl' (very common in
English)? Several German words are composed of one part ending in 'f'
and a next part beginning with 'l'. But if this second part is e.g.
"lasche" (hanger), an 'fl'-ligature leads the reader to "flasche" (bottle).
The situation gets even worse if the word is then (erroneously) hyphenated
at this point (if you're lucky, at the end of a page).
You may need to flip the page twice or more until you recognise that
the strange "grei-flasche" was intended to mean "greif-lasche" (one of
those things you hold on to in a bus or so).
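
One way to encode this in text, following the ZWNJ suggestion made earlier
in the thread, is sketched below in Python. The "|" boundary mark, the pair
list and the function name are all assumptions made for the example; the
boundaries themselves have to come from the author or a dictionary, since
a word like "Wachstube" cannot be split automatically.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
# Illustrative list only; a real font's ligature set will differ.
LIGATURE_PAIRS = {"ff", "fi", "fl", "ft", "st", "ch", "ck", "tz"}


def block_ligatures(marked, boundary="|"):
    """Join a compound written with boundary marks, e.g. 'Greif|lasche'.

    A ZWNJ is inserted only where a listed ligature pair would otherwise
    straddle the subword boundary.
    """
    parts = marked.split(boundary)
    out = parts[0]
    for part in parts[1:]:
        if out and part and (out[-1] + part[0]).lower() in LIGATURE_PAIRS:
            out += ZWNJ
        out += part
    return out


for word in ("Greif|lasche", "Wachs|tube", "Wach|stube"):
    print(repr(block_ligatures(word)))
```

Under these assumptions "Greif|lasche" and "Wachs|tube" each receive a ZWNJ
at the boundary, while "Wach|stube" is left untouched so that an st
ligature may still form inside "stube".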

> I don't believe that English readers encountering an fb 
> ligature in the middle of the compound word 'goofball'
> are confused about where the syllables, and hence the subwords,
> end and begin.
That may be because English doesn't use word concatenations the
way German does: words like "Krankenversicherungsbeitragserhöhungen"
become even more unreadable if you hide the in-word boundaries with
bad ligatures.

> plenty of people read hundreds of books during their lives without
> even knowing that ligatures exist.
I have sometimes stumbled (especially in recent years, since several
publishers have changed to computer typesetting systems) over strange
places to hyphenate or ligate glyphs. It has sometimes made me
very angry how sloppily books are typeset nowadays.

> Are the German ligation rules backed up by any empirical 
> studies of the ways in which competent German readers read?
Yes, I think so (and I feel it personally).
And, as someone mentioned, German is not the only language where
word-joining ligatures are prohibited in order to improve
readability.

Best regards.
-- 
Dominikus Scherkl
[EMAIL PROTECTED]




Re: ct, fj and blackletter ligatures

2002-11-07 Thread Peter_Constable
On 11/07/2002 04:27:32 AM "William Overington" wrote:

>I may argue a point if I consider it right to do so, but I do not argue
>something just for the sake of arguing...

I didn't mean to suggest that you do. 



>Once.  A notification in a dialogue box that the problem exists 

You're assuming there is a problem. If I send you a document and I wanted 
it to display in Comic Sans but you don't have that font on your system, 
so you end up seeing it in, say, Arial, does that merit a dialog box? 
Perhaps, but systems just aren't working that way, and there apparently 
hasn't been any great cry to make them do so. As for providing a 
notification dialog to say that the text contains < c, ZWJ, t > but that 
the font doesn't support it, there are no existing mechanisms to support 
that at present, but it hasn't been demonstrated that there really is any 
need, and I really don't expect vendors will be hearing too many 
complaints from users.



>Thank you for explaining the matter about the TrueType fonts.

You're welcome.




- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>





Re: ct, fj and blackletter ligatures

2002-11-07 Thread John H. Jenkins

On Thursday, November 7, 2002, at 09:40 AM, [EMAIL PROTECTED] 
wrote:

> As for providing a notification dialog to say that the text contains
> < c, ZWJ, t > but that the font doesn't support it, there are no
> existing mechanisms to support that at present, but it hasn't been
> demonstrated that there really is any need, and I really don't expect
> vendors will be hearing too many complaints from users.


Actually, you *could* do it on a Mac if you really wanted to.  I'm not 
sure why you would, however.  One of the advantages of the ZWJ 
mechanism for requesting ligatures is that if the request is impossible 
to fulfill, it can be ignored.  For discretionary ligatures like ct, 
this is the appropriate response.  (Matters are a bit more complicated 
for required ligatures, of course.)
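
A small sketch of the mechanism described above, for illustration only:
the request is a ZERO WIDTH JOINER (U+200D) placed between the two
letters, and a renderer that cannot fulfil it can simply drop the ZWJ and
show the plain letters. Both function names are invented for this example.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def request_ligature(text, pair):
    """Insert ZWJ between the two letters of every occurrence of `pair`."""
    first, second = pair
    return text.replace(first + second, first + ZWJ + second)


def ignore_requests(text):
    """What a renderer without the ligature can do: drop every ZWJ."""
    return text.replace(ZWJ, "")


marked = request_ligature("perfect act", "ct")
plain = ignore_requests(marked)  # identical to the original text again
```

Note that dropping every ZWJ is only reasonable for discretionary
ligatures in Latin text; scripts in which ZWJ affects required shaping
would need more care.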

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://www.tejat.net/




RE: ct, fj and blackletter ligatures

2002-11-07 Thread Marco Cimarosti
Kent Karlsson wrote:
> (Subword boundaries are likely hyphenation
> points, whereas occurrences of ff, fi etc. elsewhere are
> unlikely hyphenation points.)

I am sorry to always contradict you but, in Italian, there always is an
hyphenation point between two identical consonant letters. Nevertheless,
Italian typography traditionally requires the "ff", "ffi" and "ffl"
ligatures.

BTW, this leads me to a horrible thought: would a shy hyphen between the two
f's prevent the formation of the "ff" ligature? If this is the case, fonts
might also need to have <f>+<SHY>+<f>, <f>+<SHY>+<fi>, and
<f>+<SHY>+<fl> in their ligature tables.

_ Marco




Re: ct, fj and blackletter ligatures

2002-11-07 Thread Thomas Lotze
On Thu, 7 Nov 2002 10:40:48 -0600
[EMAIL PROTECTED] wrote:

> You're assuming there is a problem. If I send you a document and I
> wanted it to display in Comic Sans but you don't have that font on
> your system, so you end up seeing it in, say, Arial, does that merit a
> dialog box?

Depends. For a home user who doesn't care about fonts, it doesn't. For
someone who cares, it does. If I am interested in some information, I
don't like it if I can't get it just because someone couldn't imagine it
was a problem.

> As for providing a 
> notification dialog to say that the text contains < c, ZWJ, t > but
> that the font doesn't support it, there are no existing mechanisms to
> support that at present,

I don't understand this. Since a font doesn't "do" anything, but
software using the font does, one could write a rendering engine which
gives feedback about how well it could complete its job, and a
typesetting application might elect to make use of that feedback and
provide a notification dialog or whatever. BTW, as the information about
available glyphs in a font is independent of character encoding, I don't
see the relevance of this whole discussion to this list.

> but it hasn't been demonstrated that there really is any 
> need, and I really don't expect vendors will be hearing too many 
> complaints from users.

That's no reason. Just because many people don't need a feature, or
maybe just don't care enough to complain, doesn't mean it shouldn't be
provided for those who do need it.

I could well imagine that in a typesetting application, it would make
sense to be informed on whether a certain typographic feature can or
cannot be applied. Carefully checking a long document for whether, e.g.,
a certain pair of glyphs does form a ligature where it is supposed to is
tedious and error-prone if done by a human, but such routine tasks are
what computers are good at.
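
To make the idea concrete, here is a rough sketch of such a check. It
assumes ligatures are requested with ZWJ (U+200D) between two letters, and
it takes the set of ligatures a font offers as a plain Python set; that
set is hypothetical, and a real tool would have to read it from the font's
own tables. Only two-letter requests are handled.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def unfulfilled_requests(text, font_ligatures):
    """Return (index, letters) for each ZWJ request the font cannot honour.

    `font_ligatures` is a hypothetical set of two-letter sequences for which
    the font has a ligature glyph.
    """
    problems = []
    for i, ch in enumerate(text):
        # A request needs a letter on both sides of the ZWJ.
        if ch == ZWJ and 0 < i < len(text) - 1:
            letters = text[i - 1] + text[i + 1]
            if letters not in font_ligatures:
                problems.append((i, letters))
    return problems


document = "fac\u200dt, f\u200djord"
report = unfulfilled_requests(document, {"ct"})  # the fj request is reported
```

A typesetting application could run this again after every font change
and show the list, instead of leaving the author to inspect each ligature
by eye.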

Cheers, Thomas

-- 
Thomas Lotze

[EMAIL PROTECTED]  http://www.thomas-lotze.de/





RE: ct, fj and blackletter ligatures

2002-11-07 Thread Kent Karlsson

> Kent Karlsson wrote:
> > (Subword boundaries are likely hyphenation
> > points, whereas occurrences of ff, fi etc. elsewhere are
> > unlikely hyphenation points.)
> 
> I am sorry to always contradict you 

I don't think we always contradict each other! ;-)
Indeed we seem to agree that the TAG "characters"
should be deprecated... [Long live the TAG BOLD and
TAG ITALIC!; ah, no]

> but, in Italian, there always is an
> hyphenation point between two identical consonant letters. 

ok. (And I don't think we disagree here at all.)

> Nevertheless,
> Italian typography traditionally requires the "ff", "ffi" and "ffl"
> ligatures.

I guess, fi, fl, etc. too.

> BTW, this leads me to a horrible thought: would a shy hyphen 
> between the two
> f's prevent the formation of the "ff" ligature? If this is
> the case, fonts
> might also need to have <f>+<SHY>+<f>, <f>+<SHY>+<fi>, and
> <f>+<SHY>+<fl> in their ligature tables.

I'm not sure to what extent rendering systems "send" SHYs to fonts.
If they do, then yes, such ligature table entries would be needed.
If SHYs are handled before any characters are "sent" to the font (they
would need to be, for proper interpretation, right?), they could also
be removed before the string is "sent" to the font (or more generally,
before automatic ligature processing).
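
The second option can be sketched as follows. This is only an
illustration of the idea, not a description of how any existing rendering
system works: the soft hyphens (U+00AD) are taken out and remembered as
break offsets, so the ligature step sees plain "ff" while the line breaker
still knows where it may hyphenate. The function name is invented.

```python
SHY = "\u00ad"  # SOFT HYPHEN


def split_soft_hyphens(text):
    """Return (text without SHY, break offsets into the cleaned text)."""
    cleaned = []
    breaks = []
    for ch in text:
        if ch == SHY:
            # A break is allowed before the next kept character.
            breaks.append(len(cleaned))
        else:
            cleaned.append(ch)
    return "".join(cleaned), breaks


# Italian "affitto" with soft hyphens as af-fit-to.
cleaned, breaks = split_soft_hyphens("af\u00adfit\u00adto")
```

If a line break is actually taken at one of the offsets, the two f's end
up on different lines and no ligature can form there anyway.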

If, on the other hand, like (maybe!) for German, one want 
and similar to not form a ligature in the graphical sense, one would
have to somehow use special ligature entries or special letter forms
for what originally was  (etc.) so that no ligature in the
graphical sense would be formed, nor any ugly overlap.


/Kent K





Re: ct, fj and blackletter ligatures

2002-11-07 Thread Peter_Constable
On 11/07/2002 12:43:22 PM Thomas Lotze wrote:

>> As for providing a
>> notification dialog to say that the text contains < c, ZWJ, t > but
>> that the font doesn't support it, there are no existing mechanisms to
>> support that at present,
>
>I don't understand this. Since a font doesn't "do" anything, but
>software using the font does, one could write a rendering engine which
>gives feedback

Sure, if you're writing software that interprets OT lookups, you could do 
this. If you want to write your own code to process the state tables in 
AAT or Graphite fonts, you could do this. The software that currently does 
these things, and that most app developers are going to rely on, does not.



>That's no reason. Just because many people don't need a feature, or
>maybe just don't care enough to complain, doesn't mean it shouldn't be
>provided for those who do need it.

If you want it, write to the vendor that creates whatever software you 
use, and ask them to support it. (And good luck!) But I really don't 
expect you're going to find the font / rendering industry as a whole to 
agree that this is something that needs to be supported across rendering 
systems; there are too many important issues waiting to be dealt with to 
worry about this.


>I could well imagine that in a typesetting application, it would make
>sense to be informed on whether a certain typographic feature can or
>cannot be applied.

In a typesetting application, you would be immediately informed: you see 
it in your proof, or you do not. If it's not there, you act accordingly. 
In typesetting, you probably don't want to send character data to your 
service bureau and assume that their rendering process will produce the 
same results. They may not have exactly the same typeface or match other 
aspects of rendering, let alone not get the ligature where you want.


> Carefully checking a long document for whether, e.g.,
>a certain pair of glyphs does form a ligature where it is supposed to is
>tedious and error-prone if done by a human, but such routine tasks are
>what computers are good at.

But if you're an author wanting a ligature, you don't need to proofread 
the entire long document; you just enter one instance and check it, and 
you'll know right away what the results for the rest of the document will 
be.


- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>





Re: ct, fj and blackletter ligatures

2002-11-07 Thread Thomas Lotze
On Thu, 7 Nov 2002 16:57:04 -0600
[EMAIL PROTECTED] wrote:

> >> As for providing a
> >> notification dialog to say that the text contains < c, ZWJ, t > but
> >> that the font doesn't support it, there are no existing mechanisms
> >> to support that at present,

> Sure, if you're writing software that interprets OT lookups, you could
> do this. If you want to write your own code to process the state
> tables in AAT or Graphite fonts, you could do this. The software that
> currently does these things, and that most app developers are going to
> rely on, does not.

OK. I had read your earlier statement to the effect that it would be
impossible in principle. But if the mechanisms that are missing are
"just" the implementations, I can see your point.

> In a typesetting application, you would be immediately informed: you
> see it in your proof, or you do not. If it's not there, you act
> accordingly.
[...]
> But if you're an author wanting a ligature, you don't need to
> proofread the entire long document; you just enter one instance and
> check it, and you'll know right away what the results for the rest of
> the document will be.

Depends on how the system works. If it's a thing where you get to see
the result while typing, this is true to a certain degree. If it's a
system working like, say, TeX that produces a typeset document (PDF etc)
from a marked-up text input (a TeX file, XML, etc), it's a different
matter.

But even if you see the outcome as you're typing, you might decide to
apply a different font to the whole document later on, or change
something similar. Obviously you wouldn't want to retype it all just to
see it comes out right this time, and it would even be a stupid and
error-prone task to try out all the ligatures etc again after each
change.

Cheers, Thomas

-- 
Thomas Lotze

[EMAIL PROTECTED]  http://www.thomas-lotze.de/





Re: ct, fj and blackletter ligatures

2002-11-09 Thread Anto'nio Martins-Tuva'lkin
On 2002.11.07, 19:37, Kent Karlsson <[EMAIL PROTECTED]> wrote:

> [Long live the TAG BOLD and TAG ITALIC!; ah, no]

Of course we don't need any tag for style and basic face -- after all we
have all those bold/italic + serif|sans math letters in plane 2... ;-)

>> but, in Italian, there always is an hyphenation point between two
>> identical consonant letters.

In Portuguese also, FWIW.

-- 
António MARTINS-Tuválkin
<[EMAIL PROTECTED]>
R. Laureano de Oliveira, 64 r/c esq.
PT-1885-050 MOSCAVIDE (LRS)
+351 917 511 549
http://www.tuvalkin.web.pt/bandeira/
http://pagina.de/bandeiras/

Não me invejo de quem tem
carros, parelhas e montes
só me invejo de quem bebe
a água em todas as fontes





Re: ct, fj and blackletter ligatures

2002-11-12 Thread Anto'nio Martins-Tuva'lkin
On 2002.11.09, 20:37, Anto'nio Martins-Tuva'lkin
<[EMAIL PROTECTED]> wrote: 

> Of course we don't need any tag for style and basic face -- after all
> we have all those bold/italic + serif|sans math letters in plane 2...
> ;-)

Here I mean Plane 1, of course -- which links to a near-by thread...

-- 
António MARTINS-Tuválkin
<[EMAIL PROTECTED]>
R. Laureano de Oliveira, 64 r/c esq.
PT-1885-050 MOSCAVIDE (LRS)
+351 917 511 549
http://www.tuvalkin.web.pt/bandeira/
http://pagina.de/bandeiras/

Não me invejo de quem tem
carros, parelhas e montes
só me invejo de quem bebe
a água em todas as fontes





Re: FW: ct, fj and blackletter ligatures

2002-11-07 Thread Markus Scherer
Dominikus Scherkl wrote:

>> I don't believe that English readers encountering an fb
>> ligature in the middle of the compound word 'goofball'
>> are confused about where the syllables, and hence the subwords,
>> end and begin.
>
> That may be because English doesn't use word-concatenations the
> way German does: words like "Krankenversischerungsbeitragserhöhungen"
                                           ^^^
> become even more unreadable if you hide the in-word boundaries with
> bad ligatures.


"Versischerung" sounds more like Schwäbisch... :-)
markus





A .notdef glyph (derives from Re: ct, fj and blackletter ligatures)

2002-11-06 Thread William Overington
John Hudson wrote as follows.

>Here's an exercise for your enthusiasm, William: devise the form of the
>perfect .notdef glyph. It needs to unambiguously indicate that a glyph is
>missing, i.e. it should be something that cannot easily be mistaken for a
>dingbat, and it needs to be easy to spot in proofreading in both print and
>onscreen (some applications, e.g. Adobe InDesign, make the latter a bit
>easier by applying colour highlight to the .notdef glyph).

Thank you for the design brief.

Here is my design.

The design consists of a single contour in as large a square box as is
possible for the particular font.

In my prototype I used a box 2048 font units by 2048 font units, so that
n, the half-width used in the coordinates below, is 1024.

The contour has seven points, the first point and the last point being at
the same place.

Point 1 is at (0,0) and is on the curve.
Point 2 is at (0,2n) and is off the curve.
Point 3 is at (2n,2n) and is on the curve.
Point 4 is at (2n,n) and is on the curve.
Point 5 is at (n,n) and is on the curve.
Point 6 is at (n,0) and is on the curve.
Point 7 is at (0,0) and is on the curve.
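
For anyone who wants to look at the shape without a font editor, the
points above can be sketched as follows. This assumes the TrueType
convention that an off-curve point between two on-curve points is the
control point of a quadratic curve; the function names are invented for
the example, and the SVG output is drawn with y increasing downwards, so
it appears flipped vertically compared with font coordinates.

```python
def notdef_contour(n=1024):
    """The seven points as (x, y, on_curve) tuples in font units."""
    return [
        (0, 0, True),
        (0, 2 * n, False),
        (2 * n, 2 * n, True),
        (2 * n, n, True),
        (n, n, True),
        (n, 0, True),
        (0, 0, True),
    ]


def quad_point(p0, p1, p2, t):
    """Point at parameter t on the quadratic curve p0, p1 (control), p2."""
    u = 1 - t
    x = u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1]
    return (x, y)


def svg_path(points):
    """SVG path data; each off-curve point must sit between two on-curve ones."""
    x, y, _ = points[0]
    commands = [f"M{x} {y}"]
    i = 1
    while i < len(points):
        x, y, on_curve = points[i]
        if on_curve:
            commands.append(f"L{x} {y}")
            i += 1
        else:
            end_x, end_y, _ = points[i + 1]
            commands.append(f"Q{x} {y} {end_x} {end_y}")
            i += 2
    commands.append("Z")
    return " ".join(commands)


points = notdef_contour()
path = svg_path(points)
# path == "M0 0 Q0 2048 2048 2048 L2048 1024 L1024 1024 L1024 0 L0 0 Z"
middle = quad_point(points[0], points[1], points[2], 0.5)  # (512.0, 1536.0)
```

The midpoint of the curved edge lies at (n/2, 3n/2), which shows how far
the arc bulges towards the top left corner of the box.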

This makes the glyph easy to draw and solid enough to be noticeable.  It is
distinctively shaped, with both a curved line and straight lines, so that it
stands out, and its arc runs against the usual design of the graphical user
interface of a computer program's input screen, which hopefully makes it
more noticeable still.  In addition, the white space is set out so that
where several copies of the glyph appear in sequence on a page of text,
they are easily counted.

I hope that you like the design.

William Overington

6 November 2002