Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-13 Thread Martin J. Dürst
If you have better ideas of how to specify ligature-related properties 
in CSS, please send them to the relevant place given in the draft 
(sorry, currently offline, otherwise I'd look it up).


Regards,   Martin.

On 2011/09/13 1:41, Christoph Päper wrote:

Philippe Verdy:


And it would be desirable to have a standardized CSS property for controlling 
this default behavior in browsers.




(In my opinion there would be better ways to spec this, though.)






Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-13 Thread Philippe Verdy
I clearly see ligatures when zooming in, and the ligature disappears
when I select an individual character (Is that a rendering issue of
the Arial font, where the glyphs are colliding, and the collisions are
not expected when performing individual character selections ?)

I also see these ligatures occurring with Times New Roman. And anyway
if CSS3 continues like it is currently specified, it should be the
default expected behavior of browsers.

2011/9/13 Jukka K. Korpela 
>
> 12/09/2011 20:29, Philippe Verdy wrote:
>
>> I see those ligatures applied in Chrome v.13.0.782.220 over Windows 7
>> SP1 French, just when reading this email in Gmail which renders it with
>> the stock Arial font of Windows (no webfont used). My locale preferences
>> in the browser and in my Gmail profile are first in French (France),
>> then English (US).
>>
>> Zoom in, you'll see that these ligatures are rendered by default. Still
>> you can select the individual letters in "fi" or "fl" or "ffi" or "ffl",
>> copy-pasting to another document from the browser generates 2
>> characters, and a DOM inspection of the HTML document with the
>> Developer Tools shows that there are effectively two letters in the
>> HTML document (and no ZWJ in the middle).
>
> So how did you conclude that there are any ligatures? As far as I can see, 
> the fi and fl ligatures in Arial are identical in appearance with the 
> corresponding two-letter combinations, and ffi and ffl ligatures do not exist 
> in Arial.
>
> If it looks like two characters, walks like two characters...
>
> --
> Yucca, http://www.cs.tut.fi/~jkorpela/
>



Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-13 Thread Jukka K. Korpela

13/09/2011 00:42, Stephan Stiller wrote:


On 9/12/2011 10:46 AM, Philippe Verdy wrote:

common-ligatures
Enables display of common ligatures (OpenType feature: |liga|). For
OpenType fonts, common ligatures are enabled by default.

This means that German documents will really need to use ZWNJ
(fortunately, this character should soon become standard on German
keyboards, and CSS3 would be a good motivation for including this key
mapping) for common ligatures like fi,fl, ff, ffi, ffl, ſt, or even tt...


It would be nicer if the user were, "by default", offered a choice.


I’m afraid this discussion, though on-topic in my opinion, has become 
rather specialized and technical (in terms of web techniques) for this 
list. I share Philippe’s concern for the change: changing the way 
browsers work in rendering texts is not a good thing when it changes the 
_default_ behavior.


Even if a change, like using a ligature for “fi,” might be an 
improvement on average, that’s not enough. There are too many things 
that may get broken that way—even if we don’t consider drastic (yet 
realistic) issues like intentionally monospace text.


But I don’t think the Unicode Consortium, or the community supporting 
Unicode at large, could make a useful move in this issue. It really 
calls for common sense, rather than anything else, from browser vendors 
and CSS specs authors to realize that the default rendering should be 
left intact, as there are too many potential parameters to consider.


I see this primarily as an _author_ choice. The user should have the 
last word, as he has if he really wants that, but for the most of it, 
typographic issues like ligatures are not something that users can and 
will deal with. Authors can be expected to do that, if they care, and it 
should not be too much of a burden to write an author stylesheet that 
suggests ligature behavior for all text, if that’s desirable and possible.


--
Yucca, http://www.cs.tut.fi/~jkorpela/



Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-13 Thread Stephan Stiller



I clearly see ligatures when zooming in, and the ligature disappears
when I select an individual character


Philippe is referring to the same effect you could see on an older 
Firefox: when you'd mark/select Arabic text with your mouse, it'd 
re-render the characters as if there were additional ZWNJs present.


-S




Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-13 Thread Philippe Verdy
Not necessarily: Uniscribe can place a caret at an approximate
position in the middle of a ligature, but the situation is different
when performing a selection, because the selection splits the text into
separate runs (with distinct color attributes, even though they are part
of the same "range").

See the terminology of "runs" and "ranges" in the MSDN documentation:
the two are not synonyms, they designate different groupings, and both
differ from clusters. They do not group the same things either
(ranges are for the character level, runs are for the glyph level and
have no meaning in Unicode).

2011/9/13 Stephan Stiller :
>
>> I clearly see ligatures when zooming in, and the ligature disappears
>> when I select an individual character
>
> Philippe is referring to the same effect you could see on an older Firefox:
> when you'd mark/select Arabic text with your mouse, it'd re-render the
> characters as if there were additional ZWNJs present.



Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-13 Thread Philippe Verdy
2011/9/13 Jukka K. Korpela :
> 13/09/2011 07:18, Philippe Verdy wrote:
>
>> I clearly see ligatures when zooming in,
>
> That’s odd, because when looking at text in Arial, I find it very difficult
> to distinguish between the fi combination and the ligature U+FB01 (fi).
> There’s one pixel less space between the f and i in the ligature, for some
> large font sizes, but that’s it.

I absolutely don't see any difference in Chrome (not even one pixel or
a grayed quarter pixel when taking a bitmap snapshot and zooming it in
an external image editor), even at maximum zoom level, in your message
between the two letters and the 1-character ligature.
But I clearly see the difference in Notepad. Visibly, even if you can
place the input caret between the two letters, it is using exactly the
same glyph as the ligature. (The difference only appears when
selecting with the mouse one of the characters, and only in that case,
1 joining pixel disappears, which gets immediately restored when
unselecting the text, or when selecting both letters
simultaneously...)

I should try with fonts where the differences are more evident anyway.




Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-12 Thread Jukka K. Korpela

13/09/2011 07:18, Philippe Verdy wrote:


I clearly see ligatures when zooming in,


That’s odd, because when looking at text in Arial, I find it very 
difficult to distinguish between the fi combination and the ligature 
U+FB01 (fi). There’s one pixel less space between the f and i in the 
ligature, for some large font sizes, but that’s it.


If you _clearly_ see a difference, I suspect the ligature might be from 
a different font.



I also see these ligatures occurring with Times New Roman.


There the difference is real but small, in most font sizes at least. 
What happens if you try Consolas? And does this happen on Chrome when 
viewing a normal web page, or is it somehow related to Gmail (which 
might do something special)?


I’m using the Finnish version of Win 7 Pro. Changing the language to French 
via the Control Panel country & language settings didn’t have an effect 
on this (and I didn’t expect an effect, as those settings have a rather 
limited effect).


--
Yucca, http://www.cs.tut.fi/~jkorpela/



Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-12 Thread Jukka K. Korpela

12/09/2011 20:29, Philippe Verdy wrote:


I see those ligatures applied in Chrome v.13.0.782.220 over Windows 7
SP1 French, just when reading this email in Gmail which renders it with
the stock Arial font of Windows (no webfont used). My locale preferences
in the browser and in my Gmail profile are first in French (France),
then English (US).

Zoom in, you'll see that these ligatures are rendered by default. Still
you can select the individual letters in "fi" or "fl" or "ffi" or "ffl",
copy-pasting to another document from the browser generates 2
characters, and a DOM inspection of the HTML document with the
Developer Tools shows that there are effectively two letters in the
HTML document (and no ZWJ in the middle).


So how did you conclude that there are any ligatures? As far as I can 
see, the fi and fl ligatures in Arial are identical in appearance with 
the corresponding two-letter combinations, and ffi and ffl ligatures do 
not exist in Arial.


If it looks like two characters, walks like two characters...

--
Yucca, http://www.cs.tut.fi/~jkorpela/



Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-12 Thread Stephan Stiller

On 9/12/2011 10:46 AM, Philippe Verdy wrote:

common-ligatures
Enables display of common ligatures (OpenType feature: |liga|). For 
OpenType fonts, common ligatures are enabled by default.


This means that German documents will really need to use ZWNJ 
(fortunately, this character should soon become standard on German 
keyboards, and CSS3 would be a good motivation for including this key 
mapping) for common ligatures like fi,fl, ff, ffi, ffl, ſt, or even tt...


It would be nicer if the user were, "by default", offered a choice. It 
may also be of interest to point out the parallel to the well-known 
problem of using a chat or email client where innocuous letter strings 
all of a sudden show up as unintended emoticons.


Stephan



Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-12 Thread Stephan Stiller


Even if Dorfladen is not ambiguous, it could be disturbing (and at 
first reading be understood as some obscure compound of -fladen).


Yep - I agree with your perception. But the point was not {that use of 
ligatures vs not using them is for disambiguation} but instead {that 
only ambiguous compounding needs human intervention, whereas other 
compounding can (always?) be dealt with by the computer if a list of 
words to build them out of is available}. Any list containing "Dorf" and 
"Laden" can easily be used to avoid ligating these components. Perhaps 
this is what was meant, but I'm clarifying just in case.
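
The word-list idea above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual implementation: the component set, the `PAIRS` table, and the function names are all assumptions made up for the example. It inserts ZWNJ (U+200C) only when a word has exactly one split into known components and a ligature-prone letter pair straddles that split; anything ambiguous or unknown is left untouched for a human.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Letter pairs that many fonts ligate (illustrative, not exhaustive).
PAIRS = {"ff", "fi", "fl"}


def split_compound(word, components):
    """Return every (left, right) split where both halves are known components."""
    lower = word.lower()
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if lower[:i] in components and lower[i:] in components
    ]


def protect_boundary(word, components):
    """Insert ZWNJ at an unambiguous compound boundary that a ligature would span."""
    splits = split_compound(word, components)
    if len(splits) != 1:
        return word  # unknown or ambiguous word: leave it for a human
    left, right = splits[0]
    if (left[-1] + right[0]).lower() in PAIRS:
        return left + ZWNJ + right
    return word


# Hypothetical component list; a real one would come from a dictionary.
components = {"dorf", "laden", "auf", "lage"}
print(repr(protect_boundary("Dorfladen", components)))
print(repr(protect_boundary("Auflage", components)))
```

A word such as "Löffel", which has no split into listed components, passes through unchanged, so the `ff` ligature remains available there.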


Once I read a text, it used ligature (inappropriately) in the word 
Auflage 'obligation', which is compounded from the prefix auf- 'upon' 
-lage , a nominal derivative of 'to lay'. Anyway, it's one word with 
its own meaning.
Because of that stupid ligature I read it twice as [ofla:ʒ], thinking 
it would be a yet-unknown French loanword, before finally realising it 
was simply Auflage.


You have a good eye, and I've had similar experiences. Interestingly 
your observation can be used as evidence that even well-lexicalized 
word[ usage]s can benefit from not being ligated at certain morpheme 
boundaries. Thankfully, lists are sufficient to address cases like 
"Auflage" as well. I'm saying it this way because I simply don't know 
whether there would be a similar effect if German didn't have French 
loanwords.


So we could be getting into the realm of AI-hard aesthetic judgments: 
Some cases that "should" ("Stickstoffflasche") don't actually depend on 
ligation, whereas others matter for other psycholinguistic reasons. 
Ideally publishers caring enough about ligatures have 
copyeditors/proofreaders paying attention to this.


Stephan




Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-12 Thread Philippe Verdy
Note: an old bug (reported by me multiple times 2.5 years ago) is being
corrected in Chrome as of today: the Uniscribe part of the WebKit renderer
had a very critical bug that was finally isolated. It caused the whole
Windows desktop to become almost frozen or impossible to refresh under some
conditions, often forcing an abrupt reboot.

This old bug (it appeared in January 2009 with early beta versions of Chrome v4)
caused serious leaks of GDI resources (DC and font handles) in the
desktop window, within a very tight loop that sometimes never terminated.

It would be good if Chrome/Chromium and WebKit authors read this thread,
when they correct their Uniscribe support, in order to honor the language
markup or metadata in HTML5.

Philippe.

2011/9/12 James Cloos 

> > "WL" == Werner LEMBERG  writes:
>
> >> But "Dorfladen" is not ambiguous.
>
> WL> Yes, but some web browsers like Firefox automatically apply an `fl'
> WL> ligature...
>
> Only if the font does.  (At least in the case of gecko-on-X11.)
>
> Ideally the text should be tagged as DE so that the app can call the
> opentype/graphite/whatever features for DE text rather than for generic
> latin (script) text.
>
> Failing that it would be useful to guess based on word lists, provided
> of course that doing so does not kill performance.
>
> -JimC
> --
> James Cloos  OpenPGP: 1024D/ED7DAEA6
>
>


Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-12 Thread Philippe Verdy
2011/9/12 Christoph Päper 

> Philippe Verdy:
>
> > And it would be desirable to have a standardized CSS property for
> > controlling this default behavior in browsers.
>
> 
>
> (In my opinion there would be better ways to spec this, though.)
>

Interestingly it says:

common-ligatures
Enables display of common ligatures (OpenType feature: liga). For OpenType
fonts, common ligatures are enabled by default.

This means that German documents will really need to use ZWNJ (fortunately,
this character should soon become standard on German keyboards, and CSS3
would be a good motivation for including this key mapping) for common
ligatures like fi,fl, ff, ffi, ffl, ſt, or even tt...

They are not considered "discretionary ligatures" in most OpenType fonts
(OpenType feature: dlig, disabled by default), except if the font includes a
German specialization of those OpenType features (provided that browsers DO
honor the language markup in HTML documents or CSS styles, or in document
metadata).

I just hope that with the advances of HTML5, more authors will conform to
the standard and apply the markup or metadata for the language consistently,
so that browsers will honor this language markup at least in HTML 5, even if
they continue to ignore it for HTML 4 or for XHTML 1.0 in compatibility
mode.


Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-12 Thread Philippe Verdy
2011/9/12 Jukka K. Korpela 

> 12.9.2011 18:19, Philippe Verdy wrote:
>
>>> Yes, but some web browsers like Firefox automatically apply an `fl'
>>> ligature...
>>
>> Well, not just Firefox, because Chrome is now doing the same thing for
>> this message !
>>
>
> Can you give more details? I just checked that my Chrome (Win 7) is
> up-to-date and tested with a simple document, and it did not apply any
> ligatures (for fi or fl). As far as I know, Firefox has applied ligatures
> for some time _but_ only for some font face and size combinations by default
> and controllable by the CSS property text-rendering. I still think it was a
> bad move to start applying ligatures by default on the web where none were
> applied so far.
>

I see those ligatures applied in Chrome v.13.0.782.220 over Windows 7 SP1
French, just when reading this email in Gmail which renders it with the
stock Arial font of Windows (no webfont used). My locale preferences in the
browser and in my Gmail profile are first in French (France), then English
(US).

Zoom in, you'll see that these ligatures are rendered by default. Still you
can select the individual letters in "fi" or "fl" or "ffi" or "ffl",
copy-pasting to another document from the browser generates 2 characters,
and a DOM inspection of the HTML document with the Developer Tools shows
that there are effectively two letters in the HTML document (and no ZWJ in
the middle).
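
The check described above (two separate letters, no joiner, no precomposed ligature character) can also be done on pasted text outside the browser. The following is only a sketch; the helper names `describe` and `has_precomposed_ligature` are made up for this example. It lists the code points in a string and tests for the Latin ligature block U+FB00..U+FB06.

```python
import unicodedata


def describe(text):
    """List each code point in text as 'U+XXXX NAME'."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in text]


def has_precomposed_ligature(text):
    """True if text contains a character from U+FB00..U+FB06 (ff, fi, fl, ...)."""
    return any("\ufb00" <= c <= "\ufb06" for c in text)


print(describe("fi"))          # two separate letters
print(describe("\ufb01"))      # the single precomposed ligature character
print(describe("f\u200di"))    # letters joined by ZWJ
print(has_precomposed_ligature("fi"))
```

If the pasted text shows only U+0066 and U+0069, any ligature seen on screen was produced by the font or the renderer, not by the characters in the document.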

Maybe you have a different (German?) locale, for which Chrome does not
perform these ligatures by default.
-- Philippe.


Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-12 Thread Christoph Päper
Philippe Verdy:

> And it would be desirable to have a standardized CSS property for controlling 
> this default behavior in browsers.



(In my opinion there would be better ways to spec this, though.)



Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-12 Thread Jukka K. Korpela

12.9.2011 18:19, Philippe Verdy wrote:


Yes, but some web browsers like Firefox automatically apply an `fl'
ligature...

Well, not just Firefox, because Chrome is now doing the same thing for
this message !


Can you give more details? I just checked that my Chrome (Win 7) is 
up-to-date and tested with a simple document, and it did not apply any 
ligatures (for fi or fl). As far as I know, Firefox has applied 
ligatures for some time _but_ only for some font face and size 
combinations by default and controllable by the CSS property 
text-rendering. I still think it was a bad move to start applying 
ligatures by default on the web where none were applied so far.


> And unconditionally (ignoring the HTML page content
> language, if it's set to German).


Sadly enough, web browsers generally ignore language markup, just as 
search engines do. Probably largely because a) there is so often wrong 
information in such markup and b) for any page of nontrivial size in 
terms of amount of text, the language can be reasonably well and 
efficiently inferred from the context itself automatically.


> With the ligatures generated by default, now documents need to use
> ZWNJ instead if those ligatures are not suitable...

I'm afraid so. On the other hand, you can do that with client-side 
JavaScript fairly easily. However, I'm not quite sure whether all 
relevant browsers can deal with ZWNJ, at least in the sense of ignoring 
it, instead of doing something stupid like displaying a symbol of an 
unrepresentable glyph. I guess this revolves around IE 6 - can we ignore 
it? (Notes on using ZWNJ on web pages:

http://www.cs.tut.fi/~jkorpela/html/nobr.html#zwsp )

--
Yucca, http://www.cs.tut.fi/~jkorpela/



Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-12 Thread James Cloos
> "WL" == Werner LEMBERG  writes:

>> But "Dorfladen" is not ambiguous.

WL> Yes, but some web browsers like Firefox automatically apply an `fl'
WL> ligature...

Only if the font does.  (At least in the case of gecko-on-X11.)

Ideally the text should be tagged as DE so that the app can call the
opentype/graphite/whatever features for DE text rather than for generic
latin (script) text.

Failing that it would be useful to guess based on word lists, provided
of course that doing so does not kill performance.

-JimC
-- 
James Cloos  OpenPGP: 1024D/ED7DAEA6



Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-12 Thread Philippe Verdy
2011/9/12 Werner LEMBERG 

>
> >> Consider the word `Dorfladen' (village shop).  Using `=' to
> >> indicate a compound break point and `-' for normal ones, the proper
> >> break points are `Dorf=la-den' which means no `fl' ligature.  Note
> >> that `Fladen' means `cow dung', so having a ligature there is
> >> really bad.
> >
> > But "Dorfladen" is not ambiguous.
>
> Yes, but some web browsers like Firefox automatically apply an `fl'
> ligature...
>

Well, not just Firefox, because Chrome is now doing the same thing for this
message ! And unconditionally (ignoring the HTML page content language, if
it's set to German). Chrome developers probably thought it was good because
many English users demanded it. In the past, there were no automatic
ligatures produced for fi, fl, ffi, ffl, ſt, and so on...

Chrome should come back to the previous state on this, or apply ligatures
only where they are hinted by ZWJ (support for ZWJ and ZWNJ was added in
Chrome after my request in a bug report, nearly two years ago, so that they
would no longer display a .notdef box).

With the ligatures generated by default, now documents need to use ZWNJ
instead if those ligatures are not suitable...

But maybe automatic ligatures could be kept on by default in English or
French, but not in German... And it would be desirable to have a
standardized CSS property for controlling this default behavior in browsers.

-- Philippe.


Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-12 Thread Karl Pentzlin
On Monday, 12 September 2011 at 15:38, Christoph Päper wrote:

CP> ZWJ or ZWNJ should become easier to input on standard keyboard
CP> layouts, not only in the German one.

ZWNJ is present on the new German standard keyboard layout "T2",
to be entered as AltGr+".", exactly to mark the places where automatic
ligature application shall not be done.
(The keyboard standard draft is now in the public discussion stage, and is
expected to be published as DIN 2137:2012 in the beginning of next year.)





Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-12 Thread Christoph Päper
Szelp A. Szabolcs:

> Even if Dorfladen is not ambiguous, it could be disturbing

‹Dorfladen› and ‹Auflage› certainly are disturbing. 

For the current German orthography, smart fonts should rather sport ligatures 
for double consonants, especially when followed by a third one of their kind in 
compounds, i.e. in ‹Stickstoffflasche› it doesn’t matter much whether the first 
‹ff›, ‹fl› or both are ligated, but not all three ‹f› should look the exact 
same. 
Also digraph (esp. ‹ch›), trigraph (‹sch›) and diphthong (e.g. ‹au›) ligatures 
should be fine from a readability perspective, maybe advisable even.

ZWJ or ZWNJ should become easier to input on standard keyboard layouts, not 
only in the German one.

Anyhow, this hardly seems relevant still for the Unicode discussion list.



Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-12 Thread Werner LEMBERG

>> Consider the word `Dorfladen' (village shop).  Using `=' to
>> indicate a compound break point and `-' for normal ones, the proper
>> break points are `Dorf=la-den' which means no `fl' ligature.  Note
>> that `Fladen' means `cow dung', so having a ligature there is
>> really bad.
> 
> But "Dorfladen" is not ambiguous.

Yes, but some web browsers like Firefox automatically apply an `fl'
ligature...

> Asmus was referring to ambiguous cases created by the way compound
> words are spelled in German. For those, some user interaction is
> necessary, and it's my view that there are unobtrusive ways of
> interacting with the user about this.

Looking up my hyphenated word list containing about 43 entries of
the most frequent German words,[1] I find a *single* entry which
belongs into this class, using the historical `st' ligature:
`Wach-stube' vs. `Wachs-tube'.  However, improper use of `fl' or `fi'
ligatures can be seen very often; just think of words like `Auflage'
(Auf=la-ge).


Werner


[1] http://repo.or.cz/?a=project_list&s=wortliste



Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-12 Thread Szelp A. Szabolcs
Even if Dorfladen is not ambiguous, it could be disturbing (and at first
reading be understood as some obscure compound of -fladen).

Once I read a text that used a ligature (inappropriately) in the word Auflage
'obligation', which is compounded from the prefix auf- 'upon' and -lage, a
nominal derivative of 'to lay'. Anyway, it's one word with its own meaning.
Because of that stupid ligature I read it twice as [ofla:ʒ], thinking it
would be a yet-unknown French loanword, before finally realising it was
simply Auflage.

That mis-placed ligature really disturbed my reading flow, even though
Auflage would not be ambiguous  (like Dorfladen).

/Sz


On Mon, Sep 12, 2011 at 08:27, Stephan Stiller wrote:

>
> But "Dorfladen" is not ambiguous. Asmus was referring to ambiguous cases
> created by the way compound words are spelled in German. For those, some
> user interaction is necessary, and it's my view that there are unobtrusive
> ways of interacting with the user about this.
>
> (But then it needs to be acknowledged that ambiguous cases probably exist
> or can be constructed in a lot of languages. And the frequency of such
> ambiguity occurring in actual German text isn't that high. Even more so if
> one takes into account the orthographic recommendation to use an explicit
> hyphen in ambiguous cases. But of course these cases, if they occur, need to
> be handled nevertheless.)
>
> Stephan
>
>
>


Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-11 Thread Stephan Stiller

Well, it's not that complicated.  Ligatures in German must not happen
at compound break points, while they can be applied to ordinary break
points.

Consider the word `Dorfladen' (village shop).  Using `=' to indicate a
compound break point and `-' for normal ones, the proper break points
are `Dorf=la-den' which means no `fl' ligature.  Note that `Fladen'
means `cow dung', so having a ligature there is really bad.


But "Dorfladen" is not ambiguous. Asmus was referring to ambiguous cases 
created by the way compound words are spelled in German. For those, some 
user interaction is necessary, and it's my view that there are 
unobtrusive ways of interacting with the user about this.


(But then it needs to be acknowledged that ambiguous cases probably 
exist or can be constructed in a lot of languages. And the frequency of 
such ambiguity occurring in actual German text isn't that high. Even 
more so if one takes into account the orthographic recommendation to use 
an explicit hyphen in ambiguous cases. But of course these cases, if 
they occur, need to be handled nevertheless.)


Stephan




Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-11 Thread Werner LEMBERG
>> Certain layout processes, in certain cases, in certain languages,
>> simply can't be fully automated.
> 
> And interestingly, there is a crucial difference between ligatures
> and hyphenation in this regard: While a conservative processor could
> simply omit hyphenation in ambiguous cases (potentially leading to
> suboptimal linebreaking though), a decision ought to be made for
> ligatures if one uses a font requiring them. But then, although
> getting ligatures wrong in this case is categorically somehow
> "worse" than too-wide inter-word spacing, who knows which visual
> effect actually has more adverse effect on the reading process ...

Well, it's not that complicated.  Ligatures in German must not happen
at compound break points, while they can be applied to ordinary break
points.

Consider the word `Dorfladen' (village shop).  Using `=' to indicate a
compound break point and `-' for normal ones, the proper break points
are `Dorf=la-den' which means no `fl' ligature.  Note that `Fladen'
means `cow dung', so having a ligature there is really bad.

On the other hand, consider `Löffel' (spoon).  In spite of the
hyphenation `Löf-fel', a ligature looks good.
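
The rule above can be sketched in code, assuming input in the `=` / `-` notation used in this message. The function name and the `PAIRS` table are hypothetical and exist only for this illustration. A ZWNJ (U+200C) is emitted at a compound break (`=`) when a ligature-prone pair straddles it, while ordinary break points (`-`) are simply dropped, so a ligature may still form across them.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Letter pairs that many fonts ligate (illustrative, not exhaustive).
PAIRS = {"ff", "fi", "fl"}


def apply_break_marks(pattern):
    """Turn 'Dorf=la-den' style markup into text with ZWNJ where needed."""
    out = []
    for i, ch in enumerate(pattern):
        if ch == "-":
            continue  # ordinary break point: ligatures stay allowed
        if ch == "=":
            prev_ch = pattern[i - 1] if i > 0 else ""
            next_ch = pattern[i + 1] if i + 1 < len(pattern) else ""
            if (prev_ch + next_ch).lower() in PAIRS:
                out.append(ZWNJ)  # compound break: block the ligature
            continue
        out.append(ch)
    return "".join(out)


print(repr(apply_break_marks("Dorf=la-den")))
print(repr(apply_break_marks("Löf-fel")))
```

With this sketch, `Dorf=la-den` gains a ZWNJ between `f` and `l`, and `Löf-fel` comes out as plain `Löffel`.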


   Werner




Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-11 Thread Stephan Stiller


From my background I never perceived a need, but I guess I (and most 
people??) wouldn't really mind the tradition coming back (in 
Germany) if things are designed well (which is the job of the font 
designer) and for the user everything is handled automatically in 
the background by the available technology ...


Which cannot happen for German, as it is one of the languages where 
the same letter pair may or may not have a ligature based on the 
*meaning* of the word - something that you can't automate.


You are absolutely right!




Certain layout processes, in certain cases, in certain languages, 
simply can't be fully automated.


*Actually*, the emphasis here is on the word "fully". Writing a 
(language-specific) tool (or wordprocessor plugin) for semi-automated 
processing would be so easy - something that walks you through all cases 
of ambiguous hyphenation and ligatures (if the font so requires). An 
unobtrusive way of doing this would be if the word processor simply put 
a purple squiggly line under each word needing closer inspection, for 
right-click fixing. I'm really wondering why such tools are not employed 
or - if they are - I haven't heard of them ...
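
As a rough illustration of the semi-automated approach, the sketch below only finds the words that would get the squiggly line: those with more than one possible split into known components. The component set and function names are assumptions for the example, and a real tool would need a proper morphological dictionary rather than this naive two-part split.

```python
import re


def compound_splits(word, components):
    """Return every (left, right) split where both halves are known components."""
    lower = word.lower()
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if lower[:i] in components and lower[i:] in components
    ]


def words_needing_review(text, components):
    """Return (offset, word, splits) for each word with an ambiguous split."""
    flagged = []
    for match in re.finditer(r"\w+", text):
        splits = compound_splits(match.group(), components)
        if len(splits) > 1:
            flagged.append((match.start(), match.group(), splits))
    return flagged


# Hypothetical component list for the demonstration.
components = {"wach", "stube", "wachs", "tube", "dorf", "laden"}
for offset, word, splits in words_needing_review(
    "Die Wachstube im Dorfladen", components
):
    print(offset, word, splits)
```

Only "Wachstube" is flagged here; "Dorfladen" has a single split and could be handled without asking the user.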


Stephan




Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-11 Thread Stephan Stiller


From my background I never perceived a need, but I guess I (and most 
people??) wouldn't really mind the tradition coming back (in Germany) 
if things are designed well (which is the job of the font designer) 
and for the user everything is handled automatically in the 
background by the available technology ...


Which cannot happen for German, as it is one of the languages where 
the same letter pair may or may not have a ligature based on the 
*meaning* of the word - something that you can't automate.


You are absolutely right!

We had famous discussions on this list on this subject. Take an "st" 
ligature. There are two meanings for the German word "Wachstube", only 
one allows the st ligature. A human would have to decide when the 
ligature is appropriate. (Incidentally, the same goes for hyphenation 
for this word, one meaning allows a hyphen after the "s" the other 
does not).


Certain layout processes, in certain cases, in certain languages, 
simply can't be fully automated.


And interestingly, there is a crucial difference between ligatures and 
hyphenation in this regard: While a conservative processor could simply 
omit hyphenation in ambiguous cases (potentially leading to suboptimal 
linebreaking though), a decision ought to be made for ligatures if one 
uses a font requiring them. But then, although getting ligatures wrong 
in this case is categorically somehow "worse" than too-wide inter-word 
spacing, who knows which visual effect actually has more adverse effect 
on the reading process ...


There are two ways of generalizing from a situation where a locale tends 
to preferably use fonts without (and not necessitating) ligatures: If 
fonts with ligatures are introduced ...
(1) [generalizing: "we're not using ligatures"] ... the community is 
going to find it distracting because it is not used to the ligatures, 
plus there may be inherent problems with this for the respective locale 
anyways.
(2) [generalizing: "presently used fonts don't use ligatures"] ... the 
community won't find it distracting because good fonts will do ligatures 
well.

(while the great majority of laymen might neither notice nor care ...)

This theoretical ambiguity in generalizing simply arises from the fact 
that "not using ligatures" is equivalent to "not using fonts 
having/necessitating ligatures" in Germany.


Lots of the English-language discussion of ligatures I've seen tacitly 
assumes that "good" typesetting with "good" fonts "should" use 
ligatures in certain cases, and I just disagree with this assumption. 
Well, forgive me if maybe I'm just getting the wrong impression, being a 
layman on this matter.


Stephan




RE: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-11 Thread Peter Constable
There are certainly monospaced fonts that support Arabic. For instance, Windows 
fonts Courier New and Simplified Arabic Fixed support Arabic.

Devanagari is a different matter.


Peter

-----Original Message-----
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of Richard Wordingham
Sent: Sunday, September 11, 2011 5:19 PM
To: Unicode Discussion
Subject: Re: ligature usage - WAS: How do we find out what assigned code points 
aren't normally used in text?

On Sun, 11 Sep 2011 23:14:04 +0200
Kent Karlsson  wrote:

> On 2011-09-11 18:53, "Peter Constable" wrote:

> > Hence, in a monospaced font, FB01 certainly should look different
> > from <0066, 0069>, regardless of whether ligature glyphs are used
> > in either case.
> 
> If "monospace" is interpreted that rigidly, then it is much better
> *not* to have any glyph at all for FB01 (and other characters like
> it) in a "monospace" font.

Aesthetically you're correct, but U+FB01 and U+00E6 LATIN SMALL LETTER AE 
both have the ID start property, and the latter is definitely allowed 
in C identifiers.  While U+00E6 is much securer as a character, it too tends to 
be quite ugly in monospaced fonts.  (Courier can be quite useful for setting 
off text as computer code, especially variable and function names.)

Incidentally, are there working definitions of monospace for Arabic and 
Devanagari?

Richard.






Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-11 Thread Richard Wordingham
On Sun, 11 Sep 2011 23:14:04 +0200
Kent Karlsson  wrote:

> On 2011-09-11 18:53, "Peter Constable" wrote:

> > Hence, in a monospaced font, FB01 certainly should look different
> > from <0066, 0069>, regardless of whether ligature glyphs are used
> > in either case.
> 
> If "monospace" is interpreted that rigidly, then it is much better
> *not* to have any glyph at all for FB01 (and other characters like
> it) in a "monospace" font.

Aesthetically you're correct, but U+FB01 and U+00E6 LATIN SMALL LETTER
AE both have the ID start property, and the latter is
definitely allowed in C identifiers.  While U+00E6 is much securer as a
character, it too tends to be quite ugly in monospaced fonts.  (Courier
can be quite useful for setting off text as computer code, especially variable 
and function names.)
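
The difference in "security" between the two characters can be sketched in a few lines. This uses Python rather than C, which is an assumption of the example and not something from the thread: Python applies NFKC normalization to identifiers, so the compatibility ligature silently turns into the two-letter name, while U+00E6 is left untouched.

```python
import unicodedata

LIG_FI = "\ufb01"   # LATIN SMALL LIGATURE FI
AE = "\u00e6"       # LATIN SMALL LETTER AE

# Both characters are accepted at the start of a Python identifier.
both_valid = LIG_FI.isidentifier() and AE.isidentifier()

# NFKC folds the ligature into the plain letters "fi" but leaves U+00E6 alone.
folded_fi = unicodedata.normalize("NFKC", LIG_FI)
folded_ae = unicodedata.normalize("NFKC", AE)

# The Python parser normalizes identifiers the same way, so a variable
# spelled with U+FB01 ends up stored under the name "fi".
namespace = {}
exec(LIG_FI + " = 1", namespace)
stored_as_plain_fi = "fi" in namespace
```

Other languages may treat such identifiers differently; the sketch only shows one way a compatibility character can collide with its plain spelling.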

Incidentally, are there working definitions of monospace for Arabic and 
Devanagari?

Richard.



Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-11 Thread Kent Karlsson

On 2011-09-11 18:53, "Peter Constable" wrote:

> There's no requirement that the width of glyphs in a monospaced font be 1 em.
> I would agree, though, that if a monospaced font forms a ligature of a pair
> like <0066, 0069>, then it should be twice the width (not necessarily 2em) of
> single-character glyphs.

That's fine (assuming the ligature is well designed, in the case of a
monospace font connecting the bar of the f to the top serif of the i and
only that).

> In a monospace font, nothing prevents the glyph for FB01 being a ligature, and
> some monospaced fonts do have a ligature glyph for that character.

Fine too. But see below.

> Of course, in a monospaced font, the glyph for that character should be the
> same width as all other glyphs. So if it's not a ligature, then the "f" and
> "i" elements still need to be narrower than the glyphs for 0066 and 0069.
> 
> Hence, in a monospaced font, FB01 certainly should look different from <0066,
> 0069>, regardless of whether ligature glyphs are used in either case.

If "monospace" is interpreted that rigidly, then it is much better *not* to
have any glyph at all for FB01 (and other characters like it) in a
"monospace" font.

/Kent K

> 
> Peter
> 
> -----Original Message-----
> From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf
> Of Philippe Verdy
> Sent: Saturday, September 10, 2011 10:33 PM
> To: Michael Everson
> Cc: unicode Unicode Discussion
> Subject: Re: ligature usage - WAS: How do we find out what assigned code
> points aren't normally used in text?
> 
> 2011/9/11 Michael Everson :
>> On 11 Sep 2011, at 00:23, Richard Wordingham wrote:
>> 
>>> A font need not support such ligation, but a glyph for U+FB01 must
>>> ligate the letters - otherwise it's not U+FB01!
>> 
>> Not in monowidth, it doesn't.
> 
> I also agree, a monospaced font can perfectly show the dot and ligate the
> letters, using a "double-width" (2em) ligature without any problem, or simply
> not map it at all, or choose to just map a composite glyph made of the
> 1em-width glyphs assigned to the two letters f and (dotted) i without showing
> any visible ligation between those glyphs (this being consistent with
> monospaced fonts that remove all ligations, variable advances and kernings
> between letters).
> 
> You could as well have a font design in which all pairs of Latin letters are
> joined, including in a monospaced font, in which case you should not see any
> difference between FB01 and the pair of Basic Latin letters. Joining letters
> is fully independent of the fact that the upper part of letter f may or may
> not interact graphically with the presence of a dot. If the style of letter
> glyphs does not cause any interaction, there's no reason to remove the dot
> over i or j in the "ligature" or joining letters.
> 
> You should not be limited by the common style used in modern Times-like fonts
> (notably in italic styles, where the letter f is overhanging over the nearby
> letters). Other font styles also exist that do not require adjustment to
> remove the dot, or merge it with a graphic feature of the preceding letter f
> which is specific to some fonts.
> 
> As the pair of letters f and (dotted) i is perfectly valid in Turkish, there's
> absolutely no reason why the fi ligature would be invalid in Turkish. But
> given that this character is just provided for compatibility with legacy
> encodings, I would still not recommend it for Turkish or for any other
> language, including English. This FB01 character is not necessary to any
> orthography and if possible, should be replaced by the pair of Basic Latin
> letters (and in fact I don't see any reason why a font would not choose to do
> this everywhere)
> 
> -- Philippe.
> 
> 
> 
> 





Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-11 Thread Peter Zilahy Ingerman, PhD
An old acquaintance of mine, many years ago, pointed out two cases in 
Dutch: a hunter of kiwi birds, kiwijager, cannot use the customary ij 
ligature. And as for parsing ambiguities, he observed that there were 
three different ways of understanding the word "kwartslagen", depending 
on whether it was read "kwart-slagen", "kwarts-lagen", or "kwart-sla-gen".
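
The three readings can be reproduced with a small segmentation routine. This is only a sketch: the lexicon below is hypothetical and was chosen to contain exactly the pieces named above, and the function name is made up for illustration.

```python
def segmentations(word, lexicon):
    """Return every way to split `word` into consecutive lexicon entries."""
    if not word:
        return [[]]
    results = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in lexicon:
            for tail in segmentations(word[end:], lexicon):
                results.append([head] + tail)
    return results

# Hypothetical mini-lexicon, chosen only to reproduce the three readings.
LEXICON = {"kwart", "kwarts", "slagen", "lagen", "sla", "gen"}

readings = ["-".join(parts) for parts in segmentations("kwartslagen", LEXICON)]
```

A program can list the candidates this way, but picking the intended one still needs knowledge of the meaning.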


Peter Ingerman

On 2011-09-11 00:42, Asmus Freytag wrote:

On 9/9/2011 8:12 PM, Stephan Stiller wrote:

Dear Martin,

Thanks for alerting me to the issue of causal direction of aesthetic 
preference - it's been on my mind, but your reply helps me sort out 
some details.


When I first encountered text (outside of the German language locale) 
with ample use of ligatures in modern printed text, I definitely 
found the ligatures a bit distracting, but partly just because I 
wasn't used to them. I also perceived them as a solution to what (in 
Germany) appeared to me to be a real non-issue.


Put simply, there is a conflict between full flexibility for font 
designs and the burden imposed by sophisticated ligatures and kerning 
tables.


From my background I never perceived a need, but I guess I (and most 
people??) wouldn't really mind the tradition coming back (in Germany) 
if things are designed well (which is the job of the font designer) 
and for the user everything is handled automatically in the 
background by the available technology ...


Which cannot happen for German, as it is one of the languages where 
the same letter pair may or may not have a ligature based on the 
*meaning* of the word - something that you can't automate.


We had famous discussions on this list on this subject. Take an "st" 
ligature. There are two meanings for the German word "Wachstube", only 
one allows the st ligature. A human would have to decide when the 
ligature is appropriate. (Incidentally, the same goes for hyphenation 
of this word: one meaning allows a hyphen after the "s", the other 
does not.)


Certain layout processes, in certain cases, in certain languages, 
simply can't be fully automated.


A./


Stephan












Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-11 Thread Asmus Freytag

On 9/9/2011 8:12 PM, Stephan Stiller wrote:

Dear Martin,

Thanks for alerting me to the issue of causal direction of aesthetic 
preference - it's been on my mind, but your reply helps me sort out 
some details.


When I first encountered text (outside of the German language locale) 
with ample use of ligatures in modern printed text, I definitely found 
the ligatures a bit distracting, but partly just because I wasn't used 
to them. I also perceived them as a solution to what (in Germany) 
appeared to me to be a real non-issue.


Put simply, there is a conflict between full flexibility for font 
designs and the burden imposed by sophisticated ligatures and kerning 
tables.


From my background I never perceived a need, but I guess I (and most 
people??) wouldn't really mind the tradition coming back (in Germany) 
if things are designed well (which is the job of the font designer) 
and for the user everything is handled automatically in the background 
by the available technology ...


Which cannot happen for German, as it is one of the languages where the 
same letter pair may or may not have a ligature based on the *meaning* 
of the word - something that you can't automate.


We had famous discussions on this list on this subject. Take an "st" 
ligature. There are two meanings for the German word "Wachstube", only 
one allows the st ligature. A human would have to decide when the 
ligature is appropriate. (Incidentally, the same goes for hyphenation 
of this word: one meaning allows a hyphen after the "s", the other does 
not.)


Certain layout processes, in certain cases, in certain languages, simply 
can't be fully automated.
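
Once a human has decided where the morpheme boundary falls, the decision can at least be recorded in the text with U+200C ZERO WIDTH NON-JOINER. The following is a minimal sketch under that assumption; the function name and the set of ligature pairs are hypothetical, and the segmentation itself is supplied by hand, not computed.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Hypothetical set of letter pairs for which the font is assumed to have ligatures.
LIGATURE_PAIRS = {"st", "ff", "fi", "fl"}

def join_morphemes(morphemes):
    """Join hand-segmented morphemes, blocking ligatures across boundaries."""
    out = morphemes[0]
    for part in morphemes[1:]:
        # Insert ZWNJ only where a ligature pair would straddle the boundary.
        if out[-1] + part[0] in LIGATURE_PAIRS:
            out += ZWNJ
        out += part
    return out

guard_room = join_morphemes(["Wach", "stube"])  # "st" stays inside one morpheme
wax_tube = join_morphemes(["Wachs", "tube"])    # "s" and "t" straddle the boundary
```

Here `guard_room` is left unchanged, so a font may ligate "st", while `wax_tube` carries a ZWNJ between "s" and "t".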


A./


Stephan








RE: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-11 Thread Peter Constable
There's no requirement that the width of glyphs in a monospaced font be 1 em. I 
would agree, though, that if a monospaced font forms a ligature of a pair like 
<0066, 0069>, then it should be twice the width (not necessarily 2em) of 
single-character glyphs.

In a monospace font, nothing prevents the glyph for FB01 being a ligature, and 
some monospaced fonts do have a ligature glyph for that character. 

Of course, in a monospaced font, the glyph for that character should be the 
same width as all other glyphs. So if it's not a ligature, then the "f" and "i" 
elements still need to be narrower than the glyphs for 0066 and 0069. 

Hence, in a monospaced font, FB01 certainly should look different from <0066, 
0069>, regardless of whether ligature glyphs are used in either case.
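
A rough way to see the consequence for text layout is to count character cells. The sketch below is a simplification of my own (one cell per code point, combining marks skipped, no wide characters) and is not how any particular renderer works.

```python
import unicodedata

def cell_count(text):
    """Count monospace cells: one per code point, ignoring combining marks."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

pair_cells = cell_count("fi")          # <0066, 0069>
ligature_cells = cell_count("\ufb01")  # U+FB01
```

Under that model the pair occupies two cells and U+FB01 only one, which is why the two cannot look alike in a strictly monospaced font.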


Peter

-----Original Message-----
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of Philippe Verdy
Sent: Saturday, September 10, 2011 10:33 PM
To: Michael Everson
Cc: unicode Unicode Discussion
Subject: Re: ligature usage - WAS: How do we find out what assigned code points 
aren't normally used in text?

2011/9/11 Michael Everson :
> On 11 Sep 2011, at 00:23, Richard Wordingham wrote:
>
>> A font need not support such ligation, but a glyph for U+FB01 must 
>> ligate the letters - otherwise it's not U+FB01!
>
> Not in monowidth, it doesn't.

I also agree, a monospaced font can perfectly show the dot and ligate the 
letters, using a "double-width" (2em) ligature without any problem, or simply 
not map it at all, or choose to just map a composite glyph made of the 
1em-width glyphs assigned to the two letters f and (dotted) i without showing 
any visible ligation between those glyphs (this being consistent with 
monospaced fonts that remove all ligations, variable advances and kernings 
between letters).

You could as well have a font design in which all pairs of Latin letters are 
joined, including in a monospaced font, in which case you should not see any 
difference between FB01 and the pair of Basic Latin letters. Joining letters is 
fully independent of the fact that the upper part of letter f may or may not 
interact graphically with the presence of a dot. If the style of letter glyphs 
does not cause any interaction, there's no reason to remove the dot over i or j 
in the "ligature" or joining letters.

You should not be limited by the common style used in modern Times-like fonts 
(notably in italic styles, where the letter f is overhanging over the nearby 
letters). Other font styles also exist that do not require adjustment to remove 
the dot, or merge it with a graphic feature of the preceding letter f which is 
specific to some fonts.

As the pair of letters f and (dotted) i is perfectly valid in Turkish, there's 
absolutely no reason why the fi ligature would be invalid in Turkish. But given 
that this character is just provided for compatibility with legacy encodings, I 
would still not recommend it for Turkish or for any other language, including 
English. This FB01 character is not necessary to any orthography and if 
possible, should be replaced by the pair of Basic Latin letters (and in fact I 
don't see any reason why a font would not choose to do this everywhere)

-- Philippe.






Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-10 Thread Philippe Verdy
2011/9/11 Michael Everson :
> On 11 Sep 2011, at 00:23, Richard Wordingham wrote:
>
>> A font need not support such ligation, but a glyph for U+FB01 must
>> ligate the letters - otherwise it's not U+FB01!
>
> Not in monowidth, it doesn't.

I also agree, a monospaced font can perfectly show the dot and ligate
the letters, using a "double-width" (2em) ligature without any
problem, or simply not map it at all, or choose to just map a
composite glyph made of the 1em-width glyphs assigned to the two
letters f and (dotted) i without showing any visible ligation between
those glyphs (this being consistent with monospaced fonts that remove
all ligations, variable advances and kernings between letters).

You could as well have a font design in which all pairs of Latin
letters are joined, including in a monospaced font, in which case you
should not see any difference between FB01 and the pair of Basic Latin
letters. Joining letters is fully independent of the fact that the
upper part of letter f may or may not interact graphically with the
presence of a dot. If the style of letter glyphs does not cause any
interaction, there's no reason to remove the dot over i or j in the
"ligature" or joining letters.

You should not be limited by the common style used in modern
Times-like fonts (notably in italic styles, where the letter f is
overhanging over the nearby letters). Other font styles also exist
that do not require adjustment to remove the dot, or merge it with a
graphic feature of the preceding letter f which is specific to some
fonts.

As the pair of letters f and (dotted) i is perfectly valid in Turkish,
there's absolutely no reason why the fi ligature would be invalid in
Turkish. But given that this character is just provided for
compatibility with legacy encodings, I would still not recommend it
for Turkish or for any other language, including English. This FB01
character is not necessary to any orthography and if possible, should
be replaced by the pair of Basic Latin letters (and in fact I don't
see any reason why a font would not choose to do this everywhere)
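
Replacing the compatibility ligatures with plain letters can be sketched in Python as follows. The function name is hypothetical; the sketch applies NFKC only to the Latin ligatures U+FB00..U+FB06, so that the rest of the text is left as it is.

```python
import unicodedata

def expand_latin_ligatures(text):
    """Replace U+FB00..U+FB06 with their NFKC equivalents, leaving other text alone."""
    return "".join(
        unicodedata.normalize("NFKC", ch) if "\ufb00" <= ch <= "\ufb06" else ch
        for ch in text
    )

sample = "\ufb01loso\ufb01a"   # "filosofia" spelled with U+FB01 twice
expanded = expand_latin_ligatures(sample)
```

Running NFKC over the whole string would also work for this sample, but it changes many unrelated compatibility characters, which may not be wanted.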

-- Philippe.



Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-10 Thread Kent Karlsson

On 2011-09-11 01:23, "Richard Wordingham" wrote:

> On Sat, 10 Sep 2011 23:53:34 +0200
> Kent Karlsson  wrote:
> 
>> IMO, a glyph (if any) for that compatibility character should look
>> *exactly* like an "fi" (after automatic ligature formation, if that
>> is done for "fi") in the font used. So if no ligature for "fi" is
>> formed, the glyph for U+FB01 (if any) should have a dot just like
>> "fi" would have a dot. (I know, this is not commonly the case at the
>> moment.)
> 
> A font need not support such ligation,

True.

> but a glyph for U+FB01 must
> ligate the letters -

And this "ligature" can look just like "fi" in that font.
I see no reason whatsoever that it could not.

> otherwise it's not U+FB01!

Of course it would be.

> In such a case, I do
> not see the need for the dot.

That does not follow.

/Kent K

> Richard.
> 
> 





Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-10 Thread Michael Everson
On 11 Sep 2011, at 00:23, Richard Wordingham wrote:

> A font need not support such ligation, but a glyph for U+FB01 must
> ligate the letters - otherwise it's not U+FB01!

Not in monowidth, it doesn't.

Michael Everson * http://www.evertype.com/




Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-10 Thread Richard Wordingham
On Sat, 10 Sep 2011 23:53:34 +0200
Kent Karlsson  wrote:

> IMO, a glyph (if any) for that compatibility character should look
> *exactly* like an "fi" (after automatic ligature formation, if that
> is done for "fi") in the font used. So if no ligature for "fi" is
> formed, the glyph for U+FB01 (if any) should have a dot just like
> "fi" would have a dot. (I know, this is not commonly the case at the
> moment.)

A font need not support such ligation, but a glyph for U+FB01 must
ligate the letters - otherwise it's not U+FB01!  In such a case, I do
not see the need for the dot.

Richard.




Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-10 Thread Kent Karlsson

On 2011-09-10 23:06, "Richard Wordingham" wrote:

> On Sat, 10 Sep 2011 22:19:27 +0200
> Kent Karlsson  wrote:
> 
>> 
>> On 2011-09-10 20:58, "Jukka K. Korpela" wrote:
>> 
>>> According to Oxford Style
>>> Manual, one should not use the fi ligature in Turkish, as that
>>> would obscure the distinction between normal i and dotless i (ı).
>  
>> It does not make perfect sense to me. Rather that:
> 
> I believe the point is that the glyph of fi U+FB01 LATIN SMALL LIGATURE
> FI

Which is a character that should not be used for any language. Typographic
ligatures (if any) should be formed automatically by the font (and font
handling system).

> is unsuitable for Turkish because it is normally undotted, or at
> least, the dot is barely visible. (Confusingly, my e-mail client chooses
> a dotted glyph!)

IMO, a glyph (if any) for that compatibility character should look *exactly*
like an "fi" (after automatic ligature formation, if that is done for "fi")
in the font used. So if no ligature for "fi" is formed, the glyph for U+FB01
(if any) should have a dot just like "fi" would have a dot. (I know, this is
not commonly the case at the moment.)

/Kent K

> Richard.
> 
> 






Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-10 Thread Philippe Verdy
In fact I also think that the fi ligature is still suitable for
Turkish, the way it is encoded, as meaning the ligature of a f and a
dotted i. I don't see why such ligature would not exhibit the presence
of the dot.

It is just a matter of glyph design, and a ligature of f and dotted i
is still possible (all depends on how you design the "f" part, notably
its top part). As well the same font design could include a distinct
ligature of f and dotless i, even if it's not encoded in Unicode.

The encoded fi ligature is clearly a compatibility character, no
longer needed for correct rendering of ligatures with today's font
technologies. What the encoded fi ligature should look like in the
rendered glyph does not matter as long as you recognize the f and the
dotted i in it.

If most fi ligatures present in many fonts do not exhibit the
difference, it's only because these font designs were not considering
the needs for Turkic typographies, when most Latin-written languages
do not have a strong differentiation between dotted and dotless i
(these languages just have a concept of "soft dots", where the dot
itself does not really modify the i, but only helps reading some
old-style typographies, for example to help separate strings made of
successive letters m, n, u, i).

For a long time, the dot was only a typographic feature, used contextually in
a discretionary way where it could be useful for readers, long before
becoming a standard, and not a distinctive diacritic. The fi ligature
belongs to the same class of typographic features, but it is probably
not helpful with modern font designs like Arial, Helvetica, Times, or
even Courier (in this case, a monospaced version of the fi ligature is
really bad, but it should not prevent a double-width presentation of
the ligature in a monospaced font)... I also think that for most Latin
languages, it will be suitable to drop the soft dot on i and j, if it
does not effectively help the reader.

-- Philippe.
2011/9/10 Kent Karlsson :
>
> On 2011-09-10 20:58, "Jukka K. Korpela" wrote:
>
>> There is a deeper language-dependency. According to Oxford Style Manual,
>> one should not use the fi ligature in Turkish, as that would obscure the
>> distinction between normal i and dotless i (ı). This makes perfect sense
>> to me.
>
> It does not make perfect sense to me. Rather that:
>
> *If f followed by i is such that their font glyphs overlap (using
> normal letter spacing), making a ligature appropriate, makes that
> *font* unsuitable for Turkish, as such a ligature would obscure...*.
>
> If that is what you (and others who have said the same thing) meant,
> then fine. But taken at face value, your statement does not make
> (typographic) sense.
>
>    /Kent K
>
>
>
>
>




Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-10 Thread Richard Wordingham
On Sat, 10 Sep 2011 22:19:27 +0200
Kent Karlsson  wrote:

> 
> On 2011-09-10 20:58, "Jukka K. Korpela" wrote:
> 
> > According to Oxford Style
> > Manual, one should not use the fi ligature in Turkish, as that
> > would obscure the distinction between normal i and dotless i (ı).
 
> It does not make perfect sense to me. Rather that:

I believe the point is that the glyph of fi U+FB01 LATIN SMALL LIGATURE
FI is unsuitable for Turkish because it is normally undotted, or at
least, the dot is barely visible. (Confusingly, my e-mail client chooses
a dotted glyph!)

Richard.




Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-10 Thread Kent Karlsson

On 2011-09-10 20:58, "Jukka K. Korpela" wrote:

> There is a deeper language-dependency. According to Oxford Style Manual,
> one should not use the fi ligature in Turkish, as that would obscure the
> distinction between normal i and dotless i (ı). This makes perfect sense
> to me.

It does not make perfect sense to me. Rather that:

*If f followed by i is such that their font glyphs overlap (using
normal letter spacing), making a ligature appropriate, makes that
*font* unsuitable for Turkish, as such a ligature would obscure...*.

If that is what you (and others who have said the same thing) meant,
then fine. But taken at face value, your statement does not make
(typographic) sense.

/Kent K






Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-10 Thread Jukka K. Korpela

10.9.2011 2:14, Kent Karlsson wrote:


But of course, which pairs of
letters (or indeed also punctuation) are likely to occur adjacently
is language dependent.


Indeed, and I used to think (some years ago) that in Finnish, even the 
“fi” ligature does not matter much and isn’t used (as “f” only occurs in 
words of foreign origin). But later I realized that _when_ a case for a 
ligature appears, as in a word like “filosofia” (Finnish for 
“philosophy”), it may matter a lot—depending on the font of course.


So I would say that it primarily depends on font and other typographic 
parameters rather than language.


There is a deeper language-dependency. According to Oxford Style Manual, 
one should not use the fi ligature in Turkish, as that would obscure the 
distinction between normal i and dotless i (ı). This makes perfect sense 
to me.
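
For readers less familiar with the Turkish distinction, here is a small Python sketch of how the two letters are encoded. It relies only on the standard `unicodedata` module; the variable names are mine. Note that Python's built-in case mapping is not language-aware, which is exactly where the distinction tends to get lost.

```python
import unicodedata

DOTTED_I = "i"        # U+0069
DOTLESS_I = "\u0131"  # U+0131

names = [unicodedata.name(ch) for ch in (DOTTED_I, DOTLESS_I)]

# The default (language-independent) case mapping sends both letters
# to the same "I", so the distinction disappears after uppercasing.
same_upper = DOTTED_I.upper() == DOTLESS_I.upper()

# The compatibility ligature U+FB01 decomposes to f plus the dotted i only.
ligature_second_letter = unicodedata.normalize("NFKC", "\ufb01")[1]
```

So at the encoding level U+FB01 always stands for f followed by a dotted i; whether the rendered glyph still shows the dot is a matter of font design.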


--
Yucca, http://www.cs.tut.fi/~jkorpela/



Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-10 Thread tulasi
> ligature usage - WAS:
> How do we find out what assigned code points aren't
> normally used in text?

In most countries, the government has a language unit (and each country has a
literary society as well). You can find the contact address for such a unit by
writing to that country's consulate or embassy. FYI, San Francisco has a German
office.

I think you will find at least one German symbol that does not have a code point.

If you write, for German, you can get all the information (past to present
practice) within 30 days of the date they receive your request.

In the case of Mark Davis, Unicode Inc. president, if you ask for information,
in some cases he goes into hibernation :)

Tulasi


Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-09 Thread Stephan Stiller

Dear Martin,

Thanks for alerting me to the issue of causal direction of aesthetic 
preference - it's been on my mind, but your reply helps me sort out some 
details.


When I first encountered text (outside of the German language locale) 
with ample use of ligatures in modern printed text, I definitely found 
the ligatures a bit distracting, but partly just because I wasn't used 
to them. I also perceived them as a solution to what (in Germany) 
appeared to me to be a real non-issue.


Put simply, there is a conflict between full flexibility for font 
designs and the burden imposed by sophisticated ligatures and kerning 
tables.


From my background I never perceived a need, but I guess I (and most 
people??) wouldn't really mind the tradition coming back (in Germany) if 
things are designed well (which is the job of the font designer) and for 
the user everything is handled automatically in the background by the 
available technology ...


Stephan




Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-09 Thread Martin J. Dürst

On 2011/09/10 9:32, Stephan Stiller wrote:

Actually, I *was* talking about purely typographic/aesthetic ligatures
as well. I'm aware that which di-/trigraphs need to be considered from a
font design perspective is language-dependent.


And this language-dependence is not only a question of letter 
combination frequency, but also of aesthetic preference.


What I have heard very often is that French has a preference for using 
many ligatures, whereas Italian uses almost none.



But the point is that I
observe that:
(a) aesthetic ligatures are not frequently seen in modern German print and
(b) the absence of such ligatures doesn't offend me (in modern German
print).


I think part of that comes from the fact that with modern DTP, lots of 
fonts are used across languages without any particular adjustments with 
respect to ligatures. (This may not be the case for high-end order-made 
fonts used by publishing houses, but it's certainly true for the 
run-of-the-mill Times Roman, Helvetica, and so on used on PCs.)


Typography is always an interplay between designer, reader, and 
technology. So what probably happened is that the technology-induced use 
of the same fonts across languages led to designs with fewer 
language-specific ligatures (essentially lowest-common-denominators in 
terms of ligatures) and to an adjustment of the designs so that this 
infrequency of ligatures would be less visible. Also, you and other 
readers got used to these designs.


Regards,   Martin.


It could be - and a quick visual check confirms this - that the fonts
used for printing of {novels, school textbooks, tech/science books, ...}
and the associated kerning tables don't necessitate ligatures or have
traditionally (fwiw) not been seen as necessitating them. Enough
professional publishing houses I _think_ don't use aesthetic ligatures,
so that, whenever I do see them in German text, they stand out to me. So
/de facto/ usage of aesthetic ligatures seems a bit like a locale
parameter to me.

That said - if I'm really factually wrong (and ligatures in modern
German text are just so subtle and pervasive that I never took notice),
people on the list please feel free to correct me.

Stephan

On 9/9/2011 4:14 PM, Kent Karlsson wrote:

I was talking about purely typographic ligatures, in particular
ligatures used because the glyphs (normally spaced) would otherwise
overlap in an unpleasing manner. If the glyphs don't overlap (or
there is extra spacing, which is quite ugly in itself if used in
"normal" text), no need to use a (purely typographic) ligature.
So it is a font design issue. (And then there are also ornamental
typographic ligatures, like the st ligature, but those are outside
of what I was talking about here.) But of course, which pairs of
letters (or indeed also punctuation) are likely to occur adjacently
is language dependent.

/Kent K


On 2011-09-09 23:45, "Stephan Stiller" wrote:


Pardon my asking, as this is not my specialty:


There are several other ligatures
that *should* be formed (automatically) by "run of the mill" fonts:
for instance the "fj" ligature, just to mention one that I find
particularly important (and that does not have a compatibility code
point).

About the "should" - isn't this language-dependent? For example I recall
that ordinary German print literature barely uses any ligatures at all
these days (ie: I'm not talking about historical texts). And, has anyone
ever attempted to catalogue such ligature practices? (Is this suitable
for CLDR?)

(I also recall being taken aback by the odd look of ligatures in many
LaTeX-typeset English scientific documents, but I suspect that's rather
because some of the commonly used fonts there are lacking in aesthetic
design.)

Stephan










Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-09 Thread Stephan Stiller



>> Actually, I *was* talking about purely typographic/aesthetic
>> ligatures as well. I'm aware that which di-/trigraphs need to be
>> considered from a font design perspective is language-dependent.
>> But the point is that I observe that:
>>  (a) aesthetic ligatures are not frequently seen in modern German
>> print and
>
> I would assume that is because many commonly used fonts are designed
> in such a way that letter glyphs don't overlap anyway.

That's the impression I get - all {fl/fi}'s in the dozen or so German 
books I've just checked look perfectly fine to me :-)

> And then you should not use any ligature. (Sorry if my original
> "should" implied otherwise.)


Oh, well, then it looks like we agree. I guess it's at least an 
interesting observation that they've found a workaround in some locales. 
Just as it will please typographers that ligatures are seemingly making 
a comeback everywhere, now that we've left the typewriter age. (And that 
ill-designed ligatures - possibly standing out more than their absence - 
have biased and corrupted my perception of the subject matter.)


- S



Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-09 Thread Kent Karlsson



On 2011-09-10 02:32, "Stephan Stiller" wrote:

> Actually, I *was* talking about purely typographic/aesthetic ligatures as
> well. I'm aware that which di-/trigraphs need to be considered from a font
> design perspective is language-dependent. But the point is that I observe
> that:
>  (a) aesthetic ligatures are not frequently seen in modern German print and

I would assume that is because many commonly used fonts are designed in such
a way that letter glyphs don't overlap anyway. And then you should not use
any ligature. (Sorry if my original "should" implied otherwise.)


/Kent K




Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-09 Thread Stephan Stiller
Actually, I *was* talking about purely typographic/aesthetic ligatures 
as well. I'm aware that which di-/trigraphs need to be considered from a 
font design perspective is language-dependent. But the point is that I 
observe that:

(a) aesthetic ligatures are not frequently seen in modern German print and
(b) the absence of such ligatures doesn't offend me (in modern German 
print).


It could be - and a quick visual check confirms this - that the fonts 
used for printing of {novels, school textbooks, tech/science books, ...} 
and the associated kerning tables don't necessitate ligatures or have 
traditionally (fwiw) not been seen as necessitating them. I _think_ 
enough professional publishing houses avoid aesthetic ligatures 
that, whenever I do see them in German text, they stand out to me. So 
/de facto/ usage of aesthetic ligatures seems a bit like a locale 
parameter to me.


That said - if I'm really factually wrong (and ligatures in modern 
German text are just so subtle and pervasive that I never took notice), 
people on the list please feel free to correct me.


Stephan

On 9/9/2011 4:14 PM, Kent Karlsson wrote:

> I was talking about purely typographic ligatures, in particular
> ligatures used because the glyphs (normally spaced) would otherwise
> overlap in an unpleasing manner. If the glyphs don't overlap (or
> there is extra spacing, which is quite ugly in itself if used in
> "normal" text), no need to use a (purely typographic) ligature.
> So it is a font design issue. (And then there are also ornamental
> typographic ligatures, like the st ligature, but those are outside
> of what I was talking about here.) But of course, which pairs of
> letters (or indeed also punctuation) are likely to occur adjacently
> is language dependent.
>
> /Kent K
>
> On 2011-09-09 23:45, "Stephan Stiller" wrote:
>
>> Pardon my asking, as this is not my specialty:
>>
>>> There are several other ligatures
>>> that *should* be formed (automatically) by "run of the mill" fonts:
>>> for instance the "fj" ligature, just to mention one that I find
>>> particularly important (and that does not have a compatibility code
>>> point).
>>
>> About the "should" - isn't this language-dependent? For example I recall
>> that ordinary German print literature barely uses any ligatures at all
>> these days (i.e., I'm not talking about historical texts). And, has anyone
>> ever attempted to catalogue such ligature practices? (Is this suitable
>> for CLDR?)
>>
>> (I also recall being taken aback by the odd look of ligatures in many
>> LaTeX-typeset English scientific documents, but I suspect that's rather
>> because some of the commonly used fonts there are lacking in aesthetic
>> design.)
>>
>> Stephan






Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-09 Thread Kent Karlsson

I was talking about purely typographic ligatures, in particular
ligatures used because the glyphs (normally spaced) would otherwise
overlap in an unpleasing manner. If the glyphs don't overlap (or
there is extra spacing, which is quite ugly in itself if used in
"normal" text), no need to use a (purely typographic) ligature.
So it is a font design issue. (And then there are also ornamental
typographic ligatures, like the st ligature, but those are outside
of what I was talking about here.) But of course, which pairs of
letters (or indeed also punctuation) are likely to occur adjacently
is language dependent.

/Kent K


On 2011-09-09 23:45, "Stephan Stiller" wrote:

> Pardon my asking, as this is not my specialty:
> 
>> There are several other ligatures
>> that *should* be formed (automatically) by "run of the mill" fonts:
>> for instance the "fj" ligature, just to mention one that I find
>> particularly important (and that does not have a compatibility code
>> point).
> 
> About the "should" - isn't this language-dependent? For example I recall
> that ordinary German print literature barely uses any ligatures at all
> these days (i.e., I'm not talking about historical texts). And, has anyone
> ever attempted to catalogue such ligature practices? (Is this suitable
> for CLDR?)
> 
> (I also recall being taken aback by the odd look of ligatures in many
> LaTeX-typeset English scientific documents, but I suspect that's rather
> because some of the commonly used fonts there are lacking in aesthetic
> design.)
> 
> Stephan
> 
> 





ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-09 Thread Stephan Stiller

Pardon my asking, as this is not my specialty:


> There are several other ligatures
> that *should* be formed (automatically) by "run of the mill" fonts:
> for instance the "fj" ligature, just to mention one that I find
> particularly important (and that does not have a compatibility code
> point).


About the "should" - isn't this language-dependent? For example I recall 
that ordinary German print literature barely uses any ligatures at all 
these days (i.e., I'm not talking about historical texts). And, has anyone 
ever attempted to catalogue such ligature practices? (Is this suitable 
for CLDR?)


(I also recall being taken aback by the odd look of ligatures in many 
LaTeX-typeset English scientific documents, but I suspect that's rather 
because some of the commonly used fonts there are lacking in aesthetic 
design.)


Stephan
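
(Editorial aside on the quoted remark that "fj" has no compatibility code point: the following is a minimal sketch, not part of the original thread, that lists the Latin ligatures in the Alphabetic Presentation Forms range U+FB00..U+FB06 together with their NFKC expansions. The helper name `compat_ligatures` and the choice of that range are assumptions made for illustration; only the standard `unicodedata` module is used.)

```python
import unicodedata

def compat_ligatures():
    """Map each NFKC expansion (e.g. "fi") to the ligature code points
    in U+FB00..U+FB06 that expand to it. Hypothetical helper."""
    table = {}
    for cp in range(0xFB00, 0xFB07):
        expansion = unicodedata.normalize("NFKC", chr(cp))
        table.setdefault(expansion, []).append(f"U+{cp:04X}")
    return table

table = compat_ligatures()
# "fi" and "fl" have compatibility code points; "fj" does not,
# so a font has to form that ligature itself (e.g. via OpenType features).
for pair in ("fi", "fl", "fj"):
    print(pair, table.get(pair, "no compatibility code point"))
```

Whether a font actually forms any of these ligatures is a separate question of font design, as discussed above; the sketch only shows which letter sequences happen to have a code point of their own.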