On Mon, Jan 15, 2007 at 04:42:12PM +0100, J.Pietschmann wrote:
> As for Ligatures and character shaping: an algorithm for automatically
> detecting ligature points may use a pattern lookup similar to the
> pattern based hyphenation. The pattern dictionary should store only
> either NFD or NFC forms, for the same reason this is advisable for
> hyphenation.

Aren't ligatures a feature of the font, e.g. the GSUB table of an
OpenType font? That is, one font may have a specific ligature, while
another font does not.

> We should choose either NFD or NFC as a canonical representation for
> hyphenation patterns (and, in the future, for similar things), so that
> hyphenation patterns containing umlauts can be found regardless of
> the representation of the umlaut in the source file. Currently, we
> don't care much, which works but may break suddenly.

> There is obviously a slight space vs. run time tradeoff (NFC ought to
> be more compact but NFC'ing the source text may be more expensive
> than NFD'ing).

NFC is the standard for the web. Does that carry any weight?

Simon

-- 
Simon Pepping
home page: http://www.leverkruid.eu
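
To illustrate the normalization point quoted above, here is a minimal sketch of why a pattern lookup needs one canonical form. It is written in Python for brevity and is not taken from any existing hyphenation code; the names `canonical`, `PATTERNS` and `has_pattern` are hypothetical, and NFC is chosen only as an example of the "pick one form" rule.

```python
import unicodedata

def canonical(text: str) -> str:
    """Return text in the single form used for storage and lookup (NFC here)."""
    return unicodedata.normalize("NFC", text)

# Hypothetical pattern store; every entry is normalized when it is added.
PATTERNS = {canonical("f\u00fcr")}

def has_pattern(fragment: str) -> bool:
    """Normalize the source fragment the same way before looking it up."""
    return canonical(fragment) in PATTERNS

composed = "f\u00fcr"      # u-umlaut as one precomposed code point
decomposed = "fu\u0308r"   # 'u' followed by a combining diaeresis

# The two spellings render identically but are different strings.
print(composed == decomposed)      # False

# After normalization both are found.
print(has_pattern(composed))       # True
print(has_pattern(decomposed))     # True

# Space side of the tradeoff: NFC needs fewer code points than NFD here.
print(len(unicodedata.normalize("NFC", decomposed)))  # 3
print(len(unicodedata.normalize("NFD", composed)))    # 4
```

Without the `canonical` call in `has_pattern`, the decomposed spelling would silently miss the stored pattern, which is the "works but may break suddenly" case. The same sketch works with NFD in both places; what matters is that patterns and source text use the same form.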