On 01/14/2011 09:34 AM, Steven Schveighoffer wrote:
> Is it common to have multiple modifiers on a single character? The
> problem I see with using decomposed canonical form for strings is that
> we would have to return a dchar[] for each 'element', which severely
> complicates code that, for instance, only expects to handle English.

Hebrew:
 • Almost every letter in a printed Hebrew Bible has at least one of:
    ‣ a vowel marker (the Hebrew alphabet is otherwise consonantal) and
    ‣ a /dagesh/ dot, indicating the difference between /b/ and /v/, or
      between /mm/ and /m/;
 • almost every word has at least one letter with a cantillation mark in
   addition to the above; and
 • other marks too complicated and off-topic to explain.

Vietnamese uses Latin letters with accents playing multiple roles, so
there are often two or three accent marks on a single letter; e.g., the
name of the creator of pdfTeX is spelled “Hàn Thế Thành”, with two
accents on the “e”.

I’m sure there are others.

--Joel
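
For anyone who wants to check the examples above, here is a minimal Python sketch (Python only because its standard library ships the Unicode tables; it says nothing about how D should do it). It puts a string into canonical decomposed form (NFD) and counts how many combining marks follow each base character. The helper name marks_per_base is made up for illustration, and treating "nonzero canonical combining class" as "is a modifier" is a simplifying assumption: some marks have class 0, so this is not a full grapheme segmenter.

```python
import unicodedata


def marks_per_base(text):
    """Return (base, mark_count) pairs for text in canonical decomposed form.

    Assumption: a code point counts as a modifier when its canonical
    combining class is nonzero. That covers Latin accents and Hebrew
    points, but not every combining mark in Unicode.
    """
    result = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and result:
            # Attach the mark to the most recent base character.
            base, count = result[-1]
            result[-1] = (base, count + 1)
        else:
            result.append((ch, 0))
    return result


# Vietnamese: "ế" decomposes to e + circumflex + acute.
print(marks_per_base("Th\u1ebf"))            # [('T', 0), ('h', 0), ('e', 2)]

# Hebrew: bet + dagesh + sheva, i.e. one letter carrying two marks.
print(marks_per_base("\u05d1\u05bc\u05b0"))  # [('ב', 2)]
```

So even in a short Vietnamese name, one "element" of the decomposed string is three code points, which is the case the quoted question is asking about.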