Re: Suppressing Ligation of Spacing Marks
Inserting a zero-width word joiner or disjoiner should work here. But if you see a dotted circle, you need to encode some zero-width space as the base holder for the combining vowel sign that follows it. However, I wonder whether fonts accept zero-width holders for combining vowels: they could still assume that there is no matching base consonant and thus insert another dotted circle as the base.

There is no consensus across scripts on using the same null-base holder as a pseudo-consonant for the vowels encoded after it (e.g. Hangul has its own jamo holder for this because of its specific algorithmic composition, but some other scripts also use such null holders in their own orthography).

In alphabetic scripts, the ZWNJ should work. But in Indic scripts we all depend on the capability of renderers to support specific scripts with only specific subsets of base letters; every character outside this subset triggers the insertion of a dotted circle glyph, and ZWJ/ZWNJ already have script-specific uses inside clusters for some distinctions (notably to control how parts of clusters are subgrouped ...).

You'll need to "bug" the maintainers of the renderer if they forgot necessary cases described earlier for the script when it was initially approved for encoding.

2016-11-08 10:09 GMT+01:00 Richard Wordingham <richard.wording...@ntlworld.com>:

> Should it be possible to suppress the ligation of a base character and
> a visually following spacing mark in plain text?
>
> The example I have in mind is the sequence U+1A63 TAI THAM VOWEL SIGN AA>. It may be desirable to suppress the
> ligation because both ligands have subscript consonants. However, if
> I write, the Universal Shaping Engine
> decides that the ZWNJ triggers a new syllable, and inserts a dotted
> circle before SIGN AA. (The dotted circle after SIGN AA results from a
> failure to read the proposal for the Lanna script as it was then
> called.)
>
> Richard.
Re: Multiple Preposed Marks
2016-11-09 0:42 GMT+01:00 Richard Wordingham <richard.wording...@ntlworld.com>:

> On Wed, 9 Nov 2016 00:00:01 +0100
> Philippe Verdy wrote:
>
> > 2016-11-08 9:30 GMT+01:00 Richard Wordingham <
> > richard.wording...@ntlworld.com>:
> >
> > > TUS Section 2.11 says, "If the combining characters can interact
> > > typographically—for example, U+0304 combining macron and U+0308
> > > combining diaeresis — then the order of graphic display is
> > > determined by the order of coded characters (see Table 2-5).
> > > By default, the diacritics or other combining characters are
> > > positioned from the base character’s glyph outward".
>
> > The interpretation of "If the combining characters can interact
> > typographically" should be better read as "If the combining
> > characters have the same non-zero combining class or any one of them
> > has a zero combining class".
>
> The combining marks in question both have canonical combining class 0.
>
> > But now normalization is everywhere and causes the pairs using the
> > condition above to be freely reordered (or decomposed and recomposed,
> > meaning that the encoding order is NOT significant at all).
>
> I believe a renderer is permitted to treat canonically equivalent
> sequences differently so long as it does not believe it should treat
> them differently. However, that is irrelevant to this case.
This is DIRECTLY relevant to the sentence in TUS you quoted, which is all about combining characters that are encoded after the base letter, often have non-zero combining classes, and are reorderable.

But evidently this sentence in TUS is not relevant to "prepended" combining marks that all have combining class 0. (Here "prepended" means: encoded before the base character, as opposed to marks that are encoded after it even though they visually combine before it, as is the case for well-known Indic vowels that now have non-zero combining classes that allow them to be reordered before other combining marks when normalizing, while still remaining encoded after the base consonant.)

What I want to say is that this sentence in TUS is quite ambiguous: it speaks about graphic interaction, but this is not really encoded in text sequences, and it forgets the effect of combining classes on combining sequences, which NEVER consider any actual graphic interaction. That is simply because the interaction is not specified, and the actual graphic interactions may depend on font styles (notably in honorific Arabic typography using very complex layouts, but even within the Latin script when using decorated font styles or custom ligatures, where complex interactions also occur, including on larger spans than clusters, such as full words).
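The point that two marks with canonical combining class 0 keep their encoded order can be checked with a minimal Python sketch using the standard unicodedata module. The base U+1A20 and the marks U+1A6E and U+1A70 are assumptions chosen for illustration; the thread itself only speaks of abstract marks E and O.

```python
import unicodedata

# Illustrative code points (assumptions, not taken from the thread):
BASE = "\u1A20"  # a Tai Tham consonant used as the base
E = "\u1A6E"     # TAI THAM VOWEL SIGN E, displayed to the left of the base
OO = "\u1A70"    # TAI THAM VOWEL SIGN OO, displayed to the left of the base

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def survives_normalization(text: str) -> bool:
    """Return True if every normalization form leaves the text unchanged."""
    return all(unicodedata.normalize(form, text) == text for form in FORMS)


if __name__ == "__main__":
    for mark in (E, OO):
        name = unicodedata.name(mark, "<unnamed>")
        print(f"U+{ord(mark):04X} {name}: ccc={unicodedata.combining(mark)}")
    # Both encoding orders are stable, so normalization does not decide
    # which of the two marks is rendered closer to the base.
    print(survives_normalization(BASE + E + OO))
    print(survives_normalization(BASE + OO + E))
```

Because neither order is altered by normalization, the two sequences are not canonically equivalent, and the rendering order is left to the shaping engine and the font.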
Re: Multiple Preposed Marks
2016-11-08 9:30 GMT+01:00 Richard Wordingham <richard.wording...@ntlworld.com>:

> TUS Section 2.11 says, "If the combining characters can interact
> typographically—for example, U+0304 combining macron and U+0308
> combining diaeresis — then the order of graphic display is
> determined by the order of coded characters (see Table 2-5).
> By default, the diacritics or other combining characters are
> positioned from the base character’s glyph outward".

The phrase "If the combining characters can interact typographically" would be better read as "If the combining characters have the same non-zero combining class or any one of them has a zero combining class".

In effect, the combining classes were historically intended to track these possible graphic interactions, in order to allow or disable reordering and to detect canonical equivalences. But now normalization is everywhere and causes the pairs using the condition above to be freely reordered (or decomposed and recomposed, meaning that the encoding order is NOT significant at all).

But it turned out that some diacritics may be positioned differently according to their base character. For example, the cedilla attaches below, where it is supposed not to interact with other combining characters that normally attach above (so reordering to canonical equivalents is permitted, and is in fact performed automatically when documents are encoded or decoded); yet with some Latin letters these interactions do occur. The only way then to block the reordering (if you don't want the positions inferred from the encoding order of normalized strings) is to insert a combining grapheme joiner (CGJ), which has combining class zero.

This sentence in TUS should have been updated long ago, because TUS does not really know how characters will be positioned, and Unicode permits reordering of pairs of diacritics if they do not block each other for normalization.
This is important for the cedilla, but even more important for Hebrew diacritics, whose combining classes do not correctly track their relative positioning (as discussed on this list years ago, and known as the "Hebrew points bug"). This will never change: the combining classes are assigned permanently and continue to work for simple cases, but they will cause problems with some pairs that need CGJ to be inserted.

This is also important for several Indic scripts that have complex positioning rules if you use combining characters with non-zero combining classes (initially intended for simple cases in Latin/Greek/Cyrillic). Thankfully, the most critical diacritics in Indic scripts for such complex cases have a combining class set to zero (meaning that they block each other and their relative encoding order is not affected by normalization), but there are many cases where CGJ is needed.
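The reordering behaviour described above can be observed with a minimal Python sketch based on the standard unicodedata module. The base letters ("a", "c", and Hebrew LAMED) are arbitrary choices made for illustration, and NFD is used so that composition does not obscure the reordering.

```python
import unicodedata

CGJ = "\u034F"        # COMBINING GRAPHEME JOINER, combining class 0
ACUTE = "\u0301"      # combining class 230 (above)
CEDILLA = "\u0327"    # combining class 202 (attached below)
MACRON = "\u0304"     # combining class 230
DIAERESIS = "\u0308"  # combining class 230
LAMED = "\u05DC"      # Hebrew base letter (illustrative choice)
HIRIQ = "\u05B4"      # HEBREW POINT HIRIQ, combining class 14
PATAH = "\u05B7"      # HEBREW POINT PATAH, combining class 17


def nfd(text: str) -> str:
    """Canonical decomposition followed by canonical reordering."""
    return unicodedata.normalize("NFD", text)


def code_points(text: str) -> str:
    """Render a string as a list of U+XXXX values for inspection."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    samples = {
        "same class, order kept": "a" + MACRON + DIAERESIS,
        "different classes, reordered": "c" + ACUTE + CEDILLA,
        "CGJ blocks reordering": "c" + ACUTE + CGJ + CEDILLA,
        "Hebrew points, reordered": LAMED + PATAH + HIRIQ,
        "Hebrew points with CGJ": LAMED + PATAH + CGJ + HIRIQ,
    }
    for label, text in samples.items():
        print(f"{label}: {code_points(text)} -> {code_points(nfd(text))}")
```

Marks with the same non-zero class keep their encoded order, marks with different non-zero classes are sorted by class, and a CGJ between them keeps the encoded order at the cost of an extra code point.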
Re: Multiple Preposed Marks
On Tue, 8 Nov 2016 21:36, Richard Wordingham wrote:
>
> On Tue, 8 Nov 2016 08:30:25 +
> Richard Wordingham wrote:
>
> > and the need for an OpenType feature (probably a cvXX)
> > for inconsistent handling of U+1A58 MAI KANG LAI. The latter may be a
> > challenge - I couldn't persuade MS Edge to use the font's Lao shaping
>
> General features (e.g. 'ss01') for Tai Tham work a treat in MS Edge, and
> seem to be executed at the same time as the 'standard typographical
> presentation', e.g. feature 'psts'. Thank you! That makes things much
> easier.

[…]

“Where thereʼs a will, thereʼs a way!”

Marcel
Re: Multiple Preposed Marks
On Tue, 8 Nov 2016 08:30:25 +
Richard Wordingham wrote:

> and the need for an OpenType feature (probably a cvXX)
> for inconsistent handling of U+1A58 MAI KANG LAI. The latter may be a
> challenge - I couldn't persuade MS Edge to use the font's Lao shaping

General features (e.g. 'ss01') for Tai Tham work a treat in MS Edge, and seem to be executed at the same time as the 'standard typographical presentation', e.g. feature 'psts'. Thank you! That makes things much easier.

(There seems to be quite a bit of variation in layout in Chiang Mai province, never mind the rest of the region.)

Richard.
Re: The (Klingon) Empire Strikes Back
I believe there's already a court ruling that says languages and words are not copyrightable, in the case about Loglan, although the trademarkability of a language is another matter.

On 5 November 2016 at 01:42, "David Faulks" wrote:
>
> On Thu, 11/3/16, Mark Shoulson wrote:
> Subject: The (Klingon) Empire Strikes Back
>
> > At the time of writing this letter it has not yet hit the UTC
> > Document Register, but I have recently submitted a document
> > revisiting the ever-popular issue of the encoding of Klingon
> > "pIqaD". The reason always given why it could not be
> > encoded was that it did not enjoy enough usage, and so I've
> > collected a bunch of examples to demonstrate that this is not
> > true (scans and also web pages, etc.) So the issue comes
> > back up, and time to talk about it again.
>
> There is another issue of course, which I think could be a huge obstacle:
> the Trademark/Copyright issue. Paramount claims copyright over the entire
> Klingon language (presumably including the script). The issue has recently
> gone to court. Encoding criteria for symbols (and this likely extends to
> letters) is against encoding them without the permission of the
> Copyright/Trademark holder.
>
> Is Paramount endorsing your proposal?
>
> > ~mark
>
> David Faulks
Re: The (Klingon) Empire Strikes Back
On 2016-11-08, Mark E. Shoulson wrote:
> I've heard that there are similar questions regarding tengwar and cirth,
> but it is notable that UTC *did* see fit to consider this question for
> them and determine that they were worthy of encoding (they are on the
> roadmap), even though they have not actually followed through on that
> yet, perhaps because of these very IP concerns. Notably, pIqaD is not

The Tolkien Estate considers that the tengwar constitute a work of art, and it's not willing to see them in Unicode, because this would hinder its ability to pursue people using tengwar for what it considers inappropriate purposes. (I finally asked them a couple of years ago for permission to encode, based on Michael Everson's draft proposal from yonks ago, and that's the summary of their reply.)

Several years ago, I was told on this list that it would be up to the proposers to deal with this, and that the Unicode Consortium would have no interest in taking on the 800lb legal gorilla that is the Tolkien Estate. (Now a 24M£ gorilla with what it got from New Line Cinema.)

If some wealthy Unicode Consortium member feels like paying for an American counsel's opinion that the Estate is just trying it on, feel free!

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Suppressing Ligation of Spacing Marks
Should it be possible to suppress the ligation of a base character and a visually following spacing mark in plain text?

The example I have in mind is the sequence . It may be desirable to suppress the ligation because both ligands have subscript consonants. However, if I write, the Universal Shaping Engine decides that the ZWNJ triggers a new syllable, and inserts a dotted circle before SIGN AA. (The dotted circle after SIGN AA results from a failure to read the proposal for the Lanna script as it was then called.)

Richard.
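For readers who want to reproduce the test string, here is a minimal Python sketch of placing ZWNJ before U+1A63 TAI THAM VOWEL SIGN AA. The helper names and the sample sequence (U+1A20 followed by U+1A63) are hypothetical; the sequence actually discussed in the message is not reproduced here. Whether a shaping engine then suppresses the ligature or inserts a dotted circle is exactly the open question, and this code cannot answer it.

```python
import unicodedata

ZWNJ = "\u200C"     # ZERO WIDTH NON-JOINER
SIGN_AA = "\u1A63"  # TAI THAM VOWEL SIGN AA


def request_no_ligature(text: str, mark: str = SIGN_AA) -> str:
    """Hypothetical helper: put exactly one ZWNJ before each occurrence of mark."""
    # Remove any ZWNJ already present before the mark so the call is idempotent.
    return text.replace(ZWNJ + mark, mark).replace(mark, ZWNJ + mark)


def describe(text: str) -> list[str]:
    """List each code point with its name, general category and combining class."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')} "
        f"gc={unicodedata.category(ch)} ccc={unicodedata.combining(ch)}"
        for ch in text
    ]


if __name__ == "__main__":
    sample = "\u1A20" + SIGN_AA  # hypothetical base + SIGN AA
    for line in describe(request_no_ligature(sample)):
        print(line)
```

Dumping the code points this way makes it easy to confirm what was actually sent to the renderer before filing a report against a shaping engine.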
Multiple Preposed Marks
TUS Section 2.11 says, "If the combining characters can interact typographically—for example, U+0304 combining macron and U+0308 combining diaeresis — then the order of graphic display is determined by the order of coded characters (see Table 2-5). By default, the diacritics or other combining characters are positioned from the base character’s glyph outward".

So, if I have two spacing combining marks E and O that are each positioned to the left of the base (say X) in a left-to-right script, so that the encodings and appear with the glyph orders and , and codings and , if not total gibberish, represent a horizontal sequence of the glyphs with gX on the right, should render as or ? The phonetics and collation (in so far as it is meaningful) of the words provide no cue as to the order of the encoded characters. I have encountered both renderings.

The issue came up when I was checking, in both the Firefox and MS Edge browsers, that my OpenType Tai Tham font Da Lekh could handle all the headwords of two Northern Thai dictionaries. (Sparing dotted circle deletion and orthographic syllable reunification are tricky.) One of the dictionaries spells a few words with a combination of the Tai and Pali notations for the vowel /o:/ in open syllables where one might expect to see an independent vowel.

I'm down to two other rendering engine issues - a combination of tone mark and then vowel in 4 words, where the dictionary probably has a misspelling, and the need for an OpenType feature (probably a cvXX) for inconsistent handling of U+1A58 MAI KANG LAI. The latter may be a challenge - I couldn't persuade MS Edge to use the font's Lao shaping for the Tai Tham script or for the Latin script in a transliteration mode. (That mode is triggered by feature ss02 for the Latin script, and that works well enough in browsers.)

Richard.