Re: Bidi edge cases in Hangul and Indic
Thank you. Section 3.5 confused me: shaping, that is, the selection of cursive-connected shapes, is applied after the UBA reordering. However, other character-to-glyph conversions are applied before it "(taking the embedding levels into account for mirroring)".

2018-02-26 21:45 GMT+02:00, Ken Whistler:
> On 2/26/2018 7:11 AM, QSJN 4 UKR wrote:
>>> The UBA reorders the display order in layout -- not the underlying
>>> string.
>> What?
>>
>> UBA reorders characters, not glyphs.
>
> Actually it does not. The backing order storage of the text is
> unaffected. See UAX #9:
>
> "When working with bidirectional text, the characters are still
> interpreted in logical order--only the display is affected."
>
> And see Section 3.4, Reordering Resolved Levels. The character stream is
> mapped onto glyphs *in logical order*.
Re: Bidi edge cases in Hangul and Indic
David,

On 2/22/2018 7:21 PM, David Corbett via Unicode wrote:
> My confusion stems from Unicode’s online bidi utility.

That bidi utility has known defects in it. It is not yet conformant with changes to UBA 6.3, let alone later changes to UBA. And the mapping of memory position to display position in that utility does not take into account complex mapping that has to occur in the layout engines and fonts in real applications.

--Ken
Re: Bidi edge cases in Hangul and Indic
On Thu, Feb 22, 2018 at 6:32 PM, Ken Whistler wrote:
>
> If you override the normal left-to-right ordering with bidi override
> controls, then the layout order is reversed, but what is actually laid out
> is those two glyphs. So you just reverse the order of the two syllables for
> display, in either case.
>

My confusion stems from Unicode’s online bidi utility. Compare https://unicode.org/cldr/utility/bidi.jsp?a=%E2%80%AE%EB%B3%B4%EA%B8%B0 (NFC) to https://unicode.org/cldr/utility/bidi.jsp?a=%E2%80%AE%E1%84%87%E1%85%A9%E1%84%80%E1%85%B5 (NFD). Concatenating each one’s characters in reordered display position order produces canonically different results.

Here is a more practical example. A sequence of an emoji modifier base and an emoji modifier in an RTL run will be display-reordered such that the modifier is to the left of the base. Clearly, the right thing is to not reorder them, because they should ligate to form a single glyph. Contrast this with “fl” in an RTL run, which will be display-reordered to “lf”: it would be wrong to apply the previous rationale here just because “fl” may have a single glyph.

It sounds like the UBA doesn’t specify how to reorder the glyphs of the characters within a level run. That’s about what I expected. I was just worried it might require an easily implemented but wrong order, so thanks for the reassurance.
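
To make the emoji versus “fl” contrast concrete, here is a rough Python sketch. It is not an implementation of the UBA: plain string reversal stands in for reordering an RTL-overridden run, and `reverse_keeping_modifiers` is a hypothetical helper that simply glues any emoji modifier (U+1F3FB..U+1F3FF) to whatever character precedes it, without checking that the preceding character really is an emoji modifier base.

```python
def is_emoji_modifier(ch: str) -> bool:
    # The five skin-tone modifiers occupy U+1F3FB..U+1F3FF.
    return 0x1F3FB <= ord(ch) <= 0x1F3FF


def reverse_naive(text: str) -> str:
    # Stand-in for per-character display reordering (assumption, not the UBA).
    return text[::-1]


def reverse_keeping_modifiers(text: str) -> str:
    # Hypothetical helper: keep each modifier attached to the preceding
    # character, then reverse the resulting units.
    units = []
    for ch in text:
        if units and is_emoji_modifier(ch):
            units[-1] += ch
        else:
            units.append(ch)
    return "".join(reversed(units))


thumbs = "\U0001F44D\U0001F3FD"  # THUMBS UP SIGN + medium skin tone modifier

# Per-character reversal puts the modifier before its base.
modifier_first = reverse_naive(thumbs) == "\U0001F3FD\U0001F44D"

# Unit-wise reversal leaves the base + modifier pair intact.
pair_intact = reverse_keeping_modifiers(thumbs) == thumbs

# "fl" is two separate units either way, so it still becomes "lf".
fl_reversed = reverse_keeping_modifiers("fl")

print(modifier_first, pair_intact, fl_reversed)
```

The point of the sketch is only that the unit of reversal matters: the emoji pair behaves as one unit, while “fl” does not, even if a font happens to supply a ligature for it.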
Bidi edge cases in Hangul and Indic
Although the Unicode Bidirectional Algorithm clearly defines how to reorder characters in memory, I don’t understand precisely what it means to display one character after another once they’ve been reordered; specifically, when bidi reordering changes the number of user-perceived characters.

For example, after a right-to-left override, the Hangul string 보기 (“bogi”) becomes 기보 (“gibo”) in visual order. However, its NFD form is reordered by jamo instead of by syllable; that is, it looks like “igob”. I don’t think it is the intent of the algorithm that canonically equivalent strings display so very differently, but I can’t find any explicit guidance. What should a UBA-conformant renderer do?

Another unclear case is Indic clusters. षिक् is unambiguously two clusters, but after an RLO, and after following rule L3 to put combining marks after their bases, it looks like one cluster: क्षि. If Devanagari were actually written right-to-left, I would expect it to stay as two clusters: क्षि. Does the UBA prefer one rendering over the other, or is this outside its scope?
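
The Hangul case can be reproduced with a small Python sketch using only the standard `unicodedata` module. Two things here are assumptions for illustration: plain reversal stands in for the effect of an RLO on a single run, and `reverse_syllables` is a hypothetical helper that treats a conjoining-jamo sequence of the form L+ V+ T* as one unit, which only approximates proper syllable segmentation.

```python
import re
import unicodedata

# One unit is either a conjoining-jamo sequence (leading consonants U+1100..U+115F,
# vowels U+1160..U+11A7, optional trailing consonants U+11A8..U+11FF) or any
# single character. This is a simplification, not a full cluster segmenter.
_UNIT = re.compile(
    r"[\u1100-\u115F]+[\u1160-\u11A7]+[\u11A8-\u11FF]*|.", re.DOTALL
)


def reverse_code_points(text: str) -> str:
    # Stand-in for per-character reordering under an RLO (assumption).
    return text[::-1]


def reverse_syllables(text: str) -> str:
    # Hypothetical helper: reverse syllable-sized units instead of code points.
    return "".join(reversed(_UNIT.findall(text)))


def canon(text: str) -> str:
    # Two strings are canonically equivalent iff their NFC forms are equal.
    return unicodedata.normalize("NFC", text)


nfc = unicodedata.normalize("NFC", "\uBCF4\uAE30")  # the two syllables of "bogi"
nfd = unicodedata.normalize("NFD", nfc)             # four conjoining jamo

by_cp_equal = canon(reverse_code_points(nfc)) == canon(reverse_code_points(nfd))
by_syll_equal = canon(reverse_syllables(nfc)) == canon(reverse_syllables(nfd))

print(by_cp_equal, by_syll_equal)  # False True
```

Reversing by code point breaks canonical equivalence between the NFC and NFD forms, while reversing by syllable-sized unit preserves it. Whether a renderer is required to do the latter is exactly the question being asked, so the sketch only illustrates the discrepancy.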