Re: Bidi edge cases in Hangul and Indic

2018-02-28 Thread QSJN 4 UKR via Unicode
Thank you.

Section 3.5 confused me. Shaping, that is, the selection of
cursive-connected shapes, is applied after the UBA reordering. However,
other character-to-glyph conversions are applied before it "(taking
the embedding levels into account for mirroring)".

> 2018-02-26 21:45 GMT+02:00, Ken Whistler:
> On 2/26/2018 7:11 AM, QSJN 4 UKR wrote:
>>> The UBA reorders the display order in layout -- not the underlying
>>> string.
>> What?
>>
>> UBA reorders characters, not glyphs.
>
> Actually it does not. The backing order storage of the text is
> unaffected. See UAX #9:
>
> "When working with bidirectional text, the characters are still
> interpreted in logical order--only the display is affected."
>
> And see Section 3.4, Reordering Resolved Levels. The character stream is
> mapped onto glyphs *in logical order*.
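
To make the quoted point concrete, here is a minimal sketch of rule L2 from UAX #9 (reverse every contiguous run at or above each level, from the highest level down to the lowest odd level). It assumes the embedding levels have already been resolved and simply takes them as input; the function name visual_order is my own, not something from the standard. It returns a list of logical indices in display order and never modifies the string itself.

```python
def visual_order(levels):
    """Return logical indices in left-to-right display order (UAX #9 rule L2).

    `levels` holds one already-resolved embedding level per character.
    Resolving the levels is outside the scope of this sketch.
    """
    order = list(range(len(levels)))
    odd_levels = [lv for lv in levels if lv % 2 == 1]
    if not odd_levels:
        return order  # purely left-to-right: nothing to reverse
    for level in range(max(levels), min(odd_levels) - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = reversed(order[i:j])  # reverse this run
                i = j
            else:
                i += 1
    return order


# Two Latin letters, three Hebrew letters, one Latin letter (logical order).
text = "ab\u05d0\u05d1\u05d2c"
levels = [0, 0, 1, 1, 1, 0]
order = visual_order(levels)
display = "".join(text[i] for i in order)
```

The backing string text is untouched; only the index list order describes the display. A layout engine would then map characters to glyphs and position them according to that order.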


Re: Bidi edge cases in Hangul and Indic

2018-02-22 Thread Ken Whistler via Unicode

David,


On 2/22/2018 7:21 PM, David Corbett via Unicode wrote:

> My confusion stems from Unicode’s online bidi utility.


That bidi utility has known defects in it. It is not yet conformant with 
changes to UBA 6.3, let alone later changes to UBA. And the mapping of 
memory position to display position in that utility does not take into 
account complex mapping that has to occur in the layout engines and 
fonts in real applications.


--Ken


Re: Bidi edge cases in Hangul and Indic

2018-02-22 Thread David Corbett via Unicode
On Thu, Feb 22, 2018 at 6:32 PM, Ken Whistler wrote:

>
> If you override the normal left-to-right ordering with bidi override
> controls, then the layout order is reversed, but what is actually laid out
> is those two glyphs. So you just reverse the order of the two syllables for
> display, in either case.
>

My confusion stems from Unicode’s online bidi utility. Compare
https://unicode.org/cldr/utility/bidi.jsp?a=%E2%80%AE%EB%B3%B4%EA%B8%B0
(NFC) to
https://unicode.org/cldr/utility/bidi.jsp?a=%E2%80%AE%E1%84%87%E1%85%A9%E1%84%80%E1%85%B5
(NFD). Concatenating each one’s characters in reordered display position
order produces canonically different results.

Here is a more practical example. A sequence of an emoji modifier base and an
emoji modifier in an RTL run will be display-reordered such that the
modifier is to the left of the base. Clearly, the right thing is to not reorder
them, because they should ligate to form a single glyph. Contrast this with
“fl” in an RTL run, which will be display-reordered to “lf”: it would be
wrong to apply the previous rationale here just because “fl” may have a
single glyph.
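
A small sketch of the two cases, assuming a naive code point reversal stands in for what happens to a run at an odd level (it is not a full UBA implementation, and naive_rtl_display is a hypothetical helper name):

```python
def naive_rtl_display(run):
    """Reverse a run code point by code point, as a stand-in for an odd-level run."""
    return run[::-1]


# Emoji modifier base followed by an emoji modifier (thumbs up + medium skin tone).
thumbs = "\U0001F44D\U0001F3FD"
# Two ordinary Latin letters that a font may happen to ligate.
fl = "fl"

reordered_thumbs = naive_rtl_display(thumbs)
reordered_fl = naive_rtl_display(fl)

# The modifier now precedes its base, so the pair can no longer ligate.
modifier_first = reordered_thumbs[0] == "\U0001F3FD"
```

For "fl" the reversed result "lf" is what a reader would want; for the emoji pair it is not, even though both pairs may map to a single glyph.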

It sounds like the UBA doesn’t specify how to reorder the glyphs of the
characters within a level run. That’s about what I expected. I was just
worried it might require an easily implemented but wrong order, so thanks
for the reassurance.


Bidi edge cases in Hangul and Indic

2018-02-22 Thread David Corbett via Unicode
Although the Unicode Bidirectional Algorithm clearly defines how to reorder
characters in memory, I don’t understand precisely what it means to display
one character after another once they’ve been reordered; specifically, when
bidi reordering changes the number of user-perceived characters.

For example, after a right-to-left override, the Hangul string 보기 (“bogi”)
becomes 기보 (“gibo”) in visual order. However, its NFD form is reordered by
jamo instead of by syllable; that is, it looks like “igob”. I don’t think
it is the intent of the algorithm that canonically equivalent strings
display so very differently, but I can’t find any explicit guidance. What
should a UBA-conformant renderer do?
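
The discrepancy can be reproduced with Python's standard unicodedata module. This is only a sketch: it models the override as a plain code point reversal rather than running the UBA.

```python
import unicodedata

nfc = "\ubcf4\uae30"                      # 보기 as two precomposed syllables
nfd = unicodedata.normalize("NFD", nfc)   # the same text as four conjoining jamo

# The two forms are canonically equivalent before any reordering.
equivalent_before = unicodedata.normalize("NFC", nfd) == nfc

# Model the right-to-left override as a plain code point reversal.
reversed_nfc = nfc[::-1]                  # 기보, reordered by syllable
reversed_nfd = nfd[::-1]                  # jamo in reverse, "igob"

# After reversal the two results are no longer canonically equivalent.
equivalent_after = (
    unicodedata.normalize("NFC", reversed_nfd)
    == unicodedata.normalize("NFC", reversed_nfc)
)
```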

Another unclear case is Indic clusters. षिक् is unambiguously two clusters,
but after an RLO, and after following rule L3 to put combining marks after
their bases, it looks like one cluster: क्षि. If Devanagari were actually
written right-to-left, I would expect it to stay as two clusters: क्‌षि.
Does the UBA prefer one rendering over the other, or is this outside its
scope?
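
For reference, this is roughly how I get from one string to the other. It is a sketch under two assumptions: the override is modeled as a plain code point reversal, and "combining mark" means any character whose General_Category starts with M (which covers both the virama, Mn, and the vowel sign i, Mc). The function name reverse_with_l3 is my own.

```python
import unicodedata


def reverse_with_l3(s):
    """Reverse code points, then move combining marks back after their bases.

    Sketch of rule L3 applied to a fully reversed run; a mark is any
    character whose General_Category starts with "M".
    """
    out = []
    pending = []  # marks seen before their base in the reversed string
    for ch in reversed(s):
        if unicodedata.category(ch).startswith("M"):
            pending.append(ch)
        else:
            out.append(ch)
            out.extend(reversed(pending))  # restore the logical mark order
            pending.clear()
    out.extend(reversed(pending))  # marks that never found a base
    return "".join(out)


logical = "\u0937\u093f\u0915\u094d"   # SSA + vowel sign I, KA + virama
visual = reverse_with_l3(logical)      # KA + virama + SSA + vowel sign I
```

The output is KA, virama, SSA, vowel sign I, which a shaping engine would treat as a single conjunct cluster.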