https://bugs.documentfoundation.org/show_bug.cgi?id=66791
--- Comment #32 from Jonathan Clark <[email protected]> ---
This bug is due to the greedy algorithm we use to assign script types to weakly-associated characters. It does not properly handle punctuation.

The current algorithm works something like this:

- First, any weak characters at the start of a paragraph are assigned to the same script as the first strong character in the paragraph.
- Then, the paragraph is scanned in reading order. Weak characters are assigned to the previously-seen script, with a few hard-coded exceptions (e.g. bug 112594).
- Finally, we run the Unicode bidi algorithm and reassign all right-to-left text to the complex script type.

The last step hides the depth of the problem. The Unicode bidi algorithm accounts for nested punctuation, so the output seems correct-but-buggy for RTL languages (while not working at all for other language pairs).

In my opinion, we should replace the current algorithm with one that extends the RTL behavior to all languages. Existing RTL documents depend on the current behavior, and impacted CJK documents likely already include manual formatting to achieve the same effect, so this seems like the least-disruptive option.
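To make the punctuation problem concrete, here is a minimal Python sketch of the first two steps of the current algorithm. It omits the hard-coded exceptions and the bidi pass. Next to it is a hypothetical variant in which a closing bracket copies the script of its matching opening bracket. The classifier, the function names and the bracket rule are illustrative assumptions, not LibreOffice code. The bracket matching loosely follows the paired-bracket identification (BD16) of the Unicode bidi algorithm. The bidi algorithm's actual N0 rule resolves each bracket pair from its contents and surrounding context rather than by simple copying.

```python
"""Sketch of greedy script assignment for weak characters, plus a
hypothetical bracket-pairing variant. The LATIN/ASIAN/WEAK labels echo
LibreOffice's script types; everything else here is a simplification."""

LATIN, ASIAN, WEAK = "LATIN", "ASIAN", "WEAK"

# Hypothetical opening bracket for each closing bracket.
PAIRS = {")": "(", "]": "[", "}": "{"}


def classify(ch: str) -> str:
    """Very coarse, hypothetical classifier: CJK Unified Ideographs are
    ASIAN, ASCII letters are LATIN, everything else is WEAK."""
    if "\u4e00" <= ch <= "\u9fff":
        return ASIAN
    if ch.isascii() and ch.isalpha():
        return LATIN
    return WEAK


def greedy_assign(text: str) -> list[str]:
    """Steps 1 and 2 only (no exceptions, no bidi pass)."""
    types = [classify(c) for c in text]
    # Step 1: leading weak characters take the first strong script.
    current = next((t for t in types if t != WEAK), LATIN)
    result = []
    # Step 2: in reading order, weak characters take the last strong script.
    for t in types:
        if t != WEAK:
            current = t
        result.append(current)
    return result


def bracket_aware_assign(text: str) -> list[str]:
    """Hypothetical variant: each closing bracket copies the script of
    its matching opening bracket, so both halves of a pair match."""
    result = greedy_assign(text)
    stack: list[tuple[str, int]] = []  # (opening char, index) still open
    for i, ch in enumerate(text):
        if ch in "([{":
            stack.append((ch, i))
        elif ch in PAIRS:
            # BD16-style: search from the top for a matching opener; if
            # found, drop it and everything above it, else ignore ch.
            for k in range(len(stack) - 1, -1, -1):
                if stack[k][0] == PAIRS[ch]:
                    j = stack[k][1]
                    del stack[k:]
                    result[i] = result[j]
                    break
    return result


if __name__ == "__main__":
    sample = "中文(abc)中文"
    print(greedy_assign(sample))
    print(bracket_aware_assign(sample))
```

With `greedy_assign`, the "(" in "中文(abc)中文" takes the Asian script from the preceding 文, while the ")" takes the Latin script from c, so the two halves of the pair are formatted differently. `bracket_aware_assign` gives both brackets the Asian script, and because it uses a stack it also keeps nested pairs such as "中(a[中]b)" consistent.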
