Dov Feldstern wrote:
Hi!
I think that I've finally tracked down the cause of a problem we've
been having for a long time with RTL / Hebrew in LyX 1.5.
Specifically, this is the problem described in bug 3040 (see links
below), where in the frontend, Hebrew words are placed in the correct
position on the screen, but the letters within each word are reversed.
There were also some weird interactions between this and the locale
settings (for certain *illegal* locale settings, the problems suddenly
disappeared); conversely, a change made in r17354-17355 made the
problem reappear, even for the illegal locale settings. (Incidentally,
prior to r15893 the bug always existed, regardless of the locale, and
I was never able to understand how the changes made there had any
effect on the Bidi code.) I think the following will
explain all these phenomena:
(1) When painting characters to the screen, we try (for efficiency?)
to group characters together as much as possible
(http://www.lyx.org/trac/browser/lyx-devel/trunk/src/rowpainter.C?rev=17362#L325),
and then paint them to the screen as a single string, rather than
painting each character separately. Determining when we have to stop
the grouping is done in the section of code pointed to above. (I can't
necessarily explain *why* that's how the groups are broken up, or
indeed if it is even the correct way to do it, but that's how it's
done.) One of the conditions for breaking the group is (line 355:) if
(!isPrintableNonspace(c)) --- in other words, if a character is not
printable, or is a space, we break the group. Let's keep that in mind
for now.
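To make the grouping in (1) concrete, here is a minimal C++ sketch. It is not the code from rowpainter.C: it models only the one break condition quoted above, and groupForPainting and isPrintableNonspaceSketch are hypothetical names with a deliberately crude character test.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for LyX's isPrintableNonspace(); the real
// function is more careful about character classes.
bool isPrintableNonspaceSketch(char32_t c)
{
    return c > U' ' && c != 0x7f;
}

// Split a row into the chunks that would each be painted with one call.
// Printable non-space characters accumulate into a chunk; any other
// character ends the current chunk and is emitted on its own.
std::vector<std::u32string> groupForPainting(std::u32string const & row)
{
    std::vector<std::u32string> chunks;
    std::u32string current;
    for (char32_t c : row) {
        if (isPrintableNonspaceSketch(c)) {
            current += c;
            continue;
        }
        if (!current.empty()) {
            chunks.push_back(current);
            current.clear();
        }
        chunks.push_back(std::u32string(1, c));
    }
    if (!current.empty())
        chunks.push_back(current);
    return chunks;
}
```

With this sketch, a row such as "ab cd" is split into "ab", " " and "cd": whole words end up in a single chunk, which is what matters for the rest of the explanation.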
(2) LyX uses a built-in Bidi algorithm to determine the correct order
for displaying characters on the screen. Internally, the text is
stored in logical order. When outputting the text to the screen, the
Bidi algorithm is used to determine the "visual" order of the
characters. This is performed in Bidi.C
(http://www.lyx.org/trac/browser/lyx-devel/trunk/src/Bidi.C).
computeTables is used to create the correct mapping between the
logical order and the visual order; and vis2log (log2vis) is used to
return the correct logical (visual) position for the given visual
(logical) position.
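As a rough illustration of the two tables mentioned in (2), the following C++ sketch builds a visual-to-logical mapping and its inverse for a left-to-right row. It is a heavy simplification and not Bidi.C: it assumes each character is already flagged as RTL or not, and it simply reverses every maximal RTL run, ignoring numbers, neutrals and nesting. BidiTablesSketch and computeTablesSketch are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

struct BidiTablesSketch {
    std::vector<std::size_t> vis2log; // visual position -> logical position
    std::vector<std::size_t> log2vis; // logical position -> visual position
};

// isRtl[i] says whether the character at logical position i is RTL.
BidiTablesSketch computeTablesSketch(std::vector<bool> const & isRtl)
{
    std::size_t const n = isRtl.size();
    BidiTablesSketch t;
    t.vis2log.resize(n);
    t.log2vis.resize(n);

    std::size_t i = 0;
    while (i < n) {
        if (!isRtl[i]) {
            // LTR characters keep their position.
            t.vis2log[i] = i;
            ++i;
            continue;
        }
        // Find the end of this RTL run and mirror it in place.
        std::size_t end = i;
        while (end < n && isRtl[end])
            ++end;
        for (std::size_t k = 0; k < end - i; ++k)
            t.vis2log[i + k] = end - 1 - k;
        i = end;
    }

    // log2vis is the inverse permutation of vis2log.
    for (std::size_t v = 0; v < n; ++v)
        t.log2vis[t.vis2log[v]] = v;
    return t;
}
```

For a six-character row whose characters 2 to 4 are RTL, the visual order of logical positions comes out as 0, 1, 4, 3, 2, 5.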
(As a user of many software applications over the years which have had
to deal with mixed Hebrew/English text, I must say that LyX has done a
wonderful job. I don't think that there's any other piece of software
--- commercial or otherwise --- with which I have had as few problems
with respect to Bidi, as with LyX. The credit for this goes to Dekel
Tsur, who implemented LyX's Bidi algorithm. Thanks, Dekel!)
(3) Qt 4 applies its own Bidi algorithm to QStrings painted with
drawText. So if a string which contains an entire word in Hebrew is
painted, the letters will be reversed (the QString is assumed to be in
logical order).
(4) Put (2) and (3) together, and words get reversed twice, which
means they are back in logical order when displayed on the screen.
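The double reversal in (4) can be modelled without Qt at all. In the C++ sketch below, both functions are hypothetical stand-ins: toVisualOrder plays the role of LyX's reordering of a single all-Hebrew word, and paintedByQt plays the role of a painter that assumes its input is in logical order and reorders it again.

```cpp
#include <algorithm>
#include <string>

// Stand-in for LyX's Bidi step on one all-Hebrew word: logical order
// in, visual (mirrored) order out.
std::u32string toVisualOrder(std::u32string word)
{
    std::reverse(word.begin(), word.end());
    return word;
}

// Stand-in for a painter that treats its input as logical order and
// applies the same reordering before drawing.
std::u32string paintedByQt(std::u32string const & assumedLogical)
{
    return toVisualOrder(assumedLogical);
}
```

Feeding the output of toVisualOrder into paintedByQt gives back the stored logical order, which is the symptom described in the bug. A one-character string is unchanged by either step, which is why painting character by character sidesteps the problem.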
*This is the basic problem that we currently have*. It's new to 1.5, I
guess, for one or more of the following reasons:
* Earlier versions of Qt don't apply the Bidi algorithm to painted
strings?
* Qt (of earlier versions, and/or Qt4) doesn't apply the Bidi
algorithm to non-Unicode strings?
(5) So what happened between r15893 and r17354, and what does this
have to do with the locale settings?
Well, going back to (1): prior to r15893, isPrintableNonspace(char c)
was implemented like this: return (c & 127) > ' '; Hebrew characters
would be identified (correctly) as isPrintableNonspace, and would
therefore be grouped together --- meaning that, as explained above,
the string would be reversed. A space would (also correctly) be
identified as such, and would therefore break the group --- that's why
the order of the words was still okay.
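To see how the old test behaves, here is the expression quoted above wrapped in a small C++ function. Applying it to full Unicode code points rather than to a char is an assumption made for illustration, and oldIsPrintableNonspace is a hypothetical name.

```cpp
// The pre-r15893 expression, applied to a code point. Only the low
// seven bits are inspected, so everything else about the character
// is ignored.
bool oldIsPrintableNonspace(char32_t c)
{
    return (c & 127) > U' ';
}
```

Under this sketch, U+05D0 (Hebrew alef) has low bits 0x50 and counts as printable, and a plain space does not, matching the behaviour described above. The same masking misclassifies other characters, though: U+0680 (an Arabic letter) has low bits 0 and is rejected, while U+2028 (line separator) has low bits 0x28 and is accepted.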
But the above method for determining isPrintableNonspace is incorrect
for Unicode, and so in r15893 this was changed to use the iswprint()
and iswspace() functions from wctype.h. These depend on the locale
settings (specifically, LC_CTYPE) to perform correctly. So when the
locale was set, the same things as explained before would happen, and
the letters in each word would still get reversed. However, if the
locale wasn't set, or was illegal, then Hebrew characters would not be
identified as printable; thus, every Hebrew character would break the
grouping; and each character would be painted to the screen
separately. When this happens, (3) is irrelevant (there's only one
character, nothing for Qt to reverse!), and therefore only LyX's Bidi
algorithm is working, and the output is correct!
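A locale-dependent version along the lines described for r15893 might look like the C++ sketch below. It is not the actual LyX code, and isPrintableNonspaceLocale is a hypothetical name; the point is only that both calls consult the current LC_CTYPE category.

```cpp
#include <cwctype>

// Classify a wide character using the C library, which consults the
// current LC_CTYPE locale category.
bool isPrintableNonspaceLocale(wchar_t c)
{
    std::wint_t const wc = static_cast<std::wint_t>(c);
    return std::iswprint(wc) != 0 && std::iswspace(wc) == 0;
}
```

ASCII letters and spaces are classified the same way in any locale, but the answer for a Hebrew letter depends on which locale is active and on the C library in use, which would fit the observation that the symptoms changed with the locale settings.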
In r17354/5, the isw...() functions were replaced by a different
method for determining these classes, which does not depend on the
locale settings. Thus, we're back to the original situation: Hebrew
characters are *correctly* identified as PrintableNonspaces, and
therefore grouped together while painting, and getting reversed by Qt.
So that explains the bug. Now, to the possible solutions:
(1) Paint Hebrew/Arabic characters one at a time, so that Qt's Bidi
algorithm doesn't get applied. This is the easiest solution, and also
the most conservative, and therefore least likely to introduce new
bugs. I think that this is definitely the way to go at least until
1.5.0 is released. This does have the disadvantage, however, of
painting the characters one at a time, which may be less efficient
(does anyone know if this really makes a significant difference?).
Here is a way in which you can test this:
1. Make a big document with, say, 50 pages of Hebrew/Arabic text only.
2. Test how much time is needed to scroll through it while holding
down the down-arrow key. Compare with an equally long document
full of Roman text only. This test will paint every word on screen.
Perhaps a test with page-down brings out more differences; the
down-arrow test might end up testing video scrolling speed instead.
Make sure all tests are done with the same window size. Maximizing
the window is one way to ensure that.
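Solution (1) above amounts to adding one more break condition to the grouping. The C++ sketch below shows the idea under loud assumptions: it is not a patch against rowpainter.C, chunksForPainting and isRtlSketch are hypothetical names, and the RTL test only checks the basic Hebrew (U+0590 to U+05FF) and Arabic (U+0600 to U+06FF) blocks.

```cpp
#include <string>
#include <vector>

// Assumption: only the basic Hebrew and Arabic blocks count as RTL.
bool isRtlSketch(char32_t c)
{
    return (c >= 0x0590 && c <= 0x05FF) || (c >= 0x0600 && c <= 0x06FF);
}

// Group characters for painting, but never put an RTL character in a
// multi-character chunk, so the painter has nothing to reorder.
std::vector<std::u32string> chunksForPainting(std::u32string const & visualRow)
{
    std::vector<std::u32string> chunks;
    std::u32string current;
    for (char32_t c : visualRow) {
        bool const breaksGroup = c == U' ' || isRtlSketch(c);
        if (!breaksGroup) {
            current += c;
            continue;
        }
        if (!current.empty()) {
            chunks.push_back(current);
            current.clear();
        }
        chunks.push_back(std::u32string(1, c));
    }
    if (!current.empty())
        chunks.push_back(current);
    return chunks;
}
```

Roman words are still painted as one chunk each, while every Hebrew or Arabic character becomes its own chunk. The number of paint calls for RTL text therefore grows with the number of characters, which is the cost the timing test above is meant to measure.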
(3) The reverse of (2): stop using our own Bidi algorithm altogether,
and only rely on Qt's. Abdel, I know that you're in favor of this
suggestion ;). I also see that it could have certain advantages: we
wouldn't have to maintain our own bidi code; we may be able to paint
much larger chunks of text at once --- we
Yes, having LyX do less work is definitely the way to go, at least in
the long run. If we ever go for another frontend, then that frontend
had better support bidi too. (Or whoever pushes that frontend can write
frontend-specific bidi support for it.)
Helge Hafting