Even more Hebrew / Bidi / Encoding Woes (2/2)

Dov Feldstern Fri, 06 Apr 2007 08:25:28 -0700

Hi!

Another issue with the changes recently made to the encodings. This one is perhaps not actually a bug, but rather an "over-fix".

* The attached file (more_heb_problems.lyx) is a LyX 1.3 generated file (only because I want to compare this to behavior of previous versions; one could have created the file in 1.5 and would see the same effects).
* In order to neutralize the problem described in part 1, this file uses the "default" encoding.


This one is a little more complicated:

On the screen (both 1.3 and 1.5), the document looks like gui.png. All three lines look the same, but actually each was typed in a slightly different manner, as described below (and you can of course see these differences if you look carefully at the .lyx file):

l1. I typed "english with ", then F12 to switch to Hebrew, typed the Hebrew, and then, before typing the trailing space, F12 again back to English, and then " in the middle" (note that the space before "in" was typed *after* switching back to English).

l2. This time, the space after the Hebrew, before "in the middle", was typed *before* switching back to English (so the space is actually "in Hebrew").

l3. I typed one Hebrew word, then F12 back to English, a space, F12 back to Hebrew, and then the second Hebrew word, F12 back to English, and then a space and the rest of the English.

Now, if we look at the output (please ignore the differences in the fonts between 1.3 and 1.5 --- they're running on two different machines), we can see that the output of 1.5 is actually perfect --- it depicts exactly what was typed. 1.3 was more forgiving, and dealt with the whitespace in a more "understanding" way --- it must somehow (I suspect "unconsciously"?) make sure that the spaces between two languages are depicted in the latex as belonging to the main document language. So lines l1 and l2 are output identically. Line l3 is output correctly, as in 1.5.


So here are the problems:

p1. l3 is output correctly (note the order of the two Hebrew words in the output, compared to the two previous lines), but displays wrong in the gui.

p2. l2 displays correctly in 1.5, but breaks the expected behavior for documents being converted from 1.3. (Also, most probably 1.3's output is actually what the user meant, and 1.5 is *not* what the user meant.)


Possible solutions:

s1. One could argue that the gui's behavior is actually correct: it is applying the bidi algorithm, and this is deciding to ignore the language of the whitespace altogether (in other words, it's saying something like this: whitespace between two words of the same language gets treated as that same language; whitespace between two words of differing languages gets treated as the "surrounding" language); this is correct from the point of view of an algorithm which doesn't have any explicit language commands, which is what normal bidi algorithms do. If we decide that this is really the correct behavior, then (a) p1 needs to be solved by making sure the latex output matches the gui; (b) p2 needs to be solved by making 1.5 behave like 1.3 again. Actually, what this means is that the code which outputs the latex will have to be smart enough to apply its own "bidi algorithm with respect to whitespace".
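
As a rough illustration of the whitespace rule described in s1, here is a minimal Python sketch. It is not LyX code and not a real bidi implementation: the token model, the function names, the "en"/"he" tags and the HEB1/HEB2 placeholder words are all assumptions made up for this example.

```python
# Sketch of the s1 rule only; the (text, lang) token model is hypothetical.

def neighbor_lang(tokens, i, step):
    """Language of the nearest non-whitespace token in direction step, or None."""
    j = i + step
    while 0 <= j < len(tokens):
        text, lang = tokens[j]
        if not text.isspace():
            return lang
        j += step
    return None


def resolve_whitespace(tokens, main_lang):
    """Ignore the language typed for whitespace and re-derive it from neighbors.

    Whitespace between two words of the same language takes that language;
    whitespace between words of differing languages (or at the edge of the
    line) takes the surrounding, i.e. main, language.
    """
    resolved = []
    for i, (text, lang) in enumerate(tokens):
        if text.isspace():
            before = neighbor_lang(tokens, i, -1)
            after = neighbor_lang(tokens, i, +1)
            lang = before if before is not None and before == after else main_lang
        resolved.append((text, lang))
    return resolved


# The three lines from the attached file, with placeholder Hebrew words.
HEAD = [("english", "en"), (" ", "en"), ("with", "en"), (" ", "en")]
TAIL = [("in", "en"), (" ", "en"), ("the", "en"), (" ", "en"), ("middle", "en")]

LINE1 = HEAD + [("HEB1", "he"), (" ", "he"), ("HEB2", "he"), (" ", "en")] + TAIL
LINE2 = HEAD + [("HEB1", "he"), (" ", "he"), ("HEB2", "he"), (" ", "he")] + TAIL
LINE3 = HEAD + [("HEB1", "he"), (" ", "en"), ("HEB2", "he"), (" ", "en")] + TAIL
```

Under this rule all three lines resolve to the same sequence, which corresponds to the gui showing them identically; a latex exporter following s1 would have to apply the same kind of pass before writing its output.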

However, I've been arguing all along that since we *do* have explicit language commands, then we should use that information. That's why I'm claiming that the gui is actually wrong, and 1.5's output is correct.

s2. If we accept that we should be using language information even for the whitespace, then the gui will have to be changed to reflect this behavior (which probably means playing around a bit with the bidi algorithm, something I've been trying to avoid up to now). Also --- and this would be much easier --- the whitespace should be marked as belonging to the foreign language (with the blue underline) --- just that would already go a long way towards making it clearer to the user that there are differences between the three lines.

s3. Again, if we accept that 1.5 is doing the right thing, perhaps the conversion from 1.3 to 1.5 should somehow take this into account, and in the cases of l2 (which 1.3 has been outputting *in*correctly, although the output is actually what the user meant), the conversion should somehow fix the LyX file itself so that it will be output the way 1.3 would have output it, even though 1.5 is now doing the right thing by outputting it as depicted above.

Just one more note: this is not the most important issue, so it's not really crucial. However, 1.5's behavior is not good in that one might inadvertently type a space before or after switching the language, not see anything on the screen to indicate this, but get output which is not what was meant, and it becomes very hard to track the problem down. So perhaps the easiest solution is just to go back to 1.3 behavior. The gui should still be changed for case l3, but that's a separate issue, and it's not as common for something like that to happen by mistake.

On one level, I'm just asking to hear what people here think is the correct way to solve this. I'm going offline now, I'll be back, I hope, tomorrow evening after dark.

Dov

more_heb_problems.lyx
Description: application/lyx
