Just FYI: Bruce Rosen contacted me off-list with some questions, as he'd like to have a try at getting the 'rl-tb' writing mode to work correctly (for Hebrew, at least). To increase the number of eyes/heads that can weigh in, I'm forwarding the follow-up to [EMAIL PROTECTED]

Bruce, in case you aren't subscribed to fop-dev@ yet and you're still interested in contributing, the best thing would be to subscribe and continue the discussion there, so that your questions reach all fop-devs (not only me).

Just to round up what Bruce has discovered:
a) inherent directionality (Unicode BIDI embedding) is not implemented (no surprise here; known limitation)
b) explicit specification of the writing-mode on the simple-page-master or a surrounding fo:block-container does /something/, but has an interesting side-effect: FOP really reverses the whole block (as in: rotates/mirrors). This is a consequence of applying the coordinate transformation matrix (CTM), which is necessary to interpret the coordinates correctly for absolutely positioned objects.

a) will definitely require some non-trivial changes to the code. For uninterrupted runs of text in the same writing-mode, there is no problem: the line-breaking algorithm can be used as-is, and only the boxes in the lines need to be rendered in reverse order (note: not enough for Arabic, as we'll also need support for inner-word ligatures...). As soon as writing-modes are mixed, though, we'll need changes to the line-layout algorithm. (Reversal of embedded element-lists with different writing-modes, or something in that direction?)
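To make 'rendered in reverse order' a bit more concrete, here is a minimal sketch of the arithmetic only. The class RtlLineSketch and its method are hypothetical and not part of FOP; the sketch assumes the line has already been broken and that each box is known only by its width.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not FOP code: places the boxes of one line right-to-left. */
final class RtlLineSketch {

    /**
     * Returns the x-offset of the left edge of each box, in logical order,
     * when the first logical box sits flush with the right edge of the line.
     */
    static List<Integer> startOffsets(int lineWidth, List<Integer> boxWidths) {
        List<Integer> offsets = new ArrayList<>();
        int cursor = lineWidth;          // start at the right edge of the line
        for (int width : boxWidths) {
            cursor -= width;             // step left by the width of this box
            offsets.add(cursor);         // left edge of the box just placed
        }
        return offsets;
    }
}
```

For a line of width 100 with boxes of widths 30, 10 and 40, this gives the offsets 70, 60 and 20. The breaks themselves are untouched; only the placement changes, which is why this is not enough once writing-modes are mixed within a line.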

b) not sure... It can easily be reproduced with simple western characters: the text 'test case' in writing-mode 'rl-tb' should be rendered as 'esac tset'. What FOP Trunk currently does is rotate the entire page/region/block-container by 180 degrees, along the center axis, perpendicular to the page, so that we get the right order of characters/words, but also in 'mirror writing'.
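As a reference for what the expected result looks like, a small sketch using only the JDK. VisualOrderSketch is a hypothetical name, and the sketch only covers the simple case of a single run without combining marks or mixed directions.

```java
/** Hypothetical helper, not FOP code: expected visual order of a pure 'rl-tb' run. */
final class VisualOrderSketch {

    /**
     * Reverses the logical order of the text. Only the order changes;
     * the characters themselves are not mirrored.
     */
    static String visualOrder(String logical) {
        // StringBuilder.reverse() keeps surrogate pairs together.
        return new StringBuilder(logical).reverse().toString();
    }
}
```

VisualOrderSketch.visualOrder("test case") returns "esac tset". The mirroring seen in Trunk cannot be expressed at this level at all, since it affects the glyph shapes rather than their order.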

My initial idea went in the direction of checking in the TextLM, when the areas are added, what the governing writing-mode is, and if necessary, 'undo' the rotation on the level of the individual boxes. The problem with that approach is that the rotation then actually has to be undone on the level of the individual characters, and FOP optimizes layout and rendering precisely by generating combined boxes for whole words or hyphenated word-fragments. Moreover, at the time we compose those boxes (TextLM.getNextKnuthElements()), we can currently not be sure what the governing writing-mode will be, unless it was specified on an ancestor block-container. In case of the simple-page-master or region-body, ATM we can not be sure whether the writing-mode that is available in the LayoutContext when the boxes are generated, will be the one that is actually used later during page-breaking...

Pages with alternating writing-modes seem to be an issue that can only be solved by interleaved line- and page-breaking, where the context is guaranteed to always refer to the correct page-master at any point.

Bruce, I definitely do not mean to discourage you, but after looking closer, it does not really seem like a simple fix... :/


Cheers

Andreas

Begin forwarded message:

From: Andreas Delmelle <[EMAIL PROTECTED]>
Date: October 30, 2008 00:58:45 GMT+01:00
To: Bruce Rosen <[EMAIL PROTECTED]>
Subject: Re: FOP with Hebrew/Arabic

On Oct 29, 2008, at 23:24, Bruce Rosen wrote:

Hi Bruce

I am planning to have a look at this issue again ... and I could use some guidance
WRT "correct" XSL-FO behavior in the area of right-to-left scripts.

Good! Thanks for keeping FOP in mind.

<snip />
So, I really have two questions:

1. According to correct XSL-FO behavior, should it be necessary to specify the writing-mode?

Yes, but using a block-container is not always necessary, as writing-mode can also be specified on the simple-page-master or the region-body. For what the Recommendation has to say about it, some pointers at: http://www.w3.org/TR/xsl/#d0e4879

Very roughly: each Unicode codepoint has an inherent 'directionality'. Since the numerals are LEFT-TO-RIGHT codepoints, those parts are typeset that way, and only the parts consisting of specific Hebrew, RIGHT-TO-LEFT codepoints are reversed. Note that in XSL-FO, writing-mode has an initial value of 'lr-tb', if not explicitly specified (see: http://www.w3.org/TR/xsl/#writing-mode), so the RIGHT-TO-LEFT text will be embedded in the LEFT-TO-RIGHT block/paragraph.
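To see the inherent directionality for yourself, the JDK exposes it through Character.getDirectionality(). A minimal sketch follows; the class DirectionalitySketch and its short labels are only illustrative and not part of FOP.

```java
/** Hypothetical helper, not FOP code: reports the Unicode bidi class of a code point. */
final class DirectionalitySketch {

    /** Maps a few of the JDK's directionality constants to short labels. */
    static String describe(int codePoint) {
        switch (Character.getDirectionality(codePoint)) {
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
                return "L";   // e.g. Latin letters
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
                return "R";   // e.g. Hebrew letters
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:
                return "AL";  // e.g. Arabic letters
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER:
                return "EN";  // e.g. the digits 0-9
            default:
                return "other";
        }
    }
}
```

describe('a') gives "L", describe(0x05D0) (HEBREW LETTER ALEF) gives "R", and describe('1') gives "EN". Strictly speaking the digits are thus classified as 'European Number' rather than as strongly left-to-right, but within a left-to-right paragraph they are still laid out left to right, which matches the rough description above.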



XEP is correct in both cases.

2. I really would like to take a crack at patching this, at least so Hebrew works for PDF. Can you
point me to where in the FOP source tree I should be looking?

If I were to go for it, I'd start by debugging a small sample, placing some breakpoints in org.apache.fop.render.pdf.PDFRenderer and AbstractPathOrientedRenderer, but I realize that that's only very little to go on... I'll see if I can make time for some tests of my own this weekend. If not to fix it, then at least to give you something a bit more concrete as a starting point.

Weird that explicitly switching the writing-mode leads to a rotation/mirroring of the lines, rather than a much simpler reversal...


Cheers

Andreas
