Fwd: FOP with Hebrew/Arabic

Andreas Delmelle Fri, 31 Oct 2008 10:53:46 -0700

Just FYI: Bruce Rosen contacted me off-list, and had some questions, as he'd like to have a try at getting 'rl-tb' writing mode to work correctly (for Hebrew, at least). To increase the number of eyes/heads that can weigh in, I'm forwarding the follow-up to [EMAIL PROTECTED]

Bruce, in case you aren't subscribed to fop-dev@ yet, and you're still interested in contributing, the best would be to subscribe, and continue the discussion there, as your questions will reach all fop-devs (not only me).


Just to round up what Bruce has discovered:

a) inherent directionality (Unicode BIDI embedding) is not implemented (no surprise here; known limitation)

b) explicit specification of the writing-mode on the simple-page-master or a surrounding fo:block-container does /something/, but has an interesting side-effect: FOP really reverses the whole block (as in: rotates/mirrors). That is a consequence of applying the co-ordinate transformation matrix (CTM), which is necessary to interpret the co-ordinates right for absolute-positioned objects.

a) will definitely require some non-trivial changes to the code. For uninterrupted runs of text in the same writing-mode, there is no problem. The line-breaking algorithm can be used as-is. Only, the boxes in the lines should be rendered in reverse order (note: not enough for Arabic, as we'll also need support for inner-word ligatures...). As soon as writing-modes are mixed, though, we'll need changes to the line-layout algorithm. (reversal of embedded element-lists with different writing-modes, or something in that direction?)
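To make the "boxes in reverse order" idea concrete, here is a minimal sketch of placing the boxes of one already-broken line from the right edge instead of the left. It is not FOP code: the class and method names are made up for illustration, and it assumes the line consists of plain fixed-width boxes in logical order, ignoring glue, justification and anything Arabic shaping would need.

```java
final class LineBoxOrderSketch {

    /**
     * Hypothetical helper: returns the left-edge x offset of every box on a line
     * laid out right-to-left. The widths are given in logical (reading) order;
     * the first box ends up flush with the right edge of the line.
     */
    static double[] leftEdgesRightToLeft(double[] widths, double lineWidth) {
        double[] leftEdges = new double[widths.length];
        double cursor = lineWidth; // start at the right edge of the line
        for (int i = 0; i < widths.length; i++) {
            cursor -= widths[i];   // move left by the width of this box
            leftEdges[i] = cursor; // the box occupies [cursor, cursor + width)
        }
        return leftEdges;
    }

    public static void main(String[] args) {
        double[] edges = leftEdgesRightToLeft(new double[] {30, 20, 10}, 100);
        for (double edge : edges) {
            System.out.println(edge);
        }
    }
}
```

The line-breaking result itself is untouched here; only the placement step changes, which is why this only covers uninterrupted runs in a single writing-mode.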

b) not sure... It can easily be reproduced with simple western characters: the text 'test case' in writing-mode 'rl-tb' should be rendered as 'esac tset'. What FOP Trunk currently does is rotate the entire page/region/block-container by 180 degrees, along the center axis, perpendicular to the page, so that we get the right order of characters/words, but also in 'mirror writing'.
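For illustration only, the difference between the two operations can be sketched with plain JDK classes. This is not how FOP builds its CTM; the class and method names are invented, and the 100 x 50 area is an arbitrary example. The first method only reverses the character order, the second applies a half-turn around the centre of an area to a single point.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

final class RotationVersusReversalSketch {

    /** Reverses only the order of the characters; each glyph stays upright. */
    static String reverseOrder(String text) {
        return new StringBuilder(text).reverse().toString();
    }

    /** Rotates a point by 180 degrees around the centre of a width x height area. */
    static Point2D rotateHalfTurn(double x, double y, double width, double height) {
        AffineTransform ctm =
            AffineTransform.getRotateInstance(Math.PI, width / 2.0, height / 2.0);
        return ctm.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        System.out.println(reverseOrder("test case"));
        Point2D p = rotateHalfTurn(10, 20, 100, 50);
        System.out.printf("(10, 20) -> (%.1f, %.1f)%n", p.getX(), p.getY());
    }
}
```

The half-turn maps x to (width - x), which explains why the horizontal order comes out reversed, but it also maps y to (height - y), so everything inside the area is transformed along with its position. A pure reversal changes positions only.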

My initial idea went in the direction of checking in the TextLM, when the areas are added, what the governing writing-mode is, and if necessary, 'undo' the rotation on the level of the individual boxes. The problem with that approach is that the rotation then actually has to be undone on the level of the individual characters, and FOP optimizes layout and rendering precisely by generating combined boxes for whole words or hyphenated word-fragments.

Moreover, at the time we compose those boxes (TextLM.getNextKnuthElements()), we can currently not be sure what the governing writing-mode will be, unless it was specified on an ancestor block-container. In case of the simple-page-master or region-body, ATM we can not be sure whether the writing-mode that is available in the LayoutContext when the boxes are generated, will be the one that is actually used later during page-breaking...

Pages with alternating writing-modes seem an issue that can only be solved by interleaved line- and page-breaking, where the context is guaranteed to always refer to the correct page-master at any point.

Bruce, I definitely do not mean to discourage you, but after looking closer, it does not really seem like a simple fix... :/



Cheers

Andreas

Begin forwarded message:

From: Andreas Delmelle <[EMAIL PROTECTED]>
Date: October 30, 2008 00:58:45 GMT+01:00
To: Bruce Rosen <[EMAIL PROTECTED]>
Subject: Re: FOP with Hebrew/Arabic

On Oct 29, 2008, at 23:24, Bruce Rosen wrote:

Hi Bruce
> I am planning to have a look at this issue again ... and I could use some guidance
> WRT "correct" XSL-FO behavior in the area of right-to-left scripts.
Good! Thanks for keeping FOP in mind.
<snip />
> So, I really have two questions:
> 1. According to correct XSL-FO behavior, should it be necessary to specify the writing-mode?
Yes, but using a block-container is not always necessary, as writing-mode can also be specified on the simple-page-master or the region-body. For what the Recommendation has to say about it, some pointers at: http://www.w3.org/TR/xsl/#d0e4879
Very roughly: each Unicode codepoint has an inherent 'directionality'. Since the numerals are LEFT-TO-RIGHT codepoints, those parts are typeset that way, and only the parts consisting of specific Hebrew, RIGHT-TO-LEFT codepoints are reversed. Note that in XSL-FO, writing-mode has an initial value of 'lr-tb', if not explicitly specified (see: http://www.w3.org/TR/xsl/#writing-mode), so the RIGHT-TO-LEFT text will be embedded in the LEFT-TO-RIGHT block/paragraph.
XEP is correct in both cases.
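As a rough illustration of inherent directionality, the JDK already exposes the per-character classes (Character.getDirectionality) and a run resolver (java.text.Bidi). The sketch below is not what FOP or XEP do internally; the class and method names are made up, and the sample string is simply "abc" followed by three Hebrew letters, resolved inside a left-to-right paragraph as in the default 'lr-tb' case. Note that the JDK reports ASCII digits with the weak "European number" class rather than a strong left-to-right class; they are still displayed left to right.

```java
import java.text.Bidi;

final class BidiRunsSketch {

    /** True if the character has a strong right-to-left class (Hebrew or Arabic letters). */
    static boolean isStrongRightToLeft(char c) {
        byte d = Character.getDirectionality(c);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
            || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    /** Lists the directional runs of the text when embedded in a left-to-right paragraph. */
    static String describeRuns(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            // Odd embedding levels are right-to-left, even levels are left-to-right.
            String dir = (bidi.getRunLevel(i) % 2 == 1) ? "RTL" : "LTR";
            out.append(dir)
               .append('[').append(bidi.getRunStart(i))
               .append(',').append(bidi.getRunLimit(i)).append(") ");
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // "abc" followed by the Hebrew letters alef, bet, gimel.
        System.out.println(describeRuns("abc \u05D0\u05D1\u05D2"));
        System.out.println(isStrongRightToLeft('\u05D0'));
    }
}
```

Only the run with the odd level would need its boxes reversed; the surrounding left-to-right text keeps its order.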
> 2. I really would like to take a crack at patching this, at least so Hebrew works for PDF. Can you
> point me to where in the FOP source tree I should be looking?
If I were to go for it, I'd start by debugging a small sample, placing some breakpoints in org.apache.fop.render.pdf.PDFRenderer and AbstractPathOrientedRenderer, but I realize that that's only very little to go on... I'll see if I can make time for some tests of my own this weekend. If not to fix it, then at least to give you something a bit more concrete as a starting point.
Weird that explicitly switching the writing-mode leads to a rotation/mirroring of the lines, rather than a much simpler reversal...
Cheers

Andreas
