Just FYI: Bruce Rosen contacted me off-list with some questions, as
he'd like to have a try at getting the 'rl-tb' writing-mode to work
correctly (for Hebrew, at least). To get more eyes on this, I'm
forwarding the follow-up to [EMAIL PROTECTED]
Bruce, in case you aren't subscribed to fop-dev@ yet and you're still
interested in contributing, it would be best to subscribe and
continue the discussion there, as your questions will then reach all
FOP devs (not only me).
To sum up what Bruce has discovered:
a) inherent directionality (Unicode BIDI embedding) is not
implemented (no surprise here; known limitation)
b) explicit specification of the writing-mode on the
simple-page-master or on a surrounding fo:block-container does
/something/, but has an interesting side-effect: FOP really reverses
the whole block (as in: rotates/mirrors it). This is a consequence of
applying the coordinate transformation matrix (CTM), which is
necessary to interpret the coordinates correctly for
absolute-positioned objects.
a) will definitely require some non-trivial changes to the code. For
uninterrupted runs of text in the same writing-mode, there is no
problem: the line-breaking algorithm can be used as-is, and only the
boxes in each line would need to be rendered in reverse order (note:
that is not enough for Arabic, as we'll also need support for
inner-word ligatures...). As soon as writing-modes are mixed, though,
we'll need changes to the line-layout algorithm (reversal of embedded
element lists with different writing-modes, or something in that
direction?).
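The simple case (one uninterrupted rl-tb run) amounts to reversing the box order in a line; the mixed case resembles rule L2 of the Unicode Bidirectional Algorithm (UAX #9): from the highest embedding level down to the lowest odd level, reverse every contiguous run at that level or higher. A minimal sketch, assuming each box has already been assigned an embedding level (even = left-to-right, odd = right-to-left); the `Box` record and `reorder` method are hypothetical and not part of FOP's API. Because it works on whole boxes, it ignores the per-character issues (e.g. Arabic ligatures) mentioned above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BidiReorderSketch {

    /** Hypothetical stand-in for an inline box in a line; level is its bidi embedding level. */
    public record Box(String text, int level) {}

    /**
     * Rule L2 of UAX #9, applied to boxes: from the highest level down to the
     * lowest odd level, reverse every maximal run of boxes at that level or higher.
     */
    public static List<Box> reorder(List<Box> logical) {
        List<Box> line = new ArrayList<>(logical);
        int highest = 0;
        int lowestOdd = Integer.MAX_VALUE;
        for (Box b : line) {
            highest = Math.max(highest, b.level());
            if (b.level() % 2 == 1) {
                lowestOdd = Math.min(lowestOdd, b.level());
            }
        }
        // If there is no odd level, lowestOdd stays MAX_VALUE and nothing is reversed.
        for (int level = highest; level >= lowestOdd; level--) {
            int i = 0;
            while (i < line.size()) {
                if (line.get(i).level() >= level) {
                    int start = i;
                    while (i < line.size() && line.get(i).level() >= level) {
                        i++;
                    }
                    Collections.reverse(line.subList(start, i)); // reverse the run [start, i)
                } else {
                    i++;
                }
            }
        }
        return line;
    }

    public static void main(String[] args) {
        // rl-tb paragraph (level 1) with an embedded left-to-right number (level 2);
        // A, B and C stand in for Hebrew words.
        List<Box> logical = List.of(
                new Box("A", 1), new Box("B", 1), new Box("12", 2), new Box("C", 1));
        StringBuilder visual = new StringBuilder();
        for (Box b : reorder(logical)) {
            visual.append(b.text()).append(' ');
        }
        System.out.println(visual.toString().trim()); // C 12 B A
    }
}
```

Note that the "12" box keeps its internal order only because it is treated as one atomic box; at character level, L2 would reverse it twice, with the same net result.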
b) not sure... It can easily be reproduced with simple western
characters: the text 'test case' in writing-mode 'rl-tb' should be
rendered as 'esac tset'. What FOP Trunk currently does is rotate the
entire page/region/block-container by 180 degrees around its vertical
center axis (effectively flipping it over), so that we get the right
order of characters/words, but in 'mirror writing'.
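A small sketch of the difference between mirroring and reversing, using `java.awt.geom.AffineTransform` as a stand-in for FOP's own CTM class. It assumes the rl-tb transform behaves like a horizontal flip x' = ipd - x, which is what 'mirror writing' implies. The flip has a negative determinant, so every glyph drawn through it is mirrored too, whereas the intended result only reorders characters: 'test case' becomes 'esac tset'. The class and method names are illustrative only.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public class MirrorVsReverse {

    /** Horizontal flip around the vertical center axis of a region of the given width. */
    public static AffineTransform mirror(double ipd) {
        // Matrix (m00, m10, m01, m11, m02, m12): x' = -x + ipd, y' = y
        return new AffineTransform(-1, 0, 0, 1, ipd, 0);
    }

    /** The desired rl-tb result for simple western text: reverse the character order only. */
    public static String reverseCharacters(String text) {
        return new StringBuilder(text).reverse().toString();
    }

    public static void main(String[] args) {
        AffineTransform flip = mirror(100);
        // A point 10 units from the left edge ends up 10 units from the right edge...
        Point2D p = flip.transform(new Point2D.Double(10, 0), null);
        System.out.println(p.getX()); // 90.0
        // ...but the negative determinant means glyph outlines are mirrored as well.
        System.out.println(flip.getDeterminant() < 0); // true
        System.out.println((flip.getType() & AffineTransform.TYPE_FLIP) != 0); // true
        System.out.println(reverseCharacters("test case")); // esac tset
    }
}
```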
My initial idea was to check in the TextLM, when the areas are added,
what the governing writing-mode is, and if necessary, 'undo' the
rotation at the level of the individual boxes. The problem with that
approach is that the rotation then actually has to be undone at the
level of the individual characters, and FOP optimizes layout and
rendering precisely by generating combined boxes for whole words or
hyphenated word-fragments.
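To illustrate why the box level is too coarse, here is a sketch that treats each word as one combined box. The class and methods are hypothetical. Undoing the mirror per box restores each box's own glyph order while leaving the boxes at their mirrored positions, which amounts to reversing only the box order and yields 'case test'. Getting 'esac tset' also requires reversing the characters inside each box.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BoxGranularity {

    /** Reverses only the order of the word boxes; each box keeps its internal character order. */
    public static String reverseBoxes(List<String> boxes) {
        List<String> copy = new ArrayList<>(boxes);
        Collections.reverse(copy);
        return String.join(" ", copy);
    }

    /** Reverses the boxes and the characters inside each box. */
    public static String reverseBoxesAndCharacters(List<String> boxes) {
        List<String> copy = new ArrayList<>();
        for (String box : boxes) {
            copy.add(new StringBuilder(box).reverse().toString());
        }
        Collections.reverse(copy);
        return String.join(" ", copy);
    }

    public static void main(String[] args) {
        List<String> words = List.of("test", "case");
        System.out.println(reverseBoxes(words)); // case test
        System.out.println(reverseBoxesAndCharacters(words)); // esac tset
    }
}
```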
Moreover, at the time we compose those boxes
(TextLM.getNextKnuthElements()), we currently cannot be sure what the
governing writing-mode will be, unless it was specified on an
ancestor block-container. If it is specified on the simple-page-master
or region-body, we currently cannot be sure that the writing-mode
available in the LayoutContext when the boxes are generated will be
the one that is actually used later during page-breaking...
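The uncertainty described above can be sketched as a lookup that walks up from the text node while the elements are generated. This is a hypothetical, heavily simplified model: the `Node` record, the `WritingMode` enum and the method name are assumptions for illustration, not FOP classes. An explicit writing-mode on an ancestor block-container is already final at that point. Once the walk reaches the flow, the answer depends on which page-master page-breaking will select later, so the result has to be "unknown".

```java
import java.util.Optional;

public class WritingModeLookup {

    /** Subset of XSL writing-mode values; hypothetical enum, not FOP's own constants. */
    public enum WritingMode { LR_TB, RL_TB }

    /** Hypothetical FO tree node: a name, an explicit writing-mode (or null) and a parent. */
    public record Node(String name, WritingMode explicitWritingMode, Node parent) {}

    /** Returns the writing-mode if it is already determined while the elements are generated. */
    public static Optional<WritingMode> resolveDuringElementGeneration(Node node) {
        for (Node n = node; n != null; n = n.parent()) {
            if (n.name().equals("fo:block-container") && n.explicitWritingMode() != null) {
                return Optional.of(n.explicitWritingMode());
            }
            if (n.name().equals("fo:flow")) {
                // Would come from the page-master, which is not chosen yet.
                return Optional.empty();
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Node flow = new Node("fo:flow", null, null);
        Node container = new Node("fo:block-container", WritingMode.RL_TB, flow);
        Node insideContainer = new Node("fo:block", null, container);
        Node directlyInFlow = new Node("fo:block", null, flow);
        System.out.println(resolveDuringElementGeneration(insideContainer)); // Optional[RL_TB]
        System.out.println(resolveDuringElementGeneration(directlyInFlow)); // Optional.empty
    }
}
```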
Pages with alternating writing-modes seem to be an issue that can
only be solved by interleaved line- and page-breaking, where the
context is guaranteed to refer to the correct page-master at any
point.
Bruce, I definitely do not mean to discourage you, but after looking
closer, it does not really seem like a simple fix... :/
Cheers
Andreas
Begin forwarded message:
From: Andreas Delmelle <[EMAIL PROTECTED]>
Date: October 30, 2008 00:58:45 GMT+01:00
To: Bruce Rosen <[EMAIL PROTECTED]>
Subject: Re: FOP with Hebrew/Arabic
On Oct 29, 2008, at 23:24, Bruce Rosen wrote:
Hi Bruce
> I am planning to have a look at this issue again ... and I could
> use some guidance
> WRT "correct" XSL-FO behavior in the area of right-to-left scripts.
Good! Thanks for keeping FOP in mind.
<snip />
> So, I really have two questions:
> 1. According to correct XSL-FO behavior, should it be necessary to
> specify the writing-mode?
Yes, but using a block-container is not always necessary, as
writing-mode can also be specified on the simple-page-master or the
region-body.
For what the Recommendation has to say about it, some pointers at:
http://www.w3.org/TR/xsl/#d0e4879
Very roughly: each Unicode codepoint has an inherent
'directionality'. Since the numerals are LEFT-TO-RIGHT codepoints,
those parts are typeset that way, and only the parts consisting of
specific Hebrew, RIGHT-TO-LEFT codepoints are reversed. Note that
in XSL-FO, writing-mode has an initial value of 'lr-tb', if not
explicitly specified (see: http://www.w3.org/TR/xsl/#writing-mode),
so the RIGHT-TO-LEFT text will be embedded in the LEFT-TO-RIGHT
block/paragraph.
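The behaviour described above can be observed with the JDK's own bidi support, which may be handy for quick experiments before touching FOP. A minimal sketch using `Character.getDirectionality` and `java.text.Bidi` (standard JDK classes, not part of FOP), with the paragraph's base direction set to left-to-right to match the 'lr-tb' initial value. Strictly speaking, ASCII digits have the weak bidi class EN (European Number) rather than L, but they still read left-to-right inside a right-to-left run, so the effect is as described.

```java
import java.text.Bidi;

public class DirectionalityDemo {

    /** Short names for the few Unicode bidi classes that matter for this example. */
    public static String bidiClass(char c) {
        switch (Character.getDirectionality(c)) {
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
                return "L";
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
                return "R";
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER:
                return "EN";
            case Character.DIRECTIONALITY_WHITESPACE:
                return "WS";
            default:
                return "other";
        }
    }

    /** Visual order of a single-line paragraph whose base direction is left-to-right. */
    public static String visualOrder(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
        byte[] levels = new byte[text.length()];
        Character[] chars = new Character[text.length()];
        for (int i = 0; i < text.length(); i++) {
            levels[i] = (byte) bidi.getLevelAt(i);
            chars[i] = text.charAt(i);
        }
        // Reorders the characters according to their resolved embedding levels
        Bidi.reorderVisually(levels, 0, chars, 0, chars.length);
        StringBuilder sb = new StringBuilder();
        for (Character c : chars) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "abc \u05D0\u05D1 def"; // Latin, Hebrew alef + bet, Latin
        for (char c : text.toCharArray()) {
            System.out.print(bidiClass(c) + " ");
        }
        System.out.println();
        System.out.println(visualOrder(text)); // the Hebrew letters come out as bet, alef
    }
}
```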
XEP is correct in both cases.
> 2. I really would like to take a crack at patching this, at least
> so Hebrew works for PDF. Can you
> point me to where in the FOP source tree I should be looking?
If I were to go for it, I'd start by debugging a small sample,
placing some breakpoints in org.apache.fop.render.pdf.PDFRenderer
and AbstractPathOrientedRenderer, but I realize that's not much to
go on... I'll see if I can make time for some tests
of my own this weekend. If not to fix it, then at least to give you
something a bit more concrete as a starting point.
It's weird that explicitly switching the writing-mode leads to a
rotation/mirroring of the lines rather than a much simpler
reversal...
Cheers
Andreas