On Mar 8, 2007, at 20:15, Andreas L Delmelle wrote:
Hi Vincent,
On Mar 8, 2007, at 18:29, Vincent Hennebert wrote:
<snip />
Now, does that picture represent the current code?
For the largest part, yes, I think so.
The only difference is the addArea() part, which currently goes top-
down. It is not the Position that notifies its LM that the areas
must be added, but the LM that iterates over the Positions and adds
their corresponding areas.
I just felt like elaborating a bit, maybe putting the whole thing in
perspective, and offering a glimpse of a few of the current
pain-points. So here's a rough sketch of what happens inside FOP when
an FO document gets processed, with some pointers into the code. Maybe
it saves you and other interested subscribers a bit of time when
trying to get the bigger picture. I know it took me long enough... ;)
The XSL (identity) transform results in the event
FOTreeBuilder.MainFOHandler.startElement(
"http://www.w3.org/1999/XSL/Format",
"root",
"fo:root", attributes);
This event occurs for every FO node, starting with the fo:root.
Basically, if you follow that method's logic, what is done is no more
than:
-> create a FONode of the right type
-> create a PropertyList from the Attributes
-> bind the PropertyList to the FONode
(= transfer the applicable properties for that particular node-type
to instance members of the FONode; the PropertyList itself is
only stored by the FOTreeBuilder to use as parent PropertyList
for the PropertyLists of the FONode's children)
At the end of startElement(), the processed FONode itself is added to
the list of childNodes of the MainFOHandler's currentFObj, if all
basic validation passed. Subsequently, the node's startOfNode()
method is called, which in most cases is no more than a mapping to:
FOEventHandler.startXXX()
[Note that, IIRC, in the current Trunk code, FONode.startOfNode() is
called /before/ the child has been added to the parent. I think it
makes more sense to reverse this order... My local sandbox reflects
the above description. In the current overall design, however, this
makes only little difference: see further on.]
When a node has no children, or all of its children have been
processed, we receive a MainFOHandler.endElement(), where
FONode.endOfNode() is called, which maps to:
FOEventHandler.endXXX()
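To make that event flow a bit more concrete, here is a minimal sketch
in Java. All names in it (FoTreeSketch, Node, Builder, EventHandler)
are made up for illustration and are not FOP's real classes or
signatures. The "property list" is reduced to a plain copy of the
attributes, and validation is left out entirely. It follows the order
described above: the child is added to its parent first, then the
start event is fired.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified stand-ins; NOT FOP's actual classes.
final class FoTreeSketch {

    /** Plays the role of the startXXX()/endXXX() callbacks. */
    interface EventHandler {
        void start(String name);
        void end(String name);
    }

    /** A tree node holding its bound properties and its children. */
    static final class Node {
        final String name;
        final Map<String, String> properties;
        final List<Node> children = new ArrayList<>();

        Node(String name, Map<String, String> properties) {
            this.name = name;
            this.properties = properties;
        }
    }

    /** Builds the tree from start/end events and notifies the handler. */
    static final class Builder {
        private final EventHandler handler;
        private final Deque<Node> open = new ArrayDeque<>();
        private Node root;

        Builder(EventHandler handler) {
            this.handler = handler;
        }

        void startElement(String name, Map<String, String> attributes) {
            // Create the node and bind a copy of its attributes to it.
            Node node = new Node(name, new LinkedHashMap<>(attributes));
            Node parent = open.peek();
            if (parent == null) {
                root = node;
            } else {
                parent.children.add(node); // add to the current parent first...
            }
            open.push(node);
            handler.start(name);           // ...then fire the start event
        }

        void endElement() {
            // Called once all children of the current node have been processed.
            Node node = open.pop();
            handler.end(node.name);
        }

        Node root() {
            return root;
        }
    }

    /** Feeds a tiny document through the builder and records the events. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Builder builder = new Builder(new EventHandler() {
            @Override public void start(String name) { log.add("start " + name); }
            @Override public void end(String name) { log.add("end " + name); }
        });
        builder.startElement("root", Map.of());
        builder.startElement("page-sequence", Map.of("master-reference", "A4"));
        builder.startElement("flow", Map.of("flow-name", "xsl-region-body"));
        builder.endElement(); // flow
        builder.endElement(); // page-sequence
        builder.endElement(); // root
        return log;
    }
}
```

Note that in this sketch the "end page-sequence" event only fires after
every descendant of the page-sequence has been created and attached.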
The FOEventHandler that is used for area-tree based rendering is, of
course, the AreaTreeHandler, which currently offers implementations
only for startPageSequence() and endPageSequence(). The latter is
where the fun starts (read: the layout-loop).
What this boils down to is the first pain-point:
The whole FO tree for an fo:page-sequence has to be built /before/
anything layout-related is triggered. In most cases this is relatively
unimportant, since the scenarios where the content absolutely /cannot/
be split into multiple logical sequences beforehand are rare.
However, in some of those cases (tables that would span hundreds of
pages, for instance) the amount of heap space required to build the
corresponding tree can exceed what the JVM has available. In
environments where such a document is one amongst other concurrently
processed documents, it will lead to an OutOfMemoryError *long* before
the end of the page-sequence is reached. And since the heap is shared
between threads, either the error will also occur in those other
threads, or they will be left with very limited heap space to work
with, resulting in much slower processing.
OK, if you then follow the trail starting at
AreaTreeHandler.endPageSequence(), we almost immediately arrive at
another pain-point, to which the problem of changing available ipd is
related:
PageSequenceLM.activateLayout()
...
-> AbstractBreaker.doLayout()
...
-> PageBreaker.getNextBlockList()
(first) page provided
-> AbstractBreaker.getNextBlockList()
...
-> PageBreaker.getNextKnuthElements()
-> FlowLayoutManager.getNextKnuthElements()
Now, as I recall from a few debug sessions in that area, the latter
method-name is actually a bit misleading, as it gets called only once
per fo:flow, and triggers (line-)layout for all descendants of the
flow, based on a LayoutContext that has the available ipd of the
region-body of the first page in the sequence. The page-break
computation is started only after the entire fo:flow has been broken
into lines. The PageBreaker operates completely upon the single
element list returned by the main flow, and that element list has
been constructed under the (possibly false) assumption that the ipd
would be the same for all pages.
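A toy illustration of why that assumption hurts, in Java. This is
only a sketch under loud assumptions: it uses a simple greedy line
filler measured in characters, not the Knuth element lists FOP
actually works with, and the names (IpdAssumptionSketch,
breakIntoLines, overflowingLines) are invented for the example. Lines
are broken once, for a single ipd, and only afterwards distributed
over pages that may be narrower.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; greedy filling stands in for the real line breaker.
final class IpdAssumptionSketch {

    /** Breaks text into lines of at most ipd characters (greedy, word-based). */
    static List<String> breakIntoLines(String text, int ipd) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            // Start a new line if the word (plus a space) no longer fits.
            if (line.length() > 0 && line.length() + 1 + word.length() > ipd) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    /**
     * Distributes already-broken lines over pages (linesPerPage each) and
     * counts the lines that are wider than the ipd of the page they land on.
     * Pages beyond the end of pageIpds reuse the last entry.
     */
    static int overflowingLines(List<String> lines, int[] pageIpds, int linesPerPage) {
        int overflow = 0;
        for (int i = 0; i < lines.size(); i++) {
            int page = Math.min(i / linesPerPage, pageIpds.length - 1);
            if (lines.get(i).length() > pageIpds[page]) {
                overflow++;
            }
        }
        return overflow;
    }
}
```

With the lines broken for the first page's ipd, every line that ends up
on a narrower page is simply too wide, and the page breaker has no way
to go back and ask for it to be re-broken.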
This design is also partly related to the above (memory consumption
of the fo tree), precisely because the LM creation loop is currently
tied in to the getNextKnuthElements() loop.
Each LM uses getChildLM() to obtain a reference to its 'next' child
LM, and that method ultimately maps to a next() on an iterator over
the list of children of the FO that is associated with the LM. The
iterator that is used to do this is a standard ListIterator
iterating over an ArrayList, obtained via FONode.getChildNodes().
This means that the underlying list /cannot/ be altered after the
iterator has been created (other than through one of /that/
particular iterator's methods). Since this iterator is obtained when
the LM is constructed, it is impossible to instantiate a
FlowLayoutManager before the corresponding Flow's list of child
nodes is complete. A call to Flow.addChildNode() after the LM has
been created would lead to a ConcurrentModificationException when the
LM issues the next getChildLM()...
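The fail-fast behaviour itself is plain java.util and easy to
reproduce outside FOP. In the sketch below, the class and method names
(ChildIteratorSketch and friends) are invented, and a list of strings
stands in for the FO's child nodes; only ArrayList and ListIterator
are the real thing.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.ListIterator;

// Hypothetical stand-in: strings play the role of child FONodes.
final class ChildIteratorSketch {

    /** Returns true if next() fails after the list was changed behind the iterator's back. */
    static boolean failsAfterExternalAdd() {
        List<String> childNodes = new ArrayList<>(List.of("block-1", "block-2"));
        // Comparable to the LM grabbing its child iterator at construction time.
        ListIterator<String> children = childNodes.listIterator();
        children.next();
        // Comparable to adding a child node after the LM already exists.
        childNodes.add("block-3");
        try {
            children.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Adding through the iterator itself is allowed and keeps it usable. */
    static List<String> addThroughIterator() {
        List<String> childNodes = new ArrayList<>(List.of("block-1", "block-2"));
        ListIterator<String> children = childNodes.listIterator();
        children.next();            // cursor now sits after "block-1"
        children.add("block-1a");   // inserted at the cursor position
        children.next();            // still fine: returns "block-2"
        return childNodes;
    }
}
```

So any scheme that wants to start layout while the FO tree is still
growing would need either a different kind of iterator or a way to add
the new children through the iterator the LM already holds.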
The rest of the story you already seem to understand (and some parts
even better than the rest of us, it seems at times ;)). Once the page-
break positions have been computed, the PageSequenceLM starts the
addAreas() chain, passing the PositionIterator down to the descendant
LMs.
The area tree that results from all this is then handed off to
the renderer, which basically translates the area tree structure into
PDF, PS, XML...
That's it --for now :)
Cheers,
Andreas