On Mar 8, 2007, at 20:15, Andreas L Delmelle wrote:

Hi Vincent,

On Mar 8, 2007, at 18:29, Vincent Hennebert wrote:

<snip />

Now, does that picture represent the current code?

For the largest part, yes, I think so.
The only difference is the addArea() part, which currently goes top-down. It is not the Position that notifies its LM that the areas must be added, but the LM that iterates over the Positions and adds their corresponding areas.

Just felt like elaborating a bit, maybe putting the whole thing in perspective, and offering a glimpse of a few of the current pain-points, so here's a rough sketch of what happens inside FOP when a FO document gets processed, with some pointers into the code. Maybe it saves you and other interested subscribers a bit of time when trying to get the bigger picture. I know it took me long enough... ;)

The XSL (identity) transform results in the event
FOTreeBuilder.MainFOHandler.startElement(
  "http://www.w3.org/1999/XSL/Format",
  "root",
  "fo:root", attributes);

This event occurs for every FO node, starting with the fo:root.
Basically, if you follow that method's logic, what is done is no more than:
-> create a FONode of the right type
-> create a PropertyList from the Attributes
-> bind the PropertyList to the FONode
    (= transfer the applicable properties for that particular node-type
       to instance members of the FONode; the PropertyList itself is
       only stored by the FOTreeBuilder to use as parent PropertyList
       for the PropertyLists of the FONode's children)
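
To make the three steps above a bit more concrete, here is a minimal sketch of such a SAX handler. All class names (SketchNode, SketchPropertyList, SketchTreeBuilder) are made up for illustration and are NOT the real FOP classes; the blanket fallback to the parent list is a crude assumption standing in for real property inheritance, and "font-size" is just an example property.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, simplified stand-ins; these are NOT the real FOP classes.
class SketchPropertyList {
    final SketchPropertyList parent;
    final Map<String, String> values = new HashMap<>();

    SketchPropertyList(SketchPropertyList parent, Attributes attrs) {
        this.parent = parent;
        for (int i = 0; i < attrs.getLength(); i++) {
            values.put(attrs.getQName(i), attrs.getValue(i));
        }
    }

    // Look up a value, falling back on the parent list (crude inheritance).
    String get(String name) {
        String v = values.get(name);
        return (v == null && parent != null) ? parent.get(name) : v;
    }
}

class SketchNode {
    final String name;
    final List<SketchNode> children = new ArrayList<>();
    String fontSize; // instance member filled in by bind()

    SketchNode(String name) { this.name = name; }

    // Copy only the properties this node type cares about.
    void bind(SketchPropertyList pList) { fontSize = pList.get("font-size"); }
}

class SketchTreeBuilder extends DefaultHandler {
    SketchNode root;
    private final Deque<SketchNode> nodes = new ArrayDeque<>();
    private final Deque<SketchPropertyList> lists = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        SketchNode node = new SketchNode(localName);                              // 1. create the node
        SketchPropertyList pList = new SketchPropertyList(lists.peek(), attrs);   // 2. create the list
        node.bind(pList);                                                         // 3. bind list to node
        if (nodes.isEmpty()) {
            root = node;
        } else {
            nodes.peek().children.add(node); // add to the current parent
        }
        nodes.push(node);
        lists.push(pList); // kept only as parent list for the children
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        nodes.pop();
        lists.pop();
    }

    public static void main(String[] args) {
        String fo = "http://www.w3.org/1999/XSL/Format";
        AttributesImpl rootAttrs = new AttributesImpl();
        rootAttrs.addAttribute("", "font-size", "font-size", "CDATA", "12pt");
        SketchTreeBuilder b = new SketchTreeBuilder();
        b.startElement(fo, "root", "fo:root", rootAttrs);
        b.startElement(fo, "block", "fo:block", new AttributesImpl());
        b.endElement(fo, "block", "fo:block");
        b.endElement(fo, "root", "fo:root");
        System.out.println(b.root.children.get(0).fontSize); // prints 12pt
    }
}
```

Note that the node only keeps the bound value; the property list itself lives on the builder's stack and disappears once the element ends.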

At the end of startElement(), the processed FONode itself is added to the list of childNodes of the MainFOHandler's currentFObj, if all basic validation passed. Subsequently, the node's startOfNode() method is called, which in most cases is no more than a mapping to:

FOEventHandler.startXXX()

[Note that, IIRC, in the current Trunk code, FONode.startOfNode() is called /before/ the child has been added to the parent. I think it makes more sense to reverse this order... My local sandbox reflects the above description. In the current overall design, however, this makes little difference: see further on.]

When a node has no children, or all of its children have been processed, we receive a MainFOHandler.endElement(), where FONode.endOfNode() is called, which maps to:

FOEventHandler.endXXX()

The FOEventHandler that is used for area-tree based rendering is of course the AreaTreeHandler, which currently offers implementations only for startPageSequence() and endPageSequence(). The latter is where the fun starts (read: the layout-loop).
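
A rough sketch of that event mapping, assuming a stripped-down handler with only two event pairs. The names (SketchEventHandler, SketchAreaTreeHandler, the node classes) are hypothetical and only mirror the shape described above, not FOP's actual signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; not FOP's actual classes.
abstract class SketchEventHandler {
    // Default implementations do nothing, so subclasses override only what they need.
    void startPageSequence(String id) { }
    void endPageSequence(String id) { }
    void startBlock(String id) { }
    void endBlock(String id) { }
}

class SketchAreaTreeHandler extends SketchEventHandler {
    final List<String> log = new ArrayList<>();

    @Override
    void startPageSequence(String id) { log.add("start " + id); }

    @Override
    void endPageSequence(String id) {
        // Only here is the subtree of the page-sequence complete,
        // so this is the earliest point where layout can be triggered.
        log.add("layout " + id);
    }
}

class SketchPageSequenceNode {
    final String id;
    SketchPageSequenceNode(String id) { this.id = id; }
    void startOfNode(SketchEventHandler h) { h.startPageSequence(id); }
    void endOfNode(SketchEventHandler h) { h.endPageSequence(id); }
}

class SketchBlockNode {
    final String id;
    SketchBlockNode(String id) { this.id = id; }
    void startOfNode(SketchEventHandler h) { h.startBlock(id); }
    void endOfNode(SketchEventHandler h) { h.endBlock(id); }
}

class SketchEventDemo {
    public static void main(String[] args) {
        SketchAreaTreeHandler handler = new SketchAreaTreeHandler();
        SketchPageSequenceNode ps = new SketchPageSequenceNode("ps1");
        SketchBlockNode block = new SketchBlockNode("b1");
        ps.startOfNode(handler);
        block.startOfNode(handler); // ignored by this handler
        block.endOfNode(handler);   // ignored by this handler
        ps.endOfNode(handler);
        System.out.println(handler.log); // prints [start ps1, layout ps1]
    }
}
```

The block events fall through to the empty defaults, which is the point: nothing layout-related happens until the page-sequence itself ends.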

What this boils down to is the first pain-point:
The whole FO tree for an fo:page-sequence has to be built /before/ anything layout-related is triggered. In most cases this is relatively unimportant, since the scenarios where the content absolutely /cannot/ be split into multiple logical sequences beforehand are rare. However, in some of those cases --tables that would span hundreds of pages-- the amount of heap space required to build the corresponding tree is simply too much for any JVM to bear. In environments where such a document is one amongst other concurrently processed documents, it will lead to an OutOfMemoryError *long* before the end of the page-sequence is reached. Since the heap is shared between threads, either the error will also occur in those other threads, or they will be left with very limited heap space to work with, resulting in much slower processing.

OK, if you then follow the trail starting at AreaTreeHandler.endPageSequence(), we almost immediately arrive at another pain-point, to which the problem of changing available ipd is related:

PageSequenceLM.activateLayout()
...
-> AbstractBreaker.doLayout()
   ...
   -> PageBreaker.getNextBlockList()
      (first) page provided
      -> AbstractBreaker.getNextBlockList()
         ...
         -> PageBreaker.getNextKnuthElements()
            -> FlowLayoutManager.getNextKnuthElements()

Now, as I recall from a few debug sessions in that area, the latter method-name is actually a bit misleading, as it gets called only once per fo:flow, and triggers (line-)layout for all descendants of the flow, based on a LayoutContext that has the available ipd of the region-body of the first page in the sequence. The page-break computation is started only after the entire fo:flow has been broken into lines. The PageBreaker operates completely upon the single element list returned by the main flow, and that element list has been constructed under the (possibly false) assumption that the ipd would be the same for all pages.
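
To illustrate why that assumption hurts, here is a toy sketch. It uses plain greedy line filling with widths counted in characters instead of Knuth elements, and all names are invented for the example, so it shows only the shape of the problem, not FOP's algorithm: the lines are built once for a single ipd, and pagination merely slices the finished list.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT FOP code: greedy line filling stands in for Knuth line breaking.
class SketchSingleIpd {
    // Break the text into lines for one fixed width, measured in characters.
    static List<String> breakIntoLines(String text, int ipd) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (line.length() > 0 && line.length() + 1 + word.length() > ipd) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    // Page breaking only slices the finished line list; it never re-breaks lines.
    static List<List<String>> paginate(List<String> lines, int linesPerPage) {
        List<List<String>> pages = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerPage) {
            int end = Math.min(i + linesPerPage, lines.size());
            pages.add(new ArrayList<>(lines.subList(i, end)));
        }
        return pages;
    }

    public static void main(String[] args) {
        int firstPageIpd = 7; // width of the first page's region-body
        List<String> lines = breakIntoLines("aaa bbb ccc ddd eee fff", firstPageIpd);
        List<List<String>> pages = paginate(lines, 2);
        // If page 2 were only 3 characters wide, its line would still be 7 wide.
        System.out.println(pages.get(1).get(0).length()); // prints 7
    }
}
```

Any page whose region-body is narrower (or wider) than the first one ends up with lines that were broken for the wrong width.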

This design is also partly related to the above (memory consumption of the fo tree), precisely because the LM creation loop is currently tied in to the getNextKnuthElements() loop. Each LM uses getChildLM() to obtain a reference to its 'next' child LM, and that method ultimately maps to a next() on an iterator over the list of children of the FO that is associated with the LM. The iterator that is used to do this, is a standard ListIterator iterating over an ArrayList, obtained via FONode.getChildNodes(). This means that the underlying list /cannot/ be altered after the iterator has been created (other than through one of /that/ particular iterator's methods). Since this iterator is obtained when the LM is constructed, this makes it impossible to instantiate a FlowLayoutManager without the corresponding Flow's list of child nodes being complete. A call to Flow.addChildNode() after the LM has been created would lead to a ConcurrentModificationException when the LM issues the following getChildLM()...
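
The iterator behaviour is easy to reproduce in isolation. The snippet below uses only java.util; the list contents and method names are placeholders, with the direct add() playing the role of a late Flow.addChildNode() and next() the role of the following getChildLM().

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.ListIterator;

// Stand-alone demonstration of the fail-fast iterator; no FOP classes involved.
class SketchFailFast {
    // Returns true if modifying the list behind the iterator's back makes next() fail.
    static boolean lateAddBreaksIterator() {
        List<String> childNodes = new ArrayList<>();
        childNodes.add("block-1");
        ListIterator<String> it = childNodes.listIterator(); // obtained "at LM construction"
        childNodes.add("block-2"); // late addition, bypassing the iterator
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Adding through the iterator itself is the one modification that is allowed.
    static List<String> addThroughIterator() {
        List<String> childNodes = new ArrayList<>();
        childNodes.add("block-1");
        ListIterator<String> it = childNodes.listIterator();
        it.next();
        it.add("block-2"); // inserted after block-1, iterator stays valid
        return childNodes;
    }

    public static void main(String[] args) {
        System.out.println(lateAddBreaksIterator()); // prints true
        System.out.println(addThroughIterator());    // prints [block-1, block-2]
    }
}
```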

The rest of the story you already seem to understand (and some parts even better than the rest of us, it seems at times ;)). Once the page-break positions have been computed, the PageSequenceLM starts the addAreas() chain, passing the PositionIterator down to the descendant LMs. The area tree that is the result of all this is then handed off to the renderer, which basically translates the area tree structure into PDF, PS, XML...


That's it --for now :)

Cheers,

Andreas
