Re: Position, Leaf/NonLeafPosition, wrapping positions

Andreas L Delmelle Fri, 09 Mar 2007 01:37:17 -0800

On Mar 8, 2007, at 20:15, Andreas L Delmelle wrote:

Hi Vincent,

On Mar 8, 2007, at 18:29, Vincent Hennebert wrote:
<snip />

Now, does that picture represent the current code?
For the largest part, yes, I think so.
The only difference is the addArea() part, which currently goes top-down. It is not the Position that notifies its LM that the areas must be added, but the LM that iterates over the Positions and adds their corresponding areas.

Just felt like elaborating a bit, maybe putting the whole thing in perspective, and offering a glimpse of a few of the current pain-points, so here's a rough sketch of what happens inside FOP when an FO document gets processed, with some pointers into the code. Maybe it saves you and other interested subscribers a bit of time when trying to get the bigger picture. I know it took me long enough... ;)


The XSL (identity) transform results in the event
FOTreeBuilder.MainFOHandler.startElement(
  "http://www.w3.org/1999/XSL/Format",
  "root",
  "fo:root", attributes);

This event occurs for every FO node, starting with the fo:root.

Basically, if you follow that method's logic, what is done is no more than:

-> create a FONode of the right type
-> create a PropertyList from the Attributes
-> bind the PropertyList to the FONode
    (= transfer the applicable properties for that particular node-type
       to instance members of the FONode; the PropertyList itself is
       only stored by the FOTreeBuilder to use as parent PropertyList
       for the FONode's children's PropertyLists)
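
The three steps above, plus the add-to-parent step described next, can be sketched roughly as follows. This is only an illustration: SketchNode, SketchPropertyList and SketchTreeBuilder are hypothetical names, not FOP classes, and the sketch simplifies by binding a single property (font-size) and treating it as always inherited from the parent list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical property list; falls back to its parent for missing values. */
final class SketchPropertyList {
    private final Map<String, String> values;
    private final SketchPropertyList parent;

    SketchPropertyList(Map<String, String> attributes, SketchPropertyList parent) {
        this.values = new HashMap<>(attributes);
        this.parent = parent;
    }

    String get(String name) {
        String value = values.get(name);
        if (value == null && parent != null) {
            return parent.get(name);
        }
        return value;
    }
}

/** Hypothetical FO node; keeps only the bound value, not the property list. */
final class SketchNode {
    final String name;
    final List<SketchNode> children = new ArrayList<>();
    String fontSize; // instance member filled in by bind()

    SketchNode(String name) {
        this.name = name;
    }

    void bind(SketchPropertyList pList) {
        this.fontSize = pList.get("font-size");
    }
}

/** Hypothetical builder; only it holds on to the property lists. */
final class SketchTreeBuilder {
    private final Deque<SketchNode> nodes = new ArrayDeque<>();
    private final Deque<SketchPropertyList> propertyLists = new ArrayDeque<>();
    SketchNode root;

    void startElement(String name, Map<String, String> attributes) {
        // 1. create a node of the right type (one type only in this sketch)
        SketchNode node = new SketchNode(name);
        // 2. create a property list, with the enclosing element's list as parent
        SketchPropertyList pList =
                new SketchPropertyList(attributes, propertyLists.peek());
        // 3. bind: copy the applicable values into the node's own fields
        node.bind(pList);
        // finally, add the node to the current parent's child list
        if (nodes.isEmpty()) {
            root = node;
        } else {
            nodes.peek().children.add(node);
        }
        nodes.push(node);
        propertyLists.push(pList);
    }

    void endElement() {
        nodes.pop();
        propertyLists.pop();
    }

    public static void main(String[] args) {
        SketchTreeBuilder builder = new SketchTreeBuilder();
        Map<String, String> rootAttributes = new HashMap<>();
        rootAttributes.put("font-size", "12pt");
        builder.startElement("root", rootAttributes);
        builder.startElement("block", new HashMap<>());
        builder.endElement();
        builder.endElement();
        // The child picked up the value through its parent's property list.
        System.out.println(builder.root.children.get(0).fontSize);
    }
}
```

The point of keeping the property lists on the builder's stack rather than on the nodes is that a list can be dropped as soon as its element ends, while the node keeps only the values it actually needs.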

At the end of startElement(), the processed FONode itself is added to the list of childNodes of the MainFOHandler's currentFObj, if all basic validation passed. Subsequently, the node's startOfNode() method is called, which in most cases is no more than a mapping to:


FOEventHandler.startXXX()

[Note that, IIRC, in the current Trunk code, FONode.startOfNode() is called /before/ the child has been added to the parent. I think it makes more sense to reverse this order... My local sandbox reflects the above description. In the current overall design, however, this makes little difference: see further on.]

When a node has no children, or all of its children have been processed, we receive a MainFOHandler.endElement(), where FONode.endOfNode() is called, which maps to:


FOEventHandler.endXXX()

The FOEventHandler that is used for area-tree based rendering is, of course, the AreaTreeHandler, which currently offers implementations only for startPageSequence() and endPageSequence(). The latter is where the fun starts (read: the layout-loop).


What this boils down to, is the first pain-point:

The whole FO tree for an fo:page-sequence has to be built /before/ anything layout-related is triggered. In most cases this is relatively unimportant, since the scenarios where the content absolutely /cannot/ be split into multiple logical sequences beforehand are rare. However, in some of those cases --tables that would span hundreds of pages-- the amount of heap space required to build the corresponding tree is simply too much for any JVM to bear. In environments where such a document is one amongst other concurrently processed documents, it will lead to an OOMError *long* before the end of the page-sequence is reached, and since the heap is shared between threads, either the error will also occur in those other threads or they will be left with very limited heap space to work with, resulting in much slower processing.
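
To make the event mapping and its consequence a bit more tangible, here is a rough sketch. All names (SketchEventHandler, SketchAreaTreeHandler, SketchPageSequence, SketchDriver) are made up for illustration and are not FOP's classes; the block events and the driver loop are assumptions standing in for the real FO tree building.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical page-sequence subtree; grows while events come in. */
final class SketchPageSequence {
    final List<String> blocks = new ArrayList<>();
}

/** Hypothetical handler base: every event is a no-op by default. */
class SketchEventHandler {
    void startPageSequence(SketchPageSequence seq) { }
    void endPageSequence(SketchPageSequence seq) { }
    void startBlock(String block) { }
    void endBlock(String block) { }
}

/** Hypothetical area-tree handler: reacts to page-sequence events only. */
final class SketchAreaTreeHandler extends SketchEventHandler {
    final List<String> log = new ArrayList<>();

    @Override
    void startPageSequence(SketchPageSequence seq) {
        log.add("start page-sequence");
    }

    @Override
    void endPageSequence(SketchPageSequence seq) {
        // Layout starts here, when the whole subtree is already in memory.
        log.add("layout " + seq.blocks.size() + " blocks");
    }
}

final class SketchDriver {
    /** Feeds the events for one page-sequence with the given number of blocks. */
    static List<String> process(int blockCount) {
        SketchAreaTreeHandler handler = new SketchAreaTreeHandler();
        SketchPageSequence seq = new SketchPageSequence();
        handler.startPageSequence(seq);
        for (int i = 0; i < blockCount; i++) {
            String block = "block-" + i;
            seq.blocks.add(block);      // the tree keeps growing...
            handler.startBlock(block);  // ...while these events are ignored
            handler.endBlock(block);
        }
        handler.endPageSequence(seq);
        return handler.log;
    }

    public static void main(String[] args) {
        System.out.println(SketchDriver.process(3));
    }
}
```

Since the block-level events fall through to the no-op defaults, nothing can be laid out (and released) before endPageSequence() arrives, however large the sequence has grown by then.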

OK, if you then follow the trail starting at AreaTreeHandler.endPageSequence(), we almost immediately arrive at another pain-point, to which the problem of changing available ipd is related:


PageSequenceLM.activateLayout()
...
-> AbstractBreaker.doLayout()
   ...
   -> PageBreaker.getNextBlockList()
      (first) page provided
      -> AbstractBreaker.getNextBlockList()
         ...
         -> PageBreaker.getNextKnuthElements()
            -> FlowLayoutManager.getNextKnuthElements()

Now, as I recall from a few debug sessions in that area, the latter method-name is actually a bit misleading, as it gets called only once per fo:flow, and triggers (line-)layout for all descendants of the flow, based on a LayoutContext that has the available ipd of the region-body of the first page in the sequence. The page-break computation is started only after the entire fo:flow has been broken into lines. The PageBreaker operates completely upon the single element list returned by the main flow, and that element list has been constructed under the (possibly false) assumption that the ipd would be the same for all pages.

This design is also partly related to the above (memory consumption of the FO tree), precisely because the LM creation loop is currently tied in to the getNextKnuthElements() loop. Each LM uses getChildLM() to obtain a reference to its 'next' child LM, and that method ultimately maps to a next() on an iterator over the list of children of the FO that is associated with the LM. The iterator that is used to do this is a standard ListIterator iterating over an ArrayList, obtained via FONode.getChildNodes(). This means that the underlying list /cannot/ be altered after the iterator has been created (other than through one of /that/ particular iterator's methods). Since this iterator is obtained when the LM is constructed, this makes it impossible to instantiate a FlowLayoutManager without the corresponding Flow's list of child nodes being complete. A call to Flow.addChildNode() after the LM has been created would lead to a ConcurrentModificationException when the LM issues the following getChildLM()...
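
The fail-fast behaviour described above is plain java.util behaviour and easy to reproduce outside FOP. A minimal sketch, in which FailFastDemo and the "block-N" strings are just stand-ins for the LM and the FO child nodes:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.ListIterator;

final class FailFastDemo {

    /** Returns true if appending to the list after iterator creation makes next() fail. */
    static boolean appendBreaksIterator() {
        List<String> childNodes = new ArrayList<>();
        childNodes.add("block-1");
        // Stand-in for the iterator an LM obtains when it is constructed.
        ListIterator<String> it = childNodes.listIterator();
        // Stand-in for a late addChildNode() on the FO.
        childNodes.add("block-2");
        try {
            it.next(); // stand-in for the following getChildLM()
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Adding through the iterator itself is the one permitted way to modify the list. */
    static List<String> appendThroughIterator() {
        List<String> childNodes = new ArrayList<>();
        childNodes.add("block-1");
        ListIterator<String> it = childNodes.listIterator();
        it.next();
        it.add("block-2"); // inserted right after the element just returned
        return childNodes;
    }

    public static void main(String[] args) {
        System.out.println(appendBreaksIterator());
        System.out.println(appendThroughIterator());
    }
}
```

So any scheme that wants to start layout before the Flow's child list is complete would need either a different kind of iterator or additions routed through the iterator itself.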

The rest of the story you already seem to understand (and some parts even better than the rest of us, it seems at times ;)). Once the page-break positions have been computed, the PageSequenceLM starts the addAreas() chain, passing the PositionIterator down to the descendant LMs. The area tree that is the result of all this is then handed off to the renderer, which basically translates the area tree structure into PDF, PS, XML...



That's it --for now :)

Cheers,

Andreas
