Andreas,

First, many thanks for your explanations!
Some comments below.

Andreas L Delmelle wrote:
> On Mar 8, 2007, at 20:15, Andreas L Delmelle wrote:
> 
> Hi Vincent,
> 
>> On Mar 8, 2007, at 18:29, Vincent Hennebert wrote:
>>
>>> <snip />
>>>
>>> Now, does that picture represent the current code?
>>
>> For the largest part, yes, I think so.
>> The only difference is the addArea() part, which currently goes
>> top-down. It is not the Position that notifies its LM that the areas
>> must be added, but the LM that iterates over the Positions and adds
>> their corresponding areas.
> 
> Just felt like elaborating a bit, maybe putting the whole thing in
> perspective, and offering a glimpse on a few of the current pain-points,
> so here's a rough sketch of what happens inside FOP when a FO document
> gets processed, with some pointers into the code. Maybe it saves you and
> other interested subscribers a bit of time when trying to get the bigger
> picture. I know it took me long enough... ;)
> 
> The XSL (identity) transform results in the event
> FOTreeBuilder.MainFOHandler.startElement(
>   "http://www.w3.org/1999/XSL/Format",
>   "root",
>   "fo:root", attributes);
> 
> This event occurs for every FO node, starting with the fo:root.
> Basically, if you follow that method's logic, what is done is no more than:
> -> create a FONode of the right type
> -> create a PropertyList from the Attributes
> -> bind the PropertyList to the FONode
>     (= transfer the applicable properties for that particular node-type
>        to instance members of the FONode; the PropertyList itself is
>        only stored by the FOTreeBuilder to use as parent PropertyList
>        for the FONode's children's PropertyLists)

When you say "transfer the applicable properties", do you mean that
inheritance is also handled here? That is, from all the specified and
inherited properties, the ones that apply are picked out?

> 
> At the end of startElement(), the processed FONode itself is added to
> the list of childNodes of the MainFOHandler's currentFObj, if all basic
> validation passed. Subsequently, the node's startOfNode() method is
> called, which in most cases is no more than a mapping to:
> 
> FOEventHandler.startXXX()
> 
> [Note that, IIRC, in the current Trunk code, FONode.startOfNode() is
> called /before/ the child has been added to the parent. I think it makes
> more sense to reverse this order... My local sandbox reflects the above
> description. In the current overall design, however, this makes only
> little difference: see further on.]

My understanding of this part is still limited, but it also seems to
make more sense to me.


> When a node has no children, or all of its children have been processed,
> we receive a MainFOHandler.endElement(), where FONode.endOfNode() is
> called, which maps to:
> 
> FOEventHandler.endXXX()
> 
> The FOEventHandler that is used for area-tree based rendering, is of
> course the AreaTreeHandler, which currently offers implementations only
> for startPageSequence() and endPageSequence(). The latter is where the
> fun starts (read: the layout-loop)
> 
> What this boils down to, is the first pain-point:
> The whole FO tree for an fo:page-sequence has to be built /before/
> anything layout-related is triggered. In most cases relatively
> unimportant, since the scenarios where the content absolutely /cannot/
> be split into multiple logical sequences beforehand are rare. However,
> in some of those cases --tables that would span hundreds of pages-- the
> amount of heap space required to build the corresponding tree is simply
> too much for any JVM to bear. In environments where such a document is
> one amongst other concurrently processed documents --heap space is
> shared by different threads--, it will lead to an OOMError *long* before
> the end of the page-sequence is reached, and since the heap is shared
> between threads, either the error will also occur there or those other
> threads will be left with very limited heap space to work with,
> resulting in a much slower processing.

Some thoughts related to this:
It would anyway be best to start the layout process as soon as possible;
ideally there would be multiple, chained threads for the multiple tasks:
FO tree generation, Knuth elements generation, breaking, area tree
generation, rendering, etc. They would act like Unix pipes, in a
producer/consumer model where each thread would be fed by the thread it
depends on, and would itself feed the subsequent thread.
The open questions are: does that make sense, how does a thread know
when it can start its work, can we clearly separate the various stages,
and then there are all the usual thread-synchronization issues.
But that might give some real performance boost on multi-processor machines.
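
To make the pipe idea a bit more concrete, here is a minimal sketch of
such a chain using bounded queues. This is not FOP code: the class name,
the EOF marker and the string "stages" are all made up for illustration,
and real stages would of course pass FONodes, Knuth elements and areas
rather than strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Hypothetical sketch of chained producer/consumer stages; not FOP code.
final class PipelineSketch {
    // End-of-stream marker passed from stage to stage (an assumed convention).
    private static final String EOF = "\u0000EOF";

    // Starts a thread that takes items from 'in', transforms them, and
    // feeds the result to 'out', forwarding the EOF marker when it arrives.
    static Thread stage(String name, BlockingQueue<String> in,
                        BlockingQueue<String> out, Function<String, String> step) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String item = in.take();          // blocks until fed
                    if (EOF.equals(item)) {
                        out.put(EOF);
                        return;
                    }
                    out.put(step.apply(item));        // blocks if consumer is slow
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.start();
        return t;
    }

    // Pushes the given "FO nodes" through two stages and collects the output.
    static List<String> run(List<String> foNodes) {
        BlockingQueue<String> nodes = new ArrayBlockingQueue<>(4);
        BlockingQueue<String> elements = new ArrayBlockingQueue<>(4);
        BlockingQueue<String> areas = new ArrayBlockingQueue<>(4);

        stage("element-generation", nodes, elements, n -> "knuth(" + n + ")");
        stage("area-generation", elements, areas, k -> "area(" + k + ")");

        // The producer stands in for the FO tree builder.
        Thread producer = new Thread(() -> {
            try {
                for (String node : foNodes) {
                    nodes.put(node);
                }
                nodes.put(EOF);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }, "fo-tree");
        producer.start();

        // The calling thread plays the role of the renderer.
        List<String> rendered = new ArrayList<>();
        try {
            for (String a = areas.take(); !EOF.equals(a); a = areas.take()) {
                rendered.add(a);
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while rendering", ex);
        }
        return rendered;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("block1", "block2", "block3")));
    }
}
```

The bounded queues are the interesting part: a stage that gets ahead of
its consumer simply blocks, so memory use stays limited without any
explicit "can I start now?" signalling.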


> OK, if you then follow the trail starting at
> AreaTreeHandler.endPageSequence(), we almost immediately arrive at
> another pain-point, to which the problem of changing available ipd is
> related:
> 
> PageSequenceLM.activateLayout()
> ...
> -> AbstractBreaker.doLayout()
>    ...
>    -> PageBreaker.getNextBlockList()
>       (first) page provided
>       -> AbstractBreaker.getNextBlockList()
>          ...
>          -> PageBreaker.getNextKnuthElements()
>             -> FlowLayoutManager.getNextKnuthElements()
> 
> Now, as I recall from a few debug sessions in that area, the latter
> method-name is actually a bit misleading, as it gets called only once
> per fo:flow, and triggers (line-)layout for all descendants of the flow,
> based on a LayoutContext that has the available ipd of the region-body
> of the first page in the sequence. The page-break computation is started
> only after the entire fo:flow has been broken into lines. The
> PageBreaker operates completely upon the single element list returned by
> the main flow, and that element list has been constructed under the
> (possibly false) assumption that the ipd would be the same for all pages.

Regarding the changing-IPD problem, I wrote some notes during the GSoC
last summer:
http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2006/FloatsImplementationProgress/ImplementingSideFloats#head-953fc5836ed422f91834ea15bf1e2515d0101300
I have already explained my ideas to some of you. At some point I'll
have to write them down on a wiki page.


> This design is also partly related to the above (memory consumption of
> the fo tree), precisely because the LM creation loop is currently tied
> in to the getNextKnuthElements() loop.
> Each LM uses getChildLM() to obtain a reference to its 'next' child LM,
> and that method ultimately maps to a next() on an iterator over the list
> of children of the FO that is associated with the LM. The iterator that
> is used to do this, is a standard ListIterator iterating over an
> ArrayList, obtained via FONode.getChildNodes().
> This means that the underlying list /cannot/ be altered after the
> iterator has been created (other than through one of /that/ particular
> iterator's methods). Since this iterator is obtained when the LM is
> constructed, this makes it impossible to instantiate a FlowLayoutManager
> without the corresponding Flow's list of child nodes being complete. A
> call to Flow.addChildNode() after the LM has been created would lead to
> a ConcurrentModificationException when the LM issues the following
> getChildLM()...

That seems to give some confirmation to my thread ideas above: one
thread for creating FONodes, one for LMs, one for layout, and a change
from a pull model to a push model: instead of the layout thread
requesting the next LM, the LM thread would notify it that a new LM is
available, possibly while being itself notified by the FONode thread
that new nodes have been created.
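
For anyone who wants to see the iterator constraint in isolation, here
is a small standalone sketch using only java.util. The class and method
names are invented, and a plain list of strings stands in for the result
of FONode.getChildNodes(); it only illustrates the ArrayList behaviour
described above, not the actual LM code.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.ListIterator;

// Hypothetical illustration of the fail-fast iterator problem; not FOP code.
final class ChildIteratorSketch {

    // Mimics "iterator obtained when the LM is constructed, child added later".
    static boolean failsAfterLateAdd() {
        List<String> childNodes = new ArrayList<>();
        childNodes.add("fo:block #1");

        ListIterator<String> it = childNodes.listIterator(); // "LM constructed"
        childNodes.add("fo:block #2");                       // late "addChildNode()"

        try {
            it.next();                                       // "getChildLM()"
            return false;
        } catch (ConcurrentModificationException ex) {
            return true;                                     // fail-fast check fired
        }
    }

    // Adding through that same iterator is the only modification it tolerates.
    static List<String> addThroughIterator() {
        List<String> childNodes = new ArrayList<>();
        childNodes.add("fo:block #1");

        ListIterator<String> it = childNodes.listIterator();
        it.next();
        it.add("fo:block #2");   // inserted right after the element just returned
        return childNodes;
    }

    public static void main(String[] args) {
        System.out.println("late add fails: " + failsAfterLateAdd());
        System.out.println("via iterator:   " + addThroughIterator());
    }
}
```

A push model would sidestep this, since the consumer would no longer
hold an iterator over a list that is still being filled.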


> The rest of the story you already seem to understand (and some parts
> even better than the rest of us, it seems at times ;)). Once the
> page-break positions have been computed, the PageSequenceLM starts the
> addAreas() chain, passing the PositionIterator down to the descendant LMs.
> The area tree that is the result of all this, is then handed off to the
> renderer, which basically translates the area tree structure into PDF,
> PS, XML...

... but that's another story ;-)

> That's it --for now :)

Those notes deserve their own wiki page, so they don't get lost in the
mailing list archives. I'll create one as soon as I have time. The
documentation part of the website might also need some cleaning up, BTW.

It's amazing the number of things I would do if I had time...

Cheers,
Vincent
