Re: [Xmlgraphics-fop Wiki] Update of "PageLayout/PageAndLineBreaking" by SimonPepping

Vincent Hennebert Tue, 12 Sep 2006 11:27:15 -0700

Hi Simon,

I finally took the time to read and digest your Wiki page. This is an
interesting reading. A few comments:

According to that representation paragraphs with inline text have
legal linebreak points. I consider those legal linebreak points also
as legal pagebreak points. In addition, there are legal pagebreak
points between the vertical elements such as paragraphs and blocks.


One issue that will have to be addressed is that widow or
keep-with-previous settings may invalidate some previously believed
legal breakpoints. In such cases, active nodes which contain those
breakpoints in their chains will have to be deactivated; if they were
the chosen best nodes, some other nodes will somehow have to be
retrieved to replace them. I hope this won't be a too great difficulty.

Within the inner loop, we consider the page and paragraph layout
between the active and the current pagebreak point. If the active
breakpoint is within a paragraph, we calculate the best line breaks
from that breakpoint to the end of the paragraph. For all complete


Unless the current breakpoint lies in the same paragraph.

In page independent linebreaking, for each feasible breakpoint the
best node is retained, which represents the best layout of the
paragraph up to that point, and which, due to the dynamic principle,
is part of the best layout of the whole paragraph if that layout uses
this breakpoint. If line numbers matter, the best node for each line
number is retained. In page dependent linebreaking, even that is not
enough. We must retain the best node for each vertical offset on the
page, because that is the quantity that influences page breaking. This


Good point. This led me to the following thoughts:

Currently the iteration over the active nodes is broken into two loops:
one loop for iterating over the line numbers, one for iterating over the
active nodes associated to each line number. Why? Because if line widths
aren't the same they have an influence on the computation of best line
breaks. Because when considering a given legal breakpoint, we must know
the width of the line it would end in order to be able to compute the
shrinking/stretching for that line. In fact we make a distinction
between line numbers because they determine the /context/ in which
linebreaks are computed. If all the lines had the same widths such a
distinction wouldn't be necessary.

The merging of line- and page-breakings generalizes this problem of
differing contexts. This time, not only the line number counts, but also
the page number, the offset from the beginning of the page, the
out-of-lines to be placed, etc. I think the greatest challenge will be
to identify all the elements which determine that context, and to be
able to compare two contexts and say if they are equivalent or not.
Considering the case 1 you describe on the wiki page, there are only two
different contexts: page number even or odd. In this case the offset
from the beginning of the page doesn't count. In other more complex
documents this may be much more complicated.

When the linewidths depend on the page number, we need to remember the
best pagebreak node for each feasible pagebreak point for each page
number. Otherwise, we only need to remember the best pagebreak node
for each feasible pagebreak point. Note that the latter condition is
true in the presence of out-of-line elements, because those are
related to the content of the page, not the the page number.


Small typo: "not the the page number"

Optimization opportunity 1: We may need to reuse many times the best
layout from a breakpoint to the end of the paragraph, and the best
layout from the start of the paragraph to a breakpoint. Therefore we
need to store a reference to the best end node for either case in the
active node. If we wish to take into account the different possible
heights of the part of the paragraph, we need to store references to a
set of best end nodes. Especially for long page sequences, a page
breakpoint may be feasible both for an odd and an even page. In that
case, we need to store different end points for each, due to the
different line widths on odd and even pages. This optimization is
certainly true for the start of the paragraph. There will be many page
layouts on which the whole paragraph fits.


So, this might be handled automatically by the dynamic algorithm if we
were able to identify the different contexts.

Optimization opportunity 2: Do we need to consider each active node?
Or can we already determine that some active nodes will never give a
better layout than others? Suppose that a paragraph has two feasible
breakpoints A and B, which have an equal number of lines, or even the
same height, before and after the page breakpoint. Suppose that B has
a higher amount of demerits than A. Can we then conclude that B will
never be part of the best layout, because a better layout can always
be achieved with A? Yes, we can.


Same here, we would just have to detect that linebreak point A and B
always occur in the same context, and then that B may be discarded.

In most cases the whole paragraph will fit on the page, and the legal
pagebreak point after it is not feasible. So we do another paragraph,
and another. Finally we will reach a feasible pagebreak point, which
ends page 1. The corresponding linebreak point ends a line which is
able to fill the page height with maximally tolerable stretching. In
the next steps we will find more feasible pagebreak points. They
correspond to different layouts of the last paragraph that starts on
this page, with the same number of lines, and to layouts with more
lines, which also fit on the page, until the maximal shrink is
exceeded. At this point the starting node is deactivated, and the
active page node list only contains nodes that end page 2.


Unless I've missed something: that end page *1*. Or?

It should be noted that for each feasible pagebreak node, we restart
the linebreaking calculations of that page. Of course we must optimize
this. We could attach a data structure to the starting page node which
records the results of the calculations done. Those results consist of
the active line node list, the last linebreak point considered, and
the widths calculated up to that point. During uninterrupted linebreak
calculation these data are held in loop variables. Now we must store
them so that they can be retrieved when we resume the calculation in
the consideration of the next current pagebreak point. On the first


If the main loop were iterating over line breaks, we wouldn't have to do
this. But perhaps we would have to for pagebreaks then? Which would be
equivalent, but I can't figure out yet.


Regards,
Vincent

Re: [Xmlgraphics-fop Wiki] Update of "PageLayout/PageAndLineBreaking" by SimonPepping

Reply via email to