Hi Simon, I finally took the time to read and digest your Wiki page. This is an interesting reading. A few comments:
According to that representation paragraphs with inline text have legal linebreak points. I consider those legal linebreak points also as legal pagebreak points. In addition, there are legal pagebreak points between the vertical elements such as paragraphs and blocks.
One issue that will have to be addressed is that widow or keep-with-previous settings may invalidate some previously believed legal breakpoints. In such cases, active nodes which contain those breakpoints in their chains will have to be deactivated; if they were the chosen best nodes, some other nodes will somehow have to be retrieved to replace them. I hope this won't be a too great difficulty.
Within the inner loop, we consider the page and paragraph layout between the active and the current pagebreak point. If the active breakpoint is within a paragraph, we calculate the best line breaks from that breakpoint to the end of the paragraph. For all complete
Unless the current breakpoint lies in the same paragraph.
In page independent linebreaking, for each feasible breakpoint the best node is retained, which represents the best layout of the paragraph up to that point, and which, due to the dynamic principle, is part of the best layout of the whole paragraph if that layout uses this breakpoint. If line numbers matter, the best node for each line number is retained. In page dependent linebreaking, even that is not enough. We must retain the best node for each vertical offset on the page, because that is the quantity that influences page breaking. This
Good point. This led me to the following thoughts: Currently the iteration over the active nodes is broken into two loops: one loop for iterating over the line numbers, one for iterating over the active nodes associated to each line number. Why? Because if line widths aren't the same they have an influence on the computation of best line breaks. Because when considering a given legal breakpoint, we must know the width of the line it would end in order to be able to compute the shrinking/stretching for that line. In fact we make a distinction between line numbers because they determine the /context/ in which linebreaks are computed. If all the lines had the same widths such a distinction wouldn't be necessary. The merging of line- and page-breakings generalizes this problem of differing contexts. This time, not only the line number counts, but also the page number, the offset from the beginning of the page, the out-of-lines to be placed, etc. I think the greatest challenge will be to identify all the elements which determine that context, and to be able to compare two contexts and say if they are equivalent or not. Considering the case 1 you describe on the wiki page, there are only two different contexts: page number even or odd. In this case the offset from the beginning of the page doesn't count. In other more complex documents this may be much more complicated.
When the linewidths depend on the page number, we need to remember the best pagebreak node for each feasible pagebreak point for each page number. Otherwise, we only need to remember the best pagebreak node for each feasible pagebreak point. Note that the latter condition is true in the presence of out-of-line elements, because those are related to the content of the page, not the the page number.
Small typo: "not the the page number"
Optimization opportunity 1: We may need to reuse many times the best layout from a breakpoint to the end of the paragraph, and the best layout from the start of the paragraph to a breakpoint. Therefore we need to store a reference to the best end node for either case in the active node. If we wish to take into account the different possible heights of the part of the paragraph, we need to store references to a set of best end nodes. Especially for long page sequences, a page breakpoint may be feasible both for an odd and an even page. In that case, we need to store different end points for each, due to the different line widths on odd and even pages. This optimization is certainly true for the start of the paragraph. There will be many page layouts on which the whole paragraph fits.
So, this might be handled automatically by the dynamic algorithm if we were able to identify the different contexts.
Optimization opportunity 2: Do we need to consider each active node? Or can we already determine that some active nodes will never give a better layout than others? Suppose that a paragraph has two feasible breakpoints A and B, which have an equal number of lines, or even the same height, before and after the page breakpoint. Suppose that B has a higher amount of demerits than A. Can we then conclude that B will never be part of the best layout, because a better layout can always be achieved with A? Yes, we can.
Same here, we would just have to detect that linebreak point A and B always occur in the same context, and then that B may be discarded.
In most cases the whole paragraph will fit on the page, and the legal pagebreak point after it is not feasible. So we do another paragraph, and another. Finally we will reach a feasible pagebreak point, which ends page 1. The corresponding linebreak point ends a line which is able to fill the page height with maximally tolerable stretching. In the next steps we will find more feasible pagebreak points. They correspond to different layouts of the last paragraph that starts on this page, with the same number of lines, and to layouts with more lines, which also fit on the page, until the maximal shrink is exceeded. At this point the starting node is deactivated, and the active page node list only contains nodes that end page 2.
Unless I've missed something: that end page *1*. Or?
It should be noted that for each feasible pagebreak node, we restart the linebreaking calculations of that page. Of course we must optimize this. We could attach a data structure to the starting page node which records the results of the calculations done. Those results consist of the active line node list, the last linebreak point considered, and the widths calculated up to that point. During uninterrupted linebreak calculation these data are held in loop variables. Now we must store them so that they can be retrieved when we resume the calculation in the consideration of the next current pagebreak point. On the first
If the main loop were iterating over line breaks, we wouldn't have to do this. But perhaps we would have to for pagebreaks then? Which would be equivalent, but I can't figure out yet. Regards, Vincent