Jeremias Maerki wrote:
did you already investigate how footnotes are implemented? Can you say
anything about how similar the problem of footnotes is to before-floats?
Just so you don't have to start from scratch while there may be
something to build upon. After all, the footnotes also contain some
logic to move certain parts to a different page than where anchor is
located.
A few quick comments about the footnote implementation:
1) the FootnoteLM returns only the sequence of elements representing the
inline part (not the footnote-body part); it just adds to the last
(inline) box a reference to the FootnoteBodyLM.
2) the LineLM, after computing the breaks, adds to each (block) box
representing a line the references to the FootnoteBodyLM whose citations
are in that line
3) during the remaining of the element collection phase, these references
are not used (but in the creation of "combined" element lists, when they
should be copied inside the new elements)
4) the PageSequenceLM.PageBreaker.getNextKnuthElements() method, after
receiving all the (block) elements, scans them looking for footnote
information, gets the elements from the referenced FootnoteBodyLM and puts
them in a different list (at the moment a list of lists, but this is
sub-optimal), and from the footnote-separator (in a separate list)
5) these lists are looked at in PageBreakingAlgorithm.computeDifference(),
where we try to add some footnote content to the "normal" page content
using getFootnoteSplit(), and in computeDemerits(), where some extra
demerits are added if we break a footnote or some footnotes are deferred.
This last point at the moment is performed using many
PageBreakingAlgorithm private variables, which is maybe not the best way
to do it, as we must be very careful about their initialization and their
use, especially when the algorithm restarts. I think that a "state" object
storing these variables could be used to store these values, and
explicitly passed along the methods instead of relying on the class
members, but concerning this I'd like to hear the opinions of the other
committers ...
Insertion of before-floats could be implemented in a similar way, giving
the precedence to the footnote insertion (as it is affected by more strict
constraints).
An important difference between a footnote and a before-float is that the
latter does not have an "inline part", so (if we want to follow the same
pattern) we need to either store the reference inside a previously-created
box or to add some new elements containing the reference (but we must be
sure that these elements cannot be parted from the previous ones, see the
constraints in section 6.10.2 in the spec).
A crucial point is the demerit function as, if I remember correctly, it
greatly affect the computational complexity of the breaking algorithm
(thre should be a M. Plass paper concerning this).
HTH
Another thing that we may need to keep in mind: There was lots of desire
from the user community that FOP supports large documents (long-term
goal, not necessary yours). I wrote that a first-fit algorithm could
help free memory earlier. Obviously, for complex before-float situations
a total-fit approach is probably more interesting as it can come up with
more "creative" solutions. I'm just mentioning it so we keep the bigger
picture in mind and since there could be conflicting goals.
A "first degree" of first-fit algorithm could be achieved quite quickly by
having a BreakingAlgorithm interface which is implemented by a TotalFitBA
(the existing implementation) and a FirstFitBA which would have a much
simpler considerLegalBreak() method that, instead of the complex set of
nodes, just keeps in mind a single node.
This would surely decrease the memory footprint, but is not (I think) what
we really want, as this simplified algorithm would be performed on the
whole sequence of elements.
In order to start processing the sequence as soon as we receive a few
elements we need to do some deeper changes.
An idea (I just had it now, so I did not fully consider all its
implications).
At the moment, the block-level LM collect elements from their children and
return just a single sequence (if there are no break conditions); we could
have a parameter requesting them to return after they receive each child
sub-sequence, and have a canStartComputingBreak() method that returns true
if the sequence contains enough elements and we are using a first-fit
algorithm, or false otherwise ...
Sorry for the long post ... and for the long absence too, but it seems
that just after thinking "great, now I've really got some time to spend on
FOP" I receive tons of other things to do ... :-(
Regards
Luca