Thanks, Luca. I had a nice, casual talk on the phone with Simon yesterday. Essentially, we only talked about very high-level matters, especially the choice of a strategy (or two). As you know, I came up with the idea of creating a simpler best-fit strategy without look-ahead for invoice-style documents, but maybe it would be possible to design your total-fit strategy so that it could also be used as a best-fit strategy without look-ahead. The problem, as I mentioned already, is that the available IPD may change within a page-sequence, which may force backtracking and a recalculation of vertical boxes.
Of course, if it's possible to stay with one page-breaking algorithm for all use cases, that would be best (because of the reduced effort), but only if the algorithm is reasonably fast for invoice-style documents. I'm repeatedly confronted with speed requirements for this kind of document. Since modern high-volume single-feed printers handle about 180 pages per minute (continuous-feed systems handle over 4 times that speed, but I think that's neither relevant nor realistic here), FOP should be able to operate close to these 180 pages per minute for not-too-complex documents on a modern server. That means about 333 ms per page (60 s / 180 pages). Not much. Of course, in such an environment it is possible to distribute the formatting process over several blade servers, but I had to realize that certain companies prefer spending 100'000 dollars on a big server over spending a lot less on a much faster, CPU-power-oriented setup. It seems to be hard to say good-bye to the old host systems. Well, that's just what reality looks like in my environment. Simon, for example, is much more interested in book-style documents, where the requirements are different: speed is not a big issue, but quality is. In the end, I think we need to rate the chosen approach from both of these points of view. These requirements contradict each other quite strongly, and it seems important to me not to forget that here.
Luca, do you think your total-fit approach could be written in a way that handles changing available IPDs, and so that look-ahead can be disabled to improve processing speed at the cost of optimal break decisions?
If it's ok for you (and feasible), I'd like to integrate what you already have (in code) into that branch I was talking about. I would like to avoid recreating something you've already started, even if it doesn't work with the changes that happened in the last weeks.
Even if we may create two different strategies I'm sure that certain parts will be shared by both approaches, like the creation of Knuth-style elements for the PageLM.

Some more comments inline:

On 04.03.2005 13:23:01 Luca Furini wrote:
>
> Jeremias Maerki wrote:
>
> >Would you consider sharing what you already
> >have? This may help us in the general discussion and may be a good
> >starting point.
>
> Ok, I'll try to.
>
> The main change in the LineLM is that the line breaking algorithm does not
> select only the node in activeList with fewest demerits: all the nodes
> whose demerits are <= a threshold are used to create LineBreakPositions,
> so for each paragraph there is a set of layout options (for example, a
> paragraph could create 8 to 10 lines, 9 being the layout with fewest
> demerits).

Hmm, that's a feature that I would say is something that only book-style documents will need. Invoice-style documents could live without it.

> According to the value of widows and orphans, the LineLM creates a
> sequence of elements: besides "normal" lines, represented by a box, there
> are "optional lines", represented by
> box(0) penalty(inf,0) glue(0,1,0) box(0)
> and "removable lines"
> box(0) penalty(inf,0) glue(1,0,1) box(0)
> A few complications arise if not every possible layout allows breaks
> between lines, but they all can be solved using boxes, glues and
> penalties (for example, if a paragraph needs 3 or 4 lines, if it uses 3
> it cannot be parted).

Also something that's not all too important for invoice-style documents, although it can't hurt to have it.

> The BlockLM, and a block stacking LM in general, adds elements
> representing its children's spaces and keep condition, for example
> adding a 0 penalty or an infinite penalty according to
> child1.mustKeepWithNext(), child2.mustKeepWithPrevious() and
> this.mustKeepTogether().

That's certainly a must-have in any case.
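
To make the keep handling concrete, here is a minimal sketch of how a block-stacking LM might pick the penalty for the break opportunity between two children. It only illustrates the rule quoted above (a 0 penalty or an infinite penalty depending on the three keep conditions). The class name, the method name and the INFINITE constant are hypothetical and do not reflect the actual FOP code.

```java
public class KeepPenaltySketch {
    /** Hypothetical stand-in for an "infinite" penalty that forbids a break. */
    static final int INFINITE = 1000;

    /**
     * Penalty for the break opportunity between two adjacent children.
     * The three flags correspond to child1.mustKeepWithNext(),
     * child2.mustKeepWithPrevious() and this.mustKeepTogether().
     */
    static int penaltyBetween(boolean prevKeepWithNext,
                              boolean nextKeepWithPrevious,
                              boolean parentKeepTogether) {
        boolean breakForbidden =
                prevKeepWithNext || nextKeepWithPrevious || parentKeepTogether;
        // Any single keep condition is enough to forbid the break.
        return breakForbidden ? INFINITE : 0;
    }

    public static void main(String[] args) {
        // No keeps: the break is allowed at no cost.
        System.out.println(penaltyBetween(false, false, false));
        // keep-with-next on the first child: the break is forbidden.
        System.out.println(penaltyBetween(true, false, false));
    }
}
```

Real keeps may need more than a yes/no decision, so treat this only as the simplest possible reading of the paragraph above.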

> The PageLM, once it has the list of elements representing a whole
> page-sequence (or the content before a forced page break), calls the same
> breaking algorithm, using only a different selection method which leaves
> only one node in activeList.

That's the part where I have a big question mark about changing available IPD. We may need a check that determines whether the available IPD changes within a page-sequence by inspecting the page-masters. That would allow us to switch automatically between total-fit and best-fit or maybe even first-fit. Another open question is side-floats, since they influence the available IPD on a line-to-line basis.

> It has now a rough sequence of pages: each one may have a positive or
> negative difference (with respect to the page height); the glue elements
> representing adjustable lines or adjustable spaces in a page are collected
> in different lists and they are used to "negotiate" a block progression
> adjustment with the LM which created them. In this phase each LineLM
> knows how many lines it will finally create.
>
> Then, a new sequence of elements is created, and this time each element
> has a fixed width (as the adjustments have already been decided).
> This sequence is used to create the final pages; note that if the
> adjustments have been enough to perfectly fill the pages, a first fit
> algorithm would be enough to recreate the right page breaks.
> This phase is needed, at the moment, because the Positions that the
> LineLMs store in their elements are not LineBreakPosition (as they still
> don't know how many lines they have to create), but maybe it could be
> avoided in some way ...
>
> Don't hesitate to ask for further details, I'll try to answer as clearly
> as possible!
>
> As per the columns, I did not think about them yet, but if they are
> equally wide it shouldn't be terribly hard to handle them ...

Yes, I think so, too.
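
Coming back to the check for a changing available IPD: a rough sketch of what I have in mind follows. It assumes we can collect the available IPD (in millipoints) of every page-master a page-sequence may use. The class, the enum and the rule "constant IPD means total-fit, otherwise best-fit" are assumptions for illustration only. Side-floats are not covered at all.

```java
import java.util.List;

public class BreakStrategySketch {
    /** Hypothetical set of page-breaking strategies to choose from. */
    enum Strategy { TOTAL_FIT, BEST_FIT }

    /** True if every page-master offers the same available IPD. */
    static boolean ipdIsConstant(List<Integer> availableIpds) {
        for (int ipd : availableIpds) {
            if (ipd != availableIpds.get(0)) {
                return false;
            }
        }
        // An empty or single-entry list counts as constant.
        return true;
    }

    /**
     * Assumed rule: total-fit is only safe when the IPD never changes,
     * because otherwise line breaking would have to be redone after
     * a page break has been chosen.
     */
    static Strategy choose(List<Integer> availableIpds) {
        return ipdIsConstant(availableIpds) ? Strategy.TOTAL_FIT : Strategy.BEST_FIT;
    }
}
```

With two page-masters of 400000 mpt each this would pick TOTAL_FIT; as soon as one of them is narrower it would fall back to BEST_FIT.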

One feature of a deluxe strategy for book-style documents is certainly the alignment of lines between facing pages, but that's not important at the moment.
I'd be very interested to hear what you think about the difficulty of changing available IPD. The more I think about it, however, the more I think the total-fit model gets too complicated for what we/I need right now. But I'm unsure here.

Jeremias Maerki