Re: Skype-conference on page-breaking?

Jeremias Maerki Mon, 07 Mar 2005 01:19:00 -0800

Thanks, Luca.

I had a nice casual talk on the phone with Simon yesterday.
Essentially, we only talked about very high-level stuff, especially the
decision for a certain strategy (or two). As you know, I came up with
the idea of creating a simpler best-fit strategy with no look-ahead for
invoice-style documents, but maybe it would be possible to design your
(evidently) total-fit strategy in such a way that it could also be used
as a best-fit strategy without look-ahead. The problem, as I mentioned
already, is that the available IPD may change within a page-sequence,
which may force back-tracking and recalculation of vertical boxes.

Of course, if it's possible to stay with one page-breaking algorithm for
all use cases that would be best (because of the reduced effort), but
only if the algorithm is reasonably fast for invoice-style documents.
I'm repeatedly confronted with certain speed requirements in this case.
Since modern high-volume single-feed printers handle about 180 pages per
minute (continuous feed systems handle over 4 times that speed, but I
think that's neither relevant nor realistic here), FOP should be able to
operate close to these 180 pages per minute for not too complex
documents on a modern server. That means about 333ms per page. Not much.
Of course, in such an environment it is possible to distribute the
formatting process over several blade servers, but I had to realize that
certain companies tend to prefer spending 100'000 dollars on a big
server to spending a lot less on a much faster CPU-power-oriented
setup. It seems to be hard to say good-bye to the old host systems. Well,
that's just what reality looks like in my environment.
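
For reference, the per-page budget follows directly from the printer
speed. A minimal sketch of that arithmetic (the class and method names
are made up for illustration, nothing here is FOP code):

```java
final class PageBudget {
    /** Milliseconds available per page at a given speed in pages per minute. */
    static double millisPerPage(double pagesPerMinute) {
        return 60_000.0 / pagesPerMinute;
    }

    public static void main(String[] args) {
        // Single-feed printer at about 180 pages per minute.
        System.out.println(millisPerPage(180.0) + " ms per page");
        // Continuous feed at roughly four times that speed.
        System.out.println(millisPerPage(720.0) + " ms per page");
    }
}
```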

Simon, for example, is much more interested in book-style documents
where there are other requirements. Speed is not a big issue, but
quality is.

In the end, I think we need to rate the chosen approach from these two
points of view. The requirements conflict strongly, and it seems quite
important to me not to forget that here.

Luca, do you think your total-fit approach could be written in such a
way that it handles changing available IPDs and that look-ahead can be
disabled to improve processing speed at the cost of optimal break
decisions? If it's ok for you (and feasible) I'd like to integrate what
you already have (in code) into that branch I was talking about. I would
like to avoid recreating something you've already started, even if it
doesn't work with the changes that happened in the last few weeks. Even
if we end up creating two different strategies, I'm sure that certain
parts will be shared by both approaches, like the creation of
Knuth-style elements for the PageLM.

Some more comments inline:

On 04.03.2005 13:23:01 Luca Furini wrote:
> 
> Jeremias Maerki wrote:
> 
> >Would you consider sharing what you already
> >have? This may help us in the general discussion and may be a good
> >starting point.
> 
> Ok, I'll try to.
> 
> The main change in the LineLM is that the line breaking algorithm does not
> select only the node in activeList with fewest demerits: all the nodes
> whose demerits are <= a threshold are used to create LineBreakPositions,
> so for each paragraph there is a set of layout options (for example, a
> paragraph could create 8 to 10 lines, 9 being the layout with fewest
> demerits).

Hmm, I would say that's a feature only book-style documents will need.
Invoice-style documents could live without it.
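
To make the difference concrete, here is a rough sketch of the two
selection methods as I understand them from the description above. The
Node class, the method names and the demerit values are invented for
illustration; this is not the actual LineLM code.

```java
import java.util.ArrayList;
import java.util.List;

final class LayoutOptions {
    /** Hypothetical stand-in for a node in activeList. */
    static final class Node {
        final int lineCount;
        final double demerits;

        Node(int lineCount, double demerits) {
            this.lineCount = lineCount;
            this.demerits = demerits;
        }
    }

    /** Classic selection: only the node with the fewest demerits survives. */
    static Node best(List<Node> activeList) {
        Node best = activeList.get(0);
        for (Node n : activeList) {
            if (n.demerits < best.demerits) {
                best = n;
            }
        }
        return best;
    }

    /** Threshold selection: every node within the threshold is a layout option. */
    static List<Node> withinThreshold(List<Node> activeList, double threshold) {
        List<Node> options = new ArrayList<>();
        for (Node n : activeList) {
            if (n.demerits <= threshold) {
                options.add(n);
            }
        }
        return options;
    }

    public static void main(String[] args) {
        List<Node> active = new ArrayList<>();
        active.add(new Node(8, 1400.0));   // made-up demerits
        active.add(new Node(9, 900.0));
        active.add(new Node(10, 1250.0));
        active.add(new Node(11, 5200.0));
        System.out.println("best: " + best(active).lineCount + " lines");
        for (Node n : withinThreshold(active, 1500.0)) {
            System.out.println("option: " + n.lineCount + " lines");
        }
    }
}
```

An invoice-style strategy would only ever call something like best();
the threshold variant is what gives a book-style strategy room to
negotiate line counts later.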

> According to the value of widows and orphans, the LineLM creates a
> sequence of elements: besides "normal" lines, represented by a box, there
> are "optional lines", represented by
>   box(0) penalty(inf,0) glue(0,1,0) box(0)
> and "removable lines"
>   box(0) penalty(inf,0) glue(1,0,1) box(0)
> A few complications arise if not every possible layout allows breaks
> between lines, but they all can be solved using boxes, glues and
> penalties (for example, if a paragraph needs 3 or 4 lines, if it uses 3
> it cannot be parted).

Also something that's not all too important for invoice-style documents,
although it can't hurt to have it.
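
Just to check my own reading of these sequences, a small sketch. It
assumes that glue(a,b,c) means (width, stretch, shrink) counted in
lines, that penalty(inf,0) means (penalty value, width), and that a
normal line is a box of width 1. All class and method names are
invented; none of this is taken from your code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LineElements {
    /** Stand-in for an infinite penalty, i.e. a forbidden break. */
    static final int INFINITE = Integer.MAX_VALUE;

    enum Kind { BOX, GLUE, PENALTY }

    static final class Element {
        final Kind kind;
        final int width, stretch, shrink, penalty;

        Element(Kind kind, int width, int stretch, int shrink, int penalty) {
            this.kind = kind;
            this.width = width;
            this.stretch = stretch;
            this.shrink = shrink;
            this.penalty = penalty;
        }
    }

    static Element box(int width) {
        return new Element(Kind.BOX, width, 0, 0, 0);
    }

    static Element glue(int width, int stretch, int shrink) {
        return new Element(Kind.GLUE, width, stretch, shrink, 0);
    }

    static Element penalty(int value, int width) {
        return new Element(Kind.PENALTY, width, 0, 0, value);
    }

    static List<Element> normalLine() {
        return Arrays.asList(box(1));
    }

    /** box(0) penalty(inf,0) glue(0,1,0) box(0): may grow by one line. */
    static List<Element> optionalLine() {
        return Arrays.asList(box(0), penalty(INFINITE, 0), glue(0, 1, 0), box(0));
    }

    /** box(0) penalty(inf,0) glue(1,0,1) box(0): may shrink by one line. */
    static List<Element> removableLine() {
        return Arrays.asList(box(0), penalty(INFINITE, 0), glue(1, 0, 1), box(0));
    }

    /** Returns {minimum, natural, maximum} line count; penalties add no height. */
    static int[] lineRange(List<Element> sequence) {
        int natural = 0, stretch = 0, shrink = 0;
        for (Element e : sequence) {
            if (e.kind == Kind.PENALTY) {
                continue;
            }
            natural += e.width;
            stretch += e.stretch;
            shrink += e.shrink;
        }
        return new int[] {natural - shrink, natural + 0, natural + stretch};
    }

    public static void main(String[] args) {
        List<Element> paragraph = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            paragraph.addAll(normalLine());
        }
        paragraph.addAll(removableLine());
        paragraph.addAll(optionalLine());
        System.out.println(Arrays.toString(lineRange(paragraph)));
    }
}
```

Under these assumptions, eight normal lines plus one removable and one
optional line describe the "8 to 10 lines, 9 being the best" paragraph
from your earlier example.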

> The BlockLM, and a block stacking LM in general, adds elements
> representing its children's spaces and keep condition, for example
> adding a 0 penalty or an infinite penalty according to
> child1.mustKeepWithNext(), child2.mustKeepWithPrevious() and
> this.mustKeepTogether().

That's certainly a must-have in any case.
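
In its simplest form I picture it roughly like this. It is a sketch
only: the three booleans stand for the results of the methods you
mention, and the class name and the INFINITE constant are made up.

```java
final class KeepPenalty {
    /** Stand-in for an infinite penalty, i.e. a forbidden break. */
    static final int INFINITE = Integer.MAX_VALUE;

    /**
     * Penalty value to place between two adjacent children of a
     * block-stacking LM: infinite if any keep condition forbids a
     * break at this point, 0 (a free break) otherwise.
     */
    static int penaltyBetween(boolean child1KeepWithNext,
                              boolean child2KeepWithPrevious,
                              boolean parentKeepTogether) {
        boolean breakForbidden =
                child1KeepWithNext || child2KeepWithPrevious || parentKeepTogether;
        return breakForbidden ? INFINITE : 0;
    }

    public static void main(String[] args) {
        System.out.println(penaltyBetween(false, false, false));
        System.out.println(penaltyBetween(true, false, false) == INFINITE);
    }
}
```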

> The PageLM, once it has the list of elements representing a whole
> page-sequence (or the content before a forced page break), calls the same
> breaking algorithm, using only a different selection method which leaves
> only one node in activeList.

That's the part where I have a big question mark regarding changing
available IPD. We may need a check that determines whether the available
IPD changes within a page-sequence by inspecting the page-masters. That
would allow us to switch automatically between total-fit and best-fit,
or maybe even first-fit. Another open question is side-floats, as they
influence the available IPD on a line-to-line basis.
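
Such a check could look roughly like the following sketch. It assumes
we can collect the body width of every page-master a page-sequence may
use into a plain array (in millipoints); the class, the enum and the
method names are hypothetical, and side-floats are not covered at all.

```java
final class StrategyChooser {
    enum Strategy { TOTAL_FIT, BEST_FIT }

    /** True if not all page-masters offer the same available IPD. */
    static boolean ipdChanges(int[] bodyWidths) {
        for (int i = 1; i < bodyWidths.length; i++) {
            if (bodyWidths[i] != bodyWidths[0]) {
                return true;
            }
        }
        return false;
    }

    /** Falls back to best-fit as soon as the available IPD can change. */
    static Strategy choose(int[] bodyWidths) {
        return ipdChanges(bodyWidths) ? Strategy.BEST_FIT : Strategy.TOTAL_FIT;
    }

    public static void main(String[] args) {
        System.out.println(choose(new int[] {451000, 451000}));
        System.out.println(choose(new int[] {451000, 396000}));
    }
}
```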

> It has now a rough sequence of pages: each one may have a positive or
> negative difference (with respect to the page height); the glue elements
> representing adjustable lines or adjustable spaces in a page are collected
> in different lists and they are used to "negotiate" a block progression
> adjustment with the LM which created them. In this phase each LineLM
> knows how many lines it will finally create.
> 
> Then, a new sequence of elements is created, and this time each element
> has a fixed width (as the adjustments have already been decided).
> This sequence is used to create the final pages; note that if the
> adjustments have been enough to perfectly fill the pages, a first fit
> algorithm would be enough to recreate the right page breaks.
> This phase is needed, at the moment, because the Positions that the
> LineLMs store in their elements are not LineBreakPosition (as they still
> don't know how many lines they have to create), but maybe it could be
> avoided in some way ...
> 
> Don't hesitate to ask for further details, I'll try to answer as clearly
> as possible!
> 
> As per the columns, I did not think about them yet, but if they are
> equally wide it shouldn't be terribly hard to handle them ...

Yes, I think so, too. One thing for a deluxe strategy for book-style
documents is certainly alignment of lines between facing pages. But
that's something that's not important at the moment.

I'd be very interested to hear what you think about the difficulty of
changing available IPD. The more I think about it, however, the more I
think the total-fit model gets too complicated for what we/I need right
now. But I'm unsure here.

Jeremias Maerki
