Re: Element list generation for tables (special case)

Jeremias Maerki Thu, 28 Jul 2005 01:10:51 -0700

On 27.07.2005 23:26:48 Andreas L Delmelle wrote:
> On Jul 27, 2005, at 20:45, Jeremias Maerki wrote:
> 
> Hi,
> 
> > I got a test case for tables which raises not a technical but rather an
> > interesting conceptual question. Please have a look at the attached
> > test case. It defines a table with two columns and two rows. In the given
> > setup the second row creates a break decision with the current code
> > that can be argued as being bad (see the PDF).
> 
> Indeed, doesn't look right. Given the value for the orphans property,  
> one still would reasonably expect the break to occur before the first  
> cell of the second row.


...or after the first 3 lines of the second row.

> BTW: tried adding a third column mirroring the first, and this leads to  
> ONLY the second column being moved to the next page... This as a  
> further demonstration that the currently produced result still leaves a  
> bit to be desired. (see attach)

That was to be expected because the element lists from the first and
third columns will likely be the same and therefore won't produce a
different combined element list.

> > Here's an excerpt from the element list:
> >
> >  8) box w=9600
> >  9) penalty p=0 w=0
> > 10) box w=28800
> > 11) penalty p=0 w=0
> > 12) box w=0             //<-- this is where the second row starts
> > 13) penalty p=0 w=9600  //this penalty is due to the possible break after "B"
> > 14) box w=28800
> > 15) penalty p=0 w=0     //this is the next break poss after three lines
> >                         //due to the orphan setting
> > 16) box w=28800
> >
> > While working on element list generation for tables I came across this
> > question and decided not to do anything about it, especially since
> > removing some of these break possibilities might not be desirable in
> > all cases.
> >
> > A rule that could be easily implemented would be that we allow the
> > first break possibility only after every cell in a new row contributed at
> > least one of its own boxes to the combined element list.
> 
> So IOW, if I get this correctly: all break possibilities are to be  
> considered preliminary until the last cell occupying this row (= last  
> grid-unit in the row) has been taken into account?

Almost. In different words again: the first step is only taken after
each newly started cell in a new row has contributed at least one box
to the combined element list. I wouldn't want to work with something
like a preliminary break possibility, as it suggests that you somehow
have to revisit the list. I'd rather improve the getNextStep method so
that it only returns for the first time once the above rule is met.
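
To make the rule concrete, here is a minimal sketch in Java. It is not
the FOP code and not the real getNextStep(); the class and method names
are made up for illustration, and the per-cell box heights for the
second row (9600 for the first cell, two boxes of 28800 for the second)
are an assumption read off the element list excerpt quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in FOP.
final class FirstStepRule {

    /** Cumulative heights at which any cell could break (after each of its boxes). */
    static TreeSet<Integer> candidateSteps(int[][] cellBoxes) {
        TreeSet<Integer> steps = new TreeSet<>();
        for (int[] boxes : cellBoxes) {
            int sum = 0;
            for (int w : boxes) {
                sum += w;
                steps.add(sum);
            }
        }
        return steps;
    }

    /** Smallest height at which every cell has contributed at least one box. */
    static int minFirstStep(int[][] cellBoxes) {
        int min = 0;
        for (int[] boxes : cellBoxes) {
            if (boxes.length > 0) {
                min = Math.max(min, boxes[0]);
            }
        }
        return min;
    }

    /** Steps of the combined list; with the rule, early steps are never produced. */
    static List<Integer> combinedSteps(int[][] cellBoxes, boolean applyRule) {
        int threshold = applyRule ? minFirstStep(cellBoxes) : 0;
        List<Integer> result = new ArrayList<>();
        for (int step : candidateSteps(cellBoxes)) {
            if (step >= threshold) {
                result.add(step);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] secondRow = { {9600}, {28800, 28800} }; // assumed cell contents
        System.out.println(combinedSteps(secondRow, false)); // [9600, 28800, 57600]
        System.out.println(combinedSteps(secondRow, true));  // [28800, 57600]
    }
}
```

With the rule applied, the step at 9600 is simply never returned, so
nothing has to be revisited or discarded afterwards.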

> > An example: If you look at page 1 of [1], step 1 would be ignored. On
> > page 3 of [1], the steps 1 and 2 would be ignored.
> > [1] http://people.apache.org/~jeremias/fop/KnuthBoxesForTablesWithBorders.pdf
> 
> Hmm... Do you mean that the steps would be performed but their results  
> discarded, or that the steps simply would not be performed at all?

Not performed at all. See above.

> I'd think the first, but just want to make sure...
> 
> Are the break possibilities currently considered only at the level of  
> the table body --so the element list contains the elements for the  
> cells' boxes, but no separate elements/indicators of row-boundaries?

We seem to have a different vocabulary for expressing this. I don't think
we can say that the breaks are considered at table body level. And you
have to be careful about which element list you are speaking of: the
individual cell element lists or the effective combined element list.
Let me explain how this is implemented:

The TableRowIterator simply provides effective rows with grid units.
For TableContentLM it chooses an array of effective rows which forms a
row group so that no column-spanned cell is split between groups. See
the Wiki for details. Such a row group is the minimal work item for
combining element lists. There is always a break possibility before and
after a row group (except if there is a keep constraint on a row, for
example). Inside a row group the break possibilities are determined by
the getNextStep() method where the combined element list is created.

> In that case --with the risk of underestimating the complexity of what  
> I propose--, perhaps an alternative to the suggested rule would be to  
> insert a step that combines the generated boxes/penalties only after  
> the element list for the last grid unit in a logical row has been  
> created (?) Anyway, instead of simply ignoring those steps, we could  
> also increase the penalty value for the offending break possibility  
> (currently: p=0 for all of them)
> So, IOW, for each row, store the element lists, and after all lists are  
> available, review the calculated penalty values, increasing them when a  
> given break possibility has undesirable consequences when the other  
> element lists for the row are taken into account.
> Or the other way around: give them a default penalty value that is high  
> enough, then afterwards decreasing them for the most favorable break  
> possibilities.
> Or modify all boxes' widths (=heights) to be equal to the largest box.
> After this step is completed, add the combined element list to the body.

These are all valid possibilities, but as I hinted, I want to discuss
this at the conceptual level, not the implementation level. I want to
know if we can have a general rule that we don't allow breaks until
every cell has contributed at least one box to the combined element
list. Also, Simon and you are talking about providing higher penalty
values, but I asked whether to allow a break at all (i.e. INFINITE
penalty, or rather no penalty at all, only a box). Considering a penalty
value p<INFINITE requires a decision that such breaks are
possible/desirable in the first place.
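
The difference between "discouraged" and "not allowed" can be shown with
a small sketch. It assumes the usual Knuth convention that a break is
legal at a penalty only if p<INFINITE. The class below is a made-up
illustration, not FOP's element classes, and the value chosen for
INFINITE is a stand-in.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not FOP's Knuth element classes.
final class BreakFeasibility {

    /** Stand-in for "break forbidden"; the concrete value is an assumption. */
    static final int INFINITE = 1000;

    /** Minimal Knuth-style element: a box, or a penalty with value p. */
    static final class Element {
        final boolean isPenalty;
        final int p;
        final int w;

        private Element(boolean isPenalty, int p, int w) {
            this.isPenalty = isPenalty;
            this.p = p;
            this.w = w;
        }

        static Element box(int w) { return new Element(false, 0, w); }
        static Element penalty(int p, int w) { return new Element(true, p, w); }
    }

    /** Counts positions where a break is legal (penalty with p < INFINITE). */
    static int legalBreaks(List<Element> list) {
        int count = 0;
        for (Element e : list) {
            if (e.isPenalty && e.p < INFINITE) {
                count++;
            }
        }
        return count;
    }

    /** Elements 12-16 of the excerpt, with the penalty after "B" set to p. */
    static List<Element> secondRow(int p) {
        return Arrays.asList(
                Element.box(0), Element.penalty(p, 9600),
                Element.box(28800), Element.penalty(0, 0), Element.box(28800));
    }

    /** The list under the proposed rule: no element for the early break at all. */
    static List<Element> secondRowWithRule() {
        return Arrays.asList(
                Element.box(28800), Element.penalty(0, 0), Element.box(28800));
    }

    public static void main(String[] args) {
        System.out.println(legalBreaks(secondRow(0)));        // 2
        System.out.println(legalBreaks(secondRow(500)));      // 2
        System.out.println(legalBreaks(secondRow(INFINITE))); // 1
        System.out.println(legalBreaks(secondRowWithRule())); // 1
    }
}
```

Raising the penalty to any finite value leaves the early break in the
set of legal breaks; only INFINITE, or omitting the penalty altogether,
removes it.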

> IIC, the two separate element lists for the second row would be:
> 
> First grid unit:
> 1) box w=9600
> 2) penalty p=0 w=0
> 
> Second grid unit:
> 1) box w=28800
> 2) penalty p=0 w=0
> 
> So, compare the first boxes' widths and, since the first box in the  
> first list is smaller than that in the second list, either increase the  
> penalty value for the second step in the first list, or change the  
> width of the first box in the first list. Maybe the latter is more  
> attractive, since the resulting combined list can then be created by  
> concatenating the two separate lists...

I really don't worry so much about the implementation of the rule since
it's probably easily done by modifying the exit criteria in one of the
loops in getNextStep().

> [Admitted: this particular case is rather simple, since both lists only  
> have one box.]
> 
> Then combine the lists to arrive at the result below:
> 
> > With this rule the element list would look like this:
> <snip />
> >
> > 12) box w=28800         //<-- this is where the second row starts
> > 13) penalty p=0 w=0
> > 14) box w=28800
> >
> > I'm unsure ATM what this would mean for cases with row spanning, though.
> 
> As long as the criterion is that every _grid unit_ for the (logical)  
> row in question must have contributed at least one box, I wouldn't  
> expect any particular problem.

Right, but is that rule OK or not from a conceptual point of view? Are
there any cases where it might be bad?

> > I can see that this new rule would make this better in most cases. What
> > worries me is that there might be cases where we wouldn't want that
> > behaviour, although ATM I can't see them. So I just want to check with
> > you that I haven't forgotten about anything. Or maybe someone has a
> > better rule to implement this. Thoughts welcome.

Sorry for insisting, but we need to talk about the same thing and at the
same level.


Jeremias Maerki
