Re: Collapsing borders/Tables: Knuth element generation questions (possible ideas?)
On 20.09.2005 23:50:12 Andreas L Delmelle wrote: Hi, Jeremias, Luca or Simon will probably be able to make the most sense out of it, but if there's anyone else that can add a few comments, feel free to do so. (FYI: This is completely separate from my idea to move the border-collapsing to the FOTree.) Now, I'm still not fully at home in the Knuth element generation algorithm, so I don't know exactly whether what I'm about to describe is at all feasible/doable. Maybe it's currently already done this way, and I'm missing the point somewhere... In that case: sorry for the noise. :-/ Here goes: I get the impression that the elements for borders and those for the content of the cells are created in one single pass, which seems to be the source of the so-called 'interaction problem' --IIC, this refers to the situation where, for example, we have already generated the AFTER border elements for the first two cells, while it's only when generating the elements for the third cell that a break is triggered. So, the obtained border- and content-elements become invalid, and need to be re-evaluated (possibly taking the footer into account). Is this a correct assessment of the issue? Unfortunately not. I get the impression that you haven't understood, yet, how the Knuth approach works. We don't reevaluate any decisions in this approach, but rather calculate ALL(!) possible decisions beforehand and incorporate them into the element list we generate. The breaker will merely choose a break possibility and the addAreas stage will paint the results given the break decision. The only reevaluation will happen if we start to implement support for the changing available IPD problem, i.e. when the available IPD is different from page to page within the same page-sequence. In this case we will need to be able to recreate the element list from an arbitrary former break possibility on forward which means that all decisions are reevaluated from this point on due to changed environmental factors (the IPD). Even the line-breaking has to be redone, although the inline element list will not have to be recreated. This calculation of all possible decisions when generating the element list is exactly the same problem I'm currently facing with space resolution. I have to precalculate all space resolution scenarios for every single break possibility in order to be able to create the right element list. Mind-breaking, I tell you.. :-) Am I correct when I say that this problem doesn't pose itself when the break would occur in the first cell of the row(group)? If so, I'm wondering whether it could help if the element generation for row(groups) were split up in two (possibly three passes) and be made to look like the following (in pseudo-code): while( rowIterator.hasNext() ) { if( firstRowGroupInPageOrColumn ) { generateBeforeBorderElements(); } generateAfterBorderElements(); generateContentElements(); } So, by the time we get to generating boxes/glues/penalties for the content of the cells, we would already have the minimum/maximum widths for *all* possible AFTER border elements in the row. The generateAfterBorderElements() step would create two element lists: - one to use if there is no page- or column-break - an alternate list to use in case the content triggers a break (which would then include all elements for the footer, if any) I don't think something like that is possible. During my analysis I found that the effective borders influence the Knuth element generation a lot. You can't separate the borders from the content. Have a look at the notes in the Wiki. They show this interaction. It's all documented there. The element list generation is fully implemented for the separate border model. For the collapsing border model, several examples are documented and fully calculated. The only thing left is the algorithm to handle all the little difficulties arising from the collapsing border model. The most important pages for implementing the collapsing border model are these: http://wiki.apache.org/xmlgraphics-fop/TableLayout/KnuthElementsForTables/RowBorder http://wiki.apache.org/xmlgraphics-fop/TableLayout/KnuthElementsForTables/RowBorder2 http://wiki.apache.org/xmlgraphics-fop/TableLayout/KnuthElementsForTables/HfIntegrationInSteppingAlgorithm Maybe both lists could be made to include the elements for the AFTER padding as well (? since we have to iterate over the cells/grid-units anyway). Eventually only one of the two lists will be merged with the content element list, depending on the situation after the content element list completely known, but it would become a matter of inserting the right list (and discarding the incorrect one --at least, throwing away its elements). The only drawback I immediately see is that the generateAfterBorderElements() step would have to make the comparison with the footer- or table-borders for
Re: Collapsing borders/Tables: Knuth element generation questions (possible ideas?)
On Wed, 21 Sep 2005 02:50 pm, Jeremias Maerki wrote: On 20.09.2005 23:50:12 Andreas L Delmelle wrote: Hi, snip/ Here goes: I get the impression that the elements for borders and those for the content of the cells are created in one single pass, which seems to be the source of the so-called 'interaction problem' --IIC, this refers to the situation where, for example, we have already generated the AFTER border elements for the first two cells, while it's only when generating the elements for the third cell that a break is triggered. So, the obtained border- and content-elements become invalid, and need to be re-evaluated (possibly taking the footer into account). Is this a correct assessment of the issue? Unfortunately not. I get the impression that you haven't understood, yet, how the Knuth approach works. We don't reevaluate any decisions in this approach, but rather calculate ALL(!) possible decisions beforehand and incorporate them into the element list we generate. The breaker will merely choose a break possibility and the addAreas stage will paint the results given the break decision. Andreas, FWIW I struggled with a similar misconception when doing the conditional borders on inline elements (with is a very simplified version of of your problem). Luckily Luca set me straight very quickly. At every break possibility (space, hyphen, hyphenation point, ... for the ipd) you have to add all the Knuth elements required to model a) what happens if a break is inserted and b) what happens if no break is created. Amazingly the Knuth approach of just using box, penalty and glue elements can handle that. However, the only thing the Knuth elements and the line breaking algorithm do is to reserve the correct amount of space. Adding the actual borders is done when the areas are created. The only reevaluation will happen if we start to implement support for the changing available IPD problem, i.e. when the available IPD is different from page to page within the same page-sequence. In this case we will need to be able to recreate the element list from an arbitrary former break possibility on forward which means that all decisions are reevaluated from this point on due to changed environmental factors (the IPD). Even the line-breaking has to be redone, although the inline element list will not have to be recreated. Jeremias, can you explain to me why we have to reevaluate? Can't the line breaking code simply ask the LM for the new IPD when it inserts a page break and then continue with the new IPD? Yes, the question on the new IPD when ask of a LM may have to ripple up the LM chain until we get to a LM which can actually answer it. But is that conceptually flawed? snip/ Jeremias Maerki Manuel
Re: Collapsing borders/Tables: Knuth element generation questions (possible ideas?)
On 21.09.2005 09:52:00 Manuel Mall wrote: On Wed, 21 Sep 2005 02:50 pm, Jeremias Maerki wrote: snip/ The only reevaluation will happen if we start to implement support for the changing available IPD problem, i.e. when the available IPD is different from page to page within the same page-sequence. In this case we will need to be able to recreate the element list from an arbitrary former break possibility on forward which means that all decisions are reevaluated from this point on due to changed environmental factors (the IPD). Even the line-breaking has to be redone, although the inline element list will not have to be recreated. Jeremias, can you explain to me why we have to reevaluate? Let me explain by showing the flow of events for a simple block with a long text in it: - BlockLM.getNextKnuthElements is called indirectly by the PSLM which provides the available IPD in the layout context. - The BlockLM calls on the LineLM to do the line-breaking passing on the available IPD. - The LineLM creates the element list for its content. (IPD is irrelevant here) - The LineLM does the line breaking (by using the IPD value in the layout context) and creates a box element for each created line and penalties between lines. - The BlockLM receives the element list from the LineLM and integrates it into it own element list. - The resulting element list is returned to the parents until we're back in the PSLM which invokes the breaker. All break decisions for the whole sequence are generated at this point. - Let's assume the contents don't fit in the first page and we get at least one break point. Let's assume further that the first page is an A4 portrait page and the second is an A4 landscape page, i.e. the available IPD is bigger on the second page. - The first page is generated normally, all lines are properly generated. - The second page has the problem that the available IPD is different, but the line breaks have all been done for the IPD of the first page, because the LineLM has no chance of knowing on which page it will end up. because the break decisions in b-p-direction are done later. - Now the layout has to be backtracked to the point of the first break after which the available IPD is different. From there on the lines have to be rebroken to get line boxes which work with the right IPD. Note that the element list for the inline stuff doesn't have to be recreated. Only the line breaker gets a new available IPD. Due to different break decisions you get a different set of b-p-d elements which also have to be broken over the pages. It might be possible to optimize the amount of lines that are precreated to avoid unnecessary work in these conditions but these will be heuristics and guesses and most of all only approximations. At the very least this will get messy. The biggest problem will be tables whose LMs have to be made restartable [1]. The other LM will probably be easier. We used to have a restart mechanism before introducing the Knuth approach. Something like that will need to be reintroduced at some point. [1] http://wiki.apache.org/xmlgraphics-fop/TableLayout/BreakHandling Can't the line breaking code simply ask the LM for the new IPD when it inserts a page break and then continue with the new IPD? It's not that immediate as the LineLM has to do the line breaking before the page breaking can be done. Yes, the question on the new IPD when ask of a LM may have to ripple up the LM chain until we get to a LM which can actually answer it. But is that conceptually flawed? It doesn't work that way with the Knuth approach. HTH Jeremias Maerki
Re: Collapsing borders/Tables: Knuth element generation questions (possible ideas?)
On Wed, 21 Sep 2005 04:57 pm, Jeremias Maerki wrote: On 21.09.2005 09:52:00 Manuel Mall wrote: On Wed, 21 Sep 2005 02:50 pm, Jeremias Maerki wrote: snip/ Jeremias, can you explain to me why we have to reevaluate? Let me explain by showing the flow of events for a simple block with a long text in it: snip/ It's not that immediate as the LineLM has to do the line breaking before the page breaking can be done. Ok, I see we do all the line breaking first followed by the page breaking therefore the Line LM when creating a break as no idea that this may or may not become a page break and happily continues line breaking with the given ipd. Yes, the question on the new IPD when ask of a LM may have to ripple up the LM chain until we get to a LM which can actually answer it. But is that conceptually flawed? It doesn't work that way with the Knuth approach. Yes, I see because we use a two pass approach line breaking and page breaking use different set of Knuth elements. HTH Yes it did, thanks. Jeremias Maerki Manuel
Collapsing borders/Tables: Knuth element generation questions (possible ideas?)
Hi, Jeremias, Luca or Simon will probably be able to make the most sense out of it, but if there's anyone else that can add a few comments, feel free to do so. (FYI: This is completely separate from my idea to move the border-collapsing to the FOTree.) Now, I'm still not fully at home in the Knuth element generation algorithm, so I don't know exactly whether what I'm about to describe is at all feasible/doable. Maybe it's currently already done this way, and I'm missing the point somewhere... In that case: sorry for the noise. :-/ Here goes: I get the impression that the elements for borders and those for the content of the cells are created in one single pass, which seems to be the source of the so-called 'interaction problem' --IIC, this refers to the situation where, for example, we have already generated the AFTER border elements for the first two cells, while it's only when generating the elements for the third cell that a break is triggered. So, the obtained border- and content-elements become invalid, and need to be re-evaluated (possibly taking the footer into account). Is this a correct assessment of the issue? Am I correct when I say that this problem doesn't pose itself when the break would occur in the first cell of the row(group)? If so, I'm wondering whether it could help if the element generation for row(groups) were split up in two (possibly three passes) and be made to look like the following (in pseudo-code): while( rowIterator.hasNext() ) { if( firstRowGroupInPageOrColumn ) { generateBeforeBorderElements(); } generateAfterBorderElements(); generateContentElements(); } So, by the time we get to generating boxes/glues/penalties for the content of the cells, we would already have the minimum/maximum widths for *all* possible AFTER border elements in the row. The generateAfterBorderElements() step would create two element lists: - one to use if there is no page- or column-break - an alternate list to use in case the content triggers a break (which would then include all elements for the footer, if any) Maybe both lists could be made to include the elements for the AFTER padding as well (? since we have to iterate over the cells/grid-units anyway). Eventually only one of the two lists will be merged with the content element list, depending on the situation after the content element list completely known, but it would become a matter of inserting the right list (and discarding the incorrect one --at least, throwing away its elements). The only drawback I immediately see is that the generateAfterBorderElements() step would have to make the comparison with the footer- or table-borders for each and every row, unless we were to do this only in case the remaining page- or column-BPD has dropped below a certain threshold. The only remaining problems would then be that: a) there may be row(groups) whose content is so large that the remaining BPD is more than enough before the content's elements are generated, but only drops below the threshold during the generateContentElements() step. b) there's always the possibility of a forced break, regardless of the remaining BPD The creation of the alternate element list should therefore be implemented as a separate step that can be triggered either during generateAfterBorderElements() or generateContentElements(). In any case, besides gaining certainty about min- or max-border-widths, splitting up the element generation in 2-3 passes would allow us to gain a few hints on the content to get an idea of the probability of a page- or column-break. I mean: without actually triggering creation of a full element list for the content, we could maybe do a quick traverse of the FOTree-fragment contained in each cell to see if any of its descendants have a break-* property specified. To make an even more educated guess, perhaps we could even perform some off-hand calculations based on the average font-size, the number of blocks, the number of characters of the descendant FOText nodes, the content-height for contained images... But this all *without* generating the elements. Only minimal communication with the actual childLMs in that step, placing the focus on the FONode-elements (= the list returned by TableCell.getChildNodes()) and their properties. Does this make any sense? Cheers, Andreas