Improving Keeps and Breaks
Hi all, Caution, long post ;-) I’ve been thinking about the handling of keeps and breaks in tables for a while, and it seems to me that improvements could be done in that whole area. I’ll use break-before as an example but what I’ll be saying applies to break-after and keeps as well. Currently, for every object to which the break-before property applies, the value of the property is checked at the beginning of the getNextKnuthElements method and, if it corresponds to a hard-break, a Knuth penalty with -inf value is produced. This has several shortcomings: - this leads to much code duplication; - if, say, an fo:block has another fo:block as a child and both have break-before, two penalties will be generated instead of one (although I’m suddenly not so sure of that, more later) - this makes it difficult to improve breaks to distinguish columns from pages, and keeps to take integer values into account. In fact, we don’t so much care about the penalty element itself as whether there /should/ be a break between elements or not. I mean, the LMs which actually care are those which concatenate the elements: fo:flow, fo:block, fo:block-container, fo:table-cell, etc. And they all do it the same way: iterate over the children LMs for each LM: if there is a following LM, then: if the current LM has break-after or the following LM has break-before, then generate a Knuth forced break So the main question is to know whether there is a break before a LM. And here that’s quite simple, there are only a few shared behaviours: indeed there is a break-before on an element if: - the break-before property is set on the element itself or the first of its children: fo:block, fo:block-container, fo:list-block - it is set on the element itself or any of its children: fo:table-row, fo:list-item - it is set on the first of its children: fo:table-caption, fo:table-cell, fo:list-item-body, fo:list-item-label, fo:wrapper - special: - fo:table: on itself or first table-body - fo:table-body: on the first fo:table-row child if any; otherwise on any of fo:table-cell children making the first row - not applicable: fo:table-column, fo:table-header/footer, fo:float, fo:footnote-body So there would just be a couple of methods to write, for each behaviour. We could (for the moment) define a dedicated class with static methods, which would be called by each LM (some methods are imaginary, but you get the idea): public class KeepsAndBreaksHandler { public static boolean breakBeforeSelfOrFirstChild(LM lm) { return lm.getProperty(break-before).isForced() || lm.getFirstChild().getProperty(break-before).isForced(); } public static boolean breakBeforeSelfOrAnyChild(LM lm) { ... } public static boolean breakBeforeFirstChild(LM lm) { ... } ... } The method mustKeepWithPrevious would become: - for BlockLM: return KeepsAndBreaksHandler.breakBeforeSelfOrFirstChild(this) - for TableCellLM: return KeepsAndBreaksHandler.breakBeforeFirstChild(this) etc. The static method wouldn’t even need to be made synchronized, as the class is stateless. Later on we could replace this class with a Singleton object that would be passed to the LMs at their creation, and we would have some kind of “KeepsAndBreaksHandlerFactory”. But for now a simple class with static methods would be sufficient. We could also put in this class the code for producing the penalties elements: KnuthElement getElementForBreakBetween(LM firstLM, LM secondLM) { if (firstLM.hasBreakAfter() || secondLM().hasBreakBefore()) { return new KnuthPenalty(...); } else if (firstLM.mustKeepWithNext() || secondLM.mustKeepWithPrevious()) { return new KnuthForbiddenBreak() } } or something like that. The benefit would be that the whole handling of breaks can be found in just one place, instead of being spread among all the LMs. This is easier to correct bugs; this is easier to implement new features (column vs page break, integer keeps...); this simplifies the getNextKnuthElements methods of the LMs. And so on... WDYT? Vincent
Re: Improving Keeps and Breaks
On Aug 2, 2007, at 12:13, Vincent Hennebert wrote: Hi Vincent Hope you still remember this one. Sorry about the late reply. Still catching up on some missed posts during the holidays. I’ve been thinking about the handling of keeps and breaks in tables for a while, and it seems to me that improvements could be done in that whole area. I’ll use break-before as an example but what I’ll be saying applies to break-after and keeps as well. While I've also been looking into that direction myself, I'm not sure I agree with all of your suggestions (more inline below). Currently, for every object to which the break-before property applies, the value of the property is checked at the beginning of the getNextKnuthElements method and, if it corresponds to a hard-break, a Knuth penalty with -inf value is produced. This has several shortcomings: - this leads to much code duplication; - if, say, an fo:block has another fo:block as a child and both have break-before, two penalties will be generated instead of one (although I’m suddenly not so sure of that, more later) - this makes it difficult to improve breaks to distinguish columns from pages, and keeps to take integer values into account. Very right about the above, and note that this is not only true for block-level breaks and keeps (= page- or column-breaks), but also inline-level, where at the moment, the situation is worse. keep- together.within-line="always" only works for text-descendants of the same FObj. Forcing a keep-with-previous on a graphic or an fo:inline to have it always kept on the same line as the last line generated by preceding PCDATA is currently impossible with FOP. In fact, we don’t so much care about the penalty element itself as whether there /should/ be a break between elements or not. I mean, the LMs which actually care are those which concatenate the elements: fo:flow, fo:block, fo:block-container, fo:table-cell, etc. And they all do it the same way: iterate over the children LMs for each LM: if there is a following LM, then: if the current LM has break-after or the following LM has break-before, then generate a Knuth forced break So the main question is to know whether there is a break before a LM. And here that’s quite simple, there are only a few shared behaviours: indeed there is a break-before on an element if: - the break-before property is set on the element itself or the first of its children: fo:block, fo:block-container, fo:list-block - it is set on the element itself or any of its children: fo:table-row, fo:list-item - it is set on the first of its children: fo:table-caption, fo:table-cell, fo:list-item-body, fo:list-item-label, fo:wrapper - special: - fo:table: on itself or first table-body - fo:table-body: on the first fo:table-row child if any; otherwise on any of fo:table-cell children making the first row - not applicable: fo:table-column, fo:table-header/footer, fo:float, fo:footnote-body I think I see another way of simplifying the necessary code in the LMs, but I'm not a 100% sure... Currently, a break is not propagated upwards or normalized in any way in the fo-tree. If you would have: Then we would get three nested Block instances, and the corresponding BlockLM for the first outer Block always needs to check its first childLM for the break-condition. OTOH, the above is semantically equivalent to (I think we had already established that there should not be a double page-break here) If the LMs would be guaranteed to receive the 'normalized' form, the break-condition can be tested for internally by the outer LM itself. No need to look forward or back... The first descendants wouldn't even need to check for breaks anymore. So there would just be a couple of methods to write, for each behaviour. We could (for the moment) define a dedicated class with static methods, which would be called by each LM (some methods are imaginary, but you get the idea): Interesting idea, too, but... The benefit would be that the whole handling of breaks can be found in just one place, instead of being spread among all the LMs. This is easier to correct bugs; this is easier to implement new features (column vs page break, integer keeps...); this simplifies the getNextKnuthElements methods of the LMs. And so on... Agreed with the concerns, but I'm wondering if these portions of code, instead of extracting them into a separate class, could be centralized in, say, BlockStackingLM and InlineStackingLM...? Cheers Andreas
Re: Improving Keeps and Breaks
Hi Andreas, Andreas L Delmelle wrote: > On Aug 2, 2007, at 12:13, Vincent Hennebert wrote: > > Hi Vincent > > Hope you still remember this one. Sorry about the late reply. Still > catching up on some missed posts during the holidays. Hope you still remember this one. Sorry about the late reply. Still catching up on some missed posts while I was not working on FOP :-] And thanks for your comments. >> In fact, we don’t so much care about the penalty element itself as >> whether there /should/ be a break between elements or not. I mean, the >> LMs which actually care are those which concatenate the elements: >> fo:flow, fo:block, fo:block-container, fo:table-cell, etc. And they all >> do it the same way: >> iterate over the children LMs >> for each LM: >> if there is a following LM, then: >> if the current LM has break-after or the following LM has >> break-before, then >> generate a Knuth forced break >> >> So the main question is to know whether there is a break before a LM. >> And here that’s quite simple, there are only a few shared behaviours: >> indeed there is a break-before on an element if: >> - the break-before property is set on the element itself or the first of >> its children: >> fo:block, fo:block-container, fo:list-block >> - it is set on the element itself or any of its children: >> fo:table-row, fo:list-item >> - it is set on the first of its children: >> fo:table-caption, fo:table-cell, fo:list-item-body, >> fo:list-item-label, fo:wrapper >> - special: >> - fo:table: on itself or first table-body >> - fo:table-body: on the first fo:table-row child if any; otherwise on >> any of fo:table-cell children making the first row >> - not applicable: >> fo:table-column, fo:table-header/footer, fo:float, fo:footnote-body > > I think I see another way of simplifying the necessary code in the LMs, > but I'm not a 100% sure... > > Currently, a break is not propagated upwards or normalized in any way in > the fo-tree. > > If you would have: > > > > > > Then we would get three nested Block instances, and the corresponding > BlockLM for the first outer Block always needs to check its first > childLM for the break-condition. > > OTOH, the above is semantically equivalent to (I think we had already > established that there should not be a double page-break here) > > > > > > If the LMs would be guaranteed to receive the 'normalized' form, the > break-condition can be tested for internally by the outer LM itself. No > need to look forward or back... The first descendants wouldn't even need > to check for breaks anymore. I think I see your point. Basically you’re proposing a push method (a LM notifies its parent LM that it has a break-before) while mine is a pull method (a LM asks its children LMs if they have break-before). You’re more at the FO tree building stage, I’m more at the layout stage. In terms of efficiency I think both methods are equivalent as the same amount of method calls will be performed in either way. The push method might be slighty more complicated to implement in special cases like tables: when an fo:cell notifies its parent fo:table-body that it has a break-before, the table-body must figure out if the cell lies in the first row or not. A matter of taste, probably, but I think I’d prefer the pull method: the LM performs requests to the appropriate children LMs exactly when and if needed. That may simplify code as well (and improve its readability) as some form of pull method is necessary anyway (the mustKeepWithPrevious/WithNext/Together methods). I believe you already mentioned this idea of normalizing/simplifying the FO tree in the past. Note that it may exist in parallel as it addresses a different general issue. One concern I’d have is to make sure that a simplification leads to a semantically equivalent result. Given the complexity of the spec that might be difficult to establish. Not sure also if the overhead is compensated by the gain in the further processes (layout, area tree generation). But that’s a different topic. >> So there would just be a couple of methods to write, for each behaviour. >> We could (for the moment) define a dedicated class with static methods, >> which would be called by each LM (some methods are imaginary, but you >> get the idea): > > > Interesting idea, too, but... > >> The benefit would be that the whole handling of breaks can be found in >> just one place, instead of being spread among all the LMs. This is >> easier to correct bugs; this is easier to implement new features (column >> vs page break, integer keeps...); this simplifies the >> getNextKnuthElements methods of the LMs. And so on... > > Agreed with the concerns, but I'm wondering if these portions of code, > instead of extracting them into a separate class, could be centralized > in, say, BlockStackingLM and InlineStackingLM...? I thought of that, but a se
Re: Improving Keeps and Breaks
On Oct 18, 2007, at 19:23, Vincent Hennebert wrote: OTOH, the above is semantically equivalent to (I think we had already established that there should not be a double page-break here) If the LMs would be guaranteed to receive the 'normalized' form, the break-condition can be tested for internally by the outer LM itself. No need to look forward or back... The first descendants wouldn't even need to check for breaks anymore. I think I see your point. Basically you’re proposing a push method (a LM notifies its parent LM that it has a break-before) while mine is a pull method (a LM asks its children LMs if they have break-before). Yep, although it would not be the LM but rather the FO that pushes the break-before upwards to its parent if it is also the first child. The LMs would largely continue to work as they do now, except that under a certain set of conditions, they don't need to check the outside anymore: only take into account the forced break on its own FO. If there is none, then no need to recursively check for first descendants having forced breaks. Currently (sorry if it becomes boring to stress this) the construction of the layout-tree starts only when the end-of-page- sequence event occurs. I still see room for changing this in the future, and so I need to consider the effects on the layout-algorithm as well: the algorithm will, for instance, no longer be able to rely on *all* childLMs being available the first time it enters the loop... The last childLM in an iteration might turn out to be not-the- last-one-after-all. For many following FONodes, the LMs do not exist yet at that point. Not in my head, at least. ;-) You’re more at the FO tree building stage, I’m more at the layout stage. In terms of efficiency I think both methods are equivalent as the same amount of method calls will be performed in either way. Right, but OTOH... it's more a matter of /when/ (in the process) that happens. The push method might be slighty more complicated to implement in special cases like tables: when an fo:cell notifies its parent fo:table-body that it has a break-before, the table-body must figure out if the cell lies in the first row or not. Almost everything is /slightly/ more complicated in case of fo:tables, especially those without explicit fo:table-rows or - columns. ;-) Anyway, I remember that when I implemented implicit column-numbers, I also gave TableBody an instance member to check whether we are adding cells in the first row or not, so this particular case would be easily addressed. (Checking... yep, it's still there.) Come to think of tables, I'd consider 'propagation' in terms of pushing a forced break on a cell to the first cell in the row. In the table-layout code, at the point where we have a reference to the row or the first cell in a row, we would immediately know whether there is a forced break on a first descendant in any of the following sibling cells without having to request the corresponding childLMs and trigger a tree-traversal of who-knows-how-many levels. Keeping in mind the above mentioned idea of triggering layout sooner, if we can guarantee that the layoutengine always receives complete rows, then the table-layout job should become a bit simpler in the general use-case, while still not adding much complexity in trickier, more exotic cases, like: //table-cell/block[position() > [EMAIL PROTECTED]'page'] especially where the cell's column-number corresponds to the highest column-number. Triggering layout sooner is the only way we are ever going to get FOP to accept arbitrarily large tables, without consuming massive amounts of heap. A 'simple' grid of 5 x 500 cells generates +5000 FONodes (table-cells must have at least one block each) that stay in memory until the page-sequence is completely finished. I wonder how many break-possibilities that generates... :/ A matter of taste, probably, but I think I’d prefer the pull method: the LM performs requests to the appropriate children LMs exactly when and if needed. The only thing an LM should initially pull/request from its children, AFAIU, is a list of elements, given a certain LayoutContext. When composing its own element list, an LM should ideally be able to rely on the lists it receives from its children. Then add/delete/ update elements and (un)wrap, depending on context that is unknown or irrelevant to the child. That may simplify code as well (and improve its readability) as some form of pull method is necessary anyway (the mustKeepWithPrevious/WithNext/Together methods). Keeps are a different story indeed. Big difference is that keeps have strengths, and breaks do not. Consider: ... This may be interpretation: you cannot specify a 'strength' for a break. It is either there or not. I take this to mean that a forced break overrules any keep. Main advant
Re: Improving Keeps and Breaks
Andreas L Delmelle schrieb: Triggering layout sooner is the only way we are ever going to get FOP to accept arbitrarily large tables, without consuming massive amounts of heap. It is little OT, but multipass would be another way - at the cost of runtime and diskspace/-access (if the memory footprint should be shrinked). I know it is not a issue now, but I sometimes think [1] of how the "formating objects for indexing" (section 6.10 of xsl 1.1) could be implemented and it always ends in a multipass solution because in contrast to page-number-citation it is hardly possible to guess how much space to reserve for the resulting index-page-citation-list. Indexing would be a great help for my project - I do it currently by creating as much page-number-citations as needed but without beeing able to use any advanced features, so I end up with long lists of pagenumbers that could/should be merged to page-ranges. It is not realy a big problem because it is a "private" project, but I want to keep the door open for a better solution that is contained in the standard. Nevertheless resolving as much as possible at the FO tree building stage and trigger an earlier layout process would be an advantage for most use cases. gerhard [1] Yes, "thinking" is all what I do at present ;-)))
Re: Improving Keeps and Breaks
On Oct 19, 2007, at 13:32, Vincent Hennebert wrote: Andreas L Delmelle wrote: On Oct 18, 2007, at 19:23, Vincent Hennebert wrote: I think I see your point. Basically you’re proposing a push method (a LM notifies its parent LM that it has a break-before) while mine is a pull method (a LM asks its children LMs if they have break-before). Yep, although it would not be the LM but rather the FO that pushes the break-before upwards to its parent if it is also the first child. The Sure, of course. BTW, that may be a dumb question, but: how does a FO know that it is the first child of its parent? AFAICT there’s no such information in the FObjs. Oh, that's pretty easy. Each FObj has a firstChild member, so the most straightforward idiom for this would be something like: if (parent.firstChild == null) { ... } //if checking before addChildNode() has been called if (this == parent.firstChild) { ... } //if you check after addChildNode() has been called Which means that an object would blindly notify its parent that it has a break, and that would be up to the parent to figure out whether it should take it into account or not. Would be a bit overkill, wouldn’t it? No, it wouldn't. Or at least: depends on what you mean exactly by 'overkill'? LMs would largely continue to work as they do now, except that under a certain set of conditions, they don't need to check the outside anymore: only take into account the forced break on its own FO. If there is none, then no need to recursively check for first descendants having forced breaks. Of course, but I’m concerned about spreading code relevant to the same functionality over several places of the codebase. That will always be a concern. If you separate some related logic into YAUC[*], the same reasoning applies, IMO. Even if the logic is contained in a separate class, you still have to look at/know of the other classes. [*] (yet-another-utility-class) Keeping in mind the above mentioned idea of triggering layout sooner, if we can guarantee that the layoutengine always receives complete rows, then the table-layout job should become a bit simpler in the general use-case, while still not adding much complexity in trickier, more exotic cases, like: //table-cell/block[position() > [EMAIL PROTECTED]'page'] That one triggers a break /inside/ the table-row, not before it. What I meant precisely... and, in practice, this will occur far less often than forced breaks on the first block in a cell, IIC. Anyway, at a given LM level the work to do looks simple enough to me. especially where the cell's column-number corresponds to the highest column-number. Triggering layout sooner is the only way we are ever going to get FOP to accept arbitrarily large tables, without consuming massive amounts of heap. A 'simple' grid of 5 x 500 cells generates +5000 FONodes (table-cells must have at least one block each) that stay in memory until the page-sequence is completely finished. I wonder how many break-possibilities that generates... :/ Like said above, I don’t think anything in my approach prevents that from happening. Indeed not. Did I say anything else? A matter of taste, probably, but I think I’d prefer the pull method: the LM performs requests to the appropriate children LMs exactly when and if needed. The only thing an LM should initially pull/request from its children, AFAIU, is a list of elements, given a certain LayoutContext. When composing its own element list, an LM should ideally be able to rely on the lists it receives from its children. Then add/delete/ update elements and (un)wrap, depending on context that is unknown or irrelevant to the child. That one I don’t quite agree with. Although I thought of it too on a first step. I think it’s more complicated to play with a list of elements, try and get the first one from children if any, create a new one if necessary, change the break context (column, page) if needed, etc., rather than simply request the applicable children LMs for their break-before values. And again, in the case of tables that means that the merging algorithm needs to deal with many possibilities of break elements to occur. That’s really not its job I think. That may simplify code as well (and improve its readability) as some form of pull method is necessary anyway (the mustKeepWithPrevious/WithNext/Together methods). Keeps are a different story indeed. Big difference is that keeps have strengths, and breaks do not. Yes, but keeps and breaks are handled at the same place, mainly, when a LM considers the stacking of children LMs (BlockLM, for example). And the treatment is very similar. The point is that the layoutengine/-algorithm can focus on the keeps, since the breaks would have been 'normalized' (since it is very easy to do so; for keeps this much less straightforward, since the strengths need to be considered as well, so that might indeed lead to unneces
Re: Improving Keeps and Breaks
Hi Andreas, Andreas L Delmelle wrote: > On Oct 18, 2007, at 19:23, Vincent Hennebert wrote: >> I think I see your point. Basically you’re proposing a push method (a LM >> notifies its parent LM that it has a break-before) while mine is a pull >> method (a LM asks its children LMs if they have break-before). > > Yep, although it would not be the LM but rather the FO that pushes the > break-before upwards to its parent if it is also the first child. The Sure, of course. BTW, that may be a dumb question, but: how does a FO know that it is the first child of its parent? AFAICT there’s no such information in the FObjs. Which means that an object would blindly notify its parent that it has a break, and that would be up to the parent to figure out whether it should take it into account or not. Would be a bit overkill, wouldn’t it? > LMs would largely continue to work as they do now, except that under a > certain set of conditions, they don't need to check the outside anymore: > only take into account the forced break on its own FO. If there is none, > then no need to recursively check for first descendants having forced > breaks. Of course, but I’m concerned about spreading code relevant to the same functionality over several places of the codebase. > Currently (sorry if it becomes boring to stress this) the construction > of the layout-tree starts only when the end-of-page-sequence event > occurs. I still see room for changing this in the future, and so I need > to consider the effects on the layout-algorithm as well: the algorithm > will, for instance, no longer be able to rely on *all* childLMs being > available the first time it enters the loop... The last childLM in an > iteration might turn out to be not-the-last-one-after-all. For many > following FONodes, the LMs do not exist yet at that point. Not in my > head, at least. ;-) I think nothing would prevent the layout process from being started earlier. On most FOs only the first child needs to be checked for a break. And for a table, only the first row needs to be retrieved in order to know if a break must occur before it. That’s another topic, but we could imagine that FObj instances and corresponding LMs are dynamically created when requested by a parent LM. Whereas at the (child) FObj level, it’s unclear to me how we will be able to say that we know enough to start the layout, and that we don’t need to grab further children FObjs. > Anyway, I remember that when I implemented implicit column-numbers, I > also gave TableBody an instance member to check whether we are adding > cells in the first row or not, so this particular case would be easily > addressed. (Checking... yep, it's still there.) > > Come to think of tables, I'd consider 'propagation' in terms of pushing > a forced break on a cell to the first cell in the row. > In the table-layout code, at the point where we have a reference to the > row or the first cell in a row, we would immediately know whether there > is a forced break on a first descendant in any of the following sibling > cells without having to request the corresponding childLMs and trigger a > tree-traversal of who-knows-how-many levels. > > Keeping in mind the above mentioned idea of triggering layout sooner, if > we can guarantee that the layoutengine always receives complete rows, > then the table-layout job should become a bit simpler in the general > use-case, while still not adding much complexity in trickier, more > exotic cases, like: > //table-cell/block[position() > [EMAIL PROTECTED]'page'] That one triggers a break /inside/ the table-row, not before it. Anyway, at a given LM level the work to do looks simple enough to me. > especially where the cell's column-number corresponds to the highest > column-number. > > Triggering layout sooner is the only way we are ever going to get FOP to > accept arbitrarily large tables, without consuming massive amounts of > heap. A 'simple' grid of 5 x 500 cells generates +5000 FONodes > (table-cells must have at least one block each) that stay in memory > until the page-sequence is completely finished. I wonder how many > break-possibilities that generates... :/ Like said above, I don’t think anything in my approach prevents that from happening. >> A matter of taste, probably, but I think I’d prefer the pull method: the >> LM performs requests to the appropriate children LMs exactly when and if >> needed. > > The only thing an LM should initially pull/request from its children, > AFAIU, is a list of elements, given a certain LayoutContext. > When composing its own element list, an LM should ideally be able to > rely on the lists it receives from its children. Then add/delete/update > elements and (un)wrap, depending on context that is unknown or > irrelevant to the child. That one I don’t quite agree with. Although I thought of it too on a first step. I think it’s more complicated to play with a list of elements, try and get the first one from ch