> The short version is that I can't figure out how to distribute vertical
> space to avoid ragged column bottoms in multi-column pages when the flow
> contains several long-ish one-column tables. Why one-column tables?
> Because I have sections with headings that must be repeated at the top
> of a column if split across columns.
A quick follow-up on this: I ended up solving it by using my existing one-column table approach to repeat headings, then post-processing the area tree to redistribute the space. I'm also most of the way through implementing the insertion of house ads as whitespace filler as part of the area tree post-processing. I've posted the core of the space redistribution code below in case it helps anyone else. The rest of the code (not shown) is just the usual stuff to embed FOP, generate the area tree to a temp file, and then render from a DOM to PDF after reprocessing.

Sorry for the mixed HTML/plain post, but in this case it's the best way to get Thunderbird to maintain my code formatting. Grr, stupid mail clients.

I may post the whole lot later if I get permission from the boss to open source this whole classified/pagination system, which is likely. For now, just the bits to reprocess the area tree follow, along with a cut-down version of the PaginatorConfiguration class that provides the required factories.

This code finds all blocks with a prod-id starting with "ad_" or "heading_", determines how much free space is in the column, and distributes that free space evenly among those blocks, adding to any existing space-before if found. The first such block in each column gets no extra space, so that it stays flush with the top of the column. If it adds space to a block, it adds the same amount of space to the block progression dimension of all containing parent elements up to and including the <flow> that contains the column.

To work, it requires that blocks that should receive distributed space be labeled in the XSL-FO with a suitable "id" attribute like "ad_bobsmowing" or "heading_forsale"; FOP carries that id through into the area tree as the prod-id attribute. My app produces the XSL-FO with some XSLT, from simple input in a format like the following:

sample_ad_input.xml

<ads>
  <section class_no="1700">
    <heading>FOR SALE HOUSEHOLD</heading>
    <ad adname="BBQ 4BURNER GAS"><adbody><b>BBQ</b> 4-burner gas, good cond $80 ONO. 9999 9999.</adbody></ad>
  </section>
  <section class_no="1725">
    <heading>DANCE</heading>
    <ad adname="LATIN AMERICAN ">
      <adbody><b>LATIN</b> American and Social Dancing. Learn all the popular dances, Cha Cha, Jive, Rumba, Waltz, Quickstep, foxtrot .... Private and Wedding lessons available.</adbody>
    </ad>
  </section>
</ads>

... but of course your needs will differ. I'm just showing how the space is redistributed in case others have this problem. In reality the XML is generated on demand by queries against a PostgreSQL database containing the ads, but that doesn't matter much for this purpose.

A cut-down version of the XSLT to transform the above into FO is:

ads_to_fo.xsl

<?xml version="1.0"?>
<!--
  REQUIREMENTS:
  - An XSLT processor
  - Apache FOP
  - Hyphenation files from http://offo.sourceforge.net/hyphenation/index.html
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- File extension image files will have. -->
  <xsl:param name="imgext"/>

  <!-- The root template produces the XSL-FO document structure,
       including page templates etc. It then calls the processor to
       loop through the <ad/> elements and generate content for them.
  -->
  <xsl:template match="/">
    <!-- XSL-FO document structure-->
    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" xml:lang="en">

      <!-- Define master pages -->
      <fo:layout-master-set>
        <!-- TODO: separate masters for first page, left pages, right pages -->
        <!-- see http://xmlgraphics.apache.org/fop/fo.html#fo-oddeven -->
        <fo:simple-page-master master-name="page"
            page-height="400mm" page-width="290mm" margin="0mm">
          <!-- Body, with columns -->
          <fo:region-body column-count="7" column-gap="0" margin-top="12mm"/>
          <!-- masthead -->
          <fo:region-before extent="10mm"/>
        </fo:simple-page-master>

        <fo:page-sequence-master master-name="pagesequence">
          <!-- If you want a different first page, use fo:single-page-master-reference here -->
          <fo:repeatable-page-master-reference master-reference="page"/>
        </fo:page-sequence-master>
      </fo:layout-master-set>

      <!-- Define page contents -->
      <fo:page-sequence master-reference="pagesequence" language="en">
        <!-- Ad text -->
        <fo:flow flow-name="xsl-region-body">
          <!-- This should really be a fo:wrapper, but fop isn't bright enough to
               cope with that right now and will complain about inappropriate
               inline areas. Use a fo:block container instead until fop svn
               (which fixes this) is released to replace 0.95 -->
          <fo:block border-left-style="solid" border-right-style="solid"
              border-left-width="0.5pt" border-right-width="0.5pt"
              border-left-color="black" border-right-color="black"
              margin-left="-0.5pt" padding-left="2pt"
              margin-right="0pt" padding-right="2pt">
            <xsl:apply-templates/>
          </fo:block>
        </fo:flow>
      </fo:page-sequence>

    </fo:root>
    <!-- End XSL-FO document structure-->
  </xsl:template>

  <!-- Process a classification section, producing a one-column table so
       we can ensure the heading repeats on column breaks. The table header
       will be provided by the <heading> element, which must be the first
       element of a <section>. Subsequent <ad> elements will go in the body.
  -->
  <xsl:template match="section">
    <fo:table table-layout="fixed" width="100%" space-before="4pt" id="sectio...@class_no}">
      <xsl:apply-templates select="heading"/>
      <fo:table-body>
        <fo:table-row>
          <fo:table-cell>
            <xsl:apply-templates select="ad"/>
          </fo:table-cell>
        </fo:table-row>
      </fo:table-body>
    </fo:table>
  </xsl:template>

  <!-- Process a heading -->
  <xsl:template match="heading">
    <fo:table-header>
      <fo:table-cell>
        <fo:block hyphenate="false" text-align="center"
            background-color="black" color="white"
            font-family="Helvetica" font-size="10pt" font-weight="bold"
            padding-before="3pt" padding-after="1.5pt"
            margin-top="0" margin-bottom="2pt"
            id="heading_{../@class_no}">
          <xsl:apply-templates/>
        </fo:block>
      </fo:table-cell>
    </fo:table-header>
  </xsl:template>

  <!-- Process an ad. Additional top-level templates are used to handle
       formatting, so this just encloses it in a block and calls the processor. -->
  <xsl:template match="ad">
    <fo:block hyphenate="true" text-align="justify" text-align-last="left"
        widows="4" orphans="4"
        border-top-width="0.2pt" border-top-style="solid" border-top-color="black"
        padding-after="0.2pt" padding-before="0.3pt"
        font-family="Helvetica" font-weight="normal" font-size="6.3pt"
        id="a...@adname}_class_{../@class_no}">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <xsl:template match="adbody">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- handle an external ad reference, for when an ad is an image -->
  <!-- They need to be centered -->
  <xsl:template match="external">
    <!-- Fop takes some persuasion to scale the pics to the right width.
         The explicit "width=100%" appears to be necessary to get it to
         scale - scale-to-fit alone won't do it.
    -->
    <fo:external-graphic content-width="scale-to-fit" width="100%"
        content-height="auto" scaling="non-uniform"
        src="url('pics/{.}{$imgext}')" />
  </xsl:template>

  <!-- Convert bold tag to XSL-FO bold inline style block -->
  <xsl:template match="b">
    <fo:inline font-weight="bold">
      <xsl:apply-templates/>
    </fo:inline>
  </xsl:template>

</xsl:stylesheet>

Given the XSL-FO produced by that XSLT, converted to area tree XML by FOP and loaded into a W3C DOM (Document) using the usual Java tools, the following code will redistribute space in the columns so that the Document may be passed back into FOP via a DOMSource to be rendered to PDF.

AreaTreeTransformer.java

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/**
 * AreaTreeTransformer is responsible for manipulating a loaded area tree
 * XML DOM to redistribute space, insert house ads, etc.
 *
 * @author Craig Ringer <cr...@postnewspapers.com.au>
 */
public class AreaTreeTransformer {

    private final PaginatorConfiguration conf;
    private final Document areaTree;
    private final XPathFactory xpathFactory;
    private final XPathExpression findColumnsInDocument;
    private final XPathExpression findAdsInColumn;

    /**
     * Prepare a new transformer to operate on the passed area tree XML.
     *
     * @param conf PaginatorConfiguration providing the required factories
     * @param areaTree W3C DOM containing area tree XML to process
     * @throws XPathExpressionException if the XPath expressions fail to compile
     */
    public AreaTreeTransformer(PaginatorConfiguration conf, Document areaTree)
            throws XPathExpressionException {
        this.conf = conf;
        this.areaTree = areaTree;
        this.xpathFactory = conf.getXPathFactory();

        // This expression locates all columns in the document. It'll be called
        // with the document root node as an argument.
        findColumnsInDocument = xpathFactory.newXPath().compile("//span/flow");

        // This expression locates all ad and heading nodes within a column.
        // It'll be called with a column node, as returned by findColumnsInDocument,
        // as an argument.
        findAdsInColumn = xpathFactory.newXPath().compile(
                ".//block[starts-with(@prod-id,'ad_') or starts-with(@prod-id,'heading_')]");
    }

    /**
     * Transformation of the document is done on a column-by-column basis.
     * First, we must find all the columns and iterate over them. Then within
     * each column, we must find the amount of white space that must be consumed.
     *
     * Once the white space is known, the decision of what to do with it must be
     * made. Should it just be redistributed? Or should a house ad be inserted?
     * Or (for the final column) should it be left empty for other content to be
     * put in?
     *
     * Once any house ads are inserted, the remaining white space must be
     * distributed between all the ads.
     *
     * For some basic info on the xpath api see
     * http://www.ibm.com/developerworks/library/x-javaxpathapi.html
     */
    public void doTransform() throws XPathExpressionException {
        // First we must find the columns. Each column in the area tree is identified
        // by a flow element under a span element, so it's easy to find them.
        NodeList columnList = (NodeList) findColumnsInDocument.evaluate(areaTree, XPathConstants.NODESET);
        for (int i = 0; i < columnList.getLength(); i++) {
            Element flowNode = (Element) columnList.item(i);

            // For each column, we must determine how much free space is in the column.
            // This is the difference between the block progression dimension of the
            // span (i.e. the max col height) and the block progression dimension of
            // the flow containing the column itself.
            final int spanBpd = Integer.parseInt(((Element) flowNode.getParentNode()).getAttribute("bpd"));
            final int flowBpd = Integer.parseInt(flowNode.getAttribute("bpd"));
            if (flowBpd == 0) {
                // Empty column.
                // TODO: The last column BEFORE the empty column may need special
                // treatment, so we might need to add lookahead. OTOH, there's no
                // guarantee there will be any empty cols - the last col might be
                // on the end of a page.
                continue;
            }
            final double spaceToFill = (double) spanBpd - (double) flowBpd;

            // TODO: determine optimal house ad(s) to consume this space
            // and append them to the column, increasing the flow b-p-d as
            // necessary.
            final double spaceToDistribute = addHouseAds(flowNode, spaceToFill);

            // Now redistribute space within the column so that ads use up
            // all the space. To do this, we find all ads (and headings) in the
            // column, and then divide the space evenly between all except
            // the first block in the column. We then distribute that space among
            // all the nodes we found by adding it to each node's space-before.
            //
            // For each node to which space is added, we must update the b-p-d
            // of all parent blocks up to the flow level, so that everything
            // starts in the right places and all the children fit inside their
            // containing parents. The easiest way to do that is to walk up the
            // ancestor tree adding to the b-p-d of each node along the way.
            NodeList adsAndHeads = (NodeList) findAdsInColumn.evaluate(flowNode, XPathConstants.NODESET);

            // Distribute space among all blocks EXCEPT the first, which shouldn't get
            // any because we want it flush with the top margin.
            final int numBlocksToPad = adsAndHeads.getLength() - 1;
            if (numBlocksToPad <= 0) {
                // Zero or one ad/heading block in this column: nothing to pad.
                System.err.println("Cannot distribute space in column: only one ad block in column");
                continue;
            }
            final double extraSpacePerBlock = spaceToDistribute / numBlocksToPad;

            // Start padding AFTER the first block
            for (int j = 1; j < adsAndHeads.getLength(); j++) {
                Element block = (Element) adsAndHeads.item(j);
                padBlock(block, flowNode, extraSpacePerBlock);
            }
        }
    }

    private double addHouseAds(Element columnElement, double spaceToFill) {
        // TODO: use conf object to obtain house ad dimensions, determine best fit,
        // and insert ads into area tree.
        //
        // Currently no ads added, return original space
        return spaceToFill;
    }

    /*
     * Add extraSpacePerBlock to space-before on block, adding the attribute if
     * it is missing and otherwise increasing its value by the specified amount.
     *
     * Then scan up the ancestor tree and, for each ancestor with a bpd attribute
     * (block progression dimension) from the block's parent up to and including
     * the surrounding flow element, increase the bpd of that element by
     * extraSpacePerBlock. The block's own bpd is left alone, because
     * space-before lies outside its content.
     */
    private void padBlock(Element block, Element flowNode, double extraSpacePerBlock) {
        double newSpaceBefore = extraSpacePerBlock;
        if (block.hasAttribute("space-before")) {
            newSpaceBefore += Integer.parseInt(block.getAttribute("space-before"));
        }
        String roundedSpaceBefore = Long.toString(Math.round(newSpaceBefore));
        block.setAttribute("space-before", roundedSpaceBefore);

        Element parent = (Element) block.getParentNode();
        do {
            if (parent.hasAttribute("bpd")) {
                long newBpd = Math.round(extraSpacePerBlock + Integer.parseInt(parent.getAttribute("bpd")));
                parent.setAttribute("bpd", Long.toString(newBpd));
            }
            if (flowNode.isSameNode(parent)) {
                break;
            }
        } while ((parent = (Element) parent.getParentNode()) != null);
    }
}

PaginatorConfiguration.java

import java.io.File;
import java.net.MalformedURLException;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.xpath.XPathFactory;

import org.apache.fop.apps.FopFactory;

/**
 * PaginatorConfiguration tracks instances of configured factories required
 * for parsing, formatting, etc.
 *
 * @author Craig Ringer <cr...@postnewspapers.com.au>
 */
public class PaginatorConfiguration {

    private final FopFactory fopFactory = FopFactory.newInstance();
    private final TransformerFactory xsltFactory = TransformerFactory.newInstance();
    private final DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
    private final XPathFactory xpathFactory = XPathFactory.newInstance();

    // TODO: load from resource
    private final File adsToFoXSLTFile = new File("ads_to_fo.xsl");

    public PaginatorConfiguration() throws MalformedURLException {
        // Namespace awareness is required if feeding the dom back into fop.
        documentBuilderFactory.setNamespaceAware(true);
        // TODO: configure font base, image base, etc here.
        File cwd = new File(System.getProperty("user.dir"));
        fopFactory.setBaseURL(cwd.toURI().toString());
        fopFactory.setFontBaseURL((new File(cwd, "fonts")).toURI().toString());
        fopFactory.setSourceResolution(200);
        fopFactory.setTargetResolution(200);
        // TODO: download and paginate house ads
    }

    public FopFactory getFopFactory() {
        return fopFactory;
    }

    public TransformerFactory getTransformerFactory() {
        return xsltFactory;
    }

    public DocumentBuilderFactory getDocBuilderFactory() {
        return documentBuilderFactory;
    }

    public XPathFactory getXPathFactory() {
        return xpathFactory;
    }

    public File getAdsToFoXSLTFile() {
        return adsToFoXSLTFile;
    }
}

-- 
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/
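A possible refinement, not part of the code above: padBlock rounds each block's share of the free space independently, so with many blocks in one column the rounding errors can add up and the padded column may end a few units away from the span's bpd. The sketch below assumes the area tree bpd and space-before values are whole millipoints (as the Integer.parseInt calls above suggest) and splits the free space into integer shares that differ by at most one and always sum exactly to the total. SpaceSplitter and split are hypothetical names, not FOP API.

```java
/**
 * Hypothetical helper, not part of FOP or the code above: splits a whole
 * number of millipoints into integer shares for the blocks being padded.
 */
final class SpaceSplitter {

    private SpaceSplitter() {
        // static utility, no instances
    }

    /**
     * Returns `parts` non-negative shares that differ by at most one and sum
     * exactly to `total`. The first (total % parts) shares get one extra unit.
     *
     * @throws IllegalArgumentException if parts < 1 or total < 0
     */
    static int[] split(int total, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be at least 1: " + parts);
        }
        if (total < 0) {
            // An overfull column has no free space to hand out.
            throw new IllegalArgumentException("total must not be negative: " + total);
        }
        int base = total / parts;      // every block gets at least this much
        int remainder = total % parts; // leftover units, handed out one per block
        int[] shares = new int[parts];
        for (int i = 0; i < parts; i++) {
            shares[i] = base + (i < remainder ? 1 : 0);
        }
        return shares;
    }
}
```

In doTransform() the shares could stand in for extraSpacePerBlock: compute int[] shares = SpaceSplitter.split((int) Math.round(spaceToDistribute), numBlocksToPad) before the padding loop and pass shares[j - 1] to padBlock. Because padBlock then only ever receives whole numbers, its Math.round calls no longer lose anything. Columns where spaceToDistribute is negative (overfull) would need to be skipped first, since split rejects negative totals.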