RE: Apache FOP 0.95 Patch
I agree that this should happen behind the scenes, without the user having to specify anything. I implemented the work-around the way I did because of our current time constraints. Hopefully, it can serve as a starting point for a proper solution. Unfortunately, my knowledge of the Apache FOP source is limited to the past week spent getting this work-around in place. I am fairly sure, however, that to implement memory management effectively (behind the scenes) for RTF and PDF, the two handlers (RTFHandler and AreaTreeHandler) will have to be modified. That said, I think the RTF rendering will be much easier to modify because it does not use the page-breaking algorithm. Regards, Ben. -----Original Message----- From: Andreas Delmelle [mailto:andreas.delme...@telenet.be] Sent: Thursday, June 04, 2009 9:35 AM To: fop-dev@xmlgraphics.apache.org Subject: Re: Apache FOP 0.95 Patch On 04 Jun 2009, at 14:11, Simon Pepping wrote: Hi Ben, Simon & Vincent, > Indeed, it is a horrible hack with regard to the meaning of a page-sequence. But it is an interesting solution to the problem of influencing FOP's page breaking algorithm. The very same thoughts over here. A really interesting showcase of what FOP can/should do, but I'd go about the implementation differently. Still, it is a worthwhile overview of what needs to happen, albeit behind the scenes, without requiring the user to do anything special. > B.T.W., why does the algorithm not stop at hard page breaks? IIRC from recent debug sessions, it does. Well, it's not really the algorithm that stops... If the FlowLM signals a forced page-break, the current block-list is returned, page-breaks are computed and the areas are immediately added to the area tree. After that, the PageBreaker resumes fetching the following block-lists. The breaks for the latter part are computed later by an entirely separate PageBreakingAlgorithm. In fact, this is one scenario where line-breaking continues with a possibly different available inline-progression-dimension (i-p-d).
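Andreas's description of forced breaks can be illustrated with a toy model. The sketch below assumes a flow is just a list of block heights containing a hypothetical `FORCED_BREAK` marker; the class and method names are invented for illustration and are not FOP's PageBreaker or FlowLM API. It also swaps in a simple greedy breaker where FOP uses a Knuth-style total-fit algorithm. The point it shows: each segment between forced breaks is broken on its own and then released, so only one segment's elements need to be held at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these names are hypothetical, not FOP's PageBreaker/FlowLM API.
class ForcedBreakModel {
    // Hypothetical marker standing in for a forced page break
    // (e.g. break-before="page") in a flow of block heights.
    static final int FORCED_BREAK = -1;

    // Greedy first-fit pagination of one segment; returns the number of pages used.
    // FOP itself uses a Knuth-style total-fit algorithm, not this greedy one.
    static int paginate(List<Integer> segment, int pageHeight) {
        if (segment.isEmpty()) {
            return 0;
        }
        int pages = 1;
        int used = 0;
        for (int h : segment) {
            if (used > 0 && used + h > pageHeight) {
                pages++;   // block does not fit: start a new page
                used = 0;
            }
            used += h;
        }
        return pages;
    }

    // Returns {total pages, largest number of blocks held at once}.
    static int[] layout(int[] flow, int pageHeight) {
        List<Integer> segment = new ArrayList<>();
        int totalPages = 0;
        int peakHeld = 0;
        for (int el : flow) {
            if (el == FORCED_BREAK) {
                // Break the segment collected so far, then discard it:
                // its pages are finished and need not stay in memory.
                peakHeld = Math.max(peakHeld, segment.size());
                totalPages += paginate(segment, pageHeight);
                segment = new ArrayList<>();
            } else {
                segment.add(el);
            }
        }
        peakHeld = Math.max(peakHeld, segment.size());
        totalPages += paginate(segment, pageHeight);
        return new int[] {totalPages, peakHeld};
    }

    public static void main(String[] args) {
        final int F = FORCED_BREAK;
        int[] result = layout(new int[] {30, 30, 30, F, 50, 50, F, 10}, 60);
        // prints "pages=5, peak blocks held=3"
        System.out.println("pages=" + result[0] + ", peak blocks held=" + result[1]);
    }
}
```

Without the two markers, the same six blocks would all be held at once (peak 6) until the single segment is broken.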
Span-changes are another example where FOP currently already processes part of the page-sequence with a different PageBreakingAlgorithm. > I seem to recall that in the past this happened for hard line breaks. This is indeed not so. Hard line-breaks just trigger the end of the current Paragraph and start a new one (an empty one, if it only contains a preserved linefeed, to produce a blank line), but the main getNextKnuthElements() loop is not interrupted. The forced breaks do, however, help the algorithm. I once ran a test with a document containing one single fo:block with the pre-formatted text of an entire book. Without 'linefeed-treatment="preserve"', FOP needed at least 768MB to avoid running out of memory, because it had to recompute all the line-breaks. Preserving the linefeeds, I needed only 64MB (maybe even lower, but I don't think I tried that). Regards Andreas
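The memory difference Andreas measured comes down to paragraph size: with linefeed-treatment="preserve", each linefeed ends a paragraph, so line-breaking runs over many short paragraphs instead of one huge one. The toy proxy below (hypothetical names, not FOP code) counts the words in the largest paragraph as a rough stand-in for the element list the line breaker must keep in memory at once; real memory use depends on FOP's Knuth element lists and active break nodes, which this sketch does not model.

```java
// Toy proxy only: counts words per paragraph as a stand-in for the size of
// the element list a line breaker must hold at once. Not FOP code.
class ParagraphModel {
    // preserve=true mimics linefeed-treatment="preserve": each linefeed ends
    // a paragraph. preserve=false mimics the default "treat-as-space":
    // linefeeds become spaces and the whole block is a single paragraph.
    static int largestParagraph(String blockText, boolean preserve) {
        String[] paragraphs = preserve
                ? blockText.split("\n", -1)
                : new String[] {blockText.replace('\n', ' ')};
        int max = 0;
        for (String p : paragraphs) {
            String trimmed = p.trim();
            int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            max = Math.max(max, words);
        }
        return max;
    }

    public static void main(String[] args) {
        String text = "a b c\nd e\nf g h i";
        System.out.println(largestParagraph(text, true));  // 4
        System.out.println(largestParagraph(text, false)); // 9
    }
}
```

For a whole book in one fo:block, the "treat-as-space" case makes the largest paragraph the entire book, which is consistent with the much higher memory requirement reported above.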
RE: Apache FOP 0.95 Patch
Vincent - I agree this is a work-around and that it does distort the semantics of the fo:page-sequence element. When I opened up the FOP 0.95 source last week, it became apparent that changing where FOP starts its layout would be time-consuming. Currently, the handlers are tied directly to the FO tree, so rendering only starts when a page-sequence is closed. I would like to point out that the patched code does not start rendering any earlier than this; it simply provides a way to continue rendering without a page break. Again, this does not conform to the semantics of a page-sequence, which should be a set of pages that starts and ends on its own pages. However, within my time constraints there was no other way (at least that I could see) to provide a work-around that lets FOP manage memory correctly. The patch therefore allows RTF and PDF rendering to work either in the original fashion or with the new modified approach. The default behavior is for a page-sequence not to trigger a page break at the end of its content. To make a document render as it did before the patch, the page-sequence is specified as: ... Contents I understand why you are not going to merge this into Trunk. Maybe some of the work in this patch can be used to build a new mechanism for improving FOP's memory management in which rendering is not tied to the end of page-sequences. Cheers. Ben. -----Original Message----- From: Vincent Hennebert [mailto:vhenneb...@gmail.com] Sent: Thursday, June 04, 2009 7:35 AM To: fop-dev@xmlgraphics.apache.org Cc: Chris Fanjoy; Jody Brownell; Glen Campbell Subject: Re: Apache FOP 0.95 Patch Hi Ben, Thank you very much for your interest in FOP and your contribution.
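The constraint Ben describes, rendering that only starts when a page-sequence closes, can be modelled as a handler that buffers everything until the end event arrives. The sketch below is a toy model with hypothetical names (it is not FOP's FOEventHandler, AreaTreeHandler or RTFHandler API); it shows that the peak amount of buffered content equals the size of the largest page-sequence, which is why splitting a report into several page-sequences lowers memory use.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of rendering that is only triggered when a page-sequence closes.
// All names here are hypothetical, not FOP's handler API.
class PageSequenceBuffer {
    private final List<String> pending = new ArrayList<>();
    private int peakBuffered = 0;
    private int renderedItems = 0;

    void startPageSequence() {
        pending.clear();
    }

    // Content arriving from the FO input is only buffered, never rendered here.
    void content(String item) {
        pending.add(item);
        peakBuffered = Math.max(peakBuffered, pending.size());
    }

    // Layout and rendering happen only here, so the whole sequence stays buffered until now.
    void endPageSequence() {
        renderedItems += pending.size(); // stand-in for layout + rendering
        pending.clear();                 // memory for this sequence is released
    }

    int peakBuffered() { return peakBuffered; }
    int renderedItems() { return renderedItems; }

    // Feeds `records` items, split into page-sequences of at most `perSequence` items.
    static PageSequenceBuffer run(int records, int perSequence) {
        if (perSequence <= 0) {
            throw new IllegalArgumentException("perSequence must be positive");
        }
        PageSequenceBuffer h = new PageSequenceBuffer();
        for (int i = 0; i < records; i += perSequence) {
            h.startPageSequence();
            for (int j = i; j < Math.min(records, i + perSequence); j++) {
                h.content("record " + j);
            }
            h.endPageSequence();
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(run(10000, 10000).peakBuffered()); // 10000: one page-sequence
        System.out.println(run(10000, 500).peakBuffered());   // 500: twenty page-sequences
    }
}
```

Splitting only lowers the peak; in unpatched FOP each page-sequence still starts on a new page, which is exactly the trade-off Ben's patch tried to remove.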
I've opened a Bugzilla issue containing your patch so that it can easily be referred to: https://issues.apache.org/bugzilla/show_bug.cgi?id=47314 It is likely to interest other users who run into similar memory issues, and the good thing about having made it against the 0.95 release is that it won't be made obsolete by further changes in the code. We are not going to apply this patch to the Trunk, though. This is a workaround that, although quite clever, distorts the semantics of the fo:page-sequence element too much. A page sequence really is a self-contained set of typographical material that should start and end on its own pages (the common analogy is the chapter of a book). Rather, at some point we will have to tackle the artificial limitation of starting the layout only when the end of the page sequence has been reached. It's possible to start earlier, as you have in a way demonstrated. Also, we are planning to implement several layout options, providing different trade-offs between speed/memory consumption and typographical quality. Eventually it should no longer be necessary to split the document into several page sequences to avoid memory issues. Still, in the meantime your patch may save the lives of users who need a quick solution to that problem. I hope you understand. Thanks again, Vincent Ben Wuest wrote: > Hi - > We recently integrated Apache FOP 0.95 with our software to perform the rendering of RTF and PDF reports. This integration was very quick and provided great results. However, due to the large amounts of data that our software is required to handle, we began experiencing Out of Memory problems with FOP. We researched this, sent messages to the user community, and determined that we were experiencing OOM issues because each of our reports existed in one page-sequence.
We concluded from the community response, web forums, and analysis of the Apache FOP code itself that FOP reads to the end of a page-sequence and only then begins to render. With such large amounts of data (40 MB FO files), we quickly ran into scalability issues with one page-sequence per report. We then divided our reports into multiple page-sequences, only to find that FOP starts a new page for every page-sequence and that this behavior cannot be changed (by altering the FO file). Page breaks at unpredictable locations (sometimes leaving half or ¾ of a page empty) made the report presentation visually unacceptable. > We have modified the Apache FOP 0.95 code for PDF and RTF rendering and would like to offer this patch back to the community (the attached SVN diff is against the 0.95 release baseline). Listed below is an overview of the modifications that have been made. > 1. Page Sequence Changes > The handling of the break-after attribute was added to the page-sequence. This can only be set to auto (meaning that no page break will occur after t
Hyphenation Compilation Problem with Java 1.6.0_10
org.apache.fop.hyphenation.TernaryTree.insert(TernaryTree.java:228) Ben Wuest Software Engineer, Development Q1 Labs Inc - The Nexus of Security and Networking Office: (506)-462-9117 ext 163 Fax: (506)-459-7016 ben.wu...@q1labs.com | http://www.q1labs.com <http://www.q1labs.com/>
Next Release Date?
Hi All - We are currently looking at integrating FOP for generating PDF and RTF documents. We have a policy of only using the latest stable releases. We came across the issue in FOP 0.95 in which OpenOffice has trouble displaying multiple columns. This has been remedied in the trunk (see https://issues.apache.org/bugzilla/show_bug.cgi?id=45616 and http://svn.apache.org/viewvc?view=rev&revision=693742 for the original bug report and fix). We are wondering when the next release is planned, because we are interested in this fix and are trying to do some planning. Any info would be appreciated. Cheers Ben