How does one unsubscribe
I'll be changing to a new e-mail address and am cancelling my list subscriptions. fop-user has instructions at the bottom of messages but fop-dev doesn't. I guess I'll have to read the web page.
Re: [Fwd: Re: cvs commit: xml-fop/src/java/org/apache/fop/apps CommandLineOptions.java Fop.java]
On Mon, 2004-04-12 at 04:33, Peter B. West wrote:
> Glen,
>
> I put in a vote for Simon. The language thing is confusing, I know.
> There have been occasions on which the Austrian flag has been flown, or
> the Austrian National Anthem been played, somewhat inappropriately. But
> it's en_AU over here; AU because we got in first.

And the Austrians don't call it Austria ... Isn't it Österreich?

John Austin <[EMAIL PROTECTED]>
Re: urgent help needed using FOP
ush();
>
> } catch (Exception ex) {
>     throw new ServletException(ex);
> }
> }
>
> ...
>
> }
>
> This is the exact error I got:
>
> org.xml.sax.SAXParseException: Content is not allowed in prolog.
>     at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1172)
>     at org.apache.fop.apps.Driver.render(Driver.java:498)
>     at org.apache.fop.apps.Driver.run(Driver.java:565)

-- John Austin <[EMAIL PROTECTED]>
Re: DO NOT REPLY [Bug 27901] - TextCharIterator.remove() does not work properly
On Thu, 2004-03-25 at 19:08, Glen Mazza wrote: > Ich bin confused--ist chz ([EMAIL PROTECTED])--Christian > Geisert oder anderer Christian? The bugzilla entry > lists chz as being "Christian Z", so I'm not sure whom > I'm speaking with! So we shouldn't all be running around with multiple e-mail identities ? My excuse is, I used that e-mail address years ago when I opened my first Bugzilla account. -- John Austin <[EMAIL PROTECTED]>
Re: Java theory and practice: Garbage collection and performance
On Fri, 2004-02-20 at 15:46, J.Pietschmann wrote:
> *bg*
> Twenty years ago, I had to work on a 8008 driven computer
> with 4k RAM and 12k ROM. That's enough to run a program
> which nicely prints formatted and justified text (25 lines
> a 80 characters). We went a long way since then.

I went to a presentation on the Mars Rovers at the St John's GeoCentre, which is one of the sites that NASA has granted access to the FTP site for fresh images ... Comparing the old Mars projects to the new stuff ... That was FORTRAN ... This is Java.

I recall hearing about a court case in which the Canadian Military were suing a supplier about something as trivial nowadays as 8K of memory.

-- John Austin <[EMAIL PROTECTED]>
Re: Java theory and practice: Garbage collection and performance
On Thu, 2004-02-19 at 17:53, J.Pietschmann wrote:
> John Austin wrote:
> > I noticed this article on Developer Works:
> >
> > Java theory and practice: Garbage collection and performance
> > http://www-106.ibm.com/developerworks/library/j-jtp01274.html
> >
> > Something to read on Thursday.
>
> Nice read, however, they don't talk about constructors. There

Isn't allocation the only unseen part of construction? Everything else is visible in the code, and surely a few assignments are never expensive. Any other expensive operations will stand out in measurements of code execution.

> are still arguments for reusing objects and for trying to
> replace objects with a bunch of primitive values.
> (BTW a nice try selling yet-to-be-written optimizations
> regarding inlining...)

Moore's law is another optimization we sell in advance all the time.

-- John Austin <[EMAIL PROTECTED]>
Java theory and practice: Garbage collection and performance
I noticed this article on Developer Works:

Java theory and practice: Garbage collection and performance
http://www-106.ibm.com/developerworks/library/j-jtp01274.html

Something to read on Thursday.

-- John Austin <[EMAIL PROTECTED]>
RE: Just a small question...
On Thu, 2004-02-05 at 15:28, Andreas L. Delmelle wrote:
> I think this is a bit over the top. Suppose that tomorrow, someone gets
> fired at RX or AH, and this ex-employee decides to share some ideas with us.
> Are we really going to tell him to take a hike?? Just because of simple
> integrity? (Suppose that, before we find out, he has already submitted a few
> patches that have been applied. Would we undo all of these patches, because
> of 'simple integrity'?)

I am surprised that MS or their minions at SCO haven't twigged to the following scheme: they could 'set up' Open Source by masquerading as some student in netland and submitting some provably proprietary code as original. Six months later, MS sues Linus for malfeasance with the vigorous support of Homeland Security ...

Of course, conspiracies never succeed for long. Some small fish would rat them out.

-- John Austin <[EMAIL PROTECTED]>
Re: (FOP examples) XSLT question
On Wed, 2004-02-04 at 21:13, Glen Mazza wrote: > > ... > ... > Version select="$versionParam"/> ... > > But it keeps outputting "Version 1" in the resultant > PDF. What is the standard way of getting it to > display "Version 1.0"? select='format-number($versionParam,"##.0")' should work. -- John Austin <[EMAIL PROTECTED]>
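A minimal sketch of the format-number() suggestion above, run through JAXP. The stylesheet here is a hypothetical stand-in (the original example file is not shown in the thread), and the class name VersionParamDemo is invented for illustration. It assumes the parameter holds the number 1.0; XPath 1.0 converts such a number to the string "1", which would explain the "Version 1" output, while format-number() with the pattern "##.0" forces one decimal digit.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class VersionParamDemo {
    // Hypothetical stylesheet: a numeric parameter printed with one decimal digit.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='versionParam' select='1.0'/>"
      + "<xsl:template match='/'>"
      + "Version <xsl:value-of select='format-number($versionParam, \"##.0\")'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the stylesheet; a non-null argument overrides the default parameter value. */
    static String render(Object version) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            if (version != null) {
                t.setParameter("versionParam", version);
            }
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(null));
    }
}
```

Note that the JAXP call is Transformer.setParameter(), not setParam().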
Re: (FOP examples) XSLT question
On Wed, 2004-02-04 at 21:13, Glen Mazza wrote: > Since this is FOP work-related, I guess I can be > allowed to ask a very newbie XSLT question here: > > I just added a parameter to one of the XSL example > files (eventually to show the use of a JAXP > transformer.setParam() call) as follows: > > > ... > ... > Version select="$versionParam"/> ... > > But it keeps outputting "Version 1" in the resultant > PDF. What is the standard way of getting it to > display "Version 1.0"? Isn't there also or for this sort of thing. I think value-of implies some kind of conversion ... My reference is upstairs. -- John Austin <[EMAIL PROTECTED]>
RE: Unnesting properties and makers.
On Mon, 2004-01-26 at 17:45, Andreas L. Delmelle wrote:
> > -Original Message-
> > From: Finn Bock [mailto:[EMAIL PROTECTED]
>
> > The result is then:
> >
> > [/d/fop] /c/java/jdk1.2.2/jre/bin/java.exe -cp . x
> > false method call 581
> > true method call 581
> > false instanceof 160
> > true instanceof 170
> >
> > [/d/fop] /c/java/jdk1.3.1_03/jre/bin/java.exe -cp . x
> > false method call 1272
> > true method call 2304
> > false instanceof 17945
> > true instanceof 912
> >
> > [/d/fop] /c/java/j2sdk1.4.2_02/bin/java.exe -cp . x
> > false method call 2154
> > true method call 2754
> > false instanceof 590
> > true instanceof 651
> >
>
> Very, very interesting... Java's OO-optimization at its best (except for
> 1.3)! After all, it shouldn't be *that* surprising that an
> accessor-method-call generates more overhead than a test for
> class-membership (but what if the class in question is not yet loaded at
> time? Not that this should occur a lot...)

So I copied that program and ran it on my RH 9 system. Got the following results. I am just quoting the results here. Note that the default JVM is -client or HotSpot ...

[EMAIL PROTECTED] foptest]$ java -classpath . x
false method call 998
true method call 1001
false instanceof 3008
true instanceof 4119

[EMAIL PROTECTED] foptest]$ java -server -classpath . x
false method call 1
true method call 0
false instanceof 0
true instanceof 4822

[EMAIL PROTECTED] foptest]$ java -server x
false method call 1
true method call 0
false instanceof 0
true instanceof 4784

java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-b28)
Java HotSpot(TM) Client VM (build 1.4.2-b28, mixed mode)

H.

-- John Austin <[EMAIL PROTECTED]>
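The test program x is not reproduced in the thread, so the following is only a sketch of the kind of comparison being timed, with invented names (DispatchBench, Shape, Square, Circle). One plausible reading of the near-zero -server timings is that the optimizing compiler removed loops whose results were never used; that is an assumption, not something the thread confirms. The sketch guards against it by returning and printing a count from each loop.

```java
final class DispatchBench {
    // Hypothetical types standing in for whatever the original program tested.
    interface Shape { boolean isSquare(); }
    static final class Square implements Shape { public boolean isSquare() { return true; } }
    static final class Circle implements Shape { public boolean isSquare() { return false; } }

    /** Counts squares by calling an accessor method on each element. */
    static int countByMethod(Shape[] shapes, int rounds) {
        int count = 0;
        for (int r = 0; r < rounds; r++) {
            for (Shape s : shapes) {
                if (s.isSquare()) count++;
            }
        }
        return count;
    }

    /** Counts squares with an instanceof test on each element. */
    static int countByInstanceof(Shape[] shapes, int rounds) {
        int count = 0;
        for (int r = 0; r < rounds; r++) {
            for (Shape s : shapes) {
                if (s instanceof Square) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[1000];
        for (int i = 0; i < shapes.length; i++) {
            shapes[i] = (i % 2 == 0) ? new Square() : new Circle();
        }
        long t0 = System.nanoTime();
        int a = countByMethod(shapes, 100_000);
        long t1 = System.nanoTime();
        int b = countByInstanceof(shapes, 100_000);
        long t2 = System.nanoTime();
        // Printing the counts keeps the loop results live.
        System.out.println("method call " + (t1 - t0) / 1_000_000 + " ms, count " + a);
        System.out.println("instanceof  " + (t2 - t1) / 1_000_000 + " ms, count " + b);
    }
}
```

Timings from a loop like this still depend heavily on warm-up and on the JVM in use, so they are best treated as rough indications.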
RE: Unnesting properties and makers.
On Mon, 2004-01-26 at 17:45, Andreas L. Delmelle wrote: > > -Original Message- > > From: Finn Bock [mailto:[EMAIL PROTECTED] > > > The result is then: > > > > [/d/fop] /c/java/jdk1.2.2/jre/bin/java.exe -cp . x > > false method call 581 > > true method call 581 > > false instanceof 160 > > true instanceof 170 > > > > [/d/fop] /c/java/jdk1.3.1_03/jre/bin/java.exe -cp . x > > false method call 1272 > > true method call 2304 > > false instanceof 17945 > > true instanceof 912 > > > > [/d/fop] /c/java/j2sdk1.4.2_02/bin/java.exe -cp . x > > false method call 2154 > > true method call 2754 > > false instanceof 590 > > true instanceof 651 > > > > Very, very interesting... When did the choice of JVM (java -client | java -server) appear ? Wasn't it 1.3 ? -- John Austin <[EMAIL PROTECTED]>
Re: Newbie committer questions.
On Tue, 2004-01-20 at 21:10, Glen Mazza wrote: > Actually, yes and no, as I learned a ton--XSLT, Java, > and FOP--from his coding. (Jeremias never taught me > that much! ;) However, I am quite fatigued right now You have been impressively active in the past short while. > and need a few days off. Finn, would you mind taking > over the rest of your last patch? The issues I found > can be discussed and changed, if necessary, after you > apply it. Sometimes I feel like I did when I first heard Wynton Marsalis play some variations from the "Trumpet Method" by J. Arban ... (Carnival of Venice) Man, he plays that faster than I can READ it ... -- John Austin <[EMAIL PROTECTED]>
Re: Quote of the Day
On Mon, 2004-01-19 at 16:17, Andreas L. Delmelle wrote: > "When I hear Bill Gates bragging about how his programmers can code up to 72 There's the world's richest hermit again. Maybe he'll end up nuttier than Howard Hughes. -- John Austin <[EMAIL PROTECTED]>
Re: Servlet Examples in HEAD v.s. 0.20.5
On Sun, 2004-01-18 at 08:49, J.Pietschmann wrote: > John Austin wrote: > >(is Content-length: required for any reason other than placating > >Acrobat and that rich hermit who lives outside Redmond WA ?) > > Not really a FOP topic but anyway. > Setting content-length is considered "good style", because it allows > browsers give feedback to the users how far the download proceeded. > This is especially useful for larger files on slow connections. > Of course, there is a tradeoff for dynamically generated content: > there wont be any feedback at all until the content is ready, and > if this is longer than the download time itself (now that everybody > has broadband :-) ), the user is still dissatisfied. Well, the > IEx architecture bug saves us from pondering the philosophical > background. Mentioned because it is in the extant codebase even though it isn't necessary. I deduce it is related to Acrobat because of cryptic comments in the documentation. > > 2) Cache Templates objects for faster Transformations when XSLT > >files are to be re-used. The 'Java and XSLT' O'Reilly book > >has some interesting suggestions in this area. > > The problem is to detect style sheet reuse without context information. I think the only prob is how to purge from the cache. Re-use detected if names are URL's. Still faces the problem of detecting changes to stylesheets. Discussed a bit in Burke's book. > > 3) Using URL's for the fo= and xml=,xsl= parameters so we can use > >network resources as well as local files. > > +1000. > Doh, revert to +0. I'd like to do this, unfortunately, this is not > without drawbacks: > - People have to learn what an URI is. This seems to be much harder > than expected, especially for file:-URLs. > - People will still insist to keep "xml=foo.xml". This is still an > URL (actually: a relative URL reference, which has to be resolved). > We have to think hard what the base URL is in this case. 
What if default xml=fred.xml is mapped to xml=file://./fred.xml where the servlet's 'working dir' is defined relative to servlet context? Then we can ship some of our test xml/xsl files in that location and people have something to start with.

> J.Pietschmann

-- John Austin <[EMAIL PROTECTED]>
Re: Servlet Examples in HEAD v.s. 0.20.5
On Sat, 2004-01-17 at 19:18, Jeremias Maerki wrote: > Discussion on this can be found here: > http://marc.theaimsgroup.com/?t=10383153256&r=1&w=2 > http://marc.theaimsgroup.com/?t=10172302692&r=1&w=2 > > There were pros and cons about the move from examples into the main > source tree. I think the triggering point was that the servlet sees real > use and doesn't really qualify as an "example". I agree that with > today's build it may not be so obvious what is necessary to build the > WAR file (the various parts are distributed in the source tree). But the > WAR file gets built automatically today. Doh! I did a 'locate fop.war' and there it is! Of course, my oldish snapshot from HEAD doesn't work, so there was no output from running it, but it does build and deploy into Tomcat. > Proposal: I like your ideas. I also think that we have to preserve the > simplicity of the servlet as an educational example for people who want > to play with it. So what about resurrecting the examples/servlet but > keeping it real simple? Just the basics. And the servlet in the main > source tree stays where it is but gets your new features. I would expect an example servlet to be quite simple with descriptive comments and suggestions for variations. The purpose here would be to provide a prototypical webapp that could be used to populate a small project in the user's development space. I did find it difficult to settle on a set of features that I would include in a single 'FopServlet' program. This is simplified if FopServlet is primarily real working code. I would be comfortable with an org.apache.fop.servlet.FopServlet that included some more advanced features: 1) Deflate and Inflate the byte stream used to store the PDF file (is Content-length: required for any reason other than placating Acrobat and that rich hermit who lives outside Redmond WA ?) 2) Cache Templates objects for faster Transformations when XSLT files are to be re-used. 
The 'Java and XSLT' O'Reilly book has some interesting suggestions in this area. 3) Using URL's for the fo= and xml=,xsl= parameters so we can use network resources as well as local files. 4) Detect IE and redirect users to a URL that has the proper '.pdf' filetypes in basename and end of request URL. 5) The servlet could be used as part of an automated testing process. The fop.war file could be deployed in Tomcat as part of an HttpUnit test and then many of our tests could be run using HttpUnit. Examples could be simpler than this as they have the specific purpose of illustrating a practical use case. > German speaking Swiss people would say you get "de Föifer und's Weggli" > (freely translated to English: the 5 cent piece and the donut. Meaning: > You get twice as happy. Want to know what a "Weggli" is? Go to > http://www.jowa.ch/1776/1846/1847/1865/1867.asp). :-) My German has atrophied over the past 31 years. I left Ramstein, Germany in July 1973 and except for one undergrad course, have only spoken German once or twice since. [I stopped overnight in Lahr about 1978.] As a Canadian I understand 'donut' (see http://www.timhortons.com/) but I always think of Brötchen as a German pastry. -- John Austin <[EMAIL PROTECTED]>
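Item 2 in the list above (caching Templates objects) could be sketched roughly as follows. The class name TemplatesCache and its loader function are hypothetical, not part of FOP or of the servlet discussed here. The sketch relies on the JAXP guarantee that a Templates instance may be shared between threads, and it leaves open the question raised in the thread of how to notice that a stylesheet has changed; the invalidate() method is only a manual hook for that.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

final class TemplatesCache {
    private final ConcurrentMap<String, Templates> cache = new ConcurrentHashMap<>();
    private final Function<String, Source> loader;

    /** The loader turns a cache key (for example a stylesheet URL) into a Source. */
    TemplatesCache(Function<String, Source> loader) {
        this.loader = loader;
    }

    /** Convenience factory: treats each key as a system identifier (URL). */
    static TemplatesCache forUrls() {
        return new TemplatesCache(StreamSource::new);
    }

    /** Returns the compiled stylesheet, compiling it only on first use. */
    Templates get(String key) {
        return cache.computeIfAbsent(key, k -> {
            try {
                return TransformerFactory.newInstance().newTemplates(loader.apply(k));
            } catch (TransformerConfigurationException e) {
                throw new IllegalStateException("Cannot compile stylesheet " + k, e);
            }
        });
    }

    /** Drops one entry, e.g. after the stylesheet is known to have changed. */
    void invalidate(String key) {
        cache.remove(key);
    }
}
```

A servlet would then call cache.get(xslUrl).newTransformer() for each request instead of recompiling the stylesheet.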
Servlet Examples in HEAD v.s. 0.20.5
After last week's thread about running FOP in a servlet, I thought I'd review the examples with a view to improving the end-user experience and flattening the learning curve. Some notes:

The current sample, org.apache.fop.servlet.FopServlet, has been improved in HEAD but the packaging seems (IMHO) to have suffered.

In 0.20.5 the examples/servlet directory contains a fully-functional web application that can be deployed and run in the latest Tomcat. This webapp includes a valid build.xml file so one can simply type 'ant' in the examples/servlet directory. Even better, the Ant Farm plugin in jEdit can build 'fop.war'. From the Tomcat Manager window, you can upload 'fop.war' and use the webapp right away.

The HEAD version of FopServlet has been rewritten to use JAXP and works reasonably well. Unfortunately, the examples/servlet directory has disappeared from the project. It has not disappeared from the documentation, so there is an error there.

The servlet seems to make provisions for peculiarities of the Acrobat plug-in (it writes the PDF to a memory buffer, then copies this to response.getOutputStream() after setting the Content-length header). This knowledge SHOULD appear in program comments. The same is true for information about Internet Explorer and its need for the filetype '.pdf' in the base URL and the end of the invoking URL. One could even update the example to issue a redirect for Evil(tm) User Agents so that the IE user's request is corrected for him (heh .. heh).

Would anyone be offended if we were to put the examples/servlet back in to the build? We could update the deployment descriptor to use org.apache.fop.servlet.FopServlet (and FopPrintServlet) as well as one or two new examples (whose code appears in the examples/servlet directory) that illustrate other concepts such as cached Templates objects and the use of Deflater/Inflater streams to reduce the size of the in-memory PDF file buffer.
I have some thoughts about generalizing FopServlet to use URL parameters so that both server-side files and network-resident HTTP resources would be usable. I would consider adding some of my own test files which demonstrate the use of FOP to generate letters and print envelopes from data base output. It should be possible to build a servlet example that executes all of the .fo and .xml/.xsl files in the examples directory. It would be nice for potential users to have an out-of-box webapp that runs a large number of our examples. -- John Austin <[EMAIL PROTECTED]>
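The Deflater/Inflater idea mentioned above could look roughly like the sketch below. CompressedBuffer is an invented name, and the sketch uses plain byte arrays rather than the servlet API so that it stays self-contained. The uncompressed byte count is tracked separately because that is the value a Content-length header would need.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class CompressedBuffer {
    private final ByteArrayOutputStream compressed = new ByteArrayOutputStream();
    private final DeflaterOutputStream sink = new DeflaterOutputStream(compressed);
    private int rawLength;

    /** Appends rendered output; it is deflated before being held in memory. */
    void write(byte[] data) {
        try {
            sink.write(data);
            rawLength += data.length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Uncompressed size, i.e. the value to report as the content length. */
    int rawLength() {
        return rawLength;
    }

    /** Inflates the buffered bytes and copies them to the given stream. */
    void copyTo(OutputStream out) {
        try {
            sink.finish(); // flush remaining compressed data; safe to call repeatedly
            try (InputStream in = new InflaterInputStream(
                    new ByteArrayInputStream(compressed.toByteArray()))) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a servlet, one would set the content length from rawLength() and then call copyTo(response.getOutputStream()). Whether the memory saved is worth the extra CPU time would need measuring.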
Re: HashMap
On Wed, 2004-01-14 at 21:27, Peter B. West wrote: > A friend was watching over my shoulder as I was responding to an earlier > message on fop-dev. "HashMaps... I won't say what image that conjures > up for me." "Well?" "A map of where you have the stash." > > I never thought of it that way. Those of you in 'foreign climes' won't have heard of Canada's latest drug bust. A former brewery north of Toronto was being used as one of the largest 'grow ops' (hydroponic marijuana factory) ever discovered. The Globe and Mail (http://www.globeandmail.com/) stated that Ontario produces more weed than the entire population could possibly smoke. There's an image of Canada that I want Europeans to have. Of course it would slow hockey down quite a bit (but it would dramatically increase concession sales at NHL games ...) and cut out the fights. And only one of the Cheech and Chong guys is/was Canajun, eh! Anyway ... the former Molson's brewery in Barrie Ontario next to Highway 400 (Interstate/Motorway/Autobahn) ... had everything they needed ... huge metal kettles ... loading docks ... -- John Austin <[EMAIL PROTECTED]>
RE: [Bug 25480] - Experimental performance improvements.
On Tue, 2004-01-13 at 20:49, Glen Mazza wrote: > Let's not get too certain of anything right now with > respect to implementation--but you probably have a > point--a huge and very repetitively formatted document > (say, the Chicago phone book, perhaps) would have > comparatively fewer properties with a higher > cardinality for each. SOLVED! Yes! Something to cheer up a morbidly downcast Packers fan two days after the fall of the mighty number '4'. I used DocBook for the frequency table because I was familiar with formatting it as PDF with FOP. I suspect that properties have similar distributions in general because XSL-FO are always generated with programs and (ransom notes notwithstanding) adhere to general styles. Really repetitive documents would be only slightly more skewed than general text documents. (Say 90-10 rather than 80-20). Someone told me where to get the style sheets for the XSL-FO specification (RenderX) and I wanted to generate the XSL-FO file for it, as a more appropriate 'challenge' for the project. -- John Austin <[EMAIL PROTECTED]>
Re: PropertySets - target-locks on SDK 1.4
On Mon, 2004-01-05 at 21:11, Glen Mazza wrote: > It's probably not *yet* time to set 1.4 as the JDK to > code against for 1.0, but it probably wouldn't be much > of a disaster if we did so either. Does a target-lock commitment like this require a vote ? John Austin <[EMAIL PROTECTED]>
Re: AW: Regression tests was: Re: Output from NIST test suite
On Fri, 2003-12-26 at 05:29, Peter Kullmann wrote: > J. Pietschmann wrote: > > > > John Austin wrote: > > > RedHat 9.0 (my system anyhow) includes a command 'pdftopbm' > > that will > > > convert a PDF to multiple PBM (protable Bit Map) files that might be > > > comparable. > > ... > > >It would certainly help detect pixel-sized changes. > > > That might help regression testing. I wasn't thinking of using graphics as the primary means of comparing output. It was just a thought that one could use visualization in some circumstances: + pixels that were white in both images would be rendered as white + pixels that were black in both images would be rendered as black + black pixels in the first image that were white in the second could be rendered as red + white pixels in the first image that were black in the second could be rendered as blue I thought of the idea of overlaying images for comparison when I was scrolling through the side-by-side renderings of PDF's that Finn posted yesterday (what does 'yesterday' mean in a discussion that crosses the International Date Line ?) Of course, this color-based scheme breaks down for test cases that use color. > > > > We need regression tests badly. Some problems to ponder: > > a) Tests need to be automated for actually being useful. > > JUnit seems the way to go. Unfortuanately, it's still > > underutiliyed in FOP. > > b) We don't have much *unit* tests. There's only the > > UtilityCodeTestSuite.java. We need much more tests for > > basic functionality. The problem seems to be however > > that an elaborated test harness needs to be written in > > order to do unt tests for, e.g. layout managers. > > c) In order to test the whole engine at once, from FO input > > to generating PDF/whatever, well, a binary compare with > > a pregenerated PDF would be as sufficient as comparing > > bitmap images. Problems here: > > + The files to compare against are binary, and consume > >a lot of space. 
Well, take a look at GenericFOPTestCase.java > >which uses MD5 sums, one for the FO in order to detect > >accidental changes to the source, and one for the result. > > + Even small changes have potential to break the whole test > >suite, even if nothing important changed, let's say the > >order of entries in a PDF dictionary. Rendering bitmaps > >from PDF eliminates this, but then you wont find regressions > >in non-visible stuff. > > All in all, if there are 143 template PDFs and a change causes > > mismatches for all, what will you do? Examine everything, > > comparing pixels, check whether there are visible differences > > at all, and then judge whether the original or the newly > > generated PDF is at fault? I don't think this will be done > > often. Use tests for binary equality to detect differences. Visualization might be one tool, useful in following up on detected differences. I might want to use the technique to compare the effects of changes to a document. For example: What happens on page 7 when I change space-before="10pt" to space-before="15pt" ? A colorized visualization would give me a better idea than separate files. Remember that our brains are all quite different. Your rote visual memory ability is probably much better than mine. You might learn more from a side-by-side comparison than I would. Crap. Now I have to give an example. Perhaps it won't take that long. > > > > Ideas welcome! > > > > J.Pietschmann > > > > As an alternative approach for c) one could create tests along > the following lines: Suppose you want to test left margin > properties of a block. For this a simple fo file is rendered as > a bitmap. The bitmap will not be compared to a reference bitmap > but some elementary assertions are calculated. For instance one > such assertion could be: "The rectangle of width 1 inch of the > left edge is blank." I don't know of a tool that can do this > but it should be pretty straight forward to implement. 
Probably not that hard to do once you get inside an image file in a program. Especially if you know the colors will be black (0,0,0) and white (255,255,255) or a small number of selected colors.

> So, in the test suite one has a piece of fo containing a test
> document and some assertions in java or coded in xml that should
> be fulfilled by the rendered image of the fo.
>
> Assertions could contain some of the following pieces:
> - a specified rectangle is blank (or of some specific color)
>
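Both ideas in this message, the colour-coded overlay of two renderings and the "rectangle is blank" assertion, can be sketched with java.awt.image.BufferedImage. The class PageDiff and its methods are hypothetical, and the sketch assumes the pages have already been rasterised to images by some other tool.

```java
import java.awt.image.BufferedImage;

final class PageDiff {
    static final int WHITE = 0xFFFFFF;
    static final int BLACK = 0x000000;
    static final int RED = 0xFF0000;   // black in the first image only
    static final int BLUE = 0x0000FF;  // black in the second image only

    /** Treats a pixel as black when its average channel value is below 128. */
    static boolean isBlack(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (r + g + b) / 3 < 128;
    }

    /** Builds the overlay over the area the two images have in common. */
    static BufferedImage diff(BufferedImage first, BufferedImage second) {
        int w = Math.min(first.getWidth(), second.getWidth());
        int h = Math.min(first.getHeight(), second.getHeight());
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean a = isBlack(first.getRGB(x, y));
                boolean b = isBlack(second.getRGB(x, y));
                int colour = (a && b) ? BLACK : (!a && !b) ? WHITE : a ? RED : BLUE;
                out.setRGB(x, y, colour);
            }
        }
        return out;
    }

    /** Assertion helper: true when no pixel in the rectangle is black. */
    static boolean isBlank(BufferedImage img, int x0, int y0, int w, int h) {
        for (int y = y0; y < y0 + h; y++) {
            for (int x = x0; x < x0 + w; x++) {
                if (isBlack(img.getRGB(x, y))) return false;
            }
        }
        return true;
    }
}
```

As noted in the message, a scheme like this only works for black-and-white test pages.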
Re: Output from NIST test suite
On Thu, 2003-12-25 at 11:42, Finn Bock wrote:
> Hi,
>
> After 'fixing' the master-reference issue in my copy of the NIST test
> suite, I ran the tests against 0.20.5 and 1.0dev and merged the result
> side by side into a single .pdf file.
>
> You can download the result (1Mb) here:
>
> http://bckfnn-modules.sf.net/out-0.20.5-1.0.pdf
>
> For some reason the pdf does not display correctly in my browsers, so it
> is better to download it. The merged pdf file is created using iText.
>
> The square to the left contains the output from 0.20.5 and the square on
> the right the output from HEAD.
>
> Here is also a merge between the pdf files that comes with the NIST
> suite and head:
>
> http://bckfnn-modules.sf.net/out-nist-1.0.pdf
>
> There are still a few issues left to fix .
>
>
> Another way of using the test suite could be to compare a binary image
> of the pages against some kind of reference. Has such an approach been
> tried? Does anyone know of available software that can render a PDF as
> an image file?

RedHat 9.0 (my system anyhow) includes a command 'pdftopbm' that will convert a PDF to multiple PBM (Portable Bit Map) files that might be comparable. They would be convertible into other formats such as PNG (or GIF for the patent-minded).

I found the result pretty poor (ugly text badly in need of anti-aliasing). That might help contribute to keeping images similar. It would certainly help detect pixel-sized changes. That might help regression testing.

There are suggestions on the Net that Ghostscript can do this sort of conversion as well.

GIMP can read a PDF as well. When I tried it, I got a graphic for every pair of pages (my doc was over 133 pages). Perhaps some script-fu ... ?

> regards,
> finn

-- John Austin <[EMAIL PROTECTED]>
Re: Output from NIST test suite
On Thu, 2003-12-25 at 11:42, Finn Bock wrote: > Hi, > > After 'fixing' the master-reference issue in my copy of the NIST test > suite, I ran the tests against 0.20.5 and 1.0dev and merged the result > side by side into a single .pdf file. Interesting technique. What tool do you use to make the side-by-side comparison ? -- John Austin <[EMAIL PROTECTED]>
Re: Is this a coding flaw ?
Nothing to do with optimization. Just noticed some wrongness that has the possibility to be pathological wrongness. Classes should preclude the possibility of erroneous use. The subject was making a URL resolver thread-safe. The class in question is a source of state information needed later by the resolver. [Lucky thing we didn't mention the dirty knife!] On Fri, 2003-12-19 at 11:50, Ben Galbraith wrote: > Jeremias Maerki wrote: > > Hmm, again, we could probably cache the value. Not very elegant, of > > course, but how else do we get that value which is used in several > > places? > > Just an outsider's point-of-view: it probably doesn't make sense to > waste time optimizing code like this unless a profiler indicates that > it's a bottleneck. > > Randomly searching through code for potential inefficiencies has widely > been disproven as an effective optimization technique. ;-) > > Ben > > > > > On 19.12.2003 13:57:26 John Austin wrote: > > > >>And of course, I missed the fact that the last method in the class > >>contains a pathological use. To get the name of this class, we create a > >>parser ? > >> > >> /** > >> * Returns the fully qualified classname of the standard XML parser > >>for FOP > >> * to use. > >> * @return the XML parser classname > >> */ > >>public static final String getParserClassName() { > >>try { > >>return createParser().getClass().getName(); > >>} catch (FOPException e) { > >>return null; > >>} > >>} > > > > > > > > Jeremias Maerki > > -- John Austin <[EMAIL PROTECTED]>
Re: Is this a coding flaw ?
On Fri, 2003-12-19 at 10:02, Jeremias Maerki wrote:
> It should be thread-safe, the way it is used here. You could of course,
> cache the SAXParserFactory instance but I doubt the performance
> improvement would be measurable. getParser is probably not the best name
> if you look at it from a bean-oriented angle but it's not that it's
> called many times anyway. Do you think we should rename it?
>
> On 19.12.2003 13:13:45 John Austin wrote:
> > I found the following snippet in the class FOFileHandler:

And of course, I missed the fact that the last method in the class contains a pathological use. To get the name of this class, we create a parser?

    /**
     * Returns the fully qualified classname of the standard XML parser for FOP
     * to use.
     * @return the XML parser classname
     */
    public static final String getParserClassName() {
        try {
            return createParser().getClass().getName();
        } catch (FOPException e) {
            return null;
        }
    }

-- John Austin <[EMAIL PROTECTED]>
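One way to avoid building a parser on every call to getParserClassName() is to compute the name once and keep it. The sketch below is not the FOP code: ParserSupport is an invented class, it throws IllegalStateException where FOP uses FOPException, and unlike the method quoted above it does not return null on failure. It uses the lazy holder idiom, so the single parser needed to discover the class name is created on first use only.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

final class ParserSupport {
    // Initialised once, on first access, by the JVM's class-initialisation rules.
    private static final class NameHolder {
        static final String NAME = createParser().getClass().getName();
    }

    /** Creates a new namespace-aware XMLReader, as in the snippet quoted above. */
    static XMLReader createParser() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Cannot create XML parser", e);
        }
    }

    /** Returns the parser class name without creating another parser. */
    static String getParserClassName() {
        return NameHolder.NAME;
    }
}
```

createParser() still returns a fresh reader each time, which sidesteps the thread-safety question about sharing a factory or parser between threads.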
Re: Is this a coding flaw ?
On Fri, 2003-12-19 at 10:02, Jeremias Maerki wrote:
> It should be thread-safe, the way it is used here. You could of course,
> cache the SAXParserFactory instance but I doubt the performance
> improvement would be measurable. getParser is probably not the best name
> if you look at it from a bean-oriented angle but it's not that it's
> called many times anyway. Do you think we should rename it?

As long as we are certain that it is being used correctly, a rename is probably not necessary. I just jumped a bit when I saw the possibility that it could easily be misused.

> On 19.12.2003 13:13:45 John Austin wrote:
> > I found the following snippet in the class FOFileHandler:
> >
> > ===
> >     /**
> >      * @see org.apache.fop.apps.InputHandler#getParser()
> >      */
> >     public XMLReader getParser() throws FOPException {
> >         return createParser();
> >     }
> > ===
> >
> > and the createParser() method
> >
> > ===
> >     /**
> >      * Creates XMLReader object using default
> >      * SAXParserFactory
> >      * @return the created XMLReader
> >      * @throws FOPException if the parser couldn't be created or
> >      * configured for proper operation.
> >      */
> >     protected static XMLReader createParser() throws FOPException {
> >         try {
> >             SAXParserFactory factory = SAXParserFactory.newInstance();
> >             factory.setNamespaceAware(true);
> >             factory.setFeature(
> >                 "http://xml.org/sax/features/namespace-prefixes", true);
> >             return factory.newSAXParser().getXMLReader();
> >
> >
> >
> > ===
> >
> > Now it would seem to me that a 'getter' method should not go around
> > creating objects every time it needs to. It just doesn't look right.
> >
> > I assume that SAXParserFactory is thread-safe.
>
>
> Jeremias Maerki

-- John Austin <[EMAIL PROTECTED]>
Is this a coding flaw ?
I found the following snippet in the class FOFileHandler:

===
    /**
     * @see org.apache.fop.apps.InputHandler#getParser()
     */
    public XMLReader getParser() throws FOPException {
        return createParser();
    }
===

and the createParser() method

===
    /**
     * Creates XMLReader object using default
     * SAXParserFactory
     * @return the created XMLReader
     * @throws FOPException if the parser couldn't be created or
     * configured for proper operation.
     */
    protected static XMLReader createParser() throws FOPException {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(
                "http://xml.org/sax/features/namespace-prefixes", true);
            return factory.newSAXParser().getXMLReader();
===

Now it would seem to me that a 'getter' method should not go around creating objects every time it is called. It just doesn't look right.

I assume that SAXParserFactory is thread-safe.

-- John Austin <[EMAIL PROTECTED]>
Re: FOs and Areas
On Wed, 2003-12-17 at 15:56, J.Pietschmann wrote: > I've got a lot of ideas myself, perhaps too many. What the > project needs is *working* *code*. Amen! [but a short one, not drawn out like the final chorus of Messiah!] -- John Austin <[EMAIL PROTECTED]>
What should I be doing ?
As I mentioned off-line to another list member, I have some questions about the progress of the current FOP development effort.

So far I have:

i) Made a few measurements and reconfirmed some other people's opinions about possible areas for improvement.
ii) Proof-read some code from Alt-Design as preparation for possibly working on its integration.
iii) Written a few Problem Reports in Bugzilla to better document the 'here and now' status of the HEAD development stream.
iv) Established a statistical basis for an object discovery and re-use strategy.
v) Provoked some discussion of string interning (I was aiming for something grander in terms of [iv] above).

Everyone is extremely polite and encouraging and I have every confidence in the abilities of each active member of this discussion.

BUT: I don't have a feeling that we are capturing any territory. The discussions are lively and quite enlightening but they seem to peter out or double back on themselves. I don't see any state changes from Bugzilla indicating that anything is getting fixed, and my experience tells me that this is not healthy.

-- John Austin <[EMAIL PROTECTED]>
Re: (Victor et al) Re: Performance improvements.
I haven't looked at the XSLT code, but there is a question about it that I need to answer. I wonder what is being generated and what the design alternatives to the codegen implementation were. One question that popped into my head was: is there 'missing polymorphism' here? As I said, I only have the question at this time.

On Sat, 2003-12-13 at 12:12, Glen Mazza wrote:
> -1. I'd like to hold off on this, at least until I
> can gain a better understanding of the autogenerated
> code. I may still come to the same conclusion as the other
> committers, but Finn's endorsement of the XSLT--as
> well as the long work of those like Keiron who have
> worked with the XSLT files--suggests that there are
> significant time benefits to using them. (At work, I
> use "SQL to write SQL" all the time, and love the time
> efficiencies that result.)
>
> If we check in the Java code, then changes may end up
> being made to those files directly, which will result
> in the XSLT files becoming unregeneratable. Or, every
> run of the XSLT will require re-modification of the
> changes made manually to all the Java
> files--potentially dozens--100's of files. So I'm
> kind of leery about doing this at the moment.
>
> [Actually, I'm looking forward to studying the XSLT
> that generates these files--as I mentioned to Clay
> that CVS and Ant were two of the initial benefits you
> get by working on FOP, apparently being able to write
> Java code using XSLT is a third one...i.e., Yeehaw!,
> as I believe he had put it... ;)]
>
> Glen
>
> --- "J.Pietschmann" <[EMAIL PROTECTED]> wrote:
> > Finn Bock wrote:
> > > I like the generation process as it allowed me to
> > > try out and experiment with different optimizations.
> > > I don't think that I realistically could have added
> > > caching of compound properties or changed the
> > > abs2rel/rel2abs code if I had to change the Maker
> > > classes manually.
> >
> > If it's common code, that's what class hierarchies
> > and inheritance are made for.
> >
> > J.Pietschmann

-- John Austin <[EMAIL PROTECTED]>
Re: Testing for main development stream.
On Sun, 2003-12-07 at 06:25, J.Pietschmann wrote:
> John Austin wrote:
> > It seems that the relative file reference ../graphics/page.gif is
> > computed by the program relative to the 'current directory' not
> > relative to the file: 'test/xml/bugtests/image.fo'.
> >
> > I'm sure the spec has an opinion on this.
>
> Interestingly, the XSLFO spec doesn't have an opinion on this. However,
> by using the term "URL" they probably imply the usual resolving procedure
> for URLs apply, meaning any relative URL is resolved against the base URL
> of the containing document or base document (in case the FO is generated
> by XSLT).
> This means there is a problem to correct.
>
> J.Pietschmann

So, the desired behaviour is to open a report in Bugzilla? I will do that for the three or four I found.

-- John Austin <[EMAIL PROTECTED]>
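To make the "usual resolving procedure" concrete, here is a small sketch using java.net.URI. The /project prefix and the class name RelativeUrlDemo are invented for the example; nothing here is taken from the FOP image-loading code.

```java
import java.net.URI;

/** Hypothetical demo: resolve a relative reference against the FO document's own URI. */
public class RelativeUrlDemo {
    static URI resolveAgainstDocument(URI documentUri, String reference) {
        // URI.resolve applies the standard relative-reference rules,
        // so "../" steps up from the directory holding the document.
        return documentUri.resolve(reference);
    }

    public static void main(String[] args) {
        // Assumed absolute location of the test file; only the tail matters.
        URI document = URI.create("file:/project/test/xml/bugtests/image.fo");
        System.out.println(resolveAgainstDocument(document, "../graphics/page.gif"));
        // prints file:/project/test/xml/graphics/page.gif
    }
}
```

With this rule the image is looked up next to the FO file's parent directory, no matter which directory the java command was started from.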
Testing for main development stream.
I ran a few tests of a recent copy of the 1.0dev stream and found some errors.

What are your preferences for problem reports at this time? Should I enter issues into BugZilla as I find them? Should I take a look at the code and notify the committer who last worked on anything I find?

So far:

1) ./build.sh test <-- testing fails quickly
2) ./build.sh junit <-- are there any tests ?
3) from root directory (the one containing build.xml) I ran:

   find test -name "*.fo" -print -exec ./test.sh {} \;

   where test.sh contains:

   #!/bin/sh
   java -Xms100m -Xmx200m -cp .:build/fop.jar:lib/avalon-framework-4.1.4.jar:lib/batik.jar:lib/commons-io-dev-20030703.jar org.apache.fop.apps.Fop -fo ${1} -pdf /tmp/$$.pdf

I get quite a few errors. One example problem (or non-problem):

test/xml/bugtests/image.fo
[INFO] 1.0dev
[ERROR] Error while opening stream for (file:../graphics/page.gif): .../graphics/page.gif (No such file or directory)
java.io.FileNotFoundException: .../graphics/page.gif (No such file or directory)
        at java.io.FileInputStream.open(Native Method)

It seems that the relative file reference ../graphics/page.gif is computed by the program relative to the 'current directory' not relative to the file: 'test/xml/bugtests/image.fo'.

I'm sure the spec has an opinion on this.

There are other errors. (other opinions too no doubt)

test/xml/bugtests/text-transform.fo
[INFO] 1.0dev
Invalid byte 1 of 1-byte UTF-8 sequence.
Turn on debugging for more information

-- John Austin <[EMAIL PROTECTED]>
Measure (accurately) before optimizing.
Mea (tool) culpa!

I am investigating an inaccuracy in CPU measurements reported by the Java Memory Profiler Tool that led me to the conclusion that PropertyList.findProperty is the high-runner in FOP 0.20.5. A couple of other profilers report that findProperty() uses more CPU than we would like (10-12%) but less than JMP reports. Note that this measurement error in JMP also affects other XML code such as Xerces and Xalan, as these are also recursive.

I reported the question to the jmp-dev list and will advise when I get corrected results from a corrected program.

I found a good list of profilers at:
http://www.computerprograms.com/Directory/Computers/Programming/Languages/Java/Development_Tools/Performance_and_Testing/Profilers/

I have tested several profilers and JMP is the easiest to use. It is slower than Sun's hprof but has some nice features.

1) JMP -- nice but slow, and it has a problem over-reporting CPU seconds used by subordinates when those subordinates include recursive methods (i.e. findProperty()).

2) JPerfAnal -- based on hprof but quite slow, and the GUI was done quickly (i.e. is a kludge). Usable, and one source of my suspicions about JMP.

3) HPMeter -- based on hprof but it crashes on the output from my traces. Odd, because HP are usually pretty good (except for some device drivers).

4) prophIt -- a demo delivered by Java Webstart that uses hprof input and has a really novel GUI to visualize performance. Unfortunately the GUI isn't quite there yet. The program is represented like a skyscraper where each floor has slabs representing CPU used. Higher 'floors' represent subordinate functions in the call tree. The item I don't like is that the vertical dimension draws the eye to thin spires that are very tall. This could make you ignore big flat slabs of CPU usage. Not all floors should be the same height. When they get this right it will be a category killer. Still very useful, as it uses the same input as JPerfAnal, HPMeter and a lot of others. This helped me find the error in JMP because I could not find findProperty() in the 3D graph.

5) EJP -- Extensible Java Profiler is a CS student's excellent project. Unfortunately, it's a bit slow and requires one to read and follow directions. This one also helped me find the error in JMP after I read the Fine Manual.

I have also decided to use the command line class for future performance measurements.

-- John Austin <[EMAIL PROTECTED]>
Re: String.intern() test and measurement
On Tue, 2003-12-02 at 16:43, Glen Mazza wrote:
> --- "J.Pietschmann" <[EMAIL PROTECTED]> wrote:
> > > But, as Glenn noticed, the attribute names can
> > > also be implemented with enumeration
> >
> > There are no enumerations in pre 1.5 Java. What was
> > meant was that strings denoting XSLFO property
> > enumeration tokens can be interned as the set is of
> > limited and more or less fixed size,
>
> No I was actually thinking static final variables, my
> reference to "enumerations" was in a generic sense:
>
> public static final int PROPA = 1;
> public static final int PROPB = 2;
> public static final int PROPC = 3;
>
> If it is only the property names we were planning on
> interning, then I thought static final variables would
> be faster/more efficient instead.
>
> Glen

Given that there are between 249 and 380 names and they exist in both integer and String format, there isn't a lot that we can recover here.

If we are after better performance, we must measure to find the 'high runner' and tune from there. To 'design-in' stuff that we think will be fast is often unproductive.

This is why I have been using JMP to measure performance of FOP and some sample programs. A high runner in FOP 0.20.5 is PropertyList.findProperty(). It calls other functions in org.apache.fop.fo that consume significant CPU resources. In one example it called itself recursively to a depth of 10.

One of the reasons I am playing with the SAXTreeValidator program is as a simplified test bed for the Property implementation. I want to be able to plug in a new Property implementation and test it independently of the rest of FOP.

-- John Austin <[EMAIL PROTECTED]>
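To show the kind of recursion I mean, here is a toy model. It is not the FOP PropertyList code: PropertyNode, chain() and the call counter are all hypothetical, and the lookup rule (ask the parent when a property is not specified locally) is my simplification of inherited-property lookup.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a property lookup that recurses up the FO tree. */
public class RecursiveLookupDemo {
    // Counts every findProperty invocation so the cost is visible.
    static int calls = 0;

    static class PropertyNode {
        final PropertyNode parent;
        final Map<String, String> specified = new HashMap<>();

        PropertyNode(PropertyNode parent) {
            this.parent = parent;
        }

        String findProperty(String name) {
            calls++;
            String value = specified.get(name);
            if (value != null || parent == null) {
                return value;
            }
            return parent.findProperty(name); // one more level per ancestor
        }
    }

    /** Builds a chain of the given depth; only the root specifies font-family. */
    static PropertyNode chain(int depth) {
        PropertyNode node = new PropertyNode(null);
        node.specified.put("font-family", "serif");
        for (int i = 1; i < depth; i++) {
            node = new PropertyNode(node);
        }
        return node;
    }

    public static void main(String[] args) {
        PropertyNode leaf = chain(10);
        System.out.println(leaf.findProperty("font-family") + " after " + calls + " calls");
        // prints: serif after 10 calls
    }
}
```

Every lookup from a deep node pays for each ancestor it passes through, which is why this method shows up in a profile long before any string comparison does.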
Re: String.intern() test and measurement
On Tue, 2003-12-02 at 14:04, J.Pietschmann wrote:
> Finn Bock wrote:
> >> new DefaultMutableTreeNode(("Attribute (name = '" +
> >>     atts.getLocalName(i) +
> >>     "', value = '" +
> >>     atts.getValue(i) +
> >>     "')").intern() );
> >
> > Here you are also interning the attribute values, right?
>
> Eh, no. The String "Attribute (name = ';name:', value = ';value:')"
  ^ Canadian ?
> is interned.

But I feel it's similar for the purpose of modelling the behavior in the SAXTreeValidator.java example. The references held by the parser go away in this model program. I ensure the strings are unique. That's where the 80M savings comes in.

In FOP, many strings are created in the parser and references to them are stored in the parsed tree. Interning these will produce memory savings.

> > But, as Glenn noticed, the attribute names can also be implemented with
> > enumeration
>
> There are no enumerations in pre 1.5 Java. What was meant was that
> strings denoting XSLFO property enumeration tokens can be interned
> as the set is of limited and more or less fixed size, while it is
> probably not prudent to intern the complete XML attribute value
> strings.
> For example:
>    text-decoration="underline overline"
> (Yes, that's valid, provided "overline" is valid).
> The possibly interned strings are "text-decoration" (property name),
> "underline" and "overline" (enumeration tokens).
> Somewhere else, the user might have put
>    text-decoration="overline underline"
> Granted, given that FO source ought to be XSLT or otherwise generated,
> this isn't very likely, but still.

This is why I wrote a perl program to count from a real-world case (defguide). I haven't concerned myself with how an interned "overline underline" string is used, just ensured it is stored that way once.

The attribute name space is defined in the XSL-FO spec, so the number of names is strictly limited. Peter lists 380 or so in PropNames.java.

> Another issue is how the values are stored. For example, there are only
> 8 distinct TextDecoration objects. In 0.20.5, basically every Text node
> gets its own instance.
> The weird thing is, when I hacked the PropertyManager to look up the
> actual text decoration in an array of preinstantiated objects, FOP
> ran slower and took more memory. Apparently something went wrong. If
> anybody is up there to get it right, please do.

I tried testing 0.20.5 doing things that worked in SAXTreeValidator and haven't had instant success. The benefits would disappear if the references passed out of the parser are still held elsewhere in FOP. Give it some time.

> Same for some other objects, BorderAndPadding and especially FontInfo
> come to mind, although there is more variation.

-- John Austin <[EMAIL PROTECTED]>
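The "array of preinstantiated objects" idea can be sketched as below. This is my own illustration, not the hack J.Pietschmann describes: the class names and the three flag constants are assumptions, chosen because three on/off decorations give the 8 distinct combinations he mentions.

```java
/** Hypothetical flyweight cache: one shared instance per decoration combination. */
public final class TextDecorationCache {
    public static final int UNDERLINE = 1;
    public static final int OVERLINE = 2;
    public static final int LINE_THROUGH = 4;

    /** Immutable value object, safe to share between any number of text nodes. */
    public static final class TextDecoration {
        public final boolean underline;
        public final boolean overline;
        public final boolean lineThrough;

        private TextDecoration(int bits) {
            underline = (bits & UNDERLINE) != 0;
            overline = (bits & OVERLINE) != 0;
            lineThrough = (bits & LINE_THROUGH) != 0;
        }
    }

    // All 2^3 combinations, created once when the class is loaded.
    private static final TextDecoration[] CACHE = new TextDecoration[8];
    static {
        for (int bits = 0; bits < CACHE.length; bits++) {
            CACHE[bits] = new TextDecoration(bits);
        }
    }

    private TextDecorationCache() { }

    /** Returns the shared instance for the given flag combination. */
    public static TextDecoration get(int bits) {
        return CACHE[bits & 7];
    }

    public static void main(String[] args) {
        TextDecoration a = get(UNDERLINE | OVERLINE);
        TextDecoration b = get(OVERLINE | UNDERLINE);
        System.out.println("same instance: " + (a == b));
    }
}
```

Note that "underline overline" and "overline underline" map to the same instance here, because the order of the tokens disappears once they are turned into flags.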
Re: String.intern() test and measurement
On Tue, 2003-12-02 at 12:59, Finn Bock wrote:
> I'm resending this mail since it hasn't yet shown up in the archives.
> I'm sorry about any duplicates.
>
> [John Austin]
>
> > 4) Changed the handling of strings at the for-loop storing the
> >    attributes received from the parser in startElement( ... )
> >
> >    // Process attributes
> >    for (int i=0; i<atts.getLength(); i++) {
> >        DefaultMutableTreeNode attribute =
> >          new DefaultMutableTreeNode(("Attribute (name = '" +
> >              atts.getLocalName(i) +
> >              "', value = '" +
> >              atts.getValue(i) +
> >              "')").intern() );
> >
> > So I intern these strings rather than storing new strings.
>
> Here you are also interning the attribute values, right?

Yes.

> Interning is best used with discretion. The real problem, which isn't
> really spelled out in the gotchas.html page, is that the interning
> algorithm is completely undefined by the java spec.
>
> F.ex, in jdk1.4 the intern table (in symbolTable.[hpp,cpp]) is a fixed
> size hashing table (size of 20011) with chaining buckets. So when the
> total number of intern'ed string grows beyond that number, the interning
> process becomes linear in time. Try the attached test program to see the
> effect.

Gasp! SUN implemented something that might not scale up?

Note that my example of last night, the SAXTreeValidator.java results, processed the 'defguide' file that has 117 unique names and 13,520 unique values. The memory saved by interning was impressive (184M down to 87M = 97M reduction).

[My results may be artificially good as I have interned character strings as well. I expect that the nature of DEFGUIDE includes many repeated character strings. I shall re-run the benchmark: Hmm .. the interning version of the program now uses 104M, a reduction of 80M rather than the previous reduction of 97M. The number of interned strings must have been well over the hash table size, but the difference in CPU usage to parse that file was less than ten seconds (more for interned character strings).]
When I checked your program more closely, I wondered how much chaining there has to be before performance gets really bad. Your example program would produce external chains of length 2,000,000/20011 ~ 100. Because the table keys are constructed, I expect your access times are uniformly distributed and average access times reflect that.

Your demonstration program artificially employs 2 million strings, which is not a behavior we would expect for FOP. The number of attribute names is limited (by the XSL-FO Spec) and the number of distinct values is limited by some probability distributions that are definitely not Uniform. It takes a LOT of ransom-note typography to make that many unique property values.

> I also think that two different aspects of interning are being mixed
> up in your measurements here.
>
> 1. The identity sharing.
> 2. The memory sharing.

I have not changed the programs beyond calling intern() on strings passed to some constructors. So no identity sharing is present.

> The identity sharing can quite possibly give a performance boost to the
> lookup of the attribute names.

I prefer to optimize from measurements. The FOP high runner is property lookup. I didn't see a lot of time in String.equals(); I may look again. Peter's alt-design already provides this functionality. I am chasing parallel memory effects.

> The memory sharing can quite possibly give a memory boost for duplicated
> attribute values and a performance boost during garbage collection.

I agree, especially in light of the counts I made using a Perl program and the large file defguide.fo.

> Ad.1: If one decides that all attribute names must be interned before
> insertion and also before lookup of attributes, a special hashtable
> (identity-hashtable) can be coded that is significantly faster than a
> normal value-based hashtable. As an example of the performance boost,
> the hash calculation can be done as:

I would use that if I felt that intern() wasn't therapeutic.
So far, I just intern the name and value for each attribute passed to the SAX parser callback: startElement. This stores the interned string in the parsed tree and lets the parser trash it's copies of strings whenever. > int index = (System.identityHashCode(key) & 0x7fff) % maxindex; > > In addition attribute names can be compared with '==' instead of equals. > The downside is that the lookup can only be done with intern'ed keys. But these values would be encapsulated in a class. Good reason for not breaking encapsulat
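For reference, the identity-hashtable Finn describes does not have to be hand-coded: java.util.IdentityHashMap (in the JDK since 1.4) hashes with System.identityHashCode and compares keys with '=='. The sketch below is only an illustration of the trade-off; the class InternedKeyLookup and its method names are hypothetical, and it hides the interning inside one class so that callers cannot break the lookup by passing a non-interned key.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical property table keyed by interned attribute names. */
public class InternedKeyLookup {
    // IdentityHashMap uses System.identityHashCode and '==' instead of
    // hashCode()/equals(), so it only works if every key is canonical.
    private final Map<String, String> values = new IdentityHashMap<>();

    /** Stores a value; the name is interned here, not by the caller. */
    public void put(String name, String value) {
        values.put(name.intern(), value);
    }

    /** Safe lookup: interns the name first. */
    public String get(String name) {
        return values.get(name.intern());
    }

    /** Fast lookup: the caller promises the name is already interned. */
    public String getInterned(String internedName) {
        return values.get(internedName);
    }

    public static void main(String[] args) {
        InternedKeyLookup table = new InternedKeyLookup();
        table.put(new String("font-size"), "12pt");
        System.out.println(table.getInterned("font-size"));             // literal is interned: 12pt
        System.out.println(table.getInterned(new String("font-size"))); // not interned: null
        System.out.println(table.get(new String("font-size")));         // 12pt
    }
}
```

The second lookup in main() is the downside Finn points out: an equal but non-interned key silently finds nothing, which is the reason to keep the table behind a class boundary.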
String.intern() test and measurement
I decided to find a demonstration program that works similarly enough to FOP that I could try the String.intern() technique.

1) SAXTreeValidator.java from Chapter 3 of Brett McLaughlin's "XML and Java"; the online copy of the example is from the 2nd Ed.

2) Fed this program various fragments of .fo files I have accumulated lately.

3) There was one line I had to change in the program, Line 355:

   if (attPrefix == null || attPrefix.equals("")) {

   I had to add the test for null as the program threw an NPE.

4) Changed the handling of strings at the for-loop storing the attributes received from the parser in startElement( ... )

   // Process attributes
   for (int i=0; i
Re: String.intern() thoughts and more stats
On Mon, 2003-12-01 at 02:11, Glen Mazza wrote:
> --- John Austin <[EMAIL PROTECTED]> wrote:
> > I mentioned yesterday that I thought I had read a
> > comment by Bruce Eckel suggesting that String.intern()
> > might be avoided.
> >
> > I could not find the reference in either the 2nd or
> > 3rd editions of "Thinking In Java".
>
> No need-the "gotcha" site you gave earlier did give
> some specific drawbacks under string compares:
>
> http://mindprod.com/jgloss/gotchas.html#COMPARISON
>
> BTW, The third drawback listed in the link above gave
> "weak references" as an alternative implmentation--I'm
> unsure what that construct is about--is this the
> vtable you were speaking of in an earlier message?
>
> Also, another question for my comprehension here--the
> "canonical mappings" you have been referring to in
> this thread--is this the same thing as the property
> enumerations that alt-design uses? I'm unsure of the
> difference between the two.

I started using the term here after re-reading parts of Eckel's Thinking in Java. I think the CM I refer to and the alt-design implementation are almost the same thing.

From the on-line version of TIJ (3rd ed), the following excerpt:

TIJ313.htm: Weak references are for implementing canonicalizing mappings where instances of objects can be simultaneously used in multiple places in a program, to save storage - that do not prevent their keys (or values) from being reclaimed.

Don't be misled by the red herring of 'weak references'. I am arguing for the "cache of unique objects", not for this GC technique.

After I started using a large .FO file to provide statistics, I realized that we can use the same technique for larger non-string objects. To this end, I have some more statistics. In the sample FO file (DocBook: The Definitive Guide), there are 285,223 tags but there are only 18,419 unique property lists.
(There may be fewer, my perl stats program treats different orderings of the same attributes as different lists) The program prints out property lists which occur more than 100 times. I prefix each list with tag names to distinguish empty lists by tag type. That increases the number of lists by only 15 or so. Number of Elements by tree level: level=1 count=1 level=2 count=473 level=3 count=5242 level=4 count=5480 level=5 count=7129 level=6 count=26231 level=7 count=22475 level=8 count=36447 level=9 count=62288 level=10 count=38536 level=11 count=30486 level=12 count=23641 level=13 count=23190 level=14 count=2023 level=15 count=771 level=16 count=701 level=17 count=109 Element frequencies: a 24 fo:basic-link 5225 fo:block 112142 fo:conditional-page-master-reference 48 fo:external-graphic 1097 fo:flow 472 fo:footnote 22 fo:footnote-body 22 fo:inline 62792 fo:layout-master-set 1 fo:leader 1764 fo:list-block 279 fo:list-item 1004 fo:list-item-body 1004 fo:list-item-label 1004 fo:marker 5335 fo:page-number 1872 fo:page-number-citation 3224 fo:page-sequence 472 fo:page-sequence-master 12 fo:region-after 38 fo:region-before 38 fo:region-body 38 fo:repeatable-page-master-alternatives 12 fo:root 1 fo:simple-page-master 38 fo:static-content 4720 fo:table 6497 fo:table-body 6497 fo:table-cell 33174 fo:table-column 19225 fo:table-footer 1 fo:table-header 29 fo:table-row 15301 fo:wrapper 1799 Property List frequencies: 395 fo:basic-link internal-destination=common.attributes, 66878 fo:block 1292fo:block end-indent=24pt,text-align-last=justify,last-line-end-indent=-24pt, 2119fo:block font-family=monospace,space-after.optimum=1em,white-space-collapse=false,text-align=start,space-before.maximum=1.2em,space-before.optimum=1em,wrap-option=no-wrap,space-before.minimum=0.8em,space-after.maximum=1.2em,linefeed-treatment=preserve,space-after.minimum=0.8em, 5082fo:block font-family=sans-serif,Symbol,ZapfDingbats,keep-together=always, 236 fo:block 
font-family=sans-serif,Symbol,ZapfDingbats,margin-left=-4pc,keep-together=always, 439 fo:block font-family=sans-serif,space-after.optimum=0.5em,hyphenate=false,font-weight=bold,font-size=18pt,space-after.maximum=0.6em,space-after.minimum=0.4em,keep-with-next.within-column=always,space-after=1em, 5321fo:block font-family=sans-serif,space-before.maximum=1.2em,font-weight=bold,space-before.optimum=1.0em,space-before.minimum=0.8em,keep-with-next.within-column=always, 3768fo:block font-family=serif,Symbol,ZapfDingbats,margin-left=-4pc, 3533fo:block font-size=17.28pt, 1722fo:block font-size=20.7359997pt, 104 fo:block font-weight=bold, 5332fo:block keep-with-next.within-column=always, 439 fo:block space-after=1em, 6037fo:block space-before.maximum=1.2em,space-before.optimum=1em,space-before.minimum=0.8em, 2558fo:block span=none, 191 fo:bl
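The "cache of unique objects" applied to whole property lists could look something like the sketch below. It is an assumption-laden illustration, not code from FOP or alt-design: PropertyListCache and canonical() are hypothetical names, and the key is the tag name plus the attributes in sorted order, so two elements that list the same attributes in a different order share one object (the case my perl program counts separately).

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical canonicalizing cache for (tag, attributes) property lists. */
public class PropertyListCache {
    private final Map<Map.Entry<String, Map<String, String>>, Map<String, String>> cache =
            new HashMap<>();

    /** Returns the single shared, read-only list for this tag and attribute set. */
    public Map<String, String> canonical(String tag, Map<String, String> attributes) {
        // Sorting removes attribute order from the comparison.
        Map<String, String> sorted = new TreeMap<>(attributes);
        Map.Entry<String, Map<String, String>> key =
                new AbstractMap.SimpleImmutableEntry<>(tag, sorted);
        return cache.computeIfAbsent(key, k -> Collections.unmodifiableMap(sorted));
    }

    /** Number of distinct property lists seen so far. */
    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        PropertyListCache cache = new PropertyListCache();
        Map<String, String> first = new HashMap<>();
        first.put("font-weight", "bold");
        first.put("space-after", "1em");
        Map<String, String> second = new TreeMap<>(Collections.reverseOrder());
        second.putAll(first);
        boolean shared = cache.canonical("fo:block", first) == cache.canonical("fo:block", second);
        System.out.println("shared: " + shared + ", unique lists: " + cache.size());
    }
}
```

For the defguide numbers above, a cache like this would hold on the order of 18,419 entries instead of one property list per tag.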
Re: String.intern() thoughts
On Mon, 2003-12-01 at 02:11, Glen Mazza wrote:
> --- John Austin <[EMAIL PROTECTED]> wrote:
> > I mentioned yesterday that I thought I had read a
>
> BTW, The third drawback listed in the link above gave
> "weak references" as an alternative implmentation--I'm
> unsure what that construct is about--is this the
> vtable you were speaking of in an earlier message?

The vtable is used to dispatch virtual functions in (some implementations of) C++. There is one such table for each class and it contains a pointer to each virtual function defined. Each object holds a pointer to its actual class vtable. This is the mechanism used to implement polymorphism for C++ virtual functions. I think a similar means can be used for the inheritance of properties in FOP.

    #include <iostream>

    class a { public: virtual int f() { return 3; } virtual ~a() {} };
    class b : public a { public: virtual int f() { return 33; } };

    int main() {
        a* z = new b;                             // static type a*, dynamic type b
        std::cout << "a" << z->f() << std::endl;  // dispatches to b::f()
        delete z;
        return 0;
    }

The class a vtable contains the address of a::f(); the class b vtable contains the address of b::f(). The instance z includes a pointer to the class b vtable, so the function b::f() is called even though it is reached through a pointer to an object of type a.

-- John Austin <[EMAIL PROTECTED]>
Re: Properties Implementation and Canonical Mappings
On Mon, 2003-12-01 at 02:45, Glen Mazza wrote: > --- John Austin <[EMAIL PROTECTED]> wrote: > > > > The property strings are given to the Property > > object > > constructor by some path beginning with a SAX > > parser. > > It is reasonable to assume that the SAX parser loses > > refs to most of these strings and that the Property > > implementation retains the only references to these > > String objects. > > > > How big are String Objects ? > > At least 16 bytes plus storage for characters. > > > > What does this save us ? > > Probably only about 1,600,000 bytes for this file. > > CPU cost of creating strings is probably similar to > > cost of checking string table for a copy. > > > > Just to clarify, the (additional?) "CPU cost" you > mentioning above is *not* occurring for the present > process, correct? I think you're referring to the > cost that would be added as a result of the changes > you're recommending (because there now will be a > string table search to avoid duplication). Going back to the beginning of my involvement, I found this issue because Property searches are the high-runner for CPU in FOP. I don't want to split hairs in isolation over which search/constructor sequence is faster. I want to remove the conditions that cause the current pathology. Hash table lookups are FAST. When we invest in object creation we recover many times over in the end. > Also, the "string table" you mention--I think you're > speaking generically, but is there a specific, already > available construct in Java that we can use for this > purpose in FOP? I'd like to find out what you have in > mind for a specific implementation. HashMap works fine the way Peter has it set up in alt-design. I use the same construct in the Perl code I use to analyze the large sample FO files. -- John Austin <[EMAIL PROTECTED]>
String.intern() thoughts
I mentioned yesterday that I thought I had read a comment by Bruce Eckel suggesting that String.intern() might be avoided.

I could not find the reference in either the 2nd or 3rd editions of "Thinking In Java".

A couple of observations from some research:

1) There were some problems in Java 1.1 and before.

2) There may be problems in non-Sun implementations (Kaffe...)

3) There have been discussions in the SAX2 list and other places about using String.intern() and I notice that interning is a feature of SAX2 that can be turned on. There is a lot of support for the technique and I suspect some of the objections are of the theological type.

4) The property strings in Peter West's code start life as string literals, which are interned according to the Java Language Spec. So they are already present in the table.

Some of the benefit of interning can be turned on in the parser.

-- John Austin <[EMAIL PROTECTED]>
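Observation 4 is easy to check with a few lines. This is a stand-alone sketch (the class InternDemo and the helper build() are made up for the example); it relies only on the language rule that string literals are interned and on String.intern() itself.

```java
/** Small check that a literal and an interned runtime string are the same object. */
public class InternDemo {
    /** Builds "text-decoration" at run time, the way a parser would hand it over. */
    static String build() {
        return new StringBuilder("text-").append("decoration").toString();
    }

    public static void main(String[] args) {
        String literal = "text-decoration"; // interned because it is a literal
        String built = build();             // equal content, different object

        System.out.println(literal.equals(built));      // true
        System.out.println(literal == built);           // false
        System.out.println(literal == built.intern());  // true
    }
}
```

So a name that arrives from the parser only has to be interned once to be comparable by reference with the literals in the property tables.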
Re: Properties Implementation and Canonical Mappings
Input: The XSL-FO file produced from: "DocBook: The Definitive Guide " Document size: 648 Pages // for the O'Reilly edition FO file size: 21,659,370 bytes Properties: 526,648 Tags: 285,223 Height of tree: 17 // max height of the parse tree Unique prop names: 117 // bounded by the spec Unique prop values: 13,520 // bounded by the real world Using these numbers, we can explore the sort of benefits to expect from revised Property implementation. With over a million strings, the FOTree for this document would use forty or fifty Mb in addition to data structures. This document can be used as an example even though it probably can't be formatted (yet) by FOP. It has a lot of tables. It could be a goal of the FOP project to generate this well-known document. I was thinking of using the XSL-FO spec from the W3C web site but couldn't find the stylesheet to make the FO file. If anyone knows where to find them, please let me know. Statistics from this file: Number of Elements by tree level: level=1 count=1 level=2 count=473 level=3 count=5242 level=4 count=5480 level=5 count=7129 level=6 count=26231 level=7 count=22475 level=8 count=36447 level=9 count=62288 level=10 count=38536 level=11 count=30486 level=12 count=23641 level=13 count=23190 level=14 count=2023 level=15 count=771 level=16 count=701 level=17 count=109 Element frequencies: a 24< I wonder where this came from fo:basic-link 5225 fo:block 112142 fo:conditional-page-master-reference 48 fo:external-graphic 1097 fo:flow 472 fo:footnote 22 fo:footnote-body 22 fo:inline 62792 fo:layout-master-set 1 fo:leader 1764 fo:list-block 279 fo:list-item 1004 fo:list-item-body 1004 fo:list-item-label 1004 fo:marker 5335 fo:page-number 1872 fo:page-number-citation 3224 fo:page-sequence 472 fo:page-sequence-master 12 fo:region-after 38 fo:region-before 38 fo:region-body 38 fo:repeatable-page-master-alternatives 12 fo:root 1 fo:simple-page-master 38 fo:static-content 4720 fo:table 6497 fo:table-body 6497 fo:table-cell 33174 
fo:table-column 19225 fo:table-footer 1 fo:table-header 29 fo:table-row 15301 fo:wrapper 1799 Properties: 526648 Tags: 285223 num_keys: 117 num_vals: 13520 -- John Austin <[EMAIL PROTECTED]>
Re: Properties Implementation and Canonical Mappings
On Sat, 2003-11-29 at 16:35, J.Pietschmann wrote:
> Darn, recall the last post.
>
> John Austin wrote:
> > Note that storing the property name and value refs supplied
> > to the Property constructor will use 45,630 strings. If the
> > Property implementation employs canonical mapping to ensure
> > that only one copy of each unique string is stored, then just
> > over 2,300 strings are required.
>
> Have a look at String.intern()

Bruce Eckel said not to trust it for some reason. I have the 2nd Ed of "Thinking in Java" and the online one is the 3rd Ed, so I haven't found chapter and verse for this yet.

The only 'bad thing' said about it that I could find quickly was:

http://mindprod.com/jgloss/gotchas.html

The other good thing we can do is .... compare these string refs for equality.

> J.Pietschmann

-- John Austin <[EMAIL PROTECTED]>
Properties Implementation and Canonical Mappings
In the interest of contributing to (instead of just trashing) the proposed implementation, I wrote a simple Perl script to get some counts out of a real-world XSL-FO file.

Input: The XSL-FO file produced from a DocBook file I have left from a dormant project. The perl program counts the number of properties in the source file.

PDF size:            130 Pages    // some users have a lot more
FO file size:        1.2M bytes
Properties:          22,815
Unique prop names:   89           // bounded by the spec
Unique prop values:  2,227        // bounded by the real world

Note that storing the property name and value refs supplied to the Property constructor will use 45,630 strings (two per property). If the Property implementation employs canonical mapping to ensure that only one copy of each unique string is stored, then just over 2,300 strings (89 names plus 2,227 values) are required.

The property strings are given to the Property object constructor by some path beginning with a SAX parser. It is reasonable to assume that the SAX parser loses refs to most of these strings and that the Property implementation retains the only references to these String objects.

How big are String Objects ?
At least 16 bytes plus storage for characters.

What does this save us ?
Probably only about 1,600,000 bytes for this file. The CPU cost of creating strings is probably similar to the cost of checking a string table for a copy.

What does it buy for us ?
It bounds a source of current Order(n) memory growth. It gets us in the habit of using another good technique.

I am already thinking along the lines of: the property lists for these FO's are usually generated by programs and will be repeated many times. Perhaps we could use larger, faster working Property Lists consolidated with Canonical Mappings to save both time and space.

I am thinking again along the lines of handling properties more like a C++ virtual function table (vTable). This object is larger than Peter's ordered Property array, but would be faster. That's a reason C++ has fast virtual function dispatching.
-- John Austin <[EMAIL PROTECTED]>
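A canonical mapping for strings takes very little code. The sketch below is only an illustration under my own naming (StringCanonicalizer, canonical(), and the two counters are hypothetical, not part of FOP or alt-design): every string that comes out of the parser is passed through one table, and only the first copy of each distinct value is kept.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical string table: keeps one shared copy of each distinct string. */
public class StringCanonicalizer {
    private final Map<String, String> table = new HashMap<>();
    private int requests;

    /** Returns the stored copy of s, remembering s itself if it is new. */
    public String canonical(String s) {
        requests++;
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Distinct strings retained (2,316 for the 130-page file above). */
    public int uniqueCount() {
        return table.size();
    }

    /** Strings offered by the caller (45,630 for the 130-page file above). */
    public int requestCount() {
        return requests;
    }

    public static void main(String[] args) {
        StringCanonicalizer strings = new StringCanonicalizer();
        String first = strings.canonical(new String("12pt"));
        String second = strings.canonical(new String("12pt"));
        System.out.println((first == second) + " " + strings.uniqueCount()
                + " of " + strings.requestCount());
        // prints: true 1 of 2
    }
}
```

The duplicate copy handed in on the second call becomes garbage as soon as the parser lets go of it, which is where the memory saving comes from.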
RE: [VOTE] Properties API
On Thu, 2003-11-27 at 14:57, Victor Mote wrote: > John Austin wrote: > > > I am critical > Now, if you can figure out how to digest an FO document without building a > tree that represents a page-sequence object, I hope you'll share it with the > rest of us. That could be a breakthrough indeed. I am just thinking of ensuring that objects disappear after the page they are on has been printed. At the point that 0.20.5 prints: [INFO] [1] The related objects from Page 1, should ... join the choir invisibule ... They don't appear to, which is why the memory use of FOP increases in proportion to document length. You only need to retain the useful parts of the page-sequence object. Stuff that has been 'printed' isn't useful. > Victor Mote -- John Austin <[EMAIL PROTECTED]>
RE: [VOTE] Properties API
On Thu, 2003-11-27 at 13:58, Victor Mote wrote: ...
> Again, this is an implementation detail, and doesn't affect the interface.
> However, on the implementation side, it seems that the tradeoff will be
> between doing a full parse each time, or creating lots of objects. John
> Austin's inquiry about the huge number of objects created is what got me
> started down this line of thinking.

I am critical of what I perceive to be a pathological growth of objects (and search times). If those problems are corrected, there are plenty of resources left to do a few extra parses. How often will you encounter expressions this complex ? Rarely. If they become common (and someone will do that!), we can call THAT a pathological development and blame the victim.

> I suppose that the best way would be to
> have your cake and eat it too -- store integers where possible, and create
> objects where not possible, and teach everything how to tell the difference.
> (Here is a half-baked idea that I don't want to even think about pursuing
> for a while -- PropertyStrategy. With the API I have proposed, one could
> conceivably store the Properties one of several ways, and have the user
> select which one they want based on performance needs).

As Peter knows, I have been reading the code. I shall attempt the XSL-FO Spec soon. I understand the spec defines the behavior of the program in terms of fully parsed/expanded trees. This implies that objects must exist even if they will never be used after the parser moves past their end-points. Optimization anyone?

What I infer of the Tree structures in your discussion and Peter's code suggests to me that FOP creates a DOM-ish view of the document in one or more trees. This is a mismatch with the SAX parser that is in there somewhere.

And just to say something completely ludicrous, because someone will take it seriously ... You could convert those expressions to a Java class, compile, load and invoke it with Reflection ...
-- John Austin <[EMAIL PROTECTED]>
Re: [VOTE] Properties API
On Wed, 2003-11-26 at 14:45, Glen Mazza wrote: > --- "Peter B. West" <[EMAIL PROTECTED]> wrote: > > The set of property values relevant to a > > particular FO are > > available in a sparse array, accessible by the int > > index corresponding > > to the Property. > > Which source file has the enumerations of the > properties--I'd like to see how you listed them. Are > you satisfied with those enumerations--anything you > would change if you had to do it over? org.apache.fop.fo.PropNames.java has the property strings and assigned numbers. He even states a Perl program (and how to execute it in emacs) to regenerate the numbers. Similar file is org.apache.fop.fo.FObjectNames > It may be good to create a sample FO document that > would exhibit what you're saying above. Hopefully > something that shows a important feature that would > clearly fail if we don't take into account the Area > Tree while resolving properties. That would help > clarify things, and we can use it for testing. And there are reasons to create a set of XSL-FO documents providing test cases. I am concerned that some of Peter's NameSpace code hasn't been tested (or is just hard to grok). -- John Austin <[EMAIL PROTECTED]>
Property classes and eventually, new Property handling.
After my last post I went away to play in the code for a while, mostly to see what is necessary to isolate a minimal set of classes related to Property handling. What I found is:

1) Property is ubiquitous: every client class knows what package it lives in. As a planning point, I had better think about keeping a Property class (or prepare to make a lot more changes and lose a few friends).

2) PropertyList and PropertyListBuilder are used in fewer places, but they are used a lot. PropertyList is referenced 129 times and PropertyListBuilder only 8 times. Most of these references are in the Property class. If they can be hidden, the problem is bounded by the Property class.

One way to discover the scope of an API is to rename a class or a package. Doing so breaks all of the compile units that depend on the renamed class(es). Restoring the missing interfaces restores the system if the restoration obeys the previous class contracts.

As I suspected, automated code generation for properties is do-able. There are more than 17,000 lines in files generated through Ant target: 'codegen'. Many of these are clients of the Property class. Changes here can be localized to the XSL files that generate the code.

-- John Austin <[EMAIL PROTECTED]>
RE: [VOTE] Properties API
Victor, I was mostly backing away from my earlier posting which was off-target. On Tue, 2003-11-25 at 13:26, Victor Mote wrote: > John Austin wrote: > > > After thinking about the proposal, I'm not sure it solves anything. > you might make to the implementation would require (I think) changes to the > LayoutStrategys and to the Renderers. Also, as Glen has pointed out, there > is business logic that can be pulled out of these code modules back into FO > Tree where they more properly belong, and where duplication and confusion > can be minimized. In order to adapt Peter's ideas, I would need to identify the current Interface(s). Ideally a re-implementation of Property handling would be invisible outside of those classes. All the proposal addresses is the signature of some accessors. It does not identify the set of property-related classes. I think an adapter class could convert the current interface to the proposed interface. This could be useful if it covers the entire properties interface. > > The discussion favours the proposal. > > I don't understand what you are saying here. All/most other postings agreed with the proposal. > > class PropertyAdapter extends Property{ // Ugh! just f'rinstance > > > > // repeat the following about 380 times: > > final public Property getMaxWidth() { // use final to inline and annoy > > return get("max-width"); > > } > > } > > Sorry -- I was not clear here. I meant to suggest that these methods be > added to FObj (and its subclasses to the extent necessary). Just a for-instance sketch of an Adapter. Reference to properties has to come from somewhere. I used inheritance as a convenience. > > There is no statement defining the current interface. This will be > > determined from existing code. > > > > Implementation > > The proposal makes no suggestion for implementation and my earlier > > submission is not relevant except as an indication that this issue is > > linked to performance. > > Again, I am not sure what you are saying here. 
The proposal deliberately > does *not* address implementation. I am quite glad to have you address the > performance aspects of implementation, but I think it is a separate issue > from the interface. We can (and should, IMO) fix the interface before or at > least during any changes to the implementation. All I am trying to do is to > hide the implementation from the rest of the system. What else is needed to 'get' properties ? Your accessors just have the property name. > Since FO Tree (and Properties) kind of works right now, we have been paying > much more attention to other parts of the FOP code. Your questions and > interest are forcing us to address it sooner than we would have. Obviously, Don't feel forced. You CAN ignore me. I appreciate your efforts to clue me in. > one of our highest priorities should be to make other developers as > productive as possible. Also, to a certain extent, we have been waiting on > Peter West's work, hoping that his efforts can be useful in all of this. I > am still hoping to hear from Peter on this, but in the meantime, I am trying > to do some housekeeping that IMO will be important to clear the decks for > you. I support your Interface-view of properties but would like to have the scope of the Property interface mapped out to include more than the accessors. Of course, if these accessors and some references already held by FObj can do the trick, let's get on with it! -- John Austin <[EMAIL PROTECTED]>
Re: [VOTE] Properties API
Note: I added a page to the wiki for this thread. http://nagoya.apache.org/wiki/apachewiki.cgi?PropertiesRedesign

After thinking about the proposal, I'm not sure it solves anything.

There are two aspects to the redesign of Property handling in FOP.

* Interface means the external points of contact for Property data
* Interface determines impact of changes on other components of the system.
* Implementation means the internal construction of classes
* Implementation determines performance characteristics of the program.

The discussion favours the proposal.

Interface

The current proposal asks that FOP will employ Java-Bean-like accessors for the properties of Formatting Objects visible to the FOTree. As an example: getMaxWidth() for the property "max-width"

There are between 250 and 380 of these methods required and they could be generated automatically from an XML-based list of properties. This list could be derived (if not generated) from the XSL-FO Specification itself.

Some kind of simple adapter class can be used to equate the proposed interface to the existing one:

class PropertyAdapter extends Property { // Ugh! just f'rinstance
    // repeat the following about 380 times:
    final public Property getMaxWidth() { // use final to inline and annoy
        return get("max-width");
    }
}

There is no statement defining the current interface. This will be determined from existing code.

Implementation

The proposal makes no suggestion for implementation and my earlier submission is not relevant except as an indication that this issue is linked to performance.

-- John Austin <[EMAIL PROTECTED]>
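Since the accessors are meant to be generated from a list of property names, the name mapping itself can be sketched as below. This is only an illustration: the class AccessorNames is hypothetical, and treating the '.' in compound names such as "space-after.optimum" like a hyphen is an assumption, not something the proposal specifies.

```java
/** Hypothetical helper for deriving accessor names from XSL-FO property names. */
final class AccessorNames {

    /**
     * Maps "max-width" to "getMaxWidth". A '-' or '.' starts a new
     * capitalised word (the '.' rule is an assumption of this sketch).
     */
    static String accessorFor(String propertyName) {
        StringBuilder sb = new StringBuilder("get");
        boolean upperNext = true;
        for (int i = 0; i < propertyName.length(); i++) {
            char c = propertyName.charAt(i);
            if (c == '-' || c == '.') {
                upperNext = true;
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(accessorFor("max-width"));           // getMaxWidth
        System.out.println(accessorFor("space-after.optimum")); // getSpaceAfterOptimum
    }
}
```

A generator could feed each name from the property list through such a function and emit one accessor per property.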
Re: [VOTE] Properties API
On Mon, 2003-11-24 at 19:47, Victor Mote wrote:
> FOP Developers:
>
> Proposal:
> I propose that public "get" methods be used to retrieve FO Tree property
> values, and that the data behind these values be made as private as
> possible. The methods should be given a name based on the XSL-FO Standard.
> For example, the "max-width" property should be accessed using the method
> "getMaxWidth". The values returned should be the "refined" values.

I have been thinking about this a bit and I would like to throw out a few rambling observations:

Properties are defined in the spec, so there are a finite number of them, meaning that they map nicely into an enumeration type*. Peter West had written some stuff along that line. This allows us to get away from object-and-compute-intensive String types.

There are about 380 of these in Peter's Mapping. Some 249 are simple attributes and the rest are more complex, like space-after.optimum, space-after.minimum etc.

One train of thought I've had asks whether we need an atomic 'Property' type alone or whether we can use a larger aggregate object type like PropertyList that is a vector with each attribute value in a fixed position. The idea here is that such a vector, something like a vTable, can be merged quickly to resolve inheritance.

v[FONT_FAMILY] -> "sans-serif"
v[FONT_SIZE] -> "10pt"
v[...] -> ...

v has 400 entries or 250 entries and we use polymorphism somehow on the complex properties

> Discussion:
> 1. The purpose here is to separate the storage of the property values from
> their presentation (API) to the rest of the system. This opens the door for
> later changes to the storage without disrupting the remainder of the system.

We need to contain the number of objects here, say with canonical mapping of the property name strings. Possibly also use canonical mapping of the attribute values too. How many times is "10pt" or "bold" coded in a document ? Especially, given that patterns of FO are emitted by XSLT in other programs. This allows faster compares when we can test with == rather than Object.equals(Object).

I had thought that, since all of the attributes are in the XSL-FO Specification and there are some simple structures used, I might want to generate the property name list and some of the accessors like you have named them, automatically. Is there an XSL Schema for XSL-FO, or would I just extract them from xml in the Spec document ?

I can't say anything about how this stuff gets used, points 2 & 3. I'd be interested in being involved in item 4.

> 2. This could perhaps be implemented for now only in FObj (??).
> 3. This is not directly relevant to the question, but needs to be addressed
> from the "big picture" standpoint. With the FO Tree mostly isolated, and
> LayoutStrategy implemented, the issue of whether a certain feature is
> implemented or not moves from the FO Tree to the specific LayoutStrategy.
> Each LayoutStrategy eventually needs to track which objects and properties
> it supports. The FO Tree should always store and return the data, without
> regard to whether it can be used or not. In the future, our properties.xml
> file, "compliance" page, and perhaps other things need to handle this. The
> LayoutStrategy needs to be the entity that reports on whether a feature is
> supported (perhaps using a scheme similar to properties.xml).
> 4. If accepted, this would be a great project for one of the new developers
> to tackle. I don't mean to volunteer anyone, but it might be a good "feet
> wet" project.
>
> My vote:
> +1
>
> Victor Mote

* we don' need no steenking enumeration type! For years I thought the 'steenkin badges' quote originated with WKRP's Dr. Johnny Fever.

-- John Austin <[EMAIL PROTECTED]>
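The fixed-position vector idea can be sketched roughly as follows. Everything here is hypothetical: the class PropertyVector, the three index constants and the rule that an unset (null) slot takes the parent's value are illustrative simplifications. In particular, real XSL-FO distinguishes inherited from non-inherited properties, which this sketch ignores.

```java
/** Hypothetical sketch: property values held in fixed array positions. */
final class PropertyVector {
    // Illustrative indices only; a real table would have several hundred.
    static final int FONT_FAMILY = 0;
    static final int FONT_SIZE = 1;
    static final int FONT_WEIGHT = 2;
    static final int COUNT = 3;

    /** Child values win; null slots fall back to the parent's value. */
    static String[] merge(String[] parent, String[] child) {
        String[] merged = new String[COUNT];
        for (int i = 0; i < COUNT; i++) {
            merged[i] = (child[i] != null) ? child[i] : parent[i];
        }
        return merged;
    }

    public static void main(String[] args) {
        String[] parent = {"sans-serif", "10pt", "normal"};
        String[] child = {null, "12pt", null};
        String[] v = merge(parent, child);
        // Lookup is a plain array index: no hashing, no string compares.
        System.out.println(v[FONT_FAMILY] + " " + v[FONT_SIZE] + " " + v[FONT_WEIGHT]);
    }
}
```

The merge is a single pass over the array, and every later lookup is one indexed read, which is the speed argument behind the vTable comparison.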
Re: Memory measurement -- importance of Driver.reset() in Cocoon - NOT!
On Mon, 2003-11-24 at 16:23, Glen Mazza wrote: > --- John Austin <[EMAIL PROTECTED]> wrote: > > > > My own feeling is that FOP > > will remain > > problematic for large documents and this will be > > especially so in > > server environments such as Cocoon. > > > > We hear you, and we do emphasize performance as one of > our main goals [1]. As always, you may wish to add a > Wiki page on things we're not doing that we should be > doing in order to speed things up. Bugzilla may also > be a good place. Others can comment on them that way, > and we can refine what needs to be done. > > For me though, depending on the problem, there can be > a "grokking delay"--I tend not to act until I fully > understand the issues involved. Also, I am currently > tied up with layout issues which preclude me from > getting too much into properties code at this time. > > But many of the other committers/contributors have > more thought-out opinions on how properties should be > handled and should be able to contribute more quickly > to your ideas.

Thanks. Sometimes I jump on things (like a cop on a donut). Part of the ENTJ personality type (most engineers are ENTP - Perceiving vs Judgemental).

I want my focus to be on the PropertyList issue, as I feel there are significant benefits there. I have a few miles to go to understand the issues in PropertyList and PropertyListBuilder.

I intentionally have not brought up my new observation: When I run the same test twice with a Driver.reset(), GC and a wait in between, heap usage the first time is TWICE the usage of the second run. Time taken is the same. Just something to wonder about, in case we stumble on a reason. JMP graphic available on request.

I have learned quite a bit about running FOP (and Java) for larger files. A year ago, I had terrible trouble running FOP in Cocoon because I had only 256Mb. The same task is a breeze in the current release.

1) Large (equal) values of -Xms and -Xmx can reduce GC and speed execution of a task. There is documented support for this on the Sun site.

2) Use of the Server HotSpot VM makes a bigger difference than point (1). I have submitted a fix for 'cocoon.sh' to select this JVM. I am sure this was an oversight at Cocoon.

3) FOP and related libraries compiled with Sun's SDK can all be run with the current (1.4.1) IBM SDK. The GC performance of this IBM SDK is too ugly for words. More to think about.

4) You can observe Garbage Collection events with the Java option: -verbose:gc and this works similarly on both IBM and Sun run-times. Trace syntax is quite different but you can see times and sizes.

Given that I may have p*ss*d off some Germans (my Neandertal* comment ;-), I shall refrain from stating that there may be something rotten in Denmark - that might annoy Shakespeare-averse Danes ...

* I lived on Ramstein Air Force Base for 4 years. Many Neandertals in my high-school at Kaiserslautern!

> Glen > > [1] > http://marc.theaimsgroup.com/?l=fop-dev&m=106735192618324&w=2

-- John Austin <[EMAIL PROTECTED]>
Re: Memory measurement -- importance of Driver.reset() in Cocoon - NOT!
I took a further look at some of the JVM options and ran a few more tests. I found a couple of things that may be useful.

1) I have to use -Xmx400m to produce the 2000 page PDF in a large PDF test. 'time' reported: real ~5m user 2m57.880 ...

2) When I added -Xms400m to the same work, 'time' reported: real 4m35.968s user 2m35.190s ... This is a drop in user time of about 22 seconds, or about 12 percent.

The Cocoon Performance Tips page says we should not use -Xms. One problem with statements of performance is they don't always discuss the rationale. My own feeling is that FOP will remain problematic for large documents and this will be especially so in server environments such as Cocoon.

3) When I add the JVM 1.4 option '-server', execution gets faster. 'time' reports: real 2m52.100s user 2m1.520s, an improvement of about 20 percent.

Note that 'cocoon.sh' in the top-level directory for cocoon-2.1.3 supplies the defaults: JAVA_OPTIONS=-Xms32m -Xmx512m but does not specify '-server'.

-- John Austin <[EMAIL PROTECTED]>
Re: Memory measurement -- importance of Driver.reset()
On Fri, 2003-11-21 at 15:50, J.Pietschmann wrote: > John Austin wrote: > > It is clear that there is a fair bit of memory freed by Driver.reset(). > > > > After thinking it over, I modified the same test to skip reset() and > > just null the reference and issue System.gc(). > > > > This should be the same as letting it go out of scope (which happens > > afterwards but this way I get the square wave on the graph). > > > > Attachment 2: footprint2.png has about 1Mb more heap in use! > > Shrug. The Driver.reset() is > _source = null; > _stream = null; > _reader = null; > _treeBuilder.reset(); > and the tree builder's reset in turn is > currentFObj = null; > rootFObj = null; > streamRenderer = null; > this.errorCount = 0; Why bother implementing reset() ? > There are no static variables explicitely freed (there are not much > static variables in FOP in general). I don't see any difference calling > reset() can make compared to simply nulling the Driver reference. > > > Why ? Does this suggest that there are finalizers (destructors) that are > > not being called ? References set to null inside reset() should all > > be unreachable when the reference to Driver goes out of scope. > > You realize that gc() doesn't *force* a GC? > The most reliable way to measure allocated heap space I know off is > allocating a large byte[][] array, then allocate 1k byte[] or so > until you run out of memory. Yes. But the observations were convincing. It is obvious that much garbage was collected as expected. It was possible that the garbage collector did not collect ALL of the garbage at that point but I didn't understand how that could happen. I thought that the GC would examine every object to determine its status. This seemed to imply that you MUST collect all garbage every time. I overlooked the possibility of multiple storage pools. > > This might explain problems people are reporting when > > generating multiple PDF files using FOP. 
Especially if their > > programs don't lose references to instances of Driver. > Well, if they hang on to the Driver object, they are doomed. > > > Personally, I suspect there are a lot of logical memory leaks > > inside FOP. > What's a logical memory leak? The only kind you can get in Java. Referenced memory that is never used again. In the case of FOP, this includes objects which have been laid out in the PDF never to be used again. FOP memory use increases >= O(n) where n is the number of pages being created. This is easy to show. Massive amounts of memory are released when Driver is reset() or unreferenced. I still object to the extreme high water marks in FOP. The saw-tooth pattern is described in the JMP documentation as characteristic of the creation of too many objects. This is probably fundamental to XSL-FO but the high-water mark of memory use would be lower if objects were collected sooner. This would also speed up collection as there would be fewer objects to inspect. -- John Austin <[EMAIL PROTECTED]>
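To make the term concrete, here is a small sketch of a logical leak in Java. It is not FOP code: the class PageQueue and its methods are hypothetical stand-ins for a formatter that either keeps every laid-out page reachable or drops pages once they have been rendered.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a formatter's per-page bookkeeping. */
final class PageQueue {
    private final List<byte[]> pages = new ArrayList<>();
    private final boolean releaseAfterRender;

    PageQueue(boolean releaseAfterRender) {
        this.releaseAfterRender = releaseAfterRender;
    }

    /** Records the data produced by laying out one page. */
    void layOut(byte[] pageData) {
        pages.add(pageData);
    }

    /** "Prints" everything laid out so far; optionally forgets it afterwards. */
    void render() {
        if (releaseAfterRender) {
            pages.clear(); // printed pages become unreachable and collectable
        }
        // Otherwise the list still references every page: a logical leak.
    }

    int retained() {
        return pages.size();
    }

    /** Lays out and renders the given number of pages, returns pages still held. */
    static int simulate(boolean release, int pageCount) {
        PageQueue q = new PageQueue(release);
        for (int i = 0; i < pageCount; i++) {
            q.layOut(new byte[1024]);
            q.render();
        }
        return q.retained();
    }

    public static void main(String[] args) {
        System.out.println(simulate(false, 100)); // 100: grows with page count
        System.out.println(simulate(true, 100));  // 0: bounded
    }
}
```

In the first case the garbage collector is working correctly; the memory is simply still referenced, which is why only dropping the owning object releases it.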
Re: Memory measurement -- importance of Driver.reset() in Cocoon - NOT!
On Fri, 2003-11-21 at 00:43, John Austin wrote: > I mean, I wonder what Cocoon does ? In FOPSerializer: /** * Recycle serializer by removing references */ public void recycle() { super.recycle(); this.driver = null; this.renderer = null; } Apparently, we don't need no stinkin reset(); -- John Austin <[EMAIL PROTECTED]>
Re: Memory measurement -- importance of Driver.reset()
On Fri, 2003-11-21 at 00:02, Glen Mazza wrote: > Please do not cross-post to both lists--keep > development-related issues on fop-dev. Sorry. Not something I do a lot. It was posted to fop-user to be available in the archives for anyone searching on memory usage. Users will want to make sure they use Driver.reset(). Hmm. I wonder what Jimmy Buffett would do ? I mean, I wonder what Cocoon does ? -- John Austin <[EMAIL PROTECTED]>
Memory measurement -- importance of Driver.reset()
After reading the Sept 2003 thread about Memory Performance, leaks (and how wonderful Ada is), I modified my test program that generates 3 PDF files. The program now sleeps 30 seconds, calls Driver.reset(), nulls the reference and sleeps again. In JMP this plots a square wave that you can read on the attached graphs. It is clear that there is a fair bit of memory freed by Driver.reset().

After thinking it over, I modified the same test to skip reset() and just null the reference and issue System.gc(). This should be the same as letting it go out of scope (which happens afterwards but this way I get the square wave on the graph).

Guess what ? Attachment 2: footprint2.png has about 1Mb more heap in use! And this is a very short test file with just one member name & address. The test prints a letter, envelope and a renewal form for a non-profit Gardening group. The difference ... no call to Driver.reset() !!!

Why ? Does this suggest that there are finalizers (destructors) that are not being called ? References set to null inside reset() should all be unreachable when the reference to Driver goes out of scope.

This might explain problems people are reporting when generating multiple PDF files using FOP. Especially if their programs don't lose references to instances of Driver.

Personally, I suspect there are a lot of logical memory leaks inside FOP. A reset() at the end of using a Driver instance is a catch-all way of releasing all of the logically leaked memory allocated from inside Driver() (and therefore inside FOP). This approach is of little help to the developer who builds an application that dies of memory exhaustion in production. We will have to fix the logical leaks inside FOP to improve the user experience.
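For anyone who wants to repeat this kind of before/after comparison without a profiler, a rough sketch follows. The class HeapProbe is hypothetical and a plain byte array stands in for the Driver instance. Note that Runtime.gc() is only a request to the JVM, so the figures are indicative and will vary between runs and JVMs.

```java
/** Hypothetical helper: coarse heap readings around dropping a reference. */
final class HeapProbe {

    /** Requests a collection, then returns the bytes currently in use. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // a hint only; the JVM may collect partially or not at all
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeap();

        // Stand-in for the work held by a Driver instance: about 32 MB.
        byte[][] work = new byte[32][1024 * 1024];
        long during = usedHeap();

        work = null; // drop the only reference, as the test program does
        long after = usedHeap();

        System.out.println("before: " + before / 1024 + " KB");
        System.out.println("during: " + during / 1024 + " KB");
        System.out.println("after:  " + after / 1024 + " KB");
    }
}
```

Running the JVM with -verbose:gc alongside this gives a second view of when collections actually happen.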
-- John Austin <[EMAIL PROTECTED]>

import java.io.File;
import java.io.IOException;

import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;

import org.apache.avalon.framework.logger.ConsoleLogger;
import org.apache.avalon.framework.logger.Logger;
import org.apache.fop.apps.Driver;
import org.apache.fop.apps.FOPException;

/**
 * Use JAXP 1.1 to apply two transformations and FOP to generate PDF output
 * for the Friends of the Gardens (FOG) project for the MUN Botanical Garden
 *
 * Requires:
 *   (i)   Java >= 1.4 to obtain the XML parser and XSLT processor - JAXP 1.1
 *   (ii)  FOP >= 0.20.5, fop.jar and the associated batik.jar and avalon-framework-cvs-20020806.jar
 *   (iii) Input file: members.xml
 *   (iv)  Transforms: letter.xsl, letter2fo.xsl,
 *                     env.xsl, env2fo.xsl,
 *                     renewal.xsl, renewal2fo.xsl
 * Compile:
 *   javac -classpath .;fop.jar;avalon-framework-cvs-20020806.jar SimpleJaxp.java
 *
 * Execute:
 *   java -Xmx4 -classpath .;fop.jar;batik.jar;avalon-framework-cvs-20020806.jar SimpleJaxp
 *
 * Alternative:
 *   cocoon: pipelines like this:
 */
public class SimpleJaxp extends java.lang.Thread {

    public static void main(String[] args)
            throws javax.xml.transform.TransformerException {
        java.util.Calendar cal = java.util.Calendar.getInstance();
        long start = cal.getTimeInMillis();

        transformToPDF( "letter", "members.xml", "letter.xsl", "letter2fo.xsl" );
        transformToPDF( "env", "members.xml", "env.xsl", "env2fo.xsl" );
        transformToPDF( "renewal", "members.xml", "renewal.xsl", "renewal2fo.xsl" );

        // Round the elapsed time to the nearest second.
        System.out.println( "Elapsed "
            + ((java.util.Calendar.getInstance().getTimeInMillis() - start + 500)/1000)
            + " seconds." );

        // Pause so the heap can be observed in the profiler before exit.
        try {
            sleep(24*360);
        } catch (InterruptedException e ) {
            System.err.println( "sleep() Interrupted." );
        }
    }

    public static void transformToPDF( String namePart, String xmlFileName,
                                       String xsltFileName1, String xsltFileName2 )
            throws javax.xml.transform.TransformerException {
        File xmlFile = new File( xmlFileName );
        File xsltFile = new File( xsltFileName1);
        File out1 = null;

        // Intermediate result of the first transformation goes to a temp file.
        try {
            out1 = File.createTempFile( namePart, ".xml" );
            out1.deleteOnExit();
        } catch( IOException ioe ) {
            System.err.println( "Could not create temp file" );
            System.exit(1); // non-zero status: this is a failure
        }

        //*** First transformation ***
        Source xmlSource = new StreamSource(xmlFile);
        Source xsltSource = new StreamSource(xsltFile);
        Result result = new StreamResult(out1);

        TransformerFactory transFact = TransformerFactory.newInstance();
        Transformer trans = transFact.newTransformer(xsltSource);
        trans.transform(xmlSource, result );
        trans = null;

        //*** Second transformation ***
        File xsl
Development Environment suggestions ?
So far I have been playing around like the Neanderthal* that I am. I use Sun Java 1.4.x with xterm, vi, emacs and occasionally Jedit when I feel modern urges. Peter has mentioned Eclipse and I have used VisualAge for Java, and either NetBeans or the Sun form thereof. Is there a path to enlightenment (excuse the trollish tone) therein ? Given that FOP can be installed and started in TBI (The Bash IDE), are there other graphical IDE's with a reasonable learning curve ? I have both Win98 and RH9 available to me. The RH box has more resources in addition to having the usual Linux advantages. * Is that term Politically Correct ? Would it be offensive to Europeans ? I myself am descended from Celts and probably some Angles, Jutes and Saxons. Dunno about Picts. -- When I showed my mother an Anglican Church with the sign: "Angle Parking Only", she asked "What about the poor Jutes and Saxons ?". John Austin <[EMAIL PROTECTED]>
Re: FOP ~ PropertyList search gives linear performance (FROM: fop-user)
On Wed, 2003-11-19 at 19:50, Peter B. West wrote: > John Austin wrote: ... > My apologies to everyone on the list for the testy tone ... You mean I might have help p*ss*ng people off ? > Peter -- John Austin <[EMAIL PROTECTED]>
Re: ANN: alt-design can now be integrated??
On Wed, 2003-11-19 at 15:34, Victor Mote wrote:
> ANNOUNCEMENT: I have just committed a change that 1) allows LayoutStrategy
> to tell whether an FO Tree should be built, 2) has Driver act on this, i.e.
> to build an FO Tree only if LayoutStrategy indicates that this should be
> done. This should theoretically allow Peter's logic to be used as a
> LayoutStrategy within the trunk development line. What I have done is
> probably overly simplistic, but I will allow Peter or anyone wishing to work
> on that strategy tell us what additional things are needed to accommodate.
> To start integrating, create a subclass of LayoutStrategy, override the
> foTreeNeeded() method to return false, then write a format() method that
> does the layout work. LayoutStrategy knows its parent Document, which knows
> its parent Driver, so you should be able to get to all of the parsing
> variables that are needed. Let me know if you need help.
>
> Since configuration is still messed up, you will need to hard-code a change
> to Driver to get your new LayoutStrategy object created.
>
> Victor Mote

Geez! I was thinking more along the lines of plugging in a few new data structures for property lookup. I am exploring the old implementation through the marvel of code grooming in order to understand it. Don't worry, I have the time to do this right.

I got a tiny improvement by playing around in some of PropertyList and PropertyListBuilder. This is just a throw-away effort of course. I did enough tracing this a.m. to realize that my 'linear behaviour' may be deeply buried. I have done this often enough that I don't expect more than marginal improvements from grooming/tweaking lines of code. [Gone are the days of PL/I and unaligned bit fields.]

Before I found out about Alt-Design, I was thinking about using a HashMap with property names as keys and a class implementing some stack behaviour. Each new FO would conceptually 'push' new values on a stack for each property in its list.
A smart 'pop' would allow the entire set of properties for a FO to be popped together. Hopefully, this design would allow faster access to the current properties, without a need to search through higher 'activation records', 'stack frames', contexts or whatever you choose to call them. The observations of performance indicate that there are millions of accesses through PropertyList.get(String propertyName) which are sent one-to-one through PropertyList.get( propertyName, true, true) and thence on to PropertyList.findProperty( propertyName, true ). Combine this information with the fact that I didn't notice the performance of the corresponding put() operations on the HashMap underneath PropertyList to conclude that retrieval is much more intensive than storage in this structure. So I should optimize retrievals. My plan is to get to know the internals of FOP that are 'in contact with' the existing code, then get to know the Alt-Design, then play with more and more of it until I feel comfortable integrating it. I don't expect fast-track to committer status, I would hope to work with one or two current participants and package the changes so that they 'drop in' to place. (We'll see) -- John Austin <[EMAIL PROTECTED]>
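The HashMap-plus-stack idea described above might look something like the sketch below. It is not taken from FOP or Alt-Design: the class PropertyStack and its methods are hypothetical, values are kept as plain strings, and every property is treated as inherited, which is a simplification.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one value stack per property name. */
final class PropertyStack {
    // Current and enclosing values for each property, innermost on top.
    private final Map<String, Deque<String>> values = new HashMap<>();
    // For each open FO, the names it pushed, so one pop can undo them all.
    private final Deque<List<String>> frames = new ArrayDeque<>();

    /** Entering an FO: push each property it specifies. */
    void pushFo(Map<String, String> specified) {
        List<String> names = new ArrayList<>(specified.keySet());
        for (Map.Entry<String, String> e : specified.entrySet()) {
            values.computeIfAbsent(e.getKey(), k -> new ArrayDeque<>())
                  .push(e.getValue());
        }
        frames.push(names);
    }

    /** Leaving an FO: the "smart pop" removes everything that FO pushed. */
    void popFo() {
        for (String name : frames.pop()) {
            values.get(name).pop();
        }
    }

    /** Current value: one hash lookup and a peek, no walk up parent lists. */
    String get(String name) {
        Deque<String> stack = values.get(name);
        return (stack == null || stack.isEmpty()) ? null : stack.peek();
    }

    public static void main(String[] args) {
        PropertyStack props = new PropertyStack();
        Map<String, String> outer = new HashMap<>();
        outer.put("font-size", "10pt");
        props.pushFo(outer);

        Map<String, String> inner = new HashMap<>();
        inner.put("font-size", "12pt");
        props.pushFo(inner);

        System.out.println(props.get("font-size")); // 12pt
        props.popFo();
        System.out.println(props.get("font-size")); // 10pt
    }
}
```

The design trades a little extra work on push and pop for constant-time retrieval, which matches the observation that gets far outnumber puts.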
RE: FOP ~ PropertyList search gives linear performance (FROM:fop-user)
On Wed, 2003-11-19 at 15:22, Victor Mote wrote: > John Austin wrote: > > to work on this but I don't want to walk in to a firefight. > > FWIW, I don't think there is really a firefight. Our discussions are usually > at least robust, maybe even rowdy, but AFAICT, there is a large amount of > mutual respect. That said, we *are* still trying to sort out some design Ah yes! The old vigorous and spirited exchange of views. -- John Austin <[EMAIL PROTECTED]>
Re: FOP ~ PropertyList search gives linear performance (FROM: fop-user)
Looks like I really put my foot into it this time ;-)

I have repeated the measurements I did yesterday and I think that it is a pretty reasonable conclusion that a lot of resources are consumed by FOP in its rather Byzantine property management code. I just spent a while trying to understand PropertyList and PropertyListBuilder and found out that I need to understand Property and Property.Maker as well. I think I am going to have to help with this part of the project but it is going to take a while.

I offered (off-line) to look at merging the Alt-Design code into the main branch but I suspect that there are some different directions associated with this. Perhaps this is the reason it has not been done so far. I am still willing to work on this but I don't want to walk into a firefight.

I believe the measurements I did yesterday and I feel that a bit of algorithm replacement should produce a significant improvement in the program.

I would also like to suggest that anyone interested in performance look at Java Memory Profiler at http://www.khelekore.org/jmp/performance.html I suspect there are still major memory leaks in FOP and this is one tool that will help you track them down.

-- John Austin <[EMAIL PROTECTED]>
Re: Confused with Fop extensions, distinct.
On Tuesday 16 April 2002 11:13, you wrote: > I have tried to read the docs. I had used distinct before but not > in an embedded application. > > Function not supported. Is my error message. > > I have inclued my stack Trace. My includes in my application. and my > stylesheet. > Thanks in advance for any help you can give. Check your spelling in namespace declarations and make certain the class name is correct for the version of Xalan you are using. I posted the following to cocoon-users a few weeks ago: This note is mostly for the benefit of anyone else searching for help with Redirect. Sorry if Xalan is a bit off topic for Cocoon2, but I expect that a lot of XSLT users will be using Cocoon as a framework for Xalan-J/XSLT-based applications. I spotted Redirect in Chapter 8(?) of "XSLT" by Doug Tidwell, August 2001, O'Reilly and Associates. It does work in Cocoon2 but I had a few problems getting it to run. I finally solved it by looking at the Xalan-J docs at xml.apache.org as well as the source code for Redirect. The Redirect class seems to have moved from org.apache.xalan.xslt.extensions.Redirect to the more concise org.apache.xalan.lib.Redirect. The comments in Redirect are actually useful! (thank you Mr Boag!) It appears that you need to include several namespaces in the stylesheet tag. This was another source of 'the fog of war' for me. <xsl:stylesheet ... xmlns:lxslt="http://xml.apache.org/xslt" xmlns:redirect="org.apache.xalan.lib.Redirect" extension-element-prefixes="redirect" ... rest omitted ... This did not work for me when I followed examples from various sources. I have one minor complaint about Redirect (may apply to all extensions). This facility is terrific but it is strangely silent when things go wrong. For example, I used file="concat(... at one point and created a directory named "concat(...". When I had misspelled namespace URIs, nothing worked, and it failed silently. When the lxslt declaration was missing ... same thing. 
I know I had perms wrong and was rewarded again with silence. Thanks again. Hope this helps. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, email: [EMAIL PROTECTED]
Re: Unix and FOP ?
On Friday 12 April 2002 22:43, you wrote: > yep The only area in which Windows is (arguably) superior to Unix is Graphics and especially FONTS. A consequence of Windows' success is the fact that almost all computers have Windows licenses. This lets us use the Windows fonts. You need to have the Windows license however. The Windows license doesn't require that you actually run all of Windows. So in using the fonts, we are using Windows as licensed. We are just ignoring the unreliable parts of Windows (generally the executable parts). Now actually installing and using the Windows fonts is an area which needs better documentation. > > in userconfig.xml, but > > Can I use the specials fonts by Window without problems into UNIX.? > > That's possible?
Re: Unix and FOP ?
On Thursday 11 April 2002 09:37, you wrote: > Hi, > > I need information the file xsl:fo transformation in UNIX. > What's I need by uses XSL:FO in UNIX? > Can I do? You can use xsl:fo in ANY system that has an implementation of the Java VM. This includes any reasonable implementation of Unix (AFAIK). Many people use Cocoon 2 from the Apache project because it provides a ton of features, but you can use just the Fop program if you wish to. I use it both ways and have also used it from XML Spy. In Cocoon 2 you can run XSL transformations to produce an XSL:FO document and render this to one of a number of formats such as PDF, PS and RTF using the FopSerializer. I have also used Fop driven by a shell script. The memory footprint is smaller but you have to cart Fop around with you. From Cocoon, you can generate fancy-pants documents over the network. Cocoon has a much steeper and longer learning curve. Fop is much smaller and comes with lots of examples.
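To make the first half of that pipeline concrete, here is a minimal sketch of an XSL transformation that produces an XSL:FO document, using only the JAXP classes that ship with the JVM, so it should behave the same on Unix as anywhere else. The class name XmlToFoSketch, the greeting input element and the tiny stylesheet are made up for illustration; rendering the resulting FO to PDF is left to Fop or Cocoon and is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical illustration: XML in, XSL-FO out, nothing platform specific. */
class XmlToFoSketch {
    // A deliberately tiny stylesheet: one page master, one flow, one block.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:template match='/greeting'>"
      + "<fo:root>"
      + "<fo:layout-master-set>"
      + "<fo:simple-page-master master-name='page'>"
      + "<fo:region-body/>"
      + "</fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference='page'>"
      + "<fo:flow flow-name='xsl-region-body'>"
      + "<fo:block><xsl:value-of select='.'/></fo:block>"
      + "</fo:flow>"
      + "</fo:page-sequence>"
      + "</fo:root>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML text and returns the FO document as text. */
    static String toFo(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so that callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The printed FO document is what would be handed to Fop for rendering.
        System.out.println(toFo("<greeting>Hello from any JVM</greeting>"));
    }
}
```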
Re: License Issue Inquiry
On Thursday 04 April 2002 14:47, you wrote: > At 08:34 AM 4/4/02 -0500, Charles Marcus wrote: > >like it could be the answer. The only question is, can OOo use it? > license of it's software. My question is why does OOo require FOP to > be LGPLed? You can integrate it into OpenOffice without it being > LGPL. You can't change the Apache license into LGPL but you can license YOUR CODE under LGPL (or GPL or even MicroSloth EULA). This will require anyone using YOUR CODE to observe LGPL (or whatever), which is what you want to do, as I understand it. This would allow anyone else to extract the Apache-licensed code out of your distribution and use it under Apache terms, as long as they removed all of the LGPL code from it. Of course, it would be easier to go out and get a pristine copy of FOP instead. The only thing you can create license terms for is YOUR CODE and then it still has to be provably original etc. etc. > If you do this it would also be fair if you contribute changes to FOP > back to the Apache tree.