from:"John Austin"

How does one unsubscribe

2004-04-24 Thread John Austin

I'll be changing to a new e-mail address and am cancelling my
list subscriptions.

fop-user has instructions at the bottom of messages but
fop-dev doesn't.


I guess I'll have to read the web page.

Re: [Fwd: Re: cvs commit: xml-fop/src/java/org/apache/fop/apps CommandLineOptions.java Fop.java]

2004-04-12 Thread John Austin

On Mon, 2004-04-12 at 04:33, Peter B. West wrote:
> Glen,
> 
> I put in a vote for Simon.  The language thing is confusing, I know. 
> There have been occasions on which the Austrian flag has been flown, or 
> the Austrian National Anthem been played, somewhat inappropriately.  But 
> it's en_AU over here; AU because we got in first.

And the Austrians don't call it Austria ... Isn't it Österreich?

John Austin <[EMAIL PROTECTED]>

Re: urgent help needed using FOP

2004-04-01 Thread John Austin

ush();
> 
> 
> 
> } catch (Exception ex) {
> throw new ServletException(ex);
> }
> }
> 
> ...
> 
> }
> 
> 
> 
> This is the exact error I got:
> 
> org.xml.sax.SAXParseException: Content is not allowed
> in prolog.
>   at
> org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1172)
>   at org.apache.fop.apps.Driver.render(Driver.java:498)
>   at org.apache.fop.apps.Driver.run(Driver.java:565)
> 
> 
> 
-- 
John Austin <[EMAIL PROTECTED]>

Re: DO NOT REPLY [Bug 27901] - TextCharIterator.remove() does not work properly

2004-03-25 Thread John Austin

On Thu, 2004-03-25 at 19:08, Glen Mazza wrote:
> I'm confused: is chz ([EMAIL PROTECTED]) Christian
> Geisert or another Christian?  The Bugzilla entry
> lists chz as being "Christian Z", so I'm not sure whom
> I'm speaking with!

So we shouldn't all be running around with multiple e-mail
identities ?

My excuse is, I used that e-mail address years ago when I opened
my first Bugzilla account.

-- 
John Austin <[EMAIL PROTECTED]>

Re: Java theory and practice: Garbage collection and performance

2004-02-20 Thread John Austin

On Fri, 2004-02-20 at 15:46, J.Pietschmann wrote:

> *bg*
> Twenty years ago, I had to work on an 8008-driven computer
> with 4k RAM and 12k ROM. That's enough to run a program
> which nicely prints formatted and justified text (25 lines
> of 80 characters). We've come a long way since then.

I went to a presentation on the Mars Rovers at the St John's GeoCentre,
which is one of the sites NASA has granted access to its FTP site
for fresh images ...

Comparing the old Mars projects to the new stuff ...

That was FORTRAN ... This is Java.

I recall hearing about a court case in which the Canadian Military were 
suing a supplier about something as trivial nowadays as 8K of memory.

-- 
John Austin <[EMAIL PROTECTED]>

Re: Java theory and practice: Garbage collection and performance

2004-02-19 Thread John Austin

On Thu, 2004-02-19 at 17:53, J.Pietschmann wrote:
> John Austin wrote:
> > I noticed this article on developerWorks:
> > 
> > Java theory and practice: Garbage collection and performance
> > http://www-106.ibm.com/developerworks/library/j-jtp01274.html
> > 
> > Something to read on Thursday.
> 
> Nice read, however, they don't talk about constructors. There

Isn't allocation the only unseen part of construction? Everything
else is visible in the code, and surely a few assignments are never
expensive. Any other expensive operations will stand out in
measurements of code execution.

> are still arguments for reusing objects and for trying to
> replace objects with a bunch of primitive values.
> (BTW a nice try selling yet-to-be-written optimizations
> regarding inlining...)

Moore's law is another optimization we sell in advance
all the time.
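The suggestion quoted above, replacing objects with a bunch of primitive values, can be illustrated with a small hypothetical sketch (PointLayouts and its methods are invented here, not FOP code). The first layout allocates one object per point; the second keeps the same data in two int arrays, so there is no per-point allocation or object header at all. Whether the difference matters in a given spot is exactly what measurement should decide.

```java
/** Hypothetical illustration: the same data stored as objects vs. as primitive arrays. */
final class PointLayouts {
    private PointLayouts() {}

    /** One heap object per point. */
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Object layout: n separate allocations, each with its own header. */
    static long sumWithObjects(int n) {
        Point[] points = new Point[n];
        for (int i = 0; i < n; i++) {
            points[i] = new Point(i, 2 * i);
        }
        long sum = 0;
        for (Point p : points) {
            sum += p.x + p.y;
        }
        return sum;
    }

    /** Primitive layout: two arrays, no allocation per point. */
    static long sumWithPrimitives(int n) {
        int[] xs = new int[n];
        int[] ys = new int[n];
        for (int i = 0; i < n; i++) {
            xs[i] = i;
            ys[i] = 2 * i;
        }
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += xs[i] + ys[i];
        }
        return sum;
    }
}
```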

-- 
John Austin <[EMAIL PROTECTED]>

Java theory and practice: Garbage collection and performance

2004-02-18 Thread John Austin

I noticed this article on developerWorks:

Java theory and practice: Garbage collection and performance
http://www-106.ibm.com/developerworks/library/j-jtp01274.html

Something to read on Thursday.
-- 
John Austin <[EMAIL PROTECTED]>

RE: Just a small question...

2004-02-05 Thread John Austin

On Thu, 2004-02-05 at 15:28, Andreas L. Delmelle wrote:

> I think this is a bit over the top. Suppose that tomorrow, someone gets
> fired at RX or AH, and this ex-employee decides to share some ideas with us.
> Are we really going to tell him to take a hike?? Just because of simple
> integrity? (Suppose that, before we find out, he has already submitted a few
> patches that have been applied. Would we undo all of these patches, because
> of 'simple integrity'?)

I am surprised that MS or their minions at SCO haven't twigged to the
following scheme:

They could 'set-up' Open Source by masquerading as some student
in netland and submit some provably proprietary code as original.

Six months later, MS sues Linus for malfeasance with the vigorous
support of Homeland Security ... 

Of course, conspiracies never succeed for long. Some small fish
would rat them out.

-- 
John Austin <[EMAIL PROTECTED]>

Re: (FOP examples) XSLT question

2004-02-04 Thread John Austin

On Wed, 2004-02-04 at 21:13, Glen Mazza wrote:

>  
> ...
> ...
> Version <xsl:value-of select="$versionParam"/> ...
> 
> But it keeps outputting "Version 1" in the resultant
> PDF.  What is the standard way of getting it to
> display "Version 1.0"?

select='format-number($versionParam,"##.0")'

should work.
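A self-contained way to check the difference is to run both expressions through JAXP. In this sketch the stylesheet is a made-up stand-in for the example file (only the parameter name versionParam comes from the thread), and an override is passed with Transformer.setParameter, which is the actual name of the JAXP method. If the parameter's default is the number 1.0, XPath 1.0 converts it to the string "1", which would explain the "Version 1" output; format-number() with "##.0" keeps one decimal place.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class VersionParamDemo {
    private VersionParamDemo() {}

    // Hypothetical stand-in stylesheet: the parameter defaults to the *number* 1.0
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:param name='versionParam' select='1.0'/>"
        + "<xsl:template match='/'>"
        + "<xsl:text>plain: </xsl:text><xsl:value-of select='$versionParam'/>"
        + "<xsl:text>; formatted: </xsl:text>"
        + "<xsl:value-of select=\"format-number($versionParam, '##.0')\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet; a non-null value overrides the default via setParameter. */
    static String run(Object versionParam) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            if (versionParam != null) {
                t.setParameter("versionParam", versionParam);
            }
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(null));  // default numeric parameter
        System.out.println(run("2"));   // parameter supplied from Java as a String
    }
}
```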
-- 
John Austin <[EMAIL PROTECTED]>

Re: (FOP examples) XSLT question

2004-02-04 Thread John Austin

On Wed, 2004-02-04 at 21:13, Glen Mazza wrote:
> Since this is FOP work-related, I guess I can be
> allowed to ask a very newbie XSLT question here:
> 
> I just added a parameter to one of the XSL example
> files (eventually to show the use of a JAXP
> transformer.setParameter() call) as follows:
> 
>  
> ...
> ...
> Version <xsl:value-of select="$versionParam"/> ...
> 
> But it keeps outputting "Version 1" in the resultant
> PDF.  What is the standard way of getting it to
> display "Version 1.0"?

Isn't there also  or   for this sort of
thing. I think value-of implies some kind of conversion ...

My reference is upstairs.


-- 
John Austin <[EMAIL PROTECTED]>

RE: Unnesting properties and makers.

2004-01-26 Thread John Austin

On Mon, 2004-01-26 at 17:45, Andreas L. Delmelle wrote:
> > -Original Message-
> > From: Finn Bock [mailto:[EMAIL PROTECTED]
> 
> > The result is then:
> >
> > [/d/fop] /c/java/jdk1.2.2/jre/bin/java.exe  -cp . x
> > false method call 581
> > true method call 581
> > false instanceof 160
> > true instanceof 170
> >
> > [/d/fop] /c/java/jdk1.3.1_03/jre/bin/java.exe  -cp . x
> > false method call 1272
> > true method call 2304
> > false instanceof 17945
> > true instanceof 912
> >
> > [/d/fop] /c/java/j2sdk1.4.2_02/bin/java.exe -cp . x
> > false method call 2154
> > true method call 2754
> > false instanceof 590
> > true instanceof 651
> >
> 
> Very, very interesting... Java's OO-optimization at its best (except for
> 1.3)! After all, it shouldn't be *that* surprising that an
> accessor-method-call generates more overhead than a test for
> class-membership (but what if the class in question is not yet loaded at
> time? Not that this should occur a lot...)

So I copied that program and ran it on my RH 9 system.

Got the following results. I am just quoting the results here:

Note that the default JVM is -client (the HotSpot Client VM) ...

[EMAIL PROTECTED] foptest]$ java -classpath . x
false method call 998
true method call 1001
false instanceof 3008
true instanceof 4119
[EMAIL PROTECTED] foptest]$ java -server  -classpath . x
false method call 1
true method call 0
false instanceof 0
true instanceof 4822
[EMAIL PROTECTED] foptest]$ java -server x
false method call 1
true method call 0
false instanceof 0
true instanceof 4784

java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-b28)
Java HotSpot(TM) Client VM (build 1.4.2-b28, mixed mode)

H.
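The program x itself isn't shown in the thread, so the following is only a hypothetical reconstruction of that kind of test (Prop, Scalar and ListProp are invented names). One guess about the near-zero -server timings is that the server compiler removed loops whose results were never used; this version alternates between two classes and prints each loop's hit count, so the work stays observable. Hand-written timing loops remain fragile, and a harness such as JMH is the more reliable tool.

```java
import java.util.function.IntSupplier;

/** Hypothetical reconstruction of a method-call vs. instanceof timing test. */
final class DispatchBench {
    private DispatchBench() {}

    interface Prop {
        boolean isList();
    }

    static final class Scalar implements Prop {
        public boolean isList() { return false; }
    }

    static final class ListProp implements Prop {
        public boolean isList() { return true; }
    }

    /** Counts list properties by calling an accessor method. */
    static int viaMethod(Prop[] props, int n) {
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (props[i % props.length].isList()) hits++;
        }
        return hits;
    }

    /** Counts list properties with a class-membership test. */
    static int viaInstanceof(Prop[] props, int n) {
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (props[i % props.length] instanceof ListProp) hits++;
        }
        return hits;
    }

    /** Times one run and prints the hit count, so the loop's result is actually used. */
    static int timed(String label, IntSupplier work) {
        long start = System.nanoTime();
        int hits = work.getAsInt();
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + " " + millis + " ms (hits=" + hits + ")");
        return hits;
    }

    public static void main(String[] args) {
        Prop[] props = { new Scalar(), new ListProp() };  // alternating, so neither branch is constant
        int n = 50_000_000;
        for (int round = 0; round < 3; round++) {         // later rounds run after JIT warm-up
            timed("method call", () -> viaMethod(props, n));
            timed("instanceof", () -> viaInstanceof(props, n));
        }
    }
}
```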

-- 
John Austin <[EMAIL PROTECTED]>

RE: Unnesting properties and makers.

2004-01-26 Thread John Austin

On Mon, 2004-01-26 at 17:45, Andreas L. Delmelle wrote:
> > -Original Message-
> > From: Finn Bock [mailto:[EMAIL PROTECTED]
> 
> > The result is then:
> >
> > [/d/fop] /c/java/jdk1.2.2/jre/bin/java.exe  -cp . x
> > false method call 581
> > true method call 581
> > false instanceof 160
> > true instanceof 170
> >
> > [/d/fop] /c/java/jdk1.3.1_03/jre/bin/java.exe  -cp . x
> > false method call 1272
> > true method call 2304
> > false instanceof 17945
> > true instanceof 912
> >
> > [/d/fop] /c/java/j2sdk1.4.2_02/bin/java.exe -cp . x
> > false method call 2154
> > true method call 2754
> > false instanceof 590
> > true instanceof 651
> >
> 
> Very, very interesting... 

When did the choice of JVM (java -client | java -server) appear ?

Wasn't it 1.3 ?
-- 
John Austin <[EMAIL PROTECTED]>

Re: Newbie committer questions.

2004-01-20 Thread John Austin

On Tue, 2004-01-20 at 21:10, Glen Mazza wrote:
> Actually, yes and no, as I learned a ton--XSLT, Java,
> and FOP--from his coding.  (Jeremias never taught me
> that much! ;)  However, I am quite fatigued right now

You have been impressively active in the past short while.

> and need a few days off.  Finn, would you mind taking
> over the rest of your last patch?  The issues I found
> can be discussed and changed, if necessary, after you
> apply it.

Sometimes I feel like I did when I first heard Wynton
Marsalis play some variations from the "Trumpet Method"
by J. Arban ... (Carnival of Venice)

Man, he plays that faster than I can READ it ...
-- 
John Austin <[EMAIL PROTECTED]>

Re: Quote of the Day

2004-01-19 Thread John Austin

On Mon, 2004-01-19 at 16:17, Andreas L. Delmelle wrote:

> "When I hear Bill Gates bragging about how his programmers can code up to 72

There's the world's richest hermit again.

Maybe he'll end up nuttier than Howard Hughes.
-- 
John Austin <[EMAIL PROTECTED]>

Re: Servlet Examples in HEAD v.s. 0.20.5

2004-01-18 Thread John Austin

On Sun, 2004-01-18 at 08:49, J.Pietschmann wrote:
> John Austin wrote:
> >(is Content-length: required for any reason other than placating
> >Acrobat and that rich hermit who lives outside Redmond WA ?) 
> 
> Not really a FOP topic but anyway.
> Setting content-length is considered "good style", because it allows
> browsers to give users feedback on how far the download has proceeded.
> This is especially useful for larger files on slow connections.
> Of course, there is a tradeoff for dynamically generated content:
> there won't be any feedback at all until the content is ready, and
> if this is longer than the download time itself (now that everybody
> has broadband :-) ), the user is still dissatisfied. Well, the
> IEx architecture bug saves us from pondering the philosophical
> background.

Mentioned because it is in the extant codebase even though it isn't
necessary. I deduce it is related to Acrobat because of cryptic comments
in the documentation.

> > 2) Cache Templates objects for faster Transformations when XSLT
> >files are to be re-used. The 'Java and XSLT' O'Reilly book
> >has some interesting suggestions in this area.
> 
> The problem is to detect style sheet reuse without context information.

I think the only problem is how to purge entries from the cache. Re-use
can be detected if names are URLs. That still leaves the problem of
detecting changes to stylesheets, which is discussed a bit in Burke's book.
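A minimal sketch of that idea, assuming stylesheets are local files identified by path: compiled Templates objects (which JAXP requires to be usable from several threads) are cached, and a changed file timestamp triggers recompilation. The class name TemplatesCache and the clear-everything purge policy are assumptions; a servlet might prefer a size-bounded LRU, and for stylesheets fetched over http: the Last-Modified header could play the role of lastModified().

```java
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical cache of compiled stylesheets, keyed by file path. */
final class TemplatesCache {
    private static final class Entry {
        final Templates templates;
        final long lastModified;

        Entry(Templates templates, long lastModified) {
            this.templates = templates;
            this.lastModified = lastModified;
        }
    }

    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    /** Returns compiled Templates, recompiling only when the file's timestamp has changed. */
    Templates get(File xsl) {
        String key = xsl.getAbsolutePath();
        long stamp = xsl.lastModified();
        Entry entry = cache.get(key);
        if (entry == null || entry.lastModified != stamp) {
            // Two threads may occasionally compile the same file twice; that is harmless here.
            entry = new Entry(compile(xsl), stamp);
            cache.put(key, entry);
        }
        return entry.templates;  // Templates objects may be shared between threads
    }

    private Templates compile(File xsl) {
        synchronized (factory) {  // TransformerFactory is not documented as thread-safe
            try {
                return factory.newTemplates(new StreamSource(xsl));
            } catch (TransformerConfigurationException e) {
                throw new IllegalArgumentException("cannot compile " + xsl, e);
            }
        }
    }

    /** Simplest possible purge policy (an assumption): forget everything. */
    void clear() {
        cache.clear();
    }
}
```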

> > 3) Using URL's for the fo= and xml=,xsl= parameters so we can use
> >network resources as well as local files.
> 
> +1000.
> Doh, revert to +0. I'd like to do this, unfortunately, this is not
> without drawbacks:
> - People have to learn what a URI is. This seems to be much harder
>   than expected, especially for file:-URLs.
> - People will still insist on keeping "xml=foo.xml". This is still a
>   URL (actually: a relative URL reference, which has to be resolved).
>   We have to think hard what the base URL is in this case.

What if the default xml=fred.xml is mapped to xml=file://./fred.xml, where
the servlet's 'working dir' is defined relative to the servlet context?
Then we can ship some of our test xml/xsl files in that location and
people have something to start with.
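One wrinkle: in file://./fred.xml the "." would be parsed as a host name, since whatever follows file:// is the authority part of the URI. A sketch of the same idea with java.net.URI, resolving relative parameter values against an explicit base directory (ParamResolver is a hypothetical name; in a servlet the directory might come from ServletContext.getRealPath). Rejecting relative references that climb out of the directory is an added assumption.

```java
import java.io.File;
import java.net.URI;

/** Hypothetical helper: resolves xml=/xsl=/fo= values against the servlet's working directory. */
final class ParamResolver {
    private final URI base;

    ParamResolver(File workingDir) {
        String s = workingDir.getAbsoluteFile().toURI().toString();
        // A trailing '/' makes "fred.xml" resolve inside the directory, not next to it
        this.base = URI.create(s.endsWith("/") ? s : s + "/");
    }

    /** Absolute URIs (http:, file:) pass through; relative references are resolved against the base. */
    URI resolve(String param) {
        URI ref = URI.create(param);            // throws IllegalArgumentException on unescaped spaces etc.
        if (ref.isAbsolute()) {
            return ref;
        }
        URI resolved = base.resolve(ref);       // resolve() also normalizes "." and ".." segments
        if (!resolved.getPath().startsWith(base.getPath())) {
            throw new IllegalArgumentException("escapes the working directory: " + param);
        }
        return resolved;
    }
}
```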


> J.Pietschmann
-- 
John Austin <[EMAIL PROTECTED]>

Re: Servlet Examples in HEAD v.s. 0.20.5

2004-01-17 Thread John Austin

On Sat, 2004-01-17 at 19:18, Jeremias Maerki wrote:
> Discussion on this can be found here:
> http://marc.theaimsgroup.com/?t=10383153256&r=1&w=2
> http://marc.theaimsgroup.com/?t=10172302692&r=1&w=2
> 
> There were pros and cons about the move from examples into the main
> source tree. I think the triggering point was that the servlet sees real
> use and doesn't really qualify as an "example". I agree that with
> today's build it may not be so obvious what is necessary to build the
> WAR file (the various parts are distributed in the source tree). But the
> WAR file gets built automatically today.

Doh! I did a 'locate fop.war' and there it is! Of course, my oldish
snapshot from HEAD doesn't work, so there was no output from
running it, but it does build and deploy into Tomcat.

> Proposal: I like your ideas. I also think that we have to preserve the
> simplicity of the servlet as an educational example for people who want
> to play with it. So what about resurrecting the examples/servlet but
> keeping it real simple? Just the basics. And the servlet in the main
> source tree stays where it is but gets your new features.

I would expect an example servlet to be quite simple with descriptive
comments and suggestions for variations. The purpose here would be to
provide a prototypical webapp that could be used to populate a small
project in the user's development space.

I did find it difficult to settle on a set of features that I would 
include in a single 'FopServlet' program. This is simplified if
FopServlet is primarily real working code. I would be comfortable
with an org.apache.fop.servlet.FopServlet that included some more
advanced features:

1) Deflate and Inflate the byte stream used to store the PDF file
   (is Content-length: required for any reason other than placating
   Acrobat and that rich hermit who lives outside Redmond WA ?) 
2) Cache Templates objects for faster Transformations when XSLT
   files are to be re-used. The 'Java and XSLT' O'Reilly book
   has some interesting suggestions in this area.
3) Using URL's for the fo= and xml=,xsl= parameters so we can use
   network resources as well as local files.
4) Detect IE and redirect users to a URL that has the proper '.pdf'
   filetypes in basename and end of request URL.
5) The servlet could be used as part of an automated testing process.
   The fop.war file could be deployed in Tomcat as part of an HttpUnit
   test and then many of our tests could be run using HttpUnit.

Examples could be simpler than this as they have the specific purpose of
illustrating a practical use case.

> German speaking Swiss people would say you get "de Föifer und's Weggli"
> (freely translated to english: the 5 cent piece and the donut. Meaning:
> You get twice as happy. Want to know what a "Weggli" is? Go to
> http://www.jowa.ch/1776/1846/1847/1865/1867.asp). :-)

My German has atrophied over the past 31 years. I left Ramstein, Germany
in July 1973 and, except for one undergrad course, have only spoken
German once or twice since. [I stopped overnight in Lahr about 1978.]

As a Canadian I understand 'donut' (see http://www.timhortons.com/) but
I always think of Brötchen as a German pastry.

-- 
John Austin <[EMAIL PROTECTED]>

Servlet Examples in HEAD v.s. 0.20.5

2004-01-17 Thread John Austin

After the last week's thread about running FOP in a servlet,
I thought I'd review the examples with a view to improving
the end-user experience and flattening the learning curve.

Some notes:

The current sample: org.apache.fop.servlet.FopServlet has been
improved in HEAD but the packaging seems (IMHO) to have suffered.

In 0.20.5 the examples/servlet directory contains a fully-functional
web application that can be deployed and run in the latest Tomcat.
This webapp includes a valid build.xml file so one can simply type:
'ant' in the examples/servlet directory. Even better, the Ant Farm
plugin in Jedit can build 'fop.war'. From the Tomcat Manager window,
you can upload 'fop.war' and use the webapp right away.

The HEAD version of FopServlet has been rewritten to use JAXP
and works reasonably well. Unfortunately, the examples/servlet
directory has disappeared from the project. It has not disappeared 
from the documentation, so there is an error there.

The servlet seems to make provisions for peculiarities of the Acrobat
plug-in (it writes the PDF to a memory buffer, then copies this to
response.getOutputStream() after setting the Content-Length header).
This knowledge SHOULD appear in program comments. The same is true
for information about Internet Explorer and its need for the filetype
'.pdf' in the base URL and at the end of the invoking URL. One could even
update the example to issue a redirect for Evil(tm) User Agents so that
the IE user's request is corrected for him (heh .. heh).
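The redirect decision could live in a small helper like the sketch below. Everything in it is an assumption for illustration: that "MSIE" in the User-Agent header identifies the affected browsers, the made-up file name document.pdf, and the dummy ext parameter that makes the full URL end in .pdf as well. In a servlet the inputs would come from request.getHeader("User-Agent"), request.getRequestURL() and request.getQueryString(); a non-null result would go to response.sendRedirect(), and the servlet mapping would have to accept the extra path segment (for example a /fop/* pattern).

```java
/** Hypothetical helper deciding whether an IE request should be redirected to a ".pdf" URL. */
final class IeRedirect {
    private IeRedirect() {}

    /**
     * @param userAgent  value of the User-Agent header (may be null)
     * @param requestUrl request URL without the query string
     * @param query      query string without the leading '?' (may be null)
     * @return the URL to redirect to, or null if no redirect is needed
     */
    static String redirectFor(String userAgent, String requestUrl, String query) {
        boolean isIe = userAgent != null && userAgent.contains("MSIE");
        if (!isIe || requestUrl.endsWith(".pdf")) {
            return null;  // other browsers, or a request that was already corrected
        }
        StringBuilder target = new StringBuilder(requestUrl);
        if (!requestUrl.endsWith("/")) {
            target.append('/');
        }
        target.append("document.pdf");  // made-up name; only the ".pdf" suffix matters
        if (query != null && !query.isEmpty()) {
            // keep the original parameters and make the whole URL end in ".pdf" too
            target.append('?').append(query).append("&ext=.pdf");
        }
        return target.toString();
    }
}
```

Because the corrected URL's path already ends in .pdf, the second request passes straight through and cannot loop.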

Would anyone be offended if we were to put examples/servlet back into
the build? We could update the deployment descriptor to use
org.apache.fop.servlet.FopServlet (and FopPrintServlet) as well as one
or two new examples (whose code appears in the examples/servlet
directory) that illustrate other concepts, such as cached Templates
objects and the use of Deflater/Inflater streams to reduce the size of
the in-memory PDF file buffer. I have some thoughts about generalizing
FopServlet to use URL parameters so that both server-side files and
network-resident HTTP resources would be usable.
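The Deflater/Inflater idea might look like the sketch below (CompressedPdfBuffer is a hypothetical name). The renderer writes into a DeflaterOutputStream backed by a byte array; the uncompressed length, which is what a Content-Length header needs, is read from Deflater.getBytesRead() once compression is finished; the bytes are inflated again while being copied to the response. How much memory this saves depends on how much of the PDF is already Flate-compressed internally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical in-memory PDF buffer that stores its contents deflated. */
final class CompressedPdfBuffer {
    private final ByteArrayOutputStream compressed = new ByteArrayOutputStream();
    private final Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    private final DeflaterOutputStream sink = new DeflaterOutputStream(compressed, deflater);

    /** Stream the renderer writes into instead of a plain ByteArrayOutputStream. */
    OutputStream sink() {
        return sink;
    }

    /** Call once: flushes the deflater and returns the uncompressed size (the Content-Length). */
    long finish() throws IOException {
        sink.finish();                        // writes any remaining compressed bytes
        long uncompressed = deflater.getBytesRead();
        deflater.end();                       // we supplied the Deflater, so we release it
        return uncompressed;
    }

    int compressedSize() {
        return compressed.size();
    }

    /** Inflates the buffered document into the real output stream, e.g. the response. */
    void writeTo(OutputStream out) throws IOException {
        try (InputStream in = new InflaterInputStream(
                new ByteArrayInputStream(compressed.toByteArray()))) {
            byte[] chunk = new byte[8192];
            for (int n; (n = in.read(chunk)) != -1; ) {
                out.write(chunk, 0, n);
            }
        }
    }
}
```

In the servlet, the value returned by finish() would be passed to response.setContentLength() before writeTo(response.getOutputStream()).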

I would consider adding some of my own test files which demonstrate the
use of FOP to generate letters and print envelopes from data base
output.

It should be possible to build a servlet example that executes all of
the .fo and .xml/.xsl files in the examples directory. It would be
nice for potential users to have an out-of-box webapp that runs a 
large number of our examples. 
  
-- 
John Austin <[EMAIL PROTECTED]>

Re: HashMap

2004-01-14 Thread John Austin

On Wed, 2004-01-14 at 21:27, Peter B. West wrote:
> A friend was watching over my shoulder as I was responding to an earlier 
> message on fop-dev.  "HashMaps... I won't say what image that conjures 
> up for me."  "Well?"  "A map of where you have the stash."
> 
> I never thought of it that way.

Those of you in 'foreign climes' won't have heard of Canada's
latest drug bust. A former brewery north of Toronto was being used 
as one of the largest 'grow ops' (hydroponic marijuana factory)
ever discovered.

The Globe and Mail (http://www.globeandmail.com/) stated that
Ontario produces more weed than the entire population could
possibly smoke. There's an image of Canada that I want Europeans
to have. Of course it would slow hockey down quite a bit (but it
would dramatically increase concession sales at NHL games ...)
and cut out the fights. And only one of the Cheech and Chong guys
is/was Canajun, eh!

Anyway ... the former Molson's brewery in Barrie Ontario next
to Highway 400 (Interstate/Motorway/Autobahn) ... had everything
they needed ... huge metal kettles ... loading docks ... 
-- 
John Austin <[EMAIL PROTECTED]>

RE: [Bug 25480] - Experimental performance improvements.

2004-01-13 Thread John Austin

On Tue, 2004-01-13 at 20:49, Glen Mazza wrote:
> Let's not get too certain of anything right now with
> respect to implementation--but you probably have a
> point--a huge and very repetitively formatted document
> (say, the Chicago phone book, perhaps) would have
> comparatively fewer properties with a higher
> cardinality for each.

SOLVED! Yes!

Something to cheer up a morbidly downcast Packers fan two
days after the fall of the mighty number '4'.

I used DocBook for the frequency table because I was familiar
with formatting it as PDF with FOP. I suspect that properties
have similar distributions in general because XSL-FO documents are always
generated by programs and (ransom notes notwithstanding)
adhere to general styles.

Really repetitive documents would be only slightly more skewed
than general text documents. (Say 90-10 rather than 80-20).
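A frequency table like the one described could be produced with a namespace-aware SAX pass over a generated FO file. This is a sketch, not the tool actually used: it counts unqualified attributes on elements in the XSL-FO namespace, treating each as one property specification, and ranks them by count.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch: counts how often each property appears on XSL-FO elements. */
final class PropertyFrequency extends DefaultHandler {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!FO_NS.equals(uri)) {
            return;  // ignore foreign elements, e.g. inside fo:instream-foreign-object
        }
        for (int i = 0; i < atts.getLength(); i++) {
            if (atts.getURI(i).isEmpty()) {  // unprefixed attributes are the FO properties
                counts.merge(atts.getLocalName(i), 1, Integer::sum);
            }
        }
    }

    /** Parses one FO document and returns property name -> number of occurrences. */
    static Map<String, Integer> count(InputSource fo) {
        PropertyFrequency handler = new PropertyFrequency();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(fo, handler);
        } catch (Exception e) {  // ParserConfigurationException, SAXException, IOException
            throw new IllegalStateException(e);
        }
        return handler.counts;
    }

    /** Most frequent properties first. */
    static List<Map.Entry<String, Integer>> ranked(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
        list.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return list;
    }
}
```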

Someone told me where to get the style sheets for the XSL-FO
specification (RenderX) and I wanted to generate the XSL-FO
file for it, as a more appropriate 'challenge' for the project. 

--
John Austin <[EMAIL PROTECTED]>

Re: PropertySets - target-locks on SDK 1.4

2004-01-05 Thread John Austin

On Mon, 2004-01-05 at 21:11, Glen Mazza wrote:
> It's probably not *yet* time to set 1.4 as the JDK to
> code against for 1.0, but it probably wouldn't be much
> of a disaster if we did so either.

Does a target-lock commitment like this require a vote ?

John Austin <[EMAIL PROTECTED]>

Re: AW: Regression tests was: Re: Output from NIST test suite

2003-12-26 Thread John Austin

On Fri, 2003-12-26 at 05:29, Peter Kullmann wrote:
> J. Pietschmann wrote:
> > 
> > John Austin wrote:
> > > RedHat 9.0 (my system anyhow) includes a command 'pdftopbm' that will
> > > convert a PDF to multiple PBM (Portable Bitmap) files that might be
> > > comparable.
> > ...
> >  >It would certainly help detect pixel-sized changes.
> > > That might help regression testing.

I wasn't thinking of using graphics as the primary means of 
comparing output. It was just a thought that one could use
visualization in some circumstances: 

+ pixels that were white in both images would be
  rendered as white
+ pixels that were black in both images would be
  rendered as black
+ black pixels in the first image that were white
  in the second could be rendered as red
+ white pixels in the first image that were black
  in the second could be rendered as blue

I thought of the idea of overlaying images for comparison when
I was scrolling through the side-by-side renderings of PDFs
that Finn posted yesterday (what does 'yesterday' mean in a
discussion that crosses the International Date Line ?)

Of course, this color-based scheme breaks down for test cases
that use color.
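The four rules above translate almost directly into code. A sketch using java.awt.image.BufferedImage, assuming both pages were rasterized at the same resolution; the mid-grey threshold for deciding "black" is an assumption meant to tolerate anti-aliased text, and pages of different size are compared only over their common area.

```java
import java.awt.image.BufferedImage;

/** Sketch: overlays two rasterized pages using the color rules described above. */
final class PageDiff {
    private PageDiff() {}

    static final int WHITE = 0xFFFFFF;
    static final int BLACK = 0x000000;
    static final int RED = 0xFF0000;   // black in first, white in second
    static final int BLUE = 0x0000FF;  // white in first, black in second

    /** Treats a pixel as black if its average channel value is below mid-grey (an assumption). */
    static boolean isBlack(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (r + g + b) / 3 < 128;
    }

    static BufferedImage overlay(BufferedImage first, BufferedImage second) {
        int w = Math.min(first.getWidth(), second.getWidth());
        int h = Math.min(first.getHeight(), second.getHeight());
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean a = isBlack(first.getRGB(x, y));
                boolean b = isBlack(second.getRGB(x, y));
                int color = (a == b) ? (a ? BLACK : WHITE) : (a ? RED : BLUE);
                out.setRGB(x, y, color);
            }
        }
        return out;
    }
}
```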

> > 
> > We need regression tests badly. Some problems to ponder:
> > a) Tests need to be automated to actually be useful.
> >   JUnit seems the way to go. Unfortunately, it's still
> >   underutilized in FOP.
> > b) We don't have many *unit* tests. There's only the
> >   UtilityCodeTestSuite.java. We need many more tests for
> >   basic functionality. The problem seems to be, however,
> >   that an elaborate test harness needs to be written in
> >   order to do unit tests for, e.g., layout managers.
> > c) In order to test the whole engine at once, from FO input
> >   to generating PDF/whatever, well, a binary compare with
> >   a pregenerated PDF would be as sufficient as comparing
> >   bitmap images. Problems here:
> >   + The files to compare against are binary, and consume
> >     a lot of space. Well, take a look at GenericFOPTestCase.java,
> >     which uses MD5 sums, one for the FO in order to detect
> >     accidental changes to the source, and one for the result.
> >   + Even small changes have the potential to break the whole test
> >     suite, even if nothing important changed, let's say the
> >     order of entries in a PDF dictionary. Rendering bitmaps
> >     from PDF eliminates this, but then you won't find regressions
> >     in non-visible stuff.
> > All in all, if there are 143 template PDFs and a change causes
> > mismatches for all, what will you do? Examine everything,
> > comparing pixels, check whether there are visible differences
> > at all, and then judge whether the original or the newly
> > generated PDF is at fault? I don't think this will be done
> > often.

Use tests for binary equality to detect differences. Visualization
might be one tool, useful in following up on detected differences.
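Since GenericFOPTestCase.java is described above as storing MD5 sums rather than whole reference files, the binary-equality check reduces to something like this sketch (OutputChecksum and its methods are hypothetical names). It only works if the renderer's output is deterministic, so anything that varies from run to run, such as an embedded creation date, would have to be fixed or removed first.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for digest-based regression checks. */
final class OutputChecksum {
    private OutputChecksum() {}

    /** Lower-case hex MD5 of the given bytes. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder(32);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // every Java platform is required to provide MD5
        }
    }

    /** Binary-equality check: true when the output matches the stored reference digest. */
    static boolean matches(byte[] generated, String expectedHex) {
        return md5Hex(generated).equalsIgnoreCase(expectedHex);
    }
}
```

A mismatch only says that something changed; the visualization discussed here would then be one way to find out what.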

I might want to use the technique to compare the effects of changes 
to a document. For example:

What happens on page 7 when I change space-before="10pt" to
space-before="15pt" ?

A colorized visualization would give me a better idea than separate
files. Remember that our brains are all quite different. Your
rote visual memory ability is probably much better than mine. You
might learn more from a side-by-side comparison than I would.

Crap. Now I have to give an example. Perhaps it won't take that 
long.

> > 
> > Ideas welcome!
> > 
> > J.Pietschmann
> > 
> 
> As an alternative approach for c) one could create tests along 
> the following lines: Suppose you want to test left margin 
> properties of a block. For this a simple fo file is rendered as 
> a bitmap. The bitmap will not be compared to a reference bitmap
> but some elementary assertions are calculated. For instance one
> such assertion could be: "The rectangle of width 1 inch of the
> left edge is blank." I don't know of a tool that can do this
> but it should be pretty straightforward to implement.

Probably not that hard to do once you get inside an image file
in a program. Especially if you know the colors will be black
(0,0,0) and white (255,255,255) or a small number of selected 
colors.
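Kullmann's example assertion, "the rectangle of width 1 inch of the left edge is blank", might be checked like this, assuming the page was rasterized at a known resolution (the dpi argument) and that blank means pure white. PageAssertions and its methods are hypothetical names.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical assertions on a rasterized page. */
final class PageAssertions {
    private PageAssertions() {}

    /** Converts a length in inches to pixels at the resolution the page was rasterized at. */
    static int inchesToPixels(double inches, int dpi) {
        return (int) Math.round(inches * dpi);
    }

    /** The strip of the given width at the left edge, spanning the full page height. */
    static Rectangle leftStrip(BufferedImage page, double inches, int dpi) {
        return new Rectangle(0, 0, inchesToPixels(inches, dpi), page.getHeight());
    }

    /** True if every pixel inside r has exactly the given RGB color (alpha ignored). */
    static boolean isUniform(BufferedImage page, Rectangle r, int rgb) {
        if (!new Rectangle(page.getWidth(), page.getHeight()).contains(r)) {
            throw new IllegalArgumentException("rectangle lies outside the page: " + r);
        }
        for (int y = r.y; y < r.y + r.height; y++) {
            for (int x = r.x; x < r.x + r.width; x++) {
                if ((page.getRGB(x, y) & 0xFFFFFF) != (rgb & 0xFFFFFF)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```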

> So, in the test suite one has a piece of fo containing a test
> document and some assertions in java or coded in xml that should
> be fulfilled by the rendered image of the fo. 
> 
> Assertions could contain some of the following pieces:
> - a specified rectangle is blank (or of some specific color)
>

Re: Output from NIST test suite

2003-12-25 Thread John Austin

On Thu, 2003-12-25 at 11:42, Finn Bock wrote:
> Hi,
> 
> After 'fixing' the master-reference issue in my copy of the NIST test 
> suite, I ran the tests against 0.20.5 and 1.0dev and merged the result 
> side by side into a single .pdf file.
> 
> You can download the result (1Mb) here:
> 
> http://bckfnn-modules.sf.net/out-0.20.5-1.0.pdf
> 
> For some reason the pdf does not display correctly in my browsers, so it 
> is better to download it. The merged pdf file is created using iText.
> 
> The square to the left contains the output from 0.20.5 and the square on 
> the right the output from HEAD.
> 
> Here is also a merge between the pdf files that comes with the NIST 
> suite and head:
> 
> http://bckfnn-modules.sf.net/out-nist-1.0.pdf
> 
> There are still a few issues left to fix.
> 
> 
> Another way of using the test suite could be to compare a binary image 
> of the pages against some kind of reference. Has such an approach been
> tried? Does anyone know of available software that can render a PDF as 
> an image file?

RedHat 9.0 (my system anyhow) includes a command 'pdftopbm' that will
convert a PDF to multiple PBM (Portable Bitmap) files that might be
comparable. They would be convertible into other formats such as PNG
(or GIF for the patent-minded).

I found the result pretty poor (ugly text badly in need of
anti-aliasing). That might help contribute to keeping images 
similar. It would certainly help detect pixel-sized changes.
That might help regression testing.

There are suggestions on the Net that Ghostscript can do this sort of
conversion as well.

GIMP can read a PDF as well. When I tried it, I got a graphic for every
pair of pages (my doc was over 133 pages). Perhaps some Script-Fu ...?
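For the pixel comparison itself, raw PBM is simple enough to read without any imaging library. This sketch assumes the converter writes the binary P4 form, one page per file: a short ASCII header (magic number, width, height, possibly '#' comments), then the raster with one bit per pixel, 1 meaning black, and each row padded to a whole byte. PbmReader is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;

/** Sketch of a reader for raw (P4) PBM images, the binary form of the Portable Bitmap format. */
final class PbmReader {
    private PbmReader() {}

    /** Returns pixels as [row][column]; true means black, as in PBM. */
    static boolean[][] readP4(InputStream in) throws IOException {
        if (in.read() != 'P' || in.read() != '4') {
            throw new IOException("not a raw PBM (P4) image");
        }
        int width = readHeaderInt(in);
        int height = readHeaderInt(in);  // also consumes the single whitespace before the raster
        int rowBytes = (width + 7) / 8;  // each row is padded to a whole byte
        boolean[][] pixels = new boolean[height][width];
        byte[] row = new byte[rowBytes];
        for (int y = 0; y < height; y++) {
            for (int off = 0; off < rowBytes; ) {
                int n = in.read(row, off, rowBytes - off);
                if (n < 0) {
                    throw new IOException("truncated raster at row " + y);
                }
                off += n;
            }
            for (int x = 0; x < width; x++) {
                // most significant bit first within each byte
                pixels[y][x] = ((row[x >> 3] >> (7 - (x & 7))) & 1) == 1;
            }
        }
        return pixels;
    }

    /** Reads one decimal header field, skipping whitespace and '#' comments. */
    private static int readHeaderInt(InputStream in) throws IOException {
        int c = in.read();
        while (true) {
            if (c == '#') {
                while (c != '\n' && c != -1) {
                    c = in.read();
                }
            } else if (c != -1 && Character.isWhitespace(c)) {
                c = in.read();
            } else {
                break;
            }
        }
        if (c < '0' || c > '9') {
            throw new IOException("malformed PBM header");
        }
        int value = 0;
        while (c >= '0' && c <= '9') {
            value = value * 10 + (c - '0');
            c = in.read();  // the byte after the digits (a separator) is consumed here
        }
        return value;
    }
}
```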

> regards,
> finn
-- 
John Austin <[EMAIL PROTECTED]>

Re: Output from NIST test suite

2003-12-25 Thread John Austin

On Thu, 2003-12-25 at 11:42, Finn Bock wrote:
> Hi,
> 
> After 'fixing' the master-reference issue in my copy of the NIST test 
> suite, I ran the tests against 0.20.5 and 1.0dev and merged the result 
> side by side into a single .pdf file.

Interesting technique.

What tool do you use to make the side-by-side comparison ?
-- 
John Austin <[EMAIL PROTECTED]>

Re: Is this a coding flaw ?

2003-12-19 Thread John Austin

Nothing to do with optimization. Just noticed some wrongness
that has the possibility to be pathological wrongness. Classes
should preclude the possibility of erroneous use. The subject
was making a URL resolver thread-safe. The class in question is
a source of state information needed later by the resolver.

[Lucky thing we didn't mention the dirty knife!]

On Fri, 2003-12-19 at 11:50, Ben Galbraith wrote:
> Jeremias Maerki wrote:
> > Hmm, again, we could probably cache the value. Not very elegant, of
> > course, but how else do we get that value which is used in several
> > places?
> 
> Just an outsider's point-of-view: it probably doesn't make sense to 
> waste time optimizing code like this unless a profiler indicates that 
> it's a bottleneck.
> 
> Randomly searching through code for potential inefficiencies has widely 
> been disproven as an effective optimization technique.  ;-)
> 
> Ben
> 
> > 
> > On 19.12.2003 13:57:26 John Austin wrote:
> > 
> >>And of course, I missed the fact that the last method in the class
> >>contains a pathological use. To get the name of this class, we create a
> >>parser ?  
> >>
> >>    /**
> >>     * Returns the fully qualified classname of the standard XML parser
> >>     * for FOP to use.
> >>     * @return the XML parser classname
> >>     */
> >>    public static final String getParserClassName() {
> >>        try {
> >>            return createParser().getClass().getName();
> >>        } catch (FOPException e) {
> >>            return null;
> >>        }
> >>    }
> > 
> > 
> > 
> > Jeremias Maerki
> > 
-- 
John Austin <[EMAIL PROTECTED]>

Re: Is this a coding flaw ?

2003-12-19 Thread John Austin

On Fri, 2003-12-19 at 10:02, Jeremias Maerki wrote:
> It should be thread-safe, the way it is used here. You could, of course,
> cache the SAXParserFactory instance but I doubt the performance
> improvement would be measurable. getParser is probably not the best name
> if you look at it from a bean-oriented angle but it's not that it's
> called many times anyway. Do you think we should rename it?
> 
> On 19.12.2003 13:13:45 John Austin wrote:
> > I found the following snippet in the class FOFileHandler:

And of course, I missed the fact that the last method in the class
contains a pathological use. To get the name of this class, we create a
parser ?  

    /**
     * Returns the fully qualified classname of the standard XML parser
     * for FOP to use.
     * @return the XML parser classname
     */
    public static final String getParserClassName() {
        try {
            return createParser().getClass().getName();
        } catch (FOPException e) {
            return null;
        }
    }
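If the class name is wanted repeatedly, one way to keep the existing behaviour while creating the parser only once is the initialization-on-demand holder idiom. The sketch below is hypothetical (ParserNames is not FOP code) and, like the quoted method, returns null if no parser can be created.

```java
import javax.xml.parsers.SAXParserFactory;

/** Hypothetical: computes the parser class name once, lazily and thread-safely. */
final class ParserNames {
    private ParserNames() {}

    /** The JVM initializes Holder on first access, exactly once, with no explicit locking. */
    private static final class Holder {
        static final String PARSER_CLASS_NAME = lookup();
    }

    private static String lookup() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // One parser is still built, but only once instead of on every call
            return factory.newSAXParser().getXMLReader().getClass().getName();
        } catch (Exception e) {  // ParserConfigurationException or SAXException
            return null;         // mirrors the original's "return null" on failure
        }
    }

    static String getParserClassName() {
        return Holder.PARSER_CLASS_NAME;
    }
}
```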

-- 
John Austin <[EMAIL PROTECTED]>

Re: Is this a coding flaw ?

2003-12-19 Thread John Austin

On Fri, 2003-12-19 at 10:02, Jeremias Maerki wrote:
> It should be thread-safe, the way it is used here. You could, of course,
> cache the SAXParserFactory instance but I doubt the performance
> improvement would be measurable. getParser is probably not the best name
> if you look at it from a bean-oriented angle but it's not that it's
> called many times anyway. Do you think we should rename it?

As long as we are certain that it is being used correctly, probably
not necessary. Just jumped a bit when I saw the possibility that it
would be easily mis-used.

> 
> On 19.12.2003 13:13:45 John Austin wrote:
> > I found the following snippet in the class FOFileHandler:
> > 
> > ===
> >     /**
> >      * @see org.apache.fop.apps.InputHandler#getParser()
> >      */
> >     public XMLReader getParser() throws FOPException {
> >         return createParser();
> >     }
> > ===
> > 
> > and the createParser() method
> > 
> > ===
> >     /**
> >      * Creates XMLReader object using default
> >      * SAXParserFactory
> >      * @return the created XMLReader
> >      * @throws FOPException if the parser couldn't be created or
> >      * configured for proper operation.
> >      */
> >     protected static XMLReader createParser() throws FOPException {
> >         try {
> >             SAXParserFactory factory = SAXParserFactory.newInstance();
> >             factory.setNamespaceAware(true);
> >             factory.setFeature(
> >                 "http://xml.org/sax/features/namespace-prefixes", true);
> >             return factory.newSAXParser().getXMLReader();
> >             ...
> > 
> > Now it would seem to me that a 'getter' method should not go around 
> > creating objects every time it needs to. It just doesn't look right.
> > 
> > I assume that SAXParserFactory is thread-safe.
> 
> 
> Jeremias Maerki
-- 
John Austin <[EMAIL PROTECTED]>

Is this a coding flaw ?

2003-12-19 Thread John Austin

I found the following snippet in the class FOFileHandler:

===
/**
 * @see org.apache.fop.apps.InputHandler#getParser()
 */
public XMLReader getParser() throws FOPException {
    return createParser();
}
===

and the createParser() method

===
/**
 * Creates XMLReader object using default
 * SAXParserFactory
 * @return the created XMLReader
 * @throws FOPException if the parser couldn't be created or
 * configured for proper operation.
 */
protected static XMLReader createParser() throws FOPException {
    try {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(
            "http://xml.org/sax/features/namespace-prefixes", true);
        return factory.newSAXParser().getXMLReader();



===

Now it would seem to me that a 'getter' method should not go around
creating a new object every time it is called. It just doesn't look right.

I assume that SAXParserFactory is thread-safe.
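Caching the SAXParserFactory, as Jeremias suggests in his reply, could look roughly like the sketch below. One caution: the JAXP javadoc for SAXParserFactory says an implementation is NOT guaranteed to be thread-safe, so the sketch guards the shared instance with `synchronized`. The class name `CachedParserSource` is hypothetical, and errors are left as the JAXP/SAX exception types instead of FOPException. Each call still returns a fresh XMLReader, because a reader holds parse state and should not be shared between concurrent parses.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical sketch, not FOP code: configure the factory once, hand out new readers.
class CachedParserSource {

    // Created on first use; every access goes through the class lock.
    private static SAXParserFactory factory;

    private static synchronized SAXParserFactory getFactory()
            throws ParserConfigurationException, SAXException {
        if (factory == null) {
            SAXParserFactory f = SAXParserFactory.newInstance();
            f.setNamespaceAware(true);
            f.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
            factory = f;
        }
        return factory;
    }

    /**
     * Returns a new XMLReader on every call. Only the configured factory is
     * reused, and it is used under the same lock because it may not be thread-safe.
     */
    static synchronized XMLReader createParser()
            throws ParserConfigurationException, SAXException {
        return getFactory().newSAXParser().getXMLReader();
    }
}
```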


-- 
John Austin <[EMAIL PROTECTED]>

Re: FOs and Areas

2003-12-17 Thread John Austin

On Wed, 2003-12-17 at 15:56, J.Pietschmann wrote:
> I've got a lot of ideas myself, perhaps too many. What the
> project needs is *working* *code*.

Amen!

[but a short one, not drawn out like the final chorus of Messiah!]
-- 
John Austin <[EMAIL PROTECTED]>

What should I be doing ?

2003-12-16 Thread John Austin

As I mentioned off-line to another list member, I have some
questions about the progress of the current Fop development
effort.

So far:

i) I have made a few measurements and reconfirmed some other people's
   opinions about possible areas for improvement.

ii) I have also proof-read some code from Alt-Design as preparation
    for possibly working on its integration.

iii) I have written a few Problem Reports in Bugzilla to better document
     the 'here and now' status of the HEAD development stream.

iv) I have established a statistical basis for an object discovery and
    re-use strategy.

v) I have provoked some discussion of string interning (I was aiming for
   something grander in terms of [iv] above).


Everyone is extremely polite and encouraging and I have every
confidence in the abilities of each active member of this
discussion. 

BUT:

I don't have a feeling that we are capturing any territory.

The discussions are lively and quite enlightening but they seem to
peter out or double back on themselves. I don't see any state
changes from Bugzilla indicating that anything is getting fixed
and my experience tells me that this is not healthy.


-- 
John Austin <[EMAIL PROTECTED]>

Re: (Victor et al) Re: Performance improvements.

2003-12-13 Thread John Austin

I haven't looked at the XSLT code but I have a question
in my mind that I need to answer about it.

I wonder what it is that is being generated and what were 
the design alternatives to the codegen implementation.

One question that popped into my head was:

Is there 'missing polymorphism' here ?

As I said, I only have the question at this time. 

On Sat, 2003-12-13 at 12:12, Glen Mazza wrote:
> -1.  I'd like to hold off on this, at least until I
> can gain a better understanding of the autogenerated
> code.  I may still come to the same conclusion as the other
> committers, but Finn's endorsement of the XSLT--as
> well as the long work of those like Keiron who have
> worked with the XSLT files--suggests that there are
> significant time benefits to using them.  (At work, I
> use "SQL to write SQL" all the time, and love the time
> efficiencies that result.)
> 
> If we check in the Java code, then changes may end up
> being made to those files directly, which will result
> in the XSLT files becoming unregeneratable.  Or, every
> run of the XSLT will require re-modification of the
> changes made manually to all the Java
> files--potentially dozens--100's of files.  So I'm
> kind of leery about doing this at the moment.
> 
> [Actually, I'm looking forward to studying the XSLT
> that generates these files--as I mentioned to Clay
> that CVS and Ant were two of the initial benefits you
> get by working on FOP, apparently being able to write
> Java code using XSLT is a third one...i.e., Yeehaw!,
> as I believe he had put it... ;)]
> 
> Glen
> 
> --- "J.Pietschmann" <[EMAIL PROTECTED]> wrote:
> > Finn Bock wrote:
> > > I like the generation process as it allowed me to try out and experiment
> > > with different optimizations. I don't think that I realistically could
> > > have added caching of compound properties or changed the abs2rel/rel2abs
> > > code if I had to change the Maker classes manually.
> >
> > If it's common code, that's what class hierarchies and
> > inheritance are made for.
> >
> > J.Pietschmann
> > 
> > 
> 
> 
-- 
John Austin <[EMAIL PROTECTED]>

Re: Testing for main development stream.

2003-12-07 Thread John Austin

On Sun, 2003-12-07 at 06:25, J.Pietschmann wrote:
> John Austin wrote:
> > It seems that the relative file reference ../graphics/page.gif is
> > computed by the program relative to the 'current directory' not
> > relative to the file: 'test/xml/bugtests/image.fo'.
> > 
> > I'm sure the spec has an opinion on this.
> 
> Interestingly, the XSLFO spec doesn't have an opinion on this. However,
> by using the term "URL" they probably imply the usual resolving procedure
> for URLs apply, meaning any relative URL is resolved against the base URL
> of the containing document or base document (in case the FO is generated
> by XSLT).
> This means there is a problem to correct.
> 
> J.Pietschmann

So the desired procedure is to open a report in Bugzilla?

Will do that for the three or four I found.
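J.Pietschmann's point about resolving relative URLs against the document's base URL can be illustrated with java.net.URI. This is a minimal sketch that assumes the base is simply the URI of the FO file itself. FOP may also need to honour an explicitly configured base URL, and that case is not shown. The helper name `resolveAgainstDocument` is hypothetical.

```java
import java.io.File;
import java.net.URI;

class RelativeUrlDemo {

    /**
     * Resolves a relative reference (e.g. from fo:external-graphic) against the
     * URI of the document that contains it, not the current working directory.
     */
    static URI resolveAgainstDocument(URI documentUri, String reference) {
        // URI.resolve drops the last path segment of the base ("image.fo"),
        // appends the reference and normalizes away the "..".
        return documentUri.resolve(reference);
    }

    public static void main(String[] args) {
        URI doc = new File("test/xml/bugtests/image.fo").toURI(); // absolute file: URI
        // prints a file: URI ending in test/xml/graphics/page.gif
        System.out.println(resolveAgainstDocument(doc, "../graphics/page.gif"));
    }
}
```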
-- 
John Austin <[EMAIL PROTECTED]>

Testing for main development stream.

2003-12-06 Thread John Austin

I ran a few tests of a recent copy of the 1.0dev
stream and found some errors. 

What are your preferences for problem reports at this time ?

Should I enter issues into BugZilla as I find them ?

Should I take a look at the code and notify the committer
who last worked on anything I find ?

So far:

1) ./build.sh test <-- testing fails quickly
2) ./build.sh junit <-- are there any tests ? 
3) from root directory (the one containing build.xml) I ran:

find test -name "*.fo" -print -exec ./test.sh {} \;

where test.sh contains:

#!/bin/sh

java -Xms100m -Xmx200m \
  -cp .:build/fop.jar:lib/avalon-framework-4.1.4.jar:lib/batik.jar:lib/commons-io-dev-20030703.jar \
  org.apache.fop.apps.Fop -fo "${1}" -pdf /tmp/$$.pdf

I get quite a few errors.

One example problem (or non-problem):

test/xml/bugtests/image.fo
[INFO] 1.0dev
[ERROR] Error while opening stream for (file:../graphics/page.gif):
.../graphics/page.gif (No such file or directory)
java.io.FileNotFoundException: .../graphics/page.gif (No such file or
directory)
at java.io.FileInputStream.open(Native Method)


It seems that the relative file reference ../graphics/page.gif is
computed by the program relative to the 'current directory' not
relative to the file: 'test/xml/bugtests/image.fo'.

I'm sure the spec has an opinion on this. There are other errors.
(other opinions too no doubt)

test/xml/bugtests/text-transform.fo
[INFO] 1.0dev
Invalid byte 1 of 1-byte UTF-8 sequence.
Turn on debugging for more information






-- 
John Austin <[EMAIL PROTECTED]>

Measure (accurately) before optimizing.

2003-12-03 Thread John Austin

Mea (tool) culpa!

I am investigating an inaccuracy in CPU measurements reported
by the Java Memory Profiler Tool that led me to the conclusion
that PropertyList.findProperty is the high-runner in FOP 0.20.5.

A couple of other profilers report that findProperty() uses more
CPU than we would like (10-12%) but less than JMP reports. Note
that this measurement error in JMP also affects other XML code
such as Xerces and Xalan as these are also recursive.

I reported the question to the jmp-dev list and will advise when I
get corrected results from a corrected program.

I found a good list of profilers at:
http://www.computerprograms.com/Directory/Computers/Programming/Languages/Java/Development_Tools/Performance_and_Testing/Profilers/

I have tested several profilers and JMP is the easiest to use. It
is slower than Sun's hprof but has some nice features.

1) JMP -- nice but slow and has a problem over-reporting
CPU seconds used by subordinates when those subordinates
include recursive methods (i.e. findProperty()).

2) JPerfAnal -- based on hprof but quite slow, and the GUI was
done quickly (i.e. it is a kludge). Usable, and one source of
my suspicions about JMP.

3) HPMeter -- based on hprof but it crashes on the output from
my traces. Odd because HP are usually pretty good (except for
some device drivers).

4) prophIt -- a demo delivered by Java Webstart uses hprof input
and has a really novel GUI to visualize performance.
Unfortunately the GUI isn't quite there yet. The program is
represented like a skyscraper where each floor has slabs
representing CPU used. Higher 'floors' represent subordinate
functions in the call tree.

The item I don't like is the fact that the vertical dimension
draws the eye to thin spires that are very tall. This could
make you ignore big flat slabs of CPU usage. Not all floors
should be the same height.

When they get this right it will be a category killer.

Still very useful, as it uses the same input as JPerfAnal,
HPMeter and a lot of others.

This helped me find the error in JMP because I could not
find findProperty() in the 3D graph.

5) EJP -- Extensible Java Profiler is a CS student's excellent
project. Unfortunately, it's a bit slow and requires one to
read and follow directions. This one also helped me find the
error in JMP after I read the Fine Manual.
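The over-reporting described in item 1 can be reproduced with a toy accounting model. This sketch makes no claim about how JMP works internally; that part is an assumption. It only shows that if a profiler adds every activation's inclusive time to a method's total, a recursive method such as findProperty() has its nested time counted once per level. Counting only the outermost activation avoids this. Time comes from a fake clock so the numbers are deterministic, and all names are hypothetical.

```java
// Hypothetical toy model of inclusive-time accounting for a recursive method.
class RecursiveTimingDemo {

    private long clock = 0;       // fake clock, advanced only by work()
    private int depth = 0;        // current recursion depth of the method
    long naiveInclusive = 0;      // sum of every activation's inclusive time
    long outermostInclusive = 0;  // inclusive time of outermost activations only

    private void work(long units) { clock += units; }

    /** Simulates a method that does one unit of work and recurses 'levels' deep. */
    void recurse(int levels) {
        long start = clock;
        depth++;
        work(1);
        if (levels > 1) {
            recurse(levels - 1);
        }
        depth--;
        long elapsed = clock - start;
        naiveInclusive += elapsed;           // nested time is added again at every level
        if (depth == 0) {
            outermostInclusive += elapsed;   // time actually spent inside the method
        }
    }

    public static void main(String[] args) {
        RecursiveTimingDemo d = new RecursiveTimingDemo();
        d.recurse(10); // depth 10, as in the findProperty() example on this list
        System.out.println("naive=" + d.naiveInclusive + " outermost=" + d.outermostInclusive);
        // naive=55 outermost=10
    }
}
```

With ten levels, the naive total is 1 + 2 + ... + 10 = 55 units against 10 units of real work, so the error grows with recursion depth.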

I have also decided to use the command line class for future
performance measurements.

--
John Austin <[EMAIL PROTECTED]>

Re: String.intern() test and measurement

2003-12-02 Thread John Austin

On Tue, 2003-12-02 at 16:43, Glen Mazza wrote:
> --- "J.Pietschmann" <[EMAIL PROTECTED]> wrote:
> > > But, as Glenn noticed, the attribute names can
> > also be implemented with 
> > > enumeration 
> > 
> > There are no enumerations in pre 1.5 Java. What was
> > meant was that
> > strings denoting XSLFO property enumeration tokens
> > can be interned
> > as the set is of limited and more or less fixed
> > size, 
> 
> No I was actually thinking static final variables, my
> reference to "enumerations" was in a generic sense:
> 
> public static final int PROPA = 1;
> public static final int PROPB = 2;
> public static final int PROPC = 3;
> 
> If it is only the property names we were planning on
> interning, then I thought static final variables would
> be faster/more efficient instead.  
> 
> Glen

Given that there are between 249 and 380 names and they
exist in both integer and String format, there isn't a lot
that we can recover here.
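The 'static final int' approach Glen describes is usually paired with a name table. The String is looked up once at parse time, and after that everything works with an int index into an array. This is a minimal sketch. The three property names and the class `PropIndex` are hypothetical placeholders, not the contents of alt-design's PropNames.java.

```java
import java.util.HashMap;
import java.util.Map;

class PropIndex {
    // Hypothetical subset of property constants.
    static final int FONT_SIZE = 0;
    static final int FONT_WEIGHT = 1;
    static final int TEXT_DECORATION = 2;

    // Index -> name, in the same order as the constants above.
    static final String[] NAMES = { "font-size", "font-weight", "text-decoration" };

    // Name -> index, built once from NAMES.
    private static final Map<String, Integer> INDEX = new HashMap<>();
    static {
        for (int i = 0; i < NAMES.length; i++) {
            INDEX.put(NAMES[i], i);
        }
    }

    /** Maps an attribute name from the parser to its constant, or -1 if unknown. */
    static int indexOf(String attributeName) {
        Integer i = INDEX.get(attributeName);
        return i == null ? -1 : i;
    }

    public static void main(String[] args) {
        Object[] values = new Object[NAMES.length]; // per-element property storage
        values[indexOf("font-size")] = "12pt";
        System.out.println(NAMES[FONT_SIZE] + " = " + values[FONT_SIZE]); // font-size = 12pt
    }
}
```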

If we are after better performance, we must measure to find the
'high runner' and tune from there. To 'design-in' stuff that we 
think will be fast is often unproductive. This is why I have been
using JMP to measure performance of FOP and some sample programs.

A high runner in FOP 0.20.5 is: PropertyList.findProperty().
It calls other functions in org.apache.fop.fo that consume
significant CPU resources. In one example it called itself
recursively to a depth of 10.

One of the reasons I am playing with the SAXTreeValidator program
is as a simplified test bed for the Property implementation. I want to
be able to plug in a new Property implementation and test
it independently of the rest of FOP.

-- 
John Austin <[EMAIL PROTECTED]>

Re: String.intern() test and measurement

2003-12-02 Thread John Austin

On Tue, 2003-12-02 at 14:04, J.Pietschmann wrote:
> Finn Bock wrote:
> >> new DefaultMutableTreeNode(("Attribute (name = '" +
> >>atts.getLocalName(i) + 
> >>"', value = '" +
> >>atts.getValue(i) +
> >> "')").intern() );
> 
> > Here you are also interning the attribute values, right?
> 
> Eh, no. The String "Attribute (name = ';name:', value = ';value:')"
  ^ Canadian ?
> is interned.

But I feel it's similar for the purpose of modelling the behavior in the
SAXTreeValidator.java example. The references held by the parser go away
in this model program. I ensure the strings are unique. That's where the
80M savings comes in. In FOP, many strings are created in the parser and
references to them are stored in the parsed tree. Interning these will
produce memory savings. 

> > But, as Glenn noticed, the attribute names can also be implemented with 
> > enumeration 
> 
> There are no enumerations in pre 1.5 Java. What was meant was that
> strings denoting XSLFO property enumeration tokens can be interned
> as the set is of limited and more or less fixed size, while it is
> probably not prudent to intern the complete XML attribute value
> strings.
> For example:
>text-decoration="underline overline"
> (Yes, that's valid, provided "overline" is valid).
> The possibly interned strings are "text-decoration" (property name),
> "underline" and "overline" (enumeration tokens).
> Somewhere else, the user might have put
>text-decoration="overline underline"
> Granted, given that FO source ought to be XSLT or otherwise generated,
> this isn't very likely, but still.

This is why I wrote a perl program to count from a real-world case
(defguide). I haven't concerned myself with how an interned "overline
underline" string is used, just ensured it is stored that way once.

The attribute name space is defined in the XSL-FO spec. So the number
of names is strictly limited. Peter lists 380 or so in PropNames.java.

> Another issue is how the values are stored. For example, there are only
> 8 distinct TextDecoration objects. In 0.20.5, basically every Text node
> gets its own instance.
> The weird thing is, when I hacked the PropertyManager to look up the
> actual text decoration in an array of preinstantiated objects, FOP
> run slower and took more memory. Apparently something went wrong. If
> anybody is up there to get it right, please do.

I tried testing 0.20.5 doing things that worked in SAXTreeValidator
and haven't had instant success. The benefits would disappear if the
references passed out of the parser are still held elsewhere in FOP.
Give it some time.

> Same for some other objects, BorderAndPadding and especially FontInfo
> come to mind, although there is more variation.
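The preinstantiated-objects idea is the flyweight pattern. This is a minimal sketch that assumes a text decoration is fully described by three flags (underline, overline, line-through), which gives 2^3 = 8 combinations. The class below is hypothetical and is not FOP 0.20.5's TextDecoration. Because the lookup is keyed by flags and not by the attribute string, "underline overline" and "overline underline" map to the same shared object.

```java
// Hypothetical flyweight: all 8 possible decorations are created up front.
final class Decoration {
    private static final int UNDERLINE = 1, OVERLINE = 2, LINE_THROUGH = 4;
    private static final Decoration[] ALL = new Decoration[8];
    static {
        for (int bits = 0; bits < ALL.length; bits++) {
            ALL[bits] = new Decoration(bits);
        }
    }

    private final int bits;

    private Decoration(int bits) { this.bits = bits; }

    /** Parses a space-separated text-decoration value into a shared instance. */
    static Decoration of(String value) {
        int bits = 0;
        for (String token : value.trim().split("\\s+")) {
            if (token.equals("underline")) bits |= UNDERLINE;
            else if (token.equals("overline")) bits |= OVERLINE;
            else if (token.equals("line-through")) bits |= LINE_THROUGH;
            // other tokens ("none", "blink", "no-underline", ...) are ignored in this sketch
        }
        return ALL[bits];
    }

    boolean hasUnderline()   { return (bits & UNDERLINE) != 0; }
    boolean hasOverline()    { return (bits & OVERLINE) != 0; }
    boolean hasLineThrough() { return (bits & LINE_THROUGH) != 0; }
}
```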

-- 
John Austin <[EMAIL PROTECTED]>

Re: String.intern() test and measurement

2003-12-02 Thread John Austin

On Tue, 2003-12-02 at 12:59, Finn Bock wrote:
> I'm resending this mail since it hasn't yet shown up in the archives. 
> I'm sorry about any duplicates.
> 
> [John Austin]
> 
> > 4) Changed the handling of strings at the for-loop storing the
> >attributes received from the parser in startElement( ... )
> > 
> > // Process attributes
> > for (int i=0; i < atts.getLength(); i++) {
> > DefaultMutableTreeNode attribute =
> > new DefaultMutableTreeNode(("Attribute (name = '" +
> >atts.getLocalName(i) + 
> >"', value = '" +
> >atts.getValue(i) +
> > "')").intern() );
> > 
> > So I intern these strings rather than storing new strings.
> 
> Here you are also interning the attribute values, right?

Yes.

> Interning is best used with discretion. The real problem, which isn't 
> really spelled out in the gotchas.html page, is that the interning 
> algorithm is completely undefined by the java spec.
> 
> F.ex, in jdk1.4 the intern table (in symbolTable.[hpp,cpp]) is a fixed 
> size hashing table (size of 20011) with chaining buckets. So when the 
> total number of intern'ed string grows beyond that number, the interning 
> process becomes linear in time. Try the attached test program to see the 
> effect.

Gasp! SUN implemented something that might not scale up?

Note that in my example of last night, SAXTreeValidator.java
processed the 'defguide' file, which has 117 unique names and
13,520 unique values. The memory saved by interning was impressive (184M
down to 87M = 97M reduction).

[My results may be artificially good as I have interned character
strings as well. I expect that the nature of DEFGUIDE includes many
repeated character strings. I shall re-run the benchmark: 

Hmm .. the interning version of the program now uses 104M, a reduction
of 80M rather than the previous reduction of 97M. The number of interned
strings must have been well over the hash table size, but the difference
in CPU usage to parse that file was less than ten seconds (more for
interned character strings).]

When I checked your program more closely, I wondered how much chaining
there has to be before performance gets really bad. Your example
program would produce external chains of length
2,000,000/20,011 ~ 100. Because the table keys are constructed, I
expect your access times are uniformly distributed and average
access times reflect that.
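Here is the arithmetic behind that estimate. It uses the textbook result for hash tables with separate chaining under uniform hashing: with N keys in B buckets, the load factor is α = N/B, an unsuccessful search scans about α entries, and a successful one scans about 1 + α/2. Taking N = 2,000,000 strings from the demonstration program and B = 20,011 buckets for the jdk1.4 intern table as Finn describes it:

$$
\alpha = \frac{N}{B} = \frac{2\,000\,000}{20\,011} \approx 99.9,
\qquad C_{\text{hit}} \approx 1 + \frac{\alpha}{2} \approx 51,
\qquad C_{\text{miss}} \approx \alpha \approx 100
$$

So each intern() of a string that is already in the table compares against roughly 50 entries, and each new string first costs a scan of about 100.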

Your demonstration program artificially employs 2 million strings
which is not a behavior we would expect for FOP. The number of 
attribute names is limited (by the XSL-FO Spec) and the number of
distinct values is limited by some probability distributions that
are definitely not Uniform. Takes a LOT of ransom-note typography 
to make that many unique property values.

> I also think that two different aspects of interning is being mixed 
> around in your measurements here.
> 
> 1. The identity sharing.
> 2. The memory sharing.

I have not changed the programs beyond calling intern() on strings
passed to some constructors. So no identity sharing is present.

> The identity sharing can quite possible give a performance boost to the 
> lookup of the attribute names.

I prefer to optimize from measurements. The FOP high runner is property
lookup. I didn't see a lot of time in String.equals(), I may look again.

Peter's alt-design already provides this functionality. I am chasing
parallel memory effects.

> The memory sharing can quite possible give a memory boost for duplicated 
> attribute values and a performance boost during garbage collection.

I agree, esp in light of the counts I made using a Perl program and the
large file defguide.fo.

> Ad.1: If one decide that all attribute names must be interned before 
> insertion and also before lookup of attributes, a special hashtable 
> (identity-hashtable) can be coded that is significantly faster than a 
> normal value-based hashtable. As an example of the performance boost, 
> the hash calculation can be done as:

I would use that if I felt that intern() wasn't therapeutic. So far, I
just intern the name and value for each attribute passed to the SAX
parser callback startElement(). This stores the interned string in the
parsed tree and lets the parser discard its own copies of the strings whenever it likes.

> int index = (System.identityHashCode(key) & 0x7fff) % maxindex;
> 
> In addition attribute names can be compared with '==' instead of equals.
> The downside is that the lookup can only be done with intern'ed keys.
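Java already ships an identity-based table, java.util.IdentityHashMap, which compares keys with '==' and hashes them with System.identityHashCode. That means Finn's suggestion can be tried without hand-coding the hash function. Below is a minimal sketch (the class name and sample property are hypothetical). It also shows the downside he mentions: a key that is equal but not interned is not found.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityLookupDemo {

    // Keys are compared with ==, so every key must be interned first.
    private final Map<String, String> props = new IdentityHashMap<>();

    void put(String name, String value) {
        props.put(name.intern(), value);
    }

    /** Fast path: the caller promises that 'internedName' is already interned. */
    String getInterned(String internedName) {
        return props.get(internedName);
    }

    public static void main(String[] args) {
        IdentityLookupDemo d = new IdentityLookupDemo();
        d.put(new String("font-size"), "12pt");
        System.out.println(d.getInterned("font-size"));             // 12pt (literals are interned)
        System.out.println(d.getInterned(new String("font-size"))); // null (equal but not identical)
    }
}
```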

But these values would be encapsulated in a class. Good reason for
not breaking encapsulat

String.intern() test and measurement

2003-12-01 Thread John Austin

I decided to find a demonstration program that works
similar enough to FOP that I could try the String.intern()
technique.

1) SAXTreeValidator.java from Chapter 3 of Brett McLaughlin's
   "Java and XML"; the online copy of the example is from the 2nd Ed.

2) Fed this program various fragments of .fo files I have
   accumulated lately.

3) There was one line I had to change in the program

   Line 355: if (attPrefix == null || attPrefix.equals("")) {
   I had to add the test for null because the program threw an NPE.

4) Changed the handling of strings at the for-loop storing the
   attributes received from the parser in startElement( ... )

// Process attributes
for (int i=0; i

Re: String.intern() thoughts and more stats

2003-12-01 Thread John Austin

On Mon, 2003-12-01 at 02:11, Glen Mazza wrote:
> --- John Austin <[EMAIL PROTECTED]> wrote:
> > I mentioned yesterday that I thought I had read a
> > comment
> > by Bruce Eckel suggesting that String.intern() might
> > be 
> > avoided. 
> > 
> > I could not find the reference in either the 2nd or
> > 3rd editions
> > of "Thinking In Java".
> > 
> 
> No need-the "gotcha" site you gave earlier did give
> some specific drawbacks under string compares:  
> 
> http://mindprod.com/jgloss/gotchas.html#COMPARISON
> 
> BTW, The third drawback listed in the link above gave
> "weak references" as an alternative implementation--I'm
> unsure what that construct is about--is this the
> vtable you were speaking of in an earlier message?
> 
> Also, another question for my comprehension here--the
> "canonical mappings" you have been referring to in
> this thread--is this the same thing as the property
> enumerations that alt-design uses?  I'm unsure of the
> difference between the two.

I started using the term here after re-reading parts of Eckel's
Thinking in Java. I think the canonical mapping (CM) I refer to and the
alt-design implementation are almost the same thing.

From the on-line version of TIJ (3rd ed.), the following excerpt:

TIJ313.htm:
Weak references are for implementing canonicalizing mappings where
instances of objects can be simultaneously used in multiple places in a
program, to save storage - that do not prevent their keys (or values)
from being reclaimed. 

Don't be misled by the red herring of 'weak references'. I am
arguing for the "cache of unique objects", not for this GC technique.

After I started using a large .FO file to provide statistics, I
realized that we can use the same technique for larger non-string
objects. 

To this end, I have some more statistics.

In the sample FO file (DocBook: The Definitive Guide), there are
285,223 tags but there are only 18,419 unique property lists.
(There may be fewer, my perl stats program treats different
orderings of the same attributes as different lists)

The program prints out property lists which occur more than 100 times. I
prefix each list with tag names to distinguish empty lists by tag type.
That increases the number of lists by only 15 or so.
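The same cache-of-unique-objects idea extends to whole property lists. This is a minimal sketch that assumes a property list can be modelled as an immutable Map from name to value. The class `PropertyListPool` is hypothetical, not FOP or alt-design code. java.util.Map defines equals() and hashCode() over the set of entries, regardless of order, so two lists that differ only in attribute order share one canonical instance. The Perl counts do not make that distinction.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class PropertyListPool {

    private final Map<Map<String, String>, Map<String, String>> pool = new HashMap<>();

    /** Returns the shared, read-only property list equal to 'props'. */
    Map<String, String> canonical(Map<String, String> props) {
        Map<String, String> existing = pool.get(props);
        if (existing != null) {
            return existing;
        }
        // Copy first, so later changes by the caller cannot corrupt a pool key.
        Map<String, String> copy = Collections.unmodifiableMap(new HashMap<>(props));
        pool.put(copy, copy);
        return copy;
    }

    int size() { return pool.size(); }

    public static void main(String[] args) {
        Map<String, String> a = new LinkedHashMap<>();
        a.put("font-size", "17.28pt");
        a.put("keep-with-next.within-column", "always");

        Map<String, String> b = new LinkedHashMap<>(); // same entries, other order
        b.put("keep-with-next.within-column", "always");
        b.put("font-size", "17.28pt");

        PropertyListPool p = new PropertyListPool();
        System.out.println(p.canonical(a) == p.canonical(b)); // true
        System.out.println(p.size());                          // 1
    }
}
```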



Number of Elements by tree level:
level=1 count=1
level=2 count=473
level=3 count=5242
level=4 count=5480
level=5 count=7129
level=6 count=26231
level=7 count=22475
level=8 count=36447
level=9 count=62288
level=10 count=38536
level=11 count=30486
level=12 count=23641
level=13 count=23190
level=14 count=2023
level=15 count=771
level=16 count=701
level=17 count=109


Element frequencies:
a 24
fo:basic-link 5225
fo:block 112142
fo:conditional-page-master-reference 48
fo:external-graphic 1097
fo:flow 472
fo:footnote 22
fo:footnote-body 22
fo:inline 62792
fo:layout-master-set 1
fo:leader 1764
fo:list-block 279
fo:list-item 1004
fo:list-item-body 1004
fo:list-item-label 1004
fo:marker 5335
fo:page-number 1872
fo:page-number-citation 3224
fo:page-sequence 472
fo:page-sequence-master 12
fo:region-after 38
fo:region-before 38
fo:region-body 38
fo:repeatable-page-master-alternatives 12
fo:root 1
fo:simple-page-master 38
fo:static-content 4720
fo:table 6497
fo:table-body 6497
fo:table-cell 33174
fo:table-column 19225
fo:table-footer 1
fo:table-header 29
fo:table-row 15301
fo:wrapper 1799


Property List frequencies:
  395  fo:basic-link internal-destination=common.attributes,
66878  fo:block
 1292  fo:block end-indent=24pt,text-align-last=justify,last-line-end-indent=-24pt,
 2119  fo:block font-family=monospace,space-after.optimum=1em,white-space-collapse=false,text-align=start,space-before.maximum=1.2em,space-before.optimum=1em,wrap-option=no-wrap,space-before.minimum=0.8em,space-after.maximum=1.2em,linefeed-treatment=preserve,space-after.minimum=0.8em,
 5082  fo:block font-family=sans-serif,Symbol,ZapfDingbats,keep-together=always,
  236  fo:block font-family=sans-serif,Symbol,ZapfDingbats,margin-left=-4pc,keep-together=always,
  439  fo:block font-family=sans-serif,space-after.optimum=0.5em,hyphenate=false,font-weight=bold,font-size=18pt,space-after.maximum=0.6em,space-after.minimum=0.4em,keep-with-next.within-column=always,space-after=1em,
 5321  fo:block font-family=sans-serif,space-before.maximum=1.2em,font-weight=bold,space-before.optimum=1.0em,space-before.minimum=0.8em,keep-with-next.within-column=always,
 3768  fo:block font-family=serif,Symbol,ZapfDingbats,margin-left=-4pc,
 3533  fo:block font-size=17.28pt,
 1722  fo:block font-size=20.7359997pt,
  104  fo:block font-weight=bold,
 5332  fo:block keep-with-next.within-column=always,
  439  fo:block space-after=1em,
 6037  fo:block space-before.maximum=1.2em,space-before.optimum=1em,space-before.minimum=0.8em,
 2558  fo:block span=none,
  191  fo:bl

Re: String.intern() thoughts

2003-12-01 Thread John Austin

On Mon, 2003-12-01 at 02:11, Glen Mazza wrote:
> --- John Austin <[EMAIL PROTECTED]> wrote:
> > I mentioned yesterday that I thought I had read a

> BTW, The third drawback listed in the link above gave
> "weak references" as an alternative implementation--I'm
> unsure what that construct is about--is this the
> vtable you were speaking of in an earlier message?

The vtable is used to dispatch virtual functions in (some
implementations of) C++. There is one such table for each class,
and it contains a pointer to each virtual function defined. Each
object holds a pointer to its actual class's vtable. This is the
mechanism used to implement polymorphism for C++ virtual functions.

I think a similar means can be used for the inheritance of
properties in FOP.

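#include <iostream>
using namespace std;

class a {
public:
  virtual ~a() {}
  virtual int f() { return 3; }
};

class b : public a {
public:
  virtual int f() { return 33; }
};

int main() {
  a* z = new b;
  cout << "a" << z->f() << endl;  // prints "a33"
  delete z;
  return 0;
}

class a's vTable contains the address of a::f();
class b's vTable contains the address of b::f().

Instance z includes a pointer to the class b vTable,
so function b::f() is called even though z is a pointer to an
object of type a.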
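In Java, a comparable arrangement for properties might look like the sketch below. All elements of one kind share a single read-only table of values (the 'vtable'), and each instance holds a reference to that table plus only the values it overrides. This is a hypothetical illustration, not FOP or alt-design code. The class names and sample values are made up, and it does not model inheritance from ancestor elements, which XSL-FO also requires.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SharedTableDemo {

    /** Table shared by every element of one kind, like a per-class vtable. */
    static final class KindTable {
        final Map<String, String> values;

        KindTable(Map<String, String> values) {
            this.values = Collections.unmodifiableMap(new HashMap<>(values));
        }
    }

    /** One element: a pointer to its kind's table plus its own overrides. */
    static final class Element {
        private final KindTable table;                  // like the vtable pointer
        private final Map<String, String> overrides = new HashMap<>();

        Element(KindTable table) { this.table = table; }

        void set(String name, String value) { overrides.put(name, value); }

        String get(String name) {
            String v = overrides.get(name);               // the element's own value first
            return v != null ? v : table.values.get(name); // then the shared table
        }
    }

    public static void main(String[] args) {
        Map<String, String> blockDefaults = new HashMap<>();
        blockDefaults.put("font-family", "serif");
        blockDefaults.put("font-size", "10pt");
        KindTable blockTable = new KindTable(blockDefaults);

        Element plain = new Element(blockTable);
        Element title = new Element(blockTable);
        title.set("font-size", "17.28pt");

        System.out.println(plain.get("font-size") + " " + title.get("font-size")); // 10pt 17.28pt
        System.out.println(title.get("font-family"));                               // serif
    }
}
```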

-- 
John Austin <[EMAIL PROTECTED]>

Re: Properties Implementation and Canonical Mappings

2003-12-01 Thread John Austin

On Mon, 2003-12-01 at 02:45, Glen Mazza wrote:
> --- John Austin <[EMAIL PROTECTED]> wrote:
> > 
> > The property strings are given to the Property
> > object
> > constructor by some path beginning with a SAX
> > parser.
> > It is reasonable to assume that the SAX parser loses
> > refs to most of these strings and that the Property
> > implementation retains the only references to these 
> > String objects.
> > 
> > How big are String Objects ? 
> > At least 16 bytes plus storage for characters. 
> > 
> > What does this save us ? 
> > Probably only about 1,600,000 bytes for this file. 
> > CPU cost of creating strings is probably similar to 
> > cost of checking string table for a copy.
> > 
> 
> Just to clarify, the (additional?) "CPU cost" you
> mentioning above is *not* occurring for the present
> process, correct?  I think you're referring to the
> cost that would be added as a result of the changes
> you're recommending (because there now will be a
> string table search to avoid duplication).

Going back to the beginning of my involvement, I found this
issue because Property searches are the high-runner for CPU
in FOP. I don't want to split hairs in isolation over which
search/constructor sequence is faster. I want to remove the
conditions that cause the current pathology.

Hash table lookups are FAST. When we invest in object creation
we recover many times over in the end. 

> Also, the "string table" you mention--I think you're
> speaking generically, but is there a specific, already
> available construct in Java that we can use for this
> purpose in FOP?  I'd like to find out what you have in
> mind for a specific implementation.

HashMap works fine the way Peter has it set up in alt-design.
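A HashMap-based string table of this kind can be very small. This is a minimal sketch that assumes one table lives for the duration of one document. The class name `StringPool` is hypothetical, and this is not Peter's alt-design code. Unlike String.intern(), the table is an ordinary Java object: HashMap resizes as it grows, and the whole table can be garbage-collected along with the document.

```java
import java.util.HashMap;
import java.util.Map;

class StringPool {
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the first-seen instance equal to 's', storing 's' if it is new. */
    String canonical(String s) {
        String existing = pool.get(s);
        if (existing == null) {
            pool.put(s, s);
            return s;
        }
        return existing;
    }

    int size() { return pool.size(); }

    public static void main(String[] args) {
        StringPool p = new StringPool();
        String a = p.canonical(new String("keep-together"));
        String b = p.canonical(new String("keep-together"));
        System.out.println((a == b) + " " + p.size()); // true 1
    }
}
```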

I use the same construct in the Perl code I use to analyze the
large sample FO files.

-- 
John Austin <[EMAIL PROTECTED]>

String.intern() thoughts

2003-11-30 Thread John Austin

I mentioned yesterday that I thought I had read a comment
by Bruce Eckel suggesting that String.intern() might be 
avoided. 

I could not find the reference in either the 2nd or 3rd editions
of "Thinking In Java".

A couple of observations from some research:

1) There were some problems in Java 1.1 and before.
2) There may be problems in non-Sun implementations (KAffe...)

3) There have been discussions in the SAX2 list and other 
   places about using String.intern() and I notice that
   interning is a feature of SAX2 that can be turned on.
   There is a lot of support for the technique and I suspect
   some of the objections are of the theological type.

4) The property strings in Peter West's code start life as
   string literals, which are interned as required by the Java Language
   Spec. So they are already present in the table.

Some of the benefit of interning can be turned on in the parser. 
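For reference, SAX2 defines the feature http://xml.org/sax/features/string-interning. When it is true, the parser interns element names, prefixes, attribute names, namespace URIs and local names with String.intern(). Attribute values and character data are not covered, so those would still need a table of our own. Parsers may refuse to change the feature, so this minimal sketch (the class name is hypothetical) only asks the parser whether it is on.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

class InterningCheck {
    static final String STRING_INTERNING = "http://xml.org/sax/features/string-interning";

    /** True if the reader reports that it interns names; false if not, or if it can't say. */
    static boolean reportsInterning(XMLReader reader) {
        try {
            return reader.getFeature(STRING_INTERNING);
        } catch (SAXException e) {
            // SAXNotRecognizedException / SAXNotSupportedException: treat as "no"
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        System.out.println("string-interning: " + reportsInterning(reader));
    }
}
```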

-- 
John Austin <[EMAIL PROTECTED]>

Re: Properties Implementation and Canonical Mappings

2003-11-30 Thread John Austin

Input: The XSL-FO file produced from:
"DocBook: The Definitive Guide "

Document size:   648 Pages  // for the O'Reilly edition
FO file size:   21,659,370 bytes
Properties: 526,648
Tags:   285,223
Height of tree: 17   // max height of the parse tree
Unique prop names:  117  // bounded by the spec
Unique prop values: 13,520   // bounded by the real world

Using these numbers, we can explore the sort of benefits to expect
from a revised Property implementation. With over a million strings
(a name and a value for each of the 526,648 properties), the FOTree
for this document would use forty or fifty Mb in addition to the
data structures.

This document can be used as an example even though it probably
can't be formatted (yet) by FOP. It has a lot of tables. It could 
be a goal of the FOP project to generate this well-known document.

I was thinking of using the XSL-FO spec from the W3C web site but
couldn't find the stylesheet to make the FO file. If anyone knows
where to find them, please let me know.

Statistics from this file:

Number of Elements by tree level:
level=1 count=1
level=2 count=473
level=3 count=5242
level=4 count=5480
level=5 count=7129
level=6 count=26231
level=7 count=22475
level=8 count=36447
level=9 count=62288
level=10 count=38536
level=11 count=30486
level=12 count=23641
level=13 count=23190
level=14 count=2023
level=15 count=771
level=16 count=701
level=17 count=109

Element frequencies:
a 24   <-- I wonder where this came from
fo:basic-link 5225
fo:block 112142
fo:conditional-page-master-reference 48
fo:external-graphic 1097
fo:flow 472
fo:footnote 22
fo:footnote-body 22
fo:inline 62792
fo:layout-master-set 1
fo:leader 1764
fo:list-block 279
fo:list-item 1004
fo:list-item-body 1004
fo:list-item-label 1004
fo:marker 5335
fo:page-number 1872
fo:page-number-citation 3224
fo:page-sequence 472
fo:page-sequence-master 12
fo:region-after 38
fo:region-before 38
fo:region-body 38
fo:repeatable-page-master-alternatives 12
fo:root 1
fo:simple-page-master 38
fo:static-content 4720
fo:table 6497
fo:table-body 6497
fo:table-cell 33174
fo:table-column 19225
fo:table-footer 1
fo:table-header 29
fo:table-row 15301
fo:wrapper 1799

Properties: 526648
Tags: 285223
num_keys: 117
num_vals: 13520


-- 

John Austin <[EMAIL PROTECTED]>

Re: Properties Implementation and Canonical Mappings

2003-11-29 Thread John Austin

On Sat, 2003-11-29 at 16:35, J.Pietschmann wrote:
> Darn, recall the last post.
> 
> John Austin wrote:
> > Note that storing the property name and value refs supplied
> > to the Property constructor will use 45,620 strings. If the
> > Property implementation employs canonical mapping to ensure
> > that only one copy of each unique string is stored, then just
> > over 2,300 strings are required. 
> 
> Have a look at String.intern()

Bruce Eckel said not to trust it for some reason. I have 2nd Ed
of "Thinking in Java" and the online one is 3rd Ed so I haven't
found chapter and verse for this yet. 

The only 'bad thing' said about it that I could find quickly was:

http://mindprod.com/jgloss/gotchas.html

The other good thing we can do with canonical strings is compare
these string refs with == rather than equals().
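Here is a minimal sketch of the String.intern() suggestion. It assumes property names reach FOP as freshly allocated strings; the helper fromParser is hypothetical and only imitates that. Interning two equal strings yields the same object, which is what makes the == comparison safe:

```java
// Sketch: canonicalize strings with String.intern() so that later
// comparisons can use reference identity (==) instead of equals().
class InternDemo {

    // Build a fresh String object, the way a parser hands over attribute text.
    static String fromParser(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String a = fromParser("font-size");
        String b = fromParser("font-size");

        System.out.println(a.equals(b));        // true: same characters
        System.out.println(a == b);             // false: two distinct objects

        String ia = a.intern();
        String ib = b.intern();
        System.out.println(ia == ib);           // true: one canonical copy
        System.out.println(ia == "font-size");  // true: literals are interned too
    }
}
```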

> J.Pietschmann
-- 
John Austin <[EMAIL PROTECTED]>

Properties Implementation and Canonical Mappings

2003-11-29 Thread John Austin

In the interest of contributing to the proposed implementation
(instead of just trashing it), I wrote a simple Perl script to
get some counts out of a real-world XSL-FO file.

Input: The XSL-FO file produced from a DocBook file
I have left from a dormant project. The perl program 
counts the number of properties in the source file.

PDF size:   130 Pages  // some users have a lot more
FO file size:   1.2M bytes
Properties: 22,815
Unique prop names:  89  // bounded by the spec
Unique prop values: 2,227   // bounded by the real world

Note that storing the property name and value refs supplied
to the Property constructor will use 45,620 strings. If the
Property implementation employs canonical mapping to ensure
that only one copy of each unique string is stored, then just
over 2,300 strings are required. 
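A canonical mapping does not have to go through String.intern(). The sketch below keeps a private table instead and assumes Java 8 or later. StringCanonicalizer is a hypothetical name, not an existing FOP class. Because the table belongs to one object (for example, one per document), its entries become collectable when that object is dropped:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: keeps exactly one copy of each distinct string
// handed to it, so N references collapse to the number of unique strings.
class StringCanonicalizer {
    private final Map<String, String> table = new HashMap<>();

    // Return the stored copy if one exists; otherwise store s and return it.
    String canonical(String s) {
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Number of distinct strings retained so far.
    int uniqueCount() {
        return table.size();
    }
}
```

Feeding it every name and value reference from a document should leave about as many entries as there are unique strings. For the sample file, that is just over 2,300.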

The property strings are given to the Property object
constructor by some path beginning with a SAX parser.
It is reasonable to assume that the SAX parser loses
refs to most of these strings and that the Property
implementation retains the only references to these 
String objects.

How big are String Objects ? 
At least 16 bytes plus storage for characters. 

What does this save us ? 
Probably only about 1,600,000 bytes for this file.
The CPU cost of creating strings is probably similar to the
cost of checking the string table for an existing copy.

What does it buy for us ?
Bounds a source of current Order(n) memory growth. 
It gets us in the habit of using another good technique.

I am already thinking along the lines of:
The property lists for these FOs are usually generated by
programs and will be repeated many times. Perhaps we
could use larger, faster working property lists consolidated with
canonical mappings to save both time and space.
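One reading of "consolidated property lists" can be sketched as follows. It assumes an FO's specified properties can be held as an immutable name-to-value map. Identical maps are pooled, so repeated attribute sets emitted by XSLT share a single object. PropertyListPool is hypothetical, and the sketch requires Java 8 for Map.putIfAbsent:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical flyweight pool: identical property lists share one immutable map.
class PropertyListPool {
    private final Map<Map<String, String>, Map<String, String>> pool = new HashMap<>();

    Map<String, String> share(Map<String, String> props) {
        // Copy so later changes to the caller's map cannot corrupt the pool;
        // Map.equals/hashCode compare contents, so equal lists collide.
        Map<String, String> frozen = Collections.unmodifiableMap(new HashMap<>(props));
        Map<String, String> existing = pool.putIfAbsent(frozen, frozen);
        return existing != null ? existing : frozen;
    }

    int distinctLists() {
        return pool.size();
    }
}
```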

I am thinking again along the lines of handling properties more
like a C++ virtual function table (vtable). This object is larger
than Peter's ordered Property array, but would be faster.
That is one reason C++ has fast virtual function dispatch.
-- 
John Austin <[EMAIL PROTECTED]>

RE: [VOTE] Properties API

2003-11-27 Thread John Austin

On Thu, 2003-11-27 at 14:57, Victor Mote wrote:
> John Austin wrote:
> 
> > I am critical

> Now, if you can figure out how to digest an FO document without building a
> tree that represents a page-sequence object, I hope you'll share it with the
> rest of us. That could be a breakthrough indeed.

I am just thinking of ensuring that objects disappear after the
page they are on has been printed. At the point that 0.20.5
prints:
[INFO] [1]
the related objects from page 1 should ... join the choir invisible
...

They don't appear to, which is why the memory use of FOP increases
in proportion to document length.

You only need to retain the useful parts of the page-sequence object.
Stuff that has been 'printed' isn't useful.
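A hypothetical sketch makes the idea concrete. PageQueue and Page are illustrative stand-ins, not FOP classes. Laid-out pages wait in a queue, and once a page is rendered the queue keeps only its number. What stays reachable is proportional to the pages not yet printed rather than to the document length:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: only unrendered pages stay reachable.
class PageQueue {
    // Stand-in for a laid-out page and the FO/area objects hanging off it.
    static final class Page {
        final int number;
        final byte[] payload = new byte[1024];  // simulated per-page memory
        Page(int number) { this.number = number; }
    }

    private final Deque<Page> pending = new ArrayDeque<>();
    private final List<Integer> rendered = new ArrayList<>();

    // Called when layout of a page is complete.
    void laidOut(Page p) {
        pending.addLast(p);
    }

    // "Print" every finished page, then forget it; only its number is kept.
    void flush() {
        Page p;
        while ((p = pending.pollFirst()) != null) {
            rendered.add(p.number);
        }
    }

    int retainedPages() { return pending.size(); }

    List<Integer> renderedPages() { return rendered; }
}
```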

> Victor Mote
-- 
John Austin <[EMAIL PROTECTED]>

RE: [VOTE] Properties API

2003-11-27 Thread John Austin

On Thu, 2003-11-27 at 13:58, Victor Mote wrote:
...
> Again, this is an implementation detail, and doesn't affect the interface.
> However, on the implementation side, it seems that the tradeoff will be
> between doing a full parse each time, or creating lots of objects. John
> Austin's inquiry about the huge number of objects created is what got me
> started down this line of thinking.

I am critical of what I perceive to be a pathological growth of objects
(and search times). If those problems are corrected, there are plenty of
resources left to do a few extra parses.

How often will you encounter expressions this complex ? Rarely.

If they become common (and someone will do that!), we can call THAT a
pathological development and blame the victim.

>  I suppose that the best way would be to
> have your cake and eat it too -- store integers where possible, and create
> objects where not possible, and teach everything how to tell the difference.
> (Here is a half-baked idea that I don't want to even think about pursuing
> for a while -- PropertyStrategy. With the API I have proposed, one could
> conceivably store the Properties one of several ways, and have the user
> select which one they want based on performance needs).

As Peter knows, I have been reading the code. I shall attempt the
XSL-FO Spec soon. I understand the spec defines the behavior of
the program in terms of fully parsed/expanded trees. This
implies that objects must exist even if they will never be used
after the parser moves past their end-points. Optimization anyone?

What I infer about the tree structures from your discussion and Peter's
code suggests to me that FOP creates a DOM-ish view of the document
in one or more trees. This is a mismatch with the SAX parser that
is in there somewhere.

And just to say something completely ludicrous, because someone
will take it seriously ...

You could convert those expressions to a Java class, compile, load
and invoke it with Reflection ... 

-- 
John Austin <[EMAIL PROTECTED]>

Re: [VOTE] Properties API

2003-11-26 Thread John Austin

On Wed, 2003-11-26 at 14:45, Glen Mazza wrote:
> --- "Peter B. West" <[EMAIL PROTECTED]> wrote:
> > The set of property values relevant to a
> > particular FO are 
> > available in a sparse array, accessible by the int
> > index corresponding 
> > to the Property.  
> 
> Which source file has the enumerations of the
> properties--I'd like to see how you listed them.  Are
> you satisfied with those enumerations--anything you
> would change if you had to do it over?

org.apache.fop.fo.PropNames.java has the property strings
and assigned numbers. He even describes a Perl program (and how
to execute it in emacs) to regenerate the numbers.

A similar file is org.apache.fop.fo.FObjectNames.
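For readers who have not opened PropNames, the general technique is to assign a fixed int to each property name and keep a table for translating in both directions. The miniature below is hypothetical and does not reproduce the actual contents or numbering of org.apache.fop.fo.PropNames:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical miniature of a property-name table: each name gets a fixed
// int index, so code after parsing can work with ints instead of Strings.
final class MiniPropNames {
    static final int NO_PROPERTY = 0;
    static final int FONT_FAMILY = 1;
    static final int FONT_SIZE = 2;
    static final int MAX_WIDTH = 3;

    // Index -> name; slot 0 is a placeholder for "unknown".
    private static final String[] NAMES = {
        "no-property", "font-family", "font-size", "max-width"
    };

    // Name -> index, built once from NAMES.
    private static final Map<String, Integer> INDEX = new HashMap<>();
    static {
        for (int i = 1; i < NAMES.length; i++) {
            INDEX.put(NAMES[i], i);
        }
    }

    static String name(int index) {
        return NAMES[index];
    }

    // Translate an attribute name once, at parse time; unknown -> NO_PROPERTY.
    static int index(String name) {
        Integer i = INDEX.get(name);
        return i == null ? NO_PROPERTY : i;
    }

    private MiniPropNames() {}
}
```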

> It may be good to create a sample FO document that
> would exhibit what you're saying above.  Hopefully
> something that shows a important feature that would
> clearly fail if we don't take into account the Area
> Tree while resolving properties.  That would help
> clarify things, and we can use it for testing.

And there are reasons to create a set of XSL-FO documents
providing test cases. I am concerned that some of Peter's
NameSpace code hasn't been tested (or is just hard to grok).

-- 
John Austin <[EMAIL PROTECTED]>

Property classes and eventually, new Property handling.

2003-11-25 Thread John Austin

After my last post I went away to play in the code for a while.

Mostly to see what is necessary to isolate a minimal set of
classes related to Property handling.

What I found is:

1) Property is ubiquitous: every client class knows what package
   it lives in. As a planning point, I had better think about keeping
   a Property class (or prepare to make a lot more changes and
   lose a few friends).

2) PropertyList and PropertyListBuilder are used in fewer places,
   but they are used a lot. PropertyList is referenced 129 times
   and PropertyListBuilder only 8 times. Most of these references
   are in the Property class. If they can be hidden, the problem
   is bounded by the Property class.

One way to discover the scope of an API is to rename a class or a
package. Doing so breaks all of the compile units that depend on
the renamed class(es). Restoring the missing interfaces restores
the system if the restoration obeys the previous class contracts.

As I suspected, automated code generation for properties is
do-able. There are more than 17,000 lines in files generated
through Ant target: 'codegen'. Many of these are clients of the
Property class. Changes here can be localized to the XSL files
that generate the code.

-- 
John Austin <[EMAIL PROTECTED]>

RE: [VOTE] Properties API

2003-11-25 Thread John Austin

Victor, I was mostly backing away from my earlier posting which was 
off-target.

On Tue, 2003-11-25 at 13:26, Victor Mote wrote:
> John Austin wrote:
> 
> > After thinking about the proposal, I'm not sure it solves anything.

> you might make to the implementation would require (I think) changes to the
> LayoutStrategys and to the Renderers. Also, as Glen has pointed out, there
> is business logic that can be pulled out of these code modules back into FO
> Tree where they more properly belong, and where duplication and confusion
> can be minimized.

In order to adapt Peter's ideas, I would need to identify the current
interface(s). Ideally a re-implementation of Property handling would be
invisible outside of those classes.

All the proposal addresses is the signature of some accessors. It does
not identify the set of property-related classes. I think an adapter
class could convert the current interface to the proposed interface.

This could be useful if it covers the entire properties interface.

> > The discussion favours the proposal.
> 
> I don't understand what you are saying here.

All/most other postings agreed with the proposal. 

> >  class PropertyAdapter extends Property { // Ugh! just f'rinstance
> >
> >   // repeat the following about 380 times:
> >   final public Property getMaxWidth() { // use final to inline and annoy
> >     return get("max-width");
> >   }
> >  }
> 
> Sorry -- I was not clear here. I meant to suggest that these methods be
> added to FObj (and its subclasses to the extent necessary).

Just a for-instance sketch of an adapter. Reference to properties has to
come from somewhere. I used inheritance as a convenience.

> > There is no statement defining the current interface. This will be
> > determined from existing code.
> >
> > Implementation
> > The proposal makes no suggestion for implementation and my earlier
> > submission is not relevant except as an indication that this issue is
> > linked to performance.
> 
> Again, I am not sure what you are saying here. The proposal deliberately
> does *not* address implementation. I am quite glad to have you address the
> performance aspects of implementation, but I think it is a separate issue
> from the interface. We can (and should, IMO) fix the interface before or at
> least during any changes to the implementation. All I am trying to do is to
> hide the implementation from the rest of the system.

What else is needed to 'get' properties ? Your accessors just have the 
property name. 

> Since FO Tree (and Properties) kind of works right now, we have been paying
> much more attention to other parts of the FOP code. Your questions and
> interest are forcing us to address it sooner than we would have. Obviously,

Don't feel forced. You CAN ignore me. I appreciate your efforts to clue
me in.

> one of our highest priorities should be to make other developers as
> productive as possible. Also, to a certain extent, we have been waiting on
> Peter West's work, hoping that his efforts can be useful in all of this. I
> am still hoping to hear from Peter on this, but in the meantime, I am trying
> to do some housekeeping that IMO will be important to clear the decks for
> you.

I support your Interface-view of properties but would like to have the
scope of the Property interface mapped out to include more than the
accessors. Of course, if these accessors and some references already
held by FObj can do the trick, let's get on with it!

-- 
John Austin <[EMAIL PROTECTED]>

Re: [VOTE] Properties API

2003-11-25 Thread John Austin

Note: I added a page to the wiki for this thread.
http://nagoya.apache.org/wiki/apachewiki.cgi?PropertiesRedesign

After thinking about the proposal, I'm not sure it solves anything. 

There are two aspects to the redesign of Property handling in FOP.

  * Interface means the external points of contact for Property data
  * Interface determines impact of changes on other components of
the system.
  * Implementation means the internal construction of classes 
  * Implementation determines performance characteristics of the
program.

The discussion favours the proposal.

Interface
The current proposal asks that FOP will employ Java-Bean-like accessors
for the properties of Formatting Objects visible to the FOTree. As an
example: 

 getMaxWidth() for the property "max-width"

There are between 250 and 380 of these methods required and they could
be generated automatically from an XML-based list of properties. This
list could be derived (if not generated) from the XSL-FO Specification
itself. Some kind of simple adapter class can be used to equate the
proposed interface to the existing one:

 class PropertyAdapter extends Property { // Ugh! just f'rinstance

  // repeat the following about 380 times:
  final public Property getMaxWidth() { // use final to inline and annoy
    return get("max-width");
  }
 }
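The f'rinstance above does not compile on its own, because it depends on FOP's Property class. A self-contained variant shows the adapter shape: generated getters in front, the old lookup behind. It assumes a hypothetical StringKeyedProperties interface in place of the existing string-keyed lookup, and it returns plain Strings only to keep the sketch small:

```java
// Hypothetical stand-in for the existing string-keyed lookup.
interface StringKeyedProperties {
    String get(String propertyName);
}

// Bean-style accessors layered over the old interface. Callers depend only
// on the getters, so the storage behind them can change freely.
final class PropertyAdapter {
    private final StringKeyedProperties source;

    PropertyAdapter(StringKeyedProperties source) {
        this.source = source;
    }

    // One method per property, named from the XSL-FO spec (could be generated).
    public String getMaxWidth() { return source.get("max-width"); }

    public String getFontSize() { return source.get("font-size"); }
}
```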

There is no statement defining the current interface. This will be
determined from existing code.


Implementation
The proposal makes no suggestion for implementation and my earlier
submission is not relevant except as an indication that this issue is
linked to performance.


-- 
John Austin <[EMAIL PROTECTED]>

Re: [VOTE] Properties API

2003-11-24 Thread John Austin

On Mon, 2003-11-24 at 19:47, Victor Mote wrote:
> FOP Developers:
> 

> Proposal:
> I propose that public "get" methods be used to retrieve FO Tree property
> values, and that the data behind these values be made as private as
> possible. The methods should be given a name based on the XSL-FO Standard.
> For example, the "max-width" property should be accessed using the method
> "getMaxWidth". The values returned should be the "refined" values.

I have been thinking about this a bit and I would like to throw out a
few rambling observations:

Properties are defined in the spec, so there are a finite number of them,
meaning that they map nicely into an enumeration type*. Peter West had
written some stuff along that line. This allows us to get away from
object-and-compute-intensive String types. There are about 380 of these
in Peter's mapping. Some 249 are simple attributes and the rest are
more complex, like space-after.optimum, space-after.minimum etc.

One train of thought I've had asks whether we need an atomic 'Property'
type alone or whether we can use a larger aggregate object type like
PropertyList that is a vector with each attribute value in a fixed
position. The idea here is that such a vector, something like a vtable,
can be merged quickly to resolve inheritance.

v[FONT_FAMILY] = "sans-serif"
v[FONT_SIZE] -> "10pt"
v[...] -> ...

v would have either 400 or 250 entries, and we would somehow use
polymorphism for the complex properties.
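Here is a sketch of the fixed-position vector and its merge. It assumes a toy table of four properties (a real one would have about 250 to 400 slots) and a per-slot flag for whether the property is inherited, since not every XSL-FO property is. PropertyVector and its constants are hypothetical names. The parent passed to inherit() is assumed to be already resolved:

```java
// Hypothetical "vtable" of properties: one fixed slot per property number.
final class PropertyVector {
    static final int FONT_FAMILY = 0;
    static final int FONT_SIZE = 1;
    static final int FONT_WEIGHT = 2;
    static final int BACKGROUND_COLOR = 3;
    static final int SIZE = 4;  // a real table would have roughly 250-400 slots

    // Which slots inherit from the parent FO (font-* do, background-color does not).
    static final boolean[] INHERITED = { true, true, true, false };

    final String[] slots = new String[SIZE];  // null means "not specified"

    PropertyVector set(int index, String value) {
        slots[index] = value;
        return this;
    }

    // One linear pass, no name lookups: an explicit child value wins;
    // otherwise copy the parent's value if the property is inherited,
    // else leave null so the initial value applies.
    static PropertyVector inherit(PropertyVector parent, PropertyVector child) {
        PropertyVector resolved = new PropertyVector();
        for (int i = 0; i < SIZE; i++) {
            if (child.slots[i] != null) {
                resolved.slots[i] = child.slots[i];
            } else if (INHERITED[i]) {
                resolved.slots[i] = parent.slots[i];
            }
        }
        return resolved;
    }
}
```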

> Discussion:
> 1. The purpose here is to separate the storage of the property values from
> their presentation (API) to the rest of the system. This opens the door for
> later changes to the storage without disrupting the remainder of the system.

We need to contain the number of objects here, say with canonical
mapping of the property name strings. Possibly also use canonical
mapping of the attribute values too. How many times is "10pt" or
"bold" coded in a document? Especially given that patterns of
FO are emitted by XSLT in other programs. Canonical strings allow faster
compares, since we can test with == rather than Object.equals(Object).

Since all of the attributes are in the XSL-FO Specification and
use some simple structures, I might want to generate the property
name list, and some of the accessors named as you have named them,
automatically. Is there an XSL Schema for XSL-FO, or would I just
extract them from the XML in the Spec document?

I can't say anything about how this stuff gets used, points 2 & 3.
I'd be interested in being involved in item 4.

> 2. This could perhaps be implemented for now only in FObj (??).
> 3. This is not directly relevant to the question, but needs to be addressed
> from the "big picture" standpoint. With the FO Tree mostly isolated, and
> LayoutStrategy implemented, the issue of whether a certain feature is
> implemented or not moves from the FO Tree to the specific LayoutStrategy.
> Each LayoutStrategy eventually needs to track which objects and properties
> it supports. The FO Tree should always store and return the data, without
> regard to whether it can be used or not. In the future, our properties.xml
> file, "compliance" page, and perhaps other things need to handle this. The
> LayoutStrategy needs to be the entity that reports on whether a feature is
> supported (perhaps using a scheme similar to properties.xml).
> 4. If accepted, this would be a great project for one of the new developers
> to tackle. I don't mean to volunteer anyone, but it might be a good "feet
> wet" project.
> 
> My vote:
> +1
> 
> Victor Mote

* we don' need no steenking enumeration type!

For years I thought the 'steenkin badges' quote originated with
WKRP's Dr. Johnny Fever.
-- 
John Austin <[EMAIL PROTECTED]>

Re: Memory measurement -- importance of Driver.reset() in Cocoon - NOT!

2003-11-24 Thread John Austin

On Mon, 2003-11-24 at 16:23, Glen Mazza wrote:
> --- John Austin <[EMAIL PROTECTED]> wrote:
> >
> > My own feeling is that FOP
> > will remain
> > problematic for large documents and this will be
> > especially so in
> > server environments such as Cocoon.
> > 
> 
> We hear you, and we do emphasize performance as one of
> our main goals [1].  As always, you may wish to add a
> Wiki page on things we're not doing that we should be
> doing in order to speed things up.  Bugzilla may also
> be a good place.  Others can comment on them that way,
> and we can refine what needs to be done.
> 
> For me though, depending on the problem, there can be
> a "grokking delay"--I tend not to act until I fully
> understand the issues involved.  Also, I am currently
> tied up with layout issues which preclude me from
> getting too much into properties code at this time. 
> 
> But many of the other committers/contributors have
> more thought-out opinions on how properties should be
> handled and should be able to contribute more quickly
> to your ideas.

Thanks. Sometimes I jump on things (like a cop on a donut).
Part of the ENTJ personality type (most engineers are ENTP:
Perceiving vs. Judging).

I want my focus to be on the PropertyList issue, as I feel
there are significant benefits there. I have a few miles to
go to understand the issues in PropertyList and 
PropertyListBuilder. 

I intentionally have not brought up my new observation:

When I run the same test twice with a Driver.reset(), GC and
a wait in between, heap usage the first time is TWICE
the usage of the second run. Time taken is the same.
Just something to wonder about, in case we stumble on a
reason.

JMP graphic available on request.

I have learned quite a bit about running FOP (and Java) for
larger files. A year ago, I had terrible trouble running FOP
in Cocoon because I had only 256 MB. The same task is a breeze
in the current release.

1) Large (equal) values of -Xms and -Xmx can reduce GC and
speed execution of a task. There is documented support
for this on the Sun site.

2) Use of the Server HotSpot VM makes a bigger difference
than point (1). I have submitted a fix for 'cocoon.sh' to
select this JVM. I am sure this was an oversight at Cocoon.

3) FOP and related libraries compiled with Sun's SDK can all 
be run with the current (1.4.1) IBM SDK. The GC performance
of this IBM SDK is too ugly for words. More to think about.

4) You can observe Garbage Collection events with the Java option:
-verbose:gc and this works similarly on both IBM and Sun run-times.
Trace syntax is quite different but you can see times and sizes.
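To observe collections from inside a test program rather than from the -verbose:gc trace, java.lang.management offers per-collector counters. This assumes Java 5 or later, so it would not run on the 1.4.1 SDKs discussed here. GcReport is a hypothetical helper name:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: report GC activity from inside the running program.
final class GcReport {
    // Total collections so far across all collectors (undefined counts of -1 are skipped).
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Print the heap ceiling (-Xmx) plus per-collector counts and accumulated time.
    static void print() {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (MB): " + rt.maxMemory() / (1024 * 1024));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

Calling GcReport.print() at the end of a run makes it easy to compare, for example, equal -Xms/-Xmx settings against the defaults.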

Given that I may have p*ss*d off some Germans (my Neandertal*
comment ;-), I shall refrain from stating that there may be
something rotten in Denmark - that might annoy Shakespeare-averse
Danes ...

* I lived on Ramstein Air Force Base for 4 years. 
Many Neandertals in my high-school at Kaiserslautern!

> Glen
> 
> [1]
> http://marc.theaimsgroup.com/?l=fop-dev&m=106735192618324&w=2
> 
> 
> __
> Do you Yahoo!?
> Free Pop-Up Blocker - Get it now
> http://companion.yahoo.com/
-- 
John Austin <[EMAIL PROTECTED]>

Re: Memory measurement -- importance of Driver.reset() in Cocoon - NOT!

2003-11-24 Thread John Austin

I took a further look at some of the JVM options and ran a few 
more tests. I found a couple of things that may be useful.

1) I have to use -Xmx400m to produce the 2000-page PDF in a
large PDF test; 'time' reported: real ~5m user 2m57.880s ...

2) When I added -Xms400m to the same work 'time' reported: real
4m35.968s user 2m35.190s ...

This is a drop in user time of 22 seconds, about 12 percent.

The Cocoon Performance Tips page says we should not use -Xms.
One problem with statements of performance is they don't always 
discuss the rationale. My own feeling is that FOP will remain
problematic for large documents and this will be especially so in
server environments such as Cocoon.

3) When I add the JVM 1.4 option '-server', execution gets faster.
Time reports: real 2m52.100s user 2m1.520s, an improvement in user
time of about 20 percent over run (2).

Note that 'cocoon.sh' in the top-level directory for cocoon-2.1.3
supplies the defaults: JAVA_OPTIONS=-Xms32m -Xmx512m but does not
specify '-server'.
 


-- 
John Austin <[EMAIL PROTECTED]>

Re: Memory measurement -- importance of Driver.reset()

2003-11-23 Thread John Austin

On Fri, 2003-11-21 at 15:50, J.Pietschmann wrote:
> John Austin wrote:
> > It is clear that there is a fair bit of memory freed by Driver.reset(). 
> > 
> > After thinking it over, I modified the same test to skip reset() and
> > just null the reference and issue System.gc().
> > 
> > This should be the same as letting it go out of scope (which happens
> > afterwards but this way I get the square wave on the graph).
> > 
> > Attachment 2: footprint2.png has about 1Mb more heap in use!
> 
> Shrug. The Driver.reset() is
>  _source = null;
>  _stream = null;
>  _reader = null;
>  _treeBuilder.reset();
> and the tree builder's reset in turn is
>  currentFObj = null;
>  rootFObj = null;
>  streamRenderer = null;
> this.errorCount = 0;

Why bother implementing reset() ?

> There are no static variables explicitely freed (there are not much
> static variables in FOP in general). I don't see any difference calling
> reset() can make compared to simply nulling the Driver reference.
> 
> > Why ? Does this suggest that there are finalizers (destructors) that are
> > not being called ? References set to null inside reset() should all
> > be unreachable when the reference to Driver goes out of scope.
> 
> You realize that gc() doesn't *force* a GC?
> The most reliable way to measure  allocated heap space I know off is
> allocating a large byte[][] array, then allocate 1k byte[] or so
> until you run out of memory.

Yes. But the observations were convincing. It is obvious that much
garbage was collected as expected. It was possible that the garbage
collector did not collect ALL of the garbage at that point but I
didn't understand how that could happen. I thought that the GC 
would examine every object to determine it's status. This seemed to
imply that you MUST collect all garbage every time.

I overlooked the possibility of multiple storage pools.
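The heap probe J.Pietschmann describes can be sketched as follows. HeapProbe is a hypothetical name. The maxKB parameter is an added bound so the probe can run without necessarily exhausting the heap. With a large bound, the returned count approximates the free heap, in kilobytes, at that moment:

```java
// Hypothetical sketch of the probe: allocate a large byte[][] holder first,
// then fill it with 1 KB blocks until allocation fails (or maxKB is reached).
final class HeapProbe {
    static int freeKB(int maxKB) {
        byte[][] blocks = new byte[maxKB][];  // holder allocated up front
        int count = 0;
        try {
            while (count < maxKB) {
                blocks[count] = new byte[1024];
                count++;
            }
        } catch (OutOfMemoryError e) {
            // Heap exhausted: count is the number of 1 KB blocks that fit.
        }
        return count;  // blocks becomes unreachable once we return
    }
}
```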

> > This might explain problems people are reporting when 
> > generating multiple PDF files using FOP. Especially if their
> > programs don't lose references to instances of Driver.
> Well, if they hang on to the Driver object, they are doomed.
> 
> > Personally, I suspect there are a lot of logical memory leaks
> > inside FOP.
> What's a logical memory leak?

The only kind you can get in Java. Referenced memory that is never
used again. 

In the case of FOP, this includes objects which have been laid out
in the PDF never to be used again. FOP memory use increases >= O(n)
where n is the number of pages being created. This is easy to show.

Massive amounts of memory are released when Driver is reset() or
unreferenced. 

I still object to the extreme high water marks in FOP. The saw-tooth
pattern is described in the JMP documentation as characteristic of
the creation of too many objects. This is probably fundamental to 
XSL-FO but the high-water mark of memory use would be lower if objects
were collected sooner. This would also speed up collection as there
would be fewer objects to inspect.

-- 
John Austin <[EMAIL PROTECTED]>

Re: Memory measurement -- importance of Driver.reset() in Cocoon - NOT!

2003-11-20 Thread John Austin

On Fri, 2003-11-21 at 00:43, John Austin wrote:

> I mean, I wonder what Cocoon does ?

In FOPSerializer:

/**
 * Recycle serializer by removing references
 */
public void recycle() {
    super.recycle();
    this.driver = null;
    this.renderer = null;
}

Apparently, we don't need no stinkin reset();
 
-- 
John Austin <[EMAIL PROTECTED]>

Re: Memory measurement -- importance of Driver.reset()

2003-11-20 Thread John Austin

On Fri, 2003-11-21 at 00:02, Glen Mazza wrote:
> Please do not cross-post to both lists--keep
> development-related issues on fop-dev.

Sorry. Not something I do a lot.

It was posted to fop-user to be available in the
archives for anyone searching on memory usage.

Users will want to make sure they use Driver.reset().

Hmm. I wonder what would Jimmy Buffett do?

I mean, I wonder what Cocoon does ?

-- 
John Austin <[EMAIL PROTECTED]>

Memory measurement -- importance of Driver.reset()

2003-11-20 Thread John Austin

After reading the Sept 2003 thread about Memory Performance, leaks (and
how wonderful Ada is), I modified my test program that generates
3 PDF files. The program now sleeps 30 seconds, calls Driver.reset(),
nulls the reference and sleeps again. In JMP this plots a square wave
that you can see on the attached graphs.
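The measurement pattern in this test (drop the reference, ask for a collection, wait, sample) can be sketched with Runtime alone. HeapSample is hypothetical, and a byte array stands in for a Driver instance. Because System.gc() is only a request, the numbers are indicative rather than exact:

```java
// Hypothetical helper: sample heap in use before and after dropping a reference.
final class HeapSample {
    // Bytes currently in use: committed heap minus its free portion.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Request a collection, pause briefly, then sample.
    static long usedAfterGc() {
        System.gc();
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status
        }
        return usedBytes();
    }

    public static void main(String[] args) {
        Object driver = new byte[8 * 1024 * 1024];  // stand-in for a Driver instance
        long withDriver = usedAfterGc();
        driver = null;                              // drop the only reference
        long withoutDriver = usedAfterGc();
        System.out.println("freed roughly " + (withDriver - withoutDriver) / 1024 + " KB");
    }
}
```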

It is clear that there is a fair bit of memory freed by Driver.reset(). 

After thinking it over, I modified the same test to skip reset() and
just null the reference and issue System.gc().

This should be the same as letting it go out of scope (which happens
afterwards but this way I get the square wave on the graph).

Guess what ?

Attachment 2: footprint2.png has about 1Mb more heap in use!
And this is a very short test file with just one member name & address.
The test prints a letter, envelope and a renewal form for a non-profit
Gardening group.

The difference ... no call to Driver.reset() !!!

Why ? Does this suggest that there are finalizers (destructors) that are
not being called ? References set to null inside reset() should all
be unreachable when the reference to Driver goes out of scope.

This might explain problems people are reporting when 
generating multiple PDF files using FOP. Especially if their
programs don't lose references to instances of Driver.

Personally, I suspect there are a lot of logical memory leaks
inside FOP. A reset() at the end of using a Driver instance is a
catch-all way of releasing all of the logically leaked memory
allocated from inside Driver() (and therefore inside FOP).

This approach is of little help to the developer who builds an
application that dies of memory exhaustion in production. We will have
to fix the logical leaks inside FOP to improve the user experience.


-- 
John Austin <[EMAIL PROTECTED]>
import java.io.File;
import java.io.IOException;

import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;

import org.apache.avalon.framework.logger.ConsoleLogger;
import org.apache.avalon.framework.logger.Logger;

import org.apache.fop.apps.Driver;
import org.apache.fop.apps.FOPException;

/**
* Use JAXP 1.1 to apply two transformations and FOP to generate PDF output
* for the Friends of the Gardens (FOG) project for the MUN Botanical Garden
*
* Requires: 
* (i)  Java >= 1.4 to obtain the XML parser and XSLT processor - JAXP 1.1
* (ii) FOP >= 0.20.5, fop.jar and the associated batik.jar and avalon-framework-cvs-20020806.jar
* (iii) Input file: members.xml
* (iv) Transforms: letter.xsl,  letter2fo.xsl, 
*  env.xsl, env2fo.xsl, 
*  renewal.xsl, renewal2fo.xsl
* Compile:
* javac -classpath .;fop.jar;avalon-framework-cvs-20020806.jar SimpleJaxp.java
* 
* Execute:
* java -Xmx4 -classpath .;fop.jar;batik.jar;avalon-framework-cvs-20020806.jar SimpleJaxp
* 
* Alternative:
* cocoon: pipelines like this:
	
	  
	  
	  
	  
	
*/

public class SimpleJaxp extends java.lang.Thread {
	
	public static void main(String[] args)
		throws javax.xml.transform.TransformerException {

		java.util.Calendar cal = java.util.Calendar.getInstance();
		long start = cal.getTimeInMillis();
		
		transformToPDF( "letter",  "members.xml", "letter.xsl",  "letter2fo.xsl"  );
		transformToPDF( "env", "members.xml", "env.xsl", "env2fo.xsl" );
		transformToPDF( "renewal", "members.xml", "renewal.xsl", "renewal2fo.xsl" );	
		
		System.out.println( "Elapsed " 
		+ ((java.util.Calendar.getInstance().getTimeInMillis() - start + 500)/1000)
		+ " seconds." );
		
		try {
		sleep(24*360);
		}
		catch (InterruptedException e ) {
		System.err.println( "sleep() Interrupted." );
		}
	}
	
	public static void transformToPDF(
		String namePart,
		String xmlFileName,
		String xsltFileName1,
		String xsltFileName2
	) 
		throws javax.xml.transform.TransformerException {
		
		File xmlFile = new File( xmlFileName );
		File xsltFile = new File( xsltFileName1);
		
		File out1 = null;
		
		try {
			out1 = File.createTempFile( namePart, ".xml" );
			out1.deleteOnExit();
		}
		catch( IOException ioe ) {
			System.err.println( "Could not create temp file: " + ioe.getMessage() );
			System.exit(1); // non-zero status signals failure
		}

		//*** First transformation ***
		
		Source xmlSource = new StreamSource(xmlFile);
		Source xsltSource = new StreamSource(xsltFile);

		Result result = new StreamResult(out1);

		TransformerFactory transFact = TransformerFactory.newInstance();

		Transformer trans = transFact.newTransformer(xsltSource);

		trans.transform(xmlSource, result );
		
		trans = null;
		
		//*** Second transformation ***
		
		File xsl

Development Environment suggestions ?

2003-11-20 Thread John Austin

So far I have been playing around like the Neanderthal*
that I am. I use Sun Java 1.4.x with xterm, vi, emacs and 
occasionally Jedit when I feel modern urges.

Peter has mentioned Eclipse and I have used VisualAge for 
Java, and either NetBeans or the Sun form thereof.

Is there a path to enlightenment (excuse the trollish tone)
therein ? Given that FOP can be installed and started in 
TBI (The Bash IDE), are there other graphical IDE's with a
reasonable learning curve ?

I have both Win98 and RH9 available to me. The RH box
has more resources in addition to having the usual Linux 
advantages. 

* Is that term Politically Correct ?  Would it be offensive to 
Europeans ? I myself am descended from Celts and probably 
some Angles, Jutes and Saxons. Dunno about Picts.

-- 
When I showed my mother an Anglican Church with the sign:
"Angle Parking Only", she asked "What about the poor Jutes and Saxons?".

John Austin <[EMAIL PROTECTED]>

Re: FOP ~ PropertyList search gives linear performance (FROM: fop-user)

2003-11-19 Thread John Austin

On Wed, 2003-11-19 at 19:50, Peter B. West wrote:
> John Austin wrote:
...
> My apologies to everyone on the list for the testy tone  ...
You mean I might have helped in p*ss*ng people off?
> Peter
-- 
John Austin <[EMAIL PROTECTED]>

Re: ANN: alt-design can now be integrated??

2003-11-19 Thread John Austin

On Wed, 2003-11-19 at 15:34, Victor Mote wrote:
> ANNOUNCEMENT: I have just committed a change that 1) allows LayoutStrategy
> to tell whether an FO Tree should be built, 2) has Driver act on this, i.e.
> to build an FO Tree only if LayoutStrategy indicates that this should be
> done. This should theoretically allow Peter's logic to be used as a
> LayoutStrategy within the trunk development line. What I have done is
> probably overly simplistic, but I will allow Peter or anyone wishing to work
> on that strategy tell us what additional things are needed to accommodate.
> To start integrating, create a subclass of LayoutStrategy, override the
> foTreeNeeded() method to return false, then write a format() method that
> does the layout work. LayoutStrategy knows its parent Document, which knows
> its parent Driver, so you should be able to get to all of the parsing
> variables that are needed. Let me know if you need help.
> 
> Since configuration is still messed up, you will need to hard-code a change
> to Driver to get your new LayoutStrategy object created.
> 
> Victor Mote

Geez! I was thinking more along the lines of plugging in a few new data
structures for property lookup. I am exploring the old implementation 
through the marvel of code grooming in order to understand it.

Don't worry, I have the time to do this right.

I got a tiny improvement by playing around in PropertyList and
PropertyListBuilder. This is just a throw-away effort, of course. I did
enough tracing this a.m. to realize that my 'linear behaviour' may
be deeply buried. I have done this often enough that I don't
expect more than marginal improvements from grooming/tweaking
lines of code. [Gone are the days of PL/I and unaligned bit
fields.]

Before I found out about Alt-Design, I was thinking about using a
HashMap with property names as keys and, as values, a class
implementing some stack behaviour. Each new FO would conceptually
'push' new values on a stack for each property in its list. A smart
'pop' would allow the entire set of properties for an FO to
be popped together. Hopefully, this design would allow faster
access to the current properties, without a need to search through
higher 'activation records', 'stack frames', contexts or whatever
you choose to call them.
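The stack-per-property idea above can be sketched in Java roughly as follows. This is a hypothetical illustration, not FOP or Alt-Design code: the class and method names are invented, property values are plain Objects, and every property is treated as inherited (XSL-FO's non-inherited properties would need separate handling).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: one value stack per property name, so the current
 * (innermost) value of a property is one hash lookup plus a peek, with no
 * walk up through the property lists of enclosing FOs.
 */
class PropertyStacks {
    // property name -> stack of values; the top of each stack is the current value
    private final Map<String, Deque<Object>> stacks = new HashMap<>();
    // one frame per open FO: the names that FO pushed, so they can be popped together
    private final Deque<List<String>> frames = new ArrayDeque<>();

    /** Called when an FO starts. Values must be non-null (ArrayDeque rejects null). */
    void pushFrame(Map<String, Object> specified) {
        List<String> pushed = new ArrayList<>(specified.keySet());
        for (Map.Entry<String, Object> e : specified.entrySet()) {
            stacks.computeIfAbsent(e.getKey(), k -> new ArrayDeque<>()).push(e.getValue());
        }
        frames.push(pushed);
    }

    /** Called when an FO ends: pops every value that FO pushed, in one step. */
    void popFrame() {
        for (String name : frames.pop()) {
            Deque<Object> s = stacks.get(name);
            s.pop();
            if (s.isEmpty()) {
                stacks.remove(name); // no open FO sets this property any more
            }
        }
    }

    /** Current value of a property, or null if no open FO has set it. */
    Object get(String name) {
        Deque<Object> s = stacks.get(name);
        return s == null ? null : s.peek();
    }
}
```

The lookup cost no longer depends on how deeply FOs are nested; the price is a little bookkeeping at every FO start and end.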

The performance measurements show that there are millions of
accesses through PropertyList.get(String propertyName), which are
passed one-to-one to PropertyList.get(propertyName, true, true)
and thence on to PropertyList.findProperty(propertyName, true).

Combined with the fact that the corresponding put() operations on
the HashMap underneath PropertyList did not show up noticeably in
the measurements, this suggests that retrieval is much more
intensive than storage in this structure.

So I should optimize retrievals.
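One way to confirm a read/write imbalance like this before optimizing is to count calls on the map itself. The class below is hypothetical instrumentation, not part of FOP; it assumes the storage under PropertyList is (or can temporarily be replaced by) a HashMap.

```java
import java.util.HashMap;

/**
 * Hypothetical instrumentation: a HashMap that counts direct get() and
 * put() calls. Other methods such as putAll() or getOrDefault() do not
 * route through these overrides in the OpenJDK HashMap, so they are
 * not counted.
 */
class CountingMap<K, V> extends HashMap<K, V> {
    private long gets;
    private long puts;

    @Override
    public V get(Object key) {
        gets++;
        return super.get(key);
    }

    @Override
    public V put(K key, V value) {
        puts++;
        return super.put(key, value);
    }

    long getCount() { return gets; }
    long putCount() { return puts; }
}
```

Printing getCount() and putCount() at the end of a run gives the ratio directly, rather than inferring it from which calls show up in a profile.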

My plan is to get to know the internals of FOP that are 'in contact
with' the existing code, then get to know the Alt-Design, then play with
more and more of it until I feel comfortable integrating it.

I don't expect a fast track to committer status; I would hope to work
with one or two current participants and package the changes so that
they 'drop in' to place. (We'll see)

-- 
John Austin <[EMAIL PROTECTED]>

RE: FOP ~ PropertyList search gives linear performance (FROM:fop-user)

2003-11-19 Thread John Austin

On Wed, 2003-11-19 at 15:22, Victor Mote wrote:
> John Austin wrote:

> > to work on this but I don't want to walk in to a firefight.
> 
> FWIW, I don't think there is really a firefight. Our discussions are usually
> at least robust, maybe even rowdy, but AFAICT, there is a large amount of
> mutual respect. That said, we *are* still trying to sort out some design

Ah yes! The old vigorous and spirited exchange of views.
-- 
John Austin <[EMAIL PROTECTED]>

Re: FOP ~ PropertyList search gives linear performance (FROM: fop-user)

2003-11-18 Thread John Austin

Looks like I really put my foot into it this time ;-)

I have repeated the measurements I did yesterday and I
think that it is a pretty reasonable conclusion that a
lot of resources are consumed by FOP in its rather
Byzantine property management code. 

I just spent a while trying to understand PropertyList
and PropertyListBuilder and found out that I need to
understand Property and Property.Maker as well. I think
I am going to have to help with this part of the project
but it is going to take a while.

I offered (off-line) to look at merging the Alt-Design code
into the main branch, but I suspect that there are some
different directions associated with this. Perhaps this
is the reason it has not been done so far. I am still willing
to work on this, but I don't want to walk into a firefight.

I trust the measurements I did yesterday, and I feel that
a bit of algorithm replacement should produce a significant
improvement in the program. I would also like to suggest
that anyone interested in performance look at Java Memory
Profiler at http://www.khelekore.org/jmp/performance.html

I suspect there are still major memory leaks in FOP and this
is one tool that will help you track them down.

-- 
John Austin <[EMAIL PROTECTED]>

Re: Confused with Fop extensions, distinct.

2002-04-17 Thread John Austin

On Tuesday 16 April 2002 11:13, you wrote:
> I have tried to read the docs.   I had used distinct before but not
> in an embedded application.
>
> Function not supported.   Is my error message.
>
> I have inclued my stack Trace.  My includes in my application. and my
> stylesheet.
> Thanks in advance for any help you can give.

Check your spelling in namespace declarations and make certain the
class name is correct for the version of Xalan you are using.

I posted the following to cocoon-users a few weeks ago:

This note is mostly for the benefit of anyone else searching for help 
with Redirect. Sorry if Xalan is a bit off topic for Cocoon2, but I 
expect that a lot of XSLT users will be using Cocoon as a framework for 
Xalan-J/XSLT-based applications.

I spotted Redirect in Chapter 8(?) of "XSLT" by Doug Tidwell, August 
2001, O'Reilly and Associates. It does work in Cocoon2 but I had a few 
problems getting it to run. I finally solved it by looking at the 
Xalan-J docs at xml.apache.org as well as the source code for Redirect.

The Redirect class seems to have moved from 
org.apache.xalan.xslt.extensions.Redirect to the more concise
org.apache.xalan.lib.Redirect.
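Since the Redirect class has moved between Xalan-J releases, a quick way to see which name your classpath actually provides is to try loading both. This is a small hypothetical helper; only the two class names come from the text above.

```java
/**
 * Hypothetical helper: reports which Redirect extension class name can be
 * loaded on the current classpath. Which one (if any) is present depends
 * on the Xalan-J version installed.
 */
class RedirectCheck {
    static boolean isLoadable(String className) {
        try {
            // initialize = false: only check that the class can be found and linked
            Class.forName(className, false, RedirectCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "org.apache.xalan.xslt.extensions.Redirect", // older location
            "org.apache.xalan.lib.Redirect"              // newer location
        };
        for (String name : candidates) {
            System.out.println(name + (isLoadable(name) ? ": found" : ": not found"));
        }
    }
}
```

Run it with the same classpath your XSLT processor uses; the name reported as found is the one to put in the xmlns:redirect declaration.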

The comments in Redirect are actually useful! (thank you Mr Boag!)

It appears that you need to include several namespaces in the 
stylesheet tag. This was another source of 'the fog of war' for me.

<xsl:stylesheet
 xmlns:lxslt="http://xml.apache.org/xslt"
 xmlns:redirect="org.apache.xalan.lib.Redirect"
 extension-element-prefixes="redirect"
 ... rest omitted ...

This did not work for me when I followed examples from various sources.

I have one minor complaint about Redirect (may apply to all extensions).
This facility is terrific but it is strangely silent when things go 
wrong. For example, I used file="concat(... at one point and created a 
directory named "concat(...".

When I had misspelled namespace URIs, nothing worked, and it failed
silently. When the lxslt declaration was missing ... same thing. When I
had the file permissions wrong, I was rewarded again with silence.

Thanks again. Hope this helps.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]

Re: Unix and FOP ?

2002-04-13 Thread John Austin

On Friday 12 April 2002 22:43, you wrote:
> yep

The only area in which Windows is (arguably) superior to Unix is
graphics, and especially FONTS. A consequence of Windows' success is
the fact that almost all computers have Windows licenses. This lets us
use the Windows fonts. You do need to have the Windows license, however.

The Windows license doesn't require that you actually run all of 
Windows. So in using the fonts, we are using Windows as licensed. We 
are just ignoring the unreliable parts of Windows (generally the 
executable parts).

Now actually installing and using the Windows fonts is an area which 
needs better documentation. 
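As a starting point, here is a sketch assuming the FOP 0.20.x font setup, in which TrueType metrics are generated with the org.apache.fop.fonts.apps.TTFReader tool and each font is then registered in userconfig.xml. The helper class, file names and paths are hypothetical, and the element and attribute names should be checked against the font documentation for your FOP version.

```java
/**
 * Hypothetical helper, assuming the FOP 0.20.x font setup:
 *   1. generate metrics once per TTF, e.g.
 *      java -cp <FOP classpath> org.apache.fop.fonts.apps.TTFReader arial.ttf arial.xml
 *   2. register the font in userconfig.xml with an entry like the one built here.
 * Values are inserted as-is (no XML escaping).
 */
class FontEntry {
    static String userconfigEntry(String family, String metricsFile, String ttfFile,
                                  String style, String weight) {
        return "<font metrics-file=\"" + metricsFile + "\" kerning=\"yes\" embed-file=\""
                + ttfFile + "\">\n"
                + "  <font-triplet name=\"" + family + "\" style=\"" + style
                + "\" weight=\"" + weight + "\"/>\n"
                + "</font>";
    }

    public static void main(String[] args) {
        // Example: a TTF copied from a licensed Windows install onto a Unix box
        System.out.println(userconfigEntry("Arial", "arial.xml",
                "/usr/local/share/fonts/arial.ttf", "normal", "normal"));
    }
}
```

Each style/weight combination (bold, italic, and so on) is usually its own TTF file, so it gets its own metrics file and its own entry.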

> > in userconfig.xml, but
> > Can I use the specials fonts by Window without problems into UNIX.?
> > That's possible?

Re: Unix and FOP ?

2002-04-11 Thread John Austin

On Thursday 11 April 2002 09:37, you wrote:
> Hi,
>
> I need information the file xsl:fo transformation in UNIX.
> What's I need by uses XSL:FO in UNIX?
> Can I do?

You can use xsl:fo in ANY system that has an implementation of the Java 
VM. This includes any reasonable implementation of Unix (AFAIK).

Many people use Cocoon 2 from the Apache project because it provides a 
ton of features, but you can use just the Fop program if you wish to.

I use it both ways and have also used it from XML Spy. 

In Cocoon 2 you can run XSL transformations to produce an XSL:FO 
document and render this to one of a number of formats such as PDF, PS 
and RTF using the FopSerializer.

I have also used Fop driven by a shell script. The memory footprint is 
smaller but you have to cart Fop around with you. From Cocoon, you can 
generate fancy-pants documents over the network.

Cocoon has a much steeper and longer learning curve. Fop is much
smaller and comes with lots of examples.

Re: License Issue Inquiry

2002-04-05 Thread John Austin

On Thursday 04 April 2002 14:47, you wrote:
> At 08:34 AM 4/4/02 -0500, Charles Marcus wrote:
> >like it could be the answer.  The only question is, can OOo use it?

> license of it's software. My question is why does OOo require FOP to
> be LGPLed? You can integrate it into OpenOffice without it being
> LGPL.

You can't change the Apache license into LGPL but you can license YOUR 
CODE under LGPL (or GPL or even MicroSloth EULA). This will require
anyone using YOUR CODE to observe LGPL (or whatever), which is what you 
want to do, as I understand it. This would allow anyone else to extract 
the Apache-licensed code out of your distribution and use it under 
Apache terms, as long as they removed all of the LGPL code from it. Of 
course, it would be easier to go out and get a pristine copy of FOP
instead. The only thing you can create license terms for is YOUR CODE 
and then it still has to be provably original etc. etc.

> If you do this it would also be fair if you contribute changes to FOP
> back to the Apache tree.
