cruft?

--
David B. Bitton
[EMAIL PROTECTED]
www.codenoevil.com
Code Made Fresh Daily™

----- Original Message -----
From: "J.Pietschmann" <[EMAIL PROTECTED]>
To: "fop dev" <[EMAIL PROTECTED]>
Sent: Friday, May 31, 2002 5:34 PM
Subject: Exploring the FOP API design space

> Hi foppers,
> I know I should provide code instead of talking, but then...
>
> The current FOP API suffers from a variety of deficiencies
> - unexpected statefulness (most horribly embodied in
>   XSLTInputHandler)
> - weak abstraction of input and output channels
> - incomplete separation of abstraction levels
> - cruft :-)
>
> Some points I think should be followed on design of a
> new and hopefully better API:
> - Atomic initialisation. After creating a processor,
>   it should be ready to run. Mandatory parametrisation
>   data should be passed either to the constructor or the
>   method(s) running the formatting process, everything
>   else should be initialised from sensible defaults.
> - No file names, anywhere. Strings representing resources
>   are always URLs, on the command line, in the config file,
>   everywhere. In the API, use java.io.File if files are
>   deemed necessary.
> - No baseDir. Define a baseURL concept. Pass all URLs
>   through a resolver.
> - Better abstraction of input and output channels.
>
> Whether only an Avalon component API is exposed or whether there is an
> Avalon-free API and a separate Avalon component is a matter of
> taste. In either case, I'd like to have the possibility to run a FOP
> core without access to external config *files*. This means I can
> create a new Driver() and can pass all config data by Java properties,
> service definitions and by using a user written Configuration class
> passed to the Driver.configure() method for everything too complex to
> be passed as properties and services (e.g. user font config). A FOP
> default Configuration class could read a system and a user config
> file. From what I've gathered from Avalon this is already implemented
> this way there.
> However, I'm not sure, and I'm not dogmatic about
> this.
>
> The problem I have is the design space for abstracting input
> and output channels.
>
> = Input =
> For input, we have the javax.xml.transform.Source stuff which
> provides a nice unified encapsulation of SAX, DOM and serialised
> XML streams as well as SAX and DOM itself.
>
> The nice part about the j.x.t.Source stuff is that it shields
> the user from as much of the lower level XML stuff as possible,
> in particular from setting up a parser in the common case of
> having serialised XML as input.
>
> Design choice 1:
> Use j.x.t.Source as FOP input. Implement an
> o.a.f.stream.XSLTStreamSource as a j.x.t.s.StreamSource subclass
> for providing XSLT power. (see end of message for an interface
> proposal)
>
> Choice 2:
> Provide SAX and DOM as input (getContentHandler() and render(DOM))
>
> Choice 3:
> Provide (more precise: expose) both. Redundant, but, well...
>
> = Output =
> Next problem: output. We have two rather radically different
> output types: byte streams and GUI panels.
> A real stumbling block is that the object the output is
> written to is volatile, it is likely to change with every
> rendering run, while the kind of renderer as well as the
> renderer specific configuration is more stable. This has
> profound implications for the API design.
>
> Choice 1:
> The interface is at the final output level. This means
> render()/run() methods for each of the classes:
>   render(OutputStream) // for PDF, MIF, PS, ...
>   run(UserAgent) // for AWT...
> We could add a print() method if necessary.
> Rationale for choosing the method names: render() means
> the input FO is rendered to a byte stream. run() means
> the UserAgent is started and the user can interact with it.
> The run() method will return if the user somehow ends the
> interaction process and shuts down the UserAgent. Do I
> interpret the current state correctly?
> This choice implies the renderer and any configuration
> data specific to the renderer have to be passed to the
> Driver (processor) through the Driver configuration
> methods. Because some renderers can be assumed to have
> a lot of renderer specific config data which warrants
> a structure imposed on it, I'm not very fond of the whole
> idea.
>
> Choice 2:
> The interface is the renderer. This means the renderer
> object has to be created by the user explicitly. The
> advantage is that the renderer configuration can be
> designed to fit the renderer rather than to be passed
> through a more generic interface at the Driver. Also,
> renderer configuration and the renderer independent
> processor configuration are better separated, which
> might be a good idea, in particular for people who want
> to render the same FO to several different output formats.
> In this case, a typical code snippet would look like
>
>   Processor p = new Processor(
>     new ProcessorConfiguration(new File("myconfig.xml")));
>   Renderer r = new PDFRenderer(
>     new PDFRendererConfiguration("cocoon:/myPDFconfig.xml"));
>   p.render(new StreamSource(new File("foo.fo")), r);
>
> (I don't mind if the configuration is not passed to the
> constructor but to a configuration() method, this is just
> for illustration).
>
> = Reuse =
> Last problem: reuse of processors and renderers.
> The XSLT processor of the JAXP interface and presumably
> many XML parsers are throwaway objects and not meant
> to be reused after the "work" method (transform(), parse())
> has been called.
>
> Choice 1:
> Make both processor and renderers throwaway objects. No
> reset() method. Advantage: the state after the rendering
> has ended can be retrieved as long as the objects are kept.
> The most common use case for this which has been mentioned
> on this list is inquiring the total number of pages rendered.
> There are other use cases for sure.
> I'm not sure how well this would fit into the Avalon
> component model.
> Can someone enlighten me?
> Another consequence would be factory objects, where a user
> can conveniently prepare a preconfigured template so that
> repeated processor creation is simple and fast. Again, I'm
> not sure if this fits well in the model with separated
> processor and renderer, it is likely that the user will
> create lots of identically configured processor+renderer
> combinations.
>
> Choice 2:
> Make processor and/or renderers reusable by providing a
> reset() method. Again, in the model with separated processor
> and renderers users may be confused by having to reset two
> objects. Another interesting question would be whether the
> renderer is kept after resetting the processor or not.
> In the first case, the renderer is a part of the processor
> configuration rather than a rendering parameter and should
> be passed to the constructor rather than to the rendering
> method.
>
> Choice 3:
> Reusable processor with auto-reset. The disadvantage is that
> no state is kept after rendering has ended. There is still
> the possibly confusing problem whether a new renderer has
> to be used or the old renderer is kept.
>
> = Caching =
> Caching is an interesting topic. It comes in two flavours:
> 1. Caching of stuff like images within a rendering run.
> 2. Caching across multiple rendering runs on reused objects.
> The first is not only concerned with efficiency but also with
> predictability. Consider
>   <fo:page-sequence initial-page-number="1">
>     <fo:static-content>
>       <fo:external-graphic src="http://dynamic.com/curr-time.gif"/>
>   ...
>   <fo:page-sequence initial-page-number="20">
>     <fo:static-content>
>       <fo:external-graphic src="http://dynamic.com/curr-time.gif"/>
> Will the two page sequences feature the same or different
> pictures in the page header?
> XSLT explicitly says that within a transformer run, multiple
> accesses to the same URL result in the same content.
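One way to get the predictable behaviour asked about above is a cache whose lifetime is exactly one rendering run. The following is only a minimal Java sketch under stated assumptions: RunScopedCache and the caller-supplied fetch function are hypothetical names for illustration and are not part of any existing FOP API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical per-run resource cache (not an existing FOP class).
 * Each URL is fetched at most once per instance, so every reference
 * to it within one rendering run sees the same bytes.
 */
final class RunScopedCache {
    private final Map<String, byte[]> entries = new HashMap<>();
    private final Function<String, byte[]> fetcher;

    /** fetcher is an assumed callback that actually loads the resource. */
    RunScopedCache(Function<String, byte[]> fetcher) {
        this.fetcher = fetcher;
    }

    /** Returns the content for url, calling the fetcher on first access only. */
    byte[] get(String url) {
        return entries.computeIfAbsent(url, fetcher);
    }
}

final class RunScopedCacheDemo {
    public static void main(String[] args) {
        int[] fetches = {0};
        // Stand-in for a dynamic image source: content differs on every fetch.
        Function<String, byte[]> dynamicSource =
            url -> new byte[] {(byte) ++fetches[0]};

        RunScopedCache run = new RunScopedCache(dynamicSource);
        byte[] header1 = run.get("http://dynamic.com/curr-time.gif");
        byte[] header20 = run.get("http://dynamic.com/curr-time.gif");

        System.out.println(header1 == header20); // true: same cached content
        System.out.println(fetches[0]);          // 1: fetched only once
    }
}
```

Creating a fresh RunScopedCache per run and dropping it afterwards keeps nothing across runs, which also bounds how long the cached bytes stay in memory.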
> The other interesting question is whether object reuse implies
> caching stuff like images across rendering runs. Whether this is
> useful depends on how often and how much stuff is shared. The
> use cases vary from rendering the same document several times
> to rendering documents sharing the same logo in the header to
> rendering documents at random.
>
> Choice 1:
> No caching at all, or a non-guaranteed caching. Risk reading
> sources multiple times, including possibly dynamically changing
> content.
> Perhaps we should leave the cache problem to another application
> layer. Cocoon appears to be quite good at it, no reinvention of the
> wheel necessary.
>
> Choice 2:
> Guarantee a URL is only read once within a rendering run. May imply
> memory problems.
>
> Choice 3:
> Expose caching across multiple renderings on a reused object.
> Needs an API for cache control.
> (My opinion: not recommended).
>
> = Conclusions =
> Ok, concrete proposals for the new interface, tentatively
> called Processor, for various combinations of the design
> changes regarding output abstraction and reuse. (I use
> j.x.t.Source for input, this does not mean I'm biased to
> this. Ok, I am :-) )
>
> 1. Output is physical. Throw away.
>   class Processor {
>     // default renderer, may adapt to output type
>     Processor()
>     // configuration includes renderer choice
>     Processor(Configuration)
>     run(Source s, UserAgent ua)
>     render(Source s, OutputStream o)
>   }
>
> 2. Alternative with separate configuration method
>   class Processor {
>     Processor()
>     configure(Configuration)
>     run(Source s, UserAgent ua)
>     render(Source s, OutputStream o)
>   }
>
> 3. Output is physical. Alternative for avoiding calling an
>    explicitly configured renderer with an improper output type
>   class Processor {
>     Processor(Source s, UserAgent ua)
>     Processor(Source s, OutputStream o)
>     Processor(Source s, UserAgent ua, Configuration)
>     Processor(Source s, OutputStream o, Configuration)
>     run()
>   }
>
> 4. Alternative with separate configuration method
>   class Processor {
>     Processor(Source s, UserAgent ua)
>     Processor(Source s, OutputStream o)
>     configure(Configuration)
>     render() // or run()
>   }
>
> 5-8. Add a reset() which resets both processor and renderer
>    to any of the alternatives above.
>
> 9. Output is Renderer. Throw away. Not well suited for Factory.
>   class Processor {
>     Processor()
>     Processor(Configuration)
>     render(Source s, Renderer r)
>   }
>   class PDFRenderer {
>     PDFRenderer(OutputStream o)
>     PDFRenderer(OutputStream o, Configuration)
>   }
>
> 10. Add reset() to 9.
> 11. Variants for Renderer output and Factory approach
>    omitted (look ugly). Add your own proposals.
>
> = Further activity =
> Well, I suppose there will be a consensus built:
> - Whether to expose
>   1. Avalon component interface only
>   2. Both Avalon and non-Avalon interface
>   3. Non-Avalon interface only
> - Design variant for input channel
> - Design variant for output channel
> - Design variant for object reuse
> - Whether to provide a factory (if appropriate)
> I hope this happens within the next week.
> I will then post a detailed interface to the list. I hope
> someone will help me to avalonise this, if necessary.
> After the interface is voted on, I'll implement this,
> with the objective to have running code in August. The
> current interface should be deprecated but kept for a
> few maintenance releases.
>
> Is this ok?
>
> J.Pietschmann
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, email: [EMAIL PROTECTED]