Hi,

On Tuesday, 8 February 2005 at 00:32 +0100, Erik Bruchez wrote:
> Eric van der Vlist wrote:
> 
> > For instance, one of the basic features of an OpenOffice converter would
> > be to accept an OpenOffice document as a model and the new XML content
> > to replace this content in the model.
> > 
> > This can be done passing the location of the model in a config input
> > (like I think it's the case for the Excel converter) but this could also
> > be done passing the model itself as an input.
> > 
> > The second solution would be more flexible (it gives the possibility to
> > chain transformations of OpenOffice documents without having to
> > explicitly use temporary files).
> > 
> > Now, I would question the efficiency of base64 encoding and decoding
> > OpenOffice documents that are zip files containing XML documents and
> > pictures. 
> 
> In theory, Base64 encoding and decoding shouldn't be too slow. In 
> practice, we haven't measured the performance of the implementation we 
> use, which comes from Apache. The big question is whether the time of 
> encoding / decoding is significant compared to the other tasks.
> 
> > Between these two options, which one would you recommend?
> 
> I think both ways can work. But if you have a URL available, and your 
> processor supports a binary stream, you can use the URL generator to 
> produce that binary stream. So it is more flexible this way. I would 
> prefer that solution.

Yes, that's a good point.

> Or, you could go the whole extra mile (or kilometer) and use the 
> strategy used by the Email processor, which comes down to specifying 
> URIs for attachments, but a URI can be something like "oxf:/foo.jpg", 
> but also "input:foo".
> 
> > There is also a variant (possible with both options) which would be to
> > totally expose the content of OpenOffice documents.
> > 
> > A converter from OpenOffice to XML would have one input (the OpenOffice
> > document) and one output per XML document composing the package. Vice
> > versa, a converter from XML to OpenOffice would have as many inputs as
> > documents and an output for the OpenOffice document.
> 
> > The downside is more pipeline work to do to connect all the inputs and
> > outputs, but I find that the additional flexibility could be worth the
> > pain and that this would give the possibility to work on all the
> > components of OpenOffice documents (this is needed, for instance if you
> > want to add pictures or change master styles or metadata in a document).
> 
> Another downside is that you have to know the names of the documents 
> before creating the pipeline, unless you plan to generate the pipeline 
> dynamically (which should be done IMO in last resort). Right?

No, as far as I can tell, the names of the documents are fixed, except
for pictures.

Now, there is yet another possibility: what about writing a single
XML compound document that would be an aggregation of all the documents
and could be manipulated as such in pipelines and XML databases?

The structure of an OpenOffice document archive is described in a
manifest file that looks like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE manifest:manifest PUBLIC "-//OpenOffice.org//DTD Manifest 1.0//EN" 
"Manifest.dtd">
<manifest:manifest xmlns:manifest="http://openoffice.org/2001/manifest">
 <manifest:file-entry manifest:media-type="application/vnd.sun.xml.writer" 
manifest:full-path="/"/>
 <manifest:file-entry manifest:media-type="image/png" 
manifest:full-path="Pictures/10000000000001DA0000005BF70F3350.png"/>
 <manifest:file-entry manifest:media-type="image/png" 
manifest:full-path="Pictures/1000020000000055000000255789B5EB.png"/>
 <manifest:file-entry manifest:media-type="" manifest:full-path="Pictures/"/>
 <manifest:file-entry manifest:media-type="appication/binary" 
manifest:full-path="layout-cache"/>
 <manifest:file-entry manifest:media-type="text/xml" 
manifest:full-path="content.xml"/>
 <manifest:file-entry manifest:media-type="text/xml" 
manifest:full-path="styles.xml"/>
 <manifest:file-entry manifest:media-type="text/xml" 
manifest:full-path="meta.xml"/>
 <manifest:file-entry manifest:media-type="text/xml" 
manifest:full-path="settings.xml"/>
</manifest:manifest>

If you take this document and include the content of each document in
the manifest:file-entry elements except for "/" and maybe layout-cache
(either as plain XML, CDATA or base64 based on the media type like you
are doing for streams), you get something that is easy to process in a
pipeline, easy to split through XPointer and easy to reconstitute
through XSLT.
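
To make the idea concrete, here is a minimal sketch in Python. It is not
the Orbeon processor API: the function name build_compound is
hypothetical, and so are the choices to embed "text/xml" parts as child
elements, to embed everything else as base64 text, and to skip "/",
directory entries and layout-cache.

```python
import base64
import io
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "http://openoffice.org/2001/manifest"
FULL_PATH = f"{{{MANIFEST_NS}}}full-path"
MEDIA_TYPE = f"{{{MANIFEST_NS}}}media-type"
ET.register_namespace("manifest", MANIFEST_NS)


def build_compound(archive_bytes, skip=("/", "layout-cache")):
    """Return the manifest element with each file embedded in its entry.

    Assumption: the manifest lives at META-INF/manifest.xml in the zip.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        manifest = ET.fromstring(zf.read("META-INF/manifest.xml"))
        names = set(zf.namelist())
        for entry in manifest.findall(f"{{{MANIFEST_NS}}}file-entry"):
            path = entry.get(FULL_PATH)
            # Skip the package root, directories, and anything not stored.
            if path in skip or path.endswith("/") or path not in names:
                continue
            data = zf.read(path)
            if entry.get(MEDIA_TYPE) == "text/xml":
                # XML parts become child elements of their file-entry.
                entry.append(ET.fromstring(data))
            else:
                # Binary parts (pictures, ...) are stored as base64 text.
                entry.text = base64.b64encode(data).decode("ascii")
    return manifest
```

ET.tostring(build_compound(data)) then gives a single document in which
the content.xml fragment can be addressed like any other element.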

And if you want to attach external documents, you can use XInclude.
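
As an illustration only, this is how such an inclusion could be resolved
with Python's standard xml.etree.ElementInclude module. The EXTERNAL
dictionary and the loader/resolve helpers are hypothetical stand-ins for
whatever would actually serve the external documents (files, an XML
database, a pipeline input).

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# A compound document whose content.xml part is pulled in via XInclude.
COMPOUND = """\
<manifest:manifest xmlns:manifest="http://openoffice.org/2001/manifest"
                   xmlns:xi="http://www.w3.org/2001/XInclude">
 <manifest:file-entry manifest:media-type="text/xml"
                      manifest:full-path="content.xml">
  <xi:include href="content.xml" parse="xml"/>
 </manifest:file-entry>
</manifest:manifest>
"""

# Hypothetical in-memory store standing in for the external resources.
EXTERNAL = {"content.xml": "<doc>external content</doc>"}


def loader(href, parse, encoding=None):
    """Resolve xi:include references from the EXTERNAL dictionary."""
    if parse == "xml":
        return ET.fromstring(EXTERNAL[href])
    return EXTERNAL[href]


def resolve(compound_xml):
    """Parse the compound document and expand its xi:include elements."""
    root = ET.fromstring(compound_xml)
    ElementInclude.include(root, loader=loader)
    return root
```

After resolve(COMPOUND), the xi:include element has been replaced by the
root element of the referenced document.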

Looks like that solves a lot of issues and that's probably what I am
going to implement!

The frequent use case where you want to transform content.xml to replace
content while keeping the same presentation would become:

      * Use a converter to convert an OOo model into this XML format or
        retrieve this XML format directly from a conversion previously
        stored in an XML database.
      * Apply a transformation either on the content.xml fragment using
        an XPointer expression for the data entry of the transformation
        (that allows you to use transformations defined as OOo filters)
        or on the compound document.
      * Use a converter to convert either directly the result of the
        transformation or the result of an XSLT transformation
        reconstructing the compound document.
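
The last two steps could be sketched as follows in Python. This is only
an illustration under stated assumptions, not an existing converter:
replace_part and write_archive are hypothetical names, XML parts are
assumed to be embedded as child elements and binary parts as base64
text, and anything else a real OOo archive may require is not handled.

```python
import base64
import copy
import io
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "http://openoffice.org/2001/manifest"
FULL_PATH = f"{{{MANIFEST_NS}}}full-path"
ENTRY = f"{{{MANIFEST_NS}}}file-entry"
ET.register_namespace("manifest", MANIFEST_NS)


def replace_part(compound, path, new_root):
    """Swap the XML embedded in the entry for `path` (e.g. content.xml)."""
    for entry in compound.findall(ENTRY):
        if entry.get(FULL_PATH) == path:
            for child in list(entry):
                entry.remove(child)
            entry.append(new_root)
            return
    raise KeyError(path)


def write_archive(compound):
    """Serialize a compound document back into zip archive bytes."""
    manifest = copy.deepcopy(compound)  # leave the caller's tree untouched
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for entry in manifest.findall(ENTRY):
            path = entry.get(FULL_PATH)
            children = list(entry)
            if children:
                # Embedded XML part: write it out as its own file.
                zf.writestr(path, ET.tostring(children[0], encoding="utf-8"))
            elif entry.text and entry.text.strip():
                # Embedded binary part: decode the base64 payload.
                zf.writestr(path, base64.b64decode(entry.text))
            # Strip the payload so that only the plain manifest remains.
            for child in children:
                entry.remove(child)
            entry.text = None
        zf.writestr("META-INF/manifest.xml",
                    ET.tostring(manifest, encoding="utf-8"))
    return buf.getvalue()
```

A pipeline would call replace_part(compound, "content.xml", new_content)
with the transformed content, then write_archive(compound) to get the
bytes of the new document.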

I think that this is the option I'll be implementing!

What do you think?

Thanks,

Eric
-- 
Have you ever thought about unit testing XSLT templates?
                                                     http://xsltunit.org
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(ISO) RELAX NG   ISBN:0-596-00421-4 http://oreilly.com/catalog/relax
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------



_______________________________________________
orbeon-user mailing list
https://lists.sourceforge.net/lists/listinfo/orbeon-user