This should be possible, since an OpenOffice file is just a set of XML files zipped into one archive. I did the opposite: I created a single XML file from the zip file in order to publish it through Cocoon. I used Perl; here's my script. It doesn't do what you want, but hey, it's open source, right?
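Before the full script, here is a minimal sketch of the two filtering passes it relies on, which may make the longer listing easier to follow. It is an illustration only: the function name flatten_parts and the sample input are made up for this example and are not part of oo2xml. Pass 1 puts every tag on its own line; pass 2 drops the lines that contain a processing instruction or a DOCTYPE declaration. Without pass 1, a DOCTYPE that shares a line with real content would take that content down with it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, for illustration only (not part of oo2xml).
sub flatten_parts {
    my ($raw) = @_;

    # Pass 1: put a newline between every "><" pair, one tag per line.
    (my $split = $raw) =~ s/></>\n</g;

    # Pass 2: drop lines holding processing instructions (<?...) or
    # declarations (<!...); split /^/ keeps the trailing newlines.
    my @kept = grep { !/<[?!]/ } split /^/, $split;

    return join '', @kept;
}

# Made-up sample: a DOCTYPE sitting on the same line as real content.
my $sample = qq{<?xml version="1.0"?>\n}
           . qq{<!DOCTYPE a SYSTEM "a.dtd"><a><b/></a>\n};

print flatten_parts($sample);
```

The full script does the same thing through two temp files, and additionally strips the xmlns attributes and wraps everything in a single root element.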

#!/usr/bin/perl
# Written by Yves Vindevogel - [EMAIL PROTECTED]
# 14-Nov-2002
# This script opens an OpenOffice document (which is a zip file)
# and exports all the files in the document to XML
#
# Usage: oo2xml inputfile outputfile

use strict ;
use warnings ;

# Check the arguments and check if the input file exists
die "Usage: oo2xml inputfile outputfile\n" unless @ARGV == 2 ;
my ($infile, $outfile) = @ARGV ;
unless (-e $infile) { die "oo2xml error: Could not find input file\n" ; }

# Run system command to unzip the file into a temp xml file
# unzip -p opens the zip file and puts the content in the pipe
# Since the content of an OpenOffice file is plain XML,
# all the files in the OO file are put into the pipe.
# The pipe is then flushed into a file, thus the xml file
# contains all the content, in XML.
# This is not a new valid XML file !!
# On the temp xml file, some modifications must be done.
# Note: system returns 0 on success, so compare against 0
system("unzip -p \"$infile\" > /tmp/tmp.xml") == 0
    or die "oo2xml error: Could not unzip the input file\n" ;

# Open the temp xml file
open(my $in, "<", "/tmp/tmp.xml")
    or die "oo2xml error: Could not open temp file\n" ;

# Open second temp file to split the tags
# When the tags are not split, and an <!tag> comes second,
# the complete line is dropped, resulting in bugs
# Therefore, in a first pass, each tag is rewritten to a separate line
open(my $split, ">", "/tmp/tmp2.xml")
    or die "oo2xml error: Could not open temp split file\n" ;

# Loop through lines and split by entering a \n between the > and <
while (my $line = <$in>) {
    $line =~ s/></>\n</g ;
    print $split $line ;
}

# Close them
close $split ;
close $in ;

# Open the filtered input file
open(my $tmp, "<", "/tmp/tmp2.xml")
    or die "oo2xml error: Could not open split file\n" ;

# Open the output file
open(my $xml, ">", $outfile)
    or die "oo2xml error: Could not open output file\n" ;

# Print the office:document tag
# The complete document needs to be enclosed by one root element
# The root element will thus be <office:document>
my @namespaces = (
    'office="http://openoffice.org/2000/office"',
    'style="http://openoffice.org/2000/style"',
    'text="http://openoffice.org/2000/text"',
    'table="http://openoffice.org/2000/table"',
    'draw="http://openoffice.org/2000/drawing"',
    'fo="http://www.w3.org/1999/XSL/Format"',
    'xlink="http://www.w3.org/1999/xlink"',
    'number="http://openoffice.org/2000/datastyle"',
    'svg="http://www.w3.org/2000/svg"',
    'chart="http://openoffice.org/2000/chart"',
    'dr3d="http://openoffice.org/2000/dr3d"',
    'math="http://www.w3.org/1998/Math/MathML"',
    'form="http://openoffice.org/2000/form"',
    'script="http://openoffice.org/2000/script"',
    'config="http://openoffice.org/2001/config"',
    'meta="http://openoffice.org/2000/meta"',
    'manifest="http://openoffice.org/2001/manifest"',
    'dc="http://purl.org/dc/elements/1.1/"',
) ;
print $xml qq{<?xml version="1.0" encoding="UTF-8"?>\n} ;
print $xml "<office:document " ;
print $xml "xmlns:$_ " foreach @namespaces ;
print $xml ">\n" ;

# Loop through the lines in the temp XML file
# Lines with DOCTYPE descriptions and version info are omitted
while (my $line = <$tmp>) {
    # Two reasons not to write the line: processing instructions and doctypes
    next if $line =~ /<\?/ ;
    next if $line =~ /<!/ ;

    # Remove any xmlns info from the line,
    # all the namespace information is already written in the root element
    # If you don't remove them, you get errors
    if ($line =~ /xmlns/) {
        # Split on white space and loop through the tags,
        # if xmlns, check to see if it was the first or last tag
        # If so, write the opening or closing bracket
        # otherwise simply write the tag and a white space
        foreach my $tag (split / /, $line) {
            if ($tag =~ /xmlns/) {
                print $xml "<" if $tag =~ /</ ;
                print $xml ">\n" if $tag =~ />/ ;
            } else {
                print $xml $tag, " " ;
            }
        }
        # Don't need to write the line, already written
        next ;
    }

    print $xml $line ;
}

# Write document end tag
print $xml "</office:document>\n" ;
close $tmp ;
close $xml ;

# Delete the temp files
unlink "/tmp/tmp.xml"
    or warn "oo2xml warning: Temp file could not be deleted\n" ;
unlink "/tmp/tmp2.xml"
    or warn "oo2xml warning: Temp split file could not be deleted\n" ;

Quoting Olivier Mengué <[EMAIL PROTECTED]>:

> Hi,
>
> I'm working on a project that will generate OpenOffice.org documents from
> data extracted from a database. Our aim is to automate the publishing of
> the program of hikes for my hikers association. It is currently done with a
> Microsoft Word document merge and it is not satisfying. PDF is not an option
> as publishers have to do additional editing after the automatic step.
> The output document will be many pages long, so we want to process in batch
> instead of as a web application.
>
> As the OpenOffice.org document format is XML, I would like to reuse the Cocoon
> pipeline with an ESQL transformer from a simple Java application.
>
> My questions are:
> - is it possible? I mean, is it possible to reuse just the pipeline in a
> standard Java application, without the sitemap and servlet stuff, without
> too much code or too many dependencies?
> The pipeline would be either hard-coded or specified with a simpler
> sitemap-like configuration file.
> - how? The package org.apache.cocoon.components.pipeline seems interesting,
> but I don't know which class to use and how to build a simple pipeline with
> a generator, a transformer and serialiser. Then, how to feed the pipeline?
>
> Could you point me to the important classes, and the order to create them?
>
>
> Thank you for your help,
>
> Olivier Mengué
>
>
> ---------------------------------------------------------------------
> Please check that your question has not already been answered in the
> FAQ before posting. <http://xml.apache.org/cocoon/faq/index.html>
>
> To unsubscribe, e-mail: <[EMAIL PROTECTED]>
> For additional commands, e-mail: <[EMAIL PROTECTED]>