It should be possible, since an OpenOffice file is just a set of XML files
zipped into one archive. I did the opposite: I created a single XML file from
the zip file in order to publish it through Cocoon. I used Perl; here's my script.
It doesn't do what you want, but hey, it's Open Source, right?
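
If you want to see the "set of XML files" for yourself first, here is a small
separate sketch (not part of my script) that lists the members of the zip
archive. It assumes the core IO::Uncompress::Unzip module is available;
list_members and the file name document.sxw are made-up names for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);

# Hypothetical helper: returns one [name, uncompressed size] pair
# for every member stored in the zip file.
sub list_members {
    my ($zipfile) = @_;
    my $u = IO::Uncompress::Unzip->new($zipfile)
        or die "Could not open $zipfile: $UnzipError\n";

    my @members;
    my $status;
    for ($status = 1; $status > 0; $status = $u->nextStream()) {
        my $name = $u->getHeaderInfo()->{Name};

        # Drain the current member; read() returns the number of
        # uncompressed bytes, 0 at end of member, negative on error.
        my ($size, $buffer) = (0);
        while (($status = $u->read($buffer)) > 0) {
            $size += $status;
        }
        last if $status < 0;

        push @members, [$name, $size];
    }
    die "Error while reading $zipfile: $UnzipError\n" if $status < 0;
    return @members;
}

# Usage: perl list_members.pl document.sxw
if (@ARGV) {
    printf "%-30s %8d bytes\n", @$_ for list_members($ARGV[0]);
}
```

The script itself starts below.
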
#!/usr/bin/perl
# Written by Yves Vindevogel - [EMAIL PROTECTED]
# 14-Nov-2002
# This script opens an OpenOffice document (which is a zip file)
# and writes all the files in the document into one XML file
#
# Usage: oo2xml inputfile outputfile
# Check if the input file exists
unless (-e $ARGV[0])
{ die "oo2xml error: Could not find input file\n" ;
} ;
# Run system command to unzip the file into a temp xml file
# unzip -p opens the zip file and puts the content in the pipe
# Since the content of an OpenOffice file is plain XML,
# all the files in the OO file are put into the pipe.
# The pipe is then flushed into a file, thus the xml file
# contains all the content, in XML.
# This is not a new valid XML file !!
# On the temp xml file, some modifications must be done.
# system returns 0 on success, so compare against 0 explicitly
system ("unzip -p \"$ARGV[0]\" > /tmp/tmp.xml") == 0
or die "oo2xml error: Could not unzip the input file\n";
# Open the temp xml file
open (tmp, "/tmp/tmp.xml")
|| die "oo2xml error: Could not open temp file\n" ;
# Open second temp file to split the tags
# When tags are not split and a <!...> tag comes second on a line,
# the whole line is dropped, which loses content.
# Therefore, in a first pass, each tag is rewritten onto a separate line
open (tmp2, "> /tmp/tmp2.xml")
|| die "oo2xml error: Could not open temp split file\n" ;
# Loop through lines and split by entering a \n between the > and <
while ($line = <tmp>)
{
$line =~ s/></>\n</g ;
print tmp2 $line ;
} ;
# Close them
close tmp2 ;
close tmp ;
# Open the filtered input file
open (tmp, "/tmp/tmp2.xml")
|| die "oo2xml error: Could not open split file\n" ;
# Open the output file
open (xml, "> $ARGV[1]")
|| die "oo2xml error: Could not open output file\n" ;
# Print the office:document tag
# The complete document needs to be enclosed by one root element
# The root element will thus be <office:document>
print xml "<?xml version=\x221.0\x22 encoding=\x22UTF-8\x22?>\n" ; # \x22 = "
print xml "<office:document " ;
print xml "xmlns:office=\x22http://openoffice.org/2000/office\x22 " ;
print xml "xmlns:style=\x22http://openoffice.org/2000/style\x22 " ;
print xml "xmlns:text=\x22http://openoffice.org/2000/text\x22 " ;
print xml "xmlns:table=\x22http://openoffice.org/2000/table\x22 " ;
print xml "xmlns:draw=\x22http://openoffice.org/2000/drawing\x22 " ;
print xml "xmlns:fo=\x22http://www.w3.org/1999/XSL/Format\x22 " ;
print xml "xmlns:xlink=\x22http://www.w3.org/1999/xlink\x22 " ;
print xml "xmlns:number=\x22http://openoffice.org/2000/datastyle\x22 " ;
print xml "xmlns:svg=\x22http://www.w3.org/2000/svg\x22 " ;
print xml "xmlns:chart=\x22http://openoffice.org/2000/chart\x22 " ;
print xml "xmlns:dr3d=\x22http://openoffice.org/2000/dr3d\x22 " ;
print xml "xmlns:math=\x22http://www.w3.org/1998/Math/MathML\x22 " ;
print xml "xmlns:form=\x22http://openoffice.org/2000/form\x22 " ;
print xml "xmlns:script=\x22http://openoffice.org/2000/script\x22 " ;
print xml "xmlns:config=\x22http://openoffice.org/2001/config\x22 " ;
print xml "xmlns:meta=\x22http://openoffice.org/2000/meta\x22 " ;
print xml "xmlns:manifest=\x22http://openoffice.org/2001/manifest\x22 " ;
print xml "xmlns:dc=\x22http://purl.org/dc/elements/1.1/\x22 " ;
print xml ">\n" ;
# Loop through the lines in the temp XML file
# Lines with DOCTYPE declarations and version info are omitted
while ($line = <tmp>)
{
# temp var to see if we need to write the line
$ok = 1 ;
# Two reasons not to write the line: processing instructions and doctypes
if ($line =~ /<\x3F/) { $ok = 0; } ; # \x3F = ?
if ($line =~ /<!/) { $ok = 0; } ;
# Remove any xmlns info from the line,
# all the namespace information is already written in the root element
# If you don't remove them, you get errors
if ($line =~ /xmlns/)
{
# Split on white space
@tags = split / /, $line ;
# Loop through the tags,
# if xmlns, check to see if it was the first or last tag
# If so, write the opening or closing tag
# otherwise simply write the tag and a white space
foreach $tag (@tags)
{
if ($tag =~ /xmlns/)
{
if ($tag =~ /</) { print xml "<"} ;
if ($tag =~ />/) { print xml ">\n"} ;
}
else
{
print xml $tag, " ";
} ;
} ;
# Don't need to write the line, already written
$ok = 0 ;
} ;
# Write the line if the temp var is still 1
unless ($ok == 0) { print xml $line ; } ;
} ;
# Write document end tag
print xml "</office:document>\n" ;
# Delete the temp files
# system returns 0 on success, so compare against 0 explicitly
system ("rm -f /tmp/tmp.xml") == 0
or warn "oo2xml warning: Temp file could not be deleted\n" ;
system ("rm -f /tmp/tmp2.xml") == 0
or warn "oo2xml warning: Temp split file could not be deleted\n" ;
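
For what it's worth, the filtering steps in the script (split the tags, skip
processing instructions and doctypes, drop the xmlns attributes) can also be
sketched as one small function working on a string. This is an alternative
formulation, not what the script above does token by token: clean_oo_xml is a
made-up name, the input is invented, and the regex assumes double-quoted
attribute values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): takes the raw
# concatenated XML as one string and returns the cleaned body.
sub clean_oo_xml {
    my ($raw) = @_;

    # Put every tag on its own line, like the first pass of the script.
    $raw =~ s/></>\n</g;

    my @out;
    for my $line (split /\n/, $raw) {
        # Skip lines holding processing instructions (<?...) or
        # declarations such as DOCTYPE (<!...).
        next if $line =~ /<[?!]/;

        # Drop xmlns="..." and xmlns:prefix="..." attributes; the root
        # element is assumed to declare every namespace already.
        $line =~ s/\s+xmlns(?::[\w.-]+)?="[^"]*"//g;

        push @out, $line;
    }
    return join "\n", @out;
}

# Usage with invented input:
my $raw = '<?xml version="1.0"?>'
        . '<office:document-content'
        . ' xmlns:office="http://openoffice.org/2000/office"'
        . ' office:class="text"><office:body/></office:document-content>';
print clean_oo_xml($raw), "\n";
```

Like the original, this also throws away any line containing an XML comment,
since comments start with <! as well.
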
Quoting Olivier Mengué <[EMAIL PROTECTED]>:
> Hi,
>
> I'm working on a project that will generate OpenOffice.org documents from
> data extracted from a database. Our aim is to automate the publishing of
> the hike program for my hikers' association. It is currently done with a
> Microsoft Word mail merge, which is not satisfactory. PDF is not an option
> as publishers have to do additional editing after the automatic step.
> The output document will be many pages long, so we want to process it in batch
> instead of as a web application.
>
> As OpenOffice.org document format is XML, I would like to reuse the Cocoon
> pipeline with an ESQL transformer from a simple Java application.
>
> My questions are:
> - Is it possible? I mean, is it possible to reuse just the pipeline in a
> standard Java application, without the sitemap and servlet stuff, without
> too much code or too many dependencies? The pipeline would be either
> hard-coded or specified with a simpler sitemap-like configuration file.
> - How? The package org.apache.cocoon.components.pipeline seems interesting,
> but I don't know which class to use and how to build a simple pipeline with
> a generator, a transformer and a serializer. Then, how to feed the pipeline?
>
> Could you point me to the important classes, and the order in which to create them?
>
>
> Thank you for your help,
>
> Olivier Mengué
>
>
> ---------------------------------------------------------------------
> Please check that your question has not already been answered in the
> FAQ before posting. <http://xml.apache.org/cocoon/faq/index.html>
>
> To unsubscribe, e-mail: <[EMAIL PROTECTED]>
> For additional commands, e-mail: <[EMAIL PROTECTED]>