This is tremendous. Great work, Nick.

Nick Burch wrote:
If you've been watching the commit messages over the last few days, you'll
have seen that I've made a quick stab at some ooxml support.

The code I've committed is powered by two other projects:
* XMLBeans - http://xmlbeans.apache.org/
* OpenXML4J - http://www.openxml4j.org/

OpenXML4J provides a nice library to get at the underlying zip file
format, grab the relationships between different bits of the file etc. In
many ways, it's the ooxml equivalent of poifs.
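
As a rough illustration of the "underlying zip file" point, here is a
minimal sketch that lists the parts of a package using only java.util.zip.
It does not use OpenXML4J and does not read relationships. The class name
PackagePartLister and the three-part sample package are made up for the
example, and the part contents are placeholders rather than valid OOXML.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of POI or OpenXML4J.
final class PackagePartLister {

    /** Returns the names of all non-directory entries in a zip stream. */
    static List<String> listParts(InputStream in) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Builds a tiny in-memory zip with OOXML-style part names (placeholder contents). */
    static byte[] samplePackage() {
        String[] parts = {"[Content_Types].xml", "_rels/.rels", "xl/workbook.xml"};
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String part : parts) {
                zip.putNextEntry(new ZipEntry(part));
                zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        for (String name : listParts(new ByteArrayInputStream(samplePackage()))) {
            System.out.println(name);
        }
    }
}
```

To try it on a real file, pass a FileInputStream for the .xlsx, .docx or
.pptx to listParts instead of the in-memory sample.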

Then, I'm using XMLBeans + the Microsoft-supplied XSDs to build up the
low level objects to work with the different streams. These objects are
much like our record and record factory stuff.


On top of all this, I've written some classes to handle getting at the
interesting low level parts of the files (HSSFXML, HWPFXML and HSLFXML).
I've stubbed out some usermodel equivalents, but not done anything else
with them. Finally, I've written some text extractors, which use the low
level beans to get the text out into a format you can stuff into lucene.
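
To give a feel for what a text extractor has to do, here is a small sketch
that pulls plain text out of a WordprocessingML main document part. Note
that it swaps in the JDK's DOM parser in place of the XMLBeans objects
described above, so it is not the committed extractor code. The class
name PlainTextSketch is made up, and it assumes the text lives in w:t
elements inside w:p paragraphs, ignoring nested paragraphs, tabs and line
breaks.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not the POI extractor API.
final class PlainTextSketch {

    /** Namespace of the WordprocessingML main schema. */
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of each w:p paragraph, one paragraph per line. */
    static String extractText(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));

            StringBuilder out = new StringBuilder();
            NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element paragraph = (Element) paragraphs.item(i);
                // Concatenate every text run inside this paragraph.
                NodeList texts = paragraph.getElementsByTagNameNS(W_NS, "t");
                for (int j = 0; j < texts.getLength(); j++) {
                    out.append(texts.item(j).getTextContent());
                }
                out.append('\n');
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document xml", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
                + "</w:body></w:document>";
        System.out.print(extractText(xml));
    }
}
```

The returned string is the sort of thing you would hand to an indexer.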


Couple of snags to be aware of:
* openxml4j needs java 1.5, so we're going to want to keep this all separate
   whatever happens, so people who don't want ooxml can continue to use
   poi with java 1.3 / 1.4
* openxml4j hasn't done even an alpha release yet, so we're working off a
   jar I tested and built, hosted off people.apache.org/~nick/
* the ooxml xsds haven't been confirmed to be under an ASL compatible
   licence, so ant just downloads everyone their own copy of them, and
   they're not in svn
* everything ooxml related has its own ant tasks - compile-ooxml,
   test-ooxml and jar-ooxml. The existing ant tasks (eg compile, jar,
   dist) will all ignore the ooxml stuff, so you'll have to take positive
   steps to get it
* to confirm, if you do a dist or a jar, you won't get it, so it won't
   interfere with the 3.0.2 release process
* there's no formal documentation for it, just unit tests and javadocs.
   I'm holding off writing any until other people have sanity checked the
   api structure :)
* you're going to need to read the ECMA specs if you want to make much use
   of it as it stands, unless you know the ole2 equivalents really well
   and can spot how they've stuffed it all into xml...


Next up is probably write support. This may require some tweaking and
thought, as there are three objects relating to each stream:
* the PackagePart (xml file in the zip)
* the Document (bean for the root of the xml file) eg WorkbookDocument
* the main bean, eg CTWorkbook
As someone using the API, you'll want the CTWorkbook, as that's the thing
with the actual data on it. However, to save the changes, you need to get
the document bean the ct bean came from, and trigger the write from there,
and stuff the resulting bytes back into the PackagePart. So, we'll need to
track all these bits internally, so we can give the user the bean they
want, but still have everything available to write it out.
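
One possible shape for that internal tracking, as a sketch only: a
registry that hands the user the main bean but remembers which document
and part it came from. Everything here is hypothetical. Part and
DocumentBean are stand-ins for PackagePart and the XMLBeans document
types (not their real APIs), and ToyWorkbook is a toy in place of
CTWorkbook.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are real POI, XMLBeans or OpenXML4J classes.
final class PartTrackingSketch {

    /** Stand-in for a PackagePart: one xml file in the zip. */
    static final class Part {
        final String name;
        byte[] content = new byte[0];
        Part(String name) { this.name = name; }
    }

    /** Stand-in for a document bean such as WorkbookDocument. */
    interface DocumentBean<T> {
        T mainBean();        // the bean the user actually edits
        byte[] serialize();  // the write is triggered from the document
    }

    /** Toy main bean, standing in for something like CTWorkbook. */
    static final class ToyWorkbook {
        final List<String> sheetNames = new ArrayList<>();
    }

    /** Toy document bean that writes trivial xml (no escaping, demo only). */
    static final class ToyWorkbookDocument implements DocumentBean<ToyWorkbook> {
        private final ToyWorkbook workbook = new ToyWorkbook();
        public ToyWorkbook mainBean() { return workbook; }
        public byte[] serialize() {
            StringBuilder xml = new StringBuilder("<workbook>");
            for (String name : workbook.sheetNames) {
                xml.append("<sheet name=\"").append(name).append("\"/>");
            }
            return xml.append("</workbook>").toString().getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Remembers, per main bean, the document and part it belongs to. */
    static final class Registry {
        private static final class Entry {
            final Part part;
            final DocumentBean<?> document;
            Entry(Part part, DocumentBean<?> document) {
                this.part = part;
                this.document = document;
            }
        }

        // Keyed by identity, so beans need no equals/hashCode contract.
        private final Map<Object, Entry> byBean = new IdentityHashMap<>();

        /** Registers the pair and returns the main bean for the user. */
        <T> T open(Part part, DocumentBean<T> document) {
            T bean = document.mainBean();
            byBean.put(bean, new Entry(part, document));
            return bean;
        }

        /** Serializes via the owning document and stores the bytes in the part. */
        void save(Object mainBean) {
            Entry entry = byBean.get(mainBean);
            if (entry == null) {
                throw new IllegalArgumentException("bean was not opened through this registry");
            }
            entry.part.content = entry.document.serialize();
        }
    }
}
```

With this, calling code would do registry.open(part, document), change the
returned bean, then registry.save(bean), without ever touching the
document or the part directly. The identity map avoids having to modify
the generated beans, at the cost of keeping the registry alive alongside
them.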

(One option might be to nobble xmlbeans so that we can attach the
PackagePart onto the Document, and get back at the document from the main
bean, but that might prove to be far too much work, so we'll have to see)

If anyone has any good ideas for how to do the writing stuff, do pipe up.
It looks like it's going to be a little while before I get a copy of
office 2007, so there's no point me trying to knock up write support
before I have something to test opening with, which gives us a gap to
figure it out in :)


Oh, and I've put all the code in src/scratchpad/ooxml-src/ and
src/scratchpad/ooxml-testcases/. That signals it's at scratchpad levels of
completion, but it lives in separate directories because it needs java 1.5.
Once it's a bit more stable, we'll probably want to move it to its own top
level area under src.

Nick



--
Buni Meldware Communication Suite
http://buni.org
Multi-platform and extensible Email,
Calendaring (including freebusy),
Rich Webmail, Web-calendaring, ease
of installation/administration.
