[OctDev] ODS (OpenOffice.org Calc) read support added to IO package

Philip Nienhuis Wed, 30 Dec 2009 08:23:16 -0800

Hi there,

(+sorry for long post)
A little while ago I've added and -just a few minutes ago- updated ODS
read support for Octave in the IO package (in SVN)
(ODS = spreadsheet format used by OpenOffice.org Calc, it's the
spreadsheet subtype of Open Document Format).
While this was motivated mainly by an informal question by Alois
Schloegl, I could use octave ods support for my own needs as well.


As it stands, reading from ODS is fairly reliable (but admittedly a
bit slow) - though more testing is surely welcome. For me it works on
Windows (XP SP3) and on Linux (Mandriva 2009.0).
Writing ODS is still a bridge too far (see Developers's Notes below).
Relevant m-files:
odsread / odsopen / ods2oct / odsclose / odsfinfo (/ calccelladdress)

For this ODS support the java package is needed + one or both of
(odfdom.jar + xercesImpl.jar) or (jopendocument.jar), and in all cases,
these jars + rt.jar from your java jre / java jdk should be in your
javaclasspath. The former (odfdom + xercesImpl) is presently the
preferred option.
Get the jars from:
http://odftoolkit.org/projects/odfdom/downloads/directory/current-version
http://xerces.apache.org/mirrors.cgi
   or
www.jopendocument.org


Usage
===============
Usage is easy and modeled after the Excel xls stuff I committed earlier:

Use odsfinfo() to explore ods-files with unknown content; then just do:

[<outputdata>] = odsread (<spreadsheet/worksheet/...>)

              ----or (= more flexible)----

ods = odsread (filename.ods)               # Get filepointer;
[<data>] = ods2oct (ods, worksheet/range)  # Get actual raw data;
[.. = parsecell (...) ]                    # Separate numbers and text;
:                                          # Optionally get more data
:                                          # from same file;
ods = odsclose (ods)                       # Close file pointer.

(see the help from each m-file)

Like in the xls java stuff I contributed a while ago, my scripts also
return the outer column and row numbers where the data came from (AFAIK
Excel nor Matlab can do this), an IMO handy feature for exploring
spreadsheets with unknown content.


Contributors note for package maintainer
========================================
Although ODS support doesn't work without java, I left the requirements
for the java package as "suggested" as lacking java support is (should 
be) gracefully catched & the user informed appropriately.


Developer's notes (as far as I'm "developer" - more "Hobby programmer"):
=======================================================================
ODS (a zipped xml format) seems easy to parse (after unzipping you can
read it) so I once had the impression that simply unzipping the ods file
and then processing what's inside would do. But it turns out that things
quickly and steeply become complex far beyond basic needs (= just simple
ODS data exchange).

Unzipping is the first obstacle - the current tools in the miscellaneous
package do not allow direct unzipping into memory (as e.g. a char
array). Fiddling around with temporary disk files is not very elegant
IMO and might induce frustrations if your USB stick is almost full (or 
when your IT dept has particularly restrictive write policies). A
way around might be possible (e.g. IIRC V7 mat files are compressed and
that functionality is in core octave) but that's beyond me currently.
What's needed is unzipping into a stream and directly parsing that
stream into an XML tree (a la pipe).

After having explored various unzipped ODS files, having read the
discussions in nabble on XML, and having stumbled over the absolute lack
of usable (for me) documentation for the XML stuff in the miscellaneous 
package, I decided to definitely go for a java-based solution:
- First of all we need plain data exchange. Post-processing, formatting
and worksheet overhaul like inserting columns & merging cells can wait.
- Octave <-> spreadsheet data exchange primarily revolves around
rectangular arrays that don't map easily onto tree-like XML data
structures and vice versa.
- So, to make it simple, we need something that hides the gory XML
details and presents a "rectangular" (let's say table) model to octave.
- Such interfaces must have been written many many times. Why reinvent
the wheel?
- I'm not quite a java fanboy (too much bloat I think) but it *does*
help platform independence and -once you get the hang of it- it *does*
speed up development.
- Matlab is heavily dependent on java. Typing 'javaclasspath' (or the
equivalent) in Matlab returns a LONG list of java classes installed by
default. If I'm not mistaken Matlab may even add a java JDK during
installation - our IT dept installs all our SW so I can't tell for sure.
- I did find a number of (at first sight!) promising java solutions.
- Java adds a dependency; but IMO that is outweighed by the advantages:
(1) octave now has ODS (read) support and (2) even a mere hobby
programmer like me could make it with a reasonable time investment.

The currently available java solutions I found & selected have their
limitations:

--- jOpenDocument is the most promising, but currently lacks key
features (some vital methods are protected and therefore unusable, some
obviously needed methods are lacking, and of course there are bugs)
while -as stated in their support forums- development is primarily
driven by paying customers. Maybe a next version next year works
better... (I really hope so - jOpenDocument opens the way for easy ODS
write support).
But: jOpenDocument is *much* faster than ODFtoolkit.
I've left jOpenDocument support in because (1) it *does* work, and (2)
it should be developed further once the time is right.

--- ODFtoolkit is big and IMO leans too much to the ODF (xml) format
itself rather than abstracting away from it to make access easy. E.g.,
spreadsheet support mainly goes through Tables; in the xml domain that
is an obvious choice for developers but less so for what octave ODS
support needs (although it goes some way).
Once I was handed a table-cell I simply resorted to octave's native
string processing for the final bits - admittedly clumsy but it works
and it's faster than preparing and executing another java_invoke().
This also avoids potential problems as java (.util.Date) apparently has
epoch at 1-1-1970 (so earlier java dates might not be represented
properly) while OO.o Calc happily fits them in.


Write support
=============
This whole state of things is the main reason that I've only glanced at
ODS write support but decided to let that rest for now.
If someone desperately needs it, well, the beauty of open source is ....

Best wishes,

Philip


------------------------------------------------------------------------------
This SF.Net email is sponsored by the Verizon Developer Community
Take advantage of Verizon's best-in-class app development support
A streamlined, 14 day to market process makes app distribution fast and easy
Join now and get one step closer to millions of Verizon customers
http://p.sf.net/sfu/verizon-dev2dev 
_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev

[OctDev] ODS (OpenOffice.org Calc) read support added to IO package

Reply via email to