On 04/02/2012 18:03, Rob Oakes wrote:
Dear eLyXer Users and Developers,
I'm still at work on the import/export module for Microsoft Word documents. I'm
making pretty good progress. I've got a rough prototype that works pretty well
and I'm now starting to refine it.
My approach up to now has been to use regular expressions to match portions of
the document and then use a library to translate those to the corresponding
Word XML structures. It's working pretty well with my simple test documents.
Before going too far with this approach, though, I wanted to post another
general query.
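For readers following the thread, the regular-expression approach described above could be sketched roughly as follows. This is only an illustration, not the actual prototype: the function name, the style mapping, and the sample text are hypothetical, and only Python's standard library is used rather than any particular Word library. It matches flat `\begin_layout ... \end_layout` blocks and emits WordprocessingML paragraph elements.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML (the XML inside a .docx file).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)

# Matches one flat LyX layout block: \begin_layout <Name> ... \end_layout
LAYOUT_RE = re.compile(r"\\begin_layout (\S+)\n(.*?)\n\\end_layout", re.DOTALL)

# Hypothetical mapping from LyX layout names to Word style ids.
STYLE_MAP = {"Title": "Title", "Section": "Heading1", "Standard": "Normal"}


def w(tag):
    """Return a namespace-qualified WordprocessingML tag or attribute name."""
    return "{%s}%s" % (W_NS, tag)


def lyx_to_word_paragraphs(lyx_text):
    """Translate flat LyX layout blocks into a list of w:p elements."""
    paragraphs = []
    for name, body in LAYOUT_RE.findall(lyx_text):
        p = ET.Element(w("p"))
        ppr = ET.SubElement(p, w("pPr"))
        style = ET.SubElement(ppr, w("pStyle"))
        style.set(w("val"), STYLE_MAP.get(name, "Normal"))
        run = ET.SubElement(p, w("r"))
        text = ET.SubElement(run, w("t"))
        # LyX wraps lines inside a layout; collapse them into one run of text.
        text.text = " ".join(body.split())
        paragraphs.append(p)
    return paragraphs


# Hypothetical sample input, shaped like the body of a .lyx file.
SAMPLE = (
    "\\begin_layout Title\nMy Report\n\\end_layout\n\n"
    "\\begin_layout Standard\nHello\nworld.\n\\end_layout\n"
)

for paragraph in lyx_to_word_paragraphs(SAMPLE):
    print(ET.tostring(paragraph, encoding="unicode"))
```

A sketch like this breaks down as soon as layouts are nested inside insets (footnotes, tables, maths), because a non-greedy regular expression stops at the first `\end_layout`. That limitation is the reason for asking about an in-memory document representation.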
In the eLyXer library, there is already a robust set of tools used for
converting LyX documents to HTML. Does anyone know if the library is written in
such a way that getting a generic in-memory representation of the document
would be possible? It would be awesome to re-use as much existing code for the
Word document export as possible. That would allow me to support a broader
range of features, and would give me a framework for working with maths.
Strong suggestion: use LyX proper. I am quite sure you already know that
because I saw some patches from you in this area but I'll explain
anyway: LyX's own HTML export is so good and fast because it effectively
knows the in-memory representation of the document. You can't be faster
or more accurate than that. I mean, unless you want to rewrite LyX in
Python.
IIUC you want a single Python module for both import and export. But I
don't think this is a valid argument. As for the Word to LyX format
conversion, if you want to use this epub library, there must be a way to
use it from C++, I'm sure...
Any thoughts, Alex (and others)? I've downloaded the sources and have begun to
work through them, but before spending hours to days trying to wrap my head
around them, I thought I would ask.
AFAIK, eLyXer doesn't construct a document model. So you'd better spend
this time reading LyX's C++ code for exporting to HTML/XHTML ;-)
Abdel.