Re: eLyXer for Document Parsing

Abdelrazak Younes Sun, 05 Feb 2012 01:15:50 -0800

On 04/02/2012 18:03, Rob Oakes wrote:

Dear eLyXer Users and Developers,


I'm still at work on the import/export module for Microsoft Word documents. I'm 
making pretty good progress. I've got a rough prototype that works pretty well 
and I'm now starting to refine it.

My approach up to now has been to use regular expressions to match portions of 
the document and then use a library to translate those to the corresponding 
Word XML structures. It's working pretty well with my simple test documents.
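
A minimal sketch of what such a regex-based pass might look like, not the actual prototype: it assumes LyX paragraphs appear as flat `\begin_layout ... \end_layout` blocks and emits WordprocessingML paragraph elements with the standard library. The names `LAYOUT_RE`, `STYLE_MAP`, and `lyx_to_word_paragraphs`, and the mapping to the style IDs `Heading1`/`Heading2`, are hypothetical and chosen for illustration only.

```python
import re
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)

# Assumption: layouts are not nested (no insets containing their own layouts)
LAYOUT_RE = re.compile(r"\\begin_layout (\w+)\n(.*?)\n\\end_layout", re.DOTALL)

# Hypothetical mapping from LyX layout names to Word paragraph style IDs
STYLE_MAP = {"Section": "Heading1", "Subsection": "Heading2"}


def w(tag):
    """Return the namespace-qualified name for a w: element or attribute."""
    return "{%s}%s" % (W_NS, tag)


def lyx_to_word_paragraphs(lyx_source):
    """Convert flat LyX layout blocks into a list of <w:p> elements."""
    paragraphs = []
    for layout, text in LAYOUT_RE.findall(lyx_source):
        p = ET.Element(w("p"))
        style = STYLE_MAP.get(layout)
        if style:
            ppr = ET.SubElement(p, w("pPr"))
            ET.SubElement(ppr, w("pStyle"), {w("val"): style})
        run = ET.SubElement(p, w("r"))
        t = ET.SubElement(run, w("t"))
        # Collapse LyX's hard line wrapping into single spaces
        t.text = " ".join(text.split())
        paragraphs.append(p)
    return paragraphs
```

A pattern like this stops at the first `\end_layout` it sees, so it breaks on nested insets (footnotes, tables, floats). That limitation is the main reason an in-memory document model would be preferable.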

Before going too far with this approach, though, I wanted to post another 
general query.

In the eLyXer library, there is already a robust set of tools used for 
converting LyX documents to HTML. Does anyone know if the library is written in 
such a way that getting a generic in-memory representation of the document 
would be possible? It would be awesome to re-use as much existing code for the 
Word document export as possible. That would allow me to support a broader 
range of features, and would give me a framework for working with maths.

Strong suggestion: use LyX proper. I am quite sure you already know that, because I saw some patches from you in this area, but I'll explain anyway: LyX's own HTML export is so good and fast because it effectively knows the in-memory representation of the document. You can't be faster or more accurate than that. I mean, unless you want to rewrite LyX in Python.

IIUC, you want a single Python module for both import and export. But I don't think this is a valid argument. As for the Word to LyX format conversion, if you want to use this epub library, there must be a way to use that in C++, I'm sure...

Any thoughts Alex (and others)? I've downloaded the sources and have begun to 
work through them, but before spending hours to days trying to wrap my head 
around them, I thought I would ask.

AFAIK, eLyXer doesn't construct a document model. So you'd better spend this time reading the C++ code for exporting to html/xhtml ;-)


Abdel.
