Hi,

On Tue, Sep 02, 2025 at 04:15:44PM +1000, Chris Sherlock 
<[email protected]> wrote:
> > To make it easier to maintain the code and enable other open source 
> > software to use it, it can be useful to split off the import export code as 
> > libraries. This way other software such as NAPS2, can then use the 
> > libraries to scan and save to doc or docx files. These produced files can 
> > then be edited in Libreoffice for instance, it would help to aid its OCR 
> > functions.
> > 
> > What would it take to split off this code into independent libraries which 
> > can be installed and used by other software, as well as Libreoffice?

This is not as easy as it sounds, because the DOC and DOCX import code
maps those specific formats directly to Writer's document model. Other
software has its own document model and would need its own mapping to
it, so there is not much to share.

Additionally, given that e.g. DOCX can embed XLSX or PPTX files, you
still need all of LibreOffice to handle these documents properly; you
can't just split off some of the import code into a separate library.

If other software wants to reuse LibreOffice's import filters, perhaps
use LibreOffice to convert to (flat) ODF, then only handle that one
format in your application?
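
A minimal sketch of that approach, assuming a "soffice" binary is on
the PATH and that the application only needs the plain text of
headings and paragraphs. The function names are made up for
illustration; only the --headless, --convert-to and --outdir switches
and the ODF namespaces come from LibreOffice and the ODF spec:

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespaces defined by the OpenDocument specification.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def build_convert_command(source, outdir, soffice="soffice"):
    """Return the argv for a headless conversion to flat ODT (.fodt)."""
    return [soffice, "--headless", "--convert-to", "fodt",
            "--outdir", str(outdir), str(source)]


def convert_to_flat_odt(source, outdir, soffice="soffice"):
    """Run the conversion and return the path of the expected output."""
    subprocess.run(build_convert_command(source, outdir, soffice), check=True)
    return Path(outdir) / (Path(source).stem + ".fodt")


def extract_paragraphs(fodt_xml):
    """Return the text of each heading and paragraph in the document body.

    Simplification: text:s, text:tab and text:line-break elements are
    ignored, and a paragraph nested inside another one (e.g. in a text
    box) is reported both on its own and as part of its parent.
    """
    root = ET.fromstring(fodt_xml)
    body = root.find("office:body/office:text", NS)
    if body is None:
        return []
    wanted = (f"{{{NS['text']}}}p", f"{{{NS['text']}}}h")
    return ["".join(el.itertext()) for el in body.iter() if el.tag in wanted]
```

Usage would then be something like
extract_paragraphs(convert_to_flat_odt("scan.docx", "out").read_bytes()),
so the application only ever has to parse one XML format.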

> Does anyone know if there is a library based on librevenge that handles doc 
> and docx files?

I'm not aware of anything like that;
https://www.documentliberation.org/projects/ has a list of existing
importers.

Regards,

Miklos
