Hi,

On Tue, Sep 02, 2025 at 04:15:44PM +1000, Chris Sherlock <[email protected]> wrote:
> > To make it easier to maintain the code and enable other open source
> > software to use it, it can be useful to split off the import/export code
> > into libraries. This way other software, such as NAPS2, can then use the
> > libraries to scan and save to DOC or DOCX files. These files can then be
> > edited in LibreOffice, for instance, which would help aid its OCR
> > functions.
> >
> > What would it take to split off this code into independent libraries that
> > can be installed and used by other software, as well as by LibreOffice?
This is not as easy as it sounds, because the DOC and DOCX import code
naturally maps from those specific formats to Writer's document model. So if
other software wanted to use this code, it would need to map to its own,
different document model, and there is not much to share.

Additionally, given that DOCX can embed e.g. XLSX or PPTX files, you still
need all of LibreOffice to handle these documents properly; you can't just
split off some of the import code into a separate library.

If other software wants to reuse LibreOffice's import filters, perhaps it
could use LibreOffice to convert to (flat) ODF, and then handle only that one
format in the application?

> Does anyone know if there is a library based on librevenge that handles doc
> and docx files?

I'm not aware of anything like that;
https://www.documentliberation.org/projects/ has a list of existing importers.

Regards,

Miklos
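For the flat ODF route suggested above, a minimal Python sketch could look like the following. It assumes a `soffice` binary on `PATH` and relies on LibreOffice's headless `--convert-to` command-line option to produce a Flat XML ODF text file (`.fodt`). The helper names (`conversion_command`, `convert_to_fodt`, `paragraphs_from_fodt`) are hypothetical, and the parser only extracts the plain text of headings and paragraphs; styles, images and embedded objects are ignored.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path

# ODF namespace URIs, as defined by the OpenDocument specification.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
PARA_TAGS = (f"{{{NS['text']}}}p", f"{{{NS['text']}}}h")


def conversion_command(src: Path, outdir: Path, soffice: str = "soffice") -> list[str]:
    """Build the headless LibreOffice command converting src to flat ODF text (.fodt)."""
    return [soffice, "--headless", "--convert-to", "fodt",
            "--outdir", str(outdir), str(src)]


def convert_to_fodt(src: Path, outdir: Path) -> Path:
    """Convert a DOC/DOCX file with LibreOffice (assumed installed).

    Returns the path where LibreOffice is expected to write the .fodt file.
    """
    subprocess.run(conversion_command(src, outdir), check=True)
    return outdir / (src.stem + ".fodt")


def paragraphs_from_fodt(xml_text: str) -> list[str]:
    """Return the plain text of every heading and paragraph in a flat ODF text document."""
    root = ET.fromstring(xml_text)
    body = root.find("office:body/office:text", NS)
    if body is None:
        return []
    result = []
    # iter() walks the whole subtree, so paragraphs inside lists and tables are included.
    for elem in body.iter():
        if elem.tag in PARA_TAGS:
            # itertext() joins the text of nested spans; space/tab placeholder
            # elements such as <text:s/> are ignored, which is fine for a sketch.
            result.append("".join(elem.itertext()))
    return result
```

Usage, assuming a hypothetical `scan.docx` and an existing `out` directory: `paragraphs_from_fodt(convert_to_fodt(Path("scan.docx"), Path("out")).read_text(encoding="utf-8"))`. Handling only the single flat ODF format this way keeps all of the DOC/DOCX (and embedded XLSX/PPTX) complexity inside LibreOffice.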
