On Mon, Aug 20, 2018 at 10:10:16AM +0200, Félix Sipma wrote:
> Are you aware of "libreoffice --convert-to pdf --outdir . document.doc"?

Yes.

> It seems to work well, so I'm not sure what lloconv brings to the table :-).

It's faster for a start - on a random .doc I have to hand:

$ time lloconv disclosure.doc tmp.pdf
No language whitelisted, turning off the language support.

real    0m0.431s
user    0m0.369s
sys     0m0.069s
$ time libreoffice --convert-to pdf --outdir . disclosure.doc > /dev/null

real    0m0.670s
user    0m0.548s
sys     0m0.098s

If you're just converting a single document then either is likely fine,
but for converting a lot of documents the extra overhead often starts to
matter.

You can batch convert many files to a single format with --convert-to,
but lloconv's server mode allows reusing a single LibreOfficeKit
instance to perform conversions to any formats you want. A batch
conversion with --convert-to also needs enough scratch space for all the
output files, whereas an lloconv in server mode can be used to process
files on demand.

With lloconv you can specify options to the conversion (maybe
--convert-to supports that too, but if so it seems to be undocumented).
Sadly the LibreOfficeKit options are also rather poorly documented. I
know of "SkipImages" (which is handy if you just want to extract text
for indexing for example) as that's mentioned in the LOK headers, but if
you can figure out the names I think this should allow setting options
that you can set when exporting by hand.

Also lloconv seems to be more reliable - on the trivial testcase from
the autopkgtest in the lloconv package, libreoffice --convert-to gives a
blank PDF for no obvious reason:

$ echo '<html><title>foo</title><body>hello world</body></html>' > in.html
$ libreoffice --convert-to pdf --outdir . in.html
[...]
$ pdftotext in.pdf
$ ls -l in.txt
-rw-r--r-- 1 olly olly 1 Aug 21 08:46 in.txt

Cheers,
    Olly