I've been using pdftohtml (get the latest version from poppler.freedesktop.org) with the '-complex -xml' options to generate an XML file, which I then process with a Perl program to make an ePub. Depending on the PDF, it does a pretty good job, and you may be able to import the XML directly.
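For anyone who wants to try the same route without Perl, here is a rough Python sketch of the post-processing step. It is not my actual script, only an illustration. It assumes the XML has a <pdf2xml> root containing <page number="..."> elements, each holding <text top="..." left="..."> fragments with integer coordinates, which is what the versions I have seen produce. The function name page_lines and the SAMPLE data are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed pdftohtml -xml layout (not real tool output).
SAMPLE = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="100" left="200" width="80" height="16" font="0">World</text>
<text top="100" left="100" width="80" height="16" font="0"><b>Hello</b></text>
<text top="140" left="100" width="200" height="16" font="0">Second line</text>
</page>
</pdf2xml>"""


def page_lines(xml_text):
    """Return {page_number: [line, ...]}, rebuilding lines from text fragments."""
    root = ET.fromstring(xml_text)
    pages = {}
    for page in root.iter("page"):
        rows = {}
        for fragment in page.iter("text"):
            # itertext() also picks up text nested inside <b>, <i> or <a>.
            content = "".join(fragment.itertext()).strip()
            if not content:
                continue
            top = int(fragment.get("top", "0"))
            left = int(fragment.get("left", "0"))
            # Fragments sharing the same "top" value are treated as one line.
            rows.setdefault(top, []).append((left, content))
        lines = []
        for top in sorted(rows):
            # Order the fragments of a line from left to right.
            ordered = sorted(rows[top])
            lines.append(" ".join(text for _, text in ordered))
        pages[int(page.get("number", "0"))] = lines
    return pages


if __name__ == "__main__":
    for number, lines in page_lines(SAMPLE).items():
        print("page", number)
        for line in lines:
            print("  " + line)
```

To feed it real data, run something like "pdftohtml -c -xml spec.pdf spec" first (-c is the flag the man page lists for complex output; check "pdftohtml -h" on your system) and pass the contents of the resulting XML file to page_lines. Grouping by exact "top" value is a simplification: text set on slightly different baselines will end up on separate lines.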
On 18 July 2014 15:05, TimA <[email protected]> wrote:

> Hi Terry
>
> On 18/07/14 14:47, Terry Coles wrote:
>
>> Hi,
>>
>> Does anyone know how I can use tools available in Linux to convert a PDF file
>> to MS Word .doc or .docx format (or even to LibreOffice .odt)?
>
> Closest I'm aware of is pdftotext (also pdf2text, pdf2txt etc). But of
> course you'll lose the formatting. There's also pdf2ps from which maybe you
> can use
>
> http://www.coolutils.com/PS-to-DOC
>
> or something similar
>
> Cheers
>
> Tim
>
>> I thought I could do it using LibreOffice, but it reads the PDF content as if it
>> is a series of graphical objects with text labels. As a consequence, I can
>> only save it as .odg or export it to a graphical format.
>>
>> The problem is that we have a number of specifications in PDF format. We need
>> to get them into an editable form (preferably word) because they need
>> translating.
>>
>> At work I tried the real thing (Adobe Writer), but it seriously mangles the
>> format, even when it works.
>>
>> The originals seem to have been created using a number of different tools; some
>> were created in MS Word 2010, some PDFCreator (presumably from a Word Source,
>> some with Acrobat Distiller and some by conversion from Postscript. Adobe
>> Writer was only able to save three out of five documents and they were not very
>> good.
>
> --
> Next meeting: Bournemouth, Tuesday, 2014-08-05 20:00
> Meets, Mailing list, IRC, LinkedIn, ... http://dorset.lug.org.uk/
> New thread on mailing list: mailto:[email protected]
> How to Report Bugs Effectively: http://goo.gl/4Xue

--
best regards,
웃 Victor Churchill, Bournemouth

