Control: severity -1 minor
Control: retitle -1 antiword: give more helpful error for docx

On Sat, Aug 23, 2014 at 10:27:45AM +0200, Vincent Lefevre wrote:
> On a Microsoft Word 2007+ document (according to the "file" utility),
> I get:
>
> $ antiword test.docx
> test.docx is not a Word Document.
>
> which is wrong since the "file" utility correctly recognized this file
> as a Word document:
>
> $ file test.docx
> test.docx: Microsoft Word 2007+

This is the new-style XML-in-a-zip-container format, which is completely
different to the binary Microsoft Word formats which antiword handles
most of.

There's no realistic likelihood of antiword supporting this - the last
antiword upstream release was 2005-10-21 (which predates this XML
format). The antiword package is still useful for handling the files it
handles, but I don't plan to take on maintaining the upstream code to
the extent of adding support for entirely new formats.

The error message isn't very helpful though - it was correct at the time
of the last upstream release, but arguably isn't now, as you point out.
We can at least improve that.

Not sure what a good lightweight extractor for docx files is - I see
docx2txt in the archive, but I've never tried it.

Cheers,
    Olly
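
To illustrate the kind of check a more helpful error could be based on,
here is a rough Python sketch (not antiword code, which is C). It assumes
that a docx is a zip archive whose main part is stored as
word/document.xml, which is what Word normally writes but is not
guaranteed, and that binary .doc files start with the OLE2 compound file
signature. The function name and the returned messages are made up for
the example.

```python
import io
import zipfile

# Signature at the start of a zip local file header.
ZIP_MAGIC = b"PK\x03\x04"
# Signature of an OLE2 compound file, the container used by binary .doc files.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def describe_word_data(data):
    """Return a short guess at what kind of file the bytes in `data` are.

    Hypothetical helper: the messages are illustrative, not antiword's.
    """
    if data.startswith(OLE2_MAGIC):
        return "OLE2 container (possibly a binary Word document)"

    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = archive.namelist()
        except zipfile.BadZipFile:
            # Starts like a zip but cannot be read as one.
            return "not a recognised Word document"
        # Assumption: the main document part lives at word/document.xml.
        if "word/document.xml" in names:
            return "Word 2007+ (docx) document, not a binary Word file"
        return "zip archive, not a Word document"

    return "not a recognised Word document"
```

With something like this, the "is not a Word Document" case could be
split into "this is a docx, which is not supported" and "this is not a
Word file at all".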