Control: severity -1 minor
Control: retitle -1 antiword: give more helpful error for docx

On Sat, Aug 23, 2014 at 10:27:45AM +0200, Vincent Lefevre wrote:
> On a Microsoft Word 2007+ document (according to the "file" utility),
> I get:
> 
> $ antiword test.docx
> test.docx is not a Word Document.
> 
> which is wrong since the "file" utility correctly recognized this file
> as a Word document:
> 
> $ file test.docx
> test.docx: Microsoft Word 2007+

This is the new-style XML-in-a-zip-container format, which is completely
different to the binary Microsoft Word formats, most of which antiword
handles.

There's no realistic likelihood of antiword supporting this - the last
antiword upstream release was 2005-10-21 (which predates this XML
format).  The antiword package is still useful for the formats it does
handle, but I don't plan to take on maintaining the upstream code to
the extent of adding support for entirely new formats.

The error message isn't very helpful though - it was correct at the time
of the last upstream release, but arguably isn't now, as you point out.
We can at least improve that.
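
For what it's worth, here is a rough sketch of the sort of check that would allow a more helpful message. It is in Python rather than antiword's C, and it is not antiword's actual code. Binary Word 97-2003 .doc files use the OLE2 compound file container, which starts with the bytes D0 CF 11 E0 A1 B1 1A E1. A .docx is a ZIP archive that conventionally keeps its main body in word/document.xml. The function names (describe_word_container, not_word_message) and the wording of the new message are made up for illustration.

```python
import io
import zipfile

# Signature at the start of an OLE2 compound file (binary Word 97-2003 .doc).
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def describe_word_container(data: bytes) -> str:
    """Classify raw file contents as 'ole2', 'docx', 'zip' or 'unknown'.

    Hypothetical helper: only looks at the container, not the Word version.
    """
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as zf:
            # Word 2007+ conventionally stores the document body here.
            if "word/document.xml" in zf.namelist():
                return "docx"
        return "zip"
    return "unknown"


def not_word_message(name: str, data: bytes) -> str:
    """Error text for a file already rejected as not a supported Word file.

    Hypothetical wording: only the .docx case gets a more specific message.
    """
    if describe_word_container(data) == "docx":
        return (f"{name} is a Microsoft Word 2007+ (.docx) document, "
                "which antiword does not support.")
    return f"{name} is not a Word Document."
```

Any file that isn't recognised as a .docx still gets the existing message, so the change would only affect the case reported here.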

Not sure what a good lightweight extractor for docx files is - I see
docx2txt in the archive, but I've never tried it.

Cheers,
    Olly
