Hello Olly, thanks for your e-mail.

> I'm not expecting absolute proof, but it'd be good to test it on a
> selection of word documents, and compare output with and without
> the patch.

Okay, will do once the patch is ready, which, as I said, will not happen
soon because it's a lot of work.

> Don't read anything into that - it's just an artifact of how I replied
> (I just fetched the mailbox for the bug with bts show -m, so replied
> to the message as it was before the X-Debbugs-Cc got processed).

I understand.

> > Thanks. If I can't talk to anybody and have to discover things by myself
> > it may take me some time to come up with a patch because the spec of the
> > format offered by Microsoft is non-trivial and, for me, not so easy to
> > read and understand.
> 
> It might be worth trying some of the other options (if you haven't
> already).

So far I have tried catdoc and possibly wordview, which were no more
successful.

> wv has a command line extractor (wvText), which in my experience handles
> some files better than antiword (and others less well).  Sadly it isn't
> actively maintained upstream either these days (last release was just
> under 3 years ago).  ISTR antiword is faster than wvText.  
> 
> There's wv2, but that doesn't come with a command line tool - it's
> just a library.  That's also not active upstream (last release nearly 4
> years ago).
> 
> There's also unoconv which uses libreoffice to do the extraction - that
> means the extraction code is actively maintained upstream, and it seems
> to work with most files I've tried.  The downside is it is rather slow
> and memory hungry, and I've found it randomly fails sometimes.  I think
> the issues stem from trying to remote control libreoffice, which of
> course thinks it's a GUI application rather than a command line tool
> or library.

I will take another look at all of these, thanks. I think LibreOffice also
fails to do the conversion. I don't know, but this font-as-codepage trick
does not seem to be very well supported. It looks as if people are mostly
unaware of the problem. Perhaps it's because it has been used only for
exotic fonts such as Tibetan and Sanskrit ones.

Best wishes,
Sébastien.
