Ryan Ackley wrote:

David,

The textmining.org stuff only works on Word97 and above. It should work with
no exceptions on any Word 97 doc.

Could be we had pre-Word97 docs, as some date from 1996 when we (Lumos at least)
were founded.

If you have any problems then it is from
an earlier version (most likely Word 6.0) or it's not a Word document. If
this isn't the case you need to email me so I can fix it and make it better
for the benefit of everyone. I plan on adding support for Word 6 in the
future.

Ryan Ackley

----- Original Message -----
From: "David Spencer" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, March 05, 2003 6:24 PM
Subject: my experiences - Re: Parsing Word Docs




FYI I tried the textmining.org/POI combo on a collection of 350 Word
docs people have developed here over the years, and it failed on 33% of
them, with exceptions being thrown about the formats being invalid.

I tried "antiword" ( http://www.winfield.demon.nl/ ), a native & free
*.exe, and it worked great (well, it seemed to process all the files fine).

I've had similar experiences with PDF - I tried the 3 or so freeware/Java
PDF text extractors and they were not as good as the exe, pdftotext,
from foolabs (http://www.foolabs.com/xpdf/).

Not satisfying to a Java developer, but these work better than anything
else I can find.

You get source and I use them on windows & linux, no prob.
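
For anyone who wants to drive these two tools from Java, here is a minimal
sketch of one way it could be done. It assumes antiword and pdftotext are
installed and on the PATH, that "antiword <file>" prints the text to stdout,
that "pdftotext <file> -" does the same, and that the output can be read as
UTF-8 (which depends on how the tools are configured). The class and method
names are hypothetical, and it needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: shells out to a native extractor and returns its stdout.
final class ExternalTextExtractor {

    // antiword writes the extracted text to stdout when given a file name.
    static List<String> antiwordCommand(Path doc) {
        return List.of("antiword", doc.toString());
    }

    // For pdftotext, "-" as the output file name sends the text to stdout.
    static List<String> pdftotextCommand(Path pdf) {
        return List.of("pdftotext", pdf.toString(), "-");
    }

    // Runs the command, collects stdout, and fails on a non-zero exit code.
    static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                // Discard stderr so a chatty tool cannot fill the pipe and block.
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        String text;
        try (InputStream out = process.getInputStream()) {
            // Assumption: the tool emits UTF-8; adjust the charset if it does not.
            text = new String(out.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(command.get(0) + " exited with code " + exitCode);
        }
        return text;
    }
}
```

Calling ExternalTextExtractor.run(ExternalTextExtractor.antiwordCommand(path))
would then give you a String to hand to whatever indexing code you already
have. A file the tool cannot read shows up as an IOException, so a batch run
can log it and move on to the next document.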



Eric Anderson wrote:



I'm interested in using the textmining/textextraction utilities using
Apache POI, that Ryan was discussing. However, I'm having some difficulty
determining what the insertion point would be to replace the default
parser with the Word parser.

Any assistance would be appreciated.





LanRx Network Solutions, Inc.
Providing Enterprise Level Solutions...On A Small Business Budget

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





