Eric,
The problem with antiword is that it is a native application. You must write a class that uses JNI to access the native code.
No you don't. Just use Runtime.exec - no JNI :)
If you link your Java code with native code you have lost one of the biggest benefits of Java, platform independence.
Yeah, but given that the source for antiword is available, it runs on all the platforms I use (windows/linux/sun), and it seems to accept older formats than POI/textmining, it gets the job done better than anything else.
I would suggest you use the library at http://textmining.org. Contrary to what David Spencer says, it should work on all documents created with Word 97 or above. I have literally indexed 100,000s of unique documents using my library.
Ryan Ackley
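For what it's worth, the "no JNI" route suggested above could look roughly like the sketch below. It is only an illustration under a few assumptions: that an antiword executable is on the PATH, that it prints plain text to standard output when given a .doc path, and that its output is in the platform default charset. It uses ProcessBuilder, the later standard-library alternative to calling Runtime.exec directly, and the class and method names (ExternalTextExtractor, run, extractWordText) are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs an external text extractor and captures its stdout.
final class ExternalTextExtractor {

    /** Runs a command, waits for it to finish, and returns everything it wrote to stdout. */
    static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Send the child's stderr to our own stderr so an unread pipe cannot fill up and block it.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();
        // The tool reads its input from a file argument, so close its stdin right away.
        process.getOutputStream().close();

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command " + command + " exited with code " + exitCode);
        }
        // Assumption: the tool writes text in the platform default charset.
        return new String(buffer.toByteArray(), Charset.defaultCharset());
    }

    /** Assumes "antiword" is on the PATH; the .doc file is only read, never modified. */
    static String extractWordText(String docPath) throws IOException, InterruptedException {
        return run(Arrays.asList("antiword", docPath));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ExternalTextExtractor <file.doc>");
            return;
        }
        System.out.println(extractWordText(args[0]));
    }
}
```

The returned string could then be handed to whatever code builds the index entry, while the original DOC stays untouched on disk. The same run method should work for other command-line extractors, provided they write their text to standard output.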
----- Original Message -----
From: "Eric Anderson" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, March 05, 2003 7:14 PM
Subject: Re: my experiences - Re: Parsing Word Docs
Ok. Thanks for the tip.
I downloaded and compiled Antiword, and would now like to add it to my indexing class. However, I'm not sure how the application would be called, and from where it would be called.
How will I have the class parse the document through Antiword to create the keyword index, but leaving the DOC intact, as Mr. Litchfield did with PDFBox?
Your assistance is greatly appreciated.
Eric Anderson 815-505-6132
Quoting David Spencer <[EMAIL PROTECTED]>:
FYI I tried the textmining.org/POI combo on a collection of 350 Word docs people have developed here over the years, and it failed on 33% of them with exceptions being thrown about the formats being invalid.
I tried "antiword" (http://www.winfield.demon.nl/), a native & free *.exe, and it worked great (well, it seemed to process all the files fine).
I've had similar experiences with PDF - I tried the 3 or so freeware/java PDF text extractors and they were not as good as the exe, pdftotext, from foolabs (http://www.foolabs.com/xpdf/).
Not satisfying to a java developer but these work better than anything else I can find.
You get source and I use them on windows & linux, no prob.
Eric Anderson wrote:
I'm interested in using the textmining/textextraction utilities using Apache POI, that Ryan was discussing. However, I'm having some difficulty determining what the insertion point would be to replace the default parser with the word parser.
Any assistance would be appreciated.
LanRx Network Solutions, Inc. Providing Enterprise Level Solutions...On A Small Business Budget
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]