
Re: my experiences - Re: Parsing Word Docs

2003-03-06 Thread Ryan Ackley

David,

The textmining.org stuff only works on Word 97 and above. It should work with
no exceptions on any Word 97 doc. If you have any problems, then the file is
either from an earlier version (most likely Word 6.0) or it's not a Word
document. If this isn't the case, please email me so I can fix it and make it
better for the benefit of everyone. I plan on adding support for Word 6 in the
future.

Ryan Ackley

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: my experiences - Re: Parsing Word Docs

2003-03-06 Thread Ryan Ackley

Eric,

The problem with antiword is that it is a native application. You must write
a class that uses JNI to access the native code. If you link your Java code
with native code, you lose one of the biggest benefits of Java: platform
independence. I would suggest you use the library at http://textmining.org.
Contrary to what David Spencer says, it should work on all documents created
with Word 97 or above. I have indexed hundreds of thousands of unique documents
using my library.

Ryan Ackley

- Original Message -
From: Eric Anderson [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, March 05, 2003 7:14 PM
Subject: Re: my experiences - Re: Parsing Word Docs


 Ok. Thanks for the tip.

 I downloaded and compiled Antiword, and would like to now add it to my
 indexing class. However, I'm not sure how the application would be called,
 and from where it would be called.

 How will I have the class parse the document through Antiword to create the
 keyword index, but leaving the DOC intact, as Mr. Litchfield did with PDFBox?

 Your assistance is greatly appreciated.

 Eric Anderson
 815-505-6132



Re: my experiences - Re: Parsing Word Docs

2003-03-06 Thread Eric Anderson

I'll go either way, but I still don't know how to implement the Word parser, as
opposed to the PDF parser or HTML parser.

Eric Anderson
LanRx Network Solutions


LanRx Network Solutions, Inc.
Providing Enterprise Level Solutions...On A Small Business Budget


AW: my experiences - Re: Parsing Word Docs

2003-03-06 Thread Borkenhagen, Michael (ofd-ko zdfin)

Ryan,

I tried to use textmining to extract text from Word 97 documents. Some German
characters like ä, ü etc. aren't parsed correctly, so I can't use it,
because many German words include these characters. I don't know whether the
cause is textmining or HDF from POI (HSSF from POI parses these characters
correctly). Do you have any hints for me?

Michael


AW: my experiences - Re: Parsing Word Docs

2003-03-06 Thread Borkenhagen, Michael (ofd-ko zdfin)

thx a lot :) I'll try it

- Original Message -
From: Mario Ivankovits [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 06, 2003 14:00
To: Lucene Users List
Subject: Re: my experiences - Re: Parsing Word Docs


The problems with German umlauts should be fixed.
I have posted a patch for them (see
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14735), and it should be
applied by now.
I haven't cross-checked it yet.

I currently use POI to index documents with Lucene, but I do not use the
standard way with a Lucene Word-document class (like the pdfdocument).
I did have some problems getting the text from old documents,
but in that case my system falls back to a simple STRINGS parser (it filters
every human-readable char from the document file).

byebye
Mario
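
The STRINGS fallback Mario describes could look roughly like the sketch below. It is not his code: the class name StringsFallback, the method extract and the minimum run length are assumptions made up for illustration. It keeps every run of printable ASCII bytes that is at least minLen long and drops everything else.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: a "strings"-style fallback text extractor. */
public class StringsFallback {

    /**
     * Returns every run of at least minLen printable ASCII bytes found
     * in data, with the runs separated by a single space.
     */
    public static String extract(byte[] data, int minLen) {
        StringBuilder out = new StringBuilder();
        StringBuilder run = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF;
            if (c >= 0x20 && c <= 0x7E) {
                run.append((char) c);      // printable: extend the current run
            } else {
                flush(run, out, minLen);   // binary byte: the run ends here
            }
        }
        flush(run, out, minLen);           // do not lose a run at end of file
        return out.toString();
    }

    /** Appends the run to out if it is long enough, then resets it. */
    private static void flush(StringBuilder run, StringBuilder out, int minLen) {
        if (run.length() >= minLen) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(run);
        }
        run.setLength(0);
    }

    public static void main(String[] args) {
        byte[] sample = "\u0001\u0002Hello world\u0000\u0003ab\u0004Lucene index\u0000"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(extract(sample, 4)); // prints "Hello world Lucene index"
    }
}
```

One limitation of this sketch: text that a Word file stores as 16-bit characters has a zero byte after every ASCII letter, so an 8-bit filter like this one would break it into one-character runs and discard it. Umlauts are also outside the ASCII range used here.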


Re: my experiences - Re: Parsing Word Docs

2003-03-06 Thread David Spencer

Eric Anderson wrote:

Ok. Thanks for the tip.

I downloaded and compiled Antiword, and would like to now add it to my indexing 
class. However, I'm not sure how the application would be called, 

How? You exec it, passing the file name, and it prints the ASCII text to stdout.
This method takes the file name (e.g. c:/dir1/dir2/foo.doc) and returns
the output from antiword as one big string:
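   // These imports go at the top of the enclosing source file.
   import java.io.BufferedReader;
   import java.io.InputStreamReader;

   // Runs antiword on the given file and returns everything it
   // prints to stdout as one string.
   public static String getAntiText( String fn)
      throws Throwable
   {
      Process p = null;
      BufferedReader br = null;
      try
      {
         p = Runtime.getRuntime().exec( new String[] { anti, fn});
         // BufferedReader replaces the deprecated DataInputStream.readLine().
         br = new BufferedReader( new InputStreamReader( p.getInputStream()));
         String line;
         StringBuffer sb = new StringBuffer();
         while ( ( line = br.readLine()) != null)
         {
            sb.append( line);
            sb.append( " "); // join the lines with a space
         }
         return sb.toString();
      }
      finally
      {
         // Best-effort cleanup; br or p is still null if exec failed.
         try { br.close(); } catch( Throwable t) { }
         try { p.waitFor(); } catch( Throwable t) { }
         try { p.destroy(); } catch( Throwable t) { }
      }
   }

   // Path to the antiword executable; adjust for your installation.
   private static String anti = "c:/antiword/antiword.exe";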

and from 
where it would be called.

From where? Call it whenever the file is a Word doc, e.g. its name ends with .doc.

How will I have the class parse the document through Antiword to create the 
keyword index, but leaving the DOC intact, as Mr. Litchfield did with PDFBox?

Hmmm, not sure what the exact issue is, but is this the answer:

doc.add( Field.Text( "contents", new StringReader( getAntiText( file_name_of_word_file))));
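
The "name ends with .doc" check above can be sketched as a small dispatcher. Everything in it is hypothetical (the class ExtractorChooser, the enum Kind, the list of extensions); it only illustrates where a Word branch would sit next to the PDF and HTML branches of an indexer.

```java
import java.util.Locale;

/** Hypothetical dispatcher: picks a text extractor from the file name. */
public class ExtractorChooser {

    public enum Kind { WORD, PDF, HTML, OTHER }

    /** Decides by file extension only, ignoring upper/lower case. */
    public static Kind choose(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".doc")) {
            return Kind.WORD;   // e.g. hand the file to getAntiText()
        }
        if (name.endsWith(".pdf")) {
            return Kind.PDF;    // e.g. hand the file to a PDF extractor
        }
        if (name.endsWith(".htm") || name.endsWith(".html")) {
            return Kind.HTML;   // e.g. hand the file to an HTML parser
        }
        return Kind.OTHER;      // index as plain text, or skip
    }

    public static void main(String[] args) {
        System.out.println(choose("c:/dir1/dir2/foo.doc"));   // prints WORD
        System.out.println(choose("manual.PDF"));             // prints PDF
        System.out.println(choose("index.html"));             // prints HTML
        System.out.println(choose("notes.txt"));              // prints OTHER
    }
}
```

In the WORD case the indexing code would build the Lucene document from the extracted string, as in the doc.add line above. The original .doc file is only read, never modified.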


Re: my experiences - Re: Parsing Word Docs

2003-03-06 Thread David Spencer

Ryan Ackley wrote:

Eric,

The problem with antiword is that it is a native application. You must write
a class that uses JNI to access the native code. 

No you don't. Just use Runtime.exec - no JNI :)
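
A minimal sketch of the no-JNI approach, using ProcessBuilder (a newer API than Runtime.exec that does the same job). The class name ExternalTool and the method run are made up for illustration. The demo in main runs the JVM's own java launcher only so that the example works without antiword installed; for Word files the command would be the path to antiword followed by the file name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: runs an external program and captures its output. */
public class ExternalTool {

    /** Runs the command and returns its combined stdout and stderr. */
    public static String run(List<String> command) {
        try {
            ProcessBuilder pb = new ProcessBuilder(command);
            // Merge stderr into stdout so the child cannot block on a full
            // stderr pipe while we are reading stdout.
            pb.redirectErrorStream(true);
            Process p = pb.start();
            StringBuilder out = new StringBuilder();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    out.append(line).append('\n');
                }
            }
            int exit = p.waitFor();
            if (exit != 0) {
                throw new IllegalStateException("exit code " + exit + ": " + out);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running " + command, e);
        }
    }

    public static void main(String[] args) {
        // Demo only: ask the running JVM's launcher for its version.
        String javaBin = System.getProperty("java.home") + "/bin/java";
        System.out.print(run(Arrays.asList(javaBin, "-version")));
    }
}
```

Merging the two streams is a trade-off: it keeps the code short and avoids a blocked child process, but any warnings the tool prints on stderr end up in the captured text, which may or may not be acceptable for indexing.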

If you link your java code
with native code you have lost one of the biggest benefits of Java, platform

Yeah, but given that the source for antiword is available, that it runs on all
platforms I use (windows/linux/sun), and that it works better than anything
else (it seems to accept older formats than POI/textmining), it seems to get
the job done better.

independence. I would suggest you use the library at http://textmining.org.
contrary to what David Spencer says, it should work on all documents created
with Word 97 or above. I have literally indexed 100,000s of unique documents
using my library.
Ryan Ackley


Re: my experiences - Re: Parsing Word Docs

2003-03-06 Thread David Spencer

Ryan Ackley wrote:

David,

The textmining.org stuff only works on Word97 and above. It should work with

It could be that we had pre-Word 97 docs, as some date from 1996, when we
(Lumos at least) were founded.

no exceptions on any Word 97 doc. If you have any problems then it is from
an earlier version (most likely Word 6.0) or its not a word document. If
this isn't the case you need to email me so I can fix it and make it better
for the benefit of everyone. I plan on adding support for Word 6 in the
future.
Ryan Ackley


my experiences - Re: Parsing Word Docs

2003-03-05 Thread David Spencer

FYI I tried the textmining.org/poi combo on a collection of 350 Word
docs people have developed here over the years, and it failed on 33% of them
with exceptions being thrown about the formats being invalid.

I tried antiword ( http://www.winfield.demon.nl/ ), a native and free *.exe,
and it worked great (well, it seemed to process all the files fine).

I've had similar experiences with PDF - I tried the 3 or so freeware/java PDF
text extractors and they were not as good as the exe, pdftotext,
from foolabs (http://www.foolabs.com/xpdf/).

Not satisfying to a java developer but these work better than anything
else I can find.

You get source and I use them on windows and linux, no prob.



Eric Anderson wrote:

I'm interested in using the textmining/textextraction utilities using Apache 
POI, that Ryan was discussing. However, I'm having some difficulty determining 
what the insertion point would be to replace the default parser with the word 
parser. 

Any assistance would be appreciated.





