Hi,

We had some problems using the POI Word filter. With one document set,
everything worked fine; with another, more than 50% of the documents failed
to index. I am not an OLE2 pro and cannot see any apparent difference
between the documents in the two sets. Almost all of the docs were created
with Word 97. For the moment, I have switched to a native converter (which
does not process metadata and must be run using Runtime.exec(), though)
until I have time to revisit the problem.
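
For anyone who wants to try the external-converter route, here is a minimal
sketch of what calling a native converter from Java could look like. It is
not the code I actually run: the class name ExternalTextExtractor and the
run() helper are made up for illustration, the converter command is whatever
tool you have installed, and it uses ProcessBuilder (a wrapper around the
same mechanism as Runtime.exec()) because it makes merging stderr into
stdout easier.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs an external converter and captures its output.
class ExternalTextExtractor {

    /**
     * Runs the given command (program plus arguments) and returns everything
     * it printed. Assumes the converter writes the extracted text to stdout.
     */
    static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so an unread error pipe cannot fill up and block the child.
        builder.redirectErrorStream(true);
        Process process = builder.start();

        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }

        // Only trust the text if the converter reported success.
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Converter exited with code " + exitCode + ": " + output);
        }
        return output.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java ExternalTextExtractor <converter> <file.doc>");
            return;
        }
        // The arguments are passed through unchanged as the converter command line.
        System.out.print(run(Arrays.asList(args)));
    }
}
```

The returned string can then go into a Lucene field like any other text. The
sketch assumes the converter emits UTF-8; adjust the charset if yours does
not.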

I do not want to discourage anyone from using the POI filters; it's a very
cool idea. Please do try them with your particular document set. For a quick
test, you can use the Docco personal search tool by Peter Becker and
colleagues (available from SourceForge). It includes a current version of
POI as a plugin and uses Lucene as its indexing backend, so you don't have
to write code to get answers...

Cheers, gregor

-----Original Message-----
From: Pleasant, Tracy [mailto:[EMAIL PROTECTED]
Sent: Monday, December 15, 2003 2:58 PM
To: Lucene Users List
Subject: Word Documents


As a spinoff, I was wondering if anyone has been happy with indexing and
searching Word docs. What about reading the contents? Any problems?


-----Original Message-----
From: Ryan Ackley [mailto:[EMAIL PROTECTED]
Sent: Friday, December 12, 2003 5:59 PM
To: Zhou, Oliver; Lucene Users List
Subject: Re: textmining: document title


Check out Jakarta POI (http://jakarta.apache.org/poi), particularly the HPSF
API. It allows you to extract metadata like Title, Author, etc. from OLE
documents.

-Ryan

----- Original Message ----- 
From: "Zhou, Oliver" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, December 12, 2003 5:26 PM
Subject: textmining: document title


> Ryan,
>
> I'm using textmining and Lucene to index Word documents but don't know how
> to get the Word document title.  Your advice on this matter is appreciated.
>
> Thanks,
> Oliver Zhou
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

