Hi,

On 5/17/06, thomasg <[EMAIL PROTECTED]> wrote:
> One slight worry, have you visited www.textmining.org lately?
> Doesn't seem too healthy!

The site has been hacked since December. :-( Would it make sense to
consider alternatives? Some ideas that come to mind:

a) Contact the Jakarta POI community for their suggestions.

b) Implement a generic text filter that pipes the binary stream
through an external application like catdoc and reads the output as
plain text to be indexed.

c) Implement a text filter that uses an OpenOffice "server" through
the UNO API to manipulate Word and other types of documents.
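
To make option b) a bit more concrete, here is a rough sketch of how such
a filter could pipe a binary stream through an external program and read
back plain text. The class name ExternalTextFilter and the extract()
method are made up for illustration and are not part of any existing
API. It also assumes Java 9 or later and that the external tool reads the
document from standard input and writes text to standard output, which
would need to be checked for catdoc or whatever tool is chosen.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.util.List;

// Hypothetical helper, not an existing API: runs an external converter
// and returns whatever it prints on standard output as a string.
public class ExternalTextFilter {

    public static String extract(List<String> command, InputStream document,
                                 Charset charset)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Discard stderr so a chatty tool cannot block on a full pipe.
        builder.redirectError(ProcessBuilder.Redirect.DISCARD);
        Process process = builder.start();

        // Feed the document on a separate thread; writing and reading on
        // the same thread can deadlock once the pipe buffers fill up.
        Thread feeder = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                document.transferTo(stdin);
            } catch (IOException e) {
                // The tool may exit before consuming all input; the exit
                // code check below reports real failures.
            }
        });
        feeder.start();

        // Collect the tool's standard output until it closes the stream.
        ByteArrayOutputStream text = new ByteArrayOutputStream();
        try (InputStream stdout = process.getInputStream()) {
            stdout.transferTo(text);
        }

        feeder.join();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(command.get(0) + " exited with code " + exitCode);
        }
        return new String(text.toByteArray(), charset);
    }
}
```

A caller would pass the command line as a list, for example
List.of("catdoc") plus whatever options the tool needs to read from
standard input, and hand the returned string to the indexer. A real
filter would probably also want a timeout and a size limit on the output.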

BR,

Jukka Zitting

--
Yukatan - http://yukatan.fi/ - [EMAIL PROTECTED]
Software craftsmanship, JCR consulting, and Java development
