Hi,

On 5/17/06, thomasg <[EMAIL PROTECTED]> wrote:
> One slight worry, have you visited www.textmining.org lately? Doesn't seem too healthy!

The site has been hacked since December. :-(

Would it make sense to consider alternatives? Some ideas that come to my mind:

a) Contact the Jakarta POI community for their suggestions.

b) Implement a generic text filter that pipes the binary stream through an external application like catdoc and reads the output as plain text to be indexed.

c) Implement a text filter that uses an OpenOffice "server" through the UNO API to manipulate Word and other types of documents.

BR,

Jukka Zitting

--
Yukatan - http://yukatan.fi/ - [EMAIL PROTECTED]
Software craftsmanship, JCR consulting, and Java development
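As a rough illustration of option b), the piping could look something like the sketch below. It is not an existing filter: the class name ExternalTextFilter and the method extractText are hypothetical, and it assumes the external tool reads the document from standard input and writes plain text to standard output (whether catdoc should be invoked that way, and in which output encoding, would need to be checked).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.util.List;

// Hypothetical helper: pipes a binary document through an external
// command and returns whatever the command prints on stdout as text.
final class ExternalTextFilter {

    static String extractText(InputStream document, List<String> command,
                              Charset outputCharset)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Let the tool's error messages go to our own stderr.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        // Feed stdin on a separate thread so that a full stdout pipe
        // cannot block the tool while we are still writing input.
        Thread writer = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                copy(document, stdin);
            } catch (IOException e) {
                // The tool may exit before reading all input; a real
                // failure shows up in the exit code check below.
            }
        });
        writer.start();

        // Collect the tool's stdout until it closes the stream.
        ByteArrayOutputStream text = new ByteArrayOutputStream();
        try (InputStream stdout = process.getInputStream()) {
            copy(stdout, text);
        }

        writer.join();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command " + command
                    + " failed with exit code " + exitCode);
        }
        return new String(text.toByteArray(), outputCharset);
    }

    private static void copy(InputStream in, OutputStream out)
            throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }
}
```

A text filter could then call something like extractText(stream, Arrays.asList("catdoc"), charset) and hand the returned string to the indexer. Keeping the command as a parameter means the same filter could be pointed at other converters without code changes.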
