Re: MsWordTextFilter Problem

Marcel Reutegger Tue, 16 May 2006 22:17:50 -0700

Hi Thomas,

the jackrabbit text filter for ms word documents depends on the textmining library and apache poi. maybe you can find some information or hints in those mailing lists?


regards
 marcel

thomasg wrote:

Has anyone encountered problems with this text filter? I am testing the text
extraction of quite a large document (6MB worth of Thinking In Java by
captain Bruce Eckel). Searching was not producing the expected results. I have
taken the Reader object generated by the MsWordTextFilter, converted it
into a String and written it to a file. Inspection shows that most of the
document has been omitted. The missing part is in the middle of the file and
there are no particularly unusual contents that mark the start of the missing
section. I've tested larger docs that work fine, so it's a bit of a mystery.

Cheers, Thomas
--
View this message in context: 
http://www.nabble.com/MsWordTextFilter-Problem-t1626136.html#a4406009
Sent from the Jackrabbit - Dev forum at Nabble.com.
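
For anyone wanting to repeat the check Thomas describes (draining the filter's Reader and writing the text to a file for inspection), here is a minimal sketch. The class name ReaderDump and the method dump are hypothetical helpers made up for illustration; they are not part of Jackrabbit, and the sketch uses only the standard Java library. It assumes you already hold the Reader returned by the text filter.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical debugging helper (not a Jackrabbit class).
class ReaderDump {

    /**
     * Copies everything the reader yields into a UTF-8 file and
     * returns the number of characters copied. Closes the reader.
     */
    static long dump(Reader reader, Path target) throws IOException {
        long total = 0;
        char[] buffer = new char[8192];
        try (Reader in = reader;
             Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            int n;
            // read() may return fewer chars than the buffer holds;
            // only -1 signals the end of the stream.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

Comparing the returned character count with a rough estimate of the document's text length gives a quick indication of how much was dropped, and looping until read() returns -1 rules out a partial read in the test harness as the cause of the missing text.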
