Hi Thomas,

The Jackrabbit text filter for MS Word documents depends on the textmining library and Apache POI. Maybe you can find some information or hints on those mailing lists?

regards
 marcel

thomasg wrote:
Has anyone encountered problems with this text filter? I am testing the text
extraction of quite a large document (6MB worth of Thinking in Java by
captain Bruce Eckel). Searching was not producing the expected results. I have
taken the Reader object generated by the MsWordTextFilter, converted it
into a String and written it to a file. Inspection shows that most of the
document has been omitted. The missing part is in the middle of the file, and
there are no particularly unusual contents that mark the start of the missing
section. I've tested larger docs that work fine, so it's a bit of a mystery.

Cheers, Thomas
--
View this message in context: 
http://www.nabble.com/MsWordTextFilter-Problem-t1626136.html#a4406009
Sent from the Jackrabbit - Dev forum at Nabble.com.
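The check Thomas describes (reading the Reader produced by MsWordTextFilter into a String and writing it to a file) can be sketched in plain Java as below. This is a minimal sketch under stated assumptions: the class and method names (`ReaderDump`, `drain`, `dump`, `firstMismatch`) are hypothetical; a `StringReader` stands in for the filter's Reader, because how that Reader is obtained depends on the Jackrabbit version; and `firstMismatch` assumes you have a known-good plain-text export of the same document to compare against.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class for inspecting extracted text; not part of Jackrabbit.
public class ReaderDump {

    // Reads every character from the Reader until end of stream.
    // read(char[]) may return fewer chars than requested, so loop until it returns -1.
    static String drain(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Writes the extracted text as UTF-8 so it can be inspected in an editor.
    static void dump(String text, Path out) throws IOException {
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
    }

    // Returns the index of the first character at which the two texts differ,
    // or -1 if they are identical. If one text is a prefix of the other,
    // the length of the shorter text is returned.
    static int firstMismatch(String a, String b) {
        int len = Math.min(a.length(), b.length());
        for (int i = 0; i < len; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return a.length() == b.length() ? -1 : len;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in the real check this Reader would come from MsWordTextFilter;
        // a StringReader is used so the sketch runs on its own.
        try (Reader extracted = new StringReader("Thinking in Java ... chapter text ...")) {
            String text = drain(extracted);
            Path out = Files.createTempFile("extracted", ".txt");
            dump(text, out);
            // The character count is a quick first signal of truncation.
            System.out.println(text.length() + " chars written to " + out);
        }
    }
}
```

Comparing the length of the dumped text, and the offset returned by `firstMismatch`, against a plain-text export of the same document would narrow down where the omitted section starts.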


