Thanks for pointing out this issue.
The bug occurred when a doc is bigger than the maxNumDocsToAnalyze setting. In that situation, the last fragment created always ran from the maxNumDocsToAnalyze position to the end of the doc (in your case, quite large!).

I have fixed this in SVN and the JUnit tests are clean. If you want to patch your version, comment out these lines in the code:

// append text after end of last token
if (lastEndOffset < text.length())

I believe the above code was trying to retain any non-token text after the last token, e.g. a trailing "?!" or ".", so I have removed it to be on the safe side.
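
For anyone following along, here is a minimal sketch of why that append inflates the last fragment when analysis stops early. It is not the highlighter's actual source: the class and method names (LastFragmentSketch, withTrailingAppend, withoutTrailingAppend) and the sample text are made up for illustration, and only the lastEndOffset check mirrors the lines quoted above.

```java
public class LastFragmentSketch {

    // Old behaviour (illustrative): any text after the last analyzed token is
    // appended to the last fragment, even if analysis stopped at the limit.
    static String withTrailingAppend(String text, int fragmentStart, int lastEndOffset) {
        StringBuilder fragment = new StringBuilder(text.substring(fragmentStart, lastEndOffset));
        // append text after end of last token
        if (lastEndOffset < text.length()) {
            fragment.append(text.substring(lastEndOffset));
        }
        return fragment.toString();
    }

    // Patched behaviour (illustrative): with the append commented out,
    // the last fragment ends at the last analyzed token.
    static String withoutTrailingAppend(String text, int fragmentStart, int lastEndOffset) {
        return text.substring(fragmentStart, lastEndOffset);
    }

    public static void main(String[] args) {
        // A short analyzed prefix followed by a long unanalyzed remainder.
        StringBuilder doc = new StringBuilder("short match here");
        for (int i = 0; i < 1000; i++) {
            doc.append(" filler");
        }
        String text = doc.toString();

        // Assume analysis stopped after the token "here", which ends at offset 16.
        int lastEndOffset = 16;

        System.out.println("with append:    " + withTrailingAppend(text, 0, lastEndOffset).length());
        System.out.println("without append: " + withoutTrailingAppend(text, 0, lastEndOffset).length());
    }
}
```

The trade-off is the one mentioned above: without the append, trailing punctuation after the last token is no longer carried into the fragment.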

