And to include the code

On Thu, Oct 4, 2012 at 3:52 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> I forgot to add that this is with today's build of trunk.
>
> -----Original message-----
>> From:Markus Jelsma <markus.jel...@openindex.io>
>> Sent: Thu 04-Oct-2012 15:42
>> To: java-user@lucene.apache.org
>> Subject: Highlighter IOOBE with modified HyphenationCompoundWordTokenFilter
>>
>> Hi,
>>
>> I've modified the HyphenationCompoundWordTokenFilter to emit fewer subtokens
>> because the original filter can emit all kinds of subtokens that have a very
>> different meaning on their own. I've modified it so no overlapping subtokens
>> are emitted and no subtokens are emitted that can be found within another
>> subtoken. I've also modified it to require that the generated subtokens
>> together make up the original token; if they don't, the subtokens are
>> discarded. It also doesn't return the original token anymore; the original
>> filter produces a duplicate of the original input token. For example:
>> verzekeringmaatschappij now becomes verzekering and maatschappij and not
>> verzekeringmaatschappij, ver, zeker, verzeker, zekering, ringmaat, maat and
>> more.
>>
>> But it seems that I have done something wrong because my modified version
>> sometimes causes the Highlighter to throw the following IOOBE:
>>
>> java.lang.StringIndexOutOfBoundsException: String index out of range: -14
>>         at java.lang.String.substring(String.java:1937)
>>         at org.apache.lucene.search.vectorhighlight.BaseFragmentsBuilder.makeFragment(BaseFragmentsBuilder.java:172)
>>         at org.apache.lucene.search.vectorhighlight.BaseFragmentsBuilder.createFragments(BaseFragmentsBuilder.java:138)
>>         at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBestFragments(FastVectorHighlighter.java:186)
>>         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:571)
>>         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
>>         at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:136)
>>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:214)
>>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1750)
>>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
>>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>>         .....
>>
>> Can anyone point me in the right direction? I've checked the LIA book on how
>> to manipulate the tokenstream and thought it should be alright. My analysis
>> tests also yield good results, nothing strange to be found. Or could it be
>> an error in the highlighter that only now shows up?
>>
>> Thanks,
>> Markus
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
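
For anyone reading the thread without the attachment, the decomposition rule
described in the quoted message (no overlapping subtokens, no subtoken nested
inside another, and the subtokens must together make up the whole original
token or be dropped) could be sketched roughly as below. This is only an
illustration of that rule, not the modified filter itself, and it does not use
the Lucene API. The names CompoundSplitSketch, Part, split and minPartLength
are hypothetical, and a plain dictionary lookup stands in for the
hyphenation-based candidate generation of the real filter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Illustration only: hypothetical class, not part of Lucene or of the modified filter.
final class CompoundSplitSketch {

    /** A subtoken plus its character offsets inside the original token. */
    static final class Part {
        final String text;
        final int start; // inclusive offset in the original token
        final int end;   // exclusive offset in the original token

        Part(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    /**
     * Returns adjacent, non-overlapping dictionary words that exactly cover
     * {@code word}, or an empty list when no such cover exists (the caller
     * would then keep the original token unchanged).
     */
    static List<Part> split(String word, Set<String> dictionary, int minPartLength) {
        if (minPartLength < 1) {
            throw new IllegalArgumentException("minPartLength must be >= 1");
        }
        List<Part> parts = new ArrayList<>();
        boolean[] deadEnd = new boolean[word.length() + 1];
        if (word.isEmpty() || !cover(word, 0, dictionary, minPartLength, parts, deadEnd)) {
            return Collections.emptyList();
        }
        return parts;
    }

    // Backtracking search: try the longest candidate starting at 'from' first.
    private static boolean cover(String word, int from, Set<String> dictionary,
                                 int minPartLength, List<Part> out, boolean[] deadEnd) {
        if (from == word.length()) {
            return true; // every character is covered
        }
        if (deadEnd[from]) {
            return false; // already known: nothing covers the rest from here
        }
        for (int end = word.length(); end - from >= minPartLength; end--) {
            if (from == 0 && end == word.length()) {
                continue; // never emit the whole original token as a "part"
            }
            String candidate = word.substring(from, end);
            if (dictionary.contains(candidate)) {
                out.add(new Part(candidate, from, end));
                if (cover(word, end, dictionary, minPartLength, out, deadEnd)) {
                    return true;
                }
                out.remove(out.size() - 1); // undo and try a shorter candidate
            }
        }
        deadEnd[from] = true;
        return false;
    }
}
```

Called with the word from the example and a dictionary containing verzekering,
maatschappij, ver, zeker, zekering and maat, split returns the two parts
verzekering and maatschappij. Each Part keeps start and end offsets that lie
inside the original token and never run backwards, which may be worth checking
in the real filter as well; whether offsets are what triggers the exception in
the trace above is not established in this thread.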