And to include the code

On Thu, Oct 4, 2012 at 3:52 PM, Markus Jelsma
<markus.jel...@openindex.io> wrote:
> I forgot to add that this is with today's build of trunk.
>
> -----Original message-----
>> From:Markus Jelsma <markus.jel...@openindex.io>
>> Sent: Thu 04-Oct-2012 15:42
>> To: java-user@lucene.apache.org
>> Subject: Highlighter IOOBE with modified HyphenationCompoundWordTokenFilter
>>
>> Hi,
>>
>> I've modified the HyphenationCompoundWordTokenFilter to emit fewer subtokens,
>> because the original filter can emit all kinds of subtokens that have a very
>> different meaning on their own. With my changes, no overlapping subtokens
>> are emitted, and no subtoken is emitted that can be found within another
>> subtoken. The generated subtokens must also make up the original token
>> together; if they don't, the subtokens are discarded. The filter also no
>> longer returns the original token, whereas the original filter produces a
>> duplicate of the original input token. For example, verzekeringmaatschappij
>> now becomes verzekering and maatschappij, and not verzekeringmaatschappij,
>> ver, zeker, verzeker, zekering, ringmaat, maat and more.
>>
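The selection rules described above can be sketched in plain Java, without any Lucene classes. This is only an illustration and not the modified filter itself: the class and method names are hypothetical, a plain dictionary lookup stands in for the hyphenation step, and picking the split with the fewest parts is an assumption about how ties between possible splits might be resolved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: not the actual modified HyphenationCompoundWordTokenFilter.
final class CompoundSplitSketch {

    /**
     * Splits word into the fewest dictionary entries that, concatenated,
     * equal word exactly. The parts never overlap and none lies inside another.
     * Returns an empty list when the dictionary cannot cover the whole word.
     */
    static List<String> split(String word, Set<String> dictionary) {
        int n = word.length();
        int[] parts = new int[n + 1]; // parts[i]: fewest entries covering word[0, i)
        int[] prev = new int[n + 1];  // prev[i]: start of the last entry in that cover
        Arrays.fill(parts, Integer.MAX_VALUE);
        parts[0] = 0;
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                if (parts[start] == Integer.MAX_VALUE) {
                    continue; // word[0, start) cannot be covered
                }
                if (!dictionary.contains(word.substring(start, end))) {
                    continue;
                }
                if (parts[start] + 1 < parts[end]) {
                    parts[end] = parts[start] + 1;
                    prev[end] = start;
                }
            }
        }
        if (parts[n] == Integer.MAX_VALUE) {
            return Collections.emptyList(); // no full cover: discard all subtokens
        }
        List<String> result = new ArrayList<>();
        for (int end = n; end > 0; end = prev[end]) {
            result.add(word.substring(prev[end], end));
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList(
                "ver", "zeker", "verzeker", "zekering", "ringmaat", "maat",
                "verzekering", "maatschappij"));
        System.out.println(split("verzekeringmaatschappij", dictionary));
    }
}
```

What a filter does when the list comes back empty (for instance, passing the original token through unchanged) is left open here.
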
>> But it seems that I have done something wrong, because my modified version
>> sometimes causes the Highlighter to throw the following IOOBE:
>>
>> java.lang.StringIndexOutOfBoundsException: String index out of range: -14
>>         at java.lang.String.substring(String.java:1937)
>>         at org.apache.lucene.search.vectorhighlight.BaseFragmentsBuilder.makeFragment(BaseFragmentsBuilder.java:172)
>>         at org.apache.lucene.search.vectorhighlight.BaseFragmentsBuilder.createFragments(BaseFragmentsBuilder.java:138)
>>         at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBestFragments(FastVectorHighlighter.java:186)
>>         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:571)
>>         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
>>         at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:136)
>>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:214)
>>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1750)
>>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
>>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>>         .....
>>
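The trace by itself does not show the cause. Since the exception comes from a substring call while the highlighter builds a fragment, one thing that may be worth ruling out is inconsistent token offsets coming out of the modified filter. Below is a minimal sketch of such a check in plain Java. The class name, the method name and the int[][] representation of (start, end) pairs are hypothetical; in a real test the pairs would be read from the filter's OffsetAttribute.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: checks (startOffset, endOffset) pairs in emission order.
final class OffsetCheckSketch {

    /**
     * Returns a description of every offset problem found:
     * negative or reversed offsets, offsets past the end of the text,
     * and start offsets that move backwards between consecutive tokens.
     */
    static List<String> check(int[][] offsets, int textLength) {
        List<String> problems = new ArrayList<>();
        int lastStart = 0;
        for (int i = 0; i < offsets.length; i++) {
            int start = offsets[i][0];
            int end = offsets[i][1];
            if (start < 0 || end < start) {
                problems.add("token " + i + ": invalid range " + start + "-" + end);
            } else if (end > textLength) {
                problems.add("token " + i + ": end " + end
                        + " is past text length " + textLength);
            }
            if (start < lastStart) {
                problems.add("token " + i + ": start " + start
                        + " goes backwards from " + lastStart);
            }
            lastStart = start;
        }
        return problems;
    }

    public static void main(String[] args) {
        // verzekering (0-11) followed by maatschappij (11-23) in a 23-character text
        System.out.println(check(new int[][] {{0, 11}, {11, 23}}, 23));
    }
}
```

An empty result only means these particular checks passed; it does not rule out a problem in the highlighter itself.
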
>> Can anyone point me in the right direction? I've checked the LIA book on how
>> to manipulate the token stream and thought it should be all right. My analysis
>> tests also yield good results, with nothing strange to be found. Or could it be
>> an error in the highlighter that only now shows up?
>>
>> Thanks,
>> Markus
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>
