[ http://issues.apache.org/jira/browse/LUCENE-627?page=comments#action_12420950 ]
Mark Harwood commented on LUCENE-627:
-------------------------------------
>>The original token stream is a valid one though right?
I don't think so, see below...
    List lst = new ArrayList();
    Token t;
    t = new Token("i",0,1);
    lst.add(t);
    t = new Token("pod",1,4);
    t.setPositionIncrement(0);
    lst.add(t);
    t = new Token("ipod",0,4);
    // !! Missing a t.setPositionIncrement(0) here.
    lst.add(t);
    t = new Token("foo",5,8);
    lst.add(t);
    iter = lst.iterator();
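To illustrate why the missing call matters, here is a minimal sketch of how token positions follow from position increments. It does not use the Lucene classes: `Tok` and `positions` are hypothetical stand-ins, and the running sum starting at -1 is a simplified model of how an indexer assigns positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PositionSketch {
    /** Hypothetical stand-in for a token: term text plus position increment. */
    static final class Tok {
        final String text;
        final int posInc;
        Tok(String text, int posInc) { this.text = text; this.posInc = posInc; }
    }

    /**
     * Running sum of increments. Starting at -1 means the first token
     * (increment 1) lands at position 0 (simplified model, an assumption).
     */
    static List<Integer> positions(List<Tok> toks) {
        List<Integer> out = new ArrayList<Integer>();
        int pos = -1;
        for (Tok t : toks) {
            pos += t.posInc;
            out.add(pos);
        }
        return out;
    }

    public static void main(String[] args) {
        // Stream as posted: "ipod" keeps the default increment of 1.
        List<Tok> asPosted = Arrays.asList(
            new Tok("i", 1), new Tok("pod", 0), new Tok("ipod", 1), new Tok("foo", 1));
        // Stream with the missing setPositionIncrement(0) applied to "ipod".
        List<Tok> corrected = Arrays.asList(
            new Tok("i", 1), new Tok("pod", 0), new Tok("ipod", 0), new Tok("foo", 1));

        System.out.println(positions(asPosted));  // [0, 0, 1, 2]
        System.out.println(positions(corrected)); // [0, 0, 0, 1]
    }
}
```

In the stream as posted, "ipod" is pushed to a later position than the text it covers, which is why the stream is not a valid description of the overlap.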
Having fixed the above, I believe the change below is all that is required to
fix the highlighter:
TokenGroup.java:

    boolean isDistinct(Token token)
    {
        // return token.startOffset()>=endOffset;
        return token.getPositionIncrement()>0;
    }
All my JUnit tests pass with this change. Can you verify that this is true for
you too?
However, this change would break highlighting for any analyzer that emits
tokens with positive position increments but overlapping offsets: text could
be duplicated in the same way as in the problem you originally reported. I
can't verify whether this applies to the CJK analyzers etc., so I feel a little
uneasy about committing this.
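The trade-off between the two `isDistinct` rules can be sketched with a small self-contained model. This is not the highlighter's actual code: `Tok`, `rebuild`, and `emit` are hypothetical names, the grouping loop is a simplification of how tokens are collected into a group and flushed, and the bigram stream is an invented example of positive increments with overlapping offsets rather than the confirmed output of any particular analyzer.

```java
import java.util.Arrays;
import java.util.List;

public class GroupingSketch {
    /** Hypothetical token: start/end offsets plus position increment. */
    static final class Tok {
        final int start, end, posInc;
        Tok(int start, int end, int posInc) {
            this.start = start; this.end = end; this.posInc = posInc;
        }
    }

    // "iPod foo" -> i, pod, ipod, foo (with the corrected increments).
    static final List<Tok> IPOD = Arrays.asList(
        new Tok(0, 1, 1), new Tok(1, 4, 0), new Tok(0, 4, 0), new Tok(5, 8, 1));
    // "abc" -> overlapping bigrams "ab", "bc", each with increment 1 (invented).
    static final List<Tok> BIGRAMS = Arrays.asList(
        new Tok(0, 2, 1), new Tok(1, 3, 1));

    /** Rebuilds the text group by group, as a highlighter without any hits would. */
    static String rebuild(String text, List<Tok> toks, boolean usePositionIncrement) {
        StringBuilder out = new StringBuilder();
        int groupStart = 0, groupEnd = 0, lastEnd = 0;
        boolean open = false;
        for (Tok t : toks) {
            boolean distinct = usePositionIncrement ? t.posInc > 0 : t.start >= groupEnd;
            if (open && distinct) {            // flush the finished group
                lastEnd = emit(text, out, groupStart, groupEnd, lastEnd);
                open = false;
            }
            if (!open) {                       // start a new group
                groupStart = t.start; groupEnd = t.end; open = true;
            } else {                           // widen the current group
                groupStart = Math.min(groupStart, t.start);
                groupEnd = Math.max(groupEnd, t.end);
            }
        }
        if (open) lastEnd = emit(text, out, groupStart, groupEnd, lastEnd);
        if (lastEnd < text.length()) out.append(text, lastEnd, text.length());
        return out.toString();
    }

    /** Appends any gap before the group, then the group's own text span. */
    static int emit(String text, StringBuilder out, int start, int end, int lastEnd) {
        if (start > lastEnd) out.append(text, lastEnd, start);
        out.append(text, start, end);
        return Math.max(lastEnd, end);
    }

    public static void main(String[] args) {
        System.out.println(rebuild("iPod foo", IPOD, false)); // iiPod foo (reported bug)
        System.out.println(rebuild("iPod foo", IPOD, true));  // iPod foo
        System.out.println(rebuild("abc", BIGRAMS, false));   // abc
        System.out.println(rebuild("abc", BIGRAMS, true));    // abbc (duplicated text)
    }
}
```

In this model the position-increment rule repairs the iPod case but duplicates text for the bigram stream, while the offset rule does the opposite.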
Cheers
Mark
> highlighter problems with overlapping tokens
> --------------------------------------------
>
> Key: LUCENE-627
> URL: http://issues.apache.org/jira/browse/LUCENE-627
> Project: Lucene - Java
> Type: Bug
> Components: Other
> Versions: 2.0.1
> Reporter: Yonik Seeley
>
> The lucene highlighter has problems when tokens that overlap are generated.
> For example, if analysis of iPod generates the tokens "i", "pod", "ipod"
> (with pod and ipod in the same position),
> then the highlighter will output this as iipod, regardless of if any of those
> tokens are highlighted.
> Discovered via http://issues.apache.org/jira/browse/SOLR-24