[ 
https://issues.apache.org/jira/browse/LUCENE-6139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-6139:
---------------------------------
    Attachment: LUCENE-6139_TokenGroup_offsets.patch

The attached patch makes TokenGroup's fields private to force Highlighter to 
use the getters, and it makes getStartOffset return matchStartOffset (likewise 
getEndOffset returns matchEndOffset). I also added docs.
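
To illustrate the shape of that change, here is a minimal standalone sketch. 
It is not the patch and not Lucene's actual TokenGroup: the class name 
SketchTokenGroup and the addToken(int, int, float) signature are hypothetical, 
chosen so the example runs without an analyzer. It only assumes that the group 
tracks two spans, one over every token and one over scoring tokens only.

```java
// Hypothetical stand-in for TokenGroup; names and signatures are assumptions.
final class SketchTokenGroup {
    // Span covering every token added to the group.
    private int startOffset, endOffset;
    // Span covering only tokens with score > 0.
    private int matchStartOffset, matchEndOffset;
    private int numTokens;
    private float totalScore;

    // Fields are private, so callers must go through the getters below.
    void addToken(int termStart, int termEnd, float score) {
        if (numTokens == 0) {
            startOffset = matchStartOffset = termStart;
            endOffset = matchEndOffset = termEnd;
        } else {
            startOffset = Math.min(startOffset, termStart);
            endOffset = Math.max(endOffset, termEnd);
            if (score > 0) {
                if (totalScore == 0) {
                    // First scoring token: the match span restarts here.
                    matchStartOffset = termStart;
                    matchEndOffset = termEnd;
                } else {
                    matchStartOffset = Math.min(matchStartOffset, termStart);
                    matchEndOffset = Math.max(matchEndOffset, termEnd);
                }
            }
        }
        totalScore += score;
        numTokens++;
    }

    // The getters report the match span, not the whole-group span.
    int getStartOffset() { return matchStartOffset; }
    int getEndOffset() { return matchEndOffset; }
    float getTotalScore() { return totalScore; }
}
```

With a non-scoring token at offsets 0-8 followed by a scoring token at 3-8 in 
the same group, the getters in this sketch report 3 and 8 rather than 0 and 8.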

I experimented with dropping the distinction between match offsets and 
overall offsets in the internal state, for both the fields and the getters.  
With that change, the moment a matching token (score > 0) was added to the 
group, the start & end offsets were constrained to cover just the matching 
token(s).  But a boolean query for "hi" and "speed" then failed in 
testOverlapAnalyzer2: the output no longer matched what the test expects. 
[IMO the new behavior was totally 
acceptable|https://issues.apache.org/jira/browse/LUCENE-627?focusedCommentId=14262458],
 but I'm wary of introducing anything that would change the results without 
peer review.  So I'll commit the attached patch instead in a couple of days if 
there are no further comments.
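
For readers trying to picture the experiment, the following is one possible 
reading of it as a standalone sketch. It is an assumption based only on the 
description above, not the code that was tried: the class name 
SingleSpanTokenGroup and the addToken(int, int, float) signature are 
hypothetical. It keeps a single span that covers all tokens until the first 
scoring token arrives, and only scoring tokens afterwards.

```java
// Hypothetical sketch of the single-span variant; not taken from any patch.
final class SingleSpanTokenGroup {
    private int startOffset, endOffset;
    private int numTokens;
    private float totalScore;

    void addToken(int termStart, int termEnd, float score) {
        boolean firstToken = numTokens == 0;
        boolean firstMatch = score > 0 && totalScore == 0;
        if (firstToken || firstMatch) {
            // Start the span, or restart it at the first scoring token.
            startOffset = termStart;
            endOffset = termEnd;
        } else if (score > 0 || totalScore == 0) {
            // Widen with scoring tokens, or with any token while nothing has scored.
            startOffset = Math.min(startOffset, termStart);
            endOffset = Math.max(endOffset, termEnd);
        }
        // A non-scoring token that arrives after a match leaves the span unchanged.
        totalScore += score;
        numTokens++;
    }

    int getStartOffset() { return startOffset; }
    int getEndOffset() { return endOffset; }
}
```

Under this reading, any highlighted output that depends on the group's 
offsets can differ from the two-span behavior, which would be consistent with 
an existing test expectation no longer holding.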

> TokenGroup.getStart|EndOffset should return matchStart|EndOffset not 
> start|endOffset
> ------------------------------------------------------------------------------------
>
>                 Key: LUCENE-6139
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6139
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/highlighter
>            Reporter: David Smiley
>         Attachments: LUCENE-6139_TokenGroup_offsets.patch
>
>
> The default highlighter has a TokenGroup class that is passed to 
> Formatter.highlightTerm().  TokenGroup also has getStartOffset() and 
> getEndOffset() methods that ostensibly return the start and end offsets into 
> the original text of the current term.  These getters aren't called by Lucene 
> or Solr but they are made available and are useful to me.  _The problem is 
> that they return the wrong offsets when there are tokens at the same 
> position._  I believe this was an oversight of LUCENE-627 in which these 
> getters should have been updated but weren't.  The fix is simple: return 
> matchStartOffset and matchEndOffset from these getters, not startOffset and 
> endOffset.  I think this oversight would not have occurred if Highlighter 
> didn't have package-access to TokenGroup's fields.



