[
https://issues.apache.org/jira/browse/LUCENE-6139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Smiley updated LUCENE-6139:
---------------------------------
Attachment: LUCENE-6139_TokenGroup_offsets.patch
The attached patch makes TokenGroup's fields private to force Highlighter to
use the getters, and I made getStartOffset return matchStartOffset (likewise
for 'end'). And I added docs.
I experimented with dropping the distinction between match and non-match
offsets in the internal state, for both the getters and the fields. With that
change, the moment a matching token (score > 0) was added to the group, the
start & end offsets were constrained to cover just the matching token(s). But a
boolean query for "hi" and "speed" then failed in testOverlapAnalyzer2 because
the output no longer matched what the test expects. [IMO the new behavior was totally
acceptable|https://issues.apache.org/jira/browse/LUCENE-627?focusedCommentId=14262458],
but I'm wary of introducing anything that would change the results without
peer review. So I'll commit this instead in a couple of days if there are no
further comments.
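For anyone following along, the shape of the change can be sketched roughly as
below. This is a simplified, hypothetical stand-in, not the class in the patch:
the name SimpleTokenGroup, the addToken(start, end, score) signature that takes
offsets directly, and the example offsets are all assumptions made for
illustration. It only shows private fields with getters that report the
match offsets.

```java
/** Hypothetical, simplified stand-in for the highlighter's TokenGroup; not Lucene's actual class. */
final class SimpleTokenGroup {
  // Private, so callers have to go through the getters.
  private int startOffset, endOffset;           // span of every token in the group (internal only)
  private int matchStartOffset, matchEndOffset; // span of the scoring tokens only
  private int numTokens;
  private float totalScore;

  /** Adds a token. Passing offsets in directly is an assumption of this sketch. */
  void addToken(int tokenStart, int tokenEnd, float score) {
    if (numTokens == 0) {
      startOffset = matchStartOffset = tokenStart;
      endOffset = matchEndOffset = tokenEnd;
      totalScore += score;
    } else {
      startOffset = Math.min(startOffset, tokenStart);
      endOffset = Math.max(endOffset, tokenEnd);
      if (score > 0) {
        if (totalScore == 0) {
          // First matching token in the group: narrow the match span to it.
          matchStartOffset = tokenStart;
          matchEndOffset = tokenEnd;
        } else {
          // Later matching tokens widen the match span.
          matchStartOffset = Math.min(matchStartOffset, tokenStart);
          matchEndOffset = Math.max(matchEndOffset, tokenEnd);
        }
        totalScore += score;
      }
    }
    numTokens++;
  }

  // The getters report the match span, not the span of the whole group.
  int getStartOffset() { return matchStartOffset; }
  int getEndOffset() { return matchEndOffset; }
  float getTotalScore() { return totalScore; }

  public static void main(String[] args) {
    SimpleTokenGroup group = new SimpleTokenGroup();
    group.addToken(0, 8, 0f); // non-matching token covering the wider span
    group.addToken(3, 8, 1f); // matching token at the same position
    // prints 3-8 (returning startOffset/endOffset instead would give 0-8)
    System.out.println(group.getStartOffset() + "-" + group.getEndOffset());
  }
}
```

In this sketch a formatter that reads the getters sees only the offsets of the
matching token, even though a wider non-matching token shares its position.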
> TokenGroup.getStart|EndOffset should return matchStart|EndOffset not
> start|endOffset
> ------------------------------------------------------------------------------------
>
> Key: LUCENE-6139
> URL: https://issues.apache.org/jira/browse/LUCENE-6139
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/highlighter
> Reporter: David Smiley
> Attachments: LUCENE-6139_TokenGroup_offsets.patch
>
>
> The default highlighter has a TokenGroup class that is passed to
> Formatter.highlightTerm(). TokenGroup also has getStartOffset() and
> getEndOffset() methods that ostensibly return the start and end offsets into
> the original text of the current term. These getters aren't called by Lucene
> or Solr but they are made available and are useful to me. _The problem is
> that they return the wrong offsets when there are tokens at the same
> position._ I believe this was an oversight of LUCENE-627 in which these
> getters should have been updated but weren't. The fix is simple: return
> matchStartOffset and matchEndOffset from these getters, not startOffset and
> endOffset. I think this oversight would not have occurred if Highlighter
> didn't have package-access to TokenGroup's fields.