[ https://issues.apache.org/jira/browse/LUCENE-6334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644988#comment-14644988 ]
ASF subversion and git services commented on LUCENE-6334:
---------------------------------------------------------

Commit 1693155 from [~mikemccand] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1693155 ]

LUCENE-6334: fix FastVectorHighlighter when a phrase spans more than one value in a multi-valued field

> Fast Vector Highlighter does not properly span neighboring term offsets
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-6334
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6334
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/termvectors, modules/highlighter
>            Reporter: Chris Earle
>              Labels: easyfix
>         Attachments: LUCENE-6334.patch
>
>
> If you are using term vectors for fast vector highlighting along with a
> multivalue field while matching a phrase that crosses two elements, then it
> will not properly highlight even though it _properly_ finds the correct
> values to highlight.
> A good example of this is when matching source code, where you might have
> lines like:
> {code}
> one two three five
> two three four
> five six five
> six seven eight nine eight nine eight nine eight nine eight nine eight nine
> eight nine
> ten eleven
> twelve thirteen
> {code}
> Matching the phrase "four five" will return
> {code}
> two three four
> five six five
> six seven eight nine eight nine eight nine eight nine eight
> eight nine
> ten eleven
> {code}
> However, it does not properly highlight "four" (on the first line) and "five"
> (on the second line) _and_ it is returning too many lines, but not all of
> them.
> The problem lies in the [BaseFragmentsBuilder at line 269|
> https://github.com/apache/lucene-solr/blob/trunk/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/BaseFragmentsBuilder.java#L269]
> because it is not checking for cross-coverage.
> Here is a possible solution:
> {code}
> boolean started = toffs.getStartOffset() >= fieldStart;
> boolean ended = toffs.getEndOffset() <= fieldEnd;
> // existing behavior:
> if (started && ended) {
>     toffsList.add(toffs);
>     toffsIterator.remove();
> }
> else if (started) {
>     toffsList.add(new Toffs(toffs.getStartOffset(), fieldEnd));
>     // toffsIterator.remove(); // is this necessary?
> }
> else if (ended) {
>     toffsList.add(new Toffs(fieldStart, toffs.getEndOffset()));
>     // toffsIterator.remove(); // is this necessary?
> }
> else if (toffs.getEndOffset() > fieldEnd) {
>     // i.e. the toffs spans the whole field
>     toffsList.add(new Toffs(fieldStart, fieldEnd));
>     // toffsIterator.remove(); // is this necessary?
> }
> {code}
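
The cross-coverage check described in the issue can be illustrated in isolation. The following is only a minimal sketch, not the committed fix and not Lucene code: the class OffsetClipSketch and the method clip are hypothetical names, offsets are treated as half-open ranges [start, end), and the sample offsets assume the values "two three four" and "five six five" are joined with a single separator character.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; none of these names are Lucene classes.
class OffsetClipSketch {

  /**
   * Returns the part of the term range [start, end) that lies inside the
   * value range [fieldStart, fieldEnd) as {clippedStart, clippedEnd},
   * or null when the two ranges do not overlap at all.
   */
  static int[] clip(int start, int end, int fieldStart, int fieldEnd) {
    int clippedStart = Math.max(start, fieldStart);
    int clippedEnd = Math.min(end, fieldEnd);
    if (clippedStart >= clippedEnd) {
      return null; // the offsets belong entirely to another value
    }
    return new int[] {clippedStart, clippedEnd};
  }

  public static void main(String[] args) {
    // Assumed layout: "two three four" occupies [0, 14) and
    // "five six five" occupies [15, 28); the phrase "four five"
    // then covers [10, 19) and crosses the boundary between them.
    int phraseStart = 10;
    int phraseEnd = 19;
    System.out.println(Arrays.toString(clip(phraseStart, phraseEnd, 0, 14)));
    System.out.println(Arrays.toString(clip(phraseStart, phraseEnd, 15, 28)));
    System.out.println(Arrays.toString(clip(phraseStart, phraseEnd, 29, 39)));
  }
}
```

Clipping with max/min covers the four branches of the proposal in one expression and also rejects offsets that lie wholly outside the value. On the "is this necessary?" question, one plausible reading is that an offset reaching past the end of the current value should stay in the iterator so the next value can claim the remainder; that is an inference from the sketch, not something the issue confirms.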