[ https://issues.apache.org/jira/browse/LUCENE-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Goddard updated LUCENE-2287:
------------------------------------
Attachment: LUCENE-2287.patch
This is a very rough initial patch with far too much cruft, but I wanted to
park it somewhere nonetheless. ** not meant to be applied yet **
> Unexpected terms are highlighted within nested SpanQuery instances
> ------------------------------------------------------------------
>
> Key: LUCENE-2287
> URL: https://issues.apache.org/jira/browse/LUCENE-2287
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/highlighter
> Affects Versions: 2.9.1
> Environment: Linux, Solaris, Windows
> Reporter: Michael Goddard
> Priority: Minor
> Attachments: LUCENE-2287.patch
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> I haven't yet been able to work out why I'm seeing spurious highlighting in
> nested SpanQuery instances. Briefly, the test below illustrates the issue:
> the second instance of "Lucene" is highlighted even though it does not
> satisfy the inner span. There's been some discussion of this on the
> java-dev list, and I'm opening this issue now because I have made some
> initial progress on it.
> This new test, added to the HighlighterTest class in lucene_2_9_1,
> illustrates this:
> /*
> * Ref: http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/
> */
> public void testHighlightingNestedSpans2() throws Exception {
>   // The second "Lucene" does not satisfy the inner span, yet it is highlighted.
>   String theText = "The Lucene was made by Doug Cutting and Lucene great Hadoop was"; // Problem
>   //String theText = "The Lucene was made by Doug Cutting and the great Hadoop was"; // Works okay
>   String fieldName = "SOME_FIELD_NAME";
>   // Inner span: "lucene" followed by "doug", in order, with slop 5.
>   SpanNearQuery spanNear = new SpanNearQuery(new SpanQuery[] {
>       new SpanTermQuery(new Term(fieldName, "lucene")),
>       new SpanTermQuery(new Term(fieldName, "doug")) }, 5, true);
>   // Outer span: the inner span followed by "hadoop", in order, with slop 4.
>   Query query = new SpanNearQuery(new SpanQuery[] { spanNear,
>       new SpanTermQuery(new Term(fieldName, "hadoop")) }, 4, true);
>   String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and Lucene great <B>Hadoop</B> was";
>   //String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and the great <B>Hadoop</B> was";
>   String observed = highlightField(query, fieldName, theText);
>   System.out.println("Expected: \"" + expected + "\"\n" + "Observed: \"" + observed + "\"");
>   assertEquals("Why is that second instance of the term \"Lucene\" highlighted?",
>       expected, observed);
> }
> Has this issue come up before? I've been reading through the source
> of QueryScorer, WeightedSpanTerm, WeightedSpanTermExtractor, Spans, and
> NearSpansOrdered, but haven't found the solution yet. Initially, I thought
> that the extractWeightedSpanTerms method in WeightedSpanTermExtractor should
> be called on each clause of a SpanNearQuery or SpanOrQuery, but that didn't
> get me very far.
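A standalone model of the test case (no Lucene dependency) may help narrow this down. It assumes, as a hypothesis to check against WeightedSpanTermExtractor rather than a confirmed diagnosis, that the highlighter keeps only the [start, end) position range of each top-level span match and then highlights every query term whose position falls inside that range. The names Span, Match, term, nearOrdered and highlight are hypothetical, tokenization is a plain lower-cased whitespace split, and the ordered-near slop is modeled as the sum of the gaps between consecutive clauses. Under those assumptions the outer match covers positions 1 through 10, so the second "lucene" (position 8) falls inside it; recording only the leaf positions that actually formed each match avoids that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of nested ordered span matching; not Lucene's API. */
public class NestedSpanHighlightSketch {

    /** One match over token positions [start, end), plus the leaf positions that formed it. */
    static final class Match {
        final int start, end;
        final List<Integer> leaves;
        Match(int start, int end, List<Integer> leaves) {
            this.start = start; this.end = end; this.leaves = leaves;
        }
    }

    /** Minimal stand-in for a span query: lists its matches and its leaf terms. */
    interface Span {
        List<Match> matches(String[] tokens);
        void collectTerms(Set<String> out);
    }

    static Span term(String t) {
        return new Span() {
            public List<Match> matches(String[] tokens) {
                List<Match> out = new ArrayList<>();
                for (int i = 0; i < tokens.length; i++)
                    if (tokens[i].equals(t)) out.add(new Match(i, i + 1, List.of(i)));
                return out;
            }
            public void collectTerms(Set<String> out) { out.add(t); }
        };
    }

    /** Ordered "near": clauses in order, non-overlapping, total gap between them <= slop. */
    static Span nearOrdered(int slop, Span... clauses) {
        return new Span() {
            public List<Match> matches(String[] tokens) {
                List<List<Match>> perClause = new ArrayList<>();
                for (Span c : clauses) perClause.add(c.matches(tokens));
                List<Match> out = new ArrayList<>();
                for (Match first : perClause.get(0))
                    extend(perClause, 1, first.start, first.end, 0, slop, new ArrayList<>(first.leaves), out);
                return out;
            }
            public void collectTerms(Set<String> out) {
                for (Span c : clauses) c.collectTerms(out);
            }
        };
    }

    /** Tries every later-clause match that keeps the order and the accumulated gap within slop. */
    private static void extend(List<List<Match>> perClause, int k, int start, int end, int gap,
                               int slop, List<Integer> leaves, List<Match> out) {
        if (k == perClause.size()) { out.add(new Match(start, end, leaves)); return; }
        for (Match m : perClause.get(k)) {
            if (m.start < end) continue;            // must follow the previous clause
            int g = gap + (m.start - end);          // gap contributed by this clause
            if (g > slop) continue;
            List<Integer> next = new ArrayList<>(leaves);
            next.addAll(m.leaves);
            extend(perClause, k + 1, start, m.end, g, slop, next, out);
        }
    }

    /** rangeBased=true marks any query term inside a match's range; false marks only leaf positions. */
    static String highlight(String text, Span query, boolean rangeBased) {
        String[] words = text.split(" ");
        String[] tokens = new String[words.length];
        for (int i = 0; i < words.length; i++) tokens[i] = words[i].toLowerCase();

        Set<String> terms = new TreeSet<>();
        query.collectTerms(terms);
        Set<Integer> hits = new TreeSet<>();
        for (Match m : query.matches(tokens)) {
            if (rangeBased) {
                for (int p = m.start; p < m.end; p++)
                    if (terms.contains(tokens[p])) hits.add(p);
            } else {
                hits.addAll(m.leaves);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(hits.contains(i) ? "<B>" + words[i] + "</B>" : words[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "The Lucene was made by Doug Cutting and Lucene great Hadoop was";
        Span query = nearOrdered(4, nearOrdered(5, term("lucene"), term("doug")), term("hadoop"));
        System.out.println("range-based: " + highlight(text, query, true));
        System.out.println("leaf-based:  " + highlight(text, query, false));
    }
}
```

Running main prints both renderings; only the range-based pass wraps the second "Lucene" in <B> tags, which mirrors the behaviour the test reports, while the leaf-based pass reproduces the expected string. Tracking contributing leaf positions is one possible direction, not necessarily what the attached patch does.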
--
This message is automatically generated by JIRA.