[jira] Updated: (LUCENE-2287) Unexpected terms are highlighted within nested SpanQuery instances

Michael Goddard (JIRA) Fri, 26 Feb 2010 08:51:56 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Michael Goddard updated LUCENE-2287:
------------------------------------

    Attachment: LUCENE-2287.patch

This is a very rough initial patch, has way too much cruft, but I wanted to 
park it somewhere nonetheless.  ** not meant to be applied yet **

> Unexpected terms are highlighted within nested SpanQuery instances
> ------------------------------------------------------------------
>
>                 Key: LUCENE-2287
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2287
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/highlighter
>    Affects Versions: 2.9.1
>         Environment: Linux, Solaris, Windows
>            Reporter: Michael Goddard
>            Priority: Minor
>         Attachments: LUCENE-2287.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> I haven't yet been able to resolve why I'm seeing spurious highlighting in 
> nested SpanQuery instances.  Briefly, the issue is illustrated by the second 
> instance of "Lucene" being highlighted in the test below, when it doesn't 
> satisfy the inner span.  There's been some discussion about this on the 
> java-dev list, and I'm opening this issue now because I have made some 
> initial progress on this.
> This new test, added to the  HighlighterTest class in lucene_2_9_1, 
> illustrates this:
> /*
>  * Ref: http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/
>  */
> public void testHighlightingNestedSpans2() throws Exception {
>   String theText = "The Lucene was made by Doug Cutting and Lucene great 
> Hadoop was"; // Problem
>   //String theText = "The Lucene was made by Doug Cutting and the great 
> Hadoop was"; // Works okay
>   String fieldName = "SOME_FIELD_NAME";
>   SpanNearQuery spanNear = new SpanNearQuery(new SpanQuery[] {
>     new SpanTermQuery(new Term(fieldName, "lucene")),
>     new SpanTermQuery(new Term(fieldName, "doug")) }, 5, true);
>   Query query = new SpanNearQuery(new SpanQuery[] { spanNear,
>     new SpanTermQuery(new Term(fieldName, "hadoop")) }, 4, true);
>   String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and 
> Lucene great <B>Hadoop</B> was";
>   //String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and 
> the great <B>Hadoop</B> was";
>   String observed = highlightField(query, fieldName, theText);
>   System.out.println("Expected: \"" + expected + "\n" + "Observed: \"" + 
> observed);
>   assertEquals("Why is that second instance of the term \"Lucene\" 
> highlighted?", expected, observed);
> }
> Is this an issue that's arisen before?  I've been reading through the source 
> to QueryScorer, WeightedSpanTerm, WeightedSpanTermExtractor, Spans, and 
> NearSpansOrdered, but haven't found the solution yet.  Initially, I thought 
> that the extractWeightedSpanTerms method in WeightedSpanTermExtractor should 
> be called on each clause of a SpanNearQuery or SpanOrQuery, but that didn't 
> get me too far.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] Updated: (LUCENE-2287) Unexpected terms are highlighted within nested SpanQuery instances

Reply via email to