Hello,

I initially posted a version of this question to java-user, but think it's more 
of a java-dev question.  I haven't yet been able to resolve why I'm seeing 
spurious highlighting in nested SpanQuery instances.  To illustrate this, I 
added the code below to the HighlighterTest class in lucene_2_9_1:

/*
 * Ref: http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/
 */
public void testHighlightingNestedSpans2() throws Exception {

  String theText = "The Lucene was made by Doug Cutting and Lucene great Hadoop was"; // Problem
  //String theText = "The Lucene was made by Doug Cutting and the great Hadoop was"; // Works okay

  String fieldName = "SOME_FIELD_NAME";

  SpanNearQuery spanNear = new SpanNearQuery(new SpanQuery[] {
    new SpanTermQuery(new Term(fieldName, "lucene")),
    new SpanTermQuery(new Term(fieldName, "doug")) }, 5, true);

  Query query = new SpanNearQuery(new SpanQuery[] { spanNear,
    new SpanTermQuery(new Term(fieldName, "hadoop")) }, 4, true);

  String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and Lucene great <B>Hadoop</B> was";
  //String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and the great <B>Hadoop</B> was";

  String observed = highlightField(query, fieldName, theText);
  System.out.println("Expected: \"" + expected + "\"\n" + "Observed: \"" + observed + "\"");

  assertEquals("Why is that second instance of the term \"Lucene\" highlighted?", expected, observed);
}

Is this an issue that's arisen before?  I've been reading through the source to 
QueryScorer, WeightedSpanTerm, WeightedSpanTermExtractor, Spans, and 
NearSpansOrdered, but haven't found the solution yet.  Initially, I thought 
that the extractWeightedSpanTerms method in WeightedSpanTermExtractor should be 
called on each clause of a SpanNearQuery or SpanOrQuery, but that didn't get me 
too far.
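
To make the symptom concrete, here is a toy model in plain Java (no Lucene 
classes) of one behaviour that would produce exactly this output.  It is only 
a hypothesis, not something I have confirmed in the highlighter source: the 
class name, the whitespace tokenization, and the hard-coded outer span of 
positions [1, 11) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SpanRangeHighlightSketch {

  // Toy model, NOT Lucene code: tag every token that is a query term and whose
  // position falls inside [spanStart, spanEnd), regardless of whether that
  // particular occurrence took part in the span match.
  static String highlight(String text, Set<String> queryTerms, int spanStart, int spanEnd) {
    String[] tokens = text.split(" ");
    StringBuilder out = new StringBuilder();
    for (int pos = 0; pos < tokens.length; pos++) {
      if (pos > 0) {
        out.append(' ');
      }
      boolean inSpan = pos >= spanStart && pos < spanEnd;
      if (inSpan && queryTerms.contains(tokens[pos].toLowerCase())) {
        out.append("<B>").append(tokens[pos]).append("</B>");
      } else {
        out.append(tokens[pos]);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    Set<String> terms = new HashSet<String>(Arrays.asList("lucene", "doug", "hadoop"));

    // Assumed outer span: from the first "Lucene" (position 1) up to and
    // including "Hadoop" (position 10), i.e. the half-open range [1, 11).
    String problem = "The Lucene was made by Doug Cutting and Lucene great Hadoop was";
    String okay = "The Lucene was made by Doug Cutting and the great Hadoop was";

    System.out.println(highlight(problem, terms, 1, 11));
    System.out.println(highlight(okay, terms, 1, 11));
  }
}
```

Under this model the second "Lucene" (position 8) is tagged simply because it 
is a query term sitting inside the outer span, while the "the" variant comes 
out as expected.  If the highlighter works along these lines, the fix would 
need to track which term occurrences actually contributed to a match rather 
than only the start and end of the enclosing span.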

Any suggestions are welcome.

Thanks.

  Mike
