Hello,

I initially posted a version of this question to java-user, but I think it's more of a java-dev question. I haven't yet been able to work out why I'm seeing spurious highlighting with nested SpanQuery instances. To illustrate this, I added the code below to the HighlighterTest class in lucene_2_9_1:

/*
 * Ref: http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/
 */
public void testHighlightingNestedSpans2() throws Exception {

  String theText = "The Lucene was made by Doug Cutting and Lucene great Hadoop was"; // Problem
  //String theText = "The Lucene was made by Doug Cutting and the great Hadoop was"; // Works okay

  String fieldName = "SOME_FIELD_NAME";

  // Inner span: "lucene" followed by "doug", in order, within a slop of 5
  SpanNearQuery spanNear = new SpanNearQuery(new SpanQuery[] {
      new SpanTermQuery(new Term(fieldName, "lucene")),
      new SpanTermQuery(new Term(fieldName, "doug")) }, 5, true);

  // Outer span: the inner span followed by "hadoop", in order, within a slop of 4
  Query query = new SpanNearQuery(new SpanQuery[] { spanNear,
      new SpanTermQuery(new Term(fieldName, "hadoop")) }, 4, true);

  String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and Lucene great <B>Hadoop</B> was";
  //String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and the great <B>Hadoop</B> was";

  String observed = highlightField(query, fieldName, theText);

  System.out.println("Expected: \"" + expected + "\"\n" + "Observed: \"" + observed + "\"");

  assertEquals("Why is that second instance of the term \"Lucene\" highlighted?", expected, observed);
}

Is this an issue that's arisen before? I've been reading through the source of QueryScorer, WeightedSpanTerm, WeightedSpanTermExtractor, Spans, and NearSpansOrdered, but haven't found the solution yet. Initially, I thought that the extractWeightedSpanTerms method in WeightedSpanTermExtractor should be called on each clause of a SpanNearQuery or SpanOrQuery, but that didn't get me too far.

Any suggestions are welcome.

Thanks.
Mike