On Aug 30, 2007, at 11:21 AM, Paul Elschot wrote:
Grant,

On Wednesday 15 August 2007 17:57, Grant Ingersoll wrote:

Couple of Spans questions for people:

1. Would the docs be clearer for Spans.end() if it said that the span is not inclusive of the end position? From what I can tell, it is not inclusive, correct?

Yes. The easiest place to see that is in TermSpans.end(), which is the term position plus 1, see TermSpans.java line 89.
I will update the docs to make it explicit.
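To make the exclusive end position concrete, here is a minimal sketch in plain Java. It does not use Lucene: the class HalfOpenSpan and its methods are hypothetical names chosen only for this illustration. The one fact taken from the thread is that the end of a term span is the term position plus 1.

```java
// Hypothetical stand-in for a span; not part of Lucene.
final class HalfOpenSpan {
    final int start; // first position covered by the span
    final int end;   // one past the last covered position (exclusive)

    HalfOpenSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // A single term at the given position covers [position, position + 1).
    static HalfOpenSpan forTerm(int position) {
        return new HalfOpenSpan(position, position + 1);
    }

    // With an exclusive end, the number of covered positions is end - start.
    int length() {
        return end - start;
    }

    // The end position itself is not covered.
    boolean covers(int position) {
        return start <= position && position < end;
    }

    public static void main(String[] args) {
        HalfOpenSpan term = HalfOpenSpan.forTerm(2);
        System.out.println(term.start + ".." + term.end + " length=" + term.length());
        System.out.println("covers end position: " + term.covers(term.end));
    }
}
```

Under this convention a span with start 1 and end 3 covers positions 1 and 2 only, which is how the start/end pairs in the tests below should be read.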
2. I have added the following test to TestSpans.java:

  public void testSpanNearUnOrdered() throws Exception {
    SpanNearQuery snq;
    SpanNearQuery u1u2 = new SpanNearQuery(
        new SpanQuery[] { makeSpanTermQuery("u1"), makeSpanTermQuery("u2") },
        0, false);
    snq = new SpanNearQuery(
        new SpanQuery[] { u1u2, makeSpanTermQuery("u2") },
        1, false);
    spans = snq.getSpans(searcher.getIndexReader());
    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 4, spans.doc());
    assertEquals("start", 0, spans.start());
    assertEquals("end", 3, spans.end());
    //Why does this match?
    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 4, spans.doc());
    assertEquals("start", 1, spans.start());
    assertEquals("end", 3, spans.end());
    ...
  }

My question is: why does the second span match? Doc 4 looks like "u2 u2 u1" (see the docFields array in TestSpans.java). It seems incorrect because it is completely inside of the other Span, but maybe I am just not understanding the slop factor or something about unordered spans. I would think there would only be one match for this document, since the u1u2 has a slop of 0 and the snq has a slop of 1 (which shouldn't matter, since there are no other permutations).

I split the original NearSpans into an ordered and an unordered version because there was a bug, LUCENE-569, for the ordered case that was difficult to fix while keeping these two cases in the same class. I documented the ordered case in the javadoc of the NearSpansOrdered class. I also specialized the original NearSpans class to implement only the unordered case, and did not add javadoc comments there.

In the current version of NearSpansOrdered the subspans should not overlap to form a match. I did that to prevent the ordered spans query "t1 t1" from matching all single occurrences of t1. Btw. similar considerations apply for terms indexed at the same position. However, iirc there is no test case for a span near query with the same terms (subspans).
At the time of LUCENE-569 I considered writing separate versions of ordered/unordered and overlapping/non-overlapping, but that would have resulted in four different cases, and the split into ordered/unordered was enough to fix the bug, so I left it at that. The split into ordered and unordered was a split into (ordered + non-overlapping) and (unordered + overlapping), and this is what you see in your test cases for unordered spans. To make the semantics of NearSpans fully clear, it is probably a good idea to make all four cases for the subspans separately available.
Thanks for the info, Paul. This makes sense. I am not sure how I feel about spans within spans. I think in my test case it isn't that they are overlapping, the one is a subset of the other, which doesn't seem correct, but maybe I am wrong. I think you are right, that we should make the 4 cases explicit.
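The overlapping versus non-overlapping distinction can be sketched for doc 4 ("u2 u2 u1") in plain Java. This is an illustration only, not Lucene's NearSpans code: the class SubspanCases, the Range type and the allowOverlap flag are hypothetical, and slop and ordering are deliberately left out. The input ranges come from the tests in this thread: the inner u1/u2 query matches positions [1,3), and u2 alone occurs at [0,1) and [1,2).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not Lucene's implementation.
final class SubspanCases {

    // Half-open position range [start, end).
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    // True when the two ranges share at least one position.
    static boolean overlaps(Range a, Range b) {
        return a.start < b.end && b.start < a.end;
    }

    // True when inner lies completely inside outer (the "subset" case).
    static boolean contains(Range outer, Range inner) {
        return outer.start <= inner.start && inner.end <= outer.end;
    }

    // Smallest range covering both a and b.
    static Range cover(Range a, Range b) {
        return new Range(Math.min(a.start, b.start), Math.max(a.end, b.end));
    }

    // Unordered pairing of one subspan with each candidate.
    // When allowOverlap is false, candidates sharing a position with first are skipped.
    static List<Range> pair(Range first, List<Range> candidates, boolean allowOverlap) {
        List<Range> matches = new ArrayList<>();
        for (Range candidate : candidates) {
            if (allowOverlap || !overlaps(first, candidate)) {
                matches.add(cover(first, candidate));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Range u1u2 = new Range(1, 3);
        List<Range> u2 = Arrays.asList(new Range(0, 1), new Range(1, 2));
        System.out.println(pair(u1u2, u2, true));  // prints [[0,3), [1,3)]
        System.out.println(pair(u1u2, u2, false)); // prints [[0,3)]
    }
}
```

With overlap allowed, the u2 at [1,2) pairs with the u1u2 range that already contains it, which yields the second match (start 1, end 3) that the test asks about. With overlap disallowed in this sketch, only the match from 0 to 3 remains.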
Regards, Paul Elschot P.S. I also remember hesitating between the class names NearSpansUnordered and NearSpansUnOrdered. In case you want to change the class name in the trunk to NearSpansUnOrdered, please do so.
I won't change them. I am never sure how to name those edge cases, either.
Cheers, Grant
In my mind, the correct test should be something like:

  public void testSpanNearUnOrdered() throws Exception {
    SpanNearQuery snq;
    snq = new SpanNearQuery(
        new SpanQuery[] { makeSpanTermQuery("u1"), makeSpanTermQuery("u2") },
        0, false);
    Spans spans = snq.getSpans(searcher.getIndexReader());
    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 4, spans.doc());
    assertEquals("start", 1, spans.start());
    assertEquals("end", 3, spans.end());

    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 5, spans.doc());
    assertEquals("start", 2, spans.start());
    assertEquals("end", 4, spans.end());

    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 8, spans.doc());
    assertEquals("start", 2, spans.start());
    assertEquals("end", 4, spans.end());

    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 9, spans.doc());
    assertEquals("start", 0, spans.start());
    assertEquals("end", 2, spans.end());

    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 10, spans.doc());
    assertEquals("start", 0, spans.start());
    assertEquals("end", 2, spans.end());
    assertTrue("Has next and it shouldn't: " + spans.doc(), spans.next() == false);

    SpanNearQuery u1u2 = new SpanNearQuery(
        new SpanQuery[] { makeSpanTermQuery("u1"), makeSpanTermQuery("u2") },
        0, false);
    snq = new SpanNearQuery(
        new SpanQuery[] { u1u2, makeSpanTermQuery("u2") },
        1, false);
    spans = snq.getSpans(searcher.getIndexReader());
    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 4, spans.doc());
    assertEquals("start", 0, spans.start());
    assertEquals("end", 3, spans.end());

    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 5, spans.doc());
    assertEquals("start", 0, spans.start());
    assertEquals("end", 4, spans.end());

    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 8, spans.doc());
    assertEquals("start", 0, spans.start());
    assertEquals("end", 5, spans.end());

    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 9, spans.doc());
    assertEquals("start", 0, spans.start());
    assertEquals("end", 5, spans.end());

    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 10, spans.doc());
    assertEquals("start", 0, spans.start());
    assertEquals("end", 5, spans.end());
    assertTrue("Has next and it shouldn't", spans.next() == false);
  }

Thanks,
Grant

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/