Meanwhile, it occurred to me that your situation is about containment of spans, whereas what is currently implemented is about overlap and order. Containment is actually a special case of overlap, but with containment there is less need to talk about order. Perhaps span containment could even be treated as a case closely related to SpanNotQuery.
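
To make the distinction concrete, here is a minimal sketch of the two relations on half-open position ranges, using the 0-3 and 1-3 spans from the test on doc 4. The class SpanRelations and its methods are hypothetical names for illustration only and are not part of Lucene; the sketch assumes non-empty spans whose end is exclusive, as with Spans.end().

```java
// Hypothetical helper for illustration; not part of Lucene.
// A span is treated as a half-open range [start, end), end exclusive.
public final class SpanRelations {

    private SpanRelations() {}

    /** True if [s1, e1) and [s2, e2) share at least one position. */
    public static boolean overlaps(int s1, int e1, int s2, int e2) {
        return s1 < e2 && s2 < e1;
    }

    /** True if [innerStart, innerEnd) lies entirely within [outerStart, outerEnd). */
    public static boolean contains(int outerStart, int outerEnd,
                                   int innerStart, int innerEnd) {
        return outerStart <= innerStart && innerEnd <= outerEnd;
    }

    public static void main(String[] args) {
        // The two spans reported for doc 4 ("u2 u2 u1"): 0-3 and 1-3.
        System.out.println(contains(0, 3, 1, 3)); // true: 1-3 is inside 0-3
        System.out.println(overlaps(0, 3, 1, 3)); // true: containment implies overlap

        // Overlap without containment.
        System.out.println(overlaps(0, 2, 1, 3)); // true
        System.out.println(contains(0, 2, 1, 3)); // false

        // Adjacent term spans [0,1) and [1,2) do not overlap, because end is exclusive.
        System.out.println(overlaps(0, 1, 1, 2)); // false
    }
}
```

Note that contains() only compares the boundaries of the two spans, so the question of which subspan comes first does not arise there, which is why order matters less for containment.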

Regards,
Paul Elschot

On Sunday 16 September 2007 04:43, Grant Ingersoll wrote:
>
> On Aug 30, 2007, at 2:42 PM, Grant Ingersoll wrote:
>
> >
> > On Aug 30, 2007, at 11:21 AM, Paul Elschot wrote:
> >
> >> Grant,
> >>
> >> On Wednesday 15 August 2007 17:57, Grant Ingersoll wrote:
> >>> Couple of Spans questions for people:
> >>>
> >>> 1. Would the docs be clearer for Spans.end() if it said that the
> >>> span is not inclusive of the end position? From what I can tell, it
> >>> is not inclusive, correct?
> >>
> >> Yes. The easiest place to see that is in TermSpans.end(),
> >> which is the term position plus 1, see TermSpans.java line 89.
> >>
> >
> > I will update the docs to make it explicit.
> >
> >>
> >>> 2. I have added the following test to TestSpans.java
> >>> public void testSpanNearUnOrdered() throws Exception {
> >>>
> >>>   SpanNearQuery snq;
> >>>   SpanNearQuery u1u2 = new SpanNearQuery(new SpanQuery[]
> >>>       {makeSpanTermQuery("u1"),
> >>>        makeSpanTermQuery("u2")}, 0,
> >>>       false);
> >>>   snq = new SpanNearQuery(
> >>>       new SpanQuery[] {
> >>>         u1u2,
> >>>         makeSpanTermQuery("u2")
> >>>       },
> >>>       1,
> >>>       false);
> >>>   spans = snq.getSpans(searcher.getIndexReader());
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 4, spans.doc());
> >>>   assertEquals("start", 0, spans.start());
> >>>   assertEquals("end", 3, spans.end());
> >>>
> >>>   //Why does this match?
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 4, spans.doc());
> >>>   assertEquals("start", 1, spans.start());
> >>>   assertEquals("end", 3, spans.end());
> >>>
> >>>   ...
> >>> }
> >>>
> >>> My question is why does the second span match? Doc 4 looks like:
> >>> "u2 u2 u1" (see the docFields array in TestSpans.java) It seems
> >>> incorrect because it is completely inside of the other Span, but
> >>> maybe I am just not understanding the slop factor or something about
> >>> unordered spans. I would think there would only be one match for
> >>> this document since the u1u2 has a slop of 0 and the snq has a slop
> >>> of 1 (which shouldn't matter, since there are no other
> >>> permutations).
> >>
> >> I split the original NearSpans into an ordered and an unordered
> >> version because there was a bug LUCENE-569 for the ordered
> >> case that was difficult to fix while keeping these two cases
> >> in the same class.
> >>
> >> I documented the ordered case in the javadoc of the
> >> NearSpansOrdered class. I also specialized the original
> >> NearSpans class to implement only the unordered case,
> >> and did not add javadoc comments there.
> >>
> >> In the current version of NearSpansOrdered the subspans should
> >> not overlap to form a match. I did that to prevent the
> >> ordered spans query "t1 t1" to match all single occurrences of t1.
> >> Btw. similar considerations apply for terms indexed at the same
> >> position. However, iirc there is no test case for a span near query
> >> with the same terms (subspans).
> >>
> >
> >> At the time of LUCENE-569 I considered writing separate versions
> >> of ordered/unordered and overlapping/non overlapping, but that
> >> would have resulted in four different cases, and the split into
> >> ordered/unordered was enough to fix the bug, so I left it at that.
> >> The split into ordered and unordered was a split
> >> into (ordered + non overlapping) and (unordered + overlapping),
> >> and this is what you see in your test cases for unordered spans.
> >>
> >> To totally clear the semantics of NearSpans, it is probably a good
> >> idea to make all four cases for the subspans separately available.
> >>
> >>
> >
> > Thanks for the info, Paul. This makes sense. I am not sure how I
> > feel about spans within spans. I think in my test case it isn't
> > that they are overlapping, the one is a subset of the other, which
> > doesn't seem correct, but maybe I am wrong. I think you are right,
> > that we should make the 4 cases explicit.
>
> In thinking about this some more, I think it is actually doing a
> reasonable thing, even if it is still a subset of the other, thus I
> am going to leave it as is (and update my test). The results that
> are returned are "narrower" and I can thus see a case being made for
> returning them.
>
> Still, given a doc:
> u2 u2 u1
>
> and
> SpanNearQuery u1u2 = new SpanNearQuery(new SpanQuery[]
>     {makeSpanTermQuery("u1"),
>      makeSpanTermQuery("u2")}, 0, false);
> snq = new SpanNearQuery(
>     new SpanQuery[] {
>       u1u2,
>       makeSpanTermQuery("u2")
>     },
>     1,
>     false);
>
>
> I am not totally sure it makes sense to return 0-3 as a span AND 1-3
> as Span because the second "u2" is being used to satisfy the u1u2
> clause AND the solo "u2" clause in the snq query above. However,
> since this behavior has been around for a while and no one has really
> complained and I can understand wanting to satisfy the clauses this
> way, I can be convinced to leave it alone.
>
> Anyone have opinions otherwise?
>
> >
> >> Regards,
> >> Paul Elschot
> >>
> >>
> >> P.S. I also remember hesitating between the class names
> >> NearSpansUnordered and NearSpansUnOrdered. In case
> >> you want to change the class name in the trunk to
> >> NearSpansUnOrdered, please do so.
> >>
> >
> > I won't change them. I am never sure how to name those edge cases,
> > either.
> >
> > Cheers,
> > Grant
> >
> >
> >>> In my mind, the correct test should be something like:
> >>> public void testSpanNearUnOrdered() throws Exception {
> >>>
> >>>   SpanNearQuery snq;
> >>>   snq = new SpanNearQuery(
> >>>       new SpanQuery[] {
> >>>         makeSpanTermQuery("u1"),
> >>>         makeSpanTermQuery("u2") },
> >>>       0,
> >>>       false);
> >>>   Spans spans = snq.getSpans(searcher.getIndexReader());
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 4, spans.doc());
> >>>   assertEquals("start", 1, spans.start());
> >>>   assertEquals("end", 3, spans.end());
> >>>
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 5, spans.doc());
> >>>   assertEquals("start", 2, spans.start());
> >>>   assertEquals("end", 4, spans.end());
> >>>
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 8, spans.doc());
> >>>   assertEquals("start", 2, spans.start());
> >>>   assertEquals("end", 4, spans.end());
> >>>
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 9, spans.doc());
> >>>   assertEquals("start", 0, spans.start());
> >>>   assertEquals("end", 2, spans.end());
> >>>
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 10, spans.doc());
> >>>   assertEquals("start", 0, spans.start());
> >>>   assertEquals("end", 2, spans.end());
> >>>   assertTrue("Has next and it shouldn't: " + spans.doc(),
> >>>       spans.next() == false);
> >>>
> >>>   SpanNearQuery u1u2 = new SpanNearQuery(new SpanQuery[]
> >>>       {makeSpanTermQuery("u1"),
> >>>        makeSpanTermQuery("u2")}, 0,
> >>>       false);
> >>>   snq = new SpanNearQuery(
> >>>       new SpanQuery[] {
> >>>         u1u2,
> >>>         makeSpanTermQuery("u2")
> >>>       },
> >>>       1,
> >>>       false);
> >>>   spans = snq.getSpans(searcher.getIndexReader());
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 4, spans.doc());
> >>>   assertEquals("start", 0, spans.start());
> >>>   assertEquals("end", 3, spans.end());
> >>>
> >>>
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 5, spans.doc());
> >>>   assertEquals("start", 0, spans.start());
> >>>   assertEquals("end", 4, spans.end());
> >>>
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 8, spans.doc());
> >>>   assertEquals("start", 0, spans.start());
> >>>   assertEquals("end", 5, spans.end());
> >>>
> >>>
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 9, spans.doc());
> >>>   assertEquals("start", 0, spans.start());
> >>>   assertEquals("end", 5, spans.end());
> >>>
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 10, spans.doc());
> >>>   assertEquals("start", 0, spans.start());
> >>>   assertEquals("end", 5, spans.end());
> >>>   assertTrue("Has next and it shouldn't", spans.next() == false);
> >>> }
> >>>
> >>>
> >>>
> >>> Thanks,
> >>> Grant
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >>> For additional commands, e-mail: [EMAIL PROTECTED]
> >>>
> >
> > ------------------------------------------------------
> > Grant Ingersoll
> > http://www.grantingersoll.com/
> > http://lucene.grantingersoll.com
> > http://www.paperoftheweek.com/
> >
>
> ------------------------------------------------------
> Grant Ingersoll
> http://www.grantingersoll.com/
> http://lucene.grantingersoll.com
> http://www.paperoftheweek.com/
>