Sorry Peter - I introduced this problem with some kind of typo-type issue - I somehow changed an includeSpans variable to excludeSpans - but I certainly didn't mean to - it makes no sense. So I'm not sure how it happened, and I'm surprised the tests that passed still passed!
We could probably use even more tests before feeling too confident here... I've attached a patch for 3X with the new test and fix (changed that include back to exclude).

- Mark Miller
lucidimagination.com

On Jul 25, 2011, at 10:29 AM, Mark Miller wrote:

> Thanks Peter - if you supply the unit tests, I'm happy to work on the fixes.
>
> I can likely look at this later today.
>
> - Mark Miller
> lucidimagination.com
>
> On Jul 25, 2011, at 10:14 AM, Peter Keegan wrote:
>
>> Hi Mark,
>>
>> Sorry to bug you again, but there's another case that fails the unit test
>> (search within the second sentence), as shown here in the last test:
>>
>> package org.apache.lucene.search.spans;
>>
>> import java.io.Reader;
>>
>> import org.apache.lucene.analysis.Analyzer;
>> import org.apache.lucene.analysis.TokenStream;
>> import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
>> import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
>> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
>> import org.apache.lucene.document.Document;
>> import org.apache.lucene.document.Field;
>> import org.apache.lucene.index.IndexReader;
>> import org.apache.lucene.index.RandomIndexWriter;
>> import org.apache.lucene.index.Term;
>> import org.apache.lucene.store.Directory;
>> import org.apache.lucene.search.IndexSearcher;
>> import org.apache.lucene.search.PhraseQuery;
>> import org.apache.lucene.search.ScoreDoc;
>> import org.apache.lucene.search.TermQuery;
>> import org.apache.lucene.search.spans.SpanNearQuery;
>> import org.apache.lucene.search.spans.SpanQuery;
>> import org.apache.lucene.search.spans.SpanTermQuery;
>> import org.apache.lucene.util.LuceneTestCase;
>>
>> public class TestSentence extends LuceneTestCase {
>>   public static final String field = "field";
>>   public static final String START = "^";
>>   public static final String END = "$";
>>   public void testSetPosition() throws Exception {
>>     Analyzer analyzer = new Analyzer() {
>>       @Override
>>       public TokenStream tokenStream(String fieldName, Reader reader) {
>>         return new TokenStream() {
>>           private final String[] TOKENS = {"1", "2", "3", END, "4", "5", "6", END, "9"};
>>           private final int[] INCREMENTS = {1,1,1,0,1,1,1,0,1};
>>           private int i = 0;
>>           PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);
>>           CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
>>           OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
>>           @Override
>>           public boolean incrementToken() {
>>             assertEquals(TOKENS.length, INCREMENTS.length);
>>             if (i == TOKENS.length)
>>               return false;
>>             clearAttributes();
>>             termAtt.append(TOKENS[i]);
>>             offsetAtt.setOffset(i,i);
>>             posIncrAtt.setPositionIncrement(INCREMENTS[i]);
>>             i++;
>>             return true;
>>           }
>>         };
>>       }
>>     };
>>     Directory store = newDirectory();
>>     RandomIndexWriter writer = new RandomIndexWriter(random, store, analyzer);
>>     Document d = new Document();
>>     d.add(newField("field", "bogus", Field.Store.YES, Field.Index.ANALYZED));
>>     writer.addDocument(d);
>>     IndexReader reader = writer.getReader();
>>     writer.close();
>>     IndexSearcher searcher = newSearcher(reader);
>>     SpanTermQuery startSentence = makeSpanTermQuery(START);
>>     SpanTermQuery endSentence = makeSpanTermQuery(END);
>>     SpanQuery[] clauses = new SpanQuery[2];
>>     clauses[0] = makeSpanTermQuery("1");
>>     clauses[1] = makeSpanTermQuery("2");
>>     SpanNearQuery allKeywords = new SpanNearQuery(clauses, Integer.MAX_VALUE, false); // SpanAndQuery equivalent
>>     SpanWithinQuery query = new SpanWithinQuery(allKeywords, endSentence, 0);
>>     System.out.println("query: "+query);
>>     ScoreDoc[] hits = searcher.search(query, null, 1000).scoreDocs;
>>     assertEquals(1, hits.length);
>>     clauses[1] = makeSpanTermQuery("4");
>>     allKeywords = new SpanNearQuery(clauses, Integer.MAX_VALUE, false); // SpanAndQuery equivalent
>>     query = new SpanWithinQuery(allKeywords, endSentence, 0);
>>     System.out.println("query: "+query);
>>     hits = searcher.search(query, null, 1000).scoreDocs;
>>     assertEquals(0, hits.length);
>>     PhraseQuery pq = new PhraseQuery();
>>     pq.add(new Term(field, "3"));
>>     pq.add(new Term(field, "4"));
>>     System.out.println("query: "+pq);
>>     hits = searcher.search(pq, null, 1000).scoreDocs;
>>     assertEquals(1, hits.length);
>>     clauses[0] = makeSpanTermQuery("4");
>>     clauses[1] = makeSpanTermQuery("6");
>>     allKeywords = new SpanNearQuery(clauses, Integer.MAX_VALUE, false); // SpanAndQuery equivalent
>>     query = new SpanWithinQuery(allKeywords, endSentence, 0);
>>     System.out.println("query: "+query);
>>     hits = searcher.search(query, null, 1000).scoreDocs;
>>     assertEquals(1, hits.length);
>>   }
>>
>>   public SpanTermQuery makeSpanTermQuery(String text) {
>>     return new SpanTermQuery(new Term(field, text));
>>   }
>>   public TermQuery makeTermQuery(String text) {
>>     return new TermQuery(new Term(field, text));
>>   }
>> }
>>
>> Peter
>>
>> On Thu, Jul 21, 2011 at 5:23 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>
>>>
>>> I just uploaded a patch for 3X that will work for 3.2.
>>>
>>> On Jul 21, 2011, at 4:25 PM, Mark Miller wrote:
>>>
>>>> Yeah, it's off trunk - I'll submit a 3X patch in a bit - just have to change that to an IndexReader I believe.
>>>>
>>>> - Mark
>>>>
>>>> On Jul 21, 2011, at 4:01 PM, Peter Keegan wrote:
>>>>
>>>>> Does this patch require the trunk version? I'm using 3.2 and
>>>>> 'AtomicReaderContext' isn't there.
>>>>>
>>>>> Peter
>>>>>
>>>>> On Thu, Jul 21, 2011 at 3:07 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>
>>>>>> Hey Peter,
>>>>>>
>>>>>> Getting sucked back into Spans...
>>>>>>
>>>>>> That test should pass now - I uploaded a new patch to
>>>>>> https://issues.apache.org/jira/browse/LUCENE-777
>>>>>>
>>>>>> Further tests may be needed though.
>>>>>>
>>>>>> - Mark
>>>>>>
>>>>>>
>>>>>> On Jul 21, 2011, at 9:28 AM, Peter Keegan wrote:
>>>>>>
>>>>>>> Hi Mark,
>>>>>>>
>>>>>>> Here is a unit test using a version of 'SpanWithinQuery' modified for 3.2
>>>>>>> ('getTerms' removed). The last test fails (search for "1" and "3").
>>>>>>>
>>>>>>> package org.apache.lucene.search.spans;
>>>>>>>
>>>>>>> import java.io.Reader;
>>>>>>>
>>>>>>> import org.apache.lucene.analysis.Analyzer;
>>>>>>> import org.apache.lucene.analysis.TokenStream;
>>>>>>> import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
>>>>>>> import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
>>>>>>> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
>>>>>>> import org.apache.lucene.document.Document;
>>>>>>> import org.apache.lucene.document.Field;
>>>>>>> import org.apache.lucene.index.IndexReader;
>>>>>>> import org.apache.lucene.index.RandomIndexWriter;
>>>>>>> import org.apache.lucene.index.Term;
>>>>>>> import org.apache.lucene.store.Directory;
>>>>>>> import org.apache.lucene.search.IndexSearcher;
>>>>>>> import org.apache.lucene.search.PhraseQuery;
>>>>>>> import org.apache.lucene.search.ScoreDoc;
>>>>>>> import org.apache.lucene.search.TermQuery;
>>>>>>> import org.apache.lucene.search.spans.SpanNearQuery;
>>>>>>> import org.apache.lucene.search.spans.SpanQuery;
>>>>>>> import org.apache.lucene.search.spans.SpanTermQuery;
>>>>>>> import org.apache.lucene.util.LuceneTestCase;
>>>>>>>
>>>>>>> public class TestSentence extends LuceneTestCase {
>>>>>>>   public static final String field = "field";
>>>>>>>   public static final String START = "^";
>>>>>>>   public static final String END = "$";
>>>>>>>   public void testSetPosition() throws Exception {
>>>>>>>     Analyzer analyzer = new Analyzer() {
>>>>>>>       @Override
>>>>>>>       public TokenStream tokenStream(String fieldName, Reader reader) {
>>>>>>>         return new TokenStream() {
>>>>>>>           private final String[] TOKENS = {"1", "2", "3", END, "4", "5", "6", END, "9"};
>>>>>>>           private final int[] INCREMENTS = {1,1,1,0,1,1,1,0,1};
>>>>>>>           private int i = 0;
>>>>>>>
>>>>>>>           PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);
>>>>>>>           CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
>>>>>>>           OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
>>>>>>>
>>>>>>>           @Override
>>>>>>>           public boolean incrementToken() {
>>>>>>>             assertEquals(TOKENS.length, INCREMENTS.length);
>>>>>>>             if (i == TOKENS.length)
>>>>>>>               return false;
>>>>>>>             clearAttributes();
>>>>>>>             termAtt.append(TOKENS[i]);
>>>>>>>             offsetAtt.setOffset(i,i);
>>>>>>>             posIncrAtt.setPositionIncrement(INCREMENTS[i]);
>>>>>>>             i++;
>>>>>>>             return true;
>>>>>>>           }
>>>>>>>         };
>>>>>>>       }
>>>>>>>     };
>>>>>>>     Directory store = newDirectory();
>>>>>>>     RandomIndexWriter writer = new RandomIndexWriter(random, store, analyzer);
>>>>>>>     Document d = new Document();
>>>>>>>     d.add(newField("field", "bogus", Field.Store.YES, Field.Index.ANALYZED));
>>>>>>>     writer.addDocument(d);
>>>>>>>     IndexReader reader = writer.getReader();
>>>>>>>     writer.close();
>>>>>>>     IndexSearcher searcher = newSearcher(reader);
>>>>>>>
>>>>>>>     SpanTermQuery startSentence = makeSpanTermQuery(START);
>>>>>>>     SpanTermQuery endSentence = makeSpanTermQuery(END);
>>>>>>>     SpanQuery[] clauses = new SpanQuery[2];
>>>>>>>     clauses[0] = makeSpanTermQuery("1");
>>>>>>>     clauses[1] = makeSpanTermQuery("2");
>>>>>>>     SpanNearQuery allKeywords = new SpanNearQuery(clauses, Integer.MAX_VALUE, false); // SpanAndQuery equivalent
>>>>>>>     SpanWithinQuery query = new SpanWithinQuery(allKeywords, endSentence, 0);
>>>>>>>     System.out.println("query: "+query);
>>>>>>>     ScoreDoc[] hits = searcher.search(query, null, 1000).scoreDocs;
>>>>>>>     assertEquals(hits.length, 1);
>>>>>>>
>>>>>>>     clauses[1] = makeSpanTermQuery("4");
>>>>>>>     allKeywords = new SpanNearQuery(clauses, Integer.MAX_VALUE, false); // SpanAndQuery equivalent
>>>>>>>     query = new SpanWithinQuery(allKeywords, endSentence, 0);
>>>>>>>     System.out.println("query: "+query);
>>>>>>>     hits = searcher.search(query, null, 1000).scoreDocs;
>>>>>>>     assertEquals(hits.length, 0);
>>>>>>>
>>>>>>>     PhraseQuery pq = new PhraseQuery();
>>>>>>>     pq.add(new Term(field, "3"));
>>>>>>>     pq.add(new Term(field, "4"));
>>>>>>>     hits = searcher.search(pq, null, 1000).scoreDocs;
>>>>>>>     assertEquals(hits.length, 1);
>>>>>>>
>>>>>>>     clauses[1] = makeSpanTermQuery("3");
>>>>>>>     allKeywords = new SpanNearQuery(clauses, Integer.MAX_VALUE, false); // SpanAndQuery equivalent
>>>>>>>     query = new SpanWithinQuery(allKeywords, endSentence, 0);
>>>>>>>     System.out.println("query: "+query);
>>>>>>>     hits = searcher.search(query, null, 1000).scoreDocs;
>>>>>>>     assertEquals(hits.length, 1);
>>>>>>>
>>>>>>>
>>>>>>>   }
>>>>>>>
>>>>>>>   public SpanTermQuery makeSpanTermQuery(String text) {
>>>>>>>     return new SpanTermQuery(new Term(field, text));
>>>>>>>   }
>>>>>>>   public TermQuery makeTermQuery(String text) {
>>>>>>>     return new TermQuery(new Term(field, text));
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>> On Wed, Jul 20, 2011 at 9:22 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Jul 20, 2011, at 7:44 PM, Mark Miller wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Jul 20, 2011, at 11:27 AM, Peter Keegan wrote:
>>>>>>>>>
>>>>>>>>>> Mark Miller's 'SpanWithinQuery' patch
>>>>>>>>>> seems to have the same issue.
>>>>>>>>>
>>>>>>>>> If I remember right (it's been more than a couple of years), I did index the sentence markers at the same position as the last word in the sentence. And I think the limitation that I ate was that the word could belong to both its true sentence, and the one after it.
>>>>>>>>>
>>>>>>>>> - Mark Miller
>>>>>>>>> lucidimagination.com
>>>>>>>>
>>>>>>>> Perhaps you could index the sentence marker at both the last word of the sentence as well as the first word of the next sentence if there is one. This would seem to solve the above limitation as well?
>>>>>>>>
>>>>>>>> - Mark Miller
>>>>>>>> lucidimagination.com
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>>>>>
>>>>>>
>>>>>> - Mark Miller
>>>>>> lucidimagination.com
>>>>>>
>>>>
>>>> - Mark Miller
>>>> lucidimagination.com
>>>>
>>>
>>> - Mark Miller
>>> lucidimagination.com
>>>
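
For anyone following the thread, here is a minimal sketch of the sentence-boundary behavior the quoted TestSentence case expects, written in plain Java with no Lucene classes. It is not the SpanWithinQuery implementation from the patch. The names SentencePositions, positions and sameSentence are hypothetical, and the sketch assumes the first token (increment 1) lands at position 0, so an increment of 0 stacks the END marker on the last word of its sentence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it does not use or mirror Lucene internals.
class SentencePositions {
    static final String END = "$";

    // Turn position increments into absolute positions.
    // Assumption: counting starts at -1, so a first increment of 1 gives position 0.
    static int[] positions(int[] increments) {
        int[] pos = new int[increments.length];
        int current = -1;
        for (int i = 0; i < increments.length; i++) {
            current += increments[i];
            pos[i] = current;
        }
        return pos;
    }

    // Two terms count as being in the same sentence when no END marker sits at a
    // position p with lo <= p < hi, where lo and hi are the two term positions.
    // Only the first occurrence of each term is considered.
    static boolean sameSentence(String[] tokens, int[] increments, String a, String b) {
        int[] pos = positions(increments);
        Integer pa = null;
        Integer pb = null;
        List<Integer> ends = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(END)) {
                ends.add(pos[i]);
                continue;
            }
            if (pa == null && tokens[i].equals(a)) pa = pos[i];
            if (pb == null && tokens[i].equals(b)) pb = pos[i];
        }
        if (pa == null || pb == null) return false; // a term is missing
        int lo = Math.min(pa, pb);
        int hi = Math.max(pa, pb);
        for (int e : ends) {
            if (e >= lo && e < hi) return false; // a sentence end lies between the terms
        }
        return true;
    }
}
```

With the TOKENS and INCREMENTS arrays from the test, this check agrees with the four expected hit counts: "1"/"2", "1"/"3" and "4"/"6" fall in one sentence, while "1"/"4" straddles the first END marker.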