Hello Lucene community! I have been working with Solr/Lucene for about half a year and have run into an interesting issue with SpanNearQuery.
Suppose a document contains the following text (the full document text is below): "intended recipient of this message or if this message has been addressed", and the query is: (messag within 3 of address) within 5 of messag within 3 of address.

I expected this query to return the document, but it did not. Judging by the section "What does it mean to require that Spans come in order and how do SpanQuerys actually match?" of http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/, it looks like Lucene does not see the second occurrence of the word "message" and picks up the first one again. When I changed the slop for the last word to 6, ((messag within 3 of address) within 5 of messag within 6 of address), the document was returned. Unfortunately, I am not allowed to widen queries at runtime.

Has anyone run into this issue, and can you tell me how to work around it?

Text sample from the document (please note that the Snowball analyzer is used during indexing; this is the raw text):

The contents of this e-mail message and any attachments are intended solely for the addressee(s) and may contain confidential and/or legally privileged information. If you are not the intended recipient of this message or if this message has been addressed to you in error, please immediately alert the sender by reply e-mail and then delete this message and any attachments. If you are not the intended recipient, you are notified that any use, dissemination, distribution, copying, or storage of this message or any attachment is strictly prohibited.

Code:

    import java.io.File;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.SimpleFSDirectory;

    // BODY is the name of the indexed text field (a constant defined elsewhere in my code).
    public static void main(String... args) throws Exception {
        // "intend" and "messag" within 4 positions of each other, in any order
        SpanNearQuery spanNear = new SpanNearQuery(new SpanQuery[] {
                new SpanTermQuery(new Term(BODY, "intend")),
                new SpanTermQuery(new Term(BODY, "messag"))}, 4, false);
        // the match above within 5 positions of (another) "messag"
        SpanNearQuery spanNear2 = new SpanNearQuery(new SpanQuery[] {
                spanNear,
                new SpanTermQuery(new Term(BODY, "messag"))}, 5, false);
        // the match above within 3 positions of "address"
        SpanNearQuery spanNear3 = new SpanNearQuery(new SpanQuery[] {
                spanNear2,
                new SpanTermQuery(new Term(BODY, "address"))}, 3, false);

        Directory directory = new SimpleFSDirectory(new File("C:\\\\20\\index"));
        IndexSearcher searcher = new IndexSearcher(directory);
        searcher.setDefaultFieldSortScoring(true, false);  // track scores while sorting
        TopDocs results = searcher.search(spanNear3, null, 20, Sort.RELEVANCE);
        for (ScoreDoc sd : results.scoreDocs) {
            int docID = sd.doc;
            float score = sd.score;
            System.out.println("Doc id: " + docID + " ,score: " + score);
        }
        searcher.close();
    }
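
To make the slop numbers concrete, here is a rough plain-Java sketch (no Lucene dependency) of the arithmetic for the nested query in the code above. It is only a simplified model and not Lucene's actual implementation. It assumes each word of the sample phrase takes one position (intended=0, message=4 and 8, addressed=11, with no stop-word gaps), and that the slop of an unordered match is the width of the combined span minus the summed lengths of its parts. The names SlopSketch, Span, cover, unorderedSlop and outerSlop are hypothetical helpers made up for this illustration.

```java
public class SlopSketch {

    /** A span over token positions [start, end), end exclusive. */
    public static final class Span {
        final int start;
        final int end;

        public Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        int length() {
            return end - start;
        }
    }

    /** Smallest span that covers both a and b. */
    public static Span cover(Span a, Span b) {
        return new Span(Math.min(a.start, b.start), Math.max(a.end, b.end));
    }

    /** Assumed slop model: width of the combined match minus the summed lengths of the parts. */
    public static int unorderedSlop(Span a, Span b) {
        return cover(a, b).length() - (a.length() + b.length());
    }

    /**
     * Slop needed by the outermost clause (... within N of "address") when the
     * middle clause's "messag" is taken from position messagPos.
     * Positions are hard-coded from the sample phrase (an assumption, see above).
     */
    public static int outerSlop(int messagPos) {
        Span intend = new Span(0, 1);
        Span firstMessag = new Span(4, 5);
        Span address = new Span(11, 12);
        Span inner = cover(intend, firstMessag);                          // [0, 5)
        Span middle = cover(inner, new Span(messagPos, messagPos + 1));   // [0, 5) or [0, 9)
        return unorderedSlop(middle, address);
    }

    public static void main(String[] args) {
        // Middle clause reuses the first "message" (position 4).
        System.out.println("reusing first messag: slop " + outerSlop(4));  // prints 6
        // Middle clause uses the second "message" (position 8).
        System.out.println("using second messag:  slop " + outerSlop(8));  // prints 2
    }
}
```

Under these assumptions, the candidate that reuses the first "message" needs a slop of 6 on the last clause, which is exactly the value at which the document starts to match for me. The candidate that uses the second "message" would need only 2, which fits within 3, so it seems that candidate is never tried.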