One possible explanation for the ~5 result: your text has 5 words, so a
slop of 5 would let a word in any position match. Could that be it?
----- Original Message -----
From: "Ivan Vasilev" <[EMAIL PROTECTED]>
To: "LUCENE MAIL LIST" <java-user@lucene.apache.org>
Sent: Thursday, April 03, 2008 6:03 AM
Subject: PhraseQuery little bug?
Hi Guys,
I ran the following test. I created 2 files: File1.txt with content:
“apple 2 3 4 pear”
And File2.txt with content:
“pear 2 3 4 apple”
I ran the following search tests:
1. Using Luke Search tab.
1.1. When searching for:
content:"pear apple"~3
Then File1.txt is returned.
1.2. When searching for:
content:"pear apple"~4
Result is the same – File1.txt
1.3. When searching for:
content:"pear apple"~5
Both File1.txt and File2.txt are returned.
2. Using a simple app that uses the PhraseQuery class: the results are the
same as in Test 1.
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("content", "apple"));
pq.add(new Term("content", "pear"));
2.1. pq.setSlop(3);
Then File1.txt is returned.
2.2. pq.setSlop(4);
Result is the same – File1.txt
2.3. pq.setSlop(5);
Both File1.txt and File2.txt are returned.
3. Using a simple app that uses the SpanNearQuery class: the results are
now different (in my opinion these are the correct results):
new SpanNearQuery(new SpanQuery[]{
    new SpanTermQuery(new Term("content", "apple")),
    new SpanTermQuery(new Term("content", "pear"))}, 3, false);
Both File1.txt and File2.txt are returned.
When changing the slop from 3 to 2 (in the constructor), no results are
found.
When changing it to 4, both files are returned again.
When changing the inOrder boolean (in the constructor) from false to
true, File2.txt is of course no longer returned, whatever the slop.
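The SpanNearQuery results in test 3 are consistent with a simple model of unordered span matching: the slop needed is the width of the window covering all terms, minus the number of terms. The sketch below is only that model, not Lucene code; the class `UnorderedSpanSketch` and the helper `requiredSlop` are hypothetical names, and it assumes each term occurs exactly once in the document.

```java
import java.util.Arrays;
import java.util.List;

public class UnorderedSpanSketch {

    // Hypothetical helper (not a Lucene API): smallest slop at which an
    // unordered near-match over single-term spans would succeed, or -1 if
    // a term is missing. Assumes each term occurs once in the document.
    static int requiredSlop(List<String> doc, String... terms) {
        int minStart = Integer.MAX_VALUE;
        int maxEnd = Integer.MIN_VALUE;
        for (String term : terms) {
            int pos = doc.indexOf(term);
            if (pos < 0) {
                return -1;
            }
            minStart = Math.min(minStart, pos);
            maxEnd = Math.max(maxEnd, pos + 1); // a term span ends one past its position
        }
        // Window width minus the positions occupied by the terms themselves.
        return (maxEnd - minStart) - terms.length;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("apple 2 3 4 pear".split(" "));
        List<String> file2 = Arrays.asList("pear 2 3 4 apple".split(" "));
        System.out.println(requiredSlop(file1, "apple", "pear")); // 3
        System.out.println(requiredSlop(file2, "apple", "pear")); // 3
    }
}
```

Under this model both files need a slop of 3, which agrees with the observation that slop 2 returns nothing and slop 3 or 4 returns both files.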
Although this is not very important, I think PhraseQuery behaves in a
slightly buggy way. When searching with a slop equal to the number of words
between the two search terms, only the files that contain the terms IN THE
SAME ORDER are returned.
When the slop is greater by 2 (that is, it covers the search terms as well
as the words in between), THE ORDER of the search terms no longer matters.
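The PhraseQuery numbers in tests 1 and 2 fit a simple model of sloppy phrase matching, as far as I understand it: subtract each term's offset within the phrase from its position in the document, then take the difference between the largest and smallest shifted positions. The sketch below only illustrates that model and is not Lucene's implementation; `SloppyPhraseSketch` and `requiredSlop` are hypothetical names, and each term is assumed to occur exactly once.

```java
import java.util.Arrays;
import java.util.List;

public class SloppyPhraseSketch {

    // Hypothetical helper (not a Lucene API): smallest slop at which the
    // phrase would match under the simplified model, or -1 if a term is
    // missing. Assumes each term occurs once in the document.
    static int requiredSlop(List<String> doc, String... phrase) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int offset = 0; offset < phrase.length; offset++) {
            int pos = doc.indexOf(phrase[offset]);
            if (pos < 0) {
                return -1;
            }
            // Shift by the term's offset in the phrase, so an exact
            // in-order match gives identical shifted positions.
            int shifted = pos - offset;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("apple 2 3 4 pear".split(" "));
        List<String> file2 = Arrays.asList("pear 2 3 4 apple".split(" "));
        System.out.println(requiredSlop(file1, "apple", "pear")); // 3
        System.out.println(requiredSlop(file2, "apple", "pear")); // 5
    }
}
```

In this model the reversed document costs 2 more than the in-order one (5 instead of 3), which matches File2.txt appearing only once the slop reaches 5.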
Best Regards,
Ivan
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]