Yonik,
Thanks for your detailed reply. I'm looking at LUCENE-330 but
unfortunately the patches are now stale and I'm not sure which files
are applicable to this scenario. Hopefully Paul can advise.
Thanks,
Erik
On Mar 4, 2006, at 12:59 AM, Yonik Seeley wrote:
This is the first time I've looked at FilteredQuery, but the scorer is
indeed flawed IMO.
next() and skipTo() simply iterate over the documents that match the
query, and score() just returns 0 for any document that doesn't match
the filter.
    public boolean next() throws IOException { return scorer.next(); }
    public boolean skipTo(int i) throws IOException { return scorer.skipTo(i); }

    // if the document has been filtered out, set score to 0.0
    public float score() throws IOException {
      return (bitset.get(scorer.doc())) ? scorer.score() : 0.0f;
    }
The higher level search functions would return the correct results
since they filter out any documents with a score <= 0.
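For illustration only, here is a minimal sketch of a scorer that applies the filter during iteration instead of in score(). It is not the LUCENE-330 patch and does not use Lucene classes: DocScorer, MatchAllScorer, FilteringScorer and FilteringScorerDemo are hypothetical stand-ins, and the skipTo() contract (advance beyond the current document to the first match >= target) is assumed.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical stand-in for a scorer: iterates doc ids in increasing order. */
interface DocScorer {
    boolean next();              // advance to the next match; false when exhausted
    boolean skipTo(int target);  // advance to the first match >= target
    int doc();                   // current doc id, valid after next()/skipTo() returned true
    float score();               // score of the current doc
}

/** Matches every doc in [0, maxDoc) with a constant score of 1.0. */
class MatchAllScorer implements DocScorer {
    private final int maxDoc;
    private int current = -1;

    MatchAllScorer(int maxDoc) { this.maxDoc = maxDoc; }

    public boolean next() { return ++current < maxDoc; }

    public boolean skipTo(int target) {
        // Always move beyond the current doc, and at least to target.
        current = Math.max(current + 1, target);
        return current < maxDoc;
    }

    public int doc() { return current; }
    public float score() { return 1.0f; }
}

/** Wraps a scorer and never stops on a doc whose bit is clear in the filter. */
class FilteringScorer implements DocScorer {
    private final DocScorer inner;
    private final BitSet bits;

    FilteringScorer(DocScorer inner, BitSet bits) {
        this.inner = inner;
        this.bits = bits;
    }

    public boolean next() { return advancePastFiltered(inner.next()); }

    public boolean skipTo(int target) { return advancePastFiltered(inner.skipTo(target)); }

    // Keep advancing the wrapped scorer until it lands on a doc the filter accepts.
    private boolean advancePastFiltered(boolean more) {
        while (more && !bits.get(inner.doc())) {
            more = inner.next();
        }
        return more;
    }

    public int doc() { return inner.doc(); }
    public float score() { return inner.score(); }  // no zero-score special case needed
}

class FilteringScorerDemo {
    /** Builds a filter bit set that accepts exactly one doc. */
    static BitSet only(int doc) {
        BitSet bits = new BitSet();
        bits.set(doc);
        return bits;
    }

    /** Drains a scorer and returns the doc ids it stopped on. */
    static List<Integer> collect(DocScorer scorer) {
        List<Integer> docs = new ArrayList<>();
        while (scorer.next()) {
            docs.add(scorer.doc());
        }
        return docs;
    }
}
```

With a scorer shaped like this, a caller that intersects two clauses by doc id would only ever see documents that pass each clause's filter, so it would not have to rely on the score <= 0 check.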
Check out LUCENE-330 for possible fixes. (Sorry, Firefox is refusing
to paste the URL for me again...)
-Yonik
On 3/3/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
I've run into what I feel is an issue with FilteredQuery. The best
description is an example. First I've indexed three documents:
    public void setUp() throws IOException {
      RAMDirectory directory = new RAMDirectory();
      IndexWriter writer = new IndexWriter(directory,
          new WhitespaceAnalyzer(), true);

      Document doc = new Document();
      doc.add(new Field("field", "zero", Field.Store.YES,
          Field.Index.TOKENIZED));
      writer.addDocument(doc);

      doc = new Document();
      doc.add(new Field("field", "one", Field.Store.YES,
          Field.Index.TOKENIZED));
      writer.addDocument(doc);

      doc = new Document();
      doc.add(new Field("field", "two", Field.Store.YES,
          Field.Index.TOKENIZED));
      writer.addDocument(doc);

      // close the writer once, after all three documents are added
      writer.close();

      searcher = new IndexSearcher(directory);
    }
Now for a mock filter to keep things simple:
    public class DummyFilter extends Filter {
      private int doc;

      public DummyFilter(int doc) {
        this.doc = doc;
      }

      public BitSet bits(IndexReader reader) throws IOException {
        BitSet bits = new BitSet(reader.maxDoc());
        bits.set(doc);
        return bits;
      }
    }
And finally a test case that fails:
    public void testBoolean() throws Exception {
      BooleanQuery bq = new BooleanQuery();
      Query query = new FilteredQuery(new MatchAllDocsQuery(),
          new DummyFilter(0));
      bq.add(query, BooleanClause.Occur.MUST);
      query = new FilteredQuery(new MatchAllDocsQuery(),
          new DummyFilter(1));
      bq.add(query, BooleanClause.Occur.MUST);
      Hits hits = searcher.search(bq);
      assertEquals(0, hits.length()); // fails: hits.length() == 2
    }
I expected no documents to match this BooleanQuery, yet two documents
match (ids 0 and 1). Am I right in thinking that no documents should
match, since each required clause selects a different document and so
there is no intersection? If so, what's the flaw in FilteredQuery that
causes this? If I'm wrong in my assertion, how so?
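The count of two can be reproduced without Lucene. The sketch below is a simplified model, not Lucene code: it assumes each FilteredQuery clause iterates over every document, reports a score of 1.0 or 0.0 depending on its filter, and that the conjunction simply sums clause scores before dropping documents whose total is <= 0. The class and method names (FlawedFilteredQueryDemo, flawedScore, hits, only) are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class FlawedFilteredQueryDemo {
    /** Score a flawed clause reports for a doc: 1.0 if the filter accepts it, else 0.0. */
    static float flawedScore(BitSet filter, int doc) {
        return filter.get(doc) ? 1.0f : 0.0f;
    }

    /**
     * Models a conjunction of flawed clauses over docs [0, maxDoc).
     * Iteration ignores the filters, so every doc is a candidate for every clause;
     * a doc is dropped only if its summed score is not positive.
     */
    static List<Integer> hits(int maxDoc, BitSet... filters) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            float sum = 0.0f;
            for (BitSet filter : filters) {
                sum += flawedScore(filter, doc);
            }
            if (sum > 0.0f) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Builds a filter bit set that accepts exactly one doc, like DummyFilter. */
    static BitSet only(int doc) {
        BitSet bits = new BitSet();
        bits.set(doc);
        return bits;
    }

    public static void main(String[] args) {
        // Three docs, one clause filtered to doc 0 and one to doc 1.
        System.out.println(hits(3, only(0), only(1))); // prints [0, 1]
    }
}
```

Under these assumptions, doc 0 and doc 1 each get a positive score from one clause and 0 from the other, so both survive the score check, while doc 2 scores 0 from both clauses and is dropped.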
For comparison, a ChainedFilter does do what I expect:
    public void testChainedFilter() throws Exception {
      ChainedFilter filter = new ChainedFilter(
          new Filter[] {new DummyFilter(0), new DummyFilter(1)},
          ChainedFilter.AND);
      Hits hits = searcher.search(new MatchAllDocsQuery(), filter);
      assertEquals(0, hits.length()); // passes
    }
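Conceptually, the AND case comes down to intersecting the two filters' bit sets before any document is scored. A minimal sketch with plain java.util.BitSet follows; ChainedAndSketch and its methods are hypothetical names, and this is not the ChainedFilter source.

```java
import java.util.BitSet;

class ChainedAndSketch {
    /** Returns the intersection of all given bit sets without modifying the inputs. */
    static BitSet and(BitSet... sets) {
        BitSet result = (BitSet) sets[0].clone();
        for (int i = 1; i < sets.length; i++) {
            result.and(sets[i]);
        }
        return result;
    }

    /** Builds a bit set that accepts exactly one doc, like DummyFilter. */
    static BitSet only(int doc) {
        BitSet bits = new BitSet();
        bits.set(doc);
        return bits;
    }

    public static void main(String[] args) {
        // Filters for doc 0 and doc 1 share no bits, so nothing is left to search.
        System.out.println(and(only(0), only(1)).cardinality()); // prints 0
    }
}
```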
Thanks,
Erik
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]