To test my Boolean queries, I have a small test collection where each document
contains one of 1024 possible combinations of the strings "aaa", "bbb",
... "jjj". I tried wrapping a Boolean query like this (it's based on an
older post to this list [1])
private static TermsFilter getTermsFilter(String field, String text) {
TermsFilter tf = new TermsFilter();
tf.addTerm(new Term(field, text));
return tf;
}
Query q = new QueryParser("f1", new StandardAnalyzer()).parse("(aaa
AND bbb) OR ccc");
IndexSearcher searcher = new IndexSearcher(indexDir);
TopDocCollector collector = new TopDocCollector(1024);
BooleanQuery bc = (BooleanQuery) q;
BooleanFilter finalFilter = new BooleanFilter();
BooleanFilter boolFilt = new BooleanFilter();
// add each clause of the original query to the filter
for (BooleanClause clause : bc.getClauses()) {
boolFilt.add(new FilterClause(getTermsFilter("f1",
clause.getQuery().toString()), clause.getOccur()));
System.out.println(clause.getQuery().toString());
}
finalFilter.add(new FilterClause(boolFilt, BooleanClause.Occur.MUST));
ConstantScoreQuery csq = new ConstantScoreQuery(finalFilter);
searcher.search(csq, finalFilter, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
System.out.println("Found " + collector.getTotalHits() + " hits");
The result is 0 hits (should be 640).
[1] tinyurl.com/ml52ye
2009/7/4 Mark Harwood <[email protected]>:
>
> Check out booleanfilter in contrib/queries. It can be wrapped in a
> constantScoreQuery
>
>
>
> On 4 Jul 2009, at 17:37, Lukas Michelbacher <[email protected]>
> wrote:
>
>
> This is about an experiment comparing plain Boolean retrieval with
> vector-space-based retrieval.
>
> I would like to disable all of Lucene's scoring mechanisms and just
> run a true Boolean query that returns exactly the documents that match a
> query specified in Boolean syntax (OR, AND, NOT). No scoring or sorting
> required.
>
> As far as I can see, this is not supported out of the box. Which classes
> would I have to modify?
>
> Would it be enough to create a subclass of Similarity and to ignore all terms
> but one (coord, say) and make this term return 1 if the query matches the
> document and 0 otherwise?
>
> Lukas
>
> --
> Lukas Michelbacher
> Institute for Natural Language Processing
> Universität Stuttgart
> email: [email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]