On 04/05/2011 12:51, Paul Taylor wrote:
On 04/05/2011 12:39, Ahmet Arslan wrote:
I'm receiving a number of searches with many ORs, so the total
number of matches is huge (> 1 million) although only the first 20
results are required. Analysis shows most time is spent scoring the
results. Now it seems to me that if you're sending a query with 10 OR
components, documents that match most of the terms are bound to get a
better score than documents that match only one or two of the terms.
So does Lucene do any optimization to avoid working out the
scores of the poor matches?
EDIT: Actually I'm not sure about that statement, because if only one term
matches, the document could still get the highest score if the match was on
the shortest term. But can you see my point? Is there a way to get Lucene to
discount the less good matches without scoring them, or is there another
approach? At the moment we allow the full Lucene syntax, use QueryParser to
parse a query, and pass the resulting query to search unchanged
(except for handling of numeric fields). Should I be modifying the
query somehow?
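To make the EDIT's point concrete, here is a toy model. It is an assumption, loosely based on the coord(overlap, maxOverlap) factor of Lucene 3.x's DefaultSimilarity and not its full formula: a document's score is the sum of the weights of the terms it matched, multiplied by the fraction of query clauses it matched. The class and method names are hypothetical. With 10 clauses, one match on a rare, high-weight term can still outscore two matches on common, low-weight terms.

```java
/**
 * Toy coord-style scoring (hypothetical, not Lucene's Similarity code):
 * score = (matched clauses / total clauses) * sum of matched term weights.
 */
final class CoordToy {

    // Score a document from the weights of the terms it matched and the
    // total number of optional clauses in the query.
    static double score(double[] matchedWeights, int totalClauses) {
        double sum = 0.0;
        for (double w : matchedWeights) {
            sum += w;
        }
        double coord = (double) matchedWeights.length / totalClauses;
        return coord * sum;
    }

    public static void main(String[] args) {
        int clauses = 10;
        // One match on a rare (high-weight) term...
        double rareOnly = score(new double[] {10.0}, clauses);
        // ...versus two matches on common (low-weight) terms.
        double twoCommon = score(new double[] {1.0, 1.0}, clauses);
        System.out.println("rare only:  " + rareOnly);  // 0.1 * 10 = 1.0
        System.out.println("two common: " + twoCommon); // 0.2 * 2 = 0.4
    }
}
```

So "matches more terms" does not by itself guarantee a higher score; term weights can dominate the coord factor.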
You can restrict the number of returned results by using an adaptively
computed BooleanQuery.setMinimumNumberShouldMatch(int) parameter.
For example, if you have 10 optional clauses, you can set minimum
should match to 60% of 10 = 6.
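A minimal sketch of the "adaptively computed" value, assuming the percentage is applied to the number of optional clauses and rounded down (as in the 60% of 10 = 6 example). The class and method names are hypothetical; only the result would be handed to BooleanQuery.setMinimumNumberShouldMatch(int).

```java
/**
 * Hypothetical helper: turn a percentage of the optional clauses into a
 * minimumNumberShouldMatch value, using integer arithmetic (rounds down).
 */
final class AdaptiveMinShouldMatch {

    static int compute(int optionalClauses, int percent) {
        if (optionalClauses <= 0) {
            return 0; // nothing optional, nothing to require
        }
        int min = optionalClauses * percent / 100; // integer division rounds down
        // For a pure OR query at least one clause must match anyway, and we
        // can never require more clauses than exist.
        return Math.max(1, Math.min(min, optionalClauses));
    }

    public static void main(String[] args) {
        System.out.println(compute(10, 60)); // 6
        System.out.println(compute(7, 60));  // 4
        System.out.println(compute(1, 60));  // 1
    }
}
```

In a QueryParser subclass, the clause count passed in should probably only include clauses whose getOccur() is BooleanClause.Occur.SHOULD, since the minimum applies to optional clauses only.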
A similar mechanism exists in Solr:
http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
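For comparison, here is a sketch of a subset of the Solr "mm" syntax described on that wiki page, written from the documented rules rather than Solr's own code (the class and method are hypothetical). It covers a plain integer, a negative integer ("all but N"), a positive or negative percentage (rounded down), and a single "N<spec" condition; multiple space-separated conditions are not handled.

```java
/**
 * Hypothetical parser for a subset of Solr's dismax "mm" spec:
 *   "3"     -> 3 optional clauses must match
 *   "-2"    -> all but 2 must match
 *   "75%"   -> 75% must match, rounded down
 *   "-25%"  -> all but 25% (rounded down) must match
 *   "3<90%" -> with 3 or fewer clauses all are required,
 *              otherwise the spec after '<' applies
 */
final class MinShouldMatchSpec {

    static int calculate(int optionalClauses, String spec) {
        spec = spec.trim();
        int lt = spec.indexOf('<');
        if (lt >= 0) {
            int upTo = Integer.parseInt(spec.substring(0, lt).trim());
            if (optionalClauses <= upTo) {
                return optionalClauses; // small queries: everything required
            }
            spec = spec.substring(lt + 1).trim();
        }
        int result;
        if (spec.endsWith("%")) {
            int percent = Integer.parseInt(spec.substring(0, spec.length() - 1).trim());
            int calc = optionalClauses * Math.abs(percent) / 100; // rounds down
            result = percent < 0 ? optionalClauses - calc : calc;
        } else {
            int n = Integer.parseInt(spec);
            result = n < 0 ? optionalClauses + n : n;
        }
        // Clamp to the range [0, optionalClauses].
        return Math.max(0, Math.min(result, optionalClauses));
    }

    public static void main(String[] args) {
        System.out.println(calculate(10, "60%"));   // 6
        System.out.println(calculate(3, "3<90%"));  // 3
        System.out.println(calculate(10, "3<90%")); // 9
        System.out.println(calculate(4, "-25%"));   // 3
    }
}
```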
Thanks for the hint. So this could be done by overriding
getBooleanQuery() in QueryParser?
Paul
Well, I did extend QueryParser, and the method is being called, but rather
disappointingly it had no noticeable effect on how long queries took. I
really thought that by reducing the number of matches, the corresponding
scoring phase would be quicker.
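One plausible explanation (an assumption, not verified against Lucene's source here): a minimum-should-match check can only reject a document after the postings of every clause have been read to count how many clauses it matches, so the cost of walking long postings lists for common terms stays roughly the same. Likewise, keeping only the top 20 still means every surviving candidate is considered. The toy model below is hypothetical and is not how Lucene's scorers are implemented (it accumulates term-at-a-time for simplicity); it only shows that the number of postings visited is identical with and without the filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

/**
 * Toy OR query over in-memory postings (hypothetical, NOT Lucene's scorer).
 * The minimum-should-match filter is applied only after all postings are read.
 */
final class ToyDisjunction {

    static final class Result {
        final List<Integer> topDocs;
        final int postingsVisited;

        Result(List<Integer> topDocs, int postingsVisited) {
            this.topDocs = topDocs;
            this.postingsVisited = postingsVisited;
        }
    }

    static Result search(int[][] postings, double[] weights, int minShouldMatch, int n) {
        // docId -> {number of matching clauses, summed score}
        Map<Integer, double[]> acc = new TreeMap<>();
        int visited = 0;
        for (int t = 0; t < postings.length; t++) {
            for (int doc : postings[t]) {
                visited++;
                double[] a = acc.computeIfAbsent(doc, d -> new double[2]);
                a[0] += 1;
                a[1] += weights[t];
            }
        }
        // Bounded min-heap keeps the n best {docId, score} pairs.
        PriorityQueue<double[]> heap =
            new PriorityQueue<>((x, y) -> Double.compare(x[1], y[1]));
        for (Map.Entry<Integer, double[]> e : acc.entrySet()) {
            double[] a = e.getValue();
            if (a[0] < minShouldMatch) {
                continue; // rejected, but its postings were already read
            }
            heap.offer(new double[] {e.getKey().doubleValue(), a[1]});
            if (heap.size() > n) {
                heap.poll(); // drop the current lowest score
            }
        }
        List<Integer> top = new ArrayList<>();
        while (!heap.isEmpty()) {
            top.add(0, (int) heap.poll()[0]); // highest score ends up first
        }
        return new Result(top, visited);
    }

    public static void main(String[] args) {
        int[][] postings = {{1, 2, 3, 4}, {2, 3}, {3, 5}};
        double[] weights = {1.0, 2.0, 4.0};
        Result any = search(postings, weights, 1, 2);
        Result atLeastTwo = search(postings, weights, 2, 2);
        System.out.println(any.topDocs + " " + any.postingsVisited);               // [3, 5] 8
        System.out.println(atLeastTwo.topDocs + " " + atLeastTwo.postingsVisited); // [3, 2] 8
    }
}
```

The filter changes which documents come back, but not how many postings had to be read to decide that.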
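    @Override
    protected Query getBooleanQuery(List<BooleanClause> clauses,
                                    boolean disableCoord)
        throws ParseException
    {
        BooleanQuery query =
            (BooleanQuery) super.getBooleanQuery(clauses, disableCoord);
        // super returns null when there are no clauses (e.g. all were
        // removed by the analyzer)
        if (query != null && clauses.size() > 5)
        {
            // For queries with more than 5 clauses, require at least 3 of
            // the optional (SHOULD) clauses to match
            query.setMinimumNumberShouldMatch(3);
        }
        return query;
    }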