Add support for slow filters with batch processing
--------------------------------------------------
Key: LUCENE-2362
URL: https://issues.apache.org/jira/browse/LUCENE-2362
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 3.0.1
Reporter: Sergey Vladimirov
The internal implementation of IndexSearcher assumes that a Filter and a scorer
have roughly equal performance. In our environment, however, we have a Filter
implementation that is very expensive compared to the scorer.
If we have, say, 2k termdocs selected by the scorer (each with ~250 docs) and
2k selected by the filter, then 250k docs will be checked quickly (and filtered
out) by the scorer, and 250k docs will be checked slowly by our filter.
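
A simplified model of the cost described above, offered only as a sketch: it is plain Java rather than Lucene's actual Scorer/Filter API, and a sorted array stands in for the documents the scorer matches. The class names `PerDocSlowFilter` and `StraightforwardFiltering` are hypothetical, and each `accept()` call stands for one expensive database lookup, so the round-trip counter grows with every candidate document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical per-document filter: every accept() call models one expensive database lookup. */
final class PerDocSlowFilter {
    private final IntPredicate lookup;
    private int roundTrips = 0;

    PerDocSlowFilter(IntPredicate lookup) {
        this.lookup = lookup;
    }

    boolean accept(int doc) {
        roundTrips++; // one expensive call per checked document
        return lookup.test(doc);
    }

    int roundTrips() {
        return roundTrips;
    }
}

final class StraightforwardFiltering {
    /** Checks every document matched by the scorer individually against the slow filter. */
    static List<Integer> search(int[] scorerDocs, PerDocSlowFilter filter) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : scorerDocs) {
            if (filter.accept(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

For six candidate documents this model makes six expensive calls, regardless of how many of them are finally accepted.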
With a straightforward implementation, search exceeds the 60-seconds-per-query
limit, because each next() or advance() call requires N database queries PER
CHECKED DOC. A read-ahead technique lets us optimize this to 35 seconds per
query, which is still too slow.
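
The issue does not describe how the read-ahead works. One plausible reading, sketched here under that assumption, is to pull a fixed window of candidates from the scorer and resolve the whole window with a single filter call. `BatchSlowFilter`, `ReadAheadFiltering` and the `readAhead` window size are hypothetical names; in this model the filter is called about once per `readAhead` candidates, so the cost still grows with the number of candidates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical batch filter: one acceptAll() call models a single database round trip for many docs. */
final class BatchSlowFilter {
    private final IntPredicate lookup;
    private int roundTrips = 0;

    BatchSlowFilter(IntPredicate lookup) {
        this.lookup = lookup;
    }

    /** Answers docs[from..to) at once; result[i - from] says whether docs[i] is accepted. */
    boolean[] acceptAll(int[] docs, int from, int to) {
        roundTrips++; // the whole window costs one round trip
        boolean[] result = new boolean[to - from];
        for (int i = from; i < to; i++) {
            result[i - from] = lookup.test(docs[i]);
        }
        return result;
    }

    int roundTrips() {
        return roundTrips;
    }
}

final class ReadAheadFiltering {
    /** Reads ahead up to readAhead scorer candidates and checks them with one batch call. */
    static List<Integer> search(int[] scorerDocs, BatchSlowFilter filter, int readAhead) {
        if (readAhead <= 0) {
            throw new IllegalArgumentException("readAhead must be positive");
        }
        List<Integer> hits = new ArrayList<>();
        int start = 0;
        while (start < scorerDocs.length) {
            // Window end computed without risking int overflow.
            int end = start + Math.min(readAhead, scorerDocs.length - start);
            boolean[] accepted = filter.acceptAll(scorerDocs, start, end);
            for (int i = start; i < end; i++) {
                if (accepted[i - start]) {
                    hits.add(scorerDocs[i]);
                }
            }
            start = end;
        }
        return hits;
    }
}
```

With ten candidates and a window of four, this model makes three filter calls instead of ten.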
The solution is to first select all documents with the scorer and then filter
them in a single batch with our filter. An example implementation (using
BitSet) is in the attachment. It currently takes only ~300 milliseconds per query.
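
A minimal sketch of the collect-then-filter idea, assuming `java.util.BitSet`. The attachment is not reproduced here, so this is not the reporter's code. `BitSetBatchFilter` and `CollectThenFilter` are hypothetical names: the cheap scorer pass sets one bit per matching document, and the expensive filter is then invoked once for the whole candidate set.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Hypothetical batch filter: receives the full candidate set once and returns the accepted subset. */
final class BitSetBatchFilter {
    private final IntPredicate lookup;
    private int roundTrips = 0;

    BitSetBatchFilter(IntPredicate lookup) {
        this.lookup = lookup;
    }

    BitSet accept(BitSet candidates) {
        roundTrips++; // the whole candidate set is resolved in a single call
        BitSet accepted = new BitSet(candidates.length());
        for (int doc = candidates.nextSetBit(0); doc >= 0; doc = candidates.nextSetBit(doc + 1)) {
            if (lookup.test(doc)) {
                accepted.set(doc);
            }
        }
        return accepted;
    }

    int roundTrips() {
        return roundTrips;
    }
}

final class CollectThenFilter {
    static BitSet search(int[] scorerDocs, BitSetBatchFilter filter) {
        // Phase 1: cheap pass, collect every document the scorer matches.
        BitSet candidates = new BitSet();
        for (int doc : scorerDocs) {
            candidates.set(doc);
        }
        // Phase 2: one expensive batch call over all candidates.
        BitSet accepted = filter.accept(candidates);
        // Defensive intersection, in case a filter returns docs outside the candidate set.
        candidates.and(accepted);
        return candidates;
    }
}
```

However many candidates the scorer produces, the expensive filter is called exactly once in this model, which is what sets it apart from the per-document and read-ahead approaches.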