I noticed today that my code calls
IndexSearcher.search(Query query, Filter filter, Collector collector)
but I also noticed that the Javadoc for that method says:

"Applications should only use this if they need all of the matching documents.
The high-level search API (Searcher.search(Query, Filter, int)) is usually more
efficient, as it skips non-high-scoring hits."
   
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/core/org/apache/lucene/search/IndexSearcher.html#searchAfter%28org.apache.lucene.search.ScoreDoc,%20org.apache.lucene.search.Query,%20int%29
That makes complete sense, since I don't provide it with any count limit.
My original, but apparently inefficient, call is:
            searcher.search(userQuery, securityFilter, dedupingCollector);
The userQuery is really an enhanced query built from what the user entered, not
literally the user's query.
The dedupingCollector uses one field cache,
FieldCache.DEFAULT.getStrings(reader, deDupField), to work out which documents to
collect and which to reject, saving a list of the first occurrences of documents.
I don't think I can use the contrib DuplicateFilter, because my duplicates are 
not guaranteed to be in the same index segment.
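
To make the de-duplication step concrete, here is a minimal sketch of the
first-occurrence logic in plain Java, with no Lucene dependency. The class name
FirstOccurrenceCollector and the keysByDocId array are hypothetical stand-ins: the
array plays the role of the String[] returned by the field cache, indexed by
document id. Treating a missing field value as "always unique" is an assumption,
not something the real collector is known to do.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the de-duplicating collector; not a Lucene class. */
final class FirstOccurrenceCollector {
    // Plays the role of the field-cache array: de-dup key for each document id.
    private final String[] keysByDocId;
    private final Set<String> seenKeys = new HashSet<String>();
    private final List<Integer> firstOccurrences = new ArrayList<Integer>();

    FirstOccurrenceCollector(String[] keysByDocId) {
        this.keysByDocId = keysByDocId;
    }

    /** Called once per matching document, in the order the search delivers hits. */
    void collect(int docId) {
        String key = keysByDocId[docId];
        // Assumption: a document with no value in the de-dup field is kept.
        // Set.add returns false when the key was already seen, so later
        // duplicates are rejected.
        if (key == null || seenKeys.add(key)) {
            firstOccurrences.add(docId);
        }
    }

    /** Document ids of the first occurrence of each key, in collection order. */
    List<Integer> firstOccurrences() {
        return firstOccurrences;
    }
}
```

Because the set of seen keys lives in the collector rather than in a per-segment
filter, duplicates that sit in different segments are still caught.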

So am I being misled by my interpretation of the Javadoc comment, even though I
really DON'T "need all matching documents"? Or is there some way to work a count
limit and filtering into the whole chain of filters and collectors?
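
One shape such a combination could take is a collector that rejects duplicates
and keeps only the N best-scoring survivors in a bounded min-heap. The sketch
below is plain Java with no Lucene dependency; TopNDedupCollector, Hit, and
keysByDocId are hypothetical names, and it assumes "first occurrence wins" even
when a later duplicate scores higher. Note that collect is still called for every
matching document; only the retained state is bounded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

/** Hypothetical sketch: de-duplicate by key and keep only the top N hits by score. */
final class TopNDedupCollector {
    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    private final int limit;
    private final String[] keysByDocId; // stand-in for the field-cache array
    private final Set<String> seenKeys = new HashSet<String>();
    private final PriorityQueue<Hit> heap; // min-heap: weakest retained hit on top

    TopNDedupCollector(String[] keysByDocId, int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be at least 1");
        }
        this.keysByDocId = keysByDocId;
        this.limit = limit;
        this.heap = new PriorityQueue<Hit>(limit, (a, b) -> Float.compare(a.score, b.score));
    }

    /** Called once per matching document with its score. */
    void collect(int docId, float score) {
        String key = keysByDocId[docId];
        // Assumption: first occurrence wins; later duplicates are dropped
        // regardless of score. Documents without a key are never duplicates.
        if (key != null && !seenKeys.add(key)) {
            return;
        }
        if (heap.size() < limit) {
            heap.add(new Hit(docId, score));
        } else if (score > heap.peek().score) {
            heap.poll(); // evict the weakest retained hit
            heap.add(new Hit(docId, score));
        }
    }

    /** Retained hits, best score first. */
    List<Hit> topHits() {
        List<Hit> sorted = new ArrayList<Hit>(heap);
        Collections.sort(sorted, (a, b) -> Float.compare(b.score, a.score));
        return sorted;
    }
}
```

The de-dup check runs before the heap check, so a key whose first occurrence
scored too low to be retained still blocks its later duplicates. Whether that is
the desired behaviour depends on what "first occurrence" is meant to guarantee.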

-Paul
