For the searches I want to run on my index I want to return all matching
documents (as opposed to N top hits).



My first naïve approach was just to use Searcher.search(query, filter,
Integer.MAX_VALUE, sort) – that is, pass Integer.MAX_VALUE for the number
of possible docs to return. That unfortunately seems to have huge heap
requirements in org.apache.lucene.util.PriorityQueue.heap as the max docID
in my index gets large. Multiply that per search heap requirement by a
handful of concurrent threads and I OOME my server.



When I don’t need to do any sorting it pretty easy to just use my own
collector to gather the doc ids.  Of course depending on the number of
hits I might still need a good amount of heap but at least it a factor of
the number of matches (not the index size).



I’m struggling to figure out how to do the same search but with sorting.
I’m looking for a method like Searcher.search(Query, Filter, Sort,
Collector), but perhaps that isn’t a reasonable thing to have, please
enlighten me if so :-)



I’m using 3.0.3 lucene-core at the moment but I don’t see that this aspect
is any different in 3.2.0.



Hopefully this made sense, any help you can provide is appreciated.









Reply via email to