Thanks for responding

On 03/06/14 10:32, Mikhail Khludnev wrote:

On Tue, Jun 3, 2014 at 11:12 AM, Per Steffensen wrote:

    It is not desirable to set rows-param to e.g. MAX_VALUE, because I
    believe Solr will allocate memory dependent on the value of
    rows-param.

Not really. It reasonably limits it by maxDoc():
https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L475

Yes, I see. But I cannot see when reader.maxDoc() would be anything other than "the number of docs available", and we have way more than Integer.MAX_VALUE documents.
* SegmentReader.maxDoc: si.info.getDocCount()
* BaseCompositeReader.maxDoc: for (int i = 0; i < subReaders.length; i++) { maxDoc += subReaders[i].maxDoc(); }
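
To illustrate the clamping at the linked IndexSearcher line, here is a rough standalone sketch. It is an approximation written for this thread, not the Lucene source; the class and method names (RowsClamp, effectiveRows) are made up, and the handling of an empty index is an assumption.

```java
public class RowsClamp {
    // Hypothetical helper: caps the requested number of rows at the
    // reader's maxDoc, so the result heap is never sized beyond the index.
    static int effectiveRows(int requestedRows, int maxDoc) {
        // Assumption: treat an empty index as size 1 so the limit stays positive.
        int limit = Math.max(maxDoc, 1);
        return Math.min(requestedRows, limit);
    }

    public static void main(String[] args) {
        // rows=Integer.MAX_VALUE against an index with 100,000 docs
        System.out.println(effectiveRows(Integer.MAX_VALUE, 100_000));
        // a small rows value is left untouched
        System.out.println(effectiveRows(10, 100_000));
    }
}
```

Under these assumptions the cap is the size of the index, not the number of hits, so a query matching 1k docs in a very large index would still be sized for far more than it returns.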

The query I want to get "all docs" from might hit 1k, 10k, 100k, 1m, ... documents, but never anywhere close to Integer.MAX_VALUE. And I really do not like setting rows to something "big enough", because I am sure the next day someone will try to extract "big enough"+1 documents :-). I am sure no one will ever try to extract Integer.MAX_VALUE documents, so that would be ok for "big enough", but it just seems to use an unreasonable amount of memory.
Solr and Lucene are not really suited to such "all docs" queries, which usually don't need scores and ranking; Lucene always intends to allocate a results heap for ranking.
Grrrr, yes
Deep paging might help, but it does not give the best achievable performance.
See https://issues.apache.org/jira/browse/SOLR-5244 for some discussion and a prototype.
Thanks! I will definitely vote for that one.
The thing I am working on here is actually some kind of "export".
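
The deep-paging idea mentioned above, applied to an export, could look roughly like the sketch below. It is plain Java with no Solr or Lucene calls: a TreeSet stands in for an index sorted by a unique key, and every name (DeepPagingSketch, fetchPage, exportAll) is hypothetical. Each page resumes after the last key of the previous one, so no single request needs a heap sized for the whole result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class DeepPagingSketch {
    // Hypothetical stand-in for one paged query: returns up to pageSize keys
    // strictly greater than 'after' (null means start from the beginning).
    static List<Integer> fetchPage(TreeSet<Integer> index, Integer after, int pageSize) {
        NavigableSet<Integer> remaining = (after == null) ? index : index.tailSet(after, false);
        List<Integer> page = new ArrayList<>();
        for (Integer key : remaining) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Collects every key by requesting fixed-size pages until one comes back empty.
    static List<Integer> exportAll(TreeSet<Integer> index, int pageSize) {
        List<Integer> all = new ArrayList<>();
        Integer after = null;
        while (true) {
            List<Integer> page = fetchPage(index, after, pageSize);
            if (page.isEmpty()) {
                break;
            }
            all.addAll(page);
            after = page.get(page.size() - 1); // cursor for the next request
        }
        return all;
    }

    public static void main(String[] args) {
        TreeSet<Integer> index = new TreeSet<>();
        for (int i = 0; i < 25; i++) {
            index.add(i);
        }
        System.out.println(exportAll(index, 10).size());
    }
}
```

The trade-off is the one noted above: memory per request stays bounded by the page size, but the export costs one round trip per page.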

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

