Thanks for responding.
On 03/06/14 10:32, Mikhail Khludnev wrote:
On Tue, Jun 3, 2014 at 11:12 AM, Per Steffensen wrote:
It is not desirable to set rows-param to e.g. MAX_VALUE, because I
believe Solr will allocate memory dependent on the value of
rows-param.
Not really. It reasonably limits it by maxDoc():
https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L475
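For illustration, the clamping at the linked line amounts to roughly the following. This is a minimal standalone sketch, not the Lucene source: the names `RowsClamp` and `effectiveRows` are made up here, and treating an empty index as size 1 is an assumption about how that method behaves.

```java
// Standalone sketch with no Lucene dependency; all names are hypothetical.
final class RowsClamp {
    // Returns the number of hits the result collector would be sized for.
    static int effectiveRows(int requestedRows, int maxDoc) {
        // Assumption: an empty index is treated as size 1 so the limit is never zero.
        int limit = Math.max(1, maxDoc);
        // The requested count is capped at the size of the index.
        return Math.min(requestedRows, limit);
    }

    public static void main(String[] args) {
        System.out.println(effectiveRows(Integer.MAX_VALUE, 5_000_000));
        System.out.println(effectiveRows(10, 5_000_000));
    }
}
```

Under these assumptions, rows=Integer.MAX_VALUE costs memory in proportion to the index size rather than to Integer.MAX_VALUE, which can still be a lot on a large index.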
Yes, I see. But I cannot see a case where reader.maxDoc is anything other
than "the number of docs available", and we have way more than
Integer.MAX_VALUE documents.
* SegmentReader.maxDoc: si.info.getDocCount()
* BaseCompositeReader.maxDoc: for (int i = 0; i < subReaders.length; i++) { maxDoc += subReaders[i].maxDoc(); }
The query I want to get "all docs" from might hit 1k, 10k, 100k, 1m,
... , but never even close to Integer.MAX_VALUE. And I really do not
like setting rows to something "big enough", because I am sure that the
next day someone will try to extract "big enough"+1 documents :-). I am
sure no one will ever try to extract Integer.MAX_VALUE documents, so that
would be ok for "big enough", but that just seems to use an unreasonable
amount of memory.
Solr and Lucene are not really suited to such "all docs" requests, which
usually don't need scores and ranking, but Lucene always intends to
allocate a results heap for ranking.
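To make the "results heap" point concrete, here is a minimal sketch of top-N collection with a bounded min-heap. It is a generic illustration, not Lucene's collector: the `TopN` class and `topScores` method are hypothetical, and unlike a queue that reserves all n slots up front, this one grows lazily.

```java
import java.util.PriorityQueue;

// Generic top-N sketch; names are hypothetical and this is not Lucene code.
final class TopN {
    // Keeps the n highest scores seen; the heap never holds more than n entries.
    static float[] topScores(float[] scores, int n) {
        if (n <= 0) {
            return new float[0];
        }
        // Min-heap: the head is the weakest hit currently kept.
        PriorityQueue<Float> heap = new PriorityQueue<>();
        for (float s : scores) {
            if (heap.size() < n) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll(); // evict the weakest kept hit
                heap.add(s);
            }
        }
        // Polling yields ascending order, so fill from the back for descending output.
        float[] out = new float[heap.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = heap.poll();
        }
        return out;
    }
}
```

The heap is bounded by n, which is why a very large rows value translates into a very large structure even when ranking is not needed.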
Grrrr, yes
Deep paging might help, but it does not give the best achievable performance.
See https://issues.apache.org/jira/browse/SOLR-5244 for some
discussion and a prototype.
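The idea behind cursor-style deep paging can be sketched as follows. This is a toy in-memory simulation, not Solr's API: `CursorPaging`, `nextPage` and `exportAll` are hypothetical names, the ids are assumed to be unique, sorted ascending and greater than Long.MIN_VALUE, and the linear scan stands in for what a real index would do with a seek.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation of cursor-based paging; names are hypothetical, not Solr's API.
final class CursorPaging {
    // Returns up to pageSize ids strictly greater than 'after' from an ascending list.
    static List<Long> nextPage(List<Long> sortedIds, long after, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be >= 1");
        }
        List<Long> page = new ArrayList<>(pageSize);
        for (long id : sortedIds) {
            if (id > after) {
                page.add(id);
                if (page.size() == pageSize) {
                    break;
                }
            }
        }
        return page;
    }

    // Walks the full result set page by page and returns how many ids were seen.
    // Only one page of at most pageSize entries is held at a time.
    static long exportAll(List<Long> sortedIds, int pageSize) {
        long exported = 0;
        long cursor = Long.MIN_VALUE; // assumption: no real id equals Long.MIN_VALUE
        while (true) {
            List<Long> page = nextPage(sortedIds, cursor, pageSize);
            if (page.isEmpty()) {
                break;
            }
            exported += page.size();
            cursor = page.get(page.size() - 1); // resume after the last id seen
        }
        return exported;
    }
}
```

The caller never has to guess a "big enough" rows value: the page size stays fixed and the loop ends when a page comes back empty.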
Thanks! I will definitely vote for that one.
The thing I am working on here is actually some kind of "export".
--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics