Re: Querying "all docs"

Per Steffensen Tue, 03 Jun 2014 04:52:27 -0700

Thanks for responding

On 03/06/14 10:32, Mikhail Khludnev wrote:

On Tue, Jun 3, 2014 at 11:12 AM, Per Steffensen <[email protected]> wrote:


    It is not desirable to set rows-param to e.g. MAX_VALUE, because I
    believe Solr will allocate memory dependent on the value of
    rows-param.

Not really. It reasonably limits it by maxDoc():
https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L475

Yes, I see. But I am not sure when reader.maxDoc() is not just "the number of docs available" - we have way more than Integer.MAX_VALUE documents.

* SegmentReader.maxDoc(): si.info.getDocCount()

* BaseCompositeReader.maxDoc(): for (int i = 0; i < subReaders.length; i++) { maxDoc += subReaders[i].maxDoc(); }
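
To make the cap concrete, here is a minimal sketch in plain Java. It is not the actual Lucene code; the class and method names are made up for illustration. It shows how a composite reader's maxDoc is summed over its segments and how a requested rows value would be capped by it. Note that maxDoc also counts deleted documents that have not yet been merged away, so it can be larger than the number of live documents.

```java
// Illustrative sketch only: mirrors the idea discussed above, not the Lucene source.
final class RowsClampSketch {
    // Composite reader: maxDoc is the sum of the per-segment maxDoc values.
    static int compositeMaxDoc(int[] segmentMaxDocs) {
        int maxDoc = 0;
        for (int m : segmentMaxDocs) {
            maxDoc += m;
        }
        return maxDoc;
    }

    // The requested row count is capped by maxDoc (but never below 1), so
    // rows=Integer.MAX_VALUE does not by itself ask for a MAX_VALUE-sized heap.
    static int effectiveRows(int requestedRows, int maxDoc) {
        int limit = Math.max(1, maxDoc);
        return Math.min(requestedRows, limit);
    }

    public static void main(String[] args) {
        int maxDoc = compositeMaxDoc(new int[] {1_000, 25_000, 400});
        System.out.println(effectiveRows(Integer.MAX_VALUE, maxDoc));
        System.out.println(effectiveRows(10, maxDoc));
    }
}
```

With a very large index the cap is still large, which is the concern raised here: the allocation follows the size of the index, not the number of documents the query actually matches.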

The query I want to get "all docs" from might hit 1k, 10k, 100k, 1m, ..., but never even close to Integer.MAX_VALUE. And I really do not like setting rows to something "big enough", because I am sure the next day someone will try to extract "big enough"+1 documents :-). I am sure no one will ever try to extract Integer.MAX_VALUE, so that would be OK for "big enough", but that just seems to use an unreasonable amount of memory.

Solr and Lucene are not really suited for such "all docs" queries, which usually don't need scores and ranking, but Lucene always intends to allocate a results heap for ranking.

Grrrr, yes

Deep paging might help, but it does not give the best achievable performance.
See https://issues.apache.org/jira/browse/SOLR-5244 for some discussion and a prototype.

Thanks! I will definitely vote for that one.
The thing I am working on here is actually some kind of "export".
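
For illustration, the paging idea for such an export can be sketched as below. This is a plain-Java simulation over an in-memory sorted list, not Solr or SolrJ code; `fetchPage` and `exportAll` are hypothetical names standing in for "one search request" and "the export loop". The point is only that each request holds at most `pageSize` results, however many documents match in total.

```java
import java.util.ArrayList;
import java.util.List;

// Simulation only: no Solr API is used here.
final class CursorPagingSketch {
    // Hypothetical stand-in for one search request: returns up to pageSize ids
    // strictly greater than `after`, in ascending order. sortedIds must be sorted.
    static List<Integer> fetchPage(List<Integer> sortedIds, long after, int pageSize) {
        List<Integer> page = new ArrayList<>();
        for (int id : sortedIds) {
            if (id > after) {
                page.add(id);
                if (page.size() == pageSize) {
                    break;
                }
            }
        }
        return page;
    }

    // Export every id by repeatedly asking for the page after the last id seen.
    static List<Integer> exportAll(List<Integer> sortedIds, int pageSize) {
        List<Integer> out = new ArrayList<>();
        long cursor = Long.MIN_VALUE; // below every possible int id
        while (true) {
            List<Integer> page = fetchPage(sortedIds, cursor, pageSize);
            if (page.isEmpty()) {
                break;
            }
            out.addAll(page);
            cursor = page.get(page.size() - 1);
        }
        return out;
    }
}
```

Sorting on a unique key and resuming from the last value seen avoids re-collecting all earlier results on every request, at the cost of one round trip per page.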


--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

