Hi Armin, glad I could help. Getting all IDs first also avoids problems with data that changes during the run, which could shift the offsets. This way you work from a fixed snapshot of all documents that existed at the beginning.
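To illustrate the offset problem, here is a minimal sketch that uses an in-memory list as a stand-in for the index. It does not call Solr, and all names in it (`page_by_offset`, `fetch_by_id`, `read_with_offsets`, `read_with_snapshot`, `delete_first`, `make_index`) are hypothetical helpers invented for this example. It assumes one document is deleted while the reader is running.

```python
# Stand-in for a search index: an ordered list of (id, text) documents.
# Everything here is hypothetical illustration; no Solr API is used.


def make_index():
    """Build a small index with ids doc0 .. doc5."""
    return [(f"doc{i}", f"text {i}") for i in range(6)]


def page_by_offset(index, start, rows):
    """Offset paging: skip `start` documents and return the next `rows`."""
    return index[start:start + rows]


def fetch_by_id(index, doc_id):
    """Look up a single document by id; returns None if it is gone."""
    for current_id, text in index:
        if current_id == doc_id:
            return (current_id, text)
    return None


def delete_first(index):
    """Simulate a concurrent change: the first document is removed."""
    del index[0]


def read_with_offsets(index, rows, mutate):
    """Read page by page; the index changes after the first page."""
    seen, start = [], 0
    while True:
        page = page_by_offset(index, start, rows)
        if not page:
            break
        seen.extend(doc_id for doc_id, _ in page)
        start += rows
        if start == rows:  # right after the first page
            mutate(index)
    return seen


def read_with_snapshot(index, mutate):
    """Collect all ids up front, then fetch each document by id."""
    ids = [doc_id for doc_id, _ in index]  # fixed snapshot of ids
    seen = []
    for position, doc_id in enumerate(ids):
        if position == 2:  # the index changes part-way through
            mutate(index)
        doc = fetch_by_id(index, doc_id)
        if doc is not None:
            seen.append(doc[0])
    return seen


offset_ids = read_with_offsets(make_index(), rows=2, mutate=delete_first)
snapshot_ids = read_with_snapshot(make_index(), mutate=delete_first)

print("offset paging:", offset_ids)
print("id snapshot:  ", snapshot_ids)
```

With offset paging, the deletion shifts every later document one position forward, so one document is silently skipped. With the id snapshot, each id is looked up directly, so a shift in positions does not matter.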
Best,
Jens

On Mon, Aug 29, 2016 at 8:12 AM, <armin.weg...@bka.bund.de> wrote:

> Hi Jens,
>
> I just want to confirm your information. As you said, the query gets
> slower the larger start is, even when using filters. The best solution is
> to get all ids first (which may take some time), and then to get each
> document by id successively. There is a request handler (get) and a Java
> API method (HttpSolrClient.getById()) to do so.
>
> Thanks to your help, I have constantly fast queries now.
>
> Cheers,
> Armin
>
> -----Original Message-----
> From: j...@grivolla.net [mailto:j...@grivolla.net] On Behalf Of Jens Grivolla
> Sent: Tuesday, 16 August 2016 13:34
> To: user@uima.apache.org
> Subject: Re: CPE memory usage
>
> Solr is known not to be very good at deep paging, but rather at getting
> the top relevant results. Running a query asking for the millionth
> document is pretty much the worst you can do, as it will have to rank all
> documents again, up to the millionth, and return that one. It can also be
> unreliable if your document collection changes.
>
> We did get it to work quite well, though. I believe we used only filters
> and retrieved the results in natural order, so that Solr wouldn't have to
> rank the documents. We also had a version where we first retrieved all
> matching document ids in one go, and then queried for the documents by id,
> one by one, in getNext().
>
> Deep paging has also seen some major improvements over time IIRC, so newer
> Solr versions should perform much better than the ones from a few years
> ago.
>
> Best,
> Jens
>
> On Tue, Aug 9, 2016 at 12:20 PM, <armin.weg...@bka.bund.de> wrote:
>
> > Hi!
> >
> > Finally, it looks like Solr causes the high memory consumption. The
> > SolrClient isn't expected to be used the way I used it. But that isn't
> > documented either. The Solr documentation is very bad. I just happened
> > to find a solution on the web by accident.
> >
> > Thanks,
> > Armin
> >
> > -----Original Message-----
> > From: Richard Eckart de Castilho [mailto:r...@apache.org]
> > Sent: Monday, 8 August 2016 15:33
> > To: user@uima.apache.org
> > Subject: Re: CPE memory usage
> >
> > Do you have code for a minimal test case?
> >
> > Cheers,
> >
> > -- Richard
> >
> > > On 08.08.2016, at 15:31, <armin.weg...@bka.bund.de> <armin.weg...@bka.bund.de> wrote:
> > >
> > > Hi Richard!
> > >
> > > I've changed the document reader to a kind of no-op reader that
> > > always sets the document text to an empty string: same behavior, but
> > > a much slower increase in memory usage.
> > >
> > > Cheers,
> > > Armin