Valentin Popov <[email protected]> wrote:

> We have ~10 indexes holding 500M documents. Each document
> has an «archive date» and a «to» address. One of our tasks is
> to calculate statistics on «to» for the last year. Right now we
> search for archive_date:(current_date - 1 year) and paginate
> the results at 50k records per page. The bottleneck of that
> approach is pagination: even on a powerful server it takes
> ~20 days to execute, which is far too long.

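Lucene handles deep paging poorly because of how its internal priority queue
works: to return a page, it has to collect every hit up to the end of that
page, so each page costs more than the one before it. Solr has CursorMark,
which should be fairly simple to emulate in your Lucene handling code: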

http://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
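
To make the difference concrete, here is a minimal stand-alone sketch in
plain Java. It does not use Lucene at all: the long[] of sort keys, the class
name CursorPaging and the methods pageByOffset, pageAfter and scanAll are all
made up for illustration, and the keys are assumed to be unique (a cursor
needs a total order, which is why Solr requires the uniqueKey in the sort).
In real Lucene code the cursor step would most likely map onto
IndexSearcher.searchAfter, passing the last ScoreDoc of the previous page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Illustration only: simulates result collection over unique sort keys.
final class CursorPaging {

    // Offset paging: the queue has to hold offset + pageSize entries,
    // so the work per page grows with the page number.
    static List<Long> pageByOffset(long[] keys, int offset, int pageSize) {
        int needed = offset + pageSize;
        // Max-heap holding the `needed` smallest keys seen so far.
        PriorityQueue<Long> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (long k : keys) {
            if (heap.size() < needed) {
                heap.add(k);
            } else if (k < heap.peek()) {
                heap.poll();
                heap.add(k);
            }
        }
        List<Long> sorted = new ArrayList<>(heap);
        Collections.sort(sorted);
        if (offset >= sorted.size()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(sorted.subList(offset, sorted.size()));
    }

    // Cursor paging: skip everything at or before the cursor, so the
    // queue never holds more than pageSize entries, however deep we are.
    static List<Long> pageAfter(long[] keys, Long after, int pageSize) {
        PriorityQueue<Long> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (long k : keys) {
            if (after != null && k <= after) {
                continue; // already returned on an earlier page
            }
            if (heap.size() < pageSize) {
                heap.add(k);
            } else if (k < heap.peek()) {
                heap.poll();
                heap.add(k);
            }
        }
        List<Long> page = new ArrayList<>(heap);
        Collections.sort(page);
        return page;
    }

    // Walks the whole result set, carrying the last key forward as the cursor.
    static List<Long> scanAll(long[] keys, int pageSize) {
        List<Long> out = new ArrayList<>();
        Long cursor = null;
        while (true) {
            List<Long> page = pageAfter(keys, cursor, pageSize);
            if (page.isEmpty()) {
                break;
            }
            out.addAll(page);
            cursor = page.get(page.size() - 1);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] keys = {5, 3, 9, 1, 7, 2, 8};
        System.out.println(pageByOffset(keys, 3, 3)); // second page by offset
        System.out.println(pageAfter(keys, 3L, 3));   // same page by cursor
        System.out.println(scanAll(keys, 3));         // full scan, page by page
    }
}
```

Both calls return the same page; the difference is that the cursor variant
keeps the queue at one page in size regardless of depth. Each page still
scans all matching hits, so for a one-off statistics run a single pass with
a custom collector may be worth considering as well.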

- Toke Eskildsen
