Re: 500 millions document for loop.

Sheng Thu, 21 Apr 2016 20:38:24 -0700

If you don't care about search, why not just use the reader to traverse the
index? Run a for loop over the doc IDs from 0 to reader.maxDoc() - 1, and
filter the documents using MultiFields. You can even split this procedure into
buckets and run your statistics calculation in parallel.
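
A rough sketch of the bucketed loop, in case it helps. To keep it runnable without Lucene on the classpath, it reads through a hypothetical DocSource interface (my own name, not a Lucene type). In a real index I would expect maxDoc() to map to reader.maxDoc(), isLive() to a check against MultiFields.getLiveDocs(reader), and the two field accessors to values read from reader.document(docId) or doc values; treat that mapping, the field types, and the class and method names below as assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;

public class ToAddressStats {

    /** Hypothetical stand-in for the parts of an index reader the loop needs. */
    public interface DocSource {
        int maxDoc();                 // one past the highest doc ID
        boolean isLive(int docId);    // false for deleted documents
        long archiveDate(int docId);  // assumed to be stored as epoch millis
        String to(int docId);         // the «to» address
    }

    /** Counts «to» addresses for live documents with archiveDate >= cutoff. */
    public static Map<String, Long> countTo(DocSource src, long cutoff, int buckets) {
        if (buckets < 1) {
            throw new IllegalArgumentException("buckets must be >= 1");
        }
        final long maxDoc = src.maxDoc();
        // Ceiling division so the buckets cover every doc ID exactly once.
        final long step = Math.max(1, (maxDoc + buckets - 1) / buckets);
        final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(buckets);
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (long start = 0; start < maxDoc; start += step) {
                final int from = (int) start;
                final int end = (int) Math.min(maxDoc, start + step);
                tasks.add(pool.submit(() -> {
                    // Each task walks its own contiguous doc ID range.
                    for (int doc = from; doc < end; doc++) {
                        if (!src.isLive(doc)) continue;               // skip deleted docs
                        if (src.archiveDate(doc) < cutoff) continue;  // older than the window
                        counts.computeIfAbsent(src.to(doc), k -> new LongAdder()).increment();
                    }
                }));
            }
            for (Future<?> task : tasks) {
                task.get(); // wait for the bucket and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("bucket failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        Map<String, Long> result = new HashMap<>();
        counts.forEach((address, adder) -> result.put(address, adder.sum()));
        return result;
    }
}
```

Calling ToAddressStats.countTo(source, cutoffMillis, 8) would split the doc ID range into at most 8 contiguous buckets. One shared ConcurrentHashMap of LongAdder counters keeps the merge step trivial; per-bucket maps merged at the end would be a reasonable alternative if contention on popular addresses shows up.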


On Thursday, November 12, 2015, Valentin Popov <valentin...@gmail.com>
wrote:

> Hello everyone.
>
> We have ~10 indexes holding 500M documents. Each document has an «archive
> date» and a «to» address, and one of our tasks is to calculate statistics on
> «to» for the last year. Right now we search for
> archive_date:(current_date - 1 year) and paginate the results at 50k records
> per page. The bottleneck of that approach is pagination: it takes too long,
> and even on a powerful server the whole run takes ~20 days.
>
> I ran an experiment with a CSV file: I put 200M records in it and parsed it
> with the same algorithm we use for the statistics, and it took a few hours.
>
> Is it possible to somehow just iterate quickly through Lucene documents
> without search and pagination? Or to somehow speed up the traversal?
>
> Thanks
>
> Regards,
> Valentin.
