If you don't care about search, why not just traverse the index with the reader? Write a for loop from 0 to reader.maxDoc() - 1 and filter the documents using MultiFields. You can even split the doc ID range into buckets and run your statistics calculation in parallel.
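
A rough sketch of the bucketing part, using only the JDK. The class name BucketedTraversal and the valueOf callback are made up for illustration and are not Lucene API. In Lucene 5.x the callback would presumably load the document with reader.document(docId), skip doc IDs that MultiFields.getLiveDocs(reader) marks as deleted, check the archive date, and return the «to» value; treat that wiring as an assumption and check it against your Lucene version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.IntFunction;

class BucketedTraversal {

    /**
     * Counts values over doc IDs [0, maxDoc) by splitting the range into
     * contiguous buckets and scanning the buckets in parallel.
     *
     * valueOf is a hypothetical stand-in for the Lucene lookup: it returns
     * the value to count for a doc ID, or null to skip that document
     * (for example a deleted one, or one outside the date range).
     */
    static Map<String, Long> countInParallel(int maxDoc, int buckets,
                                             IntFunction<String> valueOf) {
        ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(buckets);
        try {
            // Bucket size, rounded up; long arithmetic avoids int overflow.
            long size = ((long) maxDoc + buckets - 1) / buckets;
            List<Future<?>> pending = new ArrayList<>();
            for (int b = 0; b < buckets; b++) {
                final int from = (int) Math.min(maxDoc, b * size);
                final int to = (int) Math.min(maxDoc, from + size);
                pending.add(pool.submit(() -> {
                    for (int docId = from; docId < to; docId++) {
                        String value = valueOf.apply(docId);
                        if (value != null) {
                            counts.computeIfAbsent(value, k -> new LongAdder())
                                  .increment();
                        }
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the bucket and surface any worker failure
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        Map<String, Long> result = new HashMap<>();
        counts.forEach((k, v) -> result.put(k, v.sum()));
        return result;
    }
}
```

You would call it as BucketedTraversal.countInParallel(reader.maxDoc(), nThreads, lookup), where lookup is your own per-document function. Each bucket is a contiguous doc ID range, so every worker reads its part of the index roughly in order.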

On Thursday, November 12, 2015, Valentin Popov <valentin...@gmail.com> wrote:

> Hello everyone.
>
> We have ~10 indexes for 500M documents. Each document has an «archive date»
> and a «to» address, and one of our tasks is to calculate statistics of «to»
> for the last year. Right now we search archive_date:(current_date - 1 year)
> and paginate the results at 50k records per page. The bottleneck of that
> approach is that pagination takes too long: on a powerful server it takes
> ~20 days to execute, which is far too long.
>
> I did an experiment with a CSV file: I put 200M records in it and parsed it
> with the same algorithm we use for the statistics, and it takes a few hours
> to execute.
>
> Is it possible to somehow just iterate quickly through Lucene documents
> without search and pagination? Or to somehow increase the speed of traversal?
>
> Thanks
>
> Regards,
> Valentin.