Hi Stephen,

On Thu, Oct 24, 2013 at 1:18 AM, Stephen GRAY <stephen.g...@immi.gov.au> wrote:
> I actually need to loop through a large number of documents (50,000 - 
> 100,000) calculating a number of statistics (min, max, sum) so I really need 
> the most efficient/fastest solution available. It sounds like it would be 
> best to just store the data in a stored field.

I see. For that many documents, doc values are actually the right
thing to use. Sorry if I put you on the wrong track: I was assuming you
were only going to collect values from a few documents.

In your case the best option would be to split your doc ids according
to the segment they belong to, and then for each segment, get a
per-segment NumericDocValues instance and aggregate your statistics.
This is faster than using MultiDocValues, because MultiDocValues needs
to binary-search for the appropriate segment for every document.
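
To make the idea concrete, here is a rough sketch of the grouping and
aggregation logic in plain Java, without any Lucene dependency. The names
SegmentValues, SegmentStats and aggregate are made up for illustration:
SegmentValues stands in for a per-segment NumericDocValues instance, and the
docBases array stands in for the docBase of each segment (leaf) of your
reader. Sorting the global doc ids first means each segment is visited once,
in order, with no per-document search.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a per-segment NumericDocValues instance. */
interface SegmentValues {
    long get(int localDocId); // value for a segment-local doc id
}

/** Hypothetical holder for the aggregated statistics. */
final class SegmentStats {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    long sum = 0;
    long count = 0;

    /**
     * Assumes docBases is ascending with docBases[0] == 0, where docBases[i]
     * is the first global doc id of segment i, and values[i] holds the
     * values of segment i.
     */
    static SegmentStats aggregate(int[] globalDocIds, int[] docBases, SegmentValues[] values) {
        int[] sorted = globalDocIds.clone();
        Arrays.sort(sorted); // sorted ids form one consecutive run per segment

        SegmentStats stats = new SegmentStats();
        int seg = 0;
        for (int docId : sorted) {
            // Move forward to the segment containing docId; since ids are
            // sorted, the segment index never has to go backwards.
            while (seg + 1 < docBases.length && docId >= docBases[seg + 1]) {
                seg++;
            }
            long v = values[seg].get(docId - docBases[seg]); // local doc id
            stats.min = Math.min(stats.min, v);
            stats.max = Math.max(stats.max, v);
            stats.sum += v;
            stats.count++;
        }
        return stats;
    }

    public static void main(String[] args) {
        // Three made-up segments holding 3, 2 and 4 documents.
        final long[] s0 = {5, 3, 9};
        final long[] s1 = {7, 1};
        final long[] s2 = {4, 8, 2, 6};
        SegmentValues[] values = { i -> s0[i], i -> s1[i], i -> s2[i] };
        int[] docBases = {0, 3, 5};

        SegmentStats stats = aggregate(new int[] {6, 0, 4, 2, 8}, docBases, values);
        System.out.println("min=" + stats.min + " max=" + stats.max
                + " sum=" + stats.sum + " count=" + stats.count);
    }
}
```

With Lucene 4.x you would take the segments from reader.leaves(), use each
context's docBase, and get the values from
context.reader().getNumericDocValues(field); please check the exact calls
against the javadocs of the version you are running.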

-- 
Adrien
