Re: Retrieving values for a NumericDocValuesField [SEC=UNOFFICIAL]

Adrien Grand Thu, 24 Oct 2013 00:20:07 -0700

Hi Stephen,

On Thu, Oct 24, 2013 at 1:18 AM, Stephen GRAY <[email protected]> wrote:
> I actually need to loop through a large number of documents (50,000 - 
> 100,000) calculating a number of statistics (min, max, sum) so I really need 
> the most efficient/fastest solution available. It sounds like it would be 
> best to just store the data in a stored field.


I see. For that many documents, doc values are actually the right
thing to use. Sorry if I put you on the wrong track; I was assuming you
were only going to collect values from a few documents.

In your case the best option would be to group your doc ids by the
segment they belong to and then, for each segment, get a per-segment
NumericDocValues instance and aggregate your statistics over it.
This is faster than using MultiDocValues because MultiDocValues needs
to binary-search for the appropriate segment for every document.
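
To make the idea concrete, here is a minimal sketch of that per-segment
loop in plain Java. It does not call Lucene: the docBases array and the
long[][] of per-segment values are hypothetical stand-ins for each
leaf's doc base and its per-segment NumericDocValues, and the class and
method names are made up for illustration. It assumes every doc id is
valid and that the segments cover contiguous doc id ranges.

```java
import java.util.Arrays;

// Sketch only: plain arrays stand in for Lucene's per-segment readers.
final class SegmentStats {

    // Running min / max / sum over the values seen so far.
    static final class Stats {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long sum = 0;
        int count = 0;

        void add(long v) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
            count++;
        }
    }

    // globalDocIds:  top-level doc ids to aggregate (any order).
    // docBases:      first global doc id of each segment, ascending (assumed).
    // segmentValues: segmentValues[s][localDoc] stands in for the
    //                per-segment doc values of segment s (assumed).
    static Stats aggregate(int[] globalDocIds, int[] docBases, long[][] segmentValues) {
        int[] sorted = globalDocIds.clone();
        Arrays.sort(sorted); // sorted ids fall into segments in order

        Stats stats = new Stats();
        int i = 0;
        for (int seg = 0; seg < docBases.length && i < sorted.length; seg++) {
            int base = docBases[seg];
            long[] values = segmentValues[seg]; // looked up once per segment
            int end = base + values.length;     // first doc id past this segment
            while (i < sorted.length && sorted[i] < end) {
                stats.add(values[sorted[i] - base]); // global id -> local id
                i++;
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        long[][] segmentValues = {{5, 1}, {7}, {3, 9, 2}};
        int[] docBases = {0, 2, 3};
        Stats s = aggregate(new int[] {5, 0, 2, 4}, docBases, segmentValues);
        System.out.println("min=" + s.min + " max=" + s.max + " sum=" + s.sum);
    }
}
```

Because the ids are sorted first, the segment lookup happens once per
segment instead of once per document, which is the saving over a
per-document binary search.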

-- 
Adrien
