Mark,

Understood, but what about cases where 'size' is set to unlimited? Inaccuracies are not a concern in that case, correct?
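
To make the question concrete, here is a toy Python sketch of the three reduction stages being discussed (shard, node, coordinator). It is not Elasticsearch code: the term counts are made up, `truncate`, `merge` and `run` are hypothetical helper names, `node_size` is the hypothetical setting Mark mentions, and `None` stands in for "unlimited". It only illustrates that truncating before merging can hide a term that is frequent overall, and that nothing is lost when no stage truncates.

```python
from collections import Counter


def truncate(counts, limit):
    """Keep only the `limit` most frequent terms (None keeps everything)."""
    return Counter(dict(counts.most_common(limit)))


def merge(partials, limit):
    """Sum partial term counts, then truncate the merged result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return truncate(total, limit)


def run(nodes, size, shard_size, node_size):
    """Simulate shard -> node -> coordinator reduction of term counts."""
    node_results = []
    for shards in nodes:
        # Stage 1: each shard returns only its top `shard_size` terms.
        shard_results = [truncate(shard, shard_size) for shard in shards]
        # Stage 2 (hypothetical node-local reduce): keep top `node_size`.
        node_results.append(merge(shard_results, node_size))
    # Stage 3: the coordinating node merges and keeps the top `size`.
    return merge(node_results, size).most_common()


# Made-up data: "c" is never in any shard's top 2, but it is the most
# frequent term overall (8 * 4 = 32).
nodes = [
    [Counter({"a": 10, "b": 9, "c": 8}), Counter({"a": 10, "b": 9, "c": 8})],
    [Counter({"d": 11, "e": 9, "c": 8}), Counter({"d": 11, "e": 9, "c": 8})],
]

truncated = run(nodes, size=1, shard_size=2, node_size=2)
unlimited = run(nodes, size=1, shard_size=None, node_size=None)

print(truncated)  # [('d', 22)]
print(unlimited)  # [('c', 32)]
```

In this toy model the truncated run reports "d" as the top term, while the run with no intermediate limits finds "c". Whether the real "terms" aggregation behaves the same way with an unlimited size is exactly what I am asking.
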
On Wednesday, January 14, 2015 at 1:09:48 PM UTC-5, Mark Harwood wrote:
>
> If you introduce an extra reduction phase (for multiple shards on the same
> node) you introduce further potential for inaccuracies in the final results.
> Consider the role of 'size' and 'shard_size' in the "terms" aggregation
> [1] and the effects they have on accuracy. You'd arguably need a
> 'node_size' setting to also control the size of this new intermediate
> collection. All stages that reduce the volumes of data processed can
> introduce an approximation with the potential for inaccuracies upstream
> when merging.
>
> [1]
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_shard_size
>
> On Wednesday, January 14, 2015 at 5:44:47 PM UTC, Elliott Bradshaw wrote:
>>
>> Adrien,
>>
>> I get the feeling that you're a pretty heavy contributor to the
>> aggregation module. In your experience, would a shard per cpu core
>> strategy be an effective performance solution in a pure aggregation use
>> case? If this could proportionally reduce the aggregation time, would a
>> node local reduce (in which all shard aggregations on a given node are
>> reduced prior to being sent to the client node) be a good follow on
>> strategy for further enhancement?
>>
>> On Wednesday, January 14, 2015 at 10:56:03 AM UTC-5, Adrien Grand wrote:
>>>
>>> On Wed, Jan 14, 2015 at 4:16 PM, Elliott Bradshaw <ebrad...@gmail.com>
>>> wrote:
>>>
>>>> Just out of curiosity, are aggregations on multiple shards on a single
>>>> node executed serially or in parallel? In my experience, it appears that
>>>> they're executed serially (my CPU usage did not change when going from 1
>>>> shard to 2 shards per node, but I didn't test this extensively). I'm
>>>> interested in maximizing the parallelism of an aggregation without
>>>> creating a massive number of nodes.
>>>>
>>> Requests are processed serially per shard, but several shards can be
>>> processed at the same time. So if you have an index that consists of N
>>> primaries, this would run on N processors of your cluster in parallel.
>>>
>>> --
>>> Adrien Grand
>>>