Adding a 'node reduce phase' to aggregations is something I'm very interested in, and also something I'm investigating for the project I'm currently working on.
"If you introduce an extra reduction phase (for multiple shards on the same node) you introduce further potential for inaccuracies in the final results."

This is true if you only reduce the top-k items per shard, but I was thinking of reducing the complete set of buckets locally. This takes a bit more CPU and memory, but my guess is that this is negligible compared to the work already being done by the aggregation framework. If you reduce the complete bucket sets on the node before sending them to the coordinator, it will actually increase the accuracy of aggregations!

"how many of these sorts of use cases generate sufficiently large trees of results where a node-level merging would be beneficial"

It is primarily beneficial for bigger installations with lots of shards per machine. Say 40 machines with ~100 shards per machine: in the current strategy, where every node sends 100 shard results, a lot of bandwidth is used on the coordinating node, since it receives 4000 responses, while it could do with 40 responses (1 per machine). I acknowledge this is a highly specialised use case which not very many people run into, but it is one I'm currently working on.

"How hard would it to be to implement such a feature?"

I have been looking into this, and it is not trivial. It needs to be implemented in or around the SearchService, which is where I found the different search strategies (e.g. DFS) to be implemented. Unlike the rest of Elasticsearch, it does not seem to be organised into modules that implement the different search strategies.

Regarding the accuracy of top-k lists: I think the above, both the 'node reduce phase' and making the search strategy pluggable, would be the groundwork for implementations of the TJA or TPUT strategies as discussed in an old issue[1] about the accuracy of facets. The order of steps towards the ultimate goal would be:

1) Make the search strategies (e.g. query then fetch, DFS query then fetch) more modular.
2) Make a search strategy with a 'node reduce phase' for the aggregations. Start with a complete reduce on the node; if that takes too much memory/time, TJA or TPUT can be used locally on the node to get a reliable top-k list.

3a) Make a search strategy that executes TJA across the cluster, coordinated by the coordinating node.

3b) Make a separate strategy that executes TPUT across the cluster, coordinated by the coordinating node.

I would say that 3a and 3b are 'easy' if a complete reduce in step 2 does not consume too many resources. Adding strategies for both TJA and TPUT gives the user full control: TPUT is not suited to reliably sorting on sums where the field might contain a negative value, but TPUT has better latency than TJA.

I would love to get an opinion from Adrien on the feasibility of such an approach.

-- Nils

[1] https://github.com/elasticsearch/elasticsearch/issues/1305

On Wednesday, January 14, 2015 at 7:47:07 PM UTC+1, Elliott Bradshaw wrote:
>
> How hard would it to be to implement such a feature? Even if there are
> only a handful of use cases, it could prove very helpful in these.
> Particularly since very large trees are the ones that will struggle the
> most with bandwidth issues.
>
> On Wednesday, January 14, 2015 at 1:36:53 PM UTC-5, Mark Harwood wrote:
>>
>>> Understood, but what about cases where size is set to unlimited?
>>> Inaccuracies are not a concern in that case, correct?
>>
>> Correct. But if we only consider the scenarios where the key sets are
>> complete and accuracy is not put at risk by merging (i.e. there is no "top
>> N" type filtering in play), how many of these sorts of use cases generate
>> sufficiently large trees of results where a node-level merging would be
>> beneficial?
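To make the complete node-level reduce described above concrete, here is a minimal sketch in Python. Plain dicts stand in for terms-aggregation bucket sets; none of the names are Elasticsearch APIs. Because the per-shard bucket sets are merged completely, with no top-k truncation, the merge is exact, and the coordinator receives one response per node instead of one per shard:

```python
from collections import Counter

def node_reduce(shard_buckets):
    """Merge the complete bucket sets of all shards on one node.

    No top-k truncation happens here, so the merge is exact: the
    coordinator sees the same totals it would have computed from the
    individual shard responses.
    """
    merged = Counter()
    for buckets in shard_buckets:
        merged.update(buckets)  # sums counts per term
    return dict(merged)

# Two shards on the same node, each with its full (untruncated) bucket set.
shard_a = {"error": 70, "warn": 20, "info": 5}
shard_b = {"error": 30, "warn": 40, "debug": 10}

print(node_reduce([shard_a, shard_b]))
# {'error': 100, 'warn': 60, 'info': 5, 'debug': 10}

# Response-count arithmetic from the example above: 40 machines with
# ~100 shards each means 40 * 100 = 4000 shard responses today, versus
# 40 node responses with a node-level reduce.
machines, shards_per_machine = 40, 100
print(machines * shards_per_machine, "->", machines)  # 4000 -> 40
```

The extra cost is one Counter per node held in memory during the merge, which is the CPU/memory trade-off mentioned above.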
>>
>>> On Wednesday, January 14, 2015 at 1:09:48 PM UTC-5, Mark Harwood wrote:
>>>>
>>>> If you introduce an extra reduction phase (for multiple shards on the
>>>> same node) you introduce further potential for inaccuracies in the final
>>>> results.
>>>> Consider the role of 'size' and 'shard_size' in the "terms" aggregation
>>>> [1] and the effects they have on accuracy. You'd arguably need a
>>>> 'node_size' setting to also control the size of this new intermediate
>>>> collection. All stages that reduce the volumes of data processed can
>>>> introduce an approximation with the potential for inaccuracies upstream
>>>> when merging.
>>>>
>>>> [1]
>>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_shard_size
>>>>
>>>> On Wednesday, January 14, 2015 at 5:44:47 PM UTC, Elliott Bradshaw wrote:
>>>>>
>>>>> Adrien,
>>>>>
>>>>> I get the feeling that you're a pretty heavy contributor to the
>>>>> aggregation module. In your experience, would a shard per cpu core
>>>>> strategy be an effective performance solution in a pure aggregation use
>>>>> case? If this could proportionally reduce the aggregation time, would a
>>>>> node local reduce (in which all shard aggregations on a given node are
>>>>> reduced prior to being sent to the client node) be a good follow on
>>>>> strategy for further enhancement?
>>>>>
>>>>> On Wednesday, January 14, 2015 at 10:56:03 AM UTC-5, Adrien Grand wrote:
>>>>>>
>>>>>> On Wed, Jan 14, 2015 at 4:16 PM, Elliott Bradshaw <ebrad...@gmail.com> wrote:
>>>>>>
>>>>>>> Just out of curiosity, are aggregations on multiple shards on a
>>>>>>> single node executed serially or in parallel? In my experience, it appears
>>>>>>> that they're executed serially (my CPU usage did not change when going from
>>>>>>> 1 shard to 2 shards per node, but I didn't test this extensively).
>>>>>>> I'm interested in maximizing the parallelism of an aggregation without
>>>>>>> creating a massive number of nodes.
>>>>>>
>>>>>> Requests are processed serially per shard, but several shards can be
>>>>>> processed at the same time. So if you have an index that consists of N
>>>>>> primaries, this would run on N processors of your cluster in parallel.
>>>>>>
>>>>>> --
>>>>>> Adrien Grand
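For what it's worth, here is a toy sketch of the TPUT idea mentioned earlier in the thread: a simplified three-phase uniform-threshold top-k, not Elasticsearch code, and all names are made up for illustration. The phase-2 guarantee, that any term whose true total reaches the threshold tau must have a local sum of at least tau/m on at least one of the m nodes (pigeonhole), only makes the pruning safe when the phase-1 partial sums are valid lower bounds, which requires non-negative values. That is exactly the caveat above about sorting on sums over fields that may be negative:

```python
def tput_top_k(nodes, k):
    """Simplified TPUT sketch over per-node dicts of term -> local sum.

    Phase 1: fetch each node's local top-k and estimate a threshold tau.
    Phase 2: fetch every term whose local sum >= tau / m; any term with
             true total >= tau must clear this bar somewhere (pigeonhole).
    Phase 3: fetch exact totals for the surviving candidates only.
    Exact only when all values are non-negative.
    """
    m = len(nodes)

    # Phase 1: sum up the local top-k lists as candidate lower bounds.
    partial = {}
    for node in nodes:
        for key, v in sorted(node.items(), key=lambda kv: -kv[1])[:k]:
            partial[key] = partial.get(key, 0) + v
    tau = sorted(partial.values(), reverse=True)[k - 1] if len(partial) >= k else 0

    # Phase 2: retrieve everything above the uniform per-node threshold.
    candidates = set(partial)
    for node in nodes:
        candidates.update(key for key, v in node.items() if v >= tau / m)

    # Phase 3: exact totals for the candidates.
    totals = {key: sum(node.get(key, 0) for node in nodes) for key in candidates}
    return sorted(totals.items(), key=lambda kv: -kv[1])[:k]

# Non-negative sums: the result is an exact top-k.
print(tput_top_k([{"a": 5, "b": 4, "c": 3}, {"a": 1, "c": 6, "d": 2}], k=2))
# [('c', 9), ('a', 6)]

# Negative values break the phase-1 lower bounds: "a" looks like 10 from
# node 1 alone, inflating tau to 10, so the true top term "x" (total 8,
# but only 4 per node) is never fetched and the answer is wrong.
print(tput_top_k([{"a": 10, "x": 4, "y": -5}, {"a": -9, "x": 4, "y": 5}], k=1))
# [('a', 1)]
```

The second call is a concrete instance of why a TJA-style strategy would still be needed when sorting on sums of fields that can go negative.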