> Regarding the accuracy of top-k lists....

This is perhaps an over-simplification. We deal with far more complex scenarios than a simple, single top-K list: we have whole aggregation trees with multiple layers of aggs (geo, time, nested, parent/child, percentiles, cardinalities etc.) which can embed multiple top-K terms aggs, or be contained by one. Today all aggs work in one pass over local data to produce a merge-able summary output. If you introduce the idea of pausing all of this local computation mid-stream and then resuming it once you've centrally determined what "top K" is across a cluster, and for various points in the agg tree, then coordinating all of these updates gets impossibly complex.
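To make the "one pass, merge-able summary" point concrete, here is a toy Python sketch of a two-level agg tree (a time bucket containing a terms count). It is only an illustration under my own assumptions: the field names `month` and `tag` and both function names are made up, and this is not how the real agg framework is coded. It also skips top-K truncation entirely.

```python
from collections import Counter, defaultdict

def summarise_shard(docs):
    """One pass over a shard's docs: month bucket -> term counts (hypothetical fields)."""
    tree = defaultdict(Counter)
    for doc in docs:
        tree[doc["month"]][doc["tag"]] += 1
    return tree

def merge_summaries(summaries):
    """Coordinator-style merge: combine the per-shard trees bucket by bucket."""
    merged = defaultdict(Counter)
    for tree in summaries:
        for bucket, counts in tree.items():
            merged[bucket].update(counts)  # Counter.update adds counts
    return merged

shard_a = [{"month": "jan", "tag": "x"}, {"month": "jan", "tag": "y"}]
shard_b = [{"month": "jan", "tag": "x"}, {"month": "feb", "tag": "z"}]
merged = merge_summaries([summarise_shard(shard_a), summarise_shard(shard_b)])
```

Each shard is read exactly once and never has to wait for a cluster-wide decision, which is the property a pause-and-resume scheme would give up.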
> I acknowledge it is a highly specialised use-case which not very many people run into, but it is a case I'm currently working on.

To be fair, multi-level merging is a capability which might also apply to analytics in federated architectures, where proxy servers might act as the front to nodes in remote clusters.

> I was thinking to reduce the complete set of buckets locally

I'm unclear on your approach to the "reduce":

1) Take the summary outputs of multiple agg pipelines computed in parallel and merge them in the same way coordinating nodes do, or
2) Take the raw inputs (doc streams) from all shards held on a node and feed them through a single aggregation pipeline to get one combined output.

The problems being that 1) loses accuracy and 2) loses any parallelism, because agg pipelines are single-threaded and must process doc streams serially. Because you claimed accuracy would be better, I guess you mean option 2?
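A toy Python sketch of the difference between the two options, in case it helps. Everything here is assumed for illustration: the function names, the sample terms, and the simplification that each shard returns exactly k terms (in practice a terms agg can ask each shard for more than k via `shard_size`, which reduces but does not remove the effect).

```python
from collections import Counter

def top_k(counts, k):
    """Keep the k most frequent terms, ties broken alphabetically."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return Counter(dict(ranked[:k]))

def option_1(shards, k):
    """Merge per-shard top-k summaries, as a coordinating node would."""
    merged = Counter()
    for terms in shards:
        merged.update(top_k(Counter(terms), k))
    return top_k(merged, k)

def option_2(shards, k):
    """Feed every shard's doc stream through one pipeline, serially."""
    counts = Counter()
    for terms in shards:
        counts.update(terms)
    return top_k(counts, k)

# "c" is third on each shard but first overall.
shard_1 = ["a"] * 5 + ["b"] * 4 + ["c"] * 3
shard_2 = ["d"] * 5 + ["e"] * 4 + ["c"] * 3

merged_summaries = option_1([shard_1, shard_2], k=2)  # a: 5, d: 5
single_pipeline = option_2([shard_1, shard_2], k=2)   # c: 6, a: 5
```

Option 1 drops "c" because neither shard reports it, while option 2 finds it but has to walk both doc streams one after the other.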