Re: Is ElasticSearch truly scalable for analytics?

Mark Harwood Thu, 15 Jan 2015 10:54:53 -0800

> Regarding the accuracy of top-k lists....

This is perhaps an over-simplification. We deal with far more complex 
scenarios than a single top-K list: we have whole aggregation trees with 
multiple layers of aggs (geo, time, nested, parent/child, percentiles, 
cardinalities, etc.), which can embed multiple top-K terms aggs or be 
contained by one. Today all aggs work in one pass over local data to 
produce a mergeable summary output. If you introduce the idea of pausing 
all of this local computation mid-stream and resuming it once you've 
centrally determined what "top K" is across the cluster, at various points 
in the agg tree, then coordinating all of these updates becomes impossibly 
complex.
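To make the "one pass, mergeable summary" point concrete, here is a much-simplified sketch, assuming documents are hypothetical (category, tag) pairs and a two-level agg tree of terms(category) containing terms(tag). The names shard_summary and merge are illustrative only, not Elasticsearch APIs. Each shard builds its summary without talking to any other shard, and truncation to the top k happens independently at every level of the tree. That is why a centrally agreed "top K" would have to be fed back into each level separately.

```python
from collections import Counter, defaultdict

# Hypothetical doc shape: (category, tag). Two-level agg tree:
# terms(category) -> terms(tag), each level keeping only its top k buckets.

def shard_summary(docs, k):
    """One pass over local docs, then truncate each level to its local top k."""
    outer = Counter()
    inner = defaultdict(Counter)
    for category, tag in docs:  # single pass, no coordination with other shards
        outer[category] += 1
        inner[category][tag] += 1
    return {
        cat: (count, dict(inner[cat].most_common(k)))
        for cat, count in outer.most_common(k)
    }

def merge(summaries, k):
    """Coordinator-style reduce: sum bucket counts level by level, then re-truncate."""
    outer = Counter()
    inner = defaultdict(Counter)
    for summary in summaries:
        for cat, (count, tags) in summary.items():
            outer[cat] += count
            inner[cat].update(tags)  # Counter.update adds counts
    return {
        cat: (count, dict(inner[cat].most_common(k)))
        for cat, count in outer.most_common(k)
    }

# Usage with two hypothetical shards.
shard_a = [("books", "new")] * 3 + [("music", "live")] * 2
shard_b = [("books", "used")] * 2 + [("music", "live")] * 4
result = merge([shard_summary(shard_a, 2), shard_summary(shard_b, 2)], 2)
```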


>I acknowledge it is a highly specialised use-case which not very many 
people run into, but it is a case I'm currently working on.

To be fair, multi-level merging is a capability that might also apply to 
analytics in federated architectures, where proxy servers act as the front 
end to nodes in remote clusters.
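A minimal sketch of multi-level merging, assuming summaries are plain per-term counts (Python Counter objects) and a hypothetical topology of two remote clusters behind a proxy. Because summing counts is associative, the same reduce function can run at a node, at a cluster, or at a proxy and still give the same answer as one flat merge.

```python
from collections import Counter
from functools import reduce

def merge_counts(summaries):
    """Reduce step: sum per-term counts. Addition is associative, so the
    same function works at any level of a merge hierarchy."""
    return reduce(lambda acc, s: acc + s, summaries, Counter())

# Hypothetical topology: two remote clusters, each with two shard-level summaries.
cluster_1 = [Counter(a=5, b=1), Counter(a=2, c=4)]
cluster_2 = [Counter(b=6), Counter(a=1, c=3)]

# Two-level merge: each cluster reduces locally, then a proxy reduces the results.
two_level = merge_counts([merge_counts(cluster_1), merge_counts(cluster_2)])

# Flat merge: a single coordinator reduces every shard summary directly.
flat = merge_counts(cluster_1 + cluster_2)
```

This only holds while the summaries are untruncated; once every level keeps just its top K, the shape of the merge tree can change the result.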

>I was thinking to reduce the complete set of buckets locally

I'm unclear on your approach to the "reduce". Do you mean:
1) Take the summary outputs of multiple agg pipelines computed in parallel 
and merge them the same way coordinating nodes do, or
2) Take the raw inputs (doc streams) from all shards held on a node and 
feed them through a single aggregation pipeline to get one combined output?

The problems are that 1) loses accuracy and 2) loses any parallelism, 
because agg pipelines are single-threaded and must process doc streams 
serially. Since you said accuracy would be better, I guess you mean option 2?
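The accuracy trade-off between the two options can be sketched with a toy example, assuming each shard's doc stream is just a list of term strings; option_1 and option_2 are hypothetical names, not Elasticsearch functions. Term "z" is never in any shard's local top 2, so option 1 never sees it, even though it is the most frequent term overall. Option 2 finds it, but only by walking every shard's docs one after another through a single pipeline.

```python
from collections import Counter

def top_k(counts, k):
    """Return the k most frequent terms."""
    return [term for term, _ in counts.most_common(k)]

def option_1(shards, k):
    """Merge summaries: each shard reports only its local top k, the node sums them."""
    merged = Counter()
    for docs in shards:
        merged.update(dict(Counter(docs).most_common(k)))
    return top_k(merged, k)

def option_2(shards, k):
    """Single pipeline: feed every shard's doc stream through one counter."""
    merged = Counter()
    for docs in shards:  # serial: one pipeline consumes the shards one at a time
        merged.update(docs)
    return top_k(merged, k)

# Hypothetical shards: "z" is 3rd locally everywhere but 1st globally (4 * 3 = 12).
shards = [
    ["x"] * 6 + ["y"] * 5 + ["z"] * 4,
    ["w"] * 6 + ["v"] * 5 + ["z"] * 4,
    ["u"] * 6 + ["t"] * 5 + ["z"] * 4,
]
```

Elasticsearch's terms aggregation softens option 1's error by asking each shard for more than K terms (the shard_size parameter); the sketch keeps exactly k per shard to make the failure visible.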