Re: Is ElasticSearch truly scalable for analytics?

Elliott Bradshaw Wed, 14 Jan 2015 10:16:00 -0800

Mark,

Understood, but what about cases where size is set to unlimited?  
Inaccuracies are not a concern in that case, correct?
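
To make the question concrete, here is a minimal Python sketch of a terms-style count merged across two shards. It is a toy model, not Elasticsearch code: the function names shard_top_terms and merge_shards and the sample data are made up for illustration, and the shard_size and size parameters only mimic the roles of the settings discussed in this thread. Under those assumptions, truncating each shard's list can drop a term that is the global winner, while returning every term from every shard makes the merged counts exact.

```python
from collections import Counter

def shard_top_terms(terms, shard_size=None):
    """Count terms on one (hypothetical) shard; keep the top shard_size, or all if None."""
    return dict(Counter(terms).most_common(shard_size))

def merge_shards(shard_results, size=None):
    """Sum the per-shard counts and return the top `size` terms, or all if None."""
    total = Counter()
    for partial in shard_results:
        total.update(partial)  # adds counts term by term
    return total.most_common(size)

# Made-up data: "c" is the most frequent term overall (6),
# but it is only third on each individual shard.
shard_1 = ["a"] * 5 + ["b"] * 4 + ["c"] * 3
shard_2 = ["d"] * 5 + ["e"] * 4 + ["c"] * 3

# Truncated: each shard reports only its top 2 terms, so "c" never reaches the merge.
truncated = merge_shards([shard_top_terms(s, shard_size=2) for s in (shard_1, shard_2)])

# Unlimited: each shard reports every term, so the merged counts are exact.
exact = merge_shards([shard_top_terms(s) for s in (shard_1, shard_2)])

print("truncated:", truncated)
print("exact:    ", exact)
```

In this toy model the truncated merge never sees "c" at all, whereas the unlimited merge reports it with its true count of 6. The cost of the unlimited case is that every shard ships its full term list to the merging node.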


On Wednesday, January 14, 2015 at 1:09:48 PM UTC-5, Mark Harwood wrote:
>
> If you introduce an extra reduction phase (for multiple shards on the same 
> node) you introduce further potential for inaccuracies in the final results.
> Consider the role of 'size' and 'shard_size' in the "terms" aggregation 
> [1] and the effects they have on accuracy. You'd arguably need a 
> 'node_size' setting to also control the size of this new intermediate 
> collection. All stages that reduce the volumes of data processed can 
> introduce an approximation with the potential for inaccuracies upstream 
> when merging.
>
>
> [1] 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_shard_size
>
> On Wednesday, January 14, 2015 at 5:44:47 PM UTC, Elliott Bradshaw wrote:
>>
>> Adrien,
>>
>> I get the feeling that you're a pretty heavy contributor to the 
>> aggregation module. In your experience, would a shard-per-CPU-core 
>> strategy be an effective performance solution in a pure aggregation use 
>> case? If this could proportionally reduce the aggregation time, would a 
>> node-local reduce (in which all shard aggregations on a given node are 
>> reduced prior to being sent to the client node) be a good follow-on 
>> strategy for further enhancement?
>>
>> On Wednesday, January 14, 2015 at 10:56:03 AM UTC-5, Adrien Grand wrote:
>>>
>>>
>>>
>>> On Wed, Jan 14, 2015 at 4:16 PM, Elliott Bradshaw <ebrad...@gmail.com> 
>>> wrote:
>>>
>>>> Just out of curiosity, are aggregations on multiple shards on a single 
>>>> node executed serially or in parallel?  In my experience, it appears that 
>>>> they're executed serially (my CPU usage did not change when going from 1 
>>>> shard to 2 shards per node, but I didn't test this extensively).  I'm 
>>>> interested in maximizing the parallelism of an aggregation without 
>>>> creating a massive number of nodes.
>>>>
>>>>
>>> Requests are processed serially per shard, but several shards can be 
>>> processed at the same time. So if you have an index that consists of N 
>>> primaries, this would run on N processors of your cluster in parallel.
>>>
>>>
>>> -- 
>>> Adrien Grand
>>>  
>>

