Re: Aggregation-"sql like" optimization guidance with elasticsearch 1.0.0

2014-05-29 Thread Niko Nyrhila
Hi,

You can nest aggregations, so in this case you'd first use a Date Histogram 
aggregation with an interval of one hour:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html

Then, nested inside it, you'd add a Terms aggregation on the "id" field:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html

Here is an example:
http://www.solinea.com/blog/elasticsearch-aggs-save-the-day

This avoids running a script for every document, so it should be very fast, 
even when running on a single machine.
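
The nesting described above might look roughly like the sketch below, written 
as a Python dict that serializes to the request body. It assumes each document 
carries a single date field, hypothetically named "timestamp", in place of the 
separate "day" and "hour" fields from the original question. The "interval" key 
is the Elasticsearch 1.x spelling, and the aggregation names ("per_hour", 
"per_id") are arbitrary.

```python
import json

# Request body: bucket by hour, then by id, then sum the three counters.
# "timestamp" is an assumed date field; the original mapping has "day" and "hour".
query = {
    "size": 0,  # return aggregation results only, no hits
    "query": {"match_all": {}},
    "aggs": {
        "per_hour": {
            "date_histogram": {"field": "timestamp", "interval": "hour"},
            "aggs": {
                "per_id": {
                    "terms": {"field": "id"},
                    "aggs": {
                        "sum_clicks": {"sum": {"field": "clicks"}},
                        "sum_views": {"sum": {"field": "views"}},
                        "sum_video_plays": {"sum": {"field": "video_plays"}},
                    },
                }
            },
        }
    },
}

body = json.dumps(query, indent=2)
print(body)
```

The terms aggregation returns only the top 10 buckets by default, so a larger 
"size" would be needed to see every id within each hour.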


On Friday, January 31, 2014 3:36:20 AM UTC+2, Maxime Nay wrote:
>
> Hi,
>
> We are experimenting with Elasticsearch 1.0.0, and are particularly 
> excited about the new aggregation feature.
>
> Here is one of our use cases that we would like to optimize:
>
> Right now, to imitate a basic SQL GROUP BY query that would look like: 
> SELECT day, hour, id, SUM(views), SUM(clicks), SUM(video_plays) FROM 
> events GROUP BY day, hour, id
>
> we are issuing queries like this:
>
> {
>   "size" : 0,
>   "query" : { "match_all" : {} },
>   "aggs" : {
>     "test_aggregation" : {
>       "terms" : {
>         "script" : "doc['day'].date + '-' + doc['hour'].value + '-' + doc['id'].value",
>         "order" : { "_term" : "asc" },
>         "size": 
>       },
>       "aggs" : {
>         "sum_click" : { "sum" : { "field" : "clicks" } },
>         "sum_views" : { "sum" : { "field" : "views" } },
>         "sum_video_plays" : { "sum" : { "field" : "video_plays" } }
>       }
>     }
>   }
> }
>
> But the performance of this kind of query is rather poor, so we would 
> like to know whether there is a more efficient way to get what we want.
>
> Thanks !
> Maxime
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bb2293a1-b83c-45a1-af42-e48b3fd9a0c9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Aggregation-"sql like" optimization guidance with elasticsearch 1.0.0

2014-01-31 Thread Maxime Nay
For test purposes we currently have an index containing about 50M docs, 
distributed across a 4-node cluster with 16 shards.
Do you think that drastically increasing the number of shards would help?

On Friday, January 31, 2014 10:14:08 AM UTC-8, Binh Ly wrote:
>
> Maxime, I forgot to mention: you can also spread the load by increasing 
> the shard count and adding more nodes. But precomputing the field is 
> probably the quickest way to improve performance. Keep in mind that, 
> unlike SQL, ES aggregations may return approximate metrics if you have 
> more than one shard.
>



Re: Aggregation-"sql like" optimization guidance with elasticsearch 1.0.0

2014-01-31 Thread Binh Ly
Maxime, I forgot to mention: you can also spread the load by increasing the 
shard count and adding more nodes. But precomputing the field is probably 
the quickest way to improve performance. Keep in mind that, unlike SQL, ES 
aggregations may return approximate metrics if you have more than one 
shard.
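
One plausible reading of the caveat about approximate results: a terms 
aggregation asks each shard for its own top terms and merges those lists, so 
a term that misses the cut on one shard loses that shard's count. The toy 
model below is only an illustration of that idea in plain Python, not 
Elasticsearch's actual implementation; the shard contents and function names 
are made up for the example.

```python
from collections import Counter

def shard_top_terms(docs, size):
    """Return the `size` most frequent terms on one shard, with local counts."""
    return Counter(docs).most_common(size)

def merged_top_terms(shards, size):
    """Merge the per-shard top lists, roughly as a coordinating node would."""
    merged = Counter()
    for docs in shards:
        for term, count in shard_top_terms(docs, size):
            merged[term] += count
    return merged.most_common(size)

# Two hypothetical shards; each list holds the term of one document.
shards = [
    ["a"] * 5 + ["b"] * 4 + ["c"] * 3,
    ["c"] * 5 + ["d"] * 4 + ["a"] * 3,
]

# Exact counts, as a single-shard index (or SQL) would report them.
exact = Counter(term for docs in shards for term in docs)

# Counts after each shard contributes only its top 2 terms.
approx = dict(merged_top_terms(shards, size=2))

print("exact: ", dict(exact))
print("approx:", approx)
```

Here "a" and "c" each occur 8 times overall, but the merged result reports 
only 5 for each, because neither made the top 2 on both shards. Asking each 
shard for more terms (the terms aggregation's "shard_size" option) narrows the 
gap at the cost of more work per shard.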



Re: Aggregation-"sql like" optimization guidance with elasticsearch 1.0.0

2014-01-31 Thread Maxime Nay
Unfortunately, we have about 8 different fields that could serve as 
aggregation keys, and a lot of potential combinations of these fields.
Thus, pre-computing all these combinations doesn't seem to be a viable 
solution.
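
A possible alternative to precomputing every combination is to build the 
grouping at query time, nesting one terms aggregation per key. The helper 
below is a hypothetical sketch (the function name and the "by_"/"sum_" naming 
scheme are assumptions); it only assembles the request body and does not talk 
to a cluster.

```python
import json

def build_group_by(keys, sums):
    """Nest one terms aggregation per key; put the sum metrics innermost."""
    # Innermost level: one sum aggregation per metric field.
    aggs = {"sum_" + field: {"sum": {"field": field}} for field in sums}
    # Wrap from the last key outwards so the first key ends up outermost.
    for key in reversed(keys):
        aggs = {"by_" + key: {"terms": {"field": key}, "aggs": aggs}}
    return {"size": 0, "query": {"match_all": {}}, "aggs": aggs}

# Equivalent in spirit to: GROUP BY day, hour, id
query = build_group_by(["day", "hour", "id"], ["views", "clicks", "video_plays"])
print(json.dumps(query, indent=2))
```

Any subset of the key fields can be passed in the same way, so no extra 
field has to be indexed per combination.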

On Friday, January 31, 2014 7:52:40 AM UTC-8, Binh Ly wrote:
>
> Maxime, your bottleneck is likely in the script part. It has to 
> dynamically compute that value per doc, just like in SQL. However, if you 
> can precompute it at index time (for example, by introducing a field that 
> contains the value of date-hour-id), you should be able to improve that 
> aggregation time significantly. I did a quick test in 1.0 RC1 with an 
> index of about 100K docs, and if I precompute that term field (and 
> eliminate the script part), it is at least 10x faster than the script 
> version. YMMV.
>



Re: Aggregation-"sql like" optimization guidance with elasticsearch 1.0.0

2014-01-31 Thread Binh Ly
Maxime, your bottleneck is likely in the script part. It has to dynamically 
compute that value per doc, just like in SQL. However, if you can precompute 
it at index time (for example, by introducing a field that contains the 
value of date-hour-id), you should be able to improve that aggregation time 
significantly. I did a quick test in 1.0 RC1 with an index of about 100K 
docs, and if I precompute that term field (and eliminate the script part), 
it is at least 10x faster than the script version. YMMV.
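
As a rough sketch of the precomputation idea: the combined key can be added 
to each document before it is indexed, and the terms aggregation can then 
read that field instead of running a script. The field name "day_hour_id", 
the helper function, and the sample document are assumptions made for 
illustration.

```python
def with_group_key(doc):
    """Return a copy of `doc` with a combined day-hour-id key added."""
    enriched = dict(doc)
    enriched["day_hour_id"] = "{}-{}-{}".format(doc["day"], doc["hour"], doc["id"])
    return enriched

# Illustrative document; the values are made up.
doc = with_group_key(
    {"day": "2014-01-31", "hour": 3, "id": 42,
     "views": 10, "clicks": 1, "video_plays": 0}
)

# Same aggregation as the scripted version, but keyed on the stored field.
query = {
    "size": 0,
    "query": {"match_all": {}},
    "aggs": {
        "test_aggregation": {
            "terms": {"field": "day_hour_id", "order": {"_term": "asc"}},
            "aggs": {
                "sum_click": {"sum": {"field": "clicks"}},
                "sum_views": {"sum": {"field": "views"}},
                "sum_video_plays": {"sum": {"field": "video_plays"}},
            },
        }
    },
}
```

For this to group on the whole key, the new field would presumably need to 
be mapped as not_analyzed; otherwise the default analyzer is likely to split 
the value at the hyphens.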
