Re: Aggregation-"sql like" optimization guidance with elasticsearch 1.0.0
Hi, You can nest aggregations, so in this case you'd first use Date Histogram aggregation with an interval of one hour: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html Then you'd aggregate by "id" field: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html Here is an example: http://www.solinea.com/blog/elasticsearch-aggs-save-the-day This should be very fast, even when running on a single machine. On Friday, January 31, 2014 3:36:20 AM UTC+2, Maxime Nay wrote: > > Hi, > > We are experimenting elasticsearch 1.0.0, and are particularly excited > about the new aggregation feature. > > Here is one of our use-case that we would like to optimize : > > Right now, to imitate a basic SQL group by query that would look like : > SELECT day, hour, id, SUM(views), SUM(clicks), SUM(video_plays) FROM > events GROUP BY day, hour, id > > we are issuing this kind of queries : > > { > "size" : 0, > "query":{"match_all":{}}, > "aggs" : { > "test_aggregation" : { > "terms" : { > "script" : "doc['day'].date + '-' + doc['hour'].value + > '-' + doc['id'].value", > "order" : { "_term" : "asc" }, > "size": > }, > "aggs" : { > "sum_click" : { "sum" : { "field" : "clicks" } }, > "sum_views" : { "sum" : { "field" : "views" } }, > "sum_video_plays" : { "sum" : { "field" : "video_plays" } } > } > } > } > } > > But the perfs for this kind of queries are kind of low. Thus, we would > like to know if there are a more optimized way to get what we want. > > Thanks ! > Maxime > -- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bb2293a1-b83c-45a1-af42-e48b3fd9a0c9%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Aggregation-"sql like" optimization guidance with elasticsearch 1.0.0
For test purposes we currently have an index containing about 50M docs, distributed on a 4 nodes cluster, with 16 shards. Do you think that drastically increasing the number of shards would help ? On Friday, January 31, 2014 10:14:08 AM UTC-8, Binh Ly wrote: > > Maxime, forgot to mention, you can also distribute the load out by > increasing the shard count and adding more nodes. But precomputing the > field is probably the quickest way to improve that performance. Keep in > mind that unlike SQL, ES aggregations may return approximate metrics if you > have more than 1 shard. > -- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/21874399-99c9-4c6b-8c76-f856ff95216f%40googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
Re: Aggregation-"sql like" optimization guidance with elasticsearch 1.0.0
Maxime, forgot to mention, you can also distribute the load out by increasing the shard count and adding more nodes. But precomputing the field is probably the quickest way to improve that performance. Keep in mind that unlike SQL, ES aggregations may return approximate metrics if you have more than 1 shard. -- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2ee82d59-f8c7-41cc-b777-3af6e18f6200%40googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
Re: Aggregation-"sql like" optimization guidance with elasticsearch 1.0.0
Unfortunately, we have about 8 different fields that could serve as aggregation key, and a lot of potential combinations between these fields. Thus, pre-computing all these combinations doesn't seem to be a viable solution. On Friday, January 31, 2014 7:52:40 AM UTC-8, Binh Ly wrote: > > Maxime, your bottleneck is likely in the script part. It has to > dynamically compute that per doc just like in sql. However, if you can > precompute that at index time (for example, introduce a field that contains > the value of date-hour-id, you should be able to improve that aggregation > time significantly. I did a quick test in 1.0 RC1 with an index of about > 100K docs, and if I precompute that term field (and eliminate the script > part), it is at least 10x faster than the script version. YMMV. > -- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/134a71d9-7683-4804-9ae9-449d40580b35%40googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
Re: Aggregation-"sql like" optimization guidance with elasticsearch 1.0.0
Maxime, your bottleneck is likely in the script part. It has to dynamically compute that per doc just like in sql. However, if you can precompute that at index time (for example, introduce a field that contains the value of date-hour-id, you should be able to improve that aggregation time significantly. I did a quick test in 1.0 RC1 with an index of about 100K docs, and if I precompute that term field (and eliminate the script part), it is at least 10x faster than the script version. YMMV. -- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b3b87708-4435-40bb-9182-1f2a843f31c7%40googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.