> I need an exact value, not an approximate one :) However, I've read more of the documentation and it may not be a real problem in practice, especially if I use a threshold of 40000 (apparently the maximum). BTW, I couldn't find the default precision value in the documentation.
Do you really need an exact value? I mean that even if counts are approximate, they tend to be precise, with an error of around 1%. Going from this 1% error margin to an exact count would make the aggregation MUCH more costly. Counting high-cardinality fields in a distributed system is a tough problem. Our latest blog post tries to explain why we chose an approximate algorithm and don't provide an option to get exact counts, so you might want to check it out: http://www.elasticsearch.org/blog/count-elasticsearch/

We didn't document the default precision threshold because we try to pick sensible defaults depending on how your aggregation is structured. For example, with a top-level aggregation it is not an issue to set up a high threshold, since there would be a single counter. But if you use a cardinality aggregation under a very large terms or histogram aggregation, it is a different story, since there would be many more counters.

> Actually I've just realized I'm going to hit a problem... I wanted to use Kibana to graph this for me but I'm not sure Kibana supports "aggregations"...

Indeed. Integrating aggregations into Kibana requires significant work, so unfortunately this will probably take some time to be implemented.

On Wed, Apr 2, 2014 at 6:43 PM, Vincent Massol <vmas...@gmail.com> wrote:

> Actually I've just realized I'm going to hit a problem... I wanted to use
> Kibana to graph this for me but I'm not sure Kibana supports
> "aggregations"...
>
> Any idea?
>
> Thanks
> -Vincent
>
>
> On Wednesday, April 2, 2014 11:38:14 AM UTC+2, Vincent Massol wrote:
>>
>> Thanks a lot for your fast response Adrien!
>>
>> * I noticed the cardinality aggregation but I was worried by the "an
>> approximate count of distinct values." part of the documentation.
>> I need an exact value, not an approximate one :) However, I've read more of
>> the documentation and it may not be a real problem in practice, especially
>> if I use a threshold of 40000 (apparently the maximum). BTW, I couldn't find
>> the default precision value in the documentation.
>> * From your answer I gather that using aggregations is the only solution
>> to my problem and there's no way to use the Query DSL to solve it.
>>
>> Thanks, it helps a lot!
>> -Vincent
>>
>> On Wednesday, April 2, 2014 11:17:17 AM UTC+2, Adrien Grand wrote:
>>>
>>> Hi Vincent,
>>>
>>> I left some replies inline:
>>>
>>> On Wed, Apr 2, 2014 at 10:02 AM, Vincent Massol <vma...@gmail.com> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I'd like to count all entries in my ES instance that have a timestamp
>>>> from the *last day*, and *group together all entries having the same
>>>> "instanceId"*. With the data below, the count result should be 1 (and
>>>> not 2): 2 entries are within the last day, but they have the same
>>>> instanceId of "def".
>>>>
>>>> I tried the following:
>>>>
>>>> curl -XPOST "http://localhost:9200/installs/install/_search?pretty=1&fields=_source,_timestamp" -d'
>>>> {
>>>>   "aggs": {
>>>>     "lastday" : {
>>>>       "filter" : {
>>>>         "range" : {
>>>>           "_timestamp" : {
>>>>             "gt" : "now-1d"
>>>>           }
>>>>         }
>>>>       },
>>>>       "aggs" : {
>>>>         "instanceids" : {
>>>>           "terms" : { "field" : "instanceId" }
>>>>         }
>>>>       }
>>>>     }
>>>>   }
>>>> }'
>>>>
>>>> But I have 3 problems with this:
>>>> * It's not a count but a search. "aggs" don't seem to work with _count
>>>> * It returns all entries in the result before the aggs data
>>>>
>>>
>>> For these two issues, you probably want to check out the count search
>>> type[1], which works with aggregations. It's like a regular search, but
>>> it doesn't perform the fetch phase that retrieves the top hits.
>>>
>>> [1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#count
>>>
>>>> * In the aggs I don't get a direct count value and I have to count the
>>>> number of buckets to get my answer
>>>>
>>>
>>> We recently (Elasticsearch 1.1.0) added a cardinality[2] aggregation,
>>> which allows for counting unique values. In previous versions of
>>> Elasticsearch, counting was indeed only possible through the terms
>>> aggregation with a high `size` parameter, but this was inefficient on
>>> high-cardinality fields.
>>>
>>> [2] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#search-aggregations-metrics-cardinality-aggregation
>>>
>>> Here is a gist that gives an example of the count search_type and the
>>> cardinality aggregation:
>>> https://gist.github.com/jpountz/9930690
>>>
>>> --
>>> Adrien Grand
>>>
>>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/0a0ba031-ab73-40d7-8397-dc536343ddf8%40googlegroups.com
>
> For more options, visit https://groups.google.com/d/optout.

--
Adrien Grand
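A minimal sketch of the approach discussed in this thread: the count search type combined with a cardinality aggregation nested under the original last-day filter. It assumes Elasticsearch 1.1 or later on localhost:9200, the installs/install index and type from the original question, and that _timestamp is enabled in the mapping. ES_URL, build_body and count_unique_instances are hypothetical names chosen for illustration, and the resulting count is approximate, as explained above.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helper names): count distinct instanceId values
# among documents whose _timestamp falls within the last day.
# Assumes Elasticsearch 1.1+ and the installs/install index/type from the thread.

ES_URL="${ES_URL:-http://localhost:9200}"

# Print the request body. $1 is the cardinality precision_threshold;
# it defaults to 40000, the maximum mentioned in the thread.
build_body() {
  local threshold="${1:-40000}"
  cat <<EOF
{
  "aggs": {
    "lastday": {
      "filter": {
        "range": { "_timestamp": { "gt": "now-1d" } }
      },
      "aggs": {
        "unique_instances": {
          "cardinality": {
            "field": "instanceId",
            "precision_threshold": ${threshold}
          }
        }
      }
    }
  }
}
EOF
}

# Send the request. search_type=count skips the fetch phase, so the
# response carries the aggregations but no hits.
count_unique_instances() {
  curl -s -XPOST "${ES_URL}/installs/install/_search?search_type=count&pretty=1" \
    -d "$(build_body "$1")"
}
```

Calling count_unique_instances (optionally with a lower threshold, e.g. count_unique_instances 3000) should return a response whose aggregations.lastday.unique_instances.value field holds the approximate number of distinct instanceId values. Because the filter aggregation produces a single bucket, there is only one counter here, which is the case described above where a high threshold is not an issue.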