> I need an exact value, not an approximate one :) However I've read more
the documentation and it may not be a real problem in practice, especially
if I use a threshold of 40000 (the max apparently). I couldn't find the
default precision value BTW in the documentation.

Do you really need an exact value? I mean that even if counts are
approximate, they tend to be precise and to have an error which is around
1%. But going from this 1% error margin to an accurate count would make the
aggregation MUCH more costly.

Counting high-cardinalities in a distributed system is actually a tough
issue, I tried to explain the reason why we chose an approximate algorithm
and don't provide an option to get accurate counts in our last blog post,
you might want to check it out:
  http://www.elasticsearch.org/blog/count-elasticsearch/

We didn't document the default value because we try to have sensible
defaults depending on how your aggregation is structured. For example, if
you use a top-level aggregation, this is not an issue to setup a high
threshold since there would be a single counter, but it you use a
cardinality aggregation under a very large terms or histogram aggregation
this is a different story since there would be many more counters.

> Actually I've just realized I'm going to hit a problem... I wanted to use
Kibana to graph this for me but I'm not sure Kibana supports
"aggregations"...

Indeed. There is significant work to be performed in order to integrate
aggregations into Kibana, so unfortunately this will probably take some
time to be implemented.



On Wed, Apr 2, 2014 at 6:43 PM, Vincent Massol <vmas...@gmail.com> wrote:

> Actually I've just realized I'm going to hit a problem... I wanted to use
> Kibana to graph this for me but I'm not sure Kibana supports
> "aggregations"...
>
> Any idea?
>
> Thanks
> -Vincent
>
>
> On Wednesday, April 2, 2014 11:38:14 AM UTC+2, Vincent Massol wrote:
>>
>> Thanks a lot for your fast response Adrien!
>>
>> * I noticed the cardinality aggregation but I was worried by the "an
>> approximate count of distinct values." part of the documentation. I need an
>> exact value, not an approximate one :) However I've read more the
>> documentation and it may not be a real problem in practice, especially if I
>> use a threshold of 40000 (the max apparently). I couldn't find the default
>> precision value BTW in the documentation.
>> * From your answer I gather that using aggregations is the only solution
>> to my problem and there's no way to use the Query DSL to solve it.
>>
>> Thanks, it helps a lot!
>> -Vincent
>>
>> On Wednesday, April 2, 2014 11:17:17 AM UTC+2, Adrien Grand wrote:
>>>
>>> Hi Vincent,
>>>
>>> I left some replies inline:
>>>
>>> On Wed, Apr 2, 2014 at 10:02 AM, Vincent Massol <vma...@gmail.com>wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I'd like to count all entries in my ES instance, having a timestamp
>>>> from the *last day* and *group together all entries having the same
>>>> "instanceId"*. With the data below, the count result should be 1 (and
>>>> not 2) since 2 entries are within the last day but they have the same
>>>> instanceId of "def".
>>>>
>>>> I tried the following:
>>>>
>>>> curl -XPOST "http://localhost:9200/installs/install/_search?
>>>> pretty=1&fields=_source,_timestamp" -d'
>>>> {
>>>>     "aggs": {
>>>>         "lastday" : {
>>>>             "filter" : {
>>>>                 "range" : {
>>>>                     "_timestamp" : {
>>>>                         "gt" : "now-1d"
>>>>                     }
>>>>                 }
>>>>             },
>>>>             "aggs" : {
>>>>                 "instanceids" : {
>>>>                     "terms" : { "field" : "instanceId" }
>>>>                 }
>>>>             }
>>>>         }
>>>>     }
>>>> }'
>>>>
>>>> But I have 3 problems with this:
>>>> * It's not a count but a search. "aggs" don't seem to work with _count
>>>> * It returns all entries in the result before the aggs data
>>>>
>>>
>>> For these two issues, you probably want to check out the count search
>>> type[1] which works with aggregations. It's like a regular search, but
>>> doesn't do perform the fetch phase in order to fetch the top hits.
>>>
>>> [1] http://www.elasticsearch.org/guide/en/elasticsearch/
>>> reference/current/search-request-search-type.html#count
>>>
>>>
>>>> * In the aggs I don't get a direct count value and I have to count the
>>>> number of buckets to get my answer
>>>>
>>>
>>> We recently (Elasticsearch 1.1.0) added a cardinality[2] aggregation,
>>> that allows for counting unique values. In previous versions of
>>> Elasticsearch, counting was indeed only possible through the terms
>>> aggregation with a high `size` parameter, but this was inefficient on
>>> high-cardinality fields.
>>>
>>> [2] http://www.elasticsearch.org/guide/en/elasticsearch/
>>> reference/current/search-aggregations-metrics-
>>> cardinality-aggregation.html#search-aggregations-metrics-
>>> cardinality-aggregation
>>>
>>> Here is a gist that gives an example of the count search_type and the
>>> cardinality aggregation:
>>>   https://gist.github.com/jpountz/9930690
>>>
>>> --
>>> Adrien Grand
>>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/0a0ba031-ab73-40d7-8397-dc536343ddf8%40googlegroups.com<https://groups.google.com/d/msgid/elasticsearch/0a0ba031-ab73-40d7-8397-dc536343ddf8%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Adrien Grand

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j79jhvu-%3DbzakKYUiSiieUNWZuBWaiVZgFxnjWCYzLV6g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to