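I don't believe value_count is intended to be a unique count. It counts the values of the field with duplicates included, so it isn't a valid reference for checking the cardinality result.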
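Here is a minimal Python sketch of the difference, using made-up data. The `tweets` list and the `per_bucket_counts` helper are hypothetical and are not Elasticsearch APIs. A value_count-style tally counts every value, so a user who tweets twice in a language bucket is counted twice. A distinct count counts each screen_name once per bucket, which is what cardinality estimates.

```python
from collections import defaultdict

# Hypothetical sample documents: (lang, screen_name) pairs standing in for tweets.
tweets = [
    ("en", "alice"), ("en", "alice"), ("en", "bob"),
    ("ja", "taro"), ("ja", "taro"), ("ja", "taro"),
]

def per_bucket_counts(docs):
    """Return {lang: (value_count, distinct_count)} for each language bucket."""
    values = defaultdict(list)
    for lang, name in docs:
        values[lang].append(name)
    # value_count-style tally: every value counts, duplicates included.
    # Distinct count: each screen_name counts once per bucket.
    return {lang: (len(names), len(set(names))) for lang, names in values.items()}

counts = per_bucket_counts(tweets)
for lang, (value_count, distinct) in sorted(counts.items()):
    print(f"{lang}: value_count={value_count} distinct={distinct}")
```

On this toy data the two tallies diverge as soon as a user repeats. A value_count figure therefore overstates the number of unique users and will not line up with the cardinality figure.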
On Friday, March 28, 2014 7:17:47 AM UTC, Henrik Nordvik wrote:
>
> Hi,
>
> I'm trying out the new cardinality aggregation, and want to measure the
> accuracy on my data. I'm using a dataset of a day of sample tweets (2.8m
> tweets).
>
> I'm counting the number of unique usernames per language.
> To get my "reference" unique count I use this:
>
> GET /twitter-2014.03.26/_search
> {
>   "size": 0,
>   "aggs": {
>     "country_count": {
>       "terms": {
>         "field": "lang"
>       },
>       "aggs": {
>         "unique_count" : { "value_count" : { "field" : "screen_name" } }
>       }
>     }
>   }
> }
>
> Result:
> "aggregations": {
>   "country_count": {
>     "buckets": [
>       {
>         "key": "en",
>         "doc_count": 872906,
>         "unique_count": {
>           "value": 307489
>         }
>       },
>       {
>         "key": "ja",
>         "doc_count": 581521,
>         "unique_count": {
>           "value": 103035
>         }
>       },
>
> To get the approximate count with cardinality:
>
> GET /twitter-2014.03.26/_search
> {
>   "size": 0,
>   "aggs": {
>     "country_count": {
>       "terms": {
>         "field": "lang"
>       },
>       "aggregations": {
>         "distinct_users_approx": {
>           "cardinality": {
>             "field": "screen_name",
>             "precision_threshold": 40000
>           }
>         }
>       }
>     }
>   }
> }
>
> Result:
> "aggregations": {
>   "country_count": {
>     "buckets": [
>       {
>         "key": "en",
>         "doc_count": 872906,
>         "distinct_users_approx": {
>           "value": 145541
>         }
>       },
>       {
>         "key": "ja",
>         "doc_count": 581521,
>         "distinct_users_approx": {
>           "value": 50824
>         }
>       },
>
> So, 307489 vs 145541 for english, and 103035 vs 50824 for japanese. Not
> very accurate.
>
> 1) Am I doing the reference unique count distinct correctly?
> 2) Is it supposed to be this inaccurate on this type of dataset?
> 3) Is there any way to improve precision?
>
> -
> Henrik