Nice to see someone taking the trouble to put their stats in context. Drives me nuts every time I see the equivalent of this: http://xkcd.com/1138/
So we have a feature that does some of what you are after: the "significant_terms" aggregation. Your query would look like this:

    {
      "query": {
        "match": { "text": "foo" }
      },
      "aggs": {
        "keywords": {
          "significant_terms": {
            "field": "country",
            "size": 100
          }
        }
      }
    }

What you get back are buckets for each country, each with a "doc_count" that represents how many "foo" documents there were in that country and a background count called "bg_count", which is how many docs (foo and non-foo) came from that country.

Selections are ranked using a score that is returned and which is more nuanced than a straight doc_count/bg_count percentage. In practice we find that prioritizing selections solely by a percentage measure can skew results towards very rare terms (in your case, very small countries) that have few data samples and so can more easily achieve high percentages. Instead, we offer a variety of scoring heuristics which place a different emphasis on popular vs rare terms when it comes to ranking (see https://twitter.com/elasticmark/status/513320986956292096 ).

Cheers
Mark

On Tuesday, February 17, 2015 at 1:07:31 AM UTC, ja...@holderdeord.no wrote:
>
> Hi,
>
> I'm looking for a way to have Elasticsearch calculate the percentage of
> docs that match a query *within* a terms aggregation.
> That is, given two aggregations where one is filtered and the other is not:
>
> {
>   aggregations: {
>     countries: {
>       filter: {
>         query: {
>           query_string: {
>             default_field: "description",
>             query: "foo"
>           }
>         }
>       },
>       aggregations: {
>         filteredCountries: {
>           terms: { field: "country" }
>         }
>       }
>     },
>     totalCountries: {
>       terms: { field: "countries" }
>     }
>   },
>   size: 0
> }
>
> Let's say the totalCountries buckets are:
>
> "buckets": [
>   {
>     "key": "USA",
>     "doc_count": 100
>   },
>   {
>     "key": "UK",
>     "doc_count": 50
>   }
> ]
>
> and the filteredCountries buckets are:
>
> "buckets": [
>   {
>     "key": "USA",
>     "doc_count": 10
>   },
>   {
>     "key": "UK",
>     "doc_count": 25
>   }
> ]
>
> Is there a way to get a response that returns filteredCountries as
> percentages of totalCountries? I.e. something like:
>
> [
>   {
>     "key": "USA",
>     "percent": 10
>   },
>   {
>     "key": "UK",
>     "percent": 50
>   }
> ]
>
> Thanks!
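
If the plain percentage from the question is still wanted, it can be derived on the client from the "doc_count" and "bg_count" values that each significant_terms bucket carries. Below is a minimal Python sketch of that arithmetic. The function name bucket_percentages and the sample buckets are hypothetical, made up here from the counts in the quoted message. It computes only the straight doc_count/bg_count ratio, not any of the scoring heuristics Elasticsearch uses for ranking.

```python
def bucket_percentages(buckets):
    """Return doc_count as a percentage of bg_count for each bucket.

    `buckets` is assumed to be a list of dicts shaped like the buckets of a
    significant_terms response: each has "key", "doc_count" and "bg_count".
    """
    results = []
    for bucket in buckets:
        bg_count = bucket["bg_count"]
        # Guard against an empty background set to avoid division by zero.
        percent = 100.0 * bucket["doc_count"] / bg_count if bg_count else 0.0
        results.append({"key": bucket["key"], "percent": percent})
    return results


# Hypothetical buckets built from the counts in the question:
# doc_count = "foo" docs per country, bg_count = all docs per country.
sample_buckets = [
    {"key": "USA", "doc_count": 10, "bg_count": 100},
    {"key": "UK", "doc_count": 25, "bg_count": 50},
]

percentages = bucket_percentages(sample_buckets)
```

With these sample counts, USA comes out at 10 percent and UK at 50 percent, matching the response shape asked for. As noted above, a ranking based only on this ratio will tend to favour countries with very few documents, so it is probably best used alongside the returned score and not in place of it.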