Thanks Mark! I've been planning to look into `significant_terms`, but didn't know it could help me with this. I'm a bit concerned that an overly clever scoring scheme could be hard to explain to users, but I'll give it a shot.
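On the explainability concern: a plain percentage can still be derived client-side from the `doc_count` and `bg_count` that each bucket carries, independent of the returned score. Below is a minimal Python sketch, assuming the response has already been parsed into a dict and the buckets sit under `aggregations.keywords.buckets` as in the query quoted below. The helper name `bucket_percentages` and the sample numbers are hypothetical, chosen to mirror the figures in the original question.

```python
def bucket_percentages(buckets):
    """Return doc_count as a percentage of bg_count for each bucket."""
    results = []
    for bucket in buckets:
        bg_count = bucket["bg_count"]
        # Guard against an empty background set to avoid division by zero.
        percent = 100.0 * bucket["doc_count"] / bg_count if bg_count else 0.0
        results.append({"key": bucket["key"], "percent": percent})
    return results


# Hypothetical buckets, shaped like response["aggregations"]["keywords"]["buckets"].
sample_buckets = [
    {"key": "USA", "doc_count": 10, "bg_count": 100},
    {"key": "UK", "doc_count": 25, "bg_count": 50},
]

percentages = bucket_percentages(sample_buckets)
for row in percentages:
    print(row["key"], row["percent"])
```

This keeps the number shown to users easy to explain, while the aggregation's own score can still decide which buckets are returned.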
On Tue, Feb 17, 2015 at 9:41 AM, Mark Harwood <mark.harw...@elasticsearch.com> wrote:

> Nice to see someone taking the trouble to put their stats in context.
> Drives me nuts every time I see the equivalent of this:
> http://xkcd.com/1138/
>
> So we have a feature that does some of what you are after - it's called
> the "significant_terms" aggregation.
> Your query would look like this:
>
> {
>   "query": {
>     "match": {
>       "text": "foo"
>     }
>   },
>   "aggs": {
>     "keywords": {
>       "significant_terms": {
>         "field": "country",
>         "size": 100
>       }
>     }
>   }
> }
>
> What you get back are buckets for each country with a doc_count that
> represents how many "foo" documents there were in that country and a
> background count called "bg_count" which is how many docs (foo and non foo)
> came from that country. Selections are ranked using a score that is
> returned and which is more nuanced than a straight doc_count/bg_count
> percentage. In practice we find prioritizing selections solely by a
> percentage measure can skew results towards very rare terms (in your case v
> small countries) that have few data samples and so can more easily achieve
> high-scoring percentages. Instead, we offer a variety of scoring heuristics
> which place a different emphasis on popular vs rare when it comes to
> ranking: (see https://twitter.com/elasticmark/status/513320986956292096 )
>
> Cheers
> Mark
>
> On Tuesday, February 17, 2015 at 1:07:31 AM UTC, ja...@holderdeord.no
> wrote:
>>
>> Hi,
>>
>> I'm looking for a way to have Elasticsearch calculate the percentage of
>> docs that match a query *within* a terms aggregation.
>> That is, given two aggregations where one is filtered and the other is
>> not:
>>
>> {
>>   aggregations: {
>>     countries: {
>>       filter: {
>>         query: {
>>           query_string: {
>>             default_field: "description",
>>             query: "foo"
>>           }
>>         }
>>       },
>>       aggregations: {
>>         filteredCountries: {
>>           terms: { field: "country" }
>>         }
>>       }
>>     },
>>     totalCountries: {
>>       terms: { field: "countries" }
>>     }
>>   },
>>   size: 0
>> }
>>
>> Let's say the totalCountries buckets are:
>>
>> "buckets": [
>>   {
>>     "key": "USA",
>>     "doc_count": 100
>>   },
>>   {
>>     "key": "UK",
>>     "doc_count": 50
>>   }
>> ]
>>
>> and the filteredCountries buckets are:
>>
>> "buckets": [
>>   {
>>     "key": "USA",
>>     "doc_count": 10
>>   },
>>   {
>>     "key": "UK",
>>     "doc_count": 25
>>   }
>> ]
>>
>> Is there a way to get a response that returns filteredCountries as
>> percentages of totalCountries? I.e. something like:
>>
>> [
>>   {
>>     "key": "USA",
>>     "percent": 10
>>   },
>>   {
>>     "key": "UK",
>>     "percent": 50
>>   }
>> ]
>>
>> Thanks!
>>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/1ojltqSRdhA/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/5337cd90-a434-4a44-9a81-969e55568389%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/5337cd90-a434-4a44-9a81-969e55568389%40googlegroups.com?utm_medium=email&utm_source=footer>.
>
> For more options, visit https://groups.google.com/d/optout.