Yes! If I have to do the division on my own I might as well stick with the two aggregations, AFAICT.
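Doing that division client-side takes only a few lines. Below is a minimal Python sketch. It assumes the aggregation responses have already been parsed into lists of bucket dicts carrying the "key", "doc_count" and, for significant_terms, "bg_count" fields quoted below. The function names are hypothetical.

```python
# Sketch: per-key percentages computed on the client from Elasticsearch
# aggregation buckets. Assumes each bucket is a dict with the "key" and
# "doc_count" fields shown in this thread (plus "bg_count" for
# significant_terms buckets). Function names are illustrative only.


def percent_of_total(filtered_buckets, total_buckets):
    """Express each filtered bucket's doc_count as a percentage of the
    doc_count of the same key in the unfiltered terms aggregation."""
    totals = {b["key"]: b["doc_count"] for b in total_buckets}
    result = []
    for bucket in filtered_buckets:
        total = totals.get(bucket["key"])
        # Skip keys that are missing from the unfiltered aggregation
        # (e.g. cut off by its "size") and avoid dividing by zero.
        if not total:
            continue
        result.append({"key": bucket["key"],
                       "percent": 100.0 * bucket["doc_count"] / total})
    return result


def percent_from_significant_terms(buckets):
    """The same ratio from significant_terms buckets: doc_count / bg_count."""
    return [{"key": b["key"], "percent": 100.0 * b["doc_count"] / b["bg_count"]}
            for b in buckets if b.get("bg_count")]


if __name__ == "__main__":
    total_countries = [{"key": "USA", "doc_count": 100},
                       {"key": "UK", "doc_count": 50}]
    filtered_countries = [{"key": "USA", "doc_count": 10},
                          {"key": "UK", "doc_count": 25}]
    # Prints USA at 10.0 percent and UK at 50.0 percent.
    print(percent_of_total(filtered_countries, total_countries))
```

Ranking by this ratio (e.g. `sorted(result, key=lambda r: r["percent"], reverse=True)`) only covers countries that were actually returned. A terms aggregation returns 10 buckets by default, and its counts can be approximate on multi-shard indices. That is why the full set of countries has to be fetched to do this calculation client-side.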
But if it were available as a scoring heuristic, I could effectively use {size: N} so I don’t have to fetch the full set of countries to do this calculation. I’ve opened a feature request here: <https://github.com/elasticsearch/elasticsearch/issues/9720>.

On Tue, Feb 17, 2015 at 10:52 AM, Mark Harwood <mark.harw...@elasticsearch.com> wrote:

> You can choose to ignore the score and compute your own by dividing
> doc_count by bg_count.
>
> Your post has made me think we should add this more easily explainable
> metric as one of the scoring heuristics we offer for this aggregation.
>
> On Tuesday, February 17, 2015 at 10:44:12 AM UTC, Jari Bakken wrote:
>>
>> Thanks Mark!
>>
>> I've been planning to look into `significant_terms`, but didn't know it
>> could help me with this. I'm a bit concerned that a too clever scoring
>> could be hard to explain to users, but I'll give it a shot.
>>
>>
>> On Tue, Feb 17, 2015 at 9:41 AM, Mark Harwood <mark.h...@elasticsearch.com> wrote:
>>
>>> Nice to see someone taking the trouble to put their stats in context.
>>> Drives me nuts every time I see the equivalent of this:
>>> http://xkcd.com/1138/
>>>
>>> So we have a feature that does some of what you are after - it's called
>>> the "significant_terms" aggregation.
>>> Your query would look like this:
>>> {
>>>   "query" :
>>>   {
>>>     "match" : {
>>>       "text": "foo"
>>>     }
>>>   },
>>>   "aggs":{
>>>     "keywords":{
>>>       "significant_terms":{
>>>         "field":"country",
>>>         "size":100
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>> What you get back are buckets for each country with a doc_count that
>>> represents how many "foo" documents there were in that country and a
>>> background count called "bg_count" which is how many docs (foo and non foo)
>>> came from that country. Selections are ranked using a score that is
>>> returned and which is more nuanced than a straight doc_count/bg_count
>>> percentage. In practice we find prioritizing selections solely by a
>>> percentage measure can skew results towards very rare terms (in your case v
>>> small countries) that have few data samples and so can more easily achieve
>>> high-scoring percentages. Instead, we offer a variety of scoring heuristics
>>> which place a different emphasis on popular vs rare when it comes to
>>> ranking: (see https://twitter.com/elasticmark/status/513320986956292096 )
>>>
>>> Cheers
>>> Mark
>>>
>>> On Tuesday, February 17, 2015 at 1:07:31 AM UTC, ja...@holderdeord.no
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm looking for a way to have Elasticsearch calculate the percentage of
>>>> docs that match a query *within* a terms aggregation.
>>>> That is, given two aggregations where one is filtered and the other is
>>>> not:
>>>>
>>>> {
>>>>   aggregations: {
>>>>     countries: {
>>>>       filter: {
>>>>         query: {
>>>>           query_string: {
>>>>             default_field: "description",
>>>>             query: "foo"
>>>>           }
>>>>         }
>>>>       },
>>>>       aggregations: {
>>>>         filteredCountries: {
>>>>           terms: { field: "country" }
>>>>         }
>>>>       }
>>>>     },
>>>>     totalCountries: {
>>>>       terms: { field: "country" }
>>>>     }
>>>>   },
>>>>   size: 0
>>>> }
>>>>
>>>> Let's say the totalCountries buckets are:
>>>>
>>>> "buckets": [
>>>>   {
>>>>     "key": "USA",
>>>>     "doc_count": 100
>>>>   },
>>>>   {
>>>>     "key": "UK",
>>>>     "doc_count": 50
>>>>   }
>>>> ]
>>>>
>>>>
>>>> and the filteredCountries buckets are:
>>>>
>>>> "buckets": [
>>>>   {
>>>>     "key": "USA",
>>>>     "doc_count": 10
>>>>   },
>>>>   {
>>>>     "key": "UK",
>>>>     "doc_count": 25
>>>>   }
>>>> ]
>>>>
>>>>
>>>> Is there a way to get a response that returns filteredCountries as
>>>> percentages of totalCountries? I.e. something like:
>>>>
>>>> [
>>>>   {
>>>>     "key": "USA",
>>>>     "percent": 10
>>>>   },
>>>>   {
>>>>     "key": "UK",
>>>>     "percent": 50
>>>>   }
>>>> ]
>>>>
>>>> Thanks!
>>>>
>>> --
>>> You received this message because you are subscribed to a topic in the
>>> Google Groups "elasticsearch" group.
>>> To unsubscribe from this topic, visit
>>> https://groups.google.com/d/topic/elasticsearch/1ojltqSRdhA/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to
>>> elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/5337cd90-a434-4a44-9a81-969e55568389%40googlegroups.com
>>> <https://groups.google.com/d/msgid/elasticsearch/5337cd90-a434-4a44-9a81-969e55568389%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.