Yes!

If I have to do the division on my own I might as well stick with the two
aggregations, AFAICT.

But if it were available as a scoring heuristic, I could effectively use
{size: N} and would not have to fetch the full set of countries to do this
calculation.
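
For reference, the do-it-yourself division with the two aggregations could look roughly like the sketch below. It assumes the response has already been parsed into Python dicts; the function name `percent_by_key` is made up for illustration, and the bucket values are the example numbers from my original post.

```python
def percent_by_key(filtered_buckets, total_buckets):
    """Return each filtered doc_count as a percentage of the total doc_count for the same key."""
    # Index the unfiltered buckets by key so each lookup is O(1).
    totals = {b["key"]: b["doc_count"] for b in total_buckets}
    result = []
    for bucket in filtered_buckets:
        total = totals.get(bucket["key"], 0)
        if total == 0:
            continue  # key missing from the unfiltered buckets; nothing to divide by
        result.append(
            {"key": bucket["key"], "percent": 100.0 * bucket["doc_count"] / total}
        )
    return result


# Example buckets taken from the original post (totalCountries / filteredCountries).
total_countries = [{"key": "USA", "doc_count": 100}, {"key": "UK", "doc_count": 50}]
filtered_countries = [{"key": "USA", "doc_count": 10}, {"key": "UK", "doc_count": 25}]

percentages = percent_by_key(filtered_countries, total_countries)
```

This only gives the right ordering if both terms aggregations return every country, which is why the full set has to be fetched.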

I’ve opened a feature request here
<https://github.com/elasticsearch/elasticsearch/issues/9720>.


On Tue, Feb 17, 2015 at 10:52 AM, Mark Harwood <mark.harw...@elasticsearch.com> wrote:

> You can choose to ignore the score and compute your own by dividing
> doc_count by bg_count.
>
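
A minimal sketch of that division, assuming the significant_terms buckets have been parsed into Python dicts with the doc_count and bg_count fields described further down. The helper name `rank_by_percentage` is hypothetical, and the sample buckets reuse the numbers from the original post.

```python
def rank_by_percentage(buckets, size):
    """Score each bucket as doc_count / bg_count (in percent) and keep the top `size`."""
    scored = [
        {"key": b["key"], "percent": 100.0 * b["doc_count"] / b["bg_count"]}
        for b in buckets
        if b["bg_count"] > 0  # guard against division by zero
    ]
    # Highest percentage first, ignoring the score returned with each bucket.
    scored.sort(key=lambda b: b["percent"], reverse=True)
    return scored[:size]


# Illustrative buckets shaped like a significant_terms response.
buckets = [
    {"key": "USA", "doc_count": 10, "bg_count": 100},
    {"key": "UK", "doc_count": 25, "bg_count": 50},
]

top = rank_by_percentage(buckets, size=1)
```

Because the ranking happens on the client, every bucket still has to be fetched before the top N can be picked.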

> Your post has made me think we should add this more easily explainable
> metric as one of the scoring heuristics we offer for this aggregation.
>
> On Tuesday, February 17, 2015 at 10:44:12 AM UTC, Jari Bakken wrote:
>>
>> Thanks Mark!
>>
>> I've been planning to look into `significant_terms`, but didn't know it
>> could help me with this. I'm a bit concerned that an overly clever scoring
>> scheme could be hard to explain to users, but I'll give it a shot.
>>
>>
>> On Tue, Feb 17, 2015 at 9:41 AM, Mark Harwood <mark.h...@elasticsearch.com> wrote:
>>
>>> Nice to see someone taking the trouble to put their stats in context.
>>> Drives me nuts every time I see the equivalent of this:
>>> http://xkcd.com/1138/
>>>
>>> So we have a feature that does some of what you are after - it's called
>>> the "significant_terms" aggregation.
>>> Your query would look like this:
>>> {
>>>     "query": {
>>>         "match": {
>>>             "text": "foo"
>>>         }
>>>     },
>>>     "aggs": {
>>>         "keywords": {
>>>             "significant_terms": {
>>>                 "field": "country",
>>>                 "size": 100
>>>             }
>>>         }
>>>     }
>>> }
>>>
>>> What you get back are buckets for each country with a doc_count that
>>> represents how many "foo" documents there were in that country and a
>>> background count called "bg_count", which is how many docs (foo and
>>> non-foo) came from that country. Selections are ranked using a returned
>>> score that is more nuanced than a straight doc_count/bg_count
>>> percentage. In practice we find that prioritizing selections solely by a
>>> percentage measure can skew results towards very rare terms (in your case
>>> very small countries) that have few data samples and so can more easily
>>> achieve high percentages. Instead, we offer a variety of scoring
>>> heuristics which place a different emphasis on popular vs rare terms when
>>> it comes to ranking (see
>>> https://twitter.com/elasticmark/status/513320986956292096).
>>>
>>> Cheers
>>> Mark
>>>
>>> On Tuesday, February 17, 2015 at 1:07:31 AM UTC, ja...@holderdeord.no wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm looking for a way to have Elasticsearch calculate the percentage of
>>>> docs that match a query *within* a terms aggregation.
>>>> That is, given two aggregations where one is filtered and the other is
>>>> not:
>>>>
>>>> {
>>>>     aggregations: {
>>>>         countries: {
>>>>             filter: {
>>>>                 query: {
>>>>                     query_string: {
>>>>                         default_field: "description",
>>>>                         query: "foo"
>>>>                     }
>>>>                 }
>>>>             },
>>>>             aggregations: {
>>>>                 filteredCountries: {
>>>>                     terms: { field: "country" }
>>>>                 }
>>>>             }
>>>>         },
>>>>         totalCountries: {
>>>>             terms: { field: "country" }
>>>>         }
>>>>     },
>>>>     size: 0
>>>> }
>>>>
>>>> Let's say the totalCountries buckets are:
>>>>
>>>>     "buckets": [
>>>>         {
>>>>             "key": "USA",
>>>>             "doc_count": 100
>>>>         },
>>>>         {
>>>>             "key": "UK",
>>>>             "doc_count": 50
>>>>         }
>>>>     ]
>>>>
>>>>
>>>> and the filteredCountries buckets are:
>>>>
>>>>     "buckets": [
>>>>         {
>>>>             "key": "USA",
>>>>             "doc_count": 10
>>>>         },
>>>>         {
>>>>             "key": "UK",
>>>>             "doc_count": 25
>>>>         }
>>>>     ]
>>>>
>>>>
>>>> Is there a way to get a response that returns filteredCountries as
>>>> percentages of totalCountries? I.e. something like:
>>>>
>>>> [
>>>>     {
>>>>         "key": "USA",
>>>>         "percent": 10
>>>>     },
>>>>     {
>>>>         "key": "UK",
>>>>         "percent": 50
>>>>     }
>>>> ]
>>>>
>>>> Thanks!
>>>>
>>>  --
>>> You received this message because you are subscribed to a topic in the
>>> Google Groups "elasticsearch" group.
>>> To unsubscribe from this topic, visit
>>> https://groups.google.com/d/topic/elasticsearch/1ojltqSRdhA/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to
>>> elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/5337cd90-a434-4a44-9a81-969e55568389%40googlegroups.com.
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
