You can choose to ignore the score and compute your own by dividing 
doc_count by bg_count.
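
A minimal sketch of that calculation, assuming the buckets from the aggregation response have already been parsed into Python dicts with "key", "doc_count" and "bg_count" fields. The function name `percentages` is just an illustrative helper, not part of any Elasticsearch client:

```python
def percentages(buckets):
    """Return (key, percent) pairs sorted by percent, highest first.

    percent is doc_count as a share of bg_count, i.e. how many of the
    documents carrying that term also matched the query.
    """
    results = []
    for bucket in buckets:
        bg_count = bucket["bg_count"]
        if bg_count == 0:
            # Skip empty backgrounds rather than divide by zero.
            continue
        percent = 100.0 * bucket["doc_count"] / bg_count
        results.append((bucket["key"], percent))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Counts taken from the example in the original question.
buckets = [
    {"key": "USA", "doc_count": 10, "bg_count": 100},
    {"key": "UK", "doc_count": 25, "bg_count": 50},
]
print(percentages(buckets))  # [('UK', 50.0), ('USA', 10.0)]
```

Bear in mind that a ranking like this favours terms with very few documents, so you may want to ignore buckets below some minimum bg_count.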

Your post has made me think we should add this more easily explainable 
metric as one of the scoring heuristics we offer for this aggregation.

On Tuesday, February 17, 2015 at 10:44:12 AM UTC, Jari Bakken wrote:
>
> Thanks Mark! 
>
> I've been planning to look into `significant_terms`, but didn't know it 
> could help me with this. I'm a bit concerned that an overly clever scoring 
> scheme could be hard to explain to users, but I'll give it a shot.
>
>
> On Tue, Feb 17, 2015 at 9:41 AM, Mark Harwood <mark.h...@elasticsearch.com> wrote:
>
>> Nice to see someone taking the trouble to put their stats in context.  
>> Drives me nuts every time I see the equivalent of this: 
>> http://xkcd.com/1138/
>>
>> So we have a feature that does some of what you are after - it's called 
>> the "significant_terms" aggregation.
>> Your query would look like this:
>> {
>>   "query": {
>>     "match": {
>>       "text": "foo"
>>     }
>>   },
>>   "aggs": {
>>     "keywords": {
>>       "significant_terms": {
>>         "field": "country",
>>         "size": 100
>>       }
>>     }
>>   }
>> }
>>
>> What you get back are buckets for each country with a doc_count that 
>> represents how many "foo" documents there were in that country and a 
>> background count called "bg_count" which is how many docs (foo and non foo) 
>> came from that country. Selections are ranked using a score that is 
>> returned and which is more nuanced than a straight doc_count/bg_count 
>> percentage. In practice we find prioritizing selections solely by a 
>> percentage measure can skew results towards very rare terms (in your case, very 
>> small countries) that have few data samples and so can more easily achieve 
>> high-scoring percentages. Instead, we offer a variety of scoring heuristics 
>> which place a different emphasis on popular vs rare when it comes to 
>> ranking: (see https://twitter.com/elasticmark/status/513320986956292096 )
>>
>> Cheers
>> Mark
>>
>> On Tuesday, February 17, 2015 at 1:07:31 AM UTC, ja...@holderdeord.no 
>> wrote:
>>>
>>> Hi,
>>>
>>> I'm looking for a way to have Elasticsearch calculate the percentage of 
>>> docs that match a query *within* a terms aggregation. 
>>> That is, given two aggregations where one is filtered and the other is 
>>> not:
>>>
>>> {
>>>     aggregations: {
>>>         countries: {
>>>             filter: {       
>>>                 query: {
>>>                     query_string: {
>>>                         default_field: "description",
>>>                         query: "foo"
>>>                     }
>>>                 }
>>>             },
>>>             aggregations: { 
>>>                 filteredCountries: { 
>>>                     terms: { field: "country" }
>>>                 }
>>>             }
>>>         },
>>>         totalCountries: {
>>>             terms: { field: "country" }
>>>         }
>>>     },
>>>     size: 0
>>> }
>>>
>>> Let's say the totalCountries buckets are:
>>>
>>>     "buckets": [
>>>         {
>>>             "key": "USA",
>>>             "doc_count": 100
>>>         },
>>>         {
>>>             "key": "UK",
>>>             "doc_count": 50
>>>         }
>>>     ]
>>>
>>>
>>> and the filteredCountries buckets are: 
>>>
>>>     "buckets": [
>>>         {
>>>             "key": "USA",
>>>             "doc_count": 10
>>>         },
>>>         {
>>>             "key": "UK",
>>>             "doc_count": 25
>>>         }
>>>     ]
>>>
>>>
>>> Is there a way to get a response that returns filteredCountries as 
>>> percentages of totalCountries? I.e. something like:
>>>
>>> [
>>>     {
>>>         "key": "USA",
>>>         "percent": 10
>>>     },
>>>     {
>>>         "key": "UK",
>>>         "percent": 50
>>>     }
>>> ]
>>>
>>> Thanks!
>>>
>>  -- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "elasticsearch" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/elasticsearch/1ojltqSRdhA/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to 
>> elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/5337cd90-a434-4a44-9a81-969e55568389%40googlegroups.com.
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
