Re: Combining two aggregations to get term percentage

Mark Harwood Tue, 17 Feb 2015 01:42:12 -0800

Nice to see someone taking the trouble to put their stats in context. 
 Drives me nuts every time I see the equivalent of this: 
http://xkcd.com/1138/


So we have a feature that does some of what you are after - it's called the 
"significant_terms" aggregation.
Your query would look like this:
{
  "query": {
    "match": {
      "text": "foo"
    }
  },
  "aggs": {
    "keywords": {
      "significant_terms": {
        "field": "country",
        "size": 100
      }
    }
  }
}

What you get back is a bucket for each country, with a doc_count that 
represents how many "foo" documents there were in that country and a 
background count called "bg_count", which is how many docs (foo and non-foo) 
came from that country. Buckets are ranked using a score that is returned 
with each one and which is more nuanced than a straight doc_count/bg_count 
percentage. In practice we find that prioritizing selections solely by a 
percentage measure can skew results towards very rare terms (in your case, 
very small countries) that have few data samples and so can more easily 
achieve high percentages. Instead, we offer a variety of scoring heuristics 
which place a different emphasis on popular vs rare terms when it comes to 
ranking (see https://twitter.com/elasticmark/status/513320986956292096 ).
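
If you still want the plain percentage, one option is to derive it on the 
client from doc_count and bg_count. A minimal sketch in Python, assuming the 
buckets have already been pulled out of the response (under 
aggregations.keywords.buckets for the query above). The helper name 
bucket_percentages is hypothetical, and the sample data is illustrative: it 
reuses the USA/UK counts from your mail, plus an invented tiny country to 
show the skew I mentioned:

```python
def bucket_percentages(buckets):
    """Turn significant_terms buckets into plain doc_count/bg_count percentages.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    results = []
    for bucket in buckets:
        bg_count = bucket["bg_count"]
        # Guard against an empty background set to avoid dividing by zero.
        percent = 100.0 * bucket["doc_count"] / bg_count if bg_count else 0.0
        results.append({"key": bucket["key"], "percent": percent})
    return results


# Illustrative buckets: USA and UK use the counts from the question;
# "Tinyland" is invented to show how a rare term can top a percentage ranking.
sample_buckets = [
    {"key": "USA", "doc_count": 10, "bg_count": 100},
    {"key": "UK", "doc_count": 25, "bg_count": 50},
    {"key": "Tinyland", "doc_count": 2, "bg_count": 2},
]

percentages = bucket_percentages(sample_buckets)
# Sort by raw percentage, highest first, to see which term "wins".
ranked = sorted(percentages, key=lambda item: item["percent"], reverse=True)
for item in ranked:
    print(item["key"], item["percent"])
```

Here the two-document country ranks first at 100%, ahead of the UK at 50%, 
which is exactly the rare-term effect the scoring heuristics are meant to 
dampen.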

Cheers
Mark

On Tuesday, February 17, 2015 at 1:07:31 AM UTC, ja...@holderdeord.no wrote:
>
> Hi,
>
> I'm looking for a way to have Elasticsearch calculate the percentage of 
> docs that match a query *within* a terms aggregation. 
> That is, given two aggregations where one is filtered and the other is not:
>
> {
>     aggregations: {
>         countries: {
>             filter: {       
>                 query: {
>                     query_string: {
>                         default_field: "description",
>                         query: "foo"
>                     }
>                 }
>             },
>             aggregations: { 
>                 filteredCountries: { 
>                     terms: { field: "country" }
>                 }
>             }
>         },
>         totalCountries: {
>             terms: { field: "countries" }
>         }
>     },
>     size: 0
> }
>
> Let's say the totalCountries buckets are:
>
>     "buckets": [
>         {
>             "key": "USA",
>             "doc_count": 100
>         },
>         {
>             "key": "UK",
>             "doc_count": 50
>         }
>     ]
>
>
> and the filteredCountries buckets are: 
>
>     "buckets": [
>         {
>             "key": "USA",
>             "doc_count": 10
>         },
>         {
>             "key": "UK",
>             "doc_count": 25
>         }
>     ]
>
>
> Is there a way to get a response that returns filteredCountries as 
> percentages of totalCountries? I.e. something like:
>
> [
>     {
>         "key": "USA",
>         "percent": 10
>     },
>     {
>         "key": "UK",
>         "percent": 50
>     }
> ]
>
> Thanks!
>
