Thanks Mark!

I've been planning to look into `significant_terms`, but didn't know it
could help me with this. I'm a bit concerned that overly clever scoring
could be hard to explain to users, but I'll give it a shot.
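
In the meantime, the plain percentage I originally asked about can be
computed client-side from the two bucket lists. Below is a minimal sketch in
Python, assuming the aggregation response has already been parsed into lists
of dicts shaped like the buckets quoted further down. The helper name
`percent_by_key` is made up for illustration and is not part of
Elasticsearch or any client library.

```python
def percent_by_key(filtered_buckets, total_buckets):
    """Express each filtered bucket's doc_count as a percentage of the
    doc_count of the total bucket that has the same key."""
    # Index the unfiltered counts by key for direct lookup.
    totals = {b["key"]: b["doc_count"] for b in total_buckets}
    result = []
    for bucket in filtered_buckets:
        total = totals.get(bucket["key"], 0)
        if total == 0:
            # No matching total bucket (or an empty one): skip it
            # rather than divide by zero.
            continue
        result.append({
            "key": bucket["key"],
            "percent": 100.0 * bucket["doc_count"] / total,
        })
    return result


# Bucket values taken from the example in the original question.
total_countries = [
    {"key": "USA", "doc_count": 100},
    {"key": "UK", "doc_count": 50},
]
filtered_countries = [
    {"key": "USA", "doc_count": 10},
    {"key": "UK", "doc_count": 25},
]

percentages = percent_by_key(filtered_countries, total_countries)
```

One caveat with this approach: a terms aggregation only returns the top
buckets, so a key present in the filtered list may be missing from the total
list. The sketch skips such keys.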


On Tue, Feb 17, 2015 at 9:41 AM, Mark Harwood <
mark.harw...@elasticsearch.com> wrote:

> Nice to see someone taking the trouble to put their stats in context.
> Drives me nuts every time I see the equivalent of this:
> http://xkcd.com/1138/
>
> So we have a feature that does some of what you are after - it's called
> the "significant_terms" aggregation.
> Your query would look like this:
> {
>     "query": {
>         "match": {
>             "text": "foo"
>         }
>     },
>     "aggs": {
>         "keywords": {
>             "significant_terms": {
>                 "field": "country",
>                 "size": 100
>             }
>         }
>     }
> }
>
> What you get back are buckets for each country with a doc_count that
> represents how many "foo" documents there were in that country and a
> background count called "bg_count", which is how many docs (foo and
> non-foo) came from that country. Selections are ranked using a score that
> is returned and which is more nuanced than a straight doc_count/bg_count
> percentage. In practice we find that prioritizing selections solely by a
> percentage measure can skew results towards very rare terms (in your case,
> very small countries) that have few data samples and so can more easily
> achieve high-scoring percentages. Instead, we offer a variety of scoring
> heuristics which place a different emphasis on popular vs rare when it
> comes to ranking (see
> https://twitter.com/elasticmark/status/513320986956292096 ).
>
> Cheers
> Mark
>
> On Tuesday, February 17, 2015 at 1:07:31 AM UTC, ja...@holderdeord.no
> wrote:
>>
>> Hi,
>>
>> I'm looking for a way to have Elasticsearch calculate the percentage of
>> docs that match a query *within* a terms aggregation.
>> That is, given two aggregations where one is filtered and the other is
>> not:
>>
>> {
>>     aggregations: {
>>         countries: {
>>             filter: {
>>                 query: {
>>                     query_string: {
>>                         default_field: "description",
>>                         query: "foo"
>>                     }
>>                 }
>>             },
>>             aggregations: {
>>                 filteredCountries: {
>>                     terms: { field: "country" }
>>                 }
>>             }
>>         },
>>         totalCountries: {
>>             terms: { field: "country" }
>>         }
>>     },
>>     size: 0
>> }
>>
>> Let's say the totalCountries buckets are:
>>
>>     "buckets": [
>>         {
>>             "key": "USA",
>>             "doc_count": 100
>>         },
>>         {
>>             "key": "UK",
>>             "doc_count": 50
>>         }
>>     ]
>>
>>
>> and the filteredCountries buckets are:
>>
>>     "buckets": [
>>         {
>>             "key": "USA",
>>             "doc_count": 10
>>         },
>>         {
>>             "key": "UK",
>>             "doc_count": 25
>>         }
>>     ]
>>
>>
>> Is there a way to get a response that returns filteredCountries as
>> percentages of totalCountries? I.e. something like:
>>
>> [
>>     {
>>         "key": "USA",
>>         "percent": 10
>>     },
>>     {
>>         "key": "UK",
>>         "percent": 50
>>     }
>> ]
>>
>> Thanks!
>>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/1ojltqSRdhA/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/5337cd90-a434-4a44-9a81-969e55568389%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/5337cd90-a434-4a44-9a81-969e55568389%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

