Hi Michael,

In general, the more buckets the parent aggregator returns, the more work the top_hits agg nested in it has to do. That said, I haven't come across a case where setting `size` to 50 on the terms agg makes the query take 30 times longer once top_hits is used. To rule this out on your side, can you play around with the `size` option on the terms agg?
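To make that experiment concrete, here is a minimal sketch that only builds the request bodies for a range of terms `size` values, so each one can be sent to the cluster and timed separately. The function names, the list of sizes, and the `source_include=None` switch (which omits the `_source` option from top_hits) are illustrative assumptions; the field and aggregation names are taken from the query in this thread.

```python
import json

# Hypothetical bucket counts to try; adjust to whatever range is useful.
SIZES = [1, 5, 10, 25, 50]


def build_query(terms_size, with_top_hits=True, source_include=("title",)):
    """Build the request body from this thread with a configurable terms size.

    with_top_hits=False gives the plain terms aggregation as a baseline.
    source_include=None leaves the `_source` option out of top_hits entirely.
    """
    terms_agg = {"terms": {"field": "fingerprint", "size": terms_size}}
    if with_top_hits:
        top_hits = {"size": 1}
        if source_include is not None:
            top_hits["_source"] = {"include": list(source_include)}
        terms_agg["aggs"] = {"top_tag_hits": {"top_hits": top_hits}}
    return {"size": 0, "aggs": {"top-fingerprints": terms_agg}}


def build_experiment(sizes=SIZES):
    """Return (size, body) pairs, one request body per terms size."""
    return [(size, build_query(size)) for size in sizes]


if __name__ == "__main__":
    # Print each body as JSON so it can be pasted into a search request.
    for size, body in build_experiment():
        print(size, json.dumps(body))
```

If the time grows roughly in step with `size`, the cost is probably per bucket; if it stays high even at `size` 1, the cause is likely elsewhere.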
Also, perhaps the _source of your documents is relatively large. How does the top_hits agg perform without the `_source` option on the top_hits agg?

Martijn

On 6 January 2015 at 22:29, Michael Irani <irani.mich...@gmail.com> wrote:
> Sure. I simplified the query to keep things focused.
>
> This query takes about 3 seconds to run:
>
> {
>   "size": 0,
>   "aggs": {
>     "top-fingerprints": {
>       "terms": {
>         "field": "fingerprint",
>         "size": 50
>       },
>       "aggs": {
>         "top_tag_hits": {
>           "top_hits": {
>             "size": 1,
>             "_source": {
>               "include": [
>                 "title"
>               ]
>             }
>           }
>         }
>       }
>     }
>   }
> }
>
> This one takes about 80 milliseconds:
>
> {
>   "size": 0,
>   "aggs": {
>     "fingerprints": {
>       "terms": {
>         "field": "fingerprint",
>         "size": 100
>       }
>     }
>   }
> }
>
> The result's a bit too big to paste here. Anything specific about it you want
> me to expose?
>
> Michael.
>
> On Tuesday, January 6, 2015 12:14:55 PM UTC-8, Itamar Syn-Hershko wrote:
>>
>> Can you share the query and example results please?
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>> Freelance Developer & Consultant
>> Author of RavenDB in Action <http://manning.com/synhershko/>
>>
>> On Tue, Jan 6, 2015 at 10:11 PM, Michael Irani <irani....@gmail.com>
>> wrote:
>>
>>> Hello,
>>> I'm working on a corpus of size approximately 10 million documents. The
>>> issue I'm running into right now is that the top scoring documents that
>>> come back from my query are essentially all the same result. I'm trying to
>>> find a way to get back unique results.
>>>
>>> I've looked into modeling the data differently with nested objects or
>>> parent-child relationships, but neither layout seems to fit the bill. The
>>> nested model won't work because some of the documents have too many closely
>>> related objects. On the flip side there are also too many unique documents
>>> for the parent-child relationship to fit.
>>>
>>> I then tried the "top hits aggregation" and it's exactly what I'm
>>> looking for, except the running time of the query is approximately 30x
>>> slower than the query without the aggregation. Are there known performance
>>> issues with "top hits"? Any ideas on what I should use to make these
>>> queries? Here's the aggregation piece:
>>>
>>> "aggs": {
>>>   "top-fingerprints": {
>>>     "terms": {
>>>       "field": "fingerprint",
>>>       "size": 50
>>>     },
>>>     "aggs": {
>>>       "top_tag_hits": {
>>>         "top_hits": {
>>>           "size": 1,
>>>           "_source": {
>>>             "include": [
>>>               "title"
>>>             ]
>>>           }
>>>         }
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>> Thanks,
>>> Michael
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/29fce15c-79b7-4756-b033-93e490204095%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
--
Kind regards,

Martijn van Groningen