We are experiencing slow parent/child queries, even when we run the same query 
a second time, and I wanted to know whether this is simply a limit of this 
feature within Elasticsearch. According to the ES docs 
(http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/parent-child-performance.html), 
parent/child queries can be 5-10x slower and consume a lot of memory. 

My impression has been that, as long as we give ES enough memory for the 
field data cache, subsequent runs of a query would be quicker than the first. 
Instead, we are seeing the following query take ~16 seconds to complete 
every time. 


{
    "from": 0,
    "size": 100,
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "oid": 61
                            }
                        },
                        {
                            "has_child": {
                                "type": "social",
                                "query": {
                                    "bool": {
                                        "should": [
                                            {
                                                "term": {
                                                    "engagement.type": "like"
                                                }
                                            },
                                            {
                                                "term": {
                                                    "content.remote_id": "20697868961_10152270678178962"
                                                }
                                            }
                                        ]
                                    }
                                }
                            }
                        }
                    ]
                }
            }
        }
    },
    "fields": "id",
    "sort": [
        {
            "_score": {}
        },
        {
            "id": {
                "order": "asc"
            }
        }
    ]
}
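
For anyone who wants to reproduce the timing, here is a minimal sketch of how 
the repeated runs could be measured. The helper names `build_query` and 
`time_runs` are hypothetical, and `run_query` is an assumption: it stands in 
for whatever client call sends the body to the cluster, which is not shown here.

```python
import time
from typing import Callable, Dict, List


def build_query(oid: int, engagement_type: str, remote_id: str) -> Dict:
    """Build the has_child search body shown above (filtered-query syntax)."""
    return {
        "from": 0,
        "size": 100,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"oid": oid}},
                            {
                                "has_child": {
                                    "type": "social",
                                    "query": {
                                        "bool": {
                                            "should": [
                                                {"term": {"engagement.type": engagement_type}},
                                                {"term": {"content.remote_id": remote_id}},
                                            ]
                                        }
                                    },
                                }
                            },
                        ]
                    }
                },
            }
        },
        "fields": "id",
        "sort": [{"_score": {}}, {"id": {"order": "asc"}}],
    }


def time_runs(run_query: Callable[[Dict], object], body: Dict, runs: int = 3) -> List[float]:
    """Call run_query(body) `runs` times; return wall-clock seconds per run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query(body)  # hypothetical: sends the body to the cluster
        timings.append(time.perf_counter() - start)
    return timings
```

If caching were helping, the later entries in the returned list should be 
noticeably smaller than the first; in our case they all stay at roughly the 
same value.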


The index we are testing this on has 5 primary shards with 1 replica and 
contains 2.2 million parent documents and 1.1 million child documents.

We are running our two data nodes on r3.2xlarge instances, which have 8 CPUs, 
60GB of RAM, and SSD storage.

Our ES data nodes have 30GB of heap. The field data cache is only consuming 
around 3GB right now, and there are no cache evictions. The field data cache 
is also allowed to grow to 75% of the available heap.
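
To make the headroom concrete, here is a rough back-of-the-envelope sketch 
using the numbers above. The function name `fielddata_headroom` is 
hypothetical, and the calculation assumes the 75% limit applies to the full 
30GB heap.

```python
def fielddata_headroom(heap_gb: float, limit_fraction: float, used_gb: float):
    """Return (cache limit in GB, free GB under the limit, fraction of limit used)."""
    limit_gb = heap_gb * limit_fraction
    return limit_gb, limit_gb - used_gb, used_gb / limit_gb


# Figures from this post: 30GB heap, 75% cache limit, about 3GB in use.
limit_gb, free_gb, used_fraction = fielddata_headroom(30, 0.75, 3)
print(f"limit={limit_gb}GB free={free_gb}GB used={used_fraction:.0%}")
```

With the cache this far below its limit and no evictions, memory pressure on 
the field data cache does not look like the cause of the slow queries.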

I'm looking to understand whether this is a limitation of parent/child, or 
whether there is additional configuration beyond the defaults that would 
help speed these queries up.
