Re: Every other query slow
Note also that the slow instances of the query do not show up in the slow query log. Also, I'm pulling the referenced times out of the response's "took" field.

On Tuesday, July 8, 2014 11:09:21 PM UTC-4, Jonathan Foy wrote:
Every other query slow
Hello

I'm trying to get a new ES cluster tuned properly to actually put into production, and I'm running into some performance issues.

While testing, I noticed that when running the same query multiple times, I had alternating fast (~50 ms) and slow (2-3 s) results. It's the exact same query, submitted via curl, and it happens consistently query after query.

curl -XGET localhost:9200/my_index/my_type/_search?routing=1234 -d @fileWithQuery

I literally hit up/enter time after time. At one point I wandered away for ~30 minutes after a slow execution, came back, hit up/enter, and it finished in 40 ms. The immediate next attempt took 2 seconds.

I'm running ES 1.1.1 on a two-node cluster. There are three indexes: three shards each for the smaller two, and six shards for the larger index, which is the one I'm hitting. I'm using custom routing for all three. One replica of each, so all 12 shards are on each server. The index in question is ~125 GB; the other two are 10 GB and 2 GB, more or less.

Summary server info:
ES 1.1.1
2 AWS r3.xlarge instances (30.5 GB each)
One 18 GB heap/G1 GC
One 20 GB heap/default GC

I know the heap is set higher than recommended, but I wouldn't think that'd be the current problem.

My first thought was that I was simply hitting one server and then the other via round-robin, and I needed to figure out which server was slow. However, the stats reported in ElasticHQ indicated that the queries were hitting the same server each time (there was no other searching going on and limited indexing). Even when I tried running the search from the other server, ElasticHQ still indicated that the queries were running on the one server (and the same fast/slow/fast pattern appeared, though independent of the cycle on the other server). I'm not sure why the other server was never being hit, though ElasticHQ DID report about 3x the amount of search activity on the server on which the queries were running. That might be my next question.
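To record the pattern more systematically than up/enter in a shell, this is roughly the harness I mean — a sketch, assuming ES on localhost:9200 and the query body in fileWithQuery (both from the curl command above), and using only the Python standard library. The "took" field is the server-side execution time in milliseconds, so it excludes client/network noise.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200/my_index/my_type/_search?routing=1234"

def extract_took(body: bytes) -> int:
    """Pull the server-side 'took' time (ms) out of a search response."""
    return json.loads(body)["took"]

def is_bimodal(timings_ms, gap_factor=10):
    """Crude check: do the timings split into a fast and a slow cluster
    separated by at least gap_factor (e.g. ~50 ms vs 2-3 s)?"""
    return max(timings_ms) >= gap_factor * min(timings_ms)

def run(n=10):
    """Replay the same query n times against a live cluster and report
    the 'took' values. (ES accepts POST for _search, equivalent to the
    curl -XGET with a request body.)"""
    with open("fileWithQuery", "rb") as f:
        query = f.read()
    timings = []
    for _ in range(n):
        req = urllib.request.Request(ES_URL, data=query)
        with urllib.request.urlopen(req) as resp:
            timings.append(extract_took(resp.read()))
    print(timings, "bimodal:", is_bimodal(timings))
```

Calling run() against the cluster prints something like [48, 2100, 51, 1950, ...] when the alternation is happening.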
There are warmers in place for some of the fields. The field cache reports no evictions and hovers around 4 GB, though there ARE a lot of evictions in the filter cache. I think that's probably inevitable given how much variety can come through in the searches, though I'm open to advice.

I've pasted a sample query below. It's admittedly a bit ugly, because it's built dynamically from a large number of search criteria with various levels of nesting. I've tried cleaned-up versions of the same query (removing unnecessary filters) with the same results, but included it as is (with renamed fields) in case there's something wrong.

Note that while I've been testing and writing this post, I found that removing the nested sort and instead sorting on a non-nested field does not produce the fast/slow/fast pattern; all runs are fast. However, I've since tested other queries, including some with no sort/limit at all, and found the same pattern. There is a lot of nesting, and sometimes has_child filters. Executing somewhat different (though admittedly similar) queries results in the same pattern across queries, regardless of which is run when. Fast/slow/fast.

So, any idea as to what is going on here? The fast queries are completely adequate; the slow queries, completely inadequate. I need to figure this out.

Let me know if any other info is needed. Thanks in advance.
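One experiment that would separate "slow shard copy" from "slow query": with one replica, ES round-robins each search between the primary and replica copy of a shard, and the `preference` parameter (supported in ES 1.x) can pin the request to one copy. A sketch of the URLs I'd compare, assuming the same host and index as the curl command above:

```python
from urllib.parse import urlencode

BASE = "http://localhost:9200/my_index/my_type/_search"

def search_url(routing="1234", preference=None):
    """Build the search URL, optionally pinning shard-copy selection."""
    params = {"routing": routing}
    if preference is not None:
        # "_primary" / "_replica" pin the copy explicitly; any fixed
        # custom string makes ES reuse the same copy on every request.
        params["preference"] = preference
    return BASE + "?" + urlencode(params)

# Compare repeated runs of:
#   search_url(preference="_primary")
#   search_url(preference="_replica")
```

If one preference is consistently fast and the other consistently slow, the problem is one shard copy (e.g. its data not being in the OS page cache on that node) rather than the query itself.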
{
  "from" : 0,
  "size" : 50,
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : { }
      },
      "filter" : {
        "and" : {
          "filters" : [ {
            "term" : {
              "accountId" : 1234
            }
          }, {
            "nested" : {
              "filter" : {
                "and" : {
                  "filters" : [ {
                    "nested" : {
                      "filter" : {
                        "and" : {
                          "filters" : [ {
                            "or" : {
                              "filters" : [ {
                                "term" : {
                                  "stage1.stage2.bool1" : true
                                }
                              }, {
                                "term" : {
                                  "stage1.stage2.bool2" : false
                                }
                              } ]
                            }
                          } ]
                        }
                      },
                      "path" : "stage1.stage2"
                    }
                  } ]
                }
              },
              "path" : "stage1"
            }
          } ]
        }
      }
    }
  },
  "fields" : "id",
  "sort" : [ {
    "website.domain.sortable" : {
      "order" : "asc",
      "missing" : "0",
      "nested_path" : "website"
    }
  }, {
    "id" : {
      "order" : "asc"
    }
  } ]
}