Hello

I'm trying to get a new ES cluster tuned properly before putting it into 
production, and I'm running into some performance issues.

While testing, I noticed that when running the same query multiple times, I 
got alternating fast (~50 ms) and slow (2-3 s) results.  It's the exact 
same query, submitted via curl, and it happens consistently, query after 
query.

curl -XGET 'localhost:9200/my_index/my_type/_search?routing=1234' -d 
@fileWithQuery
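
(For anyone wanting to reproduce the measurement: the response body includes 
a "took" field with the server-side time in milliseconds, so something like 
the following shows the per-request timing at a glance; the &pretty flag is 
just for readability.)

time curl -s -XGET 'localhost:9200/my_index/my_type/_search?routing=1234&pretty' -d @fileWithQuery | grep '"took"'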

I literally hit up/enter time after time.  At one point I wandered away for 
~30 minutes after a slow execution, came back, hit up/enter, and it finished 
in 40 ms.  The immediate next attempt took 2 seconds.

I'm running ES 1.1.1 on a two-node cluster.  There are three indices: three 
shards each for the two smaller ones, and six shards for the larger index, 
which is the one I'm hitting.  I'm using custom routing for all three.  Each 
index has one replica, so all 12 shards are present on each server.  The 
index in question is ~125 GB; the other two are roughly 10 GB and 2 GB.
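
(In case anyone wants to sanity-check the layout: the cat API, available 
since 1.0, shows where every shard copy lives.  With one replica per index, 
each node should list all 12 shards.)

curl 'localhost:9200/_cat/shards?v'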

Summary server info:
ES 1.1.1
2 AWS r3.xlarge instances (30.5 GB RAM each)
One node with an 18 GB heap and G1 GC
One node with a 20 GB heap and the default GC
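
For completeness, the heaps/collectors are set roughly like this; the exact 
mechanism below is from memory, so treat it as a sketch (on 1.x the heap 
comes from ES_HEAP_SIZE and extra JVM flags from JAVA_OPTS, both picked up 
by bin/elasticsearch.in.sh):

# node 1: 18 GB heap, G1 enabled via a JVM flag
export ES_HEAP_SIZE=18g
export JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC"

# node 2: 20 GB heap, stock collector (no extra flags)
export ES_HEAP_SIZE=20g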

I know the heap is set higher than recommended, but I wouldn't think that'd 
be the current problem.

My first thought was that I was simply hitting one server and then the 
other via round-robin, and I needed to figure out which server was slow.  
However, the stats reported in ElasticHQ indicated that the queries were 
hitting the same server each time (there was no other searching going on 
and limited indexing).  Even when I tried running the search from the other 
server, ElasticHQ still indicated that the queries were running on the one 
server (and the same fast/slow/fast pattern was noticed, though independent 
of the cycle on the other server).  I'm not sure why the other server was 
never being hit, though ElasticHQ DID report about 3x the amount of search 
activity on the server on which the queries were running.  That might be my 
next question.
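
One experiment I can still run (assuming I'm reading the preference docs 
correctly) is to pin the search to particular shard copies and time each 
side separately, e.g.:

# only use shard copies on the node receiving the request
curl -XGET 'localhost:9200/my_index/my_type/_search?routing=1234&preference=_local' -d @fileWithQuery

# only use primary shards
curl -XGET 'localhost:9200/my_index/my_type/_search?routing=1234&preference=_primary' -d @fileWithQuery

That should at least rule a consistently slow node in or out.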

There are warmers in place for some of the fields.  The field cache reports 
no evictions and hovers around 4 GB, though there ARE a lot of evictions in the 
filter cache.  I think that's probably inevitable given how much variety 
can come through in the searches, though I'm open to advice.
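
(Those cache figures are as reported by ElasticHQ; the raw per-node numbers 
should also be visible in the node stats API, under the fielddata and 
filter_cache sections, as memory_size and evictions for each node:)

curl 'localhost:9200/_nodes/stats/indices?pretty'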

I've pasted a sample query below.  It's admittedly a bit ugly, because it's 
built dynamically from a large number of search criteria with various 
levels of nesting.  I've tried cleaned-up versions of the same query 
(removing unnecessary filters) with the same results, but I've included it 
as-is (with renamed fields) in case there's something wrong.

Note that while I've been testing and writing this post, I found that 
removing the nested sort and instead sorting on a non-nested field does not 
produce the fast/slow/fast pattern; every run is fast (snippet below).  
However, I've since tested other queries, including some with no sort/limit 
at all, and found the same pattern.  There is a lot of nesting, and 
sometimes has_child filters are involved.  Executing somewhat different 
(though admittedly similar) queries produces the same pattern across 
queries, regardless of which is run when.  Fast/slow/fast.
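
As an illustration, the consistently fast variant just replaces the nested 
sort in the full query below with something non-nested, e.g. only the id 
field:

"sort" : [ {
  "id" : {
    "order" : "asc"
  }
} ]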

So, any idea what's going on here?  The fast queries are completely 
adequate; the slow ones are completely inadequate.  I need to figure this 
out.

Let me know if any other info is needed.  Thanks in advance.

{
  "from" : 0,
  "size" : 50,
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : { }
      },
      "filter" : {
        "and" : {
          "filters" : [ {
            "term" : {
              "accountId" : 1234
            }
          }, {
            "nested" : {
              "filter" : {
                "and" : {
                  "filters" : [ {
                    "nested" : {
                      "filter" : {
                        "and" : {
                          "filters" : [ {
                            "or" : {
                              "filters" : [ {
                                "term" : {
                                  "stage1.stage2.bool1" : true
                                }
                              }, {
                                "term" : {
                                  "stage1.stage2.bool2" : false
                                }
                              } ]
                            }
                          } ]
                        }
                      },
                      "path" : "stage1.stage2"
                    }
                  } ]
                }
              },
              "path" : "stage1"
            }
          } ]
        }
      }
    }
  },
  "fields" : "id",
  "sort" : [ {
    "website.domain.sortable" : {
      "order" : "asc",
      "missing" : "0",
      "nested_path" : "website"
    }
  }, {
    "id" : {
      "order" : "asc"
    }
  } ]
}
