Re: Every other query slow

2014-07-09 Thread Jonathan Foy
Note also that the slow instances of the query do not appear in the slow 
query log.  The times I'm quoting are pulled from the response's "took" field.
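
For reference, the slow log only records queries that exceed its configured 
thresholds, so depending on how those are set a 2-3 s query might never be 
logged.  Lowering the thresholds dynamically should make the slow runs show 
up; a sketch, with the setting names as I remember them:

curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index.search.slowlog.threshold.query.warn" : "1s",
  "index.search.slowlog.threshold.fetch.warn" : "500ms"
}'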

Every other query slow

2014-07-08 Thread Jonathan Foy
Hello

I'm trying to get a new ES cluster tuned properly before actually putting it 
into production, and I'm running into some performance issues.

While testing, I noticed that when running the same query multiple times, I 
had alternating fast (~50 ms) and slow (2-3 s) results.  It's the exact same 
query, submitted via curl, and it happens consistently query after query.

curl -XGET localhost:9200/my_index/my_type/_search?routing=1234 -d @fileWithQuery

I literally hit up/enter time after time.  At one point I wandered away for 
~30 minutes after a slow execution, came back, up/enter, it finished in 40 
ms.  The immediate next attempt, 2 seconds.
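
To rule out the request simply alternating between the two copies of each 
shard, the same query can be pinned to one copy with the preference 
parameter; a sketch, assuming I have the values right (_primary and _local 
are the ones I mean):

curl -XGET 'localhost:9200/my_index/my_type/_search?routing=1234&preference=_primary' -d @fileWithQuery
curl -XGET 'localhost:9200/my_index/my_type/_search?routing=1234&preference=_local' -d @fileWithQuery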

I'm running ES 1.1.1 on a two-node cluster.  There are three indexes: three 
shards each for the smaller two and six shards for the larger index, which is 
the one I'm hitting.  I'm using custom routing for all three.  One replica of 
each, so all 12 shards are present on each server.  The index in question is 
~125 GB; the other two are 10 GB and 2 GB, more or less.
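
For what it's worth, shard placement can be confirmed with the cat API 
(assuming it accepts an index filter like this):

curl 'localhost:9200/_cat/shards/my_index?v'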

Summary server info:
ES 1.1.1
2 AWS r3.xlarge instances (30.5 GB each)
One 18 GB heap/G1 GC
One 20 GB heap/default GC

I know the heap is set higher than recommended, but I wouldn't think that'd 
be the current problem.

My first thought was that I was simply hitting one server and then the 
other via round-robin, and I needed to figure out which server was slow.  
However, the stats reported in ElasticHQ indicated that the queries were 
hitting the same server each time (there was no other searching going on 
and limited indexing).  Even when I tried running the search from the other 
server, ElasticHQ still indicated that the queries were running on the one 
server (and the same fast/slow/fast pattern was noticed, though independent 
of the cycle on the other server).  I'm not sure why the other server was 
never being hit, though ElasticHQ DID report about 3x the amount of search 
activity on the server on which the queries were running.  That might be my 
next question.
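
To double-check which node is really executing the searches, independent of 
ElasticHQ, the per-node search counters can be compared before and after a 
run; a sketch, with the stats path as I understand it:

curl 'localhost:9200/_nodes/stats/indices?pretty'

(compare indices.search.query_total for each node between runs)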

There are warmers in place for some of the fields.  Field cache reports no 
evictions and hovers around 4GB, though there ARE a lot of evictions in the 
filter cache.  I think that's probably inevitable given how much variety 
can come through in the searches, though I'm open to advice.
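
If ElasticHQ's cache view is in doubt, the same numbers can be pulled 
directly, and if the filter cache really is churning, the node-level size can 
be raised from its default (10% of heap, if I remember correctly).  A sketch, 
with the metric and setting names as I recall them:

curl 'localhost:9200/_stats/filter_cache,fielddata?pretty'

# elasticsearch.yml
indices.cache.filter.size: 20%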

I've pasted a sample query below.  It's admittedly a bit ugly, because it's 
built dynamically from a large number of search criteria with various levels 
of nesting.  I've tried cleaned-up versions of the same query (removing 
unnecessary filters) with the same results, but I've included it as-is (with 
renamed fields) in case there's something wrong.

Note that while I've been testing and writing this post, I found that 
removing the nested sort and instead sorting on a non-nested field does not 
result in the fast/slow/fast pattern; those runs are all fast.  However, I've 
since tested other queries, including some with no sort/limit at all, and 
found the same pattern.  There is a lot of nesting, and sometimes has_child 
filters.  Executing somewhat different (though admittedly similar) queries 
results in the same pattern across queries, regardless of which is run when.  
Fast/slow/fast.
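
For comparison, the variant that stays fast just swaps the nested sort for a 
plain non-nested one, e.g. (using the id field purely as an example):

"sort" : [ {
  "id" : {
    "order" : "asc"
  }
} ]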

So, any idea as to what is going on here?  The fast queries are completely 
adequate; the slow queries are completely inadequate.  I need to figure this 
out.

Let me know if any other info is needed.  Thanks in advance.

{
  "from" : 0,
  "size" : 50,
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : { }
      },
      "filter" : {
        "and" : {
          "filters" : [ {
            "term" : {
              "accountId" : 1234
            }
          }, {
            "nested" : {
              "filter" : {
                "and" : {
                  "filters" : [ {
                    "nested" : {
                      "filter" : {
                        "and" : {
                          "filters" : [ {
                            "or" : {
                              "filters" : [ {
                                "term" : {
                                  "stage1.stage2.bool1" : true
                                }
                              }, {
                                "term" : {
                                  "stage1.stage2.bool2" : false
                                }
                              } ]
                            }
                          } ]
                        }
                      },
                      "path" : "stage1.stage2"
                    }
                  } ]
                }
              },
              "path" : "stage1"
            }
          } ]
        }
      }
    }
  },
  "fields" : "id",
  "sort" : [ {
    "website.domain.sortable" : {
      "order" : "asc",
      "missing" : "0",
      "nested_path" : "website"
    }
  }, {
    "id" : {
      "order" : "asc"
    }
  } ]
}
