Terms aggregation scripts running slower than expected

2014-04-09 Thread Thomas S.
Hi,

I am currently exploring the option of using scripts with aggregations, and 
I noticed that, for some reason, scripts in terms aggregations execute much 
slower than in other aggregations, even if the script doesn't access any 
fields at all. This also happens with native Java scripts. I'm running 
Elasticsearch 1.1.0.

For example, on my data set the simple script "1" takes around 400ms with 
the sum and histogram aggregations, but around 25s with a terms aggregation, 
even on repeated runs. What is going on here? Terms aggregations without a 
script are very fast, and histogram/sum aggregations with scripts that access 
the document are also very fast. As a workaround, I had to turn what should 
have been a terms aggregation into a histogram aggregation and convert the 
numeric bucket keys back into terms on the client so that the aggregation 
would run in reasonable time (see the sketch below).
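
For reference, a minimal sketch of that client-side workaround, assuming the 
standard elasticsearch-py client rather than the app.search wrapper shown in 
the session below; the host, index name, and lookup table are illustrative:

# Client-side workaround sketch (hypothetical): run the script through a
# histogram aggregation, which is fast, then map the numeric bucket keys
# back to the terms they stand for on the client side.
from elasticsearch import Elasticsearch

es = Elasticsearch(['node1:9200'])      # hypothetical host
code_to_term = {1: 'example-term'}      # illustrative lookup table

response = es.search(index='myindex', body={   # 'myindex' is illustrative
    'size': 0,
    'query': {'match_all': {}},
    'aggregations': {
        'by_code': {
            'histogram': {
                'script': '1',   # placeholder script, as in the session below
                'interval': 1,
            }
        }
    }
})

# Convert the numeric histogram keys back into the terms we actually wanted.
buckets = response['aggregations']['by_code']['buckets']
term_counts = {code_to_term.get(b['key'], b['key']): b['doc_count']
               for b in buckets}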


In [2]: app.search.search({'size': 0, 'query': { 'match_all': {} }, 
'aggregations': { 'test_script': { 'terms': { 'script': '1' } } }})
Out[2]:
{u'_shards': {u'failed': 0, u'successful': 246, u'total': 246},
 u'aggregations': {u'test_script': {u'buckets': [{u'doc_count': 4231327,
 u'key': u'1'}]}},
 u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327},
 u'timed_out': False,
 u'took': 24986}


In [10]: app.search.search({'size': 0, 'query': { 'match_all': {} }, 
'aggregations': { 'test_script': { 'sum': { 'script': '1' } } }})
Out[10]:
{u'_shards': {u'failed': 0, u'successful': 246, u'total': 246},
 u'aggregations': {u'test_script': {u'value': 4231327.0}},
 u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327},
 u'timed_out': False,
 u'took': 363}


In [8]: app.search.search({'size': 0, 'query': { 'match_all': {} }, 
'aggregations': { 'test_script': { 'histogram': { 'script': '1', 
'interval': 1 } } }})
Out[8]:
{u'_shards': {u'failed': 0, u'successful': 246, u'total': 246},
 u'aggregations': {u'test_script': {u'buckets': [{u'doc_count': 4231327,
 u'key': 1}]}},
 u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327},
 u'timed_out': False,
 u'took': 421}


Thomas



Re: Inconsistent search cluster status and search results after long GC run

2014-03-27 Thread Thomas S.
Forgot to reply to your questions, Binh:

1) No, I haven't set this. However, I wonder whether it has any significant 
effect, since swap space is barely used.
2) It seems to happen when the cluster is under high load, but I haven't 
seen any specific pattern so far.
3) No, there isn't. There's a very small Redis instance running on node1, 
but there's nothing else on the nodes with shards (where the GC problem 
happens).

If I were going to disable master on any node that holds shards, I'd have to 
add another dummy node with master: true so that the cluster stays in a good 
state if any one of the nodes goes down.
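
For what it's worth, a minimal diagnostic sketch for the two points above, 
assuming the standard elasticsearch-py client; the host name is illustrative:

# Diagnostic sketch: check whether mlockall is active and which nodes are
# master-eligible (assumed elasticsearch-py client; host is illustrative).
from elasticsearch import Elasticsearch

es = Elasticsearch(['node1:9200'])   # hypothetical host

# bootstrap.mlockall: the process section of the nodes info API reports
# whether the heap is actually locked in memory on each node.
for node_id, info in es.nodes.info(metric='process')['nodes'].items():
    print(info['name'], info['process'].get('mlockall'))

# Master eligibility: the cat nodes API marks the elected master with '*',
# other master-eligible nodes with 'm', and non-eligible nodes with '-'.
print(es.cat.nodes(h='host,name,master'))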


On Thursday, March 27, 2014 4:46:41 PM UTC+1, Binh Ly wrote:
>
> I would probably not master-enable any node that can potentially GC for a 
> couple of seconds. You want your master-eligible nodes to make decisions as 
> quickly as possible.
>
> About your GC situation, I'd find out what the underlying cause is:
>
> 1) Do you have bootstrap.mlockall set to true?
>
> 2) Does it usually trigger while running queries? Or is there a pattern 
> to when it usually triggers?
>
> 3) Is there anything else running on these nodes that would overload and 
> affect normal ES operations?
>



Re: Inconsistent search cluster status and search results after long GC run

2014-03-27 Thread Thomas S.
Thanks Jörg,

I can increase the ping_timeout to 60s for now. However, shouldn't the goal 
be to minimize the time GC runs? Is the node blocked while GC runs, delaying 
any requests sent to it? If so, allowing long GC runs would be very bad.

Regarding the bulk thread pool: I specifically set it to a higher value to 
avoid errors when we perform bulk indexing (we sometimes got errors when the 
queue size was set to 50 and it filled up; I was also going to increase the 
"index" queue since there are sometimes errors there as well). I will try 
keeping the lower limit and giving more heap space to indexing instead, as 
you suggested.
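
For reference, a rough sketch of those two adjustments, assuming the 
elasticsearch-py client; the setting names are my assumption about which 
knobs are meant, and the values are simply the ones mentioned in this thread:

# Settings sketch (assumed elasticsearch-py client; host is illustrative).
from elasticsearch import Elasticsearch

es = Elasticsearch(['node1:9200'])   # hypothetical host

# Thread pool settings were dynamically updatable in the 1.x line, so the
# bulk queue can be lowered back to 50 via the cluster settings API.
es.cluster.put_settings(body={
    'transient': {
        'threadpool.bulk.queue_size': 50,
    }
})

# The fault-detection ping timeout and the indexing buffer are node-level
# settings; they would go into elasticsearch.yml and require a restart:
#
#   discovery.zen.fd.ping_timeout: 60s
#   indices.memory.index_buffer_size: 50%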

Regarding Java 8: we're currently running Java 7 and haven't tweaked any 
GC-specific settings. Do you think it makes sense to switch to Java 8 in 
production already and enable the G1 garbage collector?

Thanks again,
Thomas

On Thursday, March 27, 2014 9:41:10 PM UTC+1, Jörg Prante wrote:
>
> It seems you ran into trouble because you changed some of the default 
> settings, worsening your situation.
>
> Increase ping_timeout from 9s to 60s as a first band-aid; you have GCs 
> running for 35 seconds.
>
> You should reduce the bulk thread pool from 100 to 50; this reduces the 
> memory pressure on the 20% of the heap you allow. Give more heap space to 
> indexing: use 50% instead of 20%.
>
> It would help more to diagnose whether the nodes exceed their capacity for 
> search and index operations. If so, think about adding nodes.
>
> More fine-tuning after adding nodes could include the G1 GC with Java 8, 
> which is designed to minimize GC stalls. This would not solve node capacity 
> problems, though.
>
> Jörg
>
>
> On Thu, Mar 27, 2014 at 4:46 PM, Binh Ly wrote:
>
>> I would probably not master-enable any node that can potentially GC for a 
>> couple of seconds. You want your master-eligible nodes to make decisions as 
>> quickly as possible.
>>
>> About your GC situation, I'd find out what the underlying cause is:
>>
>> 1) Do you have bootstrap.mlockall set to true?
>>
>> 2) Does it usually trigger while running queries? Or is there a pattern 
>> to when it usually triggers?
>>
>> 3) Is there anything else running on these nodes that would overload and 
>> affect normal ES operations?


Inconsistent search cluster status and search results after long GC run

2014-03-27 Thread Thomas S.
Hi,

We have run into a problem multiple times where our search cluster ends up 
in an inconsistent state. We have 3 nodes (all running 1.0.1), where nodes 2 
and 3 hold the data (each holds all the shards, i.e. one replica per shard). 
Sometimes a long GC run happens on one of the nodes (here on node 3), causing 
it to be disconnected because the GC took longer than the timeout (here the 
GC took 35.1s and our timeout is currently 9s):


NODE 1
[2014-03-27 00:55:41,032][WARN ][discovery.zen] [node1] 
received cluster state from 
[[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}] 
which is also master but with an older cluster_state, telling 
[[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}] 
to rejoin the cluster
[2014-03-27 00:55:41,033][WARN ][discovery.zen] [node1] failed 
to send rejoin request to 
[[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}]
org.elasticsearch.transport.SendRequestTransportException: 
[node2][inet[/10.216.32.81:9300]][discovery/zen/rejoin]
at 
org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
at 
org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
at 
org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:556)
at 
org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:308)
at 
org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:134)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: 
[node2][inet[/10.216.32.81:9300]] Node not connected
at 
org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
at 
org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
at 
org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
... 7 more
[2014-03-27 01:54:45,722][WARN ][discovery.zen] [node1] 
received cluster state from 
[[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}] 
which is also master but with an older cluster_state, telling 
[[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}] 
to rejoin the cluster
[2014-03-27 01:54:45,723][WARN ][discovery.zen] [node1] failed 
to send rejoin request to 
[[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}]
org.elasticsearch.transport.SendRequestTransportException: 
[node2][inet[/10.216.32.81:9300]][discovery/zen/rejoin]
at 
org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
at 
org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
at 
org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:556)
at 
org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:308)
at 
org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:134)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: 
[node2][inet[/10.216.32.81:9300]] Node not connected
at 
org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
at 
org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
at 
org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
... 7 more
[2014-03-27 07:19:02,889][WARN ][discovery.zen] [node1] 
received cluster state from 
[[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}] 
which is also master but with an older cluster_state, telling 
[[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}] 
to rejoin the cluster
[2014-03-27 07:19:02,889][WARN ][discovery.zen] [node1] failed 
to send rejoin request to 
[[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}]
org.elasticsearch.transport.SendRequestTransportException: 
[node2][inet[/10.216.32.81:9300]][discovery/zen/rejoin]
at 
org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
at 
org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
  

Re: Delete by query fails often with HTTP 503

2014-03-18 Thread Thomas S.
Thanks Clint,

We have two nodes with 60 shards per node. I will increase the queue size; 
hopefully this will reduce the number of rejections.
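
A minimal sketch of what increasing the queue size could look like, assuming 
the elasticsearch-py client; the host and the new queue size are illustrative:

# Sketch: raise the `index` thread pool queue size. In the 1.x line, thread
# pool settings could be changed at runtime via the cluster settings API.
from elasticsearch import Elasticsearch

es = Elasticsearch(['node1:9200'])   # hypothetical host

es.cluster.put_settings(body={
    'transient': {
        'threadpool.index.queue_size': 500,   # illustrative; 1.x default is 200
    }
})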

Thomas


On Tuesday, March 18, 2014 6:11:27 PM UTC+1, Clinton Gormley wrote:
>
> Do you have lots of shards on just a few nodes? Delete by query is handled 
> by the `index` thread pool, but those threads are shared across all shards 
> on a node. Delete by query can produce a large number of changes, which 
> can fill up the thread pool queue and result in rejections.
>
> You can either (a) just retry, or (b) increase the queue size for the 
> `index` thread pool (which will use more memory, as more delete requests 
> will need to be queued).
>
> See 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html#types
>
> clint
>
>
> On 18 March 2014 08:13, Thomas S. wrote:
>
>> Hi,
>>
>> We often get failures when using the delete by query API. The response is 
>> an HTTP 503 with a body like this:
>>
>> {"_indices": {"myindex": {"_shards": {"successful": 2, "failed": 58, 
>> "total": 60
>>
>> Is there a way to figure out what is causing this error? It seems to 
>> mostly happen when the search cluster is busy.
>>
>> Thomas



Delete by query fails often with HTTP 503

2014-03-18 Thread Thomas S.
Hi,

We often get failures when using the delete by query API. The response is 
an HTTP 503 with a body like this:

{"_indices": {"myindex": {"_shards": {"successful": 2, "failed": 58, 
"total": 60

Is there a way to figure out what is causing this error? It seems to mostly 
happen when the search cluster is busy.
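
One way to check whether the 503s come from a saturated `index` thread pool 
is to watch the per-node rejection counters; a minimal sketch, assuming the 
elasticsearch-py client (host is illustrative):

# Diagnostic sketch: inspect `index` thread pool queue/rejection counters.
from elasticsearch import Elasticsearch

es = Elasticsearch(['node1:9200'])   # hypothetical host

# The cat thread_pool API lists active/queue/rejected counts per node.
print(es.cat.thread_pool(h='host,index.active,index.queue,index.rejected',
                         v=True))

# The same counters are available from the nodes stats API.
stats = es.nodes.stats(metric='thread_pool')
for node_id, node in stats['nodes'].items():
    print(node['name'], node['thread_pool']['index'])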

Thomas
