[jira] [Comment Edited] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection

2017-02-14 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865912#comment-15865912
 ] 

Andrzej Bialecki  edited comment on SOLR-10130 at 2/14/17 3:09 PM:
---

I haven't been able to reproduce such a drastic slowdown using simple 
benchmarks. Example results from indexing with the {{post}} tool, fairly 
representative of several runs on each branch:
{code}
* branch_6_3
real    4m14.804s
user    0m0.883s
sys     0m2.279s

* branch_6_4
real    5m0.987s
user    0m0.910s
sys     0m2.276s

* jira/solr-10130
real    4m38.097s
user    0m0.881s
sys     0m2.287s
{code}
The profiler does show that one of the hotspots on branch_6_4 is the 
{{Meter.mark}} call in 
{{org.apache.solr.core.MetricsDirectoryFactory$MetricsInput.readByte}}. In my 
test it consumed ~3% CPU, which is indeed something that we should avoid and 
turn off by default.
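
To illustrate why marking a meter inside {{readByte}} shows up as a hotspot, here is a minimal sketch. It is not Solr code and not the attached patch: {{SimpleMeter}} and {{MeteredInput}} are hypothetical stand-ins, and a plain counter takes the place of {{Meter}}. It only shows that the per-byte path pays one meter update per byte, while a bulk read pays one update for the whole range.

```java
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical stand-in, not Solr code: compares updating a shared counter on
 * every byte read with updating it once per bulk read.
 */
public class MeteredReadSketch {

    /** Minimal substitute for a metrics meter; tracks events and update calls. */
    static final class SimpleMeter {
        private final LongAdder count = new LongAdder();
        private final LongAdder calls = new LongAdder();

        void mark(long n) {
            count.add(n);
            calls.increment();
        }

        long count() { return count.sum(); }
        long calls() { return calls.sum(); }
    }

    /** Reads from an in-memory buffer and reports bytes read to a meter. */
    static final class MeteredInput {
        private final byte[] data;
        private final SimpleMeter meter;
        private int pos;

        MeteredInput(byte[] data, SimpleMeter meter) {
            this.data = data;
            this.meter = meter;
        }

        /** Per-byte path: one meter update for every byte read. */
        byte readByte() {
            meter.mark(1);
            return data[pos++];
        }

        /** Bulk path: one meter update for the whole range. */
        void readBytes(byte[] dst, int offset, int len) {
            System.arraycopy(data, pos, dst, offset, len);
            pos += len;
            meter.mark(len);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20];

        SimpleMeter perByte = new SimpleMeter();
        MeteredInput a = new MeteredInput(data, perByte);
        for (int i = 0; i < data.length; i++) {
            a.readByte();
        }

        SimpleMeter bulk = new SimpleMeter();
        MeteredInput b = new MeteredInput(data, bulk);
        b.readBytes(new byte[data.length], 0, data.length);

        System.out.println("per-byte meter calls: " + perByte.calls());
        System.out.println("bulk meter calls:     " + bulk.calls());
    }
}
```

Both inputs report the same byte count, but the per-byte reader makes one meter call per byte, which is the overhead that would be avoided by leaving this instrumentation off by default.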

However, this still doesn't explain the order-of-magnitude slowdown reported 
above.
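
For scale, the wall-clock ("real") times listed above work out to a slowdown of about 18% for branch_6_4 and about 9% for jira/solr-10130, both relative to branch_6_3. The sketch below just redoes that arithmetic; the class and method names are made up for illustration.

```java
public class SlowdownSketch {

    /** Parses a `time`-style duration such as "4m14.804s" into seconds. */
    static double parseSeconds(String t) {
        int m = t.indexOf('m');
        double minutes = Double.parseDouble(t.substring(0, m));
        // Drop the trailing 's' before parsing the seconds part.
        double seconds = Double.parseDouble(t.substring(m + 1, t.length() - 1));
        return minutes * 60 + seconds;
    }

    /** Percentage by which `candidate` is slower than `baseline`. */
    static double slowdownPercent(String baseline, String candidate) {
        double base = parseSeconds(baseline);
        return (parseSeconds(candidate) - base) / base * 100.0;
    }

    public static void main(String[] args) {
        String base = "4m14.804s"; // branch_6_3, wall-clock time from the runs above
        System.out.printf("branch_6_4:      %.1f%% slower%n",
                slowdownPercent(base, "5m0.987s"));
        System.out.printf("jira/solr-10130: %.1f%% slower%n",
                slowdownPercent(base, "4m38.097s"));
    }
}
```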

[~emaijala] and [~wunder] - please apply the above patch in your environment 
and see what the impact is. It makes sense to make this change anyway, so I'm 
going to apply this or a similar version to all affected branches, but maybe 
there's more we can do here.


> Serious performance degradation in Solr 6.4.1 due to the new metrics 
> collection
> ---
>
>              Key: SOLR-10130
>              URL: https://issues.apache.org/jira/browse/SOLR-10130
>          Project: Solr
>       Issue Type: Bug
>   Security Level: Public (Default Security Level. Issues are Public)
>       Components: metrics
> Affects Versions: 6.4.1
>      Environment: Centos 7, OpenJDK 1.8.0 update 111
>         Reporter: Ere Maijala
>         Assignee: Andrzej Bialecki
>         Priority: Blocker
>           Labels: perfomance
>      Attachments: SOLR-10130.patch, solr-8983-console-f1.log
>
>
> We've stumbled on serious performance issues after upgrading to Solr 6.4.1. 
> It looks like the new metrics collection system in MetricsDirectoryFactory is 
> causing a major slowdown. This happens with an index configuration that, as 
> far as I can see, has no metrics-specific configuration and uses 
> luceneMatchVersion 5.5.0. In practice a moderate load will completely bog 
> down the server, with Solr threads constantly using up all CPU capacity (600% 
> on a 6-core machine) under a load where we normally see an average load of 
> < 50%.
> I took stack traces (I'll attach them) and noticed that the threads are 
> spending time in com.codahale.metrics.Meter.mark. I tested building Solr 
> 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte 
> and getBytes methods and was unable to reproduce the issue.
> As far as I can see there are several issues:
> 1. Collecting metrics on every single byte read is slow.
> 2. Having it enabled by default is not a good idea.
> 3. The comment "enable coarse-grained metrics by default" at 
> https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104
>  implies that only coarse-grained metrics should be enabled by default, which 
> contradicts collecting metrics on every single byte read.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection

2017-02-14 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865470#comment-15865470
 ] 

Ishan Chattopadhyaya edited comment on SOLR-10130 at 2/14/17 9:39 AM:
--

I ran some benchmarks, with and without this patch.

Benchmarking suite: 
https://github.com/chatman/solr-upgrade-tests/blob/master/BENCHMARKS.md
Environment: packet.net, Type 0 server 
(https://www.packet.net/bare-metal/servers/type-0/)
{code}
6.4.1 Without patch

java -cp target/org.apache.solr.tests.upgradetests-0.0.1-SNAPSHOT-jar-with-dependencies.jar:. \
  org.apache.solr.tests.upgradetests.SimpleBenchmarks \
  -v 72f75b2503fa0aa4f0aff76d439874feb923bb0e \
  -Nodes 1 -Shards 1 -Replicas 1 -numDocs 10 -threads 6 \
  -benchmarkType generalIndexing

Indexing times: 188,190

6.4.1 With patch

java -cp target/org.apache.solr.tests.upgradetests-0.0.1-SNAPSHOT-jar-with-dependencies.jar:. \
  org.apache.solr.tests.upgradetests.SimpleBenchmarks \
  -v 72f75b2503fa0aa4f0aff76d439874feb923bb0e \
  -patchUrl https://issues.apache.org/jira/secure/attachment/12852444/SOLR-10130.patch \
  -Nodes 1 -Shards 1 -Replicas 1 -numDocs 10 -threads 6 \
  -benchmarkType generalIndexing

Indexing times: 171,165
{code}
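
Averaging the two runs in each configuration gives 189 without the patch and 168 with it, a reduction of roughly 11% (the output above does not state the units). A small sketch of that calculation, with illustrative class and method names:

```java
import java.util.Arrays;

public class IndexingTimesSketch {

    /** Arithmetic mean of the given run times. */
    static double mean(double... runs) {
        return Arrays.stream(runs).average().orElse(Double.NaN);
    }

    /** Percentage reduction going from `before` to `after`. */
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        double withoutPatch = mean(188, 190); // "Indexing times" without the patch
        double withPatch = mean(171, 165);    // "Indexing times" with the patch
        System.out.printf("without patch: %.1f, with patch: %.1f, reduction: %.1f%%%n",
                withoutPatch, withPatch, reductionPercent(withoutPatch, withPatch));
    }
}
```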

