[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870165#comment-15870165 ]

Walter Underwood commented on SOLR-10130:
-----------------------------------------

The slowdown is impressive under heavy query load. Here are two load benchmarks
on a 16-node cluster of c4.8xlarge instances (36 CPUs, 60 GB RAM) with 15.7
million docs, 4 shards, and replication factor 4, using production query logs.
These are very long text queries, up to 40 words. Each benchmark runs for two
or three hours, depending on my patience. Java 8u121, G1 collector.

With 6.4.0 at 1000 requests/minute, the cluster runs out of CPU. Median and
95th percentile response times for an ngram/prefix match are 7.5 and 9.8
seconds. For a word match, they are 11 and 25.4 seconds.

With 6.3.0 at 6000 rpm, the times are 0.4 and 2.7 seconds (ngram/prefix) and
0.7 and 4.3 seconds (word). CPU usage is under 50%.

Short version: 6.4 is 10X slower than 6.3 while handling 1/6 of the load.
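
The ratios behind the "10X" summary can be checked directly from the figures
above. The small Java sketch below (the class and method names are purely
illustrative) divides each 6.4.0 latency by the matching 6.3.0 latency and
converts the two request rates to requests per second. Keep in mind that the
6.4.0 run was at 1000 rpm and the 6.3.0 run at 6000 rpm, so the ratios
understate the difference at equal load.

```java
public class LatencyRatios {
    // Ratio of a 6.4.0 latency to the matching 6.3.0 latency (both in seconds).
    static double ratio(double v640, double v630) {
        return v640 / v630;
    }

    // Convert requests per minute to requests per second.
    static double perSecond(double rpm) {
        return rpm / 60.0;
    }

    public static void main(String[] args) {
        String[] labels = {"ngram/prefix median", "ngram/prefix p95", "word median", "word p95"};
        // Each row: {6.4.0 seconds, 6.3.0 seconds}, taken from the benchmark above.
        double[][] times = {{7.5, 0.4}, {9.8, 2.7}, {11.0, 0.7}, {25.4, 4.3}};
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-20s %.1fx%n", labels[i], ratio(times[i][0], times[i][1]));
        }
        System.out.printf("6.4.0 load: %.1f req/s%n", perSecond(1000));
        System.out.printf("6.3.0 load: %.1f req/s%n", perSecond(6000));
    }
}
```

At the median, 6.4.0 comes out roughly 16x to 19x slower; at the 95th
percentile, roughly 4x to 6x. The two loads are about 16.7 and 100 requests
per second.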

> Serious performance degradation in Solr 6.4.1 due to the new metrics collection
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-10130
>                 URL: https://issues.apache.org/jira/browse/SOLR-10130
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: metrics
>    Affects Versions: 6.4.1
>         Environment: Centos 7, OpenJDK 1.8.0 update 111
>            Reporter: Ere Maijala
>            Assignee: Andrzej Bialecki 
>            Priority: Blocker
>              Labels: performance
>             Fix For: master (7.0), 6.4.2
>
>         Attachments: SOLR-10130.patch, SOLR-10130.patch, solr-8983-console-f1.log
>
>
> We've stumbled on serious performance issues after upgrading to Solr 6.4.1.
> It looks like the new metrics collection system in MetricsDirectoryFactory is
> causing a major slowdown. This happens with an index configuration that, as
> far as I can see, has no metrics-specific configuration and uses
> luceneMatchVersion 5.5.0. In practice, a moderate load completely bogs down
> the server, with Solr threads constantly using all CPU capacity (600% on a
> 6-core machine) under a load where we normally see an average CPU load of
> < 50%.
> I took stack traces (I'll attach them) and noticed that the threads are
> spending time in com.codahale.metrics.Meter.mark. I built Solr 6.4.1 with
> metrics collection disabled in the MetricsDirectoryFactory getByte and
> getBytes methods and could not reproduce the issue.
> As far as I can see there are several issues:
> 1. Collecting metrics on every single byte read is slow.
> 2. Having it enabled by default is not a good idea.
> 3. The comment "enable coarse-grained metrics by default" at
> https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104
> implies that only coarse-grained metrics should be enabled by default, which
> contradicts collecting metrics on every single byte read.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
