[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866092#comment-15866092 ]

Walter Underwood commented on SOLR-10130:
-----------------------------------------

I have a JMeter-based load script I can share. It replays access logs. I reload 
the collection to clear caches, run warming queries, then run queries at a 
controlled rate. After, it calculates percentiles.
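The final percentile step can be sketched roughly as follows. This is a minimal illustration, not the actual load script. It assumes each JMeter sample has already been parsed into a dict with `label` (the request handler), `elapsed` (response time) and `success` fields, and it uses the nearest-rank percentile definition, which may differ from the definition the real analysis uses. The function names are hypothetical.

```python
import math
from collections import defaultdict


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list (None if empty)."""
    if not sorted_values:
        return None
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def handler_percentiles(samples, handlers, pcts=(25, 50, 75, 90, 95)):
    """Return {pct: [value for each handler, in the given order]}.

    Failed samples are dropped first, mirroring "Removing N error
    responses from JMeter output file before analysis".
    """
    by_handler = defaultdict(list)
    for s in samples:
        if s["success"]:
            by_handler[s["label"]].append(float(s["elapsed"]))
    for values in by_handler.values():
        values.sort()
    return {p: [nearest_rank(by_handler[h], p) for h in handlers] for p in pcts}
```

Passing the handlers as `["/auto", "/mobile", "/select", "/srp"]` yields one value per handler for each percentile, the same one-column-per-handler layout as the report lines in the log.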

This was a test of 6.4.1. Really slow. The errors are usually log lines whose 
queries were so long that they got truncated and ended up with bad syntax. There 
is one column per request handler, so these results are for /auto, /mobile, 
/select, and /srp.

Mon Feb 13 12:01:29 PST 2017 ; INFO testing solr-cloud.test.cheggnet.com:8983
Mon Feb 13 12:01:29 PST 2017 ; INFO testing with 2000 requests/min
Mon Feb 13 12:01:29 PST 2017 ; INFO testing with 240000 requests
Mon Feb 13 12:01:29 PST 2017 : splitting log into cache warming (first 2000 lines) and benchmark for /home/wunder/2016-12-12-peak-questions-traffic-clean.log
Mon Feb 13 12:01:36 PST 2017 : starting cache warming to solr-cloud.test.cheggnet.com:8983
Mon Feb 13 12:24:29 PST 2017 : starting benchmarking to solr-cloud.test.cheggnet.com:8983
Mon Feb 13 12:24:29 PST 2017 : benchmark should run for 120 minutes
Mon Feb 13 12:24:29 PST 2017 : to get a count of requests sent so far, use "wc -l out-32688.jtl"
Mon Feb 13 14:55:01 PST 2017 : WARNING 207 error responses from solr-cloud.test.cheggnet.com
Mon Feb 13 14:55:01 PST 2017 : INFO Removing 207 error responses from JMeter output file before analysis
Mon Feb 13 14:55:01 PST 2017 : analyzing results
/home/wunder/search-test/load-test
Mon Feb 13 14:55:04 PST 2017 : 25th percentiles are 3151.0,3389.0,9329.0,5647.0
Mon Feb 13 14:55:04 PST 2017 : medians are 6101.0,10579.0,18692.0,8780.0
Mon Feb 13 14:55:04 PST 2017 : 75th percentiles are 6871.0,12499.0,25000.0,12580.0
Mon Feb 13 14:55:04 PST 2017 : 90th percentiles are 7593.0,13481.0,27623.0,14068.0
Mon Feb 13 14:55:04 PST 2017 : 95th percentiles are 8079.0,14039.0,28566.0,16606.0
Mon Feb 13 14:55:04 PST 2017 : full results are in test.csv
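The planned duration in the log follows from the run parameters: 240000 requests at 2000 requests/min is 120 minutes, or one request every 30 ms. A hypothetical sketch of the split and pacing arithmetic, assuming one query per access-log line (the function names are illustrative and not taken from the actual script):

```python
def split_log(lines, warm_count=2000):
    """Split replayed access-log lines into (warming, benchmark) portions."""
    return lines[:warm_count], lines[warm_count:]


def pacing(total_requests, per_minute):
    """Return (planned run minutes, seconds between requests) at a fixed rate."""
    minutes = total_requests / per_minute
    interval_s = 60.0 / per_minute
    return minutes, interval_s
```

For example, `pacing(240000, 2000)` gives `(120.0, 0.03)`, matching the "benchmark should run for 120 minutes" line.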

> Serious performance degradation in Solr 6.4.1 due to the new metrics collection
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-10130
>                 URL: https://issues.apache.org/jira/browse/SOLR-10130
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: metrics
>    Affects Versions: 6.4.1
>         Environment: Centos 7, OpenJDK 1.8.0 update 111
>            Reporter: Ere Maijala
>            Assignee: Andrzej Bialecki 
>            Priority: Blocker
>              Labels: perfomance
>             Fix For: master (7.0), 6.4.2
>
>         Attachments: SOLR-10130.patch, solr-8983-console-f1.log
>
>
> We've stumbled on serious performance issues after upgrading to Solr 6.4.1. 
> It looks like the new metrics collection system in MetricsDirectoryFactory is 
> causing a major slowdown. This happens with an index configuration that, as 
> far as I can see, has no metrics-specific configuration and uses 
> luceneMatchVersion 5.5.0. In practice, a moderate load will completely bog 
> down the server, with Solr threads constantly using all CPU capacity (600% 
> on a 6-core machine) under a load where we normally see an average load 
> of < 50%.
> I took stack traces (I'll attach them) and noticed that the threads are 
> spending time in com.codahale.metrics.Meter.mark. I tested building Solr 
> 6.4.1 with the metrics collection disabled in the MetricsDirectoryFactory 
> getByte and getBytes methods and was unable to reproduce the issue.
> As far as I can see there are several issues:
> 1. Collecting metrics on every single byte read is slow.
> 2. Having it enabled by default is not a good idea.
> 3. The comment "enable coarse-grained metrics by default" at 
> https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104
> implies that only coarse-grained metrics should be enabled by default, and 
> this contradicts collecting metrics on every single byte read.
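To make issues 1 and 3 concrete, here is a toy model that counts how often a meter's mark method is invoked. It is not Solr's or Lucene's code: `CountingMeter`, `PerByteMeteredInput` and `BulkMeteredInput` are hypothetical names. Metering inside a single-byte read costs one mark call per byte, while metering once per bulk read costs one call per buffer regardless of its size.

```python
class CountingMeter:
    """Stand-in for a metrics meter: records how often mark() is called."""

    def __init__(self):
        self.calls = 0  # number of mark() invocations (the per-call overhead)
        self.count = 0  # total bytes reported

    def mark(self, n=1):
        self.calls += 1
        self.count += n


class PerByteMeteredInput:
    """Fine-grained: every single-byte read marks the meter."""

    def __init__(self, data, meter):
        self.data, self.pos, self.meter = data, 0, meter

    def read_byte(self):
        b = self.data[self.pos]
        self.pos += 1
        self.meter.mark()  # one mark per byte
        return b

    def read_bytes(self, length):
        # Built on read_byte, so a bulk read still marks once per byte.
        return bytes(self.read_byte() for _ in range(length))


class BulkMeteredInput(PerByteMeteredInput):
    """Coarse-grained: only bulk reads are metered, once per call."""

    def read_byte(self):
        b = self.data[self.pos]
        self.pos += 1
        return b  # deliberately unmetered

    def read_bytes(self, length):
        chunk = self.data[self.pos:self.pos + length]
        self.pos += len(chunk)
        self.meter.mark(len(chunk))  # one mark for the whole chunk
        return chunk
```

Reading a 4096-byte buffer through `PerByteMeteredInput` calls `mark` 4096 times, whereas `BulkMeteredInput` calls it once with `n=4096`; both report the same byte count. The trade-off in this sketch is that the coarse-grained variant does not count bytes read one at a time.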



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
