[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866092#comment-15866092 ]
Walter Underwood commented on SOLR-10130:
-----------------------------------------

I have a JMeter-based load script I can share. It replays access logs. I reload the collection to clear caches, run warming queries, then run queries at a controlled rate. After, it calculates percentiles.

This was a test of 6.4.1. Really slow. The errors are usually log lines with queries so long that they are truncated and end up with bad syntax. There is one column per request handler, so these results are for /auto, /mobile, /select, and /srp.

Mon Feb 13 12:01:29 PST 2017 ; INFO testing solr-cloud.test.cheggnet.com:8983
Mon Feb 13 12:01:29 PST 2017 ; INFO testing with 2000 requests/min
Mon Feb 13 12:01:29 PST 2017 ; INFO testing with 240000 requests
Mon Feb 13 12:01:29 PST 2017 : splitting log into cache warming (first 2000 lines) and benchmark for /home/wunder/2016-12-12-peak-questions-traffic-clean.log
Mon Feb 13 12:01:36 PST 2017 : starting cache warming to solr-cloud.test.cheggnet.com:8983
Mon Feb 13 12:24:29 PST 2017 : starting benchmarking to solr-cloud.test.cheggnet.com:8983
Mon Feb 13 12:24:29 PST 2017 : benchmark should run for 120 minutes
Mon Feb 13 12:24:29 PST 2017 : to get a count of requests sent so far, use "wc -l out-32688.jtl"
Mon Feb 13 14:55:01 PST 2017 : WARNING 207 error responses from solr-cloud.test.cheggnet.com
Mon Feb 13 14:55:01 PST 2017 : INFO Removing 207 error responses from JMeter output file before analysis
Mon Feb 13 14:55:01 PST 2017 : analyzing results /home/wunder/search-test/load-test
Mon Feb 13 14:55:04 PST 2017 : 25th percentiles are 3151.0,3389.0,9329.0,5647.0
Mon Feb 13 14:55:04 PST 2017 : medians are 6101.0,10579.0,18692.0,8780.0
Mon Feb 13 14:55:04 PST 2017 : 75th percentiles are 6871.0,12499.0,25000.0,12580.0
Mon Feb 13 14:55:04 PST 2017 : 90th percentiles are 7593.0,13481.0,27623.0,14068.0
Mon Feb 13 14:55:04 PST 2017 : 95th percentiles are 8079.0,14039.0,28566.0,16606.0
Mon Feb 13 14:55:04 PST 2017 : full results are in test.csv

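For readers who want to reproduce the per-handler percentile step described in the comment, here is a minimal sketch. It is not the load script mentioned above (which has not been shared here); the function names, the `(handler, elapsed, ok)` tuple layout, and the choice of the nearest-rank percentile method are all assumptions made for illustration.

```python
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list.

    pct must be an integer in 1..100. Integer arithmetic is used for
    the rank so no floating-point rounding is involved.
    """
    if not sorted_values:
        raise ValueError("no samples")
    # Integer ceiling of pct * n / 100, clamped to at least 1.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def summarize(samples, pcts=(25, 50, 75, 90, 95)):
    """Group response times by request handler and compute percentiles.

    samples: iterable of (handler, elapsed, ok) tuples (hypothetical
    layout). Error responses (ok is False) are dropped before analysis.
    """
    by_handler = defaultdict(list)
    for handler, elapsed, ok in samples:
        if ok:
            by_handler[handler].append(elapsed)

    report = {}
    for handler, times in by_handler.items():
        times.sort()
        report[handler] = {p: percentile(times, p) for p in pcts}
    return report
```

Calling `summarize(samples)` returns one dictionary per handler, keyed by percentile, which maps naturally onto the one-column-per-handler layout of the results above.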
> Serious performance degradation in Solr 6.4.1 due to the new metrics collection
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-10130
>                 URL: https://issues.apache.org/jira/browse/SOLR-10130
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: metrics
>    Affects Versions: 6.4.1
>         Environment: Centos 7, OpenJDK 1.8.0 update 111
>            Reporter: Ere Maijala
>            Assignee: Andrzej Bialecki
>            Priority: Blocker
>              Labels: perfomance
>             Fix For: master (7.0), 6.4.2
>
>         Attachments: SOLR-10130.patch, solr-8983-console-f1.log
>
>
> We've stumbled on serious performance issues after upgrading to Solr 6.4.1.
> Looks like the new metrics collection system in MetricsDirectoryFactory is
> causing a major slowdown. This happens with an index configuration that, as
> far as I can see, has no metrics-specific configuration and uses
> luceneMatchVersion 5.5.0. In practice a moderate load will completely bog
> down the server, with Solr threads constantly using up all CPU capacity
> (600% on a 6-core machine) under a load where we normally see an average
> load of < 50%.
> I took stack traces (I'll attach them) and noticed that the threads are
> spending time in com.codahale.metrics.Meter.mark. I tested building Solr
> 6.4.1 with the metrics collection disabled in the MetricsDirectoryFactory
> getByte and getBytes methods and was unable to reproduce the issue.
> As far as I can see there are several issues:
> 1. Collecting metrics on every single byte read is slow.
> 2. Having it enabled by default is not a good idea.
> 3. The comment "enable coarse-grained metrics by default" at
> https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104
> implies that only coarse-grained metrics should be enabled by default, and
> this contradicts collecting metrics on every single byte read.

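To make the first point in the list above concrete, the following sketch counts how many times a meter is marked when a stream is read one byte at a time versus in bulk. It is only an illustration in Python, not Solr's MetricsDirectoryFactory or the Codahale Meter: `CountingMeter`, `MeteredInput`, and both helper functions are hypothetical names, and the sketch measures call counts rather than CPU time.

```python
import io


class CountingMeter:
    """Stand-in for a metrics meter; records how often mark() is called."""

    def __init__(self):
        self.calls = 0   # number of mark() invocations
        self.total = 0   # sum of all marked amounts

    def mark(self, n=1):
        self.calls += 1
        self.total += n


class MeteredInput:
    """Wraps a byte stream and marks the meter on every successful read."""

    def __init__(self, raw, meter):
        self.raw = raw
        self.meter = meter

    def read_byte(self):
        data = self.raw.read(1)
        if data:
            self.meter.mark(1)          # one mark per byte
        return data

    def read_bytes(self, n):
        data = self.raw.read(n)
        if data:
            self.meter.mark(len(data))  # one mark per bulk read
        return data


def mark_calls_per_byte(payload):
    """Read payload byte by byte; return (mark calls, bytes counted)."""
    meter = CountingMeter()
    src = MeteredInput(io.BytesIO(payload), meter)
    while src.read_byte():
        pass
    return meter.calls, meter.total


def mark_calls_bulk(payload, chunk=1024):
    """Read payload in chunks; return (mark calls, bytes counted)."""
    meter = CountingMeter()
    src = MeteredInput(io.BytesIO(payload), meter)
    while src.read_bytes(chunk):
        pass
    return meter.calls, meter.total
```

Both helpers count the same number of bytes, but the byte-at-a-time path calls `mark` once per byte while the bulk path calls it once per chunk. In a hot read path that difference in call volume is the kind of overhead the report attributes to per-byte metrics.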