[ https://issues.apache.org/jira/browse/SOLR-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792359#comment-16792359 ]
ASF subversion and git services commented on SOLR-13234:
--------------------------------------------------------

Commit cedff86aaaee70a28bd56372666b88f21381c975 in lucene-solr's branch refs/heads/branch_8x from Shalin Shekhar Mangar
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=cedff86 ]

SOLR-13234: Fix for Turkish locales

When SolrExporterIntegrationTest.jvmMetrics ran on a JVM with the Turkish locale (test seed: 62880F3B9F140C89), the JVM metric for the terminated thread count contained a dotless i, e.g. termınated. This causes the check for matching metrics to fail.

We could normalize the text in this case; however, I think it's better to ensure we have the correct total number of JVM thread metrics rather than looking at Prometheus labels, which may be localized.

This closes #605.

(cherry picked from commit 6d0386c901b9d14c7464c7cf286d4a005eb9c72c)


> Prometheus Metric Exporter Not Threadsafe
> -----------------------------------------
>
>                 Key: SOLR-13234
>                 URL: https://issues.apache.org/jira/browse/SOLR-13234
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: metrics
>    Affects Versions: 7.6, 8.0
>            Reporter: Danyal Prout
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>              Labels: metric-collector
>             Fix For: 8.x, master (9.0)
>
>         Attachments: SOLR-13234-branch_7x.patch
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The Solr Prometheus Exporter collects metrics when it receives an HTTP request
> from Prometheus. Prometheus sends this request on its [scrape
> interval|https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config].
> When the time taken to collect the Solr metrics is greater than the scrape
> interval of the Prometheus server, this results in concurrent metric
> collection occurring in this
> [method|https://github.com/apache/lucene-solr/blob/master/solr/contrib/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/SolrCollector.java#L86].
> This method doesn't appear to be thread safe; for instance, you could have
> concurrent modifications of a
> [map|https://github.com/apache/lucene-solr/blob/master/solr/contrib/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/SolrCollector.java#L119].
> After a while the Solr Exporter process becomes nondeterministic; we've
> observed NPEs and loss of metrics.
> To address this, I'm proposing the following fixes:
> 1. Read/parse the configuration at startup and make it immutable.
> 2. Collect metrics from Solr on an interval which is controlled by the Solr
> Exporter and cache the metric samples to return during Prometheus scraping.
> Metric collection can be expensive (for example, executing arbitrary Solr
> searches), so it's not ideal to allow concurrent metric collection on an
> interval which is not defined by the Solr Exporter.
> There are also a few other performance improvements that we've made while
> fixing this, for example using the ClusterStateProvider instead of sending
> multiple HTTP requests to each Solr node to look up all the cores.
> I'm currently finishing up these changes, which I'll submit as a PR.
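
The second proposed fix in the quoted description (collect on an exporter-controlled interval, serve scrapes from a cache) could be sketched roughly as follows. This is only an illustration, not the code from the PR: the class name CachedMetricsSketch, the use of plain strings as metric samples, and the Supplier standing in for the expensive Solr collection are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical sketch: one background collector, many lock-free readers. */
final class CachedMetricsSketch implements AutoCloseable {
    // Stand-in for the expensive Solr collection; not a real exporter API.
    private final Supplier<List<String>> collector;
    // Scrapes only ever see a complete, immutable snapshot.
    private final AtomicReference<List<String>> latest =
            new AtomicReference<>(Collections.<String>emptyList());
    // A single thread means collections can never overlap each other.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    CachedMetricsSketch(Supplier<List<String>> collector) {
        this.collector = collector;
    }

    /** Start collecting on an interval chosen by the exporter, not by Prometheus. */
    void start(long intervalMillis) {
        scheduler.scheduleWithFixedDelay(this::refresh, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Run one collection and publish the result as an immutable copy. */
    void refresh() {
        try {
            List<String> samples = new ArrayList<>(collector.get());
            latest.set(Collections.unmodifiableList(samples));
        } catch (RuntimeException e) {
            // Keep serving the previous snapshot if a collection fails.
        }
    }

    /** Called per Prometheus scrape: returns the cached snapshot, never collects. */
    List<String> scrape() {
        return latest.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With this shape, a scrape that arrives while a collection is still running gets the previous snapshot instead of starting a second, concurrent collection, so the scrape interval no longer affects how often Solr is queried.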
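
The dotless i mentioned in the commit message at the top can be reproduced in isolation. The sketch below assumes the label is derived by lowercasing a thread state name with the JVM's default locale; the message does not say where the lowercasing actually happens, and the class and method names here are hypothetical. It also shows the Locale.ROOT normalization that the commit message mentions as an alternative but did not adopt.

```java
import java.util.Locale;

/** Hypothetical sketch of locale-sensitive vs. locale-independent lowercasing. */
final class ThreadStateLabelSketch {
    /** Lowercases with an explicit locale, as default-locale code would on that JVM. */
    static String localizedLabel(Thread.State state, Locale locale) {
        return state.name().toLowerCase(locale);
    }

    /** Lowercases with Locale.ROOT, which gives the same result on every JVM. */
    static String stableLabel(Thread.State state) {
        return state.name().toLowerCase(Locale.ROOT);
    }
}
```

Under the Turkish locale, localizedLabel(Thread.State.TERMINATED, Locale.forLanguageTag("tr")) yields "termınated" (with U+0131), while stableLabel(Thread.State.TERMINATED) yields "terminated".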