[
https://issues.apache.org/jira/browse/SOLR-13953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Erick Erickson resolved SOLR-13953.
-----------------------------------
Fix Version/s: 8.4
Resolution: Fixed
> Prometheus exporter in SolrCloud mode limited to 100 nodes
> ----------------------------------------------------------
>
> Key: SOLR-13953
> URL: https://issues.apache.org/jira/browse/SOLR-13953
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Alex Jablonski
> Assignee: Erick Erickson
> Priority: Major
> Fix For: 8.4
>
> Attachments: SOLR-13953.patch
>
>
> When using the Prometheus Exporter in SolrCloud mode against a cluster with
> more than 100 nodes, only 100 nodes' metrics are collected. For the other
> nodes, we see "Connection pool shut down" errors show up in logs, and the
> metrics from those nodes aren't reported.
> This seems to be tied to the cache implementation in hostClientCache in
> SolrCloudScraper. That cache currently has a fixed maximum size of 100. When
> it approaches that limit begins to evict HttpSolrClients, it closes those
> clients.
> We use the cache to build up a map of base URL to HttpSolrClient. For a >100
> node cluster, the cache will successfully return clients for all nodes,
> sequentially. But once we add the 101st node, the first HttpSolrClient, which
> the cache still holds a reference to, gets closed. When we then try to get
> the metrics using all of the HttpSolrClients returned from the cache, the
> ones that have been closed throw IllegalStateExceptions with message
> "Connection pool shut down".
>
> Original email thread here:
> [http://mail-archives.apache.org/mod_mbox/lucene-dev/201911.mbox/%3CCAOz296DSV-tt7rWBirBZ%2BP4%3DvT5g29FZrR_2zHrHF084Xq%2Bgyw%40mail.gmail.com%3E]
> Github PR here: [https://github.com/apache/lucene-solr/pull/1022]
>
> Example stacktrace:
>
> {code:java}
> WARN - 2019-11-15 21:21:19.584; org.apache.solr.prometheus.scraper.Async;
> Error occurred during metrics collection =>
> java.util.concurrent.ExecutionException: java.lang.IllegalStateException:
> Connection pool shut down
> at
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
> [?:?]
> at
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654)
> [?:?]
> at
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) [?:?]
> at
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
> [?:?]
> at
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
> [?:?]
> at
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
> [?:?]
> at
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) [?:?]
> at
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) [?:?]
> at
> org.apache.solr.prometheus.scraper.Async.lambda$waitForAllSuccessfulResponses$3(Async.java:43)
> [solr-prometheus-exporter-7.7.2.jar:7.7.2
> d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:41]
> at
> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
> [?:?]
> at
> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
> [?:?]
> at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> [?:?]
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705)
> [?:?]
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> [solr-solrj-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 -
> janhoy - 2019-05-28 23:37:52]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> [?:?]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> [?:?]
> at java.lang.Thread.run(Thread.java:834) [?:?]
> Caused by: java.lang.IllegalStateException: Connection pool shut down
> at org.apache.http.util.Asserts.check(Asserts.java:34)
> ~[httpcore-4.4.10.jar:4.4.10]
> at
> org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:191)
> ~[httpcore-4.4.10.jar:4.4.10]
> at
> org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:267)
> ~[httpclient-4.5.6.jar:4.5.6]
> at
> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176)
> ~[httpclient-4.5.6.jar:4.5.6]
> at
> org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
> ~[httpclient-4.5.6.jar:4.5.6]
> at
> org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
> ~[httpclient-4.5.6.jar:4.5.6]
> at
> org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
> ~[httpclient-4.5.6.jar:4.5.6]
> at
> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
> ~[httpclient-4.5.6.jar:4.5.6]
> at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
> ~[httpclient-4.5.6.jar:4.5.6]
> at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
> ~[httpclient-4.5.6.jar:4.5.6]
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:542)
> ~[solr-solrj-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 -
> janhoy - 2019-05-28 23:37:52]
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
> ~[solr-solrj-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 -
> janhoy - 2019-05-28 23:37:52]
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
> ~[solr-solrj-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 -
> janhoy - 2019-05-28 23:37:52]
> at
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260)
> ~[solr-solrj-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 -
> janhoy - 2019-05-28 23:37:52]
> at
> org.apache.solr.prometheus.scraper.SolrScraper.request(SolrScraper.java:102)
> ~[solr-prometheus-exporter-7.7.2.jar:7.7.2
> d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:41]
> at
> org.apache.solr.prometheus.scraper.SolrCloudScraper.lambda$metricsForAllHosts$6(SolrCloudScraper.java:119)
> ~[solr-prometheus-exporter-7.7.2.jar:7.7.2
> d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:41]
> at
> org.apache.solr.prometheus.scraper.SolrScraper.lambda$null$0(SolrScraper.java:81)
> ~[solr-prometheus-exporter-7.7.2.jar:7.7.2
> d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:41]
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
> ~[?:?]
> ... 4 more
> {code}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]