[ https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-9290:
----------------------------------------
    Attachment: SOLR-9290-debug.patch

This patch applies to 5.3.2. It adds a monitor thread for the connection pool
created in UpdateShardHandler, and with it applied I can no longer reproduce
the problem.

My hypothesis is as follows. We have large limits for maxConnections and
maxConnectionsPerHost. As long as those limits aren't reached and the servers
are reasonably busy, the pool will keep creating new connections. In 5.x and
6.x, we have no policy for closing idle connections, so HttpClient keeps these
connections in the pool for reuse, where they sit in CLOSE_WAIT. We must
therefore periodically close such connections once they're idle to keep their
number from growing to absurd levels.
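
A minimal sketch of such a monitor in plain Java, assuming a pool that exposes
closeExpiredConnections() and closeIdleConnections(long, TimeUnit) (the
HttpClient 4.x connection managers have methods with these names). The
IdleClosingPool interface, the class name and the sweep period are
hypothetical, chosen for illustration; this is not the code in
SOLR-9290-debug.patch.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical monitor that periodically prunes idle connections from a pool. */
class IdleConnectionMonitor implements AutoCloseable {

    /** Hypothetical stand-in for the two pool methods the monitor calls. */
    interface IdleClosingPool {
        void closeExpiredConnections();
        void closeIdleConnections(long idleTime, TimeUnit unit);
    }

    private final ScheduledExecutorService scheduler;

    IdleConnectionMonitor(IdleClosingPool pool, long periodMillis, long maxIdleMillis) {
        // Daemon thread so a forgotten monitor never keeps the JVM alive.
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-monitor");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                // Drop connections whose keep-alive has expired, then any that
                // have sat unused in the pool longer than maxIdleMillis.
                pool.closeExpiredConnections();
                pool.closeIdleConnections(maxIdleMillis, TimeUnit.MILLISECONDS);
            } catch (RuntimeException e) {
                // Swallow so one failed sweep does not cancel all future sweeps.
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the sweeps; safe to call more than once. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The try/catch inside the task matters: a ScheduledExecutorService suppresses
all later runs of a periodic task once one run throws, which would silently
leave the pool unmonitored.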

Also, I think the reason this wasn't reproducible on master is that
SOLR-4509 enabled eviction of idle connections by calling
HttpClientBuilder#evictIdleConnections with a 50-second limit.
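
The eviction rule itself is simple: a pooled connection that has been idle for
longer than the limit is closed instead of being kept for reuse. Below is a
toy, single-threaded sketch of that rule, with time passed in explicitly so
the behaviour is deterministic. IdleEvictingPool and its methods are
hypothetical and do not reproduce HttpClient's internals; the 50-second limit
is simply the one mentioned above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical toy pool illustrating idle-time-based eviction (not HttpClient's code). */
class IdleEvictingPool<C> {

    /** An idle connection plus the time it was returned to the pool. */
    private static final class Entry<T> {
        final T conn;
        final long idleSinceMillis;

        Entry(T conn, long idleSinceMillis) {
            this.conn = conn;
            this.idleSinceMillis = idleSinceMillis;
        }
    }

    private final Deque<Entry<C>> idle = new ArrayDeque<>();
    private final long maxIdleMillis;
    private final Consumer<C> closer;

    IdleEvictingPool(long maxIdle, TimeUnit unit, Consumer<C> closer) {
        this.maxIdleMillis = unit.toMillis(maxIdle);
        this.closer = closer;
    }

    /** Returns a connection to the pool at time nowMillis; newest goes first. */
    void release(C conn, long nowMillis) {
        idle.addFirst(new Entry<>(conn, nowMillis));
    }

    /** Reuses the most recently released connection, or returns null if none is idle. */
    C lease() {
        Entry<C> e = idle.pollFirst();
        return e == null ? null : e.conn;
    }

    /** Closes every connection idle for longer than maxIdle; returns how many were closed. */
    int evictIdle(long nowMillis) {
        int closed = 0;
        for (Iterator<Entry<C>> it = idle.iterator(); it.hasNext(); ) {
            Entry<C> e = it.next();
            if (nowMillis - e.idleSinceMillis > maxIdleMillis) {
                it.remove();
                closer.accept(e.conn);
                closed++;
            }
        }
        return closed;
    }

    int idleCount() {
        return idle.size();
    }
}
```

For example, with a 50-second limit, a connection released at t=0 is closed by
a sweep at t=60s, while connections released at t=30s and t=45s survive it.
Reusing the most recently released connection first lets rarely used
connections age out instead of being kept alive indefinitely.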

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-9290
>                 URL: https://issues.apache.org/jira/browse/SOLR-9290
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 5.5.1, 5.5.2
>            Reporter: Anshum Gupta
>            Priority: Critical
>         Attachments: SOLR-9290-debug.patch, SOLR-9290-debug.patch, 
> setup-solr.sh
>
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1. However, Shalin tells me he knows of users on 
> 5.3.1 who have run into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) After restarting the Solr process (sometimes both need to be restarted),
> the count drops to 0, but if indexing continues, it climbs again.
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa; the replica is usually put in the DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3c46cc66220a8143dc903fa34e79205...@vp-exc01.dips.local%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
