[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376893#comment-15376893
 ] 

Shalin Shekhar Mangar commented on SOLR-9290:
---------------------------------------------

Hoss has covered most of this, but I have a few comments (note that I'm 
responding to multiple people and comments here):

bq. Why not backport that and avoid the problem entirely? Is it a different 
client version in master or something that makes it not that easy?

We could backport SOLR-4509 to 6.x and deal with the incompatible changes, but 
I would certainly not backport it to 5.x: it is a huge change and I am not 
comfortable releasing that in a minor bug-fix release. I am sure many people 
running 5.x releases would also like a fix for this issue. Adding an idle 
eviction thread is trivial and unlikely to cause any regressions.

{quote}
Shalin Shekhar Mangar: why not just re-use the IdleConnectionEvictor class 
provided by httpcomponents (getting the exact same underlying impl as what 
master gets from HttpClientBuilder.evictIdleConnections) ?
https://hc.apache.org/httpcomponents-client-4.4.x/httpclient/apidocs/org/apache/http/impl/client/IdleConnectionEvictor.html
{quote}

I wasn't aware of this class. Looking deeper, I see that it requires an 
HttpClientConnectionManager instance, whereas the 6.x and 5.x code uses the 
deprecated PoolingClientConnectionManager, which implements 
ClientConnectionManager, so the class cannot be used as-is. But now that we 
know it exists, I can just borrow it from the httpclient project instead of 
writing my own evictor. It is ASLv2-licensed anyway.
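
To make the idea concrete, here is a rough sketch of what such a borrowed 
evictor could look like. It is not the httpclient class and not the patch 
itself. EvictableConnectionPool and IdleEvictorSketch are hypothetical names 
made up for this illustration; the only assumption about httpclient is that 
both ClientConnectionManager and HttpClientConnectionManager expose 
closeExpiredConnections() and closeIdleConnections(long, TimeUnit), which is 
what the stand-in interface mirrors.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the two methods shared by httpclient's
// ClientConnectionManager and HttpClientConnectionManager.
interface EvictableConnectionPool {
    void closeExpiredConnections();
    void closeIdleConnections(long idleTime, TimeUnit unit);
}

// Illustrative evictor: a daemon thread that periodically asks the pool
// to drop expired connections and connections idle for too long.
final class IdleEvictorSketch {
    private final EvictableConnectionPool pool;
    private final long sleepMillis;
    private final long maxIdleMillis;
    private final Thread thread;

    IdleEvictorSketch(EvictableConnectionPool pool, long sleepMillis, long maxIdleMillis) {
        this.pool = pool;
        this.sleepMillis = sleepMillis;
        this.maxIdleMillis = maxIdleMillis;
        this.thread = new Thread(this::run, "idle-connection-evictor");
        this.thread.setDaemon(true); // must not keep the JVM alive
    }

    void start() {
        thread.start();
    }

    // Interrupting the thread ends the loop, including during sleep.
    void shutdown() {
        thread.interrupt();
    }

    void awaitTermination(long timeout, TimeUnit unit) throws InterruptedException {
        thread.join(unit.toMillis(timeout));
    }

    boolean isRunning() {
        return thread.isAlive();
    }

    private void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Thread.sleep(sleepMillis);
                pool.closeExpiredConnections();
                if (maxIdleMillis > 0) {
                    pool.closeIdleConnections(maxIdleMillis, TimeUnit.MILLISECONDS);
                }
            }
        } catch (InterruptedException e) {
            // Shutdown was requested; let the thread exit.
        }
    }
}
```

In real code the pool argument would be a thin adapter around the existing 
PoolingClientConnectionManager, and the sleep and idle intervals above are 
placeholders rather than recommended values.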

bq. Somebody sanity check my understanding / summary description of the root 
issue...

That sounds about right to me, Hoss. Thanks for the summary!

bq. For reasons I don't understand, 'idle' connections are more likely to 
(exist? | be kept around indefinitely?) when the intra-node communication is 
over SSL.

Perhaps the SSL setup/teardown overhead adds some latency such that concurrent 
requests end up opening more connections overall? I am just guessing here.

bq. Which begs the question: why are there 15 CLOSE_WAIT connections that last 
forever on branch_6x even with this patch?

As Shai said, this is likely the HttpShardHandler's pool. The overseer 
collection processor invokes a core admin create for each replica in parallel, 
so you get 15 connections for the 15 replicas that were created by the 
collection API.

I'm working on a new patch for branch_6x that also incorporates Shai's 
comments. We can then backport it to 5.x.

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-9290
>                 URL: https://issues.apache.org/jira/browse/SOLR-9290
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 5.5.1, 5.5.2
>            Reporter: Anshum Gupta
>            Priority: Critical
>         Attachments: SOLR-9290-debug.patch, SOLR-9290-debug.patch, index.sh, 
> setup-solr.sh, setup-solr.sh
>
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1 but from my conversation with Shalin, he knows of 
> users with 5.3.1 running into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%[email protected]%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
