[ https://issues.apache.org/jira/browse/SOLR-11881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472501#comment-16472501 ]
Mark Miller commented on SOLR-11881: ------------------------------------ Yeah, it's versioned, that is being counted on in SOLR-12305 as well. My bigger concern with retires would be stuff we didn't think - I know it's much tricker than the other distributed commands in terms of what ramifications changes have. bq. and then call the {{cmdDistrib.blockAndDoRetries();}} Why don't we call that first? Isn't this just the same case as a commit? Commit has to blockAndDoRetries first, to make sure it applies to all previous updates. > Connection Reset Causing LIR > ---------------------------- > > Key: SOLR-11881 > URL: https://issues.apache.org/jira/browse/SOLR-11881 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Reporter: Varun Thacker > Assignee: Varun Thacker > Priority: Major > Attachments: SOLR-11881-SolrCmdDistributor.patch, SOLR-11881.patch, > SOLR-11881.patch > > > We can see that a connection reset is causing LIR. > If a leader -> replica update get's a connection like this the leader will > initiate LIR > {code:java} > 2018-01-08 17:39:16.980 ERROR (qtp1010030701-468988) [c:collection s:shardX > r:core_node56 collection_shardX_replicaY] > o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on > replica https://host08.domain:8985/solr/collection_shardX_replicaY/ > java.net.SocketException: Connection reset > at java.net.SocketInputStream.read(SocketInputStream.java:210) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) > at sun.security.ssl.InputRecord.read(InputRecord.java:503) > at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973) > at > sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375) > at > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403) > at > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387) > at > org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:543) > at > org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:409) > at > org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177) > at > org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304) > at > org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611) > at > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446) > at > org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882) > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) > at > org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:312) > at > org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > From https://issues.apache.org/jira/browse/SOLR-6931 Mark says "On a heavy > working SolrCloud cluster, even a rare response like this from a replica can > cause a recovery and heavy cluster disruption" . > Looking at SOLR-6931 we added a http retry handler but we only retry on GET > requests. Updates are POST requests > {{ConcurrentUpdateSolrClient#sendUpdateStream}} > Update requests between the leader and replica should be retry-able since > they have been versioned. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org