[ 
https://issues.apache.org/jira/browse/SOLR-6511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157843#comment-14157843
 ] 

Shalin Shekhar Mangar commented on SOLR-6511:
---------------------------------------------

bq. Digging a bit further into the logs, maxTries is set to 1 because 
ensureReplicaInLeaderInitiatedRecovery throws a SessionExpiredException 
(presumably because ZK has noticed the network blip and removed the relevant 
ephemeral node).

It's not just SessionExpiredException. Sometimes it throws a 
ConnectionLossException instead, which should be handled in the same way. I got 
the following stack trace in my testing when a node was partitioned from 
ZooKeeper for a long time:
{code}
7984566 [qtp1600876769-17] ERROR org.apache.solr.update.processor.DistributedUpdateProcessor  – Leader failed to set replica http://n4:8983/solr/collection_5x3_shard4_replica3/ state to DOWN due to: org.apache.solr.common.SolrException: Failed to update data to down for znode: /collections/collection_5x3/leader_initiated_recovery/shard4/core_node10
org.apache.solr.common.SolrException: Failed to update data to down for znode: /collections/collection_5x3/leader_initiated_recovery/shard4/core_node10
        at org.apache.solr.cloud.ZkController.updateLeaderInitiatedRecoveryState(ZkController.java:1959)
        at org.apache.solr.cloud.ZkController.ensureReplicaInLeaderInitiatedRecovery(ZkController.java:1841)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:837)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1679)
        at org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:179)
        at org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:76)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:368)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
        at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /collections/collection_5x3/leader_initiated_recovery/shard4/core_node10
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
        at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:256)
        at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:253)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
        at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:253)
        at org.apache.solr.cloud.ZkController.updateLeaderInitiatedRecoveryState(ZkController.java:1949)
        ... 36 more
{code}
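
To illustrate what "handled in the same way" could look like, here is a rough sketch, not the actual patch. Since the ConnectionLossException in the trace above arrives wrapped in a SolrException, the check walks the cause chain. The class name LirFailureClassifier, the method isZkConnectivityFailure, and the two nested exception classes are all hypothetical; the nested classes are stand-ins for the real org.apache.zookeeper.KeeperException subclasses so that the example compiles without ZooKeeper on the classpath.

```java
public class LirFailureClassifier {

    // Stand-ins (assumption) for KeeperException.SessionExpiredException and
    // KeeperException.ConnectionLossException from the ZooKeeper client.
    public static class SessionExpiredException extends Exception {}
    public static class ConnectionLossException extends Exception {}

    /**
     * Returns true if the throwable, or anything in its cause chain, signals
     * that the ZooKeeper session expired or the connection was lost.
     */
    public static boolean isZkConnectivityFailure(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof SessionExpiredException
                    || cur instanceof ConnectionLossException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Mirrors the shape of the trace: a wrapper exception caused by ConnectionLoss.
        Throwable wrapped = new RuntimeException(
                "Failed to update data to down for znode",
                new ConnectionLossException());
        System.out.println(isZkConnectivityFailure(wrapped));
        System.out.println(isZkConnectivityFailure(new RuntimeException("unrelated")));
    }
}
```

A caller could then take the same branch for either failure, whatever that branch ends up doing in the final patch.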

> Fencepost error in LeaderInitiatedRecoveryThread
> ------------------------------------------------
>
>                 Key: SOLR-6511
>                 URL: https://issues.apache.org/jira/browse/SOLR-6511
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Alan Woodward
>            Assignee: Timothy Potter
>             Fix For: 5.0
>
>         Attachments: SOLR-6511.patch, SOLR-6511.patch
>
>
> At line 106:
> {code}
>     while (continueTrying && ++tries < maxTries) {
> {code}
> should be
> {code}
>     while (continueTrying && ++tries <= maxTries) {
> {code}
> This is only a problem when called from DistributedUpdateProcessor, as it can 
> have maxTries set to 1, which means the loop is never actually run.
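
The off-by-one in the issue description above can be reproduced in isolation. This is a minimal sketch, assuming tries starts at 0 (the description implies this but does not show it); the class FencepostDemo and the method countIterations are hypothetical and only mirror the loop condition quoted from line 106.

```java
public class FencepostDemo {

    /**
     * Counts how many times the loop body runs for a given maxTries.
     * inclusive=false uses the original "<" condition, inclusive=true the proposed "<=".
     */
    public static int countIterations(int maxTries, boolean inclusive) {
        int tries = 0;               // assumed starting value
        int iterations = 0;
        boolean continueTrying = true;
        while (continueTrying
                && (inclusive ? ++tries <= maxTries : ++tries < maxTries)) {
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(countIterations(1, false)); // original condition, maxTries = 1
        System.out.println(countIterations(1, true));  // proposed condition, maxTries = 1
    }
}
```

With maxTries set to 1, the pre-increment makes tries equal to 1 before the first comparison, so the "<" form skips the body entirely while the "<=" form runs it once.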


