[jira] [Created] (SOLR-8843) Missing fallback for NoChildrenForEphemeralsException on ZkController.getLeaderPropsWithFallback for rolling upgrade

2016-03-14 Thread Enrico Hartung (JIRA)
Enrico Hartung created SOLR-8843:


             Summary: Missing fallback for NoChildrenForEphemeralsException on ZkController.getLeaderPropsWithFallback for rolling upgrade
                 Key: SOLR-8843
                 URL: https://issues.apache.org/jira/browse/SOLR-8843
             Project: Solr
          Issue Type: Bug
          Components: SolrCloud
    Affects Versions: 5.4.1, 5.5
            Reporter: Enrico Hartung


During a rolling upgrade from 5.3.2 to 5.4.1 (or 5.5.0), leader election 
fails with the following error (NoChildrenForEphemeralsException):
{code}
ERROR org.apache.solr.cloud.ShardLeaderElectionContext  [c:collection s:shard1 r:core_node1 x:collection_shard1_replica1] – There was a problem trying to register as the leader:org.apache.solr.common.SolrException: Could not register as the leader because creating the ephemeral registration node in ZooKeeper failed
	at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:214)
	at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:406)
	at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:198)
	at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:158)
	at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:59)
	at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:389)
	at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:264)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:232)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zookeeper.KeeperException$NoChildrenForEphemeralsException: KeeperErrorCode = NoChildrenForEphemerals
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:117)
	at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
	at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
	at org.apache.solr.common.cloud.SolrZkClient$11.execute(SolrZkClient.java:570)
	at org.apache.solr.common.cloud.SolrZkClient$11.execute(SolrZkClient.java:567)
	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
	at org.apache.solr.common.cloud.SolrZkClient.multi(SolrZkClient.java:567)
	at org.apache.solr.cloud.ShardLeaderElectionContextBase$1.execute(ElectionContext.java:197)
	at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:50)
	at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:43)
	at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:179)
	... 12 more
{code}

A similar issue was resolved in SOLR-8561, but that fix does not handle the 
*NoChildrenForEphemeralsException* case.
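
ZooKeeper does not allow an ephemeral znode to have children, and it reports NoChildrenForEphemerals when a client tries to create one. One plausible reading of the trace above is that a pre-5.4 node still holds an ephemeral leader node at a path that 5.4+ tries to use as a parent. The sketch below models only that rule with a toy in-memory tree. It does not use the ZooKeeper client, and the class name, the shortened paths, and the assumed old and new layouts are illustrative assumptions rather than anything taken from the Solr source.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory model of a znode tree; NOT the ZooKeeper client API. */
class EphemeralParentSketch {

    /** Hypothetical stand-in for KeeperException.NoChildrenForEphemeralsException. */
    static class NoChildrenForEphemerals extends RuntimeException {
        NoChildrenForEphemerals(String path) {
            super("parent of " + path + " is ephemeral");
        }
    }

    // Maps each existing path to true if that node is ephemeral.
    private final Map<String, Boolean> nodes = new HashMap<>();

    /** Creates a node, enforcing the rule that ephemeral nodes cannot have children. */
    void create(String path, boolean ephemeral) {
        int slash = path.lastIndexOf('/');
        String parent = slash > 0 ? path.substring(0, slash) : null;
        if (parent != null) {
            Boolean parentEphemeral = nodes.get(parent);
            if (parentEphemeral == null) {
                throw new IllegalStateException("no parent node: " + parent);
            }
            if (parentEphemeral) {
                throw new NoChildrenForEphemerals(path);
            }
        }
        nodes.put(path, ephemeral);
    }

    /** Returns true if a new-layout registration collides with an old-layout ephemeral node. */
    static boolean newLayoutRegistrationFails() {
        EphemeralParentSketch zk = new EphemeralParentSketch();
        zk.create("/leaders", false);
        // Assumption: a pre-5.4 leader registers an ephemeral node directly at the shard path.
        zk.create("/leaders/shard1", true);
        try {
            // Assumption: 5.4+ registers one level deeper, under the shard path.
            zk.create("/leaders/shard1/leader", true);
            return false;
        } catch (NoChildrenForEphemerals e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("registration fails: " + newLayoutRegistrationFails());
    }
}
```

If this reading is right, the failure would persist until the old ephemeral node disappears, which is consistent with the error showing up only in a mixed-version cluster.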



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8561) Add fallback to ZkController.getLeaderProps for a mixed 5.4-pre-5.4 deployments

2016-02-08 Thread Enrico Hartung (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136993#comment-15136993
 ] 

Enrico Hartung commented on SOLR-8561:
--

I am not sure whether this is related, but during a rolling upgrade from 5.3.2 
to 5.4.1, leader election still fails with the following error:

{code}
ERROR org.apache.solr.cloud.ShardLeaderElectionContext  [c:collection s:shard1 r:core_node1 x:collection_shard1_replica1] – There was a problem trying to register as the leader:org.apache.solr.common.SolrException: Could not register as the leader because creating the ephemeral registration node in ZooKeeper failed
	at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:214)
	at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:406)
	at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:198)
	at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:158)
	at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:59)
	at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:389)
	at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:264)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:232)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zookeeper.KeeperException$NoChildrenForEphemeralsException: KeeperErrorCode = NoChildrenForEphemerals
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:117)
	at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
	at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
	at org.apache.solr.common.cloud.SolrZkClient$11.execute(SolrZkClient.java:570)
	at org.apache.solr.common.cloud.SolrZkClient$11.execute(SolrZkClient.java:567)
	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
	at org.apache.solr.common.cloud.SolrZkClient.multi(SolrZkClient.java:567)
	at org.apache.solr.cloud.ShardLeaderElectionContextBase$1.execute(ElectionContext.java:197)
	at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:50)
	at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:43)
	at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:179)
	... 12 more
{code}

Should I create a separate ticket for this?

> Add fallback to ZkController.getLeaderProps for a mixed 5.4-pre-5.4 
> deployments
> ---
>
>                 Key: SOLR-8561
>                 URL: https://issues.apache.org/jira/browse/SOLR-8561
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 5.5, 5.4.1
>
>         Attachments: SOLR-8561.patch, SOLR-8561.patch
>
>
> See the last comments in SOLR-7844. That issue changed the structure of the 
> leader path in ZK such that upgrading from pre-5.4 to 5.4 is impossible 
> unless all nodes are taken down. This issue adds fallback logic to look for 
> the leader properties on the old ZK node, as discussed.
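
The fallback described in the quoted issue can be pictured as "read the new location first, then the old one". The following is a minimal sketch of that idea only, not the SOLR-8561 patch. The class and method names are hypothetical, and the `readNode` function stands in for a ZooKeeper read that returns an empty Optional when the node does not exist.

```java
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical illustration of a two-location lookup; not Solr's ZkController. */
class LeaderPropsFallbackSketch {

    /**
     * Reads leader properties from newPath, falling back to oldPath.
     * readNode is an assumed abstraction over a ZK read: empty means "no such node".
     */
    static String readLeaderProps(Function<String, Optional<String>> readNode,
                                  String newPath,
                                  String oldPath) {
        Optional<String> props = readNode.apply(newPath);
        if (!props.isPresent()) {
            // Mixed deployment: the leader may still be registered at the old location.
            props = readNode.apply(oldPath);
        }
        return props.orElseThrow(() ->
            new NoSuchElementException("no leader props at " + newPath + " or " + oldPath));
    }
}
```

A lookup like this covers only reads. The error reported in the comment above occurs while registering as leader, which is a write, so a read-side fallback alone would not be expected to prevent it.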


