RE: If zookeeper is down, SolrCloud nodes will not start correctly, even if zookeeper is started later

2015-10-07 Thread Adrian Liew
Hi Shawn

Thanks for the information. I guess the worst-case scenario is that all 3 ZK 
services are down, which is unlikely. For now, as you said, the viable 
workaround is to start the services manually, in sequence, so that a quorum 
can form. So the proper sequence in a setup of 3 servers, each running both 
ZK and Solr, would be as follows:

Downed situation with one or more ZK services
1. Restart all ZK Services first on all three machines
2. Restart all Solr Services on all three machines

Please confirm whether the above is correct, and I will be happy to take this 
approach and communicate it to my customer.

Many thanks.

Regards,
Adrian 

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Wednesday, October 7, 2015 4:09 PM
To: solr-user@lucene.apache.org
Subject: Re: If zookeeper is down, SolrCloud nodes will not start correctly, 
even if zookeeper is started later

On 10/6/2015 10:22 PM, Adrian Liew wrote:
> Hence, the issue is that when the three machines start up, ZK and Solr 
> start out of sequence, which causes SolrCloud to behave unexpectedly. 
> I note there is a Jira ticket for Solr 4.9 and above that proposes an 
> improvement for this issue. 
> (https://issues.apache.org/jira/browse/SOLR-5129)

That issue is unresolved, so it has not been fixed in any Solr version.

At this time, if you do not have Zookeeper quorum (a majority of your ZK nodes 
fully operational), you will not be able to successfully start SolrCloud nodes. 
The issue has low priority because there is a viable workaround -- ensure that 
ZK has quorum before starting or restarting any Solr node.

Thinking out loud:  Until this issue is fixed, I think this means that a 3-node 
setup where all three nodes use the zookeeper embedded in Solr will require a 
strange startup sequence if none of the nodes are running:

* Start node 1. Solr will not start correctly -- no ZK quorum.
* Start node 2. Solr might start correctly, not sure.
* Start node 3. This should start correctly.
* Restart node 1. With ZK nodes 2 and 3 running, this will work.
* Restart node 2 if it did not start properly the first time.

I really have no idea whether the second node startup will work properly.

Thanks,
Shawn



Re: If zookeeper is down, SolrCloud nodes will not start correctly, even if zookeeper is started later

2015-10-07 Thread Shawn Heisey
On 10/6/2015 10:22 PM, Adrian Liew wrote:
> Hence, the issue is that when the three machines start up, ZK and Solr 
> start out of sequence, which causes SolrCloud to behave unexpectedly. 
> I note there is a Jira ticket for Solr 4.9 and above that proposes an 
> improvement for this issue. 
> (https://issues.apache.org/jira/browse/SOLR-5129) 

That issue is unresolved, so it has not been fixed in any Solr version.

At this time, if you do not have Zookeeper quorum (a majority of your ZK
nodes fully operational), you will not be able to successfully start
SolrCloud nodes.  The issue has low priority because there is a viable
workaround -- ensure that ZK has quorum before starting or restarting
any Solr node.

Thinking out loud:  Until this issue is fixed, I think this means that a
3-node setup where all three nodes use the zookeeper embedded in Solr
will require a strange startup sequence if none of the nodes are running:

* Start node 1. Solr will not start correctly -- no ZK quorum.
* Start node 2. Solr might start correctly, not sure.
* Start node 3. This should start correctly.
* Restart node 1. With ZK nodes 2 and 3 running, this will work.
* Restart node 2 if it did not start properly the first time.

I really have no idea whether the second node startup will work properly.
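
The quorum arithmetic behind the sequence above can be sketched in a few lines of Python. This is only an illustration of the majority rule; the function names are made up for the example and are not part of Solr or ZooKeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of ZK servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def has_quorum(running: int, ensemble_size: int) -> bool:
    """True if enough ZK servers are running for a quorum to be possible."""
    return running >= quorum_size(ensemble_size)


ENSEMBLE = 3  # three ZK servers, as in the setup discussed in this thread
for running in range(1, ENSEMBLE + 1):
    state = "quorum possible" if has_quorum(running, ENSEMBLE) else "no quorum"
    print(f"{running} of {ENSEMBLE} ZK servers up: {state}")
```

With three servers, a quorum becomes possible once the second one is up. The arithmetic says nothing about whether Solr on node 2 waits long enough for the ZK election to finish, which is the part that remains uncertain above.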

Thanks,
Shawn



Re: If zookeeper is down, SolrCloud nodes will not start correctly, even if zookeeper is started later

2015-10-07 Thread Shawn Heisey
On 10/7/2015 3:06 AM, Adrian Liew wrote:
> Thanks for the information. I guess the worst-case scenario is that all 3 
> ZK services are down, which is unlikely. For now, as you said, the viable 
> workaround is to start the services manually, in sequence, so that a quorum 
> can form. So the proper sequence in a setup of 3 servers, each running both 
> ZK and Solr, would be as follows:
> 
> Downed situation with one or more ZK services
> 1. Restart all ZK Services first on all three machines
> 2. Restart all Solr Services on all three machines
> 
> Please confirm whether the above is correct, and I will be happy to take 
> this approach and communicate it to my customer.

If zookeeper is external (not embedded in Solr), your procedure is
correct -- ensure enough ZK nodes are started to reach quorum, then
start Solr.  If using the embedded zookeeper (-DzkRun) then you would
want to follow the procedure I outlined in my last message.
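
For the external-zookeeper case, the "wait for quorum, then start Solr" step could be scripted along these lines. This is a minimal sketch, not a tested operational tool. It assumes ZooKeeper's `srvr` four-letter command is reachable on client port 2181 (newer ZooKeeper releases restrict four-letter commands through a whitelist setting, so check your version), and the function names and retry values are invented for the example.

```python
import socket
import time


def srvr(host: str, port: int = 2181, timeout: float = 2.0) -> str:
    """Send the 'srvr' four-letter command; return the reply, or '' on error."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # server closes the connection after replying
                    break
                chunks.append(data)
            return b"".join(chunks).decode("utf-8", errors="replace")
    except OSError:
        return ""


def is_serving(reply: str) -> bool:
    """True if the reply reports the server as a leader or follower."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip() in ("leader", "follower")
    return False


def wait_for_quorum(hosts, fetch=srvr, retries=30, delay=2.0) -> bool:
    """Poll the ensemble until a majority of servers report they are serving."""
    needed = len(hosts) // 2 + 1
    for _ in range(retries):
        serving = sum(is_serving(fetch(host)) for host in hosts)
        if serving >= needed:
            return True
        time.sleep(delay)
    return False
```

A startup script would call something like `wait_for_quorum(["zk1", "zk2", "zk3"])` (hypothetical host names) and start the Solr service only if it returns True. A server that is running but not yet part of a quorum does not report a `Mode:` line, so it is counted as not serving.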

Thanks,
Shawn