[jira] [Commented] (SOLR-5129) If zookeeper is down, SolrCloud nodes will not start correctly, even if zookeeper is started later

Shawn Heisey (JIRA) Fri, 09 Aug 2013 08:23:08 -0700

    [ https://issues.apache.org/jira/browse/SOLR-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734891#comment-13734891 ]


Shawn Heisey commented on SOLR-5129:
------------------------------------

Full report from user on mailing list:

We have 10 Solr4 nodes (5 shards with a replication factor of 2) and three 
ZooKeeper instances. When we bring up the 10 Solr4 nodes while all ZooKeeper 
instances are down, we see this exception in the Solr4 logs, which makes sense:

{noformat}
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
862352 [main-SendThread(d136274-003.dc.gs.com:2181)] WARN  org.apache.zookeeper.ClientCnxn  ? Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
{noformat}

When we bring up all the ZooKeeper instances, the exception above stops 
appearing. We then see these messages in the log, after which the log stops 
moving:

{noformat}
INFO  - 2013-08-09 15:48:41.447; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@203727c5 name:ZooKeeperConnection Watcher:zk1.test.com:2181,zk2.test.com:2181,zk3.test.com:2181 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
998962 [main-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager  ? Watcher org.apache.solr.common.cloud.ConnectionManager@203727c5 name:ZooKeeperConnection Watcher:zk1.test.com:2181,zk2.test.com:2181,qa-zk3.test.com:2181 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
INFO  - 2013-08-09 15:48:41.528; org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status change trigger but we are already closed
999043 [main-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager  ? Client->ZooKeeper status change trigger but we are already closed
{noformat}

At this point, we cannot reach the admin page or query any Solr node until we 
restart the entire cloud, after which everything works. So we must add checks 
to make sure that N/2 + 1 ZooKeeper instances are up before we bring up any 
Solr nodes.
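
The pre-start check described above could be sketched roughly as follows. This 
is only an illustration, not something from the user's setup: the class name 
ZkQuorumCheck and its methods are hypothetical, and it assumes each ZooKeeper 
server answers the "ruok" four-letter command on its client port (newer 
ZooKeeper releases require that command to be whitelisted). An "imok" reply 
only shows that the server process is running, not that it has joined the 
quorum, so treat the result as an approximation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: counts ZooKeeper servers that answer "ruok" and
// reports whether a majority (N/2 + 1) of the ensemble responded.
final class ZkQuorumCheck {

    /** Smallest majority of an ensemble of N servers: N/2 + 1 (integer division). */
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** True if the number of responding servers is at least a majority. */
    static boolean hasQuorum(int serversUp, int ensembleSize) {
        return serversUp >= quorumSize(ensembleSize);
    }

    /** Sends "ruok" to host:port and returns true only if the reply is "imok". */
    static boolean isServing(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read up to 4 bytes; the server closes the connection after replying.
            InputStream in = socket.getInputStream();
            byte[] reply = new byte[4];
            int total = 0;
            while (total < reply.length) {
                int n = in.read(reply, total, reply.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
            return "imok".equals(new String(reply, 0, total, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            // Connection refused, timeout, etc.: count the server as down.
            return false;
        }
    }

    /** Each argument is one ensemble member given as host:port. */
    public static void main(String[] args) {
        int up = 0;
        for (String arg : args) {
            String[] parts = arg.split(":");
            if (isServing(parts[0], Integer.parseInt(parts[1]), 2000)) {
                up++;
            }
        }
        boolean ok = hasQuorum(up, args.length);
        System.out.println(up + "/" + args.length + " ZooKeeper servers answered; quorum "
                + (ok ? "reached" : "not reached"));
        System.exit(ok ? 0 : 1);
    }
}
```

A startup script could run this with the three ensemble members as arguments 
(for example zk1.test.com:2181 zk2.test.com:2181 zk3.test.com:2181) and start 
the Solr nodes only when it exits with status 0.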

                
> If zookeeper is down, SolrCloud nodes will not start correctly, even if 
> zookeeper is started later
> --------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5129
>                 URL: https://issues.apache.org/jira/browse/SOLR-5129
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.4
>            Reporter: Shawn Heisey
>             Fix For: 4.5, 5.0
>
>
> Summary of report from user on mailing list:
> If zookeeper is down when you start Solr nodes, they will not function 
> correctly, even if you later start zookeeper.  While zookeeper is down, the 
> log shows connection failures as expected.  When zookeeper comes back, the 
> log shows:
> INFO  - 2013-08-09 15:48:41.528; 
> org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status 
> change trigger but we are already closed
> At that point, Solr (admin UI and all other functions) does not work, and 
> won't work until it is restarted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
