[ https://issues.apache.org/jira/browse/SOLR-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734891#comment-13734891 ]
Shawn Heisey commented on SOLR-5129:
------------------------------------
Full report from user on mailing list:
We have 10 Solr4 nodes (5 shards with replication factor 2) and three zookeeper
instances. When we bring the 10 Solr4 nodes [up] while all zookeeper instances
are down, we see this exception in the Solr4 logs (which makes sense):
{noformat}
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
862352 [main-SendThread(d136274-003.dc.gs.com:2181)] WARN org.apache.zookeeper.ClientCnxn – Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
{noformat}
When we bring up all zookeeper instances, we stop getting the above exception;
instead we see this message in the log, and the log stops moving after that:
{noformat}
INFO - 2013-08-09 15:48:41.447; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@203727c5 name:ZooKeeperConnection Watcher:zk1.test.com:2181,zk2.test.com:2181,zk3.test.com:2181 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
998962 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager – Watcher org.apache.solr.common.cloud.ConnectionManager@203727c5 name:ZooKeeperConnection Watcher:zk1.test.com:2181,zk2.test.com:2181,qa-zk3.test.com:2181 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
INFO - 2013-08-09 15:48:41.528; org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status change trigger but we are already closed
999043 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager – Client->ZooKeeper status change trigger but we are already closed
{noformat}
At this point, we cannot load the admin page or query any Solr node until we
restart the entire cloud; after that, everything works. So we must put checks
in place to make sure that N/2 + 1 zookeeper instances (a quorum) are up before
we bring up any Solr node.
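The user's workaround, waiting until N/2 + 1 ZooKeeper instances are up before starting any Solr node, could be scripted roughly as below. This is a minimal sketch under stated assumptions, not part of Solr or ZooKeeper: the class `ZkQuorumCheck` and its methods are hypothetical, the connect string is assumed to have the same `host:port,host:port[/chroot]` form that is passed to Solr as `zkHost`, and "instance is up" is approximated by "its client port accepts a TCP connection".

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical pre-start check, not part of Solr or ZooKeeper.
public class ZkQuorumCheck {

    // Majority of an ensemble of n servers: floor(n / 2) + 1 (integer division).
    static int quorumSize(int ensembleSize) {
        if (ensembleSize < 1) {
            throw new IllegalArgumentException("ensemble size must be >= 1");
        }
        return ensembleSize / 2 + 1;
    }

    // True if host:port accepts a TCP connection within timeoutMs.
    // This only shows the port is open, not that the server has joined a quorum.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Parses "h1:2181,h2:2181[/chroot]" and checks whether a majority answers.
    // Assumption: a server listed without a port uses the default 2181.
    static boolean quorumReachable(String zkHost, int timeoutMs) {
        int slash = zkHost.indexOf('/');
        String hosts = slash >= 0 ? zkHost.substring(0, slash) : zkHost;
        String[] servers = hosts.split(",");
        int up = 0;
        for (String server : servers) {
            String s = server.trim();
            int colon = s.lastIndexOf(':');
            String host = colon >= 0 ? s.substring(0, colon) : s;
            int port = colon >= 0 ? Integer.parseInt(s.substring(colon + 1)) : 2181;
            if (isReachable(host, port, timeoutMs)) {
                up++;
            }
        }
        return up >= quorumSize(servers.length);
    }

    public static void main(String[] args) throws InterruptedException {
        String zkHost = args.length > 0 ? args[0]
                : "zk1.test.com:2181,zk2.test.com:2181,zk3.test.com:2181";
        // Block until a majority of the ensemble answers, then exit normally
        // so a wrapper script can go on to start Solr.
        while (!quorumReachable(zkHost, 2000)) {
            System.err.println("ZooKeeper quorum not reachable yet; retrying in 5s");
            Thread.sleep(5000);
        }
        System.out.println("Quorum reachable; OK to start Solr");
    }
}
```

A wrapper script would run this first and start Solr only after it exits. For the three-instance ensemble in this report, `quorumSize(3)` is 2. An open port only shows that the ZooKeeper process is listening; a stricter check could ask each server for its mode with ZooKeeper's `srvr` four-letter command. Either way, this only avoids triggering the bug; it does not fix it.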
> If zookeeper is down, SolrCloud nodes will not start correctly, even if
> zookeeper is started later
> --------------------------------------------------------------------------------------------------
>
> Key: SOLR-5129
> URL: https://issues.apache.org/jira/browse/SOLR-5129
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.4
> Reporter: Shawn Heisey
> Fix For: 4.5, 5.0
>
>
> Summary of report from user on mailing list:
> If zookeeper is down when you start Solr nodes, they will not function
> correctly, even if you later start zookeeper. While zookeeper is down, the
> log shows connection failures as expected. When zookeeper comes back, the
> log shows:
> INFO - 2013-08-09 15:48:41.528;
> org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status
> change trigger but we are already closed
> At that point, Solr (admin UI and all other functions) does not work, and
> won't work until it is restarted.
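The "we are already closed" message suggests that the watcher discards connection events once it has been marked closed, so the later SyncConnected event never brings the node back. The toy model below illustrates that guard only. It is NOT Solr's `ConnectionManager`: the class `ClosedGuardModel`, its methods, and the assumption that startup closes the client after giving up on ZooKeeper are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT Solr's ConnectionManager: names and the
// "startup closes the client" step are assumptions for illustration only.
public class ClosedGuardModel {
    enum State { DISCONNECTED, SYNC_CONNECTED }

    private boolean closed = false;
    private boolean connected = false;
    private final List<String> log = new ArrayList<>();

    // Assumed: startup gives up waiting for ZooKeeper and closes the client.
    void close() {
        closed = true;
    }

    // Watcher callback: once closed, every later event is dropped.
    void process(State state) {
        if (closed) {
            log.add("Client->ZooKeeper status change trigger but we are already closed");
            return; // nothing re-initializes the node, so it stays unusable
        }
        connected = (state == State.SYNC_CONNECTED);
        log.add("state now " + state);
    }

    boolean isConnected() { return connected; }

    List<String> log() { return log; }

    public static void main(String[] args) {
        ClosedGuardModel m = new ClosedGuardModel();
        m.process(State.DISCONNECTED);   // ZooKeeper down at startup
        m.close();                       // assumed: startup gives up
        m.process(State.SYNC_CONNECTED); // ZooKeeper started later
        System.out.println("connected=" + m.isConnected());
        m.log().forEach(System.out::println);
    }
}
```

In this model, only a fresh instance (the equivalent of restarting Solr) reaches the connected state, which matches the reported symptom.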
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira