Our production cluster has 16 Solr nodes (Solr 5.2.1) and 5 ZooKeeper nodes (ZooKeeper 3.4.6). We had to restart the Solr nodes, for the first time in 3 months. To our surprise, none of the Solr nodes came up. We can see the Solr process running on the machine, but the Solr Admin console is not reachable. We even tried restarting both the ZooKeeper cluster and the Solr cluster, but the issue remained.
While debugging, I found the following:

1. The exception below in solr.log:

> ERROR - 2016-07-12 07:43:48.988; org.apache.solr.servlet.SolrDispatchFilter; Could not start Solr. Check solr/home property and the logs
> ERROR - 2016-07-12 07:43:49.012; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Could not find collection : cont_coll_2_fr at org.apache.solr.common.cloud.ClusterState.getCollection(ClusterState.java:164)

2. I connected to the ZooKeeper quorum using ZooKeeper's zkCli.sh and found that a few collections that were deleted through the Solr Collections DELETE API still exist in ZooKeeper (ls /collections). The same collections do not exist on the Solr nodes' disks.
3. There are entries for these deleted collections in ZooKeeper's clusterstate.json file as well.
4. There are many entries in the overseer queue (/overseer/queue) and queue-work (/overseer/queue-work).

Based on suggestions I found online, I have tried the following:

1. Stopped all the Solr nodes and removed the unwanted collections (the ones deleted through the Solr Collections DELETE API) from ZooKeeper (/collections) using the rmr command.
2. Removed all the entries from the overseer queue (/overseer/queue) and queue-work (/overseer/queue-work) as well.
3. Restarted ZooKeeper and then Solr.

Even after doing this, the issue remains. Can someone help me resolve this?

Thanks
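In case it helps anyone repeat the comparison described above (collection names in /collections and clusterstate.json versus the collections that should still exist), here is a minimal Python sketch. It does not talk to ZooKeeper; it only compares text pasted from zkCli.sh (`get /clusterstate.json` and `ls /collections`). The function name `stale_collections`, the sample data, and the collection name `cont_coll_1_en` are hypothetical, and it assumes clusterstate.json is a JSON object whose top-level keys are collection names.

```python
import json


def stale_collections(clusterstate_text, zk_collection_nodes, live_collections):
    """Report collection names still referenced in ZooKeeper but not expected to exist.

    clusterstate_text   -- raw JSON text of /clusterstate.json (assumed: top-level
                           keys are collection names)
    zk_collection_nodes -- child names listed under /collections
    live_collections    -- names of the collections that should still exist
    """
    state = json.loads(clusterstate_text) if clusterstate_text.strip() else {}
    live = set(live_collections)
    return {
        "clusterstate": sorted(set(state) - live),
        "collections_znode": sorted(set(zk_collection_nodes) - live),
    }


# Hypothetical sample data; only cont_coll_2_fr comes from the log above.
sample_clusterstate = json.dumps(
    {"cont_coll_1_en": {"shards": {}}, "cont_coll_2_fr": {"shards": {}}}
)
sample_nodes = ["cont_coll_1_en", "cont_coll_2_fr"]
sample_live = ["cont_coll_1_en"]

report = stale_collections(sample_clusterstate, sample_nodes, sample_live)
print(report)
```

Any name that shows up in either list of the report is still referenced somewhere in ZooKeeper, which would be consistent with the "Could not find collection" error if the two places disagree with each other.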