RE: external zookeeper with SolrCloud
Is there a way to find if We have a zookeeper quorum? We can ping individual zookeeper and see if it is running, but it would be nice to ping/query one URL and check if we have a quorum.

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Friday, August 09, 2013 2:15 PM
To: solr-user@lucene.apache.org
Subject: Re: external zookeeper with SolrCloud

On 8/9/2013 11:15 AM, Joshi, Shital wrote:
Same thing happen. It only works with N/2 + 1 zookeeper instances up.

Got it. An update came in on the issue that I filed. This behavior that you're seeing is currently by design.

Because this is expected behavior, I've changed the issue to improvement instead of a bug. I don't know if it is something that will happen, but the request is in.

The workaround is fairly simple -- don't start or restart Solr nodes if you don't have zookeeper quorum.

Thank you for your diligent testing!

Shawn
Re: external zookeeper with SolrCloud
On 8/16/2013 11:58 AM, Joshi, Shital wrote:
Is there a way to find if We have a zookeeper quorum? We can ping individual zookeeper and see if it is running, but it would be nice to ping/query one URL and check if we have a quorum.

This is a really good question, to which I do not have an answer. If your client code is Java, you could probably get this information out of CloudSolrServer, with something like this:

server.getZkStateReader().getZkClient().getSolrZooKeeper().getState();

If the state is CONNECTED everything's probably fine. If anyone who's dealt with Zookeeper happens to know whether this would work, I'd appreciate knowing.

For Solr, it is probably a good idea to expose something via an admin handler with the current zookeeper quorum state.

Thanks, Shawn
Re: external zookeeper with SolrCloud
You might be able to get info from the Zookeeper four letter words.

http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkCommands

Here is a command to get the status for one of our Zookeeper hosts:

$ echo stat | nc zk-web02.test3.cloud.cheggnet.com 2181

wunder

On Aug 16, 2013, at 12:01 PM, Shawn Heisey wrote:

On 8/16/2013 11:58 AM, Joshi, Shital wrote:
Is there a way to find if We have a zookeeper quorum? We can ping individual zookeeper and see if it is running, but it would be nice to ping/query one URL and check if we have a quorum.

This is a really good question, to which I do not have an answer. If your client code is Java, you could probably get this information out of CloudSolrServer, with something like this:

server.getZkStateReader().getZkClient().getSolrZooKeeper().getState();

If the state is CONNECTED everything's probably fine. If anyone who's dealt with Zookeeper happens to know whether this would work, I'd appreciate knowing.

For Solr, it is probably a good idea to expose something via an admin handler with the current zookeeper quorum state.

Thanks, Shawn
RE: external zookeeper with SolrCloud
Good stuff. Here is a more recent version of the same resource, as they have added a few new commands in the recent releases of zookeeper:

http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#sc_zkCommands

From: Walter Underwood wun...@wunderwood.org
Sent: Friday, August 16, 2013 12:48
To: solr-user@lucene.apache.org
Subject: Re: external zookeeper with SolrCloud

You might be able to get info from the Zookeeper four letter words.

http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkCommands

Here is a command to get the status for one of our Zookeeper hosts:

$ echo stat | nc zk-web02.test3.cloud.cheggnet.com 2181

wunder

On Aug 16, 2013, at 12:01 PM, Shawn Heisey wrote:

On 8/16/2013 11:58 AM, Joshi, Shital wrote:
Is there a way to find if We have a zookeeper quorum? We can ping individual zookeeper and see if it is running, but it would be nice to ping/query one URL and check if we have a quorum.

This is a really good question, to which I do not have an answer. If your client code is Java, you could probably get this information out of CloudSolrServer, with something like this:

server.getZkStateReader().getZkClient().getSolrZooKeeper().getState();

If the state is CONNECTED everything's probably fine. If anyone who's dealt with Zookeeper happens to know whether this would work, I'd appreciate knowing.

For Solr, it is probably a good idea to expose something via an admin handler with the current zookeeper quorum state.

Thanks, Shawn
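For anyone who wants to script the `echo stat | nc host 2181` check rather than run it by hand, here is a minimal Python sketch. It assumes the server closes the connection after answering a four letter word, which is how the `nc` pipeline above behaves; the function name `four_letter_word` and the timeout default are made up for illustration.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a ZooKeeper four letter word (e.g. "stat") and return the raw reply.

    Hypothetical helper: reads until the server closes the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # connection closed by the server: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example (host name is a placeholder, not a real server):
#   print(four_letter_word("zk1.example.com", 2181, "stat"))
```

A refused connection raises `OSError`, so a caller can treat that as "this ZooKeeper instance is down" without parsing anything.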
Re: external zookeeper with SolrCloud
On 8/16/2013 11:58 AM, Joshi, Shital wrote: Is there a way to find if We have a zookeeper quorum? We can ping individual zookeeper and see if it is running, but it would be nice to ping/query one URL and check if we have a quorum. I filed an issue on this: https://issues.apache.org/jira/browse/SOLR-5169 Thanks, Shawn
RE: external zookeeper with SolrCloud
The mntr command can give that info if you hit the leader of the zk quorum. E.g. in the example for that command on the link, you can see that it's a 5 member zk ensemble (zk_followers 4) and that all followers are synced (zk_synced_followers 4).

You would obviously need to query for the zk leader before you could get that data. The srvr command can tell you the status of a given zk (leader or follower).

$ echo mntr | nc localhost 2185
zk_version 3.4.0
zk_avg_latency 0
zk_max_latency 0
zk_min_latency 0
zk_packets_received 70
zk_packets_sent 69
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count 4
zk_watch_count 0
zk_ephemerals_count 0
zk_approximate_data_size 27
zk_followers 4 - only exposed by the Leader
zk_synced_followers 4 - only exposed by the Leader
zk_pending_syncs 0 - only exposed by the Leader
zk_open_file_descriptor_count 23 - only available on Unix platforms
zk_max_file_descriptor_count 1024 - only available on Unix platforms

---

Some examples of using the srvr command:

echo srvr | nc fookeeper_follower 2185
Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
Latency min/avg/max: 0/0/45
Received: 1132673
Sent: 1132724
Connections: 4
Outstanding: 0
Zxid: 0x600172e5a
Mode: follower
Node count: 218

echo srvr | nc fookeeper_leader 2181
Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
Latency min/avg/max: 0/0/880
Received: 21976696
Sent: 21988742
Connections: 17
Outstanding: 0
Zxid: 0x600172e66
Mode: leader
Node count: 218

From: Shawn Heisey s...@elyograg.org
Sent: Friday, August 16, 2013 14:13
To: solr-user@lucene.apache.org
Subject: Re: external zookeeper with SolrCloud

On 8/16/2013 11:58 AM, Joshi, Shital wrote:
Is there a way to find if We have a zookeeper quorum? We can ping individual zookeeper and see if it is running, but it would be nice to ping/query one URL and check if we have a quorum.

I filed an issue on this:

https://issues.apache.org/jira/browse/SOLR-5169

Thanks, Shawn
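The reading of the leader's mntr output described above (zk_followers 4, zk_synced_followers 4, so a 5 member ensemble with everyone synced) can be sketched in Python. This is only an illustration: `parse_mntr` and `leader_sees_quorum` are hypothetical names, the ensemble size must be supplied by the caller, and treating "synced followers plus the leader is a majority" as the health criterion is an assumption, not something ZooKeeper or Solr documents as a quorum check.

```python
def parse_mntr(text):
    """Turn mntr output (one "key value" pair per line) into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # key, then the rest of the line
        if len(parts) == 2:
            stats[parts[0]] = parts[1].strip()
    return stats


def leader_sees_quorum(stats, ensemble_size):
    """True if the leader plus its synced followers form a majority (N/2 + 1)."""
    if stats.get("zk_server_state") != "leader":
        raise ValueError("zk_synced_followers is only exposed by the leader")
    synced = int(stats["zk_synced_followers"])
    return synced + 1 >= ensemble_size // 2 + 1


# Abbreviated sample based on the leader output quoted in this message.
SAMPLE = "zk_version\t3.4.0\nzk_server_state\tleader\nzk_followers\t4\nzk_synced_followers\t4\n"

print(leader_sees_quorum(parse_mntr(SAMPLE), ensemble_size=5))
```

Since the follower counters are only exposed by the leader, the function refuses to guess when handed a follower's output.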
RE: external zookeeper with SolrCloud
Sorry, it looks like you can get the follower/leader status for each node using just the mntr command; note the zk_server_state value:

echo mntr | nc fookeeper_follower 2181
zk_version 3.4.5-1392090, built on 09/30/2012 17:52 GMT
zk_avg_latency 0
zk_max_latency 45
zk_min_latency 0
zk_packets_received 1132824
zk_packets_sent 1132875
zk_num_alive_connections 4
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count 218
zk_watch_count 12
zk_ephemerals_count 85
zk_approximate_data_size 546670
zk_open_file_descriptor_count 35
zk_max_file_descriptor_count 4096

From: Boogie Shafer boo...@ebrary.com
Sent: Friday, August 16, 2013 14:26
To: solr-user@lucene.apache.org
Subject: RE: external zookeeper with SolrCloud

the mntr command can give that info if you hit the leader of the zk quorum e.g. in the example for that command on the link you can see that its a 5 member zk ensemble (zk_followers 4) and that all followers are synced (zk_synced_followers 4) you would obviously need to query for the zk leader before you could get that data.
the srvr command can tell you the status of a given zk (leader or follower) $ echo mntr | nc localhost 2185 zk_version 3.4.0 zk_avg_latency 0 zk_max_latency 0 zk_min_latency 0 zk_packets_received 70 zk_packets_sent 69 zk_outstanding_requests 0 zk_server_state leader zk_znode_count 4 zk_watch_count 0 zk_ephemerals_count 0 zk_approximate_data_size27 zk_followers4 - only exposed by the Leader zk_synced_followers 4 - only exposed by the Leader zk_pending_syncs0 - only exposed by the Leader zk_open_file_descriptor_count 23- only available on Unix platforms zk_max_file_descriptor_count 1024 - only available on Unix platforms --- some examples of using srvr command echo srvr | nc fookeeper_follower 2185 Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT Latency min/avg/max: 0/0/45 Received: 1132673 Sent: 1132724 Connections: 4 Outstanding: 0 Zxid: 0x600172e5a Mode: follower Node count: 218 echo srvr | nc fookeeper_leader 2181 Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT Latency min/avg/max: 0/0/880 Received: 21976696 Sent: 21988742 Connections: 17 Outstanding: 0 Zxid: 0x600172e66 Mode: leader Node count: 218 From: Shawn Heisey s...@elyograg.org Sent: Friday, August 16, 2013 14:13 To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud On 8/16/2013 11:58 AM, Joshi, Shital wrote: Is there a way to find if We have a zookeeper quorum? We can ping individual zookeeper and see if it is running, but it would be nice to ping/query one URL and check if we have a quorum. I filed an issue on this: https://issues.apache.org/jira/browse/SOLR-5169 Thanks, Shawn
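Since every node reports zk_server_state through mntr, one way to approximate a single "do we have a quorum?" answer is to collect the state from each host and count roles. The Python sketch below assumes the mntr text for each host has already been fetched (or is None when the host was unreachable); `server_state`, `summarize_roles` and `looks_healthy` are illustrative names, and the "exactly one leader plus a majority of members" rule is an assumed heuristic.

```python
from collections import Counter


def server_state(mntr_text):
    """Extract the zk_server_state value from one host's mntr output."""
    for line in mntr_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "zk_server_state":
            return parts[1]
    return None


def summarize_roles(outputs):
    """Count roles across hosts; outputs maps host -> mntr text, or None if down."""
    roles = Counter()
    for text in outputs.values():
        if text is None:
            roles["unreachable"] += 1
        else:
            roles[server_state(text) or "unknown"] += 1
    return roles


def looks_healthy(roles, ensemble_size):
    """Assumed heuristic: one leader, and leader + followers form a majority."""
    members = roles["leader"] + roles["follower"]
    return roles["leader"] == 1 and members >= ensemble_size // 2 + 1


# Placeholder host names; the texts are trimmed to the one relevant line.
outputs = {
    "zk1": "zk_server_state\tleader\n",
    "zk2": "zk_server_state\tfollower\n",
    "zk3": None,
}
roles = summarize_roles(outputs)
print(dict(roles), looks_healthy(roles, ensemble_size=3))
```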
RE: external zookeeper with SolrCloud
Thanks so much for your reply. Appreciate your help with this.

We have 10 Solr4 nodes (5 shards with replication factor 2) and three zookeeper instances. When we bring 10 Solr4 nodes (while all zookeeper instances are down), we see this exception in Solr4 logs. (which makes sense)

java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
862352 [main-SendThread(d136274-003.dc.gs.com:2181)] WARN org.apache.zookeeper.ClientCnxn ? Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

When we bring up all zookeeper instances, we stop getting above exception, see this message in log and log stops moving after that:

INFO - 2013-08-09 15:48:41.447; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@203727c5 name:ZooKeeperConnection Watcher:zk1.test.com:2181,zk2.test.com:2181,zk3.test.com:2181 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
998962 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Watcher org.apache.solr.common.cloud.ConnectionManager@203727c5 name:ZooKeeperConnection Watcher:zk1.test.com:2181,zk2.test.com:2181,qa-zk3.test.com:2181 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
INFO - 2013-08-09 15:48:41.528; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed
999043 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Client-ZooKeeper status change trigger but we are already closed

At this point, we cannot see admin page or query of any solr nodes unless we restart entire cloud and after that everything is great.
So we must put checks to make sure that N/2 + 1 zookeeper instances are up before we can bring up any solr nodes.

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Thursday, August 08, 2013 6:34 PM
To: solr-user@lucene.apache.org
Subject: Re: external zookeeper with SolrCloud

On 8/8/2013 3:03 PM, Joshi, Shital wrote:
We did quite a bit of testing and we think bug https://issues.apache.org/jira/browse/SOLR-4899 is not resolved in Solr 4.4

The commit for SOLR-4899 was made to branch_4x on June 10th. lucene_solr_4_4 code branch was created from branch_4x on July 8th. The change is definitely present in 4.4. It's an extremely simple one-line change - instead of waiting for DEFAULT_CLIENT_CONNECT_TIMEOUT, a zookeeper reconnect will wait for Long.MAX_VALUE milliseconds.

http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/solrj/src/java/org/apache/solr/common/cloud/ConnectionManager.java?r1=1491451&r2=1491450&pathrev=1491451

Either you are having a problem that's unrelated to the change committed by SOLR-4899 or there's something strange going on. Can you describe exactly what you are trying, what you are seeing, and what you expect to see?

Thanks, Shawn
Re: external zookeeper with SolrCloud
On 8/9/2013 9:02 AM, Joshi, Shital wrote:
At this point, we cannot see admin page or query of any solr nodes unless we restart entire cloud and after that everything is great. So we must put checks to make sure that N/2 + 1 zookeeper instances are up before we can bring up any solr nodes.

I am not really surprised to learn that SolrCloud doesn't start correctly if you don't have zookeeper running when starting Solr. I think it's definitely a bug that Solr won't start working correctly when you start zookeeper. I have filed an issue:

https://issues.apache.org/jira/browse/SOLR-5129

If you repeat your test while you have one zookeeper node up (but not N/2 + 1 for quorum), does the same thing happen, or will it work?

Thanks, Shawn
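As a worked example of the "N/2 + 1" figure used in this thread: reading N/2 as integer division, the quorum size and the number of instance failures an ensemble can tolerate come out as below. The helper names are made up for illustration.

```python
def quorum_size(ensemble_size):
    """Majority of the ensemble: N/2 + 1 with integer division."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many instances can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 5, 6):
    print(f"{n} instances: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} down")
```

With 6 instances the quorum is 4 and only 2 failures are tolerated, the same as with 5, which is consistent with the plan mentioned elsewhere in the thread to move to an odd number of zookeeper instances.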
RE: external zookeeper with SolrCloud
Same thing happens. It only works with N/2 + 1 zookeeper instances up.

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Friday, August 09, 2013 11:22 AM
To: solr-user@lucene.apache.org
Subject: Re: external zookeeper with SolrCloud

On 8/9/2013 9:02 AM, Joshi, Shital wrote:
At this point, we cannot see admin page or query of any solr nodes unless we restart entire cloud and after that everything is great. So we must put checks to make sure that N/2 + 1 zookeeper instances are up before we can bring up any solr nodes.

I am not really surprised to learn that SolrCloud doesn't start correctly if you don't have zookeeper running when starting Solr. I think it's definitely a bug that Solr won't start working correctly when you start zookeeper. I have filed an issue:

https://issues.apache.org/jira/browse/SOLR-5129

If you repeat your test while you have one zookeeper node up (but not N/2 + 1 for quorum), does the same thing happen, or will it work?

Thanks, Shawn
Re: external zookeeper with SolrCloud
On 8/9/2013 11:15 AM, Joshi, Shital wrote:
Same thing happen. It only works with N/2 + 1 zookeeper instances up.

Got it. An update came in on the issue that I filed. This behavior that you're seeing is currently by design.

Because this is expected behavior, I've changed the issue to improvement instead of a bug. I don't know if it is something that will happen, but the request is in.

The workaround is fairly simple -- don't start or restart Solr nodes if you don't have zookeeper quorum.

Thank you for your diligent testing!

Shawn
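The workaround (don't start Solr until zookeeper has quorum) could be scripted as a small gate in a startup wrapper. The Python sketch below is only an outline under assumptions: `is_serving` is a hypothetical callback supplied by the caller (for example, one that sends srvr to the host and checks for a leader or follower mode), and counting N/2 + 1 serving instances is used as a stand-in for "the ensemble has quorum".

```python
import time


def wait_for_quorum(hosts, is_serving, poll_seconds=5.0, max_wait_seconds=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until at least N/2 + 1 of the hosts report as serving.

    Returns True once a majority is serving, False if max_wait_seconds passes.
    is_serving(host) -> bool is a caller-supplied, hypothetical probe.
    """
    needed = len(hosts) // 2 + 1
    deadline = clock() + max_wait_seconds
    while True:
        up = sum(1 for host in hosts if is_serving(host))
        if up >= needed:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)


# Usage outline (placeholder host names and probe):
#   if wait_for_quorum(["zk1", "zk2", "zk3"], my_probe):
#       start_solr()
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; a startup script would leave them at their defaults.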
RE: external zookeeper with SolrCloud
We did quite a bit of testing and we think bug https://issues.apache.org/jira/browse/SOLR-4899 is not resolved in Solr 4.4

-----Original Message-----
From: Joshi, Shital [Tech]
Sent: Wednesday, August 07, 2013 2:48 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: external zookeeper with SolrCloud

I started looking into what I might have missed while upgrading to Solr 4.4. and I noticed that solr.xml in Solr 4.4 has this:

<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>

While our solr.xml has this:

<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" zkClientTimeout="${zkClientTimeout:15000}">
    <core name="collection1" instanceDir="collection1" shard="${shard:}" dataDir="${solr.data.dir}" />
  </cores>
</solr>

Do you think not having shardHandlerFactory is causing this bug to appear on our end?

-----Original Message-----
From: Raymond Wiker [mailto:rwi...@gmail.com]
Sent: Wednesday, August 07, 2013 8:29 AM
To: solr-user@lucene.apache.org
Subject: Re: external zookeeper with SolrCloud

You said earlier that you had 6 zookeeper instances, but the zkHost param only shows 5 instances... is that correct?

On Tue, Aug 6, 2013 at 11:23 PM, Joshi, Shital shital.jo...@gs.com wrote:
Machines are definitely up. Solr4 node and zookeeper instance share the machine. We're using -DzkHost=zk1,zk2,zk3,zk4,zk5 to let solr nodes know about the zk instances.
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, August 06, 2013 5:03 PM
To: solr-user@lucene.apache.org
Subject: Re: external zookeeper with SolrCloud

First off, even 6 ZK instances are overkill, vast overkill. 3 should be more than enough. That aside, however, how are you letting your Solr nodes know about the zk machines? Is it possible you've pointed some of your Solr nodes at specific ZK machines that aren't up when you have this problem? I.e. -zkHost=zk1,zk2,zk3

Best
Erick

On Tue, Aug 6, 2013 at 4:56 PM, Joshi, Shital shital.jo...@gs.com wrote:

Hi,

We have SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes. We have 6 zookeeper instances. We are planning to change to odd number of zookeeper instances.

With Solr 4.3.0, if all zookeeper instances are not up, the solr4 node never connects to zookeeper (can't see the admin page) until all zookeeper instances are up and we restart all solr nodes. It was suggested that it could be due this bug https://issues.apache.org/jira/browse/SOLR-4899 and this bug is solved in Solr 4.4

We upgraded to Solr 4.4 but still see this issue. We brought up 4 out of 6 zookeeper instances and then brought up all ten Solr4 nodes. We kept seeing this exception in Solr logs:

751395 [main-SendThread] WARN org.apache.zookeeper.ClientCnxn ? Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

And after a while saw this exception.
INFO - 2013-08-05 22:24:07.582; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@5140709name:ZooKeeperConnection Watcher: qa-zk1.services.gs.com,qa-zk2.services.gs.com,qa-zk3.services.gs.com, qa-zk4.services.gs.com,qa-zk5.services.gs.com,qa-zk6.services.gs.com got event WatchedEvent state:SyncConnected type:None path:null path:null type:None INFO - 2013-08-05 22:24:07.662; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed 754311 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Client-ZooKeeper status change trigger but we are already closed We brought up all zookeeper instances but the cloud never came up until all solr nodes were restarted. Do we need to change any settings? After weekend reboot, all zookeeper instances come up one by one. While zookeeper instances are coming up solr nodes are also getting started. With this issue, we have to put checks to make sure all zookeeper instances are up before we bring up any solr node. Thanks
Re: external zookeeper with SolrCloud
On 8/8/2013 3:03 PM, Joshi, Shital wrote:
We did quite a bit of testing and we think bug https://issues.apache.org/jira/browse/SOLR-4899 is not resolved in Solr 4.4

The commit for SOLR-4899 was made to branch_4x on June 10th. lucene_solr_4_4 code branch was created from branch_4x on July 8th. The change is definitely present in 4.4. It's an extremely simple one-line change - instead of waiting for DEFAULT_CLIENT_CONNECT_TIMEOUT, a zookeeper reconnect will wait for Long.MAX_VALUE milliseconds.

http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/solrj/src/java/org/apache/solr/common/cloud/ConnectionManager.java?r1=1491451&r2=1491450&pathrev=1491451

Either you are having a problem that's unrelated to the change committed by SOLR-4899 or there's something strange going on. Can you describe exactly what you are trying, what you are seeing, and what you expect to see?

Thanks, Shawn
Re: external zookeeper with SolrCloud
Hmmm, shouldn't be happening. How sure are you that the upgrade to 4.4 was carried out on all machines? Erick On Tue, Aug 6, 2013 at 5:23 PM, Joshi, Shital shital.jo...@gs.com wrote: Machines are definitely up. Solr4 node and zookeeper instance share the machine. We're using -DzkHost=zk1,zk2,zk3,zk4,zk5 to let solr nodes know about the zk instances. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, August 06, 2013 5:03 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud First off, even 6 ZK instances are overkill, vast overkill. 3 should be more than enough. That aside, however, how are you letting your Solr nodes know about the zk machines? Is it possible you've pointed some of your Solr nodes at specific ZK machines that aren't up when you have this problem? I.e. -zkHost=zk1,zk2,zk3 Best Erick On Tue, Aug 6, 2013 at 4:56 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We have SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes. We have 6 zookeeper instances. We are planning to change to odd number of zookeeper instances. With Solr 4.3.0, if all zookeeper instances are not up, the solr4 node never connects to zookeeper (can't see the admin page) until all zookeeper instances are up and we restart all solr nodes. It was suggested that it could be due this bug https://issues.apache.org/jira/browse/SOLR-4899and this bug is solved in Solr 4.4 We upgraded to Solr 4.4 but still see this issue. We brought up 4 out of 6 zookeeper instances and then brought up all ten Solr4 nodes. We kept seeing this exception in Solr logs: 751395 [main-SendThread] WARN org.apache.zookeeper.ClientCnxn ? 
Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) And after a while saw this exception. INFO - 2013-08-05 22:24:07.582; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@5140709name:ZooKeeperConnection Watcher: qa-zk1.services.gs.com,qa-zk2.services.gs.com,qa-zk3.services.gs.com, qa-zk4.services.gs.com,qa-zk5.services.gs.com,qa-zk6.services.gs.com got event WatchedEvent state:SyncConnected type:None path:null path:null type:None INFO - 2013-08-05 22:24:07.662; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed 754311 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Client-ZooKeeper status change trigger but we are already closed We brought up all zookeeper instances but the cloud never came up until all solr nodes were restarted. Do we need to change any settings? After weekend reboot, all zookeeper instances come up one by one. While zookeeper instances are coming up solr nodes are also getting started. With this issue, we have to put checks to make sure all zookeeper instances are up before we bring up any solr node. Thanks!! -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, June 11, 2013 10:42 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud On Jun 11, 2013, at 10:15 AM, Joshi, Shital shital.jo...@gs.com wrote: Thanks Mark. Looks like this bug is fixed in Solr 4.4. Do you have any date for official release of 4.4? Looks like it might come out in a couple of weeks. 
Is there any instruction available on how to build Solr 4.4 from SVN repository? It's java, so it's pretty easy - you might find some help here: http://wiki.apache.org/solr/HowToContribute - Mark -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Monday, June 10, 2013 8:05 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud This might be https://issues.apache.org/jira/browse/SOLR-4899 - Mark On Jun 10, 2013, at 5:59 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We're setting up 5 shard SolrCloud with external zoo keeper. When we bring up Solr nodes while the zookeeper instance is not up and running, we see this error in Solr logs. java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java
Re: external zookeeper with SolrCloud
You said earlier that you had 6 zookeeper instances, but the zkHost param only shows 5 instances... is that correct?

On Tue, Aug 6, 2013 at 11:23 PM, Joshi, Shital shital.jo...@gs.com wrote:
Machines are definitely up. Solr4 node and zookeeper instance share the machine. We're using -DzkHost=zk1,zk2,zk3,zk4,zk5 to let solr nodes know about the zk instances.
RE: external zookeeper with SolrCloud
I went through Admin page - Dashboard of all 10 nodes and verified that each one is using solr-spec 4.4.0. solr-spec 4.4.0 solr-impl 4.4.0 1504776 - sarowe - 2013-07-19 02:58:35 lucene-spec 4.4.0 lucene-impl 4.4.0 1504776 - sarowe - 2013-07-19 02:53:42 Is there anything else I can check to verify that we upgraded to solr 4.4.0? -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, August 07, 2013 8:10 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud Hmmm, shouldn't be happening. How sure are you that the upgrade to 4.4 was carried out on all machines? Erick On Tue, Aug 6, 2013 at 5:23 PM, Joshi, Shital shital.jo...@gs.com wrote: Machines are definitely up. Solr4 node and zookeeper instance share the machine. We're using -DzkHost=zk1,zk2,zk3,zk4,zk5 to let solr nodes know about the zk instances. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, August 06, 2013 5:03 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud First off, even 6 ZK instances are overkill, vast overkill. 3 should be more than enough. That aside, however, how are you letting your Solr nodes know about the zk machines? Is it possible you've pointed some of your Solr nodes at specific ZK machines that aren't up when you have this problem? I.e. -zkHost=zk1,zk2,zk3 Best Erick On Tue, Aug 6, 2013 at 4:56 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We have SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes. We have 6 zookeeper instances. We are planning to change to odd number of zookeeper instances. With Solr 4.3.0, if all zookeeper instances are not up, the solr4 node never connects to zookeeper (can't see the admin page) until all zookeeper instances are up and we restart all solr nodes. 
It was suggested that it could be due this bug https://issues.apache.org/jira/browse/SOLR-4899and this bug is solved in Solr 4.4 We upgraded to Solr 4.4 but still see this issue. We brought up 4 out of 6 zookeeper instances and then brought up all ten Solr4 nodes. We kept seeing this exception in Solr logs: 751395 [main-SendThread] WARN org.apache.zookeeper.ClientCnxn ? Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) And after a while saw this exception. INFO - 2013-08-05 22:24:07.582; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@5140709name:ZooKeeperConnection Watcher: qa-zk1.services.gs.com,qa-zk2.services.gs.com,qa-zk3.services.gs.com, qa-zk4.services.gs.com,qa-zk5.services.gs.com,qa-zk6.services.gs.com got event WatchedEvent state:SyncConnected type:None path:null path:null type:None INFO - 2013-08-05 22:24:07.662; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed 754311 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Client-ZooKeeper status change trigger but we are already closed We brought up all zookeeper instances but the cloud never came up until all solr nodes were restarted. Do we need to change any settings? After weekend reboot, all zookeeper instances come up one by one. While zookeeper instances are coming up solr nodes are also getting started. With this issue, we have to put checks to make sure all zookeeper instances are up before we bring up any solr node. Thanks!! 
-Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, June 11, 2013 10:42 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud On Jun 11, 2013, at 10:15 AM, Joshi, Shital shital.jo...@gs.com wrote: Thanks Mark. Looks like this bug is fixed in Solr 4.4. Do you have any date for official release of 4.4? Looks like it might come out in a couple of weeks. Is there any instruction available on how to build Solr 4.4 from SVN repository? It's java, so it's pretty easy - you might find some help here: http://wiki.apache.org/solr/HowToContribute - Mark -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Monday, June 10, 2013 8:05 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud This might be https://issues.apache.org/jira/browse/SOLR-4899 - Mark On Jun 10, 2013, at 5:59 PM, Joshi, Shital
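One way to double-check the upgrade beyond reading each dashboard by hand is to compare the version fields that every node reports. The sketch below is only an illustration: it assumes the system info of each node has already been fetched as JSON (for example from the handler the admin dashboard itself reads, which is an assumption about your setup) and that its "lucene" section carries the same solr-spec and solr-impl fields shown above. The function name and node names are hypothetical.

```python
def stale_nodes(node_info, expected_spec="4.4.0"):
    """Return the sorted names of nodes that do not report the expected solr-spec version.

    node_info maps a node name to its already-parsed system info dict.
    The "lucene" / "solr-spec-version" keys are assumed to mirror the
    fields shown on the admin dashboard.
    """
    stale = []
    for node, info in node_info.items():
        reported = info.get("lucene", {}).get("solr-spec-version")
        if reported != expected_spec:
            stale.append(node)
    return sorted(stale)


# Hypothetical sample data in the shape described above.
sample = {
    "node1": {"lucene": {"solr-spec-version": "4.4.0"}},
    "node2": {"lucene": {"solr-spec-version": "4.3.0"}},
    "node3": {"lucene": {}},
}
print(stale_nodes(sample))
```

A node that reports nothing at all is treated as stale on purpose, so that a node that failed to answer is not silently counted as upgraded.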
RE: external zookeeper with SolrCloud
We have all 6 instances in the zkHost parameter.
-Original Message- From: Raymond Wiker [mailto:rwi...@gmail.com] Sent: Wednesday, August 07, 2013 8:29 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud You said earlier that you had 6 zookeeper instances, but the zkHost param only shows 5 instances... is that correct? On Tue, Aug 6, 2013 at 11:23 PM, Joshi, Shital shital.jo...@gs.com wrote: Machines are definitely up. Solr4 node and zookeeper instance share the machine. We're using -DzkHost=zk1,zk2,zk3,zk4,zk5 to let solr nodes know about the zk instances.
-Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, August 06, 2013 5:03 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud First off, even 6 ZK instances are overkill, vast overkill. 3 should be more than enough. That aside, however, how are you letting your Solr nodes know about the zk machines? Is it possible you've pointed some of your Solr nodes at specific ZK machines that aren't up when you have this problem? I.e. -zkHost=zk1,zk2,zk3 Best Erick On Tue, Aug 6, 2013 at 4:56 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We have SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes. We have 6 zookeeper instances. We are planning to change to odd number of zookeeper instances. With Solr 4.3.0, if all zookeeper instances are not up, the solr4 node never connects to zookeeper (can't see the admin page) until all zookeeper instances are up and we restart all solr nodes. It was suggested that it could be due to this bug, https://issues.apache.org/jira/browse/SOLR-4899, and that this bug is solved in Solr 4.4. We upgraded to Solr 4.4 but still see this issue. We brought up 4 out of 6 zookeeper instances and then brought up all ten Solr4 nodes. We kept seeing this exception in Solr logs: 751395 [main-SendThread] WARN org.apache.zookeeper.ClientCnxn -
Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) And after a while saw this exception. INFO - 2013-08-05 22:24:07.582; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@5140709name:ZooKeeperConnection Watcher: qa-zk1.services.gs.com,qa-zk2.services.gs.com,qa-zk3.services.gs.com, qa-zk4.services.gs.com,qa-zk5.services.gs.com,qa-zk6.services.gs.com got event WatchedEvent state:SyncConnected type:None path:null path:null type:None INFO - 2013-08-05 22:24:07.662; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed 754311 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Client-ZooKeeper status change trigger but we are already closed We brought up all zookeeper instances but the cloud never came up until all solr nodes were restarted. Do we need to change any settings? After weekend reboot, all zookeeper instances come up one by one. While zookeeper instances are coming up solr nodes are also getting started. With this issue, we have to put checks to make sure all zookeeper instances are up before we bring up any solr node. Thanks!! -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, June 11, 2013 10:42 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud On Jun 11, 2013, at 10:15 AM, Joshi, Shital shital.jo...@gs.com wrote: Thanks Mark. Looks like this bug is fixed in Solr 4.4. Do you have any date for official release of 4.4? Looks like it might come out in a couple of weeks. 
Is there any instruction available on how to build Solr 4.4 from SVN repository? It's java, so it's pretty easy - you might find some help here: http://wiki.apache.org/solr/HowToContribute - Mark -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Monday, June 10, 2013 8:05 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud This might be https://issues.apache.org/jira/browse/SOLR-4899 - Mark On Jun 10, 2013, at 5:59 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We're setting up 5 shard SolrCloud with external zoo keeper. When we bring up Solr nodes while the zookeeper instance is not up and running, we see this error in Solr logs. java.net.ConnectException: Connection refused
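The mismatch Raymond points out (six instances in the ensemble, five in the zkHost string) is easy to check mechanically. The following is a minimal sketch, not part of Solr: the helper names are made up for illustration, and it assumes the usual comma-separated host[:port] list with an optional trailing chroot path.

```python
def parse_zk_host(zk_host):
    """Split a zkHost string into (list of host[:port] entries, chroot or None)."""
    hosts_part, slash, chroot = zk_host.partition("/")
    hosts = [h.strip() for h in hosts_part.split(",") if h.strip()]
    return hosts, ("/" + chroot if slash else None)


def missing_from_zk_host(zk_host, ensemble):
    """Return the ensemble members (host names) that the zkHost string does not list."""
    hosts, _chroot = parse_zk_host(zk_host)
    listed = {h.split(":")[0] for h in hosts}
    return sorted(set(ensemble) - listed)


# The value quoted in the thread, checked against a six-member ensemble
# (the short names zk1..zk6 are placeholders, as in the thread).
print(missing_from_zk_host("zk1,zk2,zk3,zk4,zk5",
                           ["zk1", "zk2", "zk3", "zk4", "zk5", "zk6"]))
```

Running the same check on every node's startup options would also catch the case Erick asks about, where some nodes point at only a subset of the ZooKeeper machines.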
RE: external zookeeper with SolrCloud
I started looking into what I might have missed while upgrading to Solr 4.4, and I noticed that solr.xml in Solr 4.4 has this:
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>
While our solr.xml has this:
<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" zkClientTimeout="${zkClientTimeout:15000}">
    <core name="collection1" instanceDir="collection1" shard="${shard:}" dataDir="${solr.data.dir}" />
  </cores>
</solr>
Do you think not having shardHandlerFactory is causing this bug to appear on our end?
-Original Message- From: Raymond Wiker [mailto:rwi...@gmail.com] Sent: Wednesday, August 07, 2013 8:29 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud You said earlier that you had 6 zookeeper instances, but the zkHost param only shows 5 instances... is that correct? On Tue, Aug 6, 2013 at 11:23 PM, Joshi, Shital shital.jo...@gs.com wrote: Machines are definitely up. Solr4 node and zookeeper instance share the machine. We're using -DzkHost=zk1,zk2,zk3,zk4,zk5 to let solr nodes know about the zk instances.
-Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, August 06, 2013 5:03 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud First off, even 6 ZK instances are overkill, vast overkill. 3 should be more than enough. That aside, however, how are you letting your Solr nodes know about the zk machines?
Is it possible you've pointed some of your Solr nodes at specific ZK machines that aren't up when you have this problem? I.e. -zkHost=zk1,zk2,zk3 Best Erick On Tue, Aug 6, 2013 at 4:56 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We have SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes. We have 6 zookeeper instances. We are planning to change to odd number of zookeeper instances. With Solr 4.3.0, if all zookeeper instances are not up, the solr4 node never connects to zookeeper (can't see the admin page) until all zookeeper instances are up and we restart all solr nodes. It was suggested that it could be due to this bug, https://issues.apache.org/jira/browse/SOLR-4899, and that this bug is solved in Solr 4.4. We upgraded to Solr 4.4 but still see this issue. We brought up 4 out of 6 zookeeper instances and then brought up all ten Solr4 nodes. We kept seeing this exception in Solr logs: 751395 [main-SendThread] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) And after a while saw this exception.
INFO - 2013-08-05 22:24:07.582; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@5140709name:ZooKeeperConnection Watcher: qa-zk1.services.gs.com,qa-zk2.services.gs.com,qa-zk3.services.gs.com, qa-zk4.services.gs.com,qa-zk5.services.gs.com,qa-zk6.services.gs.com got event WatchedEvent state:SyncConnected type:None path:null path:null type:None INFO - 2013-08-05 22:24:07.662; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed 754311 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Client-ZooKeeper status change trigger but we are already closed We brought up all zookeeper instances but the cloud never came up until all solr nodes were restarted. Do we need to change any settings? After weekend reboot, all zookeeper instances come up one by one. While zookeeper instances are coming up solr nodes are also getting started. With this issue, we have to put checks to make sure all zookeeper instances are up before we bring up any solr node. Thanks!! -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, June 11, 2013 10:42 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud On Jun 11, 2013, at 10:15 AM, Joshi, Shital shital.jo...@gs.com wrote: Thanks Mark
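Whether the missing shardHandlerFactory matters is not answered in this thread. What can be checked mechanically is which of the two solr.xml layouts a node is actually running with, and which zkClientTimeout it ends up using. The sketch below is an illustration only: the function names are hypothetical, and the two sample documents are cut-down versions of the files quoted in the message above.

```python
import xml.etree.ElementTree as ET

# Cut-down samples of the two layouts quoted in the message above.
NEW_STYLE = """<solr>
  <solrcloud>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
  </solrcloud>
</solr>"""

LEGACY_STYLE = """<solr persistent="true">
  <cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>"""


def solr_xml_style(xml_text):
    """Classify a solr.xml document as 'new', 'legacy' or 'unknown'."""
    root = ET.fromstring(xml_text)
    if root.find("solrcloud") is not None:
        return "new"
    if root.find("cores") is not None:
        return "legacy"
    return "unknown"


def zk_client_timeout(xml_text):
    """Return the raw zkClientTimeout setting from either layout, or None."""
    root = ET.fromstring(xml_text)
    node = root.find("solrcloud/int[@name='zkClientTimeout']")
    if node is not None:
        return node.text
    cores = root.find("cores")
    return cores.get("zkClientTimeout") if cores is not None else None


print(solr_xml_style(LEGACY_STYLE), zk_client_timeout(LEGACY_STYLE))
```

The value is returned as written in the file, including the ${...} placeholder, because resolving system properties is left to Solr itself.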
RE: external zookeeper with SolrCloud
Hi,
We have a SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes. We have 6 zookeeper instances. We are planning to change to an odd number of zookeeper instances. With Solr 4.3.0, if all zookeeper instances are not up, the solr4 node never connects to zookeeper (can't see the admin page) until all zookeeper instances are up and we restart all solr nodes. It was suggested that it could be due to this bug, https://issues.apache.org/jira/browse/SOLR-4899, and that this bug is solved in Solr 4.4. We upgraded to Solr 4.4 but still see this issue. We brought up 4 out of 6 zookeeper instances and then brought up all ten Solr4 nodes. We kept seeing this exception in Solr logs:
751395 [main-SendThread] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
And after a while saw this exception.
INFO - 2013-08-05 22:24:07.582; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@5140709 name:ZooKeeperConnection Watcher:qa-zk1.services.gs.com,qa-zk2.services.gs.com,qa-zk3.services.gs.com,qa-zk4.services.gs.com,qa-zk5.services.gs.com,qa-zk6.services.gs.com got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
INFO - 2013-08-05 22:24:07.662; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed
754311 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager - Client-ZooKeeper status change trigger but we are already closed
We brought up all zookeeper instances but the cloud never came up until all solr nodes were restarted. Do we need to change any settings? After the weekend reboot, all zookeeper instances come up one by one. While zookeeper instances are coming up, solr nodes are also getting started. With this issue, we have to put checks in place to make sure all zookeeper instances are up before we bring up any solr node. Thanks!!
-Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, June 11, 2013 10:42 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud On Jun 11, 2013, at 10:15 AM, Joshi, Shital shital.jo...@gs.com wrote: Thanks Mark. Looks like this bug is fixed in Solr 4.4. Do you have any date for official release of 4.4? Looks like it might come out in a couple of weeks. Is there any instruction available on how to build Solr 4.4 from SVN repository? It's java, so it's pretty easy - you might find some help here: http://wiki.apache.org/solr/HowToContribute - Mark
-Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Monday, June 10, 2013 8:05 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud This might be https://issues.apache.org/jira/browse/SOLR-4899 - Mark On Jun 10, 2013, at 5:59 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We're setting up 5 shard SolrCloud with external zoo keeper. When we bring up Solr nodes while the zookeeper instance is not up and running, we see this error in Solr logs.
java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) INFO - 2013-06-10 15:03:35.422; org.apache.solr.common.cloud.ConnectionManager; Watcher 592147 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Watcher org.apache.solr.common.cloud.ConnectionManager@530d0eae name:ZooKeeperConnection Watcher: . got event WatchedEvent state:SyncConnected type:None path:null path:null type:None INFO - 2013-06-10 15:03:35.423; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed 592148 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Client-ZooKeeper status change trigger but we are already closed After we bring up zookeeper instance, the node never connects to zookeeper and we can't see the solr admin page, until we restart the node. Does the zookeeper instance has to be up when we bring up Solr node? That's not what the documentation say though. Thanks.
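The startup check mentioned in the August message ("make sure all zookeeper instances are up before we bring up any solr node") could be sketched with ZooKeeper's four-letter words. This is a rough sketch under stated assumptions, not a tested tool: it assumes the `stat` command is reachable on the client port (newer ZooKeeper releases require four-letter words to be whitelisted), that a server taking part in a quorum includes a "Mode:" line in its reply, and that a server that is up but not serving does not. The function names and the `probe` parameter are hypothetical.

```python
import socket


def four_letter_word(host, port, word, timeout=2.0):
    """Send a four-letter word to a ZooKeeper server; return the reply, or '' on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(word.encode("ascii"))
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # server closes the connection after replying
                    break
                chunks.append(data)
            return b"".join(chunks).decode("ascii", "replace")
    except OSError:
        # Covers refused connections and timeouts alike.
        return ""


def server_mode(stat_reply):
    """Return the value of the 'Mode:' line (e.g. leader, follower), or None if absent."""
    for line in stat_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def serving_count(hosts, port=2181, probe=four_letter_word):
    """Count the servers whose stat reply shows a mode, i.e. that are serving requests."""
    return sum(1 for host in hosts if server_mode(probe(host, port, "stat")) is not None)
```

A startup script would then hold back the Solr nodes until `serving_count(...)` reaches whatever threshold is wanted: the full ensemble size, as in the message above, or just a majority. The `probe` parameter exists only so the counting logic can be exercised without a live ensemble.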
Re: external zookeeper with SolrCloud
First off, even 6 ZK instances are overkill, vast overkill. 3 should be more than enough. That aside, however, how are you letting your Solr nodes know about the zk machines? Is it possible you've pointed some of your Solr nodes at specific ZK machines that aren't up when you have this problem? I.e. -zkHost=zk1,zk2,zk3 Best Erick On Tue, Aug 6, 2013 at 4:56 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We have SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes. We have 6 zookeeper instances. We are planning to change to odd number of zookeeper instances. With Solr 4.3.0, if all zookeeper instances are not up, the solr4 node never connects to zookeeper (can't see the admin page) until all zookeeper instances are up and we restart all solr nodes. It was suggested that it could be due this bug https://issues.apache.org/jira/browse/SOLR-4899 and this bug is solved in Solr 4.4 We upgraded to Solr 4.4 but still see this issue. We brought up 4 out of 6 zookeeper instances and then brought up all ten Solr4 nodes. We kept seeing this exception in Solr logs: 751395 [main-SendThread] WARN org.apache.zookeeper.ClientCnxn ? Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) And after a while saw this exception. 
INFO - 2013-08-05 22:24:07.582; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@5140709name:ZooKeeperConnection Watcher: qa-zk1.services.gs.com,qa-zk2.services.gs.com,qa-zk3.services.gs.com, qa-zk4.services.gs.com,qa-zk5.services.gs.com,qa-zk6.services.gs.com got event WatchedEvent state:SyncConnected type:None path:null path:null type:None INFO - 2013-08-05 22:24:07.662; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed 754311 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Client-ZooKeeper status change trigger but we are already closed We brought up all zookeeper instances but the cloud never came up until all solr nodes were restarted. Do we need to change any settings? After weekend reboot, all zookeeper instances come up one by one. While zookeeper instances are coming up solr nodes are also getting started. With this issue, we have to put checks to make sure all zookeeper instances are up before we bring up any solr node. Thanks!! -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, June 11, 2013 10:42 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud On Jun 11, 2013, at 10:15 AM, Joshi, Shital shital.jo...@gs.com wrote: Thanks Mark. Looks like this bug is fixed in Solr 4.4. Do you have any date for official release of 4.4? Looks like it might come out in a couple of weeks. Is there any instruction available on how to build Solr 4.4 from SVN repository? 
It's java, so it's pretty easy - you might find some help here: http://wiki.apache.org/solr/HowToContribute - Mark -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Monday, June 10, 2013 8:05 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud This might be https://issues.apache.org/jira/browse/SOLR-4899 - Mark On Jun 10, 2013, at 5:59 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We're setting up 5 shard SolrCloud with external zoo keeper. When we bring up Solr nodes while the zookeeper instance is not up and running, we see this error in Solr logs. java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) INFO - 2013-06-10 15:03:35.422; org.apache.solr.common.cloud.ConnectionManager; Watcher 592147 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Watcher org.apache.solr.common.cloud.ConnectionManager@530d0eaename:ZooKeeperConnection Watcher: . got event WatchedEvent state:SyncConnected type:None path:null path:null type:None INFO - 2013-06-10 15:03:35.423; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed 592148 [main-EventThread] INFO
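The numbers in this exchange are worth spelling out. ZooKeeper needs a strict majority of the ensemble, so 4 out of 6 instances is exactly the minimum that forms a quorum, and a sixth instance buys no extra fault tolerance over five. A short worked example (the function names are just for illustration):

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a quorum still exists."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 5, 6):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 servers: quorum 2, tolerates 1 failure
# 5 servers: quorum 3, tolerates 2 failures
# 6 servers: quorum 4, tolerates 2 failures
```

This is the arithmetic behind the plan to move to an odd number of instances: going from 6 to 5 lowers the quorum from 4 to 3 without reducing the number of failures the ensemble can survive.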
RE: external zookeeper with SolrCloud
Machines are definitely up. Solr4 node and zookeeper instance share the machine. We're using -DzkHost=zk1,zk2,zk3,zk4,zk5 to let solr nodes know about the zk instances. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, August 06, 2013 5:03 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud First off, even 6 ZK instances are overkill, vast overkill. 3 should be more than enough. That aside, however, how are you letting your Solr nodes know about the zk machines? Is it possible you've pointed some of your Solr nodes at specific ZK machines that aren't up when you have this problem? I.e. -zkHost=zk1,zk2,zk3 Best Erick On Tue, Aug 6, 2013 at 4:56 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We have SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes. We have 6 zookeeper instances. We are planning to change to odd number of zookeeper instances. With Solr 4.3.0, if all zookeeper instances are not up, the solr4 node never connects to zookeeper (can't see the admin page) until all zookeeper instances are up and we restart all solr nodes. It was suggested that it could be due this bug https://issues.apache.org/jira/browse/SOLR-4899 and this bug is solved in Solr 4.4 We upgraded to Solr 4.4 but still see this issue. We brought up 4 out of 6 zookeeper instances and then brought up all ten Solr4 nodes. We kept seeing this exception in Solr logs: 751395 [main-SendThread] WARN org.apache.zookeeper.ClientCnxn ? Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) And after a while saw this exception. 
INFO - 2013-08-05 22:24:07.582; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@5140709name:ZooKeeperConnection Watcher: qa-zk1.services.gs.com,qa-zk2.services.gs.com,qa-zk3.services.gs.com, qa-zk4.services.gs.com,qa-zk5.services.gs.com,qa-zk6.services.gs.com got event WatchedEvent state:SyncConnected type:None path:null path:null type:None INFO - 2013-08-05 22:24:07.662; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed 754311 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Client-ZooKeeper status change trigger but we are already closed We brought up all zookeeper instances but the cloud never came up until all solr nodes were restarted. Do we need to change any settings? After weekend reboot, all zookeeper instances come up one by one. While zookeeper instances are coming up solr nodes are also getting started. With this issue, we have to put checks to make sure all zookeeper instances are up before we bring up any solr node. Thanks!! -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, June 11, 2013 10:42 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud On Jun 11, 2013, at 10:15 AM, Joshi, Shital shital.jo...@gs.com wrote: Thanks Mark. Looks like this bug is fixed in Solr 4.4. Do you have any date for official release of 4.4? Looks like it might come out in a couple of weeks. Is there any instruction available on how to build Solr 4.4 from SVN repository? 
It's java, so it's pretty easy - you might find some help here: http://wiki.apache.org/solr/HowToContribute - Mark -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Monday, June 10, 2013 8:05 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud This might be https://issues.apache.org/jira/browse/SOLR-4899 - Mark On Jun 10, 2013, at 5:59 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We're setting up 5 shard SolrCloud with external zoo keeper. When we bring up Solr nodes while the zookeeper instance is not up and running, we see this error in Solr logs. java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) INFO - 2013-06-10 15:03:35.422; org.apache.solr.common.cloud.ConnectionManager; Watcher 592147 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Watcher
RE: external zookeeper with SolrCloud
Thanks Mark. Looks like this bug is fixed in Solr 4.4. Do you have any date for official release of 4.4? Is there any instruction available on how to build Solr 4.4 from SVN repository? -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Monday, June 10, 2013 8:05 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud This might be https://issues.apache.org/jira/browse/SOLR-4899 - Mark On Jun 10, 2013, at 5:59 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We're setting up 5 shard SolrCloud with external zoo keeper. When we bring up Solr nodes while the zookeeper instance is not up and running, we see this error in Solr logs. java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) INFO - 2013-06-10 15:03:35.422; org.apache.solr.common.cloud.ConnectionManager; Watcher 592147 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Watcher org.apache.solr.common.cloud.ConnectionManager@530d0eae name:ZooKeeperConnection Watcher: . got event WatchedEvent state:SyncConnected type:None path:null path:null type:None INFO - 2013-06-10 15:03:35.423; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed 592148 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Client-ZooKeeper status change trigger but we are already closed After we bring up zookeeper instance, the node never connects to zookeeper and we can't see the solr admin page, until we restart the node. Does the zookeeper instance has to be up when we bring up Solr node? That's not what the documentation say though. Thanks.
Re: external zookeeper with SolrCloud
On Jun 11, 2013, at 10:15 AM, Joshi, Shital shital.jo...@gs.com wrote: Thanks Mark. Looks like this bug is fixed in Solr 4.4. Do you have any date for official release of 4.4? Looks like it might come out in a couple of weeks. Is there any instruction available on how to build Solr 4.4 from SVN repository? It's java, so it's pretty easy - you might find some help here: http://wiki.apache.org/solr/HowToContribute - Mark -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Monday, June 10, 2013 8:05 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud This might be https://issues.apache.org/jira/browse/SOLR-4899 - Mark On Jun 10, 2013, at 5:59 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We're setting up 5 shard SolrCloud with external zoo keeper. When we bring up Solr nodes while the zookeeper instance is not up and running, we see this error in Solr logs. java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) INFO - 2013-06-10 15:03:35.422; org.apache.solr.common.cloud.ConnectionManager; Watcher 592147 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Watcher org.apache.solr.common.cloud.ConnectionManager@530d0eae name:ZooKeeperConnection Watcher: . got event WatchedEvent state:SyncConnected type:None path:null path:null type:None INFO - 2013-06-10 15:03:35.423; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed 592148 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? 
Client-ZooKeeper status change trigger but we are already closed After we bring up zookeeper instance, the node never connects to zookeeper and we can't see the solr admin page, until we restart the node. Does the zookeeper instance has to be up when we bring up Solr node? That's not what the documentation say though. Thanks.
external zookeeper with SolrCloud
Hi,
We're setting up a 5-shard SolrCloud with external zookeeper. When we bring up Solr nodes while the zookeeper instance is not up and running, we see this error in Solr logs.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
INFO - 2013-06-10 15:03:35.422; org.apache.solr.common.cloud.ConnectionManager; Watcher
592147 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@530d0eae name:ZooKeeperConnection Watcher: . got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
INFO - 2013-06-10 15:03:35.423; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed
592148 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager - Client-ZooKeeper status change trigger but we are already closed
After we bring up the zookeeper instance, the node never connects to zookeeper and we can't see the solr admin page until we restart the node. Does the zookeeper instance have to be up when we bring up a Solr node? That's not what the documentation says, though. Thanks.
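Since the node described above stays disconnected until it is restarted, one pragmatic stopgap is to watch the Solr log for the "already closed" message and flag the node for a restart. The sketch below is only that, a stopgap under assumptions: the marker string is copied from the log excerpt in this message, the function name is hypothetical, and whether this message always means the node is stuck is not established by the thread.

```python
STUCK_MARKER = "Client-ZooKeeper status change trigger but we are already closed"


def looks_stuck(log_lines):
    """Return True if any log line contains the 'already closed' message quoted above."""
    return any(STUCK_MARKER in line for line in log_lines)


# Two lines taken from the excerpt above, plus one unrelated line.
sample_log = [
    "java.net.ConnectException: Connection refused",
    "INFO - 2013-06-10 15:03:35.423; org.apache.solr.common.cloud.ConnectionManager; "
    "Client-ZooKeeper status change trigger but we are already closed",
]
print(looks_stuck(sample_log))
```

In practice this would be fed the tail of the node's log file after ZooKeeper comes back, rather than a hard-coded list.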
Re: external zookeeper with SolrCloud
This might be https://issues.apache.org/jira/browse/SOLR-4899 - Mark On Jun 10, 2013, at 5:59 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We're setting up 5 shard SolrCloud with external zoo keeper. When we bring up Solr nodes while the zookeeper instance is not up and running, we see this error in Solr logs. java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) INFO - 2013-06-10 15:03:35.422; org.apache.solr.common.cloud.ConnectionManager; Watcher 592147 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Watcher org.apache.solr.common.cloud.ConnectionManager@530d0eae name:ZooKeeperConnection Watcher: . got event WatchedEvent state:SyncConnected type:None path:null path:null type:None INFO - 2013-06-10 15:03:35.423; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed 592148 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Client-ZooKeeper status change trigger but we are already closed After we bring up zookeeper instance, the node never connects to zookeeper and we can't see the solr admin page, until we restart the node. Does the zookeeper instance has to be up when we bring up Solr node? That's not what the documentation say though. Thanks.