CloudSolrServer or load-balancer for indexing
Hi, As far as I know, CloudSolrServer is the recommended way to index into SolrCloud. I wonder what the advantages of this approach are over an external load balancer. Let's say I have a 4-node SolrCloud (2 shards + replicas) plus 1 server running ZooKeeper. I can use CloudSolrServer for indexing, or use a load balancer and send updates to any live node. In the former case it seems that ZooKeeper is a single point of failure: indexing is not possible if it is down. In the latter case I can still index data even if some nodes are down (no data outage). Which is better for reliable indexing: CloudSolrServer, a load balancer, or some other method worth considering? Regards.
Re: CloudSolrServer or load-balancer for indexing
Nodes stop accepting updates if they cannot talk to ZooKeeper, so the external load balancer is no advantage there. CloudSolrServer will be smart about knowing who the leaders are, will eventually do hashing, will automatically add and remove nodes from rotation based on the cluster state in ZooKeeper, and is probably more intelligent out of the box about retrying on some responses (for example, responses that are returned on shutdown or startup).

- Mark

On Nov 19, 2012, at 6:54 AM, Marcin Rzewucki mrzewu...@gmail.com wrote:
> Hi, As far as I know CloudSolrServer is recommended to be used for indexing to SolrCloud. [...]
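To make the "add/remove nodes from rotation based on the cluster state" point concrete, here is a minimal sketch in plain Java. It is not how SolrJ implements CloudSolrServer; the class `LiveNodeRotation`, its methods, and the node names are hypothetical, and the cluster-state callback is simulated rather than driven by a real ZooKeeper watch. It only shows the idea that the client rebuilds its round-robin list whenever the set of live nodes changes, which a statically configured load balancer would not do on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not part of SolrJ.
final class LiveNodeRotation {
    private List<String> nodes = new ArrayList<>();
    private int next = 0;

    /** Replaces the rotation with the nodes currently reported as live. */
    synchronized void onClusterStateChange(Collection<String> liveNodes) {
        // TreeSet sorts and de-duplicates so the rotation order is stable.
        nodes = new ArrayList<>(new TreeSet<>(liveNodes));
        next = 0;
    }

    /** Returns the next live node in round-robin order. */
    synchronized String nextNode() {
        if (nodes.isEmpty()) {
            throw new IllegalStateException("no live nodes in rotation");
        }
        String node = nodes.get(next);
        next = (next + 1) % nodes.size();
        return node;
    }

    public static void main(String[] args) {
        LiveNodeRotation rotation = new LiveNodeRotation();

        // Simulated cluster state: two live nodes (made-up addresses).
        rotation.onClusterStateChange(Arrays.asList("node1:8983", "node2:8983"));
        System.out.println(rotation.nextNode());
        System.out.println(rotation.nextNode());

        // Simulated state change: node1 went down, so it leaves the rotation.
        rotation.onClusterStateChange(Arrays.asList("node2:8983"));
        System.out.println(rotation.nextNode());
    }
}
```

With an external load balancer, the equivalent of `onClusterStateChange` would have to come from health checks or manual reconfiguration, and it would still know nothing about which node is the shard leader.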
Re: CloudSolrServer or load-balancer for indexing
OK, got it. Thanks.

On 19 November 2012 15:00, Mark Miller markrmil...@gmail.com wrote:
> Nodes stop accepting updates if they cannot talk to Zookeeper, so the external load balancer is no advantage there. [...]
Re: CloudSolrServer or load-balancer for indexing
A single ZooKeeper node could be a single point of failure. It is recommended that you have at least three ZooKeeper nodes running as an ensemble. ZooKeeper has a simple rule: more than half of your nodes must be available to achieve quorum and thus keep functioning. This is to avoid 'split-brain'. Thus, with three servers, you could handle the loss of one ZooKeeper node. Five would allow the loss of two nodes.

More to the point, you're moving the static configuration from being a list of Solr nodes to being a list of ZooKeeper nodes. The expectation is clearly that you'll need to scale your ZooKeeper nodes far less often than you'd need to scale Solr.

Upayavira

On Mon, Nov 19, 2012, at 09:39 PM, Marcin Rzewucki wrote:
> OK, got it. Thanks.
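The quorum rule above is just arithmetic, so here is a small sketch of it in plain Java. The class `ZkQuorum` and the host names are made up for illustration and are not part of ZooKeeper or SolrJ. The last helper builds the comma-separated `host:port` list that becomes the static piece of client configuration; a string in that form is what SolrJ's `CloudSolrServer` takes as its `zkHost` constructor argument.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of ZooKeeper or SolrJ.
final class ZkQuorum {
    /** Smallest number of nodes that is strictly more than half of the ensemble. */
    static int quorumSize(int ensembleSize) {
        if (ensembleSize < 1) {
            throw new IllegalArgumentException("ensemble must have at least one node");
        }
        return ensembleSize / 2 + 1;
    }

    /** Number of nodes that can be lost while a quorum is still available. */
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    /** Joins host:port entries into a comma-separated ZooKeeper connection string. */
    static String zkHost(List<String> hostPorts) {
        return String.join(",", hostPorts);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 3, 4, 5}) {
            System.out.println(n + " node(s): quorum " + quorumSize(n)
                    + ", tolerates " + toleratedFailures(n) + " failure(s)");
        }
        // Made-up host names; 2181 is ZooKeeper's usual client port.
        System.out.println(zkHost(Arrays.asList(
                "zk1.example.com:2181", "zk2.example.com:2181", "zk3.example.com:2181")));
    }
}
```

Note that four nodes tolerate only one failure, the same as three, which is why ensembles are normally sized with an odd number of nodes.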