CloudSolrServer or load-balancer for indexing

2012-11-19 Thread Marcin Rzewucki
Hi,

As far as I know, CloudSolrServer is the recommended way to index into
SolrCloud. I wonder what the advantages of this approach are over an external
load balancer. Let's say I have a 4-node SolrCloud (2 shards + replicas) plus
1 server running ZooKeeper. I can use CloudSolrServer for indexing, or use a
load balancer and send updates to any live node. In the former case it
seems that ZooKeeper is a single point of failure: indexing is not
possible if it is down. In the latter case I can still index data even if
some nodes are down (no data outage). Which is better for reliable indexing:
CloudSolrServer, a load balancer, or some other method worth
considering?

Regards.


Re: CloudSolrServer or load-balancer for indexing

2012-11-19 Thread Mark Miller
Nodes stop accepting updates if they cannot talk to ZooKeeper, so the external
load balancer offers no advantage there.

CloudSolrServer knows who the leaders are, will eventually do hashing,
automatically adds and removes nodes from rotation based on the cluster
state in ZooKeeper, and is probably more intelligent out of the box about
retrying on some responses (for example, responses returned during shutdown
or startup).

- Mark




Re: CloudSolrServer or load-balancer for indexing

2012-11-19 Thread Marcin Rzewucki
OK, got it. Thanks.





Re: CloudSolrServer or load-balancer for indexing

2012-11-19 Thread Upayavira
A single ZooKeeper node could be a single point of failure. It is
recommended that you run at least three ZooKeeper nodes as an
ensemble.

Zookeeper has a simple rule - over half of your nodes must be available
to achieve quorum and thus be functioning. This is to avoid
'split-brain'. Thus, with three servers, you could handle the loss of
one zookeeper node. Five would allow the loss of two nodes.
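
The arithmetic behind that rule can be sketched in a few lines of Java. This is only an illustration of the "over half" majority rule described above; the class and method names are made up for the example and are not part of ZooKeeper or SolrJ.

```java
// Illustrative sketch of the majority rule; names are hypothetical.
final class ZkQuorum {

    // Smallest number of nodes that is strictly more than half the ensemble.
    static int quorumSize(int ensembleSize) {
        if (ensembleSize < 1) {
            throw new IllegalArgumentException("ensemble needs at least one node");
        }
        return ensembleSize / 2 + 1;
    }

    // How many nodes can be lost while a quorum is still reachable.
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 3, 4, 5}) {
            System.out.println(n + " node(s): quorum " + quorumSize(n)
                    + ", tolerates " + tolerableFailures(n) + " failure(s)");
        }
    }
}
```

Note that four nodes tolerate no more failures than three, which is why ensembles are usually sized with an odd number of nodes.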

More to the point, you're moving the static configuration from a
list of Solr nodes to a list of ZooKeeper nodes. The expectation
is clearly that you'll need to scale your ZooKeeper ensemble far less
often than your Solr nodes.
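
To make that concrete, here is a minimal sketch of what the client-side configuration boils down to: one comma-separated host:port string naming the ZooKeeper ensemble, which is the form a ZooKeeper client expects and, as far as I know, what CloudSolrServer takes as its zkHost argument. The host names are placeholders and the helper class is hypothetical; 2181 is the usual ZooKeeper client port.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of SolrJ or ZooKeeper.
final class ZkHostString {

    // Joins ensemble hosts into "host1:port,host2:port,...".
    static String join(List<String> hosts, int clientPort) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one ZooKeeper host is required");
        }
        StringBuilder sb = new StringBuilder();
        for (String host : hosts) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(host).append(':').append(clientPort);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder host names for a three-node ensemble.
        List<String> ensemble = Arrays.asList(
                "zk1.example.com", "zk2.example.com", "zk3.example.com");
        // The resulting string is the only static piece the indexing client needs;
        // the Solr nodes themselves are discovered from the cluster state.
        System.out.println(join(ensemble, 2181));
    }
}
```

Adding or removing Solr nodes then needs no client change; only a change to the ensemble itself would touch this string.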

Upayavira
