Re: Solr Master-Slave fail-over across multiple data-centers

2014-06-13 Thread Daniel Collins
Why do you need to swap the replicas from one master to another?

If you have a cross DC database that ensures both Masters are in sync, why
not just tie SolrSlave-B1 and SolrSlave-B2 to SolrMaster-B at all times?
 Then you don't have any fail-over to do at all?

We have multiple DCs and a similar setup (though a bit larger, 16 machines
per DC comprising 4 replicas of the collection) and we do exactly that.  So
we have 2 independent Solr Clouds, but we feed them from a single input
stream, so they should be in sync (except commit times might vary slightly
from replica to replica).  Users query whichever replica is nearest/least
loaded, to minimize cross-DC traffic.

But then for us, availability beats consistency: we'd rather have a working
cloud if one DC dies, even if it is slightly inconsistent.  For us, that's
better (it's an NRT system) than the alternative.  If we do lose a DC, we'll
have to manually sync back up before we bring it on-line for users, but
that's a price we are willing to pay.


On 13 June 2014 00:52, Arcadius Ahouansou arcad...@menelic.com wrote:

 Hello.

 - We currently have Solr 4 in master-slave mode across 2 data centers.

 - We are planning to run the system in active-active mode, meaning that
 search requests will go to Solr slaves in both DC-A and DC-B.

 - We have a highly available, cross-DC database that feeds the
 SolrMaster in both DCs. So, both Solr masters are being kept up-to-date.

 - In order to allow all slaves in both DCs to have the very same index
 version, we have come up with the idea of having multiple masterUrl values
 on each slave, i.e. masterUrl=masterUrl-A,masterUrl-B (and this is the main
 point of this post).

 - When both DCs are available, only masterUrl-A is used for fetching the
 index and the topology would look like the one shown at
 https://www.dropbox.com/s/4vqdx70af5ddn69/master-slave-failover.png

 - In case the worst happens and we lose DC-A, the slaves in DC-B will get
 network errors like NoRouteToHost or ConnectionTimeout.

 - After a few attempts, the slaves will switch to using the next URL in the
 masterUrl variable, which would be masterUrl-B.

 - This should work pretty well, and when DC-A becomes available again, we
 could issue a REST API call to reset the masterUrl or restart the master in
 DC-B, and the slaves in DC-B should switch back to using masterUrl-A.

 - I would like to gather your thoughts about this idea.

 - If this makes sense, I could raise a Jira ticket to enable multiple
 masterUrl values and the fail-over principle described here.

 Thank you very much.

 Arcadius.
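
For illustration only, the fail-over rule proposed in the post above (try masterUrl-A a few times, then fall through to masterUrl-B) could be prototyped outside Solr roughly as follows. This is a minimal sketch, not existing Solr behaviour: the multi-valued masterUrl is the proposal itself, and `pick_master`, `fetchindex_url`, the host names, and the default of 3 attempts are all hypothetical. The only Solr feature it leans on is the replication handler's `fetchindex` command with a one-off `masterUrl` parameter, which is assumed to be available on the slaves.

```python
from typing import Callable, Optional, Sequence
from urllib.parse import urlencode


def pick_master(
    master_urls: Sequence[str],
    is_reachable: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first master, in priority order, that answers a probe.

    Each URL is probed up to max_attempts times before the next one is
    tried. Returns None if no master answers at all.
    """
    for url in master_urls:
        for _ in range(max_attempts):
            if is_reachable(url):
                return url
    return None


def fetchindex_url(slave_replication_url: str, master_url: str) -> str:
    """Build a one-off fetchindex request that names the master explicitly."""
    query = urlencode({"command": "fetchindex", "masterUrl": master_url})
    return f"{slave_replication_url}?{query}"


if __name__ == "__main__":
    # Hypothetical hosts; masterUrl-A has priority over masterUrl-B.
    masters = [
        "http://solr-master-a.example:8983/solr/core1/replication",
        "http://solr-master-b.example:8983/solr/core1/replication",
    ]
    dc_a_is_down = {masters[0]}  # simulate losing DC-A
    chosen = pick_master(masters, lambda url: url not in dc_a_is_down)
    if chosen is not None:
        slave = "http://solr-slave-b1.example:8983/solr/core1/replication"
        print(fetchindex_url(slave, chosen))
```

Because the list is re-evaluated in priority order on every call, the switch back to masterUrl-A once DC-A recovers needs no explicit reset in this sketch; whether that is desirable depends on how far behind the recovered master is.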



Re: Solr Master-Slave fail-over across multiple data-centers

2014-06-13 Thread Arcadius Ahouansou
Hello Daniel.

Consistency is the main reason for initially pointing all slaves to
SolrMaster-A.
 This gives us the confidence that the very same index version is being
served everywhere...
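
As a rough way to verify that claim in practice, one could poll each slave's replication handler (`command=indexversion`) and compare the results. The sketch below only covers the comparison step and assumes the JSON response carries an `indexversion` field; the function names and slave labels are hypothetical, and the HTTP fetch is left out so the logic can be run without a cluster.

```python
import json
from typing import Dict, List, Mapping


def parse_index_version(response_body: str) -> int:
    """Extract the index version from a JSON replication response.

    Assumes a top-level "indexversion" field in the response.
    """
    return int(json.loads(response_body)["indexversion"])


def index_versions_agree(versions: Mapping[str, int]) -> bool:
    """True when every slave reports the same index version."""
    return len(set(versions.values())) <= 1


def stale_slaves(versions: Mapping[str, int], reference: int) -> List[str]:
    """Names of slaves whose version differs from the reference version."""
    return sorted(name for name, v in versions.items() if v != reference)


if __name__ == "__main__":
    # Hypothetical responses, one per slave.
    bodies: Dict[str, str] = {
        "SolrSlave-A1": '{"indexversion": 1402650000000, "generation": 7}',
        "SolrSlave-B1": '{"indexversion": 1402650000000, "generation": 7}',
        "SolrSlave-B2": '{"indexversion": 1402640000000, "generation": 6}',
    }
    versions = {name: parse_index_version(b) for name, b in bodies.items()}
    print(index_versions_agree(versions))
    print(stale_slaves(versions, reference=versions["SolrSlave-A1"]))
```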

Thank you very much for the input.

Arcadius.
