Re: Cross DC SolrCloud anti-patterns in presentation shalinmangar/cross-datacenter-replication-in-apache-solr-6
bq: if you lose DC1, then your cluster will not be able to recover after
DC1 comes back as there will be no clear majority

When ZK loses majority, no indexing takes place. So in the case where you
have 3 nodes in DC1 and 3 nodes in DC2, _neither_ of them would allow
updates if the connection was cut for any reason, since updates require 4
live ZK servers to be available in this scenario ((6/2)+1). So when the
connection was restored, there'd be nothing to reconcile, and Solr should
recover just fine.

The whole ZK majority thing is about data consistency. Since querying
doesn't change the index at all, there's no consistency problem to
reconcile after the connection is restored. And since quorum was lost, no
updates are allowed.

Best,
Erick

On Mon, Jul 10, 2017 at 5:08 PM, Arcadius Ahouansou wrote:
> Hello Shawn.
>
> Thank you very much for the comment.
>
> On 24 June 2017 at 16:14, Shawn Heisey wrote:
>
>> On 6/24/2017 2:14 AM, Arcadius Ahouansou wrote:
>> > Interpretation 1:
>>
>> ZooKeeper doesn't *need* an odd number of servers, but there's no
>> benefit to an even number. If you have 5 servers, two can go down. If
>> you have 6 servers, you can still only lose two, so you might as well
>> just run 5. You'd have fewer possible points of failure, less power
>> usage, and less bandwidth usage.
>
> About Slide 8 and the odd/even number of nodes...
> What I meant is that on Slide 8, if you lose DC1, then your cluster will
> not be able to recover after DC1 comes back, as there will be no clear
> majority, and you will have:
> - 3 ZK nodes with up-to-date data (that is, DC2+DC3) and
> - 3 ZK nodes with out-of-date data (DC1).
>
> But if you had only 2 ZK nodes in DC1, then you could afford to lose any
> one of DC1, DC2 or DC3, and the cluster would be able to recover and be
> OK.
>
> Thank you very much.
>
> Arcadius
>
> --
> Arcadius Ahouansou
> Menelic Ltd | Applied Knowledge Is Power
> Office : +441444702101
> Mobile: +447908761999
> Web: www.menelic.com
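Erick's (6/2)+1 arithmetic is easy to check with a few lines of plain
Python -- a minimal sketch only, assuming the 3/3 split across two DCs
described above; none of the names below are real ZooKeeper or Solr APIs:

    # ZK write-quorum arithmetic for the 3 + 3 split in the thread.
    def majority(ensemble_size: int) -> int:
        """Live servers required before ZooKeeper accepts writes."""
        return ensemble_size // 2 + 1

    ensemble = 6                  # 3 ZK nodes in DC1 + 3 in DC2
    needed = majority(ensemble)   # (6 // 2) + 1 == 4

    for side, live in [("DC1", 3), ("DC2", 3)]:
        state = "writable" if live >= needed else "read-only"
        print(f"{side}: {live} of {ensemble} live, needs {needed} -> {state}")

    # Both sides print "read-only": neither partition has the 4 servers a
    # 6-node ensemble requires, so no updates happen on either side and
    # there is nothing to reconcile when the inter-DC link comes back.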
Re: Cross DC SolrCloud anti-patterns in presentation shalinmangar/cross-datacenter-replication-in-apache-solr-6
Hello Shawn.

Thank you very much for the comment.

On 24 June 2017 at 16:14, Shawn Heisey wrote:
> On 6/24/2017 2:14 AM, Arcadius Ahouansou wrote:
> > Interpretation 1:
>
> ZooKeeper doesn't *need* an odd number of servers, but there's no
> benefit to an even number. If you have 5 servers, two can go down. If
> you have 6 servers, you can still only lose two, so you might as well
> just run 5. You'd have fewer possible points of failure, less power
> usage, and less bandwidth usage.

About Slide 8 and the odd/even number of nodes...
What I meant is that on Slide 8, if you lose DC1, then your cluster will
not be able to recover after DC1 comes back, as there will be no clear
majority, and you will have:
- 3 ZK nodes with up-to-date data (that is, DC2+DC3) and
- 3 ZK nodes with out-of-date data (DC1).

But if you had only 2 ZK nodes in DC1, then you could afford to lose any
one of DC1, DC2 or DC3, and the cluster would be able to recover and be
OK.

Thank you very much.

Arcadius

--
Arcadius Ahouansou
Menelic Ltd | Applied Knowledge Is Power
Office : +441444702101
Mobile: +447908761999
Web: www.menelic.com
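Arcadius's point about dropping DC1 from 3 ZK nodes to 2 can be checked
with the same quorum arithmetic. A small sketch, assuming a 3/2/1 split
across DC1/DC2/DC3 (an assumption: the thread only says DC1 has 3 nodes
and DC2+DC3 have 3 between them); the function names are illustrative,
not any real API:

    # Does the ZK ensemble keep a quorum if one whole DC goes down?
    def majority(n: int) -> int:
        return n // 2 + 1

    def survives_single_dc_loss(layout: dict) -> dict:
        total = sum(layout.values())
        return {dc: (total - count) >= majority(total)
                for dc, count in layout.items()}

    slide_8 = {"DC1": 3, "DC2": 2, "DC3": 1}   # 6 nodes, majority = 4
    trimmed = {"DC1": 2, "DC2": 2, "DC3": 1}   # 5 nodes, majority = 3

    print(survives_single_dc_loss(slide_8))
    # {'DC1': False, 'DC2': True, 'DC3': True} -- losing DC1 kills quorum
    print(survives_single_dc_loss(trimmed))
    # {'DC1': True, 'DC2': True, 'DC3': True} -- any single DC can be lost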
Re: Cross DC SolrCloud anti-patterns in presentation shalinmangar/cross-datacenter-replication-in-apache-solr-6
On 6/24/2017 2:14 AM, Arcadius Ahouansou wrote:
> Interpretation 1:
>
> - On slides 6 and 7: Only 2 DCs are used, so the ZK quorum will not
> survive and recover after 1 DC failure
>
> - On slide 8: We have 3 DCs, which is OK for ZK.
> But we have 6 ZK nodes.
> This is a problem because ZK likes 3, 5, 7 ... odd nodes.

On both slide 6 and slide 7, Solr stays completely operational in DC1 if
DC2 goes down. It all falls apart if DC1 goes down. For clients that can
still reach them, the remaining Solr servers are read-only in that
situation.

Slide 8 is very similar -- if DC1 goes down, Solr is read-only. If either
DC2 or DC3 goes down, everything is fine for clients that can still get to
Solr. One additional consideration: if both DC2 and DC3 go down, then the
remaining Solr servers in DC1 are read-only.

ZooKeeper doesn't *need* an odd number of servers, but there's no benefit
to an even number. If you have 5 servers, two can go down. If you have 6
servers, you can still only lose two, so you might as well just run 5.
You'd have fewer possible points of failure, less power usage, and less
bandwidth usage.

The best minimum option is an odd number of data centers, minimum 3, with
one ZooKeeper in each location. For Solr, you want at least two servers,
which should be split evenly between at least two of those datacenter
locations. If you're really stuck with only two datacenters, then you can
follow the advice in the presentation: set up a full cloud in each
datacenter and use CDCR between them.

> Interpretation 2:
>
> Any SolrCloud deployment with "Remote SolrCloud nodes", i.e. SolrCloud
> not in the same DC as ZK, is deemed an anti-pattern (note that DCs can
> be just a couple of miles apart and could be connected by a high-speed
> network)

I'm not sure that this is actually true, but it does introduce latency and
more moving parts in the form of network connections between data centers
-- connections which might go down. I wouldn't do it, but I also wouldn't
automatically dismiss it as a viable setup, as long as it meets
ZooKeeper's requirements and there are two complete copies of the Solr
collections, each in different data centers.

Typical designs only stay viable if one datacenter goes down, but if you
were to use five datacenters and have enough Solr servers for three
complete copies of your collections, you could survive two datacenter
outages.

Thanks,
Shawn
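Shawn's closing point about five datacenters follows from the same
arithmetic: with one ZK node per DC, a 5-DC ensemble keeps its quorum
through any two simultaneous DC outages. A quick sketch to verify that
claim (plain arithmetic only, no real ZooKeeper API involved):

    from itertools import combinations

    # One ZK node per DC; check every pair of simultaneous DC outages.
    dcs = ["DC1", "DC2", "DC3", "DC4", "DC5"]
    needed = len(dcs) // 2 + 1            # majority of 5 is 3

    for down in combinations(dcs, 2):
        live = len(dcs) - len(down)
        assert live >= needed, f"quorum lost with {down} down"

    print("any two of five DCs can fail and ZK still has quorum (3 of 5)")
    # With 3 DCs and one node each, only a single-DC outage is survivable.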
Cross DC SolrCloud anti-patterns in presentation shalinmangar/cross-datacenter-replication-in-apache-solr-6
Hello.

This email is about the presentation at
https://www.slideshare.net/shalinmangar/cross-datacenter-replication-in-apache-solr-6

On slides 6, 7 and 8, there are 3 anti-patterns. It seems there are many
interpretations.

Interpretation 1:

- On slides 6 and 7: Only 2 DCs are used, so the ZK quorum will not
survive and recover after 1 DC failure

- On slide 8: We have 3 DCs, which is OK for ZK. But we have 6 ZK nodes.
This is a problem because ZK likes 3, 5, 7 ... odd numbers of nodes.

- These rules are dictated by ZooKeeper and apply to any system that
relies on ZooKeeper, such as Hadoop, Kafka, HBase, etc.

Interpretation 2:

Any SolrCloud deployment with "Remote SolrCloud nodes", i.e. SolrCloud not
in the same DC as ZK, is deemed an anti-pattern (note that DCs can be just
a couple of miles apart and could be connected by a high-speed network).

There are many more interpretations. It would be interesting to know the
correct interpretation.

Thank you very much.

Arcadius.
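The "ZK likes 3, 5, 7 ... odd nodes" rule raised above boils down to one
formula: a quorum needs floor(n/2)+1 live servers, so an even count raises
the required majority without raising the number of tolerable failures. A
small illustrative sketch:

    # Failure tolerance per ensemble size: tolerated = n - (n // 2 + 1).
    for n in range(3, 8):
        maj = n // 2 + 1        # live servers needed for quorum
        print(f"{n} ZK nodes: majority {maj}, tolerates {n - maj} failures")

    # 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2, 7 -> 3: going from 5 to 6 nodes
    # adds hardware but no extra fault tolerance.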