Re: Cross DC SolrCloud anti-patterns in presentation shalinmangar/cross-datacenter-replication-in-apache-solr-6

2017-07-11 Thread Erick Erickson
bq: if you lose DC1, then your cluster will not be able to recover
after DC1 comes back as there will be no clear majority

When ZK loses its majority, no indexing takes place. So in the case where
you have 3 nodes in DC1 and 3 nodes in DC2, _neither_ side would
allow updates if the connection was cut for any reason, since updates
require 4 live ZK servers ((6/2)+1) to be available in this scenario.
When the connection was restored, there'd be nothing to reconcile
and Solr should recover just fine.
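
A quick way to see the arithmetic is to compute the quorum size and check
each side of the cut against it. A minimal sketch in Python, assuming a
plain "count the live voters" model (the DC labels are just for
illustration):

    # Illustrative only -- not part of Solr or ZooKeeper.
    def quorum_size(n):
        """Votes needed for ZK to accept writes: floor(n/2) + 1."""
        return n // 2 + 1

    # 6 ZK nodes split 3/3 across the two sides of the broken link.
    placement = {"DC1": 3, "DC2+DC3": 3}
    total = sum(placement.values())      # 6
    needed = quorum_size(total)          # (6 // 2) + 1 == 4

    for side, live in placement.items():
        status = "has quorum" if live >= needed else "NO quorum, read-only"
        print(f"{side}: {live} of {total} voters reachable, need {needed} -> {status}")

Both sides come out read-only, which is why there is nothing to reconcile
once the link comes back.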

The whole ZK majority thing is about data consistency. Since querying
doesn't change the index at all there's no consistency problem here to
reconcile after the connection is restored. And since quorum was lost,
no updates are allowed.

Best,
Erick

On Mon, Jul 10, 2017 at 5:08 PM, Arcadius Ahouansou wrote:
> Hello Shawn.
>
> Thank you very much for the comment.
>
> On 24 June 2017 at 16:14, Shawn Heisey  wrote:
>
>> On 6/24/2017 2:14 AM, Arcadius Ahouansou wrote:
>> > Interpretation 1:
>>
>> ZooKeeper doesn't *need* an odd number of servers, but there's no
>> benefit to an even number.  If you have 5 servers, two can go down.  If
>> you have 6 servers, you can still only lose two, so you might as well
>> just run 5.  You'd have fewer possible points of failure, less power
>> usage, and less bandwidth usage.
>>
>>
> About Slide 8 and the odd/even number of nodes...
> What I meant is that on Slide 8, if you lose DC1, then your cluster will
> not be able to recover after DC1 comes back, as there will be no clear
> majority, and you will have:
> -  3 ZK nodes with up-to-date data (that is DC2+DC3) and
> -  3 ZK nodes with out-of-date data (DC1).
>
> But if you had only 2 ZK nodes in DC1, then you could afford to lose any
> one of DC1, DC2 or DC3, and the cluster would be able to recover and be
> OK.
>
>
> Thank you very much.
>
>
> Arcadius
>
> --
> Arcadius Ahouansou
> Menelic Ltd | Applied Knowledge Is Power
> Office : +441444702101
> Mobile: +447908761999
> Web: www.menelic.com
> ---


Re: Cross DC SolrCloud anti-patterns in presentation shalinmangar/cross-datacenter-replication-in-apache-solr-6

2017-07-10 Thread Arcadius Ahouansou
Hello Shawn.

Thank you very much for the comment.

On 24 June 2017 at 16:14, Shawn Heisey  wrote:

> On 6/24/2017 2:14 AM, Arcadius Ahouansou wrote:
> > Interpretation 1:
>
> ZooKeeper doesn't *need* an odd number of servers, but there's no
> benefit to an even number.  If you have 5 servers, two can go down.  If
> you have 6 servers, you can still only lose two, so you might as well
> just run 5.  You'd have fewer possible points of failure, less power
> usage, and less bandwidth usage.
>
>
About Slide 8 and the odd/even number of nodes...
What I meant is that on Slide 8, if you lose DC1, then your cluster will
not be able to recover after DC1 comes back, as there will be no clear
majority, and you will have:
-  3 ZK nodes with up-to-date data (that is DC2+DC3) and
-  3 ZK nodes with out-of-date data (DC1).

But if you had only 2 ZK nodes in DC1, then you could afford to lose any
one of DC1, DC2 or DC3, and the cluster would be able to recover and be
OK.
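
To put numbers on that, here is a minimal sketch; the 3/2/1 and 2/2/1
placements are my own assumptions about how the 6 and 5 ZK nodes would be
spread over the three DCs, not something taken from the slides:

    # Illustrative only: can the ensemble keep quorum after losing any one DC?
    def quorum_size(n):
        return n // 2 + 1

    def survives_any_single_dc_loss(placement):
        total = sum(placement.values())
        need = quorum_size(total)
        return all(total - down >= need for down in placement.values())

    print(survives_any_single_dc_loss({"DC1": 3, "DC2": 2, "DC3": 1}))  # 6 nodes -> False
    print(survives_any_single_dc_loss({"DC1": 2, "DC2": 2, "DC3": 1}))  # 5 nodes -> True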


Thank you very much.


Arcadius

-- 
Arcadius Ahouansou
Menelic Ltd | Applied Knowledge Is Power
Office : +441444702101
Mobile: +447908761999
Web: www.menelic.com
---


Re: Cross DC SolrCloud anti-patterns in presentation shalinmangar/cross-datacenter-replication-in-apache-solr-6

2017-06-24 Thread Shawn Heisey
On 6/24/2017 2:14 AM, Arcadius Ahouansou wrote:
> Interpretation 1:
>
> - On slides 6 and 7: Only 2 DCs are used, so the ZK quorum will not survive
> and recover after a 1-DC failure.
>
> - On slide 8: We have 3 DCs, which is OK for ZK.
> But we have 6 ZK nodes.
> This is a problem because ZK likes 3, 5, 7 ... an odd number of nodes.

On both slide 6 and slide 7, Solr stays completely operational in DC1 if
DC2 goes down.  It all falls apart if DC1 goes down.  For clients that
can still reach them, the remaining Solr servers are read only in that
situation.

Slide 8 is very similar -- if DC1 goes down, Solr is read only.  If
either DC2 or DC3 goes down, everything is fine for clients that can
still get to Solr.  One additional consideration: If both DC2 and DC3 go
down, then the remaining Solr servers in DC1 are read only.

ZooKeeper doesn't *need* an odd number of servers, but there's no
benefit to an even number.  If you have 5 servers, two can go down.  If
you have 6 servers, you can still only lose two, so you might as well
just run 5.  You'd have fewer possible points of failure, less power
usage, and less bandwidth usage.
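
To make the "no benefit to an even number" point concrete, a small sketch
of the quorum arithmetic (illustrative only):

    # Failures tolerated = n - (floor(n/2) + 1)
    for n in range(3, 8):
        quorum = n // 2 + 1
        print(f"{n} servers: quorum {quorum}, can lose {n - quorum}")

    # 3 servers: quorum 2, can lose 1
    # 4 servers: quorum 3, can lose 1
    # 5 servers: quorum 3, can lose 2
    # 6 servers: quorum 4, can lose 2
    # 7 servers: quorum 4, can lose 3

The even sizes buy no extra tolerance over the odd size just below them.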

The best minimal option is an odd number of data centers (minimum 3),
with one ZooKeeper in each location.  For Solr, you want at least two
servers, which should be split evenly between at least two of those
datacenter locations.
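
For concreteness, one way that layout shows up on the Solr side is the
zkHost connection string, with one voter per DC and a chroot. A minimal
sketch in Python; the hostnames are made up for illustration:

    # Hypothetical hosts: one ZooKeeper voter in each of three DCs.
    zk_hosts = ",".join([
        "zk1.dc1.example.com:2181",
        "zk1.dc2.example.com:2181",
        "zk1.dc3.example.com:2181",
    ]) + "/solr"   # chroot so the ensemble can be shared with other apps

    # Each Solr node (at least one in two different DCs) would be pointed
    # at this same string via its zkHost setting.
    print(zk_hosts)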

If you're really stuck with only two datacenters, then you can follow
the advice in the presentation: Set up a full cloud in each datacenter
and use CDCR between them.

> Interpretation 2:
>
> Any SolrCloud deployment with "Remote SolrCloud nodes", i.e. SolrCloud nodes
> not in the same DC as ZK, is deemed an anti-pattern (note that DCs can be
> just a couple of miles apart and could be connected by a high-speed network).

I'm not sure that this is actually true, but it does introduce latency
and more moving parts in the form of network connections between data
centers -- connections which might go down.  I wouldn't do it, but I
also wouldn't automatically dismiss it as a viable setup, as long as it
meets ZooKeeper's requirements and there are two complete copies of the
Solr collections, each in different data centers.

Typical designs only stay viable if one datacenter goes down, but if you
were to use five datacenters and have enough Solr servers for three
complete copies of your collections, you could survive two data center
outages.
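
As a sanity check on that claim, a minimal sketch; the placement of the
three complete collection copies across the five DCs is my own made-up
example:

    # Illustrative only: ZK quorum plus at least one complete collection copy
    # must survive every possible pair of DC outages.
    from itertools import combinations

    def survives(zk_per_dc, copies_per_dc, lost):
        total_zk = sum(zk_per_dc.values())
        live_zk = sum(v for dc, v in zk_per_dc.items() if dc not in lost)
        zk_ok = live_zk >= total_zk // 2 + 1
        copies_left = sum(v for dc, v in copies_per_dc.items() if dc not in lost)
        return zk_ok and copies_left >= 1

    zk = {f"DC{i}": 1 for i in range(1, 6)}                      # one ZK per DC
    copies = {"DC1": 1, "DC2": 1, "DC3": 1, "DC4": 0, "DC5": 0}  # 3 full copies

    print(all(survives(zk, copies, set(pair)) for pair in combinations(zk, 2)))
    # True -- any two of the five DCs can go down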

Thanks,
Shawn



Cross DC SolrCloud anti-patterns in presentation shalinmangar/cross-datacenter-replication-in-apache-solr-6

2017-06-24 Thread Arcadius Ahouansou
Hello.


This email is about the presentation at
https://www.slideshare.net/shalinmangar/cross-datacenter-replication-in-apache-solr-6

On slides 6, 7 and 8, there are 3 anti-patterns.

It seems there are many interpretations.

Interpretation 1:

- On slides 6 and 7: Only 2 DCs are used, so the ZK quorum will not survive
and recover after a 1-DC failure.

- On slide 8: We have 3 DCs, which is OK for ZK.
But we have 6 ZK nodes.
This is a problem because ZK likes 3, 5, 7 ... an odd number of nodes.

- Those rules are dictated by ZooKeeper and will apply when using any
system that relies on ZooKeeper, such as Hadoop, Kafka, HBase, etc.


Interpretation 2:

Any SolrCloud deployment with "Remote SolrCloud nodes", i.e. SolrCloud nodes
not in the same DC as ZK, is deemed an anti-pattern (note that DCs can be
just a couple of miles apart and could be connected by a high-speed network).


There are many more interpretations.


It would be interesting to know the correct interpretation.


Thank you very much.

Arcadius.