Re: [ClusterLabs] Sub‑clusters / super‑clusters - working :)

2021-08-07 Thread Strahil Nikolov via Users
>Because Asterisk at cityA is bound to a floating IP address, which is held 
>on one of the three machines in cityA. I can't run Asterisk on all 
>three machines there because only one of them has the IP address.
That's not true. You can use a cloned IP resource with 'globally-unique=true', 
which runs the IP on every node; the cluster determines which node responds 
(controlled via iptables) and the others never reply.
It's quite useful for reducing failover time.
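
A rough sketch of what that could look like in crm shell syntax (illustrative
only; the resource names are made up and the address is simply reused from the
earlier example, neither was posted in this thread):

# Every node runs an instance of the clone; iptables/CLUSTERIP hashing
# decides which instance answers a given client, the others stay silent.
primitive ClusterIP IPaddr2 \
    params ip=192.168.32.250 cidr_netmask=24 clusterip_hash=sourceip \
    op monitor interval=10s
clone ClusterIP-clone ClusterIP \
    meta clone-max=3 clone-node-max=3 globally-unique=true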

Best Regards,
Strahil Nikolov
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Sub‑clusters / super‑clusters - working :)

2021-08-06 Thread Antony Stone
On Friday 06 August 2021 at 15:12:57, Andrei Borzenkov wrote:

> On Fri, Aug 6, 2021 at 3:42 PM Antony Stone wrote:
> > On Friday 06 August 2021 at 14:14:09, Andrei Borzenkov wrote:
> > > 
> > > If connectivity between (any two) sites is lost you may end up with
> > > one of A or B going out of quorum.
> > 
> > Agreed.
> > 
> > > While this will stop active resources and restart them on another site,
> > 
> > No.  Resources do not start on the "wrong" site because of:
> > location pref_A GroupA rule -inf: site ne cityA
> > location pref_B GroupB rule -inf: site ne cityB
> > 
> > The resources in GroupA either run in cityA or they do not run at all.
> 
> Where did I say anything about group A or B? You have a single resource
> that can migrate between sites:
> 
> location no_pref Anywhere rule -inf: site ne cityA and site ne cityB

In fact that rule turns out to be unnecessary, because of:

colocation Ast 100: Anywhere [ GroupA GroupB ]

(apologies for the typo the first time I posted that, corrected in my previous 
reply to this one).

This ensures that the "Anywhere" resource group runs either on the machine 
which is running the "GroupA" group or the one which is running the "GroupB" 
group.  This is an added bonus which I find useful, so that only one machine at 
each site is running all the resources at that site.

> I have no idea what "Asterisk in cityA" means because I see only one
> resource named Asterisk which is not restricted to a single site
> according to your configuration.

Ah, I see the confusion.  I used Asterisk as a simple resource in my example, 
as the thing I wanted to run just once, somewhere.

In fact, for the real setup, where GroupA and GroupB each comprise 10 
resources, and the Anywhere group comprises two, Asterisk is one of the 10 
resources which do run at both sites.

> The only resource that allegedly can migrate between sites in the
> configuration you have shown so far is Asterisk.

Yes, in my example documented here.

> Now you say this resource never migrates between sites.

Yes, for my real configuration, which contains 10 resources (one of which is 
Asterisk) in each of GroupA and GroupB, and is therefore over-complicated to 
quote as a proof-of-concept here.

> I'm not sure how helpful this will be to anyone reading archives because I
> completely lost all track of what you tried to achieve.

That can be expressed very simply:

1. A group of resources named GroupA which either run in cityA or do not run 
at all.

2. A group of resources named GroupB which either run in cityB or do not run 
at all.

3. A group of resources named Anywhere which run in either cityA or cityB but 
not both.


Antony.

-- 
Numerous psychological studies over the years have demonstrated that the 
majority of people genuinely believe they are not like the majority of people.

   Please reply to the list;
 please *don't* CC me.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Sub‑clusters / super‑clusters - working :)

2021-08-06 Thread Andrei Borzenkov
On Fri, Aug 6, 2021 at 3:42 PM Antony Stone wrote:
>
> On Friday 06 August 2021 at 14:14:09, Andrei Borzenkov wrote:
>
> > On Thu, Aug 5, 2021 at 3:44 PM Antony Stone wrote:
> > >
> > > For anyone interested in the detail of how to do this (without needing
> > > booth), here is my cluster.conf file, as in "crm configure load replace
> > > cluster.conf":
> > >
> > > 
> > > node tom attribute site=cityA
> > > node dick attribute site=cityA
> > > node harry attribute site=cityA
> > >
> > > node fred attribute site=cityB
> > > node george attribute site=cityB
> > > node ron attribute site=cityB
> > >
> > > primitive A-float IPaddr2 params ip=192.168.32.250 cidr_netmask=24 meta
> > > migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20
> > > on-fail=restart
> > > primitive B-float IPaddr2 params ip=192.168.42.250 cidr_netmask=24 meta
> > > migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20
> > > on-fail=restart
> > > primitive Asterisk asterisk meta migration-threshold=3 failure-timeout=60
> > > op monitor interval=5 timeout=20 on-fail=restart
> > >
> > > group GroupA A-float4  resource-stickiness=100
> > > group GroupB B-float4  resource-stickiness=100
> > > group Anywhere Asterisk resource-stickiness=100
> > >
> > > location pref_A GroupA rule -inf: site ne cityA
> > > location pref_B GroupB rule -inf: site ne cityB
> > > location no_pref Anywhere rule -inf: site ne cityA and site ne cityB
> > >
> > > colocation Ast 100: Anywhere [ cityA cityB ]
> >
> > You define a resource set, but there are no resources cityA or cityB,
> > at least you do not show them. So it is not quite clear what this
> > colocation does.
>
> Apologies - I had used different names in my test setup, and converted them to
> cityA etc for the sake of continuity in this discussion.
>
> That should be:
>
> colocation Ast 100: Anywhere [ GroupA GroupB ]
>
> > > property cib-bootstrap-options: stonith-enabled=no no-quorum-policy=stop
> >
> > If connectivity between (any two) sites is lost you may end up with
> > one of A or B going out of quorum.
>
> Agreed.
>
> > While this will stop active resources and restart them on another site,
>
> No.  Resources do not start on the "wrong" site because of:
>
> location pref_A GroupA rule -inf: site ne cityA
> location pref_B GroupB rule -inf: site ne cityB
>
> The resources in GroupA either run in cityA or they do not run at all.
>

Where did I say anything about group A or B? You have a single resource
that can migrate between sites:

location no_pref Anywhere rule -inf: site ne cityA and site ne cityB

> > there is no coordination between stopping and starting so for some time
> > resources will be active on both sites. It is up to you to evaluate whether
> > this matters.
>
> Any resource which tried to start at the wrong site would simply fail, because
> the IP addresses involved do not work at the "other" site.
>
> > If this matters your solution does not protect against it.
> >
> > If this does not matter, the usual response is - why do you need a
> > cluster in the first place? Why not simply always run asterisk on both
> > sites all the time?
>
> Because Asterisk at cityA is bound to a floating IP address, which is held on
> one of the three machines in cityA.  I can't run Asterisk on all three
> machines there because only one of them has the IP address.
>

I have no idea what "Asterisk in cityA" means because I see only one
resource named Asterisk which is not restricted to a single site
according to your configuration.

> Asterisk _does_ normally run on both sites all the time, but only on one
> machine at each site.
>

The only resource that allegedly can migrate between sites in the
configuration you have shown so far is Asterisk. Now you say this
resource never migrates between sites. I'm not sure how helpful this
will be to anyone reading archives because I completely lost all track
of what you tried to achieve.

> > > start-failure-is-fatal=false cluster-recheck-interval=60s
> > > 
> > >
> > > Of course, the group definitions are not needed for single resources, but
> > > I shall in practice be using multiple resources which do need groups, so
> > > I wanted to ensure I was creating something which would work with that.
> >
> > > I have tested it by:
> > ...
> > >  - causing a network failure at one city (so it simply disappears without
> > > stopping corosync neatly): the other city continues its resources (plus
> > > the "anywhere" resource), the isolated city stops
> >
> > If the site is completely isolated it probably does not matter whether
> > anything is active there. It is partial connectivity loss where it
> > becomes interesting.
>
> Agreed, however my testing shows that resources which I want running in cityA
> are either running there or they're not (they never move to cityB or cityC),
> similarly for cityB, and the resources I want just a single instance of are
> doing just that, and on the same machine at cityA or cityB as the local
> resources are running on.

Re: [ClusterLabs] Sub‑clusters / super‑clusters - working :)

2021-08-06 Thread Antony Stone
On Friday 06 August 2021 at 14:14:09, Andrei Borzenkov wrote:

> On Thu, Aug 5, 2021 at 3:44 PM Antony Stone wrote:
> > 
> > For anyone interested in the detail of how to do this (without needing
> > booth), here is my cluster.conf file, as in "crm configure load replace
> > cluster.conf":
> > 
> > 
> > node tom attribute site=cityA
> > node dick attribute site=cityA
> > node harry attribute site=cityA
> > 
> > node fred attribute site=cityB
> > node george attribute site=cityB
> > node ron attribute site=cityB
> > 
> > primitive A-float IPaddr2 params ip=192.168.32.250 cidr_netmask=24 meta
> > migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20
> > on-fail=restart
> > primitive B-float IPaddr2 params ip=192.168.42.250 cidr_netmask=24 meta
> > migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20
> > on-fail=restart
> > primitive Asterisk asterisk meta migration-threshold=3 failure-timeout=60
> > op monitor interval=5 timeout=20 on-fail=restart
> > 
> > group GroupA A-float4  resource-stickiness=100
> > group GroupB B-float4  resource-stickiness=100
> > group Anywhere Asterisk resource-stickiness=100
> > 
> > location pref_A GroupA rule -inf: site ne cityA
> > location pref_B GroupB rule -inf: site ne cityB
> > location no_pref Anywhere rule -inf: site ne cityA and site ne cityB
> > 
> > colocation Ast 100: Anywhere [ cityA cityB ]
> 
> You define a resource set, but there are no resources cityA or cityB,
> at least you do not show them. So it is not quite clear what this
> colocation does.

Apologies - I had used different names in my test setup, and converted them to 
cityA etc for the sake of continuity in this discussion.

That should be:

colocation Ast 100: Anywhere [ GroupA GroupB ]

> > property cib-bootstrap-options: stonith-enabled=no no-quorum-policy=stop
> 
> If connectivity between (any two) sites is lost you may end up with
> one of A or B going out of quorum.

Agreed.

> While this will stop active resources and restart them on another site,

No.  Resources do not start on the "wrong" site because of:

location pref_A GroupA rule -inf: site ne cityA
location pref_B GroupB rule -inf: site ne cityB

The resources in GroupA either run in cityA or they do not run at all.

> there is no coordination between stopping and starting so for some time
> resources will be active on both sites. It is up to you to evaluate whether
> this matters.

Any resource which tried to start at the wrong site would simply fail, because 
the IP addresses involved do not work at the "other" site.

> If this matters your solution does not protect against it.
> 
> If this does not matter, the usual response is - why do you need a
> cluster in the first place? Why not simply always run asterisk on both
> sites all the time?

Because Asterisk at cityA is bound to a floating IP address, which is held on 
one of the three machines in cityA.  I can't run Asterisk on all three 
machines there because only one of them has the IP address.

Asterisk _does_ normally run on both sites all the time, but only on one 
machine at each site.

> > start-failure-is-fatal=false cluster-recheck-interval=60s
> > 
> > 
> > Of course, the group definitions are not needed for single resources, but
> > I shall in practice be using multiple resources which do need groups, so
> > I wanted to ensure I was creating something which would work with that.
> 
> > I have tested it by:
> ...
> >  - causing a network failure at one city (so it simply disappears without
> > stopping corosync neatly): the other city continues its resources (plus
> > the "anywhere" resource), the isolated city stops
> 
> If the site is completely isolated it probably does not matter whether
> anything is active there. It is partial connectivity loss where it
> becomes interesting.

Agreed, however my testing shows that resources which I want running in cityA 
are either running there or they're not (they never move to cityB or cityC), 
similarly for cityB, and the resources I want just a single instance of are 
doing just that, and on the same machine at cityA or cityB as the local 
resources are running on.


Thanks for the feedback,


Antony.

-- 
"Measuring average network latency is about as useful as measuring the mean 
temperature of patients in a hospital."

 - Stéphane Bortzmeyer

   Please reply to the list;
 please *don't* CC me.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Sub‑clusters / super‑clusters - working :)

2021-08-06 Thread Andrei Borzenkov
On Thu, Aug 5, 2021 at 3:44 PM Antony Stone wrote:
>
> On Thursday 05 August 2021 at 10:51:37, Antony Stone wrote:
>
> > On Thursday 05 August 2021 at 07:48:37, Ulrich Windl wrote:
> > >
> > > Have you ever tried to find out why this happens? (Talking about logs)
> >
> > Not in detail, no, but just in case there's a chance of getting this
> > working as suggested simply using location constraints, I shall look
> > further.
>
> I now have a working solution - thank you to everyone who has helped.
>
> The answer to the problem above was simple - with a 6-node cluster, 3 votes is
> not quorum.
>
> I added a 7th node (in "city C") and adjusted the location constraints to
> ensure that cluster A resources run in city A, cluster B resources run in city
> B, and the "anywhere" resource runs in either city A or city B.
>
> I've even added a colocation constraint to ensure that the "anywhere" resource
> runs on the same machine in either city A or city B as is running the local
> resources there (which wasn't a strict requirement, but is very useful).
>
> For anyone interested in the detail of how to do this (without needing booth),
> here is my cluster.conf file, as in "crm configure load replace cluster.conf":
>
> 
> node tom attribute site=cityA
> node dick attribute site=cityA
> node harry attribute site=cityA
>
> node fred attribute site=cityB
> node george attribute site=cityB
> node ron attribute site=cityB
>
> primitive A-float IPaddr2 params ip=192.168.32.250 cidr_netmask=24 meta
> migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20
> on-fail=restart
> primitive B-float IPaddr2 params ip=192.168.42.250 cidr_netmask=24 meta
> migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20
> on-fail=restart
> primitive Asterisk asterisk meta migration-threshold=3 failure-timeout=60 op
> monitor interval=5 timeout=20 on-fail=restart
>
> group GroupA A-float4  resource-stickiness=100
> group GroupB B-float4  resource-stickiness=100
> group Anywhere Asterisk resource-stickiness=100
>
> location pref_A GroupA rule -inf: site ne cityA
> location pref_B GroupB rule -inf: site ne cityB
> location no_pref Anywhere rule -inf: site ne cityA and site ne cityB
>
> colocation Ast 100: Anywhere [ cityA cityB ]
>

You define a resource set, but there are no resources cityA or cityB,
at least you do not show them. So it is not quite clear what this
colocation does.

> property cib-bootstrap-options: stonith-enabled=no no-quorum-policy=stop

If connectivity between (any two) sites is lost you may end up with
one of A or B going out of quorum. While this will stop active
resources and restart them on another site, there is no coordination
between stopping and starting so for some time resources will be
active on both sites. It is up to you to evaluate whether this
matters.

If this matters your solution does not protect against it.

If this does not matter, the usual response is - why do you need a
cluster in the first place? Why not simply always run asterisk on both
sites all the time?


> start-failure-is-fatal=false cluster-recheck-interval=60s
> 
>
> Of course, the group definitions are not needed for single resources, but I
> shall in practice be using multiple resources which do need groups, so I
> wanted to ensure I was creating something which would work with that.
>
> I have tested it by:
>
...
>  - causing a network failure at one city (so it simply disappears without
> stopping corosync neatly): the other city continues its resources (plus the
> "anywhere" resource), the isolated city stops
>

If the site is completely isolated it probably does not matter whether
anything is active there. It is partial connectivity loss where it
becomes interesting.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Sub‑clusters / super‑clusters - working :)

2021-08-05 Thread Antony Stone
On Thursday 05 August 2021 at 15:44:18, Ulrich Windl wrote:

> Hi!
> 
> Nice to hear. What could be "interesting" is how stable the WAN-type of
> corosync communication turns out to be.

Well, between cityA and cityB it should be pretty good, because these are two 
data centres on opposite sides of England run by the same hosting provider 
(with private dark fibre between them, not dependent on the Internet).

> If it's not that stable, the cluster could try to fence nodes rather
> frequently. OK, you disabled fencing; maybe it works without.

I'm going to find out :)

> Did you tune the parameters?

No:

a) I only just got it working today :)

b) I got it working on a bunch of VMs in my own personal hosting environment; 
I haven't tried it in the real data centres yet.

At the moment I regard it as a Proof of Concept to show that the design works.
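
For what it's worth, the sort of corosync totem tuning sometimes applied to
higher-latency links looks roughly like this (a sketch only; the values are
illustrative and nothing here has been tested against these data centres):

totem {
    token: 10000            # ms a token may be missing before a node is declared lost
    token_retransmits_before_loss_const: 10
    join: 1000              # ms to wait for join messages
    consensus: 12000        # must exceed token (default is 1.2 x token)
}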


Antony.

-- 
Heisenberg, Gödel, and Chomsky walk in to a bar.
Heisenberg says, "Clearly this is a joke, but how can we work out if it's 
funny or not?"
Gödel replies, "We can't know that because we're inside the joke."
Chomsky says, "Of course it's funny. You're just saying it wrong."

   Please reply to the list;
 please *don't* CC me.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Sub‑clusters / super‑clusters - working :)

2021-08-05 Thread Antony Stone
On Thursday 05 August 2021 at 10:51:37, Antony Stone wrote:

> On Thursday 05 August 2021 at 07:48:37, Ulrich Windl wrote:
> > 
> > Have you ever tried to find out why this happens? (Talking about logs)
> 
> Not in detail, no, but just in case there's a chance of getting this
> working as suggested simply using location constraints, I shall look
> further.

I now have a working solution - thank you to everyone who has helped.

The answer to the problem above was simple - with a 6-node cluster, 3 votes is 
not quorum.

I added a 7th node (in "city C") and adjusted the location constraints to 
ensure that cluster A resources run in city A, cluster B resources run in city 
B, and the "anywhere" resource runs in either city A or city B.
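
For anyone following the arithmetic: corosync's votequorum gives each node one
vote by default, so with seven votes quorum is floor(7/2) + 1 = 4, which is why
either three-node city plus the cityC node stays quorate. A hedged
corosync.conf sketch (illustrative only, not my actual file):

quorum {
    provider: corosync_votequorum
    expected_votes: 7       # normally derived from the nodelist; shown here for clarity
}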

I've even added a colocation constraint to ensure that the "anywhere" resource 
runs on the same machine in either city A or city B as is running the local 
resources there (which wasn't a strict requirement, but is very useful).

For anyone interested in the detail of how to do this (without needing booth), 
here is my cluster.conf file, as in "crm configure load replace cluster.conf":


node tom attribute site=cityA
node dick attribute site=cityA
node harry attribute site=cityA

node fred attribute site=cityB
node george attribute site=cityB
node ron attribute site=cityB

primitive A-float IPaddr2 params ip=192.168.32.250 cidr_netmask=24 meta 
migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20
on-fail=restart
primitive B-float IPaddr2 params ip=192.168.42.250 cidr_netmask=24 meta 
migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20
on-fail=restart
primitive Asterisk asterisk meta migration-threshold=3 failure-timeout=60 op 
monitor interval=5 timeout=20 on-fail=restart

group GroupA A-float4  resource-stickiness=100
group GroupB B-float4  resource-stickiness=100
group Anywhere Asterisk resource-stickiness=100

location pref_A GroupA rule -inf: site ne cityA
location pref_B GroupB rule -inf: site ne cityB
location no_pref Anywhere rule -inf: site ne cityA and site ne cityB

colocation Ast 100: Anywhere [ cityA cityB ]

property cib-bootstrap-options: stonith-enabled=no no-quorum-policy=stop 
start-failure-is-fatal=false cluster-recheck-interval=60s


Of course, the group definitions are not needed for single resources, but I 
shall in practice be using multiple resources which do need groups, so I 
wanted to ensure I was creating something which would work with that.
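
As a side note on the "node ... attribute site=..." lines: the same attribute
can also be inspected or set from the shell. A sketch using the node names
above (standard crmsh / pacemaker commands):

# Set the permanent "site" attribute on a node, equivalent to the
# "node tom attribute site=cityA" line in the configuration:
crm node attribute tom set site cityA

# The lower-level pacemaker tool does the same thing:
crm_attribute --type nodes --node tom --name site --update cityA

# Check what the cluster currently holds:
crm_attribute --type nodes --node tom --name site --query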

I have tested it by:

 - bringing up one node at a time: as soon as any 4 nodes are running, all 
possible resources are running

 - bringing up 5 or more nodes: all resources run

 - taking down one node at a time to a maximum of three nodes offline: if at 
least one node in a given city is running, the resources at that city are 
running

 - turning off (using "halt", so that corosync dies nicely) all three nodes in 
a city simultaneously: that city's resources stop running, the other city 
continues working, as well as the "anywhere" resource

 - causing a network failure at one city (so it simply disappears without 
stopping corosync neatly): the other city continues its resources (plus the 
"anywhere" resource), the isolated city stops

For me, this is the solution I wanted, and in fact it's even slightly better 
than the previous two isolated 3-node clusters I had, because I can now have 
resources running on a single active node in cityA (provided it can see at 
least 3 other nodes in cityB or cityC), which wasn't possible before.
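
In case it helps anyone reproducing the tests above, this is roughly how the
failures can be simulated (a sketch only; the exact commands I used on my test
VMs may differ, and the iptables line is just one way of isolating a site):

ssh tom halt                                   # clean shutdown: corosync leaves gracefully
ssh fred 'systemctl stop corosync pacemaker'   # stop just the cluster stack on one node
# Simulate a network failure isolating cityA (drop corosync traffic, port 5405/udp):
for n in tom dick harry; do
    ssh "$n" 'iptables -I INPUT -p udp --dport 5405 -j DROP'
done
crm_mon -1                                     # observe where the resources end up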


Once again, thanks to everyone who has helped me to achieve this result :)


Antony.

-- 
"The future is already here.   It's just not evenly distributed yet."

 - William Gibson

   Please reply to the list;
 please *don't* CC me.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/