[ClusterLabs] Antw: Re: Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Ulrich Windl
>>> Antony Stone wrote on 04.08.2021 at 23:01 in message
<202108042301.19895.antony.st...@ha.open.source.it>:
> On Wednesday 04 August 2021 at 22:06:39, Frank D. Engel, Jr. wrote:
> 
>> There is no safe way to do what you are trying to do.
>> 
>> If the resource is on cluster A and contact is lost between clusters A
>> and B due to a network failure, how does cluster B know if the resource
>> is still running on cluster A or not?
>>
>> It has no way of knowing if cluster A is even up and running.
>> 
>> In that situation it cannot safely start the resource.
> 
> I am perfectly happy to have an additional machine at a third location in 
> order to avoid this split‑brain between two clusters.
> 
> However, what I cannot have is for the resources which should be running on 
> cluster A to get started on cluster B.
> 
> If cluster A is down, then its resources should simply not run - as happens 
> right now with two independent clusters.
> 
> Suppose for a moment I had three clusters at three locations: A, B and C.
> 
> Is there a method by which I can have:
> 
> 1. Cluster A resources running on cluster A if cluster A is functional and not 
> running anywhere if cluster A is non-functional.

If cluster A is non-functional, no resources of cluster A will run.

> 
> 2. Cluster B resources running on cluster B if cluster B is functional and not 
> running anywhere if cluster B is non-functional.

Likewise for cluster B.

> 
> 3. Cluster C resources running on cluster C if cluster C is functional and not 
> running anywhere if cluster C is non-functional.

Same here.

> 
> 4. Resource D running _somewhere_ on clusters A, B or C, but only a single 
> instance of D at a single location at any time.

Part of the problem is your description: actually you do not have a resource
D, but three resources like D_A, D_B, and D_C.

Maybe things would be easier if it were all one big cluster with location
constraints.


> 
> Requirements 1, 2 and 3 are easy to achieve ‑ don't connect the clusters.
> 
> Requirement 4 is the one I'm stuck with how to implement.
> 
> If the three nodes comprising cluster A can manage resources such that they 
> run on only one of the three nodes at any time, surely there must be a way of 
> doing the same thing with a resource running on one of three clusters?
> 
> 
> Antony.
> 
> -- 
> I don't know, maybe if we all waited then cosmic rays would write all our 
> software for us. Of course it might take a while.
> 
>  - Ron Minnich, Los Alamos National Laboratory
> 
>    Please reply to the list;
>    please *don't* CC me.


[ClusterLabs] Antw: Re: Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Ulrich Windl
>>> Antony Stone wrote on 04.08.2021 at 21:27 in message
<202108042127.43916.antony.st...@ha.open.source.it>:
> On Wednesday 04 August 2021 at 20:57:49, Strahil Nikolov wrote:
> 
>> That's why you need a qdisk at a 3rd location, so you will have 7 votes in
>> total. When 3 nodes in cityA die, all resources will be started on the
>> remaining 3 nodes.
> 
> I think I have not explained this properly.
> 
> I have three nodes in city A which run resources which have to run in city A.  
> They are based on IP addresses which are only valid on the network in city A.
> 
> I have three nodes in city B which run resources which have to run in city B.  
> They are based on IP addresses which are only valid on the network in city B.
> 
> I have redundant routing between my upstream provider, and cities A and B, so 
> that I only _need_ resources to be running in one of the two cities for 
> everything to work as required.  City A can go completely offline and not run 
> its resources, and everything I need continues to work via city B.
> 
> I now have an additional requirement to run a single resource at either city A 
> or city B but not both.
> 
> As soon as I connect the clusters at city A and city B, and apply the location 
> constraints and weighting rules you have suggested:
> 
> 1. everything works, including the single resource at either city A or city B, 
> so long as both clusters are operational.
> 
> 2. as soon as one cluster fails (all three of its nodes become 
> unavailable), then the other cluster stops running all its resources as well.  
> This is even with quorum=2.

Have you ever tried to find out why this happens? (Talking about logs)

> 
> This means I have lost the redundancy between my two clusters, which is based 
> on the expectation that only one cluster will fail at a time.  If the failure 
> of one automatically _causes_ the failure of the other, I have no high 
> availability any more.
> 
> What I require is for cluster A to continue running its own resources, plus 
> the single resource which can run anywhere, in the event that cluster B fails.
> 
> In other words, I need the exact same outcome as I have at present if cluster 
> B fails (its resources stop, cluster A is unaffected), except that cluster A 
> continues to run the single resource which I need just a single instance of.
> 
> It is impossible for the nodes at city A to run the resources which should be 
> running at city B, partly because some of them are identical ("Asterisk" as a 
> resource, for example, is already running at city A), and partly because some 
> of them are bound to the networking arrangements (I cannot set a floating IP 
> address which belongs in city A on a machine which exists in city B - it just 
> doesn't work).
> 
> Therefore if adding a seventh node at a third location would try to start 
> _all_ resources in city A if city B goes down, it is not a working solution.  
> If city B goes down then I simply do not want its resources to be running 
> anywhere, just the same as I have now with the two independent clusters.
> 
> 
> Thanks,
> 
> 
> Antony.
> 
> -- 
> "In fact I wanted to be John Cleese and it took me some time to realise that 
> the job was already taken."
> 
>  - Douglas Adams
> 
>    Please reply to the list;
>    please *don't* CC me.


Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Andrei Borzenkov
On 05.08.2021 00:01, Antony Stone wrote:
> On Wednesday 04 August 2021 at 22:06:39, Frank D. Engel, Jr. wrote:
> 
>> There is no safe way to do what you are trying to do.
>>
>> If the resource is on cluster A and contact is lost between clusters A
>> and B due to a network failure, how does cluster B know if the resource
>> is still running on cluster A or not?
>>
>> It has no way of knowing if cluster A is even up and running.
>>
>> In that situation it cannot safely start the resource.
> 
> I am perfectly happy to have an additional machine at a third location in 
> order to avoid this split-brain between two clusters.
> 
> However, what I cannot have is for the resources which should be running on 
> cluster A to get started on cluster B.
> 
> If cluster A is down, then its resources should simply not run - as happens 
> right now with two independent clusters.
> 
> Suppose for a moment I had three clusters at three locations: A, B and C.
> 
> Is there a method by which I can have:
> 
> 1. Cluster A resources running on cluster A if cluster A is functional and 
> not 
> running anywhere if cluster A is non-functional.
> 
> 2. Cluster B resources running on cluster B if cluster B is functional and 
> not 
> running anywhere if cluster B is non-functional.
> 
> 3. Cluster C resources running on cluster C if cluster C is functional and 
> not 
> running anywhere if cluster C is non-functional.
> 
> 4. Resource D running _somewhere_ on clusters A, B or C, but only a single 
> instance of D at a single location at any time.
> 
> Requirements 1, 2 and 3 are easy to achieve - don't connect the clusters.
> 
> Requirement 4 is the one I'm stuck with how to implement.
> 

You either have a single cluster and define appropriate location
constraints, or you have multiple clusters and configure a geo-cluster on
top of them. But you have already been told this multiple times.

> If the three nodes comprising cluster A can manage resources such that they 
> run on only one of the three nodes at any time, surely there must be a way of 
> doing the same thing with a resource running on one of three clusters?
> 
> 

You need something that coordinates resources between the three clusters, and
that is booth.
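
For reference, booth works by granting "tickets" from an arbitrator at a third
location; a minimal configuration sketch might look like this (the addresses,
port and ticket name are illustrative assumptions, not taken from this thread):

# /etc/booth/booth.conf - same file on both sites and on the arbitrator
transport="UDP"
port="9929"
site="192.0.2.10"          # booth address of the cluster in city A
site="198.51.100.10"       # booth address of the cluster in city B
arbitrator="203.0.113.10"  # third location; votes, but runs no resources
ticket="ticket-D"          # whoever holds this ticket may run resource D
    expire = 600

Each cluster then ties the floating resource to the ticket, e.g. with pcs:
pcs constraint ticket add ticket-D D loss-policy=stop
so D runs only at the site currently holding the ticket, while the purely
local resources are unaffected by the other site's fate.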


Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Strahil Nikolov via Users
I still can't understand why the whole cluster will fail when only 3 nodes are 
down and a qdisk is used.

CityA -> 3 nodes to run packageA -> 3 votes
CityB -> 3 nodes to run packageB -> 3 votes
CityC -> 1 node which cannot run any package (qdisk) -> 1 vote

Max votes: 7
Quorum: 4

As long as one city is up + qdisk -> your cluster will be working.
Then you just configure that packageA cannot run in CityB, packageB cannot run 
in CityA. If all nodes in a city die, the relevant package will be down.
Last, you configure your last resource without any location constraint.

PS: by package consider either a resource group or a single resource.
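
For what it's worth, that 7th vote would nowadays usually come from
corosync-qdevice/qnetd rather than an old-style qdisk. A minimal sketch of the
quorum section on the six cluster nodes might look like this (the qnetd host
name is an assumption for illustration):

quorum {
    provider: corosync_votequorum
    device {
        model: net
        votes: 1
        net {
            host: qnetd.cityc.example.com  # the CityC machine runs corosync-qnetd
            algorithm: ffsplit             # on an even split, exactly one half keeps quorum
        }
    }
}

The CityC machine is not a cluster member at all; it only answers quorum
queries, which is why a single small box is enough there.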

Best Regards,
Strahil Nikolov


Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Frank D. Engel, Jr.
In theory, if you had an independent voting infrastructure among the three 
clusters (effectively a second cluster layer interconnecting them to support 
resource D), you could have D running on one of the clusters so long as at 
least two of them could communicate with each other.



In other words, give each cluster one vote, then as long as two of them 
can communicate there are two votes which makes quorum, thus resource D 
can run on one of those two clusters.


If all three clusters lose contact with each other, then D still cannot 
safely run.



To keep the remaining resources working when contact is lost between the 
clusters, the vote for this would need to be independent of the vote 
within each individual cluster, effectively meaning that each node would 
belong to two clusters at once: its own local cluster (A/B/C) plus a 
"global" cluster spread across the three locations.  I don't know 
offhand if that is readily possible to support with the current software.



On 8/4/21 5:01 PM, Antony Stone wrote:

On Wednesday 04 August 2021 at 22:06:39, Frank D. Engel, Jr. wrote:


There is no safe way to do what you are trying to do.

If the resource is on cluster A and contact is lost between clusters A
and B due to a network failure, how does cluster B know if the resource
is still running on cluster A or not?

It has no way of knowing if cluster A is even up and running.

In that situation it cannot safely start the resource.

I am perfectly happy to have an additional machine at a third location in
order to avoid this split-brain between two clusters.

However, what I cannot have is for the resources which should be running on
cluster A to get started on cluster B.

If cluster A is down, then its resources should simply not run - as happens
right now with two independent clusters.

Suppose for a moment I had three clusters at three locations: A, B and C.

Is there a method by which I can have:

1. Cluster A resources running on cluster A if cluster A is functional and not
running anywhere if cluster A is non-functional.

2. Cluster B resources running on cluster B if cluster B is functional and not
running anywhere if cluster B is non-functional.

3. Cluster C resources running on cluster C if cluster C is functional and not
running anywhere if cluster C is non-functional.

4. Resource D running _somewhere_ on clusters A, B or C, but only a single
instance of D at a single location at any time.

Requirements 1, 2 and 3 are easy to achieve - don't connect the clusters.

Requirement 4 is the one I'm stuck with how to implement.

If the three nodes comprising cluster A can manage resources such that they
run on only one of the three nodes at any time, surely there must be a way of
doing the same thing with a resource running on one of three clusters?


Antony.





Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Antony Stone
On Wednesday 04 August 2021 at 22:06:39, Frank D. Engel, Jr. wrote:

> There is no safe way to do what you are trying to do.
> 
> If the resource is on cluster A and contact is lost between clusters A
> and B due to a network failure, how does cluster B know if the resource
> is still running on cluster A or not?
>
> It has no way of knowing if cluster A is even up and running.
> 
> In that situation it cannot safely start the resource.

I am perfectly happy to have an additional machine at a third location in 
order to avoid this split-brain between two clusters.

However, what I cannot have is for the resources which should be running on 
cluster A to get started on cluster B.

If cluster A is down, then its resources should simply not run - as happens 
right now with two independent clusters.

Suppose for a moment I had three clusters at three locations: A, B and C.

Is there a method by which I can have:

1. Cluster A resources running on cluster A if cluster A is functional and not 
running anywhere if cluster A is non-functional.

2. Cluster B resources running on cluster B if cluster B is functional and not 
running anywhere if cluster B is non-functional.

3. Cluster C resources running on cluster C if cluster C is functional and not 
running anywhere if cluster C is non-functional.

4. Resource D running _somewhere_ on clusters A, B or C, but only a single 
instance of D at a single location at any time.

Requirements 1, 2 and 3 are easy to achieve - don't connect the clusters.

Requirement 4 is the one I'm stuck with how to implement.

If the three nodes comprising cluster A can manage resources such that they 
run on only one of the three nodes at any time, surely there must be a way of 
doing the same thing with a resource running on one of three clusters?


Antony.

-- 
I don't know, maybe if we all waited then cosmic rays would write all our 
software for us. Of course it might take a while.

 - Ron Minnich, Los Alamos National Laboratory

   Please reply to the list;
 please *don't* CC me.


Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Frank D. Engel, Jr.

There is no safe way to do what you are trying to do.

If the resource is on cluster A and contact is lost between clusters A 
and B due to a network failure, how does cluster B know if the resource 
is still running on cluster A or not?


It has no way of knowing if cluster A is even up and running.

In that situation it cannot safely start the resource.


If the network is down and both clusters come up at the same time, 
without being able to contact each other, neither knows if the other is 
running the resource, so neither can safely start it.




On 8/4/21 3:27 PM, Antony Stone wrote:

On Wednesday 04 August 2021 at 20:57:49, Strahil Nikolov wrote:


That's why you need a qdisk at a 3-rd location, so you will have 7 votes in
total.When 3 nodes in cityA die, all resources will be started on the
remaining 3 nodes.

I think I have not explained this properly.

I have three nodes in city A which run resources which have to run in city A.
They are based on IP addresses which are only valid on the network in city A.

I have three nodes in city B which run resources which have to run in city B.
They are based on IP addresses which are only valid on the network in city B.

I have redundant routing between my upstream provider, and cities A and B, so
that I only _need_ resources to be running in one of the two cities for
everything to work as required.  City A can go completely offline and not run
its resources, and everything I need continues to work via city B.

I now have an additional requirement to run a single resource at either city A
or city B but not both.

As soon as I connect the clusters at city A and city B, and apply the location
contraints and weighting rules you have suggested:

1. everything works, including the single resource at either city A or city B,
so long as both clusters are operational.

2. as soon as one cluster fails (all three of its nodes nodes become
unavailable), then the other cluster stops running all its resources as well.
This is even with quorum=2.

This means I have lost the redundancy between my two clusters, which is based
on the expectation that only one cluster will fail at a time.  If the failure
of one automatically _causes_ the failure of the other, I have no high
availability any more.

What I require is for cluster A to continue running its own resources, plus
the single resource which can run anywhere, in the event that cluster B fails.

In other words, I need the exact same outcome as I have at present if cluster
B fails (its resources stop, cluster A is unaffected), except that cluster A
continues to run the single resource which I need just a single instance of.

It is impossible for the nodes at city A to run the resources which should be
running at city B, partly because some of them are identical ("Asterisk" as a
resource, for example, is already running at city A), and partly because some
of them are bound to the networking arrangements (I cannot set a floating IP
address which belongs in city A on a machine which exists in city B - it just
doesn't work).

Therefore if adding a seventh node at a third location would try to start
_all_ resources in city A if city B goes down, it is not a working solution.
If city B goes down then I simply do not want its resources to be running
anywhere, just the same as I have now with the two independent clusters.


Thanks,


Antony.





Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Antony Stone
On Wednesday 04 August 2021 at 20:57:49, Strahil Nikolov wrote:

> That's why you need a qdisk at a 3-rd location, so you will have 7 votes in
> total.When 3 nodes in cityA die, all resources will be started on the
> remaining 3 nodes.

I think I have not explained this properly.

I have three nodes in city A which run resources which have to run in city A.  
They are based on IP addresses which are only valid on the network in city A.

I have three nodes in city B which run resources which have to run in city B.  
They are based on IP addresses which are only valid on the network in city B.

I have redundant routing between my upstream provider, and cities A and B, so 
that I only _need_ resources to be running in one of the two cities for 
everything to work as required.  City A can go completely offline and not run 
its resources, and everything I need continues to work via city B.

I now have an additional requirement to run a single resource at either city A 
or city B but not both.

As soon as I connect the clusters at city A and city B, and apply the location 
contraints and weighting rules you have suggested:

1. everything works, including the single resource at either city A or city B, 
so long as both clusters are operational.

2. as soon as one cluster fails (all three of its nodes nodes become 
unavailable), then the other cluster stops running all its resources as well.  
This is even with quorum=2.

This means I have lost the redundancy between my two clusters, which is based 
on the expectation that only one cluster will fail at a time.  If the failure 
of one automatically _causes_ the failure of the other, I have no high 
availability any more.

What I require is for cluster A to continue running its own resources, plus 
the single resource which can run anywhere, in the event that cluster B fails.

In other words, I need the exact same outcome as I have at present if cluster 
B fails (its resources stop, cluster A is unaffected), except that cluster A 
continues to run the single resource which I need just a single instance of.

It is impossible for the nodes at city A to run the resources which should be 
running at city B, partly because some of them are identical ("Asterisk" as a 
resource, for example, is already running at city A), and partly because some 
of them are bound to the networking arrangements (I cannot set a floating IP 
address which belongs in city A on a machine which exists in city B - it just 
doesn't work).

Therefore if adding a seventh node at a third location would try to start 
_all_ resources in city A if city B goes down, it is not a working solution.  
If city B goes down then I simply do not want its resources to be running 
anywhere, just the same as I have now with the two independent clusters.


Thanks,


Antony.

-- 
"In fact I wanted to be John Cleese and it took me some time to realise that 
the job was already taken."

 - Douglas Adams

   Please reply to the list;
 please *don't* CC me.


Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Strahil Nikolov via Users
That's why you need a qdisk at a 3rd location, so you will have 7 votes in 
total. When 3 nodes in cityA die, all resources will be started on the remaining 
3 nodes.

Best Regards,
Strahil Nikolov

On Wed, Aug 4, 2021 at 17:23, Antony Stone wrote:

On Wednesday 04 August 2021 at 16:07:39, Andrei Borzenkov wrote:

> On Wed, Aug 4, 2021 at 5:03 PM Antony Stone wrote:
> > On Wednesday 04 August 2021 at 13:31:12, Andrei Borzenkov wrote:
> > > On Wed, Aug 4, 2021 at 1:48 PM Antony Stone wrote:
> > > > On Tuesday 03 August 2021 at 12:12:03, Strahil Nikolov via Users
> > > > wrote:
> > > > > Won't something like this work ? Each node in LA will have same
> > > > > score of 5000, while other cities will be -5000.
> > > > > 
> > > > > pcs constraint location DummyRes1 rule score=5000 city eq LA
> > > > > pcs constraint location DummyRes1 rule score=-5000 city ne LA
> > > > > stickiness -> 1
> > > > 
> > > > Thanks for the idea, but no difference.
> > > > 
> > > > Basically, as soon as zero nodes in one city are available, all
> > > > resources, including those running perfectly at the other city, stop.
> > > 
> > > That is not what you originally said.
> > > 
> > > You said you have 6 node cluster (3 + 3) and 2 nodes are not available.
> > 
> > No, I don't think I said that?
> 
> "With the new setup, if two machines in city A fail, then _both_
> clusters stop working"

Ah, apologies - that was a typo.  "With the new setup, if the machines in city 
A fail, then _both_ clusters stop working".

So, basically what I'm saying is that with two separate clusters, if one 
fails, the other keeps going (as one would expect).

Joining the two clusters together so that I can have a single floating resource 
which can run anywhere (as well as the exact same location-specific resources 
as before) results in one cluster failure taking the other cluster down too.

I need one fully-working 3-node cluster to keep going, no matter what the 
other cluster does.


Antony.

-- 
It is also possible that putting the birds in a laboratory setting 
inadvertently renders them relatively incompetent.

 - Daniel C Dennett

   Please reply to the list;
   please *don't* CC me.


[ClusterLabs] Pacemaker problems with pingd

2021-08-04 Thread Janusz Jaskiewicz
Hello.

Please forgive the length of this email but I wanted to provide as much
details as possible.

I'm trying to set up a cluster of two nodes for my service.
I have a problem with a scenario where the network between two nodes gets
broken and they can no longer see each other.
This causes split-brain.
I know that the proper way of implementing this would be to employ STONITH, but
it is not feasible for me now (I don't have necessary hardware support and
I don't want to introduce another point of failure by introducing shared
storage based STONITH).

In order to work around the split-brain scenario I introduced pingd to my
cluster, which in theory should do what I expect.
pingd pings a network device, so when the NIC is broken on one of my nodes,
this node should not run the resources because pingd would fail for it.

pingd resource is configured to update the value of variable 'pingd'
(interval: 5s, dampen: 3s, multiplier:1000).
Based on the value of pingd I have a location constraint which sets score
to -INFINITY for resource DimProdClusterIP when 'pingd' is not 1000.
All other resources are colocated with DimProdClusterIP, and
DimProdClusterIP should start before all other resources.
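
(For context, a ping clone plus location rule like the one described is
typically created with commands along these lines; this is a sketch
reconstructed from the configuration shown below, not necessarily the exact
commands used:

pcs resource create ping ocf:pacemaker:ping dampen=3s multiplier=1000 \
    host_list=193.30.22.33 op monitor interval=5s timeout=4s clone
pcs constraint location DimProdClusterIP rule score=-INFINITY pingd ne 1000

A commonly recommended variant of the rule is "not_defined pingd or pingd lt
1000", so that a node whose pingd attribute has not been set at all is
excluded as well.)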

Based on that setup I would expect that when the resources run on dimprod01
and I disconnect dimprod02 from the network, the resources will not start
on dimprod02.
Unfortunately I see that after a token interval + consensus interval my
resources are brought up for a moment and then go down again.
This is undesirable, as it causes DRBD split-brain inconsistency and
cluster IP may also be taken over by the node which is down.

I tried to debug it, but I can't figure out why it doesn't work.
I would appreciate any help/pointers.


Following are some details of my setup and snippet of pacemaker logs with
comments:

Setup details:

pcs status:
Cluster name: dimprodcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: dimprod02 (version 2.0.5-9.el8_4.1-ba59be7122) - partition
with quorum
  * Last updated: Tue Aug  3 08:20:32 2021
  * Last change:  Mon Aug  2 18:24:39 2021 by root via cibadmin on dimprod01
  * 2 nodes configured
  * 8 resource instances configured

Node List:
  * Online: [ dimprod01 dimprod02 ]

Full List of Resources:
  * DimProdClusterIP (ocf::heartbeat:IPaddr2): Started dimprod01
  * WyrDimProdServer (systemd:wyr-dim): Started dimprod01
  * Clone Set: WyrDimProdServerData-clone [WyrDimProdServerData]
(promotable):
* Masters: [ dimprod01 ]
* Slaves: [ dimprod02 ]
  * WyrDimProdFS (ocf::heartbeat:Filesystem): Started dimprod01
  * DimTestClusterIP (ocf::heartbeat:IPaddr2): Started dimprod01
  * Clone Set: ping-clone [ping]:
* Started: [ dimprod01 dimprod02 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


pcs constraint
Location Constraints:
  Resource: DimProdClusterIP
Constraint: location-DimProdClusterIP
  Rule: score=-INFINITY
Expression: pingd ne 1000
Ordering Constraints:
  start DimProdClusterIP then promote WyrDimProdServerData-clone
(kind:Mandatory)
  promote WyrDimProdServerData-clone then start WyrDimProdFS
(kind:Mandatory)
  start WyrDimProdFS then start WyrDimProdServer (kind:Mandatory)
  start WyrDimProdServer then start DimTestClusterIP (kind:Mandatory)
Colocation Constraints:
  WyrDimProdServer with DimProdClusterIP (score:INFINITY)
  DimTestClusterIP with DimProdClusterIP (score:INFINITY)
  WyrDimProdServerData-clone with DimProdClusterIP (score:INFINITY)
(with-rsc-role:Master)
  WyrDimProdFS with DimProdClusterIP (score:INFINITY)
Ticket Constraints:


pcs resource config ping
 Resource: ping (class=ocf provider=pacemaker type=ping)
  Attributes: dampen=3s host_list=193.30.22.33 multiplier=1000
  Operations: monitor interval=5s timeout=4s (ping-monitor-interval-5s)
  start interval=0s timeout=60s (ping-start-interval-0s)
  stop interval=0s timeout=5s (ping-stop-interval-0s)



cat /etc/corosync/corosync.conf
totem {
version: 2
cluster_name: dimprodcluster
transport: knet
crypto_cipher: aes256
crypto_hash: sha256
token: 1
interface {
knet_ping_interval: 1000
knet_ping_timeout: 1000
}
}

nodelist {
node {
ring0_addr: dimprod01
name: dimprod01
nodeid: 1
}

node {
ring0_addr: dimprod02
name: dimprod02
nodeid: 2
}
}

quorum {
provider: corosync_votequorum
two_node: 1
}

logging {
to_logfile: yes
logfile: /var/log/cluster/corosync.log
to_syslog: yes
timestamp: on
debug:on
}



Logs:
When the network is connected 'pingd' takes value of 1000:

Aug 03 08:23:01 dimprod02.my.clustertest.com pacemaker-attrd [2827046]
(attrd_client_update) debug: Broadcasting pingd[dimprod02]=1000 (writer)
Aug 03 08:23:01 dimprod02.my.clustertest.com attrd_updater   [3369856]
(pcmk__node_attr_request) debug: Asked pacemaker-attrd to update pingd=1000
for dimprod02: OK 

Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Antony Stone
On Wednesday 04 August 2021 at 16:07:39, Andrei Borzenkov wrote:

> On Wed, Aug 4, 2021 at 5:03 PM Antony Stone wrote:
> > On Wednesday 04 August 2021 at 13:31:12, Andrei Borzenkov wrote:
> > > On Wed, Aug 4, 2021 at 1:48 PM Antony Stone wrote:
> > > > On Tuesday 03 August 2021 at 12:12:03, Strahil Nikolov via Users
> > > > wrote:
> > > > > Won't something like this work ? Each node in LA will have same
> > > > > score of 5000, while other cities will be -5000.
> > > > > 
> > > > > pcs constraint location DummyRes1 rule score=5000 city eq LA
> > > > > pcs constraint location DummyRes1 rule score=-5000 city ne LA
> > > > > stickiness -> 1
> > > > 
> > > > Thanks for the idea, but no difference.
> > > > 
> > > > Basically, as soon as zero nodes in one city are available, all
> > > > resources, including those running perfectly at the other city, stop.
> > > 
> > > That is not what you originally said.
> > > 
> > > You said you have 6 node cluster (3 + 3) and 2 nodes are not available.
> > 
> > No, I don't think I said that?
> 
> "With the new setup, if two machines in city A fail, then _both_
> clusters stop working"

Ah, apologies - that was a typo.  "With the new setup, if the machines in city 
A fail, then _both_ clusters stop working".

So, basically what I'm saying is that with two separate clusters, if one 
fails, the other keeps going (as one would expect).

Joining the two clusters together so that I can have a single floating resource 
which can run anywhere (as well as the exact same location-specific resources 
as before) results in one cluster failure taking the other cluster down too.

I need one fully-working 3-node cluster to keep going, no matter what the 
other cluster does.


Antony.

-- 
It is also possible that putting the birds in a laboratory setting 
inadvertently renders them relatively incompetent.

 - Daniel C Dennett

   Please reply to the list;
 please *don't* CC me.


Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Andrei Borzenkov
On Wed, Aug 4, 2021 at 5:03 PM Antony Stone wrote:
>
> On Wednesday 04 August 2021 at 13:31:12, Andrei Borzenkov wrote:
>
> > On Wed, Aug 4, 2021 at 1:48 PM Antony Stone wrote:
> > > On Tuesday 03 August 2021 at 12:12:03, Strahil Nikolov via Users wrote:
> > > > Won't something like this work ? Each node in LA will have same score
> > > > of 5000, while other cities will be -5000.
> > > >
> > > > pcs constraint location DummyRes1 rule score=5000 city eq LA
> > > > pcs constraint location DummyRes1 rule score=-5000 city ne LA
> > > > stickiness -> 1
> > >
> > > Thanks for the idea, but no difference.
> > >
> > > Basically, as soon as zero nodes in one city are available, all
> > > resources, including those running perfectly at the other city, stop.
> >
> > That is not what you originally said.
> >
> > You said you have 6 node cluster (3 + 3) and 2 nodes are not available.
>
> No, I don't think I said that?
>

"With the new setup, if two machines in city A fail, then _both_
clusters stop working"


Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Antony Stone
On Wednesday 04 August 2021 at 13:31:12, Andrei Borzenkov wrote:

> On Wed, Aug 4, 2021 at 1:48 PM Antony Stone wrote:
> > On Tuesday 03 August 2021 at 12:12:03, Strahil Nikolov via Users wrote:
> > > Won't something like this work ? Each node in LA will have same score
> > > of 5000, while other cities will be -5000.
> > > 
> > > pcs constraint location DummyRes1 rule score=5000 city eq LA
> > > pcs constraint location DummyRes1 rule score=-5000 city ne LA
> > > stickiness -> 1
> > 
> > Thanks for the idea, but no difference.
> > 
> > Basically, as soon as zero nodes in one city are available, all
> > resources, including those running perfectly at the other city, stop.
> 
> That is not what you originally said.
> 
> You said you have 6 node cluster (3 + 3) and 2 nodes are not available.

No, I don't think I said that?

With the new setup, if 2 nodes are not available, everything carries on 
working; it doesn't matter whether the two nodes are in the same or different 
locations.  That's fine.

My problem is that with the new setup, if three nodes at one location go down, 
then *everything* stops, including the resources I want to carry on running at 
the other location.

Under my previous, working arrangement with two separate clusters, one data 
centre going down does not affect the other, therefore I have a fully working 
system (since the two data centres provide identical services with redundant 
routing).

A failure of one data centre taking down working services in the other data 
centre is not the high availability solution I'm looking for - it's more like 
high unavailability :)


Antony.

-- 
BASIC is to computer languages what Roman numerals are to arithmetic.

   Please reply to the list;
 please *don't* CC me.


Re: [ClusterLabs] Antw: [EXT] Moving resource only one way

2021-08-04 Thread Ervin Hegedüs
Hi Strahil,

On Wed, Aug 04, 2021 at 10:17:26AM +, Strahil Nikolov wrote:
> When you move/migrate resources without the --lifetime option, cluster stack 
> will set +|-INFINITY on the host. (+ -> when migrating to, - -> when 
> migrating away without specifying destination host)
> Take a look at:
> https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_move_resources_manually.html

in the meantime I found this page, and it helped to clarify the
situation.


Thanks,


a.



Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Andrei Borzenkov
On Wed, Aug 4, 2021 at 1:48 PM Antony Stone wrote:
>
> On Tuesday 03 August 2021 at 12:12:03, Strahil Nikolov via Users wrote:
>
> > Won't something like this work ? Each node in LA will have same score of
> > 5000, while other cities will be -5000.
> >
> > pcs constraint location DummyRes1 rule score=5000 city eq LA
> > pcs constraint location DummyRes1 rule score=-5000 city ne LA
> > stickiness -> 1
>
> Thanks for the idea, but no difference.
>
> Basically, as soon as zero nodes in one city are available, all resources,
> including those running perfectly at the other city, stop.
>

That is not what you originally said.

You said you have 6 node cluster (3 + 3) and 2 nodes are not available.

If you lose half of the nodes and do not have working fencing then this is
expected behavior (in the default configuration). You may configure the
cluster to keep running resources, but you cannot configure the cluster to
take over resources without fencing (well, you can, but ...)

> I'm going to look into booth as suggested by others.
>
> Thanks,
>
>
> Antony.
>
> --
> Atheism is a non-prophet-making organisation.
>
>Please reply to the list;
>  please *don't* CC me.


Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Antony Stone
On Tuesday 03 August 2021 at 12:12:03, Strahil Nikolov via Users wrote:

> Won't something like this work ? Each node in LA will have same score of
> 5000, while other cities will be -5000.
>
> pcs constraint location DummyRes1 rule score=5000 city eq LA
> pcs constraint location DummyRes1 rule score=-5000 city ne LA
> stickiness -> 1

Thanks for the idea, but no difference.

Basically, as soon as zero nodes in one city are available, all resources, 
including those running perfectly at the other city, stop.

I'm going to look into booth as suggested by others.

Thanks,


Antony.

-- 
Atheism is a non-prophet-making organisation.

   Please reply to the list;
 please *don't* CC me.


Re: [ClusterLabs] Antw: [EXT] Moving resource only one way

2021-08-04 Thread Strahil Nikolov via Users
When you move/migrate resources without the --lifetime option, the cluster 
stack will set +|-INFINITY on the host (+ -> when migrating to, - -> when 
migrating away without specifying a destination host).
Take a look at:
https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_move_resources_manually.html
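
For illustration, a time-limited move with crm_resource (which is what the pcs
command drives underneath) might look like this; the resource and node names
are made up:

# move DummyRes1 to node2; the generated constraint expires after 10 minutes
crm_resource --resource DummyRes1 --move --node node2 --lifetime PT10M

# or, after a plain move, drop the leftover +/-INFINITY constraint again
# (in recent Pacemaker; older versions call this --un-move)
crm_resource --resource DummyRes1 --clear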
Best Regards,
Strahil Nikolov

On Tue, Aug 3, 2021 at 22:16, Ervin Hegedüs wrote:

Hi,

On Tue, Aug 03, 2021 at 05:46:51PM +, Strahil Nikolov via Users wrote:
> Yes. INFINITY = 1000000 (one million), -INFINITY = -1000000 (negative one million).
> Set stickiness > 100.


hmm... it's interesting.

I've found the documentation that I made for these systems, but
there isn't any line for "location" settings.

How did I get it there?

I reviewed the configured systems (there are three pairs), and one
pair still does not have this line, but two of them do.



Thanks,

a.
  


[ClusterLabs] Corosync 3.1.5 is available at corosync.org!

2021-08-04 Thread Jan Friesse

I am pleased to announce the latest maintenance release of Corosync
3.1.5, available immediately from the GitHub release section at
https://github.com/corosync/corosync/releases or our website at
http://build.clusterlabs.org/corosync/releases/.

This release contains important bugfixes for cfgtool and support for 
cgroup v2. Please see the corosync.conf(5) man page for more information 
about cgroup v2, because cgroup v2 is very different from cgroup v1, and 
systems with the CONFIG_RT_GROUP_SCHED kernel option enabled may experience 
problems with systemd logging or an inability to enable the cpu controller.
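
For reference, the behaviour is controlled by the move_to_root_cgroup option
in the system section of corosync.conf; the snippet below is a sketch based on
the changelog, so check corosync.conf(5) for the authoritative description:

system {
    # yes / no / auto - "auto" only moves corosync to the root cgroup
    # when that is needed to obtain real-time scheduling under cgroup v2
    move_to_root_cgroup: auto
}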


Complete changelog for 3.1.5:

Christine Caulfield (1):
  knet: Fix node status display

Jan Friesse (9):
  main: Add support for cgroup v2 and auto mode
  totemconfig: Do not process totem.nodeid
  cfgtool: Check existence of at least one of nodeid
  totemconfig: Put autogenerated nodeid back to cmap
  cfgtool: Set nodeid indexes after sort
  cfgtool: Fix brief mode display of localhost
  cfgtool: Use CS_PRI_NODE_ID for formatting nodeid
  totemconfig: Ensure all knet hosts has a nodeid
  totemconfig: Knet nodeid must be < 65536

Upgrade is highly recommended.

Thanks/congratulations to all people that contributed to achieve this 
great milestone.




Re: [ClusterLabs] Sub-clusters / super-clusters?

2021-08-04 Thread Jan Friesse

On 03/08/2021 10:40, Antony Stone wrote:

On Tuesday 11 May 2021 at 12:56:01, Strahil Nikolov wrote:


Here is the example I had promised:

pcs node attribute server1 city=LA
pcs node attribute server2 city=NY

# Don't run on any node that is not in LA
pcs constraint location DummyRes1 rule score=-INFINITY city ne LA

#Don't run on any node that is not in NY
pcs constraint location DummyRes2 rule score=-INFINITY city ne NY

The idea is that if you add a node and you forget to specify the attribute
with the name 'city', DummyRes1 & DummyRes2 won't be started on it.

For resources that do not have a constraint based on the city -> they will
run everywhere unless you specify a colocation constraint between the
resources.


Excellent - thanks.  I happen to use crmsh rather than pcs, but I've adapted
the above and got it working.
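
For other crmsh users, the adaptation might look roughly like this (a sketch
of equivalent crmsh commands, not the exact configuration used here):

crm node attribute server1 set city LA
crm node attribute server2 set city NY

# Don't run on any node that is not in LA / NY
crm configure location loc-DummyRes1 DummyRes1 rule -inf: city ne LA
crm configure location loc-DummyRes2 DummyRes2 rule -inf: city ne NY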

Unfortunately, there is a problem.

My current setup is:

One 3-machine cluster in city A running a bunch of resources between them, the
most important of which for this discussion is Asterisk telephony.

One 3-machine cluster in city B doing exactly the same thing.

The two clusters have no knowledge of each other.

I have high-availability routing between my clusters and my upstream telephony
provider, such that a call can be handled by Cluster A or Cluster B, and if
one is unavailable, the call gets routed to the other.

Thus, a total failure of Cluster A means I still get phone calls, via Cluster
B.


To implement the above "one resource which can run anywhere, but only a single
instance", I joined together clusters A and B, and placed the corresponding
location constraints on the resources I want only at A and the ones I want
only at B.  I then added the resource with no location constraint, and it runs
anywhere, just once.

So far, so good.


The problem is:

With the two independent clusters, if two machines in city A fail, then
Cluster A fails completely (no quorum), and Cluster B continues working.  That
means I still get phone calls.

With the new setup, if two machines in city A fail, then _both_ clusters stop
working and I have no functional resources anywhere.


So, my question now is:

How can I have a 3-machine Cluster A running local resources, and a 3-machine
Cluster B running local resources, plus one resource running on either Cluster
A or Cluster B, but without a failure of one cluster causing _everything_ to
stop?


Yes, it's called geo-clustering (multi-site) - 
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_high_availability_clusters/assembly_configuring-multisite-cluster-configuring-and-managing-high-availability-clusters


(ignore the doc being for RHEL; other distributions with booth will work the 
same way)


Regards,
  Honza




Thanks,


Antony.


