[ClusterLabs] Antw: Re: Antw: [EXT] Re: Sub‑clusters / super‑clusters?
>>> Antony Stone wrote on 04.08.2021 at 23:01 in message
<202108042301.19895.antony.st...@ha.open.source.it>:
> On Wednesday 04 August 2021 at 22:06:39, Frank D. Engel, Jr. wrote:
>
>> There is no safe way to do what you are trying to do.
>>
>> If the resource is on cluster A and contact is lost between clusters A
>> and B due to a network failure, how does cluster B know if the resource
>> is still running on cluster A or not?
>>
>> It has no way of knowing if cluster A is even up and running.
>>
>> In that situation it cannot safely start the resource.
>
> I am perfectly happy to have an additional machine at a third location in
> order to avoid this split-brain between two clusters.
>
> However, what I cannot have is for the resources which should be running
> on cluster A to get started on cluster B.
>
> If cluster A is down, then its resources should simply not run - as
> happens right now with two independent clusters.
>
> Suppose for a moment I had three clusters at three locations: A, B and C.
>
> Is there a method by which I can have:
>
> 1. Cluster A resources running on cluster A if cluster A is functional
> and not running anywhere if cluster A is non-functional.

If cluster A is non-functional, no resources of cluster A will run.

> 2. Cluster B resources running on cluster B if cluster B is functional
> and not running anywhere if cluster B is non-functional.

Likewise for cluster B.

> 3. Cluster C resources running on cluster C if cluster C is functional
> and not running anywhere if cluster C is non-functional.

Same here.

> 4. Resource D running _somewhere_ on clusters A, B or C, but only a
> single instance of D at a single location at any time.

Part of the problem is your description: you do not actually have one
resource D; you have three resources, something like D_A, D_B and D_C.
Maybe things would be easier if it were all one big cluster with location
constraints.

> Requirements 1, 2 and 3 are easy to achieve - don't connect the clusters.
>
> Requirement 4 is the one I'm stuck with how to implement.
>
> If the three nodes comprising cluster A can manage resources such that
> they run on only one of the three nodes at any time, surely there must be
> a way of doing the same thing with a resource running on one of three
> clusters?
>
> Antony.
>
> --
> I don't know, maybe if we all waited then cosmic rays would write all our
> software for us. Of course it might take a while.
>
> - Ron Minnich, Los Alamos National Laboratory
>
> Please reply to the list; please *don't* CC me.
[ClusterLabs] Antw: Re: Antw: [EXT] Re: Sub‑clusters / super‑clusters?
>>> Antony Stone wrote on 04.08.2021 at 21:27 in message
<202108042127.43916.antony.st...@ha.open.source.it>:
> On Wednesday 04 August 2021 at 20:57:49, Strahil Nikolov wrote:
>
>> That's why you need a qdisk at a 3-rd location, so you will have 7 votes
>> in total. When 3 nodes in cityA die, all resources will be started on
>> the remaining 3 nodes.
>
> I think I have not explained this properly.
>
> I have three nodes in city A which run resources which have to run in
> city A. They are based on IP addresses which are only valid on the
> network in city A.
>
> I have three nodes in city B which run resources which have to run in
> city B. They are based on IP addresses which are only valid on the
> network in city B.
>
> I have redundant routing between my upstream provider, and cities A and
> B, so that I only _need_ resources to be running in one of the two cities
> for everything to work as required. City A can go completely offline and
> not run its resources, and everything I need continues to work via city B.
>
> I now have an additional requirement to run a single resource at either
> city A or city B but not both.
>
> As soon as I connect the clusters at city A and city B, and apply the
> location constraints and weighting rules you have suggested:
>
> 1. everything works, including the single resource at either city A or
> city B, so long as both clusters are operational.
>
> 2. as soon as one cluster fails (all three of its nodes become
> unavailable), then the other cluster stops running all its resources as
> well. This is even with quorum=2.

Have you ever tried to find out why this happens? (Talking about logs.)

> This means I have lost the redundancy between my two clusters, which is
> based on the expectation that only one cluster will fail at a time. If
> the failure of one automatically _causes_ the failure of the other, I
> have no high availability any more.
>
> What I require is for cluster A to continue running its own resources,
> plus the single resource which can run anywhere, in the event that
> cluster B fails.
>
> In other words, I need the exact same outcome as I have at present if
> cluster B fails (its resources stop, cluster A is unaffected), except
> that cluster A continues to run the single resource which I need just a
> single instance of.
>
> It is impossible for the nodes at city A to run the resources which
> should be running at city B, partly because some of them are identical
> ("Asterisk" as a resource, for example, is already running at city A),
> and partly because some of them are bound to the networking arrangements
> (I cannot set a floating IP address which belongs in city A on a machine
> which exists in city B - it just doesn't work).
>
> Therefore if adding a seventh node at a third location would try to start
> _all_ resources in city A if city B goes down, it is not a working
> solution.
>
> If city B goes down then I simply do not want its resources to be running
> anywhere, just the same as I have now with the two independent clusters.
>
> Thanks,
>
> Antony.
>
> --
> "In fact I wanted to be John Cleese and it took me some time to realise
> that the job was already taken."
>
> - Douglas Adams
>
> Please reply to the list; please *don't* CC me.
Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?
On 05.08.2021 00:01, Antony Stone wrote:
> On Wednesday 04 August 2021 at 22:06:39, Frank D. Engel, Jr. wrote:
>
>> There is no safe way to do what you are trying to do.
>>
>> If the resource is on cluster A and contact is lost between clusters A
>> and B due to a network failure, how does cluster B know if the resource
>> is still running on cluster A or not?
>>
>> It has no way of knowing if cluster A is even up and running.
>>
>> In that situation it cannot safely start the resource.
>
> I am perfectly happy to have an additional machine at a third location in
> order to avoid this split-brain between two clusters.
>
> However, what I cannot have is for the resources which should be running
> on cluster A to get started on cluster B.
>
> If cluster A is down, then its resources should simply not run - as
> happens right now with two independent clusters.
>
> Suppose for a moment I had three clusters at three locations: A, B and C.
>
> Is there a method by which I can have:
>
> 1. Cluster A resources running on cluster A if cluster A is functional
> and not running anywhere if cluster A is non-functional.
>
> 2. Cluster B resources running on cluster B if cluster B is functional
> and not running anywhere if cluster B is non-functional.
>
> 3. Cluster C resources running on cluster C if cluster C is functional
> and not running anywhere if cluster C is non-functional.
>
> 4. Resource D running _somewhere_ on clusters A, B or C, but only a
> single instance of D at a single location at any time.
>
> Requirements 1, 2 and 3 are easy to achieve - don't connect the clusters.
>
> Requirement 4 is the one I'm stuck with how to implement.

You either have a single cluster and define appropriate location
constraints, or you have multiple clusters and configure a geo-cluster on
top of them. But you have already been told this multiple times.

> If the three nodes comprising cluster A can manage resources such that
> they run on only one of the three nodes at any time, surely there must be
> a way of doing the same thing with a resource running on one of three
> clusters?

You need something that coordinates resources between the three clusters,
and that is booth.
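For the archives: a minimal booth sketch of requirement 4 might look like
the following, assuming pcs is in use; the addresses (10.0.0.1/10.0.0.2 for
the two sites, 10.0.0.3 for the arbitrator) and the resource name
"ResourceD" are placeholders:

    # create booth.conf and authkey for two sites plus one arbitrator,
    # then distribute them (pcs booth sync covers the local cluster;
    # the other site and the arbitrator need the same files)
    pcs booth setup sites 10.0.0.1 10.0.0.2 arbitrators 10.0.0.3
    pcs booth sync

    # define a ticket that exactly one site can hold at a time
    pcs booth ticket add ticket-D

    # in each cluster: run the booth daemon under cluster control,
    # bound to that site's own address
    pcs booth create ip 10.0.0.1

    # tie ResourceD to the ticket: it runs only where ticket-D is granted
    pcs constraint ticket add ticket-D ResourceD loss-policy=stop

    # initial grant to one site
    pcs booth ticket grant ticket-D

With loss-policy=stop, a site that loses the ticket stops ResourceD rather
than being fenced, which matches the "its resources should simply not run"
requirement above.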
Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?
I still can't understand why the whole cluster will fail when only 3 nodes
are down and a qdisk is used.

CityA -> 3 nodes to run packageA -> 3 votes
CityB -> 3 nodes to run packageB -> 3 votes
CityC -> 1 node which cannot run any package (qdisk) -> 1 vote

Max votes: 7
Quorum: 4

As long as one city is up + qdisk, your cluster will be working. Then you
just configure that packageA cannot run in CityB, and packageB cannot run
in CityA. If all nodes in a city die, the relevant package will be down.
Last, you configure your last resource without any location constraint.

PS: by package consider either a resource group or a single resource.

Best Regards,
Strahil Nikolov
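A sketch of the vote layout described here, using corosync-qdevice for the
third-site vote (the hostname is a placeholder; see corosync-qdevice(8)):

    quorum {
        provider: corosync_votequorum
        device {
            votes: 1
            model: net
            net {
                host: qnetd.cityc.example.com
                # ffsplit gives its vote to exactly one half of a 50/50 split
                algorithm: ffsplit
            }
        }
    }

With 3 + 3 nodes plus this one qdevice vote, either city can lose all of
its nodes and the surviving city still reaches the 4-of-7 quorum.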
Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?
In theory if you could have an independent voting infrastructure among the
three clusters which serves to effectively create a second cluster
infrastructure interconnecting them to support resource D, you could have D
running on one of the clusters so long as at least two of them can
communicate with each other.

In other words, give each cluster one vote; then as long as two of them can
communicate there are two votes, which makes quorum, thus resource D can
run on one of those two clusters.

If all three clusters lose contact with each other, then D still cannot
safely run.

To keep the remaining resources working when contact is lost between the
clusters, the vote for this would need to be independent of the vote within
each individual cluster, effectively meaning that each node would belong to
two clusters at once: its own local cluster (A/B/C) plus a "global" cluster
spread across the three locations.

I don't know offhand if that is readily possible to support with the
current software.

On 8/4/21 5:01 PM, Antony Stone wrote:
> On Wednesday 04 August 2021 at 22:06:39, Frank D. Engel, Jr. wrote:
>
>> There is no safe way to do what you are trying to do.
>>
>> If the resource is on cluster A and contact is lost between clusters A
>> and B due to a network failure, how does cluster B know if the resource
>> is still running on cluster A or not?
>>
>> It has no way of knowing if cluster A is even up and running.
>>
>> In that situation it cannot safely start the resource.
>
> I am perfectly happy to have an additional machine at a third location in
> order to avoid this split-brain between two clusters.
>
> However, what I cannot have is for the resources which should be running
> on cluster A to get started on cluster B.
>
> If cluster A is down, then its resources should simply not run - as
> happens right now with two independent clusters.
>
> Suppose for a moment I had three clusters at three locations: A, B and C.
>
> Is there a method by which I can have:
>
> 1. Cluster A resources running on cluster A if cluster A is functional
> and not running anywhere if cluster A is non-functional.
>
> 2. Cluster B resources running on cluster B if cluster B is functional
> and not running anywhere if cluster B is non-functional.
>
> 3. Cluster C resources running on cluster C if cluster C is functional
> and not running anywhere if cluster C is non-functional.
>
> 4. Resource D running _somewhere_ on clusters A, B or C, but only a
> single instance of D at a single location at any time.
>
> Requirements 1, 2 and 3 are easy to achieve - don't connect the clusters.
>
> Requirement 4 is the one I'm stuck with how to implement.
>
> If the three nodes comprising cluster A can manage resources such that
> they run on only one of the three nodes at any time, surely there must be
> a way of doing the same thing with a resource running on one of three
> clusters?
>
> Antony.
Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?
On Wednesday 04 August 2021 at 22:06:39, Frank D. Engel, Jr. wrote:

> There is no safe way to do what you are trying to do.
>
> If the resource is on cluster A and contact is lost between clusters A
> and B due to a network failure, how does cluster B know if the resource
> is still running on cluster A or not?
>
> It has no way of knowing if cluster A is even up and running.
>
> In that situation it cannot safely start the resource.

I am perfectly happy to have an additional machine at a third location in
order to avoid this split-brain between two clusters.

However, what I cannot have is for the resources which should be running on
cluster A to get started on cluster B.

If cluster A is down, then its resources should simply not run - as happens
right now with two independent clusters.

Suppose for a moment I had three clusters at three locations: A, B and C.

Is there a method by which I can have:

1. Cluster A resources running on cluster A if cluster A is functional and
not running anywhere if cluster A is non-functional.

2. Cluster B resources running on cluster B if cluster B is functional and
not running anywhere if cluster B is non-functional.

3. Cluster C resources running on cluster C if cluster C is functional and
not running anywhere if cluster C is non-functional.

4. Resource D running _somewhere_ on clusters A, B or C, but only a single
instance of D at a single location at any time.

Requirements 1, 2 and 3 are easy to achieve - don't connect the clusters.

Requirement 4 is the one I'm stuck with how to implement.

If the three nodes comprising cluster A can manage resources such that they
run on only one of the three nodes at any time, surely there must be a way
of doing the same thing with a resource running on one of three clusters?

Antony.

--
I don't know, maybe if we all waited then cosmic rays would write all our
software for us. Of course it might take a while.

- Ron Minnich, Los Alamos National Laboratory

Please reply to the list; please *don't* CC me.
Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?
There is no safe way to do what you are trying to do.

If the resource is on cluster A and contact is lost between clusters A and
B due to a network failure, how does cluster B know if the resource is
still running on cluster A or not?

It has no way of knowing if cluster A is even up and running.

In that situation it cannot safely start the resource.

If the network is down and both clusters come up at the same time, without
being able to contact each other, neither knows if the other is running the
resource, so neither can safely start it.

On 8/4/21 3:27 PM, Antony Stone wrote:
> On Wednesday 04 August 2021 at 20:57:49, Strahil Nikolov wrote:
>
>> That's why you need a qdisk at a 3-rd location, so you will have 7 votes
>> in total. When 3 nodes in cityA die, all resources will be started on
>> the remaining 3 nodes.
>
> I think I have not explained this properly.
>
> I have three nodes in city A which run resources which have to run in
> city A. They are based on IP addresses which are only valid on the
> network in city A.
>
> I have three nodes in city B which run resources which have to run in
> city B. They are based on IP addresses which are only valid on the
> network in city B.
>
> I have redundant routing between my upstream provider, and cities A and
> B, so that I only _need_ resources to be running in one of the two cities
> for everything to work as required. City A can go completely offline and
> not run its resources, and everything I need continues to work via city B.
>
> I now have an additional requirement to run a single resource at either
> city A or city B but not both.
>
> As soon as I connect the clusters at city A and city B, and apply the
> location constraints and weighting rules you have suggested:
>
> 1. everything works, including the single resource at either city A or
> city B, so long as both clusters are operational.
>
> 2. as soon as one cluster fails (all three of its nodes become
> unavailable), then the other cluster stops running all its resources as
> well. This is even with quorum=2.
>
> This means I have lost the redundancy between my two clusters, which is
> based on the expectation that only one cluster will fail at a time. If
> the failure of one automatically _causes_ the failure of the other, I
> have no high availability any more.
>
> What I require is for cluster A to continue running its own resources,
> plus the single resource which can run anywhere, in the event that
> cluster B fails.
>
> In other words, I need the exact same outcome as I have at present if
> cluster B fails (its resources stop, cluster A is unaffected), except
> that cluster A continues to run the single resource which I need just a
> single instance of.
>
> It is impossible for the nodes at city A to run the resources which
> should be running at city B, partly because some of them are identical
> ("Asterisk" as a resource, for example, is already running at city A),
> and partly because some of them are bound to the networking arrangements
> (I cannot set a floating IP address which belongs in city A on a machine
> which exists in city B - it just doesn't work).
>
> Therefore if adding a seventh node at a third location would try to start
> _all_ resources in city A if city B goes down, it is not a working
> solution.
>
> If city B goes down then I simply do not want its resources to be running
> anywhere, just the same as I have now with the two independent clusters.
>
> Thanks,
>
> Antony.
Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?
On Wednesday 04 August 2021 at 20:57:49, Strahil Nikolov wrote:

> That's why you need a qdisk at a 3-rd location, so you will have 7 votes
> in total. When 3 nodes in cityA die, all resources will be started on the
> remaining 3 nodes.

I think I have not explained this properly.

I have three nodes in city A which run resources which have to run in city
A. They are based on IP addresses which are only valid on the network in
city A.

I have three nodes in city B which run resources which have to run in city
B. They are based on IP addresses which are only valid on the network in
city B.

I have redundant routing between my upstream provider, and cities A and B,
so that I only _need_ resources to be running in one of the two cities for
everything to work as required. City A can go completely offline and not
run its resources, and everything I need continues to work via city B.

I now have an additional requirement to run a single resource at either
city A or city B but not both.

As soon as I connect the clusters at city A and city B, and apply the
location constraints and weighting rules you have suggested:

1. everything works, including the single resource at either city A or
city B, so long as both clusters are operational.

2. as soon as one cluster fails (all three of its nodes become
unavailable), then the other cluster stops running all its resources as
well. This is even with quorum=2.

This means I have lost the redundancy between my two clusters, which is
based on the expectation that only one cluster will fail at a time. If the
failure of one automatically _causes_ the failure of the other, I have no
high availability any more.

What I require is for cluster A to continue running its own resources, plus
the single resource which can run anywhere, in the event that cluster B
fails.

In other words, I need the exact same outcome as I have at present if
cluster B fails (its resources stop, cluster A is unaffected), except that
cluster A continues to run the single resource which I need just a single
instance of.

It is impossible for the nodes at city A to run the resources which should
be running at city B, partly because some of them are identical ("Asterisk"
as a resource, for example, is already running at city A), and partly
because some of them are bound to the networking arrangements (I cannot set
a floating IP address which belongs in city A on a machine which exists in
city B - it just doesn't work).

Therefore if adding a seventh node at a third location would try to start
_all_ resources in city A if city B goes down, it is not a working solution.

If city B goes down then I simply do not want its resources to be running
anywhere, just the same as I have now with the two independent clusters.

Thanks,

Antony.

--
"In fact I wanted to be John Cleese and it took me some time to realise
that the job was already taken."

- Douglas Adams

Please reply to the list; please *don't* CC me.
Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?
That's why you need a qdisk at a 3-rd location, so you will have 7 votes in
total. When 3 nodes in cityA die, all resources will be started on the
remaining 3 nodes.

Best Regards,
Strahil Nikolov

On Wed, Aug 4, 2021 at 17:23, Antony Stone wrote:

On Wednesday 04 August 2021 at 16:07:39, Andrei Borzenkov wrote:

> On Wed, Aug 4, 2021 at 5:03 PM Antony Stone wrote:
>> On Wednesday 04 August 2021 at 13:31:12, Andrei Borzenkov wrote:
>>> On Wed, Aug 4, 2021 at 1:48 PM Antony Stone wrote:
>>>> On Tuesday 03 August 2021 at 12:12:03, Strahil Nikolov via Users
>>>> wrote:
>>>>> Won't something like this work? Each node in LA will have same score
>>>>> of 5000, while other cities will be -5000.
>>>>>
>>>>> pcs constraint location DummyRes1 rule score=5000 city eq LA
>>>>> pcs constraint location DummyRes1 rule score=-5000 city ne LA
>>>>> stickiness -> 1
>>>>
>>>> Thanks for the idea, but no difference.
>>>>
>>>> Basically, as soon as zero nodes in one city are available, all
>>>> resources, including those running perfectly at the other city, stop.
>>>
>>> That is not what you originally said.
>>>
>>> You said you have a 6 node cluster (3 + 3) and 2 nodes are not
>>> available.
>>
>> No, I don't think I said that?
>
> "With the new setup, if two machines in city A fail, then _both_
> clusters stop working"

Ah, apologies - that was a typo. "With the new setup, if the machines in
city A fail, then _both_ clusters stop working".

So, basically what I'm saying is that with two separate clusters, if one
fails, the other keeps going (as one would expect).

Joining the two clusters together so that I can have a single floating
resource which can run anywhere (as well as the exact same
location-specific resources as before) results in one cluster failure
taking the other cluster down too.

I need one fully-working 3-node cluster to keep going, no matter what the
other cluster does.

Antony.

--
It is also possible that putting the birds in a laboratory setting
inadvertently renders them relatively incompetent.

- Daniel C Dennett

Please reply to the list; please *don't* CC me.
[ClusterLabs] Pacemaker problems with pingd
Hello,

Please forgive the length of this email, but I wanted to provide as much
detail as possible.

I'm trying to set up a cluster of two nodes for my service. I have a
problem with a scenario where the network between the two nodes gets broken
and they can no longer see each other. This causes split-brain. I know that
the proper way of implementing this would be to employ STONITH, but it is
not feasible for me now (I don't have the necessary hardware support and I
don't want to introduce another point of failure by introducing
shared-storage-based STONITH).

In order to work around the split-brain scenario I introduced pingd to my
cluster, which in theory should do what I expect. pingd pings a network
device, so when the NIC is broken on one of my nodes, that node should not
run the resources because pingd would fail for it. The pingd resource is
configured to update the value of the attribute 'pingd' (interval: 5s,
dampen: 3s, multiplier: 1000). Based on the value of pingd I have a
location constraint which sets the score to -INFINITY for resource
DimProdClusterIP when 'pingd' is not 1000. All other resources are
colocated with DimProdClusterIP, and DimProdClusterIP should start before
all other resources.

Based on that setup I would expect that when the resources run on dimprod01
and I disconnect dimprod02 from the network, the resources will not start
on dimprod02. Unfortunately I see that after a token interval + consensus
interval my resources are brought up for a moment and then go down again.
This is undesirable, as it causes DRBD split-brain inconsistency, and the
cluster IP may also be taken over by the node which is down.

I tried to debug it, but I can't figure out why it doesn't work. I would
appreciate any help/pointers. Following are some details of my setup and a
snippet of the pacemaker logs with comments.

Setup details:

pcs status:

Cluster name: dimprodcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: dimprod02 (version 2.0.5-9.el8_4.1-ba59be7122) - partition with quorum
  * Last updated: Tue Aug  3 08:20:32 2021
  * Last change: Mon Aug  2 18:24:39 2021 by root via cibadmin on dimprod01
  * 2 nodes configured
  * 8 resource instances configured

Node List:
  * Online: [ dimprod01 dimprod02 ]

Full List of Resources:
  * DimProdClusterIP (ocf::heartbeat:IPaddr2): Started dimprod01
  * WyrDimProdServer (systemd:wyr-dim): Started dimprod01
  * Clone Set: WyrDimProdServerData-clone [WyrDimProdServerData] (promotable):
    * Masters: [ dimprod01 ]
    * Slaves: [ dimprod02 ]
  * WyrDimProdFS (ocf::heartbeat:Filesystem): Started dimprod01
  * DimTestClusterIP (ocf::heartbeat:IPaddr2): Started dimprod01
  * Clone Set: ping-clone [ping]:
    * Started: [ dimprod01 dimprod02 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

pcs constraint:

Location Constraints:
  Resource: DimProdClusterIP
    Constraint: location-DimProdClusterIP
      Rule: score=-INFINITY
        Expression: pingd ne 1000
Ordering Constraints:
  start DimProdClusterIP then promote WyrDimProdServerData-clone (kind:Mandatory)
  promote WyrDimProdServerData-clone then start WyrDimProdFS (kind:Mandatory)
  start WyrDimProdFS then start WyrDimProdServer (kind:Mandatory)
  start WyrDimProdServer then start DimTestClusterIP (kind:Mandatory)
Colocation Constraints:
  WyrDimProdServer with DimProdClusterIP (score:INFINITY)
  DimTestClusterIP with DimProdClusterIP (score:INFINITY)
  WyrDimProdServerData-clone with DimProdClusterIP (score:INFINITY) (with-rsc-role:Master)
  WyrDimProdFS with DimProdClusterIP (score:INFINITY)
Ticket Constraints:

pcs resource config ping:

Resource: ping (class=ocf provider=pacemaker type=ping)
  Attributes: dampen=3s host_list=193.30.22.33 multiplier=1000
  Operations: monitor interval=5s timeout=4s (ping-monitor-interval-5s)
              start interval=0s timeout=60s (ping-start-interval-0s)
              stop interval=0s timeout=5s (ping-stop-interval-0s)

cat /etc/corosync/corosync.conf:

totem {
    version: 2
    cluster_name: dimprodcluster
    transport: knet
    crypto_cipher: aes256
    crypto_hash: sha256
    token: 1
    interface {
        knet_ping_interval: 1000
        knet_ping_timeout: 1000
    }
}

nodelist {
    node {
        ring0_addr: dimprod01
        name: dimprod01
        nodeid: 1
    }
    node {
        ring0_addr: dimprod02
        name: dimprod02
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    timestamp: on
    debug: on
}

Logs:

When the network is connected, 'pingd' takes the value of 1000:

Aug 03 08:23:01 dimprod02.my.clustertest.com pacemaker-attrd [2827046] (attrd_client_update) debug: Broadcasting pingd[dimprod02]=1000 (writer)
Aug 03 08:23:01 dimprod02.my.clustertest.com attrd_updater [3369856] (pcmk__node_attr_request) debug: Asked pacemaker-attrd to update pingd=1000 for dimprod02: OK
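One detail worth checking against the documentation (a sketch, not a
verified fix for this setup): the pattern usually shown for ping-based
placement also bans nodes where the pingd attribute is not defined yet, not
only where it has the wrong value:

    pcs constraint location DimProdClusterIP rule score=-INFINITY \
        pingd lt 1 or not_defined pingd

With "pingd ne 1000" alone, a node whose attribute has not been written yet
(for example right after the daemons start, before the first monitor result
has been dampened and committed) may not be excluded the way one would
expect, which could account for resources coming up briefly on the
disconnected node.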
Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?
On Wednesday 04 August 2021 at 16:07:39, Andrei Borzenkov wrote:

> On Wed, Aug 4, 2021 at 5:03 PM Antony Stone wrote:
>> On Wednesday 04 August 2021 at 13:31:12, Andrei Borzenkov wrote:
>>> On Wed, Aug 4, 2021 at 1:48 PM Antony Stone wrote:
>>>> On Tuesday 03 August 2021 at 12:12:03, Strahil Nikolov via Users
>>>> wrote:
>>>>> Won't something like this work? Each node in LA will have same score
>>>>> of 5000, while other cities will be -5000.
>>>>>
>>>>> pcs constraint location DummyRes1 rule score=5000 city eq LA
>>>>> pcs constraint location DummyRes1 rule score=-5000 city ne LA
>>>>> stickiness -> 1
>>>>
>>>> Thanks for the idea, but no difference.
>>>>
>>>> Basically, as soon as zero nodes in one city are available, all
>>>> resources, including those running perfectly at the other city, stop.
>>>
>>> That is not what you originally said.
>>>
>>> You said you have a 6 node cluster (3 + 3) and 2 nodes are not
>>> available.
>>
>> No, I don't think I said that?
>
> "With the new setup, if two machines in city A fail, then _both_
> clusters stop working"

Ah, apologies - that was a typo. "With the new setup, if the machines in
city A fail, then _both_ clusters stop working".

So, basically what I'm saying is that with two separate clusters, if one
fails, the other keeps going (as one would expect).

Joining the two clusters together so that I can have a single floating
resource which can run anywhere (as well as the exact same
location-specific resources as before) results in one cluster failure
taking the other cluster down too.

I need one fully-working 3-node cluster to keep going, no matter what the
other cluster does.

Antony.

--
It is also possible that putting the birds in a laboratory setting
inadvertently renders them relatively incompetent.

- Daniel C Dennett

Please reply to the list; please *don't* CC me.
Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?
On Wed, Aug 4, 2021 at 5:03 PM Antony Stone wrote:
>
> On Wednesday 04 August 2021 at 13:31:12, Andrei Borzenkov wrote:
>
>> On Wed, Aug 4, 2021 at 1:48 PM Antony Stone wrote:
>>> On Tuesday 03 August 2021 at 12:12:03, Strahil Nikolov via Users wrote:
>>>> Won't something like this work? Each node in LA will have same score
>>>> of 5000, while other cities will be -5000.
>>>>
>>>> pcs constraint location DummyRes1 rule score=5000 city eq LA
>>>> pcs constraint location DummyRes1 rule score=-5000 city ne LA
>>>> stickiness -> 1
>>>
>>> Thanks for the idea, but no difference.
>>>
>>> Basically, as soon as zero nodes in one city are available, all
>>> resources, including those running perfectly at the other city, stop.
>>
>> That is not what you originally said.
>>
>> You said you have a 6 node cluster (3 + 3) and 2 nodes are not available.
>
> No, I don't think I said that?

"With the new setup, if two machines in city A fail, then _both_ clusters
stop working"
Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?
On Wednesday 04 August 2021 at 13:31:12, Andrei Borzenkov wrote:

> On Wed, Aug 4, 2021 at 1:48 PM Antony Stone wrote:
>> On Tuesday 03 August 2021 at 12:12:03, Strahil Nikolov via Users wrote:
>>> Won't something like this work? Each node in LA will have same score
>>> of 5000, while other cities will be -5000.
>>>
>>> pcs constraint location DummyRes1 rule score=5000 city eq LA
>>> pcs constraint location DummyRes1 rule score=-5000 city ne LA
>>> stickiness -> 1
>>
>> Thanks for the idea, but no difference.
>>
>> Basically, as soon as zero nodes in one city are available, all
>> resources, including those running perfectly at the other city, stop.
>
> That is not what you originally said.
>
> You said you have a 6 node cluster (3 + 3) and 2 nodes are not available.

No, I don't think I said that?

With the new setup, if 2 nodes are not available, everything carries on
working; it doesn't matter whether the two nodes are in the same or
different locations. That's fine.

My problem is that with the new setup, if three nodes at one location go
down, then *everything* stops, including the resources I want to carry on
running at the other location.

Under my previous, working arrangement with two separate clusters, one data
centre going down does not affect the other, therefore I have a fully
working system (since the two data centres provide identical services with
redundant routing).

A failure of one data centre taking down working services in the other data
centre is not the high availability solution I'm looking for - it's more
like high unavailability :)

Antony.

--
BASIC is to computer languages what Roman numerals are to arithmetic.

Please reply to the list; please *don't* CC me.
Re: [ClusterLabs] Antw: [EXT] Moving resource only one way
Hi Strahil,

On Wed, Aug 04, 2021 at 10:17:26AM +0000, Strahil Nikolov wrote:
> When you move/migrate resources without the --lifetime option, the
> cluster stack will set +|-INFINITY on the host (+ -> when migrating to,
> - -> when migrating away without specifying a destination host).
> Take a look at:
> https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_move_resources_manually.html

In the meantime I found this page, and it helped to clarify the situation.

Thanks,

a.
Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?
On Wed, Aug 4, 2021 at 1:48 PM Antony Stone wrote:
>
> On Tuesday 03 August 2021 at 12:12:03, Strahil Nikolov via Users wrote:
>
>> Won't something like this work? Each node in LA will have same score of
>> 5000, while other cities will be -5000.
>>
>> pcs constraint location DummyRes1 rule score=5000 city eq LA
>> pcs constraint location DummyRes1 rule score=-5000 city ne LA
>> stickiness -> 1
>
> Thanks for the idea, but no difference.
>
> Basically, as soon as zero nodes in one city are available, all resources,
> including those running perfectly at the other city, stop.

That is not what you originally said.

You said you have a 6 node cluster (3 + 3) and 2 nodes are not available.
If you lose half of the nodes and do not have working fencing, then this is
expected behavior (in the default configuration). You may configure the
cluster to keep running resources, but you cannot configure the cluster to
take over resources without fencing (well, you can, but ...).

> I'm going to look into booth as suggested by others.
>
> Thanks,
>
> Antony.
>
> --
> Atheism is a non-prophet-making organisation.
>
> Please reply to the list; please *don't* CC me.
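Presumably "configure the cluster to keep running resources" refers to the
no-quorum-policy cluster property; a sketch, with the usual caveat that
relaxing it is unsafe without fencing:

    # keep already-running resources when quorum is lost, but start nothing new
    pcs property set no-quorum-policy=freeze

    # or carry on as if quorum did not matter (dangerous without STONITH)
    pcs property set no-quorum-policy=ignore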
Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?
On Tuesday 03 August 2021 at 12:12:03, Strahil Nikolov via Users wrote:

> Won't something like this work? Each node in LA will have same score of
> 5000, while other cities will be -5000.
>
> pcs constraint location DummyRes1 rule score=5000 city eq LA
> pcs constraint location DummyRes1 rule score=-5000 city ne LA
> stickiness -> 1

Thanks for the idea, but no difference.

Basically, as soon as zero nodes in one city are available, all resources,
including those running perfectly at the other city, stop.

I'm going to look into booth as suggested by others.

Thanks,

Antony.

--
Atheism is a non-prophet-making organisation.

Please reply to the list; please *don't* CC me.
Re: [ClusterLabs] Antw: [EXT] Moving resource only one way
When you move/migrate resources without the --lifetime option, the cluster
stack will set +|-INFINITY on the host (+ -> when migrating to, - -> when
migrating away without specifying a destination host).

Take a look at:
https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_move_resources_manually.html

Best Regards,
Strahil Nikolov

On Tue, Aug 3, 2021 at 22:16, Ervin Hegedüs wrote:

Hi,

On Tue, Aug 03, 2021 at 05:46:51PM +0000, Strahil Nikolov via Users wrote:
> Yes. INFINITY = 1000000 (one million), -INFINITY = -1000000 (negative one
> million). Set stickiness > 1000000.

hmm... it's interesting. I've found the documentation that I wrote for
these systems, but there isn't any line for "location" settings. How did I
get it there?

I reviewed the configured systems (there are three pairs), and one pair
still does not have this line, but two of them have it.

Thanks,

a.
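To make the "--lifetime" point concrete (a sketch; the resource and node
names are placeholders, using the lower-level crm_resource tool, which
takes an ISO 8601 duration):

    # move with a location constraint that expires after one hour
    crm_resource --move --resource MyResource --node node2 --lifetime PT1H

    # or remove the constraint left behind by a plain move
    crm_resource --clear --resource MyResource

Without a lifetime, the move leaves a permanent +INFINITY (or -INFINITY)
location constraint behind, which is what pins the resource to one host.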
[ClusterLabs] Corosync 3.1.5 is available at corosync.org!
I am pleased to announce the latest maintenance release of Corosync 3.1.5,
available immediately from the GitHub release section at
https://github.com/corosync/corosync/releases or our website at
http://build.clusterlabs.org/corosync/releases/.

This release contains important bugfixes of cfgtool and support for cgroup
v2. Please see the corosync.conf(5) man page for more information about
cgroup v2, because cgroup v2 is very different from cgroup v1, and systems
with the CONFIG_RT_GROUP_SCHED kernel option enabled may experience
problems with systemd logging or an inability to enable the cpu controller.

Complete changelog for 3.1.5:

Christine Caulfield (1):
  knet: Fix node status display

Jan Friesse (9):
  main: Add support for cgroup v2 and auto mode
  totemconfig: Do not process totem.nodeid
  cfgtool: Check existence of at least one of nodeid
  totemconfig: Put autogenerated nodeid back to cmap
  cfgtool: Set nodeid indexes after sort
  cfgtool: Fix brief mode display of localhost
  cfgtool: Use CS_PRI_NODE_ID for formatting nodeid
  totemconfig: Ensure all knet hosts has a nodeid
  totemconfig: Knet nodeid must be < 65536

Upgrade is highly recommended.

Thanks/congratulations to all people that contributed to achieve this great
milestone.
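For reference, the cgroup handling is controlled from corosync.conf; a
minimal sketch, assuming the option name and values as documented in the
corosync.conf(5) man page:

    system {
        # "auto" (new in 3.1.5) moves corosync to the root cgroup only when
        # needed for RT scheduling; "yes"/"no" force the behaviour either way
        move_to_root_cgroup: auto
    }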
Re: [ClusterLabs] Sub-clusters / super-clusters?
On 03/08/2021 10:40, Antony Stone wrote:
> On Tuesday 11 May 2021 at 12:56:01, Strahil Nikolov wrote:
>
>> Here is the example I had promised:
>>
>> pcs node attribute server1 city=LA
>> pcs node attribute server2 city=NY
>>
>> # Don't run on any node that is not in LA
>> pcs constraint location DummyRes1 rule score=-INFINITY city ne LA
>>
>> # Don't run on any node that is not in NY
>> pcs constraint location DummyRes2 rule score=-INFINITY city ne NY
>>
>> The idea is that if you add a node and you forget to specify the
>> attribute with the name 'city', DummyRes1 & DummyRes2 won't be started
>> on it.
>>
>> For resources that do not have a constraint based on the city -> they
>> will run everywhere unless you specify a colocation constraint between
>> the resources.
>
> Excellent - thanks.
>
> I happen to use crmsh rather than pcs, but I've adapted the above and got
> it working.
>
> Unfortunately, there is a problem.
>
> My current setup is:
>
> One 3-machine cluster in city A running a bunch of resources between
> them, the most important of which for this discussion is Asterisk
> telephony.
>
> One 3-machine cluster in city B doing exactly the same thing.
>
> The two clusters have no knowledge of each other.
>
> I have high-availability routing between my clusters and my upstream
> telephony provider, such that a call can be handled by Cluster A or
> Cluster B, and if one is unavailable, the call gets routed to the other.
> Thus, a total failure of Cluster A means I still get phone calls, via
> Cluster B.
>
> To implement the above "one resource which can run anywhere, but only a
> single instance", I joined together clusters A and B, and placed the
> corresponding location constraints on the resources I want only at A and
> the ones I want only at B. I then added the resource with no location
> constraint, and it runs anywhere, just once. So far, so good.
>
> The problem is:
>
> With the two independent clusters, if two machines in city A fail, then
> Cluster A fails completely (no quorum), and Cluster B continues working.
> That means I still get phone calls.
>
> With the new setup, if two machines in city A fail, then _both_ clusters
> stop working and I have no functional resources anywhere.
>
> So, my question now is:
>
> How can I have a 3-machine Cluster A running local resources, and a
> 3-machine Cluster B running local resources, plus one resource running on
> either Cluster A or Cluster B, but without a failure of one cluster
> causing _everything_ to stop?

Yes, it's called geo-clustering (multi-site):
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_high_availability_clusters/assembly_configuring-multisite-cluster-configuring-and-managing-high-availability-clusters
(ignore the doc being for RHEL; other distributions with booth work the
same way)

Regards,
Honza

> Thanks,
>
> Antony.
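For anyone else adapting Strahil's pcs example to crmsh, a rough equivalent
(a sketch based on crmsh's configure syntax, not taken from this thread)
might be:

    # set the node attributes
    crm node attribute server1 set city LA
    crm node attribute server2 set city NY

    # don't run on any node that is not in LA / not in NY
    crm configure location loc-DummyRes1 DummyRes1 rule -inf: city ne LA
    crm configure location loc-DummyRes2 DummyRes2 rule -inf: city ne NY

As with the pcs version, a node with no 'city' attribute at all matches
neither rule's city value, so the city-bound resources stay off it.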