Re: [ClusterLabs] 'pcs stonith update' takes, then reverts
On 2021-07-26 9:54 a.m., kgail...@redhat.com wrote:
> On Fri, 2021-07-23 at 21:46 -0400, Digimer wrote:
>> After a LOT of hassle, I finally got it updated, but OMG it was painful. I degraded the cluster (unsure if needed), set maintenance mode, deleted the stonith levels, deleted the stonith devices, recreated them with the updated values, recreated the stonith levels, and finally disabled maintenance mode.
>>
>> It should not have been this hard, right? Why the heck would Pacemaker keep "rolling back" to old configs? I'd delete the stonith
>
> That is bizarre. It sounds like the CIB changes were taking effect locally, then being rejected by the rest of the cluster, which would send the "correct" CIB back to the originator.
>
> The logs of interest would be pacemaker.log from both nodes at the time you made the first configuration change that failed. I'm guessing the logs you posted were from after that point?

The logs I shared started after the issue began, yes. I'll see if I can access the nodes and pull the logs. Note that I had degraded the cluster (withdrawn the inactive node), so the remaining node was a cluster on its own, and it still happened.

digimer

--
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
___
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
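Ken's explanation turns on how Pacemaker picks a winner when copies of the CIB disagree: when nodes join, the copy with the highest (`admin_epoch`, `epoch`, `num_updates`) tuple on the `<cib>` element replaces the others. A minimal sketch for recording that tuple from `cibadmin --query` (or `pcs cluster cib`) output on each node, before and after a change, so you can see which copy is newest. The function names `cib_version` and `cib_newer` are hypothetical, and the sed-based parsing assumes the attributes sit on a single `<cib ...>` line, as `cibadmin` normally prints them.

```sh
#!/bin/sh
# cib_version: read CIB XML on stdin, print "admin_epoch.epoch.num_updates".
cib_version() {
    local tag a e n
    # Keep only the opening <cib ...> tag.
    tag=$(grep -o '<cib [^>]*>' | head -n 1)
    # A leading space before each name keeps " epoch=" from matching "admin_epoch=".
    a=$(printf '%s\n' "$tag" | sed -n 's/.* admin_epoch="\([0-9]*\)".*/\1/p')
    e=$(printf '%s\n' "$tag" | sed -n 's/.* epoch="\([0-9]*\)".*/\1/p')
    n=$(printf '%s\n' "$tag" | sed -n 's/.* num_updates="\([0-9]*\)".*/\1/p')
    printf '%s.%s.%s\n' "$a" "$e" "$n"
}

# cib_newer <v1> <v2>: succeed if v1 is newer than v2, comparing
# admin_epoch first, then epoch, then num_updates.
cib_newer() {
    local old_ifs
    old_ifs=$IFS
    IFS=.
    set -- $1 $2    # $1..$3 = v1 fields, $4..$6 = v2 fields
    IFS=$old_ifs
    [ "$1" -ne "$4" ] && { [ "$1" -gt "$4" ]; return; }
    [ "$2" -ne "$5" ] && { [ "$2" -gt "$5" ]; return; }
    [ "$3" -gt "$6" ]
}
```

For example, run `cibadmin --query | cib_version` on each node; if a node that rejoins reports a higher tuple than the node where `pcs stonith update` was run, its copy of the configuration wins the sync even if its content is older.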
[ClusterLabs] pcs stonith update problems
Hi all,

I've got a predicament... I want to update a stonith resource to remove an argument. Specifically, when resources move between nodes, I want to change the stonith delay to favour the new host. This involves adding the 'delay="x"' argument to one stonith resource and removing it from the other. Example:

# pcs cluster cib | grep -B7 -A7 '"delay"'

Here, the stonith resource 'ipmilan_node1' has delay="15". If I run:

# pcs stonith update ipmilan_node1 fence_ipmilan ipaddr="10.201.17.1" password="xxx" username="admin"; echo $?
0

I see nothing happen in journald, and the delay argument remains in the 'pcs cluster cib' output. If, however, I run:

# /usr/sbin/pcs stonith update ipmilan_node1 fence_ipmilan ipaddr="10.201.17.1" password="xxx" username="admin" delay="0"; echo $?
0

I can see in journald that the CIB was updated, and I can confirm in 'pcs cluster cib' that the 'delay' value becomes '0'.

So it seems that if an argument previously existed and is NOT specified in an update, it is not removed. Is this intentional for some reason? If so, how would I remove the delay attribute? I've got a fairly complex stonith config, with stonith levels, so deleting and recreating the config would be non-trivial.

Pacemaker v2.1.0, pcs v0.10.8.181-47e9, CentOS Stream 8.

digimer

--
Digimer
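The thread doesn't include the answer, but recent pcs man pages document that options left out of `pcs stonith update` are kept unchanged (matching what is observed above) and that setting an option to an empty value (`delay=`) removes it. Treat the empty-value behaviour as an assumption and confirm it in `man pcs` for pcs 0.10.8; `crm_resource --resource ipmilan_node1 --delete-parameter delay` is a lower-level alternative. This sketch moves the delay from one device to the other. `favour_node` and the `PCS` override (useful for dry runs) are hypothetical. The delay goes on the device that fences the favoured node: its peer has to wait before shooting it, so the favoured node wins a fence race, as `ipmilan_node1` with `delay="15"` does in the example.

```sh
#!/bin/sh
# Set PCS to a stub (e.g. a function that echoes) for a dry run.
PCS=${PCS:-pcs}

# favour_node <device_fencing_favoured_node> <other_device> [delay_secs]
# Hypothetical helper. Adds the delay to the device that fences the favoured
# node and clears it on the other device. Assumes "delay=" (empty value)
# makes pcs remove the option rather than keep the old value.
favour_node() {
    local favoured_dev=$1 other_dev=$2 secs=${3:-15}
    "$PCS" stonith update "$favoured_dev" delay="$secs" || return
    "$PCS" stonith update "$other_dev" delay=
}
```

After moving servers to node 2, for example, `favour_node ipmilan_node2 ipmilan_node1 15` would put the delay on `ipmilan_node2` and drop it from `ipmilan_node1`.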
Re: [ClusterLabs] HA-Cluster, UPS and power outage - how is your setup ?
On 2022-02-01 11:16, Lentes, Bernd wrote:
> Hi,
>
> we just experienced two power outages within a few days. This showed me that our UPS configuration and the handling of resources on the cluster are insufficient.
>
> We have a two-node cluster with SLES 12 SP5 and a Smart-UPS SRT 3000 from APC with a Network Management Card. The UPS can carry the two nodes and some hardware (SAN, monitor) for about one hour. Our resources are Virtual Domains, about 20 of different flavours and versions.
>
> Our primary goal is not to ride out a power outage for as long as possible, but to shut down all domains cleanly after a set time.
>
> I'm currently thinking of waiting a set time (maybe 15 minutes) and then running "crm resource stop VirtualDomains" in a script. I would give the cluster some time for the shutdown (5-10 minutes) and afterwards shut down the nodes (via script). I have to keep an eye on whether both nodes are running or only one of them.
>
> What is your approach?
>
> Bernd

I don't know if this will be a useful answer for you, but I haven't seen anyone else reply.

In the Anvil!, we use SNMP to collect data on the APC UPSes powering a given cluster. The OIDs we read are at the head of this file, but the logic to read and collect the data starts here:

https://github.com/ClusterLabs/anvil/blob/main/scancore-agents/scan-apc-ups/scan-apc-ups#L3026

Some processing happens in-agent, but mainly the collected data is written to a generic "power" table (as we support any UPS we can collect data from). When we're done scanning, we analyze the data in the 'power' table to decide if we need to shed load (withdraw and power off nodes to extend runtime), do a complete graceful shutdown (if the batteries are about to die), or reboot the nodes after power is restored.

This logic is mainly handled here. First, we figure out which UPS powers which nodes/clusters, then we pull the data on those specific UPSes to return a general "power state":

https://github.com/ClusterLabs/anvil/blob/main/Anvil/Tools/ScanCore.pm#L607

The power state then tells the main daemon what actions to take, if any (load shed, shut down, restart). That's here:

https://github.com/ClusterLabs/anvil/blob/main/Anvil/Tools/ScanCore.pm#L1541

This is super high level, and many of the specifics relate to the Anvil! cluster, but it hopefully gives you a starting point on how to approach the problem. We've been doing it this way for many years with really good effect.

Cheers

--
Digimer
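Both Bernd's timed plan and the Anvil! "power state" approach come down to mapping a few UPS readings to an action. A minimal sketch, assuming two numbers are already collected from the UPS (for example over SNMP): seconds spent on battery (0 while on mains) and estimated runtime remaining. The function name and both thresholds are hypothetical; this is not the Anvil! logic linked above.

```sh
#!/bin/sh
# Hypothetical thresholds in seconds; tune for your UPS, load and shutdown time.
SHED_AFTER=${SHED_AFTER:-900}              # 15 minutes on battery
SHUTDOWN_RUNTIME=${SHUTDOWN_RUNTIME:-1200} # full shutdown below 20 minutes left

# power_action <on_battery_secs> <runtime_left_secs>
# Prints one of:
#   none       on mains, or on battery but not long enough to act
#   shed_load  withdraw and power off one node to stretch the runtime
#   shutdown   stop all VMs cleanly, then power off the nodes
power_action() {
    local on_batt=$1 runtime=$2
    if [ "$on_batt" -eq 0 ]; then
        echo none
    elif [ "$runtime" -le "$SHUTDOWN_RUNTIME" ]; then
        echo shutdown
    elif [ "$on_batt" -ge "$SHED_AFTER" ]; then
        echo shed_load
    else
        echo none
    fi
}
```

A caller would map `shutdown` to Bernd's sequence (`crm resource stop VirtualDomains`, wait 5 to 10 minutes, power off whichever nodes are up) and `shed_load` to withdrawing one node. Checking remaining runtime before time on battery means a deep discharge forces a full shutdown even if the shed threshold was never reached. Restarting after power returns needs extra state and is left out.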
Re: [ClusterLabs] Removing a resource without stopping it
On 2022-01-29 03:16, Strahil Nikolov wrote:
> I think there is pcs cluster edit --scope=resources (based on memory). Can you try to delete it from there?
>
> Best Regards,
> Strahil Nikolov

Thanks, but no, that doesn't seem to work. 'pcs cluster edit' wants to open an editor, and I'm trying to find a way to make this change from a program (once I sort out the manual process), so an option that requires user input won't work in my case regardless.

Thank you just the same, though!

--
Digimer
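`pcs cluster edit` is interactive, but pcs can also work on a file copy of the CIB: dump it with `pcs cluster cib <file>`, change the copy with `pcs -f <file> ...`, and push only the differences with `pcs cluster cib-push <file> diff-against=<original>`. A sketch of that flow for scripts. The function name is hypothetical, and it assumes `pcs -f ... resource delete` simply removes the resource from the file, since no live cluster is contacted. This only replaces the editor: pushing a CIB without the resource is still a deletion, so the same orphan handling applies as with `pcs resource delete`.

```sh
#!/bin/sh
PCS=${PCS:-pcs}

# delete_resource_offline_edit <resource_id>
# Hypothetical helper: edit a file copy of the CIB and push only the diff.
delete_resource_offline_edit() {
    local rsc=$1 dir rc
    dir=$(mktemp -d) || return
    # 1. Snapshot the live CIB and keep a pristine copy to diff against.
    # 2. Delete the resource in the working copy only (-f = operate on file).
    # 3. Push just the difference between the two copies.
    "$PCS" cluster cib "$dir/orig.xml" &&
        cp "$dir/orig.xml" "$dir/new.xml" &&
        "$PCS" -f "$dir/new.xml" resource delete "$rsc" &&
        "$PCS" cluster cib-push "$dir/new.xml" diff-against="$dir/orig.xml"
    rc=$?
    rm -rf "$dir"
    return "$rc"
}
```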
Re: [ClusterLabs] Removing a resource without stopping it
I think I have a working method, though not under the conditions first described. I would love some feedback on its sanity.

The ultimate goal is to migrate the resource (VM) to a different Pacemaker cluster. Setting it to unmanaged, migrating the VM off, disabling the resource, and then managing the resource again marks it as stopped, after which it can be deleted.

[root@an-a01n01 ~]# pcs resource unmanage srv01-cs8
# Migrate the server to another pacemaker cluster here
[root@an-a01n01 ~]# pcs resource disable srv01-cs8
Warning: 'srv01-cs8' is unmanaged
[root@an-a01n01 ~]# pcs resource manage srv01-cs8
[root@an-a01n01 ~]# pcs resource delete srv01-cs8
Deleting Resource - srv01-cs8

Going back to the original question, though, deleting the server from Pacemaker while leaving the VM running is still something I am quite curious about.

Madi

On 2022-01-29 13:27, Strahil Nikolov wrote:
> I know... and the editor stuff can be bypassed, if the approach works.
>
> Best Regards,
> Strahil Nikolov
>
> On Sat, Jan 29, 2022 at 15:43, Digimer wrote:
>> On 2022-01-29 03:16, Strahil Nikolov wrote:
>>> I think there is pcs cluster edit --scope=resources (based on memory). Can you try to delete it from there?
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>
>> Thanks, but no, that doesn't seem to work. 'pcs cluster edit' wants to open an editor, and I'm trying to find a way to make this change from a program (once I sort out the manual process), so an option that requires user input won't work in my case regardless.
>>
>> Thank you just the same, though!

--
Digimer
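Madi's sequence can be wrapped for scripting. In this sketch `hand_off_resource` and `migrate_vm` are hypothetical; `migrate_vm` stands in for whatever moves the VM to the other cluster and must be defined by the caller. The ordering matters: once the resource is managed again with the disable (target-role Stopped) in place, Pacemaker wants it stopped, which is presumably harmless only because the VM has already left this cluster. The function therefore bails out, leaving the resource unmanaged, if the migration fails.

```sh
#!/bin/sh
PCS=${PCS:-pcs}

# hand_off_resource <resource_id>
# Hypothetical wrapper around the unmanage / migrate / disable / manage /
# delete sequence. migrate_vm is a caller-supplied placeholder.
hand_off_resource() {
    local rsc=$1
    "$PCS" resource unmanage "$rsc" || return
    # Stop here (resource still unmanaged) if the VM did not leave.
    migrate_vm "$rsc" || return
    "$PCS" resource disable "$rsc" || return  # pcs warns it is unmanaged
    "$PCS" resource manage "$rsc" || return   # now marked as stopped
    "$PCS" resource delete "$rsc"
}
```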
[ClusterLabs] Removing a resource without stopping it
Hi all,

I'm trying to figure out how to move a running VM from one Pacemaker cluster to another. I've got the storage and VM live migration sorted, but I'm having trouble with Pacemaker.

I tried unmanaging the resource (the VM), then deleted the resource, and the node got fenced. So I am assuming it thought it couldn't stop the service, so it self-fenced.

In any case, can someone let me know what the proper procedure is? Said more directly: how do I delete a resource from Pacemaker (via pcs on EL8) without stopping the resource?

--
Digimer
Re: [ClusterLabs] Removing a resource without stopping it
On 2022-01-29 00:10, Digimer wrote:
> On 2022-01-28 16:54, Ken Gaillot wrote:
>> On Fri, 2022-01-28 at 16:38 -0500, Digimer wrote:
>>> Hi all,
>>>
>>> I'm trying to figure out how to move a running VM from one Pacemaker cluster to another. I've got the storage and VM live migration sorted, but I'm having trouble with Pacemaker.
>>>
>>> I tried unmanaging the resource (the VM), then deleted the resource, and the node got fenced. So I am assuming it thought it couldn't stop the service, so it self-fenced.
>>>
>>> In any case, can someone let me know what the proper procedure is? Said more directly: how do I delete a resource from Pacemaker (via pcs on EL8) without stopping the resource?
>>
>> Set the stop-orphan-resources cluster property to false (at least while you move it).
>>
>> The problem with your first approach is that once you remove the resource configuration, which includes the is-managed setting, Pacemaker no longer knows the resource is unmanaged. And even if you set it via resource defaults or something, eventually you have to set it back, at which point Pacemaker will still have the same response.
>
> Follow up: I tried the following sequence:
>
> pcs property set stop-orphan-resources=false
> pcs resource unmanage srv01-cs8  # Without this, the resource was stopped
> pcs resource delete srv01-cs8  # Failed with "Warning: 'srv01-cs8' is unmanaged"
> pcs resource delete srv01-cs8 --force  # Got 'Deleting Resource - srv01-cs8'
> pcs resource status
> --
>   * srv01-cs8 (ocf::alteeve:server): ORPHANED Started an-a01n01 (unmanaged)
> --
>
> So it seems like this doesn't delete the resource. Can I get some insight on how to actually delete this resource without disabling the VM?
>
> Thanks!

Adding: I tried 'pcs property set stop-orphan-resources=true', and it stopped the VM and then actually deleted the resource. =/

--
Digimer
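Ken's advice was to keep `stop-orphan-resources=false` at least while the VM moves, and the last test shows what happens when it is restored too early: with the VM still running locally, setting it back to `true` stopped the VM. A sketch of the order that follows from this: delete, migrate, and only then restore the default. `release_resource` and `migrate_vm` are hypothetical, and the final step assumes the agent's stop succeeds without side effects once the VM is no longer on this node. If the migration fails, the property is deliberately left at `false` rather than restored.

```sh
#!/bin/sh
PCS=${PCS:-pcs}

# release_resource <resource_id>
# Hypothetical sequence: orphan the resource, move the VM, then restore
# the default. migrate_vm is a caller-supplied placeholder.
release_resource() {
    local rsc=$1
    "$PCS" property set stop-orphan-resources=false || return
    "$PCS" resource unmanage "$rsc" || return
    "$PCS" resource delete "$rsc" --force || return
    # The resource now shows as ORPHANED ... (unmanaged) until the VM leaves.
    migrate_vm "$rsc" || return
    # Restoring this while the VM still ran here is what stopped it.
    "$PCS" property set stop-orphan-resources=true
}
```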
Re: [ClusterLabs] Removing a resource without stopping it
On 2022-01-28 16:54, Ken Gaillot wrote:
> On Fri, 2022-01-28 at 16:38 -0500, Digimer wrote:
>> Hi all,
>>
>> I'm trying to figure out how to move a running VM from one Pacemaker cluster to another. I've got the storage and VM live migration sorted, but I'm having trouble with Pacemaker.
>>
>> I tried unmanaging the resource (the VM), then deleted the resource, and the node got fenced. So I am assuming it thought it couldn't stop the service, so it self-fenced.
>>
>> In any case, can someone let me know what the proper procedure is? Said more directly: how do I delete a resource from Pacemaker (via pcs on EL8) without stopping the resource?
>
> Set the stop-orphan-resources cluster property to false (at least while you move it).
>
> The problem with your first approach is that once you remove the resource configuration, which includes the is-managed setting, Pacemaker no longer knows the resource is unmanaged. And even if you set it via resource defaults or something, eventually you have to set it back, at which point Pacemaker will still have the same response.

Thanks for this!

I'm not entirely sure I understand the implications of "stop-orphan-resources". I assume it would be a bad idea to set it to "false" and leave it that way, given it's not the default. What's the purpose of it being set to 'true'?

Thanks!

--
Digimer
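Pacemaker's documentation describes `stop-orphan-resources` as "Should deleted resources be stopped?", with a default of `true`. Combined with Ken's point that the is-managed setting is deleted along with the resource configuration, that default is why a deleted resource gets stopped even if it was unmanaged beforehand. Since leaving it `false` is unusual, a monitoring check can flag it. A sketch using `crm_attribute`; the function name and the `CRM_ATTRIBUTE` override are hypothetical, and treating any query failure as "unset, so the default applies" is a simplification that also hides errors such as the cluster being down.

```sh
#!/bin/sh
CRM_ATTRIBUTE=${CRM_ATTRIBUTE:-crm_attribute}

# orphans_will_stop: succeed if stop-orphan-resources is effectively true.
orphans_will_stop() {
    local val
    # --quiet prints only the value; the query fails if the property is unset.
    if ! val=$("$CRM_ATTRIBUTE" --type crm_config --name stop-orphan-resources \
            --query --quiet 2>/dev/null); then
        return 0    # unset (or unreadable): assume Pacemaker's default, true
    fi
    val=$(printf '%s' "$val" | tr '[:upper:]' '[:lower:]')
    case $val in
        true|yes|on|y|1) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: `orphans_will_stop || echo "warning: stop-orphan-resources is not true" >&2`.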
Re: [ClusterLabs] Removing a resource without stopping it
On 2022-01-28 16:54, Ken Gaillot wrote:
> On Fri, 2022-01-28 at 16:38 -0500, Digimer wrote:
>> Hi all,
>>
>> I'm trying to figure out how to move a running VM from one Pacemaker cluster to another. I've got the storage and VM live migration sorted, but I'm having trouble with Pacemaker.
>>
>> I tried unmanaging the resource (the VM), then deleted the resource, and the node got fenced. So I am assuming it thought it couldn't stop the service, so it self-fenced.
>>
>> In any case, can someone let me know what the proper procedure is? Said more directly: how do I delete a resource from Pacemaker (via pcs on EL8) without stopping the resource?
>
> Set the stop-orphan-resources cluster property to false (at least while you move it).
>
> The problem with your first approach is that once you remove the resource configuration, which includes the is-managed setting, Pacemaker no longer knows the resource is unmanaged. And even if you set it via resource defaults or something, eventually you have to set it back, at which point Pacemaker will still have the same response.

Follow up: I tried the following sequence:

pcs property set stop-orphan-resources=false
pcs resource unmanage srv01-cs8  # Without this, the resource was stopped
pcs resource delete srv01-cs8  # Failed with "Warning: 'srv01-cs8' is unmanaged"
pcs resource delete srv01-cs8 --force  # Got 'Deleting Resource - srv01-cs8'
pcs resource status
--
  * srv01-cs8 (ocf::alteeve:server): ORPHANED Started an-a01n01 (unmanaged)
--

So it seems like this doesn't delete the resource. Can I get some insight on how to actually delete this resource without disabling the VM?

Thanks!

digimer

--
Digimer