Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question
I just checked, and it seems I have two regular rules for stopping the topology before nfs_active:

stop Topology-clone then stop nfs_active_siteA (kind: Mandatory)
stop Topology-clone then stop nfs_active_siteB (kind: Mandatory)

I also have:

start Topology-clone then start Controller-clone (kind: Mandatory)

There are also resource sets that make sure all filesystems start first, followed by the relevant nfs_active resources.

Also, it seems that regular order rules cannot be removed via ID; maybe a feature request is needed.

Best Regards,
Strahil Nikolov

> If you mean a whole constraint set, then yes -- run `pcs constraint --full`
> to get a list of all constraints with their constraint IDs. Then run
> `pcs constraint remove` followed by the constraint ID to remove a
> particular constraint. This can include set constraints.

___
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
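The list-then-remove workflow quoted above can be sketched as a small dry-run wrapper. This is only an illustration: the helper names `run`, `list_constraints`, and `remove_constraint`, the `DRY_RUN` variable, and any constraint ID passed in are hypothetical. Only `pcs constraint --full` and `pcs constraint remove` come from the thread, and whether removal by ID behaves the same for plain order constraints is exactly what is being questioned above.

```shell
#!/bin/sh
# Sketch: remove one constraint (possibly a set constraint) by its ID.
# DRY_RUN=1 (the default here) only prints the pcs command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# List every constraint together with its ID (command taken from the thread).
list_constraints() {
    run pcs constraint --full
}

# Remove a single constraint; the ID must be copied from the listing above.
remove_constraint() {
    if [ -z "$1" ]; then
        echo "usage: remove_constraint CONSTRAINT_ID" >&2
        return 1
    fi
    run pcs constraint remove "$1"
}
```

Calling `list_constraints` first and then `remove_constraint` with an ID copied from that listing targets exactly one constraint, rather than removing a resource from every order constraint that mentions it. Set `DRY_RUN=0` only after reviewing the printed commands.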
Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question
Hi Reid,

I will check it out on Monday, but I'm pretty sure I created an order set that first stops the topology and only then stops the nfs-active.

Yet, I made the stupid decision to prevent ocf:heartbeat:Filesystem from killing those 2 SAP processes (while also setting a huge timeout for the stop operation), which led to an 'I can't umount, giving up'-like notification and of course fenced the entire cluster :D.

Note taken: stonith now has different delays, and Filesystem can kill the processes.

As per the SAP note from Andrei, these could really be 'fast restart' mechanisms in HANA 2.0, and it looks safe to kill them (will check with SAP about that).

P.S.: Is there a way to remove a whole set in pcs? It's really irritating when the stupid command wipes the resource from multiple order constraints.

Best Regards,
Strahil Nikolov

On Fri, Apr 2, 2021 at 23:44, Reid Wahl wrote:
> However, there's no rule saying SAPHanaTopology must finish stopping
> before the nfs-active attribute resource stops. In fact, it's quite the
> opposite: the SAPHanaTopology resource stops only after the nfs-active
> resource stops.
Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question
On Fri, Apr 2, 2021 at 2:04 PM Strahil Nikolov wrote:
> Hi Reid,
>
> I will check it out in Monday, but I'm pretty sure I created an order set
> that first stops the topology and only then it stops the nfs-active.
>
> Yet, I made the stupid decision to prevent ocf:heartbeat:Filesystem (and
> setting a huge timeout for the stop operation) from killing those 2 SAP
> processes which led to 'I can't umount, giving up'-like notification and of
> course fenced the entire cluster :D .
>
> Note taken, stonith has now different delays , and Filesystem can kill the
> processes.
>
> As per the SAP note from Andrei, these could really be 'fast restart'
> mechanisms in HANA 2.0 and it looks safe to be killed (will check with SAP
> about that).
>
> P.S: Is there a way to remove a whole set in pcs , cause it's really
> irritating when the stupid command wipes the resource from multiple order
> constraints?

If you mean a whole constraint set, then yes -- run `pcs constraint --full` to get a list of all constraints with their constraint IDs. Then run `pcs constraint remove` followed by the constraint ID to remove a particular constraint. This can include set constraints.

> Best Regards,
> Strahil Nikolov

--
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA
Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question
Hi, Strahil.

Based on the constraints documented in the article you're following (RH KB solution 5423971), I think I see what's happening.

The SAPHanaTopology resource requires the appropriate nfs-active attribute in order to run. That means that if the nfs-active attribute is set to false, the SAPHanaTopology resource must stop.

However, there's no rule saying SAPHanaTopology must finish stopping before the nfs-active attribute resource stops. In fact, it's quite the opposite: the SAPHanaTopology resource stops only after the nfs-active resource stops.

At the same time, the NFS resources are allowed to stop after the nfs-active attribute resource has stopped. So the NFS resources are stopping while the SAPHana* resources are likely still active.

Try something like this:

    # pcs constraint order hana_nfs1_active-clone then SAPHanaTopology__-clone kind=Optional
    # pcs constraint order hana_nfs2_active-clone then SAPHanaTopology__-clone kind=Optional

This says "if both hana_nfs1_active and SAPHanaTopology are scheduled to start, then make hana_nfs1_active start first. If both are scheduled to stop, then make SAPHanaTopology stop first."

"kind=Optional" means there's no order dependency unless both resources are already going to be scheduled for the action. I'm using kind=Optional here even though kind=Mandatory (the default) would make sense, because IIRC there were some unexpected interactions with ordering constraints for clones, where events on one node had unwanted effects on other nodes.

I'm not able to test right now since setting up an environment for this even with dummy resources is non-trivial -- but you're welcome to try this both with and without kind=Optional if you'd like.

Please let us know how this goes.

On Fri, Apr 2, 2021 at 2:20 AM Strahil Nikolov wrote:
> Hello All,
>
> I am testing the newly built HANA (Scale-out) cluster and it seems that:
> Neither SAPHanaController, nor SAPHanaTopology are stopping the HANA when
> I put the nodes (same DC = same HANA) in standby. This of course leads to a
> situation where the NFS cannot be umounted and despite the stop timeout -
> leads to fencing(on-fail=fence).
>
> I thought that the Controller resource agent is stopping the HANA and the
> slave role should not be 'stopped' before that .
>
> Maybe my expectations are wrong ?
>
> Best Regards,
> Strahil Nikolov

--
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA
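To compare the two variants Reid mentions (with and without kind=Optional), the suggested constraints can be generated for review before anything is applied. The following is a minimal sketch: the function name `print_order_constraints` and its argument order are hypothetical, and the topology clone name must be supplied by the caller because it is site-specific. Nothing here runs `pcs`; the function only prints the commands.

```shell
#!/bin/sh
# Sketch: print the ordering constraints suggested in the thread.
# Nothing is executed; the function only prints pcs commands for review.

# print_order_constraints TOPOLOGY_CLONE KIND ATTR_CLONE...
#   TOPOLOGY_CLONE  name of the SAPHanaTopology clone (site-specific, supplied by caller)
#   KIND            Optional or Mandatory
#   ATTR_CLONE...   nfs-active attribute clones that start first and stop last
print_order_constraints() {
    topology_clone="$1"
    kind="$2"
    shift 2
    for attr_clone in "$@"; do
        printf 'pcs constraint order %s then %s kind=%s\n' \
            "$attr_clone" "$topology_clone" "$kind"
    done
}
```

Running it once with `Optional` and once with `Mandatory` gives two command sets that can be tried one after the other in a test cluster, as Reid suggests.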
Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question
Thanks Andrei,

so can we assume that killing those processes during NFS umount is acceptable and poses no risk to the HANA data?

I have noticed that the cluster kills them when the cluster is being stopped (including NFS).

Best Regards,
Strahil Nikolov

On Fri, Apr 2, 2021 at 14:31, Andrei Borzenkov wrote:
> On Fri, Apr 2, 2021 at 12:30 PM Strahil Nikolov wrote:
> > To be more specific, the processes left are 'hdbrsutil'
>
> This process holds database content in memory after shutdown (I
> believe, for 1 hour by default) to facilitate fast startup. You can
> disable it. See SAP note 2159435.
>
> > and the 'sapstartsrv'.
>
> Well, this is the primary service that handles all requests from
> sapcontrol. It is sort of supposed to be always running. Resource
> agent handles missing sapstartsrv during activation.
>
> I guess if you have a valid use case you may try to open a service
> request or github issue to also stop sapstartsrv.
Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question
To be more specific, the processes left are 'hdbrsutil' and the 'sapstartsrv'.

Best Regards,
Strahil Nikolov

On Fri, Apr 2, 2021 at 12:20, Strahil Nikolov wrote:
> Hello All,
>
> I am testing the newly built HANA (Scale-out) cluster and it seems that:
> Neither SAPHanaController, nor SAPHanaTopology are stopping the HANA when I
> put the nodes (same DC = same HANA) in standby. This of course leads to a
> situation where the NFS cannot be umounted and despite the stop timeout -
> leads to fencing(on-fail=fence).
>
> I thought that the Controller resource agent is stopping the HANA and the
> slave role should not be 'stopped' before that .
>
> Maybe my expectations are wrong ?
>
> Best Regards,
> Strahil Nikolov
Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question
On Fri, Apr 2, 2021 at 3:42 PM Strahil Nikolov wrote:
>
> Thanks Andrei,
>
> so can we assume that killing those processes during NFS umount is acceptable
> and no risk to the HANA data can be observed ?
>

This is a question for SAP support, not for some random public mailing list.

> I have noticed that the cluster is killing those when the cluster is being
> stopped (including NFS) .
>
> Best Regards,
> Strahil Nikolov
Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question
On Fri, Apr 2, 2021 at 12:30 PM Strahil Nikolov wrote:
>
> To be more specific, the processes left are 'hdbrsutil'

This process holds database content in memory after shutdown (I believe, for 1 hour by default) to facilitate fast startup. You can disable it. See SAP note 2159435.

> and the 'sapstartsrv'.
>

Well, this is the primary service that handles all requests from sapcontrol. It is sort of supposed to be always running. Resource agent handles missing sapstartsrv during activation.

I guess if you have a valid use case you may try to open a service request or github issue to also stop sapstartsrv.

> Best Regards,
> Strahil Nikolov
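Before deciding how the Filesystem resource should treat these two processes, it can help to confirm which of them is still running after the HANA resources have stopped. A minimal sketch follows. The helper names `filter_leftovers` and `list_leftovers` are hypothetical, and the sketch assumes the processes show up in the process table under the short command names `hdbrsutil` and `sapstartsrv`, which may not hold on every installation.

```shell
#!/bin/sh
# Sketch: report whether hdbrsutil or sapstartsrv are still running.

# filter_leftovers reads "PID COMMAND" lines on stdin and prints only the
# lines whose command name is exactly hdbrsutil or sapstartsrv.
filter_leftovers() {
    awk '$2 == "hdbrsutil" || $2 == "sapstartsrv" { print $1, $2 }'
}

# list_leftovers feeds the live process table (PID and command name,
# without header lines) into the filter.
list_leftovers() {
    ps -e -o pid= -o comm= | filter_leftovers
}
```

Running `list_leftovers` on a node after putting it in standby shows whether either process survived the stop. An empty result means neither name was found in the process table.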