Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question

2021-04-05 Thread Strahil Nikolov
I just checked, and it seems I have two regular order rules for stopping the
topology clone before nfs_active:

stop Topology-clone then stop nfs_active_siteA (kind: Mandatory)
stop Topology-clone then stop nfs_active_siteB (kind: Mandatory)

I also have:

start Topology-clone then start Controller-clone (kind: Mandatory)

as well as resource sets that make sure all filesystems start before the
relevant nfs_active resources (roughly of the shape sketched below).
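
For illustration only, an order set of that shape might look roughly like this
in pcs -- the resource names below are placeholders, not the actual IDs from my
cluster:

    # pcs constraint order set hana_nfs1_fs1 hana_nfs1_fs2 sequential=false \
        set hana_nfs1_active-clone
    # pcs constraint order set hana_nfs2_fs1 hana_nfs2_fs2 sequential=false \
        set hana_nfs2_active-clone

With sequential=false, the filesystems in the first set may start in parallel,
but all of them must be started before the resource in the second set.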

Also, it seems that regular order rules cannot be removed via their ID; maybe a
feature request is needed.
 
Best Regards,
Strahil Nikolov
 
  
If you mean a whole constraint set, then yes -- run `pcs constraint --full` to
get a list of all constraints with their constraint IDs. Then run `pcs
constraint remove <constraint ID>` to remove a particular constraint. This can
include set constraints.
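
As a quick sketch of that workflow (the ID in the second command is just a
placeholder for whatever the first command reports):

    # pcs constraint --full
    # pcs constraint remove <constraint-id-from-the-listing>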


Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question

2021-04-02 Thread Strahil Nikolov
Hi Reid,
I will check it out on Monday, but I'm pretty sure I created an order set that
first stops the topology and only then stops nfs-active.
Yet, I made the stupid decision to prevent ocf:heartbeat:Filesystem from
killing those two SAP processes (and to set a huge timeout for the stop
operation), which led to an 'I can't umount, giving up'-like notification and,
of course, fenced the entire cluster :D
Note taken: stonith now has different delays, and Filesystem is allowed to kill
the processes (roughly along the lines sketched below).
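
Purely as an illustration of that change -- the resource and fence device names
here are placeholders, and the exact values are assumptions, not what is
actually configured:

    # pcs resource update hana_nfs1_fs force_unmount=true op stop timeout=120s
    # pcs stonith update fence_node1 pcmk_delay_base=5s
    # pcs stonith update fence_node2 pcmk_delay_base=15s

force_unmount lets the Filesystem agent kill processes that block the umount,
and the different pcmk_delay_base values keep the fence devices from firing at
the same instant.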
As per the SAP note from Andrei, these could really be 'fast restart'
mechanisms in HANA 2.0, and it looks safe to kill them (I will check with SAP
about that).

P.S.: Is there a way to remove a whole set in pcs? It's really irritating when
the stupid command wipes the resource from multiple order constraints.
Best Regards,
Strahil Nikolov

 
 
On Fri, Apr 2, 2021 at 23:44, Reid Wahl wrote:

Hi, Strahil.
Based on the constraints documented in the article you're following (RH KB 
solution 5423971), I think I see what's happening.
The SAPHanaTopology resource requires the appropriate nfs-active attribute in 
order to run. That means that if the nfs-active attribute is set to false, the 
SAPHanaTopology resource must stop.
However, there's no rule saying SAPHanaTopology must finish stopping before the 
nfs-active attribute resource stops. In fact, it's quite the opposite: the 
SAPHanaTopology resource stops only after the nfs-active resource stops.
At the same time, the NFS resources are allowed to stop after the nfs-active 
attribute resource has stopped. So the NFS resources are stopping while the 
SAPHana* resources are likely still active.
Try something like this:
    # pcs constraint order hana_nfs1_active-clone then SAPHanaTopology_<SID>_<InstanceNumber>-clone kind=Optional
    # pcs constraint order hana_nfs2_active-clone then SAPHanaTopology_<SID>_<InstanceNumber>-clone kind=Optional

This says "if both hana_nfs1_active and SAPHanaTopology are scheduled to start, 
then make hana_nfs1_active start first. If both are scheduled to stop, then 
make SAPHanaTopology stop first."
"kind=Optional" means there's no order dependency unless both resources are 
already going to be scheduled for the action. I'm using kind=Optional here even 
though kind=Mandatory (the default) would make sense, because IIRC there were 
some unexpected interactions with ordering constraints for clones, where events 
on one node had unwanted effects on other nodes.
I'm not able to test right now since setting up an environment for this even 
with dummy resources is non-trivial -- but you're welcome to try this both with 
and without kind=Optional if you'd like.
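
(Not part of the KB article, just a generic sanity check: after adding the
constraints, `pcs constraint order --full` should list them with their IDs, and
crm_simulate can preview what the scheduler would do without changing
anything.)

    # pcs constraint order --full
    # crm_simulate --simulate --live-check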
Please let us know how this goes.

On Fri, Apr 2, 2021 at 2:20 AM Strahil Nikolov  wrote:

Hello All,
I am testing the newly built HANA (scale-out) cluster, and it seems that
neither SAPHanaController nor SAPHanaTopology stops HANA when I put the nodes
(same DC = same HANA) in standby. This of course leads to a situation where the
NFS cannot be unmounted and, despite the stop timeout, ends in fencing
(on-fail=fence).
I thought that the Controller resource agent stops HANA and that the slave role
should not be 'stopped' before that.
Maybe my expectations are wrong?
Best Regards,
Strahil Nikolov



-- 
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA  


Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question

2021-04-02 Thread Reid Wahl
On Fri, Apr 2, 2021 at 2:04 PM Strahil Nikolov 
wrote:

> Hi Reid,
>
> I will check it out in Monday, but I'm pretty sure I created an order set
> that first stops the topology and only then it stops the nfs-active.
>
> Yet, I made the stupid decision to prevent ocf:heartbeat:Filesystem (and
> setting a huge timeout for the stop operation) from killing those 2 SAP
> processes which led to 'I can't umount, giving up'-like notification and of
> course fenced the entire cluster :D .
>
> Note taken, stonith has now different delays , and Filesystem can kill the
> processes.
>
> As per the SAP note from Andrei, these could really be 'fast restart'
> mechanisms in HANA 2.0 and it looks safe to be killed (will check with SAP
> about that).
>
>
> P.S: Is there a way to remove a whole set in pcs , cause it's really
> irritating when the stupid command wipes the resource from multiple order
> constraints?
>

If you mean a whole constraint set, then yes -- run `pcs constraint --full`
to get a list of all constraints with their constraint IDs. Then run `pcs
constraint remove <constraint ID>` to remove a particular constraint. This
can include set constraints.


>
> Best Regards,
> Strahil Nikolov
>
>
>
> On Fri, Apr 2, 2021 at 23:44, Reid Wahl
>  wrote:
> Hi, Strahil.
>
> Based on the constraints documented in the article you're following (RH KB
> solution 5423971), I think I see what's happening.
>
> The SAPHanaTopology resource requires the appropriate nfs-active attribute
> in order to run. That means that if the nfs-active attribute is set to
> false, the SAPHanaTopology resource must stop.
>
> However, there's no rule saying SAPHanaTopology must finish stopping
> before the nfs-active attribute resource stops. In fact, it's quite the
> opposite: the SAPHanaTopology resource stops only after the nfs-active
> resource stops.
>
> At the same time, the NFS resources are allowed to stop after the
> nfs-active attribute resource has stopped. So the NFS resources are
> stopping while the SAPHana* resources are likely still active.
>
> Try something like this:
> # pcs constraint order hana_nfs1_active-clone then
> SAPHanaTopology_<SID>_<InstanceNumber>-clone kind=Optional
> # pcs constraint order hana_nfs2_active-clone then
> SAPHanaTopology_<SID>_<InstanceNumber>-clone kind=Optional
>
> This says "if both hana_nfs1_active and SAPHanaTopology are scheduled to
> start, then make hana_nfs1_active start first. If both are scheduled to
> stop, then make SAPHanaTopology stop first."
>
> "kind=Optional" means there's no order dependency unless both resources
> are already going to be scheduled for the action. I'm using kind=Optional
> here even though kind=Mandatory (the default) would make sense, because
> IIRC there were some unexpected interactions with ordering constraints for
> clones, where events on one node had unwanted effects on other nodes.
>
> I'm not able to test right now since setting up an environment for this
> even with dummy resources is non-trivial -- but you're welcome to try this
> both with and without kind=Optional if you'd like.
>
> Please let us know how this goes.
>
> On Fri, Apr 2, 2021 at 2:20 AM Strahil Nikolov 
> wrote:
>
> Hello All,
>
> I am testing the newly built HANA (Scale-out) cluster and it seems that:
> Neither SAPHanaController, nor SAPHanaTopology are stopping the HANA when
> I put the nodes (same DC = same HANA) in standby. This of course leads to a
> situation where the NFS cannot be umounted and despite the stop timeout  -
> leads to fencing(on-fail=fence).
>
> I thought that the Controller resource agent is stopping the HANA and the
> slave role should not be 'stopped' before that .
>
> Maybe my expectations are wrong ?
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
> --
> Regards,
>
>
> Reid Wahl, RHCA
> Senior Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA
>
>

-- 
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA


Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question

2021-04-02 Thread Reid Wahl
Hi, Strahil.

Based on the constraints documented in the article you're following (RH KB
solution 5423971), I think I see what's happening.

The SAPHanaTopology resource requires the appropriate nfs-active attribute
in order to run. That means that if the nfs-active attribute is set to
false, the SAPHanaTopology resource must stop.

However, there's no rule saying SAPHanaTopology must finish stopping before
the nfs-active attribute resource stops. In fact, it's quite the opposite:
the SAPHanaTopology resource stops only after the nfs-active resource stops.

At the same time, the NFS resources are allowed to stop after the
nfs-active attribute resource has stopped. So the NFS resources are
stopping while the SAPHana* resources are likely still active.

Try something like this:
# pcs constraint order hana_nfs1_active-clone then
SAPHanaTopology_<SID>_<InstanceNumber>-clone kind=Optional
# pcs constraint order hana_nfs2_active-clone then
SAPHanaTopology_<SID>_<InstanceNumber>-clone kind=Optional

This says "if both hana_nfs1_active and SAPHanaTopology are scheduled to
start, then make hana_nfs1_active start first. If both are scheduled to
stop, then make SAPHanaTopology stop first."

"kind=Optional" means there's no order dependency unless both resources are
already going to be scheduled for the action. I'm using kind=Optional here
even though kind=Mandatory (the default) would make sense, because IIRC
there were some unexpected interactions with ordering constraints for
clones, where events on one node had unwanted effects on other nodes.

I'm not able to test right now since setting up an environment for this
even with dummy resources is non-trivial -- but you're welcome to try this
both with and without kind=Optional if you'd like.

Please let us know how this goes.

On Fri, Apr 2, 2021 at 2:20 AM Strahil Nikolov 
wrote:

> Hello All,
>
> I am testing the newly built HANA (Scale-out) cluster and it seems that:
> Neither SAPHanaController, nor SAPHanaTopology are stopping the HANA when
> I put the nodes (same DC = same HANA) in standby. This of course leads to a
> situation where the NFS cannot be umounted and despite the stop timeout  -
> leads to fencing(on-fail=fence).
>
> I thought that the Controller resource agent is stopping the HANA and the
> slave role should not be 'stopped' before that .
>
> Maybe my expectations are wrong ?
>
> Best Regards,
> Strahil Nikolov
>
>


-- 
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA


Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question

2021-04-02 Thread Strahil Nikolov
Thanks Andrei,
So can we assume that killing those processes during the NFS unmount is
acceptable and poses no risk to the HANA data?
I have noticed that the cluster kills them anyway when the cluster is being
stopped (including NFS).

Best Regards,
Strahil Nikolov
 
 
  On Fri, Apr 2, 2021 at 14:31, Andrei Borzenkov wrote:   
On Fri, Apr 2, 2021 at 12:30 PM Strahil Nikolov  wrote:
>
> To be more specific, the processes left are 'hdbrsutil'

This process holds database content in memory after shutdown (I
believe, for 1 hour by default) to facilitate fast startup. You can
disable it. See SAP note 2159435.


> and the 'sapstartsrv'.
>

Well, this is the primary service that handles all requests from
sapcontrol. It is sort of supposed to be always running. Resource
agent handles missing sapstartsrv during activation.

I guess if you have a valid use case you may try to open a service
request or github issue to also stop sapstartsrv.

> Best Regards,
> Strahil Nikolov
>
> On Fri, Apr 2, 2021 at 12:20, Strahil Nikolov
>  wrote:
> Hello All,
>
> I am testing the newly built HANA (Scale-out) cluster and it seems that:
> Neither SAPHanaController, nor SAPHanaTopology are stopping the HANA when I 
> put the nodes (same DC = same HANA) in standby. This of course leads to a 
> situation where the NFS cannot be umounted and despite the stop timeout  - 
> leads to fencing(on-fail=fence).
>
> I thought that the Controller resource agent is stopping the HANA and the 
> slave role should not be 'stopped' before that .
>
> Maybe my expectations are wrong ?
>
> Best Regards,
> Strahil Nikolov
>
  


Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question

2021-04-02 Thread Strahil Nikolov
To be more specific, the processes left are 'hdbrsutil' and the 'sapstartsrv'.
Best Regards,
Strahil Nikolov
 
 
  On Fri, Apr 2, 2021 at 12:20, Strahil Nikolov wrote:   
Hello All,
I am testing the newly built HANA (Scale-out) cluster and it seems that:
Neither SAPHanaController, nor SAPHanaTopology are stopping the HANA when I put
the nodes (same DC = same HANA) in standby. This of course leads to a situation
where the NFS cannot be umounted and despite the stop timeout - leads to
fencing (on-fail=fence).
I thought that the Controller resource agent is stopping the HANA and the slave 
role should not be 'stopped' before that .
Maybe my expectations are wrong ?
Best Regards,
Strahil Nikolov
  


Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question

2021-04-02 Thread Andrei Borzenkov
On Fri, Apr 2, 2021 at 3:42 PM Strahil Nikolov  wrote:
>
> Thanks Andrei,
>
> so can we assume that killing those processes during NFS umount is acceptable 
> and no risk to the HANA data can be observed ?
>

This is a question for SAP support, not for some random public mailing list.

> I have noticed that the cluster is killing those when the cluster is being 
> stopped (including NFS) .
>
>
> Best Regards,
> Strahil Nikolov
>
> On Fri, Apr 2, 2021 at 14:31, Andrei Borzenkov
>  wrote:
> On Fri, Apr 2, 2021 at 12:30 PM Strahil Nikolov  wrote:
> >
> > To be more specific, the processes left are 'hdbrsutil'
>
> This process holds database content in memory after shutdown (I
> believe, for 1 hour by default) to facilitate fast startup. You can
> disable it. See SAP note 2159435.
>
>
> > and the 'sapstartsrv'.
> >
>
> Well, this is the primary service that handles all requests from
> sapcontrol. It is sort of supposed to be always running. Resource
> agent handles missing sapstartsrv during activation.
>
> I guess if you have a valid use case you may try to open a service
> request or github issue to also stop sapstartsrv.
>
>
> > Best Regards,
> > Strahil Nikolov
> >
> > On Fri, Apr 2, 2021 at 12:20, Strahil Nikolov
> >  wrote:
> > Hello All,
> >
> > I am testing the newly built HANA (Scale-out) cluster and it seems that:
> > Neither SAPHanaController, nor SAPHanaTopology are stopping the HANA when I 
> > put the nodes (same DC = same HANA) in standby. This of course leads to a 
> > situation where the NFS cannot be umounted and despite the stop timeout  - 
> > leads to fencing(on-fail=fence).
> >
> > I thought that the Controller resource agent is stopping the HANA and the 
> > slave role should not be 'stopped' before that .
> >
> > Maybe my expectations are wrong ?
> >
> > Best Regards,
> > Strahil Nikolov
>
> >
>


Re: [ClusterLabs] SAPHanaController & SAPHanaTopology question

2021-04-02 Thread Andrei Borzenkov
On Fri, Apr 2, 2021 at 12:30 PM Strahil Nikolov  wrote:
>
> To be more specific, the processes left are 'hdbrsutil'

This process holds database content in memory after shutdown (I
believe, for 1 hour by default) to facilitate fast startup. You can
disable it. See SAP note 2159435.


> and the 'sapstartsrv'.
>

Well, this is the primary service that handles all requests from
sapcontrol. It is sort of supposed to be always running. Resource
agent handles missing sapstartsrv during activation.

I guess if you have a valid use case you may try to open a service
request or github issue to also stop sapstartsrv.
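
For what it's worth, a quick way to check which of these processes are still
around after a cluster stop (plain shell; nothing beyond the two process names
already mentioned is assumed):

    # ps -ef | grep -E 'hdbrsutil|sapstartsrv' | grep -v grep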

> Best Regards,
> Strahil Nikolov
>
> On Fri, Apr 2, 2021 at 12:20, Strahil Nikolov
>  wrote:
> Hello All,
>
> I am testing the newly built HANA (Scale-out) cluster and it seems that:
> Neither SAPHanaController, nor SAPHanaTopology are stopping the HANA when I 
> put the nodes (same DC = same HANA) in standby. This of course leads to a 
> situation where the NFS cannot be umounted and despite the stop timeout  - 
> leads to fencing(on-fail=fence).
>
> I thought that the Controller resource agent is stopping the HANA and the 
> slave role should not be 'stopped' before that .
>
> Maybe my expectations are wrong ?
>
> Best Regards,
> Strahil Nikolov
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/