Re: [ClusterLabs] staggered resource start/stop

2021-03-29 Thread Klaus Wenninger

On 3/29/21 8:44 AM, d tbsky wrote:

Reid Wahl 

An order constraint set with kind=Serialize (which is mentioned in the first 
reply to the thread you linked) seems like the most logical option to me. You 
could serialize a set of resource sets, where each inner set contains a 
VirtualDomain resource and an ocf:heartbeat:Delay resource.

  ⁠5.3.1. Ordering Properties 
(https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#idm46061192464416)
  ⁠5.6. Ordering Sets of Resources 
(https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#s-resource-sets-ordering)
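
For illustration, a rough pcs sketch of that layout (untested; the resource
names, domain config paths, and the 30-second delay are assumptions, not from
the thread):

# pcs resource create vm1 ocf:heartbeat:VirtualDomain \
    config=/etc/libvirt/qemu/vm1.xml
# pcs resource create vm1-delay ocf:heartbeat:Delay startdelay=30
# pcs resource create vm2 ocf:heartbeat:VirtualDomain \
    config=/etc/libvirt/qemu/vm2.xml
# pcs resource create vm2-delay ocf:heartbeat:Delay startdelay=30
# pcs constraint order set vm1 vm1-delay set vm2 vm2-delay \
    setoptions kind=Serialize

Each inner set (a VM plus its Delay) then starts serialized relative to the
next one, with the Delay padding the gap between VM starts.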

  thanks a lot! I didn't know there is an official RA acting as a
delay. That's interesting and useful to me.

In this case it might be better not to wait some fixed time and
hope that startup of the VM has progressed far enough for the
IO load to have decayed.
What about a resource that checks for something running
inside the VM that indicates that startup has completed?
I don't remember whether the VirtualDomain RA already
offers such a probe possibility.

Klaus
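
If the qemu guest agent runs inside the VM, one crude probe from the host
would be (a sketch; the domain name vm1 is hypothetical, and this assumes the
agent only answers once the guest is reasonably far into boot):

# virsh qemu-agent-command vm1 '{"execute":"guest-ping"}'
{"return":{}}

A small wrapper resource agent could report "started" only once such a probe
succeeds.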

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.'

2021-03-29 Thread Klaus Wenninger

On 3/29/21 5:24 PM, Tomas Jelinek wrote:
If you stopped a node and you want it to start and reconnect to its 
cluster, run 'pcs cluster start' on the node. You may also run 'pcs 
cluster start --all' or (in your case) 'pcs cluster start node1' on 
any cluster node.


Maybe for better understanding: this works via the pcsd instances talking
to each other, while all the other cluster daemons are down on a node that
has the cluster stopped.

Klaus

Tomas


On 29. 03. 21 at 16:25, Jason Long wrote:

Thank you.
Then, if a node is disconnected, how can it get back into the cluster?







On Monday, March 29, 2021, 06:13:09 PM GMT+4:30, Tomas Jelinek 
 wrote:






Hi Jason,

Regarding point 3:
Most pcs commands operate on the local node. If you stop a cluster on a
node, pcs is unable to connect to cluster daemons on the node (since
they are not running) and prints an error message denoting that. This is
expected behavior.

Regards,
Tomas


On 27. 03. 21 at 6:54, Jason Long wrote:

Thank you.
I have other questions:

1- How can I launch a test lab?
2- Why, when I stop node1 manually and then start it again, 
can't I browse "http://127.0.0.1:2080"? I think when I stopped node1, 
Pacemaker forgot to bring it back into the chain!!!
3- Why, when I stopped node1, does the "pcs status nodes" command not 
work? It shows me "Error: error running crm_mon, is pacemaker 
running?".







On Thursday, March 25, 2021, 09:08:45 PM GMT+4:30, Ken Gaillot 
 wrote:






On Thu, 2021-03-25 at 14:44 +, Jason Long wrote:

Then, how can I be sure my configuration is OK?
In a clustering environment, when a node is disconnected, another
node must replace it. Am I right?
I did a test:
I defined a NAT interface for my VM2 (node2) and used port
forwarding: "127.0.0.1:2090" on Host FORWARDING TO 127.0.0.1:80 on
Guest.
When node1 is OK and I browse "http://127.0.0.1:2080" then it shows
me "My Test Site - node1", but when I browse "http://127.0.0.1:2090"
it doesn't show anything.
I stopped node1, and when I browse "http://127.0.0.1:2080" it doesn't
show anything, but when I browse "http://127.0.0.1:2090" it shows
me "My Test Site - node2".
Could this mean that my cluster is working properly?


Port-forwarding to a single VM can never allow the other VM to take
over.

The intent of the floating IP address is to have a single, unique
address that users can use to contact the service. The cluster can move
this IP to one VM or the other, and that is invisible to users. The
term "floating" is intended to convey this, that the IP address is not
tied to a single node, but can move ("float") from one node to another,
transparently to users using the IP address.

In this case, the floating IP would take the place of the 127.0.0.1
port-forwarding addresses. Instead of two port-forwarding addresses,
you just have the one floating IP address.

How you get that working with a reverse proxy is up to you. The
Clusters from Scratch example shows how to do it with a web server, to
present the concepts, and you can tailor that to any service that needs
to be clustered.
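
For reference, the Clusters from Scratch way of creating such a floating IP,
here reusing the 192.168.56.9 address seen later in this thread (the netmask
is an assumption):

# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
    ip=192.168.56.9 cidr_netmask=24 op monitor interval=30s

Clients then always contact 192.168.56.9, and the cluster decides which node
answers.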



On Thursday, March 25, 2021, 05:20:33 PM GMT+4:30, Klaus Wenninger <
kwenn...@redhat.com> wrote:





On 3/25/21 9:55 AM, Jason Long wrote:

Thank you so much.

       Now you can proceed with the "Add Apache HTTP" section.


What does it mean? I did all steps in the document.


       Once apache is set up as a cluster resource, you should be
able to contact the web server at the floating IP...


# pcs cluster stop node1
node1: Stopping Cluster (pacemaker)...
node1: Stopping Cluster (corosync)...
#
# pcs status
Error: error running crm_mon, is pacemaker running?
       Could not connect to the CIB: Transport endpoint is not
connected
       crm_mon: Error: cluster is not available on this node
#
# curl http://192.168.56.9

       My Test Site - node2
       

Thank you for that, but I want to use these two VMs as an Apache
Reverse Proxy Server. When one of my nodes stops, the other
node should start serving.

My test lab uses VirtualBox with two VMs as below:
VM1: This VM has two NICs (NAT, Host-only Adapter)
VM2: This VM has one NIC (Host-only Adapter)

On VM1, I use the NAT interface for the port forwarding:
"127.0.0.1:2080" on Host  FORWARDING TO 127.0.0.1:80 on Guest.

When I stopped node1 and browse "http://127.0.0.1:2080" I
can't see anything. I want it to show me "My Test Site - node2". I
think that is reasonable, because when one of my Reverse Proxy Servers
(node1) stops, the other Reverse Proxy Server (node2) should start serving.

How can I achieve this goal?


Definitely not using that NAT interface I would say.
It will just be able to connect you with a service running on VM1.
And that doesn't make any sense seen from a high-availability
point of view. Even if you setup NAT that would make the
proxy on node2 visible via VM1 this wouldn't give you
increased availability - rather the opposite due to increased
complexity. In high-availability we are speaking of a Single
Point of Failure (SPOF) which VM1 is gonna be here and what you
never ever wanna have.

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-03-29 Thread Andrei Borzenkov
On 29.03.2021 20:12, Ken Gaillot wrote:
> On Sun, 2021-03-28 at 09:20 +0300, Andrei Borzenkov wrote:
>> On 28.03.2021 07:16, Strahil Nikolov wrote:
>>> I didn't mean DC as a designated coordinator, but as a physical
>>> Datacenter location.
>>> Last time I checked, the node attributes for all nodes seemed the
>>> same. I will verify that tomorrow (Monday).
>>>
>>
>> Yes, I was probably mistaken. It is different with scale-out: the agent
>> puts information in the global property section of the CIB.
>>
>> Ideally we'd need an expression that says "on a node where the site
>> attribute is the same as on the node where the clone master is active",
>> but I guess there is no way to express that in pacemaker.
> 
> Yep, colocation by node attribute (combined with colocation with
> promoted role)
> 
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#_colocation_properties
> 


Sorry, I must be daft but I do not see how it helps.

There are two sets of nodes. One set has property site=A, the other set
site=B. This can be assumed to be static in this case, never changing.

There is a set of resources that must run on each node of either site A
or site B. Which site depends on where some other resource (in this case
clone master) is currently active.

Colocation does not work: it would force everything onto the same node
where the master is active, and that is not what we want.

Location constraints do not work because the required value of "site" is not
known in advance, and rules can only use static values to compare node
attributes (i.e. the value is either a literal, or is taken from a resource
parameter or a resource meta-attribute).


> 
>>
>> I do not see any easy way to implement it without essentially
>> duplicating SAPHanaTopology. There are some attributes that are
>> defined
>> but never set so far, you may try to open service request to
>> implement
>> consistent attribute for all nodes on current primary site.
>>
>> ...
>>
>> Hmm ... agent sets (at least, should set) hana_${SID}_vhost attribute
>> for each node and this attribute must be unique and different between
>> two sites. It may be worth looking into.
>>
>>
>>> Best Regards, Strahil Nikolov
>>>
>>> On Fri, Feb 19, 2021 at 16:51, Andrei Borzenkov <arvidj...@gmail.com> wrote:
>>> On Fri, Feb 19, 2021 at 2:44 PM Strahil Nikolov  wrote:


> Do you have a fixed relation between node pairs and VIPs? I.e. must
> A/D always get VIP1, B/E - VIP2 etc?

 I have to verify it again, but generally speaking - yes , VIP1 is
 always on nodeA/D (master), VIP2 on nodeB/E (worker1) , etc.

 I guess I can set negative constraints (-inf) -> VIP1 on node B/E
 + nodeC/F, but the stuff with the 'same DC as master' is the
 tricky part.

>>>
>>> I am not sure I understand what DC has to do with it. You have two
>>> scale-out SAP HANA instances, one is primary, another is secondary.
>>> If
>>> I understand correctly your requirements, your backup application
>>> needs to contact the primary instance which may failover to another
>>> site. You must be using some resource agent for it, to manage
>>> failover. The only one I am aware of is SAPHanaSR-ScaleOut. It
>>> already
>>> sets different node properties for primary and secondary sites.
>>> Just
>>> use them. If you use something else, just look at what attributes
>>> your
>>> RA sets. Otherwise you will be essentially duplicating your RA
>>> functionality because you will somehow need to find out which site
>>> is
>>> currently primary.
>>>
>>> There is no guarantee that pacemaker DC will be on the same site as
>>> SAP
>>> HANA primary system.
>>>   
>>>
>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.'

2021-03-29 Thread Jason Long
Thank you.
Then, if a node is disconnected, how can it get back into the cluster?






On Monday, March 29, 2021, 06:13:09 PM GMT+4:30, Tomas Jelinek 
 wrote: 





Hi Jason,

Regarding point 3:
Most pcs commands operate on the local node. If you stop a cluster on a 
node, pcs is unable to connect to cluster daemons on the node (since 
they are not running) and prints an error message denoting that. This is 
expected behavior.

Regards,
Tomas


On 27. 03. 21 at 6:54, Jason Long wrote:
> Thank you.
> I have other questions:
> 
> 1- How can I launch a test lab?
> 2- Why, when I stop node1 manually and then start it again, can't I browse 
> "http://127.0.0.1:2080"? I think when I stopped node1, Pacemaker forgot 
> to bring it back into the chain!!!
> 3- Why, when I stopped node1, does the "pcs status nodes" command not work? It 
> shows me "Error: error running crm_mon, is pacemaker running?".
> 
> 
> 
> 
> 
> 
> On Thursday, March 25, 2021, 09:08:45 PM GMT+4:30, Ken Gaillot 
>  wrote:
> 
> 
> 
> 
> 
> On Thu, 2021-03-25 at 14:44 +, Jason Long wrote:
>> Then, how can I be sure my configuration is OK?
>> In a clustering environment, when a node is disconnected, another
>> node must replace it. Am I right?
>> I did a test:
>> I defined a NAT interface for my VM2 (node2) and used port
>> forwarding: "127.0.0.1:2090" on Host FORWARDING TO 127.0.0.1:80 on
>> Guest.
>> When node1 is OK and I browse "http://127.0.0.1:2080" then it shows
>> me "My Test Site - node1", but when I browse "http://127.0.0.1:2090"
>> it doesn't show anything.
>> I stopped node1, and when I browse "http://127.0.0.1:2080" it doesn't
>> show anything, but when I browse "http://127.0.0.1:2090" it shows
>> me "My Test Site - node2".
>> Could this mean that my cluster is working properly?
> 
> Port-forwarding to a single VM can never allow the other VM to take
> over.
> 
> The intent of the floating IP address is to have a single, unique
> address that users can use to contact the service. The cluster can move
> this IP to one VM or the other, and that is invisible to users. The
> term "floating" is intended to convey this, that the IP address is not
> tied to a single node, but can move ("float") from one node to another,
> transparently to users using the IP address.
> 
> In this case, the floating IP would take the place of the 127.0.0.1
> port-forwarding addresses. Instead of two port-forwarding addresses,
> you just have the one floating IP address.
> 
> How you get that working with a reverse proxy is up to you. The
> Clusters from Scratch example shows how to do it with a web server, to
> present the concepts, and you can tailor that to any service that needs
> to be clustered.
> 
>>
>> On Thursday, March 25, 2021, 05:20:33 PM GMT+4:30, Klaus Wenninger <
>> kwenn...@redhat.com> wrote:
>>
>>
>>
>>
>>
>> On 3/25/21 9:55 AM, Jason Long wrote:
>>> Thank you so much.
      Now you can proceed with the "Add Apache HTTP" section.
>>>
>>> What does it mean? I did all steps in the document.
>>>
      Once apache is set up as a cluster resource, you should be
 able to contact the web server at the floating IP...
>>>
>>> # pcs cluster stop node1
>>> node1: Stopping Cluster (pacemaker)...
>>> node1: Stopping Cluster (corosync)...
>>> #
>>> # pcs status
>>> Error: error running crm_mon, is pacemaker running?
>>>      Could not connect to the CIB: Transport endpoint is not
>>> connected
>>>      crm_mon: Error: cluster is not available on this node
>>> #
>>> # curl http://192.168.56.9
>>> 
>>>      My Test Site - node2
>>>      
>>>
>>> Thank you for that, but I want to use these two VMs as an Apache
>>> Reverse Proxy Server. When one of my nodes stops, the other
>>> node should start serving.
>>>
>>> My test lab uses VirtualBox with two VMs as below:
>>> VM1: This VM has two NICs (NAT, Host-only Adapter)
>>> VM2: This VM has one NIC (Host-only Adapter)
>>>
>>> On VM1, I use the NAT interface for the port forwarding:
>>> "127.0.0.1:2080" on Host  FORWARDING TO 127.0.0.1:80 on Guest.
>>>
>>> When I stopped node1 and browse "http://127.0.0.1:2080" I
>>> can't see anything. I want it to show me "My Test Site - node2". I
>>> think that is reasonable, because when one of my Reverse Proxy Servers
>>> (node1) stops, the other Reverse Proxy Server (node2) should start serving.
>>>
>>> How can I achieve this goal?
>>
>> Definitely not using that NAT interface I would say.
>> It will just be able to connect you with a service running on VM1.
>> And that doesn't make any sense seen from a high-availability
>> point of view. Even if you setup NAT that would make the
>> proxy on node2 visible via VM1 this wouldn't give you
>> increased availability - rather the opposite due to increased
>> complexity. In high-availability we are speaking of a Single
>> Point of Failure (SPOF) which VM1 is gonna be here and what you
>> never ever wanna have.
>>>
>>>
>>>
>>>
>>>
>>> On Wednesday, March 24, 2021, 10:21:09 PM GMT+4:30, Ken Gaillot <
>>> 

Re: [ClusterLabs] What a "high priority"?

2021-03-29 Thread Ken Gaillot
Scores are in the range -1,000,000 to +1,000,000 (also known as
"infinity").

Numerically higher scores are preferred in whatever the context is
(e.g. higher stickiness means more sticky, higher colocation score
means more likely to stay together, etc.).
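
As a sketch with hypothetical resource names (pcs syntax as used elsewhere on
this list): an INFINITY colocation score makes placement mandatory, while a
finite score only expresses a preference the scheduler can override:

# pcs constraint colocation add WebSite with ClusterIP INFINITY
# pcs resource meta WebSite resource-stickiness=100

Here WebSite must run where ClusterIP runs, and the stickiness of 100 merely
makes it prefer to stay on its current node.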

On Mon, 2021-03-29 at 13:05 +0200, Ulrich Windl wrote:
> Hi!
> 
> The question may sound completely stupid, but I didn't find the
> formal definition of a "high priority" in the pacemaker docs.
> Many years ago I thought lower numbers are higher priorities, but
> then I flipped the concept, thinking higher numbers are higher
> priorities.
> As it seems resource placement (i.e.: relocation) is done using lower
> priorities first, I wonder whether there is consensus among the
> developers what a "higher priority" is.
> 
> Regards,
> Ulrich
> 
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: Re: Antw: [EXT] staggered resource start/stop

2021-03-29 Thread Ken Gaillot
On Mon, 2021-03-29 at 13:01 +0200, Ulrich Windl wrote:
> > > > Reid Wahl  wrote on 29.03.2021 at 12:47 in
> > > > message
> 
> :
> > On Mon, Mar 29, 2021 at 3:35 AM Ulrich Windl <
> > ulrich.wi...@rz.uni-regensburg.de> wrote:
> > 
> > > > > > d tbsky  wrote on 29.03.2021 at 04:01
> > > > > > in message
> > > 
> > > <
> > > CAC6SzHLi0ufVhE3RM57e2V=t_moml5ecx8ay3gtcfgmofkd...@mail.gmail.com
> > > >:
> > > > Hi:
> > > >    since starting/stopping the VMs all at once consumes disk IO,
> > > > I want to start/stop the VMs one‑by‑one with a delay.
> > > 
> > > I'm surprised that in these days of fast disks and SSDs this is
> > > still an
> > > issue.
> > > Maybe don't delay the start, but limit concurrent starts.
> > > Or maybe add some weak ordering between the VMs.
> > > 
> > 
> > kind=Serialize does this. It makes the resources start
> > consecutively, in no
> > particular order. I added the comment about ocf:heartbeat:Delay
> > because D
> > mentioned wanting a delay... but I don't see why it would be
> > necessary, if
> > Serialize is used.
> 
> This problem made me think: does the rather new tag mechanism offer a way
> to impose a concurrency limit per tag? The VMs could be tagged as "VM",
> and if you limited concurrency for tag "VM" to 1 or 2, you'd be done,
> without limiting other resources...

Serialize turns concurrency on/off rather than limiting the number of
concurrent actions, but yes, tags can be used in resource sets, so you could
create a Serialize constraint using a resource set with a tag.
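
A sketch of that, assuming a pcs version with tag support (names
hypothetical):

# pcs tag create VM vm1 vm2 vm3
# pcs constraint order set VM setoptions kind=Serialize

A tag used in a constraint is treated like the list of its member resources.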

> And the resources could still start in the order the cluster thinks is
> best...
> 
> Regards,
> Ulrich
> 
> > 
> > 
> > > > 
> > > >    Searching the email list, I found this discussion:
> > > > https://oss.clusterlabs.org/pipermail/pacemaker/2013‑August/043128.html 
> > > > 
> > > > Now I am testing rhel8 with pacemaker 2.0.4. I wonder if there are
> > > > new methods to solve the problem. I searched the documentation but
> > > > didn't find new parameters for the job.
> > > > 
> > > >    If possible I don't want to modify the VirtualDomain RA which
> > > > comes with the standard rpm package. Maybe I should write a new RA
> > > > which staggers the node utilization, but if I reset the node
> > > > utilization when the cluster restarts, there may be a race condition.
> > > > 
> > > >  thanks for help!
> > > > ___
> > > > Manage your subscription:
> > > > https://lists.clusterlabs.org/mailman/listinfo/users 
> > > > 
> > > > ClusterLabs home: https://www.clusterlabs.org/ 
> > > 
> > > 
> > > 
> > > ___
> > > Manage your subscription:
> > > https://lists.clusterlabs.org/mailman/listinfo/users 
> > > 
> > > ClusterLabs home: https://www.clusterlabs.org/ 
> > > 
> > 
> > 
> > -- 
> > Regards,
> > 
> > Reid Wahl, RHCA
> > Senior Software Maintenance Engineer, Red Hat
> > CEE - Platform Support Delivery - ClusterHA
> 
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-03-29 Thread Ken Gaillot
On Sun, 2021-03-28 at 09:20 +0300, Andrei Borzenkov wrote:
> On 28.03.2021 07:16, Strahil Nikolov wrote:
> > I didn't mean DC as a designated coordinator, but as a physical
> > Datacenter location.
> > Last time I checked, the node attributes for all nodes seemed the
> > same. I will verify that tomorrow (Monday).
> > 
> 
> Yes, I was probably mistaken. It is different with scale-out: the agent
> puts information in the global property section of the CIB.
> 
> Ideally we'd need an expression that says "on a node where the site
> attribute is the same as on the node where the clone master is active",
> but I guess there is no way to express that in pacemaker.

Yep, colocation by node attribute (combined with colocation with
promoted role)

https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#_colocation_properties
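
Concretely, the node-attribute property of a colocation constraint (default
"#uname", i.e. "same node") is what makes this per-site. A raw CIB sketch
with hypothetical resource ids, loadable via cibadmin or pcs cluster edit:

<rsc_colocation id="vip-with-primary-site" score="INFINITY"
    rsc="VIP1" with-rsc="SAPHana-clone" with-rsc-role="Master"
    node-attribute="site"/>

This reads: VIP1 may only run on a node whose "site" attribute equals the
"site" attribute of the node where SAPHana-clone is promoted.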


> 
> I do not see any easy way to implement it without essentially
> duplicating SAPHanaTopology. There are some attributes that are
> defined
> but never set so far, you may try to open service request to
> implement
> consistent attribute for all nodes on current primary site.
> 
> ...
> 
> Hmm ... agent sets (at least, should set) hana_${SID}_vhost attribute
> for each node and this attribute must be unique and different between
> two sites. It may be worth looking into.
> 
> 
> > Best Regards, Strahil Nikolov
> >
> > On Fri, Feb 19, 2021 at 16:51, Andrei Borzenkov <arvidj...@gmail.com> wrote:
> > On Fri, Feb 19, 2021 at 2:44 PM Strahil Nikolov  wrote:
> > > 
> > > 
> > > > Do you have a fixed relation between node pairs and VIPs? I.e. must
> > > > A/D always get VIP1, B/E - VIP2 etc?
> > > 
> > > I have to verify it again, but generally speaking - yes , VIP1 is
> > > always on nodeA/D (master), VIP2 on nodeB/E (worker1) , etc.
> > > 
> > > I guess I can set negative constraints (-inf) -> VIP1 on node B/E
> > > + nodeC/F, but the stuff with the 'same DC as master' is the
> > > tricky part.
> > > 
> > 
> > I am not sure I understand what DC has to do with it. You have two
> > scale-out SAP HANA instances, one is primary, another is secondary.
> > If
> > I understand correctly your requirements, your backup application
> > needs to contact the primary instance which may failover to another
> > site. You must be using some resource agent for it, to manage
> > failover. The only one I am aware of is SAPHanaSR-ScaleOut. It
> > already
> > sets different node properties for primary and secondary sites.
> > Just
> > use them. If you use something else, just look at what attributes
> > your
> > RA sets. Otherwise you will be essentially duplicating your RA
> > functionality because you will somehow need to find out which site
> > is
> > currently primary.
> > 
> > There is no guarantee that pacemaker DC will be on the same site as
> > SAP
> > HANA primary system.
> >   
> > 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] [EXT] Re: Feedback wanted: OCF Resource Agent API 1.1 proposed for adoption

2021-03-29 Thread Ken Gaillot
I've made a note of these as ideas for 1.2/2.0 :)

On Sun, 2021-03-28 at 03:03 +0200, Ulrich Windl wrote:
> On 3/26/21 11:17 PM, Ken Gaillot wrote:
> > OCF 1.1 is now formally adopted!
> > 
> > https://github.com/ClusterLabs/OCF-spec/blob/master/ra/1.1/resource-agent-api.md
> > 
> > Thanks to everyone who gave feedback.
> 
> "The minor number can be used by both sides to see whether a certain 
> additional feature is supported by the other party."
> 
> That would mean there's a precise revision history with all features 
> changed. I doubt such a thing exists yet, and the mistakes made in
> the 
> past (like changing the XML without changing the version) can't be 
> corrected either. ;-)

Very true :) but we're in a better position now than before.

All versions of the standard will be kept in the repo (ra/1.0/,
ra/1.1/, etc.) so the info is all there for comparison.

> "Actions must be idempotent." Well: if a "start" action fails, does
> it 
> have to fail the next time, too? Maybe it's "Successful Actions must
> be 
> idempotent."
> Maybe even "Successful state-changing Actions must be idempotent."
> ("Monitor" most likely isn't idempotent; otherwise you would get the 
> same status all the time, right?)

Maybe we should avoid "idempotent" and describe the desired situation
in more natural language.

> "Multiple resource instances of the same type may be running in
> parallel."
> 
> What about "Multiple concurrent actions for separate resource
> instances 
> (using the same RA) must be handled correctly." instead?

Or in addition, sure

> What about listing allowable exit codes with each action?

I think for the purposes of the standard, any action can return any of
the specified error codes that make sense in the agent's specific
context (of course the error codes should be used as the standard
describes)

> Are there any metadata provisions for reporting the OCF_CHECK_LEVEL?

Yes, "depth" (which is described in the standard, but maybe could be
clearer)

> I don't quite understand exit codes 190 and 191. Maybe add an
> example.

Sure, that makes sense.

The degraded codes are intended for host-specific conditions where the
service is currently fine but there is some indication that the host
may be less desirable as a location in the near future. Maybe some
required system resource is nearing exhaustion. An agent for a network
interface might report degraded if transmission errors are increasing.
That sort of thing.

For Pacemaker, these will be displayed in status output like failures,
but will not otherwise be treated as a resource failure.
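
A minimal monitor sketch using these codes (the helper checks are
hypothetical; real agents would source the shipped ocf-shellfuncs
definitions rather than hard-coding the numbers):

monitor() {
    service_pid_alive || return 7         # OCF_NOT_RUNNING
    if transmit_errors_increasing; then   # hypothetical host-health check
        return 190                        # OCF_DEGRADED: service fine, but
    fi                                    # this host looks less desirable
    return 0                              # OCF_SUCCESS
}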

I'm not aware of any agents currently using the feature. Support was
added to Pacemaker in 2015 (though broken until recently); I'm not sure
who originally requested it. 

> "must at least support XML output": Is there any format other than
> XML 
> specified? If not the statement doesn't make sense.

The standard allows agents to support any output formats they choose,
but meta-data must support XML. As of the new standard, an agent can
choose to support other formats if specified by OCF_OUTPUT_FORMAT, for
example for "text" they could display it in a human-readable format.
Only the XML format is constrained by the standard, agents can do what
they want with anything else.
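
A sketch of an agent honoring that (print_metadata_xml and
print_metadata_plain are hypothetical helpers; the fallback choice is an
assumption, since only the XML format is constrained by the standard):

meta_data() {
    case "${OCF_OUTPUT_FORMAT:-xml}" in
        xml)  print_metadata_xml ;;    # required by the standard
        text) print_metadata_plain ;;  # optional, agent-defined rendering
        *)    print_metadata_xml ;;    # fall back to XML (an assumption)
    esac
}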

> What about line-wrapping and other formatting usable in ?
> What about lengths for ?
> 
> The Semantics are under-specified IMHO. Example  vs.
> ?

The schema allows either , or  and ,
depending on the context -- never mixed.

> IMHO it would be best to specify exactly what is allowed; everything 
> that isn't allowed is forbidden.
> (That's better than allowing some things and forbidding others,
> leaving a 
> "gray zone" in between)
> 
> Regards,
> Ulrich

Definitely lots of details are left unspecified in the standard. It was
decided not to try to be exhaustive with 1.1 for the sake of getting it
out more quickly, since it's been 19 years since 1.0 already :)
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Re: Order set troubles

2021-03-29 Thread Andrei Borzenkov
On 29.03.2021 11:11, Ulrich Windl wrote:
> >>> Andrei Borzenkov  wrote on 27.03.2021 at 06:37 in
> message <7c294034-56c3-baab-73c6-7909ab554...@gmail.com>:
>> On 26.03.2021 22:18, Reid Wahl wrote:
>>> On Fri, Mar 26, 2021 at 6:27 AM Andrei Borzenkov 
>>> wrote:
>>>
 On Fri, Mar 26, 2021 at 10:17 AM Ulrich Windl
  wrote:
>
 Andrei Borzenkov  wrote on 26.03.2021 at
 06:19 in
> message <534274b3‑a6de‑5fac‑0ae4‑d02c305f1...@gmail.com>:
>> On 25.03.2021 21:45, Reid Wahl wrote:
>>> FWIW we have this KB article (I seem to remember Strahil is a Red Hat
>>> customer):
>>>   ‑ How do I configure SAP HANA Scale‑Up System Replication in a
 Pacemaker
>>> cluster when the HANA filesystems are on NFS shares?(
>>> https://access.redhat.com/solutions/5156571)
>>>
>>
>> "How do I make the cluster resources recover when one node loses access
>> to the NFS server?"
>>
>> If node loses access to NFS server then monitor operations for resources
>> that depend on NFS availability will fail or timeout and pacemaker will
>> recover (likely by rebooting this node). That's how similar
>> configurations have been handled for the past 20 years in other HA
>> managers. I am genuinely interested, have you encountered the case where
>> it was not enough?
>
> That's a big problem with the SAP design (basically it's just too complex).
> In the past I had written a kind of resource agent that worked without that
> overly complex overhead, but since those days SAP has added much more
> complexity.
> If the NFS server is external, pacemaker could fence your nodes when the NFS
> server is down, as first the monitor operation will fail (hanging on NFS),
> then the recover (stop/start) will fail (also hanging on NFS).

>>> And how exactly placing NFS resource under pacemaker control is going
>>> to change it?

>>>
>>> I noted earlier based on the old case notes:
>>>
>>> "Apparently there were situations in which the SAPHana resource wasn't
>>> failing over when connectivity was lost with the NFS share that contained
>>> the hdb* binaries and the HANA data. I don't remember the exact details
>>> (whether demotion was failing, or whether it wasn't even trying to demote
>>> on the primary and promote on the secondary, or what). Either way, I was
>>> surprised that this procedure was necessary, but it seemed to be."
>>>
>>> Strahil may be dealing with a similar situation, not sure. I get where
>>> you're coming from ‑‑ I too would expect the application that depends on
>>> NFS to simply fail when NFS connectivity is lost, which in turn leads to
>>> failover and recovery. For whatever reason, due to some weirdness of the
>>> SAPHana resource agent, that didn't happen.
>>>
>>
>> Yes. The only reason to use this workaround would be if the resource agent
>> monitor still believes that the application is up when the required NFS is
>> down. Which is a bug in the resource agent or possibly in the application
>> itself.
> 
> I think it's getting philosophical now:
> For example a web server using documents from an NFS server:
> Is the webserver down when access to NFS hangs?

From the end user's point of view, the web server is down when it cannot
complete a user request. From the HA manager's point of view, the web server
is down when the agent says it is down. Whether the agent just checks for the
web server PID or actually attempts to fetch something from it is up to the
agent.

I know that the SAP HANA agent uses SAP HANA binaries to query the SAP HANA
database, so I /expect/ that in this case the attempt ends up as a failure
from the HA manager's point of view.

> Would restarting ("recover")
> the web server help in that situation?

No. But that is irrelevant here. If the web server depends on an NFS mount
and the NFS mount is reported failed, the HA manager will attempt to recover
by first stopping the web server. Whether the error indication comes from the
web server or the NFS mount is irrelevant. It is very unlikely that the HA
manager will ever reach the "starting" step, because stopping will either
fail or time out and the node will be fenced.

> Maybe the OCF_CHECK_LEVEL could be used: high levels could query not only
> whether the resource is "running", but also whether it is responding, etc.
> 
>>
>> While using this workaround in this case is perfectly reasonable, none
>> of the reasons listed in the message I was replying to are applicable.
>>
>> So far the only reason OP wanted to do it was some obscure race
>> condition on startup outside of pacemaker. In which case this workaround
>> simply delays NFS mount, sidestepping race.
>>
>> I also remember something about racing with dnsmasq, at which point I'd
>> say that making cluster depend on availability of DNS is e‑h‑h‑h unwise.
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>
>> ClusterLabs home: 

Re: [ClusterLabs] WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.'

2021-03-29 Thread Tomas Jelinek
If you stopped a node and you want it to start and reconnect to its 
cluster, run 'pcs cluster start' on the node. You may also run 'pcs 
cluster start --all' or (in your case) 'pcs cluster start node1' on any 
cluster node.


Tomas


On 29. 03. 21 at 16:25, Jason Long wrote:

Thank you.
Then, if a node is disconnected, how can it get back into the cluster?






On Monday, March 29, 2021, 06:13:09 PM GMT+4:30, Tomas Jelinek 
 wrote:





Hi Jason,

Regarding point 3:
Most pcs commands operate on the local node. If you stop a cluster on a
node, pcs is unable to connect to cluster daemons on the node (since
they are not running) and prints an error message denoting that. This is
expected behavior.

Regards,
Tomas


On 27. 03. 21 at 6:54, Jason Long wrote:

Thank you.
I have other questions:

1- How can I launch a test lab?
2- Why, when I stop node1 manually and then start it again, can't I browse 
"http://127.0.0.1:2080"? I think when I stopped node1, Pacemaker forgot to 
bring it back into the chain!!!
3- Why, when I stopped node1, does the "pcs status nodes" command not work? It 
shows me "Error: error running crm_mon, is pacemaker running?".






On Thursday, March 25, 2021, 09:08:45 PM GMT+4:30, Ken Gaillot 
 wrote:





On Thu, 2021-03-25 at 14:44 +, Jason Long wrote:

Then, how can I be sure my configuration is OK?
In a clustering environment, when a node is disconnected, another
node must replace it. Am I right?
I did a test:
I defined a NAT interface for my VM2 (node2) and used port
forwarding: "127.0.0.1:2090" on Host FORWARDING TO 127.0.0.1:80 on
Guest.
When node1 is OK and I browse "http://127.0.0.1:2080" then it shows
me "My Test Site - node1", but when I browse "http://127.0.0.1:2090"
it doesn't show anything.
I stopped node1, and when I browse "http://127.0.0.1:2080" it doesn't
show anything, but when I browse "http://127.0.0.1:2090" it shows
me "My Test Site - node2".
Could this mean that my cluster is working properly?


Port-forwarding to a single VM can never allow the other VM to take
over.

The intent of the floating IP address is to have a single, unique
address that users can use to contact the service. The cluster can move
this IP to one VM or the other, and that is invisible to users. The
term "floating" is intended to convey this, that the IP address is not
tied to a single node, but can move ("float") from one node to another,
transparently to users using the IP address.

In this case, the floating IP would take the place of the 127.0.0.1
port-forwarding addresses. Instead of two port-forwarding addresses,
you just have the one floating IP address.

How you get that working with a reverse proxy is up to you. The
Clusters from Scratch example shows how to do it with a web server, to
present the concepts, and you can tailor that to any service that needs
to be clustered.



On Thursday, March 25, 2021, 05:20:33 PM GMT+4:30, Klaus Wenninger <
kwenn...@redhat.com> wrote:





On 3/25/21 9:55 AM, Jason Long wrote:

Thank you so much.

       Now you can proceed with the "Add Apache HTTP" section.


What does it mean? I did all steps in the document.


       Once apache is set up as a cluster resource, you should be
able to contact the web server at the floating IP...


# pcs cluster stop node1
node1: Stopping Cluster (pacemaker)...
node1: Stopping Cluster (corosync)...
#
# pcs status
Error: error running crm_mon, is pacemaker running?
       Could not connect to the CIB: Transport endpoint is not
connected
       crm_mon: Error: cluster is not available on this node
#
# curl http://192.168.56.9

       My Test Site - node2
       

Thank you for that, but I want to use these two VMs as an Apache
Reverse Proxy Server. When one of my nodes stops, the other
node should start serving.

My test lab uses VirtualBox with two VMs as below:
VM1: This VM has two NICs (NAT, Host-only Adapter)
VM2: This VM has one NIC (Host-only Adapter)

On VM1, I use the NAT interface for the port forwarding:
"127.0.0.1:2080" on Host  FORWARDING TO 127.0.0.1:80 on Guest.

When I stopped node1 and browse "http://127.0.0.1:2080" I
can't see anything. I want it to show me "My Test Site - node2". I
think that is reasonable, because when one of my Reverse Proxy Servers
(node1) stops, the other Reverse Proxy Server (node2) should start serving.

How can I achieve this goal?


Definitely not using that NAT interface I would say.
It will just be able to connect you with a service running on VM1.
And that doesn't make any sense seen from a high-availability
point of view. Even if you setup NAT that would make the
proxy on node2 visible via VM1 this wouldn't give you
increased availability - rather the opposite due to increased
complexity. In high-availability we are speaking of a Single
Point of Failure (SPOF) which VM1 is gonna be here and what you
never ever wanna have.






On Wednesday, March 24, 2021, 10:21:09 PM GMT+4:30, Ken Gaillot <
kgail...@redhat.com> wrote:





On Wed, 2021-03-24 at 10:50 +, Jason 

Re: [ClusterLabs] WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.'

2021-03-29 Thread Tomas Jelinek

Hi Jason,

Regarding point 3:
Most pcs commands operate on the local node. If you stop a cluster on a 
node, pcs is unable to connect to cluster daemons on the node (since 
they are not running) and prints an error message denoting that. This is 
expected behavior.


Regards,
Tomas


On 27. 03. 21 at 6:54, Jason Long wrote:

Thank you.
I have other questions:

1- How can I launch a test lab?
2- Why, when I stop node1 manually and then start it again, can't I browse 
"http://127.0.0.1:2080"? I think when I stopped node1, Pacemaker forgot to 
bring it back into the chain!!!
3- Why, when I stopped node1, does the "pcs status nodes" command not work? It 
shows me "Error: error running crm_mon, is pacemaker running?".






On Thursday, March 25, 2021, 09:08:45 PM GMT+4:30, Ken Gaillot 
 wrote:





On Thu, 2021-03-25 at 14:44 +, Jason Long wrote:

Then, how can I be sure my configuration is OK?
In a clustering environment, when a node is disconnected, another
node must replace it. Am I right?
I did a test:
I defined a NAT interface for my VM2 (node2) and used port
forwarding: "127.0.0.1:2090" on Host FORWARDING TO 127.0.0.1:80 on
Guest.
When node1 is OK and I browse "http://127.0.0.1:2080" then it shows
me "My Test Site - node1", but when I browse "http://127.0.0.1:2090"
it doesn't show anything.
I stopped node1, and when I browse "http://127.0.0.1:2080" it doesn't
show anything, but when I browse "http://127.0.0.1:2090" it shows
me "My Test Site - node2".
Could this mean that my cluster is working properly?


Port-forwarding to a single VM can never allow the other VM to take
over.

The intent of the floating IP address is to have a single, unique
address that users can use to contact the service. The cluster can move
this IP to one VM or the other, and that is invisible to users. The
term "floating" is intended to convey this, that the IP address is not
tied to a single node, but can move ("float") from one node to another,
transparently to users using the IP address.

In this case, the floating IP would take the place of the 127.0.0.1
port-forwarding addresses. Instead of two port-forwarding addresses,
you just have the one floating IP address.

How you get that working with a reverse proxy is up to you. The
Clusters from Scratch example shows how to do it with a web server, to
present the concepts, and you can tailor that to any service that needs
to be clustered.



On Thursday, March 25, 2021, 05:20:33 PM GMT+4:30, Klaus Wenninger <
kwenn...@redhat.com> wrote:





On 3/25/21 9:55 AM, Jason Long wrote:

Thank you so much.

     Now you can proceed with the "Add Apache HTTP" section.


What does it mean? I did all steps in the document.


     Once apache is set up as a cluster resource, you should be
able to contact the web server at the floating IP...


# pcs cluster stop node1
node1: Stopping Cluster (pacemaker)...
node1: Stopping Cluster (corosync)...
#
# pcs status
Error: error running crm_mon, is pacemaker running?
     Could not connect to the CIB: Transport endpoint is not
connected
     crm_mon: Error: cluster is not available on this node
#
# curl http://192.168.56.9

     My Test Site - node2
     

Thank you for that, but I want to use these two VMs as an Apache
Reverse Proxy Server. When one of my nodes stops, the other
node should start serving.

My test lab uses VirtualBox with two VMs as below:
VM1: This VM has two NICs (NAT, Host-only Adapter)
VM2: This VM has one NIC (Host-only Adapter)

On VM1, I use the NAT interface for the port forwarding:
"127.0.0.1:2080" on Host  FORWARDING TO 127.0.0.1:80 on Guest.

When I stopped node1 and browse "http://127.0.0.1:2080" I
can't see anything. I want it to show me "My Test Site - node2". I
think that is reasonable, because when one of my Reverse Proxy Servers
(node1) stops, the other Reverse Proxy Server (node2) should start serving.

How can I achieve this goal?


Definitely not using that NAT interface I would say.
It will just be able to connect you with a service running on VM1.
And that doesn't make any sense seen from a high-availability
point of view. Even if you setup NAT that would make the
proxy on node2 visible via VM1 this wouldn't give you
increased availability - rather the opposite due to increased
complexity. In high-availability we are speaking of a Single
Point of Failure (SPOF) which VM1 is gonna be here and what you
never ever wanna have.






On Wednesday, March 24, 2021, 10:21:09 PM GMT+4:30, Ken Gaillot <
kgail...@redhat.com> wrote:





On Wed, 2021-03-24 at 10:50 +, Jason Long wrote:

Thank you.
From node1 and node2, I can ping the floating IP address
(192.168.56.9).
I stopped node1:

# pcs cluster stop node1
node1: Stopping Cluster (pacemaker)...
node1: Stopping Cluster (corosync)...

And from both machines, I can ping the floating IP address:

[root@node1 ~]# ping 192.168.56.9
PING 192.168.56.9 (192.168.56.9) 56(84) bytes of data.
64 bytes from 192.168.56.9: icmp_seq=1 ttl=64 time=0.504 ms
64 bytes from 192.168.56.9: 

[ClusterLabs] What a "high priority"?

2021-03-29 Thread Ulrich Windl
Hi!

The question may sound completely stupid, but I didn't find the formal 
definition of a "high priority" in the pacemaker docs.
Many years ago I thought lower numbers are higher priorities, but then I 
flipped the concept, thinking higher numbers are higher priorities.
As it seems resource placement (i.e.: relocation) is done using lower priorities 
first, I wonder whether there is consensus among the developers what a "higher 
priority" is.

Regards,
Ulrich



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Antw: Re: Antw: [EXT] staggered resource start/stop

2021-03-29 Thread Ulrich Windl
>>> Reid Wahl  wrote on 29.03.2021 at 12:47 in message
:
> On Mon, Mar 29, 2021 at 3:35 AM Ulrich Windl <
> ulrich.wi...@rz.uni-regensburg.de> wrote:
> 
>> >>> d tbsky  wrote on 29.03.2021 at 04:01 in message
>> :
>> > Hi:
>> >    since starting/stopping the VMs all at once consumes disk IO, I want
>> > to start/stop the VMs one‑by‑one with a delay.
>>
>> I'm surprised that in these days of fast disks and SSDs this is still an
>> issue.
>> Maybe don't delay the start, but limit concurrent starts.
>> Or maybe add some weak ordering between the VMs.
>>
> 
> kind=Serialize does this. It makes the resources start consecutively, in no
> particular order. I added the comment about ocf:heartbeat:Delay because D
> mentioned wanting a delay... but I don't see why it would be necessary, if
> Serialize is used.

This problem made me think: does the rather new tag mechanism offer a way
to impose a concurrency limit per tag? The VMs could be tagged as "VM", and
if you limited concurrency for tag "VM" to 1 or 2, you'd be done, without
limiting other resources...
And the resources could still start in the order the cluster thinks is
best...

Regards,
Ulrich

> 
> 
>> >
>> >    Searching the email list, I found this discussion:
>> > https://oss.clusterlabs.org/pipermail/pacemaker/2013‑August/043128.html 
>> >
>> > Now I am testing rhel8 with pacemaker 2.0.4. I wonder if there are
>> > new methods to solve the problem. I searched the documentation but
>> > didn't find new parameters for the job.
>> >
>> >    If possible I don't want to modify the VirtualDomain RA which comes
>> > with the standard rpm package. Maybe I should write a new RA which
>> > staggers the node utilization, but if I reset the node utilization when
>> > the cluster restarts, there may be a race condition.
>> >
>> >  thanks for help!
>> > ___
>> > Manage your subscription:
>> > https://lists.clusterlabs.org/mailman/listinfo/users 
>> >
>> > ClusterLabs home: https://www.clusterlabs.org/ 
>>
>>
>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>
>> ClusterLabs home: https://www.clusterlabs.org/ 
>>
> 
> 
> -- 
> Regards,
> 
> Reid Wahl, RHCA
> Senior Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] staggered resource start/stop

2021-03-29 Thread Reid Wahl
On Mon, Mar 29, 2021 at 3:35 AM Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:

> >>> d tbsky  wrote on 29.03.2021 at 04:01 in message
> :
> > Hi:
> >    since starting/stopping the VMs all at once consumes disk IO, I want
> > to start/stop the VMs one‑by‑one with a delay.
>
> I'm surprised that in these days of fast disks and SSDs this is still an
> issue.
> Maybe don't delay the start, but limit concurrent starts.
> Or maybe add some weak ordering between the VMs.
>

kind=Serialize does this. It makes the resources start consecutively, in no
particular order. I added the comment about ocf:heartbeat:Delay because D
mentioned wanting a delay... but I don't see why it would be necessary, if
Serialize is used.


> >
> >    Searching the email list, I found this discussion:
> > https://oss.clusterlabs.org/pipermail/pacemaker/2013‑August/043128.html
> >
> > Now I am testing rhel8 with pacemaker 2.0.4. I wonder if there are
> > new methods to solve the problem. I searched the documentation but
> > didn't find new parameters for the job.
> >
> >    If possible I don't want to modify the VirtualDomain RA which comes
> > with the standard rpm package. Maybe I should write a new RA which
> > staggers the node utilization, but if I reset the node utilization when
> > the cluster restarts, there may be a race condition.
> >
> >  thanks for help!
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>


-- 
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Antw: Re: Antw: [EXT] Re: ocf-tester always claims failure, even with built-in resource agents?

2021-03-29 Thread Ulrich Windl
>>> Antony Stone  wrote on 29.03.2021 at 10:30 in
message <202103291030.56200.antony.st...@ha.open.source.it>:
> On Monday 29 March 2021 at 09:03:10, Ulrich Windl wrote:
> 
>> >> So, that would be an extra parameter to the resource definition in
>> >> cluster.cib?
>> >> 
>> >> Change:
>> >> 
>> >> primitive Asterisk asterisk meta migration‑threshold=3 op monitor
>> >> interval=5 timeout=30 on‑fail=restart failure‑timeout=10s
>> >> 
>> >> to:
>> >> 
>> >> primitive Asterisk asterisk meta migration‑threshold=3 op monitor
>> >> interval=5 timeout=30 on‑fail=restart failure‑timeout=10s trace_ra=1
>> >> 
>> >> ?
>> 
>> IMHO it does not make sense to have failure‑timeout smaller than the
>> monitoring interval;
> 
> Um, 10 seconds is not smaller than 5 seconds...

I was referring to "interval=5 timeout=30 failure-timeout=10s".

Whenever your monitor timed out (timeout=30), the failure would already have been reset (failure-timeout=10s).
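
In other words (crm shell syntax as in the thread; the 60s value is only an
illustration), a failure-timeout comfortably above both the interval and the
timeout avoids that:

primitive Asterisk asterisk meta migration-threshold=3 op monitor \
    interval=5 timeout=30 on-fail=restart failure-timeout=60s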

> 
> 
> Antony.
> 
> ‑‑ 
> Your work is both good and original.  Unfortunately the parts that are good
> aren't original, and the parts that are original aren't good.
> 
>  ‑ Samuel Johnson
> 
>    Please reply to the list;
>    please *don't* CC me.
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Antw: [EXT] staggered resource start/stop

2021-03-29 Thread Ulrich Windl
>>> d tbsky  wrote on 29.03.2021 at 04:01 in message
:
> Hi:
>    since starting/stopping the VMs all at once consumes disk IO, I want
> to start/stop the VMs one‑by‑one with a delay.

I'm surprised that in these days of fast disks and SSDs this is still an
issue.
Maybe don't delay the start, but limit concurrent starts.
Or maybe add some weak ordering between the VMs.

> 
>    Searching the email list, I found this discussion:
> https://oss.clusterlabs.org/pipermail/pacemaker/2013‑August/043128.html 
> 
> Now I am testing rhel8 with pacemaker 2.0.4. I wonder if there are
> new methods to solve the problem. I searched the documentation but
> didn't find new parameters for the job.
> 
>    If possible I don't want to modify the VirtualDomain RA which comes
> with the standard rpm package. Maybe I should write a new RA which
> staggers the node utilization, but if I reset the node utilization when
> the cluster restarts, there may be a race condition.
> 
>  thanks for help!
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Antw: [EXT] Re: Which fence agent is needed for an Apache web server cluster?

2021-03-29 Thread Ulrich Windl
>>> Reid Wahl  wrote on 28.03.2021 at 00:42 in message
:
> On Sat, Mar 27, 2021 at 4:28 PM Strahil Nikolov 
> wrote:
> 
>> I had to tune the fence_ipmi recently on some older HPE blades. The
>> default settings were working, but also returning some output about
>> problems negotiating the cypher.
>> As that output could make future versions of the fence agent go wild, I
>> tested several options until no errors were reported. Maybe the cypher flag
>> was different, but I think it was '-c'. If I'm wrong, the author of this
>> thread can check the man page.
>>
>> Yes -> 'HandlePowerKey=ignore'. I have never expected ipmi to try a
>> graceful shutdown when I tell it to 'press and hold' or 'cold boot', yet I
>> never checked the code of fence_ipmi.

I think for HP you actually have four options (in iLO at least):
1) Power Cycle (hardest version of a reset)
2) Reset (the traditional "reset" button)
3) Press and Hold the Power button: most likely it will send an ACPI power event 
first, then within a very few seconds turn off power
4) Press (ACPI) Power Button shortly: delivers an ACPI event the OS may handle

Maybe you even have a 5th option to send an NMI (I think the hardware watchdog 
does that), which could cause a kernel panic with dump, or just be ignored.

However I don't know how much of it is available through IPMI.
On an old DL380 G7 I get:
# ipmitool power
chassis power Commands: status, on, off, cycle, reset, diag, soft
# ipmitool chassis power
chassis power Commands: status, on, off, cycle, reset, diag, soft

Regards,
Ulrich

>>
> 
> fence_ipmilan uses ipmitool to send a poweroff signal. The iLO then sends a
> virtual power button press, which IIRC goes through ACPI. By default on
> RHEL 7 and above, if the system is responsive, systemd-logind handles a
> power key press by initiating a graceful shutdown. You have to disable it
> from handling the power key press if you want hard-power-off behavior.
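
The usual knob (the directive itself is standard systemd-logind
configuration; whether it matches the poster's exact setup is an assumption):

# grep HandlePowerKey /etc/systemd/logind.conf
HandlePowerKey=ignore
# systemctl restart systemd-logind

With that, the OS no longer turns the BMC's virtual power-button press into a
graceful shutdown.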
> 
> 
>> With triple sbd, I mean sbd with 3 block devices.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Sat, Mar 27, 2021 at 23:15, Reid Wahl
>>  wrote:
>>
>>
>> On Saturday, March 27, 2021, Strahil Nikolov 
>> wrote:
>> > My notes:
>> > - ilo ssh fence mechanism is crappy due to ilo itself, try to avoid if
>> possible
>>
>> It has been unreliable in my experience.
>>
>> > - fence_ipmi requires some tunings (-c flag) and also to disable power
>> button from the system
>>
>> I've rarely, perhaps never, seen a customer have to tune the -c flag.
>>
>> By disabling the power button, do you mean setting HandlePowerKey=ignore
>> in logind.conf? That's not specific to fence_ipmilan, to be clear.
>>
>> > - triple 'sbd' is quite reliable. My previous company was using 'softdog'
>> kernel module for a watchdog device and it never failed us. Yet, it's just
>> a kernel module (no hardware required) and thus RH does not support such a
>> setup.
>>
>> What do you mean by triple sbd? It's correct that RH doesn't support using
>> softdog as an sbd watchdog device. It was determined that it's not reliable
>> in all situations. It's probably fine much of the time, and I'm glad you
>> had a smooth experience with it.
>> >
>> > On Sat, Mar 27, 2021 at 22:15, Reid Wahl
>> >  wrote:
>> > ___
>> > Manage your subscription:
>> > https://lists.clusterlabs.org/mailman/listinfo/users 
>> >
>> > ClusterLabs home: https://www.clusterlabs.org/ 
>>
>> >
>>
>> --
>> Regards,
>>
>>
>> Reid Wahl, RHCA
>> Senior Software Maintenance Engineer, Red Hat
>> CEE - Platform Support Delivery - ClusterHA
>>
>>
> 
> -- 
> Regards,
> 
> Reid Wahl, RHCA
> Senior Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Re: ocf-tester always claims failure, even with built-in resource agents?

2021-03-29 Thread Antony Stone
On Monday 29 March 2021 at 09:03:10, Ulrich Windl wrote:

> >> So, that would be an extra parameter to the resource definition in
> >> cluster.cib?
> >> 
> >> Change:
> >> 
> >> primitive Asterisk asterisk meta migration-threshold=3 op monitor
> >> interval=5 timeout=30 on-fail=restart failure-timeout=10s
> >> 
> >> to:
> >> 
> >> primitive Asterisk asterisk meta migration-threshold=3 op monitor
> >> interval=5 timeout=30 on-fail=restart failure-timeout=10s trace_ra=1
> >> 
> >> ?
> 
> IMHO it does not make sense to have failure-timeout smaller than the
> monitoring interval;

Um, 10 seconds is not smaller than 5 seconds...


Antony.

-- 
Your work is both good and original.  Unfortunately the parts that are good 
aren't original, and the parts that are original aren't good.

 - Samuel Johnson

   Please reply to the list;
 please *don't* CC me.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Re: Order set troubles

2021-03-29 Thread Ulrich Windl
>>> Andrei Borzenkov  wrote on 27.03.2021 at 06:37 in
message <7c294034-56c3-baab-73c6-7909ab554...@gmail.com>:
> On 26.03.2021 22:18, Reid Wahl wrote:
>> On Fri, Mar 26, 2021 at 6:27 AM Andrei Borzenkov 
>> wrote:
>> 
>>> On Fri, Mar 26, 2021 at 10:17 AM Ulrich Windl
>>>  wrote:

>>> Andrei Borzenkov  wrote on 26.03.2021 at 06:19 in
>>> message <534274b3‑a6de‑5fac‑0ae4‑d02c305f1...@gmail.com>:
> On 25.03.2021 21:45, Reid Wahl wrote:
>> FWIW we have this KB article (I seem to remember Strahil is a Red Hat
>> customer):
>>   ‑ How do I configure SAP HANA Scale‑Up System Replication in a
>>> Pacemaker
>> cluster when the HANA filesystems are on NFS shares?(
>> https://access.redhat.com/solutions/5156571)
>>
>
> "How do I make the cluster resources recover when one node loses access
> to the NFS server?"
>
> If node loses access to NFS server then monitor operations for resources
> that depend on NFS availability will fail or timeout and pacemaker will
> recover (likely by rebooting this node). That's how similar
> configurations have been handled for the past 20 years in other HA
> managers. I am genuinely interested, have you encountered the case where
> it was not enough?

 That's a big problem with the SAP design (basically it's just too complex).
 In the past I had written a kind of resource agent that worked without that
 overly complex overhead, but since those days SAP has added much more
 complexity.
 If the NFS server is external, pacemaker could fence your nodes when the NFS
 server is down, as first the monitor operation will fail (hanging on NFS),
 then the recover (stop/start) will fail (also hanging on NFS).
>>>
>>> And how exactly placing NFS resource under pacemaker control is going
>>> to change it?
>>>
>> 
>> I noted earlier based on the old case notes:
>> 
>> "Apparently there were situations in which the SAPHana resource wasn't
>> failing over when connectivity was lost with the NFS share that contained
>> the hdb* binaries and the HANA data. I don't remember the exact details
>> (whether demotion was failing, or whether it wasn't even trying to demote
>> on the primary and promote on the secondary, or what). Either way, I was
>> surprised that this procedure was necessary, but it seemed to be."
>> 
>> Strahil may be dealing with a similar situation, not sure. I get where
>> you're coming from -- I too would expect the application that depends on
>> NFS to simply fail when NFS connectivity is lost, which in turn leads to
>> failover and recovery. For whatever reason, due to some weirdness of the
>> SAPHana resource agent, that didn't happen.
>> 
> 
> Yes. The only reason to use this workaround would be if the resource agent
> monitor still believes that the application is up when the required NFS is
> down. Which is a bug in the resource agent or possibly in the application
> itself.

I think it's getting philosophical now:
For example, a web server using documents from an NFS server:
Is the webserver down when access to NFS hangs? Would restarting ("recovering")
the web server help in that situation?
Maybe OCF_CHECK_LEVEL could be used: higher levels could check not only that
the resource is "running", but also that it is responding, etc.
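
A hedged sketch of that idea in crmsh syntax (the web server case above; the
interval values are made up, and OCF_CHECK_LEVEL only has an effect if the
resource agent actually implements the deeper check):

    # Shallow health check every 30s; deeper "is it responding" check every
    # 2 minutes, requested via an operation instance attribute
    primitive WebServer ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval=30s timeout=20s \
        op monitor interval=120s timeout=60s OCF_CHECK_LEVEL=10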

> 
> While using this workaround in this case is perfectly reasonable, none
> of the reasons listed in the message I was replying to are applicable.
> 
> So far the only reason the OP wanted to do it was some obscure race
> condition on startup outside of pacemaker. In which case this workaround
> simply delays the NFS mount, sidestepping the race.
> 
> I also remember something about racing with dnsmasq, at which point I'd say
> that making the cluster depend on the availability of DNS is e-h-h-h unwise.



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Antw: [EXT] Re: Community adoption of PAF vs pgsql

2021-03-29 Thread Ulrich Windl
>>> Reid Wahl  wrote on 26.03.2021 at 20:39 in message:
> If you have an enterprise support agreement, be sure to also explore
> whether your vendor supports one and not the other. For example, Red Hat
> currently supports pgsql but not PAF (though there is an open BZ to add
> support for PAF).
> 

Years ago I used a configuration that wasn't supported, while the supported one
did not work, so we stuck with the working version. Support didn't interfere,
as it was less work for them ;-)

> 
> On Fri, Mar 26, 2021 at 9:14 AM Jehan-Guillaume de Rorthais 
> wrote:
> 
>> Hi,
>>
>> I'm one of the PAF author, so I'm biased.
>>
>> On Fri, 26 Mar 2021 14:51:28 +
>> Isaac Pittman  wrote:
>>
>> > My team has the opportunity to update our PostgreSQL resource agent to
>> either
>> > PAF (https://github.com/ClusterLabs/PAF) or pgsql
>> > (
>> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/pgsql 
>> ),
>> > and I've been charged with comparing them.
>>
>> In my opinion, you should spend time to actually build some "close-to-prod"
>> clusters and train on them. Then you'll be able to choose based on some team
>> experience.
>>
>> Both agents have a very different spirit and very different administrative
>> tasks.
>>
>> Break your cluster, do some switchovers and failovers, learn how to fail
>> back a node, and so on.
>>
>> > After searching various mailing lists and reviewing the code and
>> > documentation, it seems like either could suit our needs and both are
>> > actively maintained.
>> >
>> > One factor that I couldn't get a sense of is community support and
>> adoption:
>> >
>> >   *   Does PAF or pgsql enjoy wider community support or adoption,
>> especially
>> > for new projects? (I would expect many older projects to be on pgsql due
>> to
>> > its longer history.)
>>
>> Sadly, I have absolutely no clue...
>>
>> >   *   Does either seem to be on the road to deprecation?
>>
>> PAF is not on its way to deprecation; I have a pending TODO list for it.
>>
>> I would bet pgsql is not on its way to deprecation either, but I can't
>> speak for the real authors.
>>
>> Regards,
> 
> -- 
> Regards,
> 
> Reid Wahl, RHCA
> Senior Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Antw: [EXT] Re: ocf-tester always claims failure, even with built-in resource agents?

2021-03-29 Thread Ulrich Windl
>>> Reid Wahl  wrote on 26.03.2021 at 23:28 in message:

...
>> So, that would be an extra parameter to the resource definition in
>> cluster.cib?
>>
>> Change:
>>
>> primitive Asterisk asterisk meta migration-threshold=3 op monitor
>> interval=5
>> timeout=30 on-fail=restart failure-timeout=10s
>>
>> to:
>>
>> primitive Asterisk asterisk meta migration-threshold=3 op monitor
>> interval=5
>> timeout=30 on-fail=restart failure-timeout=10s trace_ra=1
>>
>> ?

IMHO it does not make sense to have a failure-timeout smaller than the
monitoring interval; I'd say use at least two monitor intervals, otherwise you
are basically disabling the failure counting. Reasonable values are usually
hours or days, depending on the stability of your cluster.
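
To illustrate with the primitive from this thread (a sketch, not a tested
recommendation -- the 10-minute failure-timeout is just an example of a value
several times larger than the 5-second monitor interval):

    primitive Asterisk asterisk \
        meta migration-threshold=3 failure-timeout=600s \
        op monitor interval=5 timeout=30 on-fail=restart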

>>
> 
> It's an instance attribute, not a meta attribute. I'm not familiar with
> crmsh syntax but trace_ra=1 would go wherever you would configure a

Syntax is very easy (says the manual):
   Example:

   trace fs start
   trace webserver
   trace webserver probe
   trace fs monitor 0

> "normal" option, like `ip=x.x.x.x` for an IPaddr2 resource. It will save a
> shell trace of each operation to a file in
> /var/lib/heartbeat/trace_ra/asterisk. You would then wait for an operation
> to fail, find the file containing that operation's trace, and see what it
> tells you about the error.
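
(In crmsh, a minimal sketch might look like the following -- the params
placement is the point here, the rest is just the primitive from this thread:)

    primitive Asterisk asterisk \
        params trace_ra=1 \
        meta migration-threshold=3 failure-timeout=600s \
        op monitor interval=5 timeout=30 on-fail=restart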
> 
> You might already have some more detail about the error in
> /var/log/messages and/or /var/log/pacemaker/pacemaker.log. Look in
> /var/log/messages around Fri Mar 26 13:37:08 2021 on the node where the
> failure occurred. See if there are any additional messages from the
> resource agent, or any stdout or stderr logged by lrmd/pacemaker-execd for
> the Asterisk resource.
> 
> 
>>
>> Antony.
>>
>> --
>> "It is easy to be blinded to the essential uselessness of them by the
>> sense of
>> achievement you get from getting them to work at all. In other words - and
>> this is the rock solid principle on which the whole of the Corporation's
>> Galaxy-wide success is founded - their fundamental design flaws are
>> completely
>> hidden by their superficial design flaws."
>>
>>  - Douglas Noel Adams
>>
>>    Please reply to the list;
>>  please *don't* CC me.
> 
> -- 
> Regards,
> 
> Reid Wahl, RHCA
> Senior Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Re: Order set troubles

2021-03-29 Thread Ulrich Windl
>>> Andrei Borzenkov  wrote on 26.03.2021 at 14:26 in message:
> On Fri, Mar 26, 2021 at 10:17 AM Ulrich Windl
>  wrote:
>>
>> >>> Andrei Borzenkov  wrote on 26.03.2021 at 06:19 in
>> message <534274b3-a6de-5fac-0ae4-d02c305f1...@gmail.com>:
>> > On 25.03.2021 21:45, Reid Wahl wrote:
>> >> FWIW we have this KB article (I seem to remember Strahil is a Red Hat
>> >> customer):
>> >>   - How do I configure SAP HANA Scale-Up System Replication in a Pacemaker
>> >> cluster when the HANA filesystems are on NFS shares?
>> >> (https://access.redhat.com/solutions/5156571)
>> >>
>> >
>> > "How do I make the cluster resources recover when one node loses access
>> > to the NFS server?"
>> >
>> > If node loses access to NFS server then monitor operations for resources
>> > that depend on NFS availability will fail or timeout and pacemaker will
>> > recover (likely by rebooting this node). That's how similar
>> > configurations have been handled for the past 20 years in other HA
>> > managers. I am genuinely interested, have you encountered the case where
>> > it was not enough?
>>
>> That's a big problem with the SAP design (basically it's just too complex).
>> In the past I had written a kind of resource agent that worked without that
>> overly complex overhead, but since those days SAP has added much more
>> complexity.
>> If the NFS server is external, pacemaker could fence your nodes when the NFS
>> server is down, as first the monitor operation will fail (hanging on NFS),
>> then the recover (stop/start) will fail (also hanging on NFS).
> 
> And how exactly placing NFS resource under pacemaker control is going
> to change it?

Actively maybe: check reachability of the NFS server (local or remote); if
it's not reachable, block all RA operations that would hang while NFS is down
(basically a "freeze" instead of a "recover" when NFS is down).
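
A minimal sketch of that idea in crmsh syntax (device and directory are
hypothetical; on-fail=block makes pacemaker leave the resource alone instead
of attempting stop/start recovery when the monitor fails):

    # Hypothetical NFS mount; on monitor failure, freeze rather than recover
    primitive sapmnt_nfs ocf:heartbeat:Filesystem \
        params device="nfssrv:/export/sapmnt" directory="/sapmnt" fstype="nfs" \
        op monitor interval=20s timeout=40s on-fail=block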

> 
>> Even fencing the node would not help (resources cannot start) if the NFS
>> server is still down.
> 
> And how exactly placing NFS resource under pacemaker control is going
> to change it?

See above.

> 
>> So you may end up with all your nodes being fenced and the fail counts
>> disabling any automatic resource restart.
>>
> 
> And how exactly placing NFS resource under pacemaker control is going
> to change it?

Andrei, is there also another sentence you can say, or is that your favorite
clipboard message? ;-)

Regards,
Ulrich




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] staggered resource start/stop

2021-03-29 Thread d tbsky
Reid Wahl 
>
> An order constraint set with kind=Serialize (which is mentioned in the first 
> reply to the thread you linked) seems like the most logical option to me. You 
> could serialize a set of resource sets, where each inner set contains a 
> VirtualDomain resource and an ocf:heartbeat:Delay resource.
>
>  ⁠5.3.1. Ordering Properties 
> (https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#idm46061192464416)
>  ⁠5.6. Ordering Sets of Resources 
> (https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#s-resource-sets-ordering)

 thanks a lot! I didn't know there was an official RA acting as a
delay. That's interesting and useful to me.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] staggered resource start/stop

2021-03-29 Thread Reid Wahl
An order constraint set with kind=Serialize (which is mentioned in the
first reply to the thread you linked) seems like the most logical option to
me. You could serialize a set of resource sets, where each inner set
contains a VirtualDomain resource and an ocf:heartbeat:Delay resource.

 ⁠5.3.1. Ordering Properties (
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#idm46061192464416
)
 ⁠5.6. Ordering Sets of Resources (
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#s-resource-sets-ordering
)
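
To make that concrete, a sketch in pcs syntax (the resource names and the 30s
delay are hypothetical; the Delay agent's startdelay parameter provides the
spacing between starts):

    # Hypothetical Delay resources to space out the VM starts
    pcs resource create delay1 ocf:heartbeat:Delay startdelay=30 stopdelay=0 mondelay=0
    pcs resource create delay2 ocf:heartbeat:Delay startdelay=30 stopdelay=0 mondelay=0

    # Serialize the (VM, Delay) pairs so only one pair starts at a time
    pcs constraint order set vm1 delay1 sequential=true \
        set vm2 delay2 sequential=true \
        setoptions kind=Serialize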


On Sun, Mar 28, 2021 at 7:02 PM d tbsky  wrote:

> Hi:
>since starting/stopping the VMs all at once will consume disk IO, I want
> to start/stop the VMs one by one with a delay.
>
> Searching the mailing list I found the discussion
> https://oss.clusterlabs.org/pipermail/pacemaker/2013-August/043128.html
>
> Now I am testing RHEL 8 with pacemaker 2.0.4. I wonder if there are
> new methods to solve the problem. I searched the documentation but didn't
> find new parameters for the job.
>
> If possible I don't want to modify the VirtualDomain RA which comes with
> the standard rpm package. Maybe I should write a new RA which staggers
> the node utilization. But if I reset the node utilization when the cluster
> restarts, there may be a race condition.
>
>  thanks for the help!

-- 
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/