Re: [ClusterLabs] Disabled resources after parallel removing of group

2024-05-22 Thread Miroslav Lisik

Hi,
see comments inline.

On 5/17/24 17:46, Александр Руденко wrote:

Miroslav, thank you!

It helps me understand that it's not a configuration issue.

BTW, is it okay to create new resources in parallel?


Same as with parallel 'remove' operations, it is not safe to do parallel
'create' operations, although it may work in some cases.

The 'pcs resource create' command updates the CIB using CIB diffs and
cibadmin's '--patch' option, which is different from 'pcs resource remove',
where a combination of '--replace' and '--delete' is used.

There is still a risk that the CIB patch will not apply, or that something
will break due to the parallel actions.

Do not use pcs commands in parallel on a live cluster; rather, modify a CIB
file using pcs's '-f' option and then push the CIB configuration to the
cluster:
pcs cluster cib-push <file>
OR
pcs cluster cib-push <file> diff-against=<original file>

The difference between these two commands is the method used to apply the
CIB update. The first command uses cibadmin's '--replace' option and the
second uses the '--patch' option.
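A minimal sketch of this file-based workflow for the 'create' case (the resource and group names are hypothetical, and ocf:heartbeat:Dummy stands in for a real agent):

```shell
# Snapshot the live CIB into a file and work on a copy.
pcs cluster cib original.xml
cp original.xml new.xml

# Make all modifications against the file; nothing touches the cluster yet.
pcs -f new.xml resource create resA1 ocf:heartbeat:Dummy --group groupA
pcs -f new.xml resource create resB1 ocf:heartbeat:Dummy --group groupB

# Push once; Pacemaker schedules all the changes in a single transition.
pcs cluster cib-push new.xml diff-against=original.xml
```

Because the cluster sees only the final push, there is no window in which two half-applied updates can race.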


On timeline it looks like:

pcs resource create resA1  --group groupA
pcs resource create resB1  --group groupB
resA1 Started
pcs resource create resA2  --group groupA
resB1 Started
pcs resource create resB2  --group groupB
resA2 Started
resB2 Started

For now, it works okay)

In our case, cluster events like 'create' and 'remove' are generated by 
users, and for now we don't have any queue for operations. But now, I 
realized that we need a queue for 'remove' operations. Maybe we need a 
queue for 'create' operations too?


Yes, it is better to prevent users from doing modify operations at the
same time.
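One simple way to enforce that on a single management host is to funnel every cluster-modifying command through an exclusive lock. This is only a sketch of the idea, not anything from pcs itself; the function name and lock path are made up:

```shell
#!/bin/sh
# Serialize cluster-modifying commands through one exclusive lock, so
# concurrent callers queue up instead of racing on the CIB.
# The lock file path is an arbitrary choice; flock creates it if missing.
PCS_LOCK=${PCS_LOCK:-/tmp/pcs-queue.lock}

pcs_serialized() {
    # Blocks until the lock is free, then runs the given command.
    flock "$PCS_LOCK" "$@"
}

# Hypothetical usage: these run one at a time even if backgrounded.
# pcs_serialized pcs resource remove group-1 &
# pcs_serialized pcs resource remove group-2 &
# wait
```

This serializes callers on one host only; it does not coordinate pcs invocations made from different nodes.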




Re: [ClusterLabs] Disabled resources after parallel removing of group

2024-05-20 Thread Александр Руденко
Alexey, thank you!

Now, it's clear for me.


Re: [ClusterLabs] Disabled resources after parallel removing of group

2024-05-17 Thread alexey
Hi Alexander,

 

AFAIK, Pacemaker itself only deals with an XML-based configuration database, 
shared across the whole cluster. Each time you call pcs or any other tool, it 
takes the XML (or part of it) from Pacemaker, tweaks it, and then pushes it 
back to Pacemaker. Each time the XML is pushed, Pacemaker completely rethinks 
the new config, looks at the current state, and schedules changes from the 
current state to the target state. I can't point you to the exact place in 
the docs where this is described, but this is from the Pacemaker docs.

 

Therefore, each use of a pcs command triggers this process immediately. It 
seems some async-driven side effects may happen from this. However, you may 
make ANY number of changes in one stroke if Pacemaker gets the new config 
with all these changes at once. So, you need to make your management tools 
FIRST prepare all changes and THEN push them all at once. Then you have no 
need to run separate changes in the background, because the preparation is 
very fast. And the final application will be done at the maximum possible 
speed too.

 

Miroslav showed how to manage a bulk delete, but this is the common way to 
manage any massive change. Any operations can be done! You take the Pacemaker 
CIB to a file, make all the changes against the file instead of writing each 
one to the CIB, and then push the total back; Pacemaker will then schedule 
all the changes.

 

You may put ANY commands in any mix: add, change, delete, but use the '-f <file>' 
option so the changes are done against the file. You may keep the original to 
push a diff (as in Miroslav's example), or you may just push the whole changed 
config; AFAIK, there is no difference.

 

###

# Make a copy of CIB into local file

pcs cluster cib config.xml

 

# do changes against file

pcs -f config.xml resource create 

 

pcs -f config.xml constraint

 

pcs -f config.xml resource disable 

 

pcs -f config.xml resource remove 

 

# And finally push the whole 'configuration' scope back (mind there is no
# diff here; only the config scope is pushed)

pcs cluster cib-push config.xml --config

 



 

And Pacemaker applies all changes at once.

 

Miroslav’s example is taken from the pcs man page 
<https://manpages.ubuntu.com/manpages/jammy/man8/pcs.8.html> for the command 
‘cluster cib-push’. My example works too.

 

Have a good failover! Means no failover at all )))

 

Alex

 

 


Re: [ClusterLabs] Disabled resources after parallel removing of group

2024-05-17 Thread Александр Руденко
Miroslav, thank you!

It helps me understand that it's not a configuration issue.

BTW, is it okay to create new resources in parallel?
On timeline it looks like:

pcs resource create resA1  --group groupA
pcs resource create resB1  --group groupB
resA1 Started
pcs resource create resA2  --group groupA
resB1 Started
pcs resource create resB2  --group groupB
resA2 Started
resB2 Started

For now, it works okay)

In our case, cluster events like 'create' and 'remove' are generated by
users, and for now we don't have any queue for operations. But now, I
realized that we need a queue for 'remove' operations. Maybe we need a
queue for 'create' operations too?


Re: [ClusterLabs] Disabled resources after parallel removing of group

2024-05-17 Thread Miroslav Lisik

Hi Aleksandr!

It is not safe to use `pcs resource remove` command in parallel because
you run into the same issues as you already described. Processes run by
remove command are not synchronized.

Unfortunately, remove command does not support more than one resource
yet.

If you really need to remove several resources at once, you can use this method:
1. get the current cib configuration:
pcs cluster cib > original.xml

2. create a new copy of the file:
cp original.xml new.xml

3. disable all resources to be removed, using the -f option and the new
configuration file:
pcs -f new.xml resource disable ...

4. remove the resources, using the -f option and the new configuration file:
pcs -f new.xml resource remove 
...

5. push new cib configuration to the cluster
pcs cluster cib-push new.xml diff-against=original.xml
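Put together, the five steps can be sketched as one small script. The group names below are just the ones from this thread, standing in for whatever you actually need to remove:

```shell
#!/bin/sh
set -e

# 1. + 2. Snapshot the live CIB and work on a copy.
pcs cluster cib original.xml
cp original.xml new.xml

# 3. Disable everything that is going away, against the file only.
pcs -f new.xml resource disable group-1 group-2 group-3 group-4

# 4. Remove the groups from the file.
pcs -f new.xml resource remove group-1
pcs -f new.xml resource remove group-2
pcs -f new.xml resource remove group-3
pcs -f new.xml resource remove group-4

# 5. Push the result as a diff; Pacemaker applies it in one transition.
pcs cluster cib-push new.xml diff-against=original.xml
```

Nothing reaches the cluster until the final cib-push, so the unsynchronized-removal race described above cannot occur.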


On 5/17/24 13:47, Александр Руденко wrote:

Hi!

I am new to the Pacemaker world, and I unfortunately have problems 
with simple actions like group removal. Please help me understand where 
I'm wrong.


For simplicity I will use standard resources like IPaddr2 (but we have 
this problem on any type of our custom resources).


I have 5 groups like this:

Full List of Resources:
   * Resource Group: group-1:
     * ip-11 (ocf::heartbeat:IPaddr2): Started vdc16
     * ip-12 (ocf::heartbeat:IPaddr2): Started vdc16
   * Resource Group: group-2:
     * ip-21 (ocf::heartbeat:IPaddr2): Started vdc17
     * ip-22 (ocf::heartbeat:IPaddr2): Started vdc17
   * Resource Group: group-3:
     * ip-31 (ocf::heartbeat:IPaddr2): Started vdc18
     * ip-32 (ocf::heartbeat:IPaddr2): Started vdc18
   * Resource Group: group-4:
     * ip-41 (ocf::heartbeat:IPaddr2): Started vdc16
     * ip-42 (ocf::heartbeat:IPaddr2): Started vdc16

Groups were created by the following simple script:
cat groups.sh
pcs resource create ip-11 ocf:heartbeat:IPaddr2 ip=10.7.1.11 
cidr_netmask=24 nic=lo op monitor interval=10s --group group-1
pcs resource create ip-12 ocf:heartbeat:IPaddr2 ip=10.7.1.12 
cidr_netmask=24 nic=lo op monitor interval=10s --group group-1


pcs resource create ip-21 ocf:heartbeat:IPaddr2 ip=10.7.1.21 
cidr_netmask=24 nic=lo op monitor interval=10s --group group-2
pcs resource create ip-22 ocf:heartbeat:IPaddr2 ip=10.7.1.22 
cidr_netmask=24 nic=lo op monitor interval=10s --group group-2


pcs resource create ip-31 ocf:heartbeat:IPaddr2 ip=10.7.1.31 
cidr_netmask=24 nic=lo op monitor interval=10s --group group-3
pcs resource create ip-32 ocf:heartbeat:IPaddr2 ip=10.7.1.32 
cidr_netmask=24 nic=lo op monitor interval=10s --group group-3


pcs resource create ip-41 ocf:heartbeat:IPaddr2 ip=10.7.1.41 
cidr_netmask=24 nic=lo op monitor interval=10s --group group-4
pcs resource create ip-42 ocf:heartbeat:IPaddr2 ip=10.7.1.42 
cidr_netmask=24 nic=lo op monitor interval=10s --group group-4


Next, I try to remove all of these groups in 'parallel':
cat remove.sh
pcs resource remove group-1 &
sleep 0.2
pcs resource remove group-2 &
sleep 0.2
pcs resource remove group-3 &
sleep 0.2
pcs resource remove group-4 &

After this, every time I have a few resources in some groups which were 
not removed. It looks like:


Full List of Resources:
   * Resource Group: group-2 (disabled):
     * ip-21 (ocf::heartbeat:IPaddr2): Stopped (disabled)
   * Resource Group: group-4 (disabled):
     * ip-41 (ocf::heartbeat:IPaddr2): Stopped (disabled)

In the logs, I can see all resources stopping successfully, but after stopping 
some resources it looks like Pacemaker just 'forgot' about the deletion and 
didn't finish it.


Cluster name: pacemaker1
Cluster Summary:
   * Stack: corosync
   * Current DC: vdc16 (version 2.1.0-8.el8-7c3f660707) - partition with 
quorum

   * Last updated: Fri May 17 14:30:14 2024
   * Last change:  Fri May 17 14:30:05 2024 by root via cibadmin on vdc16
   * 3 nodes configured
   * 2 resource instances configured (2 DISABLED)

Node List:
   * Online: [ vdc16 vdc17 vdc18 ]

Host OS is CentOS 8.4. Cluster with default settings. vdc16,vdc17,vdc18 
are VMs with 4 vCPU.



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

