Re: [ClusterLabs] 'pcs stonith update' takes, then reverts

2021-07-26 Thread Digimer

  
  
On 2021-07-26 9:54 a.m.,
  kgail...@redhat.com wrote:


  On Fri, 2021-07-23 at 21:46 -0400, Digimer wrote:

  
After a LOT of hassle, I finally got it updated, but OMG it was painful.

I degraded the cluster (unsure if needed), set maintenance mode, deleted
the stonith levels, deleted the stonith devices, recreated them with the
updated values, recreated the stonith levels, and finally disabled
maintenance mode.

It should not have been this hard, right? Why the heck would pacemaker
keep "rolling back" to old configs? I'd delete the stonith

  
  
That is bizarre. It sounds like the CIB changes were taking effect
locally, then being rejected by the rest of the cluster, which would
send the "correct" CIB back to the originator.

The logs of interest would be pacemaker.log from both nodes at the time
you made the first configuration change that failed. I'm guessing the
logs you posted were from after that point?
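
One way to check whether a node's CIB is being superseded is to compare the CIB version counters on each node before and after a change. The sketch below is only an illustration: the helper name cib_version is hypothetical, and it assumes the CIB XML (for example from 'pcs cluster cib') is piped in on stdin and that the root <cib> element carries the admin_epoch, epoch and num_updates attributes.

```shell
# Print the CIB version as "admin_epoch.epoch.num_updates".
# Reads CIB XML on stdin; cib_version is a hypothetical helper name.
cib_version() {
    # Keep only the opening <cib ...> tag.
    tag=$(grep -o '<cib [^>]*>' | head -n 1)
    # Extract each counter in turn, then join the three values with dots.
    for attr in admin_epoch epoch num_updates; do
        printf '%s\n' "$tag" |
            sed -n "s/.*[[:space:]]$attr=\"\([0-9]*\)\".*/\1/p"
    done | paste -sd. -
}

# Example, run on each node and compared by hand:
#   pcs cluster cib | cib_version
```

If the value printed on the node where the change was made jumps forward and then falls back to match the peer, that would be consistent with the local change being replaced by the peer's copy.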


The logs I shared started after the issue began, yes. I can see
  if I can access the nodes and pull the logs.
Note that I degraded the cluster (withdrew the inactive node), so
  the node was a cluster by itself, and it still happened.
digimer

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
  

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] pcs stonith update problems

2021-07-15 Thread Digimer
Hi all,

  I've got a predicament... I want to update a stonith resource to
remove an argument. Specifically, when resources move between nodes, I
want to change the stonith delay to favour the new host. This involves
adding the 'delay="x"' argument to one stonith resource, and removing it
from the other.

Example;


# pcs cluster cib | grep -B7 -A7 '"delay"'
(XML output not preserved in the archive)

Here, the stonith resource 'ipmilan_node1' has the delay="15".

If I run:


# pcs stonith update ipmilan_node1 fence_ipmilan ipaddr="10.201.17.1" \
    password="xxx" username="admin"; echo $?
0


I see nothing happen in journald, and the delay argument remains in the
'pcs cluster cib' output. If, however, I do;


# /usr/sbin/pcs stonith update ipmilan_node1 fence_ipmilan \
    ipaddr="10.201.17.1" password="xxx" username="admin" delay="0"; echo $?
0


I can see in journald that the CIB was updated and can confirm in 'pcs
cluster cib' that the 'delay' value becomes '0'. So it seems that, if an
argument previously existed and is NOT specified in an update, it is not
removed.

Is this intentional for some reason? If so, how would I remove the delay
attribute? I've got a fairly complex stonith config, with stonith
levels. Deleting and recreating the config would be non-trivial.
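
For reference, my understanding is that pcs treats an option given with an empty value as a request to remove it, so the delay could be dropped without recreating the device. This is a sketch under that assumption and is worth checking against the pcs man page for the installed version. The PCS variable and the function name are hypothetical, there only so the command can be dry-run.

```shell
PCS="${PCS:-pcs}"   # hypothetical override hook, e.g. PCS=echo for a dry run

# Remove one option from a stonith device by assigning it an empty value.
# $1 = stonith resource ID, $2 = name of the option to drop
remove_stonith_option() {
    "$PCS" stonith update "$1" "$2="
}

# Example, run on a cluster node:
#   remove_stonith_option ipmilan_node1 delay
#   pcs cluster cib | grep '"delay"'    # confirm the attribute is gone
```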

Pacemaker v2.1.0, pcs v0.10.8.181-47e9, CentOS Stream 8.

digimer



Re: [ClusterLabs] HA-Cluster, UPS and power outage - how is your setup ?

2022-02-01 Thread Digimer

  
  
On 2022-02-01 11:16, Lentes, Bernd
  wrote:


  Hi,

We just experienced two power outages in a few days.
This showed me that our UPS configuration and the handling of resources on the cluster are insufficient.
We have a two-node cluster with SLES 12 SP5 and a Smart-UPS SRT 3000 from APC with a Network Management Card.
The UPS is able to power the two nodes and some hardware (SAN, monitor) for about one hour.
Our resources are Virtual Domains, about 20 of different flavors and versions.

Our primary goal is not to ride out a power outage for as long as possible, but to shut down all domains cleanly after a set time.

I'm currently thinking of waiting for a set time (maybe 15 minutes) and then running "crm resource stop VirtualDomains" in a script.
I would give the cluster some time for the shutdown (5-10 minutes) and afterwards shut down the nodes (via script).
I have to keep an eye on whether both nodes are running or only one of them.

What is your approach?

Bernd


I don't know if this will be a useful answer for you, but I
  haven't seen anyone else reply. 

In the Anvil!, we use SNMP to collect data on APC UPSes powering
  a given cluster. The OIDs we read are at the head of this file,
  but the logic to read and collect the data starts here;
https://github.com/ClusterLabs/anvil/blob/main/scancore-agents/scan-apc-ups/scan-apc-ups#L3026
Some processing happens in-agent, but mainly the collected data
  is written to a generic "power" table (as we support any UPS we
  can collect data from). When we're done scanning, we analyze the
  data in the 'power' table to decide if we need to shed load
  (withdraw and power off nodes to extend runtime), do a complete
  graceful shutdown (if the batteries are about to die), or reboot
  the nodes after power is restored.
This logic is handled mainly here. First, we figure out which UPS
  powers which nodes/clusters, then we pull the data on those
  specific UPSes to return a general "power state". 

https://github.com/ClusterLabs/anvil/blob/main/Anvil/Tools/ScanCore.pm#L607
The power state then tells the main daemon what actions to take,
  if any (load shed, shut down, restart). That's here;
https://github.com/ClusterLabs/anvil/blob/main/Anvil/Tools/ScanCore.pm#L1541
This is super high level, and much of the specifics are related
  to the Anvil! cluster, but it hopefully gives you a starting point
  on how to approach the problem. We've been doing it this way for
  many years with really good effect.
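
To make the "power state to action" idea concrete, here is a very small sketch of that kind of decision. It is not the Anvil! logic. The function name, the inputs, and both thresholds (10 and 30 minutes) are assumptions chosen only for illustration, and a real setup would feed it values read from the UPS over SNMP.

```shell
# Map a UPS reading to an action: "none", "load_shed" or "shutdown".
# $1 = "yes" if the UPS is on battery, anything else means mains power
# $2 = estimated runtime remaining, in whole minutes
# Thresholds are hypothetical and can be overridden via the environment.
power_action() {
    on_battery="$1"
    runtime_min="$2"
    shutdown_below="${SHUTDOWN_BELOW_MIN:-10}"
    shed_below="${SHED_BELOW_MIN:-30}"

    if [ "$on_battery" != "yes" ]; then
        echo "none"          # on mains, nothing to do
    elif [ "$runtime_min" -le "$shutdown_below" ]; then
        echo "shutdown"      # batteries nearly drained, shut down gracefully
    elif [ "$runtime_min" -le "$shed_below" ]; then
        echo "load_shed"     # power off one node to extend runtime
    else
        echo "none"          # on battery, but plenty of runtime left
    fi
}

# Example: power_action yes 20
```

Keeping the decision separate from the actions makes it easy to test the thresholds without touching the cluster.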
Cheers





Re: [ClusterLabs] Removing a resource without stopping it

2022-01-29 Thread Digimer

  
  
On 2022-01-29 03:16, Strahil Nikolov
  wrote:


  
  I think there is pcs cluster edit --scope=resources (based on
  memory).
  Can you try to delete it from there?
  
  
  Best Regards,
  Strahil Nikolov

Thanks, but no, that doesn't seem to work. 'pcs cluster edit'
  wants to open an editor, and I'm trying to find a way to make this
  change with a program (once I sort out the manual process). So an
  option that requires user input won't work in my case regardless.
  Thank you just the same though!


Re: [ClusterLabs] Removing a resource without stopping it

2022-01-29 Thread Digimer

  
  
I think I have a working method, though
  not in the conditions first explained. Though I would love to have
  feedback on its sanity.


The ultimate goal is to migrate the
  resource (VM) to a different pacemaker cluster. Setting it to be
  unmanaged, migrating the VM off, setting the resource to disabled,
  and managing the resource again marks it as stopped, then it can
  be deleted.




[root@an-a01n01 ~]# pcs resource unmanage srv01-cs8
# Migrate the server to another pacemaker cluster here
[root@an-a01n01 ~]# pcs resource disable srv01-cs8
Warning: 'srv01-cs8' is unmanaged
[root@an-a01n01 ~]# pcs resource manage srv01-cs8
[root@an-a01n01 ~]# pcs resource delete srv01-cs8
Deleting Resource - srv01-cs8
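
Since the goal is to do this from a program, the same sequence can be wrapped in a function. This is a sketch only: the function name, the PCS override variable, and the idea of passing the migration step in as a command are all assumptions, and the migration command itself is whatever moves the VM to the other cluster.

```shell
PCS="${PCS:-pcs}"   # hypothetical override hook, e.g. PCS=echo for a dry run

# Remove a resource from this cluster after its VM has been moved away.
# $1 = resource ID; remaining arguments = the command that migrates the VM.
# Stops at the first failing step. If the migration fails, the resource is
# left unmanaged so that nothing gets stopped behind your back.
retire_resource() {
    res="$1"
    shift
    "$PCS" resource unmanage "$res" || return 1
    "$@" || return 1                      # migrate the VM off this cluster
    "$PCS" resource disable "$res" || return 1
    "$PCS" resource manage "$res" || return 1
    "$PCS" resource delete "$res"
}

# Example (my_migrate_vm is a placeholder for your own migration command):
#   retire_resource srv01-cs8 my_migrate_vm srv01-cs8
```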
  


Though going back to the original
  question, deleting the server from pacemaker while the VM is left
  running, is still something I am quite curious about.


Madi



On 2022-01-29 13:27, Strahil Nikolov
  wrote:


  
  I know... and the editor stuff can be bypassed, if the approach works.
  
  
  Best Regards,
  Strahil Nikolov


  
On Sat, Jan 29, 2022 at 15:43, Digimer wrote:

  On 2022-01-29 03:16, Strahil Nikolov wrote:

    I think there is pcs cluster edit --scope=resources (based on memory).
    Can you try to delete it from there ?

    Best Regards,
    Strahil Nikolov

  Thanks, but no that doesn't seem to work. 'pcs cluster edit' wants to
  open an editor, and I'm trying to find a way to make this change with
  a program (once I sort out the manual process). So an option that
  requires user input won't work in my case regardless. Thank you just
  the same though!


[ClusterLabs] Removing a resource without stopping it

2022-01-28 Thread Digimer

Hi all,

  I'm trying to figure out how to move a running VM from one pacemaker 
cluster to another. I've got the storage and VM live migration sorted, 
but having trouble with pacemaker.


  I tried unmanaging the resource (the VM), then deleted the resource, 
and the node got fenced. So I am assuming it thought it couldn't stop 
the service so it self-fenced. In any case, can someone let me know what 
the proper procedure is?


  Said more directly;

  How do I delete a resource from pacemaker (via pcs on EL8) without
stopping the resource?




Re: [ClusterLabs] Removing a resource without stopping it

2022-01-28 Thread Digimer

  
  
On 2022-01-29 00:10, Digimer wrote:


  
  On 2022-01-28 16:54, Ken Gaillot
wrote:
  
  
On Fri, 2022-01-28 at 16:38 -0500, Digimer wrote:


  Hi all,

   I'm trying to figure out how to move a running VM from one pacemaker
cluster to another. I've got the storage and VM live migration sorted,
but having trouble with pacemaker.

   I tried unmanaging the resource (the VM), then deleted the resource,
and the node got fenced. So I am assuming it thought it couldn't stop
the service so it self-fenced. In any case, can someone let me know what
the proper procedure is?

   Said more directly;

   How do I delete a resource from pacemaker (via pcs on EL8) without
stopping the resource?



Set the stop-orphan-resources cluster property to false (at least while
you move it)

The problem with your first approach is that once you remove the
resource configuration, which includes the is-managed setting,
Pacemaker no longer knows the resource is unmanaged. And even if you
set it via resource defaults or something, eventually you have to set
it back, at which point Pacemaker will still have the same response.

  
  Follow up;
    I tried to do the following sequence;

  pcs property set stop-orphan-resources=false
  pcs resource unmanage srv01-cs8         # Without this, the resource was stopped
  pcs resource delete srv01-cs8           # Failed with "Warning: 'srv01-cs8' is unmanaged"
  pcs resource delete srv01-cs8 --force   # Got 'Deleting Resource - srv01-cs8'
  pcs resource status
  --
    * srv01-cs8   (ocf::alteeve:server):   ORPHANED Started an-a01n01 (unmanaged)
  --

    So it seems like this doesn't delete the resource. Can I get
some insight on how to actually delete this resource without
disabling the VM? 
  
  Thanks!

Adding;
I tried 'pcs property set stop-orphan-resources=true' and it stopped
  the VM and then actually deleted the resource. =/
    


Re: [ClusterLabs] Removing a resource without stopping it

2022-01-28 Thread Digimer

  
  
On 2022-01-28 16:54, Ken Gaillot wrote:


  On Fri, 2022-01-28 at 16:38 -0500, Digimer wrote:

  
Hi all,

   I'm trying to figure out how to move a running VM from one pacemaker
cluster to another. I've got the storage and VM live migration sorted,
but having trouble with pacemaker.

   I tried unmanaging the resource (the VM), then deleted the resource,
and the node got fenced. So I am assuming it thought it couldn't stop
the service so it self-fenced. In any case, can someone let me know what
the proper procedure is?

   Said more directly;

   How do I delete a resource from pacemaker (via pcs on EL8) without
stopping the resource?


  
  
Set the stop-orphan-resources cluster property to false (at least while
you move it)

The problem with your first approach is that once you remove the
resource configuration, which includes the is-managed setting,
Pacemaker no longer knows the resource is unmanaged. And even if you
set it via resource defaults or something, eventually you have to set
it back, at which point Pacemaker will still have the same response.


Thanks for this! I'm not entirely sure I understand the
  implications of "stop-orphan-resources". I assume it would be a
  bad idea to set it to "false" and leave it that way, given it's
  not the default. What's the purpose of this being set to 'true'?
Thanks!



Re: [ClusterLabs] Removing a resource without stopping it

2022-01-28 Thread Digimer

  
  
On 2022-01-28 16:54, Ken Gaillot wrote:


  On Fri, 2022-01-28 at 16:38 -0500, Digimer wrote:

  
Hi all,

   I'm trying to figure out how to move a running VM from one pacemaker
cluster to another. I've got the storage and VM live migration sorted,
but having trouble with pacemaker.

   I tried unmanaging the resource (the VM), then deleted the resource,
and the node got fenced. So I am assuming it thought it couldn't stop
the service so it self-fenced. In any case, can someone let me know what
the proper procedure is?

   Said more directly;

   How do I delete a resource from pacemaker (via pcs on EL8) without
stopping the resource?


  
  
Set the stop-orphan-resources cluster property to false (at least while
you move it)

The problem with your first approach is that once you remove the
resource configuration, which includes the is-managed setting,
Pacemaker no longer knows the resource is unmanaged. And even if you
set it via resource defaults or something, eventually you have to set
it back, at which point Pacemaker will still have the same response.


Follow up;
  I tried to do the following sequence;

pcs property set stop-orphan-resources=false
pcs resource unmanage srv01-cs8         # Without this, the resource was stopped
pcs resource delete srv01-cs8           # Failed with "Warning: 'srv01-cs8' is unmanaged"
pcs resource delete srv01-cs8 --force   # Got 'Deleting Resource - srv01-cs8'
pcs resource status
--
  * srv01-cs8   (ocf::alteeve:server):   ORPHANED Started an-a01n01 (unmanaged)
--
  
  So it seems like this doesn't delete the resource. Can I get
  some insight on how to actually delete this resource without
  disabling the VM? 
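
For anyone scripting this, a program can detect the state shown above by looking for the ORPHANED marker in the status output. The helper below is a sketch: the name is_orphaned is hypothetical, and it assumes the status text has the same shape as the 'pcs resource status' line quoted above, which may differ between pcs versions.

```shell
# Succeed (exit 0) if the given resource is listed as ORPHANED.
# $1 = resource ID; status text (e.g. from 'pcs resource status') on stdin.
is_orphaned() {
    awk -v id="$1" '
        # Match either "* <id> ..." or "<id> ..." at the start of the line.
        ($1 == id || ($1 == "*" && $2 == id)) && /ORPHANED/ { found = 1 }
        END { exit !found }
    '
}

# Example, run on a cluster node:
#   if pcs resource status | is_orphaned srv01-cs8; then
#       echo "srv01-cs8 is still tracked as an orphan"
#   fi
```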

Thanks!
    digimer
    