Re: [ClusterLabs] Stonith external/ssh "device"?

2022-12-21 Thread Antony Stone
On Wednesday 21 December 2022 at 17:19:34, Antony Stone wrote: > > pacemaker-fenced[3262]: notice: Operation reboot of nodeB by > > for pacemaker-controld.26852@nodeA.93b391b2: No such device > pacemaker-controld[3264]: notice: Peer nodeB was not terminated (reboot) > by on behalf of

Re: [ClusterLabs] Stonith external/ssh "device"?

2022-12-21 Thread Antony Stone
On Wednesday 21 December 2022 at 16:59:16, Antony Stone wrote: > Hi. > > I'm implementing fencing on a 7-node cluster as described recently: > https://lists.clusterlabs.org/pipermail/users/2022-December/030714.html > > I'm using external/ssh for the time being, and it works if I test it using:

[ClusterLabs] Stonith external/ssh "device"?

2022-12-21 Thread Antony Stone
Hi. I'm implementing fencing on a 7-node cluster as described recently: https://lists.clusterlabs.org/pipermail/users/2022-December/030714.html I'm using external/ssh for the time being, and it works if I test it using: stonith -t external/ssh -p "nodeA nodeB nodeC" -T reset nodeB However,

Re: [ClusterLabs] Stonith

2022-12-19 Thread Ken Gaillot
On Mon, 2022-12-19 at 16:17 +0300, Andrei Borzenkov wrote: > On Mon, Dec 19, 2022 at 4:01 PM Antony Stone > wrote: > > On Monday 19 December 2022 at 13:55:45, Andrei Borzenkov wrote: > > > > > On Mon, Dec 19, 2022 at 3:44 PM Antony Stone > > > > > > wrote: > > > > So, do I simply create one

Re: [ClusterLabs] Stonith

2022-12-19 Thread Andrei Borzenkov
On Mon, Dec 19, 2022 at 4:01 PM Antony Stone wrote: > > On Monday 19 December 2022 at 13:55:45, Andrei Borzenkov wrote: > > > On Mon, Dec 19, 2022 at 3:44 PM Antony Stone > > > > wrote: > > > So, do I simply create one stonith resource for each server, and rely on > > > some other random server

Re: [ClusterLabs] Stonith

2022-12-19 Thread Antony Stone
On Monday 19 December 2022 at 13:55:45, Andrei Borzenkov wrote: > On Mon, Dec 19, 2022 at 3:44 PM Antony Stone > > wrote: > > So, do I simply create one stonith resource for each server, and rely on > > some other random server to invoke it when needed? > > Yes, this is the most simple

Re: [ClusterLabs] Stonith

2022-12-19 Thread Andrei Borzenkov
On Mon, Dec 19, 2022 at 3:44 PM Antony Stone wrote: > > So, do I simply create one stonith resource for each server, and rely on some > other random server to invoke it when needed? > Yes, this is the most simple approach. You need to restrict this stonith resource to only one cluster node (set
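A minimal sketch of what that advice could look like with the crm shell, assuming the external/ssh agent from the original question; the resource and node names here are illustrative, not from the thread:
  # One device per node to be fenced; hostlist names the node this device kills:
  crm configure primitive st-nodeB stonith:external/ssh \
      params hostlist="nodeB" op monitor interval=60s
  # Restrict the device to a single cluster node other than its victim:
  crm configure location st-nodeB-on-nodeA st-nodeB inf: nodeA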

[ClusterLabs] Stonith

2022-12-19 Thread Antony Stone
Hi. I have a 7-node corosync / pacemaker cluster which is working nicely as a proof-of-concept. Three machines are in data centre 1, three are in data centre 2, and one machine is in data centre 3. I'm using location constraints to run one set of resources on any of the machines in DC1,

[ClusterLabs] Stonith failing

2020-07-28 Thread Gabriele Bulfon
Hi, now I have my two nodes (xstha1 and xstha2) with IPs configured by Corosync. To check how stonith would work, I turned off the Corosync service on the second node. The first node attempts to stonith the 2nd node and take over its resources, but this fails. The stonith action is configured to run a

Re: [ClusterLabs] Stonith configuration

2020-02-14 Thread Dan Swartzendruber
On 2020-02-14 13:06, Strahil Nikolov wrote: On February 14, 2020 4:44:53 PM GMT+02:00, "BASDEN, ALASTAIR G." wrote: Hi Strahil, Note2: Consider adding a third node /for example a VM/ or a qdevice on a separate node (allows it to be on a separate network, so a simple routing is the only

Re: [ClusterLabs] Stonith configuration

2020-02-14 Thread BASDEN, ALASTAIR G.
Hi Strahil, corosync-cfgtool -s Printing ring status. Local node ID 1 RING ID 0 id = 172.17.150.20 status = ring 0 active with no faults RING ID 1 id = 10.0.6.20 status = ring 1 active with no faults corosync-quorumtool -s Quorum information

[ClusterLabs] Stonith configuration

2020-02-14 Thread BASDEN, ALASTAIR G.
Hi, I wonder whether anyone could give me some advice about a stonith configuration. We have 2 nodes, which form a HA cluster. These have 3 networks: A generic network over which they are accessed (eg ssh) (node1.primary.network, node2.primary.network) A directly connected cable between them

Re: [ClusterLabs] stonith-ng - performing action 'monitor' timed out with signal 15

2019-09-16 Thread Ken Gaillot
On Tue, 2019-09-03 at 10:09 +0200, Marco Marino wrote: > Hi, I have a problem with fencing on a two node cluster. It seems > that randomly the cluster cannot complete monitor operation for fence > devices. In log I see: > crmd[8206]: error: Result of monitor operation for fence-node2 on >

Re: [ClusterLabs] stonith-ng - performing action 'monitor' timed out with signal 15

2019-09-11 Thread Marco Marino
Hi, any updates on this? Thank you. On Wed 4 Sep 2019, 10:46 Marco Marino wrote: > First of all, thank you for your support. > Andrey: sure, I can reach machines through IPMI. > Here is a short "log": > > #From ld1 trying to contact ld1 > [root@ld1 ~]# ipmitool -I lanplus -H

Re: [ClusterLabs] stonith-ng - performing action 'monitor' timed out with signal 15

2019-09-04 Thread Marco Marino
First of all, thank you for your support. Andrey: sure, I can reach machines through IPMI. Here is a short "log": #From ld1 trying to contact ld1 [root@ld1 ~]# ipmitool -I lanplus -H 192.168.254.250 -U root -P XX sdr elist all SEL | 72h | ns | 7.1 | No Reading Intrusion
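For comparison, the same reachability check can be run through the fence agent itself; a hedged sketch using the address from the message above, with the password masked as in the thread:
  # fence_ipmilan status query over IPMI lanplus:
  fence_ipmilan -P -a 192.168.254.250 -l root -p XXXXXXXX -o status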

Re: [ClusterLabs] stonith-ng - performing action 'monitor' timed out with signal 15

2019-09-04 Thread Jan Pokorný
On 03/09/19 20:15 +0300, Andrei Borzenkov wrote: > 03.09.2019 11:09, Marco Marino wrote: >> Hi, I have a problem with fencing on a two node cluster. It seems that >> randomly the cluster cannot complete monitor operation for fence devices. >> In log I see: >> crmd[8206]: error: Result of monitor

Re: [ClusterLabs] stonith-ng - performing action 'monitor' timed out with signal 15

2019-09-03 Thread Andrei Borzenkov
03.09.2019 11:09, Marco Marino wrote: > Hi, I have a problem with fencing on a two node cluster. It seems that > randomly the cluster cannot complete monitor operation for fence devices. > In log I see: > crmd[8206]: error: Result of monitor operation for fence-node2 on > ld2.mydomain.it: Timed

[ClusterLabs] stonith-ng - performing action 'monitor' timed out with signal 15

2019-09-03 Thread Marco Marino
Hi, I have a problem with fencing on a two node cluster. It seems that randomly the cluster cannot complete the monitor operation for fence devices. In the log I see: crmd[8206]: error: Result of monitor operation for fence-node2 on ld2.mydomain.it: Timed Out. Attached are - /var/log/messages

Re: [ClusterLabs] Stonith two-node cluster shot each other

2018-12-05 Thread Klaus Wenninger
If you are not so sure which of the nodes you want to give precedence then you can at least add some random-delay to it as to at least prevent that they at the same time decide to kill each other (fence-race). If your fencing-agent doesn't support a delay or you don't want to use that for some
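If the agent itself has no usable delay parameter, pacemaker's pcmk_delay_max device attribute adds a random delay before the device fires; a sketch only, with an illustrative device name:
  # Random delay of up to 15s before this device fences, to break fence races:
  pcs stonith update fence_node2 pcmk_delay_max=15s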

Re: [ClusterLabs] Stonith two-node cluster shot each other

2018-12-04 Thread Digimer
You need to set a fence delay on the node you want to win in a case like this. So say, for example, node 1 is hosting services. You will want to add 'delay="15"' to the stonith config for node 1. This way, when both nodes try to fence each other, node 2 looks up how to fence node 1, sees a delay
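A hedged sketch of that suggestion with pcs, assuming the fence agent supports a delay parameter; the device name is a placeholder, not taken from the thread:
  # Delay the device that fences node1 so that, in a fence race, node1 survives
  # long enough to shoot node2 first:
  pcs stonith update fence_node1 delay=15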

Re: [ClusterLabs] Stonith two-node cluster shot each other

2018-12-04 Thread Daniel Ragle
Once again I botched the obfuscation. The corosync.conf should in fact be 'node1.mydomain.com' and 'node2.mydomain.com' (i.e., it matches the rest of the configuration). Thanks! Dan On 12/4/2018 12:48 PM, Daniel Ragle wrote: I *think* the two nodes of my cluster shot each other in the head

[ClusterLabs] Stonith two-node cluster shot each other

2018-12-04 Thread Daniel Ragle
I *think* the two nodes of my cluster shot each other in the head this weekend and I can't figure out why. Looking at corosync.log on node1 I see this: [143747] node1.mydomain.com corosync notice [TOTEM ] A processor failed, forming new configuration. [143747] node1.mydomain.com

Re: [ClusterLabs] STONITH resources on wrong nodes

2018-07-11 Thread Salvatore D'angelo
Thank you. It's clear now. On Wed 11 Jul 2018, 7:18 PM Andrei Borzenkov wrote: > 11.07.2018 20:12, Salvatore D'angelo wrote: > > Does this mean that even if STONITH resource p_ston_pg1 runs on > node pg2, when pacemaker sends a signal to it, pg1 is powered off and not pg2? > > Am I

Re: [ClusterLabs] STONITH resources on wrong nodes

2018-07-11 Thread Andrei Borzenkov
11.07.2018 20:12, Salvatore D'angelo wrote: > Does this mean that even if STONITH resource p_ston_pg1 runs on node > pg2, when pacemaker sends a signal to it, pg1 is powered off and not pg2? > Am I correct? Yes. The resource will be used to power off whatever hosts are listed in its pcmk_host_list.
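A sketch of what this means in practice, using the external/ipmi agent and resource name from the thread; the IPMI address and credentials are placeholders:
  # Wherever p_ston_pg1 happens to run, it always powers off the host(s)
  # listed in pcmk_host_list -- here, pg1:
  crm configure primitive p_ston_pg1 stonith:external/ipmi \
      params hostname=pg1 ipaddr=10.0.0.1 userid=admin passwd=secret \
      pcmk_host_list="pg1"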

Re: [ClusterLabs] STONITH resources on wrong nodes

2018-07-11 Thread Salvatore D'angelo
Does this mean that even if STONITH resource p_ston_pg1 runs on node pg2, when pacemaker sends a signal to it, pg1 is powered off and not pg2? Am I correct? > On 11 Jul 2018, at 19:10, Andrei Borzenkov wrote: > > 11.07.2018 19:44, Salvatore D'angelo wrote: >> Hi all, >> >> in my cluster doing

Re: [ClusterLabs] STONITH resources on wrong nodes

2018-07-11 Thread Andrei Borzenkov
11.07.2018 19:44, Salvatore D'angelo wrote: > Hi all, > > in my cluster doing crm_mon -1ARrf I noticed my STONITH resources are not > correctly located: Actual location of stonith resources does not really matter in up-to-date pacemaker. It only determines where the resource will be monitored;

Re: [ClusterLabs] STONITH resources on wrong nodes

2018-07-11 Thread Salvatore D'angelo
Suppose I do the following: crm configure delete l_ston_pg1 crm configure delete l_ston_pg2 crm configure delete l_ston_pg3 crm configure location l_ston_pg1 p_ston_pg1 inf: pg1 crm configure location l_ston_pg2 p_ston_pg2 inf: pg2 crm configure location l_ston_pg3 p_ston_pg3 inf: pg3 How long

Re: [ClusterLabs] STONITH resources on wrong nodes

2018-07-11 Thread Emmanuel Gelati
You need to use location l_ston_pg3 p_ston_pg3 inf: pg3, because -inf is negative. 2018-07-11 18:44 GMT+02:00 Salvatore D'angelo : > Hi all, > > in my cluster doing crm_mon -1ARrf I noticed my STONITH resources are not > correctly located: > p_ston_pg1 (stonith:external/ipmi): Started pg2 >

[ClusterLabs] STONITH resources on wrong nodes

2018-07-11 Thread Salvatore D'angelo
Hi all, in my cluster doing crm_mon -1ARrf I noticed my STONITH resources are not correctly located: p_ston_pg1 (stonith:external/ipmi): Started pg2 p_ston_pg2 (stonith:external/ipmi): Started pg1 p_ston_pg3 (stonith:external/ipmi): Started pg1 I have three

[ClusterLabs] stonith continues to reboot server once fencing occurs

2018-05-11 Thread Dickerson, Charles {Chuck} (JSC-EG)[Jacobs Technology, Inc.]
I have a 2 node cluster; once fencing occurs, the fenced node is continually rebooted every time it comes up. Configuration: 2 identical nodes - Centos 7.4, pacemaker 1.1.18, pcs 0.9.162, fencing configured using fence_ipmilan. The cluster is set to ignore quorum and stonith is enabled.

Re: [ClusterLabs] STONITH forever?

2018-04-10 Thread Ken Gaillot
On Tue, 2018-04-10 at 07:26 +, Stefan Schlösser wrote: > Hi, > > I have a 3 node setup on ubuntu 16.04. Corosync/Pacemaker services > are not started automatically. > > If I put all 3 nodes to offline mode, with 1 node in an "unclean" > state I get a never ending STONITH. > > What

[ClusterLabs] STONITH forever?

2018-04-10 Thread Stefan Schlösser
Hi, I have a 3 node setup on ubuntu 16.04. Corosync/Pacemaker services are not started automatically. If I put all 3 nodes to offline mode, with 1 node in an "unclean" state I get a never ending STONITH. What happens is that the STONITH causes a reboot of the unclean node. 1) I would have

Re: [ClusterLabs] Stonith stops after vSphere restart

2018-04-03 Thread jota
Hi again, After restarting the vCenter, all worked as expected. Thanks to all. Have a nice day. 23 February 2018 7:59, j...@disroot.org wrote: > Hi all, > > Thanks for your responses. > With your advice I was able to configure it. I still have to test its > operation. When it is >

Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread jota
Hi all, Thanks for your responses. With your advice I was able to configure it. I still have to test its operation. When it is possible to restart the vCenter, I will post the results. Have a nice weekend! 22 February 2018 16:00, "Tomas Jelinek" wrote: > Try

Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread Tomas Jelinek
Try this: pcs resource meta vmware_soap failure-timeout= Tomas On 22.2.2018 at 14:55, j...@disroot.org wrote: Hi, I am trying to configure the failure-timeout for stonith, but I can only do it for the other resources. When I try to enable it for stonith, I get this error: "Error:

Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread Klaus Wenninger
On 02/22/2018 02:55 PM, j...@disroot.org wrote: > Hi, > > I am trying to configure the failure-timeout for stonith, but I can only do > it for the other resources. > When I try to enable it for stonith, I get this error: "Error: resource > option(s): 'failure-timeout', are not recognized for

Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread jota
Hi, I am trying to configure the failure-timeout for stonith, but I can only do it for the other resources. When I try to enable it for stonith, I get this error: "Error: resource option(s): 'failure-timeout', are not recognized for resource type: 'stonith::fence_vmware_soap'". Thanks. 22 de

Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread Andrei Borzenkov
On Thu, Feb 22, 2018 at 2:40 PM, wrote: > Thanks for the responses. > > So, if I understand, this is the right behaviour and it does not affect > the stonith mechanism. > > If I remember correctly, the fault status persists for hours until I fix it > manually. > Is there

Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread jota
Thanks for the responses. So, if I understand, this is the right behaviour and it does not affect the stonith mechanism. If I remember correctly, the fault status persists for hours until I fix it manually. Is there any way to modify the expiry time so it cleans itself? 22 February 2018

Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread Andrei Borzenkov
Stonith resource state should have no impact on the actual stonith operation. It only reflects whether the monitor was successful or not and serves as a warning to the administrator that something may be wrong. It should automatically clear itself after failure-timeout has expired. On Thu, Feb 22, 2018 at 1:58
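A hedged sketch of both options touched on here, using the resource name from the thread; the timeout value is an assumption, not taken from the messages:
  # Let a failed monitor on the fence device expire automatically after 10 minutes:
  pcs resource meta vmware_soap failure-timeout=600
  # ...or clear the failure by hand:
  pcs resource cleanup vmware_soap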

Re: [ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread Marek Grac
Hi, On Thu, Feb 22, 2018 at 11:58 AM, wrote: > > Hi, > > I have a 2 node pacemaker cluster configured with the fence agent > vmware_soap. > Everything works fine until the vCenter is restarted. After that, stonith > fails and stops. > This is expected as we run 'monitor'

[ClusterLabs] Stonith stops after vSphere restart

2018-02-22 Thread jota
Hi, I have a 2 node pacemaker cluster configured with the fence agent vmware_soap. Everything works fine until the vCenter is restarted. After that, stonith fails and stops. [root@node1 ~]# pcs status Cluster name: psqltest Stack: corosync Current DC: node2 (version 1.1.16-12.el7_4.7-94ff4df) -

Re: [ClusterLabs] Stonith hostname vs port vs plug

2017-07-31 Thread ArekW
> The "plug" should match the name used by the hypervisor, not the actual host name (if they differ). I understand the difference between plug and hostname. I don't clearly understand which fence config is correct (I reffer to pcs stonith describe fence_...): the same entry on every node:

Re: [ClusterLabs] Stonith hostname vs port vs plug

2017-07-31 Thread Digimer
On 2017-07-31 03:18 AM, ArekW wrote: > Hi, I'm confused how to properly set stonith when a hostname is > different from the port/plug name. I have 2 vms on vbox/vmware with > hostnames: node1, node2. The ports' names are: Centos1, Centos2. > According to my understanding the stonith device must know

[ClusterLabs] Stonith hostname vs port vs plug

2017-07-31 Thread ArekW
Hi, I'm confused how to properly set stonith when a hostname is different from the port/plug name. I have 2 vms on vbox/vmware with hostnames: node1, node2. The ports' names are: Centos1, Centos2. According to my understanding the stonith device must know which vm to control (each other) so I set:
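A sketch of the usual way to express that mapping; the agent choice, address and credentials below are placeholders, but pcmk_host_map itself is the standard hostname-to-plug translation:
  # Map the cluster node name to the hypervisor's port/plug name:
  pcs stonith create fence_vm_node2 fence_vmware_soap \
      ipaddr=vcenter.example.com login=user passwd=secret ssl_insecure=1 \
      pcmk_host_map="node2:Centos2" pcmk_host_check=static-list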

Re: [ClusterLabs] stonith disabled, but pacemaker tries to reboot

2017-07-20 Thread Ken Gaillot
On 07/20/2017 03:46 AM, Daniel.L wrote: > Hi Pacemaker Users, > > > We have a 2 node pacemaker cluster (v1.1.14). > Stonith at this moment is disabled: > > $ pcs property --all | grep stonith > stonith-action: reboot > stonith-enabled: false > stonith-timeout: 60s > stonith-watchdog-timeout:

[ClusterLabs] stonith disabled, but pacemaker tries to reboot

2017-07-20 Thread Daniel.L
Hi Pacemaker Users, We have a 2 node pacemaker cluster (v1.1.14). Stonith at this moment is disabled: $ pcs property --all | grep stonith stonith-action: reboot stonith-enabled: false stonith-timeout: 60s stonith-watchdog-timeout: (null) $ pcs property --all | grep fenc startup-fencing: true

Re: [ClusterLabs] stonith device locate on same host in active/passive cluster

2017-05-11 Thread Albert Weng
Hi Ken, thank you for your comment. I think this case can be closed; I used your suggestion of a constraint and the problem is resolved. Thanks a lot~~ On Thu, May 4, 2017 at 10:28 PM, Ken Gaillot wrote: > On 05/03/2017 09:04 PM, Albert Weng wrote: > > Hi Marek, > > > >
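For readers hitting the same symptom, the kind of constraint typically suggested for this looks roughly like the sketch below; the device and node names are taken from later messages in the thread, and the exact form of Ken's suggestion is not visible in this preview:
  # Keep each IPMI fence device off the node it is responsible for killing:
  pcs constraint location ipmi-fence-node1 avoids clustera
  pcs constraint location ipmi-fence-node2 avoids clusterb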

Re: [ClusterLabs] stonith device locate on same host in active/passive cluster

2017-05-04 Thread Ken Gaillot
On 05/03/2017 09:04 PM, Albert Weng wrote: > Hi Marek, > > Thanks your reply. > > On Tue, May 2, 2017 at 5:15 PM, Marek Grac > wrote: > > > > On Tue, May 2, 2017 at 11:02 AM, Albert Weng

Re: [ClusterLabs] stonith device locate on same host in active/passive cluster

2017-05-02 Thread Albert Weng
Hi Marek, thanks for your quick response. According to your opinion, when I type "pcs status" I see the following result for the fence devices: ipmi-fence-node1 (stonith:fence_ipmilan): Started clusterb ipmi-fence-node2 (stonith:fence_ipmilan): Started clusterb Does it mean both ipmi

Re: [ClusterLabs] stonith device locate on same host in active/passive cluster

2017-05-02 Thread Marek Grac
Hi, On Tue, May 2, 2017 at 3:39 AM, Albert Weng wrote: > Hi All, > > I have created active/passive pacemaker cluster on RHEL 7. > > here is my environment: > clustera : 192.168.11.1 > clusterb : 192.168.11.2 > clustera-ilo4 : 192.168.11.10 > clusterb-ilo4 :

Re: [ClusterLabs] stonith device locate on same host in active/passive cluster

2017-05-01 Thread Albert Weng
Hi All, here are the logs from corosync.log that might help. Apr 25 10:29:32 [15334] gmlcdbw02 pengine: info: native_print: ipmi-fence-db01 (stonith:fence_ipmilan): Started gmlcdbw01 Apr 25 10:29:32 [15334] gmlcdbw02 pengine: info: native_print: ipmi-fence-db02

[ClusterLabs] stonith device locate on same host in active/passive cluster

2017-05-01 Thread Albert Weng
Hi All, I have created an active/passive pacemaker cluster on RHEL 7. Here is my environment: clustera : 192.168.11.1 clusterb : 192.168.11.2 clustera-ilo4 : 192.168.11.10 clusterb-ilo4 : 192.168.11.11 Both nodes are connected to SAN storage for shared storage. I used the following cmd to create my
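The command itself is cut off in this preview; a hedged reconstruction of a typical fence_ipmilan setup for the iLO4 addresses listed above (login and password are placeholders):
  pcs stonith create ipmi-fence-node1 fence_ipmilan \
      pcmk_host_list="clustera" ipaddr=192.168.11.10 \
      login=admin passwd=secret lanplus=1 op monitor interval=60s
  pcs stonith create ipmi-fence-node2 fence_ipmilan \
      pcmk_host_list="clusterb" ipaddr=192.168.11.11 \
      login=admin passwd=secret lanplus=1 op monitor interval=60s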

Re: [ClusterLabs] STONITH not communicated back to initiator until token expires

2017-04-26 Thread Chris Walker
Just to close the loop on this issue, discussions with Redhat have confirmed that this behavior is as designed, that all membership changes must first be realized by the Corosync layer. So the full trajectory of a STONITH action in response to, for example, a failed stop operation looks like:

Re: [ClusterLabs] STONITH not communicated back to initiator until token expires

2017-04-04 Thread Ken Gaillot
On 03/13/2017 10:43 PM, Chris Walker wrote: > Thanks for your reply Digimer. > > On Mon, Mar 13, 2017 at 1:35 PM, Digimer > wrote: > > On 13/03/17 12:07 PM, Chris Walker wrote: > > Hello, > > > > On our two-node EL7 cluster (pacemaker:

Re: [ClusterLabs] Stonith

2017-03-31 Thread Alexander Markov
Kristoffer Grönlund writes: The only solution I know which allows for a configuration like this is using separate clusters in each data center, and using booth for transferring ticket ownership between them. Booth requires a data center-level quorum (meaning at least 3 locations), though the

Re: [ClusterLabs] Stonith

2017-03-30 Thread Kristoffer Grönlund
Alexander Markov writes: > Hello, Kristoffer > >> Did you test failover through pacemaker itself? > > Yes, I did, no problems here. > >> However: Am I understanding it correctly that you have one node in each >> data center, and a stonith device in each data center? > > Yes.

Re: [ClusterLabs] Stonith

2017-03-30 Thread Alexander Markov
Hello, Kristoffer Did you test failover through pacemaker itself? Yes, I did, no problems here. However: Am I understanding it correctly that you have one node in each data center, and a stonith device in each data center? Yes. If the data center is lost, the stonith device for the node

Re: [ClusterLabs] Stonith

2017-03-30 Thread Kristoffer Grönlund
Alexander Markov writes: > Hello guys, > > it looks like I miss something obvious, but I just don't get what has > happened. > > I've got a number of stonith-enabled clusters within my big POWER boxes. > My stonith devices are two HMC (hardware management consoles) -

Re: [ClusterLabs] stonith in dual HMC environment

2017-03-28 Thread Dejan Muhamedagic
On Tue, Mar 28, 2017 at 04:20:12PM +0300, Alexander Markov wrote: > Hello, Dejan, > > >Why? I don't have a test system right now, but for instance this > >should work: > > > >$ stonith -t ibmhmc ipaddr=10.1.2.9 -lS > >$ stonith -t ibmhmc ipaddr=10.1.2.9 -T reset {nodename} > > Ah, I see.

Re: [ClusterLabs] stonith in dual HMC environment

2017-03-28 Thread Ken Gaillot
On 03/28/2017 08:20 AM, Alexander Markov wrote: > Hello, Dejan, > >> Why? I don't have a test system right now, but for instance this >> should work: >> >> $ stonith -t ibmhmc ipaddr=10.1.2.9 -lS >> $ stonith -t ibmhmc ipaddr=10.1.2.9 -T reset {nodename} > > Ah, I see. Everything (including

Re: [ClusterLabs] stonith in dual HMC environment

2017-03-28 Thread Alexander Markov
Hello, Dejan, Why? I don't have a test system right now, but for instance this should work: $ stonith -t ibmhmc ipaddr=10.1.2.9 -lS $ stonith -t ibmhmc ipaddr=10.1.2.9 -T reset {nodename} Ah, I see. Everything (including stonith methods, fencing and failover) works just fine under normal

Re: [ClusterLabs] stonith in dual HMC environment

2017-03-28 Thread Dejan Muhamedagic
On Mon, Mar 27, 2017 at 01:17:31PM +0300, Alexander Markov wrote: > Hello, Dejan, > > > >The first thing I'd try is making sure you can fence each node from the > >command line by manually running the fence agent. I'm not sure how to do > >that for the "stonith:" type agents. > > > >There's a

Re: [ClusterLabs] stonith in dual HMC environment

2017-03-27 Thread Alexander Markov
Hello, Dejan, The first thing I'd try is making sure you can fence each node from the command line by manually running the fence agent. I'm not sure how to do that for the "stonith:" type agents. There's a program stonith(8). It's easy to replicate the configuration on the command line.

Re: [ClusterLabs] stonith in dual HMC environment

2017-03-24 Thread Ken Gaillot
On 03/22/2017 09:42 AM, Alexander Markov wrote: > >> Please share your config along with the logs from the nodes that were >> affected. > > I'm starting to think it's not about how to define stonith resources. If > the whole box is down with all the logical partitions defined, then HMC > cannot

[ClusterLabs] stonith in dual HMC environment

2017-03-23 Thread Alexander Markov
Please share your config along with the logs from the nodes that were affected. I'm starting to think it's not about how to define stonith resources. If the whole box is down with all the logical partitions defined, then HMC cannot determine whether the LPAR (partition) is really dead or just

Re: [ClusterLabs] stonith in dual HMC environment

2017-03-21 Thread Digimer
On 20/03/17 12:22 PM, Alexander Markov wrote: > Hello guys, > > it looks like I miss something obvious, but I just don't get what has > happened. > > I've got a number of stonith-enabled clusters within my big POWER boxes. > My stonith devices are two HMC (hardware management consoles) -

Re: [ClusterLabs] STONITH not communicated back to initiator until token expires

2017-03-13 Thread Chris Walker
Thanks for your reply Digimer. On Mon, Mar 13, 2017 at 1:35 PM, Digimer wrote: > On 13/03/17 12:07 PM, Chris Walker wrote: > > Hello, > > > > On our two-node EL7 cluster (pacemaker: 1.1.15-11.el7_3.4; corosync: > > 2.4.0-4; libqb: 1.0-1), > > it looks like successful STONITH

Re: [ClusterLabs] STONITH not communicated back to initiator until token expires

2017-03-13 Thread Digimer
On 13/03/17 12:07 PM, Chris Walker wrote: > Hello, > > On our two-node EL7 cluster (pacemaker: 1.1.15-11.el7_3.4; corosync: > 2.4.0-4; libqb: 1.0-1), > it looks like successful STONITH operations are not communicated from > stonith-ng back to the initiator (in this case, crmd) until the STONITHed

[ClusterLabs] STONITH not communicated back to initiator until token expires

2017-03-13 Thread Chris Walker
Hello, On our two-node EL7 cluster (pacemaker: 1.1.15-11.el7_3.4; corosync: 2.4.0-4; libqb: 1.0-1), it looks like successful STONITH operations are not communicated from stonith-ng back to the initiator (in this case, crmd) until the STONITHed node is removed from the cluster when Corosync notices

Re: [ClusterLabs] Stonith : meta-data contains no resource-agent element

2016-11-28 Thread bliu
Hi, SSH stonith is just a devel demo, if I remember correctly. If you are using openSUSE, you need to install libglue-devel. I think there are similar packages on other distributions. On 11/21/2016 04:05 AM, jitendra.jaga...@dell.com wrote: Hello Pacemaker admins, We have recently

Re: [ClusterLabs] STONITH Fencing for Amazon EC2

2016-08-04 Thread Jason A Ramsey
Is there some other [updated] fencing module I can use in this use case? -- [ jR ] M: +1 (703) 628-2621 @: ja...@eramsey.org there is no path to greatness; greatness is the path On 8/2/16, 11:59 AM, "Digimer" wrote: On 02/08/16 10:02 AM, Jason A Ramsey wrote:

Re: [ClusterLabs] STONITH Fencing for Amazon EC2

2016-08-02 Thread Digimer
On 02/08/16 10:02 AM, Jason A Ramsey wrote: > I’ve found [oldish] references on the internet to a fencing module for Amazon > EC2, but it doesn’t seem to be included in any of the fencing yum packages for > CentOS. Is this module not part of the canonical distribution? Is there > something else I

[ClusterLabs] STONITH Fencing for Amazon EC2

2016-08-02 Thread Jason A Ramsey
I’ve found [oldish] references on the internet to a fencing module for Amazon EC2, but it doesn’t seem to be included in any of the fencing yum packages for CentOS. Is this module not part of the canonical distribution? Is there something else I should be looking for? -- [ jR ] @:

Re: [ClusterLabs] Stonith ignores resource stop errors

2016-03-10 Thread Klechomir
Thanks, That was it! On 10.03.2016 17:08, Ken Gaillot wrote: On 03/10/2016 04:42 AM, Klechomir wrote: Hi List I'm testing stonith now (pacemaker 1.1.8), and noticed that it properly kills a node with stopped pacemaker, but ignores resource stop errors. I'm pretty sure that the same version

Re: [ClusterLabs] Stonith ignores resource stop errors

2016-03-10 Thread Ken Gaillot
On 03/10/2016 04:42 AM, Klechomir wrote: > Hi List > > I'm testing stonith now (pacemaker 1.1.8), and noticed that it properly kills > a node with stopped pacemaker, but ignores resource stop errors. > > I'm pretty sure that the same version worked properly with stonith before. > Maybe I'm

[ClusterLabs] STONITH interrupting pacemaker yum upgrades

2016-02-19 Thread Richard Stevenson
Hi, I'm having trouble updating pacemaker on a small 3 node cluster. All nodes are running Centos 7, and I'm upgrading via a simple `yum upgrade`. Whenever I attempt to do this the node is fenced when yum attempts to clean up the old pacemaker package - restarted by STONITH in this case.

Re: [ClusterLabs] STONITH when both IB interfaces are down, and how to trigger Filesystem mount/umount failure to test STONITH?

2015-08-20 Thread Marcin Dulak
Hi, thanks for the answers. I've performed the test of shutting down both IPoIB interfaces on an OSS server while a Lustre client was writing a large file to an OST on that server; the umount still succeeded, and writing to the file continued after a short delay on the same OST mounted on the