Re: [ClusterLabs] fence_scsi no such device
On 03/21/2016 03:46 PM, Ken Gaillot wrote:
> On 03/21/2016 08:39 AM, marvin wrote:
>> On 03/15/2016 03:39 PM, Ken Gaillot wrote:
>>> On 03/15/2016 09:10 AM, marvin wrote:
>>>> Hi,
>>>>
>>>> I'm trying to get fence_scsi working, but I get a "no such device" error.
>>>> It's a two node cluster with nodes called "node01" and "node03". The OS
>>>> is RHEL 7.2.
>>>>
>>>> Here is some relevant info:
>>>>
>>>> # pcs status
>>>> Cluster name: testrhel7cluster
>>>> Last updated: Tue Mar 15 15:05:40 2016
>>>> Last change: Tue Mar 15 14:33:39 2016 by root via cibadmin on node01
>>>> Stack: corosync
>>>> Current DC: node03 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
>>>> 2 nodes and 23 resources configured
>>>>
>>>> Online: [ node01 node03 ]
>>>>
>>>> Full list of resources:
>>>>
>>>> Clone Set: dlm-clone [dlm]
>>>>     Started: [ node01 node03 ]
>>>> Clone Set: clvmd-clone [clvmd]
>>>>     Started: [ node01 node03 ]
>>>> fence-node1 (stonith:fence_ipmilan): Started node03
>>>> fence-node3 (stonith:fence_ipmilan): Started node01
>>>> Resource Group: test_grupa
>>>>     test_ip (ocf::heartbeat:IPaddr): Started node01
>>>>     lv_testdbcl (ocf::heartbeat:LVM): Started node01
>>>>     fs_testdbcl (ocf::heartbeat:Filesystem): Started node01
>>>>     oracle11_baza (ocf::heartbeat:oracle): Started node01
>>>>     oracle11_lsnr (ocf::heartbeat:oralsnr): Started node01
>>>> fence-scsi-node1 (stonith:fence_scsi): Started node03
>>>> fence-scsi-node3 (stonith:fence_scsi): Started node01
>>>>
>>>> PCSD Status:
>>>>   node01: Online
>>>>   node03: Online
>>>>
>>>> Daemon Status:
>>>>   corosync: active/enabled
>>>>   pacemaker: active/enabled
>>>>   pcsd: active/enabled
>>>>
>>>> # pcs stonith show
>>>> fence-node1 (stonith:fence_ipmilan): Started node03
>>>> fence-node3 (stonith:fence_ipmilan): Started node01
>>>> fence-scsi-node1 (stonith:fence_scsi): Started node03
>>>> fence-scsi-node3 (stonith:fence_scsi): Started node01
>>>> Node: node01
>>>>   Level 1 - fence-scsi-node3
>>>>   Level 2 - fence-node3
>>>> Node: node03
>>>>   Level 1 - fence-scsi-node1
>>>>   Level 2 - fence-node1
>>>>
>>>> # pcs stonith show fence-scsi-node1 --all
>>>> Resource: fence-scsi-node1 (class=stonith type=fence_scsi)
>>>>   Attributes: pcmk_host_list=node01 pcmk_monitor_action=metadata pcmk_reboot_action=off
>>>>   Meta Attrs: provides=unfencing
>>>>   Operations: monitor interval=60s (fence-scsi-node1-monitor-interval-60s)
>>>>
>>>> # pcs stonith show fence-scsi-node3 --all
>>>> Resource: fence-scsi-node3 (class=stonith type=fence_scsi)
>>>>   Attributes: pcmk_host_list=node03 pcmk_monitor_action=metadata pcmk_reboot_action=off
>>>>   Meta Attrs: provides=unfencing
>>>>   Operations: monitor interval=60s (fence-scsi-node3-monitor-interval-60s)
>>>>
>>>> node01 # pcs stonith fence node03
>>>> Error: unable to fence 'node03'
>>>> Command failed: No such device
>>>>
>>>> node01 # tail /var/log/messages
>>>> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: Client
>>>> stonith_admin.29191.2b7fe910 wants to fence (reboot) 'node03' with
>>>> device '(any)'
>>>> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: Initiating remote
>>>> operation reboot for node03: d1df9201-5bb1-447f-9b40-d3d7235c3d0a (0)
>>>> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: fence-scsi-node3 can
>>>> fence (reboot) node03: static-list
>>>> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: fence-node3 can fence
>>>> (reboot) node03: static-list
>>>> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: All fencing options
>>>> to fence node03 for stonith_admin.29191@node01.d1df9201 failed
>>> The above line is the key. Both of the devices registered for node03
>>> returned failure. Pacemaker then looked for any other device capable of
>>> fencing node03 and there is none, so that's why it reported "No such
>>> device" (an admittedly obscure message).
>>>
>>> It looks like the fence agents require more configuration options than
>>> you have set. If you run "/path/to/fence/agent -o metadata", you can see
>>> the available options. It's a good idea to first get the agent running
>>> successfully manually on the command line ("status" command is usually
>>> sufficient), then put those same options in the cluster configuration.
>>>
>> Made some progress, found new issue.
>> So I got fence_scsi to work, it unfences at start, and fences when I
>> tell it to.
>>
>> The problem is when I, for instance, fence node01. It stops pacemaker
>> but leaves corosync, so node01 is in "pending" state and node03 won't
>> stop services until node01 is restarted. The keys seem to be handled
>> correctly.
>
> Technically, fence_scsi won't stop pacemaker or corosync, it will just
> cut off the node's disk access and let the cluster know it's safe to
> recover resources.
>
> I haven't used fence_scsi myself, so I'm not sure of the exact details,
> but your configuration needs some changes. The pcmk_host_list option
> should list only the one node that the fence device can fence (one
> device configured for each node).
>
> You need more attributes, such as "devices" to specify which SCSI
> devices to cut off, and either "key" or "nodename" to specify the
> node key for SCSI reservations.

I'll give it a try, but all those should be automagic if
Re: [ClusterLabs] fence_scsi no such device
On 21/03/16 09:46 -0500, Ken Gaillot wrote:
> You need more attributes, such as "devices" to specify which SCSI
> devices to cut off, and either "key" or "nodename" to specify the
> node key for SCSI reservations.

Hmm, I keep lamenting that by extending the agents' metadata with an
inline RelaxNG grammar to express co-occurrence/mutual exclusion of
particular parameters and/or their datatypes in detail, and by using
that information at the configuration front-ends, we would push the
overall user experience to a new level
(https://bugzilla.redhat.com/show_bug.cgi?id=1281463#c4).

--
Jan (Poki)

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
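As an illustration of the kind of front-end check Jan describes, here is a minimal shell sketch. It is hypothetical: check_fence_scsi_opts is not part of pcs or fence-agents, and the rule it encodes ("devices" plus at least one of "key" or "nodename") is simply the advice quoted above taken at face value, which may not match what fence_scsi actually enforces or can work out by itself.

```shell
#!/bin/sh
# Hypothetical front-end check; not part of pcs or fence-agents.
# Assumed rule (from the advice quoted in this thread): "devices" must be
# set, together with at least one of "key" or "nodename".
check_fence_scsi_opts() {
    have_devices=""
    have_key=""
    have_nodename=""
    # Each argument is expected in the name=value form used by pcs.
    for opt in "$@"; do
        case "$opt" in
            devices=?*)  have_devices=1 ;;
            key=?*)      have_key=1 ;;
            nodename=?*) have_nodename=1 ;;
        esac
    done
    if [ -z "$have_devices" ]; then
        echo "missing: devices"
        return 1
    fi
    if [ -z "$have_key" ] && [ -z "$have_nodename" ]; then
        echo "missing: key or nodename"
        return 1
    fi
    echo "ok"
}

# The attributes from the original post would be rejected under this rule:
check_fence_scsi_opts pcmk_host_list=node01 pcmk_monitor_action=metadata \
    pcmk_reboot_action=off || echo "rejected"
```

With a declarative grammar in the agent metadata, a front-end could derive such a check instead of hard-coding it per agent, which is the point of the proposal.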
Re: [ClusterLabs] fence_scsi no such device
On 03/21/2016 08:39 AM, marvin wrote:
>
>
> On 03/15/2016 03:39 PM, Ken Gaillot wrote:
>> On 03/15/2016 09:10 AM, marvin wrote:
>>> Hi,
>>>
>>> I'm trying to get fence_scsi working, but I get a "no such device" error.
>>> It's a two node cluster with nodes called "node01" and "node03". The OS
>>> is RHEL 7.2.
>>>
>>> Here is some relevant info:
>>>
>>> # pcs status
>>> Cluster name: testrhel7cluster
>>> Last updated: Tue Mar 15 15:05:40 2016
>>> Last change: Tue Mar 15 14:33:39 2016 by root via cibadmin on node01
>>> Stack: corosync
>>> Current DC: node03 (version 1.1.13-10.el7-44eb2dd) - partition with
>>> quorum
>>> 2 nodes and 23 resources configured
>>>
>>> Online: [ node01 node03 ]
>>>
>>> Full list of resources:
>>>
>>> Clone Set: dlm-clone [dlm]
>>>     Started: [ node01 node03 ]
>>> Clone Set: clvmd-clone [clvmd]
>>>     Started: [ node01 node03 ]
>>> fence-node1 (stonith:fence_ipmilan): Started node03
>>> fence-node3 (stonith:fence_ipmilan): Started node01
>>> Resource Group: test_grupa
>>>     test_ip (ocf::heartbeat:IPaddr): Started node01
>>>     lv_testdbcl (ocf::heartbeat:LVM): Started node01
>>>     fs_testdbcl (ocf::heartbeat:Filesystem): Started node01
>>>     oracle11_baza (ocf::heartbeat:oracle): Started node01
>>>     oracle11_lsnr (ocf::heartbeat:oralsnr): Started node01
>>> fence-scsi-node1 (stonith:fence_scsi): Started node03
>>> fence-scsi-node3 (stonith:fence_scsi): Started node01
>>>
>>> PCSD Status:
>>>   node01: Online
>>>   node03: Online
>>>
>>> Daemon Status:
>>>   corosync: active/enabled
>>>   pacemaker: active/enabled
>>>   pcsd: active/enabled
>>>
>>> # pcs stonith show
>>> fence-node1 (stonith:fence_ipmilan): Started node03
>>> fence-node3 (stonith:fence_ipmilan): Started node01
>>> fence-scsi-node1 (stonith:fence_scsi): Started node03
>>> fence-scsi-node3 (stonith:fence_scsi): Started node01
>>> Node: node01
>>>   Level 1 - fence-scsi-node3
>>>   Level 2 - fence-node3
>>> Node: node03
>>>   Level 1 - fence-scsi-node1
>>>   Level 2 - fence-node1
>>>
>>> # pcs stonith show fence-scsi-node1 --all
>>> Resource: fence-scsi-node1 (class=stonith type=fence_scsi)
>>>   Attributes: pcmk_host_list=node01 pcmk_monitor_action=metadata
>>> pcmk_reboot_action=off
>>>   Meta Attrs: provides=unfencing
>>>   Operations: monitor interval=60s
>>> (fence-scsi-node1-monitor-interval-60s)
>>>
>>> # pcs stonith show fence-scsi-node3 --all
>>> Resource: fence-scsi-node3 (class=stonith type=fence_scsi)
>>>   Attributes: pcmk_host_list=node03 pcmk_monitor_action=metadata
>>> pcmk_reboot_action=off
>>>   Meta Attrs: provides=unfencing
>>>   Operations: monitor interval=60s
>>> (fence-scsi-node3-monitor-interval-60s)
>>>
>>> node01 # pcs stonith fence node03
>>> Error: unable to fence 'node03'
>>> Command failed: No such device
>>>
>>> node01 # tail /var/log/messages
>>> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: Client
>>> stonith_admin.29191.2b7fe910 wants to fence (reboot) 'node03' with
>>> device '(any)'
>>> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: Initiating remote
>>> operation reboot for node03: d1df9201-5bb1-447f-9b40-d3d7235c3d0a (0)
>>> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: fence-scsi-node3 can
>>> fence (reboot) node03: static-list
>>> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: fence-node3 can fence
>>> (reboot) node03: static-list
>>> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: All fencing options
>>> to fence node03 for stonith_admin.29191@node01.d1df9201 failed
>> The above line is the key. Both of the devices registered for node03
>> returned failure. Pacemaker then looked for any other device capable of
>> fencing node03 and there is none, so that's why it reported "No such
>> device" (an admittedly obscure message).
>>
>> It looks like the fence agents require more configuration options than
>> you have set. If you run "/path/to/fence/agent -o metadata", you can see
>> the available options. It's a good idea to first get the agent running
>> successfully manually on the command line ("status" command is usually
>> sufficient), then put those same options in the cluster configuration.
>>
> Made some progress, found new issue.
> So I got fence_scsi to work, it unfences at start, and fences when I
> tell it to.
>
> The problem is when I, for instance, fence node01. It stops pacemaker
> but leaves corosync, so node01 is in "pending" state and node03 won't
> stop services until node01 is restarted. The keys seem to be handled
> correctly.

Technically, fence_scsi won't stop pacemaker or corosync, it will just
cut off the node's disk access and let the cluster know it's safe to
recover resources.

I haven't used fence_scsi myself, so I'm not sure of the exact details,
but your configuration needs some changes. The pcmk_host_list option
should list
Re: [ClusterLabs] fence_scsi no such device
On 03/15/2016 03:39 PM, Ken Gaillot wrote:
> On 03/15/2016 09:10 AM, marvin wrote:
>> Hi,
>>
>> I'm trying to get fence_scsi working, but I get a "no such device" error.
>> It's a two node cluster with nodes called "node01" and "node03". The OS
>> is RHEL 7.2.
>>
>> Here is some relevant info:
>>
>> # pcs status
>> Cluster name: testrhel7cluster
>> Last updated: Tue Mar 15 15:05:40 2016
>> Last change: Tue Mar 15 14:33:39 2016 by root via cibadmin on node01
>> Stack: corosync
>> Current DC: node03 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
>> 2 nodes and 23 resources configured
>>
>> Online: [ node01 node03 ]
>>
>> Full list of resources:
>>
>> Clone Set: dlm-clone [dlm]
>>     Started: [ node01 node03 ]
>> Clone Set: clvmd-clone [clvmd]
>>     Started: [ node01 node03 ]
>> fence-node1 (stonith:fence_ipmilan): Started node03
>> fence-node3 (stonith:fence_ipmilan): Started node01
>> Resource Group: test_grupa
>>     test_ip (ocf::heartbeat:IPaddr): Started node01
>>     lv_testdbcl (ocf::heartbeat:LVM): Started node01
>>     fs_testdbcl (ocf::heartbeat:Filesystem): Started node01
>>     oracle11_baza (ocf::heartbeat:oracle): Started node01
>>     oracle11_lsnr (ocf::heartbeat:oralsnr): Started node01
>> fence-scsi-node1 (stonith:fence_scsi): Started node03
>> fence-scsi-node3 (stonith:fence_scsi): Started node01
>>
>> PCSD Status:
>>   node01: Online
>>   node03: Online
>>
>> Daemon Status:
>>   corosync: active/enabled
>>   pacemaker: active/enabled
>>   pcsd: active/enabled
>>
>> # pcs stonith show
>> fence-node1 (stonith:fence_ipmilan): Started node03
>> fence-node3 (stonith:fence_ipmilan): Started node01
>> fence-scsi-node1 (stonith:fence_scsi): Started node03
>> fence-scsi-node3 (stonith:fence_scsi): Started node01
>> Node: node01
>>   Level 1 - fence-scsi-node3
>>   Level 2 - fence-node3
>> Node: node03
>>   Level 1 - fence-scsi-node1
>>   Level 2 - fence-node1
>>
>> # pcs stonith show fence-scsi-node1 --all
>> Resource: fence-scsi-node1 (class=stonith type=fence_scsi)
>>   Attributes: pcmk_host_list=node01 pcmk_monitor_action=metadata pcmk_reboot_action=off
>>   Meta Attrs: provides=unfencing
>>   Operations: monitor interval=60s (fence-scsi-node1-monitor-interval-60s)
>>
>> # pcs stonith show fence-scsi-node3 --all
>> Resource: fence-scsi-node3 (class=stonith type=fence_scsi)
>>   Attributes: pcmk_host_list=node03 pcmk_monitor_action=metadata pcmk_reboot_action=off
>>   Meta Attrs: provides=unfencing
>>   Operations: monitor interval=60s (fence-scsi-node3-monitor-interval-60s)
>>
>> node01 # pcs stonith fence node03
>> Error: unable to fence 'node03'
>> Command failed: No such device
>>
>> node01 # tail /var/log/messages
>> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: Client
>> stonith_admin.29191.2b7fe910 wants to fence (reboot) 'node03' with
>> device '(any)'
>> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: Initiating remote
>> operation reboot for node03: d1df9201-5bb1-447f-9b40-d3d7235c3d0a (0)
>> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: fence-scsi-node3 can
>> fence (reboot) node03: static-list
>> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: fence-node3 can fence
>> (reboot) node03: static-list
>> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: All fencing options
>> to fence node03 for stonith_admin.29191@node01.d1df9201 failed
> The above line is the key. Both of the devices registered for node03
> returned failure. Pacemaker then looked for any other device capable of
> fencing node03 and there is none, so that's why it reported "No such
> device" (an admittedly obscure message).
>
> It looks like the fence agents require more configuration options than
> you have set. If you run "/path/to/fence/agent -o metadata", you can see
> the available options. It's a good idea to first get the agent running
> successfully manually on the command line ("status" command is usually
> sufficient), then put those same options in the cluster configuration.
>
Made some progress, found new issue.
So I got fence_scsi to work, it unfences at start, and fences when I
tell it to.

The problem is when I, for instance, fence node01. It stops pacemaker
but leaves corosync, so node01 is in "pending" state and node03 won't
stop services until node01 is restarted. The keys seem to be handled
correctly.

Before fence:

# pcs status
Cluster name: testrhel7cluster
Last updated: Mon Mar 21 14:26:53 2016
Last change: Mon Mar 21 14:26:27 2016 by root via crm_resource on node01
Stack: corosync
Current DC: node01 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 21 resources configured

Online: [ node01 node03 ]

Full list of resources:

Clone Set: dlm-clone [dlm]
    Started: [ node01 node03 ]
Clone Set: clvmd-clone [clvmd]
    Started: [ node01 node03 ]
Resource Group: test_grupa
    test_ip (ocf::heartbeat:IPaddr): Started node01
    lv_testdbcl (ocf::heartbeat:LVM): Started node01
    fs_testdbcl (ocf::heartbeat:Filesystem): Started node01
    oracle11_baza (ocf:
Re: [ClusterLabs] fence_scsi no such device
On 03/15/2016 09:10 AM, marvin wrote:
> Hi,
>
> I'm trying to get fence_scsi working, but I get a "no such device" error.
> It's a two node cluster with nodes called "node01" and "node03". The OS
> is RHEL 7.2.
>
> Here is some relevant info:
>
> # pcs status
> Cluster name: testrhel7cluster
> Last updated: Tue Mar 15 15:05:40 2016
> Last change: Tue Mar 15 14:33:39 2016 by root via cibadmin on node01
> Stack: corosync
> Current DC: node03 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
> 2 nodes and 23 resources configured
>
> Online: [ node01 node03 ]
>
> Full list of resources:
>
> Clone Set: dlm-clone [dlm]
>     Started: [ node01 node03 ]
> Clone Set: clvmd-clone [clvmd]
>     Started: [ node01 node03 ]
> fence-node1 (stonith:fence_ipmilan): Started node03
> fence-node3 (stonith:fence_ipmilan): Started node01
> Resource Group: test_grupa
>     test_ip (ocf::heartbeat:IPaddr): Started node01
>     lv_testdbcl (ocf::heartbeat:LVM): Started node01
>     fs_testdbcl (ocf::heartbeat:Filesystem): Started node01
>     oracle11_baza (ocf::heartbeat:oracle): Started node01
>     oracle11_lsnr (ocf::heartbeat:oralsnr): Started node01
> fence-scsi-node1 (stonith:fence_scsi): Started node03
> fence-scsi-node3 (stonith:fence_scsi): Started node01
>
> PCSD Status:
>   node01: Online
>   node03: Online
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> # pcs stonith show
> fence-node1 (stonith:fence_ipmilan): Started node03
> fence-node3 (stonith:fence_ipmilan): Started node01
> fence-scsi-node1 (stonith:fence_scsi): Started node03
> fence-scsi-node3 (stonith:fence_scsi): Started node01
> Node: node01
>   Level 1 - fence-scsi-node3
>   Level 2 - fence-node3
> Node: node03
>   Level 1 - fence-scsi-node1
>   Level 2 - fence-node1
>
> # pcs stonith show fence-scsi-node1 --all
> Resource: fence-scsi-node1 (class=stonith type=fence_scsi)
>   Attributes: pcmk_host_list=node01 pcmk_monitor_action=metadata
> pcmk_reboot_action=off
>   Meta Attrs: provides=unfencing
>   Operations: monitor interval=60s (fence-scsi-node1-monitor-interval-60s)
>
> # pcs stonith show fence-scsi-node3 --all
> Resource: fence-scsi-node3 (class=stonith type=fence_scsi)
>   Attributes: pcmk_host_list=node03 pcmk_monitor_action=metadata
> pcmk_reboot_action=off
>   Meta Attrs: provides=unfencing
>   Operations: monitor interval=60s (fence-scsi-node3-monitor-interval-60s)
>
> node01 # pcs stonith fence node03
> Error: unable to fence 'node03'
> Command failed: No such device
>
> node01 # tail /var/log/messages
> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: Client
> stonith_admin.29191.2b7fe910 wants to fence (reboot) 'node03' with
> device '(any)'
> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: Initiating remote
> operation reboot for node03: d1df9201-5bb1-447f-9b40-d3d7235c3d0a (0)
> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: fence-scsi-node3 can
> fence (reboot) node03: static-list
> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: fence-node3 can fence
> (reboot) node03: static-list
> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: All fencing options
> to fence node03 for stonith_admin.29191@node01.d1df9201 failed

The above line is the key. Both of the devices registered for node03
returned failure. Pacemaker then looked for any other device capable of
fencing node03 and there is none, so that's why it reported "No such
device" (an admittedly obscure message).

It looks like the fence agents require more configuration options than
you have set. If you run "/path/to/fence/agent -o metadata", you can see
the available options. It's a good idea to first get the agent running
successfully manually on the command line ("status" command is usually
sufficient), then put those same options in the cluster configuration.
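
The workflow suggested above (look at the agent's metadata, get "status" working by hand, then copy the same options into the cluster) might look roughly like the sketch below. Several things here are assumptions rather than something confirmed in this thread: /dev/sdX is a placeholder for the shared device, the --devices and --nodename spellings should be checked against the metadata output of the installed agent, and the pcs update line is only one plausible way to store the option. The script prints the commands and runs nothing unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: by default (DRY_RUN=1) the commands are printed, not run,
# so they can be reviewed before anything touches the cluster.
DRY_RUN="${DRY_RUN:-1}"
DEVICE="${DEVICE:-/dev/sdX}"   # placeholder for the shared SCSI device
NODE="${NODE:-node03}"         # node name taken from the thread

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# 1. List the options the agent accepts.
run fence_scsi -o metadata

# 2. Get "status" working by hand with explicit options
#    (option names are assumptions; verify them in the metadata).
run fence_scsi --devices="$DEVICE" --nodename="$NODE" -o status

# 3. Only then put the same options into the cluster configuration.
run pcs stonith update fence-scsi-node3 devices="$DEVICE"
```

Keeping the manual step separate means a failure can be attributed to the agent options themselves before Pacemaker's own device selection and topology come into play.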

> Mar 15 14:54:04 node01 stonith-ng[20024]: notice: Couldn't find anyone
> to fence (reboot) node03 with fence-node1
> Mar 15 14:54:04 node01 stonith-ng[20024]: error: Operation reboot of
> node03 by for stonith_admin.29191@node01.d1df9201: No such device
> Mar 15 14:54:04 node01 crmd[20028]: notice: Peer node03 was not
> terminated (reboot) by for node01: No such device
> (ref=d1df9201-5bb1-447f-9b40-d3d7235c3d0a) by client stonith_admin.29191
>
> node03 # tail /var/log/messages
> Mar 15 14:54:04 node03 stonith-ng[2601]: notice: fence-scsi-node1 can
> not fence (reboot) node03: static-list
> Mar 15 14:54:04 node03 stonith-ng[2601]: notice: fence-node1 can not
> fence (reboot) node03: static-list
> Mar 15 14:54:04 node03 stonith-ng[2601]: notice: Operation reboot of
> node03 by for stonith_admin.29191@node01.d1df9201: No such device
> Mar 15 14:54:04 node03 crmd[2605]: notice: Peer node03 was not
> terminated (reboot) by for node01: No such device
> (ref=d1df9201-5bb1-447f-9b