Re: [ClusterLabs] fence_scsi no such device

2016-03-21 Thread Jan Pokorný
On 21/03/16 09:46 -0500, Ken Gaillot wrote:
> You need more attributes, such as "devices" to specify which SCSI
> devices to cut off, and either "key" or "nodename" to specify the
> node key for SCSI reservations.

Hmm, I keep lamenting this: if we extended the agents' metadata with an
inline RelaxNG grammar expressing co-occurrence/mutual exclusion of
particular parameters and/or their datatypes in detail, and used that
information in the configuration front-ends, we would push the overall
user experience to a new level
(https://bugzilla.redhat.com/show_bug.cgi?id=1281463#c4).
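
To make the idea a bit more concrete, here is a rough shell sketch of the
kind of check a front-end could run before accepting a fence_scsi
resource. It encodes only the rule quoted above ("devices", plus "key" or
"nodename"); the function name check_fence_scsi_params is made up for
illustration, and the agent's real metadata may well imply more
constraints than this.

```shell
#!/bin/sh
# Hypothetical pre-flight check for fence_scsi parameters (illustration only).
# Rule taken from the thread: "devices" is needed, plus "key" or "nodename".
check_fence_scsi_params() {
    has_devices=no
    has_id=no
    for arg in "$@"; do
        case "$arg" in
            devices=?*)          has_devices=yes ;;
            key=?*|nodename=?*)  has_id=yes ;;
        esac
    done
    if [ "$has_devices" != yes ]; then
        echo "missing: devices"
        return 1
    fi
    if [ "$has_id" != yes ]; then
        echo "missing: key or nodename"
        return 1
    fi
    echo "ok"
}

# The attributes from the original report: neither "devices" nor a key is set.
check_fence_scsi_params pcmk_host_list=node01 pcmk_reboot_action=off || true
```

A grammar embedded in the metadata would let every front-end derive such
a check instead of hard-coding it per agent.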

-- 
Jan (Poki)


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] fence_scsi no such device

2016-03-21 Thread Ken Gaillot
On 03/21/2016 08:39 AM, marvin wrote:
> 
> 
> On 03/15/2016 03:39 PM, Ken Gaillot wrote:
>> On 03/15/2016 09:10 AM, marvin wrote:
>>> Hi,
>>>
>>> I'm trying to get fence_scsi working, but i get "no such device" error.
>>> It's a two node cluster with nodes called "node01" and "node03". The OS
>>> is RHEL 7.2.
>>>
>>> here is some relevant info:
>>>
>>> # pcs status
>>> Cluster name: testrhel7cluster
>>> Last updated: Tue Mar 15 15:05:40 2016  Last change: Tue Mar 15
>>> 14:33:39 2016 by root via cibadmin on node01
>>> Stack: corosync
>>> Current DC: node03 (version 1.1.13-10.el7-44eb2dd) - partition with
>>> quorum
>>> 2 nodes and 23 resources configured
>>>
>>> Online: [ node01 node03 ]
>>>
>>> Full list of resources:
>>>
>>>   Clone Set: dlm-clone [dlm]
>>>   Started: [ node01 node03 ]
>>>   Clone Set: clvmd-clone [clvmd]
>>>   Started: [ node01 node03 ]
>>>   fence-node1(stonith:fence_ipmilan):Started node03
>>>   fence-node3(stonith:fence_ipmilan):Started node01
>>>   Resource Group: test_grupa
>>>   test_ip(ocf::heartbeat:IPaddr):Started node01
>>>   lv_testdbcl(ocf::heartbeat:LVM):   Started node01
>>>   fs_testdbcl(ocf::heartbeat:Filesystem):Started node01
>>>   oracle11_baza  (ocf::heartbeat:oracle):Started node01
>>>   oracle11_lsnr  (ocf::heartbeat:oralsnr):   Started node01
>>>   fence-scsi-node1   (stonith:fence_scsi):   Started node03
>>>   fence-scsi-node3   (stonith:fence_scsi):   Started node01
>>>
>>> PCSD Status:
>>>    node01: Online
>>>    node03: Online
>>>
>>> Daemon Status:
>>>    corosync: active/enabled
>>>    pacemaker: active/enabled
>>>    pcsd: active/enabled
>>>
>>> # pcs stonith show
>>>   fence-node1(stonith:fence_ipmilan):Started node03
>>>   fence-node3(stonith:fence_ipmilan):Started node01
>>>   fence-scsi-node1   (stonith:fence_scsi):   Started node03
>>>   fence-scsi-node3   (stonith:fence_scsi):   Started node01
>>>   Node: node01
>>>    Level 1 - fence-scsi-node3
>>>    Level 2 - fence-node3
>>>   Node: node03
>>>    Level 1 - fence-scsi-node1
>>>    Level 2 - fence-node1
>>>
>>> # pcs stonith show fence-scsi-node1 --all
>>>   Resource: fence-scsi-node1 (class=stonith type=fence_scsi)
>>>    Attributes: pcmk_host_list=node01 pcmk_monitor_action=metadata
>>> pcmk_reboot_action=off
>>>    Meta Attrs: provides=unfencing
>>>    Operations: monitor interval=60s
>>> (fence-scsi-node1-monitor-interval-60s)
>>>
>>> # pcs stonith show fence-scsi-node3 --all
>>>   Resource: fence-scsi-node3 (class=stonith type=fence_scsi)
>>>    Attributes: pcmk_host_list=node03 pcmk_monitor_action=metadata
>>> pcmk_reboot_action=off
>>>    Meta Attrs: provides=unfencing
>>>    Operations: monitor interval=60s
>>> (fence-scsi-node3-monitor-interval-60s)
>>>
>>> node01 # pcs stonith fence node03
>>> Error: unable to fence 'node03'
>>> Command failed: No such device
>>>
>>> node01 # tail /var/log/messages
>>> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: Client
>>> stonith_admin.29191.2b7fe910 wants to fence (reboot) 'node03' with
>>> device '(any)'
>>> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: Initiating remote
>>> operation reboot for node03: d1df9201-5bb1-447f-9b40-d3d7235c3d0a (0)
>>> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: fence-scsi-node3 can
>>> fence (reboot) node03: static-list
>>> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: fence-node3 can fence
>>> (reboot) node03: static-list
>>> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: All fencing options
>>> to fence node03 for stonith_admin.29191@node01.d1df9201 failed
>> The above line is the key. Both of the devices registered for node03
>> returned failure. Pacemaker then looked for any other device capable of
>> fencing node03 and there is none, so that's why it reported "No such
>> device" (an admittedly obscure message).
>>
>> It looks like the fence agents require more configuration options than
>> you have set. If you run "/path/to/fence/agent -o metadata", you can see
>> the available options. It's a good idea to first get the agent running
>> successfully manually on the command line ("status" command is usually
>> sufficient), then put those same options in the cluster configuration.
>>
> Made some progress, found a new issue.
> I got fence_scsi to work: it unfences at start and fences when I
> tell it to.
> 
> The problem is when I, for instance, fence node01. It stops pacemaker
> but leaves corosync, so node01 is in "pending" state and node03 won't
> stop services until node01 is restarted. The keys seem to be handled
> correctly.

Technically, fence_scsi won't stop pacemaker or corosync, it will just
cut off the node's disk access and let the cluster know it's safe to
recover resources.
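
One way to check that from the storage side is to look at the keys
registered on the shared LUN before and after a fence, using the same
sg_persist query that appears in the original report. As far as I
understand it, a fenced node's key should no longer be listed afterwards.
The sketch below only counts the hex key lines in that output; the helper
name count_pr_keys and the placeholder device path are assumptions, and
the sample input is copied from the output posted in this thread.

```shell
#!/bin/sh
# Count registered reservation keys in `sg_persist --in -k` output (stdin).
# count_pr_keys is a hypothetical helper, not part of sg3_utils.
count_pr_keys() {
    # A key line holds nothing but a hex value, e.g. "0x7b6b0001".
    grep -c -E '^[[:space:]]*0x[0-9a-fA-F]+[[:space:]]*$'
}

# Sample input copied from the sg_persist output in the original report.
count_pr_keys <<'EOF'
  HITACHI   OPEN-V7303
  Peripheral device type: disk
  PR generation=0x6, 4 registered reservation keys follow:
0x7b6b
0x7b6b0001
0x7b6b0001
0x7b6b
EOF

# On a live node (the device path is a placeholder):
#   sg_persist --in -k -d /dev/disk/by-id/<shared-lun> | count_pr_keys
```

Comparing the count (and the keys themselves) before and after
"pcs stonith fence" shows whether the registration was really removed.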

I haven't used fence_scsi myself, so I'm not sure of the exact details,
but your configuration needs some changes. The pcmk_host_list option
should list 

Re: [ClusterLabs] fence_scsi no such device

2016-03-21 Thread marvin



On 03/15/2016 03:39 PM, Ken Gaillot wrote:
> On 03/15/2016 09:10 AM, marvin wrote:
>> Hi,
>>
>> I'm trying to get fence_scsi working, but i get "no such device" error.
>> It's a two node cluster with nodes called "node01" and "node03". The OS
>> is RHEL 7.2.
>>
>> here is some relevant info:
>>
>> # pcs status
>> Cluster name: testrhel7cluster
>> Last updated: Tue Mar 15 15:05:40 2016  Last change: Tue Mar 15
>> 14:33:39 2016 by root via cibadmin on node01
>> Stack: corosync
>> Current DC: node03 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
>> 2 nodes and 23 resources configured
>>
>> Online: [ node01 node03 ]
>>
>> Full list of resources:
>>
>>   Clone Set: dlm-clone [dlm]
>>   Started: [ node01 node03 ]
>>   Clone Set: clvmd-clone [clvmd]
>>   Started: [ node01 node03 ]
>>   fence-node1(stonith:fence_ipmilan):Started node03
>>   fence-node3(stonith:fence_ipmilan):Started node01
>>   Resource Group: test_grupa
>>   test_ip(ocf::heartbeat:IPaddr):Started node01
>>   lv_testdbcl(ocf::heartbeat:LVM):   Started node01
>>   fs_testdbcl(ocf::heartbeat:Filesystem):Started node01
>>   oracle11_baza  (ocf::heartbeat:oracle):Started node01
>>   oracle11_lsnr  (ocf::heartbeat:oralsnr):   Started node01
>>   fence-scsi-node1   (stonith:fence_scsi):   Started node03
>>   fence-scsi-node3   (stonith:fence_scsi):   Started node01
>>
>> PCSD Status:
>>    node01: Online
>>    node03: Online
>>
>> Daemon Status:
>>    corosync: active/enabled
>>    pacemaker: active/enabled
>>    pcsd: active/enabled
>>
>> # pcs stonith show
>>   fence-node1(stonith:fence_ipmilan):Started node03
>>   fence-node3(stonith:fence_ipmilan):Started node01
>>   fence-scsi-node1   (stonith:fence_scsi):   Started node03
>>   fence-scsi-node3   (stonith:fence_scsi):   Started node01
>>   Node: node01
>>    Level 1 - fence-scsi-node3
>>    Level 2 - fence-node3
>>   Node: node03
>>    Level 1 - fence-scsi-node1
>>    Level 2 - fence-node1
>>
>> # pcs stonith show fence-scsi-node1 --all
>>   Resource: fence-scsi-node1 (class=stonith type=fence_scsi)
>>    Attributes: pcmk_host_list=node01 pcmk_monitor_action=metadata
>> pcmk_reboot_action=off
>>    Meta Attrs: provides=unfencing
>>    Operations: monitor interval=60s (fence-scsi-node1-monitor-interval-60s)
>>
>> # pcs stonith show fence-scsi-node3 --all
>>   Resource: fence-scsi-node3 (class=stonith type=fence_scsi)
>>    Attributes: pcmk_host_list=node03 pcmk_monitor_action=metadata
>> pcmk_reboot_action=off
>>    Meta Attrs: provides=unfencing
>>    Operations: monitor interval=60s (fence-scsi-node3-monitor-interval-60s)
>>
>> node01 # pcs stonith fence node03
>> Error: unable to fence 'node03'
>> Command failed: No such device
>>
>> node01 # tail /var/log/messages
>> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: Client
>> stonith_admin.29191.2b7fe910 wants to fence (reboot) 'node03' with
>> device '(any)'
>> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: Initiating remote
>> operation reboot for node03: d1df9201-5bb1-447f-9b40-d3d7235c3d0a (0)
>> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: fence-scsi-node3 can
>> fence (reboot) node03: static-list
>> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: fence-node3 can fence
>> (reboot) node03: static-list
>> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: All fencing options
>> to fence node03 for stonith_admin.29191@node01.d1df9201 failed
>
> The above line is the key. Both of the devices registered for node03
> returned failure. Pacemaker then looked for any other device capable of
> fencing node03 and there is none, so that's why it reported "No such
> device" (an admittedly obscure message).
>
> It looks like the fence agents require more configuration options than
> you have set. If you run "/path/to/fence/agent -o metadata", you can see
> the available options. It's a good idea to first get the agent running
> successfully manually on the command line ("status" command is usually
> sufficient), then put those same options in the cluster configuration.
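
For anyone hunting for that key line on a busier node, something like the
following might help. It is only a sketch: it assumes the stonith-ng
message wording is exactly as logged in this thread (pacemaker 1.1.13),
and failed_fence_targets is a hypothetical helper name.

```shell
#!/bin/sh
# List fencing targets for which every configured device failed.
# Reads syslog text on stdin; failed_fence_targets is a hypothetical helper,
# and the pattern assumes the stonith-ng wording seen in this thread.
failed_fence_targets() {
    sed -n 's/.*All fencing options to fence \([^ ]*\) for .* failed.*/\1/p' | sort -u
}

# Sample entry from the log above, on one line as syslog writes it.
failed_fence_targets <<'EOF'
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: All fencing options to fence node03 for stonith_admin.29191@node01.d1df9201 failed
EOF

# On a live node: failed_fence_targets < /var/log/messages
```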


Made some progress, found a new issue.
I got fence_scsi to work: it unfences at start and fences when I
tell it to.


The problem is when I, for instance, fence node01. It stops pacemaker
but leaves corosync running, so node01 is in "pending" state and node03
won't stop services until node01 is restarted. The keys seem to be
handled correctly.


Before fence:
# pcs status
Cluster name: testrhel7cluster
Last updated: Mon Mar 21 14:26:53 2016  Last change: Mon Mar 21 14:26:27 2016 by root via crm_resource on node01
Stack: corosync
Current DC: node01 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 21 resources configured

Online: [ node01 node03 ]

Full list of resources:

 Clone Set: dlm-clone [dlm]
 Started: [ node01 node03 ]
 Clone Set: clvmd-clone [clvmd]
 Started: [ node01 node03 ]
 Resource Group: test_grupa
 test_ip(ocf::heartbeat:IPaddr):Started node01
 lv_testdbcl(ocf::heartbeat:LVM):   Started node01
 fs_testdbcl(ocf::heartbeat:Filesystem):Started node01
 oracle11_baza  

[ClusterLabs] fence_scsi no such device

2016-03-15 Thread marvin

Hi,

I'm trying to get fence_scsi working, but I get a "no such device" error.
It's a two-node cluster with nodes called "node01" and "node03". The OS
is RHEL 7.2.

Here is some relevant info:

# pcs status
Cluster name: testrhel7cluster
Last updated: Tue Mar 15 15:05:40 2016  Last change: Tue Mar 15 14:33:39 2016 by root via cibadmin on node01
Stack: corosync
Current DC: node03 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 23 resources configured

Online: [ node01 node03 ]

Full list of resources:

 Clone Set: dlm-clone [dlm]
     Started: [ node01 node03 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ node01 node03 ]
 fence-node1        (stonith:fence_ipmilan):    Started node03
 fence-node3        (stonith:fence_ipmilan):    Started node01
 Resource Group: test_grupa
     test_ip        (ocf::heartbeat:IPaddr):        Started node01
     lv_testdbcl    (ocf::heartbeat:LVM):           Started node01
     fs_testdbcl    (ocf::heartbeat:Filesystem):    Started node01
     oracle11_baza  (ocf::heartbeat:oracle):        Started node01
     oracle11_lsnr  (ocf::heartbeat:oralsnr):       Started node01
 fence-scsi-node1   (stonith:fence_scsi):       Started node03
 fence-scsi-node3   (stonith:fence_scsi):       Started node01

PCSD Status:
  node01: Online
  node03: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

# pcs stonith show
 fence-node1        (stonith:fence_ipmilan):    Started node03
 fence-node3        (stonith:fence_ipmilan):    Started node01
 fence-scsi-node1   (stonith:fence_scsi):       Started node03
 fence-scsi-node3   (stonith:fence_scsi):       Started node01
 Node: node01
  Level 1 - fence-scsi-node3
  Level 2 - fence-node3
 Node: node03
  Level 1 - fence-scsi-node1
  Level 2 - fence-node1

# pcs stonith show fence-scsi-node1 --all
 Resource: fence-scsi-node1 (class=stonith type=fence_scsi)
  Attributes: pcmk_host_list=node01 pcmk_monitor_action=metadata pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (fence-scsi-node1-monitor-interval-60s)

# pcs stonith show fence-scsi-node3 --all
 Resource: fence-scsi-node3 (class=stonith type=fence_scsi)
  Attributes: pcmk_host_list=node03 pcmk_monitor_action=metadata pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (fence-scsi-node3-monitor-interval-60s)

node01 # pcs stonith fence node03
Error: unable to fence 'node03'
Command failed: No such device

node01 # tail /var/log/messages
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: Client stonith_admin.29191.2b7fe910 wants to fence (reboot) 'node03' with device '(any)'
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: Initiating remote operation reboot for node03: d1df9201-5bb1-447f-9b40-d3d7235c3d0a (0)
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: fence-scsi-node3 can fence (reboot) node03: static-list
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: fence-node3 can fence (reboot) node03: static-list
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: All fencing options to fence node03 for stonith_admin.29191@node01.d1df9201 failed
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: Couldn't find anyone to fence (reboot) node03 with fence-node1
Mar 15 14:54:04 node01 stonith-ng[20024]:   error: Operation reboot of node03 by  for stonith_admin.29191@node01.d1df9201: No such device
Mar 15 14:54:04 node01 crmd[20028]:  notice: Peer node03 was not terminated (reboot) by  for node01: No such device (ref=d1df9201-5bb1-447f-9b40-d3d7235c3d0a) by client stonith_admin.29191


node03 # tail /var/log/messages
Mar 15 14:54:04 node03 stonith-ng[2601]:  notice: fence-scsi-node1 can not fence (reboot) node03: static-list
Mar 15 14:54:04 node03 stonith-ng[2601]:  notice: fence-node1 can not fence (reboot) node03: static-list
Mar 15 14:54:04 node03 stonith-ng[2601]:  notice: Operation reboot of node03 by  for stonith_admin.29191@node01.d1df9201: No such device
Mar 15 14:54:04 node03 crmd[2605]:  notice: Peer node03 was not terminated (reboot) by  for node01: No such device (ref=d1df9201-5bb1-447f-9b40-d3d7235c3d0a) by client stonith_admin.29191


node01 # stonith_admin -L
 fence-scsi-node3
 fence-node3
2 devices found

node03 # stonith_admin -L
 fence-scsi-node1
 fence-node1
2 devices found

node01 # sg_persist --in -r -d /dev/disk/by-id/scsi-360060e8013757c005020757c3f08
  HITACHI   OPEN-V7303
  Peripheral device type: disk
  PR generation=0x6, Reservation follows:
Key=0x7b6b0001
scope: LU_SCOPE,  type: Write Exclusive, registrants only
node01 # sg_persist --in -k -d /dev/disk/by-id/scsi-360060e8013757c005020757c3f08
  HITACHI   OPEN-V7303
  Peripheral device type: disk
  PR generation=0x6, 4 registered reservation keys follow:
0x7b6b
0x7b6b0001
0x7b6b0001
0x7b6b

node03 # sg_persist --in -r -d