Re: [ClusterLabs] Trying to understand the default action of a fence agent

2019-01-10 Thread Bryan K. Walton
On Tue, Jan 08, 2019 at 01:29:51PM -0600, Bryan K. Walton wrote:
> On Tue, Jan 08, 2019 at 10:55:09AM -0600, Ken Gaillot wrote:
> > 
> > FYI pcmk_off_action="off" is the default
> > 
> > If you want the cluster to request an "off" command instead of a
> > "reboot" when fencing a node, set the stonith-action cluster property
> > to "off".
> 
> Awesome! Thank you, Ken.  I don't know how I've missed this, up to now.
> Setting this property is exactly what I needed.

I swear I had this working the other day.  I'm still struggling with
this, apparently.  I've set the default stonith-action to off:

[root@storage1 ~]# pcs config | grep -i stonith-action
 stonith-action: off

[root@storage1 ~]# pcs config | grep -i stonith-enabled
  stonith-enabled: true
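
For completeness, that property would normally have been set with
something like this (exact syntax depending on the pcs version):

pcs property set stonith-action=off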

But when I run "pcs stonith fence storage2" (from my storage1 node),
the fabric ports get successfully disabled and then re-enabled.

Here are the logs that show stonith-ng issuing "off" commands
(successfully), and then following up with "on" commands:

Jan 10 13:31:55 storage1 stonith-ng[43051]:  notice: Client
stonith_admin.44835.f958d69c wants to fence (reboot) 'storage2' with
device '(any)'
Jan 10 13:31:55 storage1 stonith-ng[43051]:  notice: Requesting peer
fencing (off) of storage2
Jan 10 13:31:55 storage1 stonith-ng[43051]:  notice:
fenceStorage2-millipede can fence (reboot) storage2: static-list
Jan 10 13:31:55 storage1 stonith-ng[43051]:  notice:
fenceStorage2-centipede can fence (reboot) storage2: static-list
Jan 10 13:31:56 storage1 stonith-ng[43051]:  notice: Operation 'off'
[44836] (call 2 from stonith_admin.44835) for host 'storage2' with
device 'fenceStorage2-centipede' returned: 0 (OK)
Jan 10 13:31:56 storage1 stonith-ng[43051]:  notice: Call to
fenceStorage2-centipede for 'storage2 off' on behalf of
stonith_admin.44835@storage1: OK (0)
Jan 10 13:31:57 storage1 stonith-ng[43051]:  notice: Operation 'off'
[44930] (call 2 from stonith_admin.44835) for host 'storage2' with
device 'fenceStorage2-millipede' returned: 0 (OK)
Jan 10 13:31:57 storage1 stonith-ng[43051]:  notice: Call to
fenceStorage2-millipede for 'storage2 off' on behalf of
stonith_admin.44835@storage1: OK (0)
Jan 10 13:31:58 storage1 stonith-ng[43051]:  notice: Operation 'on'
[44936] (call 2 from stonith_admin.44835) for host 'storage2' with
device 'fenceStorage2-centipede' returned: 0 (OK)
Jan 10 13:31:58 storage1 stonith-ng[43051]:  notice: Call to
fenceStorage2-centipede for 'storage2 on' on behalf of
stonith_admin.44835@storage1: OK (0)
Jan 10 13:32:00 storage1 stonith-ng[43051]:  notice: Operation 'on'
[44942] (call 2 from stonith_admin.44835) for host 'storage2' with
device 'fenceStorage2-millipede' returned: 0 (OK)
Jan 10 13:32:00 storage1 stonith-ng[43051]:  notice: Call to
fenceStorage2-millipede for 'storage2 on' on behalf of
stonith_admin.44835@storage1: OK (0)
Jan 10 13:32:00 storage1 stonith-ng[43051]:  notice: Operation reboot of
storage2 by storage1 for stonith_admin.44835@storage1.0b0f51e0: OK
Jan 10 13:32:00 storage1 crmd[43055]:  notice: Peer storage2 was
terminated (reboot) by storage1 on behalf of stonith_admin.44835: OK

Any ideas what I'm doing wrong?  I'd be happy to provide more logs, if
desired.

Thanks!
Bryan


Re: [ClusterLabs] Trying to understand the default action of a fence agent

2019-01-10 Thread Bryan K. Walton
On Thu, Jan 10, 2019 at 01:45:30PM -0600, Ken Gaillot wrote:
> stonith-action applies to fence actions initiated by the cluster (e.g.
> when a node disappears). When you request a fence action yourself, it
> does whatever you requested -- in this case, pcs is doing a reboot by
> default. You have to explicitly add --off to get it to do "off" instead.
> 
> It occurs to me that it might be nice for pcs to follow stonith-action
> by default and allow an explicit --off or --reboot.


Thank you for the further clarification, Ken.
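
If I understand correctly, then, the explicit form for a manual test
would be something like this (a sketch -- the cluster-initiated case
honors stonith-action=off, but a manual request needs the flag spelled
out):

pcs stonith fence storage2 --off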

-Bryan

-- 
Bryan K. Walton   319-337-3877 
Linux Systems Administrator Leepfrog Technologies, Inc 


[ClusterLabs] Trying to Understand crm-fence-peer.sh

2019-01-16 Thread Bryan K. Walton
I posted this question to the drbd-user list but didn't receive a
response.

I'm using DRBD 8.4 with Pacemaker in a two node cluster, with a
single primary and fabric fencing.

Almost all of my STONITH testing has worked as I would expect.  I get
the expected results when I use iptables to sever the replication link,
when I force a kernel panic, and when I trigger an unclean
shutdown/reboot with the sysrq trigger.  The fact that my iptables test
results in a fenced node would seem to suggest that crm-fence-peer.sh
is working as expected.
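
For reference, the DRBD side of this is wired up roughly as below in
drbd.conf (a sketch of the usual 8.4 setup rather than a copy of my
exact config; handler paths assume the stock drbd-utils locations, and
the fencing policy could equally be resource-only):

resource r0 {
  disk {
    fencing resource-and-stonith;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}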

However, if I issue a simple reboot command on my current primary node
(storage1), what I see is that Pacemaker successfully fails over to the
secondary node.  But the logs on storage2 show the following:

Jan 11 08:49:53 storage2 kernel: drbd r0: helper command: /sbin/drbdadm
fence-peer r0
Jan 11 08:49:53 storage2 crm-fence-peer.sh[15594]:
DRBD_CONF=/etc/drbd.conf DRBD_DONT_WARN_ON_VERSION_MISMATCH=1
DRBD_MINOR=1 DRBD_PEER=storage1 DRBD_PEERS=storage1
DRBD_PEER_ADDRESS=192.168.0.2 DRBD_PEER_AF=ipv4 DRBD_RESOURCE=r0
UP_TO_DATE_NODES='' /usr/lib/drbd/crm-fence-peer.sh
Jan 11 08:49:53 storage2 crm-fence-peer.sh[15594]: INFO peer is
reachable, my disk is UpToDate: placed constraint
'drbd-fence-by-handler-r0-StorageClusterClone'
Jan 11 08:49:53 storage2 kernel: drbd r0: helper command: /sbin/drbdadm
fence-peer r0 exit code 4 (0x400)
Jan 11 08:49:53 storage2 kernel: drbd r0: fence-peer helper returned 4
(peer was fenced)

The exit code 4 would seem to suggest that storage1 should be fenced.
But the switch ports connected to storage1 are still enabled.

Am I misreading the logs here?  This is a clean reboot, so maybe
fencing isn't supposed to happen in this situation?  But the logs seem
to suggest otherwise.

Thanks!
Bryan Walton

-- 
Bryan K. Walton   319-337-3877 
Linux Systems Administrator Leepfrog Technologies, Inc 



Re: [ClusterLabs] Antw: Trying to Understand crm-fence-peer.sh

2019-01-16 Thread Bryan K. Walton
On Wed, Jan 16, 2019 at 04:07:36PM +0100, Ulrich Windl wrote:
> Hi!
> 
> I guess we need more logs; especially some events from storage2 before fencing
> is triggered.
> 
> Regards,
> Ulrich

Here are the rest of the logs, starting from the time that I issued the
reboot command, to the end of the fencing attempt.

Thanks,
Bryan
Jan 11 08:49:52 storage2 crmd[13173]:  notice: Result of notify operation for 
StorageCluster on storage2: 0 (ok)
Jan 11 08:49:52 storage2 kernel: block drbd1: peer( Primary -> Secondary ) 
Jan 11 08:49:52 storage2 IPaddr2(iscsiMillipedeIP)[15245]: INFO: Adding inet 
address 10.40.2.101/32 with broadcast address 10.40.1.255 to device enp179s0f0
Jan 11 08:49:52 storage2 IPaddr2(iscsiCentipedeIP)[15246]: INFO: Adding inet 
address 10.40.1.101/32 with broadcast address 10.40.2.255 to device enp179s0f1
Jan 11 08:49:52 storage2 IPaddr2(iscsiMillipedeIP)[15245]: INFO: Bringing 
device enp179s0f0 up
Jan 11 08:49:52 storage2 IPaddr2(iscsiCentipedeIP)[15246]: INFO: Bringing 
device enp179s0f1 up
Jan 11 08:49:52 storage2 IPaddr2(iscsiMillipedeIP)[15245]: INFO: 
/usr/libexec/heartbeat/send_arp  -i 200 -r 5 -p 
/var/run/resource-agents/send_arp-10.40.2.101 enp179s0f0 10.40.2.101 auto 
not_used not_used
Jan 11 08:49:52 storage2 IPaddr2(iscsiCentipedeIP)[15246]: INFO: 
/usr/libexec/heartbeat/send_arp  -i 200 -r 5 -p 
/var/run/resource-agents/send_arp-10.40.1.101 enp179s0f1 10.40.1.101 auto 
not_used not_used
Jan 11 08:49:52 storage2 crmd[13173]:  notice: Result of start operation for 
iscsiMillipedeIP on storage2: 0 (ok)
Jan 11 08:49:52 storage2 crmd[13173]:  notice: Result of start operation for 
iscsiCentipedeIP on storage2: 0 (ok)
Jan 11 08:49:53 storage2 crmd[13173]:  notice: Result of notify operation for 
StorageCluster on storage2: 0 (ok)
Jan 11 08:49:53 storage2 crmd[13173]:  notice: Result of notify operation for 
StorageCluster on storage2: 0 (ok)
Jan 11 08:49:53 storage2 kernel: drbd r0: peer( Secondary -> Unknown ) conn( 
Connected -> TearDown ) pdsk( UpToDate -> DUnknown ) 
Jan 11 08:49:53 storage2 kernel: drbd r0: ack_receiver terminated
Jan 11 08:49:53 storage2 kernel: drbd r0: Terminating drbd_a_r0
Jan 11 08:49:53 storage2 kernel: drbd r0: Connection closed
Jan 11 08:49:53 storage2 kernel: drbd r0: conn( TearDown -> Unconnected ) 
Jan 11 08:49:53 storage2 kernel: drbd r0: receiver terminated
Jan 11 08:49:53 storage2 kernel: drbd r0: Restarting receiver thread
Jan 11 08:49:53 storage2 kernel: drbd r0: receiver (re)started
Jan 11 08:49:53 storage2 kernel: drbd r0: conn( Unconnected -> WFConnection ) 
Jan 11 08:49:53 storage2 crmd[13173]:  notice: Result of notify operation for 
StorageCluster on storage2: 0 (ok)
Jan 11 08:49:53 storage2 crmd[13173]:  notice: Result of notify operation for 
StorageCluster on storage2: 0 (ok)
Jan 11 08:49:53 storage2 kernel: drbd r0: helper command: /sbin/drbdadm 
fence-peer r0
Jan 11 08:49:53 storage2 crm-fence-peer.sh[15594]: DRBD_CONF=/etc/drbd.conf 
DRBD_DONT_WARN_ON_VERSION_MISMATCH=1 DRBD_MINOR=1 DRBD_PEER=storage1 
DRBD_PEERS=storage1 DRBD_PEER_ADDRESS=192.168.0.2 DRBD_PEER_AF=ipv4 
DRBD_RESOURCE=r0 UP_TO_DATE_NODES='' /usr/lib/drbd/crm-fence-peer.sh
Jan 11 08:49:53 storage2 crm-fence-peer.sh[15594]: INFO peer is reachable, my 
disk is UpToDate: placed constraint 
'drbd-fence-by-handler-r0-StorageClusterClone'
Jan 11 08:49:53 storage2 kernel: drbd r0: helper command: /sbin/drbdadm 
fence-peer r0 exit code 4 (0x400)
Jan 11 08:49:53 storage2 kernel: drbd r0: fence-peer helper returned 4 (peer 
was fenced)
Jan 11 08:49:53 storage2 kernel: drbd r0: pdsk( DUnknown -> Outdated ) 
Jan 11 08:49:53 storage2 kernel: block drbd1: role( Secondary -> Primary ) 
Jan 11 08:49:53 storage2 kernel: block drbd1: new current UUID 
8193109A1958EDC1:6E65E262290A59E6:0525636210B40C9E:0524636210B40C9F
Jan 11 08:49:53 storage2 crmd[13173]:  notice: Result of promote operation for 
StorageCluster on storage2: 0 (ok)
Jan 11 08:49:54 storage2 crmd[13173]:  notice: Result of notify operation for 
StorageCluster on storage2: 0 (ok)
Jan 11 08:49:54 storage2 crmd[13173]:  notice: Our peer on the DC (storage1) is 
dead
Jan 11 08:49:54 storage2 crmd[13173]:  notice: State transition S_NOT_DC -> 
S_ELECTION
Jan 11 08:49:54 storage2 crmd[13173]:  notice: State transition S_ELECTION -> 
S_INTEGRATION
Jan 11 08:49:54 storage2 attrd[13171]:  notice: Node storage1 state is now lost
Jan 11 08:49:54 storage2 attrd[13171]:  notice: Removing all storage1 
attributes for peer loss
Jan 11 08:49:54 storage2 attrd[13171]:  notice: Lost attribute writer storage1
Jan 11 08:49:54 storage2 attrd[13171]:  notice: Purged 1 peer with id=1 and/or 
uname=storage1 from the membership cache
Jan 11 08:49:54 storage2 stonith-ng[13169]:  notice: Node storage1 state is now 
lost
Jan 11 08:49:54 storage2 stonith-ng[13169]:  notice: Purged 1 peer with id=1 
and/or uname=storage1 from the membership cache
Jan 11 08:49:54 storage2 cib[13168]:  notice: Node storage1 state is now lost
Jan 11 08:49:54 storage2 ci

Re: [ClusterLabs] Trying to Understand crm-fence-peer.sh

2019-01-16 Thread Bryan K. Walton
On Wed, Jan 16, 2019 at 04:53:32PM +0100, Lars Ellenberg wrote:
> 
> To clarify: crm-fence-peer.sh is an *example implementation*
> (even though an elaborate one) of a DRBD fencing policy handler,
> which uses pacemaker location constraints on the Master role
> if DRBD is not sure about the up-to-date-ness of that instance,
> to ban nodes from taking over the Master role.
> 
> It does NOT trigger node level fencing.
> But it has to wait for, and rely on, pacemaker node level fencing.

Thanks, Lars.  Between these comments and the man page for drbd.conf, I
think I understand what is going on here.  Is it correct to say that,
in the case I provided, DRBD successfully issued a "drbdadm outdate
res" against the other node, and therefore it didn't need to STONITH
the peer?  (Looking at the crm-fence-peer.sh code, I see that exit code
4 means the node was fenced, while exit code 7 means it was STONITHed.
In my case, I got exit code 4, not 7.)
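
For my own notes: the constraint the handler places can be listed, and
cleared by hand if it ever gets left behind, with something like the
following (pcs syntax from memory, so treat it as approximate):

pcs constraint --full | grep drbd-fence-by-handler
pcs constraint remove drbd-fence-by-handler-r0-StorageClusterClone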

Also you mentioned that "Other implementations of drbd fencing policy
handlers may directly escalate to node level fencing."

Are these "other implementations" third-party handlers?  Or are they
available from within the DRBD software? Can you point to any of these?

Thanks!
Bryan 


Re: [ClusterLabs] Trying to Understand crm-fence-peer.sh

2019-01-17 Thread Bryan K. Walton
On Thu, Jan 17, 2019 at 07:57:25AM +0300, Andrei Borzenkov wrote:
> That is not what your logs show. DRBD successfully issued fence-peer
> handler on local node. crm-fence-peer.sh won't attempt to run anything
> on other node. DRBD just calls fence-peer handler, it is up to handler
> to issue explicit stonith request if appropriate.

Understood. Thanks!

> > Are these "other implementations" third-party handlers?  Or are they
> > available from within the DRBD software? Can you point to any of these?
> > 
> 
> stonith_admin-fence-peer.sh?

Awesome! Thanks!
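
For the archives: I assume wiring that handler in would look roughly
like this in drbd.conf (the path is a guess based on where drbd-utils
normally installs its helper scripts):

handlers {
    fence-peer "/usr/lib/drbd/stonith_admin-fence-peer.sh";
}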

-Bryan


Re: [ClusterLabs] Antw: Re: Antw: Trying to Understand crm-fence-peer.sh

2019-01-17 Thread Bryan K. Walton
On Thu, Jan 17, 2019 at 07:54:28AM +0100, Ulrich Windl wrote:
> 
> Jan 11 08:49:54 storage2 crmd[13173]: warning: No reason to expect node 1 to 
> be down
> Jan 11 08:49:54 storage2 crmd[13173]:  notice: Stonith/shutdown of storage1 
> not matched
> 
> So it seems your reboot did not shut down pacemaker properly, so the other 
> node will assume some failure and fence (to be sure the other side is dead).

This makes sense and would explain why, when I issued a reboot test
yesterday, I didn't see crm-fence-peer.sh getting called at all. I will
attribute blame to an improper shutdown that one time.

Thanks!
Bryan


[ClusterLabs] ERROR: This Target already exists in configFS

2019-02-12 Thread Bryan K. Walton
I'm building a storage cluster with DRBD 9, Pacemaker, LVM, and iSCSI.
I have two storage nodes in a primary/secondary configuration.

I'm trying to configure Pacemaker to create and manage an iSCSI target
and LUN.  I'm giving it the following commands:

pcs resource create targetRHEVM ocf:heartbeat:iSCSITarget \
iqn="iqn.2019-02.com.leepfrog:storage.rhevm" \
allowed_initiators="iqn.1994-05.com.redhat:3d066d1f423e \
iqn.1994-05.com.redhat:84f0f7458c58" \
--group ISCSIGroup

And:

pcs resource create lunRHEVM ocf:heartbeat:iSCSILogicalUnit \
target_iqn=iqn.2019-02.com.leepfrog:storage.rhevm lun=0 \
path=/dev/storage/rhevm \
--group ISCSIGroup


But when I do that, the resource fails to start:
Operation start for targetRHEVM (ocf:heartbeat:iSCSITarget) returned:
'unknown error' (1)
 >  stderr: Feb 12 14:06:19 INFO: Parameter auto_add_default_portal is now 'false'.
 >  stderr: Feb 12 14:06:20 INFO: Created target iqn.2019-02.com.leepfrog:storage.rhevm. Created TPG 1.
 >  stderr: Feb 12 14:06:20 ERROR: This Target already exists in configFS

The ONLY way I've been able to get this to work is to create the iSCSI
target manually on both storage nodes, using targetcli; then manually
create the ACLs on both nodes, using targetcli; and then, finally,
create ONLY the LUN resource in Pacemaker.
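
For completeness, the manual workaround looks roughly like this (IQNs
taken from the pcs commands above; targetcli syntax from memory, so
double-check it before copying):

targetcli /iscsi create iqn.2019-02.com.leepfrog:storage.rhevm
targetcli /iscsi/iqn.2019-02.com.leepfrog:storage.rhevm/tpg1/acls create iqn.1994-05.com.redhat:3d066d1f423e
targetcli /iscsi/iqn.2019-02.com.leepfrog:storage.rhevm/tpg1/acls create iqn.1994-05.com.redhat:84f0f7458c58
targetcli saveconfig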

Surely, I'm doing something wrong, but I can't figure out what that is.
What might I be doing wrong that is preventing Pacemaker from creating
the target resource?

Can you assist me, please?

Thanks!
Bryan


Re: [ClusterLabs] ERROR: This Target already exists in configFS

2019-02-13 Thread Bryan K. Walton
On Tue, Feb 12, 2019 at 02:14:17PM -0600, Bryan K. Walton wrote:
> I'm giving it the following commands:
> 
> pcs resource create targetRHEVM ocf:heartbeat:iSCSITarget \
>   iqn="iqn.2019-02.com.leepfrog:storage.rhevm" \
>   allowed_initiators="iqn.1994-05.com.redhat:3d066d1f423e \
>   iqn.1994-05.com.redhat:84f0f7458c58" \
>   --group ISCSIGroup
> 
> 
> But when I do that, the resource fails to start:
> Operation start for targetRHEVM (ocf:heartbeat:iSCSITarget) returned:
> 'unknown error' (1)
>  >  stderr: Feb 12 14:06:19 INFO: Parameter auto_add_default_portal is now 'false'.
>  >  stderr: Feb 12 14:06:20 INFO: Created target iqn.2019-02.com.leepfrog:storage.rhevm. Created TPG 1.
>  >  stderr: Feb 12 14:06:20 ERROR: This Target already exists in configFS

I wanted to reply here that I was able to fix this by downloading the
latest iSCSITarget resource agent from here:
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/iSCSITarget.in

Apparently, there is a bug in the agent as shipped with CentOS 7.
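
For anyone hitting the same error, the rough procedure is to replace
the installed agent with the newer one (paths are the usual CentOS 7
locations; note the file on GitHub is an autoconf .in template, so its
placeholders need filling, or take the file from a newer
resource-agents build -- "./iSCSITarget" below stands for that prepared
copy):

# back up the shipped agent and drop in the updated copy
cp /usr/lib/ocf/resource.d/heartbeat/iSCSITarget /usr/lib/ocf/resource.d/heartbeat/iSCSITarget.orig
cp ./iSCSITarget /usr/lib/ocf/resource.d/heartbeat/iSCSITarget
chmod 755 /usr/lib/ocf/resource.d/heartbeat/iSCSITarget
# clear the failed start so pacemaker retries with the fixed agent
pcs resource cleanup targetRHEVM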

Thanks,
Bryan


[ClusterLabs] iSCSI Target resource starts on both nodes -- despite my colocation constraint

2019-06-24 Thread Bryan K. Walton
7.bz2): Complete
Jun 20 11:48:42 storage1 crmd[240695]:  notice: State transition 
S_TRANSITION_ENGINE -> S_IDLE

Here are the logs from storage2, which was coming back online:

Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for 
ISCSIMillipedeIP on storage2: 7 (not running)
Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for 
ISCSICentipedeIP on storage2: 7 (not running)
Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for 
targetRHEVM on storage2: 0 (ok)
Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for 
targetVMStorage on storage2: 0 (ok)
Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for 
lunRHEVM on storage2: 7 (not running)
Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for 
lunVMStorage on storage2: 7 (not running)



-- 
Bryan K. Walton   319-337-3877 
Linux Systems Administrator Leepfrog Technologies, Inc 


Re: [ClusterLabs] iSCSI Target resource starts on both nodes -- despite my colocation constraint

2019-06-24 Thread Bryan K. Walton
On Mon, Jun 24, 2019 at 12:02:59PM -0500, Ken Gaillot wrote:
> > Jun 20 11:48:36 storage1 crmd[240695]:  notice: Transition 1
> > (Complete=12, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> > Source=/var/lib/pacemaker/pengine/pe-input-1054.bz2): Complete
> > Jun 20 11:48:36 storage1 pengine[240694]:   error: Resource
> > targetRHEVM is active on 2 nodes (attempting recovery)
> 
> This means that pacemaker found the target active on storage2 *without*
> having scheduled it there -- so either something outside pacemaker is
> setting up the target, or (less likely) the agent is returning the
> wrong status.

Thanks for the reply, Ken.  I can't figure out what might have caused
these iSCSI targets to already be active.  They aren't configured in
targetcli (outside of Pacemaker) and I have no scripts that do anything
like that dynamically.

I don't have a default-resource-stickiness value set.  Could that have
caused the iSCSI targets to be brought up on the node that was being
brought out of standby?

Thanks!
Bryan

-- 
Bryan K. Walton   319-337-3877 
Linux Systems Administrator Leepfrog Technologies, Inc 


Re: [ClusterLabs] iSCSI Target resource starts on both nodes -- despite my colocation constraint

2019-06-26 Thread Bryan K. Walton
On Tue, Jun 25, 2019 at 12:26:35PM -0500, Ken Gaillot wrote:
> 
> Have you tried checking whether the target is really active before
> bringing the node out of standby? That would narrow down whether the
> issue is in pacemaker or earlier.

Thanks, Ken!  Targetcli IS enabled.  I will disable it and schedule some
downtime, soon, to test.
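
In case it helps anyone else, my plan is roughly the following (service
name assumed to be the stock CentOS 7 target.service, which restores
the saved targetcli config at boot):

# see whether anything is already configured in the kernel target
targetcli ls /iscsi
# stop the boot-time restore of the saved config; pacemaker's
# iSCSITarget/iSCSILogicalUnit agents will create what they need
systemctl stop target.service
systemctl disable target.service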

Thanks!
Bryan


[ClusterLabs] Trying to understand the default action of a fence agent

2019-01-08 Thread Bryan K. Walton
Hi,

I'm building a two-node cluster with CentOS 7.6 and DRBD.  These nodes
are connected upstream to two Brocade switches.  I'm trying to enable
fencing by using Digimer's fence_dlink_snmp script (
https://github.com/digimer/fence_dlink_snmp ).

I've renamed the script to fence_brocade_snmp and have 
created my stonith resources using the following syntax:

pcs -f stonith_cfg stonith create fenceStorage1-centipede \
fence_brocade_snmp pcmk_host_list=storage1-drbd ipaddr=10.40.1.1 \
community=xxx port=193 pcmk_off_action="off" \
pcmk_monitor_timeout=120s 
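
For context, there is a matching resource for the second switch,
created the same way but pointing at the other fabric (the address
below is a placeholder, not my real one):

pcs -f stonith_cfg stonith create fenceStorage1-millipede \
fence_brocade_snmp pcmk_host_list=storage1-drbd ipaddr=10.40.2.1 \
community=xxx port=193 pcmk_off_action="off" \
pcmk_monitor_timeout=120s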

When I run "stonith-admin storage1-drbd", from my other node, 
the switch ports do not get disabled.  However, when I run
"stonith_admin -F storage1-drbd", the switch ports DO get disabled.

If I run "pcs stonith fence storage1-drbd", from the other node, the
response is: "Node: storage1-drbd fenced", but, again, the switch ports
are still enabled.  I'm forced to instead run: "pcs stonith fence
storage1-drbd --off" to get the ports to be disabled.

What I'm trying to figure out is under what scenario I should see the
ports actually get disabled.  My concern is that, for example, I can
stop the cluster on storage1-drbd, the logs will show that the fencing
was successful, and then my resources get moved.  But when I check the
switch ports that are connected to storage1-drbd, they are still
enabled.  So the node does not appear to be really fenced.

Do I need to create my stonith resource differently to actually disable
those ports?

Thank you for your time.  I am greatly appreciative.

Sincerely,
Bryan Walton


-- 
Bryan K. Walton   319-337-3877 
Linux Systems Administrator         Leepfrog Technologies, Inc 



Re: [ClusterLabs] Trying to understand the default action of a fence agent

2019-01-08 Thread Bryan K. Walton
On Tue, Jan 08, 2019 at 10:55:09AM -0600, Ken Gaillot wrote:
> 
> FYI pcmk_off_action="off" is the default
> 
> If you want the cluster to request an "off" command instead of a
> "reboot" when fencing a node, set the stonith-action cluster property
> to "off".

Awesome! Thank you, Ken.  I don't know how I've missed this, up to now.
Setting this property is exactly what I needed.

Much obliged,
Bryan

-- 
Bryan K. Walton   319-337-3877 
Linux Systems Administrator Leepfrog Technologies, Inc 