[ClusterLabs] unable to start fence_scsi on a new add node

2020-04-16 Thread Stefan Sabolowitsch
Hi there,
I have expanded a two-node cluster with an additional node, "elastic-03".
However, fence_scsi does not start on the new node.

pcs-status:
[root@logger cluster]# pcs status
Cluster name: cluster_elastic
Stack: corosync
Current DC: elastic-02 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Thu Apr 16 17:38:16 2020
Last change: Thu Apr 16 17:23:43 2020 by root via cibadmin on elastic-03

3 nodes configured
10 resources configured

Online: [ elastic-01 elastic-02 elastic-03 ]

Full list of resources:

 scsi   (stonith:fence_scsi):   Stopped
 Clone Set: dlm-clone [dlm]
 Started: [ elastic-01 elastic-02 ]
 Stopped: [ elastic-03 ]
 Clone Set: clvmd-clone [clvmd]
 Started: [ elastic-01 elastic-02 ]
 Stopped: [ elastic-03 ]
 Clone Set: fs_gfs2-clone [fs_gfs2]
 Started: [ elastic-01 elastic-02 ]
 Stopped: [ elastic-03 ]

Failed Fencing Actions:
* unfencing of elastic-03 failed: delegate=, client=crmd.5149, origin=elastic-02, last-failed='Thu Apr 16 17:23:43 2020'

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


corosync.log 
Apr 16 17:27:10 [4572] logger stonith-ng:   notice: can_fence_host_with_device:  scsi can fence (off) elastic-01: static-list
Apr 16 17:27:12 [4572] logger stonith-ng:   notice: can_fence_host_with_device:  scsi can fence (off) elastic-02: static-list
Apr 16 17:27:13 [4572] logger stonith-ng:   notice: can_fence_host_with_device:  scsi can not fence (off) elastic-03: static-list
Apr 16 17:38:43 [4572] logger stonith-ng:   notice: can_fence_host_with_device:  scsi can not fence (on) elastic-03: static-list
Apr 16 17:38:43 [4572] logger stonith-ng:   notice: remote_op_done:  Operation on of elastic-03 by  for crmd.5149@elastic-02.4b624305: No such device
Apr 16 17:38:43 [4576] logger.feltengroup.local   crmd:    error: tengine_stonith_notify:  Unfencing of elastic-03 by  failed: No such device (-19)

[root@logger cluster]# stonith_admin -L
 scsi
1 devices found

[root@logger cluster]# stonith_admin -l elastic-03
No devices found

Thanks for any help here.
Stefan
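
A hedged guess based on the "static-list" messages in the log above: if the scsi
stonith device was created with an explicit pcmk_host_list naming only the first
two nodes, elastic-03 cannot be fenced or unfenced until it is added. Something
along these lines (device and node names taken from the output above; verify
against the actual configuration first):

   pcs stonith show scsi --full
   pcs stonith update scsi pcmk_host_list="elastic-01 elastic-02 elastic-03"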



Re: [ClusterLabs] When the active node enters the standby state, what should be done to make the VIP not automatically jump

2020-04-16 Thread Ken Gaillot
Hi,

My suggestion would be to unmanage the IP resource before putting the
node in standby. When a resource is unmanaged, the cluster will not
start or stop it.
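
For example, a sketch using the resource name from your configuration
(pcs 0.9 / RHEL7 syntax; newer pcs uses 'pcs node standby' instead of
'pcs cluster standby'):

   pcs resource unmanage virtual_ip
   pcs cluster standby <node-going-to-standby>
   # virtual_ip stays where it is; monitoring continues, but the cluster
   # will not stop or move it while it is unmanaged
   pcs resource manage virtual_ip   # re-manage to restore normal behaviour

Keep in mind that while the resource is unmanaged you lose automatic recovery
for it, so this is really only suitable as a short-lived maintenance step.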

On Thu, 2020-04-16 at 17:57 +0800, 邴洪涛 wrote:
> >hi:
> >  We now get a strange requirement. When the active node enters
> standby
> >mode, virtual_ip will not automatically jump to the normal node, but
> >requires manual operation to achieve the jump of virtual_ip
> > The mode we use is Active / Passive mode
> > The Resource Agent we use is ocf: heartbeat: IPaddr2
> >  Hope you can solve my confusion
> 
> Hello,
> 
> Can you provide the version of the stack, your config and the command
> you run to put the node in standby?
> 
> Best Regards,
> Strahil Nikolov
> -
> Sorry, I don't know how to reply correctly, so I pasted the previous
> chat content on it
> The following are the commands we use
> pcs property set stonith-enabled=false
> pcs property set no-quorum-policy=ignore
> pcs resource create virtual_ip ocf:heartbeat:IPaddr2
> ip=${VIP} cidr_netmask=32  op monitor interval="10s"
> 
> pcs resource create docker systemd:docker op monitor
> interval="10s" timeout="15s" op start interval="0" timeout="1200s" op
> stop interval="0" timeout="1200s"
> pcs constraint colocation add docker virtual_ip INFINITY
> pcs constraint order virtual_ip then docker
> pcs constraint location docker prefers ${MASTER_NAME}=50
> 
> pcs resource create lsyncd systemd:lsyncd op monitor
> interval="10s" timeout="15s" op start interval="0" timeout="120s" op
> stop interval="0" timeout="60s"
> pcs constraint colocation add lsyncd virtual_ip INFINITY
> 
> The version we use is
>  Pacemaker 1.1.20-5.el7_7.2
>  Written by Andrew Beekhof
-- 
Ken Gaillot 



Re: [ClusterLabs] Off-line build-time cluster configuration

2020-04-16 Thread Tomas Jelinek

Hi Craig,

Currently, there is no support in RHEL8 for an equivalent of the --local 
option of the 'pcs cluster setup' command from RHEL7. We were focused on 
higher priority tasks related to supporting the new major version of 
corosync and knet. As part of this, the 'pcs cluster setup' command has 
been completely overhauled, providing better functionality overall, such 
as improved validations and synchronizing files other than just 
corosync.conf. Sadly, we didn't have enough capacity to support the 
--local option in step 1.


We are working on adding support for the --local option (or its 
equivalent) in the near future, but we don't have any code to share yet.



Obviously, the --local version of the setup will skip some tasks done by 
the regular cluster setup command; you are expected to do them by other 
means. I'll list them all here for the sake of completeness, even though 
not all of them apply in your situation (a rough sketch of the file 
distribution steps follows after the list):

* check that nodes are not running or configured to run a cluster
* check that nodes do have cluster daemons installed in matching versions
* run 'pcs cluster destroy' on each node to get rid of all cluster 
config files and be sure there are no leftovers from previously 
configured clusters
* delete /var/lib/pcsd/pcs_settings.conf file (this is not done by the 
'pcs cluster destroy' command)

* distribute pcs auth tokens for the nodes
* distribute corosync and pacemaker authkeys, /etc/corosync/authkey and 
/etc/pacemaker/authkey respectively
* synchronize pcsd certificates (only needed if you intend to use pcs 
web UI in an HA mode)

* distribute corosync.conf
Let me know if you have any questions regarding these.
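
For illustration only, a rough sketch of the authkey and corosync.conf
distribution steps above (the hostnames, ssh/scp access and doing this at
image-build time are assumptions; the pcs tokens and pcsd certificates are
not covered here):

   # generate the shared keys once, then copy them (or bake them into each
   # node's image); both files should end up mode 0400, owned by root
   corosync-keygen -k ./corosync_authkey
   dd if=/dev/urandom of=./pacemaker_authkey bs=4096 count=1

   for node in node1 node2 node3; do
       scp ./corosync_authkey  root@$node:/etc/corosync/authkey
       scp ./pacemaker_authkey root@$node:/etc/pacemaker/authkey
       scp ./corosync.conf     root@$node:/etc/corosync/corosync.conf
       ssh root@$node 'chmod 400 /etc/corosync/authkey /etc/pacemaker/authkey'
   done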


Running the current 'pcs cluster setup' command on all nodes is not 
really an option. The command requires the nodes to be online as it 
stores corosync.conf and other files to them over the network.


You may, however, run it once on a live cluster to get an idea of what 
the corosync.conf looks like and turn it into a template. I don't really 
expect its format or schema to be changed significantly during the RHEL8 
life cycle. I understand your concerns regarding this approach, but it 
would give you at least some option to proceed until the --local is 
supported in pcs.
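
For reference, the corosync.conf generated by 'pcs cluster setup' on RHEL8 is 
fairly small; a rough skeleton of the kind of template you could derive from it 
(cluster name, node names/addresses, node IDs and crypto settings are 
placeholders to be checked against real output):

    totem {
        version: 2
        cluster_name: mycluster
        transport: knet
        crypto_cipher: aes256
        crypto_hash: sha256
    }

    nodelist {
        node {
            ring0_addr: node1
            name: node1
            nodeid: 1
        }

        node {
            ring0_addr: node2
            name: node2
            nodeid: 2
        }

        node {
            ring0_addr: node3
            name: node3
            nodeid: 3
        }
    }

    quorum {
        provider: corosync_votequorum
    }

    logging {
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
        to_syslog: yes
    }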



Regards,
Tomas


Dne 14. 04. 20 v 20:46 Craig Johnston napsal(a):

Hello,

Sorry if this has already been covered, but a perusal of recent mail 
archives didn't turn up anything for me.


We are looking for help in configuring a pacemaker/corosync cluster at 
the time the Linux root file system is built, or perhaps as part of a 
"pre-pivot" process in the initramfs of a live-CD environment.


We are using the RHEL versions of the cluster products.  Current 
production is RHEL7 based, and we are trying to move to RHEL8.


The issues we have stem from the configuration tools' expectation that 
they are operating on a live system, with all cluster nodes available on 
the network.  This is obviously not the case during a "kickstart" 
install and configuration process.  It's also not true in an embedded 
environment where all nodes are powered on simultaneously and expected to 
become operational without any human intervention.


We create the cluster configuration from a "system model", that 
describes the available nodes, cluster managed services, fencing agents, 
etc..  This model is different for each deployment, and is used as input 
to create a customized Linux distribution that is deployed to a set of 
physical hardware, virtual machines, or containers.  Each node, and its 
root file system, is required to be configured and ready to go, the very 
first time it is ever booted.  The on-media Linux file system is also 
immutable, and thus each boot is exactly like the previous one.


Under RHEL7, we were able to use the "pcs" command to create the 
corosync.conf/cib.xml files for each node.

e.g.
   pcs cluster setup --local --enable --force --name mycluster node1 node2 node3

   pcs -f ${CIB} property set startup-fencing=false
   pcs -f ${CIB} resource create tftp ocf:heartbeat:Xinetd service=tftp --group grp_tftp

   etc...

Plus a little "awk" and "sed" on the corosync.conf file, and we were able 
to create a configuration that worked out of the box. It's not pretty, but 
it works, even though we feel like we're swimming upstream.


Under RHEL8 however, the "pcs cluster" command no longer has a "--local" 
option, and we can't find any tool to replace its functionality.  We can 
use "cibadmin --empty" to create a starting cib.xml file, but there is no 
way to add nodes to it (or to create a corosync.conf file with nodes).


Granted, we could write our own tools to create template 
corosync.conf/cib.xml files, and "pcs -f" still works.  However, that 
leaves us in the unenviable position where the cluster configuration 
schema could change, and our tools would not be the wiser.  We'd much 
prefer to use a standard and maintained interface for 

Re: [ClusterLabs] When the active node enters the standby state, what should be done to make the VIP not automatically jump

2020-04-16 Thread Strahil Nikolov
On April 16, 2020 12:57:05 PM GMT+03:00, "邴洪涛" <695097494p...@gmail.com> wrote:
> > hi:
> >   We now get a strange requirement. When the active node enters
> > standby mode, virtual_ip will not automatically jump to the normal
> > node, but requires manual operation to achieve the jump of virtual_ip
> >  The mode we use is Active / Passive mode
> >  The Resource Agent we use is ocf: heartbeat: IPaddr2
> >   Hope you can solve my confusion
>
>Hello,
>
>Can you provide the version of the stack, your config and the command
>you run to put the node in standby?
>
>Best Regards,
>Strahil Nikolov
>
>-
>
>Sorry, I don't know how to reply correctly, so I pasted the previous
>chat content on it
>
>The following are the commands we use
>
>pcs property set stonith-enabled=false
>
>pcs property set no-quorum-policy=ignore
>pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=${VIP}
>cidr_netmask=32  op monitor interval="10s"
>
>pcs resource create docker systemd:docker op monitor interval="10s"
>timeout="15s" op start interval="0" timeout="1200s" op stop
>interval="0"
>timeout="1200s"
>pcs constraint colocation add docker virtual_ip INFINITY
>pcs constraint order virtual_ip then docker
>pcs constraint location docker prefers ${MASTER_NAME}=50
>
>pcs resource create lsyncd systemd:lsyncd op monitor interval="10s"
>timeout="15s" op start interval="0" timeout="120s" op stop interval="0"
>timeout="60s"
>pcs constraint colocation add lsyncd virtual_ip INFINITY
>
>The version we use is
> Pacemaker 1.1.20-5.el7_7.2
> Written by Andrew Beekhof

If you need to put a node in standby mode and still keep the IP on that node, 
I don't think you can do that at all.

Best Regards,
Strahil Nikolov


[ClusterLabs] When the active node enters the standby state, what should be done to make the VIP not automatically jump

2020-04-16 Thread 邴洪涛
> hi:
>   We now get a strange requirement. When the active node enters standby
> mode, virtual_ip will not automatically jump to the normal node, but
> requires manual operation to achieve the jump of virtual_ip
>  The mode we use is Active / Passive mode
>  The Resource Agent we use is ocf: heartbeat: IPaddr2
>   Hope you can solve my confusion
Hello,

Can you provide the version of the stack, your config and the command
you run to put the node in standby?

Best Regards,
Strahil Nikolov

-

Sorry, I don't know how to reply correctly, so I pasted the previous
chat content on it

The following are the commands we use

pcs property set stonith-enabled=false

pcs property set no-quorum-policy=ignore
pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=${VIP}
cidr_netmask=32  op monitor interval="10s"

pcs resource create docker systemd:docker op monitor interval="10s"
timeout="15s" op start interval="0" timeout="1200s" op stop interval="0"
timeout="1200s"
pcs constraint colocation add docker virtual_ip INFINITY
pcs constraint order virtual_ip then docker
pcs constraint location docker prefers ${MASTER_NAME}=50

pcs resource create lsyncd systemd:lsyncd op monitor interval="10s"
timeout="15s" op start interval="0" timeout="120s" op stop interval="0"
timeout="60s"
pcs constraint colocation add lsyncd virtual_ip INFINITY

The version we use is
 Pacemaker 1.1.20-5.el7_7.2
 Written by Andrew Beekhof


[ClusterLabs] When the active node enters the standby state, what should be done to make the VIP not automatically jump

2020-04-16 Thread 邴洪涛
hi:
We now have a strange requirement. When the active node enters standby
mode, virtual_ip should not automatically jump to the other (normal) node;
instead, moving virtual_ip should require a manual operation.
 The mode we use is Active / Passive mode
 The Resource Agent we use is ocf:heartbeat:IPaddr2
  Hope you can solve my confusion