[ClusterLabs] unable to start fence_scsi on a newly added node
Hi there,

I have expanded a two-node cluster with an additional node, "elastic-03". However, fence_scsi does not start on the new node.

pcs status:

[root@logger cluster]# pcs status
Cluster name: cluster_elastic
Stack: corosync
Current DC: elastic-02 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Thu Apr 16 17:38:16 2020
Last change: Thu Apr 16 17:23:43 2020 by root via cibadmin on elastic-03

3 nodes configured
10 resources configured

Online: [ elastic-01 elastic-02 elastic-03 ]

Full list of resources:

 scsi   (stonith:fence_scsi):   Stopped
 Clone Set: dlm-clone [dlm]
     Started: [ elastic-01 elastic-02 ]
     Stopped: [ elastic-03 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ elastic-01 elastic-02 ]
     Stopped: [ elastic-03 ]
 Clone Set: fs_gfs2-clone [fs_gfs2]
     Started: [ elastic-01 elastic-02 ]
     Stopped: [ elastic-03 ]

Failed Fencing Actions:
* unfencing of elastic-03 failed: delegate=, client=crmd.5149, origin=elastic-02,
    last-failed='Thu Apr 16 17:23:43 2020'

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

corosync.log:

Apr 16 17:27:10 [4572] logger stonith-ng: notice: can_fence_host_with_device: scsi can fence (off) elastic-01: static-list
Apr 16 17:27:12 [4572] logger stonith-ng: notice: can_fence_host_with_device: scsi can fence (off) elastic-02: static-list
Apr 16 17:27:13 [4572] logger stonith-ng: notice: can_fence_host_with_device: scsi can not fence (off) elastic-03: static-list
Apr 16 17:38:43 [4572] logger stonith-ng: notice: can_fence_host_with_device: scsi can not fence (on) elastic-03: static-list
Apr 16 17:38:43 [4572] logger stonith-ng: notice: remote_op_done: Operation on of elastic-03 by  for crmd.5149@elastic-02.4b624305: No such device
Apr 16 17:38:43 [4576] logger.feltengroup.local crmd: error: tengine_stonith_notify: Unfencing of elastic-03 by  failed: No such device (-19)

[root@logger cluster]# stonith_admin -L
 scsi
1 devices found

[root@logger cluster]# stonith_admin -l elastic-03
No devices found

Thanks for any help here.

Stefan
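The corosync.log excerpt ("can not fence (on) elastic-03: static-list" and "No such device") suggests the scsi stonith resource carries a static host list that still names only the original two nodes. The poster's stonith configuration is not shown, so the attribute values below are illustrative assumptions; a hedged sketch of checking and extending the host list (RHEL7 pcs syntax):

    # Inspect the current stonith resource attributes first
    pcs stonith show scsi --full

    # If pcmk_host_list lacks the new node, extend it (illustrative values)
    pcs stonith update scsi pcmk_host_list="elastic-01 elastic-02 elastic-03"

    # Clear the failed unfencing record and let the cluster retry
    pcs resource cleanup scsi

    # Should now list the scsi device for the new node
    stonith_admin --list elastic-03

Once unfencing succeeds on elastic-03, the dlm/clvmd/fs_gfs2 clones that are currently Stopped there should be allowed to start as well.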
Re: [ClusterLabs] When the active node enters the standby state, what should be done to make the VIP not automatically jump
Hi,

My suggestion would be to unmanage the IP resource before putting the node in standby. When a resource is unmanaged, the cluster will not start or stop it.

On Thu, 2020-04-16 at 17:57 +0800, 邴洪涛 wrote:
> > hi:
> >     We now get a strange requirement. When the active node enters standby
> > mode, virtual_ip will not automatically jump to the normal node, but
> > requires manual operation to achieve the jump of virtual_ip
> >     The mode we use is Active / Passive mode
> >     The Resource Agent we use is ocf: heartbeat: IPaddr2
> >     Hope you can solve my confusion
>
> Hello,
>
> Can you provide the version of the stack, your config and the command
> you run to put the node in standby?
>
> Best Regards,
> Strahil Nikolov
>
> -
>
> Sorry, I don't know how to reply correctly, so I pasted the previous
> chat content on it
>
> The following are the commands we use
>
> pcs property set stonith-enabled=false
> pcs property set no-quorum-policy=ignore
> pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=${VIP} cidr_netmask=32 op monitor interval="10s"
>
> pcs resource create docker systemd:docker op monitor interval="10s" timeout="15s" op start interval="0" timeout="1200s" op stop interval="0" timeout="1200s"
> pcs constraint colocation add docker virtual_ip INFINITY
> pcs constraint order virtual_ip then docker
> pcs constraint location docker prefers ${MASTER_NAME}=50
>
> pcs resource create lsyncd systemd:lsyncd op monitor interval="10s" timeout="15s" op start interval="0" timeout="120s" op stop interval="0" timeout="60s"
> pcs constraint colocation add lsyncd virtual_ip INFINITY
>
> The version we use is
>  Pacemaker 1.1.20-5.el7_7.2
>  Written by Andrew Beekhof

--
Ken Gaillot
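A sketch of the suggestion above as concrete commands, using the resource name from the thread and RHEL7 pcs syntax; "node1" stands in for whichever node is being put in standby, so treat this as illustrative rather than a tested recipe:

    # Tell the cluster to stop acting on the VIP, then put the node in standby
    pcs resource unmanage virtual_ip
    pcs cluster standby node1

    # Later, when the cluster should take control of the VIP again
    pcs cluster unstandby node1
    pcs resource manage virtual_ip

Note that with the colocation constraints in the quoted config, docker and lsyncd would still be stopped by the standby and, being pinned to the VIP, would likely not start elsewhere; they may need the same treatment if they are also meant to stay put.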
Re: [ClusterLabs] Off-line build-time cluster configuration
Hi Craig,

Currently, there is no support in RHEL8 for an equivalent of the --local option of the 'pcs cluster setup' command from RHEL7. We were focusing on higher priority tasks related to supporting the new major version of corosync and knet. As a part of this, the 'pcs cluster setup' command has been completely overhauled, providing better functionality overall, like improved validations, synchronizing other files than just corosync.conf, and so on. Sadly, we didn't have enough capacity to support the --local option in step 1.

We are working on adding support for the --local option (or its equivalent) in the near future, but we don't have any code to share yet.

Obviously, the --local version of the setup will skip some tasks done in the regular cluster setup command. You are expected to do them by other means. I'll put them all here for the sake of completeness, even though not all of them apply in your situation:
* check that nodes are not running or configured to run a cluster
* check that nodes have cluster daemons installed in matching versions
* run 'pcs cluster destroy' on each node to get rid of all cluster config files and be sure there are no leftovers from previously configured clusters
* delete the /var/lib/pcsd/pcs_settings.conf file (this is not done by the 'pcs cluster destroy' command)
* distribute pcs auth tokens for the nodes
* distribute corosync and pacemaker authkeys, /etc/corosync/authkey and /etc/pacemaker/authkey respectively
* synchronize pcsd certificates (only needed if you intend to use the pcs web UI in HA mode)
* distribute corosync.conf

Let me know if you have any questions regarding these.

Running the current 'pcs cluster setup' command on all nodes is not really an option. The command requires the nodes to be online, as it stores corosync.conf and other files on them over the network. You may, however, run it once on a live cluster to get an idea of what the corosync.conf looks like and turn it into a template. I don't really expect its format or schema to change significantly during the RHEL8 life cycle. I understand your concerns regarding this approach, but it would give you at least some option to proceed until --local is supported in pcs.

Regards,
Tomas

On 14. 04. 2020 at 20:46, Craig Johnston wrote:

Hello,

Sorry if this has already been covered, but a perusal of recent mail archives didn't turn up anything for me.

We are looking for help in configuring a pacemaker/corosync cluster at the time the Linux root file system is built, or perhaps as part of a "pre-pivot" process in the initramfs of a live-CD environment. We are using the RHEL versions of the cluster products. Current production is RHEL7 based, and we are trying to move to RHEL8.

The issues we have stem from the configuration tools' expectation that they are operating on a live system, with all cluster nodes available on the network. This is obviously not the case during a "kickstart" install and configuration process. It's also not true in an embedded environment where all nodes are powered simultaneously and expected to become operational without any human intervention.

We create the cluster configuration from a "system model" that describes the available nodes, cluster-managed services, fencing agents, etc. This model is different for each deployment, and is used as input to create a customized Linux distribution that is deployed to a set of physical hardware, virtual machines, or containers.
Each node, and its root file system, is required to be configured and ready to go the very first time it is ever booted. The on-media Linux file system is also immutable, and thus each boot is exactly like the previous one.

Under RHEL7, we were able to use the "pcs" command to create the corosync.conf/cib.xml files for each node, e.g.:

  pcs cluster setup --local --enable --force --name mycluster node1 node2 node3
  pcs -f ${CIB} property set startup-fencing=false
  pcs -f ${CIB} resource create tftp ocf:heartbeat:Xinetd service=tftp --group grp_tftp
  etc...

Plus a little "awk"/"sed" on the corosync.conf file, and we were able to create a configuration that worked out of the box. It's not pretty, but it works, in spite of the fact that we feel like we're swimming upstream.

Under RHEL8 however, the "pcs cluster" command no longer has a "--local" option. We can't find any tool to replace its functionality. We can use "cibadmin --empty" to create a starting cib.xml file, but there is no way to add nodes to it (or create the corosync.conf file with nodes).

Granted, we could write our own tools to create template corosync.conf/cib.xml files, and "pcs -f" still works. However, that leaves us in the unenviable position where the cluster configuration schema could change, and our tools would not be the wiser. We'd much prefer to use a standard and maintained interface for
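Building on Tomas's template suggestion above, a hedged sketch of seeding the cluster files into a root filesystem at image-build time might look like the following. ROOTFS, the node names, and the minimal corosync 3.x (knet) config are illustrative assumptions generated from a hypothetical system model, not the exact file pcs would emit:

    # Sketch only: pre-seed cluster config into a build-time root filesystem
    ROOTFS=/build/rootfs
    install -d -m 755 "$ROOTFS/etc/corosync" "$ROOTFS/etc/pacemaker"

    # Minimal corosync 3.x (knet) configuration derived from the system model
    cat > "$ROOTFS/etc/corosync/corosync.conf" <<'EOF'
    totem {
        version: 2
        cluster_name: mycluster
        transport: knet
    }
    nodelist {
        node {
            ring0_addr: node1
            name: node1
            nodeid: 1
        }
        node {
            ring0_addr: node2
            name: node2
            nodeid: 2
        }
        node {
            ring0_addr: node3
            name: node3
            nodeid: 3
        }
    }
    quorum {
        provider: corosync_votequorum
    }
    EOF

    # The same authkeys must end up in every node's image
    corosync-keygen -k "$ROOTFS/etc/corosync/authkey"
    dd if=/dev/urandom of="$ROOTFS/etc/pacemaker/authkey" bs=4096 count=1
    chmod 400 "$ROOTFS/etc/corosync/authkey" "$ROOTFS/etc/pacemaker/authkey"

The CIB could then still be prepared with "cibadmin --empty" plus "pcs -f", as described above, with the caveat Tomas mentions that the corosync.conf schema is maintained by hand until a --local equivalent returns to pcs.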
Re: [ClusterLabs] When the active node enters the standby state, what should be done to make the VIP not automatically jump
On April 16, 2020 12:57:05 PM GMT+03:00, "邴洪涛" <695097494p...@gmail.com> wrote:
> > hi:
> >     We now get a strange requirement. When the active node enters standby
> > mode, virtual_ip will not automatically jump to the normal node, but
> > requires manual operation to achieve the jump of virtual_ip
> >     The mode we use is Active / Passive mode
> >     The Resource Agent we use is ocf: heartbeat: IPaddr2
> >     Hope you can solve my confusion
>
> Hello,
>
> Can you provide the version of the stack, your config and the command
> you run to put the node in standby?
>
> Best Regards,
> Strahil Nikolov
>
> -
>
> Sorry, I don't know how to reply correctly, so I pasted the previous
> chat content on it
>
> The following are the commands we use
>
> pcs property set stonith-enabled=false
> pcs property set no-quorum-policy=ignore
> pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=${VIP} cidr_netmask=32 op monitor interval="10s"
>
> pcs resource create docker systemd:docker op monitor interval="10s" timeout="15s" op start interval="0" timeout="1200s" op stop interval="0" timeout="1200s"
> pcs constraint colocation add docker virtual_ip INFINITY
> pcs constraint order virtual_ip then docker
> pcs constraint location docker prefers ${MASTER_NAME}=50
>
> pcs resource create lsyncd systemd:lsyncd op monitor interval="10s" timeout="15s" op start interval="0" timeout="120s" op stop interval="0" timeout="60s"
> pcs constraint colocation add lsyncd virtual_ip INFINITY
>
> The version we use is
>  Pacemaker 1.1.20-5.el7_7.2
>  Written by Andrew Beekhof

If you need to enter a node in standby mode and still keep the IP on that node, I don't think that you can do it at all.

Best Regards,
Strahil Nikolov
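For completeness, the "unmanage the VIP first" approach from Ken's reply elsewhere in the thread essentially sets the is-managed meta attribute on the resource, so the same effect can be expressed directly; a hedged sketch using the resource name from this thread (RHEL7 pcs syntax):

    # Equivalent to 'pcs resource unmanage virtual_ip': the cluster keeps
    # monitoring the resource but will not start, stop, or move it.
    pcs resource meta virtual_ip is-managed=false

    # Re-enable management once the standby maintenance is finished
    pcs resource meta virtual_ip is-managed=true

The IP then stays where it is through a standby, at the cost of the cluster no longer recovering it until management is re-enabled.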
[ClusterLabs] When the active node enters the standby state, what should be done to make the VIP not automatically jump
> hi:
>     We now get a strange requirement. When the active node enters standby
> mode, virtual_ip will not automatically jump to the normal node, but
> requires manual operation to achieve the jump of virtual_ip
>     The mode we use is Active / Passive mode
>     The Resource Agent we use is ocf: heartbeat: IPaddr2
>     Hope you can solve my confusion

Hello,

Can you provide the version of the stack, your config and the command you run to put the node in standby?

Best Regards,
Strahil Nikolov

-

Sorry, I don't know how to reply correctly, so I pasted the previous chat content on it

The following are the commands we use

pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=${VIP} cidr_netmask=32 op monitor interval="10s"

pcs resource create docker systemd:docker op monitor interval="10s" timeout="15s" op start interval="0" timeout="1200s" op stop interval="0" timeout="1200s"
pcs constraint colocation add docker virtual_ip INFINITY
pcs constraint order virtual_ip then docker
pcs constraint location docker prefers ${MASTER_NAME}=50

pcs resource create lsyncd systemd:lsyncd op monitor interval="10s" timeout="15s" op start interval="0" timeout="120s" op stop interval="0" timeout="60s"
pcs constraint colocation add lsyncd virtual_ip INFINITY

The version we use is
 Pacemaker 1.1.20-5.el7_7.2
 Written by Andrew Beekhof
[ClusterLabs] When the active node enters the standby state, what should be done to make the VIP not automatically jump
hi:

We now have a strange requirement. When the active node enters standby mode, virtual_ip should not automatically jump to the other node; moving virtual_ip should require a manual operation.

The mode we use is Active/Passive.
The Resource Agent we use is ocf:heartbeat:IPaddr2.

Hope you can solve my confusion.