[ClusterLabs] Corosync lost quorum but DLM still gives locks
Hi,

I am caught by surprise with this behaviour of DLM:

- I have 5 nodes (test VMs)
- 3 of them have 1 vote for the corosync quorum (they are "voters")
- 2 of them have 0 votes ("non-voters")

So the corosync quorum is 2. On the non-voters, I run DLM and an application that uses it. In DLM, fencing is disabled.

Now, if I stop corosync on 2 of the voters:

- as expected, corosync says "Activity blocked"
- but to my surprise, DLM seems happy to grant more locks

Shouldn't DLM block lock requests in this situation?

Cheers,
JM

--

[root@vm4 ~]# corosync-quorumtool
Quorum information
------------------
Date:             Wed Oct 11 20:29:52 2017
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          5
Ring ID:          3/24
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
         3          1 172.16.2.33
         4          0 172.16.3.33
         5          0 172.16.4.33 (local)

[root@vm4 ~]# dlm_tool status
cluster nodeid 5 quorate 0 ring seq 24 24
daemon now 6908 fence_pid 0
node 4 M add 4912 rem 0 fail 0 fence 0 at 0 0
node 5 M add 4912 rem 0 fail 0 fence 0 at 0 0

[root@vm4 ~]# corosync-cpgtool
Group Name             PID         Node ID
dlm:ls:XYZ\x00
                       971           4 (172.16.3.33)
                     10095           5 (172.16.4.33)
dlm:controld\x00
                       971           4 (172.16.3.33)
                     10095           5 (172.16.4.33)

[root@vm4 ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)

[root@vm4 ~]# rpm -q corosync dlm
corosync-2.4.0-9.el7_4.2.x86_64
dlm-4.0.7-1.el7.x86_64

--
saff...@gmail.com
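If the goal is for DLM to refuse lockspace activity without quorum, dlm_controld has quorum-related tunables. A minimal /etc/dlm/dlm.conf sketch, assuming this dlm build supports the enable_quorum_lockspace option (the option names here should be verified against dlm_controld(8)):

    # /etc/dlm/dlm.conf -- sketch; verify option names in dlm_controld(8)
    enable_fencing=0            # fencing disabled, as in this test setup
    enable_quorum_lockspace=1   # require quorum for lockspace operations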
Re: [ClusterLabs] ClusterMon mail notification - does not work
On Wed, 2017-10-11 at 09:12 +0200, Ferenc Wágner wrote:
> Donat Zenichev writes:
>
> > then resource is stopped, but nothing occurred on e-mail
> > destination. Where I did wrong actions?
>
> Please note that ClusterMon notifications are becoming deprecated
> (they should still work, but I've got no experience with them). Try
> using alerts instead, as documented at
> https://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch07.html

Alerts were introduced in Pacemaker 1.1.15 -- I believe Ubuntu 16.04 has 1.1.14.

Donat: if you can upgrade to a newer Ubuntu, you should be able to get a version with alerts, which is a better implementation for sending e-mails than ClusterMon.

If you can't, or you still want to use ClusterMon for the HTML status updates, my first suggestion would be to make sure that your version of crm_mon supports the mail-* arguments. It's a compile-time option, and I don't know if Ubuntu enabled it. Simply do "man crm_mon", and if it shows the mail-* options, then you have the capability.

-- 
Ken Gaillot
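For the alerts route, a minimal sketch using the sample SMTP agent shipped with Pacemaker and pcs (assumptions: Pacemaker 1.1.15+, the agent path below, and recent pcs alert syntax; crmsh has an equivalent alert configure command):

    # sketch -- agent path, pcs syntax and recipient are assumptions/placeholders
    cp /usr/share/pacemaker/alerts/alert_smtp.sh.sample /etc/pacemaker/alert_smtp.sh
    chmod +x /etc/pacemaker/alert_smtp.sh
    pcs alert create path=/etc/pacemaker/alert_smtp.sh id=smtp_alert
    pcs alert recipient add smtp_alert value=admin@example.com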
[ClusterLabs] corosync race condition when node leaves immediately after joining
Hi ClusterLabs,

I'm seeing a race condition in corosync where votequorum can have incorrect membership info when a node joins the cluster and then leaves very soon after. I'm on corosync-2.3.4 plus my patch https://github.com/corosync/corosync/pull/248. That patch makes the problem readily reproducible, but the bug was already present.

Here's the scenario. I have two hosts, cluster1 and cluster2. The corosync.conf on cluster2 is:

totem {
        version: 2
        cluster_name: test
        config_version: 2
        transport: udpu
}

nodelist {
        node {
                nodeid: 1
                ring0_addr: cluster1
        }
        node {
                nodeid: 2
                ring0_addr: cluster2
        }
}

quorum {
        provider: corosync_votequorum
        auto_tie_breaker: 1
}

logging {
        to_syslog: yes
}

The corosync.conf on cluster1 is the same except with "config_version: 1".

I start corosync on cluster2. When I start corosync on cluster1, it joins and then immediately leaves due to the lower config_version. (Previously corosync on cluster2 would also exit, but with https://github.com/corosync/corosync/pull/248 it remains alive.) But often at this point, cluster1's disappearance is not reflected in the votequorum info on cluster2:

Quorum information
------------------
Date:             Tue Oct 10 16:43:50 2017
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          2
Ring ID:          700
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate AutoTieBreaker

Membership information
----------------------
    Nodeid      Votes Name
         2          1 cluster2 (local)

The logs on cluster1 show:

Oct 10 16:43:37 cluster1 corosync[15750]: [CMAP  ] Received config version (2) is different than my config version (1)! Exiting

The logs on cluster2 show:

Oct 10 16:43:37 cluster2 corosync[5102]: [TOTEM ] A new membership (10.71.218.17:588) was formed. Members joined: 1
Oct 10 16:43:37 cluster2 corosync[5102]: [QUORUM] This node is within the primary component and will provide service.
Oct 10 16:43:37 cluster2 corosync[5102]: [QUORUM] Members[1]: 2
Oct 10 16:43:37 cluster2 corosync[5102]: [TOTEM ] A new membership (10.71.218.18:592) was formed. Members left: 1
Oct 10 16:43:37 cluster2 corosync[5102]: [QUORUM] Members[1]: 2
Oct 10 16:43:37 cluster2 corosync[5102]: [MAIN  ] Completed service synchronization, ready to provide service.

It looks like QUORUM has seen cluster1's arrival but not its departure! When it works as expected, the state is left consistent:

Quorum information
------------------
Date:             Tue Oct 10 16:58:14 2017
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          2
Ring ID:          604
Quorate:          No

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           2 Activity blocked
Flags:            AutoTieBreaker

Membership information
----------------------
    Nodeid      Votes Name
         2          1 cluster2 (local)

Logs on cluster1:

Oct 10 16:58:01 cluster1 corosync[16430]: [CMAP  ] Received config version (2) is different than my config version (1)! Exiting

Logs on cluster2 are either:

Oct 10 16:58:01 cluster2 corosync[17835]: [TOTEM ] A new membership (10.71.218.17:600) was formed. Members joined: 1
Oct 10 16:58:01 cluster2 corosync[17835]: [QUORUM] This node is within the primary component and will provide service.
Oct 10 16:58:01 cluster2 corosync[17835]: [QUORUM] Members[1]: 2
Oct 10 16:58:01 cluster2 corosync[17835]: [CMAP  ] Highest config version (2) and my config version (2)
Oct 10 16:58:01 cluster2 corosync[17835]: [TOTEM ] A new membership (10.71.218.18:604) was formed. Members left: 1
Oct 10 16:58:01 cluster2 corosync[17835]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Oct 10 16:58:01 cluster2 corosync[17835]: [QUORUM] Members[1]: 2
Oct 10 16:58:01 cluster2 corosync[17835]: [MAIN  ] Completed service synchronization, ready to provide service.

... in which it looks like QUORUM has seen cluster1's arrival *and* its departure, or:

Oct 10 16:59:03 cluster2 corosync[18841]: [TOTEM ] A new membership (10.71.218.17:632) was formed. Members joined: 1
Oct 10 16:59:03 cluster2 corosync[18841]: [CMAP  ] Highest config version (2) and my config version (2)
Oct 10 16:59:03 cluster2 corosync[18841]: [TOTEM ] A new membership (10.71.218.18:636) was formed. Members left: 1
Oct 10 16:59:03 cluster2 corosync[18841]: [QUORUM] Members[1]: 2
Oct 10 16:59:03 cluster2 corosync[18841]: [MAIN  ] Completed service synchronization, ready to provide service.
Re: [ClusterLabs] if resourceA starts @nodeA then start resource[xy] @node[xy]
On Tue, 2017-10-10 at 12:06 +0100, lejeczek wrote:
> On 26/09/17 13:15, Klaus Wenninger wrote:
> > On 09/26/2017 02:06 PM, lejeczek wrote:
> > > hi fellas
> > >
> > > can something like in the subject pacemaker do? And if yes, then
> > > how to do it?
> >
> > You could bind ResourceA to nodeA and resource[xy] to node[xy] via
> > location constraints.
> > Afterwards you could make resource[xy] depend on ResourceA -
> > without colocation.
> > The actual commands (based on crmsh or pcs) to create these rules
> > depend on the distribution you are using.
> >
> > Regards,
> > Klaus
>
> thanks,
> I am probably hoping for too much(?) - without man-made
> constraints, but rather with the cluster logic making these
> decisions: wherever it decided to start/run resourceA, then
> resourceB would have to run on different (all or remaining)
> cluster nodes.
>
> I'd only have to tell it something like: if you started
> resourceA on nodeA, then the remaining (or maybe specific) resources
> start on all but nodeA.

Sure, that's simply a colocation constraint with a negative score. For details, see http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-resource-colocation (and/or the help for whatever higher-level tools you're using).

> > > I'm looking into docs but before I'd gone through it all I hoped
> > > an expert could tell.
> > >
> > > many thanks, L.

-- 
Ken Gaillot
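For example (resourceA/resourceB as named in the thread; use whichever tool matches your setup):

    # pcs: keep resourceB off whichever node runs resourceA
    pcs constraint colocation add resourceB with resourceA -INFINITY

    # crmsh equivalent
    crm configure colocation resourceB-not-with-resourceA -inf: resourceB resourceA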
Re: [ClusterLabs] trouble with IPaddr2
On Wed, Oct 11, 2017 at 01:29:40PM +0200, Stefan Krueger wrote:
> ohh damn.. thanks a lot for this hint.. I deleted all the IPs on enp4s0f0,
> and then it works..
> but could you please explain why it now works? why does it have a problem
> with these IPs?

AFAICT, it found a better interface with that subnet and tried to use it instead of the one specified in the parameters :)

But maybe IPaddr2 should just skip interface auto-detection if an explicit interface was given in the parameters?

-- 
Valentin
[ClusterLabs] can't move/migrate ressource
Hello,

when I try to migrate a resource from one server to another (for example for maintenance), it doesn't work. A single resource works fine; after that I created a group with 2 resources and tried to move that.

my config is:

crm conf show
node 739272007: zfs-serv1
node 739272008: zfs-serv2
primitive HA_IP-Serv1 IPaddr2 \
        params ip=172.16.101.70 cidr_netmask=16 \
        op monitor interval=20 timeout=30 on-fail=restart nic=bond0 \
        meta target-role=Started
primitive HA_IP-Serv2 IPaddr2 \
        params ip=172.16.101.74 cidr_netmask=16 \
        op monitor interval=10s nic=bond0
primitive nc_storage ZFS \
        params pool=nc_storage importargs="-d /dev/disk/by-partlabel/"
group compl_zfs-serv1 nc_storage HA_IP-Serv1
location cli-prefer-HA_IP-Serv1 compl_zfs-serv1 role=Started inf: zfs-serv1
location cli-prefer-HA_IP-Serv2 HA_IP-Serv2 role=Started inf: zfs-serv2
location cli-prefer-compl_zfs-serv1 compl_zfs-serv1 role=Started inf: zfs-serv2
location cli-prefer-nc_storage compl_zfs-serv1 role=Started inf: zfs-serv1
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.16-94ff4df \
        cluster-infrastructure=corosync \
        cluster-name=debian \
        no-quorum-policy=ignore \
        default-resource-stickiness=100 \
        stonith-enabled=false \
        last-lrm-refresh=1507702403

command:

crm resource move compl_zfs-serv1 zfs-serv2

pacemaker log from zfs-serv2:

Oct 11 13:55:58 [3556] zfs-serv2 cib: info: cib_perform_op: Diff: --- 0.106.0 2
Oct 11 13:55:58 [3556] zfs-serv2 cib: info: cib_perform_op: Diff: +++ 0.107.0 cc224b15d0a796e040b026b7c2965770
Oct 11 13:55:58 [3556] zfs-serv2 cib: info: cib_perform_op: -- /cib/configuration/constraints/rsc_location[@id='cli-prefer-compl_zfs-serv1']
Oct 11 13:55:58 [3556] zfs-serv2 cib: info: cib_perform_op: + /cib: @epoch=107
Oct 11 13:55:58 [3556] zfs-serv2 cib: info: cib_process_request: Completed cib_delete operation for section constraints: OK (rc=0, origin=zfs-serv1/crm_resource/3, version=0.107.0)
Oct 11 13:55:58 [3561] zfs-serv2 crmd: info: abort_transition_graph: Transition aborted by deletion of rsc_location[@id='cli-prefer-compl_zfs-serv1']: Configuration change | cib=0.107.0 source=te_update_diff:444 path=/cib/configuration/constraints/rsc_location[@id='cli-prefer-compl_zfs-serv1'] complete=true
Oct 11 13:55:58 [3561] zfs-serv2 crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph
Oct 11 13:55:58 [3556] zfs-serv2 cib: info: cib_perform_op: Diff: --- 0.107.0 2
Oct 11 13:55:58 [3556] zfs-serv2 cib: info: cib_perform_op: Diff: +++ 0.108.0 (null)
Oct 11 13:55:58 [3556] zfs-serv2 cib: info: cib_perform_op: + /cib: @epoch=108
Oct 11 13:55:58 [3556] zfs-serv2 cib: info: cib_perform_op: ++ /cib/configuration/constraints:
Oct 11 13:55:58 [3556] zfs-serv2 cib: info: cib_process_request: Completed cib_modify operation for section constraints: OK (rc=0, origin=zfs-serv1/crm_resource/4, version=0.108.0)
Oct 11 13:55:58 [3561] zfs-serv2 crmd: info: abort_transition_graph: Transition aborted by rsc_location.cli-prefer-compl_zfs-serv1 'create': Configuration change | cib=0.108.0 source=te_update_diff:444 path=/cib/configuration/constraints complete=true
Oct 11 13:55:58 [3560] zfs-serv2 pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 11 13:55:58 [3560] zfs-serv2 pengine: info: determine_online_status: Node zfs-serv2 is online
Oct 11 13:55:58 [3560] zfs-serv2 pengine: info: determine_online_status: Node zfs-serv1 is online
Oct 11 13:55:58 [3560] zfs-serv2 pengine: info: determine_op_status: Operation monitor found resource nc_storage active on zfs-serv2
Oct 11 13:55:58 [3560] zfs-serv2 pengine: info: native_print: HA_IP-Serv2 (ocf::heartbeat:IPaddr2): Started zfs-serv2
Oct 11 13:55:58 [3560] zfs-serv2 pengine: info: group_print: Resource Group: compl_zfs-serv1
Oct 11 13:55:58 [3560] zfs-serv2 pengine: info: native_print: nc_storage (ocf::heartbeat:ZFS): Started zfs-serv1
Oct 11 13:55:58 [3560] zfs-serv2 pengine: info: native_print: HA_IP-Serv1 (ocf::heartbeat:IPaddr2): Started zfs-serv1
Oct 11 13:55:58 [3560] zfs-serv2 pengine: info: LogActions: Leave HA_IP-Serv2 (Started zfs-serv2)
Oct 11 13:55:58 [3560] zfs-serv2 pengine: info: LogActions: Leave nc_storage (Started zfs-serv1)
Oct 11 13:55:58 [3560] zfs-serv2 pengine: info: LogActions: Leave HA_IP-Serv1 (Started zfs-serv1)
Oct 11 13:55:58 [3560] zfs-serv2 pengine: notice: process_pe_message: Calculated transition 8, saving inputs in /var/lib/pacemaker/pengine/pe-input-1348.bz2
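One thing stands out in the configuration above: the group compl_zfs-serv1 carries inf: location constraints for both zfs-serv1 and zfs-serv2 (cli-prefer-HA_IP-Serv1, cli-prefer-nc_storage, cli-prefer-compl_zfs-serv1), apparently left over from earlier moves. A cleanup sketch, assuming those entries really are leftovers (crm resource clear/unmigrate only removes the constraint named after the resource being moved, so the others need deleting by id; the subcommand name varies by crmsh version):

    # verify the constraint ids first with "crm configure show"
    crm configure delete cli-prefer-HA_IP-Serv1 cli-prefer-nc_storage
    crm resource move compl_zfs-serv1 zfs-serv2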
Re: [ClusterLabs] trouble with IPaddr2
Hello Valentin,

thanks for your help.

> Can you share more info on the network of zfs-serv2, for example: ip a?

ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp4s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ac:1f:6b:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 172.16.22.126/16 brd 172.16.255.255 scope global enp4s0f0
       valid_lft forever preferred_lft forever
    inet 172.16.101.74/16 brd 172.16.255.255 scope global secondary enp4s0f0
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff::/64 scope link
       valid_lft forever preferred_lft forever
3: enp4s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ac:1f:6b:xx:xx:xx brd ff:ff:ff:ff:ff:ff
4: ens2f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 3c:fd:fe:xx:xx:xx brd ff:ff:ff:ff:ff:ff
5: ens2f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 3c:fd:fe:xx:xx:xx brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 3c:fd:fe:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 172.16.101.72/16 brd 172.16.255.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet 172.16.101.74/16 brd 172.16.255.255 scope global secondary bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::3efd:feff::/64 scope link
       valid_lft forever preferred_lft forever

ohh damn.. thanks a lot for this hint.. I deleted all the IPs on enp4s0f0, and then it works..
but could you please explain why it now works? why does it have a problem with these IPs?

best regards
[ClusterLabs] pcs 0.9.160 released
I am happy to announce the latest release of pcs, version 0.9.160.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.9.160.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.9.160.zip

Be aware that support for CMAN clusters has been deprecated in this release. This does not mean it will be removed in the very next release. Instead, once we get to overhaul commands supporting CMAN clusters to the new architecture, the support will be removed.

Complete change log for this release:

### Added
- Configurable pcsd port ([rhbz#1415197])
- Description of the `--force` option added to man page and help ([rhbz#1491631])

### Fixed
- Fixed some crashes when pcs encounters a non-ascii character in environment variables, command line arguments and so on ([rhbz#1435697])
- Fixed detecting if systemd is in use ([ghissue#118])
- Upgrade CIB schema version when the `resource-discovery` option is used in location constraints ([rhbz#1420437])
- Fixed error messages in `pcs cluster report` ([rhbz#1388783])
- Increased request timeout when starting a cluster with a large number of nodes to prevent timeouts ([rhbz#1463327])
- Fixed "Unable to update cib" error caused by invalid resource operation IDs
- `pcs resource op defaults` now fails on an invalid option ([rhbz#1341582])
- Fixed behaviour of the `pcs cluster verify` command when entered with the filename argument ([rhbz#1213946])

### Changed
- CIB changes are now pushed to pacemaker as a diff in commands overhauled to the new architecture (previously the whole CIB was pushed). This resolves race conditions and ACL-related errors when pushing the CIB. ([rhbz#1441673])
- All actions / operations defined in a resource agent's metadata (except meta-data, status and validate-all) are now copied to the CIB when creating a resource. ([rhbz#1418199], [ghissue#132])
- Improved documentation of the `pcs stonith confirm` command ([rhbz#1489682])

### Deprecated
- This is the last version fully supporting CMAN clusters and python 2.6. Support for these will be gradually dropped.

Thanks / congratulations to everyone who contributed to this release, including Ivan Devat, Jan Pokorný, Ondrej Mular, Tomas Jelinek and Valentin Vidic.

Cheers,
Tomas

[ghissue#118]: https://github.com/ClusterLabs/pcs/issues/118
[ghissue#132]: https://github.com/ClusterLabs/pcs/issues/132
[rhbz#1213946]: https://bugzilla.redhat.com/show_bug.cgi?id=1213946
[rhbz#1341582]: https://bugzilla.redhat.com/show_bug.cgi?id=1341582
[rhbz#1388783]: https://bugzilla.redhat.com/show_bug.cgi?id=1388783
[rhbz#1415197]: https://bugzilla.redhat.com/show_bug.cgi?id=1415197
[rhbz#1418199]: https://bugzilla.redhat.com/show_bug.cgi?id=1418199
[rhbz#1420437]: https://bugzilla.redhat.com/show_bug.cgi?id=1420437
[rhbz#1435697]: https://bugzilla.redhat.com/show_bug.cgi?id=1435697
[rhbz#1441673]: https://bugzilla.redhat.com/show_bug.cgi?id=1441673
[rhbz#1463327]: https://bugzilla.redhat.com/show_bug.cgi?id=1463327
[rhbz#1489682]: https://bugzilla.redhat.com/show_bug.cgi?id=1489682
[rhbz#1491631]: https://bugzilla.redhat.com/show_bug.cgi?id=1491631
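Regarding the configurable pcsd port: a sketch, assuming pcsd picks the port up from its sysconfig file (variable name and file location to be verified against your pcsd package):

    # /etc/sysconfig/pcsd (Debian-style systems: /etc/default/pcsd) -- sketch
    # assumption: pcsd reads its listening port from PCSD_PORT (default 2224)
    PCSD_PORT=3000

Restart pcsd afterwards for the change to take effect.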
Re: [ClusterLabs] trouble with IPaddr2
On Wed, Oct 11, 2017 at 10:51:04AM +0200, Stefan Krueger wrote:
> primitive HA_IP-Serv1 IPaddr2 \
>         params ip=172.16.101.70 cidr_netmask=16 \
>         op monitor interval=20 timeout=30 on-fail=restart nic=bond0 \
>         meta target-role=Started

There might be something wrong with the network setup, because enp4s0f0 gets used instead of bond0:

> Oct 11 08:19:32 zfs-serv2 IPaddr2(HA_IP-Serv1)[27672]: INFO: Adding inet address 172.16.101.70/16 with broadcast address 172.16.255.255 to device enp4s0f0

Can you share more info on the network of zfs-serv2, for example: ip a?

-- 
Valentin
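Worth noting: in the primitive quoted above, nic=bond0 sits on the "op monitor" line, where crmsh would record it as an attribute of the monitor operation rather than as a resource parameter, so the start action may well fall back to IPaddr2's subnet-based auto-detection. A sketch of the placement that pins the interface for all operations:

    primitive HA_IP-Serv1 IPaddr2 \
            params ip=172.16.101.70 cidr_netmask=16 nic=bond0 \
            op monitor interval=20 timeout=30 on-fail=restart \
            meta target-role=Started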
Re: [ClusterLabs] corosync service not automatically started
Václav Mach writes:

> On 10/11/2017 09:00 AM, Ferenc Wágner wrote:
>
>> Václav Mach writes:
>>
>>> allow-hotplug eth0
>>> iface eth0 inet dhcp
>>
>> Try replacing allow-hotplug with auto. Ifupdown simply runs ifup -a
>> before network-online.target, which excludes allow-hotplug interfaces.
>> That means allow-hotplug interfaces are not waited for before corosync
>> is started during boot.
>
> That did the trick for the network config using DHCP. Thanks for the
> clarification.
>
> Do you know why allow-hotplug interfaces are excluded? It's obvious
> that if ifup (according to its man page) is run as 'ifup -a' it
> ignores them, but I don't get why allow-hotplug interfaces should be
> ignored by the init system.

Allow-hotplug interfaces aren't assumed to be present all the time, but rather to be plugged in and out arbitrarily. They are handled by udev, asynchronously, while the system is running. Waiting for them during bootup would be strange if you ask me.

-- 
Regards,
Feri
[ClusterLabs] trouble with IPaddr2
Hello,

I've a simple setup with just 3 resources (at the moment); the ZFS resource works fine. BUT my IPaddr2 doesn't work, and I don't know why or how to resolve that.

my config:

conf sh
node 739272007: zfs-serv1
node 739272008: zfs-serv2
primitive HA_IP-Serv1 IPaddr2 \
        params ip=172.16.101.70 cidr_netmask=16 \
        op monitor interval=20 timeout=30 on-fail=restart nic=bond0 \
        meta target-role=Started
primitive HA_IP-Serv2 IPaddr2 \
        params ip=172.16.101.74 cidr_netmask=16 \
        op monitor interval=10s nic=bond0
primitive nc_storage ZFS \
        params pool=nc_storage importargs="-d /dev/disk/by-partlabel/"
location cli-prefer-HA_IP-Serv1 HA_IP-Serv1 role=Started inf: zfs-serv1
location cli-prefer-HA_IP-Serv2 HA_IP-Serv2 role=Started inf: zfs-serv2
location cli-prefer-nc_storage nc_storage role=Started inf: zfs-serv2
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.16-94ff4df \
        cluster-infrastructure=corosync \
        cluster-name=debian \
        no-quorum-policy=ignore \
        default-resource-stickiness=100 \
        stonith-enabled=false \
        last-lrm-refresh=1507702403

command:

resource move HA_IP-Serv1 zfs-serv2

pacemaker log from zfs-serv2:

Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_process_request: Completed cib_delete operation for section constraints: OK (rc=0, origin=zfs-serv1/crm_resource/3, version=0.82.44)
Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_perform_op: Diff: --- 0.82.44 2
Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_perform_op: Diff: +++ 0.83.0 (null)
Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_perform_op: + /cib: @epoch=83, @num_updates=0
Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_perform_op: + /cib/configuration/constraints/rsc_location[@id='cli-prefer-HA_IP-Serv1']: @node=zfs-serv2
Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_process_request: Completed cib_modify operation for section constraints: OK (rc=0, origin=zfs-serv1/crm_resource/4, version=0.83.0)
Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_perform_op: Diff: --- 0.83.0 2
Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_perform_op: Diff: +++ 0.83.1 (null)
Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_perform_op: + /cib: @num_updates=1
Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_perform_op: + /cib/status/node_state[@id='739272007']/lrm[@id='739272007']/lrm_resources/lrm_resource[@id='HA_IP-Serv1']/lrm_rsc_op[@id='HA_IP-Serv1_last_0']: @operation_key=HA_IP-Serv1_stop_0, @operation=stop, @crm-debug-origin=do_update_resource, @transition-key=8:1574:0:d4b03c3c-1a4e-4609-86ca-675fa4a2ec8f, @transition-magic=0:0;8:1574:0:d4b03c3c-1a4e-4609-86ca-675fa4a2ec8f, @call-id=55, @last-run=1507702772, @last-rc-change=1507702772, @exec
Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=zfs-serv1/crmd/1853, version=0.83.1)
Oct 11 08:19:32 [23938] zfs-serv2 crmd: info: do_lrm_rsc_op: Performing key=9:1574:0:d4b03c3c-1a4e-4609-86ca-675fa4a2ec8f op=HA_IP-Serv1_start_0
Oct 11 08:19:32 [23935] zfs-serv2 lrmd: info: log_execute: executing - rsc:HA_IP-Serv1 action:start call_id:17
Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_file_backup: Archived previous version as /var/lib/pacemaker/cib/cib-39.raw
Oct 11 08:19:32 [23935] zfs-serv2 lrmd: info: log_finished: finished - rsc:HA_IP-Serv1 action:start call_id:17 pid:27672 exit-code:0 exec-time:48ms queue-time:0ms
Oct 11 08:19:32 [23938] zfs-serv2 crmd: info: action_synced_wait: Managed IPaddr2_meta-data_0 process 27735 exited with rc=0
Oct 11 08:19:32 [23938] zfs-serv2 crmd: notice: process_lrm_event: Result of start operation for HA_IP-Serv1 on zfs-serv2: 0 (ok) | call=17 key=HA_IP-Serv1_start_0 confirmed=true cib-update=15
Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_process_request: Forwarding cib_modify operation for section status to all (origin=local/crmd/15)
Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_perform_op: Diff: --- 0.83.1 2
Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_perform_op: Diff: +++ 0.83.2 (null)
Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_perform_op: + /cib: @num_updates=2
Oct 11 08:19:32 [23933] zfs-serv2 cib: info: cib_perform_op: + /cib/status/node_state[@id='739272008']/lrm[@id='739272008']/lrm_resources/lrm_resource[@id='HA_IP-Serv1']/lrm_rsc_op[@id='HA_IP-Serv1_last_0']: @operation_key=HA_IP-Serv1_start_0, @operation=start, @transition-key=9:1574:0:d4b03c3c-1a4e-4609-86ca-675fa4a2ec8f, @transition-magic=0:0;9:1574:0:d4b03c3c-1a4e-4609-86ca-675fa4a2ec8f, @call-id=17, @rc-code=0, @last-run=1507702772,
Re: [ClusterLabs] corosync service not automatically started
On 10/11/2017 09:00 AM, Ferenc Wágner wrote:
> Václav Mach writes:
>
>> allow-hotplug eth0
>> iface eth0 inet dhcp
>
> Try replacing allow-hotplug with auto. Ifupdown simply runs ifup -a
> before network-online.target, which excludes allow-hotplug interfaces.
> That means allow-hotplug interfaces are not waited for before corosync
> is started during boot.

That did the trick for the network config using DHCP. Thanks for the clarification.

Do you know why allow-hotplug interfaces are excluded? It's obvious that if ifup (according to its man page) is run as 'ifup -a' it ignores them, but I don't get why allow-hotplug interfaces should be ignored by the init system.

-- 
Václav Mach
CESNET, z.s.p.o.
www.cesnet.cz
Re: [ClusterLabs] ClusterMon mail notification - does not work
Donat Zenichev writes:

> then resource is stopped, but nothing occurred on e-mail destination.
> Where I did wrong actions?

Please note that ClusterMon notifications are becoming deprecated (they should still work, but I've got no experience with them). Try using alerts instead, as documented at
https://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch07.html

-- 
Regards,
Feri
Re: [ClusterLabs] corosync service not automatically started
Václav Mach writes:

> allow-hotplug eth0
> iface eth0 inet dhcp

Try replacing allow-hotplug with auto. Ifupdown simply runs ifup -a before network-online.target, which excludes allow-hotplug interfaces. That means allow-hotplug interfaces are not waited for before corosync is started during boot.

-- 
Regards,
Feri
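In other words, a minimal /etc/network/interfaces stanza along these lines (interface name taken from the quoted config):

    auto eth0
    iface eth0 inet dhcp

With auto, ifup -a brings the interface up before network-online.target is reached, so corosync only starts once the network is configured.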