Thank you Andrei, and apologies for being unclear: offline in this example was supposed to mean stopped for maintenance, i.e. with pcs cluster stop.
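As a side note for reading the ARP entries in this message: whether an entry holds a multicast or a unicast MAC can be told from the least significant bit of the first octet (the group bit). Here is a minimal Python sketch of that check. The helper name is_multicast_mac is only illustrative, and the addresses are the ones from the ARP entries in this message.

```python
def is_multicast_mac(mac: str) -> bool:
    """Return True if the group (I/G) bit of the first octet is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)


# MAC addresses as seen in the router's ARP table for 172.16.16.9
# in the three states described in this message.
observed = {
    "both nodes online": "11:54:33:a8:b2:6b",
    "redmine1 stopped": "00:0c:29:96:9c:c6",
    "after failback": "00:0c:29:8e:0c:a4",
}

for state, mac in observed.items():
    kind = "multicast" if is_multicast_mac(mac) else "unicast (interface MAC)"
    print(f"{state}: {mac} -> {kind}")
```

Only the first entry has the group bit set; the other two are ordinary interface MACs of a single node.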
So, basically, here is what's going on:

VIP:      172.16.16.9,  MAC = 11:54:33:a8:b2:6b
redmine1: 172.16.16.10, interface MAC = 00:0c:29:8e:0c:a4
redmine2: 172.16.16.11, interface MAC = 00:0c:29:96:9c:c6

1. Both nodes online, as pcs status shows:

[root@redmine2 ~]# pcs status
[...]
2 nodes configured
2 resources configured

Online: [ redmine1 redmine2 ]

Full list of resources:

 Clone Set: RedmineIP-clone [RedmineIP] (unique)
     RedmineIP:0 (ocf::heartbeat:IPaddr2): Started redmine1
     RedmineIP:1 (ocf::heartbeat:IPaddr2): Started redmine2

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ARP entry:
? (172.16.16.9) at 11:54:33:a8:b2:6b on re1_vlan6 expires in 1197 seconds

Everything is correct here.

2. redmine1 is stopped with pcs cluster stop 172.16.16.10; pcs status shows:

[root@redmine2 ~]# pcs status
[...]
2 nodes configured
2 resources configured

Online: [ redmine2 ]
OFFLINE: [ redmine1 ]

Full list of resources:

 Clone Set: RedmineIP-clone [RedmineIP] (unique)
     RedmineIP:0 (ocf::heartbeat:IPaddr2): Started redmine2
     RedmineIP:1 (ocf::heartbeat:IPaddr2): Started redmine2

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Failover worked; both resources are serviced by the second host. However, the target has now learned redmine2's MAC for the VIP:

ARP entry:
? (172.16.16.9) at 00:0c:29:96:9c:c6 on re1_vlan6 expires in 1155 seconds

So far this is not "dangerous", as all IPs are serviced by redmine2 anyway.

3. But now, after failback via pcs cluster start 172.16.16.10:

[root@redmine2 ~]# pcs status
[...]
2 nodes configured
2 resources configured

Online: [ redmine1 redmine2 ]

Full list of resources:

 Clone Set: RedmineIP-clone [RedmineIP] (unique)
     RedmineIP:0 (ocf::heartbeat:IPaddr2): Started redmine2
     RedmineIP:1 (ocf::heartbeat:IPaddr2): Started redmine1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ARP entry:
? (172.16.16.9) at 00:0c:29:8e:0c:a4 on re1_vlan6 expires in 1184 seconds

For some reason, the VIP now resolves only to redmine1's interface MAC instead of the multicast MAC. If a host should be serviced by redmine2 (through clusterip_hash=sourceip), then the VIP becomes unreachable for it!

So, wouldn't the correct behavior be to always maintain the multicast MAC?

With best regards

Andreas Iwanowski - IT Administrator / Software Developer
www.awato.de | namez...@afim.info
T: +49 2133 26031 55 | F: +49 (0)2133 26031 01
awato Software GmbH | Salm Reifferscheidt Allee 37 | D-41540 Dormagen
avisor-Support | T: +49 (0)621 6094 043 | F: +49 (0)621 6071 447
Managing Director: Ursula Iwanowski | HRB: Neuss 7208 | VAT-no.: DE 122796158

-----Original Message-----
From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Andrei Borzenkov
Sent: Wednesday, 14 March, 2018 8:01
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] FW: ocf_heartbeat_IPaddr2 - Real MAC of interface is revealed

On Wed, Mar 14, 2018 at 12:40 AM, Andreas M. Iwanowski <namez...@afim.info> wrote:
> Dear folks,
>
> We are currently trying to set up a multimaster cluster and use a cloned
> ocf_heartbeat_IPaddr2 resource to share the IP address.
>
> We have, however, run into a problem that, when a cluster member is taken
> offline, the MAC for the IP address changes from the multicast MAC to the
> interface MAC of the remaining host.
> When the other host is put back online, pings to the cluster IP time out when
> it changes back to multicast (until the ARP cache on the router expires).
>

What exactly does offline mean? Host failure? Did you put the node in standby in pacemaker? When does the MAC change - immediately or after host/cluster restart?

> Is there any way to prevent network devices from learning the interface MACs?
> I.e. even if one host is servicing both resources, use the multicast MAC?
> Any help would be appreciated!
>
> Here is the pcs status:
> ===========================
> Cluster name: test_svc
> WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
> Stack: corosync
> Current DC: host1 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
> Last updated: Tue Mar 13 07:12:07 2018
> Last change: Sun Mar 11 17:17:04 2018 by hacluster via crmd on host1
>
> 2 nodes configured
> 2 resources configured
>
> Online: [ host1 host2 ]
>

I guess output when one host is "offline" would be needed here.

> Full list of resources:
>
>  Clone Set: RedmineIP-clone [RedmineIP] (unique)
>      RedmineIP:0 (ocf::heartbeat:IPaddr2): Started host1
>      RedmineIP:1 (ocf::heartbeat:IPaddr2): Started host2
>
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
> ===========================
>

_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org