On Mon, Mar 18, 2013 at 3:17 AM, David Coulson <da...@davidcoulson.net> wrote:
> First off, I'm going to preface this with the realization that what I am
> explaining makes no sense, doesn't follow normal logic and I'm not a
> complete idiot. I've beaten my head against a wall with this issue for two
> days and have made no progress, yet we've had a couple of production system
> outages because of it.
>
> The environment is a pair of IBM x-series systems in a DMZ connected to an
> ASA5500. Each IBM box has two interfaces in a mode=4 bond connected to two
> switches, which are connected to the pri/sec firewall and to each other.
> Poor man's redundancy, I suppose. Both boxes run RHEL 6.3 and Pacemaker
> 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14. The ASA has an ARP
> table timeout of 4 hours.
>
> There are about a dozen IPAddr resources in a group, which is configured
> with meta ordered="false" collocated="false". Each IP is independent from a
> service perspective, but the group makes them easy to manage. Each box
> runs LVS with mangle rules, then assigns fwm values for routing within LVS.
> For whatever reason, this still requires the IP to be on the box receiving
> the packet through LVS, even if the mangle rule is triggered.
>
> We've had a couple of instances, for two IPs in this configuration, where
> Pacemaker (and syslog) indicate the IP is assigned to box 01, yet the
> firewall receives an ARP reply from box 02. I didn't believe it at first
> until we grabbed packets from a SPAN on the switches: correct IP address in
> the reply, MAC of one of the bonded interfaces on box 02, yet the IP isn't
> on it.
>
> We've experienced both 01 arping for an IP on 02, and 02 arping for an IP on
> 01. Last night when we had the issue, an IP was on 02, 01 arped for it, and I
> tcpdumped on 01 and saw SYN packets coming in for the IP on 01. That makes
> sense, but it doesn't explain why the box answered the ARP in the first place.
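The SPAN check described above amounts to cross-checking captured ARP replies against where Pacemaker says each IP lives. Here is a minimal sketch of that comparison. It assumes the capture has already been reduced to (IP, sender MAC) pairs and that you have noted the placement yourself, e.g. from crm_mon output. Parsing the capture is not shown. The function name, node names, MACs and addresses are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def find_misdirected_replies(
    replies: Iterable[Tuple[str, str]],
    node_macs: Dict[str, Set[str]],
    expected_owner: Dict[str, str],
) -> List[Tuple[str, str, str, str]]:
    """Flag ARP replies sent by a node other than the one holding the IP.

    replies:        (ip, sender_mac) pairs taken from captured ARP replies
    node_macs:      node name -> MACs of that node's (bonded) interfaces
    expected_owner: ip -> node that Pacemaker reports as running it
    Returns (ip, sender_mac, replying_node, expected_node) tuples.
    """
    # Invert node -> MACs into MAC -> node, lower-casing so that
    # "00:1A:..." and "00:1a:..." compare equal.
    mac_to_node = {
        mac.lower(): node for node, macs in node_macs.items() for mac in macs
    }
    mismatches = []
    for ip, mac in replies:
        replying = mac_to_node.get(mac.lower())
        expected = expected_owner.get(ip)
        # Skip MACs we don't recognise (e.g. the firewall) and IPs the
        # cluster doesn't manage; only node-vs-node disagreement matters.
        if replying is None or expected is None:
            continue
        if replying != expected:
            mismatches.append((ip, mac.lower(), replying, expected))
    return mismatches


if __name__ == "__main__":
    # Hypothetical example data (documentation addresses, made-up MACs).
    replies = [
        ("192.0.2.10", "00:1A:64:00:00:02"),  # answered by box02
        ("192.0.2.11", "00:1a:64:00:00:01"),  # answered by box01
    ]
    node_macs = {"box01": {"00:1a:64:00:00:01"}, "box02": {"00:1a:64:00:00:02"}}
    expected_owner = {"192.0.2.10": "box01", "192.0.2.11": "box01"}
    for ip, mac, got, want in find_misdirected_replies(
        replies, node_macs, expected_owner
    ):
        print(f"{ip}: reply from {got} ({mac}) but Pacemaker says {want}")
```

This only shows that a node answered for an address it should not own. Finding out why still has to happen on that node.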
Had Pacemaker just failed over the IP? Did you set any ARP-related options for the resource?

>
> I realize this likely isn't a Pacemaker issue, but I was hoping someone else
> might have experienced a similar issue, or could at least point me in the
> right direction. We have a far more complex Pacemaker/LVS environment on our
> inside network (which isn't link-local to the ASA; it goes through an inside
> router) which works flawlessly, so I'm open to the possibility that something
> is totally screwed up in our DMZ.
>
> Sorry that was long. :)
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org