Hi!

There are things I don't understand: Even after
# /usr/lib64/heartbeat/send_arp -i 200 -r 5 br0 172.20.3.59 f1e991b1b951 not_used not_used

neither the local ARP table (arp) nor the software bridge (brctl ... showmacs)
knows anything about the MAC address being used. So how will the system respond
to ARP queries?

Should "ip maddress show" show the address (it doesn't)? Should "ip neighbour 
show" display the address? It doesn't.

BTW: I wondered why the "lo" interface has an "UNKNOWN" state in "ip link
show":
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

I have no idea how to debug this problem. The gateway just shows one of the 
nodes using that MAC address, and ARPs sent out have the interface's MAC 
address as source; maybe that confuses the gateway.
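
One thing that can be checked without access to the gateway is whether the MAC
given to send_arp is a multicast address at all, i.e. whether the lowest bit of
its first octet (the group bit) is set. A minimal sketch in plain sh; the helper
name is_multicast_mac is made up for illustration and is not part of heartbeat:

```shell
#!/bin/sh
# is_multicast_mac MAC  (hypothetical helper, for illustration only)
# Exit status 0 if the group bit of MAC is set, 1 otherwise.
# Accepts the MAC with or without ':' separators.
is_multicast_mac() {
    # Strip ':' so that f1e991b1b951 and f1:e9:91:b1:b9:51 both work.
    mac=$(printf '%s' "$1" | tr -d ':')
    # The first octet is the first two hex digits.
    first_octet=$(printf '%s' "$mac" | cut -c1-2)
    # The group bit is the least significant bit of the first octet.
    [ $(( 0x$first_octet & 1 )) -eq 1 ]
}

if is_multicast_mac f1e991b1b951; then
    echo "multicast"
else
    echo "unicast"
fi
```

For f1e991b1b951 the first octet is 0xf1, so the bit is set and this prints
"multicast". That matches the "multicast MAC" mentioned in the quoted
discussion below.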

I also consulted "Dr. Google", but it was like looking in a mirror ;-)

Any ideas?

Regards,
Ulrich

>>> Lars Marowsky-Bree <l...@suse.com> wrote on 29.08.2012 at 15:38 in
>>> message
<20120829133817.gd5...@suse.de>:
> On 2012-08-29T13:31:05, Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> 
> wrote:
> 
> > > Well, you should see the MAC/IP mapping in the arp table if the host
> > > is on the same ethernet segment, yes. Otherwise the host doesn't
> > > know where to send the packets to.
> > I checked the arp table of the host that is hosting the cluster IP
> > address. 
> 
> I'm not sure that that node has the ARP entry, unless it has tried to
> talk to the CIP before. The ARP table of the gateway and the other
> non-cluster nodes on the subnet are more interesting.
> 
> > > > > Can you get the network trace of the arp traffic on the router into
> > > > > the subnet when an outside ping comes in?
> > > > I see this on the host (one cluster node):
> > > > o1:~ # tcpdump -p -i br0 -s100 -v -n host 172.20.3.59
> > The router is part of some HP switch where I have no access.
> 
> But that router has the ARP table and the logs you need to look at. When
> a packet comes in to the CIP, this router needs to send out an ARP
> request, accept the ARP reply (and update its ARP table), and then send
> the packet to the multicast MAC that belongs to the CIP.
> 
> (Of course, the ARP lookup happens only infrequently, caching and all,
> that's understood.)
> 
> Do you see ARP requests from the router? What do you see when a ping
> comes in?

It seems ping requests arrive at the interface of the host (a cluster node), but
they are discarded before a reply is sent. I don't know where or why; this
happens even with the firewall turned off.

> 
> > > Are you trying to reach the cluster IP from one of the cluster nodes
> > > itself? I'm not sure that will work.
> > Why not (curiosity)? No, I was using a host that is some distance away.
> 
> Because I think local traffic will bypass the CLUSTERIP target which
> could lead to unexpected effects. Similar to trying to reach an ipvs/LVS
> setup from one of the real servers.
> 
> > > That looks OK. You should check the ARP table on the gateway if it is
> > > correctly updated with the address, though.
> > I'll have to meet my local guru ;-) ... Actually the MAC address was found
> > on the gateway as "(dynamic)", whatever that means...
> 
> Interesting ;-) I don't know what that means either.
> 
> > > If you try to ping the cluster IP from a client, what does tcpdump show
> > > on the servers/gateway? Do you see the ICMP ECHO REQUEST go to the
> > > cluster IP with the above MAC? How do the servers respond?
> > A remote server only shows outgoing ICMP ECHO requests, but no replies, and
> > TCP open attempts to 172.20.3.59:445/139. I'm afraid packets end at the
> > gateway (as you suspected).
> 
> This *looks* as if the gateway is discarding the ARP response it gets,
> probably complaining about an "invalid MAC".
> 
> This is possibly a side-effect of the CIP approach violating the letter of
> RFC1812, section 3.3.2.
> 
> Nokia and Microsoft have a similar implementation too, and this can
> occasionally require that the MAC address is statically added to the
> router.
> 
> Some setups may be better off using the more traditional LVS/ipvs
> load balancing approach.
> 
> > Well, the amazing thing is that it doesn't work here, but is supported
> > through Novell. In contrast, the "public_address" of CTDB works just
> > fine here, but isn't supported by Novell: "Due to technical
> > limitations, this also includes the CTDB internal fail-over
> > functionality for IP address take-over. Please note that this part is
> > not supported by Novell. Only Pacemaker clusters are fully supported."
> 
> Uh? That's something else entirely.
> 
> The CIP works if the network environment supports it; that's outside the
> scope of the cluster software.
> 
> The above paragraph refers to traditional fail-over IP addresses.
> 
> > Well, shouldn't the manual (sle-ha-manuals_en/manual/book.sleha.html)
> > include some notes on understanding and/or troubleshooting the
> > clustered IP addresses?
> 
> Manuals are always a work in progress, especially the "how do I debug
> ..." sections.
> 
> > Anyway, if one clustered IP address is up, it can also be used for
> > testing with PING.
> 
> Sure. That was what I was recommending.
> 
> 
> Regards,
>     Lars



 
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
