> So it seems that, while the network is configured before ucarp is
> launched (S10 vs S98), the cards (or the driver?) don't have link until
> some 25 seconds after running the network startup script. So when

Sounds like spanning tree. Check your switch and set the port to
portfast (begin forwarding packets immediately, even while performing
spanning tree).

> On a side note: the VIP works with ucarp 1.5. The first gratuitous-arp
> still gets lost, but it sends an additional one when the link gets up
> and it receives the heartbeats from the other server, "fixing" the
> router's arp table at that moment.

There were some major improvements in the releases after 1.2 for
gratuitous arp reliability, in cases just like what you're describing.

> Is there any other way to check the link status?

I use ifplugd to detect link up or down and to start or stop ucarp
respectively. It's very reliable on the hardware I use, at least. But if
your issue is spanning tree, this won't help--the link really is up, just
the switch isn't forwarding your packets until it can be sure doing so
won't cause a bridge loop.

Steve

On Tue, Mar 17, 2009 at 9:33 AM, Vicente Aguilar <bise...@bisente.com> wrote:
> Hi
>
> I have an issue on my servers related to both ucarp and the e1000
> drivers, thus the crossposting. :-)
>
> I think that during system boot the e1000 driver (e1000e too) reports to
> the OS that the link is up some seconds before it really is.
>
> Server & module info:
>
> Red Hat Enterprise Linux ES release 4 (Nahant)
>
> filename: /lib/modules/2.6.9-5.ELsmp/kernel/drivers/net/e1000/e1000.ko
> parm: copybreak:Maximum size of packet that is copied to a new buffer on receive
> author: Intel Corporation, <linux.n...@intel.com>
> description: Intel(R) PRO/1000 Network Driver
> license: GPL
> version: 7.5.5-NAPI
>
> ucarp 1.2
>
> Networking is configured on rc2.d/S10network and ucarp on S98ucarp.
>
> This is what happens: after a reboot of the master server, configured
> with preemption so that it would be master again after getting back
> online, the virtual IP was unresponsive.
> We did some tcpdumps and found
> out that the gratuitous-arp that ucarp sends when going to master state
> wasn't reaching the router, so in the router's arp table the virtual IP
> still pointed to the secondary server's MAC address.
>
> On syslog on the primary server we have:
>
> Mar 17 13:45:48 server1 network: Bringing up interface eth2: succeeded
> Mar 17 13:45:54 server1 ucarp[2489]: [INFO] Local advertised ethernet address is [00:15:17:58:19:08]
> Mar 17 13:45:54 server1 ucarp[2489]: [WARNING] Spawning [/opt/VIP/servicioVIP_add.sh eth2]
> Mar 17 13:45:54 server1 ucarp[2489]: [WARNING] Switching to state: MASTER
> Mar 17 13:46:12 server1 kernel: e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection
> Mar 17 13:46:13 server1 kernel: e1000: eth2: e1000_watchdog_task: 10/100 speed: disabling TSO
> Mar 17 13:46:13 server1 kernel: e1000: eth2: e1000_watchdog_task: NIC Link is Up 100 Mbps Half Duplex, Flow Control: None
>
> So it seems that, while the network is configured before ucarp is
> launched (S10 vs S98), the cards (or the driver?) don't have link until
> some 25 seconds after running the network startup script. So when
> ucarp runs, the network still isn't really working. ucarp sends the
> gratuitous-arp but it gets lost. After some seconds the link comes up and
> the heartbeats reach the secondary server, which goes into backup state
> and releases the VIP. But, as the router hasn't received the
> gratuitous-arp, in its table the VIP still belongs to the secondary
> server. All traffic to the VIP gets routed to the secondary server,
> which drops it as it doesn't recognize the VIP any more. This last point
> was verified with dumps on both the router and the secondary server and
> by taking a look at the arp table on the router.
>
> There are two things that make me think the driver has to do with this
> issue:
>
> - I've talked with the people in charge of all the networking systems
> and there has been no flapping on the port the server is plugged into. In
> other words, according to the switch (Cisco Catalyst 4510), that link
> has never gone down.
>
> - I've inserted both a mii-tool and an ethtool call in the ucarp startup
> script, just before launching ucarp. According to both of them the link
> is UP at that moment. But according to the messages by e1000_watchdog
> on syslog, the link goes UP a couple of seconds after that!!! And in any
> case the first packets sent by ucarp never leave the server.
>
> Besides, after all this testing I've tried upgrading the driver to the
> latest e1000e-0.5.11.2. Same problem, same log traces (bring up
> interface succeeded -> ucarp runs -> link UP), same behavior when
> studying the traffic with dumps.
>
> On a side note: the VIP works with ucarp 1.5. The first gratuitous-arp
> still gets lost, but it sends an additional one when the link gets up
> and it receives the heartbeats from the other server, "fixing" the
> router's arp table at that moment.
>
> So, is this a known issue with the e1000/e1000e drivers? Has anybody else
> experienced a similar situation? Just after a reboot, apparently having
> the network up but losing traffic for some seconds? Why do mii-tool and
> ethtool report that the link is UP, but it appears as going UP on syslog
> a couple of seconds after that? Is there any other way to check the link
> status?
>
> Thanks in advance.
>
> Regards
>
> --
> Vicente Aguilar <bise...@bisente.com> | http://www.bisente.com
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
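
On the question of another way to check link status: if the cause is spanning tree as suggested above, the link really is up and no link-state tool will show the problem. One possible workaround, sketched here only as an illustration, is to delay ucarp until traffic actually gets through, for example until the router answers a ping. Everything below is an assumption and not something from the thread: the helper names `gateway_answers` and `wait_until_forwarding` are hypothetical, the 60 second limit is arbitrary, and it needs a Python 3 interpreter plus the Linux iputils `ping` (`-c` count, `-W` reply timeout in seconds).

```python
import subprocess
import time


def gateway_answers(host, timeout_s=1):
    """Return True if a single ICMP echo to `host` is answered.

    Assumes the Linux iputils ping: -c is the packet count and
    -W is the time to wait for a reply, in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_until_forwarding(probe, max_wait_s=60.0, interval_s=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll `probe()` until it returns True or `max_wait_s` has elapsed.

    `clock` and `sleep` are injectable so the loop can be exercised
    without real waiting. Returns True on success, False on timeout.
    """
    deadline = clock() + max_wait_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A startup script could call `wait_until_forwarding(lambda: gateway_answers(router_ip))`, where `router_ip` is whatever address your router has on that segment, and launch ucarp only once it returns True. This does not replace enabling portfast on the switch port; it only avoids sending the first gratuitous-arp into a port that is not forwarding yet.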