Nick,

Does each of your ifcfg-eth* files contain the real HWADDR for its ethernet device? I've seen a lot of setups (every place I have walked into) where the HWADDR entries are either duplicated from other ifcfg-eth files or bashed together inconsistently, which causes grief when the bond picks which MAC address to use (it's actually possible to tell the bond to use a totally different one, but I'm not sure what the recommended practice is; I've always just ensured the real HW addrs are in there). The logs seem to imply that this is one of the conflicts you are running into.
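A quick way to audit this is to scan the ifcfg files for missing or duplicated HWADDR entries and compare them against the permanent addresses the bonding driver reports. The following is only a rough sketch: the function names are made up for illustration, it assumes the files live in the usual /etc/sysconfig/network-scripts directory, and it assumes /proc/net/bonding/bond1 lists each slave with "Slave Interface:" and "Permanent HW addr:" lines.

```python
import glob
import os
import re
from collections import defaultdict


def parse_ifcfg(text):
    """Parse KEY=value lines from ifcfg-style text into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be quoted, e.g. HWADDR="00:11:22:33:44:55"
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def find_hwaddr_problems(configs):
    """configs maps file name -> file text.

    Returns (missing, duplicates): files with no HWADDR, and a dict of
    MAC address -> list of files that all claim that address.
    """
    missing = []
    seen = defaultdict(list)
    for name, text in sorted(configs.items()):
        hwaddr = parse_ifcfg(text).get("HWADDR")
        if hwaddr is None:
            missing.append(name)
        else:
            seen[hwaddr.upper()].append(name)
    duplicates = {mac: names for mac, names in seen.items() if len(names) > 1}
    return missing, duplicates


def permanent_hwaddrs(bonding_text):
    """Map slave name -> permanent MAC from /proc/net/bonding/<bond> text."""
    result = {}
    slave = None
    for line in bonding_text.splitlines():
        m = re.match(r"\s*Slave Interface:\s*(\S+)", line)
        if m:
            slave = m.group(1)
            continue
        m = re.match(r"\s*Permanent HW addr:\s*(\S+)", line)
        if m and slave:
            result[slave] = m.group(1).upper()
    return result


def load_configs(directory="/etc/sysconfig/network-scripts"):
    """Read every ifcfg-eth* file in the (assumed) config directory."""
    configs = {}
    for path in glob.glob(os.path.join(directory, "ifcfg-eth*")):
        with open(path) as handle:
            configs[os.path.basename(path)] = handle.read()
    return configs
```

On the server you would call find_hwaddr_problems(load_configs()) and compare each file's HWADDR with what permanent_hwaddrs(open("/proc/net/bonding/bond1").read()) returns for the same interface. Any mismatch, missing entry, or duplicate is worth fixing before restarting the network.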
Sounds simple, but I've seen a lot of things blow up because of it.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Lunt
Sent: Friday, January 29, 2010 1:21 AM
To: Red Hat Enterprise Linux 5 (Tikanga) discussion mailing-list
Subject: [rhelv5-list] Suspect bonding errors

Hi folks

We have an NFS server with a bonded IP. It shares directories with 5 clients. The server and the clients talk to each other on a private subnet, which uses a 192. address. The server also has another bonded IP for user access on a 10. address.

Yesterday the NFS mounts failed. Trying to remount them gave us the "permission denied" error, however the NFS server logs showed "successfully authenticated".

We arranged downtime last night and restarted NFS on the server, then restarted NFS on the clients. However the clients would still not mount the NFS shares.

We decided to reboot the NFS server. It came back up and we got on it via the 10. address, but we could not even ping the NFS clients. Then, after nearly 1 hour of basically scratching our heads, ping suddenly started working and the clients could mount the NFS shares. However I reckon it could fail again at any moment.

Checking the server logs I noticed some info for bond1, the 192. address the server uses to talk to the clients:

Jan 28 20:39:07 findb kernel: bonding: bond1 is being created...
Jan 28 20:39:07 findb kernel: bonding: bond1: Adding slave eth2.
Jan 28 20:39:07 findb kernel: bonding: bond1: enslaving eth2 as a backup interface with a down link.
Jan 28 20:39:07 findb kernel: bonding: bond1: Adding slave eth3.
Jan 28 20:39:07 findb kernel: bonding: bond1: enslaving eth3 as a backup interface with a down link.
Jan 28 20:39:07 findb kernel: bonding: bond1: link status definitely up for interface eth3.
Jan 28 20:39:07 findb kernel: bonding: bond1: making interface eth3 the new active one.
Jan 28 20:39:07 findb kernel: bonding: bond1: first active interface up!
Jan 28 20:39:07 findb kernel: bonding: bond1: link status definitely up for interface eth2.
Jan 28 21:23:08 findb kernel: bonding: bond1: Removing slave eth2
Jan 28 21:23:08 findb kernel: bonding: bond1: Warning: the permanent HWaddr of eth2 - 00:1E:C9:CD:D9:4C - is still in use by bond1. Set the HWaddr of eth2 to a different address to avoid conflicts.
Jan 28 21:23:08 findb kernel: bonding: bond1: releasing backup interface eth2
Jan 28 21:23:08 findb kernel: bonding: bond1: Removing slave eth3
Jan 28 21:23:08 findb kernel: bonding: bond1: releasing active interface eth3
Jan 28 21:23:13 findb kernel: ADDRCONF(NETDEV_UP): bond1: link is not ready
Jan 28 21:23:13 findb kernel: bonding: bond1: Adding slave eth2.
Jan 28 21:23:13 findb kernel: bonding: bond1: enslaving eth2 as a backup interface with a down link.
Jan 28 21:23:13 findb kernel: bonding: bond1: Adding slave eth3.
Jan 28 21:23:14 findb kernel: bonding: bond1: enslaving eth3 as a backup interface with a down link.
Jan 28 21:23:16 findb kernel: bonding: bond1: link status definitely up for interface eth2.
Jan 28 21:23:16 findb kernel: bonding: bond1: making interface eth2 the new active one.
Jan 28 21:23:16 findb kernel: bonding: bond1: first active interface up!
Jan 28 21:23:16 findb kernel: ADDRCONF(NETDEV_CHANGE): bond1: link becomes ready
Jan 28 21:23:16 findb kernel: bonding: bond1: link status definitely up for interface eth3.

This doesn't look right to me. What does the Warning line mean? I'm struggling to work out what the problem is here; any help appreciated.

Nick.
_______________________________________________
rhelv5-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/rhelv5-list
