Nick, 

Does each of your ifcfg-eth* files contain the real HWADDR for its
ethernet device? I've seen a lot of setups (every place I have walked
into) where the files are either duplicated from other ifcfg-eth files
or bashed together inconsistently, which causes grief when it comes to
the bond picking which MAC address to use, etc. (It's actually possible
to tell the bond to use a totally different one, but I'm not sure what
the recommended practice is; I've always just ensured the real HW addrs
are in there.) The logs seem to imply that this is one of the conflicts
you are running into.

Sounds simple, but I've seen a lot of things blow up because of it.
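
If it helps, here is a rough Python sketch of the check I mean. It is
only an illustration, not a tested tool: it takes the text of each
ifcfg-eth* file plus the permanent address of each device (which you
could read by hand with "ethtool -P eth2", or from the "Permanent HW
addr" lines in /proc/net/bonding/bond1) and reports missing, duplicated,
or mismatched HWADDR values. The function names and the sample file
contents are made up for the example; the eth2 address is the one from
the warning in your log, and the eth3 address is a placeholder.

```python
import re

# Matches a line such as HWADDR=00:1E:C9:CD:D9:4C (optionally quoted).
HWADDR_RE = re.compile(
    r'^\s*HWADDR\s*=\s*"?([0-9A-Fa-f:]{17})"?\s*$', re.MULTILINE
)


def parse_hwaddr(ifcfg_text):
    """Return the HWADDR from ifcfg file contents (upper-cased), or None."""
    match = HWADDR_RE.search(ifcfg_text)
    return match.group(1).upper() if match else None


def check_hwaddrs(ifcfg_texts, permanent):
    """Compare configured HWADDRs against the permanent ones.

    ifcfg_texts: {device name: contents of its ifcfg file}
    permanent:   {device name: permanent MAC of that device}
    Returns a list of problem descriptions (empty if all looks sane).
    """
    problems = []
    seen = {}  # configured MAC -> first device that claimed it
    for dev in sorted(ifcfg_texts):
        mac = parse_hwaddr(ifcfg_texts[dev])
        if mac is None:
            problems.append(f"{dev}: no HWADDR line")
            continue
        if mac in seen:
            problems.append(f"{dev}: HWADDR {mac} duplicates {seen[mac]}")
        else:
            seen[mac] = dev
        real = permanent.get(dev)
        if real is not None and real.upper() != mac:
            problems.append(
                f"{dev}: HWADDR {mac} does not match permanent {real.upper()}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical file contents: eth3 was copied from eth2 unchanged.
    sample = {
        "eth2": "DEVICE=eth2\nHWADDR=00:1E:C9:CD:D9:4C\nMASTER=bond1\nSLAVE=yes\n",
        "eth3": "DEVICE=eth3\nHWADDR=00:1E:C9:CD:D9:4C\nMASTER=bond1\nSLAVE=yes\n",
    }
    # The eth3 address below is a placeholder, not taken from the logs.
    permanent = {"eth2": "00:1E:C9:CD:D9:4C", "eth3": "00:1E:C9:CD:D9:4E"}
    for problem in check_hwaddrs(sample, permanent):
        print(problem)
```

On a real box you would fill the first dictionary by reading the files
under /etc/sysconfig/network-scripts/ and the second from ethtool or
/proc/net/bonding; I left that out so the sketch runs anywhere.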

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Lunt
Sent: Friday, January 29, 2010 1:21 AM
To: Red Hat Enterprise Linux 5 (Tikanga) discussion mailing-list
Subject: [rhelv5-list] Suspect bonding errors

Hi folks

We have an NFS server with a bonded IP. It shares directories with 5
clients.
The server and the clients talk to each other on a private subnet. 
This subnet is a 192. address.

The server also has another bonded IP for user access on a 10. address.

Yesterday the NFS mounts failed. Trying to remount them gave us the
"permission denied" error, however the NFS server logs showed
"successfully authenticated".

We arranged down time last night and restarted NFS on the server, then
restarted NFS on the clients. However the clients would still not mount
the NFS shares.

We decided to reboot the NFS server. It came back up and we got on it
via the 10. address, but we could not even ping the NFS clients.
Then, after nearly an hour of scratching our heads, ping suddenly
started working and the clients could mount the NFS shares.
However, I reckon it could fail again at any moment.

Checking the server logs I noticed some info for bond1, the 192. address
the server uses to talk to the clients:

Jan 28 20:39:07 findb kernel: bonding: bond1 is being created...
Jan 28 20:39:07 findb kernel: bonding: bond1: Adding slave eth2.
Jan 28 20:39:07 findb kernel: bonding: bond1: enslaving eth2 as a backup
interface with a down link.
Jan 28 20:39:07 findb kernel: bonding: bond1: Adding slave eth3.
Jan 28 20:39:07 findb kernel: bonding: bond1: enslaving eth3 as a backup
interface with a down link.
Jan 28 20:39:07 findb kernel: bonding: bond1: link status definitely up
for interface eth3.
Jan 28 20:39:07 findb kernel: bonding: bond1: making interface eth3 the
new active one.
Jan 28 20:39:07 findb kernel: bonding: bond1: first active interface up!
Jan 28 20:39:07 findb kernel: bonding: bond1: link status definitely up
for interface eth2.
Jan 28 21:23:08 findb kernel: bonding: bond1: Removing slave eth2
Jan 28 21:23:08 findb kernel: bonding: bond1: Warning: the permanent
HWaddr of eth2 - 00:1E:C9:CD:D9:4C - is still in use by bond1. Set the
HWaddr of eth2 to a different address to avoid conflicts.
Jan 28 21:23:08 findb kernel: bonding: bond1: releasing backup interface
eth2
Jan 28 21:23:08 findb kernel: bonding: bond1: Removing slave eth3
Jan 28 21:23:08 findb kernel: bonding: bond1: releasing active interface
eth3
Jan 28 21:23:13 findb kernel: ADDRCONF(NETDEV_UP): bond1: link is not
ready
Jan 28 21:23:13 findb kernel: bonding: bond1: Adding slave eth2.
Jan 28 21:23:13 findb kernel: bonding: bond1: enslaving eth2 as a backup
interface with a down link.
Jan 28 21:23:13 findb kernel: bonding: bond1: Adding slave eth3.
Jan 28 21:23:14 findb kernel: bonding: bond1: enslaving eth3 as a backup
interface with a down link.
Jan 28 21:23:16 findb kernel: bonding: bond1: link status definitely up
for interface eth2.
Jan 28 21:23:16 findb kernel: bonding: bond1: making interface eth2 the
new active one.
Jan 28 21:23:16 findb kernel: bonding: bond1: first active interface up!
Jan 28 21:23:16 findb kernel: ADDRCONF(NETDEV_CHANGE): bond1: link
becomes ready
Jan 28 21:23:16 findb kernel: bonding: bond1: link status definitely up
for interface eth3.

This doesn't look right to me. What does the Warning line mean?

I'm struggling to work out what the problem is here, any help
appreciated.

Nick.




 

_______________________________________________
rhelv5-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/rhelv5-list



