Re: [CentOS] Kernel 2.6.18-53.1.13.el5 fails on network.
On Thu, Feb 14, 2008 at 03:08:54PM +1100, Steven Haigh wrote:
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of nate
> > Sent: Thursday, 14 February 2008 2:46 PM
> > To: centos@centos.org
> > Subject: Re: [CentOS] Kernel 2.6.18-53.1.13.el5 fails on network.
> >
> > Indunil Jayasooriya wrote:
> > > I also got this type of problem once before. Please check the initrd image. Please perform the steps below.
> >
> > While it's always good to make sure your initrd is in a good state, the network drivers don't need to be in the initrd (unless you're booting from NFS or something). They can be loaded fine from /lib/modules/`uname -r`.
> >
> > What kind of network chip(s) are in the system? What driver are they using (/etc/modprobe.conf)? It'd be helpful to have the output of dmesg as well from the kernel that doesn't provide networking support.
>
> The network is an e100 - dmesg shows the following:
>
> # dmesg | grep e100:
> e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
> e100: Copyright(c) 1999-2005 Intel Corporation
> e100: eth0: e100_probe: addr 0xdfffe000, irq 169, MAC addr 00:02:B3:8B:BE:26
> e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
>
> Of course, this doesn't give us the exact chip; however, mii-tool is a bit more helpful:
>
> # mii-tool -v eth0
> eth0: negotiated 100baseTx-FD, link ok
>   product info: Intel 82555 rev 4
>   basic mode:   autonegotiation enabled
>   basic status: autonegotiation complete, link ok
>   capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
>   advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
>   link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
>
> The interesting part for me, however, is that certain things unrelated to the network also fail. I would expect iptables to come up OK on boot, even if no network device was configured, as it's independent of network configuration. It also doesn't explain why the firmware microcode update fails as well.

I had a similar problem with a Linux system (Fedora) that was running SELinux in enforcing mode (as CentOS does by default). I had booted from a CD that did not support SELinux and edited some configuration files (like ifcfg-eth0), which lost their proper SELinux labels as a result. This is most probably different from what you are seeing (one kernel working OK, the other not); no one was tinkering with /lib/modules from a non-SELinux CD, right?

> > You could write a script for some person at the remote co-lo to execute when the system comes up w/o network. The results could be stored in a file on the disk, and when the system is rebooted again under the old kernel you can examine them for possible causes. Some commands to try:
> >
> > dmesg
> > ifconfig -a
> > mii-tool
> > route -n
> > ping -c 5 (IP of default gateway)
> > arping -c 5 (IP of default gateway)
> > arp -an
> > lsmod
>
> I have a bit of trouble with this, as the only person that can do it is around 30 minutes' travel from the colo. I'm thinking of writing a script that runs as the system boots, gathers this, then changes the default=x line in /etc/grub.conf and reboots the system. Obviously, though, I want to make sure it works 100% before I tell the machine to reboot ;)

An IP KVM device would be your friend (unfortunately they are not cheap...)

> --
> Steven Haigh
> Email: [EMAIL PROTECTED]
> Web: http://www.crc.id.au
> Phone: (03) 9001 6090 - 0412 935 897

Best regards,
Wojtek
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
RE: [CentOS] Kernel 2.6.18-53.1.13.el5 fails on network.
On Thu, 2008-02-14 at 14:57 +1100, Steven Haigh wrote:
> There are a number of differences in the initrd, although nothing that I would call an obvious cause of the issue.
>
> # gunzip -cd /boot/initrd-2.6.18-8.1.8.el5.img | cpio -t | more
> 6097 blocks
> bin
> <snip>
> sys
> etc
>
> # gunzip -cd /boot/initrd-2.6.18-53.1.13.el5.img | cpio -t | more
> 9679 blocks
> bin
> bin/dmraid
> <snip>
> sys
> etc

Do yourself a favor, as you'll probably have several more comparisons to do. When making the lists, sort the output, either by piping it through sort or by making a sorted copy afterward, and use comm (man comm). You can get a nice consolidated output, or select any combination of: only in file1, only in file2, in both. That makes detecting differences much faster.

<snip>

If grub had a one-time next-boot option like LILO, I'd have some more thoughts, but *sigh*

--
Bill
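The sort-and-comm comparison suggested above could look roughly like this. It is a minimal sketch, not something posted in the thread: the function names (list_initrd, only_in_old, only_in_new) and the /tmp file names are made up for illustration, and it assumes the initrd images are gzip-compressed cpio archives, as the gunzip | cpio -t commands in the thread imply.

```shell
#!/bin/bash
# Sketch only: helper names and /tmp paths are hypothetical.

# Print the file names inside a gzip-compressed cpio initrd, sorted.
# The "N blocks" summary from cpio goes to stderr and is discarded.
list_initrd() {
    gunzip -cd "$1" | cpio -t 2>/dev/null | LC_ALL=C sort
}

# Entries that appear only in the first (old) sorted listing.
only_in_old() {
    LC_ALL=C comm -23 "$1" "$2"
}

# Entries that appear only in the second (new) sorted listing.
only_in_new() {
    LC_ALL=C comm -13 "$1" "$2"
}

# Example use (run as root on the affected box):
#   list_initrd /boot/initrd-2.6.18-8.1.8.el5.img     > /tmp/initrd-old.lst
#   list_initrd /boot/initrd-2.6.18-53.1.13.el5.img   > /tmp/initrd-new.lst
#   only_in_old /tmp/initrd-old.lst /tmp/initrd-new.lst
#   only_in_new /tmp/initrd-old.lst /tmp/initrd-new.lst
```

Both sort and comm are run with LC_ALL=C so that they agree on ordering. Going by the full listings posted elsewhere in this thread, the entries present only in the newer image would be bin/dmraid, bin/kpartx and lib/firmware.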
RE: [CentOS] Kernel 2.6.18-53.1.13.el5 fails on network.
Steven Haigh wrote:
> I have a bit of trouble with this, as the only person that can do it is around 30 minutes' travel from the colo. I'm thinking of writing a script that runs as the system boots, gathers this, then changes the default=x line in /etc/grub.conf and reboots the system. Obviously, though, I want to make sure it works 100% before I tell the machine to reboot ;)

I looked at your original email again, and if I read your previous kernel version right, it's been over a year since you last updated the kernel? (2.6.18-8 was released 1/07 by RH, though I can't find 8.1.8.)

I was browsing through the changelog and saw several e100-related changes, which could be related to the network end of your problems.

Without more detailed information about the error messages for the failures, the best thing I can suggest at this point is to try a few kernels in between the one you were on and the latest and see if any of them break. Likely they will, as the latest kernel only has 1 change in it. Maybe you can narrow it down to a particular kernel rev.

nate
Re: [CentOS] Kernel 2.6.18-53.1.13.el5 fails on network.
> After the latest lot of kernel security updates came out, I updated one of my colo boxes and rebooted. It didn't come back up, and it fails when booting on:
>
> * CPU Microcode update
> * iptables
> * eth0
>
> The boot process completes; however, as you can imagine, there is no network connectivity at all. The only config change was installing the new kernel. Booting back into 2.6.18-8.1.8.el5 makes things work 100% again.

I also had this type of problem once before. Please check the initrd image by performing the steps below.

First list the contents of the initrd for the working kernel:

gunzip -cd /boot/initrd-2.6.18-8.1.8.el5.img | cpio -t | more

Then check the initrd of your newly installed kernel the same way:

gunzip -cd /boot/initrd-2.6.18-53.1.13.el5.img | cpio -t | more

Please check what is missing. If something is, all you have to do is make a new initrd using the mkinitrd command. Please see the URL below:

http://readlist.com/lists/centos.org/centos/2/13952.html

--
Thank you
Indunil Jayasooriya
RE: [CentOS] Kernel 2.6.18-53.1.13.el5 fails on network.
There are a number of differences in the initrd, although nothing that I would call an obvious cause of the issue.

# gunzip -cd /boot/initrd-2.6.18-8.1.8.el5.img | cpio -t | more
6097 blocks
bin
bin/modprobe
bin/insmod
bin/nash
dev
dev/tty6
dev/zero
dev/tty5
dev/console
dev/ram1
dev/ttyS3
dev/tty0
dev/ttyS0
dev/null
dev/tty3
dev/tty10
dev/ram0
dev/ptmx
dev/rtc
dev/tty
dev/tty8
dev/ttyS1
dev/systty
dev/ram
dev/tty7
dev/tty1
dev/tty11
dev/tty4
dev/tty2
dev/tty12
dev/tty9
dev/ttyS2
dev/mapper
proc
lib
lib/jbd.ko
lib/uhci-hcd.ko
lib/ext3.ko
lib/ohci-hcd.ko
lib/ehci-hcd.ko
init
sysroot
sbin
sys
etc

# gunzip -cd /boot/initrd-2.6.18-53.1.13.el5.img | cpio -t | more
9679 blocks
bin
bin/dmraid
bin/modprobe
bin/insmod
bin/kpartx
bin/nash
dev
dev/tty6
dev/zero
dev/tty5
dev/console
dev/ram1
dev/ttyS3
dev/tty0
dev/ttyS0
dev/null
dev/tty3
dev/tty10
dev/ram0
dev/ptmx
dev/rtc
dev/tty
dev/tty8
dev/ttyS1
dev/systty
dev/ram
dev/tty7
dev/tty1
dev/tty11
dev/tty4
dev/tty2
dev/tty12
dev/tty9
dev/ttyS2
dev/mapper
proc
lib
lib/jbd.ko
lib/uhci-hcd.ko
lib/ext3.ko
lib/firmware
lib/ohci-hcd.ko
lib/ehci-hcd.ko
init
sysroot
sbin
sys
etc

The obvious additions in .53 are kpartx and dmraid; however, as I'm using a plain HDD (hda) with no RAID, I don't really think that would cause an issue.

--
Steven Haigh
Email: [EMAIL PROTECTED]
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Re: [CentOS] Kernel 2.6.18-53.1.13.el5 fails on network.
Indunil Jayasooriya wrote:
> I also got this type of problem once before. Please check the initrd image. Please perform the steps below.

While it's always good to make sure your initrd is in a good state, the network drivers don't need to be in the initrd (unless you're booting from NFS or something). They can be loaded fine from /lib/modules/`uname -r`.

What kind of network chip(s) are in the system? What driver are they using (/etc/modprobe.conf)? It'd be helpful to have the output of dmesg as well from the kernel that doesn't provide networking support.

You could write a script for some person at the remote co-lo to execute when the system comes up w/o network. The results could be stored in a file on the disk, and when the system is rebooted again under the old kernel you can examine them for possible causes. Some commands to try:

dmesg
ifconfig -a
mii-tool
route -n
ping -c 5 (IP of default gateway)
arping -c 5 (IP of default gateway)
arp -an
lsmod

nate
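A collection script along the lines nate describes might be sketched like this. The helper names (run_logged, collect_diag), the log path and the gateway address in the usage comment are placeholders invented for illustration; the command list itself is the one given in the message above.

```shell
#!/bin/bash
# Sketch only: function names, log path and gateway IP are hypothetical.

# Append a header line and the command's stdout+stderr to the log file.
# A failing or missing command is noted in the log instead of aborting.
run_logged() {
    local log="$1"
    shift
    echo "=== $* ===" >> "$log"
    "$@" >> "$log" 2>&1 || echo "(exit status $?)" >> "$log"
}

# Run the diagnostics suggested in the thread against one gateway IP.
collect_diag() {
    local log="$1" gateway="$2"
    run_logged "$log" dmesg
    run_logged "$log" ifconfig -a
    run_logged "$log" mii-tool
    run_logged "$log" route -n
    run_logged "$log" ping -c 5 "$gateway"
    run_logged "$log" arping -c 5 "$gateway"
    run_logged "$log" arp -an
    run_logged "$log" lsmod
    sync    # flush the log to disk before any reboot
}

# Example use (the gateway address here is a placeholder):
#   collect_diag /root/netdiag-$(uname -r).log 192.0.2.1
```

Putting the kernel version in the log name keeps the results from the failing kernel separate from any run made under the working one.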
RE: [CentOS] Kernel 2.6.18-53.1.13.el5 fails on network.
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of nate
Sent: Thursday, 14 February 2008 2:46 PM
To: centos@centos.org
Subject: Re: [CentOS] Kernel 2.6.18-53.1.13.el5 fails on network.

> Indunil Jayasooriya wrote:
> > I also got this type of problem once before. Please check the initrd image. Please perform the steps below.
>
> While it's always good to make sure your initrd is in a good state, the network drivers don't need to be in the initrd (unless you're booting from NFS or something). They can be loaded fine from /lib/modules/`uname -r`.
>
> What kind of network chip(s) are in the system? What driver are they using (/etc/modprobe.conf)? It'd be helpful to have the output of dmesg as well from the kernel that doesn't provide networking support.

The network is an e100 - dmesg shows the following:

# dmesg | grep e100:
e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
e100: eth0: e100_probe: addr 0xdfffe000, irq 169, MAC addr 00:02:B3:8B:BE:26
e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex

Of course, this doesn't give us the exact chip; however, mii-tool is a bit more helpful:

# mii-tool -v eth0
eth0: negotiated 100baseTx-FD, link ok
  product info: Intel 82555 rev 4
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD

The interesting part for me, however, is that certain things unrelated to the network also fail. I would expect iptables to come up OK on boot, even if no network device was configured, as it's independent of network configuration. It also doesn't explain why the firmware microcode update fails as well.

> You could write a script for some person at the remote co-lo to execute when the system comes up w/o network. The results could be stored in a file on the disk, and when the system is rebooted again under the old kernel you can examine them for possible causes. Some commands to try:
>
> dmesg
> ifconfig -a
> mii-tool
> route -n
> ping -c 5 (IP of default gateway)
> arping -c 5 (IP of default gateway)
> arp -an
> lsmod

I have a bit of trouble with this, as the only person that can do it is around 30 minutes' travel from the colo. I'm thinking of writing a script that runs as the system boots, gathers this, then changes the default=x line in /etc/grub.conf and reboots the system. Obviously, though, I want to make sure it works 100% before I tell the machine to reboot ;)

--
Steven Haigh
Email: [EMAIL PROTECTED]
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
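The "change the default=x line, then reboot" step could be sketched as below. This is an illustration under stated assumptions, not a tested recovery procedure: the function name set_grub_default is made up, the index of the known-good kernel entry (1 in the usage comment) is a guess that has to be checked against the title lines in the real grub.conf, and the reboot call is deliberately left commented out. The file is rewritten through a temporary copy rather than edited in place, because /etc/grub.conf is normally a symlink and an in-place edit could replace the link with a regular file.

```shell
#!/bin/bash
# Sketch only: set_grub_default is a hypothetical helper, not a grub tool.

# Rewrite the "default=N" line of a grub.conf-style file.
# Keeps a backup as <file>.bak and refuses anything but a plain number.
set_grub_default() {
    local conf="$1" index="$2" tmp rc
    case "$index" in
        ''|*[!0-9]*)
            echo "index must be a non-negative integer" >&2
            return 1 ;;
    esac
    grep -q '^default=' "$conf" || {
        echo "no default= line in $conf" >&2
        return 1
    }
    cp -p "$conf" "$conf.bak" || return 1
    tmp=$(mktemp) || return 1
    # Write via cat so that a symlinked config file stays a symlink.
    sed "s/^default=.*/default=$index/" "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Example use, after the diagnostics have been written to disk
# (index 1 is an assumption; count the title lines from 0 to confirm):
#   set_grub_default /etc/grub.conf 1 && sync   # && /sbin/reboot
```

Running the function against a copy of grub.conf first, and diffing the result, is one way to gain the "works 100%" confidence asked for before the box is told to reboot.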