Hey, thanks for the help; I'll take anything I can get at this point. Knowing what alloc_skb deals with is about as far as I've gotten; I just have no idea how to fix the problem. Yes, our intent is to upgrade the kernel on everything, but with our current load I can't take a machine out of service to do it, and I don't have a spare machine to test on yet. That said, if someone could confirm that a kernel upgrade is what fixed this problem for them, that would certainly change things.
My apologies, but how do I tell whether the gigabit card is using jumbo frames? The drivers bound are e100 on eth0 and e1000 on eth2. Unfortunately, yes, we are using both cards in production, and neither can be removed. My dmesg output is huge, but here it goes. You'll see the error I'm talking about repeated over and over:

und 0x8086:0x1960:idx 0:bus 0:slot 8:func 1
scsi0 : Found a MegaRAID controller at 0xfc80d000, IRQ: 14
megaraid: [3.13:1.43] detected 1 logical drives
scsi0 : AMI MegaRAID 3.13 254 commands 16 targs 1 chans 8 luns
scsi : 1 host.
scsi0: scanning channel 1 for devices.
Vendor: DELL Model: 1x8 U2W SCSI BP Rev: 5.35
Type: Processor ANSI SCSI revision: 02
scsi0: scanning virtual channel for logical drives.
Vendor: MegaRAID Model: LD0 RAID5 69112R Rev: 3.13
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 1, id 0, lun 0
SCSI device sda: hdwr sector= 512 bytes. Sectors= 141541376 [69112 MB] [69.1 GB]
sda: sda1 sda2 < sda5 sda6 sda7 sda8 sda9 sda10 >
(scsi1) <Adaptec AIC-7890/1 Ultra2 SCSI host adapter> found at PCI 2/4/0
(scsi1) Wide Channel, SCSI ID=7, 32/255 SCBs
(scsi1) Downloading sequencer code... 396 instructions downloaded
enable_irq() unbalanced from fc821b26
(scsi2) <Adaptec AIC-7890/1 Ultra2 SCSI host adapter> found at PCI 2/6/0
(scsi2) Wide Channel, SCSI ID=7, 32/255 SCBs
(scsi2) Downloading sequencer code... 396 instructions downloaded
enable_irq() unbalanced from fc821b26
(scsi3) <Adaptec AIC-7860 Ultra SCSI host adapter> found at PCI 2/8/0
(scsi3) Narrow Channel, SCSI ID=7, 3/255 SCBs
(scsi3) Downloading sequencer code... 423 instructions downloaded
enable_irq() unbalanced from fc821b26
scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4 <Adaptec AIC-7890/1 Ultra2 SCSI host adapter>
scsi2 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4 <Adaptec AIC-7890/1 Ultra2 SCSI host adapter>
scsi3 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4 <Adaptec AIC-7860 Ultra SCSI host adapter>
scsi : 4 hosts.
(scsi3:0:5:0) Synchronous at 20.0 Mbyte/sec, offset 15.
Vendor: NEC Model: CD-ROM DRIVE:466 Rev: 1.06
Type: CD-ROM ANSI SCSI revision: 02
Detected scsi CD-ROM sr0 at scsi3, channel 0, id 5, lun 0
sr0: scsi3-mmc drive: 17x/40x cd/rw xa/form2 cdda tray
Uniform CDROM driver Revision: 2.56
autodetecting RAID arrays
autorun ...
... autorun DONE.
VFS: Mounted root (ext2 filesystem) readonly.
change_root: old root has d_count=1
Trying to unmount old root ... okay
Freeing unused kernel memory: 76k freed
Adding Swap: 2048248k swap-space (priority -1)
Intel(R) PRO/100 Fast Ethernet Adapter - Loadable driver, ver. 1.2.1
Copyright (c) 2000 Intel Corporation
e100 - Intel(R) PRO/100+ Dual Port Server Adapter (Port 1)
eth0: Mem:0xfe7ff008 IRQ:19 Speed:100 Mbps Dx:Half
e100 - Intel(R) PRO/100+ Dual Port Server Adapter (Port 2)
eth1: Mem:0xfe7fe008 IRQ:16 Speed:0 Mbps Dx:N/A
Failed to detect cable link.
Speed and duplex will be determined at time of connection.
Intel(R) PRO/1000 Gigabit Ethernet Adapter - Loadable driver, ver. 2.0.6
Copyright (c) 1999-2000 Intel Corporation
Intel(R) PRO/1000 Gigabit Adapter (SC - Fiber)
eth2: Mem:0xfe400000 IRQ:11 Speed:1000 Mbps Dx:Full
e1000: eth2 Link is Down
e1000: eth2 1000Mbs Full Duplex Link is Up
Installing knfsd (copyright (C) 1996 [EMAIL PROTECTED]).
nfsd_fh_init : initialized fhcache, entries=1024
!Proc_Rec_Ints cannot alloc_skb memory
[the line above repeats many more times]
exp_do_unexport: 08:06 last use, flushing cache
exp_do_unexport: 08:0a last use, flushing cache
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: last server exiting
nfsd_fh_shutdown : freeing 1024 fhcache entries.
nfsd_fh_init : initialized fhcache, entries=1024
VFS: Disk change detected on device fd(2,0)
!Proc_Rec_Ints cannot alloc_skb memory
[the line above repeats many more times]
e1000: eth2 Link is Down
e1000: eth2 1000Mbs Full Duplex Link is Up
!Proc_Rec_Ints cannot alloc_skb memory
[the line above repeats many more times]

-----Original Message-----
From: Robert Dege [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, January 16, 2002 7:36 AM
To: [EMAIL PROTECTED]
Subject: RE: skb problem

alloc_skb deals with network buffers and memory management. If all of these machines are the same, it might be in your best interest to take one machine out of the loop and run a few tests on it.

Your kernel (2.2.14-6) sounds like a prebuilt Red Hat kernel. Have you tried compiling a kernel yourself, from source obtained from ftp.kernel.org?

A few things to check: Is your gigabit Ethernet card using jumbo frames (Ethernet frames larger than the standard 1500-byte MTU, which need correspondingly larger network buffers)? Check your /etc/modules.conf file and see which drivers are bound to the eth* devices. Do you need to use both cards? What happens if you take one card out and boot the machine?

Unfortunately, I'm no kernel hacker, so all I can offer is a process-of-elimination approach. You can always post your ifconfig and dmesg output to the list.

-Rob

> My apologies. Yes, the hardware is identical: each box has two NICs, an Intel PRO 10/100 (e100) and an Intel fiber gigabit card (e1000).
> We're running Red Hat 6.2 and kernel 2.2.14-6.
>
> -----Original Message-----
> From: Robert Dege [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, January 15, 2002 3:37 PM
> To: [EMAIL PROTECTED]
> Subject: Re: skb problem
>
> Well, it appears to be a kernel problem. What kernel are you running on these machines? Also, is the hardware the same on each of these boxes, specifically the NIC?
>
> More info is appreciated.
>
> -Rob
>
> > I'm hoping someone out there can help us; we're desperate. We've got 9 web servers all experiencing a strange problem. It's random and happens to individual servers at different times. There doesn't appear to be a pattern, and it isn't related to traffic or load; we've had them go down in the middle of the night. You can still ping them, and their NFS mounts are available, but we cannot telnet, ssh, or log in at the console. The only way out of it is to power off. If anyone has seen this, please email back. The error we see in /var/log/messages just prior to rebooting is:
> >
> > kernel: !Proc_Rec_Ints cannot alloc_skb memory
> >
> > If it repeats itself 30 or 40 times, the box is gone.
> >
> > Thanks for the assistance.
> >
> > Doug Tucker
> > Systems Administrator - Belo Interactive <http://www.belointeractive.com/>
> > Phone: 214.977.4016
> > Page: 877.417.4750
> >
> > _______________________________________________
> > Redhat-list mailing list
> > [EMAIL PROTECTED]
> > https://listman.redhat.com/mailman/listinfo/redhat-list
>
> --
> -Rob

--
-Rob
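On Doug's question of how to tell whether the gigabit card is using jumbo frames: jumbo frames simply mean an MTU above the standard 1500 bytes, so the interface MTU in the ifconfig output Rob asked for is the thing to look at. Below is a minimal Python sketch, assuming the classic net-tools ifconfig layout with an `MTU:<n>` field on each interface's flags line (newer net-tools prints `mtu 1500` instead, and the regexes would need adjusting). The function names and the sample output are hypothetical, made up for illustration and not taken from these servers.

```python
import re

# Assumed classic net-tools ifconfig layout:
#   eth2      Link encap:Ethernet  HWaddr ...
#             UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
IFACE_RE = re.compile(r"^(\S+)\s+Link encap:(\S+)")
MTU_RE = re.compile(r"\bMTU:(\d+)")

STANDARD_ETHERNET_MTU = 1500


def ethernet_mtus(ifconfig_text):
    """Return {interface: mtu} for the Ethernet interfaces in ifconfig output."""
    mtus = {}
    current = None
    for line in ifconfig_text.splitlines():
        header = IFACE_RE.match(line)
        if header:
            # Track Ethernet devices only; loopback and others are skipped.
            current = header.group(1) if header.group(2) == "Ethernet" else None
            continue
        mtu = MTU_RE.search(line)
        if mtu and current is not None:
            mtus[current] = int(mtu.group(1))
    return mtus


def jumbo_interfaces(mtus, standard=STANDARD_ETHERNET_MTU):
    """Interfaces whose MTU exceeds the standard Ethernet MTU."""
    return sorted(name for name, mtu in mtus.items() if mtu > standard)


# Hypothetical sample output, for illustration only.
SAMPLE_IFCONFIG = """\
eth0      Link encap:Ethernet  HWaddr 00:00:00:00:00:01
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eth2      Link encap:Ethernet  HWaddr 00:00:00:00:00:02
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
lo        Link encap:Local Loopback
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
"""

if __name__ == "__main__":
    mtus = ethernet_mtus(SAMPLE_IFCONFIG)
    print(mtus)
    print(jumbo_interfaces(mtus))
```

Fed the real ifconfig text from one of the boxes, an empty list would mean every Ethernet interface is at the standard 1500-byte MTU, which would let you cross jumbo frames off Rob's list of suspects.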
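Rob's suggestion to check which drivers /etc/modules.conf binds to the eth* devices can be scripted, which helps when comparing nine supposedly identical boxes. This Python sketch assumes the usual modutils-style `alias ethN <module>` lines and ignores everything else; the function name and the sample file contents are hypothetical (the sample just mirrors the e100/e1000 bindings Doug reported).

```python
def eth_drivers(modules_conf_text):
    """Return {interface: module} from 'alias ethN <module>' lines."""
    drivers = {}
    for raw in modules_conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        fields = line.split()
        if len(fields) == 3 and fields[0] == "alias" and fields[1].startswith("eth"):
            drivers[fields[1]] = fields[2]
    return drivers


# Hypothetical modules.conf contents, mirroring the drivers Doug reported.
SAMPLE_MODULES_CONF = """\
alias eth0 e100
alias eth2 e1000
alias scsi_hostadapter megaraid
# alias eth1 e100   (commented out, so ignored)
"""

if __name__ == "__main__":
    for iface, module in sorted(eth_drivers(SAMPLE_MODULES_CONF).items()):
        print(iface, "->", module)
```

Running it against each server's real file and diffing the results would quickly show whether any box is loading a different driver than the others.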
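Doug observed that once "!Proc_Rec_Ints cannot alloc_skb memory" repeats 30 or 40 times, the box is gone. A watcher that spots a burst of these messages might give an early warning, for example to pull a server out of rotation before it hangs. Below is a minimal Python sketch of a sliding-window counter over syslog lines. The threshold of 30 comes from the thread; the 5-minute window, the function name, and the sample host name are assumptions. Whether such a warning would arrive early enough to act on is untested.

```python
from collections import deque
from datetime import datetime, timedelta

# Message text taken from the thread; everything else here is an assumption.
PATTERN = "cannot alloc_skb memory"


def first_alert(log_lines, year, threshold=30, window=timedelta(minutes=5)):
    """Return the syslog timestamp at which `threshold` alloc_skb failures
    have occurred within `window`, or None if that never happens.

    syslog lines start with "Mon DD HH:MM:SS" and carry no year, so the
    caller supplies one. Year rollovers are not handled.
    """
    recent = deque()
    for line in log_lines:
        if PATTERN not in line:
            continue
        stamp_text = line[:15]
        stamp = datetime.strptime(f"{year} {stamp_text}", "%Y %b %d %H:%M:%S")
        recent.append(stamp)
        # Discard failures that have slid out of the time window.
        while stamp - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            return stamp_text
    return None


if __name__ == "__main__":
    # Hypothetical burst: one failure per second from host "web1".
    burst = [
        f"Jan 16 03:00:{s:02d} web1 kernel: !Proc_Rec_Ints cannot alloc_skb memory"
        for s in range(40)
    ]
    print(first_alert(burst, 2002))
```

A deque keeps only the failures inside the current window, so each log line costs amortized constant time. Because the affected server itself becomes unreachable, this would most usefully run wherever a copy of its log can still be read.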