I'd hate to be linear, but it looks like the messages began appearing once NFS started up. Does the error message appear right after a fresh reboot, or does it take a while to appear (I'm dismissing the evidence from the dmesg output at the moment)?
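If it helps to check that ordering on each box, here is a rough Python sketch (my own illustration, nothing official). It reads kernel log lines, counts the alloc_skb message, and reports the last other line logged before the first occurrence. The helper name summarize_skb_errors is made up for the example, and the message text is simply copied from your log, so adjust it if your driver words it differently.

```python
import sys

# Message text copied from the quoted log; an assumption that it is
# worded the same way on every machine.
ERROR_TEXT = "cannot alloc_skb memory"


def summarize_skb_errors(lines):
    """Count ERROR_TEXT lines and note what was logged just before the first one.

    Hypothetical helper, for illustration only.
    """
    count = 0
    first_index = None
    preceded_by = None
    last_other = None
    for index, raw in enumerate(lines):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if ERROR_TEXT in line:
            count += 1
            if first_index is None:
                first_index = index
                preceded_by = last_other
        else:
            last_other = line
    return {"count": count, "first_index": first_index, "preceded_by": preceded_by}


def main(argv):
    # With a file argument, read that file; otherwise read standard input.
    if len(argv) > 1:
        with open(argv[1]) as handle:
            print(summarize_skb_errors(handle))
    else:
        print(summarize_skb_errors(sys.stdin))
```

To try it, call main(sys.argv) at the bottom of the file and feed it the dmesg output or a copy of /var/log/messages. If "preceded_by" turns out to be an nfsd line on every box, that would support the NFS theory; if not, it points elsewhere.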
Another few things to look at:

What do you use NFS for?
What does your /etc/exports file look like?
Are you using NFS2/3? (use rpcinfo to determine)
Is your kernel configured for NFS2/3?

A quick way to determine if it's an NFS problem would be to disable NFS on boot and see if you get the error message. Or, if the error message is continually populating syslog, just stop the NFS services & see if that does anything.

-Rob

> Hey, thanks for the help, I'll take anything I can get at this point. What
> it deals with was about as far as I've gotten, I just have no idea what to
> do about fixing the problem. Yes, our intent is to upgrade kernel on
> everything, I just can't (with our current load) take one out to do so, and
> don't have a machine available to test on yet. Now, if someone could tell
> me this is what it took to fix the problem, that would certainly change
> things.
>
> My sincere apologies, how do I tell if the gigabit is using large frames?
>
> Drivers bound are e100 on eth0 and e1000 on eth2.
>
> Unfortunately, yes, we are using both cards in production, and one can't be
> removed.
>
> My dmesg output is huge, but here it goes, you'll see the error repeated
> over and over I'm talking about:
>
> und 0x8086:0x1960:idx 0:bus 0:slot 8:func 1
> scsi0 : Found a MegaRAID controller at 0xfc80d000, IRQ: 14
> megaraid: [3.13:1.43] detected 1 logical drives
> scsi0 : AMI MegaRAID 3.13 254 commands 16 targs 1 chans 8 luns
> scsi : 1 host.
> scsi0: scanning channel 1 for devices.
> Vendor: DELL Model: 1x8 U2W SCSI BP Rev: 5.35
> Type: Processor ANSI SCSI revision: 02
> scsi0: scanning virtual channel for logical drives.
> Vendor: MegaRAID Model: LD0 RAID5 69112R Rev: 3.13
> Type: Direct-Access ANSI SCSI revision: 02
> Detected scsi disk sda at scsi0, channel 1, id 0, lun 0
> SCSI device sda: hdwr sector= 512 bytes.
> Sectors= 141541376 [69112 MB] [69.1 GB]
> sda: sda1 sda2 < sda5 sda6 sda7 sda8 sda9 sda10 >
> (scsi1) <Adaptec AIC-7890/1 Ultra2 SCSI host adapter> found at PCI 2/4/0
> (scsi1) Wide Channel, SCSI ID=7, 32/255 SCBs
> (scsi1) Downloading sequencer code... 396 instructions downloaded
> enable_irq() unbalanced from fc821b26
> (scsi2) <Adaptec AIC-7890/1 Ultra2 SCSI host adapter> found at PCI 2/6/0
> (scsi2) Wide Channel, SCSI ID=7, 32/255 SCBs
> (scsi2) Downloading sequencer code... 396 instructions downloaded
> enable_irq() unbalanced from fc821b26
> (scsi3) <Adaptec AIC-7860 Ultra SCSI host adapter> found at PCI 2/8/0
> (scsi3) Narrow Channel, SCSI ID=7, 3/255 SCBs
> (scsi3) Downloading sequencer code... 423 instructions downloaded
> enable_irq() unbalanced from fc821b26
> scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
> <Adaptec AIC-7890/1 Ultra2 SCSI host adapter>
> scsi2 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
> <Adaptec AIC-7890/1 Ultra2 SCSI host adapter>
> scsi3 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
> <Adaptec AIC-7860 Ultra SCSI host adapter>
> scsi : 4 hosts.
> (scsi3:0:5:0) Synchronous at 20.0 Mbyte/sec, offset 15.
> Vendor: NEC Model: CD-ROM DRIVE:466 Rev: 1.06
> Type: CD-ROM ANSI SCSI revision: 02
> Detected scsi CD-ROM sr0 at scsi3, channel 0, id 5, lun 0
> sr0: scsi3-mmc drive: 17x/40x cd/rw xa/form2 cdda tray
> Uniform CDROM driver Revision: 2.56
> autodetecting RAID arrays
> autorun ...
> ... autorun DONE.
> VFS: Mounted root (ext2 filesystem) readonly.
> change_root: old root has d_count=1
> Trying to unmount old root ... okay
> Freeing unused kernel memory: 76k freed
> Adding Swap: 2048248k swap-space (priority -1)
> Intel(R) PRO/100 Fast Ethernet Adapter - Loadable driver, ver.
1.2.1 > Copyright (c) 2000 Intel Corporation > > e100 - Intel(R) PRO/100+ Dual Port Server Adapter (Port 1) > eth0: Mem:0xfe7ff008 IRQ:19 Speed:100 Mbps Dx:Half > > e100 - Intel(R) PRO/100+ Dual Port Server Adapter (Port 2) > eth1: Mem:0xfe7fe008 IRQ:16 Speed:0 Mbps Dx:N/A > Failed to detect cable link. > Speed and duplex will be determined at time of connection. > Intel(R) PRO/1000 Gigabit Ethernet Adapter - Loadable driver, ver. 2.0.6 > Copyright (c) 1999-2000 Intel Corporation > > Intel(R) PRO/1000 Gigabit Adapter (SC - Fiber) > eth2: Mem:0xfe400000 IRQ:11 Speed:1000 Mbps Dx:Full > e1000: eth2 Link is Down > e1000: eth2 1000Mbs Full Duplex Link is Up > Installing knfsd (copyright (C) 1996 [EMAIL PROTECTED]). > nfsd_fh_init : initialized fhcache, entries=1024 > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb 
memory > exp_do_unexport: 08:06 last use, flushing cache > exp_do_unexport: 08:0a last use, flushing cache > nfsd: terminating on signal 9 > nfsd: terminating on signal 9 > nfsd: terminating on signal 9 > nfsd: terminating on signal 9 > nfsd: terminating on signal 9 > nfsd: terminating on signal 9 > nfsd: terminating on signal 9 > nfsd: terminating on signal 9 > nfsd: last server exiting > nfsd_fh_shutdown : freeing 1024 fhcache entries. > nfsd_fh_init : initialized fhcache, entries=1024 > VFS: Disk change detected on device fd(2,0) > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot 
alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > e1000: eth2 Link is Down > e1000: eth2 1000Mbs Full Duplex Link is Up > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > !Proc_Rec_Ints cannot alloc_skb memory > > -----Original Message----- > From: Robert Dege [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, January 16, 2002 7:36 AM > To: [EMAIL PROTECTED] > Subject: RE: skb problem > > > alloc_skb deals with Network Buffers & Memory Management. If all of > these machines are the same, it might be in your best interest to take > one machine out of the loop and do a few tests on it. > > Your kernel (2.2.14-6) sounds like a redhat pre-built kernel. Have you > tried manually compiling the kernel on your own (obtaining source from > ftp.kernel.org)? 
>
> A few things to check:
>
> Your gigabit ethernet, are you using Jumbo frames (insanely large r/w
> buffers)?
>
> Check your /etc/modules.conf file and see what drivers are bound to the
> eth* devices.
>
> Are you to use both cards? What happens if you take one card out and
> boot the machine?
>
> Unfortunately, I'm no kernel hacker, so all I can offer is a process of
> elimination style solution.
>
> You can always post your ifconfig & dmesg output to the list.
>
> -Rob
>
>
> > My apologies. Yes, the hardware is identical, each box has 2 nic cards, one
> > intel pro 10/100 (e100) and an intel fiber gigabit card (e1000). We're
> > running red hat 6.2 and kernel 2.2.14-6.
> >
> > -----Original Message-----
> > From: Robert Dege [mailto:[EMAIL PROTECTED]]
> > Sent: Tuesday, January 15, 2002 3:37 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: skb problem
> >
> >
> >
> > Well, it appears to be a kernel problem. What kernel are you running on
> > these machines? Also, is the hardware the same on each of these boxes?
> > specifically the NIC card?
> >
> > More info is appreciated.
> >
> > -Rob
> >
> > > I'm hoping someone out there can help us, we're desperate. We've got 9
> > > webservers all experiencing a strange problem. It's random, happens to
> > > individual servers at diff times, there doesn't appear to be a pattern, and
> > > it isn't related to traffic or load, we've had them go down in the middle of
> > > the night. You can still ping them, and their nfs mounts are available, but
> > > we cannot telnet, ssh, or log in at console. The only way out of it is to
> > > power off. If anyone has seen this, please email back. The error we see in
> > > /var/log/messages just prior to rebooting is:
> > >
> > > kernel: !Proc_Rec_Ints cannot alloc_skb memory
> > >
> > > If it repeats itself 30 or 40 times, the box is gone.
> > >
> > > Thanks for the assistance.
> > >
> > >
> > > <mailto:[EMAIL PROTECTED]> D o u g T u c k e r
> > > Systems Administrator - <http://www.belointeractive.com/> Belo Interactive
> > > Phone: 214.977.4016
> > > <mailto:[EMAIL PROTECTED]> Page: 877.417.4750
> > >
> > > _______________________________________________
> > > Redhat-list mailing list
> > > [EMAIL PROTECTED]
> > > https://listman.redhat.com/mailman/listinfo/redhat-list