I hate to think too linearly, but it looks like the messages began appearing
once NFS started up.  Does the error message appear right after a fresh
reboot, or does it take a while to show up (I'm setting aside the evidence
from the dmesg output for the moment)?

Another few things to look at:

What do you use NFS for?
What does your /etc/exports file look like?
Are you using NFS2/3? (use rpcinfo to determine)
Is your kernel configured for NFS2/3?
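
If it helps with the rpcinfo question, here is a rough sketch (Python,
untested on your boxes) that pulls the registered NFS versions out of
"rpcinfo -p" output.  It assumes the usual five-column layout (program,
vers, proto, port, service) and that NFS is RPC program 100003; the
function names are just ones I made up.

```python
import subprocess

NFS_PROGRAM = "100003"  # RPC program number assigned to NFS


def nfs_versions(rpcinfo_output):
    """Return the sorted NFS versions listed in `rpcinfo -p` output."""
    versions = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows look like: program vers proto port service
        if len(fields) >= 4 and fields[0] == NFS_PROGRAM and fields[1].isdigit():
            versions.add(int(fields[1]))
    return sorted(versions)


def local_nfs_versions(host="localhost"):
    """Run `rpcinfo -p` against host (assumes rpcinfo is installed)."""
    result = subprocess.run(["rpcinfo", "-p", host],
                            capture_output=True, text=True, check=False)
    return nfs_versions(result.stdout)
```

Calling local_nfs_versions() on one of the servers should give a list
such as [2, 3] if both versions are registered, or an empty list if NFS
is not registered with the portmapper at all.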

A quick way to determine if it's an NFS problem would be to disable NFS on
boot and see if you still get the error message.  Or, if the error message
is continually populating syslog, just stop the NFS services and see if the
messages stop.
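
To make the before/after comparison less of a guess, something like the
following sketch (Python again) could count how many of the error lines
were logged after a given point in the file.  The path /var/log/messages
is the one you mentioned; the function names and the byte-offset approach
are my own assumptions, and it will not follow a rotated log.

```python
ERROR_TEXT = "!Proc_Rec_Ints cannot alloc_skb memory"


def count_errors(log_lines):
    """Count the lines that contain the alloc_skb error message."""
    return sum(1 for line in log_lines if ERROR_TEXT in line)


def errors_since(path, offset):
    """Count error lines written to path after byte offset.

    Returns (count, new_offset) so the next call can resume where this
    one stopped.
    """
    with open(path, "rb") as log:
        log.seek(offset)
        data = log.read()
    lines = data.decode("utf-8", errors="replace").splitlines()
    return count_errors(lines), offset + len(data)
```

The idea would be to call errors_since("/var/log/messages", 0) once to get
the current offset, stop the NFS services, wait a while, and then call it
again with that offset.  If the second count stays at zero while the box is
otherwise busy, that points toward NFS.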

-Rob



> Hey, thanks for the help, I'll take anything I can get at this point.  What
> it deals with was about as far as I've gotten, I just have no idea what to
> do about fixing the problem.  Yes, our intent is to upgrade kernel on
> everything, I just can't (with our current load) take one out to do so, and
> don't have a machine available to test on yet.  Now, if someone could tell
> me this is what it took to fix the problem, that would certainly change
> things.
> 
> My sincere apologies, how do I tell if the gigabit is using large frames?
> 
> Drivers bound are e100 on eth0 and e1000 on eth2.
> 
> Unfortunately, yes, we are using both cards in production, and one can't be
> removed.
> 
> My dmesg output is huge, but here it goes, you'll see the error repeated
> over and over I'm talking about:
> 
> und 0x8086:0x1960:idx 0:bus 0:slot 8:func 1
> scsi0 : Found a MegaRAID controller at 0xfc80d000, IRQ: 14
> megaraid: [3.13:1.43] detected 1 logical drives
> scsi0 : AMI MegaRAID 3.13 254 commands 16 targs 1 chans 8 luns
> scsi : 1 host.
> scsi0: scanning channel 1 for devices.
>   Vendor: DELL      Model: 1x8 U2W SCSI BP   Rev: 5.35
>   Type:   Processor                          ANSI SCSI revision: 02
> scsi0: scanning virtual channel for logical drives.
>   Vendor: MegaRAID  Model: LD0 RAID5 69112R  Rev: 3.13
>   Type:   Direct-Access                      ANSI SCSI revision: 02
> Detected scsi disk sda at scsi0, channel 1, id 0, lun 0
> SCSI device sda: hdwr sector= 512 bytes. Sectors= 141541376 [69112 MB] [69.1 GB]
>  sda: sda1 sda2 < sda5 sda6 sda7 sda8 sda9 sda10 >
> (scsi1) <Adaptec AIC-7890/1 Ultra2 SCSI host adapter> found at PCI 2/4/0
> (scsi1) Wide Channel, SCSI ID=7, 32/255 SCBs
> (scsi1) Downloading sequencer code... 396 instructions downloaded
> enable_irq() unbalanced from fc821b26
> (scsi2) <Adaptec AIC-7890/1 Ultra2 SCSI host adapter> found at PCI 2/6/0
> (scsi2) Wide Channel, SCSI ID=7, 32/255 SCBs
> (scsi2) Downloading sequencer code... 396 instructions downloaded
> enable_irq() unbalanced from fc821b26
> (scsi3) <Adaptec AIC-7860 Ultra SCSI host adapter> found at PCI 2/8/0
> (scsi3) Narrow Channel, SCSI ID=7, 3/255 SCBs
> (scsi3) Downloading sequencer code... 423 instructions downloaded
> enable_irq() unbalanced from fc821b26
> scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
>        <Adaptec AIC-7890/1 Ultra2 SCSI host adapter>
> scsi2 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
>        <Adaptec AIC-7890/1 Ultra2 SCSI host adapter>
> scsi3 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
>        <Adaptec AIC-7860 Ultra SCSI host adapter>
> scsi : 4 hosts.
> (scsi3:0:5:0) Synchronous at 20.0 Mbyte/sec, offset 15.
>   Vendor: NEC       Model: CD-ROM DRIVE:466  Rev: 1.06
>   Type:   CD-ROM                             ANSI SCSI revision: 02
> Detected scsi CD-ROM sr0 at scsi3, channel 0, id 5, lun 0
> sr0: scsi3-mmc drive: 17x/40x cd/rw xa/form2 cdda tray
> Uniform CDROM driver Revision: 2.56
> autodetecting RAID arrays
> autorun ...
> ... autorun DONE.
> VFS: Mounted root (ext2 filesystem) readonly.
> change_root: old root has d_count=1
> Trying to unmount old root ... okay
> Freeing unused kernel memory: 76k freed
> Adding Swap: 2048248k swap-space (priority -1)
> Intel(R) PRO/100 Fast Ethernet Adapter - Loadable driver, ver. 1.2.1
> Copyright (c) 2000 Intel Corporation
> 
> e100 - Intel(R) PRO/100+ Dual Port Server Adapter (Port 1) 
> eth0:  Mem:0xfe7ff008  IRQ:19  Speed:100 Mbps  Dx:Half
> 
> e100 - Intel(R) PRO/100+ Dual Port Server Adapter (Port 2) 
> eth1:  Mem:0xfe7fe008  IRQ:16  Speed:0 Mbps  Dx:N/A
>   Failed to detect cable link.
>   Speed and duplex will be determined at time of connection.
> Intel(R) PRO/1000 Gigabit Ethernet Adapter - Loadable driver, ver. 2.0.6
>          Copyright (c) 1999-2000 Intel Corporation
> 
> Intel(R) PRO/1000 Gigabit Adapter (SC - Fiber)
> eth2: Mem:0xfe400000  IRQ:11  Speed:1000 Mbps  Dx:Full
> e1000: eth2 Link is Down
> e1000: eth2 1000Mbs Full Duplex Link is Up
> Installing knfsd (copyright (C) 1996 [EMAIL PROTECTED]).
> nfsd_fh_init : initialized fhcache, entries=1024
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> exp_do_unexport: 08:06 last use, flushing cache
> exp_do_unexport: 08:0a last use, flushing cache
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: last server exiting
> nfsd_fh_shutdown : freeing 1024 fhcache entries.
> nfsd_fh_init : initialized fhcache, entries=1024
> VFS: Disk change detected on device fd(2,0)
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> e1000: eth2 Link is Down
> e1000: eth2 1000Mbs Full Duplex Link is Up
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> !Proc_Rec_Ints cannot alloc_skb memory
> 
> -----Original Message-----
> From: Robert Dege [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, January 16, 2002 7:36 AM
> To: [EMAIL PROTECTED]
> Subject: RE: skb problem
> 
> 
> alloc_skb deals with Network Buffers & Memory Management.  If all of
> these machines are the same, it might be in your best interest to take
> one machine out of the loop and do a few tests on it.
> 
> Your kernel (2.2.14-6) sounds like a Red Hat pre-built kernel.  Have you
> tried compiling a kernel yourself (obtaining the source from
> ftp.kernel.org)?
> 
> A few things to check:
> 
> On your gigabit Ethernet, are you using jumbo frames (an MTU well above
> the standard 1500 bytes)?
> 
> Check your /etc/modules.conf file and see what drivers are bound to the
> eth* devices.
> 
> Do you need to use both cards?  What happens if you take one card out and
> boot the machine?
> 
> Unfortunately, I'm no kernel hacker, so all I can offer is a
> process-of-elimination approach.
> 
> You can always post your ifconfig & dmesg output to the list.
> 
> -Rob
> 
> 
> > My apologies.  Yes, the hardware is identical, each box has 2 nic cards, one
> > intel pro 10/100 (e100) and an intel fiber gigabit card (e1000).  We're
> > running red hat 6.2 and kernel 2.2.14-6.
> > 
> > -----Original Message-----
> > From: Robert Dege [mailto:[EMAIL PROTECTED]]
> > Sent: Tuesday, January 15, 2002 3:37 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: skb problem
> > 
> > 
> > 
> > Well, it appears to be a kernel problem.  What kernel are you running on
> > these machines?  Also, is the hardware the same on each of these boxes,
> > specifically the NIC?
> > 
> > More info is appreciated.
> > 
> > -Rob
> > 
> > > I'm hoping someone out there can help us, we're desperate.  We've got 9
> > > webservers all experiencing a strange problem.  It's random, happens to
> > > individual servers at diff times, there doesn't appear to be a pattern, and
> > > it isn't related to traffic or load, we've had them go down in the middle of
> > > the night.  You can still ping them, and their nfs mounts are available, but
> > > we cannot telnet, ssh, or log in at console.  The only way out of it is to
> > > power off.  If anyone has seen this, please email back.  The error we see in
> > > /var/log/messages just prior to rebooting is:
> > >  
> > > kernel: !Proc_Rec_Ints cannot alloc_skb memory
> > >  
> > > If it repeats itself 30 or 40 times, the box is gone.
> > >  
> > > Thanks for the assistance.
> > >  
> > > 
> > > D o u g   T u c k e r
> > > Systems Administrator - Belo Interactive <http://www.belointeractive.com/>
> > > Phone: 214.977.4016
> > > Page: 877.417.4750
> > >  
> > > 
> > > 
> > > 
> > > _______________________________________________
> > > Redhat-list mailing list
> > > [EMAIL PROTECTED]
> > > https://listman.redhat.com/mailman/listinfo/redhat-list
> > > 
> > -- 
> > 
> > -Rob
> > 
> -- 
> 
> -Rob
> 
-- 

-Rob



_______________________________________________
Redhat-list mailing list
[EMAIL PROTECTED]
https://listman.redhat.com/mailman/listinfo/redhat-list
