Hey, thanks for the help, I'll take anything I can get at this point.  What
alloc_skb deals with is about as far as I've gotten; I just have no idea what
to do about fixing the problem.  Yes, our intent is to upgrade the kernel on
everything, but I can't (with our current load) take a machine out to do so,
and I don't have a machine available to test on yet.  Now, if someone could
tell me a kernel upgrade is what it took to fix the problem, that would
certainly change things.

My sincere apologies, but how do I tell if the gigabit card is using jumbo
frames?

Drivers bound are e100 on eth0 and e1000 on eth2.

Unfortunately, yes, we are using both cards in production, and one can't be
removed.

My dmesg output is huge, but here it goes.  You'll see the error I'm talking
about repeated over and over:

und 0x8086:0x1960:idx 0:bus 0:slot 8:func 1
scsi0 : Found a MegaRAID controller at 0xfc80d000, IRQ: 14
megaraid: [3.13:1.43] detected 1 logical drives
scsi0 : AMI MegaRAID 3.13 254 commands 16 targs 1 chans 8 luns
scsi : 1 host.
scsi0: scanning channel 1 for devices.
  Vendor: DELL      Model: 1x8 U2W SCSI BP   Rev: 5.35
  Type:   Processor                          ANSI SCSI revision: 02
scsi0: scanning virtual channel for logical drives.
  Vendor: MegaRAID  Model: LD0 RAID5 69112R  Rev: 3.13
  Type:   Direct-Access                      ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 1, id 0, lun 0
SCSI device sda: hdwr sector= 512 bytes. Sectors= 141541376 [69112 MB] [69.1 GB]
 sda: sda1 sda2 < sda5 sda6 sda7 sda8 sda9 sda10 >
(scsi1) <Adaptec AIC-7890/1 Ultra2 SCSI host adapter> found at PCI 2/4/0
(scsi1) Wide Channel, SCSI ID=7, 32/255 SCBs
(scsi1) Downloading sequencer code... 396 instructions downloaded
enable_irq() unbalanced from fc821b26
(scsi2) <Adaptec AIC-7890/1 Ultra2 SCSI host adapter> found at PCI 2/6/0
(scsi2) Wide Channel, SCSI ID=7, 32/255 SCBs
(scsi2) Downloading sequencer code... 396 instructions downloaded
enable_irq() unbalanced from fc821b26
(scsi3) <Adaptec AIC-7860 Ultra SCSI host adapter> found at PCI 2/8/0
(scsi3) Narrow Channel, SCSI ID=7, 3/255 SCBs
(scsi3) Downloading sequencer code... 423 instructions downloaded
enable_irq() unbalanced from fc821b26
scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
       <Adaptec AIC-7890/1 Ultra2 SCSI host adapter>
scsi2 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
       <Adaptec AIC-7890/1 Ultra2 SCSI host adapter>
scsi3 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
       <Adaptec AIC-7860 Ultra SCSI host adapter>
scsi : 4 hosts.
(scsi3:0:5:0) Synchronous at 20.0 Mbyte/sec, offset 15.
  Vendor: NEC       Model: CD-ROM DRIVE:466  Rev: 1.06
  Type:   CD-ROM                             ANSI SCSI revision: 02
Detected scsi CD-ROM sr0 at scsi3, channel 0, id 5, lun 0
sr0: scsi3-mmc drive: 17x/40x cd/rw xa/form2 cdda tray
Uniform CDROM driver Revision: 2.56
autodetecting RAID arrays
autorun ...
... autorun DONE.
VFS: Mounted root (ext2 filesystem) readonly.
change_root: old root has d_count=1
Trying to unmount old root ... okay
Freeing unused kernel memory: 76k freed
Adding Swap: 2048248k swap-space (priority -1)
Intel(R) PRO/100 Fast Ethernet Adapter - Loadable driver, ver. 1.2.1
Copyright (c) 2000 Intel Corporation

e100 - Intel(R) PRO/100+ Dual Port Server Adapter (Port 1) 
eth0:  Mem:0xfe7ff008  IRQ:19  Speed:100 Mbps  Dx:Half

e100 - Intel(R) PRO/100+ Dual Port Server Adapter (Port 2) 
eth1:  Mem:0xfe7fe008  IRQ:16  Speed:0 Mbps  Dx:N/A
  Failed to detect cable link.
  Speed and duplex will be determined at time of connection.
Intel(R) PRO/1000 Gigabit Ethernet Adapter - Loadable driver, ver. 2.0.6
         Copyright (c) 1999-2000 Intel Corporation

Intel(R) PRO/1000 Gigabit Adapter (SC - Fiber)
eth2: Mem:0xfe400000  IRQ:11  Speed:1000 Mbps  Dx:Full
e1000: eth2 Link is Down
e1000: eth2 1000Mbs Full Duplex Link is Up
Installing knfsd (copyright (C) 1996 [EMAIL PROTECTED]).
nfsd_fh_init : initialized fhcache, entries=1024
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
exp_do_unexport: 08:06 last use, flushing cache
exp_do_unexport: 08:0a last use, flushing cache
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: last server exiting
nfsd_fh_shutdown : freeing 1024 fhcache entries.
nfsd_fh_init : initialized fhcache, entries=1024
VFS: Disk change detected on device fd(2,0)
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
e1000: eth2 Link is Down
e1000: eth2 1000Mbs Full Duplex Link is Up
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
!Proc_Rec_Ints cannot alloc_skb memory
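
Since the box seems to go once the error repeats 30 or 40 times, a rough way to watch for that is to measure how long each unbroken run of the message is. This is only a sketch: the function names and the threshold constant are made up for illustration, and the threshold of 30 is just the rule of thumb from this thread, not a known kernel limit.

```python
from itertools import groupby

ERROR = "cannot alloc_skb memory"
THRESHOLD = 30  # assumption: rule of thumb from this thread, not a kernel limit


def error_runs(lines):
    """Return the length of each consecutive run of alloc_skb errors."""
    runs = []
    for is_error, group in groupby(lines, key=lambda line: ERROR in line):
        if is_error:
            runs.append(sum(1 for _ in group))
    return runs


def longest_run(lines):
    """Length of the longest unbroken run of the error, or 0 if none."""
    return max(error_runs(lines), default=0)


def looks_fatal(lines, threshold=THRESHOLD):
    """True if any single run reaches the threshold."""
    return longest_run(lines) >= threshold
```

It could be fed the lines of saved dmesg output or of /var/log/messages, for example `looks_fatal(open("/var/log/messages", errors="replace"))`.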

-----Original Message-----
From: Robert Dege [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, January 16, 2002 7:36 AM
To: [EMAIL PROTECTED]
Subject: RE: skb problem


alloc_skb deals with Network Buffers & Memory Management.  If all of
these machines are the same, it might be in your best interest to take
one machine out of the loop and do a few tests on it.

Your kernel (2.2.14-6) sounds like a redhat pre-built kernel.  Have you
tried manually compiling the kernel on your own (obtaining source from
ftp.kernel.org)?

A few things to check:

Your gigabit ethernet, are you using Jumbo frames (insanely large r/w
buffers)?

Check your /etc/modules.conf file and see what drivers are bound to the
eth* devices.

Do you need to use both cards?  What happens if you take one card out and
boot the machine?
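
For the first two checks, a rough Python sketch follows. Jumbo frames normally show up as an interface MTU above the standard 1500 in ifconfig output, and driver bindings appear as "alias ethN driver" lines in modules.conf. The function names are made up for illustration, and the input formats are assumed from typical ifconfig and modules.conf files, not taken from these machines.

```python
import re

STANDARD_MTU = 1500  # standard Ethernet MTU; anything larger suggests jumbo frames


def interface_mtus(ifconfig_text):
    """Map interface name -> MTU, parsed from ifconfig output."""
    mtus = {}
    current = None
    for line in ifconfig_text.splitlines():
        # An unindented line starts a new interface section.
        if line and not line[0].isspace():
            current = line.split()[0].rstrip(":")
        # Matches both "MTU:1500" (older ifconfig) and "mtu 1500" (newer).
        match = re.search(r"\bMTU[:\s]\s*(\d+)", line, re.IGNORECASE)
        if match and current:
            mtus[current] = int(match.group(1))
    return mtus


def jumbo_interfaces(ifconfig_text):
    """Names of interfaces whose MTU exceeds the standard 1500."""
    return [name for name, mtu in interface_mtus(ifconfig_text).items()
            if mtu > STANDARD_MTU]


def bound_drivers(modules_conf_text):
    """Map ethN -> driver from 'alias ethN driver' lines."""
    drivers = {}
    for line in modules_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) == 3 and fields[0] == "alias" and fields[1].startswith("eth"):
            drivers[fields[1]] = fields[2]
    return drivers
```

Pass in the text of `ifconfig` output and the contents of /etc/modules.conf; an empty list from `jumbo_interfaces` would suggest jumbo frames are not in use.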

Unfortunately, I'm no kernel hacker, so all I can offer is a process of
elimination style solution.

You can always post your ifconfig & dmesg output to the list.

-Rob


> My apologies.  Yes, the hardware is identical, each box has 2 nic cards,
> one intel pro 10/100 (e100) and an intel fiber gigabit card (e1000).  We're
> running red hat 6.2 and kernel 2.2.14-6.
> 
> -----Original Message-----
> From: Robert Dege [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, January 15, 2002 3:37 PM
> To: [EMAIL PROTECTED]
> Subject: Re: skb problem
> 
> 
> 
> Well, it appears to be a kernel problem.  What kernel are you running on
> these machines?  Also, is the hardware the same on each of these boxes,
> specifically the NIC card?
> 
> More info is appreciated.
> 
> -Rob
> 
> > I'm hoping someone out there can help us, we're desperate.  We've got 9
> > webservers all experiencing a strange problem.  It's random, happens to
> > individual servers at diff times, there doesn't appear to be a pattern,
> > and it isn't related to traffic or load, we've had them go down in the
> > middle of the night.  You can still ping them, and their nfs mounts are
> > available, but we cannot telnet, ssh, or log in at console.  The only way
> > out of it is to power off.  If anyone has seen this, please email back.
> > The error we see in /var/log/messages just prior to rebooting is:
> >  
> > kernel: !Proc_Rec_Ints cannot alloc_skb memory
> >  
> > If it repeats itself 30 or 40 times, the box is gone.
> >  
> > Thanks for the assistance.
> >  
> > 
> > Doug Tucker
> > Systems Administrator - Belo Interactive <http://www.belointeractive.com/>
> > Phone: 214.977.4016
> > Page: 877.417.4750
> >  
> > 
> > 
> > 
> > _______________________________________________
> > Redhat-list mailing list
> > [EMAIL PROTECTED]
> > https://listman.redhat.com/mailman/listinfo/redhat-list
> > 
> -- 
> 
> -Rob
> 
> 
> 
> 
-- 

-Rob


