RE: Kernel oops in mm/slab.c [ kmem_cache_grow() ] with test4-8

2000-09-18 Thread Jonathan Earle

> -----Original Message-----
> From: Andrew Morton [mailto:[EMAIL PROTECTED]]
> Sent: Friday, September 15, 2000 11:47 PM
> To: Earle, Jonathan [KAN:1A31:EXCH]
> Cc: Linux MPLS List (E-mail); Linux Kernel List (E-mail)
> Subject: Re: Kernel oops in mm/slab.c [ kmem_cache_grow() ] with test4-8
> 
> 
> > Jonathan Earle wrote:
> > 
> > Hi,
> > 
> > I've been having kernel oopses with the 2.4.0-test series and am
> > including ksymoops processed output from both test4 and test5
> > kernels.  The same oops happens in later kernels too (Tested with
> > test6, test7 and test8).
> > 
> 
> Presumably mpls_output() is doing a kmalloc(..., GFP_KERNEL) from within
> a softirq.  Hunt that down and turn it into GFP_ATOMIC.


Okay... Did that (turned all the GFP_KERNEL references in net/mpls to GFP_ATOMIC), and the problem seems to have gone away.  I'll post a more confident summary when I'm more sure that things are working properly.

Now, what did I do (aside from fixing the problem) by changing that reference?


Many thanks for the hint!! 


Jon





Re: Kernel oops in mm/slab.c [ kmem_cache_grow() ] with test4-8

2000-09-15 Thread Andrew Morton

> Jonathan Earle wrote:
> 
> Hi,
> 
> I've been having kernel oopses with the 2.4.0-test series and am
> including ksymoops processed output from both test4 and test5
> kernels.  The same oops happens in later kernels too (Tested with
> test6, test7 and test8).
> 

Presumably mpls_output() is doing a kmalloc(..., GFP_KERNEL) from within
a softirq.  Hunt that down and turn it into GFP_ATOMIC.
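
As a rough illustration of what that change means for the caller, the sketch below models a tiny buffer pool in userspace. Everything in it is hypothetical (pool_alloc(), ALLOC_MAY_SLEEP, ALLOC_ATOMIC, handle_packet()); it is not the MPLS code or the kernel allocator. It only tries to show the contract: an atomic-style request never waits, so it can come back NULL and the caller has to cope, for example by dropping the packet.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical stand-ins for "may sleep" and "must not sleep" requests. */
enum alloc_mode { ALLOC_MAY_SLEEP, ALLOC_ATOMIC };

#define POOL_SLOTS 2

struct pool {
    unsigned char slots[POOL_SLOTS][64];
    bool used[POOL_SLOTS];
    int waits;  /* how many times a caller had to wait for memory */
};

/* Pretend that other users released their buffers while we waited. */
static void pool_reclaim(struct pool *p)
{
    for (int i = 0; i < POOL_SLOTS; i++)
        p->used[i] = false;
    p->waits++;
}

/* Atomic requests return NULL when the pool is empty; sleeping ones wait. */
static void *pool_alloc(struct pool *p, enum alloc_mode mode)
{
    for (int attempt = 0; attempt < 2; attempt++) {
        for (int i = 0; i < POOL_SLOTS; i++) {
            if (!p->used[i]) {
                p->used[i] = true;
                return p->slots[i];
            }
        }
        if (mode == ALLOC_ATOMIC)
            return NULL;     /* must not wait in interrupt context */
        pool_reclaim(p);     /* stands in for sleeping until memory frees up */
    }
    return NULL;
}

/* Interrupt-context style caller: atomic request, drop the packet on failure. */
static bool handle_packet(struct pool *p)
{
    void *buf = pool_alloc(p, ALLOC_ATOMIC);

    if (buf == NULL)
        return false;        /* packet dropped, nothing waited */
    return true;             /* buffer would be handed on for transmission */
}
```

In this model, handle_packet() on a zero-initialised pool succeeds until the pool is exhausted and then reports a drop without ever waiting, whereas an ALLOC_MAY_SLEEP request on the same exhausted pool waits once and then succeeds. The practical upshot for a real conversion is that every converted call site needs a NULL check.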
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Kernel oops in mm/slab.c [ kmem_cache_grow() ] with test4-8

2000-09-15 Thread Jonathan Earle

Hi, 


I've been having kernel oopses with the 2.4.0-test series and am including ksymoops processed output from both test4 and test5 kernels.  The same oops happens in later kernels too (Tested with test6, test7 and test8).

The scenario is this:


I have an incoming UDP stream at 1mbit.  The router marks packets in this stream, according to port ranges, with 3 (or any # of) marks (via iptables v1.1.1). iproute2 builds new routing tables based on these marks, and mplsadm, with the tc patch, is called to build LSPs using these routing tables.  Finally, the 3 egress LSPs are rate limited using tc (employing cbq classes) to a value less than the ingress rate (ie: I limited each LSP to 200kbit, for an aggregate egress output rate of 600kbit).  When I start the traffic flowing from our generator, the box panics and freezes quite solidly.  Policing via filters also crashes the box.  If I move the egress rate limiting function to another box, it works okay.

I've also noted that the crash only occurs if I throttle the traffic flow to an egress rate which is less than the ingress rate (ie: ingress flow at 1mbit and egress flow at 1mbit works fine.  If the egress rate is reduced, boom!)

I copied down the oopses and ran 'ksymoops < oops.txt > oops_proc.txt' and pasted them here.  The first is from kernel 2.4.0-test4 and the second from 2.4.0-test5.

NEW: Here's the funny part.  In mm/slab.c, the function kmem_cache_grow() contains a check as follows:


    /*
     * The test for missing atomic flag is performed here, rather than
     * the more obvious place, simply to reduce the critical path length
     * in kmem_cache_alloc(). If a caller is seriously mis-behaving they
     * will eventually be caught here (where it matters).
     */
    /* Commented out Sep 15 since it was crashing my router. */
    /* if (in_interrupt() && (flags & SLAB_LEVEL_MASK) != SLAB_ATOMIC)
           BUG(); */


This is the check that fails and causes the oops.  Not understanding what is actually being checked, and not knowing the repercussions of tampering with it, I commented out the check, recompiled and reran the test.  I understand that this is not really a fix (it's more akin to just turning my head and pretending that the problem doesn't exist), but... it seems to work.  The result:  Great joy and much celebration!  I'm throwing 7.2mbps at the box, limiting the rate to 900kbit aggregate throughput and it's working!  The numbers I'm getting also seem to jibe with anticipated results.
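
For anyone wondering what that check actually tests, here is a minimal userspace sketch of its logic. The names MODEL_LEVEL_MASK, MODEL_ATOMIC, MODEL_KERNEL and would_hit_bug() are hypothetical stand-ins, and the numeric values are made up for illustration; the real SLAB_* definitions live in the kernel headers and may well differ.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration only; not the kernel's. */
#define MODEL_LEVEL_MASK 0x7u  /* bits that carry the allocation "level" */
#define MODEL_ATOMIC     0x1u  /* caller cannot sleep */
#define MODEL_KERNEL     0x3u  /* caller may sleep while memory is freed */

/*
 * Mirrors the shape of the quoted test: a request is flagged as a bug
 * when it is made from interrupt context with any level other than
 * "atomic".
 */
static bool would_hit_bug(bool in_interrupt, unsigned int flags)
{
    return in_interrupt && (flags & MODEL_LEVEL_MASK) != MODEL_ATOMIC;
}
```

With these stand-ins, would_hit_bug(true, MODEL_KERNEL) is true and would_hit_bug(true, MODEL_ATOMIC) is false, while nothing is flagged outside interrupt context. Commenting out the check removes the report; it does not change the fact that the caller asked for an allocation that is allowed to sleep.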

Cheers! 
Jon 


ksymoops 0.7c on i686 2.4.0-test4.  Options used 
 -V (default) 
 -k /proc/ksyms (default) 
 -l /proc/modules (default) 
 -o /lib/modules/2.4.0-test4/ (default) 
 -m /usr/src/linux/System.map (default) 


Warning: You did not tell me where to find symbol information.  I will 
assume that the log matches the kernel and modules that are running 
right now and I'll use the default options above for symbol resolution. 
If the current kernel and/or modules do not match the log, you can get 
more accurate output by telling me the kernel version and where to find 
map, modules, ksyms etc.  ksymoops -h explains the options. 


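invalid operand: 0000
CPU: 0
EIP: 0010:[<c01277fd>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 0000001b ebx: c7ffd0c0 ecx: 00000000 edx: 00000082
esi: 00000246 edi: c7ffd0c0 ebp: 00000007 esp: c024fe70
ds: 0018 es: 0018 ss: 0018
Process swapper (pid:0, stackpage=c024f000)
Stack: c01fb794 c01fb834 00000412 c7ffd0c0 00000247 00000007 c024fed4 c7d1602e
       c0127aaf c7ffd0c0 00000007 00000000 c7d170e0 c7d1602e c01eb196 00000008
       00000007 00000000 c7d170e0 c7d1602e c7f8be00 00000000 c01b6aaf c7d170e0
Call trace: [<c01fb794>][<c01fb834>][<c0127aaf>][<c01eb196>][<c01b6aaf>][<c01b6c6f>][<c01b6a84>]
    [<c019b1c4>][<c01b6936>][<c01b6a84>][<c019efe3>][<c011b17f>][<c010b8ee>][<c01087e0>][<c01087e0>]
    [<c010a518>][<c01087e0>][<c01087e0>][<c0100018>][<c0108803>][<c0108864>][<c0105000>][<c0100192>]
Code: 0f 0b 83 c4 0c c7 44 24 10 01 00 00 00 89 ee 83 e6 07 b8 03


>>EIP; c01277fd <kmem_cache_grow+69/254>   <=
Trace; c01fb794 <tvecs+1500/14d4c>
Trace; c01fb834 <tvecs+15a0/14d4c>
Trace; c0127aaf <kmalloc+73/ac>
Trace; c01eb196 <mpls_output+12/26c>
Trace; c01b6aaf <ip_rcv_finish+2b/21c>
Trace; c01b6c6f <ip_rcv_finish+1eb/21c>
Trace; c01b6a84 <ip_rcv_finish+0/21c>
Trace; c019b1c4 <nf_hook_slow+7c/b4>
Trace; c01b6936 <ip_rcv+356/38c>
Trace; c01b6a84 <ip_rcv_finish+0/21c>
Trace; c019efe3 <net_rx_action+123/1e8>
Trace; c011b17f <do_softirq+4f/70>
Trace; c010b8ee <do_IRQ+a6/b8>
Trace; c01087e0 <default_idle+0/28>
Trace; c01087e0 <default_idle+0/28>
Trace; c010a518 <ret_from_intr+0/20>
Trace; c01087e0 <default_idle+0/28>
Trace; c01087e0 <default_idle+0/28>
Trace; c0100018 <startup_32+18/13a>
Trace; c0108803 <default_idle+23/28>
Trace; c0108864 <cpu_idle+3c/50>
Trace; c0105000 <empty_bad_page+0/1000>
Trace; c0100192 <L6+0/2>
Code;  c01277fd <kmem_cache_grow+69/254>
00000000 <_EIP>:
Code;  c01277fd <kmem_cache_grow+69/254>   <=
   0:   0f 0b                     ud2a      <=
Code;  c01277ff <kmem_cache_grow+6b/254>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c0127802 <kmem_cache_grow+6e/254>
   5:   c7 44 24 10 01 00 00      movl   $0x1,0x10(%esp,1)
Code;  c0127809 <kmem_cache_grow+75/254>
   c:   00
Code;  c012780a <kmem_cache_grow+76/254>
   d:   89 ee                     mov    %ebp,%esi
Code;  c012780c <kmem_cache_grow+78/254>
   f:   83 e6 07                  and    $0x7,%esi
Code;  c012780f <kmem_cache_grow+7b/254>
  12:   b8 03 00 00 00            mov    $0x3,%eax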


Aiee, killing interrupt h
