Re: slab corruption in 2.6.16rc1-git4

2006-02-04 Thread David S. Miller
From: Dave Jones [EMAIL PROTECTED]
Date: Sat, 4 Feb 2006 12:14:11 -0500

> I've hit it three times now, and every time it seems to have happened
> whilst it was under attack from junk icmp, which hopefully narrows
> it down a little to a specific set of isic parameters.

I've sent the following fix from Herbert to Linus and -stable.

diff-tree 429563d07b4feda0729f296b90c722f4d431adac (from 53ea68ecea11bcbb3451c2758ce181bd97b569a9)
Author: Herbert Xu [EMAIL PROTECTED]
Date:   Sat Feb 4 02:09:34 2006 -0800

[ICMP]: Fix extra dst release when ip_options_echo fails

When two ip_route_output_key lookups in icmp_send were combined I
forgot to change the error path for ip_options_echo to not drop the
dst reference since it now sits before the dst lookup.  To fix it we
simply jump past the ip_rt_put call.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]
Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 6bc0887..4d1c409 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -524,7 +524,7 @@ void icmp_send(struct sk_buff *skb_in, i
 			iph->tos;
 
 	if (ip_options_echo(&icmp_param.replyopts, skb_in))
-		goto ende;
+		goto out_unlock;
 
 
 	/*
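The commit message describes a reference-count imbalance on an error path. A minimal sketch of that shape in plain C, assuming a hypothetical `struct fake_dst` with an integer `refcnt` in place of the kernel's dst entry and `ip_rt_put()`; `send_reply()` and its `fixed` flag are illustrative names, not kernel code. It mirrors the label layout in the diff: `ende` drops the route reference, while `out_unlock` (as its name suggests) only unwinds the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted routing-cache (dst) entry. */
struct fake_dst {
	int refcnt;
};

static void fake_dst_hold(struct fake_dst *d)
{
	assert(d->refcnt > 0); /* taking a reference on a dead entry is a bug */
	d->refcnt++;
}

static void fake_dst_put(struct fake_dst *d)
{
	d->refcnt--;           /* plain decrement, no underflow check */
}

/*
 * Models the error paths of an icmp_send()-like function. The option
 * echo step runs before the route lookup takes its reference, so its
 * failure path must not reach the label that drops that reference.
 */
static int send_reply(struct fake_dst *d, bool options_fail, bool fixed)
{
	int err = 0;

	if (options_fail) {
		err = -1;
		if (fixed)
			goto out_unlock; /* the fix: nothing acquired yet */
		goto ende;               /* the bug: drops a reference never taken */
	}

	fake_dst_hold(d);                /* stands in for the route lookup */
	/* ... build and transmit the reply ... */
ende:
	fake_dst_put(d);
out_unlock:
	/* the real function releases its transmit lock around here */
	return err;
}
```

Starting from one reference owned by someone else (`refcnt = 1`), the buggy path leaves 0 behind: the owner's reference has been stolen, so the entry can be freed while still in use and the owner's own later release then lands on freed memory. The fixed path leaves the count at 1.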
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slab corruption in 2.6.16rc1-git4

2006-02-03 Thread Herbert Xu
Dave Jones [EMAIL PROTECTED] wrote:
> Note the first slab corruption line..
> 
> 000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 
> has a single bit error, which _could_ be bad ram, as this box is an ancient

Actually, this is exactly what would've happened if someone did a
dst_release on a freed dst entry.  So this probably ties in with
your report about dst badness.
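Herbert's reading can be checked with arithmetic: 0x6b6b6b6b - 1 = 0x6b6b6b6a, which i386 stores little-endian as `6a 6b 6b 6b`. So a decrement of a 32-bit counter at byte offset 4 of a freed, 0x6b-poisoned object gives exactly the reported line. The offset is an ASSUMPTION about the dst entry layout (the "Next obj" dump in the original report, showing `01 00 00 00` at that offset of a live object, is consistent with it but does not prove it), and `release_on_freed()` and `dump_line()` are hypothetical helpers for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define POISON_FREE 0x6b /* byte slab debugging writes over freed objects */
#define LINE_LEN 16      /* bytes per line in the slab hexdump */

/* i386 is little-endian; do the load/store explicitly so the result
 * does not depend on the host's byte order. */
static uint32_t load_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void store_le32(unsigned char *p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		p[i] = (unsigned char)(v >> (8 * i));
}

/* Model a refcount decrement that lands on an already freed object.
 * ASSUMPTION: the counter is a 32-bit word at byte offset refcnt_off. */
static void release_on_freed(unsigned char *obj, size_t size, size_t refcnt_off)
{
	assert(refcnt_off + 4 <= size);
	store_le32(obj + refcnt_off, load_le32(obj + refcnt_off) - 1);
}

/* Print one 16-byte line in the "000: xx xx ..." shape used by the log. */
static void dump_line(const unsigned char *obj, size_t size, size_t off)
{
	assert(off + LINE_LEN <= size);
	printf("%03zx:", off);
	for (size_t i = 0; i < LINE_LEN; i++)
		printf(" %02x", obj[off + i]);
	printf("\n");
}
```

Filling a 16-byte buffer with `POISON_FREE`, then calling `release_on_freed(obj, 16, 4)` and `dump_line(obj, 16, 0)`, prints `000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b`, the same line as in the report.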

Unfortunately, I was able to reproduce this bug exactly once with isic
and since then no matter what I do it just works perfectly.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: slab corruption in 2.6.16rc1-git4

2006-02-03 Thread Stephen Hemminger
On Sat, 04 Feb 2006 13:50:44 +1100
Herbert Xu [EMAIL PROTECTED] wrote:

> Dave Jones [EMAIL PROTECTED] wrote:
> > Note the first slab corruption line..
> > 
> > 000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > 
> > has a single bit error, which _could_ be bad ram, as this box is an ancient
> 
> Actually, this is exactly what would've happened if someone did a
> dst_release on a freed dst entry.  So this probably ties in with
> your report about dst badness.
> 
> Unfortunately, I was able to reproduce this bug exactly once with isic
> and since then no matter what I do it just works perfectly.

It takes about 15 minutes over a gigabit link for me to trigger it
on a dual Opteron with 2G of memory. Maybe Dave's Niagaras would be
faster.

Although it might depend on what level of debugging is turned on.
What would be helpful is knowing whether it is related to the code path
(i.e. the input packet) or to dst cache fill-up/release.


slab corruption in 2.6.16rc1-git4

2006-02-02 Thread Dave Jones
I've had a box being tortured with random junk packets (created with isic)
for a few days, and it spat this out last night..

Feb  1 04:28:09 trogdor kernel: Slab corruption: (Not tainted) start=cefc8a9c, len=244
Feb  1 04:28:09 trogdor kernel: Redzone: 0x5a2cf071/0x5a2cf071.
Feb  1 04:28:09 trogdor kernel: Last user: [c02a3d22](dst_destroy+0x7f/0xab)
Feb  1 04:28:09 trogdor kernel:  [c015f88f] check_poison_obj+0x73/0x16a [c015f9a8] cache_alloc_debugcheck_after+0x22/0xf9
Feb  1 04:28:09 trogdor kernel:  [c015fafc] kmem_cache_alloc+0x7d/0x86 [c02a3ed5] dst_alloc+0x27/0x7b
Feb  1 04:28:09 trogdor kernel:  [c02a3ed5] dst_alloc+0x27/0x7b [c02b724d] __ip_route_output_key+0x5a2/0x843
Feb  1 04:28:09 trogdor kernel:  [e08d34bb] issue_and_wait+0x28/0x93 [3c59x] [e08d6c64] boomerang_start_xmit+0x31c/0x335 [3c59x]
Feb  1 04:28:09 trogdor kernel:  [c02a26ca] dev_queue_xmit+0x208/0x20f [c02b7501] ip_route_output_flow+0x13/0x57
Feb  1 04:28:09 trogdor kernel:  [c02b754e] ip_route_output_key+0x9/0xb [c02d81cc] icmp_send+0x282/0x397
Feb  1 04:28:09 trogdor kernel:  [c02b758b] ip_route_input+0x3b/0xc6a [c02f61af] _spin_lock_irqsave+0x9/0xd
Feb  1 04:28:09 trogdor kernel:  [c02bbb4f] ip_options_compile+0x3da/0x3f3 [c02ba1a0] ip_rcv+0x322/0x478
Feb  1 04:28:09 trogdor kernel:  [c02a0d70] netif_receive_skb+0x211/0x259 [c02a2283] process_backlog+0x7a/0x100
Feb  1 04:28:09 trogdor kernel:  [c02a23a2] net_rx_action+0x99/0x170 [c0127ac8] __do_softirq+0x58/0xc2
Feb  1 04:28:09 trogdor kernel:  [c0105f03] do_softirq+0x46/0x4e0
===
Feb  1 04:28:09 trogdor kernel:  [c0105eb4] do_IRQ+0x72/0x7b
Feb  1 04:28:09 trogdor kernel:  [c0104766] common_interrupt+0x1a/0x20 [c0102db1] default_idle+0x0/0x55
Feb  1 04:28:09 trogdor kernel:  [c0102ddd] default_idle+0x2c/0x55 [c0102e95] cpu_idle+0x8f/0xa8
Feb  1 04:28:09 trogdor kernel:  [c03ce684] start_kernel+0x301/0x307
000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Feb  1 04:28:09 trogdor kernel: Prev obj: start=cefc899c, len=244
Feb  1 04:28:09 trogdor kernel: Redzone: 0x5a2cf071/0x5a2cf071.
Feb  1 04:28:09 trogdor kernel: Last user: [c02a3d22](dst_destroy+0x7f/0xab)
Feb  1 04:28:09 trogdor kernel: 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Feb  1 04:28:09 trogdor kernel: 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Feb  1 04:28:09 trogdor kernel: Next obj: start=cefc8b9c, len=244
Feb  1 04:28:09 trogdor kernel: Redzone: 0x170fc2a5/0x170fc2a5.
Feb  1 04:28:09 trogdor kernel: Last user: [c02a3ed5](dst_alloc+0x27/0x7b)
Feb  1 04:28:09 trogdor kernel: 000: 7c dd 63 cf 01 00 00 00 a6 a2 00 00 00 00 00 00
Feb  1 04:28:09 trogdor kernel: 010: 00 b7 37 c0 00 00 02 00 01 00 00 00 56 a9 1b 02

Note the first slab corruption line..

000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

has a single bit error, which _could_ be bad RAM, as this box is an ancient
4-way Pentium Pro, so its days may be numbered. I'll give it a spin with
memtest86 next time I'm at the office, but I wanted to report this just
in case, as in the last few days I've been seeing a number of slab corruption
issues on different boxes, some of which I know are definitely free of
hardware problems.
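Slab poisoning, which produced this report, works roughly like this: a debug slab fills each freed object with the 0x6b byte, and when the object is handed out again (here via `kmem_cache_alloc` calling `check_poison_obj` in the trace) it checks that every byte still holds that value. A simplified model of that check, not the kernel's actual code; `first_poison_mismatch()` and `popcount8()` are hypothetical helpers. It also counts how many bits differ, which shows why 0x6a looks like a one-bit flip.

```c
#include <assert.h>
#include <stddef.h>

#define POISON_FREE 0x6b /* poison byte visible throughout the dump */

/* Count the set bits in one byte without compiler builtins. */
static int popcount8(unsigned char b)
{
	int n = 0;

	while (b) {
		n += b & 1;
		b >>= 1;
	}
	return n;
}

/*
 * Simplified poison check: return the offset of the first byte in a
 * freed object that no longer holds POISON_FREE, or -1 if it is intact.
 * *bits receives how many bits of that byte differ from the poison.
 */
static long first_poison_mismatch(const unsigned char *obj, size_t size, int *bits)
{
	assert(obj != NULL && bits != NULL);
	for (size_t i = 0; i < size; i++) {
		if (obj[i] != POISON_FREE) {
			*bits = popcount8((unsigned char)(obj[i] ^ POISON_FREE));
			return (long)i;
		}
	}
	*bits = 0;
	return -1;
}
```

For a 16-byte buffer of 0x6b with byte 4 set to 0x6a, `first_poison_mismatch()` returns 4 with one differing bit, and offset 4 falls on the `000:` line of the dump. A one-bit difference alone cannot tell a RAM bit flip from a decrement, because 0x6b - 1 = 0x6a happens to clear only the lowest bit.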

Dave
