Re: -rc3 leaking NOT BIO [Was: Memory leak in 2.6.11-rc1?]

2005-02-20 Thread Parag Warudkar
On Thursday 17 February 2005 08:38 pm, Badari Pulavarty wrote:
> > On Wednesday 16 February 2005 06:52 pm, Andrew Morton wrote:
> > > So it's probably an ndiswrapper bug?
> >
> > Andrew,
> > It looks like it is a kernel bug triggered by NdisWrapper. Without
> > NdisWrapper, and with just 8139too plus some light network activity the
> > size-64 grew from ~ 1100 to 4500 overnight. Is this normal? I will keep
> > it running to see where it goes.

[OT]

Didn't want to keep this hanging - it turned out to be a strange ndiswrapper 
bug. It seems that the other OS in question allows the following without a 
leak ;) -
ptr = Allocate(...);
ptr = Allocate(...);
:
repeat this a zillion times without ever fearing that 'ptr' will leak...
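For illustration only, a minimal user-space sketch of that pattern with
hypothetical names (Allocate()/FreeAll() are stand-ins, not the NDIS or
ndiswrapper API): the wrapper has to remember every allocation itself,
because the caller keeps overwriting 'ptr'.

#include <stdlib.h>

struct alloc_node {
	struct alloc_node *next;
	void *mem;
};

static struct alloc_node *alloc_list;	/* all outstanding allocations */

void *Allocate(size_t size)
{
	struct alloc_node *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->mem = malloc(size);
	if (!n->mem) {
		free(n);
		return NULL;
	}
	n->next = alloc_list;		/* remember it for later cleanup */
	alloc_list = n;
	return n->mem;
}

void FreeAll(void)			/* e.g. when the driver is halted */
{
	while (alloc_list) {
		struct alloc_node *n = alloc_list;

		alloc_list = n->next;
		free(n->mem);
		free(n);
	}
}

int main(void)
{
	void *ptr;
	int i;

	for (i = 0; i < 1000000; i++)
		ptr = Allocate(64);	/* overwritten every time, as above */
	(void)ptr;
	FreeAll();			/* nothing leaks despite the lost pointers */
	return 0;
}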

I sent a fix to the ndiswrapper-general mailing list on SourceForge, in case 
anyone is using ndiswrapper and hitting a similar problem.

Parag


Re: -rc3 leaking NOT BIO [Was: Memory leak in 2.6.11-rc1?]

2005-02-17 Thread Badari Pulavarty
On Thu, 2005-02-17 at 05:00, Parag Warudkar wrote:
> On Wednesday 16 February 2005 06:52 pm, Andrew Morton wrote:
> > So it's probably an ndiswrapper bug?
> Andrew, 
> It looks like it is a kernel bug triggered by NdisWrapper. Without 
> NdisWrapper, and with just 8139too plus some light network activity the 
> size-64 grew from ~ 1100 to 4500 overnight. Is this normal? I will keep it 
> running to see where it goes.
> 
> A question - is it safe to assume it is  a kmalloc based leak? (I am thinking 
> of tracking it down by using kprobes to insert a probe into __kmalloc and 
> record the stack to see what is causing so many allocations.)
> 

Last time I debugged something like this, I ended up adding dump_stack()
in kmem_cache_alloc() for the specific slab.
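A rough sketch of that kind of hack (against the 2.6.11-era mm/slab.c, not a
patch from this thread; there __cache_alloc() is the internal helper that
kmem_cache_alloc() normally just calls):

void *kmem_cache_alloc(kmem_cache_t *cachep, int flags)
{
	/* debug hack: trace allocations from the size-64 cache only */
	if (cachep->objsize == 64) {
		static int count;

		if (count++ % 100 == 0)		/* rate-limit the dmesg spam */
			dump_stack();
	}
	return __cache_alloc(cachep, flags);
}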

If you are really interested, you can try to get the following jprobe
module working. (You need to make the kmem_cache_t structure definition
visible to the module so it compiles, and export the kallsyms_lookup_name()
symbol, etc.)

Thanks,
Badari



#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/kallsyms.h>
#include <linux/kdev_t.h>

MODULE_PARM_DESC(kmod, "\n");

int count = 0;
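/*
 * jprobe handler: entered with the same arguments as kmem_cache_alloc();
 * dumps a stack trace roughly every 100th allocation from the size-64
 * cache, then returns to the probed function via jprobe_return().
 */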
void fastcall inst_kmem_cache_alloc(kmem_cache_t *cachep, int flags)
{
	if (cachep->objsize == 64) {
		if (count++ == 100) {
			dump_stack();
			count = 0;
		}
	}
	jprobe_return();
}
static char *fn_names[] = {
	"kmem_cache_alloc",
};

static struct jprobe kmem_probes[] = {
  {
.entry = (kprobe_opcode_t *) inst_kmem_cache_alloc,
.kp.addr=(kprobe_opcode_t *) 0,
  }
};

#define MAX_KMEM_ROUTINE (sizeof(kmem_probes)/sizeof(struct kprobe))

/* installs the probes in the appropriate places */
static int init_kmods(void)
{
	int i;

	for (i = 0; i < MAX_KMEM_ROUTINE; i++) {
		kmem_probes[i].kp.addr =
			(kprobe_opcode_t *) kallsyms_lookup_name(fn_names[i]);
		if (kmem_probes[i].kp.addr) {
			printk("plant jprobe at name %s %p, handler addr %p\n",
			       fn_names[i], kmem_probes[i].kp.addr, kmem_probes[i].entry);
			register_jprobe(&kmem_probes[i]);
		}
	}
	return 0;
}

static void cleanup_kmods(void)
{
	int i;
	for (i = 0; i < MAX_KMEM_ROUTINE; i++) {
		unregister_jprobe(&kmem_probes[i]);
	}
}

module_init(init_kmods);
module_exit(cleanup_kmods);
MODULE_LICENSE("GPL");


Re: -rc3 leaking NOT BIO [Was: Memory leak in 2.6.11-rc1?]

2005-02-17 Thread Linus Torvalds


On Thu, 17 Feb 2005, Parag Warudkar wrote:
> 
> A question - is it safe to assume it is  a kmalloc based leak? (I am thinking 
> of tracking it down by using kprobes to insert a probe into __kmalloc and 
> record the stack to see what is causing so many allocations.)

It's definitely kmalloc-based, but you may not catch it in __kmalloc. The 
"kmalloc()" function is actually an inline function which has some magic 
compile-time code that statically determines when the size is constant and 
can be turned into a direct call to "kmem_cache_alloc()" with the proper 
cache descriptor.

So you'd need to either instrument kmem_cache_alloc() (and trigger on the 
proper slab descriptor) or you would need to modify the kmalloc() 
definition in <linux/slab.h> to not do the constant size optimization, at 
which point you can instrument just __kmalloc() and avoid some of the 
overhead.
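
A toy, self-contained illustration of that compile-time dispatch
(my_kmalloc(), cache_alloc() and generic_alloc() are made-up stand-ins for
kmalloc(), kmem_cache_alloc() and __kmalloc(); the size thresholds are
illustrative only):

#include <stdio.h>
#include <stdlib.h>

/* stand-in for allocating from one of the per-size general caches */
static void *cache_alloc(int cache_index, size_t size)
{
	printf("cache path, cache %d\n", cache_index);
	return malloc(size);
}

/* stand-in for the out-of-line, size-agnostic allocator */
static void *generic_alloc(size_t size)
{
	printf("generic path\n");
	return malloc(size);
}

/* constant sizes resolve to a direct cache_alloc() call at compile time,
 * so an instrumented generic_alloc() never sees those call sites */
static inline void *my_kmalloc(size_t size)
{
	if (__builtin_constant_p(size)) {
		if (size <= 32)
			return cache_alloc(0, size);
		if (size <= 64)
			return cache_alloc(1, size);
	}
	return generic_alloc(size);
}

int main(int argc, char **argv)
{
	size_t n = (size_t)argc + 40;	/* not a compile-time constant */

	(void)argv;
	free(my_kmalloc(40));		/* constant size: direct cache path */
	free(my_kmalloc(n));		/* variable size: generic path */
	return 0;
}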

Linus


Re: -rc3 leaking NOT BIO [Was: Memory leak in 2.6.11-rc1?]

2005-02-17 Thread Parag Warudkar
On Wednesday 16 February 2005 10:48 pm, Horst von Brand wrote:
> Does x86_64 use up a (freeable) register for the frame pointer or not?
> I.e., does -fomit-frame-pointer have any effect on the generated code?

{Took Linus out of the loop as he probably isn't interested}

The generated code is different for both cases but for some reason gcc has 
trouble with __builtin_return_address on x86-64.

For example, compiling a test function with gcc -fomit-frame-pointer produces 
the following assembly.

method_1:
.LFB2:
	subq	$8, %rsp
.LCFI0:
	movl	$__FUNCTION__.0, %esi
	movl	$.LC0, %edi
	movl	$0, %eax
	call	printf
	movl	$0, %eax
	addq	$8, %rsp
	ret

And with -fno-omit-frame-pointer, the same function yields 

method_1:
.LFB2:
	pushq	%rbp
.LCFI0:
	movq	%rsp, %rbp
.LCFI1:
	movl	$__FUNCTION__.0, %esi
	movl	$.LC0, %edi
	movl	$0, %eax
	call	printf
	movl	$0, %eax
	leave
	ret
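
For reference, a C function of roughly this shape reproduces listings like the
ones above (a reconstruction from the assembly, not the exact test source);
build it with gcc -O -S plus the respective frame-pointer flag and compare:

#include <stdio.h>

int method_1(void)
{
	/* .LC0 is the format string, __FUNCTION__.0 the function name */
	printf("in %s\n", __FUNCTION__);
	return 0;
}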


Re: -rc3 leaking NOT BIO [Was: Memory leak in 2.6.11-rc1?]

2005-02-17 Thread Parag Warudkar
On Wednesday 16 February 2005 06:52 pm, Andrew Morton wrote:
> So it's probably an ndiswrapper bug?
Andrew, 
It looks like it is a kernel bug triggered by NdisWrapper. Without 
NdisWrapper, and with just 8139too plus some light network activity the 
size-64 grew from ~ 1100 to 4500 overnight. Is this normal? I will keep it 
running to see where it goes.

A question - is it safe to assume it is a kmalloc-based leak? (I am thinking 
of tracking it down by using kprobes to insert a probe into __kmalloc and 
recording the stack, to see what is causing so many allocations.)

Thanks
Parag


Re: -rc3 leaking NOT BIO [Was: Memory leak in 2.6.11-rc1?]

2005-02-16 Thread Horst von Brand
Andrew Morton <[EMAIL PROTECTED]> said:
> Parag Warudkar <[EMAIL PROTECTED]> wrote:

[...]

> > Is there a reason X86_64 doesnt have CONFIG_FRAME_POINTER anywhere in 
> > the .config?

> No good reason, I suspect.

Does x86_64 use up a (freeable) register for the frame pointer or not?
I.e., does -fomit-frame-pointer have any effect on the generated code?
-- 
Dr. Horst H. von Brand   User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria  +56 32 654239
Casilla 110-V, Valparaiso, ChileFax:  +56 32 797513


Re: -rc3 leaking NOT BIO [Was: Memory leak in 2.6.11-rc1?]

2005-02-16 Thread Parag Warudkar
On Wednesday 16 February 2005 06:51 pm, Andrew Morton wrote:
> 81002fe8 is the address of the slab object.  08a8 is
> supposed to be the caller's text address.  It appears that
> __builtin_return_address(0) is returning junk.  Perhaps due to
> -fomit-frame-pointer.
I tried manually removing -fomit-frame-pointer from the Makefile and adding 
-fno-omit-frame-pointer, but with the same results - junk return addresses. 
Probably an x86_64 issue.

>So it's probably an ndiswrapper bug? 
I looked at ndiswrapper mailing lists and found this explanation for the same 
issue of growing size-64 with ndiswrapper  -
--
"It looks like the problem is kernel-version related, not ndiswrapper. 
 ndiswrapper just uses some API that starts the memory leak but the 
 problem is indeed in the kernel itself. versions from 2.6.10 up to 
 .11-rc3 have this problem afaik. haven"t tested rc4 but maybe this one 
 doesn"t have the problem anymore, we will see"
--

I tested -rc4 and it has the problem too. Moreover, with the plain old 8139too 
driver the slab still continues to grow, albeit slowly. So there is reason 
to suspect a kernel leak as well. I will try binary searching...

Parag


Re: -rc3 leaking NOT BIO [Was: Memory leak in 2.6.11-rc1?]

2005-02-16 Thread Andrew Morton
Parag Warudkar <[EMAIL PROTECTED]> wrote:
>
> On Wednesday 16 February 2005 12:12 am, Andrew Morton wrote:
> > Plenty of moisture there.
> >
> > Could you please use this patch?  Make sure that you enable
> > CONFIG_FRAME_POINTER (might not be needed for __builtin_return_address(0),
> > but let's be sure).  Also enable CONFIG_DEBUG_SLAB.
> 
> Will try that out. For now I tried -rc4 and couple other things - removing 
> nvidia module doesnt make any difference but removing ndiswrapper and with no 
> networking the slab growth stops. With 8139too driver and network the growth 
> is there but pretty slower than with ndiswrapper. With 8139too + some network 
> activity slab seems to reduce sometimes.

OK.

> Seems either an ndiswrapper or a networking related leak. Will report the 
> results with Manfred's patch tomorrow.

So it's probably an ndiswrapper bug?


Re: -rc3 leaking NOT BIO [Was: Memory leak in 2.6.11-rc1?]

2005-02-16 Thread Andrew Morton
Parag Warudkar <[EMAIL PROTECTED]> wrote:
>
> On Wednesday 16 February 2005 12:12 am, Andrew Morton wrote:
> > echo "size-4096 0 0 0" > /proc/slabinfo
> 
> Is there a reason X86_64 doesnt have CONFIG_FRAME_POINTER anywhere in 
> the .config?

No good reason, I suspect.

> I tried -rc4 with Manfred's patch and with CONFIG_DEBUG_SLAB and 
> CONFIG_DEBUG.

Thanks.

> I get the following output from
> echo "size-64 0 0 0" > /proc/slabinfo
> 
> obj 81002fe8/0: 08a8 <0x8a8>
> obj 81002fe8/1: 08a8 <0x8a8>
> obj 81002fe8/2: 08a8 <0x8a8>
> : 3
> : 4
> : :
> obj 81002fe8/43: 08a8 <0x8a8>
> obj 81002fe8/44: 08a8 <0x8a8>
>  
> How do I know what is at 81002fe8? I tried the normal tricks (gdb 
> -c /proc/kcore vmlinux, objdump -d etc.) but none of the places list this 
> address.

81002fe8 is the address of the slab object.  08a8 is
supposed to be the caller's text address.  It appears that
__builtin_return_address(0) is returning junk.  Perhaps due to
-fomit-frame-pointer.
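
A minimal user-space illustration of the mechanism the debug code relies on
(a hypothetical example, not kernel code). Level 0 only needs the return
address pushed by the call; deeper levels generally need the frame-pointer
chain, which is why -fomit-frame-pointer matters most for them:

#include <stdio.h>

static void *last_caller;

__attribute__((noinline)) static void record_caller(void)
{
	/* text address of whoever called us, like the slab debug code records */
	last_caller = __builtin_return_address(0);
}

int main(void)
{
	record_caller();
	printf("called from %p\n", last_caller);
	return 0;
}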



Re: -rc3 leaking NOT BIO [Was: Memory leak in 2.6.11-rc1?]

2005-02-16 Thread Parag Warudkar
On Wednesday 16 February 2005 12:12 am, Andrew Morton wrote:
> Plenty of moisture there.
>
> Could you please use this patch?  Make sure that you enable
> CONFIG_FRAME_POINTER (might not be needed for __builtin_return_address(0),
> but let's be sure).  Also enable CONFIG_DEBUG_SLAB.

Will try that out. For now I tried -rc4 and a couple of other things - removing 
the nvidia module doesn't make any difference, but removing ndiswrapper and 
running with no networking stops the slab growth. With the 8139too driver and 
networking, the growth is there but much slower than with ndiswrapper. With 
8139too + some network activity the slab sometimes even seems to shrink.

Seems to be either an ndiswrapper or a networking-related leak. Will report the 
results with Manfred's patch tomorrow.

Thanks
Parag


Re: -rc3 leaking NOT BIO [Was: Memory leak in 2.6.11-rc1?]

2005-02-15 Thread Andrew Morton
Parag Warudkar <[EMAIL PROTECTED]> wrote:
>
> I am running -rc3 on my AMD64 laptop and I noticed it becomes sluggish after 
> use mainly due to growing swap use.  It has 768M of RAM and a Gig of swap. 
> After following this thread, I started monitoring /proc/slabinfo. It seems 
> size-64 is continuously growing and doing a compile run seem to make it grow 
> noticeably faster. After a day's uptime size-64 line in /proc/slabinfo looks 
> like 
> 
> size-64   7216543 7216544 64   611 : tunables  120   600 
> : 
> slabdata 118304 118304  0

Plenty of moisture there.

Could you please use this patch?  Make sure that you enable
CONFIG_FRAME_POINTER (might not be needed for __builtin_return_address(0),
but let's be sure).  Also enable CONFIG_DEBUG_SLAB.



From: Manfred Spraul <[EMAIL PROTECTED]>

With the patch applied,

echo "size-4096 0 0 0" > /proc/slabinfo

walks the objects in the size-4096 slab, printing out the calling address
of whoever allocated that object.

It is for leak detection.


diff -puN mm/slab.c~slab-leak-detector mm/slab.c
--- 25/mm/slab.c~slab-leak-detector 2005-02-15 21:06:44.0 -0800
+++ 25-akpm/mm/slab.c   2005-02-15 21:06:44.0 -0800
@@ -2116,6 +2116,15 @@ cache_alloc_debugcheck_after(kmem_cache_
*dbg_redzone1(cachep, objp) = RED_ACTIVE;
*dbg_redzone2(cachep, objp) = RED_ACTIVE;
}
+   {
+   int objnr;
+   struct slab *slabp;
+
+   slabp = GET_PAGE_SLAB(virt_to_page(objp));
+
+   objnr = (objp - slabp->s_mem) / cachep->objsize;
+   slab_bufctl(slabp)[objnr] = (unsigned long)caller;
+   }
objp += obj_dbghead(cachep);
if (cachep->ctor && cachep->flags & SLAB_POISON) {
unsigned long   ctor_flags = SLAB_CTOR_CONSTRUCTOR;
@@ -2179,12 +2188,14 @@ static void free_block(kmem_cache_t *cac
objnr = (objp - slabp->s_mem) / cachep->objsize;
check_slabp(cachep, slabp);
 #if DEBUG
+#if 0
if (slab_bufctl(slabp)[objnr] != BUFCTL_FREE) {
printk(KERN_ERR "slab: double free detected in cache 
'%s', objp %p.\n",
cachep->name, objp);
BUG();
}
 #endif
+#endif
slab_bufctl(slabp)[objnr] = slabp->free;
slabp->free = objnr;
STATS_DEC_ACTIVE(cachep);
@@ -2998,6 +3009,29 @@ struct seq_operations slabinfo_op = {
.show   = s_show,
 };
 
+static void do_dump_slabp(kmem_cache_t *cachep)
+{
+#if DEBUG
+   struct list_head *q;
+
+   check_irq_on();
+   spin_lock_irq(&cachep->spinlock);
+   list_for_each(q, &cachep->lists.slabs_full) {
+   struct slab *slabp;
+   int i;
+   slabp = list_entry(q, struct slab, list);
+   for (i = 0; i < cachep->num; i++) {
+   unsigned long sym = slab_bufctl(slabp)[i];
+
+   printk("obj %p/%d: %p", slabp, i, (void *)sym);
+   print_symbol(" <%s>", sym);
+   printk("\n");
+   }
+   }
+   spin_unlock_irq(&cachep->spinlock);
+#endif
+}
+
 #define MAX_SLABINFO_WRITE 128
 /**
  * slabinfo_write - Tuning for the slab allocator
@@ -3038,9 +3072,11 @@ ssize_t slabinfo_write(struct file *file
batchcount < 1 ||
batchcount > limit ||
shared < 0) {
-   res = -EINVAL;
+   do_dump_slabp(cachep);
+   res = 0;
} else {
-   res = do_tune_cpucache(cachep, limit, 
batchcount, shared);
+   res = do_tune_cpucache(cachep, limit,
+   batchcount, shared);
}
break;
}
_



-rc3 leaking NOT BIO [Was: Memory leak in 2.6.11-rc1?]

2005-02-15 Thread Parag Warudkar
I am running -rc3 on my AMD64 laptop and I noticed it becomes sluggish after 
use, mainly due to growing swap use.  It has 768M of RAM and a Gig of swap. 
After following this thread, I started monitoring /proc/slabinfo. It seems 
size-64 is continuously growing, and doing a compile run seems to make it grow 
noticeably faster. After a day's uptime the size-64 line in /proc/slabinfo 
looks like 

size-64   7216543 7216544 64   611 : tunables  120   600 : 
slabdata 118304 118304  0

Since this doesn't seem to be bio, I think we have another slab leak somewhere. 
The box recently went OOM during a gcc compile run after I killed the swap.

Output from free, the OOM killer, and /proc/slabinfo is below...

free output -
              total       used       free     shared    buffers     cached
Mem:         767996     758120       9876          0       5276     130360
-/+ buffers/cache:       622484     145512
Swap:       1052248      67668     984580

OOM Killer Output
oom-killer: gfp_mask=0x1d2
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu: empty

Free pages:7260kB (0kB HighMem)
Active:62385 inactive:850 dirty:0 writeback:0 unstable:0 free:1815 slab:120136 
mapped:62334 pagetables:2110
DMA free:3076kB min:72kB low:88kB high:108kB active:3328kB inactive:0kB 
present:16384kB pages_scanned:4446 all_unreclaimable? yes
lowmem_reserve[]: 0 751 751
Normal free:4184kB min:3468kB low:4332kB high:5200kB active:246212kB 
inactive:3400kB present:769472kB pages_scanned:3834 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB 
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 
1*2048kB 0*4096kB = 3076kB
Normal: 170*4kB 10*8kB 2*16kB 0*32kB 1*64kB 0*128kB 1*256kB 2*512kB 0*1024kB 
1*2048kB 0*4096kB = 4184kB
HighMem: empty
Swap cache: add 310423, delete 310423, find 74707/105490, race 0+0
Free swap  = 0kB
Total swap = 0kB
Out of Memory: Killed process 4898 (klauncher).
oom-killer: gfp_mask=0x1d2
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu: empty

Free pages:7020kB (0kB HighMem)
Active:62308 inactive:648 dirty:0 writeback:0 unstable:0 free:1755 slab:120439 
mapped:62199 pagetables:2020
DMA free:3076kB min:72kB low:88kB high:108kB active:3336kB inactive:0kB 
present:16384kB pages_scanned:7087 all_unreclaimable? yes
lowmem_reserve[]: 0 751 751
Normal free:3944kB min:3468kB low:4332kB high:5200kB active:245896kB 
inactive:2592kB present:769472kB pages_scanned:3861 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB 
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 
1*2048kB 0*4096kB = 3076kB
Normal: 112*4kB 9*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 2*512kB 0*1024kB 
1*2048kB 0*4096kB = 3944kB
HighMem: empty
Swap cache: add 310423, delete 310423, find 74707/105490, race 0+0
Free swap  = 0kB
Total swap = 0kB
Out of Memory: Killed process 4918 (kwin).

/proc/slabinfo output

ipx_sock   0  089641 : tunables   54   270 : 
slabdata  0  0  0
scsi_cmd_cache 3  757671 : tunables   54   270 : 
slabdata  1  1  0
ip_fib_alias  10119 32  1191 : tunables  120   600 : 
slabdata  1  1  0
ip_fib_hash   10 61 64   611 : tunables  120   600 : 
slabdata  1  1  0
sgpool-12832 32   409611 : tunables   24   120 : 
slabdata 32 32  0
sgpool-64 32 32   204821 : tunables   24   120 : 
slabdata 16 16  0
sgpool-32 32 32   102441 : tunables   54   270 : 
slabdata  8  8  0
sgpool-16 32 3251281 : tunables   54   270 : 
slabdata  4  4  0
sgpool-8  32 45256   151 : tunables  120   600 : 
slabdata  3  3  0
ext3_inode_cache2805   3063   122431 : tunables   24   120 : 
slabdata   1021   1021  0
ext3_xattr 0  0 88   451 : tunables  120   600 : 
slabdata  0  0  0
journal_handle16156 24  1561 : tunables  120   600 : 
slabdata  1  1  0
journal_head  49180 88   451 : tunables  120   600 : 
slabdata  4  4  0
revoke_table   6225 16  2251 : tunables  120   600 : 
slabdata  1  1  0
revoke_record  0  0 32  1191 : 

Re: Memory leak in 2.6.11-rc1? (also here)

2005-02-07 Thread Noel Maddy
On Mon, Feb 07, 2005 at 07:38:12AM -0800, Linus Torvalds wrote:
> 
> Whee. You've got 5 _million_ bio's "active". Which account for about 750MB
> of your 860MB of slab usage.

Same situation here, at different rates on two different platforms,
both running same kernel build. Both show steadily increasing biovec-1.

uglybox was previously running Ingo's 2.6.11-rc2-RT-V0.7.36-03, and was
well over 3,000,000 bios after about a week of uptime. With only 512M of
memory, it was pretty sluggish.

Interesting that the 4-disk RAID5 seems to be growing about 4 times as
fast as the RAID1.

If there's anything else that could help, or patches you want me to try,
just ask.

Details:

=
#1: Soyo KT600 Platinum, Athlon 2500+, 512MB
2 SATA, 2 PATA (all on 8237)
RAID1 and RAID5
on-board tg3


>uname -a
Linux uglybox 2.6.11-rc3 #2 Thu Feb 3 16:19:44 EST 2005 i686 GNU/Linux
>uptime
 21:27:47 up  7:04,  4 users,  load average: 1.06, 1.03, 1.02
>grep '^bio' /proc/slabinfo
biovec-(256) 256256   307222 : tunables   24   120 : 
slabdata128128  0
biovec-128   256260   153652 : tunables   24   120 : 
slabdata 52 52  0
biovec-6425626076851 : tunables   54   270 : 
slabdata 52 52  0
biovec-16256260192   201 : tunables  120   600 : 
slabdata 13 13  0
biovec-4 256305 64   611 : tunables  120   600 : 
slabdata  5  5  0
biovec-1   64547  64636 16  2261 : tunables  120   600 : 
slabdata286286  0
bio64551  64599 64   611 : tunables  120   600 : 
slabdata   1059   1059  0
>lsmod
Module  Size  Used by
ppp_deflate 4928  2 
zlib_deflate   21144  1 ppp_deflate
bsd_comp5376  0 
ppp_async   9280  1 
crc_ccitt   1728  1 ppp_async
ppp_generic21396  7 ppp_deflate,bsd_comp,ppp_async
slhc6720  1 ppp_generic
radeon 76224  1 
ipv6  235456  27 
pcspkr  3300  0 
tg384932  0 
ohci1394   31748  0 
ieee1394   94196  1 ohci1394
snd_cmipci 30112  1 
snd_pcm_oss48480  0 
snd_mixer_oss  17728  1 snd_pcm_oss
usbhid 31168  0 
snd_pcm83528  2 snd_cmipci,snd_pcm_oss
snd_page_alloc  7620  1 snd_pcm
snd_opl3_lib9472  1 snd_cmipci
snd_timer  21828  2 snd_pcm,snd_opl3_lib
snd_hwdep   7456  1 snd_opl3_lib
snd_mpu401_uart 6528  1 snd_cmipci
snd_rawmidi20704  1 snd_mpu401_uart
snd_seq_device  7116  2 snd_opl3_lib,snd_rawmidi
snd48996  12 
snd_cmipci,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_opl3_lib,snd_timer,snd_hwdep,snd_mpu401_uart,snd_rawmidi,snd_seq_device
soundcore   7648  1 snd
uhci_hcd   29968  0 
ehci_hcd   29000  0 
usbcore   106744  4 usbhid,uhci_hcd,ehci_hcd
dm_mod 52796  0 
it87   23900  0 
eeprom  5776  0 
lm90   11044  0 
i2c_sensor  2944  3 it87,eeprom,lm90
i2c_isa 1728  0 
i2c_viapro  6412  0 
i2c_core   18512  6 it87,eeprom,lm90,i2c_sensor,i2c_isa,i2c_viapro
>lspci
:00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host 
Bridge (rev 80)
:00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
:00:07.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5705 
Gigabit Ethernet (rev 03)
:00:0d.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host 
Controller (rev 46)
:00:0e.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 
10)
:00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID 
Controller (rev 80)
:00:0f.1 IDE interface: VIA Technologies, Inc. 
VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
:00:10.0 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 
Controller (rev 81)
:00:10.1 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 
Controller (rev 81)
:00:10.2 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 
Controller (rev 81)
:00:10.3 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 
Controller (rev 81)
:00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
:00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [K8T800 South]
:00:13.0 RAID bus controller: Silicon Image, Inc. (formerly CMD Technology 
Inc) SiI 3112 [SATALink/SATARaid] Serial ATA Controller (rev 02)
:01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV200 QW 
[Radeon 7500]
>cat /proc/mdstat
Personalities : [raid0] 

Re: Memory leak in 2.6.11-rc1?

2005-02-07 Thread Jan Kasprzak
Jan Kasprzak wrote:
:   I think I have been running 2.6.10-rc3 before. I've copied
: the fs/bio.c from 2.6.10-rc3 to my 2.6.11-rc2 sources and booted the
: resulting kernel. I hope it will not eat my filesystems :-) I will send
: my /proc/slabinfo in a few days.

Hmm, after 3h35min of uptime I have

biovec-1   92157  92250 16  2251 : tunables  120   608 : 
slabdata410410 60
bio92163  92163128   311 : tunables  120   608 : 
slabdata   2973   2973 60

so it is probably still leaking - about half an hour ago it was

biovec-1   77685  77850 16  2251 : tunables  120   608 : 
slabdata346346  0
bio77841  77841128   311 : tunables  120   608 : 
slabdata   2511   2511180
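
(Rough rate estimate from the two samples above, assuming the half-hour gap:
92157 - 77685 = 14472 biovec-1 objects in ~30 minutes, i.e. about 8 leaked
bios per second; at 128 bytes per bio plus 16 bytes per biovec-1 that is
roughly 2 MB per half hour, or ~4 MB/hour of pinned slab.)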

-Yenya

-- 
| Jan "Yenya" Kasprzak   |
| GPG: ID 1024/D3498839  Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Czech Linux Homepage: http://www.linux.cz/ |
> Whatever the Java applications and desktop dances may lead to, Unix will <
> still be pushing the packets around for a quite a while.  --Rob Pike <


Re: Memory leak in 2.6.11-rc1?

2005-02-07 Thread Jan Kasprzak
[EMAIL PROTECTED] wrote:
: My guess would be the clone change, if raid was not leaking before. I
: cannot lookup any patches at the moment, as I'm still at the hospital
: taking care of my new born baby and wife :)

Congratulations!

: But try and reverse the patches to fs/bio.c that mention corruption due to
: bio_clone and bio->bi_io_vec and see if that cures it. If it does, I know
: where to look. When did you notice this started to leak?

I think I have been running 2.6.10-rc3 before. I've copied
the fs/bio.c from 2.6.10-rc3 to my 2.6.11-rc2 sources and booted the
resulting kernel. I hope it will not eat my filesystems :-) I will send
my /proc/slabinfo in a few days.

-Yenya

-- 
| Jan "Yenya" Kasprzak   |
| GPG: ID 1024/D3498839  Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Czech Linux Homepage: http://www.linux.cz/ |
> Whatever the Java applications and desktop dances may lead to, Unix will <
> still be pushing the packets around for a quite a while.  --Rob Pike <


Re: Memory leak in 2.6.11-rc1?

2005-02-07 Thread axboe
> Linus Torvalds wrote:
> : Jan - can you give Jens a bit of an idea of what drivers and/or
> schedulers
> : you're using?
>
>   I have a Tyan S2882 dual Opteron, network is on-board tg3,
> there are 8 P-ATA HDDs hooked on 3ware 7506-8 controller (no HW RAID
> there, but the drives are partitioned and partition grouped to form
> software RAID-0, 1, 5, and 10 volumes - the main fileserving traffic
> is on a RAID-5 volume, and /var is on RAID-10 volume.
>
>   Filesystems are XFS for that RAID-5 volume, ext3 for the rest
> of the system. I have compiled-in the following I/O schedulers (according
> to my /var/log/dmesg :-)
>
> io scheduler noop registered
> io scheduler anticipatory registered
> io scheduler deadline registered
> io scheduler cfq registered
>
> I have not changed the scheduler by hand, so I suppose the anticipatory
> is the default.
>
>   No X, just serial console. The server does FTP serving mostly
> (ProFTPd with sendfile() compiled in), sending mail via qmail (cca
> 100-200k mails a day), and bits of other work (rsync, Apache, ...).
> Fedora core 3 with all relevant updates.
>
>   My fstab (physical devices only):
> /dev/md0        /               ext3    defaults        1 1
> /dev/md1        /home           ext3    defaults        1 2
> /dev/md6        /var            ext3    defaults        1 2
> /dev/md4        /fastraid       xfs     noatime         1 3
> /dev/md5        /export         xfs     noatime         1 4
> /dev/sde4       swap            swap    pri=10          0 0
> /dev/sdf4       swap            swap    pri=10          0 0
> /dev/sdg4       swap            swap    pri=10          0 0
> /dev/sdh4       swap            swap    pri=10          0 0
>
>   My mdstat:
>
> Personalities : [raid0] [raid1] [raid5]
> md6 : active raid0 md3[0] md2[1]
>   19550720 blocks 64k chunks
>
> md1 : active raid1 sdd1[1] sdc1[0]
>   14659200 blocks [2/2] [UU]
>
> md2 : active raid1 sdf1[1] sde1[0]
>   9775424 blocks [2/2] [UU]
>
> md3 : active raid1 sdh1[1] sdg1[0]
>   9775424 blocks [2/2] [UU]
>
> md4 : active raid0 sdh2[7] sdg2[6] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1]
> sda2[0]
>   39133184 blocks 256k chunks
>
> md5 : active raid5 sdh3[7] sdg3[6] sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1]
> sda3[0]
>   1572512256 blocks level 5, 256k chunk, algorithm 2 [8/8] []
>
> md0 : active raid1 sdb1[1] sda1[0]
>   14659200 blocks [2/2] [UU]

My guess would be the clone change, if raid was not leaking before. I
cannot look up any patches at the moment, as I'm still at the hospital
taking care of my newborn baby and wife :)

But try and reverse the patches to fs/bio.c that mention corruption due to
bio_clone and bio->bi_io_vec and see if that cures it. If it does, I know
where to look. When did you notice this started to leak?

Jens



Re: Memory leak in 2.6.11-rc1?

2005-02-07 Thread Jan Kasprzak
Linus Torvalds wrote:
: Jan - can you give Jens a bit of an idea of what drivers and/or schedulers 
: you're using?

I have a Tyan S2882 dual Opteron, network is on-board tg3,
there are 8 P-ATA HDDs hooked on 3ware 7506-8 controller (no HW RAID
there, but the drives are partitioned and partition grouped to form
software RAID-0, 1, 5, and 10 volumes - the main fileserving traffic
is on a RAID-5 volume, and /var is on RAID-10 volume.

Filesystems are XFS for that RAID-5 volume, ext3 for the rest
of the system. I have compiled-in the following I/O schedulers (according
to my /var/log/dmesg :-)

io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered

I have not changed the scheduler by hand, so I suppose the anticipatory
is the default.

No X, just serial console. The server does FTP serving mostly
(ProFTPd with sendfile() compiled in), sending mail via qmail (cca
100-200k mails a day), and bits of other work (rsync, Apache, ...).
Fedora core 3 with all relevant updates.

My fstab (physical devices only):
/dev/md0        /               ext3    defaults        1 1
/dev/md1        /home           ext3    defaults        1 2
/dev/md6        /var            ext3    defaults        1 2
/dev/md4        /fastraid       xfs     noatime         1 3
/dev/md5        /export         xfs     noatime         1 4
/dev/sde4       swap            swap    pri=10          0 0
/dev/sdf4       swap            swap    pri=10          0 0
/dev/sdg4       swap            swap    pri=10          0 0
/dev/sdh4       swap            swap    pri=10          0 0

My mdstat:

Personalities : [raid0] [raid1] [raid5]
md6 : active raid0 md3[0] md2[1]
  19550720 blocks 64k chunks

md1 : active raid1 sdd1[1] sdc1[0]
  14659200 blocks [2/2] [UU]

md2 : active raid1 sdf1[1] sde1[0]
  9775424 blocks [2/2] [UU]

md3 : active raid1 sdh1[1] sdg1[0]
  9775424 blocks [2/2] [UU]

md4 : active raid0 sdh2[7] sdg2[6] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] 
sda2[0]
  39133184 blocks 256k chunks

md5 : active raid5 sdh3[7] sdg3[6] sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1] 
sda3[0]
  1572512256 blocks level 5, 256k chunk, algorithm 2 [8/8] []

md0 : active raid1 sdb1[1] sda1[0]
  14659200 blocks [2/2] [UU]

unused devices: <none>

Anything else you want to know? Thanks,

-Yenya

-- 
| Jan "Yenya" Kasprzak   |
| GPG: ID 1024/D3498839  Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Czech Linux Homepage: http://www.linux.cz/ |
> Whatever the Java applications and desktop dances may lead to, Unix will <
> still be pushing the packets around for a quite a while.  --Rob Pike <


Re: Memory leak in 2.6.11-rc1?

2005-02-07 Thread Linus Torvalds


On Mon, 7 Feb 2005, Jan Kasprzak wrote:
>
>The server has been running 2.6.11-rc2 + patch to fs/pipe.c
>for last 8 days. 
> 
> # cat /proc/meminfo
> MemTotal:  4045168 kB
> Cached:2861648 kB
> LowFree: 59396 kB
> Mapped: 206540 kB
> Slab:   861176 kB

Ok, pretty much everything there and accounted for: you've got 4GB of 
memory, and it's pretty much all in cached/mapped/slab. So if something is 
leaking, it's in one of those three.

And I think I see which one it is:

> # cat /proc/slabinfo
> slabinfo - version: 2.1
> # name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
>  : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs>
>  <num_slabs> <sharedavail>
> biovec-1  5506200 5506200 16  2251 : tunables  120   608 
> : slabdata  24472  24472240
> bio   5506189 5506189128   311 : tunables  120   608 
> : slabdata 177619 177619180

Whee. You've got 5 _million_ bio's "active". Which account for about 750MB
of your 860MB of slab usage.
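
For reference, the ~750MB follows directly from the quoted slabinfo lines
(object count times object size, ignoring per-slab overhead):

  bio:       5,506,189 objects * 128 bytes ~= 672 MiB
  biovec-1:  5,506,200 objects *  16 bytes ~=  84 MiB
  total                                    ~= 756 MiB of the 861176 kB in Slab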

Jens, any ideas? Doesn't look like the "md sync_page_io bio leak", since
that would just lose one bio per md suprt block read according to you (and
that's the only one I can find fixed since -rc2). I doubt Jan has caused
five million of those..

Jan - can you give Jens a bit of an idea of what drivers and/or schedulers 
you're using?

Linus


Re: Memory leak in 2.6.11-rc1?

2005-02-07 Thread William Lee Irwin III
On Mon, Feb 07, 2005 at 12:00:30PM +0100, Jan Kasprzak wrote:
>   Well, with Linus' patch to fs/pipe.c the situation seems to
> improve a bit, but some leak is still there (look at the "monthly" graph
> at the above URL). The server has been running 2.6.11-rc2 + patch to fs/pipe.c
> for last 8 days. I am letting it run for a few more days in case you want
> some debugging info from a live system. I am attaching my /proc/meminfo
> and /proc/slabinfo.

Congratulations. You have 688MB of bio's.


-- wli


Re: Memory leak in 2.6.11-rc1?

2005-02-07 Thread Jan Kasprzak
: I've been running 2.6.11-rc1 on my dual opteron Fedora Core 3 box for a week
: now, and I think there is a memory leak somewhere. I am measuring the
: size of active and inactive pages (from /proc/meminfo), and it seems
: that the count of sum (active+inactive) pages is decreasing. Please
: take look at the graphs at
: 
: http://www.linux.cz/stats/mrtg-rrd/vm_active.html

Well, with Linus' patch to fs/pipe.c the situation seems to
improve a bit, but some leak is still there (look at the "monthly" graph
at the above URL). The server has been running 2.6.11-rc2 + patch to fs/pipe.c
for last 8 days. I am letting it run for a few more days in case you want
some debugging info from a live system. I am attaching my /proc/meminfo
and /proc/slabinfo.

-Yenya

# cat /proc/meminfo
MemTotal:  4045168 kB
MemFree: 59396 kB
Buffers: 17812 kB
Cached:2861648 kB
SwapCached:  0 kB
Active: 827700 kB
Inactive:  2239752 kB
HighTotal:   0 kB
HighFree:0 kB
LowTotal:  4045168 kB
LowFree: 59396 kB
SwapTotal:14651256 kB
SwapFree: 14650584 kB
Dirty:1616 kB
Writeback:   0 kB
Mapped: 206540 kB
Slab:   861176 kB
CommitLimit:  16673840 kB
Committed_AS:   565684 kB
PageTables:  20812 kB
VmallocTotal: 34359738367 kB
VmallocUsed:  7400 kB
VmallocChunk: 34359730867 kB
# cat /proc/slabinfo
slabinfo - version: 2.1
# name            active_objs num_objs objsize objperslab pagesperslab : tunables batchcount limit sharedfactor : slabdata active_slabs num_slabs sharedavail
raid5/md5256260   141652 : tunables   24   128 : 
slabdata 52 52  0
rpc_buffers8  8   204821 : tunables   24   128 : 
slabdata  4  4  0
rpc_tasks 12 20384   101 : tunables   54   278 : 
slabdata  2  2  0
rpc_inode_cache8 1076851 : tunables   54   278 : 
slabdata  2  2  0
fib6_nodes27 61 64   611 : tunables  120   608 : 
slabdata  1  1  0
ip6_dst_cache 17 36320   121 : tunables   54   278 : 
slabdata  3  3  0
ndisc_cache2 30256   151 : tunables  120   608 : 
slabdata  2  2  0
rawv6_sock 4  4   102441 : tunables   54   278 : 
slabdata  1  1  0
udpv6_sock 1  496041 : tunables   54   278 : 
slabdata  1  1  0
tcpv6_sock 8  8   166442 : tunables   24   128 : 
slabdata  2  2  0
unix_sock56765076851 : tunables   54   278 : 
slabdata130130  0
tcp_tw_bucket445920192   201 : tunables  120   608 : 
slabdata 46 46  0
tcp_bind_bucket  389   2261 32  1191 : tunables  120   608 : 
slabdata 19 19  0
tcp_open_request 135310128   311 : tunables  120   608 : 
slabdata 10 10  0
inet_peer_cache   32 62128   311 : tunables  120   608 : 
slabdata  2  2  0
ip_fib_alias  20119 32  1191 : tunables  120   608 : 
slabdata  1  1  0
ip_fib_hash   18 61 64   611 : tunables  120   608 : 
slabdata  1  1  0
ip_dst_cache1738   2060384   101 : tunables   54   278 : 
slabdata206206  0
arp_cache  8 30256   151 : tunables  120   608 : 
slabdata  2  2  0
raw_sock   3  983292 : tunables   54   278 : 
slabdata  1  1  0
udp_sock  45 4583292 : tunables   54   278 : 
slabdata  5  5  0
tcp_sock 431600   147252 : tunables   24   128 : 
slabdata120120  0
flow_cache 0  0128   311 : tunables  120   608 : 
slabdata  0  0  0
dm_tio 0  0 24  1561 : tunables  120   608 : 
slabdata  0  0  0
dm_io  0  0 32  1191 : tunables  120   608 : 
slabdata  0  0  0
scsi_cmd_cache   26131551271 : tunables   54   278 : 
slabdata 45 45216
cfq_ioc_pool   0  0 48   811 : tunables  120   608 : 
slabdata  0  0  0
cfq_pool   0  0176   221 : tunables  120   608 : 
slabdata  0  0  0
crq_pool   0  0104   381 : tunables  120   608 : 
slabdata  0  0  0
deadline_drq   0  0 96   411 : tunables  120   608 : 
slabdata  0  0  0
as_arq   580700112   351 : tunables  120   608 : 
slabdata 20 20432
xfs_acl0  0304   131 : tunables   54   278 : 
slabdata  0  0  0
xfs_chashlist380   4879 32  1191 


Re: Memory leak in 2.6.11-rc1?

2005-02-07 Thread Jan Kasprzak
Linus Torvalds wrote:
: Jan - can you give Jens a bit of an idea of what drivers and/or schedulers 
: you're using?

I have a Tyan S2882 dual Opteron, network is on-board tg3,
there are 8 P-ATA HDDs hooked on a 3ware 7506-8 controller (no HW RAID
there, but the drives are partitioned and the partitions grouped to form
software RAID-0, 1, 5, and 10 volumes) - the main fileserving traffic
is on a RAID-5 volume, and /var is on a RAID-10 volume.

Filesystems are XFS for that RAID-5 volume, ext3 for the rest
of the system. I have compiled-in the following I/O schedulers (according
to my /var/log/dmesg :-)

io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered

I have not changed the scheduler by hand, so I suppose the anticipatory
is the default.

No X, just serial console. The server does FTP serving mostly
(ProFTPd with sendfile() compiled in), sending mail via qmail (cca
100-200k mails a day), and bits of other work (rsync, Apache, ...).
Fedora core 3 with all relevant updates.

My fstab (physical devices only):
/dev/md0/   ext3defaults1 1
/dev/md1/home   ext3defaults1 2
/dev/md6/varext3defaults1 2
/dev/md4/fastraid   xfs noatime 1 3
/dev/md5/export xfs noatime 1 4
/dev/sde4   swapswappri=10  0 0
/dev/sdf4   swapswappri=10  0 0
/dev/sdg4   swapswappri=10  0 0
/dev/sdh4   swapswappri=10  0 0

My mdstat:

Personalities : [raid0] [raid1] [raid5]
md6 : active raid0 md3[0] md2[1]
  19550720 blocks 64k chunks

md1 : active raid1 sdd1[1] sdc1[0]
  14659200 blocks [2/2] [UU]

md2 : active raid1 sdf1[1] sde1[0]
  9775424 blocks [2/2] [UU]

md3 : active raid1 sdh1[1] sdg1[0]
  9775424 blocks [2/2] [UU]

md4 : active raid0 sdh2[7] sdg2[6] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] 
sda2[0]
  39133184 blocks 256k chunks

md5 : active raid5 sdh3[7] sdg3[6] sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1] 
sda3[0]
  1572512256 blocks level 5, 256k chunk, algorithm 2 [8/8] []

md0 : active raid1 sdb1[1] sda1[0]
  14659200 blocks [2/2] [UU]

unused devices: none

Anything else you want to know? Thanks,

-Yenya

-- 
| Jan Yenya Kasprzak  kas at {fi.muni.cz - work | yenya.net - private} |
| GPG: ID 1024/D3498839  Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Czech Linux Homepage: http://www.linux.cz/ |
 Whatever the Java applications and desktop dances may lead to, Unix will 
 still be pushing the packets around for a quite a while.  --Rob Pike 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-02-07 Thread axboe
 Linus Torvalds wrote:
 : Jan - can you give Jens a bit of an idea of what drivers and/or
 schedulers
 : you're using?

   I have a Tyan S2882 dual Opteron, network is on-board tg3,
 there are 8 P-ATA HDDs hooked on 3ware 7506-8 controller (no HW RAID
 there, but the drives are partitioned and partition grouped to form
 software RAID-0, 1, 5, and 10 volumes - the main fileserving traffic
 is on a RAID-5 volume, and /var is on RAID-10 volume.

   Filesystems are XFS for that RAID-5 volume, ext3 for the rest
 of the system. I have compiled-in the following I/O schedulers (according
 to my /var/log/dmesg :-)

 io scheduler noop registered
 io scheduler anticipatory registered
 io scheduler deadline registered
 io scheduler cfq registered

 I have not changed the scheduler by hand, so I suppose the anticipatory
 is the default.

   No X, just serial console. The server does FTP serving mostly
 (ProFTPd with sendfile() compiled in), sending mail via qmail (cca
 100-200k mails a day), and bits of other work (rsync, Apache, ...).
 Fedora core 3 with all relevant updates.

   My fstab (physical devices only):
 /dev/md0/   ext3defaults1
 1
 /dev/md1/home   ext3defaults1
 2
 /dev/md6/varext3defaults1
 2
 /dev/md4/fastraid   xfs noatime 1
 3
 /dev/md5/export xfs noatime 1
 4
 /dev/sde4   swapswappri=10  0
 0
 /dev/sdf4   swapswappri=10  0
 0
 /dev/sdg4   swapswappri=10  0
 0
 /dev/sdh4   swapswappri=10  0
 0

   My mdstat:

 Personalities : [raid0] [raid1] [raid5]
 md6 : active raid0 md3[0] md2[1]
   19550720 blocks 64k chunks

 md1 : active raid1 sdd1[1] sdc1[0]
   14659200 blocks [2/2] [UU]

 md2 : active raid1 sdf1[1] sde1[0]
   9775424 blocks [2/2] [UU]

 md3 : active raid1 sdh1[1] sdg1[0]
   9775424 blocks [2/2] [UU]

 md4 : active raid0 sdh2[7] sdg2[6] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1]
 sda2[0]
   39133184 blocks 256k chunks

 md5 : active raid5 sdh3[7] sdg3[6] sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1]
 sda3[0]
   1572512256 blocks level 5, 256k chunk, algorithm 2 [8/8] []

 md0 : active raid1 sdb1[1] sda1[0]
   14659200 blocks [2/2] [UU]

My guess would be the clone change, if raid was not leaking before. I
cannot lookup any patches at the moment, as I'm still at the hospital
taking care of my new born baby and wife :)

But try and reverse the patches to fs/bio.c that mention corruption due to
bio_clone and bio->bi_io_vec and see if that cures it. If it does, I know
where to look. When did you notice this started to leak?

Jens

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-02-07 Thread Jan Kasprzak
[EMAIL PROTECTED] wrote:
: My guess would be the clone change, if raid was not leaking before. I
: cannot lookup any patches at the moment, as I'm still at the hospital
: taking care of my new born baby and wife :)

Congratulations!

: But try and reverse the patches to fs/bio.c that mention corruption due to
: bio_clone and bio->bi_io_vec and see if that cures it. If it does, I know
: where to look. When did you notice this started to leak?

I think I have been running 2.6.10-rc3 before. I've copied
the fs/bio.c from 2.6.10-rc3 to my 2.6.11-rc2 sources and booted the
resulting kernel. I hope it will not eat my filesystems :-) I will send
my /proc/slabinfo in a few days.

-Yenya

-- 
| Jan Yenya Kasprzak  kas at {fi.muni.cz - work | yenya.net - private} |
| GPG: ID 1024/D3498839  Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Czech Linux Homepage: http://www.linux.cz/ |
 Whatever the Java applications and desktop dances may lead to, Unix will 
 still be pushing the packets around for a quite a while.  --Rob Pike 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-02-07 Thread Jan Kasprzak
Jan Kasprzak wrote:
:   I think I have been running 2.6.10-rc3 before. I've copied
: the fs/bio.c from 2.6.10-rc3 to my 2.6.11-rc2 sources and booted the
: resulting kernel. I hope it will not eat my filesystems :-) I will send
: my /proc/slabinfo in a few days.

Hmm, after 3h35min of uptime I have

biovec-1   92157  92250 16  2251 : tunables  120   608 : 
slabdata410410 60
bio92163  92163128   311 : tunables  120   608 : 
slabdata   2973   2973 60

so it is probably still leaking - about half an hour ago it was

biovec-1   77685  77850 16  2251 : tunables  120   608 : 
slabdata346346  0
bio77841  77841128   311 : tunables  120   608 : 
slabdata   2511   2511180

-Yenya

-- 
| Jan Yenya Kasprzak  kas at {fi.muni.cz - work | yenya.net - private} |
| GPG: ID 1024/D3498839  Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Czech Linux Homepage: http://www.linux.cz/ |
 Whatever the Java applications and desktop dances may lead to, Unix will 
 still be pushing the packets around for a quite a while.  --Rob Pike 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
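
For a sense of scale, the two samples above also give a rough leak rate. A small
standalone sketch, assuming the "about half an hour" spacing mentioned between the
samples (so this is an order-of-magnitude figure only):

/* bio_leak_rate.c - rough bio leak rate from the two slabinfo samples above (illustrative) */
#include <stdio.h>

int main(void)
{
	long bio_before = 77841;	/* active "bio" objects, earlier sample */
	long bio_after  = 92163;	/* active "bio" objects, later sample   */
	double minutes  = 30.0;		/* assumed interval between the samples */

	double per_min = (bio_after - bio_before) / minutes;
	printf("~%.0f bios/minute (~%.1f/second)\n", per_min, per_min / 60.0);
	return 0;
}

That works out to roughly 480 bios per minute, or about 8 per second.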


Re: Memory leak in 2.6.11-rc1? (also here)

2005-02-07 Thread Noel Maddy
On Mon, Feb 07, 2005 at 07:38:12AM -0800, Linus Torvalds wrote:
 
 Whee. You've got 5 _million_ bio's active. Which account for about 750MB
 of your 860MB of slab usage.

Same situation here, at different rates on two different platforms,
both running the same kernel build. Both show steadily increasing biovec-1.

uglybox was previously running Ingo's 2.6.11-rc2-RT-V0.7.36-03, and was
well over 3,000,000 bios after about a week of uptime. With only 512M of
memory, it was pretty sluggish.

Interesting that the 4-disk RAID5 seems to be growing about 4 times as
fast as the RAID1.

If there's anything else that could help, or patches you want me to try,
just ask.

Details:

=
#1: Soyo KT600 Platinum, Athlon 2500+, 512MB
2 SATA, 2 PATA (all on 8237)
RAID1 and RAID5
on-board tg3


uname -a
Linux uglybox 2.6.11-rc3 #2 Thu Feb 3 16:19:44 EST 2005 i686 GNU/Linux
uptime
 21:27:47 up  7:04,  4 users,  load average: 1.06, 1.03, 1.02
grep '^bio' /proc/slabinfo
biovec-(256) 256256   307222 : tunables   24   120 : 
slabdata128128  0
biovec-128   256260   153652 : tunables   24   120 : 
slabdata 52 52  0
biovec-6425626076851 : tunables   54   270 : 
slabdata 52 52  0
biovec-16256260192   201 : tunables  120   600 : 
slabdata 13 13  0
biovec-4 256305 64   611 : tunables  120   600 : 
slabdata  5  5  0
biovec-1   64547  64636 16  2261 : tunables  120   600 : 
slabdata286286  0
bio64551  64599 64   611 : tunables  120   600 : 
slabdata   1059   1059  0
lsmod
Module  Size  Used by
ppp_deflate 4928  2 
zlib_deflate   21144  1 ppp_deflate
bsd_comp5376  0 
ppp_async   9280  1 
crc_ccitt   1728  1 ppp_async
ppp_generic21396  7 ppp_deflate,bsd_comp,ppp_async
slhc6720  1 ppp_generic
radeon 76224  1 
ipv6  235456  27 
pcspkr  3300  0 
tg384932  0 
ohci1394   31748  0 
ieee1394   94196  1 ohci1394
snd_cmipci 30112  1 
snd_pcm_oss48480  0 
snd_mixer_oss  17728  1 snd_pcm_oss
usbhid 31168  0 
snd_pcm83528  2 snd_cmipci,snd_pcm_oss
snd_page_alloc  7620  1 snd_pcm
snd_opl3_lib9472  1 snd_cmipci
snd_timer  21828  2 snd_pcm,snd_opl3_lib
snd_hwdep   7456  1 snd_opl3_lib
snd_mpu401_uart 6528  1 snd_cmipci
snd_rawmidi20704  1 snd_mpu401_uart
snd_seq_device  7116  2 snd_opl3_lib,snd_rawmidi
snd48996  12 
snd_cmipci,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_opl3_lib,snd_timer,snd_hwdep,snd_mpu401_uart,snd_rawmidi,snd_seq_device
soundcore   7648  1 snd
uhci_hcd   29968  0 
ehci_hcd   29000  0 
usbcore   106744  4 usbhid,uhci_hcd,ehci_hcd
dm_mod 52796  0 
it87   23900  0 
eeprom  5776  0 
lm90   11044  0 
i2c_sensor  2944  3 it87,eeprom,lm90
i2c_isa 1728  0 
i2c_viapro  6412  0 
i2c_core   18512  6 it87,eeprom,lm90,i2c_sensor,i2c_isa,i2c_viapro
lspci
:00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host 
Bridge (rev 80)
:00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
:00:07.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5705 
Gigabit Ethernet (rev 03)
:00:0d.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host 
Controller (rev 46)
:00:0e.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 
10)
:00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID 
Controller (rev 80)
:00:0f.1 IDE interface: VIA Technologies, Inc. 
VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
:00:10.0 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 
Controller (rev 81)
:00:10.1 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 
Controller (rev 81)
:00:10.2 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 
Controller (rev 81)
:00:10.3 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 
Controller (rev 81)
:00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
:00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [K8T800 South]
:00:13.0 RAID bus controller: Silicon Image, Inc. (formerly CMD Technology 
Inc) SiI 3112 [SATALink/SATARaid] Serial ATA Controller (rev 02)
:01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV200 QW 
[Radeon 7500]
cat /proc/mdstat
Personalities : [raid0] [raid1] 

Re: Memory leak in 2.6.11-rc1?

2005-02-02 Thread Linus Torvalds


On Wed, 2 Feb 2005, Dave Hansen wrote:
> 
> Strangely enough, it seems to be one single, persistent page.  

Ok. Almost certainly not a leak. 

It's most likely the FIFO that "init" opens (/dev/initctl). FIFO's use the 
pipe code too.

If you don't want unreclaimable highmem pages, then I suspect you just 
need to change the GFP_HIGHUSER to a GFP_USER in fs/pipe.c

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
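
The change being described is a one-liner. A sketch of what it would look like in
the 2.6.11-era pipe_writev(), where the spare page gets allocated (the surrounding
lines are approximate context, shown only to locate the flag, not the exact source):

/* fs/pipe.c, pipe_writev() - sketch of the suggested GFP change */
		if (!page) {
			page = alloc_page(GFP_USER);	/* was GFP_HIGHUSER: keep pipe pages out of highmem */
			if (unlikely(!page)) {
				ret = ret ? : -ENOMEM;
				break;
			}
		}

With GFP_USER the pipe pages come from lowmem, so they no longer show up as
unreclaimable highmem pages in the memory-removal case being debugged in this thread.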


Re: Memory leak in 2.6.11-rc1?

2005-02-02 Thread Dave Hansen
On Wed, 2005-02-02 at 10:27 -0800, Linus Torvalds wrote:
> How many of these pages do you see? It's normal for a single pipe to be 
> associated with up to 16 pages (although that would only happen if there 
> is no reader or a slow reader, which is obviously not very common). 

Strangely enough, it seems to be one single, persistent page.  

> Now, if your memory freeing code depends on the fact that all HIGHMEM
> pages are always "freeable" (page cache + VM mappings only), then yes, the
> new pipe code introduces highmem pages that weren't highmem before.  But
> such long-lived and unfreeable pages have been there before too:  kernel
> modules (or any other vmalloc() user, for that matter) also do the same
> thing.

That might be it.  For now, I just change the GFP masks for vmalloc() so
that I don't have to deal with it, yet.  But, I certainly can see that
how this is a new user of highmem.

I did go around killing processes like mad to see if any of them still
had a hold of the pipe, but the shotgun approach didn't seem to help.

> Now, there _is_ another possibility here: we might have had a pipe leak
> before, and the new pipe code would potentially make it a lot more
> noticeable, with up to sixteen times as many pages lost if somebody freed
> a pipe inode without calling "free_pipe_info()". I don't see where that 
> would happen - all the normal "release" functions seem fine.
> 
> Hmm.. Adding a 
> 
>   WARN_ON(inode->i_pipe);
> 
> to "iput_final()" might be a good idea - showing if somebody is releasing 
> an inode while it still associated with a pipe-info data structure.
> 
> Also, while I don't see how a write could leak, but maybe you could you
> add a
> 
>   WARN_ON(buf->ops);
> 
> to the pipe_writev() case just before we insert a new buffer (ie to just
> after the comment that says "Insert it into the buffer array"). Just to
> see if the circular buffer handling might overwrite an old entry (although
> I _really_ don't see that - it's not like the code is complex, and it
> would also be accompanied by data-loss in the pipe, so we'd have seen
> that, methinks).

I'll put the warnings in, and see if anything comes up.

-- Dave

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-02-02 Thread Linus Torvalds


On Wed, 2 Feb 2005, Dave Hansen wrote:
> 
> In any case, I'm running a horribly hacked up kernel, but this is
> certainly a new problem, and not one that I've run into before.  Here's
> output from the new CONFIG_PAGE_OWNER code:

Hmm.. Everything looks fine. One new thing about the pipe code is that it 
historically never allocated HIGHMEM pages, and the new code no longer 
cares and thus can allocate anything. So there's nothing strange in your 
output that I can see.

How many of these pages do you see? It's normal for a single pipe to be 
associated with up to 16 pages (although that would only happen if there 
is no reader or a slow reader, which is obviously not very common). 

Now, if your memory freeing code depends on the fact that all HIGHMEM
pages are always "freeable" (page cache + VM mappings only), then yes, the
new pipe code introduces highmem pages that weren't highmem before.  But
such long-lived and unfreeable pages have been there before too:  kernel
modules (or any other vmalloc() user, for that matter) also do the same
thing.

Now, there _is_ another possibility here: we might have had a pipe leak
before, and the new pipe code would potentially make it a lot more
noticeable, with up to sixteen times as many pages lost if somebody freed
a pipe inode without calling "free_pipe_info()". I don't see where that 
would happen - all the normal "release" functions seem fine.

Hmm.. Adding a 

WARN_ON(inode->i_pipe);

to "iput_final()" might be a good idea - showing if somebody is releasing 
an inode while it still associated with a pipe-info data structure.

Also, while I don't see how a write could leak, but maybe you could you
add a

WARN_ON(buf->ops);

to the pipe_writev() case just before we insert a new buffer (ie to just
after the comment that says "Insert it into the buffer array"). Just to
see if the circular buffer handling might overwrite an old entry (although
I _really_ don't see that - it's not like the code is complex, and it
would also be accompanied by data-loss in the pipe, so we'd have seen
that, methinks).

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
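
For anyone wanting to try the two checks suggested above, a minimal sketch of where
they would sit, assuming the 2.6.11-era layout of fs/inode.c and fs/pipe.c (the
surrounding lines are illustrative context, not the exact source):

/* fs/inode.c - sketch: warn if an inode is finally dropped with pipe state attached */
static void iput_final(struct inode *inode)
{
	WARN_ON(inode->i_pipe);			/* suggested check */
	/* ... existing final-iput handling ... */
}

/* fs/pipe.c, pipe_writev() - sketch: warn before overwriting a live buffer slot */
	/* Insert it into the buffer array */
	WARN_ON(buf->ops);			/* suggested check: the slot should be empty here */
	buf->page = page;
	buf->ops = &anon_pipe_buf_ops;
	/* ... */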


Re: Memory leak in 2.6.11-rc1?

2005-02-02 Thread Dave Hansen
I think there's still something funky going on in the pipe code, at
least in 2.6.11-rc2-mm2, which does contain the misordered __free_page()
fix in pipe.c.  I'm noticing any leak pretty easily because I'm
attempting memory removal of highmem areas, and these apparently leaked
pipe pages are the only things keeping those from succeeding.

In any case, I'm running a horribly hacked up kernel, but this is
certainly a new problem, and not one that I've run into before.  Here's
output from the new CONFIG_PAGE_OWNER code:

Page (e0c4f8b8) pfn: 00566606 allocated via order 0
[0xc0162ef6] pipe_writev+542
[0xc0157f48] do_readv_writev+288
[0xc0163114] pipe_write+0
[0xc0134484] ltt_log_event+64
[0xc0158077] vfs_writev+75
[0xc01581ac] sys_writev+104
[0xc0102430] no_syscall_entry_trace+11

And some more information about the page (yes, it's in the vmalloc
space)

page: e0c4f8b8
pfn: 0008a54e 566606
count: 1
mapcount: 0
index: 786431
mapping: 
private: 
lru->prev: 00200200
lru->next: 00100100
PG_locked:  0
PG_error:   0
PG_referenced:  0
PG_uptodate:0
PG_dirty:   0
PG_lru: 0
PG_active:  0
PG_slab:0
PG_highmem: 1
PG_checked: 0
PG_arch_1:  0
PG_reserved:0
PG_private: 0
PG_writeback:   0
PG_nosave:  0
PG_compound:0
PG_swapcache:   0
PG_mappedtodisk:0
PG_reclaim: 0
PG_nosave_free: 0
PG_capture: 1


-- Dave

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-02-02 Thread Linus Torvalds


On Wed, 2 Feb 2005, Lennert Van Alboom wrote:
>
> I applied the patch and it works like a charm. As a kinky side effect: before 
> this patch, using a compiled-in vesa or vga16 framebuffer worked with the 
> proprietary nvidia driver, whereas now tty1-6 are corrupt when not using 
> 80x25. Strangeness :)

It really sounds like you should lay off those pharmaceutical drugs ;)

That is _strange_. Is it literally just this single pipe merging change
that matters to you? No other changces? I don't see how it could
_possibly_ make any difference at all to anything else.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-02-02 Thread Lennert Van Alboom
Positive, I only applied this single two-line change. I'm not capable of 
messing with kernel code myself so I prefer not to. Probably just a lucky 
shot that the vesa didn't go nuts with nvidia before... O well, with a bit 
more o'those pharmaceutical drugs even this 80x25 doesn't look too bad. 
Hurray!

Lennert

On Wednesday 02 February 2005 17:00, Linus Torvalds wrote:
> On Wed, 2 Feb 2005, Lennert Van Alboom wrote:
> > I applied the patch and it works like a charm. As a kinky side effect:
> > before this patch, using a compiled-in vesa or vga16 framebuffer worked
> > with the proprietary nvidia driver, whereas now tty1-6 are corrupt when
> > not using 80x25. Strangeness :)
>
> It really sounds like you should lay off those pharmaceutical drugs ;)
>
> That is _strange_. Is it literally just this single pipe merging change
> that matters to you? No other changes? I don't see how it could
> _possibly_ make any difference at all to anything else.
>
>   Linus


pgpAEbNCIOoA0.pgp
Description: PGP signature


Re: Memory leak in 2.6.11-rc1?

2005-02-02 Thread Lennert Van Alboom
I applied the patch and it works like a charm. As a kinky side effect: before 
this patch, using a compiled-in vesa or vga16 framebuffer worked with the 
proprietary nvidia driver, whereas now tty1-6 are corrupt when not using 
80x25. Strangeness :)

Lennert

On Monday 24 January 2005 23:35, Linus Torvalds wrote:
> On Mon, 24 Jan 2005, Andrew Morton wrote:
> > Would indicate that the new pipe code is leaking.
>
> Duh. It's the pipe merging.
>
>   Linus
>
> 
> --- 1.40/fs/pipe.c2005-01-15 12:01:16 -08:00
> +++ edited/fs/pipe.c  2005-01-24 14:35:09 -08:00
> @@ -630,13 +630,13 @@
>   struct pipe_inode_info *info = inode->i_pipe;
>
>   inode->i_pipe = NULL;
> - if (info->tmp_page)
> - __free_page(info->tmp_page);
>   for (i = 0; i < PIPE_BUFFERS; i++) {
>   struct pipe_buffer *buf = info->bufs + i;
>   if (buf->ops)
>   buf->ops->release(info, buf);
>   }
> + if (info->tmp_page)
> + __free_page(info->tmp_page);
>   kfree(info);
>  }


pgpGO7TveTbBa.pgp
Description: PGP signature


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread Yasuyuki KOZAKAI

Hi,

From: YOSHIFUJI Hideaki / [EMAIL PROTECTED] <[EMAIL PROTECTED]>
Date: Mon, 31 Jan 2005 14:16:36 +0900 (JST)

> In article <[EMAIL PROTECTED]> (at Mon, 31 Jan 2005 06:00:40 +0100), Patrick 
> McHardy <[EMAIL PROTECTED]> says:
> 
> |We don't need this for IPv6 yet. Once we get nf_conntrack in we
> |might need this, but its IPv6 fragment handling is different from
> |ip_conntrack, I need to check first.
> 
> Ok. It would be better to have some comment but anyway...
> kozakai-san?

IMO, fix for nf_conntrack isn't needed yet. Because someone may change
IPv6 fragment handling in nf_conntrack.

Anyway, current nf_conntrack passes the original (not de-fragmented) skb to
IPv6 stack. nf_conntrack doesn't touch its dst.

Regards,

Yasuyuki KOZAKAI

Communication Platform Laboratory,
Corporate Research & Development Center,
Toshiba Corporation

[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread Herbert Xu
On Sun, Jan 30, 2005 at 09:11:50PM -0800, David S. Miller wrote:
> On Mon, 31 Jan 2005 06:00:40 +0100
> Patrick McHardy <[EMAIL PROTECTED]> wrote:
> 
> > We don't need this for IPv6 yet. Once we get nf_conntrack in we
> > might need this, but its IPv6 fragment handling is different from
> > ip_conntrack, I need to check first.
> 
> Right, ipv6 netfilter cannot create this situation yet.

Not through netfilter but I'm not convinced that other paths
won't do this.

For instance, what about ipv6_frag_rcv -> esp6_input -> ... -> ip6_fragment?
That would seem to be a potential path for a non-NULL dst to survive
through to ip6_fragment, no?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread David S. Miller
On Mon, 31 Jan 2005 06:00:40 +0100
Patrick McHardy <[EMAIL PROTECTED]> wrote:

> We don't need this for IPv6 yet. Once we get nf_conntrack in we
> might need this, but its IPv6 fragment handling is different from
> ip_conntrack, I need to check first.

Right, ipv6 netfilter cannot create this situation yet.

However, logically the fix is still correct and I'll add
it into the tree.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread YOSHIFUJI Hideaki
In article <[EMAIL PROTECTED]> (at Mon, 31 Jan 2005 06:00:40 +0100), Patrick 
McHardy <[EMAIL PROTECTED]> says:

|We don't need this for IPv6 yet. Once we get nf_conntrack in we
|might need this, but its IPv6 fragment handling is different from
|ip_conntrack, I need to check first.

Ok. It would be better to have some comment but anyway...
kozakai-san?

--yoshfuji
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread Patrick McHardy
YOSHIFUJI Hideaki / [EMAIL PROTECTED] wrote:
In article <[EMAIL PROTECTED]> (at Mon, 31 Jan 2005 15:11:32 +1100), Herbert Xu 
<[EMAIL PROTECTED]> says:

Patrick McHardy <[EMAIL PROTECTED]> wrote:
Ok, final decision: you are right :) conntrack also defragments locally
generated packets before they hit ip_fragment. In this case the fragments
have skb->dst set.
Well caught.  The same thing is needed for IPv6, right?
(not yet confirmed, but) yes, please.
We don't need this for IPv6 yet. Once we get nf_conntrack in we
might need this, but its IPv6 fragment handling is different from
ip_conntrack, I need to check first.
Regards
Patrick
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread YOSHIFUJI Hideaki
In article <[EMAIL PROTECTED]> (at Mon, 31 Jan 2005 15:11:32 +1100), Herbert Xu 
<[EMAIL PROTECTED]> says:

> Patrick McHardy <[EMAIL PROTECTED]> wrote:
> > 
> > Ok, final decision: you are right :) conntrack also defragments locally
> > generated packets before they hit ip_fragment. In this case the fragments
> > have skb->dst set.
> 
> Well caught.  The same thing is needed for IPv6, right?

(not yet confirmed, but) yes, please.

Signed-off-by: Hideaki YOSHIFUJI <[EMAIL PROTECTED]>

= net/ipv6/ip6_output.c 1.82 vs edited =
--- 1.82/net/ipv6/ip6_output.c  2005-01-25 09:40:10 +09:00
+++ edited/net/ipv6/ip6_output.c2005-01-31 13:44:01 +09:00
@@ -463,6 +463,7 @@
to->priority = from->priority;
to->protocol = from->protocol;
to->security = from->security;
+   dst_release(to->dst);
to->dst = dst_clone(from->dst);
to->dev = from->dev;
 

--yoshfuji
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread Herbert Xu
Patrick McHardy <[EMAIL PROTECTED]> wrote:
> 
> Ok, final decision: you are right :) conntrack also defragments locally
> generated packets before they hit ip_fragment. In this case the fragments
> have skb->dst set.

Well caught.  The same thing is needed for IPv6, right?
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread David S. Miller
On Sun, 30 Jan 2005 18:58:27 +0100
Patrick McHardy <[EMAIL PROTECTED]> wrote:

> Ok, final decision: you are right :) conntrack also defragments locally
> generated packets before they hit ip_fragment. In this case the fragments
> have skb->dst set.

It's amazing how many bugs exist due to the local defragmentation and
refragmentation done by netfilter. :-)

Good catch Patrick, I'll apply this and push upstream.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread Russell King
On Sun, Jan 30, 2005 at 06:58:27PM +0100, Patrick McHardy wrote:
> Patrick McHardy wrote:
> >> Russell King wrote:
> >>> I don't know if the code is using fragment lists in ip_fragment(), but
> >>> on reading the code a question comes to mind: if we have a list of
> >>> fragments, does each fragment skb have a valid (and refcounted) dst
> >> pointer before ip_fragment() does its job?  If yes, then isn't the
> >>> first ip_copy_metadata() in ip_fragment() going to overwrite this
> >>> pointer without dropping the refcount?
> >>>
> >> Nice spotting. If conntrack isn't loaded defragmentation happens after
> >> routing, so this is likely the cause.
> >
> > OTOH, if conntrack isn't loaded forwarded packets are never defragmented,
> > so frag_list should be empty. So probably false alarm, sorry.
> 
> Ok, final decision: you are right :) conntrack also defragments locally
> generated packets before they hit ip_fragment. In this case the fragments
> have skb->dst set.

Good news - with this in place, I no longer have refcounts of 14000!
After 18 minutes (the first clearout of the dst cache from 500 odd
down to 11 or so), all dst cache entries have a ref count of zero.

I'll check it again later this evening to be sure.

Thanks Patrick.

> = net/ipv4/ip_output.c 1.74 vs edited =
> --- 1.74/net/ipv4/ip_output.c 2005-01-25 01:40:10 +01:00
> +++ edited/net/ipv4/ip_output.c   2005-01-30 18:54:43 +01:00
> @@ -389,6 +389,7 @@
>   to->priority = from->priority;
>   to->protocol = from->protocol;
>   to->security = from->security;
> + dst_release(to->dst);
>   to->dst = dst_clone(from->dst);
>   to->dev = from->dev;
>  


-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA  - http://pcmcia.arm.linux.org.uk/
 2.6 Serial core
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread Phil Oester
On Sun, Jan 30, 2005 at 06:01:46PM +, Russell King wrote:
> > OTOH, if conntrack isn't loaded forwarded packets are never defragmented,
> > so frag_list should be empty. So probably false alarm, sorry.
> 
> I've just checked Phil's mails - both Phil and myself are using
> netfilter on the troublesome boxen.
> 
> Also, since FragCreates is zero, and this does mean that the frag_list
> is not empty in all cases so far where ip_fragment() has been called.
> (Reading the code, if frag_list was empty, we'd have to create some
> fragments, which increments the FragCreates statistic.)

The below testcase seems to illustrate the problem nicely -- ip_dst_cache
grows but never shrinks:

On gateway:

iptables -I FORWARD -d 10.10.10.0/24 -j DROP

On client:

for i in `seq 1 254` ; do ping -s 1500 -c 5 -w 1 -f 10.10.10.$i ; done


Phil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread Russell King
On Sun, Jan 30, 2005 at 06:26:29PM +0100, Patrick McHardy wrote:
> Patrick McHardy wrote:
> 
> > Russell King wrote:
> >
> >> I don't know if the code is using fragment lists in ip_fragment(), but
> >> on reading the code a question comes to mind: if we have a list of
> >> fragments, does each fragment skb have a valid (and refcounted) dst
> >> pointer before ip_fragment() does its job?  If yes, then isn't the
> >> first ip_copy_metadata() in ip_fragment() going to overwrite this
> >> pointer without dropping the refcount?
> >>
> > Nice spotting. If conntrack isn't loaded defragmentation happens after
> > routing, so this is likely the cause.
> 
> OTOH, if conntrack isn't loaded forwarded packets are never defragmented,
> so frag_list should be empty. So probably false alarm, sorry.

I've just checked Phil's mails - both Phil and myself are using
netfilter on the troublesome boxen.

Also, since FragCreates is zero, and this does mean that the frag_list
is not empty in all cases so far where ip_fragment() has been called.
(Reading the code, if frag_list was empty, we'd have to create some
fragments, which increments the FragCreates statistic.)

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA  - http://pcmcia.arm.linux.org.uk/
 2.6 Serial core
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread Patrick McHardy
Patrick McHardy wrote:
Russell King wrote:
I don't know if the code is using fragment lists in ip_fragment(), but
on reading the code a question comes to mind: if we have a list of
fragments, does each fragment skb have a valid (and refcounted) dst
pointer before ip_fragment() does its job?  If yes, then isn't the
first ip_copy_metadata() in ip_fragment() going to overwrite this
pointer without dropping the refcount?
Nice spotting. If conntrack isn't loaded defragmentation happens after
routing, so this is likely the cause.

OTOH, if conntrack isn't loaded forwarded packets are never defragmented,
so frag_list should be empty. So probably false alarm, sorry.
Ok, final decision: you are right :) conntrack also defragments locally
generated packets before they hit ip_fragment. In this case the fragments
have skb->dst set.
Regards
Patrick
= net/ipv4/ip_output.c 1.74 vs edited =
--- 1.74/net/ipv4/ip_output.c   2005-01-25 01:40:10 +01:00
+++ edited/net/ipv4/ip_output.c 2005-01-30 18:54:43 +01:00
@@ -389,6 +389,7 @@
to->priority = from->priority;
to->protocol = from->protocol;
to->security = from->security;
+   dst_release(to->dst);
to->dst = dst_clone(from->dst);
to->dev = from->dev;
 
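Reduced to its essentials, the pattern the one-line fix above addresses looks like
this (a sketch of ip_copy_metadata(), not the full function; only the dst handling
is shown):

/* net/ipv4/ip_output.c - sketch of the refcount leak and the fix */
static void ip_copy_metadata(struct sk_buff *to, struct sk_buff *from)
{
	/* ... other fields copied over ... */
	dst_release(to->dst);		/* the fix: drop the reference 'to' may already hold */
	to->dst = dst_clone(from->dst);	/* without the release, a non-NULL to->dst leaks */
	/* ... */
}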


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread Patrick McHardy
Patrick McHardy wrote:
Russell King wrote:
I don't know if the code is using fragment lists in ip_fragment(), but
on reading the code a question comes to mind: if we have a list of
fragments, does each fragment skb have a valid (and refcounted) dst
pointer before ip_fragment() does its job?  If yes, then isn't the
first ip_copy_metadata() in ip_fragment() going to overwrite this
pointer without dropping the refcount?
Nice spotting. If conntrack isn't loaded defragmentation happens after
routing, so this is likely the cause.
OTOH, if conntrack isn't loaded forwarded packets are never defragmented,
so frag_list should be empty. So probably false alarm, sorry.
Regards
Patrick

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread Patrick McHardy
Russell King wrote:
I don't know if the code is using fragment lists in ip_fragment(), but
on reading the code a question comes to mind: if we have a list of
fragments, does each fragment skb have a valid (and refcounted) dst
pointer before ip_fragment() does its job?  If yes, then isn't the
first ip_copy_metadata() in ip_fragment() going to overwrite this
pointer without dropping the refcount?
Nice spotting. If conntrack isn't loaded defragmentation happens after
routing, so this is likely the cause.
Regards
Patrick
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread Phil Oester
On Sun, Jan 30, 2005 at 03:34:49PM +, Russell King wrote:
> I think the case against the IPv4 fragmentation code is mounting.
> However, without knowing what the expected conditions for this code,
> (eg, are skbs on the fraglist supposed to have NULL skb->dst?) I'm
> unable to progress this any further.  However, I think it's quite
> clear that there is something bad going on here.

Interesting...the gateway which exhibits the problem fastest in my
area does have a large number of fragmented UDP packets running through it,
as shown by tcpdump 'ip[6:2] & 0x1fff != 0'.

> Why many more people aren't seeing this I've no idea.

Perhaps you (and I) experience more fragments than the average user???

Nice detective work!

Phil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread Russell King
On Sun, Jan 30, 2005 at 01:23:43PM +, Russell King wrote:
> Anyway, I've produced some code which keeps a record of the __refcnt
> increments and decrements, and I think it's produced some interesting
> results.  Essentially, I'm seeing the odd dst entry with a __refcnt of
> 14000 or so (which is still in active use, so probably ok), and a number
> with 4, 7, and 13 which haven't had the refcount touched for at least 14
> minutes.

An hour or so goes by.  I now have 14 dst cache entries with non-zero
refcounts, and these have the following properties:

* The five from before (with counts 13, 14473, 4, 4, 7 respectively):
  + all remain unfreed.
  + show precisely no change in the refcounts.
  + the refcount has not been touched for more than an hour.
* They have all been touched by ip_copy_metadata.
* Their remaining refcounts are precisely half the number of
  ip_copy_metadata calls in every instance.

No entries with a refcount of zero contain ip_copy_metadata() and do
appear in /proc/net/rt_cache.

The following may also be a pointer - from /proc/net/snmp:

Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams 
InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes 
ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
Ip: 1 64 140510 0 0 36861 0 0 93549 131703 485 0 21 46622 15695 21 21950 0 0

Since FragCreates is 0, this means that we are using the frag_lists
rather than creating our own fragments (and indeed the first
ip_copy_metadata() call rather than the second in ip_fragment()).

I think the case against the IPv4 fragmentation code is mounting.
However, without knowing what the expected conditions for this code are
(eg, are skbs on the fraglist supposed to have NULL skb->dst?) I'm
unable to progress this any further.  However, I think it's quite
clear that there is something bad going on here.

Why many more people aren't seeing this I've no idea.

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA  - http://pcmcia.arm.linux.org.uk/
 2.6 Serial core
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread Russell King
On Fri, Jan 28, 2005 at 08:58:59AM +, Russell King wrote:
> On Thu, Jan 27, 2005 at 04:34:44PM -0800, David S. Miller wrote:
> > On Fri, 28 Jan 2005 00:17:01 +
> > Russell King <[EMAIL PROTECTED]> wrote:
> > > Yes.  Someone suggested this evening that there may have been a recent
> > > change to do with some IPv6 refcounting which may have caused this
> > > problem.  Is that something you can confirm?
> > 
> > Yep, it would be this change below.  Try backing it out and see
> > if that makes your leak go away.
> 
> Thanks.  I'll try it, but:
> 
> 1. Looking at the date of the change it seems unlikely.  The recent
>death occurred with 2.6.10-rc2, booted on 29th November and dying
>on 19th January, which obviously predates this cset.
> 2. It'll take a couple of days to confirm the behaviour of the dst cache.

I have another question whether ip6_output.c is the problem - the leak
is with ip_dst_cache (== IPv4).  If the problem were ip6_output, wouldn't
we see ip6_dst_cache leaking instead?

Anyway, I've produced some code which keeps a record of the __refcnt
increments and decrements, and I think it's produced some interesting
results.  Essentially, I'm seeing the odd dst entry with a __refcnt of
14000 or so (which is still in active use, so probably ok), and a number
with 4, 7, and 13 which haven't had the refcount touched for at least 14
minutes.

One of these were created via ip_route_input_slow(), the other three via
ip_route_output_slow().  That isn't significant on its own.

However, whenever ip_copy_metadata() appears in the refcount log, I see
half the number of increments due to that still remaining to be
decremented (see the output below).  0 = "mark", positive numbers =
increment refcnt this many times, negative numbers = decrement refcnt
this many times.
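
The instrumentation itself isn't included in the mail; roughly, the idea is a per-call-site tally of holds and releases, something like this userspace sketch (the call-site names fed to it are whatever the tracking hooks pass in; the ones in main() below are only illustrative):

#include <stdio.h>
#include <string.h>

/* One row per call site that touched the refcount, as in the dumps below. */
struct refcnt_site {
    const char *func;
    long count;             /* net increments (+) and decrements (-) */
};

static struct refcnt_site sites[32];
static int nsites;

/* Call with +1 next to every point that takes a reference and -1 next
 * to every point that drops one. */
static void refcnt_note(const char *func, long delta)
{
    int i;

    for (i = 0; i < nsites; i++) {
        if (strcmp(sites[i].func, func) == 0) {
            sites[i].count += delta;
            return;
        }
    }
    if (nsites < 32) {
        sites[nsites].func = func;
        sites[nsites].count = delta;
        nsites++;
    }
}

static void refcnt_dump(void)
{
    long total = 0;
    int i;

    for (i = 0; i < nsites; i++) {
        printf("%-28s %ld\n", sites[i].func, sites[i].count);
        total += sites[i].count;
    }
    printf("= %ld\n", total);   /* still non-zero once the entry is idle => leaked references */
}

int main(void)
{
    /* Illustrative only: an entry held three times but released twice. */
    refcnt_note("ip_route_output_slow", +1);
    refcnt_note("ip_copy_metadata", +2);
    refcnt_note("__kfree_skb", -2);
    refcnt_dump();
    return 0;
}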

I don't know if the code is using fragment lists in ip_fragment(), but
on reading the code a question comes to mind: if we have a list of
fragments, does each fragment skb have a valid (and refcounted) dst
pointer before ip_fragment() does its job?  If yes, then isn't the
first ip_copy_metadata() in ip_fragment() going to overwrite this
pointer without dropping the refcount?

All that said, it's probably far too early to read much into these
results - once the machine has been running for more than 19 minutes
and has a significant number of "stuck" dst cache entries, I think
it'll be far more conclusive.  Nevertheless, it looks like food for
thought.

dst pointer: creation time (200Hz jiffies) last reference time (200Hz jiffies)
c1c66260: 6c79 879d:
location count  function
c01054f4 0  dst_alloc
c0114a80 1  ip_route_input_slow
c00fa95c -18__kfree_skb
c0115104 13 ip_route_input
c011ae1c 8  ip_copy_metadata
c01055ac 0  __dst_free
untracked counts
: 0
total
= 4
  next=c1c66b60 refcnt=0004 use=000d dst=24f45cc3 src=0f00a8c0

c1c66b60: 20fe 5066:
c01054f4 0  dst_alloc
c01156e8 1  ip_route_output_slow
c011b854 6813   ip_append_data
c011c7e0 6813   ip_push_pending_frames
c00fa95c -6826  __kfree_skb
c011c8fc -6813  ip_push_pending_frames
c0139dbc -6813  udp_sendmsg
c0115a0c 6814   __ip_route_output_key
c013764c -2 ip4_datagram_connect
c011ae1c 26 ip_copy_metadata
c01055ac 0  __dst_free
: 0
= 13
  next=c1c57680 refcnt=000d use=1a9e dst=bbe812d4 src=bae812d4

c1c66960: 89ac a42d:
c01054f4 0  dst_alloc
c01156e8 1  ip_route_output_slow
c011b854 3028   ip_append_data
c0139dbc -3028  udp_sendmsg
c011c7e0 3028   ip_push_pending_frames
c011ae1c 8  ip_copy_metadata
c00fa95c -3032  __kfree_skb
c011c8fc -3028  ip_push_pending_frames
c0115a0c 3027   __ip_route_output_key
c01055ac 0  __dst_free
: 0
= 4
  next=c16d1080 refcnt=0004 use=0bd3 dst=bbe812d4 src=bae812d4

c16d1080: 879b 89af:
c01054f4 0  dst_alloc
c01156e8 1  ip_route_output_slow
c011b854 240ip_append_data
c011c7e0 240ip_push_pending_frames
c00fa95c -247   __kfree_skb
c011c8fc -240   ip_push_pending_frames
c0139dbc -240   udp_sendmsg
c0115a0c 239__ip_route_output_key
c011ae1c 14 ip_copy_metadata
c01055ac 0  __dst_free
: 0
= 7
  next=c1c66260 refcnt=0007 use=00ef dst=bbe812d4 src=bae812d4


-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA  - http://pcmcia.arm.linux.org.uk/
 2.6 Serial core
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread Herbert Xu
On Sun, Jan 30, 2005 at 09:11:50PM -0800, David S. Miller wrote:
 On Mon, 31 Jan 2005 06:00:40 +0100
 Patrick McHardy [EMAIL PROTECTED] wrote:
 
  We don't need this for IPv6 yet. Once we get nf_conntrack in we
  might need this, but its IPv6 fragment handling is different from
  ip_conntrack, I need to check first.
 
 Right, ipv6 netfilter cannot create this situation yet.

Not through netfilter but I'm not convinced that other paths
won't do this.

For instance, what about ipv6_frag_rcv -> esp6_input -> ... -> ip6_fragment?
That would seem to be a potential path for a non-NULL dst to survive
through to ip6_fragment, no?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread Yasuyuki KOZAKAI

Hi,

From: YOSHIFUJI Hideaki / [EMAIL PROTECTED] [EMAIL PROTECTED]
Date: Mon, 31 Jan 2005 14:16:36 +0900 (JST)

 In article [EMAIL PROTECTED] (at Mon, 31 Jan 2005 06:00:40 +0100), Patrick 
 McHardy [EMAIL PROTECTED] says:
 
 |We don't need this for IPv6 yet. Once we get nf_conntrack in we
 |might need this, but its IPv6 fragment handling is different from
 |ip_conntrack, I need to check first.
 
 Ok. It would be better to have some comment but anyway...
 kozakai-san?

IMO, a fix for nf_conntrack isn't needed yet, because someone may change
the IPv6 fragment handling in nf_conntrack.

Anyway, the current nf_conntrack passes the original (not de-fragmented) skb to
the IPv6 stack. nf_conntrack doesn't touch its dst.

Regards,

Yasuyuki KOZAKAI

Communication Platform Laboratory,
Corporate Research & Development Center,
Toshiba Corporation

[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread David S. Miller
On Sun, 30 Jan 2005 18:58:27 +0100
Patrick McHardy [EMAIL PROTECTED] wrote:

 Ok, final decision: you are right :) conntrack also defragments locally
 generated packets before they hit ip_fragment. In this case the fragments
 have skb->dst set.

It's amazing how many bugs exist due to the local defragmentation and
refragmentation done by netfilter. :-)

Good catch Patrick, I'll apply this and push upstream.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread Herbert Xu
Patrick McHardy [EMAIL PROTECTED] wrote:
 
 Ok, final decision: you are right :) conntrack also defragments locally
 generated packets before they hit ip_fragment. In this case the fragments
 have skb->dst set.

Well caught.  The same thing is needed for IPv6, right?
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-30 Thread YOSHIFUJI Hideaki
In article [EMAIL PROTECTED] (at Mon, 31 Jan 2005 15:11:32 +1100), Herbert Xu 
[EMAIL PROTECTED] says:

 Patrick McHardy [EMAIL PROTECTED] wrote:
  
  Ok, final decision: you are right :) conntrack also defragments locally
  generated packets before they hit ip_fragment. In this case the fragments
  have skb->dst set.
 
 Well caught.  The same thing is needed for IPv6, right?

(not yet confirmed, but) yes, please.

Signed-off-by: Hideaki YOSHIFUJI [EMAIL PROTECTED]

= net/ipv6/ip6_output.c 1.82 vs edited =
--- 1.82/net/ipv6/ip6_output.c  2005-01-25 09:40:10 +09:00
+++ edited/net/ipv6/ip6_output.c2005-01-31 13:44:01 +09:00
@@ -463,6 +463,7 @@
to->priority = from->priority;
to->protocol = from->protocol;
to->security = from->security;
+   dst_release(to->dst);
to->dst = dst_clone(from->dst);
to->dev = from->dev;
 

--yoshfuji
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-28 Thread Russell King
On Thu, Jan 27, 2005 at 12:40:12PM -0800, Phil Oester wrote:
> Vanilla 2.6.10, though I've been seeing these problems since 2.6.8 or
> earlier.

Right.  For me:

- 2.6.9-rc3 (installed 8th Oct) died with dst cache overflow on 29th November
- 2.6.10-rc2 (booted 29th Nov) died with the same on 19th January
- 2.6.11-rc1 (booted 19th Jan) appears to have the same problem, but
  it hasn't died yet.

> Netfilter running on all boxes, some utilizing SNAT, others
> not -- none using MASQ.

IPv4 filter targets: ACCEPT, DROP, REJECT, LOG
using: state, limit & protocol

IPv4 nat targets: DNAT, MASQ
using: protocol

IPv4 mangle targets: ACCEPT, MARK
using: protocol

IPv6 filter targets: ACCEPT, DROP
using: protocol

IPv6 mangle targets: none

(protocol == at least one rule matching tcp, icmp or udp packets)

IPv6 configured native on internal interface, tun6to4 for external IPv6
communication.

IPv4 and IPv6 forwarding enabled.
IPv4 rpfilter, proxyarp, syncookies enabled.
IPv4 proxy delay on internal interface set to '1'.

> These boxes are all running the quagga OSPF daemon, but those that
> are lightly loaded are not exhibiting these problems.

Running zebra (for ipv6 route advertisement on the local network only).

Network traffic-wise, 2.6.11-rc1 has this on its public facing
interface(s) in 8.5 days.

4: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
RX: bytes  packets  errors  dropped overrun mcast
667468541  2603373  0   0   0   0
TX: bytes  packets  errors  dropped carrier collsns
1245774764 2777605  0   0   1   2252

5: [EMAIL PROTECTED]: <NOARP,UP> mtu 1480 qdisc noqueue
RX: bytes  packets  errors  dropped overrun mcast
19130536   840340   0   0   0
TX: bytes  packets  errors  dropped carrier collsns
10436749   915890   0   0   0


-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA  - http://pcmcia.arm.linux.org.uk/
 2.6 Serial core
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-28 Thread Russell King
On Thu, Jan 27, 2005 at 04:34:44PM -0800, David S. Miller wrote:
> On Fri, 28 Jan 2005 00:17:01 +
> Russell King <[EMAIL PROTECTED]> wrote:
> > Yes.  Someone suggested this evening that there may have been a recent
> > change to do with some IPv6 refcounting which may have caused this
> > problem.  Is that something you can confirm?
> 
> Yep, it would be this change below.  Try backing it out and see
> if that makes your leak go away.

Thanks.  I'll try it, but:

1. Looking at the date of the change it seems unlikely.  The recent
   death occurred with 2.6.10-rc2, booted on 29th November and dying
   on 19th January, which obviously predates this cset.
2. It'll take a couple of days to confirm the behaviour of the dst cache.

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA  - http://pcmcia.arm.linux.org.uk/
 2.6 Serial core
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-27 Thread Phil Oester
On Fri, Jan 28, 2005 at 12:17:01AM +, Russell King wrote:
> On Thu, Jan 27, 2005 at 12:33:26PM -0800, David S. Miller wrote:
> > So they won't be listed in /proc/net/rt_cache (since they've been
> > removed from the lookup table) but they will be accounted for in
> > /proc/net/stat/rt_cache until the final release is done on the
> > routing cache object and it can be completely freed up.
> > 
> > Do you happen to be using IPV6 in any way by chance?
> 
> Yes.  Someone suggested this evening that there may have been a recent
> change to do with some IPv6 refcounting which may have caused this
> problem.  Is that something you can confirm?

FWIW, I do not use IPv6, and it is not compiled into the kernel.

Phil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-27 Thread David S. Miller
On Fri, 28 Jan 2005 00:17:01 +
Russell King <[EMAIL PROTECTED]> wrote:

> Yes.  Someone suggested this evening that there may have been a recent
> change to do with some IPv6 refcounting which may have caused this
> problem.  Is that something you can confirm?

Yep, it would be this change below.  Try backing it out and see
if that makes your leak go away.

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2005/01/14 20:41:55-08:00 [EMAIL PROTECTED] 
#   [IPV6]: Fix locking in ip6_dst_lookup().
#   
#   The caller does not necessarily have the socket locked
#   (udpv6sendmsg() is one such case) so we have to use
#   sk_dst_check() instead of __sk_dst_check().
#   
#   Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
#   Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
# 
# net/ipv6/ip6_output.c
#   2005/01/14 20:41:34-08:00 [EMAIL PROTECTED] +3 -3
#   [IPV6]: Fix locking in ip6_dst_lookup().
#   
#   The caller does not necessarily have the socket locked
#   (udpv6sendmsg() is one such case) so we have to use
#   sk_dst_check() instead of __sk_dst_check().
#   
#   Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
#   Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
# 
diff -Nru a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
--- a/net/ipv6/ip6_output.c 2005-01-27 16:07:21 -08:00
+++ b/net/ipv6/ip6_output.c 2005-01-27 16:07:21 -08:00
@@ -745,7 +745,7 @@
if (sk) {
struct ipv6_pinfo *np = inet6_sk(sk);

-   *dst = __sk_dst_check(sk, np->dst_cookie);
+   *dst = sk_dst_check(sk, np->dst_cookie);
if (*dst) {
struct rt6_info *rt = (struct rt6_info*)*dst;

@@ -772,9 +772,9 @@
 && (np->daddr_cache == NULL ||
                         !ipv6_addr_equal(&fl->fl6_dst, np->daddr_cache)))
|| (fl->oif && fl->oif != (*dst)->dev->ifindex)) {
+   dst_release(*dst);
*dst = NULL;
-   } else
-   dst_hold(*dst);
+   }
}
}
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
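
For what it's worth, the refcount consequence of that change is easy to state in isolation: __sk_dst_check() hands back the cached entry without taking a reference (safe only under the socket lock), while sk_dst_check() returns an entry it has already held for the caller, so the caller must drop that reference on the path that decides not to use the entry. A minimal sketch with simplified stand-ins (not the kernel API) of the caller pattern after the patch:

#include <assert.h>
#include <stddef.h>

struct dst_entry { int __refcnt; };

static void dst_hold(struct dst_entry *dst)    { dst->__refcnt++; }
static void dst_release(struct dst_entry *dst) { if (dst) dst->__refcnt--; }

/* sk_dst_check()-like lookup: the entry it returns already carries a
 * reference taken on behalf of the caller. */
static struct dst_entry *lookup_held(struct dst_entry *cached)
{
    if (!cached)
        return NULL;
    dst_hold(cached);
    return cached;
}

/* Caller pattern after the patch: no extra dst_hold() when keeping
 * the entry, an explicit dst_release() when discarding it. */
static struct dst_entry *pick_dst(struct dst_entry *cached, int still_usable)
{
    struct dst_entry *dst = lookup_held(cached);

    if (dst && !still_usable) {
        dst_release(dst);   /* without this, every discarded entry leaks */
        dst = NULL;
    }
    return dst;
}

int main(void)
{
    struct dst_entry cached = { 1 };    /* the socket's own reference */
    struct dst_entry *dst;

    assert(pick_dst(&cached, 0) == NULL);
    assert(cached.__refcnt == 1);       /* discarded without leaking */

    dst = pick_dst(&cached, 1);
    assert(dst != NULL && cached.__refcnt == 2);    /* caller owns one reference */
    dst_release(dst);
    return 0;
}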


Re: Memory leak in 2.6.11-rc1?

2005-01-27 Thread Russell King
On Thu, Jan 27, 2005 at 12:33:26PM -0800, David S. Miller wrote:
> So they won't be listed in /proc/net/rt_cache (since they've been
> removed from the lookup table) but they will be accounted for in
> /proc/net/stat/rt_cache until the final release is done on the
> routing cache object and it can be completely freed up.
> 
> Do you happen to be using IPV6 in any way by chance?

Yes.  Someone suggested this evening that there may have been a recent
change to do with some IPv6 refcounting which may have caused this
problem.  Is that something you can confirm?

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA  - http://pcmcia.arm.linux.org.uk/
 2.6 Serial core
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-27 Thread Phil Oester
On Thu, Jan 27, 2005 at 07:25:04PM +, Russell King wrote:
> Can you provide some details, eg kernel configuration, loaded modules
> and a brief overview of any netfilter modules you may be using.
> 
> Maybe we can work out what's common between our setups.

Vanilla 2.6.10, though I've been seeing these problems since 2.6.8 or
earlier.  Netfilter running on all boxes, some utilizing SNAT, others
not -- none using MASQ.  This is from a box running no NAT at all,
although has some other filter rules:

# wc -l /proc/net/rt_cache ; grep dst_cache /proc/slabinfo
 50 /proc/net/rt_cache
ip_dst_cache   84285  84285

Also with uptime of 26 days.  

These boxes are all running the quagga OSPF daemon, but those that
are lightly loaded are not exhibiting these problems.

Phil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-27 Thread David S. Miller
On Thu, 27 Jan 2005 16:49:18 +
Russell King <[EMAIL PROTECTED]> wrote:

> notice how /proc/net/stat/rt_cache says there's 1336 entries in the
> route cache.  _Where_ are they?  They're not there according to
> /proc/net/rt_cache.

When the route cache is flushed, that kills a reference to each
entry in the routing cache.  If for some reason, other references
remain (route connected to socket, some leak in the stack somewhere)
the route cache entry can't be immediately completely freed up.

So they won't be listed in /proc/net/rt_cache (since they've been
removed from the lookup table) but they will be accounted for in
/proc/net/stat/rt_cache until the final release is done on the
routing cache object and it can be completely freed up.

Do you happen to be using IPV6 in any way by chance?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-27 Thread Russell King
On Thu, Jan 27, 2005 at 10:37:45AM -0800, Phil Oester wrote:
> On Thu, Jan 27, 2005 at 04:49:18PM +, Russell King wrote:
> > so obviously the GC does appear to be working - as can be seen from the
> > number of entries in /proc/net/rt_cache.  However, the number of objects
> > in the slab cache does grow day on day.  About 4 days ago, it was only
> > about 600 active objects.  Now it's more than twice that, and it'll
> > continue increasing until it hits 8192, whereupon it's game over.
> 
> I can confirm the behavior you are seeing -- does seem to be a leak
> somewhere.  Below from a heavily used gateway with 26 days uptime:
> 
> # wc -l /proc/net/rt_cache ; grep ip_dst /proc/slabinfo
>   12870 /proc/net/rt_cache
> ip_dst_cache   53327  57855
> 
> Eventually I get the dst_cache overflow errors and have to reboot.

Can you provide some details, eg kernel configuration, loaded modules
and a brief overview of any netfilter modules you may be using.

Maybe we can work out what's common between our setups.

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA  - http://pcmcia.arm.linux.org.uk/
 2.6 Serial core
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-27 Thread Phil Oester
On Thu, Jan 27, 2005 at 04:49:18PM +, Russell King wrote:
> so obviously the GC does appear to be working - as can be seen from the
> number of entries in /proc/net/rt_cache.  However, the number of objects
> in the slab cache does grow day on day.  About 4 days ago, it was only
> about 600 active objects.  Now it's more than twice that, and it'll
> continue increasing until it hits 8192, whereupon it's game over.

I can confirm the behavior you are seeing -- does seem to be a leak
somewhere.  Below from a heavily used gateway with 26 days uptime:

# wc -l /proc/net/rt_cache ; grep ip_dst /proc/slabinfo
  12870 /proc/net/rt_cache
ip_dst_cache   53327  57855

Eventually I get the dst_cache overflow errors and have to reboot.

Phil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-27 Thread Russell King
On Thu, Jan 27, 2005 at 01:56:30PM +0100, Robert Olsson wrote:
> 
> Andrew Morton writes:
>  > Russell King <[EMAIL PROTECTED]> wrote:
> 
>  > >  ip_dst_cache1292   1485256   151
> 
>  > I guess we should find a way to make it happen faster.
>  
> Here is route DoS attack. Pure routing no NAT no filter.
> 
> Start
> =
> ip_dst_cache   5 30256   151 : tunables  120   608 : 
> slabdata  2  2  0
> 
> After DoS
> =
> ip_dst_cache   66045  76125256   151 : tunables  120   608 : 
> slabdata   5075   5075480
> 
> After some GC runs.
> ==
> ip_dst_cache   2 15256   151 : tunables  120   608 : 
> slabdata  1  1  0
> 
> No problems here. I saw Martin talked about NAT...

Yes, I can reproduce that same behaviour, where I can artificially
inflate the DST cache and the GC does run and trims it back down to
something reasonable.

BUT, over time, my DST cache just increases in size and won't trim back
down.  Not even by writing to the /proc/sys/net/ipv4/route/flush sysctl
(which, if I'm reading the code correctly - and would be nice to know
from those who actually know this stuff - should force an immediate
flush of the DST cache.)

For instance, I have (in sequence):

# cat /proc/net/rt_cache|wc -l;grep ip_dst /proc/slabinfo
581
ip_dst_cache1860   1860256   151 : tunables  120   600 : 
slabdata124124  0
# cat /proc/net/rt_cache|wc -l;grep ip_dst /proc/slabinfo
717
ip_dst_cache1995   1995256   151 : tunables  120   600 : 
slabdata133133  0
# cat /proc/net/rt_cache|wc -l;grep ip_dst /proc/slabinfo
690
ip_dst_cache1995   1995256   151 : tunables  120   600 : 
slabdata133133  0
# cat /proc/net/rt_cache|wc -l;grep ip_dst /proc/slabinfo
696
ip_dst_cache1995   1995256   151 : tunables  120   600 : 
slabdata133133  0
# cat /proc/net/rt_cache|wc -l;grep ip_dst /proc/slabinfo
700
ip_dst_cache1995   1995256   151 : tunables  120   600 : 
slabdata133133  0
# cat /proc/net/rt_cache|wc -l;grep ip_dst /proc/slabinfo
718
ip_dst_cache1993   1995256   151 : tunables  120   600 : 
slabdata133133  0
# cat /proc/net/rt_cache|wc -l;grep ip_dst /proc/slabinfo
653
ip_dst_cache1993   1995256   151 : tunables  120   600 : 
slabdata133133  0
# cat /proc/net/rt_cache|wc -l;grep ip_dst /proc/slabinfo
667
ip_dst_cache1956   1995256   151 : tunables  120   600 : 
slabdata133133  0
# cat /proc/net/rt_cache|wc -l;grep ip_dst /proc/slabinfo
620
ip_dst_cache1944   1995256   151 : tunables  120   600 : 
slabdata133133  0
# cat /proc/net/rt_cache|wc -l;grep ip_dst /proc/slabinfo
623
ip_dst_cache1920   1995256   151 : tunables  120   600 : 
slabdata133133  0
# cat /proc/net/rt_cache|wc -l;grep ip_dst /proc/slabinfo
8
ip_dst_cache1380   1980256   151 : tunables  120   600 : 
slabdata132132  0
# cat /proc/net/rt_cache|wc -l;grep ip_dst /proc/slabinfo
86
ip_dst_cache1375   1875256   151 : tunables  120   600 : 
slabdata125125  0

so obviously the GC does appear to be working - as can be seen from the
number of entries in /proc/net/rt_cache.  However, the number of objects
in the slab cache does grow day on day.  About 4 days ago, it was only
about 600 active objects.  Now it's more than twice that, and it'll
continue increasing until it hits 8192, whereupon it's game over.

And, here's the above with /proc/net/stat/rt_cache included:

# cat /proc/net/rt_cache|wc -l;grep ip_dst /proc/slabinfo; cat 
/proc/net/stat/rt_cache
61
ip_dst_cache1340   1680256   151 : tunables  120   600 : 
slabdata112112  0
entries  in_hit in_slow_tot in_no_route in_brd in_martian_dst in_martian_src  
out_hit out_slow_tot out_slow_mc  gc_total gc_ignored gc_goal_miss 
gc_dst_overflow in_hlist_search out_hlist_search
0538  005c9f10 0005e163  0013 02e2  0005  
003102e3 00038f6d  0007887a 0005286d 1142  00138855 0010848d

notice how /proc/net/stat/rt_cache says there's 1336 entries in the
route cache.  _Where_ are they?  They're not there according to
/proc/net/rt_cache.

(PS, the formatting of the headings in /proc/net/stat/rt_cache doesn't
appear to tie up with the formatting of the data which is _really_
confusing.)

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA  - http://pcmcia.arm.linux.org.uk/
 2.6 Serial core
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
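
For anyone wanting to watch the same numbers over time without the shell pipeline, here is a throwaway userspace sketch; it assumes the 2.6 /proc/slabinfo layout where the first three fields of each line are the cache name, active objects and total objects:

#include <stdio.h>
#include <string.h>

/* Print the active/total object counts for one slab cache, the same
 * numbers being compared by hand in these mails. */
static int slab_usage(const char *cache, unsigned long *active, unsigned long *total)
{
    char line[512], name[64];
    FILE *f = fopen("/proc/slabinfo", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "%63s %lu %lu", name, active, total) == 3 &&
            strcmp(name, cache) == 0) {
            fclose(f);
            return 0;
        }
    }
    fclose(f);
    return -1;
}

int main(void)
{
    unsigned long active, total;

    if (slab_usage("ip_dst_cache", &active, &total) == 0)
        printf("ip_dst_cache: %lu active / %lu allocated\n", active, total);
    else
        printf("ip_dst_cache not found in /proc/slabinfo\n");
    return 0;
}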

Re: Memory leak in 2.6.11-rc1?

2005-01-27 Thread Robert Olsson

Oh. Linux version 2.6.11-rc2 was used.

Robert Olsson writes:
 > 
 > Andrew Morton writes:
 >  > Russell King <[EMAIL PROTECTED]> wrote:
 > 
 >  > >  ip_dst_cache1292   1485256   151
 > 
 >  > I guess we should find a way to make it happen faster.
 >  
 > Here is route DoS attack. Pure routing no NAT no filter.
 > 
 > Start
 > =
 > ip_dst_cache   5 30256   151 : tunables  120   608 : 
 > slabdata  2  2  0
 > 
 > After DoS
 > =
 > ip_dst_cache   66045  76125256   151 : tunables  120   608 : 
 > slabdata   5075   5075480
 > 
 > After some GC runs.
 > ==
 > ip_dst_cache   2 15256   151 : tunables  120   608 : 
 > slabdata  1  1  0
 > 
 > No problems here. I saw Martin talked about NAT...
 > 
 >  --ro
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-27 Thread Robert Olsson

Andrew Morton writes:
 > Russell King <[EMAIL PROTECTED]> wrote:

 > >  ip_dst_cache1292   1485256   151

 > I guess we should find a way to make it happen faster.
 
Here is route DoS attack. Pure routing no NAT no filter.

Start
=
ip_dst_cache   5 30256   151 : tunables  120   608 : 
slabdata  2  2  0

After DoS
=
ip_dst_cache   66045  76125256   151 : tunables  120   608 : 
slabdata   5075   5075480

After some GC runs.
==
ip_dst_cache   2 15256   151 : tunables  120   608 : 
slabdata  1  1  0

No problems here. I saw Martin talked about NAT...

--ro
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-27 Thread Martin Josefsson
On Thu, 27 Jan 2005, Andrew Morton wrote:

> Russell King <[EMAIL PROTECTED]> wrote:
> >
> > This mornings magic numbers are:
> >
> >  3
> >  ip_dst_cache1292   1485256   151
>
> I just did a q-n-d test here: send one UDP frame to 1.1.1.1 up to
> 1.1.255.255.  The ip_dst_cache grew to ~15k entries and grew no further.
> It's now gradually shrinking.  So there doesn't appear to be a trivial
> bug..
>
> >  Is no one interested in the fact that the DST cache is leaking and
> >  eventually takes out machines?  I've had virtually zero interest in
> >  this problem so far.
>
> I guess we should find a way to make it happen faster.

It could be a refcount problem. I think Russell is using NAT, it could be
the MASQUERADE target if that is in use. A simple test would be to switch
to SNAT and try again if possible.

/Martin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-27 Thread Alessandro Suardi
On Thu, 27 Jan 2005 00:47:32 -0800, Andrew Morton <[EMAIL PROTECTED]> wrote:
> Russell King <[EMAIL PROTECTED]> wrote:
> >
> > This mornings magic numbers are:
> >
> >  3
> >  ip_dst_cache1292   1485256   151
> 
> I just did a q-n-d test here: send one UDP frame to 1.1.1.1 up to
> 1.1.255.255.  The ip_dst_cache grew to ~15k entries and grew no further.
> It's now gradually shrinking.  So there doesn't appear to be a trivial
> bug..
> 
> >  Is no one interested in the fact that the DST cache is leaking and
> >  eventually takes out machines?  I've had virtually zero interest in
> >  this problem so far.
> 
> I guess we should find a way to make it happen faster.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

Data point... on my box, used as ed2k/bittorrent
 machine, the ip_dst_cache grows and shrinks quite
 fast; these two samples were ~3 minutes apart:


[EMAIL PROTECTED] ~]# grep ip_dst /proc/slabinfo
ip_dst_cache 998   1005256   151 : tunables  120   60 
  0 : slabdata 67 67  0
[EMAIL PROTECTED] ~]# wc -l /proc/net/rt_cache
926 /proc/net/rt_cache

[EMAIL PROTECTED] ~]# grep ip_dst /proc/slabinfo
ip_dst_cache 466795256   151 : tunables  120   60 
  0 : slabdata 53 53  0
[EMAIL PROTECTED] ~]# wc -l /proc/net/rt_cache
443 /proc/net/rt_cache

 and these were 2 seconds apart

[EMAIL PROTECTED] ~]# wc -l /proc/net/rt_cache
737 /proc/net/rt_cache
[EMAIL PROTECTED] ~]# grep ip_dst /proc/slabinfo
ip_dst_cache 795795256   151 : tunables  120   60 
  0 : slabdata 53 53  0

[EMAIL PROTECTED] ~]# wc -l /proc/net/rt_cache
1023 /proc/net/rt_cache
[EMAIL PROTECTED] ~]# grep ip_dst /proc/slabinfo
ip_dst_cache1035   1035256   151 : tunables  120   60 
  0 : slabdata 69 69  0

--alessandro
 
  "And every dream, every, is just a dream after all"
 
 (Heather Nova, "Paper Cup")
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory leak in 2.6.11-rc1?

2005-01-27 Thread Andrew Morton
Russell King <[EMAIL PROTECTED]> wrote:
>
> This mornings magic numbers are:
> 
>  3
>  ip_dst_cache1292   1485256   151

I just did a q-n-d test here: send one UDP frame to 1.1.1.1 up to
1.1.255.255.  The ip_dst_cache grew to ~15k entries and grew no further. 
It's now gradually shrinking.  So there doesn't appear to be a trivial
bug..

>  Is no one interested in the fact that the DST cache is leaking and
>  eventually takes out machines?  I've had virtually zero interest in
>  this problem so far.

I guess we should find a way to make it happen faster.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
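
For reference, the quick-and-dirty test described above amounts to something like the following sketch (a sweep of 1.1.0.0/16 with one small UDP datagram per address; only sensible on an isolated test box, since none of those destinations are yours):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one small UDP datagram to every address in 1.1.0.0/16 so each
 * destination creates its own routing-cache entry, then watch
 * ip_dst_cache in /proc/slabinfo grow and (eventually) shrink again. */
int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst;
    char payload = 'x';
    unsigned int a, b;

    if (fd < 0) {
        perror("socket");
        return 1;
    }
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9);            /* discard port */

    for (a = 0; a < 256; a++) {
        for (b = 0; b < 256; b++) {
            dst.sin_addr.s_addr = htonl((1u << 24) | (1u << 16) | (a << 8) | b);
            sendto(fd, &payload, 1, 0, (struct sockaddr *)&dst, sizeof(dst));
        }
    }
    close(fd);
    return 0;
}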


Re: Memory leak in 2.6.11-rc1?

2005-01-27 Thread Russell King
On Tue, Jan 25, 2005 at 07:32:07PM +, Russell King wrote:
> On Mon, Jan 24, 2005 at 11:48:53AM +, Russell King wrote:
> > On Sun, Jan 23, 2005 at 08:03:15PM +, Russell King wrote:
> > > I think I may be seeing something odd here, maybe a possible memory leak.
> > > The only problem I have is wondering whether I'm actually comparing like
> > > with like.  Maybe some networking people can provide a hint?
> > > 
> > > Below is gathered from 2.6.11-rc1.
> > > 
> > > bash-2.05a# cat /proc/net/rt_cache | wc -l; grep ip_dst /proc/slabinfo
> > > 24
> > > ip_dst_cache 669885256   151
> > > 
> > > I'm fairly positive when I rebooted the machine a couple of days ago,
> > > ip_dst_cache was significantly smaller for the same number of lines in
> > > /proc/net/rt_cache.
> > 
> > FYI, today it looks like this:
> > 
> > bash-2.05a# cat /proc/net/rt_cache | wc -l; grep ip_dst /proc/slabinfo
> > 26
> > ip_dst_cache 820   1065256   151 
> > 
> > So the dst cache seems to have grown by 151 in 16 hours...  I'll continue
> > monitoring and providing updates.
> 
> Tonights update:
> 50
> ip_dst_cache1024   1245256   151
> 
> As you can see, the dst cache is consistently growing by about 200
> entries per day.  Given this, I predict that the box will fall over
> due to "dst cache overflow" in roughly 35 days.

This mornings magic numbers are:

3
ip_dst_cache1292   1485256   151

Is no one interested in the fact that the DST cache is leaking and
eventually takes out machines?  I've had virtually zero interest in
this problem so far.

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA  - http://pcmcia.arm.linux.org.uk/
 2.6 Serial core
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



