On Tue, May 16, 2017 at 6:39 AM, Catalin Marinas <catalin.mari...@arm.com> wrote: > Thanks for cc'ing me. The vmalloc allocations have always been tricky > for kmemleak since there are 2-3 other memory locations with the same > value as the vmalloc'ed object: vm_struct.addr and vmap_area.va_start; > occasionally we have vmap_area.va_end pointing to the next > vmap_area.va_start. > > To have a more reliable object reference counting, kmemleak avoids > scanning most of the vmap_area structure (apart from vmap_area.vm) and > requires a minimum count of 2 references for a vmalloc'ed object to not > be considered a leak (that is vm_struct.addr and the caller's location, > in this case task->stack). > >> On Mon 15-05-17 23:53:18, Luis R. Rodriguez wrote: >> > root@piggy:~# cat /sys/kernel/debug/kmemleak >> > unreferenced object 0xffffa07500d4c000 (size 16384): >> > comm "bash", pid 1349, jiffies 4294895999 (age 263.204s) >> > hex dump (first 32 bytes): >> > 9d 6e ac 57 00 00 00 00 00 00 00 00 00 00 00 00 .n.W............ >> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ >> > backtrace: >> > [<ffffffffa5464cca>] kmemleak_alloc+0x4a/0xa0 >> > [<ffffffffa4fdfe6c>] __vmalloc_node_range+0x20c/0x2b0 >> > [<ffffffffa4e7d1a2>] copy_process.part.37+0x5c2/0x1af0 >> > [<ffffffffa4e7e89f>] _do_fork+0xcf/0x390 >> > [<ffffffffa4e7ec09>] SyS_clone+0x19/0x20 >> > [<ffffffffa4e03b0b>] do_syscall_64+0x5b/0xc0 >> > [<ffffffffa547072b>] return_from_SYSCALL_64+0x0/0x6a >> > [<ffffffffffffffff>] 0xffffffffffffffff > > This stack here was allocated by bash for a child process via > alloc_thread_stack_node() when CONFIG_VMAP_STACK is enabled. When the > child died, free_stack_thread() did not call vfree_atomic() but rather > stored the corresponding vm_struct pointer in cached_stacks[i]. However, > the original task->stack pointer was lost (child task_struct freed), so > when scanning the memory kmemleak can only find a single reference to > the original stack (via vm_struct cached_stacks[i]) rather than 2 as > required for vmalloc'ed objects. > > BTW, you can get more info about a specific object, including the number > of references found, with: > > echo dump=0xffffa07500d4c000 > /sys/kernel/debug/kmemleak > > So basically kernel/fork.c has its own thread stack allocator on top of > vmalloc and kmemleak gets confused. A quick workaround: > > ------------8<------------------------------ > diff --git a/kernel/fork.c b/kernel/fork.c > index 06d759ab4c62..785907fccf67 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -214,6 +214,7 @@ static unsigned long *alloc_thread_stack_node(struct > task_struct *tsk, int node) > this_cpu_write(cached_stacks[i], NULL); > > tsk->stack_vm_area = s; > + kmemleak_alloc(s->addr, THREAD_SIZE, 2, GFP_ATOMIC); > local_irq_enable(); > return s->addr; > } > @@ -254,6 +255,7 @@ static inline void free_thread_stack(struct task_struct > *tsk) > continue; > > this_cpu_write(cached_stacks[i], tsk->stack_vm_area); > + kmemleak_free(tsk->stack); > local_irq_restore(flags); > return; > } > ------------8<------------------------------
I would do this differently. The problem only affects cached stacks, right? How about kmemleal_ignoring when caching a stack and unignoring when uncaching a stack? Also, this needs a serious comment. How about "kmemleak does not presently understand that a reference to a vm_area_struct implies a reference to the vmalloced memory"? > A more complex solution (which I'm not yet convinced it works) may be to > pass the number of references of an objects down to the objects it > refers. In the vmalloc case we normally have a single reference to > vm_struct and two to the vmalloc'ed object. If a reference to the > vmalloc'ed object disappeared we could balance it with an increased > number of references to the vm_struct object. But this option requires > more research. The idea being that, with this change, kmemleak would start to understand that surplus references to vm_area_struct cause it to think the memory itself is reachable? --Andy