On Wed, 1 Oct 2014, Sasha Levin wrote: > On 10/01/2014 05:07 PM, Andrew Morton wrote: > > On Mon, 29 Sep 2014 21:47:14 -0400 Sasha Levin <sasha.le...@oracle.com> > > wrote: > > > >> Currently we're seeing a few issues which are unexplainable by looking at > >> the > >> data we see and are most likely caused by a memory corruption caused > >> elsewhere. > >> > >> This is wasting time for folks who are trying to figure out an issue > >> provided > >> a stack trace that can't really point out the real issue. > >> > >> This patch introduces poisoning on struct page, vm_area_struct, and > >> mm_struct, > >> and places checks in busy paths to catch corruption early. > >> > >> This series was tested, and it detects corruption in vm_area_struct. Right > >> now > >> I'm working on figuring out the source of the corruption, (which is a long > >> standing bug) using KASan, but the current code is useful as it is. > > > > Is this still useful if/when kasan is in place? > > Yes, the corruption we're seeing happens inside the struct rather than around > it. > kasan doesn't look there. > > When kasan is merged, we could complement this patchset by making kasan trap > on > when the poison is getting written, rather than triggering a BUG in some place > else after we saw the corruption. > > > It looks fairly cheap - I wonder if it should simply fall under > > CONFIG_DEBUG_VM rather than the new CONFIG_DEBUG_VM_POISON. > > Config options are cheap as well :) > > I'd rather expand it further and add poison/kasan trapping into other places > such > as the vma interval tree rather than having to keep it "cheap".
I like to run with CONFIG_DEBUG_VM, and would not want this stuff turned on in my builds (especially not the struct page enlargement); so I'm certainly with you in preferring a separate option. But it all seems very ad hoc to me. Are people going to be adding more and more mm structures into it, ad infinitum? And adding CONFIG_DEBUG_SCHED_POISON one day when someone notices corruption of a scheduler structure? etc etc. What does this add on top of slab poisoning? Some checks in some mm places while the object is active, I guess: why not base those on slab poisoning? And add them in as appropriate to the problem at hand, when a problem is seen. I think these patches are fine for investigating whatever is the problem currently afflicting you and mm under trinity; but we all have our temporary debugging patches, I don't think all deserve preservation in everyone else's kernel, that amounts to far more clutter than any are worth. I'm glad to hear they've confirmed some vm_area_struct corruption: any ideas on where that's coming from? Hugh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/