On Tue, 23 Aug 2016, Peter Zijlstra wrote:

> On Mon, Aug 22, 2016 at 10:54:32PM -0400, Vince Weaver wrote:
> > > > > > 
> > > > > >     perf stat -a -e amd_nb/config=0x37,config1=0x20/ /bin/ls
> > >   amd_uncore_find_online_sibling()
> > > function is broken. 
> > 
> > and that's the problem.  uncore_find_online_sibling() does all kinds of 
> > wrong things including sticking active uncore structures in 
> > uncore->free_when_cpu_online
> > 
> > Then uncore_online() comes along and frees those structures.
> > 
> > Then some other part of the kernel comes and re-uses the free'd data.
> > 
> > Then when we try to start an event, all of the fields are invalid because 
> > the uncore pointer is pointing to re-used data.
> > 
> > I don't have a patch because I am not 100% clear on what 
> > uncore_find_online_sibling() is doing in the first place.
> 
> Thanks for doing all that, I'll see if I can make sense of it.

I should have provided more detail; I was just tired after chasing the bug 
for so long.  I mostly found things by sprinkling printks everywhere.
Commenting out the call to kfree() in uncore_online() makes the code stop 
crashing (but perhaps causes a memory leak?).
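
To make the failure mode concrete, here is a rough user-space model of what 
I think is going on.  This is not the kernel code and not a proposed patch: 
the model_* names, the refcnt field and the per-CPU layout are all made up 
for illustration.  It only shows why a structure that is still in use must 
never end up in the "free when online" slot.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared uncore structure. */
struct model_uncore {
	int refcnt;	/* number of CPUs still pointing at this structure */
	int id;
};

/* Hypothetical per-CPU slots, loosely following the description above. */
struct model_cpu {
	struct model_uncore *uncore;		/* structure in active use */
	struct model_uncore *free_when_online;	/* structure queued for freeing */
};

/*
 * Queue a structure for freeing, but only if no CPU still references it.
 * Returns 1 if it was queued, 0 if it is still active.
 */
int model_queue_free(struct model_cpu *cpu, struct model_uncore *u)
{
	assert(cpu && u);
	if (u->refcnt > 0)
		return 0;	/* queuing an active structure sets up a use-after-free */
	cpu->free_when_online = u;
	return 1;
}

/* Free whatever was queued.  Returns 1 if something was freed. */
int model_online(struct model_cpu *cpu)
{
	assert(cpu);
	if (!cpu->free_when_online)
		return 0;
	free(cpu->free_when_online);
	cpu->free_when_online = NULL;
	return 1;
}
```

Without the refcnt check in model_queue_free(), model_online() would free a 
structure that cpu->uncore still points to, which matches the crash pattern 
described above.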

In any case it's odd the problem didn't show up earlier, but maybe the 
recent changes to CPU hotplugging in that file exposed the issue.

Vince
