On Mon, 22 Aug 2016, Huang Rui wrote:

> Hi Peter, Vince
>
> On Fri, Aug 19, 2016 at 12:01:30PM +0200, Peter Zijlstra wrote:
> > On Thu, Aug 18, 2016 at 10:46:31AM -0400, Vince Weaver wrote:
> > > On Thu, 18 Aug 2016, Vince Weaver wrote:
> > >
> > > > Tried the perf_fuzzer on my A10 fam15h/model13h system with
> > > > 4.8-rc2 and it falls over more or less immediately.
> > > >
> > > > This maps to variable_test_bit()
> > > > called by ctx = find_get_context(pmu, task, event);
> > > > in kernel/events/core.c:9467
> > > >
> > > > It happens quickly enough I can probably track down the exact
> > > > event that causes this, if needed.
> > >
> > > I have a one line reproducer:
> > >
> > > perf stat -a -e amd_nb/config=0x37,config1=0x20/ /bin/ls
> >
> > OK, cannot reproduce on my fam15h/model1h. I'll go dig through the
> > various manuals to see if I can spot the fail.
> >
> > Huang could you either prod someone at AMD or do yourself, audit the AMD
> > perf code for all the various new models?
>
> Actually, there might be some NBPMC event changes between model 0h-fh and
> model 10h-1fh. Below are the documents of these two processors:
>
> http://support.amd.com/TechDocs/42301_15h_Mod_00h-0Fh_BKDG.pdf
> http://support.amd.com/TechDocs/42300_15h_Mod_10h-1Fh_BKDG.pdf
>
> In section 3.16, it describes usage of NB Performance Counter Events.

I don't think it's the hardware that's causing the problem.

I've wasted a lot more time on it, and finally figured out how the "bt"
instruction works, so the assembly more or less makes sense.

The problem is that the per-cpu amd_uncore struct is being overwritten
with kernel memory addresses. This makes uncore[0]->cpu a large number
(it's often, but not always, the per-cpu address of uncore[1]->cpu),
which leads to the GPF.

I can't figure out what piece of code is overwriting things, though.

To complicate matters, I think the amd_uncore_find_online_sibling()
function is broken. The code could really use more comments, but I think
it is designed so that all siblings share a single amd_uncore structure.
In practice this doesn't seem to work, because of the way the list
iterator works.

Vince
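
For anyone else puzzling over the "bt" part: below is a rough userspace
sketch (not the kernel code) of why a garbage cpu number turns into a
fault. As far as I understand it, with a memory operand and a register
bit offset, "bt" touches the word at base + (nr / bits-per-word), so a
cpu value that is really a kernel address indexes far outside the
cpumask. The helper names byte_offset_for_bit() and checked_test_bit()
are made up for illustration; the kernel's variable_test_bit() does no
such bounds check.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Byte offset, relative to the start of the bitmap, of the word that a
 * test of bit `nr` would read. For a sane cpu number this stays inside
 * the cpumask; for a pointer-sized value it is far beyond it.
 */
static unsigned long byte_offset_for_bit(unsigned long nr)
{
	return (nr / BITS_PER_LONG) * sizeof(unsigned long);
}

/*
 * Hypothetical bounds-checked bit test (an assumption for this sketch,
 * not an existing kernel helper): returns -1 instead of reading out of
 * range, otherwise 0 or 1 for the bit value.
 */
static int checked_test_bit(unsigned long nr, const unsigned long *map,
			    unsigned long nbits)
{
	if (nr >= nbits)
		return -1;
	return (int)((map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}
```

Feeding byte_offset_for_bit() a small cpu number gives an offset of 0,
while a value that looks like a kernel pointer gives an offset well
outside any cpumask, which is consistent with the GPF in
variable_test_bit().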