cqm: Make sure the head event of cache_groups always has valid RMID

David Carrillo-Cisneros Wed, 17 May 2017 22:00:07 -0700

On Tue, May 16, 2017 at 7:38 AM, Peter Zijlstra <[email protected]> wrote:
> On Thu, May 04, 2017 at 10:31:43AM +0800, Zefan Li wrote:
>> It is assumed that the head of cache_groups always has valid RMID,
>> which isn't true.
>>
>> When we deallocate RMID from conflicting events currently we don't
>> move them to the tail, and one of those events can happen to be in
>> the head. Another case is we allocate RMIDs for all the events except
>> the head event in intel_cqm_sched_in_event().
>>
>> Besides there's another bug that we retry rotating without resetting
>> nr_needed and start in __intel_cqm_rmid_rotate().
>>
>> Those bugs combined together led to the following oops.
>>
>> WARNING: at arch/x86/kernel/cpu/perf_event_intel_cqm.c:186 __put_rmid+0x28/0x80()
>> ...
>>  [<ffffffff8103a578>] __put_rmid+0x28/0x80
>>  [<ffffffff8103a74a>] intel_cqm_rmid_rotate+0xba/0x440
>>  [<ffffffff8109d8cb>] process_one_work+0x17b/0x470
>>  [<ffffffff8109e69b>] worker_thread+0x11b/0x400
>> ...
>> BUG: unable to handle kernel NULL pointer dereference at           (null)


I ran into this bug a long time ago but never found an easy way to
reproduce it. Do you have one?

>> ...
>>  [<ffffffff8103a74a>] intel_cqm_rmid_rotate+0xba/0x440
>>  [<ffffffff8109d8cb>] process_one_work+0x17b/0x470
>>  [<ffffffff8109e69b>] worker_thread+0x11b/0x400
>
> I've managed to forget most if not all of that horror show. Vikas and
> David seem to be working on a replacement, but until such a time it
> would be good if this thing would not crash the kernel.
>
> Guys, could you have a look? To me it appears to mostly have the right
> shape, but like I said, I forgot most details...

The patch LGTM. I ran into these issues before and fixed them in a
similar but messier way, then the rewrite started ...

>
>>
>> Cc: [email protected]
>> Signed-off-by: Zefan Li <[email protected]>
Acked-by: David Carrillo-Cisneros <[email protected]>

Re: [PATCH] perf/x86/intel/cqm: Make sure the head event of cache_groups always has valid RMID
