On Thu, Apr 04, 2019 at 11:15:39AM +0200, Thomas-Mich Richter wrote:
> On 4/3/19 12:41 PM, Peter Zijlstra wrote:
> > On Wed, Apr 03, 2019 at 11:47:00AM +0200, Thomas-Mich Richter wrote:
> >> I use linux 5.1.0-rc3 on s390 and got this WARN_ON_ONCE message:
> >>
> >> WARNING: CPU: 15 PID: 0 at kernel/events/core.c:330
> >>                 event_function_local.constprop.79+0xe2/0xe8
> >>
> >> which was introduced with
> >>    commit cca2094605ef ("perf/core: Fix event_function_local()").
> >> ..snip.... 
> >>
> >> Any ideas or hints on how to avoid/fix this warning?
> > 
> > Some thoughts here:
> > 
> >   https://lkml.kernel.org/r/20190213101644.gn32...@hirez.programming.kicks-ass.net
> > 
> > tl;dr, I've no frigging clue.
> > 
> 
> I have read this thread and at the end you mentioned:
> 
>     Humm, but in that case:
> 
>    context_switch()
>     prepare_task_switch()
>       perf_event_task_sched_out()
>         __perf_event_task_sched_out()
>         perf_event_context_sched_out()
>           task_ctx_sched_out()
>             ctx_sched_out()
>               group_sched_out()
>                 event_sched_out()
>                   if (event->pending_disable)
> 
>    Would have already cleared the pending_disable state, so the IPI would
>    not have ran perf_event_disable_local() in the first place.
> 
> Our test system is configured to panic on WARN_ON_ONCE(). I looked
> at the dump. The event triggering the WARN_ON_ONCE:
> 
> crash> struct perf_event.oncpu 0x1f9b24800
>   oncpu = 0
> crash> struct perf_event.state 0x1f9b24800
>   state = PERF_EVENT_STATE_ACTIVE
> crash> 
> 
> This means the code in 
> static void event_sched_out(....)
> {
>         ....
>         event->pmu->del(event, 0);
>         event->oncpu = -1;
> 
>         if (event->pending_disable) {
>                 event->pending_disable = 0;
>                 state = PERF_EVENT_STATE_OFF;
>         }
>         perf_event_set_state(event, state);
>         ...
> }
> 
> has not finished and returned from this function. So the task was not
> completely context-switched out from CPU 0 while the interrupt handler
> was executing on CPU 15:
> 
> static void perf_pending_event(...)
> {
>         ....
>         if (event->pending_disable) {
>                 event->pending_disable = 0;
>                 perf_event_disable_local(event);  <--- Causes the WARN_ON_ONCE()
>         }
>         .....
> }
> 
> I think there is a race, especially when the interrupt handler on CPU 15
> was invoked via timer interrupt and runs on a different CPU.

That is not entirely the scenario I talked about, but *groan*.

So what I meant was:

        CPU-0                                                   CPU-n

        __schedule()
          local_irq_disable()

          ...
            deactivate_task(prev);

                                                                try_to_wake_up(@p)
                                                                  ...
                                                                  smp_cond_load_acquire(&p->on_cpu, !VAL);

          <PMI>
            ..
            perf_event_disable_inatomic()
              event->pending_disable = 1;
              irq_work_queue() /* self-IPI */
          </PMI>

          context_switch()
            prepare_task_switch()
              perf_event_task_sched_out()
                // the above chain that clears pending_disable

            finish_task_switch()
              finish_task()
                smp_store_release(prev->on_cpu, 0);
                                                                  /* finally.... */
                                                                // take woken
                                                                // context_switch to @p
              finish_lock_switch()
                raw_spin_unlock_irq()
                /* w00t, IRQs enabled, self-IPI time */
                <self-IPI>
                  perf_pending_event()
                    // event->pending_disable == 0
                </self-IPI>


What you're suggesting is that the time between:

  smp_store_release(prev->on_cpu, 0);

and

  <self-IPI>

on CPU-0 is sufficient for CPU-n to context switch to the task, enable
the event there, trigger a PMI that calls perf_event_disable_inatomic()
_again_ (this would mean irq_work_queue() failing, which we don't check)
(and schedule out again, although that's not required).

This being virt, that might actually be possible if (v)CPU-0 takes a nap,
I suppose.

Let me think about this a little more...

> > Does it reproduce on x86 without virt on? I don't have a PPC LPAR to
> > test things on.
> > 
> 
> s390 LPARs run under hypervisor control, no chance to run any OS without it.

Yah, I know.. Same difference though; I also don't have an s390.
