On Fri, Apr 05, 2019 at 12:18:54PM +0200, Thomas-Mich Richter wrote: > On 4/4/19 3:03 PM, Peter Zijlstra wrote: > > On Thu, Apr 04, 2019 at 01:09:09PM +0200, Peter Zijlstra wrote: > > > >> That is not entirely the scenario I talked about, but *groan*. > >> > >> So what I meant was: > >> > >> CPU-0 CPU-n > >> > >> __schedule() > >> local_irq_disable() > >> > >> ... > >> deactivate_task(prev); > >> > >> > >> try_to_wake_up(@p) > >> ... > >> > >> smp_cond_load_acquire(&p->on_cpu, !VAL); > >> > >> <PMI> > >> .. > >> perf_event_disable_inatomic() > >> event->pending_disable = 1; > >> irq_work_queue() /* self-IPI */ > >> </PMI> > >> > >> context_switch() > >> prepare_task_switch() > >> perf_event_task_sched_out() > >> // the above chain that clears pending_disable > >> > >> finish_task_switch() > >> finish_task() > >> smp_store_release(prev->on_cpu, 0); > >> /* > >> finally.... */ > >> // take woken > >> // > >> context_switch to @p > >> finish_lock_switch() > >> raw_spin_unlock_irq() > >> /* w00t, IRQs enabled, self-IPI time */ > >> <self-IPI> > >> perf_pending_event() > >> // event->pending_disable == 0 > >> </self-IPI> > >> > >> > >> What you're suggesting, is that the time between: > >> > >> smp_store_release(prev->on_cpu, 0); > >> > >> and > >> > >> <self-IPI> > >> > >> on CPU-0 is sufficient for CPU-n to context switch to the task, enable > >> the event there, trigger a PMI that calls perf_event_disable_inatomic() > >> _again_ (this would mean irq_work_queue() failing, which we don't check) > >> (and schedule out again, although that's not required). > >> > >> This being virt that might actually be possible if (v)CPU-0 takes a nap > >> I suppose. > >> > >> Let me think about this a little more... > > > > Does the below cure things? It's not exactly pretty, but it could just > > do the trick. > > > > Thanks a lot for the patch, I have built a new kernel and let it run over the > week end. > > s390 does not have a PMI, all interrupts (including the measurement > interrupts from > the PMU) are normal, maskable interrupts.
Please implement arch_irq_work_raise() for s390 though, traditionally that was only required if you had NMI-like PMis, and IRQ based PMIs would have to run irq_work_run() at the end of their handler. The s390-sf handler does not do this, but even it if were to do that, it wouldn't be sufficient, since you also drain the buffer from pmu::stop(). Not doing this will get you some 'weird' behavoiur. As in, your signals could be delivered very late etc.