events: bug fixes and some diagnostic aids

Julien Grall Mon, 08 Feb 2021 02:04:18 -0800



On 08/02/2021 09:41, Jürgen Groß wrote:

On 08.02.21 10:11, Julien Grall wrote:
Hi Juergen,

On 07/02/2021 12:58, Jürgen Groß wrote:
On 06.02.21 19:46, Julien Grall wrote:
Hi Juergen,

On 06/02/2021 10:49, Juergen Gross wrote:
The first three patches are fixes for XSA-332. They avoid WARN splats
and a performance issue with interdomain events.
Thanks for helping to figure out the problem. Unfortunately, I still reliably see the WARN splat with the latest Linux master (1e0d27fce010) + your first 3 patches.
I am using Xen 4.11 (1c7d984645f9) and dom0 is forced to use the 2L events ABI.
After some debugging, I think I have an idea what went wrong. The problem happens when the event is initially bound from vCPU0 to a different vCPU.
From the comment in xen_rebind_evtchn_to_cpu(), we are masking the event to prevent it being delivered on an unexpected vCPU. However, I believe the following can happen:
vCPU0                     | vCPU1
                          |
                          | Call xen_rebind_evtchn_to_cpu()
receive event X           |
                          | mask event X
                          | bind to vCPU1
<vCPU descheduled>        | unmask event X
                          |
                          | receive event X
                          |
                          | handle_edge_irq(X)
handle_edge_irq(X)        |  -> handle_irq_event()
                          |   -> set IRQD_IN_PROGRESS
 -> set IRQS_PENDING      |
                          |   -> evtchn_interrupt()
                          |   -> clear IRQD_IN_PROGRESS
                          |  -> IRQS_PENDING is set
                          |  -> handle_irq_event()
                          |   -> evtchn_interrupt()
                          |     -> WARN()
                          |
All the lateeoi handlers expect a ONESHOT semantic and evtchn_interrupt() doesn't tolerate any deviation.
I think the problem was introduced by 7f874a0447a9 ("xen/events: fix lateeoi irq acknowledgment") because the interrupt was disabled previously. Therefore we wouldn't do another iteration in handle_edge_irq().
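
To make the sequence in the diagram concrete, here is a minimal user-space sketch of the loop. It is only a model under stated assumptions: every name (model_desc, model_handler, model_run, ...) is made up for illustration, the two vCPUs are collapsed into one thread with the stale delivery injected from inside the handler, and locking is left out entirely. It is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the diagram above; all names are illustrative. */
struct model_desc {
    bool in_progress;          /* stands in for IRQD_IN_PROGRESS */
    bool pending;              /* stands in for IRQS_PENDING */
    bool disabled;             /* the interrupt has been disabled */
    bool handler_disables_irq; /* handler disables the IRQ, as assumed for the old model */
    bool inject_race;          /* stale delivery from the other vCPU not yet injected */
    int handler_calls;         /* how often the handler ran for one event */
};

static void model_handle_edge_irq(struct model_desc *d);

/* Stands in for evtchn_interrupt(), which expects ONESHOT behaviour. */
static void model_handler(struct model_desc *d)
{
    d->handler_calls++;
    if (d->handler_disables_irq)
        d->disabled = true;
    if (d->inject_race) {
        /* vCPU0 runs again and delivers its stale event X while
         * vCPU1 is still inside the handler. */
        d->inject_race = false;
        model_handle_edge_irq(d);
    }
}

static void model_handle_irq_event(struct model_desc *d)
{
    assert(!d->in_progress);   /* never entered twice concurrently */
    d->pending = false;
    d->in_progress = true;
    model_handler(d);
    d->in_progress = false;
}

static void model_handle_edge_irq(struct model_desc *d)
{
    if (d->in_progress || d->disabled) {
        /* Cannot run the handler now: only note the arrival. */
        d->pending = true;
        return;
    }
    do {
        model_handle_irq_event(d);
    } while (d->pending && !d->disabled);
}

/* Returns how many times the handler ran for a single event X. */
int model_run(bool handler_disables_irq)
{
    struct model_desc d = {
        .handler_disables_irq = handler_disables_irq,
        .inject_race = true,
    };

    model_handle_edge_irq(&d);
    return d.handler_calls;
}
```

In this model, model_run(false) runs the handler twice for one event, which is the situation that ends in the WARN(), while model_run(true) runs it once because the loop condition sees the interrupt as disabled.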
I think you picked the wrong commit for blaming, as this is just
the last patch of the three patches you were testing.
I actually found the right commit for blaming but I copied the information from the wrong shell :/. The bug was introduced by:
c44b849cee8c ("xen/events: switch user event channels to lateeoi model")
Aside from the handlers, I think it may impact the deferred EOI mitigation, because in theory a 3rd vCPU could join the party (let's say vCPU A migrates the event from vCPU B to vCPU C). So info->{eoi_cpu, irq_epoch, eoi_time} could possibly get mangled?
For a fix, we may want to consider holding evtchn_rwlock with the write permission. Although, I am not 100% sure this is going to prevent everything.
It will make things worse, as it would violate the locking hierarchy
(xen_rebind_evtchn_to_cpu() is called with the IRQ-desc lock held).
Ah, right.
At first glance I think we'll need a 3rd masking state ("temporarily
masked") in the second patch in order to avoid a race with lateeoi.

In order to avoid the race you outlined above we need an "event is being
handled" indicator checked via test_and_set() semantics in
handle_irq_for_port() and reset only when calling clear_evtchn().
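
A minimal sketch of such an indicator, written as user-space C11 with atomic_flag rather than kernel primitives. The struct and the model_* function names are hypothetical and only mirror the roles of handle_irq_for_port() and clear_evtchn() mentioned above; nothing here is taken from the actual patches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port state; not the kernel's struct irq_info. */
struct port_state {
    atomic_flag being_handled; /* "event is being handled" indicator */
    unsigned int handled;      /* deliveries that were actually processed */
};

void port_state_init(struct port_state *p)
{
    assert(p != NULL);
    atomic_flag_clear(&p->being_handled);
    p->handled = 0;
}

/* Models the check in handle_irq_for_port(): only the caller that
 * flips the flag from clear to set goes on to handle the event. */
bool model_handle_irq_for_port(struct port_state *p)
{
    if (atomic_flag_test_and_set(&p->being_handled))
        return false;          /* already being handled elsewhere */
    p->handled++;
    return true;
}

/* Models the reset point: the flag is dropped only when the event
 * is cleared, not when the handler returns. */
void model_clear_evtchn(struct port_state *p)
{
    atomic_flag_clear(&p->being_handled);
}
```

With this shape, a second delivery of the same event that arrives before model_clear_evtchn() is refused, regardless of which vCPU it arrives on.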
It feels like we are trying to work around the IRQ flow we are using (i.e. handle_edge_irq()).
I'm not really sure this is the main problem here. According to your
analysis the main problem is occurring when handling the event, not when
handling the IRQ: the event is being received on two vcpus.

I don't think we can easily divide the two because we rely on the IRQ framework to handle the lifecycle of the event. So...


Our problem isn't due to the IRQ still being pending, but due to it being
raised again, which should happen for a one shot IRQ the same way.

... I don't really see how the difference matters here. The idea is to re-use what's already existing rather than trying to re-invent the wheel with an extra lock (or whatever we can come up with).


Cheers,

--
Julien Grall

Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
