Public bug reported:
[Impact]
To save the vgic LPI pending state with GICv4.1, the VPEs must all be
unmapped from the ITSs so that the sGIC caches can be flushed. The opposite is
done once the state is saved.
This is all done by using the activate/deactivate irqdomain
callbacks directly from the vgic code. Crucially, this is done without
holding the irqdesc lock for the interrupts that represent the VPE. And
these callbacks are changing the state of the irqdesc. What could
possibly go wrong?
If a doorbell fires while we are messing with the irqdesc state, it
will acquire the lock and change the interrupt state concurrently. Since
we don't hold the lock, corruption occurs in the interrupt state. Oh
well.
While acquiring the lock would fix this (and this was Shanker's
initial approach), this is still a layering violation we could do
without. A better approach is actually to free the VPE interrupt, do
what we have to do, and re-request it.
It is more work, but this usually happens only once in the lifetime
of the VM and we don't really care about this sort of overhead.
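The free / do the work / re-request sequence described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the upstream patch: struct model_irq_desc and every model_* function are hypothetical names invented here, a pthread mutex stands in for the irqdesc lock, and the real kernel request_irq()/free_irq() interfaces take different arguments. The point it shows is that once the interrupt is freed, a doorbell arriving during the flush finds no handler and cannot touch the state we are working on.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a VPE doorbell's irqdesc. */
struct model_irq_desc {
    pthread_mutex_t lock;      /* plays the role of the irqdesc lock */
    bool activated;            /* VPE currently mapped on the ITS */
    void (*handler)(void *);   /* NULL while the interrupt is freed */
    void *dev_id;              /* cookie passed to the handler */
    unsigned int flushes;      /* number of modelled cache flushes */
};

static void model_desc_init(struct model_irq_desc *d)
{
    pthread_mutex_init(&d->lock, NULL);
    d->activated = false;
    d->handler = NULL;
    d->dev_id = NULL;
    d->flushes = 0;
}

/* Install the handler and activate the interrupt, all under the lock. */
static int model_request_irq(struct model_irq_desc *d,
                             void (*handler)(void *), void *dev_id)
{
    int ret = -1;

    pthread_mutex_lock(&d->lock);
    if (d->handler == NULL && handler != NULL) {
        d->handler = handler;
        d->dev_id = dev_id;
        d->activated = true;
        ret = 0;
    }
    pthread_mutex_unlock(&d->lock);
    return ret;
}

/* Remove the handler and deactivate the interrupt, all under the lock. */
static void model_free_irq(struct model_irq_desc *d)
{
    pthread_mutex_lock(&d->lock);
    d->handler = NULL;
    d->dev_id = NULL;
    d->activated = false;
    pthread_mutex_unlock(&d->lock);
}

/* A doorbell firing: takes the lock, delivers only if a handler exists. */
static bool model_doorbell(struct model_irq_desc *d)
{
    bool delivered = false;

    pthread_mutex_lock(&d->lock);
    if (d->handler != NULL) {
        d->handler(d->dev_id);
        delivered = true;
    }
    pthread_mutex_unlock(&d->lock);
    return delivered;
}

/* Free the interrupt, do the flush, then re-request it. */
static int model_save_pending_state(struct model_irq_desc *d)
{
    void (*handler)(void *);
    void *dev_id;

    pthread_mutex_lock(&d->lock);
    handler = d->handler;
    dev_id = d->dev_id;
    pthread_mutex_unlock(&d->lock);
    if (handler == NULL)
        return -1;             /* nothing was requested */

    model_free_irq(d);
    /* A doorbell arriving in this window is simply not delivered. */
    assert(!model_doorbell(d));
    d->flushes++;              /* stand-in for the cache flush and state save */
    return model_request_irq(d, handler, dev_id);
}

/* Example handler: counts delivered doorbells through dev_id. */
static void count_doorbell(void *dev_id)
{
    ++*(unsigned int *)dev_id;
}
```

In this model every change to the descriptor goes through the request/free entry points, which take the lock themselves, so the caller never reaches into the descriptor state behind the lock's back. That is the layering argument made above, at the cost of a free and a re-request per save.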
The upstream maintainer acknowledged the bug and fixed the issue; the
fix will be available in v6.2.
[Fixes]
- single patch to address the race condition on VPE activation/deactivation
** Affects: linux-nvidia-5.19 (Ubuntu)
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/2003640
Title:
Integrate NVIDIA Grace kernel fixes for vGIC
Status in linux-nvidia-5.19 package in Ubuntu:
New