On 05/05/2023 19:56, Laszlo Ersek wrote:
I don't like the patch. For two reasons:
(1) It papers over the actual issue. The problem should be fixed where
it is, if possible.
Agreed, but (as you have shown in
https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in
Windows code rather than in EDK2 code. If the goal is to allow these
buggy Windows builds to still be used with OVMF, then the only option is
to paper over the issue. We should do this only if it can be proven
safe to do so, of course.
(2) With the patch applied, NestedInterruptRaiseTPL() can return
TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently,
TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c]
may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as
"InterruptedTPL".
I believe that this in turn may invalidate at least one comment in
NestedInterruptRestoreTPL():
//
// Call RestoreTPL() to allow event notifications to be
// dispatched. This will implicitly re-enable interrupts.
//
gBS->RestoreTPL (InterruptedTPL);
Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally anyways.
I agree that the comment is invalidated, but as far as I can tell the
logic remains safe.
I will put together a patch to update the comments in
NestedInterruptTplLib to address the possibility of an interrupt
occurring (illegally) at TPL_HIGH_LEVEL.
(a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen
platform that really *needs* NestedInterruptTplLib. (Don't get me wrong:
NestedInterruptTplLib is technically correct in all circumstances, but
in practice it happens to be too strict.)
(b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe
variant that effectively has commits a086f4a63bc0 and a24fbd606125
reverted. (We should keep 9bf473da4c1d.) This returns us to
pre-239b50a86370 status -- that is, a timer interrupt handler that (a)
does not try to be smart about nested interrupts, therefore one that is
much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec
violation, (c) is vulnerable to the timer interrupt storm seen on Xen,
but will never run on Xen. (Only the OVMF Xen platform is supposed to be
launched on Xen.)
I'm less keen on this because it reduces the runtime exposure of a very
complex piece of code, and will effectively cause that code to become
unmaintained.
It's also satisfying (to me) that NestedInterruptTplLib provides a
provable upper bound on stack consumption due to interrupts, which can't
be guaranteed by the simpler pre-239b50a86370 scheme.
Could we defer judgement until after I've fully reasoned through (and
documented) how NestedInterruptTplLib will work in the presence of
interrupts occurring at TPL_HIGH_LEVEL?
Thanks,
Michael
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#104162): https://edk2.groups.io/g/devel/message/104162
Mute This Topic: https://groups.io/mt/98656860/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-