On 16/01/2024 14:34, Laszlo Ersek wrote:
On 1/16/24 10:48, Michael Brown wrote:
IOW, my impression is that NestedInterruptTplLib can certainly handle
all scenarios thrown at it, but where it really matters is in the face
of an interrupt storm (not just "normal nesting"), and a storm is
unlikely (or even impossible?) on physical hardware.

... Oh, scratch that. "Interrupt storm" simply means that interrupts are
being delivered at a rate higher than the handler routine can service
them. IOW, the "storm" is not that interrupts are delivered *very
rapidly* in an absolute sense. If interrupts are delivered at normal
frequency, but the handler is too slow to service *even that rate*, then
that also qualifies as "storm", because the nesting depth will *keep
growing*. It's not really the growth rate that matters; what matters is
the *trend*, i.e., the fact that there *is* growth (the stack gets
deeper and deeper). The stack might not overflow immediately, and if the
handler speeds up (for whatever reason), the stack might recover, but
there is nothing to prevent an overflow.
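
To make the "trend" argument concrete, here is a toy model in plain C (not EDK2 code). It assumes that every interrupt preempts the handler currently running and that the innermost handler always runs first; the names max_nesting_depth and MAX_FRAMES are invented for this illustration. Under those assumptions the depth stays at 1 while the service time fits within the interrupt period, and keeps growing as soon as it does not.

```c
#include <stdio.h>

#define MAX_FRAMES 4096   /* hypothetical stack capacity, in handler frames */

/*
 * Toy model: an interrupt arrives every `period` time units and each
 * handler needs `service` time units to finish. Returns the deepest
 * nesting reached after `interrupts` arrivals, or -1 if the modelled
 * stack runs out of frames.
 */
static int max_nesting_depth(int period, int service, int interrupts)
{
    int remaining[MAX_FRAMES];   /* work left in each nested handler */
    int depth = 0;
    int max_depth = 0;

    for (int i = 0; i < interrupts; i++) {
        /* A new interrupt preempts whatever is currently running. */
        if (depth == MAX_FRAMES) {
            return -1;           /* the model's "stack overflow" */
        }
        remaining[depth++] = service;
        if (depth > max_depth) {
            max_depth = depth;
        }

        /* Until the next arrival, run handlers innermost first. */
        int budget = period;
        while (budget > 0 && depth > 0) {
            if (remaining[depth - 1] <= budget) {
                budget -= remaining[depth - 1];
                depth--;         /* innermost handler returns */
            } else {
                remaining[depth - 1] -= budget;
                budget = 0;      /* preempted before finishing */
            }
        }
    }
    return max_depth;
}
```

In this model, max_nesting_depth(10, 10, 1000) never exceeds a depth of 1, whereas max_nesting_depth(10, 11, 1000) reaches a depth of 1000: a handler that is only slightly too slow still adds a frame on every interrupt.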

So, in the end, I think you've convinced me.

:)

I'm happy to send a patch to migrate NestedInterruptTplLib to
MdeModulePkg, so that it can be consumed outside of OvmfPkg.  Shall I do
this?

Sounds like a valid idea to me.

It could be greatly supported by a test case (to be run on bare metal)
that installs a slow handler which *eventually* exhausts the stack when
NestedInterruptTplLib is not used.

(FWIW, IIRC, the UEFI spec warns about this -- it says something like,
"return from TPL_HIGH as soon as you can, otherwise the system will
become unstable".)

Sorry for the wall of text, I find this very difficult to reason about.

I also find it very difficult to reason about, which is why NestedInterruptRestoreTpl() has 126 lines of comments providing a semi-formal proof of correctness for a mere 15 statements of C code!

In particular, I find it difficult to reason about when it would be safe for a platform to *not* use NestedInterruptTplLib. It's clearly empirically difficult to trigger stack underflow via an interrupt "storm" on physical hardware, but I'm not convinced it's impossible.

I find it mentally easier to rely on the hard guarantee that NestedInterruptTplLib provides: that nested interrupts will continue to be delivered but that the number of interrupt-induced stack frames is bounded by the (small, finite) number of distinct TPL levels in existence.
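
As an illustration of why that bound holds, here is a toy model in plain C. It is not the library's actual code: it only assumes that a nested handler frame is stacked when the interrupted TPL is strictly higher than that of the frame beneath it, and that the work is otherwise left to the frame already in flight. IsrModel, model_interrupt and storm_max_depth are names invented for this sketch; the TPL values are the ones defined by the UEFI specification.

```c
#include <stddef.h>

/* TPL values as defined by the UEFI specification. */
enum {
    TPL_APPLICATION = 4,
    TPL_CALLBACK    = 8,
    TPL_NOTIFY      = 16,
    TPL_HIGH_LEVEL  = 31
};

/* Hypothetical bookkeeping for the model (not the library's state). */
typedef struct {
    int interrupted_tpl[TPL_HIGH_LEVEL + 1]; /* one slot per possible frame */
    int depth;                               /* handler frames in flight */
    int max_depth;                           /* deepest nesting observed */
    int deferred;                            /* interrupts left to an outer frame */
} IsrModel;

/* Returns 1 if a new handler frame is stacked, 0 otherwise. */
static int model_interrupt(IsrModel *m, int interrupted_tpl)
{
    /* Interrupts are not delivered at TPL_HIGH_LEVEL. */
    if (interrupted_tpl >= TPL_HIGH_LEVEL) {
        return 0;
    }
    /* Same or lower TPL than the frame in flight: no new frame. */
    if (m->depth > 0 && interrupted_tpl <= m->interrupted_tpl[m->depth - 1]) {
        m->deferred++;
        return 0;
    }
    /* Strictly higher TPL: genuine nesting, so stack one more frame. */
    m->interrupted_tpl[m->depth++] = interrupted_tpl;
    if (m->depth > m->max_depth) {
        m->max_depth = m->depth;
    }
    return 1;
}

/* Worst case: handlers never get to return during the storm. */
static int storm_max_depth(const int *tpls, size_t n_tpls, size_t interrupts)
{
    IsrModel m = {0};
    for (size_t i = 0; i < interrupts; i++) {
        model_interrupt(&m, tpls[i % n_tpls]);
    }
    return m.max_depth;
}
```

Because the stacked TPLs are strictly increasing and all lie below TPL_HIGH_LEVEL, the depth in this model can never exceed the number of distinct TPLs in use, however many interrupts arrive. With interrupts cycling over TPL_APPLICATION, TPL_CALLBACK and TPL_NOTIFY, the depth tops out at 3.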



While developing NestedInterruptTplLib, I did hack together a test case for a slow handler that would deliberately induce an interrupt storm, since I needed this to test that my code was working. When triggered, this test would cause the machine to effectively hang due to servicing an endless storm of timer interrupts. Before NestedInterruptTplLib, the stack would soon underflow and would typically cause a reboot (or other crash). With NestedInterruptTplLib the machine would continue to service interrupts indefinitely.

How might such a test case be included in upstream EDK2? I'm peripherally aware of EDK2 test infrastructure such as UEFI SCT, but I haven't interacted with it yet.

Thanks,

Michael


