I think we are all aligned on the purpose. It's to avoid enabling the
interrupts in the end of RestoreTPL (HIGH->non-HIGH) in the interrupt context.
The discussion is about how to implement it.
Michael Brown's idea is to avoid changing DxeCore but add a customized
RaiseTpl/RestoreTpl implementation in a lib and request Timer driver calls it.
That lib was implemented very smartly. It includes while-loop,
implicitly-recursive, implicitly-requiring NESTED_INTERRUPT_STATE in global
storage not in stack as local variable.
I really do NOT like the future that every timer driver calls that lib to avoid
the potential stack overflow. It's so complicated! And it's called in every
10ms!!
Paolo,
I don't fully understand your patch especially the following changes.
3 comments embedded.
@@ -161,5 +191,46 @@ CoreRestoreTpl (
IN EFI_TPL NewTpl
)
{
+ BOOLEAN InInterruptHandler = FALSE;
+
+ //
+ // Unwind the nested interrupt handlers up to the required
+ // TPL, paying attention not to overflow the stack. While
+ // not strictly necessary according to the specification,
+ // accept the possibility that multiple RaiseTPL calls are
+ // undone by a single RestoreTPL
+ //
+ while ((INTN)NewTpl <= HighBitSet64 (mInterruptedTplMask)) {
1. why "<="? I thought when RestoreTPL() is called there are only two cases:
a. NewTpl == HighBitSet64 (...)
b. NewTpl > HighBitSet64 (...)
1.a is the case when TimerInterruptHandler() or CoreTimerTick() restores
TPL from HIGH to non-HIGH.
1.b is the case when the pending event backs call RaiseTPL/RestoreTPL().
Because only pending events whose TPL > "Interrupted TPL" can run, the
RestoreTPL() call from the event callbacks cannot change the TPL to a value
less than or equal to "Interrupted TPL".
So, I think "<=" can be "==".
2. can you explain a bit more about the reason of "while"?
+ UINTN InterruptedTpl = HighBitSet64 (mInterruptedTplMask);
+ mInterruptedTplMask &= ~(UINTN)(1 << InterruptedTpl);
+
+ ASSERT (GetInterruptState () == FALSE);
+ InInterruptHandler = TRUE;
+
+ //
+ // Take the TPL down a notch to allow event notifications to be
+ // dispatched. This will implicitly re-enable interrupts (if
+ // InterruptedTPL is below TPL_HIGH_LEVEL), even though we are
+ // still inside the interrupt handler, but the new TPL will
+ // be set while they are disabled.
+ //
+ // DesiredInterruptState must be FALSE to ensure that the
+ // stack does not blow up. If we used, as in the final call
+ // below, "InterruptedTpl < TPL_HIGH_LEVEL", the timer interrupt
+ // handler could be invoked repeatedly in the small window between
+ // CoreSetInterruptState (TRUE) and the IRET instruction.
+ //
+ CoreRestoreTplInternal (InterruptedTpl, FALSE);
+
+ if (InterruptedTpl == NewTpl) {
+ break;
3. "break" or "return"? I think we should exit from this function.
+ }
+ }
+
+ //
+ // If we get here with InInterruptHandler == TRUE, an interrupt
+ // handler forgot to restore the TPL.
+ //
+ ASSERT (!InInterruptHandler);
CoreRestoreTplInternal (NewTpl, NewTpl < TPL_HIGH_LEVEL);
}
Thanks,
Ray
> -----Original Message-----
> From: Paolo Bonzini <[email protected]>
> Sent: Friday, March 1, 2024 8:14 AM
> To: Ni, Ray <[email protected]>
> Cc: [email protected]; Kinney, Michael D <[email protected]>;
> Liming Gao <[email protected]>; Laszlo Ersek <[email protected]>;
> Michael Brown <[email protected]>
> Subject: Re: [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue
> due to nested interrupts
>
> On Thu, Feb 29, 2024 at 2:04 PM Ray Ni <[email protected]> wrote:
> > @@ -134,9 +262,9 @@ CoreRestoreTpl (
> > }
> >
> > //
> > - // Set the new value
> > + // Set the new TPL with interrupt disabled.
> > //
> > -
> > + CoreSetInterruptState (FALSE);
> > gEfiCurrentTpl = NewTpl;
> >
> > //
> > @@ -144,7 +272,22 @@ CoreRestoreTpl (
> > // interrupts are enabled
> > //
> > if (gEfiCurrentTpl < TPL_HIGH_LEVEL) {
> > - CoreSetInterruptState (TRUE);
> > + if ((INTN)gEfiCurrentTpl > HighBitSet64 (mInterruptedTplMask)) {
> > + //
> > + // Only enable interrupts if restoring to a level above the highest
> > + // interrupted TPL level. This allows interrupt nesting, but only
> > for
> > + // events at higher TPL level than the current TPL level.
> > + //
> > + CoreSetInterruptState (TRUE);
> > + } else {
> > + //
> > + // Clear interrupted TPL level mask, but do not re-enable interrupts
> here
> > + // This will return to CoreTimerTick() and interrupts will be
> re-enabled
> > + // when the timer interrupt handlers returns from interrupt
> context.
> > + //
> > + ASSERT ((INTN)gEfiCurrentTpl == HighBitSet64
> (mInterruptedTplMask));
> > + mInterruptedTplMask &= ~(UINTN)(1 << gEfiCurrentTpl);
> > + }
> > }
>
> Ok, now I understand what's going on and it's indeed the same logic as
> NestedInterruptTplLib, with DisableInterruptsOnIret() replaced by
> skipping CoreSetInterruptState(TRUE). It's similar to what I proposed
> elsewhere in the thread, just written differenty.
>
> I agree with Michael Brown that the spec is unclear on the state of
> the interrupt flag on exit from gBS->RestoreTPL(), but perhaps this
> change is feasible if the interrupt handlers just raise the TPL first
> and restore it last.
>
> Just as an exercise for me to understand the code better, I tried
> rewriting the code in terms of the CoreRestoreTplInternal() function
> that I proposed. I find it easier to read, but I guess that's a bit in
> the eye of the beholder, and it is a little more defensively coded. I
> attach it (untested beyond compilation) for reference.
>
> Paolo
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116207): https://edk2.groups.io/g/devel/message/116207
Mute This Topic: https://groups.io/mt/104642317/21656
Group Owner: [email protected]
Unsubscribe: https://edk2.groups.io/g/devel/unsub [[email protected]]
-=-=-=-=-=-=-=-=-=-=-=-