On 09/23/20 23:34, Bret Barkelew wrote: > Agreed. It's random that the previous config worked.
Here's an idea. Assume we have two DXE drivers (D1 and D2), each of which registers an EndOfDxe notification function (each to be queued at TPL_CALLBACK or TPL_NOTIFY, doesn't matter). Assume D1 does something in its EndOfDxe handler that breaks if D2's EndOfDxe handler runs first. Let's call the offending action in D2's EndOfDxe handler "Nastiness". (Scientific term.) Further assume that D1 is an optional driver (while D2 is mandatory). D1 may or may not be included in the platform firmware. Here's what the drivers could do, for ensuring that D1's handler runs first, *if* D1 is included in the platform firmate: (1) D2 defines (standardizes) a new, well-known protocol GUID. There is no need for a protocol structure; it can be a null protocol. If the protocol is present in the protocol database, that means that Nastiness must not be performed by D2 in its EndOfDxe handler. For simplicity, let's call the protocol EDKII_NO_NASTINESS_PLEASE_PROTOCOL. (2) In its entry point function, D1 produces an instance of EDKII_NO_NASTINESS_PLEASE_PROTOCOL. The protocol can be the sole protocol on a new handle, or exist on another long-term handle (such as gImageHandle). (3) In its EndOfDxe handler, D1 uninstalls EDKII_NO_NASTINESS_PLEASE_PROTOCOL (potentially destroying the handle as well). (This is safe because the handler is running at TPL_NOTIFY at the highest, and uninstalling protocols is safe at <= TPL_NOTIFY. Memory allocation services are also explicitly safe at <= TPL_NOTIFY.) (4) D2 factors Nastiness out of its EndOfDxe handler to a new function. This new function -- solely consisting of Nastiness -- has the usual notification function prototype. Let's call it NastyFun(). (5) In its entry point function, D2 creates a new (private, not GUID-ed) event, for which NastyFun() is the notification function. The TPL is TPL_CALLBACK. Let's call the event NastyEvent. (6) In D2's EndOfDxe handler, the original, open-coded Nastiness is replaced with the following logic: - If EDKII_NO_NASTINESS_PLEASE_PROTOCOL exists in the protocol database (according to gBS->LocateProtocol(), then D2's EndOfDxe handler signals NastyEvent. (This is safe because LocateProtocol() is safe to call at up to and including TPL_NOTIFY, and gBS->SignalEvent is safe even at TPL_HIGH_LEVEL.) - Otherwise, D2's EndOfDxe calls NastyFun() with a normal function call. It's supposed to work like this: - If D1 is not in the firmware, it won't install EDKII_NO_NASTINESS_PLEASE_PROTOCOL, so D2 performs Nastiness (via a direct call to NastyFun()) on the call stack of its original EndOfDxe handler. That is, no change in behavior. - If D1's EndOfDxe handler completes first, D2's EndOfDxe handler will again not see EDKII_NO_NASTINESS_PLEASE_PROTOCOL; so goto previous point. - Multiple drivers similar to D1 may coexist; multiple EDKII_NO_NASTINESS_PLEASE_PROTOCOL instances may coexist (all NULL interfaces, but on different handles). If there is at least one, gBS->LocateProtocol() in D2's EndOfDxe handler will find the first one, and postpone Nastiness. - When D2's EndOfDxe handler signals NastyEvent, NastyFun() will be enqueued at TPL_CALLBACK. Note that, given that D2's EndOfDxe handler is running at the moment, the EndOfDxe event group has been signaled previously. This means *all* events in the EndOfDxe event group have been signaled, and all their notification functions have been enqueued too (in some unspecified order, except for the specified dispatch ordering between TPL_NOTIFY and TPL_CALLBACK). So when NastyEvent is signaled, NastyFun() is enqueued *after* all the other -- already enqueued -- EndOfDxe notification functions. That's because (a) NastyEvent uses TPL_CALLBACK, so the already pending TPL_NOTIFY notifications will be dispatched before it of course, and (b) regarding notify functions enqueued previously at the same TPL (= TPL_CALLBACK), functions in any given TPL queue are queued / dispatched in FIFO order. By the time NastyFun() runs, it could ASSERT() that EDKII_NO_NASTINESS_PLEASE_PROTOCOL does not exist. Thanks Laszlo > On Wed, Sep 23, 2020 at 2:32 PM Andrew Fish <af...@apple.com> wrote: > >> >> >> On Sep 23, 2020, at 2:14 PM, Bret Barkelew <b...@corthon.com> wrote: >> >> As far as I can tell, this is exposing a pre-existing race condition in >> EndOfDxe. It just so HAPPENS that the old "Disable RequestToLock" >> callback executed after this TcgMor callback, whereas the new >> VariablePolicy callback executes first. >> >> I'm going to poke around for a solution, but this seems like an >> architectural problem with EndOfDxe. >> >> >> The architecture does not guarantee order for the events. So if the events >> are dependent on the order of other events that is a bug in implementation. >> These kind of bugs are hard to notice as the DXE Core implementation event >> dispatch in a fixed order, I think in the order the event was registered. >> So it looks correct until you add more events. >> >> Thanks, >> >> Andrew Fish >> >> On Wed, Sep 23, 2020 at 6:02 AM Laszlo Ersek <ler...@redhat.com> wrote: >> >>> Hi Bret, >>> >>> On 09/23/20 08:12, Bret Barkelew via groups.io wrote: >>>> To whom it may concern, >>>> This is as done as it’s going to get. >>>> >>>> Thank you all for your help! >>> >>> I tried to merge this via <https://github.com/tianocore/edk2/pull/954>, >>> but it failed. Please review the build report, and submit v9 if necessary. >>> >>> Thanks! >>> Laszlo >>> >>> >> >> >> > > > > > > -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#65576): https://edk2.groups.io/g/devel/message/65576 Mute This Topic: https://groups.io/mt/77029646/21656 Group Owner: devel+ow...@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-