On 2021-02-03 12:58 p.m., Laszlo Ersek wrote:
On 02/03/21 07:45, Ankur Arora wrote:
On 2021-02-02 6:15 a.m., Laszlo Ersek wrote:
On 02/02/21 15:00, Laszlo Ersek wrote:

... I guess that volatile-qualifying both CPU_HOT_EJECT_DATA, and the
array pointed-to by CPU_HOT_EJECT_DATA.ApicIdMap, should suffice, in
combination with the sync-up point that you quoted. This seems to match
existing practice in PiSmmCpuDxeSmm -- there are no concurrent accesses,
so atomicity is not a concern, and serializing the instruction streams
coarsely, with the sync-up, in combination with volatile accesses,
should presumably guarantee visibility (on x86 anyway).

To summarize, this is what I would ask for:

- make CPU_HOT_EJECT_DATA volatile

- make (*CPU_HOT_EJECT_DATA.ApicIdMap) volatile

- after storing something to CPU_HOT_EJECT_DATA or
CPU_HOT_EJECT_DATA.ApicIdMap on the BSP, execute a MemoryFence()

- before fetching something from CPU_HOT_EJECT_DATA or
CPU_HOT_EJECT_DATA.ApicIdMap on an AP, execute a MemoryFence()
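
For concreteness, a minimal sketch of what those qualifiers would look
like -- the field names, types and the handler signature below are
illustrative assumptions, not the actual OvmfPkg definitions:

  typedef VOID (EFIAPI *CPU_HOT_EJECT_HANDLER) (UINTN ProcessorNum); // placeholder signature

  typedef struct {
    CPU_HOT_EJECT_HANDLER    Handler;     // updated by the BSP, read by APs
    volatile UINT64          *ApicIdMap;  // array elements written by the BSP,
                                          // read by the APs
    UINT32                   ArrayLength;
  } CPU_HOT_EJECT_DATA;

  volatile CPU_HOT_EJECT_DATA  *mCpuHotEjectData;  // the structure itself,
                                                   // accessed as volatile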


Except: MemoryFence() isn't a *memory fence* in fact.

See "MdePkg/Library/BaseLib/X64/GccInline.c".

It's just a compiler barrier, which may not add anything beyond what
we'd already have from "volatile".
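
(Paraphrased from memory, the X64 GccInline.c variant boils down to an
empty asm statement with a "memory" clobber -- i.e. a pure compiler
barrier, with no fence instruction emitted:)

  VOID
  EFIAPI
  MemoryFence (
    VOID
    )
  {
    //
    // Compiler barrier only; no MFENCE/LFENCE/SFENCE is generated.
    //
    __asm__ __volatile__ ("" : : : "memory");
  }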

Case in point: PiSmmCpuDxeSmm performs heavy multi-processing, but does
not contain a single invocation of MemoryFence(). It uses volatile
objects, and a handful of InterlockedCompareExchangeXx() calls, for
implementing semaphores. (NB: there is no 8-bit variant of
InterlockedCompareExchange(), as "volatile UINT8" is considered atomic
in itself, and a suitable basis for a semaphore too.) And given the
synchronization from those semaphores, PiSmmCpuDxeSmm trusts that
updates to the *other* volatile objects are both atomic and visible.
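
(The semaphore pattern in question looks roughly like the following --
paraphrased, not the literal PiSmmCpuDxeSmm code:)

  UINT32
  WaitForSemaphore (
    IN OUT volatile UINT32  *Sem
    )
  {
    UINT32  Value;

    for (;;) {
      Value = *Sem;
      if (Value != 0 &&
          InterlockedCompareExchange32 (Sem, Value, Value - 1) == Value) {
        //
        // We atomically decremented the semaphore.
        //
        break;
      }
      CpuPause ();
    }
    return Value - 1;
  }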

I'm pretty sure this only works because x86 is strongly ordered. There are
instruction stream barriers in place, and compiler barriers too, but no
actual memory barriers.

Right, and just to separate them explicitly, there are two problems:

  - compiler: the compiler caches values in registers, or keeps looking at
stale memory locations. Now, AFAICS, this should not happen, because the
ApicIdMap would never change once set, so the compiler should reasonably be
able to cache the address of ApicIdMap and dereference it (thus obviating
the need for volatile.)

(CPU_HOT_EJECT_DATA.Handler does change though.)

Yeah, I did kinda elide over that. Let me think this through in v7
and add more explicit comments and then we can see if it still looks
fishy?

Thanks
Ankur


The compiler could, however, cache any assignments to ApicIdMap[Idx]
(especially given LTO) and we could use a MemoryFence() (as the compiler
barrier that it is) to force the store.

  - CPU pipeline: as you said, right now we basically depend on x86
store-order semantics (and the CpuPause() loop around AllCpusInSync kinda
provides a barrier.)

So the BSP writes in this order:
ApicIdMap[Idx]=x; ... ->AllCpusInSync = true

And whenever the AP sees ->AllCpusInSync == True, it has to now see
ApicIdMap[Idx] == x.

Yes.
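
For concreteness, the pattern being converged on would look something like
this (names are taken from the thread; declarations and the exact
dereference chains are elided/assumed):

  //
  // BSP: publish the map entry, then set the flag -- in that order.
  //
  ApicIdMap[Idx] = x;
  MemoryFence ();        // compiler barrier; x86 TSO keeps the two stores ordered
  AllCpusInSync = TRUE;

  //
  // AP: spin until the flag is set, then read the map entry.
  //
  while (!AllCpusInSync) {
    CpuPause ();
  }
  MemoryFence ();        // compiler barrier; keeps the data read after the flag read
  x = ApicIdMap[Idx];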


Maybe the thing to do is to detail this barrier in a commit note/comment?

That would be nice.

And add the MemoryFence() but not the volatile.

Yes, that should work.

Thanks,
Laszlo


