On 2021-02-23 9:18 a.m., Paolo Bonzini wrote:
On 23/02/21 18:06, Laszlo Ersek wrote:
On 02/23/21 08:45, Paolo Bonzini wrote:
On 22/02/21 15:53, Laszlo Ersek wrote:
+
+  if (mCpuHotEjectData != NULL) {
+    CPU_HOT_EJECT_HANDLER Handler;
+
+    Handler = mCpuHotEjectData->Handler;
This patch looks otherwise OK to me, but:

In patch v8 08/10, we have a ReleaseMemoryFence(). (For now, it is only
expressed as a MemoryFence() call; we'll make that more precise later.)

(1) I think that should be paired with an AcquireMemoryFence() call,
just before loading "mCpuHotEjectData->Handler" above -- for now, also
expressed as a MemoryFence() call only.

In Linux terms, there is a control dependency here.  However, it should
at least be a separate statement to load mCpuHotEjectData (which from my
EDK2 reminiscences should be a global) into a local variable.  So

   EjectData = mCPUHotEjectData;
   // Optional AcquireMemoryFence here
   if (EjectData != NULL) {
     CPU_HOT_EJECT_HANDLER Handler;

     Handler = EjectData->Handler;
     if (Handler != NULL) {
       Handler (CpuIndex);
     }
   }

Yes, "mCPUHotEjectData" is a global.

"mCpuHotEjectData" itself is set up on the BSP (from the entry point
function of the PiSmmCpuSmmDxe driver), before any other APs have a
chance to execute any SMM-related code at all. Furthermore, once set up,
mCpuHotEjectData never changes -- it remains set to a particular
non-NULL value forever, or it remains NULL forever. (The latter case
applies when the possible CPU count is 1; IOW, then there is no AP at all.)

Ok, that's what I was missing.  However, your code below has *two* loads of 
mCpuHotEjectData and the fence would have to go after the second (between the load of 
mCpuHotEjectData and the load of the Handler field).  Therefore I would still use a local 
variable even if you decide to put the fence inside the "if", which I agree is 
the clearest.

Sorry, I'm missing something here. As Laszlo said given that mCpuHotEjectData
does not change after being set, so why would it be a problem in referencing it
twice?

The generated code looks like this (load for mCpuHotEjectData at 0xf54b and
then the dependent mCpuHotEjectData->Handler load on 0xf645):

      # 17d60 <mCpuHotEjectData>
f54b:       48 8b 05 0e 88 00 00    mov    0x880e(%rip),%rax
f54e: R_X86_64_PC32     .data+0x1d5c
f552:       48 85 c0                test   %rax,%rax
f555:       0f 85 ea 00 00 00       jne    f645 <SmiRendezvous+0x17e>

      # Handler = mCpuHotEjectData->Handler
f645:       48 8b 40 08             mov    0x8(%rax),%rax
f649:       48 85 c0                test   %rax,%rax
f64c:       74 05                   je     f653 <SmiRendezvous+0x18c>
f64e:       4c 89 e1                mov    %r12,%rcx
f651:       ff d0                   callq  *%rax

In the worst case, however, maybe it looks like this (two loads for
mCpuHotEjectData and then the dependent load):

      # 17d60 <mCpuHotEjectData>
f54b:       48 8b 05 0e 88 00 00    mov    0x880e(%rip),%rax
f54e: R_X86_64_PC32     .data+0x1d5c
f552:       48 85 c0                test   %rax,%rax
f555:       0f 85 ea 00 00 00       jne    f645 <SmiRendezvous+0x17e>

      # 17d60 <mCpuHotEjectData>
f645:       48 8b 05 0e 88 00 00    mov    0x880e(%rip),%rax
  +3: R_X86_64_PC32     .data+0x1d5c

      # Handler = mCpuHotEjectData->Handler
  +7:       48 8b 40 08             mov    0x8(%rax),%rax
 +11:       48 85 c0                test   %rax,%rax
 +14:       74 05                   je     f653 <SmiRendezvous+0x18c>
 +16:       4c 89 e1                mov    %r12,%rcx
 +19:       ff d0                   callq  *%rax

As you and Laszlo say -- we do need an acquire fence before this line
(which corresponds to the release fence in UnplugCpus(), patch 8
and the release fence in EjectCpu() in patch 9).

         # Handler = mCpuHotEjectData->Handler
      48 8b 40 08             mov    0x8(%rax),%rax

A local variable for mCpuHotEjectData, would be nice to have but I'm
not sure it is needed for correctness.

Ankur

Paolo

Because of that, I thought that the first comparison (mCpuHotEjectData
!= NULL) would not need any fence -- I thought it was similar to a
userspace program that (a) set a global variable in the "main" thread,
before calling pthread_create(), (b) treated the global variable as a
constant, ever after (meaning all threads).

However, mCpuHotEjectData->Handler is changed regularly (modified by the
BSP, and read "later" by all processors). That's why I thought the
acquire fence was needed in the following location:

   if (mCpuHotEjectData != NULL) {
     CPU_HOT_EJECT_HANDLER Handler;

     //
     // HERE -- AcquireMemoryFence()
     //
     Handler = mCpuHotEjectData->Handler;
     if (Handler != NULL) {
       Handler (CpuIndex);
     }
   }

Thanks!
Laszlo




-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#72114): https://edk2.groups.io/g/devel/message/72114
Mute This Topic: https://groups.io/mt/80819862/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Reply via email to