Ray,

If you choose the assembly solution, you should adjust the stack so that the 
function follows the calling convention. Otherwise, it may break call stack 
tracing in some debugger tools.

Jeff


fanjianf...@byosoft.com.cn
 
From: Ni, Ray
Date: 2023-05-19 10:53
To: Andrew (EFI) Fish; devel@edk2.groups.io; Kinney, Michael D
CC: Rebecca Cran
Subject: Re: [edk2-devel] CpuDeadLoop() is optimized by compiler
I think all the options we considered are workarounds. They might break again 
if the compiler gets “cleverer” in the future, unless some Cxx spec clearly 
guarantees the behavior.
 
I like Mike’s idea to use an assembly implementation for CpuDeadLoop. The 
assembly can simply be “jmp $” followed by “ret”.
 
I didn’t find a dead-loop intrinsic function in MSVC.
Any better idea?
 
Thanks,
Ray
 
From: Andrew (EFI) Fish <af...@apple.com> 
Sent: Friday, May 19, 2023 8:42 AM
To: devel@edk2.groups.io; Kinney, Michael D <michael.d.kin...@intel.com>
Cc: Ni, Ray <ray...@intel.com>; Rebecca Cran <rebe...@bsdio.com>
Subject: Re: [edk2-devel] CpuDeadLoop() is optimized by compiler
 
Mike,
 
Sorry, static was just to scope the name to the file since it is a lib, not to 
make it work.
 
That is a cool site. I learned about it while complaining to the compiler team 
on our internal clang Slack channel; they use it to answer my questions.
 
Thanks,
 
Andrew Fish


On May 18, 2023, at 2:42 PM, Michael D Kinney <michael.d.kin...@intel.com> 
wrote:
 
Using that tool, the following fragment seems to generate the right code.  
Volatile is required.  Static is optional.
 
static volatile int  mDeadLoopCount = 0;
 
void
CpuDeadLoop(
  void
  )
{
  while (mDeadLoopCount == 0);
}
 
 
GCC
===
CpuDeadLoop():
.L2:
        mov     eax, DWORD PTR mDeadLoopCount[rip]
        test    eax, eax
        je      .L2
        ret
 
 
CLANG
=====
CpuDeadLoop():                       # @CpuDeadLoop()
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        cmp     dword ptr [rip + _ZL14mDeadLoopCount], 0
        je      .LBB0_1
        ret
 
 
Mike
 
 
From: Andrew (EFI) Fish <af...@apple.com> 
Sent: Thursday, May 18, 2023 1:45 PM
To: edk2-devel-groups-io <devel@edk2.groups.io>; Andrew Fish <af...@apple.com>
Cc: Kinney, Michael D <michael.d.kin...@intel.com>; Ni, Ray <ray...@intel.com>; 
Rebecca Cran <rebe...@bsdio.com>
Subject: Re: [edk2-devel] CpuDeadLoop() is optimized by compiler
 
Whoops wrong compiler. Here is an update. I added the flags so this one 
reproduces the issue.
 
Compiler Explorer
godbolt.org
 
Thanks,
 
Andrew Fish



On May 18, 2023, at 11:45 AM, Andrew Fish via 
groups.io <afish=apple....@groups.io> wrote:
 
Mike,
 
This is a good way to play around with fixes, and to report bugs. You can see 
the assembly output for different compilers with different flags. 
 
Compiler Explorer
godbolt.org
 
Sorry I’m traveling and in Cupertino with lots of meetings so I did not have 
time to adjust the compiler flags….
 
Thanks,
 
Andrew Fish



On May 18, 2023, at 10:24 AM, Andrew (EFI) Fish <af...@apple.com> wrote:
 
Mike,
 
I guess my other question is: if this turns out to be a compiler bug, should we 
scope the change to the broken toolchain? I’m not sure what the right answer is 
for that, but I want to ask the question. 
 
Thanks,
 
Andrew Fish



On May 18, 2023, at 10:19 AM, Michael D Kinney <michael.d.kin...@intel.com> 
wrote:
 
Andrew,
 
This might work for XIP.  Set a non-const global to the initial value that 
keeps the code in the dead loop.
 
UINTN  mDeadLoopCount = 0;
 
VOID
CpuDeadLoop(
  VOID
  ) 
{
  while (mDeadLoopCount == 0) {
      CpuPause();
  }
}
 
When the dead loop is entered, a developer cannot change the value of 
mDeadLoopCount, but can use a debugger to force an exit from the loop and 
return from the function.
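
A minimal host-side sketch of the loop shape above, for illustration only and 
not a proposed patch. The UINTN typedef, HostCpuPause(), HostCpuDeadLoop() and 
mPauseCalls are hypothetical stand-ins that are not part of the thread. 
HostCpuPause() plays the role of a debugger by writing mDeadLoopCount after a 
few iterations, which assumes the variable lives in writable memory (so it 
does not model the XIP case). The global is also marked volatile here so that 
the compiler has to reload it on every iteration.

```c
#include <stdint.h>

typedef uintptr_t UINTN;                 /* host stand-in for the EDK II type */

static volatile UINTN  mDeadLoopCount = 0;
static unsigned        mPauseCalls    = 0;  /* hypothetical, sketch only */

/* Hypothetical stand-in for CpuPause(): on the third call it writes the
   global, the way a developer might from a debugger in writable memory. */
static void
HostCpuPause (
  void
  )
{
  if (++mPauseCalls == 3) {
    mDeadLoopCount = 1;
  }
}

/* Same loop shape as above. Returns the number of pause calls made so far
   once the loop has been left. */
unsigned
HostCpuDeadLoop (
  void
  )
{
  while (mDeadLoopCount == 0) {
    HostCpuPause ();
  }

  return mPauseCalls;
}
```

Because the exit condition depends on a volatile read, the loop leaves as soon 
as something outside the visible control flow changes the value.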
 
Mike
 
 
From: Andrew (EFI) Fish <af...@apple.com> 
Sent: Thursday, May 18, 2023 10:09 AM
To: Kinney, Michael D <michael.d.kin...@intel.com>
Cc: edk2-devel-groups-io <devel@edk2.groups.io>; Ni, Ray <ray...@intel.com>; 
Rebecca Cran <rebe...@bsdio.com>
Subject: Re: [edk2-devel] CpuDeadLoop() is optimized by compiler
 
Mike,
 
Good point, that is why we are using the stack ….
 
The only other thing I can think of is to pass the address of Index to some 
inline assembler, or an asm no op function, to give it a side effect the 
compiler can’t resolve. 
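
One way to read this suggestion, as a sketch under assumptions rather than a 
tested fix: GCC and Clang accept an empty extended-asm statement that takes 
the address as an input operand and declares a "memory" clobber, so the 
compiler has to assume the pointed-to object may have changed. OpaqueTouch() 
and SpinWhileZero() are hypothetical names, and the MaxSpins bound exists only 
so the sketch terminates on a host; a real dead loop would have no bound. MSVC 
does not support inline assembly for x64, so there the same idea would need a 
separate assembly function.

```c
#include <stdint.h>

/* Hypothetical helper: an empty asm statement that receives the address and
   clobbers memory, so the compiler cannot prove the object is unchanged.
   GCC/Clang extended-asm syntax; not accepted by MSVC. */
static inline void
OpaqueTouch (
  volatile void  *Address
  )
{
  __asm__ __volatile__ ("" : : "r" (Address) : "memory");
}

/* Spins while the local Index is zero, at most MaxSpins times.
   Returns the number of iterations performed. */
uintptr_t
SpinWhileZero (
  uintptr_t  InitialValue,
  uintptr_t  MaxSpins
  )
{
  volatile uintptr_t  Index = InitialValue;
  uintptr_t           Spins = 0;

  while ((Index == 0) && (Spins < MaxSpins)) {
    OpaqueTouch (&Index);
    Spins++;
  }

  return Spins;
}
```

Whether a given compiler then keeps the exit path after the loop would still 
need to be checked in the generated assembly.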
 
Thanks,
 
Andrew Fish




On May 18, 2023, at 10:05 AM, Kinney, Michael D <michael.d.kin...@intel.com> 
wrote:
 
Static global will not work for XIP
 
Mike
 
From: Andrew (EFI) Fish <af...@apple.com> 
Sent: Thursday, May 18, 2023 9:49 AM
To: edk2-devel-groups-io <devel@edk2.groups.io>; Kinney, Michael D 
<michael.d.kin...@intel.com>
Cc: Ni, Ray <ray...@intel.com>; Rebecca Cran <rebe...@bsdio.com>
Subject: Re: [edk2-devel] CpuDeadLoop() is optimized by compiler
 
Mike,
 
I pinged some compiler experts to see if our code is correct, or if the 
compiler has an issue. It seems to be trending toward a compiler issue right 
now, but I’ve NOT gotten feedback from anyone on the spec committee yet. 
 
If we move Index to a static global that would likely work around the compiler 
issue.
 
Thanks,
 
Andrew Fish





On May 18, 2023, at 8:36 AM, Michael D Kinney <michael.d.kin...@intel.com> 
wrote:
 
Hi Ray,
 
So the code generated does deadloop, but is just not easy to resume from as we 
have been able to do in the past.
 
We use CpuDeadLoop() for 2 purposes.  One is a terminal condition with no 
reason to ever continue.
 
The 2nd is a debug aid for developers to halt the system at a specific 
location and then continue from that point, usually with a debugger, to step 
through code to an area to evaluate unexpected behavior.
 
We may have to do a NASM implementation of CpuDeadLoop() to make sure it meets 
both use cases.
 
Mike
 
From: Ni, Ray <ray...@intel.com> 
Sent: Thursday, May 18, 2023 3:00 AM
To: devel@edk2.groups.io
Cc: Kinney, Michael D <michael.d.kin...@intel.com>; Rebecca Cran 
<rebe...@bsdio.com>; Ni, Ray <ray...@intel.com>
Subject: CpuDeadLoop() is optimized by compiler
 
Hi,
Starting from a certain version of the Visual Studio C compiler (I don’t have 
the exact version; I am using VS2019), CpuDeadLoop is optimized quite well by 
the compiler.
 
The optimization is so “good” that it becomes harder for developers to break 
out of the deadloop.
 
I copied the assembly instructions below for your reference.
The compiler does not generate instructions that jump out of the loop when 
Index is not zero.
So in order to break out of the loop, developers need to:
1. Manually adjust rsp by adding 40.
2. Manually “ret”.
 
I am not sure if anyone is interested in rewriting this function so that the 
compiler can be “fooled” again.
Thanks,
Ray
 
=======================
; Function compile flags: /Ogspy
; File e:\work\edk2\MdePkg\Library\BaseLib\CpuDeadLoop.c
;              COMDAT CpuDeadLoop
_TEXT    SEGMENT
Index$ = 48
CpuDeadLoop PROC                                        ; COMDAT
 
; 26   : {
 
$LN12:
  00000  48 83 ec 28         sub        rsp, 40            ; 00000028H
 
; 27   :   volatile UINTN  Index;
; 28   : 
; 29   :   for (Index = 0; Index == 0;) {
 
  00004  48 c7 44 24 30
               00 00 00 00        mov      QWORD PTR Index$[rsp], 0
$LN10@CpuDeadLoo:
 
; 30   :     CpuPause ();
 
  0000d  48 8b 44 24 30   mov      rax, QWORD PTR Index$[rsp]
  00012  e8 00 00 00 00   call        CpuPause
  00017  eb f4                     jmp       SHORT $LN10@CpuDeadLoo
CpuDeadLoop ENDP
_TEXT    ENDS
END
 
 
 
 
 
 
 




