[Bug target/98482] -mfentry creates invalid call for -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482

--- Comment #15 from CVS Commits ---
The master branch has been updated by H.J. Lu:

https://gcc.gnu.org/g:745d04e796c1a7ebcea0185d0742d58b0c0030ab

commit r11-6557-g745d04e796c1a7ebcea0185d0742d58b0c0030ab
Author: H.J. Lu
Date:   Fri Jan 8 08:41:38 2021 -0800

    x86-64: Require lp64 for PR target/98482 tests

    Require lp64 for PR target/98482 tests since -mcmodel=large isn't
    supported for x32.

            PR target/98482
            * gcc.target/i386/pr98482-1.c: Require lp64.
            * gcc.target/i386/pr98482-2.c: Likewise.
[Bug target/98482] -mfentry creates invalid call for -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482

--- Comment #14 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #10)
> If we are emitting for nested functions
>         pushq   %r10
> 1:      call    __fentry__
>         popq    %r10
> (is it ok to misalign the stack for __fentry__? but then even plain call
> __fentry__ actually misaligns it), then perhaps we can do similarly for the
> PIC case. But I wonder how does __fentry__ then find the caller if it can't
> rely on the return address being right above the return address to the
> function that called __fentry__ (apart from unwind info of course, but we
> don't really emit .cfi_* directives here either, do we?).

The generic part of the compiler pushes the static chain register for nested functions, so there is little we can do in the target part. If there is a problem with a misaligned stack, then I think __mcount_internal will have to be realigned, because calls to both mcount and __fentry__ can be misaligned.

I don't know what to do with the __fentry__ argument. Luckily, mcount finds its argument via the frame pointer, so it works there.
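Editorial aside: the alignment question can be checked with a little arithmetic. The following is only a sketch, assuming the usual x86-64 psABI convention that (%rsp + 8) is 16-byte aligned on entry to a function; the helper names are hypothetical and model nothing beyond %rsp modulo 16.

```c
/* %rsp modulo 16 after pushing `bytes` bytes (the stack grows downward). */
static unsigned rsp_mod16_after_push(unsigned rsp_mod16, unsigned bytes)
{
    return (rsp_mod16 - bytes) & 15u;
}

/* Assumed psABI convention: on entry to a function, %rsp mod 16 == 8. */
enum { ENTRY_RSP_MOD16 = 8 };

/* %rsp mod 16 seen on entry to __fentry__ after a plain `call __fentry__`. */
static unsigned fentry_rsp_plain(void)
{
    return rsp_mod16_after_push(ENTRY_RSP_MOD16, 8);   /* return address */
}

/* Same, for the nested-function sequence `pushq %r10; call __fentry__`. */
static unsigned fentry_rsp_with_push(void)
{
    unsigned after_push = rsp_mod16_after_push(ENTRY_RSP_MOD16, 8); /* pushq %r10 */
    return rsp_mod16_after_push(after_push, 8);                      /* return address */
}
```

Under that assumption, the plain call enters __fentry__ with %rsp mod 16 equal to 0 rather than the conventional 8, while the variant with the extra push enters with 8.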
[Bug target/98482] -mfentry creates invalid call for -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482 --- Comment #13 from H.J. Lu --- Fixed for GCC 11 so far. Please open a new GCC bug for mcount stack alignment.
[Bug target/98482] -mfentry creates invalid call for -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482

--- Comment #12 from CVS Commits ---
The master branch has been updated by H.J. Lu:

https://gcc.gnu.org/g:76be18f442948d1a4bc49a7d670b07097f9e5983

commit r11-6552-g76be18f442948d1a4bc49a7d670b07097f9e5983
Author: H.J. Lu
Date:   Fri Jan 8 05:20:19 2021 -0800

    x86-64: Use R10 and R11 for profiling large model with PIC

    For NO_PROFILE_COUNTERS targets, R11 is a scratch register. We can use
    R10 and R11 to call mcount in large model with PIC.

    gcc/
            PR target/98482
            * config/i386/i386.c (x86_function_profiler): Use R10 and R11
            to call mcount in large model with PIC for NO_PROFILE_COUNTERS
            targets.

    gcc/testsuite/
            PR target/98482
            * gcc.target/i386/pr98482-2.c: Updated.
[Bug target/98482] -mfentry creates invalid call for -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482

--- Comment #11 from CVS Commits ---
The master branch has been updated by H.J. Lu:

https://gcc.gnu.org/g:1b885264a48dcd71b7aeb26c0abeb91246724897

commit r11-6548-g1b885264a48dcd71b7aeb26c0abeb91246724897
Author: H.J. Lu
Date:   Thu Jan 7 14:27:49 2021 -0800

    x86-64: Use R10 for profiling large model

    R10 is caller-saved. Although it can be used as a static chain
    register, it is preserved when calling mcount for nested functions.
    Use R10 as a scratch register to call mcount in large model.

    gcc/
            PR target/98482
            * config/i386/i386.c (x86_function_profiler): Use R10 to call
            mcount in large model. Sorry for large model with PIC.

    gcc/testsuite/
            PR target/98482
            * gcc.target/i386/pr98482-1.c: New test.
            * gcc.target/i386/pr98482-2.c: Likewise.
[Bug target/98482] -mfentry creates invalid call for -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #10 from Jakub Jelinek ---
If we are emitting for nested functions
        pushq   %r10
1:      call    __fentry__
        popq    %r10
(is it ok to misalign the stack for __fentry__? but then even plain call __fentry__ actually misaligns it), then perhaps we can do similarly for the PIC case. But I wonder how does __fentry__ then find the caller if it can't rely on the return address being right above the return address to the function that called __fentry__ (apart from unwind info of course, but we don't really emit .cfi_* directives here either, do we?).
[Bug target/98482] -mfentry creates invalid call for -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482

--- Comment #9 from Uroš Bizjak ---
(In reply to Topi Miettinen from comment #8)
> I'm unfortunately ignorant of GCC internals and usage of %r10, but otherwise
> the patch looks good to me.
>
> For -mcmodel=large -fPIC, the call sequence probably needs to be similar to
> how other extern functions are called under those flags:
>
> .L2:
>         movabsq $_GLOBAL_OFFSET_TABLE_-.L2, %reg0
>         leaq    .L2(%rip), %reg1
>         movabsq $__fentry__@PLTOFF, %reg2
>         addq    %reg0, %reg1
>         addq    %reg1, %reg2
>         call    *%reg2

We are lucky to get even one temporary register (%r10), so perhaps the above could be implemented for NO_PROFILE_COUNTERS targets, where %r11 is also available.
[Bug target/98482] -mfentry creates invalid call for -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482

--- Comment #8 from Topi Miettinen ---
I'm unfortunately ignorant of GCC internals and usage of %r10, but otherwise the patch looks good to me.

For -mcmodel=large -fPIC, the call sequence probably needs to be similar to how other extern functions are called under those flags:

.L2:
        movabsq $_GLOBAL_OFFSET_TABLE_-.L2, %reg0
        leaq    .L2(%rip), %reg1
        movabsq $__fentry__@PLTOFF, %reg2
        addq    %reg0, %reg1
        addq    %reg1, %reg2
        call    *%reg2
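Editorial aside: the address arithmetic performed by that sequence can be modelled in a few lines of C. This is only an illustrative sketch; the function and parameter names are hypothetical, and it assumes @PLTOFF denotes the offset of the symbol's PLT entry from the GOT base, as in the x86-64 psABI.

```c
#include <stdint.h>

/* Models the large-model PIC call sequence quoted above.
   label_addr:      run-time address of .L2 (result of `leaq .L2(%rip)`)
   got_minus_label: link-time constant _GLOBAL_OFFSET_TABLE_ - .L2
   pltoff:          link-time constant __fentry__@PLTOFF (assumed to be the
                    PLT entry's offset from the GOT base)
   Returns the address that `call *%reg2` would jump to. */
static uint64_t large_pic_call_target(uint64_t label_addr,
                                      int64_t got_minus_label,
                                      int64_t pltoff)
{
    /* addq %reg0, %reg1: run-time GOT base */
    uint64_t got_base = label_addr + (uint64_t)got_minus_label;
    /* addq %reg1, %reg2: GOT base plus PLT offset */
    return got_base + (uint64_t)pltoff;
}
```

Both constants are full 64-bit immediates, so nothing in the computation depends on the target lying within a 32-bit displacement of the call site, but the sequence does need more than one scratch register.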
[Bug target/98482] -mfentry creates invalid call for -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482

--- Comment #7 from Hongtao.liu ---
(In reply to H.J. Lu from comment #6)
> A patch is posted at
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563033.html

Yes, %r10 is pushed before __fentry__.

cat test.c
void func(int (*param)(int));

void outer(int x) {
    int nested(int y) {
        // If x is not used somewhere in here,
        // then the function will be "lifted" into
        // a normal, non-nested function.
        return x + y;
    }
    func(nested);
}

With -O2 -pg -mfentry I got:

nested.0:
.LFB1:
        .cfi_startproc
        pushq   %r10
1:      call    __fentry__
        popq    %r10
        movl    (%r10), %eax
        addl    %edi, %eax
        ret
        .cfi_endproc
.LFE1:
        .size   nested.0, .-nested.0
        .p2align 4
        .globl  outer
        .type   outer, @function
outer:
.LFB0:
        .cfi_startproc
1:      call    __fentry__
        subq    $56, %rsp
        .cfi_def_cfa_offset 64
        leaq    64(%rsp), %rax
        movq    %rax, 32(%rsp)
        movl    $-17599, %eax
        movl    %edi, (%rsp)
        movw    %ax, 4(%rsp)
        movl    $-17847, %edx
        movl    $nested.0, %eax
        leaq    4(%rsp), %rdi
        movl    %eax, 6(%rsp)
        movw    %dx, 10(%rsp)
        movq    %rsp, 12(%rsp)
        movl    $-1864106167, 20(%rsp)
        call    func
        addq    $56, %rsp
        .cfi_def_cfa_offset 8
[Bug target/98482] -mfentry creates invalid call for -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482

H.J. Lu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|2021-01-04 00:00:00         |2021-01-07
             Status|UNCONFIRMED                 |NEW

--- Comment #6 from H.J. Lu ---
A patch is posted at

https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563033.html
[Bug target/98482] -mfentry creates invalid call for -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482

--- Comment #5 from Uroš Bizjak ---
(In reply to Topi Miettinen from comment #4)
> Sorry, I didn't check the ABI. It seems that %r11 and maybe %r10 should be
> usable:

%r11 is already used as PROFILE_COUNT_REGISTER for !NO_PROFILE_COUNTERS targets.
[Bug target/98482] -mfentry creates invalid call for -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482

--- Comment #4 from Topi Miettinen ---
Sorry, I didn't check the ABI. It seems that %r11 and maybe %r10 should be usable:

Figure 3.4: Register Usage

Register  Usage                                     Preserved across function calls
%r10      temporary register, used for passing a    No
          function’s static chain pointer
%r11      temporary register                        No

Otherwise, I suppose any register could be used if it's saved:

        pushq   %reg
        movabsq $__fentry__, %reg
        call    *%reg
        popq    %reg
[Bug target/98482] -mfentry creates invalid call for -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482

--- Comment #3 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #2)
> (In reply to Hongtao.liu from comment #1)
> > and by the time of output __fentry__ in gcc, register is already allocated,
> > is there any regs supposed to be safe in the entry of function? or we need
> > to spill reg to stack and load it back after call, it looks inefficient.
>
> You can use any callee-saved register here.

Eh, no - __fentry__ is called before the pushes.
[Bug target/98482] -mfentry creates invalid call for -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482

--- Comment #2 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #1)
> and by the time of output __fentry__ in gcc, register is already allocated,
> is there any regs supposed to be safe in the entry of function? or we need
> to spill reg to stack and load it back after call, it looks inefficient.

You can use any callee-saved register here.
[Bug target/98482] -mfentry creates invalid call for -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482

--- Comment #1 from Hongtao.liu ---
(In reply to Topi Miettinen from comment #0)
> GCC on x86_64 with `-mfentry` generates invalid code for `-mcmodel=large`.
> The call to `__fentry__` uses plain `call` instruction, but this can only
> address locations within 32 bit range while the target may be anywhere in
> the 64 bit range due to `-mcmodel=large`.
>
> The expected code would be something which loads a 64 bit value to a
> register and then uses register indirect call, so instead of
>         call    __fentry__
> there needs to be
>         movabsq $__fentry__, %rax
>         call    *%rax

According to the psABI, %rax is not safe:

Register  Usage                                          Preserved across function calls
%rax      temporary register; with variable arguments    No
          passes information about the number of vector
          registers used; 1st return register

Also, by the time GCC outputs the __fentry__ call, registers have already been allocated. Is any register guaranteed to be safe at function entry? Otherwise we would need to spill a register to the stack and reload it after the call, which looks inefficient.
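Editorial aside: the range limitation described in comment #0 can be illustrated with a small check. This is a sketch rather than anything GCC does; the function name is hypothetical, and it assumes the usual 5-byte direct call encoding (opcode E8 followed by a signed 32-bit displacement relative to the next instruction).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed length in bytes of a direct near call: opcode E8 + rel32. */
enum { CALL_REL32_LEN = 5 };

/* Returns true if a direct `call` located at call_site can reach target.
   The displacement is measured from the address of the next instruction
   and must fit in a signed 32-bit field. */
static bool call_rel32_reaches(uint64_t call_site, uint64_t target)
{
    /* Unsigned subtraction wraps modulo 2^64; the cast reinterprets the
       result as a two's-complement displacement (as GCC and Clang do). */
    int64_t disp = (int64_t)(target - (call_site + CALL_REL32_LEN));
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With -mcmodel=large nothing guarantees that __fentry__ lies within that window of the call site, which is why the report asks for a 64-bit address load followed by an indirect call.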