https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315
Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linkw at gcc dot gnu.org

--- Comment #2 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #0)
> Created attachment 54202 [details]
> testcase
>
> At least the documentation should mention that if intentional.
>
> In the attached example, the function bar is compiled to
>
> bar:
>         .localentry     bar,1
>         mtctr 3
>         mr 12,3
>         bctr
>         .long 0
>         .byte 0,0,0,0,0,0,0,0
>
> i.e. it does not preserve r2 (it's compiled with -mcpu=power10). If the
> caller is not compiled with -mcpu=power10, it needs r2 preserved (bar has a
> localentry, so the nop in the caller stays a nop after linking).

My local 64bit-elfv2-abi spec v1.5 has the following description:

3.4.1. Symbol Values

"The values of these three most significant bits of the st_other field have
the following meanings:
...
1   The local and global entry points are the same, and r2 should be treated
    as caller-saved for local and global callers.
"
...
"The value of st_other is determined from the .localentry directive as
follows: If the .localentry value is 0, the value of st_other is 0. If the
.localentry value is 1, the value of st_other is 1. Otherwise, the value of
st_other is the logarithm (base 2) of the .localentry value."

The function bar has st_other value 1, so r2 should be treated as
caller-saved; that is why bar takes no action to preserve r2.

>
> I verified the testcase misbehaves on Compile Farm's gcc135: as it does not
> use any power10-specific instructions, it's runnable there.

I tried the attachment on a local machine (also ppc64le, p9) and noticed that
the linker had already done some fix-ups with a long_branch.bar stub:

Dump of assembler code for function main:
   0x0000000010000540 <+0>:     lis     r2,4098
   0x0000000010000544 <+4>:     addi    r2,r2,32512
   0x0000000010000548 <+8>:     mflr    r0
   0x000000001000054c <+12>:    nop
   0x0000000010000550 <+16>:    ld      r3,-32728(r2)
   0x0000000010000554 <+20>:    std     r0,16(r1)
   0x0000000010000558 <+24>:    stdu    r1,-32(r1)
   0x000000001000055c <+28>:    bl      0x10000510 <00000038.long_branch.bar>
=> 0x0000000010000560 <+32>:    ld      r2,24(r1)
   0x0000000010000564 <+36>:    addis   r3,r2,-2
   0x0000000010000568 <+40>:    addi    r3,r3,-30328

Dump of assembler code for function 00000038.long_branch.bar:
=> 0x0000000010000510 <+0>:     std     r2,24(r1)
   0x0000000010000514 <+4>:     b       0x10000710 <bar>

The stub saves r2 to the corresponding stack slot before branching to bar,
and the caller reloads it after the call, so the testcase runs as expected.

I'm not sure why it doesn't work on your side; maybe this inter-operation
requires support that only newer binutils has? Locally I use GNU ld 2.34 for
the final link (and 2.35, for power10 support, i.e. to generate bar.o).
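
For reference, the .localentry rule quoted from the spec above can be sketched
in a few lines of Python. This is only an illustration of the quoted text, not
code from binutils or GCC: the helper names are made up here, the
power-of-two check is my own assumption (so that the base-2 logarithm is an
integer), and the bit extraction assumes the usual 8-bit ELF st_other field,
whose three most significant bits are bits 7..5.

```python
def localentry_to_field(value: int) -> int:
    """Map a .localentry value to the 3-bit st_other field (hypothetical helper).

    Follows the rule quoted from section 3.4.1: 0 -> 0, 1 -> 1,
    otherwise log2(value).
    """
    if value < 0:
        raise ValueError("negative .localentry value")
    if value in (0, 1):
        return value
    # Assumption: other values are powers of two, so log2 is exact.
    if value & (value - 1):
        raise ValueError("expected a power of two")
    return value.bit_length() - 1


def field_from_st_other(st_other: int) -> int:
    """Extract the three most significant bits of an 8-bit st_other."""
    return (st_other >> 5) & 0x7


def r2_caller_saved(field: int) -> bool:
    """Field value 1: same local/global entry, r2 treated as caller-saved."""
    return field == 1


if __name__ == "__main__":
    # bar in the testcase uses ".localentry bar,1".
    field = localentry_to_field(1)
    print("field:", field, "r2 caller-saved:", r2_caller_saved(field))
```

With ".localentry bar,1" the field is 1, which is the case where callers, not
bar, are responsible for r2.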