Re: Calling convention for Intel APX extension
Hello,

On Sun, 30 Jul 2023, Thomas Koenig wrote:

> > I've recently submitted a patch that adds some attributes that basically
> > say "these-and-those regs aren't clobbered by this function" (I did them
> > for not clobbered xmm8-15). Something similar could be used for the new
> > GPRs as well. Then it would be a matter of ensuring that the interesting
> > functions are marked with that attributes (and then of course do the
> > necessary call-save/restore).
>
> Interesting.
>
> Taking this a bit further: The compiler knows which registers it used
> (and which ones might get clobbered by called functions) and could
> generate such information automatically and embed it in the assembly
> file, and the assembler could, in turn, put it into the object file.
>
> A linker (or LTO) could then check this and elide save/restore pairs
> where they are not needed.

LTO with interprocedural register allocation (-fipa-ra) already does this. Doing it without LTO is possible to implement in the way you suggest, but it is very hard to make effective: the saving/restoring of registers might be scheduled in non-trivial ways, and getting rid of instruction bytes within function bodies at link time is fairly non-trivial. It needs excessive meta-information to be effective (e.g. all jumps that potentially cross the removed bytes must get relocations). So you either limit the ways that prologues and epilogues are emitted to help the linker (thereby limiting effectiveness of unchanged xlogues) or you emit more meta-info than the instruction bytes themselves, bloating object files for dubious outcomes.

> It would probably be impossible for calls into shared libraries, since
> the saved registers might change from version to version.

The above scheme could be extended to also allow introducing stubs (wrappers) for shared lib functions, handled by the dynamic loader. But then you would get hard problems to solve related to function addresses and their uniqueness.
> Still, potential gains could be substantial, and it could have an
> effect which could come close to inlining, while actually saving space
> instead of using extra.
>
> Comments?

I think it would be an interesting experiment to implement such a scheme fully just to see how effective it would be in practice. But it's very non-trivial to do, and my guess is that it won't be super effective. So, could be a typical research paper topic :-)

At least outside of extreme cases like the SSE regs, where none are callee-saved, and which can be handled in a different way like the explicit attributes.

Ciao,
Michael.
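To illustrate the -fipa-ra case mentioned in this message, here is a minimal sketch in C. The function names `scale` and `combine` are made up for the example. It assumes GCC at -O2 or higher, where -fipa-ra is normally enabled and applies only when the callee is in the same compilation unit and compiled before its caller. Whether a save/restore pair is actually avoided depends on the target and compiler version, so the comments describe what the compiler is allowed to do, not what it is guaranteed to do.

```c
/* Hypothetical example; inspect the result with: gcc -O2 -fipa-ra -S file.c */

/* Kept out of line so the call stays visible in the generated assembly.
   Being static and defined first, its register usage can be known to the
   compiler when the caller below is compiled. */
static __attribute__((noinline)) long scale(long x)
{
    return x * 3 + 1;
}

/* 'a' and 'b' are live across the call to scale().  With -fipa-ra the
   compiler may keep them in call-clobbered registers that scale() is known
   not to use, instead of saving and restoring them around the call. */
long combine(long a, long b)
{
    long s = scale(a);
    return s + a + b;
}
```

Comparing the assembly produced with -fipa-ra and with -fno-ipa-ra is one way to see whether the option made a difference for a given function.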
Re: Calling convention for Intel APX extension
On 27.07.23 at 15:43, Michael Matz wrote:

> I've recently submitted a patch that adds some attributes that basically
> say "these-and-those regs aren't clobbered by this function" (I did them
> for not clobbered xmm8-15). Something similar could be used for the new
> GPRs as well. Then it would be a matter of ensuring that the interesting
> functions are marked with that attributes (and then of course do the
> necessary call-save/restore).

Interesting.

Taking this a bit further: The compiler knows which registers it used (and which ones might get clobbered by called functions) and could generate such information automatically and embed it in the assembly file, and the assembler could, in turn, put it into the object file.

A linker (or LTO) could then check this and elide save/restore pairs where they are not needed.

Now, I know that removing instructions during linking is a dangerous business, and is a source of hard-to-find and rare bugs (the worst kind) if not done right; a bullet-proof algorithm would be needed for that.

It would probably be impossible for calls into shared libraries, since the saved registers might change from version to version.

It also would probably not work for virtual member functions which are not found by devirtualization.

Still, potential gains could be substantial, and it could have an effect which could come close to inlining, while actually saving space instead of using extra.

Comments?
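The check proposed in this message can be sketched as plain set arithmetic on register masks. Everything here is hypothetical: the `regmask_t` type, the bit numbering, and the function names are invented for illustration and do not correspond to any existing object-file format or linker feature.

```c
#include <stdint.h>

/* Hypothetical per-function metadata: one bit per register that a function
   may clobber, directly or through the functions it calls.  The bit
   numbering (bit n for register number n) is an assumption. */
typedef uint32_t regmask_t;

#define REG_BIT(n) ((regmask_t)1u << (n))

/* Registers that really need a save/restore pair around a call: those that
   are live across the call and that the callee may clobber. */
regmask_t saves_needed(regmask_t live_across_call, regmask_t callee_clobbers)
{
    return live_across_call & callee_clobbers;
}

/* Save/restore pairs that were emitted under the conservative calling
   convention but could be elided once the callee's real clobber set is
   known. */
regmask_t saves_elidable(regmask_t live_across_call,
                         regmask_t assumed_clobbers,
                         regmask_t callee_clobbers)
{
    return live_across_call & assumed_clobbers & ~callee_clobbers;
}
```

For example, if registers 16 and 17 are live across a call and assumed clobbered, but the callee's recorded mask contains only register 17, then only the pair for register 16 is a candidate for removal. Computing the mask is the easy part; the sketch says nothing about actually deleting the instruction bytes.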
Re: Calling convention for Intel APX extension
Hey,

On Thu, 27 Jul 2023, Thomas Koenig via Gcc wrote:

> Intel recommends to have the new registers as caller-saved for
> compatibility with current calling conventions. If I understand this
> correctly, this is required for exception unwinding, but not if the
> function called is __attribute__((nothrow)).

That's not the full truth. It's not (only) exception handling but also context switching via setjmp/longjmp and make/get/setcontext. The data structures for that are part of the ABI unfortunately, and can't be assumed to be extensible (as Florian says, for glibc there may be hacks (or maybe not) on x86-64. Some other archs implemented extensibility from the outset). So all registers (and register parts!) added after the initial psABI is defined usually _have_ to be call-clobbered.

> Since Fortran tends to use a lot of registers for its array descriptors,
> and also tends to call nothrow functions (all Fortran functions, and
> all Fortran intrinsics, such as sin/cos/etc) a lot, it could profit from
> making some of the new registers callee-saved, to save some spills
> at function calls.

I've recently submitted a patch that adds some attributes that basically say "these-and-those regs aren't clobbered by this function" (I did them for not clobbered xmm8-15). Something similar could be used for the new GPRs as well. Then it would be a matter of ensuring that the interesting functions are marked with that attributes (and then of course do the necessary call-save/restore).

Ciao,
Michael.
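One way to see why the setjmp/longjmp data structures count as part of the ABI is a small C sketch. The `struct task` type and the helper names are hypothetical. The point is that `sizeof(jmp_buf)` gets compiled into the layout of application types, so a later libc cannot simply enlarge `jmp_buf` to hold additional callee-saved registers without breaking binaries that were already built.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical application type that embeds a jmp_buf.  The offset of
   'state' is fixed when the application is compiled and depends on the
   size of jmp_buf at that time. */
struct task {
    int     id;
    jmp_buf ctx;    /* saved register context, written by setjmp() */
    int     state;  /* lives directly after the context */
};

/* First byte past the embedded context. */
size_t task_ctx_end(void)
{
    return offsetof(struct task, ctx) + sizeof(jmp_buf);
}

/* Where the application expects 'state' to be. */
size_t task_state_offset(void)
{
    return offsetof(struct task, state);
}
```

If a newer library's setjmp() wrote more bytes than the `sizeof(jmp_buf)` the application was built with, it would overwrite `state` in an old binary. That is the constraint that pushes registers added after the initial ABI towards being call-clobbered.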
Re: Calling convention for Intel APX extension
* Thomas Koenig via Gcc:

> Intel recommends to have the new registers as caller-saved for
> compatibility with current calling conventions. If I understand this
> correctly, this is required for exception unwinding, but not if the
> function called is __attribute__((nothrow)).

Nothrow functions still can call longjmp, so that's probably not the right discriminator.

For glibc on Linux, we have some extra space in jmpbuf in the signal mask (we have 1024 bits, but the kernel can use just 64, some of those have already been repurposed), but it's going to be tough for cancellation support because of a historic microoptimization there.

Thanks,
Florian
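The point about nothrow and longjmp can be shown with a short C sketch. The function names `bail_out` and `run` are hypothetical, and the example assumes a compiler that accepts GCC-style attributes (GCC or Clang).

```c
#include <setjmp.h>

static jmp_buf env;

/* Declared nothrow, yet it leaves through longjmp() instead of returning
   normally.  The attribute only concerns exceptions. */
__attribute__((nothrow)) static void bail_out(int code)
{
    longjmp(env, code);
}

/* Returns the value delivered by longjmp(), or a negative number if
   control flow did not go through the non-local jump as expected. */
int run(void)
{
    switch (setjmp(env)) {
    case 0:             /* direct return from setjmp() */
        bail_out(42);
        return -1;      /* not reached: bail_out() never returns here */
    case 42:            /* second return, caused by longjmp() */
        return 42;
    default:
        return -2;
    }
}
```

A caller that kept a value in a register across the call to `bail_out` would resume in `run` with whatever register state setjmp() recorded, so knowing that a callee is nothrow says nothing about which registers survive such a jump.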