https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952
Bug ID: 102952 Summary: New code-gen options for retpolines and straight line speculation Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: andrew.cooper3 at citrix dot com Target Milestone: --- Hello [FYI, this is being cross-requested of Clang too] Linux and other kernel level software makes use of -mindirect-branch=thunk-extern to be able to alter the handling of indirect branches at boot. It turns out to be advantageous to inline the thunks when retpoline is not in use. https://lore.kernel.org/lkml/20211026120132.613201...@infradead.org/ is some infrastructure to make this work. In some cases, we want to be able to inline an `lfence; jmp *%reg` thunk. This is fine for the low 8 registers, but not fine for %r{8..15} where the REX prefix pushes the replacement size to being 6 bytes. It would be very useful to have a code-gen option to write out `call %cs:__x86_indirect_thunk_r{8..15}` where the redundant %cs prefix will increase the instruction length to 6, allowing the non-retpoline form to be inlined. Relatedly, x86 straight line speculation has been discussed before, but without any action taken. It would be helpful to have a code gen option which would emit `int3` following any `ret` instruction, and any indirect jump, as neither of these two cases have following architectural execution. The reason these two are related is that if both options are in use, we want an extra byte of replacement space to be able to inline `lfence; jmp *%reg; int3`. Third (and possibly only for future optimisations), Clang has been observed to spot conditional tail calls as `Jcc __x86_indirect_thunk_*`. This is a 6 byte source size, but needs up to 9 bytes of space for inlining including an `int3` for straight line speculation reasons (See https://lore.kernel.org/lkml/20211026120310.359986...@infradead.org/ for full details). It might be enough to simply prohibit an optimisation like this when trying to pad retpolines for inlineability.