https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125893
--- Comment #2 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by H.J. Lu <[email protected]>:

https://gcc.gnu.org/g:1f774d902f1ec9ae6a487e00ba49514d3b37057f

commit r17-1766-g1f774d902f1ec9ae6a487e00ba49514d3b37057f
Author: H.J. Lu <[email protected]>
Date:   Sun Jun 21 07:13:38 2026 +0800

    x86: Use previous scratch register in LCP stall peepholes

    Since LCP stall peepholes are added after register allocation, each
    peephole may use a different scratch register.  For input:

    extern void bar (void);

    void
    foo (short *dst)
    {
      dst[0] = 3;
      asm volatile ("" : : : "memory");
      dst[2] = 3;
      bar ();
      dst[1] = 3;
      asm volatile ("" : : : "memory");
      dst[4] = 3;
    }

    with LCP stall peepholes, GCC generates:

            movl    $3, %eax
            pushq   %rbx
            movq    %rdi, %rbx
            movw    %ax, (%rdi)
            movl    $3, %edx
            movw    %dx, 4(%rdi)
            call    bar
            movl    $3, %ecx
            movw    %cx, 2(%rbx)
            movl    $3, %esi
            movw    %si, 8(%rbx)
            popq    %rbx

    using 4 different scratch registers vs without LCP stall peepholes:

            pushq   %rbx
            movq    %rdi, %rbx
            movw    $3, (%rdi)
            movw    $3, 4(%rdi)
            call    bar
            movw    $3, 2(%rbx)
            movw    $3, 8(%rbx)
            popq    %rbx

    Add ix86_output_lcp_stall_peephole to generate LCP stall peepholes with
    the previous scratch register:

    1. Scan backward for the previous scratch register definition with the
    same immediate operand in the same basic block.
    2. The previous scratch register is unusable if it is set between the
    previous scratch register definition and the current instruction.
    3. If a usable previous scratch register is found, ignore the allocated
    scratch register and use the previous scratch register.  Otherwise, use
    the allocated scratch register.

    so that the same scratch register can be reused if possible:

            movl    $3, %eax
            pushq   %rbx
            movq    %rdi, %rbx
            movw    %ax, (%rdi)
            movw    %ax, 4(%rdi)
            call    bar
            movl    $3, %ecx
            movw    %cx, 2(%rbx)
            movw    %cx, 8(%rbx)
            popq    %rbx

    I backported this patch to GCC 16:

    1. When bootstrapping GCC 16 with only C and C++ enabled, this
    optimization triggers 54 times.  No regressions.
    2. When building glibc 2.44, this optimization triggers 33 times.  No
    regressions.
    3. When building Linux kernel 7.1.1, this optimization triggers 2099
    times.  Kernel boots correctly.

    Tested on Linux/x86-64 and Linux/i686.

    gcc/

            PR target/125893
            * config/i386/i386-expand.cc (ix86_expand_lcp_stall_peephole):
            New.
            * config/i386/i386-protos.h (ix86_expand_lcp_stall_peephole):
            Likewise.
            * config/i386/i386.md (TARGET_LCP_STALL peepholes): Call
            ix86_expand_lcp_stall_peephole.

    gcc/testsuite/

            PR target/125893
            * gcc.target/i386/pr125893-1.c: New test.
            * gcc.target/i386/pr125893-2.c: Likewise.
            * gcc.target/i386/pr125893-3.c: Likewise.
            * gcc.target/i386/pr125893-4.c: Likewise.
            * gcc.target/i386/pr125893-5.c: Likewise.
            * gcc.target/i386/pr125893-6.c: Likewise.
            * gcc.target/i386/pr125893-7.c: Likewise.
            * gcc.target/i386/pr125893-8.c: Likewise.
            * gcc.target/i386/pr125893-9.c: Likewise.
            * gcc.target/i386/pr125893-10.c: Likewise.
            * gcc.target/i386/pr125893-11.c: Likewise.

    Signed-off-by: H.J. Lu <[email protected]>
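
For readers who want to see the three-step backward scan from the commit message in isolation, here is a minimal sketch on a toy instruction model. It is not the code in i386-expand.cc: the real function operates on RTL insns, and every name below (toy_reg, toy_insn, toy_pick_scratch) is hypothetical. The model assumes each instruction records a bitmask of the registers it writes, and that a call writes every call-clobbered register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register numbering for the toy model only.  */
enum toy_reg { TOY_AX, TOY_CX, TOY_DX, TOY_SI, TOY_NREGS };

/* One instruction of a basic block in the toy model.  */
struct toy_insn
{
  bool is_load_imm;   /* True for "movl $imm, %dest".  */
  enum toy_reg dest;  /* Destination, meaningful only if is_load_imm.  */
  long imm;           /* Immediate, meaningful only if is_load_imm.  */
  unsigned writes;    /* Bitmask of all registers this insn writes.  */
};

/* Choose the scratch register for a 16-bit store of IMM that would be
   emitted at index CUR of the basic block BB.  ALLOCATED is the register
   the allocator handed to the peephole.  */
enum toy_reg
toy_pick_scratch (const struct toy_insn *bb, size_t cur, long imm,
                  enum toy_reg allocated)
{
  unsigned written = 0;  /* Registers written after the candidate.  */

  assert (allocated < TOY_NREGS);

  /* Step 1: scan backward for a load of the same immediate.  */
  for (size_t i = cur; i-- > 0;)
    {
      const struct toy_insn *insn = &bb[i];

      if (insn->is_load_imm && insn->imm == imm)
        {
          /* Step 2: unusable if written between definition and CUR.  */
          if (written & (1u << insn->dest))
            return allocated;
          /* Step 3: reuse the previous scratch register.  */
          return insn->dest;
        }

      written |= insn->writes;
    }

  /* No earlier definition in this block: keep the allocated register.  */
  return allocated;
}
```

Applied to the example in the commit message, the store after the first `movl $3, %eax` would pick `%ax` again, while the first store after `call bar` would fall back to the allocated register, because the call counts as a write to `%eax` in this model.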
