https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100955
Bug ID: 100955
Summary: varargs causes extra stores to/from stack
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64-linux-gnu

Take:
#include <stdarg.h>
int vfprintf1 (const char *, va_list);
int
__fprintf1 (const char *format, ...)
{
  va_list arg;
  int done;
  va_start (arg, format);
  done = vfprintf1 (format, arg);
  va_end (arg);
  return done;
}
---- CUT ---
Currently at -O2 we produce:
        stp     x29, x30, [sp, -272]!
        mov     w9, -56
        mov     w8, -128
        mov     x29, sp
        add     x10, sp, 208
        add     x11, sp, 272
        stp     x11, x11, [sp, 48]
        str     x10, [sp, 64]
        stp     w9, w8, [sp, 72]
        str     q0, [sp, 80]
        ldp     q0, q16, [sp, 48]
        str     q1, [sp, 96]
        str     q2, [sp, 112]
        stp     q0, q16, [sp, 16]
        str     q3, [sp, 128]
        str     q4, [sp, 144]
        str     q5, [sp, 160]
        str     q6, [sp, 176]
        str     q7, [sp, 192]
        stp     x1, x2, [sp, 216]
        add     x1, sp, 16
        stp     x3, x4, [sp, 232]
        stp     x5, x6, [sp, 248]
        str     x7, [sp, 264]
        bl      vfprintf1

Notice how we store to arg (the va_list) at [sp, 48] and then copy it to
[sp, 16] (the ldp/stp of q0, q16) to pass it to vfprintf1.
This is due to:
  __builtin_va_start (&arg, 0);

If we had:
  arg = __builtin_va_start_internal();
and expanded that instead, no copy would be needed.
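
For readers decoding the listing above, here is a minimal sketch of the AArch64 va_list record as I understand it from the AAPCS64 (three pointers plus two int offsets). The struct and function names below are hypothetical and made up for illustration; they are not GCC internals. Under that assumption the constants in the asm fall out directly: one named GP argument leaves 7 registers * 8 bytes, hence -56, and no named FP arguments leaves 8 registers * 16 bytes, hence -128. The record is 32 bytes, which is why the redundant copy is a single ldp/stp pair of Q registers.

```c
#include <stddef.h>

/* Hypothetical mirror of the AAPCS64 va_list record.  The field roles follow
   the ABI description; the type name is invented for this sketch.  */
struct va_list_model
{
  void *stack;   /* next argument passed on the stack */
  void *gr_top;  /* end of the general-register save area */
  void *vr_top;  /* end of the FP/SIMD-register save area */
  int gr_offs;   /* negative offset from gr_top to the next GP argument */
  int vr_offs;   /* negative offset from vr_top to the next FP argument */
};

/* Initial gr_offs: 8 GP argument registers (x0-x7), 8 bytes each,
   minus those consumed by named arguments.  */
int
initial_gr_offs (int named_gp_args)
{
  return -(8 - named_gp_args) * 8;
}

/* Initial vr_offs: 8 FP/SIMD argument registers (q0-q7), 16 bytes each,
   minus those consumed by named arguments.  */
int
initial_vr_offs (int named_fp_args)
{
  return -(8 - named_fp_args) * 16;
}
```

For __fprintf1 the only named argument is format (in x0), so initial_gr_offs (1) and initial_vr_offs (0) give the values moved into w9 and w8 above. The size claim assumes an LP64 target with 8-byte pointers.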