https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119681
Bug ID: 119681
Summary: extraneous move instructions when unrolling core_list_reverse ()
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: artemiy at synopsys dot com
Target Milestone: ---
When unrolling the core_list_reverse () loop from Coremark (with an unroll
pragma):
list_head *
core_list_reverse(list_head *list)
{
    list_head *next = 0, *tmp;
#pragma GCC unroll 4
    while (list)
    {
        tmp = list->next;
        list->next = next;
        next = list;
        list = tmp;
    }
    return next;
}
GCC (15 or any other version) doesn't split the live range of the variable
'next', so a mov to x0 is emitted in every unrolled iteration:
core_list_reverse:
        cbz     x0, .L2
        ldr     x1, [x0]
        mov     x6, 0
        str     x6, [x0]
        mov     x3, x0
        cbz     x1, .L2
.L4:
        ldr     x2, [x1]
        str     x3, [x1]
        mov     x0, x1
        cbz     x2, .L2
        ldr     x4, [x2]
        str     x1, [x2]
        mov     x0, x2
        cbz     x4, .L2
        ldr     x5, [x4]
        str     x2, [x4]
        mov     x0, x4
        mov     x6, x4
        cbz     x5, .L2
        mov     x0, x5
        mov     x3, x0
        ldr     x1, [x0]
        str     x6, [x0]
        cbnz    x1, .L4
.L2:
        ret
LLVM behavior, for comparison:
core_list_reverse:
        cbz     x0, .LBB0_6
        mov     x8, x0
        mov     x0, xzr
.LBB0_2:
        ldr     x10, [x8]
        str     x0, [x8]
        cbz     x10, .LBB0_7
        ldr     x9, [x10]
        str     x8, [x10]
        cbz     x9, .LBB0_8
        ldr     x0, [x9]
        str     x10, [x9]
        cbz     x0, .LBB0_9
        ldr     x8, [x0]
        str     x9, [x0]
        cbnz    x8, .LBB0_2
.LBB0_6:
        ret
.LBB0_7:
        mov     x0, x8
        ret
.LBB0_8:
        mov     x0, x10
        ret
.LBB0_9:
        mov     x0, x9
        ret
Under certain conditions (the load doesn't fully hide the latency of the mov,
the loop is executed a sufficient number of times, etc.), the version with
multiple exits is faster.
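For illustration, the multi-exit shape of the LLVM output can also be written
by hand at the source level. This is only a sketch under stated assumptions:
the single-field list_head below is a simplified stand-in for the Coremark
struct (so the example compiles on its own), and core_list_reverse_split and
the temporaries a, b, c are hypothetical names chosen for this example, not
anything GCC or LLVM emits.

```c
#include <stddef.h>

/* Assumption: simplified stand-in for Coremark's list_head;
   only the 'next' field at offset 0 matters for this loop. */
typedef struct list_head_s
{
    struct list_head_s *next;
} list_head;

/* Hypothetical hand-unrolled (x4) variant. Each unrolled step keeps the
   previous node in its own variable and returns it directly from its own
   exit, so no per-step copy into a shared 'next' variable is needed. */
list_head *
core_list_reverse_split(list_head *list)
{
    list_head *prev = NULL, *a, *b, *c;

    if (!list)
        return NULL;

    for (;;)
    {
        a = list->next;         /* step 1: relink 'list' */
        list->next = prev;
        if (!a)
            return list;

        b = a->next;            /* step 2: relink 'a' */
        a->next = list;
        if (!b)
            return a;

        c = b->next;            /* step 3: relink 'b' */
        b->next = a;
        if (!c)
            return b;

        list = c->next;         /* step 4: relink 'c', rotate for next round */
        c->next = b;
        prev = c;
        if (!list)
            return c;
    }
}
```

The result should match the pragma-unrolled original for any list length;
the only intended difference is that the return value is materialized at the
exits rather than inside the loop body.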
godbolt for convenience: https://godbolt.org/z/W965qqWKe
$ aarch64-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/home/art/install/aarch64-gcc/bin/aarch64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/home/art/install/aarch64-gcc/libexec/gcc/aarch64-unknown-linux-gnu/15.0.1/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../../src/gcc/configure --enable-checking --disable-bootstrap
--enable-languages=c,c++ --prefix=/home/art/install/aarch64-gcc
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 15.0.1 20250408 (experimental) (GCC)