https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118492
Bug ID: 118492
Summary: Move retrieval of virtual table pointers out of the loop
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: antoshkka at gmail dot com
Target Milestone: ---
Consider the sample code:
struct Base {
    virtual void foo() = 0;
};

void sample(Base& derived) {
    for (unsigned i = 0; i < 1000; ++i) {
        derived.foo();
    }
}
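To run the sample, for example on the Godbolt link below, a concrete implementation of `Base` is needed. The sketch below uses `Counter`, a hypothetical class that is not part of the original report. It only counts how often `foo()` is dispatched, so every one of the 1000 calls goes through the same object's vtable.

```cpp
struct Base {
    virtual void foo() = 0;
};

void sample(Base& derived) {
    for (unsigned i = 0; i < 1000; ++i) {
        derived.foo();  // virtual call: vptr load, slot load, indirect call
    }
}

// Hypothetical concrete class (not from the report): counts dispatched calls.
struct Counter : Base {
    unsigned calls = 0;
    void foo() override { ++calls; }
};
```

Calling `sample(c)` on a `Counter c` leaves `c.calls == 1000`. The dynamic type of `c` never changes during the loop, yet the generated code looks it up again on every iteration.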
With -O2, GCC generates the following assembly:
sample(Base&):
        push    rbp
        mov     rbp, rdi
        push    rbx
        mov     ebx, 1000
        sub     rsp, 8
.L2:
        mov     rax, QWORD PTR [rbp+0]
        mov     rdi, rbp
        call    [QWORD PTR [rax]]
        sub     ebx, 1
        jne     .L2
        add     rsp, 8
        pop     rbx
        pop     rbp
        ret
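Note that the vptr load `mov rax, QWORD PTR [rbp+0]`, and the load of the
function pointer from the vtable in `call [QWORD PTR [rax]]`, are executed on
every iteration, even though the vptr should not change.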
Better code could look like the following, with both loads hoisted out of the
loop:
sample(Base&):
        push    rbp
        push    r14
        push    rbx
        mov     rbx, rdi
        mov     rax, qword ptr [rdi]
        mov     r14, qword ptr [rax]
        mov     ebp, 1000
.LBB0_1:
        mov     rdi, rbx
        call    r14
        dec     ebp
        jne     .LBB0_1
        pop     rbx
        pop     r14
        pop     rbp
        ret
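The difference between the two listings can be modelled at source level with a hand-written, single-slot vtable. This is a sketch under stated assumptions: `VTable`, `Object`, `counting_foo`, `kCountingVTable`, `sample_reload` and `sample_hoisted` are hypothetical names, and the layout is an illustration, not the actual object layout used by GCC or Clang. In this model the hoisted form is only equivalent as long as the callee never rewrites `vptr`. For real C++ virtual calls, that is roughly the assumption -fstrict-vtable-pointers lets Clang make.

```cpp
// Hypothetical hand-written model of a single-slot vtable (illustration only).
struct VTable {
    void (*foo)(void* self);  // slot 0, playing the role of Base::foo
};

struct Object {
    const VTable* vptr;  // models the hidden vptr stored at offset 0
    unsigned calls;      // state modified by the "virtual" function
};

void counting_foo(void* self) {
    ++static_cast<Object*>(self)->calls;
}

const VTable kCountingVTable{&counting_foo};

// Corresponds to the GCC -O2 listing: the vptr (mov rax, [rbp+0]) and the
// slot (call [rax]) are reloaded on every iteration, since the callee
// receives &obj and could, in this model, overwrite obj.vptr.
void sample_reload(Object& obj) {
    for (unsigned i = 0; i < 1000; ++i) {
        obj.vptr->foo(&obj);
    }
}

// Corresponds to the suggested listing: both loads happen once, before
// the loop, and the loop body is a plain indirect call (call r14).
void sample_hoisted(Object& obj) {
    void (*fn)(void*) = obj.vptr->foo;
    for (unsigned i = 0; i < 1000; ++i) {
        fn(&obj);
    }
}
```

With `Object a{&kCountingVTable, 0}`, both `sample_reload(a)` and `sample_hoisted(a)` perform exactly 1000 calls, because `counting_foo` never touches `vptr`.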
Clang performs this optimization when the -fstrict-vtable-pointers flag is
given. Some information on that option is available at
https://llvm.org/devmtg/2016-11/Slides/Padlewski-DevirtualizationInLLVM.pdf
Godbolt playground: https://godbolt.org/z/s36qK5e8v