|   |   |
| --- | --- |
| Issue | 170194 |
| Summary | Combine `add %rax, %rcx` and `jmp *%rcx` into `jmp *(%rax, %rcx)` |
| Labels | backend:X86, missed-optimization |
| Assignees | |
| Reporter | PiJoules |
Given a typical switch/jump table:
```
__attribute__((visibility("hidden"))) int f1();
__attribute__((visibility("hidden"))) int f2();
__attribute__((visibility("hidden"))) int f3();
__attribute__((visibility("hidden"))) int f4();

int foo(int id) {
  switch (id) {
  case 1:
    f1();
    break;
  case 2:
    f2();
    break;
  case 3:
    f3();
    break;
  case 4:
    f4();
    break;
  }
  return 0;
}
```
which generates a PIC-friendly lookup table when compiled with `-O3 -fPIC`:
```
foo:                                    # @foo
        .cfi_startproc
# %bb.0:
                                        # kill: def $edi killed $edi def $rdi
        decl    %edi
        cmpl    $3, %edi
        ja      .LBB0_7
# %bb.1:
        pushq   %rax
        .cfi_def_cfa_offset 16
        leaq    .LJTI0_0(%rip), %rax
        movslq  (%rax,%rdi,4), %rcx
        addq    %rax, %rcx
        jmpq    *%rcx
.LBB0_5:
        xorl    %eax, %eax
        callq   f1
        jmp     .LBB0_6
.LBB0_3:
        xorl    %eax, %eax
        callq   f3
        jmp     .LBB0_6
.LBB0_4:
        xorl    %eax, %eax
        callq   f4
        jmp     .LBB0_6
.LBB0_2:
        xorl    %eax, %eax
        callq   f2
.LBB0_6:
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
.LBB0_7:
        xorl    %eax, %eax
        retq
.LJTI0_0:
        .long   .LBB0_5-.LJTI0_0
        .long   .LBB0_2-.LJTI0_0
        .long   .LBB0_3-.LJTI0_0
        .long   .LBB0_4-.LJTI0_0
```
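(For reference, output like the above can be reproduced with `clang -O3 -fPIC -S` on the C source above, assuming an x86_64 Linux target; exact basic-block numbering may vary across Clang versions.)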
The typical sequence for performing the lookup is:
```
leaq    .LJTI0_0(%rip), %rax    # Load the lookup table start (.LJTI0_0)
movslq  (%rax,%rdi,4), %rcx     # Load and sign-extend the offset in the table
addq    %rax, %rcx              # Add the offset back to the lookup table start
jmpq    *%rcx                   # Jump to the target branch
```
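To spell out what this sequence computes: the table stores 32-bit offsets relative to its own start rather than absolute pointers, which is what keeps it position-independent. A minimal C sketch of the dispatch (names are illustrative, not taken from the compiler output):
```
#include <stddef.h>
#include <stdint.h>

/* Model of the PIC jump-table dispatch above: `table` holds 32-bit
   offsets measured from the table's own address, so adding the table
   base back recovers the absolute address of the target block. */
static uintptr_t jump_target(const int32_t *table, size_t index) {
    return (uintptr_t)table + (intptr_t)table[index];
}
```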
On x86_64, the `addq` + `jmpq` pair takes up 5 bytes, but I think it could instead be a single 3-byte `jmp` instruction if the two were combined into `jmpq *(%rax, %rcx)`.
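For reference, a sketch of the standard x86-64 encodings behind those byte counts (this illustrates instruction sizes only):
```
addq    %rax, %rcx        # 48 01 c1   (3 bytes)
jmpq    *%rcx             # ff e1      (2 bytes)  -> 5 bytes total

jmpq    *(%rax,%rcx)      # ff 24 08   (3 bytes)
```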