https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117544
Bug ID: 117544
Summary: Lack of vsetvli after function call for whole register move
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kito at gcc dot gnu.org
CC: jeffreyalaw at gmail dot com, juzhe.zhong at rivai dot ai,
palmer at gcc dot gnu.org, pan2.li at intel dot com,
rdapp at gcc dot gnu.org
Target Milestone: ---
Target: riscv64
Cross-posting from the RISC-V LLVM community.
Unfortunately, whole register move instructions depend on vtype (*1), which
means they raise an illegal instruction exception if VILL=1. This is generally
not a problem, because any valid vsetvli instruction clears VILL, so it is
usually safe unless the program executes a whole vector register move very
early, before any vsetvli has run.
However, the situation changed after the Linux kernel applied a patch[2] that
sets VILL=1 after any system call. So, if we try to execute a whole register
move after a system call, it will cause an illegal instruction exception. This
can be difficult to detect, as the system call may not be invoked immediately;
it might be deeply nested in a call chain, such as within printf.
Unfortunately, this change has already shipped with Linux kernel 6.5, which was
released on August 28, 2023.
I'm not sure if it's reasonable to ask the Linux kernel maintainers to fix this
by preserving VILL across system calls.
An alternative approach is to address this issue on the toolchain side by
requiring at least one valid vsetvli instruction before any whole register
move. This might be an ugly workaround, but it’s probably the simplest way to
resolve the issue. I also realized this might be the better solution anyway,
since the psABI specifies that VTYPE is NOT preserved across function calls.
This means we can't guarantee that VILL is 0 at function entry or after a call
returns, so placing a vsetvli instruction before a whole register move in both
places, in particular right after a function call, may be necessary.
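The rule above can be sketched as a scan over straight-line code. This is only an illustration under simplifying assumptions: the instruction kinds and insert_vsetvli are hypothetical names, not GCC's internal representation, and control flow is ignored. VILL is treated as unknown at function entry and after every call, and a vsetvli is inserted before any whole register move reached in that state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction kinds for illustration; not GCC's IR. */
enum insn_kind {
    INSN_OTHER,          /* neither sets nor depends on vtype */
    INSN_VSETVLI,        /* any valid vsetvli/vsetivli: VILL becomes 0 */
    INSN_CALL,           /* psABI: VTYPE not preserved, VILL unknown after */
    INSN_WHOLE_REG_MOVE  /* vmv<nr>r.v: depends on vtype */
};

/* Copies in[0..n) to out, inserting INSN_VSETVLI before every whole
   register move that is reached while VILL is not known to be 0.
   Returns the output length; out must have room for 2 * n entries. */
static size_t insert_vsetvli(const enum insn_kind *in, size_t n,
                             enum insn_kind *out)
{
    assert(in != NULL && out != NULL);
    int vill_known_zero = 0; /* unknown at function entry */
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        switch (in[i]) {
        case INSN_VSETVLI:
            vill_known_zero = 1;
            break;
        case INSN_CALL:
            vill_known_zero = 0;
            break;
        case INSN_WHOLE_REG_MOVE:
            if (!vill_known_zero) {
                out[m++] = INSN_VSETVLI;
                vill_known_zero = 1;
            }
            break;
        case INSN_OTHER:
            break;
        }
        out[m++] = in[i];
    }
    return m;
}
```

For a sequence shaped like other, move, call, other, move, other, this sketch inserts one vsetvli before the first move and one before the move that follows the call.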
Testcase:
#include <riscv_vector.h>
void bar() __attribute__((riscv_vector_cc));
vint32m1_t foo(vint32m1_t a, vint32m1_t b) {
  register vint32m1_t x asm("v24") = b;
  bar();
  asm ("#xx %0"::"vr"(x) );
  return x;
}
Generated asm with riscv-linux-gcc -O3 -march=rv64gcv:
foo:
addi sp,sp,-16
csrr t0,vlenb
sd ra,8(sp)
sub sp,sp,t0
vs1r.v v24,0(sp)
vmv1r.v v24,v9
call bar
csrr t0,vlenb
vmv1r.v v8,v24
vl1re64.v v24,0(sp)
add sp,sp,t0
ld ra,8(sp)
addi sp,sp,16
jr ra
The compiler could emit code like the following to fix this issue:
foo:
addi sp,sp,-16
csrr t0,vlenb
sd ra,8(sp)
sub sp,sp,t0
vs1r.v v24,0(sp)
vsetivli x0, 0, e8, m1, ta, ma # Need vsetvli to make VILL=0 here
vmv1r.v v24,v9
call bar
csrr t0,vlenb
vsetivli x0, 0, e8, m1, ta, ma # Need vsetvli to make VILL=0 here
vmv1r.v v8,v24
vl1re64.v v24,0(sp)
add sp,sp,t0
ld ra,8(sp)
addi sp,sp,16
jr ra
NOTE: We have hit this issue within our internal spec run.
*1 That clarification [1] was added after 1.0...
[1] https://github.com/riscvarchive/riscv-v-spec/commit/856fe5bd1cb135c39258e6ca941bf234ae63e1b1
[2] https://github.com/torvalds/linux/commit/9657e9b7d2538dc73c24947aa00a8525dfb8062c