On 1/20/25 19:07, Li, Pan2 wrote:
> Agree, the mode-switch will take care of the frm when meet a call (covered by
> testcase already).
>
> 5 │
> 6 │ extern size_t normalize_vl_1 (size_t vl);
> 7 │ extern size_t normalize_vl_2 (size_t vl);
> 8 │
> 9 │ vfloat32m1_t
> 10 │ test_float_point_dynamic_frm (vfloat32m1_t op1, vfloat32m1_t op2,
> 11 │ unsigned count, size_t vl)
> 12 │ {
> 13 │ vfloat32m1_t result = op1;
> 14 │
> 15 │ for (unsigned i = 0; i < count; i++)
> 16 │ {
> 17 │ if (i % 3 == 0)
> 18 │ {
> 19 │ result = __riscv_vfadd_vv_f32m1 (op1, result, vl);
> 20 │ vl = normalize_vl_1 (vl);
> 21 │ }
> 22 │ else
> 23 │ {
> 24 │ result = __riscv_vfadd_vv_f32m1_rm (result, op2, 1, vl);
> 25 │ vl = normalize_vl_2 (vl);
> 26 │ }
> 27 │ }
> 28 │
> 29 │ return result;
> 30 │ }
>
> .L12:
> csrr a5,vlenb
> add a5,a5,sp
> vl1re32.v v1,0(a5)
> vsetvli zero,a1,e32,m1,ta,ma
> addiw s0,s0,1
> vfadd.vv v8,v1,v8 // Do not pollute frm, nothing need to do
> here
> vs1r.v v8,0(sp)
> call normalize_vl_1
> vl1re32.v v8,0(sp)
> frrm a4
> mv a1,a0
> beq s3,s0,.L8
> .L5:
> mulw a5,s0,s2
> mv a0,a1
> bleu a5,s1,.L12
> fsrmi 1
> csrr a5,vlenb
> slli a5,a5,1
> add a5,a5,sp
>
>
>
> vl1re32.v v1,0(a5)
>
>
>
> vsetvli zero,a1,e32,m1,ta,ma
>
>
>
> vfadd.vv v8,v8,v1 // Pollute frm, will restore frm before call
> vs1r.v v8,0(sp)
> fsrm a4
> call normalize_vl_2
> addiw s0,s0,1
> vl1re32.v v8,0(sp)
> frrm a4
> mv a1,a0
> bne s3,s0,.L5
>
> while for llround autovec, it will also perform something like restore frm
> before leave the func.
>
> 8 │ #define TEST_UNARY_CALL_CVT(TYPE_IN, TYPE_OUT, CALL) \
> 9 │ void test_##TYPE_IN##_##TYPE_OUT##_##CALL ( \
> 10 │ TYPE_OUT *out, TYPE_IN *in, unsigned count) \
> 11 │ { \
> 12 │ for (unsigned i = 0; i < count; i++) \
> 13 │ out[i] = CALL (in[i]); \
> 14 │ }
>
> TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llround)
>
> test_double_int64_t___builtin_llround:
> frrm a3
>
>
>
> beq a2,zero,.L8
>
>
>
> fsrmi 4
>
>
>
> slli a2,a2,32
>
>
>
> srli a2,a2,32
>
>
>
> .L3:
>
>
>
> vsetvli a5,a2,e64,m1,ta,ma
>
>
>
> vle64.v v1,0(a1)
> slli a4,a5,3
>
>
>
> sub a2,a2,a5
>
>
>
> add a1,a1,a4
>
>
>
> vfcvt.x.f.v v1,v1
> vse64.v v1,0(a0)
> add a0,a0,a4
>
>
>
> bne a2,zero,.L3
> .L8:
> fsrm a3Silly question, what exactly is the procedure calling convention rule for FCSR/FRM ? Is it a Caller saved or a Callee saved Reg. The psABI CC doc is not explicit in those terms at least [1] | "The Floating-Point Control and Status Register (fcsr) must have thread storage duration | in accordance with C11 section 7.6 "Floating-point environment <fenv.h" Per your llround snippet#2 is seems like Callee saved (function is restoring the mode before it returns), but then in the snippet #1 at top, why does it need to save the value before a function call, can't the callee just restore it back. I'm surely missing something here. P.S. I though I had the test reduced, but turns out the reduced one is fine, even if it has some abnormal control flow and FRM set/restore. Will try to narrow down using qemu tracing the actual function which is leaking out FRM. Thx, -Vineet [1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc
