https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092

Kito Cheng <kito at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kito at gcc dot gnu.org

--- Comment #4 from Kito Cheng <kito at gcc dot gnu.org> ---
The testcase it self is look like tricky but right, 
it typically could use to optimize mixed-width (mixed-SEW) operations,

You can refer to the EEW stuffs in v-spec[1], most load store has encoding
static-EEW and then could apply such vsetvli fusion optimization.

[1]
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#52-vector-operands

Give a (more) practical example here:

```c
#include "riscv_vector.h"

void foo(int32_t *in1, int16_t *in2, int16_t *in3, int32_t *out, size_t n, int
cond, int avl) {
    size_t vl = __riscv_vsetvl_e16mf2(avl);
    vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
    vint16mf2_t b = __riscv_vle16_v_i16mf2(in2, vl);
    vint16mf2_t c = __riscv_vle16_v_i16mf2(in3, vl);
    vint32m1_t x = __riscv_vwmacc_vv_i32m1(a, b, c, vl);
    __riscv_vse32_v_i32m1(out, x, vl);
}

```

> Is is guaranteed by the RVV specification that the value of `vl' produced
> (which is then supplied as an argument to `__riscv_vle32_v_i32m1', etc.;
> I presume implicitly via the VL CSR as I can't see it in actual assembly
> produced) is going to be the same for all microarchitectures for both:
>
>       vsetvli zero,a6,e32,m1,tu,ma
>
>and:
>
>       vsetvli zero,a6,e16,mf2,ta,ma

This is another trick in this case: tail agnostic vs tail undisturbed

tail undisturbed has stronger semantic than tail agnostic, so using tail
undisturbed for agnostic is always safe and satisfied the semantic, same for
mask agnostic vs mask undisturbed.

But performance is another story, as I know some uArch implement agnostic as
undisturbed, which means agnostic or undisturbed no much difference, so fuse
those two vsetvli is become kind of optimization.

However you could imagine, that also means some uArch is implement agnostic in
another way: agnostic MAY has better performance than undisturbed, we should
not fuse those vsetvli IF we are targeting such target, anyway, our cost model
for RVV still in an initial states, so personally I am fine with that for now,
but I guess we need add some more stuff to -mtune to handle those difference.

Reply via email to