Issue 182312
Summary [RISC-V] quad widening multiply-accumulate is not vectorized, when the destination element width is not supported in the vector extension
Labels new issue
Assignees
Reporter christian-herber-nxp
This function:

``` C
#include <stdint.h>  /* int64_t, int16_t, uint16_t */

void
mac (int64_t* dst, int16_t *a, int16_t *b, uint16_t len)
{
    for (uint16_t i=0; i < len; i++)
        dst[i] += (int64_t) a[i] * (int64_t) b[i];
}
```

is vectorized when compiled for Zve64x, but not when compiled for Zve32x. With Zve32x, the 64-bit widening multiplication has to be emulated through mul and mulh, which is what the scalar version does.
The diagnostics report:

> remark: the cost-model indicates that interleaving is not beneficial [-Rpass-missed=loop-vectorize]

I would expect vectorization to still be beneficial, especially since Zve32x has the same capabilities as the scalar ISA for emulating this behavior.
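To make the emulation concrete, here is a rough sketch in portable C of the per-element work when only 32-bit operations are available. It is not the code LLVM emits. The names `mul_lo32`, `mul_hi32` and `mac_emulated` are made up for illustration, and the mapping to vector instructions in the comments (`vsext.vf2`, `vmul.vv`, `vmulh.vv`, `vmadc.vv`) is my assumption about how a Zve32x lowering could look.

``` C
#include <stdint.h>

/* Low 32 bits of the signed 32x32 product (scalar `mul`, or `vmul.vv` at SEW=32).
   Computed in unsigned arithmetic to avoid signed-overflow UB. */
static uint32_t mul_lo32(int32_t x, int32_t y)
{
    return (uint32_t)x * (uint32_t)y;
}

/* High 32 bits of the signed 32x32 product (scalar `mulh`, or `vmulh.vv` at SEW=32).
   The 64-bit intermediate is used only to model the instruction's result. */
static uint32_t mul_hi32(int32_t x, int32_t y)
{
    return (uint32_t)((uint64_t)((int64_t)x * (int64_t)y) >> 32);
}

/* Hypothetical helper: same result as mac(), but every 64-bit value is
   handled as two 32-bit halves, as 32-bit vector lanes would require. */
void mac_emulated(int64_t *dst, const int16_t *a, const int16_t *b, uint16_t len)
{
    for (uint16_t i = 0; i < len; i++) {
        int32_t x = a[i];                      /* sign-extend 16 -> 32 (cf. vsext.vf2) */
        int32_t y = b[i];

        uint32_t prod_lo = mul_lo32(x, y);
        uint32_t prod_hi = mul_hi32(x, y);

        uint64_t acc = (uint64_t)dst[i];       /* split the accumulator into halves */
        uint32_t acc_lo = (uint32_t)acc;
        uint32_t acc_hi = (uint32_t)(acc >> 32);

        uint32_t sum_lo = acc_lo + prod_lo;    /* add the low halves, wrapping mod 2^32 */
        uint32_t carry = sum_lo < prod_lo;     /* carry out of the low half (cf. vmadc.vv) */
        uint32_t sum_hi = acc_hi + prod_hi + carry;

        dst[i] = (int64_t)(((uint64_t)sum_hi << 32) | sum_lo);
    }
}
```

Each step in the loop body is a 32-bit lane operation with no cross-iteration dependency, which is why I would expect this to vectorize. Note that with 16-bit inputs the product already fits in 32 bits, so the high half here is just the sign extension of the low half.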

https://godbolt.org/z/4Yo4cqojd
_______________________________________________
llvm-bugs mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs