| Issue |
182312
|
| Summary |
[RISC-V] quad widening multiply-accumulate is not vectorized, when the destination element width is not supported in the vector extension
|
| Labels |
new issue
|
| Assignees |
|
| Reporter |
christian-herber-nxp
|
This function:
``` C
#include <stdint.h>

void
mac (int64_t* dst, int16_t *a, int16_t *b, uint16_t len)
{
  /* Widen each 16-bit operand to 64 bits, multiply, and accumulate. */
  for (uint16_t i=0; i < len; i++)
    dst[i] += (int64_t) a[i] * (int64_t) b[i];
}
```
is vectorized when compiled for Zve64x, but not when compiled for Zve32x. With Zve32x, 64-bit elements are not supported, so the 64-bit widening multiplication has to be emulated through `mul` and `mulh`. The scalar version already does this.
The diagnostics report:
> remark: the cost-model indicates that interleaving is not beneficial [-Rpass-missed=loop-vectorize]
I would expect vectorization to still be beneficial, especially since Zve32x has the same capabilities as the scalar ISA for emulating this behavior.
https://godbolt.org/z/4Yo4cqojd
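
For illustration, the emulation described above can be sketched in plain C using only 32-bit arithmetic per element. This is a rough sketch and not the code the compiler emits: `mac_emulated` and `mulh32` are hypothetical names, and `mulh32` merely models what a `mulh`-style instruction returns (the high 32 bits of the signed product). The conversion back to `int64_t` assumes a two's-complement target.

``` C
#include <stdint.h>

/* Hypothetical helper modelling mulh: the high 32 bits of the
   signed 32x32 -> 64-bit product. */
static uint32_t mulh32(int32_t x, int32_t y)
{
    return (uint32_t)((uint64_t)((int64_t)x * (int64_t)y) >> 32);
}

/* Same result as mac(), but the accumulation is done on 32-bit halves. */
void mac_emulated(int64_t *dst, const int16_t *a, const int16_t *b,
                  uint16_t len)
{
    for (uint16_t i = 0; i < len; i++) {
        int32_t x = a[i];                         /* sign-extend to 32 bits */
        int32_t y = b[i];
        uint32_t lo = (uint32_t)x * (uint32_t)y;  /* low half, like mul */
        uint32_t hi = mulh32(x, y);               /* high half, like mulh */

        uint64_t d = (uint64_t)dst[i];
        uint32_t d_lo = (uint32_t)d;
        uint32_t d_hi = (uint32_t)(d >> 32);

        uint32_t s_lo = d_lo + lo;                /* wraps modulo 2^32 */
        uint32_t carry = s_lo < lo;               /* 1 if the low add wrapped */
        uint32_t s_hi = d_hi + hi + carry;

        /* Assumes two's complement for the unsigned-to-signed conversion. */
        dst[i] = (int64_t)(((uint64_t)s_hi << 32) | s_lo);
    }
}
```

Each step in the loop body is a 32-bit multiply, add, or compare, which is the kind of operation the report expects to be available per element on Zve32x as well.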