You can see here:
https://godbolt.org/z/d78646hWb
The first case can't generate vfwmul.vv, but the second case succeeds.
Failed to match this instruction:
(set (reg:VNx2DF 150 [ vect__11.50 ])
    (if_then_else:VNx2DF (unspec:VNx2BI [
                (const_vector:VNx2BI repeat [
                        (const_int 1 [0x1])
                    ])
                (reg:DI 153)
                (const_int 2 [0x2]) repeated x2
                (const_int 1 [0x1])
                (const_int 7 [0x7])
                (reg:SI 66 vl)
                (reg:SI 67 vtype)
                (reg:SI 69 N/A)
            ] UNSPEC_VPREDICATE)
        (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 149 [ vect__5.45 ]))
            (reg:VNx2DF 148 [ vect__8.49 ]))
        (unspec:VNx2DF [
                (reg:SI 0 zero)
            ] UNSPEC_VUNDEF)))
This patch adds this combine pattern.
[email protected]
From: Jeff Law
Date: 2023-06-29 00:24
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
On 6/27/23 22:15, Juzhe-Zhong wrote:
> Consider the following complicated case:
> #define TEST_TYPE(TYPE1, TYPE2)                                          \
>   __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                   \
>     TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3, \
>     TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,     \
>     TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                    \
>   {                                                                      \
>     for (int i = 0; i < n; i++)                                          \
>       {                                                                  \
>         dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                            \
>         dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                          \
>         dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                          \
>         dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                          \
>       }                                                                  \
>   }
>
> TEST_TYPE (double, float)
>
> In such a complicated situation, the combine pass cannot fold the
> extensions of both operands in a single step.
> Instead, it first combines one of the extensions and then combines
> the other. The combine flow is as follows:
>
> Original IR:
> (set (reg 0) (float_extend (reg 1)))
> (set (reg 3) (float_extend (reg 2)))
> (set (reg 4) (mult (reg 0) (reg 3)))
>
> First step of combine:
> (set (reg 3) (float_extend (reg 2)))
> (set (reg 4) (mult (float_extend (reg 1)) (reg 3)))
>
> Second step of combine:
> (set (reg 4) (mult (float_extend (reg 1)) (float_extend (reg 2))))
>
> So, to enhance the combine optimization, we add a "pseudo vfwmul.wv" RTL
> pattern in autovec-opt.md,
> which is (set (reg 0) (mult (float_extend (reg 1)) (reg 2))).
Hmm, something doesn't make sense here. Combine knows how to do a 3->1
combination. I would expect to see the first step fail (substituting
just one operand), then a later step try to combine all three
instructions, substituting the extension for both input operands.
Can you pass along the .combine dump from the failing case?
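
For reference, a reduced standalone version of the quoted test might look
roughly like the sketch below. It is the TEST_TYPE (double, float) case
written out by hand. The names widen_mul4 and widen_mul4_mismatches are
illustrative and not from the patch. Compiling it with -fdump-rtl-combine
(plus whatever RVV autovectorization flags your toolchain needs, which are
not shown here) should produce the .combine dump for this loop.

```c
/* Hand-expanded double/float instance of the quoted loop.
   widen_mul4 is a hypothetical name, not the one used in the patch.  */
__attribute__ ((noipa)) void
widen_mul4 (double *__restrict dst, double *__restrict dst2,
            double *__restrict dst3, double *__restrict dst4,
            float *__restrict a, float *__restrict b,
            float *__restrict a2, float *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
    {
      /* Each float operand feeds more than one widening multiply, so
         each float_extend has several users.  */
      dst[i] = (double) a[i] * (double) b[i];
      dst2[i] = (double) a2[i] * (double) b[i];
      dst3[i] = (double) a2[i] * (double) a[i];
      dst4[i] = (double) a[i] * (double) b2[i];
    }
}

/* Hypothetical helper: runs the loop on small inputs whose products are
   exactly representable and returns the number of wrong results.  */
int
widen_mul4_mismatches (void)
{
  float a[2] = { 1.5f, 2.0f }, b[2] = { 2.0f, 4.0f };
  float a2[2] = { 3.0f, 0.5f }, b2[2] = { 0.25f, 8.0f };
  double d1[2], d2[2], d3[2], d4[2];
  int bad = 0;

  widen_mul4 (d1, d2, d3, d4, a, b, a2, b2, 2);
  for (int i = 0; i < 2; i++)
    {
      bad += d1[i] != (double) a[i] * (double) b[i];
      bad += d2[i] != (double) a2[i] * (double) b[i];
      bad += d3[i] != (double) a2[i] * (double) a[i];
      bad += d4[i] != (double) a[i] * (double) b2[i];
    }
  return bad;
}
```

The helper only checks that the scalar semantics are preserved; whether
vfwmul.vv is actually emitted still has to be read off the assembly or the
dump.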
Jeff