You can see it here:

https://godbolt.org/z/d78646hWb 

The first case can't generate vfwmul.vv, but the second case succeeds.

Failed to match this instruction:
(set (reg:VNx2DF 150 [ vect__11.50 ])
    (if_then_else:VNx2DF (unspec:VNx2BI [
                (const_vector:VNx2BI repeat [
                        (const_int 1 [0x1])
                    ])
                (reg:DI 153)
                (const_int 2 [0x2]) repeated x2
                (const_int 1 [0x1])
                (const_int 7 [0x7])
                (reg:SI 66 vl)
                (reg:SI 67 vtype)
                (reg:SI 69 N/A)
            ] UNSPEC_VPREDICATE)
        (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 149 [ vect__5.45 ]))
            (reg:VNx2DF 148 [ vect__8.49 ]))
        (unspec:VNx2DF [
                (reg:SI 0 zero)
            ] UNSPEC_VUNDEF)))


This patch adds the combine pattern that matches this instruction.
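
As a scalar sketch of what the RTL above computes (an illustration only, not code from the patch or from the godbolt link): the unmatched instruction multiplies a float operand that is still wrapped in float_extend by an operand that is already double. The helper names widen_mul_both, widen_mul_one and check_widen_mul below are hypothetical, and the sketch assumes the already-double operand came from an earlier float-to-double extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: both operands are widened inside the multiply.
   This is the (mult (float_extend a) (float_extend b)) shape.  */
void
widen_mul_both (double *__restrict dst, const float *__restrict a,
                const float *__restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (double) a[i] * (double) b[i];
}

/* Hypothetical helper: b was widened earlier into wb, so only a is
   extended inside the multiply.  This is the
   (mult (float_extend a) wb) shape of the instruction shown above.  */
void
widen_mul_one (double *__restrict dst, const float *__restrict a,
               const double *__restrict wb, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (double) a[i] * wb[i];
}

/* Returns 0 when both forms produce identical results on a small
   input whose products are exactly representable.  */
int
check_widen_mul (void)
{
  const float a[4] = { 1.5f, -2.0f, 0.25f, 3.0f };
  const float b[4] = { 2.0f, 0.5f, -4.0f, 1.25f };
  double wb[4], r_both[4], r_one[4];

  for (size_t i = 0; i < 4; i++)
    wb[i] = (double) b[i];	/* The separate, earlier extension.  */

  widen_mul_both (r_both, a, b, 4);
  widen_mul_one (r_one, a, wb, 4);

  for (size_t i = 0; i < 4; i++)
    if (r_both[i] != r_one[i])
      return 1;
  return 0;
}
```

Both helpers compute the same values; they differ only in where the extension of b happens, which is the difference between the two RTL shapes discussed here.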


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-29 00:24
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
 
On 6/27/23 22:15, Juzhe-Zhong wrote:
> Consider the following complicated case:
> #define TEST_TYPE(TYPE1, TYPE2)                                              \
>   __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                       \
>     TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,   \
>     TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,        \
>     TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                       \
>   {                                                                          \
>     for (int i = 0; i < n; i++)                                              \
>       {                                                                      \
>         dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                \
>         dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                              \
>         dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                              \
>         dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                              \
>       }                                                                      \
>   }
> 
> TEST_TYPE (double, float)
> 
> In such a complicated situation, the combine pass cannot combine the
> extensions of both operands in one go.
> So the combine pass will first try to combine one of the extensions,
> and then combine
> the other. The combine flow is as follows:
> 
> Original IR:
> (set (reg 0) (float_extend: (reg 1)))
> (set (reg 3) (float_extend: (reg 2)))
> (set (reg 4) (mult: (reg 0) (reg 3)))
> 
> First step of combine:
> (set (reg 3) (float_extend: (reg 2)))
> (set (reg 4) (mult: (float_extend: (reg 1)) (reg 3)))
> 
> Second step of combine:
> (set (reg 4) (mult: (float_extend: (reg 1)) (float_extend: (reg 2))))
> 
> So, to enhance the combine optimization, we add a "pseudo vfwmul.wv" RTL
> pattern in autovec-opt.md,
> which is (set (reg 0) (mult (float_extend (reg 1)) (reg 2))).
Hmm, something doesn't make sense here.  Combine knows how to do a 3->1 
combination.  I would expect to see the first step fail (substituting 
just one operand), then a later step try to combine all three 
instructions, substituting the extension for both input operands.
 
Can you pass along the .combine dump from the failing case?
 
Jeff
 
