>> (My question of why we shouldn't vectorize this at 256b
>> and above still stands, though)
I think we shouldn't vectorize it at any vlen, since the non-vectorized
codegen is much better.
I have also tested other targets: with -msve-vector-bits=2048, ARM SVE
doesn't vectorize it, and with -zvl65536b, RVV Clang doesn't vectorize it
either.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-01-11 18:40
To: juzhe.zh...@rivai.ai; Richard Biener
CC: rdapp.gcc; gcc-patches; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Increase scalar_to_vec_cost from 1 to 3
On 1/11/24 11:20, juzhe.zh...@rivai.ai wrote:
> OK, I see your idea: we need to cost scalar_to_vec accurately. Inside the 
> loop we have these 2 scalar_to_vec:
> 
> 1. MIN_EXPR <patt_28, 15> 1 times scalar_to_vec costs 1 in prologue
> 
>    This scalar_to_vec cost should be 0 or 1 since it only generates a single 
> instruction: vmv.v.i v16,15
> 
> 2. 32872 >> patt_26 1 times scalar_to_vec costs 1 in prologue
> 
>    This cost should be higher since it takes 3 instructions:
>     li a4,-32768
>     addiw a4,a4,104
>     vmv.v.x v16,a4
> 
> Am I correct ?
> 
> I guess if we cost case 1 as 1 and case 2 as 3, then we will be 
> good.
 
That would be the general idea, yes.  As Richard mentioned, it doesn't
always work well but for this case here it could help a bit.
(My question of why we shouldn't vectorize this at 256b
and above still stands, though)
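
To make the per-constant idea concrete, here is a rough Python sketch that estimates how many instructions a broadcast of an integer constant takes. It is only an illustration under stated assumptions: the helper names are hypothetical, element widths are assumed to be at most 32 bits, and this is not how the GCC cost hook is written. It reproduces the two cases quoted above (1 instruction for 15, 3 instructions for 32872, which as a signed 16-bit value is -32768 + 104).

```python
def fits_simm(value, bits):
    """True if value fits in a signed immediate of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def to_signed(value, bits):
    """Reinterpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def splat_insn_count(value, elem_bits=16):
    """Hypothetical estimate of the instructions needed to broadcast a constant.

    Assumes elem_bits <= 32, so the constant needs at most two scalar
    instructions before the vector move.
    """
    v = to_signed(value, elem_bits)
    if fits_simm(v, 5):
        return 1  # vmv.v.i takes a 5-bit signed immediate directly
    if fits_simm(v, 12) or (v & 0xFFF) == 0:
        return 2  # one scalar instruction, then vmv.v.x
    return 3      # two scalar instructions (like li + addiw above), then vmv.v.x


print(splat_insn_count(15))     # case 1: MIN_EXPR <patt_28, 15>
print(splat_insn_count(32872))  # case 2: 32872 >> patt_26
```

With such a count in hand, case 1 would be costed as 1 and case 2 as 3, matching the proposal above.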
 
As mentioned before, the other thing that needs to be considered
is register-move costs (or the respective cost structure).  On
some uarchs the vmv.v.f might be more expensive than vmv.v.x and
so on - in addition to the instructions needed to synthesize the
constant.
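
As a sketch of how the two components could be combined, the following Python snippet adds a per-uarch register-move cost on top of the constant-synthesis cost. The structure, its field names, and the numbers are all made up for illustration; it is not GCC's actual cost structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplatCosts:
    """Hypothetical per-uarch costs; not GCC's actual cost structure."""
    scalar_insn: int = 1  # one scalar instruction used to build the constant
    gpr_to_vec: int = 1   # broadcast from an integer register (vmv.v.x)
    fpr_to_vec: int = 1   # broadcast from an FP register (vmv.v.f)


def scalar_to_vec_cost(costs, synth_insns, from_fpr=False):
    """Constant-synthesis cost plus the register-move cost of the broadcast."""
    move = costs.fpr_to_vec if from_fpr else costs.gpr_to_vec
    return synth_insns * costs.scalar_insn + move


generic = SplatCosts()
slow_fp_splat = SplatCosts(fpr_to_vec=3)  # made-up uarch where vmv.v.f is slower

# li + addiw + vmv.v.x on the generic model
print(scalar_to_vec_cost(generic, synth_insns=2))
# one scalar load of an FP constant + vmv.v.f on the slow-FP-splat model
print(scalar_to_vec_cost(slow_fp_splat, synth_insns=1, from_fpr=True))
```

Keeping the move cost separate from the synthesis cost lets a uarch description adjust vmv.v.x and vmv.v.f independently.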
 
Regards
Robin
 
 
