>> (My question whether why we shouldn't vectorize this at 256b
>> and above still stands, though)

I think we shouldn't vectorize it at any vlen, since the non-vectorized codegen is much better.

I have also tested this elsewhere: with -msve-vector-bits=2048, ARM SVE doesn't vectorize it, and with -zvl65536b, RVV Clang doesn't vectorize it either.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2024-01-11 18:40
To: juzhe.zh...@rivai.ai; Richard Biener
CC: rdapp.gcc; gcc-patches; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Increase scalar_to_vec_cost from 1 to 3

On 1/11/24 11:20, juzhe.zh...@rivai.ai wrote:
> Ok I see your idea and we need to adjust scalar_to_vec accurately. Inside the
> loop we have these 2 scalar_to_vec:
>
> 1. MIN_EXPR <patt_28, 15> 1 times scalar_to_vec costs 1 in prologue
>
> This scalar_to_vec cost should be 0 or 1 since it only generate single
> instructions: vmv.v.i v16,15
>
> 2. 32872 >> patt_26 1 times scalar_to_vec costs 1 in prologue
>
> This cost should be higher since it cost 3 instructions:
> li a4,-32768
> addiw a4,a4,104
> vmv.v.x v16,a4
>
> Am I correct ?
>
> I guess if we cost 1 case as 1 cost and 2 case as 3 cost. Then we will be
> good.

That would be the general idea, yes. As Richard mentioned, it doesn't
always work well but for this case here it could help a bit.
(My question whether why we shouldn't vectorize this at 256b
and above still stands, though)

As mentioned before, the other thing that needs to be considered is
register-move costs (or the respective cost structure). On some uarchs
the vmv.v.f might be more expensive than vmv.v.x and so on - in addition
to the instructions needed to synthesize the constant.

Regards
 Robin
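
For illustration only, the instruction counting discussed above (one instruction for the constant 15, three for 32872) could be modelled roughly as below. This is a simplified sketch, not GCC's actual cost code: the function names are hypothetical, only integer constants up to 32 bits are handled, and the register-move cost differences between uarchs mentioned above are ignored.

```python
def fits_signed(value: int, bits: int) -> bool:
    """Return True if value is representable as a signed `bits`-bit immediate."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def scalar_to_vec_insns(value: int) -> int:
    """Estimate how many instructions it takes to broadcast an integer
    constant into a vector register (hypothetical model, not GCC code)."""
    if fits_signed(value, 5):
        # vmv.v.i takes a 5-bit signed immediate directly.
        return 1
    if fits_signed(value, 12):
        # li (a single addi) + vmv.v.x
        return 2
    if fits_signed(value, 32):
        if value & 0xFFF == 0:
            # lui + vmv.v.x
            return 2
        # lui + addiw + vmv.v.x
        return 3
    raise ValueError("constants wider than 32 bits are not modelled")


if __name__ == "__main__":
    for c in (15, 32872):
        print(c, scalar_to_vec_insns(c))
```

Under this model the MIN_EXPR operand 15 costs 1 and the shift operand 32872 costs 3, matching the counts given in the quoted message.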