https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org

--- Comment #6 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> This delays some checks to eventually support part of the BB vectorization
> which is what succeeds here.  I suspect that w/o vectorization we manage
> to elide the tmp[] array but with the part vectorization that occurs we
> fail to do that.
>
> On the cost side there would be a lot needed to make the vectorization
> not profitable:
>
>   Vector inside of basic block cost: 8
>   Vector prologue cost: 36
>   Vector epilogue cost: 0
>   Scalar cost of basic block: 64
>
> the thing to double-check is
>
> 0x123b1ff0 <unknown> 1 times vec_construct costs 17 in prologue
>
> that is the cost of the V16QI construct
>
>  _813 = {_437, _448, _459, _470, _490, _501, _512, _523, _543, _554, _565, _576, _125, _143, _161, _179};

Thanks Richard!  I did some cost-adjustment experiments last year, and the
cost for V16QI does look off.  However, at that time tweaking this cost did
not change SPEC performance; I guessed we just happened not to have a case
like this one that triggers it.  I'll take a look and re-evaluate the cost
for this case.