https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |linkw at gcc dot gnu.org

--- Comment #6 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> This delays some checks to eventually support part of the BB vectorization
> which is what succeeds here.  I suspect that w/o vectorization we manage
> to elide the tmp[] array but with the part vectorization that occurs we
> fail to do that.
> 
> On the cost side there would be a lot needed to make the vectorization
> not profitable:
> 
>   Vector inside of basic block cost: 8
>   Vector prologue cost: 36
>   Vector epilogue cost: 0
>   Scalar cost of basic block: 64
> 
> the thing to double-check is
> 
> 0x123b1ff0 <unknown> 1 times vec_construct costs 17 in prologue
> 
> that is the cost of the V16QI construct
> 
>  _813 = {_437, _448, _459, _470, _490, _501, _512, _523, _543, _554, _565,
> _576, _125, _143, _161, _179}; 
> 

Thanks Richard!  I did some cost adjustment experiment last year and the cost
for v16qi looks off indeed, but at that time with the cost tweaking for this
the SPEC performance doesn't change, I guessed it's just we happened not have
this kind of case to trap into. I'll have a look and re-evaluate it for this.

Reply via email to