https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247

--- Comment #2 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Robin Dapp from comment #1)
> Hmm, so I tried reproducing this and without a vector cost model we indeed
> vectorize.  My qemu dynamic instruction count results are not as abysmal as
> yours but still bad enough (20-30% increase in dynamic instructions).
> 
> However, as soon as I use the vector cost model, enabled by
> -mtune=generic-ooo, the sha256 function is not vectorized anymore:
> 
> bla.c:95:5: note: Cost model analysis for part in loop 0:
>   Vector cost: 294
>   Scalar cost: 185
> bla.c:95:5: missed: not vectorized: vectorization is not profitable.
> 
> Without that we have:
> bla.c:95:5: note: Cost model analysis for part in loop 0:
>   Vector cost: 173
>   Scalar cost: 185
> bla.c:95:5: note: Basic block will be vectorized using SLP
> 
> (Those costs are obtained via default_builtin_vectorization_cost).
> 
> The main difference is vec_to_scalar cost being 1 by default and 2 in our
> cost model, as well as vec_perm = 2.  Given our limited permute capabilities
> I think a cost of 2 makes sense.  We can also argue in favor of
> vec_to_scalar = 2 because we need to slide down elements for extraction and
> cannot extract directly.  Setting scalar_to_vec = 2 is debatable and I'd
> rather keep it at 1.
> 
> For the future we need to make a decision whether to continue with
> generic-ooo as the default vector model or if we want to set latencies to a
> few uniform values in order for scheduling not to introduce spilling and
> waiting for dependencies.
> 
> To help with that decision you could run some benchmarks with the
> generic-ooo tuning and see if things get better or worse?

generic-ooo uses the generic cost model I added.

Since the default tuning info is rocket, which doesn't have a cost model, we
fall back to the default builtin cost model.

I think it's more reasonable to adjust the code in builtin_vectorize_cost:

if (!cost)
  cost = &generic_vector_cost;
