https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119879
--- Comment #1 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
The problem is in:
  /* VEC_PACK_TRUNC_EXPR: If inner size is greater than outer size we will end
     up doing two conversions and packing them.  */
  if (!scalar_p && inner_size > outer_size)
    {
      int n = inner_size / outer_size;
      stmt_cost = stmt_cost * n
                  + (n - 1) * ix86_vec_cost (mode, ix86_cost->sse_op);
    }

While this is true for code produced by the loop vectorizer (which for
double->float produces a VEC_PACK_TRUNC_EXPR having two float inputs), it is
not true when the SLP vectorizer is trying to cost conversions of 2 floats to
2 doubles. I will check how to distinguish these.
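
To make the arithmetic of the adjustment quoted above concrete, here is a
minimal standalone sketch. It is not the GCC code: narrowing_cost, base_cost
and pack_cost are hypothetical names, the cost values used with it are made
up, and the sizes are assumed to be element sizes in bits (only their ratio
matters here).

```c
/* Hypothetical stand-in for the adjustment quoted above.  base_cost plays
   the role of stmt_cost and pack_cost the role of
   ix86_vec_cost (mode, ix86_cost->sse_op); neither is a real GCC value.  */
static int
narrowing_cost (int base_cost, int pack_cost, int inner_size,
                int outer_size, int scalar_p)
{
  if (!scalar_p && inner_size > outer_size)
    {
      /* n conversions plus (n - 1) pack operations.  */
      int n = inner_size / outer_size;
      return base_cost * n + (n - 1) * pack_cost;
    }
  /* Scalar statements and non-narrowing conversions are left unchanged.  */
  return base_cost;
}
```

With an assumed base_cost of 4 and pack_cost of 2, narrowing_cost (4, 2, 64,
32, 0) gives 4 * 2 + 1 * 2 = 10, while the scalar case narrowing_cost (4, 2,
64, 32, 1) stays at 4. A conversion that the SLP vectorizer can do with a
single instruction would be charged the same 10 under this formula, which is
the overcounting described above.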