https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167

--- Comment #15 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Andrew Pinski from comment #14)
> (In reply to Hongtao.liu from comment #13)
> > fold shufps to VEC_PERM_EXPR, but 2 shufps are still generated.
> > 
> > __m128 f (__m128 a, __m128 b)
> > {
> >   vector(4) float _3;
> >   vector(4) float _5;
> >   vector(4) float _6;
> > 
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _3 = VEC_PERM_EXPR <b_2(D), b_2(D), { 0, 0, 0, 0 }>;
> >   _5 = VEC_PERM_EXPR <a_4(D), a_4(D), { 0, 0, 0, 0 }>;
> >   _6 = _3 * _5;
> >   return _6;
> > ;;    succ:       EXIT
> > 
> > }
> 
> So this is a bit more complex as not all targets have good extract/dup
> functionality for scalars. So maybe this should be done as a define_insn for
> x86.

No need for extract/dup: if both permutation indexes are the same, it can be
rewritten as c = a * b followed by VEC_PERM_EXPR <c, c, index>. This seems to
be a quite general optimization which could apply to other operations as well.
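
To illustrate the equivalence, here is a minimal sketch in C, assuming GCC's
vector extensions (vector_size and vector subscripting). The names perm1,
mul_of_perms, perm_of_mul and same_result are hypothetical and only model the
GIMPLE above; this is not the fold as it would be implemented in GCC. It relies
on the operation being lane-wise, so that permuting both inputs with the same
selector and then multiplying gives the same lanes as multiplying first and
permuting once.

```c
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper modelling a single-input
   VEC_PERM_EXPR <v, v, sel>: lane i of the result is v[sel[i]].  */
static v4sf
perm1 (v4sf v, v4si sel)
{
  v4sf r;
  for (int i = 0; i < 4; i++)
    r[i] = v[sel[i] & 3];
  return r;
}

/* The shape in the dump: two permutes feeding one multiply.  */
static v4sf
mul_of_perms (v4sf a, v4sf b, v4si sel)
{
  return perm1 (b, sel) * perm1 (a, sel);
}

/* The proposed shape: c = a * b, then a single permute of c.  */
static v4sf
perm_of_mul (v4sf a, v4sf b, v4si sel)
{
  v4sf c = a * b;
  return perm1 (c, sel);
}

/* Returns 1 if both shapes produce identical lanes (inputs without NaNs).  */
static int
same_result (v4sf a, v4sf b, v4si sel)
{
  v4sf x = mul_of_perms (a, b, sel);
  v4sf y = perm_of_mul (a, b, sel);
  for (int i = 0; i < 4; i++)
    if (x[i] != y[i])
      return 0;
  return 1;
}
```

With sel = { 0, 0, 0, 0 } this corresponds to the testcase in the dump: the
second form needs only one permute instead of two.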
