float64x2 vector operator overloads scalarize on NEON

pinskia at gcc dot gnu.org Fri, 04 Jan 2019 21:49:00 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88705


--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
For the V4SF issue (V2SF has a similar issue too), look at this pattern:
(define_insn "vec_extract<mode>"
  [(set (match_operand:<V_elem> 0 "nonimmediate_operand" "=Um,r")
        (vec_select:<V_elem>
          (match_operand:VQ2 1 "s_register_operand" "w,w")
          (parallel [(match_operand:SI 2 "immediate_operand" "i,i")])))]


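The destination of that pattern allows only Um (a NEON element load/store
address) and r (a core register). There is no w (NEON/VFP register)
alternative, so an extracted element that is needed in a NEON/VFP register
has to go through a general-purpose register (GPR) first.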

There is no vec_extract pattern for V2DF at all, which is why extracting a
V2DF element goes through memory.
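For reference, here is a minimal C sketch of source code that exercises these extraction paths. It assumes GCC's generic vector extension. The v4sf/v2df typedefs and the hsum_* helpers are illustrative names and do not come from the bug. Each constant-index subscript is a lane read. On ARM, a lane read is typically expanded through vec_extract<mode> for V4SF. V2DF has no such pattern, so it falls back to memory.

```c
/* GCC generic vector extension types: the float32x4/float64x2 of the
   bug title.  The names are illustrative.  */
typedef float  v4sf __attribute__((vector_size(16)));
typedef double v2df __attribute__((vector_size(16)));

/* Horizontal sum: every v[k] with a constant k is a single-lane read
   (a vec_select), which is what the extraction patterns handle.  */
float hsum_v4sf(v4sf v)
{
  return v[0] + v[1] + v[2] + v[3];
}

double hsum_v2df(v2df v)
{
  return v[0] + v[1];
}
```

The arithmetic itself is trivial. Compiling it for an ARM NEON target and reading the assembly shows where each lane travels, through core registers or through the stack.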

For the init part, the biggest issue is that neon_expand_vector_init falls back
to building the whole vector in memory. It should instead do element insertions
when there is only a small number of elements (<= 4; you could also use some
GPR logical operations to get down to that number if needed):
  /* Construct the vector in memory one field at a time
     and load the whole vector.  */
  mem = assign_stack_temp (mode, GET_MODE_SIZE (mode));
  for (i = 0; i < n_elts; i++)
    emit_move_insn (adjust_address_nv (mem, inner_mode,
                                    i * GET_MODE_SIZE (inner_mode)),
                    XVECEXP (vals, 0, i));
  emit_move_insn (target, mem);

----- CUT ----
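To make the two approaches concrete, here is a minimal source-level sketch in C. It again assumes GCC's generic vector extension, and the function names are hypothetical. The first function mirrors the quoted fallback: it stores each scalar into a stack temporary and then loads the whole vector. The second expresses the lane-by-lane insertion suggested above. This only models the two strategies in C. The instructions actually emitted are still chosen by the backend's expanders.

```c
#include <string.h>

typedef float v4sf __attribute__((vector_size(16)));

/* Source-level analogue of the quoted fallback: write each element
   into a stack temporary, then load the whole vector from it.  */
v4sf init_via_memory(float a, float b, float c, float d)
{
  float tmp[4] = { a, b, c, d };
  v4sf v;
  memcpy(&v, tmp, sizeof v);
  return v;
}

/* Source-level analogue of the suggested approach for n_elts <= 4:
   start from a vector value and insert one lane at a time.  */
v4sf init_via_insertion(float a, float b, float c, float d)
{
  v4sf v = { 0.0f, 0.0f, 0.0f, 0.0f };
  v[0] = a;
  v[1] = b;
  v[2] = c;
  v[3] = d;
  return v;
}
```

Both functions produce the same vector. The point of the bug is which of these shapes the backend picks when it expands a vector initializer.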
Note that aarch64_expand_vector_init has some interesting ideas that could be
reused here. ARM has even more room for tricks, because the s registers alias
the lower d registers and each q register is a pair of d registers.
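As a rough illustration, here is a hypothetical, simplified strategy selector for a vector initializer in the spirit of those ideas. It is not GCC code. The enum values, struct elt and choose_init_strategy are made-up names. The selector special-cases all-constant and all-equal lanes, uses lane insertion for at most four elements as suggested above, and keeps the memory path only as a last resort.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical strategy names; illustrative only, not GCC enums.  */
enum init_strategy {
  INIT_CONSTANT, /* every lane is a compile-time constant: load it whole */
  INIT_DUP,      /* every lane holds the same value: broadcast it (VDUP) */
  INIT_INSERT,   /* few lanes: insert them one at a time in registers */
  INIT_MEMORY    /* fallback: stack temporary plus one vector load */
};

/* Hypothetical description of one initializer element.  */
struct elt {
  bool is_const; /* element is a compile-time constant */
  int value_id;  /* elements with equal ids hold the same value */
};

enum init_strategy choose_init_strategy(const struct elt *e, size_t n)
{
  size_t n_var = 0;
  bool all_same = true;

  for (size_t i = 0; i < n; i++) {
    if (!e[i].is_const)
      n_var++;
    if (e[i].value_id != e[0].value_id)
      all_same = false;
  }

  if (n_var == 0)
    return INIT_CONSTANT;
  if (all_same)
    return INIT_DUP;
  if (n <= 4)
    return INIT_INSERT;
  return INIT_MEMORY;
}
```

The element-count threshold of four follows the comment above. A wider vector such as V8HI would fall through to the memory path here, unless its elements are first combined in GPRs as the comment suggests.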
