On 20 October 2015 at 08:54, Michael Collison <michael.colli...@linaro.org> wrote: > I want to ask a question about existing patterns in neon.md that utilize the > vec_select and all the lanes as my example does: Why are the following > pattern not matched if the target is big endian?
> (define_insn "neon_vec_unpack<US>_lo_<mode>" > [(set (match_operand:<V_unpack> 0 "register_operand" "=w") > (SE:<V_unpack> (vec_select:<V_HALF> > (match_operand:VU 1 "register_operand" "w") > (match_operand:VU 2 "vect_par_constant_low" ""))))] > "TARGET_NEON && !BYTES_BIG_ENDIAN" > "vmovl.<US><V_sz_elem> %q0, %e1" > [(set_attr "type" "neon_shift_imm_long")] > ) > > (define_insn "neon_vec_unpack<US>_hi_<mode>" > [(set (match_operand:<V_unpack> 0 "register_operand" "=w") > (SE:<V_unpack> (vec_select:<V_HALF> > (match_operand:VU 1 "register_operand" "w") > (match_operand:VU 2 "vect_par_constant_high" ""))))] > "TARGET_NEON && !BYTES_BIG_ENDIAN" > "vmovl.<US><V_sz_elem> %q0, %f1" > [(set_attr "type" "neon_shift_imm_long")] > > These patterns are similar to the new patterns I am adding and I am > wondering if my patterns should exclude BYTES_BIG_ENDIAN? These patterns use %e and %f to access the low and high part of the input operand - so %e is used to match the use of _lo in the pattern name, and vect_par_constant_low, and %f with _hi and vect_par_constant_high. For big-endian, the use of %e and %f would need to be swapped. Looking at the patch you posted last month (possibly not the latest version?): This is a pattern which is supposed to act on the low part of the input vector, hence _lo in the name: +(define_insn "vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3" + [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w") + (plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w") + (match_operand:VQI 2 "vect_par_constant_low" ""))) + (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))] + "TARGET_NEON" + "vaddw.<V_s_elem>\t%q0, %q3, %e1" Here, using %e1 carries an implicit assumption that the low part of the input vector is in the lowest numbered of the pair of D registers, which is only true on little-endian. This is a bit ugly (and untested) but perhaps something like this would fix the problem { return BYTES_BIG_ENDIAN ? "vaddw.<V_s_elem>\t%q0, %q3, %f1" : "vaddw.<V_s_elem>\t%q0, %q3, %e1"; } + [(set_attr "type" "neon_add_widen") + (set_attr "length" "8")] +) Similarly, here. Pattern is _hi, register is %f1: +(define_insn "vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3" + [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w") + (plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w") + (match_operand:VQI 2 "vect_par_constant_high" ""))) + (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))] + "TARGET_NEON" + "vaddw.<V_s_elem>\t%q0, %q3, %f1" + [(set_attr "type" "neon_add_widen") + (set_attr "length" "8")] +) However, as far as I can see, there isn't an endianness dependency in widen_ssum<mode>3/widen_usum<mode>3 because both halves of the vector are used and added together. Hope this helps Charles