On 20 October 2015 at 08:54, Michael Collison
<[email protected]> wrote:
> I want to ask a question about existing patterns in neon.md that utilize the
> vec_select and all the lanes as my example does: Why are the following
> patterns not matched if the target is big endian?
> (define_insn "neon_vec_unpack<US>_lo_<mode>"
> [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
> (SE:<V_unpack> (vec_select:<V_HALF>
> (match_operand:VU 1 "register_operand" "w")
> (match_operand:VU 2 "vect_par_constant_low" ""))))]
> "TARGET_NEON && !BYTES_BIG_ENDIAN"
> "vmovl.<US><V_sz_elem> %q0, %e1"
> [(set_attr "type" "neon_shift_imm_long")]
> )
>
> (define_insn "neon_vec_unpack<US>_hi_<mode>"
> [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
> (SE:<V_unpack> (vec_select:<V_HALF>
> (match_operand:VU 1 "register_operand" "w")
> (match_operand:VU 2 "vect_par_constant_high" ""))))]
> "TARGET_NEON && !BYTES_BIG_ENDIAN"
> "vmovl.<US><V_sz_elem> %q0, %f1"
> [(set_attr "type" "neon_shift_imm_long")]
> )
>
> These patterns are similar to the new patterns I am adding and I am
> wondering if my patterns should exclude BYTES_BIG_ENDIAN?
These patterns use %e and %f to access the low and high halves of the
input operand: %e pairs with _lo in the pattern name and with
vect_par_constant_low, and %f pairs with _hi and with
vect_par_constant_high. For big-endian, the uses of %e and %f would
need to be swapped.
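To make the mapping concrete, here is a small stand-alone C sketch of
which modifier each pattern would need. It is not GCC code:
half_modifier and its parameters are names made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: return the operand modifier
   a _lo or _hi pattern would need to print.  'e' names the
   lower-numbered D register of the Q register, 'f' the
   higher-numbered one.  */
static char
half_modifier (bool want_high, bool bytes_big_endian)
{
  /* Little-endian: _lo -> %e, _hi -> %f.  Big-endian: swapped.  */
  return (want_high != bytes_big_endian) ? 'f' : 'e';
}
```

So a _lo pattern would print %e on little-endian and %f on
big-endian, and a _hi pattern the reverse.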
Looking at the patch you posted last month (possibly not the latest version?):
This is a pattern which is supposed to act on the low part of the
input vector, hence _lo in the name:
+(define_insn "vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3"
+ [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+ (plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW
+ (match_operand:VQI 1 "s_register_operand" "%w")
+ (match_operand:VQI 2 "vect_par_constant_low" "")))
+ (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+ "TARGET_NEON"
+ "vaddw.<V_s_elem>\t%q0, %q3, %e1"
Here, using %e1 carries an implicit assumption that the low part of
the input vector is in the lowest numbered of the pair of D registers,
which is only true on little-endian.
This is a bit ugly (and untested), but perhaps something like this
in place of the output template would fix the problem:
{
return BYTES_BIG_ENDIAN ? "vaddw.<V_s_elem>\t%q0, %q3, %f1" :
"vaddw.<V_s_elem>\t%q0, %q3, %e1";
}
+ [(set_attr "type" "neon_add_widen")
+ (set_attr "length" "8")]
+)
The same applies here, where the pattern is _hi and the register is %f1:
+(define_insn "vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3"
+ [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+ (plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW
+ (match_operand:VQI 1 "s_register_operand" "%w")
+ (match_operand:VQI 2 "vect_par_constant_high" "")))
+ (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+ "TARGET_NEON"
+ "vaddw.<V_s_elem>\t%q0, %q3, %f1"
+ [(set_attr "type" "neon_add_widen")
+ (set_attr "length" "8")]
+)
However, as far as I can see, there isn't an endianness dependency in
widen_ssum<mode>3/widen_usum<mode>3 because both halves of the vector
are used and added together.
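As a rough illustration of why the order does not matter there, here
is a scalar C model. It is only a sketch, not GCC code: it assumes
each half is widened and accumulated lane-wise into the same
accumulator, and widen_sum_halves, the lane count of 4, and the
element types are all made up for the example.

```c
#include <stdint.h>

/* Hypothetical scalar model: widen both 4-lane halves of an input
   vector and add them, lane by lane, into the accumulator.  */
static void
widen_sum_halves (int32_t acc[4], const int16_t half_a[4],
                  const int16_t half_b[4])
{
  for (int i = 0; i < 4; i++)
    acc[i] += (int32_t) half_a[i] + (int32_t) half_b[i];
}
```

Because addition is commutative, passing the two halves in either
order gives the same accumulator, so it does not matter which D
register is treated as "low".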
Hope this helps.
Charles