Re: [ARM] Use vector wide add for mixed-mode adds

Charles Baylis Wed, 21 Oct 2015 08:06:10 -0700

On 20 October 2015 at 08:54, Michael Collison
<[email protected]> wrote:
> I want to ask a question about existing patterns in neon.md that utilize the
> vec_select and all the lanes as my example does: Why are the following
> patterns not matched if the target is big endian?


> (define_insn "neon_vec_unpack<US>_lo_<mode>"
>   [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
>         (SE:<V_unpack> (vec_select:<V_HALF>
>               (match_operand:VU 1 "register_operand" "w")
>               (match_operand:VU 2 "vect_par_constant_low" ""))))]
>   "TARGET_NEON && !BYTES_BIG_ENDIAN"
>   "vmovl.<US><V_sz_elem> %q0, %e1"
>   [(set_attr "type" "neon_shift_imm_long")]
> )
>
> (define_insn "neon_vec_unpack<US>_hi_<mode>"
>   [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
>         (SE:<V_unpack> (vec_select:<V_HALF>
>               (match_operand:VU 1 "register_operand" "w")
>               (match_operand:VU 2 "vect_par_constant_high" ""))))]
>   "TARGET_NEON && !BYTES_BIG_ENDIAN"
>   "vmovl.<US><V_sz_elem> %q0, %f1"
>   [(set_attr "type" "neon_shift_imm_long")]
> )
>
> These patterns are similar to the new patterns I am adding and I am
> wondering if my patterns should exclude BYTES_BIG_ENDIAN?

These patterns use %e and %f to access the low and high part of the
input operand - so %e is used to match the use of _lo in the pattern
name, and vect_par_constant_low, and %f with _hi and
vect_par_constant_high. For big-endian, the use of %e and %f would
need to be swapped.
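
To make the mapping concrete, here is a minimal C sketch of the rule described above. It is only a model for illustration, not GCC code: the enum and the function names are hypothetical, and it assumes the usual NEON register aliasing in which Qn overlays D(2n) and D(2n+1), with %e printing the lower-numbered D register and %f the higher-numbered one.

```c
#include <assert.h>
#include <stdbool.h>

/* Which half of the input vector a pattern selects (hypothetical enum). */
enum vec_half { VEC_HALF_LO, VEC_HALF_HI };

/* D register printed by %e for quad register Qn: the lower-numbered one. */
static int dreg_for_e(int qreg)
{
    assert(qreg >= 0 && qreg < 16);
    return 2 * qreg;
}

/* D register printed by %f for quad register Qn: the higher-numbered one. */
static int dreg_for_f(int qreg)
{
    assert(qreg >= 0 && qreg < 16);
    return 2 * qreg + 1;
}

/* Modifier letter that addresses the requested half of the vector.
   Little-endian: lo -> %e, hi -> %f.  Big-endian: the two are swapped. */
static char modifier_for_half(enum vec_half half, bool bytes_big_endian)
{
    bool want_lower_dreg = (half == VEC_HALF_LO) != bytes_big_endian;
    return want_lower_dreg ? 'e' : 'f';
}
```

With this model, a _lo pattern on little-endian picks %e and the same pattern on big-endian picks %f, which is the swap mentioned above.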

Looking at the patch you posted last month (possibly not the latest version?):

This is a pattern which is supposed to act on the low part of the
input vector, hence _lo in the name:
+(define_insn "vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+ (plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW
+   (match_operand:VQI 1 "s_register_operand" "%w")
+   (match_operand:VQI 2 "vect_par_constant_low" "")))
+        (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.<V_s_elem>\t%q0, %q3, %e1"

Here, using %e1 carries an implicit assumption that the low part of
the input vector is in the lowest numbered of the pair of D registers,
which is only true on little-endian.

This is a bit ugly (and untested), but perhaps something like this
would fix the problem:
{
    return BYTES_BIG_ENDIAN ? "vaddw.<V_s_elem>\t%q0, %q3, %f1"
                            : "vaddw.<V_s_elem>\t%q0, %q3, %e1";
}

+  [(set_attr "type" "neon_add_widen")
+  (set_attr "length" "8")]
+)

Similarly here: the pattern is _hi and the register is %f1, so on
big-endian it would need to use %e1 instead:

+(define_insn "vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+ (plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW
+   (match_operand:VQI 1 "s_register_operand" "%w")
+   (match_operand:VQI 2 "vect_par_constant_high" "")))
+        (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.<V_s_elem>\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")
+  (set_attr "length" "8")]
+)

However, as far as I can see, there isn't an endianness dependency in
widen_ssum<mode>3/widen_usum<mode>3 because both halves of the vector
are used and added together.
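
As a rough illustration of why the order of the halves should not matter there, here is a small C model. It is a sketch under assumptions rather than the actual expansion: the function name, the lane count and the element types are hypothetical, and it simply widens each half and accumulates it lane by lane.

```c
#include <assert.h>
#include <stdint.h>

#define HALF_LANES 4  /* assumed: an 8 x 16-bit input split into two halves */

/* Model of a widening sum that consumes both halves of the input:
   each lane of the accumulator receives one sign-extended element from
   each half.  Which half is passed as "first" depends on how the halves
   map to D registers, so the two call orders stand in for the two
   endiannesses. */
static void widen_ssum_model(int32_t acc[HALF_LANES],
                             const int16_t first[HALF_LANES],
                             const int16_t second[HALF_LANES])
{
    for (int i = 0; i < HALF_LANES; i++)
        acc[i] += (int32_t) first[i];
    for (int i = 0; i < HALF_LANES; i++)
        acc[i] += (int32_t) second[i];
}
```

Because each lane only ever sees an addition of the same two elements, calling the model with the halves in either order leaves the accumulator with the same contents.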


Hope this helps
Charles
