Tamar Christina <tamar.christ...@arm.com> writes:
> Hi All,
>
> The previous fix for this problem was wrong due to a subtle difference between
> where NEON expects the RMW values and where intrinsics expects them.
>
> The insn pattern is modeled after the intrinsics and so needs an expand for
> the vectorizer optab to switch the RTL.
>
> However operand[3] is not expected to be written to so the current pattern is
> bogus.
>
> Instead we use the expand to shuffle around the RTL.
>
> The vectorizer expects operands[3] and operands[0] to be
> the same but the aarch64 intrinsics expanders expect operands[0] and
> operands[1] to be the same.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master? and active branches after some stew?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>       * config/aarch64/aarch64-simd.md (<sur>dot_prod<vsi2qi>): Correct
>       RTL.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 7397f1ec5ca0cb9e3cdd5c46772f604e640666e4..51789f954affd9fa88e2bc1bcc3dacf64ccb5bde
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -635,18 +635,12 @@ (define_insn "aarch64_usdot<vsi2qi>"
>  ;; and so the vectorizer provides r, in which the result has to be 
> accumulated.
>  (define_expand "<sur>dot_prod<vsi2qi>"
>    [(set (match_operand:VS 0 "register_operand")
> -     (plus:VS (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
> +     (plus:VS (match_operand:VS 3 "register_operand")
> +              (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
>                           (match_operand:<VSI2QI> 2 "register_operand")]
> -              DOTPROD)
> -             (match_operand:VS 3 "register_operand")))]
> +              DOTPROD)))]
>    "TARGET_DOTPROD"

The canonical plus: operand order was the original one, so I think
it would be better to keep this rtl as-is and instead change
aarch64_<sur>dot<vsi2qi> to:

        (plus:VS (unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
                             (match_operand:<VSI2QI> 3 "register_operand" "w")]
                            DOTPROD)
                 (match_operand:VS 1 "register_operand" "0"))

Same idea for aarch64_<sur>dot_lane<vsi2qi> and
aarch64_<sur>dot_laneq<vsi2qi>.

Sorry to be awkward…

Thanks,
Richard

> -{
> -  emit_insn (
> -    gen_aarch64_<sur>dot<vsi2qi> (operands[3], operands[3], operands[1],
> -                                 operands[2]));
> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
> -  DONE;
> -})
> +)
>  
>  ;; Auto-vectorizer pattern for usdot.  The operand[3] and operand[0] are the
>  ;; RMW parameters that when it comes to the vectorizer.

Reply via email to