Tamar Christina <tamar.christ...@arm.com> writes: > Hi All, > > The previous fix for this problem was wrong due to a subtle difference between > where NEON expects the RMW values and where intrinsics expects them. > > The insn pattern is modeled after the intrinsics and so needs an expand for > the vectorizer optab to switch the RTL. > > However operand[3] is not expected to be written to so the current pattern is > bogus. > > Instead we use the expand to shuffle around the RTL. > > The vectorizer expects operands[3] and operands[0] to be > the same but the aarch64 intrinsics expanders expect operands[0] and > operands[1] to be the same. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? and active branches after some stew? > > Thanks, > Tamar > > gcc/ChangeLog: > > * config/aarch64/aarch64-simd.md (<sur>dot_prod<vsi2qi>): Correct > RTL. > > --- inline copy of patch -- > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index > 7397f1ec5ca0cb9e3cdd5c46772f604e640666e4..51789f954affd9fa88e2bc1bcc3dacf64ccb5bde > 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -635,18 +635,12 @@ (define_insn "aarch64_usdot<vsi2qi>" > ;; and so the vectorizer provides r, in which the result has to be > accumulated. > (define_expand "<sur>dot_prod<vsi2qi>" > [(set (match_operand:VS 0 "register_operand") > - (plus:VS (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand") > + (plus:VS (match_operand:VS 3 "register_operand") > + (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand") > (match_operand:<VSI2QI> 2 "register_operand")] > - DOTPROD) > - (match_operand:VS 3 "register_operand")))] > + DOTPROD)))] > "TARGET_DOTPROD"
The canonical plus: operand order was the original one, so I think it would be better to keep this rtl as-is and instead change aarch64_<sur>dot<vsi2qi> to: (plus:VS (unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w") (match_operand:<VSI2QI> 3 "register_operand" "w")] DOTPROD) (match_operand:VS 1 "register_operand" "0")) Same idea for aarch64_<sur>dot_lane<vsi2qi> and aarch64_<sur>dot_laneq<vsi2qi>. Sorry to be awkward… Thanks, Richard > -{ > - emit_insn ( > - gen_aarch64_<sur>dot<vsi2qi> (operands[3], operands[3], operands[1], > - operands[2])); > - emit_insn (gen_rtx_SET (operands[0], operands[3])); > - DONE; > -}) > +) > > ;; Auto-vectorizer pattern for usdot. The operand[3] and operand[0] are the > ;; RMW parameters that when it comes to the vectorizer.