On Sun, Nov 11, 2018 at 11:28 AM Tamar Christina <tamar.christ...@arm.com> wrote:
>
> Hi All,
>
> This patch adds the expander support for autovectorization of complex
> number operations such as complex addition with a rotation along the
> Argand plane.  It also adds support for complex FMA.
>
> The instructions are described in the Arm ARM [1] and are available from
> Armv8.3-a onwards.
>
> Concretely, this generates
>
> f90:
>         add     ip, r1, #15
>         add     r3, r0, #15
>         sub     r3, r3, r2
>         sub     ip, ip, r2
>         cmp     ip, #30
>         cmphi   r3, #30
>         add     r3, r0, #1600
>         bls     .L5
> .L3:
>         vld1.32 {q8}, [r0]!
>         vld1.32 {q9}, [r1]!
>         vcadd.f32       q8, q8, q9, #90
>         vst1.32 {q8}, [r2]!
>         cmp     r0, r3
>         bne     .L3
>         bx      lr
> .L5:
>         vld1.32 {d16}, [r0]!
>         vld1.32 {d17}, [r1]!
>         vcadd.f32       d16, d16, d17, #90
>         vst1.32 {d16}, [r2]!
>         cmp     r0, r3
>         bne     .L5
>         bx      lr
>
> now instead of
>
> f90:
>         add     ip, r1, #31
>         add     r3, r0, #31
>         sub     r3, r3, r2
>         sub     ip, ip, r2
>         cmp     ip, #62
>         cmphi   r3, #62
>         add     r3, r0, #1600
>         bls     .L2
> .L3:
>         vld2.32 {d20-d23}, [r0]!
>         vld2.32 {d24-d27}, [r1]!
>         cmp     r0, r3
>         vsub.f32        q8, q10, q13
>         vadd.f32        q9, q12, q11
>         vst2.32 {d16-d19}, [r2]!
>         bne     .L3
>         bx      lr
> .L2:
>         vldr    d19, .L10
> .L5:
>         vld1.32 {d16}, [r1]!
>         vld1.32 {d18}, [r0]!
>         vrev64.32       d16, d16
>         cmp     r0, r3
>         vsub.f32        d17, d18, d16
>         vadd.f32        d16, d16, d18
>         vswp    d16, d17
>         vtbl.8  d16, {d16, d17}, d19
>         vst1.32 {d16}, [r2]!
>         bne     .L5
>         bx      lr
> .L11:
>         .align  3
> .L10:
>         .byte   0
>         .byte   1
>         .byte   2
>         .byte   3
>         .byte   12
>         .byte   13
>         .byte   14
>         .byte   15
>
> for complex additions with a 90° rotation along the Argand plane.
>
> [1] https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
>
> Bootstrap and regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and
> x86_64-pc-linux-gnu are still ongoing, but the previous patch showed no
> regressions.
>
> The instructions have also been tested on aarch64-none-elf and
> arm-none-eabi on an Armv8.3-a model with -march=armv8.3-a+fp16, and all
> tests pass.
>
> Ok for trunk?
+;; The complex mla operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "fcmla<rot><mode>4"
+  [(set (match_operand:VF 0 "register_operand")
+	(plus:VF (match_operand:VF 1 "register_operand")
+		 (unspec:VF [(match_operand:VF 2 "register_operand")
+			     (match_operand:VF 3 "register_operand")]
+			    VCMLA)))]
+  "TARGET_COMPLEX"
+{
+  emit_insn (gen_neon_vcmla<rotsplit1><mode> (operands[0], operands[1],
+					      operands[2], operands[3]));
+  emit_insn (gen_neon_vcmla<rotsplit2><mode> (operands[0], operands[0],
+					      operands[2], operands[3]));
+  DONE;
+})

What are the two halves?  And why hide this from the vectorizer when you
go into such detail as exposing the rotation to it?

+;; The vcadd and vcmla patterns are made UNSPEC explicitly because their
+;; use needs to guarantee that the source vectors are contiguous.  It
+;; would be wrong to describe the operation without being able to describe
+;; the permute that is also required, but even if that were done the
+;; permute would have been created as a LOAD_LANES, which means the values
+;; in the registers are in the wrong order.

Hmm, it's totally non-obvious to me how this relates to loads or what a
"non-contiguous" register would be.  And once you make this an unspec,
combine will never be able to synthesize it from intrinsics code that
doesn't use this form.

+(define_insn "neon_vcadd<rot><mode>"
+  [(set (match_operand:VF 0 "register_operand" "=w")
+	(unspec:VF [(match_operand:VF 1 "register_operand" "w")
+		    (match_operand:VF 2 "register_operand" "w")]
+		   VCADD))]

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 2018-11-11  Tamar Christina  <tamar.christ...@arm.com>
>
> 	* config/arm/arm.c (arm_arch8_3, arm_arch8_4): New.
> 	* config/arm/arm.h (TARGET_COMPLEX, arm_arch8_3, arm_arch8_4): New.
> 	(arm_option_reconfigure_globals): Use them.
> 	* config/arm/iterators.md (VDF, VQ_HSF): New.
> 	(VCADD, VCMLA): New.
> 	(VF_constraint, rot, rotsplit1, rotsplit2): Add V4HF and V8HF.
> 	* config/arm/neon.md (neon_vcadd<rot><mode>, fcadd<rot><mode>3,
> 	neon_vcmla<rot><mode>, fcmla<rot><mode>4): New.
> 	* config/arm/unspecs.md (UNSPEC_VCADD90, UNSPEC_VCADD270,
> 	UNSPEC_VCMLA, UNSPEC_VCMLA90, UNSPEC_VCMLA180, UNSPEC_VCMLA270): New.
>
> gcc/testsuite/ChangeLog:
>
> 2018-11-11  Tamar Christina  <tamar.christ...@arm.com>
>
> 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c: Add Arm
> 	support.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_2.c: Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_3.c: Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_4.c: Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_5.c: Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_6.c: Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_1.c: Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_2.c: Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_3.c: Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_4.c: Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_5.c: Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_6.c: Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_1.c: Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_1.c:
> 	Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_2.c:
> 	Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_3.c:
> 	Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_2.c: Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_1.c:
> 	Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_2.c:
> 	Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_3.c:
> 	Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_3.c: Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_1.c:
> 	Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_2.c:
> 	Likewise.
> 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_3.c:
> 	Likewise.
>
> --