> From: James Greenhalgh <james.greenha...@arm.com>
> Sent: 31 May 2016 16:24
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org; nd
> Subject: Re: [PATCH AArch64]Support missing vcond pattern by adding/using
> vec_cmp/vcond_mask patterns.
>
> On Tue, May 17, 2016 at 09:02:22AM +0000, Bin Cheng wrote:
> > Hi,
> > Alan and Renlin noticed that some vcond patterns are not supported in
> > AArch64(or AArch32?) backend, and they both had some patches fixing this.
> > After investigation, I agree with them that vcond/vcondu in AArch64's
> > backend should be re-implemented using vec_cmp/vcond_mask patterns, so
> > here comes this patch which is based on Alan's.  This patch supports all
> > vcond/vcondu patterns by implementing/using vec_cmp and vcond_mask
> > patterns.  Different to the original patch, it doesn't change GCC's
> > expanding process, and it keeps vcond patterns.  The patch also introduces
> > vec_cmp*_internal to support special case optimization for vcond/vcondu
> > which current implementation does.  Apart from Alan's patch, I also
> > learned ideas from Renlin's, and it is my change that shall be blamed if
> > any potential bug is introduced.
> >
> > With this patch, GCC's test condition "vect_cond_mixed" can be enabled on
> > AArch64 (in a following patch).  Bootstrap and test on AArch64.  Is it OK?
> > BTW, this patch is necessary for gcc.dg/vect/PR56541.c (on AArch64) which
> > was added before in tree if-conversion patch.
>
> Splitting this patch would have been very helpful.  One patch each for the
> new standard pattern names, and one patch for the refactor of vcond.  As
> it is, this patch is rather difficult to read.

Done.  The patch is now split into two: one implementing the new vcond_mask
and vec_cmp patterns, and another rewriting the vcond patterns.
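For readers less familiar with the optab names, here is a rough C sketch of
the two-step expansion that vcond now follows (my illustration, not code from
the patch): a vector comparison producing a per-lane all-ones/all-zeros mask
(vec_cmp), followed by a mask-based select (vcond_mask, done with BSL on
AArch64).  The lane count and the "<" comparison are arbitrary choices for
the example.

#include <stdint.h>

/* Illustrative sketch only: per-lane behaviour of vec_cmp + vcond_mask.  */
void
vcond_sketch (int32_t dest[4], const int32_t a[4], const int32_t b[4],
              const int32_t op1[4], const int32_t op2[4])
{
  for (int i = 0; i < 4; i++)
    {
      /* vec_cmp: all-ones mask where the comparison holds, zero elsewhere.  */
      int32_t mask = (a[i] < b[i]) ? -1 : 0;
      /* vcond_mask: op1 bits under the mask, op2 bits otherwise (BSL).  */
      dest[i] = (op1[i] & mask) | (op2[i] & ~mask);
    }
}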
>
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64-simd.md
> > index bd73bce..f51473a 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -1053,7 +1053,7 @@
> >    }
> >
> >    cmp_fmt = gen_rtx_fmt_ee (cmp_operator, V2DImode, operands[1],
> >      operands[2]);
> > -  emit_insn (gen_aarch64_vcond_internalv2div2di (operands[0], operands[1],
> > +  emit_insn (gen_vcondv2div2di (operands[0], operands[1],
> >      operands[2], cmp_fmt, operands[1], operands[2]));
> >    DONE;
> >  })
> > @@ -2225,204 +2225,215 @@
> >    DONE;
> >  })
> >
> > -(define_expand "aarch64_vcond_internal<mode><mode>"
> > +(define_expand "vcond_mask_<mode><v_cmp_result>"
> > +  [(match_operand:VALLDI 0 "register_operand")
> > +   (match_operand:VALLDI 1 "nonmemory_operand")
> > +   (match_operand:VALLDI 2 "nonmemory_operand")
> > +   (match_operand:<V_cmp_result> 3 "register_operand")]
> > +  "TARGET_SIMD"
> > +{
> > +  /* If we have (a = (P) ? -1 : 0);
> > +     Then we can simply move the generated mask (result must be int).  */
> > +  if (operands[1] == CONSTM1_RTX (<MODE>mode)
> > +      && operands[2] == CONST0_RTX (<MODE>mode))
> > +    emit_move_insn (operands[0], operands[3]);
> > +  /* Similarly, (a = (P) ? 0 : -1) is just inverting the generated mask.  */
> > +  else if (operands[1] == CONST0_RTX (<MODE>mode)
> > +           && operands[2] == CONSTM1_RTX (<MODE>mode))
> > +    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[3]));
> > +  else
> > +    {
> > +      if (!REG_P (operands[1]))
> > +        operands[1] = force_reg (<MODE>mode, operands[1]);
> > +      if (!REG_P (operands[2]))
> > +        operands[2] = force_reg (<MODE>mode, operands[2]);
> > +      emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], operands[3],
> > +                                             operands[1], operands[2]));
> > +    }
> > +
> > +  DONE;
> > +})
> > +
>
> This pattern is fine.
>
> > +;; Patterns comparing two vectors to produce a mask.
>
> This comment is insufficient.  The logic in vec_cmp<mode><mode>_internal
> does not always return the expected mask (in particular for NE), but this
> is not made clear in the comment.

Comments added to various places to make the code easier to understand and
maintain.

>
> > +
> > +(define_expand "vec_cmp<mode><mode>_internal"
> >    [(set (match_operand:VSDQ_I_DI 0 "register_operand")
> > -        (if_then_else:VSDQ_I_DI
> > -          (match_operator 3 "comparison_operator"
> > -            [(match_operand:VSDQ_I_DI 4 "register_operand")
> > -             (match_operand:VSDQ_I_DI 5 "nonmemory_operand")])
> > -          (match_operand:VSDQ_I_DI 1 "nonmemory_operand")
> > -          (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
> > +        (match_operator 1 "comparison_operator"
> > +            [(match_operand:VSDQ_I_DI 2 "register_operand")
> > +             (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
> >    "TARGET_SIMD"
>
> <snip>
>
> > +(define_expand "vec_cmp<mode><mode>"
> > +  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
> > +        (match_operator 1 "comparison_operator"
> > +            [(match_operand:VSDQ_I_DI 2 "register_operand")
> > +             (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
> > +  "TARGET_SIMD"
> > +{
> > +  enum rtx_code code = GET_CODE (operands[1]);
> > +
> > +  emit_insn (gen_vec_cmp<mode><mode>_internal (operands[0],
> > +                                               operands[1], operands[2],
> > +                                               operands[3]));
> > +  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
> > +     operators are computed using other operators, and we need to
> > +     revert the result.  */
>
> s/revert/invert
>
> There are no comments in vec_cmp<mode><v_cmp_result>_internal for this
> to refer to.
> But, there should be - more comments would definitely help
> this patch!
>
> These comments apply equally to the other copies of this comment in the
> patch.

Done.

>
> > +  if (code == NE)
> > +    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
> >    DONE;
> >  })
> >
> > -(define_expand "aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>"
> > -  [(set (match_operand:VDQF_COND 0 "register_operand")
> > -        (if_then_else:VDQF
> > -          (match_operator 3 "comparison_operator"
> > -            [(match_operand:VDQF 4 "register_operand")
> > -             (match_operand:VDQF 5 "nonmemory_operand")])
> > -          (match_operand:VDQF_COND 1 "nonmemory_operand")
> > -          (match_operand:VDQF_COND 2 "nonmemory_operand")))]
> > +(define_expand "vec_cmp<mode><v_cmp_result>_internal"
> > +  [(set (match_operand:<V_cmp_result> 0 "register_operand")
> > +        (match_operator 1 "comparison_operator"
> > +            [(match_operand:VDQF 2 "register_operand")
> > +             (match_operand:VDQF 3 "nonmemory_operand")]))]
>
> This needs much more thorough commenting.  The logic is now extremely
> difficult to verify as to which pattern is responible for inverting the
> mask.  This makes it hard to spot whether the inversions applied to the
> unordered comparisons are correct.  I can't really review the patch with
> the logic split across uncommented patterns in this way - is there
> anything you can do to clean it up?

Done.

>
> > diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vcond.c
> > b/gcc/testsuite/gcc.target/aarch64/vect-vcond.c
> > new file mode 100644
> > index 0000000..b0ca6d1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/vect-vcond.c
>
> None of this is AArch64 specific?

Test case removed.  I will enable vect_cond_mixed on AArch64 in a following
patch, which enables the generic tests for mixed vcond vectorization, so an
AArch64-specific test is unnecessary here.

>
> > @@ -0,0 +1,59 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> > +
> > +float fa[1024] = {0.0}, fb[1024] = {0.0}, fc[1024];
> > +double da[1024] = {0.0}, db[1024] = {0.0}, dc[1024];
> > +
> > +int vcond_f_i (int *x)
> > +{
> > +  int i = 0;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      fa[i] -= 1.0;
> > +      fb[i] += 1.0;
> > +      fc[i] = (x[i] < i) ? fa[i] : fb[i];
> > +    }
> > +
> > +  return 0;
> > +}
> > +
> > +int vcond_f_u (unsigned int *x)
> > +{
> > +  unsigned int i = 0;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      fa[i] -= 1.0;
> > +      fb[i] += 1.0;
> > +      fc[i] = (x[i] < i) ? fa[i] : fb[i];
> > +    }
> > +
> > +  return 0;
> > +}
> > +
> > +int vcond_d_i (long long *x)
> > +{
> > +  long long i = 0;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      da[i] -= 1.0;
> > +      db[i] += 1.0;
> > +      dc[i] = (x[i] < i) ? da[i] : db[i];
> > +    }
> > +
> > +  return 0;
> > +}
> > +
> > +int vcond_d_u (unsigned long long *x)
> > +{
> > +  unsigned long long i = 0;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      da[i] -= 1.0;
> > +      db[i] += 1.0;
> > +      dc[i] = (x[i] < i) ? da[i] : db[i];
> > +    }
> > +
> > +  return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
>

Patches attached.  Bootstrapped and tested on AArch64.  Is it OK?

Thanks,
bin

2016-06-07  Alan Lawrence  <alan.lawre...@arm.com>
            Renlin Li  <renlin...@arm.com>
            Bin Cheng  <bin.ch...@arm.com>

        * config/aarch64/aarch64-simd.md (vec_cmp<mode><mode>): New pattern.
        (vec_cmp<mode><mode>_internal): New pattern.
        (vec_cmp<mode><v_cmp_result>): New pattern.
        (vec_cmp<mode><v_cmp_result>_internal): New pattern.
        (vec_cmpu<mode><mode>): New pattern.
        (vcond_mask_<mode><v_cmp_result>): New pattern.

2016-06-07  Alan Lawrence  <alan.lawre...@arm.com>
            Renlin Li  <renlin...@arm.com>
            Bin Cheng  <bin.ch...@arm.com>

        * config/aarch64/iterators.md (V_cmp_mixed, v_cmp_mixed): New.
        * config/aarch64/aarch64-simd.md (<su><maxmin>v2di3): Call
        gen_vcondv2div2di instead of gen_aarch64_vcond_internalv2div2di.
        (aarch64_vcond_internal<mode><mode>): Delete pattern.
        (aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>): Ditto.
        (vcond<mode><mode>): Re-implement using vec_cmp and vcond_mask.
        (vcondu<mode><mode>): Ditto.
        (vcond<v_cmp_result><mode>): Delete.
        (vcond<v_cmp_mixed><mode>): New pattern.
        (vcondu<mode><v_cmp_mixed>): New pattern.
        (aarch64_cmtst<mode>): Revise comment to use aarch64_vcond instead
        of aarch64_vcond_internal.
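For reference, the mixed-type selects that the new vcond<v_cmp_mixed><mode>
and vcondu<mode><v_cmp_mixed> patterns handle are loops of the shape used in
the dropped target test, for example (repeated here purely as illustration):

float fa[1024], fb[1024], fc[1024];

/* Float data selected by a signed integer comparison; the unsigned and
   double variants are analogous.  */
int
vcond_f_i (int *x)
{
  for (int i = 0; i < 1024; i++)
    {
      fa[i] -= 1.0;
      fb[i] += 1.0;
      fc[i] = (x[i] < i) ? fa[i] : fb[i];
    }
  return 0;
}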
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 6ea35bf..9437d02 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2202,6 +2202,325 @@ DONE; }) +(define_expand "vcond_mask_<mode><v_cmp_result>" + [(match_operand:VALLDI 0 "register_operand") + (match_operand:VALLDI 1 "nonmemory_operand") + (match_operand:VALLDI 2 "nonmemory_operand") + (match_operand:<V_cmp_result> 3 "register_operand")] + "TARGET_SIMD" +{ + /* If we have (a = (P) ? -1 : 0); + Then we can simply move the generated mask (result must be int). */ + if (operands[1] == CONSTM1_RTX (<MODE>mode) + && operands[2] == CONST0_RTX (<MODE>mode)) + emit_move_insn (operands[0], operands[3]); + /* Similarly, (a = (P) ? 0 : -1) is just inverting the generated mask. */ + else if (operands[1] == CONST0_RTX (<MODE>mode) + && operands[2] == CONSTM1_RTX (<MODE>mode)) + emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[3])); + else + { + if (!REG_P (operands[1])) + operands[1] = force_reg (<MODE>mode, operands[1]); + if (!REG_P (operands[2])) + operands[2] = force_reg (<MODE>mode, operands[2]); + emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], operands[3], + operands[1], operands[2])); + } + + DONE; +}) + +;; Patterns comparing two vectors to produce a mask. + +;; Internal pattern for vec_cmp. It returns expected result mask for +;; comparison operators other than NE. For NE operator, it returns +;; the opposite result mask. This is intended behavior so that we +;; can save one mask inverting instruction when using this pattern in +;; vcond patterns. In this case, it is the caller's responsibility +;; to interpret and use the result mask correctly. +(define_expand "vec_cmp<mode><mode>_internal" + [(set (match_operand:VSDQ_I_DI 0 "register_operand") + (match_operator 1 "comparison_operator" + [(match_operand:VSDQ_I_DI 2 "register_operand") + (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))] + "TARGET_SIMD" +{ + rtx mask = operands[0]; + enum rtx_code code = GET_CODE (operands[1]); + + if (operands[3] == CONST0_RTX (<MODE>mode) + && (code == NE || code == LE || code == LT + || code == GE || code == GT || code == EQ)) + { + /* Some instructions have a form taking an immediate zero. */ + ; + } + else if (!REG_P (operands[3])) + { + /* Make sure we can handle the last operand. */ + operands[3] = force_reg (<MODE>mode, operands[3]); + } + + switch (code) + { + case LT: + emit_insn (gen_aarch64_cmlt<mode> (mask, operands[2], operands[3])); + break; + + case GE: + emit_insn (gen_aarch64_cmge<mode> (mask, operands[2], operands[3])); + break; + + case LE: + emit_insn (gen_aarch64_cmle<mode> (mask, operands[2], operands[3])); + break; + + case GT: + emit_insn (gen_aarch64_cmgt<mode> (mask, operands[2], operands[3])); + break; + + case LTU: + emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[3], operands[2])); + break; + + case GEU: + emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[2], operands[3])); + break; + + case LEU: + emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[3], operands[2])); + break; + + case GTU: + emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[2], operands[3])); + break; + + case NE: + /* See comment at the beginning of this pattern, we return the + opposite of result mask for this operator, and it's caller's + resonsibility to invert the mask. + + Fall through. 
*/ + + case EQ: + emit_insn (gen_aarch64_cmeq<mode> (mask, operands[2], operands[3])); + break; + + default: + gcc_unreachable (); + } + + /* Don't invert result mask for NE, which should be done in caller. */ + + DONE; +}) + +(define_expand "vec_cmp<mode><mode>" + [(set (match_operand:VSDQ_I_DI 0 "register_operand") + (match_operator 1 "comparison_operator" + [(match_operand:VSDQ_I_DI 2 "register_operand") + (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))] + "TARGET_SIMD" +{ + enum rtx_code code = GET_CODE (operands[1]); + + emit_insn (gen_vec_cmp<mode><mode>_internal (operands[0], + operands[1], operands[2], + operands[3])); + /* See comments of vec_cmp<mode><mode>_internal, the opposite result + mask is computed for NE operator, we need to invert the mask here. */ + if (code == NE) + emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0])); + + DONE; +}) + + +;; Internal pattern for vec_cmp. Similar to vec_cmp<mode<mode>_internal, +;; it returns the opposite result mask for operators NE, UNEQ, UNLT, +;; UNLE, UNGT, UNGE and UNORDERED. This is intended behavior so that +;; we can save one mask inverting instruction when using this pattern +;; in vcond patterns. In these cases, it is the caller's responsibility +;; to interpret and use the result mask correctly. +(define_expand "vec_cmp<mode><v_cmp_result>_internal" + [(set (match_operand:<V_cmp_result> 0 "register_operand") + (match_operator 1 "comparison_operator" + [(match_operand:VDQF 2 "register_operand") + (match_operand:VDQF 3 "nonmemory_operand")]))] + "TARGET_SIMD" +{ + int use_zero_form = 0; + enum rtx_code code = GET_CODE (operands[1]); + rtx tmp = gen_reg_rtx (<V_cmp_result>mode); + + rtx (*comparison) (rtx, rtx, rtx); + + if (operands[3] == CONST0_RTX (<MODE>mode) + && (code == LE || code == LT || code == GE || code == GT || code == EQ)) + { + /* Some instructions have a form taking an immediate zero. */ + use_zero_form = 1; + } + else if (!REG_P (operands[3])) + { + /* Make sure we can handle the last operand. */ + operands[3] = force_reg (<MODE>mode, operands[3]); + } + + switch (code) + { + case LT: + if (use_zero_form) + { + comparison = gen_aarch64_cmlt<mode>; + break; + } + /* Else, fall through. */ + case UNGE: + std::swap (operands[2], operands[3]); + /* Fall through. */ + case UNLE: + case GT: + comparison = gen_aarch64_cmgt<mode>; + break; + case LE: + if (use_zero_form) + { + comparison = gen_aarch64_cmle<mode>; + break; + } + /* Else, fall through. */ + case UNGT: + std::swap (operands[2], operands[3]); + /* Fall through. */ + case UNLT: + case GE: + comparison = gen_aarch64_cmge<mode>; + break; + case NE: + case EQ: + comparison = gen_aarch64_cmeq<mode>; + break; + case UNEQ: + case ORDERED: + case UNORDERED: + break; + default: + gcc_unreachable (); + } + + switch (code) + { + case UNGT: + case UNGE: + case NE: + case UNLT: + case UNLE: + /* FCM returns false for lanes which are unordered, so if we use + the inverse of the comparison we actually want to emit, then + revert the result, we will end up with the correct result. + Note that a NE NaN and NaN NE b are true for all a, b. + + Our transformations are: + a GE b -> !(b GT a) + a GT b -> !(b GE a) + a LE b -> !(a GT b) + a LT b -> !(a GE b) + a NE b -> !(a EQ b) + + See comment at the beginning of this pattern, we return the + opposite of result mask for these operators, and it's caller's + resonsibility to invert the mask. + + Fall through. */ + case LT: + case LE: + case GT: + case GE: + case EQ: + /* The easy case. 
Here we emit one of FCMGE, FCMGT or FCMEQ. + As a LT b <=> b GE a && a LE b <=> b GT a. Our transformations are: + a GE b -> a GE b + a GT b -> a GT b + a LE b -> b GE a + a LT b -> b GT a + a EQ b -> a EQ b + Note that there also exist direct comparison against 0 forms, + so catch those as a special case. */ + + emit_insn (comparison (operands[0], operands[2], operands[3])); + break; + + case UNEQ: + /* We first check (a > b || b > a) which is !UNEQ, reverting + this result will then give us (a == b || a UNORDERED b). + + See comment at the beginning of this pattern, we return the + opposite of result mask for this operator, and it's caller's + resonsibility to invert the mask. */ + + emit_insn (gen_aarch64_cmgt<mode> (operands[0], + operands[2], operands[3])); + emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2])); + emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp)); + break; + case UNORDERED: + /* Operands are ORDERED iff (a > b || b >= a), so we can compute + UNORDERED as !ORDERED. + + See comment at the beginning of this pattern, we return the + opposite of result mask for this operator, and it's caller's + resonsibility to invert the mask. + + Fall through. */ + case ORDERED: + emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2], operands[3])); + emit_insn (gen_aarch64_cmge<mode> (operands[0], + operands[3], operands[2])); + emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp)); + break; + default: + gcc_unreachable (); + } + + DONE; +}) + +(define_expand "vec_cmp<mode><v_cmp_result>" + [(set (match_operand:<V_cmp_result> 0 "register_operand") + (match_operator 1 "comparison_operator" + [(match_operand:VDQF 2 "register_operand") + (match_operand:VDQF 3 "nonmemory_operand")]))] + "TARGET_SIMD" +{ + enum rtx_code code = GET_CODE (operands[1]); + + emit_insn (gen_vec_cmp<mode><v_cmp_result>_internal (operands[0], + operands[1], operands[2], + operands[3])); + /* See comments of vec_cmp<mode><v_cmp_result>_internal, the opposite + result masks are computed for below operators, we need to invert + the mask here. */ + if (code == NE || code == UNEQ || code == UNLT || code == UNLE + || code == UNGT || code == UNGE || code == UNORDERED) + emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0])); + + DONE; +}) + +(define_expand "vec_cmpu<mode><mode>" + [(set (match_operand:VSDQ_I_DI 0 "register_operand") + (match_operator 1 "comparison_operator" + [(match_operand:VSDQ_I_DI 2 "register_operand") + (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))] + "TARGET_SIMD" +{ + emit_insn (gen_vec_cmp<mode><mode> (operands[0], operands[1], + operands[2], operands[3])); + DONE; +}) + (define_expand "aarch64_vcond_internal<mode><mode>" [(set (match_operand:VSDQ_I_DI 0 "register_operand") (if_then_else:VSDQ_I_DI
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 6ea35bf..e080b71 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1053,7 +1053,7 @@ } cmp_fmt = gen_rtx_fmt_ee (cmp_operator, V2DImode, operands[1], operands[2]); - emit_insn (gen_aarch64_vcond_internalv2div2di (operands[0], operands[1], + emit_insn (gen_vcondv2div2di (operands[0], operands[1], operands[2], cmp_fmt, operands[1], operands[2])); DONE; }) @@ -2202,314 +2202,6 @@ DONE; }) -(define_expand "aarch64_vcond_internal<mode><mode>" - [(set (match_operand:VSDQ_I_DI 0 "register_operand") - (if_then_else:VSDQ_I_DI - (match_operator 3 "comparison_operator" - [(match_operand:VSDQ_I_DI 4 "register_operand") - (match_operand:VSDQ_I_DI 5 "nonmemory_operand")]) - (match_operand:VSDQ_I_DI 1 "nonmemory_operand") - (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))] - "TARGET_SIMD" -{ - rtx op1 = operands[1]; - rtx op2 = operands[2]; - rtx mask = gen_reg_rtx (<MODE>mode); - enum rtx_code code = GET_CODE (operands[3]); - - /* Switching OP1 and OP2 is necessary for NE (to output a cmeq insn), - and desirable for other comparisons if it results in FOO ? -1 : 0 - (this allows direct use of the comparison result without a bsl). */ - if (code == NE - || (code != EQ - && op1 == CONST0_RTX (<V_cmp_result>mode) - && op2 == CONSTM1_RTX (<V_cmp_result>mode))) - { - op1 = operands[2]; - op2 = operands[1]; - switch (code) - { - case LE: code = GT; break; - case LT: code = GE; break; - case GE: code = LT; break; - case GT: code = LE; break; - /* No case EQ. */ - case NE: code = EQ; break; - case LTU: code = GEU; break; - case LEU: code = GTU; break; - case GTU: code = LEU; break; - case GEU: code = LTU; break; - default: gcc_unreachable (); - } - } - - /* Make sure we can handle the last operand. */ - switch (code) - { - case NE: - /* Normalized to EQ above. */ - gcc_unreachable (); - - case LE: - case LT: - case GE: - case GT: - case EQ: - /* These instructions have a form taking an immediate zero. */ - if (operands[5] == CONST0_RTX (<MODE>mode)) - break; - /* Fall through, as may need to load into register. */ - default: - if (!REG_P (operands[5])) - operands[5] = force_reg (<MODE>mode, operands[5]); - break; - } - - switch (code) - { - case LT: - emit_insn (gen_aarch64_cmlt<mode> (mask, operands[4], operands[5])); - break; - - case GE: - emit_insn (gen_aarch64_cmge<mode> (mask, operands[4], operands[5])); - break; - - case LE: - emit_insn (gen_aarch64_cmle<mode> (mask, operands[4], operands[5])); - break; - - case GT: - emit_insn (gen_aarch64_cmgt<mode> (mask, operands[4], operands[5])); - break; - - case LTU: - emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[5], operands[4])); - break; - - case GEU: - emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[4], operands[5])); - break; - - case LEU: - emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[5], operands[4])); - break; - - case GTU: - emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[4], operands[5])); - break; - - /* NE has been normalized to EQ above. */ - case EQ: - emit_insn (gen_aarch64_cmeq<mode> (mask, operands[4], operands[5])); - break; - - default: - gcc_unreachable (); - } - - /* If we have (a = (b CMP c) ? -1 : 0); - Then we can simply move the generated mask. 
*/ - - if (op1 == CONSTM1_RTX (<V_cmp_result>mode) - && op2 == CONST0_RTX (<V_cmp_result>mode)) - emit_move_insn (operands[0], mask); - else - { - if (!REG_P (op1)) - op1 = force_reg (<MODE>mode, op1); - if (!REG_P (op2)) - op2 = force_reg (<MODE>mode, op2); - emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], mask, - op1, op2)); - } - - DONE; -}) - -(define_expand "aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>" - [(set (match_operand:VDQF_COND 0 "register_operand") - (if_then_else:VDQF - (match_operator 3 "comparison_operator" - [(match_operand:VDQF 4 "register_operand") - (match_operand:VDQF 5 "nonmemory_operand")]) - (match_operand:VDQF_COND 1 "nonmemory_operand") - (match_operand:VDQF_COND 2 "nonmemory_operand")))] - "TARGET_SIMD" -{ - int inverse = 0; - int use_zero_form = 0; - int swap_bsl_operands = 0; - rtx op1 = operands[1]; - rtx op2 = operands[2]; - rtx mask = gen_reg_rtx (<VDQF_COND:V_cmp_result>mode); - rtx tmp = gen_reg_rtx (<VDQF_COND:V_cmp_result>mode); - - rtx (*base_comparison) (rtx, rtx, rtx); - rtx (*complimentary_comparison) (rtx, rtx, rtx); - - switch (GET_CODE (operands[3])) - { - case GE: - case GT: - case LE: - case LT: - case EQ: - if (operands[5] == CONST0_RTX (<MODE>mode)) - { - use_zero_form = 1; - break; - } - /* Fall through. */ - default: - if (!REG_P (operands[5])) - operands[5] = force_reg (<VDQF:MODE>mode, operands[5]); - } - - switch (GET_CODE (operands[3])) - { - case LT: - case UNLT: - inverse = 1; - /* Fall through. */ - case GE: - case UNGE: - case ORDERED: - case UNORDERED: - base_comparison = gen_aarch64_cmge<VDQF:mode>; - complimentary_comparison = gen_aarch64_cmgt<VDQF:mode>; - break; - case LE: - case UNLE: - inverse = 1; - /* Fall through. */ - case GT: - case UNGT: - base_comparison = gen_aarch64_cmgt<VDQF:mode>; - complimentary_comparison = gen_aarch64_cmge<VDQF:mode>; - break; - case EQ: - case NE: - case UNEQ: - base_comparison = gen_aarch64_cmeq<VDQF:mode>; - complimentary_comparison = gen_aarch64_cmeq<VDQF:mode>; - break; - default: - gcc_unreachable (); - } - - switch (GET_CODE (operands[3])) - { - case LT: - case LE: - case GT: - case GE: - case EQ: - /* The easy case. Here we emit one of FCMGE, FCMGT or FCMEQ. - As a LT b <=> b GE a && a LE b <=> b GT a. Our transformations are: - a GE b -> a GE b - a GT b -> a GT b - a LE b -> b GE a - a LT b -> b GT a - a EQ b -> a EQ b - Note that there also exist direct comparison against 0 forms, - so catch those as a special case. */ - if (use_zero_form) - { - inverse = 0; - switch (GET_CODE (operands[3])) - { - case LT: - base_comparison = gen_aarch64_cmlt<VDQF:mode>; - break; - case LE: - base_comparison = gen_aarch64_cmle<VDQF:mode>; - break; - default: - /* Do nothing, other zero form cases already have the correct - base_comparison. */ - break; - } - } - - if (!inverse) - emit_insn (base_comparison (mask, operands[4], operands[5])); - else - emit_insn (complimentary_comparison (mask, operands[5], operands[4])); - break; - case UNLT: - case UNLE: - case UNGT: - case UNGE: - case NE: - /* FCM returns false for lanes which are unordered, so if we use - the inverse of the comparison we actually want to emit, then - swap the operands to BSL, we will end up with the correct result. - Note that a NE NaN and NaN NE b are true for all a, b. 
- - Our transformations are: - a GE b -> !(b GT a) - a GT b -> !(b GE a) - a LE b -> !(a GT b) - a LT b -> !(a GE b) - a NE b -> !(a EQ b) */ - - if (inverse) - emit_insn (base_comparison (mask, operands[4], operands[5])); - else - emit_insn (complimentary_comparison (mask, operands[5], operands[4])); - - swap_bsl_operands = 1; - break; - case UNEQ: - /* We check (a > b || b > a). combining these comparisons give us - true iff !(a != b && a ORDERED b), swapping the operands to BSL - will then give us (a == b || a UNORDERED b) as intended. */ - - emit_insn (gen_aarch64_cmgt<VDQF:mode> (mask, operands[4], operands[5])); - emit_insn (gen_aarch64_cmgt<VDQF:mode> (tmp, operands[5], operands[4])); - emit_insn (gen_ior<VDQF_COND:v_cmp_result>3 (mask, mask, tmp)); - swap_bsl_operands = 1; - break; - case UNORDERED: - /* Operands are ORDERED iff (a > b || b >= a). - Swapping the operands to BSL will give the UNORDERED case. */ - swap_bsl_operands = 1; - /* Fall through. */ - case ORDERED: - emit_insn (gen_aarch64_cmgt<VDQF:mode> (tmp, operands[4], operands[5])); - emit_insn (gen_aarch64_cmge<VDQF:mode> (mask, operands[5], operands[4])); - emit_insn (gen_ior<VDQF_COND:v_cmp_result>3 (mask, mask, tmp)); - break; - default: - gcc_unreachable (); - } - - if (swap_bsl_operands) - { - op1 = operands[2]; - op2 = operands[1]; - } - - /* If we have (a = (b CMP c) ? -1 : 0); - Then we can simply move the generated mask. */ - - if (op1 == CONSTM1_RTX (<VDQF_COND:V_cmp_result>mode) - && op2 == CONST0_RTX (<VDQF_COND:V_cmp_result>mode)) - emit_move_insn (operands[0], mask); - else - { - if (!REG_P (op1)) - op1 = force_reg (<VDQF_COND:MODE>mode, op1); - if (!REG_P (op2)) - op2 = force_reg (<VDQF_COND:MODE>mode, op2); - emit_insn (gen_aarch64_simd_bsl<VDQF_COND:mode> (operands[0], mask, - op1, op2)); - } - - DONE; -}) - (define_expand "vcond<mode><mode>" [(set (match_operand:VALLDI 0 "register_operand") (if_then_else:VALLDI @@ -2520,26 +2212,50 @@ (match_operand:VALLDI 2 "nonmemory_operand")))] "TARGET_SIMD" { - emit_insn (gen_aarch64_vcond_internal<mode><mode> (operands[0], operands[1], - operands[2], operands[3], + rtx mask = gen_reg_rtx (<V_cmp_result>mode); + enum rtx_code code = GET_CODE (operands[3]); + + emit_insn (gen_vec_cmp<mode><v_cmp_result>_internal (mask, operands[3], operands[4], operands[5])); + /* See comments of vec_cmp<mode><v_cmp_result>_internal, the opposite + result masks are computed for below operators, we need to invert + the mask here. In this case we can save an inverting instruction + by simply swapping the two operands to bsl. 
*/ + if (code == NE || code == UNEQ || code == UNLT || code == UNLE + || code == UNGT || code == UNGE || code == UNORDERED) + std::swap (operands[1], operands[2]); + + emit_insn (gen_vcond_mask_<mode><v_cmp_result> (operands[0], operands[1], + operands[2], mask)); DONE; }) -(define_expand "vcond<v_cmp_result><mode>" - [(set (match_operand:<V_cmp_result> 0 "register_operand") - (if_then_else:<V_cmp_result> +(define_expand "vcond<v_cmp_mixed><mode>" + [(set (match_operand:<V_cmp_mixed> 0 "register_operand") + (if_then_else:<V_cmp_mixed> (match_operator 3 "comparison_operator" - [(match_operand:VDQF 4 "register_operand") - (match_operand:VDQF 5 "nonmemory_operand")]) - (match_operand:<V_cmp_result> 1 "nonmemory_operand") - (match_operand:<V_cmp_result> 2 "nonmemory_operand")))] + [(match_operand:VDQF_COND 4 "register_operand") + (match_operand:VDQF_COND 5 "nonmemory_operand")]) + (match_operand:<V_cmp_mixed> 1 "nonmemory_operand") + (match_operand:<V_cmp_mixed> 2 "nonmemory_operand")))] "TARGET_SIMD" { - emit_insn (gen_aarch64_vcond_internal<v_cmp_result><mode> ( + rtx mask = gen_reg_rtx (<V_cmp_result>mode); + enum rtx_code code = GET_CODE (operands[3]); + + emit_insn (gen_vec_cmp<mode><v_cmp_result>_internal (mask, operands[3], + operands[4], operands[5])); + /* See comments of vec_cmp<mode><v_cmp_result>_internal, the opposite + result masks are computed for below operators, we need to invert + the mask here. In this case we can save an inverting instruction + by simply swapping the two operands to bsl. */ + if (code == NE || code == UNEQ || code == UNLT || code == UNLE + || code == UNGT || code == UNGE || code == UNORDERED) + std::swap (operands[1], operands[2]); + + emit_insn (gen_vcond_mask_<v_cmp_mixed><v_cmp_result> ( operands[0], operands[1], - operands[2], operands[3], - operands[4], operands[5])); + operands[2], mask)); DONE; }) @@ -2553,9 +2269,48 @@ (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))] "TARGET_SIMD" { - emit_insn (gen_aarch64_vcond_internal<mode><mode> (operands[0], operands[1], - operands[2], operands[3], + rtx mask = gen_reg_rtx (<MODE>mode); + enum rtx_code code = GET_CODE (operands[3]); + + emit_insn (gen_vec_cmp<mode><mode>_internal (mask, operands[3], operands[4], operands[5])); + /* See comments of vec_cmp<mode><mode>_internal, the opposite result + mask is computed for NE operator, we need to invert the mask here. + In this case we can save an inverting instruction by simply swapping + the two operands to bsl. */ + if (code == NE) + std::swap (operands[1], operands[2]); + + emit_insn (gen_vcond_mask_<mode><v_cmp_result> (operands[0], operands[1], + operands[2], mask)); + DONE; +}) + +(define_expand "vcondu<mode><v_cmp_mixed>" + [(set (match_operand:VDQF 0 "register_operand") + (if_then_else:VDQF + (match_operator 3 "comparison_operator" + [(match_operand:<V_cmp_mixed> 4 "register_operand") + (match_operand:<V_cmp_mixed> 5 "nonmemory_operand")]) + (match_operand:VDQF 1 "nonmemory_operand") + (match_operand:VDQF 2 "nonmemory_operand")))] + "TARGET_SIMD" +{ + rtx mask = gen_reg_rtx (<V_cmp_result>mode); + enum rtx_code code = GET_CODE (operands[3]); + + emit_insn (gen_vec_cmp<v_cmp_mixed><v_cmp_mixed>_internal ( + mask, operands[3], + operands[4], operands[5])); + /* See comments of vec_cmp<mode><mode>_internal, the opposite result + mask is computed for NE operator, we need to invert the mask here. + In this case we can save an inverting instruction by simply swapping + the two operands to bsl. 
*/ + if (code == NE) + std::swap (operands[1], operands[2]); + + emit_insn (gen_vcond_mask_<mode><v_cmp_result> (operands[0], operands[1], + operands[2], mask)); DONE; }) @@ -4156,7 +3911,7 @@ ;; cmtst ;; Although neg (ne (and x y) 0) is the natural way of expressing a cmtst, -;; we don't have any insns using ne, and aarch64_vcond_internal outputs +;; we don't have any insns using ne, and aarch64_vcond outputs ;; not (neg (eq (and x y) 0)) ;; which is rewritten by simplify_rtx as ;; plus (eq (and x y) 0) -1. diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index 43b22d8..f53fe9d 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -610,6 +610,16 @@ (V2DF "v2di") (DF "di") (SF "si")]) +;; Mode for vector conditional operations where the comparison has +;; different type from the lhs. +(define_mode_attr V_cmp_mixed [(V2SI "V2SF") (V4SI "V4SF") + (V2DI "V2DF") (V2SF "V2SI") + (V4SF "V4SI") (V2DF "V2DI")]) + +(define_mode_attr v_cmp_mixed [(V2SI "v2sf") (V4SI "v4sf") + (V2DI "v2df") (V2SF "v2si") + (V4SF "v4si") (V2DF "v2di")]) + ;; Lower case element modes (as used in shift immediate patterns). (define_mode_attr ve_mode [(V8QI "qi") (V16QI "qi") (V4HI "hi") (V8HI "hi")
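A closing note on the operand-swapping trick used throughout the vcond
expanders above: rather than inverting the mask that the *_internal patterns
produce for NE and the unordered comparisons, the two data operands of the
select are swapped, since selecting with an inverted mask is the same as
selecting with the operands exchanged.  A small, self-contained C
illustration of that identity (mine, not part of the patch):

#include <assert.h>
#include <stdint.h>

/* BSL-style select: op1 bits where the mask is set, op2 bits elsewhere.
   Mask lanes are assumed to be all-ones or all-zeros.  */
static int32_t
bsl (int32_t mask, int32_t op1, int32_t op2)
{
  return (op1 & mask) | (op2 & ~mask);
}

int
main (void)
{
  int32_t op1 = 123, op2 = 456;
  /* Swapping the select operands is equivalent to inverting the mask,
     which is why the expanders swap instead of emitting an extra NOT.  */
  assert (bsl (-1, op1, op2) == bsl (0, op2, op1));
  assert (bsl (0, op1, op2) == bsl (-1, op2, op1));
  return 0;
}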