Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-08-10 Thread Richard Henderson via Gcc-patches

On 8/10/23 02:50, Wilco Dijkstra wrote:

Hi Richard,


Why would HWCAP_USCAT not be set by the kernel?

Failing that, I would think you would check ID_AA64MMFR2_EL1.AT.


Answering my own question, N1 does not officially have FEAT_LSE2.


Indeed it doesn't. However, most cores support atomic 128-bit load/store
(part of LSE2), so we can still use the LSE2 ifunc on those cores. Since there
isn't a feature bit for this in the CPU ID registers or HWCAP, I check the MIDR
(via the kernel's CPUID/MRS emulation) instead.
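
Concretely, a stand-alone sketch of that check looks like this (my
illustration, assuming a Linux/glibc environment and the HWCAP_* definitions
from <asm/hwcap.h>; the libatomic ifunc resolver receives the hwcap word as
its argument instead of calling getauxval):

#include <stdbool.h>
#include <sys/auxv.h>
#include <asm/hwcap.h>

static bool
have_atomic_128bit_ldst (void)
{
  unsigned long hwcap = getauxval (AT_HWCAP);

  if (hwcap & HWCAP_USCAT)      /* Kernel advertises FEAT_LSE2.  */
    return true;
  if (!(hwcap & HWCAP_CPUID))   /* No MRS emulation available.  */
    return false;

  /* Neoverse N1 lacks FEAT_LSE2 but still guarantees atomic 128-bit
     load/store, so special-case it by implementer and part number.  */
  unsigned long midr;
  asm volatile ("mrs %0, midr_el1" : "=r" (midr));
  return ((midr >> 24) & 0xff) == 'A' && ((midr >> 4) & 0xfff) == 0xd0c;
}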


That would be a really nice bit to add to HWCAP, then, to consolidate this knowledge in 
one place.  Certainly I would use it in QEMU as well.



r~



Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-08-09 Thread Richard Henderson via Gcc-patches

On 8/9/23 19:11, Richard Henderson wrote:

On 8/4/23 08:05, Wilco Dijkstra via Gcc-patches wrote:

+#ifdef HWCAP_USCAT
+
+#define MIDR_IMPLEMENTOR(midr)    (((midr) >> 24) & 255)
+#define MIDR_PARTNUM(midr)    (((midr) >> 4) & 0xfff)
+
+static inline bool
+ifunc1 (unsigned long hwcap)
+{
+  if (hwcap & HWCAP_USCAT)
+    return true;
+  if (!(hwcap & HWCAP_CPUID))
+    return false;
+
+  unsigned long midr;
+  asm volatile ("mrs %0, midr_el1" : "=r" (midr));
+
+  /* Neoverse N1 supports atomic 128-bit load/store.  */
+  if (MIDR_IMPLEMENTOR (midr) == 'A' && MIDR_PARTNUM(midr) == 0xd0c)
+    return true;
+
+  return false;
+}
+#endif


Why would HWCAP_USCAT not be set by the kernel?

Failing that, I would think you would check ID_AA64MMFR2_EL1.AT.


Answering my own question, N1 does not officially have FEAT_LSE2.


r~



Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-08-09 Thread Richard Henderson via Gcc-patches

On 8/4/23 08:05, Wilco Dijkstra via Gcc-patches wrote:

+#ifdef HWCAP_USCAT
+
+#define MIDR_IMPLEMENTOR(midr) (((midr) >> 24) & 255)
+#define MIDR_PARTNUM(midr) (((midr) >> 4) & 0xfff)
+
+static inline bool
+ifunc1 (unsigned long hwcap)
+{
+  if (hwcap & HWCAP_USCAT)
+return true;
+  if (!(hwcap & HWCAP_CPUID))
+return false;
+
+  unsigned long midr;
+  asm volatile ("mrs %0, midr_el1" : "=r" (midr));
+
+  /* Neoverse N1 supports atomic 128-bit load/store.  */
+  if (MIDR_IMPLEMENTOR (midr) == 'A' && MIDR_PARTNUM(midr) == 0xd0c)
+return true;
+
+  return false;
+}
+#endif


Why would HWCAP_USCAT not be set by the kernel?

Failing that, I would think you would check ID_AA64MMFR2_EL1.AT.


r~


[PATCH] MAINTAINERS: Update my email address.

2022-04-19 Thread Richard Henderson via Gcc-patches
2022-04-19  Richard Henderson  

* MAINTAINERS: Update my email address.
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 30f81b3dd52..15973503722 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -53,7 +53,7 @@ aarch64 port  Richard Earnshaw

 aarch64 port   Richard Sandiford   
 aarch64 port   Marcus Shawcroft
 aarch64 port   Kyrylo Tkachov  
-alpha port Richard Henderson   
+alpha port Richard Henderson   
 amdgcn portJulian Brown
 amdgcn portAndrew Stubbs   
 arc port   Joern Rennecke  
-- 
2.34.1



[PATCH v4 11/12] aarch64: Accept 0 as first argument to compares

2020-04-09 Thread Richard Henderson via Gcc-patches
While cmp (extended register) and cmp (immediate) use <Rn|SP>,
cmp (shifted register) uses <Rn>.  So we can perform cmp xzr, x0.

For ccmp, we only have <Rn> as an input.

* config/aarch64/aarch64.md (cmp): For operand 0, use
aarch64_reg_or_zero.  Shuffle reg/reg to last alternative
and accept Z.
(@ccmpcc): For operand 0, use aarch64_reg_or_zero and Z.
(@ccmpcc_rev): Likewise.
---
 gcc/config/aarch64/aarch64.md | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index fb1a39a3886..2b5a6eb510d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -502,7 +502,7 @@
   [(match_operand 0 "cc_register" "")
(const_int 0)])
  (compare:CC_ONLY
-   (match_operand:GPI 2 "register_operand" "r,r,r")
+   (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,r,r")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))
  (unspec:CC_ONLY
[(match_operand 5 "immediate_operand")]
@@ -542,7 +542,7 @@
[(match_operand 5 "immediate_operand")]
UNSPEC_NZCV)
  (compare:CC_ONLY
-   (match_operand:GPI 2 "register_operand" "r,r,r")
+   (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,r,r")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"]
   ""
   "@
@@ -3902,14 +3902,14 @@
 
 (define_insn "cmp"
   [(set (reg:CC CC_REGNUM)
-   (compare:CC (match_operand:GPI 0 "register_operand" "rk,rk,rk")
-   (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
+   (compare:CC (match_operand:GPI 0 "aarch64_reg_or_zero" "rk,rk,rkZ")
+   (match_operand:GPI 1 "aarch64_plus_operand" "I,J,r")))]
   ""
   "@
-   cmp\\t%0, %1
cmp\\t%0, %1
-   cmn\\t%0, #%n1"
-  [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
+   cmn\\t%0, #%n1
+   cmp\\t%0, %1"
+  [(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
 )
 
 (define_insn "fcmp"
-- 
2.20.1



[PATCH v4 12/12] aarch64: Implement TImode comparisons

2020-04-09 Thread Richard Henderson via Gcc-patches
* config/aarch64/aarch64-modes.def (CC_NV): New.
* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Expand
all of the comparisons for TImode, not just NE.
(aarch64_select_cc_mode): Recognize cmp_carryin.
(aarch64_get_condition_code_1): Handle CC_NVmode.
* config/aarch64/aarch64.md (cbranchti4, cstoreti4): New.
(ccmp_iorne): New.
(cmp_carryin): New.
(*cmp_carryin): New.
(*cmp_carryin_z1): New.
(*cmp_carryin_z2): New.
(*cmp_carryin_m2, *ucmp_carryin_m2): New.
* config/aarch64/iterators.md (CC_EXTEND): New.
* config/aarch64/predicates.md (const_dword_umax): New.
---
 gcc/config/aarch64/aarch64.c | 164 ---
 gcc/config/aarch64/aarch64-modes.def |   1 +
 gcc/config/aarch64/aarch64.md| 113 ++
 gcc/config/aarch64/iterators.md  |   3 +
 gcc/config/aarch64/predicates.md |   9 ++
 5 files changed, 277 insertions(+), 13 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 837ee6a5e37..6c825b341a0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2731,32 +2731,143 @@ rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
   machine_mode cmp_mode = GET_MODE (x);
-  machine_mode cc_mode;
   rtx cc_reg;
 
   if (cmp_mode == TImode)
 {
-  gcc_assert (code == NE);
+  rtx x_lo, x_hi, y_lo, y_hi, tmp;
+  struct expand_operand ops[2];
 
-  cc_mode = CCmode;
-  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
+  x_lo = operand_subword (x, 0, 0, TImode);
+  x_hi = operand_subword (x, 1, 0, TImode);
 
-  rtx x_lo = operand_subword (x, 0, 0, TImode);
-  rtx y_lo = operand_subword (y, 0, 0, TImode);
-  emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x_lo, y_lo));
+  if (CONST_SCALAR_INT_P (y))
+   {
+ wide_int y_wide = rtx_mode_t (y, TImode);
 
-  rtx x_hi = operand_subword (x, 1, 0, TImode);
-  rtx y_hi = operand_subword (y, 1, 0, TImode);
-  emit_insn (gen_ccmpccdi (cc_reg, cc_reg, x_hi, y_hi,
-  gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
-  GEN_INT (AARCH64_EQ)));
+ switch (code)
+   {
+   case EQ:
+   case NE:
+ /* For equality, IOR the two halves together.  If this gets
+used for a branch, we expect this to fold to cbz/cbnz;
+otherwise it's no larger than the cmp+ccmp below.  Beware
+of the compare-and-swap post-reload split and use ccmp.  */
+ if (y_wide == 0 && can_create_pseudo_p ())
+   {
+ tmp = gen_reg_rtx (DImode);
+ emit_insn (gen_iordi3 (tmp, x_hi, x_lo));
+ emit_insn (gen_cmpdi (tmp, const0_rtx));
+ cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+ goto done;
+   }
+ break;
+
+   case LE:
+   case GT:
+ /* Add 1 to Y to convert to LT/GE, which avoids the swap and
+keeps the constant operand.  */
+ if (wi::cmps(y_wide, wi::max_value (TImode, SIGNED)) < 0)
+   {
+ y = immed_wide_int_const (wi::add (y_wide, 1), TImode);
+ code = (code == LE ? LT : GE);
+   }
+ break;
+
+   case LEU:
+   case GTU:
+ /* Add 1 to Y to convert to LT/GE, which avoids the swap and
+keeps the constant operand.  */
+ if (wi::cmpu(y_wide, wi::max_value (TImode, UNSIGNED)) < 0)
+   {
+ y = immed_wide_int_const (wi::add (y_wide, 1), TImode);
+ code = (code == LEU ? LTU : GEU);
+   }
+ break;
+
+   default:
+ break;
+   }
+   }
+
+  y_lo = simplify_gen_subreg (DImode, y, TImode,
+ subreg_lowpart_offset (DImode, TImode));
+  y_hi = simplify_gen_subreg (DImode, y, TImode,
+ subreg_highpart_offset (DImode, TImode));
+
+  switch (code)
+   {
+   case LEU:
+   case GTU:
+   case LE:
+   case GT:
+ std::swap (x_lo, y_lo);
+ std::swap (x_hi, y_hi);
+ code = swap_condition (code);
+ break;
+
+   case LTU:
+   case GEU:
+   case LT:
+   case GE:
+ /* If the low word of y is 0, then this is simply a normal
+compare of the upper words.  */
+ if (y_lo == const0_rtx)
+   {
+ if (!aarch64_plus_operand (y_hi, DImode))
+   y_hi = force_reg (DImode, y_hi);
+ return aarch64_gen_compare_reg (code, x_hi, y_hi);
+   }
+ break;
+
+   default:
+ break;
+   }
+
+  /* Emit cmpdi, forcing operands into registers as required.  */
+  create_input_operand (&ops[0], x_lo

[PATCH v4 07/12] aarch64: Rename CC_ADCmode to CC_NOTCmode

2020-04-09 Thread Richard Henderson via Gcc-patches
We are about to use !C in more contexts than add-with-carry.
Choose a more generic name.
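
For reference (my illustration, not part of the patch): on both targets a
subtract sets C to "no borrow", so the unsigned borrow-out condition is !C,
the same flag view already used for the add-with-carry overflow test:

/* Word-sized subtract with borrow-out; the borrow is exactly the case
   where SUBS would leave C clear, i.e. the !C condition.  */
unsigned long
sub_borrow_out (unsigned long a, unsigned long b, unsigned long *diff)
{
  *diff = a - b;
  return a < b;         /* 1 exactly when the subtract borrows.  */
}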

* config/aarch64/aarch64-modes.def (CC_NOTC): Rename CC_ADC.
* config/aarch64/aarch64.c (aarch64_select_cc_mode): Update.
(aarch64_get_condition_code_1): Likewise.
* config/aarch64/aarch64.md (addvti4): Likewise.
(add3_carryinC): Likewise.
(*add3_carryinC_zero): Likewise.
(*add3_carryinC): Likewise.
---
 gcc/config/aarch64/aarch64.c |  4 ++--
 gcc/config/aarch64/aarch64-modes.def |  5 +++--
 gcc/config/aarch64/aarch64.md| 14 +++---
 gcc/config/aarch64/predicates.md |  4 ++--
 4 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cd4dc1ef6f9..c09b7bcb7f0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9530,7 +9530,7 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
   && code_x == PLUS
   && GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
   && const_dword_umaxp1 (y, mode_x))
-return CC_ADCmode;
+return CC_NOTCmode;
 
   /* A test for signed overflow.  */
   if ((mode_x == DImode || mode_x == TImode)
@@ -9663,7 +9663,7 @@ aarch64_get_condition_code_1 (machine_mode mode, enum 
rtx_code comp_code)
}
   break;
 
-case E_CC_ADCmode:
+case E_CC_NOTCmode:
   switch (comp_code)
{
case GEU: return AARCH64_CS;
diff --git a/gcc/config/aarch64/aarch64-modes.def 
b/gcc/config/aarch64/aarch64-modes.def
index af972e8f72b..181b7b30dcd 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -29,7 +29,7 @@
CCmode is used for 'normal' compare (subtraction) operations.  For
ADC, the representation becomes more complex still, since we cannot
use the normal idiom of comparing the result to one of the input
-   operands; instead we use CC_ADCmode to represent this case.  */
+   operands; instead we use CC_NOTCmode to represent this case.  */
 CC_MODE (CCFP);
 CC_MODE (CCFPE);
 CC_MODE (CC_SWP);
@@ -38,7 +38,8 @@ CC_MODE (CC_NZC);   /* Only N, Z and C bits of condition 
flags are valid.
 CC_MODE (CC_NZ);/* Only N and Z bits of condition flags are valid.  */
 CC_MODE (CC_Z); /* Only Z bit of condition flags is valid.  */
 CC_MODE (CC_C); /* C represents unsigned overflow of a simple addition.  */
-CC_MODE (CC_ADC);   /* Unsigned overflow from an ADC (add with carry).  */
+CC_MODE (CC_NOTC);  /* !C represents unsigned overflow of subtraction,
+   as well as our representation of add-with-carry.  */
 CC_MODE (CC_V); /* Only V bit of condition flags is valid.  */
 
 /* Half-precision floating point for __fp16.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d51f6146c43..7d4a63f9a2a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2077,7 +2077,7 @@
   CODE_FOR_adddi3_compareC,
   CODE_FOR_adddi3_compareC,
   CODE_FOR_adddi3_carryinC);
-  aarch64_gen_unlikely_cbranch (GEU, CC_ADCmode, operands[3]);
+  aarch64_gen_unlikely_cbranch (GEU, CC_NOTCmode, operands[3]);
   DONE;
 })
 
@@ -2580,7 +2580,7 @@
 (define_expand "add3_carryinC"
   [(parallel
  [(set (match_dup 3)
-  (compare:CC_ADC
+  (compare:CC_NOTC
 (plus:
   (plus:
 (match_dup 4)
@@ -2595,7 +2595,7 @@
 (match_dup 2)))])]
""
 {
-  operands[3] = gen_rtx_REG (CC_ADCmode, CC_REGNUM);
+  operands[3] = gen_rtx_REG (CC_NOTCmode, CC_REGNUM);
   rtx ccin = gen_rtx_REG (CC_Cmode, CC_REGNUM);
   operands[4] = gen_rtx_LTU (mode, ccin, const0_rtx);
   operands[5] = gen_rtx_LTU (mode, ccin, const0_rtx);
@@ -2605,8 +2605,8 @@
 })
 
 (define_insn "*add3_carryinC_zero"
-  [(set (reg:CC_ADC CC_REGNUM)
-   (compare:CC_ADC
+  [(set (reg:CC_NOTC CC_REGNUM)
+   (compare:CC_NOTC
  (plus:
(match_operand: 2 "aarch64_carry_operation" "")
(zero_extend: (match_operand:GPI 1 "register_operand" "r")))
@@ -2620,8 +2620,8 @@
 )
 
 (define_insn "*add3_carryinC"
-  [(set (reg:CC_ADC CC_REGNUM)
-   (compare:CC_ADC
+  [(set (reg:CC_NOTC CC_REGNUM)
+   (compare:CC_NOTC
  (plus:
(plus:
  (match_operand: 3 "aarch64_carry_operation" "")
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 99c3bfbace4..e3572d2f60d 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -390,7 +390,7 @@
   machine_mode ccmode = GET_MODE (op0);
   if (ccmode == CC_Cmode)
 return GET_CODE (op) == LTU;
-  if (ccmode == CC_ADCmode || ccmode == CCmode)
+  if (ccmode == CC_NOTCmode || ccmode == CCmode)
 return GET_CODE (op) == GEU;
   return false;
 })
@@ -408,7 +408,7 @@
   machine_mode ccmode = GET_MODE (op0);
   if (ccmode == CC_Cmode)
 return GET_COD

[PATCH v4 10/12] aarch64: Adjust result of aarch64_gen_compare_reg

2020-04-09 Thread Richard Henderson via Gcc-patches
Return the entire comparison expression, not just the cc_reg.
This will allow the routine to adjust the comparison code as
needed for TImode comparisons.

Note that some users were passing e.g. EQ to aarch64_gen_compare_reg
and then using gen_rtx_NE.  Pass the proper code in the first place.
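
Roughly, a caller changes shape like this (a sketch only; the function name
below is invented for illustration, the helpers are the ones in the diff):

/* Hypothetical caller, before and after the interface change.  */
static void
emit_bool_from_compare (rtx dest, rtx x, rtx y)
{
  /* Before:
       rtx cc_reg = aarch64_gen_compare_reg (NE, x, y);
       rtx t = gen_rtx_NE (SImode, cc_reg, const0_rtx);  */

  /* After: the comparison comes back ready-made with the right code;
     only the mode may still need adjusting.  */
  rtx t = aarch64_gen_compare_reg (NE, x, y);
  PUT_MODE (t, SImode);
  emit_insn (gen_rtx_SET (dest, t));
}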

* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Return
the final comparison for code & cc_reg.
(aarch64_gen_compare_reg_maybe_ze): Likewise.
(aarch64_expand_compare_and_swap): Update to match -- do not
build the final comparison here, but PUT_MODE as necessary.
(aarch64_split_compare_and_swap): Use prebuilt comparison.
* config/aarch64/aarch64-simd.md (aarch64_cmdi): Likewise.
(aarch64_cmdi): Likewise.
(aarch64_cmtstdi): Likewise.
* config/aarch64/aarch64-speculation.cc
(aarch64_speculation_establish_tracker): Likewise.
* config/aarch64/aarch64.md (cbranch4, cbranch4): Likewise.
(mod3, abs2): Likewise.
(cstore4, cstore4): Likewise.
(cmov6, cmov6): Likewise.
(movcc, movcc, movcc): Likewise.
(cc): Likewise.
(ffs2): Likewise.
(cstorecc4): Remove redundant "".
---
 gcc/config/aarch64/aarch64.c  | 26 +++---
 gcc/config/aarch64/aarch64-simd.md| 18 ++---
 gcc/config/aarch64/aarch64-speculation.cc |  5 +-
 gcc/config/aarch64/aarch64.md | 96 ++-
 4 files changed, 63 insertions(+), 82 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d80afc36889..837ee6a5e37 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2726,7 +2726,7 @@ emit_set_insn (rtx x, rtx y)
 }
 
 /* X and Y are two things to compare using CODE.  Emit the compare insn and
-   return the rtx for register 0 in the proper mode.  */
+   return the rtx for the CCmode comparison.  */
 rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
@@ -2757,7 +2757,7 @@ aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
   cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
   emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
 }
-  return cc_reg;
+  return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
 }
 
 /* Similarly, but maybe zero-extend Y if Y_MODE < SImode.  */
@@ -2783,7 +2783,7 @@ aarch64_gen_compare_reg_maybe_ze (RTX_CODE code, rtx x, 
rtx y,
  cc_mode = CC_SWPmode;
  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
  emit_set_insn (cc_reg, t);
- return cc_reg;
+ return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
}
 }
 
@@ -18980,7 +18980,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
 
   emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
   newval, mod_s));
-  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+  PUT_MODE (x, SImode);
 }
   else if (TARGET_OUTLINE_ATOMICS)
 {
@@ -18991,7 +18992,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
   rval = emit_library_call_value (func, NULL_RTX, LCT_NORMAL, r_mode,
  oldval, mode, newval, mode,
  XEXP (mem, 0), Pmode);
-  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+  PUT_MODE (x, SImode);
 }
   else
 {
@@ -19003,13 +19005,13 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (GEN_FCN (code) (rval, mem, oldval, newval,
 is_weak, mod_s, mod_f));
   cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
 }
 
   if (r_mode != mode)
 rval = gen_lowpart (mode, rval);
   emit_move_insn (operands[1], rval);
 
-  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
   emit_insn (gen_rtx_SET (bval, x));
 }
 
@@ -19084,10 +19086,8 @@ aarch64_split_compare_and_swap (rtx operands[])
   if (strong_zero_p)
 x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
   else
-{
-  rtx cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
-  x = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
-}
+x = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
@@ -19100,8 +19100,7 @@ aarch64_split_compare_and_swap (rtx operands[])
{
  /* Emit an explicit compare instruction, so that we can correctly
 track the condition codes.  */
- rtx cc_reg = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
- x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+ x = aarch64_gen_compare_reg (NE, scratch,

[PATCH v4 01/12] aarch64: Provide expander for sub3_compare1

2020-04-09 Thread Richard Henderson via Gcc-patches
In one place we open-code a special case of this pattern into the
more specific sub3_compare1_imm, and miss this special case
in other places.  Centralize that special case into an expander.
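
For instance (my example, not from the patch), an unsigned subtract-overflow
check with an immediate second operand should now reach the one expander,
which forwards the constant to the *_imm pattern itself:

int
usub16_overflows (unsigned long a, unsigned long *d)
{
  /* The usubv expander now accepts the immediate directly; the new
     sub/compare expander emits the *_imm form with the negated constant.  */
  return __builtin_sub_overflow (a, 16ul, d);
}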

* config/aarch64/aarch64.md (*sub3_compare1): Rename
from sub3_compare1.
(sub3_compare1): New expander.
(usubv4): Use aarch64_plus_operand for operand2.
* config/aarch64/aarch64.c (aarch64_expand_subvti): Remove
call to gen_subdi3_compare1_imm.
---
 gcc/config/aarch64/aarch64.c  | 11 ++-
 gcc/config/aarch64/aarch64.md | 24 +---
 2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4af562a81ea..ce306a10de6 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20797,16 +20797,9 @@ aarch64_expand_subvti (rtx op0, rtx low_dest, rtx 
low_in1,
 }
   else
 {
-  if (aarch64_plus_immediate (low_in2, DImode))
-   emit_insn (gen_subdi3_compare1_imm (low_dest, low_in1, low_in2,
-   GEN_INT (-INTVAL (low_in2;
-  else
-   {
- low_in2 = force_reg (DImode, low_in2);
- emit_insn (gen_subdi3_compare1 (low_dest, low_in1, low_in2));
-   }
-  high_in2 = force_reg (DImode, high_in2);
+  emit_insn (gen_subdi3_compare1 (low_dest, low_in1, low_in2));
 
+  high_in2 = force_reg (DImode, high_in2);
   if (unsigned_p)
emit_insn (gen_usubdi3_carryinC (high_dest, high_in1, high_in2));
   else
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..728c63bd8d6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2966,13 +2966,12 @@
 (define_expand "usubv4"
   [(match_operand:GPI 0 "register_operand")
(match_operand:GPI 1 "aarch64_reg_or_zero")
-   (match_operand:GPI 2 "aarch64_reg_or_zero")
+   (match_operand:GPI 2 "aarch64_plus_operand")
(label_ref (match_operand 3 "" ""))]
   ""
 {
   emit_insn (gen_sub3_compare1 (operands[0], operands[1], operands[2]));
   aarch64_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
-
   DONE;
 })
 
@@ -3119,7 +3118,7 @@
   [(set_attr "type" "alus_imm")]
 )
 
-(define_insn "sub3_compare1"
+(define_insn "*sub3_compare1"
   [(set (reg:CC CC_REGNUM)
(compare:CC
  (match_operand:GPI 1 "aarch64_reg_or_zero" "rkZ")
@@ -3131,6 +3130,25 @@
   [(set_attr "type" "alus_sreg")]
 )
 
+(define_expand "sub3_compare1"
+  [(parallel
+[(set (reg:CC CC_REGNUM)
+ (compare:CC
+   (match_operand:GPI 1 "aarch64_reg_or_zero")
+   (match_operand:GPI 2 "aarch64_plus_operand")))
+ (set (match_operand:GPI 0 "register_operand")
+ (minus:GPI (match_dup 1) (match_dup 2)))])]
+  ""
+{
+  if (CONST_SCALAR_INT_P (operands[2]))
+{
+  emit_insn (gen_sub3_compare1_imm
+(operands[0], operands[1], operands[2],
+ GEN_INT (-INTVAL (operands[2];
+  DONE;
+}
+})
+
 (define_peephole2
   [(set (match_operand:GPI 0 "aarch64_general_reg")
(minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero")
-- 
2.20.1



[PATCH v4 09/12] aarch64: Use CC_NOTCmode for double-word subtract

2020-04-09 Thread Richard Henderson via Gcc-patches
We have been using CCmode, which is not correct for this case.
Mirror the same code from the arm target.
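
The kind of source that exercises these patterns (my example, assuming
__int128 support) is the unsigned double-word subtract-overflow check:

int
usub_overflows (unsigned __int128 a, unsigned __int128 b,
                unsigned __int128 *d)
{
  /* Should expand through usubvti4 and the carry-in compare patterns
     whose CC mode changes here.  */
  return __builtin_sub_overflow (a, b, d);
}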

* config/aarch64/aarch64.c (aarch64_select_cc_mode):
Recognize usub*_carryinC patterns.
* config/aarch64/aarch64.md (usubvti4): Use CC_NOTC.
(usub3_carryinC): Likewise.
(*usub3_carryinC_z1): Likewise.
(*usub3_carryinC_z2): Likewise.
(*usub3_carryinC): Likewise.
---
 gcc/config/aarch64/aarch64.c  |  9 +
 gcc/config/aarch64/aarch64.md | 18 +-
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c09b7bcb7f0..d80afc36889 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9532,6 +9532,15 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
   && const_dword_umaxp1 (y, mode_x))
 return CC_NOTCmode;
 
+  /* A test for unsigned overflow from a subtract with borrow.  */
+  if ((mode_x == DImode || mode_x == TImode)
+  && (code == GEU || code == LTU)
+  && code_x == ZERO_EXTEND
+  && ((GET_CODE (y) == PLUS
+  && aarch64_borrow_operation (XEXP (y, 0), mode_x))
+ || aarch64_borrow_operation (y, mode_x)))
+return CC_NOTCmode;
+
   /* A test for signed overflow.  */
   if ((mode_x == DImode || mode_x == TImode)
   && (code == NE || code == EQ)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 7d4a63f9a2a..a0a872c6d94 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2954,7 +2954,7 @@
   CODE_FOR_subdi3_compare1,
   CODE_FOR_subdi3_compare1,
   CODE_FOR_usubdi3_carryinC);
-  aarch64_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
+  aarch64_gen_unlikely_cbranch (LTU, CC_NOTCmode, operands[3]);
   DONE;
 })
 
@@ -3367,8 +3367,8 @@
 
 (define_expand "usub3_carryinC"
   [(parallel
- [(set (reg:CC CC_REGNUM)
-  (compare:CC
+ [(set (reg:CC_NOTC CC_REGNUM)
+  (compare:CC_NOTC
 (zero_extend:
   (match_operand:GPI 1 "aarch64_reg_or_zero"))
 (plus:
@@ -3383,8 +3383,8 @@
 )
 
 (define_insn "*usub3_carryinC_z1"
-  [(set (reg:CC CC_REGNUM)
-   (compare:CC
+  [(set (reg:CC_NOTC CC_REGNUM)
+   (compare:CC_NOTC
  (const_int 0)
  (plus:
(zero_extend:
@@ -3400,8 +3400,8 @@
 )
 
 (define_insn "*usub3_carryinC_z2"
-  [(set (reg:CC CC_REGNUM)
-   (compare:CC
+  [(set (reg:CC_NOTC CC_REGNUM)
+   (compare:CC_NOTC
  (zero_extend:
(match_operand:GPI 1 "register_operand" "r"))
  (match_operand: 2 "aarch64_borrow_operation" "")))
@@ -3415,8 +3415,8 @@
 )
 
 (define_insn "*usub3_carryinC"
-  [(set (reg:CC CC_REGNUM)
-   (compare:CC
+  [(set (reg:CC_NOTC CC_REGNUM)
+   (compare:CC_NOTC
  (zero_extend:
(match_operand:GPI 1 "register_operand" "r"))
  (plus:
-- 
2.20.1



[PATCH v4 06/12] aarch64: Introduce aarch64_expand_addsubti

2020-04-09 Thread Richard Henderson via Gcc-patches
Modify aarch64_expand_subvti into a form that handles all TImode
addition and subtraction, whether modulo (wrapping) or with signed
or unsigned overflow checks.

Use expand_insn to put the operands into the proper form,
and do not force values into a register when not required.
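
For illustration (my examples, assuming __int128 support), these are the
kinds of TImode operations that now all funnel through the one expander:

unsigned __int128
add_ti (unsigned __int128 a, unsigned __int128 b)
{
  return a + b;                             /* addti3 */
}

int
addv_ti (__int128 a, __int128 b, __int128 *r)
{
  return __builtin_add_overflow (a, b, r);  /* addvti4 */
}

int
usubv_ti (unsigned __int128 a, unsigned __int128 b, unsigned __int128 *r)
{
  return __builtin_sub_overflow (a, b, r);  /* usubvti4 */
}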

* config/aarch64/aarch64.c (aarch64_ti_split) New.
(aarch64_addti_scratch_regs): Remove.
(aarch64_subvti_scratch_regs): Remove.
(aarch64_expand_subvti): Remove.
(aarch64_expand_addsubti): New.
* config/aarch64/aarch64-protos.h: Update to match.
* config/aarch64/aarch64.md (addti3): Use aarch64_expand_addsubti.
(addvti4, uaddvti4): Likewise.
(subvti4, usubvti4): Likewise.
(subti3): Likewise; accept immediates for operand 2.
---
 gcc/config/aarch64/aarch64-protos.h |  10 +--
 gcc/config/aarch64/aarch64.c| 129 +---
 gcc/config/aarch64/aarch64.md   | 125 ++-
 3 files changed, 67 insertions(+), 197 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 9e43adb7db0..787d67d62e0 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -630,16 +630,8 @@ void aarch64_reset_previous_fndecl (void);
 bool aarch64_return_address_signing_enabled (void);
 bool aarch64_bti_enabled (void);
 void aarch64_save_restore_target_globals (tree);
-void aarch64_addti_scratch_regs (rtx, rtx, rtx *,
-rtx *, rtx *,
-rtx *, rtx *,
-rtx *);
-void aarch64_subvti_scratch_regs (rtx, rtx, rtx *,
- rtx *, rtx *,
- rtx *, rtx *, rtx *);
-void aarch64_expand_subvti (rtx, rtx, rtx,
-   rtx, rtx, rtx, rtx, bool);
 
+void aarch64_expand_addsubti (rtx, rtx, rtx, int, int, int);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 36e9ebb468a..cd4dc1ef6f9 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20706,110 +20706,61 @@ aarch64_gen_unlikely_cbranch (enum rtx_code code, 
machine_mode cc_mode,
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
 }
 
-/* Generate DImode scratch registers for 128-bit (TImode) addition.
+/* Generate DImode scratch registers for 128-bit (TImode) add/sub.
+   INPUT represents the TImode input operand
+   LO represents the low half (DImode) of the TImode operand
+   HI represents the high half (DImode) of the TImode operand.  */
 
-   OP1 represents the TImode destination operand 1
-   OP2 represents the TImode destination operand 2
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   HIGH_IN2 represents the high half (DImode) of TImode operand 2.  */
-
-void
-aarch64_addti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
-   rtx *low_in1, rtx *low_in2,
-   rtx *high_dest, rtx *high_in1,
-   rtx *high_in2)
+static void
+aarch64_ti_split (rtx input, rtx *lo, rtx *hi)
 {
-  *low_dest = gen_reg_rtx (DImode);
-  *low_in1 = gen_lowpart (DImode, op1);
-  *low_in2 = simplify_gen_subreg (DImode, op2, TImode,
- subreg_lowpart_offset (DImode, TImode));
-  *high_dest = gen_reg_rtx (DImode);
-  *high_in1 = gen_highpart (DImode, op1);
-  *high_in2 = simplify_gen_subreg (DImode, op2, TImode,
-  subreg_highpart_offset (DImode, TImode));
+  *lo = simplify_gen_subreg (DImode, input, TImode,
+subreg_lowpart_offset (DImode, TImode));
+  *hi = simplify_gen_subreg (DImode, input, TImode,
+subreg_highpart_offset (DImode, TImode));
 }
 
-/* Generate DImode scratch registers for 128-bit (TImode) subtraction.
-
-   This function differs from 'arch64_addti_scratch_regs' in that
-   OP1 can be an immediate constant (zero). We must call
-   subreg_highpart_offset with DImode and TImode arguments, otherwise
-   VOIDmode will be used for the const_int which generates an internal
-   error from subreg_size_highpart_offset which does not expect a size of zero.
-
-   OP1 represents the TImode destination operand 1
-   OP2 represents the TImode destination operand 2
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   HIG

[PATCH v4 08/12] arm: Merge CC_ADC and CC_B to CC_NOTC

2020-04-09 Thread Richard Henderson via Gcc-patches
These CC_MODEs are identical; merge them under a single, more generic name.

* config/arm/arm-modes.def (CC_NOTC): New.
(CC_ADC, CC_B): Remove.
* config/arm/arm.c (arm_select_cc_mode): Update to match.
(arm_gen_dicompare_reg): Likewise.
(maybe_get_arm_condition_code): Likewise.
* config/arm/arm.md (uaddvdi4): Likewise.
(addsi3_cin_cout_reg, addsi3_cin_cout_imm): Likewise.
(*addsi3_cin_cout_reg_insn): Likewise.
(*addsi3_cin_cout_imm_insn): Likewise.
(addsi3_cin_cout_0, *addsi3_cin_cout_0_insn): Likewise.
(usubvsi3_borrow, usubvsi3_borrow_imm): Likewise.
---
 gcc/config/arm/arm.c | 30 +++---
 gcc/config/arm/arm-modes.def | 12 
 gcc/config/arm/arm.md| 36 ++--
 gcc/config/arm/iterators.md  |  2 +-
 gcc/config/arm/predicates.md |  4 ++--
 5 files changed, 36 insertions(+), 48 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c38776fdad7..145345c2278 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15669,7 +15669,7 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
   && CONST_INT_P (y)
   && UINTVAL (y) == 0x8
   && (op == GEU || op == LTU))
-return CC_ADCmode;
+return CC_NOTCmode;
 
   if (GET_MODE (x) == DImode
   && (op == GE || op == LT)
@@ -15685,7 +15685,7 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
   && ((GET_CODE (y) == PLUS
   && arm_borrow_operation (XEXP (y, 0), DImode))
  || arm_borrow_operation (y, DImode)))
-return CC_Bmode;
+return CC_NOTCmode;
 
   if (GET_MODE (x) == DImode
   && (op == EQ || op == NE)
@@ -15879,18 +15879,18 @@ arm_gen_dicompare_reg (rtx_code code, rtx x, rtx y, 
rtx scratch)
 
rtx_insn *insn;
if (y_hi == const0_rtx)
- insn = emit_insn (gen_cmpsi3_0_carryin_CC_Bout (scratch, x_hi,
- cmp1));
+ insn = emit_insn (gen_cmpsi3_0_carryin_CC_NOTCout
+   (scratch, x_hi, cmp1));
else if (CONST_INT_P (y_hi))
  {
/* Constant is viewed as unsigned when zero-extended.  */
y_hi = GEN_INT (UINTVAL (y_hi) & 0xULL);
-   insn = emit_insn (gen_cmpsi3_imm_carryin_CC_Bout (scratch, x_hi,
- y_hi, cmp1));
+   insn = emit_insn (gen_cmpsi3_imm_carryin_CC_NOTCout
+ (scratch, x_hi, y_hi, cmp1));
  }
else
- insn = emit_insn (gen_cmpsi3_carryin_CC_Bout (scratch, x_hi, y_hi,
-   cmp1));
+ insn = emit_insn (gen_cmpsi3_carryin_CC_NOTCout
+   (scratch, x_hi, y_hi, cmp1));
return SET_DEST (single_set (insn));
   }
 
@@ -15911,8 +15911,8 @@ arm_gen_dicompare_reg (rtx_code code, rtx x, rtx y, rtx 
scratch)
 arm_gen_compare_reg (LTU, y_lo, x_lo, scratch),
 const0_rtx);
y_hi = GEN_INT (0x & UINTVAL (y_hi));
-   rtx_insn *insn = emit_insn (gen_rscsi3_CC_Bout_scratch (scratch, y_hi,
-   x_hi, cmp1));
+   rtx_insn *insn = emit_insn (gen_rscsi3_CC_NOTCout_scratch
+   (scratch, y_hi, x_hi, cmp1));
return SET_DEST (single_set (insn));
   }
 
@@ -24511,7 +24511,7 @@ maybe_get_arm_condition_code (rtx comparison)
default: return ARM_NV;
}
 
-case E_CC_Bmode:
+case E_CC_NOTCmode:
   switch (comp_code)
{
case GEU: return ARM_CS;
@@ -24527,14 +24527,6 @@ maybe_get_arm_condition_code (rtx comparison)
default: return ARM_NV;
}
 
-case E_CC_ADCmode:
-  switch (comp_code)
-   {
-   case GEU: return ARM_CS;
-   case LTU: return ARM_CC;
-   default: return ARM_NV;
-   }
-
 case E_CCmode:
 case E_CC_RSBmode:
   switch (comp_code)
diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index 6e48223b63d..2495054e066 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -33,18 +33,15 @@ ADJUST_FLOAT_FORMAT (HF, ((arm_fp16_format == 
ARM_FP16_FORMAT_ALTERNATIVE)
CC_Zmode should be used if only the Z flag is set correctly
CC_Cmode should be used if only the C flag is set correctly, after an
  addition.
+   CC_NOTCmode is the inverse of the C flag, after subtraction (borrow),
+ or for ADC where we cannot use the trick of comparing the sum
+ against one of the other operands.
CC_Nmode should be used if only the N (sign) flag is set correctly
CC_NVmode should be used if only the N and V bits are set correctly,
  (used for signed comparisons when the carry is propagated in).
CC_RSBmode should be used where the comparison is set by an RSB

[PATCH v4 05/12] aarch64: Improvements to aarch64_select_cc_mode from arm

2020-04-09 Thread Richard Henderson via Gcc-patches
The arm target has some improvements over aarch64 for
double-word arithmetic and comparisons.
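
One case this picks up (my example; whether combine forms exactly this shape
depends on the surrounding code) is the carry-out test where the sum is
compared against the second addend rather than the first:

int
carry_from_add (unsigned long a, unsigned long b)
{
  /* (a + b) < b is the same carry-out condition as (a + b) < a; the
     swapped-operand check added here lets both select CC_Cmode.  */
  return a + b < b;
}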

* config/aarch64/aarch64.c (aarch64_select_cc_mode): Check
for swapped operands to CC_Cmode; check for zero_extend to
CC_ADCmode; check for swapped operands to CC_Vmode.
---
 gcc/config/aarch64/aarch64.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f2c14818c79..36e9ebb468a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9521,21 +9521,25 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
   if ((mode_x == DImode || mode_x == TImode)
   && (code == LTU || code == GEU)
   && code_x == PLUS
-  && rtx_equal_p (XEXP (x, 0), y))
+  && (rtx_equal_p (XEXP (x, 0), y) || rtx_equal_p (XEXP (x, 1), y)))
 return CC_Cmode;
 
   /* A test for unsigned overflow from an add with carry.  */
   if ((mode_x == DImode || mode_x == TImode)
   && (code == LTU || code == GEU)
   && code_x == PLUS
+  && GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
   && const_dword_umaxp1 (y, mode_x))
 return CC_ADCmode;
 
   /* A test for signed overflow.  */
   if ((mode_x == DImode || mode_x == TImode)
-  && code == NE
-  && code_x == PLUS
-  && GET_CODE (y) == SIGN_EXTEND)
+  && (code == NE || code == EQ)
+  && (code_x == PLUS || code_x == MINUS)
+  && (GET_CODE (XEXP (x, 0)) == SIGN_EXTEND
+  || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND)
+  && GET_CODE (y) == SIGN_EXTEND
+  && GET_CODE (XEXP (y, 0)) == GET_CODE (x))
 return CC_Vmode;
 
   /* For everything else, return CCmode.  */
-- 
2.20.1



[PATCH v4 03/12] aarch64: Add cset, csetm, cinc patterns for carry/borrow

2020-04-09 Thread Richard Henderson via Gcc-patches
Some implementations have a higher cost for the csel insn
(and its specializations) than they do for adc/sbc.
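
As an illustration (my example, not from the patch), materializing the C
flag from an unsigned add-overflow test is the sort of place the new
patterns apply:

unsigned long
carry_out (unsigned long a, unsigned long b)
{
  unsigned long sum;
  /* The result is the carry from the addition; with the new cstore
     carry pattern this can become "adc x0, xzr, xzr" rather than
     "cset x0, cs".  */
  return __builtin_add_overflow (a, b, &sum);
}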

* config/aarch64/aarch64.md (*cstore_carry): New.
(*cstoresi_carry_uxtw): New.
(*cstore_borrow): New.
(*cstoresi_borrow_uxtw): New.
(*csinc2_carry): New.
---
 gcc/testsuite/gcc.target/aarch64/asm-flag-1.c |  3 +-
 gcc/config/aarch64/aarch64.md | 51 ++-
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-1.c 
b/gcc/testsuite/gcc.target/aarch64/asm-flag-1.c
index 49901e59c38..b6c21fee306 100644
--- a/gcc/testsuite/gcc.target/aarch64/asm-flag-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-1.c
@@ -21,7 +21,8 @@ void f(char *out)
 
 /* { dg-final { scan-assembler "cset.*, ne" } } */
 /* { dg-final { scan-assembler "cset.*, eq" } } */
-/* { dg-final { scan-assembler "cset.*, cs" } } */
+/* { dg-final { scan-assembler-not "cset.*, cs" } } */
+/* { dg-final { scan-assembler "adc.*, .zr, .zr" } } */
 /* { dg-final { scan-assembler "cset.*, cc" } } */
 /* { dg-final { scan-assembler "cset.*, mi" } } */
 /* { dg-final { scan-assembler "cset.*, pl" } } */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e65f46f0f74..d266a1edd64 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4086,6 +4086,15 @@
   "
 )
 
+;; On some implementations (e.g. tx1) csel is more expensive than adc.
+(define_insn "*cstore_carry"
+  [(set (match_operand:ALLI 0 "register_operand" "=r")
+   (match_operand:ALLI 1 "aarch64_carry_operation"))]
+  ""
+  "adc\\t%0, zr, zr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "aarch64_cstore"
   [(set (match_operand:ALLI 0 "register_operand" "=r")
(match_operator:ALLI 1 "aarch64_comparison_operator_mode"
@@ -4130,7 +4139,16 @@
   [(set_attr "type" "csel")]
 )
 
-;; zero_extend version of the above
+;; zero_extend versions of the above
+
+(define_insn "*cstoresi_carry_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (zero_extend:DI (match_operand:SI 1 "aarch64_carry_operation")))]
+  ""
+  "adc\\t%w0, wzr, wzr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*cstoresi_insn_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
(zero_extend:DI
@@ -4141,6 +4159,15 @@
   [(set_attr "type" "csel")]
 )
 
+;; On some implementations (e.g. tx1) csel is more expensive than sbc.
+(define_insn "*cstore_borrow"
+  [(set (match_operand:ALLI 0 "register_operand" "=r")
+   (neg:ALLI (match_operand:ALLI 1 "aarch64_borrow_operation")))]
+  ""
+  "sbc\\t%0, zr, zr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "cstore_neg"
   [(set (match_operand:ALLI 0 "register_operand" "=r")
(neg:ALLI (match_operator:ALLI 1 "aarch64_comparison_operator_mode"
@@ -4150,7 +4177,17 @@
   [(set_attr "type" "csel")]
 )
 
-;; zero_extend version of the above
+;; zero_extend versions of the above
+
+(define_insn "*cstoresi_borrow_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (zero_extend:DI
+ (neg:SI (match_operand:SI 1 "aarch64_borrow_operation"]
+  ""
+  "sbc\\t%w0, wzr, wzr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*cstoresi_neg_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
(zero_extend:DI
@@ -4353,6 +4390,16 @@
   [(set_attr "type" "crc")]
 )
 
+;; On some implementations (e.g. tx1) csel is more expensive than adc.
+(define_insn "*csinc2_carry"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+   (plus:GPI (match_operand 2 "aarch64_carry_operation")
+  (match_operand:GPI 1 "register_operand" "r")))]
+  ""
+  "adc\\t%0, %1, zr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*csinc2_insn"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (plus:GPI (match_operand 2 "aarch64_comparison_operation" "")
-- 
2.20.1



[PATCH v4 04/12] aarch64: Add const_dword_umaxp1

2020-04-09 Thread Richard Henderson via Gcc-patches
Rather than duplicating the rather verbose integral test,
pull it out to a predicate.
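
Concretely, the predicate accepts the double-word constant
1 << (GET_MODE_BITSIZE (mode) / 2); for 64-bit words that is UINT64_MAX + 1
viewed as a 128-bit value (my illustration):

#include <stdint.h>

/* The TImode instance of the value const_dword_umaxp1 matches.  */
static const unsigned __int128 dword_umax_p1 =
  (unsigned __int128) UINT64_MAX + 1;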

* config/aarch64/predicates.md (const_dword_umaxp1): New.
* config/aarch64/aarch64.c (aarch64_select_cc_mode): Use it.
* config/aarch64/aarch64.md (add*add3_carryinC): Likewise.
(*add3_carryinC_zero): Likewise.
(add3_carryinC): Use mode for constant, not TImode.
---
 gcc/config/aarch64/aarch64.c |  5 +
 gcc/config/aarch64/aarch64.md| 16 +++-
 gcc/config/aarch64/predicates.md |  9 +
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ce306a10de6..f2c14818c79 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9528,10 +9528,7 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
   if ((mode_x == DImode || mode_x == TImode)
   && (code == LTU || code == GEU)
   && code_x == PLUS
-  && CONST_SCALAR_INT_P (y)
-  && (rtx_mode_t (y, mode_x)
- == (wi::shwi (1, mode_x)
- << (GET_MODE_BITSIZE (mode_x).to_constant () / 2
+  && const_dword_umaxp1 (y, mode_x))
 return CC_ADCmode;
 
   /* A test for signed overflow.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d266a1edd64..6b21cc9c61b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2659,7 +2659,7 @@
   operands[5] = gen_rtx_LTU (mode, ccin, const0_rtx);
   operands[6] = immed_wide_int_const (wi::shwi (1, mode)
  << GET_MODE_BITSIZE (mode),
- TImode);
+ mode);
 })
 
 (define_insn "*add3_carryinC_zero"
@@ -2668,13 +2668,12 @@
  (plus:
(match_operand: 2 "aarch64_carry_operation" "")
(zero_extend: (match_operand:GPI 1 "register_operand" "r")))
- (match_operand 4 "const_scalar_int_operand" "")))
+ (match_operand: 4 "const_dword_umaxp1" "")))
(set (match_operand:GPI 0 "register_operand" "=r")
(plus:GPI (match_operand:GPI 3 "aarch64_carry_operation" "")
  (match_dup 1)))]
-  "rtx_mode_t (operands[4], mode)
-   == (wi::shwi (1, mode) << (unsigned) GET_MODE_BITSIZE (mode))"
-   "adcs\\t%0, %1, zr"
+  ""
+  "adcs\\t%0, %1, zr"
   [(set_attr "type" "adc_reg")]
 )
 
@@ -2686,15 +2685,14 @@
  (match_operand: 3 "aarch64_carry_operation" "")
  (zero_extend: (match_operand:GPI 1 "register_operand" "r")))
(zero_extend: (match_operand:GPI 2 "register_operand" "r")))
- (match_operand 5 "const_scalar_int_operand" "")))
+ (match_operand: 5 "const_dword_umaxp1" "")))
(set (match_operand:GPI 0 "register_operand" "=r")
(plus:GPI
  (plus:GPI (match_operand:GPI 4 "aarch64_carry_operation" "")
(match_dup 1))
  (match_dup 2)))]
-  "rtx_mode_t (operands[5], mode)
-   == (wi::shwi (1, mode) << (unsigned) GET_MODE_BITSIZE (mode))"
-   "adcs\\t%0, %1, %2"
+  ""
+  "adcs\\t%0, %1, %2"
   [(set_attr "type" "adc_reg")]
 )
 
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 215fcec5955..99c3bfbace4 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -46,6 +46,15 @@
   return CONST_INT_P (op) && IN_RANGE (INTVAL (op), 1, 3);
 })
 
+;; True for 1 << (GET_MODE_BITSIZE (mode) / 2)
+;; I.e UINT_MAX + 1 for a given mode, in the double-word mode.
+(define_predicate "const_dword_umaxp1"
+  (match_code "const_int,const_wide_int")
+{
+  unsigned bits = GET_MODE_BITSIZE (mode).to_constant () / 2;
+  return rtx_mode_t (op, mode) == (wi::shwi (1, mode) << bits);
+})
+
 (define_predicate "subreg_lowpart_operator"
   (ior (match_code "truncate")
(and (match_code "subreg")
-- 
2.20.1



[PATCH v4 00/12] aarch64: Implement TImode comparisons

2020-04-09 Thread Richard Henderson via Gcc-patches
This is attacking case 3 of PR 94174.

In v4, I attempt to bring over as many patterns from config/arm
as are applicable.  It's not too far away from what I had in v2.

In the process of checking all of the combinations (below), I
discovered that we could probably have a better representation
for ccmp.  One that the optimizers can actually do something with,
rather than the if_then_else+unspec combo that we have now.

A special case of that is in the last patch: ccmp_iorne.  I think
it should be possible to come up with some sort of logical combo
that would apply to all cases, but haven't put enough thought
into the problem.


r~


Richard Henderson (12):
  aarch64: Provide expander for sub3_compare1
  aarch64: Match add3_carryin expander and insn
  aarch64: Add cset, csetm, cinc patterns for carry/borrow
  aarch64: Add const_dword_umaxp1
  aarch64: Improvements to aarch64_select_cc_mode from arm
  aarch64: Introduce aarch64_expand_addsubti
  aarch64: Rename CC_ADCmode to CC_NOTCmode
  arm: Merge CC_ADC and CC_B to CC_NOTC
  aarch64: Use CC_NOTCmode for double-word subtract
  aarch64: Adjust result of aarch64_gen_compare_reg
  aarch64: Accept 0 as first argument to compares
  aarch64: Implement TImode comparisons

 gcc/config/aarch64/aarch64-protos.h   |  10 +-
 gcc/config/aarch64/aarch64.c  | 356 -
 gcc/config/arm/arm.c  |  30 +-
 gcc/testsuite/gcc.target/aarch64/asm-flag-1.c |   3 +-
 gcc/config/aarch64/aarch64-modes.def  |   6 +-
 gcc/config/aarch64/aarch64-simd.md|  18 +-
 gcc/config/aarch64/aarch64-speculation.cc |   5 +-
 gcc/config/aarch64/aarch64.md | 473 +++---
 gcc/config/aarch64/iterators.md   |   3 +
 gcc/config/aarch64/predicates.md  |  22 +-
 gcc/config/arm/arm-modes.def  |  12 +-
 gcc/config/arm/arm.md |  36 +-
 gcc/config/arm/iterators.md   |   2 +-
 gcc/config/arm/predicates.md  |   4 +-
 14 files changed, 580 insertions(+), 400 deletions(-)

---

typedef signed long long s64;
typedef unsigned long long u64;
typedef __uint128_t u128;
typedef __int128_t s128;

#define i128(hi,lo) (((u128)(hi) << 64) | (u64)(lo))

int eq(u128 a, u128 b)  { return a == b; }
int ne(u128 a, u128 b)  { return a != b; }
int ltu(u128 a, u128 b) { return a < b; }
int geu(u128 a, u128 b) { return a >= b; }
int leu(u128 a, u128 b) { return a <= b; }
int gtu(u128 a, u128 b) { return a > b; }
int lt(s128 a, s128 b) { return a < b; }
int ge(s128 a, s128 b) { return a >= b; }
int le(s128 a, s128 b) { return a <= b; }
int gt(s128 a, s128 b) { return a > b; }

int eqS(u128 a, u64 b)  { return a == b; }
int neS(u128 a, u64 b)  { return a != b; }
int ltuS(u128 a, u64 b) { return a < b; }
int geuS(u128 a, u64 b) { return a >= b; }
int leuS(u128 a, u64 b) { return a <= b; }
int gtuS(u128 a, u64 b) { return a > b; }
int ltS(s128 a, s64 b) { return a < b; }
int geS(s128 a, s64 b) { return a >= b; }
int leS(s128 a, s64 b) { return a <= b; }
int gtS(s128 a, s64 b) { return a > b; }

int eqSH(u128 a, u64 b)  { return a == (u128)b << 64; }
int neSH(u128 a, u64 b)  { return a != (u128)b << 64; }
int ltuSH(u128 a, u64 b) { return a < (u128)b << 64; }
int geuSH(u128 a, u64 b) { return a >= (u128)b << 64; }
int leuSH(u128 a, u64 b) { return a <= (u128)b << 64; }
int gtuSH(u128 a, u64 b) { return a > (u128)b << 64; }
int ltSH(s128 a, s64 b) { return a < (s128)b << 64; }
int geSH(s128 a, s64 b) { return a >= (s128)b << 64; }
int leSH(s128 a, s64 b) { return a <= (s128)b << 64; }
int gtSH(s128 a, s64 b) { return a > (s128)b << 64; }

int eqFFHS(u128 a, u64 b)  { return a == i128(-1,b); }
int neFFHS(u128 a, u64 b)  { return a != i128(-1,b); }
int ltuFFHS(u128 a, u64 b) { return a < i128(-1,b); }
int geuFFHS(u128 a, u64 b) { return a >= i128(-1,b); }
int leuFFHS(u128 a, u64 b) { return a <= i128(-1,b); }
int gtuFFHS(u128 a, u64 b) { return a > i128(-1,b); }
int ltFFHS(s128 a, s64 b) { return a < (s128)i128(-1,b); }
int geFFHS(s128 a, s64 b) { return a >= (s128)i128(-1,b); }
int leFFHS(s128 a, s64 b) { return a <= (s128)i128(-1,b); }
int gtFFHS(s128 a, s64 b) { return a > (s128)i128(-1,b); }

int eq0(u128 a) { return a == 0; }
int ne0(u128 a) { return a != 0; }
int ltu0(u128 a) { return a < 0; }
int geu0(u128 a) { return a >= 0; }
int leu0(u128 a) { return a <= 0; }
int gtu0(u128 a) { return a > 0; }
int lt0(s128 a) { return a < 0; }
int ge0(s128 a) { return a >= 0; }
int le0(s128 a) { return a <= 0; }
int gt0(s128 a) { return a > 0; }

int eq1(u128 a) { return a == 1; }
int ne1(u128 a) { return a != 1; }
int ltu1(u128 a) { return a < 1; }
int geu1(u128 a) { return a >= 1; }
int leu1(u128 a) { return a <= 1; }
int gtu1(u128 a) { return a > 1; }
int lt1(s128 a) { return a < 1; }
int ge1(s128 a) { return a >= 1; }
int le1(s128 a) { return a <= 1; }
int gt1(s128 a) { return a > 1; }

int eqm1(u128 a) { return a == -1; }
int nem1(u128 a) { r

[PATCH v4 02/12] aarch64: Match add3_carryin expander and insn

2020-04-09 Thread Richard Henderson via Gcc-patches
The expander and insn predicates do not match,
which can lead to insn recognition errors.

* config/aarch64/aarch64.md (add3_carryin):
Use register_operand instead of aarch64_reg_or_zero.
---
 gcc/config/aarch64/aarch64.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 728c63bd8d6..e65f46f0f74 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2600,8 +2600,8 @@
(plus:GPI
  (plus:GPI
(ltu:GPI (reg:CC_C CC_REGNUM) (const_int 0))
-   (match_operand:GPI 1 "aarch64_reg_or_zero"))
- (match_operand:GPI 2 "aarch64_reg_or_zero")))]
+   (match_operand:GPI 1 "register_operand"))
+ (match_operand:GPI 2 "register_operand")))]
""
""
 )
-- 
2.20.1



Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons

2020-04-07 Thread Richard Henderson via Gcc-patches
On 4/7/20 4:58 PM, Segher Boessenkool wrote:
>> I wonder if it would be helpful to have
>>
>>   (uoverflow_plus x y carry)
>>   (soverflow_plus x y carry)
>>
>> etc.
> 
> Those have three operands, which is nasty to express.

How so?  It's a perfectly natural operation.

> On rs6000 we have the carry bit as a separate register (it really is
> only one bit, XER[CA], but in GCC we model it as a separate register).
> We handle it as a fixed register (there is only one, and saving and
> restoring it is relatively expensive, so this worked out the best).

As for most platforms, more or less.

> Still, in the patterns (for insns like "adde") that take both a carry
> input and have it as output, the expression for the carry output but
> already the one for the GPR output become so unwieldy that nothing
> can properly work with it.  So, in the end, I have all such insns that
> take a carry input just clobber their carry output.  This works great!

Sure, right up until the point when you want to actually *use* that carry
output.  Which is exactly what we're talking about here.

> Expressing the carry setting for insns that do not take a carry in is
> much easier.  You get somewhat different patterns for various
> immediate inputs, but that is all.

It's not horrible, but it's certainly verbose.  If we add a shorthand for that
common operation, so much the better.

I would not expect optimizers to take a collection of inputs and introduce this
rtx code, but only operate with it when the backend emits it.

>> This does have the advantage of avoiding the extensions, so that constants 
>> can
>> be retained in the original mode.
> 
> But it won't ever survive simplification; or, it will be in the way of
> simplification.

How so?

It's clear that

  (set (reg:CC_C flags)
       (uoverflow_plus:CC_C
         (reg:SI x)
         (const_int 0)
         (const_int 0)))

cannot overflow.  Thus this expression as a whole would, in combination with
the user of the CC_MODE, e.g.

  (set (reg:SI y) (ne:SI (reg:CC_C flags) (const_int 0)))

fold to

  (set (reg:SI y) (ne:SI (const_int 0) (const_int 0)))
to
  (set (reg:SI y) (const_int 0))

just like any other (compare) + (condition) pair.

I don't see why this new rtx code is any more difficult than ones that we have
already.

>> Though of course if we go this way, there will be incentive to add
>> overflow codes for all __builtin_*_overflow_p.
> 
> Yeah, eww.  And where will it stop?  What muladd insns should we have
> special RTL codes for, for the high part?

Well, we don't have overflow builtins for muladd yet.  Only plus, minus, and
mul.  Only x86 and s390x have insns to support overflow from mul without also
computing the highpart.

But add/sub-with-carry are *very* common operations.  As are add/sub-with-carry
with signed overflow into flags.  It would be nice to make that as simple as
possible across all targets.


r~


Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons

2020-04-07 Thread Richard Henderson via Gcc-patches
On 4/7/20 9:32 AM, Richard Sandiford wrote:
> It's not really reversibility that I'm after (at least not for its
> own sake).
> 
> If we had a three-input compare_cc rtx_code that described a comparison
> involving a carry input, we'd certainly be using it here, because that's
> what the instruction does.  Given that we don't have the rtx_code, three
> obvious choices are:
> 
> (1) Add it.
> 
> (2) Continue to represent what the instruction does using an unspec.
> 
> (3) Don't try to represent the "three-input compare_cc" operation and
> instead describe a two-input comparison that only yields a valid
> result for a subset of tests.
> 
> (1) seems like the best technical solution but would probably be
> a lot of work.  I guess the reason I like (2) is that it stays
> closest to (1).

Indeed, the biggest problem that I'm having with copying the arm solution to
aarch64 is the special cases of the constants.

The first problem is that (any_extend:M1 (match_operand:M2)) is invalid rtl for
a constant, so you can't share the same define_insn to handle both register and
immediate input.

The second problem is how unpredictable the canonical rtl of an expression can
be after constant folding.  Which again requires more and more define_insns.
Even the Arm target gets this wrong.  In particular,

> (define_insn "cmpsi3_carryin_out"
>   [(set (reg: CC_REGNUM)
> (compare:
>  (SE:DI (match_operand:SI 1 "s_register_operand" "0,r"))
>  (plus:DI (match_operand:DI 3 "arm_borrow_operation" "")
>   (SE:DI (match_operand:SI 2 "s_register_operand" "l,r")
>(clobber (match_scratch:SI 0 "=l,r"))]

is non-canonical according to combine.  It will only attempt the ordering

  (compare
    (plus ...)
    (sign_extend ...))

I have no idea why combine is attempting to reverse the sense of the comparison
here.  I can only presume it would also reverse the sense of the branch on
which the comparison is made, had the pattern matched.

This second problem is partially worked around by fwprop, in that it will try
to simply replace the operand without folding if that is recognizable.  Thus
cases like

  (compare (const_int 0) (plus ...))

can be produced from fwprop but not combine.  Which works well enough to not
bother with the CC_RSBmode that the arm target uses.

The third problem is the really quite complicated code that goes into
SELECT_CC_MODE.  This really should not be as difficult as it is, and is the
sort of thing for which we built recog.

Related to that is the insn costing, which also ought to use something akin to
recog.  We have all of the information there: if the insn is recognizable, the
type/length attributes can be used to provide a good value.


r~


[PATCH v2 09/11] aarch64: Adjust result of aarch64_gen_compare_reg

2020-04-02 Thread Richard Henderson via Gcc-patches
Return the entire comparison expression, not just the cc_reg.
This will allow the routine to adjust the comparison code as
needed for TImode comparisons.

Note that some users were passing e.g. EQ to aarch64_gen_compare_reg
and then using gen_rtx_NE.  Pass the proper code in the first place.

* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Return
the final comparison for code & cc_reg.
(aarch64_gen_compare_reg_maybe_ze): Likewise.
(aarch64_expand_compare_and_swap): Update to match -- do not
build the final comparison here, but PUT_MODE as necessary.
(aarch64_split_compare_and_swap): Use prebuilt comparison.
* config/aarch64/aarch64-simd.md (aarch64_cmdi): Likewise.
(aarch64_cmdi): Likewise.
(aarch64_cmtstdi): Likewise.
* config/aarch64/aarch64-speculation.cc
(aarch64_speculation_establish_tracker): Likewise.
* config/aarch64/aarch64.md (cbranch4, cbranch4): Likewise.
(mod3, abs2): Likewise.
(cstore4, cstore4): Likewise.
(cmov6, cmov6): Likewise.
(movcc, movcc, movcc): Likewise.
(cc): Likewise.
(ffs2): Likewise.
(cstorecc4): Remove redundant "".
---
 gcc/config/aarch64/aarch64.c  | 26 +++---
 gcc/config/aarch64/aarch64-simd.md| 18 ++---
 gcc/config/aarch64/aarch64-speculation.cc |  5 +-
 gcc/config/aarch64/aarch64.md | 96 ++-
 4 files changed, 63 insertions(+), 82 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 8e54506bc3e..93658338041 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2328,7 +2328,7 @@ emit_set_insn (rtx x, rtx y)
 }
 
 /* X and Y are two things to compare using CODE.  Emit the compare insn and
-   return the rtx for register 0 in the proper mode.  */
+   return the rtx for the CCmode comparison.  */
 rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
@@ -2359,7 +2359,7 @@ aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
   cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
   emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
 }
-  return cc_reg;
+  return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
 }
 
 /* Similarly, but maybe zero-extend Y if Y_MODE < SImode.  */
@@ -2382,7 +2382,7 @@ aarch64_gen_compare_reg_maybe_ze (RTX_CODE code, rtx x, 
rtx y,
  cc_mode = CC_SWPmode;
  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
  emit_set_insn (cc_reg, t);
- return cc_reg;
+ return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
}
 }
 
@@ -18487,7 +18487,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
 
   emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
   newval, mod_s));
-  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+  PUT_MODE (x, SImode);
 }
   else if (TARGET_OUTLINE_ATOMICS)
 {
@@ -18498,7 +18499,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
   rval = emit_library_call_value (func, NULL_RTX, LCT_NORMAL, r_mode,
  oldval, mode, newval, mode,
  XEXP (mem, 0), Pmode);
-  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+  PUT_MODE (x, SImode);
 }
   else
 {
@@ -18510,13 +18512,13 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (GEN_FCN (code) (rval, mem, oldval, newval,
 is_weak, mod_s, mod_f));
   cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
 }
 
   if (r_mode != mode)
 rval = gen_lowpart (mode, rval);
   emit_move_insn (operands[1], rval);
 
-  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
   emit_insn (gen_rtx_SET (bval, x));
 }
 
@@ -18591,10 +18593,8 @@ aarch64_split_compare_and_swap (rtx operands[])
   if (strong_zero_p)
 x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
   else
-{
-  rtx cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
-  x = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
-}
+x = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
@@ -18607,8 +18607,7 @@ aarch64_split_compare_and_swap (rtx operands[])
{
  /* Emit an explicit compare instruction, so that we can correctly
 track the condition codes.  */
- rtx cc_reg = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
- x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+ x = aarch64_gen_compare_reg (NE, scratch,

[PATCH v2 04/11] aarch64: Introduce aarch64_expand_addsubti

2020-04-02 Thread Richard Henderson via Gcc-patches
Modify aarch64_expand_subvti into a form that handles all
addition and subtraction: plain (modulo), signed overflow, and
unsigned overflow.

Use expand_insn to put the operands into the proper form,
and do not force values into register if not required.

* config/aarch64/aarch64.c (aarch64_ti_split) New.
(aarch64_addti_scratch_regs): Remove.
(aarch64_subvti_scratch_regs): Remove.
(aarch64_expand_subvti): Remove.
(aarch64_expand_addsubti): New.
* config/aarch64/aarch64-protos.h: Update to match.
* config/aarch64/aarch64.md (addti3): Use aarch64_expand_addsubti.
(addvti4, uaddvti4): Likewise.
(subvti4, usubvti4): Likewise.
(subti3): Likewise; accept immediates for operand 2.
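As an illustrative aside (not part of the patch), the following self-contained
C exercises the expanders named above on aarch64 at -O2; the instruction
shapes in the comments are only the rough sequences the patterns suggest,
not guaranteed output:

typedef __int128 i128;

/* Plain (modulo) TImode add/sub, expected to go through addti3/subti3:
   roughly adds/adc and subs/sbc pairs.  */
i128 add128 (i128 a, i128 b) { return a + b; }
i128 sub128 (i128 a, i128 b) { return a - b; }

/* Signed add with overflow check, expected to go through addvti4:
   roughly adds; adcs; then a test of the V flag.  */
int addv128 (i128 a, i128 b, i128 *r)
{
  return __builtin_add_overflow (a, b, r);
}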
---
 gcc/config/aarch64/aarch64-protos.h |  10 +--
 gcc/config/aarch64/aarch64.c| 129 +---
 gcc/config/aarch64/aarch64.md   | 125 ++-
 3 files changed, 67 insertions(+), 197 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index d6d668ea920..787085b24d2 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -630,16 +630,8 @@ void aarch64_reset_previous_fndecl (void);
 bool aarch64_return_address_signing_enabled (void);
 bool aarch64_bti_enabled (void);
 void aarch64_save_restore_target_globals (tree);
-void aarch64_addti_scratch_regs (rtx, rtx, rtx *,
-rtx *, rtx *,
-rtx *, rtx *,
-rtx *);
-void aarch64_subvti_scratch_regs (rtx, rtx, rtx *,
- rtx *, rtx *,
- rtx *, rtx *, rtx *);
-void aarch64_expand_subvti (rtx, rtx, rtx,
-   rtx, rtx, rtx, rtx, bool);
 
+void aarch64_expand_addsubti (rtx, rtx, rtx, int, int, int);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7a13a8e8ec4..6263897c9a0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20241,110 +20241,61 @@ aarch64_gen_unlikely_cbranch (enum rtx_code code, 
machine_mode cc_mode,
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
 }
 
-/* Generate DImode scratch registers for 128-bit (TImode) addition.
+/* Generate DImode scratch registers for 128-bit (TImode) add/sub.
+   INPUT represents the TImode input operand
+   LO represents the low half (DImode) of the TImode operand
+   HI represents the high half (DImode) of the TImode operand.  */
 
-   OP1 represents the TImode destination operand 1
-   OP2 represents the TImode destination operand 2
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   HIGH_IN2 represents the high half (DImode) of TImode operand 2.  */
-
-void
-aarch64_addti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
-   rtx *low_in1, rtx *low_in2,
-   rtx *high_dest, rtx *high_in1,
-   rtx *high_in2)
+static void
+aarch64_ti_split (rtx input, rtx *lo, rtx *hi)
 {
-  *low_dest = gen_reg_rtx (DImode);
-  *low_in1 = gen_lowpart (DImode, op1);
-  *low_in2 = simplify_gen_subreg (DImode, op2, TImode,
- subreg_lowpart_offset (DImode, TImode));
-  *high_dest = gen_reg_rtx (DImode);
-  *high_in1 = gen_highpart (DImode, op1);
-  *high_in2 = simplify_gen_subreg (DImode, op2, TImode,
-  subreg_highpart_offset (DImode, TImode));
+  *lo = simplify_gen_subreg (DImode, input, TImode,
+subreg_lowpart_offset (DImode, TImode));
+  *hi = simplify_gen_subreg (DImode, input, TImode,
+subreg_highpart_offset (DImode, TImode));
 }
 
-/* Generate DImode scratch registers for 128-bit (TImode) subtraction.
-
-   This function differs from 'arch64_addti_scratch_regs' in that
-   OP1 can be an immediate constant (zero). We must call
-   subreg_highpart_offset with DImode and TImode arguments, otherwise
-   VOIDmode will be used for the const_int which generates an internal
-   error from subreg_size_highpart_offset which does not expect a size of zero.
-
-   OP1 represents the TImode destination operand 1
-   OP2 represents the TImode destination operand 2
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   HIG

[PATCH v2 05/11] aarch64: Use UNSPEC_SBCS for subtract-with-borrow + output flags

2020-04-02 Thread Richard Henderson via Gcc-patches
The rtl description of signed/unsigned overflow from subtract
was fine, as far as it goes -- we have CC_Cmode and CC_Vmode
that indicate that only those particular bits are valid.

However, it's not clear how to extend that description to
handle signed comparison, where N == V (GE) and N != V (LT) are
the only valid tests.

Using an UNSPEC means that we can unify all 3 usages without
fear that combine will try to infer anything from the rtl.
It also means we need far fewer variants when various inputs
have constants propagated in, and the rtl folds.

Accept -1 for the second input by using ADCS.

* config/aarch64/aarch64.md (UNSPEC_SBCS): New.
(cmp3_carryin): New expander.
(sub3_carryin_cmp): New expander.
(*cmp3_carryin): New pattern.
(*cmp3_carryin_0): New pattern.
(*sub3_carryin_cmp): New pattern.
(*sub3_carryin_cmp_0): New pattern.
(subvti4, usubvti4, negvti3): Use subdi3_carryin_cmp.
(negvdi_carryinV): Remove.
(usub3_carryinC): Remove.
(*usub3_carryinC): Remove.
(*usub3_carryinC_z1): Remove.
(*usub3_carryinC_z2): Remove.
(sub3_carryinV): Remove.
(*sub3_carryinV): Remove.
(*sub3_carryinV_z2): Remove.
* config/aarch64/predicates.md (aarch64_reg_zero_minus1): New.
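For reference, a minimal C case whose expansion is expected to reach subvti4
and hence the new subdi3_carryin_cmp pattern (illustrative only; the sequence
in the comment is approximate):

/* Signed __int128 subtract with overflow check: roughly
   subs; sbcs; cset w0, vs.  */
int subv128 (__int128 a, __int128 b, __int128 *r)
{
  return __builtin_sub_overflow (a, b, r);
}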
---
 gcc/config/aarch64/aarch64.md| 217 +--
 gcc/config/aarch64/predicates.md |   7 +
 2 files changed, 94 insertions(+), 130 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 532c114a42e..564dea390be 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -281,6 +281,7 @@
 UNSPEC_GEN_TAG_RND ; Generate a random 4-bit MTE tag.
 UNSPEC_TAG_SPACE   ; Translate address to MTE tag address space.
 UNSPEC_LD1RO
+UNSPEC_SBCS
 ])
 
 (define_c_enum "unspecv" [
@@ -2942,7 +2943,7 @@
   aarch64_expand_addsubti (operands[0], operands[1], operands[2],
   CODE_FOR_subvdi_insn,
   CODE_FOR_subdi3_compare1,
-  CODE_FOR_subdi3_carryinV);
+  CODE_FOR_subdi3_carryin_cmp);
   aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
   DONE;
 })
@@ -2957,7 +2958,7 @@
   aarch64_expand_addsubti (operands[0], operands[1], operands[2],
   CODE_FOR_subdi3_compare1,
   CODE_FOR_subdi3_compare1,
-  CODE_FOR_usubdi3_carryinC);
+  CODE_FOR_subdi3_carryin_cmp);
   aarch64_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
   DONE;
 })
@@ -2968,12 +2969,14 @@
(label_ref (match_operand 2 "" ""))]
   ""
   {
-emit_insn (gen_negdi_carryout (gen_lowpart (DImode, operands[0]),
-  gen_lowpart (DImode, operands[1])));
-emit_insn (gen_negvdi_carryinV (gen_highpart (DImode, operands[0]),
-   gen_highpart (DImode, operands[1])));
-aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
+rtx op0l = gen_lowpart (DImode, operands[0]);
+rtx op1l = gen_lowpart (DImode, operands[1]);
+rtx op0h = gen_highpart (DImode, operands[0]);
+rtx op1h = gen_highpart (DImode, operands[1]);
 
+emit_insn (gen_negdi_carryout (op0l, op1l));
+emit_insn (gen_subdi3_carryin_cmp (op0h, const0_rtx, op1h));
+aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
 DONE;
   }
 )
@@ -2989,23 +2992,6 @@
   [(set_attr "type" "alus_sreg")]
 )
 
-(define_insn "negvdi_carryinV"
-  [(set (reg:CC_V CC_REGNUM)
-   (compare:CC_V
-(neg:TI (plus:TI
- (ltu:TI (reg:CC CC_REGNUM) (const_int 0))
- (sign_extend:TI (match_operand:DI 1 "register_operand" "r"
-(sign_extend:TI
- (neg:DI (plus:DI (ltu:DI (reg:CC CC_REGNUM) (const_int 0))
-  (match_dup 1))
-   (set (match_operand:DI 0 "register_operand" "=r")
-   (neg:DI (plus:DI (ltu:DI (reg:CC CC_REGNUM) (const_int 0))
-(match_dup 1]
-  ""
-  "ngcs\\t%0, %1"
-  [(set_attr "type" "alus_sreg")]
-)
-
 (define_insn "*sub3_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
(compare:CC_NZ (minus:GPI (match_operand:GPI 1 "register_operand" "rk")
@@ -3370,134 +3356,105 @@
   [(set_attr "type" "adc_reg")]
 )
 
-(define_expand "usub3_carryinC"
+(define_expand "sub3_carryin_cmp"
   [(parallel
- [(set (reg:CC CC_REGNUM)
-  (compare:CC
-(zero_extend:
-  (match_operand:GPI 1 "aarch64_reg_or_zero"))
-(plus:
-  (zero_extend:
-(match_operand:GPI 2 "register_operand"))
-  (ltu: (reg:CC CC_REGNUM) (const_int 0)
-  (set (match_operand:GPI 0 "register_operand")
-  (minus:GPI
-(minus:GPI (match_dup 1) (match_dup 2))
-(ltu:GPI (reg:CC CC_REGNUM) (const_int 0])]
+[(set (match_dup 3)
+

[PATCH v2 07/11] aarch64: Remove CC_ADCmode

2020-04-02 Thread Richard Henderson via Gcc-patches
Now that we're using UNSPEC_ADCS instead of rtl, there's
no reason to distinguish CC_ADCmode from CC_Cmode.  Both
examine only the C bit.  Within uaddvti4, using CC_Cmode
is clearer, since it's the carry-out that's relevant.

* config/aarch64/aarch64-modes.def (CC_ADC): Remove.
* config/aarch64/aarch64.c (aarch64_select_cc_mode):
Do not look for unsigned overflow from add with carry.
* config/aarch64/aarch64.md (uaddvti4): Use CC_Cmode.
* config/aarch64/predicates.md (aarch64_carry_operation)
Remove check for CC_ADCmode.
(aarch64_borrow_operation): Likewise.
---
 gcc/config/aarch64/aarch64.c | 19 ---
 gcc/config/aarch64/aarch64-modes.def |  1 -
 gcc/config/aarch64/aarch64.md|  2 +-
 gcc/config/aarch64/predicates.md |  4 ++--
 4 files changed, 3 insertions(+), 23 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6263897c9a0..8e54506bc3e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9094,16 +9094,6 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
   && rtx_equal_p (XEXP (x, 0), y))
 return CC_Cmode;
 
-  /* A test for unsigned overflow from an add with carry.  */
-  if ((mode_x == DImode || mode_x == TImode)
-  && (code == LTU || code == GEU)
-  && code_x == PLUS
-  && CONST_SCALAR_INT_P (y)
-  && (rtx_mode_t (y, mode_x)
- == (wi::shwi (1, mode_x)
- << (GET_MODE_BITSIZE (mode_x).to_constant () / 2
-return CC_ADCmode;
-
   /* A test for signed overflow.  */
   if ((mode_x == DImode || mode_x == TImode)
   && code == NE
@@ -9232,15 +9222,6 @@ aarch64_get_condition_code_1 (machine_mode mode, enum 
rtx_code comp_code)
}
   break;
 
-case E_CC_ADCmode:
-  switch (comp_code)
-   {
-   case GEU: return AARCH64_CS;
-   case LTU: return AARCH64_CC;
-   default: return -1;
-   }
-  break;
-
 case E_CC_Vmode:
   switch (comp_code)
{
diff --git a/gcc/config/aarch64/aarch64-modes.def 
b/gcc/config/aarch64/aarch64-modes.def
index af972e8f72b..32e4b6a35a9 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -38,7 +38,6 @@ CC_MODE (CC_NZC);   /* Only N, Z and C bits of condition 
flags are valid.
 CC_MODE (CC_NZ);/* Only N and Z bits of condition flags are valid.  */
 CC_MODE (CC_Z); /* Only Z bit of condition flags is valid.  */
 CC_MODE (CC_C); /* C represents unsigned overflow of a simple addition.  */
-CC_MODE (CC_ADC);   /* Unsigned overflow from an ADC (add with carry).  */
 CC_MODE (CC_V); /* Only V bit of condition flags is valid.  */
 
 /* Half-precision floating point for __fp16.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 99023494fa1..8d405b40173 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2079,7 +2079,7 @@
   CODE_FOR_adddi3_compareC,
   CODE_FOR_adddi3_compareC,
   CODE_FOR_adddi3_carryin_cmp);
-  aarch64_gen_unlikely_cbranch (GEU, CC_ADCmode, operands[3]);
+  aarch64_gen_unlikely_cbranch (LTU, CC_Cmode, operands[3]);
   DONE;
 })
 
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 5f44ef7d672..42864cbf4dd 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -388,7 +388,7 @@
   machine_mode ccmode = GET_MODE (op0);
   if (ccmode == CC_Cmode)
 return GET_CODE (op) == LTU;
-  if (ccmode == CC_ADCmode || ccmode == CCmode)
+  if (ccmode == CCmode)
 return GET_CODE (op) == GEU;
   return false;
 })
@@ -406,7 +406,7 @@
   machine_mode ccmode = GET_MODE (op0);
   if (ccmode == CC_Cmode)
 return GET_CODE (op) == GEU;
-  if (ccmode == CC_ADCmode || ccmode == CCmode)
+  if (ccmode == CCmode)
 return GET_CODE (op) == LTU;
   return false;
 })
-- 
2.20.1



[PATCH v2 11/11] aarch64: Implement absti2

2020-04-02 Thread Richard Henderson via Gcc-patches
* config/aarch64/aarch64.md (absti2): New.
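A self-contained C example the new expander is meant to cover (illustrative;
the sequence in the comment is approximate, following the negs/ngcs/csel
shape the expander builds):

/* __int128 absolute value, branchless: roughly
   negs; ngcs; then two csel on the pl condition.  */
__int128 abs128 (__int128 x)
{
  return x < 0 ? -x : x;
}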
---
 gcc/config/aarch64/aarch64.md | 29 +
 1 file changed, 29 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index cf716f815a1..4a30d4cca93 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3521,6 +3521,35 @@
   }
 )
 
+(define_expand "absti2"
+  [(match_operand:TI 0 "register_operand")
+   (match_operand:TI 1 "register_operand")]
+  ""
+  {
+rtx lo_op1 = gen_lowpart (DImode, operands[1]);
+rtx hi_op1 = gen_highpart (DImode, operands[1]);
+rtx lo_tmp = gen_reg_rtx (DImode);
+rtx hi_tmp = gen_reg_rtx (DImode);
+rtx x, cc;
+
+emit_insn (gen_negdi_carryout (lo_tmp, lo_op1));
+emit_insn (gen_subdi3_carryin_cmp (hi_tmp, const0_rtx, hi_op1));
+
+cc = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+x = gen_rtx_GE (VOIDmode, cc, const0_rtx);
+x = gen_rtx_IF_THEN_ELSE (DImode, x, lo_tmp, lo_op1);
+emit_insn (gen_rtx_SET (lo_tmp, x));
+
+x = gen_rtx_GE (VOIDmode, cc, const0_rtx);
+x = gen_rtx_IF_THEN_ELSE (DImode, x, hi_tmp, hi_op1);
+emit_insn (gen_rtx_SET (hi_tmp, x));
+
+emit_move_insn (gen_lowpart (DImode, operands[0]), lo_tmp);
+emit_move_insn (gen_highpart (DImode, operands[0]), hi_tmp);
+DONE;
+  }
+)
+
 (define_insn "neg2"
   [(set (match_operand:GPI 0 "register_operand" "=r,w")
(neg:GPI (match_operand:GPI 1 "register_operand" "r,w")))]
-- 
2.20.1



[PATCH v2 08/11] aarch64: Accept -1 as second argument to add3_carryin

2020-04-02 Thread Richard Henderson via Gcc-patches
* config/aarch64/predicates.md (aarch64_reg_or_minus1): New.
* config/aarch64/aarch64.md (add3_carryin): Use it.
(*add3_carryin): Likewise.
(*addsi3_carryin_uxtw): Likewise.
---
 gcc/config/aarch64/aarch64.md| 26 +++---
 gcc/config/aarch64/predicates.md |  6 +-
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8d405b40173..c11c4366bf9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2545,7 +2545,7 @@
  (plus:GPI
(ltu:GPI (reg:CC_C CC_REGNUM) (const_int 0))
(match_operand:GPI 1 "aarch64_reg_or_zero"))
- (match_operand:GPI 2 "aarch64_reg_or_zero")))]
+ (match_operand:GPI 2 "aarch64_reg_zero_minus1")))]
""
""
 )
@@ -2555,28 +2555,32 @@
 ;; accept the zeros during initial expansion.
 
 (define_insn "*add3_carryin"
-  [(set (match_operand:GPI 0 "register_operand" "=r")
+  [(set (match_operand:GPI 0 "register_operand" "=r,r")
(plus:GPI
  (plus:GPI
(match_operand:GPI 3 "aarch64_carry_operation" "")
-   (match_operand:GPI 1 "aarch64_reg_or_zero" "rZ"))
- (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")))]
-   ""
-   "adc\\t%0, %1, %2"
+   (match_operand:GPI 1 "aarch64_reg_or_zero" "rZ,rZ"))
+ (match_operand:GPI 2 "aarch64_reg_zero_minus1" "rZ,UsM")))]
+  ""
+  "@
+   adc\\t%0, %1, %2
+   sbc\\t%0, %1, zr"
   [(set_attr "type" "adc_reg")]
 )
 
 ;; zero_extend version of above
 (define_insn "*addsi3_carryin_uxtw"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
(zero_extend:DI
  (plus:SI
(plus:SI
  (match_operand:SI 3 "aarch64_carry_operation" "")
- (match_operand:SI 1 "register_operand" "r"))
-   (match_operand:SI 2 "register_operand" "r"]
-   ""
-   "adc\\t%w0, %w1, %w2"
+ (match_operand:SI 1 "register_operand" "r,r"))
+   (match_operand:SI 2 "aarch64_reg_or_minus1" "r,UsM"]
+  ""
+  "@
+   adc\\t%w0, %w1, %w2
+   sbc\\t%w0, %w1, wzr"
   [(set_attr "type" "adc_reg")]
 )
 
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 42864cbf4dd..2e7aa6389eb 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -68,13 +68,17 @@
(ior (match_operand 0 "register_operand")
(match_test "op == CONST0_RTX (GET_MODE (op))"
 
+(define_predicate "aarch64_reg_or_minus1"
+  (and (match_code "reg,subreg,const_int")
+   (ior (match_operand 0 "register_operand")
+   (match_test "op == CONSTM1_RTX (GET_MODE (op))"
+
 (define_predicate "aarch64_reg_zero_minus1"
   (and (match_code "reg,subreg,const_int")
(ior (match_operand 0 "register_operand")
(ior (match_test "op == CONST0_RTX (GET_MODE (op))")
 (match_test "op == CONSTM1_RTX (GET_MODE (op))")
 
-
 (define_predicate "aarch64_reg_or_fp_zero"
   (ior (match_operand 0 "register_operand")
(and (match_code "const_double")
-- 
2.20.1



[PATCH v2 06/11] aarch64: Use UNSPEC_ADCS for add-with-carry + output flags

2020-04-02 Thread Richard Henderson via Gcc-patches
Similar to UNSPEC_SBCS, we can unify the signed/unsigned overflow
paths by using an unspec.

Accept -1 for the second input by using SBCS.

* config/aarch64/aarch64.md (UNSPEC_ADCS): New.
(addvti4, uaddvti4): Use adddi_carryin_cmp.
(add3_carryinC): Remove.
(*add3_carryinC_zero): Remove.
(*add3_carryinC): Remove.
(add3_carryinV): Remove.
(*add3_carryinV_zero): Remove.
(*add3_carryinV): Remove.
(add3_carryin_cmp): New expander.
(*add3_carryin_cmp): New pattern.
(*add3_carryin_cmp_0): New pattern.
(*cmn3_carryin): New pattern.
(*cmn3_carryin_0): New pattern.
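As an illustrative companion to the SBCS change, a minimal C case expected to
reach uaddvti4 and therefore the new adddi3_carryin_cmp pattern (approximate
sequence in the comment):

/* Unsigned __int128 add with carry-out check: roughly
   adds; adcs; cset w0, cs.  */
int uaddv128 (unsigned __int128 a, unsigned __int128 b, unsigned __int128 *r)
{
  return __builtin_add_overflow (a, b, r);
}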
---
 gcc/config/aarch64/aarch64.md | 206 +++---
 1 file changed, 89 insertions(+), 117 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 564dea390be..99023494fa1 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -281,6 +281,7 @@
 UNSPEC_GEN_TAG_RND ; Generate a random 4-bit MTE tag.
 UNSPEC_TAG_SPACE   ; Translate address to MTE tag address space.
 UNSPEC_LD1RO
+UNSPEC_ADCS
 UNSPEC_SBCS
 ])
 
@@ -2062,7 +2063,7 @@
   aarch64_expand_addsubti (operands[0], operands[1], operands[2],
   CODE_FOR_adddi3_compareV,
   CODE_FOR_adddi3_compareC,
-  CODE_FOR_adddi3_carryinV);
+  CODE_FOR_adddi3_carryin_cmp);
   aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
   DONE;
 })
@@ -2077,7 +2078,7 @@
   aarch64_expand_addsubti (operands[0], operands[1], operands[2],
   CODE_FOR_adddi3_compareC,
   CODE_FOR_adddi3_compareC,
-  CODE_FOR_adddi3_carryinC);
+  CODE_FOR_adddi3_carryin_cmp);
   aarch64_gen_unlikely_cbranch (GEU, CC_ADCmode, operands[3]);
   DONE;
 })
@@ -2579,133 +2580,104 @@
   [(set_attr "type" "adc_reg")]
 )
 
-(define_expand "add3_carryinC"
+(define_expand "add3_carryin_cmp"
   [(parallel
- [(set (match_dup 3)
-  (compare:CC_ADC
-(plus:
-  (plus:
-(match_dup 4)
-(zero_extend:
-  (match_operand:GPI 1 "register_operand")))
-  (zero_extend:
-(match_operand:GPI 2 "register_operand")))
-(match_dup 6)))
-  (set (match_operand:GPI 0 "register_operand")
-  (plus:GPI
-(plus:GPI (match_dup 5) (match_dup 1))
-(match_dup 2)))])]
+[(set (match_dup 3)
+ (unspec:CC
+   [(match_operand:GPI 1 "aarch64_reg_or_zero")
+(match_operand:GPI 2 "aarch64_reg_zero_minus1")
+(match_dup 4)]
+   UNSPEC_ADCS))
+ (set (match_operand:GPI 0 "register_operand")
+ (unspec:GPI
+   [(match_dup 1) (match_dup 2) (match_dup 4)]
+   UNSPEC_ADCS))])]
""
-{
-  operands[3] = gen_rtx_REG (CC_ADCmode, CC_REGNUM);
-  rtx ccin = gen_rtx_REG (CC_Cmode, CC_REGNUM);
-  operands[4] = gen_rtx_LTU (mode, ccin, const0_rtx);
-  operands[5] = gen_rtx_LTU (mode, ccin, const0_rtx);
-  operands[6] = immed_wide_int_const (wi::shwi (1, mode)
- << GET_MODE_BITSIZE (mode),
- TImode);
-})
+  {
+operands[3] = gen_rtx_REG (CCmode, CC_REGNUM);
+operands[4] = gen_rtx_GEU (mode, operands[3], const0_rtx);
+  }
+)
 
-(define_insn "*add3_carryinC_zero"
-  [(set (reg:CC_ADC CC_REGNUM)
-   (compare:CC_ADC
- (plus:
-   (match_operand: 2 "aarch64_carry_operation" "")
-   (zero_extend: (match_operand:GPI 1 "register_operand" "r")))
- (match_operand 4 "const_scalar_int_operand" "")))
-   (set (match_operand:GPI 0 "register_operand" "=r")
-   (plus:GPI (match_operand:GPI 3 "aarch64_carry_operation" "")
- (match_dup 1)))]
-  "rtx_mode_t (operands[4], mode)
-   == (wi::shwi (1, mode) << (unsigned) GET_MODE_BITSIZE (mode))"
-   "adcs\\t%0, %1, zr"
+(define_insn "*add3_carryin_cmp"
+  [(set (reg:CC CC_REGNUM)
+   (unspec:CC
+ [(match_operand:GPI 1 "aarch64_reg_or_zero" "%rZ,rZ")
+  (match_operand:GPI 2 "aarch64_reg_zero_minus1" "rZ,UsM")
+  (match_operand:GPI 3 "aarch64_carry_operation" "")]
+ UNSPEC_ADCS))
+   (set (match_operand:GPI 0 "register_operand" "=r,r")
+   (unspec:GPI
+ [(match_dup 1) (match_dup 2) (match_dup 3)]
+ UNSPEC_ADCS))]
+   ""
+   "@
+adcs\\t%0, %1, %2
+sbcs\\t%0, %1, zr"
   [(set_attr "type" "adc_reg")]
 )
 
-(define_insn "*add3_carryinC"
-  [(set (reg:CC_ADC CC_REGNUM)
-   (compare:CC_ADC
- (plus:
-   (plus:
- (match_operand: 3 "aarch64_carry_operation" "")
- (zero_extend: (match_operand:GPI 1 "register_operand" "r")))
-   (zero_extend: (match_operand:GPI 2 "register_operand" "r")))
- (m

[PATCH v2 10/11] aarch64: Implement TImode comparisons

2020-04-02 Thread Richard Henderson via Gcc-patches
Use ccmp to perform all TImode comparisons branchless.

* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Expand all of
the comparisons for TImode, not just NE.
* config/aarch64/aarch64.md (cbranchti4, cstoreti4): New.
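The constant special cases handled below can be seen from user code such as
the following (illustrative; expected sequences are approximate):

/* TImode comparison special cases at -O2:  */
int eq0 (__int128 a) { return a == 0; }  /* orr the halves, then cbz/cset eq  */
int neg (__int128 a) { return a < 0;  }  /* sign bit only: tst or tbnz        */
int le9 (__int128 a) { return a <= 9; }  /* rewritten as a < 10, keeping the
                                            constant operand                  */
int ltu128 (unsigned __int128 a, unsigned __int128 b)
{
  return a < b;                          /* cmp; sbcs xzr; cset cc            */
}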
---
 gcc/config/aarch64/aarch64.c  | 122 ++
 gcc/config/aarch64/aarch64.md |  28 
 2 files changed, 136 insertions(+), 14 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 93658338041..89c9192266c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2333,32 +2333,126 @@ rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
   machine_mode cmp_mode = GET_MODE (x);
-  machine_mode cc_mode;
   rtx cc_reg;
 
   if (cmp_mode == TImode)
 {
-  gcc_assert (code == NE);
-
-  cc_mode = CCmode;
-  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
-
   rtx x_lo = operand_subword (x, 0, 0, TImode);
-  rtx y_lo = operand_subword (y, 0, 0, TImode);
-  emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x_lo, y_lo));
-
   rtx x_hi = operand_subword (x, 1, 0, TImode);
-  rtx y_hi = operand_subword (y, 1, 0, TImode);
-  emit_insn (gen_ccmpccdi (cc_reg, cc_reg, x_hi, y_hi,
-  gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
-  GEN_INT (AARCH64_EQ)));
+  struct expand_operand ops[2];
+  rtx y_lo, y_hi, tmp;
+
+  if (CONST_INT_P (y))
+   {
+ HOST_WIDE_INT y_int = INTVAL (y);
+
+ y_lo = y;
+ switch (code)
+   {
+   case EQ:
+   case NE:
+ /* For equality, IOR the two halves together.  If this gets
+used for a branch, we expect this to fold to cbz/cbnz;
+otherwise it's no larger than cmp+ccmp below.  Beware of
+the compare-and-swap post-reload split and use cmp+ccmp.  */
+ if (y_int == 0 && can_create_pseudo_p ())
+   {
+ tmp = gen_reg_rtx (DImode);
+ emit_insn (gen_iordi3 (tmp, x_hi, x_lo));
+ emit_insn (gen_cmpdi (tmp, const0_rtx));
+ cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+ goto done;
+   }
+   break;
+
+   case LE:
+   case GT:
+ /* Add 1 to Y to convert to LT/GE, which avoids the swap and
+keeps the constant operand.  The cstoreti and cbranchti
+operand predicates require aarch64_plus_operand, which
+means this increment cannot overflow.  */
+ y_lo = gen_int_mode (++y_int, DImode);
+ code = (code == LE ? LT : GE);
+ /* fall through */
+
+   case LT:
+   case GE:
+ /* Check only the sign bit using tst, or fold to tbz/tbnz.  */
+ if (y_int == 0)
+   {
+ cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+ tmp = gen_rtx_AND (DImode, x_hi, GEN_INT (INT64_MIN));
+ tmp = gen_rtx_COMPARE (CC_NZmode, tmp, const0_rtx);
+ emit_set_insn (cc_reg, tmp);
+ code = (code == LT ? NE : EQ);
+ goto done;
+   }
+ break;
+
+   default:
+ break;
+   }
+ y_hi = (y_int < 0 ? constm1_rtx : const0_rtx);
+   }
+  else
+   {
+ y_lo = operand_subword (y, 0, 0, TImode);
+ y_hi = operand_subword (y, 1, 0, TImode);
+   }
+
+  switch (code)
+   {
+   case LEU:
+   case GTU:
+   case LE:
+   case GT:
+ std::swap (x_lo, y_lo);
+ std::swap (x_hi, y_hi);
+ code = swap_condition (code);
+ break;
+
+   default:
+ break;
+   }
+
+  /* Emit cmpdi, forcing operands into registers as required. */
+  create_input_operand (&ops[0], x_lo, DImode);
+  create_input_operand (&ops[1], y_lo, DImode);
+  expand_insn (CODE_FOR_cmpdi, 2, ops);
+
+  cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  switch (code)
+   {
+   case EQ:
+   case NE:
+ /* For EQ, (x_lo == y_lo) && (x_hi == y_hi).  */
+ emit_insn (gen_ccmpccdi (cc_reg, cc_reg, x_hi, y_hi,
+  gen_rtx_EQ (VOIDmode, cc_reg, const0_rtx),
+  GEN_INT (AARCH64_EQ)));
+ break;
+
+   case LTU:
+   case GEU:
+   case LT:
+   case GE:
+ /* Compute (x - y), as double-word arithmetic.  */
+ create_input_operand (&ops[0], x_hi, DImode);
+ create_input_operand (&ops[1], y_hi, DImode);
+ expand_insn (CODE_FOR_cmpdi3_carryin, 2, ops);
+ break;
+
+   default:
+ gcc_unreachable ();
+   }
 }
   else
 {
-  cc_mode = SELECT_CC_MODE (code, x, y);
+  machine_mode cc_mode = SELECT_CC_MODE (code, x, y);
   cc_reg = gen_rtx_REG (cc_mode, CC_R

[PATCH v2 01/11] aarch64: Accept 0 as first argument to compares

2020-04-02 Thread Richard Henderson via Gcc-patches
While cmp (extended register) and cmp (immediate) uses ,
cmp (shifted register) uses .  So we can perform cmp xzr, x0.

For ccmp, we only have  as an input.

* config/aarch64/aarch64.md (cmp): For operand 0, use
aarch64_reg_or_zero.  Shuffle reg/reg to last alternative
and accept Z.
(@ccmpcc): For operand 0, use aarch64_reg_or_zero and Z.
(@ccmpcc_rev): Likewise.
---
 gcc/config/aarch64/aarch64.md | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..6fdab5f3402 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -502,7 +502,7 @@
   [(match_operand 0 "cc_register" "")
(const_int 0)])
  (compare:CC_ONLY
-   (match_operand:GPI 2 "register_operand" "r,r,r")
+   (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))
  (unspec:CC_ONLY
[(match_operand 5 "immediate_operand")]
@@ -542,7 +542,7 @@
[(match_operand 5 "immediate_operand")]
UNSPEC_NZCV)
  (compare:CC_ONLY
-   (match_operand:GPI 2 "register_operand" "r,r,r")
+   (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"]
   ""
   "@
@@ -3961,14 +3961,14 @@
 
 (define_insn "cmp"
   [(set (reg:CC CC_REGNUM)
-   (compare:CC (match_operand:GPI 0 "register_operand" "rk,rk,rk")
-   (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
+   (compare:CC (match_operand:GPI 0 "aarch64_reg_or_zero" "rk,rk,rkZ")
+   (match_operand:GPI 1 "aarch64_plus_operand" "I,J,r")))]
   ""
   "@
-   cmp\\t%0, %1
cmp\\t%0, %1
-   cmn\\t%0, #%n1"
-  [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
+   cmn\\t%0, #%n1
+   cmp\\t%0, %1"
+  [(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
 )
 
 (define_insn "fcmp"
-- 
2.20.1



[PATCH v2 00/11] aarch64: Implement TImode comparisons

2020-04-02 Thread Richard Henderson via Gcc-patches
This is attacking case 3 of PR 94174.

In v2, I unify the various subtract-with-borrow and add-with-carry
patterns that also output flags with unspecs.  As suggested by
Richard Sandiford during review of v1.  It does seem cleaner.


r~


Richard Henderson (11):
  aarch64: Accept 0 as first argument to compares
  aarch64: Accept zeros in add3_carryin
  aarch64: Provide expander for sub3_compare1
  aarch64: Introduce aarch64_expand_addsubti
  aarch64: Use UNSPEC_SBCS for subtract-with-borrow + output flags
  aarch64: Use UNSPEC_ADCS for add-with-carry + output flags
  aarch64: Remove CC_ADCmode
  aarch64: Accept -1 as second argument to add3_carryin
  aarch64: Adjust result of aarch64_gen_compare_reg
  aarch64: Implement TImode comparisons
  aarch64: Implement absti2

 gcc/config/aarch64/aarch64-protos.h   |  10 +-
 gcc/config/aarch64/aarch64.c  | 303 +
 gcc/config/aarch64/aarch64-modes.def  |   1 -
 gcc/config/aarch64/aarch64-simd.md|  18 +-
 gcc/config/aarch64/aarch64-speculation.cc |   5 +-
 gcc/config/aarch64/aarch64.md | 762 ++
 gcc/config/aarch64/predicates.md  |  15 +-
 7 files changed, 527 insertions(+), 587 deletions(-)

-- 
2.20.1
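For context, a small self-contained example of the PR 94174 "case 3" style
comparison this series targets (illustrative; the sequence in the comment is
approximate):

/* Branchless use of a signed __int128 comparison.  Expected shape with
   the series applied: cmp; sbcs xzr; then csel on the lt condition.  */
long pick (__int128 a, __int128 b, long x, long y)
{
  return a < b ? x : y;
}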



[PATCH v2 02/11] aarch64: Accept zeros in add3_carryin

2020-04-02 Thread Richard Henderson via Gcc-patches
The expander and the insn pattern did not match, leading to
recognition failures in expand.

* config/aarch64/aarch64.md (*add3_carryin): Accept zeros.
---
 gcc/config/aarch64/aarch64.md | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6fdab5f3402..b242f2b1c73 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2606,16 +2606,17 @@
""
 )
 
-;; Note that add with carry with two zero inputs is matched by cset,
-;; and that add with carry with one zero input is matched by cinc.
+;; While add with carry with two zero inputs will be folded to cset,
+;; and add with carry with one zero input will be folded to cinc,
+;; accept the zeros during initial expansion.
 
 (define_insn "*add3_carryin"
   [(set (match_operand:GPI 0 "register_operand" "=r")
(plus:GPI
  (plus:GPI
(match_operand:GPI 3 "aarch64_carry_operation" "")
-   (match_operand:GPI 1 "register_operand" "r"))
- (match_operand:GPI 2 "register_operand" "r")))]
+   (match_operand:GPI 1 "aarch64_reg_or_zero" "rZ"))
+ (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")))]
""
"adc\\t%0, %1, %2"
   [(set_attr "type" "adc_reg")]
-- 
2.20.1



[PATCH v2 03/11] aarch64: Provide expander for sub3_compare1

2020-04-02 Thread Richard Henderson via Gcc-patches
In one place we open-code a special case of this pattern into the
more specific sub3_compare1_imm, and miss this special case
in other places.  Centralize that special case into an expander.

* config/aarch64/aarch64.md (*sub3_compare1): Rename
from sub3_compare1.
(sub3_compare1): New expander.
* config/aarch64/aarch64.c (aarch64_expand_subvti): Remove
call to gen_subdi3_compare1_imm.
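The practical effect can be seen when the second operand is a plus-immediate,
which no longer needs to be forced into a register (illustrative; expected
code approximate):

/* Subtract a small constant from an unsigned __int128: roughly
   subs x0, x0, #5; sbc x1, x1, xzr, without materializing the constant.  */
unsigned __int128 sub5 (unsigned __int128 a)
{
  return a - 5;
}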
---
 gcc/config/aarch64/aarch64.c  | 11 ++-
 gcc/config/aarch64/aarch64.md | 22 +-
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c90de65de12..7a13a8e8ec4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20333,16 +20333,9 @@ aarch64_expand_subvti (rtx op0, rtx low_dest, rtx 
low_in1,
 }
   else
 {
-  if (aarch64_plus_immediate (low_in2, DImode))
-   emit_insn (gen_subdi3_compare1_imm (low_dest, low_in1, low_in2,
-   GEN_INT (-INTVAL (low_in2;
-  else
-   {
- low_in2 = force_reg (DImode, low_in2);
- emit_insn (gen_subdi3_compare1 (low_dest, low_in1, low_in2));
-   }
-  high_in2 = force_reg (DImode, high_in2);
+  emit_insn (gen_subdi3_compare1 (low_dest, low_in1, low_in2));
 
+  high_in2 = force_reg (DImode, high_in2);
   if (unsigned_p)
emit_insn (gen_usubdi3_carryinC (high_dest, high_in1, high_in2));
   else
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b242f2b1c73..d6389cc8148 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3120,7 +3120,7 @@
   [(set_attr "type" "alus_imm")]
 )
 
-(define_insn "sub3_compare1"
+(define_insn "*sub3_compare1"
   [(set (reg:CC CC_REGNUM)
(compare:CC
  (match_operand:GPI 1 "aarch64_reg_or_zero" "rkZ")
@@ -3132,6 +3132,26 @@
   [(set_attr "type" "alus_sreg")]
 )
 
+(define_expand "sub3_compare1"
+  [(parallel
+[(set (reg:CC CC_REGNUM)
+ (compare:CC
+   (match_operand:GPI 1 "aarch64_reg_or_zero")
+   (match_operand:GPI 2 "aarch64_reg_or_imm")))
+ (set (match_operand:GPI 0 "register_operand")
+ (minus:GPI (match_dup 1) (match_dup 2)))])]
+  ""
+{
+  if (aarch64_plus_immediate (operands[2], mode))
+{
+  emit_insn (gen_sub3_compare1_imm
+(operands[0], operands[1], operands[2],
+ GEN_INT (-INTVAL (operands[2];
+  DONE;
+}
+  operands[2] = force_reg (mode, operands[2]);
+})
+
 (define_peephole2
   [(set (match_operand:GPI 0 "aarch64_general_reg")
(minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero")
-- 
2.20.1



Re: [PATCH v2 3/9] aarch64: Add cmp_*_carryinC patterns

2020-04-01 Thread Richard Henderson via Gcc-patches
On 4/1/20 9:28 AM, Richard Sandiford wrote:
> How important is it to describe the flags operation as a compare though?
> Could we instead use an unspec with three inputs, and keep it as :CC?
> That would still allow special-case matching for zero operands.

I'm not sure.

My guess is that the only interesting optimization for ADC/SBC is when
the low part of op2 is known to be zero, so that we can fold

  [(set (reg cc) (compare ...))
   (set (reg t0) (sub (reg a0) (reg b0))]

  [(set (reg cc) (compare ...))
   (set (reg t1) (sub (reg a1)
   (sub (reg b1)
 (geu (reg cc) (const 0)]

to

  [(set (reg t0) (reg a0)]

  [(set (reg cc) (compare ...))
   (set (reg t1) (sub (reg a1) (reg b1))]

which combine should be able to do by propagating zeros across the compare+geu.

Though I suppose it's still possible to handle this with unspecs and
define_split, so that

  [(set (reg cc)
(unspec [(reg a1) (reg b2) (geu ...)]
UNSPEC_SBCS)
   (set (reg t1) ...)]

when the geu folds to (const_int 0), we can split this to a plain sub.

I'll see if I can make this work with a minimum of effort.


r~
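A concrete (illustrative) source-level case of the folding described above,
where the low half of one operand is known to be zero:

/* The low-part subtraction cannot borrow when the subtrahend's low half
   is zero, so ideally only the high half needs an instruction (a single
   sub), which is what propagating the zero across the compare+geu gives.  */
unsigned __int128 sub_hi (unsigned __int128 a, unsigned long hi)
{
  return a - ((unsigned __int128) hi << 64);
}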


Re: [PATCH v2 3/9] aarch64: Add cmp_*_carryinC patterns

2020-03-31 Thread Richard Henderson via Gcc-patches
On 3/31/20 11:34 AM, Richard Sandiford wrote:
>> +(define_insn "*cmp3_carryinC"
>> +  [(set (reg:CC CC_REGNUM)
>> +(compare:CC
>> +  (ANY_EXTEND:
>> +(match_operand:GPI 0 "register_operand" "r"))
>> +  (plus:
>> +(ANY_EXTEND:
>> +  (match_operand:GPI 1 "register_operand" "r"))
>> +(match_operand: 2 "aarch64_borrow_operation" ""]
>> +   ""
>> +   "sbcs\\tzr, %0, %1"
>> +  [(set_attr "type" "adc_reg")]
>> +)
> 
> I guess this feeds into your reply to Segher's comment for 7/9,
> but I think:
> 
>(compare:CC X Y)
> 
> is always supposed to be the NZCV flags result of X - Y, as computed in
> the mode of X and Y.  If so, it seems like the type of extension should
> matter.  E.g. the N flag ought to be set for:
> 
>   (compare:CC
> (sign_extend 0xf...)
> (plus (sign_extend 0x7...)
>   (ltu ...)))
> 
> but ought to be clear for:
> 
>   (compare:CC
> (zero_extend 0xf...)
> (plus (zero_extend 0x7...)
>   (ltu ...)))
> 
> If so, I guess this is a bug in the existing code...

The subject of CC modes is a sticky one.  It mostly depends on what combine
is able to do with the patterns.

For instance, in your example above, even for the signed case, the N bit cannot
be examined by itself, because that would only be valid for a comparison
against zero, like

(compare (plus (reg) (reg))
 (const_int 0))

For this particular bit of rtl, the only valid comparison is N == V, i.e. GE/LT.

If we add a new CC mode for this, what would you call it?  Probably not
CC_NVmode, because to me that implies you can use either N or V, but it doesn't
imply you must examine both.

If we add more CC modes, does that mean that we have to improve SELECT_CC_MODE
to match those patterns?  Or do we add new CC modes just so that combine's use
of SELECT_CC_MODE *cannot* match them?


r~


Re: [PATCH v2 1/9] aarch64: Accept 0 as first argument to compares

2020-03-31 Thread Richard Henderson via Gcc-patches
On 3/31/20 9:55 AM, Richard Sandiford wrote:
>>  (define_insn "cmp"
>>[(set (reg:CC CC_REGNUM)
>> -(compare:CC (match_operand:GPI 0 "register_operand" "rk,rk,rk")
>> -(match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
>> +(compare:CC (match_operand:GPI 0 "aarch64_reg_or_zero" "rk,rk,rkZ")
>> +(match_operand:GPI 1 "aarch64_plus_operand" "I,J,rZ")))]
>>""
>>"@
>> -   cmp\\t%0, %1
>> cmp\\t%0, %1
>> -   cmn\\t%0, #%n1"
>> -  [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
>> +   cmn\\t%0, #%n1
>> +   cmp\\t%0, %1"
>> +  [(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
>>  )
>>  
>>  (define_insn "fcmp"
> 
> ...does adding 'Z' to operand 1 enable any new combinations?

Not useful ones, on reflection, but given it's a valid combination, it's easier
to include it than not.

I can certainly remove that.

r~



Re: [PATCH v2 3/9] aarch64: Add cmp_*_carryinC patterns

2020-03-22 Thread Richard Henderson via Gcc-patches
On 3/22/20 12:30 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Mar 20, 2020 at 07:42:25PM -0700, Richard Henderson via Gcc-patches 
> wrote:
>> Duplicate all usub_*_carryinC, but use xzr for the output when we
>> only require the flags output.  The signed versions use sign_extend
>> instead of zero_extend for combine's benefit.
> 
> You actually use ANY_EXTEND, which makes a lot more sense :-)
> 
> Did you see combine create a sign_extend, ever?  Or do those just come
> from combining other insns that already contain a sign_extend?

In the penultimate patch, for cmpti, I emit this sign_extend'ed pattern
manually, so that rtl actually gets the proper description of the comparison of
the high-half of the TImode variable.


r~


[PATCH v2 7/9] aarch64: Adjust result of aarch64_gen_compare_reg

2020-03-20 Thread Richard Henderson via Gcc-patches
Return the entire comparison expression, not just the cc_reg.
This will allow the routine to adjust the comparison code as
needed for TImode comparisons.

Note that some users were passing e.g. EQ to aarch64_gen_compare_reg
and then using gen_rtx_NE.  Pass the proper code in the first place.

* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Return
the final comparison for code & cc_reg.
(aarch64_gen_compare_reg_maybe_ze): Likewise.
(aarch64_expand_compare_and_swap): Update to match -- do not
build the final comparison here, but PUT_MODE as necessary.
(aarch64_split_compare_and_swap): Use prebuilt comparison.
* config/aarch64/aarch64-simd.md (aarch64_cmdi): Likewise.
(aarch64_cmdi): Likewise.
(aarch64_cmtstdi): Likewise.
* config/aarch64/aarch64-speculation.cc
(aarch64_speculation_establish_tracker): Likewise.
* config/aarch64/aarch64.md (cbranch4, cbranch4): Likewise.
(mod3, abs2): Likewise.
(cstore4, cstore4): Likewise.
(cmov6, cmov6): Likewise.
(movcc, movcc, movcc): Likewise.
(cc): Likewise.
(ffs2): Likewise.
(cstorecc4): Remove redundant "".
---
 gcc/config/aarch64/aarch64.c  | 26 +++---
 gcc/config/aarch64/aarch64-simd.md| 18 ++---
 gcc/config/aarch64/aarch64-speculation.cc |  5 +-
 gcc/config/aarch64/aarch64.md | 96 ++-
 4 files changed, 63 insertions(+), 82 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6263897c9a0..9e7c26a8df2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2328,7 +2328,7 @@ emit_set_insn (rtx x, rtx y)
 }
 
 /* X and Y are two things to compare using CODE.  Emit the compare insn and
-   return the rtx for register 0 in the proper mode.  */
+   return the rtx for the CCmode comparison.  */
 rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
@@ -2359,7 +2359,7 @@ aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
   cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
   emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
 }
-  return cc_reg;
+  return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
 }
 
 /* Similarly, but maybe zero-extend Y if Y_MODE < SImode.  */
@@ -2382,7 +2382,7 @@ aarch64_gen_compare_reg_maybe_ze (RTX_CODE code, rtx x, 
rtx y,
  cc_mode = CC_SWPmode;
  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
  emit_set_insn (cc_reg, t);
- return cc_reg;
+ return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
}
 }
 
@@ -18506,7 +18506,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
 
   emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
   newval, mod_s));
-  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+  PUT_MODE (x, SImode);
 }
   else if (TARGET_OUTLINE_ATOMICS)
 {
@@ -18517,7 +18518,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
   rval = emit_library_call_value (func, NULL_RTX, LCT_NORMAL, r_mode,
  oldval, mode, newval, mode,
  XEXP (mem, 0), Pmode);
-  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+  PUT_MODE (x, SImode);
 }
   else
 {
@@ -18529,13 +18531,13 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (GEN_FCN (code) (rval, mem, oldval, newval,
 is_weak, mod_s, mod_f));
   cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
 }
 
   if (r_mode != mode)
 rval = gen_lowpart (mode, rval);
   emit_move_insn (operands[1], rval);
 
-  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
   emit_insn (gen_rtx_SET (bval, x));
 }
 
@@ -18610,10 +18612,8 @@ aarch64_split_compare_and_swap (rtx operands[])
   if (strong_zero_p)
 x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
   else
-{
-  rtx cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
-  x = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
-}
+x = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
@@ -18626,8 +18626,7 @@ aarch64_split_compare_and_swap (rtx operands[])
{
  /* Emit an explicit compare instruction, so that we can correctly
 track the condition codes.  */
- rtx cc_reg = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
- x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+ x = aarch64_gen_compare_reg (NE, scratch,

[PATCH v2 8/9] aarch64: Implement TImode comparisons

2020-03-20 Thread Richard Henderson via Gcc-patches
Use ccmp to perform all TImode comparisons branchless.

* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Expand all of
the comparisons for TImode, not just NE.
* config/aarch64/aarch64.md (cbranchti4, cstoreti4): New.
---
 gcc/config/aarch64/aarch64.c  | 130 ++
 gcc/config/aarch64/aarch64.md |  28 
 2 files changed, 144 insertions(+), 14 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9e7c26a8df2..6ae0ea388ce 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2333,32 +2333,134 @@ rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
   machine_mode cmp_mode = GET_MODE (x);
-  machine_mode cc_mode;
   rtx cc_reg;
 
   if (cmp_mode == TImode)
 {
-  gcc_assert (code == NE);
-
-  cc_mode = CCmode;
-  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
-
   rtx x_lo = operand_subword (x, 0, 0, TImode);
-  rtx y_lo = operand_subword (y, 0, 0, TImode);
-  emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x_lo, y_lo));
-
   rtx x_hi = operand_subword (x, 1, 0, TImode);
-  rtx y_hi = operand_subword (y, 1, 0, TImode);
-  emit_insn (gen_ccmpccdi (cc_reg, cc_reg, x_hi, y_hi,
-  gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
-  GEN_INT (AARCH64_EQ)));
+  struct expand_operand ops[2];
+  rtx y_lo, y_hi, tmp;
+
+  if (CONST_INT_P (y))
+   {
+ HOST_WIDE_INT y_int = INTVAL (y);
+
+ y_lo = y;
+ switch (code)
+   {
+   case EQ:
+   case NE:
+ /* For equality, IOR the two halves together.  If this gets
+used for a branch, we expect this to fold to cbz/cbnz;
+otherwise it's no larger than cmp+ccmp below.  Beware of
+the compare-and-swap post-reload split and use cmp+ccmp.  */
+ if (y_int == 0 && can_create_pseudo_p ())
+   {
+ tmp = gen_reg_rtx (DImode);
+ emit_insn (gen_iordi3 (tmp, x_hi, x_lo));
+ emit_insn (gen_cmpdi (tmp, const0_rtx));
+ cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+ goto done;
+   }
+   break;
+
+   case LE:
+   case GT:
+ /* Add 1 to Y to convert to LT/GE, which avoids the swap and
+keeps the constant operand.  The cstoreti and cbranchti
+operand predicates require aarch64_plus_operand, which
+means this increment cannot overflow.  */
+ y_lo = gen_int_mode (++y_int, DImode);
+ code = (code == LE ? LT : GE);
+ /* fall through */
+
+   case LT:
+   case GE:
+ /* Check only the sign bit using tst, or fold to tbz/tbnz.  */
+ if (y_int == 0)
+   {
+ cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+ tmp = gen_rtx_AND (DImode, x_hi, GEN_INT (INT64_MIN));
+ tmp = gen_rtx_COMPARE (CC_NZmode, tmp, const0_rtx);
+ emit_set_insn (cc_reg, tmp);
+ code = (code == LT ? NE : EQ);
+ goto done;
+   }
+ break;
+
+   default:
+ break;
+   }
+ y_hi = (y_int < 0 ? constm1_rtx : const0_rtx);
+   }
+  else
+   {
+ y_lo = operand_subword (y, 0, 0, TImode);
+ y_hi = operand_subword (y, 1, 0, TImode);
+   }
+
+  switch (code)
+   {
+   case LEU:
+   case GTU:
+   case LE:
+   case GT:
+ std::swap (x_lo, y_lo);
+ std::swap (x_hi, y_hi);
+ code = swap_condition (code);
+ break;
+
+   default:
+ break;
+   }
+
+  /* Emit cmpdi, forcing operands into registers as required. */
+  create_input_operand (&ops[0], x_lo, DImode);
+  create_input_operand (&ops[1], y_lo, DImode);
+  expand_insn (CODE_FOR_cmpdi, 2, ops);
+
+  cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  switch (code)
+   {
+   case EQ:
+   case NE:
+ /* For EQ, (x_lo == y_lo) && (x_hi == y_hi).  */
+ emit_insn (gen_ccmpccdi (cc_reg, cc_reg, x_hi, y_hi,
+  gen_rtx_EQ (VOIDmode, cc_reg, const0_rtx),
+  GEN_INT (AARCH64_EQ)));
+ break;
+
+   case LTU:
+   case GEU:
+ /* For LTU, (x - y), as double-word arithmetic.  */
+ create_input_operand (&ops[0], x_hi, DImode);
+ create_input_operand (&ops[1], y_hi, DImode);
+ expand_insn (CODE_FOR_ucmpdi3_carryinC, 2, ops);
+ /* The result is entirely within the C bit. */
+ break;
+
+   case LT:
+   case GE:
+ /* For LT, (x - y), as double-word arithmetic.  */
+ create_input_operand (&ops[0], x_hi, DImode);
+ create_input_operand (&ops[1], y_hi, DI

[PATCH v2 6/9] aarch64: Introduce aarch64_expand_addsubti

2020-03-20 Thread Richard Henderson via Gcc-patches
Modify aarch64_expand_subvti into a form that handles all
addition and subtraction: plain (modulo), signed overflow, and
unsigned overflow.

Use expand_insn to put the operands into the proper form,
and do not force values into register if not required.

* config/aarch64/aarch64.c (aarch64_ti_split) New.
(aarch64_addti_scratch_regs): Remove.
(aarch64_subvti_scratch_regs): Remove.
(aarch64_expand_subvti): Remove.
(aarch64_expand_addsubti): New.
* config/aarch64/aarch64-protos.h: Update to match.
* config/aarch64/aarch64.md (addti3): Use aarch64_expand_addsubti.
(addvti4, uaddvti4): Likewise.
(subvti4, usubvti4): Likewise.
(subti3): Likewise; accept immediates for operand 2.
---
 gcc/config/aarch64/aarch64-protos.h |  10 +-
 gcc/config/aarch64/aarch64.c| 136 
 gcc/config/aarch64/aarch64.md   | 125 ++---
 3 files changed, 67 insertions(+), 204 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index d6d668ea920..787085b24d2 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -630,16 +630,8 @@ void aarch64_reset_previous_fndecl (void);
 bool aarch64_return_address_signing_enabled (void);
 bool aarch64_bti_enabled (void);
 void aarch64_save_restore_target_globals (tree);
-void aarch64_addti_scratch_regs (rtx, rtx, rtx *,
-rtx *, rtx *,
-rtx *, rtx *,
-rtx *);
-void aarch64_subvti_scratch_regs (rtx, rtx, rtx *,
- rtx *, rtx *,
- rtx *, rtx *, rtx *);
-void aarch64_expand_subvti (rtx, rtx, rtx,
-   rtx, rtx, rtx, rtx, bool);
 
+void aarch64_expand_addsubti (rtx, rtx, rtx, int, int, int);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c90de65de12..6263897c9a0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20241,117 +20241,61 @@ aarch64_gen_unlikely_cbranch (enum rtx_code code, 
machine_mode cc_mode,
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
 }
 
-/* Generate DImode scratch registers for 128-bit (TImode) addition.
+/* Generate DImode scratch registers for 128-bit (TImode) add/sub.
+   INPUT represents the TImode input operand
+   LO represents the low half (DImode) of the TImode operand
+   HI represents the high half (DImode) of the TImode operand.  */
 
-   OP1 represents the TImode destination operand 1
-   OP2 represents the TImode destination operand 2
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   HIGH_IN2 represents the high half (DImode) of TImode operand 2.  */
-
-void
-aarch64_addti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
-   rtx *low_in1, rtx *low_in2,
-   rtx *high_dest, rtx *high_in1,
-   rtx *high_in2)
+static void
+aarch64_ti_split (rtx input, rtx *lo, rtx *hi)
 {
-  *low_dest = gen_reg_rtx (DImode);
-  *low_in1 = gen_lowpart (DImode, op1);
-  *low_in2 = simplify_gen_subreg (DImode, op2, TImode,
- subreg_lowpart_offset (DImode, TImode));
-  *high_dest = gen_reg_rtx (DImode);
-  *high_in1 = gen_highpart (DImode, op1);
-  *high_in2 = simplify_gen_subreg (DImode, op2, TImode,
-  subreg_highpart_offset (DImode, TImode));
+  *lo = simplify_gen_subreg (DImode, input, TImode,
+subreg_lowpart_offset (DImode, TImode));
+  *hi = simplify_gen_subreg (DImode, input, TImode,
+subreg_highpart_offset (DImode, TImode));
 }
 
-/* Generate DImode scratch registers for 128-bit (TImode) subtraction.
-
-   This function differs from 'arch64_addti_scratch_regs' in that
-   OP1 can be an immediate constant (zero). We must call
-   subreg_highpart_offset with DImode and TImode arguments, otherwise
-   VOIDmode will be used for the const_int which generates an internal
-   error from subreg_size_highpart_offset which does not expect a size of zero.
-
-   OP1 represents the TImode destination operand 1
-   OP2 represents the TImode destination operand 2
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   HIGH_I

[PATCH v2 1/9] aarch64: Accept 0 as first argument to compares

2020-03-20 Thread Richard Henderson via Gcc-patches
While cmp (extended register) and cmp (immediate) uses ,
cmp (shifted register) uses .  So we can perform cmp xzr, x0.

For ccmp, we only have  as an input.

* config/aarch64/aarch64.md (cmp): For operand 0, use
aarch64_reg_or_zero.  Shuffle reg/reg to last alternative
and accept Z.
(@ccmpcc): For operand 0, use aarch64_reg_or_zero and Z.
(@ccmpcc_rev): Likewise.
---
 gcc/config/aarch64/aarch64.md | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..b9ae51e48dd 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -502,7 +502,7 @@
   [(match_operand 0 "cc_register" "")
(const_int 0)])
  (compare:CC_ONLY
-   (match_operand:GPI 2 "register_operand" "r,r,r")
+   (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))
  (unspec:CC_ONLY
[(match_operand 5 "immediate_operand")]
@@ -542,7 +542,7 @@
[(match_operand 5 "immediate_operand")]
UNSPEC_NZCV)
  (compare:CC_ONLY
-   (match_operand:GPI 2 "register_operand" "r,r,r")
+   (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"]
   ""
   "@
@@ -3961,14 +3961,14 @@
 
 (define_insn "cmp"
   [(set (reg:CC CC_REGNUM)
-   (compare:CC (match_operand:GPI 0 "register_operand" "rk,rk,rk")
-   (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
+   (compare:CC (match_operand:GPI 0 "aarch64_reg_or_zero" "rk,rk,rkZ")
+   (match_operand:GPI 1 "aarch64_plus_operand" "I,J,rZ")))]
   ""
   "@
-   cmp\\t%0, %1
cmp\\t%0, %1
-   cmn\\t%0, #%n1"
-  [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
+   cmn\\t%0, #%n1
+   cmp\\t%0, %1"
+  [(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
 )
 
 (define_insn "fcmp"
-- 
2.20.1



[PATCH v2 9/9] aarch64: Implement absti2

2020-03-20 Thread Richard Henderson via Gcc-patches
* config/aarch64/aarch64.md (absti2): New.
---
 gcc/config/aarch64/aarch64.md | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 284a8038e28..7a112f89487 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3653,6 +3653,36 @@
   }
 )
 
+(define_expand "absti2"
+  [(match_operand:TI 0 "register_operand")
+   (match_operand:TI 1 "register_operand")]
+  ""
+  {
+rtx lo_op1 = gen_lowpart (DImode, operands[1]);
+rtx hi_op1 = gen_highpart (DImode, operands[1]);
+rtx lo_tmp = gen_reg_rtx (DImode);
+rtx hi_tmp = gen_reg_rtx (DImode);
+rtx x;
+
+emit_insn (gen_negdi_carryout (lo_tmp, lo_op1));
+emit_insn (gen_negvdi_carryinV (hi_tmp, hi_op1));
+
+rtx cc = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+
+x = gen_rtx_GE (VOIDmode, cc, const0_rtx);
+x = gen_rtx_IF_THEN_ELSE (DImode, x, lo_tmp, lo_op1);
+emit_insn (gen_rtx_SET (lo_tmp, x));
+
+x = gen_rtx_GE (VOIDmode, cc, const0_rtx);
+x = gen_rtx_IF_THEN_ELSE (DImode, x, hi_tmp, hi_op1);
+emit_insn (gen_rtx_SET (hi_tmp, x));
+
+emit_move_insn (gen_lowpart (DImode, operands[0]), lo_tmp);
+emit_move_insn (gen_highpart (DImode, operands[0]), hi_tmp);
+DONE;
+  }
+)
+
 (define_insn "neg2"
   [(set (match_operand:GPI 0 "register_operand" "=r,w")
(neg:GPI (match_operand:GPI 1 "register_operand" "r,w")))]
-- 
2.20.1



[PATCH v2 5/9] aarch64: Provide expander for sub3_compare1

2020-03-20 Thread Richard Henderson via Gcc-patches
In a couple of places we open-code a special case of this
pattern into the more specific sub3_compare1_imm.
Centralize that special case into an expander.

* config/aarch64/aarch64.md (*sub3_compare1): Rename
from sub3_compare1.
(sub3_compare1): New expander.
---
 gcc/config/aarch64/aarch64.md | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 076158b0071..47eeba7311c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3120,7 +3120,7 @@
   [(set_attr "type" "alus_imm")]
 )
 
-(define_insn "sub3_compare1"
+(define_insn "*sub3_compare1"
   [(set (reg:CC CC_REGNUM)
(compare:CC
  (match_operand:GPI 1 "aarch64_reg_or_zero" "rkZ")
@@ -3132,6 +3132,26 @@
   [(set_attr "type" "alus_sreg")]
 )
 
+(define_expand "sub3_compare1"
+  [(parallel
+[(set (reg:CC CC_REGNUM)
+ (compare:CC
+   (match_operand:GPI 1 "aarch64_reg_or_zero")
+   (match_operand:GPI 2 "aarch64_reg_or_imm")))
+ (set (match_operand:GPI 0 "register_operand")
+ (minus:GPI (match_dup 1) (match_dup 2)))])]
+  ""
+{
+  if (aarch64_plus_immediate (operands[2], mode))
+{
+  emit_insn (gen_sub3_compare1_imm
+(operands[0], operands[1], operands[2],
+ GEN_INT (-INTVAL (operands[2];
+  DONE;
+}
+  operands[2] = force_reg (mode, operands[2]);
+})
+
 (define_peephole2
   [(set (match_operand:GPI 0 "aarch64_general_reg")
(minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero")
-- 
2.20.1



[PATCH v2 3/9] aarch64: Add cmp_*_carryinC patterns

2020-03-20 Thread Richard Henderson via Gcc-patches
Duplicate all usub_*_carryinC, but use xzr for the output when we
only require the flags output.  The signed versions use sign_extend
instead of zero_extend for combine's benefit.

These will be used shortly for TImode comparisons.
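
As an illustration (mine, not from the patch; the function name is made
up), these flag-only patterns target compares such as the following,
which with the rest of the series should become a cmp of the low halves
followed by sbcs xzr of the high halves:

/* Signed 128-bit less-than; the high-half subtract is needed only for
   its flags, so its result can be discarded into xzr.  */
int
lt128 (__int128 a, __int128 b)
{
  return a < b;
}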

* config/aarch64/aarch64.md (cmp3_carryinC): New.
(*cmp3_carryinC_z1): New.
(*cmp3_carryinC_z2): New.
(*cmp3_carryinC): New.
---
 gcc/config/aarch64/aarch64.md | 50 +++
 1 file changed, 50 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a996a5f1c39..9b1c3f797f9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3440,6 +3440,18 @@
""
 )
 
+(define_expand "cmp3_carryinC"
+   [(set (reg:CC CC_REGNUM)
+(compare:CC
+  (ANY_EXTEND:
+(match_operand:GPI 0 "register_operand"))
+  (plus:
+(ANY_EXTEND:
+  (match_operand:GPI 1 "register_operand"))
+(ltu: (reg:CC CC_REGNUM) (const_int 0)]
+   ""
+)
+
 (define_insn "*usub3_carryinC_z1"
   [(set (reg:CC CC_REGNUM)
(compare:CC
@@ -3457,6 +3469,19 @@
   [(set_attr "type" "adc_reg")]
 )
 
+(define_insn "*cmp3_carryinC_z1"
+  [(set (reg:CC CC_REGNUM)
+   (compare:CC
+ (const_int 0)
+ (plus:
+   (ANY_EXTEND:
+ (match_operand:GPI 0 "register_operand" "r"))
+   (match_operand: 1 "aarch64_borrow_operation" ""]
+   ""
+   "sbcs\\tzr, zr, %0"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*usub3_carryinC_z2"
   [(set (reg:CC CC_REGNUM)
(compare:CC
@@ -3472,6 +3497,17 @@
   [(set_attr "type" "adc_reg")]
 )
 
+(define_insn "*cmp3_carryinC_z2"
+  [(set (reg:CC CC_REGNUM)
+   (compare:CC
+ (ANY_EXTEND:
+   (match_operand:GPI 0 "register_operand" "r"))
+ (match_operand: 1 "aarch64_borrow_operation" "")))]
+   ""
+   "sbcs\\tzr, %0, zr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*usub3_carryinC"
   [(set (reg:CC CC_REGNUM)
(compare:CC
@@ -3490,6 +3526,20 @@
   [(set_attr "type" "adc_reg")]
 )
 
+(define_insn "*cmp3_carryinC"
+  [(set (reg:CC CC_REGNUM)
+   (compare:CC
+ (ANY_EXTEND:
+   (match_operand:GPI 0 "register_operand" "r"))
+ (plus:
+   (ANY_EXTEND:
+ (match_operand:GPI 1 "register_operand" "r"))
+   (match_operand: 2 "aarch64_borrow_operation" ""]
+   ""
+   "sbcs\\tzr, %0, %1"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_expand "sub3_carryinV"
   [(parallel
  [(set (reg:CC_V CC_REGNUM)
-- 
2.20.1



[PATCH v2 4/9] aarch64: Add cmp_carryinC_m2

2020-03-20 Thread Richard Henderson via Gcc-patches
Combine will fold immediate -1 differently than the other
*cmp*_carryinC* patterns.  In this case we can use adcs
with an xzr input, and it occurs frequently when comparing
128-bit values to small negative constants.
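
For example (an illustration of my own, not taken from the patch),
comparing against a small negative constant leaves an all-ones high
word, which is the shape combine folds into this pattern:

/* The high word of -2 is -1, so the high-half step can become
   adcs xzr, x1, xzr.  */
int
lt_m2 (__int128 a)
{
  return a < -2;
}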

* config/aarch64/aarch64.md (cmp_carryinC_m2): New.
---
 gcc/config/aarch64/aarch64.md | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 9b1c3f797f9..076158b0071 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3452,6 +3452,7 @@
""
 )
 
+;; Substituting zero into the first input operand.
 (define_insn "*usub3_carryinC_z1"
   [(set (reg:CC CC_REGNUM)
(compare:CC
@@ -3482,6 +3483,7 @@
   [(set_attr "type" "adc_reg")]
 )
 
+;; Substituting zero into the second input operand.
 (define_insn "*usub3_carryinC_z2"
   [(set (reg:CC CC_REGNUM)
(compare:CC
@@ -3508,6 +3510,19 @@
   [(set_attr "type" "adc_reg")]
 )
 
+;; Substituting -1 into the second input operand.
+(define_insn "*cmp3_carryinC_m2"
+  [(set (reg:CC CC_REGNUM)
+   (compare:CC
+ (neg:
+   (match_operand: 1 "aarch64_carry_operation" ""))
+ (ANY_EXTEND:
+   (match_operand:GPI 0 "register_operand" "r"]
+   ""
+   "adcs\\tzr, %0, zr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*usub3_carryinC"
   [(set (reg:CC CC_REGNUM)
(compare:CC
-- 
2.20.1



[PATCH v2 2/9] aarch64: Accept zeros in add3_carryin

2020-03-20 Thread Richard Henderson via Gcc-patches
The expander and the insn pattern did not match, leading to
recognition failures in expand.
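
Presumably (my guess at an example; it is not taken from the patch) the
failing case is a TImode addition where one high half is a known zero:

/* The high half of b is zero, so expansion emits an add-with-carry
   whose second addend is zero.  */
unsigned __int128
add128_64 (unsigned __int128 a, unsigned long b)
{
  return a + b;
}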

* config/aarch64/aarch64.md (*add3_carryin): Accept zeros.
---
 gcc/config/aarch64/aarch64.md | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b9ae51e48dd..a996a5f1c39 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2606,16 +2606,17 @@
""
 )
 
-;; Note that add with carry with two zero inputs is matched by cset,
-;; and that add with carry with one zero input is matched by cinc.
+;; While add with carry with two zero inputs will be folded to cset,
+;; and add with carry with one zero input will be folded to cinc,
+;; accept the zeros during initial expansion.
 
 (define_insn "*add3_carryin"
   [(set (match_operand:GPI 0 "register_operand" "=r")
(plus:GPI
  (plus:GPI
(match_operand:GPI 3 "aarch64_carry_operation" "")
-   (match_operand:GPI 1 "register_operand" "r"))
- (match_operand:GPI 2 "register_operand" "r")))]
+   (match_operand:GPI 1 "aarch64_reg_or_zero" "rZ"))
+ (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")))]
""
"adc\\t%0, %1, %2"
   [(set_attr "type" "adc_reg")]
-- 
2.20.1



[PATCH v2 0/9] aarch64: Implement TImode comparisons

2020-03-20 Thread Richard Henderson via Gcc-patches
This is attacking case 3 of PR 94174.

Note that I'm no longer using ccmp for most of the TImode comparisons.
Thanks to Wilco Dijkstra for pulling off my blinders and reminding me
that we can use subs+sbcs for (almost) all compares.

The first 5 patches clean up or add patterns to support the expansion
without generating extraneous constant loads.

The aarch64_expand_addsubti patch tidies up the existing TImode
arithmetic expansions.

EXAMPLE __subvti3 (context diff is easier to read):

*** 12,27 
10: b7f800a3tbnzx3, #63, 24 <__subvti3+0x24>
!   14: eb02003fcmp x1, x2
!   18: 5400010cb.gt38 <__subvti3+0x38>
!   1c: 54000140b.eq44 <__subvti3+0x44>  // b.none
20: d65f03c0ret
!   24: eb01005fcmp x2, x1
!   28: 548cb.gt38 <__subvti3+0x38>
!   2c: 54a1b.ne20 <__subvti3+0x20>  // b.any
!   30: eb9fcmp x4, x0
!   34: 5469b.ls20 <__subvti3+0x20>  // b.plast
!   38: a9bf7bfdstp x29, x30, [sp, #-16]!
!   3c: 910003fdmov x29, sp
!   40: 9400bl  0 
!   44: eb04001fcmp x0, x4
!   48: 5488b.hi38 <__subvti3+0x38>  // b.pmore
!   4c: d65f03c0ret
--- 12,22 
10: b7f800a3tbnzx3, #63, 24 <__subvti3+0x24>
!   14: eb9fcmp x4, x0
!   18: fa01005fsbcsxzr, x2, x1
!   1c: 54abb.lt30 <__subvti3+0x30>  // b.tstop
20: d65f03c0ret
!   24: eb04001fcmp x0, x4
!   28: fa02003fsbcsxzr, x1, x2
!   2c: 54aab.ge20 <__subvti3+0x20>  // b.tcont
!   30: a9bf7bfdstp x29, x30, [sp, #-16]!
!   34: 910003fdmov x29, sp
!   38: 9400bl  0 

EXAMPLE from the pr:

#include <stdint.h>
extern void doit(void);

void test3(__int128 a, uint64_t l)
{
    if ((__int128_t)a - l <= 1)
        doit();
}

*** 11,23 
subsx0, x0, x2
sbc x1, x1, xzr
!   cmp x1, 0
!   ble .L6
! .L1:
ret
.p2align 2,,3
- .L6:
-   bne .L4
-   cmp x0, 1
-   bhi .L1
  .L4:
b   doit
--- 11,19 
subsx0, x0, x2
sbc x1, x1, xzr
!   cmp x0, 2
!   sbcsxzr, x1, xzr
!   blt .L4
ret
.p2align 2,,3
  .L4:
b   doit


r~


Richard Henderson (9):
  aarch64: Accept 0 as first argument to compares
  aarch64: Accept zeros in add3_carryin
  aarch64: Add cmp_*_carryinC patterns
  aarch64: Add cmp_carryinC_m2
  aarch64: Provide expander for sub3_compare1
  aarch64: Introduce aarch64_expand_addsubti
  aarch64: Adjust result of aarch64_gen_compare_reg
  aarch64: Implement TImode comparisons
  aarch64: Implement absti2

 gcc/config/aarch64/aarch64-protos.h   |  10 +-
 gcc/config/aarch64/aarch64.c  | 292 +---
 gcc/config/aarch64/aarch64-simd.md|  18 +-
 gcc/config/aarch64/aarch64-speculation.cc |   5 +-
 gcc/config/aarch64/aarch64.md | 389 +-
 5 files changed, 402 insertions(+), 312 deletions(-)

-- 
2.20.1



Re: [PATCH 0/6] aarch64: Implement TImode comparisons

2020-03-19 Thread Richard Henderson via Gcc-patches
On 3/19/20 8:47 AM, Wilco Dijkstra wrote:
> Hi Richard,
> 
> Thanks for these patches - yes TI mode expansions can certainly be improved!
> So looking at your expansions for signed compares, why not copy the optimal
> sequence from 32-bit Arm?
> 
> Any compare can be done in at most 2 instructions:
> 
> void doit(void);
> void f(long long a)
> {
> if (a <= 1)
> doit();
> }
> 
> f:
> cmp r0, #2
> sbcsr3, r1, #0
> blt .L4

Well, this one requires that you be able to add 1 to an input without that
input overflowing: rewriting a <= C as a < C + 1 breaks down when C is
already the maximum value.  But you're right that I should be using this
sequence for LT (not LE).

I'll have another look.


r~


[PATCH 4/6] aarch64: Simplify @ccmp operands

2020-03-18 Thread Richard Henderson via Gcc-patches
The first two arguments were "reversed", in that operand 0 was not
the output, but the input cc_reg.  Remove operand 0 entirely, since
we can get the input cc_reg from within the operand 3 comparison
expression.  This moves the output operand to index 0.

* config/aarch64/aarch64.md (@ccmpcc): New expander; remove
operand 0; change operand 3 from match_operator to match_operand.
(*ccmpcc): Rename from @ccmp; swap numbers of operand 0 & 1.
(@ccmp, *ccmp): Likewise.
(@ccmpcc_rev, *ccmpcc_rev): Likewise.
(@ccmp_rev, *ccmp_rev): Likewise.
* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Update to match.
(aarch64_gen_ccmp_next): Likewise.
---
 gcc/config/aarch64/aarch64.c  | 21 +-
 gcc/config/aarch64/aarch64.md | 76 +--
 2 files changed, 74 insertions(+), 23 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 619357fa210..16ff40fc267 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2349,7 +2349,7 @@ aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 
   rtx x_hi = operand_subword (x, 1, 0, TImode);
   rtx y_hi = operand_subword (y, 1, 0, TImode);
-  emit_insn (gen_ccmpccdi (cc_reg, cc_reg, x_hi, y_hi,
+  emit_insn (gen_ccmpccdi (cc_reg, x_hi, y_hi,
   gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
   GEN_INT (AARCH64_EQ)));
 }
@@ -20445,7 +20445,7 @@ aarch64_gen_ccmp_next (rtx_insn **prep_seq, rtx_insn **gen_seq, rtx prev,
   machine_mode op_mode, cmp_mode, cc_mode = CCmode;
   int unsignedp = TYPE_UNSIGNED (TREE_TYPE (treeop0));
   insn_code icode;
-  struct expand_operand ops[6];
+  struct expand_operand ops[5];
   int aarch64_cond;
 
   push_to_sequence (*prep_seq);
@@ -20484,8 +20484,8 @@ aarch64_gen_ccmp_next (rtx_insn **prep_seq, rtx_insn **gen_seq, rtx prev,
 
   icode = code_for_ccmp (cc_mode, cmp_mode);
 
-  op0 = prepare_operand (icode, op0, 2, op_mode, cmp_mode, unsignedp);
-  op1 = prepare_operand (icode, op1, 3, op_mode, cmp_mode, unsignedp);
+  op0 = prepare_operand (icode, op0, 1, op_mode, cmp_mode, unsignedp);
+  op1 = prepare_operand (icode, op1, 2, op_mode, cmp_mode, unsignedp);
   if (!op0 || !op1)
 {
   end_sequence ();
@@ -20517,15 +20517,14 @@ aarch64_gen_ccmp_next (rtx_insn **prep_seq, rtx_insn **gen_seq, rtx prev,
   aarch64_cond = AARCH64_INVERSE_CONDITION_CODE (aarch64_cond);
 }
 
-  create_fixed_operand (&ops[0], XEXP (prev, 0));
-  create_fixed_operand (&ops[1], target);
-  create_fixed_operand (&ops[2], op0);
-  create_fixed_operand (&ops[3], op1);
-  create_fixed_operand (&ops[4], prev);
-  create_fixed_operand (&ops[5], GEN_INT (aarch64_cond));
+  create_fixed_operand (&ops[0], target);
+  create_fixed_operand (&ops[1], op0);
+  create_fixed_operand (&ops[2], op1);
+  create_fixed_operand (&ops[3], prev);
+  create_fixed_operand (&ops[4], GEN_INT (aarch64_cond));
 
   push_to_sequence (*gen_seq);
-  if (!maybe_expand_insn (icode, 6, ops))
+  if (!maybe_expand_insn (icode, 5, ops))
 {
   end_sequence ();
   return NULL_RTX;
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 0fe41117640..12213176103 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -495,11 +495,24 @@
   ""
   "")
 
-(define_insn "@ccmp"
-  [(set (match_operand:CC_ONLY 1 "cc_register" "")
+(define_expand "@ccmp"
+  [(set (match_operand:CC_ONLY 0 "cc_register")
+   (if_then_else:CC_ONLY
+ (match_operand 3 "aarch64_comparison_operator")
+ (compare:CC_ONLY
+   (match_operand:GPI 1 "aarch64_reg_or_zero")
+   (match_operand:GPI 2 "aarch64_ccmp_operand"))
+ (unspec:CC_ONLY
+   [(match_operand 4 "immediate_operand")]
+   UNSPEC_NZCV)))]
+  ""
+)
+
+(define_insn "*ccmp"
+  [(set (match_operand:CC_ONLY 0 "cc_register" "")
(if_then_else:CC_ONLY
  (match_operator 4 "aarch64_comparison_operator"
-  [(match_operand 0 "cc_register" "")
+  [(match_operand 1 "cc_register" "")
(const_int 0)])
  (compare:CC_ONLY
(match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
@@ -515,11 +528,24 @@
   [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
 )
 
-(define_insn "@ccmp"
-  [(set (match_operand:CCFP_CCFPE 1 "cc_register" "")
+(define_expand "@ccmp"
+  [(set (match_operand:CCFP_CCFPE 0 "cc_register")
+   (if_then_else:CCFP_CCFPE
+ (match_operand 3 "aarch64_comparison_operator")
+ (compare:CCFP_CCFPE
+   (match_operand:GPF 1 "register_operand")
+   (match_operand:GPF 2 "register_operand"))
+ (unspec:CCFP_CCFPE
+   [(match_operand 4 "immediate_operand")]
+   UNSPEC_NZCV)))]
+  ""
+)
+
+(define_insn "*ccmp"
+  [(set (match_operand:CCFP_CCFPE 0 "cc_register" "")
(if_then_else:CCFP_CCFPE
  (match_operator 

[PATCH 2/6] aarch64: Adjust result of aarch64_gen_compare_reg

2020-03-18 Thread Richard Henderson via Gcc-patches
Return the entire comparison expression, not just the cc_reg.
This will allow the routine to adjust the comparison code as
needed for TImode comparisons.

Note that some users were passing e.g. EQ to aarch64_gen_compare_reg
and then using gen_rtx_NE.  Pass the proper code in the first place.

* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Return
the final comparison for code & cc_reg.
(aarch64_gen_compare_reg_maybe_ze): Likewise.
(aarch64_expand_compare_and_swap): Update to match -- do not
build the final comparison here, but PUT_MODE as necessary.
(aarch64_split_compare_and_swap): Use prebuilt comparison.
* config/aarch64/aarch64-simd.md (aarch64_cmdi): Likewise.
(aarch64_cmdi): Likewise.
(aarch64_cmtstdi): Likewise.
* config/aarch64/aarch64-speculation.cc
(aarch64_speculation_establish_tracker): Likewise.
* config/aarch64/aarch64.md (cbranch4, cbranch4): Likewise.
(mod3, abs2): Likewise.
(cstore4, cstore4): Likewise.
(cmov6, cmov6): Likewise.
(movcc, movcc, movcc): Likewise.
(cc): Likewise.
(ffs2): Likewise.
(cstorecc4): Remove redundant "".
---
 gcc/config/aarch64/aarch64.c  | 26 +++---
 gcc/config/aarch64/aarch64-simd.md| 18 ++---
 gcc/config/aarch64/aarch64-speculation.cc |  5 +-
 gcc/config/aarch64/aarch64.md | 96 ++-
 4 files changed, 63 insertions(+), 82 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c90de65de12..619357fa210 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2328,7 +2328,7 @@ emit_set_insn (rtx x, rtx y)
 }
 
 /* X and Y are two things to compare using CODE.  Emit the compare insn and
-   return the rtx for register 0 in the proper mode.  */
+   return the rtx for the CCmode comparison.  */
 rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
@@ -2359,7 +2359,7 @@ aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
   cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
   emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
 }
-  return cc_reg;
+  return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
 }
 
 /* Similarly, but maybe zero-extend Y if Y_MODE < SImode.  */
@@ -2382,7 +2382,7 @@ aarch64_gen_compare_reg_maybe_ze (RTX_CODE code, rtx x, rtx y,
  cc_mode = CC_SWPmode;
  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
  emit_set_insn (cc_reg, t);
- return cc_reg;
+ return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
}
 }
 
@@ -18506,7 +18506,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
 
   emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
   newval, mod_s));
-  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+  PUT_MODE (x, SImode);
 }
   else if (TARGET_OUTLINE_ATOMICS)
 {
@@ -18517,7 +18518,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
   rval = emit_library_call_value (func, NULL_RTX, LCT_NORMAL, r_mode,
  oldval, mode, newval, mode,
  XEXP (mem, 0), Pmode);
-  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+  PUT_MODE (x, SImode);
 }
   else
 {
@@ -18529,13 +18531,13 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (GEN_FCN (code) (rval, mem, oldval, newval,
 is_weak, mod_s, mod_f));
   cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
 }
 
   if (r_mode != mode)
 rval = gen_lowpart (mode, rval);
   emit_move_insn (operands[1], rval);
 
-  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
   emit_insn (gen_rtx_SET (bval, x));
 }
 
@@ -18610,10 +18612,8 @@ aarch64_split_compare_and_swap (rtx operands[])
   if (strong_zero_p)
 x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
   else
-{
-  rtx cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
-  x = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
-}
+x = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
@@ -18626,8 +18626,7 @@ aarch64_split_compare_and_swap (rtx operands[])
{
  /* Emit an explicit compare instruction, so that we can correctly
 track the condition codes.  */
- rtx cc_reg = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
- x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+ x = aarch64_gen_compare_reg (NE, scratch,

[PATCH 6/6] aarch64: Implement TImode comparisons

2020-03-18 Thread Richard Henderson via Gcc-patches
Use ccmp to perform all TImode comparisons without branches.
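
As a source-level illustration (mine, not part of the patch; the
function name is invented), a compare like the one below should now
expand branch-free, using ccmp along the lines shown in the cover
letter:

int
le128 (__int128 a, __int128 b)
{
  return a <= b;
}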

* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Expand all of
the comparisons for TImode, not just NE.
* config/aarch64/aarch64.md (cbranchti4, cstoreti4): New.
---
 gcc/config/aarch64/aarch64.c  | 182 +++---
 gcc/config/aarch64/aarch64.md |  28 ++
 2 files changed, 196 insertions(+), 14 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d7899dad759..911dc1c91cd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2363,32 +2363,186 @@ rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
   machine_mode cmp_mode = GET_MODE (x);
-  machine_mode cc_mode;
   rtx cc_reg;
 
   if (cmp_mode == TImode)
 {
-  gcc_assert (code == NE);
-
-  cc_mode = CCmode;
-  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
-
   rtx x_lo = operand_subword (x, 0, 0, TImode);
-  rtx y_lo = operand_subword (y, 0, 0, TImode);
-  emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x_lo, y_lo));
-
   rtx x_hi = operand_subword (x, 1, 0, TImode);
-  rtx y_hi = operand_subword (y, 1, 0, TImode);
-  emit_insn (gen_ccmpccdi (cc_reg, x_hi, y_hi,
-  gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
-  GEN_INT (aarch64_nzcv_codes[AARCH64_NE])));
+  rtx y_lo, y_hi, tmp;
+
+  if (y == const0_rtx)
+   {
+ y_lo = y_hi = y;
+ switch (code)
+   {
+   case EQ:
+   case NE:
+ /* For equality, IOR the two halves together.  If this gets
+used for a branch, we expect this to fold to cbz/cbnz;
+otherwise it's no larger than cmp+ccmp below.  Beware of
+the compare-and-swap post-reload split and use cmp+ccmp.  */
+ if (!can_create_pseudo_p ())
+   break;
+ tmp = gen_reg_rtx (DImode);
+ emit_insn (gen_iordi3 (tmp, x_hi, x_lo));
+ emit_insn (gen_cmpdi (tmp, const0_rtx));
+ cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+ goto done;
+
+   case LT:
+   case GE:
+ /* Check only the sign bit.  Choose to expose this detail,
+lest something later tries to use a COMPARE in a way
+that doesn't correspond.  This is "tst".  */
+ cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+ tmp = gen_rtx_AND (DImode, x_hi, GEN_INT (INT64_MIN));
+ tmp = gen_rtx_COMPARE (CC_NZmode, tmp, const0_rtx);
+ emit_set_insn (cc_reg, tmp);
+ code = (code == LT ? NE : EQ);
+ goto done;
+
+   case LE:
+   case GT:
+ /* For GT, (x_hi >= 0) && ((x_hi | x_lo) != 0),
+and of course the inverse for LE.  */
+ emit_insn (gen_cmpdi (x_hi, const0_rtx));
+
+ tmp = gen_reg_rtx (DImode);
+ emit_insn (gen_iordi3 (tmp, x_hi, x_lo));
+
+ /* Combine the two terms:
+(GE ? (compare tmp 0) : EQ),
+so that the whole term is true for NE, false for EQ.  */
+ cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+ emit_insn (gen_ccmpccdi
+(cc_reg, tmp, const0_rtx,
+ gen_rtx_GE (VOIDmode, cc_reg, const0_rtx),
+ GEN_INT (aarch64_nzcv_codes[AARCH64_EQ])));
+
+ /* The result is entirely within the Z bit. */
+ code = (code == GT ? NE : EQ);
+ goto done;
+
+   default:
+ break;
+   }
+   }
+  else
+   {
+ y_lo = operand_subword (y, 0, 0, TImode);
+ y_hi = operand_subword (y, 1, 0, TImode);
+   }
+
+  cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  switch (code)
+   {
+   case EQ:
+   case NE:
+ /* For EQ, (x_lo == y_lo) && (x_hi == y_hi).  */
+ emit_insn (gen_cmpdi (x_lo, y_lo));
+ emit_insn (gen_ccmpccdi (cc_reg, x_hi, y_hi,
+  gen_rtx_EQ (VOIDmode, cc_reg, const0_rtx),
+  GEN_INT (aarch64_nzcv_codes[AARCH64_NE])));
+ break;
+
+   case LEU:
+   case GTU:
+ std::swap (x_lo, y_lo);
+ std::swap (x_hi, y_hi);
+ code = swap_condition (code);
+ /* fall through */
+
+   case LTU:
+   case GEU:
+ /* For LTU, (x - y), as double-word arithmetic.  */
+ emit_insn (gen_cmpdi (x_lo, y_lo));
+ /* The ucmp*_carryinC pattern uses zero_extend, and so cannot
+take the constant 0 we allow elsewhere.  Force to reg now
+and allow combine to eliminate via simplification.  */
+ x_hi = force_reg (DImode, x_hi);
+ y_hi = force_reg (DImode, y_hi);
+ emit_insn (gen_ucmpdi3_carryinC(x_hi, y_hi));
+ /* The result is entirely within th

[PATCH 5/6] aarch64: Improve nzcv argument to ccmp

2020-03-18 Thread Richard Henderson via Gcc-patches
Currently we use %k to interpret an aarch64_cond_code value.
This interpretation is done via an array, aarch64_nzcv_codes.
The rtl is neither hindered nor harmed by using the proper
nzcv value itself, so index the array earlier than later.
This makes it easier to compare the rtl to the assembly.

It is slightly confusing that aarch64_nzcv_codes holds the nzcv values
which produce the inverse of the condition code used as the index.
Invert those values, so that each entry gives the flags that make its
own code true; for example, the entry indexed by AARCH64_EQ changes
from 0 to AARCH64_CC_Z.

* config/aarch64/aarch64.c (AARCH64_CC_{NZCV}): Move up.
(aarch64_nzcv_codes): Move up; reverse values of even/odd entries.
(aarch64_gen_compare_reg): Use aarch64_nzcv_codes in
gen_ccmpccdi generation.
(aarch64_print_operand): Remove case 'k'.
(aarch64_gen_ccmp_next): Invert condition for !AND, remove
inversion for AND; use aarch64_nzcv_codes.
* config/aarch64/aarch64.md (*ccmpcc): Remove %k from
all alternatives.
(*ccmpcc_rev, *ccmp, *ccmp_rev): Likewise.
---
 gcc/config/aarch64/aarch64.c  | 81 +++
 gcc/config/aarch64/aarch64.md | 16 +++
 2 files changed, 42 insertions(+), 55 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 16ff40fc267..d7899dad759 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1270,6 +1270,36 @@ aarch64_cc;
 
 #define AARCH64_INVERSE_CONDITION_CODE(X) ((aarch64_cc) (((int) X) ^ 1))
 
+/* N Z C V.  */
+#define AARCH64_CC_V 1
+#define AARCH64_CC_C (1 << 1)
+#define AARCH64_CC_Z (1 << 2)
+#define AARCH64_CC_N (1 << 3)
+
+/*
+ * N Z C V flags for ccmp.  Indexed by aarch64_cond_code.
+ * These are the flags to make the given code be *true*.
+ */
+static const int aarch64_nzcv_codes[] =
+{
+  AARCH64_CC_Z,/* EQ, Z == 1.  */
+  0,   /* NE, Z == 0.  */
+  AARCH64_CC_C,/* CS, C == 1.  */
+  0,   /* CC, C == 0.  */
+  AARCH64_CC_N,/* MI, N == 1.  */
+  0,   /* PL, N == 0.  */
+  AARCH64_CC_V,/* VS, V == 1.  */
+  0,   /* VC, V == 0.  */
+  AARCH64_CC_C,/* HI, C == 1 && Z == 0.  */
+  0,   /* LS, !(C == 1 && Z == 0).  */
+  0,   /* GE, N == V.  */
+  AARCH64_CC_V,/* LT, N != V.  */
+  0,   /* GT, Z == 0 && N == V.  */
+  AARCH64_CC_V,/* LE, !(Z == 0 && N == V).  */
+  0,   /* AL, Any.  */
+  0/* NV, Any.  */
+};
+
 struct aarch64_branch_protect_type
 {
   /* The type's name that the user passes to the branch-protection option
@@ -2351,7 +2381,7 @@ aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
   rtx y_hi = operand_subword (y, 1, 0, TImode);
   emit_insn (gen_ccmpccdi (cc_reg, x_hi, y_hi,
   gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
-  GEN_INT (AARCH64_EQ)));
+  GEN_INT (aarch64_nzcv_codes[AARCH64_NE])));
 }
   else
 {
@@ -9302,33 +9332,6 @@ aarch64_const_vec_all_in_range_p (rtx vec,
   return true;
 }
 
-/* N Z C V.  */
-#define AARCH64_CC_V 1
-#define AARCH64_CC_C (1 << 1)
-#define AARCH64_CC_Z (1 << 2)
-#define AARCH64_CC_N (1 << 3)
-
-/* N Z C V flags for ccmp.  Indexed by AARCH64_COND_CODE.  */
-static const int aarch64_nzcv_codes[] =
-{
-  0,   /* EQ, Z == 1.  */
-  AARCH64_CC_Z,/* NE, Z == 0.  */
-  0,   /* CS, C == 1.  */
-  AARCH64_CC_C,/* CC, C == 0.  */
-  0,   /* MI, N == 1.  */
-  AARCH64_CC_N, /* PL, N == 0.  */
-  0,   /* VS, V == 1.  */
-  AARCH64_CC_V, /* VC, V == 0.  */
-  0,   /* HI, C ==1 && Z == 0.  */
-  AARCH64_CC_C,/* LS, !(C == 1 && Z == 0).  */
-  AARCH64_CC_V,/* GE, N == V.  */
-  0,   /* LT, N != V.  */
-  AARCH64_CC_Z, /* GT, Z == 0 && N == V.  */
-  0,   /* LE, !(Z == 0 && N == V).  */
-  0,   /* AL, Any.  */
-  0/* NV, Any.  */
-};
-
 /* Print floating-point vector immediate operand X to F, negating it
first if NEGATE is true.  Return true on success, false if it isn't
a constant we can handle.  */
@@ -9416,7 +9419,6 @@ sizetochar (int size)
(32-bit or 64-bit).
  '0':  Print a normal operand, if it's a general register,
then we assume DImode.
- 'k':  Print NZCV for conditional compare instructions.
  'A':  Output address constant representing the first
argument of X, specifying a relocation offset
if appropriate.
@@ -9866,22 +9868,6 @@ aarch64_print_operand (FILE *f, rtx x, int code)
   output_addr_const (asm_out_file, x);
   break;
 
-case 'k':
-  {
-   HOST_WIDE_INT cond_code;
-
-   if (!CONST_INT_P (x))
- {
-   output_operand_lossage ("invalid operand for '%%%c'", code);
-   return;
- }
-
-   cond_code = INTVAL (x);
-   gcc_assert (cond_code >= 0 && cond_code <= AARCH64_NV);
-  

[PATCH 1/6] aarch64: Add ucmp_*_carryinC patterns for all usub_*_carryinC

2020-03-18 Thread Richard Henderson via Gcc-patches
Use xzr for the output when we only require the flags output.
This will be used shortly for TImode comparisons.
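
As an illustration (added by me, not part of the patch), the unsigned
flag-only patterns serve compares such as:

/* Unsigned 128-bit less-than; only the borrow out of the high half
   matters, so the subtract result itself can go to xzr.  */
int
ltu128 (unsigned __int128 a, unsigned __int128 b)
{
  return a < b;
}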

* config/aarch64/aarch64.md (ucmp3_carryinC): New.
(*ucmp3_carryinC_z1): New.
(*ucmp3_carryinC_z2): New.
(*ucmp3_carryinC): New.
---
 gcc/config/aarch64/aarch64.md | 50 +++
 1 file changed, 50 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..fcc1ddafaec 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3439,6 +3439,18 @@
""
 )
 
+(define_expand "ucmp3_carryinC"
+   [(set (reg:CC CC_REGNUM)
+(compare:CC
+  (zero_extend:
+(match_operand:GPI 0 "register_operand"))
+  (plus:
+(zero_extend:
+  (match_operand:GPI 1 "register_operand"))
+(ltu: (reg:CC CC_REGNUM) (const_int 0)]
+   ""
+)
+
 (define_insn "*usub3_carryinC_z1"
   [(set (reg:CC CC_REGNUM)
(compare:CC
@@ -3456,6 +3468,19 @@
   [(set_attr "type" "adc_reg")]
 )
 
+(define_insn "*ucmp3_carryinC_z1"
+  [(set (reg:CC CC_REGNUM)
+   (compare:CC
+ (const_int 0)
+ (plus:
+   (zero_extend:
+ (match_operand:GPI 0 "register_operand" "r"))
+   (match_operand: 1 "aarch64_borrow_operation" ""]
+   ""
+   "sbcs\\tzr, zr, %0"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*usub3_carryinC_z2"
   [(set (reg:CC CC_REGNUM)
(compare:CC
@@ -3471,6 +3496,17 @@
   [(set_attr "type" "adc_reg")]
 )
 
+(define_insn "*ucmp3_carryinC_z2"
+  [(set (reg:CC CC_REGNUM)
+   (compare:CC
+ (zero_extend:
+   (match_operand:GPI 0 "register_operand" "r"))
+ (match_operand: 1 "aarch64_borrow_operation" "")))]
+   ""
+   "sbcs\\tzr, %0, zr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*usub3_carryinC"
   [(set (reg:CC CC_REGNUM)
(compare:CC
@@ -3489,6 +3525,20 @@
   [(set_attr "type" "adc_reg")]
 )
 
+(define_insn "*ucmp3_carryinC"
+  [(set (reg:CC CC_REGNUM)
+   (compare:CC
+ (zero_extend:
+   (match_operand:GPI 0 "register_operand" "r"))
+ (plus:
+   (zero_extend:
+ (match_operand:GPI 1 "register_operand" "r"))
+   (match_operand: 2 "aarch64_borrow_operation" ""]
+   ""
+   "sbcs\\tzr, %0, %1"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_expand "sub3_carryinV"
   [(parallel
  [(set (reg:CC_V CC_REGNUM)
-- 
2.20.1



[PATCH 3/6] aarch64: Accept 0 as first argument to compares

2020-03-18 Thread Richard Henderson via Gcc-patches
While cmp (extended register) and cmp (immediate) take SP but not XZR
as the first operand, cmp (shifted register) takes XZR but not SP.
So we can perform cmp xzr, x0.

For ccmp, the first input is a plain register, so XZR is available
there as well.

* config/aarch64/aarch64.md (cmp): For operand 0, use
aarch64_reg_or_zero.  Shuffle reg/reg to last alternative
and accept Z.
(@ccmpcc): For operand 0, use aarch64_reg_or_zero and Z.
(@ccmpcc_rev): Likewise.
---
 gcc/config/aarch64/aarch64.md | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 29dfd6df30c..0fe41117640 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -502,7 +502,7 @@
   [(match_operand 0 "cc_register" "")
(const_int 0)])
  (compare:CC_ONLY
-   (match_operand:GPI 2 "register_operand" "r,r,r")
+   (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))
  (unspec:CC_ONLY
[(match_operand 5 "immediate_operand")]
@@ -542,7 +542,7 @@
[(match_operand 5 "immediate_operand")]
UNSPEC_NZCV)
  (compare:CC_ONLY
-   (match_operand:GPI 2 "register_operand" "r,r,r")
+   (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"]
   ""
   "@
@@ -4009,14 +4009,14 @@
 
 (define_insn "cmp"
   [(set (reg:CC CC_REGNUM)
-   (compare:CC (match_operand:GPI 0 "register_operand" "rk,rk,rk")
-   (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
+   (compare:CC (match_operand:GPI 0 "aarch64_reg_or_zero" "rk,rk,rkZ")
+   (match_operand:GPI 1 "aarch64_plus_operand" "I,J,rZ")))]
   ""
   "@
-   cmp\\t%0, %1
cmp\\t%0, %1
-   cmn\\t%0, #%n1"
-  [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
+   cmn\\t%0, #%n1
+   cmp\\t%0, %1"
+  [(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
 )
 
 (define_insn "fcmp"
-- 
2.20.1



[PATCH 0/6] aarch64: Implement TImode comparisons

2020-03-18 Thread Richard Henderson via Gcc-patches
This is attacking case 3 of PR 94174.

The existing ccmp optimization happens at the gimple level,
which means that rtl expansion of TImode operations cannot take
advantage of it.  But we can do even better than the existing
ccmp optimization.

This expansion is similar in size to our current branchful
expansion, but is all straight-line code.  I will assume in
general that the branch predictor will work better with
fewer branches.

E.g.

-  10:  b7f800a3tbnzx3, #63, 24 <__subvti3+0x24>
-  14:  eb02003fcmp x1, x2
-  18:  5400010cb.gt38 <__subvti3+0x38>
-  1c:  54000140b.eq44 <__subvti3+0x44>  // b.none
-  20:  d65f03c0ret
-  24:  eb01005fcmp x2, x1
-  28:  548cb.gt38 <__subvti3+0x38>
-  2c:  54a1b.ne20 <__subvti3+0x20>  // b.any
-  30:  eb9fcmp x4, x0
-  34:  5469b.ls20 <__subvti3+0x20>  // b.plast
-  38:  a9bf7bfdstp x29, x30, [sp, #-16]!
-  3c:  910003fdmov x29, sp
-  40:  9400bl  0 
-  44:  eb04001fcmp x0, x4
-  48:  5488b.hi38 <__subvti3+0x38>  // b.pmore
-  4c:  d65f03c0ret

+  10:  b7f800e3tbnzx3, #63, 2c <__subvti3+0x2c>
+  14:  eb01005fcmp x2, x1
+  18:  1a9fb7e2csetw2, ge  // ge = tcont
+  1c:  fa400080ccmpx4, x0, #0x0, eq  // eq = none
+  20:  7a40a844ccmpw2, #0x0, #0x4, ge  // ge = tcont
+  24:  54e0b.eq40 <__subvti3+0x40>  // b.none
+  28:  d65f03c0ret
+  2c:  eb01005fcmp x2, x1
+  30:  1a9fc7e2csetw2, le
+  34:  fa400081ccmpx4, x0, #0x1, eq  // eq = none
+  38:  7a40d844ccmpw2, #0x0, #0x4, le
+  3c:  5460b.eq28 <__subvti3+0x28>  // b.none
+  40:  a9bf7bfdstp x29, x30, [sp, #-16]!
+  44:  910003fdmov x29, sp
+  48:  9400bl  0 

So one less insn, but 2 branches instead of 6.

As for the specific case of the PR,

#include <stdint.h>
extern void doit(void);

void test_int128(__int128 a, uint64_t l)
{
    if ((__int128_t)a - l <= 1)
        doit();
}

0:  eb02subsx0, x0, x2
4:  da1f0021sbc x1, x1, xzr
8:  f13fcmp x1, #0x0
-   c:  544db.le14 
-  10:  d65f03c0ret
-  14:  5461b.ne20   // b.any
-  18:  f100041fcmp x0, #0x1
-  1c:  54a8b.hi10   // b.pmore
+   c:  1a9fc7e1csetw1, le
+  10:  fa410801ccmpx0, #0x1, #0x1, eq  // eq = none
+  14:  7a40d824ccmpw1, #0x0, #0x4, le
+  18:  5441b.ne20   // b.any
+  1c:  d65f03c0ret
   20:  1400b   0 


r~


Richard Henderson (6):
  aarch64: Add ucmp_*_carryinC patterns for all usub_*_carryinC
  aarch64: Adjust result of aarch64_gen_compare_reg
  aarch64: Accept 0 as first argument to compares
  aarch64: Simplify @ccmp operands
  aarch64: Improve nzcv argument to ccmp
  aarch64: Implement TImode comparisons

 gcc/config/aarch64/aarch64.c  | 304 --
 gcc/config/aarch64/aarch64-simd.md|  18 +-
 gcc/config/aarch64/aarch64-speculation.cc |   5 +-
 gcc/config/aarch64/aarch64.md | 280 ++--
 4 files changed, 429 insertions(+), 178 deletions(-)

-- 
2.20.1