PING^3: [PATCH] mips: add MSA vec_cmp and vec_cmpu expand pattern [PR101132]

2021-07-15 Thread Xi Ruoyao via Gcc-patches
Ping again.

I heard from someone's private mail that Matthew is too busy to deal with
MIPS things.  Hope someone else can review it.

On Mon, 2021-06-21 at 21:42 +0800, Xi Ruoyao wrote:
> The middle-end has emitted vec_cmp and vec_cmpu since GCC 11, causing an
> ICE on MIPS with MSA enabled.  Add the patterns to prevent it.
> 
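For context, vec_cmp/vec_cmpu are the standard-name expanders for an element-wise vector comparison producing a per-element all-ones/all-zeros mask; without them, the vectorizer's request for this operation has no MSA expansion, hence the ICE.  A scalar sketch of the semantics (illustrative names, not part of the patch):

```cpp
#include <stdint.h>
#include <assert.h>

/* Scalar model of what a vec_cmp expansion computes: each result element
   becomes all-ones (-1) when the comparison holds and all-zeros otherwise.
   vec_cmpu is the same operation with unsigned comparison semantics.  */
static void
vec_cmp_lt_model (int32_t *res, const int32_t *a, const int32_t *b, int n)
{
  for (int i = 0; i < n; i++)
    res[i] = (a[i] < b[i]) ? -1 : 0;	/* -1 is the 0xffffffff mask */
}
```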
> Bootstrapped and regression tested on mips64el-linux-gnu.
> Ok for trunk?
> 
> gcc/
> 
> * config/mips/mips-protos.h (mips_expand_vec_cmp_expr):
> Declare.
> * config/mips/mips.c (mips_expand_vec_cmp_expr): New function.
> * config/mips/mips-msa.md (vec_cmp<MSA:mode><mode_i>): New
>   expander.
>   (vec_cmpu<IMSA:mode><mode_i>): New expander.
> ---
>  gcc/config/mips/mips-msa.md   | 22 ++
>  gcc/config/mips/mips-protos.h |  1 +
>  gcc/config/mips/mips.c    | 11 +++
>  3 files changed, 34 insertions(+)
> 
> diff --git a/gcc/config/mips/mips-msa.md b/gcc/config/mips/mips-msa.md
> index 3ecf2bde19f..3a67f25be56 100644
> --- a/gcc/config/mips/mips-msa.md
> +++ b/gcc/config/mips/mips-msa.md
> @@ -435,6 +435,28 @@
>    DONE;
>  })
>  
> +(define_expand "vec_cmp<MSA:mode><mode_i>"
> +  [(match_operand:<VIMODE> 0 "register_operand")
> +   (match_operator 1 ""
> +	 [(match_operand:MSA 2 "register_operand")
> +	  (match_operand:MSA 3 "register_operand")])]
> +  "ISA_HAS_MSA"
> +{
> +  mips_expand_vec_cmp_expr (operands);
> +  DONE;
> +})
> +
> +(define_expand "vec_cmpu<IMSA:mode><mode_i>"
> +  [(match_operand:<VIMODE> 0 "register_operand")
> +   (match_operator 1 ""
> +	 [(match_operand:IMSA 2 "register_operand")
> +	  (match_operand:IMSA 3 "register_operand")])]
> +  "ISA_HAS_MSA"
> +{
> +  mips_expand_vec_cmp_expr (operands);
> +  DONE;
> +})
> +
>  (define_insn "msa_insert_"
>    [(set (match_operand:MSA 0 "register_operand" "=f,f")
> (vec_merge:MSA
> diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
> index 2cf4ed50292..a685f7f7dd5 100644
> --- a/gcc/config/mips/mips-protos.h
> +++ b/gcc/config/mips/mips-protos.h
> @@ -385,6 +385,7 @@ extern mulsidi3_gen_fn mips_mulsidi3_gen_fn (enum rtx_code);
>  
>  extern void mips_register_frame_header_opt (void);
>  extern void mips_expand_vec_cond_expr (machine_mode, machine_mode, rtx *);
> +extern void mips_expand_vec_cmp_expr (rtx *);
>  
>  /* Routines implemented in mips-d.c  */
>  extern void mips_d_target_versions (void);
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 00a8eef96aa..8f043399a8e 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -22321,6 +22321,17 @@ mips_expand_msa_cmp (rtx dest, enum rtx_code cond, rtx op0, rtx op1)
>  }
>  }
>  
> +void
> +mips_expand_vec_cmp_expr (rtx *operands)
> +{
> +  rtx cond = operands[1];
> +  rtx op0 = operands[2];
> +  rtx op1 = operands[3];
> +  rtx res = operands[0];
> +
> +  mips_expand_msa_cmp (res, GET_CODE (cond), op0, op1);
> +}
> +
>  /* Expand VEC_COND_EXPR, where:
>     MODE is mode of the result
>     VIMODE equivalent integer mode

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University



PING^3: [PATCH] mips: Fix up mips_atomic_assign_expand_fenv [PR94780]

2021-07-15 Thread Xi Ruoyao via Gcc-patches
Ping again.

On Wed, 2021-06-23 at 11:11 +0800, Xi Ruoyao wrote:
> Commit message shamelessly copied from 1777beb6b129 by jakub:
> 
> This function, because it is sometimes called even outside of function
> bodies, uses create_tmp_var_raw rather than create_tmp_var.  But in order
> for that to work, when first referenced, the VAR_DECLs need to appear in a
> TARGET_EXPR so that during gimplification the var gets the right
> DECL_CONTEXT and is added to local decls.
> 
> Bootstrapped & regtested on mips64el-linux-gnu.  Ok for trunk and
> backport to 11, 10, and 9?
> 
> gcc/
> 
> * config/mips/mips.c (mips_atomic_assign_expand_fenv): Use
>   TARGET_EXPR instead of MODIFY_EXPR.
> ---
>  gcc/config/mips/mips.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 8f043399a8e..89d1be6cea6 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -22439,12 +22439,12 @@ mips_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
>    tree get_fcsr = mips_builtin_decls[MIPS_GET_FCSR];
>    tree set_fcsr = mips_builtin_decls[MIPS_SET_FCSR];
>    tree get_fcsr_hold_call = build_call_expr (get_fcsr, 0);
> -  tree hold_assign_orig = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
> - fcsr_orig_var, get_fcsr_hold_call);
> +  tree hold_assign_orig = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
> +				  fcsr_orig_var, get_fcsr_hold_call,
> +				  NULL, NULL);
>    tree hold_mod_val = build2 (BIT_AND_EXPR, MIPS_ATYPE_USI,
> fcsr_orig_var,
>   build_int_cst (MIPS_ATYPE_USI,
> 0xf003));
> -  tree hold_assign_mod = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
> -    fcsr_mod_var, hold_mod_val);
> +  tree hold_assign_mod = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
> +				 fcsr_mod_var, hold_mod_val,
> +				 NULL, NULL);
>    tree set_fcsr_hold_call = build_call_expr (set_fcsr, 1, fcsr_mod_var);
>    tree hold_all = build2 (COMPOUND_EXPR, MIPS_ATYPE_USI,
>   hold_assign_orig, hold_assign_mod);
> @@ -22454,8 +22454,8 @@ mips_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
>    *clear = build_call_expr (set_fcsr, 1, fcsr_mod_var);
>  
>    tree get_fcsr_update_call = build_call_expr (get_fcsr, 0);
> -  *update = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
> -   exceptions_var, get_fcsr_update_call);
> +  *update = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
> +   exceptions_var, get_fcsr_update_call, NULL, NULL);
>    tree set_fcsr_update_call = build_call_expr (set_fcsr, 1, fcsr_orig_var);
>    *update = build2 (COMPOUND_EXPR, void_type_node, *update,
>     set_fcsr_update_call);

-- 
Xi Ruoyao 



[PATCH v4] vect: Recog mul_highpart pattern

2021-07-15 Thread Kewen.Lin via Gcc-patches
on 2021/7/15 7:58 PM, Richard Biener wrote:
> On Thu, Jul 15, 2021 at 10:41 AM Kewen.Lin  wrote:
>>
>> on 2021/7/15 4:04 PM, Kewen.Lin via Gcc-patches wrote:
>>> Hi Uros,
>>>
>>> on 2021/7/15 3:17 PM, Uros Bizjak wrote:
 On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin  wrote:
>
> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
>> on 2021/7/14 2:38 PM, Richard Biener wrote:
>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin  wrote:

 on 2021/7/13 8:42 PM, Richard Biener wrote:
> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin  
> wrote:
>>>
 I guess the proposed IFN would be directly mapped for [us]mul_highpart?
>>>
>>> Yes.
>>>
>>
>> Thanks for confirming!  The related patch v2 is attached and the testing
>> is ongoing.
>>
>
> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
>
> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw

 These XFAILs should be removed after your patch.

>>> I'm curious whether it's intentional not to specify -fno-vect-cost-model
>>> for this test case.  As noted above, this case is sensitive on how we
>>> cost mult_highpart.  Without cost modeling, the XFAILs can be removed
>>> only with this mul_highpart pattern support, no matter how we model it
>>> (x86 part of this patch exists or not).
>>>
 This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
 is actually not needed.

>>>
>>> Thanks for the information!  The justification for the x86 part is that:
>>> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
>>> optab support, i386 port has already customized costing for
>>> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
>>> support), if we don't follow the same way for IFN_MULH, I'm worried that
>>> we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
>>> a right thing (we shouldn't cost it specially), it at least means we
>>> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
>>> has direct mul_highpart optab support, I think they should be costed
>>> consistently.  Does it sound reasonable?
>>>
>>
>> Hi Richard(s),
>>
>> This possibly inconsistent handling seems like a counter-example against
>> using a new IFN rather than the existing tree_code: it seems hard to
>> maintain (we have to remember to keep all of its handling consistent).  ;)
>> From this perspective, maybe it's better to move back to using the
>> tree_code and guard it under can_mult_highpart_p == 1 (just like the IFN,
>> avoiding the costing issue Richi pointed out before)?
>>
>> What do you think?
> 
> No, whenever we want to do code generation based on machine
> capabilities the canonical way to test for those is to look at optabs
> and then it's most natural to keep that 1:1 relation and emit
> internal function calls which directly map to supported optabs
> instead of going back to some tree codes.
> 
> When targets "lie" and provide expanders for something they can
> only emulate then they have to compensate in their costing.
> But as I understand this isn't the case for x86 here.
> 
> Now, in this case we already have the MULT_HIGHPART_EXPR tree,
> so yes, it might make sense to use that instead of introducing an
> alternate way via the direct internal function.  Somebody decided
> that MULT_HIGHPART is generic enough to warrant this - but I
> see that expand_mult_highpart can fail unless can_mult_highpart_p
> and this is exactly one of the cases we want to avoid - either
> we can handle something generally in which case it can be a
> tree code or we can't, then it should be 1:1 tied to optabs at best
> (mult_highpart has scalar support only for the direct optab,
> vector support also for widen_mult).
> 

Thanks for the detailed explanation!  The attached v4 follows the
preferred IFN way like v3, just with extra test case updates.
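For reference, the scalar computation this pattern recognizes is a shift of a widening multiply; a minimal model of one element (illustrative, not taken from the patch):

```cpp
#include <stdint.h>
#include <assert.h>

/* High part of a 16x16->32 unsigned multiply.  A loop doing this per
   element is what vect_recog_mulhs_pattern can now recognize as IFN_MULH
   and map to the target's [us]mul_highpart optab (e.g. pmulhuw on x86).  */
static uint16_t
umul_highpart16 (uint16_t a, uint16_t b)
{
  return (uint16_t) (((uint32_t) a * b) >> 16);
}
```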

Bootstrapped & regtested again on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-
gcc/ChangeLog:

PR tree-optimization/100696
* internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
* internal-fn.def (IFN_MULH): New internal function.
* tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
recog normal multiply highpart as IFN_MULH.
* config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined
function CFN_MULH.

gcc/testsuite/ChangeLog:

PR tree-optimization/100696
* gcc.target/i386/pr100637-3w.c: Adjust for mul_highpart recog.
---
 gcc/config/i386/i386.c  |  3 ++
 

[PATCH] Analyzer: Refactor callstring to work with pairs of supernodes.

2021-07-15 Thread Ankur Saini via Gcc-patches
2021-07-12  Ankur Saini  

gcc/analyzer/ChangeLog:
* call-string.cc (call_string::element_t::operator==): New operator.
(call_string::element_t::operator!=): New operator.
(call_string::element_t::get_caller_function): New function.
(call_string::element_t::get_callee_function): New function.
(call_string::call_string): Refactor to initialise m_elements.
(call_string::operator=): Refactor to work with m_elements.
(call_string::operator==): Likewise.
(call_string::to_json): Likewise.
(call_string::hash): Refactor to hash e.m_caller.
(call_string::push_call): Refactor to work with m_elements.
(call_string::push_call): New overload to push call via supernodes.
(call_string::pop): Refactor to work with m_elements.
(call_string::calc_recursion_depth): Likewise.
(call_string::cmp): Likewise.
(call_string::validate): Likewise.
(call_string::operator[]): Likewise.
* call-string.h (class supernode): New forward decl.
(struct call_string::element_t): New struct.
(call_string::call_string): Refactor to initialise m_elements.
(call_string::empty_p): Refactor to work with m_elements.
(call_string::get_callee_node): New decl.
(call_string::get_caller_node): New decl.
(m_elements): Replaces m_return_edges.
* program-point.cc (program_point::get_function_at_depth): Refactor to
work with new call-string format.
(program_point::validate): Likewise.
(program_point::on_edge): Likewise.
---
 gcc/analyzer/call-string.cc   | 147 +-
 gcc/analyzer/call-string.h|  54 ++---
 gcc/analyzer/program-point.cc |  10 ++-
 3 files changed, 159 insertions(+), 52 deletions(-)
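In outline, the refactoring replaces a vector of return_superedge pointers with a vector of (caller, callee) supernode pairs.  A minimal standalone sketch of the new element type (types simplified; the real element_t stores analyzer supernode pointers):

```cpp
#include <cassert>

// Simplified stand-in for an analyzer supernode: just an index.
struct supernode { int m_index; };

// Sketch of call_string::element_t: a caller/callee supernode pair
// with the equality operators the patch introduces.
struct element_t
{
  const supernode *m_caller;
  const supernode *m_callee;

  bool operator== (const element_t &other) const
  {
    return m_caller == other.m_caller && m_callee == other.m_callee;
  }
  bool operator!= (const element_t &other) const
  {
    return !(*this == other);
  }
};
```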

diff --git a/gcc/analyzer/call-string.cc b/gcc/analyzer/call-string.cc
index 9f4f77ab3a9..2e7c8256cbb 100644
--- a/gcc/analyzer/call-string.cc
+++ b/gcc/analyzer/call-string.cc
@@ -45,13 +45,46 @@ along with GCC; see the file COPYING3.  If not see
 
 /* class call_string.  */
 
+/* struct call_string::element_t.  */
+
+/* call_string::element_t's equality operator.  */
+
+bool
+call_string::element_t::operator== (const call_string::element_t &other) const
+{
+  if (m_caller == other.m_caller && m_callee == other.m_callee)
+    return true;
+  return false;
+}
+
+/* call_string::element_t's inequality operator.  */
+bool
+call_string::element_t::operator!= (const call_string::element_t &other) const
+{
+  if (m_caller != other.m_caller || m_callee != other.m_callee)
+    return true;
+  return false;
+}
+
+function *
+call_string::element_t::get_caller_function () const
+{
+  return m_caller->get_function ();
+}
+
+function *
+call_string::element_t::get_callee_function () const
+{
+  return m_callee->get_function ();
+}
+
 /* call_string's copy ctor.  */
 
 call_string::call_string (const call_string &other)
-: m_return_edges (other.m_return_edges.length ())
+: m_elements (other.m_elements.length ())
 {
-  for (const return_superedge *e : other.m_return_edges)
-    m_return_edges.quick_push (e);
+  for (const call_string::element_t &e : other.m_elements)
+    m_elements.quick_push (e);
 }
 
 /* call_string's assignment operator.  */
@@ -60,12 +93,12 @@ call_string&
 call_string::operator= (const call_string &other)
 {
   // would be much simpler if we could rely on vec<> assignment op
-  m_return_edges.truncate (0);
-  m_return_edges.reserve (other.m_return_edges.length (), true);
-  const return_superedge *e;
+  m_elements.truncate (0);
+  m_elements.reserve (other.m_elements.length (), true);
+  call_string::element_t *e;
   int i;
-  FOR_EACH_VEC_ELT (other.m_return_edges, i, e)
-    m_return_edges.quick_push (e);
+  FOR_EACH_VEC_ELT (other.m_elements, i, e)
+    m_elements.quick_push (*e);
   return *this;
 }
 
@@ -74,12 +107,12 @@ call_string::operator= (const call_string &other)
 bool
 call_string::operator== (const call_string &other) const
 {
-  if (m_return_edges.length () != other.m_return_edges.length ())
+  if (m_elements.length () != other.m_elements.length ())
 return false;
-  const return_superedge *e;
+  call_string::element_t *e;
   int i;
-  FOR_EACH_VEC_ELT (m_return_edges, i, e)
-    if (e != other.m_return_edges[i])
+  FOR_EACH_VEC_ELT (m_elements, i, e)
+    if (*e != other.m_elements[i])
   return false;
   return true;
 }
@@ -91,15 +124,15 @@ call_string::print (pretty_printer *pp) const
 {
   pp_string (pp, "[");
 
-  const return_superedge *e;
+  call_string::element_t *e;
   int i;
-  FOR_EACH_VEC_ELT (m_return_edges, i, e)
+  FOR_EACH_VEC_ELT (m_elements, i, e)
 {
   if (i > 0)
pp_string (pp, ", ");
   pp_printf (pp, "(SN: %i -> SN: %i in %s)",
-		 e->m_src->m_index, e->m_dest->m_index,
-		 function_name (e->m_dest->m_fun));
+		 e->m_callee->m_index, e->m_caller->m_index,
+		 function_name (e->m_caller->m_fun));
 }
 
   pp_string (pp, 

[PATCH, Fortran] Bind(c): CFI_signed_char is not a Fortran character type

2021-07-15 Thread Sandra Loosemore
When I was reading code in conjunction with fixing PR101317, I noticed 
an unrelated bug in the implementation of CFI_allocate and 
CFI_select_part:  they were mis-handling the CFI_signed_char type as if 
it were a Fortran character type for the purposes of deciding whether to 
use the elem_len argument to those functions.  It's really an integer 
type that has the size of signed char.  I checked similar code in other 
functions in ISO_Fortran_binding.c and these were the only two that were 
incorrect.
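The fix boils down to a small predicate.  A sketch of the corrected logic with illustrative stand-ins for the CFI type codes (the real code tests CFI_type_* values from ISO_Fortran_binding.h inline):

```cpp
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the relevant CFI type codes.  */
enum cfi_type_model { TYPE_CHAR, TYPE_UCS4_CHAR, TYPE_SIGNED_CHAR };

/* Only Fortran character types take their element length from the
   elem_len argument; CFI_type_signed_char is an integer type, so its
   size is that of signed char regardless of elem_len.  */
static size_t
element_length (enum cfi_type_model type, size_t elem_len)
{
  if (type == TYPE_CHAR || type == TYPE_UCS4_CHAR)
    return elem_len;			/* character type: honor elem_len */
  return sizeof (signed char);		/* signed char: fixed integer size */
}
```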


The part of the patch to add tests for this goes on top of my base 
TS29113 testsuite patch, which hasn't been reviewed or committed yet.  I 
can refactor that into the same commit as the rest of the testsuite, 
assuming everything eventually gets approved.


https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574576.html

-Sandra
commit 45190d9eb5123df77bd60a1d6712f05a3af5f42c
Author: Sandra Loosemore 
Date:   Thu Jul 15 16:51:55 2021 -0700

Bind(c): signed char is not a Fortran character type

CFI_allocate and CFI_select_part were incorrectly treating
CFI_type_signed_char as a Fortran character type for the purpose of
deciding whether or not to use the elem_len argument.  It is a Fortran
integer type per table 18.2 in the 2018 Fortran standard.

Other functions in ISO_Fortran_binding.c appeared to handle this case
correctly already.

2021-07-15  Sandra Loosemore  

gcc/testsuite/
	* gfortran.dg/ts29113/library/allocate-c.c (ctest): Also test
	handling of elem_len for CFI_type_char vs CFI_type_signed_char.
	* gfortran.dg/ts29113/library/select-c.c (ctest): Likewise.

libgfortran/
	* runtime/ISO_Fortran_binding.c (CFI_allocate)

diff --git a/gcc/testsuite/gfortran.dg/ts29113/library/allocate-c.c b/gcc/testsuite/gfortran.dg/ts29113/library/allocate-c.c
index 0208e5a..6343d28 100644
--- a/gcc/testsuite/gfortran.dg/ts29113/library/allocate-c.c
+++ b/gcc/testsuite/gfortran.dg/ts29113/library/allocate-c.c
@@ -135,5 +135,34 @@ ctest (void)
 		CFI_deallocate (dv));
   if (dv->base_addr != NULL)
 abort ();
+
+  /* Signed char is not a Fortran character type.  Here we expect it to
+ ignore the elem_len argument and use the size of the type.  */
+  ex[0] = 3;
+  ex[1] = 4;
+  ex[2] = 5;
+  check_CFI_status ("CFI_establish",
+		CFI_establish (dv, NULL, CFI_attribute_allocatable,
+   CFI_type_signed_char, 4, 3, ex));
+  lb[0] = 1;
+  lb[1] = 2;
+  lb[2] = 3;
+  ub[0] = 10;
+  ub[1] = 5;
+  ub[2] = 10;
+  sm = sizeof (double);
+  check_CFI_status ("CFI_allocate",
+		CFI_allocate (dv, lb, ub, sm));
+  dump_CFI_cdesc_t (dv);
+  if (dv->base_addr == NULL)
+abort ();
+  if (dv->elem_len != sizeof (signed char))
+abort ();
+
+  check_CFI_status ("CFI_deallocate",
+		CFI_deallocate (dv));
+  if (dv->base_addr != NULL)
+abort ();
+
 }
 
diff --git a/gcc/testsuite/gfortran.dg/ts29113/library/select-c.c b/gcc/testsuite/gfortran.dg/ts29113/library/select-c.c
index df6172c..9bcbc01 100644
--- a/gcc/testsuite/gfortran.dg/ts29113/library/select-c.c
+++ b/gcc/testsuite/gfortran.dg/ts29113/library/select-c.c
@@ -8,6 +8,8 @@
 
 /* Declare some source arrays.  */
 struct ss {
+  char c[4];
+  signed char b[4];
   int i, j, k;
 } s[10][5][3];
 
@@ -61,6 +63,31 @@ ctest (void)
   if (result->dim[2].sm != source->dim[2].sm)
 abort ();
 
+  /* Check that we use the given elem_size for char but not for
+ signed char, which is considered an integer type instead of a Fortran
+ character type.  */
+  check_CFI_status ("CFI_establish", 
+		CFI_establish (result, NULL, CFI_attribute_pointer,
+   CFI_type_char, 4, 3, NULL));
+  if (result->elem_len != 4)
+abort ();
+  offset = offsetof (struct ss, c);
+  check_CFI_status ("CFI_select_part",
+		CFI_select_part (result, source, offset, 4));
+  if (result->elem_len != 4)
+abort ();
+
+  check_CFI_status ("CFI_establish", 
+		CFI_establish (result, NULL, CFI_attribute_pointer,
+   CFI_type_signed_char, 4, 3, NULL));
+  if (result->elem_len != sizeof (signed char))
+abort ();
+  offset = offsetof (struct ss, c);
+  check_CFI_status ("CFI_select_part",
+		CFI_select_part (result, source, offset, 4));
+  if (result->elem_len != sizeof (signed char))
+abort ();
+
   /* Extract an array of character substrings.  */
   offset = 2;
   check_CFI_status ("CFI_establish",
diff --git a/libgfortran/runtime/ISO_Fortran_binding.c b/libgfortran/runtime/ISO_Fortran_binding.c
index 78953d0..9fe3a85 100644
--- a/libgfortran/runtime/ISO_Fortran_binding.c
+++ b/libgfortran/runtime/ISO_Fortran_binding.c
@@ -229,10 +229,9 @@ CFI_allocate (CFI_cdesc_t *dv, const CFI_index_t lower_bounds[],
 	}
 }
 
-  /* If the type is a character, the descriptor's element length is replaced
- by the elem_len argument. */
-  if (dv->type == CFI_type_char || dv->type == CFI_type_ucs4_char ||
-  dv->type == CFI_type_signed_char)
+  /* If the type is a Fortran 

Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-15 Thread Noah Goldstein via Gcc-patches
On Thu, Jul 15, 2021 at 10:41 PM Jason Merrill via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Adding CCs that got lost in the initial mail.
>
> On Thu, Jul 15, 2021 at 10:36 PM Jason Merrill  wrote:
>
> > The last missing piece of the C++17 standard library is the hardware
> > interference size constants.  Much of the delay in implementing these has
> > been due to uncertainty about what the right values are, and even whether
> > there is a single constant value that is suitable; the destructive
> > interference size is intended to be used in structure layout, so program
> > ABIs will depend on it.
> >
> > In principle, both of these values should be the same as the target's L1
> > cache line size.  When compiling for a generic target that is intended to
> > support a range of target CPUs with different cache line sizes, the
> > constructive size should probably be the minimum size, and the destructive
> > size the maximum, unless you are constrained by ABI compatibility with
> > previous code.
> >
> > JF Bastien's implementation proposal is summarized at
> > https://github.com/itanium-cxx-abi/cxx-abi/issues/74
> >
> > I implement this by adding new --params for the two sizes.  Targets need to
> > override these values in targetm.target_option.override() to support the
> > feature.
> >
> > 64 bytes still seems correct for the x86 family.
> >
> > I'm not sure why he said 64/64 for 32-bit ARM, since the Cortex A9 has a
> > 32-byte cache line, and that seems to be the only ARM_PREFETCH_BENEFICIAL
> > target, so I'd think 32/64 would make more sense.
> >
> > He proposed 64/128 for AArch64, but since the A64FX now has a 256B cache
> > line, I've changed that to 64/256.  Does that seem right?
> >
> > Currently the patch does not adjust the values based on -march, as in JF's
> > proposal.  I'll need more guidance from the ARM/AArch64 maintainers about
> > how to go about that.  --param l1-cache-line-size is set based on -mtune,
> > but I don't think we want -mtune to change these ABI-affecting values.  Are
> > there -march values for which a smaller range than 64-256 makes sense?
> >
> > gcc/ChangeLog:
> >
> > * params.opt: Add destructive-interference-size and
> > constructive-interference-size.
> > * doc/invoke.texi: Document them.
> > * config/aarch64/aarch64.c (aarch64_override_options_internal):
> > Set them.
> > * config/arm/arm.c (arm_option_override): Set them.
> > * config/i386/i386-options.c (ix86_option_override_internal):
> > Set them.
> >
> > gcc/c-family/ChangeLog:
> >
> > * c.opt: Add -Winterference-size.
> > * c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
> > and __GCC_CONSTRUCTIVE_SIZE.
> >
> > gcc/cp/ChangeLog:
> >
> > * decl.c (cxx_init_decl_processing): Check
> > --param *-interference-size values.
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/std/version: Define __cpp_lib_hardware_interference_size.
> > * libsupc++/new: Define hardware interference size variables.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.target/aarch64/interference.C: New test.
> > * g++.target/arm/interference.C: New test.
> > * g++.target/i386/interference.C: New test.
> > ---
> >  gcc/doc/invoke.texi   | 22 ++
> >  gcc/c-family/c.opt|  5 
> >  gcc/params.opt| 15 
> >  gcc/c-family/c-cppbuiltin.c   | 12 ++
> >  gcc/config/aarch64/aarch64.c  |  9 
> >  gcc/config/arm/arm.c  |  6 +
> >  gcc/config/i386/i386-options.c|  6 +
> >  gcc/cp/decl.c | 23 +++
> >  .../g++.target/aarch64/interference.C |  9 
> >  gcc/testsuite/g++.target/arm/interference.C   |  9 
> >  gcc/testsuite/g++.target/i386/interference.C  |  8 +++
> >  libstdc++-v3/include/std/version  |  3 +++
> >  libstdc++-v3/libsupc++/new| 10 ++--
> >  13 files changed, 135 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.target/aarch64/interference.C
> >  create mode 100644 gcc/testsuite/g++.target/arm/interference.C
> >  create mode 100644 gcc/testsuite/g++.target/i386/interference.C
> >
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index ea8812425e9..f93cb7a20f7 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -13857,6 +13857,28 @@ prefetch hints can be issued for any constant
> > stride.
> >
> >  This setting is only useful for strides that are known and constant.
> >
> > +@item destructive_interference_size
> > +@item constructive_interference_size
> > +The values for the C++17 variables
> > +@code{std::hardware_destructive_interference_size} and
> > 

Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-15 Thread Jason Merrill via Gcc-patches
Adding CCs that got lost in the initial mail.

On Thu, Jul 15, 2021 at 10:36 PM Jason Merrill  wrote:

> The last missing piece of the C++17 standard library is the hardware
> interference size constants.  Much of the delay in implementing these has
> been due to uncertainty about what the right values are, and even whether
> there is a single constant value that is suitable; the destructive
> interference size is intended to be used in structure layout, so program
> ABIs will depend on it.
>
> In principle, both of these values should be the same as the target's L1
> cache line size.  When compiling for a generic target that is intended to
> support a range of target CPUs with different cache line sizes, the
> constructive size should probably be the minimum size, and the destructive
> size the maximum, unless you are constrained by ABI compatibility with
> previous code.
>
> JF Bastien's implementation proposal is summarized at
> https://github.com/itanium-cxx-abi/cxx-abi/issues/74
>
> I implement this by adding new --params for the two sizes.  Targets need to
> override these values in targetm.target_option.override() to support the
> feature.
>
> 64 bytes still seems correct for the x86 family.
>
> I'm not sure why he said 64/64 for 32-bit ARM, since the Cortex A9 has a
> 32-byte cache line, and that seems to be the only ARM_PREFETCH_BENEFICIAL
> target, so I'd think 32/64 would make more sense.
>
> He proposed 64/128 for AArch64, but since the A64FX now has a 256B cache
> line, I've changed that to 64/256.  Does that seem right?
>
> Currently the patch does not adjust the values based on -march, as in JF's
> proposal.  I'll need more guidance from the ARM/AArch64 maintainers about
> how to go about that.  --param l1-cache-line-size is set based on -mtune,
> but I don't think we want -mtune to change these ABI-affecting values.  Are
> there -march values for which a smaller range than 64-256 makes sense?
>
> gcc/ChangeLog:
>
> * params.opt: Add destructive-interference-size and
> constructive-interference-size.
> * doc/invoke.texi: Document them.
> * config/aarch64/aarch64.c (aarch64_override_options_internal):
> Set them.
> * config/arm/arm.c (arm_option_override): Set them.
> * config/i386/i386-options.c (ix86_option_override_internal):
> Set them.
>
> gcc/c-family/ChangeLog:
>
> * c.opt: Add -Winterference-size.
> * c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
> and __GCC_CONSTRUCTIVE_SIZE.
>
> gcc/cp/ChangeLog:
>
> * decl.c (cxx_init_decl_processing): Check
> --param *-interference-size values.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/version: Define __cpp_lib_hardware_interference_size.
> * libsupc++/new: Define hardware interference size variables.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/aarch64/interference.C: New test.
> * g++.target/arm/interference.C: New test.
> * g++.target/i386/interference.C: New test.
> ---
>  gcc/doc/invoke.texi   | 22 ++
>  gcc/c-family/c.opt|  5 
>  gcc/params.opt| 15 
>  gcc/c-family/c-cppbuiltin.c   | 12 ++
>  gcc/config/aarch64/aarch64.c  |  9 
>  gcc/config/arm/arm.c  |  6 +
>  gcc/config/i386/i386-options.c|  6 +
>  gcc/cp/decl.c | 23 +++
>  .../g++.target/aarch64/interference.C |  9 
>  gcc/testsuite/g++.target/arm/interference.C   |  9 
>  gcc/testsuite/g++.target/i386/interference.C  |  8 +++
>  libstdc++-v3/include/std/version  |  3 +++
>  libstdc++-v3/libsupc++/new| 10 ++--
>  13 files changed, 135 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/aarch64/interference.C
>  create mode 100644 gcc/testsuite/g++.target/arm/interference.C
>  create mode 100644 gcc/testsuite/g++.target/i386/interference.C
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index ea8812425e9..f93cb7a20f7 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -13857,6 +13857,28 @@ prefetch hints can be issued for any constant
> stride.
>
>  This setting is only useful for strides that are known and constant.
>
> +@item destructive_interference_size
> +@item constructive_interference_size
> +The values for the C++17 variables
> +@code{std::hardware_destructive_interference_size} and
> +@code{std::hardware_constructive_interference_size}.  The destructive
> +interference size is the minimum recommended offset between two
> +independent concurrently-accessed objects; the constructive
> +interference size is the maximum recommended size of contiguous memory
> +accessed together.  Typically both will be the size of an L1 cache
> +line for the 

[PATCH] c++: implement C++17 hardware interference size

2021-07-15 Thread Jason Merrill via Gcc-patches
The last missing piece of the C++17 standard library is the hardware
interference size constants.  Much of the delay in implementing these has
been due to uncertainty about what the right values are, and even whether
there is a single constant value that is suitable; the destructive
interference size is intended to be used in structure layout, so program
ABIs will depend on it.

In principle, both of these values should be the same as the target's L1
cache line size.  When compiling for a generic target that is intended to
support a range of target CPUs with different cache line sizes, the
constructive size should probably be the minimum size, and the destructive
size the maximum, unless you are constrained by ABI compatibility with
previous code.
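As a usage sketch of what these constants are for (assuming a 64-byte destructive interference size, which is only an assumption matching common x86 L1 lines, not a portable fact): keeping two independently updated counters on separate cache lines.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for std::hardware_destructive_interference_size; 64 is an
// assumed value here, not something guaranteed by the standard.
constexpr std::size_t kDestructiveSize = 64;

// Aligning each counter to the destructive interference size keeps the
// two on separate cache lines, avoiding false sharing between threads.
struct alignas(kDestructiveSize) padded_counter
{
  long value;
};

struct counters
{
  padded_counter produced;	// touched only by the producer thread
  padded_counter consumed;	// touched only by the consumer thread
};
```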

JF Bastien's implementation proposal is summarized at
https://github.com/itanium-cxx-abi/cxx-abi/issues/74

I implement this by adding new --params for the two sizes.  Targets need to
override these values in targetm.target_option.override() to support the
feature.

64 bytes still seems correct for the x86 family.

I'm not sure why he said 64/64 for 32-bit ARM, since the Cortex A9 has a
32-byte cache line, and that seems to be the only ARM_PREFETCH_BENEFICIAL
target, so I'd think 32/64 would make more sense.

He proposed 64/128 for AArch64, but since the A64FX now has a 256B cache
line, I've changed that to 64/256.  Does that seem right?

Currently the patch does not adjust the values based on -march, as in JF's
proposal.  I'll need more guidance from the ARM/AArch64 maintainers about
how to go about that.  --param l1-cache-line-size is set based on -mtune,
but I don't think we want -mtune to change these ABI-affecting values.  Are
there -march values for which a smaller range than 64-256 makes sense?

gcc/ChangeLog:

* params.opt: Add destructive-interference-size and
constructive-interference-size.
* doc/invoke.texi: Document them.
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Set them.
* config/arm/arm.c (arm_option_override): Set them.
* config/i386/i386-options.c (ix86_option_override_internal):
Set them.

gcc/c-family/ChangeLog:

* c.opt: Add -Winterference-size.
* c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
and __GCC_CONSTRUCTIVE_SIZE.

gcc/cp/ChangeLog:

* decl.c (cxx_init_decl_processing): Check
--param *-interference-size values.

libstdc++-v3/ChangeLog:

* include/std/version: Define __cpp_lib_hardware_interference_size.
* libsupc++/new: Define hardware interference size variables.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/interference.C: New test.
* g++.target/arm/interference.C: New test.
* g++.target/i386/interference.C: New test.
---
 gcc/doc/invoke.texi   | 22 ++
 gcc/c-family/c.opt|  5 
 gcc/params.opt| 15 
 gcc/c-family/c-cppbuiltin.c   | 12 ++
 gcc/config/aarch64/aarch64.c  |  9 
 gcc/config/arm/arm.c  |  6 +
 gcc/config/i386/i386-options.c|  6 +
 gcc/cp/decl.c | 23 +++
 .../g++.target/aarch64/interference.C |  9 
 gcc/testsuite/g++.target/arm/interference.C   |  9 
 gcc/testsuite/g++.target/i386/interference.C  |  8 +++
 libstdc++-v3/include/std/version  |  3 +++
 libstdc++-v3/libsupc++/new| 10 ++--
 13 files changed, 135 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/interference.C
 create mode 100644 gcc/testsuite/g++.target/arm/interference.C
 create mode 100644 gcc/testsuite/g++.target/i386/interference.C

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ea8812425e9..f93cb7a20f7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13857,6 +13857,28 @@ prefetch hints can be issued for any constant stride.
 
 This setting is only useful for strides that are known and constant.
 
+@item destructive_interference_size
+@item constructive_interference_size
+The values for the C++17 variables
+@code{std::hardware_destructive_interference_size} and
+@code{std::hardware_constructive_interference_size}.  The destructive
+interference size is the minimum recommended offset between two
+independent concurrently-accessed objects; the constructive
+interference size is the maximum recommended size of contiguous memory
+accessed together.  Typically both will be the size of an L1 cache
+line for the target, in bytes.  If the target can have a range of L1
+cache line sizes, typically the constructive interference size will be
+the small end of the range and the destructive size will be the large
+end.
+
+These values, particularly the destructive size, are intended to be
+used for layout, and 

Re: [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457

2021-07-15 Thread H.J. Lu via Gcc-patches
On Thu, Jul 15, 2021 at 9:40 AM Tamar Christina via Gcc-patches
 wrote:
>
> Hi All,
>
> These testcases accidentally contain the wrong signs for the expected values
> for the scalar code.  The vector code however is correct.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Committed as a trivial fix.
>
> Thanks,
> Tamar
>
> gcc/testsuite/ChangeLog:
>
> PR middle-end/101457
> * gcc.dg/vect/vect-reduc-dot-17.c: Fix signs of scalar code.
> * gcc.dg/vect/vect-reduc-dot-18.c: Likewise.
> * gcc.dg/vect/vect-reduc-dot-22.c: Likewise.
> * gcc.dg/vect/vect-reduc-dot-9.c: Likewise.
>
> --- inline copy of patch --
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c 
> b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
> index 
> aa269c4d657f65e07e36df7f3fd0098cf3aaf4d0..38f86fe458adcc7ebbbae22f5cc1e720928f2d48
>  100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
> @@ -35,8 +35,9 @@ main (void)
>  {
>check_vect ();
>
> -  SIGNEDNESS_3 char a[N], b[N];
> -  int expected = 0x12345;
> +  SIGNEDNESS_3 char a[N];
> +  SIGNEDNESS_4 char b[N];
> +  SIGNEDNESS_1 int expected = 0x12345;
>for (int i = 0; i < N; ++i)
>  {
>a[i] = BASE + i * 5;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c 
> b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
> index 
> 2b1cc0411c3256ccd876d8b4da18ce4881dc0af9..2e86ebe3c6c6a0da9ac242868592f30028ed2155
>  100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
> @@ -35,8 +35,9 @@ main (void)
>  {
>check_vect ();
>
> -  SIGNEDNESS_3 char a[N], b[N];
> -  int expected = 0x12345;
> +  SIGNEDNESS_3 char a[N];
> +  SIGNEDNESS_4 char b[N];
> +  SIGNEDNESS_1 int expected = 0x12345;
>for (int i = 0; i < N; ++i)
>  {
>a[i] = BASE + i * 5;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c 
> b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
> index 
> febeb19784c6aaca72dc0871af0d32cc91fa6ea2..0bde43a6cb855ce5edd9015ebf34ca226353d77e
>  100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
> @@ -37,7 +37,7 @@ main (void)
>
>SIGNEDNESS_3 char a[N];
>SIGNEDNESS_4 short b[N];
> -  int expected = 0x12345;
> +  SIGNEDNESS_1 long expected = 0x12345;

Does it work with long == int? I still got

FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects
scan-tree-dump-not vect "vect_recog_dot_prod_pattern: detected"
FAIL: gcc.dg/vect/vect-reduc-dot-22.c scan-tree-dump-not vect
"vect_recog_dot_prod_pattern: detected"

with -m32 on Linux/x86-64.

>for (int i = 0; i < N; ++i)
>  {
>a[i] = BASE + i * 5;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c 
> b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
> index 
> cbbeedec3bfd0810a8ce8036e6670585d9334924..d1049c96bf1febfc8933622e292b44cc8dd129cc
>  100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
> @@ -35,8 +35,9 @@ main (void)
>  {
>check_vect ();
>
> -  SIGNEDNESS_3 char a[N], b[N];
> -  int expected = 0x12345;
> +  SIGNEDNESS_3 char a[N];
> +  SIGNEDNESS_4 char b[N];
> +  SIGNEDNESS_1 int expected = 0x12345;
>for (int i = 0; i < N; ++i)
>  {
>a[i] = BASE + i * 5;
>
>
> --


-- 
H.J.


Re: [PATCH] [android] Disable large files when unsupported

2021-07-15 Thread João Gabriel Jardim via Gcc-patches
Hi Mr. Santana,

Some strange stuff happened with my previous comment. I've been struggling
with this lately, so this patch would be really useful.

Em qui., 15 de jul. de 2021 às 15:44, Abraão de Santana <
abraaocsant...@gmail.com> escreveu:

> Hey João , I think there's a problem with your email, it's empty!
>
> --
> *Abraão C. de Santana*
>


-- 
João Gabriel Jardim


Re: [PATCH 1/4] force decls to be allocated through build_decl to initialize them

2021-07-15 Thread Trevor Saunders
On Thu, Jul 15, 2021 at 10:01:01AM +0200, Richard Biener wrote:
> On Thu, Jul 15, 2021 at 4:24 AM Trevor Saunders  wrote:
> >
> > On Wed, Jul 14, 2021 at 01:27:54PM +0200, Richard Biener wrote:
> > > On Wed, Jul 14, 2021 at 10:20 AM Trevor Saunders  
> > > wrote:
> > > >
> > > > prior to this commit all calls to build_decl used input_location, even
> > > > if only temporarily, until build_decl reset the location to something
> > > > else that it was told was the proper location.  To avoid using the
> > > > global we need the caller to pass in the location it wants; however,
> > > > that's not possible with make_node since it makes other types of
> > > > nodes.  So we force all callers who wish to make a decl to go through
> > > > build_decl, which already takes a location argument.  To avoid
> > > > changing behavior, this just explicitly passes input_location to
> > > > build_decl for callers of make_node that create a decl; however, it
> > > > would seem in many of these cases that the location of the decl being
> > > > copied might be a better location.
> > > >
> > > > bootstrapped and regtested on x86_64-linux-gnu, ok?
> > >
> > > I think all eventually DECL_ARTIFICIAL decls should better use
> > > UNKNOWN_LOCATION instead of input_location.
> >
> > You'd know if that might break something better than me, but that seems
> > sensible in principle.  That said, I would like to incrementally do one
> > thing at a time, rather than change make_node to use unknown_location,
> > and set the location to something else all at once, but I suppose I
> > could first change some callers to be build_decl (unknown_location, ...)
> > and then come back to changing make_node when there's fewer callers to
> > reason about if that's preferable.
> 
> Sure, we can defer changing make_node (I thought the patch caught all
> but three callers ...).  But it feels odd to introduce so many explicit
> input_location uses for cases where it clearly doesn't matter (the
> DECL_ARTIFICIAL),
> so I'd prefer to "fix" those immediately,

Fair enough; I think you just have more experience and confidence that it
can't matter than I do yet.  I'll work on fixing them immediately.

Trev



[PATCH v2] x86: Don't set AVX_U128_DIRTY when all bits are zero

2021-07-15 Thread H.J. Lu via Gcc-patches
On Thu, Jul 15, 2021 at 6:36 PM Hongtao Liu  wrote:
>
> On Fri, Jul 16, 2021 at 1:30 AM H.J. Lu via Gcc-patches
>  wrote:
> >
> > In a single SET, all bits of the source YMM/ZMM register are zero when
> >
> > 1. The source is constant zero.
> > 2. The source YMM/ZMM operands are defined from constant zero.
> >
> > and we don't set AVX_U128_DIRTY.
> >
> > gcc/
> >
> > PR target/101456
> > * config/i386/i386.c (ix86_avx_u128_mode_needed): Don't set
> > AVX_U128_DIRTY when all bits are zero.
> >
> > gcc/testsuite/
> >
> > PR target/101456
> > * gcc.target/i386/pr101456-1.c: New test.
> > ---
> >  gcc/config/i386/i386.c | 47 ++
> >  gcc/testsuite/gcc.target/i386/pr101456-1.c | 28 +
> >  2 files changed, 75 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101456-1.c
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index cff26909292..c2b06934053 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -14129,6 +14129,53 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
> >return AVX_U128_CLEAN;
> >  }
> >
> > +  rtx set = single_set (insn);
> > +  if (set)
> > +{
> > +  rtx dest = SET_DEST (set);
> > +  rtx src = SET_SRC (set);
> > +  if (ix86_check_avx_upper_register (dest))
> > +   {
> > + /* It is not dirty if the source is known zero.  */
> > + if (standard_sse_constant_p (src, GET_MODE (dest)) == 1)
> > +   return AVX_U128_ANY;
> > + else
> > +   return AVX_U128_DIRTY;
> > +   }
> > +  else if (ix86_check_avx_upper_register (src))
> > +   {
> > + /* Check for the source operand with all DEFs from constant
> > +zero.  */
> > + df_ref def = DF_REG_DEF_CHAIN (REGNO (src));
> > + if (!def)
> > +   return AVX_U128_DIRTY;
> > +
> > + for (; def; def = DF_REF_NEXT_REG (def))
> > +   if (DF_REF_REG_DEF_P (def)
> > +   && !DF_REF_IS_ARTIFICIAL (def))
> > + {
> > +   rtx_insn *def_insn = DF_REF_INSN (def);
> > +   set = single_set (def_insn);
> > +   if (!set)
> > + return AVX_U128_DIRTY;
> > +
> > +   dest = SET_DEST (set);
> > +   if (ix86_check_avx_upper_register (dest))
> > + {
> > +   src = SET_SRC (set);
> > +   /* It is dirty if the source operand isn't constant
> > +  zero.  */
> > +   if (standard_sse_constant_p (src, GET_MODE (dest))
> > +   != 1)
> > + return AVX_U128_DIRTY;
> > + }
> > + }
> > +
> > + /* It is not dirty only if all sources are known zero.  */
> > + return AVX_U128_ANY;
> > +   }
> > +}
> > +
> >/* Require DIRTY mode if a 256bit or 512bit AVX register is referenced.
> >   Hardware changes state only when a 256bit register is written to,
> >   but we need to prevent the compiler from moving optimal insertion
> > diff --git a/gcc/testsuite/gcc.target/i386/pr101456-1.c 
> > b/gcc/testsuite/gcc.target/i386/pr101456-1.c
> > new file mode 100644
> > index 000..6a0f6ccd756
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr101456-1.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -march=skylake" } */
> > +
> > +#include 
> > +
> > +extern __m256 x1;
> > +extern __m256d x2;
> > +extern __m256i x3;
> > +
> > +void
> > +foo1 (void)
> > +{
> > +  x1 = _mm256_setzero_ps ();
> > +}
> > +
> > +void
> > +foo2 (void)
> > +{
> > +  x2 = _mm256_setzero_pd ();
> > +}
> > +
> > +void
> > +foo3 (void)
> > +{
> > +  x3 = _mm256_setzero_si256 ();
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vzeroupper" } } */
> > --
> > 2.31.1
> >
>
> LGTM.
>

Here is the v2 patch to handle calls.

-- 
H.J.
From 4bd6aba8326eee9fa3c5310086fc5b76fc090795 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 14 Jul 2021 17:03:15 -0700
Subject: [PATCH v2] x86: Don't set AVX_U128_DIRTY when all bits are zero

In a single SET, all bits of the source YMM/ZMM register are zero when

1. The source is constant zero.
2. The source YMM/ZMM operands are defined from constant zero.

and we don't set AVX_U128_DIRTY.

gcc/

	PR target/101456
	* config/i386/i386.c (ix86_avx_u128_mode_needed): Don't set
	AVX_U128_DIRTY when all bits are zero.

gcc/testsuite/

	PR target/101456
	* gcc.target/i386/pr101456-1.c: New test.
	* gcc.target/i386/pr101456-2.c: Likewise.
---
 gcc/config/i386/i386.c | 63 ++
 gcc/testsuite/gcc.target/i386/pr101456-1.c | 33 
 gcc/testsuite/gcc.target/i386/pr101456-2.c | 33 
 3 files changed, 129 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101456-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101456-2.c

diff 

[PATCH] Fix PR 101453: ICE with optimize and large integer constant

2021-07-15 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

Every base-10 digit takes about 3.32 bits to represent, so for a
64-bit signed integer that is 20 characters including the sign.  The
buffer was only 20 bytes, so it did not fit: adding in the null
character and the "-O" part, the buffer was 3 bytes too small.

Instead of just increasing the size of the buffer, I decided to
calculate the size at compile time and use constexpr to get a
constant for the size.
Since GCC is written in C++11, using constexpr is the best way
to force the size to be calculated at compile time.

OK? Bootstrapped and tested on x86_64-linux with no regressions.

gcc/c-family/ChangeLog:

PR c/101453
* c-common.c (parse_optimize_options): Use the correct
size for buffer.
---
 gcc/c-family/c-common.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 20ec26317c5..4c5b75a9548 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -5799,7 +5799,9 @@ parse_optimize_options (tree args, bool attr_p)
 
   if (TREE_CODE (value) == INTEGER_CST)
{
- char buffer[20];
+ constexpr double log10 = 3.32;
+ constexpr int longdigits = ((int)((sizeof(long)*CHAR_BIT)/log10))+1;
+ char buffer[longdigits + 3];
  sprintf (buffer, "-O%ld", (long) TREE_INT_CST_LOW (value));
  vec_safe_push (optimize_args, ggc_strdup (buffer));
}
-- 
2.27.0



Re: [PATCH] Disable --param vect-partial-vector-usage by default on x86

2021-07-15 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 15, 2021 at 8:33 PM Richard Biener  wrote:
>
> The following defaults --param vect-partial-vector-usage to zero
> for x86_64 matching existing behavior where support for this
> is not present.
>
> OK for trunk?
>
Ok.
> Thanks,
> Richard/
>
> 2021-07-15  Richard Biener  
>
> * config/i386/i386-options.c (ix86_option_override_internal): Set
> param_vect_partial_vector_usage to zero if not set.
> ---
>  gcc/config/i386/i386-options.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> index 7cba655595e..3416a4f1752 100644
> --- a/gcc/config/i386/i386-options.c
> +++ b/gcc/config/i386/i386-options.c
> @@ -2834,6 +2834,11 @@ ix86_option_override_internal (bool main_args_p,
>
>SET_OPTION_IF_UNSET (opts, opts_set, param_ira_consider_dup_in_all_alts, 
> 0);
>
> +  /* Fully masking the main or the epilogue vectorized loop is not
> + profitable generally so leave it disabled until we get more
> + fine grained control & costing.  */
> +  SET_OPTION_IF_UNSET (opts, opts_set, param_vect_partial_vector_usage, 0);
> +
>return true;
>  }
>
> --
> 2.26.2



-- 
BR,
Hongtao


Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-15 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 15, 2021 at 7:48 PM Richard Biener  wrote:
>
> On Thu, 15 Jul 2021, Hongtao Liu wrote:
>
> > On Thu, Jul 15, 2021 at 6:45 PM Richard Biener via Gcc-patches
> >  wrote:
> > >
> > > On Thu, Jul 15, 2021 at 12:30 PM Richard Biener  wrote:
> > > >
> > > > The following extends the existing loop masking support using
> > > > SVE WHILE_ULT to x86 by providing an alternate way to produce the
> > > > mask using VEC_COND_EXPRs.  So with --param vect-partial-vector-usage
> > > > you can now enable masked vectorized epilogues (=1) or fully
> > > > masked vector loops (=2).
> > > >
> > > > What's missing is using a scalar IV for the loop control
> > > > (but in principle AVX512 can use the mask here - just the patch
> > > > doesn't seem to work for AVX512 yet for some reason - likely
> > > > expand_vec_cond_expr_p doesn't work there).  What's also missing
> > > > is providing more support for predicated operations in the case
> > > > of reductions either via VEC_COND_EXPRs or via implementing
> > > > some of the .COND_{ADD,SUB,MUL...} internal functions as mapping
> > > > to masked AVX512 operations.
> > > >
> > > > For AVX2 and
> > > >
> > > > int foo (unsigned *a, unsigned * __restrict b, int n)
> > > > {
> > > >   unsigned sum = 1;
> > > >   for (int i = 0; i < n; ++i)
> > > > b[i] += a[i];
> > > >   return sum;
> > > > }
> > > >
> > > > we get
> > > >
> > > > .L3:
> > > > vpmaskmovd  (%rsi,%rax), %ymm0, %ymm3
> > > > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm1
> > > > addl$8, %edx
> > > > vpaddd  %ymm3, %ymm1, %ymm1
> > > > vpmaskmovd  %ymm1, %ymm0, (%rsi,%rax)
> > > > vmovd   %edx, %xmm1
> > > > vpsubd  %ymm15, %ymm2, %ymm0
> > > > addq$32, %rax
> > > > vpbroadcastd%xmm1, %ymm1
> > > > vpaddd  %ymm4, %ymm1, %ymm1
> > > > vpsubd  %ymm15, %ymm1, %ymm1
> > > > vpcmpgtd%ymm1, %ymm0, %ymm0
> > > > vptest  %ymm0, %ymm0
> > > > jne .L3
> > > >
> > > > for the fully masked loop body and for the masked epilogue
> > > > we see
> > > >
> > > > .L4:
> > > > vmovdqu (%rsi,%rax), %ymm3
> > > > vpaddd  (%rdi,%rax), %ymm3, %ymm0
> > > > vmovdqu %ymm0, (%rsi,%rax)
> > > > addq$32, %rax
> > > > cmpq%rax, %rcx
> > > > jne .L4
> > > > movl%edx, %eax
> > > > andl$-8, %eax
> > > > testb   $7, %dl
> > > > je  .L11
> > > > .L3:
> > > > subl%eax, %edx
> > > > vmovdqa .LC0(%rip), %ymm1
> > > > salq$2, %rax
> > > > vmovd   %edx, %xmm0
> > > > movl$-2147483648, %edx
> > > > addq%rax, %rsi
> > > > vmovd   %edx, %xmm15
> > > > vpbroadcastd%xmm0, %ymm0
> > > > vpbroadcastd%xmm15, %ymm15
> > > > vpsubd  %ymm15, %ymm1, %ymm1
> > > > vpsubd  %ymm15, %ymm0, %ymm0
> > > > vpcmpgtd%ymm1, %ymm0, %ymm0
> > > > vpmaskmovd  (%rsi), %ymm0, %ymm1
> > > > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm2
> > > > vpaddd  %ymm2, %ymm1, %ymm1
> > > > vpmaskmovd  %ymm1, %ymm0, (%rsi)
> > > > .L11:
> > > > vzeroupper
> > > >
> > > > compared to
> > > >
> > > > .L3:
> > > > movl%edx, %r8d
> > > > subl%eax, %r8d
> > > > leal-1(%r8), %r9d
> > > > cmpl$2, %r9d
> > > > jbe .L6
> > > > leaq(%rcx,%rax,4), %r9
> > > > vmovdqu (%rdi,%rax,4), %xmm2
> > > > movl%r8d, %eax
> > > > andl$-4, %eax
> > > > vpaddd  (%r9), %xmm2, %xmm0
> > > > addl%eax, %esi
> > > > andl$3, %r8d
> > > > vmovdqu %xmm0, (%r9)
> > > > je  .L2
> > > > .L6:
> > > > movslq  %esi, %r8
> > > > leaq0(,%r8,4), %rax
> > > > movl(%rdi,%r8,4), %r8d
> > > > addl%r8d, (%rcx,%rax)
> > > > leal1(%rsi), %r8d
> > > > cmpl%r8d, %edx
> > > > jle .L2
> > > > addl$2, %esi
> > > > movl4(%rdi,%rax), %r8d
> > > > addl%r8d, 4(%rcx,%rax)
> > > > cmpl%esi, %edx
> > > > jle .L2
> > > > movl8(%rdi,%rax), %edx
> > > > addl%edx, 8(%rcx,%rax)
> > > > .L2:
> > > >
> > > > I'm giving this a little testing right now but will dig on why
> > > > I don't get masked loops when AVX512 is enabled.
> > >
> > > Ah, a simple thinko - rgroup_controls vectypes seem to be
> > > always VECTOR_BOOLEAN_TYPE_P and thus we can
> > > use expand_vec_cmp_expr_p.  The AVX512 fully masked
> > > loop then looks like
> > >
> > > .L3:
> > > vmovdqu32   (%rsi,%rax,4), %ymm2{%k1}
> > > vmovdqu32   (%rdi,%rax,4), %ymm1{%k1}
> > > vpaddd  %ymm2, %ymm1, %ymm0
> > > vmovdqu32   %ymm0, (%rsi,%rax,4){%k1}
> > > addq$8, %rax
> > > vpbroadcastd%eax, %ymm0
> > > 

Re: [PATCH] x86: Don't set AVX_U128_DIRTY when all bits are zero

2021-07-15 Thread Hongtao Liu via Gcc-patches
On Fri, Jul 16, 2021 at 1:30 AM H.J. Lu via Gcc-patches
 wrote:
>
> In a single SET, all bits of the source YMM/ZMM register are zero when
>
> 1. The source is constant zero.
> 2. The source YMM/ZMM operands are defined from constant zero.
>
> and we don't set AVX_U128_DIRTY.
>
> gcc/
>
> PR target/101456
> * config/i386/i386.c (ix86_avx_u128_mode_needed): Don't set
> AVX_U128_DIRTY when all bits are zero.
>
> gcc/testsuite/
>
> PR target/101456
> * gcc.target/i386/pr101456-1.c: New test.
> ---
>  gcc/config/i386/i386.c | 47 ++
>  gcc/testsuite/gcc.target/i386/pr101456-1.c | 28 +
>  2 files changed, 75 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101456-1.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index cff26909292..c2b06934053 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -14129,6 +14129,53 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
>return AVX_U128_CLEAN;
>  }
>
> +  rtx set = single_set (insn);
> +  if (set)
> +{
> +  rtx dest = SET_DEST (set);
> +  rtx src = SET_SRC (set);
> +  if (ix86_check_avx_upper_register (dest))
> +   {
> + /* It is not dirty if the source is known zero.  */
> + if (standard_sse_constant_p (src, GET_MODE (dest)) == 1)
> +   return AVX_U128_ANY;
> + else
> +   return AVX_U128_DIRTY;
> +   }
> +  else if (ix86_check_avx_upper_register (src))
> +   {
> + /* Check for the source operand with all DEFs from constant
> +zero.  */
> + df_ref def = DF_REG_DEF_CHAIN (REGNO (src));
> + if (!def)
> +   return AVX_U128_DIRTY;
> +
> + for (; def; def = DF_REF_NEXT_REG (def))
> +   if (DF_REF_REG_DEF_P (def)
> +   && !DF_REF_IS_ARTIFICIAL (def))
> + {
> +   rtx_insn *def_insn = DF_REF_INSN (def);
> +   set = single_set (def_insn);
> +   if (!set)
> + return AVX_U128_DIRTY;
> +
> +   dest = SET_DEST (set);
> +   if (ix86_check_avx_upper_register (dest))
> + {
> +   src = SET_SRC (set);
> +   /* It is dirty if the source operand isn't constant
> +  zero.  */
> +   if (standard_sse_constant_p (src, GET_MODE (dest))
> +   != 1)
> + return AVX_U128_DIRTY;
> + }
> + }
> +
> + /* It is not dirty only if all sources are known zero.  */
> + return AVX_U128_ANY;
> +   }
> +}
> +
>/* Require DIRTY mode if a 256bit or 512bit AVX register is referenced.
>   Hardware changes state only when a 256bit register is written to,
>   but we need to prevent the compiler from moving optimal insertion
> diff --git a/gcc/testsuite/gcc.target/i386/pr101456-1.c 
> b/gcc/testsuite/gcc.target/i386/pr101456-1.c
> new file mode 100644
> index 000..6a0f6ccd756
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101456-1.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=skylake" } */
> +
> +#include 
> +
> +extern __m256 x1;
> +extern __m256d x2;
> +extern __m256i x3;
> +
> +void
> +foo1 (void)
> +{
> +  x1 = _mm256_setzero_ps ();
> +}
> +
> +void
> +foo2 (void)
> +{
> +  x2 = _mm256_setzero_pd ();
> +}
> +
> +void
> +foo3 (void)
> +{
> +  x3 = _mm256_setzero_si256 ();
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroupper" } } */
> --
> 2.31.1
>

LGTM.

-- 
BR,
Hongtao


[PATCH] c++: Don't hide narrowing errors in system headers

2021-07-15 Thread Marek Polacek via Gcc-patches
Jonathan pointed me at this issue where

  constexpr unsigned f() { constexpr int n = -1; return unsigned{n}; }

is accepted in system headers, despite the narrowing conversion from
a constant.  I suspect that, whereas narrowing warnings should be
disabled in system headers, ill-formed narrowing of constants should
still be a hard error (which can still be disabled by -Wno-narrowing).

Bootstrapped/regtested on {ppc64le,x86_64}-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

* typeck2.c (check_narrowing): Don't suppress the pedantic error
in system headers.

libstdc++-v3/ChangeLog:

* testsuite/20_util/ratio/operations/ops_overflow_neg.cc: Add
dg-error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/Wnarrowing2.C: New test.
* g++.dg/cpp1y/Wnarrowing2.h: New test.
---
 gcc/cp/typeck2.c  | 1 +
 gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.C  | 4 
 gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.h  | 2 ++
 .../testsuite/20_util/ratio/operations/ops_overflow_neg.cc| 2 ++
 4 files changed, 9 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.h

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 6679e247816..dcfdff2f905 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -986,6 +986,7 @@ check_narrowing (tree type, tree init, tsubst_flags_t 
complain,
{
  int savederrorcount = errorcount;
  global_dc->pedantic_errors = 1;
+ auto s = make_temp_override (global_dc->dc_warn_system_headers, true);
  pedwarn (loc, OPT_Wnarrowing,
   "narrowing conversion of %qE from %qH to %qI",
   init, ftype, type);
diff --git a/gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.C 
b/gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.C
new file mode 100644
index 000..048d484f46f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.C
@@ -0,0 +1,4 @@
+// { dg-do compile { target c++14 } }
+
+#include "Wnarrowing2.h"
+// { dg-error "narrowing conversion" "" { target *-*-* } 0 }
diff --git a/gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.h 
b/gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.h
new file mode 100644
index 000..7dafa51af14
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.h
@@ -0,0 +1,2 @@
+#pragma GCC system_header
+constexpr unsigned f() { constexpr int n = -1; return unsigned{n}; }
diff --git 
a/libstdc++-v3/testsuite/20_util/ratio/operations/ops_overflow_neg.cc 
b/libstdc++-v3/testsuite/20_util/ratio/operations/ops_overflow_neg.cc
index 47d3c3a037e..f120e599a33 100644
--- a/libstdc++-v3/testsuite/20_util/ratio/operations/ops_overflow_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/ratio/operations/ops_overflow_neg.cc
@@ -39,6 +39,7 @@ test02()
 }
 
 // { dg-error "required from here" "" { target *-*-* } 28 }
+// { dg-error "expected initializer" "" { target *-*-* } 28 }
 // { dg-error "expected initializer" "" { target *-*-* } 35 }
 // { dg-error "expected initializer" "" { target *-*-* } 37 }
 // { dg-error "overflow in addition" "" { target *-*-* } 0 }
@@ -46,5 +47,6 @@ test02()
 // { dg-error "overflow in multiplication" "" { target *-*-* } 100 }
 // { dg-error "overflow in multiplication" "" { target *-*-* } 102 }
 // { dg-error "overflow in constant expression" "" { target *-*-* } 0 }
+// { dg-error "narrowing conversion" "" { target *-*-* } 0 }
 // { dg-prune-output "out of range" }
 // { dg-prune-output "not usable in a constant expression" }

base-commit: f364cdffa47af574f90f671b2dcf5afa91442741
-- 
2.31.1



[committed] analyzer: fix const-correctness of various is_a_helper

2021-07-15 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as c031ea2782a1873eee5ba82fb114cd87ff831412.

gcc/analyzer/ChangeLog:
* svalue.h (is_a_helper ::test): Make
param and template param const.
(is_a_helper ::test): Likewise.
(is_a_helper ::test): Likewise.
(is_a_helper ::test): Likewise.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/svalue.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/analyzer/svalue.h b/gcc/analyzer/svalue.h
index 54b97f8617f..20d7cf8f751 100644
--- a/gcc/analyzer/svalue.h
+++ b/gcc/analyzer/svalue.h
@@ -1063,7 +1063,7 @@ public:
 template <>
 template <>
 inline bool
-is_a_helper ::test (svalue *sval)
+is_a_helper ::test (const svalue *sval)
 {
   return sval->get_kind () == SK_PLACEHOLDER;
 }
@@ -1165,7 +1165,7 @@ public:
 template <>
 template <>
 inline bool
-is_a_helper ::test (svalue *sval)
+is_a_helper ::test (const svalue *sval)
 {
   return sval->get_kind () == SK_WIDENING;
 }
@@ -1266,7 +1266,7 @@ public:
 template <>
 template <>
 inline bool
-is_a_helper ::test (svalue *sval)
+is_a_helper ::test (const svalue *sval)
 {
   return sval->get_kind () == SK_COMPOUND;
 }
@@ -1366,7 +1366,7 @@ public:
 template <>
 template <>
 inline bool
-is_a_helper ::test (svalue *sval)
+is_a_helper ::test (const svalue *sval)
 {
   return sval->get_kind () == SK_CONJURED;
 }
-- 
2.26.3



[PATCH v3 2/2] rs6000: Add test for _mm_minpos_epu16

2021-07-15 Thread Paul A. Clarke via Gcc-patches
Copy the test for _mm_minpos_epu16 from
gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with
a few adjustments:

- Adjust the dejagnu directives for powerpc platform.
- Make the data not be monotonically increasing,
  such that some of the returned values are not
  always the first value (index 0).
- Create a list of input data testing various scenarios
  including more than one minimum value and different
  orders and indices of the minimum value.
- Fix a masking issue where the index was being truncated
  to 2 bits instead of 3 bits, which wasn't found because
  all of the returned indices were 0 with the original
  generated data.
- Support big-endian.

2021-07-15  Paul A. Clarke  

gcc/testsuite
* gcc.target/powerpc/sse4_1-phminposuw.c: Copy from
gcc/testsuite/gcc.target/i386, make more robust.
---
v3: Minor formatting changes per Bill's review.
v2: Rewrote to utilize much more interesting input data after Segher's
review.

 .../gcc.target/powerpc/sse4_1-phminposuw.c| 68 +++
 1 file changed, 68 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
new file mode 100644
index ..88d9b43c431c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include 
+
+#define DIM(a) (sizeof (a) / sizeof ((a)[0]))
+
+static void
+TEST (void)
+{
+  union
+{
+  __m128i x;
+  unsigned short s[8];
+} src[] =
+{
+  { .s = { 0x, 0x, 0x, 0x, 0x, 0x, 0x, 0x 
} },
+  { .s = { 0x, 0x, 0x, 0x, 0x, 0x, 0x, 0x 
} },
+  { .s = { 0x, 0x, 0x, 0x, 0x, 0x, 0x, 0x 
} },
+  { .s = { 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008 
} },
+  { .s = { 0x0008, 0x0007, 0x0006, 0x0005, 0x0004, 0x0003, 0x0002, 0x0001 
} },
+  { .s = { 0xfff4, 0xfff3, 0xfff2, 0xfff1, 0xfff3, 0xfff1, 0xfff2, 0xfff3 
} }
+};
+  unsigned short minVal[DIM (src)];
+  int minInd[DIM (src)];
+  unsigned short minValScalar, minIndScalar;
+  int i, j;
+  union
+{
+  int si;
+  unsigned short s[2];
+} res;
+
+  for (i = 0; i < DIM (src); i++)
+{
+  res.si = _mm_cvtsi128_si32 (_mm_minpos_epu16 (src[i].x));
+  minVal[i] = res.s[0];
+  minInd[i] = res.s[1] & 0b111;
+}
+
+  for (i = 0; i < DIM (src); i++)
+{
+  minValScalar = src[i].s[0];
+  minIndScalar = 0;
+
+  for (j = 1; j < 8; j++)
+   if (minValScalar > src[i].s[j])
+ {
+   minValScalar = src[i].s[j];
+   minIndScalar = j;
+ }
+
+  if (minValScalar != minVal[i] && minIndScalar != minInd[i])
+   abort ();
+}
+}
-- 
2.27.0



[PATCH v3 1/2] rs6000: Add support for _mm_minpos_epu16

2021-07-15 Thread Paul A. Clarke via Gcc-patches
Add a naive implementation of the subject x86 intrinsic to
ease porting.

2021-07-15  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_minpos_epu16): New.
---
v3: Minor formatting changes per review from Bill.
v2: Minor formatting changes per review from Segher.

 gcc/config/rs6000/smmintrin.h | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 16fd34d836ff..6a010fdbb96f 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -172,4 +172,31 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
   return any_ones * any_zeros;
 }
 
+/* Return horizontal packed word minimum and its index in bits [15:0]
+   and bits [18:16] respectively.  */
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_minpos_epu16 (__m128i __A)
+{
+  union __u
+{
+  __m128i __m;
+  __v8hu __uh;
+};
+  union __u __u = { .__m = __A }, __r = { .__m = {0} };
+  unsigned short __ridx = 0;
+  unsigned short __rmin = __u.__uh[__ridx];
+  for (unsigned long __i = 1; __i < 8; __i++)
+{
+  if (__u.__uh[__i] < __rmin)
+   {
+ __rmin = __u.__uh[__i];
+ __ridx = __i;
+   }
+}
+  __r.__uh[0] = __rmin;
+  __r.__uh[1] = __ridx;
+  return __r.__m;
+}
+
 #endif
-- 
2.27.0



[PATCH v3 0/2] rs6000: Add support for _mm_minpos_epu16

2021-07-15 Thread Paul A. Clarke via Gcc-patches
Added compatible implementation of _mm_minpos_epu16 for powerpc.
Copied, improved, and fixed testcase from i386.
Tested on BE, LE (32 and 64bit).

Paul A. Clarke (2):
  rs6000: Add support for _mm_minpos_epu16
  - v3: Changes per Bill's review.
  rs6000: Add test for _mm_minpos_epu16
  - v3: Changes per Bill's review.

 gcc/config/rs6000/smmintrin.h | 27 
 .../gcc.target/powerpc/sse4_1-phminposuw.c| 68 +++
 2 files changed, 95 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c

-- 
2.27.0



Re: [RFC/PATCH] vect: Recog mul_highpart pattern

2021-07-15 Thread Segher Boessenkool
On Thu, Jul 15, 2021 at 09:40:52AM +0800, Kewen.Lin wrote:
> > on 2021/7/15 3:32 AM, Segher Boessenkool wrote:
> > The normal rule is you cannot go over 80.  It is perfectly fine to have
> > shorter lines, certainly if that is nice for some other reason, so
> > automatically (by some tool) changing this is Just Wrong.
> 
> OK, could this be applied to changelog entry too?  I guess yes?

Yes, lines of length 80 are fine in changelogs.

But try to not make short lines (that do not end an entry).  A changelog
that looks different from other changelogs is harder to read.  Normally
you have a whole bunch of totally boring entries ("New." or "Likewise."
for example), and the few that are longer naturally stand out then,
making it easier to scan the changelogs (which is what they are used for
most of the time: search for something, and press "n" a lot).

Also try to write less trivial things somewhat briefly in changelogs:
changelogs just say *what* changed, not *why*, and it is okay to leave
out details (this is a tradeoff of course).
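A short illustration of that style (file and function names here are invented, not from any real target): the boring entries stay terse, and the one longer entry naturally stands out.

```
	* config/foo/foo.c (foo_expand_cmove): New function.
	(foo_emit_compare): Likewise.
	* config/foo/foo-protos.h (foo_expand_cmove): Declare.
	* config/foo/foo.md (movsicc): Use foo_expand_cmove instead of
	open-coding the compare-and-branch sequence.
```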


Segher


[PATCH] c++: Allow constexpr references to non-static vars [PR100976]

2021-07-15 Thread Marek Polacek via Gcc-patches
The combination of DR 2481 and DR 2126 should allow us to do

  void f()
  {
    constexpr const int &r = 42;
static_assert(r == 42);
  }

because [expr.const]/4.7 now says that "a temporary object of
non-volatile const-qualified literal type whose lifetime is extended to
that of a variable that is usable in constant expressions" is usable in
a constant expression.

I think the temporary is supposed to be const-qualified, because Core 2481
says so.  I was happy to find out that we already mark the temporary as
const + constexpr in set_up_extended_ref_temp.

But that wasn't enough to make the test above work: references are
traditionally implemented as pointers, so the temporary object will be
(const int &), and verify_constant -> reduced_constant_expression_p
-> initializer_constant_valid_p_1 doesn't think that's OK -- and rightly
so -- the address of a local variable certainly isn't constant.  Therefore
I'm skipping the verify_constant check in cxx_eval_outermost_constant_expr.
(DECL_INITIAL isn't checked because maybe we are still waiting for
initialize_local_var to set it.)

Then we need to be able to evaluate such a reference.  This I do by
seeing through the reference in cxx_eval_constant_expression.  I can't
rely on decl_constant_value to pull out DECL_INITIAL, because the VAR_DECL
isn't DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P, and I think we don't
need to mess with that if we're keeping this purely in constexpr.

I wonder if we should accept

  void f2()
  {
    constexpr int &&r = 42;
static_assert(r == 42);
  }

Currently we don't -- CP_TYPE_CONST_NON_VOLATILE_P (type) is false in
set_up_extended_ref_temp.

Does this make sense?  Bootstrapped/regtested on x86_64-pc-linux-gnu.

PR c++/100976
DR 2481

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_constant_expression): For a constexpr
reference, return its DECL_INITIAL.
(cxx_eval_outermost_constant_expr): Don't verify the initializer
for a constexpr variable of reference type.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-ref2.C: Remove dg-error.
* g++.dg/cpp0x/constexpr-temp2.C: New test.
* g++.dg/cpp23/constexpr-temp1.C: New test.
* g++.dg/cpp23/constexpr-temp2.C: New test.
---
 gcc/cp/constexpr.c   | 29 +--
 gcc/testsuite/g++.dg/cpp0x/constexpr-ref2.C  |  5 ++-
 gcc/testsuite/g++.dg/cpp0x/constexpr-temp2.C | 15 
 gcc/testsuite/g++.dg/cpp23/constexpr-temp1.C | 39 
 gcc/testsuite/g++.dg/cpp23/constexpr-temp2.C | 23 
 5 files changed, 106 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-temp2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp23/constexpr-temp1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp23/constexpr-temp2.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 31fa5b66865..80b4985d055 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -6180,6 +6180,22 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
  return cxx_eval_constant_expression (ctx, r, lval, non_constant_p,
   overflow_p);
}
+  /* DR 2126 amended [expr.const]/4.7 to say that "a temporary object
+of non-volatile const-qualified literal type whose lifetime is
+extended to that of a variable that is usable in constant
+expressions" is usable in a constant expression.  Along with
+DR 2481 this means that we should accept
+
+  constexpr const int &r = 42;
+  static_assert (r == 42);
+
+Take a shortcut here rather than using decl_constant_value.  The
+temporary was marked constexpr in set_up_extended_ref_temp.  */
+  else if (TYPE_REF_P (TREE_TYPE (t))
+  && DECL_DECLARED_CONSTEXPR_P (t)
+  && DECL_INITIAL (t))
+   return cxx_eval_constant_expression (ctx, DECL_INITIAL (t), lval,
+non_constant_p, overflow_p);
   /* fall through */
 case CONST_DECL:
   /* We used to not check lval for CONST_DECL, but darwin.c uses
@@ -7289,10 +7305,17 @@ cxx_eval_outermost_constant_expr (tree t, bool 
allow_non_constant,
   r = cxx_eval_constant_expression (&ctx, r,
				    false, &non_constant_p, &overflow_p);
 
-  if (!constexpr_dtor)
-    verify_constant (r, allow_non_constant, &non_constant_p, &overflow_p);
-  else
+  if (object && VAR_P (object)
+  && DECL_DECLARED_CONSTEXPR_P (object)
+  && TYPE_REF_P (TREE_TYPE (object)))
+  /* Circumvent verify_constant, because it ends up calling
+ initializer_constant_valid_p which doesn't like taking
+ the address of a local variable.  But that's OK since
+ DR 2126 + DR 2481, at least in a constexpr context.  */;
+  else if (constexpr_dtor)
 DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (object) = true;
+  else
+    verify_constant (r, allow_non_constant, &non_constant_p, &overflow_p);
 
   unsigned int i;
 

Re: [PATCH 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.

2021-07-15 Thread Richard Sandiford via Gcc-patches
Robin Dapp  writes:
> Currently we only ever call emit_conditional_move with the comparison
> (as well as its comparands) we got from the jump.  Thus, backends are
> going to emit a CC comparison for every conditional move that is being
> generated instead of re-using the existing CC.
> This, combined with emitting temporaries for each conditional move,
> causes sky-high costs for conditional moves.
>
> This patch allows re-using a CC so the costing situation is improved a
> bit.
> ---
>  gcc/ifcvt.c  |  16 +++--
>  gcc/optabs.c | 163 ++-
>  gcc/optabs.h |   1 +
>  3 files changed, 121 insertions(+), 59 deletions(-)
>
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index ac0c142c9fe..c5b8641e2aa 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -771,7 +771,7 @@ static int noce_try_addcc (struct noce_if_info *);
>  static int noce_try_store_flag_constants (struct noce_if_info *);
>  static int noce_try_store_flag_mask (struct noce_if_info *);
>  static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
> - rtx, rtx, rtx);
> + rtx, rtx, rtx, rtx = NULL, rtx = NULL);
>  static int noce_try_cmove (struct noce_if_info *);
>  static int noce_try_cmove_arith (struct noce_if_info *);
>  static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
> @@ -1710,7 +1710,8 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
>  
>  static rtx
>  noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
> -  rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue)
> +  rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue, rtx cc_cmp,
> +  rtx rev_cc_cmp)
>  {
>rtx target ATTRIBUTE_UNUSED;
>int unsignedp ATTRIBUTE_UNUSED;
> @@ -1756,9 +1757,14 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, 
> enum rtx_code code,
>unsignedp = (code == LTU || code == GEU
>  || code == LEU || code == GTU);
>  
> -  target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
> -   vtrue, vfalse, GET_MODE (x),
> -   unsignedp);
> +  if (cc_cmp != NULL_RTX && rev_cc_cmp != NULL_RTX)
> +target = emit_conditional_move (x, cc_cmp, rev_cc_cmp,
> + vtrue, vfalse, GET_MODE (x));
> +  else
> +target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
> + vtrue, vfalse, GET_MODE (x),
> + unsignedp);

It might make sense to move:

  /* Don't even try if the comparison operands are weird
 except that the target supports cbranchcc4.  */
  if (! general_operand (cmp_a, GET_MODE (cmp_a))
  || ! general_operand (cmp_b, GET_MODE (cmp_b)))
{
  if (!have_cbranchcc4
  || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
  || cmp_b != const0_rtx)
return NULL_RTX;
}

into the “else” arm, since it seems odd to be checking cmp_a and cmp_b
when we're not going to use them.  Looks like the later call to
emit_conditional_move should get the same treatment.

> +
>if (target)
>  return target;
>  
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index 62a6bdb4c59..6bf486b9b50 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -52,6 +52,8 @@ static void prepare_float_lib_cmp (rtx, rtx, enum rtx_code, 
> rtx *,
>  static rtx expand_unop_direct (machine_mode, optab, rtx, rtx, int);
>  static void emit_libcall_block_1 (rtx_insn *, rtx, rtx, rtx, bool);
>  
> +static rtx emit_conditional_move (rtx, rtx, rtx, rtx, machine_mode);
> +
>  /* Debug facility for use in GDB.  */
>  void debug_optab_libfuncs (void);
>
> @@ -4747,7 +4749,6 @@ emit_conditional_move (rtx target, enum rtx_code code, 
> rtx op0, rtx op1,
>  machine_mode mode, int unsignedp)
>  {
>rtx comparison;
> -  rtx_insn *last;
>enum insn_code icode;
>enum rtx_code reversed;
>  
> @@ -4774,6 +4775,7 @@ emit_conditional_move (rtx target, enum rtx_code code, 
> rtx op0, rtx op1,
>/* get_condition will prefer to generate LT and GT even if the old
>   comparison was against zero, so undo that canonicalization here since
>   comparisons against zero are cheaper.  */
> +
>if (code == LT && op1 == const1_rtx)
>  code = LE, op1 = const0_rtx;
>else if (code == GT && op1 == constm1_rtx)
> @@ -4782,17 +4784,29 @@ emit_conditional_move (rtx target, enum rtx_code 
> code, rtx op0, rtx op1,
>if (cmode == VOIDmode)
>  cmode = GET_MODE (op0);
>  
> -  enum rtx_code orig_code = code;
> +  /* If the first source operand is constant and the second is not, swap
> + it into the second.  In that case we also need to reverse the
> + comparison.  It is possible, though, that the conditional move
> + will not expand with operands in this order, so we might also need
> + to revert to the original comparison and operand order.  */

Why's that the case 

[PATCH] i386: Fix ix86_hard_regno_mode_ok for TDmode on 32bit targets [PR101346]

2021-07-15 Thread Uros Bizjak via Gcc-patches
General regs on 32bit targets do not support 128bit modes,
including TDmode.

gcc/

2021-07-15  Uroš Bizjak  

PR target/101346
* config/i386/i386.h (VALID_SSE_REG_MODE): Add TDmode.
(VALID_INT_MODE_P): Add SDmode and DDmode.
Add TDmode for TARGET_64BIT.
(VALID_DFP_MODE_P): Remove.
* config/i386/i386.c (ix86_hard_regno_mode_ok):
Do not use VALID_DFP_MODE_P.

gcc/testsuite/

2021-07-15  Uroš Bizjak  

PR target/101346
* gcc.target/i386/pr101346.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 530d3572965..9d74b7a191b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19535,11 +19535,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)
   return !can_create_pseudo_p ();
 }
   /* We handle both integer and floats in the general purpose registers.  */
-  else if (VALID_INT_MODE_P (mode))
-return true;
-  else if (VALID_FP_MODE_P (mode))
-return true;
-  else if (VALID_DFP_MODE_P (mode))
+  else if (VALID_INT_MODE_P (mode)
+  || VALID_FP_MODE_P (mode))
 return true;
   /* Lots of MMX code casts 8 byte vector modes to DImode.  If we then go
  on to use that value in smaller contexts, this can easily force a
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 324e8a952d9..0c2c93daf32 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1023,7 +1023,7 @@ extern const char *host_detect_local_cpu (int argc, const 
char **argv);
 #define VALID_SSE_REG_MODE(MODE)   \
   ((MODE) == V1TImode || (MODE) == TImode  \
|| (MODE) == V4SFmode || (MODE) == V4SImode \
-   || (MODE) == SFmode || (MODE) == TFmode)
+   || (MODE) == SFmode || (MODE) == TFmode || (MODE) == TDmode)
 
 #define VALID_MMX_REG_MODE_3DNOW(MODE) \
   ((MODE) == V2SFmode || (MODE) == SFmode)
@@ -1037,9 +1037,6 @@ extern const char *host_detect_local_cpu (int argc, const 
char **argv);
 
 #define VALID_MASK_AVX512BW_MODE(MODE) ((MODE) == SImode || (MODE) == DImode)
 
-#define VALID_DFP_MODE_P(MODE) \
-  ((MODE) == SDmode || (MODE) == DDmode || (MODE) == TDmode)
-
 #define VALID_FP_MODE_P(MODE)  \
   ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode\
|| (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)
\
@@ -1049,12 +1046,13 @@ extern const char *host_detect_local_cpu (int argc, 
const char **argv);
|| (MODE) == SImode || (MODE) == DImode \
|| (MODE) == CQImode || (MODE) == CHImode   \
|| (MODE) == CSImode || (MODE) == CDImode   \
+   || (MODE) == SDmode || (MODE) == DDmode \
|| (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode   \
|| (TARGET_64BIT\
&& ((MODE) == TImode || (MODE) == CTImode   \
   || (MODE) == TFmode || (MODE) == TCmode  \
   || (MODE) == V8QImode || (MODE) == V4HImode  \
-  || (MODE) == V2SImode)))
+  || (MODE) == V2SImode || (MODE) == TDmode)))
 
 /* Return true for modes passed in SSE registers.  */
 #define SSE_REG_MODE_P(MODE)   \
diff --git a/gcc/testsuite/gcc.target/i386/pr101346.c 
b/gcc/testsuite/gcc.target/i386/pr101346.c
new file mode 100644
index 000..fefabaf0e56
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101346.c
@@ -0,0 +1,10 @@
+/* PR target/101346 */
+/* { dg-do compile } */
+/* { dg-options "-O0 -fprofile-generate -msse" } */
+/* { dg-require-profiling "-fprofile-generate" } */
+
+_Decimal128
+foo (_Decimal128 x)
+{
+  return - __builtin_fabsd128 (x);
+}


Re: [PATCH 3/7] ifcvt: Improve costs handling for noce_convert_multiple.

2021-07-15 Thread Richard Sandiford via Gcc-patches
Robin Dapp  writes:
> When noce_convert_multiple is called the original costs are not yet
> initialized.  Therefore, up to now, costs were only ever unfairly
> compared against COSTS_N_INSNS (2).  This would lead to
> default_noce_conversion_profitable_p () rejecting all but the most
> contrived of sequences.
>
> This patch temporarily initializes the original costs by counting
> a compare and all the sets inside the then_bb.
> ---
>  gcc/ifcvt.c | 30 ++
>  1 file changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index 6006055f26a..ac0c142c9fe 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -3382,14 +3382,17 @@ noce_convert_multiple_sets (struct noce_if_info 
> *if_info)
> (SET (REG) (REG)) insns suitable for conversion to a series
> of conditional moves.  Also check that we have more than one set
> (other routines can handle a single set better than we would), and
> -   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  */
> +   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
> +   through the insns store the sum of their potential costs in COST.  */
>  
>  static bool
> -bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
> +bb_ok_for_noce_convert_multiple_sets (basic_block test_bb, unsigned *cost)
>  {
>rtx_insn *insn;
>unsigned count = 0;
>unsigned param = param_max_rtl_if_conversion_insns;
> +  bool speed_p = optimize_bb_for_speed_p (test_bb);
> +  unsigned potential_cost = 0;
>  
>FOR_BB_INSNS (test_bb, insn)
>  {
> @@ -3425,9 +3428,13 @@ bb_ok_for_noce_convert_multiple_sets (basic_block 
> test_bb)
>if (!can_conditionally_move_p (GET_MODE (dest)))
>   return false;
>  
> +  potential_cost += pattern_cost (set, speed_p);
> +

It looks like this is an existing (potential) problem,
but default_noce_conversion_profitable_p uses seq_cost, which in turn
uses insn_cost.  And insn_cost has an optional target hook behind it,
which allows for costing based on insn attributes etc.  For a true
apples-with-apples comparison we should use insn_cost here too.

>count++;
>  }
>  
> +  *cost += potential_cost;
> +
>/* If we would only put out one conditional move, the other strategies
>   this pass tries are better optimized and will be more appropriate.
>   Some targets want to strictly limit the number of conditional moves
> @@ -3475,11 +3482,23 @@ noce_process_if_block (struct noce_if_info *if_info)
>   to calculate a value for x.
>   ??? For future expansion, further expand the "multiple X" rules.  */
>  
> -  /* First look for multiple SETS.  */
> +  /* First look for multiple SETS.  The original costs already
> + include a compare that we will be needing either way.

I think the detail that COSTS_N_INSNS (2) is the default is useful here.
(In other words, I'd forgotten by the time I'd poked around other bits of
ifcvt and was about to ask why we didn't cost the condition “properly”.)
So how about something like:

   The original costs already include a base cost of COSTS_N_INSNS (2):
   one instruction for the compare (which we will be needing either way)
   and one instruction for the branch.

> + When
> + comparing costs we want to use the branch instruction cost and
> + the sets vs. the cmovs generated here.  Therefore subtract
> + the costs of the compare before checking.
> + ??? Actually, instead of the branch instruction costs we might want
> + to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.*/

Hmm, not sure about the ??? either way.  The units of BRANCH_COST aren't
entirely clear.  But it's a ???, so keeping it is fine.

Formatting nit: should be two spaces between “.” and “*/”.

Looks good otherwise, thanks.

Richard

> +
> +  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
> +  unsigned old_cost = if_info->original_cost;
>if (!else_bb
>&& HAVE_conditional_move
> -  && bb_ok_for_noce_convert_multiple_sets (then_bb))
> +  && bb_ok_for_noce_convert_multiple_sets (then_bb, &potential_cost))
>  {
> +  /* Temporarily set the original costs to what we estimated so
> +  we can determine if the transformation is worth it.  */
> +  if_info->original_cost = potential_cost;
>if (noce_convert_multiple_sets (if_info))
>   {
> if (dump_file && if_info->transform_name)
> @@ -3487,6 +3506,9 @@ noce_process_if_block (struct noce_if_info *if_info)
>if_info->transform_name);
> return TRUE;
>   }
> +
> +  /* Restore the original costs.  */
> +  if_info->original_cost = old_cost;
>  }
>  
>bool speed_p = optimize_bb_for_speed_p (test_bb);


Re: [PATCH 2/7] ifcvt: Allow constants for noce_convert_multiple.

2021-07-15 Thread Richard Sandiford via Gcc-patches
Robin Dapp  writes:
> This lifts the restriction of not allowing constants for
> noce_convert_multiple.  The code later checks if a valid sequence
> is produced anyway.

OK, thanks.

I was initially worried that this might trump later, more targetted
optimisations, but it looks like that's already accounted for:

  /* If we would only put out one conditional move, the other strategies
 this pass tries are better optimized and will be more appropriate.
 Some targets want to strictly limit the number of conditional moves
 that are emitted, they set this through PARAM, we need to respect
 that.  */
  return count > 1 && count <= param;

Richard


> ---
>  gcc/ifcvt.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index eef6490626a..6006055f26a 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -3269,7 +3269,9 @@ noce_convert_multiple_sets (struct noce_if_info 
> *if_info)
>we'll end up trying to emit r4:HI = cond ? (r1:SI) : (r3:HI).
>Wrap the two cmove operands into subregs if appropriate to prevent
>that.  */
> -  if (GET_MODE (new_val) != GET_MODE (temp))
> +
> +  if (!CONSTANT_P (new_val)
> +   && GET_MODE (new_val) != GET_MODE (temp))
>   {
> machine_mode src_mode = GET_MODE (new_val);
> machine_mode dst_mode = GET_MODE (temp);
> @@ -3280,7 +3282,8 @@ noce_convert_multiple_sets (struct noce_if_info 
> *if_info)
>   }
> new_val = lowpart_subreg (dst_mode, new_val, src_mode);
>   }
> -  if (GET_MODE (old_val) != GET_MODE (temp))
> +  if (!CONSTANT_P (old_val)
> +   && GET_MODE (old_val) != GET_MODE (temp))
>   {
> machine_mode src_mode = GET_MODE (old_val);
> machine_mode dst_mode = GET_MODE (temp);
> @@ -3409,9 +3412,9 @@ bb_ok_for_noce_convert_multiple_sets (basic_block 
> test_bb)
>if (!REG_P (dest))
>   return false;
>  
> -  if (!(REG_P (src)
> -|| (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
> -&& subreg_lowpart_p (src
> +  if (!((REG_P (src) || CONSTANT_P (src))
> + || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
> +   && subreg_lowpart_p (src
>   return false;
>  
>/* Destination must be appropriate for a conditional write.  */


Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.

2021-07-15 Thread Richard Sandiford via Gcc-patches
Sorry for the slow review.

Robin Dapp  writes:
> When if-converting multiple SETs and we encounter a swap-style idiom
>
>   if (a > b)
> {
>   tmp = c;   // [1]
>   c = d;
>   d = tmp;
> }
>
> ifcvt should not generate a conditional move for the instruction at
> [1].
>
> In order to achieve that, this patch goes through all relevant SETs
> and marks the relevant instructions.  This helps to evaluate costs.
>
> On top, only generate temporaries if the current cmov is going to
> overwrite one of the comparands of the initial compare.
> ---
>  gcc/ifcvt.c | 104 +++-
>  1 file changed, 87 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index 017944f4f79..eef6490626a 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -98,6 +98,7 @@ static int dead_or_predicable (basic_block, basic_block, 
> basic_block,
>  edge, int);
>  static void noce_emit_move_insn (rtx, rtx);
>  static rtx_insn *block_has_only_trap (basic_block);
> +static void check_need_cmovs (basic_block, hash_map<rtx, bool> *);
>
>  /* Count the number of non-jump active insns in BB.  */
>  
> @@ -3203,6 +3204,10 @@ noce_convert_multiple_sets (struct noce_if_info 
> *if_info)
>auto_vec<rtx_insn *> unmodified_insns;
>int count = 0;
>  
> +  hash_map<rtx, bool> need_cmovs;

A hash_set might be simpler, given that the code only enters insns for
which the bool is false.  “rtx_insn *” would be better than rtx.

> +
> +  check_need_cmovs (then_bb, &need_cmovs);
> +
>FOR_BB_INSNS (then_bb, insn)
>  {
>/* Skip over non-insns.  */
> @@ -3213,26 +3218,38 @@ noce_convert_multiple_sets (struct noce_if_info 
> *if_info)
>gcc_checking_assert (set);
>  
>rtx target = SET_DEST (set);
> -  rtx temp = gen_reg_rtx (GET_MODE (target));
> +  rtx temp;
>rtx new_val = SET_SRC (set);
>rtx old_val = target;
>  
> -  /* If we were supposed to read from an earlier write in this block,
> -  we've changed the register allocation.  Rewire the read.  While
> -  we are looking, also try to catch a swap idiom.  */
> -  for (int i = count - 1; i >= 0; --i)
> - if (reg_overlap_mentioned_p (new_val, targets[i]))
> -   {
> - /* Catch a "swap" style idiom.  */
> - if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
> -   /* The write to targets[i] is only live until the read
> -  here.  As the condition codes match, we can propagate
> -  the set to here.  */
> -   new_val = SET_SRC (single_set (unmodified_insns[i]));
> - else
> -   new_val = temporaries[i];
> - break;
> -   }
> +  /* As we are transforming
> +  if (x > y)
> +a = b;
> +c = d;

Looks like some missing braces here.

> +  into
> +a = (x > y) ...
> +c = (x > y) ...
> +
> +  we potentially check x > y before every set here.
> +  (Even though might be removed by subsequent passes.)

Do you mean the sets might be removed or that the checks might be removed?

> +  We cannot transform
> +if (x > y)
> +  x = y;
> +  ...
> +  into
> +x = (x > y) ...
> +...
> +  since this would invalidate x.  Therefore we introduce a temporary
> +  every time we are about to overwrite a variable used in the
> +  check.  Costing of a sequence with these is going to be inaccurate.  */
> +  if (reg_overlap_mentioned_p (target, cond))
> + temp = gen_reg_rtx (GET_MODE (target));
> +  else
> + temp = target;
> +
> +  bool need_cmov = true;
> +  if (need_cmovs.get (insn))
> + need_cmov = false;

The patch is quite hard to review on its own, since nothing actually uses
this variable.  It's also not obvious how the reg_overlap_mentioned_p
code works if the old target is referenced later.

Could you refactor the series a bit so that each patch is self-contained?
It's OK if that means fewer patches.

Thanks,
Richard

>/* If we had a non-canonical conditional jump (i.e. one where
>the fallthrough is to the "else" case) we need to reverse
> @@ -3808,6 +3825,59 @@ check_cond_move_block (basic_block bb,
>return TRUE;
>  }
>  
> +/* Find local swap-style idioms in BB and mark the first insn (1)
> +   that is only a temporary as not needing a conditional move as
> +   it is going to be dead afterwards anyway.
> +
> + (1) int tmp = a;
> +  a = b;
> +  b = tmp;
> +
> +  ifcvt
> +  -->
> +
> +  load tmp,a
> +  cmov a,b
> +  cmov b,tmp   */
> +
> +static void
> +check_need_cmovs (basic_block bb, hash_map<rtx, bool> *need_cmov)
> +{
> +  rtx_insn *insn;
> +  int count = 0;
> +  auto_vec<rtx_insn *> insns;
> +  auto_vec<rtx> dests;
> +
> +  FOR_BB_INSNS (bb, insn)
> +{
> +  rtx set, src, dest;
> +
> +  if (!active_insn_p (insn))
> + continue;
> +
> +  set = single_set (insn);
> +  if (set == NULL_RTX)
> + continue;
> +
> +

Re: [PATCH] libstdc++: Give split_view::_Sentinel a default ctor [PR101214]

2021-07-15 Thread Jonathan Wakely via Gcc-patches
On Tue, 13 Jul 2021 at 19:05, Patrick Palka via Libstdc++
 wrote:
>
> This gives the new split_view's sentinel type a defaulted default
> constructor, something which was overlooked in r12-1665.  This patch
> also fixes a couple of other issues with the new split_view as reported
> in the PR.
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

Yes, thanks.

>
> PR libstdc++/101214
>
> libstdc++-v3/ChangeLog:
>
> * include/std/ranges (split_view::split_view): Use std::move.
> (split_view::_Iterator::_Iterator): Remove redundant
> default_initializable constraint.
> (split_view::_Sentinel::_Sentinel): Declare.
> * testsuite/std/ranges/adaptors/split.cc (test02): New test.
> ---
>  libstdc++-v3/include/std/ranges |  6 --
>  libstdc++-v3/testsuite/std/ranges/adaptors/split.cc | 11 +++
>  2 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
> index f552caa9d5b..df74ac9dc19 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -3306,7 +3306,7 @@ namespace views::__adaptor
> && constructible_from<_Pattern, single_view<range_value_t<_Range>>>
>  constexpr
>  split_view(_Range&& __r, range_value_t<_Range> __e)
> -  : _M_pattern(views::single(__e)),
> +  : _M_pattern(views::single(std::move(__e))),
> _M_base(views::all(std::forward<_Range>(__r)))
>  { }
>
> @@ -3364,7 +3364,7 @@ namespace views::__adaptor
>using value_type = subrange>;
>using difference_type = range_difference_t<_Vp>;
>
> -  _Iterator() requires default_initializable<iterator_t<_Vp>> = default;
> +  _Iterator() = default;
>
>constexpr
>_Iterator(split_view* __parent,
> @@ -3429,6 +3429,8 @@ namespace views::__adaptor
>{ return __x._M_cur == _M_end && !__x._M_trailing_empty; }
>
>  public:
> +  _Sentinel() = default;
> +
>constexpr explicit
>_Sentinel(split_view* __parent)
> : _M_end(ranges::end(__parent->_M_base))
> diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc 
> b/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc
> index 02c6073a503..b4e01fea6e4 100644
> --- a/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc
> +++ b/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc
> @@ -46,6 +46,16 @@ test01()
>VERIFY( ranges::equal(ints, (int[]){1,2,3,4}) );
>  }
>
> +void
> +test02()
> +{
> +  // PR libstdc++/101214
> +  auto v = views::iota(0) | views::take(5) | views::split(0);
> +  static_assert(!ranges::common_range<decltype(v)>);
> +  static_assert(std::default_initializable<decltype(v.end())>);
> +  static_assert(std::sentinel_for<decltype(v.end()), decltype(v.begin())>);
> +}
> +
>  // The following testcases are adapted from lazy_split.cc.
>  namespace from_lazy_split_cc
>  {
> @@ -189,6 +199,7 @@ int
>  main()
>  {
>test01();
> +  test02();
>
>from_lazy_split_cc::test01();
>from_lazy_split_cc::test02();
> --
> 2.32.0.170.gd486ca60a5
>


[COMMITTED] Add gimple_range_type for statements.

2021-07-15 Thread Andrew MacLeod via Gcc-patches

On 7/15/21 9:06 AM, Richard Biener wrote:

On Thu, Jul 15, 2021 at 1:06 PM Aldy Hernandez  wrote:


Currently gimple_expr_type is ICEing because it calls gimple_call_return_type.

I still think gimple_call_return_type should return void_type_node
instead of ICEing, but this will also fix my problem.

Anyone have a problem with this?

It's still somewhat inconsistent, no?  Because for a call without a LHS
it's now either void_type_node or the type of the return value.

It's probably known I dislike gimple_expr_type itself (it was introduced
to make the transition to tuples easier).  I wonder why you can't simply
fix range_of_call to do

tree lhs = gimple_call_lhs (call);
if (lhs)
  type = TREE_TYPE (lhs);

Richard.


You are correct.  There are indeed inconsistencies, and they exist in
multiple places.  In fact, none of them do exactly what we are looking
for all the time, and there are times we do care about the stmt when
there is no LHS.  In addition, we almost always then have to check
whether the type we found is supported.


So instead, much as we did for types with range_compatible_p (), we'll
provide a function for statements which does exactly what we need.  This
patch eliminates all the ranger calls to both gimple_expr_type () and
gimple_call_return_type ().  This will also simplify the life of anyone
who goes to eventually remove gimple_expr_type (), as there will now be
fewer uses.


The function will return a type if and only if we can find the type in 
an orderly fashion, and then determine if it is also supported by ranger.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

commit 478cc962ad174bfc64c573152a0658935651fce3
Author: Andrew MacLeod 
Date:   Thu Jul 15 11:07:12 2021 -0400

Add gimple_range_type for statements.

The existing mechanisms for picking up the type of a statement are
inconsistent with the needs of ranger. Encapsulate all the bits
required to pick up the return type of a statement in one place, and check
whether the type is supported.

* gimple-range-fold.cc (adjust_pointer_diff_expr): Use
gimple_range_type.
(fold_using_range::fold_stmt): Ditto.
(fold_using_range::range_of_range_op): Ditto.
(fold_using_range::range_of_phi): Ditto.
(fold_using_range::range_of_call): Ditto.
(fold_using_range::range_of_builtin_ubsan_call): Ditto.
(fold_using_range::range_of_builtin_call): Ditto.
(fold_using_range::range_of_cond_expr): Ditto.
* gimple-range-fold.h (gimple_range_type): New.

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index eff5d1f89f2..f8578c013bc 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -362,7 +362,7 @@ adjust_pointer_diff_expr (irange &res, const gimple *diff_stmt)
 {
   tree max = vrp_val_max (ptrdiff_type_node);
   wide_int wmax = wi::to_wide (max, TYPE_PRECISION (TREE_TYPE (max)));
-  tree expr_type = gimple_expr_type (diff_stmt);
+  tree expr_type = gimple_range_type (diff_stmt);
   tree range_min = build_zero_cst (expr_type);
   tree range_max = wide_int_to_tree (expr_type, wmax - 1);
   int_range<2> r (range_min, range_max);
@@ -522,16 +522,8 @@ fold_using_range::fold_stmt (irange &r, gimple *s, fur_source &src, tree name)
 
   if (!res)
 {
-  // If no name is specified, try the expression kind.
-  if (!name)
-	{
-	  tree t = gimple_expr_type (s);
-	  if (!irange::supports_type_p (t))
-	return false;
-	  r.set_varying (t);
-	  return true;
-	}
-  if (!gimple_range_ssa_p (name))
+  // If no name specified or range is unsupported, bail.
+  if (!name || !gimple_range_ssa_p (name))
 	return false;
   // We don't understand the stmt, so return the global range.
   r = gimple_range_global (name);
@@ -558,10 +550,11 @@ bool
 fold_using_range::range_of_range_op (irange &r, gimple *s, fur_source &src)
 {
   int_range_max range1, range2;
-  tree type = gimple_expr_type (s);
+  tree type = gimple_range_type (s);
+  if (!type)
+return false;
   range_operator *handler = gimple_range_handler (s);
   gcc_checking_assert (handler);
-  gcc_checking_assert (irange::supports_type_p (type));
 
   tree lhs = gimple_get_lhs (s);
   tree op1 = gimple_range_operand1 (s);
@@ -719,11 +712,11 @@ bool
 fold_using_range::range_of_phi (irange &r, gphi *phi, fur_source &src)
 {
   tree phi_def = gimple_phi_result (phi);
-  tree type = TREE_TYPE (phi_def);
+  tree type = gimple_range_type (phi);
   int_range_max arg_range;
   unsigned x;
 
-  if (!irange::supports_type_p (type))
+  if (!type)
 return false;
 
   // Start with an empty range, unioning in each argument's range.
@@ -780,13 +773,13 @@ fold_using_range::range_of_phi (irange &r, gphi *phi, fur_source &src)
 bool
 fold_using_range::range_of_call (irange &r, gcall *call, fur_source &src)
 {
-  tree type = gimple_call_return_type (call);
+  tree type = 

Re: [PATCH] libstdc++: invalid default init in _CachedPosition [PR101231]

2021-07-15 Thread Jonathan Wakely via Gcc-patches
On Tue, 13 Jul 2021 at 20:09, Patrick Palka via Libstdc++
 wrote:
>
> The primary template for _CachedPosition is a dummy implementation for
> non-forward ranges, the iterators for which generally can't be cached.
> Because this implementation doesn't actually cache anything, _M_has_value
> is defined to be false and so calls to _M_get (which are always guarded
> by _M_has_value) are unreachable.
>
> Still, to suppress a "control reaches end of non-void function" warning
> I made _M_get return {}, but after P2325 input iterators are no longer
> necessarily default constructible so this workaround now breaks valid
> programs.
>
> This patch fixes this by instead using __builtin_unreachable to squelch
> the warning.
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

Yes, thanks.


>
> PR libstdc++/101231
>
> libstdc++-v3/ChangeLog:
>
> * include/std/ranges (_CachedPosition::_M_get): For non-forward
> ranges, just call __builtin_unreachable.
> * testsuite/std/ranges/istream_view.cc (test05): New test.
> ---
>  libstdc++-v3/include/std/ranges   |  2 +-
>  libstdc++-v3/testsuite/std/ranges/istream_view.cc | 12 
>  2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
> index df74ac9dc19..d791e15d096 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -1232,7 +1232,7 @@ namespace views::__adaptor
> _M_get(const _Range&) const
> {
>   __glibcxx_assert(false);
> - return {};
> + __builtin_unreachable();
> }
>
> constexpr void
> diff --git a/libstdc++-v3/testsuite/std/ranges/istream_view.cc 
> b/libstdc++-v3/testsuite/std/ranges/istream_view.cc
> index 369790e89e5..2f15f787250 100644
> --- a/libstdc++-v3/testsuite/std/ranges/istream_view.cc
> +++ b/libstdc++-v3/testsuite/std/ranges/istream_view.cc
> @@ -83,6 +83,17 @@ test04()
>static_assert(!std::forward_iterator);
>  }
>
> +void
> +test05()
> +{
> +  // PR libstdc++/101231
> +  auto words = std::istringstream{"42"};
> +  auto is = ranges::istream_view(words);
> +  auto r = is | views::filter([](auto) { return true; });
> +  for (auto x : r)
> +;
> +}
> +
>  int
>  main()
>  {
> @@ -90,4 +101,5 @@ main()
>test02();
>test03();
>test04();
> +  test05();
>  }
> --
> 2.32.0.170.gd486ca60a5
>
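The breakage and the fix above are easy to reproduce outside libstdc++ with a minimal, hedged illustration (the type below is invented, standing in for a post-P2325 non-default-constructible input iterator): `return {};` no longer compiles for such a type, while `__builtin_unreachable()` still satisfies the compiler's control-flow analysis without requiring a constructible value.

```cpp
#include <cassert>

// Stand-in for an iterator that is no longer default constructible.
struct NonDefault {
  int value;
  explicit NonDefault(int v) : value(v) {}
};

// Callers guarantee have_value is true, just as calls to _M_get are
// always guarded by _M_has_value.  'return {};' would be ill-formed
// here, so the dead path is marked with __builtin_unreachable() to
// suppress the "control reaches end of non-void function" warning.
NonDefault get(bool have_value, int v) {
  if (have_value)
    return NonDefault(v);
  __builtin_unreachable();
}
```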


Re: testsuite: aarch64: Fix failing SVE tests on big endian

2021-07-15 Thread Richard Sandiford via Gcc-patches
Jonathan Wright via Gcc-patches  writes:
> Hi,
>
> A recent change "gcc: Add vec_select -> subreg RTL simplification"
> updated the expected test results for SVE extraction tests. The new
> result should only have been changed for little endian. This patch
> restores the old expected result for big endian.
>
> Ok for master?

OK, thanks.

Richard

> Thanks,
> Jonathan
>
> ---
>
> gcc/testsuite/ChangeLog:
>
> 2021-07-15  Jonathan Wright  
>
>   * gcc.target/aarch64/sve/extract_1.c: Split expected results
>   by big/little endian targets, restoring the old expected
>   result for big endian.
>   * gcc.target/aarch64/sve/extract_2.c: Likewise.
>   * gcc.target/aarch64/sve/extract_3.c: Likewise.
>   * gcc.target/aarch64/sve/extract_4.c: Likewise.
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/extract_1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/extract_1.c
> index 
> 1a926dbb76fb42ab4bcfa18922fdbb2366d04e6e..7d76c98e92545817f6544d1b131cdbbd800c46ab
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/extract_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/extract_1.c
> @@ -56,7 +56,10 @@ typedef _Float16 vnx8hf __attribute__((vector_size (32)));
>  
>  TEST_ALL (EXTRACT)
>  
> -/* { dg-final { scan-assembler-times {\tfmov\tx[0-9]+, d[0-9]\n} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmov\tx[0-9]+, d[0-9]\n} 2 {
> + target { aarch64_little_endian } } } } */
> +/* { dg-final { scan-assembler-times {\tumov\tx[0-9]+, v[0-9]+\.d\[0\]\n} 1 {
> + target { aarch64_big_endian } } } } */
>  /* { dg-final { scan-assembler-times {\tumov\tx[0-9]+, v[0-9]+\.d\[1\]\n} 1 
> } } */
>  /* { dg-final { scan-assembler-not {\tdup\td[0-9]+, v[0-9]+\.d\[0\]\n} } } */
>  /* { dg-final { scan-assembler-times {\tdup\td[0-9]+, v[0-9]+\.d\[1\]\n} 1 } 
> } */
> @@ -64,7 +67,10 @@ TEST_ALL (EXTRACT)
>  /* { dg-final { scan-assembler-times {\tlastb\tx[0-9]+, p[0-7], 
> z[0-9]+\.d\n} 1 } } */
>  /* { dg-final { scan-assembler-times {\tlastb\td[0-9]+, p[0-7], 
> z[0-9]+\.d\n} 1 } } */
>  
> -/* { dg-final { scan-assembler-times {\tfmov\tw[0-9]+, s[0-9]\n} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmov\tx[0-9]+, s[0-9]\n} 2 {
> + target { aarch64_little_endian } } } } */
> +/* { dg-final { scan-assembler-times {\tumov\tx[0-9]+, v[0-9]+\.s\[0\]\n} 1 {
> + target { aarch64_big_endian } } } } */
>  /* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.s\[1\]\n} 1 
> } } */
>  /* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.s\[3\]\n} 1 
> } } */
>  /* { dg-final { scan-assembler-not {\tdup\ts[0-9]+, v[0-9]+\.s\[0\]\n} } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/extract_2.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/extract_2.c
> index 
> 1c54d10cd348f8c81b6369b7b180e30580c8988d..a2644ceae68235175ff787d1d9cbece83985dc5f
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/extract_2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/extract_2.c
> @@ -56,7 +56,10 @@ typedef _Float16 vnx16hf __attribute__((vector_size (64)));
>  
>  TEST_ALL (EXTRACT)
>  
> -/* { dg-final { scan-assembler-times {\tfmov\tx[0-9]+, d[0-9]\n} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmov\tx[0-9]+, d[0-9]\n} 2 {
> + target { aarch64_little_endian } } } } */
> +/* { dg-final { scan-assembler-times {\tumov\tx[0-9]+, v[0-9]+\.d\[0\]\n} 1 {
> + target { aarch64_big_endian } } } } */
>  /* { dg-final { scan-assembler-times {\tumov\tx[0-9]+, v[0-9]+\.d\[1\]\n} 1 
> } } */
>  /* { dg-final { scan-assembler-not {\tdup\td[0-9]+, v[0-9]+\.d\[0\]\n} } } */
>  /* { dg-final { scan-assembler-times {\tdup\td[0-9]+, v[0-9]+\.d\[1\]\n} 1 } 
> } */
> @@ -64,7 +67,10 @@ TEST_ALL (EXTRACT)
>  /* { dg-final { scan-assembler-times {\tlastb\tx[0-9]+, p[0-7], 
> z[0-9]+\.d\n} 1 } } */
>  /* { dg-final { scan-assembler-times {\tlastb\td[0-9]+, p[0-7], 
> z[0-9]+\.d\n} 1 } } */
>  
> -/* { dg-final { scan-assembler-times {\tfmov\tw[0-9]+, s[0-9]\n} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmov\tx[0-9]+, s[0-9]\n} 2 {
> + target { aarch64_little_endian } } } } */
> +/* { dg-final { scan-assembler-times {\tumov\tx[0-9]+, v[0-9]+\.s\[0\]\n} 1 {
> + target { aarch64_big_endian } } } } */
>  /* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.s\[1\]\n} 1 
> } } */
>  /* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.s\[3\]\n} 1 
> } } */
>  /* { dg-final { scan-assembler-not {\tdup\ts[0-9]+, v[0-9]+\.s\[0\]\n} } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/extract_3.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/extract_3.c
> index 
> 501b9f3027128d3297ef77694f6dbcf1fb4d9824..baa54594c482253b1e0aae2b62a342edea0369e9
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/extract_3.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/extract_3.c
> @@ -77,7 +77,10 @@ typedef _Float16 vnx32hf __attribute__((vector_size 
> (128)));
>  
>  TEST_ALL (EXTRACT)
>  
> -/* { 

Re: [PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64.

2021-07-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> The previous fix for this problem was wrong due to a subtle difference between
> where NEON expects the RMW values and where the intrinsics expect them.
>
> The insn pattern is modeled after the intrinsics and so needs an expand for
> the vectorizer optab to switch the RTL.
>
> However, operand[3] is not expected to be written to, so the current
> pattern is bogus.
>
> Instead we use the expand to shuffle around the RTL.
>
> The vectorizer expects operands[3] and operands[0] to be
> the same but the aarch64 intrinsics expanders expect operands[0] and
> operands[1] to be the same.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master? and active branches after some stew?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-simd.md (dot_prod): Correct
>   RTL.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 7397f1ec5ca0cb9e3cdd5c46772f604e640666e4..51789f954affd9fa88e2bc1bcc3dacf64ccb5bde
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -635,18 +635,12 @@ (define_insn "aarch64_usdot"
>  ;; and so the vectorizer provides r, in which the result has to be 
> accumulated.
>  (define_expand "dot_prod"
>[(set (match_operand:VS 0 "register_operand")
> - (plus:VS (unspec:VS [(match_operand: 1 "register_operand")
> + (plus:VS (match_operand:VS 3 "register_operand")
> +  (unspec:VS [(match_operand: 1 "register_operand")
>   (match_operand: 2 "register_operand")]
> -  DOTPROD)
> - (match_operand:VS 3 "register_operand")))]
> +  DOTPROD)))]
>"TARGET_DOTPROD"

The canonical plus: operand order was the original one, so I think
it would be better to keep this rtl as-is and instead change
aarch64_dot to:

(plus:VS (unspec:VS [(match_operand: 2 "register_operand" "w")
 (match_operand: 3 "register_operand" "w")]
DOTPROD)
 (match_operand:VS 1 "register_operand" "0"))

Same idea for aarch64_dot_lane and
aarch64_dot_laneq.

Sorry to be awkward…

Thanks,
Richard

> -{
> -  emit_insn (
> -gen_aarch64_dot (operands[3], operands[3], operands[1],
> - operands[2]));
> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
> -  DONE;
> -})
> +)
>  
>  ;; Auto-vectorizer pattern for usdot.  The operand[3] and operand[0] are the
>  ;; RMW parameters that when it comes to the vectorizer.


Re: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs

2021-07-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> There's a slight mismatch between the vectorizer optabs and the intrinsics
> patterns for NEON.  The vectorizer expects operands[3] and operands[0] to be
> the same but the aarch64 intrinsics expanders expect operands[0] and
> operands[1] to be the same.
>
> This means we need different patterns here.  This adds a separate usdot
> vectorizer pattern which just shuffles around the RTL params.
>
> There's also an inconsistency between the usdot and (u|s)dot intrinsics RTL
> patterns which is not corrected here.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?

Couldn't we just change:

> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> 00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ace4fc7f43e2040a8
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
>  {
> -  return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
> +  return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);

…this to __builtin_aarch64_usdot_prodv8qi_ssus (__a, __b, __r) etc.?
I think that's an OK thing to do when the function is named after
an optab rather than an arm_neon.h intrinsic.

Thanks,
Richard

>  }
>  
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
>  {
> -  return __builtin_aarch64_usdot_prodv16qi_ssus (__r, __a, __b);
> +  return __builtin_aarch64_usdotv16qi_ssus (__r, __a, __b);
>  }
>  
>  __extension__ extern __inline int32x2_t


[committed] analyzer: reimplement -Wanalyzer-use-of-uninitialized-value [PR95006 et al]

2021-07-15 Thread David Malcolm via Gcc-patches
The initial gcc 10 era commit of the analyzer (in
757bf1dff5e8cee34c0a75d06140ca972bfecfa7) had an implementation of
-Wanalyzer-use-of-uninitialized-value, but was sufficiently buggy
that I removed it in 78b9783774bfd3540f38f5b1e3c7fc9f719653d7 before
the release of gcc 10.1

This patch reintroduces the warning, heavily rewritten, with (I hope)
a less buggy implementation this time, for GCC 12.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-2337-g33255ad3ac14e3953750fe0f2d82b901c2852ff6.

gcc/analyzer/ChangeLog:
PR analyzer/95006
PR analyzer/94713
PR analyzer/94714
* analyzer.cc (maybe_reconstruct_from_def_stmt): Split out
GIMPLE_ASSIGN case into...
(get_diagnostic_tree_for_gassign_1): New.
(get_diagnostic_tree_for_gassign): New.
* analyzer.h (get_diagnostic_tree_for_gassign): New decl.
* analyzer.opt (Wanalyzer-write-to-string-literal): New.
* constraint-manager.cc (class svalue_purger): New.
(constraint_manager::purge_state_involving): New.
* constraint-manager.h
(constraint_manager::purge_state_involving): New.
* diagnostic-manager.cc (saved_diagnostic::supercedes_p): New.
(dedupe_winners::handle_interactions): New.
(diagnostic_manager::emit_saved_diagnostics): Call it.
* diagnostic-manager.h (saved_diagnostic::supercedes_p): New decl.
* engine.cc (impl_region_model_context::warn): Convert return type
to bool.  Return false if the diagnostic isn't saved.
(impl_region_model_context::purge_state_involving): New.
(impl_sm_context::get_state): Use NULL ctxt when querying old
rvalue.
(impl_sm_context::set_next_state): Use new sval when querying old
state.
(class dump_path_diagnostic): Move to region-model.cc
(exploded_node::on_stmt): Move to on_stmt_pre and on_stmt_post.
Remove call to purge_state_involving.
(exploded_node::on_stmt_pre): New, based on the above.  Move most
of it to region_model::on_stmt_pre.
(exploded_node::on_stmt_post): Likewise, moving to
region_model::on_stmt_post.
(class stale_jmp_buf): Fix parent class to use curiously recurring
template pattern.
(feasibility_state::maybe_update_for_edge): Call on_call_pre and
on_call_post on gcalls.
* exploded-graph.h (impl_region_model_context::warn): Return bool.
(impl_region_model_context::purge_state_involving): New decl.
(exploded_node::on_stmt_pre): New decl.
(exploded_node::on_stmt_post): New decl.
* pending-diagnostic.h (pending_diagnostic::use_of_uninit_p): New.
(pending_diagnostic::supercedes_p): New.
* program-state.cc (sm_state_map::get_state): Inherit state for
conjured_svalue as well as initial_svalue.
(sm_state_map::purge_state_involving): Also support SK_CONJURED.
* region-model-impl-calls.cc (call_details::get_uncertainty):
Handle m_ctxt being NULL.
(call_details::get_or_create_conjured_svalue): New.
(region_model::impl_call_fgets): New.
(region_model::impl_call_fread): New.
* region-model-manager.cc
(region_model_manager::get_or_create_initial_value): Return an
uninitialized poisoned value for regions that can't have initial
values.
* region-model-reachability.cc
(reachable_regions::mark_escaped_clusters): Handle ctxt being
NULL.
* region-model.cc (region_to_value_map::purge_state_involving): New.
(poisoned_value_diagnostic::use_of_uninit_p): New.
(poisoned_value_diagnostic::emit): Handle POISON_KIND_UNINIT.
(poisoned_value_diagnostic::describe_final_event): Likewise.
(region_model::check_for_poison): New.
(region_model::on_assignment): Call it.
(class dump_path_diagnostic): Move here from engine.cc.
(region_model::on_stmt_pre): New, based on exploded_node::on_stmt.
(region_model::on_call_pre): Move the setting of the LHS to a
conjured svalue to before the checks for specific functions.
Handle "fgets", "fgets_unlocked", and "fread".
(region_model::purge_state_involving): New.
(region_model::handle_unrecognized_call): Handle ctxt being NULL.
(region_model::get_rvalue): Call check_for_poison.
(selftest::test_stack_frames): Use NULL for context when getting
uninitialized rvalue.
(selftest::test_alloca): Likewise.
* region-model.h (region_to_value_map::purge_state_involving): New
decl.
(call_details::get_or_create_conjured_svalue): New decl.
(region_model::on_stmt_pre): New decl.
(region_model::purge_state_involving): New decl.
(region_model::impl_call_fgets): New decl.
(region_model::impl_call_fread): New decl.
(region_model::check_for_poison): New decl.
   

[committed] analyzer: add -fdump-analyzer-exploded-paths

2021-07-15 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as 98cd4d123aa14598b1f0d54c22663c8200a96d9c.

gcc/analyzer/ChangeLog:
* analyzer.opt (fdump-analyzer-exploded-paths): New.
* diagnostic-manager.cc
(diagnostic_manager::emit_saved_diagnostic): Implement it.
* engine.cc (exploded_path::dump_to_pp): Add ext_state param and
use it to dump states if non-NULL.
(exploded_path::dump): Likewise.
(exploded_path::dump_to_file): New.
* exploded-graph.h (exploded_path::dump_to_pp): Add ext_state
param.
(exploded_path::dump): Likewise.
(exploded_path::dump): Likewise.
(exploded_path::dump_to_file): New.

gcc/ChangeLog:
* doc/invoke.texi (-fdump-analyzer-exploded-paths): New.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/analyzer.opt  |  4 
 gcc/analyzer/diagnostic-manager.cc | 11 ++
 gcc/analyzer/engine.cc | 34 --
 gcc/analyzer/exploded-graph.h  |  9 +---
 gcc/doc/invoke.texi|  6 ++
 5 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index dd34495abd5..7b77ae8a73d 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -210,6 +210,10 @@ fdump-analyzer-exploded-nodes-3
 Common RejectNegative Var(flag_dump_analyzer_exploded_nodes_3)
 Dump a textual representation of the exploded graph to SRCFILE.eg-ID.txt.
 
+fdump-analyzer-exploded-paths
+Common RejectNegative Var(flag_dump_analyzer_exploded_paths)
+Dump a textual representation of each diagnostic's exploded path to 
SRCFILE.IDX.KIND.epath.txt.
+
 fdump-analyzer-feasibility
 Common RejectNegative Var(flag_dump_analyzer_feasibility)
 Dump various analyzer internals to SRCFILE.*.fg.dot and SRCFILE.*.tg.dot.
diff --git a/gcc/analyzer/diagnostic-manager.cc 
b/gcc/analyzer/diagnostic-manager.cc
index b7d263b4217..d005facc20b 100644
--- a/gcc/analyzer/diagnostic-manager.cc
+++ b/gcc/analyzer/diagnostic-manager.cc
@@ -1164,6 +1164,17 @@ diagnostic_manager::emit_saved_diagnostic (const 
exploded_graph &eg,
inform_n (loc, num_dupes,
  "%i duplicate", "%i duplicates",
  num_dupes);
+  if (flag_dump_analyzer_exploded_paths)
+   {
+ auto_timevar tv (TV_ANALYZER_DUMP);
+ pretty_printer pp;
+ pp_printf (&pp, "%s.%i.%s.epath.txt",
+dump_base_name, sd.get_index (), sd.m_d->get_kind ());
+ char *filename = xstrdup (pp_formatted_text (&pp));
+ epath->dump_to_file (filename, eg.get_ext_state ());
+ inform (loc, "exploded path written to %qs", filename);
+ free (filename);
+   }
 }
   delete pp;
 }
diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 8f3e7f781b2..dc07a79e185 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -3630,10 +3630,12 @@ exploded_path::feasible_p (logger *logger, 
feasibility_problem **out,
   return true;
 }
 
-/* Dump this path in multiline form to PP.  */
+/* Dump this path in multiline form to PP.
+   If EXT_STATE is non-NULL, then show the nodes.  */
 
 void
-exploded_path::dump_to_pp (pretty_printer *pp) const
+exploded_path::dump_to_pp (pretty_printer *pp,
+  const extrinsic_state *ext_state) const
 {
   for (unsigned i = 0; i < m_edges.length (); i++)
 {
@@ -3643,28 +3645,48 @@ exploded_path::dump_to_pp (pretty_printer *pp) const
 eedge->m_src->m_index,
 eedge->m_dest->m_index);
   pp_newline (pp);
+
+  if (ext_state)
+   eedge->m_dest->dump_to_pp (pp, *ext_state);
 }
 }
 
 /* Dump this path in multiline form to FP.  */
 
 void
-exploded_path::dump (FILE *fp) const
+exploded_path::dump (FILE *fp, const extrinsic_state *ext_state) const
 {
   pretty_printer pp;
  pp_format_decoder (&pp) = default_tree_printer;
  pp_show_color (&pp) = pp_show_color (global_dc->printer);
  pp.buffer->stream = fp;
-  dump_to_pp (&pp);
+  dump_to_pp (&pp, ext_state);
  pp_flush (&pp);
 }
 
 /* Dump this path in multiline form to stderr.  */
 
 DEBUG_FUNCTION void
-exploded_path::dump () const
+exploded_path::dump (const extrinsic_state *ext_state) const
 {
-  dump (stderr);
+  dump (stderr, ext_state);
+}
+
+/* Dump this path verbosely to FILENAME.  */
+
+void
+exploded_path::dump_to_file (const char *filename,
+const extrinsic_state &ext_state) const
+{
+  FILE *fp = fopen (filename, "w");
+  if (!fp)
+return;
+  pretty_printer pp;
+  pp_format_decoder (&pp) = default_tree_printer;
+  pp.buffer->stream = fp;
+  dump_to_pp (&pp, &ext_state);
+  pp_flush (&pp);
+  fclose (fp);
 }
 
 /* class feasibility_problem.  */
diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
index 2d25e5e5167..1d8b73da7c4 100644
--- a/gcc/analyzer/exploded-graph.h
+++ b/gcc/analyzer/exploded-graph.h
@@ -895,9 +895,12 @@ public:
 
   exploded_node 

[committed] analyzer: use DECL_DEBUG_EXPR on SSA names for artificial vars

2021-07-15 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as e9711fe482b4abef0e7572809d3593631991276e.

gcc/analyzer/ChangeLog:
* analyzer.cc (fixup_tree_for_diagnostic_1): Use DECL_DEBUG_EXPR
if it's available.
* engine.cc (readability): Likewise.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/analyzer.cc |  9 +++--
 gcc/analyzer/engine.cc   | 19 ---
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/gcc/analyzer/analyzer.cc b/gcc/analyzer/analyzer.cc
index 12c03f6cfbd..a8ee1a1a2dc 100644
--- a/gcc/analyzer/analyzer.cc
+++ b/gcc/analyzer/analyzer.cc
@@ -165,8 +165,13 @@ fixup_tree_for_diagnostic_1 (tree expr, hash_set<tree>
*visited)
   && TREE_CODE (expr) == SSA_NAME
   && (SSA_NAME_VAR (expr) == NULL_TREE
   || DECL_ARTIFICIAL (SSA_NAME_VAR (expr))))
-if (tree expr2 = maybe_reconstruct_from_def_stmt (expr, visited))
-  return expr2;
+{
+  if (tree var = SSA_NAME_VAR (expr))
+   if (VAR_P (var) && DECL_HAS_DEBUG_EXPR_P (var))
+ return DECL_DEBUG_EXPR (var);
+  if (tree expr2 = maybe_reconstruct_from_def_stmt (expr, visited))
+   return expr2;
+}
   return expr;
 }
 
diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 01b83a4ef28..8f3e7f781b2 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -527,9 +527,22 @@ readability (const_tree expr)
 case SSA_NAME:
   {
if (tree var = SSA_NAME_VAR (expr))
- /* Slightly favor the underlying var over the SSA name to
-avoid having them compare equal.  */
- return readability (var) - 1;
+ {
+   if (DECL_ARTIFICIAL (var))
+ {
+   /* If we have an SSA name for an artificial var,
+  only use it if it has a debug expr associated with
+  it that fixup_tree_for_diagnostic can use.  */
+   if (VAR_P (var) && DECL_HAS_DEBUG_EXPR_P (var))
+ return readability (DECL_DEBUG_EXPR (var)) - 1;
+ }
+   else
+ {
+   /* Slightly favor the underlying var over the SSA name to
+  avoid having them compare equal.  */
+   return readability (var) - 1;
+ }
+ }
/* Avoid printing '' for SSA names for temporaries.  */
return -1;
   }
-- 
2.26.3



[committed] analyzer: handle self-referential phis

2021-07-15 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as a9241df96e1950c630550ada9371c0b4a03496cf.

gcc/analyzer/ChangeLog:
* state-purge.cc (self_referential_phi_p): New.
(state_purge_per_ssa_name::process_point): Don't purge an SSA name
at its def-stmt if the def-stmt is self-referential.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/phi-1.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/state-purge.cc   | 37 ---
 gcc/testsuite/gcc.dg/analyzer/phi-1.c | 24 +
 2 files changed, 58 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/phi-1.c

diff --git a/gcc/analyzer/state-purge.cc b/gcc/analyzer/state-purge.cc
index 70a09ed581f..e82ea87e735 100644
--- a/gcc/analyzer/state-purge.cc
+++ b/gcc/analyzer/state-purge.cc
@@ -288,6 +288,20 @@ state_purge_per_ssa_name::add_to_worklist (const 
function_point &point,
 }
 }
 
+/* Does this phi depend on itself?
+   e.g. in:
+ added_2 = PHI 
+   the middle defn (from edge 3) requires added_2 itself.  */
+
+static bool
+self_referential_phi_p (const gphi *phi)
+{
+  for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
+if (gimple_phi_arg_def (phi, i) == gimple_phi_result (phi))
+  return true;
+  return false;
+}
+
 /* Process POINT, popped from WORKLIST.
Iterate over predecessors of POINT, adding to WORKLIST.  */
 
@@ -326,11 +340,28 @@ state_purge_per_ssa_name::process_point (const 
function_point &point,
 !gsi_end_p (gpi); gsi_next ())
  {
gphi *phi = gpi.phi ();
+   /* Are we at the def-stmt for m_name?  */
if (phi == def_stmt)
  {
-   if (logger)
- logger->log ("def stmt within phis; terminating");
-   return;
+   /* Does this phi depend on itself?
+  e.g. in:
+added_2 = PHI 
+  the middle defn (from edge 3) requires added_2 itself
+  so we can't purge it here.  */
+   if (self_referential_phi_p (phi))
+ {
+   if (logger)
+ logger->log ("self-referential def stmt within phis;"
+  " continuing");
+ }
+   else
+ {
+   /* Otherwise, we can stop here, so that m_name
+  can be purged.  */
+   if (logger)
+ logger->log ("def stmt within phis; terminating");
+   return;
+ }
  }
  }
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/phi-1.c 
b/gcc/testsuite/gcc.dg/analyzer/phi-1.c
new file mode 100644
index 000..09260033fef
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/phi-1.c
@@ -0,0 +1,24 @@
+/* { dg-do "compile" } */
+
+typedef __SIZE_TYPE__ size_t;
+#define NULL ((void *) 0)
+
+extern const char *foo (void);
+extern size_t bar (void);
+
+void
+_nl_expand_alias (const char *locale_alias_path)
+{
+  size_t added;
+  do
+{
+  added = 0;
+  while (added == 0 && locale_alias_path[0] != '\0')
+   {
+ const char *start = foo ();
+ if (start < locale_alias_path)
+   added = bar ();
+   }
+}
+  while (added != 0);
+}
-- 
2.26.3



[PATCH] gcc_update: use gcc-descr git alias for revision string in gcc/REVISION

2021-07-15 Thread Serge Belyshev
This is to make the development version string more readable, and
to simplify navigation through gcc-testresults.

Currently gcc_update uses git log --pretty=tformat:%p:%t:%H to
generate the version string, which is somewhat excessive since the
conversion to git, because commit hashes are now stable.

Even better, the gcc-git-customization.sh script provides a gcc-descr
alias which makes a prettier version string, so use that instead (or
just the abbreviated commit hash when the alias is not available).

Before: [master revision 
b25edf6e6fe:e035f180ebf:7094a69bd62a14dfa311eaa2fea468f221c7c9f3]
After: [master r12-2331]

OK for mainline?

contrib/ChangeLog:

* gcc_update: Use gcc-descr alias for revision string if it exists, or
abbreviated commit hash instead. Drop "revision" from gcc/REVISION.
---
 contrib/gcc_update | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/contrib/gcc_update b/contrib/gcc_update
index 80fac9fc995..8f712e37616 100755
--- a/contrib/gcc_update
+++ b/contrib/gcc_update
@@ -332,7 +332,7 @@ case $vcs_type in
 exit 1
fi
 
-   revision=`$GCC_GIT log -n1 --pretty=tformat:%p:%t:%H`
+   revision=`$GCC_GIT gcc-descr || $GCC_GIT log -n1 --pretty=tformat:%h`
branch=`$GCC_GIT name-rev --name-only HEAD || :`
;;
 
@@ -414,6 +414,6 @@ rm -f LAST_UPDATED gcc/REVISION
 date
 echo "`TZ=UTC date` (revision $revision)"
 } > LAST_UPDATED
-echo "[$branch revision $revision]" > gcc/REVISION
+echo "[$branch $revision]" > gcc/REVISION
 
 touch_files_reexec
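The fallback in the revision line can be exercised in isolation. This is a hedged sketch with stand-in functions rather than real git commands, showing that when the preferred gcc-descr alias is unavailable the abbreviated-hash form is used instead:

```shell
#!/bin/sh
# Stand-ins for the two git invocations in gcc_update: the gcc-descr
# alias (not configured here, so the command fails) and the abbreviated
# commit hash fallback.
gcc_descr() { return 1; }
short_hash() { echo "e035f180ebf"; }

# Same shape as the patched gcc_update line: try the alias first, and
# fall back to the short hash when it fails.
revision=`gcc_descr || short_hash`
echo "[master $revision]"   # prints "[master e035f180ebf]"
```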


Re: [committed] libstdc++: Add noexcept to __replacement_assert [PR101429]

2021-07-15 Thread Jonathan Wakely via Gcc-patches
On Thu, 15 Jul 2021, 18:21 François Dumont via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> On 15/07/21 5:26 pm, Jonathan Wakely via Libstdc++ wrote:
> > This results in slightly smaller code when assertions are enabled,
> > either when using Clang (because it adds code to call std::terminate when
> > potentially-throwing functions are called in a noexcept function) or in a
> > freestanding or non-verbose build (because it doesn't use printf).
> >
> > Signed-off-by: Jonathan Wakely 
> >
> > libstdc++-v3/ChangeLog:
> >
> >   PR libstdc++/101429
> >   * include/bits/c++config (__replacement_assert): Add noexcept.
> >   [!_GLIBCXX_VERBOSE] (__glibcxx_assert_impl): Use __builtin_trap
> >   instead of __replacement_assert.
> >
> > Tested powerpc64le-linux. Committed to trunk.
> >
> ChangeLog is talking about __builtin_trap but there is none in the
> attached patch.
>


Yes, I already noticed that and mentioned it in the bugzilla PR.  It uses
__builtin_abort, not __builtin_trap.  I'll fix the ChangeLog file tomorrow
after it gets generated.

The Git commit message will stay wrong though.


Re: [PATCH] [android] Disable large files when unsupported

2021-07-15 Thread Abraão de Santana via Gcc-patches
Hey João, I think there's a problem with your email, it's empty!

--
*Abraão C. de Santana*


Re: [PATCH] [android] Disable large files when unsupported

2021-07-15 Thread João Gabriel Jardim via Gcc-patches
-- 
João Gabriel Jardim


[PATCH] c++: Add C++20 #__VA_OPT__ support

2021-07-15 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch implements C++20 # __VA_OPT__ (...) support.
Testcases cover what I came up with myself and what LLVM has for #__VA_OPT__
in its testsuite, and the string literals are identical between the two
compilers on the va-opt-5.c testcase.

I haven't looked at the non-#__VA_OPT__ differences between LLVM and GCC,
though; I think at least the
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1042r1.html
#define H4(X, ...) __VA_OPT__(a X ## X) ## b
H4(, 1)  // replaced by a b
case isn't handled right (we emit ab).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-07-15  Jakub Jelinek  

libcpp/
* macro.c (vaopt_state): Add m_stringify member.
(vaopt_state::vaopt_state): Initialize it.
(vaopt_state::update): Overwrite it.
(vaopt_state::stringify): New method.
(stringify_arg): Replace arg argument with first, count arguments
and add va_opt argument.  Use first instead of arg->first and
count instead of arg->count, for va_opt add paste_tokens handling.
(paste_tokens): Fix up len calculation.  Don't spell rhs twice,
instead use %.*s to supply lhs and rhs spelling lengths.  Don't call
_cpp_backup_tokens here.
(paste_all_tokens): Call it here instead.
(replace_args): Adjust stringify_arg caller.  For vaopt_state::END
if stringify is true handle __VA_OPT__ stringification.
(create_iso_definition): Handle # __VA_OPT__ similarly to # macro_arg.
gcc/testsuite/
* c-c++-common/cpp/va-opt-5.c: New test.
* c-c++-common/cpp/va-opt-6.c: New test.

--- libcpp/macro.c.jj   2021-05-21 10:34:09.328560825 +0200
+++ libcpp/macro.c  2021-07-15 17:27:30.109631306 +0200
@@ -118,6 +118,7 @@ class vaopt_state {
 m_arg (arg),
 m_variadic (is_variadic),
 m_last_was_paste (false),
+m_stringify (false),
 m_state (0),
 m_paste_location (0),
 m_location (0),
@@ -145,6 +146,7 @@ class vaopt_state {
  }
++m_state;
m_location = token->src_loc;
+   m_stringify = (token->flags & STRINGIFY_ARG) != 0;
return BEGIN;
   }
 else if (m_state == 1)
@@ -234,6 +236,11 @@ class vaopt_state {
 return m_state == 0;
   }
 
+  bool stringify () const
+  {
+return m_stringify;
+  }  
+
  private:
 
   /* The cpp_reader.  */
@@ -247,6 +254,8 @@ class vaopt_state {
   /* If true, the previous token was ##.  This is used to detect when
  a paste occurs at the end of the sequence.  */
   bool m_last_was_paste;
+  /* True for #__VA_OPT__.  */
+  bool m_stringify;
 
   /* The state variable:
  0 means not parsing
@@ -284,7 +293,8 @@ static _cpp_buff *collect_args (cpp_read
 static cpp_context *next_context (cpp_reader *);
 static const cpp_token *padding_token (cpp_reader *, const cpp_token *);
 static const cpp_token *new_string_token (cpp_reader *, uchar *, unsigned int);
-static const cpp_token *stringify_arg (cpp_reader *, macro_arg *);
+static const cpp_token *stringify_arg (cpp_reader *, const cpp_token **,
+  unsigned int, bool);
 static void paste_all_tokens (cpp_reader *, const cpp_token *);
 static bool paste_tokens (cpp_reader *, location_t,
  const cpp_token **, const cpp_token *);
@@ -818,10 +828,11 @@ cpp_quote_string (uchar *dest, const uch
   return dest;
 }
 
-/* Convert a token sequence ARG to a single string token according to
-   the rules of the ISO C #-operator.  */
+/* Convert a token sequence FIRST to FIRST+COUNT-1 to a single string token
+   according to the rules of the ISO C #-operator.  */
 static const cpp_token *
-stringify_arg (cpp_reader *pfile, macro_arg *arg)
+stringify_arg (cpp_reader *pfile, const cpp_token **first, unsigned int count,
+  bool va_opt)
 {
   unsigned char *dest;
   unsigned int i, escape_it, backslash_count = 0;
@@ -834,9 +845,27 @@ stringify_arg (cpp_reader *pfile, macro_
   *dest++ = '"';
 
   /* Loop, reading in the argument's tokens.  */
-  for (i = 0; i < arg->count; i++)
+  for (i = 0; i < count; i++)
 {
-  const cpp_token *token = arg->first[i];
+  const cpp_token *token = first[i];
+
+  if (va_opt && (token->flags & PASTE_LEFT))
+   {
+ location_t virt_loc = pfile->invocation_location;
+ const cpp_token *rhs;
+ do
+   {
+ if (i == count)
+   abort ();
+ rhs = first[++i];
+ if (!paste_tokens (pfile, virt_loc, , rhs))
+   {
+ --i;
+ break;
+   }
+   }
+ while (rhs->flags & PASTE_LEFT);
+   }
 
   if (token->type == CPP_PADDING)
{
@@ -923,7 +952,7 @@ paste_tokens (cpp_reader *pfile, locatio
   cpp_token *lhs;
   unsigned int len;
 
-  len = cpp_token_len (*plhs) + cpp_token_len (rhs) + 1;
+  len = cpp_token_len (*plhs) + cpp_token_len (rhs) + 2;
   buf = (unsigned char *) alloca (len);
   end 

[PATCH] x86: Don't set AVX_U128_DIRTY when all bits are zero

2021-07-15 Thread H.J. Lu via Gcc-patches
In a single SET, all bits of the source YMM/ZMM register are zero when

1. The source is constant zero.
2. The source YMM/ZMM operand is defined from constant zero.

In these cases we don't set AVX_U128_DIRTY.

gcc/

PR target/101456
* config/i386/i386.c (ix86_avx_u128_mode_needed): Don't set
AVX_U128_DIRTY when all bits are zero.

gcc/testsuite/

PR target/101456
* gcc.target/i386/pr101456-1.c: New test.
---
 gcc/config/i386/i386.c | 47 ++
 gcc/testsuite/gcc.target/i386/pr101456-1.c | 28 +
 2 files changed, 75 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101456-1.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cff26909292..c2b06934053 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14129,6 +14129,53 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
   return AVX_U128_CLEAN;
 }
 
+  rtx set = single_set (insn);
+  if (set)
+{
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  if (ix86_check_avx_upper_register (dest))
+   {
+ /* It is not dirty if the source is known zero.  */
+ if (standard_sse_constant_p (src, GET_MODE (dest)) == 1)
+   return AVX_U128_ANY;
+ else
+   return AVX_U128_DIRTY;
+   }
+  else if (ix86_check_avx_upper_register (src))
+   {
+ /* Check for the source operand with all DEFs from constant
+zero.  */
+ df_ref def = DF_REG_DEF_CHAIN (REGNO (src));
+ if (!def)
+   return AVX_U128_DIRTY;
+
+ for (; def; def = DF_REF_NEXT_REG (def))
+   if (DF_REF_REG_DEF_P (def)
+   && !DF_REF_IS_ARTIFICIAL (def))
+ {
+   rtx_insn *def_insn = DF_REF_INSN (def);
+   set = single_set (def_insn);
+   if (!set)
+ return AVX_U128_DIRTY;
+
+   dest = SET_DEST (set);
+   if (ix86_check_avx_upper_register (dest))
+ {
+   src = SET_SRC (set);
+   /* It is dirty if the source operand isn't constant
+  zero.  */
+   if (standard_sse_constant_p (src, GET_MODE (dest))
+   != 1)
+ return AVX_U128_DIRTY;
+ }
+ }
+
+ /* It is not dirty only if all sources are known zero.  */
+ return AVX_U128_ANY;
+   }
+}
+
   /* Require DIRTY mode if a 256bit or 512bit AVX register is referenced.
  Hardware changes state only when a 256bit register is written to,
  but we need to prevent the compiler from moving optimal insertion
diff --git a/gcc/testsuite/gcc.target/i386/pr101456-1.c 
b/gcc/testsuite/gcc.target/i386/pr101456-1.c
new file mode 100644
index 000..6a0f6ccd756
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101456-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+#include 
+
+extern __m256 x1;
+extern __m256d x2;
+extern __m256i x3;
+
+void
+foo1 (void)
+{
+  x1 = _mm256_setzero_ps ();
+}
+
+void
+foo2 (void)
+{
+  x2 = _mm256_setzero_pd ();
+}
+
+void
+foo3 (void)
+{
+  x3 = _mm256_setzero_si256 ();
+}
+
+/* { dg-final { scan-assembler-not "vzeroupper" } } */
-- 
2.31.1



Re: [committed] libstdc++: Add noexcept to __replacement_assert [PR101429]

2021-07-15 Thread François Dumont via Gcc-patches

On 15/07/21 5:26 pm, Jonathan Wakely via Libstdc++ wrote:

This results in slightly smaller code when assertions are enabled and either
Clang is used (because it adds code to call std::terminate when
potentially-throwing functions are called in a noexcept function) or the
build is freestanding or non-verbose (because it doesn't use printf).

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101429
* include/bits/c++config (__replacement_assert): Add noexcept.
[!_GLIBCXX_VERBOSE] (__glibcxx_assert_impl): Use __builtin_trap
instead of __replacement_assert.

Tested powerpc64le-linux. Committed to trunk.

The ChangeLog talks about __builtin_trap, but there is none in the
attached patch.




Re: [PATCH] c++: argument pack expansion inside constraint [PR100138]

2021-07-15 Thread Patrick Palka via Gcc-patches
On Sat, May 8, 2021 at 8:42 AM Jason Merrill  wrote:
>
> On 5/7/21 12:33 PM, Patrick Palka wrote:
> > This PR is about CTAD but the underlying problems are more general;
> > CTAD is a good trigger for them because of the necessary substitution
> > into constraints that deduction guide generation entails.
> >
> > In the testcase below, when generating the implicit deduction guide for
> > the constrained constructor template for A, we substitute the generic
> > flattening map 'tsubst_args' into the constructor's constraints.  During
> > this substitution, tsubst_pack_expansion returns a rebuilt pack
> > expansion for sizeof...(xs), but it's neglecting to carry over the
> > PACK_EXPANSION_LOCAL_P (and PACK_EXPANSION_SIZEOF_P) flag from the
> > original tree to the rebuilt one.  The flag is otherwise unset on the
> > original tree[1] but set for the rebuilt tree from make_pack_expansion
> > only because we're doing the CTAD at function scope (inside main).  This
> > leads us to crash when substituting into the pack expansion during
> > satisfaction because we don't have local_specializations set up (it'd be
> > set up for us if PACK_EXPANSION_LOCAL_P is unset)
> >
> > Similarly, when substituting into a constraint we need to set
> > cp_unevaluated since constraints are unevaluated operands.  This avoids
> > a crash during CTAD for C below.
> >
> > [1]: Although the original pack expansion is in a function context, I
> > guess it makes sense that PACK_EXPANSION_LOCAL_P is unset for it because
> > we can't rely on local specializations (which are formed when
> > substituting into the function declaration) during satisfaction.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, also tested on
> > cmcstl2 and range-v3, does this look OK for trunk?
>
> OK.

Would it be ok to backport this patch to the 11 branch given its
impact on concepts (or perhaps backport only part of it, say all but
the PACK_EXPANSION_LOCAL_P propagation since that part just avoids
ICEing on the invalid portions of the testcase)?

>
> > gcc/cp/ChangeLog:
> >
> >   PR c++/100138
> >   * constraint.cc (tsubst_constraint): Set up cp_unevaluated.
> >   (satisfy_atom): Set up iloc_sentinel before calling
> >   cxx_constant_value.
> >   * pt.c (tsubst_pack_expansion): When returning a rebuilt pack
> >   expansion, carry over PACK_EXPANSION_LOCAL_P and
> >   PACK_EXPANSION_SIZEOF_P from the original pack expansion.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR c++/100138
> >   * g++.dg/cpp2a/concepts-ctad4.C: New test.
> > ---
> >   gcc/cp/constraint.cc|  6 -
> >   gcc/cp/pt.c |  2 ++
> >   gcc/testsuite/g++.dg/cpp2a/concepts-ctad4.C | 25 +
> >   3 files changed, 32 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-ctad4.C
> >
> > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > index 0709695fd08..30fccc46678 100644
> > --- a/gcc/cp/constraint.cc
> > +++ b/gcc/cp/constraint.cc
> > @@ -2747,6 +2747,7 @@ tsubst_constraint (tree t, tree args, tsubst_flags_t 
> > complain, tree in_decl)
> > /* We also don't want to evaluate concept-checks when substituting the
> >constraint-expressions of a declaration.  */
> > processing_constraint_expression_sentinel s;
> > +  cp_unevaluated u;
> > tree expr = tsubst_expr (t, args, complain, in_decl, false);
> > return expr;
> >   }
> > @@ -3005,7 +3006,10 @@ satisfy_atom (tree t, tree args, sat_info info)
> >
> > /* Compute the value of the constraint.  */
> > if (info.noisy ())
> > -result = cxx_constant_value (result);
> > +{
> > +  iloc_sentinel ils (EXPR_LOCATION (result));
> > +  result = cxx_constant_value (result);
> > +}
> > else
> >   {
> > result = maybe_constant_value (result, NULL_TREE,
> > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > index 36a8cb5df5d..0d27dd1af65 100644
> > --- a/gcc/cp/pt.c
> > +++ b/gcc/cp/pt.c
> > @@ -13203,6 +13203,8 @@ tsubst_pack_expansion (tree t, tree args, 
> > tsubst_flags_t complain,
> > else
> >   result = tsubst (pattern, args, complain, in_decl);
> > result = make_pack_expansion (result, complain);
> > +  PACK_EXPANSION_LOCAL_P (result) = PACK_EXPANSION_LOCAL_P (t);
> > +  PACK_EXPANSION_SIZEOF_P (result) = PACK_EXPANSION_SIZEOF_P (t);
> > if (PACK_EXPANSION_AUTO_P (t))
> >   {
> > /* This is a fake auto... pack expansion created in add_capture with
> > diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ctad4.C 
> > b/gcc/testsuite/g++.dg/cpp2a/concepts-ctad4.C
> > new file mode 100644
> > index 000..95a3a22dd04
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ctad4.C
> > @@ -0,0 +1,25 @@
> > +// PR c++/100138
> > +// { dg-do compile { target c++20 } }
> > +
> > +template 
> > +struct A {
> > +  A(T, auto... xs) requires (sizeof...(xs) != 0) { }
> > +};
> > +
> > 

[PATCH 4/4][AArch32]: correct dot-product RTL patterns.

2021-07-15 Thread Tamar Christina via Gcc-patches
Hi All,

The previous fix for this problem was wrong due to a subtle difference between
where NEON expects the RMW values and where the intrinsics expect them.

The insn pattern is modeled after the intrinsics and so needs an expand for
the vectorizer optab to switch the RTL.

However, operand[3] is not expected to be written to, so the current pattern is
bogus.

Instead we use the expand to shuffle around the RTL.

The vectorizer expects operands[3] and operands[0] to be
the same but the aarch64 intrinsics expanders expect operands[0] and
operands[1] to be the same.

The arm-none-linux-gnueabihf build is currently broken; the best I could do was
verify on arm-none-eabi, where the tests are all marked UNSUPPORTED, but the ICE
is gone for the backend test.

Ok for master? and active branches after some stew?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/neon.md (dot_prod): Correct RTL.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 
8b0a396947cc8e7345f178b926128d7224fb218a..876577fc20daee30ecdf03942c0d81c15bf8fe9a
 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2954,20 +2954,14 @@ (define_insn "neon_dot_lane"
 ;; and so the vectorizer provides r, in which the result has to be accumulated.
 (define_expand "dot_prod"
   [(set (match_operand:VCVTI 0 "register_operand")
-   (plus:VCVTI (unspec:VCVTI [(match_operand: 1
+   (plus:VCVTI (match_operand:VCVTI 3 "register_operand")
+   (unspec:VCVTI [(match_operand: 1
"register_operand")
   (match_operand: 2
"register_operand")]
-DOTPROD)
-   (match_operand:VCVTI 3 "register_operand")))]
+DOTPROD)))]
   "TARGET_DOTPROD"
-{
-  emit_insn (
-gen_neon_dot (operands[3], operands[3], operands[1],
-operands[2]));
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
-  DONE;
-})
+)
 
 ;; Auto-vectorizer pattern for usdot
 (define_expand "usdot_prod"




[PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64.

2021-07-15 Thread Tamar Christina via Gcc-patches
Hi All,

The previous fix for this problem was wrong due to a subtle difference between
where NEON expects the RMW values and where the intrinsics expect them.

The insn pattern is modeled after the intrinsics and so needs an expand for
the vectorizer optab to switch the RTL.

However, operand[3] is not expected to be written to, so the current pattern is
bogus.

Instead we use the expand to shuffle around the RTL.

The vectorizer expects operands[3] and operands[0] to be
the same but the aarch64 intrinsics expanders expect operands[0] and
operands[1] to be the same.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? and active branches after some stew?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (dot_prod): Correct
RTL.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
7397f1ec5ca0cb9e3cdd5c46772f604e640666e4..51789f954affd9fa88e2bc1bcc3dacf64ccb5bde
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -635,18 +635,12 @@ (define_insn "aarch64_usdot"
 ;; and so the vectorizer provides r, in which the result has to be accumulated.
 (define_expand "dot_prod"
   [(set (match_operand:VS 0 "register_operand")
-   (plus:VS (unspec:VS [(match_operand: 1 "register_operand")
+   (plus:VS (match_operand:VS 3 "register_operand")
+(unspec:VS [(match_operand: 1 "register_operand")
(match_operand: 2 "register_operand")]
-DOTPROD)
-   (match_operand:VS 3 "register_operand")))]
+DOTPROD)))]
   "TARGET_DOTPROD"
-{
-  emit_insn (
-gen_aarch64_dot (operands[3], operands[3], operands[1],
-   operands[2]));
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
-  DONE;
-})
+)
 
 ;; Auto-vectorizer pattern for usdot.  The operand[3] and operand[0] are the
 ;; RMW parameters that when it comes to the vectorizer.





[PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs

2021-07-15 Thread Tamar Christina via Gcc-patches
Hi All,

There's a slight mismatch between the vectorizer optabs and the intrinsics
patterns for NEON.  The vectorizer expects operands[3] and operands[0] to be
the same, but the aarch64 intrinsics expanders expect operands[0] and
operands[1] to be the same.

This means we need different patterns here.  This adds a separate usdot
vectorizer pattern which just shuffles around the RTL params.

There's also an inconsistency between the usdot and (u|s)dot intrinsics RTL
patterns which is not corrected here.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (usdot_prod): Rename to...
(aarch64_usdot): ..This
(usdot_prod): New.
* config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Use
aarch64_usdot.
* config/aarch64/aarch64-simd-builtins.def: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index 
063f503ebd96657f017dfaa067cb231991376bda..ac5d4fc7ff1e61d404e66193b629986382ee4ffd
 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -374,11 +374,10 @@
   BUILTIN_VSDQ_I_DI (BINOP, srshl, 0, NONE)
   BUILTIN_VSDQ_I_DI (BINOP_UUS, urshl, 0, NONE)
 
-  /* Implemented by _prod.  */
+  /* Implemented by aarch64_{_lane}{q}.  */
   BUILTIN_VB (TERNOP, sdot, 0, NONE)
   BUILTIN_VB (TERNOPU, udot, 0, NONE)
-  BUILTIN_VB (TERNOP_SSUS, usdot_prod, 10, NONE)
-  /* Implemented by aarch64__lane{q}.  */
+  BUILTIN_VB (TERNOP_SSUS, usdot, 0, NONE)
   BUILTIN_VB (QUADOP_LANE, sdot_lane, 0, NONE)
   BUILTIN_VB (QUADOPU_LANE, udot_lane, 0, NONE)
   BUILTIN_VB (QUADOP_LANE, sdot_laneq, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
74890989cb3045798bf8d0241467eaaf72238297..7397f1ec5ca0cb9e3cdd5c46772f604e640666e4
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -601,7 +601,7 @@ (define_insn "aarch64_dot"
 
 ;; These instructions map to the __builtins for the armv8.6a I8MM usdot
 ;; (vector) Dot Product operation.
-(define_insn "usdot_prod"
+(define_insn "aarch64_usdot"
   [(set (match_operand:VS 0 "register_operand" "=w")
(plus:VS
  (unspec:VS [(match_operand: 2 "register_operand" "w")
@@ -648,6 +648,17 @@ (define_expand "dot_prod"
   DONE;
 })
 
+;; Auto-vectorizer pattern for usdot.  The operand[3] and operand[0] are the
+;; RMW parameters that when it comes to the vectorizer.
+(define_expand "usdot_prod"
+  [(set (match_operand:VS 0 "register_operand")
+   (plus:VS (unspec:VS [(match_operand: 1 "register_operand")
+   (match_operand: 2 "register_operand")]
+UNSPEC_USDOT)
+(match_operand:VS 3 "register_operand")))]
+  "TARGET_I8MM"
+)
+
 ;; These instructions map to the __builtins for the Dot Product
 ;; indexed operations.
 (define_insn "aarch64_dot_lane"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 
00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ace4fc7f43e2040a8
 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
 {
-  return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
+  return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
 {
-  return __builtin_aarch64_usdot_prodv16qi_ssus (__r, __a, __b);
+  return __builtin_aarch64_usdotv16qi_ssus (__r, __a, __b);
 }
 
 __extension__ extern __inline int32x2_t



[PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457

2021-07-15 Thread Tamar Christina via Gcc-patches
Hi All,

These testcases accidentally contain the wrong signs for the expected values
for the scalar code.  The vector code, however, is correct.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Committed as a trivial fix.

Thanks,
Tamar

gcc/testsuite/ChangeLog:

PR middle-end/101457
* gcc.dg/vect/vect-reduc-dot-17.c: Fix signs of scalar code.
* gcc.dg/vect/vect-reduc-dot-18.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-22.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-9.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
index 
aa269c4d657f65e07e36df7f3fd0098cf3aaf4d0..38f86fe458adcc7ebbbae22f5cc1e720928f2d48
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
@@ -35,8 +35,9 @@ main (void)
 {
   check_vect ();
 
-  SIGNEDNESS_3 char a[N], b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_3 char a[N];
+  SIGNEDNESS_4 char b[N];
+  SIGNEDNESS_1 int expected = 0x12345;
   for (int i = 0; i < N; ++i)
 {
   a[i] = BASE + i * 5;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
index 
2b1cc0411c3256ccd876d8b4da18ce4881dc0af9..2e86ebe3c6c6a0da9ac242868592f30028ed2155
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
@@ -35,8 +35,9 @@ main (void)
 {
   check_vect ();
 
-  SIGNEDNESS_3 char a[N], b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_3 char a[N];
+  SIGNEDNESS_4 char b[N];
+  SIGNEDNESS_1 int expected = 0x12345;
   for (int i = 0; i < N; ++i)
 {
   a[i] = BASE + i * 5;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
index 
febeb19784c6aaca72dc0871af0d32cc91fa6ea2..0bde43a6cb855ce5edd9015ebf34ca226353d77e
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
@@ -37,7 +37,7 @@ main (void)
 
   SIGNEDNESS_3 char a[N];
   SIGNEDNESS_4 short b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_1 long expected = 0x12345;
   for (int i = 0; i < N; ++i)
 {
   a[i] = BASE + i * 5;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
index 
cbbeedec3bfd0810a8ce8036e6670585d9334924..d1049c96bf1febfc8933622e292b44cc8dd129cc
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
@@ -35,8 +35,9 @@ main (void)
 {
   check_vect ();
 
-  SIGNEDNESS_3 char a[N], b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_3 char a[N];
+  SIGNEDNESS_4 char b[N];
+  SIGNEDNESS_1 int expected = 0x12345;
   for (int i = 0; i < N; ++i)
 {
   a[i] = BASE + i * 5;



[PATCH] c++: covariant reference return type [PR99664]

2021-07-15 Thread Patrick Palka via Gcc-patches
This implements the wording changes of DR 960 which clarifies that two
reference types are covariant only if they're both lvalue references
or both rvalue references.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

DR 960
PR c++/99664

gcc/cp/ChangeLog:

* search.c (check_final_overrider): Compare TYPE_REF_IS_RVALUE
when the return types are references.

gcc/testsuite/ChangeLog:

* g++.dg/inherit/covariant23.C: New test.
---
 gcc/cp/search.c|  8 +++-
 gcc/testsuite/g++.dg/inherit/covariant23.C | 14 ++
 2 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/inherit/covariant23.C

diff --git a/gcc/cp/search.c b/gcc/cp/search.c
index af41bfe5835..943671acff8 100644
--- a/gcc/cp/search.c
+++ b/gcc/cp/search.c
@@ -1948,7 +1948,13 @@ check_final_overrider (tree overrider, tree basefn)
   fail = !INDIRECT_TYPE_P (base_return);
   if (!fail)
{
- fail = cp_type_quals (base_return) != cp_type_quals (over_return);
+ if (cp_type_quals (base_return) != cp_type_quals (over_return))
+   fail = 1;
+
+ if (TYPE_REF_P (base_return)
+ && (TYPE_REF_IS_RVALUE (base_return)
+ != TYPE_REF_IS_RVALUE (over_return)))
+   fail = 1;
 
  base_return = TREE_TYPE (base_return);
  over_return = TREE_TYPE (over_return);
diff --git a/gcc/testsuite/g++.dg/inherit/covariant23.C 
b/gcc/testsuite/g++.dg/inherit/covariant23.C
new file mode 100644
index 000..b27be15ef45
--- /dev/null
+++ b/gcc/testsuite/g++.dg/inherit/covariant23.C
@@ -0,0 +1,14 @@
+// PR c++/99664
+// { dg-do compile { target c++11 } }
+
+struct Res { };
+
+struct A {
+  virtual Res &();
+  virtual Res ();
+};
+
+struct B : A {
+  Res () override; // { dg-error "return type" }
+  Res &() override; // { dg-error "return type" }
+};
-- 
2.32.0.264.g75ae10bc75



[PATCH] c++: alias CTAD inside decltype [PR101233]

2021-07-15 Thread Patrick Palka via Gcc-patches
This is the alias CTAD version of the CTAD bug PR93248, and the fix is
the same: clear cp_unevaluated_operand so that the entire chain of
DECL_ARGUMENTS gets substituted.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?

PR c++/101233

gcc/cp/ChangeLog:

* pt.c (alias_ctad_tweaks): Clear cp_unevaluated_operand for
substituting DECL_ARGUMENTS.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-alias10.C: New test.
---
 gcc/cp/pt.c  | 12 +---
 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias10.C | 10 ++
 2 files changed, 19 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias10.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index c7bf7d412ca..bc0a0936579 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -29097,9 +29097,15 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
  /* Substitute the deduced arguments plus the rewritten template
 parameters into f to get g.  This covers the type, copyness,
 guideness, and explicit-specifier.  */
- tree g = tsubst_decl (DECL_TEMPLATE_RESULT (f), targs, complain);
- if (g == error_mark_node)
-   continue;
+ tree g;
+   {
+ /* Parms are to have DECL_CHAIN tsubsted, which would be skipped
+if cp_unevaluated_operand.  */
+ cp_evaluated ev;
+ g = tsubst_decl (DECL_TEMPLATE_RESULT (f), targs, complain);
+ if (g == error_mark_node)
+   continue;
+   }
  DECL_USE_TEMPLATE (g) = 0;
  fprime = build_template_decl (g, gtparms, false);
  DECL_TEMPLATE_RESULT (fprime) = g;
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias10.C b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias10.C
new file mode 100644
index 000..a473fff5dc7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias10.C
@@ -0,0 +1,10 @@
+// PR c++/101233
+// { dg-do compile { target c++20 } }
+
+template
+struct A { A(T, U); };
+
+template
+using B = A;
+
+using type = decltype(B{0, 0});
-- 
2.32.0.264.g75ae10bc75



Re: [PATCH] consider parameter names in -Wvla-parameter (PR 97548)

2021-07-15 Thread Martin Sebor via Gcc-patches

On 7/8/21 5:36 PM, Jeff Law wrote:



On 7/1/2021 7:02 PM, Martin Sebor via Gcc-patches wrote:

-Wvla-parameter relies on operand_equal_p() with OEP_LEXICOGRAPHIC
set to compare VLA bounds for equality.  But operand_equal_p()
doesn't consider decl names, and so nontrivial expressions that
refer to the same function parameter are considered unequal by
the function, leading to false positives.

The attached fix solves the problem by adding a new flag bit,
OEP_DECL_NAME, to set of flags that control the function.  When
the bit is set, the function considers distinct decls with
the same name equal.  The caller is responsible for ensuring
that the otherwise distinct decls appear in a context where they
can be assumed to refer to the same entity.  The only caller that
sets the flag is the -Wvla-parameter checker.

In addition, the patch strips nops from the VLA bound to avoid
false positives with meaningless casts.
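A minimal sketch of the class of false positive being fixed (hypothetical code, not taken from the PR): both declarations spell the bound as `n + 1`, but each bound expression refers to its own `n` PARM_DECL, so operand_equal_p without OEP_DECL_NAME treated the bounds as unequal and -Wvla-parameter warned spuriously on the redeclaration.

```c
/* Hypothetical illustration of the PR 97548 false positive: the bound
   "n + 1" refers to a distinct 'n' decl in each declaration, but they
   name the same parameter and should compare equal.  */
void f (int n, int a[n + 1]);
void f (int n, int a[n + 1]);   /* redeclaration: should not warn */

void f (int n, int a[n + 1])
{
  a[0] = n;
}
```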

I don't particularly like this, though I don't see a better solution.

Can you add some more info to OEP_DECL_NAME to describe the conditions 
where it's useful and how callers can correctly use it?


With that, this is OK.


I updated the comment and pushed r12-2329.

Martin


Re: [PATCH 2/2] testsuite: [arm] Remove arm_v8_2a_imm8_neon_ok_nocache

2021-07-15 Thread Richard Sandiford via Gcc-patches
Christophe Lyon via Gcc-patches  writes:
> This patch removes this recently-introduced effective-target, as it
> looks like a typo and duplicate for arm_v8_2a_i8mm_ok (imm8 vs i8mm),
> and it is not used.
>
> 2021-07-15  Christophe Lyon  
>
>   gcc/testsuite/
>   * lib/target-supports.exp (arm_v8_2a_imm8_neon_ok_nocache):
>   Delete.

OK, thanks.

Richard

> ---
>  gcc/testsuite/lib/target-supports.exp | 30 ---
>  1 file changed, 30 deletions(-)
>
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 1c27ccd94af..28950803b13 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -5267,36 +5267,6 @@ proc check_effective_target_arm_v8_2a_dotprod_neon_ok_nocache { } {
>  return 0;
>  }
>  
> -# Return 1 if the target supports ARMv8.2 Adv.SIMD imm8
> -# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
> -# Record the command line options needed.
> -
> -proc check_effective_target_arm_v8_2a_imm8_neon_ok_nocache { } {
> -global et_arm_v8_2a_imm8_neon_flags
> -set et_arm_v8_2a_imm8_neon_flags ""
> -
> -if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
> -return 0;
> -}
> -
> -# Iterate through sets of options to find the compiler flags that
> -# need to be added to the -march option.
> -    foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" "-mfloat-abi=hard -mfpu=neon-fp-armv8"} {
> -if { [check_no_compiler_messages_nocache \
> -  arm_v8_2a_imm8_neon_ok object {
> - #include 
> -#if !defined (__ARM_FEATURE_MATMUL_INT8)
> -#error "__ARM_FEATURE_MATMUL_INT8 not defined"
> -#endif
> -} "$flags -march=armv8.2-a+imm8"] } {
> -set et_arm_v8_2a_imm8_neon_flags "$flags -march=armv8.2-a+imm8"
> -return 1
> -}
> -}
> -
> -return 0;
> -}
> -
>  # Return 1 if the target supports ARMv8.1-M MVE
>  # instructions, 0 otherwise.  The test is valid for ARM.
>  # Record the command line options needed.


Re: [PATCH 1/2] testsuite: [arm] Add missing effective-target to vusdot-autovec.c

2021-07-15 Thread Richard Sandiford via Gcc-patches
Christophe Lyon via Gcc-patches  writes:
> This test fails when forcing an -mcpu option incompatible with
> -march=armv8.2-a+i8mm.
>
> This patch adds the missing arm_v8_2a_i8mm_ok effective-target, as
> well as the associated dg-add-options arm_v8_2a_i8mm.
>
> 2021-07-15  Christophe Lyon  
>
>   gcc/testsuite/
>   * gcc.target/arm/simd/vusdot-autovec.c: Use arm_v8_2a_i8mm_ok
>   effective-target.

OK, thanks.

> ---
>  gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> index 7cc56f68817..e7af895b423 100644
> --- a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> +++ b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> @@ -1,5 +1,7 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
> +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
> +/* { dg-options "-O3" } */
> +/* { dg-add-options arm_v8_2a_i8mm } */
>  
>  #define N 480
>  #define SIGNEDNESS_1 unsigned


Re: [RFC] ipa: Adjust references to identify read-only globals

2021-07-15 Thread Jan Hubicka
> Hi,
> 
> gcc/ChangeLog:
> 
> 2021-06-29  Martin Jambor  
> 
>   * cgraph.h (ipa_replace_map): New field force_load_ref.
>   * ipa-prop.h (ipa_param_descriptor): Reduce precision of move_cost,
>   added new flag load_dereferenced, adjusted comments.
>   (ipa_get_param_dereferenced): New function.
>   (ipa_set_param_dereferenced): Likewise.
>   * cgraphclones.c (cgraph_node::create_virtual_clone): Follow it.
>   * ipa-cp.c: Include gimple.h.
>   (ipcp_discover_new_direct_edges): Take into account dereferenced flag.
>   (get_replacement_map): New parameter force_load_ref, set the
>   appropriate flag in ipa_replace_map if set.
>   (struct symbol_and_index_together): New type.
>   (adjust_references_in_act_callers): New function.
>   (adjust_references_in_caller): Likewise.
>   (create_specialized_node): When appropriate, call
>   adjust_references_in_caller and force only load references.
>   * ipa-prop.c (load_from_dereferenced_name): New function.
>   (ipa_analyze_controlled_uses): Also detect loads from a
>   dereference, harden testing of call statements.
>   (ipa_write_node_info): Stream the dereferenced flag.
>   (ipa_read_node_info): Likewise.
>   (ipa_set_jf_constant): Also create refdesc when jump function
>   references a variable.
>   (cgraph_node_for_jfunc): Rename to symtab_node_for_jfunc, work
>   also on references of variables and return a symtab_node.  Adjust
>   all callers.
>   (propagate_controlled_uses): Also remove references to VAR_DECLs.
> 
> gcc/testsuite/ChangeLog:
> 
> 2021-06-29  Martin Jambor  
> 
>   * gcc.dg/ipa/remref-3.c: New test.
>   * gcc.dg/ipa/remref-4.c: Likewise.
>   * gcc.dg/ipa/remref-5.c: Likewise.
>   * gcc.dg/ipa/remref-6.c: Likewise.
> ---
>  gcc/cgraph.h|   3 +
>  gcc/cgraphclones.c  |  10 +-
>  gcc/ipa-cp.c| 146 ++--
>  gcc/ipa-prop.c  | 166 ++--
>  gcc/ipa-prop.h  |  27 -
>  gcc/testsuite/gcc.dg/ipa/remref-3.c |  23 
>  gcc/testsuite/gcc.dg/ipa/remref-4.c |  31 ++
>  gcc/testsuite/gcc.dg/ipa/remref-5.c |  38 +++
>  gcc/testsuite/gcc.dg/ipa/remref-6.c |  24 
>  9 files changed, 419 insertions(+), 49 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/remref-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/remref-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/remref-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/remref-6.c
> 
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 9f4338fdf87..0fc20cd4517 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -700,6 +700,9 @@ struct GTY(()) ipa_replace_map
>tree new_tree;
>/* Parameter number to replace, when old_tree is NULL.  */
>int parm_num;
> +  /* Set if the newly added reference should not be an address one, but a 
> load
> + one from the operand of the ADDR_EXPR in NEW_TREE.  */

So this is in case where parameter p is used only as *p?
I think the comment should be expanded to explain the situation or in a
year I will not know why we need such a flag :)
> @@ -4320,7 +4322,8 @@ gather_edges_for_value (ipcp_value *val, cgraph_node *dest,
> Return it or NULL if for some reason it cannot be created.  */
>  
>  static struct ipa_replace_map *
> -get_replacement_map (class ipa_node_params *info, tree value, int parm_num)
> +get_replacement_map (class ipa_node_params *info, tree value, int parm_num,
> +  bool force_load_ref)

You want to comment the parameter here too..
> +/* At INDEX of a function being called by CS there is an ADDR_EXPR of a
> +   variable which is only dereferenced and which is represented by SYMBOL.  See
> +   if we can remove ADDR reference in callers associated with the call.  */
> +
> +static void
> +adjust_references_in_caller (cgraph_edge *cs, symtab_node *symbol, int index)
> +{
> +  ipa_edge_args *args = ipa_edge_args_sum->get (cs);
> +  ipa_jump_func *jfunc = ipa_get_ith_jump_func (args, index);
> +  if (jfunc->type == IPA_JF_CONST)
> +{
> +  ipa_ref *to_del = cs->caller->find_reference (symbol, cs->call_stmt,
> + cs->lto_stmt_uid);
> +  if (!to_del)
> + return;
> +  to_del->remove_reference ();
> +  if (dump_file)
> + fprintf (dump_file, "Removed a reference from %s to %s.\n",
> +  cs->caller->dump_name (), symbol->dump_name ());
> +  return;
> +}
> +
> +  if (jfunc->type != IPA_JF_PASS_THROUGH
> +  || ipa_get_jf_pass_through_operation (jfunc) != NOP_EXPR)
> +return;
> +
> +  int fidx = ipa_get_jf_pass_through_formal_id (jfunc);
> +  cgraph_node *caller = cs->caller;
> +  ipa_node_params *caller_info = ipa_node_params_sum->get (caller);
> +  /* TODO: This consistency check may be too big and not really
> + that useful.  Consider 

Re: rs6000: Generate an lxvp instead of two adjacent lxv instructions

2021-07-15 Thread Segher Boessenkool
On Thu, Jul 15, 2021 at 09:15:55AM -0500, Peter Bergner wrote:
> On 7/14/21 4:12 PM, Peter Bergner wrote:
> > I'll make the change above and rebuild just to be safe and then commit.
> 
> Regtesting was clean as expected, so I pushed the commit to trunk.  Thanks.
> Is this ok for backporting to GCC 11 after a day or two on trunk?

If it is tested well enough, yes.  There are many things that can break
in this code, so I am not very comfortable with backporting it so close
to a release, but if it is important, we can take that risk.

Thanks,


Segher


[PATCH 2/2] testsuite: [arm] Remove arm_v8_2a_imm8_neon_ok_nocache

2021-07-15 Thread Christophe Lyon via Gcc-patches
This patch removes this recently-introduced effective-target, as it
looks like a typo and duplicate for arm_v8_2a_i8mm_ok (imm8 vs i8mm),
and it is not used.

2021-07-15  Christophe Lyon  

gcc/testsuite/
* lib/target-supports.exp (arm_v8_2a_imm8_neon_ok_nocache):
Delete.
---
 gcc/testsuite/lib/target-supports.exp | 30 ---
 1 file changed, 30 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 1c27ccd94af..28950803b13 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5267,36 +5267,6 @@ proc check_effective_target_arm_v8_2a_dotprod_neon_ok_nocache { } {
 return 0;
 }
 
-# Return 1 if the target supports ARMv8.2 Adv.SIMD imm8
-# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
-# Record the command line options needed.
-
-proc check_effective_target_arm_v8_2a_imm8_neon_ok_nocache { } {
-global et_arm_v8_2a_imm8_neon_flags
-set et_arm_v8_2a_imm8_neon_flags ""
-
-if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
-return 0;
-}
-
-# Iterate through sets of options to find the compiler flags that
-# need to be added to the -march option.
-    foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" "-mfloat-abi=hard -mfpu=neon-fp-armv8"} {
-if { [check_no_compiler_messages_nocache \
-  arm_v8_2a_imm8_neon_ok object {
-   #include 
-#if !defined (__ARM_FEATURE_MATMUL_INT8)
-#error "__ARM_FEATURE_MATMUL_INT8 not defined"
-#endif
-} "$flags -march=armv8.2-a+imm8"] } {
-set et_arm_v8_2a_imm8_neon_flags "$flags -march=armv8.2-a+imm8"
-return 1
-}
-}
-
-return 0;
-}
-
 # Return 1 if the target supports ARMv8.1-M MVE
 # instructions, 0 otherwise.  The test is valid for ARM.
 # Record the command line options needed.
-- 
2.25.1



[PATCH 1/2] testsuite: [arm] Add missing effective-target to vusdot-autovec.c

2021-07-15 Thread Christophe Lyon via Gcc-patches
This test fails when forcing an -mcpu option incompatible with
-march=armv8.2-a+i8mm.

This patch adds the missing arm_v8_2a_i8mm_ok effective-target, as
well as the associated dg-add-options arm_v8_2a_i8mm.

2021-07-15  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/simd/vusdot-autovec.c: Use arm_v8_2a_i8mm_ok
effective-target.
---
 gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
index 7cc56f68817..e7af895b423 100644
--- a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
+++ b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_v8_2a_i8mm } */
 
 #define N 480
 #define SIGNEDNESS_1 unsigned
-- 
2.25.1



Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-15 Thread Richard Biener
On Thu, 15 Jul 2021, Richard Biener wrote:

> On Thu, 15 Jul 2021, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > On Thu, 15 Jul 2021, Hongtao Liu wrote:
> > >
> > >> On Thu, Jul 15, 2021 at 6:45 PM Richard Biener via Gcc-patches
> > >>  wrote:
> > >> >
> > >> > On Thu, Jul 15, 2021 at 12:30 PM Richard Biener  
> > >> > wrote:
> > >> > >
> > >> > > The following extends the existing loop masking support using
> > >> > > SVE WHILE_ULT to x86 by providing an alternate way to produce the
> > >> > > mask using VEC_COND_EXPRs.  So with --param vect-partial-vector-usage
> > >> > > you can now enable masked vectorized epilogues (=1) or fully
> > >> > > masked vector loops (=2).
> > >> > >
> > >> > > What's missing is using a scalar IV for the loop control
> > >> > > (but in principle AVX512 can use the mask here - just the patch
> > >> > > doesn't seem to work for AVX512 yet for some reason - likely
> > >> > > expand_vec_cond_expr_p doesn't work there).  What's also missing
> > >> > > is providing more support for predicated operations in the case
> > >> > > of reductions either via VEC_COND_EXPRs or via implementing
> > >> > > some of the .COND_{ADD,SUB,MUL...} internal functions as mapping
> > >> > > to masked AVX512 operations.
> > >> > >
> > >> > > For AVX2 and
> > >> > >
> > >> > > int foo (unsigned *a, unsigned * __restrict b, int n)
> > >> > > {
> > >> > >   unsigned sum = 1;
> > >> > >   for (int i = 0; i < n; ++i)
> > >> > > b[i] += a[i];
> > >> > >   return sum;
> > >> > > }
> > >> > >
> > >> > > we get
> > >> > >
> > >> > > .L3:
> > >> > > vpmaskmovd  (%rsi,%rax), %ymm0, %ymm3
> > >> > > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm1
> > >> > > addl$8, %edx
> > >> > > vpaddd  %ymm3, %ymm1, %ymm1
> > >> > > vpmaskmovd  %ymm1, %ymm0, (%rsi,%rax)
> > >> > > vmovd   %edx, %xmm1
> > >> > > vpsubd  %ymm15, %ymm2, %ymm0
> > >> > > addq$32, %rax
> > >> > > vpbroadcastd%xmm1, %ymm1
> > >> > > vpaddd  %ymm4, %ymm1, %ymm1
> > >> > > vpsubd  %ymm15, %ymm1, %ymm1
> > >> > > vpcmpgtd%ymm1, %ymm0, %ymm0
> > >> > > vptest  %ymm0, %ymm0
> > >> > > jne .L3
> > >> > >
> > >> > > for the fully masked loop body and for the masked epilogue
> > >> > > we see
> > >> > >
> > >> > > .L4:
> > >> > > vmovdqu (%rsi,%rax), %ymm3
> > >> > > vpaddd  (%rdi,%rax), %ymm3, %ymm0
> > >> > > vmovdqu %ymm0, (%rsi,%rax)
> > >> > > addq$32, %rax
> > >> > > cmpq%rax, %rcx
> > >> > > jne .L4
> > >> > > movl%edx, %eax
> > >> > > andl$-8, %eax
> > >> > > testb   $7, %dl
> > >> > > je  .L11
> > >> > > .L3:
> > >> > > subl%eax, %edx
> > >> > > vmovdqa .LC0(%rip), %ymm1
> > >> > > salq$2, %rax
> > >> > > vmovd   %edx, %xmm0
> > >> > > movl$-2147483648, %edx
> > >> > > addq%rax, %rsi
> > >> > > vmovd   %edx, %xmm15
> > >> > > vpbroadcastd%xmm0, %ymm0
> > >> > > vpbroadcastd%xmm15, %ymm15
> > >> > > vpsubd  %ymm15, %ymm1, %ymm1
> > >> > > vpsubd  %ymm15, %ymm0, %ymm0
> > >> > > vpcmpgtd%ymm1, %ymm0, %ymm0
> > >> > > vpmaskmovd  (%rsi), %ymm0, %ymm1
> > >> > > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm2
> > >> > > vpaddd  %ymm2, %ymm1, %ymm1
> > >> > > vpmaskmovd  %ymm1, %ymm0, (%rsi)
> > >> > > .L11:
> > >> > > vzeroupper
> > >> > >
> > >> > > compared to
> > >> > >
> > >> > > .L3:
> > >> > > movl%edx, %r8d
> > >> > > subl%eax, %r8d
> > >> > > leal-1(%r8), %r9d
> > >> > > cmpl$2, %r9d
> > >> > > jbe .L6
> > >> > > leaq(%rcx,%rax,4), %r9
> > >> > > vmovdqu (%rdi,%rax,4), %xmm2
> > >> > > movl%r8d, %eax
> > >> > > andl$-4, %eax
> > >> > > vpaddd  (%r9), %xmm2, %xmm0
> > >> > > addl%eax, %esi
> > >> > > andl$3, %r8d
> > >> > > vmovdqu %xmm0, (%r9)
> > >> > > je  .L2
> > >> > > .L6:
> > >> > > movslq  %esi, %r8
> > >> > > leaq0(,%r8,4), %rax
> > >> > > movl(%rdi,%r8,4), %r8d
> > >> > > addl%r8d, (%rcx,%rax)
> > >> > > leal1(%rsi), %r8d
> > >> > > cmpl%r8d, %edx
> > >> > > jle .L2
> > >> > > addl$2, %esi
> > >> > > movl4(%rdi,%rax), %r8d
> > >> > > addl%r8d, 4(%rcx,%rax)
> > >> > > cmpl%esi, %edx
> > >> > > jle .L2
> > >> > > movl8(%rdi,%rax), %edx
> > >> > > addl%edx, 8(%rcx,%rax)
> > >> > > .L2:
> > >> > >
> > >> > > I'm giving this a little testing right now but will dig on why
> > >> > > I don't get masked loops when AVX512 is enabled.
> > >> >
> > >> > Ah, a simple thinko - rgroup_controls vectypes seem to be
> 
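As a rough scalar illustration of the masking scheme described in this thread (a sketch under the assumption of 8-lane vectors of 32-bit elements, not the vectorizer's actual code): the loop mask is an element-wise compare of the per-lane indices against the trip count, which is what the vpcmpgtd sequence computes after biasing both sides for the signed compare.

```c
#include <stdint.h>

/* Lane mask for the iteration starting at index i: lane is active when
   i + lane < n.  All-ones marks an active lane, as vpmaskmovd expects.  */
static void compute_mask (uint32_t i, uint32_t n, uint32_t mask[8])
{
  for (int lane = 0; lane < 8; ++lane)
    mask[lane] = (i + lane < n) ? 0xffffffffu : 0;
}

/* Fully masked b[i] += a[i] loop: inactive lanes are neither loaded nor
   stored, mirroring the vpmaskmovd load/store pairs in the asm above.  */
static void masked_add (uint32_t *b, const uint32_t *a, uint32_t n)
{
  for (uint32_t i = 0; i < n; i += 8)
    {
      uint32_t m[8];
      compute_mask (i, n, m);
      for (int lane = 0; lane < 8; ++lane)
        if (m[lane])
          b[i + lane] += a[i + lane];
    }
}
```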

Re: [PATCH, committed] rs6000: Don't let swaps pass break multiply low-part (PR101129)

2021-07-15 Thread David Edelsohn via Gcc-patches
On Thu, Jul 15, 2021 at 11:25 AM Bill Schmidt  wrote:
>
> Hi,
>
> Segher preapproved this patch in https://gcc.gnu.org/PR101129.  It differs 
> slightly from what was posted there, needing an additional test to ensure the 
> insn is a SET.  The patch also includes the test case provided by the OP.  
> Bootstrap and regtest succeeded on P9 little-endian.
>
> This bug has been around a long time, so the fix should be backported to all 
> open releases.  Is this okay after some burn-in time?
>
> Thanks!
> Bill
>
> rs6000: Don't let swaps pass break multiply low-part (PR101129)
>
> 2021-07-15  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-p8swap.c (has_part_mult): New.
> (rs6000_analyze_swaps): Insns containing a subreg of a mult are
> not swappable.
>
> gcc/testsuite/
> * gcc.target/powerpc/pr101129.c: New.

Thanks for fixing this so quickly.

Okay everywhere.

Thanks, David


Re: [committed] libstdc++: Fix std::get for std::tuple [PR101427]

2021-07-15 Thread Jonathan Wakely via Gcc-patches

On 15/07/21 16:26 +0100, Jonathan Wakely wrote:

The std::get functions relied on deduction failing if more than one
base class existed for the type T.  However the implementation of Core
DR 2303 (in r11-4693) made deduction succeed (and select the
more-derived base class).

This rewrites the implementation of std::get to explicitly check for
more than one occurrence of T in the tuple elements, making it
ill-formed again. Additionally, the large wall of overload resolution
errors described in PR c++/101460 is avoided by making std::get use
__get_helper directly instead of calling std::get, and by adding a
deleted overload of __get_helper for out-of-range N.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101427
* include/std/tuple (tuple_element): Improve static_assert text.
(__get_helper): Add deleted overload.
(get(tuple&&), get(const tuple&&)): Use
__get_helper directly.
(__get_helper2): Remove.
(__find_uniq_type_in_pack): New constexpr helper function.
(get): Use __find_uniq_type_in_pack and __get_helper instead
of __get_helper2.
* testsuite/20_util/tuple/element_access/get_neg.cc: Adjust
expected errors.
* testsuite/20_util/tuple/element_access/101427.cc: New test.

Tested powerpc64le-linux. Committed to trunk.


This should be backported to gcc-11 in time for 11.2 as well. If you
see any problems with it please let me know ASAP.



[committed] libstdc++: Fix std::get for std::tuple [PR101427]

2021-07-15 Thread Jonathan Wakely via Gcc-patches
The std::get functions relied on deduction failing if more than one
base class existed for the type T.  However the implementation of Core
DR 2303 (in r11-4693) made deduction succeed (and select the
more-derived base class).

This rewrites the implementation of std::get to explicitly check for
more than one occurrence of T in the tuple elements, making it
ill-formed again. Additionally, the large wall of overload resolution
errors described in PR c++/101460 is avoided by making std::get use
__get_helper directly instead of calling std::get, and by adding a
deleted overload of __get_helper for out-of-range N.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101427
* include/std/tuple (tuple_element): Improve static_assert text.
(__get_helper): Add deleted overload.
(get(tuple&&), get(const tuple&&)): Use
__get_helper directly.
(__get_helper2): Remove.
(__find_uniq_type_in_pack): New constexpr helper function.
(get): Use __find_uniq_type_in_pack and __get_helper instead
of __get_helper2.
* testsuite/20_util/tuple/element_access/get_neg.cc: Adjust
expected errors.
* testsuite/20_util/tuple/element_access/101427.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit 17855eed7fc76b2cee7fbbc26f84d3c8b99be13c
Author: Jonathan Wakely 
Date:   Wed Jul 14 20:14:14 2021

libstdc++: Fix std::get for std::tuple [PR101427]

The std::get functions relied on deduction failing if more than one
base class existed for the type T.  However the implementation of Core
DR 2303 (in r11-4693) made deduction succeed (and select the
more-derived base class).

This rewrites the implementation of std::get to explicitly check for
more than one occurrence of T in the tuple elements, making it
ill-formed again. Additionally, the large wall of overload resolution
errors described in PR c++/101460 is avoided by making std::get use
__get_helper directly instead of calling std::get, and by adding a
deleted overload of __get_helper for out-of-range N.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101427
* include/std/tuple (tuple_element): Improve static_assert text.
(__get_helper): Add deleted overload.
(get(tuple&&), get(const tuple&&)): Use
__get_helper directly.
(__get_helper2): Remove.
(__find_uniq_type_in_pack): New constexpr helper function.
(get): Use __find_uniq_type_in_pack and __get_helper instead
of __get_helper2.
* testsuite/20_util/tuple/element_access/get_neg.cc: Adjust
expected errors.
* testsuite/20_util/tuple/element_access/101427.cc: New test.

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 2d562f8da77..6953f8715d7 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -1358,7 +1358,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct tuple_element<__i, tuple<>>
 {
   static_assert(__i < tuple_size>::value,
- "tuple index is in range");
+ "tuple index must be in range");
 };
 
   template
@@ -1371,6 +1371,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 __get_helper(const _Tuple_impl<__i, _Head, _Tail...>& __t) noexcept
 { return _Tuple_impl<__i, _Head, _Tail...>::_M_head(__t); }
 
+  // Deleted overload to improve diagnostics for invalid indices
+  template
+__enable_if_t<(__i >= sizeof...(_Types))>
+__get_helper(const tuple<_Types...>&) = delete;
+
   /// Return a reference to the ith element of a tuple.
   template
 constexpr __tuple_element_t<__i, tuple<_Elements...>>&
@@ -1389,7 +1394,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 get(tuple<_Elements...>&& __t) noexcept
 {
   typedef __tuple_element_t<__i, tuple<_Elements...>> __element_type;
-  return std::forward<__element_type&&>(std::get<__i>(__t));
+  return std::forward<__element_type>(std::__get_helper<__i>(__t));
 }
 
   /// Return a const rvalue reference to the ith element of a const tuple rvalue.
@@ -1398,47 +1403,79 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 get(const tuple<_Elements...>&& __t) noexcept
 {
   typedef __tuple_element_t<__i, tuple<_Elements...>> __element_type;
-  return std::forward(std::get<__i>(__t));
+  return std::forward(std::__get_helper<__i>(__t));
 }
 
 #if __cplusplus >= 201402L
 
 #define __cpp_lib_tuples_by_type 201304
 
-  template
-constexpr _Head&
-__get_helper2(_Tuple_impl<__i, _Head, _Tail...>& __t) noexcept
-{ return _Tuple_impl<__i, _Head, _Tail...>::_M_head(__t); }
-
-  template
-constexpr const _Head&
-__get_helper2(const _Tuple_impl<__i, _Head, _Tail...>& __t) noexcept
-{ return _Tuple_impl<__i, _Head, _Tail...>::_M_head(__t); }
+  // Return the index of _Tp in _Types, if it occurs exactly 
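The diff is truncated by the archive at this point. A rough stand-in for the helper it introduces (illustrative only; the name and details here are assumptions, not the actual libstdc++ code): return the index of T in the pack if it occurs exactly once, and otherwise return the pack size, which the caller can reject to make ambiguous or absent types ill-formed.

```cpp
#include <cstddef>
#include <type_traits>

// Simplified sketch of a __find_uniq_type_in_pack-style helper.  Returns
// the index of T in Ts... if T occurs exactly once; otherwise returns
// sizeof...(Ts) as a sentinel the caller can static_assert against.
template<typename T, typename... Ts>
constexpr std::size_t find_uniq_type_in_pack()
{
  const std::size_t n = sizeof...(Ts);
  // Trailing false keeps the array non-empty for an empty pack.
  const bool matches[] = { std::is_same<T, Ts>::value..., false };
  std::size_t index = n;
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (matches[i])
      {
        ++count;
        index = i;
      }
  return count == 1 ? index : n;
}

static_assert(find_uniq_type_in_pack<int, char, int, long>() == 1,
              "unique occurrence yields its index");
static_assert(find_uniq_type_in_pack<int, int, int>() == 2,
              "ambiguous occurrence yields the sentinel");
```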

[committed] libstdc++: Add noexcept to __replacement_assert [PR101429]

2021-07-15 Thread Jonathan Wakely via Gcc-patches
This results in slightly smaller code when assertions are enabled when
either using Clang (because it adds code to call std::terminate when
potentially-throwing functions are called in a noexcept function) or a
freestanding or non-verbose build (because it doesn't use printf).

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101429
* include/bits/c++config (__replacement_assert): Add noexcept.
[!_GLIBCXX_VERBOSE] (__glibcxx_assert_impl): Use __builtin_trap
instead of __replacement_assert.

Tested powerpc64le-linux. Committed to trunk.

commit 1f7182d68c24985dace2a94422c671ff987c262c
Author: Jonathan Wakely 
Date:   Wed Jul 14 12:25:11 2021

libstdc++: Add noexcept to __replacement_assert [PR101429]

This results in slightly smaller code when assertions are enabled when
either using Clang (because it adds code to call std::terminate when
potentially-throwing functions are called in a noexcept function) or a
freestanding or non-verbose build (because it doesn't use printf).

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101429
* include/bits/c++config (__replacement_assert): Add noexcept.
[!_GLIBCXX_VERBOSE] (__glibcxx_assert_impl): Use __builtin_trap
instead of __replacement_assert.

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 9314117aed8..69ace386dd7 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -500,6 +500,7 @@ namespace std
 // Assert.
 #if defined(_GLIBCXX_ASSERTIONS) \
   || defined(_GLIBCXX_PARALLEL) || defined(_GLIBCXX_PARALLEL_ASSERTIONS)
+# if _GLIBCXX_HOSTED && _GLIBCXX_VERBOSE
 namespace std
 {
   // Avoid the use of assert, because we're trying to keep the 
@@ -508,6 +509,7 @@ namespace std
   inline void
   __replacement_assert(const char* __file, int __line,
   const char* __function, const char* __condition)
+  _GLIBCXX_NOEXCEPT
   {
 __builtin_printf("%s:%d: %s: Assertion '%s' failed.\n", __file, __line,
 __function, __condition);
@@ -517,10 +519,18 @@ namespace std
 #define __glibcxx_assert_impl(_Condition) \
   if (__builtin_expect(!bool(_Condition), false)) \
   {   \
-__glibcxx_constexpr_assert(_Condition);   \
+__glibcxx_constexpr_assert(false);\
 std::__replacement_assert(__FILE__, __LINE__, __PRETTY_FUNCTION__, \
  #_Condition);\
   }
+# else // ! VERBOSE
+# define __glibcxx_assert_impl(_Condition) \
+  if (__builtin_expect(!bool(_Condition), false))  \
+  {\
+__glibcxx_constexpr_assert(false); \
+__builtin_abort(); \
+  }
+#endif
 #endif
 
 #if defined(_GLIBCXX_ASSERTIONS)


Re: [PATCH v2] docs: Add 'S' to Machine Constraints for RISC-V

2021-07-15 Thread Palmer Dabbelt

On Sun, 11 Jul 2021 21:29:13 PDT (-0700), kito.ch...@sifive.com wrote:

It was undocumented before, but it may be used in the Linux kernel to resolve
a code model issue, so the LLVM community suggested we document it, making it
a supported, documented, non-internal machine constraint.

gcc/ChangeLog:

PR target/101275
* config/riscv/constraints.md ("S"): Update description and remove
@internal.
* doc/md.texi (Machine Constraints): Document the 'S' constraints
for RISC-V.
---
 gcc/config/riscv/constraints.md | 3 +--
 gcc/doc/md.texi | 3 +++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 8c15c6c0486..c87d5b796a5 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -67,8 +67,7 @@ (define_memory_constraint "A"
(match_test "GET_CODE(XEXP(op,0)) == REG")))

 (define_constraint "S"
-  "@internal
-   A constant call address."
+  "A constraint that matches an absolute symbolic address."
   (match_operand 0 "absolute_symbolic_operand"))

 (define_constraint "U"
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 00caf3844cc..2d120da96cf 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3536,6 +3536,9 @@ A 5-bit unsigned immediate for CSR access instructions.
 @item A
 An address that is held in a general-purpose register.

+@item S
+A constraint that matches an absolute symbolic address.
+
 @end table

 @item RX---@file{config/rx/constraints.md}


Reviewed-by: Palmer Dabbelt 


[PATCH, committed] rs6000: Don't let swaps pass break multiply low-part (PR101129)

2021-07-15 Thread Bill Schmidt via Gcc-patches

Hi,

Segher preapproved this patch in https://gcc.gnu.org/PR101129. It 
differs slightly from what was posted there, needing an additional test 
to ensure the insn is a SET.  The patch also includes the test case 
provided by the OP.  Bootstrap and regtest succeeded on P9 little-endian.


This bug has been around a long time, so the fix should be backported to 
all open releases.  Is this okay after some burn-in time?


Thanks!
Bill

rs6000: Don't let swaps pass break multiply low-part (PR101129)

2021-07-15  Bill Schmidt  

gcc/
* config/rs6000/rs6000-p8swap.c (has_part_mult): New.
(rs6000_analyze_swaps): Insns containing a subreg of a mult are
not swappable.

gcc/testsuite/
* gcc.target/powerpc/pr101129.c: New.

diff --git a/gcc/config/rs6000/rs6000-p8swap.c b/gcc/config/rs6000/rs6000-p8swap.c
index 21cbcb2e28a..6b559aa5061 100644
--- a/gcc/config/rs6000/rs6000-p8swap.c
+++ b/gcc/config/rs6000/rs6000-p8swap.c
@@ -1523,6 +1523,22 @@ replace_swap_with_copy (swap_web_entry *insn_entry, unsigned i)
   insn->set_deleted ();
 }
 
+/* INSN is known to contain a SUBREG, which we can normally handle,
+   but if the SUBREG itself contains a MULT then we need to leave it alone
+   to avoid turning a mult_hipart into a mult_lopart, for example.  */
+static bool
+has_part_mult (rtx_insn *insn)
+{
+  rtx body = PATTERN (insn);
+  if (GET_CODE (body) != SET)
+return false;
+  rtx src = SET_SRC (body);
+  if (GET_CODE (src) != SUBREG)
+return false;
+  rtx inner = XEXP (src, 0);
+  return (GET_CODE (inner) == MULT);
+}
+
 /* Make NEW_MEM_EXP's attributes and flags resemble those of
ORIGINAL_MEM_EXP.  */
 static void
@@ -2501,6 +2517,9 @@ rs6000_analyze_swaps (function *fun)
insn_entry[uid].is_swappable = 0;
  else if (special != SH_NONE)
insn_entry[uid].special_handling = special;
+ else if (insn_entry[uid].contains_subreg
+  && has_part_mult (insn))
+   insn_entry[uid].is_swappable = 0;
  else if (insn_entry[uid].contains_subreg)
insn_entry[uid].special_handling = SH_SUBREG;
}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr101129.c b/gcc/testsuite/gcc.target/powerpc/pr101129.c
new file mode 100644
index 000..1abc12480e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr101129.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-mdejagnu-cpu=power8 -O " } */
+
+/* PR101129: The swaps pass was turning a mult-lopart into a mult-hipart.
+   Make sure we aren't doing that anymore.  */
+
+typedef unsigned char u8;
+typedef unsigned char __attribute__((__vector_size__ (8))) U;
+typedef unsigned char __attribute__((__vector_size__ (16))) V;
+typedef unsigned int u32;
+typedef unsigned long long u64;
+typedef __int128 u128;
+
+u8 g;
+U u;
+
+void
+foo0 (u32 u32_0, U *ret)
+{
+  u128 u128_2 = u32_0 * (u128)((V){ 5 } > (u32_0 & 4));
+  u64 u64_r = u128_2 >> 64;
+  u8 u8_r = u64_r + g;
+  *ret = u + u8_r;
+}
+
+int
+main (void)
+{
+  U x;
+  foo0 (7, &x);
+  for (unsigned i = 0; i < sizeof (x); i++)
+    if (x[i] != 0) __builtin_abort();
+  return 0;
+}



Re: [PATCH] Support reduction def re-use for epilogue with different vector size

2021-07-15 Thread Christophe Lyon via Gcc-patches
On Thu, Jul 15, 2021 at 2:34 PM Richard Biener  wrote:

> On Thu, 15 Jul 2021, Christophe Lyon wrote:
>
> > Hi,
> >
> >
> >
> > On Tue, Jul 13, 2021 at 2:09 PM Richard Biener 
> wrote:
> >
> > > The following adds support for re-using the vector reduction def
> > > from the main loop in vectorized epilogue loops on architectures
> > > which use different vector sizes for the epilogue.  That's only
> > > x86 as far as I am aware.
> > >
> > > vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> > > regtest in progress.
> > >
> > > There's costing issues on x86 which usually prevent vectorizing
> > > an epilogue with a reduction, at least for loops that only
> > > have a reduction - it could be mitigated by not accounting for
> > > the epilogue there if we can compute that we can re-use the
> > > main loops cost.
> > >
> > > Richard - did I figure the correct place to adjust?  I guess
> > > adjusting accumulator->reduc_input in vect_transform_cycle_phi
> > > for re-use by the skip code in vect_create_epilog_for_reduction
> > > is a bit awkward but at least we're conciously doing
> > > vect_create_epilog_for_reduction last (via vectorizing live
> > > operations).
> > >
> > > OK in the unlikely case all testing succeeds (I also want to
> > > run it through SPEC with/without -fno-vect-cost-model which
> > > will take some time)?
> > >
> > > Thanks,
> > > Richard.
> > >
> > > 2021-07-13  Richard Biener  
> > >
> > > * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
> > > vector types where the old vector type has a multiple of
> > > the new vector type elements.
> > > (vect_create_partial_epilog): New function, split out from...
> > > (vect_create_epilog_for_reduction): ... here.
> > > (vect_transform_cycle_phi): Reduce the re-used accumulator
> > > to the new vector type.
> > >
> > > * gcc.target/i386/vect-reduc-1.c: New testcase.
> > >
> >
> > This patch is causing regressions on aarch64:
> >  FAIL: gcc.dg/vect/pr92324-4.c (internal compiler error)
> > FAIL: gcc.dg/vect/pr92324-4.c 2 blank line(s) in output
> > FAIL: gcc.dg/vect/pr92324-4.c (test for excess errors)
> > Excess errors:
> > /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: error: incompatible types in
> > 'PHI' argument 1
> > vector(2) unsigned int
> > vector(2) int
> > _91 = PHI <_90(17), _83(11)>
> > during GIMPLE pass: vect
> > dump file: ./pr92324-4.c.167t.vect
> > /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: internal compiler error:
> > verify_gimple failed
> > 0xe6438e verify_gimple_in_cfg(function*, bool)
> > /gcc/tree-cfg.c:5535
> > 0xd13902 execute_function_todo
> > /gcc/passes.c:2042
> > 0xd142a5 execute_todo
> > /gcc/passes.c:2096
> >
> > FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler
> fminnmv
> > FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler
> fmaxnmv
>
> What exact options do you pass to cc1 to get this?  Can you track this
> in a PR please?
>
> Thanks,
> Richard.
>
>
Sure, I filed PR 101462

Christophe


> > Thanks,
> >
> > Christophe
> >
> >
> >
> > > ---
> > >  gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
> > >  gcc/tree-vect-loop.c | 223 ---
> > >  2 files changed, 155 insertions(+), 85 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > > b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > > new file mode 100644
> > > index 000..9ee9ba4e736
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > > @@ -0,0 +1,17 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" }
> */
> > > +
> > > +#define N 32
> > > +int foo (int *a, int n)
> > > +{
> > > +  int sum = 1;
> > > +  for (int i = 0; i < 8*N + 4; ++i)
> > > +sum += a[i];
> > > +  return sum;
> > > +}
> > > +
> > > +/* The reduction epilog should be vectorized and the accumulator
> > > +   re-used.  */
> > > +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } }
> */
> > > +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> > > +/* { dg-final { scan-assembler-times "padd" 5 } } */
> > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> > > index 8c27d75f889..98e2a845629 100644
> > > --- a/gcc/tree-vect-loop.c
> > > +++ b/gcc/tree-vect-loop.c
> > > @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info
> > > loop_vinfo,
> > >   ones as well.  */
> > >tree vectype = STMT_VINFO_VECTYPE (reduc_info);
> > >tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> > > -  if (!useless_type_conversion_p (old_vectype, vectype))
> > > +  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> > > +   TYPE_VECTOR_SUBPARTS (vectype)))
> > >  return false;
> > >
> > >/* Non-SLP reductions might apply an adjustment 

[PATCH] Change the type of return value of profile_count::value to uint64_t

2021-07-15 Thread Martin Jambor
Hi,

The field in which profile_count holds the count has 61 bits but the
getter method only returns it as a 32-bit number.  The getter is (and
should be) only used for dumping but even dumps are better when they do
not lie.

The patch has passed bootstrap and testing on x86_64-linux and Honza has
approved it so I will commit it shortly.

Martin


gcc/ChangeLog:

2021-07-13  Martin Jambor  

* profile-count.h (profile_count::value): Change the return type to
uint64_t.
* gimple-pretty-print.c (dump_gimple_bb_header): Adjust print
statement.
* tree-cfg.c (dump_function_to_file): Likewise.
---
 gcc/gimple-pretty-print.c | 2 +-
 gcc/profile-count.h   | 2 +-
 gcc/tree-cfg.c| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 39c5775e2cb..d6e63d6e57f 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -2831,7 +2831,7 @@ dump_gimple_bb_header (FILE *outf, basic_block bb, int 
indent,
  if (bb->loop_father->header == bb)
fprintf (outf, ",loop_header(%d)", bb->loop_father->num);
  if (bb->count.initialized_p ())
-   fprintf (outf, ",%s(%d)",
+   fprintf (outf, ",%s(%" PRIu64 ")",
 profile_quality_as_string (bb->count.quality ()),
 bb->count.value ());
  fprintf (outf, "):\n");
diff --git a/gcc/profile-count.h b/gcc/profile-count.h
index f2b1e3a6525..c7a45ac5ee3 100644
--- a/gcc/profile-count.h
+++ b/gcc/profile-count.h
@@ -804,7 +804,7 @@ public:
 }
 
   /* Get the value of the count.  */
-  uint32_t value () const { return m_val; }
+  uint64_t value () const { return m_val; }
 
   /* Get the quality of the count.  */
   enum profile_quality quality () const { return m_quality; }
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index c73e1cbdda6..2ed191f9a47 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -8081,7 +8081,7 @@ dump_function_to_file (tree fndecl, FILE *file, 
dump_flags_t flags)
{
  basic_block bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
  if (bb->count.initialized_p ())
-   fprintf (file, ",%s(%d)",
+   fprintf (file, ",%s(%" PRIu64 ")",
 profile_quality_as_string (bb->count.quality ()),
 bb->count.value ());
  fprintf (file, ")\n%s (", function_name (fun));
-- 
2.32.0



Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-15 Thread Richard Biener
On Thu, 15 Jul 2021, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Thu, 15 Jul 2021, Hongtao Liu wrote:
> >
> >> On Thu, Jul 15, 2021 at 6:45 PM Richard Biener via Gcc-patches
> >>  wrote:
> >> >
> >> > On Thu, Jul 15, 2021 at 12:30 PM Richard Biener  
> >> > wrote:
> >> > >
> >> > > The following extends the existing loop masking support using
> >> > > SVE WHILE_ULT to x86 by providing an alternate way to produce the
> >> > > mask using VEC_COND_EXPRs.  So with --param vect-partial-vector-usage
> >> > > you can now enable masked vectorized epilogues (=1) or fully
> >> > > masked vector loops (=2).
> >> > >
> >> > > What's missing is using a scalar IV for the loop control
> >> > > (but in principle AVX512 can use the mask here - just the patch
> >> > > doesn't seem to work for AVX512 yet for some reason - likely
> >> > > expand_vec_cond_expr_p doesn't work there).  What's also missing
> >> > > is providing more support for predicated operations in the case
> >> > > of reductions either via VEC_COND_EXPRs or via implementing
> >> > > some of the .COND_{ADD,SUB,MUL...} internal functions as mapping
> >> > > to masked AVX512 operations.
> >> > >
> >> > > For AVX2 and
> >> > >
> >> > > int foo (unsigned *a, unsigned * __restrict b, int n)
> >> > > {
> >> > >   unsigned sum = 1;
> >> > >   for (int i = 0; i < n; ++i)
> >> > > b[i] += a[i];
> >> > >   return sum;
> >> > > }
> >> > >
> >> > > we get
> >> > >
> >> > > .L3:
> >> > > vpmaskmovd  (%rsi,%rax), %ymm0, %ymm3
> >> > > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm1
> >> > > addl$8, %edx
> >> > > vpaddd  %ymm3, %ymm1, %ymm1
> >> > > vpmaskmovd  %ymm1, %ymm0, (%rsi,%rax)
> >> > > vmovd   %edx, %xmm1
> >> > > vpsubd  %ymm15, %ymm2, %ymm0
> >> > > addq$32, %rax
> >> > > vpbroadcastd%xmm1, %ymm1
> >> > > vpaddd  %ymm4, %ymm1, %ymm1
> >> > > vpsubd  %ymm15, %ymm1, %ymm1
> >> > > vpcmpgtd%ymm1, %ymm0, %ymm0
> >> > > vptest  %ymm0, %ymm0
> >> > > jne .L3
> >> > >
> >> > > for the fully masked loop body and for the masked epilogue
> >> > > we see
> >> > >
> >> > > .L4:
> >> > > vmovdqu (%rsi,%rax), %ymm3
> >> > > vpaddd  (%rdi,%rax), %ymm3, %ymm0
> >> > > vmovdqu %ymm0, (%rsi,%rax)
> >> > > addq$32, %rax
> >> > > cmpq%rax, %rcx
> >> > > jne .L4
> >> > > movl%edx, %eax
> >> > > andl$-8, %eax
> >> > > testb   $7, %dl
> >> > > je  .L11
> >> > > .L3:
> >> > > subl%eax, %edx
> >> > > vmovdqa .LC0(%rip), %ymm1
> >> > > salq$2, %rax
> >> > > vmovd   %edx, %xmm0
> >> > > movl$-2147483648, %edx
> >> > > addq%rax, %rsi
> >> > > vmovd   %edx, %xmm15
> >> > > vpbroadcastd%xmm0, %ymm0
> >> > > vpbroadcastd%xmm15, %ymm15
> >> > > vpsubd  %ymm15, %ymm1, %ymm1
> >> > > vpsubd  %ymm15, %ymm0, %ymm0
> >> > > vpcmpgtd%ymm1, %ymm0, %ymm0
> >> > > vpmaskmovd  (%rsi), %ymm0, %ymm1
> >> > > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm2
> >> > > vpaddd  %ymm2, %ymm1, %ymm1
> >> > > vpmaskmovd  %ymm1, %ymm0, (%rsi)
> >> > > .L11:
> >> > > vzeroupper
> >> > >
> >> > > compared to
> >> > >
> >> > > .L3:
> >> > > movl%edx, %r8d
> >> > > subl%eax, %r8d
> >> > > leal-1(%r8), %r9d
> >> > > cmpl$2, %r9d
> >> > > jbe .L6
> >> > > leaq(%rcx,%rax,4), %r9
> >> > > vmovdqu (%rdi,%rax,4), %xmm2
> >> > > movl%r8d, %eax
> >> > > andl$-4, %eax
> >> > > vpaddd  (%r9), %xmm2, %xmm0
> >> > > addl%eax, %esi
> >> > > andl$3, %r8d
> >> > > vmovdqu %xmm0, (%r9)
> >> > > je  .L2
> >> > > .L6:
> >> > > movslq  %esi, %r8
> >> > > leaq0(,%r8,4), %rax
> >> > > movl(%rdi,%r8,4), %r8d
> >> > > addl%r8d, (%rcx,%rax)
> >> > > leal1(%rsi), %r8d
> >> > > cmpl%r8d, %edx
> >> > > jle .L2
> >> > > addl$2, %esi
> >> > > movl4(%rdi,%rax), %r8d
> >> > > addl%r8d, 4(%rcx,%rax)
> >> > > cmpl%esi, %edx
> >> > > jle .L2
> >> > > movl8(%rdi,%rax), %edx
> >> > > addl%edx, 8(%rcx,%rax)
> >> > > .L2:
> >> > >
> >> > > I'm giving this a little testing right now but will dig on why
> >> > > I don't get masked loops when AVX512 is enabled.
> >> >
> >> > Ah, a simple thinko - rgroup_controls vectypes seem to be
> >> > always VECTOR_BOOLEAN_TYPE_P and thus we can
> >> > use expand_vec_cmp_expr_p.  The AVX512 fully masked
> >> > loop then looks like
> >> >
> >> > .L3:
> >> > vmovdqu32   (%rsi,%rax,4), %ymm2{%k1}
> >> > vmovdqu32   (%rdi,%rax,4), %ymm1{%k1}
> >> > vpaddd  

RFA: Libiberty: Fix stack exhaunstion demangling corrupt rust names

2021-07-15 Thread Nick Clifton via Gcc-patches

Hi Guys,

  Attached is a proposed patch to fix PR 99935 and 100968, both
  of which are stack exhaustion problems in libiberty's Rust
  demangler.  The patch adds a recursion limit along the lines
  of the one already in place for the C++ demangler.

  OK to apply ?

Cheers
  Nick
diff --git a/libiberty/rust-demangle.c b/libiberty/rust-demangle.c
index 6fd8f6a4db0..df09b7b8fdd 100644
--- a/libiberty/rust-demangle.c
+++ b/libiberty/rust-demangle.c
@@ -74,6 +74,12 @@ struct rust_demangler
   /* Rust mangling version, with legacy mangling being -1. */
   int version;
 
+  /* Recursion depth.  */
+  uint recursion;
+  /* Maximum number of times demangle_path may be called recursively.  */
+#define RUST_MAX_RECURSION_COUNT  1024
+#define RUST_NO_RECURSION_LIMIT   ((uint) -1)
+
   uint64_t bound_lifetime_depth;
 };
 
@@ -671,6 +677,15 @@ demangle_path (struct rust_demangler *rdm, int in_value)
   if (rdm->errored)
 return;
 
+  if (rdm->recursion != RUST_NO_RECURSION_LIMIT)
+{
+  ++ rdm->recursion;
+  if (rdm->recursion > RUST_MAX_RECURSION_COUNT)
+	/* FIXME: There ought to be a way to report
+	   that the recursion limit has been reached.  */
+	goto fail_return;
+}
+
   switch (tag = next (rdm))
 {
 case 'C':
@@ -688,10 +703,7 @@ demangle_path (struct rust_demangler *rdm, int in_value)
 case 'N':
   ns = next (rdm);
   if (!ISLOWER (ns) && !ISUPPER (ns))
-{
-  rdm->errored = 1;
-  return;
-}
+	goto fail_return;
 
   demangle_path (rdm, in_value);
 
@@ -776,9 +788,15 @@ demangle_path (struct rust_demangler *rdm, int in_value)
 }
   break;
 default:
-  rdm->errored = 1;
-  return;
+  goto fail_return;
 }
+  goto pass_return;
+
+ fail_return:
+  rdm->errored = 1;
+ pass_return:
+  if (rdm->recursion != RUST_NO_RECURSION_LIMIT)
+    -- rdm->recursion;
 }
 
 static void
@@ -1317,6 +1338,7 @@ rust_demangle_callback (const char *mangled, int options,
   rdm.skipping_printing = 0;
   rdm.verbose = (options & DMGL_VERBOSE) != 0;
   rdm.version = 0;
+  rdm.recursion = (options & DMGL_NO_RECURSE_LIMIT) ? RUST_NO_RECURSION_LIMIT : 0;
   rdm.bound_lifetime_depth = 0;
 
   /* Rust symbols always start with _R (v0) or _ZN (legacy). */


Re: [PATCH 2/2] Backwards jump threader rewrite with ranger.

2021-07-15 Thread Aldy Hernandez via Gcc-patches
As mentioned in my previous email, these are some minor changes to the
previous revision.  All I'm changing here is the call into the solver
to use range_of_expr and range_of_stmt.  Everything else remains the
same.

Tested on x86-64 Linux.

On Mon, Jul 5, 2021 at 5:39 PM Aldy Hernandez  wrote:
>
> PING.
>
> Aldy
From 1774338ddd1f4718884e766aae2fc48b97110c5d Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Tue, 15 Jun 2021 12:32:51 +0200
Subject: [PATCH 3/5] Backwards jump threader rewrite with ranger.

This is a rewrite of the backwards threader with a ranger based solver.

The code is divided into two parts: the path solver in
gimple-range-path.*, and the path discovery bits in
tree-ssa-threadbackward.c.

The legacy code is still available with --param=threader-mode=legacy,
but will be removed shortly after.

gcc/ChangeLog:

	* Makefile.in (tree-ssa-loop-im.o-warn): New.
	* flag-types.h (enum threader_mode): New.
	* params.opt: Add entry for --param=threader-mode.
	* tree-ssa-threadbackward.c (THREADER_ITERATIVE_MODE): New.
	(class back_threader): New.
	(back_threader::back_threader): New.
	(back_threader::~back_threader): New.
	(back_threader::maybe_register_path): New.
	(back_threader::find_taken_edge): New.
	(back_threader::find_taken_edge_switch): New.
	(back_threader::find_taken_edge_cond): New.
	(back_threader::resolve_def): New.
	(back_threader::resolve_phi): New.
	(back_threader::find_paths_to_names): New.
	(back_threader::find_paths): New.
	(dump_path): New.
	(debug): New.
	(thread_jumps::find_jump_threads_backwards): Call ranger threader.
	(thread_jumps::find_jump_threads_backwards_with_ranger): New.
	(pass_thread_jumps::execute): Abstract out code...
	(try_thread_blocks): ...here.
	* tree-ssa-threadedge.c (jump_threader::thread_outgoing_edges):
	Abstract out threading candidate code to...
	(single_succ_to_potentially_threadable_block): ...here.
	* tree-ssa-threadedge.h (single_succ_to_potentially_threadable_block):
	New.
	* tree-ssa-threadupdate.c (register_jump_thread): Return boolean.
	* tree-ssa-threadupdate.h (class jump_thread_path_registry):
	Return bool from register_jump_thread.

libgomp/ChangeLog:

	* testsuite/libgomp.graphite/force-parallel-4.c: Adjust for
	threader.
	* testsuite/libgomp.graphite/force-parallel-8.c: Same.

gcc/testsuite/ChangeLog:

	* g++.dg/debug/dwarf2/deallocator.C: Adjust for threader.
	* gcc.c-torture/compile/pr83510.c: Same.
	* gcc.dg/loop-unswitch-2.c: Same.
	* gcc.dg/old-style-asm-1.c: Same.
	* gcc.dg/pr68317.c: Same.
	* gcc.dg/pr97567-2.c: Same.
	* gcc.dg/predict-9.c: Same.
	* gcc.dg/shrink-wrap-loop.c: Same.
	* gcc.dg/sibcall-1.c: Same.
	* gcc.dg/tree-ssa/builtin-sprintf-3.c: Same.
	* gcc.dg/tree-ssa/pr21001.c: Same.
	* gcc.dg/tree-ssa/pr21294.c: Same.
	* gcc.dg/tree-ssa/pr21417.c: Same.
	* gcc.dg/tree-ssa/pr21458-2.c: Same.
	* gcc.dg/tree-ssa/pr21563.c: Same.
	* gcc.dg/tree-ssa/pr49039.c: Same.
	* gcc.dg/tree-ssa/pr61839_1.c: Same.
	* gcc.dg/tree-ssa/pr61839_3.c: Same.
	* gcc.dg/tree-ssa/pr77445-2.c: Same.
	* gcc.dg/tree-ssa/split-path-4.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
	* gcc.dg/tree-ssa/ssa-fre-48.c: Same.
	* gcc.dg/tree-ssa/ssa-thread-11.c: Same.
	* gcc.dg/tree-ssa/ssa-thread-12.c: Same.
	* gcc.dg/tree-ssa/ssa-thread-14.c: Same.
	* gcc.dg/tree-ssa/vrp02.c: Same.
	* gcc.dg/tree-ssa/vrp03.c: Same.
	* gcc.dg/tree-ssa/vrp05.c: Same.
	* gcc.dg/tree-ssa/vrp06.c: Same.
	* gcc.dg/tree-ssa/vrp07.c: Same.
	* gcc.dg/tree-ssa/vrp09.c: Same.
	* gcc.dg/tree-ssa/vrp19.c: Same.
	* gcc.dg/tree-ssa/vrp20.c: Same.
	* gcc.dg/tree-ssa/vrp33.c: Same.
	* gcc.dg/uninit-pred-9_b.c: Same.
	* gcc.dg/vect/bb-slp-16.c: Same.
	* gcc.target/i386/avx2-vect-aggressive.c: Same.
	* gcc.dg/tree-ssa/ranger-threader-1.c: New test.
	* gcc.dg/tree-ssa/ranger-threader-2.c: New test.
	* gcc.dg/tree-ssa/ranger-threader-3.c: New test.
	* gcc.dg/tree-ssa/ranger-threader-4.c: New test.
	* gcc.dg/tree-ssa/ranger-threader-5.c: New test.
---
 gcc/Makefile.in   |   5 +
 gcc/flag-types.h  |   7 +
 gcc/params.opt|  17 +
 .../g++.dg/debug/dwarf2/deallocator.C |   3 +-
 gcc/testsuite/gcc.c-torture/compile/pr83510.c |  33 ++
 gcc/testsuite/gcc.dg/loop-unswitch-2.c|   2 +-
 gcc/testsuite/gcc.dg/old-style-asm-1.c|   5 +-
 gcc/testsuite/gcc.dg/pr68317.c|   4 +-
 gcc/testsuite/gcc.dg/pr97567-2.c  |   2 +-
 gcc/testsuite/gcc.dg/predict-9.c  |   4 +-
 gcc/testsuite/gcc.dg/shrink-wrap-loop.c   |  53 ++
 gcc/testsuite/gcc.dg/sibcall-1.c  |  10 +
 .../gcc.dg/tree-ssa/builtin-sprintf-3.c   |  25 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21001.c   |   1 +
 gcc/testsuite/gcc.dg/tree-ssa/pr21294.c  

Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-15 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Thu, 15 Jul 2021, Hongtao Liu wrote:
>
>> On Thu, Jul 15, 2021 at 6:45 PM Richard Biener via Gcc-patches
>>  wrote:
>> >
>> > On Thu, Jul 15, 2021 at 12:30 PM Richard Biener  wrote:
>> > >
>> > > The following extends the existing loop masking support using
>> > > SVE WHILE_ULT to x86 by providing an alternate way to produce the
>> > > mask using VEC_COND_EXPRs.  So with --param vect-partial-vector-usage
>> > > you can now enable masked vectorized epilogues (=1) or fully
>> > > masked vector loops (=2).
>> > >
>> > > What's missing is using a scalar IV for the loop control
>> > > (but in principle AVX512 can use the mask here - just the patch
>> > > doesn't seem to work for AVX512 yet for some reason - likely
>> > > expand_vec_cond_expr_p doesn't work there).  What's also missing
>> > > is providing more support for predicated operations in the case
>> > > of reductions either via VEC_COND_EXPRs or via implementing
>> > > some of the .COND_{ADD,SUB,MUL...} internal functions as mapping
>> > > to masked AVX512 operations.
>> > >
>> > > For AVX2 and
>> > >
>> > > int foo (unsigned *a, unsigned * __restrict b, int n)
>> > > {
>> > >   unsigned sum = 1;
>> > >   for (int i = 0; i < n; ++i)
>> > > b[i] += a[i];
>> > >   return sum;
>> > > }
>> > >
>> > > we get
>> > >
>> > > .L3:
>> > > vpmaskmovd  (%rsi,%rax), %ymm0, %ymm3
>> > > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm1
>> > > addl$8, %edx
>> > > vpaddd  %ymm3, %ymm1, %ymm1
>> > > vpmaskmovd  %ymm1, %ymm0, (%rsi,%rax)
>> > > vmovd   %edx, %xmm1
>> > > vpsubd  %ymm15, %ymm2, %ymm0
>> > > addq$32, %rax
>> > > vpbroadcastd%xmm1, %ymm1
>> > > vpaddd  %ymm4, %ymm1, %ymm1
>> > > vpsubd  %ymm15, %ymm1, %ymm1
>> > > vpcmpgtd%ymm1, %ymm0, %ymm0
>> > > vptest  %ymm0, %ymm0
>> > > jne .L3
>> > >
>> > > for the fully masked loop body and for the masked epilogue
>> > > we see
>> > >
>> > > .L4:
>> > > vmovdqu (%rsi,%rax), %ymm3
>> > > vpaddd  (%rdi,%rax), %ymm3, %ymm0
>> > > vmovdqu %ymm0, (%rsi,%rax)
>> > > addq$32, %rax
>> > > cmpq%rax, %rcx
>> > > jne .L4
>> > > movl%edx, %eax
>> > > andl$-8, %eax
>> > > testb   $7, %dl
>> > > je  .L11
>> > > .L3:
>> > > subl%eax, %edx
>> > > vmovdqa .LC0(%rip), %ymm1
>> > > salq$2, %rax
>> > > vmovd   %edx, %xmm0
>> > > movl$-2147483648, %edx
>> > > addq%rax, %rsi
>> > > vmovd   %edx, %xmm15
>> > > vpbroadcastd%xmm0, %ymm0
>> > > vpbroadcastd%xmm15, %ymm15
>> > > vpsubd  %ymm15, %ymm1, %ymm1
>> > > vpsubd  %ymm15, %ymm0, %ymm0
>> > > vpcmpgtd%ymm1, %ymm0, %ymm0
>> > > vpmaskmovd  (%rsi), %ymm0, %ymm1
>> > > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm2
>> > > vpaddd  %ymm2, %ymm1, %ymm1
>> > > vpmaskmovd  %ymm1, %ymm0, (%rsi)
>> > > .L11:
>> > > vzeroupper
>> > >
>> > > compared to
>> > >
>> > > .L3:
>> > > movl%edx, %r8d
>> > > subl%eax, %r8d
>> > > leal-1(%r8), %r9d
>> > > cmpl$2, %r9d
>> > > jbe .L6
>> > > leaq(%rcx,%rax,4), %r9
>> > > vmovdqu (%rdi,%rax,4), %xmm2
>> > > movl%r8d, %eax
>> > > andl$-4, %eax
>> > > vpaddd  (%r9), %xmm2, %xmm0
>> > > addl%eax, %esi
>> > > andl$3, %r8d
>> > > vmovdqu %xmm0, (%r9)
>> > > je  .L2
>> > > .L6:
>> > > movslq  %esi, %r8
>> > > leaq0(,%r8,4), %rax
>> > > movl(%rdi,%r8,4), %r8d
>> > > addl%r8d, (%rcx,%rax)
>> > > leal1(%rsi), %r8d
>> > > cmpl%r8d, %edx
>> > > jle .L2
>> > > addl$2, %esi
>> > > movl4(%rdi,%rax), %r8d
>> > > addl%r8d, 4(%rcx,%rax)
>> > > cmpl%esi, %edx
>> > > jle .L2
>> > > movl8(%rdi,%rax), %edx
>> > > addl%edx, 8(%rcx,%rax)
>> > > .L2:
>> > >
>> > > I'm giving this a little testing right now but will dig on why
>> > > I don't get masked loops when AVX512 is enabled.
>> >
>> > Ah, a simple thinko - rgroup_controls vectypes seem to be
>> > always VECTOR_BOOLEAN_TYPE_P and thus we can
>> > use expand_vec_cmp_expr_p.  The AVX512 fully masked
>> > loop then looks like
>> >
>> > .L3:
>> > vmovdqu32   (%rsi,%rax,4), %ymm2{%k1}
>> > vmovdqu32   (%rdi,%rax,4), %ymm1{%k1}
>> > vpaddd  %ymm2, %ymm1, %ymm0
>> > vmovdqu32   %ymm0, (%rsi,%rax,4){%k1}
>> > addq$8, %rax
>> > vpbroadcastd%eax, %ymm0
>> > vpaddd  %ymm4, %ymm0, %ymm0
>> > vpcmpud $6, %ymm0, %ymm3, %k1
>> > kortestb%k1, %k1
>> > jne .L3
>> >
>> > I guess for x86 

Re: [PATCH 1/2] Implement basic block path solver.

2021-07-15 Thread Aldy Hernandez via Gcc-patches
Jeff has mentioned that it'll take a while longer to review the
threader rewrite, so I've decided to make some minor cleanups while he
gets to it.

There are few minor changes here:

1. I've renamed the solver to gimple-range-path.* which expresses
better that it's part of the ranger tools. The prefix tree-ssa-* is
somewhat outdated ;-).

2. I've made the solver a full-blown range_query, which can be passed
around anywhere a range_query is accepted.  It turns out, we were 99%
of the way there, so might as well share the same API.  Now users will
be able use range_of_expr, range_of_stmt, and friends.  This can come
in handy when passing a range_query to something like
simplify_using_ranges, something which I am considering for my
follow-up changes to the DOM threader.

3. Finally, I've renamed the class to path_range_query to make it
obvious that it's a range_query object.

There are no functional changes.

Tested on x86-64 Linux.

I will wait on Jeff's review of the tree-ssa-threadbackward.* changes
before committing this.

Aldy

On Fri, Jul 2, 2021 at 3:17 PM Andrew MacLeod  wrote:
>
> On 7/2/21 4:13 AM, Aldy Hernandez wrote:
>
> +
> +// Return the range of STMT as it would be seen at the end of the path
> +// being analyzed.  Anything but the final conditional in a BB will
> +// return VARYING.
> +
> +void
> +path_solver::range_in_path (irange &r, gimple *stmt)
> +{
> +  if (gimple_code (stmt) == GIMPLE_COND && fold_range (r, stmt, this))
> +return;
> +
> +  r.set_varying (gimple_expr_type (stmt));
> +}
>
> Not objecting to anything here other than to note that I think we have cases 
> where there's a COND_EXPR on the RHS of statements within a block.  We're (in 
> general) not handling those well in DOM or jump threading.
>
>
> I guess I can put that on my TODO list :).
>
> note that we are no longer in the days of range-ops only processing...   
> fold_range handles COND_EXPR (and every other kind of stmt)  just fine.
>
> Andrew
From bb2d12abf7bab6306a38e143aed0f0a828f1c790 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Tue, 15 Jun 2021 12:20:43 +0200
Subject: [PATCH 2/5] Implement basic block path solver.

This is the main basic block path solver for use in the ranger-based
backwards threader.  Given a path of BBs, the class can solve the final
conditional or any SSA name used in calculating the final conditional.

gcc/ChangeLog:

* Makefile.in (OBJS): Add gimple-range-path.o.
	* gimple-range-path.cc: New file.
	* gimple-range-path.h: New file.
---
 gcc/Makefile.in  |   1 +
 gcc/gimple-range-path.cc | 327 +++
 gcc/gimple-range-path.h  |  85 ++
 3 files changed, 413 insertions(+)
 create mode 100644 gcc/gimple-range-path.cc
 create mode 100644 gcc/gimple-range-path.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 934b2a05327..863f1256811 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1649,6 +1649,7 @@ OBJS = \
 	tree-ssa-loop.o \
 	tree-ssa-math-opts.o \
 	tree-ssa-operands.o \
+	gimple-range-path.o \
 	tree-ssa-phiopt.o \
 	tree-ssa-phiprop.o \
 	tree-ssa-pre.o \
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
new file mode 100644
index 000..dd7c5342d8b
--- /dev/null
+++ b/gcc/gimple-range-path.cc
@@ -0,0 +1,327 @@
+/* Basic block path solver.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   Contributed by Aldy Hernandez .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "cfganal.h"
+#include "value-range.h"
+#include "gimple-range.h"
+#include "tree-pretty-print.h"
+#include "gimple-range-path.h"
+#include "ssa.h"
+
+// Internal construct to help facilitate debugging of solver.
+#define DEBUG_SOLVER (0 && dump_file)
+
+path_range_query::path_range_query (gimple_ranger &ranger)
+  : m_ranger (ranger)
+{
+  m_cache = new ssa_global_cache;
+  m_has_cache_entry = BITMAP_ALLOC (NULL);
+  m_path = NULL;
+}
+
+path_range_query::~path_range_query ()
+{
+  BITMAP_FREE (m_has_cache_entry);
+  delete m_cache;
+}
+
+// Mark cache entry for NAME as unused.
+
+void
+path_range_query::clear_cache (tree name)
+{
+  unsigned v = SSA_NAME_VERSION (name);
+  bitmap_clear_bit (m_has_cache_entry, v);
+}
+
+// If NAME has a cache entry, return it in R, and return TRUE.
+
+inline bool

Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-15 Thread Qing Zhao via Gcc-patches


> On Jul 15, 2021, at 9:16 AM, Qing Zhao via Gcc-patches 
>  wrote:
> 
>> 
 
 Note that I think .DEFERRED_INIT can be elided for variables that do
 not have their address
 taken - otherwise we'll also have to worry about aggregate copy
 initialization and SRA
 decomposing the copy, initializing only the used parts.
>>> 
>>> Please explain this a little bit more.
>> 
>> For sth like
>> 
>> struct S { int i; long j; };
>> 
>> void bar (struct S);
>> struct S
>> foo (struct S *p)
>> {
>> struct S q = *p;
>> struct S r = q;
>> bar (r);
>> return r;
>> }
>> 
>> we don't get a .DEFERRED_INIT for 'r' (do we?)
> 
> No, we don’t emit .DEFERRED_INIT for either ‘q’ or ‘r’, since both are 
> explicitly initialized.

Another thought on this example:

I think for the auto variables ‘q’ and ‘r’ in function ‘foo’, the 
initialization depends on the incoming parameter ‘p’.

If ‘p’ is an auto variable in ‘foo’s caller, then the incoming parameter should 
be fully initialized in the caller, including its padding. 

So, I don’t think that we need to worry about such a situation. 

If every function guarantees that all of its own auto-variables are initialized 
completely, including the padding, then all such copy initialization through 
parameters is guaranteed to be complete as well. 

Let me know if I miss anything here.

Qing
> With the current 4th patch, the paddings inside this structure variable is 
> not initialized.
> 
> However, if we “clear” the whole structure in "gimplify_init_constructor “, 
> the initialization might happen. I will check on this.
> 
> and SRA decomposes the init to
>> 
>> 
>>  :
>> q = *p_2(D);
>> q$i_9 = p_2(D)->i;
>> q$j_10 = p_2(D)->j;
>> r.i = q$i_9;
>> r.j = q$j_10;
>> bar (r);
>> D.1953 = r;
>> r ={v} {CLOBBER};
>> return D.1953;
>> 
>> which leaves its padding uninitialized.  Hmm, and that even happens when
>> you make bar take struct S * and thus pass the address of 'r' to bar.
> 
> Will try this example and see how to resolve this issue.
> 
> Thanks for your explanation.
> 
> Qing
>> 
>> Richard.
>> 
>> 
>>> Thanks.
>>> 
>>> Qing
 
 Richard.
 
> Thanks.
> 
> Qing
> 
>> On Jul 13, 2021, at 4:29 PM, Kees Cook  wrote:
>> 
>> On Mon, Jul 12, 2021 at 08:28:55PM +, Qing Zhao wrote:
 On Jul 12, 2021, at 12:56 PM, Kees Cook  wrote:
 On Wed, Jul 07, 2021 at 05:38:02PM +, Qing Zhao wrote:
> This is the 4th version of the patch for the new security feature for 
> GCC.
 
 It looks like padding initialization has regressed to where things 
 were
 in version 1[1] (it was, however, working in version 2[2]). I'm seeing
 these failures again in the kernel self-test:
 
 test_stackinit: small_hole_static_all FAIL (uninit bytes: 3)
 test_stackinit: big_hole_static_all FAIL (uninit bytes: 61)
 test_stackinit: trailing_hole_static_all FAIL (uninit bytes: 7)
 test_stackinit: small_hole_dynamic_all FAIL (uninit bytes: 3)
 test_stackinit: big_hole_dynamic_all FAIL (uninit bytes: 61)
 test_stackinit: trailing_hole_dynamic_all FAIL (uninit bytes: 7)
>>> 
>>> Are the above failures for -ftrivial-auto-var-init=zero or 
>>> -ftrivial-auto-var-init=pattern?  Or both?
>> 
>> Yes, I was only testing =zero (the kernel test handles =pattern as well:
>> it doesn't explicitly test for 0x00). I've verified with =pattern now,
>> too.
>> 
>>> For the current implementation, I believe that all paddings should be 
>>> initialized with this option,
>>> for -ftrivial-auto-var-init=zero, the padding will be initialized to 
>>> zero as before, however, for
>>> -ftrivial-auto-var-init=pattern, the padding will be initialized to 
>>> 0xFE byte-repeatable patterns.
>> 
>> I've double-checked that I'm using the right gcc, with the flag.
>> 
 
 In looking at the gcc test cases, I think the wrong thing is
 being checked: we want to verify the padding itself. For example,
 in auto-init-17.c, the actual bytes after "four" need to be checked,
 rather than "four" itself.
>>> 
>>> **For the current auto-init-17.c
>>> 
>>> /* Verify zero initialization for array type with structure element
>>>with padding.  */
>>> /* { dg-do compile } */
>>> /* { dg-options "-ftrivial-auto-var-init=zero" } */
>>>
>>> struct test_trailing_hole {
>>>   int one;
>>>   int two;
>>>   int three;
>>>   char four;
>>>   /* "sizeof(unsigned long) - 1" byte padding hole here. */
>>> };
>>>
>>> int foo ()
>>> {
>>>   struct test_trailing_hole var[10];
>>>   return var[2].four;
>>> }
>>>
>>> /* { dg-final { scan-assembler "movl\t\\\$0," } } */
>>> /* { 

Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-15 Thread Qing Zhao via Gcc-patches
Hi, Richard,

> On Jul 15, 2021, at 2:56 AM, Richard Biener  
> wrote:
> 
>>> On Wed, Jul 14, 2021 at 1:17 AM Qing Zhao  wrote:
 
 Hi, Kees,
 
 I took a look at the kernel testing case you attached in the previous 
 email, and found the testing failed with the following case:
 
 #define INIT_STRUCT_static_all  = { .one = arg->one,\
   .two = arg->two,\
   .three = arg->three,\
   .four = arg->four,  \
   }
 
 i.e., when the structure type auto variable has been explicitly initialized 
 in the source code.  -ftrivial-auto-var-init in the 4th version
 does not initialize the paddings for such variables.
 
 But in the previous version of the patches ( 2 or 3), 
 -ftrivial-auto-var-init initializes the paddings for such variables.
 
 I intended to remove this part of the code from the 4th version of the 
 patch, since in this version initializing such paddings is implemented 
 completely differently from initializing the whole structure with a 
 single memset.
 
 If we really need this functionality, I will add it in a separate 
 follow-up patch rather than in this one.
 
 Richard, what’s your comment and suggestions on this?
>>> 
>>> I think this can be addressed in the gimplifier by adjusting
>>> gimplify_init_constructor to clear
>>> the object before the initialization (if it's not done via aggregate
>>> copying).
>> 
>> I did this in the previous versions of the patch like the following:
>> 
>> @@ -5001,6 +5185,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
>> *pre_p, gimple_seq *post_p,
>>  /* If a single access to the target must be ensured and all elements
>> are zero, then it's optimal to clear whatever their number.  */
>>  cleared = true;
>> +   else if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
>> +&& !TREE_STATIC (object)
>> +&& type_has_padding (type))
>> + /* If the user requests to initialize automatic variables with
>> +paddings inside the type, we should initialize the paddings too.
>> +C guarantees that brace-init with fewer initializers than 
>> members
>> +aggregate will initialize the rest of the aggregate as-if it 
>> were
>> +static initialization.  In turn static initialization guarantees
>> +that pad is initialized to zero bits.
>> +So, it's better to clear the whole record under such situation. 
>>  */
>> + cleared = true;
>>else
>>  cleared = false;
>> 
>> Then the paddings are also initialized to zeroes with this option. (Even for 
>> -ftrivial-auto-var-init=pattern).
>> 
>> Is the above change Okay? (With this change, when 
>> -ftrivial-auto-var-init=pattern, the paddings for the
>> structure variables that have explicit initializer will be ZEROed, not 0xFE)
> 
> I guess that would be the simplest way, yes.
> 
>>> The clearing
>>> could be done via .DEFERRED_INIT.
>> 
>> You mean to add additional calls to .DEFERRED_INIT for each individual 
>> padding of the structure in “gimplify_init_constructor"?
>> Then  later during RTL expand, expand these calls the same as other calls?
> 
> No, I actually meant to in your patch above set
> 
>defered_padding_init = true;
> 
> and where 'cleared' is processed do sth like
> 
>  if (defered_padding_init)
>.. emit .DEFERRED_INIT for the _whole_ variable ..
>  else if (cleared)
> .. original cleared handling ...
> 
> that would retain the pattern init but possibly be less efficient in the end.

Okay, I see.

Yes, then this will resolve the inconsistent pattern-init issue for paddings. 
I will try this.

> 
>>> 
>>> Note that I think .DEFERRED_INIT can be elided for variables that do
>>> not have their address
>>> taken - otherwise we'll also have to worry about aggregate copy
>>> initialization and SRA
>>> decomposing the copy, initializing only the used parts.
>> 
>> Please explain this a little bit more.
> 
> For sth like
> 
> struct S { int i; long j; };
> 
> void bar (struct S);
> struct S
> foo (struct S *p)
> {
>  struct S q = *p;
>  struct S r = q;
>  bar (r);
>  return r;
> }
> 
> we don't get a .DEFERRED_INIT for 'r' (do we?)

No, we don’t emit .DEFERRED_INIT for either ‘q’ or ‘r’, since they are both 
explicitly initialized.
With the current 4th patch, the paddings inside this structure variable are not 
initialized.

However, if we “clear” the whole structure in "gimplify_init_constructor", the 
initialization might happen. I will check on this.

> and SRA decomposes the init to
> 
> 
>   :
>  q = *p_2(D);
>  q$i_9 = p_2(D)->i;
>  q$j_10 = p_2(D)->j;
>  r.i = 

Re: rs6000: Generate an lxvp instead of two adjacent lxv instructions

2021-07-15 Thread Peter Bergner via Gcc-patches
On 7/14/21 4:12 PM, Peter Bergner wrote:
> I'll make the change above and rebuild just to be safe and then commit.

Regtesting was clean as expected, so I pushed the commit to trunk.  Thanks.
Is this ok for backporting to GCC 11 after a day or two on trunk?

Given GCC 10 doesn't have the opaque mode changes, I don't want this in GCC 10.


Peter


Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-15 Thread Richard Biener
On Thu, 15 Jul 2021, Richard Sandiford wrote:

> Richard Biener  writes:
> > The following extends the existing loop masking support using
> > SVE WHILE_ULT to x86 by providing an alternate way to produce the
> > mask using VEC_COND_EXPRs.  So with --param vect-partial-vector-usage
> > you can now enable masked vectorized epilogues (=1) or fully
> > masked vector loops (=2).
> 
> As mentioned on IRC, WHILE_ULT is supposed to ensure that every
> element after the first zero is also zero.  That happens naturally
> for power-of-2 vectors if the start index is a multiple of the VF.
> (And at the moment, variable-length vectors are the only way of
> supporting non-power-of-2 vectors.)
> 
> This probably works fine for =2 and =1 as things stand, since the
> vector IVs always start at zero.  But if in future we have a single
> IV counting scalar iterations, and use it even for peeled prologue
> iterations, we could end up with a situation where the approximation
> is no longer safe.
> 
> E.g. suppose we had a uint32_t scalar IV with a limit of (uint32_t)-3.
> If we peeled 2 iterations for alignment and then had a VF of 8,
> the final vector would have a start index of (uint32_t)-6 and the
> vector would be { -1, -1, -1, 0, 0, 0, -1, -1 }.

Ah, I didn't think of overflow, yeah.  Guess the add of
{ 0, 1, 2, 3 ... } would need to be saturating ;)

> So I think it would be safer to handle this as an alternative to
> using while, rather than as a direct emulation, so that we can take
> the extra restrictions into account.  Alternatively, we could probably
> do { 0, 1, 2, ... } < { end - start, end - start, ... }.

Or this, that looks correct and not worse from a complexity point
of view.

I'll see if I can come up with a testcase and fix even.

Thanks,
Richard.

> Thanks,
> Richard
> 
> 
> 
> >
> > What's missing is using a scalar IV for the loop control
> > (but in principle AVX512 can use the mask here - just the patch
> > doesn't seem to work for AVX512 yet for some reason - likely
> > expand_vec_cond_expr_p doesn't work there).  What's also missing
> > is providing more support for predicated operations in the case
> > of reductions either via VEC_COND_EXPRs or via implementing
> > some of the .COND_{ADD,SUB,MUL...} internal functions as mapping
> > to masked AVX512 operations.
> >
> > For AVX2 and
> >
> > int foo (unsigned *a, unsigned * __restrict b, int n)
> > {
> >   unsigned sum = 1;
> >   for (int i = 0; i < n; ++i)
> > b[i] += a[i];
> >   return sum;
> > }
> >
> > we get
> >
> > .L3:
> > vpmaskmovd  (%rsi,%rax), %ymm0, %ymm3
> > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm1
> > addl$8, %edx
> > vpaddd  %ymm3, %ymm1, %ymm1
> > vpmaskmovd  %ymm1, %ymm0, (%rsi,%rax)
> > vmovd   %edx, %xmm1
> > vpsubd  %ymm15, %ymm2, %ymm0
> > addq$32, %rax
> > vpbroadcastd%xmm1, %ymm1
> > vpaddd  %ymm4, %ymm1, %ymm1
> > vpsubd  %ymm15, %ymm1, %ymm1
> > vpcmpgtd%ymm1, %ymm0, %ymm0
> > vptest  %ymm0, %ymm0
> > jne .L3
> >
> > for the fully masked loop body and for the masked epilogue
> > we see
> >
> > .L4:
> > vmovdqu (%rsi,%rax), %ymm3
> > vpaddd  (%rdi,%rax), %ymm3, %ymm0
> > vmovdqu %ymm0, (%rsi,%rax)
> > addq$32, %rax
> > cmpq%rax, %rcx
> > jne .L4
> > movl%edx, %eax
> > andl$-8, %eax
> > testb   $7, %dl
> > je  .L11
> > .L3:
> > subl%eax, %edx
> > vmovdqa .LC0(%rip), %ymm1
> > salq$2, %rax
> > vmovd   %edx, %xmm0
> > movl$-2147483648, %edx
> > addq%rax, %rsi
> > vmovd   %edx, %xmm15
> > vpbroadcastd%xmm0, %ymm0
> > vpbroadcastd%xmm15, %ymm15
> > vpsubd  %ymm15, %ymm1, %ymm1
> > vpsubd  %ymm15, %ymm0, %ymm0
> > vpcmpgtd%ymm1, %ymm0, %ymm0
> > vpmaskmovd  (%rsi), %ymm0, %ymm1
> > vpmaskmovd  (%rdi,%rax), %ymm0, %ymm2
> > vpaddd  %ymm2, %ymm1, %ymm1
> > vpmaskmovd  %ymm1, %ymm0, (%rsi)
> > .L11:
> > vzeroupper
> >
> > compared to
> >
> > .L3:
> > movl%edx, %r8d
> > subl%eax, %r8d
> > leal-1(%r8), %r9d
> > cmpl$2, %r9d
> > jbe .L6
> > leaq(%rcx,%rax,4), %r9
> > vmovdqu (%rdi,%rax,4), %xmm2
> > movl%r8d, %eax
> > andl$-4, %eax
> > vpaddd  (%r9), %xmm2, %xmm0
> > addl%eax, %esi
> > andl$3, %r8d
> > vmovdqu %xmm0, (%r9)
> > je  .L2
> > .L6:
> > movslq  %esi, %r8
> > leaq0(,%r8,4), %rax
> > movl(%rdi,%r8,4), %r8d
> > addl%r8d, (%rcx,%rax)
> > leal1(%rsi), %r8d
> > cmpl%r8d, %edx
> > jle .L2
> > addl$2, %esi
> > movl4(%rdi,%rax), %r8d
> > addl

Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-15 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> The following extends the existing loop masking support using
> SVE WHILE_ULT to x86 by providing an alternate way to produce the
> mask using VEC_COND_EXPRs.  So with --param vect-partial-vector-usage
> you can now enable masked vectorized epilogues (=1) or fully
> masked vector loops (=2).

As mentioned on IRC, WHILE_ULT is supposed to ensure that every
element after the first zero is also zero.  That happens naturally
for power-of-2 vectors if the start index is a multiple of the VF.
(And at the moment, variable-length vectors are the only way of
supporting non-power-of-2 vectors.)

This probably works fine for =2 and =1 as things stand, since the
vector IVs always start at zero.  But if in future we have a single
IV counting scalar iterations, and use it even for peeled prologue
iterations, we could end up with a situation where the approximation
is no longer safe.

E.g. suppose we had a uint32_t scalar IV with a limit of (uint32_t)-3.
If we peeled 2 iterations for alignment and then had a VF of 8,
the final vector would have a start index of (uint32_t)-6 and the
vector would be { -1, -1, -1, 0, 0, 0, -1, -1 }.

So I think it would be safer to handle this as an alternative to
using while, rather than as a direct emulation, so that we can take
the extra restrictions into account.  Alternatively, we could probably
do { 0, 1, 2, ... } < { end - start, end - start, ... }.

Thanks,
Richard



>
> What's missing is using a scalar IV for the loop control
> (but in principle AVX512 can use the mask here - just the patch
> doesn't seem to work for AVX512 yet for some reason - likely
> expand_vec_cond_expr_p doesn't work there).  What's also missing
> is providing more support for predicated operations in the case
> of reductions either via VEC_COND_EXPRs or via implementing
> some of the .COND_{ADD,SUB,MUL...} internal functions as mapping
> to masked AVX512 operations.
>
> For AVX2 and
>
> int foo (unsigned *a, unsigned * __restrict b, int n)
> {
>   unsigned sum = 1;
>   for (int i = 0; i < n; ++i)
> b[i] += a[i];
>   return sum;
> }
>
> we get
>
> .L3:
> vpmaskmovd  (%rsi,%rax), %ymm0, %ymm3
> vpmaskmovd  (%rdi,%rax), %ymm0, %ymm1
> addl$8, %edx
> vpaddd  %ymm3, %ymm1, %ymm1
> vpmaskmovd  %ymm1, %ymm0, (%rsi,%rax)
> vmovd   %edx, %xmm1
> vpsubd  %ymm15, %ymm2, %ymm0
> addq$32, %rax
> vpbroadcastd%xmm1, %ymm1
> vpaddd  %ymm4, %ymm1, %ymm1
> vpsubd  %ymm15, %ymm1, %ymm1
> vpcmpgtd%ymm1, %ymm0, %ymm0
> vptest  %ymm0, %ymm0
> jne .L3
>
> for the fully masked loop body and for the masked epilogue
> we see
>
> .L4:
> vmovdqu (%rsi,%rax), %ymm3
> vpaddd  (%rdi,%rax), %ymm3, %ymm0
> vmovdqu %ymm0, (%rsi,%rax)
> addq$32, %rax
> cmpq%rax, %rcx
> jne .L4
> movl%edx, %eax
> andl$-8, %eax
> testb   $7, %dl
> je  .L11
> .L3:
> subl%eax, %edx
> vmovdqa .LC0(%rip), %ymm1
> salq$2, %rax
> vmovd   %edx, %xmm0
> movl$-2147483648, %edx
> addq%rax, %rsi
> vmovd   %edx, %xmm15
> vpbroadcastd%xmm0, %ymm0
> vpbroadcastd%xmm15, %ymm15
> vpsubd  %ymm15, %ymm1, %ymm1
> vpsubd  %ymm15, %ymm0, %ymm0
> vpcmpgtd%ymm1, %ymm0, %ymm0
> vpmaskmovd  (%rsi), %ymm0, %ymm1
> vpmaskmovd  (%rdi,%rax), %ymm0, %ymm2
> vpaddd  %ymm2, %ymm1, %ymm1
> vpmaskmovd  %ymm1, %ymm0, (%rsi)
> .L11:
> vzeroupper
>
> compared to
>
> .L3:
> movl%edx, %r8d
> subl%eax, %r8d
> leal-1(%r8), %r9d
> cmpl$2, %r9d
> jbe .L6
> leaq(%rcx,%rax,4), %r9
> vmovdqu (%rdi,%rax,4), %xmm2
> movl%r8d, %eax
> andl$-4, %eax
> vpaddd  (%r9), %xmm2, %xmm0
> addl%eax, %esi
> andl$3, %r8d
> vmovdqu %xmm0, (%r9)
> je  .L2
> .L6:
> movslq  %esi, %r8
> leaq0(,%r8,4), %rax
> movl(%rdi,%r8,4), %r8d
> addl%r8d, (%rcx,%rax)
> leal1(%rsi), %r8d
> cmpl%r8d, %edx
> jle .L2
> addl$2, %esi
> movl4(%rdi,%rax), %r8d
> addl%r8d, 4(%rcx,%rax)
> cmpl%esi, %edx
> jle .L2
> movl8(%rdi,%rax), %edx
> addl%edx, 8(%rcx,%rax)
> .L2:
>
> I'm giving this a little testing right now but will dig on why
> I don't get masked loops when AVX512 is enabled.
>
> Still comments are appreciated.
>
> Thanks,
> Richard.
>
> 2021-07-15  Richard Biener  
>
>   * tree-vect-stmts.c (can_produce_all_loop_masks_p): We
>   also can produce masks with VEC_COND_EXPRs.
>   * tree-vect-loop.c (vect_gen_while): Generate the mask
>   with a VEC_COND_EXPR in 

Re: [PATCH] c++: Optimize away NULLPTR_TYPE comparisons [PR101443]

2021-07-15 Thread Jason Merrill via Gcc-patches

On 7/15/21 3:53 AM, Jakub Jelinek wrote:

Hi!

Comparisons of NULLPTR_TYPE operands cause all kinds of problems in the
middle-end and in fold-const.c, various optimizations assume that if they
see e.g. a non-equality comparison with one of the operands being
INTEGER_CST and it is not INTEGRAL_TYPE_P (which has TYPE_{MIN,MAX}_VALUE),
they can build_int_cst (type, 1) to find a successor.

The following patch fixes it by making sure they don't appear in the IL:
we optimize them away at cp_fold time, as all of them can be folded.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Though, I've just noticed that clang++ rejects the non-equality comparisons
instead, foo () > 0 with
invalid operands to binary expression ('decltype(nullptr)' (aka 'nullptr_t') 
and 'int')
and foo () > nullptr with
invalid operands to binary expression ('decltype(nullptr)' (aka 'nullptr_t') 
and 'nullptr_t')

Shall we reject those too, in addition or instead of parts of this patch?


Yes.


If so, wouldn't this patch still be useful for backports?  I bet we don't
want to start rejecting it on the release branches when we used to accept it.


Sounds good.


2021-07-15  Jakub Jelinek  

PR c++/101443
* cp-gimplify.c (cp_fold): For comparisons with NULLPTR_TYPE
operands, fold them right away to true or false.

* g++.dg/cpp0x/nullptr46.C: New test.

--- gcc/cp/cp-gimplify.c.jj 2021-06-25 10:36:22.141020337 +0200
+++ gcc/cp/cp-gimplify.c2021-07-14 12:04:24.221860756 +0200
@@ -2424,6 +2424,32 @@ cp_fold (tree x)
op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops);
op1 = cp_fold_rvalue (TREE_OPERAND (x, 1));
  
+  /* decltype(nullptr) has only one value, so optimize away all comparisons
+with that type right away, keeping them in the IL causes troubles for
+various optimizations.  */
+  if (COMPARISON_CLASS_P (org_x)
+ && TREE_CODE (TREE_TYPE (op0)) == NULLPTR_TYPE
+ && TREE_CODE (TREE_TYPE (op1)) == NULLPTR_TYPE)
+   {
+ switch (code)
+   {
+   case EQ_EXPR:
+   case LE_EXPR:
+   case GE_EXPR:
+ x = constant_boolean_node (true, TREE_TYPE (x));
+ break;
+   case NE_EXPR:
+   case LT_EXPR:
+   case GT_EXPR:
+ x = constant_boolean_node (false, TREE_TYPE (x));
+ break;
+   default:
+ gcc_unreachable ();
+   }
+ return omit_two_operands_loc (loc, TREE_TYPE (x), x,
+   op0, op1);
+   }
+
if (op0 != TREE_OPERAND (x, 0) || op1 != TREE_OPERAND (x, 1))
{
  if (op0 == error_mark_node || op1 == error_mark_node)
--- gcc/testsuite/g++.dg/cpp0x/nullptr46.C.jj   2021-07-14 11:48:03.917122727 
+0200
+++ gcc/testsuite/g++.dg/cpp0x/nullptr46.C  2021-07-14 11:46:52.261092097 
+0200
@@ -0,0 +1,11 @@
+// PR c++/101443
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2" }
+
+decltype(nullptr) foo ();
+
+bool
+bar ()
+{
+  return foo () > nullptr || foo () < nullptr;
+}

Jakub





[PATCH] Add --enable-first-stage-cross configure option

2021-07-15 Thread Serge Belyshev
Add --enable-first-stage-cross configure option

Build a static-only, C-only compiler that is sufficient to cross-compile
glibc.  This option disables various runtime libraries that require
libc to compile, turns on --with-newlib, --without-headers,
--disable-decimal-float, --disable-shared, --disable-threads, and sets
--enable-languages=c.

Rationale: the current way of building the first-stage compiler of a cross
toolchain requires specifying a list of target libraries that are not
going to be compiled due to their dependency on the target libc.  This
list is not documented in gccinstall.texi and sometimes changes.  To
simplify the procedure, it is better to maintain that list in GCC
itself.

Usage example as a patch to glibc's scripts/build-many-libcs.py:

diff --git a/scripts/build-many-glibcs.py b/scripts/build-many-glibcs.py
index 580d25e8ee..3a6a7be76b 100755
--- a/scripts/build-many-glibcs.py
+++ b/scripts/build-many-glibcs.py
@@ -1446,17 +1446,7 @@ class Config(object):
 # required to define inhibit_libc (to stop some parts of
 # libgcc including libc headers); --without-headers is not
 # sufficient.
-cfg_opts += ['--enable-languages=c', '--disable-shared',
- '--disable-threads',
- '--disable-libatomic',
- '--disable-decimal-float',
- '--disable-libffi',
- '--disable-libgomp',
- '--disable-libitm',
- '--disable-libmpx',
- '--disable-libquadmath',
- '--disable-libsanitizer',
- '--without-headers', '--with-newlib',
+cfg_opts += ['--enable-first-stage-cross',
  '--with-glibc-version=%s' % self.ctx.glibc_version
  ]
 cfg_opts += self.first_gcc_cfg


Bootstrapped/regtested on x86_64-pc-linux-gnu, and
tested with build-many-glibcs.py with the above patch.

OK for mainline?


ChangeLog:

* configure.ac: Add --enable-first-stage-cross.
* configure: Regenerate.

gcc/ChangeLog:

* doc/install.texi: Document --enable-first-stage-cross.
---
 configure| 20 
 configure.ac | 15 +++
 gcc/doc/install.texi |  7 +++
 3 files changed, 42 insertions(+)

diff --git a/configure b/configure
index 85ab9915402..df59036e258 100755
--- a/configure
+++ b/configure
@@ -787,6 +787,7 @@ ac_user_opts='
 enable_option_checking
 with_build_libsubdir
 with_system_zlib
+enable_first_stage_cross
 enable_as_accelerator_for
 enable_offload_targets
 enable_offload_defaulted
@@ -1514,6 +1515,9 @@ Optional Features:
   --disable-option-checking  ignore unrecognized --enable/--with options
   --disable-FEATURE   do not include FEATURE (same as --enable-FEATURE=no)
   --enable-FEATURE[=ARG]  include FEATURE [ARG=yes]
+  --enable-first-stage-cross
+  Build a static-only compiler that is sufficient to
+  build glibc.
   --enable-as-accelerator-for=ARG
   build as offload target compiler. Specify offload
   host triple by ARG
@@ -2961,6 +2965,22 @@ case $is_cross_compiler in
   no) skipdirs="${skipdirs} ${cross_only}" ;;
 esac
 
+# Check whether --enable-first-stage-cross was given.
+if test "${enable_first_stage_cross+set}" = set; then :
+  enableval=$enable_first_stage_cross; ENABLE_FIRST_STAGE_CROSS=$enableval
+else
+  ENABLE_FIRST_STAGE_CROSS=no
+fi
+
+case "${ENABLE_FIRST_STAGE_CROSS}" in
+  yes)
+noconfigdirs="$noconfigdirs target-libatomic target-libquadmath 
target-libgomp target-libssp"
+host_configargs="$host_configargs --disable-shared --disable-threads 
--disable-decimal-float --without-headers --with-newlib"
+target_configargs="$target_configargs --disable-shared"
+enable_languages=c
+;;
+esac
+
 # If both --with-headers and --with-libs are specified, default to
 # --without-newlib.
 if test x"${with_headers}" != x && test x"${with_headers}" != xno \
diff --git a/configure.ac b/configure.ac
index 1df038b04f3..53f920c1a2c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -268,6 +268,21 @@ case $is_cross_compiler in
   no) skipdirs="${skipdirs} ${cross_only}" ;;
 esac
 
+AC_ARG_ENABLE(first-stage-cross,
+[AS_HELP_STRING([--enable-first-stage-cross],
+   [Build a static-only compiler that is
+   sufficient to build glibc.])],
+ENABLE_FIRST_STAGE_CROSS=$enableval,
+ENABLE_FIRST_STAGE_CROSS=no)
+case "${ENABLE_FIRST_STAGE_CROSS}" in
+  yes)
+noconfigdirs="$noconfigdirs target-libatomic target-libquadmath 
target-libgomp target-libssp"
+host_configargs="$host_configargs --disable-shared --disable-threads 
--disable-decimal-float --without-headers --with-newlib"
+target_configargs="$target_configargs --disable-shared"
+enable_languages=c
+;;
+esac
+
 # 

Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-07-15 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 3:23 PM Richard Biener
 wrote:
>
> On Thu, Jul 15, 2021 at 3:21 PM Richard Biener
>  wrote:
> >
> > On Thu, Jul 15, 2021 at 3:16 PM Aldy Hernandez  wrote:
> > >
> > >
> > >
> > > On 7/15/21 3:06 PM, Richard Biener wrote:
> > > > On Thu, Jul 15, 2021 at 1:06 PM Aldy Hernandez  wrote:
> > > >>
> > > >> Well, if we don't adjust gimple_call_return_type() to handle built-ins
> > > >> with no LHS, then we must adjust the callers.
> > > >>
> > > >> The attached patch fixes gimple_expr_type() per its documentation:
> > > >>
> > > >> /* Return the type of the main expression computed by STMT.  Return
> > > >> void_type_node if the statement computes nothing.  */
> > > >>
> > > >> Currently gimple_expr_type is ICEing because it calls 
> > > >> gimple_call_return_type.
> > > >>
> > > >> I still think gimple_call_return_type should return void_type_node
> > > >> instead of ICEing, but this will also fix my problem.
> > > >>
> > > >> Anyone have a problem with this?
> > > >
> > > > It's still somewhat inconsistent, no?  Because for a call without a LHS
> > > > it's now either void_type_node or the type of the return value.
> > > >
> > > > It's probably known I dislike gimple_expr_type itself (it was introduced
> > > > to make the transition to tuples easier).  I wonder why you can't simply
> > > > fix range_of_call to do
> > > >
> > > > tree lhs = gimple_call_lhs (call);
> > > > if (lhs)
> > > >   type = TREE_TYPE (lhs);
> > >
> > > That would still leave gimple_expr_type() broken.  Its comment clearly
> > > says it should return void_type_node.
> >
> > Does it?  What does it say for
> >
> > int foo ();
> >
> > and the stmt
> >
> >  'foo ();'
> >
> > ?  How's this different from
> >
> >  'bar ();'
> >
> > when bar is an internal function?  Note how the comment
> > speaks about 'type of the main EXPRESSION' and
> > 'if the STATEMENT computes nothing' (emphasis mine).
> > I don't think it's all that clear.  A gimple_cond stmt
> > doesn't compute anything, does it?  Does the 'foo ()'
> > statement compute anything?  The current implementation
> > (and your patched one) says so.  But why does
> >
> >  .ADD_OVERFLOW (_1, _2);
> >
> > not (according to your patched implementation)?  It computes
> > something and that something has a type that depends on
> > the types of _1 and _2 and on the actual internal function.
> > But we don't have it readily available.  If you need it then
> > you are on your own - but returning void_type_node is wrong.
>
> That said, in 99% of all cases people should have used
> TREE_TYPE (gimple_get_lhs (stmt)) instead of
> gimple_expr_type since that makes clear that we're
> talking of a result that materializes somewhere.  It also
> makes the required guard obvious - gimple_get_lhs (stmt) != NULL.
>
> Then there are the legacy callers that call it on a GIMPLE_COND
> and the (IMHO broken) ones that expect it to do magic for
> masked loads and stores.

Btw, void_type_node is also wrong for a GIMPLE_ASM with outputs.

I think if you really want to fix the ICEing then return NULL for
"we don't know" and adjust the current default as well.

Richard.

> Richard.
>
> > Richard.
> >
> > > I still think we should just fix gimple_call_return_type to return
> > > void_type_node instead of ICEing.
> > >


Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-07-15 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 3:21 PM Richard Biener
 wrote:
>
> On Thu, Jul 15, 2021 at 3:16 PM Aldy Hernandez  wrote:
> >
> >
> >
> > On 7/15/21 3:06 PM, Richard Biener wrote:
> > > On Thu, Jul 15, 2021 at 1:06 PM Aldy Hernandez  wrote:
> > >>
> > >> Well, if we don't adjust gimple_call_return_type() to handle built-ins
> > >> with no LHS, then we must adjust the callers.
> > >>
> > >> The attached patch fixes gimple_expr_type() per its documentation:
> > >>
> > >> /* Return the type of the main expression computed by STMT.  Return
> > >> void_type_node if the statement computes nothing.  */
> > >>
> > >> Currently gimple_expr_type is ICEing because it calls 
> > >> gimple_call_return_type.
> > >>
> > >> I still think gimple_call_return_type should return void_type_node
> > >> instead of ICEing, but this will also fix my problem.
> > >>
> > >> Anyone have a problem with this?
> > >
> > > It's still somewhat inconsistent, no?  Because for a call without a LHS
> > > it's now either void_type_node or the type of the return value.
> > >
> > > It's probably known I dislike gimple_expr_type itself (it was introduced
> > > to make the transition to tuples easier).  I wonder why you can't simply
> > > fix range_of_call to do
> > >
> > > tree lhs = gimple_call_lhs (call);
> > > if (lhs)
> > >   type = TREE_TYPE (lhs);
> >
> > That would still leave gimple_expr_type() broken.  Its comment clearly
> > says it should return void_type_node.
>
> Does it?  What does it say for
>
> int foo ();
>
> and the stmt
>
>  'foo ();'
>
> ?  How's this different from
>
>  'bar ();'
>
> when bar is an internal function?  Note how the comment
> speaks about 'type of the main EXPRESSION' and
> 'if the STATEMENT computes nothing' (emphasis mine).
> I don't think it's all that clear.  A gimple_cond stmt
> doesn't compute anything, does it?  Does the 'foo ()'
> statement compute anything?  The current implementation
> (and your patched one) says so.  But why does
>
>  .ADD_OVERFLOW (_1, _2);
>
> not (according to your patched implementation)?  It computes
> something and that something has a type that depends on
> the types of _1 and _2 and on the actual internal function.
> But we don't have it readily available.  If you need it then
> you are on your own - but returning void_type_node is wrong.

That said, in 99% of all cases people should have used
TREE_TYPE (gimple_get_lhs (stmt)) instead of
gimple_expr_type since that makes clear that we're
talking of a result that materializes somewhere.  It also
makes the required guard obvious - gimple_get_lhs (stmt) != NULL.

Then there are the legacy callers that call it on a GIMPLE_COND
and the (IMHO broken) ones that expect it to do magic for
masked loads and stores.

Richard.

> Richard.
>
> > I still think we should just fix gimple_call_return_type to return
> > void_type_node instead of ICEing.
> >


Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-07-15 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 3:16 PM Aldy Hernandez  wrote:
>
>
>
> On 7/15/21 3:06 PM, Richard Biener wrote:
> > On Thu, Jul 15, 2021 at 1:06 PM Aldy Hernandez  wrote:
> >>
> >> Well, if we don't adjust gimple_call_return_type() to handle built-ins
> >> with no LHS, then we must adjust the callers.
> >>
> >> The attached patch fixes gimple_expr_type() per its documentation:
> >>
> >> /* Return the type of the main expression computed by STMT.  Return
> >> void_type_node if the statement computes nothing.  */
> >>
> >> Currently gimple_expr_type is ICEing because it calls 
> >> gimple_call_return_type.
> >>
> >> I still think gimple_call_return_type should return void_type_node
> >> instead of ICEing, but this will also fix my problem.
> >>
> >> Anyone have a problem with this?
> >
> > It's still somewhat inconsistent, no?  Because for a call without a LHS
> > it's now either void_type_node or the type of the return value.
> >
> > It's probably known I dislike gimple_expr_type itself (it was introduced
> > to make the transition to tuples easier).  I wonder why you can't simply
> > fix range_of_call to do
> >
> > tree lhs = gimple_call_lhs (call);
> > if (lhs)
> >   type = TREE_TYPE (lhs);
>
> That would still leave gimple_expr_type() broken.  Its comment clearly
> says it should return void_type_node.

Does it?  What does it say for

int foo ();

and the stmt

 'foo ();'

?  How's this different from

 'bar ();'

when bar is an internal function?  Note how the comment
speaks about 'type of the main EXPRESSION' and
'if the STATEMENT computes nothing' (emphasis mine).
I don't think it's all that clear.  A gimple_cond stmt
doesn't compute anything, does it?  Does the 'foo ()'
statement compute anything?  The current implementation
(and your patched one) says so.  But why does

 .ADD_OVERFLOW (_1, _2);

not (according to your patched implementation)?  It computes
something and that something has a type that depends on
the types of _1 and _2 and on the actual internal function.
But we don't have it readily available.  If you need it then
you are on your own - but returning void_type_node is wrong.

Richard.

> I still think we should just fix gimple_call_return_type to return
> void_type_node instead of ICEing.
>


Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-07-15 Thread Aldy Hernandez via Gcc-patches




On 7/15/21 3:06 PM, Richard Biener wrote:

On Thu, Jul 15, 2021 at 1:06 PM Aldy Hernandez  wrote:


Well, if we don't adjust gimple_call_return_type() to handle built-ins
with no LHS, then we must adjust the callers.

The attached patch fixes gimple_expr_type() per its documentation:

/* Return the type of the main expression computed by STMT.  Return
void_type_node if the statement computes nothing.  */

Currently gimple_expr_type is ICEing because it calls gimple_call_return_type.

I still think gimple_call_return_type should return void_type_node
instead of ICEing, but this will also fix my problem.

Anyone have a problem with this?


It's still somewhat inconsistent, no?  Because for a call without a LHS
it's now either void_type_node or the type of the return value.

It's probably known I dislike gimple_expr_type itself (it was introduced
to make the transition to tuples easier).  I wonder why you can't simply
fix range_of_call to do

tree lhs = gimple_call_lhs (call);
if (lhs)
  type = TREE_TYPE (lhs);


That would still leave gimple_expr_type() broken.  Its comment clearly
says it should return void_type_node.


I still think we should just fix gimple_call_return_type to return 
void_type_node instead of ICEing.




Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification

2021-07-15 Thread Jonathan Wright via Gcc-patches
Ah, yes - those test results should have only been changed for little endian.

I've submitted a patch to the list restoring the original expected results
for big endian.

Thanks,
Jonathan

From: Christophe Lyon 
Sent: 15 July 2021 10:09
To: Richard Sandiford ; Jonathan Wright 
; gcc-patches@gcc.gnu.org ; 
Kyrylo Tkachov 
Subject: Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification



On Mon, Jul 12, 2021 at 5:31 PM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
Jonathan Wright <jonathan.wri...@arm.com> writes:
> Hi,
>
> Version 2 of this patch adds more code generation tests to show the
> benefit of this RTL simplification as well as adding a new helper function
> 'rtx_vec_series_p' to reduce code duplication.
>
> Patch tested as version 1 - ok for master?

Sorry for the slow reply.

> Regression tested and bootstrapped on aarch64-none-linux-gnu,
> x86_64-unknown-linux-gnu, arm-none-linux-gnueabihf and
> aarch64_be-none-linux-gnu - no issues.

I've also tested this on powerpc64le-unknown-linux-gnu, no issues again.

> diff --git a/gcc/combine.c b/gcc/combine.c
> index 
> 6476812a21268e28219d1e302ee1c979d528a6ca..0ff6ca87e4432cfeff1cae1dd219ea81ea0b73e4
>  100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -6276,6 +6276,26 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, 
> int in_dest,
> - 1,
> 0));
>break;
> +case VEC_SELECT:
> +  {
> + rtx trueop0 = XEXP (x, 0);
> + mode = GET_MODE (trueop0);
> + rtx trueop1 = XEXP (x, 1);
> + int nunits;
> + /* If we select a low-part subreg, return that.  */
> > + if (GET_MODE_NUNITS (mode).is_constant (&nunits)
> + && targetm.can_change_mode_class (mode, GET_MODE (x), ALL_REGS))
> +   {
> + int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0) : 0;
> +
> + if (rtx_vec_series_p (trueop1, offset))
> +   {
> + rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode);
> + if (new_rtx != NULL_RTX)
> +   return new_rtx;
> +   }
> +   }
> +  }

Since this occurs three times, I think it would be worth having
a new predicate:

/* Return true if, for all OP of mode OP_MODE:

 (vec_select:RESULT_MODE OP SEL)

   is equivalent to the lowpart RESULT_MODE of OP.  */

bool
vec_series_lowpart_p (machine_mode result_mode, machine_mode op_mode, rtx sel)

containing the GET_MODE_NUNITS (…).is_constant, can_change_mode_class
and rtx_vec_series_p tests.

I think the function belongs in rtlanal.[hc], even though subreg_lowpart_p
is in emit-rtl.c.

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> aef6da9732d45b3586bad5ba57dafa438374ac3c..f12a0bebd3d6dd3381ac8248cd3fa3f519115105
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1884,15 +1884,16 @@
>  )
>
>  (define_insn "*zero_extend2_aarch64"
> -  [(set (match_operand:GPI 0 "register_operand" "=r,r,w")
> -(zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" 
> "r,m,m")))]
> +  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r")
> +(zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" 
> "r,m,m,w")))]
>""
>"@
> and\t%0, %1, 
> ldr\t%w0, %1
> -   ldr\t%0, %1"
> -  [(set_attr "type" "logic_imm,load_4,f_loads")
> -   (set_attr "arch" "*,*,fp")]
> +   ldr\t%0, %1
> +   umov\t%w0, %1.[0]"
> +  [(set_attr "type" "logic_imm,load_4,f_loads,neon_to_gp")
> +   (set_attr "arch" "*,*,fp,fp")]

FTR (just to show I thought about it): I don't know whether the umov
can really be considered an fp operation rather than a simd operation,
but since we don't support fp without simd, this is already a distinction
without a difference.  So the pattern is IMO OK as-is.

> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index 
> 55b6c1ac585a4cae0789c3afc0fccfc05a6d3653..93e963696dad30f29a76025696670f8b31bf2c35
>  100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -224,7 +224,7 @@
>  ;; problems because small constants get converted into adds.
>  (define_insn "*arm_movsi_vfp"
>[(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,rk,m 
> ,*t,r,*t,*t, *Uv")
> -  (match_operand:SI 1 "general_operand" "rk, 
> I,K,j,mi,rk,r,*t,*t,*Uvi,*t"))]
> +  (match_operand:SI 1 "general_operand" "rk, 
> I,K,j,mi,rk,r,t,*t,*Uvi,*t"))]
>"TARGET_ARM && TARGET_HARD_FLOAT
> && (   s_register_operand (operands[0], SImode)
> || s_register_operand (operands[1], SImode))"

I'll assume that an Arm maintainer would have spoken up by now if
they didn't want this for some reason.

> diff --git a/gcc/rtl.c b/gcc/rtl.c
> index 
> aaee882f5ca3e37b59c9829e41d0864070c170eb..3e8b3628b0b76b41889b77bb0019f582ee6f5aaa
>  100644
> --- a/gcc/rtl.c
> +++ b/gcc/rtl.c
> @@ -736,6 +736,19 @@ rtvec_all_equal_p (const_rtvec 

Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-07-15 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 1:06 PM Aldy Hernandez  wrote:
>
> Well, if we don't adjust gimple_call_return_type() to handle built-ins
> with no LHS, then we must adjust the callers.
>
> The attached patch fixes gimple_expr_type() per its documentation:
>
> /* Return the type of the main expression computed by STMT.  Return
>void_type_node if the statement computes nothing.  */
>
> Currently gimple_expr_type is ICEing because it calls gimple_call_return_type.
>
> I still think gimple_call_return_type should return void_type_node
> instead of ICEing, but this will also fix my problem.
>
> Anyone have a problem with this?

It's still somewhat inconsistent, no?  Because for a call without a LHS
it's now either void_type_node or the type of the return value.

It's probably known I dislike gimple_expr_type itself (it was introduced
to make the transition to tuples easier).  I wonder why you can't simply
fix range_of_call to do

   tree lhs = gimple_call_lhs (call);
   if (lhs)
 type = TREE_TYPE (lhs);

Richard.

>
> Aldy
>
> On Thu, Jun 24, 2021 at 3:57 PM Andrew MacLeod via Gcc-patches
>  wrote:
> >
> > On 6/24/21 9:45 AM, Jakub Jelinek wrote:
> > > On Thu, Jun 24, 2021 at 09:31:13AM -0400, Andrew MacLeod via Gcc-patches 
> > > wrote:
> > >> We'll still compute values for statements that don't have a LHS.. there's
> > >> nothing inherently wrong with that.  The primary example is
> > >>
> > >> if (x_2 < y_3)
> > >>
> > >> we will compute [0,0] [1,1] or [0,1] for that statement, without a LHS.  
> > >> It
> > >> primarily becomes a generic way to ask for the range of each of the 
> > >> operands
> > >> of the statement, and process it regardless of the presence of a LHS.  I
> > >> don't know, maybe there is (or will be)  an internal function that 
> > >> doesn't
> > >> have a LHS but which can be folded away/rewritten if the operands are
> > >> certain values.
> > > There are many internal functions that aren't ECF_CONST or ECF_PURE.  Some
> > > of them, like IFN*STORE* I think never have an lhs, others have them, but
> > > if the lhs is unused, various optimization passes can just remove those 
> > > lhs
> > > from the internal fn calls (if they'd be ECF_CONST or ECF_PURE, the calls
> > > would be DCEd).
> > >
> > > I think generally, if a call doesn't have lhs, there is no point in
> > > computing a value range for that missing lhs.  It won't be useful for the
> > > call arguments to lhs direction (nothing would care about that value) and
> > > it won't be useful on the direction from the lhs to the call arguments
> > > either.  Say if one has
> > >p_23 = __builtin_memcpy (p_75, q_23, 16);
> > > then one can imply from ~[0, 0] range on p_75 that p_23 has that range too
> > > (and vice versa), but if one has
> > >__builtin_memcpy (p_125, q_23, 16);
> > > none of that makes sense.
> > >
> > > So instead of punting when gimple_call_return_type returns NULL IMHO the
> > > code should punt when gimple_call_lhs is NULL.
> > >
> > >
> >
> > Well, we are going to punt anyway, because the call type, whether it is
> > NULL or VOIDmode, is not supported by irange.   It was more just a matter
> > of figuring out whether our check for an internal call or the
> > gimple_call_return_type call should do the check...   Ultimately in
> > the end it doesn't matter; it just seemed like something someone else could
> > trip across if we didn't strengthen gimple_call_return_type to not ICE.
> >
> > Andrew
> >


RE: [PATCH][AArch32]: Correct sdot RTL on aarch32

2021-07-15 Thread Tamar Christina via Gcc-patches
Hi Christophe,

Sorry about that, the ICEs should be fixed now and the execution tests are 
being fixed now.

They were being hidden by a model bug which kept saying everything passed even
when they failed ☹

Regards,
Tamar

From: Christophe Lyon 
Sent: Thursday, July 15, 2021 9:39 AM
To: Tamar Christina 
Cc: GCC Patches ; Richard Earnshaw 
; nd ; Ramana Radhakrishnan 

Subject: Re: [PATCH][AArch32]: Correct sdot RTL on aarch32

Hi Tamar,


On Tue, May 25, 2021 at 5:41 PM Tamar Christina via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
Hi All,

The RTL Generated from dot_prod is invalid as operand3 cannot be
written to, it's a normal input.  For the expand it's just another operand
but the caller does not expect it to be written to.

Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.

Ok for master? and backport to GCC 11, 10, 9?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/neon.md (dot_prod): Drop statements.

--- inline copy of patch --
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 
61d81646475ce3bf62ece2cec2faf0c1fe978ec1..9602e9993aeebf4ec620d105fd20f64498a3b851
 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3067,13 +3067,7 @@ (define_expand "dot_prod"
 DOTPROD)
(match_operand:VCVTI 3 "register_operand")))]
   "TARGET_DOTPROD"
-{
-  emit_insn (
-gen_neon_dot (operands[3], operands[3], operands[1],
-operands[2]));
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
-  DONE;
-})
+)

 ;; Auto-vectorizer pattern for usdot
 (define_expand "usdot_prod"

This patch is causing ICEs on arm-eabi (and probably arm-linux-gnueabi but 
trunk build is currently broken):

 FAIL: gcc.target/arm/simd/vect-dot-s8.c (internal compiler error)
FAIL: gcc.target/arm/simd/vect-dot-s8.c (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h:15:1: error: unrecognizable 
insn:
(insn 29 28 30 5 (set (reg:V4SI 132 [ vect_patt_31.15 ])
(plus:V4SI (unspec:V4SI [
(reg:V16QI 182)
(reg:V16QI 183)
] UNSPEC_DOT_S)
(reg:V4SI 184))) -1
 (nil))
during RTL pass: vregs
/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h:15:1: internal compiler error: 
in extract_insn, at recog.c:2769
0x5fc656 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
/gcc/rtl-error.c:108
0x5fc672 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/gcc/rtl-error.c:116
0xcbbe07 extract_insn(rtx_insn*)
/gcc/recog.c:2769
0x9e2e95 instantiate_virtual_regs_in_insn
/gcc/function.c:1611
0x9e2e95 instantiate_virtual_regs
/gcc/function.c:1985
0x9e2e95 execute
/gcc/function.c:2034

Can you check?

Thanks,

Christophe



[PATCH] arm: Fix multilib mapping for CDE extensions [PR100856]

2021-07-15 Thread Christophe LYON via Gcc-patches

This is a followup to Srinath's recent patch: the newly added test is
failing e.g. on arm-linux-gnueabihf without R/M profile multilibs.

It is also failing on arm-eabi with R/M profile multilibs if the
execution engine does not support v8.1-M instructions.

The patch avoids this by adding check_effective_target_FUNC_multilib
in target-supports.exp which effectively checks whether the target
supports linking and execution, like what is already done for other
ARM effective targets.  pr100856.c is updated to use it instead of
arm_v8_1m_main_cde_mve_ok (which makes the testcase a bit of a
duplicate with check_effective_target_FUNC_multilib).

In addition, I noticed that requiring MVE does not seem necessary and
this enables the test to pass even when targeting a CPU without MVE:
since the test does not involve actual CDE instructions, it can pass
on other architecture versions.  For instance, when requiring MVE, we
have to use cortex-m55 under QEMU for the test to pass because the
memset() that comes from v8.1-m.main+mve multilib uses LOB
instructions (DLS) (memset is used during startup).  Keeping
arm_v8_1m_main_cde_mve_ok would mean we would enable the test provided
we have the right multilibs, causing a runtime error if the simulator
does not support LOB instructions (e.g. when targeting cortex-m7).

I do not update sourcebuild.texi since the CDE effective targets are
already collectively documented.

Finally, the patch fixes two typos in comments.

2021-07-15  Christophe Lyon  

    PR target/100856
    gcc/
    * config/arm/arm.opt: Fix typo.
    * config/arm/t-rmprofile: Fix typo.

    gcc/testsuite/
    * gcc.target/arm/acle/pr100856.c: Use arm_v8m_main_cde_multilib
    and arm_v8m_main_cde.
    * lib/target-supports.exp: Add 
check_effective_target_FUNC_multilib for ARM CDE.



From baa9ed42d986dd2569697ac8903b3ca70ad73bb9 Mon Sep 17 00:00:00 2001
From: Christophe Lyon 
Date: Thu, 15 Jul 2021 12:57:18 +
Subject: [PATCH] arm: Fix multilib mapping for CDE extensions [PR100856]

This is a followup to Srinath's recent patch: the newly added test is
failing e.g. on arm-linux-gnueabihf without R/M profile multilibs.

It is also failing on arm-eabi with R/M profile multilibs if the
execution engine does not support v8.1-M instructions.

The patch avoids this by adding check_effective_target_FUNC_multilib
in target-supports.exp which effectively checks whether the target
supports linking and execution, like what is already done for other
ARM effective targets.  pr100856.c is updated to use it instead of
arm_v8_1m_main_cde_mve_ok (which makes the testcase a bit of a
duplicate with check_effective_target_FUNC_multilib).

In addition, I noticed that requiring MVE does not seem necessary and
this enables the test to pass even when targeting a CPU without MVE:
since the test does not involve actual CDE instructions, it can pass
on other architecture versions.  For instance, when requiring MVE, we
have to use cortex-m55 under QEMU for the test to pass because the
memset() that comes from v8.1-m.main+mve multilib uses LOB
instructions (DLS) (memset is used during startup).  Keeping
arm_v8_1m_main_cde_mve_ok would mean we would enable the test provided
we have the right multilibs, causing a runtime error if the simulator
does not support LOB instructions (e.g. when targeting cortex-m7).

I do not update sourcebuild.texi since the CDE effective targets are
already collectively documented.

Finally, the patch fixes two typos in comments.

2021-07-15  Christophe Lyon  

PR target/100856
gcc/
* config/arm/arm.opt: Fix typo.
* config/arm/t-rmprofile: Fix typo.

gcc/testsuite/
* gcc.target/arm/acle/pr100856.c: Use arm_v8m_main_cde_multilib
and arm_v8m_main_cde.
* lib/target-supports.exp: Add
check_effective_target_FUNC_multilib for ARM CDE.
---
 gcc/config/arm/arm.opt   |  2 +-
 gcc/config/arm/t-rmprofile   |  2 +-
 gcc/testsuite/gcc.target/arm/acle/pr100856.c |  4 ++--
 gcc/testsuite/lib/target-supports.exp| 18 ++
 4 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index af478a946b2..7417b55122a 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -82,7 +82,7 @@ EnumValue
 Enum(arm_arch) String(native) Value(-1) DriverOnly
 
 ; Set to the name of target architecture which is required for
-; multilib linking.  This option is undocumented becuase it
+; multilib linking.  This option is undocumented because it
 ; should not be used by the users.
 mlibarch=
 Target RejectNegative JoinedOrMissing NoDWARFRecord DriverOnly Undocumented
diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
index 3e75fcc9635..a6036bf0a51 100644
--- a/gcc/config/arm/t-rmprofile
+++ b/gcc/config/arm/t-rmprofile
@@ -54,7 +54,7 @@ MULTILIB_REQUIRED += 

testsuite: aarch64: Fix failing SVE tests on big endian

2021-07-15 Thread Jonathan Wright via Gcc-patches
Hi,

A recent change "gcc: Add vec_select -> subreg RTL simplification"
updated the expected test results for SVE extraction tests. The new
result should only have been changed for little endian. This patch
restores the old expected result for big endian.

Ok for master?

Thanks,
Jonathan

---

gcc/testsuite/ChangeLog:

2021-07-15  Jonathan Wright  

* gcc.target/aarch64/sve/extract_1.c: Split expected results
by big/little endian targets, restoring the old expected
result for big endian.
* gcc.target/aarch64/sve/extract_2.c: Likewise.
* gcc.target/aarch64/sve/extract_3.c: Likewise.
* gcc.target/aarch64/sve/extract_4.c: Likewise.


rb14655.patch


Re: [PATCH 1/2] Streamline vect_gen_while

2021-07-15 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> This adjusts the vect_gen_while API to match that of
> vect_gen_while_not allowing further patches to generate more
> than one stmt for the while case.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, tested a
> toy example on SVE that it still produces the same code.
>
> OK?
>
> 2021-07-15  Richard Biener  
>
>   * tree-vectorizer.h (vect_gen_while): Match up with
>   vect_gen_while_not.
>   * tree-vect-stmts.c (vect_gen_while): Adjust API to that
>   of vect_gen_while_not.
>   (vect_gen_while_not): Adjust.
>   * tree-vect-loop-manip.c (vect_set_loop_controls_directly): Likewise.
> ---
>  gcc/tree-vect-loop-manip.c | 14 ++
>  gcc/tree-vect-stmts.c  | 16 
>  gcc/tree-vectorizer.h  |  3 ++-
>  3 files changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index c29ffb3356c..1f3d6614e6c 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -609,11 +609,8 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>   }
>  
> if (use_masks_p)
> - {
> -   init_ctrl = make_temp_ssa_name (ctrl_type, NULL, "max_mask");
> -   gimple *tmp_stmt = vect_gen_while (init_ctrl, start, end);
> -   gimple_seq_add_stmt (preheader_seq, tmp_stmt);
> - }
> + init_ctrl = vect_gen_while (preheader_seq, ctrl_type,
> + start, end, "max_mask");
> else
>   {
> init_ctrl = make_temp_ssa_name (compare_type, NULL, "max_len");
> @@ -652,9 +649,10 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>/* Get the control value for the next iteration of the loop.  */
>if (use_masks_p)
>   {
> -   next_ctrl = make_temp_ssa_name (ctrl_type, NULL, "next_mask");
> -   gcall *call = vect_gen_while (next_ctrl, test_index, this_test_limit);
> -   gsi_insert_before (test_gsi, call, GSI_SAME_STMT);
> +   gimple_seq stmts = NULL;
> +   next_ctrl = vect_gen_while (&stmts, ctrl_type, test_index,
> +   this_test_limit, "next_mask");
> +   gsi_insert_seq_before (test_gsi, stmts, GSI_SAME_STMT);
>   }
>else
>   {
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index d9eeda50278..6a25d661800 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -12002,19 +12002,21 @@ supportable_narrowing_operation (enum tree_code 
> code,
>  /* Generate and return a statement that sets vector mask MASK such that
> MASK[I] is true iff J + START_INDEX < END_INDEX for all J <= I.  */

Comment needs updating.  LGTM otherwise, thanks.

Richard

> -gcall *
> -vect_gen_while (tree mask, tree start_index, tree end_index)
> +tree
> +vect_gen_while (gimple_seq *seq, tree mask_type, tree start_index,
> + tree end_index, const char *name)
>  {
>tree cmp_type = TREE_TYPE (start_index);
> -  tree mask_type = TREE_TYPE (mask);
>gcc_checking_assert (direct_internal_fn_supported_p (IFN_WHILE_ULT,
>  cmp_type, mask_type,
>  OPTIMIZE_FOR_SPEED));
>gcall *call = gimple_build_call_internal (IFN_WHILE_ULT, 3,
>   start_index, end_index,
>   build_zero_cst (mask_type));
> -  gimple_call_set_lhs (call, mask);
> -  return call;
> +  tree tmp = make_temp_ssa_name (mask_type, NULL, name);
> +  gimple_call_set_lhs (call, tmp);
> +  gimple_seq_add_stmt (seq, call);
> +  return tmp;
>  }
>  
>  /* Generate a vector mask of type MASK_TYPE for which index I is false iff
> @@ -12024,9 +12026,7 @@ tree
>  vect_gen_while_not (gimple_seq *seq, tree mask_type, tree start_index,
>   tree end_index)
>  {
> -  tree tmp = make_ssa_name (mask_type);
> -  gcall *call = vect_gen_while (tmp, start_index, end_index);
> -  gimple_seq_add_stmt (seq, call);
> +  tree tmp = vect_gen_while (seq, mask_type, start_index, end_index);
>return gimple_build (seq, BIT_NOT_EXPR, mask_type, tmp);
>  }
>  
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 4c4bc810c35..49afdd898d0 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1948,7 +1948,8 @@ extern bool vect_supportable_shift (vec_info *, enum 
> tree_code, tree);
>  extern tree vect_gen_perm_mask_any (tree, const vec_perm_indices &);
>  extern tree vect_gen_perm_mask_checked (tree, const vec_perm_indices &);
>  extern void optimize_mask_stores (class loop*);
> -extern gcall *vect_gen_while (tree, tree, tree);
> +extern tree vect_gen_while (gimple_seq *, tree, tree, tree,
> + const char * = nullptr);
>  extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
>  extern opt_result 

Re: GCC 11.1.1 Status Report (2021-07-06)

2021-07-15 Thread Richard Biener
On Thu, 15 Jul 2021, H.J. Lu wrote:

> On Tue, Jul 6, 2021 at 12:00 AM Richard Biener  wrote:
> >
> >
> > Status
> > ==
> >
> > The GCC 11 branch is open for regression and documentation fixes.
> > It's time for a GCC 11.2 release and we are aiming for a release
> > candidate in about two weeks which would result in the GCC 11.2
> > release about three months after GCC 11.1.
> >
> > Two weeks give you ample time to care for important regressions
> > and backporting of fixes.  Please also look out for issues on
> > non-primary/secondary targets.
> >
> >
> > Quality Data
> > 
> >
> > Priority  #   Change from last report
> > ---   ---
> > P1
> > P2  272   +  20
> > P3   94   +  56
> > P4  210   +   2
> > P5   24   -   1
> > ---   ---
> > Total P1-P3 366   +  76
> > Total   600   +  79
> >
> >
> > Previous Report
> > ===
> >
> > https://gcc.gnu.org/pipermail/gcc/2021-April/235923.html
> 
> I'd like to backport:
> 
> https://gcc.gnu.org/g:3f04e3782536ad2f9cfbb8cfe6630e9f9dd8af4c
> 
> to fix this GCC 11 regression:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101023

OK.


Re: GCC 11.1.1 Status Report (2021-07-06)

2021-07-15 Thread H.J. Lu via Gcc-patches
On Tue, Jul 6, 2021 at 12:00 AM Richard Biener  wrote:
>
>
> Status
> ==
>
> The GCC 11 branch is open for regression and documentation fixes.
> It's time for a GCC 11.2 release and we are aiming for a release
> candidate in about two weeks which would result in the GCC 11.2
> release about three months after GCC 11.1.
>
> Two weeks give you ample time to care for important regressions
> and backporting of fixes.  Please also look out for issues on
> non-primary/secondary targets.
>
>
> Quality Data
> 
>
> Priority  #   Change from last report
> ---   ---
> P1
> P2  272   +  20
> P3   94   +  56
> P4  210   +   2
> P5   24   -   1
> ---   ---
> Total P1-P3 366   +  76
> Total   600   +  79
>
>
> Previous Report
> ===
>
> https://gcc.gnu.org/pipermail/gcc/2021-April/235923.html

I'd like to backport:

https://gcc.gnu.org/g:3f04e3782536ad2f9cfbb8cfe6630e9f9dd8af4c

to fix this GCC 11 regression:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101023

-- 
H.J.


Re: [PATCH] Support reduction def re-use for epilogue with different vector size

2021-07-15 Thread Richard Biener
On Thu, 15 Jul 2021, Christophe Lyon wrote:

> Hi,
> 
> 
> 
> On Tue, Jul 13, 2021 at 2:09 PM Richard Biener  wrote:
> 
> > The following adds support for re-using the vector reduction def
> > from the main loop in vectorized epilogue loops on architectures
> > which use different vector sizes for the epilogue.  That's only
> > x86 as far as I am aware.
> >
> > vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> > regtest in progress.
> >
> > There's costing issues on x86 which usually prevent vectorizing
> > an epilogue with a reduction, at least for loops that only
> > have a reduction - it could be mitigated by not accounting for
> > the epilogue there if we can compute that we can re-use the
> > main loops cost.
> >
> > Richard - did I figure the correct place to adjust?  I guess
> > adjusting accumulator->reduc_input in vect_transform_cycle_phi
> > for re-use by the skip code in vect_create_epilog_for_reduction
> > is a bit awkward but at least we're consciously doing
> > vect_create_epilog_for_reduction last (via vectorizing live
> > operations).
> >
> > OK in the unlikely case all testing succeeds (I also want to
> > run it through SPEC with/without -fno-vect-cost-model which
> > will take some time)?
> >
> > Thanks,
> > Richard.
> >
> > 2021-07-13  Richard Biener  
> >
> > * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
> > vector types where the old vector type has a multiple of
> > the new vector type elements.
> > (vect_create_partial_epilog): New function, split out from...
> > (vect_create_epilog_for_reduction): ... here.
> > (vect_transform_cycle_phi): Reduce the re-used accumulator
> > to the new vector type.
> >
> > * gcc.target/i386/vect-reduc-1.c: New testcase.
> >
> 
> This patch is causing regressions on aarch64:
>  FAIL: gcc.dg/vect/pr92324-4.c (internal compiler error)
> FAIL: gcc.dg/vect/pr92324-4.c 2 blank line(s) in output
> FAIL: gcc.dg/vect/pr92324-4.c (test for excess errors)
> Excess errors:
> /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: error: incompatible types in
> 'PHI' argument 1
> vector(2) unsigned int
> vector(2) int
> _91 = PHI <_90(17), _83(11)>
> during GIMPLE pass: vect
> dump file: ./pr92324-4.c.167t.vect
> /gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: internal compiler error:
> verify_gimple failed
> 0xe6438e verify_gimple_in_cfg(function*, bool)
> /gcc/tree-cfg.c:5535
> 0xd13902 execute_function_todo
> /gcc/passes.c:2042
> 0xd142a5 execute_todo
> /gcc/passes.c:2096
> 
> FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fminnmv
> FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fmaxnmv

What exact options do you pass to cc1 to get this?  Can you track this
in a PR please?

Thanks,
Richard.

> Thanks,
> 
> Christophe
> 
> 
> 
> > ---
> >  gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
> >  gcc/tree-vect-loop.c | 223 ---
> >  2 files changed, 155 insertions(+), 85 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > new file mode 100644
> > index 000..9ee9ba4e736
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
> > +
> > +#define N 32
> > +int foo (int *a, int n)
> > +{
> > +  int sum = 1;
> > +  for (int i = 0; i < 8*N + 4; ++i)
> > +sum += a[i];
> > +  return sum;
> > +}
> > +
> > +/* The reduction epilog should be vectorized and the accumulator
> > +   re-used.  */
> > +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> > +/* { dg-final { scan-assembler-times "padd" 5 } } */
> > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> > index 8c27d75f889..98e2a845629 100644
> > --- a/gcc/tree-vect-loop.c
> > +++ b/gcc/tree-vect-loop.c
> > @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info
> > loop_vinfo,
> >   ones as well.  */
> >tree vectype = STMT_VINFO_VECTYPE (reduc_info);
> >tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> > -  if (!useless_type_conversion_p (old_vectype, vectype))
> > +  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> > +   TYPE_VECTOR_SUBPARTS (vectype)))
> >  return false;
> >
> >/* Non-SLP reductions might apply an adjustment after the reduction
> > @@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info
> > loop_vinfo,
> >return true;
> >  }
> >
> > +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
> > +   CODE emitting stmts before GSI.  Returns a vector def of VECTYPE.  */
> > +
> > +static tree
> > +vect_create_partial_epilog (tree 

[PATCH] Disable --param vect-partial-vector-usage by default on x86

2021-07-15 Thread Richard Biener
The following defaults --param vect-partial-vector-usage to zero
for x86_64 matching existing behavior where support for this
is not present.

OK for trunk?

Thanks,
Richard.

2021-07-15  Richard Biener  

* config/i386/i386-options.c (ix86_option_override_internal): Set
param_vect_partial_vector_usage to zero if not set.
---
 gcc/config/i386/i386-options.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 7cba655595e..3416a4f1752 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2834,6 +2834,11 @@ ix86_option_override_internal (bool main_args_p,
 
   SET_OPTION_IF_UNSET (opts, opts_set, param_ira_consider_dup_in_all_alts, 0);
 
+  /* Fully masking the main or the epilogue vectorized loop is not
+ profitable generally so leave it disabled until we get more
+ fine grained control & costing.  */
+  SET_OPTION_IF_UNSET (opts, opts_set, param_vect_partial_vector_usage, 0);
+
   return true;
 }
 
-- 
2.26.2


Re: [PATCH] Support reduction def re-use for epilogue with different vector size

2021-07-15 Thread Christophe Lyon via Gcc-patches
Hi,



On Tue, Jul 13, 2021 at 2:09 PM Richard Biener  wrote:

> The following adds support for re-using the vector reduction def
> from the main loop in vectorized epilogue loops on architectures
> which use different vector sizes for the epilogue.  That's only
> x86 as far as I am aware.
>
> vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> regtest in progress.
>
> There's costing issues on x86 which usually prevent vectorizing
> an epilogue with a reduction, at least for loops that only
> have a reduction - it could be mitigated by not accounting for
> the epilogue there if we can compute that we can re-use the
> main loops cost.
>
> Richard - did I figure the correct place to adjust?  I guess
> adjusting accumulator->reduc_input in vect_transform_cycle_phi
> for re-use by the skip code in vect_create_epilog_for_reduction
> is a bit awkward but at least we're consciously doing
> vect_create_epilog_for_reduction last (via vectorizing live
> operations).
>
> OK in the unlikely case all testing succeeds (I also want to
> run it through SPEC with/without -fno-vect-cost-model which
> will take some time)?
>
> Thanks,
> Richard.
>
> 2021-07-13  Richard Biener  
>
> * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
> vector types where the old vector type has a multiple of
> the new vector type elements.
> (vect_create_partial_epilog): New function, split out from...
> (vect_create_epilog_for_reduction): ... here.
> (vect_transform_cycle_phi): Reduce the re-used accumulator
> to the new vector type.
>
> * gcc.target/i386/vect-reduc-1.c: New testcase.
>

This patch is causing regressions on aarch64:
 FAIL: gcc.dg/vect/pr92324-4.c (internal compiler error)
FAIL: gcc.dg/vect/pr92324-4.c 2 blank line(s) in output
FAIL: gcc.dg/vect/pr92324-4.c (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: error: incompatible types in
'PHI' argument 1
vector(2) unsigned int
vector(2) int
_91 = PHI <_90(17), _83(11)>
during GIMPLE pass: vect
dump file: ./pr92324-4.c.167t.vect
/gcc/testsuite/gcc.dg/vect/pr92324-4.c:7:1: internal compiler error:
verify_gimple failed
0xe6438e verify_gimple_in_cfg(function*, bool)
/gcc/tree-cfg.c:5535
0xd13902 execute_function_todo
/gcc/passes.c:2042
0xd142a5 execute_todo
/gcc/passes.c:2096

FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fminnmv
FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fmaxnmv

Thanks,

Christophe



> ---
>  gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
>  gcc/tree-vect-loop.c | 223 ---
>  2 files changed, 155 insertions(+), 85 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> new file mode 100644
> index 000..9ee9ba4e736
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
> +
> +#define N 32
> +int foo (int *a, int n)
> +{
> +  int sum = 1;
> +  for (int i = 0; i < 8*N + 4; ++i)
> +sum += a[i];
> +  return sum;
> +}
> +
> +/* The reduction epilog should be vectorized and the accumulator
> +   re-used.  */
> +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
> +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> +/* { dg-final { scan-assembler-times "padd" 5 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 8c27d75f889..98e2a845629 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info
> loop_vinfo,
>   ones as well.  */
>tree vectype = STMT_VINFO_VECTYPE (reduc_info);
>tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> -  if (!useless_type_conversion_p (old_vectype, vectype))
> +  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> +   TYPE_VECTOR_SUBPARTS (vectype)))
>  return false;
>
>/* Non-SLP reductions might apply an adjustment after the reduction
> @@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info
> loop_vinfo,
>return true;
>  }
>
> +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
> +   CODE emitting stmts before GSI.  Returns a vector def of VECTYPE.  */
> +
> +static tree
> +vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code
> code,
> +   gimple_seq *seq)
> +{
> +  unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE
> (vec_def)).to_constant ();
> +  unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> +  tree stype = TREE_TYPE (vectype);
> +  tree new_temp = vec_def;
> +  while (nunits > nunits1)
> +{
> +  nunits /= 2;
> +  tree vectype1 = 

[PATCH v3] c++: Add gnu::diagnose_as attribute

2021-07-15 Thread Matthias Kretz
Hi Jason,

A new revision of the patch is attached. I think I implemented all your 
suggestions.

Please comment on cp/decl2.c (is_alias_template_p). I find it surprising that 
I had to write this function. Maybe I missed something? In any case, 
DECL_ALIAS_TEMPLATE_P requires a template_decl and the TYPE_DECL apparently 
doesn't have a template_info/decl at this point.

From: Matthias Kretz 

This attribute overrides the diagnostics output string for the entity it
appertains to. The motivation is to improve QoI for library TS
implementations, where diagnostics have a very bad signal-to-noise ratio
due to the long namespaces involved.

With the attribute, it is possible to solve PR89370 and make
std::__cxx11::basic_string<_CharT, _Traits, _Alloc> appear as
std::string in diagnostic output without extra hacks to recognize the
type in the C++ frontend.

Signed-off-by: Matthias Kretz 

gcc/ChangeLog:

PR c++/89370
* doc/extend.texi: Document the diagnose_as attribute.
* doc/invoke.texi: Document -fno-diagnostics-use-aliases.

gcc/c-family/ChangeLog:

PR c++/89370
* c.opt (fdiagnostics-use-aliases): New diagnostics flag.

gcc/cp/ChangeLog:

PR c++/89370
* cp-tree.h: Add TFF_AS_PRIMARY. Add is_alias_template_p
declaration.
* decl2.c (is_alias_template_p): New function. Determines
whether a given TYPE_DECL is actually an alias template that is
still missing its template_info.
(is_late_template_attribute): Decls with diagnose_as attribute
are early attributes only if they are alias templates.
* error.c (dump_scope): When printing the name of a namespace,
look for the diagnose_as attribute. If found, print the
associated string instead of calling dump_decl.
(dump_decl_name_or_diagnose_as): New function to replace
dump_decl (pp, DECL_NAME(t), flags) and inspect the tree for the
diagnose_as attribute before printing the DECL_NAME.
(dump_template_scope): New function. Prints the scope of a
template instance correctly applying diagnose_as attributes and
adjusting the list of template parms accordingly.
(dump_aggr_type): If the type has a diagnose_as attribute, print
the associated string instead of printing the original type
name. Print template parms only if the attribute was not applied
to the instantiation / full specialization. Delay call to
dump_scope until the diagnose_as attribute is found. If the
attribute has a second argument, use it to override the context
passed to dump_scope.
(dump_simple_decl): Call dump_decl_name_or_diagnose_as instead
of dump_decl.
(dump_decl): Ditto.
(lang_decl_name): Ditto.
(dump_function_decl): Walk the functions context list to
determine whether a call to dump_template_scope is required.
Ensure function templates are presented as primary templates.
(dump_function_name): Replace the function's identifier with the
diagnose_as attribute value, if set.
(dump_template_parms): Treat as primary template if flags
contains TFF_AS_PRIMARY.
(comparable_template_types_p): Consider the types not a template
if one carries a diagnose_as attribute.
(print_template_differences): Replace the identifier with the
diagnose_as attribute value on the most general template, if it
is set.
* name-lookup.c (handle_namespace_attrs): Handle the diagnose_as
attribute on namespaces. Ensure exactly one string argument.
Ensure previous diagnose_as attributes used the same name.
'diagnose_as' on namespace aliases are forwarded to the original
namespace. Support no-argument 'diagnose_as' on namespace
aliases.
(do_namespace_alias): Add attributes parameter and call
handle_namespace_attrs.
* name-lookup.h (do_namespace_alias): Add attributes tree
parameter.
* parser.c (cp_parser_declaration): If the next token is
RID_NAMESPACE, tentatively parse a namespace alias definition.
If this fails expect a namespace definition.
(cp_parser_namespace_alias_definition): Allow optional
attributes before and after the identifier. Fast exit if the
expected CPP_EQ token is missing. Pass attributes to
do_namespace_alias.
* tree.c (cxx_attribute_table): Add diagnose_as attribute to the
table.
(check_diagnose_as_redeclaration): New function; copied and
adjusted from check_abi_tag_redeclaration.
(handle_diagnose_as_attribute): New function; copied and
adjusted from handle_abi_tag_attribute. If the given *node is a
TYPE_DECL: allow no argument to the attribute, using DECL_NAME
instead; apply the attribute to the type on the RHS in place,
even if the type is complete. Allow 2 

[PUSHED] Abstract out non_null adjustments in ranger.

2021-07-15 Thread Aldy Hernandez via Gcc-patches
There are 4 exact copies of the non-null range adjusting code in the
ranger.  This patch abstracts the functionality into a separate method.

As a follow-up I would like to remove the varying_p check, since I have
seen incoming ranges such as [0, 0xffef] which are not varying but are
known to be non-null.  Removing the varying restriction would catch
those as well.

Tested on x86-64 Linux.

Pushed to trunk.

p.s. Andrew, what are your thoughts on removing the varying_p() check as
a follow-up?

gcc/ChangeLog:

* gimple-range-cache.cc (non_null_ref::adjust_range): New.
(ranger_cache::range_of_def): Call adjust_range.
(ranger_cache::entry_range): Same.
* gimple-range-cache.h (non_null_ref::adjust_range): New.
* gimple-range.cc (gimple_ranger::range_of_expr): Call
adjust_range.
(gimple_ranger::range_on_entry): Same.
---
 gcc/gimple-range-cache.cc | 35 ++-
 gcc/gimple-range-cache.h  |  2 ++
 gcc/gimple-range.cc   |  8 ++--
 3 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 98ecdbbd68e..23597ade802 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -81,6 +81,29 @@ non_null_ref::non_null_deref_p (tree name, basic_block bb, 
bool search_dom)
   return false;
 }
 
+// If NAME has a non-null dereference in block BB, adjust R with the
+// non-zero information from non_null_deref_p, and return TRUE.  If
+// SEARCH_DOM is true, non_null_deref_p should search the dominator tree.
+
+bool
+non_null_ref::adjust_range (irange , tree name, basic_block bb,
+   bool search_dom)
+{
+  // Check if pointers have any non-null dereferences.  Non-call
+  // exceptions mean we could throw in the middle of the block, so just
+  // punt for now on those.
+  if (!cfun->can_throw_non_call_exceptions
+  && r.varying_p ()
+  && non_null_deref_p (name, bb, search_dom))
+{
+  int_range<2> nz;
+  nz.set_nonzero (TREE_TYPE (name));
+  r.intersect (nz);
+  return true;
+}
+  return false;
+}
+
 // Allocate an populate the bitmap for NAME.  An ON bit for a block
 // index indicates there is a non-null reference in that block.  In
 // order to populate the bitmap, a quick run of all the immediate uses
@@ -857,9 +880,8 @@ ranger_cache::range_of_def (irange , tree name, 
basic_block bb)
r = gimple_range_global (name);
 }
 
-  if (bb && r.varying_p () && m_non_null.non_null_deref_p (name, bb, false) &&
-  !cfun->can_throw_non_call_exceptions)
-r = range_nonzero (TREE_TYPE (name));
+  if (bb)
+m_non_null.adjust_range (r, name, bb, false);
 }
 
 // Get the range of NAME as it occurs on entry to block BB.
@@ -878,12 +900,7 @@ ranger_cache::entry_range (irange , tree name, 
basic_block bb)
   if (!m_on_entry.get_bb_range (r, name, bb))
 range_of_def (r, name);
 
-  // Check if pointers have any non-null dereferences.  Non-call
-  // exceptions mean we could throw in the middle of the block, so just
-  // punt for now on those.
-  if (r.varying_p () && m_non_null.non_null_deref_p (name, bb, false) &&
-  !cfun->can_throw_non_call_exceptions)
-r = range_nonzero (TREE_TYPE (name));
+  m_non_null.adjust_range (r, name, bb, false);
 }
 
 // Get the range of NAME as it occurs on exit from block BB.
diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index ecf63dc01b3..f842e9c092a 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -34,6 +34,8 @@ public:
   non_null_ref ();
   ~non_null_ref ();
   bool non_null_deref_p (tree name, basic_block bb, bool search_dom = true);
+  bool adjust_range (irange , tree name, basic_block bb,
+bool search_dom = true);
 private:
   vec  m_nn;
   void process_name (tree name);
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 1851339c528..b210787d0b7 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -69,9 +69,7 @@ gimple_ranger::range_of_expr (irange , tree expr, gimple 
*stmt)
   if (def_stmt && gimple_bb (def_stmt) == bb)
 {
   range_of_stmt (r, def_stmt, expr);
-  if (!cfun->can_throw_non_call_exceptions && r.varying_p () &&
- m_cache.m_non_null.non_null_deref_p (expr, bb))
-   r = range_nonzero (TREE_TYPE (expr));
+  m_cache.m_non_null.adjust_range (r, expr, bb, true);
 }
   else
 // Otherwise OP comes from outside this block, use range on entry.
@@ -95,9 +93,7 @@ gimple_ranger::range_on_entry (irange , basic_block bb, 
tree name)
   if (m_cache.block_range (entry_range, bb, name))
 r.intersect (entry_range);
 
-  if (!cfun->can_throw_non_call_exceptions && r.varying_p () &&
-  m_cache.m_non_null.non_null_deref_p (name, bb))
-r = range_nonzero (TREE_TYPE (name));
+  m_cache.m_non_null.adjust_range (r, name, bb, true);
 }
 
 // Calculate the range for NAME at the end of block BB and return it in R.
-- 
2.31.1



Re: [PATCH] [DWARF] Fix hierarchy of debug information for offload kernels.

2021-07-15 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 12:35 PM Hafiz Abid Qadeer
 wrote:
>
> On 15/07/2021 11:33, Thomas Schwinge wrote:
> >
> >> Note that the "parent" should be abstract but I don't think dwarf has a
> >> way to express a fully abstract parent of a concrete instance child - or
> >> at least how GCC expresses this causes consumers to "misinterpret"
> >> that.  I wonder if adding a DW_AT_declaration to the late DWARF
> >> emitted "parent" would fix things as well here?
> >
> > (I suppose not, Abid?)
> >
>
> Yes, adding DW_AT_declaration does not fix the problem.

Does emitting

DW_TAG_compile_unit
  DW_AT_name("")

  DW_TAG_subprogram // notional parent function (foo) with no code range
DW_AT_declaration 1
a:DW_TAG_subprogram // offload function foo._omp_fn.0
  DW_AT_declaration 1

  DW_TAG_subprogram // offload function
  DW_AT_abstract_origin a
...

do the trick?  The following would do this, flattening function definitions
for the concrete copies:

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 82783c4968b..a9c8bc43e88 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -6076,6 +6076,11 @@ maybe_create_die_with_external_ref (tree decl)
   /* Peel types in the context stack.  */
   while (ctx && TYPE_P (ctx))
 ctx = TYPE_CONTEXT (ctx);
+  /* For functions peel the context up to namespace/TU scope.  The abstract
+ copies reveal the true nesting.  */
+  if (TREE_CODE (decl) == FUNCTION_DECL)
+while (ctx && TREE_CODE (ctx) == FUNCTION_DECL)
+  ctx = DECL_CONTEXT (ctx);
   /* Likewise namespaces in case we do not want to emit DIEs for them.  */
   if (debug_info_level <= DINFO_LEVEL_TERSE)
 while (ctx && TREE_CODE (ctx) == NAMESPACE_DECL)
@@ -6099,8 +6104,7 @@ maybe_create_die_with_external_ref (tree decl)
/* Leave function local entities parent determination to when
   we process scope vars.  */
;
-  else
-   parent = lookup_decl_die (ctx);
+  parent = lookup_decl_die (ctx);
 }
   else
 /* In some cases the FEs fail to set DECL_CONTEXT properly.



>
> --
> Hafiz Abid Qadeer
> Mentor, a Siemens Business


Re: [PATCH v3] vect: Recog mul_highpart pattern

2021-07-15 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 10:41 AM Kewen.Lin  wrote:
>
> > on 2021/7/15 4:04 PM, Kewen.Lin via Gcc-patches wrote:
> > Hi Uros,
> >
> >> on 2021/7/15 3:17 PM, Uros Bizjak wrote:
> >> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin  wrote:
> >>>
> >>> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
>  on 2021/7/14 2:38 PM, Richard Biener wrote:
> > On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin  wrote:
> >>
> >> on 2021/7/13 8:42 PM, Richard Biener wrote:
> >>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin  
> >>> wrote:
> >
> >> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
> >
> > Yes.
> >
> 
>  Thanks for confirming!  The related patch v2 is attached and the testing
>  is ongoing.
> 
> >>>
> >>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
> >>> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
> >>>
> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >>
> >> These XFAILs should be removed after your patch.
> >>
> > I'm curious whether it's intentional not to specify -fno-vect-cost-model
> > for this test case.  As noted above, this case is sensitive on how we
> > cost mult_highpart.  Without cost modeling, the XFAILs can be removed
> > only with this mul_highpart pattern support, no matter how we model it
> > (x86 part of this patch exists or not).
> >
> >> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
> >> is actually not needed.
> >>
> >
> > Thanks for the information!  The justification for the x86 part is that:
> > the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
> > optab support, i386 port has already customized costing for
> > MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
> > support), if we don't follow the same way for IFN_MULH, I'm worried that
> > we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
> > a right thing (we shouldn't cost it specially), it at least means we
> > have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
> > has direct mul_highpart optab support, I think they should be costed
> > consistently.  Does it sound reasonable?
> >
>
> Hi Richard(s),
>
> This potential inconsistency in handling seems like a counter-example to
> introducing a new IFN rather than reusing the existing tree_code; it
> seems hard to maintain (we would have to remember to keep its handling
> consistent everywhere).  ;)  From this perspective, maybe it's better to
> go back to using the tree_code and guard it under can_mult_highpart_p == 1
> (just like the IFN, which avoids the costing issue Richi pointed out
> before)?
>
> What do you think?

No, whenever we want to do code generation based on machine
capabilities the canonical way to test for those is to look at optabs
and then it's most natural to keep that 1:1 relation and emit
internal function calls which directly map to supported optabs
instead of going back to some tree codes.

When targets "lie" and provide expanders for something they can
only emulate then they have to compensate in their costing.
But as I understand this isn't the case for x86 here.

Now, in this case we already have the MULT_HIGHPART_EXPR tree,
so yes, it might make sense to use that instead of introducing an
alternate way via the direct internal function.  Somebody decided
that MULT_HIGHPART is generic enough to warrant this - but I
see that expand_mult_highpart can fail unless can_mult_highpart_p
and this is exactly one of the cases we want to avoid - either
we can handle something generally in which case it can be a
tree code or we can't, then it should be 1:1 tied to optabs at best
(mult_highpart has scalar support only for the direct optab,
vector support also for widen_mult).

Richard.

>
> BR,
> Kewen


  1   2   >