[PATCH 2/2] ipa-cp: One more use of ipa_vr_supported_type_p

2024-09-11 Thread Martin Jambor
Hi,

Since we now have the predicate, this patch replaces one more explicit
check for essentially the same condition with a call to it.
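
To illustrate why the two forms are interchangeable, here is a minimal
toy model in C.  It is not GCC code: the enum and all toy_* names are
hypothetical stand-ins, and it assumes that the irange half of the
predicate covers integral types while the prange half covers pointer
and reference types.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a few GCC tree type codes.  */
enum toy_type_kind
{
  TOY_INTEGER, TOY_BOOLEAN, TOY_ENUMERAL,
  TOY_POINTER, TOY_REFERENCE,
  TOY_REAL, TOY_RECORD
};

/* Models INTEGRAL_TYPE_P: integer, boolean and enumeral types.  */
bool
toy_integral_type_p (enum toy_type_kind k)
{
  return k == TOY_INTEGER || k == TOY_BOOLEAN || k == TOY_ENUMERAL;
}

/* Models POINTER_TYPE_P: pointer and reference types.  */
bool
toy_pointer_type_p (enum toy_type_kind k)
{
  return k == TOY_POINTER || k == TOY_REFERENCE;
}

/* The explicit check that the patch removes.  */
bool
toy_explicit_check (enum toy_type_kind k)
{
  return toy_integral_type_p (k) || toy_pointer_type_p (k);
}

/* Models the predicate: "some range class supports this type".
   ASSUMPTION: the integer range class accepts exactly the integral
   types and the pointer range class exactly the pointer types.  */
bool
toy_vr_supported_type_p (enum toy_type_kind k)
{
  bool irange_supports = toy_integral_type_p (k);
  bool prange_supports = toy_pointer_type_p (k);
  return irange_supports || prange_supports;
}
```

Under that assumption both functions agree on every kind, which is
what makes the conversion a pure clean-up.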

It has passed a bootstrap and testsuite on x86_64.  I believe it is
obvious enough that I can commit it myself and so will do so later
today.

Thanks,

Martin


2024-09-11  Martin Jambor  

* gcc/ipa-cp.cc (propagate_vr_across_jump_function): Use
ipa_vr_supported_type_p instead of explicit check for integral and
pointer types.
---
 gcc/ipa-cp.cc | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index a1033b81aef..fa7bd6a15da 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -2519,8 +2519,7 @@ propagate_vr_across_jump_function (cgraph_edge *cs, 
ipa_jump_func *jfunc,
 return false;
 
   if (!param_type
-  || (!INTEGRAL_TYPE_P (param_type)
- && !POINTER_TYPE_P (param_type)))
+  || !ipa_vr_supported_type_p (param_type))
 return dest_lat->set_to_bottom ();
 
   if (jfunc->type == IPA_JF_PASS_THROUGH)
-- 
2.46.0



[PATCH 1/2] ipa: Rename ipa_supports_p to ipa_vr_supported_type_p

2024-09-11 Thread Martin Jambor
Hi,

ipa_supports_p is not a name that captures well what the predicate
determines.  Therefore, this patch renames it to ipa_vr_supported_type_p.

This change has been pre-approved by Honza and has passed bootstrap and
test-suite on x86_64 and so I will push it to master later today.

Thanks,

Martin


gcc/ChangeLog:

2024-09-06  Martin Jambor  

* ipa-cp.h (ipa_supports_p): Rename to ipa_vr_supported_type_p.
* ipa-cp.cc (ipa_vr_operation_and_type_effects): Adjust called
function name.
(propagate_vr_across_jump_function): Likewise.
* ipa-prop.cc (ipa_compute_jump_functions_for_edge): Likewise.
(ipcp_get_parm_bits): Likewise.
---
 gcc/ipa-cp.cc   | 5 +++--
 gcc/ipa-cp.h| 2 +-
 gcc/ipa-prop.cc | 6 +++---
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 56468dc40ee..a1033b81aef 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -1649,7 +1649,8 @@ ipa_vr_operation_and_type_effects (vrange &dst_vr,
   enum tree_code operation,
   tree dst_type, tree src_type)
 {
-  if (!ipa_supports_p (dst_type) || !ipa_supports_p (src_type))
+  if (!ipa_vr_supported_type_p (dst_type)
+  || !ipa_vr_supported_type_p (src_type))
 return false;
 
   range_op_handler handler (operation);
@@ -2553,7 +2554,7 @@ propagate_vr_across_jump_function (cgraph_edge *cs, 
ipa_jump_func *jfunc,
  ipa_range_set_and_normalize (op_vr, op);
 
  if (!handler
- || !ipa_supports_p (operand_type)
+ || !ipa_vr_supported_type_p (operand_type)
  /* Sometimes we try to fold comparison operators using a
 pointer type to hold the result instead of a boolean
 type.  Avoid trapping in the sanity check in
diff --git a/gcc/ipa-cp.h b/gcc/ipa-cp.h
index 4616c61625a..ba2ebfede63 100644
--- a/gcc/ipa-cp.h
+++ b/gcc/ipa-cp.h
@@ -294,7 +294,7 @@ bool values_equal_for_ipcp_p (tree x, tree y);
 /* Return TRUE if IPA supports ranges of TYPE.  */
 
 static inline bool
-ipa_supports_p (tree type)
+ipa_vr_supported_type_p (tree type)
 {
   return irange::supports_p (type) || prange::supports_p (type);
 }
diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index 99ebd6229ec..78d1fb7086d 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -2392,8 +2392,8 @@ ipa_compute_jump_functions_for_edge (struct 
ipa_func_body_info *fbi,
   else
{
  if (param_type
- && ipa_supports_p (TREE_TYPE (arg))
- && ipa_supports_p (param_type)
+ && ipa_vr_supported_type_p (TREE_TYPE (arg))
+ && ipa_vr_supported_type_p (param_type)
  && get_range_query (cfun)->range_of_expr (vr, arg, cs->call_stmt)
  && !vr.undefined_p ())
{
@@ -5761,7 +5761,7 @@ ipcp_get_parm_bits (tree parm, tree *value, widest_int 
*mask)
   ipcp_transformation *ts = ipcp_get_transformation_summary (cnode);
   if (!ts
   || vec_safe_length (ts->m_vr) == 0
-  || !ipa_supports_p (TREE_TYPE (parm)))
+  || !ipa_vr_supported_type_p (TREE_TYPE (parm)))
 return false;
 
   int i = ts->get_param_index (current_function_decl, parm);
-- 
2.46.0



Re: PING^5 [PATCH] rs6000: Adjust -fpatchable-function-entry* support for dual entry [PR112980]

2024-08-27 Thread Martin Jambor
Hi,

On Fri, Aug 09 2024, Kewen.Lin wrote:
> Hi,
>
> Gentle ping this patch:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651025.html

I'd like to second this ping, please.

Thank you,

Martin


>
> BR,
> Kewen
>
>>> on 2024/7/12 00:15, Martin Jambor wrote:
>>>> Hi,
>>>>
>>>> can I add myself to the bunch of people who are pinging this? 
>>>> Having
>>>> this in will make our life easier.
>>>>
>>>> Thanks a lot,
>>>>
>>>> Martin
>>>>
>>>>
>>>> On Wed, May 08 2024, Kewen.Lin wrote:
>>>>> Hi,
>>>>>
>>>>> As the discussion in PR112980, although the current
>>>>> implementation for -fpatchable-function-entry* conforms
>>>>> with the documentation (making N NOPs be consecutive),
>>>>> it's inefficient for both kernel and userspace livepatching
>>>>> (see comments in PR for the details).
>>>>>
>>>>> So this patch is to change the current implementation by
>>>>> emitting the "before" NOPs before global entry point and
>>>>> the "after" NOPs after local entry point.  The new behavior
>>>>> would not keep NOPs to be consecutive, so the documentation
>>>>> is updated to emphasize this.
>>>>>
>>>>> Bootstrapped and regress-tested on powerpc64-linux-gnu
>>>>> P8/P9 and powerpc64le-linux-gnu P9 and P10.
>>>>>
>>>>> Is it ok for trunk?  And backporting to active branches
>>>>> after burn-in time?  I guess we should also mention this
>>>>> change in changes.html?
>>>>>
>>>>> BR,
>>>>> Kewen
>>>>> -
>>>>>   PR target/112980
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>   * config/rs6000/rs6000-logue.cc
>>>>> (rs6000_output_function_prologue):
>>>>>   Adjust the handling on patch area emitting with dual
>>>>> entry, remove
>>>>>   the restriction on "before" NOPs count, not emit
>>>>> "before" NOPs any
>>>>>   more but only emit "after" NOPs.
>>>>>   * config/rs6000/rs6000.cc
>>>>> (rs6000_print_patchable_function_entry):
>>>>>   Adjust by respecting cfun->machine-
>>>>>> stop_patch_area_print.
>>>>>   (rs6000_elf_declare_function_name): For ELFv2 with dual
>>>>> entry, set
>>>>>   cfun->machine->stop_patch_area_print as true.
>>>>>   * config/rs6000/rs6000.h (struct machine_function):
>>>>> Remove member
>>>>>   global_entry_emitted, add new member
>>>>> stop_patch_area_print.
>>>>>   * doc/invoke.texi (option -fpatchable-function-entry):
>>>>> Adjust the
>>>>>   documentation for PowerPC ELFv2 dual entry.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>   * c-c++-common/patchable_function_entry-default.c:
>>>>> Adjust.
>>>>>   * gcc.target/powerpc/pr99888-4.c: Likewise.
>>>>>   * gcc.target/powerpc/pr99888-5.c: Likewise.
>>>>>   * gcc.target/powerpc/pr99888-6.c: Likewise.
>>>>> ---
>>>>>  gcc/config/rs6000/rs6000-logue.cc | 40 +
>>>>> --
>>>>>  gcc/config/rs6000/rs6000.cc   | 15 +--
>>>>>  gcc/config/rs6000/rs6000.h    | 10 +++--
>>>>>  gcc/doc/invoke.texi   |  8 ++--
>>>>>  .../patchable_function_entry-default.c    |  3 --
>>>>>  gcc/testsuite/gcc.target/powerpc/pr99888-4.c  |  4 +-
>>>>>  gcc/testsuite/gcc.target/powerpc/pr99888-5.c  |  4 +-
>>>>>  gcc/testsuite/gcc.target/powerpc/pr99888-6.c  |  4 +-
>>>>>  8 files changed, 33 insertions(+), 55 deletions(-)
>>>>>
>>>>> diff --git a/gcc/config/rs6000/rs6000-logue.cc
>>>>> b/gcc/config/rs6000/rs6000-logue.cc
>>>>> index 60ba15a8bc3..0eb019b44b3 100644
>>>>> --- a/gcc/config/rs6000/rs6000-logue.cc
>>>>> +++ b/gcc/config/rs6000/rs6000-logue.cc
>>>>> @@ -4006,43 +4006,21 @@ rs6000_output_function_prologue (FILE
>>>>> *file)
>>>>>     fprintf (file, "\tadd 2,2,12\n

Re: [PATCH 2/2] ipa: Move pass_ipa_cdtor_merge before pass_ipa_cp and pass_ipa_sra

2024-08-27 Thread Martin Jambor
Hello,

and ping please.

Martin

On Fri, Aug 09 2024, Martin Jambor wrote:
> Hello,
>
> and ping please.
>
> Martin
>
> On Fri, Jul 26 2024, Martin Jambor wrote:
>> Hi,
>>
>> when looking at PR 115815 we realized that it would make sense to make
>> calls to functions originally declared static constructors and
>> destructors created by pass_ipa_cdtor_merge visible to IPA-SRA.  This
>> patch does that.
>>
>> Bootstrapped and tested on x86_64-linux.  OK for master?
>>
>> Thanks,
>>
>> Martin
>>
>>
>> gcc/ChangeLog:
>>
>> 2024-07-25  Martin Jambor  
>>
>>  * passes.def: Move pass_ipa_cdtor_merge before pass_ipa_cp and
>>  pass_ipa_sra.
>> ---
>>  gcc/passes.def | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/passes.def b/gcc/passes.def
>> index b06d6d45f63..33b2c10c9c9 100644
>> --- a/gcc/passes.def
>> +++ b/gcc/passes.def
>> @@ -157,9 +157,9 @@ along with GCC; see the file COPYING3.  If not see
>>NEXT_PASS (pass_ipa_profile);
>>NEXT_PASS (pass_ipa_icf);
>>NEXT_PASS (pass_ipa_devirt);
>> +  NEXT_PASS (pass_ipa_cdtor_merge);
>>NEXT_PASS (pass_ipa_cp);
>>NEXT_PASS (pass_ipa_sra);
>> -  NEXT_PASS (pass_ipa_cdtor_merge);
>>NEXT_PASS (pass_ipa_fn_summary);
>>NEXT_PASS (pass_ipa_inline);
>>NEXT_PASS (pass_ipa_pure_const);
>> -- 
>> 2.45.2


Re: [PATCH 1/2] ipa: Treat static constructors and destructors as non-local (PR 115815)

2024-08-27 Thread Martin Jambor
Hello,

and ping please.

Martin


On Fri, Aug 09 2024, Martin Jambor wrote:
> Hello,
>
> and ping please.
>
> Martin
>
> On Fri, Jul 26 2024, Martin Jambor wrote:
>> Hi,
>>
>> in PR 115815, IPA-SRA thought it had control over all invocations of a
>> (recursive) static destructor but it did not see the implied
>> invocation which led to the original being left behind and the
>> clean-up code encountering uses of SSAs that definitely should have
>> been dead.
>>
>> Fixed by teaching cgraph_node::can_be_local_p about static
>> constructors and destructors.  Similar test is missing in
>> cgraph_node::local_p so I added the check there as well.
>>
>> Bootstrapped and tested on x86_64-linux.  OK for master and after a
>> while to gcc14 and gcc13 release branches?
>>
>> Thanks,
>>
>> Martin
>>
>>
>> gcc/ChangeLog:
>>
>> 2024-07-25  Martin Jambor  
>>
>>  PR ipa/115815
>>  * cgraph.cc (cgraph_node_cannot_be_local_p_1): Also check
>>  DECL_STATIC_CONSTRUCTOR and DECL_STATIC_DESTRUCTOR.
>>  * ipa-visibility.cc (non_local_p): Likewise.
>>  (cgraph_node::local_p): Delete extraneous line of tabs.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2024-07-25  Martin Jambor  
>>
>>  PR ipa/115815
>>  * gcc.dg/lto/pr115815_0.c: New test.
>> ---
>>  gcc/cgraph.cc |  4 +++-
>>  gcc/ipa-visibility.cc |  5 +++--
>>  gcc/testsuite/gcc.dg/lto/pr115815_0.c | 18 ++
>>  3 files changed, 24 insertions(+), 3 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/lto/pr115815_0.c
>>
>> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
>> index 473d8410bc9..39a3adbc7c3 100644
>> --- a/gcc/cgraph.cc
>> +++ b/gcc/cgraph.cc
>> @@ -2434,7 +2434,9 @@ cgraph_node_cannot_be_local_p_1 (cgraph_node *node, 
>> void *)
>>  && !node->forced_by_abi
>>  && !node->used_from_object_file_p ()
>>  && !node->same_comdat_group)
>> -   || !node->externally_visible));
>> +   || !node->externally_visible)
>> +   && !DECL_STATIC_CONSTRUCTOR (node->decl)
>> +   && !DECL_STATIC_DESTRUCTOR (node->decl));
>>  }
>>  
>>  /* Return true if cgraph_node can be made local for API change.
>> diff --git a/gcc/ipa-visibility.cc b/gcc/ipa-visibility.cc
>> index 501d3c304aa..21f0c47f388 100644
>> --- a/gcc/ipa-visibility.cc
>> +++ b/gcc/ipa-visibility.cc
>> @@ -102,7 +102,9 @@ non_local_p (struct cgraph_node *node, void *data 
>> ATTRIBUTE_UNUSED)
>> && !node->externally_visible
>> && !node->used_from_other_partition
>> && !node->in_other_partition
>> -   && node->get_availability () >= AVAIL_AVAILABLE);
>> +   && node->get_availability () >= AVAIL_AVAILABLE
>> +   && !DECL_STATIC_CONSTRUCTOR (node->decl)
>> +   && !DECL_STATIC_DESTRUCTOR (node->decl));
>>  }
>>  
>>  /* Return true when function can be marked local.  */
>> @@ -116,7 +118,6 @@ cgraph_node::local_p (void)
>>   return n->callees->callee->local_p ();
>> return !n->call_for_symbol_thunks_and_aliases (non_local_p,
>>NULL, true);
>> -
>>  }
>>  
>>  /* A helper for comdat_can_be_unshared_p.  */
>> diff --git a/gcc/testsuite/gcc.dg/lto/pr115815_0.c 
>> b/gcc/testsuite/gcc.dg/lto/pr115815_0.c
>> new file mode 100644
>> index 000..d938ae4c802
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/lto/pr115815_0.c
>> @@ -0,0 +1,18 @@
>> +int a;
>> +volatile int v;
>> +volatile int w;
>> +
>> +int __attribute__((destructor))
>> +b() {
>> +  if (v)
>> +return a + b();
>> +  v = 5;
>> +  return 0;
>> +}
>> +
>> +int
>> +main (int argc, char **argv)
>> +{
>> +  w = 1;
>> +  return 0;
>> +}
>> -- 
>> 2.45.2


[PATCH] sra: Avoid risking x87 mangling binary representation of a replacement (PR 58416)

2024-08-19 Thread Martin Jambor
Hi,

PR 58416 shows that storing non-floating point data to floating point
scalar registers can lead to miscompilations when the data is
normalized or otherwise processed upon loading to a register.  To
avoid that risk, this patch uses the new TARGET_MODE_CAN_TRANSFER_BITS
hook to detect situations where we have multiple types and we decide
to represent the data in a type with a mode that is known not to be
able to transfer actual bits reliably.
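
The shape of that decision can be sketched with a small toy model in
C.  It only mirrors the logic of the new helper in the diff below; the
struct and the toy_* names are hypothetical, and real type
compatibility in GCC is far richer than comparing an id.

```c
#include <stdbool.h>

/* Hypothetical, simplified scalar type: an identity plus a flag saying
   whether a load/store round trip through a register of its mode is
   guaranteed to preserve every bit.  */
struct toy_type
{
  int id;
  bool mode_transfers_bits;
};

/* ASSUMPTION: two toy types are compatible iff they have the same id.  */
bool
toy_types_compatible_p (const struct toy_type *a, const struct toy_type *b)
{
  return a->id == b->id;
}

/* Return true if holding data accessed as T2 in a replacement of type
   T1 risks changing its bits: T1's mode may alter bits and the two
   types differ.  */
bool
toy_risk_mangled_repr_p (const struct toy_type *t1, const struct toy_type *t2)
{
  if (t1->mode_transfers_bits)
    return false;

  return !toy_types_compatible_p (t1, t2);
}
```

In this model a bit-altering type paired with a different type is
flagged, while the same type on both sides, or any bit-preserving T1,
is not.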

Bootstrapped and tested on x86_64-linux.  OK for trunk?

Any back-ports to release branches would of course need a back-port of
the hook itself, unfortunately.

Thanks,

Martin


gcc/ChangeLog:

2024-08-19  Martin Jambor  

PR target/58416
* tree-sra.cc (types_risk_mangled_binary_repr_p): New function.
(sort_and_splice_var_accesses): Use it.
(propagate_subaccesses_from_rhs): Likewise.

gcc/testsuite/ChangeLog:

2024-08-19  Martin Jambor  

PR target/58416
* gcc.dg/torture/pr58416.c: New test.
---
 gcc/testsuite/gcc.dg/torture/pr58416.c | 32 ++
 gcc/tree-sra.cc| 28 +-
 2 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr58416.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr58416.c 
b/gcc/testsuite/gcc.dg/torture/pr58416.c
new file mode 100644
index 000..0922b0e7089
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr58416.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+struct s {
+  char s[sizeof(long double)];
+};
+
+union u {
+  long double d;
+  struct s s;
+};
+
+int main()
+{
+  union u x = {0};
+#if __SIZEOF_LONG_DOUBLE__ == 16
+  x.s = (struct s){"xxxxxxxxxxxxxxxx"};
+#elif __SIZEOF_LONG_DOUBLE__ == 12
+  x.s = (struct s){"xxxxxxxxxxxx"};
+#elif __SIZEOF_LONG_DOUBLE__ == 8
+  x.s = (struct s){"xxxxxxxx"};
+#elif __SIZEOF_LONG_DOUBLE__ == 4
+  x.s = (struct s){"xxxx"};
+#endif
+
+  union u y = x;
+
+  for (unsigned char *p = (unsigned char *)&y + sizeof y;
+   p-- > (unsigned char *)&y;)
+if (*p != (unsigned char)'x')
+  __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 8040b0c5645..64e2f007d68 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -2335,6 +2335,19 @@ same_access_path_p (tree exp1, tree exp2)
   return true;
 }
 
+/* Return true when either T1 is a type that, when loaded into a register and
+   stored back to memory will yield the same bits or when both T1 and T2 are
+   compatible.  */
+
+static bool
+types_risk_mangled_binary_repr_p (tree t1, tree t2)
+{
+  if (mode_can_transfer_bits (TYPE_MODE (t1)))
+return false;
+
+  return !types_compatible_p (t1, t2);
+}
+
 /* Sort all accesses for the given variable, check for partial overlaps and
return NULL if there are any.  If there are none, pick a representative for
each combination of offset and size and create a linked list out of them.
@@ -2461,6 +2474,17 @@ sort_and_splice_var_accesses (tree var)
}
  unscalarizable_region = true;
}
+ else if (types_risk_mangled_binary_repr_p (access->type, ac2->type))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Cannot scalarize the following access "
+  "because data would be held in a mode which is not "
+  "guaranteed to preserve all bits.\n  ");
+ dump_access (dump_file, access, false);
+   }
+ unscalarizable_region = true;
+   }
 
  if (grp_same_access_path
  && !same_access_path_p (access->expr, ac2->expr))
@@ -3127,7 +3151,9 @@ propagate_subaccesses_from_rhs (struct access *lacc, 
struct access *racc)
  ret = true;
  subtree_mark_written_and_rhs_enqueue (lacc);
}
-  if (!lacc->first_child && !racc->first_child)
+  if (!lacc->first_child
+ && !racc->first_child
+ && !types_risk_mangled_binary_repr_p (racc->type, lacc->type))
{
  /* We are about to change the access type from aggregate to scalar,
 so we need to put the reverse flag onto the access, if any.  */
-- 
2.46.0



Re: [PATCH 2/2] ipa: Move pass_ipa_cdtor_merge before pass_ipa_cp and pass_ipa_sra

2024-08-09 Thread Martin Jambor
Hello,

and ping please.

Martin

On Fri, Jul 26 2024, Martin Jambor wrote:
> Hi,
>
> when looking at PR 115815 we realized that it would make sense to make
> calls to functions originally declared static constructors and
> destructors created by pass_ipa_cdtor_merge visible to IPA-SRA.  This
> patch does that.
>
> Bootstrapped and tested on x86_64-linux.  OK for master?
>
> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2024-07-25  Martin Jambor  
>
>   * passes.def: Move pass_ipa_cdtor_merge before pass_ipa_cp and
>   pass_ipa_sra.
> ---
>  gcc/passes.def | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/passes.def b/gcc/passes.def
> index b06d6d45f63..33b2c10c9c9 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -157,9 +157,9 @@ along with GCC; see the file COPYING3.  If not see
>NEXT_PASS (pass_ipa_profile);
>NEXT_PASS (pass_ipa_icf);
>NEXT_PASS (pass_ipa_devirt);
> +  NEXT_PASS (pass_ipa_cdtor_merge);
>NEXT_PASS (pass_ipa_cp);
>NEXT_PASS (pass_ipa_sra);
> -  NEXT_PASS (pass_ipa_cdtor_merge);
>NEXT_PASS (pass_ipa_fn_summary);
>NEXT_PASS (pass_ipa_inline);
>NEXT_PASS (pass_ipa_pure_const);
> -- 
> 2.45.2


Re: [PATCH 1/2] ipa: Treat static constructors and destructors as non-local (PR 115815)

2024-08-09 Thread Martin Jambor
Hello,

and ping please.

Martin

On Fri, Jul 26 2024, Martin Jambor wrote:
> Hi,
>
> in PR 115815, IPA-SRA thought it had control over all invocations of a
> (recursive) static destructor but it did not see the implied
> invocation which led to the original being left behind and the
> clean-up code encountering uses of SSAs that definitely should have
> been dead.
>
> Fixed by teaching cgraph_node::can_be_local_p about static
> constructors and destructors.  Similar test is missing in
> cgraph_node::local_p so I added the check there as well.
>
> Bootstrapped and tested on x86_64-linux.  OK for master and after a
> while to gcc14 and gcc13 release branches?
>
> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2024-07-25  Martin Jambor  
>
>   PR ipa/115815
>   * cgraph.cc (cgraph_node_cannot_be_local_p_1): Also check
>   DECL_STATIC_CONSTRUCTOR and DECL_STATIC_DESTRUCTOR.
>   * ipa-visibility.cc (non_local_p): Likewise.
>   (cgraph_node::local_p): Delete extraneous line of tabs.
>
> gcc/testsuite/ChangeLog:
>
> 2024-07-25  Martin Jambor  
>
>   PR ipa/115815
>   * gcc.dg/lto/pr115815_0.c: New test.
> ---
>  gcc/cgraph.cc |  4 +++-
>  gcc/ipa-visibility.cc |  5 +++--
>  gcc/testsuite/gcc.dg/lto/pr115815_0.c | 18 ++
>  3 files changed, 24 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/lto/pr115815_0.c
>
> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
> index 473d8410bc9..39a3adbc7c3 100644
> --- a/gcc/cgraph.cc
> +++ b/gcc/cgraph.cc
> @@ -2434,7 +2434,9 @@ cgraph_node_cannot_be_local_p_1 (cgraph_node *node, 
> void *)
>   && !node->forced_by_abi
>   && !node->used_from_object_file_p ()
>   && !node->same_comdat_group)
> -|| !node->externally_visible));
> +|| !node->externally_visible)
> +&& !DECL_STATIC_CONSTRUCTOR (node->decl)
> +&& !DECL_STATIC_DESTRUCTOR (node->decl));
>  }
>  
>  /* Return true if cgraph_node can be made local for API change.
> diff --git a/gcc/ipa-visibility.cc b/gcc/ipa-visibility.cc
> index 501d3c304aa..21f0c47f388 100644
> --- a/gcc/ipa-visibility.cc
> +++ b/gcc/ipa-visibility.cc
> @@ -102,7 +102,9 @@ non_local_p (struct cgraph_node *node, void *data 
> ATTRIBUTE_UNUSED)
>  && !node->externally_visible
>  && !node->used_from_other_partition
>  && !node->in_other_partition
> -&& node->get_availability () >= AVAIL_AVAILABLE);
> +&& node->get_availability () >= AVAIL_AVAILABLE
> +&& !DECL_STATIC_CONSTRUCTOR (node->decl)
> +&& !DECL_STATIC_DESTRUCTOR (node->decl));
>  }
>  
>  /* Return true when function can be marked local.  */
> @@ -116,7 +118,6 @@ cgraph_node::local_p (void)
>   return n->callees->callee->local_p ();
> return !n->call_for_symbol_thunks_and_aliases (non_local_p,
> NULL, true);
> - 
>  }
>  
>  /* A helper for comdat_can_be_unshared_p.  */
> diff --git a/gcc/testsuite/gcc.dg/lto/pr115815_0.c 
> b/gcc/testsuite/gcc.dg/lto/pr115815_0.c
> new file mode 100644
> index 000..d938ae4c802
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/lto/pr115815_0.c
> @@ -0,0 +1,18 @@
> +int a;
> +volatile int v;
> +volatile int w;
> +
> +int __attribute__((destructor))
> +b() {
> +  if (v)
> +return a + b();
> +  v = 5;
> +  return 0;
> +}
> +
> +int
> +main (int argc, char **argv)
> +{
> +  w = 1;
> +  return 0;
> +}
> -- 
> 2.45.2


[PATCH 2/2] ipa: Move pass_ipa_cdtor_merge before pass_ipa_cp and pass_ipa_sra

2024-07-26 Thread Martin Jambor
Hi,

when looking at PR 115815 we realized that it would make sense to make
calls to functions originally declared static constructors and
destructors created by pass_ipa_cdtor_merge visible to IPA-SRA.  This
patch does that.

Bootstrapped and tested on x86_64-linux.  OK for master?

Thanks,

Martin


gcc/ChangeLog:

2024-07-25  Martin Jambor  

* passes.def: Move pass_ipa_cdtor_merge before pass_ipa_cp and
pass_ipa_sra.
---
 gcc/passes.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index b06d6d45f63..33b2c10c9c9 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -157,9 +157,9 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ipa_profile);
   NEXT_PASS (pass_ipa_icf);
   NEXT_PASS (pass_ipa_devirt);
+  NEXT_PASS (pass_ipa_cdtor_merge);
   NEXT_PASS (pass_ipa_cp);
   NEXT_PASS (pass_ipa_sra);
-  NEXT_PASS (pass_ipa_cdtor_merge);
   NEXT_PASS (pass_ipa_fn_summary);
   NEXT_PASS (pass_ipa_inline);
   NEXT_PASS (pass_ipa_pure_const);
-- 
2.45.2



[PATCH 1/2] ipa: Treat static constructors and destructors as non-local (PR 115815)

2024-07-26 Thread Martin Jambor
Hi,

in PR 115815, IPA-SRA thought it had control over all invocations of a
(recursive) static destructor but it did not see the implied
invocation which led to the original being left behind and the
clean-up code encountering uses of SSAs that definitely should have
been dead.

Fixed by teaching cgraph_node::can_be_local_p about static
constructors and destructors.  A similar test is missing in
cgraph_node::local_p so I added the check there as well.
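
As a rough illustration of the fix, consider the following toy model
in C.  It is deliberately simplified: the struct and the toy_* names
are hypothetical, and the real predicates test many more conditions
than external visibility.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for a call graph node.  */
struct toy_node
{
  bool externally_visible;
  bool static_constructor;
  bool static_destructor;
};

/* Simplified model of the old behaviour: only visibility matters, so
   a static destructor with no visible callers looks fully local.  */
bool
toy_can_be_local_before (const struct toy_node *n)
{
  return !n->externally_visible;
}

/* Simplified model of the fixed behaviour: constructors and
   destructors have an implied caller that the call graph does not
   show, so they are never treated as local.  */
bool
toy_can_be_local_after (const struct toy_node *n)
{
  return !n->externally_visible
	 && !n->static_constructor
	 && !n->static_destructor;
}
```

A node with only the destructor flag set is local under the old model
and non-local under the new one, which is the case from the PR.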

Bootstrapped and tested on x86_64-linux.  OK for master and after a
while to gcc14 and gcc13 release branches?

Thanks,

Martin


gcc/ChangeLog:

2024-07-25  Martin Jambor  

PR ipa/115815
* cgraph.cc (cgraph_node_cannot_be_local_p_1): Also check
DECL_STATIC_CONSTRUCTOR and DECL_STATIC_DESTRUCTOR.
* ipa-visibility.cc (non_local_p): Likewise.
(cgraph_node::local_p): Delete extraneous line of tabs.

gcc/testsuite/ChangeLog:

2024-07-25  Martin Jambor  

PR ipa/115815
* gcc.dg/lto/pr115815_0.c: New test.
---
 gcc/cgraph.cc |  4 +++-
 gcc/ipa-visibility.cc |  5 +++--
 gcc/testsuite/gcc.dg/lto/pr115815_0.c | 18 ++
 3 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr115815_0.c

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index 473d8410bc9..39a3adbc7c3 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -2434,7 +2434,9 @@ cgraph_node_cannot_be_local_p_1 (cgraph_node *node, void 
*)
&& !node->forced_by_abi
&& !node->used_from_object_file_p ()
&& !node->same_comdat_group)
-  || !node->externally_visible));
+  || !node->externally_visible)
+  && !DECL_STATIC_CONSTRUCTOR (node->decl)
+  && !DECL_STATIC_DESTRUCTOR (node->decl));
 }
 
 /* Return true if cgraph_node can be made local for API change.
diff --git a/gcc/ipa-visibility.cc b/gcc/ipa-visibility.cc
index 501d3c304aa..21f0c47f388 100644
--- a/gcc/ipa-visibility.cc
+++ b/gcc/ipa-visibility.cc
@@ -102,7 +102,9 @@ non_local_p (struct cgraph_node *node, void *data 
ATTRIBUTE_UNUSED)
   && !node->externally_visible
   && !node->used_from_other_partition
   && !node->in_other_partition
-  && node->get_availability () >= AVAIL_AVAILABLE);
+  && node->get_availability () >= AVAIL_AVAILABLE
+  && !DECL_STATIC_CONSTRUCTOR (node->decl)
+  && !DECL_STATIC_DESTRUCTOR (node->decl));
 }
 
 /* Return true when function can be marked local.  */
@@ -116,7 +118,6 @@ cgraph_node::local_p (void)
  return n->callees->callee->local_p ();
return !n->call_for_symbol_thunks_and_aliases (non_local_p,
  NULL, true);
-   
 }
 
 /* A helper for comdat_can_be_unshared_p.  */
diff --git a/gcc/testsuite/gcc.dg/lto/pr115815_0.c 
b/gcc/testsuite/gcc.dg/lto/pr115815_0.c
new file mode 100644
index 000..d938ae4c802
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr115815_0.c
@@ -0,0 +1,18 @@
+int a;
+volatile int v;
+volatile int w;
+
+int __attribute__((destructor))
+b() {
+  if (v)
+return a + b();
+  v = 5;
+  return 0;
+}
+
+int
+main (int argc, char **argv)
+{
+  w = 1;
+  return 0;
+}
-- 
2.45.2



Re: [PATCH] rs6000: Adjust -fpatchable-function-entry* support for dual entry [PR112980]

2024-07-11 Thread Martin Jambor
Hi,

can I add myself to the bunch of people who are pinging this?  Having
this in will make our life easier.

Thanks a lot,

Martin


On Wed, May 08 2024, Kewen.Lin wrote:
> Hi,
>
> As the discussion in PR112980, although the current
> implementation for -fpatchable-function-entry* conforms
> with the documentation (making N NOPs be consecutive),
> it's inefficient for both kernel and userspace livepatching
> (see comments in PR for the details).
>
> So this patch is to change the current implementation by
> emitting the "before" NOPs before global entry point and
> the "after" NOPs after local entry point.  The new behavior
> would not keep NOPs to be consecutive, so the documentation
> is updated to emphasize this.
>
> Bootstrapped and regress-tested on powerpc64-linux-gnu
> P8/P9 and powerpc64le-linux-gnu P9 and P10.
>
> Is it ok for trunk?  And backporting to active branches
> after burn-in time?  I guess we should also mention this
> change in changes.html?
>
> BR,
> Kewen
> -
>   PR target/112980
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
>   Adjust the handling on patch area emitting with dual entry, remove
>   the restriction on "before" NOPs count, not emit "before" NOPs any
>   more but only emit "after" NOPs.
>   * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
>   Adjust by respecting cfun->machine->stop_patch_area_print.
>   (rs6000_elf_declare_function_name): For ELFv2 with dual entry, set
>   cfun->machine->stop_patch_area_print as true.
>   * config/rs6000/rs6000.h (struct machine_function): Remove member
>   global_entry_emitted, add new member stop_patch_area_print.
>   * doc/invoke.texi (option -fpatchable-function-entry): Adjust the
>   documentation for PowerPC ELFv2 dual entry.
>
> gcc/testsuite/ChangeLog:
>
>   * c-c++-common/patchable_function_entry-default.c: Adjust.
>   * gcc.target/powerpc/pr99888-4.c: Likewise.
>   * gcc.target/powerpc/pr99888-5.c: Likewise.
>   * gcc.target/powerpc/pr99888-6.c: Likewise.
> ---
>  gcc/config/rs6000/rs6000-logue.cc | 40 +--
>  gcc/config/rs6000/rs6000.cc   | 15 +--
>  gcc/config/rs6000/rs6000.h| 10 +++--
>  gcc/doc/invoke.texi   |  8 ++--
>  .../patchable_function_entry-default.c|  3 --
>  gcc/testsuite/gcc.target/powerpc/pr99888-4.c  |  4 +-
>  gcc/testsuite/gcc.target/powerpc/pr99888-5.c  |  4 +-
>  gcc/testsuite/gcc.target/powerpc/pr99888-6.c  |  4 +-
>  8 files changed, 33 insertions(+), 55 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
> b/gcc/config/rs6000/rs6000-logue.cc
> index 60ba15a8bc3..0eb019b44b3 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -4006,43 +4006,21 @@ rs6000_output_function_prologue (FILE *file)
> fprintf (file, "\tadd 2,2,12\n");
>   }
>
> -  unsigned short patch_area_size = crtl->patch_area_size;
> -  unsigned short patch_area_entry = crtl->patch_area_entry;
> -  /* Need to emit the patching area.  */
> -  if (patch_area_size > 0)
> - {
> -   cfun->machine->global_entry_emitted = true;
> -   /* As ELFv2 ABI shows, the allowable bytes between the global
> -  and local entry points are 0, 4, 8, 16, 32 and 64 when
> -  there is a local entry point.  Considering there are two
> -  non-prefixed instructions for global entry point prologue
> -  (8 bytes), the count for patchable nops before local entry
> -  point would be 2, 6 and 14.  It's possible to support those
> -  other counts of nops by not making a local entry point, but
> -  we don't have clear use cases for them, so leave them
> -  unsupported for now.  */
> -   if (patch_area_entry > 0)
> - {
> -   if (patch_area_entry != 2
> -   && patch_area_entry != 6
> -   && patch_area_entry != 14)
> - error ("unsupported number of nops before function entry (%u)",
> -patch_area_entry);
> -   rs6000_print_patchable_function_entry (file, patch_area_entry,
> -  true);
> -   patch_area_size -= patch_area_entry;
> - }
> - }
> -
>fputs ("\t.localentry\t", file);
>assemble_name (file, name);
>fputs (",.-", file);
>assemble_name (file, name);
>fputs ("\n", file);
>/* Emit the nops after local entry.  */
> -  if (patch_area_size > 0)
> - rs6000_print_patchable_function_entry (file, patch_area_size,
> -patch_area_entry == 0);
> +  unsigned short patch_area_size = crtl->patch_area_size;
> +  unsigned short patch_area_entry = crtl->patch_area_entry;
> +  if (patch_area_size > patch_area_entry)
> + {
> +   cfun->mach

[committed, gcc13] ipa: Compare jump functions in ICF (PR 113907)

2024-05-14 Thread Martin Jambor
Hi,

This is a manual backport of r14-9840-g1162861439fd3c from master.
Manual because the bits and value range representations in jump
functions have changed during the gcc 14 development cycle.

In PR 113907 comment #58, Honza found a case where ICF thinks bodies
of functions are equivalent but because of a difference in aliases in
a memory access, different aggregate jump functions are associated
with supposedly equivalent call statements.  This patch adds a way to
compare jump functions and plugs it into ICF to avoid the issue.
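
The comparison added to the call statement check can be sketched with
a toy model in C.  The types and toy_* names are hypothetical and much
simpler than real jump functions; the sketch only shows the overall
shape: same argument count, then pairwise equivalence.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of jump function kinds.  */
enum toy_jf_kind { TOY_JF_UNKNOWN, TOY_JF_CONST, TOY_JF_PASS_THROUGH };

/* Hypothetical jump function: VALUE is the constant for TOY_JF_CONST
   and the formal parameter index for TOY_JF_PASS_THROUGH.  */
struct toy_jump_func
{
  enum toy_jf_kind kind;
  long value;
};

/* Return true if A and B describe the same argument.  */
bool
toy_jump_functions_equivalent_p (const struct toy_jump_func *a,
				 const struct toy_jump_func *b)
{
  if (a->kind != b->kind)
    return false;
  if (a->kind == TOY_JF_UNKNOWN)
    return true;	/* VALUE carries no meaning here.  */
  return a->value == b->value;
}

/* Return true if two calls pass equivalent arguments: the counts
   match and every pair of jump functions is equivalent.  */
bool
toy_call_args_equivalent_p (const struct toy_jump_func *a, size_t na,
			    const struct toy_jump_func *b, size_t nb)
{
  if (na != nb)
    return false;
  for (size_t i = 0; i < na; i++)
    if (!toy_jump_functions_equivalent_p (&a[i], &b[i]))
      return false;
  return true;
}
```

Two calls whose statements look identical are rejected as soon as one
argument pair differs, which is the situation the patch guards
against.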

Bootstrapped and tested on x86_64-linux.  Committed to the gcc-13
branch.

Martin


gcc/ChangeLog:

2024-05-14  Martin Jambor  

PR ipa/113907
* ipa-prop.h (ipa_jump_functions_equivalent_p): Declare.
(values_equal_for_ipcp_p): Likewise.
* ipa-prop.cc (ipa_agg_pass_through_jf_equivalent_p): New function.
(ipa_agg_jump_functions_equivalent_p): Likewise.
(ipa_jump_functions_equivalent_p): Likewise.
* ipa-cp.cc (values_equal_for_ipcp_p): Make function public.
* ipa-icf-gimple.cc: Include alloc-pool.h, symbol-summary.h, sreal.h,
ipa-cp.h and ipa-prop.h.
(func_checker::compare_gimple_call): Compare jump functions.

gcc/testsuite/ChangeLog:

2024-05-10  Martin Jambor  

PR ipa/113907
* gcc.dg/lto/pr113907_0.c: New.
* gcc.dg/lto/pr113907_1.c: Likewise.
* gcc.dg/lto/pr113907_2.c: Likewise.
---
 gcc/ipa-cp.cc |   2 +-
 gcc/ipa-icf-gimple.cc |  29 +
 gcc/ipa-prop.cc   | 157 ++
 gcc/ipa-prop.h|   3 +
 gcc/testsuite/gcc.dg/lto/pr113907_0.c |  18 +++
 gcc/testsuite/gcc.dg/lto/pr113907_1.c |  35 ++
 gcc/testsuite/gcc.dg/lto/pr113907_2.c |  11 ++
 7 files changed, 254 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113907_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113907_1.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113907_2.c

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index b3e0f62e400..8f36608cf33 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -458,7 +458,7 @@ ipcp_lattice::is_single_const ()
 
 /* Return true iff X and Y should be considered equal values by IPA-CP.  */
 
-static bool
+bool
 values_equal_for_ipcp_p (tree x, tree y)
 {
   gcc_checking_assert (x != NULL_TREE && y != NULL_TREE);
diff --git a/gcc/ipa-icf-gimple.cc b/gcc/ipa-icf-gimple.cc
index 49302ad56c6..054a557bd58 100644
--- a/gcc/ipa-icf-gimple.cc
+++ b/gcc/ipa-icf-gimple.cc
@@ -42,7 +42,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-sra.h"
 
 #include "tree-ssa-alias-compare.h"
+#include "alloc-pool.h"
+#include "symbol-summary.h"
 #include "ipa-icf-gimple.h"
+#include "sreal.h"
+#include "ipa-prop.h"
 
 namespace ipa_icf_gimple {
 
@@ -751,6 +755,31 @@ func_checker::compare_gimple_call (gcall *s1, gcall *s2)
   && !compatible_types_p (TREE_TYPE (t1), TREE_TYPE (t2)))
 return return_false_with_msg ("GIMPLE internal call LHS type mismatch");
 
+  if (!gimple_call_internal_p (s1))
+{
+  cgraph_edge *e1 = cgraph_node::get (m_source_func_decl)->get_edge (s1);
+  cgraph_edge *e2 = cgraph_node::get (m_target_func_decl)->get_edge (s2);
+  class ipa_edge_args *args1 = ipa_edge_args_sum->get (e1);
+  class ipa_edge_args *args2 = ipa_edge_args_sum->get (e2);
+  if ((args1 != nullptr) != (args2 != nullptr))
+   return return_false_with_msg ("ipa_edge_args mismatch");
+  if (args1)
+   {
+ int n1 = ipa_get_cs_argument_count (args1);
+ int n2 = ipa_get_cs_argument_count (args2);
+ if (n1 != n2)
+   return return_false_with_msg ("ipa_edge_args nargs mismatch");
+ for (int i = 0; i < n1; i++)
+   {
+ struct ipa_jump_func *jf1 = ipa_get_ith_jump_func (args1, i);
+ struct ipa_jump_func *jf2 = ipa_get_ith_jump_func (args2, i);
+ if (((jf1 != nullptr) != (jf2 != nullptr))
+ || (jf1 && !ipa_jump_functions_equivalent_p (jf1, jf2)))
+   return return_false_with_msg ("jump function mismatch");
+   }
+   }
+}
+
   return compare_operand (t1, t2, get_operand_access_type (&map, t1));
 }
 
diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index 0d816749534..11ba2521b2c 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -6022,5 +6022,162 @@ ipcp_transform_function (struct cgraph_node *node)
   return modified_mem_access ? TODO_update_ssa_only_virtuals : 0;
 }
 
+/* Return true if the two pass_through components of two jump functions are
+   known to be equivalent.  AGG_JF denotes whether they are part of aggregate
+   functions or not.  The function can be used before the IPA phase of IPA-CP
+   or inlining because 

[PATCH] sra: Do not leave work for DSE (that it can sometimes not perform)

2024-05-03 Thread Martin Jambor
Hi,

when looking again at the g++.dg/tree-ssa/pr109849.C testcase we
discovered that it generates terrible store-to-load forwarding stalls
because SRA was leaving behind aggregate loads but all the stores were
by scalar parts and DSE failed to remove the useless load.  SRA has
all the knowledge to remove the statement even now, so this small
patch makes it do so.

With this patch, the g++.dg/tree-ssa/pr109849.C micro-benchmark runs 9
times faster (on an AMD EPYC 75F3 machine).

Bootstrapped and tested on x86_64.  OK for master?

Given that the patch is simple but can sometimes have a large benefit,
could it possibly be backported to the gcc-14 branch in a few weeks,
even though it is not a regression (at least not in the last decade)?

Thanks,

Martin


gcc/ChangeLog:

2024-04-18  Martin Jambor  

* tree-sra.cc (sra_modify_assign): Remove the original statement
also when dealing with a store to a fully covered aggregate from a
non-candidate.

gcc/testsuite/ChangeLog:

2024-04-23  Martin Jambor  

* g++.dg/tree-ssa/pr109849.C: Also check that the aggregate store
to cur disappears.
* gcc.dg/tree-ssa/ssa-dse-26.c: Instead of relying on DSE,
check that the unwanted stores were removed at early SRA time.
---
 gcc/testsuite/g++.dg/tree-ssa/pr109849.C   |  3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c |  6 +++---
 gcc/tree-sra.cc| 14 --
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr109849.C b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C
index cd348c0f590..d06dbb10482 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr109849.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-sra" } */
+/* { dg-options "-O2 -fdump-tree-sra -fdump-tree-optimized" } */
 
 #include 
 typedef unsigned int uint32_t;
@@ -29,3 +29,4 @@ main()
 }
 
 /* { dg-final { scan-tree-dump "Created a replacement for stack offset" "sra"} } */
+/* { dg-final { scan-tree-dump-not "cur = MEM" "optimized"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
index 43152de5616..1d01392c595 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dse1-details -fno-short-enums -fno-tree-fre" } */
+/* { dg-options "-O2 -fdump-tree-esra -fno-short-enums -fno-tree-fre" } */
 /* { dg-skip-if "we want a BIT_FIELD_REF from fold_truth_andor" { ! lp64 } } */
 /* { dg-skip-if "temporary variable names are not x and y" { mmix-knuth-mmixware } } */
 
@@ -31,5 +31,5 @@ constraint_equal (struct constraint a, struct constraint b)
 && constraint_expr_equal (a.rhs, b.rhs);
 }
 
-/* { dg-final { scan-tree-dump-times "Deleted dead store: x = " 2 "dse1" } } */
-/* { dg-final { scan-tree-dump-times "Deleted dead store: y = " 2 "dse1" } } */
+/* { dg-final { scan-tree-dump-not "x = " "esra" } } */
+/* { dg-final { scan-tree-dump-not "y = " "esra" } } */
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 32fa28911f2..8040b0c5645 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -4854,8 +4854,18 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
 But use the RHS aggregate to load from to expose more
 optimization opportunities.  */
  if (access_has_children_p (lacc))
-   generate_subtree_copies (lacc->first_child, rhs, lacc->offset,
-0, 0, gsi, true, true, loc);
+   {
+ generate_subtree_copies (lacc->first_child, rhs, lacc->offset,
+  0, 0, gsi, true, true, loc);
+ if (lacc->grp_covered)
+   {
+ unlink_stmt_vdef (stmt);
+ gsi_remove (& orig_gsi, true);
+ release_defs (stmt);
+ sra_stats.deleted++;
+ return SRA_AM_REMOVED;
+   }
+   }
}
 
   return SRA_AM_NONE;
-- 
2.44.0



Re: [wwwdocs] Add znver5 to GCC 14 changes

2024-05-03 Thread Martin Jambor
Hi Gerald,

On Fri, May 03 2024, Gerald Pfeifer wrote:
> Hi Martin,
>
> On Thu, 2 May 2024, Martin Jambor wrote:
>> +   GCC now supports AMD CPUs based on the znver5 core via
>> +-march=znver5.  Based on ISA extensions enabled on
>> +a znver4 core, the switch further enables the AVXVNNI, MOVDIRI,
>> +MOVDIR64B, AVX512VP2INTERSECT, and PREFETCHI ISA extensions.
>
> just two small suggestions: We usually sort extensions alphabetically,
> so  AVX512VP2INTERSECT, AVXVNNI, MOVDIR64B, MOVDIRI, and PREFETCHI. If 
> there is a specific reason to do otherwise, that's okay of course.
>
> And I might write "In addition to the ISA extensions enabled on a znver4 
> core, this switch..." to avoid the repetition of "based on" (and make it a 
> bit more clear even that it is a full superset, not just 'loosely' based".
>

Thanks for the suggestions, I'll go ahead and commit the following then.

Martin


diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 8dfbf7dc..46a0266d 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -954,6 +954,12 @@ __asm (".global __flmap_lock"  "\n\t"
 -fsanitize=hwaddress will enable -mlam=u57
 by default.
   
+   GCC now supports AMD CPUs based on the znver5 core via
+-march=znver5.  In addition to the ISA extensions
+enabled on a znver4 core, this switch further enables the
+AVX512VP2INTERSECT, AVXVNNI, MOVDIR64B, MOVDIRI, and PREFETCHI ISA
+extensions.
+  
 
 
 MCore



[wwwdocs] Add znver5 to GCC 14 changes

2024-05-02 Thread Martin Jambor
Hello,

based on input from AMD, I'd like to commit the following to the wwwdocs
repo to point out new support for Zen 5 based AMD CPUs in GCC 14.

Is it OK?  Any suggestions, comments or questions?

Thanks,

Martin



diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 8dfbf7dc..d250340b 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -954,6 +954,11 @@ __asm (".global __flmap_lock"  "\n\t"
 -fsanitize=hwaddress will enable -mlam=u57
 by default.
   
+   GCC now supports AMD CPUs based on the znver5 core via
+-march=znver5.  Based on ISA extensions enabled on
+a znver4 core, the switch further enables the AVXVNNI, MOVDIRI,
+MOVDIR64B, AVX512VP2INTERSECT, and PREFETCHI ISA extensions.
+  
 
 
 MCore


Re: [wwwdocs] Porting-to-14: Mention new pragma GCC Target behavior

2024-05-02 Thread Martin Jambor
Hi,

On Wed, May 01 2024, Gerald Pfeifer wrote:
> On Tue, 30 Apr 2024, Martin Jambor wrote:
>> +Pragma GCC Target now affects preprocessor 
>> symbols
>
> Note the id: should be "gcc-target-pragma", though I even suggest to 
> simplify and say "target-pragma".
>
>> +The behavior of pragma GCC Target and specifically how it affects ISA
>
> Seconding Jakub's
>  
>   "And here as well, perhaps even #pragma GCC target."
>
>> +macros has changed in GCC 14.  In GCC 13 and older, the GCC
>> +target pragma defined and undefined corresponding ISA macros in
>> +C when using integrated preprocessor during compilation but not when
>
> "...the integrated preprocessor..."
>
>> +preprocessor was invoked as a separate step or when using -save-temps.
>
> "...the preprocessor..."
>
> and -save-temps, or better "the -save-temps 
> option".
>
>> +This can lead to different behavior, especially in C++.  For example,
>> +functions the C++ snippet below will be (silently) compiled for an
>> +incorrect instruction set by GCC 14.
>
> "functions" above looks like it's extraneous and should be skipped?
>
>> +  /* With GCC 14, __AVX2__ here will always be defined and pop_options
>> +  never called. */
>> +  #if ! __AVX2__
>> +  #pragma GCC pop_options
>> +  #endif
>
> Maybe a bit subtle, I would not say a #pragma is called; how about invoked 
> or activated?
>
>> +
>> +The fix in this case would be to remember
>> +whether pop_options needs to be performed in a new
>> +user-defined macro.
>
> "The fix in this case is to remember" (or "...remembering...")
>

Thanks for your suggestions, this is what I am going to commit in a
moment.

Martin


diff --git a/htdocs/gcc-14/porting_to.html b/htdocs/gcc-14/porting_to.html
index c825a68e..a20d82c2 100644
--- a/htdocs/gcc-14/porting_to.html
+++ b/htdocs/gcc-14/porting_to.html
@@ -514,6 +514,48 @@ be included explicitly when compiling with GCC 14:
 
 
 
+Pragma GCC target now affects preprocessor symbols
+
+
+The behavior of pragma GCC target and specifically how it affects ISA
+macros has changed in GCC 14.  In GCC 13 and older, the GCC
+target pragma defined and undefined corresponding ISA macros in
+C when using the integrated preprocessor during compilation but not
+when the preprocessor was invoked as a separate step or when using
+the -save-temps option.  In C++ the ISA macro definitions
+were performed in a way which did not have any actual effect.
+
+In GCC 14 C++ behaves like C with integrated preprocessing in earlier
+versions. Moreover, in both languages ISA macros are defined and
+undefined as expected when preprocessing separately from compilation.
+
+
+This can lead to different behavior, especially in C++.  For example,
+a part of the C++ snippet below will be (silently) compiled for an
+incorrect instruction set by GCC 14.
+
+
+  #if ! __AVX2__
+  #pragma GCC push_options
+  #pragma GCC target("avx2")
+  #endif
+
+  /* Code to be compiled for AVX2. */
+
+  /* With GCC 14, __AVX2__ here will always be defined and pop_options
+  never invoked. */
+  #if ! __AVX2__
+  #pragma GCC pop_options
+  #endif
+
+  /* With GCC 14, all following functions will be compiled for AVX2
+  which was not intended. */
+
+
+
+The fix in this case is to remember whether pop_options
+needs to be performed in a new user-defined macro.
+
 
 
 



Re: [wwwdocs] Porting-to-14: Mention new pragma GCC Target behavior

2024-04-30 Thread Martin Jambor
Hi,

On Thu, Apr 25 2024, Jakub Jelinek wrote:
> On Thu, Apr 25, 2024 at 02:34:22PM +0200, Martin Jambor wrote:
>> when looking at a package build issue with GCC 14, Michal Jireš noted a
>> different behavior of pragma GCC Target.  This snippet tries to describe
>> the gist of the problem.  I have left it in the C section even though it
>> is not really C specific, but could not think of a good name for a new
>> section for it.  Ideas (and any other suggestions for improvements)
>> welcome, of course.
>
> The change was more subtle.
> We used to define/undefine the ISA macros in C in GCC 13 and older as well,
> but only when using integrated preprocessor during compilation,
> so it didn't work that way with -save-temps or separate -E and -S/-c
> steps.
> While in C++ it behaved as if the define/undefines aren't done at all
> (they were done, but after preprocessing/lexing everything, so didn't
> affect anything).
> In GCC 14, it behaves in C++ the same as in C in older versions, and
> additionally they are defined/undefined also when using separate
> preprocessing, in both C and C++.
>

I see, thanks for the correction.

Would the following then perhaps describe the situation accurately?
Note that I have moved the whole thing to C++ section because it seems
porting issues in C because of this are quite unlikely.

Michal, I assume that the file where this issue happened was written in
C++, right?

Martin



diff --git a/htdocs/gcc-14/porting_to.html b/htdocs/gcc-14/porting_to.html
index c825a68e..1e67b0b3 100644
--- a/htdocs/gcc-14/porting_to.html
+++ b/htdocs/gcc-14/porting_to.html
@@ -514,6 +514,51 @@ be included explicitly when compiling with GCC 14:
 
 
 
+Pragma GCC Target now affects preprocessor symbols
+
+
+The behavior of pragma GCC Target and specifically how it affects ISA
+macros has changed in GCC 14.  In GCC 13 and older, the GCC
+target pragma defined and undefined corresponding ISA macros in
+C when using integrated preprocessor during compilation but not when
+preprocessor was invoked as a separate step or when using -save-temps.
+In C++ the ISA macro definitions were performed in a way which did not
+have any actual effect.
+
+In GCC 14 C++ behaves like C with integrated preprocessing in earlier
+versions. Moreover, in both languages ISA macros are defined and
+undefined as expected when preprocessing separately from compilation.
+
+
+This can lead to different behavior, especially in C++.  For example,
+functions the C++ snippet below will be (silently) compiled for an
+incorrect instruction set by GCC 14.
+
+
+  #if ! __AVX2__
+  #pragma GCC push_options
+  #pragma GCC target("avx2")
+  #endif
+
+  /* Code to be compiled for AVX2. */
+
+  /* With GCC 14, __AVX2__ here will always be defined and pop_options
+  never called. */
+  #if ! __AVX2__
+  #pragma GCC pop_options
+  #endif
+
+  /* With GCC 14, all following functions will be compiled for AVX2
+  which was not intended. */
+
+
+
+The fix in this case would be to remember
+whether pop_options needs to be performed in a new
+user-defined macro.
+
+
+
 
 
 


[wwwdocs] Porting-to-14: Mention new pragma GCC Target behavior

2024-04-25 Thread Martin Jambor
Hello,

when looking at a package build issue with GCC 14, Michal Jireš noted a
different behavior of pragma GCC Target.  This snippet tries to describe
the gist of the problem.  I have left it in the C section even though it
is not really C specific, but could not think of a good name for a new
section for it.  Ideas (and any other suggestions for improvements)
welcome, of course.

Otherwise, would this be good to go to the wwwdocs?

Thanks,

Martin

diff --git a/htdocs/gcc-14/porting_to.html b/htdocs/gcc-14/porting_to.html
index c825a68e..ae9a3cde 100644
--- a/htdocs/gcc-14/porting_to.html
+++ b/htdocs/gcc-14/porting_to.html
@@ -490,6 +490,43 @@ in C23.
 GCC will probably continue to support old-style function definitions
 even once C23 is used as the default language dialect.
 
+Pragma GCC Target now affects preprocessor symbols
+
+
+The behavior of pragma GCC Target has changed in GCC 14.  For example,
+GCC 13 and below defines __AVX2__ only when the target
+is specified on the command line.  This has been considered <a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87299">a
+bug</a> and since it was fixed in GCC 14, __AVX2__ is now also
+defined with #pragma GCC target("avx2").
+
+
+Therefore, if macros expand to something like the snippet below,
+functions will be (silently) compiled for an incorrect instruction
+set.
+
+
+  #if ! __AVX2__
+  #pragma GCC push_options
+  #pragma GCC target("avx2")
+  #endif
+
+  /* Code to be compiled for AVX2. */
+
+  /* With GCC 14, __AVX2__ here will always be defined and pop_options
+  never called. */
+  #if ! __AVX2__
+  #pragma GCC pop_options
+  #endif
+
+  /* With GCC 14, all following functions will be compiled for AVX2
+  which was not intended. */
+
+
+
+The fix in this case would be to remember
+whether pop_options needs to be performed in a new
+user-defined macro.
+
 C++ language issues
 
 Header dependency changes


Re: [PATCH] contrib/check-params-in-docs.py: Ignore target-specific params

2024-04-12 Thread Martin Jambor
Hi,

On Fri, Apr 12 2024, Filip Kastl wrote:
> On Thu 2024-04-11 20:51:55, Thomas Schwinge wrote:
>> Hi!
>> 
>> On 2024-04-11T19:52:51+0200, Martin Jambor  wrote:
>> > contrib/check-params-in-docs.py is a script that checks that all
>> > options reported with ./gcc/xgcc -Bgcc --help=param are in
>> > gcc/doc/invoke.texi and vice versa.
>> 
>> Eh, first time I'm hearing about this one!

It's running as part of our internal buildbot that Martin Liška set up.

I must admit I did want to spend the minimum time necessary to fix the
failure and did not realize Filip was looking at it too until I committed
my simple fix...

>> 
>> (a) Shouldn't this be running as part of the GCC build process?
>> 
>> > gcn-preferred-vectorization-factor is in the manual but normally not
>> > reported by --help, probably because I do not have gcn offload
>> > configured.
>> 
>> No, because you've not been building GCC for GCN target.  ;-P
>> 
>> > This patch makes the script silently about this particular
>> > fact.
>> 
>> (b) Shouldn't we instead ignore any '--param's with "gcn" prefix, similar
>> to how that's done for "skip aarch64 params"?
>> 
>> (c) ..., and shouldn't we likewise skip any "x86" ones?
>> 
>> (d) ..., or in fact any target specific ones, following after the generic
>> section?  (Easily achieved with a special marker in
>> 'gcc/doc/invoke.texi', just before:
>> 
>> The following choices of @var{name} are available on AArch64 targets:
>> 
>> ..., and adjusting the 'takewhile' in 'contrib/check-params-in-docs.py'
>> accordingly?
>
> Hi,
>
> I've made a patch to address (b), (c), (d).  I didn't adjust takewhile.  I
> chose to do it differently since target-specific params in both invoke.texi 
> and
> --help=params have to be ignored.
>
> The downside of this patch is that the script won't complain if someone adds a
> target-specific param and doesn't document it.
>
> What do you think?

...and this is clearly much better.  Thanks!

Martin

>
> Cheers,
> Filip
>
> -- 8< --
>
> contrib/check-params-in-docs.py is a script that checks that all options
> reported with gcc --help=params are in gcc/doc/invoke.texi and vice
> versa.
> gcc/doc/invoke.texi lists target-specific params but gcc --help=params
> doesn't.  This meant that the script would mistakenly complain about
> parms missing from --help=params.  Previously, the script was just set
> to ignore aarch64 and gcn params which solved this issue only for x86.
> This patch sets the script to ignore all target-specific params.
>
> contrib/ChangeLog:
>
>   * check-params-in-docs.py: Ignore target specific params.
>
> Signed-off-by: Filip Kastl 
> ---
>  contrib/check-params-in-docs.py | 21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py
> index f7879dd8e08..ccdb8d72169 100755
> --- a/contrib/check-params-in-docs.py
> +++ b/contrib/check-params-in-docs.py
> @@ -38,6 +38,9 @@ def get_param_tuple(line):
>  description = line[i:].strip()
>  return (name, description)
>  
> +def target_specific(param):
> +return param.split('-')[0] in ('aarch64', 'gcn', 'x86')
> +
>  
>  parser = argparse.ArgumentParser()
>  parser.add_argument('texi_file')
> @@ -45,13 +48,16 @@ parser.add_argument('params_output')
>  
>  args = parser.parse_args()
>  
> -ignored = {'logical-op-non-short-circuit', 
> 'gcn-preferred-vectorization-factor'}
> -params = {}
> +ignored = {'logical-op-non-short-circuit'}
> +help_params = {}
>  
>  for line in open(args.params_output).readlines():
>  if line.startswith(' ' * 2) and not line.startswith(' ' * 8):
>  r = get_param_tuple(line)
> -params[r[0]] = r[1]
> +help_params[r[0]] = r[1]
> +
> +# Skip target-specific params
> +help_params = [x for x in help_params.keys() if not target_specific(x)]
>  
>  # Find section in .texi manual with parameters
>  texi = ([x.strip() for x in open(args.texi_file).readlines()])
> @@ -66,14 +72,13 @@ for line in texi:
>  texi_params.append(line[len(token):])
>  break
>  
> -# skip digits
> +# Skip digits
>  texi_params = [x for x in texi_params if not x[0].isdigit()]
> -# skip aarch64 params
> -texi_params = [x for x in texi_params if not x.startswith('aarch64')]
> -sorted_params = sorted(texi_params)
> +# Skip target-specific params
> +texi_params = [x for x in texi_params if not target_specific(x)]
>  
>  texi_set = set(texi_params) - ignored
> -params_set = set(params.keys()) - ignored
> +params_set = set(help_params) - ignored
>  
>  success = True
>  extra = texi_set - params_set
> -- 
> 2.43.1


[PATCH] contrib/check-params-in-docs.py: Ignore gcn-preferred-vectorization-factor

2024-04-11 Thread Martin Jambor
Hi,

contrib/check-params-in-docs.py is a script that checks that all
options reported with ./gcc/xgcc -Bgcc --help=param are in
gcc/doc/invoke.texi and vice versa.
gcn-preferred-vectorization-factor is in the manual but normally not
reported by --help, probably because I do not have gcn offload
configured.  This patch makes the script silent about this particular
fact.
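
A minimal sketch of the comparison the script performs, run on made-up
sample data rather than on real --help=param output.  The names
starting with "sample-" are invented for illustration, and the
prefix-based target_specific helper follows the idea from the follow-up
thread instead of this one-line patch.

```python
# The ignore list as it looks after this patch.
ignored = {'logical-op-non-short-circuit',
           'gcn-preferred-vectorization-factor'}

# Sample data for illustration only, not real script input.
texi_params = {'max-inline-insns-single',
               'logical-op-non-short-circuit',
               'gcn-preferred-vectorization-factor',
               'sample-only-in-texi'}
help_params = {'max-inline-insns-single',
               'sample-only-in-help'}

texi_set = texi_params - ignored
params_set = help_params - ignored

extra = texi_set - params_set      # documented but not reported by --help
missing = params_set - texi_set    # reported by --help but not documented


def target_specific(param):
    """Alternative filter: True if the first dash-separated word is a target."""
    return param.split('-')[0] in ('aarch64', 'gcn', 'x86')


# With the prefix filter, only the generic entry needs to stay ignored.
generic_ignored = {'logical-op-non-short-circuit'}
texi_filtered = {p for p in texi_params if not target_specific(p)} - generic_ignored

print(sorted(extra), sorted(missing))
```

With this sample data both approaches leave the same documented set to
compare, so the choice between them is about maintenance effort, not
about the result.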

I'll push the patch as obvious momentarily.

Martin


contrib/ChangeLog:

2024-04-11  Martin Jambor  

* check-params-in-docs.py (ignored): Add
gcn-preferred-vectorization-factor.
---
 contrib/check-params-in-docs.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py
index 623c82284e2..f7879dd8e08 100755
--- a/contrib/check-params-in-docs.py
+++ b/contrib/check-params-in-docs.py
@@ -45,7 +45,7 @@ parser.add_argument('params_output')
 
 args = parser.parse_args()
 
-ignored = {'logical-op-non-short-circuit'}
+ignored = {'logical-op-non-short-circuit', 'gcn-preferred-vectorization-factor'}
 params = {}
 
 for line in open(args.params_output).readlines():
-- 
2.44.0



[wwwdocs, committed] Fix link to "Feature Test Macros" in "Porting to GCC 14" page

2024-04-10 Thread Martin Jambor
Hi,

Michal Jireš found out that the link to Feature Test Macros on the
Porting to GCC 14 page was broken; it was missing a "/latest/" directory in
the middle of the path.

I'll commit the following as obvious.

Thanks,

Martin

---
 htdocs/gcc-14/porting_to.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-14/porting_to.html b/htdocs/gcc-14/porting_to.html
index 35274691..c825a68e 100644
--- a/htdocs/gcc-14/porting_to.html
+++ b/htdocs/gcc-14/porting_to.html
@@ -133,7 +133,7 @@ On GNU systems, headers described in standards (such as the C
 standard, or POSIX) may require the definition of certain
 macros at the start of the compilation before all required
 function declarations are made available.
-See <a href="https://sourceware.org/glibc/manual/html_node/Feature-Test-Macros.html">Feature Test Macros</a>
+See <a href="https://sourceware.org/glibc/manual/latest/html_node/Feature-Test-Macros.html">Feature Test Macros</a>
 in the GNU C Library manual for details.
 
 
-- 
2.44.0



Re: [PATCH] ICF&SRA: Make ICF and SRA agree on padding

2024-04-08 Thread Martin Jambor
Hello,

On Sun, Apr 07 2024, Xi Ruoyao wrote:
> On Thu, 2024-04-04 at 23:19 +0200, Martin Jambor wrote:
>> The patch has been approved by Honza in Bugzilla. (I hope.  He did write
>> it looked reasonable.)  Together with the patch for PR 113907, it has
>> passed bootstrap, LTO bootstrap and LTO profiledbootstrap and testing on
>> x86_64-linux and bootstrap and LTO bootstrap on ppc64le-linux.  It also
>> passed normal bootstrap on aarch64-linux but there many testcases failed
>> because the compiler timed out.  The machine is old and slow and might
>> have been oversubscribed so my plan is to try again on gcc185 from
>> cfarm.  If that goes well, I intend to commit the patch and then start
>> working on backports.
>
> I've tried these two patches out on my own 24-core AArch64 machine. 
> Bootstrapped (but no LTO or PGO) and regtested fine.
>

Thank you very much, I have pushed the patches to upstream.

Martin


[PATCH] ipa: Force args obtained through pass-through maps to the expected type (PR 113964)

2024-04-05 Thread Martin Jambor
Hi,

interactions of IPA-CP and IPA-SRA on the same data are a rather big
source of issues, I'm afraid.  PR 113964 is a situation where IPA-CP
propagates an unsigned short in a union parameter into a function
which itself calls a different function which has the same union
parameter, and both these union parameters are split with IPA-SRA.  The
leaf function, however, uses a signed short member of the union.

In the calling function, we get the unsigned constant as the
replacement for the union and it is then passed in the call without
any type compatibility checks.  Apparently on riscv64 it matters
whether the parameter is signed or unsigned short and so the leaf
function can see different values.

Fixed by using useless_type_conversion_p at the appropriate place and
if it fails, use force_value_to_type as elsewhere in similar
situations.
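
To make the signedness mismatch concrete, here is a small stand-alone
model using Python's ctypes, not GCC code.  The union mirrors "union a"
from the new testcase; the final conversion only imitates the idea of
forcing the value to the expected type and is an assumption about how
to picture it, not a description of force_value_to_type itself.

```python
import ctypes


class A(ctypes.Union):
    # Same members as "union a" in the new testcase.
    _fields_ = [("b", ctypes.c_ushort),
                ("c", ctypes.c_int),
                ("d", ctypes.c_short)]


k = A()
k.d = -1                # the store done in the calling function

as_unsigned = k.b       # the same bits read through the unsigned short member
as_signed = k.d         # the value the leaf function actually reads

# Converting the unsigned constant to the expected signed type gives the
# leaf function's view back; ctypes integer types wrap silently on overflow.
converted = ctypes.c_short(as_unsigned).value

print(as_unsigned, as_signed, converted)
```

The two members hold identical bits but denote different values once
widened, which is why passing the unsigned replacement unchanged can
yield a wrong result in the callee.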

Bootstrapped and tested on x86_64-linux, the reporter has also run the
testsuite with this patch on riscv64 and reported in Bugzilla there were
no issues.

OK for master and GCC 13?

Thanks,

Martin


gcc/ChangeLog:

2024-04-04  Martin Jambor  

PR ipa/113964
* ipa-param-manipulation.cc (ipa_param_adjustments::modify_call):
Force values obtained through pass-through maps to the expected
split type.

gcc/testsuite/ChangeLog:

2024-04-04  Patrick O'Neill  
        Martin Jambor  

PR ipa/113964
* gcc.dg/ipa/pr114247.c: New test.
---
 gcc/ipa-param-manipulation.cc   |  6 ++
 gcc/testsuite/gcc.dg/ipa/pr114247.c | 31 +
 2 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr114247.c

diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
index 3e0df6a6f77..b4ca78b652e 100644
--- a/gcc/ipa-param-manipulation.cc
+++ b/gcc/ipa-param-manipulation.cc
@@ -740,6 +740,12 @@ ipa_param_adjustments::modify_call (cgraph_edge *cs,
  }
   if (repl)
{
+ if (!useless_type_conversion_p(apm->type, repl->typed.type))
+   {
+ repl = force_value_to_type (apm->type, repl);
+ repl = force_gimple_operand_gsi (&gsi, repl,
+  true, NULL, true, GSI_SAME_STMT);
+   }
  vargs.quick_push (repl);
  continue;
}
diff --git a/gcc/testsuite/gcc.dg/ipa/pr114247.c b/gcc/testsuite/gcc.dg/ipa/pr114247.c
new file mode 100644
index 000..60aa2bc0122
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr114247.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fsigned-char -fno-strict-aliasing -fwrapv" } */
+
+union a {
+  unsigned short b;
+  int c;
+  signed short d;
+};
+int e, f = 1, g;
+long h;
+const int **i;
+void j(union a k, int l, unsigned m) {
+  const int *a[100];
+  i = &a[0];
+  h = k.d;
+}
+static int o(union a k) {
+  k.d = -1;
+  while (1)
+if (f)
+  break;
+  j(k, g, e);
+  return 0;
+}
+int main() {
+  union a n = {1};
+  o(n);
+  if (h != -1)
+__builtin_abort();
+  return 0;
+}
-- 
2.44.0



[PATCH] ICF&SRA: Make ICF and SRA agree on padding

2024-04-04 Thread Martin Jambor
Hi,

PR 113359 shows that (at least with -fno-strict-aliasing) ICF can
unify two functions which copy an aggregate type of the same size but
then SRA, through its total scalarization, can copy the aggregate by
pieces, skipping padding, but the padding was not the same in the two
original functions that ICF unified.

This patch enhances SRA with the ability to collect padding
information which then can be compared from within ICF.  Unfortunately
SRA uses OPTION_SET_P when determining its limits, so ICF needs to
switch cfuns at least once to figure it out too.
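
The following stand-alone sketch (Python ctypes, not GCC code) shows
the kind of layout difference involved.  S1, S2, padding_bytes and
copy_by_fields are hypothetical names made up for the illustration; the
exact padding offsets depend on the platform ABI.

```python
import ctypes


class S1(ctypes.Structure):
    _fields_ = [("c", ctypes.c_char), ("i", ctypes.c_int32)]


class S2(ctypes.Structure):
    _fields_ = [("i", ctypes.c_int32), ("c", ctypes.c_char)]


def padding_bytes(struct_type):
    """Return the byte offsets of STRUCT_TYPE not covered by any field."""
    covered = set()
    for name, ctype in struct_type._fields_:
        start = getattr(struct_type, name).offset
        covered.update(range(start, start + ctypes.sizeof(ctype)))
    return sorted(set(range(ctypes.sizeof(struct_type))) - covered)


def copy_by_fields(dst, src):
    """Copy field by field, the way a scalarized copy would, skipping padding."""
    for name, _ in type(src)._fields_:
        setattr(dst, name, getattr(src, name))


# Source object whose every byte, padding included, is 0xAA.
src = S1.from_buffer_copy(b"\xaa" * ctypes.sizeof(S1))
dst = S1()                      # ctypes zero-initializes new objects
copy_by_fields(dst, src)

print(ctypes.sizeof(S1), padding_bytes(S1))
print(ctypes.sizeof(S2), padding_bytes(S2))
print(bytes(src).hex(), bytes(dst).hex())
```

Both types have the same size, so a whole-aggregate copy of one looks
just like a copy of the other, yet a piecewise copy skips different
bytes in each.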

The patch has been approved by Honza in Bugzilla. (I hope.  He did write
it looked reasonable.)  Together with the patch for PR 113907, it has
passed bootstrap, LTO bootstrap and LTO profiledbootstrap and testing on
x86_64-linux and bootstrap and LTO bootstrap on ppc64le-linux.  It also
passed normal bootstrap on aarch64-linux but many testcases failed there
because the compiler timed out.  The machine is old and slow and might
have been oversubscribed so my plan is to try again on gcc185 from
cfarm.  If that goes well, I intend to commit the patch and then start
working on backports.

Martin


gcc/ChangeLog:

2024-03-27  Martin Jambor  

PR ipa/113359
* ipa-icf-gimple.h (func_checker): New members
safe_for_total_scalarization_p, m_total_scalarization_limit_known_p
and m_total_scalarization_limit.
(func_checker::func_checker): Initialize new member variables.
* ipa-icf-gimple.cc: Include tree-sra.h.
(func_checker::func_checker): Initialize new member variables.
(func_checker::safe_for_total_scalarization_p): New function.
(func_checker::compare_operand): Use the new function.
* tree-sra.h (sra_get_max_scalarization_size): Declare.
(sra_total_scalarization_would_copy_same_data_p): Likewise.
* tree-sra.cc (prepare_iteration_over_array_elts): New function.
(class sra_padding_collecting): New.
(sra_padding_collecting::record_padding): Likewise.
(scalarizable_type_p): Rename to totally_scalarizable_type_p.  Add
ability to record padding when requested.
(totally_scalarize_subtree): Split out gathering information necessary
to iterate over array elements to prepare_iteration_over_array_elts.
Fix erroneous early exit.
(analyze_all_variable_accesses): Adjust the call to
totally_scalarizable_type_p.  Move determining of total scalarization
size limit...
(sra_get_max_scalarization_size): ...here.
(check_ts_and_push_padding_to_vec): New function.
(sra_total_scalarization_would_copy_same_data_p): Likewise.

gcc/testsuite/ChangeLog:

2024-03-27  Martin Jambor  

PR ipa/113359
* gcc.dg/lto/pr113359-1_0.c: New.
* gcc.dg/lto/pr113359-1_1.c: Likewise.
* gcc.dg/lto/pr113359-2_0.c: Likewise.
* gcc.dg/lto/pr113359-2_1.c: Likewise.
* gcc.dg/lto/pr113359-3_0.c: Likewise.
* gcc.dg/lto/pr113359-3_1.c: Likewise.
* gcc.dg/lto/pr113359-4_0.c: Likewise.
* gcc.dg/lto/pr113359-4_1.c: Likewise.
* gcc.dg/lto/pr113359-5_0.c: Likewise.
* gcc.dg/lto/pr113359-5_1.c: Likewise.
---
 gcc/ipa-icf-gimple.cc   |  41 +++-
 gcc/ipa-icf-gimple.h|  15 +-
 gcc/testsuite/gcc.dg/lto/pr113359-1_0.c |  86 
 gcc/testsuite/gcc.dg/lto/pr113359-1_1.c |  38 
 gcc/testsuite/gcc.dg/lto/pr113359-2_0.c |  87 
 gcc/testsuite/gcc.dg/lto/pr113359-2_1.c |  38 
 gcc/testsuite/gcc.dg/lto/pr113359-3_0.c | 114 +++
 gcc/testsuite/gcc.dg/lto/pr113359-3_1.c |  49 +
 gcc/testsuite/gcc.dg/lto/pr113359-4_0.c | 114 +++
 gcc/testsuite/gcc.dg/lto/pr113359-4_1.c |  49 +
 gcc/testsuite/gcc.dg/lto/pr113359-5_0.c | 118 +++
 gcc/testsuite/gcc.dg/lto/pr113359-5_1.c |  50 +
 gcc/tree-sra.cc | 252 +++-
 gcc/tree-sra.h  |   3 +
 14 files changed, 999 insertions(+), 55 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-1_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-1_1.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-2_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-2_1.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-3_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-3_1.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-4_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-4_1.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-5_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113359-5_1.c

diff --git a/gcc/ipa-icf-gimple.cc b/gcc/ipa-icf-gimple.cc
index 17f62bec068..c25eb24710f 100644
--- a/gcc/ipa-icf-gimple.cc
+++ b/gcc/ipa-icf-gimple.cc
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "attribs.h"
 #include "gimple-walk.h"
+#in

[PATCH] ipa: Compare jump functions in ICF (PR 113907)

2024-04-04 Thread Martin Jambor
Hello,

In PR 113907 comment #58, Honza found a case where ICF thinks bodies
of functions are equivalent but, because of a difference in aliases in
a memory access, different aggregate jump functions are associated with
supposedly equivalent call statements.  This patch adds a way to
compare jump functions and plugs it into ICF to avoid the issue.
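
As an illustration only (this is not code from the patch), the shape of
the new check can be sketched with a toy model.  ToyJumpFunc,
toy_jump_functions_equivalent_p and toy_call_args_match are hypothetical
names invented for this sketch; the real ipa_jump_func carries much more
information.  The point is simply that two call statements are treated
as different as soon as their per-argument summaries differ.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for a jump function.
struct ToyJumpFunc {
  enum Kind { Unknown, Constant, PassThrough };
  Kind kind = Unknown;
  long value = 0;  // constant value, or index of the passed-through formal
  // Known constants inside an aggregate argument: (offset, value) pairs.
  std::vector<std::pair<long, long>> agg;
};

// Two toy jump functions are equivalent when they describe the same scalar
// value and the same aggregate contents.
bool toy_jump_functions_equivalent_p(const ToyJumpFunc &a,
                                     const ToyJumpFunc &b) {
  if (a.kind != b.kind)
    return false;
  if (a.kind != ToyJumpFunc::Unknown && a.value != b.value)
    return false;
  return a.agg == b.agg;
}

// Compare the argument summaries of two call sites.  A null pointer models
// a call site for which no summary is available.
bool toy_call_args_match(const std::vector<ToyJumpFunc> *args1,
                         const std::vector<ToyJumpFunc> *args2) {
  if ((args1 != nullptr) != (args2 != nullptr))
    return false;  // only one side has a summary
  if (!args1)
    return true;   // neither side has one, nothing to compare
  if (args1->size() != args2->size())
    return false;  // different number of arguments
  for (std::size_t i = 0; i < args1->size(); ++i)
    if (!toy_jump_functions_equivalent_p((*args1)[i], (*args2)[i]))
      return false;
  return true;
}
```

In this model, two calls whose aggregate arguments carry different known
constants compare as different even if everything else about the
statements looks identical, which is the situation described above.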

The patch has been approved by Honza in Bugzilla.  Together with the
patch for PR 113359, it has passed bootstrap, LTO bootstrap and LTO
profiledbootstrap and testing on x86_64-linux and bootstrap and LTO
bootstrap on ppc64le-linux.  It also passed normal bootstrap on
aarch64-linux but many testcases failed there because the compiler timed
out.  The machine is old and slow and might have been oversubscribed so
my plan is to try again on gcc185 from cfarm.  If that goes well, I
intend to commit the patch and then start working on backports.

Martin


gcc/ChangeLog:

2024-03-20  Martin Jambor  

PR ipa/113907
* ipa-prop.h (class ipa_vr): Declare new overload of a member function
equal_p.
(ipa_jump_functions_equivalent_p): Declare.
* ipa-prop.cc (ipa_vr::equal_p): New function.
(ipa_agg_pass_through_jf_equivalent_p): Likewise.
(ipa_agg_jump_functions_equivalent_p): Likewise.
(ipa_jump_functions_equivalent_p): Likewise.
* ipa-cp.h (values_equal_for_ipcp_p): Declare.
* ipa-cp.cc (values_equal_for_ipcp_p): Make function public.
* ipa-icf-gimple.cc: Include alloc-pool.h, symbol-summary.h, sreal.h,
ipa-cp.h and ipa-prop.h.
(func_checker::compare_gimple_call): Compare jump functions.

gcc/testsuite/ChangeLog:

2024-03-20  Martin Jambor  

PR ipa/113907
* gcc.dg/lto/pr113907_0.c: New.
* gcc.dg/lto/pr113907_1.c: Likewise.
* gcc.dg/lto/pr113907_2.c: Likewise.
---
 gcc/ipa-cp.cc |   2 +-
 gcc/ipa-cp.h  |   2 +
 gcc/ipa-icf-gimple.cc |  30 +
 gcc/ipa-prop.cc   | 167 ++
 gcc/ipa-prop.h|   3 +
 gcc/testsuite/gcc.dg/lto/pr113907_0.c |  18 +++
 gcc/testsuite/gcc.dg/lto/pr113907_1.c |  35 ++
 gcc/testsuite/gcc.dg/lto/pr113907_2.c |  11 ++
 8 files changed, 267 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113907_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113907_1.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113907_2.c

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 2a1da631e9c..b7add455bd5 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -201,7 +201,7 @@ ipcp_lattice::is_single_const ()
 
 /* Return true iff X and Y should be considered equal values by IPA-CP.  */
 
-static bool
+bool
 values_equal_for_ipcp_p (tree x, tree y)
 {
   gcc_checking_assert (x != NULL_TREE && y != NULL_TREE);
diff --git a/gcc/ipa-cp.h b/gcc/ipa-cp.h
index 0b3cfe4b526..7ff74fb5c98 100644
--- a/gcc/ipa-cp.h
+++ b/gcc/ipa-cp.h
@@ -289,4 +289,6 @@ public:
   bool virt_call = false;
 };
 
+bool values_equal_for_ipcp_p (tree x, tree y);
+
 #endif /* IPA_CP_H */
diff --git a/gcc/ipa-icf-gimple.cc b/gcc/ipa-icf-gimple.cc
index 8c2df7a354e..17f62bec068 100644
--- a/gcc/ipa-icf-gimple.cc
+++ b/gcc/ipa-icf-gimple.cc
@@ -41,7 +41,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-walk.h"
 
 #include "tree-ssa-alias-compare.h"
+#include "alloc-pool.h"
+#include "symbol-summary.h"
 #include "ipa-icf-gimple.h"
+#include "sreal.h"
+#include "ipa-cp.h"
+#include "ipa-prop.h"
 
 namespace ipa_icf_gimple {
 
@@ -714,6 +719,31 @@ func_checker::compare_gimple_call (gcall *s1, gcall *s2)
   && !compatible_types_p (TREE_TYPE (t1), TREE_TYPE (t2)))
 return return_false_with_msg ("GIMPLE internal call LHS type mismatch");
 
+  if (!gimple_call_internal_p (s1))
+{
+  cgraph_edge *e1 = cgraph_node::get (m_source_func_decl)->get_edge (s1);
+  cgraph_edge *e2 = cgraph_node::get (m_target_func_decl)->get_edge (s2);
+  class ipa_edge_args *args1 = ipa_edge_args_sum->get (e1);
+  class ipa_edge_args *args2 = ipa_edge_args_sum->get (e2);
+  if ((args1 != nullptr) != (args2 != nullptr))
+   return return_false_with_msg ("ipa_edge_args mismatch");
+  if (args1)
+   {
+ int n1 = ipa_get_cs_argument_count (args1);
+ int n2 = ipa_get_cs_argument_count (args2);
+ if (n1 != n2)
+   return return_false_with_msg ("ipa_edge_args nargs mismatch");
+ for (int i = 0; i < n1; i++)
+   {
+ struct ipa_jump_func *jf1 = ipa_get_ith_jump_func (args1, i);
+ struct ipa_jump_func *jf2 = ipa_get_ith_jump_func (args2, i);
+ if (((jf1 != nullptr) != (jf2 != nullptr))
+ || (jf1 && !ipa_jump_functions_equivalent_p (jf

Re: [PATCH] ipa: Avoid duplicate replacements in IPA-SRA transformation phase

2024-03-28 Thread Martin Jambor
Hello,

and ping, please.  (In my copy I have fixed the formatting issue spotted
by Jakub.)

Martin

On Fri, Mar 15 2024, Martin Jambor wrote:
> Hi,
>
> when the analysis part of IPA-SRA figures out that it would split out
> a scalar part of an aggregate which is known by IPA-CP to contain a
> known constant, it skips it knowing that the transformation part looks
> at IPA-CP aggregate results too and does the right thing (which can
> include doing the propagation in GIMPLE because that is the last
> moment the parameter exists).
>
> However, when IPA-SRA wants to split out a smaller non-aggregate out
> of an aggregate, which happens to be of the same size as a known
> scalar constant at the same offset, the transformation bit fails to
> recognize the situation, tries to do both splitting and constant
> propagation and in PR 111571 testcase creates a nonsensical call
> statement on which the call redirection then ICEs.
>
> Fixed by making sure we don't try to do two replacements of the same
> part of the same parameter.
>
> The look-up among replacements requires that they are sorted, and this
> patch just sorts them, if they are not already sorted, before each new
> look-up.  The worst-case number of sorts is the number of
> parameters which are both split and have aggregate constants times
> param_ipa_max_agg_items (default 16).  I don't think complicating the
> source code to optimize for this unlikely case is worth it but if need
> be, it can of course be done.
>
> Bootstrapped and tested on x86_64-linux.  OK for master and eventually
> also the gcc-13 branch?
>
> Thanks,
>
> Martin
>
>
>
> gcc/ChangeLog:
>
> 2024-03-15  Martin Jambor  
>
>   PR ipa/111571
>   * ipa-param-manipulation.cc
>   (ipa_param_body_adjustments::common_initialization): Avoid creating
>   duplicate replacement entries.
>
> gcc/testsuite/ChangeLog:
>
> 2024-03-15  Martin Jambor  
>
>   PR ipa/111571
>   * gcc.dg/ipa/pr111571.c: New test.
> ---
>  gcc/ipa-param-manipulation.cc   | 16 
>  gcc/testsuite/gcc.dg/ipa/pr111571.c | 29 +
>  2 files changed, 45 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr111571.c
>
> diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
> index 3e0df6a6f77..4c6337cc563 100644
> --- a/gcc/ipa-param-manipulation.cc
> +++ b/gcc/ipa-param-manipulation.cc
> @@ -1525,6 +1525,22 @@ ipa_param_body_adjustments::common_initialization 
> (tree old_fndecl,
>replacement with a constant (for split aggregates passed
>by value).  */
>  
> +   if (split[parm_num])
> + {
> +   /* We must be careful not to add a duplicate
> +  replacement. */
> +   sort_replacements ();
> +   ipa_param_body_replacement *pbr =
> + lookup_replacement_1 (m_oparms[parm_num],
> +   av.unit_offset);
> +   if (pbr)
> + {
> +   /* Otherwise IPA-SRA should have bailed out.  */
> +   gcc_assert (AGGREGATE_TYPE_P (TREE_TYPE (pbr->repl)));
> +   continue;
> + }
> + }
> +
> tree repl;
> if (av.by_ref)
>   repl = av.value;
> diff --git a/gcc/testsuite/gcc.dg/ipa/pr111571.c 
> b/gcc/testsuite/gcc.dg/ipa/pr111571.c
> new file mode 100644
> index 000..2a4adc608db
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/pr111571.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2"  } */
> +
> +struct a {
> +  int b;
> +};
> +struct c {
> +  long d;
> +  struct a e;
> +  long f;
> +};
> +int g, h, i;
> +int j() {return 0;}
> +static void k(struct a l, int p) {
> +  if (h)
> +g = 0;
> +  for (; g; g = j())
> +if (l.b)
> +  break;
> +}
> +static void m(struct c l) {
> +  k(l.e, l.f);
> +  for (;; --i)
> +;
> +}
> +int main() {
> +  struct c n = {10, 9};
> +  m(n);
> +}
> -- 
> 2.44.0


Re: [PATCH] tree-optimization/113727 - bogus SRA with BIT_FIELD_REF

2024-03-20 Thread Martin Jambor
Hello,

On Tue, Mar 19 2024, Richard Biener wrote:
> When SRA analyzes BIT_FIELD_REFs it handles writes and not byte
> aligned reads differently from byte aligned reads.  Instead of
> trying to create replacements for the loaded portion the former
> cases try to replace the base object while keeping the wrapping
> BIT_FIELD_REFs.  This breaks when we have both kinds operating
> on the same base object if there's no apparent overlap conflict
> as the conflict that then nevertheless exists isn't dealt with.
> The fix is to enforce what I think is part of the design handling
> the former case - that only the full base object gets replaced
> and no further sub-objects are created within as otherwise
> keeping the wrapping BIT_FIELD_REF cannot work.  The patch
> enforces this within analyze_access_subtree.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> OK?

I agree this is the best thing to do.

Thanks,

Martin

>
> Thanks,
> Richard.
>
>   PR tree-optimization/113727
>   * tree-sra.cc (analyze_access_subtree): Do not allow
>   replacements in subtrees when grp_partial_lhs.
>
>   * gcc.dg/torture/pr113727.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/torture/pr113727.c | 26 +
>  gcc/tree-sra.cc |  3 ++-
>  2 files changed, 28 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr113727.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr113727.c 
> b/gcc/testsuite/gcc.dg/torture/pr113727.c
> new file mode 100644
> index 000..f92ddad5c8e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr113727.c
> @@ -0,0 +1,26 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target int32plus } */
> +
> +struct f {
> +  unsigned au : 5;
> +  unsigned f3 : 21;
> +} g_994;
> +
> +int main()
> +{
> +  struct f aq1 = {};
> +{
> +  struct f aq = {9, 5};
> +  struct f as = aq;
> +  for (int y = 0 ; y <= 4; y += 1)
> + if (as.au)
> +   {
> + struct f aa[5] = {{2, 154}, {2, 154}, {2, 154}, {2, 154}, {2, 154}};
> + as = aa[0];
> +   }
> +  aq1 = as;
> +}
> +  if (aq1.f3 != 0x9a)
> +__builtin_abort();
> +  return 0;
> +}
> diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
> index f8e71ec48b9..dbfae5e7fdd 100644
> --- a/gcc/tree-sra.cc
> +++ b/gcc/tree-sra.cc
> @@ -2735,7 +2735,8 @@ analyze_access_subtree (struct access *root, struct 
> access *parent,
>  {
>hole |= covered_to < child->offset;
>sth_created |= analyze_access_subtree (child, root,
> -  allow_replacements && !scalar,
> +  allow_replacements && !scalar
> +  && !root->grp_partial_lhs,
>totally);
>  
>root->grp_unscalarized_data |= child->grp_unscalarized_data;
> -- 
> 2.35.3


[PATCH] ipa: Avoid duplicate replacements in IPA-SRA transformation phase

2024-03-15 Thread Martin Jambor
Hi,

when the analysis part of IPA-SRA figures out that it would split out
a scalar part of an aggregate which is known by IPA-CP to contain a
known constant, it skips it knowing that the transformation part looks
at IPA-CP aggregate results too and does the right thing (which can
include doing the propagation in GIMPLE because that is the last
moment the parameter exists).

However, when IPA-SRA wants to split out a smaller non-aggregate out
of an aggregate, which happens to be of the same size as a known
scalar constant at the same offset, the transformation bit fails to
recognize the situation, tries to do both splitting and constant
propagation and in PR 111571 testcase creates a nonsensical call
statement on which the call redirection then ICEs.

Fixed by making sure we don't try to do two replacements of the same
part of the same parameter.

The look-up among replacements requires that they are sorted, and this
patch just sorts them, if they are not already sorted, before each new
look-up.  The worst-case number of sorts is the number of
parameters which are both split and have aggregate constants times
param_ipa_max_agg_items (default 16).  I don't think complicating the
source code to optimize for this unlikely case is worth it but if need
be, it can of course be done.
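
A minimal sketch of the "sort if needed, then look up, then skip
duplicates" idea, assuming a simplified container.  ToyReplacement and
ToyReplacements are hypothetical names made up for this illustration and
do not correspond to the actual ipa_param_body_adjustments members.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record: which parameter and which offset within it a
// replacement covers.
struct ToyReplacement {
  int parm;
  unsigned offset;
};

class ToyReplacements {
public:
  // Append unconditionally; the vector may become unsorted.
  void add(int parm, unsigned offset) {
    items_.push_back(ToyReplacement{parm, offset});
    sorted_ = false;
  }

  // Sort only if something was appended since the last sort.
  void sort_if_needed() {
    if (!sorted_) {
      std::sort(items_.begin(), items_.end(), before);
      sorted_ = true;
    }
  }

  // Binary search requires sorted data, so make sure it is sorted first.
  bool contains(int parm, unsigned offset) {
    sort_if_needed();
    return std::binary_search(items_.begin(), items_.end(),
                              ToyReplacement{parm, offset}, before);
  }

  // Add a replacement unless the same part of the same parameter already
  // has one.  Returns true if a new entry was created.
  bool add_unique(int parm, unsigned offset) {
    if (contains(parm, offset))
      return false;
    add(parm, offset);
    return true;
  }

  std::size_t size() const { return items_.size(); }

private:
  // Order by parameter first, then by offset.
  static bool before(const ToyReplacement &a, const ToyReplacement &b) {
    if (a.parm != b.parm)
      return a.parm < b.parm;
    return a.offset < b.offset;
  }

  std::vector<ToyReplacement> items_;
  bool sorted_ = true;
};
```

Each add_unique call after an insertion triggers one re-sort, which
corresponds to the worst case estimated above; in this toy version the
cost is accepted for the sake of simplicity as well.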

Bootstrapped and tested on x86_64-linux.  OK for master and eventually
also the gcc-13 branch?

Thanks,

Martin



gcc/ChangeLog:

2024-03-15  Martin Jambor  

PR ipa/111571
* ipa-param-manipulation.cc
(ipa_param_body_adjustments::common_initialization): Avoid creating
duplicate replacement entries.

gcc/testsuite/ChangeLog:

2024-03-15  Martin Jambor  

PR ipa/111571
* gcc.dg/ipa/pr111571.c: New test.
---
 gcc/ipa-param-manipulation.cc   | 16 
 gcc/testsuite/gcc.dg/ipa/pr111571.c | 29 +
 2 files changed, 45 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr111571.c

diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
index 3e0df6a6f77..4c6337cc563 100644
--- a/gcc/ipa-param-manipulation.cc
+++ b/gcc/ipa-param-manipulation.cc
@@ -1525,6 +1525,22 @@ ipa_param_body_adjustments::common_initialization (tree 
old_fndecl,
 replacement with a constant (for split aggregates passed
 by value).  */
 
+ if (split[parm_num])
+   {
+ /* We must be careful not to add a duplicate
+replacement. */
+ sort_replacements ();
+ ipa_param_body_replacement *pbr =
+   lookup_replacement_1 (m_oparms[parm_num],
+ av.unit_offset);
+ if (pbr)
+   {
+ /* Otherwise IPA-SRA should have bailed out.  */
+ gcc_assert (AGGREGATE_TYPE_P (TREE_TYPE (pbr->repl)));
+ continue;
+   }
+   }
+
  tree repl;
  if (av.by_ref)
repl = av.value;
diff --git a/gcc/testsuite/gcc.dg/ipa/pr111571.c 
b/gcc/testsuite/gcc.dg/ipa/pr111571.c
new file mode 100644
index 000..2a4adc608db
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr111571.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2"  } */
+
+struct a {
+  int b;
+};
+struct c {
+  long d;
+  struct a e;
+  long f;
+};
+int g, h, i;
+int j() {return 0;}
+static void k(struct a l, int p) {
+  if (h)
+g = 0;
+  for (; g; g = j())
+if (l.b)
+  break;
+}
+static void m(struct c l) {
+  k(l.e, l.f);
+  for (;; --i)
+;
+}
+int main() {
+  struct c n = {10, 9};
+  m(n);
+}
-- 
2.44.0



[PATCH] ipa: Fix C++ member ptr indirect inlining (PR 114254, PR 108802)

2024-03-08 Thread Martin Jambor
Hi,

Even though we have had code to handle creation of indirect call graph
edges (so that these calls can then be made direct as part of IPA-CP
and inlining and eventually also inlined) for C++ member pointers for
many years, it turns out that it does not work for lambdas and that it
has been severely broken since GCC 10 when the base class has virtual
functions.

Lambdas don't work because, in their case, the structures representing
member function pointers are passed by reference instead of by value
and the code was not ready for that.

The presence of virtual methods broke things because at some point the
C++ FE got clever and stopped emitting the check for virtual methods when
the base class does not have any and that in turn made our existing
testcases not test the necessary pattern matching code.  The pattern
matcher had a small bug which did not matter before
r10-917-g3b47da42de621c but did afterwards.

This patch changes the pattern matcher to match both of these cases.
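
For illustration only, the two source-level shapes in question look
roughly as follows.  The struct and the lambda mirror the new
pr108802.C test; call_by_value and call_by_reference are hypothetical
helper names added for this sketch, and how a particular GCC version
lowers each of them to GIMPLE is not shown here.

```cpp
struct A {
  int interesting(int x) { return 2 * x; }
};

// The member pointer is a by-value parameter: this is the form the
// existing code was written for.
int call_by_value(A &a, int (A::*f)(int)) {
  return (a.*f)(42);
}

// The member pointer is reached through a reference parameter, as happens
// with the auto&& parameter of the generic lambda below.
int call_by_reference(A &a, int (A::*const &f)(int)) {
  return (a.*f)(42);
}

// The pattern from the testcase: a generic lambda taking the member
// pointer as a forwarding reference and calling through it.
int via_lambda() {
  A a;
  return [&](auto &&f) { return (a.*f)(42); }(&A::interesting);
}
```

All three functions compute the same value; they differ only in how the
member pointer reaches the indirect call.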

Special thanks to the Linaro automated checker of patches which
reported that the earlier version of my PR 108802 fix was not working
on Aarch64 which in turn made me discover PR 114254.

The patch has passed bootstrap and testing on x86_64-linux,
aarch64-linux and ppc64-linux and I also LTO bootstrapped it on x86_64-linux.

I understand we have been living with these deficiencies for a while now
but both are technically regressions.  If Honza agrees (and manages to
review the patch quickly), I'm fine with pushing them to master now but
I can also wait until the next stage 1.

Thanks,

Martin


gcc/ChangeLog:

2024-03-06  Martin Jambor  

PR ipa/108802
PR ipa/114254
* ipa-prop.cc (ipa_get_stmt_member_ptr_load_param): Fix case looking
at COMPONENT_REFs directly from a PARM_DECL, also recognize loads from
a pointer parameter.
(ipa_analyze_indirect_call_uses): Also recognize loads from a pointer
parameter, also recognize the case when pfn pointer is loaded in its
own BB.

gcc/testsuite/ChangeLog:

2024-03-06  Martin Jambor  

PR ipa/108802
PR ipa/114254
* g++.dg/ipa/iinline-4.C: New test.
* g++.dg/ipa/pr108802.C: Likewise.
---
 gcc/ipa-prop.cc  | 110 +++
 gcc/testsuite/g++.dg/ipa/iinline-4.C |  61 +++
 gcc/testsuite/g++.dg/ipa/pr108802.C  |  14 
 3 files changed, 154 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/iinline-4.C
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr108802.C

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index e22c4f78405..e8e4918d5a8 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -2500,7 +2500,9 @@ static tree
 ipa_get_stmt_member_ptr_load_param (gimple *stmt, bool use_delta,
HOST_WIDE_INT *offset_p)
 {
-  tree rhs, rec, ref_field, ref_offset, fld, ptr_field, delta_field;
+  tree rhs, fld, ptr_field, delta_field;
+  tree ref_field = NULL_TREE;
+  tree ref_offset = NULL_TREE;
 
   if (!gimple_assign_single_p (stmt))
 return NULL_TREE;
@@ -2511,35 +2513,53 @@ ipa_get_stmt_member_ptr_load_param (gimple *stmt, bool 
use_delta,
   ref_field = TREE_OPERAND (rhs, 1);
   rhs = TREE_OPERAND (rhs, 0);
 }
+
+  if (TREE_CODE (rhs) == MEM_REF)
+{
+  ref_offset = TREE_OPERAND (rhs, 1);
+  if (ref_field && integer_nonzerop (ref_offset))
+   return NULL_TREE;
+}
+  else if (!ref_field)
+return NULL_TREE;
+
+  if (TREE_CODE (rhs) == MEM_REF
+  && TREE_CODE (TREE_OPERAND (rhs, 0)) == SSA_NAME
+  && SSA_NAME_IS_DEFAULT_DEF (TREE_OPERAND (rhs, 0)))
+{
+  rhs = TREE_OPERAND (rhs, 0);
+  if (TREE_CODE (SSA_NAME_VAR (rhs)) != PARM_DECL
+ || !type_like_member_ptr_p (TREE_TYPE (TREE_TYPE (rhs)), &ptr_field,
+ &delta_field))
+   return NULL_TREE;
+}
   else
-ref_field = NULL_TREE;
-  if (TREE_CODE (rhs) != MEM_REF)
-return NULL_TREE;
-  rec = TREE_OPERAND (rhs, 0);
-  if (TREE_CODE (rec) != ADDR_EXPR)
-return NULL_TREE;
-  rec = TREE_OPERAND (rec, 0);
-  if (TREE_CODE (rec) != PARM_DECL
-  || !type_like_member_ptr_p (TREE_TYPE (rec), &ptr_field, &delta_field))
-return NULL_TREE;
-  ref_offset = TREE_OPERAND (rhs, 1);
+{
+  if (TREE_CODE (rhs) == MEM_REF
+ && TREE_CODE (TREE_OPERAND (rhs, 0)) == ADDR_EXPR)
+   rhs = TREE_OPERAND (TREE_OPERAND (rhs, 0), 0);
+  if (TREE_CODE (rhs) != PARM_DECL
+ || !type_like_member_ptr_p (TREE_TYPE (rhs), &ptr_field,
+ &delta_field))
+   return NULL_TREE;
+}
 
   if (use_delta)
 fld = delta_field;
   else
 fld = ptr_field;
-  if (offset_p)
-*offset_p = int_bit_position (fld);
 
   if (ref_field)
 {
-  if (integer_nonzerop (ref_offset))
+  if (ref_field != fld)

Re: [PATCH] ipa: Avoid excessive removing of SSAs (PR 113757)

2024-03-07 Thread Martin Jambor
Hello,

and ping please.

Martin


On Thu, Feb 08 2024, Martin Jambor wrote:
> Hi,
>
> PR 113757 shows that the code which was meant to debug-reset and
> remove SSAs defined by LHSs of calls redirected to
> __builtin_unreachable can trigger also when speculative
> devirtualization creates a call to a noreturn function (and since it
> is noreturn, it does not bother dealing with its return value).
>
> What is more, it seems that the code handling this case is not really
> necessary.  I feel slightly idiotic about this because I have a
> feeling that I added it because of a failing test-case but I can
> neither find the testcase nor a reason why the code in
> cgraph_edge::redirect_call_stmt_to_callee would not be sufficient (it
> turns the SSA name into a default-def, a bit like IPA-SRA, but any
> code dominated by a call to a noreturn is not dangerous when it comes
> to its side-effects).  So this patch just removes the handling.
>
> Bootstrapped and tested on x86_64-linux and ppc64le-linux.  I have also
> LTO-bootstrapped and LTO-profilebootstrapped the patch on x86_64-linux.
>
> OK for master?
>
> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2024-02-07  Martin Jambor  
>
>   PR ipa/113757
>   * tree-inline.cc (redirect_all_calls): Remove code adding SSAs to
>   id->killed_new_ssa_names.
>
> gcc/testsuite/ChangeLog:
>
> 2024-02-07  Martin Jambor  
>
>   PR ipa/113757
>   * g++.dg/ipa/pr113757.C: New test.
> ---
>  gcc/testsuite/g++.dg/ipa/pr113757.C | 14 ++
>  gcc/tree-inline.cc  | 14 ++
>  2 files changed, 16 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ipa/pr113757.C
>
> diff --git a/gcc/testsuite/g++.dg/ipa/pr113757.C 
> b/gcc/testsuite/g++.dg/ipa/pr113757.C
> new file mode 100644
> index 000..885d4010a10
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ipa/pr113757.C
> @@ -0,0 +1,14 @@
> +// { dg-do compile }
> +// { dg-options "-O2 -fPIC" }
> +// { dg-require-effective-target fpic }
> +
> +long size();
> +struct ll {  virtual int hh();  };
> +ll  *slice_owner;
> +int ll::hh() { __builtin_exit(0); }
> +int nn() {
> +  if (size())
> +return 0;
> +  return slice_owner->hh();
> +}
> +int (*a)() = nn;
> diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
> index 75c10eb7dfc..cac41b4f031 100644
> --- a/gcc/tree-inline.cc
> +++ b/gcc/tree-inline.cc
> @@ -2984,23 +2984,13 @@ redirect_all_calls (copy_body_data * id, basic_block 
> bb)
>gimple *stmt = gsi_stmt (si);
>if (is_gimple_call (stmt))
>   {
> -   tree old_lhs = gimple_call_lhs (stmt);
> struct cgraph_edge *edge = id->dst_node->get_edge (stmt);
> if (edge)
>   {
> if (!id->killed_new_ssa_names)
>   id->killed_new_ssa_names = new hash_set<tree> (16);
> -   gimple *new_stmt
> - = cgraph_edge::redirect_call_stmt_to_callee (edge,
> - id->killed_new_ssa_names);
> -   if (old_lhs
> -   && TREE_CODE (old_lhs) == SSA_NAME
> -   && !gimple_call_lhs (new_stmt))
> - /* In case of IPA-SRA removing the LHS, the name should have
> -been already added to the hash.  But in case of redirecting
> -to builtin_unreachable it was not and the name still should
> -be pruned from debug statements.  */
> - id->killed_new_ssa_names->add (old_lhs);
> +   cgraph_edge::redirect_call_stmt_to_callee (edge,
> + id->killed_new_ssa_names);
>  
> if (stmt == last && id->call_stmt && maybe_clean_eh_stmt (stmt))
>   gimple_purge_dead_eh_edges (bb);
> -- 
> 2.43.0


[PATCH] ipa: Create indirect call edges also for lambdas

2024-02-21 Thread Martin Jambor
Hi,

Even though we have had code to handle creation of indirect call graph
edges (so that these calls can then be made direct as part of IPA-CP
and inlining and eventually also inlined) for C++ member pointers for
many years, this code expects the member pointers to be structures
passed by value.  In PR 108802 it turned out that for lambdas these
are passed by reference.  This patch adjusts the code for that so that
small lambdas are also inlineable without depending on early inlining.

Bootstrapped and LTO bootstrapped on x86_64-linux.  This is technically
a regression against GCC 10.  OK for master even now?

Thanks,

Martin


gcc/ChangeLog:

2024-02-20  Martin Jambor  

PR ipa/108802
* ipa-prop.cc (ipa_get_stmt_member_ptr_load_param): Also recognize
loads from a pointer parameter.
(ipa_analyze_indirect_call_uses): Likewise.

gcc/testsuite/ChangeLog:

2024-02-20  Martin Jambor  

PR ipa/108802
* g++.dg/ipa/pr108802.C: New test.
---
 gcc/ipa-prop.cc | 56 +
 gcc/testsuite/g++.dg/ipa/pr108802.C | 14 
 2 files changed, 55 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr108802.C

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index bec0ebd210c..25d252fd57c 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -2514,14 +2514,26 @@ ipa_get_stmt_member_ptr_load_param (gimple *stmt, bool 
use_delta,
   if (TREE_CODE (rhs) != MEM_REF)
 return NULL_TREE;
   rec = TREE_OPERAND (rhs, 0);
-  if (TREE_CODE (rec) != ADDR_EXPR)
-return NULL_TREE;
-  rec = TREE_OPERAND (rec, 0);
-  if (TREE_CODE (rec) != PARM_DECL
-  || !type_like_member_ptr_p (TREE_TYPE (rec), &ptr_field, &delta_field))
+  if (TREE_CODE (rec) == ADDR_EXPR)
+{
+  rec = TREE_OPERAND (rec, 0);
+  if (TREE_CODE (rec) != PARM_DECL
+ || !type_like_member_ptr_p (TREE_TYPE (rec), &ptr_field,
+ &delta_field))
+   return NULL_TREE;
+}
+  else if (TREE_CODE (rec) == SSA_NAME
+  && SSA_NAME_IS_DEFAULT_DEF (rec))
+{
+  if (TREE_CODE (SSA_NAME_VAR (rec)) != PARM_DECL
+ || !type_like_member_ptr_p (TREE_TYPE (TREE_TYPE (rec)), &ptr_field,
+ &delta_field))
+   return NULL_TREE;
+}
+  else
 return NULL_TREE;
-  ref_offset = TREE_OPERAND (rhs, 1);
 
+  ref_offset = TREE_OPERAND (rhs, 1);
   if (use_delta)
 fld = delta_field;
   else
@@ -2757,17 +2769,31 @@ ipa_analyze_indirect_call_uses (struct 
ipa_func_body_info *fbi, gcall *call,
   if (rec != rec2)
 return;
 
-  index = ipa_get_param_decl_index (info, rec);
-  if (index >= 0
-  && parm_preserved_before_stmt_p (fbi, index, call, rec))
+  if (TREE_CODE (rec) == SSA_NAME)
 {
-  struct cgraph_edge *cs = ipa_note_param_call (fbi->node, index,
-   call, false);
-  cs->indirect_info->offset = offset;
-  cs->indirect_info->agg_contents = 1;
-  cs->indirect_info->member_ptr = 1;
-  cs->indirect_info->guaranteed_unmodified = 1;
+  index = ipa_get_param_decl_index (info, SSA_NAME_VAR (rec));
+  if (index < 0
+ || !parm_ref_data_preserved_p (fbi, index, call,
+gimple_assign_rhs1 (def)))
+   return;
+  by_ref = true;
 }
+  else
+{
+  index = ipa_get_param_decl_index (info, rec);
+  if (index < 0
+ || !parm_preserved_before_stmt_p (fbi, index, call, rec))
+   return;
+  by_ref = false;
+}
+
+  struct cgraph_edge *cs = ipa_note_param_call (fbi->node, index,
+   call, false);
+  cs->indirect_info->offset = offset;
+  cs->indirect_info->agg_contents = 1;
+  cs->indirect_info->member_ptr = 1;
+  cs->indirect_info->by_ref = by_ref;
+  cs->indirect_info->guaranteed_unmodified = 1;
 
   return;
 }
diff --git a/gcc/testsuite/g++.dg/ipa/pr108802.C 
b/gcc/testsuite/g++.dg/ipa/pr108802.C
new file mode 100644
index 000..2e2b6c66b64
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr108802.C
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-options "-O2 -std=c++14 -fdump-ipa-inline -fno-early-inlining"  } */
+/* { dg-add-options bind_pic_locally } */
+
+struct A {
+int interesting(int x) { return 2 * x; }
+};
+
+int f1() {
+A a;
+return [&](auto&& f) { return (a.*f)(42); } (&A::interesting);
+}
+
+/* { dg-final { scan-ipa-dump "A::interesting\[^\\n\]*inline copy in int f1"  
"inline"  } } */
-- 
2.43.0



[PATCH] ipa: Convert lattices from pure array to vector (PR 113476)

2024-02-19 Thread Martin Jambor
On Tue, Feb 13 2024, Martin Jambor wrote:
> On Mon, Feb 12 2024, Jan Hubicka wrote:
>>> Believe it or not, even though I have re-worked the internals of the
>>> lattices completely, the array itself is older than my involvement with
>>> GCC (or at least with ipa-cp.c ;-).
>>> 
>>> So it being an array and not a vector is historical coincidence, as far
>>> as I am concerned :-).  But that may be the reason, or because vector
>>> macros at that time looked scary, or perhaps the initialization by
>>> XCNEWVEC zeroing everything out was considered attractive (I kind of
>>> like that but constructors would probably be cleaner), I don't know.
>>
>> If your class is no longer a POD, then the clearing before construction
>> is dead and GCC may optimize it out.  So fixing this may avoid some
>> surprises in the foreseeable future when we will try to compile older GCCs
>> with newer ones.
>>
>
> That's a good point.  I'll prepare a patch converting the whole thing to
> use constructors and vectors.
>

In PR 113476 we have discovered that ipcp_param_lattices is no longer
a POD and should be destructed.  In a follow-up discussion it
transpired that their initialization done by memsetting their backing
memory to zero is also invalid because now any write there before
construction can be considered dead.  In addition, having them in an
array is a little bit old-school and does not get the extra checking
offered by vector along with automatic construction and destruction
when necessary.
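
A small sketch of the difference, assuming a toy class.  ToyLattice and
toy_count_live_during are hypothetical names used only here.  A type
with its own constructor and destructor is not trivial, so zeroing raw
memory is not a valid way to initialize it, whereas a vector runs the
constructor and destructor of every element automatically.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical lattice-like class with non-trivial construction and
// destruction.
struct ToyLattice {
  static int live;  // number of objects constructed but not yet destroyed
  int values_count = 0;
  bool bottom = false;

  ToyLattice() { ++live; }
  ToyLattice(const ToyLattice &other)
      : values_count(other.values_count), bottom(other.bottom) {
    ++live;
  }
  ~ToyLattice() { --live; }
};

int ToyLattice::live = 0;

// Because of the user-provided constructor, the type is not trivial;
// writes to its storage made before construction may be treated as dead.
static_assert(!std::is_trivial<ToyLattice>::value,
              "ToyLattice must be initialized by its constructor");

// Create n lattices in a vector and report how many are alive while the
// vector exists.  The vector constructs each element on creation and
// destroys each element when it goes out of scope.
int toy_count_live_during(std::size_t n) {
  std::vector<ToyLattice> lattices(n);
  return ToyLattice::live;
}
```

After toy_count_live_during returns, ToyLattice::live is back to its
previous value because the vector has destroyed its elements, which is
the kind of automatic clean-up an array in plain allocated memory does
not provide.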

So this patch converts the array to a vector.  That however means that
ipcp_param_lattices cannot be just a forward declared type but must be
known to all code that deals with ipa_node_params and thus to all code
that includes ipa-prop.h.  Therefore I have moved ipcp_param_lattices
and the type it depends on to a new header ipa-cp.h which now
ipa-prop.h depends on.  Because we have the (IMHO not a very wise)
rule that headers don't include what they need themselves, I had to
add inclusions of ipa-cp.h and sreal.h (on which it depends) to very
many files, which made the patch rather ugly.

Bootstrapped and tested on x86_64-linux.  I also had it checked by our
script which builds more than a hundred of cross-compilers, so other
targets are hopefully also fine.

OK for master?

Martin


gcc/lto/ChangeLog:

2024-02-16  Martin Jambor  

* lto-common.cc: Include sreal.h and ipa-cp.h.
* lto-partition.cc: Include ipa-cp.h, move inclusion of sreal.h higher.
* lto.cc: Include sreal.h and ipa-cp.h.

gcc/ChangeLog:

2024-02-16  Martin Jambor  

* ipa-prop.h (ipa_node_params): Convert lattices to a vector, adjust
initializers in the constructor.
(ipa_node_params::~ipa_node_params): Release lattices as a vector.
* ipa-cp.h: New file.
* ipa-cp.cc: Include sreal.h and ipa-cp.h.
(ipcp_value_source): Move to ipa-cp.h.
(ipcp_value_base): Likewise.
(ipcp_value): Likewise.
(ipcp_lattice): Likewise.
(ipcp_agg_lattice): Likewise.
(ipcp_bits_lattice): Likewise.
(ipcp_vr_lattice): Likewise.
(ipcp_param_lattices): Likewise.
(ipa_get_parm_lattices): Remove assert that lattices is non-NULL.
(ipa_value_from_jfunc): Adjust a check for empty lattices.
(ipa_context_from_jfunc): Likewise.
(ipa_agg_value_from_jfunc): Likewise.
(merge_agg_lats_step): Do not memset new aggregate lattices to zero.
(ipcp_propagate_stage): Allocate lattices in a vector as opposed to
just in contiguous memory.
(ipcp_store_vr_results): Adjust a check for empty lattices.
* auto-profile.cc: Include sreal.h and ipa-cp.h.
* cgraph.cc: Likewise.
* cgraphclones.cc: Likewise.
* cgraphunit.cc: Likewise.
* config/aarch64/aarch64.cc: Likewise.
* config/i386/i386-builtins.cc: Likewise.
* config/i386/i386-expand.cc: Likewise.
* config/i386/i386-features.cc: Likewise.
* config/i386/i386-options.cc: Likewise.
* config/i386/i386.cc: Likewise.
* config/rs6000/rs6000.cc: Likewise.
* config/s390/s390.cc: Likewise.
* gengtype.cc (open_base_files): Added sreal.h and ipa-cp.h to the
files to be included in gtype-desc.cc.
* gimple-range-fold.cc: Include sreal.h and ipa-cp.h.
* ipa-devirt.cc: Likewise.
* ipa-fnsummary.cc: Likewise.
* ipa-icf.cc: Likewise.
* ipa-inline-analysis.cc: Likewise.
* ipa-inline-transform.cc: Likewise.
* ipa-inline.cc: Include ipa-cp.h, move inclusion of sreal.h higher.
* ipa-modref.cc: Include sreal.h and ipa-cp.h.
* ipa-param-manipulation.cc: Likewise.
* ipa-predicate.cc: Likewise.
* ipa-profile.cc: Likewise.
* ipa-prop.cc: Likewise.
(ipa_n

[PATCH] testsuite: Fix guality/ipa-sra-1.c to work with return IPA-VRP

2024-02-14 Thread Martin Jambor
Hi,

the test guality/ipa-sra-1.c stopped working after
r14-5628-g53ba8d669550d3 because the variable from which the values of
removed parameters could be calculated is also removed with it.  Fixed
with this patch which stops a function from returning a constant.

I have also noticed that the XFAILed test passes at -O0 -O1 and -Og on
all (three) targets I have tried, not just aarch64, so I extended the
xfail exception accordingly.

Tested by running make -k check-gcc
RUNTESTFLAGS="guality.exp=ipa-sra-1.c" on x86_64-linux, aarch64-linux
and ppc64le-linux.  I hope it is an obvious enough change for me to commit
without approval, which I will do later today.

Thanks,

Martin


gcc/testsuite/ChangeLog:

2024-02-14  Martin Jambor  

* gcc.dg/guality/ipa-sra-1.c (get_val1): Move up in the file.
(get_val2): Likewise.
(bar): Do not return a constant.  Extend xfail exception for all
targets.
---
 gcc/testsuite/gcc.dg/guality/ipa-sra-1.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/guality/ipa-sra-1.c b/gcc/testsuite/gcc.dg/guality/ipa-sra-1.c
index 9ef4eac93a7..55267c6f838 100644
--- a/gcc/testsuite/gcc.dg/guality/ipa-sra-1.c
+++ b/gcc/testsuite/gcc.dg/guality/ipa-sra-1.c
@@ -1,6 +1,10 @@
 /* { dg-do run } */
 /* { dg-options "-g -fno-ipa-icf" } */
 
+int __attribute__((noipa))
+get_val1 (void)  {return 20;}
+int __attribute__((noipa))
+get_val2 (void)  {return 7;}
 
 void __attribute__((noipa))
 use (int x)
@@ -12,8 +16,8 @@ static int __attribute__((noinline))
 bar (int i, int k)
 {
   asm ("" : "+r" (i));
-  use (i); /* { dg-final { gdb-test . "k" "3" { xfail { ! { aarch64*-*-* && { any-opts "-O0" "-O1" "-Og" } } } } } } */
-  return 6;
+  use (i); /* { dg-final { gdb-test . "k" "3" { xfail { ! { *-*-*-* && { any-opts "-O0" "-O1" "-Og" } } } } } } */
+  return 6 + get_val1();
 }
 
 volatile int v;
@@ -30,11 +34,6 @@ foo (int i, int k)
 
 volatile int v;
 
-int __attribute__((noipa))
-get_val1 (void)  {return 20;}
-int __attribute__((noipa))
-get_val2 (void)  {return 7;}
-
 int
 main (void)
 {
-- 
2.43.0



Re: [PATCH] ipa: call destructors on lattices before freeing them (PR 113476)

2024-02-14 Thread Martin Jambor
Hi,

On Mon, Feb 12 2024, Jan Hubicka wrote:
>> Hi,
>> 
>> In PR 113476 we have discovered that ipcp_param_lattices is no longer
>> a POD and should be destructed.  This patch does that, calling
>> destructor on each element of the array containing them when the
>> corresponding summary of a node is freed.  An alternative would be to
>> change the XCNEWVEC-and-placement-new to initializations in
>> constructors of all things in ipcp_param_lattices and then simply use
>> normal operators new and delete.  I am not sure, the initialization
>> through XCNEWVEC may be a bit more efficient although that is probably
>> not a big concern.  In the end, I opted for a simpler solution for
>> stage 4.
>> 
>> I have verified that valgrind no longer reports lost memory blocks
>> allocated within ipcp_vr_lattice::meet_with_1 on the preprocessed source
>> (dwarf2out.i) attached to Bugzilla.  The patch also passes bootstrap and
>> LTO bootstrap and testing on x86_64-linux.
>> 
>> OK for master?
>> 
>> Thanks,
>> 
>> Martin
>> 
>> 
>> gcc/ChangeLog:
>> 
>> 2024-02-09  Martin Jambor  
>> 
>>  PR tree-optimization/113476
>>  * ipa-prop.h (ipa_node_params::~ipa_node_params): Moved...
>>  * ipa-cp.cc (ipa_node_params::~ipa_node_params): ...here.  Added
>>  destruction of lattices.
>
> OK.
> So you do not use vectors (which would also handle the destruction)
> basically to save space needed to keep the
> size of the vector since that is known from the parameter count?
>

OK, so when I started looking at converting lattices to vector, it
immediately became clear why it is an array.  The type of the element of
the array (ipcp_param_lattices and all it contains) is only forward
declared in ipa-prop.h where ipa_node_params is defined which can
therefore just contain a pointer.  The actual definition of
ipcp_param_lattices is then done only in ipa-cp.c.

Converting the array to a vector would mean moving ipcp_param_lattices
together with ipcp_lattice, ipcp_value, ipcp_value_base,
ipcp_agg_lattice, ipcp_bits_lattice, ipcp_vr_lattice from ipa-cp.c to
ipa-prop.h.  Or an ipa-cp.h which ipa-prop.h would require/include.  But
perhaps that is the proper C++ thing to do :-/
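
The constraint described above can be reproduced in miniature.  The
following is only a sketch: param_lattices and node_params are hypothetical
stand-ins for ipcp_param_lattices and ipa_node_params, and plain new[] and
delete[] replace whatever allocation GCC really uses.  It shows that a
class which only sees a forward declaration can hold a pointer to the
element type, but its destructor has to be defined where the type is
complete.

```cpp
#include <cassert>
#include <cstddef>

// "Header" part: the element type is only forward declared, so the
// summary can hold nothing more than a pointer to it.
struct param_lattices;

struct node_params
{
  explicit node_params (std::size_t n);
  ~node_params ();                 // defined where param_lattices is complete
  node_params (const node_params &) = delete;
  node_params &operator= (const node_params &) = delete;

  param_lattices *lattices;
  std::size_t count;
};

// "Source file" part: only here is the element type complete.
struct param_lattices
{
  param_lattices () { ++live; }
  ~param_lattices () { --live; }
  static int live;                 // number of currently constructed objects
};
int param_lattices::live = 0;

node_params::node_params (std::size_t n)
  : lattices (new param_lattices[n]), count (n)
{
}

node_params::~node_params ()
{
  delete[] lattices;
}

// Returns the number of elements still alive after a summary holding N
// of them has been created and destroyed.
int
live_after_round_trip (std::size_t n)
{
  {
    node_params p (n);
    assert (param_lattices::live == static_cast<int> (n));
  }
  return param_lattices::live;
}
```

Moving the definition of the element type into a header, as discussed
above, is what would allow a container member to replace the raw pointer.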

Martin


Re: [PATCH] ipa: call destructors on lattices before freeing them (PR 113476)

2024-02-13 Thread Martin Jambor
On Mon, Feb 12 2024, Jan Hubicka wrote:
>> Believe it or not, even though I have re-worked the internals of the
>> lattices completely, the array itself is older than my involvement with
>> GCC (or at least with ipa-cp.c ;-).
>> 
>> So it being an array and not a vector is historical coincidence, as far
>> as I am concerned :-).  But that may be the reason, or because vector
>> macros at that time looked scary, or perhaps the initialization by
>> XCNEWVEC zeroing everything out was considered attractive (I kind of
>> like that but constructors would probably be cleaner), I don't know.
>
> If your class is no longer a POD, then the clearing before construction
> is dead and GCC may optimize it out.  So fixing this may prevent some
> surprises in the foreseeable future when we will try to compile older
> GCCs with newer ones.
>

That's a good point.  I'll prepare a patch converting the whole thing to
use constructors and vectors.
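
As an illustration of that direction, here is a minimal sketch.  The names
toy_lattice and toy_node_params and their fields are made up for the
example, and std::vector stands in for GCC's own vec.  Every field gets its
initial value from the constructor, so nothing relies on the memory having
been zeroed beforehand, and the container takes care of destruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lattice.  Every field has an explicit
// initial value, so nothing depends on the memory having been zeroed
// before the constructor ran.
struct toy_lattice
{
  int values_count = 0;
  bool contains_variable = false;
  bool bottom = false;
};

// Hypothetical stand-in for the per-node summary.
struct toy_node_params
{
  std::vector<toy_lattice> lattices;

  // Creates COUNT lattices, each initialized through its constructor.
  void allocate (std::size_t count)
  {
    assert (lattices.empty ());
    lattices.resize (count);
  }
};

// True if all COUNT freshly allocated lattices are in their initial state.
bool
all_start_empty (std::size_t count)
{
  toy_node_params info;
  info.allocate (count);
  for (const toy_lattice &lat : info.lattices)
    if (lat.values_count != 0 || lat.contains_variable || lat.bottom)
      return false;
  return info.lattices.size () == count;
}
```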

Thanks,

Martin


Re: [PATCH] ipa: call destructors on lattices before freeing them (PR 113476)

2024-02-12 Thread Martin Jambor
On Mon, Feb 12 2024, Jan Hubicka wrote:
>> Hi,
>> 
>> In PR 113476 we have discovered that ipcp_param_lattices is no longer
>> a POD and should be destructed.  This patch does that, calling
>> destructor on each element of the array containing them when the
>> corresponding summary of a node is freed.  An alternative would be to
>> change the XCNEWVEC-and-placement-new to initializations in
>> constructors of all things in ipcp_param_lattices and then simply use
>> normal operators new and delete.  I am not sure, the initialization
>> through XCNEWVEC may be a bit more efficient although that is probably
>> not a big concern.  In the end, I opted for a simpler solution for
>> stage 4.
>> 
>> I have verified that valgrind no longer reports lost memory blocks
>> allocated within ipcp_vr_lattice::meet_with_1 on the preprocessed source
>> (dwarf2out.i) attached to Bugzilla.  The patch also passes bootstrap and
>> LTO bootstrap and testing on x86_64-linux.
>> 
>> OK for master?
>> 
>> Thanks,
>> 
>> Martin
>> 
>> 
>> gcc/ChangeLog:
>> 
>> 2024-02-09  Martin Jambor  
>> 
>>  PR tree-optimization/113476
>>  * ipa-prop.h (ipa_node_params::~ipa_node_params): Moved...
>>  * ipa-cp.cc (ipa_node_params::~ipa_node_params): ...here.  Added
>>  destruction of lattices.
>
> OK.
> So you do not use vectors (which would also handle the destruction)
> basically to save space needed to keep the
> size of the vector since that is known from the parameter count?
>

Believe it or not, even though I have re-worked the internals of the
lattices completely, the array itself is older than my involvement with
GCC (or at least with ipa-cp.c ;-).

So it being an array and not a vector is historical coincidence, as far
as I am concerned :-).  But that may be the reason, or because vector
macros at that time looked scary, or perhaps the initialization by
XCNEWVEC zeroing everything out was considered attractive (I kind of
like that but constructors would probably be cleaner), I don't know.

Martin


[PATCH] ipa: call destructors on lattices before freeing them (PR 113476)

2024-02-12 Thread Martin Jambor
Hi,

In PR 113476 we have discovered that ipcp_param_lattices is no longer
a POD and should be destructed.  This patch does that, calling
destructor on each element of the array containing them when the
corresponding summary of a node is freed.  An alternative would be to
change the XCNEWVEC-and-placement-new to initializations in
constructors of all things in ipcp_param_lattices and then simply use
normal operators new and delete.  I am not sure, the initialization
through XCNEWVEC may be a bit more efficient although that is probably
not a big concern.  In the end, I opted for a simpler solution for
stage 4.
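
The pattern can be sketched outside of GCC as follows.  This is an
illustration under assumptions, not the code of the patch: toy_lattices is
a made-up element type, and std::calloc and std::free play the roles of
XCNEWVEC and free.  The elements are constructed in place in zeroed memory
and must therefore be destructed one by one before the memory is released.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical non-POD element: it owns heap memory through a member,
// much like a lattice that holds a value range.
struct toy_lattices
{
  toy_lattices () { ++live; }
  ~toy_lattices () { --live; }
  std::vector<int> range;
  static int live;                 // number of currently constructed objects
};
int toy_lattices::live = 0;

// Allocate COUNT zeroed elements and construct each one in place.
toy_lattices *
create_lattices (int count)
{
  void *mem = std::calloc (count, sizeof (toy_lattices));
  assert (mem);
  toy_lattices *arr = static_cast<toy_lattices *> (mem);
  for (int i = 0; i < count; i++)
    new (arr + i) toy_lattices ();
  return arr;
}

// Run the destructor of every element before handing the raw memory
// back; calling only free would leak whatever the elements own.
void
destroy_lattices (toy_lattices *arr, int count)
{
  if (!arr)
    return;
  for (int i = 0; i < count; i++)
    arr[i].~toy_lattices ();
  std::free (arr);
}

// Returns how many elements remain alive after a create/destroy cycle.
int
live_after_cycle (int count)
{
  toy_lattices *arr = create_lattices (count);
  for (int i = 0; i < count; i++)
    arr[i].range.assign (100, i);
  destroy_lattices (arr, count);
  return toy_lattices::live;
}
```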

I have verified that valgrind no longer reports lost memory blocks
allocated within ipcp_vr_lattice::meet_with_1 on the preprocessed source
(dwarf2out.i) attached to Bugzilla.  The patch also passes bootstrap and
LTO bootstrap and testing on x86_64-linux.

OK for master?

Thanks,

Martin


gcc/ChangeLog:

2024-02-09  Martin Jambor  

PR tree-optimization/113476
* ipa-prop.h (ipa_node_params::~ipa_node_params): Moved...
* ipa-cp.cc (ipa_node_params::~ipa_node_params): ...here.  Added
destruction of lattices.
---
 gcc/ipa-cp.cc  | 17 +
 gcc/ipa-prop.h |  9 -
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index e85477df32d..9864ff052de 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -399,6 +399,23 @@ public:
   bool virt_call;
 };
 
+/* Destructor of node function summary, placed here because it mainly must
+   destruct value range lattices not known outside of this source file.  */
+
+ipa_node_params::~ipa_node_params ()
+{
+  if (lattices)
+{
+  int count = ipa_get_param_count (this);
+  for (int i = 0; i < count; i++)
+   lattices[i].~ipcp_param_lattices ();
+  free (lattices);
+}
+  vec_free (descriptors);
+  known_csts.release ();
+  known_contexts.release ();
+}
+
 /* Allocation pools for values and their sources in ipa-cp.  */
 
 object_allocator > ipcp_cst_values_pool
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 9c78dc9f486..fe401640824 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -670,15 +670,6 @@ ipa_node_params::ipa_node_params ()
 {
 }
 
-inline
-ipa_node_params::~ipa_node_params ()
-{
-  free (lattices);
-  vec_free (descriptors);
-  known_csts.release ();
-  known_contexts.release ();
-}
-
 /* Intermediate information that we get from alias analysis about a particular
parameter in a particular basic_block.  When a parameter or the memory it
references is marked modified, we use that information in all dominated
-- 
2.43.0



Re: [RFC] GCC Security policy

2024-02-12 Thread Martin Jambor
Hi,

On Fri, Feb 09 2024, Siddhesh Poyarekar wrote:
> On 2024-02-09 10:38, Martin Jambor wrote:
>> If anyone is interested in scoping this and then mentoring this as a
>> Google Summer of Code project this year then now is the right time to
>> speak up!
>
> I can help with mentoring and reviews, although I'll need someone to 
> assist with actual approvals.

I'm sure that we could manage that.  The project does not look like it
would be a huge one.

>
> There are two distinct sets of ideas to explore, one is privilege 
> management and the other sandboxing.
>
> For privilege management we could add a --allow-root driver flag that 
> allows gcc to run as root.  Without the flag one could either outright 
> refuse to run or drop privileges and run.  Dropping privileges will be a 
> bit tricky to implement because it would need a user to drop privileges 
> to and then there would be the question of how to manage file access to 
> read the compiler input and write out the compiler output.  If there's 
> no such user, gcc could refuse to run as root by default.  I wonder 
> though if from a security posture perspective it makes sense to simply 
> discourage running as root all the time and not bother trying to make it 
> work with dropped privileges and all that.  Of course it would mean that 
> this would be less of a "project"; it'll be a simple enough patch to 
> refuse to run until --allow-root is specified.

Yeah, this would not be enough for a GSoC project, not even for their
new small project category.
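
To give an idea of how little code the check itself would need, here is a
rough sketch.  The --allow-root spelling comes from the proposal quoted
above and is not an existing GCC option; must_refuse_to_run is a
hypothetical helper, and a real driver would obtain the effective user id
from the operating system (geteuid on POSIX systems).

```cpp
#include <cassert>
#include <cstring>

// Decide whether a driver should refuse to start.  EUID is the effective
// user id (0 meaning root); ARGC and ARGV are the command-line arguments.
// The --allow-root flag is only a proposal, not an existing option.
bool
must_refuse_to_run (unsigned euid, int argc, const char *const *argv)
{
  assert (argc >= 0);
  if (euid != 0)
    return false;
  for (int i = 1; i < argc; i++)
    if (std::strcmp (argv[i], "--allow-root") == 0)
      return false;
  return true;
}
```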

Additionally, I think that many, if not all, Linux distributions that
build binary packages do it in a VM/container/chroot where they do it
simply under root because the whole environment is there just for the
build.  So this would complicate lives for an important set of our
users.

>
> This probably ties in somewhat with an idea David Malcolm had riffed on 
> with me earlier, of caching files for diagnostics.  If we could unify 
> file accesses somehow, we could make this happen, i.e. open/read files 
> as root and then do all execution as non-root.
>
> Sandboxing will have similar requirements, i.e. map in input files and 
> an output file handle upfront and then unshare() into a sandbox to do 
> the actual compilation.  This will make sure that at least the 
> processing of inputs does not affect the system on which the compilation 
> is being run.

Right.  As we often just download some (sometimes large) pre-processed
source from Bugzilla and then happily run GCC on it on our computers,
this feature might be actually useful for us (still, we'd probably need
a more concrete description of what we want, would e.g. using "-wrapper
gdb,--args" work in such a sandbox?).  I agree that for some even
semi-complex builds, a more general sandboxing solution is probably
better.

Martin


Re: [RFC] GCC Security policy

2024-02-09 Thread Martin Jambor
Hi,

On Tue, Aug 08 2023, Richard Biener via Gcc-patches wrote:
> On Tue, Aug 8, 2023 at 2:33 PM Siddhesh Poyarekar  wrote:
>>
>> On 2023-08-08 04:16, Richard Biener wrote:
>> > On Mon, Aug 7, 2023 at 7:30 PM David Edelsohn via Gcc-patches
>> >  wrote:
>> >>
>> >> FOSS Best Practices recommends that projects have an official Security
>> >> policy stated in a SECURITY.md or SECURITY.txt file at the root of the
>> >> repository.  GLIBC and Binutils have added such documents.
>> >>
>> >> Appended is a prototype for a Security policy file for GCC based on the
>> >> Binutils document because GCC seems to have more affinity with Binutils as
>> >> a tool. Do the runtime libraries distributed with GCC, especially libgcc,
>> >> require additional security policies?
>> >>
>> >> [ ] Is it appropriate to use the Binutils SECURITY.txt as the starting
>> >> point or should GCC use GLIBC SECURITY.md as the starting point for the
>> >> GCC Security policy?
>> >>
>> >> [ ] Does GCC, or some components of GCC, require additional care
>> >> because of runtime libraries like libgcc and libstdc++, and because of
>> >> gcov and profile-directed feedback?
>> >
>> > I do think that the runtime libraries should at least be explicitly
>> > mentioned because they fall into the "generated output" category and
>> > bugs in the runtime are usually more severe as affecting a wider class
>> > of inputs.
>>
>> Ack, I'd expect libstdc++ and libgcc to be aligned with glibc's
>> policies.  libiberty and others on the other hand, would probably be
>> more suitably aligned with binutils libbfd, where we assume trusted input.
>>
>> >> Thoughts?
>> >>
>> >> Thanks, David
>> >>
>> >> GCC Security Process
>> >> ====================
>> >>
>> >> What is a GCC security bug?
>> >> ===========================
>> >>
>> >>  A security bug is one that threatens the security of a system or
>> >>  network, or might compromise the security of data stored on it.
>> >>  In the context of GCC there are two ways in which such
>> >>  bugs might occur.  In the first, the programs themselves might be
>> >>  tricked into a direct compromise of security.  In the second, the
>> >>  tools might introduce a vulnerability in the generated output that
>> >>  was not already present in the files used as input.
>> >>
>> >>  Other than that, all other bugs will be treated as non-security
>> >>  issues.  This does not mean that they will be ignored, just that
>> >>  they will not be given the priority that is given to security bugs.
>> >>
>> >>  This stance applies to the creation tools in the GCC (e.g.,
>> >>  gcc, g++, gfortran, gccgo, gccrs, gnat, cpp, gcov, etc.) and the
>> >>  libraries that they use.
>> >>
>> >> Notes:
>> >> ======
>> >>
>> >>  None of the programs in GCC need elevated privileges to operate and
>> >>  it is recommended that users do not use them from accounts where such
>> >>  privileges are automatically available.
>> >
>> > I'll note that we could ourselves mitigate some of that by handling
>> > privileged invocation of the driver specially, dropping privs on exec
>> > of the sibling tools and possibly using temporary files or pipes to do
>> > the parts of the I/O that need to be privileged.
>>
>> It's not a bad idea, but it ends up legitimizing running the
>> compiler as root, pushing the responsibility of privilege management to
>> the driver.  How about rejecting invocation as root altogether by
>> default, bypassed with a --run-as-root flag instead?
>>
>> I've also been thinking about a --sandbox flag that isolates the build
>> process (for gcc as well as binutils) into a separate namespace so that
>> it's usable in a restricted mode on untrusted sources without exposing
>> the rest of the system to it.
>
> There's probably external tools to do this, not sure if we should replicate
> things in the driver for this.
>
> But sure, I think the driver is the proper point to address any of such
> issues - iff we want to address them at all.  Maybe a nice little
> google summer-of-code project ;)
>

If anyone is interested in scoping this and then mentoring this as a
Google Summer of Code project this year then now is the right time to
speak up!

Thanks,

Martin


[PATCH] ipa: Avoid excessive removing of SSAs (PR 113757)

2024-02-08 Thread Martin Jambor
Hi,

PR 113757 shows that the code which was meant to debug-reset and
remove SSAs defined by LHSs of calls redirected to
__builtin_unreachable can trigger also when speculative
devirtualization creates a call to a noreturn function (and since it
is noreturn, it does not bother dealing with its return value).

What is more, it seems that the code handling this case is not really
necessary.  I feel slightly idiotic about this because I have a
feeling that I added it because of a failing test-case but I can
neither find the testcase nor a reason why the code in
cgraph_edge::redirect_call_stmt_to_callee would not be sufficient (it
turns the SSA name into a default-def, a bit like IPA-SRA, but any
code dominated by a call to a noreturn is not dangerous when it comes
to its side-effects).  So this patch just removes the handling.

Bootstrapped and tested on x86_64-linux and ppc64le-linux.  I have also
LTO-bootstrapped and LTO-profilebootstrapped the patch on x86_64-linux.

OK for master?

Thanks,

Martin


gcc/ChangeLog:

2024-02-07  Martin Jambor  

PR ipa/113757
* tree-inline.cc (redirect_all_calls): Remove code adding SSAs to
id->killed_new_ssa_names.

gcc/testsuite/ChangeLog:

2024-02-07  Martin Jambor  

PR ipa/113757
* g++.dg/ipa/pr113757.C: New test.
---
 gcc/testsuite/g++.dg/ipa/pr113757.C | 14 ++
 gcc/tree-inline.cc  | 14 ++
 2 files changed, 16 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr113757.C

diff --git a/gcc/testsuite/g++.dg/ipa/pr113757.C b/gcc/testsuite/g++.dg/ipa/pr113757.C
new file mode 100644
index 000..885d4010a10
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr113757.C
@@ -0,0 +1,14 @@
+// { dg-do compile }
+// { dg-options "-O2 -fPIC" }
+// { dg-require-effective-target fpic }
+
+long size();
+struct ll {  virtual int hh();  };
+ll  *slice_owner;
+int ll::hh() { __builtin_exit(0); }
+int nn() {
+  if (size())
+return 0;
+  return slice_owner->hh();
+}
+int (*a)() = nn;
diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
index 75c10eb7dfc..cac41b4f031 100644
--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -2984,23 +2984,13 @@ redirect_all_calls (copy_body_data * id, basic_block bb)
   gimple *stmt = gsi_stmt (si);
   if (is_gimple_call (stmt))
{
- tree old_lhs = gimple_call_lhs (stmt);
  struct cgraph_edge *edge = id->dst_node->get_edge (stmt);
  if (edge)
{
  if (!id->killed_new_ssa_names)
id->killed_new_ssa_names = new hash_set (16);
- gimple *new_stmt
-   = cgraph_edge::redirect_call_stmt_to_callee (edge,
-   id->killed_new_ssa_names);
- if (old_lhs
- && TREE_CODE (old_lhs) == SSA_NAME
- && !gimple_call_lhs (new_stmt))
-   /* In case of IPA-SRA removing the LHS, the name should have
-  been already added to the hash.  But in case of redirecting
-  to builtin_unreachable it was not and the name still should
-  be pruned from debug statements.  */
-   id->killed_new_ssa_names->add (old_lhs);
+ cgraph_edge::redirect_call_stmt_to_callee (edge,
+   id->killed_new_ssa_names);
 
  if (stmt == last && id->call_stmt && maybe_clean_eh_stmt (stmt))
gimple_purge_dead_eh_edges (bb);
-- 
2.43.0



Re: [PATCH] ipa-cp: Fix check for exceeding param_ipa_cp_value_list_size (PR 113490)

2024-01-24 Thread Martin Jambor
Hi,

On Mon, Jan 22 2024, Jan Hubicka wrote:
>> Hi,
>> 
>> When the check for exceeding param_ipa_cp_value_list_size limit was
>> modified to be ignored for generating values from self-recursive
>> calls, it should have been changed from "equal to" to "equal to or
>> greater than".  This omission manifests itself as PR 113490.
>> 
>> When I examined the condition I also noticed that the parameter should
>> come from the callee rather than the caller, since the value list is
>> associated with the former and not the latter.  In practice the limit
>> is of course very likely to be the same, but I fixed this aspect of
>> the condition too.  I briefly audited all other uses of opt_for_fn in
>> ipa-cp.cc and all the others looked OK.
>> 
>> Bootstrapped and tested on x86_64-linux.  OK for master?
>> 
>> Thanks,
>> 
>> Martin
>> 
>> 
>> gcc/ChangeLog:
>> 
>> 2024-01-19  Martin Jambor  
>> 
>>  PR ipa/113490
>>  * ipa-cp.cc (ipcp_lattice::add_value): Bail out if value
>>  count is equal or greater than the limit.  Use the limit from the
>>  callee.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 2024-01-19  Martin Jambor  
>> 
>>  PR ipa/113490
>>  * gcc.dg/ipa/pr113490.c: New test.
> OK,
> thanks!

thank you, I have pushed the following, which has a tweak in the added
test so that it is only run on targets which support the required vectors.

Martin




When the check for exceeding param_ipa_cp_value_list_size limit was
modified to be ignored for generating values from self-recursive
calls, it should have been changed from "equal to" to "equal to or
greater than".  This omission manifests itself as PR 113490.

When I examined the condition I also noticed that the parameter should
come from the callee rather than the caller, since the value list is
associated with the former and not the latter.  In practice the limit
is of course very likely to be the same, but I fixed this aspect of
the condition too.  I briefly audited all other uses of opt_for_fn in
ipa-cp.cc and all the others looked OK.

gcc/ChangeLog:

2024-01-19  Martin Jambor  

    PR ipa/113490
* ipa-cp.cc (ipcp_lattice::add_value): Bail out if value
count is equal or greater than the limit.  Use the limit from the
callee.

gcc/testsuite/ChangeLog:

2024-01-22  Martin Jambor  

PR ipa/113490
* gcc.dg/ipa/pr113490.c: New test.
---
 gcc/ipa-cp.cc   |  2 +-
 gcc/testsuite/gcc.dg/ipa/pr113490.c | 31 +
 2 files changed, 32 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr113490.c

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index b1e2a3a829a..e85477df32d 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -2298,7 +2298,7 @@ ipcp_lattice::add_value (valtype newval, cgraph_edge *cs,
return false;
   }
 
-  if (!same_lat_gen_level && values_count == opt_for_fn (cs->caller->decl,
+  if (!same_lat_gen_level && values_count >= opt_for_fn (cs->callee->decl,
param_ipa_cp_value_list_size))
 {
   /* We can only free sources, not the values themselves, because sources
diff --git a/gcc/testsuite/gcc.dg/ipa/pr113490.c b/gcc/testsuite/gcc.dg/ipa/pr113490.c
new file mode 100644
index 000..526e22b3787
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr113490.c
@@ -0,0 +1,31 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O3 -Wno-psabi"  } */
+
+typedef char A __attribute__((vector_size (64)));
+typedef short B __attribute__((vector_size (64)));
+typedef unsigned C __attribute__((vector_size (64)));
+typedef long D __attribute__((vector_size (64)));
+typedef __int128 E __attribute__((vector_size (64)));
+
+D bar1_D_0;
+E bar4 (A, D);
+
+E
+bar1 (C C_0)
+{
+  C_0 >>= 1;
+  bar4 ((A) C_0, bar1_D_0);
+  bar4 ((A) (E) {~0 }, (D) (A){ ~0 });
+  bar4 ((A) (B) { ~0 }, (D) (C) { ~0 });
+  bar1 ((C) (D){ 0, ~0});
+  bar4 ((A) C_0, bar1_D_0);
+  (A) { bar1 ((C) { 7})[5] - C_0[63], bar4 ((A) (D) {~0}, (D) (C) { 0, ~0})[3]};
+}
+
+E
+bar4 (A A_0, D D_0)
+{
+  bar1 ((C) A_0);
+  bar1 ((C) {5});
+  bar1 ((C) D_0);
+}
-- 
2.43.0



[PATCH] ipa-cp: Fix check for exceeding param_ipa_cp_value_list_size (PR 113490)

2024-01-20 Thread Martin Jambor
Hi,

When the check for exceeding param_ipa_cp_value_list_size limit was
modified to be ignored for generating values from self-recursive
calls, it should have been changed from "equal to" to "equal to or
greater than".  This omission manifests itself as PR 113490.

When I examined the condition I also noticed that the parameter should
come from the callee rather than the caller, since the value list is
associated with the former and not the latter.  In practice the limit
is of course very likely to be the same, but I fixed this aspect of
the condition too.  I briefly audited all other uses of opt_for_fn in
ipa-cp.cc and all the others looked OK.
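
The effect of the comparison operator can be shown with a toy model.  This
is a sketch under assumptions, not the code of ipcp_lattice::add_value:
toy_value_list is made up, "exempt" stands in for values generated from
self-recursive calls, and a rejected addition simply returns false here.
Once an exempt addition pushes the count past the limit, an equality test
never fires again, whereas a greater-or-equal test still does.

```cpp
#include <cassert>

// Toy model of a value list with a size limit.  Additions flagged as
// EXEMPT ignore the limit, so the count can step past it.
struct toy_value_list
{
  int values_count = 0;
  bool use_equality;   // true: old "==" check, false: fixed ">=" check

  explicit toy_value_list (bool eq) : use_equality (eq) {}

  // Returns false when the addition is rejected because the list is full.
  bool add_value (bool exempt, int limit)
  {
    bool full = use_equality ? values_count == limit : values_count >= limit;
    if (!exempt && full)
      return false;
    values_count++;
    return true;
  }
};

// Fill the list up to LIMIT, add one exempt value, then try EXTRA
// ordinary additions.  Returns the final number of stored values.
int
count_after_overflow (bool use_equality, int limit, int extra)
{
  assert (limit >= 0 && extra >= 0);
  toy_value_list list (use_equality);
  for (int i = 0; i < limit; i++)
    list.add_value (false, limit);
  list.add_value (true, limit);
  for (int i = 0; i < extra; i++)
    list.add_value (false, limit);
  return list.values_count;
}
```

With a limit of 8 and 10 further ordinary additions, the equality variant
ends up with 19 values while the greater-or-equal variant stops at 9.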

Bootstrapped and tested on x86_64-linux.  OK for master?

Thanks,

Martin


gcc/ChangeLog:

2024-01-19  Martin Jambor  

PR ipa/113490
* ipa-cp.cc (ipcp_lattice::add_value): Bail out if value
count is equal or greater than the limit.  Use the limit from the
callee.

gcc/testsuite/ChangeLog:

2024-01-19  Martin Jambor  

PR ipa/113490
* gcc.dg/ipa/pr113490.c: New test.
---
 gcc/ipa-cp.cc   |  2 +-
 gcc/testsuite/gcc.dg/ipa/pr113490.c | 31 +
 2 files changed, 32 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr113490.c

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index b1e2a3a829a..e85477df32d 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -2298,7 +2298,7 @@ ipcp_lattice::add_value (valtype newval, cgraph_edge *cs,
return false;
   }
 
-  if (!same_lat_gen_level && values_count == opt_for_fn (cs->caller->decl,
+  if (!same_lat_gen_level && values_count >= opt_for_fn (cs->callee->decl,
param_ipa_cp_value_list_size))
 {
   /* We can only free sources, not the values themselves, because sources
diff --git a/gcc/testsuite/gcc.dg/ipa/pr113490.c b/gcc/testsuite/gcc.dg/ipa/pr113490.c
new file mode 100644
index 000..cffb0c5f639
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr113490.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -Wno-psabi"  } */
+
+typedef char A __attribute__((vector_size (64)));
+typedef short B __attribute__((vector_size (64)));
+typedef unsigned C __attribute__((vector_size (64)));
+typedef long D __attribute__((vector_size (64)));
+typedef __int128 E __attribute__((vector_size (64)));
+
+D bar1_D_0;
+E bar4 (A, D);
+
+E
+bar1 (C C_0)
+{
+  C_0 >>= 1;
+  bar4 ((A) C_0, bar1_D_0);
+  bar4 ((A) (E) {~0 }, (D) (A){ ~0 });
+  bar4 ((A) (B) { ~0 }, (D) (C) { ~0 });
+  bar1 ((C) (D){ 0, ~0});
+  bar4 ((A) C_0, bar1_D_0);
+  (A) { bar1 ((C) { 7})[5] - C_0[63], bar4 ((A) (D) {~0}, (D) (C) { 0, ~0})[3]};
+}
+
+E
+bar4 (A A_0, D D_0)
+{
+  bar1 ((C) A_0);
+  bar1 ((C) {5});
+  bar1 ((C) D_0);
+}
-- 
2.43.0



[PATCH] sra: Disqualify bases of operands of asm gotos

2024-01-17 Thread Martin Jambor
Hi,

PR 110422 shows that SRA can ICE assuming there is a single edge
outgoing from a block terminated with an asm goto.  We need that for
BB-terminating statements so that any adjustments they make to the
aggregates can be copied over to their replacements.  Because we can't
have that after ASM gotos, we need to punt.

Bootstrapped and tested on x86_64-linux, OK for master?  It will need
some tweaking for release branches, is it in principle OK for them too
(after testing)?

Thanks,

Martin


gcc/ChangeLog:

2024-01-17  Martin Jambor  

PR tree-optimization/110422
* tree-sra.cc (scan_function): Disqualify bases of operands of asm
gotos.

gcc/testsuite/ChangeLog:

2024-01-17  Martin Jambor  

PR tree-optimization/110422
* gcc.dg/torture/pr110422.c: New test.
---
 gcc/testsuite/gcc.dg/torture/pr110422.c | 10 +
 gcc/tree-sra.cc | 29 -
 2 files changed, 33 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110422.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr110422.c b/gcc/testsuite/gcc.dg/torture/pr110422.c
new file mode 100644
index 000..2e171a7a19e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110422.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+struct T { int x; };
+int foo(void) {
+  struct T v;
+  asm goto("" : "+r"(v.x) : : : lab);
+  return 0;
+lab:
+  return -5;
+}
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 6a1141b7377..f8e71ec48b9 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -1559,15 +1559,32 @@ scan_function (void)
case GIMPLE_ASM:
  {
gasm *asm_stmt = as_a  (stmt);
-   for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
+   if (stmt_ends_bb_p (asm_stmt)
+   && !single_succ_p (gimple_bb (asm_stmt)))
  {
-   t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
-   ret |= build_access_from_expr (t, asm_stmt, false);
+   for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
+ {
+   t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
+   disqualify_base_of_expr (t, "OP of asm goto.");
+ }
+   for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
+ {
+   t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
+   disqualify_base_of_expr (t, "OP of asm goto.");
+ }
  }
-   for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
+   else
  {
-   t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
-   ret |= build_access_from_expr (t, asm_stmt, true);
+   for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
+ {
+   t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
+   ret |= build_access_from_expr (t, asm_stmt, false);
+ }
+   for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
+ {
+   t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
+   ret |= build_access_from_expr (t, asm_stmt, true);
+ }
  }
  }
  break;
-- 
2.43.0



Re: [PATCH] sra: Partial fix for BITINT_TYPEs [PR113120]

2024-01-17 Thread Martin Jambor
Hi,
On Wed, Jan 10 2024, Jakub Jelinek wrote:
> Hi!
>
> As changed in other parts of the compiler, using
> build_nonstandard_integer_type is not appropriate for arbitrary precisions,
> especially if the precision comes from a BITINT_TYPE or something based on
> that, build_nonstandard_integer_type relies on some integral mode being
> supported that can support the precision.
>
> The following patch uses build_bitint_type instead for BITINT_TYPE
> precisions.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Note, it would be good if we were able to punt on the optimization
> (but this code doesn't seem to be able to punt, so it needs to be done
> somewhere earlier) at least in cases where building it would be invalid.
> E.g. right now BITINT_TYPE can support precisions up to 65535 (inclusive),
> but 65536 will not work anymore (we can't have > 16-bit TYPE_PRECISION).
> I've tried to replace 513 with 65532 in the testcase and it didn't ICE,
> so maybe it ran into some other SRA limit.

Thank you very much for the patch.  Regarding punting, did you mean for
all BITINT_TYPEs or just for big ones, like you did when you fixed PR
11333 (thanks for that too) or something entirely else?

Martin

>
> 2024-01-10  Jakub Jelinek  
>
>   PR tree-optimization/113120
>   * tree-sra.cc (analyze_access_subtree): For BITINT_TYPE
>   with root->size TYPE_PRECISION don't build anything new.
>   Otherwise, if root->type is a BITINT_TYPE, use build_bitint_type
>   rather than build_nonstandard_integer_type.
>
>   * gcc.dg/bitint-63.c: New test.


[PATCH] ipa: Self-DCE of uses of removed call LHSs (PR 108007)

2024-01-16 Thread Martin Jambor
Hi,

PR 108007 is another manifestation where we rely on DCE to clean-up
after IPA-SRA and if the user explicitly switches DCE off, IPA-SRA
can leave behind statements which are fed uninitialized values and
trap, even though their results are themselves never used.

I have already fixed this for unused parameters in callees, this bug
shows that almost the same thing can happen for removed returns, on
the side of callers.  This means that the issue has to be fixed
elsewhere, in call redirection.  This patch adds a function which
looks for (and through, using a work-list) uses of operations fed
specific SSA names and removes them all.
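
To make the work-list idea concrete, here is a minimal standalone
sketch.  It does not use GCC's real data structures: ToyStmt, the
integer SSA names and purge_all_uses_sketch are hypothetical stand-ins,
and the sketch ignores debug statements entirely.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Hypothetical stand-in for a statement: it defines at most one SSA name
// and reads some others.  Names are plain integers here.
struct ToyStmt {
  int def;                // SSA name defined by the statement, -1 if none
  std::vector<int> uses;  // SSA names the statement reads
  bool removed;           // set once the statement has been purged
};

// Remove every statement that (transitively) consumes a name in KILLED.
// Definitions of removed statements are added to KILLED and queued so that
// their uses are purged too.  Returns the number of removed statements.
inline int purge_all_uses_sketch(std::vector<ToyStmt> &stmts,
                                 std::set<int> &killed) {
  std::vector<int> worklist(killed.begin(), killed.end());
  int removed = 0;
  while (!worklist.empty()) {
    int name = worklist.back();
    worklist.pop_back();
    for (ToyStmt &s : stmts) {
      if (s.removed
          || std::find(s.uses.begin(), s.uses.end(), name) == s.uses.end())
        continue;
      s.removed = true;
      ++removed;
      // Queue the newly dead definition only the first time we see it.
      if (s.def >= 0 && killed.insert(s.def).second)
        worklist.push_back(s.def);
    }
  }
  return removed;
}
```

After the call, KILLED holds the original names plus every name whose
definition was purged, which mirrors the role of the hash of killed SSAs
discussed in this message.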

That would have been easy if it wasn't for debug statements during
tree-inline (from which call redirection is also invoked).  Debug
statements are decoupled from the rest at this point and iterating
over uses of SSAs does not bring them up.  During tree-inline they are
handled specially at the end, I assume in order to make sure that
relative ordering of UIDs is the same with and without debug info.

This means that during tree-inline we need to make a hash of killed
SSAs, that we already have in copy_body_data, available to the
function making the purging.  So the patch duly does also that, making
the interface slightly ugly.  Moreover, all newly unused SSA names
need to be freed and as PR 112616 showed, it must be done in a defined
order, which is what newly added ipa_release_ssas_in_hash does.

The only difference from the patch which has already been approved in
September but which I later had to revert is (one function name and)
that SSAs that are to be released are first put into an auto_vec and
sorted according to their version number to avoid issues like PR 112616.
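
The ordering step can be pictured with the following standalone sketch.
ToySsaName and release_order_sketch are hypothetical names; the point is
only that the iteration order of a pointer-keyed hash set is not
something to rely on, whereas sorting by version number gives one
defined order.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an SSA name; only the version matters here.
struct ToySsaName {
  unsigned version;
};

// Copy the killed names out of the hash set and sort them by version so
// that they can be released in a defined order.
inline std::vector<const ToySsaName *>
release_order_sketch(const std::unordered_set<const ToySsaName *> &killed) {
  std::vector<const ToySsaName *> order(killed.begin(), killed.end());
  std::sort(order.begin(), order.end(),
            [](const ToySsaName *a, const ToySsaName *b) {
              return a->version < b->version;
            });
  return order;
}
```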

The patch has passed bootstrap, LTO-bootstrap and profiled-LTO-bootstrap
and testing on x86_64-linux, bootstrap, LTO-bootstrap and testing on
ppc64le-linux and bootstrap and LTO-bootstrap on Aarch64, testsuite
there is still running, OK if it passes?

Thanks

Martin


gcc/ChangeLog:

2024-01-12  Martin Jambor  

PR ipa/108007
PR ipa/112616
* cgraph.h (cgraph_edge): Add a parameter to
redirect_call_stmt_to_callee.
* ipa-param-manipulation.h (ipa_param_adjustments): Add a
parameter to modify_call.
(ipa_release_ssas_in_hash): Declare.
* cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee): New
parameter killed_ssas, pass it to padjs->modify_call.
* ipa-param-manipulation.cc (purge_all_uses): New function.
(ipa_param_adjustments::modify_call): New parameter killed_ssas.
Instead of substituting uses, invoke purge_all_uses.  If
hash of killed SSAs has not been provided, create a temporary one
and release SSAs that have been added to it.
(compare_ssa_versions): New function.
(ipa_release_ssas_in_hash): Likewise.
* tree-inline.cc (redirect_all_calls): Create
id->killed_new_ssa_names earlier, pass it to edge redirection,
adjust a comment.
(copy_body): Release SSAs in id->killed_new_ssa_names.

gcc/testsuite/ChangeLog:

2024-01-15  Martin Jambor  

PR ipa/108007
PR ipa/112616
* gcc.dg/ipa/pr108007.c: New test.
* gcc.dg/ipa/pr112616.c: Likewise.
---
 gcc/cgraph.cc   |  10 ++-
 gcc/cgraph.h|   9 ++-
 gcc/ipa-param-manipulation.cc   | 112 ++--
 gcc/ipa-param-manipulation.h|   5 +-
 gcc/testsuite/gcc.dg/ipa/pr108007.c |  32 
 gcc/testsuite/gcc.dg/ipa/pr112616.c |  28 +++
 gcc/tree-inline.cc  |  27 ---
 7 files changed, 184 insertions(+), 39 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr108007.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr112616.c

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index d565c005f62..0ac8f73204b 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -1403,11 +1403,17 @@ cgraph_edge::redirect_callee (cgraph_node *n)
speculative indirect call, remove "speculative" of the indirect call and
also redirect stmt to it's final direct target.
 
+   When called from within tree-inline, KILLED_SSAs has to contain the pointer
+   to killed_new_ssa_names within the copy_body_data structure and SSAs
+   discovered to be useless (if LHS is removed) will be added to it, otherwise
+   it needs to be NULL.
+
It is up to caller to iteratively transform each "speculative"
direct call as appropriate.  */
 
 gimple *
-cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
+cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e,
+  hash_set <tree> *killed_ssas)
 {
   tree decl = gimple_call_fndecl (e->call_stmt);
   gcall *new_stmt;
@@ -1527,7 +1533,7 @@ cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
remove_stmt_from_eh_lp (e->call_stmt);
 
   tree old_fntype = gimple_call_f

Re: [wwwdocs] gcc-14/changes.html: OpenMP - improve wording

2024-01-09 Thread Martin Jambor
Hi Tobias,

On Mon, Jan 08 2024, Tobias Burnus wrote:
> The attached patch

there was no patch attached to your message.

Martin

> does a tiny update to the OpenMP features (AMD GCN
> now also has an optimized memcpy_rect not only nvptx), but the main 
> change is some shifting around to make it more consistent and better 
> readable.
>
> I intend to commit this relatively soon; like always, comments and 
> suggestions are welcome - be it before or after the commit.
>
> Current version: http://gcc.gnu.org/gcc-14/changes.html
>
> Thanks,
>
> Tobias


Re: [PATCH] tree-optimization/111807 - ICE in verify_sra_access_forest

2023-12-13 Thread Martin Jambor
 DAG
 1577701 are aritificially in conflict with void *

  Modref stats:
modref kill: 832 kills, 19399 queries
modref use: 50760 disambiguations, 1825109 queries
modref clobber: 1371014 disambiguations, 40152535 queries
5190238 tbaa queries (0.129263 per modref query)
1341663 base compares (0.033414 per modref query)

  PTA query stats:
pt_solution_includes: 36784427 disambiguations, 46141175 queries
pt_solutions_intersect: 4519387 disambiguations, 17081996 queries

to:

  Alias oracle query stats:
refs_may_alias_p: 94354083 disambiguations, 106278948 queries
ref_maybe_used_by_call_p: 1572511 disambiguations, 95618018 queries
call_may_clobber_ref_p: 649273 disambiguations, 659371 queries
stmt_kills_ref_p: 142342 kills, 8407310 queries
nonoverlapping_component_refs_p: 19 disambiguations, 10227 queries
nonoverlapping_refs_since_match_p: 15665 disambiguations, 52585 must 
overlaps, 68893 queries
aliasing_component_refs_p: 67104 disambiguations, 3081781 queries
TBAA oracle: 22676608 disambiguations 61782455 queries
 14044948 are in alias set 0
 10998619 queries asked about the same object
 153 queries asked about the same alias set
 0 access volatile
 12484882 are dependent in the DAG
 1577245 are aritificially in conflict with void *

  Modref stats:
modref kill: 832 kills, 19399 queries
modref use: 50760 disambiguations, 1825106 queries
modref clobber: 1371028 disambiguations, 40152504 queries
5190319 tbaa queries (0.129265 per modref query)
1341403 base compares (0.033408 per modref query)

  PTA query stats:
pt_solution_includes: 36784449 disambiguations, 46141210 queries
pt_solutions_intersect: 4519320 disambiguations, 17082083 queries

gcc/ChangeLog:

2023-12-13  Martin Jambor  

PR tree-optimization/111807
* tree-sra.cc (build_ref_for_model): Allow offset smaller than
model->offset when gsi is non-NULL.  Adjust function comment.
---
 gcc/tree-sra.cc | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 3bd0c7a9af0..1dba721be11 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -1843,8 +1843,11 @@ build_reconstructed_reference (location_t, tree base, 
struct access *model)
 /* Construct a memory reference to a part of an aggregate BASE at the given
OFFSET and of the same type as MODEL.  In case this is a reference to a
bit-field, the function will replicate the last component_ref of model's
-   expr to access it.  GSI and INSERT_AFTER have the same meaning as in
-   build_ref_for_offset.  */
+   expr to access it.  INSERT_AFTER and GSI have the same meaning as in
+   build_ref_for_offset, furthermore, when GSI is NULL, the function expects
+   that it re-builds the entire reference from a DECL to the final access and
+   so will create a MEM_REF when OFFSET does not exactly match offset of
+   MODEL.  */
 
 static tree
 build_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset,
@@ -1874,7 +1877,8 @@ build_ref_for_model (location_t loc, tree base, 
HOST_WIDE_INT offset,
  && !TREE_THIS_VOLATILE (base)
  && (TYPE_ADDR_SPACE (TREE_TYPE (base))
  == TYPE_ADDR_SPACE (TREE_TYPE (model->expr)))
- && offset == model->offset
+ && (offset == model->offset
+ || (gsi && offset <= model->offset))
  /* build_reconstructed_reference can still fail if we have already
 massaged BASE because of another type incompatibility.  */
  && (res = build_reconstructed_reference (loc, base, model)))
-- 
2.43.0







[PATCH] SRA: Force gimple operand in an additional corner case (PR 112822)

2023-12-12 Thread Martin Jambor
Hi,

PR 112822 revealed a corner case in load_assign_lhs_subreplacements
where it creates invalid gimple: an assignment where on the LHS there
is a complex variable which however is not a gimple register because
it has partial defs and on the right hand side there is a
VIEW_CONVERT_EXPR.  This patch invokes force_gimple_operand_gsi on
such statements (like it already does when both sides of a generated
assignment have partial definitions).
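
The change in the guarding condition can be written down as two small
predicates.  This is only a sketch of the boolean logic taken from the
diff in this message; the function names are hypothetical and the
parameters stand for lacc->grp_partial_lhs, racc->grp_partial_lhs and
the new vce flag.

```cpp
// Condition before the patch: both sides must have partial definitions.
inline bool needs_force_old(bool lacc_partial_lhs, bool racc_partial_lhs) {
  return racc_partial_lhs && lacc_partial_lhs;
}

// Condition after the patch: a partially defined LHS is enough when the
// RHS has been wrapped in a VIEW_CONVERT_EXPR.
inline bool needs_force_new(bool lacc_partial_lhs, bool racc_partial_lhs,
                            bool vce) {
  return lacc_partial_lhs && (vce || racc_partial_lhs);
}
```

The two predicates agree whenever no VIEW_CONVERT_EXPR was built; they
differ only in the corner case described above.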

I've made sure the patch passes bootstrap and testsuite on x86_64-linux,
the bug reporter was kind enough to also check the same on
powerpc64le-linux (see bugzilla comment #8).

The testcase has reasonable size but it is specific to ppc64le and its
altivec vectors.  My plan is to ask the bug reporter to massage it into
a target specific testcase in bugzilla.  Alternatively I can try to
craft a testcase from scratch but that will take time.

Despite the above, is the patch OK for master?

Thanks,

Martin



gcc/ChangeLog:

2023-12-12  Martin Jambor  

PR tree-optimization/112822
* tree-sra.cc (load_assign_lhs_subreplacements): Invoke
force_gimple_operand_gsi also when LHS has partial stores and RHS is a
VIEW_CONVERT_EXPR.
---
 gcc/tree-sra.cc | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 3bd0c7a9af0..99a1b0a6d17 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -4219,11 +4219,15 @@ load_assign_lhs_subreplacements (struct access *lacc,
  if (racc && racc->grp_to_be_replaced)
{
  rhs = get_access_replacement (racc);
+ bool vce = false;
  if (!useless_type_conversion_p (lacc->type, racc->type))
-   rhs = fold_build1_loc (sad->loc, VIEW_CONVERT_EXPR,
-  lacc->type, rhs);
+   {
+ rhs = fold_build1_loc (sad->loc, VIEW_CONVERT_EXPR,
+lacc->type, rhs);
+ vce = true;
+   }
 
- if (racc->grp_partial_lhs && lacc->grp_partial_lhs)
+ if (lacc->grp_partial_lhs && (vce || racc->grp_partial_lhs))
rhs = force_gimple_operand_gsi (&sad->old_gsi, rhs, true,
NULL_TREE, true, GSI_SAME_STMT);
}
-- 
2.43.0



Re: [PATCH] tree-sra: Avoid returns of references to SRA candidates

2023-11-29 Thread Martin Jambor
Hi,

On Tue, Nov 28 2023, Jan Hubicka wrote:
>> On Tue, 28 Nov 2023, Martin Jambor wrote:
>> 
>> > On Tue, Nov 28 2023, Richard Biener wrote:
>> > > On Mon, 27 Nov 2023, Martin Jambor wrote:
>> > >
>> > >> Hi,
>> > >> 
>> > >> The enhancement to address PR 109849 contained an important thinko:
>> > >> it overlooked that any reference that is passed to a function and
>> > >> does not escape must also not happen to be aliased by the return
>> > >> value of the function.  This has quickly transpired as bugs PR
>> > >> 112711 and PR 112721.
>> > >> 
>> > >> Just as IPA-modref does a good enough job to allow us to rely on the
>> > >> escaped set of variables, it seems to be doing well also on updating
>> > >> EAF_NOT_RETURNED_DIRECTLY call argument flag which happens to address
>> > >> exactly the situation we need to avoid.  Of course, if a call
>> > >> statement ignores any returned value, we also do not need to check the
>> > >> flag.
>> > >
>> > > But what about EAF_NOT_RETURNED_INDIRECTLY?  Don't you need to
>> > > verify the parameter doesn't escape through the return at all?
>> > >
>> > 
>> > I thought EAF_NOT_RETURNED_INDIRECTLY prohibits things like "return
>> > param->next" but those are not a problem (whatever next points to cannot
>> > be an SRA candidate and any ADDR_EXPR storing its address there would
>> > trigger a disqualification or at least an assert).  But I guess I am
>> > wrong, what is actually the exact meaning of the flag?
>> 
>> I thought it's return (x.ptr = param, &x);
>> 
>> so the parameter is reachable from the return value.
>> 
>> But let Honza answer...
> It is the same difference as direct/indirect escape, so it checks whether
> values pointed to by arg can possibly be returned.  Indeed maybe we
> should think of a better name - the other interpretation did not even
> occur to me, but it makes sense.
>

Is my patch OK then?

(Apart from making one of the testcases x86_64-only, as Andrew pointed
out, which I wanted to do but the line somehow got lost.  Making the
testcase more general is fairly low on my contested TODO list and the
testing depends on a specific instruction trapping.)

Thanks,

Martin



Re: [PATCH] tree-sra: Avoid returns of references to SRA candidates

2023-11-28 Thread Martin Jambor
On Tue, Nov 28 2023, Richard Biener wrote:
> On Mon, 27 Nov 2023, Martin Jambor wrote:
>
>> Hi,
>> 
>> The enhancement to address PR 109849 contained an important thinko:
>> it overlooked that any reference that is passed to a function and
>> does not escape must also not happen to be aliased by the return
>> value of the function.  This has quickly transpired as bugs PR
>> 112711 and PR 112721.
>> 
>> Just as IPA-modref does a good enough job to allow us to rely on the
>> escaped set of variables, it seems to be doing well also on updating
>> EAF_NOT_RETURNED_DIRECTLY call argument flag which happens to address
>> exactly the situation we need to avoid.  Of course, if a call
>> statement ignores any returned value, we also do not need to check the
>> flag.
>
> But what about EAF_NOT_RETURNED_INDIRECTLY?  Don't you need to
> verify the parameter doesn't escape through the return at all?
>

I thought EAF_NOT_RETURNED_INDIRECTLY prohibits things like "return
param->next" but those are not a problem (whatever next points to cannot
be an SRA candidate and any ADDR_EXPR storing its address there would
trigger a disqualification or at least an assert).  But I guess I am
wrong, what is actually the exact meaning of the flag?

Thanks,

Martin


[PATCH] tree-sra: Avoid returns of references to SRA candidates

2023-11-27 Thread Martin Jambor
Hi,

The enhancement to address PR 109849 contained an important thinko:
it overlooked that any reference that is passed to a function and does
not escape must also not happen to be aliased by the return value of
the function.  This has quickly transpired as bugs PR 112711 and PR
112721.

Just as IPA-modref does a good enough job to allow us to rely on the
escaped set of variables, it seems to be doing well also at updating
the EAF_NOT_RETURNED_DIRECTLY call argument flag which happens to address
exactly the situation we need to avoid.  Of course, if a call
statement ignores any returned value, we also do not need to check the
flag.

Hopefully this does not pessimize things too much, I have verified
that the PR 109849 testcase remains quick and so should the
benchmark it is derived from.

The patch has passed bootstrap and testing on x86_64-linux, OK for
master?

Thanks,

Martin


gcc/ChangeLog:

2023-11-27  Martin Jambor  

PR tree-optimization/112711
PR tree-optimization/112721
* tree-sra.cc (build_access_from_call_arg): New parameter
CAN_BE_RETURNED, disqualify any candidate passed by reference if it is
true.  Adjust leading comment.
(scan_function): Pass appropriate value to CAN_BE_RETURNED of
build_access_from_call_arg.

gcc/testsuite/ChangeLog:

2023-11-27  Martin Jambor  

PR tree-optimization/112711
PR tree-optimization/112721
* g++.dg/tree-ssa/pr112711.C: New test.
* gcc.dg/tree-ssa/pr112721.c: Likewise.
---
 gcc/testsuite/g++.dg/tree-ssa/pr112711.C | 31 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr112721.c | 26 +++
 gcc/tree-sra.cc  | 40 ++--
 3 files changed, 88 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr112711.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr112721.c

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr112711.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr112711.C
new file mode 100644
index 000..c04524b04a7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr112711.C
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O1" } */
+
+typedef  int i32;
+typedef unsigned int u32;
+
+static inline void write_i32(void *memory, i32 value) {
+  // swap i32 bytes as if it was u32:
+  u32 u_value = value;
+  value = __builtin_bswap32(u_value);
+
+  // llvm infers '1' alignment from destination type
+  __builtin_memcpy(__builtin_assume_aligned(memory, 1), &value, sizeof(value));
+}
+
+__attribute__((noipa))
+static void bug (void) {
+  #define assert_eq(lhs, rhs) if (lhs != rhs) __builtin_trap()
+
+  unsigned char data[5];
+  write_i32(data, -1362446643);
+  assert_eq(data[0], 0xAE);
+  assert_eq(data[1], 0xCA);
+  write_i32(data + 1, -1362446643);
+  assert_eq(data[1], 0xAE);
+}
+
+int main() {
+bug();
+return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr112721.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr112721.c
new file mode 100644
index 000..adf62613266
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr112721.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-O1" } */
+
+unsigned * volatile gv;
+
+struct a {
+  int b;
+};
+int c, e;
+long d;
+unsigned * __attribute__((noinline))
+f(unsigned *g) {
+  for (; c;)
+e = d;
+  return gv ? gv : g;
+}
+int main() {
+  int *h;
+  struct a i = {8};
+  int *j = &i.b;
+  h = (unsigned *) f(j);
+  *h = 0;
+  if (i.b != 0)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 3a0d52675fe..6a759783990 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -1268,18 +1268,27 @@ abnormal_edge_after_stmt_p (gimple *stmt, enum 
out_edge_check *oe_check)
 }
 
 /* Scan expression EXPR which is an argument of a call and create access
-   structures for all accesses to candidates for scalarization.  Return true if
-   any access has been inserted.  STMT must be the statement from which the
-   expression is taken.  */
+   structures for all accesses to candidates for scalarization.  Return true
+   if any access has been inserted.  STMT must be the statement from which the
+   expression is taken.  CAN_BE_RETURNED must be true if call argument flags
+   do not rule out that the argument is directly returned.  OE_CHECK is used
+   to remember result of a test for abnormal outgoing edges after this
+   statement.  */
 
 static bool
-build_access_from_call_arg (tree expr, gimple *stmt,
+build_access_from_call_arg (tree expr, gimple *stmt, bool can_be_returned,
enum out_edge_check *oe_check)
 {
   if (TREE_CODE (expr) == ADDR_EXPR)
 {
   tree base = get_base_address (TREE_OPERAND (expr, 0));
 
+  if (can_be_returned)
+   {
+ disqualify_base_of_expr (base, "Address possibly returned, "
+  "leading to an alis SRA may not know.");
+ return false;
+

Re: [PATCH] sra: SRA of non-escaped aggregates passed by reference to calls

2023-11-24 Thread Martin Jambor
Hello,

thanks a lot for your review.

On Fri, Nov 17 2023, Richard Biener wrote:
> On Thu, 16 Nov 2023, Martin Jambor wrote:
>
>> Hello,
>> 
>> PR109849 shows that a loop that heavily pushes and pops from a stack
>> implemented by a C++ std::vector results in slow code, mainly because the
>> vector structure is not split by SRA and so we end up with many loads
>> and stores into it.  This is because it is passed by reference
>> to (re)allocation methods and so needs to live in memory, even though
>> it does not escape from them and so we could SRA it if we
>> re-constructed it before the call and then separated it to distinct
>> replacements afterwards.
>> 
>> This patch does exactly that, first relaxing the selection of
>> candidates to also include those which are addressable but do not
>> escape and then adding code to deal with the calls.  The
>> micro-benchmark that is also the (scan-dump) testcase in this patch
>> runs twice as fast with it as with current trunk.  Honza measured
>> its effect on the libjxl benchmark and it almost closes the
>> performance gap between Clang and GCC while not requiring excessive
>> inlining and thus code growth.
>> 
>> The patch disallows creation of replacements for such aggregates which
>> are also accessed with a precision smaller than their size because I
>> have observed that this led to excessive zero-extending of data
>> leading to slow-downs of perlbench (on some CPUs).  Apart from this
>> case I have not noticed any regressions, at least not so far.
>> 
>> Gimple call argument flags can tell if an argument is unused (and then
>> we do not need to generate any statements for it) or if it is not
>> written to and then we do not need to generate statements loading
>> replacements from the original aggregate after the call statement.
>> Unfortunately, we cannot symmetrically use flags that an aggregate is
>> not read to avoid re-constructing the aggregate before the call
>> because flags don't tell which parts of aggregates were not
>> written to, so we load all replacements, and so all need to have the
>> correct value before the call.
>> 
>> The patch passes bootstrap, lto-bootstrap and profiled-lto-bootstrap on
>> x86_64-linux and a very similar patch has also passed bootstrap and
>> testing on Aarch64-linux and ppc64le-linux (I'm re-running both on these
>> two architectures as I'm sending this).  OK for master?
>> 
>> Thanks,
>> 
>> Martin
>> 

[...]

>> @@ -1920,10 +1981,19 @@ maybe_add_sra_candidate (tree var)
>>reject (var, "not aggregate");
>>return false;
>>  }
>> -  /* Allow constant-pool entries that "need to live in memory".  */
>> -  if (needs_to_live_in_memory (var) && !constant_decl_p (var))
>> +
>> +  if ((is_global_var (var)
>> +   /* There are cases where non-addressable variables fail the
>> +  pt_solutions_check test, e.g in gcc.dg/uninit-40.c. */
>> +   || (TREE_ADDRESSABLE (var)
>> +   && pt_solution_includes (&cfun->gimple_df->escaped, var))
>
> so it seems this is the "correctness" test rather than using
> the call argument flags?  I'll note that this may be overly
> conservative.

It is but with ipa-modref it seems to work fairly well.

>
> For the call handling above I wondered about return-slot-opt calls
> where the address of the LHS escapes to the call - if the points-to
> check is the correctness check that should still work out of course
> (but subject to PR109945).

Hm, I wonder what the implications of that PR are for SRA, but it is a
different issue, this patch does not touch handling LHSs.

>
>> +   || (TREE_CODE (var) == RESULT_DECL
>> +   && !DECL_BY_REFERENCE (var)
>> +   && aggregate_value_p (var, current_function_decl)))
>> +  /* Allow constant-pool entries that "need to live in memory".  */
>> +  && !constant_decl_p (var))
>>  {
>> -  reject (var, "needs to live in memory");
>> +  reject (var, "needs to live in memory and escapes or global");
>>return false;
>>  }
>>if (TREE_THIS_VOLATILE (var))
>> @@ -2122,6 +2192,21 @@ sort_and_splice_var_accesses (tree var)
>>  gcc_assert (access->offset >= low
>>  && access->offset + access->size <= high);
>>  
>> +  if (INTEGRAL_TYPE_P (access->type)
>> +  && TYPE_PRECISION (access->type) != access->size
>> 

Re: libstdc++: Speed up push_back

2023-11-24 Thread Martin Jambor
Hello,

On Thu, Nov 23 2023, Jonathan Wakely wrote:
> On Thu, 23 Nov 2023 at 15:34, Jan Hubicka  wrote:
>>

[...]

>>
>> I also wonder, if default operator new and malloc can be handled as not
>> reading/modifying anything visible to the user code.
>
> No, there's no way to know if the default operator new is being used.
> A replacement operator new could be provided at link-time.
>
> That's why we need -fsane-operator-new
>

Would it make sense to add -fsane-operator-new to -Ofast?

Martin


[PATCH] Bump LTO_minor_version

2023-11-20 Thread Martin Jambor
Hi Richi,

On Wed, Sep 20 2023, Richard Biener wrote:
> The following turns MAX_NUM_CHAINS and MAX_CHAIN_LEN to params which
> allows to experiment with raising them.  For the testcase in PR111489
> raising MAX_CHAIN_LEN from 5 to 8 avoids the bogus diagnostics
> at -O2, at -O3 we need a MAX_CHAIN_LEN of 6.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
>
>   PR tree-optimization/111489
>   * doc/invoke.texi (--param uninit-max-chain-len): Document.
>   (--param uninit-max-num-chains): Likewise.
>   * params.def (-param=uninit-max-chain-len=): New.
>   (-param=uninit-max-num-chains=): Likewise.
>   * gimple-predicate-analysis.cc (MAX_NUM_CHAINS): Define to
>   param_uninit_max_num_chains.
>   (MAX_CHAIN_LEN): Define to param_uninit_max_chain_len.
>   (uninit_analysis::init_use_preds): Avoid VLA.
>   (uninit_analysis::init_from_phi_def): Likewise.
>   (compute_control_dep_chain): Avoid using MAX_CHAIN_LEN in
>   template parameter.

our test attempting to detect that LTO_minor_version should have been
bumped but wasn't is failing and, eye-balling backports to the gcc-13
branch, this looks like a likely culprit.  Unless I am mistaken, params
are streamed and therefore they alter the LTO format?

If so, I'd like to propose the obvious fix, OK for gcc-13 (after some
testing)?
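
Why the bump matters can be sketched as follows, under the assumption
that a reader only accepts bytecode whose major and minor version both
match its own.  ToyLtoHeader and stream_is_compatible are hypothetical
names for illustration and are not the actual lto-streamer interface.

```cpp
// Hypothetical picture of the version pair stored with an LTO stream.
struct ToyLtoHeader {
  int major_version;
  int minor_version;
};

// A reader built with READER's versions accepts a stream only when both
// numbers match, so a format change without a bump goes undetected.
inline bool stream_is_compatible(ToyLtoHeader stream, ToyLtoHeader reader) {
  return stream.major_version == reader.major_version
         && stream.minor_version == reader.minor_version;
}
```

With the minor version bumped from 0 to 1, objects written before the
format change are rejected up front instead of being misread.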

Thanks,

Martin


[PATCH] Bump LTO_minor_version

I believe r13-8039-g06ee3438a4fcf9 has changed LTO format and
therefore we should bump the minor version of the GCC 13 LTO format.

gcc/ChangeLog:

2023-11-20  Martin Jambor  

* lto-streamer.h (LTO_minor_version): Bump.
---
 gcc/lto-streamer.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index fc7133d07ba..75cebcd02d3 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -122,7 +122,7 @@ along with GCC; see the file COPYING3.  If not see
  form followed by the data for the string.  */
 
 #define LTO_major_version GCC_major_version
-#define LTO_minor_version 0
+#define LTO_minor_version 1
 
 typedef unsigned char  lto_decl_flags_t;
 
-- 
2.42.0





Re: Propagate value ranges of return values

2023-11-20 Thread Martin Jambor
Hi,

thanks for working on this.

On Sun, Nov 19 2023, Jan Hubicka wrote:
> Hi,
> this is an updated version which also adds the testsuite compensation
> I lost earlier while maintaining the patch in my testing tree.
> There are quite a few testcases that use constant return values to hide
> something from optimizer.
>
> Bootstrapped/regtested x86_64-linux.
> gcc/ChangeLog:
>
>   * cgraph.cc (add_detected_attribute_1): New function.
>   (cgraph_node::add_detected_attribute): Likewise.
>   * cgraph.h (cgraph_node::add_detected_attribute): Declare.
>   * common.opt: Add -Wsuggest-attribute=returns_nonnull.
>   * doc/invoke.texi: Document new flag.
>   * gimple-range-fold.cc (fold_using_range::range_of_call):
>   Use known return value ranges.
>   * ipa-prop.cc (struct ipa_return_value_summary): New type.
>   (class ipa_return_value_sum_t): New type.
>   (ipa_return_value_sum): New summary.
>   (ipa_record_return_value_range): New function.
>   (ipa_return_value_range): New function.
>   * ipa-prop.h (ipa_return_value_range): Declare.
>   (ipa_record_return_value_range): Declare.
>   * ipa-pure-const.cc (warn_function_returns_nonnull): New function.
>   * ipa-utils.h (warn_function_returns_nonnull): Declare.
>   * symbol-summary.h: Fix comment.
>   * tree-vrp.cc (execute_ranger_vrp): Record return values.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.dg/ipa/devirt-2.C: Add noipa attribute to prevent ipa-vrp.
>   * g++.dg/ipa/devirt-7.C: Disable ipa-vrp.
>   * g++.dg/ipa/ipa-icf-2.C: Disable ipa-vrp.
>   * g++.dg/ipa/ipa-icf-3.C: Disable ipa-vrp.
>   * g++.dg/ipa/ivinline-1.C: Disable ipa-vrp.
>   * g++.dg/ipa/ivinline-3.C: Disable ipa-vrp.
>   * g++.dg/ipa/ivinline-5.C: Disable ipa-vrp.
>   * g++.dg/ipa/ivinline-8.C: Disable ipa-vrp.
>   * g++.dg/ipa/nothrow-1.C: Disable ipa-vrp.
>   * g++.dg/ipa/pure-const-1.C: Disable ipa-vrp.
>   * g++.dg/ipa/pure-const-2.C: Disable ipa-vrp.
>   * g++.dg/lto/inline-crossmodule-1_0.C: Disable ipa-vrp.
>   * gcc.c-torture/compile/pr106433.c: Add noipa attribute to prevent 
> ipa-vrp.
>   * gcc.c-torture/execute/frame-address.c: Likewise.
>   * gcc.dg/ipa/fopt-info-inline-1.c: Disable ipa-vrp.
>   * gcc.dg/ipa/ipa-icf-25.c: Disable ipa-vrp.
>   * gcc.dg/ipa/ipa-icf-38.c: Disable ipa-vrp.
>   * gcc.dg/ipa/pure-const-1.c: Disable ipa-vrp.
>   * gcc.dg/ipa/remref-0.c: Add noipa attribute to prevent ipa-vrp.
>   * gcc.dg/tree-prof/time-profiler-1.c: Disable ipa-vrp.
>   * gcc.dg/tree-prof/time-profiler-2.c: Disable ipa-vrp.
>   * gcc.dg/tree-ssa/pr110269.c: Disable ipa-vrp.
>   * gcc.dg/tree-ssa/pr20701.c: Disable ipa-vrp.
>   * gcc.dg/tree-ssa/vrp05.c: Disable ipa-vrp.
>   * gcc.dg/tree-ssa/return-value-range-1.c: New test.
>
> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
> index e41e5ad3ae7..71dacf23ce1 100644
> --- a/gcc/cgraph.cc
> +++ b/gcc/cgraph.cc
> @@ -2629,6 +2629,54 @@ cgraph_node::set_malloc_flag (bool malloc_p)
>return changed;
>  }
>  
> +/* Worker to set malloc flag.  */

I think the comment must be stale, and the name of the function also, it
does not add anything, does it?

> +static void
> +add_detected_attribute_1 (cgraph_node *node, const char *attr, bool *changed)
> +{
> +  if (!lookup_attribute (attr, DECL_ATTRIBUTES (node->decl)))
> +{
> +  DECL_ATTRIBUTES (node->decl) = tree_cons (get_identifier (attr),
> +  NULL_TREE, DECL_ATTRIBUTES 
> (node->decl));
> +  *changed = true;
> +}
> +
> +  ipa_ref *ref;
> +  FOR_EACH_ALIAS (node, ref)
> +{
> +  cgraph_node *alias = dyn_cast <cgraph_node *> (ref->referring);
> +  if (alias->get_availability () > AVAIL_INTERPOSABLE)
> + add_detected_attribute_1 (alias, attr, changed);
> +}
> +
> +  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
> +if (e->caller->thunk
> + && (e->caller->get_availability () > AVAIL_INTERPOSABLE))
> +  add_detected_attribute_1 (e->caller, attr, changed);
> +}
> +
> +/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */

Likewise.

> +
> +bool
> +cgraph_node::add_detected_attribute (const char *attr)
> +{
> +  bool changed = false;
> +
> +  if (get_availability () > AVAIL_INTERPOSABLE)
> +add_detected_attribute_1 (this, attr, &changed);
> +  else
> +{
> +  ipa_ref *ref;
> +
> +  FOR_EACH_ALIAS (this, ref)
> + {
> +   cgraph_node *alias = dyn_cast <cgraph_node *> (ref->referring);
> +   if (alias->get_availability () > AVAIL_INTERPOSABLE)
> + add_detected_attribute_1 (alias, attr, &changed);
> + }
> +}
> +  return changed;
> +}
> +
>  /* Worker to set noreturng flag.  */
>  static void
>  set_noreturn_flag_1 (cgraph_node *node, bool noreturn_p, bool *changed)

[...]

> diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
> index 6e9530c3d7f..998b7608d78 100644
> --- a/gcc/gimple-range-fold.cc
> +++ b/gcc/g

[PATCH] sra: SRA of non-escaped aggregates passed by reference to calls

2023-11-16 Thread Martin Jambor
Hello,

PR109849 shows that a loop that heavily pushes and pops from a stack
implemented by a C++ std::vector results in slow code, mainly because the
vector structure is not split by SRA and so we end up with many loads
and stores into it.  This is because it is passed by reference
to (re)allocation methods and so needs to live in memory, even though
it does not escape from them and so we could SRA it if we
re-constructed it before the call and then separated it to distinct
replacements afterwards.
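
A hand-written picture of the idea, not actual compiler output: ToyStack
and ensure_room are hypothetical stand-ins for the vector and its
(re)allocation method, and the second function shows by hand what
keeping the fields in scalar replacements around the call looks like.

```cpp
// Hypothetical aggregate standing in for the vector's bookkeeping fields.
struct ToyStack {
  int size;
  int cap;
};

// Takes the aggregate by reference but does not let its address escape.
inline void ensure_room(ToyStack &s) {
  if (s.size == s.cap)
    s.cap = s.cap ? s.cap * 2 : 4;
}

// The loop as written: the aggregate lives in memory because its address
// is passed to a call.
inline int pushes_in_memory(int n) {
  ToyStack s = {0, 0};
  for (int i = 0; i < n; ++i) {
    ensure_room(s);
    ++s.size;
  }
  return s.cap;
}

// The same loop with the fields kept in scalars.
inline int pushes_scalarized(int n) {
  int size = 0, cap = 0;       // scalar replacements of the two fields
  for (int i = 0; i < n; ++i) {
    ToyStack s = {size, cap};  // re-construct the aggregate before the call
    ensure_room(s);
    size = s.size;             // separate it into replacements afterwards
    cap = s.cap;
    ++size;
  }
  return cap;
}
```

Both functions compute the same result; the second form only touches
the aggregate right around the call.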

This patch does exactly that, first relaxing the selection of
candidates to also include those which are addressable but do not
escape and then adding code to deal with the calls.  The
micro-benchmark that is also the (scan-dump) testcase in this patch
runs twice as fast with it as with current trunk.  Honza measured
its effect on the libjxl benchmark and it almost closes the
performance gap between Clang and GCC while not requiring excessive
inlining and thus code growth.

The patch disallows creation of replacements for such aggregates which
are also accessed with a precision smaller than their size because I
have observed that this led to excessive zero-extending of data
leading to slow-downs of perlbench (on some CPUs).  Apart from this
case I have not noticed any regressions, at least not so far.

Gimple call argument flags can tell if an argument is unused (and then
we do not need to generate any statements for it) or if it is not
written to and then we do not need to generate statements loading
replacements from the original aggregate after the call statement.
Unfortunately, we cannot symmetrically use flags saying that an aggregate
is not read to avoid re-constructing the aggregate before the call,
because the flags don't tell which parts of the aggregate were not
written to, so we load all replacements after the call, and so all of
them need to have the correct value before the call.

The patch passes bootstrap, lto-bootstrap and profiled-lto-bootstrap on
x86_64-linux and a very similar patch has also passed bootstrap and
testing on Aarch64-linux and ppc64le-linux (I'm re-running both on these
two architectures as I'm sending this).  OK for master?

Thanks,

Martin


gcc/ChangeLog:

2023-11-16  Martin Jambor  

PR middle-end/109849
* tree-sra.cc (passed_by_ref_in_call): New.
(sra_initialize): Allocate passed_by_ref_in_call.
(sra_deinitialize): Free passed_by_ref_in_call.
(create_access): Add decl pool candidates only if they are not
already candidates.
(build_access_from_expr_1): Bail out on ADDR_EXPRs.
(build_access_from_call_arg): New function.
(asm_visit_addr): Rename to scan_visit_addr, change the
disqualification dump message.
(scan_function): Check taken addresses for all non-call statements,
including phi nodes.  Process all call arguments, including the static
chain, with build_access_from_call_arg.
(maybe_add_sra_candidate): Relax need_to_live_in_memory check to allow
non-escaped local variables.
(sort_and_splice_var_accesses): Disallow smaller-than-precision
replacements for aggregates passed by reference to functions.
(sra_modify_expr): Use a separate stmt iterator for adding statements
before the processed statement and after it.
(sra_modify_call_arg): New function.
(sra_modify_assign): Adjust calls to sra_modify_expr.
(sra_modify_function_body): Likewise, use sra_modify_call_arg to
process call arguments, including the static chain.

gcc/testsuite/ChangeLog:

2023-11-03  Martin Jambor  

PR middle-end/109849
* g++.dg/tree-ssa/pr109849.C: New test.
* gfortran.dg/pr43984.f90: Added -fno-tree-sra to dg-options.
---
 gcc/testsuite/g++.dg/tree-ssa/pr109849.C |  31 +++
 gcc/testsuite/gfortran.dg/pr43984.f90|   2 +-
 gcc/tree-sra.cc  | 244 ++-
 3 files changed, 231 insertions(+), 46 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr109849.C

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr109849.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C
new file mode 100644
index 00000000000..cd348c0f590
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sra" } */
+
+#include <vector>
+typedef unsigned int uint32_t;
+std::pair<uint32_t, uint32_t> pair;
+void
+test()
+{
+std::vector<std::pair<uint32_t, uint32_t> > stack;
+stack.push_back (pair);
+while (!stack.empty()) {
+std::pair<uint32_t, uint32_t> cur = stack.back();
+stack.pop_back();
+if (!cur.first)
+{
+cur.second++;
+stack.push_back (cur);
+}
+if (cur.second > 1)
+break;
+}
+}
+int
+main()
+{
+for (int i = 0; i < 1; i++)
+  test();
+}
+
+/* { dg

[PATCH] gcc/configure: Regenerate

2023-11-07 Thread Martin Jambor
On Mon, Nov 06 2023, Martin Jambor wrote:
>
[...]
>
> I'm not sure what that means, whether a wrong version of
> autoconf/automake was used (though when I accidentally tried that, it
> has always complained loudly) or if some environment difference can
> cause this.  Perhaps I should change the script not to care about
> commits though that won't happen soon (or perhaps I should drop the
> checks completely) but would people be OK with me checking in the patch
> above (with appropriate ChangeLog) to silence buildbot for a while
> again?
>

I have committed the following to silence the tester.

Probably because of a re-base of changes to gcc/configure there are
line comment mismatches between what we have and what would be
generated.  This patch brings them in line so that consistency
checkers are happy.

gcc/ChangeLog:

2023-11-07  Martin Jambor  

* configure: Regenerate.
---
 gcc/configure | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index 4d0357cbc28..0d818ae6850 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -20000,7 +20000,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19995 "configure"
+#line 20003 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -20106,7 +20106,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 20101 "configure"
+#line 20109 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
-- 
2.42.0



[PATCH] Fix configure script comments(!?!) (Was: Re: [PATCH] genemit: Split insn-emit.cc into ten files)

2023-11-06 Thread Martin Jambor
Hello,

On Thu, Oct 12 2023, Robin Dapp wrote:
>
[...]
> gcc/ChangeLog:
>
>   PR bootstrap/84402
>   PR target/111600
>
>   * Makefile.in: Handle split insn-emit.cc.
>   * configure: Regenerate.
>   * configure.ac: Add --with-insnemit-partitions.
>   * genemit.cc (output_peephole2_scratches): Print to file instead
>   of stdout.
>   (print_code): Ditto.
>   (gen_rtx_scratch): Ditto.
>   (gen_exp): Ditto.
>   (gen_emit_seq): Ditto.
>   (emit_c_code): Ditto.
>   (gen_insn): Ditto.
>   (gen_expand): Ditto.
>   (gen_split): Ditto.
>   (output_add_clobbers): Ditto.
>   (output_added_clobbers_hard_reg_p): Ditto.
>   (print_overload_arguments): Ditto.
>   (print_overload_test): Ditto.
>   (handle_overloaded_code_for): Ditto.
>   (handle_overloaded_gen): Ditto.
>   (print_header): New function.
>   (handle_arg): New function.
>   (main): Split output into 10 files.
>   * gensupport.cc (count_patterns): New function.
>   * gensupport.h (count_patterns): Define.
>   * read-md.cc (md_reader::print_md_ptr_loc): Add file argument.
>   * read-md.h (class md_reader): Change definition.

Following this commit, our buildbot script which checks that configure
scripts were re-generated correctly is unhappy because it insists
comments are wrong; it wants them to be like this:


diff --git a/gcc/configure b/gcc/configure
index 4d0357cbc28..0d818ae6850 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -20000,7 +20000,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19995 "configure"
+#line 20003 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -20106,7 +20106,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 20101 "configure"
+#line 20109 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H


I'm not sure what that means, whether a wrong version of
autoconf/automake was used (though when I accidentally tried that, it
has always complained loudly) or if some environment difference can
cause this.  Perhaps I should change the script not to care about
commits though that won't happen soon (or perhaps I should drop the
checks completely) but would people be OK with me checking in the patch
above (with appropriate ChangeLog) to silence buildbot for a while
again?

Thanks,

Martin


[PATCH] Fortran: Fix generate_error library function fnspec

2023-11-03 Thread Martin Jambor
Hi,

when developing an otherwise unrelated patch I've discovered that the
fnspec for the Fortran library function generate_error is wrong. It is
currently ". R . R " where the first R describes the first parameter
and means that it "is only read and does not escape."  The function
itself, however, with signature:

  bool
  generate_error_common (st_parameter_common *cmp, int family,
                         const char *message)

contains the following:

  /* Report status back to the compiler.  */
  cmp->flags &= ~IOPARM_LIBRETURN_MASK;

which does not correspond to the fnspec and breaks testcase
gfortran.dg/large_unit_2.f90 when my patch is applied, since it tries
to re-use the flags from before the call.

This patch replaces the "R" with "W" which stands for "specifies that
the memory pointed to by the parameter does not escape."
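
The hazard can be reproduced in miniature, independently of the Fortran
runtime.  In the following C++ sketch, params, LIBRETURN_MASK and
report_error are hypothetical stand-ins for st_parameter_common,
IOPARM_LIBRETURN_MASK and generate_error.  The callee clears bits through
its pointer argument, so a caller that keeps using the value it loaded
before the call (which is what a read-only annotation would permit) ends
up with stale flags.

```cpp
#include <cassert>

// Hypothetical stand-ins, not the libgfortran definitions.
constexpr unsigned LIBRETURN_MASK = 3u;

struct params
{
  unsigned flags;
};

// Writes through its first argument, so it must not be described as
// "only read".
void
report_error (params *p)
{
  p->flags &= ~LIBRETURN_MASK;
}

// Re-loads the flags after the call; this is the correct behaviour.
unsigned
flags_reloaded (unsigned initial)
{
  params p = { initial };
  report_error (&p);
  return p.flags;
}

// Models what a compiler may do if it believes the callee only reads *p:
// it re-uses the value loaded before the call.
unsigned
flags_stale (unsigned initial)
{
  params p = { initial };
  unsigned cached = p.flags;
  report_error (&p);
  return cached;
}
```

With an initial value that has the masked bits set, the two functions
return different results, which is the kind of discrepancy the wrong
fnspec allows.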

Bootstrapped and tested on x86_64-linux.  OK for master?


2023-11-02  Martin Jambor  

* trans-decl.cc (gfc_build_builtin_function_decls): Fix fnspec of
generate_error.

---
 gcc/fortran/trans-decl.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index a3f037bd07b..b86cfec7d49 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -3821,7 +3821,7 @@ gfc_build_builtin_function_decls (void)
void_type_node, -2, pchar_type_node, pchar_type_node);
 
   gfor_fndecl_generate_error = gfc_build_library_function_decl_with_spec (
-   get_identifier (PREFIX("generate_error")), ". R . R ",
+   get_identifier (PREFIX("generate_error")), ". W . R ",
void_type_node, 3, pvoid_type_node, integer_type_node,
pchar_type_node);
 
-- 
2.42.0



Re: [PATCH 2/3] ipa: Prune any IPA-CP aggregate constants known by modref to be killed (111157)

2023-10-30 Thread Martin Jambor
Hello,

On Thu, Oct 05 2023, Jan Hubicka wrote:
>> gcc/ChangeLog:
>> 
>> 2023-09-19  Martin Jambor  
>> 
>>  PR ipa/111157
>>  * ipa-prop.h (struct ipa_argagg_value): New flag killed.
>>  * ipa-modref.cc (ipcp_argagg_and_kill_overlap_p): New function.
>>  (update_signature): Mark any IPA-CP aggregate constants at
>>  positions known to be killed as killed.  Move check that there is
>>  clone_info after this pruning.
>>  * ipa-cp.cc (ipa_argagg_value_list::dump): Dump the killed flag.
>>  (ipa_argagg_value_list::push_adjusted_values): Clear the new flag.
>>  (push_agg_values_from_plats): Likewise.
>>  (ipa_push_agg_values_from_jfunc): Likewise.
>>  (estimate_local_effects): Likewise.
>>  (push_agg_values_for_index_from_edge): Likewise.
>>  * ipa-prop.cc (write_ipcp_transformation_info): Stream the killed
>>  flag.
>>  (read_ipcp_transformation_info): Likewise.
>>  (ipcp_get_aggregate_const): Update comment, assert that encountered
>>  record does not have killed flag set.
>>  (ipcp_transform_function): Prune all aggregate constants with killed
>>  set.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 2023-09-18  Martin Jambor  
>> 
>>  PR ipa/111157
>>  * gcc.dg/lto/pr111157_0.c: New test.
>>  * gcc.dg/lto/pr111157_1.c: Second file of the same new test.
>
>> diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
>> index c04f9f44c06..a8fcf159259 100644
>> --- a/gcc/ipa-modref.cc
>> +++ b/gcc/ipa-modref.cc
>> @@ -4065,21 +4065,71 @@ remap_kills (vec  &kills, const 
>> vec  &map)
>>i++;
>>  }
>>  
>> +/* Return true if the V can overlap with KILL.  */
>> +
>> +static bool
>> +ipcp_argagg_and_kill_overlap_p (const ipa_argagg_value &v,
>> +const modref_access_node &kill)
>> +{
>> +  if (kill.parm_index == v.index)
>> +{
>> +  gcc_assert (kill.parm_offset_known);
>> +  gcc_assert (known_eq (kill.max_size, kill.size));
>> +  poly_int64 repl_size;
>> +  bool ok = poly_int_tree_p (TYPE_SIZE (TREE_TYPE (v.value)),
>> + &repl_size);
>> +  gcc_assert (ok);
>> +  poly_int64 repl_offset (v.unit_offset);
>> +  repl_offset <<= LOG2_BITS_PER_UNIT;
>> +  poly_int64 combined_offset
>> += (kill.parm_offset << LOG2_BITS_PER_UNIT) + kill.offset;
> parm_offset may be negative which I think will confuse 
> ranges_maybe_overlap_p. 
> I think you need to test for this and if it is negative adjust
> repl_offset instead of kill.offset

After a discussion with Honza about this in person, we came to the
conclusion that the patch works as intended even in presence of negative
parm_offsets (I even have a testcase but I need to enhance IPA-CP a bit
in order for it to be useful also outside a debugger).
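
For constant offsets the reasoning can be checked with a small
stand-alone model.  This is only a sketch: ranges_overlap below is a
simplified, constant-only stand-in written for this example, not GCC's
ranges_maybe_overlap_p, and combined_offset assumes 8 bits per unit.  A
negative parm_offset simply moves the combined bit offset of the kill,
and the half-open interval test still gives the expected answer.

```cpp
#include <cassert>
#include <cstdint>

// Half-open bit ranges [pos, pos + size); positions may be negative.
bool
ranges_overlap (std::int64_t pos1, std::int64_t size1,
                std::int64_t pos2, std::int64_t size2)
{
  return pos1 < pos2 + size2 && pos2 < pos1 + size1;
}

// Combined bit offset of a kill: a unit offset relative to the parameter
// (possibly negative) plus a bit offset.  Multiplication is used instead
// of a shift so that negative values are unproblematic.
std::int64_t
combined_offset (std::int64_t parm_offset_units, std::int64_t offset_bits)
{
  return parm_offset_units * 8 + offset_bits;
}
```

For example, a 32-bit constant at bit offset 0 overlaps a 32-bit kill
with parm_offset -4 and offset 32 (combined offset 0), but not one with
parm_offset -4 and offset 0 (combined offset -32).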

>> +  if (ranges_maybe_overlap_p (repl_offset, repl_size,
>> +  combined_offset, kill.size))
>> +return true;
>> +}
>> +  return false;
>> +}
>> +
>>  /* If signature changed, update the summary.  */
>>  
>>  static void
>>  update_signature (struct cgraph_node *node)
>>  {
>> -  clone_info *info = clone_info::get (node);
>> -  if (!info || !info->param_adjustments)
>> -return;
>> -
>>modref_summary *r = optimization_summaries
>>? optimization_summaries->get (node) : NULL;
>>modref_summary_lto *r_lto = summaries_lto
>>? summaries_lto->get (node) : NULL;
>>if (!r && !r_lto)
>>  return;
>> +
>> +  ipcp_transformation *ipcp_ts = ipcp_get_transformation_summary (node);
> Please add comment on why this is necessary.
>> +  if (ipcp_ts)
>> +{
>> +for (auto &v : ipcp_ts->m_agg_values)
>> +  {
>> +if (!v.by_ref)
>> +  continue;
>> +if (r)
>> +  for (const modref_access_node &kill : r->kills)
>> +if (ipcp_argagg_and_kill_overlap_p (v, kill))
>> +  {
>> +v.killed = true;
>> +break;
>> +  }
>> +if (!v.killed && r_lto)
>> +  for (const modref_access_node &kill : r_lto->kills)
>> +if (ipcp_argagg_and_kill_overlap_p (v, kill))
>> +  {
>> +v.killed = 1;
>  = true?
>> +break;
>> +  }
>> +  }
>> +}
>> +
>> +  c

Re: Darwin: Replace environment runpath with embedded [PR88590]

2023-10-30 Thread Martin Jambor
Hello Iain,

On Tue, Aug 15 2023, FX Coudert via Gcc-patches wrote:
>

[...]

> From e1cf04cadb9fa065fb3f7d6bccf9ed6f1e9e3fc1 Mon Sep 17 00:00:00 2001
> From: Iain Sandoe 
> Date: Sun, 28 Mar 2021 14:48:17 +0100
> Subject: [PATCH 2/4] Darwin: Allow for configuring Darwin to use embedded
>  runpath.

our buildbot checker found that after this patch, there is an
uncommitted auto(re)conf generated hunk in fixincludes/configure:

diff --git a/fixincludes/configure b/fixincludes/configure
index b9770489adc..1bb547a1724 100755
--- a/fixincludes/configure
+++ b/fixincludes/configure
@@ -3027,6 +3027,7 @@ ac_configure="$SHELL $ac_aux_dir/configure"  # Please 
don't use this var.
 # ---
 # _LT_COMPILER_PIC
 
+enable_darwin_at_rpath_$1=no
 
 # _LT_LINKER_SHLIBS([TAGNAME])
 # 
@@ -3049,7 +3050,6 @@ ac_configure="$SHELL $ac_aux_dir/configure"  # Please 
don't use this var.
 # the compiler configuration to `libtool'.
 # _LT_LANG_CXX_CONFIG
 
-
 # _LT_SYS_HIDDEN_LIBDEPS([TAGNAME])
 # -
 # Figure out "hidden" library dependencies from verbose


Can I commit it (with an appropriate ChangeLog message) or do you want
to take care of it yourself?

Thanks,

Martin


>
> Recent Darwin versions place constraints on the use of run paths
> specified in environment variables.  This breaks some assumptions
> in the GCC build.
>
> This change allows the user to configure a Darwin build to use
> '@rpath/libraryname.dylib' in library names and then to add an
> embedded runpath to executables (and libraries with dependents).
>
> The embedded runpath is added by default unless the user adds
> '-nodefaultrpaths' to the link line.
>
> For an installed compiler, it means that any executable built with
> that compiler will reference the runtimes installed with the
> compiler (equivalent to hard-coding the library path into the name
> of the library).
>
> During build-time configurations  any "-B" entries will be added to
> the runpath thus the newly-built libraries will be found by exes.
>
> Since the install name is set in libtool, that decision needs to be
> available here (but might also cause dependent ones in Makefiles,
> so we need to export a conditional).
>
> This facility is not available for Darwin 8 or earlier, however the
> existing environment variable runpath does work there.
>
> We default this on for systems where the external DYLD_LIBRARY_PATH
> does not work and off for Darwin 8 or earlier.  For systems that can
> use either method, if the value is unset, we use the default (which
> is currently DYLD_LIBRARY_PATH).
>
> 
>
> Ada changes:
>  add paths relative to @loader-path
>
> JIT changes:
>
> This patch expects DARWIN_RPATH to be computed and available; which
> means that we will use @rpath or ${libdir} as the name prefix
> depending on the system version and the setting of
> --enable-darwin-at-rpath.  For branches that do not have this
> available, the value should be set to ${libdir}.
>
> added m2 library changes.
>
> ChangeLog:
>
>   * configure: Regenerate.
>   * configure.ac: Do not add default runpaths to GCC exes
>   when we are building -static-libstdc++/-static-libgcc (the
>   default).
>   * libtool.m4: Add 'enable-darwin-at-runpath'.  Act  on the
>   enable flag to alter Darwin libraries to use @rpath names.
>
> fixincludes/ChangeLog:
>
>   * configure: Regenerate.
>
> gcc/ChangeLog:
>
>   * aclocal.m4: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Handle Darwin rpaths.
>   * config/darwin-driver.cc: Handle Darwin rpaths.
>   * config/darwin.h: Handle Darwin rpaths.
>   * config/darwin.opt: Handle Darwin rpaths.
>   * Makefile.in:  Handle Darwin rpaths.
>
> gcc/ada/ChangeLog:
>
>   * gcc-interface/Makefile.in: Handle Darwin rpaths.
>
> gcc/jit/ChangeLog:
>   * Make-lang.in: Handle Darwin rpaths.
>
> libatomic/ChangeLog:
>
>   * Makefile.am: Handle Darwin rpaths.
>   * Makefile.in: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Handle Darwin rpaths.
>
> libbacktrace/ChangeLog:
>
>   * configure: Regenerate.
>   * configure.ac: Handle Darwin rpaths.
>
> libcc1/ChangeLog:
>
>   * configure: Regenerate.
>
> libffi/ChangeLog:
>
>   * Makefile.am: Handle Darwin rpaths.
>   * Makefile.in: Regenerate.
>   * configure: Regenerate.
>
> libgcc/ChangeLog:
>
>   * config/t-slibgcc-darwin: Generate libgcc_s
>   with an @rpath name.
>   * config.host: Handle Darwin rpaths.
>
> libgfortran/ChangeLog:
>
>   * Makefile.am: Handle Darwin rpaths.
>   * Makefile.in: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Handle Darwin rpaths
>
> libgm2/ChangeLog:
>
>   * Makefile.am: Handle Darwin rpaths.
>   * Makefile.in: Regenerate.
>   * aclocal.m4: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Handle Darwin rpaths.
>   * libm2cor/Makefile.

[PATCH 2/3] ipa: Prune any IPA-CP aggregate constants known by modref to be killed (111157)

2023-10-05 Thread Martin Jambor
PR 111157 shows that IPA-modref and IPA-CP (when plugged into value
numbering) can optimize out a store both before a call (because the
call will overwrite it) and in the call (because the store is of the
same value) and by eliminating both create a miscompilation.

This patch fixes that by pruning from the list of IPA-CP aggregate
value constants any constant describing memory whose contents IPA-modref
knows can be "killed."  Unfortunately, doing so is tricky.  First,
IPA-modref loads override kills and so only stores not loaded are truly
not necessary.  Looking stuff up there means doing most of what
modref_may_alias does, but doing exactly what it does is tricky
because it also takes aliasing into account and has bail-out counters.

To err on the side of caution in order to avoid this miscompilation we
have to prune a constant when in doubt.  However, pruning can
interfere with the mechanism of how clone materialization
distinguishes between the cases when a parameter was entirely removed
and when it was both IPA-CPed and IPA-SRAed (in order to make up for
the removal in debug info, which can bump into an assert when
compiling g++.dg/torture/pr103669.C when we are not careful).

Therefore this patch:

  1) marks constants that IPA-modref has in its kill list with a new
 "killed" flag, and
  2) prunes the list from entries with this flag after materialization
 and IPA-CP transformation is done using the template introduced in
 the previous patch

It does not try to look up anything in the load lists, this will be
done as a follow-up in order to ease review.
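
The two steps can be modelled outside of GCC roughly as follows.  This is
a sketch only: agg_value, kill_range, mark_killed and prune_killed are
simplified, hypothetical stand-ins for ipa_argagg_value and the code in
update_signature and ipcp_transform_function, matching on exact offsets
instead of testing ranges, and the standard erase/remove_if idiom stands
in for the template introduced in the previous patch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified stand-in for an IPA-CP aggregate constant.
struct agg_value
{
  int index;        // index of the parameter
  unsigned offset;  // offset within the aggregate
  int value;        // the known constant
  bool killed;      // set in step 1, acted upon in step 2
};

// Simplified stand-in for an entry in the modref kill list.
struct kill_range
{
  int index;
  unsigned offset;
};

// Step 1: only mark the constants, so that later passes can still see
// that a constant used to exist for the parameter.
void
mark_killed (std::vector<agg_value> &values,
             const std::vector<kill_range> &kills)
{
  for (agg_value &v : values)
    for (const kill_range &k : kills)
      if (k.index == v.index && k.offset == v.offset)
        {
          v.killed = true;
          break;
        }
}

// Step 2: once materialization is done, drop everything that was marked.
void
prune_killed (std::vector<agg_value> &values)
{
  values.erase (std::remove_if (values.begin (), values.end (),
                                [] (const agg_value &v)
                                { return v.killed; }),
                values.end ());
}
```

Keeping marking and pruning separate is what lets the intermediate code
distinguish "never had a constant" from "had a constant that must not be
used".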

gcc/ChangeLog:

2023-09-19  Martin Jambor  

PR ipa/111157
* ipa-prop.h (struct ipa_argagg_value): New flag killed.
* ipa-modref.cc (ipcp_argagg_and_kill_overlap_p): New function.
(update_signature): Mark any IPA-CP aggregate constants at
positions known to be killed as killed.  Move check that there is
clone_info after this pruning.
* ipa-cp.cc (ipa_argagg_value_list::dump): Dump the killed flag.
(ipa_argagg_value_list::push_adjusted_values): Clear the new flag.
(push_agg_values_from_plats): Likewise.
(ipa_push_agg_values_from_jfunc): Likewise.
(estimate_local_effects): Likewise.
(push_agg_values_for_index_from_edge): Likewise.
* ipa-prop.cc (write_ipcp_transformation_info): Stream the killed
flag.
(read_ipcp_transformation_info): Likewise.
(ipcp_get_aggregate_const): Update comment, assert that encountered
record does not have killed flag set.
(ipcp_transform_function): Prune all aggregate constants with killed
set.

gcc/testsuite/ChangeLog:

2023-09-18  Martin Jambor  

PR ipa/111157
* gcc.dg/lto/pr111157_0.c: New test.
* gcc.dg/lto/pr111157_1.c: Second file of the same new test.
---
 gcc/ipa-cp.cc |  8 
 gcc/ipa-modref.cc | 58 +--
 gcc/ipa-prop.cc   | 17 +++-
 gcc/ipa-prop.h|  4 ++
 gcc/testsuite/gcc.dg/lto/pr111157_0.c | 24 +++
 gcc/testsuite/gcc.dg/lto/pr111157_1.c | 10 +
 6 files changed, 115 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr111157_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr111157_1.c

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 071c607fbe8..bb49a1b2959 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -1271,6 +1271,8 @@ ipa_argagg_value_list::dump (FILE *f)
   print_generic_expr (f, av.value);
   if (av.by_ref)
fprintf (f, "(by_ref)");
+  if (av.killed)
+   fprintf (f, "(killed)");
   comma = true;
 }
   fprintf (f, "\n");
@@ -1437,6 +1439,8 @@ ipa_argagg_value_list::push_adjusted_values (unsigned 
src_index,
  new_av.unit_offset = av->unit_offset - unit_delta;
  new_av.index = dest_index;
  new_av.by_ref = av->by_ref;
+ gcc_assert (!av->killed);
+ new_av.killed = false;
 
  /* Quick check that the offsets we push are indeed increasing.  */
  gcc_assert (first
@@ -1473,6 +1477,7 @@ push_agg_values_from_plats (ipcp_param_lattices *plats, 
int dest_index,
iav.unit_offset = aglat->offset / BITS_PER_UNIT - unit_delta;
iav.index = dest_index;
iav.by_ref = plats->aggs_by_ref;
+   iav.killed = false;
 
gcc_assert (first
|| iav.unit_offset > prev_unit_offset);
@@ -2139,6 +2144,7 @@ ipa_push_agg_values_from_jfunc (ipa_node_params *info, 
cgraph_node *node,
   iav.unit_offset = item.offset / BITS_PER_UNIT;
   iav.index = dst_index;
   iav.by_ref = agg_jfunc->by_ref;
+  iav.killed = 0;
 
   gcc_assert (first
  || iav.unit_offset > prev_unit_offset);
@@ -3970,6 +3976,7 @@ estimate_local_effects (struct cgraph

[PATCH 3/3] ipa: Limit pruning of IPA-CP aggregate constants if there are loads

2023-10-05 Thread Martin Jambor
This patch makes the previous one less conservative by checking whether
there are known ipa-modref loads from areas covered by the IPA-CP
aggregate constant entry in question.  Because ipa-modref relies on
alias information which IPA-CP does not have (yet), the test is much
cruder and only reports overlapping accesses with known offsets
and max_size.
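
The resulting decision can be sketched in isolation like this.  The types
and names (mem_access, constant_entry, must_overlap, should_prune) are
hypothetical simplifications made up for this example, with plain
integers instead of poly_int64 and flat vectors instead of modref trees;
kill entries are assumed to always have known offsets.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for a modref access (a load or a kill), in bits.
struct mem_access
{
  int parm_index;
  bool offset_known;
  long offset;
  long max_size;
};

// Simplified stand-in for an IPA-CP aggregate constant, in bits.
struct constant_entry
{
  int index;
  long offset;
  long size;
};

// Return true only if A certainly touches the memory described by V.
// When in doubt (different parameter, unknown offset or size), return
// false.
bool
must_overlap (const constant_entry &v, const mem_access &a)
{
  if (a.parm_index != v.index || !a.offset_known || a.max_size < 0)
    return false;
  return v.offset < a.offset + a.max_size && a.offset < v.offset + v.size;
}

// A constant is pruned if some kill overlaps it, unless a known load
// overlaps it too.
bool
should_prune (const constant_entry &v,
              const std::vector<mem_access> &kills,
              const std::vector<mem_access> &loads)
{
  for (const mem_access &l : loads)
    if (must_overlap (v, l))
      return false;
  for (const mem_access &k : kills)
    if (must_overlap (v, k))
      return true;
  return false;
}
```

Note that a load with an unknown offset does not protect the constant
from pruning in this model, which matches the "when in doubt, return
false" convention of the overlap test.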

I was not able to put together a testcase which would fail without
this patch however.  It basically needs to be a combination of
testcases for PR 92497 (so that IPA-CP transformation phase is not
enough), PR 111157 (to get a load) and PR 103669 (to get a
clobber/kill) in a way that ipa-modref can still track things.
Therefore I am not sure if we actually want this patch.

gcc/ChangeLog:

2023-10-04  Martin Jambor  

* ipa-modref.cc (ipcp_argagg_and_access_must_overlap_p): New function.
(ipcp_argagg_and_modref_tree_must_overlap_p): Likewise.
(update_signature): Use ipcp_argagg_and_modref_tree_must_overlap_p.

Combined third step
---
 gcc/ipa-modref.cc | 65 +--
 1 file changed, 63 insertions(+), 2 deletions(-)

diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
index a8fcf159259..d2bfca3445d 100644
--- a/gcc/ipa-modref.cc
+++ b/gcc/ipa-modref.cc
@@ -4090,6 +4090,64 @@ ipcp_argagg_and_kill_overlap_p (const ipa_argagg_value 
&v,
   return false;
 }
 
+/* Return true if V overlaps with ACCESS_NODE.  When in doubt, return
+   false.  */
+
+static bool
+ipcp_argagg_and_access_must_overlap_p (const ipa_argagg_value &v,
+  const modref_access_node &access_node)
+{
+  if (access_node.parm_index == MODREF_GLOBAL_MEMORY_PARM
+  || access_node.parm_index == MODREF_UNKNOWN_PARM
+  || access_node.parm_index == MODREF_GLOBAL_MEMORY_PARM)
+  return false;
+
+  if (access_node.parm_index == v.index)
+{
+  if (!access_node.parm_offset_known)
+   return false;
+
+  poly_int64 repl_size;
+  bool ok = poly_int_tree_p (TYPE_SIZE (TREE_TYPE (v.value)),
+&repl_size);
+  gcc_assert (ok);
+  poly_int64 repl_offset (v.unit_offset);
+  repl_offset <<= LOG2_BITS_PER_UNIT;
+  poly_int64 combined_offset
+   = (access_node.parm_offset << LOG2_BITS_PER_UNIT) + access_node.offset;
+  if (ranges_maybe_overlap_p (repl_offset, repl_size,
+ combined_offset, access_node.max_size))
+   return true;
+}
+  return false;
+}
+
+/* Return true if MT contains an access that certainly overlaps with V even
+   when we cannot evaluate alias references.  When in doubt, return false.  */
+
+template <typename T>
+static bool
+ipcp_argagg_and_modref_tree_must_overlap_p (const ipa_argagg_value &v,
+   const modref_tree<T> &mt)
+{
+  for (auto base_node : mt.bases)
+{
+  if (base_node->every_ref)
+   return false;
+  for (auto ref_node : base_node->refs)
+   {
+ if (ref_node->every_access)
+   return false;
+ for (auto access_node : ref_node->accesses)
+   {
+ if (ipcp_argagg_and_access_must_overlap_p (v, access_node))
+   return true;
+   }
+   }
+}
+  return false;
+}
+
 /* If signature changed, update the summary.  */
 
 static void
@@ -4111,14 +4169,17 @@ update_signature (struct cgraph_node *node)
  continue;
if (r)
  for (const modref_access_node &kill : r->kills)
-   if (ipcp_argagg_and_kill_overlap_p (v, kill))
+   if (ipcp_argagg_and_kill_overlap_p (v, kill)
+   && !ipcp_argagg_and_modref_tree_must_overlap_p (v, *r->loads))
  {
v.killed = true;
break;
  }
if (!v.killed && r_lto)
  for (const modref_access_node &kill : r_lto->kills)
-   if (ipcp_argagg_and_kill_overlap_p (v, kill))
+   if (ipcp_argagg_and_kill_overlap_p (v, kill)
+   && !ipcp_argagg_and_modref_tree_must_overlap_p (v,
+   *r_lto->loads))
  {
v.killed = 1;
break;
-- 
2.42.0


[PATCH 1/3] ipa-cp: Templatize filtering of m_agg_values

2023-10-05 Thread Martin Jambor
PR 111157 points to another place where aggregate compile-time
constants collected by IPA-CP need to be filtered, in addition to the
one place that already does this in ipa-sra.  In order to re-use code,
this patch turns the common bit into a template.

The functionality is still covered by testcase gcc.dg/ipa/pr108959.c.
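
The shape of the common bit, stripped of GCC's own vec and GC handling,
is an in-place compaction driven by a predicate.  The following is a
stand-alone sketch on std::vector; remove_items_if is a name made up for
this example and is not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Remove all elements of ITEMS on which PREDICATE returns true, keeping
// the relative order of the remaining ones.
template <typename T, typename Pred>
void
remove_items_if (std::vector<T> &items, Pred &&predicate)
{
  std::size_t dst = 0;
  for (std::size_t i = 0; i < items.size (); i++)
    if (!predicate (items[i]))
      {
        // Only start moving elements once something has been removed.
        if (dst != i)
          items[dst] = std::move (items[i]);
        dst++;
      }
  items.erase (items.begin () + dst, items.end ());
}
```

A caller then only has to supply the condition, typically as a lambda,
for instance `remove_items_if (v, [] (int x) { return x % 2 == 0; });`
to drop all even numbers.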

gcc/ChangeLog:

2023-09-13  Martin Jambor  

PR ipa/111157
* ipa-prop.h (ipcp_transformation): New member function template
remove_argaggs_if.
* ipa-sra.cc (zap_useless_ipcp_results): Use remove_argaggs_if to
filter aggregate constants.
---
 gcc/ipa-prop.h | 33 +
 gcc/ipa-sra.cc | 33 -
 2 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 7e033d2a7b8..815855006e8 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -966,6 +966,39 @@ struct GTY(()) ipcp_transformation
 
   void maybe_create_parm_idx_map (tree fndecl);
 
+  /* Remove all elements in m_agg_values on which PREDICATE returns true.  */
+
+  template<typename pred_function>
+  void remove_argaggs_if (pred_function &&predicate)
+  {
+unsigned ts_len = vec_safe_length (m_agg_values);
+if (ts_len == 0)
+  return;
+
+bool removed_item = false;
+unsigned dst_index = 0;
+
+for (unsigned i = 0; i < ts_len; i++)
+  {
+   ipa_argagg_value *v = &(*m_agg_values)[i];
+   if (!predicate (*v))
+ {
+   if (removed_item)
+ (*m_agg_values)[dst_index] = *v;
+   dst_index++;
+ }
+   else
+ removed_item = true;
+  }
+if (dst_index == 0)
+  {
+   ggc_free (m_agg_values);
+   m_agg_values = NULL;
+  }
+else if (removed_item)
+  m_agg_values->truncate (dst_index);
+  }
+
   /* Known aggregate values.  */
   vec  *m_agg_values;
   /* Known bits information.  */
diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index edba364f56e..1551b694679 100644
--- a/gcc/ipa-sra.cc
+++ b/gcc/ipa-sra.cc
@@ -4047,35 +4047,10 @@ mark_callers_calls_comdat_local (struct cgraph_node 
*node, void *)
 static void
 zap_useless_ipcp_results (const isra_func_summary *ifs, ipcp_transformation 
*ts)
 {
-  unsigned ts_len = vec_safe_length (ts->m_agg_values);
-
-  if (ts_len == 0)
-return;
-
-  bool removed_item = false;
-  unsigned dst_index = 0;
-
-  for (unsigned i = 0; i < ts_len; i++)
-{
-  ipa_argagg_value *v = &(*ts->m_agg_values)[i];
-  const isra_param_desc *desc = &(*ifs->m_parameters)[v->index];
-
-  if (!desc->locally_unused)
-   {
- if (removed_item)
-   (*ts->m_agg_values)[dst_index] = *v;
- dst_index++;
-   }
-  else
-   removed_item = true;
-}
-  if (dst_index == 0)
-{
-  ggc_free (ts->m_agg_values);
-  ts->m_agg_values = NULL;
-}
-  else if (removed_item)
-ts->m_agg_values->truncate (dst_index);
+  ts->remove_argaggs_if ([ifs](const ipa_argagg_value &v)
+  {
+return (*ifs->m_parameters)[v.index].locally_unused;
+  });
 
   bool useful_bits = false;
   unsigned count = vec_safe_length (ts->bits);
-- 
2.42.0



[PATCH] Revert "ipa: Self-DCE of uses of removed call LHSs (PR 108007)"

2023-10-05 Thread Martin Jambor
Hello,

I am going to commit the following patch to fix PR 111688 (bootstrap on
ppc64le broken) and will re-fix 108007 (issues with IPA-SRA when user
explicitly turns off DCE) when I figure out what's going wrong.

Sorry for the breakage,

Martin



[PATCH] Revert "ipa: Self-DCE of uses of removed call LHSs (PR 108007)"

This reverts commit 1be18ea110a2d69570dbc494588a7c73173883be.

As reported in PR bootstrap/111688, it broke ppc64le bootstrap because
of a debug-compare failure.
---
 gcc/cgraph.cc   | 10 +---
 gcc/cgraph.h|  9 +--
 gcc/ipa-param-manipulation.cc   | 88 -
 gcc/ipa-param-manipulation.h|  3 +-
 gcc/testsuite/gcc.dg/ipa/pr108007.c | 32 ---
 gcc/tree-inline.cc  | 28 -
 6 files changed, 38 insertions(+), 132 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.dg/ipa/pr108007.c

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index b82367ac342..e41e5ad3ae7 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -1403,17 +1403,11 @@ cgraph_edge::redirect_callee (cgraph_node *n)
speculative indirect call, remove "speculative" of the indirect call and
also redirect stmt to it's final direct target.
 
-   When called from within tree-inline, KILLED_SSAs has to contain the pointer
-   to killed_new_ssa_names within the copy_body_data structure and SSAs
-   discovered to be useless (if LHS is removed) will be added to it, otherwise
-   it needs to be NULL.
-
It is up to caller to iteratively transform each "speculative"
direct call as appropriate.  */
 
 gimple *
-cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e,
-  hash_set  *killed_ssas)
+cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
 {
   tree decl = gimple_call_fndecl (e->call_stmt);
   gcall *new_stmt;
@@ -1533,7 +1527,7 @@ cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e,
remove_stmt_from_eh_lp (e->call_stmt);
 
   tree old_fntype = gimple_call_fntype (e->call_stmt);
-  new_stmt = padjs->modify_call (e, false, killed_ssas);
+  new_stmt = padjs->modify_call (e, false);
   cgraph_node *origin = e->callee;
   while (origin->clone_of)
origin = origin->clone_of;
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index d7162efeeb4..cedaaac3a45 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1833,16 +1833,9 @@ public:
  speculative indirect call, remove "speculative" of the indirect call and
  also redirect stmt to it's final direct target.
 
- When called from within tree-inline, KILLED_SSAs has to contain the
- pointer to killed_new_ssa_names within the copy_body_data structure and
- SSAs discovered to be useless (if LHS is removed) will be added to it,
- otherwise it needs to be NULL.
-
  It is up to caller to iteratively transform each "speculative"
  direct call as appropriate.  */
-  static gimple *redirect_call_stmt_to_callee (cgraph_edge *e,
-  hash_set <tree>
-  *killed_ssas = nullptr);
+  static gimple *redirect_call_stmt_to_callee (cgraph_edge *e);
 
   /* Create clone of edge in the node N represented
  by CALL_EXPR the callgraph.  */
diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
index 014939bf754..ae52f17b2c9 100644
--- a/gcc/ipa-param-manipulation.cc
+++ b/gcc/ipa-param-manipulation.cc
@@ -593,66 +593,14 @@ isra_get_ref_base_and_offset (tree expr, tree *base_p, 
unsigned *unit_offset_p)
   return true;
 }
 
-/* Remove all statements that use NAME and transitively those that use the
-   result of such statements.  KILLED_SSAS contains the SSA_NAMEs that are
-   already being or have been processed and new ones need to be added to it.
-   The funtction only has to process situations handled by
-   ssa_name_only_returned_p in ipa-sra.cc with the exception that it can assume
-   it must never reach a use in a return statement.  */
-
-static void
-purge_transitive_uses (tree name, hash_set <tree> *killed_ssas)
-{
-  imm_use_iterator imm_iter;
-  gimple *stmt;
-  auto_vec  worklist;
-
-  worklist.safe_push (name);
-  while (!worklist.is_empty ())
-{
-  tree cur_name = worklist.pop ();
-  FOR_EACH_IMM_USE_STMT (stmt, imm_iter, cur_name)
-   {
- if (gimple_debug_bind_p (stmt))
-   {
- /* When runing within tree-inline, we will never end up here but
-adding the SSAs to killed_ssas will do the trick in this case
-and the respective debug statements will get reset. */
- gimple_debug_bind_reset_value (stmt);
- update_stmt (stmt);
- continue;
-   }
-
- tree lhs = NULL_TREE;
- if (is_gimple_assign (stmt))
-   lhs = gimple_assign_lhs (stmt);
- else if (gimple_code (stmt) == GIMPLE_PHI)
-   lhs = gimple_phi_result (stmt);
- gcc_assert (l

[committed] ipa-modref: Fix dumping

2023-10-03 Thread Martin Jambor
Hi,

function dump_lto_records ought to dump to its parameter OUT but was
dumping expressions to dump_file.  This is corrected by this patch and
while at it, I also made the modref_summary::dump member function
const so that it is callable from more contexts.

I have committed this patch as obvious after including it in a bootstrap
and testing on x86_64-linux.

Thanks,

Martin


gcc/ChangeLog:

2023-09-21  Martin Jambor  

* ipa-modref.h (modref_summary::dump): Make const.
* ipa-modref.cc (modref_summary::dump): Likewise.
(dump_lto_records): Dump to out instead of dump_file.
---
 gcc/ipa-modref.cc | 6 +++---
 gcc/ipa-modref.h  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
index c04f9f44c06..fe55621f007 100644
--- a/gcc/ipa-modref.cc
+++ b/gcc/ipa-modref.cc
@@ -474,7 +474,7 @@ dump_lto_records (modref_records_lto *tt, FILE *out)
   FOR_EACH_VEC_SAFE_ELT (tt->bases, i, n)
 {
   fprintf (out, "  Base %i:", (int)i);
-  print_generic_expr (dump_file, n->base);
+  print_generic_expr (out, n->base);
   fprintf (out, " (alias set %i)\n",
   n->base ? get_alias_set (n->base) : 0);
   if (n->every_ref)
@@ -487,7 +487,7 @@ dump_lto_records (modref_records_lto *tt, FILE *out)
   FOR_EACH_VEC_SAFE_ELT (n->refs, j, r)
{
  fprintf (out, "Ref %i:", (int)j);
- print_generic_expr (dump_file, r->ref);
+ print_generic_expr (out, r->ref);
  fprintf (out, " (alias set %i)\n",
   r->ref ? get_alias_set (r->ref) : 0);
  if (r->every_access)
@@ -567,7 +567,7 @@ remove_modref_edge_summaries (cgraph_node *node)
 /* Dump summary.  */
 
 void
-modref_summary::dump (FILE *out)
+modref_summary::dump (FILE *out) const
 {
   if (loads)
 {
diff --git a/gcc/ipa-modref.h b/gcc/ipa-modref.h
index 2a2d31e86db..f7dedace2da 100644
--- a/gcc/ipa-modref.h
+++ b/gcc/ipa-modref.h
@@ -66,7 +66,7 @@ struct GTY(()) modref_summary
 
   modref_summary ();
   ~modref_summary ();
-  void dump (FILE *);
+  void dump (FILE *) const;
   bool useful_p (int ecf_flags, bool check_flags = true);
   void finalize (tree);
 };
-- 
2.42.0



Re: [PATCH] ipa: Self-DCE of uses of removed call LHSs (PR 108007)

2023-10-03 Thread Martin Jambor
Hello,

On Mon, Sep 25 2023, Jan Hubicka wrote:

[...]

>> >> +static void
>> >> +purge_transitive_uses (tree name, hash_set <tree> *killed_ssas)
>> >> +{
>> >> +  imm_use_iterator imm_iter;
>> >> +  gimple *stmt;
>> >> +
>> >> +  FOR_EACH_IMM_USE_STMT (stmt, imm_iter, name)
>> >> +{
>> >> +  if (gimple_debug_bind_p (stmt))
>> >> + {
>> >> +   /* When runing within tree-inline, we will never end up here but
>> >> +  adding the SSAs to killed_ssas will do the trick in this case and
>> >> +  the respective debug statements will get reset. */
>> >> +
>> >> +   gimple_debug_bind_reset_value (stmt);
>> >> +   update_stmt (stmt);
>> >> +   continue;
>> >> + }
>> >> +
>> >> +  tree lhs = NULL_TREE;
>> >> +  if (is_gimple_assign (stmt))
>> >> + lhs = gimple_assign_lhs (stmt);
>> >> +  else if (gimple_code (stmt) == GIMPLE_PHI)
>> >> + lhs = gimple_phi_result (stmt);
>> >> +  gcc_assert (lhs
>> >> +   && (TREE_CODE (lhs) == SSA_NAME)
>> >> +   && !gimple_vdef (stmt));
>> >> +
>> >> +  if (!killed_ssas->contains (lhs))
>> >> + {
>> >> +   killed_ssas->add (lhs);
>> >> +   purge_transitive_uses (lhs, killed_ssas);
>
> SSA graph may be deep so this may cause stack overflow, so I think we
> should use worklist here (it is also easy to do).
>
> OK with that change.
> Honza

I have just committed the following after a bootstrap and testing on
x86_64-linux.

Thanks,

Martin


PR 108007 is another manifestation where we rely on DCE to clean up
after IPA-SRA and if the user explicitly switches DCE off, IPA-SRA
can leave behind statements which are fed uninitialized values and
trap, even though their results are themselves never used.

I have already fixed this for unused parameters in callees; this bug
shows that almost the same thing can happen for removed returns, on
the side of callers.  This means that the issue has to be fixed
elsewhere, in call redirection.  This patch adds a function which
looks for (and through, using a work-list) uses of operations fed
specific SSA names and removes them all.
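
To illustrate the work-list approach described above, here is a minimal
sketch.  It is not the committed GCC code: the graph type use_graph, the
integer node numbering and the function name
purge_transitive_uses_sketch are all assumptions made for the example,
standing in for SSA names and their immediate uses.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy stand-in for SSA use information (NOT GCC's representation):
// users[n] lists the nodes that use the value defined by node n.
using use_graph = std::vector<std::vector<int>>;

// Add every node that transitively uses NAME to KILLED.  An explicit
// work-list replaces recursion, so deep use chains cannot overflow the
// call stack.
void
purge_transitive_uses_sketch (const use_graph &users, int name,
			      std::unordered_set<int> &killed)
{
  std::vector<int> worklist;
  worklist.push_back (name);
  while (!worklist.empty ())
    {
      int cur = worklist.back ();
      worklist.pop_back ();
      assert (cur >= 0 && static_cast<std::size_t> (cur) < users.size ());
      for (int user : users[cur])
	// insert () reports whether USER was newly added; only new nodes
	// are queued, which also terminates on cycles (e.g. through PHIs).
	if (killed.insert (user).second)
	  worklist.push_back (user);
    }
}
```

The killed set plays the role of the hash of killed SSAs: it both
records the result and prevents a node from being processed twice.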

That would have been easy if it wasn't for debug statements during
tree-inline (from which call redirection is also invoked).  Debug
statements are decoupled from the rest at this point and iterating
over uses of SSAs does not bring them up.  During tree-inline they are
handled specially at the end, I assume in order to make sure that
the relative ordering of UIDs is the same with and without debug info.

This means that during tree-inline we need to make a hash of killed
SSAs, that we already have in copy_body_data, available to the
function doing the purging.  So the patch duly does that as well, making
the interface slightly ugly.

gcc/ChangeLog:

2023-09-27  Martin Jambor  

PR ipa/108007
* cgraph.h (cgraph_edge): Add a parameter to
redirect_call_stmt_to_callee.
* ipa-param-manipulation.h (ipa_param_adjustments): Add a
parameter to modify_call.
* cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee): New
parameter killed_ssas, pass it to padjs->modify_call.
* ipa-param-manipulation.cc (purge_transitive_uses): New function.
(ipa_param_adjustments::modify_call): New parameter killed_ssas.
Instead of substituting uses, invoke purge_transitive_uses.  If
hash of killed SSAs has not been provided, create a temporary one
and release SSAs that have been added to it.
* tree-inline.cc (redirect_all_calls): Create
id->killed_new_ssa_names earlier, pass it to edge redirection,
adjust a comment.
(copy_body): Release SSAs in id->killed_new_ssa_names.

gcc/testsuite/ChangeLog:

2023-05-11  Martin Jambor  

PR ipa/108007
* gcc.dg/ipa/pr108007.c: New test.
---
 gcc/cgraph.cc   | 10 +++-
 gcc/cgraph.h|  9 ++-
 gcc/ipa-param-manipulation.cc   | 88 +
 gcc/ipa-param-manipulation.h|  3 +-
 gcc/testsuite/gcc.dg/ipa/pr108007.c | 32 +++
 gcc/tree-inline.cc  | 28 +
 6 files changed, 132 insertions(+), 38 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr108007.c

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index e41e5ad3ae7..b82367ac342 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -1403,11 +1403,17 @@ cgraph_edge::redirect_callee (cgraph_node *n)
speculative indirect call, remove "speculative" of the indirect call and
also redirect stmt to it's final direct targ

[PATCH] contrib/mklog.py: Fix issues reported by flake8

2023-10-03 Thread Martin Jambor
Hi,

the testing infrastructure built by Martin Liška includes checking a
few Python scripts in contrib with a tool called flake8.  That tool has
recently started complaining that:

  contrib/mklog.py:360:45: E711 comparison to None should be 'if cond is None:'
  contrib/mklog.py:362:1: E305 expected 2 blank lines after class or function definition, found 1

I'd like to silence these with the following, hopefully trivial,
changes.  However, I have only tested the changes by running flake8
again and running ./contrib/mklog.py --help.

Is this good for trunk?  (Or should I stop using flake8 instead?)

Thanks,

Martin


contrib/ChangeLog:

2023-10-03  Martin Jambor  

* mklog.py (skip_line_in_changelog): Compare to None using is instead
of ==, add an extra newline after the function.
---
 contrib/mklog.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/mklog.py b/contrib/mklog.py
index effe5aa1ca5..1c2c3216e9e 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -357,7 +357,8 @@ def update_copyright(data):
 
 
 def skip_line_in_changelog(line):
-return FIRST_LINE_OF_END_RE.match(line) == None
+return FIRST_LINE_OF_END_RE.match(line) is None
+
 
 if __name__ == '__main__':
 extra_args = os.getenv('GCC_MKLOG_ARGS')
-- 
2.42.0



Re: [PATCH] ipa: Self-DCE of uses of removed call LHSs (PR 108007)

2023-09-19 Thread Martin Jambor
Hello,

and ping.

Thanks,

Martin


On Fri, Sep 01 2023, Martin Jambor wrote:
> Hello
>
> and ping.
>
> Thanks,
>
> Martin
>
>
> On Fri, May 12 2023, Martin Jambor wrote:
>> Hi,
>>
>> PR 108007 is another manifestation where we rely on DCE to clean-up
>> after IPA-SRA and if the user explicitly switches DCE off, IPA-SRA
>> can leave behind statements which are fed uninitialized values and
>> trap, even though their results are themselves never used.
>>
>> I have already fixed this for unused parameters in callees, this bug
>> shows that almost the same thing can happen for removed returns, on
>> the side of callers.  This means that the issue has to be fixed
>> elsewhere, in call redirection.  This patch adds a function which
>> recursively looks for uses of operations fed specific SSA names and
>> removes them all.
>>
>> That would have been easy if it wasn't for debug statements during
>> tree-inline (from which call redirection is also invoked).  Debug
>> statements are decoupled from the rest at this point and iterating
>> over uses of SSAs does not bring them up.  During tree-inline they are
>> handled especially at the end, I assume in order to make sure that
>> relative ordering of UIDs are the same with and without debug info.
>>
>> This means that during tree-inline we need to make a hash of killed
>> SSAs, that we already have in copy_body_data, available to the
>> function making the purging.  So the patch duly does also that, making
>> the interface slightly ugly.
>>
>> Bootstrapped and tested on x86_64-linux.  OK for master?  (I am not sure
>> the problem is grave enough to warrant backporting to release branches
>> but can do that as well if people think I should.)
>>
>> Thanks,
>>
>> Martin
>>
>>
>> gcc/ChangeLog:
>>
>> 2023-05-11  Martin Jambor  
>>
>>  PR ipa/108007
>>  * cgraph.h (cgraph_edge): Add a parameter to
>>  redirect_call_stmt_to_callee.
>>  * ipa-param-manipulation.h (ipa_param_adjustments): Added a
>>  parameter to modify_call.
>>  * cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee): New
>>  parameter killed_ssas, pass it to padjs->modify_call.
>>  * ipa-param-manipulation.cc (purge_transitive_uses): New function.
>>  (ipa_param_adjustments::modify_call): New parameter killed_ssas.
>>  Instead of substituting uses, invoke purge_transitive_uses.  If
>>  hash of killed SSAs has not been provided, create a temporary one
>>  and release SSAs that have been added to it.
>>  * tree-inline.cc (redirect_all_calls): Create
>>  id->killed_new_ssa_names earlier, pass it to edge redirection,
>>  adjust a comment.
>>  (copy_body): Release SSAs in id->killed_new_ssa_names.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2023-05-11  Martin Jambor  
>>
>>  PR ipa/108007
>>  * gcc.dg/ipa/pr108007.c: New test.
>> ---
>>  gcc/cgraph.cc   | 10 +++-
>>  gcc/cgraph.h|  9 ++-
>>  gcc/ipa-param-manipulation.cc   | 85 +
>>  gcc/ipa-param-manipulation.h|  3 +-
>>  gcc/testsuite/gcc.dg/ipa/pr108007.c | 32 +++
>>  gcc/tree-inline.cc  | 28 ++
>>  6 files changed, 129 insertions(+), 38 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr108007.c
>>
>> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
>> index e8f9bec8227..5e923bf0557 100644
>> --- a/gcc/cgraph.cc
>> +++ b/gcc/cgraph.cc
>> @@ -1403,11 +1403,17 @@ cgraph_edge::redirect_callee (cgraph_node *n)
>> speculative indirect call, remove "speculative" of the indirect call and
>> also redirect stmt to it's final direct target.
>>  
>> +   When called from within tree-inline, KILLED_SSAs has to contain the 
>> pointer
>> +   to killed_new_ssa_names within the copy_body_data structure and SSAs
>> +   discovered to be useless (if LHS is removed) will be added to it, 
>> otherwise
>> +   it needs to be NULL.
>> +
>> It is up to caller to iteratively transform each "speculative"
>> direct call as appropriate.  */
>>  
>>  gimple *
>> -cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
>> +cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e,
>> +   hash_set <tree> *killed_ssas)
>>  {
>>tree decl = gimple_call_fndecl (e->call_stmt);
>>gcall *new_stmt;
>> @@ -1

Re: [PATCH] ipa-sra: Allow IPA-SRA in presence of returns which will be removed

2023-09-18 Thread Martin Jambor
Hello,

and ping.

Thanks,

Martin


On Fri, Sep 01 2023, Martin Jambor wrote:
> Hello
>
> and ping.
>
> Thanks,
>
> Martin
>
>
> On Fri, Aug 18 2023, Martin Jambor wrote:
>> Hi,
>>
>> testing on 32bit arm revealed that even the simplest case of PR 110378
>> was still not resolved there because destructors were returning this
>> pointer.  Needless to say, the return value of those destructors often
>> is just not used, which IPA-SRA can already detect in time.  Since
>> such enhancement seems generally useful, here it is.
>>
>> The patch simply adds two flags to respective summaries to mark down
>> situations when it encounters either a simple direct use of a default
>> definition SSA_NAME of a parameter, which means that the parameter may
>> still be split when return value is removed, and when any derived use
>> of it is returned, allowing for complete removal in that case, instead
>> of discarding it as a candidate for removal or splitting like we do
>> now.  The IPA phase then simply checks that we indeed plan to remove
>> the return value before allowing any transformation to be considered
>> in such cases.
>>
>> Bootstrapped, LTO-bootstrapped and tested on x86_64-linux.  OK for
>> master?
>>
>> Thanks,
>>
>> Martin
>>
>>
>> gcc/ChangeLog:
>>
>> 2023-08-18  Martin Jambor  
>>
>>  PR ipa/110378
>>  * ipa-param-manipulation.cc
>>  (ipa_param_body_adjustments::mark_dead_statements): Verify that any
>>  return uses of PARAM will be removed.
>>  (ipa_param_body_adjustments::mark_clobbers_dead): Likewise.
>>  * ipa-sra.cc (isra_param_desc): New fields
>>  remove_only_when_retval_removed and split_only_when_retval_removed.
>>  (struct gensum_param_desc): Likewise.  Fix comment long line.
>>  (ipa_sra_function_summaries::duplicate): Copy the new flags.
>>  (dump_gensum_param_descriptor): Dump the new flags.
>>  (dump_isra_param_descriptor): Likewise.
>>  (isra_track_scalar_value_uses): New parameter desc.  Set its flag
>>  remove_only_when_retval_removed when encountering a simple return.
>>  (isra_track_scalar_param_local_uses): Replace parameter call_uses_p
>>  with desc.  Pass it to isra_track_scalar_value_uses and set its
>>  call_uses.
>>  (ptr_parm_has_nonarg_uses): Accept parameter descriptor as a
>>  parameter.  If there is a direct return use, mark any..
>>  (create_parameter_descriptors): Pass the whole parameter descriptor to
>>  isra_track_scalar_param_local_uses and ptr_parm_has_nonarg_uses.
>>  (process_scan_results): Copy the new flags.
>>  (isra_write_node_summary): Stream the new flags.
>>  (isra_read_node_info): Likewise.
>>  (adjust_parameter_descriptions): Check that transformations
>>  requiring return removal only happen when return value is removed.
>>  Restructure main loop.  Adjust dump message.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2023-08-18  Martin Jambor  
>>
>>  PR ipa/110378
>>  * gcc.dg/ipa/ipa-sra-32.c: New test.
>>  * gcc.dg/ipa/pr110378-4.c: Likewise.
>>  * gcc.dg/ipa/ipa-sra-4.c: Use a return value.
>> ---
>>  gcc/ipa-param-manipulation.cc |   7 +-
>>  gcc/ipa-sra.cc| 247 +-
>>  gcc/testsuite/gcc.dg/ipa/ipa-sra-32.c |  30 
>>  gcc/testsuite/gcc.dg/ipa/ipa-sra-4.c  |   4 +-
>>  gcc/testsuite/gcc.dg/ipa/pr110378-4.c |  50 ++
>>  5 files changed, 251 insertions(+), 87 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-32.c
>>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr110378-4.c
>>
>> diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
>> index 4a185ddbdf4..ae52f17b2c9 100644
>> --- a/gcc/ipa-param-manipulation.cc
>> +++ b/gcc/ipa-param-manipulation.cc
>> @@ -1163,6 +1163,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree 
>> dead_param,
>>  stack.safe_push (lhs);
>>  }
>>  }
>> +  else if (gimple_code (stmt) == GIMPLE_RETURN)
>> +gcc_assert (m_adjustments && m_adjustments->m_skip_return);
>>else
>>  /* IPA-SRA does not analyze other types of statements.  */
>>  gcc_unreachable ();
>> @@ -1182,7 +1184,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree 
>> dead_param,
>>  }
>>  
>>  /* Put all clobbers of of dereference of default definition of PARAM into
>> -   m_dea

[PATCH] math-opts: Add dbgcounter for FMA formation

2023-09-07 Thread Martin Jambor
Hello,

This patch is a simple addition of a debug counter to FMA formation in
tree-ssa-math-opts.cc.  Given that issues with FMAs do occasionally
pop up, it seems genuinely useful.

I simply added an if right after the initial checks in
convert_mult_to_fma.  When FMA formation deferring is active (i.e. when
targeting Zen CPUs), the counter interacts with it (and at this moment
leads to producing all deferred candidates), so when using the dbg
counter to find a harmful set of FMAs, it is probably best to also set
param_avoid_fma_max_bits to zero.  I could not find a better place
which would not also make the code unnecessarily more complicated.
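
For readers unfamiliar with debug counters, the following is a
simplified model of the idea, not GCC's implementation.  The type
toy_debug_counter and the function count_formed_fmas are hypothetical
names invented for this sketch; in GCC itself the limit is supplied on
the command line through the -fdbg-cnt= option.

```cpp
#include <cassert>

// Hypothetical, simplified model of a debug counter: the guarded
// transformation is allowed for the first LIMIT queries and refused
// for every query after that.
struct toy_debug_counter
{
  unsigned limit;	// number of transformations allowed
  unsigned count;	// number of queries made so far

  explicit toy_debug_counter (unsigned l) : limit (l), count (0) {}

  bool allow () { return ++count <= limit; }
};

// Stand-in for a pass that examines CANDIDATES multiplications and
// converts each one only if the counter permits it.  Returns how many
// were converted.
unsigned
count_formed_fmas (toy_debug_counter &cnt, unsigned candidates)
{
  unsigned formed = 0;
  for (unsigned i = 0; i < candidates; i++)
    {
      if (!cnt.allow ())
	continue;	// corresponds to the early "return false" bail-out
      formed++;
    }
  assert (formed <= cnt.limit);
  return formed;
}
```

Bisecting on the limit then narrows down which individual
transformation introduces a problem.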

Bootstrapped and tested on x86_64-linux.  OK for master?

Thanks,

Martin



gcc/ChangeLog:

2023-09-06  Martin Jambor  

* dbgcnt.def (form_fma): New.
* tree-ssa-math-opts.cc: Include dbgcnt.h.
(convert_mult_to_fma): Bail out if the debug counter says so.
---
 gcc/dbgcnt.def| 1 +
 gcc/tree-ssa-math-opts.cc | 4 
 2 files changed, 5 insertions(+)

diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 9e2f1d857b4..871cbf75d93 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -162,6 +162,7 @@ DEBUG_COUNTER (dom_unreachable_edges)
 DEBUG_COUNTER (dse)
 DEBUG_COUNTER (dse1)
 DEBUG_COUNTER (dse2)
+DEBUG_COUNTER (form_fma)
 DEBUG_COUNTER (gcse2_delete)
 DEBUG_COUNTER (gimple_unroll)
 DEBUG_COUNTER (global_alloc_at_func)
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 95c22694368..3db69ad5733 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -116,6 +116,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "targhooks.h"
 #include "domwalk.h"
 #include "tree-ssa-math-opts.h"
+#include "dbgcnt.h"
 
 /* This structure represents one basic block that either computes a
division, or is a common dominator for basic block that compute a
@@ -3366,6 +3367,9 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2,
   && !has_single_use (mul_result))
 return false;
 
+  if (!dbg_cnt (form_fma))
+return false;
+
   /* Make sure that the multiplication statement becomes dead after
  the transformation, thus that all uses are transformed to FMAs.
  This means we assume that an FMA operation has the same cost
-- 
2.41.0



Re: [PATCH] ipa: Self-DCE of uses of removed call LHSs (PR 108007)

2023-09-01 Thread Martin Jambor
Hello

and ping.

Thanks,

Martin


On Fri, May 12 2023, Martin Jambor wrote:
> Hi,
>
> PR 108007 is another manifestation where we rely on DCE to clean-up
> after IPA-SRA and if the user explicitly switches DCE off, IPA-SRA
> can leave behind statements which are fed uninitialized values and
> trap, even though their results are themselves never used.
>
> I have already fixed this for unused parameters in callees, this bug
> shows that almost the same thing can happen for removed returns, on
> the side of callers.  This means that the issue has to be fixed
> elsewhere, in call redirection.  This patch adds a function which
> recursively looks for uses of operations fed specific SSA names and
> removes them all.
>
> That would have been easy if it wasn't for debug statements during
> tree-inline (from which call redirection is also invoked).  Debug
> statements are decoupled from the rest at this point and iterating
> over uses of SSAs does not bring them up.  During tree-inline they are
> handled especially at the end, I assume in order to make sure that
> relative ordering of UIDs are the same with and without debug info.
>
> This means that during tree-inline we need to make a hash of killed
> SSAs, that we already have in copy_body_data, available to the
> function making the purging.  So the patch duly does also that, making
> the interface slightly ugly.
>
> Bootstrapped and tested on x86_64-linux.  OK for master?  (I am not sure
> the problem is grave enough to warrant backporting to release branches
> but can do that as well if people think I should.)
>
> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2023-05-11  Martin Jambor  
>
>   PR ipa/108007
>   * cgraph.h (cgraph_edge): Add a parameter to
>   redirect_call_stmt_to_callee.
>   * ipa-param-manipulation.h (ipa_param_adjustments): Added a
>   parameter to modify_call.
>   * cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee): New
>   parameter killed_ssas, pass it to padjs->modify_call.
>   * ipa-param-manipulation.cc (purge_transitive_uses): New function.
>   (ipa_param_adjustments::modify_call): New parameter killed_ssas.
>   Instead of substituting uses, invoke purge_transitive_uses.  If
>   hash of killed SSAs has not been provided, create a temporary one
>   and release SSAs that have been added to it.
>   * tree-inline.cc (redirect_all_calls): Create
>   id->killed_new_ssa_names earlier, pass it to edge redirection,
>   adjust a comment.
>   (copy_body): Release SSAs in id->killed_new_ssa_names.
>
> gcc/testsuite/ChangeLog:
>
> 2023-05-11  Martin Jambor  
>
>   PR ipa/108007
>   * gcc.dg/ipa/pr108007.c: New test.
> ---
>  gcc/cgraph.cc   | 10 +++-
>  gcc/cgraph.h|  9 ++-
>  gcc/ipa-param-manipulation.cc   | 85 +
>  gcc/ipa-param-manipulation.h|  3 +-
>  gcc/testsuite/gcc.dg/ipa/pr108007.c | 32 +++
>  gcc/tree-inline.cc  | 28 ++
>  6 files changed, 129 insertions(+), 38 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr108007.c
>
> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
> index e8f9bec8227..5e923bf0557 100644
> --- a/gcc/cgraph.cc
> +++ b/gcc/cgraph.cc
> @@ -1403,11 +1403,17 @@ cgraph_edge::redirect_callee (cgraph_node *n)
> speculative indirect call, remove "speculative" of the indirect call and
> also redirect stmt to it's final direct target.
>  
> +   When called from within tree-inline, KILLED_SSAs has to contain the 
> pointer
> +   to killed_new_ssa_names within the copy_body_data structure and SSAs
> +   discovered to be useless (if LHS is removed) will be added to it, 
> otherwise
> +   it needs to be NULL.
> +
> It is up to caller to iteratively transform each "speculative"
> direct call as appropriate.  */
>  
>  gimple *
> -cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
> +cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e,
> +hash_set <tree> *killed_ssas)
>  {
>tree decl = gimple_call_fndecl (e->call_stmt);
>gcall *new_stmt;
> @@ -1527,7 +1533,7 @@ cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge 
> *e)
>   remove_stmt_from_eh_lp (e->call_stmt);
>  
>tree old_fntype = gimple_call_fntype (e->call_stmt);
> -  new_stmt = padjs->modify_call (e, false);
> +  new_stmt = padjs->modify_call (e, false, killed_ssas);
>cgraph_node *origin = e->callee;
>while (origin->clone_of)
>   origin = origin->clone_of;
> diff --git a/gcc/cgraph.h

Re: [PATCH] ipa-sra: Allow IPA-SRA in presence of returns which will be removed

2023-09-01 Thread Martin Jambor
Hello

and ping.

Thanks,

Martin


On Fri, Aug 18 2023, Martin Jambor wrote:
> Hi,
>
> testing on 32bit arm revealed that even the simplest case of PR 110378
> was still not resolved there because destructors were returning this
> pointer.  Needless to say, the return value of those destructors often
> is just not used, which IPA-SRA can already detect in time.  Since
> such enhancement seems generally useful, here it is.
>
> The patch simply adds two flags to respective summaries to mark down
> situations when it encounters either a simple direct use of a default
> definition SSA_NAME of a parameter, which means that the parameter may
> still be split when return value is removed, and when any derived use
> of it is returned, allowing for complete removal in that case, instead
> of discarding it as a candidate for removal or splitting like we do
> now.  The IPA phase then simply checks that we indeed plan to remove
> the return value before allowing any transformation to be considered
> in such cases.
>
> Bootstrapped, LTO-bootstrapped and tested on x86_64-linux.  OK for
> master?
>
> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2023-08-18  Martin Jambor  
>
>   PR ipa/110378
>   * ipa-param-manipulation.cc
>   (ipa_param_body_adjustments::mark_dead_statements): Verify that any
>   return uses of PARAM will be removed.
>   (ipa_param_body_adjustments::mark_clobbers_dead): Likewise.
>   * ipa-sra.cc (isra_param_desc): New fields
>   remove_only_when_retval_removed and split_only_when_retval_removed.
>   (struct gensum_param_desc): Likewise.  Fix comment long line.
>   (ipa_sra_function_summaries::duplicate): Copy the new flags.
>   (dump_gensum_param_descriptor): Dump the new flags.
>   (dump_isra_param_descriptor): Likewise.
>   (isra_track_scalar_value_uses): New parameter desc.  Set its flag
>   remove_only_when_retval_removed when encountering a simple return.
>   (isra_track_scalar_param_local_uses): Replace parameter call_uses_p
>   with desc.  Pass it to isra_track_scalar_value_uses and set its
>   call_uses.
>   (ptr_parm_has_nonarg_uses): Accept parameter descriptor as a
>   parameter.  If there is a direct return use, mark any..
>   (create_parameter_descriptors): Pass the whole parameter descriptor to
>   isra_track_scalar_param_local_uses and ptr_parm_has_nonarg_uses.
>   (process_scan_results): Copy the new flags.
>   (isra_write_node_summary): Stream the new flags.
>   (isra_read_node_info): Likewise.
>   (adjust_parameter_descriptions): Check that transformations
>   requiring return removal only happen when return value is removed.
>   Restructure main loop.  Adjust dump message.
>
> gcc/testsuite/ChangeLog:
>
> 2023-08-18  Martin Jambor  
>
>   PR ipa/110378
>   * gcc.dg/ipa/ipa-sra-32.c: New test.
>   * gcc.dg/ipa/pr110378-4.c: Likewise.
>   * gcc.dg/ipa/ipa-sra-4.c: Use a return value.
> ---
>  gcc/ipa-param-manipulation.cc |   7 +-
>  gcc/ipa-sra.cc| 247 +-
>  gcc/testsuite/gcc.dg/ipa/ipa-sra-32.c |  30 
>  gcc/testsuite/gcc.dg/ipa/ipa-sra-4.c  |   4 +-
>  gcc/testsuite/gcc.dg/ipa/pr110378-4.c |  50 ++
>  5 files changed, 251 insertions(+), 87 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-32.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr110378-4.c
>
> diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
> index 4a185ddbdf4..ae52f17b2c9 100644
> --- a/gcc/ipa-param-manipulation.cc
> +++ b/gcc/ipa-param-manipulation.cc
> @@ -1163,6 +1163,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree 
> dead_param,
>   stack.safe_push (lhs);
>   }
>   }
> +   else if (gimple_code (stmt) == GIMPLE_RETURN)
> + gcc_assert (m_adjustments && m_adjustments->m_skip_return);
> else
>   /* IPA-SRA does not analyze other types of statements.  */
>   gcc_unreachable ();
> @@ -1182,7 +1184,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree 
> dead_param,
>  }
>  
>  /* Put all clobbers of of dereference of default definition of PARAM into
> -   m_dead_stmts.  */
> +   m_dead_stmts.  If there are returns among uses of the default definition 
> of
> +   PARAM, verify they will be stripped off the return value.  */
>  
>  void
>  ipa_param_body_adjustments::mark_clobbers_dead (tree param)
> @@ -1200,6 +1203,8 @@ ipa_param_body_adjustments::mark_clobbers_dead (tree 
> param)
>   gimple *stmt = USE_STMT (use_p);
>   if (gimple_clobber_p (stmt))
> m_d

[PATCH] ipa-sra: Allow IPA-SRA in presence of returns which will be removed

2023-08-18 Thread Martin Jambor
Hi,

testing on 32bit arm revealed that even the simplest case of PR 110378
was still not resolved there because destructors were returning this
pointer.  Needless to say, the return value of those destructors often
is just not used, which IPA-SRA can already detect in time.  Since
such enhancement seems generally useful, here it is.

The patch simply adds two flags to the respective summaries to mark down
situations when it encounters either a simple direct use of a default
definition SSA_NAME of a parameter, which means that the parameter may
still be split when return value is removed, and when any derived use
of it is returned, allowing for complete removal in that case, instead
of discarding it as a candidate for removal or splitting like we do
now.  The IPA phase then simply checks that we indeed plan to remove
the return value before allowing any transformation to be considered
in such cases.
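
As a source-level illustration of the situation described above, here
is a small sketch.  The names widget, note_widget, note_twice and
total_ids are hypothetical and not taken from the PR or the testsuite,
and whether GCC actually transforms this particular code depends on the
version, target and options used.

```cpp
#include <cassert>

struct widget
{
  int id;
  int refs;
};

static int total_ids = 0;

// Returns its own pointer parameter, much like the destructors
// returning the this pointer mentioned above.  Only W->id is read.
__attribute__ ((noinline)) static widget *
note_widget (widget *w)
{
  total_ids += w->id;
  return w;
}

// Every caller ignores the returned pointer, so the return value is a
// candidate for removal; once it is gone, the only remaining use of W
// is the load of W->id.
int
note_twice (widget *w)
{
  assert (w != nullptr);
  note_widget (w);
  note_widget (w);
  return total_ids;
}
```

The point of the example is only that the return of w must not by
itself disqualify the parameter, provided the return value is going to
be removed anyway.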

Bootstrapped, LTO-bootstrapped and tested on x86_64-linux.  OK for
master?

Thanks,

Martin


gcc/ChangeLog:

2023-08-18  Martin Jambor  

PR ipa/110378
* ipa-param-manipulation.cc
(ipa_param_body_adjustments::mark_dead_statements): Verify that any
return uses of PARAM will be removed.
(ipa_param_body_adjustments::mark_clobbers_dead): Likewise.
* ipa-sra.cc (isra_param_desc): New fields
remove_only_when_retval_removed and split_only_when_retval_removed.
(struct gensum_param_desc): Likewise.  Fix comment long line.
(ipa_sra_function_summaries::duplicate): Copy the new flags.
(dump_gensum_param_descriptor): Dump the new flags.
(dump_isra_param_descriptor): Likewise.
(isra_track_scalar_value_uses): New parameter desc.  Set its flag
remove_only_when_retval_removed when encountering a simple return.
(isra_track_scalar_param_local_uses): Replace parameter call_uses_p
with desc.  Pass it to isra_track_scalar_value_uses and set its
call_uses.
(ptr_parm_has_nonarg_uses): Accept parameter descriptor as a
parameter.  If there is a direct return use, mark any..
(create_parameter_descriptors): Pass the whole parameter descriptor to
isra_track_scalar_param_local_uses and ptr_parm_has_nonarg_uses.
(process_scan_results): Copy the new flags.
(isra_write_node_summary): Stream the new flags.
(isra_read_node_info): Likewise.
(adjust_parameter_descriptions): Check that transformations
requiring return removal only happen when return value is removed.
Restructure main loop.  Adjust dump message.

gcc/testsuite/ChangeLog:

2023-08-18  Martin Jambor  

PR ipa/110378
* gcc.dg/ipa/ipa-sra-32.c: New test.
* gcc.dg/ipa/pr110378-4.c: Likewise.
* gcc.dg/ipa/ipa-sra-4.c: Use a return value.
---
 gcc/ipa-param-manipulation.cc |   7 +-
 gcc/ipa-sra.cc| 247 +-
 gcc/testsuite/gcc.dg/ipa/ipa-sra-32.c |  30 
 gcc/testsuite/gcc.dg/ipa/ipa-sra-4.c  |   4 +-
 gcc/testsuite/gcc.dg/ipa/pr110378-4.c |  50 ++
 5 files changed, 251 insertions(+), 87 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-32.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr110378-4.c

diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
index 4a185ddbdf4..ae52f17b2c9 100644
--- a/gcc/ipa-param-manipulation.cc
+++ b/gcc/ipa-param-manipulation.cc
@@ -1163,6 +1163,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree 
dead_param,
stack.safe_push (lhs);
}
}
+ else if (gimple_code (stmt) == GIMPLE_RETURN)
+   gcc_assert (m_adjustments && m_adjustments->m_skip_return);
  else
/* IPA-SRA does not analyze other types of statements.  */
gcc_unreachable ();
@@ -1182,7 +1184,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree 
dead_param,
 }
 
 /* Put all clobbers of of dereference of default definition of PARAM into
-   m_dead_stmts.  */
+   m_dead_stmts.  If there are returns among uses of the default definition of
+   PARAM, verify they will be stripped off the return value.  */
 
 void
 ipa_param_body_adjustments::mark_clobbers_dead (tree param)
@@ -1200,6 +1203,8 @@ ipa_param_body_adjustments::mark_clobbers_dead (tree 
param)
  gimple *stmt = USE_STMT (use_p);
  if (gimple_clobber_p (stmt))
m_dead_stmts.add (stmt);
+ else if (gimple_code (stmt) == GIMPLE_RETURN)
+   gcc_assert (m_adjustments && m_adjustments->m_skip_return);
}
 }
 
diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index edba364f56e..817f29ea62f 100644
--- a/gcc/ipa-sra.cc
+++ b/gcc/ipa-sra.cc
@@ -185,6 +185,13 @@ struct GTY(()) isra_param_desc
   unsigned split_candidate : 1;
   /* Is this a parameter passing stuff by reference?  */
   unsigned by_ref : 1;
+  /* If set, this parameter can only be a candidate for removal if the func

Re: [PATCH] Fortran: Avoid accessing gfc_charlen when not looking at BT_CHARACTER (PR 110677)

2023-08-15 Thread Martin Jambor
Hello,

On Mon, Aug 14 2023, Harald Anlauf via Gcc-patches wrote:
> Hi Martin,
>
> Am 14.08.23 um 19:39 schrieb Martin Jambor:
>> Hello,
>> 
>> this patch addresses an issue uncovered by the undefined behavior
>> sanitizer.  In function resolve_structure_cons in resolve.cc there is
>> a test starting with:
>> 
>>if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl
>>&& comp->ts.u.cl->length
>>&& comp->ts.u.cl->length->expr_type == EXPR_CONSTANT
>> 
>> and UBSAN complained of loads from comp->ts.u.cl->length->expr_type of
>> integer value 1818451807, which is outside of the value range of the
>> expr_t enum.  If I understand the code correctly, the entire load was
>> unwanted because comp->ts.type in those cases is BT_CLASS and not
>> BT_CHARACTER.  This patch simply adds a check to make sure it is only
>> accessed when comp->ts.type is BT_CHARACTER.
>> 
>> I have verified that the UBSAN failure goes away with this patch; it
>> also passes bootstrap and testing on x86_64-linux.  OK for master?
>
> this looks good to me.
>
> Looking at that code block, there is a potential other UB a few lines
> below, where (hopefully integer) string lengths are to be passed to
> mpz_cmp.
>
> If the string length is ill-defined (e.g. non-integer), value.integer
> is undefined.  We've seen this elsewhere, where on BE platforms that
> undefined value was interpreted as some large integer and giving
> failures on those platforms.  One could similarly add the following
> checks here (on top of your patch):

Thank you very much for the approval and the improvement.  I have
committed the following (after another round of testing).

Martin



Fortran: Avoid accessing gfc_charlen when not looking at BT_CHARACTER (PR 110677)

This patch addresses an issue uncovered by the undefined behavior
sanitizer.  In function resolve_structure_cons in resolve.cc there is
a test starting with:

  if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl
  && comp->ts.u.cl->length
  && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT

and UBSAN complained of loads from comp->ts.u.cl->length->expr_type of
integer value 1818451807, which is outside of the value range of the
expr_t enum.  If I understand the code correctly, the entire load was
unwanted because comp->ts.type in those cases is BT_CLASS and not
BT_CHARACTER.  This patch simply adds a check to make sure it is only
accessed when comp->ts.type is BT_CHARACTER.

During review, Harald Anlauf noticed that length types also need to be
checked and so I also added the checks that he suggested to the condition.
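
The underlying pattern, testing the type tag before reading a union
member that is only valid for that tag, can be sketched with a toy
model.  The names toy_bt, toy_charlen, toy_derived, toy_typespec and
toy_has_charlen are hypothetical stand-ins and not the gfortran
definitions.

```cpp
// Toy model only: simplified stand-ins for gfortran's typespec structures.
enum toy_bt { TOY_BT_INTEGER, TOY_BT_CHARACTER, TOY_BT_CLASS };

struct toy_charlen { long length; };
struct toy_derived { int id; };

struct toy_typespec
{
  toy_bt type;
  union
  {
    toy_charlen *cl;        // only meaningful when type == TOY_BT_CHARACTER
    toy_derived *derived;   // only meaningful when type == TOY_BT_CLASS
  } u;
};

// Return true only if TS is a character type with an attached length record.
// The tag test comes first, so u.cl is never read for other types.
inline bool
toy_has_charlen (const toy_typespec &ts)
{
  return ts.type == TOY_BT_CHARACTER && ts.u.cl != nullptr;
}
```

Because && short-circuits, a typespec tagged as a class never has its
union interpreted as a character length pointer, which is the kind of
access the sanitizer flagged.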

Co-authored-by: Harald Anlauf 

gcc/fortran/ChangeLog:

2023-08-14  Martin Jambor  

PR fortran/110677
* resolve.cc (resolve_structure_cons): Check comp->ts is character
type before accessing stuff through comp->ts.u.cl.
---
 gcc/fortran/resolve.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index e7c8d919bef..f51674f7faa 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -1396,11 +1396,14 @@ resolve_structure_cons (gfc_expr *expr, int init)
 the one of the structure, ensure this if the lengths are known at
 compile time and when we are dealing with PARAMETER or structure
 constructors.  */
-  if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl
- && comp->ts.u.cl->length
+  if (cons->expr->ts.type == BT_CHARACTER
+ && comp->ts.type == BT_CHARACTER
+ && comp->ts.u.cl && comp->ts.u.cl->length
  && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT
  && cons->expr->ts.u.cl && cons->expr->ts.u.cl->length
  && cons->expr->ts.u.cl->length->expr_type == EXPR_CONSTANT
+ && cons->expr->ts.u.cl->length->ts.type == BT_INTEGER
+ && comp->ts.u.cl->length->ts.type == BT_INTEGER
  && mpz_cmp (cons->expr->ts.u.cl->length->value.integer,
  comp->ts.u.cl->length->value.integer) != 0)
{
-- 
2.41.0



[PATCH] Fortran: Avoid accessing gfc_charlen when not looking at BT_CHARACTER (PR 110677)

2023-08-14 Thread Martin Jambor
Hello,

this patch addresses an issue uncovered by the undefined behavior
sanitizer.  In function resolve_structure_cons in resolve.cc there is
a test starting with:

  if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl
  && comp->ts.u.cl->length
  && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT

and UBSAN complained of loads from comp->ts.u.cl->length->expr_type of
integer value 1818451807, which is outside of the value range of the
expr_t enum.  If I understand the code correctly, the entire load was
unwanted because comp->ts.type in those cases is BT_CLASS and not
BT_CHARACTER.  This patch simply adds a check to make sure it is only
accessed when comp->ts.type is BT_CHARACTER.

I have verified that the UBSAN failure goes away with this patch; it
also passes bootstrap and testing on x86_64-linux.  OK for master?

Thanks,

Martin



gcc/fortran/ChangeLog:

2023-08-14  Martin Jambor  

PR fortran/110677
* resolve.cc (resolve_structure_cons): Check comp->ts is character
type before accessing stuff through comp->ts.u.cl.
---
 gcc/fortran/resolve.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index e7c8d919bef..5b4dfc5fcd2 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -1396,8 +1396,9 @@ resolve_structure_cons (gfc_expr *expr, int init)
 the one of the structure, ensure this if the lengths are known at
 compile time and when we are dealing with PARAMETER or structure
 constructors.  */
-  if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl
- && comp->ts.u.cl->length
+  if (cons->expr->ts.type == BT_CHARACTER
+ && comp->ts.type == BT_CHARACTER
+ && comp->ts.u.cl && comp->ts.u.cl->length
  && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT
  && cons->expr->ts.u.cl && cons->expr->ts.u.cl->length
  && cons->expr->ts.u.cl->length->expr_type == EXPR_CONSTANT
-- 
2.41.0



Re: [PATCH 2/2] ipa-cp: Feed results of IPA-CP into value numbering

2023-08-13 Thread Martin Jambor
Hello Richi,

it took me quite some time to get back to this, but that might have
actually helped because it forced me to re-read the surrounding code and
in turn simplify the patch.

On Mon, Jun 12 2023, Richard Biener wrote:
> On Fri, 9 Jun 2023, Martin Jambor wrote:
>

[...]

>> @@ -2327,7 +2330,7 @@ vn_walk_cb_data::push_partial_def (pd_data pd,
>> with the current VUSE and performs the expression lookup.  */
>>  
>>  static void *
>> -vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, tree vuse, void *data_)
>> +vn_reference_lookup_2 (ao_ref *op, tree vuse, void *data_)
>>  {
>>vn_walk_cb_data *data = (vn_walk_cb_data *)data_;
>>vn_reference_t vr = data->vr;
>> @@ -2361,6 +2364,38 @@ vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, 
>> tree vuse, void *data_)
>>return *slot;
>>  }
>>  
>> +  if (SSA_NAME_IS_DEFAULT_DEF (vuse)
>  && data->partial_defs.is_empty ())
>
> ^^ do this check early

The check is actually done right at the beginning of the function
already so I simply removed it.

>
>> +{
>> +  HOST_WIDE_INT offset, size;
>> +  tree v = NULL_TREE;
>  tree base = ao_ref_base (op);
>  if ((TREE_CODE (base) == PARM_DECL
>   || TREE_CODE (base) == MEM_REF)
>
> handle both kind of bases with ...
>
>> +  && op->offset.is_constant (&offset)
>> +  && op->size.is_constant (&size)
>> +  && op->max_size_known_p ()
>> +  && known_eq (op->size, op->max_size))
>
> ^^^ this preconditions (would have been missing in the MEM_REF branch
> before)

I missed that the call to ao_ref_base fills in these fields, and that in
the pointer case they are not filled in without it.  I hope the patch
below is the simplified version you wanted.

The patch passed bootstrap and testing and also LTO bootstrap on
x86_64-linux.

Thanks,

Martin



PRs 68930 and 92497 show that when IPA-CP figures out constants in
aggregate parameters or in memory passed by reference, but the loads
happen in an inlined function, the information is lost.  This happens
even when the inlined function itself was known to have (or was even
cloned to have) such constants in incoming parameters, because the
transform phase of IPA passes is not run on inlined functions.  See the
discussion in the bugs for reasons why.

Honza suggested that we can plug the results of IPA-CP analysis into
value numbering, so that FRE can figure out that some loads fetch
known constants.  This is what this patch attempts to do.  The patch
does not attempt to populate partial_defs with information from
IPA-CP; this can hopefully be added as a follow-up.
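
For illustration, code of roughly the following shape is what the
description above has in mind.  This is a hypothetical example and not
one of the new testcases; the names config, apply, compute and entry are
made up.  Whether a particular GCC version and set of options actually
folds the loads is not something this sketch claims; it only shows where
the known values and the loads sit.

```cpp
// Hypothetical example; this is not one of the new testcases.
struct config
{
  int scale;
  int bias;
};

// Small helper that is likely to be inlined; the loads from *c happen here.
static inline int
apply (const config *c, int x)
{
  return x * c->scale + c->bias;
}

// Kept out of line so that the aggregate is passed by reference across a
// call.  The attribute is a GCC/Clang extension.
static int __attribute__ ((noinline))
compute (const config *c, int x)
{
  return apply (c, x);
}

// The only caller passes an aggregate with known contents.
int
entry (int x)
{
  config c = { 4, 1 };
  return compute (&c, x);
}
```

Here the values stored in c are known at the only call site of compute,
while the loads from the aggregate are performed in the body of apply.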

gcc/ChangeLog:

2023-08-11  Martin Jambor  

PR ipa/68930
PR ipa/92497
* ipa-prop.h (ipcp_get_aggregate_const): Declare.
* ipa-prop.cc (ipcp_get_aggregate_const): New function.
(ipcp_transform_function): Do not deallocate transformation info.
* tree-ssa-sccvn.cc: Include alloc-pool.h, symbol-summary.h and
ipa-prop.h.
(vn_reference_lookup_2): When hitting default-def vuse, query
IPA-CP transformation info for any known constants.

gcc/testsuite/ChangeLog:

2023-06-07  Martin Jambor  

PR ipa/68930
PR ipa/92497
* gcc.dg/ipa/pr92497-1.c: New test.
* gcc.dg/ipa/pr92497-2.c: Likewise.
---
 gcc/ipa-prop.cc  | 33 +++
 gcc/ipa-prop.h   |  3 +++
 gcc/testsuite/gcc.dg/ipa/pr92497-1.c | 26 +
 gcc/testsuite/gcc.dg/ipa/pr92497-2.c | 26 +
 gcc/tree-ssa-sccvn.cc| 34 +++-
 5 files changed, 116 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr92497-1.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr92497-2.c

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index 4f6ed7b89bd..9efaa5cb848 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -5760,6 +5760,34 @@ ipcp_modif_dom_walker::before_dom_children (basic_block 
bb)
   return NULL;
 }
 
+/* If IPA-CP discovered a constant in parameter PARM at OFFSET of a given SIZE
+   - whether passed by reference or not is given by BY_REF - return that
+   constant.  Otherwise return NULL_TREE.  */
+
+tree
+ipcp_get_aggregate_const (struct function *func, tree parm, bool by_ref,
+ HOST_WIDE_INT bit_offset, HOST_WIDE_INT bit_size)
+{
+  cgraph_node *node = cgraph_node::get (func->decl);
+  ipcp_transformation *ts = ipcp_get_transformation_summary (node);
+
+  if (!ts || !ts->m_agg_values)
+return NULL_TREE;
+
+  int index = ts->get_param_index (func->decl, parm);
+  if (index < 0)
+return NULL_TREE;
+
+  ipa_argagg_value_list avl (ts);
+  unsigned unit_offset = bit_offset / BITS_PER_UNIT;
+  tree

Re: [PATCH] ipa-sra: Don't consider CLOBBERS as writes preventing splitting

2023-08-11 Thread Martin Jambor
Hello,

On Fri, Aug 11 2023, Christophe Lyon wrote:
> Hi Martin,
>
>
> On Fri, 4 Aug 2023 at 18:26, Martin Jambor  wrote:
>
>> Hello,
>>
>> On Wed, Aug 02 2023, Richard Biener wrote:
>> > On Mon, Jul 31, 2023 at 7:05 PM Martin Jambor  wrote:
>> >>
>> >> Hi,
>> >>
>> >> when IPA-SRA detects whether a parameter passed by reference is
>> >> written to, it does not special case CLOBBERs which means it often
>> >> bails out unnecessarily, especially when dealing with C++ destructors.
>> >> Fixed by the obvious continue in the two relevant loops.
>> >>
>> >> The (slightly) more complex testcases in the PR need surprisingly more
>> >> effort but the simple one can be fixed now easily by this patch and I'll
>> >> work on the others incrementally.
>> >>
>> >> Bootstrapped and currently undergoing testsuite run on x86_64-linux.  OK
>> >> if it passes too?
>> >
>> > LGTM, btw - how are the clobbers handled during transform?
>>
>> it turns out your question is spot on.  I assumed that the mini-DCE that
>> I implemented into IPA-SRA transform would delete them but I had a closer
>> look and it is not invoked on split parameters, only on removed ones.
>> What was actually happening is that the parameter got remapped to a
>> default definition of a replacement VAR_DECL and we were thus
>> gimple-clobbering a pointer pointing to nowhere.  The clobber then got
>> DSEd and so I originally did not notice looking at the optimized dump.
>>
>> Still that is of course not ideal and so I added a simple function
>> removing clobbers when splitting.  I was considering adding that
>> functionality to ipa_param_body_adjustments::mark_dead_statements but
>> that would make the function harder to read without much gain.
>>
>> So thanks again for the remark.  The following passes bootstrap and
>> testing on x86_64-linux.  I am running LTO bootstrap now.  OK if it
>> passes?
>>
>> Martin
>>
>>
>>
>> When IPA-SRA detects whether a parameter passed by reference is
>> written to, it does not special case CLOBBERs which means it often
>> bails out unnecessarily, especially when dealing with C++ destructors.
>> Fixed by the obvious continue in the two relevant loops and by adding
>> a simple function that marks the clobbers in the transformation code
>> as statements to be removed.
>>
>>
> Not sure if you noticed: I updated bugzilla because the new test fails on
> arm, and I attached  pr110378-1.C.083i.sra there, to help you debug.
>

I am aware and have actually started looking at the issue a while ago.
Sorry, I'm only slowly making my way through my TODO list.

The difference on 32-bit ARM is that the destructor returns the this
pointer, which means that IPA-SRA cannot just split out the loaded bit,
at least not without a follow-up IPA analysis showing that the return
value is unused, which it currently does not take into account.  But now
that we remove useless returns before splitting, it should be doable.

Meanwhile, is there a dejagnu target macro for architectures with
destructors returning a value so that we could xfail the test there?

Thanks for bringing my attention to this.

Martin



> Thanks,
>
> Christophe
>
> gcc/ChangeLog:
>>
>> 2023-08-04  Martin Jambor  
>>
>> PR ipa/110378
>> * ipa-param-manipulation.h (class ipa_param_body_adjustments): New
>> members get_ddef_if_exists_and_is_used and mark_clobbers_dead.
>> * ipa-sra.cc (isra_track_scalar_value_uses): Ignore clobbers.
>> (ptr_parm_has_nonarg_uses): Likewise.
>> * ipa-param-manipulation.cc
>> (ipa_param_body_adjustments::get_ddef_if_exists_and_is_used): New.
>> (ipa_param_body_adjustments::mark_dead_statements): Move initial
>> checks to get_ddef_if_exists_and_is_used.
>> (ipa_param_body_adjustments::mark_clobbers_dead): New.
>> (ipa_param_body_adjustments::common_initialization): Call
>> mark_clobbers_dead when splitting.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2023-07-31  Martin Jambor  
>>
>> PR ipa/110378
>> * g++.dg/ipa/pr110378-1.C: New test.
>> ---
>>  gcc/ipa-param-manipulation.cc | 44 +---
>>  gcc/ipa-param-manipulation.h  |  2 ++
>>  gcc/ipa-sra.cc|  6 ++--
>>  gcc/testsuite/g++.dg/ipa/pr110378-1.C | 48 +++
>>  4 files changed, 94 insertions(+), 6 deletions(-)
>>  create mode 100644 gcc/testsuite/

Re: [PATCH] ipa-sra: Don't consider CLOBBERS as writes preventing splitting

2023-08-04 Thread Martin Jambor
Hello,

On Wed, Aug 02 2023, Richard Biener wrote:
> On Mon, Jul 31, 2023 at 7:05 PM Martin Jambor  wrote:
>>
>> Hi,
>>
>> when IPA-SRA detects whether a parameter passed by reference is
>> written to, it does not special case CLOBBERs which means it often
>> bails out unnecessarily, especially when dealing with C++ destructors.
>> Fixed by the obvious continue in the two relevant loops.
>>
>> The (slightly) more complex testcases in the PR need surprisingly more
>> effort but the simple one can be fixed now easily by this patch and I'll
>> work on the others incrementally.
>>
>> Bootstrapped and currently undergoing testsuite run on x86_64-linux.  OK
>> if it passes too?
>
> LGTM, btw - how are the clobbers handled during transform?

it turns out your question is spot on.  I assumed that the mini-DCE that
I implemented into IPA-SRA transform would delete them but I had a closer
look and it is not invoked on split parameters, only on removed ones.
What was actually happening is that the parameter got remapped to a
default definition of a replacement VAR_DECL and we were thus
gimple-clobbering a pointer pointing to nowhere.  The clobber then got
DSEd and so I originally did not notice looking at the optimized dump.

Still that is of course not ideal and so I added a simple function
removing clobbers when splitting.  I was considering adding that
functionality to ipa_param_body_adjustments::mark_dead_statements but
that would make the function harder to read without much gain.

So thanks again for the remark.  The following passes bootstrap and
testing on x86_64-linux.  I am running LTO bootstrap now.  OK if it
passes?

Martin



When IPA-SRA detects whether a parameter passed by reference is
written to, it does not special case CLOBBERs which means it often
bails out unnecessarily, especially when dealing with C++ destructors.
Fixed by the obvious continue in the two relevant loops and by adding
a simple function that marks the clobbers in the transformation code
as statements to be removed.
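
The effect of the analysis change can be sketched with a toy model.
This is not the code in ipa-sra.cc; the enum toy_stmt_kind and the
function toy_param_written_p are hypothetical and reduce each use of the
parameter to a single kind.

```cpp
#include <vector>

// Toy model only: a drastically simplified view of the statements that use
// a pointer parameter.
enum toy_stmt_kind { TOY_DEBUG, TOY_CLOBBER, TOY_LOAD, TOY_STORE };

// Return true if any use writes through the parameter.  Debug statements
// and clobbers are skipped, mirroring the "continue" described above.
inline bool
toy_param_written_p (const std::vector<toy_stmt_kind> &uses)
{
  for (toy_stmt_kind kind : uses)
    {
      if (kind == TOY_DEBUG || kind == TOY_CLOBBER)
        continue;
      if (kind == TOY_STORE)
        return true;
    }
  return false;
}
```

With the clobber skipped, a parameter that is only loaded from and then
clobbered at the end of a destructor no longer counts as written to in
this model.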

gcc/ChangeLog:

2023-08-04  Martin Jambor  

PR ipa/110378
* ipa-param-manipulation.h (class ipa_param_body_adjustments): New
members get_ddef_if_exists_and_is_used and mark_clobbers_dead.
* ipa-sra.cc (isra_track_scalar_value_uses): Ignore clobbers.
(ptr_parm_has_nonarg_uses): Likewise.
* ipa-param-manipulation.cc
(ipa_param_body_adjustments::get_ddef_if_exists_and_is_used): New.
(ipa_param_body_adjustments::mark_dead_statements): Move initial
checks to get_ddef_if_exists_and_is_used.
(ipa_param_body_adjustments::mark_clobbers_dead): New.
(ipa_param_body_adjustments::common_initialization): Call
mark_clobbers_dead when splitting.

gcc/testsuite/ChangeLog:

2023-07-31  Martin Jambor  

PR ipa/110378
* g++.dg/ipa/pr110378-1.C: New test.
---
 gcc/ipa-param-manipulation.cc | 44 +---
 gcc/ipa-param-manipulation.h  |  2 ++
 gcc/ipa-sra.cc|  6 ++--
 gcc/testsuite/g++.dg/ipa/pr110378-1.C | 48 +++
 4 files changed, 94 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr110378-1.C

diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
index a286af7f5d9..4a185ddbdf4 100644
--- a/gcc/ipa-param-manipulation.cc
+++ b/gcc/ipa-param-manipulation.cc
@@ -1072,6 +1072,20 @@ ipa_param_body_adjustments::carry_over_param (tree t)
   return new_parm;
 }
 
+/* If DECL is a gimple register that has a default definition SSA name and that
+   has some uses, return the default definition, otherwise return NULL_TREE.  
*/
+
+tree
+ipa_param_body_adjustments::get_ddef_if_exists_and_is_used (tree decl)
+{
+ if (!is_gimple_reg (decl))
+return NULL_TREE;
+  tree ddef = ssa_default_def (m_id->src_cfun, decl);
+  if (!ddef || has_zero_uses (ddef))
+return NULL_TREE;
+  return ddef;
+}
+
 /* Populate m_dead_stmts given that DEAD_PARAM is going to be removed without
any replacement or splitting.  REPL is the replacement VAR_SECL to base any
remaining uses of a removed parameter on.  Push all removed SSA names that
@@ -1084,10 +1098,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree 
dead_param,
   /* Current IPA analyses which remove unused parameters never remove a
  non-gimple register ones which have any use except as parameters in other
  calls, so we can safely leve them as they are.  */
-  if (!is_gimple_reg (dead_param))
-return;
-  tree parm_ddef = ssa_default_def (m_id->src_cfun, dead_param);
-  if (!parm_ddef || has_zero_uses (parm_ddef))
+  tree parm_ddef = get_ddef_if_exists_and_is_used (dead_param);
+  if (!parm_ddef)
 return;
 
   auto_vec stack;
@@ -1169,6 +1181,28 @@ ipa_param_body_adjustments::mark_dead_statements (tree 
dead_param,
   m_dead_ssa_debug_equiv.pu

[PATCH] ipa-sra: Don't consider CLOBBERS as writes preventing splitting

2023-07-31 Thread Martin Jambor
Hi,

when IPA-SRA detects whether a parameter passed by reference is
written to, it does not special case CLOBBERs which means it often
bails out unnecessarily, especially when dealing with C++ destructors.
Fixed by the obvious continue in the two relevant loops.

The (slightly) more complex testcases in the PR need surprisingly more
effort but the simple one can be fixed now easily by this patch and I'll
work on the others incrementally.

Bootstrapped and currently undergoing testsuite run on x86_64-linux.  OK
if it passes too?

Thanks,

Martin




gcc/ChangeLog:

2023-07-31  Martin Jambor  

PR ipa/110378
* ipa-sra.cc (isra_track_scalar_value_uses): Ignore clobbers.
(ptr_parm_has_nonarg_uses): Likewise.

gcc/testsuite/ChangeLog:

2023-07-31  Martin Jambor  

PR ipa/110378
* g++.dg/ipa/pr110378-1.C: New test.
---
 gcc/ipa-sra.cc|  6 ++--
 gcc/testsuite/g++.dg/ipa/pr110378-1.C | 47 +++
 2 files changed, 51 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr110378-1.C

diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index c35e03b7abd..edba364f56e 100644
--- a/gcc/ipa-sra.cc
+++ b/gcc/ipa-sra.cc
@@ -898,7 +898,8 @@ isra_track_scalar_value_uses (function *fun, cgraph_node 
*node, tree name,
 
   FOR_EACH_IMM_USE_STMT (stmt, imm_iter, name)
 {
-  if (is_gimple_debug (stmt))
+  if (is_gimple_debug (stmt)
+ || gimple_clobber_p (stmt))
continue;
 
   /* TODO: We could handle at least const builtin functions like arithmetic
@@ -1056,7 +1057,8 @@ ptr_parm_has_nonarg_uses (cgraph_node *node, function 
*fun, tree parm,
   unsigned uses_ok = 0;
   use_operand_p use_p;
 
-  if (is_gimple_debug (stmt))
+  if (is_gimple_debug (stmt)
+ || gimple_clobber_p (stmt))
continue;
 
   if (gimple_assign_single_p (stmt))
diff --git a/gcc/testsuite/g++.dg/ipa/pr110378-1.C 
b/gcc/testsuite/g++.dg/ipa/pr110378-1.C
new file mode 100644
index 000..aabe326b8b2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr110378-1.C
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-sra -fdump-tree-optimized-slim"  } */
+
+/* Test that even though destructors end with clobbering all of *this, it
+   should not prevent IPA-SRA.  */
+
+namespace {
+
+  class foo
+  {
+  public:
+int *a;
+foo(int c)
+{
+  a = new int[c];
+  a[0] = 4;
+}
+__attribute__((noinline)) ~foo();
+int f ()
+{
+  return a[0] + 1;
+}
+  };
+
+  volatile int v1 = 4;
+
+  __attribute__((noinline)) foo::~foo()
+  {
+delete[] a;
+return;
+  }
+
+
+}
+
+volatile int v2 = 20;
+
+int test (void)
+{
+  foo shouldnotexist(v2);
+  v2 = shouldnotexist.f();
+  return 0;
+}
+
+
+/* { dg-final { scan-ipa-dump "Will split parameter 0" "sra"  } } */
+/* { dg-final { scan-tree-dump-not "shouldnotexist" "optimized" } } */
-- 
2.41.0



Re: [PATCH] Read global value/mask in IPA.

2023-07-31 Thread Martin Jambor
Hello,

On Tue, Jul 18 2023, Aldy Hernandez wrote:
> On 7/17/23 15:14, Aldy Hernandez wrote:
>> Instead of reading the known zero bits in IPA, read the value/mask
>> pair which is available.
>> 
>> There is a slight change of behavior here.  I have removed the check
>> for SSA_NAME, as the ranger can calculate the range and value/mask for
>> INTEGER_CST.  This simplifies the code a bit, since there's no special
>> casing when setting the jfunc bits.  The default range for VR is
>> undefined, so I think it's safe just to check for undefined_p().
>
> Final round of tests revealed a regression for which I've adjusted the 
> testcase.
>
> It turns out g++.dg/ipa/pure-const-3.C fails because IPA can now pick up 
> value/mask from any pass that has an integrated ranger.  The test was 
> previously disabling evrp and CCP, but now VRP[12], jump threading, and 
> DOM can make value/mask adjustments visible to IPA so they must be 
> disabled as well.

So can this be then converted into a new testcase that would test that
we can now derive something we could not in the past?

The patch is OK (but the testcase above is highly desirable).

Thanks for continuing to look at IPA-VR.

Martin


>
> We've run into these scenarios multiple times in the past-- any 
> improvements to the ranger pipeline causes everyone to get smarter, 
> making changes visible earlier in the pipeline.
>
> Aldy
> From e1dfd4d6b3d3bf09d55b6ea3ac732462c7030802 Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez 
> Date: Fri, 14 Jul 2023 12:38:16 +0200
> Subject: [PATCH] Read global value/mask in IPA.
>
> Instead of reading the known zero bits in IPA, read the value/mask
> pair which is available.
>
> There is a slight change of behavior here.  I have removed the check
> for SSA_NAME, as the ranger can calculate the range and value/mask for
> INTEGER_CST.  This simplifies the code a bit, since there's no special
> casing when setting the jfunc bits.  The default range for VR is
> undefined, so I think it's safe just to check for undefined_p().
>
> gcc/ChangeLog:
>
>   * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Read global
>   value/mask.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.dg/ipa/pure-const-3.C: Adjust for smarter value/mask being
>   read by ranger earlier than expected by test.
> ---
>  gcc/ipa-prop.cc | 18 --
>  gcc/testsuite/g++.dg/ipa/pure-const-3.C |  2 +-
>  2 files changed, 9 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index 5d790ff1265..4f6ed7b89bd 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -2402,8 +2402,7 @@ ipa_compute_jump_functions_for_edge (struct 
> ipa_func_body_info *fbi,
>   }
>else
>   {
> -   if (TREE_CODE (arg) == SSA_NAME
> -   && param_type
> +   if (param_type
> && Value_Range::supports_type_p (TREE_TYPE (arg))
> && Value_Range::supports_type_p (param_type)
> && irange::supports_p (TREE_TYPE (arg))
> @@ -2422,15 +2421,14 @@ ipa_compute_jump_functions_for_edge (struct 
> ipa_func_body_info *fbi,
>   gcc_assert (!jfunc->m_vr);
>   }
>  
> -  if (INTEGRAL_TYPE_P (TREE_TYPE (arg))
> -   && (TREE_CODE (arg) == SSA_NAME || TREE_CODE (arg) == INTEGER_CST))
> +  if (INTEGRAL_TYPE_P (TREE_TYPE (arg)) && !vr.undefined_p ())
>   {
> -   if (TREE_CODE (arg) == SSA_NAME)
> - ipa_set_jfunc_bits (jfunc, 0,
> - widest_int::from (get_nonzero_bits (arg),
> -   TYPE_SIGN (TREE_TYPE (arg;
> -   else
> - ipa_set_jfunc_bits (jfunc, wi::to_widest (arg), 0);
> +   irange &r = as_a  (vr);
> +   irange_bitmask bm = r.get_bitmask ();
> +   signop sign = TYPE_SIGN (TREE_TYPE (arg));
> +   ipa_set_jfunc_bits (jfunc,
> +   widest_int::from (bm.value (), sign),
> +   widest_int::from (bm.mask (), sign));
>   }
>else if (POINTER_TYPE_P (TREE_TYPE (arg)))
>   {
> diff --git a/gcc/testsuite/g++.dg/ipa/pure-const-3.C 
> b/gcc/testsuite/g++.dg/ipa/pure-const-3.C
> index b4a4673e86e..e43cf09af27 100644
> --- a/gcc/testsuite/g++.dg/ipa/pure-const-3.C
> +++ b/gcc/testsuite/g++.dg/ipa/pure-const-3.C
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fno-ipa-vrp -fdump-tree-optimized -fno-tree-ccp 
> -fdisable-tree-evrp"  } */
> +/* { dg-options "-O2 -fno-ipa-vrp -fdump-tree-optimized -fno-tree-ccp 
> -fdisable-tree-evrp -fdisable-tree-vrp1 -fdisable-tree-vrp2 -fno-thread-jumps 
> -fno-tree-dominator-opts"  } */
>  int *ptr;
>  static int barvar;
>  static int b(int a);
> -- 
> 2.40.1


Re: [PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file

2023-07-21 Thread Martin Jambor
Hello Lehua,

On Fri, Jul 21 2023, Lehua Ding wrote:
> Hi Martin,
>
>
> By the way, is there a standard format required for these Python files?

Generally, our Python coding conventions are at
https://gcc.gnu.org/codingconventions.html#python

> I see that other Python files have similar format errors when checked
> using flake8.

For historic reasons (i.e. Martin Liška set it up that way), we
currently use flake8 to check python formatting of
contrib/gcc-changelog, contrib/mklog.py and
maintainer-scripts/branch_changer.py and use pytest to check
contrib/gcc-changelog and contrib/test_mklog.py.  That is how I found
out.

I guess many of the files predate the coding conventions and so don't
adhere to them.  Patches to fix them are welcome (I guess) but at least
we should not regress (I guess).

> If so, it feels necessary to configure a git hook on git server to do
> this check.

Performing more thorough checks on pushed commits is a much larger topic
than this thread.  FWIW, I would not be opposed to checking Python
scripts that are known to be OK.

Martin


Re: [PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file

2023-07-21 Thread Martin Jambor
Hello Lehua,

On Fri, Jul 21 2023, Lehua Ding wrote:
> Hi Martin,
>
>
> > this patch caused flake8 to complain about contrib/mklog.py:
> > 
> > $ flake8 contrib/mklog.py
> > contrib/mklog.py:377:80: E501 line too long (85 > 79 characters)
> > contrib/mklog.py:388:26: E127 continuation line over-indented for 
> visual indent
> > contrib/mklog.py:388:36: W605 invalid escape sequence '\s'
> > contrib/mklog.py:388:40: W605 invalid escape sequence '\s'
> > contrib/mklog.py:388:44: W605 invalid escape sequence '\s'
> > contrib/mklog.py:388:47: W605 invalid escape sequence '\|'
> > contrib/mklog.py:388:49: W605 invalid escape sequence '\s'
> > contrib/mklog.py:388:51: W605 invalid escape sequence '\d'
> > contrib/mklog.py:388:54: W605 invalid escape sequence '\s'
> > contrib/mklog.py:388:58: W605 invalid escape sequence '\-'
> > 
> > Can you please have a look and ideally fix the issues?
>
>
> Thank you for pointing out this.
> I will fix these format errors in another fix patch[1].

Thanks!

> I tried to fix the following format error but couldn't
> find a way, do you know how to fix this error?
>
>
> contrib/mklog.py:388:26: E127 continuation line over-indented for visual 
> indent

I am no python expert but the following seems to work:

diff --git a/contrib/mklog.py b/contrib/mklog.py
index 26230b9b4f2..2563d19bc99 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -384,8 +384,8 @@ if __name__ == '__main__':
 for line in f:
 if maybe_diff_log == 1 and line == "---\n":
 maybe_diff_log = 2
-elif maybe_diff_log == 2 and \
- re.match("\s[^\s]+\s+\|\s\d+\s[+\-]+\n", line):
+elif (maybe_diff_log == 2 and
+  re.match("\s[^\s]+\s+\|\s\d+\s[+\-]+\n", line)):
 lines += [output, "---\n", line]
 maybe_diff_log = 3
 else:

Martin


Re: [PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file

2023-07-21 Thread Martin Jambor
Hello Lehua,

On Wed, Jul 12 2023, Lehua Ding wrote:
> Hi,
>
> This tiny patch adds an --append option to mklog.py that supports adding the
> generated ChangeLog to the corresponding patch file. With this option there is
> no need to manually copy the generated ChangeLog to the patch file. e.g.:
>
> Running `mklog.py -a /path/to/this/patch` will add the generated ChangeLog
>
> ```
> contrib/ChangeLog:
>
>   * mklog.py:
> ```

this patch caused flake8 to complain about contrib/mklog.py:

$ flake8 contrib/mklog.py
contrib/mklog.py:377:80: E501 line too long (85 > 79 characters)
contrib/mklog.py:388:26: E127 continuation line over-indented for visual indent
contrib/mklog.py:388:36: W605 invalid escape sequence '\s'
contrib/mklog.py:388:40: W605 invalid escape sequence '\s'
contrib/mklog.py:388:44: W605 invalid escape sequence '\s'
contrib/mklog.py:388:47: W605 invalid escape sequence '\|'
contrib/mklog.py:388:49: W605 invalid escape sequence '\s'
contrib/mklog.py:388:51: W605 invalid escape sequence '\d'
contrib/mklog.py:388:54: W605 invalid escape sequence '\s'
contrib/mklog.py:388:58: W605 invalid escape sequence '\-'

Can you please have a look and ideally fix the issues?

Thanks,

Martin


>
> to the right place in the /path/to/this/patch file.
>
> Best,
> Lehua
>
> contrib/ChangeLog:
>
>   * mklog.py: Add --append option.
>
> ---
>  contrib/mklog.py | 27 ++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/contrib/mklog.py b/contrib/mklog.py
> index 777212c98d7..26230b9b4f2 100755
> --- a/contrib/mklog.py
> +++ b/contrib/mklog.py
> @@ -358,6 +358,8 @@ if __name__ == '__main__':
>   'file')
>  parser.add_argument('--update-copyright', action='store_true',
>  help='Update copyright in ChangeLog files')
> +parser.add_argument('-a', '--append', action='store_true',
> +help='Append the generate ChangeLog to the patch file')
>  args = parser.parse_args()
>  if args.input == '-':
>  args.input = None
> @@ -370,7 +372,30 @@ if __name__ == '__main__':
>  else:
>  output = generate_changelog(data, args.no_functions,
>  args.fill_up_bug_titles, args.pr_numbers)
> -if args.changelog:
> +if args.append:
> +if (not args.input):
> +raise Exception("`-a or --append` option not support standard input")
> +lines = []
> +with open(args.input, 'r', newline='\n') as f:
> +# 1 -> not find the possible start of diff log
> +# 2 -> find the possible start of diff log
> +# 3 -> finish add ChangeLog to the patch file
> +maybe_diff_log = 1
> +for line in f:
> +if maybe_diff_log == 1 and line == "---\n":
> +maybe_diff_log = 2
> +elif maybe_diff_log == 2 and \
> + re.match("\s[^\s]+\s+\|\s\d+\s[+\-]+\n", line):
> +lines += [output, "---\n", line]
> +maybe_diff_log = 3
> +else:
> +# the possible start is not the true start.
> +if maybe_diff_log == 2:
> +maybe_diff_log = 1
> +lines.append(line)
> +with open(args.input, "w") as f:
> +f.writelines(lines)
> +elif args.changelog:
>  lines = open(args.changelog).read().split('\n')
>  start = list(takewhile(skip_line_in_changelog, lines))
>  end = lines[len(start):]
> -- 
> 2.36.1
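
The insertion logic in the quoted patch is a small state machine: it waits for
a "---" separator line and, if the next line looks like a diffstat entry, puts
the generated ChangeLog in front of the separator.  Below is a self-contained
sketch of that idea operating on a list of lines instead of a file.  The name
insert_changelog is hypothetical, and the sketch makes one assumption about the
desired behaviour that differs from a literal reading of the patch: a "---"
line that is not followed by a diffstat entry is kept in the output.

```python
import re

DIFFSTAT_LINE = re.compile(r"\s[^\s]+\s+\|\s\d+\s[+\-]+\n")


def insert_changelog(patch_lines, changelog):
    """Return a copy of PATCH_LINES with CHANGELOG placed before the
    '---' separator that introduces the diffstat (hypothetical helper)."""
    result = []
    pending_separator = False  # saw '---' and have not decided about it yet
    inserted = False           # the ChangeLog is inserted at most once
    for line in patch_lines:
        if pending_separator:
            pending_separator = False
            if not inserted and DIFFSTAT_LINE.match(line):
                result += [changelog, "---\n", line]
                inserted = True
                continue
            # Not the diffstat after all: put the separator back.
            result.append("---\n")
        if not inserted and line == "---\n":
            pending_separator = True
        else:
            result.append(line)
    if pending_separator:
        result.append("---\n")
    return result


if __name__ == "__main__":
    patch = ["Subject: example\n", "\n", "---\n",
             " contrib/mklog.py | 27 ++-\n", " 1 file changed\n"]
    print("".join(insert_changelog(patch, "contrib/ChangeLog:\n\n")))
```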


[committed] Document new analyzer parameters

2023-07-20 Thread Martin Jambor
Hi,

This patch documents the analyzer parameters introduced in
r14-2029-g0e466e978c7286 also in gcc/doc/invoke.texi.

Committed as obvious after testing with make pdf and make info and
eyeballing the result.

Thanks,

Martin


2023-07-20  Martin Jambor  

* doc/invoke.texi (analyzer-text-art-string-ellipsis-threshold): New.
(analyzer-text-art-ideal-canvas-width): Likewise.
(analyzer-text-art-string-ellipsis-head-len): Likewise.
(analyzer-text-art-string-ellipsis-tail-len): Likewise.

---
 gcc/doc/invoke.texi | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d3c821e208a..5628c08214d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16324,6 +16324,18 @@ The parameter is used only in GIMPLE FE.
 The maximum number of 'after supernode' exploded nodes within the analyzer
 per supernode, before terminating analysis.
 
+@item analyzer-text-art-string-ellipsis-threshold
+The number of bytes at which to ellipsize string literals in analyzer text art diagrams.
+
+@item analyzer-text-art-ideal-canvas-width
+The ideal width in characters of text art diagrams generated by the analyzer.
+
+@item analyzer-text-art-string-ellipsis-head-len
+The number of literal bytes to show at the head of a string literal in text art when ellipsizing it.
+
+@item analyzer-text-art-string-ellipsis-tail-len
+The number of literal bytes to show at the tail of a string literal in text art when ellipsizing it.
+
 @item ranger-logical-depth
 Maximum depth of logical expression evaluation ranger will look through
 when evaluating outgoing edge ranges.
-- 
2.41.0



[committed] Restore bootstrap by removing unused variable in tree-ssa-loop-ivcanon.cc

2023-07-17 Thread Martin Jambor
Hi,

This restores bootstrap by removing the variable causing:

  /home/mjambor/gcc/trunk/src/gcc/tree-ssa-loop-ivcanon.cc: In function ‘bool try_peel_loop(loop*, edge, tree, bool, long int)’:
  /home/mjambor/gcc/trunk/src/gcc/tree-ssa-loop-ivcanon.cc:1170:17: error: variable ‘entry_count’ set but not used [-Werror=unused-but-set-variable]
   1170 |   profile_count entry_count = profile_count::zero ();
| ^~~
  cc1plus: all warnings being treated as errors

ACKed by Honza in a chat, passed a bootstrap on x86_64-linux, committed.

Thanks,

Martin


gcc/ChangeLog:

2023-07-17  Martin Jambor  

* tree-ssa-loop-ivcanon.cc (try_peel_loop): Remove unused variable
entry_count.
---
 gcc/tree-ssa-loop-ivcanon.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
index bdb738af7a8..a895e8e65be 100644
--- a/gcc/tree-ssa-loop-ivcanon.cc
+++ b/gcc/tree-ssa-loop-ivcanon.cc
@@ -1167,7 +1167,6 @@ try_peel_loop (class loop *loop,
   loop->num, (int) npeel);
 }
   adjust_loop_info_after_peeling (loop, npeel, true);
-  profile_count entry_count = profile_count::zero ();
 
   bitmap_set_bit (peeled_loops, loop->num);
   return true;
-- 
2.41.0



Re: [PATCH] Export value/mask known bits from IPA.

2023-07-17 Thread Martin Jambor
Hi Aldy,

On Mon, Jul 17 2023, Aldy Hernandez wrote:
> Currently IPA throws away the known 1 bits because VRP and irange have
> traditionally only had a way of tracking known 0s (set_nonzero_bits).
> With the ability to keep all the known bits in the irange, we can now
> save this between passes.
>
> OK?
>
> gcc/ChangeLog:
>
>   * ipa-prop.cc (ipcp_update_bits): Export value/mask known bits.

OK, thanks.

Martin


> ---
>  gcc/ipa-prop.cc | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index d2b998f8af5..5d790ff1265 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -5853,10 +5853,9 @@ ipcp_update_bits (struct cgraph_node *node, ipcp_transformation *ts)
>   {
> unsigned prec = TYPE_PRECISION (TREE_TYPE (ddef));
> signop sgn = TYPE_SIGN (TREE_TYPE (ddef));
> -
> -   wide_int nonzero_bits = wide_int::from (bits[i]->mask, prec, UNSIGNED)
> -   | wide_int::from (bits[i]->value, prec, sgn);
> -   set_nonzero_bits (ddef, nonzero_bits);
> +   wide_int mask = wide_int::from (bits[i]->mask, prec, UNSIGNED);
> +   wide_int value = wide_int::from (bits[i]->value, prec, sgn);
> +   set_bitmask (ddef, value, mask);
>   }
>else
>   {
> -- 
> 2.40.1
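
To illustrate what the quoted change preserves, here is a small sketch in plain
Python integers rather than wide_int.  It assumes the usual IPA-CP convention
that a set bit in the mask means "unknown" and that the value holds the known
bits wherever the mask is clear; the function names are made up for the
example.  The old code combined the two into a single "may be nonzero" set,
which cannot distinguish a bit that is known to be one from a bit that is
simply unknown.

```python
def nonzero_bits(value, mask):
    """Bits that may be nonzero: known ones plus all unknown bits."""
    return value | mask


def known_ones(value, mask):
    """Bits known to be one: set in VALUE and not marked unknown."""
    return value & ~mask


def known_zeros(value, mask, prec):
    """Bits known to be zero within a PREC-bit type."""
    all_bits = (1 << prec) - 1
    return ~(value | mask) & all_bits


if __name__ == "__main__":
    # Two different value/mask pairs for an 8-bit type...
    a = (0b00000100, 0b00000011)   # bit 2 known one, bits 0-1 unknown
    b = (0b00000000, 0b00000111)   # bits 0-2 all unknown
    # ...collapse to the same nonzero-bits set,
    print(bin(nonzero_bits(*a)), bin(nonzero_bits(*b)))
    # while keeping value and mask separate retains the known one bit.
    print(bin(known_ones(*a)), bin(known_ones(*b)))
```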


Re: [PATCH 3/3] analyzer: add text-art visualizations of out-of-bounds accesses [PR106626]

2023-06-30 Thread Martin Jambor
Hi David,

On Wed, May 31 2023, David Malcolm via Gcc-patches wrote:
> This patch extends -Wanalyzer-out-of-bounds so that, where possible, it
> will emit a text art diagram visualizing the spatial relationship between

[...]


>
> gcc/ChangeLog:
>   PR analyzer/106626
>   * Makefile.in (ANALYZER_OBJS): Add analyzer/access-diagram.o.
>   * doc/invoke.texi (Wanalyzer-out-of-bounds): Add description of
>   text art.
>   (fanalyzer-debug-text-art): New.
>
> gcc/analyzer/ChangeLog:
>   PR analyzer/106626
>   * access-diagram.cc: New file.
>   * access-diagram.h: New file.
>   * analyzer.h (class region_offset): Add default ctor.
>   (region_offset::make_byte_offset): New decl.
>   (region_offset::concrete_p): New.
>   (region_offset::get_concrete_byte_offset): New.
>   (region_offset::calc_symbolic_bit_offset): New decl.
>   (region_offset::calc_symbolic_byte_offset): New decl.
>   (region_offset::dump_to_pp): New decl.
>   (region_offset::dump): New decl.
>   (operator<, operator<=, operator>, operator>=): New decls for
>   region_offset.
>   * analyzer.opt
>   (-param=analyzer-text-art-string-ellipsis-threshold=): New.
>   (-param=analyzer-text-art-string-ellipsis-head-len=): New.
>   (-param=analyzer-text-art-string-ellipsis-tail-len=): New.
>   (-param=analyzer-text-art-ideal-canvas-width=): New.

contrib/check-params-in-docs.py now complains that:

  $ ./gcc/xgcc -Bgcc --help=param &>/tmp/params.txt
  $ ../src/contrib/check-params-in-docs.py ../src/gcc/doc/invoke.texi /tmp/params.txt
  Missing:
  @item analyzer-text-art-string-ellipsis-threshold
  The number of bytes at which to ellipsize string literals in

  @item analyzer-text-art-string-ellipsis-head-len
  The number of literal bytes to show at the head of a string

  @item analyzer-text-art-string-ellipsis-tail-len
  The number of literal bytes to show at the tail of a string

  @item analyzer-text-art-ideal-canvas-width
  The ideal width in characters of text art diagrams generated by the

Can you please add the respective documentation entries?

Thanks!

Martin
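
The kind of check reported above can be sketched in a few lines.  This is not
the code of contrib/check-params-in-docs.py; it is a simplified illustration
with a hypothetical function name, and it assumes that the help output lists
each parameter as "--param=NAME=..." and that the documentation has one
"@item NAME" line per parameter.

```python
import re

# Assumed shapes of the two inputs; both patterns are illustrative only.
PARAM_RE = re.compile(r"^[ \t]*--param=([a-z][a-z0-9-]*)=", re.MULTILINE)
ITEM_RE = re.compile(r"^@item[ \t]+([a-z][a-z0-9-]*)[ \t]*$", re.MULTILINE)


def missing_params(help_text, texi_text):
    """Return the sorted names of parameters that appear in HELP_TEXT
    but have no matching '@item' line in TEXI_TEXT."""
    in_help = set(PARAM_RE.findall(help_text))
    in_docs = set(ITEM_RE.findall(texi_text))
    return sorted(in_help - in_docs)


if __name__ == "__main__":
    help_text = ("  --param=analyzer-text-art-ideal-canvas-width=\tThe ideal\n"
                 "  --param=ranger-logical-depth=\tMaximum depth\n")
    texi_text = "@item ranger-logical-depth\nMaximum depth of logical ...\n"
    print(missing_params(help_text, texi_text))
```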


[committed] Regenerate lto-plugin/Makefile.in

2023-06-30 Thread Martin Jambor
Hi,

On Thu, Jun 29 2023, Marek Polacek wrote:
> On Thu, Jun 29, 2023 at 05:58:22PM +0200, Martin Jambor wrote:

[...]

>> 
>> Unfortunately I won't have time to actually look at this in the next 2-3
>> weeks, so I am inclined to just trust the verification script (which
>> essentially runs autoconf/automake everywhere and then expects no diff)
>> and commit the one-line change.  What do you think, does that make sense
>> (even without looking at why other Makefile.in files did not change)?
>
> Yes please, go ahead with the one line change meanwhile.  Thanks!
>
> I've opened PR110467 for the build problem.
>
> Marek


Commit the regenerated lto-plugin/Makefile.in in order to reflect the changes
made with the introduction of --enable-host-pie.

lto-plugin/ChangeLog:

2023-06-30  Martin Jambor  

* Makefile.in: Regenerate.
---
 lto-plugin/Makefile.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
index cb568e1e09f..f6f5b020ff5 100644
--- a/lto-plugin/Makefile.in
+++ b/lto-plugin/Makefile.in
@@ -298,6 +298,7 @@ datadir = @datadir@
 datarootdir = @datarootdir@
 docdir = @docdir@
 dvidir = @dvidir@
+enable_host_bind_now = @enable_host_bind_now@
 exec_prefix = @exec_prefix@
 gcc_build_dir = @gcc_build_dir@
 get_gcc_base_ver = @get_gcc_base_ver@
-- 
2.41.0



Re: [PATCH] configure: Implement --enable-host-bind-now

2023-06-29 Thread Martin Jambor
Hi,

On Tue, Jun 27 2023, Marek Polacek wrote:
> On Tue, Jun 27, 2023 at 01:39:16PM +0200, Martin Jambor wrote:
>> Hello,
>> 
>> On Tue, May 16 2023, Marek Polacek via Gcc-patches wrote:
>> > As promised in the --enable-host-pie patch, this patch adds another
>> > configure option, --enable-host-bind-now, which adds -z now when linking
>> > the compiler executables in order to extend hardening.  BIND_NOW with RELRO
>> > allows the GOT to be marked RO; this prevents GOT modification attacks.
>> >
>> > This option does not affect linking of target libraries; you can use
>> > LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now to enable RELRO/BIND_NOW.
>> >
>> > With this patch:
>> > $ readelf -Wd cc1{,plus} | grep FLAGS
>> >  0x001e (FLAGS)  BIND_NOW
>> >  0x6ffb (FLAGS_1)Flags: NOW PIE
>> >  0x001e (FLAGS)  BIND_NOW
>> >  0x6ffb (FLAGS_1)Flags: NOW PIE
>> >
>> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
>> >
>> > c++tools/ChangeLog:
>> >
>> >* configure.ac (--enable-host-bind-now): New check.
>> >* configure: Regenerate.
>> >
>> > gcc/ChangeLog:
>> >
>> >* configure.ac (--enable-host-bind-now): New check.  Add
>> >-Wl,-z,now to LD_PICFLAG if --enable-host-bind-now.
>> >* configure: Regenerate.
>> >* doc/install.texi: Document --enable-host-bind-now.
>> >
>> > lto-plugin/ChangeLog:
>> >
>> >* configure.ac (--enable-host-bind-now): New check.  Link with
>> >-z,now.
>> >* configure: Regenerate.
>> 
>> Our reconfiguration checking script complains about a missing hunk in
>> lto-plugin/Makefile.in:
>> 
>> diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
>> index cb568e1e09f..f6f5b020ff5 100644
>> --- a/lto-plugin/Makefile.in
>> +++ b/lto-plugin/Makefile.in
>> @@ -298,6 +298,7 @@ datadir = @datadir@
>>  datarootdir = @datarootdir@
>>  docdir = @docdir@
>>  dvidir = @dvidir@
>> +enable_host_bind_now = @enable_host_bind_now@
>>  exec_prefix = @exec_prefix@
>>  gcc_build_dir = @gcc_build_dir@
>>  get_gcc_base_ver = @get_gcc_base_ver@
>> 
>> 
>> I am somewhat puzzled why the line is not missing in any of the other
>> Makefile.in files.  Can you please check whether that is the only thing
>> that is missing (assuming it is actually missing)?
>
> Arg, once again, I'm sorry.  I don't know how this happened.  It would
> be trivial to fix it but since
>
> commit 4a48a38fa99f067b8f3a3d1a5dc7a1e602db351f
> Author: Eric Botcazou 
> Date:   Wed Jun 21 18:19:36 2023 +0200
>
> ada: Fix build of GNAT tools
>
> the build with Ada included fails with --enable-host-pie.  So that needs
> to be fixed first.
>
> Eric, I'm not asking you to fix that, but I'm curious, what did the
> commit above fix?  The patch looks correct; I'm just puzzled why I
> hadn't seen any build failures.
>
> The --enable-host-pie patch has been a nightmare :(.
>

No worries, I can see how these things can easily get difficult.

Unfortunately I won't have time to actually look at this in the next 2-3
weeks, so I am inclined to just trust the verification script (which
essentially runs autoconf/automake everywhere and then expects no diff)
and commit the one-line change.  What do you think, does that make sense
(even without looking at why other Makefile.in files did not change)?

Thanks,

Martin


Re: Enable ranger for ipa-prop

2023-06-27 Thread Martin Jambor
On Tue, Jun 27 2023, Jan Hubicka wrote:
> Hi,
> as shown in the testcase (which would eventually be useful for
> optimizing std::vector's push_back), ipa-prop can use context dependent ranger
> queries for better value range info.
>
> Bootstrapped/regtested x86_64-linux, OK?
>
> Honza
>
> gcc/ChangeLog:
>
>   PR middle-end/110377
>   * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Add ranger
>   parameter; use ranger instance for range queries.
>   (ipa_compute_jump_functions_for_bb): Pass around ranger.
>   (analysis_dom_walker::before_dom_children): Enable ranger.

Looks good to me (with or without passing a ranger parameter around).

Martin


>
> gcc/testsuite/ChangeLog:
>
>   PR middle-end/110377
>   * gcc.dg/tree-ssa/pr110377.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c b/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c
> new file mode 100644
> index 000..cbe3441caea
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile */
> +/* { dg-options "-O2 -fdump-ipa-fnsummary" } */
> +int test3(int);
> +__attribute__ ((noinline))
> +void test2(int a)
> +{
> + test3(a);
> +}
> +void
> +test(int n)
> +{
> +if (n > 5)
> +  __builtin_unreachable ();
> +test2(n);
> +}
> +/* { dg-final { scan-tree-dump "-INF, 5-INF" "fnsummary" } }  */
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index 41c812194ca..693d4805d93 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -2341,7 +2341,8 @@ ipa_set_jfunc_vr (ipa_jump_func *jf, const ipa_vr &vr)
>  
>  static void
>  ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
> -  struct cgraph_edge *cs)
> +  struct cgraph_edge *cs,
> +  gimple_ranger *ranger)
>  {
>ipa_node_params *info = ipa_node_params_sum->get (cs->caller);
>ipa_edge_args *args = ipa_edge_args_sum->get_create (cs);
> @@ -2386,7 +2387,7 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
>  
> if (TREE_CODE (arg) == SSA_NAME
> && param_type
> -   && get_range_query (cfun)->range_of_expr (vr, arg)
> +   && get_range_query (cfun)->range_of_expr (vr, arg, cs->call_stmt)
> && vr.nonzero_p ())
>   addr_nonzero = true;
> else if (tree_single_nonzero_warnv_p (arg, &strict_overflow))
> @@ -2408,7 +2409,7 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
> && Value_Range::supports_type_p (param_type)
> && irange::supports_p (TREE_TYPE (arg))
> && irange::supports_p (param_type)
> -   && get_range_query (cfun)->range_of_expr (vr, arg)
> +   && ranger->range_of_expr (vr, arg, cs->call_stmt)
> && !vr.undefined_p ())
>   {
> Value_Range resvr (vr);
> @@ -2517,7 +2518,8 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
> from BB.  */
>  
>  static void
> -ipa_compute_jump_functions_for_bb (struct ipa_func_body_info *fbi, basic_block bb)
> +ipa_compute_jump_functions_for_bb (struct ipa_func_body_info *fbi, basic_block bb,
> +gimple_ranger *ranger)
>  {
>struct ipa_bb_info *bi = ipa_get_bb_info (fbi, bb);
>int i;
> @@ -2536,7 +2538,7 @@ ipa_compute_jump_functions_for_bb (struct ipa_func_body_info *fbi, basic_block b
> && !gimple_call_fnspec (cs->call_stmt).known_p ())
>   continue;
>   }
> -  ipa_compute_jump_functions_for_edge (fbi, cs);
> +  ipa_compute_jump_functions_for_edge (fbi, cs, ranger);
>  }
>  }
>  
> @@ -3110,19 +3112,27 @@ class analysis_dom_walker : public dom_walker
>  {
>  public:
>analysis_dom_walker (struct ipa_func_body_info *fbi)
> -: dom_walker (CDI_DOMINATORS), m_fbi (fbi) {}
> +: dom_walker (CDI_DOMINATORS), m_fbi (fbi)
> +  {
> +m_ranger = enable_ranger (cfun, false);
> +  }
> +  ~analysis_dom_walker ()
> +  {
> +disable_ranger (cfun);
> +  }
>  
>edge before_dom_children (basic_block) final override;
>  
>  private:
>struct ipa_func_body_info *m_fbi;
> +  gimple_ranger *m_ranger;
>  };
>  
>  edge
>  analysis_dom_walker::before_dom_children (basic_block bb)
>  {
>ipa_analyze_params_uses_in_bb (m_fbi, bb);
> -  ipa_compute_jump_functions_for_bb (m_fbi, bb);
> +  ipa_compute_jump_functions_for_bb (m_fbi, bb, m_ranger);
>return NULL;
>  }
>  


Re: [PATCH] configure: Implement --enable-host-bind-now

2023-06-27 Thread Martin Jambor
Hello,

On Tue, May 16 2023, Marek Polacek via Gcc-patches wrote:
> As promised in the --enable-host-pie patch, this patch adds another
> configure option, --enable-host-bind-now, which adds -z now when linking
> the compiler executables in order to extend hardening.  BIND_NOW with RELRO
> allows the GOT to be marked RO; this prevents GOT modification attacks.
>
> This option does not affect linking of target libraries; you can use
> LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now to enable RELRO/BIND_NOW.
>
> With this patch:
> $ readelf -Wd cc1{,plus} | grep FLAGS
>  0x001e (FLAGS)  BIND_NOW
>  0x6ffb (FLAGS_1)Flags: NOW PIE
>  0x001e (FLAGS)  BIND_NOW
>  0x6ffb (FLAGS_1)Flags: NOW PIE
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
>
> c++tools/ChangeLog:
>
>   * configure.ac (--enable-host-bind-now): New check.
>   * configure: Regenerate.
>
> gcc/ChangeLog:
>
>   * configure.ac (--enable-host-bind-now): New check.  Add
>   -Wl,-z,now to LD_PICFLAG if --enable-host-bind-now.
>   * configure: Regenerate.
>   * doc/install.texi: Document --enable-host-bind-now.
>
> lto-plugin/ChangeLog:
>
>   * configure.ac (--enable-host-bind-now): New check.  Link with
>   -z,now.
>   * configure: Regenerate.

Our reconfiguration checking script complains about a missing hunk in
lto-plugin/Makefile.in:

diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
index cb568e1e09f..f6f5b020ff5 100644
--- a/lto-plugin/Makefile.in
+++ b/lto-plugin/Makefile.in
@@ -298,6 +298,7 @@ datadir = @datadir@
 datarootdir = @datarootdir@
 docdir = @docdir@
 dvidir = @dvidir@
+enable_host_bind_now = @enable_host_bind_now@
 exec_prefix = @exec_prefix@
 gcc_build_dir = @gcc_build_dir@
 get_gcc_base_ver = @get_gcc_base_ver@


I am somewhat puzzled why the line is not missing in any of the other
Makefile.in files.  Can you please check whether that is the only thing
that is missing (assuming it is actually missing)?

Thanks,

Martin


Re: [PATCH] Convert remaining uses of value_range in ipa-*.cc to Value_Range.

2023-06-26 Thread Martin Jambor
Hi,

On Mon, May 22 2023, Aldy Hernandez via Gcc-patches wrote:
> Minor cleanups to get rid of value_range in IPA.  There's only one left,
> but it's in the switch code which is integer specific.
>
> OK?

With the same request that...

>
> gcc/ChangeLog:
>
>   * ipa-cp.cc (decide_whether_version_node): Adjust comment.
>   * ipa-fnsummary.cc (evaluate_conditions_for_known_args): Adjust
>   for Value_Range.
>   (set_switch_stmt_execution_predicate): Same.
>   * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Same.
> ---
>  gcc/ipa-cp.cc|  3 +--
>  gcc/ipa-fnsummary.cc | 22 ++
>  gcc/ipa-prop.cc  |  9 +++--
>  3 files changed, 18 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index 03273666ea2..2e64415096e 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -6287,8 +6287,7 @@ decide_whether_version_node (struct cgraph_node *node)
>   {
> /* If some values generated for self-recursive calls with
>arithmetic jump functions fall outside of the known
> -  value_range for the parameter, we can skip them.  VR interface
> -  supports this only for integers now.  */
> +  range for the parameter, we can skip them.  */
> if (TREE_CODE (val->value) == INTEGER_CST
> && !plats->m_value_range.bottom_p ()
> && !ipa_range_contains_p (plats->m_value_range.m_vr,
> diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
> index 0474af8991e..1ce8501fe85 100644
> --- a/gcc/ipa-fnsummary.cc
> +++ b/gcc/ipa-fnsummary.cc
> @@ -488,19 +488,20 @@ evaluate_conditions_for_known_args (struct cgraph_node *node,
> if (vr.varying_p () || vr.undefined_p ())
>   break;
>  
> -   value_range res;
> +   Value_Range res (op->type);
> if (!op->val[0])
>   {
> +   Value_Range varying (op->type);
> +   varying.set_varying (op->type);
> range_op_handler handler (op->code, op->type);
> if (!handler
> || !res.supports_type_p (op->type)
> -   || !handler.fold_range (res, op->type, vr,
> -   value_range (op->type)))
> +   || !handler.fold_range (res, op->type, vr, varying))
>   res.set_varying (op->type);
>   }
> else if (!op->val[1])
>   {
> -   value_range op0;
> +   Value_Range op0 (op->type);
> range_op_handler handler (op->code, op->type);
>  
> ipa_range_set_and_normalize (op0, op->val[0]);
> @@ -518,14 +519,14 @@ evaluate_conditions_for_known_args (struct cgraph_node *node,
>   }
> if (!vr.varying_p () && !vr.undefined_p ())
>   {
> -   value_range res;
> -   value_range val_vr;
> +   int_range<2> res;
> +   Value_Range val_vr (TREE_TYPE (c->val));
> range_op_handler handler (c->code, boolean_type_node);
>  
> ipa_range_set_and_normalize (val_vr, c->val);
>  
> if (!handler
> -   || !res.supports_type_p (boolean_type_node)
> +   || !val_vr.supports_type_p (TREE_TYPE (c->val))
> || !handler.fold_range (res, boolean_type_node, vr, 
> val_vr))
>   res.set_varying (boolean_type_node);
>  
> @@ -1687,12 +1688,17 @@ set_switch_stmt_execution_predicate (struct ipa_func_body_info *fbi,
>int bound_limit = opt_for_fn (fbi->node->decl,
>   param_ipa_max_switch_predicate_bounds);
>int bound_count = 0;
> -  value_range vr;
> +  // This can safely be an integer range, as switches can only hold
> +  // integers.
> +  int_range<2> vr;
>  
>get_range_query (cfun)->range_of_expr (vr, op);
>if (vr.undefined_p ())
>  vr.set_varying (TREE_TYPE (op));
>tree vr_min, vr_max;
> +  // ?? This entire function could use a rewrite to use the irange
> +  // API, instead of trying to recreate its intersection/union logic.
> +  // Any use of get_legacy_range() is a serious code smell.

you replace "??" with TODO, because that is presumably what you mean.

OK with that change.

Thanks,

Martin


>value_range_kind vr_type = get_legacy_range (vr, vr_min, vr_max);
>wide_int vr_wmin = wi::to_wide (vr_min);
>wide_int vr_wmax = wi::to_wide (vr_max);
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index 6383bc11e0a..5f9e6dbbff2 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -2348,7 +2348,6 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
>gcall *call = cs->call_stmt;
>int n, arg_num = gimple_call_num_args (call);
>bool useful_context = false;
> -  value_ra

Re: [PATCH] Implement ipa_vr hashing.

2023-06-26 Thread Martin Jambor
Hi,

On Mon, May 22 2023, Aldy Hernandez via Gcc-patches wrote:
> Implement hashing for ipa_vr.  When all is said and done, all these
> patches incur a 7.64% slowdown for ipa-cp, which is entirely covered by
> the similar 7% increase in this area last week.  So we get type agnostic
> ranges with "infinite" range precision close to free.
>
> There is no change in overall compilation.
>
> OK?
>

One small request

> gcc/ChangeLog:
>
>   * ipa-prop.cc (struct ipa_vr_ggc_hash_traits): Adjust for use with
>   ipa_vr instead of value_range.
>   (gt_pch_nx): Same.
>   (gt_ggc_mx): Same.
>   (ipa_get_value_range): Same.
>   * value-range.cc (gt_pch_nx): Move to ipa-prop.cc and adjust for
>   ipa_vr.
>   (gt_ggc_mx): Same.
> ---
>  gcc/ipa-prop.cc| 76 +++---
>  gcc/value-range.cc | 15 -
>  2 files changed, 45 insertions(+), 46 deletions(-)
>
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index c46a89f1b49..6383bc11e0a 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -109,53 +109,53 @@ struct ipa_bit_ggc_hash_traits : public ggc_cache_remove <ipa_bits *>
>  /* Hash table for avoid repeated allocations of equal ipa_bits.  */
>  static GTY ((cache)) hash_table<ipa_bit_ggc_hash_traits> *ipa_bits_hash_table;
>  
> -/* Traits for a hash table for reusing value_ranges used for IPA.  Note that
> -   the equiv bitmap is not hashed and is expected to be NULL.  */
> +/* Traits for a hash table for reusing ranges.  */
>  
> -struct ipa_vr_ggc_hash_traits : public ggc_cache_remove <value_range *>
> +struct ipa_vr_ggc_hash_traits : public ggc_cache_remove <ipa_vr *>
>  {
> -  typedef value_range *value_type;
> -  typedef value_range *compare_type;
> +  typedef ipa_vr *value_type;
> +  typedef const vrange *compare_type;
>static hashval_t
> -  hash (const value_range *p)
> +  hash (const ipa_vr *p)
>  {
> -  tree min, max;
> -  value_range_kind kind = get_legacy_range (*p, min, max);
> -  inchash::hash hstate (kind);
> -  inchash::add_expr (min, hstate);
> -  inchash::add_expr (max, hstate);
> +  // This never get called, except in the verification code, as
> +  // ipa_get_value_range() calculates the hash itself.  This
> +  // function is mostly here for completness' sake.
> +  Value_Range vr;
> +  p->get_vrange (vr);
> +  inchash::hash hstate;
> +  add_vrange (vr, hstate);
>return hstate.end ();
>  }
>static bool
> -  equal (const value_range *a, const value_range *b)
> +  equal (const ipa_vr *a, const vrange *b)
>  {
> -  return (types_compatible_p (a->type (), b->type ())
> -   && *a == *b);
> +  return a->equal_p (*b);
>  }
>static const bool empty_zero_p = true;
>static void
> -  mark_empty (value_range *&p)
> +  mark_empty (ipa_vr *&p)
>  {
>p = NULL;
>  }
>static bool
> -  is_empty (const value_range *p)
> +  is_empty (const ipa_vr *p)
>  {
>return p == NULL;
>  }
>static bool
> -  is_deleted (const value_range *p)
> +  is_deleted (const ipa_vr *p)
>  {
> -  return p == reinterpret_cast<const value_range *> (1);
> +  return p == reinterpret_cast<const ipa_vr *> (1);
>  }
>static void
> -  mark_deleted (value_range *&p)
> +  mark_deleted (ipa_vr *&p)
>  {
> -  p = reinterpret_cast<value_range *> (1);
> +  p = reinterpret_cast<ipa_vr *> (1);
>  }
>  };
>  
> -/* Hash table for avoid repeated allocations of equal value_ranges.  */
> +/* Hash table for avoid repeated allocations of equal ranges.  */
>  static GTY ((cache)) hash_table<ipa_vr_ggc_hash_traits> *ipa_vr_hash_table;
>  
>  /* Holders of ipa cgraph hooks: */
> @@ -265,6 +265,22 @@ ipa_vr::dump (FILE *out) const
>  fprintf (out, "NO RANGE");
>  }
>  
> +// ?? These stubs are because we use an ipa_vr in a hash_traits and
> +// hash-traits.h defines an extern of gt_ggc_mx (T &) instead of
> +// picking up the gt_ggc_mx (T *) version.

If you mean FIXME or TODO, please replace the "??" string with one of
those.  Otherwise please just remove it or specify what you mean in some
clearer way.

OK with that change.

Thanks,

Martin



> +void
> +gt_pch_nx (ipa_vr *&x)
> +{
> +  return gt_pch_nx ((ipa_vr *) x);
> +}
> +
> +void
> +gt_ggc_mx (ipa_vr *&x)
> +{
> +  return gt_ggc_mx ((ipa_vr *) x);
> +}
> +
> +

[...]


Re: [PATCH] Convert ipa_jump_func to use ipa_vr instead of a value_range.

2023-06-26 Thread Martin Jambor
Hi,

On Mon, May 22 2023, Aldy Hernandez via Gcc-patches wrote:
> This patch converts the ipa_jump_func code to use the type agnostic
> ipa_vr suitable for GC instead of value_range which is integer specific.
>
> I've disabled the range caching to simplify the patch for review, but
> it is handled in the next patch in the series.
>
> OK?
>
> gcc/ChangeLog:
>
>   * ipa-cp.cc (ipa_vr_operation_and_type_effects): New.
>   * ipa-prop.cc (ipa_get_value_range): Adjust for ipa_vr.
>   (ipa_set_jfunc_vr): Take a range.
>   (ipa_compute_jump_functions_for_edge): Pass range to
>   ipa_set_jfunc_vr.
>   (ipa_write_jump_function): Call streamer write helper.
>   (ipa_read_jump_function): Call streamer read helper.
>   * ipa-prop.h (class ipa_vr): Change m_vr to an ipa_vr.

OK, thanks, and sorry for the wait; I was unexpectedly traveling
last week.

Martin

> ---
>  gcc/ipa-cp.cc   | 15 +++
>  gcc/ipa-prop.cc | 70 ++---
>  gcc/ipa-prop.h  |  5 +++-
>  3 files changed, 44 insertions(+), 46 deletions(-)
>
[...]


[PATCH] ipa-sra: Disable candidates with no known callers (PR 110276)

2023-06-16 Thread Martin Jambor
Hi,

In IPA-SRA we use the can_be_local_p () predicate rather than just the plain
local call graph flag in order to figure out whether the node is a
part of an external API that we cannot change.  Although there are
cases where this can allow more transformations, it also means we can
analyze functions which have no callers at all, which is pointless.

Moreover, it makes an assert of hint propagation trigger, which checks
that we have looked at callers before processing hints that come from
them.  This has been reported as PR 110276.

This patch simply adds a check that a node has at least one caller
into the early checks and makes the node a non-candidate for any
transformation if it does not.

Bootstrapped and tested on x86_64-linux, LTO bootstrap is still
underway.  OK if it passes too?

Thanks,

Martin


gcc/ChangeLog:

2023-06-16  Martin Jambor  

PR ipa/110276
* ipa-sra.cc (struct caller_issues): New field there_is_one.
(check_for_caller_issues): Set it.
(check_all_callers_for_issues): Check it.

gcc/testsuite/ChangeLog:

2023-06-16  Martin Jambor  

PR ipa/110276
* gcc.dg/ipa/pr110276.c: New test.
---
 gcc/ipa-sra.cc  | 11 +++
 gcc/testsuite/gcc.dg/ipa/pr110276.c | 15 +++
 2 files changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr110276.c

diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index 3fee8fb22ce..21d281a9756 100644
--- a/gcc/ipa-sra.cc
+++ b/gcc/ipa-sra.cc
@@ -3074,6 +3074,8 @@ struct caller_issues
   cgraph_node *candidate;
   /* There is a thunk among callers.  */
   bool thunk;
+  /* Set if there is at least one caller that is OK.  */
+  bool there_is_one;
   /* Call site with no available information.  */
   bool unknown_callsite;
   /* Call from outside the candidate's comdat group.  */
@@ -3116,6 +3118,8 @@ check_for_caller_issues (struct cgraph_node *node, void *data)
 
   if (csum->m_bit_aligned_arg)
issues->bit_aligned_aggregate_argument = true;
+
+  issues->there_is_one = true;
 }
   return false;
 }
@@ -3170,6 +3174,13 @@ check_all_callers_for_issues (cgraph_node *node)
   for (unsigned i = 0; i < param_count; i++)
(*ifs->m_parameters)[i].split_candidate = false;
 }
+  if (!issues.there_is_one)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "There is no call to %s that we can modify.  "
+"Disabling all modifications.\n", node->dump_name ());
+  return true;
+}
   return false;
 }
 
diff --git a/gcc/testsuite/gcc.dg/ipa/pr110276.c b/gcc/testsuite/gcc.dg/ipa/pr110276.c
new file mode 100644
index 000..5a1e2f3fb1c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr110276.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef long (*EFI_PCI_IO_PROTOCOL_CONFIG)();
+typedef struct {
+  EFI_PCI_IO_PROTOCOL_CONFIG Read;
+} EFI_PCI_IO_PROTOCOL_CONFIG_ACCESS;
+typedef struct {
+  EFI_PCI_IO_PROTOCOL_CONFIG_ACCESS Pci;
+} EFI_PCI_IO_PROTOCOL;
+int init_regs_0;
+static void __attribute__((constructor)) init(EFI_PCI_IO_PROTOCOL *pci_io) {
+  if (init_regs_0)
+pci_io->Pci.Read();
+}
-- 
2.40.1



[PATCH] Regenerate some autotools generated files (Was: Re: [PATCH v3] configure: Implement --enable-host-pie)

2023-06-16 Thread Martin Jambor
On Fri, Jun 16 2023, Marek Polacek wrote:
> On Fri, Jun 16, 2023 at 12:26:23PM +0200, Martin Jambor wrote:
>> Hello,
>> 
>> On Thu, Jun 15 2023, Marek Polacek via Gcc-patches wrote:
>> > On Mon, Jun 05, 2023 at 09:06:43PM -0600, Jeff Law wrote:
>> >> 
>> >> 
>> >> On 6/5/23 10:18, Marek Polacek via Gcc-patches wrote:
>> >> > Ping.  Anyone have any further comments?
>> >> Given this was approved before, but got reverted due to issues (which have
>> >> since been addressed) -- I think you might as well go forward and sooner
>> >> rather than later so that we can catch fallout earlier.
>> >
>> > Thanks, pushed now, after rebasing, adjusting the patch for
>> > r14-1385, and testing with and without --enable-host-pie on
>> > both Debian and Fedora.
>> >
>> > If something comes up and I can't fix it quickly enough, I'll
>> > have to revert the patch.  We'll see.
>> >
>> 
>> The script that regularly checks that the checked-in autotools-generated
>> files are in sync now complains about the following diff.  Unless someone
>> stops me because I overlooked something or for some other reason, I will
>> commit it later on as obvious.
>
> Please, go ahead.
>  
>> I wonder where the "line" differences come from, perhaps you added a
>> comment after running autoconf/automake/...?  The zlib/Makefile.in hunks
>
> Arg, I think I must've messed up the #lines when rebasing though I don't
> know what went wrong with zlib/Makefile.in.  But I don't think the latter
> will actually make any difference.
>
>> look like something we should have, though, even if I did not check whether
>> it makes any difference in practice.  And I want the checking script to
>> shut up too ;-)
>
> Thanks and sorry.
>

No worries, I have committed the following.

Thanks and have a nice weekend,

Martin



As discussed in
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621976.html this
should put the autotools-generated files in sync with what they were
generated from (and make an automated checker happy).

Tested by bootstrapping on top of a revision from only a few commits ago.

zlib/ChangeLog:

2023-06-16  Martin Jambor  

* Makefile.in: Regenerate.
* configure: Likewise.

gcc/ChangeLog:

2023-06-16  Martin Jambor  

* configure: Regenerate.
---
 gcc/configure| 4 ++--
 zlib/Makefile.in | 2 ++
 zlib/configure   | 4 ++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index a4563a9cade..f7b4b283ca2 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -19847,7 +19847,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19848 "configure"
+#line 19850 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -19953,7 +19953,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19954 "configure"
+#line 19956 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/zlib/Makefile.in b/zlib/Makefile.in
index 3f5102d1b87..80fe3b69116 100644
--- a/zlib/Makefile.in
+++ b/zlib/Makefile.in
@@ -353,6 +353,8 @@ datadir = @datadir@
 datarootdir = @datarootdir@
 docdir = @docdir@
 dvidir = @dvidir@
+enable_host_pie = @enable_host_pie@
+enable_host_shared = @enable_host_shared@
 exec_prefix = @exec_prefix@
 host = @host@
 host_alias = @host_alias@
diff --git a/zlib/configure b/zlib/configure
index 77be6c284e3..9308866a636 100755
--- a/zlib/configure
+++ b/zlib/configure
@@ -10763,7 +10763,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 10778 "configure"
+#line 10766 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -10869,7 +10869,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 10884 "configure"
+#line 10872 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
-- 
2.40.1





Re: [PATCH v3] configure: Implement --enable-host-pie

2023-06-16 Thread Martin Jambor
Hello,

On Thu, Jun 15 2023, Marek Polacek via Gcc-patches wrote:
> On Mon, Jun 05, 2023 at 09:06:43PM -0600, Jeff Law wrote:
>> 
>> 
>> On 6/5/23 10:18, Marek Polacek via Gcc-patches wrote:
>> > Ping.  Anyone have any further comments?
>> Given this was approved before, but got reverted due to issues (which have
>> since been addressed) -- I think you might as well go forward and sooner
>> rather than later so that we can catch fallout earlier.
>
> Thanks, pushed now, after rebasing, adjusting the patch for
> r14-1385, and testing with and without --enable-host-pie on
> both Debian and Fedora.
>
> If something comes up and I can't fix it quickly enough, I'll
> have to revert the patch.  We'll see.
>

The script that regularly checks that the checked-in autotools-generated
files are in sync now complains about the following diff.  Unless someone
stops me because I overlooked something or for some other reason, I will
commit it later on as obvious.

I wonder where the "line" differences come from, perhaps you added a
comment after running autoconf/automake/...?  The zlib/Makefile.in hunks
look like something we should have, though, even if I did not check whether
it makes any difference in practice.  And I want the checking script to
shut up too ;-)

Thanks,

Martin


diff --git a/gcc/configure b/gcc/configure
index a4563a9cade..f7b4b283ca2 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -19847,7 +19847,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19848 "configure"
+#line 19850 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -19953,7 +19953,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19954 "configure"
+#line 19956 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/zlib/Makefile.in b/zlib/Makefile.in
index 3f5102d1b87..80fe3b69116 100644
--- a/zlib/Makefile.in
+++ b/zlib/Makefile.in
@@ -353,6 +353,8 @@ datadir = @datadir@
 datarootdir = @datarootdir@
 docdir = @docdir@
 dvidir = @dvidir@
+enable_host_pie = @enable_host_pie@
+enable_host_shared = @enable_host_shared@
 exec_prefix = @exec_prefix@
 host = @host@
 host_alias = @host_alias@
diff --git a/zlib/configure b/zlib/configure
index 77be6c284e3..9308866a636 100755
--- a/zlib/configure
+++ b/zlib/configure
@@ -10763,7 +10763,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 10778 "configure"
+#line 10766 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -10869,7 +10869,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 10884 "configure"
+#line 10872 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H


Re: [PATCH] ipa: Self-DCE of uses of removed call LHSs (PR 108007)

2023-06-13 Thread Martin Jambor
Ping.

Thanks,

Martin

On Fri, May 12 2023, Martin Jambor wrote:
> Hi,
>
> PR 108007 is another manifestation where we rely on DCE to clean-up
> after IPA-SRA and if the user explicitly switches DCE off, IPA-SRA
> can leave behind statements which are fed uninitialized values and
> trap, even though their results are themselves never used.
>
> I have already fixed this for unused parameters in callees, this bug
> shows that almost the same thing can happen for removed returns, on
> the side of callers.  This means that the issue has to be fixed
> elsewhere, in call redirection.  This patch adds a function which
> recursively looks for uses of operations fed specific SSA names and
> removes them all.
>
> That would have been easy if it wasn't for debug statements during
> tree-inline (from which call redirection is also invoked).  Debug
> statements are decoupled from the rest at this point and iterating
> over uses of SSAs does not bring them up.  During tree-inline they are
> handled specially at the end, I assume in order to make sure that
> relative ordering of UIDs is the same with and without debug info.
>
> This means that during tree-inline we need to make a hash of killed
> SSAs, that we already have in copy_body_data, available to the
> function making the purging.  So the patch duly does also that, making
> the interface slightly ugly.
>
> Bootstrapped and tested on x86_64-linux.  OK for master?  (I am not sure
> the problem is grave enough to warrant backporting to release branches
> but can do that as well if people think I should.)
>
> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2023-05-11  Martin Jambor  
>
>   PR ipa/108007
>   * cgraph.h (cgraph_edge): Add a parameter to
>   redirect_call_stmt_to_callee.
>   * ipa-param-manipulation.h (ipa_param_adjustments): Added a
>   parameter to modify_call.
>   * cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee): New
>   parameter killed_ssas, pass it to padjs->modify_call.
>   * ipa-param-manipulation.cc (purge_transitive_uses): New function.
>   (ipa_param_adjustments::modify_call): New parameter killed_ssas.
>   Instead of substituting uses, invoke purge_transitive_uses.  If
>   hash of killed SSAs has not been provided, create a temporary one
>   and release SSAs that have been added to it.
>   * tree-inline.cc (redirect_all_calls): Create
>   id->killed_new_ssa_names earlier, pass it to edge redirection,
>   adjust a comment.
>   (copy_body): Release SSAs in id->killed_new_ssa_names.
>
> gcc/testsuite/ChangeLog:
>
> 2023-05-11  Martin Jambor  
>
>   PR ipa/108007
>   * gcc.dg/ipa/pr108007.c: New test.
> ---
>  gcc/cgraph.cc   | 10 +++-
>  gcc/cgraph.h|  9 ++-
>  gcc/ipa-param-manipulation.cc   | 85 +
>  gcc/ipa-param-manipulation.h|  3 +-
>  gcc/testsuite/gcc.dg/ipa/pr108007.c | 32 +++
>  gcc/tree-inline.cc  | 28 ++
>  6 files changed, 129 insertions(+), 38 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr108007.c
>
> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
> index e8f9bec8227..5e923bf0557 100644
> --- a/gcc/cgraph.cc
> +++ b/gcc/cgraph.cc
> @@ -1403,11 +1403,17 @@ cgraph_edge::redirect_callee (cgraph_node *n)
> speculative indirect call, remove "speculative" of the indirect call and
> also redirect stmt to it's final direct target.
>  
> +   When called from within tree-inline, KILLED_SSAs has to contain the pointer
> +   to killed_new_ssa_names within the copy_body_data structure and SSAs
> +   discovered to be useless (if LHS is removed) will be added to it, otherwise
> +   it needs to be NULL.
> +
> It is up to caller to iteratively transform each "speculative"
> direct call as appropriate.  */
>  
>  gimple *
> -cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
> +cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e,
> +hash_set <tree> *killed_ssas)
>  {
>tree decl = gimple_call_fndecl (e->call_stmt);
>gcall *new_stmt;
> @@ -1527,7 +1533,7 @@ cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
>   remove_stmt_from_eh_lp (e->call_stmt);
>  
>tree old_fntype = gimple_call_fntype (e->call_stmt);
> -  new_stmt = padjs->modify_call (e, false);
> +  new_stmt = padjs->modify_call (e, false, killed_ssas);
>cgraph_node *origin = e->callee;
>while (origin->clone_of)
>   origin = origin->clone_of;
> diff --git a/gcc/cgraph.h b/gcc/cgra
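
The purging step described in the quoted message above (remove every
statement that is transitively fed by a killed SSA name, and record the
names those statements define as killed too) can be sketched on a toy
statement list.  This is only an illustration under simplifying
assumptions: Stmt, its fields and the string-based names are
hypothetical, and the sketch ignores debug statements, which are the
complicated part of the real patch.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy statement: defines at most one name and reads some names.
struct Stmt {
  std::string lhs;                // name defined here, empty if none
  std::vector<std::string> uses;  // names read by this statement
  bool removed = false;
};

// Mark as removed every statement that uses a name in KILLED, directly or
// through other removed statements.  Names defined by removed statements
// are added to KILLED so that the caller can release them afterwards.
void purge_transitive_uses(std::vector<Stmt> &body,
                           std::set<std::string> &killed) {
  std::vector<std::string> worklist(killed.begin(), killed.end());
  while (!worklist.empty()) {
    std::string name = worklist.back();
    worklist.pop_back();
    for (Stmt &s : body) {
      if (s.removed)
        continue;
      bool uses_name = false;
      for (const std::string &u : s.uses)
        if (u == name) {
          uses_name = true;
          break;
        }
      if (!uses_name)
        continue;
      s.removed = true;
      // Newly dead definitions feed further purging.
      if (!s.lhs.empty() && killed.insert(s.lhs).second)
        worklist.push_back(s.lhs);
    }
  }
}
```

For example, with statements a_2 = f (r_1), b_3 = a_2 + x_5 and
c_4 = x_5 * 2, killing r_1 removes the first two statements and leaves
the third one alone, because it never reads a killed name.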

Re: [PATCH] Convert ipcp_vr_lattice to type agnostic framework.

2023-06-10 Thread Martin Jambor
Hi,

thanks for dealing with my requests.

On Wed, Jun 07 2023, Aldy Hernandez wrote:
> On 5/26/23 18:17, Martin Jambor wrote:
>> Hello,
>> 
>> On Mon, May 22 2023, Aldy Hernandez wrote:
>>> I've adjusted the patch with some minor cleanups that came up when I
>>> implemented the rest of the IPA revamp.
>>>
>>> Retested.  OK?
>>>
>>> On Wed, May 17, 2023 at 4:31 PM Aldy Hernandez  wrote:
>>>>
>>>> This converts the lattice to store ranges in Value_Range instead of
>>>> value_range (*) to make it type agnostic, and adjust all users
>>>> accordingly.
>>>>
>>>> I think it is a good example on converting from static ranges to more
>>>> general, type agnostic ones.
>>>>
>>>> I've been careful to make sure Value_Range never ends up on GC, since
>>>> it contains an int_range_max and can expand on-demand onto the heap.
>>>> Longer term storage for ranges should be done with vrange_storage, as
>>>> per the previous patch ("Provide an API for ipa_vr").
>>>>
>>>> (*) I do know the Value_Range naming versus value_range is quite
>>>> annoying, but it was a judgement call last release for the eventual
>>>> migration to having "value_range" be a type agnostic range object.  We
>>>> will ultimately rename Value_Range to value_range.

[...]

>>>> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
>>>> index d4b9d4ac27e..bd5b1da17b2 100644
>>>> --- a/gcc/ipa-cp.cc
>>>> +++ b/gcc/ipa-cp.cc
>>>> @@ -343,20 +343,29 @@ private:
>>>>   class ipcp_vr_lattice
>>>>   {
>>>>   public:
>>>> -  value_range m_vr;
>>>> +  Value_Range m_vr;
>>>>
>>>> inline bool bottom_p () const;
>>>> inline bool top_p () const;
>>>> -  inline bool set_to_bottom ();
>>>> -  bool meet_with (const value_range *p_vr);
>>>> +  inline bool set_to_bottom (tree type);
>> 
>> Requiring a type when setting a lattice to bottom makes for a weird
>> interface, can't we set the underlying Value_Range to whatever...
>>
>>>> +  bool meet_with (const vrange &p_vr);
>>>> bool meet_with (const ipcp_vr_lattice &other);
>>>> -  void init () { gcc_assert (m_vr.undefined_p ()); }
>>>> +  void init (tree type);
>>>> void print (FILE * f);
>>>>
>>>>   private:
>>>> -  bool meet_with_1 (const value_range *other_vr);
>>>> +  bool meet_with_1 (const vrange &other_vr);
>>>>   };
>>>>
>>>> +inline void
>>>> +ipcp_vr_lattice::init (tree type)
>>>> +{
>>>> +  if (type)
>>>> +m_vr.set_type (type);
>>>> +
>>>> +  // Otherwise m_vr will default to unsupported_range.
>> 
>> ...this does?
>> 
>> All users of the lattice check it for not being bottom first, so it
>> should be safe.
>> 
>> If it is not possible for some reason, then I guess we should add a bool
>> flag to ipcp_vr_lattice instead, rather than looking up types of
>> unusable lattices.  ipcp_vr_lattices don't live for long.
>
> The type was my least favorite part of this work.  And yes, your 
> suggestion would work.  I have tweaked the patch to force a VARYING for 
> an unsupported range which seems to do the trick.  It looks much 
> cleaner.  Thanks.

This version is much better indeed.
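
For readers following along, the interface point can be shown with a
toy lattice in which going to bottom needs no type at all.  This is a
minimal sketch under my own assumptions: IntervalLattice, its explicit
state flag and the use of plain long bounds are hypothetical and are
not how ipcp_vr_lattice or Value_Range are implemented.  As in the
real lattice, meeting two values takes their union.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical toy lattice over closed integer intervals.
// TOP means nothing is known yet, BOTTOM means any value is possible.
class IntervalLattice {
public:
  bool top_p() const { return state_ == TOP; }
  bool bottom_p() const { return state_ == BOTTOM; }
  long lo() const { return lo_; }
  long hi() const { return hi_; }

  // Drop to bottom; no type is required.  Returns true if anything changed.
  bool set_to_bottom() {
    if (state_ == BOTTOM)
      return false;
    state_ = BOTTOM;
    return true;
  }

  // Meet with the interval [lo, hi] by taking the union of the two ranges.
  // Returns true if the lattice value changed.
  bool meet_with(long lo, long hi) {
    if (state_ == BOTTOM)
      return false;
    if (state_ == TOP) {
      state_ = RANGE;
      lo_ = lo;
      hi_ = hi;
      return true;
    }
    long new_lo = std::min(lo_, lo);
    long new_hi = std::max(hi_, hi);
    if (new_lo == lo_ && new_hi == hi_)
      return false;
    lo_ = new_lo;
    hi_ = new_hi;
    return true;
  }

private:
  enum State { TOP, RANGE, BOTTOM };
  State state_ = TOP;
  long lo_ = 0;
  long hi_ = 0;
};
```

Since users check bottom_p () before looking at the range, the stored
bounds of a bottom lattice never need to be meaningful, which is why a
typeless set_to_bottom () is sufficient in this sketch.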

[...]

>>>> @@ -1912,29 +1917,33 @@ ipa_vr_operation_and_type_effects (value_range *dst_vr,
>>>>   return false;
>>>>
>>>> range_op_handler handler (operation, dst_type);
>>>> -  return (handler
>>>> - && handler.fold_range (*dst_vr, dst_type,
>>>> -*src_vr, value_range (dst_type))
>>>> - && !dst_vr->varying_p ()
>>>> - && !dst_vr->undefined_p ());
>>>> +  if (!handler)
>>>> +return false;
>>>> +
>>>> +  Value_Range varying (dst_type);
>>>> +  varying.set_varying (dst_type);
>>>> +
>>>> +  return (handler.fold_range (dst_vr, dst_type, src_vr, varying)
>>>> + && !dst_vr.varying_p ()
>>>> + && !dst_vr.undefined_p ());
>>>>   }
>>>>
>>>>   /* Determine value_range of JFUNC given that INFO describes the caller 
