Re: [PATCH] Fix incorrect computation in fill_always_executed_in_1

2021-08-16 Thread Xionghu Luo via Gcc-patches




On 2021/8/17 13:17, Xionghu Luo via Gcc-patches wrote:

Hi,

On 2021/8/16 19:46, Richard Biener wrote:

On Mon, 16 Aug 2021, Xiong Hu Luo wrote:


It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
nested loops.  inn_loop is updated to the inner loop, so it needs to be
restored when exiting from the innermost loop.  With this patch, the store
instruction in the outer loop can also be moved out of the outer loop by
store motion.  Any comments?  Thanks.



gcc/ChangeLog:

* tree-ssa-loop-im.c (fill_always_executed_in_1): Restore
inn_loop when exiting from innermost loop.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-lim-19.c: New test.
---
  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++
  gcc/tree-ssa-loop-im.c |  6 +-
  2 files changed, 29 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
new file mode 100644
index 000..097a5ee4a4b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
@@ -0,0 +1,24 @@
+/* PR/101293 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+struct X { int i; int j; int k;};
+
+void foo(struct X *x, int n, int l)
+{
+  for (int j = 0; j < l; j++)
+    {
+      for (int i = 0; i < n; ++i)
+        {
+          int *p = &x->j;
+          int tem = *p;
+          x->j += tem * i;
+        }
+      int *r = &x->k;
+      int tem2 = *r;
+      x->k += tem2 * j;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } } */
+
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index b24bc64f2a7..5ca4738b20e 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
           if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
             last = bb;
 
+          if (inn_loop != loop
+              && flow_loop_nested_p (bb->loop_father, inn_loop))
+            inn_loop = bb->loop_father;
+


The comment says

   /* In a loop that is always entered we may proceed anyway.
      But record that we entered it and stop once we leave it.  */
   inn_loop = bb->loop_father;

and your change would defeat that early return, no?


The issue is that the search exits too early when iterating over the outer
loop.  For example, in the nested loop below, loop 1 contains bbs 5, 8, 3,
10, 4 and 9, and loop 2 contains bbs 3 and 10.  Currently the search breaks
at bb 3, because bb 3 doesn't dominate bb 9 (the latch of loop 1).  But
both bb 5 and bb 4 are actually ALWAYS_EXECUTED for loop 1, so any store
instructions in bb 4 won't be processed by store motion.


     5<----
     |\   |
     8 \  9
     |  \ |
 --->3--->4
|    |
 ----10





On master, SET_ALWAYS_EXECUTED_IN is currently applied only to bb 5.  With
this patch, the search continues past bb 3 until it reaches bb 4, so last is
updated to bb 4; the search then stops at bb 4, where the exit edge is found
by "if (!flow_bb_inside_loop_p (loop, e->dest))".  The loop that follows
then marks bb 4, and its immediate dominator bb 5, as ALWAYS_EXECUTED.


      while (1)
        {
          SET_ALWAYS_EXECUTED_IN (last, loop);
          if (last == loop->header)
            break;
          last = get_immediate_dominator (CDI_DOMINATORS, last);
        }

After further discussion with Kewen, we found that the inn_loop variable is
totally useless and could be removed.





           if (bitmap_bit_p (contains_call, bb->index))
             break;
 
@@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
 
           if (bb->loop_father->header == bb)
             {
-              if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+              if (!dominated_by_p (CDI_DOMINATORS, bb->loop_father->latch, bb))
                 break;


That's now an always-false condition: a loop's latch is always dominated
by its header.  The condition as written tries to verify whether the
loop is always entered - mind we visit all blocks, not only those
always executed.
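Both observations can be checked on the example CFG with the textbook iterative dominator computation, sketched below in plain C (this is not GCC's dominance code). bb 2 as the entry and bb 6 as the exit target are hypothetical block numbers added so the graph is complete; the other edges are the ones from the example. The checks show that each loop header dominates its own latch, so the patched test can never fire, while bb 3 does not dominate bb 9, which is why master breaks at bb 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBB 11  /* block indices 0..10 */
#define ENTRY 2 /* hypothetical entry block feeding the loop 1 header */

struct cfg_edge { int src, dest; };

/* Edges of the example CFG; 2->5 and 4->6 (loop exit) use hypothetical
   block numbers so that the graph has an entry and an exit.  */
static const struct cfg_edge edges[] = {
  { 2, 5 }, { 5, 8 }, { 5, 4 }, { 8, 3 }, { 3, 10 },
  { 10, 3 }, { 3, 4 }, { 4, 9 }, { 4, 6 }, { 9, 5 }
};
static const int blocks[] = { 2, 5, 8, 3, 10, 4, 9, 6 };

#define N_EDGES (sizeof edges / sizeof edges[0])
#define N_BLOCKS (sizeof blocks / sizeof blocks[0])

/* dom[b] has bit d set iff block d dominates block b.  */
static uint32_t dom[NBB];

/* Iterative data flow: dom(entry) = {entry}, and
   dom(b) = {b} | intersection of dom(p) over all preds p, to a fixpoint.  */
static void
compute_dominators (void)
{
  uint32_t all = 0;
  for (size_t i = 0; i < N_BLOCKS; i++)
    all |= UINT32_C (1) << blocks[i];
  for (size_t i = 0; i < N_BLOCKS; i++)
    dom[blocks[i]] = all;
  dom[ENTRY] = UINT32_C (1) << ENTRY;

  bool changed = true;
  while (changed)
    {
      changed = false;
      for (size_t i = 0; i < N_BLOCKS; i++)
        {
          int b = blocks[i];
          if (b == ENTRY)
            continue;
          uint32_t meet = all;
          for (size_t e = 0; e < N_EDGES; e++)
            if (edges[e].dest == b)
              meet &= dom[edges[e].src];
          uint32_t updated = meet | (UINT32_C (1) << b);
          if (updated != dom[b])
            {
              dom[b] = updated;
              changed = true;
            }
        }
    }
}

/* Same argument order as GCC's dominated_by_p: is A dominated by B?  */
static bool
dominated_by (int a, int b)
{
  assert (a >= 0 && a < NBB && b >= 0 && b < NBB);
  return (dom[a] >> b) & 1u;
}
```

After compute_dominators (), dominated_by (9, 5) and dominated_by (10, 3) both hold, dominated_by (9, 3) does not, and dominated_by (9, 4) does, so bb 4 is a legitimate ALWAYS_EXECUTED candidate for loop 1.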


Thanks for the catch!  I think this piece of code should be removed, since
it stops the search for potential ALWAYS_EXECUTED bbs after the inner
loop...



In fact for your testcase the x->j ref is _not_ always executed
since the inner loop is conditional on n > 0.


Yes.  But I want store motion to move x->k (not x->j) out of loop 1 when
l > 0.  I attached the diff file (without vs. with my patch) to show the
extra optimization.
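At the source level, the extra optimization corresponds roughly to the hand-written transformation below (an illustration, not actual lim2 output): x->k is loaded once before loop 1, kept in a local, and stored once after it. The l > 0 guard reflects that the original store only runs when loop 1 is entered. foo is the testcase from the patch; foo_after_sm is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

struct X { int i; int j; int k; };

/* The testcase from the patch: loop 1 iterates over j, loop 2 over i.  */
void
foo (struct X *x, int n, int l)
{
  for (int j = 0; j < l; j++)
    {
      for (int i = 0; i < n; ++i)
        {
          int *p = &x->j;
          int tem = *p;
          x->j += tem * i;
        }
      int *r = &x->k;
      int tem2 = *r;
      x->k += tem2 * j;
    }
}

/* Hypothetical hand-written result of store motion for x->k only:
   load before loop 1, update a local, store after loop 1.  */
void
foo_after_sm (struct X *x, int n, int l)
{
  assert (x != NULL);
  if (l > 0)   /* the store only happens if loop 1 is entered */
    {
      int k = x->k;
      for (int j = 0; j < l; j++)
        {
          for (int i = 0; i < n; ++i)
            {
              int tem = x->j;
              x->j += tem * i;
            }
          k += k * j;
        }
      x->k = k;
    }
}
```

Because x->j and x->k are distinct fields, keeping x->k in a local does not change what the inner loop sees, so both functions leave the struct in the same state.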


x->j is already moved out of loop 2 on master.  If n and l are changed to
constants such as 100, master also performs both store motions as
expected: the edge from bb 5 to bb 4 no longer exists, so bb 4, bb 3 and
bb 5 are all ALWAYS_EXECUTED for loop 1.


struct X { int i; int j; int k;};

void foo(struct X *x, int n, int l)
{
  for (int j = 0; j < l; 

Re: [PATCH] Fix incorrect computation in fill_always_executed_in_1

2021-08-16 Thread Xionghu Luo via Gcc-patches

Hi,

On 2021/8/16 19:46, Richard Biener wrote:

On Mon, 16 Aug 2021, Xiong Hu Luo wrote:


It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
nested loops.  inn_loop is updated to the inner loop, so it needs to be
restored when exiting from the innermost loop.  With this patch, the store
instruction in the outer loop can also be moved out of the outer loop by
store motion.  Any comments?  Thanks.



gcc/ChangeLog:

* tree-ssa-loop-im.c (fill_always_executed_in_1): Restore
inn_loop when exiting from innermost loop.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-lim-19.c: New test.
---
  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++
  gcc/tree-ssa-loop-im.c |  6 +-
  2 files changed, 29 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
new file mode 100644
index 000..097a5ee4a4b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
@@ -0,0 +1,24 @@
+/* PR/101293 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+struct X { int i; int j; int k;};
+
+void foo(struct X *x, int n, int l)
+{
+  for (int j = 0; j < l; j++)
+    {
+      for (int i = 0; i < n; ++i)
+        {
+          int *p = &x->j;
+          int tem = *p;
+          x->j += tem * i;
+        }
+      int *r = &x->k;
+      int tem2 = *r;
+      x->k += tem2 * j;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } } */
+
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index b24bc64f2a7..5ca4738b20e 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
           if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
             last = bb;
 
+          if (inn_loop != loop
+              && flow_loop_nested_p (bb->loop_father, inn_loop))
+            inn_loop = bb->loop_father;
+


The comment says

   /* In a loop that is always entered we may proceed anyway.
      But record that we entered it and stop once we leave it.  */
   inn_loop = bb->loop_father;

and your change would defeat that early return, no?


The issue is that the search exits too early when iterating over the outer
loop.  For example, in the nested loop below, loop 1 contains bbs 5, 8, 3,
10, 4 and 9, and loop 2 contains bbs 3 and 10.  Currently the search breaks
at bb 3, because bb 3 doesn't dominate bb 9 (the latch of loop 1).  But
both bb 5 and bb 4 are actually ALWAYS_EXECUTED for loop 1, so any store
instructions in bb 4 won't be processed by store motion.


     5<----
     |\   |
     8 \  9
     |  \ |
 --->3--->4
|    |
 ----10



On master, SET_ALWAYS_EXECUTED_IN is currently applied only to bb 5.  With
this patch, the search continues past bb 3 until it reaches bb 4, so last is
updated to bb 4; the search then stops at bb 4, where the exit edge is found
by "if (!flow_bb_inside_loop_p (loop, e->dest))".  The loop that follows
then marks bb 4, and its immediate dominator bb 5, as ALWAYS_EXECUTED.


      while (1)
        {
          SET_ALWAYS_EXECUTED_IN (last, loop);
          if (last == loop->header)
            break;
          last = get_immediate_dominator (CDI_DOMINATORS, last);
        }

After further discussion with Kewen, we found that the inn_loop variable is
totally useless and could be removed.





           if (bitmap_bit_p (contains_call, bb->index))
             break;
 
@@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
 
           if (bb->loop_father->header == bb)
             {
-              if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+              if (!dominated_by_p (CDI_DOMINATORS, bb->loop_father->latch, bb))
                 break;


That's now an always-false condition: a loop's latch is always dominated
by its header.  The condition as written tries to verify whether the
loop is always entered - mind we visit all blocks, not only those
always executed.


Thanks for the catch!  I think this piece of code should be removed, since
it stops the search for potential ALWAYS_EXECUTED bbs after the inner
loop...



In fact for your testcase the x->j ref is _not_ always executed
since the inner loop is conditional on n > 0.


Yes.  But I want store motion to move x->k (not x->j) out of loop 1 when
l > 0.  I attached the diff file (without vs. with my patch) to show the
extra optimization.

x->j is already moved out of loop 2 on master.  If n and l are changed to
constants such as 100, master also performs both store motions as
expected: the edge from bb 5 to bb 4 no longer exists, so bb 4, bb 3 and
bb 5 are all ALWAYS_EXECUTED for loop 1.


struct X { int i; int j; int k;};

void foo(struct X *x, int n, int l)
{
 for (int j = 0; j < l; j++) // loop 1
   {
 for (int i = 0; i < n; ++i)  // loop 2
   {
 int *p = &x->j;
 int tem 

Re: [PATCH] c++: fix -fsanitize-coverage=trace-pc ICE [PR101331]

2021-08-16 Thread Jeff Law via Gcc-patches




On 8/13/2021 2:05 AM, Martin Liška wrote:

Return content of flag_sanitize_coverage when fn is null.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

PR c++/101331

gcc/ChangeLog:

* asan.h (sanitize_coverage_p): Handle when fn == NULL.

gcc/testsuite/ChangeLog:

* g++.dg/pr101331.C: New test.

Assuming that FN can legitimately have a NULL value here, OK.
jeff



[committed] Improve SImode shifts for H8

2021-08-16 Thread Jeff Law via Gcc-patches


Similar to the H8/300H patch, this improves SImode shifts for the H8/S.  
It's not as big a win on the H8/S since we can shift two positions at a 
time.  But that also means that we can handle more residuals with 
minimal code growth after a special shift-by-16 or shift-by-24 sequence.


I think there's more to do here, but this seemed like as good a 
checkpoint as any.  Tested without regressions and installed on the trunk.


Jeff


commit 75a7176575c409940b66020def23508f5701f5fb
Author: Jeff Law 
Date:   Mon Aug 16 22:23:30 2021 -0400

Improve SImode shifts for H8

Similar to the H8/300H patch, this improves SImode shifts for the H8/S.
It's not as big a win on the H8/S since we can shift two positions at a
time.  But that also means that we can handle more residuals with minimal
code growth after a special shift-by-16 or shift-by-24 sequence.

I think there's more to do here, but this seemed like as good a checkpoint
as any.  Tested without regressions.

gcc/
* config/h8300/h8300.c (shift_alg_si): Avoid loops for most SImode
shifts on the H8/S.
(h8300_option_override): Use loops on H8/S more often when optimizing
for size.
(get_shift_alg): Handle new "special" cases on H8/S.  Simplify
accordingly.  Handle various arithmetic right shifts with special
sequences that we couldn't handle before.
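As a rough model of how the updated table and the new size overrides interact, the C sketch below copies the three rows from the diff that follows (assuming, as the commit message suggests, that they are the H8/S rows) and applies the counts that h8300_option_override switches to loops when optimizing for size. INL, SPC and LOP are local stand-ins for the names used in the diff, h8s_si_alg is a hypothetical helper, and the real get_shift_alg does considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum shift_alg { INL, SPC, LOP };   /* inline, special sequence, loop */
enum shift_type { SHIFT_ASHIFT, SHIFT_LSHIFTRT, SHIFT_ASHIFTRT };

/* Rows copied from the diff, indexed by shift count 0..31.  */
static const enum shift_alg h8s_si[3][32] = {
  { INL, INL, INL, INL, INL, INL, INL, INL,
    INL, INL, INL, INL, INL, INL, INL, SPC,
    SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC,
    SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_ASHIFT   */
  { INL, INL, INL, INL, INL, INL, INL, INL,
    INL, INL, INL, INL, INL, INL, INL, SPC,
    SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC,
    SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_LSHIFTRT */
  { INL, INL, INL, INL, INL, INL, INL, INL,
    INL, INL, INL, INL, INL, INL, INL, LOP,
    SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC,
    SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_ASHIFTRT */
};

/* Counts switched to SHIFT_LOOP for all three shift types when
   optimizing for size (from the h8300_option_override hunk).  */
static const int size_loop_counts[] = { 11, 12, 13, 14, 22, 23, 26, 27 };
#define N_SIZE_LOOP_COUNTS (sizeof size_loop_counts / sizeof size_loop_counts[0])

static enum shift_alg
h8s_si_alg (enum shift_type type, int count, bool optimize_size)
{
  assert (count >= 0 && count < 32);
  if (optimize_size)
    {
      for (size_t i = 0; i < N_SIZE_LOOP_COUNTS; i++)
        if (count == size_loop_counts[i])
          return LOP;
      /* Arithmetic right shifts additionally loop for counts 28..30.  */
      if (type == SHIFT_ASHIFTRT && count >= 28 && count <= 30)
        return LOP;
    }
  return h8s_si[type][count];
}
```

For example, a left shift by 12 is inlined by default but becomes a loop when optimizing for size, while shifts by 16 stay on the special sequence in both cases.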

diff --git a/gcc/config/h8300/h8300.c b/gcc/config/h8300/h8300.c
index 7959ad1e276..0c4e5089791 100644
--- a/gcc/config/h8300/h8300.c
+++ b/gcc/config/h8300/h8300.c
@@ -248,17 +248,17 @@ static enum shift_alg shift_alg_si[2][3][32] = {
 /* 16   17   18   19   20   21   22   23  */
 /* 24   25   26   27   28   29   30   31  */
 { INL, INL, INL, INL, INL, INL, INL, INL,
-  INL, INL, INL, LOP, LOP, LOP, LOP, SPC,
-  SPC, SPC, SPC, SPC, SPC, SPC, LOP, LOP,
-  SPC, SPC, LOP, LOP, SPC, SPC, SPC, SPC }, /* SHIFT_ASHIFT   */
+  INL, INL, INL, INL, INL, INL, INL, SPC,
+  SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC,
+  SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_ASHIFT   */
 { INL, INL, INL, INL, INL, INL, INL, INL,
-  INL, INL, INL, LOP, LOP, LOP, LOP, SPC,
-  SPC, SPC, SPC, SPC, SPC, SPC, LOP, LOP,
-  SPC, SPC, LOP, LOP, SPC, SPC, SPC, SPC }, /* SHIFT_LSHIFTRT */
+  INL, INL, INL, INL, INL, INL, INL, SPC,
+  SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC,
+  SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_LSHIFTRT */
 { INL, INL, INL, INL, INL, INL, INL, INL,
-  INL, INL, INL, LOP, LOP, LOP, LOP, LOP,
-  SPC, SPC, SPC, SPC, SPC, SPC, LOP, LOP,
-  SPC, SPC, LOP, LOP, LOP, LOP, LOP, SPC }, /* SHIFT_ASHIFTRT */
+  INL, INL, INL, INL, INL, INL, INL, LOP,
+  SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC,
+  SPC, SPC, SPC, SPC, SPC, SPC, SPC, SPC }, /* SHIFT_ASHIFTRT */
   }
 };
 
@@ -375,6 +375,36 @@ h8300_option_override (void)
 
   /* H8S */
   shift_alg_hi[H8_S][SHIFT_ASHIFTRT][14] = SHIFT_LOOP;
+
+  shift_alg_si[H8_S][SHIFT_ASHIFT][11] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFT][12] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFT][13] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFT][14] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFT][22] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFT][23] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFT][26] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFT][27] = SHIFT_LOOP;
+
+  shift_alg_si[H8_S][SHIFT_LSHIFTRT][11] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_LSHIFTRT][12] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_LSHIFTRT][13] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_LSHIFTRT][14] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_LSHIFTRT][22] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_LSHIFTRT][23] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_LSHIFTRT][26] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_LSHIFTRT][27] = SHIFT_LOOP;
+
+  shift_alg_si[H8_S][SHIFT_ASHIFTRT][11] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFTRT][12] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFTRT][13] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFTRT][14] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFTRT][22] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFTRT][23] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFTRT][26] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFTRT][27] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFTRT][28] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFTRT][29] = SHIFT_LOOP;
+  shift_alg_si[H8_S][SHIFT_ASHIFTRT][30] = SHIFT_LOOP;
 }
 
   /* Work out a value for MOVE_RATIO.  */
@@ -3814,8 +3844,7 @@ get_shift_alg (enum shift_type shift_type, enum shift_mode shift_mode,
  gcc_unreachable ();
}
}
-  else if ((TARGET_H8300H && count >= 16 && count <= 23)
-  || (TARGET_H8300S && count >= 16 && count <= 21))
+  else 

[committed] Drop embedded stabs from rl78-elf port

2021-08-16 Thread Jeff Law via Gcc-patches
So rl78-elf started failing various stabs tests a few days ago in my 
tester with a linker/assembler error. Given the plan to drop embedded 
stabs, I didn't bother to debug the failure and instead just killed 
embedded stabs for the rl78 port, which was trivial to do.


Committed to the trunk after verifying rl78-elf returned to normal 
testing state.


Jeff


commit 052bdc7f2ba4b56d1ff9625d69b97c23bc858309
Author: Jeff Law 
Date:   Mon Aug 16 19:03:56 2021 -0400

Drop embedded stabs from rl78 port

gcc/
* config.gcc (rl78-*-elf*): Do not include dbxelf.h.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 93e2b3219b9..7001a79b823 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3126,7 +3126,7 @@ rs6000-ibm-aix[789].* | powerpc-ibm-aix[789].*)
default_use_cxa_atexit=yes
;;
 rl78-*-elf*)
-   tm_file="dbxelf.h elfos.h newlib-stdint.h ${tm_file}"
+   tm_file="elfos.h newlib-stdint.h ${tm_file}"
target_has_targetm_common=no
c_target_objs="rl78-c.o"
cxx_target_objs="rl78-c.o"


[PATCH] more warning code refactoring

2021-08-16 Thread Martin Sebor via Gcc-patches

The attached patch continues with the move of warning code from
builtins.c and calls.c into a more suitable home.  As before, it
is mostly free of functional changes.  The one exception is that, as a
pleasant side effect, moving the attribute access checking from
initialize_argument_information() in calls.c to the new warning pass also
fixes PR 101854.  This is because the new pass iterates only over the
function arguments explicitly provided in the program, so it doesn't have
to skip over the additional pointer argument that
initialize_argument_information() sneaks into the argument list for calls
to functions that return a large struct by value.
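A toy model of the indexing problem described above, under stated assumptions: attribute access positions are 1-based and name the parameters written in the source, and the hidden return-slot pointer is assumed to be prepended to the lowered argument list. Code that walks the lowered list has to add that offset; code that walks only the explicit arguments, as the new pass does, does not. lowered_index and explicit_index are hypothetical helpers, not GCC functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: map a 1-based attribute position (as written in
   the source) to a 0-based index into the lowered argument list, which
   may start with a compiler-synthesized return-slot pointer.  */
static int
lowered_index (int attr_pos, bool has_hidden_retslot)
{
  assert (attr_pos >= 1);
  return attr_pos - 1 + (has_hidden_retslot ? 1 : 0);
}

/* Index into the explicitly provided arguments only: no offset needed.  */
static int
explicit_index (int attr_pos)
{
  assert (attr_pos >= 1);
  return attr_pos - 1;
}
```

Forgetting the extra term in lowered_index for a struct-returning call makes every attribute position point at the argument one slot too early, which is the kind of mismatch that iterating over the explicit arguments avoids.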

Tested on x86_64-linux.

Martin

Previous patches in this series:
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576821.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575377.html
Move more warning code to gimple-ssa-warn-access etc.

gcc/ChangeLog:

	PR middle-end/101854
	* builtins.c (expand_builtin_alloca): Move warning code to check_alloca
	in gimple-ssa-warn-access.cc.
	* calls.c (alloc_max_size): Move code to check_alloca.
	(get_size_range): Move to pointer-query.cc.
	(maybe_warn_alloc_args_overflow): Move to gimple-ssa-warn-access.cc.
	(get_attr_nonstring_decl): Move to tree.c.
	(fntype_argno_type): Move to gimple-ssa-warn-access.cc.
	(append_attrname): Same.
	(maybe_warn_rdwr_sizes): Same.
	(initialize_argument_information): Move code to
	gimple-ssa-warn-access.cc.
	* calls.h (maybe_warn_alloc_args_overflow): Move to
	gimple-ssa-warn-access.h.
	(get_attr_nonstring_decl): Move to tree.h.
	(maybe_warn_nonstring_arg):  Move to gimple-ssa-warn-access.h.
	(enum size_range_flags): Move to pointer-query.h.
	(get_size_range): Same.
	* gimple-ssa-warn-access.cc (get_size_range): Declare static.
	(maybe_emit_free_warning): Rename...
	(maybe_check_dealloc_call): ...to this for consistency.
	(class pass_waccess): Add members.
	(pass_waccess::~pass_waccess): Defined.
	(alloc_max_size): Move here from calls.c.
	(maybe_warn_alloc_args_overflow): Same.
	(check_alloca): New function.
	(check_alloc_size_call): New function.
	(check_strncat): Handle another warning flag.
	(pass_waccess::check_builtin): Handle alloca.
	(fntype_argno_type): Move here from calls.c.
	(append_attrname): Same.
	(maybe_warn_rdwr_sizes): Same.
	(pass_waccess::check_call): Define.
	(check_nonstring_args): New function.
	(pass_waccess::check): Call new member functions.
	(pass_waccess::execute): Enable ranger.
	* gimple-ssa-warn-access.h (get_size_range): Move here from calls.h.
	(maybe_warn_nonstring_arg): Same.
	* gimple-ssa-warn-restrict.c: Remove #include.
	* pointer-query.cc (get_size_range): Move here from calls.c.
	* pointer-query.h (enum size_range_flags): Same.
	(get_size_range): Same.
	* tree.c (get_attr_nonstring_decl): Move here from calls.c.
	* tree.h (get_attr_nonstring_decl): Move here from calls.h.

gcc/testsuite/ChangeLog:

	PR middle-end/101854
	* gcc.dg/Wstringop-overflow-72.c: New test.
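The condition that the diff removes from expand_builtin_alloca (and that the ChangeLog moves into check_alloca) can be restated as a small predicate. This is a sketch: INT64_MAX stands in for HOST_WIDE_INT_MAX (assuming a 64-bit HOST_WIDE_INT), and the parameter names simply mirror the option variables in the removed code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: HOST_WIDE_INT is 64 bits wide.  */
#define HWI_MAX_SKETCH INT64_MAX

/* Return true if the alloca argument should still be checked against
   -Walloc-size-larger-than: the more specific limit for this kind of
   call (-Wvla-larger-than for VLAs, -Walloca-larger-than otherwise) is
   not in effect (still at HOST_WIDE_INT_MAX or above), and the general
   limit is smaller than it.  */
static bool
should_check_alloca_args (bool alloca_for_var, int64_t warn_vla_limit,
                          int64_t warn_alloca_limit,
                          int64_t warn_alloc_size_limit)
{
  int64_t specific = alloca_for_var ? warn_vla_limit : warn_alloca_limit;
  assert (warn_alloc_size_limit >= 0);
  return specific >= HWI_MAX_SKETCH && warn_alloc_size_limit < specific;
}
```

For instance, with only -Walloc-size-larger-than=1024 in effect a plain alloca call is checked, but adding -Walloca-larger-than=4096 suppresses the check because that more specific option already covered the call.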

diff --git a/gcc/builtins.c b/gcc/builtins.c
index d2be807f1d6..99548627761 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -43,7 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "alias.h"
 #include "fold-const.h"
 #include "fold-const-call.h"
-#include "gimple-ssa-warn-restrict.h"
+#include "gimple-ssa-warn-access.h"
 #include "stor-layout.h"
 #include "calls.h"
 #include "varasm.h"
@@ -81,7 +81,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "demangle.h"
 #include "gimple-range.h"
 #include "pointer-query.h"
-#include "gimple-ssa-warn-access.h"
 
 struct target_builtins default_target_builtins;
 #if SWITCHABLE_TARGET
@@ -4896,25 +4895,6 @@ expand_builtin_alloca (tree exp)
   if (!valid_arglist)
 return NULL_RTX;
 
-  if ((alloca_for_var
-   && warn_vla_limit >= HOST_WIDE_INT_MAX
-   && warn_alloc_size_limit < warn_vla_limit)
-  || (!alloca_for_var
-	  && warn_alloca_limit >= HOST_WIDE_INT_MAX
-	  && warn_alloc_size_limit < warn_alloca_limit
-	  ))
-{
-  /* -Walloca-larger-than and -Wvla-larger-than settings of
-	 less than HOST_WIDE_INT_MAX override the more general
-	 -Walloc-size-larger-than so unless either of the former
-	 options is smaller than the last one (wchich would imply
-	 that the call was already checked), check the alloca
-	 arguments for overflow.  */
-  tree args[] = { CALL_EXPR_ARG (exp, 0), NULL_TREE };
-  int idx[] = { 0, -1 };
-  maybe_warn_alloc_args_overflow (fndecl, exp, args, idx);
-}
-
   /* Compute the argument.  */
   op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
 
diff --git a/gcc/calls.c b/gcc/calls.c
index fcb0d6dec69..e50d3fc3b62 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -17,7 +17,6 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#define INCLUDE_STRING
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -50,7 +49,6 @@ 

Re: [PATCH 4/6] Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.

2021-08-16 Thread Hongtao Liu via Gcc-patches
On Fri, Aug 6, 2021 at 2:06 PM Hongtao Liu  wrote:
>
> On Tue, Aug 3, 2021 at 10:44 AM Hongtao Liu  wrote:
> >
> > On Tue, Aug 3, 2021 at 3:34 AM Joseph Myers  wrote:
> > >
> > > On Mon, 2 Aug 2021, liuhongt via Gcc-patches wrote:
> > >
> > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > index 7979e240426..dc673c89bc8 100644
> > > > --- a/gcc/config/i386/i386.c
> > > > +++ b/gcc/config/i386/i386.c
> > > > @@ -23352,6 +23352,8 @@ ix86_get_excess_precision (enum excess_precision_type type)
> > > >   return (type == EXCESS_PRECISION_TYPE_STANDARD
> > > >   ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
> > > >   : FLT_EVAL_METHOD_UNPREDICTABLE);
> > > > +  case EXCESS_PRECISION_TYPE_FLOAT16:
> > > > + return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> > > >default:
> > > >   gcc_unreachable ();
> > > >  }
> > >
> > > I'd expect an error for -fexcess-precision=16 with -mfpmath=387 (since x87
> > > doesn't do float or double arithmetic, but -fexcess-precision=16 implies
> > > that all of _Float16, float and double are represented to the range and
> > > precision of their type without any excess precision).
> > >
> > Yes, additional changes like this.
> >
> > modified   gcc/config/i386/i386.c
> > @@ -23443,6 +23443,9 @@ ix86_get_excess_precision (enum excess_precision_type type)
> >   ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
> >   : FLT_EVAL_METHOD_UNPREDICTABLE);
> >case EXCESS_PRECISION_TYPE_FLOAT16:
> > + if (TARGET_80387
> > + && !(TARGET_SSE_MATH && TARGET_SSE))
> > +   error ("%<-fexcess-precision=16%> is not compatible with %<-mfpmath=387%>");
> >   return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> >default:
> >   gcc_unreachable ();
> > new file   gcc/testsuite/gcc.target/i386/float16-7.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mfpmath=387 -fexcess-precision=16" } */
> > +/* { dg-excess-errors "'-fexcess-precision=16' is not compatible with '-mfpmath=387'" } */
> > +_Float16
> > +foo (_Float16 a, _Float16 b)
> > +{
> > +  return a + b;/* { dg-error "'-fexcess-precision=16' is not compatible with '-mfpmath=387'" } */
> > +}
> > +
> >
> > > --
> > > Joseph S. Myers
> > > jos...@codesourcery.com
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
> Updated patch and ping for it.
>
> Also for backend changes.
> 1. For backends such as m68k/s390 that don't support _Float16 at all,
> the backend will issue an error for -fexcess-precision=16; I think that
> should be fine.
> 2. For backends such as arm/aarch64 that support _Float16, the backend
> will set FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 for -fexcess-precision=16
> even if hardware fp16 instructions are not supported.  Would that be OK
> for arm?

Ping for this patch.

> --
> BR,
> Hongtao



-- 
BR,
Hongtao


Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.

2021-08-16 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 9, 2021 at 4:34 PM Hongtao Liu  wrote:
>
> On Fri, Aug 6, 2021 at 7:27 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford
> >  wrote:
> > >
> > > Richard Biener via Gcc-patches  writes:
> > > > On Fri, Aug 6, 2021 at 5:32 AM liuhongt  wrote:
> > > >>
> > > >> Hi:
> > > >> ---
> > > >> OK, I think something is amiss here upthread.  insv/extv do look
> > > >> like they are designed to work on integer modes (but docs do not say
> > > >> anything about this here).  In fact the caller of
> > > >> extract_bit_field_using_extv is named extract_integral_bit_field.
> > > >> Of course nothing seems to check what kind of modes we're dealing
> > > >> with, but we're for example happily doing expand_shift in 'mode'.
> > > >> In the extract_integral_bit_field call 'mode' is some integer mode
> > > >> and op0 is HFmode?  From the above I get it's the other way around?
> > > >> In that case we should wrap the call to extract_integral_bit_field,
> > > >> extracting in an integer mode with the same size as 'mode' and then
> > > >> converting the result as (subreg:HF (reg:HI ...)).
> > > >> ---
> > > >>   This is a separate patch as a follow up of upper comments.
> > > >>
> > > >> gcc/ChangeLog:
> > > >>
> > > >> * expmed.c (extract_bit_field_1): Wrap the call to
> > > >> extract_integral_bit_field, extracting in an integer mode with
> > > >> the same size as 'tmode' and then converting the result
> > > >> as (subreg:tmode (reg:imode)).
> > > >>
> > > >> gcc/testsuite/ChangeLog:
> > > >> * gcc.target/i386/float16-5.c: New test.
> > > >> ---
> > > >>  gcc/expmed.c  | 19 +++
> > > >>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 
> > > >>  2 files changed, 31 insertions(+)
> > > >>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
> > > >>
> > > >> diff --git a/gcc/expmed.c b/gcc/expmed.c
> > > >> index 3143f38e057..72790693ef0 100644
> > > >> --- a/gcc/expmed.c
> > > >> +++ b/gcc/expmed.c
> > > >> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
> > > >>op0_mode = opt_scalar_int_mode ();
> > > >>  }
> > > >>
> > > >> +  /* Make sure we are playing with integral modes.  Pun with subregs
> > > >> + if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> > > >> + in extract_integral_bit_field.  */
> > > >> +  if (int_mode_for_mode (tmode).exists ()
> > > >
> > > > check !INTEGRAL_MODE_P (tmode) before, that should be slightly
> > > > cheaper.  Then imode should always be != tmode.  Maybe
> > > > even GET_MODE_CLASS (tmode) != MODE_INT since I'm not sure
> > > > how it behaves for composite modes.
> > > >
> > > > Of course the least surprises would happen when we restrict this
> > > > to FLOAT_MODE_P (tmode).
> > > >
> > > > Richard - any preferences?
> > >
> > > If the bug is that extract_integral_bit_field is being called with
> > > a non-integral mode parameter, then it looks odd that we can still
> > > fall through to it without an integral mode (when exists is false).
> > >
> > > If calling extract_integral_bit_field without an integral mode is
> > > a bug then I think we should have:
> > >
> > >   int_mode_for_mode (mode).require ()
> > >
> > > whenever mode is not already SCALAR_INT_MODE_P/is_a.
> > > Ideally we'd make the mode parameter scalar_int_mode too.
> > >
> > > extract_integral_bit_field currently has:
> > >
> > >   /* Find a correspondingly-sized integer field, so we can apply
> > >  shifts and masks to it.  */
> > >   scalar_int_mode int_mode;
> > >   if (!int_mode_for_mode (tmode).exists (&int_mode))
> > > /* If this fails, we should probably push op0 out to memory and then
> > >do a load.  */
> > > int_mode = int_mode_for_mode (mode).require ();
> > >
> > > which would seem to be redundant after this change.
> >
> > I'm not sure what exactly the bug is, but extract_integral_bit_field ends
> > up creating a lowpart subreg that's not allowed and that ICEs (and I
> > can't see a way to check beforehand).  So it seems to me at least
> > part of that function doesn't expect non-integral extraction modes.
> >
> > But who knows - the code is older than I am (OK, not, but older than
> > my involvement in GCC ;))
> >
> How about attached patch w/ below changelog
>
> gcc/ChangeLog:
>
> * expmed.c (extract_bit_field_1): Make sure we're playing with
> integral modes before calling extract_integral_bit_field.
> (extract_integral_bit_field): Add a parameter of type
> scalar_int_mode which corresponds to tmode.
> And call extract_and_convert_fixed_bit_field instead of
> extract_fixed_bit_field and convert_extracted_bit_field.
> (extract_and_convert_fixed_bit_field): New function, it's a
> combination of extract_fixed_bit_field 

Re: [PATCH] [i386] Optimize __builtin_shuffle_vector.

2021-08-16 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 16, 2021 at 3:25 PM Hongtao Liu  wrote:
>
> On Mon, Aug 16, 2021 at 3:11 PM Jakub Jelinek via Gcc-patches
>  wrote:
> >
> > On Mon, Aug 16, 2021 at 01:18:38PM +0800, liuhongt via Gcc-patches wrote:
> > > +  /* Accept VNxHImode and VNxQImode now.  */
> > > +  if (!TARGET_AVX512VL && GET_MODE_SIZE (mode) < 64)
> > > +return false;
> > > +
> > > +  /* vpermw.  */
> > > +  if (!TARGET_AVX512BW && inner_size == 2)
> > > +return false;
> > > +
> > > +  /* vpermb.   */
> >
> > Too many spaces after dot.
> >
> > > @@ -18301,7 +18380,7 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
> > >if (expand_vec_perm_palignr (d, true))
> > >  return true;
> > >
> > > -  /* Try the AVX512F vperm{s,d} instructions.  */
> > > +  /* Try the AVX512F vperm{w,b,s,d} and instructions  */
> >
> > What is the " and" doing there?
> Typo.
> >
> > > +  /* Check that the permutation is suitable for pmovz{bw,wd,dq}.
> > > + For example V16HImode to V8HImode
> > > + { 0 2 4 6 8 10 12 14 * * * * * * * * }.  */
> > > +  for (int i = 0; i != nelt/2; i++)
> >
> > nelt / 2 please
> >
> > Otherwise LGTM.
> >
> Thanks for the review.
> > Jakub
> >
>
>
> --
> BR,
> Hongtao
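The pmovz{bw,wd,dq} suitability check quoted above can be sketched in plain C as follows: only the first nelt / 2 lanes are constrained to select elements 0, 2, 4, ..., and the upper half is don't-care. perm_is_even_truncation is a hypothetical name, and the real check in the i386 backend may test additional conditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if the first NELT / 2 lanes of PERM select
   elements 0, 2, 4, ...; the upper NELT / 2 lanes are don't-care.  */
static bool
perm_is_even_truncation (const unsigned *perm, unsigned nelt)
{
  assert (nelt % 2 == 0);
  for (unsigned i = 0; i != nelt / 2; i++)
    if (perm[i] != 2 * i)
      return false;
  return true;
}
```

For the V16HImode example, { 0 2 4 6 8 10 12 14 * * * * * * * * } passes whatever the upper eight lanes hold, while an odd-lane selection such as { 1 3 5 ... } does not.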

This patch caused FAIL: gcc.target/i386/pr82460-2.c scan-assembler-not
\\mvpermi2b\\M with -march=cascadelake

So adjust the testcase with the following patch.

[i386] Adjust testcase.

This testcase checks that the permutation mask is reused in the main loop.
In the epilogue vpermi2b can still be used, so add the option
--param=vect-epilogues-nomask=0.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr82460-2.c: Adjust testcase by adding
--param=vect-epilogues-nomask=0

diff --git a/gcc/testsuite/gcc.target/i386/pr82460-2.c b/gcc/testsuite/gcc.target/i386/pr82460-2.c
index 4a45beed715..8cdfb54f56a 100644
--- a/gcc/testsuite/gcc.target/i386/pr82460-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr82460-2.c
@@ -1,6 +1,6 @@
 /* PR target/82460 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -mavx512vbmi -mprefer-vector-width=none" } */
+/* { dg-options "-O2 -ftree-vectorize -mavx512vbmi -mprefer-vector-width=none --param=vect-epilogues-nomask=0" } */
 /* We want to reuse the permutation mask in the loop, so use vpermt2b rather
    than vpermi2b.  */
 /* { dg-final { scan-assembler-not {\mvpermi2b\M} } } */


--
BR,
Hongtao


Re: [PATCH] c++: Add C++20 #__VA_OPT__ support

2021-08-16 Thread Jason Merrill via Gcc-patches

On 7/15/21 1:52 PM, Jakub Jelinek wrote:

Hi!

The following patch implements C++20 # __VA_OPT__ (...) support.
Testcases cover what I came up with myself and what LLVM has for #__VA_OPT__
in its testsuite and the string literals are identical between the two
compilers on the va-opt-5.c testcase.

I haven't looked at the non-#__VA_OPT__ differences between LLVM and GCC,
though; I think at least the
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1042r1.html
#define H4(X, ...) __VA_OPT__(a X ## X) ## b
H4(, 1)  // replaced by a b
case isn't handled right (we emit ab).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2021-07-15  Jakub Jelinek  

libcpp/
* macro.c (vaopt_state): Add m_stringify member.
(vaopt_state::vaopt_state): Initialize it.
(vaopt_state::update): Overwrite it.
(vaopt_state::stringify): New method.
(stringify_arg): Replace arg argument with first, count arguments
and add va_opt argument.  Use first instead of arg->first and
count instead of arg->count, for va_opt add paste_tokens handling.
(paste_tokens): Fix up len calculation.  Don't spell rhs twice,
instead use %.*s to supply lhs and rhs spelling lengths.  Don't call
_cpp_backup_tokens here.
(paste_all_tokens): Call it here instead.
(replace_args): Adjust stringify_arg caller.  For vaopt_state::END
if stringify is true handle __VA_OPT__ stringification.
(create_iso_definition): Handle # __VA_OPT__ similarly to # macro_arg.
gcc/testsuite/
* c-c++-common/cpp/va-opt-5.c: New test.
* c-c++-common/cpp/va-opt-6.c: New test.

--- libcpp/macro.c.jj   2021-05-21 10:34:09.328560825 +0200
+++ libcpp/macro.c  2021-07-15 17:27:30.109631306 +0200
@@ -118,6 +118,7 @@ class vaopt_state {
  m_arg (arg),
  m_variadic (is_variadic),
  m_last_was_paste (false),
+m_stringify (false),
  m_state (0),
  m_paste_location (0),
  m_location (0),
@@ -145,6 +146,7 @@ class vaopt_state {
  }
++m_state;
m_location = token->src_loc;
+   m_stringify = (token->flags & STRINGIFY_ARG) != 0;
return BEGIN;
}
  else if (m_state == 1)
@@ -234,6 +236,11 @@ class vaopt_state {
  return m_state == 0;
}
  
+  bool stringify () const
+  {
+return m_stringify;
+  }
+
   private:
  
/* The cpp_reader.  */
@@ -247,6 +254,8 @@ class vaopt_state {
/* If true, the previous token was ##.  This is used to detect when
   a paste occurs at the end of the sequence.  */
bool m_last_was_paste;
+  /* True for #__VA_OPT__.  */
+  bool m_stringify;
  
/* The state variable:
   0 means not parsing
@@ -284,7 +293,8 @@ static _cpp_buff *collect_args (cpp_read
  static cpp_context *next_context (cpp_reader *);
  static const cpp_token *padding_token (cpp_reader *, const cpp_token *);
  static const cpp_token *new_string_token (cpp_reader *, uchar *, unsigned int);
-static const cpp_token *stringify_arg (cpp_reader *, macro_arg *);
+static const cpp_token *stringify_arg (cpp_reader *, const cpp_token **,
+  unsigned int, bool);
  static void paste_all_tokens (cpp_reader *, const cpp_token *);
  static bool paste_tokens (cpp_reader *, location_t,
  const cpp_token **, const cpp_token *);
@@ -818,10 +828,11 @@ cpp_quote_string (uchar *dest, const uch
return dest;
  }
  
-/* Convert a token sequence ARG to a single string token according to
-   the rules of the ISO C #-operator.  */
+/* Convert a token sequence FIRST to FIRST+COUNT-1 to a single string token
+   according to the rules of the ISO C #-operator.  */
  static const cpp_token *
-stringify_arg (cpp_reader *pfile, macro_arg *arg)
+stringify_arg (cpp_reader *pfile, const cpp_token **first, unsigned int count,
+  bool va_opt)
  {
unsigned char *dest;
unsigned int i, escape_it, backslash_count = 0;
@@ -834,9 +845,27 @@ stringify_arg (cpp_reader *pfile, macro_
*dest++ = '"';
  
/* Loop, reading in the argument's tokens.  */
-  for (i = 0; i < arg->count; i++)
+  for (i = 0; i < count; i++)
  {
-  const cpp_token *token = arg->first[i];
+  const cpp_token *token = first[i];
+
+  if (va_opt && (token->flags & PASTE_LEFT))
+   {
+ location_t virt_loc = pfile->invocation_location;
+ const cpp_token *rhs;
+ do
+   {
+ if (i == count)
+   abort ();
+ rhs = first[++i];
+ if (!paste_tokens (pfile, virt_loc, , rhs))
+   {
+ --i;
+ break;
+   }
+   }
+ while (rhs->flags & PASTE_LEFT);
+   }
  
if (token->type == CPP_PADDING)
{
@@ -923,7 +952,7 @@ paste_tokens (cpp_reader *pfile, locatio
cpp_token *lhs;
unsigned int len;
  
-  len = cpp_token_len (*plhs) + cpp_token_len (rhs) + 1;
+  len = 

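The new stringify_arg interface takes a token range (FIRST, COUNT) instead of a macro_arg. The following is a minimal stand-alone sketch of what the ISO # operator does to such a range. It is not libcpp's code. The Tok type and stringify_tokens are hypothetical. Literal detection is simplified. The ## pasting that the va_opt path adds is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified token: its spelling plus whether whitespace
// preceded it in the source.  Real libcpp tokens carry much more.
struct Tok {
  std::string spelling;
  bool space_before;
};

// Stringify tokens [first, first + count) following the ISO # rules:
// whitespace between tokens becomes one space, whitespace before the first
// token is dropped, and '"' and '\' are escaped inside string and character
// literals.  Literal detection is crude: prefixes such as u8, L or raw
// strings are not handled.
std::string stringify_tokens(const std::vector<Tok> &toks,
                             std::size_t first, std::size_t count)
{
  assert(first + count <= toks.size());
  std::string out = "\"";
  for (std::size_t i = first; i < first + count; ++i)
    {
      const Tok &t = toks[i];
      if (i != first && t.space_before)
        out += ' ';
      bool escape_it = !t.spelling.empty()
                       && (t.spelling[0] == '"' || t.spelling[0] == '\'');
      for (char c : t.spelling)
        {
          if (escape_it && (c == '"' || c == '\\'))
            out += '\\';
          out += c;
        }
    }
  out += '"';
  return out;
}
```

Passing a range rather than a macro_arg lets a caller stringify any slice of already-collected tokens. The #__VA_OPT__ case needs exactly that.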
Re: [PATCH] libcpp, v2: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31

2021-08-16 Thread Jason Merrill via Gcc-patches

On 8/16/21 4:51 PM, Jakub Jelinek wrote:

On Mon, Aug 16, 2021 at 04:21:00PM -0400, Jason Merrill wrote:

I see for the UTF-8 chars we have:
switch (ucn_valid_in_identifier (pfile, *cp, nst))
  {
  case 0:
/* In C++, this is an error for invalid character in an identifier
   because logically, the UTF-8 was converted to a UCN during
   translation phase 1 (even though we don't physically do it that
   way).  In C, this byte rather becomes grammatically a separate
   token.  */
if (CPP_OPTION (pfile, cplusplus))
  cpp_error (pfile, CPP_DL_ERROR,
 "extended character %.*s is not valid in an identifier",
 (int) (*pstr - base), base);
else
  {
*pstr = base;
return false;
  }
So, shall we behave the same as C for cxx23_identifiers here?  And shall we
do something similar for the UCNs in \u and \U forms?
Confused...


I tend to agree with Joseph's comment on your followup patch about this
issue; do you?


It isn't clear to me if it is ok that it is an error even with just -E,
i.e. whether
"If a single universal-character-name does not match any of the other
preprocessing token categories, the program is ill-formed."
applies already in translation phase 4 which is what -E emits (or some other
one?), or only in phase 7 when converting preprocessing tokens to tokens.


I read it as applying in phase 3.


But sure, if you agree with Joseph that the followup isn't needed, the
diagnostics are much better that way and I'd certainly prefer just this
patch and not the follow-up.

If not -E, I guess the standard is clear that it is invalid and how exactly
we diagnose it is QoI.

Jakub





Re: [PATCH 1/2] analyzer: detect and analyze calls via function pointer (GSoC)

2021-08-16 Thread David Malcolm via Gcc-patches
On Mon, 2021-08-16 at 22:27 +0530, Ankur Saini wrote:
> 
> Thanks for the review
> 
> > On 16-Aug-2021, at 4:48 AM, David Malcolm 
> > wrote:
> > 
> > Thanks, this is looking promising.  Has this been rebased recently
> > (e.g. since I merged
> >  https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576737.html )?
> 
> Yes, the branch is fully up to date with master as of the time of
> writing this mail.
> 
> - - -
> 
> Here is the updated patch :
> 
> 
> 
> - - -
> P.S. While adding the new tests, I found some last-minute bugs where the
> analyzer would sometimes try to access a NULL cgraph edge to get the
> call stmt and crash.
> Although the problem was easily fixed by updating
> `callgraph_superedge::get_call_stmt ()`, it led to a delay in running
> the test suite on the final version (so the patch is still under
> testing at the time of writing this mail).


[...snip...]

> From 4ef1658de158c07474605ef4df9d092cfe7962aa Mon Sep 17 00:00:00 2001
> From: Ankur Saini 
> Date: Sun, 15 Aug 2021 19:19:07 +0530
> Subject: [PATCH 2/2] analyzer: detect and analyze virtual function calls
> 
> 2021-08-15  Ankur Saini  
> 
> gcc/analyzer/ChangeLog:
>   PR analyzer/97114
>   * analyzer/region-model.cc (region_model::get_rvalue_1): Add case for
>   OBJ_TYPE_REF.
> 
> gcc/testsuite/ChangeLog:
>   *g++.dg/analyzer/vfunc-2.C: New test.
>   *g++.dg/analyzer/vfunc-3.C: New test.
>   *g++.dg/analyzer/vfunc-4.C: New test.
>   *g++.dg/analyzer/vfunc-5.C: New test.

ChangeLog nit: the format of the above isn't quite correct; the entries
are missing a space between the "*" and the filename.  You can use
  contrib/gcc-changelog/git_check_commit.py
to check commit messages before attempting to push them (that script
runs server-side and will reject the push).
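The nit concerns one specific format rule. The real checker is contrib/gcc-changelog/git_check_commit.py. The sketch below only illustrates this one rule, and changelog_star_ok is a hypothetical helper, not part of that script.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns false for a ChangeLog file entry such as
// "\t*g++.dg/foo.C: New test." whose "*" is not followed by a single space
// and then something other than another space.  Lines that do not start
// with "*" (e.g. "PR analyzer/97114") are not file entries and pass.
bool changelog_star_ok(const std::string &line)
{
  std::size_t i = line.find_first_not_of(" \t");
  if (i == std::string::npos || line[i] != '*')
    return true;
  return i + 2 < line.size() && line[i + 1] == ' ' && line[i + 2] != ' ';
}
```

Running the real script before pushing remains the way to catch this and the other format rules it enforces.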

I like the new tests, thanks!

Both of these patches are OK to push to trunk, provided that they
bootstrap and the testsuite doesn't regress.

Dave




Re: [PATCH] libcpp: __VA_OPT__ p1042r1 placemarker changes [PR101488]

2021-08-16 Thread Jason Merrill via Gcc-patches

On 7/20/21 5:50 AM, Jakub Jelinek wrote:

Hi!

So, besides the missing #__VA_OPT__ support, for which I posted a patch last week,
P1042R1 introduced some placemarker changes for __VA_OPT__, most notably
the addition of before "removal of placemarker tokens," rescanning ...
and the
#define H4(X, ...) __VA_OPT__(a X ## X) ## b
H4(, 1)  // replaced by a b
example mentioned there where we replace it currently with ab

The following patch contains the minimum changes (except for the
__builtin_expect) that achieve the same preprocessing between current
clang++ and patched gcc on all the testcases I've tried (i.e. gcc __VA_OPT__
testsuite in c-c++-common/cpp/va-opt* including the new test and the clang
clang/test/Preprocessor/macro_va_opt* testcases).

At one point I was trying to implement the __VA_OPT__(args) case as if
for non-empty __VA_ARGS__ it expanded as if __VA_OPT__( and ) were missing,
but from the tests it seems that is not how it should work, in particular
if after (or before) we have some macro argument and it is not followed
(or preceded) by ##, then it should be macro expanded even when __VA_OPT__
is after ## or ) is followed by ##.



And it seems that not removing any
padding tokens isn't possible either, because the expansion of the arguments
typically has a padding token at the start and end and those at least
according to the testsuite need to go.


Makes sense, just like we discard padding around macro arguments.


It is unclear if it would be enough
to remove just one or if all padding tokens should be removed.
Anyway, e.g. the previous removal of all padding tokens at the end of
__VA_OPT__ is undesirable, as it e.g. eats also the padding tokens needed
for the H4 example from the paper.


Hmm, I don't see why.  Looking at the H4 example, it seems that the 
expansion of __VA_OPT__ should be


 a 

so when we paste to b, b is pasted to the placemarker, leaving a as a 
separate token.



Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-07-20  Jakub Jelinek  

PR preprocessor/101488
* macro.c (replace_args): Fix up handling of CPP_PADDING tokens at the
start or end of __VA_OPT__ arguments when preceded or followed by ##.

* c-c++-common/cpp/va-opt-3.c: Adjust expected output.
* c-c++-common/cpp/va-opt-7.c: New test.
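The replace_args hunk that follows changes the tail-padding loop. Previously it removed every trailing CPP_PADDING token inside __VA_OPT__. Now that loop is also bounded by the new vaopt_padding_tokens counter. The sketch below shows only the shape of that bounded removal. The Kind enum and remove_tail_padding are hypothetical stand-ins, not libcpp types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token kinds; libcpp uses cpp_token with CPP_PADDING.
enum class Kind { Padding, Other };

// Drop at most LIMIT padding tokens from the end of BUF, never touching the
// tokens before index START (e.g. those emitted before the __VA_OPT__).
void remove_tail_padding(std::vector<Kind> &buf, std::size_t start,
                         unsigned limit)
{
  assert(start <= buf.size());
  while (limit > 0 && buf.size() > start && buf.back() == Kind::Padding)
    {
      buf.pop_back();
      --limit;
    }
}
```

Bounding the count is what lets some padding survive at the end of the __VA_OPT__ expansion, which the H4 example discussed above depends on.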

--- libcpp/macro.c.jj   2021-07-16 11:10:08.512925510 +0200
+++ libcpp/macro.c  2021-07-19 15:58:59.819101659 +0200
@@ -2025,6 +2026,7 @@ replace_args (cpp_reader *pfile, cpp_has
i = 0;
vaopt_state vaopt_tracker (pfile, macro->variadic, [macro->paramc - 1]);
const cpp_token **vaopt_start = NULL;
+  unsigned vaopt_padding_tokens = 0;
for (src = macro->exp.tokens; src < limit; src++)
  {
unsigned int arg_tokens_count;
@@ -2034,7 +2036,7 @@ replace_args (cpp_reader *pfile, cpp_has
  
/* __VA_OPT__ handling.  */
vaopt_state::update_type vostate = vaopt_tracker.update (src);
-  if (vostate != vaopt_state::INCLUDE)
+  if (__builtin_expect (vostate != vaopt_state::INCLUDE, false))
{
  if (vostate == vaopt_state::BEGIN)
{
@@ -2059,7 +2061,9 @@ replace_args (cpp_reader *pfile, cpp_has
  
  	  /* Remove any tail padding from inside the __VA_OPT__.  */
  paste_flag = tokens_buff_last_token_ptr (buff);
- while (paste_flag && paste_flag != start
+ while (vaopt_padding_tokens--
+&& paste_flag
+&& paste_flag != start
 && (*paste_flag)->type == CPP_PADDING)
{
  tokens_buff_remove_last_token (buff);
@@ -2103,6 +2107,7 @@ replace_args (cpp_reader *pfile, cpp_has
  continue;
}
  
+  vaopt_padding_tokens = 0;
if (src->type != CPP_MACRO_ARG)
{
  /* Allocate a virtual location for token SRC, and add that
@@ -2180,11 +2185,8 @@ replace_args (cpp_reader *pfile, cpp_has
  else
paste_flag = tmp_token_ptr;
}
- /* Remove the paste flag if the RHS is a placemarker, unless the
-previous emitted token is at the beginning of __VA_OPT__;
-placemarkers within __VA_OPT__ are ignored in that case.  */
- else if (arg_tokens_count == 0
-  && tmp_token_ptr != vaopt_start)
+ /* Remove the paste flag if the RHS is a placemarker.  */
+ else if (arg_tokens_count == 0)
paste_flag = tmp_token_ptr;
}
}
@@ -2259,8 +2262,12 @@ replace_args (cpp_reader *pfile, cpp_has
token_index += j;
  
  	  index = expanded_token_index (pfile, macro, src, token_index);
- tokens_buff_add_token (buff, virt_locs,
-macro_arg_token_iter_get_token (),
+ const cpp_token *tok = macro_arg_token_iter_get_token ();
+ if (tok->type == CPP_PADDING)
+   

Re: [PATCH] libcpp, v2: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31

2021-08-16 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 16, 2021 at 04:21:00PM -0400, Jason Merrill wrote:
> > I see for the UTF-8 chars we have:
> >switch (ucn_valid_in_identifier (pfile, *cp, nst))
> >  {
> >  case 0:
> >/* In C++, this is an error for invalid character in an identifier
> >   because logically, the UTF-8 was converted to a UCN during
> >   translation phase 1 (even though we don't physically do it that
> >   way).  In C, this byte rather becomes grammatically a separate
> >   token.  */
> >if (CPP_OPTION (pfile, cplusplus))
> >  cpp_error (pfile, CPP_DL_ERROR,
> > "extended character %.*s is not valid in an identifier",
> > (int) (*pstr - base), base);
> >else
> >  {
> >*pstr = base;
> >return false;
> >  }
> > So, shall we behave the same as C for cxx23_identifiers here?  And shall we
> > do something similar for the UCNs in \u and \U forms?
> > Confused...
> 
> I tend to agree with Joseph's comment on your followup patch about this
> issue; do you?

It isn't clear to me if it is ok that it is an error even with just -E,
i.e. whether
"If a single universal-character-name does not match any of the other
preprocessing token categories, the program is ill-formed."
applies already in translation phase 4 which is what -E emits (or some other
one?), or only in phase 7 when converting preprocessing tokens to tokens.

But sure, if you agree with Joseph that the followup isn't needed, the
diagnostics are much better that way and I'd certainly prefer just this
patch and not the follow-up.

If not -E, I guess the standard is clear that it is invalid and how exactly
we diagnose it is QoI.

Jakub



RE: [PATCH] [MIPS] Hazard barrier return support

2021-08-16 Thread Dragan Mladjenovic via Gcc-patches


> -Original Message-
> From: Andrew Pinski [mailto:pins...@gmail.com]
> Sent: 16 August 2021 21:17
> To: Dragan Mladjenovic 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [MIPS] Hazard barrier return support
> 
> On Mon, Aug 16, 2021 at 7:43 AM Dragan Mladjenovic via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > This patch allows a function to request clearing of all instruction
> > and execution hazards upon normal return via
> > __attribute__((use_hazard_barrier_return)).
> >
> > 2017-04-25  Prachi Godbole  
> >
> > gcc/
> > * config/mips/mips.h (machine_function): New variable
> > use_hazard_barrier_return_p.
> > * config/mips/mips.md (UNSPEC_JRHB): New unspec.
> > (mips_hb_return_internal): New insn pattern.
> > * config/mips/mips.c (mips_attribute_table): Add attribute
> > use_hazard_barrier_return.
> > (mips_use_hazard_barrier_return_p): New static function.
> > (mips_function_attr_inlinable_p): Likewise.
> > (mips_compute_frame_info): Set use_hazard_barrier_return_p.
> > Emit error for unsupported architecture choice.
> > (mips_function_ok_for_sibcall, mips_can_use_return_insn):
> > Return false for use_hazard_barrier_return.
> > (mips_expand_epilogue): Emit hazard barrier return.
> > * doc/extend.texi: Document use_hazard_barrier_return.
> >
> > gcc/testsuite/
> > * gcc.target/mips/hazard-barrier-return-attribute.c: New test.
> > ---
> > Rehash of original patch posted by Prachi with minimal changes. Tested
> > against mips-mti-elf with mips32r2/-EB and mips32r2/-EB/-micromips.
> >
> >  gcc/config/mips/mips.c| 58 +--
> >  gcc/config/mips/mips.h|  3 +
> >  gcc/config/mips/mips.md   | 15 +
> >  gcc/doc/extend.texi   |  6 ++
> >  .../mips/hazard-barrier-return-attribute.c| 20 +++
> >  5 files changed, 98 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/mips/hazard-barrier-return-attribute.c
> >
> > diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> > index 89d1be6cea6..6ce12fce52e 100644
> > --- a/gcc/config/mips/mips.c
> > +++ b/gcc/config/mips/mips.c
> > @@ -630,6 +630,7 @@ static const struct attribute_spec mips_attribute_table[] = {
> >  mips_handle_use_shadow_register_set_attr, NULL },
> >{ "keep_interrupts_masked",  0, 0, false, true,  true, false, NULL, NULL },
> >{ "use_debug_exception_return", 0, 0, false, true, true, false,
> > NULL, NULL },
> > +  { "use_hazard_barrier_return", 0, 0, true, false, false, false,
> > + NULL, NULL },
> >{ NULL, 0, 0, false, false, false, false, NULL, NULL }
> >  };
> >
> > @@ -1309,6 +1310,16 @@ mips_use_debug_exception_return_p (tree type)
> >TYPE_ATTRIBUTES (type)) != NULL;
> >  }
> >
> > +/* Check if the attribute to use hazard barrier return is set for
> > +   the function declaration DECL.  */
> > +
> > +static bool
> > +mips_use_hazard_barrier_return_p (const_tree decl)
> > +{
> > +  return lookup_attribute ("use_hazard_barrier_return",
> > +                           DECL_ATTRIBUTES (decl)) != NULL;
> > +}
> > +
> >  /* Return the set of compression modes that are explicitly required
> > by the attributes in ATTRIBUTES.  */
> >
> > @@ -1494,6 +1505,19 @@ mips_can_inline_p (tree caller, tree callee)
> >return default_target_can_inline_p (caller, callee);
> >  }
> >
> > +/* Implement TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P.
> > +
> > +   A function requesting clearing of all instruction and execution hazards
> > +   before returning cannot be inlined - thereby not clearing any hazards.
> > +   All our other function attributes are related to how out-of-line copies
> > +   should be compiled or called.  They don't in themselves prevent inlining.  */
> > +
> > +static bool
> > +mips_function_attr_inlinable_p (const_tree decl)
> > +{
> > +  return !mips_use_hazard_barrier_return_p (decl);
> > +}
> > +
> >  /* Handle an "interrupt" attribute with an optional argument.  */
> >
> >  static tree
> > @@ -7921,6 +7945,11 @@ mips_function_ok_for_sibcall (tree decl, tree exp ATTRIBUTE_UNUSED)
> >&& !targetm.binds_local_p (decl))
> >  return false;
> >
> > +  /* Can't generate sibling calls if returning from current function using
> > + hazard barrier return.  */
> > +  if (mips_use_hazard_barrier_return_p (current_function_decl))
> > +return false;
> > +
> >/* Otherwise OK.  */
> >return true;
> >  }
> > @@ -11008,6 +11037,17 @@ mips_compute_frame_info (void)
> > }
> >  }
> >
> > +  /* Determine whether to use hazard barrier return or not.  */
> > +  if (mips_use_hazard_barrier_return_p (current_function_decl))
> > +{
> > +  if (mips_isa_rev < 2)
> > +   error ("hazard barrier returns require a MIPS32r2 processor or
> > + greater");
> 
> Just a small nit, is MIPS64r2 ok too? 
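A usage sketch for the attribute proposed in this patch follows. The attribute only exists in a MIPS compiler carrying the patch. The guard therefore falls back to an empty macro elsewhere. The names HAZARD_BARRIER_RETURN and update_mode are hypothetical.

```cpp
#include <cassert>

// Request a hazard-barrier return only where the compiler advertises the
// attribute (a MIPS compiler with the proposed patch); elsewhere the macro
// expands to nothing.  Nested #if keeps compilers without __has_attribute
// from evaluating it.
#if defined(__mips__) && defined(__has_attribute)
#  if __has_attribute(use_hazard_barrier_return)
#    define HAZARD_BARRIER_RETURN __attribute__((use_hazard_barrier_return))
#  endif
#endif
#ifndef HAZARD_BARRIER_RETURN
#  define HAZARD_BARRIER_RETURN
#endif

// Hypothetical routine that changes machine state and wants instruction and
// execution hazards cleared before its caller resumes.  Per the patch, such
// a function is not inlined, makes no sibling calls, and needs MIPS32r2 or
// later.
HAZARD_BARRIER_RETURN int update_mode(int mode)
{
  assert(mode >= 0);
  return mode + 1;
}
```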

Re: 'hash_map>'

2021-08-16 Thread Martin Sebor via Gcc-patches

On 8/16/21 6:44 AM, Thomas Schwinge wrote:

Hi!

On 2021-08-12T17:15:44-0600, Martin Sebor via Gcc  wrote:

On 8/6/21 10:57 AM, Thomas Schwinge wrote:

So I'm trying to do some C++...  ;-)

Given:

  /* A map from SSA names or var decls to record fields.  */
  typedef hash_map field_map_t;

  /* For each propagation record type, this is a map from SSA names or var decls
 to propagate, to the field in the record type that should be used for
 transmission and reception.  */
  typedef hash_map record_field_map_t;

Thus, that's a 'hash_map>'.  (I may do that,
right?)  Looking through GCC implementation files, nearly all uses
of 'hash_map' boil down to pointer key ('tree', for example) and
pointer/integer value.


Right.  Because most GCC containers rely exclusively on GCC's own
uses for testing, if your use case is novel in some way, chances
are it might not work as intended in all circumstances.

I've wrestled with hash_map a number of times.  A use case that's
close to yours (i.e., a non-trivial value type) is in cp/parser.c:
see class_to_loc_map_t.


Indeed, at the time you sent this email, I already had started looking
into that one!  (The Fortran test cases that I originally analyzed, which
triggered other cases of non-POD/non-trivial destructor, all didn't
result in a memory leak, because the non-trivial constructor doesn't
actually allocate any resources dynamically -- that's indeed different in
this case here.)  ..., and indeed:


(I don't remember if I tested it for leaks
though.  It's used to implement -Wmismatched-tags so compiling
a few tests under Valgrind should show if it does leak.)


... it does leak memory at present.  :-| (See attached commit log for
details for one example.)

To that effect, to document the current behavior, I propose to
"Add more self-tests for 'hash_map' with Value type with non-trivial
constructor/destructor", see attached.  OK to push to master branch?
(Also cherry-pick into release branches, eventually?)


Adding more tests sounds like an excellent idea.  I'm not sure about
the idea of adding loopy selftests that iterate as many times as in
the patch (looks like 1234 times two?)  Selftests run each time GCC
builds (i.e., even during day to day development).  It seems to me
that it might be better to run such selftests only as part of
the bootstrap process.




Then:

  record_field_map_t field_map ([...]); // see below
  for ([...])
{
  tree record_type = [...];
  [...]
  bool existed;
  field_map_t 
= field_map.get_or_insert (record_type, );
  gcc_checking_assert (!existed);
  [...]
  for ([...])
fields.put ([...], [...]);
  [...]
}
  [stuff that looks up elements from 'field_map']
  field_map.empty ();

This generally works.

If I instantiate 'record_field_map_t field_map (40);', Valgrind is happy.
If however I instantiate 'record_field_map_t field_map (13);' (where '13'
would be the default for 'hash_map'), Valgrind complains:

  2,080 bytes in 10 blocks are definitely lost in loss record 828 of 876
 at 0x483DD99: calloc (vg_replace_malloc.c:762)
 by 0x175F010: xcalloc (xmalloc.c:162)
 by 0xAF4A2C: hash_table, tree_node*> >::hash_entry, false, xcallocator>::hash_table(unsigned long, bool, bool, bool, mem_alloc_origin) (hash-table.h:275)
 by 0x15E0120: hash_map, tree_node*> >::hash_map(unsigned long, bool, bool, bool) (hash-map.h:143)
 by 0x15DEE87: hash_map, tree_node*> >, simple_hashmap_traits, hash_map, tree_node*> > > >::get_or_insert(tree_node* const&, bool*) (hash-map.h:205)
 by 0x15DD52C: execute_omp_oacc_neuter_broadcast() (omp-oacc-neuter-broadcast.cc:1371)
 [...]

(That's with '#pragma GCC optimize "O0"' at the top of the 'gcc/*.cc'
file.)

My suspicion was that it is due to the 'field_map' getting resized as it
incrementally grows (and '40' being big enough for that to never happen),
and somehow the non-POD (?) value objects not being properly handled
during that.  Working my way a bit through 'gcc/hash-map.*' and
'gcc/hash-table.*' (but not claiming that I understand all that, off
hand), it seems as if my theory is right: I'm able to plug this memory
leak as follows:

  --- gcc/hash-table.h
  +++ gcc/hash-table.h
  @@ -820,6 +820,8 @@ hash_table::expand ()
   {
 value_type *q = find_empty_slot_for_expand (Descriptor::hash (x));
new ((void*) q) value_type (std::move (x));
  + //BAD Descriptor::remove (x); // (doesn't make sense and) a ton of "Invalid read [...] inside a block of size [...] free'd"
  + x.~value_type (); //GOOD This seems to work!  -- but does it make sense?
   }

 p++;

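The hunk above resembles a generic C++ idiom, shown below for comparison. When objects are relocated into raw storage with placement new, each moved-from source is still a live object. It must be destroyed explicitly before the old block is freed. This is a stand-alone sketch, not GCC's hash_table::expand. The names relocate and Tracked are hypothetical. The sketch does not settle how hash_table's Descriptor hooks should interact with this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <utility>

// Move N live objects out of OLD_ELEMS into freshly malloc'ed storage and
// release the old block, as a table expansion has to.
template <typename T>
T *relocate(T *old_elems, std::size_t n)
{
  T *fresh = static_cast<T *>(std::malloc(n * sizeof(T)));
  assert(fresh != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i)
    {
      ::new (static_cast<void *>(fresh + i)) T(std::move(old_elems[i]));
      // The moved-from object is still alive: destroy it before its storage
      // is freed; otherwise anything it still owns (for example when T has
      // no move constructor, so std::move falls back to a copy) is leaked.
      old_elems[i].~T();
    }
  std::free(old_elems);
  return fresh;
}

// Hypothetical value type that counts live instances.
struct Tracked
{
  static int live;
  std::string s;
  explicit Tracked(std::string v) : s(std::move(v)) { ++live; }
  Tracked(Tracked &&other) : s(std::move(other.s)) { ++live; }
  ~Tracked() { --live; }
};
int Tracked::live = 0;
```

Without the explicit destructor call, Tracked::live would stay above the number of reachable objects after each relocation.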
However, that doesn't exactly look like a correct fix, does it?  I'd
expect such a manual destructor call in combination with placement new
(that is being 

Re: [PATCH] c++: ignore explicit dguides during NTTP CTAD [PR101883]

2021-08-16 Thread Jason Merrill via Gcc-patches

On 8/16/21 3:06 PM, Patrick Palka wrote:

Since (template) argument passing is a copy-initialization context,
we mustn't consider explicit deduction guides when deducing a CTAD
placeholder type of an NTTP.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?


OK.


PR c++/101883

gcc/cp/ChangeLog:

* pt.c (convert_template_argument): Pass LOOKUP_IMPLICIT to
do_auto_deduction.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class49.C: New test.
---
  gcc/cp/pt.c  | 3 ++-
  gcc/testsuite/g++.dg/cpp2a/nontype-class49.C | 8 
  2 files changed, 10 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class49.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 0870ccdc9f6..5ac89901e22 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8486,7 +8486,8 @@ convert_template_argument (tree parm,
   can happen in the context of -fnew-ttp-matching.  */;
else if (tree a = type_uses_auto (t))
{
- t = do_auto_deduction (t, arg, a, complain, adc_unify, args);
+ t = do_auto_deduction (t, arg, a, complain, adc_unify, args,
+LOOKUP_IMPLICIT);
  if (t == error_mark_node)
return error_mark_node;
}
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class49.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class49.C
new file mode 100644
index 000..c83e4075ed0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class49.C
@@ -0,0 +1,8 @@
+// PR c++/101883
+// { dg-do compile { target c++20 } }
+
+template struct C { constexpr C(int) { } };
+explicit C(int) -> C;
+
+template struct X { };
+X<1> x; // { dg-error "deduction|no match" }
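The same rule can be seen with an ordinary variable, which only needs C++17 class template argument deduction. In this sketch, C and c1 are illustrative names mirroring the new test. T cannot be deduced from the constructor, so only the explicit guide can produce C<int>. That guide is a candidate in direct-initialization but not in copy-initialization, and template argument passing is copy-initialization.

```cpp
#include <cassert>
#include <type_traits>

template <class T> struct C { constexpr C(int) { } };
explicit C(int) -> C<int>;

C c1(1);        // direct-initialization: the explicit guide is considered
// C c2 = 1;    // copy-initialization: explicit guide ignored, deduction fails
```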





Re: [PATCH] c++: aggregate CTAD and brace elision [PR101344]

2021-08-16 Thread Jason Merrill via Gcc-patches

On 8/16/21 3:06 PM, Patrick Palka wrote:

During aggregate CTAD, collect_ctor_idx_types always recurses into a
sub-CONSTRUCTOR, regardless of whether the corresponding pair of braces
was elided in the original initializer.  This causes us to reject some
completely-braced forms of aggregate CTAD as in the first testcase
below, because collect_ctor_idx_types effectively assumes that the given
initializer is always minimally-braced (hence the aggregate deduction
candidate is given a function type that's incompatible with the written
initializer).

This patch fixes this by making reshape_init flag CONSTRUCTORs that
were built to undo brace elision in the original CONSTRUCTOR, so that
collect_ctor_idx_types can determine whether to recurse into a
sub-CONSTRUCTOR by simply inspecting this flag.

This happens to also fix PR101820, which is about aggregate CTAD using
designated initializers, for a similar reason as above.

A tricky case is the "intermediately-braced" initialization of 'e3'
in the first testcase below.  It seems to me we're correct to continue
to reject this according to [over.match.class.deduct]/1 because here
the initializer element {1, 2, 3, 4} corresponds to the subobject E::t,
hence the type T_1 of the first function parameter of the aggregate
deduction candidate is T(&&)[2][2] which the argument {1, 2, 3, 4} isn't
compatible with (as opposed to say T(&&)[4]).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?


OK.


PR c++/101344
PR c++/101820

gcc/cp/ChangeLog:

* cp-tree.h (CONSTRUCTOR_BRACES_ELIDED_P): Define.
* decl.c (reshape_init_r): Set it.
* pt.c (collect_ctor_idx_types): Recurse into a sub-CONSTRUCTOR
iff CONSTRUCTOR_BRACES_ELIDED_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-aggr11.C: New test.
* g++.dg/cpp2a/class-deduction-aggr12.C: New test.
---
  gcc/cp/cp-tree.h  |  6 
  gcc/cp/decl.c | 18 +---
  gcc/cp/pt.c   |  7 +
  .../g++.dg/cpp2a/class-deduction-aggr11.C | 29 +++
  .../g++.dg/cpp2a/class-deduction-aggr12.C | 15 ++
  5 files changed, 65 insertions(+), 10 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr11.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr12.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index bd3f12a393e..8cbf6cc30b0 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -4502,6 +4502,12 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter)
  #define CONSTRUCTOR_IS_PAREN_INIT(NODE) \
(CONSTRUCTOR_CHECK(NODE)->base.private_flag)
  
+/* True if reshape_init built this CONSTRUCTOR to undo the brace elision
+   of another CONSTRUCTOR.  This flag is used during C++20 aggregate
+   CTAD.  */
+#define CONSTRUCTOR_BRACES_ELIDED_P(NODE) \
+  (CONSTRUCTOR_CHECK (NODE)->base.protected_flag)
+
  /* True if NODE represents a conversion for direct-initialization in a
 template.  Set by perform_implicit_conversion_flags.  */
  #define IMPLICIT_CONV_EXPR_DIRECT_INIT(NODE) \
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index b3671ee8956..9e257b32e18 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6650,7 +6650,8 @@ reshape_init_r (tree type, reshape_iter *d, tree first_initializer_p,
/* A non-aggregate type is always initialized with a single
   initializer.  */
if (!CP_AGGREGATE_TYPE_P (type)
-  /* As is an array with dependent bound.  */
+  /* As is an array with dependent bound, which we can see
+during C++20 aggregate CTAD.  */
|| (cxx_dialect >= cxx20
  && TREE_CODE (type) == ARRAY_TYPE
  && uses_template_parms (TYPE_DOMAIN (type
@@ -6767,6 +6768,7 @@ reshape_init_r (tree type, reshape_iter *d, tree first_initializer_p,
   initializer already, and there is not a CONSTRUCTOR, it means that there
   is a missing set of braces (that is, we are processing the case for
   which reshape_init exists).  */
+  bool braces_elided_p = false;
if (!first_initializer_p)
  {
if (TREE_CODE (stripped_init) == CONSTRUCTOR)
@@ -6802,17 +6804,25 @@ reshape_init_r (tree type, reshape_iter *d, tree first_initializer_p,
warning (OPT_Wmissing_braces,
 "missing braces around initializer for %qT",
 type);
+  braces_elided_p = true;
  }
  
/* Dispatch to specialized routines.  */
+  tree new_init;
if (CLASS_TYPE_P (type))
-return reshape_init_class (type, d, first_initializer_p, complain);
+new_init = reshape_init_class (type, d, first_initializer_p, complain);
else if (TREE_CODE (type) == ARRAY_TYPE)
-return reshape_init_array (type, d, first_initializer_p, complain);
+new_init = reshape_init_array (type, d, first_initializer_p, complain);
else if (VECTOR_TYPE_P (type))
-return 

Re: [PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaliged access

2021-08-16 Thread Palmer Dabbelt

On Mon, 16 Aug 2021 11:56:05 PDT (-0700), pins...@gmail.com wrote:

On Mon, Aug 16, 2021 at 10:10 AM Palmer Dabbelt  wrote:


On Mon, 16 Aug 2021 09:29:16 PDT (-0700), Kito Cheng wrote:
>> > Could you submit v3 patch which is v1 with overlap_op_by_pieces field,
>> > testcase from v2 and add a few more comments to describe the field?
>> >
>> > And add an -mtune=ultra-size to make it able to test without change
>> > other behavior?
>> >
>> > Hi Palmer:
>> >
>> > Are you OK with that?
>>
>> I'm still not convinced on the performance: like Andrew and I pointed
>> out, this is a difficult case for pipelines of this flavor to handle.
>> Nobody here knows anything about this pipeline deeply enough to say
>> anything definitive, though, so this is really just a guess.
>
> So adding an extra field to indicate that should resolve it?
> I believe people should set overlap_op_by_pieces
> to true only if they are sure it has benefits.

My only issue there is that we'd have no way to turn it on, but see
below...

>> As I'm not convinced this is an obvious performance win I'm not going to
>> merge it without a benchmark.  If you're convinced and want to merge it
>> that's fine, I don't really care about the performance of the C906 and
>> if someone complains we can always just revert it later.
>
> I suppose Christoph has tried it with their internal processor, and it
> benefits performance there,
> but that processor can't be open-sourced yet, so the v2 patch set uses
> the C906 to demo and test it, since that is
> the only processor with slow_unaligned_access=False.

Well, that's a very different discussion.  The C906 tuning model should
be for the C906, not a proxy for some internal-only processor.  If the
goal here is to allow this pass to be flipped on by an out-of-tree
pipeline model then we can talk about it.

> I agree on the C906 part; we don't know whether it's beneficial or not, so I propose
> adding a -mtune=ultra-size to make this testable rather than changing the C906 tuning.

That's essentially the same conclusion we came to last time this came
up, except that we were calling it "-Oz" (because LLVM does).  I guess
we never got around to having the broader GCC discussion about "-Oz".  IIRC
we had some other "-Oz" candidates we never got around to dealing with,
but that was a while ago so I'm not sure if any of that panned out.


-Oz was a bad idea that Apple came up with because GCC decided to start
emitting store multiple on PowerPC around 13 years ago.
I don't think we should repeat that mistake for GCC and especially for RISCV.
If people want to optimize for size, they get the performance issues.


Makes sense.  Probably best to avoid adding the RISC-V specific version 
of this as well, then, as it's really just two sides of the same coin.


Sounds like we'll likely want to stop implementing -Os via a tuning on 
RISC-V: that was a convenient way to do it when we didn't have any 
conflicts between -O and -mtune, but assuming this will eventually land 
that won't be valid any more.  That's a pretty mechanical process.


It still leaves us with the question of what to do with this pass, which 
IMO really just depends on what the actual goal is here: if we're trying 
to optimize for the C906 then we should just wait for the benchmarks to 
demonstrate this is worth doing (though again, Kito, if you think this 
is good enough and want to flip this on I don't really care that much), 
but if we're trying to optimize for some other pipeline then we should 
really wait for that to show up.


I'm not going to speculate about what this new pipeline is, but if 
there's anything concrete announced about it then I'm happy to take a 
look.  Historically we've never been super strict about waiting for 
hardware before taking a pipeline model, but I do think we should have 
something as just trying to support any hypothetical future hardware 
will lead to insanity.  IMO we need to be extra explicit that we're 
willing to work with hardware vendors, as due to the nature of RISC-V 
that can get lost in translation, but there has to be some balance.


Re: [PATCH] c++, v2: Implement P0466R5 __cpp_lib_is_layout_compatible compiler helpers [PR101539]

2021-08-16 Thread Jason Merrill via Gcc-patches

On 8/12/21 1:07 PM, Jakub Jelinek wrote:

On Thu, Aug 12, 2021 at 12:06:33PM -0400, Jason Merrill wrote:

Yes; if the standard says something nonsensical, I prefer to figure out
something more sensible to propose as a change.


Ok, so here it is implemented, so far tested only on the new testcases
(but nothing else really uses the code that has changed since the last
patch).  Also attached incremental diff (so that it is also clear
what test behaviors changed).
I'll of course bootstrap/regtest it fully tonight.
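For context, the user-facing C++20 facilities that these compiler helpers are meant to back are `std::is_layout_compatible` and `std::is_corresponding_member` from `<type_traits>` (P0466R5). Below is a minimal sketch of their semantics. It assumes a standard library that defines `__cpp_lib_is_layout_compatible`; otherwise the checks are skipped. The structs `A`, `B` and `C` are made up for illustration.

```cpp
#include <type_traits>

struct A { int x; char c; };
struct B { int y; char d; };   // same member types as A, in the same order
struct C { int z; long e; };   // only the first member matches A

// Returns true when every check holds, or when the library does not
// provide the P0466R5 traits at all.
constexpr bool layout_demo()
{
#if defined(__cpp_lib_is_layout_compatible)
  return std::is_layout_compatible_v<A, B>
         && !std::is_layout_compatible_v<A, C>
         && std::is_corresponding_member(&A::x, &B::y)
         && std::is_corresponding_member(&A::c, &B::d)
         && std::is_corresponding_member(&A::x, &C::z);
#else
  return true;
#endif
}
```

`A` and `B` share their whole common initial sequence, so they are layout-compatible. `A` and `C` only share their first member, so they are not, but `A::x` and `C::z` still correspond.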

2021-08-12  Jakub Jelinek  

PR c++/101539
gcc/c-family/
* c-common.h (enum rid): Add RID_IS_LAYOUT_COMPATIBLE.
* c-common.c (c_common_reswords): Add __is_layout_compatible.
gcc/cp/
* cp-tree.h (enum cp_trait_kind): Add CPTK_IS_LAYOUT_COMPATIBLE.
(enum cp_built_in_function): Add CP_BUILT_IN_IS_CORRESPONDING_MEMBER.
(fold_builtin_is_corresponding_member, layout_compatible_type_p):
Declare.
* parser.c (cp_parser_primary_expression): Handle
RID_IS_LAYOUT_COMPATIBLE.
(cp_parser_trait_expr): Likewise.
* cp-objcp-common.c (names_builtin_p): Likewise.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_IS_LAYOUT_COMPATIBLE.
* decl.c (cxx_init_decl_processing): Register
__builtin_is_corresponding_member builtin.
* constexpr.c (cxx_eval_builtin_function_call): Handle
CP_BUILT_IN_IS_CORRESPONDING_MEMBER builtin.
* semantics.c (is_corresponding_member_union,
is_corresponding_member_aggr, fold_builtin_is_corresponding_member):
New functions.
(trait_expr_value): Handle CPTK_IS_LAYOUT_COMPATIBLE.
(finish_trait_expr): Likewise.
* typeck.c (layout_compatible_type_p): New function.
* cp-gimplify.c (cp_gimplify_expr): Fold
CP_BUILT_IN_IS_CORRESPONDING_MEMBER.
(cp_fold): Likewise.
* tree.c (builtin_valid_in_constant_expr_p): Handle
CP_BUILT_IN_IS_CORRESPONDING_MEMBER.
* cxx-pretty-print.c (pp_cxx_trait_expression): Handle
CPTK_IS_LAYOUT_COMPATIBLE.
* class.c (remove_zero_width_bit_fields): Remove.
(layout_class_type): Don't call it.
gcc/testsuite/
* g++.dg/cpp2a/is-corresponding-member1.C: New test.
* g++.dg/cpp2a/is-corresponding-member2.C: New test.
* g++.dg/cpp2a/is-corresponding-member3.C: New test.
* g++.dg/cpp2a/is-corresponding-member4.C: New test.
* g++.dg/cpp2a/is-corresponding-member5.C: New test.
* g++.dg/cpp2a/is-corresponding-member6.C: New test.
* g++.dg/cpp2a/is-corresponding-member7.C: New test.
* g++.dg/cpp2a/is-corresponding-member8.C: New test.
* g++.dg/cpp2a/is-layout-compatible1.C: New test.
* g++.dg/cpp2a/is-layout-compatible2.C: New test.
* g++.dg/cpp2a/is-layout-compatible3.C: New test.

--- gcc/c-family/c-common.h.jj  2021-08-12 18:14:29.235853657 +0200
+++ gcc/c-family/c-common.h 2021-08-12 18:21:01.141484689 +0200
@@ -173,7 +173,8 @@ enum rid
RID_IS_ABSTRACT, RID_IS_AGGREGATE,
RID_IS_BASE_OF,  RID_IS_CLASS,
RID_IS_EMPTY,RID_IS_ENUM,
-  RID_IS_FINAL,RID_IS_LITERAL_TYPE,
+  RID_IS_FINAL,RID_IS_LAYOUT_COMPATIBLE,
+  RID_IS_LITERAL_TYPE,
RID_IS_POINTER_INTERCONVERTIBLE_BASE_OF,
RID_IS_POD,  RID_IS_POLYMORPHIC,
RID_IS_SAME_AS,
--- gcc/c-family/c-common.c.jj  2021-08-03 00:44:32.762494219 +0200
+++ gcc/c-family/c-common.c 2021-08-12 18:21:01.143484661 +0200
@@ -420,6 +420,7 @@ const struct c_common_resword c_common_r
{ "__is_empty",   RID_IS_EMPTY,   D_CXXONLY },
{ "__is_enum",RID_IS_ENUM,D_CXXONLY },
{ "__is_final",   RID_IS_FINAL,   D_CXXONLY },
+  { "__is_layout_compatible", RID_IS_LAYOUT_COMPATIBLE, D_CXXONLY },
{ "__is_literal_type", RID_IS_LITERAL_TYPE, D_CXXONLY },
{ "__is_pointer_interconvertible_base_of",
RID_IS_POINTER_INTERCONVERTIBLE_BASE_OF, D_CXXONLY },
--- gcc/cp/cp-tree.h.jj 2021-08-12 09:34:16.817236456 +0200
+++ gcc/cp/cp-tree.h 2021-08-12 18:21:01.144484647 +0200
@@ -1365,6 +1365,7 @@ enum cp_trait_kind
CPTK_IS_EMPTY,
CPTK_IS_ENUM,
CPTK_IS_FINAL,
+  CPTK_IS_LAYOUT_COMPATIBLE,
CPTK_IS_LITERAL_TYPE,
CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF,
CPTK_IS_POD,
@@ -6358,6 +6359,7 @@ struct GTY((chain_next ("%h.next"))) tin
  enum cp_built_in_function {
CP_BUILT_IN_IS_CONSTANT_EVALUATED,
CP_BUILT_IN_INTEGER_PACK,
+  CP_BUILT_IN_IS_CORRESPONDING_MEMBER,
CP_BUILT_IN_IS_POINTER_INTERCONVERTIBLE_WITH_CLASS,
CP_BUILT_IN_SOURCE_LOCATION,
CP_BUILT_IN_LAST
@@ -7574,6 +7576,7 @@ extern tree baselink_for_fns
  extern void finish_static_assert(tree, tree, location_t,
 bool, bool);
  extern tree finish_decltype_type(tree, bool, tsubst_flags_t);
+extern tree 

Re: [patch][version 6] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-08-16 Thread Qing Zhao via Gcc-patches


> On Aug 16, 2021, at 2:40 AM, Richard Biener  wrote:
> 
> On Thu, 12 Aug 2021, Qing Zhao wrote:
> 
>> Hi, Richard,
>> 
>> For RTL expansion of calls to .DEFERRED_INIT, I changed my code per your
>> suggestions as follows:
>> 
>> ==
>> #define INIT_PATTERN_VALUE  0xFE
>> static void
>> expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>> {
>>  tree lhs = gimple_call_lhs (stmt);
>>  tree var_size = gimple_call_arg (stmt, 0);
>>  enum auto_init_type init_type
>>= (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
>>  bool is_vla = (bool) TREE_INT_CST_LOW (gimple_call_arg (stmt, 2));
>> 
>>  tree var_type = TREE_TYPE (lhs);
>>  gcc_assert (init_type > AUTO_INIT_UNINITIALIZED);
>> 
>>  if (is_vla || (!can_native_interpret_type_p (var_type)))
>>{
>>/* If this is a VLA or the type of the variable cannot be natively
>>   interpreted, expand to a memset to initialize it.  */
>>  if (TREE_CODE (lhs) == SSA_NAME)
>>lhs = SSA_NAME_VAR (lhs);
>>  tree var_addr = NULL_TREE;
>>  if (is_vla)
>>var_addr = TREE_OPERAND (lhs, 0);
>>  else
>>{
>> TREE_ADDRESSABLE (lhs) = 1;
>> var_addr = build_fold_addr_expr (lhs);
>>}
>>  tree value = (init_type == AUTO_INIT_PATTERN) ?
>>build_int_cst (unsigned_char_type_node,
>>   INIT_PATTERN_VALUE) :
>>build_zero_cst (unsigned_char_type_node);
>>  tree m_call = build_call_expr (builtin_decl_implicit (BUILT_IN_MEMSET),
>> 3, var_addr, value, var_size);
>>  /* Expand this memset call.  */
>>  expand_builtin_memset (m_call, NULL_RTX, TYPE_MODE (var_type));
>>}
>>  else
>>{
>>/* If this is not a VLA and the type of the variable can be natively 
>>   interpreted, expand to assignment to generate better code.  */
>>  tree pattern = NULL_TREE;
>>  unsigned HOST_WIDE_INT total_bytes
>>= tree_to_uhwi (TYPE_SIZE_UNIT (var_type));
>> 
>>  if (init_type == AUTO_INIT_PATTERN)
>>{
>>  unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
>>  memset (buf, INIT_PATTERN_VALUE, total_bytes);
>>  pattern = native_interpret_expr (var_type, buf, total_bytes);
>>  gcc_assert (pattern);
>>}
>> 
>>  tree init = (init_type == AUTO_INIT_PATTERN) ?
>>   pattern :
>>   build_zero_cst (var_type);
>>  expand_assignment (lhs, init, false);
>>}
>> }
>> ===
>> 
>> Now, I used “can_native_interpret_type_p (var_type)” instead of 
>> “use_register_for_decl (lhs)” to decide 
>> whether to use “memset” or use “assign” to expand this function.
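A source-level analogue of the two expansion routes may help. The memset route writes the pattern byte through the object's address. The assignment route instead builds a value of the variable's type from pattern bytes (roughly what `native_interpret_expr` does at compile time) and assigns it, so no address is needed. The helper names below are hypothetical and this is not GCC code. It only shows that both routes produce the same bytes for a type that can be natively interpreted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Mirrors the 0xFE INIT_PATTERN_VALUE in the quoted code; the helper
// names below are hypothetical and exist only for this illustration.
constexpr unsigned char kInitPatternByte = 0xFE;

// "memset" route: initialize the object's storage byte by byte.
// This needs the object's address, i.e. it must live in memory.
void init_by_memset(void* obj, std::size_t size, bool pattern)
{
  assert(obj != nullptr);
  std::memset(obj, pattern ? kInitPatternByte : 0, size);
}

// "assignment" route: build a value of type T from a buffer filled with
// the pattern byte, then let the caller assign it.  The target variable's
// address is never taken, so it can stay in a register.
template <class T>
T pattern_value()
{
  static_assert(std::is_trivially_copyable<T>::value,
                "byte-wise construction needs a trivially copyable type");
  unsigned char buf[sizeof(T)];
  std::memset(buf, kInitPatternByte, sizeof buf);
  T value;
  std::memcpy(&value, buf, sizeof value);
  return value;
}
```

The problem discussed here concerns only the first route. It needs an address, and a variable that has already been assigned to a register no longer has one.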
>> 
>> However, this exposed a bug that is very hard to address:
>> 
>> ***For the testcase testsuite/gcc.dg/uninit-I.c:
>> 
>> /* { dg-do compile } */
>> /* { dg-options "-O2 -Wuninitialized" } */
>> 
>> int sys_msgctl (void)
>> {
>>  struct { int mode; } setbuf;
>>  return setbuf.mode;  /* { dg-warning "'setbuf\.mode' is used" } */
>> }
>> ==
>> 
>> **The above auto var “setbuf” has “struct” type, for which
>> “can_native_interpret_type_p (var_type)” is false; therefore,
>> expanding this .DEFERRED_INIT call went down the “memset” expansion route.
>> 
>> However, this structure type fits into a register, so its address can no
>> longer be taken at this stage, even though I tried:
>> 
>> TREE_ADDRESSABLE (lhs) = 1;
>> var_addr = build_fold_addr_expr (lhs);
>> 
>> to take its address, the expansion still failed at expr.c,
>> line 8412:
>> during RTL pass: expand
>> /home/opc/Work/GCC/latest-gcc/gcc/testsuite/gcc.dg/auto-init-uninit-I.c:6:24:
>>  internal compiler error: in expand_expr_addr_expr_1, at expr.c:8412
>> 0xd04104 expand_expr_addr_expr_1
>>  ../../latest-gcc/gcc/expr.c:8412
>> 0xd04a95 expand_expr_addr_expr
>>  ../../latest-gcc/gcc/expr.c:8525
>> 0xd13592 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
>> expand_modifier, rtx_def**, bool)
>>  ../../latest-gcc/gcc/expr.c:11741
>> 0xd05142 expand_expr_real(tree_node*, rtx_def*, machine_mode, 
>> expand_modifier, rtx_def**, bool)
>>  ../../latest-gcc/gcc/expr.c:8713
>> 0xaed1d3 expand_expr
>>  ../../latest-gcc/gcc/expr.h:301
>> 0xaf0d89 get_memory_rtx
>>  ../../latest-gcc/gcc/builtins.c:1370
>> 0xafb4fb expand_builtin_memset_args
>>  ../../latest-gcc/gcc/builtins.c:4102
>> 0xafacde expand_builtin_memset(tree_node*, rtx_def*, machine_mode)
>>  ../../latest-gcc/gcc/builtins.c:3886
>> 0xe97fb3 expand_DEFERRED_INIT
>> 
>> **That’s the major reason why I chose “use_register_for_decl (lhs)” to
>> decide between the “memset” expansion and the “assign” expansion: the
>> “memset” expansion needs to take the address of the variable, and if the
>> variable has been decided to fit into a register, its address cannot be
>> taken anymore at this stage.
>> 
>> **using 

Re: What should a std::error_code pretty printer show?

2021-08-16 Thread Jonathan Wakely via Gcc-patches
On Mon, 16 Aug 2021 at 17:51, Jonathan Wakely  wrote:
>
> On Mon, 16 Aug 2021 at 13:11, Jonathan Wakely  wrote:
> >
> >
> >
> > On Mon, 16 Aug 2021, 12:55 Jonathan Wakely,  wrote:
> >>
> >> I'm adding a GDB printer for std::error_code. What I have now prints
> >> the category name as a quoted string, followed by the error value:
> >>
> >> {"system": 0}
> >> {"system": 1234}
> >>
> >> If the category is std::generic_category() then it also shows the
> >> strerror description:
> >
> >
> > I should probably extend this special case for the generic category to also 
> > apply to the system category when the OS is POSIX-based. For POSIX systems, 
> > the system error numbers are generic errno values.
> >
> >
> >>
> >> {"generic": 13 "Permission denied"}
> >>
> >> But I'd also like it to show the errno macro, and I'm not sure what's
> >> the best way to show it.
> >>
> >> Does this seem OK?
> >>
> >> {"generic": 13 EACCES "Permission denied"}
> >>
> >> I think that's a bit too verbose.
> >>
> >> Would {"generic": EACCES} be better? You can always use ec.value() to
> >> get the numeric value, and strerror to get the description if you want
> >> those.
>
> Here's what I plan to commit. It just uses {"generic": EACCES} for
> known categories that use errno values, and {"foo": 99} for other
> error categories.
>
>
> It also supports std::error_condition (using the same printer and the
> same output formats).

Actually, the proposal in PR 65230 would mean that we should print { }
if the object has its default-constructed state, i.e. {"system": 0}
for error_code and {"generic": 0} for error_condition. I'll make that
change before pushing anything to master.

Other suggestions for improvement (or just agreeing with the
direction) are welcome.
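A rough C++ model of the display format described above may help when reviewing. It is not the printer itself, which is written in Python for GDB. `describe` and `errno_macro_name` are hypothetical helpers, and the errno table is deliberately tiny.

```cpp
#include <cerrno>
#include <string>
#include <system_error>

// Hypothetical helper: names for a few errno values (a real printer would
// need the full table for the target).
const char* errno_macro_name(int value)
{
  switch (value)
    {
    case EACCES: return "EACCES";
    case ENOENT: return "ENOENT";
    case EINVAL: return "EINVAL";
    default: return nullptr;
    }
}

// Hypothetical helper mimicking the proposed display of std::error_code.
std::string describe(const std::error_code& ec)
{
  // Default-constructed state (system category, value 0) shows as "{ }",
  // following the PR 65230 proposal mentioned above.
  if (ec.value() == 0 && ec.category() == std::system_category())
    return "{ }";

  std::string out = "{\"";
  out += ec.category().name();
  out += "\": ";
  // Categories known to use errno values show the macro name when known.
  const char* macro = ec.category() == std::generic_category()
                        ? errno_macro_name(ec.value()) : nullptr;
  out += macro ? std::string(macro) : std::to_string(ec.value());
  out += "}";
  return out;
}
```

With this model, `std::make_error_code(std::errc::permission_denied)` shows as `{"generic": EACCES}` and a system-category code with value 1234 shows as `{"system": 1234}`.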


Re: [PATCH] c++: aggregate CTAD and brace elision [PR101344]

2021-08-16 Thread Patrick Palka via Gcc-patches
On Mon, 16 Aug 2021, Marek Polacek wrote:

> On Mon, Aug 16, 2021 at 03:06:08PM -0400, Patrick Palka via Gcc-patches wrote:
> > During aggregate CTAD, collect_ctor_idx_types always recurses into a
> > sub-CONSTRUCTOR, regardless of whether the corresponding pair of braces
> > was elided in the original initializer.  This causes us to reject some
> > completely-braced forms of aggregate CTAD as in the first testcase
> > below, because collect_ctor_idx_types effectively assumes that the given
> > initializer is always minimally-braced (hence the aggregate deduction
> > candidate is given a function type that's incompatible with the written
> > initializer).
> > 
> > This patch fixes this by making reshape_init flag CONSTRUCTORs that
> > were built to undo brace elision in the original CONSTRUCTOR, so that
> > collect_ctor_idx_types can determine whether to recurse into a
> > sub-CONSTRUCTOR by simply inspecting this flag.
> > 
> > This happens to also fix PR101820, which is about aggregate CTAD using
> > designated initializers, for a similar reason as above.
> > 
> > A tricky case is the "intermediately-braced" initialization of 'e3'
> > in the first testcase below.  It seems to me we're correct to continue
> > to reject this according to [over.match.class.deduct]/1 because here
> > the initializer element {1, 2, 3, 4} corresponds to the subobject E::t,
> > hence the type T_1 of the first function parameter of the aggregate
> > deduction candidate is T(&&)[2][2] which the argument {1, 2, 3, 4} isn't
> > compatible with (as opposed to say T(&&)[4]).
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk/11?
> > 
> > PR c++/101344
> > PR c++/101820
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-tree.h (CONSTRUCTOR_BRACES_ELIDED_P): Define.
> > * decl.c (reshape_init_r): Set it.
> > * pt.c (collect_ctor_idx_types): Recurse into a sub-CONSTRUCTOR
> > iff CONSTRUCTOR_BRACES_ELIDED_P.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/class-deduction-aggr11.C: New test.
> > * g++.dg/cpp2a/class-deduction-aggr12.C: New test.
> > ---
> >  gcc/cp/cp-tree.h  |  6 
> >  gcc/cp/decl.c | 18 +---
> >  gcc/cp/pt.c   |  7 +
> >  .../g++.dg/cpp2a/class-deduction-aggr11.C | 29 +++
> >  .../g++.dg/cpp2a/class-deduction-aggr12.C | 15 ++
> >  5 files changed, 65 insertions(+), 10 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr11.C
> >  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr12.C
> > 
> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> > index bd3f12a393e..8cbf6cc30b0 100644
> > --- a/gcc/cp/cp-tree.h
> > +++ b/gcc/cp/cp-tree.h
> > @@ -4502,6 +4502,12 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter)
> >  #define CONSTRUCTOR_IS_PAREN_INIT(NODE) \
> >(CONSTRUCTOR_CHECK(NODE)->base.private_flag)
> >  
> > +/* True if reshape_init built this CONSTRUCTOR to undo the brace elision
> > +   of another CONSTRUCTOR.  This flag is used during C++20 aggregate
> > +   CTAD.  */
> > +#define CONSTRUCTOR_BRACES_ELIDED_P(NODE) \
> > +  (CONSTRUCTOR_CHECK (NODE)->base.protected_flag)
> > +
> >  /* True if NODE represents a conversion for direct-initialization in a
> > template.  Set by perform_implicit_conversion_flags.  */
> >  #define IMPLICIT_CONV_EXPR_DIRECT_INIT(NODE) \
> > diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> > index b3671ee8956..9e257b32e18 100644
> > --- a/gcc/cp/decl.c
> > +++ b/gcc/cp/decl.c
> > @@ -6650,7 +6650,8 @@ reshape_init_r (tree type, reshape_iter *d, tree first_initializer_p,
> >/* A non-aggregate type is always initialized with a single
> >   initializer.  */
> >if (!CP_AGGREGATE_TYPE_P (type)
> > -  /* As is an array with dependent bound.  */
> > +  /* As is an array with dependent bound, which we can see
> > +during C++20 aggregate CTAD.  */
> >|| (cxx_dialect >= cxx20
> >   && TREE_CODE (type) == ARRAY_TYPE
> >   && uses_template_parms (TYPE_DOMAIN (type
> > @@ -6767,6 +6768,7 @@ reshape_init_r (tree type, reshape_iter *d, tree first_initializer_p,
> >   initializer already, and there is not a CONSTRUCTOR, it means that there
> >   is a missing set of braces (that is, we are processing the case for
> >   which reshape_init exists).  */
> > +  bool braces_elided_p = false;
> >if (!first_initializer_p)
> >  {
> >if (TREE_CODE (stripped_init) == CONSTRUCTOR)
> > @@ -6802,17 +6804,25 @@ reshape_init_r (tree type, reshape_iter *d, tree first_initializer_p,
> > warning (OPT_Wmissing_braces,
> >  "missing braces around initializer for %qT",
> >  type);
> > +  braces_elided_p = true;
> >  }
> >  
> >/* Dispatch to specialized routines.  */
> > +  tree new_init;
> >if (CLASS_TYPE_P 

Re: [PATCH] c++: aggregate CTAD and brace elision [PR101344]

2021-08-16 Thread Marek Polacek via Gcc-patches
On Mon, Aug 16, 2021 at 03:06:08PM -0400, Patrick Palka via Gcc-patches wrote:
> During aggregate CTAD, collect_ctor_idx_types always recurses into a
> sub-CONSTRUCTOR, regardless of whether the corresponding pair of braces
> was elided in the original initializer.  This causes us to reject some
> completely-braced forms of aggregate CTAD as in the first testcase
> below, because collect_ctor_idx_types effectively assumes that the given
> initializer is always minimally-braced (hence the aggregate deduction
> candidate is given a function type that's incompatible with the written
> initializer).
> 
> This patch fixes this by making reshape_init flag CONSTRUCTORs that
> were built to undo brace elision in the original CONSTRUCTOR, so that
> collect_ctor_idx_types can determine whether to recurse into a
> sub-CONSTRUCTOR by simply inspecting this flag.
> 
> This happens to also fix PR101820, which is about aggregate CTAD using
> designated initializers, for a similar reason as above.
> 
> A tricky case is the "intermediately-braced" initialization of 'e3'
> in the first testcase below.  It seems to me we're correct to continue
> to reject this according to [over.match.class.deduct]/1 because here
> the initializer element {1, 2, 3, 4} corresponds to the subobject E::t,
> hence the type T_1 of the first function parameter of the aggregate
> deduction candidate is T(&&)[2][2] which the argument {1, 2, 3, 4} isn't
> compatible with (as opposed to say T(&&)[4]).
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk/11?
> 
>   PR c++/101344
>   PR c++/101820
> 
> gcc/cp/ChangeLog:
> 
>   * cp-tree.h (CONSTRUCTOR_BRACES_ELIDED_P): Define.
>   * decl.c (reshape_init_r): Set it.
>   * pt.c (collect_ctor_idx_types): Recurse into a sub-CONSTRUCTOR
>   iff CONSTRUCTOR_BRACES_ELIDED_P.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/class-deduction-aggr11.C: New test.
>   * g++.dg/cpp2a/class-deduction-aggr12.C: New test.
> ---
>  gcc/cp/cp-tree.h  |  6 
>  gcc/cp/decl.c | 18 +---
>  gcc/cp/pt.c   |  7 +
>  .../g++.dg/cpp2a/class-deduction-aggr11.C | 29 +++
>  .../g++.dg/cpp2a/class-deduction-aggr12.C | 15 ++
>  5 files changed, 65 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr11.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr12.C
> 
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index bd3f12a393e..8cbf6cc30b0 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -4502,6 +4502,12 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter)
>  #define CONSTRUCTOR_IS_PAREN_INIT(NODE) \
>(CONSTRUCTOR_CHECK(NODE)->base.private_flag)
>  
> +/* True if reshape_init built this CONSTRUCTOR to undo the brace elision
> +   of another CONSTRUCTOR.  This flag is used during C++20 aggregate
> +   CTAD.  */
> +#define CONSTRUCTOR_BRACES_ELIDED_P(NODE) \
> +  (CONSTRUCTOR_CHECK (NODE)->base.protected_flag)
> +
>  /* True if NODE represents a conversion for direct-initialization in a
> template.  Set by perform_implicit_conversion_flags.  */
>  #define IMPLICIT_CONV_EXPR_DIRECT_INIT(NODE) \
> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> index b3671ee8956..9e257b32e18 100644
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -6650,7 +6650,8 @@ reshape_init_r (tree type, reshape_iter *d, tree first_initializer_p,
>/* A non-aggregate type is always initialized with a single
>   initializer.  */
>if (!CP_AGGREGATE_TYPE_P (type)
> -  /* As is an array with dependent bound.  */
> +  /* As is an array with dependent bound, which we can see
> +  during C++20 aggregate CTAD.  */
>|| (cxx_dialect >= cxx20
> && TREE_CODE (type) == ARRAY_TYPE
> && uses_template_parms (TYPE_DOMAIN (type
> @@ -6767,6 +6768,7 @@ reshape_init_r (tree type, reshape_iter *d, tree first_initializer_p,
>   initializer already, and there is not a CONSTRUCTOR, it means that there
>   is a missing set of braces (that is, we are processing the case for
>   which reshape_init exists).  */
> +  bool braces_elided_p = false;
>if (!first_initializer_p)
>  {
>if (TREE_CODE (stripped_init) == CONSTRUCTOR)
> @@ -6802,17 +6804,25 @@ reshape_init_r (tree type, reshape_iter *d, tree first_initializer_p,
>   warning (OPT_Wmissing_braces,
>"missing braces around initializer for %qT",
>type);
> +  braces_elided_p = true;
>  }
>  
>/* Dispatch to specialized routines.  */
> +  tree new_init;
>if (CLASS_TYPE_P (type))
> -return reshape_init_class (type, d, first_initializer_p, complain);
> +new_init = reshape_init_class (type, d, first_initializer_p, complain);
>else if (TREE_CODE (type) == ARRAY_TYPE)
> -return 

Re: [PATCH] [MIPS] Hazard barrier return support

2021-08-16 Thread Andrew Pinski via Gcc-patches
On Mon, Aug 16, 2021 at 7:43 AM Dragan Mladjenovic via Gcc-patches
 wrote:
>
> This patch allows a function to request clearing of all instruction and execution
> hazards upon normal return via __attribute__ ((use_hazard_barrier_return)).
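As a usage sketch: the attribute name below is taken from the patch under review, so it is honored only by a MIPS compiler that carries this patch. The macro expands to nothing everywhere else. Per the patch, using the attribute when targeting a pre-MIPS32r2 ISA or MIPS16 is an error, and such functions are neither inlined nor used for sibling calls. `set_control` and `g_control_shadow` are hypothetical stand-ins for code that writes hazard-sensitive state.

```cpp
#include <cstdint>

// Use the attribute only where the compiler reports supporting it; the
// nested #if keeps compilers without __has_attribute from parsing it.
#if defined(__mips__) && defined(__has_attribute)
#  if __has_attribute(use_hazard_barrier_return)
#    define HAZARD_BARRIER_RETURN __attribute__((use_hazard_barrier_return))
#  endif
#endif
#ifndef HAZARD_BARRIER_RETURN
#  define HAZARD_BARRIER_RETURN
#endif

// Hypothetical stand-in for a control-register write whose hazards should
// be cleared before execution resumes in the caller.
volatile std::uint32_t g_control_shadow;

HAZARD_BARRIER_RETURN void set_control(std::uint32_t value)
{
  g_control_shadow = value;
}
```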
>
> 2017-04-25  Prachi Godbole  
>
> gcc/
> * config/mips/mips.h (machine_function): New variable
> use_hazard_barrier_return_p.
> * config/mips/mips.md (UNSPEC_JRHB): New unspec.
> (mips_hb_return_internal): New insn pattern.
> * config/mips/mips.c (mips_attribute_table): Add attribute
> use_hazard_barrier_return.
> (mips_use_hazard_barrier_return_p): New static function.
> (mips_function_attr_inlinable_p): Likewise.
> (mips_compute_frame_info): Set use_hazard_barrier_return_p.
> Emit error for unsupported architecture choice.
> (mips_function_ok_for_sibcall, mips_can_use_return_insn):
> Return false for use_hazard_barrier_return.
> (mips_expand_epilogue): Emit hazard barrier return.
> * doc/extend.texi: Document use_hazard_barrier_return.
>
> gcc/testsuite/
> * gcc.target/mips/hazard-barrier-return-attribute.c: New test.
> ---
> Rehash of original patch posted by Prachi with minimal changes. Tested against
> mips-mti-elf with mips32r2/-EB and mips32r2/-EB/-micromips.
>
>  gcc/config/mips/mips.c| 58 +--
>  gcc/config/mips/mips.h|  3 +
>  gcc/config/mips/mips.md   | 15 +
>  gcc/doc/extend.texi   |  6 ++
>  .../mips/hazard-barrier-return-attribute.c| 20 +++
>  5 files changed, 98 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/mips/hazard-barrier-return-attribute.c
>
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 89d1be6cea6..6ce12fce52e 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -630,6 +630,7 @@ static const struct attribute_spec mips_attribute_table[] = {
>  mips_handle_use_shadow_register_set_attr, NULL },
>{ "keep_interrupts_masked",  0, 0, false, true,  true, false, NULL, NULL },
>{ "use_debug_exception_return", 0, 0, false, true, true, false, NULL, NULL },
> +  { "use_hazard_barrier_return", 0, 0, true, false, false, false, NULL, NULL },
>{ NULL, 0, 0, false, false, false, false, NULL, NULL }
>  };
>
> @@ -1309,6 +1310,16 @@ mips_use_debug_exception_return_p (tree type)
>TYPE_ATTRIBUTES (type)) != NULL;
>  }
>
> +/* Check if the attribute to use hazard barrier return is set for
> +   the function declaration DECL.  */
> +
> +static bool
> +mips_use_hazard_barrier_return_p (const_tree decl)
> +{
> +  return lookup_attribute ("use_hazard_barrier_return",
> +  DECL_ATTRIBUTES (decl)) != NULL;
> +}
> +
>  /* Return the set of compression modes that are explicitly required
> by the attributes in ATTRIBUTES.  */
>
> @@ -1494,6 +1505,19 @@ mips_can_inline_p (tree caller, tree callee)
>return default_target_can_inline_p (caller, callee);
>  }
>
> +/* Implement TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P.
> +
> +   A function requesting clearing of all instruction and execution hazards
> +   before returning cannot be inlined - thereby not clearing any hazards.
> +   All our other function attributes are related to how out-of-line copies
> +   should be compiled or called.  They don't in themselves prevent inlining.  */
> +
> +static bool
> +mips_function_attr_inlinable_p (const_tree decl)
> +{
> +  return !mips_use_hazard_barrier_return_p (decl);
> +}
> +
>  /* Handle an "interrupt" attribute with an optional argument.  */
>
>  static tree
> @@ -7921,6 +7945,11 @@ mips_function_ok_for_sibcall (tree decl, tree exp ATTRIBUTE_UNUSED)
>&& !targetm.binds_local_p (decl))
>  return false;
>
> +  /* Can't generate sibling calls if returning from current function using
> + hazard barrier return.  */
> +  if (mips_use_hazard_barrier_return_p (current_function_decl))
> +return false;
> +
>/* Otherwise OK.  */
>return true;
>  }
> @@ -11008,6 +11037,17 @@ mips_compute_frame_info (void)
> }
>  }
>
> +  /* Determine whether to use hazard barrier return or not.  */
> +  if (mips_use_hazard_barrier_return_p (current_function_decl))
> +{
> +  if (mips_isa_rev < 2)
> +   error ("hazard barrier returns require a MIPS32r2 processor or greater");

Just a small nit: is MIPS64r2 OK too?  Also, did you test it
for MIPS64 too? I still partly care about MIPS64.

Thanks,
Andrew

> +  else if (TARGET_MIPS16)
> +   error ("hazard barrier returns are not supported for MIPS16 functions");
> +  else
> +   cfun->machine->use_hazard_barrier_return_p = true;
> +}
> +
>frame = &cfun->machine->frame;
>memset (frame, 0, sizeof (*frame));
>size = get_frame_size ();
> @@ -12671,7 +12711,8 @@ 

[PATCH] c++: ignore explicit dguides during NTTP CTAD [PR101883]

2021-08-16 Thread Patrick Palka via Gcc-patches
Since (template) argument passing is a copy-initialization context,
we mustn't consider explicit deduction guides when deducing a CTAD
placeholder type of an NTTP.
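The underlying rule also applies outside NTTPs: explicit deduction guides are candidates in direct-initialization but not in copy-initialization. Here is a minimal C++17 sketch. The template `C` is made up, and unlike the patch's testcase its constructor takes `T`, so copy-initialization still has an implicit guide to fall back on.

```cpp
#include <type_traits>

#if defined(__cpp_deduction_guides)
template <class T>
struct C { C(T) {} };

// Explicit guide: only a candidate in direct-initialization.
explicit C(int) -> C<long>;

bool guide_demo()
{
  C c1(1);   // direct-init: the non-template explicit guide wins -> C<long>
  C c2 = 1;  // copy-init: explicit guide ignored, implicit guide -> C<int>
  return std::is_same<decltype(c1), C<long>>::value
         && std::is_same<decltype(c2), C<int>>::value;
}
#else
bool guide_demo() { return true; }  // CTAD not available in this mode
#endif
```

In the patch's testcase the constructor takes `int` rather than `T`, so the implicit guide cannot deduce `T`. Once the explicit guide is excluded, `X<1>` has no viable guide and must be rejected.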

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?

PR c++/101883

gcc/cp/ChangeLog:

* pt.c (convert_template_argument): Pass LOOKUP_IMPLICIT to
do_auto_deduction.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class49.C: New test.
---
 gcc/cp/pt.c  | 3 ++-
 gcc/testsuite/g++.dg/cpp2a/nontype-class49.C | 8 
 2 files changed, 10 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class49.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 0870ccdc9f6..5ac89901e22 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8486,7 +8486,8 @@ convert_template_argument (tree parm,
   can happen in the context of -fnew-ttp-matching.  */;
   else if (tree a = type_uses_auto (t))
{
- t = do_auto_deduction (t, arg, a, complain, adc_unify, args);
+ t = do_auto_deduction (t, arg, a, complain, adc_unify, args,
+LOOKUP_IMPLICIT);
  if (t == error_mark_node)
return error_mark_node;
}
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class49.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class49.C
new file mode 100644
index 000..c83e4075ed0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class49.C
@@ -0,0 +1,8 @@
+// PR c++/101883
+// { dg-do compile { target c++20 } }
+
+template<class T> struct C { constexpr C(int) { } };
+explicit C(int) -> C<int>;
+
+template<C c> struct X { };
+X<1> x; // { dg-error "deduction|no match" }
-- 
2.33.0.rc1



[PATCH] c++: aggregate CTAD and brace elision [PR101344]

2021-08-16 Thread Patrick Palka via Gcc-patches
During aggregate CTAD, collect_ctor_idx_types always recurses into a
sub-CONSTRUCTOR, regardless of whether the corresponding pair of braces
was elided in the original initializer.  This causes us to reject some
completely-braced forms of aggregate CTAD as in the first testcase
below, because collect_ctor_idx_types effectively assumes that the given
initializer is always minimally-braced (hence the aggregate deduction
candidate is given a function type that's incompatible with the written
initializer).
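Here is a minimal sketch of the two spellings involved. It uses a made-up aggregate rather than the patch's testcases. When the braces around `pos` are written out, the aggregate deduction candidate should take a `Point`; when they are elided, it should take two `int`s. Either way `T` should deduce to `char`. This assumes a C++20 compiler whose aggregate CTAD includes this fix. The checks are skipped when `__cpp_deduction_guides` does not advertise C++20 aggregate deduction.

```cpp
#include <type_traits>

struct Point { int x, y; };

template <class T>
struct Tagged { Point pos; T tag; };

// Returns true when both spellings deduce Tagged<char>.
bool ctad_demo()
{
#if __cpp_deduction_guides >= 201907L
  Tagged t1{{1, 2}, 'a'};  // fully braced: candidate takes (Point, T)
  Tagged t2{1, 2, 'a'};    // braces elided: candidate takes (int, int, T)
  static_assert(std::is_same_v<decltype(t1), Tagged<char>>);
  static_assert(std::is_same_v<decltype(t2), Tagged<char>>);
  return t1.pos.y == 2 && t2.pos.y == 2 && t2.tag == 'a';
#else
  return true;  // aggregate CTAD not advertised in this language mode
#endif
}
```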

This patch fixes this by making reshape_init flag CONSTRUCTORs that
were built to undo brace elision in the original CONSTRUCTOR, so that
collect_ctor_idx_types can determine whether to recurse into a
sub-CONSTRUCTOR by simply inspecting this flag.

This happens to also fix PR101820, which is about aggregate CTAD using
designated initializers, for a similar reason as above.

A tricky case is the "intermediately-braced" initialization of 'e3'
in the first testcase below.  It seems to me we're correct to continue
to reject this according to [over.match.class.deduct]/1 because here
the initializer element {1, 2, 3, 4} corresponds to the subobject E::t,
hence the type T_1 of the first function parameter of the aggregate
deduction candidate is T(&&)[2][2] which the argument {1, 2, 3, 4} isn't
compatible with (as opposed to say T(&&)[4]).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?

PR c++/101344
PR c++/101820

gcc/cp/ChangeLog:

* cp-tree.h (CONSTRUCTOR_BRACES_ELIDED_P): Define.
* decl.c (reshape_init_r): Set it.
* pt.c (collect_ctor_idx_types): Recurse into a sub-CONSTRUCTOR
iff CONSTRUCTOR_BRACES_ELIDED_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-aggr11.C: New test.
* g++.dg/cpp2a/class-deduction-aggr12.C: New test.
---
 gcc/cp/cp-tree.h  |  6 
 gcc/cp/decl.c | 18 +---
 gcc/cp/pt.c   |  7 +
 .../g++.dg/cpp2a/class-deduction-aggr11.C | 29 +++
 .../g++.dg/cpp2a/class-deduction-aggr12.C | 15 ++
 5 files changed, 65 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr11.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr12.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index bd3f12a393e..8cbf6cc30b0 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -4502,6 +4502,12 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter)
 #define CONSTRUCTOR_IS_PAREN_INIT(NODE) \
   (CONSTRUCTOR_CHECK(NODE)->base.private_flag)
 
+/* True if reshape_init built this CONSTRUCTOR to undo the brace elision
+   of another CONSTRUCTOR.  This flag is used during C++20 aggregate
+   CTAD.  */
+#define CONSTRUCTOR_BRACES_ELIDED_P(NODE) \
+  (CONSTRUCTOR_CHECK (NODE)->base.protected_flag)
+
 /* True if NODE represents a conversion for direct-initialization in a
template.  Set by perform_implicit_conversion_flags.  */
 #define IMPLICIT_CONV_EXPR_DIRECT_INIT(NODE) \
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index b3671ee8956..9e257b32e18 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6650,7 +6650,8 @@ reshape_init_r (tree type, reshape_iter *d, tree first_initializer_p,
   /* A non-aggregate type is always initialized with a single
  initializer.  */
   if (!CP_AGGREGATE_TYPE_P (type)
-  /* As is an array with dependent bound.  */
+  /* As is an array with dependent bound, which we can see
+during C++20 aggregate CTAD.  */
   || (cxx_dialect >= cxx20
  && TREE_CODE (type) == ARRAY_TYPE
  && uses_template_parms (TYPE_DOMAIN (type
@@ -6767,6 +6768,7 @@ reshape_init_r (tree type, reshape_iter *d, tree first_initializer_p,
  initializer already, and there is not a CONSTRUCTOR, it means that there
  is a missing set of braces (that is, we are processing the case for
  which reshape_init exists).  */
+  bool braces_elided_p = false;
   if (!first_initializer_p)
 {
   if (TREE_CODE (stripped_init) == CONSTRUCTOR)
@@ -6802,17 +6804,25 @@ reshape_init_r (tree type, reshape_iter *d, tree first_initializer_p,
warning (OPT_Wmissing_braces,
 "missing braces around initializer for %qT",
 type);
+  braces_elided_p = true;
 }
 
   /* Dispatch to specialized routines.  */
+  tree new_init;
   if (CLASS_TYPE_P (type))
-return reshape_init_class (type, d, first_initializer_p, complain);
+new_init = reshape_init_class (type, d, first_initializer_p, complain);
   else if (TREE_CODE (type) == ARRAY_TYPE)
-return reshape_init_array (type, d, first_initializer_p, complain);
+new_init = reshape_init_array (type, d, first_initializer_p, complain);
   else if (VECTOR_TYPE_P (type))
-return reshape_init_vector (type, d, complain);
+new_init = reshape_init_vector (type, d, 

Re: [PATCH][Hashtable 6/6] PR 68303 small size optimization

2021-08-16 Thread François Dumont via Gcc-patches

On 17/07/20 2:58 pm, Jonathan Wakely wrote:

On 17/11/19 22:31 +0100, François Dumont wrote:

This is an implementation of PR 68303.

I tried to apply this idea as much as possible to avoid computing hash codes.


Note that the tests don't show any gain. I guess the hash computation 
must be quite slow to benefit from it, so I only activate it when the 
hash code is not cached and/or when its computation is not fast.
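The idea can be sketched outside the library as follows. This is a minimal sketch, not the libstdc++ implementation: `small_find`, `CountingHash` and `small_size_threshold` are hypothetical names, and the threshold of 20 is taken from the commit message below. The counting hasher makes it observable that a lookup in a small set never calls the hash function:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Counts how many times the hasher is invoked (for illustration only).
struct CountingHash {
  static std::size_t calls;
  std::size_t operator()(const std::string& s) const {
    ++calls;
    return std::hash<std::string>{}(s);
  }
};
std::size_t CountingHash::calls = 0;

// Hypothetical threshold at or below which a set counts as "small".
constexpr std::size_t small_size_threshold = 20;

// Look up k without hashing when the set is small: compare k against
// each element with key_eq() instead of computing its hash code.
template <class Set>
typename Set::const_iterator
small_find(const Set& s, const typename Set::key_type& k) {
  if (s.size() <= small_size_threshold) {
    for (auto it = s.begin(); it != s.end(); ++it)
      if (s.key_eq()(*it, k))
        return it;
    return s.end();
  }
  return s.find(k);  // Large set: normal hashed lookup.
}
```

The trade-off only pays off when hashing a key costs noticeably more than a few equality comparisons, which matches the observation that long strings were needed to show a gain.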


If the tests don't show any benefit, why bother making the change?


I eventually managed to demonstrate this optimization through a 
performance test case.




Does it help the example in the PR?


No, the code attached to the PR just shows what the user did on their 
side to implement this optimization.


What I needed was a hash code computation that is slow compared to the 
equality operation. I realized that I had to use longer strings to achieve this.


Moreover, making this optimization dependent on _Hashtable_traits::__hash_cached 
was just wrong, as we cannot use the cached hash code here: the input 
is a key instance, not a node.


I introduced _Hashtable_hash_traits<_Hash> to offer a single 
customization point, as this optimization depends heavily on the cost 
difference between a hash code computation and a comparison. Maybe I 
should put it at std namespace scope to ease partial specialization?
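A customization point of this kind could look like the following hypothetical traits template. The names `hash_traits`, `IdentityHash` and `small_size_threshold()` are illustrative and are not the patch's `_Hashtable_hash_traits` interface. The partial specialization at the end shows why it helps for users to be able to specialize such a template:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical traits: how many elements a container may hold before
// hashing is expected to beat a linear scan with key_equal.
template <class Hash>
struct hash_traits {
  static constexpr std::size_t small_size_threshold() { return 20; }
};

// A hasher assumed to be cheap opts out of the linear scan entirely.
template <>
struct hash_traits<std::hash<int>> {
  static constexpr std::size_t small_size_threshold() { return 0; }
};

// A hypothetical user-defined family of cheap hashers.
template <class T>
struct IdentityHash {
  std::size_t operator()(const T& v) const { return static_cast<std::size_t>(v); }
};

// One partial specialization covers every IdentityHash<T>.
template <class T>
struct hash_traits<IdentityHash<T>> {
  static constexpr std::size_t small_size_threshold() { return 0; }
};
```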


Performance test results before the patch:

unordered_small_size.cc      std::unordered_set: 1st insert       40r   32u    8s 264000112mem    0pf
unordered_small_size.cc      std::unordered_set: find/erase       22r   22u    0s -191999808mem    0pf
unordered_small_size.cc      std::unordered_set: 2nd insert       36r   36u    0s 191999776mem    0pf
unordered_small_size.cc      std::unordered_set: erase key        25r   25u    0s -191999808mem    0pf
unordered_small_size.cc      std::unordered_set: 1st insert     404r  244u  156s -1989936256mem    0pf
unordered_small_size.cc      std::unordered_set: find/erase     315r  315u    0s 2061942272mem    0pf
unordered_small_size.cc      std::unordered_set: 2nd insert     233r  233u    0s -2061942208mem    0pf
unordered_small_size.cc      std::unordered_set: erase key      299r  298u    0s 2061942208mem    0pf


after the patch:

unordered_small_size.cc      std::unordered_set: 1st insert       41r   33u    7s 264000112mem    0pf
unordered_small_size.cc      std::unordered_set: find/erase       24r   25u    1s -191999808mem    0pf
unordered_small_size.cc      std::unordered_set: 2nd insert       34r   34u    0s 191999776mem    0pf
unordered_small_size.cc      std::unordered_set: erase key        25r   25u    0s -191999808mem    0pf
unordered_small_size.cc      std::unordered_set: 1st insert     399r  232u  165s -1989936256mem    0pf
unordered_small_size.cc      std::unordered_set: find/erase     196r  197u    0s 2061942272mem    0pf
unordered_small_size.cc      std::unordered_set: 2nd insert     221r  222u    0s -2061942208mem    0pf
unordered_small_size.cc      std::unordered_set: erase key      178r  178u    0s 2061942208mem    0pf


    libstdc++: Optimize operations on small size hashtable [PR 68303]

    When the hasher is identified as slow and the number of elements in the
    container is limited, use a brute-force loop over those elements to look
    for a given key using the key_equal functor. For the moment, the default
    threshold below which the container is considered small is 20.

    libstdc++-v3/ChangeLog:

    PR libstdc++/68303
    * include/bits/hashtable_policy.h
    (_Hashtable_hash_traits<_Hash>): New.
    (_Hash_code_base<>::_M_hash_code(const 
_Hash_node_value<>&)): New.

    (_Hashtable_base<>::_M_key_equals): New.
    (_Hashtable_base<>::_M_equals): Use latter.
    (_Hashtable_base<>::_M_key_equals_tr): New.
    (_Hashtable_base<>::_M_equals_tr): Use latter.
    * include/bits/hashtable.h
    (_Hashtable<>::__small_size_threshold()): New, use 
_Hashtable_hash_traits.
    (_Hashtable<>::find): Loop through elements to look for key 
if size is lower

    than __small_size_threshold().
    (_Hashtable<>::_M_emplace(true_type, _Args&&...)): Likewise.
    (_Hashtable<>::_M_insert_unique(_Kt&&, _Args&&, const 
_NodeGenerator&)): Likewise.

(_Hashtable<>::_M_compute_hash_code(const_iterator, const key_type&)): New.
    (_Hashtable<>::_M_emplace(const_iterator, false_type, 
_Args&&...)): Use latter.

    (_Hashtable<>::_M_find_before_node(const key_type&)): New.
    (_Hashtable<>::_M_erase(true_type, const key_type&)): Use 
latter.
    (_Hashtable<>::_M_erase(false_type, const key_type&)): 
Likewise.
    * src/c++11/hashtable_c++0x.cc: Include 
.

    * testsuite/util/testsuite_performance.h
    (report_performance): Use 9 width to display memory.
    * testsuite/performance/23_containers/insert_erase/unordered_small_size.cc:
    New performance test case.

Tested 

Re: [PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaliged access

2021-08-16 Thread Andrew Pinski via Gcc-patches
On Mon, Aug 16, 2021 at 10:10 AM Palmer Dabbelt  wrote:
>
> On Mon, 16 Aug 2021 09:29:16 PDT (-0700), Kito Cheng wrote:
> >> > Could you submit v3 patch which is v1 with overlap_op_by_pieces field,
> >> > testcase from v2 and add a few more comments to describe the field?
> >> >
> >> > And add an -mtune=ultra-size to make it able to test without change
> >> > other behavior?
> >> >
> >> > Hi Palmer:
> >> >
> >> > Are you OK with that?
> >>
> >> I'm still not convinced on the performance: like Andrew and I pointed
> >> out, this is a difficult case for pipelines of this flavor to handle.
> >> Nobody here knows anything about this pipeline deeply enough to say
> >> anything definitive, though, so this is really just a guess.
> >
> > So with an extra field to indicate should resolve that?
> > I believe people should only set overlap_op_by_pieces
> > to true only if they are sure it has benefits.
>
> My only issue there is that we'd have no way to turn it on, but see
> below...
>
> >> As I'm not convinced this is an obvious performance win I'm not going to
> >> merge it without a benchmark.  If you're convinced and want to merge it
> >> that's fine, I don't really care about the performance of the C906 and
> >> if someone complains we can always just revert it later.
> >
> > I suppose Christoph has tried it on their internal processor and it
> > benefits performance, but that processor can't be open-sourced yet, so
> > the v2 patch set uses the C906 to demonstrate and test this, since it is
> > the only processor with slow_unaligned_access=False.
>
> Well, that's a very different discussion.  The C906 tuning model should
> be for the C906, not a proxy for some internal-only processor.  If the
> goal here is to allow this pass to be flipped on by an out-of-tree
> pipeline model then we can talk about it.
>
> > I agree on the C906 part; we don't know whether it's beneficial or not, so I propose
> > adding one -mtune=ultra-size to make this testable rather than changing
> > C906.
>
> That's essentially the same conclusion we came to last time this came
> up, except that we were calling it "-Oz" (because LLVM does).  I guess
> we never got around to having the broader GCC discussion about "-Oz".  IIRC
> we had some other "-Oz" candidates we never got around to dealing with,
> but that was a while ago so I'm not sure if any of that panned out.

-Oz was a bad idea that Apple came up with because GCC decided to start
emitting store multiple on PowerPC around 13 years ago.
I don't think we should repeat that mistake for GCC and especially for RISCV.
If people want to optimize for size, they get the performance issues.

Thanks,
Andrew Pinski


Re: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under -ffast-math on aarch64

2021-08-16 Thread Andrew Pinski via Gcc-patches
On Mon, Aug 16, 2021 at 9:15 AM Jirui Wu via Gcc-patches
 wrote:
>
> Hi all,
>
> This patch generates FRINTZ instruction to optimize type casts.
>
> The changes in this patch cover:
> * Optimization of a FIX_TRUNC_EXPR cast inside a FLOAT_EXPR using IFN_TRUNC.
> * Change of corresponding test cases.
>
> Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master? If OK can it be committed for me, I have no commit rights.

Is there a reason why you are doing the transformation manually inside
forwprop rather than handling it inside match.pd?
Also, shouldn't this only be done in the -ffast-math case?
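One reason for such a restriction is that the two forms are not equivalent in general: the `int` round trip loses the sign of a negative zero result, and converting an out-of-range `double` to `int` is undefined behavior. A small check, assuming a standard C++11 compiler invoked without -ffast-math; `via_int` and `via_trunc` are illustrative names:

```cpp
#include <cassert>
#include <cmath>

// Round toward zero by converting through int, as in (double)(int)x.
double via_int(double x) { return static_cast<double>(static_cast<int>(x)); }

// Round toward zero directly, which is what a single FRINTZ computes.
double via_trunc(double x) { return std::trunc(x); }
```

Out-of-range inputs such as 1e10 are deliberately left out of any check, since the conversion to `int` is undefined for them.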

Thanks,
Andrew Pinski

>
> Thanks,
> Jirui
>
> gcc/ChangeLog:
>
> * tree-ssa-forwprop.c (pass_forwprop::execute): Optimize with frintz.
>
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/fix_trunc1.c: Update to new expectation.


[r12-2919 Regression] FAIL: gcc.target/i386/pr82460-2.c scan-assembler-not \\mvpermi2b\\M on Linux/x86_64

2021-08-16 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

faf2b6bc527dff31725dde55381c92688047 is the first bad commit
commit faf2b6bc527dff31725dde55381c92688047
Author: liuhongt 
Date:   Mon Aug 16 11:16:52 2021 +0800

Optimize __builtin_shuffle_vector.

caused

FAIL: gcc.target/i386/pr82460-2.c scan-assembler-not \\mvpermi2b\\M

with GCC configured with



To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr82460-2.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr82460-2.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [PATCH, V2 2/3] targhooks: New target hook for CTF/BTF debug info emission

2021-08-16 Thread Indu Bhagat via Gcc-patches

On 8/10/21 4:54 AM, Richard Biener wrote:

On Thu, Aug 5, 2021 at 2:52 AM Indu Bhagat via Gcc-patches
 wrote:


This patch adds a new target hook to detect if the CTF container can allow the
emission of CTF/BTF debug info at DWARF debug info early finish time. Some
backends, e.g., BPF when generating code for the CO-RE use case, may need to emit
the CTF/BTF debug info sections around the time when late DWARF debug is
finalized (dwarf2out_finish).


Without looking at the dwarf2out.c usage in the next patch: I think the CTF
part should always be emitted from dwarf2out_early_finish, and the "hooks"
should somehow arrange for the alternate-output-specific data to be
preserved until dwarf2out_finish time so the late BTF data can be emitted
from there.

Lumping everything together now just makes it harder to see what info is
required to persist, and thus makes LTO support more intrusive than necessary.


In principle, I agree that splitting CTF/BTF generation and emission as 
you describe is the ideal approach.  But the BTF CO-RE relocation format 
is such that the .BTF section cannot be finalized until the .BTF.ext 
contents are all fully known (David Faust summarizes this issue in the 
other thread "[PATCH, V2 3/3] dwarf2out: Emit BTF in dwarf2out_finish for 
BPF CO-RE usecase".)


In summary, the .BTF.ext section refers to strings in the .BTF section. 
These strings are added at the time the CO-RE relocations are added. 
Recall that the .BTF section's header has information about the .BTF 
string table start offset and length. So, this means the "CTF part" (or 
the .BTF section) cannot simply be emitted in the dwarf2out_early_finish 
because it's not ready yet. If it is still unclear, please let me know.
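A minimal sketch of this ordering constraint, using a hypothetical `StringTable` class (not GCC's CTF container API) and assuming, as in BTF, that offset 0 of the string table holds the empty string. The length that a header would record is only correct after the last string, including any added for CO-RE relocations, has been appended:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of a BTF-like string table.
class StringTable {
 public:
  StringTable() : data_(1, '\0') {}  // Offset 0 holds the empty string.

  // Append s (NUL-terminated) and return its offset; records in another
  // section (such as .BTF.ext) would refer to strings by this offset.
  std::uint32_t add(const std::string& s) {
    const std::uint32_t off = static_cast<std::uint32_t>(data_.size());
    data_.insert(data_.end(), s.begin(), s.end());
    data_.push_back('\0');
    return off;
  }

  // Value for the header's length field: final only once no more
  // strings will be added.
  std::uint32_t str_len() const {
    return static_cast<std::uint32_t>(data_.size());
  }

 private:
  std::vector<char> data_;
};
```

A header written when only "int" had been added would record a length of 5, which becomes stale as soon as a later relocation adds another string.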


My judgement here is that the BTF format itself is not amenable to split 
early/late emission like DWARF. BTF has no linker support yet either.





gcc/ChangeLog:

 * config/bpf/bpf.c (ctfc_debuginfo_early_finish_p): New definition.
 (TARGET_CTFC_DEBUGINFO_EARLY_FINISH_P): Undefine and override.
 * doc/tm.texi: Regenerated.
 * doc/tm.texi.in: Document the new hook.
 * target.def: Add a new hook.
 * targhooks.c (default_ctfc_debuginfo_early_finish_p): Likewise.
 * targhooks.h (default_ctfc_debuginfo_early_finish_p): Likewise.
---
  gcc/config/bpf/bpf.c | 14 ++
  gcc/doc/tm.texi  |  6 ++
  gcc/doc/tm.texi.in   |  2 ++
  gcc/target.def   | 10 ++
  gcc/targhooks.c  |  6 ++
  gcc/targhooks.h  |  2 ++
  6 files changed, 40 insertions(+)

diff --git a/gcc/config/bpf/bpf.c b/gcc/config/bpf/bpf.c
index 028013e..85f6b76 100644
--- a/gcc/config/bpf/bpf.c
+++ b/gcc/config/bpf/bpf.c
@@ -178,6 +178,20 @@ bpf_option_override (void)
  #undef TARGET_OPTION_OVERRIDE
  #define TARGET_OPTION_OVERRIDE bpf_option_override

+/* Return FALSE iff -mcore has been specified.  */
+
+static bool
+ctfc_debuginfo_early_finish_p (void)
+{
+  if (TARGET_BPF_CORE)
+return false;
+  else
+return true;
+}
+
+#undef TARGET_CTFC_DEBUGINFO_EARLY_FINISH_P
+#define TARGET_CTFC_DEBUGINFO_EARLY_FINISH_P ctfc_debuginfo_early_finish_p
+
  /* Define target-specific CPP macros.  This function in used in the
 definition of TARGET_CPU_CPP_BUILTINS in bpf.h */

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index cb01528..2d5ff05 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -10400,6 +10400,12 @@ Define this macro if GCC should produce debugging output in BTF debug
  format in response to the @option{-gbtf} option.
  @end defmac

+@deftypefn {Target Hook} bool TARGET_CTFC_DEBUGINFO_EARLY_FINISH_P (void)
+This target hook returns nonzero if the CTF Container can allow the
+ emission of the CTF/BTF debug info at the DWARF debuginfo early finish
+ time.
+@end deftypefn
+
  @node Floating Point
  @section Cross Compilation and Floating Point
  @cindex cross compilation and floating point
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 4a522ae..05b3c2c 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7020,6 +7020,8 @@ Define this macro if GCC should produce debugging output in BTF debug
  format in response to the @option{-gbtf} option.
  @end defmac

+@hook TARGET_CTFC_DEBUGINFO_EARLY_FINISH_P
+
  @node Floating Point
  @section Cross Compilation and Floating Point
  @cindex cross compilation and floating point
diff --git a/gcc/target.def b/gcc/target.def
index 68a46aa..44e2251 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4016,6 +4016,16 @@ clobbered parts of a register altering the frame register size",
   machine_mode, (int regno),
   default_dwarf_frame_reg_mode)

+/* Return nonzero if CTF Container can finalize the CTF/BTF emission
+   at DWARF debuginfo early finish time.  */
+DEFHOOK
+(ctfc_debuginfo_early_finish_p,
+ "This target hook returns nonzero if the CTF Container can allow the\n\
+ emission of the CTF/BTF debug info at the DWARF debuginfo early finish\n\
+ time.",
+ bool, (void),
+ 

C++ Patch ping

2021-08-16 Thread Jakub Jelinek via Gcc-patches
Hi!

I'd like to ping 3 patches:

c++: Add C++20 #__VA_OPT__ support
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575355.html

libcpp: __VA_OPT__ p1042r1 placemarker changes [PR101488]
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575621.html

libcpp, v2: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode 
Standard Annex 31 [PR100977]
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576854.html

Thanks

Jakub



Re: [PATCH] c++, v2: Implement P0466R5 __cpp_lib_is_layout_compatible compiler helpers [PR101539]

2021-08-16 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 12, 2021 at 07:07:00PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Thu, Aug 12, 2021 at 12:06:33PM -0400, Jason Merrill wrote:
> > Yes; if the standard says something nonsensical, I prefer to figure out
> > something more sensible to propose as a change.
> 
> Ok, so here it is implemented, so far tested only on the new testcases
> (but nothing else really uses the code that has changed since the last
> patch).  Also attached incremental diff (so that it is also clear
> what test behaviors changed).
> I'll of course bootstrap/regtest it fully tonight.

Bootstrapped/regtested fine on both x86_64-linux and i686-linux.

> 2021-08-12  Jakub Jelinek  
> 
>   PR c++/101539
> gcc/c-family/
>   * c-common.h (enum rid): Add RID_IS_LAYOUT_COMPATIBLE.
>   * c-common.c (c_common_reswords): Add __is_layout_compatible.
> gcc/cp/
>   * cp-tree.h (enum cp_trait_kind): Add CPTK_IS_LAYOUT_COMPATIBLE.
>   (enum cp_built_in_function): Add CP_BUILT_IN_IS_CORRESPONDING_MEMBER.
>   (fold_builtin_is_corresponding_member, layout_compatible_type_p):
>   Declare.
>   * parser.c (cp_parser_primary_expression): Handle
>   RID_IS_LAYOUT_COMPATIBLE.
>   (cp_parser_trait_expr): Likewise.
>   * cp-objcp-common.c (names_builtin_p): Likewise.
>   * constraint.cc (diagnose_trait_expr): Handle
>   CPTK_IS_LAYOUT_COMPATIBLE.
>   * decl.c (cxx_init_decl_processing): Register
>   __builtin_is_corresponding_member builtin.
>   * constexpr.c (cxx_eval_builtin_function_call): Handle
>   CP_BUILT_IN_IS_CORRESPONDING_MEMBER builtin.
>   * semantics.c (is_corresponding_member_union,
>   is_corresponding_member_aggr, fold_builtin_is_corresponding_member):
>   New functions.
>   (trait_expr_value): Handle CPTK_IS_LAYOUT_COMPATIBLE.
>   (finish_trait_expr): Likewise.
>   * typeck.c (layout_compatible_type_p): New function.
>   * cp-gimplify.c (cp_gimplify_expr): Fold
>   CP_BUILT_IN_IS_CORRESPONDING_MEMBER.
>   (cp_fold): Likewise.
>   * tree.c (builtin_valid_in_constant_expr_p): Handle
>   CP_BUILT_IN_IS_CORRESPONDING_MEMBER.
>   * cxx-pretty-print.c (pp_cxx_trait_expression): Handle
>   CPTK_IS_LAYOUT_COMPATIBLE.
>   * class.c (remove_zero_width_bit_fields): Remove.
>   (layout_class_type): Don't call it.
> gcc/testsuite/
>   * g++.dg/cpp2a/is-corresponding-member1.C: New test.
>   * g++.dg/cpp2a/is-corresponding-member2.C: New test.
>   * g++.dg/cpp2a/is-corresponding-member3.C: New test.
>   * g++.dg/cpp2a/is-corresponding-member4.C: New test.
>   * g++.dg/cpp2a/is-corresponding-member5.C: New test.
>   * g++.dg/cpp2a/is-corresponding-member6.C: New test.
>   * g++.dg/cpp2a/is-corresponding-member7.C: New test.
>   * g++.dg/cpp2a/is-corresponding-member8.C: New test.
>   * g++.dg/cpp2a/is-layout-compatible1.C: New test.
>   * g++.dg/cpp2a/is-layout-compatible2.C: New test.
>   * g++.dg/cpp2a/is-layout-compatible3.C: New test.

Jakub



Re: [PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaliged access

2021-08-16 Thread Palmer Dabbelt

On Mon, 16 Aug 2021 09:29:16 PDT (-0700), Kito Cheng wrote:

> Could you submit v3 patch which is v1 with overlap_op_by_pieces field,
> testcase from v2 and add a few more comments to describe the field?
>
> And add an -mtune=ultra-size to make it able to test without change
> other behavior?
>
> Hi Palmer:
>
> Are you OK with that?

I'm still not convinced on the performance: like Andrew and I pointed
out, this is a difficult case for pipelines of this flavor to handle.
Nobody here knows anything about this pipeline deeply enough to say
anything definitive, though, so this is really just a guess.


So with an extra field to indicate should resolve that?
I believe people should only set overlap_op_by_pieces
to true only if they are sure it has benefits.


My only issue there is that we'd have no way to turn it on, but see 
below...



As I'm not convinced this is an obvious performance win I'm not going to
merge it without a benchmark.  If you're convinced and want to merge it
that's fine, I don't really care about the performance of the C906 and
if someone complains we can always just revert it later.


I suppose Christoph has tried it on their internal processor and it
benefits performance, but that processor can't be open-sourced yet, so
the v2 patch set uses the C906 to demonstrate and test this, since it is
the only processor with slow_unaligned_access=False.


Well, that's a very different discussion.  The C906 tuning model should 
be for the C906, not a proxy for some internal-only processor.  If the 
goal here is to allow this pass to be flipped on by an out-of-tree 
pipeline model then we can talk about it.



I agree on the C906 part; we don't know whether it's beneficial or not, so I propose
adding one -mtune=ultra-size to make this testable rather than changing C906.


That's essentially the same conclusion we came to last time this came 
up, except that we were calling it "-Oz" (because LLVM does).  I guess 
we never got around to having the broader GCC discussion about "-Oz".  IIRC 
we had some other "-Oz" candidates we never got around to dealing with, 
but that was a while ago so I'm not sure if any of that panned out.
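For context on what overlap_op_by_pieces enables: an overlapping by-pieces operation handles an odd-sized tail with one more full-width piece that overlaps the previous one, instead of a chain of smaller pieces. A hypothetical C++ helper showing the idea for copies of 4 to 8 bytes (the compiler does this internally when expanding block moves; this is not GCC code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy n bytes (4 <= n <= 8) using exactly two 4-byte moves. The second
// move starts at n - 4, so for n < 8 it overlaps the first one. This
// replaces a 2-byte plus 1-byte tail, but the second access is usually
// unaligned, so it only pays off where unaligned accesses are cheap.
void copy_4_to_8(unsigned char* dst, const unsigned char* src, std::size_t n) {
  std::uint32_t head, tail;
  std::memcpy(&head, src, 4);
  std::memcpy(&tail, src + n - 4, 4);
  std::memcpy(dst, &head, 4);
  std::memcpy(dst + n - 4, &tail, 4);
}
```

The byte stored twice in the overlap has the same value both times, which is what makes the overlap harmless for correctness; whether it is faster is exactly the pipeline-specific question discussed above.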


Re: [PATCH 1/2] analyzer: detect and analyze calls via function pointer (GSoC)

2021-08-16 Thread Ankur Saini via Gcc-patches

Thanks for the review

> On 16-Aug-2021, at 4:48 AM, David Malcolm  wrote:
> 
> Thanks, this is looking promising.  Has this been rebased recently
> (e.g. since I merged
>  https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576737.html )

Yes, the branch is fully up to date with master as of writing this mail.

- - -

Here is the updated patch :



vfunc.patch
Description: Binary data


- - -
P.S. While adding the new tests, I found some last-minute bugs where the analyzer 
would sometimes try to access a NULL cgraph edge to get the call stmt and crash.
Although the problem was easily fixed by updating 
`callgraph_superedge::get_call_stmt ()`, it led to a delay in running the test 
suite on the final version (so the patch is still under testing at the time of 
writing this mail).

Thanks 
- Ankur

[committed] libstdc++: Use qualified-id for class member constant [PR101937]

2021-08-16 Thread Jonathan Wakely via Gcc-patches
The expression ctx._M_indent is not a constant expression when ctx is a
reference parameter, even though _M_indent is an enumerator. Rename it
to _S_indent to be consistent with our conventions, and refer to it as
PrintContext::_S_indent to be valid C++ code (at least until P2280 is
accepted as a DR).
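A reduced illustration of the rule, with a hypothetical `Context` struct standing in for `PrintContext` and using non-reserved member names:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for libstdc++'s PrintContext.
struct Context {
  static constexpr int indent = 4;
  std::size_t column = 1;
};

// Writing the bound as ctx.indent would evaluate the reference ctx,
// which (before P2280) is not allowed in a constant expression; the
// qualified-id Context::indent is always a constant expression.
std::size_t spacing_size(const Context& ctx) {
  const char spacing[Context::indent + 1] = "";
  (void)ctx;  // ctx is deliberately not used in the array bound
  return sizeof(spacing);
}
```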

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101937
* src/c++11/debug.cc (PrintContext::_M_indent): Replace with a
static data member.
(print_word): Use qualified-id to access it.

Tested x86_64-linux. Committed to trunk.

commit 6c25932ac399423b09b730fb8f894ada568deb2a
Author: Jonathan Wakely 
Date:   Mon Aug 16 15:35:58 2021

libstdc++: Use qualified-id for class member constant [PR101937]

The expression ctx._M_indent is not a constant expression when ctx is a
reference parameter, even though _M_indent is an enumerator. Rename it
to _S_indent to be consistent with our conventions, and refer to it as
PrintContext::_S_indent to be valid C++ code (at least until P2280 is
accepted as a DR).

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101937
* src/c++11/debug.cc (PrintContext::_M_indent): Replace with a
static data member.
(print_word): Use qualified-id to access it.

diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc
index 33d76bfcaab..0128535135e 100644
--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -573,8 +573,8 @@ namespace
 : _M_max_length(78), _M_column(1), _M_first_line(true), _M_wordwrap(false)
 { get_max_length(_M_max_length); }
 
+static constexpr int _S_indent = 4;
 std::size_t    _M_max_length;
-enum { _M_indent = 4 } ;
 std::size_t    _M_column;
 bool   _M_first_line;
 bool   _M_wordwrap;
@@ -603,7 +603,7 @@ namespace
 if (length == 0)
   return;
 
-// Consider first '\n' at begining cause it impacts column.
+// First consider '\n' at the beginning because it impacts the column.
 if (word[0] == '\n')
   {
fprintf(stderr, "\n");
@@ -625,8 +625,8 @@ namespace
// If this isn't the first line, indent
if (ctx._M_column == 1 && !ctx._M_first_line)
  {
-   const char spacing[ctx._M_indent + 1] = "";
-   print_raw(ctx, spacing, ctx._M_indent);
+   const char spacing[PrintContext::_S_indent + 1] = "";
+   print_raw(ctx, spacing, PrintContext::_S_indent);
  }
 
int written = fprintf(stderr, "%.*s", (int)length, word);


[committed] libstdc++: Install GDB pretty printers for debug library

2021-08-16 Thread Jonathan Wakely via Gcc-patches
The additional libraries installed by --enable-libstdcxx-debug are built
without optimization to aid debugging, but the Python pretty printers
are not installed alongside them. This means that you can step through
the unoptimized library code, but at the expense of pretty printing the
library types.

This remedies the situation by installing another copy of the GDB hooks
alongside the debug version of libstdc++.so.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* python/Makefile.am [GLIBCXX_BUILD_DEBUG] (install-data-local):
Install another copy of the GDB hook.
* python/Makefile.in: Regenerate.

Tested x86_64-linux. Committed to trunk.

commit db853ff78a34fef25bc16133e0367a64526f9f4e
Author: Jonathan Wakely 
Date:   Thu Aug 12 19:56:14 2021

libstdc++: Install GDB pretty printers for debug library

The additional libraries installed by --enable-libstdcxx-debug are built
without optimization to aid debugging, but the Python pretty printers
are not installed alongside them. This means that you can step through
the unoptimized library code, but at the expense of pretty printing the
library types.

This remedies the situation by installing another copy of the GDB hooks
alongside the debug version of libstdc++.so.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* python/Makefile.am [GLIBCXX_BUILD_DEBUG] (install-data-local):
Install another copy of the GDB hook.
* python/Makefile.in: Regenerate.

diff --git a/libstdc++-v3/python/Makefile.am b/libstdc++-v3/python/Makefile.am
index 8efefa5725c..bc4a26651d8 100644
--- a/libstdc++-v3/python/Makefile.am
+++ b/libstdc++-v3/python/Makefile.am
@@ -29,6 +29,12 @@ else
 pythondir = $(datadir)/gcc-$(gcc_version)/python
 endif
 
+if GLIBCXX_BUILD_DEBUG
+debug_gdb_py = YES
+else
+debug_gdb_py =
+endif
+
 all-local: gdb.py
 
 nobase_python_DATA = \
@@ -53,4 +59,8 @@ install-data-local: gdb.py
  $(DESTDIR)$(toolexeclibdir)/libstdc++.la`; \
fi; \
echo " $(INSTALL_DATA) gdb.py $(DESTDIR)$(toolexeclibdir)/$$libname-gdb.py"; \
-   $(INSTALL_DATA) gdb.py $(DESTDIR)$(toolexeclibdir)/$$libname-gdb.py
+   $(INSTALL_DATA) gdb.py $(DESTDIR)$(toolexeclibdir)/$$libname-gdb.py ; \
+   if [ -n "$(debug_gdb_py)" ]; then \
+ sed "/^libdir = /s;'$$;/debug';" gdb.py > debug-gdb.py ; \
+ $(INSTALL_DATA) debug-gdb.py $(DESTDIR)$(toolexeclibdir)/debug/$$libname-gdb.py ; \
+   fi
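The `sed` expression in the hunk above (after Make turns `$$` into `$`) rewrites only the `libdir = '...'` line of the copied hook, appending `/debug` inside the closing quote. The same substitution as a hypothetical C++ helper, assuming the hook contains such a line (the exact contents of gdb.py are not shown here):

```cpp
#include <cassert>
#include <string>

// Mimics sed "/^libdir = /s;'$;/debug';": on a line starting with
// "libdir = ", replace a single quote at the end of the line with
// "/debug'". All other lines are returned unchanged.
std::string debugify_libdir(const std::string& line) {
  const std::string prefix = "libdir = ";
  if (line.compare(0, prefix.size(), prefix) != 0)
    return line;  // Not the libdir assignment.
  if (line.back() != '\'')
    return line;  // No trailing quote: sed's s command would not match.
  return line.substr(0, line.size() - 1) + "/debug'";
}
```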


Re: What should a std::error_code pretty printer show?

2021-08-16 Thread Jonathan Wakely via Gcc-patches
On Mon, 16 Aug 2021 at 13:11, Jonathan Wakely  wrote:
>
>
>
> On Mon, 16 Aug 2021, 12:55 Jonathan Wakely,  wrote:
>>
>> I'm adding a GDB printer for std::error_code. What I have now prints
>> the category name as a quoted string, followed by the error value:
>>
>> {"system": 0}
>> {"system": 1234}
>>
>> If the category is std::generic_category() then it also shows the
>> strerror description:
>
>
> I should probably extend this special case for the generic category to also 
> apply to the system category when the OS is POSIX-based. For POSIX systems, 
> the system error numbers are generic errno values.
>
>
>>
>> {"generic": 13 "Permission denied"}
>>
>> But I'd also like it to show the errno macro, but I'm not sure what's
>> the best way to show it.
>>
>> Does this seem OK?
>>
>> {"generic": 13 EACCES "Permission denied"}
>>
>> I think that's a bit too verbose.
>>
>> Would {"generic": EACCES} be better? You can always use ec.value() to
>> get the numeric value, and strerror to get the description if you want
>> those.

Here's what I plan to commit. It just uses {"generic": EACCES} for
known categories that use errno values, and {"foo": 99} for other
error categories.


It also supports std::error_condition (using the same printer and the
same output formats).
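A rough C++ analogue of the output format described above, using a deliberately tiny hypothetical errno-name table in place of Python's `errno.errorcode`, and treating the system category as errno-based the way the printer does on POSIX hosts:

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <system_error>

// Hypothetical, tiny stand-in for Python's errno.errorcode mapping.
const char* errno_macro_name(int value) {
  switch (value) {
    case EPERM:  return "EPERM";
    case ENOENT: return "ENOENT";
    case EACCES: return "EACCES";
    default:     return nullptr;
  }
}

// Format like the printer: {"category": value}, using the errno macro
// name for positive values in errno-based categories when one is known.
std::string format_error_code(const std::error_code& ec, bool posix_host = true) {
  const std::string cat = ec.category().name();
  std::string val = std::to_string(ec.value());
  const bool errno_based = cat == "generic" || (posix_host && cat == "system");
  if (ec.value() > 0 && errno_based) {
    if (const char* name = errno_macro_name(ec.value()))
      val = name;
  }
  return "{\"" + cat + "\": " + val + "}";
}
```

Unknown values fall back to the number, matching the printer's `try`/`except` around the `errno.errorcode` lookup.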
commit f2de11d5f5758dfe90330b0efacbea9c7819df8e
Author: Jonathan Wakely 
Date:   Mon Aug 16 17:41:50 2021

libstdc++: Add pretty printer for std::error_code and std::error_condition

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py (StdErrorCodePrinter): Define.
(build_libstdcxx_dictionary): Register printer for
std::error_code and std::error_condition.
* testsuite/libstdc++-prettyprinters/cxx11.cc: Test it.

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 550e0ecdd22..069c1bae183 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -18,7 +18,7 @@
 import gdb
 import itertools
 import re
-import sys
+import sys, os, errno
 
 ### Python 2 + Python 3 compatibility code
 
@@ -1484,6 +1484,52 @@ class StdCmpCatPrinter:
         name = names[int(self.val)]
         return 'std::{}::{}'.format(self.typename, name)
 
+class StdErrorCatPrinter:
+    "Print an object derived from std::error_category"
+
+    def __init__ (self, typename, val):
+        self.val = val
+        self.typename = typename
+
+    def to_string (self):
+        gdb.set_convenience_variable('__cat', self.val)
+        name = gdb.parse_and_eval('$__cat->name()').string()
+        return 'error category = "{}"'.format(name)
+
+class StdErrorCodePrinter:
+    "Print a std::error_code or std::error_condition"
+
+    _errno_categories = None # List of categories that use errno values
+
+    def __init__ (self, typename, val):
+        self.val = val
+        self.typename = typename
+        # Do this only once ...
+        if StdErrorCodePrinter._errno_categories is None:
+            StdErrorCodePrinter._errno_categories = ['generic']
+            try:
+                import posix
+                StdErrorCodePrinter._errno_categories.append('system')
+            except ImportError:
+                pass
+
+    @staticmethod
+    def _category_name(cat):
+        "Call the virtual function that overrides std::error_category::name()"
+        gdb.set_convenience_variable('__cat', cat)
+        return gdb.parse_and_eval('$__cat->name()').string()
+
+    def to_string (self):
+        value = self.val['_M_value']
+        category = self._category_name(self.val['_M_cat'])
+        strval = str(value)
+        if value > 0 and category in StdErrorCodePrinter._errno_categories:
+            try:
+                strval = errno.errorcode[int(value)]
+            except:
+                pass
+        return '%s = {"%s": %s}' % (self.typename, category, strval)
+
 # A "regular expression" printer which conforms to the
 # "SubPrettyPrinter" protocol from gdb.printing.
 class RxPrinter(object):
@@ -1886,6 +1932,8 @@ def build_libstdcxx_dictionary ():
     libstdcxx_printer.add_version('std::__cxx11::', 'basic_string', StdStringPrinter)
     libstdcxx_printer.add_container('std::', 'bitset', StdBitsetPrinter)
     libstdcxx_printer.add_container('std::', 'deque', StdDequePrinter)
+    libstdcxx_printer.add_version('std::', 'error_code', StdErrorCodePrinter)
+    libstdcxx_printer.add_version('std::', 'error_condition', StdErrorCodePrinter)
     libstdcxx_printer.add_container('std::', 'list', StdListPrinter)
     libstdcxx_printer.add_container('std::__cxx11::', 'list', StdListPrinter)
     libstdcxx_printer.add_container('std::', 'map', StdMapPrinter)
diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx11.cc b/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx11.cc
index 05950513ab0..08f2c367f45 100644
--- 
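The formatting rule in `StdErrorCodePrinter.to_string` above can be illustrated with a small C analogue. This is only a sketch, not part of the patch: `format_error_code` and the four-entry `errno_names` table are hypothetical stand-ins for Python's `errno.errorcode`, and it assumes a POSIX host, which is the case where the printer also treats the "system" category as holding errno values.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for Python's errno.errorcode (a small subset only). */
static const struct {
    int value;
    const char *name;
} errno_names[] = {
    { EPERM, "EPERM" }, { ENOENT, "ENOENT" }, { EACCES, "EACCES" }, { EINVAL, "EINVAL" },
};

/* Mirror of the printer's rule: a positive value in the "generic" or "system"
   category is shown by its errno name when known, otherwise as a number. */
static void format_error_code(char *out, size_t len, const char *type_name,
                              const char *category, int value)
{
    char strval[32];
    snprintf(strval, sizeof strval, "%d", value);

    if (value > 0
        && (strcmp(category, "generic") == 0 || strcmp(category, "system") == 0)) {
        for (size_t i = 0; i < sizeof errno_names / sizeof errno_names[0]; i++) {
            if (errno_names[i].value == value) {
                snprintf(strval, sizeof strval, "%s", errno_names[i].name);
                break;
            }
        }
    }
    snprintf(out, len, "%s = {\"%s\": %s}", type_name, category, strval);
}
```

Under these assumptions an error code holding `ENOENT` in the generic category would be shown as `std::error_code = {"generic": ENOENT}`, while zero or a value in any other category stays numeric.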

Re: [patch][version 6] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-08-16 Thread Qing Zhao via Gcc-patches


> On Aug 16, 2021, at 2:11 AM, Richard Biener  wrote:
> 
> On Wed, 11 Aug 2021, Qing Zhao wrote:
> 
>> Hi, 
>> 
>> I met another issue for “address taken” auto variable, see below for details:
>> 
>>  the testing case: (gcc/testsuite/gcc.dg/uninit-16.c)
>> 
>> int foo, bar;
>> 
>> static
>> void decode_reloc(int reloc, int *is_alt)
>> {
>>  if (reloc >= 20)
>>  *is_alt = 1;
>>  else if (reloc >= 10)
>>  *is_alt = 0;
>> }
>> 
>> void testfunc()
>> {
>>  int alt_reloc;
>> 
>>  decode_reloc(foo, &alt_reloc);
>> 
>>  if (alt_reloc) /* { dg-warning "may be used uninitialized" } */
>>bar = 42;
>> }
>> 
>> When compiled with -ftrivial-auto-var-init=zero -O2 -Wuninitialized 
>> -fdump-tree-all:
>> 
>> .*gimple dump:
>> 
>> void testfunc ()
>> { 
>>  int alt_reloc;
>> 
>>  try
>>{
>>  _1 = .DEFERRED_INIT (4, 2, 0);
>>  alt_reloc = _1;
>>  foo.0_2 = foo;
>>  decode_reloc (foo.0_2, &alt_reloc);
>>  alt_reloc.1_3 = alt_reloc;
>>  if (alt_reloc.1_3 != 0) goto ; else goto ;
>>  :
>>  bar = 42;
>>  :
>>}
>>  finally
>>{
>>  alt_reloc = {CLOBBER};
>>}
>> }
>> 
>> **fre1 dump:
>> 
>> void testfunc ()
>> {
>>  int alt_reloc;
>>  int _1;
>>  int foo.0_2;
>> 
>>   :
>>  _1 = .DEFERRED_INIT (4, 2, 0);
>>  foo.0_2 = foo;
>>  if (foo.0_2 > 19)
>>goto ; [50.00%]
>>  else
>>goto ; [50.00%]
>> 
>>   :
>>  goto ; [100.00%]
>> 
>>   :
>>  if (foo.0_2 > 9)
>>goto ; [50.00%]
>>  else
>>goto ; [50.00%]
>> 
>>   :
>>  goto ; [100.00%]
>> 
>>   :
>>  if (_1 != 0)
>>goto ; [INV]
>>  else
>>goto ; [INV]
>> 
>>   :
>>  bar = 42;
>> 
>>   :
>>  return;
>> 
>> }
>> 
>> From the above IR file after “FRE”, we can see that the major issue with 
>> this IR is:
>> 
>> The address taken auto variable “alt_reloc” has been completely replaced by 
>> the temporary variable “_1” in all
>> the uses of the original “alt_reloc”. 
> 
> Well, this can happen with regular code as well, there's no need for
> .DEFERRED_INIT.  This is the usual problem with reporting uninitialized
> uses late.
> 
> IMHO this shouldn't be a blocker.  The goal of zero "regressions" wrt
> -Wuninitialized isn't really achievable.

Okay. Sounds reasonable to me too.

> 
>> The major problem with such IR is,  during uninitialized analysis phase, the 
>> original use of “alt_reloc” disappeared completely.
>> So, the warning cannot be reported.
>> 
>> 
>> My questions:
>> 
>> 1. Is it possible to get the original “alt_reloc” through the temporary 
>> variable “_1” with some available information recorded in the IR?
>> 2. If not, then we have to record the relationship between “alt_reloc” and 
>> “_1” when the original “alt_reloc” is replaced by “_1” and get such 
>> relationship during
>>Uninitialized analysis phase.  Is this doable?
> 
> Well, you could add a fake argument to .DEFERRED_INIT for the purpose of
> diagnostics.  The difficulty is to avoid tracking it as actual use so
> you could, for example, pass a string with the declaration's name, though
> this wouldn't give the association with the actual decl.
Good suggestion, I can try this a little bit. 

> 
>> 3. It looks like, for an “address taken” auto variable, we have to 
>> introduce a new temporary variable and split the call to .DEFERRED_INIT into 
>> two:
>> 
>>  temp = .DEFERRED_INIT (4, 2, 0);
>>  alt_reloc = temp;
>> 
>>   This might cause more issues.
>> 
>> Any comments and suggestions on this issue?
> 
> I don't see any good possibilities that would not make optimizing code
> as good as w/o .DEFERRED_INIT more difficult.  My stake here is always
> that GCC is an optimizing compiler, not a static analysis engine and
> thus I side with "broken" diagnostics and better optimization.
That’s true and reasonable, too.

thanks.

Qing
> 
> Richard.
> 
>> Qing
>> 
>>> On Aug 11, 2021, at 11:55 AM, Richard Biener  wrote:
>>> 
>>> On August 11, 2021 6:22:00 PM GMT+02:00, Qing Zhao  
>>> wrote:
 
 
> On Aug 11, 2021, at 10:53 AM, Richard Biener  wrote:
> 
> On August 11, 2021 5:30:40 PM GMT+02:00, Qing Zhao  
> wrote:
>> I modified the routine “gimple_add_init_for_auto_var” as follows:
>> 
>> /* Generate initialization to automatic variable DECL based on INIT_TYPE.
>>    Build a call to internal const function DEFERRED_INIT:
>>    1st argument: SIZE of the DECL;
>>    2nd argument: INIT_TYPE;
>>    3rd argument: IS_VLA, 0 NO, 1 YES;
>> 
>>    as DEFERRED_INIT (SIZE of the DECL, INIT_TYPE, IS_VLA).  */
>> static void
>> gimple_add_init_for_auto_var (tree decl,
>>                               enum auto_init_type init_type,
>>                               bool is_vla,
>>                               gimple_seq *seq_p)
>> {
>>   gcc_assert (VAR_P (decl) && !DECL_EXTERNAL (decl) && !TREE_STATIC (decl));
>>   gcc_assert (init_type > AUTO_INIT_UNINITIALIZED);
>>   tree decl_size = TYPE_SIZE_UNIT (TREE_TYPE (decl));
>> 

[PATCH 3/3] aarch64: Remove macros for vld4[q]_lane Neon intrinsics

2021-08-16 Thread Jonathan Wright via Gcc-patches
Hi,

This patch removes macros for vld4[q]_lane Neon intrinsics. This is a
preparatory step before adding new modes for structures of Advanced
SIMD vectors.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-08-16  Jonathan Wright  

* config/aarch64/arm_neon.h (__LD4_LANE_FUNC): Delete.
(__LD4Q_LANE_FUNC): Likewise.
(vld4_lane_u8): Define without macro.
(vld4_lane_u16): Likewise.
(vld4_lane_u32): Likewise.
(vld4_lane_u64): Likewise.
(vld4_lane_s8): Likewise.
(vld4_lane_s16): Likewise.
(vld4_lane_s32): Likewise.
(vld4_lane_s64): Likewise.
(vld4_lane_f16): Likewise.
(vld4_lane_f32): Likewise.
(vld4_lane_f64): Likewise.
(vld4_lane_p8): Likewise.
(vld4_lane_p16): Likewise.
(vld4_lane_p64): Likewise.
(vld4q_lane_u8): Likewise.
(vld4q_lane_u16): Likewise.
(vld4q_lane_u32): Likewise.
(vld4q_lane_u64): Likewise.
(vld4q_lane_s8): Likewise.
(vld4q_lane_s16): Likewise.
(vld4q_lane_s32): Likewise.
(vld4q_lane_s64): Likewise.
(vld4q_lane_f16): Likewise.
(vld4q_lane_f32): Likewise.
(vld4q_lane_f64): Likewise.
(vld4q_lane_p8): Likewise.
(vld4q_lane_p16): Likewise.
(vld4q_lane_p64): Likewise.
(vld4_lane_bf16): Likewise.
(vld4q_lane_bf16): Likewise.


rb14793.patch


[PATCH 2/3] aarch64: Remove macros for vld3[q]_lane Neon intrinsics

2021-08-16 Thread Jonathan Wright via Gcc-patches
Hi,

This patch removes macros for vld3[q]_lane Neon intrinsics. This is a
preparatory step before adding new modes for structures of Advanced
SIMD vectors.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-08-16  Jonathan Wright  

* config/aarch64/arm_neon.h (__LD3_LANE_FUNC): Delete.
(__LD3Q_LANE_FUNC): Delete.
(vld3_lane_u8): Define without macro.
(vld3_lane_u16): Likewise.
(vld3_lane_u32): Likewise.
(vld3_lane_u64): Likewise.
(vld3_lane_s8): Likewise.
(vld3_lane_s16): Likewise.
(vld3_lane_s32): Likewise.
(vld3_lane_s64): Likewise.
(vld3_lane_f16): Likewise.
(vld3_lane_f32): Likewise.
(vld3_lane_f64): Likewise.
(vld3_lane_p8): Likewise.
(vld3_lane_p16): Likewise.
(vld3_lane_p64): Likewise.
(vld3q_lane_u8): Likewise.
(vld3q_lane_u16): Likewise.
(vld3q_lane_u32): Likewise.
(vld3q_lane_u64): Likewise.
(vld3q_lane_s8): Likewise.
(vld3q_lane_s16): Likewise.
(vld3q_lane_s32): Likewise.
(vld3q_lane_s64): Likewise.
(vld3q_lane_f16): Likewise.
(vld3q_lane_f32): Likewise.
(vld3q_lane_f64): Likewise.
(vld3q_lane_p8): Likewise.
(vld3q_lane_p16): Likewise.
(vld3q_lane_p64): Likewise.
(vld3_lane_bf16): Likewise.
(vld3q_lane_bf16): Likewise.


rb14792.patch


[PATCH 1/3] aarch64: Remove macros for vld2[q]_lane Neon intrinsics

2021-08-16 Thread Jonathan Wright via Gcc-patches
Hi,

This patch removes macros for vld2[q]_lane Neon intrinsics. This is a
preparatory step before adding new modes for structures of Advanced
SIMD vectors.
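For readers unfamiliar with these intrinsics, the per-lane load that `vld2_lane_u8` performs can be modelled in portable C. This is a hedged scalar sketch, not the `arm_neon.h` definition: the `u8x8` and `u8x8x2` types and `model_vld2_lane_u8` are hypothetical stand-ins for `uint8x8_t`, `uint8x8x2_t` and the intrinsic, and the lane index is assumed to be in the range 0 to 7.

```c
#include <stdint.h>

/* Hypothetical scalar stand-ins for uint8x8_t and uint8x8x2_t. */
typedef struct { uint8_t lane[8]; } u8x8;
typedef struct { u8x8 val[2]; } u8x8x2;

/* Scalar model of vld2_lane_u8 (ptr, src, lane): ptr[0] is written to lane
   `lane` of val[0] and ptr[1] to the same lane of val[1]; every other lane
   keeps the value it had in src.  `lane` is assumed to be 0..7. */
static u8x8x2 model_vld2_lane_u8(const uint8_t *ptr, u8x8x2 src, int lane)
{
    src.val[0].lane[lane] = ptr[0];
    src.val[1].lane[lane] = ptr[1];
    return src;
}
```

The vld3_lane and vld4_lane families follow the same pattern with three or four consecutive elements and vectors.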

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-08-12  Jonathan Wright  

* config/aarch64/arm_neon.h (__LD2_LANE_FUNC): Delete.
(__LD2Q_LANE_FUNC): Likewise.
(vld2_lane_u8): Define without macro.
(vld2_lane_u16): Likewise.
(vld2_lane_u32): Likewise.
(vld2_lane_u64): Likewise.
(vld2_lane_s8): Likewise.
(vld2_lane_s16): Likewise.
(vld2_lane_s32): Likewise.
(vld2_lane_s64): Likewise.
(vld2_lane_f16): Likewise.
(vld2_lane_f32): Likewise.
(vld2_lane_f64): Likewise.
(vld2_lane_p8): Likewise.
(vld2_lane_p16): Likewise.
(vld2_lane_p64): Likewise.
(vld2q_lane_u8): Likewise.
(vld2q_lane_u16): Likewise.
(vld2q_lane_u32): Likewise.
(vld2q_lane_u64): Likewise.
(vld2q_lane_s8): Likewise.
(vld2q_lane_s16): Likewise.
(vld2q_lane_s32): Likewise.
(vld2q_lane_s64): Likewise.
(vld2q_lane_f16): Likewise.
(vld2q_lane_f32): Likewise.
(vld2q_lane_f64): Likewise.
(vld2q_lane_p8): Likewise.
(vld2q_lane_p16): Likewise.
(vld2q_lane_p64): Likewise.
(vld2_lane_bf16): Likewise.
(vld2q_lane_bf16): Likewise.


rb14791.patch


Re: [PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaligned access

2021-08-16 Thread Kito Cheng via Gcc-patches
> > Could you submit v3 patch which is v1 with overlap_op_by_pieces field,
> > testcase from v2 and add a few more comments to describe the field?
> >
> > And add an -mtune=ultra-size so that it can be tested without changing
> > other behavior?
> >
> > Hi Palmer:
> >
> > Are you OK with that?
>
> I'm still not convinced on the performance: like Andrew and I pointed
> out, this is a difficult case for pipelines of this flavor to handle.
> Nobody here knows anything about this pipeline deeply enough to say
> anything definitive, though, so this is really just a guess.

So wouldn't an extra field to indicate this resolve that?
I believe people should set overlap_op_by_pieces
to true only if they are sure it has benefits.

> As I'm not convinced this is an obvious performance win I'm not going to
> merge it without a benchmark.  If you're convinced and want to merge it
> that's fine, I don't really care about the performance of the C906 and
> if someone complains we can always just revert it later.

I suppose Christoph has tried it on their internal processor and seen a
performance benefit, but that processor can't be open-sourced yet, so the
v2 patch set uses the C906 to demo and test the change, since it is the
only processor with slow_unaligned_access=False.

I agree on the C906 part: we don't know whether it benefits or not, so I
propose adding an -mtune=ultra-size to make this testable rather than
changing the C906.


Re: [PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaligned access

2021-08-16 Thread Palmer Dabbelt

On Mon, 16 Aug 2021 03:02:42 PDT (-0700), Kito Cheng wrote:

Hi Christoph:

Could you submit v3 patch which is v1 with overlap_op_by_pieces field,
testcase from v2 and add a few more comments to describe the field?

And add an -mtune=ultra-size so that it can be tested without changing
other behavior?

Hi Palmer:

Are you OK with that?


I'm still not convinced on the performance: like Andrew and I pointed 
out, this is a difficult case for pipelines of this flavor to handle.  
Nobody here knows anything about this pipeline deeply enough to say 
anything definitive, though, so this is really just a guess.


As I'm not convinced this is an obvious performance win I'm not going to 
merge it without a benchmark.  If you're convinced and want to merge it 
that's fine, I don't really care about the performance of the C906 and 
if someone complains we can always just revert it later.



On Sat, Aug 14, 2021 at 1:54 AM Christoph Müllner via Gcc-patches wrote:


Ping.

On Thu, Aug 5, 2021 at 11:11 AM Christoph Müllner  wrote:
>
> Ping.
>
> On Thu, Jul 29, 2021 at 9:36 PM Christoph Müllner wrote:
> >
> > On Thu, Jul 29, 2021 at 8:54 PM Palmer Dabbelt  wrote:
> > >
> > > On Tue, 27 Jul 2021 02:32:12 PDT (-0700), cmuell...@gcc.gnu.org wrote:
> > > > Ok, so if I understand correctly Palmer and Andrew prefer
> > > > overlap_op_by_pieces to be controlled
> > > > by its own field in the riscv_tune_param struct and not by the field
> > > > slow_unaligned_access in this struct
> > > > (i.e. slow_unaligned_access==false is not enough to imply
> > > > overlap_op_by_pieces==true).
> > >
> > > I guess, but I'm not really worried about this at that level of detail
> > > right now.  It's not like the tune structures form any sort of external
> > > interface we have to keep stable, we can do whatever we want with those
> > > fields so I'd just aim for encoding the desired behavior as simply as
> > > possible rather than trying to build something extensible.
> > >
> > > There are really two questions we need to answer: is this code actually
> > > faster for the C906, and is this what the average user wants under -Os.
> >
> > I never mentioned -Os.
> > My main goal is code compiled for -O2, -O3 or even -Ofast.
> > And I want to execute code as fast as possible.
> >
> > Loading hot data from cache is faster when being done by a single
> > load-word instruction than 4 load-byte instructions.
> > Fewer instructions imply less pressure on the instruction cache.
> > Fewer instructions imply less work for a CPU pipeline.
> > Architectures that don't have a penalty for unaligned accesses
> > therefore observe a performance benefit.
> >
> > What I understand from Andrew's email is that it is not that simple
> > and implementations might have a penalty for overlapping accesses
> > that is high enough to avoid them. I don't have the details for C906,
> > so I can't say if that's the case.
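To make the overlapping-access idea concrete: with fast unaligned access, a 7-byte copy can be done with two 4-byte accesses at offsets 0 and 3 that overlap on byte 3, instead of a 4-, 2- and 1-byte sequence. The C sketch below only illustrates that access pattern; `copy7_overlapping` is a hypothetical name, and `memcpy` through a `uint32_t` is used as the portable way to spell an unaligned 4-byte load or store. It says nothing about whether the C906 actually benefits.

```c
#include <stdint.h>
#include <string.h>

/* Copy 7 bytes with two possibly-unaligned 4-byte loads and stores.
   The second access starts at offset 3, so byte 3 is written twice
   (with the same value).  src and dst must not overlap. */
static void copy7_overlapping(unsigned char *dst, const unsigned char *src)
{
    uint32_t head, tail;
    memcpy(&head, src, sizeof head);      /* bytes 0..3 */
    memcpy(&tail, src + 3, sizeof tail);  /* bytes 3..6 */
    memcpy(dst, &head, sizeof head);
    memcpy(dst + 3, &tail, sizeof tail);
}
```

Whether trading three naturally sized accesses for two unaligned, overlapping ones is faster depends on the pipeline, which is the open question in this thread.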
> >
> > > That first one is pretty easy: just running those simple code sequences
> > > under a sweep of page offsets should be sufficient to determine if this
> > > is always faster (in which case it's an easy yes), if it's always slower
> > > (an easy no), or if there's some slow cases like page/cache line
> > > crossing (in which case we'd need to think a bit).
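The "page/cache line crossing" cases can be enumerated ahead of any timing run by checking which start offsets make an access straddle a boundary. A minimal sketch, assuming a 64-byte cache line and a 4096-byte page (neither size is stated in the thread); `crosses_boundary` and `count_crossing_offsets` are hypothetical helpers:

```c
#include <stdbool.h>
#include <stddef.h>

/* True if an access of `size` bytes starting at `offset` spans two
   `boundary`-aligned blocks (for example two cache lines or two pages). */
static bool crosses_boundary(size_t offset, size_t size, size_t boundary)
{
    return offset / boundary != (offset + size - 1) / boundary;
}

/* Count the start offsets within one block at which such an access crosses. */
static size_t count_crossing_offsets(size_t size, size_t boundary)
{
    size_t n = 0;
    for (size_t off = 0; off < boundary; off++)
        if (crosses_boundary(off, size, boundary))
            n++;
    return n;
}
```

Timing the overlapping sequence separately at crossing and non-crossing offsets is one way to tell the "always faster" case from the "slow only when crossing" case described above.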
> > >
> > > The second one is a bit trickier.  In the past we'd said this sort of
> > > "actively misalign accesses to generate smaller code" thing
> > > isn't suitable for -Os (as most machines still have very slow unaligned
> > > accesses) but is suitable for -Oz (don't remember if that ever ended up
> > > in GCC, though).  That still seems like a reasonable decision, but if it
> > > turns out that implementations with fast unaligned accesses become the
> > > norm then it'd probably be worth revisiting it.  Not sure exactly how to
> > > determine that tipping point, but I think we're a long way away from it
> > > right now.
> > >
> > > IMO it's really just premature to try and design an encoding of the
> > > tuning parameters until we have an idea of what they are, as we'll just
> > > end up devolving down the path of trying to encode all possible hardware
> > > and that's generally a huge waste of time.  Since there's no ABI here we
> > > can refactor this however we want as new tunings show up.
> >
> > I guess you mean that there needs to be a clear benefit for a supported
> > machine in GCC. Either obviously (see below), by measurement results,
> > or by decision of the machine's maintainer (especially if the decision is a trade-off).
> >
> > >
> > > > I don't have access to pipeline details that give proof that there are cases
> > > > where this patch causes a performance penalty.
> > > >
> > > > So, I leave this here as a summary for someone who has enough information and
> > > > interest to move this forward:
> > > > * the original patch should be sufficient, but does not have tests:
> > > >   https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575791.html
> > > > * 

[Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under -ffast-math on aarch64

2021-08-16 Thread Jirui Wu via Gcc-patches
Hi all,

This patch generates the FRINTZ instruction to optimize type casts.

The changes in this patch cover:
* Optimization of a FIX_TRUNC_EXPR cast inside a FLOAT_EXPR using IFN_TRUNC.
* Change of corresponding test cases.
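The source pattern targeted here is a round trip through an integer type. A small C illustration with the hypothetical helper `trunc_via_int`: for in-range values it rounds toward zero like `trunc` (the operation FRINTZ performs), but it can never produce a negative zero, whereas `trunc (-0.5)` returns `-0.0`. That signed-zero difference is plausibly one reason the rewrite is limited to `-ffast-math`; this reasoning is an assumption, not something stated in the patch.

```c
#include <math.h>

/* The pattern the patch matches: float -> int -> float, i.e. a
   FIX_TRUNC_EXPR inside a FLOAT_EXPR.  Assumes x fits in an int.
   Unlike trunc (), the result is never -0.0: (int) -0.5 is 0. */
static double trunc_via_int(double x)
{
    return (double)(int)x;
}
```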

Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? If OK, could it be committed for me? I have no commit rights.

Thanks,
Jirui

gcc/ChangeLog:

* tree-ssa-forwprop.c (pass_forwprop::execute): Optimize with frintz.


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/fix_trunc1.c: Update to new expectation.


diff


Re: [patch][version 6] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-08-16 Thread Qing Zhao via Gcc-patches


> On Aug 16, 2021, at 2:40 AM, Richard Biener  wrote:
> 
> On Thu, 12 Aug 2021, Qing Zhao wrote:
> 
>> Hi, Richard,
>> 
>> For RTL expansion of the call to .DEFERRED_INIT, I changed my code per your 
>> suggestions as follows:
>> 
>> ==
>> #define INIT_PATTERN_VALUE  0xFE
>> static void
>> expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>> {
>>   tree lhs = gimple_call_lhs (stmt);
>>   tree var_size = gimple_call_arg (stmt, 0);
>>   enum auto_init_type init_type
>>     = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
>>   bool is_vla = (bool) TREE_INT_CST_LOW (gimple_call_arg (stmt, 2));
>> 
>>   tree var_type = TREE_TYPE (lhs);
>>   gcc_assert (init_type > AUTO_INIT_UNINITIALIZED);
>> 
>>   if (is_vla || (!can_native_interpret_type_p (var_type)))
>>     {
>>       /* If this is a VLA or the type of the variable cannot be natively
>>          interpreted, expand to a memset to initialize it.  */
>>       if (TREE_CODE (lhs) == SSA_NAME)
>>         lhs = SSA_NAME_VAR (lhs);
>>       tree var_addr = NULL_TREE;
>>       if (is_vla)
>>         var_addr = TREE_OPERAND (lhs, 0);
>>       else
>>         {
>>           TREE_ADDRESSABLE (lhs) = 1;
>>           var_addr = build_fold_addr_expr (lhs);
>>         }
>>       tree value = (init_type == AUTO_INIT_PATTERN) ?
>>                     build_int_cst (unsigned_char_type_node,
>>                                    INIT_PATTERN_VALUE) :
>>                     build_zero_cst (unsigned_char_type_node);
>>       tree m_call = build_call_expr (builtin_decl_implicit (BUILT_IN_MEMSET),
>>                                      3, var_addr, value, var_size);
>>       /* Expand this memset call.  */
>>       expand_builtin_memset (m_call, NULL_RTX, TYPE_MODE (var_type));
>>     }
>>   else
>>     {
>>       /* If this is not a VLA and the type of the variable can be natively
>>          interpreted, expand to assignment to generate better code.  */
>>       tree pattern = NULL_TREE;
>>       unsigned HOST_WIDE_INT total_bytes
>>         = tree_to_uhwi (TYPE_SIZE_UNIT (var_type));
>> 
>>       if (init_type == AUTO_INIT_PATTERN)
>>         {
>>           unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
>>           memset (buf, INIT_PATTERN_VALUE, total_bytes);
>>           pattern = native_interpret_expr (var_type, buf, total_bytes);
>>           gcc_assert (pattern);
>>         }
>> 
>>       tree init = (init_type == AUTO_INIT_PATTERN) ?
>>                    pattern :
>>                    build_zero_cst (var_type);
>>       expand_assignment (lhs, init, false);
>>     }
>> }
>> ===
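To see what the AUTO_INIT_PATTERN path yields for a natively interpretable type, the same idea can be sketched in plain C: fill the object's bytes with `INIT_PATTERN_VALUE` (0xFE, as defined in the code above) and reinterpret them. `pattern_init_int32` is a hypothetical helper standing in for the memset plus `native_interpret_expr` step; because every byte is identical, the result does not depend on endianness.

```c
#include <stdint.h>
#include <string.h>

#define INIT_PATTERN_VALUE 0xFE

/* Mimic memset + native_interpret_expr for a 32-bit integer: fill a byte
   buffer with the pattern, then reinterpret those bytes as int32_t. */
static int32_t pattern_init_int32(void)
{
    unsigned char buf[sizeof(int32_t)];
    int32_t value;
    memset(buf, INIT_PATTERN_VALUE, sizeof buf);
    memcpy(&value, buf, sizeof value);
    return value;
}
```

So a pattern-initialized 32-bit `int` would read as 0xFEFEFEFE, i.e. -16843010.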
>> 
>> Now, I used “can_native_interpret_type_p (var_type)” instead of 
>> “use_register_for_decl (lhs)” to decide 
>> whether to use “memset” or use “assign” to expand this function.
>> 
>> However, this exposed a bug that is very hard to address:
>> 
>> ***For the testing case: testsuite/gcc.dg/uninit-I.c:
>> 
>> /* { dg-do compile } */
>> /* { dg-options "-O2 -Wuninitialized" } */
>> 
>> int sys_msgctl (void)
>> {
>>  struct { int mode; } setbuf;
>>  return setbuf.mode;  /* { dg-warning "'setbuf\.mode' is used" } */
>> }
>> ==
>> 
>> **The above auto var “setbuf” has “struct” type, for which
>> “can_native_interpret_type_p (var_type)” is false; therefore,
>> expanding this .DEFERRED_INIT call went down the “memset” expansion route.
>> 
>> However, this structure type fits into a register, so its address can no
>> longer be taken at this stage, even though I tried:
>> 
>> TREE_ADDRESSABLE (lhs) = 1;
>> var_addr = build_fold_addr_expr (lhs);
>> 
>> to create an address for it, the expansion still failed at expr.c,
>> line 8412:
>> during RTL pass: expand
>> /home/opc/Work/GCC/latest-gcc/gcc/testsuite/gcc.dg/auto-init-uninit-I.c:6:24: internal compiler error: in expand_expr_addr_expr_1, at expr.c:8412
>> 0xd04104 expand_expr_addr_expr_1
>>  ../../latest-gcc/gcc/expr.c:8412
>> 0xd04a95 expand_expr_addr_expr
>>  ../../latest-gcc/gcc/expr.c:8525
>> 0xd13592 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
>>  ../../latest-gcc/gcc/expr.c:11741
>> 0xd05142 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
>>  ../../latest-gcc/gcc/expr.c:8713
>> 0xaed1d3 expand_expr
>>  ../../latest-gcc/gcc/expr.h:301
>> 0xaf0d89 get_memory_rtx
>>  ../../latest-gcc/gcc/builtins.c:1370
>> 0xafb4fb expand_builtin_memset_args
>>  ../../latest-gcc/gcc/builtins.c:4102
>> 0xafacde expand_builtin_memset(tree_node*, rtx_def*, machine_mode)
>>  ../../latest-gcc/gcc/builtins.c:3886
>> 0xe97fb3 expand_DEFERRED_INIT
>> 
>> **That’s the major reason why I chose “use_register_for_decl(lhs)” to 
>> decide between “memset” expansion and “assign” expansion: “memset” expansion
>> needs to take the address of the variable, and if the variable has been
>> decided to live in a register, its address cannot be taken at this stage.
>> 
>> **using 

Re: [patch][version 6] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-08-16 Thread Qing Zhao via Gcc-patches


> On Aug 16, 2021, at 10:08 AM, Richard Biener  wrote:
> 
> On August 16, 2021 4:48:16 PM GMT+02:00, Qing Zhao  
> wrote:
>> 
>> 
>>> On Aug 16, 2021, at 2:12 AM, Richard Biener  wrote:
>>> 
>>> On Wed, 11 Aug 2021, Qing Zhao wrote:
>>> 
 Hi, 
 
 I finally decided to take another approach to resolve this issue, it 
 resolved all the potential issues with the “address taken” auto variable.
 
 The basic idea is to avoid generating the temporary variable in the 
 beginning. 
 As you mentioned, "The reason is that alt_reloc is memory (because it is 
 address taken) and that GIMPLE says that register typed stores 
 need to use a is_gimple_val RHS which the call is not.”
 In order to avoid generating the temporary variable for “address taken” 
 auto variable, I updated the utility routine “is_gimple_val” as following:
 
 diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
 index a2563a45c37d..d5ef1aef8cea 100644
 --- a/gcc/gimple-expr.c
 +++ b/gcc/gimple-expr.c
 @@ -787,8 +787,20 @@ is_gimple_reg (tree t)
  return !DECL_NOT_GIMPLE_REG_P (t);
 }
 
 +/* Return true if T is a call to .DEFERRED_INIT internal function.  */ 
 +static bool
 +is_deferred_init_call (tree t)
 +{
 +  if (TREE_CODE (t) == CALL_EXPR
 +      &&  CALL_EXPR_IFN (t) == IFN_DEFERRED_INIT)
 +    return true;
 +  return false;
 +}
 +
 
 -/* Return true if T is a GIMPLE rvalue, i.e. an identifier or a constant.  */
 +/* Return true if T is a GIMPLE rvalue, i.e. an identifier or a constant,
 +   or a call to .DEFERRED_INIT internal function because the call to
 +   .DEFERRED_INIT will eventually be expanded as a constant.  */
 
 bool
 is_gimple_val (tree t)
 @@ -799,7 +811,8 @@ is_gimple_val (tree t)
        && !is_gimple_reg (t))
      return false;
 
 -  return (is_gimple_variable (t) || is_gimple_min_invariant (t));
 +  return (is_gimple_variable (t) || is_gimple_min_invariant (t)
 +          || is_deferred_init_call (t));
 }
 
 With this change, the temporary variable will not be created for “address 
 taken” auto variable, and uninitialized analysis does not need any change. 
 Everything works well.
 
 And I believe that treating “call to .DEFERRED_INIT” as “is_gimple_val” is 
 reasonable since this call actually is a constant.
 
 Let me know if you have any objection on this solution.
>>> 
>>> Yeah, I object to this solution.
>> 
>> Can you explain what’s the major issue for this solution? 
> 
> It punches a hole into the GIMPLE IL which is very likely incomplete and will 
> cause issues elsewhere. In particular the corresponding type check is missing 
> and not computable since the RHS of this call isn't separately available. 
> 
> If you go down this route then you should have added a new constant class 
> tree code instead of an internal function call. 

Okay. I see. 
> 
>> To me,  treating “call to .DEFERRED_INIT” as “is_gimple_val” is reasonable 
>> since this call actually is a constant.
> 
> Sure, but it's not represented as such. 

Thank you!

Qing
> 
> Richard. 
> 
>> Thanks.
>> 
>> Qing
>>> Richard.
>>> 
 thanks.
 
 Qing
 
> On Aug 11, 2021, at 3:30 PM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> Hi, 
> 
> I met another issue for “address taken” auto variable, see below for 
> details:
> 
>  the testing case: (gcc/testsuite/gcc.dg/uninit-16.c)
> 
> int foo, bar;
> 
> static
> void decode_reloc(int reloc, int *is_alt)
> {
> if (reloc >= 20)
>*is_alt = 1;
> else if (reloc >= 10)
>*is_alt = 0;
> }
> 
> void testfunc()
> {
> int alt_reloc;
> 
> decode_reloc(foo, &alt_reloc);
> 
> if (alt_reloc) /* { dg-warning "may be used uninitialized" } */
>  bar = 42;
> }
> 
> When compiled with -ftrivial-auto-var-init=zero -O2 -Wuninitialized 
> -fdump-tree-all:
> 
> .*gimple dump:
> 
> void testfunc ()
> { 
> int alt_reloc;
> 
> try
>  {
>_1 = .DEFERRED_INIT (4, 2, 0);
>alt_reloc = _1;
>foo.0_2 = foo;
>decode_reloc (foo.0_2, &alt_reloc);
>alt_reloc.1_3 = alt_reloc;
>if (alt_reloc.1_3 != 0) goto ; else goto ;
>:
>bar = 42;
>:
>  }
> finally
>  {
>alt_reloc = {CLOBBER};
>  }
> }
> 
> **fre1 dump:
> 
> void testfunc ()
> {
> int alt_reloc;
> int _1;
> int foo.0_2;
> 
>  :
> _1 = .DEFERRED_INIT (4, 2, 0);
> foo.0_2 = foo;
> if (foo.0_2 > 19)
>  goto ; [50.00%]
> else
>  goto ; [50.00%]
> 
>  :
> goto ; [100.00%]
> 
>  :
> if (foo.0_2 > 9)
>  goto ; [50.00%]
> else
>  goto ; [50.00%]
> 
>  :
> goto ; [100.00%]

Re: [patch][version 6] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-08-16 Thread Richard Biener via Gcc-patches
On August 16, 2021 4:48:16 PM GMT+02:00, Qing Zhao  wrote:
>
>
>> On Aug 16, 2021, at 2:12 AM, Richard Biener  wrote:
>> 
>> On Wed, 11 Aug 2021, Qing Zhao wrote:
>> 
>>> Hi, 
>>> 
>>> I finally decided to take another approach to resolve this issue, it 
>>> resolved all the potential issues with the “address taken” auto variable.
>>> 
>>> The basic idea is to avoid generating the temporary variable in the 
>>> beginning. 
>>> As you mentioned, "The reason is that alt_reloc is memory (because it is 
>>> address taken) and that GIMPLE says that register typed stores 
>>> need to use a is_gimple_val RHS which the call is not.”
>>> In order to avoid generating the temporary variable for “address taken” 
>>> auto variable, I updated the utility routine “is_gimple_val” as following:
>>> 
>>> diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
>>> index a2563a45c37d..d5ef1aef8cea 100644
>>> --- a/gcc/gimple-expr.c
>>> +++ b/gcc/gimple-expr.c
>>> @@ -787,8 +787,20 @@ is_gimple_reg (tree t)
>>>   return !DECL_NOT_GIMPLE_REG_P (t);
>>> }
>>> 
>>> +/* Return true if T is a call to .DEFERRED_INIT internal function.  */ 
>>> +static bool
>>> +is_deferred_init_call (tree t)
>>> +{
>>> +  if (TREE_CODE (t) == CALL_EXPR
>>> +      &&  CALL_EXPR_IFN (t) == IFN_DEFERRED_INIT)
>>> +    return true;
>>> +  return false;
>>> +}
>>> +
>>> 
>>> -/* Return true if T is a GIMPLE rvalue, i.e. an identifier or a constant.  */
>>> +/* Return true if T is a GIMPLE rvalue, i.e. an identifier or a constant,
>>> +   or a call to .DEFERRED_INIT internal function because the call to
>>> +   .DEFERRED_INIT will eventually be expanded as a constant.  */
>>> 
>>> bool
>>> is_gimple_val (tree t)
>>> @@ -799,7 +811,8 @@ is_gimple_val (tree t)
>>>        && !is_gimple_reg (t))
>>>      return false;
>>> 
>>> -  return (is_gimple_variable (t) || is_gimple_min_invariant (t));
>>> +  return (is_gimple_variable (t) || is_gimple_min_invariant (t)
>>> +          || is_deferred_init_call (t));
>>> }
>>> 
>>> With this change, the temporary variable will not be created for “address 
>>> taken” auto variable, and uninitialized analysis does not need any change. 
>>> Everything works well.
>>> 
>>> And I believe that treating “call to .DEFERRED_INIT” as “is_gimple_val” is 
>>> reasonable since this call actually is a constant.
>>> 
>>> Let me know if you have any objection on this solution.
>> 
>> Yeah, I object to this solution.
>
>Can you explain what’s the major issue for this solution? 

It punches a hole into the GIMPLE IL which is very likely incomplete and will 
cause issues elsewhere. In particular the corresponding type check is missing 
and not computable since the RHS of this call isn't separately available. 

If you go down this route then you should have added a new constant class tree 
code instead of an internal function call. 

>To me,  treating “call to .DEFERRED_INIT” as “is_gimple_val” is reasonable 
>since this call actually is a constant.

Sure, but it's not represented as such. 

Richard. 

>Thanks.
>
>Qing
>> Richard.
>> 
>>> thanks.
>>> 
>>> Qing
>>> 
 On Aug 11, 2021, at 3:30 PM, Qing Zhao via Gcc-patches 
  wrote:
 
 Hi, 
 
 I met another issue for “address taken” auto variable, see below for 
 details:
 
  the testing case: (gcc/testsuite/gcc.dg/uninit-16.c)
 
 int foo, bar;
 
 static
 void decode_reloc(int reloc, int *is_alt)
 {
 if (reloc >= 20)
 *is_alt = 1;
 else if (reloc >= 10)
 *is_alt = 0;
 }
 
 void testfunc()
 {
 int alt_reloc;
 
 decode_reloc(foo, &alt_reloc);
 
 if (alt_reloc) /* { dg-warning "may be used uninitialized" } */
   bar = 42;
 }
 
 When compiled with -ftrivial-auto-var-init=zero -O2 -Wuninitialized 
 -fdump-tree-all:
 
 .*gimple dump:
 
 void testfunc ()
 { 
 int alt_reloc;
 
 try
   {
 _1 = .DEFERRED_INIT (4, 2, 0);
 alt_reloc = _1;
 foo.0_2 = foo;
 decode_reloc (foo.0_2, &alt_reloc);
 alt_reloc.1_3 = alt_reloc;
 if (alt_reloc.1_3 != 0) goto ; else goto ;
 :
 bar = 42;
 :
   }
 finally
   {
 alt_reloc = {CLOBBER};
   }
 }
 
 **fre1 dump:
 
 void testfunc ()
 {
 int alt_reloc;
 int _1;
 int foo.0_2;
 
  :
 _1 = .DEFERRED_INIT (4, 2, 0);
 foo.0_2 = foo;
 if (foo.0_2 > 19)
   goto ; [50.00%]
 else
   goto ; [50.00%]
 
  :
 goto ; [100.00%]
 
  :
 if (foo.0_2 > 9)
   goto ; [50.00%]
 else
   goto ; [50.00%]
 
  :
 goto ; [100.00%]
 
  :
 if (_1 != 0)
   goto ; [INV]
 else
   goto ; [INV]
 
  :
 bar = 42;
 
  :
 return;
 
 }
 
 From the above IR file after “FRE”, we can see that the major issue with 
 this IR is:
 
 The address 

Re: [patch][version 6] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-08-16 Thread Qing Zhao via Gcc-patches


> On Aug 16, 2021, at 2:12 AM, Richard Biener  wrote:
> 
> On Wed, 11 Aug 2021, Qing Zhao wrote:
> 
>> Hi, 
>> 
>> I finally decided to take another approach to resolve this issue, it 
>> resolved all the potential issues with the “address taken” auto variable.
>> 
>> The basic idea is to avoid generating the temporary variable in the 
>> beginning. 
>> As you mentioned, "The reason is that alt_reloc is memory (because it is 
>> address taken) and that GIMPLE says that register typed stores 
>> need to use a is_gimple_val RHS which the call is not.”
>> In order to avoid generating the temporary variable for “address taken” auto 
>> variable, I updated the utility routine “is_gimple_val” as following:
>> 
>> diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
>> index a2563a45c37d..d5ef1aef8cea 100644
>> --- a/gcc/gimple-expr.c
>> +++ b/gcc/gimple-expr.c
>> @@ -787,8 +787,20 @@ is_gimple_reg (tree t)
>>   return !DECL_NOT_GIMPLE_REG_P (t);
>> }
>> 
>> +/* Return true if T is a call to .DEFERRED_INIT internal function.  */ 
>> +static bool
>> +is_deferred_init_call (tree t)
>> +{
>> +  if (TREE_CODE (t) == CALL_EXPR
>> +      &&  CALL_EXPR_IFN (t) == IFN_DEFERRED_INIT)
>> +    return true;
>> +  return false;
>> +}
>> +
>> 
>> -/* Return true if T is a GIMPLE rvalue, i.e. an identifier or a constant.  */
>> +/* Return true if T is a GIMPLE rvalue, i.e. an identifier or a constant,
>> +   or a call to .DEFERRED_INIT internal function because the call to
>> +   .DEFERRED_INIT will eventually be expanded as a constant.  */
>> 
>> bool
>> is_gimple_val (tree t)
>> @@ -799,7 +811,8 @@ is_gimple_val (tree t)
>>        && !is_gimple_reg (t))
>>      return false;
>> 
>> -  return (is_gimple_variable (t) || is_gimple_min_invariant (t));
>> +  return (is_gimple_variable (t) || is_gimple_min_invariant (t)
>> +          || is_deferred_init_call (t));
>> }
>> 
>> With this change, the temporary variable is no longer created for an “address
>> taken” auto variable, and the uninitialized analysis does not need any change.
>> Everything works well.
>> 
>> I believe that treating a “call to .DEFERRED_INIT” as “is_gimple_val” is
>> reasonable, since this call is actually a constant.
>> 
>> Let me know if you have any objections to this solution.
> 
> Yeah, I object to this solution.

Can you explain what the major issue with this solution is?

To me, treating a “call to .DEFERRED_INIT” as “is_gimple_val” is reasonable,
since this call is actually a constant.

Thanks.

Qing
> Richard.
> 
>> thanks.
>> 
>> Qing
>> 
>>> On Aug 11, 2021, at 3:30 PM, Qing Zhao via Gcc-patches 
>>>  wrote:
>>> 
>>> Hi, 
>>> 
>>> I ran into another issue with the “address taken” auto variable; see below
>>> for details:
>>> 
>>> The test case (gcc/testsuite/gcc.dg/uninit-16.c):
>>> 
>>> int foo, bar;
>>> 
>>> static
>>> void decode_reloc(int reloc, int *is_alt)
>>> {
>>>   if (reloc >= 20)
>>>       *is_alt = 1;
>>>   else if (reloc >= 10)
>>>       *is_alt = 0;
>>> }
>>>
>>> void testfunc()
>>> {
>>>   int alt_reloc;
>>>
>>>   decode_reloc(foo, _reloc);
>>>
>>>   if (alt_reloc) /* { dg-warning "may be used uninitialized" } */
>>>     bar = 42;
>>> }
>>> 
>>> When compiled with -ftrivial-auto-var-init=zero -O2 -Wuninitialized 
>>> -fdump-tree-all:
>>> 
>>> .*gimple dump:
>>> 
>>> void testfunc ()
>>> { 
>>> int alt_reloc;
>>> 
>>> try
>>>   {
>>> _1 = .DEFERRED_INIT (4, 2, 0);
>>> alt_reloc = _1;
>>> foo.0_2 = foo;
>>> decode_reloc (foo.0_2, _reloc);
>>> alt_reloc.1_3 = alt_reloc;
>>> if (alt_reloc.1_3 != 0) goto ; else goto ;
>>> :
>>> bar = 42;
>>> :
>>>   }
>>> finally
>>>   {
>>> alt_reloc = {CLOBBER};
>>>   }
>>> }
>>> 
>>> **fre1 dump:
>>> 
>>> void testfunc ()
>>> {
>>> int alt_reloc;
>>> int _1;
>>> int foo.0_2;
>>> 
>>>  :
>>> _1 = .DEFERRED_INIT (4, 2, 0);
>>> foo.0_2 = foo;
>>> if (foo.0_2 > 19)
>>>   goto ; [50.00%]
>>> else
>>>   goto ; [50.00%]
>>> 
>>>  :
>>> goto ; [100.00%]
>>> 
>>>  :
>>> if (foo.0_2 > 9)
>>>   goto ; [50.00%]
>>> else
>>>   goto ; [50.00%]
>>> 
>>>  :
>>> goto ; [100.00%]
>>> 
>>>  :
>>> if (_1 != 0)
>>>   goto ; [INV]
>>> else
>>>   goto ; [INV]
>>> 
>>>  :
>>> bar = 42;
>>> 
>>>  :
>>> return;
>>> 
>>> }
>>> 
>>> From the above IR file after “FRE”, we can see that the major issue with 
>>> this IR is:
>>> 
>>> The address-taken auto variable “alt_reloc” has been completely replaced by
>>> the temporary variable “_1” in all the uses of the original “alt_reloc”.
>>> 
>>> The major problem with such IR is that, during the uninitialized analysis
>>> phase, the original use of “alt_reloc” has disappeared completely, so the
>>> warning cannot be reported.
>>> 
>>> 
>>> My questions:
>>> 
>>> 1. Is it possible to get the original “alt_reloc” through the temporary 
>>> variable “_1” with some available information recorded in the IR?
>>> 2. If not, then we have to record the relationship between “alt_reloc” and 
>>> “_1” when the original “alt_reloc” is replaced by “_1” and 

[PATCH] [MIPS] Hazard barrier return support

2021-08-16 Thread Dragan Mladjenovic via Gcc-patches
This patch allows a function to request clearing of all instruction and
execution hazards upon normal return via __attribute__ ((use_hazard_barrier_return)).

2017-04-25  Prachi Godbole  

gcc/
* config/mips/mips.h (machine_function): New variable
use_hazard_barrier_return_p.
* config/mips/mips.md (UNSPEC_JRHB): New unspec.
(mips_hb_return_internal): New insn pattern.
* config/mips/mips.c (mips_attribute_table): Add attribute
use_hazard_barrier_return.
(mips_use_hazard_barrier_return_p): New static function.
(mips_function_attr_inlinable_p): Likewise.
(mips_compute_frame_info): Set use_hazard_barrier_return_p.
Emit error for unsupported architecture choice.
(mips_function_ok_for_sibcall, mips_can_use_return_insn):
Return false for use_hazard_barrier_return.
(mips_expand_epilogue): Emit hazard barrier return.
* doc/extend.texi: Document use_hazard_barrier_return.

gcc/testsuite/
* gcc.target/mips/hazard-barrier-return-attribute.c: New test.
---
Rehash of original patch posted by Prachi with minimal changes. Tested against
mips-mti-elf with mips32r2/-EB and mips32r2/-EB/-micromips.

 gcc/config/mips/mips.c                        | 58 +--
 gcc/config/mips/mips.h                        |  3 +
 gcc/config/mips/mips.md                       | 15 +
 gcc/doc/extend.texi                           |  6 ++
 .../mips/hazard-barrier-return-attribute.c    | 20 +++
 5 files changed, 98 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/hazard-barrier-return-attribute.c

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 89d1be6cea6..6ce12fce52e 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -630,6 +630,7 @@ static const struct attribute_spec mips_attribute_table[] = {
 mips_handle_use_shadow_register_set_attr, NULL },
   { "keep_interrupts_masked",  0, 0, false, true,  true, false, NULL, NULL },
   { "use_debug_exception_return", 0, 0, false, true, true, false, NULL, NULL },
+  { "use_hazard_barrier_return", 0, 0, true, false, false, false, NULL, NULL },
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
 };
 
@@ -1309,6 +1310,16 @@ mips_use_debug_exception_return_p (tree type)
   TYPE_ATTRIBUTES (type)) != NULL;
 }
 
+/* Check if the attribute to use hazard barrier return is set for
+   the function declaration DECL.  */
+
+static bool
+mips_use_hazard_barrier_return_p (const_tree decl)
+{
+  return lookup_attribute ("use_hazard_barrier_return",
+                           DECL_ATTRIBUTES (decl)) != NULL;
+}
+
 /* Return the set of compression modes that are explicitly required
by the attributes in ATTRIBUTES.  */
 
@@ -1494,6 +1505,19 @@ mips_can_inline_p (tree caller, tree callee)
   return default_target_can_inline_p (caller, callee);
 }
 
+/* Implement TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P.
+
+   A function requesting clearing of all instruction and execution hazards
+   before returning cannot be inlined, since an inlined copy would not clear
+   any hazards.  All our other function attributes are related to how
+   out-of-line copies should be compiled or called.  They don't in themselves
+   prevent inlining.  */
+
+static bool
+mips_function_attr_inlinable_p (const_tree decl)
+{
+  return !mips_use_hazard_barrier_return_p (decl);
+}
+
 /* Handle an "interrupt" attribute with an optional argument.  */
 
 static tree
@@ -7921,6 +7945,11 @@ mips_function_ok_for_sibcall (tree decl, tree exp ATTRIBUTE_UNUSED)
   && !targetm.binds_local_p (decl))
 return false;
 
+  /* Can't generate sibling calls if returning from current function using
+     hazard barrier return.  */
+  if (mips_use_hazard_barrier_return_p (current_function_decl))
+    return false;
+
   /* Otherwise OK.  */
   return true;
 }
@@ -11008,6 +11037,17 @@ mips_compute_frame_info (void)
}
 }
 
+  /* Determine whether to use hazard barrier return or not.  */
+  if (mips_use_hazard_barrier_return_p (current_function_decl))
+    {
+      if (mips_isa_rev < 2)
+        error ("hazard barrier returns require a MIPS32r2 processor or greater");
+      else if (TARGET_MIPS16)
+        error ("hazard barrier returns are not supported for MIPS16 functions");
+      else
+        cfun->machine->use_hazard_barrier_return_p = true;
+    }
+
   frame = >machine->frame;
   memset (frame, 0, sizeof (*frame));
   size = get_frame_size ();
@@ -12671,7 +12711,8 @@ mips_expand_epilogue (bool sibcall_p)
       && !crtl->calls_eh_return
       && !sibcall_p
       && step2 > 0
-      && mips_unsigned_immediate_p (step2, 5, 2))
+      && mips_unsigned_immediate_p (step2, 5, 2)
+      && !cfun->machine->use_hazard_barrier_return_p)
     use_jraddiusp_p = true;
   else
     /* Deallocate the final bit of the frame.  */
@@ -12712,6 +12753,11 @@ mips_expand_epilogue 

Re: [PATCH] Try LTO partial linking. (Was: Speed of compiling gimple-match.c)

2021-08-16 Thread Martin Liška

PING^2

@Honza: Can you please review the change?

Martin

On 6/23/21 3:53 PM, Martin Liška wrote:

On 5/21/21 10:29 AM, Martin Liška wrote:

On 5/20/21 5:55 PM, Jan Hubicka wrote:

A quick solution is to also modify the partitioner to use local symbol
names when doing incremental linking (those mixing in source code and
random seeds) to avoid clashes.


Good hint. I added a hash based on the object file name (I don't want to
handle proper string escaping) and -frandom-seed.

What do you think about the patch?
Thanks,
Martin


@Honza: Can you please take a look at this patch?

Cheers,
Martin




Re: [PATCH v4] gcov: Add TARGET_GCOV_TYPE_SIZE target hook

2021-08-16 Thread Sebastian Huber

On 16/08/2021 14:33, Martin Liška wrote:

On 8/12/21 6:13 PM, Joseph Myers wrote:

This is not a review of the patch, but I think this version addresses all
the issues I had with previous versions regarding target macro/hook
handling.


And I'm fine with the GCOV part. Please install the patch.


Thanks for the review, I checked it in.

--
embedded brains GmbH
Mr. Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Register court: Amtsgericht München
Registration number: HRB 157899
Authorized managing directors: Peter Rasmussen, Thomas Dörfler
Our privacy policy can be found here:
https://embedded-brains.de/datenschutzerklaerung/


Re: [PATCH] vect: Add extraction cost for slp reduc

2021-08-16 Thread Richard Biener via Gcc-patches
On Mon, Aug 16, 2021 at 9:16 AM Kewen.Lin  wrote:
>
> Hi Richi,
>
> Thanks for the comments!
>
> on 2021/8/16 下午2:49, Richard Biener wrote:
> > On Mon, Aug 16, 2021 at 8:03 AM Kewen.Lin  wrote:
> >>
> >> Hi,
> >>
> >> IIUC, the function vectorizable_bb_reduc_epilogue fails to
> >> consider the cost of extracting the final value from the vector
> >> for reduc operations.  This patch adds one vec_to_scalar cost
> >> for the extraction.
> >>
> >> Bootstrapped & regtested on powerpc64le-linux-gnu P9.
> >> The testing on x86_64 and aarch64 is ongoing.
> >>
> >> Is it ok for trunk?
> >
> > No such instruction is necessary; the way the costing works, the
> > result is already in lane zero.  Note that the optabs are defined
> > to reduce to a scalar already.  So if your arch implements those and
> > requires such a move, then the backend costing needs to handle that.
> >
>
> Yes, these reduc__scal_ patterns should already produce
> operand[0] as the final scalar result.
>
> > That said, ideally we'd simply cost the IFN_REDUC_* in the backend
> > but for BB reductions we don't actually build a SLP node with such
> > representative stmt to pass down (yet).
> >
>
> OK, thanks for the explanation.  It explains why we cost the
> IFN_REDUC_* as one vect_stmt in loop vect but cost it as
> conservatively as possible (shuffle and reduc_op) here.
>
> > I guess you're running into an integer reduction where there's
> > a vector -> gpr move missing in costing?  I suppose costing
> > vec_to_scalar works for that but in the end we should maybe
> > find a way to cost the IFN_REDUC_* ...
>
> Yeah, it's a reduction on plus.  Initially I wanted to adjust the backend
> costing for the various IFN_REDUC* (since for some variants Power needs more
> than one instruction), then I noticed we cost the reduction as
> shuffle and reduc_op during SLP for now.  I guess it's good to get
> vec_to_scalar considered here for consistency?  Then it can be removed
> together once we have better modeling in the end?

Yeah, I guess that works for now.

Thanks,
Richard.

> BR,
> Kewen
>
> >
> > Richard.
> >
> >> BR,
> >> Kewen
> >> -
> >> gcc/ChangeLog:
> >>
> >> * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for
> >> value extraction.
> >>
> >> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> >> index b9d88c2d943..841a0872afa 100644
> >> --- a/gcc/tree-vect-slp.c
> >> +++ b/gcc/tree-vect-slp.c
> >> @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance instance,
> >>  return false;
> >>
> >>    /* There's no way to cost a horizontal vector reduction via REDUC_FN so
> >> -     cost log2 vector operations plus shuffles.  */
> >> +     cost log2 vector operations plus shuffles and one extraction.  */
> >>    unsigned steps = floor_log2 (vect_nunits_for_cost (vectype));
> >>    record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0],
> >>                      vectype, 0, vect_body);
> >>    record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0],
> >>                      vectype, 0, vect_body);
> >> +  record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0],
> >> +                    vectype, 0, vect_body);
> >>    return true;
> >>  }
>
>
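A rough scalar model of the costing discussed in this thread, for illustration only: it assumes a plus reduction over a power-of-two lane count, and `floor_log2_u` and `reduce_plus` are hypothetical helpers, not GCC code. It mirrors the accounting in the patch, floor_log2 (nunits) permutes plus as many vector adds, after which the sum already sits in lane 0 (Richard's point); the single vec_to_scalar the patch adds stands for reading that lane into a scalar register.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// floor_log2 for a nonzero value: the number of shuffle+add steps.
static unsigned
floor_log2_u (unsigned x)
{
  assert (x != 0);
  unsigned r = 0;
  while (x >>= 1)
    ++r;
  return r;
}

// Scalar model of a horizontal plus-reduction over a power-of-two number
// of lanes.  Each step is one permute (upper half moved down) and one
// vector add; afterwards lane 0 holds the total.  STEPS receives the
// number of steps performed.
static int
reduce_plus (std::vector<int> v, unsigned &steps)
{
  std::size_t n = v.size ();
  assert (n != 0 && (n & (n - 1)) == 0);
  steps = 0;
  for (std::size_t half = n / 2; half >= 1; half /= 2)
    {
      std::vector<int> perm (n, 0);        // models the vec_perm
      for (std::size_t i = 0; i < half; ++i)
        perm[i] = v[i + half];
      for (std::size_t i = 0; i < n; ++i)  // models the vector_stmt
        v[i] += perm[i];
      ++steps;
    }
  return v[0];                             // models the vec_to_scalar read
}
```

With eight lanes the model performs three steps, matching floor_log2 (8), so the patched costing would record 3 vector_stmt, 3 vec_perm and 1 vec_to_scalar.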


Re: Ping: [PATCH v2] Analyze niter for until-wrap condition [PR101145]

2021-08-16 Thread Jiufu Guo via Gcc-patches

Jiufu Guo  writes:


"Bin.Cheng"  writes:

On Wed, Aug 4, 2021 at 10:42 AM guojiufu wrote:


Hi,


cut...

>> @@ -0,0 +1,63 @@
>> +TYPE __attribute__ ((noinline))
>> +foo_sign (int *__restrict__ a, int *__restrict__ b, TYPE l, TYPE n)
>> +{
>> +  for (l = L_BASE; n < l; l += C)
>> +    *a++ = *b++ + 1;
>> +  return l;
>> +}
>> +
>> +TYPE __attribute__ ((noinline))
>> +bar_sign (int *__restrict__ a, int *__restrict__ b, TYPE l, TYPE n)
>> +{
>> +  for (l = L_BASE_DOWN; l < n; l -= C)

I noticed that both L_BASE and L_BASE_DOWN are defined as l, which
makes this test a bit confusing.  Could you clean up the use of l, for
example by using an auto var for the loop index variable?
Otherwise the patch looks good to me.  Thanks very much for the work.

Thanks a lot for your review!  L_BASE and L_BASE_DOWN are not needed.  I
updated the patch to use a new index variable 'i' for the loop instead of
the parameter 'l':

 TYPE i;
 for (i = l; n < i; i += C)

I updated the patch as below.
Bootstrap and regression testing pass on powerpc64 and powerpc64le.
To be precise, it also passes on big-endian powerpc64, including 32-bit.


BR,
Jiufu


For code like:
unsigned foo(unsigned val, unsigned start)
{
 unsigned cnt = 0;
 for (unsigned i = start; i > val; ++i)
   cnt++;
 return cnt;
}

The number of iterations should be UINT_MAX - start + 1 when start > val
(and 0 otherwise), i.e. about UINT_MAX - start.
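A small C++ check of that count, offered as a sketch only: `count_by_running` executes the loop from the description, and `until_wrap_niter` is a closed form derived here for illustration (it is not code from the patch). When start > val, i visits start, start + 1, ..., UINT_MAX and the loop stops once i wraps to 0, since 0 > val never holds for an unsigned val.

```cpp
#include <cassert>
#include <climits>

// The loop from the description: count iterations by running it.
static unsigned
count_by_running (unsigned val, unsigned start)
{
  unsigned cnt = 0;
  for (unsigned i = start; i > val; ++i)
    cnt++;
  return cnt;
}

// Hypothetical closed form (illustration only, not from the patch):
// start > val means i runs from start up to UINT_MAX, then wraps to 0,
// and 0 > val is false for every unsigned val.
static unsigned
until_wrap_niter (unsigned val, unsigned start)
{
  return start > val ? UINT_MAX - start + 1u : 0u;
}

// Compare both for every start in [UINT_MAX - span, UINT_MAX].
static void
check_near_wrap (unsigned val, unsigned span)
{
  for (unsigned s = UINT_MAX - span;; ++s)
    {
      assert (count_by_running (val, s) == until_wrap_niter (val, s));
      if (s == UINT_MAX)
        break;
    }
}
```

For example, start = UINT_MAX - 9 with val = 0 gives 10 iterations. The check only uses start values near UINT_MAX, because running the loop from a small start would take about 2^32 iterations.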

There is a function, adjust_cond_for_loop_until_wrap, which handles
similar cases for constant bases.  Like adjust_cond_for_loop_until_wrap,
this patch enhances number_of_iterations_cond/number_of_iterations_lt
to analyze the number of iterations for this kind of loop.

Bootstrap and regtest pass on powerpc64le, x86_64 and aarch64.
Is this ok for trunk?

gcc/ChangeLog:

2021-08-16  Jiufu Guo  

PR tree-optimization/101145
* tree-ssa-loop-niter.c (number_of_iterations_until_wrap):
New function.
(number_of_iterations_lt): Invoke above function.
(adjust_cond_for_loop_until_wrap): Merge into
number_of_iterations_until_wrap.
(number_of_iterations_cond): Update calls to
adjust_cond_for_loop_until_wrap and number_of_iterations_lt.


gcc/testsuite/ChangeLog:

2021-08-16  Jiufu Guo  

PR tree-optimization/101145
* gcc.dg/vect/pr101145.c: New test.
* gcc.dg/vect/pr101145.inc: New test.
* gcc.dg/vect/pr101145_1.c: New test.
* gcc.dg/vect/pr101145_2.c: New test.
* gcc.dg/vect/pr101145_3.c: New test.
* gcc.dg/vect/pr101145inf.c: New test.
* gcc.dg/vect/pr101145inf.inc: New test.
* gcc.dg/vect/pr101145inf_1.c: New test.
---
gcc/testsuite/gcc.dg/vect/pr101145.c       | 187 ++
gcc/testsuite/gcc.dg/vect/pr101145.inc     |  65 
gcc/testsuite/gcc.dg/vect/pr101145_1.c     |  13 ++
gcc/testsuite/gcc.dg/vect/pr101145_2.c     |  13 ++
gcc/testsuite/gcc.dg/vect/pr101145_3.c     |  13 ++
gcc/testsuite/gcc.dg/vect/pr101145inf.c    |  25 +++
gcc/testsuite/gcc.dg/vect/pr101145inf.inc  |  28 
gcc/testsuite/gcc.dg/vect/pr101145inf_1.c  |  23 +++
gcc/tree-ssa-loop-niter.c                  | 157 ++
9 files changed, 459 insertions(+), 65 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145.c
create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145.inc
create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_1.c
create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_2.c
create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_3.c
create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145inf.c
create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145inf.inc
create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145inf_1.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr101145.c b/gcc/testsuite/gcc.dg/vect/pr101145.c
new file mode 100644
index 000..74031b031cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr101145.c
@@ -0,0 +1,187 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-O3 -fdump-tree-vect-details" } */
+#include 
+
+unsigned __attribute__ ((noinline))
+foo (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  while (n < ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_1 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned)
+{
+  while (UINT_MAX - 64 < ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_2 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  l = UINT_MAX - 32;
+  while (n < ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_3 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  while (n <= ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_4 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{  // infinite
+  while (0 <= ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_5 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  //no loop
+  l = UINT_MAX;
+ 

[PATCH] arm: Fix __arm_vctp16q return type in arm_mve.h

2021-08-16 Thread Christophe Lyon via Gcc-patches
__arm_vctp16q actually returns mve_pred16_t rather than int64_t.

2021-08-16  Christophe Lyon  

gcc/
* config/arm/arm_mve.h: Fix __arm_vctp16q return type.
---
 gcc/config/arm/arm_mve.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 83f10036990..e04d46218d0 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -3524,7 +3524,7 @@ __arm_vaddlvq_u32 (uint32x4_t __a)
   return __builtin_mve_vaddlvq_uv4si (__a);
 }
 
-__extension__ extern __inline int64_t
+__extension__ extern __inline mve_pred16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vctp16q (uint32_t __a)
 {
-- 
2.25.1



Re: 'hash_map>'

2021-08-16 Thread Thomas Schwinge
Hi!

On 2021-08-12T17:15:44-0600, Martin Sebor via Gcc  wrote:
> On 8/6/21 10:57 AM, Thomas Schwinge wrote:
>> So I'm trying to do some C++...  ;-)
>>
>> Given:
>>
>>  /* A map from SSA names or var decls to record fields.  */
>>  typedef hash_map field_map_t;
>>
>>  /* For each propagation record type, this is a map from SSA names or 
>> var decls
>> to propagate, to the field in the record type that should be used for
>> transmission and reception.  */
>>  typedef hash_map record_field_map_t;
>>
>> Thus, that's a 'hash_map>'.  (I may do that,
>> right?)  Looking through GCC implementation files, the vast majority of
>> uses of 'hash_map' boil down to a pointer key ('tree', for example) and a
>> pointer/integer value.
>
> Right.  Because most GCC containers rely exclusively on GCC's own
> uses for testing, if your use case is novel in some way, chances
> are it might not work as intended in all circumstances.
>
> I've wrestled with hash_map a number of times.  A use case that's
> close to yours (i.e., a non-trivial value type) is in cp/parser.c:
> see class_to_loc_map_t.

Indeed, at the time you sent this email, I had already started looking
into that one!  (None of the Fortran test cases that I originally analyzed,
which triggered other cases of non-POD/non-trivial destructor, resulted
in a memory leak, because the non-trivial constructor doesn't actually
allocate any resources dynamically -- that's indeed different in this
case here.)  ..., and indeed:

> (I don't remember if I tested it for leaks
> though.  It's used to implement -Wmismatched-tags so compiling
> a few tests under Valgrind should show if it does leak.)

... it does leak memory at present.  :-| (See attached commit log for
details for one example.)

To that effect, to document the current behavior, I propose to
"Add more self-tests for 'hash_map' with Value type with non-trivial
constructor/destructor", see attached.  OK to push to master branch?
(Also cherry-pick into release branches, eventually?)

>> Then:
>>
>>  record_field_map_t field_map ([...]); // see below
>>  for ([...])
>>{
>>  tree record_type = [...];
>>  [...]
>>  bool existed;
>>  field_map_t 
>>= field_map.get_or_insert (record_type, );
>>  gcc_checking_assert (!existed);
>>  [...]
>>  for ([...])
>>fields.put ([...], [...]);
>>  [...]
>>}
>>  [stuff that looks up elements from 'field_map']
>>  field_map.empty ();
>>
>> This generally works.
>>
>> If I instantiate 'record_field_map_t field_map (40);', Valgrind is happy.
>> If however I instantiate 'record_field_map_t field_map (13);' (where '13'
>> would be the default for 'hash_map'), Valgrind complains:
>>
>>  2,080 bytes in 10 blocks are definitely lost in loss record 828 of 876
>> at 0x483DD99: calloc (vg_replace_malloc.c:762)
>> by 0x175F010: xcalloc (xmalloc.c:162)
>> by 0xAF4A2C: hash_table> simple_hashmap_traits, tree_node*> 
>> >::hash_entry, false, xcallocator>::hash_table(unsigned long, bool, bool, 
>> bool, mem_alloc_origin) (hash-table.h:275)
>> by 0x15E0120: hash_map> simple_hashmap_traits, tree_node*> 
>> >::hash_map(unsigned long, bool, bool, bool) (hash-map.h:143)
>> by 0x15DEE87: hash_map> simple_hashmap_traits, tree_node*> >, 
>> simple_hashmap_traits, hash_map> tree_node*, simple_hashmap_traits, 
>> tree_node*> > > >::get_or_insert(tree_node* const&, bool*) (hash-map.h:205)
>> by 0x15DD52C: execute_omp_oacc_neuter_broadcast() 
>> (omp-oacc-neuter-broadcast.cc:1371)
>> [...]
>>
>> (That's with '#pragma GCC optimize "O0"' at the top of the 'gcc/*.cc'
>> file.)
>>
>> My suspicion was that it is due to the 'field_map' getting resized as it
>> incrementally grows (and '40' being big enough for that to never happen),
>> and somehow the non-POD (?) value objects not being properly handled
>> during that.  Working my way a bit through 'gcc/hash-map.*' and
>> 'gcc/hash-table.*' (but not claiming that I understand all that, off
>> hand), it seems as if my theory is right: I'm able to plug this memory
>> leak as follows:
>>
>>  --- gcc/hash-table.h
>>  +++ gcc/hash-table.h
>>  @@ -820,6 +820,8 @@ hash_table::expand ()
>>   {
>>             value_type *q = find_empty_slot_for_expand (Descriptor::hash (x));
>>             new ((void*) q) value_type (std::move (x));
>>  +          //BAD Descriptor::remove (x); // (doesn't make sense and) a ton of "Invalid read [...] inside a block of size [...] free'd"
>>  +          x.~value_type (); //GOOD This seems to work!  -- but does it make sense?
>>   }
>>
>> p++;
>>
>> However, that doesn't exactly look like a correct fix, does it?  I'd
>> expect such a manual destructor call in combination with placement new
>> (that is being used here, obviously) -- but this is after 'std::move'?
>> However, this also 
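A self-contained C++ sketch of the mechanism suspected above, under stated assumptions: `Owner` is a hypothetical stand-in for a value type such as the inner 'hash_map', and it deliberately declares only a copy constructor, so `std::move` falls back to copying and allocates a fresh buffer. The `relocate` loop only mirrors the shape of the quoted 'expand ()' code (placement new from `std::move (x)`, then the old storage is released without running destructors); it is not GCC's implementation. If a value type's move does not transfer ownership, destroying the moved-from source, as the //GOOD line does, is what keeps the count of live buffers balanced.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <utility>

// Number of heap buffers currently owned by Owner objects.
static int live_buffers = 0;

// Hypothetical value type that owns heap memory.  Declaring the copy
// constructor suppresses the implicit move constructor, so
// Owner (std::move (x)) copies and allocates a fresh buffer.
struct Owner
{
  int *buf;
  Owner () : buf (new int[4]) { ++live_buffers; }
  Owner (const Owner &) : buf (new int[4]) { ++live_buffers; }
  Owner &operator= (const Owner &) = delete;
  ~Owner () { delete[] buf; --live_buffers; }
};

// Relocate N objects into raw storage, like a table expansion.  When
// DESTROY_SOURCE is false, the old objects are abandoned undestroyed.
static void
relocate (Owner *old_slots, Owner *new_slots, int n, bool destroy_source)
{
  for (int i = 0; i < n; ++i)
    {
      Owner &x = old_slots[i];
      new ((void *) &new_slots[i]) Owner (std::move (x));
      if (destroy_source)
        x.~Owner ();
    }
}

// Build a 3-slot table, expand it once, tear everything down, and
// return how many buffers are still live (i.e. leaked).
static int
leaked_after_expand (bool destroy_source)
{
  const int n = 3;
  live_buffers = 0;
  Owner *old_slots = static_cast<Owner *> (std::malloc (n * sizeof (Owner)));
  Owner *new_slots = static_cast<Owner *> (std::malloc (n * sizeof (Owner)));
  assert (old_slots && new_slots);
  for (int i = 0; i < n; ++i)
    new ((void *) &old_slots[i]) Owner ();
  relocate (old_slots, new_slots, n, destroy_source);
  std::free (old_slots);        // raw storage released, no destructors run
  for (int i = 0; i < n; ++i)
    new_slots[i].~Owner ();     // normal teardown of the expanded table
  std::free (new_slots);
  return live_buffers;
}
```

With destroy_source false the sketch reports 3 leaked buffers, one per relocated entry; with true it reports 0.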

Re: Ping: [PATCH v2] Analyze niter for until-wrap condition [PR101145]

2021-08-16 Thread Jiufu Guo via Gcc-patches

"Bin.Cheng"  writes:

On Wed, Aug 4, 2021 at 10:42 AM guojiufu wrote:


Hi,


cut...

>> @@ -0,0 +1,63 @@
>> +TYPE __attribute__ ((noinline))
>> +foo_sign (int *__restrict__ a, int *__restrict__ b, TYPE l, TYPE n)
>> +{
>> +  for (l = L_BASE; n < l; l += C)
>> +    *a++ = *b++ + 1;
>> +  return l;
>> +}
>> +
>> +TYPE __attribute__ ((noinline))
>> +bar_sign (int *__restrict__ a, int *__restrict__ b, TYPE l, TYPE n)
>> +{
>> +  for (l = L_BASE_DOWN; l < n; l -= C)

I noticed that both L_BASE and L_BASE_DOWN are defined as l, which
makes this test a bit confusing.  Could you clean up the use of l, for
example by using an auto var for the loop index variable?
Otherwise the patch looks good to me.  Thanks very much for the work.

Thanks a lot for your review!  L_BASE and L_BASE_DOWN are not needed.  I
updated the patch to use a new index variable 'i' for the loop instead of
the parameter 'l':

 TYPE i;
 for (i = l; n < i; i += C)

I updated the patch as below.
Bootstrap and regression testing pass on powerpc64 and powerpc64le.

For code like:
unsigned foo(unsigned val, unsigned start)
{
 unsigned cnt = 0;
 for (unsigned i = start; i > val; ++i)
   cnt++;
 return cnt;
}

The number of iterations should be UINT_MAX - start + 1 when start > val
(and 0 otherwise), i.e. about UINT_MAX - start.
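The new pr101145.c tests mostly use the pre-increment shape `while (n < ++l)` rather than the `for` loop above. A C++ sketch of how many times such a loop runs, with a closed form derived here for illustration only (`run_preinc` and `preinc_niter` are hypothetical helpers, not from the patch): the first comparison sees l + 1, which wraps to 0 when l == UINT_MAX; if that value exceeds n, the loop keeps going until ++l wraps to 0, which is UINT_MAX - l iterations.

```cpp
#include <cassert>
#include <climits>

// Pre-increment loop shape from the new tests: returns the iteration
// count and stores the final value of l in *final_l.
static unsigned
run_preinc (unsigned l, unsigned n, unsigned *final_l)
{
  unsigned cnt = 0;
  while (n < ++l)
    cnt++;
  *final_l = l;
  return cnt;
}

// Hypothetical closed form (illustration only, not from the patch).  If
// l + 1 (computed with unsigned wrap-around) exceeds n, the loop runs
// until ++l wraps to 0, i.e. UINT_MAX - l times; otherwise not at all.
static unsigned
preinc_niter (unsigned l, unsigned n)
{
  return l + 1u > n ? UINT_MAX - l : 0u;
}

// Cross-check the closed form for every l in [UINT_MAX - span, UINT_MAX].
static void
check_preinc (unsigned n, unsigned span)
{
  for (unsigned l = UINT_MAX - span;; ++l)
    {
      unsigned final_l;
      assert (run_preinc (l, n, &final_l) == preinc_niter (l, n));
      if (l == UINT_MAX)
        break;
    }
}
```

Plugging in foo_2's starting value, l = UINT_MAX - 32, gives 32 iterations whenever n is below UINT_MAX - 31, and l ends at 0, which is what foo_2 then returns.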

There is a function, adjust_cond_for_loop_until_wrap, which handles
similar cases for constant bases.  Like adjust_cond_for_loop_until_wrap,
this patch enhances number_of_iterations_cond/number_of_iterations_lt
to analyze the number of iterations for this kind of loop.

Bootstrap and regtest pass on powerpc64le, x86_64 and aarch64.
Is this ok for trunk?

gcc/ChangeLog:

2021-08-16  Jiufu Guo  

PR tree-optimization/101145
* tree-ssa-loop-niter.c (number_of_iterations_until_wrap):
New function.
(number_of_iterations_lt): Invoke above function.
(adjust_cond_for_loop_until_wrap): Merge into
number_of_iterations_until_wrap.
(number_of_iterations_cond): Update calls to
adjust_cond_for_loop_until_wrap and number_of_iterations_lt.


gcc/testsuite/ChangeLog:

2021-08-16  Jiufu Guo  

PR tree-optimization/101145
* gcc.dg/vect/pr101145.c: New test.
* gcc.dg/vect/pr101145.inc: New test.
* gcc.dg/vect/pr101145_1.c: New test.
* gcc.dg/vect/pr101145_2.c: New test.
* gcc.dg/vect/pr101145_3.c: New test.
* gcc.dg/vect/pr101145inf.c: New test.
* gcc.dg/vect/pr101145inf.inc: New test.
* gcc.dg/vect/pr101145inf_1.c: New test.
---
gcc/testsuite/gcc.dg/vect/pr101145.c       | 187 ++
gcc/testsuite/gcc.dg/vect/pr101145.inc     |  65 
gcc/testsuite/gcc.dg/vect/pr101145_1.c     |  13 ++
gcc/testsuite/gcc.dg/vect/pr101145_2.c     |  13 ++
gcc/testsuite/gcc.dg/vect/pr101145_3.c     |  13 ++
gcc/testsuite/gcc.dg/vect/pr101145inf.c    |  25 +++
gcc/testsuite/gcc.dg/vect/pr101145inf.inc  |  28 
gcc/testsuite/gcc.dg/vect/pr101145inf_1.c  |  23 +++
gcc/tree-ssa-loop-niter.c                  | 157 ++
9 files changed, 459 insertions(+), 65 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145.c
create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145.inc
create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_1.c
create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_2.c
create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_3.c
create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145inf.c
create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145inf.inc
create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145inf_1.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr101145.c b/gcc/testsuite/gcc.dg/vect/pr101145.c
new file mode 100644
index 000..74031b031cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr101145.c
@@ -0,0 +1,187 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-O3 -fdump-tree-vect-details" } */
+#include 
+
+unsigned __attribute__ ((noinline))
+foo (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  while (n < ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_1 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned)
+{
+  while (UINT_MAX - 64 < ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_2 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  l = UINT_MAX - 32;
+  while (n < ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_3 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  while (n <= ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_4 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{  // infinite
+  while (0 <= ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_5 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  //no loop
+  l = UINT_MAX;
+  while (n < ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ 

Re: [PATCH 1/5] x86: Add -mmwait for -mgeneral-regs-only

2021-08-16 Thread Martin Liška

On 8/16/21 2:28 PM, Richard Biener via Gcc-patches wrote:

Yes please, and do it with the same commit doing the .opt change.


Just one quick note: I've got a periodic builder that verifies the LTO
stream on tramp3d in all active branches.

Martin


Re: [PATCH 1/5] x86: Add -mmwait for -mgeneral-regs-only

2021-08-16 Thread H.J. Lu via Gcc-patches
On Mon, Aug 16, 2021 at 5:28 AM Richard Biener
 wrote:
>
> On Mon, Aug 16, 2021 at 2:25 PM H.J. Lu  wrote:
> >
> > On Sun, Aug 15, 2021 at 11:11 PM Richard Biener
> >  wrote:
> > >
> > > On Fri, Aug 13, 2021 at 3:51 PM H.J. Lu  wrote:
> > > >
> > > > Add -mmwait so that the MWAIT and MONITOR intrinsics can be used with
> > > > -mgeneral-regs-only and make -msse3 imply -mmwait.
> > >
> > > Adding new options requires bumping the LTO streaming minor version
> > > (I know we forgot it once on the branch already when adding a new 
> > > --param).
> > >
> > > Please take care of this when backporting.
> >
> > It was updated today:
> >
> > commit dce5367eecfb0729cad0325240d614721afb39e3
> > Author: Martin Liska 
> > Date:   Mon Aug 16 13:02:54 2021 +0200
> >
> > LTO: bump minor version
> >
> > Bump the LTO_minor_version due to changes in
> > 52f0aa4dee8401ef3958dbf789780b0ee877beab
> >
> > PR c/100150
> >
> > gcc/ChangeLog:
> >
> > * lto-streamer.h (LTO_minor_version): Bump.
> >
> > Do I need to do it again if I can check in my patches this week?
>
> Yes please, and do it with the same commit doing the .opt change.
>

Here is the updated patch with LTO_minor_version bump.

Thanks.

-- 
H.J.
From 8f3e275ef061cd5f8353c71cb99f05dd944575f9 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 15 Apr 2021 11:19:32 -0700
Subject: [PATCH 1/5] x86: Add -mmwait for -mgeneral-regs-only

Add -mmwait so that the MWAIT and MONITOR intrinsics can be used with
-mgeneral-regs-only and make -msse3 imply -mmwait.

gcc/

	* config.gcc: Install mwaitintrin.h for i[34567]86-*-* and
	x86_64-*-* targets.
	* lto-streamer.h (LTO_minor_version): Bump.
	* common/config/i386/i386-common.c (OPTION_MASK_ISA2_MWAIT_SET):
	New.
	(OPTION_MASK_ISA2_MWAIT_UNSET): Likewise.
	(ix86_handle_option): Handle -mmwait.
	* config/i386/i386-builtins.c (ix86_init_mmx_sse_builtins):
	Replace OPTION_MASK_ISA_SSE3 with OPTION_MASK_ISA2_MWAIT on
	__builtin_ia32_monitor and __builtin_ia32_mwait.
	* config/i386/i386-options.c (isa2_opts): Add -mmwait.
	(ix86_valid_target_attribute_inner_p): Likewise.
	(ix86_option_override_internal): Enable mwait/monitor
	instructions for -msse3.
	* config/i386/i386.h (TARGET_MWAIT): New.
	(TARGET_MWAIT_P): Likewise.
	* config/i386/i386.opt: Add -mmwait.
	* config/i386/mwaitintrin.h: New file.
	* config/i386/pmmintrin.h: Include .
	* config/i386/sse.md (sse3_mwait): Replace TARGET_SSE3 with
	TARGET_MWAIT.
	(@sse3_monitor_): Likewise.
	* config/i386/x86gprintrin.h: Include .
	* doc/extend.texi: Document mwait target attribute.
	* doc/invoke.texi: Document -mmwait.

gcc/testsuite/

	* gcc.target/i386/monitor-2.c: New test.

(cherry picked from commit d8c6cc2ca35489bc41bb58ec96c1195928826922)
---
 gcc/common/config/i386/i386-common.c      | 15 +++
 gcc/config.gcc                            |  6 ++-
 gcc/config/i386/i386-builtins.c           |  4 +-
 gcc/config/i386/i386-options.c            |  7 +++
 gcc/config/i386/i386.h                    |  2 +
 gcc/config/i386/i386.opt                  |  4 ++
 gcc/config/i386/mwaitintrin.h             | 52 +++
 gcc/config/i386/pmmintrin.h               | 13 +-
 gcc/config/i386/sse.md                    |  4 +-
 gcc/config/i386/x86gprintrin.h            |  2 +
 gcc/doc/extend.texi                       |  5 +++
 gcc/doc/invoke.texi                       |  8 +++-
 gcc/lto-streamer.h                        |  2 +-
 gcc/testsuite/gcc.target/i386/monitor-2.c | 27 
 14 files changed, 131 insertions(+), 20 deletions(-)
 create mode 100644 gcc/config/i386/mwaitintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/monitor-2.c

diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index 6a7b5c8312f..e156cc34584 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -150,6 +150,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_F16C_SET \
   (OPTION_MASK_ISA_F16C | OPTION_MASK_ISA_AVX_SET)
 #define OPTION_MASK_ISA2_MWAITX_SET OPTION_MASK_ISA2_MWAITX
+#define OPTION_MASK_ISA2_MWAIT_SET OPTION_MASK_ISA2_MWAIT
 #define OPTION_MASK_ISA2_CLZERO_SET OPTION_MASK_ISA2_CLZERO
 #define OPTION_MASK_ISA_PKU_SET OPTION_MASK_ISA_PKU
 #define OPTION_MASK_ISA2_RDPID_SET OPTION_MASK_ISA2_RDPID
@@ -245,6 +246,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_XSAVES_UNSET OPTION_MASK_ISA_XSAVES
 #define OPTION_MASK_ISA_CLWB_UNSET OPTION_MASK_ISA_CLWB
 #define OPTION_MASK_ISA2_MWAITX_UNSET OPTION_MASK_ISA2_MWAITX
+#define OPTION_MASK_ISA2_MWAIT_UNSET OPTION_MASK_ISA2_MWAIT
 #define OPTION_MASK_ISA2_CLZERO_UNSET OPTION_MASK_ISA2_CLZERO
 #define OPTION_MASK_ISA_PKU_UNSET OPTION_MASK_ISA_PKU
 #define OPTION_MASK_ISA2_RDPID_UNSET OPTION_MASK_ISA2_RDPID
@@ -1546,6 +1548,19 @@ ix86_handle_option (struct gcc_options *opts,
 	}
   return true;
 
+case OPT_mmwait:
+  if (value)
+	{
+	  

Re: [PATCH v4] gcov: Add TARGET_GCOV_TYPE_SIZE target hook

2021-08-16 Thread Martin Liška

On 8/12/21 6:13 PM, Joseph Myers wrote:

This is not a review of the patch, but I think this version addresses all
the issues I had with previous versions regarding target macro/hook
handling.


And I'm fine with the GCOV part. Please install the patch.

Martin


Re: [PATCH 1/5] x86: Add -mmwait for -mgeneral-regs-only

2021-08-16 Thread Richard Biener via Gcc-patches
On Mon, Aug 16, 2021 at 2:25 PM H.J. Lu  wrote:
>
> On Sun, Aug 15, 2021 at 11:11 PM Richard Biener
>  wrote:
> >
> > On Fri, Aug 13, 2021 at 3:51 PM H.J. Lu  wrote:
> > >
> > > Add -mmwait so that the MWAIT and MONITOR intrinsics can be used with
> > > -mgeneral-regs-only, and make -msse3 imply -mmwait.
> >
> > Adding new options requires bumping the LTO streaming minor version
> > (I know we forgot it once on the branch already when adding a new --param).
> >
> > Please take care of this when backporting.
>
> It was updated today:
>
> commit dce5367eecfb0729cad0325240d614721afb39e3
> Author: Martin Liska 
> Date:   Mon Aug 16 13:02:54 2021 +0200
>
> LTO: bump minor version
>
> Bump the LTO_minor_version due to changes in
> 52f0aa4dee8401ef3958dbf789780b0ee877beab
>
> PR c/100150
>
> gcc/ChangeLog:
>
> * lto-streamer.h (LTO_minor_version): Bump.
>
> Do I need to do it again if I can check in my patches this week?

Yes please, and do it with the same commit doing the .opt change.

Richard.

> Thanks.
>
> > Richard.
> >
> > > gcc/
> > >
> > > * config.gcc: Install mwaitintrin.h for i[34567]86-*-* and
> > > x86_64-*-* targets.
> > > * common/config/i386/i386-common.c (OPTION_MASK_ISA2_MWAIT_SET):
> > > New.
> > > (OPTION_MASK_ISA2_MWAIT_UNSET): Likewise.
> > > (ix86_handle_option): Handle -mmwait.
> > > * config/i386/i386-builtins.c (ix86_init_mmx_sse_builtins):
> > > Replace OPTION_MASK_ISA_SSE3 with OPTION_MASK_ISA2_MWAIT on
> > > __builtin_ia32_monitor and __builtin_ia32_mwait.
> > > * config/i386/i386-options.c (isa2_opts): Add -mmwait.
> > > (ix86_valid_target_attribute_inner_p): Likewise.
> > > (ix86_option_override_internal): Enable mwait/monitor
> > > instructions for -msse3.
> > > * config/i386/i386.h (TARGET_MWAIT): New.
> > > (TARGET_MWAIT_P): Likewise.
> > > * config/i386/i386.opt: Add -mmwait.
> > > * config/i386/mwaitintrin.h: New file.
> > > * config/i386/pmmintrin.h: Include .
> > > * config/i386/sse.md (sse3_mwait): Replace TARGET_SSE3 with
> > > TARGET_MWAIT.
> > > (@sse3_monitor_): Likewise.
> > > * config/i386/x86gprintrin.h: Include .
> > > * doc/extend.texi: Document mwait target attribute.
> > > * doc/invoke.texi: Document -mmwait.
> > >
> > > gcc/testsuite/
> > >
> > > * gcc.target/i386/monitor-2.c: New test.
> > >
> > > (cherry picked from commit d8c6cc2ca35489bc41bb58ec96c1195928826922)
> > > ---
> > >  gcc/common/config/i386/i386-common.c  | 15 +++
> > >  gcc/config.gcc|  6 ++-
> > >  gcc/config/i386/i386-builtins.c   |  4 +-
> > >  gcc/config/i386/i386-options.c|  7 +++
> > >  gcc/config/i386/i386.h|  2 +
> > >  gcc/config/i386/i386.opt  |  4 ++
> > >  gcc/config/i386/mwaitintrin.h | 52 +++
> > >  gcc/config/i386/pmmintrin.h   | 13 +-
> > >  gcc/config/i386/sse.md|  4 +-
> > >  gcc/config/i386/x86gprintrin.h|  2 +
> > >  gcc/doc/extend.texi   |  5 +++
> > >  gcc/doc/invoke.texi   |  8 +++-
> > >  gcc/testsuite/gcc.target/i386/monitor-2.c | 27 
> > >  13 files changed, 130 insertions(+), 19 deletions(-)
> > >  create mode 100644 gcc/config/i386/mwaitintrin.h
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/monitor-2.c
> > >
> > > diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
> > > index 6a7b5c8312f..e156cc34584 100644
> > > --- a/gcc/common/config/i386/i386-common.c
> > > +++ b/gcc/common/config/i386/i386-common.c
> > > @@ -150,6 +150,7 @@ along with GCC; see the file COPYING3.  If not see
> > >  #define OPTION_MASK_ISA_F16C_SET \
> > >(OPTION_MASK_ISA_F16C | OPTION_MASK_ISA_AVX_SET)
> > >  #define OPTION_MASK_ISA2_MWAITX_SET OPTION_MASK_ISA2_MWAITX
> > > +#define OPTION_MASK_ISA2_MWAIT_SET OPTION_MASK_ISA2_MWAIT
> > >  #define OPTION_MASK_ISA2_CLZERO_SET OPTION_MASK_ISA2_CLZERO
> > >  #define OPTION_MASK_ISA_PKU_SET OPTION_MASK_ISA_PKU
> > >  #define OPTION_MASK_ISA2_RDPID_SET OPTION_MASK_ISA2_RDPID
> > > @@ -245,6 +246,7 @@ along with GCC; see the file COPYING3.  If not see
> > >  #define OPTION_MASK_ISA_XSAVES_UNSET OPTION_MASK_ISA_XSAVES
> > >  #define OPTION_MASK_ISA_CLWB_UNSET OPTION_MASK_ISA_CLWB
> > >  #define OPTION_MASK_ISA2_MWAITX_UNSET OPTION_MASK_ISA2_MWAITX
> > > +#define OPTION_MASK_ISA2_MWAIT_UNSET OPTION_MASK_ISA2_MWAIT
> > >  #define OPTION_MASK_ISA2_CLZERO_UNSET OPTION_MASK_ISA2_CLZERO
> > >  #define OPTION_MASK_ISA_PKU_UNSET OPTION_MASK_ISA_PKU
> > >  #define OPTION_MASK_ISA2_RDPID_UNSET OPTION_MASK_ISA2_RDPID
> > > @@ -1546,6 +1548,19 @@ ix86_handle_option (struct gcc_options *opts,
> > > }
> > >return true;
> > >
> > > +case 

Re: [PATCH 1/5] x86: Add -mmwait for -mgeneral-regs-only

2021-08-16 Thread H.J. Lu via Gcc-patches
On Sun, Aug 15, 2021 at 11:11 PM Richard Biener
 wrote:
>
> On Fri, Aug 13, 2021 at 3:51 PM H.J. Lu  wrote:
> >
> > Add -mmwait so that the MWAIT and MONITOR intrinsics can be used with
> > -mgeneral-regs-only, and make -msse3 imply -mmwait.
>
> Adding new options requires bumping the LTO streaming minor version
> (I know we forgot it once on the branch already when adding a new --param).
>
> Please take care of this when backporting.

It was updated today:

commit dce5367eecfb0729cad0325240d614721afb39e3
Author: Martin Liska 
Date:   Mon Aug 16 13:02:54 2021 +0200

LTO: bump minor version

Bump the LTO_minor_version due to changes in
52f0aa4dee8401ef3958dbf789780b0ee877beab

PR c/100150

gcc/ChangeLog:

* lto-streamer.h (LTO_minor_version): Bump.

Do I need to do it again if I can check in my patches this week?

Thanks.

> Richard.
>
> > gcc/
> >
> > * config.gcc: Install mwaitintrin.h for i[34567]86-*-* and
> > x86_64-*-* targets.
> > * common/config/i386/i386-common.c (OPTION_MASK_ISA2_MWAIT_SET):
> > New.
> > (OPTION_MASK_ISA2_MWAIT_UNSET): Likewise.
> > (ix86_handle_option): Handle -mmwait.
> > * config/i386/i386-builtins.c (ix86_init_mmx_sse_builtins):
> > Replace OPTION_MASK_ISA_SSE3 with OPTION_MASK_ISA2_MWAIT on
> > __builtin_ia32_monitor and __builtin_ia32_mwait.
> > * config/i386/i386-options.c (isa2_opts): Add -mmwait.
> > (ix86_valid_target_attribute_inner_p): Likewise.
> > (ix86_option_override_internal): Enable mwait/monitor
> > instructions for -msse3.
> > * config/i386/i386.h (TARGET_MWAIT): New.
> > (TARGET_MWAIT_P): Likewise.
> > * config/i386/i386.opt: Add -mmwait.
> > * config/i386/mwaitintrin.h: New file.
> > * config/i386/pmmintrin.h: Include .
> > * config/i386/sse.md (sse3_mwait): Replace TARGET_SSE3 with
> > TARGET_MWAIT.
> > (@sse3_monitor_): Likewise.
> > * config/i386/x86gprintrin.h: Include .
> > * doc/extend.texi: Document mwait target attribute.
> > * doc/invoke.texi: Document -mmwait.
> >
> > gcc/testsuite/
> >
> > * gcc.target/i386/monitor-2.c: New test.
> >
> > (cherry picked from commit d8c6cc2ca35489bc41bb58ec96c1195928826922)
> > ---
> >  gcc/common/config/i386/i386-common.c  | 15 +++
> >  gcc/config.gcc|  6 ++-
> >  gcc/config/i386/i386-builtins.c   |  4 +-
> >  gcc/config/i386/i386-options.c|  7 +++
> >  gcc/config/i386/i386.h|  2 +
> >  gcc/config/i386/i386.opt  |  4 ++
> >  gcc/config/i386/mwaitintrin.h | 52 +++
> >  gcc/config/i386/pmmintrin.h   | 13 +-
> >  gcc/config/i386/sse.md|  4 +-
> >  gcc/config/i386/x86gprintrin.h|  2 +
> >  gcc/doc/extend.texi   |  5 +++
> >  gcc/doc/invoke.texi   |  8 +++-
> >  gcc/testsuite/gcc.target/i386/monitor-2.c | 27 
> >  13 files changed, 130 insertions(+), 19 deletions(-)
> >  create mode 100644 gcc/config/i386/mwaitintrin.h
> >  create mode 100644 gcc/testsuite/gcc.target/i386/monitor-2.c
> >
> > diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
> > index 6a7b5c8312f..e156cc34584 100644
> > --- a/gcc/common/config/i386/i386-common.c
> > +++ b/gcc/common/config/i386/i386-common.c
> > @@ -150,6 +150,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #define OPTION_MASK_ISA_F16C_SET \
> >(OPTION_MASK_ISA_F16C | OPTION_MASK_ISA_AVX_SET)
> >  #define OPTION_MASK_ISA2_MWAITX_SET OPTION_MASK_ISA2_MWAITX
> > +#define OPTION_MASK_ISA2_MWAIT_SET OPTION_MASK_ISA2_MWAIT
> >  #define OPTION_MASK_ISA2_CLZERO_SET OPTION_MASK_ISA2_CLZERO
> >  #define OPTION_MASK_ISA_PKU_SET OPTION_MASK_ISA_PKU
> >  #define OPTION_MASK_ISA2_RDPID_SET OPTION_MASK_ISA2_RDPID
> > @@ -245,6 +246,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #define OPTION_MASK_ISA_XSAVES_UNSET OPTION_MASK_ISA_XSAVES
> >  #define OPTION_MASK_ISA_CLWB_UNSET OPTION_MASK_ISA_CLWB
> >  #define OPTION_MASK_ISA2_MWAITX_UNSET OPTION_MASK_ISA2_MWAITX
> > +#define OPTION_MASK_ISA2_MWAIT_UNSET OPTION_MASK_ISA2_MWAIT
> >  #define OPTION_MASK_ISA2_CLZERO_UNSET OPTION_MASK_ISA2_CLZERO
> >  #define OPTION_MASK_ISA_PKU_UNSET OPTION_MASK_ISA_PKU
> >  #define OPTION_MASK_ISA2_RDPID_UNSET OPTION_MASK_ISA2_RDPID
> > @@ -1546,6 +1548,19 @@ ix86_handle_option (struct gcc_options *opts,
> > }
> >return true;
> >
> > +case OPT_mmwait:
> > +  if (value)
> > +   {
> > + opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_MWAIT_SET;
> > + opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_MWAIT_SET;
> > +   }
> > +  else
> > +   {
> > + opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_MWAIT_UNSET;
> > + 

Re: [patch] Fix regression in debug info for Ada with DWARF 5

2021-08-16 Thread Richard Biener via Gcc-patches
On Mon, Aug 16, 2021 at 12:25 PM Eric Botcazou  wrote:
>
> Hi,
>
> add_scalar_info can directly generate a reference to an existing DIE for a
> scalar attribute, e.g. the upper bound of a VLA, but it does so only if this
> existing DIE has a location or is a constant:
>
>   if (get_AT (decl_die, DW_AT_location)
>   || get_AT (decl_die, DW_AT_data_member_location)
>   || get_AT (decl_die, DW_AT_const_value))
>
> Now, in DWARF 5, members of a structure that are bitfields no longer have a
> DW_AT_data_member_location but a DW_AT_data_bit_offset attribute instead, so
> the condition is bypassed.
>
> Tested on x86-64/Linux, OK for mainline and 11 branch?

OK

>
> 2021-08-16  Eric Botcazou  
>
> * dwarf2out.c (add_scalar_info): Deal with DW_AT_data_bit_offset.
>
> --
> Eric Botcazou


Re: [PATCH] Do not enable DT_INIT_ARRAY/DT_FINI_ARRAY on uclinuxfdpiceabi

2021-08-16 Thread Christophe LYON via Gcc-patches

ping?


On 12/08/2021 17:29, Christophe Lyon via Gcc-patches wrote:

Commit r12-1328 enabled DT_INIT_ARRAY/DT_FINI_ARRAY for all Linux
targets, but this does not work for arm-none-uclinuxfdpiceabi: it
makes all the execution tests fail.

This patch restores the original behavior for uclinuxfdpiceabi.

2021-08-12  Christophe Lyon  

gcc/
PR target/100896
* config.gcc (gcc_cv_initfini_array): Leave undefined for
uclinuxfdpiceabi targets.
---
  gcc/config.gcc | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 93e2b3219b9..8c8d30ca934 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -851,8 +851,14 @@ case ${target} in
tmake_file="${tmake_file} t-glibc"
target_has_targetcm=yes
target_has_targetdm=yes
-  # Linux targets always support .init_array.
-  gcc_cv_initfini_array=yes
+  case $target in
+*-*-uclinuxfdpiceabi)
+  ;;
+*)
+  # Linux targets always support .init_array.
+  gcc_cv_initfini_array=yes
+  ;;
+  esac
;;
  *-*-netbsd*)
tm_p_file="${tm_p_file} netbsd-protos.h"


Re: [PATCH] arm: Fix multilib mapping for CDE extensions [PR100856]

2021-08-16 Thread Christophe LYON via Gcc-patches

ping?

On 11/08/2021 16:06, Christophe Lyon wrote:

ping?
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575310.html 




On Wed, Aug 4, 2021 at 11:13 AM Christophe Lyon via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:


ping?

On Thu, 15 Jul 2021 at 15:07, Christophe LYON via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> This is a followup to Srinath's recent patch: the newly added test is
> failing e.g. on arm-linux-gnueabihf without R/M profile multilibs.
>
> It is also failing on arm-eabi with R/M profile multilibs if the
> execution engine does not support v8.1-M instructions.
>
> The patch avoids this by adding check_effective_target_FUNC_multilib
> in target-supports.exp which effectively checks whether the target
> supports linking and execution, like what is already done for other
> ARM effective targets.  pr100856.c is updated to use it instead of
> arm_v8_1m_main_cde_mve_ok (which makes the testcase a bit of a
> duplicate with check_effective_target_FUNC_multilib).
>
> In addition, I noticed that requiring MVE does not seem necessary and
> this enables the test to pass even when targeting a CPU without MVE:
> since the test does not involve actual CDE instructions, it can pass
> on other architecture versions.  For instance, when requiring MVE, we
> have to use cortex-m55 under QEMU for the test to pass because the
> memset() that comes from v8.1-m.main+mve multilib uses LOB
> instructions (DLS) (memset is used during startup). Keeping
> arm_v8_1m_main_cde_mve_ok would mean we would enable the test provided
> we have the right multilibs, causing a runtime error if the simulator
> does not support LOB instructions (e.g. when targeting cortex-m7).
>
> I do not update sourcebuild.texi since the CDE effective targets are
> already collectively documented.
>
> Finally, the patch fixes two typos in comments.
>
> 2021-07-15  Christophe Lyon  <christophe.l...@foss.st.com>
>
>          PR target/100856
>          gcc/
>          * config/arm/arm.opt: Fix typo.
>          * config/arm/t-rmprofile: Fix typo.
>
>          gcc/testsuite/
>          * gcc.target/arm/acle/pr100856.c: Use arm_v8m_main_cde_multilib
>          and arm_v8m_main_cde.
>          * lib/target-supports.exp: Add
>          check_effective_target_FUNC_multilib for ARM CDE.
>
>



Re: [PATCH] Fix incorrect computation in fill_always_executed_in_1

2021-08-16 Thread Richard Biener via Gcc-patches
On Mon, 16 Aug 2021, Xiong Hu Luo wrote:

> It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
> nested loops.  inn_loop is updated to the inner loop, so it needs to be restored
> when exiting from innermost loop. With this patch, the store instruction
> in outer loop could also be moved out of outer loop by store motion.
> Any comments?  Thanks.

> gcc/ChangeLog:
> 
>   * tree-ssa-loop-im.c (fill_always_executed_in_1): Restore
>   inn_loop when exiting from innermost loop.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++
>  gcc/tree-ssa-loop-im.c |  6 +-
>  2 files changed, 29 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> new file mode 100644
> index 000..097a5ee4a4b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> @@ -0,0 +1,24 @@
> +/* PR/101293 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +struct X { int i; int j; int k;};
> +
> +void foo(struct X *x, int n, int l)
> +{
> +  for (int j = 0; j < l; j++)
> +    {
> +      for (int i = 0; i < n; ++i)
> +        {
> +          int *p = &x->j;
> +          int tem = *p;
> +          x->j += tem * i;
> +        }
> +      int *r = &x->k;
> +      int tem2 = *r;
> +      x->k += tem2 * j;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } } */
> +
> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> index b24bc64f2a7..5ca4738b20e 100644
> --- a/gcc/tree-ssa-loop-im.c
> +++ b/gcc/tree-ssa-loop-im.c
> @@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
> if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>   last = bb;
>  
> +   if (inn_loop != loop
> +   && flow_loop_nested_p (bb->loop_father, inn_loop))
> + inn_loop = bb->loop_father;
> +

The comment says

  /* In a loop that is always entered we may proceed anyway.
     But record that we entered it and stop once we leave it.  */
  inn_loop = bb->loop_father;

and your change would defeat that early return, no?

> if (bitmap_bit_p (contains_call, bb->index))
>   break;
>  
> @@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
>  
> if (bb->loop_father->header == bb)
>   {
> -   if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> +   if (!dominated_by_p (CDI_DOMINATORS, bb->loop_father->latch, bb))
>   break;

That's now an always-false condition: a loop's latch is always dominated
by its header.  The condition as written tries to verify whether the
loop is always entered - mind we visit all blocks, not only those
always executed.

In fact for your testcase the x->j ref is _not_ always executed
since the inner loop is conditional on n > 0.

Richard.


Re: [PATCH] Speed up jump table switch detection.

2021-08-16 Thread Richard Biener via Gcc-patches
On Mon, Aug 16, 2021 at 10:28 AM Martin Liška  wrote:
>
> Hi.
>
> As mentioned in the PR, this patch significantly speeds up jump table
> detection in switch lowering.
>
> The patch bootstraps on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

OK.

> Thanks,
> Martin
>
> PR tree-optimization/100393
>
> gcc/ChangeLog:
>
> * tree-switch-conversion.c (group_cluster::dump): Use
>   get_comparison_count.
> (jump_table_cluster::find_jump_tables): Pre-compute number of
> comparisons and then decrement it. Cache also max_ratio.
> (jump_table_cluster::can_be_handled): Change signature.
> * tree-switch-conversion.h (get_comparison_count): New.
> ---
>   gcc/tree-switch-conversion.c | 42 
>   gcc/tree-switch-conversion.h | 14 ++--
>   2 files changed, 35 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c
> index 294b5457008..244cf4be010 100644
> --- a/gcc/tree-switch-conversion.c
> +++ b/gcc/tree-switch-conversion.c
> @@ -1091,7 +1091,7 @@ group_cluster::dump (FILE *f, bool details)
> for (unsigned i = 0; i < m_cases.length (); i++)
>   {
> simple_cluster *sc = static_cast (m_cases[i]);
> -  comparison_count += sc->m_range_p ? 2 : 1;
> +  comparison_count += sc->get_comparison_count ();
>   }
>
> unsigned HOST_WIDE_INT range = get_range (get_low (), get_high ());
> @@ -1186,11 +1186,24 @@ jump_table_cluster::find_jump_tables (vec 
> )
>
> min.quick_push (min_cluster_item (0, 0, 0));
>
> +  unsigned HOST_WIDE_INT max_ratio
> += (optimize_insn_for_size_p ()
> +   ? param_jump_table_max_growth_ratio_for_size
> +   : param_jump_table_max_growth_ratio_for_speed);
> +
> for (unsigned i = 1; i <= l; i++)
>   {
> /* Set minimal # of clusters with i-th item to infinite.  */
> min.quick_push (min_cluster_item (INT_MAX, INT_MAX, INT_MAX));
>
> +  /* Pre-calculate number of comparisons for the clusters.  */
> +  HOST_WIDE_INT comparison_count = 0;
> +  for (unsigned k = 0; k <= i - 1; k++)
> +   {
> + simple_cluster *sc = static_cast (clusters[k]);
> + comparison_count += sc->get_comparison_count ();
> +   }
> +
> for (unsigned j = 0; j < i; j++)
> {
>   unsigned HOST_WIDE_INT s = min[j].m_non_jt_cases;
> @@ -1201,10 +1214,15 @@ jump_table_cluster::find_jump_tables (vec 
> )
>   if ((min[j].m_count + 1 < min[i].m_count
>|| (min[j].m_count + 1 == min[i].m_count
>&& s < min[i].m_non_jt_cases))
> - && can_be_handled (clusters, j, i - 1))
> + && can_be_handled (clusters, j, i - 1, max_ratio,
> +comparison_count))
> min[i] = min_cluster_item (min[j].m_count + 1, j, s);
> +
> + simple_cluster *sc = static_cast (clusters[j]);
> + comparison_count -= sc->get_comparison_count ();
> }
>
> +  gcc_checking_assert (comparison_count == 0);
> gcc_checking_assert (min[i].m_count != INT_MAX);
>   }
>
> @@ -1242,7 +1260,9 @@ jump_table_cluster::find_jump_tables (vec 
> )
>
>   bool
>   jump_table_cluster::can_be_handled (const vec ,
> -   unsigned start, unsigned end)
> +   unsigned start, unsigned end,
> +   unsigned HOST_WIDE_INT max_ratio,
> +   unsigned HOST_WIDE_INT comparison_count)
>   {
> /* If the switch is relatively small such that the cost of one
>indirect jump on the target are higher than the cost of a
> @@ -1261,10 +1281,6 @@ jump_table_cluster::can_be_handled (const vec *> ,
> if (start == end)
>   return true;
>
> -  unsigned HOST_WIDE_INT max_ratio
> -= (optimize_insn_for_size_p ()
> -   ? param_jump_table_max_growth_ratio_for_size
> -   : param_jump_table_max_growth_ratio_for_speed);
> unsigned HOST_WIDE_INT range = get_range (clusters[start]->get_low (),
> clusters[end]->get_high ());
> /* Check overflow.  */
> @@ -1278,18 +1294,6 @@ jump_table_cluster::can_be_handled (const vec *> ,
> if (lhs < range)
>   return false;
>
> -  /* First make quick guess as each cluster
> - can add at maximum 2 to the comparison_count.  */
> -  if (lhs > 2 * max_ratio * (end - start + 1))
> -return false;
> -
> -  unsigned HOST_WIDE_INT comparison_count = 0;
> -  for (unsigned i = start; i <= end; i++)
> -{
> -  simple_cluster *sc = static_cast (clusters[i]);
> -  comparison_count += sc->m_range_p ? 2 : 1;
> -}
> -
> return lhs <= max_ratio * comparison_count;
>   }
>
> diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
> index d76f19b57f6..a375e52636e 100644
> --- a/gcc/tree-switch-conversion.h
> +++ 

GCC 11 backports

2021-08-16 Thread Martin Liška

I'm going to apply the following 3 tested patches.

Martin
>From 85b78f6c38bb357abb749fed81a1e7a589050461 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 12 Aug 2021 16:01:01 +0200
Subject: [PATCH 1/3] ipa: make target_clone default decl local [PR101726]

	PR ipa/101726

gcc/ChangeLog:

	* multiple_target.c (create_dispatcher_calls): Make default
	  function local only if it is a definition.
---
 gcc/multiple_target.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c
index e4192657cef..6c0565880c5 100644
--- a/gcc/multiple_target.c
+++ b/gcc/multiple_target.c
@@ -170,17 +170,20 @@ create_dispatcher_calls (struct cgraph_node *node)
   clone_function_name_numbered (
 	  node->decl, "default"));
 
-  /* FIXME: copy of cgraph_node::make_local that should be cleaned up
-	in next stage1.  */
-  node->make_decl_local ();
-  node->set_section (NULL);
-  node->set_comdat_group (NULL);
-  node->externally_visible = false;
-  node->forced_by_abi = false;
-  node->set_section (NULL);
-
-  DECL_ARTIFICIAL (node->decl) = 1;
-  node->force_output = true;
+  if (node->definition)
+{
+  /* FIXME: copy of cgraph_node::make_local that should be cleaned up
+		in next stage1.  */
+  node->make_decl_local ();
+  node->set_section (NULL);
+  node->set_comdat_group (NULL);
+  node->externally_visible = false;
+  node->forced_by_abi = false;
+  node->set_section (NULL);
+
+  DECL_ARTIFICIAL (node->decl) = 1;
+  node->force_output = true;
+}
 }
 
 /* Return length of attribute names string,
-- 
2.32.0

>From fbe246620e2fa5b5968718837ae4c12fdb85ba39 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 13 Aug 2021 11:10:56 +0200
Subject: [PATCH 2/3] ipa: do not make localaliases for target_clones
 [PR101261]

	PR ipa/101261

gcc/ChangeLog:

	* symtab.c (symtab_node::noninterposable_alias): Do not create
	  local aliases for target_clone functions as the cloning pass
	  rejects aliases.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr101261.c: New test.
---
 gcc/symtab.c |  2 ++
 gcc/testsuite/gcc.target/i386/pr101261.c | 11 +++
 2 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101261.c

diff --git a/gcc/symtab.c b/gcc/symtab.c
index 2135b34ce27..5530a124a9d 100644
--- a/gcc/symtab.c
+++ b/gcc/symtab.c
@@ -1959,6 +1959,8 @@ symtab_node::noninterposable_alias (void)
   /* If aliases aren't supported by the assembler, fail.  */
   if (!TARGET_SUPPORTS_ALIASES)
 return NULL;
+  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (node->decl)))
+return NULL;
 
   /* Otherwise create a new one.  */
   new_decl = copy_node (node->decl);
diff --git a/gcc/testsuite/gcc.target/i386/pr101261.c b/gcc/testsuite/gcc.target/i386/pr101261.c
new file mode 100644
index 000..d25d1a202c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101261.c
@@ -0,0 +1,11 @@
+/* PR middle-end/101261 */
+/* { dg-do compile { target fpic } } */
+/* { dg-options "-fno-semantic-interposition -fPIC" } */
+/* { dg-require-ifunc "" } */
+
+void
+__attribute__((target_clones("default", "avx2")))
+dt_ioppr_transform_image_colorspace()
+{
+  dt_ioppr_transform_image_colorspace();
+}
-- 
2.32.0

>From 5b1bcb10b2cc6c93b06d22da0a044a6a6f362f0b Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 13 Aug 2021 12:35:47 +0200
Subject: [PATCH 3/3] ipa: ICF should check SSA_NAME_IS_DEFAULT_DEF

	PR ipa/100600

gcc/ChangeLog:

	* ipa-icf-gimple.c (func_checker::compare_ssa_name): Do not
	  consider equal SSA_NAMEs when one is a param.

gcc/testsuite/ChangeLog:

	* gcc.dg/ipa/pr100600.c: New test.
---
 gcc/ipa-icf-gimple.c|  3 +++
 gcc/testsuite/gcc.dg/ipa/pr100600.c | 22 ++
 2 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr100600.c

diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index edf5f025627..cf0262621be 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -96,6 +96,9 @@ func_checker::compare_ssa_name (const_tree t1, const_tree t2)
   unsigned i1 = SSA_NAME_VERSION (t1);
   unsigned i2 = SSA_NAME_VERSION (t2);
 
+  if (SSA_NAME_IS_DEFAULT_DEF (t1) != SSA_NAME_IS_DEFAULT_DEF (t2))
+return false;
+
   if (m_source_ssa_names[i1] == -1)
 m_source_ssa_names[i1] = i2;
   else if (m_source_ssa_names[i1] != (int) i2)
diff --git a/gcc/testsuite/gcc.dg/ipa/pr100600.c b/gcc/testsuite/gcc.dg/ipa/pr100600.c
new file mode 100644
index 000..8a3d0e16e7e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr100600.c
@@ -0,0 +1,22 @@
+/* PR ipa/100600 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int a, b, c;
+long d(long x, long e, long f, long g) {
+  long h, i;
+  for (; h < e; h++) {
+i = f;
+for (; i < g; i++)
+  c = b + a;
+  }
+  return h + i;
+}
+
+long j(long x, long e, long y, long g) {
+  long h, i;
+  

[PATCH][gcc-11][pushed] LTO: bump minor version

2021-08-16 Thread Martin Liška

Bump the LTO_minor_version due to changes in 
52f0aa4dee8401ef3958dbf789780b0ee877beab

PR c/100150

gcc/ChangeLog:

* lto-streamer.h (LTO_minor_version): Bump.
---
 gcc/lto-streamer.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 7a7be80dab8..a01049da472 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -121,7 +121,7 @@ along with GCC; see the file COPYING3.  If not see
  form followed by the data for the string.  */
 
 #define LTO_major_version 11
-#define LTO_minor_version 0
+#define LTO_minor_version 1
 
 typedef unsigned char	lto_decl_flags_t;
 
--

2.32.0



Address '?:' issues in 'libgomp.oacc-c-c++-common/mode-transitions.c'

2021-08-16 Thread Thomas Schwinge
Hi!

Pushed "Address '?:' issues in
'libgomp.oacc-c-c++-common/mode-transitions.c'" to master branch in
commit a2ab2f0dfba0fa69ebf6c82e34750911b2e5a639, see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From a2ab2f0dfba0fa69ebf6c82e34750911b2e5a639 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 11 Aug 2021 11:59:19 +0200
Subject: [PATCH] Address '?:' issues in
 'libgomp.oacc-c-c++-common/mode-transitions.c'
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

[...]/libgomp.oacc-c-c++-common/mode-transitions.c: In function ‘t3’:
[...]/libgomp.oacc-c-c++-common/mode-transitions.c:127:43: warning: ‘?:’ using integer constants in boolean context, the expression will always evaluate to ‘true’ [-Wint-in-bool-context]
  127 | assert (arr[i] == ((i % 64) < 32) ? 1 : -1);
  |   ^

[...]/libgomp.oacc-c-c++-common/mode-transitions.c: In function ‘t9’:
[...]/libgomp.oacc-c-c++-common/mode-transitions.c:359:46: warning: ‘?:’ using integer constants in boolean context, the expression will always evaluate to ‘true’ [-Wint-in-bool-context]
  359 | assert (arr[i] == ((i % 3) == 0) ? 1 : 2);
  |  ^

..., and PR101862 "[C, C++] Potential '?:' diagnostic for always-true
expressions in boolean context".

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/mode-transitions.c: Address
	'?:' issues.
---
 .../testsuite/libgomp.oacc-c-c++-common/mode-transitions.c  | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/mode-transitions.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/mode-transitions.c
index 6c989abedf5..94dc9d05293 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/mode-transitions.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/mode-transitions.c
@@ -124,7 +124,7 @@ void t3()
 assert (n[i] == 2);
 
   for (i = 0; i < 1024; i++)
-assert (arr[i] == ((i % 64) < 32) ? 1 : -1);
+assert (arr[i] == (((i % 64) < 32) ? 1 : -1));
 }
 
 
@@ -356,7 +356,7 @@ void t9()
   }
 
   for (i = 0; i < 1024; i++)
-	assert (arr[i] == ((i % 3) == 0) ? 1 : 2);
+	assert (arr[i] == ((i % 3) == 0 ? 1 : 2));
 }
 }
 
@@ -960,7 +960,7 @@ void t23()
   }
 
   for (i = 0; i < 32; i++)
-assert (arr[i] == ((i % 2) != 0) ? i + 1 : i + 2);
+assert (arr[i] == (((i % 2) != 0) ? i + 1 : i + 2));
 }
 
 
-- 
2.30.2



Re: [PATCH 1/4] openacc: Middle-end worker-partitioning support

2021-08-16 Thread Thomas Schwinge
Hi!

On 2021-03-02T04:20:11-0800, Julian Brown  wrote:
> --- /dev/null
> +++ b/gcc/oacc-neuter-bcast.c

Allocated here:

> +static parallel_g *
> +omp_sese_find_par (bb_stmt_map_t *map, parallel_g *par, basic_block block)
> +{

> +   par = new parallel_g ([...]);

> +   par = new parallel_g (par, mask);

> +par = new parallel_g ([...]);

> +  return par;
> +}

> +static parallel_g *
> +omp_sese_discover_pars (bb_stmt_map_t *map)
> +{

> +  parallel_g *par = omp_sese_find_par (map, 0, block);

> +  return par;
> +}

..., and used here:

> +void
> +oacc_do_neutering (void)
> +{

> +  parallel_g *par = omp_sese_discover_pars (_stmt_map);
> +  populate_single_mode_bitmaps (par, [...]);

> +  find_ssa_names_to_propagate (par, [...]);

> +  find_partitioned_var_uses (par, [...]);
> +  find_local_vars_to_propagate (par, [...]);

> +  neuter_worker_single (par, [...]);

... but never released: a memory leak.

Pushed "Plug 'par' memory leak in
'gcc/omp-oacc-neuter-broadcast.cc:execute_omp_oacc_neuter_broadcast'" to
master branch in commit df98015fb7db2ed754a7c154669bcf8e1612, see
attached.
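The ownership rule this fix restores (whatever the discovery step allocates must be released once the pass is done with it) can be sketched in plain C. The `region` type, its `inner`/`next` links, and the helper names are hypothetical stand-ins for `parallel_g`. The sketch releases nested and sibling regions explicitly; whether `parallel_g`'s destructor does the same is not shown in the quoted code.

```c
#include <stdlib.h>

/* Hypothetical region tree: nested regions hang off 'inner',
   sibling regions are chained through 'next'.  */
struct region
{
  struct region *inner;
  struct region *next;
};

/* Number of regions currently allocated, to make leaks observable.  */
int live_regions = 0;

struct region *region_new (void)
{
  struct region *r = malloc (sizeof *r);
  if (r)
    {
      r->inner = NULL;
      r->next = NULL;
      live_regions++;
    }
  return r;
}

/* Release R, everything nested inside it, and all of its later
   siblings; the counterpart of the single 'delete par' at the end.  */
void region_free (struct region *r)
{
  while (r)
    {
      struct region *next = r->next;
      region_free (r->inner);
      free (r);
      live_regions--;
      r = next;
    }
}
```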


Regards
 Thomas


-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
München; limited liability company (GmbH); Managing Directors: Thomas 
Heurung, Frank Thürauf; Registered office: München; Register court: 
München, HRB 106955
From df98015fb7db2ed754a7c154669bcf8e1612 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 6 Aug 2021 15:34:25 +0200
Subject: [PATCH] Plug 'par' memory leak in
 'gcc/omp-oacc-neuter-broadcast.cc:execute_omp_oacc_neuter_broadcast'

Fix-up for recent commit e2a58ed6dc5293602d0d168475109caa81ad0f0d
"openacc: Middle-end worker-partitioning support".

	gcc/
	* omp-oacc-neuter-broadcast.cc
	(execute_omp_oacc_neuter_broadcast): Plug 'par' memory leak.
---
 gcc/omp-oacc-neuter-broadcast.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/omp-oacc-neuter-broadcast.cc b/gcc/omp-oacc-neuter-broadcast.cc
index d30867085c3..d48627a6940 100644
--- a/gcc/omp-oacc-neuter-broadcast.cc
+++ b/gcc/omp-oacc-neuter-broadcast.cc
@@ -1463,6 +1463,8 @@ execute_omp_oacc_neuter_broadcast ()
 gcc_checking_assert (!it);
   prop_set.release ();
 
+  delete par;
+
   /* This doesn't seem to make a difference.  */
   loops_state_clear (LOOP_CLOSED_SSA);
 
-- 
2.30.2



Re: [PATCH 1/4] openacc: Middle-end worker-partitioning support

2021-08-16 Thread Thomas Schwinge
Hi!

On 2021-03-02T04:20:11-0800, Julian Brown  wrote:
> --- /dev/null
> +++ b/gcc/oacc-neuter-bcast.c

Allocated here:

> +/* Sets of SSA_NAMES or VAR_DECLs to propagate.  */
> +typedef hash_set propagation_set;
> +
> +static void
> +find_ssa_names_to_propagate ([...],
> +  vec *prop_set)
> +{

> +   if (!(*prop_set)[def_bb->index])
> + (*prop_set)[def_bb->index] = new propagation_set;

> +   if (!(*prop_set)[def_bb->index])
> + (*prop_set)[def_bb->index] = new propagation_set;

..., and here:

> +static void
> +find_local_vars_to_propagate ([...],
> +   vec *prop_set)
> +{

> +   if (!(*prop_set)[block->index])
> + (*prop_set)[block->index] = new propagation_set;

..., and deallocated here:

> +static void
> +neuter_worker_single ([...],
> +   vec *prop_set,
> +   [...])
> +{

> +   propagation_set *ws_prop = (*prop_set)[block->index];
> +
> +   if (ws_prop)
> + {
> +   [...]
> +   delete ws_prop;
> +   (*prop_set)[block->index] = 0;
> + }

..., and defined here:

> +void
> +oacc_do_neutering (void)
> +{

> +  vec prop_set;
> +  prop_set.create (last_basic_block_for_fn (cfun));
> +
> +  for (unsigned i = 0; i < last_basic_block_for_fn (cfun); i++)
> +prop_set.quick_push (0);

I recently learned about 'safe_grow_cleared', which allows for
simplifying this loop.

> +  find_ssa_names_to_propagate ([...], _set);

> +  find_local_vars_to_propagate ([...], _set);

> +  neuter_worker_single ([...], _set, [...]);
> +
> +  prop_set.release ();

It seems appropriate to add a check that 'neuter_worker_single' has
indeed handled/'delete'd all these.
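A plain-C sketch of the same memory-management pattern: a zero-initialized slot array (the role `safe_grow_cleared` plays for the `vec`), a consumer that frees each slot and clears it (as `neuter_worker_single` does with `delete ws_prop`), and a final check that every slot is empty before the array itself is released. The element type and all function names are hypothetical, and the sketch assumes a platform where an all-bits-zero pointer is a null pointer, as is the case on mainstream targets.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-block payload standing in for a propagation_set.  */
struct prop_entry
{
  int dummy;
};

/* Consumer: takes ownership of every non-null slot, frees it and
   clears the slot, so nothing is left for the caller to release.  */
void consume_entries (struct prop_entry **slots, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (slots[i])
      {
        free (slots[i]);
        slots[i] = NULL;
      }
}

/* Driver: returns the number of slots still populated after the
   consumer ran (expected 0), or (size_t) -1 if allocation failed.  */
size_t run_blocks (size_t nblocks)
{
  /* Zero-filled slot array; no per-element "push 0" loop needed.  */
  struct prop_entry **slots = calloc (nblocks, sizeof *slots);
  if (!slots)
    return (size_t) -1;

  /* Populate every other block, as a producer pass might.  */
  for (size_t i = 0; i < nblocks; i += 2)
    slots[i] = malloc (sizeof **slots);

  consume_entries (slots, nblocks);

  /* Counterpart of the checking-assert loop before 'release'.  */
  size_t leftover = 0;
  for (size_t i = 0; i < nblocks; i++)
    if (slots[i])
      leftover++;
  assert (leftover == 0);

  free (slots);
  return leftover;
}
```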

Pushed "Clarify memory management for 'prop_set' in
'gcc/omp-oacc-neuter-broadcast.cc'" to master branch in
commit 7b9d99e615212c24cecae4202d8def9aa5e71809, see attached.


Regards
 Thomas


From 7b9d99e615212c24cecae4202d8def9aa5e71809 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 11 Aug 2021 22:31:55 +0200
Subject: [PATCH] Clarify memory management for 'prop_set' in
 'gcc/omp-oacc-neuter-broadcast.cc'

Clean-up for recent commit e2a58ed6dc5293602d0d168475109caa81ad0f0d
"openacc: Middle-end worker-partitioning support".

	gcc/
	* omp-oacc-neuter-broadcast.cc
	(execute_omp_oacc_neuter_broadcast): Clarify memory management for
	'prop_set'.
---
 gcc/omp-oacc-neuter-broadcast.cc | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/omp-oacc-neuter-broadcast.cc b/gcc/omp-oacc-neuter-broadcast.cc
index 9bde0aca10f..d30867085c3 100644
--- a/gcc/omp-oacc-neuter-broadcast.cc
+++ b/gcc/omp-oacc-neuter-broadcast.cc
@@ -1398,11 +1398,8 @@ execute_omp_oacc_neuter_broadcast ()
   FOR_ALL_BB_FN (bb, cfun)
 bb->aux = NULL;
 
-  vec prop_set;
-  prop_set.create (last_basic_block_for_fn (cfun));
-
-  for (int i = 0; i < last_basic_block_for_fn (cfun); i++)
-prop_set.quick_push (0);
+  vec prop_set (vNULL);
+  prop_set.safe_grow_cleared (last_basic_block_for_fn (cfun), true);
 
   find_ssa_names_to_propagate (par, mask, worker_single, vector_single,
 			   _set);
@@ -1461,6 +1458,9 @@ execute_omp_oacc_neuter_broadcast ()
 delete it.second;
   record_field_map.empty ();
 
+  /* These are supposed to have been 'delete'd by 'neuter_worker_single'.  */
+  for (auto it : prop_set)
+gcc_checking_assert (!it);
   prop_set.release ();
 
   /* This doesn't seem to make a difference.  */
-- 
2.30.2



Re: [PATCH 1/4] openacc: Middle-end worker-partitioning support

2021-08-16 Thread Thomas Schwinge
Hi!

On 2021-08-06T09:49:58+0100, Julian Brown  wrote:
> On Wed, 4 Aug 2021 15:13:30 +0200
> Thomas Schwinge  wrote:
>
>> 'oacc_do_neutering' is the 'execute' function of the pass, so that
>> means every time this executes, a fresh 'field_map' is set up, no
>> state persists across runs (assuming I'm understanding that
>> correctly).  Why don't we simply use standard (non-GC) memory
>> management for that?  "For convenience" shall be fine as an answer
>> ;-) -- but maybe instead of figuring out the right GC annotations,
>> changing the memory management will be easier?  (Or, of course, maybe
>> I completely misunderstood that?)
>
> I suspect you're right, and there's no need for this to be GC-allocated
> memory. If non-standard memory allocation will work out fine, we should

("non-GC", I suppose.)

> probably use that instead.

Pushed "Avoid 'GTY' use for 'gcc/omp-oacc-neuter-broadcast.cc:field_map'"
to master branch in commit 049eda8274b7394523238b17ab12c3e2889f253e, see
attached.
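The design change (replacing a GC-rooted global with state that is local to one execution of the pass and passed explicitly to the helpers) can be sketched in C as follows. The fixed-size table and all names are hypothetical; GCC itself uses `hash_map`.

```c
#include <string.h>

#define MAX_FIELDS 8

/* Hypothetical stand-in for field_map_t: variable name -> field index.  */
struct field_map
{
  const char *vars[MAX_FIELDS];
  int count;
};

/* The map is a parameter instead of being fetched from a global.  */
int install_field (const char *var, struct field_map *fields)
{
  if (fields->count == MAX_FIELDS)
    return -1;
  fields->vars[fields->count] = var;
  return fields->count++;
}

int lookup_field (const char *var, const struct field_map *fields)
{
  for (int i = 0; i < fields->count; i++)
    if (strcmp (fields->vars[i], var) == 0)
      return i;
  return -1;
}

/* One "execution": the map lives on the stack, so nothing persists
   across calls and no garbage-collector root is required.  Returns the
   number of fields installed during this run.  */
int run_once (void)
{
  struct field_map fields = { { 0 }, 0 };
  install_field ("a", &fields);
  install_field ("b", &fields);
  return fields.count;
}
```

Calling `run_once` repeatedly returns the same count each time, because every run starts from an empty map.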


Regards
 Thomas


From 049eda8274b7394523238b17ab12c3e2889f253e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 6 Aug 2021 12:09:12 +0200
Subject: [PATCH] Avoid 'GTY' use for
 'gcc/omp-oacc-neuter-broadcast.cc:field_map'

... and further simplify related things a bit.

Fix-up/clean-up for recent commit e2a58ed6dc5293602d0d168475109caa81ad0f0d
"openacc: Middle-end worker-partitioning support".

	gcc/
	* omp-oacc-neuter-broadcast.cc (field_map): Move variable into...
	(execute_omp_oacc_neuter_broadcast): ... here.
	(install_var_field, build_receiver_ref, build_sender_ref): Take
	'field_map_t *' parameter.  Adjust all users.
	(worker_single_copy, neuter_worker_single): Take a
	'record_field_map_t *' parameter.  Adjust all users.
---
 gcc/omp-oacc-neuter-broadcast.cc | 48 +---
 1 file changed, 26 insertions(+), 22 deletions(-)

diff --git a/gcc/omp-oacc-neuter-broadcast.cc b/gcc/omp-oacc-neuter-broadcast.cc
index f8555380451..9bde0aca10f 100644
--- a/gcc/omp-oacc-neuter-broadcast.cc
+++ b/gcc/omp-oacc-neuter-broadcast.cc
@@ -538,12 +538,9 @@ typedef hash_map field_map_t;
 
 typedef hash_map record_field_map_t;
 
-static GTY(()) record_field_map_t *field_map;
-
 static void
-install_var_field (tree var, tree record_type)
+install_var_field (tree var, tree record_type, field_map_t *fields)
 {
-  field_map_t *fields = *field_map->get (record_type);
   tree name;
   char tmp[20];
 
@@ -959,9 +956,8 @@ oacc_build_component_ref (tree obj, tree field)
 }
 
 static tree
-build_receiver_ref (tree record_type, tree var, tree receiver_decl)
+build_receiver_ref (tree var, tree receiver_decl, field_map_t *fields)
 {
-  field_map_t *fields = *field_map->get (record_type);
   tree x = build_simple_mem_ref (receiver_decl);
   tree field = *fields->get (var);
   TREE_THIS_NOTRAP (x) = 1;
@@ -970,9 +966,8 @@ build_receiver_ref (tree record_type, tree var, tree receiver_decl)
 }
 
 static tree
-build_sender_ref (tree record_type, tree var, tree sender_decl)
+build_sender_ref (tree var, tree sender_decl, field_map_t *fields)
 {
-  field_map_t *fields = *field_map->get (record_type);
   tree field = *fields->get (var);
   return oacc_build_component_ref (sender_decl, field);
 }
@@ -1010,7 +1005,7 @@ static void
 worker_single_copy (basic_block from, basic_block to,
 		hash_set *def_escapes_block,
 		hash_set *worker_partitioned_uses,
-		tree record_type)
+		tree record_type, record_field_map_t *record_field_map)
 {
   /* If we only have virtual defs, we'll have no record type, but we still want
  to emit single_copy_start and (particularly) single_copy_end to act as
@@ -1147,7 +1142,7 @@ worker_single_copy (basic_block from, basic_block to,
 	gcc_assert (TREE_CODE (var) == VAR_DECL);
 
   /* If we had no record type, we will have no fields map.  */
-  field_map_t **fields_p = field_map->get (record_type);
+  field_map_t **fields_p = record_field_map->get (record_type);
   field_map_t *fields = fields_p ? *fields_p : NULL;
 
   if (worker_partitioned_uses->contains (var)
@@ -1158,8 +1153,7 @@ worker_single_copy (basic_block from, basic_block to,
 
 	  /* Receive definition from shared memory block.  */
 
-	  tree receiver_ref = build_receiver_ref (record_type, var,
-		  receiver_decl);
+	  tree receiver_ref = build_receiver_ref (var, receiver_decl, fields);
 	  gassign *recv = gimple_build_assign (neutered_def,
 	   receiver_ref);
 	  gsi_insert_after (_gsi, recv, GSI_CONTINUE_LINKING);
@@ -1189,7 +1183,7 @@ worker_single_copy (basic_block from, basic_block to,
 
 	  /* Send definition to shared memory block.  */
 
-	  tree sender_ref = build_sender_ref (record_type, var, sender_decl);
+	  tree 

[patch] Fix regression in debug info for Ada with DWARF 5

2021-08-16 Thread Eric Botcazou
Hi,

add_scalar_info can directly generate a reference to an existing DIE for a 
scalar attribute, e.g. the upper bound of a VLA, but it does so only if this 
existing DIE has a location or is a constant:

  if (get_AT (decl_die, DW_AT_location)
  || get_AT (decl_die, DW_AT_data_member_location)
  || get_AT (decl_die, DW_AT_const_value))

Now, in DWARF 5, members of a structure that are bitfields no longer have a 
DW_AT_data_member_location but a DW_AT_data_bit_offset attribute instead, so 
the condition is bypassed.
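The effect of the one-line change can be modeled with a bit mask of attributes. The flag names and the two predicates are hypothetical; they only mirror the `get_AT` tests in `add_scalar_info` before and after the patch.

```c
#include <stdbool.h>

/* Hypothetical attribute flags for a DIE (not GCC's dw_die_ref).  */
enum die_attr
{
  HAS_LOCATION = 1 << 0,
  HAS_DATA_MEMBER_LOCATION = 1 << 1,
  HAS_DATA_BIT_OFFSET = 1 << 2,
  HAS_CONST_VALUE = 1 << 3
};

/* Condition before the patch.  */
bool can_reference_old (unsigned attrs)
{
  return (attrs & (HAS_LOCATION | HAS_DATA_MEMBER_LOCATION
                   | HAS_CONST_VALUE)) != 0;
}

/* Condition after the patch: a DWARF 5 bit-field member that carries
   only DW_AT_data_bit_offset is now accepted as well.  */
bool can_reference_new (unsigned attrs)
{
  return (attrs & (HAS_LOCATION | HAS_DATA_MEMBER_LOCATION
                   | HAS_DATA_BIT_OFFSET | HAS_CONST_VALUE)) != 0;
}
```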

Tested on x86-64/Linux, OK for mainline and 11 branch?


2021-08-16  Eric Botcazou  

* dwarf2out.c (add_scalar_info): Deal with DW_AT_data_bit_offset.

-- 
Eric Botcazou
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 4bcd3313fee..ba0a6d6ed60 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -21253,6 +21253,7 @@ add_scalar_info (dw_die_ref die, enum dwarf_attribute attr, tree value,
 	{
 	  if (get_AT (decl_die, DW_AT_location)
 		  || get_AT (decl_die, DW_AT_data_member_location)
+		  || get_AT (decl_die, DW_AT_data_bit_offset)
 		  || get_AT (decl_die, DW_AT_const_value))
 		{
 		  add_AT_die_ref (die, attr, decl_die);


Disable GNAT encodings by default

2021-08-16 Thread Eric Botcazou
Given the latest work in the compiler and debugger, we no longer need to use
most GNAT-specific encodings in the debug info generated for an Ada program,
so the attached patch disables them, except with -fgnat-encodings=all.
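Switching the tests from `!= DWARF_GNAT_ENCODINGS_MINIMAL` to `== DWARF_GNAT_ENCODINGS_ALL` only changes behavior at the intermediate level. A small C model over the three levels accepted by `-fgnat-encodings=` (`all`, `gdb`, `minimal`); the enumerator and function names are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three -fgnat-encodings= levels.  */
enum gnat_enc
{
  ENC_ALL,
  ENC_GDB,
  ENC_MINIMAL
};

/* Old test (e.g. in add_bound_info): use GNAT encodings unless the
   level is "minimal".  */
bool uses_gnat_encodings_old (enum gnat_enc e)
{
  return e != ENC_MINIMAL;
}

/* New test: use GNAT encodings only with -fgnat-encodings=all.  */
bool uses_gnat_encodings_new (enum gnat_enc e)
{
  return e == ENC_ALL;
}
```

With `all` and `minimal` both predicates agree; only `gdb` flips from using the encodings to emitting standard DWARF.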

Tested on x86-64/Linux, applied on the mainline as obvious.


2021-08-16  Eric Botcazou  

* dwarf2out.c (add_data_member_location_attribute): Use GNAT
encodings only when -fgnat-encodings=all is specified.
(add_bound_info): Likewise.
(add_byte_size_attribute): Likewise.
(gen_member_die): Likewise.

-- 
Eric Botcazou
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index b91a9b5abaa..ba0a6d6ed60 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -19915,22 +19915,23 @@ add_data_member_location_attribute (dw_die_ref die,
 {
   loc_descr = field_byte_offset (decl, ctx, );
 
-  /* If loc_descr is available then we know the field offset is dynamic.
-	 However, GDB does not handle dynamic field offsets very well at the
-	 moment.  */
-  if (loc_descr != NULL && gnat_encodings != DWARF_GNAT_ENCODINGS_MINIMAL)
+  if (!loc_descr)
+	;
+
+  /* If loc_descr is available, then we know the offset is dynamic.  */
+  else if (gnat_encodings == DWARF_GNAT_ENCODINGS_ALL)
 	{
 	  loc_descr = NULL;
 	  offset = 0;
 	}
 
-  /* Data member location evalutation starts with the base address on the
+  /* Data member location evaluation starts with the base address on the
 	 stack.  Compute the field offset and add it to this base address.  */
-  else if (loc_descr != NULL)
+  else
 	add_loc_descr (_descr, new_loc_descr (DW_OP_plus, 0, 0));
 }
 
-  if (! loc_descr)
+  if (!loc_descr)
 {
   /* While DW_AT_data_bit_offset has been added already in DWARF4,
 	 e.g. GDB only added support to it in November 2016.  For DWARF5
@@ -21389,12 +21391,9 @@ add_bound_info (dw_die_ref subrange_die, enum dwarf_attribute bound_attr,
 	/* FALLTHRU */
 
   default:
-	/* Because of the complex interaction there can be with other GNAT
-	   encodings, GDB isn't ready yet to handle proper DWARF description
-	   for self-referencial subrange bounds: let GNAT encodings do the
-	   magic in such a case.  */
+	/* Let GNAT encodings do the magic for self-referential bounds.  */
 	if (is_ada ()
-	&& gnat_encodings != DWARF_GNAT_ENCODINGS_MINIMAL
+	&& gnat_encodings == DWARF_GNAT_ENCODINGS_ALL
 	&& contains_placeholder_p (bound))
 	  return;
 
@@ -21566,7 +21565,7 @@ add_byte_size_attribute (dw_die_ref die, tree tree_node)
   /* Support for dynamically-sized objects was introduced in DWARF3.  */
   else if (TYPE_P (tree_node)
 	   && (dwarf_version >= 3 || !dwarf_strict)
-	   && gnat_encodings == DWARF_GNAT_ENCODINGS_MINIMAL)
+	   && gnat_encodings != DWARF_GNAT_ENCODINGS_ALL)
 {
   struct loc_descr_context ctx = {
 	const_cast (tree_node),	/* context_type */
@@ -25629,11 +25628,11 @@ gen_member_die (tree type, dw_die_ref context_die)
 	splice_child_die (context_die, child);
 	}
 
-  /* Do not generate standard DWARF for variant parts if we are generating
-	 the corresponding GNAT encodings: DIEs generated for both would
-	 conflict in our mappings.  */
+  /* Do not generate DWARF for variant parts if we are generating the
+	 corresponding GNAT encodings: DIEs generated for the two schemes
+	 would conflict in our mappings.  */
   else if (is_variant_part (member)
-	   && gnat_encodings == DWARF_GNAT_ENCODINGS_MINIMAL)
+	   && gnat_encodings != DWARF_GNAT_ENCODINGS_ALL)
 	{
 	  vlr_ctx.variant_part_offset = byte_position (member);
 	  gen_variant_part (member, _ctx, context_die);


Re: [PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaligned access

2021-08-16 Thread Kito Cheng via Gcc-patches
Hi Christoph:

Could you submit a v3 patch consisting of v1 with the overlap_op_by_pieces
field, the testcase from v2, and a few more comments describing the field?

And could you add an -mtune=ultra-size option, so that this can be tested
without changing other behavior?

Hi Palmer:

Are you OK with that?
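For readers unfamiliar with overlapping "by pieces" operations, here is a minimal C sketch, assuming a target where 4-byte unaligned accesses are cheap. A copy of any length from 4 to 8 bytes can be done with two 4-byte accesses, the second starting at `n - 4` and overlapping the first when `n < 8`. The function name is hypothetical, and the `memcpy` calls only stand in for single word-sized loads and stores; whether a compiler turns them into such instructions is target-dependent.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy N bytes, 4 <= N <= 8, using exactly two 4-byte accesses.
   The second access starts at N - 4 and overlaps the first when N < 8,
   so byte 3 (for N == 7) is simply written twice with the same value.  */
void copy_4_to_8 (unsigned char *dst, const unsigned char *src, size_t n)
{
  uint32_t head, tail;

  memcpy (&head, src, 4);          /* load bytes [0, 4) */
  memcpy (&tail, src + n - 4, 4);  /* load bytes [n - 4, n) */
  memcpy (dst, &head, 4);          /* store bytes [0, 4) */
  memcpy (dst + n - 4, &tail, 4);  /* store bytes [n - 4, n) */
}
```

A 7-byte copy thus needs two word accesses instead of a 4-, a 2- and a 1-byte access.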


On Sat, Aug 14, 2021 at 1:54 AM Christoph Müllner via Gcc-patches
 wrote:
>
> Ping.
>
> On Thu, Aug 5, 2021 at 11:11 AM Christoph Müllner  
> wrote:
> >
> > Ping.
> >
> > On Thu, Jul 29, 2021 at 9:36 PM Christoph Müllner  
> > wrote:
> > >
> > > On Thu, Jul 29, 2021 at 8:54 PM Palmer Dabbelt  wrote:
> > > >
> > > > On Tue, 27 Jul 2021 02:32:12 PDT (-0700), cmuell...@gcc.gnu.org wrote:
> > > > > Ok, so if I understand correctly Palmer and Andrew prefer
> > > > > overlap_op_by_pieces to be controlled
> > > > > by its own field in the riscv_tune_param struct and not by the field
> > > > > slow_unaligned_access in this struct
> > > > > (i.e. slow_unaligned_access==false is not enough to imply
> > > > > overlap_op_by_pieces==true).
> > > >
> > > > I guess, but I'm not really worried about this at that level of detail
> > > > right now.  It's not like the tune structures form any sort of external
> > > > interface we have to keep stable, we can do whatever we want with those
> > > > fields so I'd just aim for encoding the desired behavior as simply as
> > > > possible rather than trying to build something extensible.
> > > >
> > > > There are really two questions we need to answer: is this code actually
> > > > faster for the C906, and is this what the average user wants under -Os.
> > >
> > > I never mentioned -Os.
> > > My main goal is code compiled for -O2, -O3 or even -Ofast.
> > > And I want to execute code as fast as possible.
> > >
> > > Loading hot data from cache is faster when done by a single
> > > load-word instruction than by 4 load-byte instructions.
> > > Fewer instructions mean less pressure on the instruction cache.
> > > Fewer instructions mean less work for a CPU pipeline.
> > > Architectures that don't have a penalty for unaligned accesses
> > > therefore observe a performance benefit.
> > >
> > > What I understand from Andrew's email is that it is not that simple:
> > > implementations might have a penalty for overlapping accesses
> > > that is high enough to avoid them. I don't have the details for the C906,
> > > so I can't say if that's the case.
> > >
> > > > That first one is pretty easy: just running those simple code sequences
> > > > under a sweep of page offsets should be sufficient to determine if this
> > > > is always faster (in which case it's an easy yes), if it's always slower
> > > > (an easy no), or if there's some slow cases like page/cache line
> > > > crossing (in which case we'd need to think a bit).
> > > >
> > > > The second one is a bit trickier.  In the past we'd said these sort of
> > > > "actively misalign accesses to generate smaller code" sort of thing
> > > > isn't suitable for -Os (as most machines still have very slow unaligned
> > > > accesses) but is suitable for -Oz (don't remember if that ever ended up
> > > > in GCC, though).  That still seems like a reasonable decision, but if it
> > > > turns out that implementations with fast unaligned accesses become the
> > > > norm then it'd probably be worth revisiting it.  Not sure exactly how to
> > > > determine that tipping point, but I think we're a long way away from it
> > > > right now.
> > > >
> > > > IMO it's really just premature to try and design an encoding of the
> > > > tuning parameters until we have an idea of what they are, as we'll just
> > > > end up devolving down the path of trying to encode all possible hardware
> > > > and that's generally a huge waste of time.  Since there's no ABI here we
> > > > can refactor this however we want as new tunings show up.
> > >
> > > I guess you mean that there needs to be a clear benefit for a supported
> > > machine in GCC: either an obvious one (see below), one shown by
> > > measurement results, or one decided by the machine's maintainer
> > > (especially if the decision is a trade-off).
> > >
> > > >
> > > > > I don't have access to pipeline details that prove there are
> > > > > cases where this patch causes a performance penalty.
> > > > >
> > > > > So, I leave this here as a summary for someone who has enough
> > > > > information and interest to move this forward:
> > > > > * the original patch should be sufficient, but does not have tests:
> > > > >   https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575791.html
> > > > > * the tests can be taken from this patch:
> > > > >   https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575864.html
> > > > >   Note that there is a duplicated "sw" in builtins-overlap-6.c, which
> > > > >   should be a "sd".
> > > > >
> > > > > Thanks for the feedback!
> > > >
> > > > Cool.  Looks like the C906 is starting to show up in the real world, so
> > > > we should be able to find someone who has access to one and cares 

Re: [Patch] [OpenMP] Update omp-low.c's omp_runtime_api_call [PR101931]

2021-08-16 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 16, 2021 at 11:55:50AM +0200, Tobias Burnus wrote:
> [OpenMP] Update omp-low.c's omp_runtime_api_call [PR101931]
> 
> gcc/ChangeLog:
> 
>   PR middle-end/101931
>   * omp-low.c (omp_runtime_api_call): Update for routines
>   added in the meanwhile.

LGTM, thanks.

Jakub



[Patch] [OpenMP] Update omp-low.c's omp_runtime_api_call [PR101931]

2021-08-16 Thread Tobias Burnus

Compared to the PR, it also contains omp_alloc/omp_free which I missed
when creating the PR.

Hopefully, I haven't missed any other routine.

OK? (Once bootstrap+testing succeeded, but I do not expect issues.)
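The table extended by this patch is split into three sections by `NULL` entries, and each section admits a different set of name suffixes (none; `_`; `_` or `_8_`). A simplified C sketch of such a matcher follows, assuming the lookup strips the leading `omp_` from the function name before comparing it against the table entries. The abbreviated table and the function name are hypothetical, not the code in `omp-low.c`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, abbreviated table with the same three-section layout;
   NULL entries separate the sections.  */
static const char *const api_names[] = {
  /* Section 0: only omp_<name>.  */
  "target_alloc",
  NULL,
  /* Section 1: omp_<name> and omp_<name>_.  */
  "destroy_lock",
  NULL,
  /* Section 2: omp_<name>, omp_<name>_ and omp_<name>_8_.  */
  "get_ancestor_thread_num",
};

bool is_runtime_api_name (const char *name)
{
  if (strncmp (name, "omp_", 4) != 0)
    return false;
  const char *rest = name + 4;

  int section = 0;
  for (size_t i = 0; i < sizeof api_names / sizeof api_names[0]; i++)
    {
      if (api_names[i] == NULL)
        {
          section++;  /* crossed into the next, more permissive section */
          continue;
        }
      size_t len = strlen (api_names[i]);
      if (strncmp (rest, api_names[i], len) != 0)
        continue;
      const char *suffix = rest + len;
      if (suffix[0] == '\0')
        return true;
      if (section >= 1 && strcmp (suffix, "_") == 0)
        return true;
      if (section >= 2 && strcmp (suffix, "_8_") == 0)
        return true;
    }
  return false;
}
```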

Tobias

[OpenMP] Update omp-low.c's omp_runtime_api_call [PR101931]

gcc/ChangeLog:

	PR middle-end/101931
	* omp-low.c (omp_runtime_api_call): Update for routines
	added in the meanwhile.

 gcc/omp-low.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 22ba579d11c..bef99405c20 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3863,6 +3863,8 @@ omp_runtime_api_call (const_tree fndecl)
 {
   /* This array has 3 sections.  First omp_* calls that don't
 	 have any suffixes.  */
+  "omp_alloc",
+  "omp_free",
   "target_alloc",
   "target_associate_ptr",
   "target_disassociate_ptr",
@@ -3873,13 +3875,17 @@ omp_runtime_api_call (const_tree fndecl)
   NULL,
   /* Now omp_* calls that are available as omp_* and omp_*_.  */
   "capture_affinity",
+  "destroy_allocator",
   "destroy_lock",
   "destroy_nest_lock",
   "display_affinity",
+  "fulfill_event",
   "get_active_level",
   "get_affinity_format",
   "get_cancellation",
+  "get_default_allocator",
   "get_default_device",
+  "get_device_num",
   "get_dynamic",
   "get_initial_device",
   "get_level",
@@ -3895,6 +3901,7 @@ omp_runtime_api_call (const_tree fndecl)
   "get_partition_num_places",
   "get_place_num",
   "get_proc_bind",
+  "get_supported_active_levels",
   "get_team_num",
   "get_thread_limit",
   "get_thread_num",
@@ -3908,6 +3915,7 @@ omp_runtime_api_call (const_tree fndecl)
   "pause_resource",
   "pause_resource_all",
   "set_affinity_format",
+  "set_default_allocator",
   "set_lock",
   "set_nest_lock",
   "test_lock",
@@ -3916,7 +3924,9 @@ omp_runtime_api_call (const_tree fndecl)
   "unset_nest_lock",
   NULL,
   /* And finally calls available as omp_*, omp_*_ and omp_*_8_.  */
+  "display_env",
   "get_ancestor_thread_num",
+  "init_allocator",
   "get_partition_place_nums",
   "get_place_num_procs",
   "get_place_proc_ids",


Re: [PATCH] RISC-V: Allow unaligned accesses in cpymemsi expansion

2021-08-16 Thread Kito Cheng via Gcc-patches
Hi Christoph:

Generally LGTM, only 1 minor comment.

> @@ -3292,8 +3294,17 @@ riscv_expand_block_move (rtx dest, rtx src, rtx length)
>unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
>unsigned HOST_WIDE_INT factor, align;
>
> -  align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
> -  factor = BITS_PER_WORD / align;
> +  if (riscv_slow_unaligned_access_p)
> +   {
> + align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
> + factor = BITS_PER_WORD / align;
> +   }
> +  else
> +   {
> + /* Assume data to be aligned.  */
> + align = hwi_length * BITS_PER_UNIT;

Either the variable should be renamed, or align should just be set to
BITS_PER_WORD? E.g. if hwi_length = 15, then align = 15 * 8 = 120.

That still generates the right result, though, since mode_for_size still
returns DI for rv64 and SI for rv32.

> + factor = 1;

I would prefer to keep factor = BITS_PER_WORD / align; and pull it
outside the if/else, like this:

```c
  if (riscv_slow_unaligned_access_p)
    align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
  else
    align = BITS_PER_WORD;

  factor = BITS_PER_WORD / align;
```
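The arithmetic behind this suggestion can be checked with a small C model. The struct, the function, and its parameters are hypothetical stand-ins for `riscv_slow_unaligned_access_p`, `MEM_ALIGN` and `BITS_PER_WORD`; alignments are in bits and assumed non-zero. With this structure `align` never exceeds the word size, so `factor` is always at least 1. By contrast, `align = hwi_length * BITS_PER_UNIT` reaches 120 for a 15-byte copy, and `64 / 120` would be 0 in integer division.

```c
/* Hypothetical result of the alignment/factor computation.  */
struct move_params
{
  unsigned align;   /* access alignment in bits */
  unsigned factor;  /* accesses of size 'align' per word */
};

struct move_params block_move_params (int slow_unaligned,
                                      unsigned src_align,
                                      unsigned dest_align,
                                      unsigned bits_per_word)
{
  struct move_params p;

  if (slow_unaligned)
    {
      /* MIN (MIN (src, dest), BITS_PER_WORD).  */
      unsigned a = src_align < dest_align ? src_align : dest_align;
      p.align = a < bits_per_word ? a : bits_per_word;
    }
  else
    p.align = bits_per_word;

  /* Computed once, outside the if/else, for both cases.  */
  p.factor = bits_per_word / p.align;
  return p;
}
```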


RE: [wwwdocs] gcc-12/changes.html (GCN): >1 workers per gang

2021-08-16 Thread Thomas Schwinge
Hi!

On 2021-08-16T10:34:34+0200, "Stubbs, Andrew"  wrote:
>> In other words:  For gangs > #CUs or >1 gang per CU, the following patch
>> is needed:
>>[OG11] https://gcc.gnu.org/g:4dcd1e1f4e6b451aac44f919b8eb3ac49292b308
>>[email] https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550102.html
>>   "not suitable for mainline until the multiple-worker support is merged
>> there"
>>
>> @Andrew + @Julian:  Do you intend to commit it relatively soon?
>> Regarding the wwwdocs patch, I can hold off until that commit or reword
>> it to only cover the workers part.
>
> Were these not part of the patch set Thomas was working on?

That one is "amdgcn: Tune default OpenMP/OpenACC GPU utilization", which
indeed I'm planning to handle in the next few weeks.  So I suggest we
simply hold back Tobias' 'gcc-12/changes.html' patch until that time.


Regards
 Thomas


Re: [PATCH] i386: Add peephole for lea and zero extend [PR 101716]

2021-08-16 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 16, 2021 at 11:18 AM Hongyu Wang  wrote:
>
> > So, the question is if the combine pass really needs to zero-extend
> > with 0xfffe, the left shift << 1 guarantees zero in the LSB, so
> > 0x should be better and in line with canonical zero-extension
> > RTX.
>
> The shift mask is generated in simplify_shift_const_1:
>
> mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
>  int_result_mode);
> rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
> mask_rtx
>   = simplify_const_binary_operation (code, int_result_mode,
>  mask_rtx, count_rtx);
>
> Can we adjust the count for ashift if nonzero_bits overlaps it?
>
> > Also, ix86_decompose_address accepts ASHIFT RTX when ASHIFT is
> > embedded in the PLUS chain, but naked ASHIFT is rejected (c.f. the
> > call in ix86_legitimate_address_p) for some (historic?) reason. It
> > looks to me that this restriction is not necessary, since
> > ix86_legitimize_address can canonicalize ASHIFT RTXes without
> > problems. The attached patch that survives bootstrap and regtest can
> > help in your case.
>
> We have a split to transform ashift to mult, so I'm afraid it would not
> help with this issue.

If you want existing *lea to accept ASHIFT RTX, it uses
address_no_seg_operand predicate which uses address_operand predicate,
which calls ix86_legitimate_address_p, which ATM rejects ASHIFT RTXes.

Uros.
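The observation about the LSB can be checked in isolation: after a left shift by one, bit 0 is always zero. ANDing with a mask that has bit 0 cleared therefore gives the same result as ANDing with the full mask, which is just a zero-extension of the low 32 bits. The 32-bit masks below are illustrative constants chosen for this sketch, not the exact ones from the RTL discussed in the thread, and the function names are hypothetical.

```c
#include <stdint.h>

/* Low 32 bits of (x << 1), using a mask whose bit 0 is clear.  */
uint64_t shifted_masked_no_lsb (uint64_t x)
{
  return (x << 1) & UINT64_C (0xFFFFFFFE);
}

/* Low 32 bits of (x << 1), using the full zero-extension mask.  */
uint64_t shifted_masked_full (uint64_t x)
{
  return (x << 1) & UINT64_C (0xFFFFFFFF);
}

/* The same value written as a plain zero-extension from 32 bits.  */
uint64_t shifted_zero_extended (uint64_t x)
{
  return (uint64_t) (uint32_t) (x << 1);
}
```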


Re: [PATCH] [i386] Fix ICE.

2021-08-16 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 16, 2021 at 11:19 AM liuhongt  wrote:
>
> Hi:
>   avx512f_scalef2 only accepts register_operand for operands[1],
> so force it to reg in ldexp3.
>
>   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
>   Ok for trunk.
>
> gcc/ChangeLog:
>
> PR target/101930
> * config/i386/i386.md (ldexp3): Force operands[1] to
> reg.
>
> gcc/testsuite/ChangeLog:
>
> PR target/101930
> * gcc.target/i386/pr101930.c: New test.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.md  | 4 +---
>  gcc/testsuite/gcc.target/i386/pr101930.c | 9 +
>  2 files changed, 10 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101930.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 4a8e8fea290..41d85623ad6 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -17938,9 +17938,7 @@ (define_expand "ldexp3"
>if (TARGET_AVX512F && TARGET_SSE_MATH)
> {
>   rtx op2 = gen_reg_rtx (mode);
> -
> - if (!nonimmediate_operand (operands[1], mode))
> -   operands[1] = force_reg (mode, operands[1]);
> + operands[1] = force_reg (mode, operands[1]);
>
>   emit_insn (gen_floatsi2 (op2, operands[2]));
>   emit_insn (gen_avx512f_scalef2 (operands[0], operands[1], op2));
> diff --git a/gcc/testsuite/gcc.target/i386/pr101930.c 
> b/gcc/testsuite/gcc.target/i386/pr101930.c
> new file mode 100644
> index 000..7207dd18377
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101930.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -O2 -mfpmath=sse -ffast-math" } */
> +double a;
> +double
> +__attribute__((noipa))
> +foo (int b)
> +{
> +  return __builtin_ldexp (a, b);
> +}
> --
> 2.27.0
>


[PATCH] [i386] Fix ICE.

2021-08-16 Thread liuhongt via Gcc-patches
Hi:
  avx512f_scalef2 only accepts register_operand for operands[1],
so force it to reg in ldexp3.

  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
  Ok for trunk.

gcc/ChangeLog:

PR target/101930
* config/i386/i386.md (ldexp3): Force operands[1] to
reg.

gcc/testsuite/ChangeLog:

PR target/101930
* gcc.target/i386/pr101930.c: New test.
---
 gcc/config/i386/i386.md  | 4 +---
 gcc/testsuite/gcc.target/i386/pr101930.c | 9 +
 2 files changed, 10 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101930.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4a8e8fea290..41d85623ad6 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -17938,9 +17938,7 @@ (define_expand "ldexp<mode>3"
   if (TARGET_AVX512F && TARGET_SSE_MATH)
     {
       rtx op2 = gen_reg_rtx (<MODE>mode);
-
-      if (!nonimmediate_operand (operands[1], <MODE>mode))
-        operands[1] = force_reg (<MODE>mode, operands[1]);
+      operands[1] = force_reg (<MODE>mode, operands[1]);
 
       emit_insn (gen_floatsi<mode>2 (op2, operands[2]));
       emit_insn (gen_avx512f_scalef<mode>2 (operands[0], operands[1], op2));
diff --git a/gcc/testsuite/gcc.target/i386/pr101930.c b/gcc/testsuite/gcc.target/i386/pr101930.c
new file mode 100644
index 000..7207dd18377
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101930.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2 -mfpmath=sse -ffast-math" } */
+double a;
+double
+__attribute__((noipa))
+foo (int b)
+{
+  return __builtin_ldexp (a, b);
+}
-- 
2.27.0



Re: [PATCH] i386: Add peephole for lea and zero extend [PR 101716]

2021-08-16 Thread Hongyu Wang via Gcc-patches
> So, the question is if the combine pass really needs to zero-extend
> with 0xfffffffe, the left shift << 1 guarantees zero in the LSB, so
> 0xffffffff should be better and in line with canonical zero-extension
> RTX.
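A quick arithmetic check of the point above, written in plain C rather than RTL: for a 64-bit x, zero-extending the low 32 bits of x << 1 equals (x << 1) & 0xffffffff, and because the shift always clears bit 0, masking with 0xfffffffe gives the same value. The helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* zero_extend:DI (subreg:SI (ashift:DI x 1)) written in C.  */
static uint64_t
zext_of_shl1 (uint64_t x)
{
  return (uint64_t) (uint32_t) (x << 1);
}

/* The form combine tried: (and:DI (ashift:DI x 1) 0xfffffffe).  */
static uint64_t
and_fffffffe (uint64_t x)
{
  return (x << 1) & UINT64_C (0xfffffffe);
}

/* Canonical zero-extension mask: (and:DI (ashift:DI x 1) 0xffffffff).  */
static uint64_t
and_ffffffff (uint64_t x)
{
  return (x << 1) & UINT64_C (0xffffffff);
}

/* All three agree for every x because bit 0 of x << 1 is always 0.  */
static void
check_forms (uint64_t x)
{
  assert (zext_of_shl1 (x) == and_fffffffe (x));
  assert (and_fffffffe (x) == and_ffffffff (x));
}
```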

The shift mask is generated in simplify_shift_const_1:

mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
                         int_result_mode);
rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
mask_rtx
  = simplify_const_binary_operation (code, int_result_mode,
                                     mask_rtx, count_rtx);

Can we adjust the count for ashift if nonzero_bits overlaps it?

> Also, ix86_decompose_address accepts ASHIFT RTX when ASHIFT is
> embedded in the PLUS chain, but naked ASHIFT is rejected (c.f. the
> call in ix86_legitimate_address_p) for some (historic?) reason. It
> looks to me that this restriction is not necessary, since
> ix86_legitimize_address can canonicalize ASHIFT RTXes without
> problems. The attached patch that survives bootstrap and regtest can
> help in your case.

We have a split that transforms ashift into mult, so I'm afraid it won't
help with this issue.

On Mon, Aug 16, 2021 at 4:12 PM Uros Bizjak via Gcc-patches  wrote:

>
> On Fri, Aug 13, 2021 at 9:21 AM Uros Bizjak  wrote:
> >
> > On Fri, Aug 13, 2021 at 2:48 AM Hongyu Wang  wrote:
> > >
> > > Hi,
> > >
> > > For lea + zero_extendsidi insns, if dest of lea and src of zext are the
> > > same, combine them with single leal under 64bit target since 32bit
> > > register will be automatically zero-extended.
> > >
> > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > > Ok for master?
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/101716
> > > * config/i386/i386.md (*lea_zext): New define_insn.
> > > (define_peephole2): New peephole2 to combine zero_extend
> > > with lea.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR target/101716
> > > * gcc.target/i386/pr101716.c: New test.
> >
> > This form should be covered by ix86_decompose_address via
> > address_no_seg_operand predicate. Combine creates:
> >
> > Trying 6 -> 7:
> >6: {r86:DI=r87:DI<<0x1;clobber flags:CC;}
> >  REG_DEAD r87:DI
> >  REG_UNUSED flags:CC
> >7: r85:DI=zero_extend(r86:DI#0)
> >  REG_DEAD r86:DI
> > Failed to match this instruction:
> > (set (reg:DI 85)
> >(and:DI (ashift:DI (reg:DI 87)
> >(const_int 1 [0x1]))
> >(const_int 4294967294 [0xfffffffe])))
> >
> > which does not fit:
> >
> >   else if (GET_CODE (addr) == AND
> >&& const_32bit_mask (XEXP (addr, 1), DImode))
> >
> > After reload, we lose SUBREG, so REE does not trigger on:
> >
> > (insn 17 3 7 2 (set (reg:DI 0 ax [86])
> >(mult:DI (reg:DI 5 di [87])
> >(const_int 2 [0x2]))) "pr101716.c":4:13 204 {*leadi}
> > (nil))
> > (insn 7 17 13 2 (set (reg:DI 0 ax [85])
> >(zero_extend:DI (reg:SI 0 ax [86]))) "pr101716.c":4:19 136
> > {*zero_extendsidi2}
> > (nil))
> >
> > So, the question is if the combine pass really needs to zero-extend
> > with 0xfffffffe, the left shift << 1 guarantees zero in the LSB, so
> > 0xffffffff should be better and in line with canonical zero-extension
> > RTX.
>
> Also, ix86_decompose_address accepts ASHIFT RTX when ASHIFT is
> embedded in the PLUS chain, but naked ASHIFT is rejected (c.f. the
> call in ix86_legitimate_address_p) for some (historic?) reason. It
> looks to me that this restriction is not necessary, since
> ix86_legitimize_address can canonicalize ASHIFT RTXes without
> problems. The attached patch that survives bootstrap and regtest can
> help in your case.
>
> Uros.


[PATCH] Fix incorrect computation in fill_always_executed_in_1

2021-08-16 Thread Xiong Hu Luo via Gcc-patches
It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
nested loops.  inn_loop is updated to the inner loop, so it needs to be
restored when exiting from the innermost loop.  With this patch, the store
instruction in the outer loop can also be moved out of the outer loop by
store motion.  Any comments?  Thanks.

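To make the intended effect concrete, here is a hand-written C approximation of what store motion is expected to do to the new test's `foo` once both stores are recognized as always executed: `x->j` and `x->k` live in locals across the loops that update them and are written back once. This is only a source-level sketch under those assumptions, not the GIMPLE that lim2 emits (GCC may place the loads and stores differently); `foo_after_sm` is a hypothetical name. The rewrite is valid because `x->j` and `x->k` are distinct fields and cannot alias.

```c
#include <assert.h>

struct X { int i; int j; int k; };

/* Original loop nest from ssa-lim-19.c.  */
static void
foo (struct X *x, int n, int l)
{
  for (int j = 0; j < l; j++)
    {
      for (int i = 0; i < n; ++i)
        {
          int *p = &x->j;
          int tem = *p;
          x->j += tem * i;
        }
      int *r = &x->k;
      int tem2 = *r;
      x->k += tem2 * j;
    }
}

/* Source-level approximation after store motion: each location is
   loaded once before the loop that updates it and stored once after.  */
static void
foo_after_sm (struct X *x, int n, int l)
{
  if (l > 0)
    {
      int k = x->k;               /* x->k moved out of the outer loop */
      for (int j = 0; j < l; j++)
        {
          if (n > 0)
            {
              int jv = x->j;      /* x->j moved out of the inner loop */
              for (int i = 0; i < n; ++i)
                jv += jv * i;
              x->j = jv;
            }
          k += k * j;
        }
      x->k = k;
    }
}
```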
gcc/ChangeLog:

* tree-ssa-loop-im.c (fill_always_executed_in_1): Restore
inn_loop when exiting from innermost loop.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-lim-19.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++
 gcc/tree-ssa-loop-im.c |  6 +-
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
new file mode 100644
index 000..097a5ee4a4b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
@@ -0,0 +1,24 @@
+/* PR/101293 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+struct X { int i; int j; int k;};
+
+void foo(struct X *x, int n, int l)
+{
+  for (int j = 0; j < l; j++)
+    {
+      for (int i = 0; i < n; ++i)
+        {
+          int *p = &x->j;
+          int tem = *p;
+          x->j += tem * i;
+        }
+      int *r = &x->k;
+      int tem2 = *r;
+      x->k += tem2 * j;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } } */
+
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index b24bc64f2a7..5ca4738b20e 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
  if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
last = bb;
 
+ if (inn_loop != loop
+ && flow_loop_nested_p (bb->loop_father, inn_loop))
+   inn_loop = bb->loop_father;
+
  if (bitmap_bit_p (contains_call, bb->index))
break;
 
@@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
 
  if (bb->loop_father->header == bb)
{
- if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+ if (!dominated_by_p (CDI_DOMINATORS, bb->loop_father->latch, bb))
break;
 
  /* In a loop that is always entered we may proceed anyway.
-- 
2.27.0.90.geebb51ba8c



RE: [wwwdocs] gcc-12/changes.html (GCN): >1 workers per gang

2021-08-16 Thread Stubbs, Andrew
> In other words:  For gangs > #CUs or >1 gang per CU, the following patch
> is needed:
>[OG11] https://gcc.gnu.org/g:4dcd1e1f4e6b451aac44f919b8eb3ac49292b308
>[email] https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550102.html
>   "not suitable for mainline until the multiple-worker support is merged
> there"
> 
> @Andrew + @Julian:  Do you intend to commit it relatively soon?
> Regarding the wwwdocs patch, I can hold off until that commit or reword
> it to only cover the workers part.

Were these not part of the patch set Thomas was working on?

Andrew


[PATCH] Speed up jump table switch detection.

2021-08-16 Thread Martin Liška

Hi.

As mentioned in the PR, this patch significantly speeds up jump table detection
in switch lowering.

The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
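The core of the speedup is that, for a fixed end index i, the number of comparisons for clusters j..i-1 is computed once (for j = 0) and then decremented as j advances, instead of being recomputed inside can_be_handled for every (j, i) pair. The simplified C model below illustrates that bookkeeping; `model_cluster` and the function names are hypothetical stand-ins for GCC's `simple_cluster` machinery. For demonstration only, the check function compares the running value against a full recomputation at each step, so it is itself quadratic; it merely shows that the two agree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for simple_cluster: a single case value or
   a case range.  */
struct model_cluster
{
  bool range_p;
};

/* Mirrors the get_comparison_count helper added by the patch.  */
static unsigned
comparison_count_of (const struct model_cluster *c)
{
  return c->range_p ? 2 : 1;
}

/* Old approach: recompute the count for clusters[start..end] on every
   query, costing O(end - start) per (j, i) pair.  */
static unsigned long
naive_count (const struct model_cluster *clusters, unsigned start,
             unsigned end)
{
  unsigned long count = 0;
  for (unsigned k = start; k <= end; k++)
    count += comparison_count_of (&clusters[k]);
  return count;
}

/* New approach for one end index I (I >= 1): start from the count of
   clusters[0..i-1] and subtract cluster J once it has been processed,
   so COUNT always covers clusters[j..i-1].  Returns how many steps
   agreed with naive_count.  */
static unsigned
check_running_counts (const struct model_cluster *clusters, unsigned i)
{
  assert (i >= 1);
  unsigned long count = naive_count (clusters, 0, i - 1);
  unsigned matches = 0;
  for (unsigned j = 0; j < i; j++)
    {
      if (count == naive_count (clusters, j, i - 1))
        matches++;
      count -= comparison_count_of (&clusters[j]);
    }
  assert (count == 0);   /* same invariant as the patch's checking assert */
  return matches;
}
```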

PR tree-optimization/100393

gcc/ChangeLog:

* tree-switch-conversion.c (group_cluster::dump): Use
  get_comparison_count.
(jump_table_cluster::find_jump_tables): Pre-compute the number of
comparisons and then decrement it.  Also cache max_ratio.
(jump_table_cluster::can_be_handled): Change signature.
* tree-switch-conversion.h (get_comparison_count): New.
---
 gcc/tree-switch-conversion.c | 42 
 gcc/tree-switch-conversion.h | 14 ++--
 2 files changed, 35 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c
index 294b5457008..244cf4be010 100644
--- a/gcc/tree-switch-conversion.c
+++ b/gcc/tree-switch-conversion.c
@@ -1091,7 +1091,7 @@ group_cluster::dump (FILE *f, bool details)
   for (unsigned i = 0; i < m_cases.length (); i++)
 {
       simple_cluster *sc = static_cast<simple_cluster *> (m_cases[i]);
-  comparison_count += sc->m_range_p ? 2 : 1;
+  comparison_count += sc->get_comparison_count ();
 }
 
   unsigned HOST_WIDE_INT range = get_range (get_low (), get_high ());
@@ -1186,11 +1186,24 @@ jump_table_cluster::find_jump_tables (vec<cluster *> &clusters)
 
   min.quick_push (min_cluster_item (0, 0, 0));
 
+  unsigned HOST_WIDE_INT max_ratio
+    = (optimize_insn_for_size_p ()
+       ? param_jump_table_max_growth_ratio_for_size
+       : param_jump_table_max_growth_ratio_for_speed);
+
   for (unsigned i = 1; i <= l; i++)
 {
   /* Set minimal # of clusters with i-th item to infinite.  */
   min.quick_push (min_cluster_item (INT_MAX, INT_MAX, INT_MAX));
 
+      /* Pre-calculate number of comparisons for the clusters.  */
+      HOST_WIDE_INT comparison_count = 0;
+      for (unsigned k = 0; k <= i - 1; k++)
+        {
+          simple_cluster *sc = static_cast<simple_cluster *> (clusters[k]);
+          comparison_count += sc->get_comparison_count ();
+        }
+
   for (unsigned j = 0; j < i; j++)
{
  unsigned HOST_WIDE_INT s = min[j].m_non_jt_cases;
@@ -1201,10 +1214,15 @@ jump_table_cluster::find_jump_tables (vec<cluster *> &clusters)
  if ((min[j].m_count + 1 < min[i].m_count
   || (min[j].m_count + 1 == min[i].m_count
   && s < min[i].m_non_jt_cases))
- && can_be_handled (clusters, j, i - 1))
+              && can_be_handled (clusters, j, i - 1, max_ratio,
+                                 comparison_count))
             min[i] = min_cluster_item (min[j].m_count + 1, j, s);
+
+          simple_cluster *sc = static_cast<simple_cluster *> (clusters[j]);
+          comparison_count -= sc->get_comparison_count ();
         }
 
+      gcc_checking_assert (comparison_count == 0);
       gcc_checking_assert (min[i].m_count != INT_MAX);
 }
 
@@ -1242,7 +1260,9 @@ jump_table_cluster::find_jump_tables (vec<cluster *> &clusters)
 
 bool
 jump_table_cluster::can_be_handled (const vec<cluster *> &clusters,
-                                    unsigned start, unsigned end)
+                                    unsigned start, unsigned end,
+                                    unsigned HOST_WIDE_INT max_ratio,
+                                    unsigned HOST_WIDE_INT comparison_count)
 {
   /* If the switch is relatively small such that the cost of one
  indirect jump on the target are higher than the cost of a
@@ -1261,10 +1281,6 @@ jump_table_cluster::can_be_handled (const vec<cluster *> &clusters,
   if (start == end)
 return true;
 
-  unsigned HOST_WIDE_INT max_ratio
-    = (optimize_insn_for_size_p ()
-       ? param_jump_table_max_growth_ratio_for_size
-       : param_jump_table_max_growth_ratio_for_speed);
   unsigned HOST_WIDE_INT range = get_range (clusters[start]->get_low (),
clusters[end]->get_high ());
   /* Check overflow.  */
@@ -1278,18 +1294,6 @@ jump_table_cluster::can_be_handled (const vec<cluster *> &clusters,
   if (lhs < range)
 return false;
 
-  /* First make quick guess as each cluster
-     can add at maximum 2 to the comparison_count.  */
-  if (lhs > 2 * max_ratio * (end - start + 1))
-    return false;
-
-  unsigned HOST_WIDE_INT comparison_count = 0;
-  for (unsigned i = start; i <= end; i++)
-    {
-      simple_cluster *sc = static_cast<simple_cluster *> (clusters[i]);
-      comparison_count += sc->m_range_p ? 2 : 1;
-    }
-
   return lhs <= max_ratio * comparison_count;
 }
 
diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
index d76f19b57f6..a375e52636e 100644
--- a/gcc/tree-switch-conversion.h
+++ b/gcc/tree-switch-conversion.h
@@ -180,6 +180,13 @@ public:
     return tree_int_cst_equal (get_low (), get_high ());
   }
 
+  /* Return number of comparisons needed for the case.  */
+  unsigned
+  get_comparison_count ()
+  {
+    return m_range_p ? 2 : 1;
+  }
+
   /* Low value of the case.  */
   tree m_low;
 
@@ -267,9 +274,12 @@ 

Re: [ping] Re-unify 'omp_build_component_ref' and 'oacc_build_component_ref'

2021-08-16 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 16, 2021 at 10:08:42AM +0200, Thomas Schwinge wrote:
> --- a/gcc/omp-general.c
> +++ b/gcc/omp-general.c
> @@ -2815,4 +2815,25 @@ oacc_get_ifn_dim_arg (const gimple *stmt)
>return (int) axis;
>  }
>  
> +/* Build COMPONENT_REF and set TREE_THIS_VOLATILE and TREE_READONLY on it
> +   as appropriate.  */
> +
> +tree
> +omp_build_component_ref (tree obj, tree field)
> +{
> +  tree field_type = TREE_TYPE (field);
> +  tree obj_type = TREE_TYPE (obj);
> +  if (!ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (obj_type)))
> +    field_type
> +      = build_qualified_type (field_type,
> +                              KEEP_QUAL_ADDR_SPACE (TYPE_QUALS (obj_type)));

Are you sure this can't trigger?
Say
extern int __seg_fs a;

void
foo (void)
{
  #pragma omp parallel private (a)
  a = 2;
}
I think keeping the qual addr space here is the wrong thing to do;
it should keep the other quals and clear the address space instead,
since the whole struct is going to be in the generic address space, isn't it?

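To make the qualifier question concrete, here is a small C model of packing an address space into the qualifier bits alongside const/volatile, in the spirit of GCC's KEEP_QUAL_ADDR_SPACE and CLEAR_QUAL_ADDR_SPACE macros. The MODEL_* macros and their bit layout are assumptions made for illustration, not copied from GCC headers.

```c
#include <assert.h>

/* Hypothetical qualifier encoding: low bits for const/volatile,
   bits 8..15 for the address-space number.  */
#define MODEL_QUAL_CONST     0x1
#define MODEL_QUAL_VOLATILE  0x2
#define MODEL_ENCODE_AS(num) (((num) & 0xFF) << 8)
#define MODEL_DECODE_AS(q)   (((q) >> 8) & 0xFF)
#define MODEL_KEEP_AS(q)     ((q) & 0xFF00)    /* address space only */
#define MODEL_CLEAR_AS(q)    ((q) & ~0xFF00)   /* everything else */

/* As posted: the field type receives only the object's address space.  */
static int
quals_as_posted (int obj_quals)
{
  return MODEL_KEEP_AS (obj_quals);
}

/* As suggested in review: keep the other qualifiers but leave the
   COMPONENT_REF in the generic address space (0).  */
static int
quals_as_suggested (int obj_quals)
{
  return MODEL_CLEAR_AS (obj_quals);
}
```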
> +
> +  tree ret = build3 (COMPONENT_REF, field_type, obj, field, NULL);
> +  if (TREE_THIS_VOLATILE (field))
> +    TREE_THIS_VOLATILE (ret) |= 1;
> +  if (TREE_READONLY (field))
> +    TREE_READONLY (ret) |= 1;

When touching these two, shouldn't it be better written as
= 1; instead of |= 1; ?  For a bitfield...

Jakub



Re: [PATCH] i386: Add peephole for lea and zero extend [PR 101716]

2021-08-16 Thread Uros Bizjak via Gcc-patches
On Fri, Aug 13, 2021 at 9:21 AM Uros Bizjak  wrote:
>
> On Fri, Aug 13, 2021 at 2:48 AM Hongyu Wang  wrote:
> >
> > Hi,
> >
> > For lea + zero_extendsidi insns, if dest of lea and src of zext are the
> > same, combine them with single leal under 64bit target since 32bit
> > register will be automatically zero-extended.
> >
> > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > Ok for master?
> >
> > gcc/ChangeLog:
> >
> > PR target/101716
> > * config/i386/i386.md (*lea_zext): New define_insn.
> > (define_peephole2): New peephole2 to combine zero_extend
> > with lea.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/101716
> > * gcc.target/i386/pr101716.c: New test.
>
> This form should be covered by ix86_decompose_address via
> address_no_seg_operand predicate. Combine creates:
>
> Trying 6 -> 7:
>6: {r86:DI=r87:DI<<0x1;clobber flags:CC;}
>  REG_DEAD r87:DI
>  REG_UNUSED flags:CC
>7: r85:DI=zero_extend(r86:DI#0)
>  REG_DEAD r86:DI
> Failed to match this instruction:
> (set (reg:DI 85)
>(and:DI (ashift:DI (reg:DI 87)
>(const_int 1 [0x1]))
>(const_int 4294967294 [0xfffffffe])))
>
> which does not fit:
>
>   else if (GET_CODE (addr) == AND
>&& const_32bit_mask (XEXP (addr, 1), DImode))
>
> After reload, we lose SUBREG, so REE does not trigger on:
>
> (insn 17 3 7 2 (set (reg:DI 0 ax [86])
>(mult:DI (reg:DI 5 di [87])
>(const_int 2 [0x2]))) "pr101716.c":4:13 204 {*leadi}
> (nil))
> (insn 7 17 13 2 (set (reg:DI 0 ax [85])
>(zero_extend:DI (reg:SI 0 ax [86]))) "pr101716.c":4:19 136
> {*zero_extendsidi2}
> (nil))
>
> So, the question is if the combine pass really needs to zero-extend
> with 0xfffffffe, the left shift << 1 guarantees zero in the LSB, so
> 0xffffffff should be better and in line with canonical zero-extension
> RTX.

Also, ix86_decompose_address accepts ASHIFT RTX when ASHIFT is
embedded in the PLUS chain, but naked ASHIFT is rejected (c.f. the
call in ix86_legitimate_address_p) for some (historic?) reason. It
looks to me that this restriction is not necessary, since
ix86_legitimize_address can canonicalize ASHIFT RTXes without
problems. The attached patch that survives bootstrap and regtest can
help in your case.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4d4ab6a03d6..9395716dd60 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10018,8 +10018,7 @@ ix86_live_on_entry (bitmap regs)
 
 /* Extract the parts of an RTL expression that is a valid memory address
for an instruction.  Return 0 if the structure of the address is
-   grossly off.  Return -1 if the address contains ASHIFT, so it is not
-   strictly valid, but still used for computing length of lea instruction.  */
+   grossly off.  */
 
 int
 ix86_decompose_address (rtx addr, struct ix86_address *out)
@@ -10029,7 +10028,6 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   HOST_WIDE_INT scale = 1;
   rtx scale_rtx = NULL_RTX;
   rtx tmp;
-  int retval = 1;
   addr_space_t seg = ADDR_SPACE_GENERIC;
 
   /* Allow zero-extended SImode addresses,
@@ -10179,7 +10177,6 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   if ((unsigned HOST_WIDE_INT) scale > 3)
return 0;
   scale = 1 << scale;
-  retval = -1;
 }
   else
 disp = addr;   /* displacement */
@@ -10252,7 +10249,7 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   out->scale = scale;
   out->seg = seg;
 
-  return retval;
+  return 1;
 }
 
 /* Return cost of the memory address x.
@@ -10765,7 +10762,7 @@ ix86_legitimate_address_p (machine_mode, rtx addr, bool strict)
   HOST_WIDE_INT scale;
   addr_space_t seg;
 
-  if (ix86_decompose_address (addr, &parts) <= 0)
+  if (ix86_decompose_address (addr, &parts) == 0)
 /* Decomposition failed.  */
 return false;
 


[ping] Re-unify 'omp_build_component_ref' and 'oacc_build_component_ref'

2021-08-16 Thread Thomas Schwinge
Hi!

Ping.


On 2021-08-09T16:16:51+0200, I wrote:
> [from internal]
>
>
> Hi!
>
> This concerns a class of ICEs seen as of og10 branch with the
> "openacc: Middle-end worker-partitioning support" and "amdgcn:
> Enable OpenACC worker partitioning for AMD GCN" changes applied:
>
> On 2020-06-06T16:07:36+0100, Kwok Cheung Yeung  wrote:
>> On 01/06/2020 8:48 pm, Kwok Cheung Yeung wrote:
>>> On 21/05/2020 10:23 pm, Kwok Cheung Yeung wrote:
 These all have the same failure mode:

 during RTL pass: expand
 [...]/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90: In 
 function 'MAIN__._omp_fn.1':
 [...]/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90:86: 
 internal compiler error: in convert_memory_address_addr_space_1, at 
 explow.c:302
 0xc29f20 convert_memory_address_addr_space_1(scalar_int_mode, rtx_def*, 
 unsigned char, bool, bool)
  [...]/gcc/explow.c:302
 0xc29f57 convert_memory_address_addr_space(scalar_int_mode, rtx_def*, 
 unsigned char)
  [...]/gcc/explow.c:404
 [...]
>
 This occurs if the -ftree-slp-vectorize flag is specified (default at -O3).
>
>>> The problematic bit of Gimple code is this:
>>>
>>>.oacc_worker_o.44._120 = gangs_min_472;
>>>.oacc_worker_o.44._122 = workers_min_473;
>>>.oacc_worker_o.44._124 = vectors_min_474;
>>>.oacc_worker_o.44._126 = gangs_max_475;
>>>.oacc_worker_o.44._128 = workers_max_476;
>>>.oacc_worker_o.44._130 = vectors_max_477;
>>>.oacc_worker_o.44._132 = 0;
>>>
>>> With SLP vectorization enabled, it becomes this:
>>>
>>>_40 = {gangs_min_472, workers_min_473, vectors_min_474, gangs_max_475};
>>>...
>>>MEM  [(int *)&.oacc_worker_o.44] = _40;
>>>.oacc_worker_o.44._128 = workers_max_476;
>>>.oacc_worker_o.44._130 = vectors_max_477;
>>>.oacc_worker_o.44._132 = 0;
>>>
>>> The optimization is trying to transform 4 separate assignments into a single
>>> memory operation. The trouble is that _worker_o is an SImode pointer in
>>> AS4 (LDS), while the memory expression appears to be in the default memory
>>> space. The 'to' expression of the assignment is:
>>>
>>>   >>  type >>  type >>  size 
>>>  unit-size 
>>>  align:32 warn_if_not_align:0 symtab:0 alias-set 1 
>>> canonical-type 0x773195e8 precision:32 min >> -2147483648> max 
>>>  pointer_to_this  
>>> reference_to_this >
>>>  TI
>>>  size 
>>>  unit-size 
>>>  align:128 warn_if_not_align:0 symtab:0 alias-set 1 
>>> structural-equality nunits:4
>>>  pointer_to_this >
>>>
>>>  arg:0 >>  type >> 0x773195e8 int>
>>>  public unsigned DI
>>>  size 
>>>  unit-size 
>>>  align:64 warn_if_not_align:0 symtab:0 alias-set 2 
>>> structural-equality>
>>>  constant
>>>  arg:0 >> 0x773eb888 .oacc_ws_data_s.21 address-space-4>
>>>  addressable used static ignored BLK 
>>> [...]/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90:86:0
>>>
>>>  size 
>>>  unit-size 
>>>  align:128 warn_if_not_align:0
>>>  (mem/c:BLK (symbol_ref:SI (".oacc_worker_o.44.14") [flags 0x2] 
>>> ) [9 .oacc_worker_o.44+0 S28 
>>> A128 AS4])>>
>>>  arg:1  
>>> constant 0>>
>>>
>>> In convert_memory_address_addr_space_1:
>>>
>>> #ifndef POINTERS_EXTEND_UNSIGNED
>>>gcc_assert (GET_MODE (x) == to_mode || GET_MODE (x) == VOIDmode);
>>>return x;
>>> #else /* defined(POINTERS_EXTEND_UNSIGNED) */
>>>
>>> POINTERS_EXTEND_UNSIGNED is not defined, so it hits the assert. The expected
>>> to_mode is DImode, but x is SImode, so the assert fires.
>
>> I now have a fix for this.
>>
>>  >MEM  [(int *)&.oacc_worker_o.44] = _40;
>>
>> The ICE occurs because the SLP vectorization pass creates the new statement
>> using the type of the expression '&.oacc_worker_o.44', which is a pointer to a
>> component ref in the default address space. The expand pass gets confused
>> because it is handed an SImode pointer (for LDS) when it is expecting a DImode
>> pointer (for flat/global space).
>>
>> The underlying problem is that although .oacc_worker_o is in the correct address
>> space, the component ref .oacc_worker_o is not. I fixed this by propagating the
>> address space of .oacc_worker_o when the component ref is created.
>
>>  static tree
>>  oacc_build_component_ref (tree obj, tree field)
>>  {
>> -  tree ret = build3 (COMPONENT_REF, TREE_TYPE (field), obj, field, NULL);
>> +  tree field_type = TREE_TYPE (field);
>> +  tree obj_type = TREE_TYPE (obj);
>> +  if (!ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (obj_type)))
>> +field_type = build_qualified_type
>> + (field_type,
>> +  KEEP_QUAL_ADDR_SPACE (TYPE_QUALS (obj_type)));
>> +
>> +  tree ret = build3 (COMPONENT_REF, field_type, obj, field, NULL);
>>if 

Re: [patch][version 6] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-08-16 Thread Richard Biener via Gcc-patches
On Thu, 12 Aug 2021, Qing Zhao wrote:

> Hi, Richard,
> 
> For the RTL expansion of calls to .DEFERRED_INIT, I changed my code per your 
> suggestions as follows:
> 
> ==
> #define INIT_PATTERN_VALUE  0xFE
> static void
> expand_DEFERRED_INIT (internal_fn, gcall *stmt)
> {
>   tree lhs = gimple_call_lhs (stmt);
>   tree var_size = gimple_call_arg (stmt, 0);
>   enum auto_init_type init_type
>     = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
>   bool is_vla = (bool) TREE_INT_CST_LOW (gimple_call_arg (stmt, 2));
> 
>   tree var_type = TREE_TYPE (lhs);
>   gcc_assert (init_type > AUTO_INIT_UNINITIALIZED);
> 
>   if (is_vla || (!can_native_interpret_type_p (var_type)))
>     {
>       /* If this is a VLA or the type of the variable cannot be natively
>          interpreted, expand to a memset to initialize it.  */
>       if (TREE_CODE (lhs) == SSA_NAME)
>         lhs = SSA_NAME_VAR (lhs);
>       tree var_addr = NULL_TREE;
>       if (is_vla)
>         var_addr = TREE_OPERAND (lhs, 0);
>       else
>         {
>           TREE_ADDRESSABLE (lhs) = 1;
>           var_addr = build_fold_addr_expr (lhs);
>         }
>       tree value = (init_type == AUTO_INIT_PATTERN) ?
>                     build_int_cst (unsigned_char_type_node,
>                                    INIT_PATTERN_VALUE) :
>                     build_zero_cst (unsigned_char_type_node);
>       tree m_call = build_call_expr (builtin_decl_implicit (BUILT_IN_MEMSET),
>                                      3, var_addr, value, var_size);
>       /* Expand this memset call.  */
>       expand_builtin_memset (m_call, NULL_RTX, TYPE_MODE (var_type));
>     }
>   else
>     {
>       /* If this is not a VLA and the type of the variable can be natively
>          interpreted, expand to assignment to generate better code.  */
>       tree pattern = NULL_TREE;
>       unsigned HOST_WIDE_INT total_bytes
>         = tree_to_uhwi (TYPE_SIZE_UNIT (var_type));
> 
>       if (init_type == AUTO_INIT_PATTERN)
>         {
>           unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
>           memset (buf, INIT_PATTERN_VALUE, total_bytes);
>           pattern = native_interpret_expr (var_type, buf, total_bytes);
>           gcc_assert (pattern);
>         }
> 
>       tree init = (init_type == AUTO_INIT_PATTERN) ?
>                    pattern :
>                    build_zero_cst (var_type);
>       expand_assignment (lhs, init, false);
>     }
> }
> ===
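For a type that native_interpret_expr can handle, the AUTO_INIT_PATTERN path above effectively reinterprets a buffer filled with INIT_PATTERN_VALUE bytes as a value of that type. A plain-C analogue for a 4-byte integer follows; memcpy stands in for native_interpret_expr and `pattern_int32` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INIT_PATTERN_VALUE 0xFE

/* Plain-C analogue of the pattern path for an int32_t: fill a buffer
   with the pattern byte, then reinterpret the bytes as the target type
   (memcpy stands in for native_interpret_expr).  */
static int32_t
pattern_int32 (void)
{
  unsigned char buf[sizeof (int32_t)];
  memset (buf, INIT_PATTERN_VALUE, sizeof buf);
  int32_t value;
  memcpy (&value, buf, sizeof value);
  return value;
}
```

Because every byte is 0xFE, the result is 0xFEFEFEFE regardless of byte order, which is -16843010 as an int32_t (two's complement by definition).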
> 
> Now, I used “can_native_interpret_type_p (var_type)” instead of 
> “use_register_for_decl (lhs)” to decide 
> whether to use “memset” or use “assign” to expand this function.
> 
> However, this exposed a bug that is very hard to address:
> 
> ***For the test case testsuite/gcc.dg/uninit-I.c:
> 
> /* { dg-do compile } */
> /* { dg-options "-O2 -Wuninitialized" } */
> 
> int sys_msgctl (void)
> {
>   struct { int mode; } setbuf;
>   return setbuf.mode;  /* { dg-warning "'setbuf\.mode' is used" } */
> ==
> 
> **The above auto var “setbuf” has a “struct” type, for which
> “can_native_interpret_type_p (var_type)” is false; therefore,
> expanding this .DEFERRED_INIT call went down the “memset” expansion route. 
> 
> However, this structure type fits into a register, so its address can no
> longer be taken at this stage, even though I tried:
> 
>  TREE_ADDRESSABLE (lhs) = 1;
>  var_addr = build_fold_addr_expr (lhs);
> 
> to take its address; the expansion still failed at expr.c,
> line 8412:
> during RTL pass: expand
> /home/opc/Work/GCC/latest-gcc/gcc/testsuite/gcc.dg/auto-init-uninit-I.c:6:24: 
> internal compiler error: in expand_expr_addr_expr_1, at expr.c:8412
> 0xd04104 expand_expr_addr_expr_1
>   ../../latest-gcc/gcc/expr.c:8412
> 0xd04a95 expand_expr_addr_expr
>   ../../latest-gcc/gcc/expr.c:8525
> 0xd13592 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
> expand_modifier, rtx_def**, bool)
>   ../../latest-gcc/gcc/expr.c:11741
> 0xd05142 expand_expr_real(tree_node*, rtx_def*, machine_mode, 
> expand_modifier, rtx_def**, bool)
>   ../../latest-gcc/gcc/expr.c:8713
> 0xaed1d3 expand_expr
>   ../../latest-gcc/gcc/expr.h:301
> 0xaf0d89 get_memory_rtx
>   ../../latest-gcc/gcc/builtins.c:1370
> 0xafb4fb expand_builtin_memset_args
>   ../../latest-gcc/gcc/builtins.c:4102
> 0xafacde expand_builtin_memset(tree_node*, rtx_def*, machine_mode)
>   ../../latest-gcc/gcc/builtins.c:3886
> 0xe97fb3 expand_DEFERRED_INIT
> 
> **That’s the main reason why I chose “use_register_for_decl (lhs)” to
> decide between “memset” expansion and “assign” expansion: “memset” expansion
> needs to take the address of the variable, and if the variable has been
> decided to fit into a register, its address can no longer be taken at this stage.
> 
> **Using “can_native_interpret_type_p” did make the “pattern” generation
> part much cleaner and simpler, however, looks like it 

Re: [PATCH] [i386] Optimize vec_perm_expr to match vpmov{dw,qd,wb}.

2021-08-16 Thread Richard Biener via Gcc-patches
On Fri, Aug 13, 2021 at 11:04 AM Richard Sandiford via Gcc-patches
 wrote:
>
> Jakub Jelinek via Gcc-patches  writes:
> > On Fri, Aug 13, 2021 at 09:42:00AM +0800, Hongtao Liu wrote:
> >> > So, I wonder if your new routine shouldn't instead be done
> >> > in ix86_expand_vec_perm_const_1 after vec_perm_1, among the other 2-insn cases
> >> > and handle the other vpmovdw etc. cases in combine splitters (see that we
> >> > only use low half or quarter of the result and transform whatever
> >> > permutation we've used into what we want).
> >> >
> >> Got it, i'll try that way.
> >
> > Note, IMHO the ultimate fix would be to add real support for the
> > __builtin_shufflevector -1 indices (meaning I don't care what will be in
> > that element, perhaps narrowed down to an implementation choice of
> > any element of the input vector(s) or 0).
> > As VEC_PERM_EXPR is currently used for both perms by variable permutation
> > vector and constant, I think we'd need to introduce VEC_PERM_CONST_EXPR,
> > which would be exactly like VEC_PERM_EXPR, except that the last operand
> > would be required to be a VECTOR_CST and that all ones element would mean
> > something different: the "don't care" behavior.
> > The GIMPLE side would be fairly easy, except that there should be some
> > optimizations eventually, like when only certain subset of elements of
> > a vector are used later, we can mark the other elements as don't care.
>
> Another alternative I'd wondered about was keeping a single tree code,
> but adding an extra operand with a “care/don't care” mask.  I think
> that would fit with variable-length vectors better.

That sounds reasonable.  Note I avoided "quaternary" ops for
BIT_INSERT_EXPR but I don't see a good way to avoid that for
such VEC_PERM_EXPR extension.

Richard.

> Thanks,
> Richard
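One way to picture the extra-operand idea is a constant selector paired with a care mask: a target instruction whose fixed selector agrees with the requested one on every cared-about lane can be used, whatever it puts in the other lanes. The following is a hypothetical C model, not an existing GCC interface, and it only covers constant selectors.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: REQUESTED is a constant selector, CARE marks the
   lanes whose contents matter, and CANDIDATE is the fixed selector some
   instruction implements.  The candidate is usable if it matches on
   every cared-about lane; don't-care lanes may hold anything.  */
static bool
perm_satisfies (const unsigned *requested, const bool *care,
                const unsigned *candidate, unsigned nelt)
{
  for (unsigned i = 0; i < nelt; i++)
    if (care[i] && requested[i] != candidate[i])
      return false;
  return true;
}
```

For example, a request for the even elements in the low half with a don't-care high half is satisfied by an extract-even/extract-odd selector { 0 2 4 6 1 3 5 7 }, while the same request with every lane cared about is not.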


Re: [PATCH] [i386] Optimize __builtin_shuffle_vector.

2021-08-16 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 16, 2021 at 3:11 PM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Mon, Aug 16, 2021 at 01:18:38PM +0800, liuhongt via Gcc-patches wrote:
> > +  /* Accept VNxHImode and VNxQImode now.  */
> > +  if (!TARGET_AVX512VL && GET_MODE_SIZE (mode) < 64)
> > +return false;
> > +
> > +  /* vpermw.  */
> > +  if (!TARGET_AVX512BW && inner_size == 2)
> > +return false;
> > +
> > +  /* vpermb.   */
>
> Too many spaces after dot.
>
> > @@ -18301,7 +18380,7 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
> >if (expand_vec_perm_palignr (d, true))
> >  return true;
> >
> > -  /* Try the AVX512F vperm{s,d} instructions.  */
> > +  /* Try the AVX512F vperm{w,b,s,d} and instructions  */
>
> What is the " and" doing there?
Typo.
>
> > +  /* Check that the permutation is suitable for pmovz{bw,wd,dq}.
> > + For example V16HImode to V8HImode
> > + { 0 2 4 6 8 10 12 14 * * * * * * * * }.  */
> > +  for (int i = 0; i != nelt/2; i++)
>
> nelt / 2 please
>
> Otherwise LGTM.
>
Thanks for the review.
> Jakub
>


-- 
BR,
Hongtao
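The check reviewed above accepts selectors whose low half picks elements 0, 2, 4, ... in order, with the high half left as don't-care, as in the V16HImode to V8HImode example. A minimal C rendering of that condition follows; the function name is hypothetical and this is not the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check: does PERM (NELT lanes) select elements 0, 2, 4, ...
   in order in its low half?  The upper NELT / 2 lanes are treated as
   don't-care, matching the "{ 0 2 4 ... * * * }" example.  */
static bool
low_half_is_even_selection (const unsigned char *perm, unsigned nelt)
{
  assert (nelt % 2 == 0);
  for (unsigned i = 0; i != nelt / 2; i++)
    if (perm[i] != 2 * i)
      return false;
  return true;
}
```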


Re: [Patch] Fortran/OpenMP: Add support for OpenMP 5.1 masked construct (was: Re: [committed] openmp: Add support for OpenMP 5.1 masked construct)

2021-08-16 Thread Jakub Jelinek via Gcc-patches
On Fri, Aug 13, 2021 at 08:52:34PM +0200, Tobias Burnus wrote:
> gcc/fortran/ChangeLog:
> 
>   * dump-parse-tree.c (show_omp_clauses): Handle 'filter' clause.
>   (show_omp_node, show_code_node): Handle (combined) omp masked construct.
>   * frontend-passes.c (gfc_code_walker): Likewise.
>   * gfortran.h (enum gfc_statement): Add ST_OMP_*_MASKED*.
>   (enum gfc_exec_op): Add EXEC_OMP_*_MASKED*.
>   * match.h (gfc_match_omp_masked, gfc_match_omp_masked_taskloop,
>   gfc_match_omp_masked_taskloop_simd, gfc_match_omp_parallel_masked,
>   gfc_match_omp_parallel_masked_taskloop,
>   gfc_match_omp_parallel_masked_taskloop_simd): New prototypes.
>   * openmp.c (enum omp_mask1): Add OMP_CLAUSE_FILTER.
>   (gfc_match_omp_clauses): Match it.
>   (OMP_MASKED_CLAUSES, gfc_match_omp_parallel_masked,
>   gfc_match_omp_parallel_masked_taskloop,
>   gfc_match_omp_parallel_masked_taskloop_simd,
>   gfc_match_omp_masked, gfc_match_omp_masked_taskloop,
>   gfc_match_omp_masked_taskloop_simd): New.
>   (resolve_omp_clauses): Resolve filter clause. 
>   (gfc_resolve_omp_parallel_blocks, resolve_omp_do,
>   omp_code_to_statement, gfc_resolve_omp_directive): Handle
>   omp masked constructs.
>   * parse.c (decode_omp_directive, case_exec_markers,
>   gfc_ascii_statement, parse_omp_do, parse_omp_structured_block,
>   parse_executable): Likewise.
>   * resolve.c (gfc_resolve_blocks, gfc_resolve_code): Likewise.
>   * st.c (gfc_free_statement): Likewise.
>   * trans-openmp.c (gfc_trans_omp_clauses): Handle filter clause.
>   (GFC_OMP_SPLIT_MASKED, GFC_OMP_MASK_MASKED): New enum values.
>   (gfc_trans_omp_masked): New.
>   (gfc_split_omp_clauses): Handle combined masked directives.
>   (gfc_trans_omp_master_taskloop): Rename to ...
>   (gfc_trans_omp_master_masked_taskloop): ... this; handle also
>   combined masked directives.
>   (gfc_trans_omp_parallel_master): Rename to ...
>   (gfc_trans_omp_parallel_master_masked): ... this; handle
>   combined masked directives.
>   (gfc_trans_omp_directive): Handle EXEC_OMP_*_MASKED*.
>   * trans.c (trans_code): Likewise.
> 
> libgomp/ChangeLog:
> 
>   * testsuite/libgomp.fortran/masked-1.f90: New test.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gfortran.dg/gomp/masked-1.f90: New test.
>   * gfortran.dg/gomp/masked-2.f90: New test.
>   * gfortran.dg/gomp/masked-3.f90: New test.
>   * gfortran.dg/gomp/masked-combined-1.f90: New test.
>   * gfortran.dg/gomp/masked-combined-2.f90: New test.

Ok, thanks.

Jakub



Re: [PATCH] vect: Add extraction cost for slp reduc

2021-08-16 Thread Kewen.Lin via Gcc-patches
Hi Richi,

Thanks for the comments!

on 2021/8/16 2:49 PM, Richard Biener wrote:
> On Mon, Aug 16, 2021 at 8:03 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> IIUC, the function vectorizable_bb_reduc_epilogue fails to
>> consider the cost of extracting the final value from the vector
>> for reduc operations.  This patch adds one vec_to_scalar cost
>> for the extraction.
>>
>> Bootstrapped & regtested on powerpc64le-linux-gnu P9.
>> The testing on x86_64 and aarch64 is ongoing.
>>
>> Is it ok for trunk?
> 
> There's no such instruction necessary, the way the costing works
> the result is in lane zero already.  Note the optabs are defined
> to reduce to a scalar already.  So if your arch implements those and
> requires such move then the backend costing needs to handle that.
> 

Yes, these reduc_<op>_scal_<mode> optabs are supposed to produce the
final scalar result in operand[0].

> That said, ideally we'd simply cost the IFN_REDUC_* in the backend
> but for BB reductions we don't actually build a SLP node with such
> representative stmt to pass down (yet).
> 

OK, thanks for the explanation.  That explains why we cost
IFN_REDUC_* as one vect_stmt in loop vectorization but cost it
as conservatively as possible (shuffles plus reduc_op) here.
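The conservative epilogue costing described here (log2(nunits) shuffles plus log2(nunits) vector ops, with the patch adding one vec_to_scalar) can be modeled with a small scalar sketch. This is not GCC code: `reduce_plus`, `struct reduc_cost` and `floor_log2_u` are hypothetical names, and the "vector" is a plain int array whose lane count is assumed to be a power of two. It shows why the result ends up in lane 0 after the halving steps, which is Richard's point that the reduction itself already leaves the scalar in lane zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters mirroring the cost kinds used in the epilogue:
   vec_perm for shuffles, vector_stmt for the lane-wise reduction op,
   vec_to_scalar for the extra extraction the patch proposes.  */
struct reduc_cost
{
  unsigned vector_stmt;
  unsigned vec_perm;
  unsigned vec_to_scalar;
};

/* floor(log2(x)) for x > 0.  */
static unsigned
floor_log2_u (unsigned x)
{
  unsigned r = 0;
  while (x >>= 1)
    r++;
  return r;
}

/* Horizontal PLUS reduction of V (N lanes, N a power of two), done in
   place as log2(N) "shuffle the upper half down, then add" steps.
   The result ends up in lane 0.  COUNT_EXTRACT says whether to also
   charge one vec_to_scalar, as the proposed patch does.  */
static int
reduce_plus (int *v, size_t n, int count_extract, struct reduc_cost *cost)
{
  assert (n > 0 && (n & (n - 1)) == 0);
  cost->vector_stmt = cost->vec_perm = 0;
  for (size_t half = n / 2; half > 0; half /= 2)
    {
      cost->vec_perm++;     /* shuffle: lanes [half, 2*half) -> [0, half) */
      for (size_t i = 0; i < half; i++)
        v[i] += v[i + half];
      cost->vector_stmt++;  /* lane-wise add */
    }
  cost->vec_to_scalar = count_extract ? 1 : 0;
  assert (cost->vec_perm == floor_log2_u ((unsigned) n));
  return v[0];
}
```

With four lanes {1, 2, 3, 4} the sketch performs two shuffle/add steps and returns 10 from lane 0. Whether reading lane 0 into a scalar register costs anything extra is the target-specific question that, per the reply above, the backend costing would need to answer.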

> I guess you're running into an integer reduction where there's
> a vector -> gpr move missing in costing?  I suppose costing
> vec_to_scalar works for that but in the end we should maybe
> find a way to cost the IFN_REDUC_* ...

Yeah, it's a reduction on plus.  Initially I wanted to adjust the backend
costing for the various IFN_REDUC_* (since for some variants Power needs
more than one instruction), but then I noticed we currently cost the
reduction as shuffles plus reduc_op during SLP.  I guess it's good to get
vec_to_scalar considered here for consistency?  It can then be removed
together with the rest once we have better modeling.

BR,
Kewen

> 
> Richard.
> 
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>> * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for
>> value extraction.
>>
>> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
>> index b9d88c2d943..841a0872afa 100644
>> --- a/gcc/tree-vect-slp.c
>> +++ b/gcc/tree-vect-slp.c
>> @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance instance,
>>      return false;
>>
>>    /* There's no way to cost a horizontal vector reduction via REDUC_FN so
>> -     cost log2 vector operations plus shuffles.  */
>> +     cost log2 vector operations plus shuffles and one extraction.  */
>>    unsigned steps = floor_log2 (vect_nunits_for_cost (vectype));
>>    record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0],
>>                      vectype, 0, vect_body);
>>    record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0],
>>                      vectype, 0, vect_body);
>> +  record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0],
>> +                    vectype, 0, vect_body);
>>    return true;
>>  }




Re: [PATCH] Adding target hook allows to reject initialization of register

2021-08-16 Thread Richard Biener via Gcc-patches
On Fri, Aug 13, 2021 at 3:59 AM Jojo R  wrote:
>
>
> — Jojo
> On Aug 11, 2021 at 6:44 PM +0800, Richard Biener wrote:
>
> On Wed, Aug 11, 2021 at 11:28 AM Richard Sandiford
>  wrote:
>
>
> Richard Biener  writes:
>
> On Tue, Aug 10, 2021 at 10:33 AM Jojo R via Gcc-patches
>  wrote:
>
>
> Some targets like RISC-V allow grouping vector registers as a whole
> while only operating on part of the group, but the 'init-regs' pass adds
> initialization for uninitialized registers. Add this hook to reject that
> initialization and so reduce the number of instructions.
>
>
> Are these groups "visible"? That is, are the pseudos multi-reg
> pseudos? I wonder if there's a more generic way to tame down initregs
> w/o introducing a new target hook.
>
> Btw, initregs is a red herring - it ideally should go away. See PR61810.
>
> So instead of adding to it, can you see whether disabling the pass for RISC-V
> works w/o fallout (and add a comment to the PR)? Maybe someone more RTL-literate
> (in particular DF-literate) can look at the remaining issue.
> Richard, did you ever have a look into the "issue" that initregs covers up
> (whatever that exactly is)?
>
>
> No, sorry. I don't really understand what it would be from the comment
> in the code:
>
> [...] papers over some problems on the arm and other
> processors where certain isa constraints cannot be handled by gcc.
> These are of the form where two operands to an insn my not be the
> same. The ra will only make them the same if they do not
> interfere, and this can only happen if one is not initialized.
>
> That would definitely be an RA bug if true, since the constraints need
> to be applied independently of dataflow information. But the comment
> and code predate LRA and maybe no-one fancied poking around in reload
> (hard to believe).
>
> I'd be very surprised if LRA gets this wrong.
>
>
> OK, we've been wondering about this for quite some time - how about changing the
> gate of initregs to optimize > 0 && !targetm.lra_p ()? We'll hopefully
> figure out the "real" issue the pass is papering over. At the same time
> we're leaving old reload (and likely unmaintained) targets unaffected.
>
> Richard,
>
> So this patch is not necessary?
>
> I only need to disable this pass in my situation?
> I am afraid of side effects in my projects without the init-regs pass...

Can you try disabling the pass on RISC-V?
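A self-contained mock of the gate change Richard suggested earlier in the thread (`optimize > 0 && !targetm.lra_p ()`). `optimize`, `targetm` and the `lra_p` member here are local stand-ins for GCC's globals, not the real declarations, the current gate is assumed to be plain `optimize > 0`, and `gate_current`/`gate_proposed`/`uses_lra`/`uses_reload` are hypothetical names. The point is only to show which targets would still run init-regs.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for GCC's global optimization level (-O<n>).  */
static int optimize = 2;

/* Stand-in for the one target hook the gate consults.  */
struct target_hooks
{
  bool (*lra_p) (void);
};

static bool uses_lra (void) { return true; }
static bool uses_reload (void) { return false; }

static struct target_hooks targetm = { uses_lra };

/* Gate as assumed today: run init-regs whenever optimizing.  */
static bool
gate_current (void)
{
  return optimize > 0;
}

/* Gate proposed in the thread: keep init-regs only for targets still
   on old reload, so whatever it papers over can surface on LRA
   targets.  */
static bool
gate_proposed (void)
{
  return optimize > 0 && !targetm.lra_p ();
}
```

Under the proposed gate the pass would be skipped on any target whose `lra_p` hook returns true, while old-reload targets keep it, which is the "leave old reload targets unaffected" trade-off mentioned above.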

Richard.

> Richard.
>
> Thanks,
> Richard


Re: [patch][version 6] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-08-16 Thread Richard Biener via Gcc-patches
On Wed, 11 Aug 2021, Qing Zhao wrote:

> Hi, 
> 
> I finally decided to take another approach to resolve this issue; it resolves
> all the potential issues with the “address taken” auto variable.
> 
> The basic idea is to avoid generating the temporary variable in the
> first place.
> As you mentioned, "The reason is that alt_reloc is memory (because it is 
> address taken) and that GIMPLE says that register typed stores 
> need to use a is_gimple_val RHS which the call is not.”
> In order to avoid generating the temporary variable for the “address taken”
> auto variable, I updated the utility routine “is_gimple_val” as follows:
> 
> diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
> index a2563a45c37d..d5ef1aef8cea 100644
> --- a/gcc/gimple-expr.c
> +++ b/gcc/gimple-expr.c
> @@ -787,8 +787,20 @@ is_gimple_reg (tree t)
>    return !DECL_NOT_GIMPLE_REG_P (t);
>  }
>  
> +/* Return true if T is a call to .DEFERRED_INIT internal function.  */
> +static bool
> +is_deferred_init_call (tree t)
> +{
> +  if (TREE_CODE (t) == CALL_EXPR
> +      &&  CALL_EXPR_IFN (t) == IFN_DEFERRED_INIT)
> +    return true;
> +  return false;
> +}
> +
>  
> -/* Return true if T is a GIMPLE rvalue, i.e. an identifier or a constant.  */
> +/* Return true if T is a GIMPLE rvalue, i.e. an identifier or a constant,
> +   or a call to .DEFERRED_INIT internal function because the call to
> +   .DEFERRED_INIT will eventually be expanded as a constant.  */
>  
>  bool
>  is_gimple_val (tree t)
> @@ -799,7 +811,8 @@ is_gimple_val (tree t)
>        && !is_gimple_reg (t))
>      return false;
>  
> -  return (is_gimple_variable (t) || is_gimple_min_invariant (t));
> +  return (is_gimple_variable (t) || is_gimple_min_invariant (t)
> +          || is_deferred_init_call (t));
>  }
>  
> With this change, the temporary variable will not be created for the
> “address taken” auto variable, and the uninitialized analysis does not need
> any change. Everything works well.
> 
> And I believe that treating a “call to .DEFERRED_INIT” as “is_gimple_val” is
> reasonable since this call actually is a constant.
>
> Let me know if you have any objections to this solution.

Yeah, I object to this solution.

Richard.

> thanks.
> 
> Qing
> 
> > On Aug 11, 2021, at 3:30 PM, Qing Zhao via Gcc-patches 
> >  wrote:

Re: [patch][version 6] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-08-16 Thread Richard Biener via Gcc-patches
On Wed, 11 Aug 2021, Qing Zhao wrote:

> Hi, 
> 
> I met another issue for “address taken” auto variable, see below for details:
> 
>  the testing case: (gcc/testsuite/gcc.dg/uninit-16.c)
> 
> int foo, bar;
> 
> static
> void decode_reloc(int reloc, int *is_alt)
> {
>   if (reloc >= 20)
>   *is_alt = 1;
>   else if (reloc >= 10)
>   *is_alt = 0;
> }
> 
> void testfunc()
> {
>   int alt_reloc;
> 
>   decode_reloc(foo, &alt_reloc);
> 
>   if (alt_reloc) /* { dg-warning "may be used uninitialized" } */
> bar = 42;
> }
> 
> When compiled with -ftrivial-auto-var-init=zero -O2 -Wuninitialized -fdump-tree-all:
> 
> .*gimple dump:
> 
> void testfunc ()
> { 
>   int alt_reloc;
> 
>   try
> {
>   _1 = .DEFERRED_INIT (4, 2, 0);
>   alt_reloc = _1;
>   foo.0_2 = foo;
>   decode_reloc (foo.0_2, &alt_reloc);
>   alt_reloc.1_3 = alt_reloc;
>   if (alt_reloc.1_3 != 0) goto ; else goto ;
>   :
>   bar = 42;
>   :
> }
>   finally
> {
>   alt_reloc = {CLOBBER};
> }
> }
> 
> **fre1 dump:
> 
> void testfunc ()
> {
>   int alt_reloc;
>   int _1;
>   int foo.0_2;
> 
>:
>   _1 = .DEFERRED_INIT (4, 2, 0);
>   foo.0_2 = foo;
>   if (foo.0_2 > 19)
> goto ; [50.00%]
>   else
> goto ; [50.00%]
> 
>:
>   goto ; [100.00%]
> 
>:
>   if (foo.0_2 > 9)
> goto ; [50.00%]
>   else
> goto ; [50.00%]
> 
>:
>   goto ; [100.00%]
> 
>:
>   if (_1 != 0)
> goto ; [INV]
>   else
> goto ; [INV]
> 
>:
>   bar = 42;
> 
>:
>   return;
> 
> }
> 
> From the above IR file after “FRE”, we can see that the major issue with this IR is:
> 
> The address taken auto variable “alt_reloc” has been completely replaced by
> the temporary variable “_1” in all the uses of the original “alt_reloc”. 

Well, this can happen with regular code as well, there's no need for
.DEFERRED_INIT.  This is the usual problem with reporting uninitialized
uses late.

IMHO this shouldn't be a blocker.  The goal of zero "regressions" wrt
-Wuninitialized isn't really achievable.

> The major problem with such IR is that, during the uninitialized analysis
> phase, the original use of “alt_reloc” has disappeared completely.
> So the warning cannot be reported.
> 
> 
> My questions:
> 
> 1. Is it possible to get the original “alt_reloc” through the temporary
> variable “_1” with some available information recorded in the IR?
> 2. If not, then we have to record the relationship between “alt_reloc” and
> “_1” when the original “alt_reloc” is replaced by “_1”, and retrieve that
> relationship during the uninitialized analysis phase.  Is this doable?

Well, you could add a fake argument to .DEFERRED_INIT for the purpose of
diagnostics.  The difficulty is to avoid tracking it as an actual use, so
you could for example pass a string with the declaration's name, though
this wouldn't give the association with the actual decl.
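To make the fake-argument idea concrete, here is a toy model rather than GCC code: `struct deferred_init_def`, `struct ssa_name` and `warn_maybe_uninit_use` are hypothetical, and the "analysis" is reduced to "this use reads a value that still comes straight from .DEFERRED_INIT". The extra `decl_name` string is carried only for the diagnostic and is never treated as an operand, so after FRE has replaced `alt_reloc` by `_1` the warning can still name `alt_reloc`. As noted above, the string gives a name but no link to the actual decl.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a .DEFERRED_INIT definition: the three real
   arguments plus an extra, diagnostics-only string holding the source
   variable's name.  The string is never an operand, so it does not
   count as a use and cannot get in the way of optimization.  */
struct deferred_init_def
{
  unsigned size;          /* 1st arg: size of the decl in bytes */
  int init_type;          /* 2nd arg: initialization kind */
  int is_vla;             /* 3rd arg: 0 no, 1 yes */
  const char *decl_name;  /* hypothetical 4th arg, diagnostics only */
};

/* Hypothetical SSA name: printable name plus defining .DEFERRED_INIT,
   or NULL if the value comes from somewhere else.  */
struct ssa_name
{
  const char *print_name;
  const struct deferred_init_def *def;
};

/* If USE reads a value defined directly by .DEFERRED_INIT, write a
   -Wuninitialized-style message into BUF naming the original variable
   when known, else the temporary.  Returns snprintf's result, or 0 if
   there is nothing to report.  */
static int
warn_maybe_uninit_use (const struct ssa_name *use, char *buf, size_t len)
{
  assert (buf != NULL && len > 0);
  if (use->def == NULL)
    return 0;
  const char *name = use->def->decl_name ? use->def->decl_name
                                         : use->print_name;
  return snprintf (buf, len, "'%s' may be used uninitialized", name);
}
```

For example, a `_1` defined by `.DEFERRED_INIT (4, 2, 0)` with the recorded name "alt_reloc" yields a message naming `alt_reloc`, and the sketch falls back to printing `_1` when no name was recorded.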

> 3. It looks like, for an “address taken” auto variable, if we have to
> introduce a new temporary variable and split the call to .DEFERRED_INIT
> into two:
> 
>   temp = .DEFERRED_INIT (4, 2, 0);
>   alt_reloc = temp;
> 
> more issues might arise.
> 
> Any comments and suggestions on this issue?

I don't see any good possibilities that would not make optimizing code
as well as w/o .DEFERRED_INIT more difficult.  My stance here is always
that GCC is an optimizing compiler, not a static analysis engine, and
thus I side with "broken" diagnostics and better optimization.

Richard.

> Qing
> 
> > On Aug 11, 2021, at 11:55 AM, Richard Biener  wrote:
> > 
> > On August 11, 2021 6:22:00 PM GMT+02:00, Qing Zhao  
> > wrote:
> >> 
> >> 
> >>> On Aug 11, 2021, at 10:53 AM, Richard Biener  wrote:
> >>> 
> >>> On August 11, 2021 5:30:40 PM GMT+02:00, Qing Zhao  
> >>> wrote:
>  I modified the routine “gimple_add_init_for_auto_var” as follows:
>  
>  /* Generate initialization to automatic variable DECL based on INIT_TYPE.
>     Build a call to internal const function DEFERRED_INIT:
>     1st argument: SIZE of the DECL;
>     2nd argument: INIT_TYPE;
>     3rd argument: IS_VLA, 0 NO, 1 YES;
>  
>     as DEFERRED_INIT (SIZE of the DECL, INIT_TYPE, IS_VLA).  */
>  static void
>  gimple_add_init_for_auto_var (tree decl,
>                                enum auto_init_type init_type,
>                                bool is_vla,
>                                gimple_seq *seq_p)
>  {
>    gcc_assert (VAR_P (decl) && !DECL_EXTERNAL (decl) && !TREE_STATIC (decl));
>    gcc_assert (init_type > AUTO_INIT_UNINITIALIZED);
>    tree decl_size = TYPE_SIZE_UNIT (TREE_TYPE (decl));
>  
>    tree init_type_node
>      = build_int_cst (integer_type_node, (int) init_type);
>    tree is_vla_node
>      = build_int_cst (integer_type_node, (int) is_vla);
>  
>    tree call = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_DEFERRED_INIT,
>  
