Re: GCC: v850-elf

2021-03-16 Thread Jeff Law via Gcc-patches



On 3/16/2021 5:32 AM, Nick Clifton via Gcc-patches wrote:

Hi Jan-Benedict,


With my re-started testing efforts, I've got another one for you I
guess (`./configure --target=v850-elf && make all-gcc`):

../.././gcc/config/v850/v850.c: In function ‘char* construct_restore_jr(rtx)’:
../.././gcc/config/v850/v850.c:2260:22: error: ‘%s’ directive writing up to 39 bytes into a region of size between 32 and 71 [-Werror=format-overflow=]
 2260 |   sprintf (buff, "movhi hi(%s), r0, r6\n\tmovea lo(%s), r6, r6\n\tjmp r6",
      |                  ^~~~
 2261 |            name, name);


I could not reproduce these in my local environment, but I suspect that
you are using a more recent version of gcc than me.  The fix looks
obvious, however, so could you please try out the attached patch and let
me know if it resolves the problem?

I suspect that construct_save_jarl has the same problem and the same
patch should fix it.
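
For illustration only -- this is not Nick's attached patch, just a minimal
sketch of the kind of "obvious" fix being discussed, assuming it is enough
to make the destination buffer provably large enough for the format text
plus two copies of a symbol name of at most 39 bytes:

  /* Sketch: roughly 48 bytes of fixed text + 2 * 39 bytes of name + NUL
     fits comfortably in 160 bytes, so -Wformat-overflow can see the
     sprintf cannot overflow.  */
  static char buff[160];

  sprintf (buff, "movhi hi(%s), r0, r6\n\tmovea lo(%s), r6, r6\n\tjmp r6",
	   name, name);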



Jeff



Re: Ping^2: [PATCH v2] rs6000: Convert the vector element register to SImode [PR98914]

2021-03-16 Thread Xionghu Luo via Gcc-patches

Thanks Jakub & Segher,

On 2021/3/17 06:47, Segher Boessenkool wrote:

Hi!

On Tue, Mar 16, 2021 at 07:57:17PM +0100, Jakub Jelinek wrote:

On Thu, Mar 11, 2021 at 07:57:23AM +0800, Xionghu Luo via Gcc-patches wrote:

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ec068c58aa5..48eb91132a9 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -7000,11 +7000,15 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, 
rtx idx)
  
gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx));
  
-  gcc_assert (GET_MODE (idx) == E_SImode);

-
machine_mode inner_mode = GET_MODE (val);
  
-  rtx tmp = gen_reg_rtx (GET_MODE (idx));

+  machine_mode idx_mode = GET_MODE (idx);
+  rtx tmp = gen_reg_rtx (DImode);
+  if (idx_mode != DImode)
+    tmp = convert_modes (DImode, idx_mode, idx, 0);
+  else
+    tmp = idx;
+
int width = GET_MODE_SIZE (inner_mode);
  
gcc_assert (width >= 1 && width <= 8);

@@ -7012,9 +7016,7 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, rtx 
idx)
int shift = exact_log2 (width);
/* Generate the IDX for permute shift, width is the vector element size.
   idx = idx * width.  */
-  emit_insn (gen_ashlsi3 (tmp, idx, GEN_INT (shift)));
-
-  tmp = convert_modes (DImode, SImode, tmp, 1);
+  emit_insn (gen_ashldi3 (tmp, tmp, GEN_INT (shift)));


This looks broken.


Yup.


The gen_reg_rtx (DImode); call result is completely ignored,
so it wastes one pseudo, and I'm not convinced that idx nor tmp
is guaranteed to be usable as output of shift.
So, shouldn't it be instead:
   rtx tmp = gen_reg_rtx (DImode);
   if (idx_mode != DImode)
 idx = convert_modes (DImode, idx_mode, idx, 0);

...
   emit_insn (gen_ashldi3 (tmp, idx, GEN_INT (shift));
?


Yeah, something like that.

Just
   idx = convert_modes (DImode, idx_mode, idx, 1);
...
   rtx tmp = gen_reg_rtx (DImode);
   emit_insn (gen_ashldi3 (tmp, idx, GEN_INT (shift));
works just as well.


Also, dunno what is less and what is more expensive on
rs6000, whether convert_modes with unsigned = 0 or 1.


Unsigned is never more expensive.  Sometimes it is cheaper.  It depends
more on the situation what is best.  Just look at the generated code?



Change it to "idx = convert_modes (DImode, idx_mode, idx, 1);"

For the case with "-mvsx -Og -mcpu=power9":

vector char f2(vector char v, char k, char val)
{
 v[k] = val;
 return v;
}

gimple:
_1 = (int) k_2(D);
_7 = v;
_8 = .VEC_SET (_7, val_4(D), _1);
v = _8;
_6 = v;

the expanded RTL is:

   6: NOTE_INSN_BASIC_BLOCK 2
   2: r121:V16QI=%2:V16QI
   3: r122:DI=%5:DI
   4: r123:DI=%6:DI
   5: NOTE_INSN_FUNCTION_BEG
   8: r117:DI=sign_extend(r122:DI#0)
   9: r124:V16QI=r121:V16QI
  10: r126:QI=r123:DI#0
  11: r127:DI=zero_extend(r117:DI#0)
  12: r128:DI=r127:DI<<0
  13: r129:V16QI=unspec[r128:DI] 152
  14: r130:V16QI=unspec[r128:DI] 151
  15: r124:V16QI=unspec[r124:V16QI,r124:V16QI,r129:V16QI] 236
  16: r124:V16QI=unspec[r124:V16QI,r126:QI,0] 118
  17: r124:V16QI=unspec[r124:V16QI,r124:V16QI,r130:V16QI] 236
  18: r119:V16QI=r124:V16QI
  19: r121:V16QI=r119:V16QI
  20: r120:V16QI=r121:V16QI
  24: %2:V16QI=r120:V16QI
  25: use %2:V16QI

The sign_extend of insn #8 is generated by "_1 = (int) k_2(D);", and the
zero_extend of insn #11 is the conversion added by this patch.  Both will
be optimized out in the final ASM.  Since k is the index of a vector
element, it can only hold an unsigned value, which means zero_extend
should be used here as well?

PS: If the type of k is "unsigned int", no zero_extend appears in the
expander; if k is "int", a zero_extend is generated in the expander.
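
For reference, a hedged sketch of the shape Jakub and Segher converge on
above (the attached patch may differ in detail): convert the index to
DImode first, then shift into a fresh pseudo so the shift output is
guaranteed to be a register.

  machine_mode idx_mode = GET_MODE (idx);
  if (idx_mode != DImode)
    idx = convert_modes (DImode, idx_mode, idx, 1);	/* zero-extend */

  rtx tmp = gen_reg_rtx (DImode);
  emit_insn (gen_ashldi3 (tmp, idx, GEN_INT (shift)));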

Attached the updated patch.  Thanks.


Xionghu
From 06bff74a35ad7641dffc87d1eab9c5872851ae79 Mon Sep 17 00:00:00 2001
From: Xionghu Luo 
Date: Tue, 16 Mar 2021 21:23:14 -0500
Subject: [PATCH] rs6000: Convert the vector set variable idx to DImode
 [PR98914]

The ELFv2 ABI defines the element argument type of vec_insert to be
signed int.  When expanding a vector set with a variable index rtx,
convert the rtx to DImode to support both the intrinsic usage and other
callers from rs6000_expand_vector_init produced by v[k] = val when k has
long type.

gcc/ChangeLog:

2021-03-17  Xionghu Luo  

PR target/98914
* config/rs6000/rs6000.c (rs6000_expand_vector_set_var_p9):
Convert idx to DImode.
(rs6000_expand_vector_set_var_p8): Likewise.

gcc/testsuite/ChangeLog:

2021-03-17  Xionghu Luo  

PR target/98914
* gcc.target/powerpc/pr98914.c: New test.
---
 gcc/config/rs6000/rs6000.c | 39 ++
 gcc/testsuite/gcc.target/powerpc/pr98914.c | 11 ++
 2 files changed, 29 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr98914.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ec068c58aa5..bb7fcdca876 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -7000,21 +7000,22 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, 
rtx idx)
 
   gcc_assert 

Re: [PATCH 1/2, rs6000] Add const_anchor for rs6000 [PR33699]

2021-03-16 Thread HAO CHEN GUI via Gcc-patches

Segher,

   The const_anchor should work on both 64 and 32 bit.  I think constant
loading is cheap on 32-bit platforms, so I only enabled it on 64 bit.  I
will add a test case and verify the patch on Darwin and AIX.  Thanks.
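
For readers unfamiliar with the hook, a rough illustration of what
const_anchor buys (a hypothetical example, not from this patch; the exact
code generation depends on the cse pass): with an anchor of 0x8000 and a
+/-32768 offset range, a constant close to one already held in a register
can be derived with a single 16-bit addition instead of being materialized
from scratch.

  /* The two stores use constants that differ by 0x10; with const_anchor,
     cse can compute the second value from the register holding the first
     (e.g. via one addi) rather than rebuilding it.  */
  void f (long *p)
  {
    p[0] = 0x123450000L;
    p[1] = 0x123450010L;
  }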


On 17/3/2021 12:35 AM, Segher Boessenkool wrote:

Hi!

On Mon, Mar 15, 2021 at 11:11:32AM +0800, HAO CHEN GUI via Gcc-patches wrote:

     This patch adds const_anchor for rs6000.  The const_anchor is used
in the cse pass.

1) This isn't suitable for stage 4.
2) Please add a test case, which shows what it does, that it is useful.
3) Does this work on other OSes than Linux?  What about Darwin and AIX?


diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ec068c58aa5..2b2350c53ae 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4911,6 +4911,13 @@ rs6000_option_override_internal (bool global_init_p)
  warning (0, "%qs is deprecated and not recommended in any circumstances",
 "-mno-speculate-indirect-jumps");
  
+  if (TARGET_64BIT)
+    {
+      targetm.min_anchor_offset = -32768;
+      targetm.max_anchor_offset = 32767;
+      targetm.const_anchor = 0x8000;
+    }

Why only on 64 bit?  Why these choices?


Segher


Re: [PATCH] c-family: Fix up -Wduplicated-branches for union members [PR99565]

2021-03-16 Thread Martin Sebor via Gcc-patches

On 3/16/21 11:56 AM, Jakub Jelinek via Gcc-patches wrote:

Hi!

Honza has fairly recently changed operand_equal_p to compare
DECL_FIELD_OFFSET for COMPONENT_REFs when comparing addresses.
As the first testcase in this patch shows, while that is very nice
for optimizations, for the -Wduplicated-branches warning it causes
regressions.  Pedantically a union in both C and C++ has only one
active member at a time, so using some other union member even if it has the
same type is UB, so I think the warning shouldn't warn when it sees access
to different fields that happen to have the same offset and should consider
them different.
In my first attempt to fix this I've keyed the old behavior on
OEP_LEXICOGRAPHIC, but unfortunately that has various problems, the warning
has a quick non-lexicographic compare in build_conditional_expr* and another
lexicographic more expensive one later during genericization and turning the
first one into lexicographic would mean wasting compile time on large
conditionals.
So, this patch instead introduces a new OEP_ flag and makes sure to pass it
to operand_equal_p in all -Wduplicated-branches cases.
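
An illustrative example (not a testcase from the patch) of the union
access this is about -- with the new flag the two branches are treated as
different because they name different members, even though both members
live at offset zero:

  union U { int i; int j; };

  int f (union U *u, int cond)
  {
    return cond ? u->i : u->j;	/* no -Wduplicated-branches warning */
  }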

The cvt.c changes are because on the other testcase we were warning with
UNKNOWN_LOCATION, so the user wouldn't really know where the questionable
code is.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-03-16  Jakub Jelinek  

PR c++/99565
* tree-core.h (enum operand_equal_flag): Add OEP_DUPLICATED_BRANCHES.
* fold-const.c (operand_compare::operand_equal_p): Don't compare
field offsets if OEP_DUPLICATED_BRANCHES.

* c-warn.c (do_warn_duplicated_branches): Pass also
OEP_DUPLICATED_BRANCHES to operand_equal_p.

* c-typeck.c (build_conditional_expr): Pass OEP_DUPLICATED_BRANCHES
to operand_equal_p.

* call.c (build_conditional_expr_1): Pass OEP_DUPLICATED_BRANCHES
to operand_equal_p.
* cvt.c (convert_to_void): Preserve location_t on COND_EXPR or
or COMPOUND_EXPR.

* g++.dg/warn/Wduplicated-branches6.C: New test.
* g++.dg/warn/Wduplicated-branches7.C: New test.

--- gcc/tree-core.h.jj  2021-01-16 22:52:33.654413400 +0100
+++ gcc/tree-core.h 2021-03-16 16:24:06.136829900 +0100
@@ -896,7 +896,10 @@ enum operand_equal_flag {
OEP_HASH_CHECK = 32,
/* Makes operand_equal_p handle more expressions:  */
OEP_LEXICOGRAPHIC = 64,
-  OEP_BITWISE = 128
+  OEP_BITWISE = 128,
+  /* Considers different fields different even when they have
+ the same offset.  */
+  OEP_DUPLICATED_BRANCHES = 256


It seems sort of "inverted:" I'd expect OEP_LEXICOGRAPHIC on its
own to do a lexicographical comparison, without having to set
an additional bit to ask for it.  If a more refined form of
a lexicographical comparison is needed that considers
lexicographically distinct names equal then adding a new bit
for that would seem like the right design.

I'd also expect the name of the new bit to be independent of
the context where it's used (like -Wduplicated-branches), in case
it also needs to be used in an unrelated warning.  (E.g., the new
-Wvla-parameter also uses OEP_LEXICOGRAPHICAL and if it does need
to set a new bit it would look odd if that bit had the name of
an unrelated warning in it.)

Martin


  };
  
  /* Enum and arrays used for tree allocation stats.

--- gcc/fold-const.c.jj 2021-02-24 12:10:17.571212194 +0100
+++ gcc/fold-const.c2021-03-16 16:19:16.327018956 +0100
@@ -3317,7 +3317,7 @@ operand_compare::operand_equal_p (const_
flags &= ~OEP_ADDRESS_OF;
if (!OP_SAME (1))
  {
-   if (compare_address)
+   if (compare_address && (flags & OEP_DUPLICATED_BRANCHES) == 0)
  {
if (TREE_OPERAND (arg0, 2)
|| TREE_OPERAND (arg1, 2))
--- gcc/c-family/c-warn.c.jj2021-02-12 23:57:30.459142340 +0100
+++ gcc/c-family/c-warn.c   2021-03-16 16:24:19.224685926 +0100
@@ -2779,7 +2779,8 @@ do_warn_duplicated_branches (tree expr)
  
/* Compare the hashes.  */

if (h0 == h1
-  && operand_equal_p (thenb, elseb, OEP_LEXICOGRAPHIC)
+  && operand_equal_p (thenb, elseb, OEP_LEXICOGRAPHIC
+   | OEP_DUPLICATED_BRANCHES)
/* Don't warn if any of the branches or their subexpressions comes
 from a macro.  */
&& !walk_tree_without_duplicates (, expr_from_macro_expansion_r,
--- gcc/c/c-typeck.c.jj 2021-02-18 22:17:44.184657291 +0100
+++ gcc/c/c-typeck.c2021-03-16 16:23:58.335915719 +0100
@@ -5490,7 +5490,7 @@ build_conditional_expr (location_t colon
   warn here, because the COND_EXPR will be turned into OP1.  */
if (warn_duplicated_branches
&& TREE_CODE (ret) == COND_EXPR
-  && (op1 == op2 || operand_equal_p (op1, op2, 0)))
+  && (op1 == op2 || operand_equal_p (op1, op2, OEP_DUPLICATED_BRANCHES)))
  warning_at (EXPR_LOCATION (ret), 

[WIP] 'walk_gimple_seq' backward

2021-03-16 Thread Thomas Schwinge
Hi!

On 2021-03-17T00:24:55+0100, I wrote:
> Now, walking each function backwards (!), [...]

> I've now got a simple 'callback_op', which for '!is_lhs' looks at
> 'get_base_address ([op])', and if that 'var' is contained in the set of
> current candidates (initialized per containing 'bind's, which we enter
> first, even if walking a 'gimple_seq' backwards), removes that 'var' as a
> candidate for such optimization.  (Plus some "details", of course.)  This
> seems to work fine, as far as I can tell.  :-)

Is something like the attached "[WIP] 'walk_gimple_seq' backward" OK
conceptually?  (For next development stage 1 and with all the TODOs
resolved, of course.)

The 'backward' flag cannot simply be an argument to 'walk_gimple_seq'
etc.: it needs to also be passed to 'walk_gimple_seq_mod' calls triggered
from inside 'walk_gimple_stmt'.  Hence, I've put it into the state
'struct walk_stmt_info'.
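
A hedged usage sketch (the callback names are made up for illustration;
the real interface is in the WIP patch below): walking a sequence
backwards just means setting the new bit in the walk state.

  struct walk_stmt_info wi;
  memset (&wi, 0, sizeof (wi));
  wi.backward = true;	/* new flag added by this patch */
  walk_gimple_seq (seq, my_stmt_callback, my_op_callback, &wi);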


Regards
 Thomas


From 48036d65cea6bb47c7d4eca3b4fea77e058f29e3 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 15 Mar 2021 17:31:00 +0100
Subject: [PATCH] [WIP] 'walk_gimple_seq' backward

This seems to work -- for the one case where I'm using it...
---
 gcc/doc/gimple.texi |  2 ++
 gcc/gimple-walk.c   | 15 ---
 gcc/gimple-walk.h   |  4 
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 5e0fc2e0dc5..51fe0f2d715 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -2778,4 +2778,6 @@ calling @code{walk_gimple_stmt} on each one.  @code{WI} is as in
 @code{walk_gimple_stmt}.  If @code{walk_gimple_stmt} returns non-@code{NULL}, the walk
 is stopped and the value returned.  Otherwise, all the statements
 are walked and @code{NULL_TREE} returned.
+
+TODO update for forward vs. backward.
 @end deftypefn
diff --git a/gcc/gimple-walk.c b/gcc/gimple-walk.c
index 9a761f32578..dfc2f0b4dbc 100644
--- a/gcc/gimple-walk.c
+++ b/gcc/gimple-walk.c
@@ -32,6 +32,8 @@ along with GCC; see the file COPYING3.  If not see
 /* Walk all the statements in the sequence *PSEQ calling walk_gimple_stmt
on each one.  WI is as in walk_gimple_stmt.
 
+   TODO update for forward vs. backward.
+
If walk_gimple_stmt returns non-NULL, the walk is stopped, and the
value is stored in WI->CALLBACK_RESULT.  Also, the statement that
produced the value is returned if this statement has not been
@@ -44,9 +46,10 @@ gimple *
 walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
 		 walk_tree_fn callback_op, struct walk_stmt_info *wi)
 {
-  gimple_stmt_iterator gsi;
+  bool forward = !(wi && wi->backward);
 
-  for (gsi = gsi_start (*pseq); !gsi_end_p (gsi); )
+  gimple_stmt_iterator gsi = forward ? gsi_start (*pseq) : gsi_last (*pseq);
+  for (; !gsi_end_p (gsi); )
 {
      tree ret = walk_gimple_stmt (&gsi, callback_stmt, callback_op, wi);
   if (ret)
@@ -60,7 +63,13 @@ walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
 	}
 
   if (!wi->removed_stmt)
-	gsi_next (&gsi);
+	{
+	  if (forward)
+	    gsi_next (&gsi);
+	  else //TODO Correct?
+	    gsi_prev (&gsi);
+	  //TODO This could do with some unit testing, to make sure all the corner cases (removing first/last, for example) work correctly.
+	}
 }
 
   if (wi)
diff --git a/gcc/gimple-walk.h b/gcc/gimple-walk.h
index bdc7351f190..9590c63ba18 100644
--- a/gcc/gimple-walk.h
+++ b/gcc/gimple-walk.h
@@ -71,6 +71,10 @@ struct walk_stmt_info
 
   /* True if we've removed the statement that was processed.  */
   BOOL_BITFIELD removed_stmt : 1;
+
+  /*TODO */
+  //TODO Only applicable for 'walk_gimple_seq'.
+  BOOL_BITFIELD backward : 1;
 };
 
 /* Callback for walk_gimple_stmt.  Called for every statement found
-- 
2.17.1



Re: [PATCH] c-family: Fix up -Wduplicated-branches for union members [PR99565]

2021-03-16 Thread Marek Polacek via Gcc-patches
On Tue, Mar 16, 2021 at 06:56:47PM +0100, Jakub Jelinek wrote:
> Hi!
> 
> Honza has fairly recently changed operand_equal_p to compare
> DECL_FIELD_OFFSET for COMPONENT_REFs when comparing addresses.
> As the first testcase in this patch shows, while that is very nice
> for optimizations, for the -Wduplicated-branches warning it causes
> regressions.  Pedantically a union in both C and C++ has only one
> active member at a time, so using some other union member even if it has the
> same type is UB, so I think the warning shouldn't warn when it sees access
> to different fields that happen to have the same offset and should consider
> them different.
> In my first attempt to fix this I've keyed the old behavior on
> OEP_LEXICOGRAPHIC, but unfortunately that has various problems, the warning
> has a quick non-lexicographic compare in build_conditional_expr* and another
> lexicographic more expensive one later during genericization and turning the
> first one into lexicographic would mean wasting compile time on large
> conditionals.
> So, this patch instead introduces a new OEP_ flag and makes sure to pass it
> to operand_equal_p in all -Wduplicated-branches cases.
> 
> The cvt.c changes are because on the other testcase we were warning with
> UNKNOWN_LOCATION, so the user wouldn't really know where the questionable
> code is.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2021-03-16  Jakub Jelinek  
> 
>   PR c++/99565
>   * tree-core.h (enum operand_equal_flag): Add OEP_DUPLICATED_BRANCHES.
>   * fold-const.c (operand_compare::operand_equal_p): Don't compare
>   field offsets if OEP_DUPLICATED_BRANCHES.
> 
>   * c-warn.c (do_warn_duplicated_branches): Pass also
>   OEP_DUPLICATED_BRANCHES to operand_equal_p.
> 
>   * c-typeck.c (build_conditional_expr): Pass OEP_DUPLICATED_BRANCHES
>   to operand_equal_p.
> 
>   * call.c (build_conditional_expr_1): Pass OEP_DUPLICATED_BRANCHES
>   to operand_equal_p.
>   * cvt.c (convert_to_void): Preserve location_t on COND_EXPR or
>   or COMPOUND_EXPR.
> 
>   * g++.dg/warn/Wduplicated-branches6.C: New test.
>   * g++.dg/warn/Wduplicated-branches7.C: New test.
> 
> --- gcc/tree-core.h.jj2021-01-16 22:52:33.654413400 +0100
> +++ gcc/tree-core.h   2021-03-16 16:24:06.136829900 +0100
> @@ -896,7 +896,10 @@ enum operand_equal_flag {
>OEP_HASH_CHECK = 32,
>/* Makes operand_equal_p handle more expressions:  */
>OEP_LEXICOGRAPHIC = 64,
> -  OEP_BITWISE = 128
> +  OEP_BITWISE = 128,
> +  /* Considers different fields different even when they have
> + the same offset.  */
> +  OEP_DUPLICATED_BRANCHES = 256
>  };
>  
>  /* Enum and arrays used for tree allocation stats.
> --- gcc/fold-const.c.jj   2021-02-24 12:10:17.571212194 +0100
> +++ gcc/fold-const.c  2021-03-16 16:19:16.327018956 +0100
> @@ -3317,7 +3317,7 @@ operand_compare::operand_equal_p (const_
>   flags &= ~OEP_ADDRESS_OF;
>   if (!OP_SAME (1))
> {
> - if (compare_address)
> + if (compare_address && (flags & OEP_DUPLICATED_BRANCHES) == 0)
> {
>   if (TREE_OPERAND (arg0, 2)
>   || TREE_OPERAND (arg1, 2))
> --- gcc/c-family/c-warn.c.jj  2021-02-12 23:57:30.459142340 +0100
> +++ gcc/c-family/c-warn.c 2021-03-16 16:24:19.224685926 +0100
> @@ -2779,7 +2779,8 @@ do_warn_duplicated_branches (tree expr)
>  
>/* Compare the hashes.  */
>if (h0 == h1
> -  && operand_equal_p (thenb, elseb, OEP_LEXICOGRAPHIC)
> +  && operand_equal_p (thenb, elseb, OEP_LEXICOGRAPHIC
> + | OEP_DUPLICATED_BRANCHES)

This change isn't strictly necessary, OEP_DUPLICATED_BRANCHES is needed
mainly in build_conditional_expr*, right?  (It makes sense to me though,
so let's leave it in.)

>/* Don't warn if any of the branches or their subexpressions comes
>from a macro.  */
>&& !walk_tree_without_duplicates (, expr_from_macro_expansion_r,
> --- gcc/c/c-typeck.c.jj   2021-02-18 22:17:44.184657291 +0100
> +++ gcc/c/c-typeck.c  2021-03-16 16:23:58.335915719 +0100
> @@ -5490,7 +5490,7 @@ build_conditional_expr (location_t colon
>   warn here, because the COND_EXPR will be turned into OP1.  */
>if (warn_duplicated_branches
>&& TREE_CODE (ret) == COND_EXPR
> -  && (op1 == op2 || operand_equal_p (op1, op2, 0)))
> +  && (op1 == op2 || operand_equal_p (op1, op2, OEP_DUPLICATED_BRANCHES)))
>  warning_at (EXPR_LOCATION (ret), OPT_Wduplicated_branches,
>   "this condition has identical branches");

The c/ and c-family/ parts are ok, thanks.

Marek



[PATCH] rs6000: Fix disassembling a vector pair in gcc-10 in little-endian mode

2021-03-16 Thread Peter Bergner via Gcc-patches
In gcc-10, we don't handle disassembling a vector pair in little-endian mode
correctly.  The solution is to make use of the disassemble accumulator code
that is endian friendly.

Trunk does not have this bug, as the use of opaque modes for the MMA types
"fixed" this issue there.

This passed bootstrap and regtesting on powerpc64le-linux with no regressions.
Ok for the GCC 10 release branch?

Peter


gcc/

2021-03-16  Peter Bergner  

* config/rs6000/rs6000-call.c (rs6000_gimple_fold_mma_builtin): Handle
disassembling a vector pair vector by vector in little-endian mode.

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 538a57adceb..a112593878a 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -10850,10 +10850,12 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator 
*gsi)
   tree src = make_ssa_name (TREE_TYPE (src_type));
   gimplify_assign (src, build_simple_mem_ref (src_ptr), &new_seq);
 
-  /* If we are not disassembling an accumulator or our destination is
-another accumulator, then just copy the entire thing as is.  */
-  if (fncode != MMA_BUILTIN_DISASSEMBLE_ACC
- || TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_quad_type_node)
+  /* If we are disassembling an accumulator/pair and our destination is
+another accumulator/pair, then just copy the entire thing as is.  */
+  if ((fncode == MMA_BUILTIN_DISASSEMBLE_ACC
+  && TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_quad_type_node)
+ || (fncode == VSX_BUILTIN_DISASSEMBLE_PAIR
+ && TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_pair_type_node))
{
  tree dst = build_simple_mem_ref (build1 (VIEW_CONVERT_EXPR,
   src_type, dst_ptr));
@@ -10865,21 +10867,25 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator 
*gsi)
 
   /* We're disassembling an accumulator into a different type, so we need
 to emit a xxmfacc instruction now, since we cannot do it later.  */
-  new_decl = rs6000_builtin_decls[MMA_BUILTIN_XXMFACC_INTERNAL];
-  new_call = gimple_build_call (new_decl, 1, src);
-  src = make_ssa_name (vector_quad_type_node);
-  gimple_call_set_lhs (new_call, src);
-  gimple_seq_add_stmt (&new_seq, new_call);
+  if (fncode == MMA_BUILTIN_DISASSEMBLE_ACC)
+   {
+ new_decl = rs6000_builtin_decls[MMA_BUILTIN_XXMFACC_INTERNAL];
+ new_call = gimple_build_call (new_decl, 1, src);
+ src = make_ssa_name (vector_quad_type_node);
+ gimple_call_set_lhs (new_call, src);
+ gimple_seq_add_stmt (&new_seq, new_call);
+   }
 
-  /* Copy the accumulator vector by vector.  */
+  /* Copy the accumulator/pair vector by vector.  */
+  unsigned nvecs = (fncode == MMA_BUILTIN_DISASSEMBLE_ACC) ? 4 : 2;
   tree dst_type = build_pointer_type_for_mode (unsigned_V16QI_type_node,
   ptr_mode, true);
   tree dst_base = build1 (VIEW_CONVERT_EXPR, dst_type, dst_ptr);
-  tree array_type = build_array_type_nelts (unsigned_V16QI_type_node, 4);
+  tree array_type = build_array_type_nelts (unsigned_V16QI_type_node, 
nvecs);
   tree src_array = build1 (VIEW_CONVERT_EXPR, array_type, src);
-  for (unsigned i = 0; i < 4; i++)
+  for (unsigned i = 0; i < nvecs; i++)
{
- unsigned index = WORDS_BIG_ENDIAN ? i : 3 - i;
+ unsigned index = WORDS_BIG_ENDIAN ? i : nvecs - 1 - i;
  tree ref = build4 (ARRAY_REF, unsigned_V16QI_type_node, src_array,
 build_int_cst (size_type_node, i),
 NULL_TREE, NULL_TREE);


Re: Ping^2: [PATCH v2] rs6000: Convert the vector element register to SImode [PR98914]

2021-03-16 Thread Segher Boessenkool
Hi!

On Tue, Mar 16, 2021 at 07:57:17PM +0100, Jakub Jelinek wrote:
> On Thu, Mar 11, 2021 at 07:57:23AM +0800, Xionghu Luo via Gcc-patches wrote:
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index ec068c58aa5..48eb91132a9 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -7000,11 +7000,15 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx 
> > val, rtx idx)
> >  
> >gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx));
> >  
> > -  gcc_assert (GET_MODE (idx) == E_SImode);
> > -
> >machine_mode inner_mode = GET_MODE (val);
> >  
> > -  rtx tmp = gen_reg_rtx (GET_MODE (idx));
> > +  machine_mode idx_mode = GET_MODE (idx);
> > +  rtx tmp = gen_reg_rtx (DImode);
> > +  if (idx_mode != DImode)
> > +tmp = convert_modes (DImode, idx_mode, idx, 0);
> > +  else
> > +tmp = idx;
> > +
> >int width = GET_MODE_SIZE (inner_mode);
> >  
> >gcc_assert (width >= 1 && width <= 8);
> > @@ -7012,9 +7016,7 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, 
> > rtx idx)
> >int shift = exact_log2 (width);
> >/* Generate the IDX for permute shift, width is the vector element size.
> >   idx = idx * width.  */
> > -  emit_insn (gen_ashlsi3 (tmp, idx, GEN_INT (shift)));
> > -
> > -  tmp = convert_modes (DImode, SImode, tmp, 1);
> > +  emit_insn (gen_ashldi3 (tmp, tmp, GEN_INT (shift)));
> 
> This looks broken.

Yup.

> The gen_reg_rtx (DImode); call result is completely ignored,
> so it wastes one pseudo, and I'm not convinced that idx nor tmp
> is guaranteed to be usable as output of shift.
> So, shouldn't it be instead:
>   rtx tmp = gen_reg_rtx (DImode);
>   if (idx_mode != DImode)
> idx = convert_modes (DImode, idx_mode, idx, 0);
> 
> ...
>   emit_insn (gen_ashldi3 (tmp, idx, GEN_INT (shift));
> ?

Yeah, something like that.

Just
  idx = convert_modes (DImode, idx_mode, idx, 1);
...
  rtx tmp = gen_reg_rtx (DImode);
  emit_insn (gen_ashldi3 (tmp, idx, GEN_INT (shift));
works just as well.

> Also, dunno what is less and what is more expensive on
> rs6000, whether convert_modes with unsigned = 0 or 1.

Unsigned is never more expensive.  Sometimes it is cheaper.  It depends
more on the situation what is best.  Just look at the generated code?

> I think rs6000 only supports very narrow vectors, so even for

The only hardware vectors are 128 bits.  There are modes for two vectors
concatenated as well, but that is just for RTL, you should not have
insns making such a pair.

> say QImode idx anything with MSB set will be out of out of bounds and UB,
> so you can sign or zero extend, whatever is more efficient.

Yup.

> > +  machine_mode idx_mode = GET_MODE (idx);
> > +  rtx tmp = gen_reg_rtx (DImode);
> > +  if (idx_mode != DImode)
> > +tmp = convert_modes (DImode, idx_mode, idx, 0);
> > +  else
> > +tmp = idx;
> > +
> >gcc_assert (width >= 1 && width <= 4);
> >  
> >if (!BYTES_BIG_ENDIAN)
> >  {
> >/*  idx = idx * width.  */
> > -  emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width)));
> > +  emit_insn (gen_muldi3 (tmp, tmp, GEN_INT (width)));
> >/*  idx = idx + 8.  */
> > -  emit_insn (gen_addsi3 (tmp, tmp, GEN_INT (8)));
> > +  emit_insn (gen_adddi3 (tmp, tmp, GEN_INT (8)));
> >  }
> >else
> >  {
> > -  emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width)));
> > -  emit_insn (gen_subsi3 (tmp, GEN_INT (24 - width), tmp));
> > +  emit_insn (gen_muldi3 (tmp, idx, GEN_INT (width)));
> > +  emit_insn (gen_subdi3 (tmp, GEN_INT (24 - width), tmp));
> 
> Ditto.  But the above was even inconsistent, sometimes it
> used tmp (for LE) and otherwise idx in whatever mode it has.

Thanks for the review Jakub, I kinda dropped this :-(


Segher


[pushed] c++: Fix NaN as C++20 template argument

2021-03-16 Thread Jason Merrill via Gcc-patches
C++20 allows floating-point types for non-type template parameters;
floating-point values are considered to be equivalent template arguments if
they are "identical", which conveniently seems to map onto an existing GCC
predicate.
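
An illustrative example (not the committed test, whose template arguments
were mangled in this archive): under C++20, floating-point template
arguments compare by representation, so identical values name the same
specialization.

  template <double D> struct X { };
  static_assert(__is_same(X<1.5>, X<1.5>));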

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* tree.c (cp_tree_equal): Use real_identical.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-float1.C: New test.
---
 gcc/cp/tree.c   |  2 +-
 gcc/testsuite/g++.dg/cpp2a/nontype-float1.C | 12 
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-float1.C

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 3c469750e9d..3acb6433b64 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -3740,7 +3740,7 @@ cp_tree_equal (tree t1, tree t2)
   return tree_int_cst_equal (t1, t2);
 
 case REAL_CST:
-  return real_equal (&TREE_REAL_CST (t1), &TREE_REAL_CST (t2));
+  return real_identical (&TREE_REAL_CST (t1), &TREE_REAL_CST (t2));
 
 case STRING_CST:
   return TREE_STRING_LENGTH (t1) == TREE_STRING_LENGTH (t2)
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-float1.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-float1.C
new file mode 100644
index 000..4fafac13793
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-float1.C
@@ -0,0 +1,12 @@
+// { dg-do compile { target c++20 } }
+
+#include 
+
+template class MyClass { };
+
+static_assert(__is_same(MyClass, MyClass));
+
+constexpr auto mynan = NAN;
+static_assert(__is_same(MyClass, MyClass));
+
+static_assert(__is_same(MyClass, MyClass));

base-commit: 0251051db64f13c9a31a05c8133c31dc50b2b235
-- 
2.27.0



[PATCH] Complete __gnu_debug::basic_string

2021-03-16 Thread François Dumont via Gcc-patches

Following:

https://gcc.gnu.org/pipermail/libstdc++/2021-March/052158.html

Here is the patch to complete __gnu_debug::basic_string support.
Contrary to what I thought, the code in std::basic_string that generates
a basic_string_view works just fine for __gnu_debug::basic_string.


    libstdc++: [_GLIBCXX_DEBUG] Add __gnu_debug u8string/u16string/u32string

    Complete __gnu_debug::basic_string support so that it can be used as a
    transparent replacement of std::basic_string.

    libstdc++-v3/ChangeLog:

    * include/debug/string (basic_string(const _CharT*, const _Allocator&)):
    Remove assign call.
    (basic_string<>::insert(const_iterator, _InputIte, _InputIte)): Try to
    remove iterator debug layer even if !_GLIBCXX_USE_CXX11_ABI.
    [_GLIBCXX_USE_CHAR8_T] (__gnu_debug::u8string): New.
    (__gnu_debug::u16string, __gnu_debug::u32string): New.
    [!_GLIBCXX_COMPATIBILITY_CXX0X] (std::hash<__gnu_debug::string>): New.
    [!_GLIBCXX_COMPATIBILITY_CXX0X][_GLIBCXX_USE_WCHAR_T]
    (std::hash<__gnu_debug::wstring>): New.
    [_GLIBCXX_USE_CHAR8_T] (std::hash<__gnu_debug::u8string>): New.
    (std::hash<__gnu_debug::u16string>): New.
    (std::hash<__gnu_debug::u32string>): New.
    * testsuite/21_strings/basic_string/hash/hash_char8_t.cc: Adapt for
    __gnu_debug basic_string.

Tested under Linux x86_64.

Ok to commit ?

François
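
A hedged usage sketch of what the patch enables (assuming it is applied;
the header and typedef names are taken from the diff below):

  #include <debug/string>
  #include <unordered_set>

  // The new std::hash specializations let debug strings key unordered
  // containers, mirroring the std:: typedefs.
  std::unordered_set<__gnu_debug::u16string> seen;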

diff --git a/libstdc++-v3/include/debug/string b/libstdc++-v3/include/debug/string
index d6eb5280ade..dec23f6277b 100644
--- a/libstdc++-v3/include/debug/string
+++ b/libstdc++-v3/include/debug/string
@@ -41,6 +41,14 @@
 __gnu_debug::_Error_formatter::_S_at(_File, _Line, _Func)		\
   ._M_message(#_Cond)._M_error()
 
+#if _GLIBCXX_USE_CXX11_ABI && __cplusplus >= 201103
+# define _GLIBCXX_CPP11_AND_CXX11_ABI 1
+# define _GLIBCXX_CPP11_AND_CXX11_ABI_ONLY(Statement) Statement
+#else
+# define _GLIBCXX_CPP11_AND_CXX11_ABI 0
+# define _GLIBCXX_CPP11_AND_CXX11_ABI_ONLY(Statement)
+#endif
+
 namespace __gnu_debug
 {
   /** Checks that __s is non-NULL or __n == 0, and then returns __s. */
@@ -180,7 +188,7 @@ namespace __gnu_debug
 
   basic_string(const _CharT* __s, const _Allocator& __a = _Allocator())
   : _Base(__glibcxx_check_string_constructor(__s), __a)
-  { this->assign(__s); }
+  { }
 
   basic_string(size_type __n, _CharT __c,
 		   const _Allocator& __a = _Allocator())
@@ -637,15 +645,22 @@ namespace __gnu_debug
 	  __glibcxx_check_insert_range(__p, __first, __last, __dist);
 
 	  typename _Base::iterator __res;
-#if _GLIBCXX_USE_CXX11_ABI && __cplusplus >= 201103
+#if ! _GLIBCXX_CPP11_AND_CXX11_ABI
+	  const size_type __offset = __p.base() - _Base::begin();
+#endif
 	  if (__dist.second >= __dp_sign)
-	__res = _Base::insert(__p.base(), __gnu_debug::__unsafe(__first),
-  __gnu_debug::__unsafe(__last));
+	{
+	  _GLIBCXX_CPP11_AND_CXX11_ABI_ONLY(__res =)
+		_Base::insert(__p.base(), __gnu_debug::__unsafe(__first),
+			  __gnu_debug::__unsafe(__last));
+	}
 	  else
-	__res = _Base::insert(__p.base(), __first, __last);
-#else
-	  const size_type __offset = __p.base() - _Base::begin();
-	  _Base::insert(__p.base(), __first, __last);
+	{
+	  _GLIBCXX_CPP11_AND_CXX11_ABI_ONLY(__res =)
+		_Base::insert(__p.base(), __first, __last);
+	}
+
+#if ! _GLIBCXX_CPP11_AND_CXX11_ABI
 	  __res = _Base::begin() + __offset;
 #endif
 	  this->_M_invalidate_all();
@@ -1287,6 +1302,19 @@ namespace __gnu_debug
   typedef basic_string<wchar_t> wstring;
 #endif
 
+#ifdef _GLIBCXX_USE_CHAR8_T
+  /// A string of @c char8_t
+  typedef basic_string<char8_t> u8string;
+#endif
+
+#if __cplusplus >= 201103L
+  /// A string of @c char16_t
+  typedef basic_string<char16_t> u16string;
+
+  /// A string of @c char32_t
+  typedef basic_string<char32_t> u32string;
+#endif
+
   template
 struct _Insert_range_from_self_is_safe<
   __gnu_debug::basic_string<_CharT, _Traits, _Allocator> >
@@ -1294,4 +1322,96 @@ namespace __gnu_debug
 
 } // namespace __gnu_debug
 
+#if __cplusplus >= 201103L
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  // DR 1182.
+
+#ifndef _GLIBCXX_COMPATIBILITY_CXX0X
+  /// std::hash specialization for string.
+  template<>
+struct hash<__gnu_debug::string>
+: public __hash_base
+{
+  size_t
+  operator()(const __gnu_debug::string& __s) const noexcept
+  { return std::hash()(__s); }
+};
+
+  template<>
+struct __is_fast_hash> : std::false_type
+{ };
+
+#ifdef _GLIBCXX_USE_WCHAR_T
+  /// std::hash specialization for wstring.
+  template<>
+struct hash<__gnu_debug::wstring>
+: public __hash_base
+{
+  size_t
+  operator()(const __gnu_debug::wstring& __s) const noexcept
+  { return std::hash<__gnu_debug::wstring>()(__s); }
+};
+
+  template<>
+struct __is_fast_hash> : std::false_type
+{ };
+#endif
+#endif /* _GLIBCXX_COMPATIBILITY_CXX0X */
+
+#ifdef _GLIBCXX_USE_CHAR8_T
+  

Aw: Re: [Patch, fortran] PR99602 - [11 regression] runtime error: pointer actual argument not associated

2021-03-16 Thread Harald Anlauf via Gcc-patches
Hi Tobias,

> Shouldn't there be also a testcase which triggers this run-time error?

The testcase is there, it simply has the wrong dg-foo:

! { dg-do compile }

which should be

! { dg-do run }

*-*-*

There are certainly more cases where checks should be inserted but currently
aren't, like:

subroutine p (mm)
  implicit none
  type :: m_t
  end type m_t
  type, extends (m_t) :: m2_t
  end type m2_t
  class(m_t), pointer :: mm
  select type (mm)
  type is (m2_t)
 print *, "m2_t"
  end select
end

We still have two or three weeks to solve this before the GCC 11 release!  :-)

Harald



Re: [PATCH] c++: Fix up calls to immediate functions returning reference [PR99507]

2021-03-16 Thread Jason Merrill via Gcc-patches

On 3/11/21 3:52 AM, Jakub Jelinek wrote:

Hi!

build_cxx_call calls convert_from_reference at the end, so if an immediate
function returns a reference, we were constant evaluating not just that
call, but that call wrapped in an INDIRECT_REF.  That unfortunately means
it can constant evaluate to something non-addressable, so if code later
needs to take its address it will fail.
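
An illustrative reduction of the problem (the committed test, consteval19.C,
appears in the patch below; the function name here is made up):

  constexpr int i{0};
  consteval const int &get () { return i; }
  const int *a{ &get () };	// needs the call result to stay addressable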

The following patch fixes that by undoing the convert_from_reference
wrapping for the cxx_constant_value evaluation and re-adding it at the end.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2021-03-11  Jakub Jelinek  

PR c++/99507
* call.c (build_over_call): For immediate evaluation of functions
that return references, undo convert_from_reference effects before
calling cxx_constant_value and call convert_from_reference
afterwards.

* g++.dg/cpp2a/consteval19.C: New test.

--- gcc/cp/call.c.jj2021-03-06 10:17:01.687304578 +0100
+++ gcc/cp/call.c   2021-03-10 13:51:29.121445195 +0100
@@ -9504,6 +9504,8 @@ build_over_call (struct z_candidate *can
if (immediate_invocation_p (fndecl, nargs))
{
  tree obj_arg = NULL_TREE;
+ if (REFERENCE_REF_P (call))
+   call = TREE_OPERAND (call, 0);
  if (DECL_CONSTRUCTOR_P (fndecl))
obj_arg = cand->first_arg ? cand->first_arg : (*args)[0];
  if (obj_arg && is_dummy_object (obj_arg))
@@ -9527,6 +9529,7 @@ build_over_call (struct z_candidate *can
  call = cxx_constant_value (call, obj_arg);
  if (obj_arg && !error_operand_p (call))
call = build2 (INIT_EXPR, void_type_node, obj_arg, call);
+ call = convert_from_reference (call);
}
  }
return call;
--- gcc/testsuite/g++.dg/cpp2a/consteval19.C.jj 2021-03-10 14:20:58.018835190 
+0100
+++ gcc/testsuite/g++.dg/cpp2a/consteval19.C2021-03-10 14:20:16.642294167 
+0100
@@ -0,0 +1,6 @@
+// PR c++/99507
+// { dg-do compile { target c++20 } }
+
+constexpr int i{0};
+consteval const int  () { return i; }
+const int *a{ ()};

Jakub





Re: [PATCH] tighten up checking for virtual bases (PR 99502)

2021-03-16 Thread Jason Merrill via Gcc-patches

On 3/11/21 1:06 PM, Martin Sebor wrote:

More extensive testing of the patch I just committed in r11-7563 to
avoid the false positive -Warray-bounds on accesses to members of
virtual bases has exposed a couple of problems that cause many false
negatives for even basic bugs like:

    typedef struct A { int i; } A;

    void* g (void)
    {
  A *p = (A*)malloc (3);
  p->i = 0;    // missing warning in C++
  return p;
    }

as well as a number of others, including more complex ones involving
virtual base classes that can, with some additional work, be reliably
detected.  Most of them were successfully diagnosed before the fix
for PR 98266.  Unfortunately, the tests that exercise this are all
C so the C++ only regressions due to the divergence went unnoticed.

The root cause is twofold: a) the simplistic check for TYPE_BINFO
in the new inbounds_vbase_memaccess_p() function means we exclude
from checking any member accesses whose leading offset is in bounds,
and b) the check for the leading offset misses cases where just
the ending offset of an access is out of bounds.

The attached patch adds code to restrict the original solution
to just the narrow subset of accesses to members of classes with
virtual bases, and also adds a check for the ending offset.
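
For concreteness, a hypothetical shape of the virtual-base cases the
narrowed check is meant to keep covering (whether the warning fires here
depends on the checks the patch adjusts):

  struct B { int i; };
  struct D: virtual B { int j; };

  void *g ()
  {
    D *p = (D *) __builtin_malloc (sizeof (int));   // smaller than sizeof (D)
    p->i = 0;   // the virtual base member lies beyond the allocated region
    return p;
  }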


Is it enough to just fix the ending offset check, and maybe remove 
"vbase" from the function name?  The shortcut should be valid for 
classes without vbases, as well.



Is the patch okay for trunk?

(The code for has_virtual_base() is adapted from
polymorphic_type_binfo_p() in ipa-utils.h.  It seems like a handy
utility that could go in tree.h.)

Martin

PS There is another regression I discovered while working on this.
It's a false positive but unrelated to the r11-7563 fix, so I'll
post a patch for it separately.




Re: [PATCH] c++: Ensure correct destruction order of local statics [PR99613]

2021-03-16 Thread Jason Merrill via Gcc-patches

On 3/16/21 2:08 PM, Jakub Jelinek wrote:

Hi!

As mentioned in the PR, if end of two constructions of local statics
is strongly ordered, their destructors should be run in the reverse order.
As we run __cxa_guard_release before calling __cxa_atexit, it is possible
that we have two threads that access two local statics in the same order
for the first time, one thread wins the __cxa_guard_acquire on the first
one but is rescheduled in between the __cxa_guard_release and __cxa_atexit
calls, then the other thread is scheduled and wins __cxa_guard_acquire
on the second one and calls __cxa_guard_release and __cxa_atexit and only
afterwards the first thread calls its __cxa_atexit.  This means a variable
whose completion of the constructor strongly happened after the completion
of the other one will be destructed after the other variable is destructed.

The following patch fixes that by swapping the __cxa_guard_release and
__cxa_atexit calls.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2021-03-16  Jakub Jelinek  

PR c++/99613
* decl.c (expand_static_init): For thread guards, call __cxa_atexit
before calling __cxa_guard_release rather than after it.  Formatting
fixes.

--- gcc/cp/decl.c.jj2021-03-16 00:21:29.517232582 +0100
+++ gcc/cp/decl.c   2021-03-16 15:16:40.803278426 +0100
@@ -9246,17 +9246,25 @@ expand_static_init (tree decl, tree init
  
  	  /* Do the initialization itself.  */

  init = add_stmt_to_compound (begin, init);
- init = add_stmt_to_compound
-   (init, build2 (MODIFY_EXPR, void_type_node, flag, 
boolean_true_node));
- init = add_stmt_to_compound
-   (init, build_call_n (release_fn, 1, guard_addr));
+ init = add_stmt_to_compound (init,
+  build2 (MODIFY_EXPR, void_type_node,
+  flag, boolean_true_node));
+
+ /* Use atexit to register a function for destroying this static
+variable.  Do this before calling __cxa_guard_release.  */
+ init = add_stmt_to_compound (init, register_dtor_fn (decl));
+
+ init = add_stmt_to_compound (init, build_call_n (release_fn, 1,
+  guard_addr));
}
else
-   init = add_stmt_to_compound (init, set_guard (guard));
+   {
+ init = add_stmt_to_compound (init, set_guard (guard));
  
-  /* Use atexit to register a function for destroying this static

-variable.  */
-  init = add_stmt_to_compound (init, register_dtor_fn (decl));
+ /* Use atexit to register a function for destroying this static
+variable.  */
+ init = add_stmt_to_compound (init, register_dtor_fn (decl));
+   }
  
finish_expr_stmt (init);
  


Jakub





Re: [PATCH] C++: target attribute - local decl

2021-03-16 Thread Jason Merrill via Gcc-patches

On 3/8/21 4:33 AM, Martin Liška wrote:

On 3/4/21 9:54 PM, Jason Merrill wrote:

On 3/4/21 10:52 AM, Martin Liška wrote:

On 3/4/21 4:45 PM, Jason Merrill wrote:


Sure, I guess you do need to set that flag for the local decls, but 
that should be all.


Jason


Doing that also fails :/
This time likely due to how we set RECORD argument of 
maybe_version_functions function:


gcc/cp/decl.c:    && maybe_version_functions (newdecl, olddecl,
gcc/cp/decl.c-
(!DECL_FUNCTION_VERSIONED (newdecl)
gcc/cp/decl.c- || 
!DECL_FUNCTION_VERSIONED (olddecl


That is odd.

The other problem is that DECL_LOCAL_DECL_ALIAS isn't always set 
before we get to maybe_version_functions; it's set further down in 
do_pushdecl.  We need it to be set or things break.


Oh, I see.



This seems to work for me, what do you think?


Seems good to me, please apply the patch.


Done.

Jason



[PATCH] rs6000: Workaround for PR98092

2021-03-16 Thread Segher Boessenkool
The bcdinvalid_ RTL instruction uses the "unordered" comparison,
which cannot be used if we have -ffinite-math-only.  We really need
CCMODEs that describe what bits in a CR field are set by other insns
than just comparisons, but that is a lot more surgery, and it is stage 4
now.  This patch does a simple workaround.

2021-03-16  Segher Boessenkool  

PR target/98092
* config/rs6000/predicates.md (branch_comparison_operator): Allow
ordered and unordered for CCFPmode, if flag_finite_math_only.

gcc/testsuite/
PR target/98092
* gcc.target/powerpc/pr98092.c: New.
---
 gcc/config/rs6000/predicates.md| 9 +
 gcc/testsuite/gcc.target/powerpc/pr98092.c | 7 +++
 2 files changed, 12 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr98092.c

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 69f3c709abbc..859af75dfbd1 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1208,10 +1208,11 @@ (define_special_predicate "equality_operator"
 (define_predicate "branch_comparison_operator"
(and (match_operand 0 "comparison_operator")
(match_test "GET_MODE_CLASS (GET_MODE (XEXP (op, 0))) == MODE_CC")
-   (if_then_else (match_test "GET_MODE (XEXP (op, 0)) == CCFPmode
-  && !flag_finite_math_only")
- (match_code "lt,gt,eq,unordered,unge,unle,ne,ordered")
- (match_code "lt,ltu,le,leu,gt,gtu,ge,geu,eq,ne"))
+   (if_then_else (match_test "GET_MODE (XEXP (op, 0)) == CCFPmode")
+ (if_then_else (match_test "flag_finite_math_only")
+   (match_code "lt,le,gt,ge,eq,ne,unordered,ordered")
+   (match_code "lt,gt,eq,unordered,unge,unle,ne,ordered"))
+ (match_code "lt,ltu,le,leu,gt,gtu,ge,geu,eq,ne"))
(match_test "validate_condition_mode (GET_CODE (op),
  GET_MODE (XEXP (op, 0))),
 1")))
diff --git a/gcc/testsuite/gcc.target/powerpc/pr98092.c 
b/gcc/testsuite/gcc.target/powerpc/pr98092.c
new file mode 100644
index ..03eab5a24ef7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr98092.c
@@ -0,0 +1,7 @@
+/* { dg-options "-mdejagnu-cpu=power9 -ffinite-math-only" } */
+
+int
+h9 (__attribute__ ((altivec (vector__))) char un)
+{
+  return (__builtin_vec_bcdinvalid (un));
+}
-- 
1.8.3.1



Re: Ping^2: [PATCH v2] rs6000: Convert the vector element register to SImode [PR98914]

2021-03-16 Thread Jakub Jelinek via Gcc-patches
On Thu, Mar 11, 2021 at 07:57:23AM +0800, Xionghu Luo via Gcc-patches wrote:
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index ec068c58aa5..48eb91132a9 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -7000,11 +7000,15 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, 
> rtx idx)
>  
>gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx));
>  
> -  gcc_assert (GET_MODE (idx) == E_SImode);
> -
>machine_mode inner_mode = GET_MODE (val);
>  
> -  rtx tmp = gen_reg_rtx (GET_MODE (idx));
> +  machine_mode idx_mode = GET_MODE (idx);
> +  rtx tmp = gen_reg_rtx (DImode);
> +  if (idx_mode != DImode)
> +tmp = convert_modes (DImode, idx_mode, idx, 0);
> +  else
> +tmp = idx;
> +
>int width = GET_MODE_SIZE (inner_mode);
>  
>gcc_assert (width >= 1 && width <= 8);
> @@ -7012,9 +7016,7 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, 
> rtx idx)
>int shift = exact_log2 (width);
>/* Generate the IDX for permute shift, width is the vector element size.
>   idx = idx * width.  */
> -  emit_insn (gen_ashlsi3 (tmp, idx, GEN_INT (shift)));
> -
> -  tmp = convert_modes (DImode, SImode, tmp, 1);
> +  emit_insn (gen_ashldi3 (tmp, tmp, GEN_INT (shift)));

This looks broken.
The gen_reg_rtx (DImode); call result is completely ignored,
so it wastes one pseudo, and I'm not convinced that idx nor tmp
is guaranteed to be usable as output of shift.
So, shouldn't it be instead:
  rtx tmp = gen_reg_rtx (DImode);
  if (idx_mode != DImode)
idx = convert_modes (DImode, idx_mode, idx, 0);

...
  emit_insn (gen_ashldi3 (tmp, idx, GEN_INT (shift));
?
Also, dunno what is less and what is more expensive on
rs6000, whether convert_modes with unsigned = 0 or 1.
I think rs6000 only supports very narrow vectors, so even for
say QImode idx anything with MSB set will be out of out of bounds and UB,
so you can sign or zero extend, whatever is more efficient.

> +  machine_mode idx_mode = GET_MODE (idx);
> +  rtx tmp = gen_reg_rtx (DImode);
> +  if (idx_mode != DImode)
> +tmp = convert_modes (DImode, idx_mode, idx, 0);
> +  else
> +tmp = idx;
> +
>gcc_assert (width >= 1 && width <= 4);
>  
>if (!BYTES_BIG_ENDIAN)
>  {
>/*  idx = idx * width.  */
> -  emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width)));
> +  emit_insn (gen_muldi3 (tmp, tmp, GEN_INT (width)));
>/*  idx = idx + 8.  */
> -  emit_insn (gen_addsi3 (tmp, tmp, GEN_INT (8)));
> +  emit_insn (gen_adddi3 (tmp, tmp, GEN_INT (8)));
>  }
>else
>  {
> -  emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width)));
> -  emit_insn (gen_subsi3 (tmp, GEN_INT (24 - width), tmp));
> +  emit_insn (gen_muldi3 (tmp, idx, GEN_INT (width)));
> +  emit_insn (gen_subdi3 (tmp, GEN_INT (24 - width), tmp));

Ditto.  But the above was even inconsistent, sometimes it
used tmp (for LE) and otherwise idx in whatever mode it has.

Jakub



RE: [PATCH] Ada: hashed container Cursor type predefined equality non-conformance

2021-03-16 Thread Richard Wai
Just a note that I do not have write access, so I will need someone who does
to commit this patch, if approved.

Richard

> -Original Message-
> From: Richard Wai 
> Sent: March 11, 2021 9:13 AM
> To: 'Arnaud Charlet' 
> Cc: 'gcc-patches@gcc.gnu.org' ; 'Bob Duff'
> 
> Subject: RE: [PATCH] Ada: hashed container Cursor type predefined equality
> non-conformance
> 
> Here is the amended commit log:
> 
> --
> 
> Ada: Ensure correct predefined equality behavior for Cursor objects of
> hashed containers.
> 
> 2021-03-11  Richard Wai  
> 
> gcc/ada/
>   * libgnat/a-cohase.ads (Cursor): Synchronize comments for the
> Cursor
>   type definition to be consistent with identical definitions in other
>   container packages. Add additional comments regarding the
> importance of
>   maintaining the "Position" component for predefined equality.
>   * libgnat/a-cohama.ads (Cursor): Likewise.
>   * libgnat/a-cihama.ads (Cursor): Likewise.
>   * libgnat/a-cohase.adb (Find, Insert): Ensure that Cursor objects
>   always have their "Position" component set to ensure predefined
>   equality works as required.
>   * libgnat/a-cohama.adb (Find, Insert): Likewise.
>   * libgnat/a-cihama.adb (Find, Insert): Likewise.
> 
> gcc/testsuite/
>   * gnat.dg/containers2.adb: New test.
> 
> --
> 
> Thanks!
> 
> Richard
> 
> > -Original Message-
> > From: Arnaud Charlet 
> > Sent: March 10, 2021 11:27 AM
> > To: Richard Wai 
> > Cc: gcc-patches@gcc.gnu.org; 'Bob Duff' ; Arnaud
> > Charlet 
> > Subject: Re: [PATCH] Ada: hashed container Cursor type predefined
> > equality non-conformance
> >
> > > I'm not sure I correctly understand you here, but my interpretation
> > > is that I should no longer submit Changelog entries, rather just the
> > > patch, and then
> >
> > Right.
> >
> > > a commit message (a-la git), and then presumably the Changelong
> > > entries will be generated automatically. From what I can see, gcc'
> > > website does not talk about that, so I'm guessing this format based
> > > on what I see from git-log, generally.
> > >
> > > So assuming that, attached is the "correct" patch, and here is the
> > > commit
> > > message:
> > >
> > > ---
> > >
> > > Author: Richard Wai 
> > >
> > > Ada: Ensure correct Cursor predefined equality behavior for hashed
> > > containers.
> > >
> > > --
> > >
> > > And for the record, the change log entries I've come up with as per
> > > the previous email:
> >
> > And the commit log will look like:
> >
> > 2021-03-09  Richard Wai  
> >
> > gcc/ada/
> > * libgnat/...
> >
> > gcc/testsuite/
> > * gnat.dg/containers2.adb: ...
> >
> > Your patch is OK with these changes, thanks.



[PATCH] c++: Ensure correct destruction order of local statics [PR99613]

2021-03-16 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in the PR, if end of two constructions of local statics
is strongly ordered, their destructors should be run in the reverse order.
As we run __cxa_guard_release before calling __cxa_atexit, it is possible
that we have two threads that access two local statics in the same order
for the first time, one thread wins the __cxa_guard_acquire on the first
one but is rescheduled in between the __cxa_guard_release and __cxa_atexit
calls, then the other thread is scheduled and wins __cxa_guard_acquire
on the second one and calls __cxa_guard_release and __cxa_atexit and only
afterwards the first thread calls its __cxa_atexit.  This means a variable
whose completion of the constructor strongly happened after the completion
of the other one will be destructed after the other variable is destructed.
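
An illustrative sketch of the scenario (not the PR's testcase): if the
construction of a()'s static completes strongly before the construction of
b()'s static, then b()'s destructor must run first at exit -- the race
window was between __cxa_guard_release and __cxa_atexit.

  struct S { S (); ~S (); };

  S &a () { static S s; return s; }	// construction completes first -> destroyed last
  S &b () { static S s; return s; }	// construction completes second -> destroyed first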

The following patch fixes that by swapping the __cxa_guard_release and
__cxa_atexit calls.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-03-16  Jakub Jelinek  

PR c++/99613
* decl.c (expand_static_init): For thread guards, call __cxa_atexit
before calling __cxa_guard_release rather than after it.  Formatting
fixes.

--- gcc/cp/decl.c.jj2021-03-16 00:21:29.517232582 +0100
+++ gcc/cp/decl.c   2021-03-16 15:16:40.803278426 +0100
@@ -9246,17 +9246,25 @@ expand_static_init (tree decl, tree init
 
  /* Do the initialization itself.  */
  init = add_stmt_to_compound (begin, init);
- init = add_stmt_to_compound
-   (init, build2 (MODIFY_EXPR, void_type_node, flag, 
boolean_true_node));
- init = add_stmt_to_compound
-   (init, build_call_n (release_fn, 1, guard_addr));
+ init = add_stmt_to_compound (init,
+  build2 (MODIFY_EXPR, void_type_node,
+  flag, boolean_true_node));
+
+ /* Use atexit to register a function for destroying this static
+variable.  Do this before calling __cxa_guard_release.  */
+ init = add_stmt_to_compound (init, register_dtor_fn (decl));
+
+ init = add_stmt_to_compound (init, build_call_n (release_fn, 1,
+  guard_addr));
}
   else
-   init = add_stmt_to_compound (init, set_guard (guard));
+   {
+ init = add_stmt_to_compound (init, set_guard (guard));
 
-  /* Use atexit to register a function for destroying this static
-variable.  */
-  init = add_stmt_to_compound (init, register_dtor_fn (decl));
+ /* Use atexit to register a function for destroying this static
+variable.  */
+ init = add_stmt_to_compound (init, register_dtor_fn (decl));
+   }
 
   finish_expr_stmt (init);
 

Jakub



[PATCH] c-family: Fix up -Wduplicated-branches for union members [PR99565]

2021-03-16 Thread Jakub Jelinek via Gcc-patches
Hi!

Honza has fairly recently changed operand_equal_p to compare
DECL_FIELD_OFFSET for COMPONENT_REFs when comparing addresses.
As the first testcase in this patch shows, while that is very nice
for optimizations, for the -Wduplicated-branches warning it causes
regressions.  Pedantically a union in both C and C++ has only one
active member at a time, so using some other union member even if it has the
same type is UB, so I think the warning shouldn't warn when it sees access
to different fields that happen to have the same offset and should consider
them different.
In my first attempt to fix this I've keyed the old behavior on
OEP_LEXICOGRAPHIC, but unfortunately that has various problems, the warning
has a quick non-lexicographic compare in build_conditional_expr* and another
lexicographic more expensive one later during genericization and turning the
first one into lexicographic would mean wasting compile time on large
conditionals.
So, this patch instead introduces a new OEP_ flag and makes sure to pass it
to operand_equal_p in all -Wduplicated-branches cases.

The cvt.c changes are because on the other testcase we were warning with
UNKNOWN_LOCATION, so the user wouldn't really know where the questionable
code is.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-03-16  Jakub Jelinek  

PR c++/99565
* tree-core.h (enum operand_equal_flag): Add OEP_DUPLICATED_BRANCHES.
* fold-const.c (operand_compare::operand_equal_p): Don't compare
field offsets if OEP_DUPLICATED_BRANCHES.

* c-warn.c (do_warn_duplicated_branches): Pass also
OEP_DUPLICATED_BRANCHES to operand_equal_p.

* c-typeck.c (build_conditional_expr): Pass OEP_DUPLICATED_BRANCHES
to operand_equal_p.

* call.c (build_conditional_expr_1): Pass OEP_DUPLICATED_BRANCHES
to operand_equal_p.
* cvt.c (convert_to_void): Preserve location_t on COND_EXPR or
or COMPOUND_EXPR.

* g++.dg/warn/Wduplicated-branches6.C: New test.
* g++.dg/warn/Wduplicated-branches7.C: New test.

--- gcc/tree-core.h.jj  2021-01-16 22:52:33.654413400 +0100
+++ gcc/tree-core.h 2021-03-16 16:24:06.136829900 +0100
@@ -896,7 +896,10 @@ enum operand_equal_flag {
   OEP_HASH_CHECK = 32,
   /* Makes operand_equal_p handle more expressions:  */
   OEP_LEXICOGRAPHIC = 64,
-  OEP_BITWISE = 128
+  OEP_BITWISE = 128,
+  /* Considers different fields different even when they have
+ the same offset.  */
+  OEP_DUPLICATED_BRANCHES = 256
 };
 
 /* Enum and arrays used for tree allocation stats.
--- gcc/fold-const.c.jj 2021-02-24 12:10:17.571212194 +0100
+++ gcc/fold-const.c2021-03-16 16:19:16.327018956 +0100
@@ -3317,7 +3317,7 @@ operand_compare::operand_equal_p (const_
flags &= ~OEP_ADDRESS_OF;
if (!OP_SAME (1))
  {
-   if (compare_address)
+   if (compare_address && (flags & OEP_DUPLICATED_BRANCHES) == 0)
  {
if (TREE_OPERAND (arg0, 2)
|| TREE_OPERAND (arg1, 2))
--- gcc/c-family/c-warn.c.jj2021-02-12 23:57:30.459142340 +0100
+++ gcc/c-family/c-warn.c   2021-03-16 16:24:19.224685926 +0100
@@ -2779,7 +2779,8 @@ do_warn_duplicated_branches (tree expr)
 
   /* Compare the hashes.  */
   if (h0 == h1
-  && operand_equal_p (thenb, elseb, OEP_LEXICOGRAPHIC)
+  && operand_equal_p (thenb, elseb, OEP_LEXICOGRAPHIC
+   | OEP_DUPLICATED_BRANCHES)
   /* Don't warn if any of the branches or their subexpressions comes
 from a macro.  */
   && !walk_tree_without_duplicates (, expr_from_macro_expansion_r,
--- gcc/c/c-typeck.c.jj 2021-02-18 22:17:44.184657291 +0100
+++ gcc/c/c-typeck.c2021-03-16 16:23:58.335915719 +0100
@@ -5490,7 +5490,7 @@ build_conditional_expr (location_t colon
  warn here, because the COND_EXPR will be turned into OP1.  */
   if (warn_duplicated_branches
   && TREE_CODE (ret) == COND_EXPR
-  && (op1 == op2 || operand_equal_p (op1, op2, 0)))
+  && (op1 == op2 || operand_equal_p (op1, op2, OEP_DUPLICATED_BRANCHES)))
 warning_at (EXPR_LOCATION (ret), OPT_Wduplicated_branches,
"this condition has identical branches");
 
--- gcc/cp/call.c.jj2021-03-12 10:11:17.642122302 +0100
+++ gcc/cp/call.c   2021-03-16 16:23:49.647011304 +0100
@@ -5798,7 +5798,8 @@ build_conditional_expr_1 (const op_locat
  warn here, because the COND_EXPR will be turned into ARG2.  */
   if (warn_duplicated_branches
   && (complain & tf_warning)
-  && (arg2 == arg3 || operand_equal_p (arg2, arg3, 0)))
+  && (arg2 == arg3 || operand_equal_p (arg2, arg3,
+  OEP_DUPLICATED_BRANCHES)))
 warning_at (EXPR_LOCATION (result), OPT_Wduplicated_branches,
"this condition has identical branches");
 
--- gcc/cp/cvt.c.jj 2021-03-04 16:03:40.089323550 +0100
+++ gcc/cp/cvt.c

Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]

2021-03-16 Thread Christophe Lyon via Gcc-patches
On Tue, 16 Mar 2021 at 18:38, Jakub Jelinek  wrote:
>
> On Tue, Mar 16, 2021 at 06:34:34PM +0100, Christophe Lyon wrote:
> > On Sat, 13 Mar 2021 at 21:34, Jakub Jelinek via Gcc-patches
> > Unfortunately, the last new warning does not happen when running the
> > tests with -mabi=ilp32.
> > Is that small patch OK?
>
> With a proper ChangeLog entry, sure.
> Though perhaps it would be easier to do && lp64.  Your choice.

Indeed, I'll do that.

Thanks

>
> > diff --git a/gcc/testsuite/gcc.dg/declare-simd.c
> > b/gcc/testsuite/gcc.dg/declare-simd.c
> > index 52796f6..894fb64 100644
> > --- a/gcc/testsuite/gcc.dg/declare-simd.c
> > +++ b/gcc/testsuite/gcc.dg/declare-simd.c
> > @@ -3,7 +3,7 @@
> >
> >  #pragma omp declare simd linear (p2, p3)
> >  extern void fn2 (float p1, float *p2, float *p3);
> > -/* { dg-warning "GCC does not currently support mixed size types for
> > 'simd' functions" "" { target aarch64*-*-* } .-1 } */
> > +/* { dg-warning "GCC does not currently support mixed size types for
> > 'simd' functions" "" { target { { aarch64*-*-* } && { ! ilp32 } } }
> > .-1 } */
> >
> >  float *a, *b;
> >  void fn1 (float *p1)
>
> Jakub
>


Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]

2021-03-16 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 16, 2021 at 06:34:34PM +0100, Christophe Lyon wrote:
> On Sat, 13 Mar 2021 at 21:34, Jakub Jelinek via Gcc-patches
> Unfortunately, the last new warning does not happen when running the
> tests with -mabi=ilp32.
> Is that small patch OK?

With a proper ChangeLog entry, sure.
Though perhaps it would be easier to do && lp64.  Your choice.

> diff --git a/gcc/testsuite/gcc.dg/declare-simd.c
> b/gcc/testsuite/gcc.dg/declare-simd.c
> index 52796f6..894fb64 100644
> --- a/gcc/testsuite/gcc.dg/declare-simd.c
> +++ b/gcc/testsuite/gcc.dg/declare-simd.c
> @@ -3,7 +3,7 @@
> 
>  #pragma omp declare simd linear (p2, p3)
>  extern void fn2 (float p1, float *p2, float *p3);
> -/* { dg-warning "GCC does not currently support mixed size types for
> 'simd' functions" "" { target aarch64*-*-* } .-1 } */
> +/* { dg-warning "GCC does not currently support mixed size types for
> 'simd' functions" "" { target { { aarch64*-*-* } && { ! ilp32 } } }
> .-1 } */
> 
>  float *a, *b;
>  void fn1 (float *p1)

Jakub



Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]

2021-03-16 Thread Christophe Lyon via Gcc-patches
On Sat, 13 Mar 2021 at 21:34, Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> As the patch shows, there are several bugs in
> aarch64_simd_clone_compute_vecsize_and_simdlen.
> One is that unlike for function declarations that aren't definitions
> it completely ignores argument types.  Such decls don't have DECL_ARGUMENTS,
> but we can walk TYPE_ARG_TYPES instead, like the i386 backend does or like
> the simd cloning code in the middle end does too.
>
> Another problem is that it checks types of uniform arguments.  That is
> unnecessary, uniform arguments are passed the way it normally is, it is
> a scalar argument rather than vector, so there is no reason not to support
> uniform argument of different size, or long double, structure etc.
>
> Fixed thusly, bootstrapped/regtested on aarch64-linux, ok for trunk?
>
> 2021-03-13  Jakub Jelinek  
>
> PR target/99542
> * config/aarch64/aarch64.c
> (aarch64_simd_clone_compute_vecsize_and_simdlen): If not a function
> definition, walk TYPE_ARG_TYPES list if non-NULL for argument types
> instead of DECL_ARGUMENTS.  Ignore types for uniform arguments.
>
> * gcc.dg/gomp/pr99542.c: New test.
> * gcc.dg/gomp/pr59669-2.c (bar): Don't expect a warning on aarch64.
> * gcc.dg/gomp/simd-clones-2.c (setArray): Likewise.
> * g++.dg/vect/simd-clone-7.cc (bar): Likewise.
> * g++.dg/gomp/declare-simd-1.C (f37): Expect a different warning
> on aarch64.
> * gcc.dg/declare-simd.c (fn2): Expect a new warning on aarch64.
>
> --- gcc/config/aarch64/aarch64.c.jj 2021-03-12 19:01:48.672296344 +0100
> +++ gcc/config/aarch64/aarch64.c2021-03-13 09:16:42.585045538 +0100
> @@ -23412,11 +23412,17 @@ aarch64_simd_clone_compute_vecsize_and_s
>return 0;
>  }
>
> -  for (t = DECL_ARGUMENTS (node->decl); t; t = DECL_CHAIN (t))
> +  int i;
> +  tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
> +  bool decl_arg_p = (node->definition || type_arg_types == NULL_TREE);
> +
> +  for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 
> 0;
> +   t && t != void_list_node; t = TREE_CHAIN (t), i++)
>  {
> -  arg_type = TREE_TYPE (t);
> +  tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
>
> -  if (!currently_supported_simd_type (arg_type, base_type))
> +  if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
> + && !currently_supported_simd_type (arg_type, base_type))
> {
>   if (TYPE_SIZE (arg_type) != TYPE_SIZE (base_type))
> warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> --- gcc/testsuite/gcc.dg/gomp/pr99542.c.jj  2021-03-13 09:16:42.586045527 
> +0100
> +++ gcc/testsuite/gcc.dg/gomp/pr99542.c 2021-03-13 09:16:42.586045527 +0100
> @@ -0,0 +1,17 @@
> +/* PR middle-end/89246 */
> +/* { dg-do compile { target int128 } } */
> +/* { dg-options "-O0 -fopenmp-simd" } */
> +
> +#pragma omp declare simd
> +extern int foo (__int128 x);   /* { dg-warning "GCC does not currently 
> support mixed size types for 'simd' function" "" { target aarch64*-*-* } } */
> +/* { dg-warning "unsupported argument type '__int128' for simd" "" { target 
> i?86-*-* x86_64-*-* } .-1 } */
> +
> +#pragma omp declare simd uniform (x)
> +extern int baz (__int128 x);
> +
> +#pragma omp declare simd
> +int
> +bar (int x)
> +{
> +  return x + foo (0) + baz (0);
> +}
> --- gcc/testsuite/gcc.dg/gomp/pr59669-2.c.jj2020-01-12 11:54:37.430398065 
> +0100
> +++ gcc/testsuite/gcc.dg/gomp/pr59669-2.c   2021-03-13 09:35:07.100807877 
> +0100
> @@ -7,4 +7,3 @@ void
>  bar (int *a)
>  {
>  }
> -/* { dg-warning "GCC does not currently support mixed size types for 'simd' 
> functions" "" { target aarch64*-*-* } .-3 } */
> --- gcc/testsuite/gcc.dg/gomp/simd-clones-2.c.jj2020-01-12 
> 11:54:37.431398050 +0100
> +++ gcc/testsuite/gcc.dg/gomp/simd-clones-2.c   2021-03-13 09:36:21.143988256 
> +0100
> @@ -15,7 +15,6 @@ float setArray(float *a, float x, int k)
>return a[k];
>  }
>
> -/* { dg-warning "GCC does not currently support mixed size types for 'simd' 
> functions" "" { target aarch64*-*-* } .-6 } */
>  /* { dg-final { scan-tree-dump "_ZGVbN4ua32vl_setArray" "optimized" { target 
> i?86-*-* x86_64-*-* } } } */
>  /* { dg-final { scan-tree-dump "_ZGVbN4vvva32_addit" "optimized" { target 
> i?86-*-* x86_64-*-* } } } */
>  /* { dg-final { scan-tree-dump "_ZGVbM4vl66u_addit" "optimized" { target 
> i?86-*-* x86_64-*-* } } } */
> --- gcc/testsuite/g++.dg/vect/simd-clone-7.cc.jj2020-01-12 
> 11:54:37.280400328 +0100
> +++ gcc/testsuite/g++.dg/vect/simd-clone-7.cc   2021-03-13 09:23:58.005214442 
> +0100
> @@ -8,5 +8,3 @@ bar (float x, float *y, int)
>  {
>return y[0] + y[1] * x;
>  }
> -// { dg-warning "GCC does not currently support mixed size types for 'simd' 
> functions" "" { target { { aarch64*-*-* } && lp64 } } .-4 }
> -
> --- gcc/testsuite/g++.dg/gomp/declare-simd-1.C.jj   

Re: [RFC] ldist: Recognize rawmemchr loop patterns

2021-03-16 Thread Stefan Schulze Frielinghaus via Gcc-patches
[snip]

Please find attached a new version of the patch.  A major change compared to
the previous patch is that I created a separate pass, which hopefully also
makes reviewing easier since it is almost self-contained.  After realizing
that detecting loops which mimic the behavior of rawmemchr/strlen functions
does not really fit into the topic of loop distribution, I moved the
detection into its own pass.  Due to this I was also able to play around a
bit and schedule the pass at different times.  Currently it is scheduled
right before loop distribution, where loop header copying has already taken
place, which leads to the following effect.  Running this setup over

char *t (char *p)
{
  for (; *p; ++p);
  return p;
}

the new pass transforms

char * t (char * p)
{
  char _1;
  char _7;

   [local count: 118111600]:
  _7 = *p_3(D);
  if (_7 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119324]:

   [local count: 955630225]:
  # p_8 = PHI 
  p_6 = p_8 + 1;
  _1 = *p_6;
  if (_1 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119324]:
  # p_2 = PHI 
  goto ; [100.00%]

   [local count: 850510901]:
  goto ; [100.00%]

   [local count: 12992276]:

   [local count: 118111600]:
  # p_9 = PHI 
  return p_9;

}

into

char * t (char * p)
{
  char * _5;
  char _7;

   [local count: 118111600]:
  _7 = *p_3(D);
  if (_7 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119324]:
  _5 = p_3(D) + 1;
  p_10 = .RAWMEMCHR (_5, 0);

   [local count: 118111600]:
  # p_9 = PHI 
  return p_9;

}

which is fine so far.  However, I haven't made up my mind yet whether it is
worthwhile to spend more time in order to also eliminate the "first unrolling"
of the loop.  I gave it a shot by scheduling the pass prior to the loop
header copying pass and ended up with:

char * t (char * p)
{
   [local count: 118111600]:
  p_5 = .RAWMEMCHR (p_3(D), 0);
  return p_5;

}

which seems optimal to me.  The downside of this is that I have to initialize
scalar evolution analysis which might be undesired that early.

All this brings me to the question of where you see this piece of code running.
If in a separate pass, when would you schedule it?  If in an existing pass,
which one would you choose?

Another topic which came up is whether there exists a more elegant solution to
my current implementation in order to deal with stores (I'm speaking of the `if
(store_dr)` statement inside of function transform_loop_1).  For example,

extern char *p;
char *t ()
{
  for (; *p; ++p);
  return p;
}

ends up as

char * t ()
{
  char * _1;
  char * _2;
  char _3;
  char * p.1_8;
  char _9;
  char * p.1_10;
  char * p.1_11;

   [local count: 118111600]:
  p.1_8 = p;
  _9 = *p.1_8;
  if (_9 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119324]:

   [local count: 955630225]:
  # p.1_10 = PHI <_1(6), p.1_8(5)>
  _1 = p.1_10 + 1;
  p = _1;
  _3 = *_1;
  if (_3 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119324]:
  # _2 = PHI <_1(3)>
  goto ; [100.00%]

   [local count: 850510901]:
  goto ; [100.00%]

   [local count: 12992276]:

   [local count: 118111600]:
  # p.1_11 = PHI <_2(8), p.1_8(7)>
  return p.1_11;

}

where a load and a store occur inside the loop.  For a rawmemchr-like loop I
have to show that we never load from a memory location to which we write.
Currently I solve this by hard-coding those facts, which is not generic at all.
I gave compute_data_dependences_for_loop a try, but it failed to determine
that stores only happen to p[0] while loads are from p[i] with i > 0.  Maybe
there is a more generic way to express this than my current approach?
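
For reference, the semantics which the emitted .RAWMEMCHR call is assumed
to provide are essentially those of glibc's rawmemchr; a minimal sketch for
the byte case (assuming the searched-for value is guaranteed to occur) is:

static char *
rawmemchr_ref (const char *s, int c)
{
  /* No length argument: the caller guarantees that C occurs in the
     buffer, so the loop always terminates.  */
  while (*s != (char) c)
    ++s;
  return (char *) s;
}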

Thanks again for your input so far.  Really appreciated.

Cheers,
Stefan
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8a5fb3fd99c..7b2d7405277 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1608,6 +1608,7 @@ OBJS = \
tree-into-ssa.o \
tree-iterator.o \
tree-loop-distribution.o \
+   tree-loop-pattern.o \
tree-nested.o \
tree-nrv.o \
tree-object-size.o \
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index dd7173126fb..957e96a46a4 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2917,6 +2917,33 @@ expand_VEC_CONVERT (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+void
+expand_RAWMEMCHR (internal_fn, gcall *stmt)
+{
+  expand_operand ops[3];
+
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+  tree lhs_type = TREE_TYPE (lhs);
+  rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], lhs_rtx, TYPE_MODE (lhs_type));
+
+  for (unsigned int i = 0; i < 2; ++i)
+{
+  tree rhs = gimple_call_arg (stmt, i);
+  tree rhs_type = TREE_TYPE (rhs);
+  rtx rhs_rtx = expand_normal (rhs);
+  create_input_operand (&ops[i + 1], rhs_rtx, TYPE_MODE (rhs_type));
+}
+
+  insn_code icode = direct_optab_handler (rawmemchr_optab, ops[2].mode);
+
+  expand_insn (icode, 

Re: [Patch, fortran] PR99602 - [11 regression] runtime error: pointer actual argument not associated

2021-03-16 Thread Tobias Burnus

Hi Paul,

On 16.03.21 17:42, Paul Richard Thomas via Gcc-patches wrote:

Fortran: Fix runtime errors for class actual arguments [PR99602].
* trans-expr.c (gfc_conv_procedure_call): For class formal
arguments, use the _data field attributes for runtime errors.
* gfortran.dg/pr99602.f90: New test.


Shouldn't there be also a testcase which triggers this run-time error?

I might have messed up my testcase, but I think it should trigger?
(Attached is an attempt to pass the nullified pointer as actual argument
to a non-pointer argument; otherwise it is the same testcase as before.)

Otherwise, at a glance, it looked sensible.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
! { dg-do compile }
! { dg-options "-fcheck=pointer -fdump-tree-original" }
!
! Test fix of PR99602, where a spurious runtime error was introduced
! by PR99112. This is the testcase in comment #6 of the PR.
!
! Contributed by Jeurgen Reuter  
!
module m
  implicit none
  private
  public :: m_t
  type :: m_t
 integer :: ii(100)
  end type m_t
end module m

module m2_testbed
  use m
  implicit none
  private
  public :: prepare_m2
  procedure (prepare_m2_proc), pointer :: prepare_m2 => null ()

  abstract interface
 subroutine prepare_m2_proc (m2)
   import
   class(m_t), intent(inout) :: m2
 end subroutine prepare_m2_proc
  end interface

end module m2_testbed

module a
  use m
  use m2_testbed, only: prepare_m2
  implicit none
  private
  public :: a_1

contains

  subroutine a_1 ()
class(m_t), pointer :: mm
mm => null ()
call prepare_m2 (mm) ! Runtime error triggered here
  end subroutine a_1

end module a


module m2
  use m
  implicit none
  private
  public :: m2_t

  type, extends (m_t) :: m2_t
 private
   contains
 procedure :: read => m2_read
  end type m2_t
contains

  subroutine m2_read (mm)
class(m2_t), intent(out), target :: mm
  end subroutine m2_read
end module m2

program main
  use m2_testbed
  use a, only: a_1
  implicit none
  prepare_m2 => prepare_whizard_m2
  call a_1 ()

contains

  subroutine prepare_whizard_m2 (mm)
use m
use m2
class(m_t), intent(inout) :: mm
!if (.not. associated (mm))  allocate (m2_t :: mm)
mm%ii = 100
select type (mm)
type is (m2_t)
   call mm%read ()
end select
  end subroutine prepare_whizard_m2
end program main
! { dg-final { scan-tree-dump-times "_gfortran_runtime_error_at" 0 "original" } }
! { dg-final { scan-tree-dump-times "Pointer actual argument" 0 "original" } }


Re: [PATCH v8] Practical improvement to libgcc complex divide

2021-03-16 Thread Patrick McGehearty via Gcc-patches

Ping...

On 2/22/2021 5:08 PM, Patrick McGehearty via Gcc-patches wrote:

Changes in this version from Version 7:
 gcc/c-family/c-cppbuiltin.c
Changed modename to float_h_prefix
Removed (mode == TYPE_MODE...) code left over from earlier approach.
 libgcc/libgcc2.c
Removed conditional use of XF constants in TF case.
Also left over from earlier approach.

Tested _Float64x case on x86. Works as expected.

Correctness and performance test programs used during development of
this project may be found in the attachment to:
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg254210.html

Summary of Purpose

This patch to libgcc/libgcc2.c __divdc3 improves the quality of answers
from the default complex divide routines (half, float, double, extended,
long double precisions) when dealing with very large or very small exponents.

The current code correctly implements Smith's method (1962) [2]
further modified by c99's requirements for dealing with NaN (not a
number) results. When working with input values where the exponents
are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are
substantially different from the answers provided by quad precision
more than 1% of the time. This error rate may be unacceptable for many
applications that cannot a priori restrict their computations to the
safe range. The proposed method reduces the frequency of
"substantially different" answers by more than 99% for double
precision at a modest cost of performance.

Differences between current gcc methods and the new method will be
described. Then accuracy and performance differences will be discussed.

Background

This project started with an investigation related to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714.  Study of Beebe[1]
provided an overview of past and recent practice for computing complex
divide. The current glibc implementation is based on Robert Smith's
algorithm [2] from 1962.  A google search found the paper by Baudin
and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's
proposed patch [4] is based on that paper.

I developed two sets of test data by randomly distributing values over
a restricted range and the full range of input values. The current
complex divide handled the restricted range well enough, but failed on
the full range more than 1% of the time. Baudin and Smith's primary
test for "ratio" equals zero reduced the cases with 16 or more error
bits by a factor of 5, but still left too many flawed answers. Adding
debug print out to cases with substantial errors allowed me to see the
intermediate calculations for test values that failed. I noted that
for many of the failures, "ratio" was a subnormal. Changing the
"ratio" test from check for zero to check for subnormal reduced the 16
bit error rate by another factor of 12. This single modified test
provides the greatest benefit for the least cost, but the percentage
of cases with greater than 16 bit errors (double precision data) is
still greater than 0.027% (2.7 in 10,000).
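
To make the shape of the change concrete, here is a minimal sketch of
Smith's method with the modified ratio test.  This is illustrative only,
not the libgcc code, and it omits the NaN handling and the additional
scaling described below:

#include <float.h>
#include <math.h>

/* Illustrative sketch of Smith's (1962) method for (a + i*b) / (c + i*d),
   with the "ratio" test changed from a compare against zero to a check
   for a zero-or-subnormal ratio.  */
static void
smith_div (double a, double b, double c, double d, double *x, double *y)
{
  double r, t;
  if (fabs (c) >= fabs (d))
    {
      r = d / c;
      t = 1.0 / (c + d * r);
      if (fabs (r) < DBL_MIN)      /* r is zero or subnormal.  */
        {
          /* b*r and a*r would lose most or all of their bits, so
             compute those products in a different order.  */
          *x = (a + d * (b / c)) * t;
          *y = (b - d * (a / c)) * t;
        }
      else
        {
          *x = (a + b * r) * t;
          *y = (b - a * r) * t;
        }
    }
  else
    {
      r = c / d;
      t = 1.0 / (c * r + d);
      if (fabs (r) < DBL_MIN)
        {
          *x = (c * (a / d) + b) * t;
          *y = (c * (b / d) - a) * t;
        }
      else
        {
          *x = (a * r + b) * t;
          *y = (b * r - a) * t;
        }
    }
}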

Continued examination of remaining errors and their intermediate
computations led to the various tests of input value tests and scaling
to avoid under/overflow. The current patch does not handle some of the
rare and most extreme combinations of input values, but the random
test data is only showing 1 case in 10 million that has an error of
greater than 12 bits. That case has 18 bits of error and is due to
subtraction cancellation. These results are significantly better
than the results reported by Baudin and Smith.

Support for half, float, double, extended, and long double precision
is included as all are handled with suitable preprocessor symbols in a
single source routine. Since half precision is computed with float
precision as per current libgcc practice, the enhanced algorithm
provides no benefit for half precision and would cost performance.
Further investigation showed changing the half precision algorithm
to use the simple formula (real=a*c+b*d imag=b*c-a*d) caused no
loss of precision and modest improvement in performance.
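
Spelled out with the implied division by the squared magnitude, and using
the same a, b, c, d naming as above, that simple formula is (illustrative
only):

/* Textbook complex divide, as described above for the half-precision case;
   the arithmetic is done in float, whose extra range and precision cover
   all half-precision inputs, so the intermediate products cannot overflow
   or underflow.  */
static void
simple_div (float a, float b, float c, float d, float *x, float *y)
{
  float denom = c * c + d * d;
  *x = (a * c + b * d) / denom;
  *y = (b * c - a * d) / denom;
}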

The existing constants for each precision:
float: FLT_MAX, FLT_MIN;
double: DBL_MAX, DBL_MIN;
extended and/or long double: LDBL_MAX, LDBL_MIN
are used for avoiding the more common overflow/underflow cases.  This
use is made generic by defining appropriate __LIBGCC2_* macros in
c-cppbuiltin.c.

Tests are added for when both parts of the denominator have exponents
small enough that all input values can be scaled up without risking
overflow, shifting any subnormal values to normal values. That
gained a clear improvement in accuracy. Similarly, when either
numerator was subnormal and the other numerator and both denominator
values were not too large, scaling could be used to reduce risk of
computing with subnormals.  The test and scaling values used all fit
within the allowed exponent range for each precision required by the C
standard.


[Patch, fortran] PR99602 - [11 regression] runtime error: pointer actual argument not associated

2021-03-16 Thread Paul Richard Thomas via Gcc-patches
Hi Everybody,

Although this is 'obvious' I thought that I should post it because I
believe that it was triggered by the fix for PR99602 but I just do not have
the bandwidth at the moment to test that. The ChangeLog together with the
patch is more than sufficient explanation.

Regtests OK on FC33/x86_64. OK for 11-branch?

Paul

Fortran: Fix runtime errors for class actual arguments [PR99602].

2021-03-16  Paul Thomas  

gcc/fortran
PR fortran/99602
* trans-expr.c (gfc_conv_procedure_call): For class formal
arguments, use the _data field attributes for runtime errors.

gcc/testsuite/
PR fortran/99602
* gfortran.dg/pr99602.f90: New test.
! { dg-do compile }
! { dg-options "-fcheck=pointer -fdump-tree-original" }
!
! Test fix of PR99602, where a spurious runtime error was introduced
! by PR99112. This is the testcase in comment #6 of the PR.
!
! Contributed by Jeurgen Reuter  
!
module m
  implicit none
  private
  public :: m_t
  type :: m_t
 private
  end type m_t
end module m

module m2_testbed
  use m
  implicit none
  private
  public :: prepare_m2
  procedure (prepare_m2_proc), pointer :: prepare_m2 => null ()

  abstract interface
 subroutine prepare_m2_proc (m2)
   import
   class(m_t), intent(inout), pointer :: m2
 end subroutine prepare_m2_proc
  end interface

end module m2_testbed

module a
  use m
  use m2_testbed, only: prepare_m2
  implicit none
  private
  public :: a_1

contains

  subroutine a_1 ()
class(m_t), pointer :: mm
mm => null ()
call prepare_m2 (mm) ! Runtime error triggered here
  end subroutine a_1

end module a


module m2
  use m
  implicit none
  private
  public :: m2_t

  type, extends (m_t) :: m2_t
 private
   contains
 procedure :: read => m2_read
  end type m2_t
contains

  subroutine m2_read (mm)
class(m2_t), intent(out), target :: mm
  end subroutine m2_read
end module m2

program main
  use m2_testbed
  use a, only: a_1
  implicit none
  prepare_m2 => prepare_whizard_m2
  call a_1 ()

contains

  subroutine prepare_whizard_m2 (mm)
use m
use m2
class(m_t), intent(inout), pointer :: mm
if (.not. associated (mm))  allocate (m2_t :: mm)
select type (mm)
type is (m2_t)
   call mm%read ()
end select
  end subroutine prepare_whizard_m2
end program main
! { dg-final { scan-tree-dump-times "_gfortran_runtime_error_at" 0 "original" } }
! { dg-final { scan-tree-dump-times "Pointer actual argument" 0 "original" } }
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index bffe0808dff..0cf17008b05 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -6663,6 +6663,15 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 	  char *msg;
 	  tree cond;
 	  tree tmp;
+	  symbol_attribute fsym_attr;
+
+	  if (fsym)
+	{
+	  if (fsym->ts.type == BT_CLASS && !UNLIMITED_POLY (fsym))
+		fsym_attr = CLASS_DATA (fsym)->attr;
+	  else
+		fsym_attr = fsym->attr;
+	}
 
 	  if (e->expr_type == EXPR_VARIABLE || e->expr_type == EXPR_FUNCTION)
 	attr = gfc_expr_attr (e);
@@ -6685,17 +6694,17 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 	  tree present, null_ptr, type;
 
 	  if (attr.allocatable
-		  && (fsym == NULL || !fsym->attr.allocatable))
+		  && (fsym == NULL || !fsym_attr.allocatable))
 		msg = xasprintf ("Allocatable actual argument '%s' is not "
  "allocated or not present",
  e->symtree->n.sym->name);
 	  else if (attr.pointer
-		   && (fsym == NULL || !fsym->attr.pointer))
+		   && (fsym == NULL || !fsym_attr.pointer))
 		msg = xasprintf ("Pointer actual argument '%s' is not "
  "associated or not present",
  e->symtree->n.sym->name);
 	  else if (attr.proc_pointer
-		   && (fsym == NULL || !fsym->attr.proc_pointer))
+		   && (fsym == NULL || !fsym_attr.proc_pointer))
 		msg = xasprintf ("Proc-pointer actual argument '%s' is not "
  "associated or not present",
  e->symtree->n.sym->name);
@@ -6719,15 +6728,15 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
   else
 	{
 	  if (attr.allocatable
-		  && (fsym == NULL || !fsym->attr.allocatable))
+		  && (fsym == NULL || !fsym_attr.allocatable))
 		msg = xasprintf ("Allocatable actual argument '%s' is not "
  "allocated", e->symtree->n.sym->name);
 	  else if (attr.pointer
-		   && (fsym == NULL || !fsym->attr.pointer))
+		   && (fsym == NULL || !fsym_attr.pointer))
 		msg = xasprintf ("Pointer actual argument '%s' is not "
  "associated", e->symtree->n.sym->name);
 	  else if (attr.proc_pointer
-		   && (fsym == NULL || !fsym->attr.proc_pointer))
+		   && (fsym == NULL || !fsym_attr.proc_pointer))
 		msg = xasprintf ("Proc-pointer actual argument '%s' is not "
  "associated", e->symtree->n.sym->name);
 	  else


Re: [PATCH 1/2, rs6000] Add const_anchor for rs6000 [PR33699]

2021-03-16 Thread Segher Boessenkool
Hi!

On Mon, Mar 15, 2021 at 11:11:32AM +0800, HAO CHEN GUI via Gcc-patches wrote:
>     This patch adds const_anchor for rs6000. The const_anchor is used 
> in cse pass.

1) This isn't suitable for stage 4.
2) Please add a test case, which shows what it does, that it is useful.
3) Does this work on other OSes than Linux?  What about Darwin and AIX?

> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index ec068c58aa5..2b2350c53ae 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -4911,6 +4911,13 @@ rs6000_option_override_internal (bool global_init_p)
>  warning (0, "%qs is deprecated and not recommended in any circumstances",
>"-mno-speculate-indirect-jumps");
>  
> +  if (TARGET_64BIT)
> +{
> +  targetm.min_anchor_offset = -32768;
> +  targetm.max_anchor_offset = 32767;
> +  targetm.const_anchor = 0x8000;
> +}

Why only on 64 bit?  Why these choices?


Segher


[PATCH v4] x86: Update 'P' operand modifier for -fno-plt

2021-03-16 Thread H.J. Lu via Gcc-patches
On Sun, Mar 14, 2021 at 1:31 PM H.J. Lu  wrote:
>
> On Sun, Mar 14, 2021 at 12:43 PM Uros Bizjak  wrote:
> >
> > On Sun, Mar 14, 2021 at 8:14 PM H.J. Lu  wrote:
> > > > > Done.  Here is the updated patch.  Tested on Linux/x86-64.  OK for 
> > > > > master?
> > > >
> > > > I don't understand the purpose of the current_output_insn check and I
> > > > don't know if the usage of current_output_insn is correct. The
> > > > comments are not helpful either, and no other target uses this
> > > > variable in the way you propose. Can you please elaborate the reason
> > > > and the purpose of the check a bit more?
> > > >
> > > > Uros.
> > >
> > > Originally, ix86_force_load_from_GOT_p is only for non-PIC.   My patch 
> > > extended
> > > it to inline assembly statements where current_output_insn == NULL and 
> > > PIC is
> > > allowed in 64-bit.
> >
> > I can see this from the patch, but this explanation didn't answer my 
> > question.
> >
>
> The purpose of current_output_insn == NULL is to allow PIC for inline
> asm statements in 64-bit mode.  Is there a better way to check if
> ix86_print_operand () is called on inline asm statements?
>

Here is the v4 patch to check this_is_asm_operands for inline
 asm statements.   OK for master?

Thanks.

-- 
H.J.


Re: testsuite: automatically checking for param documentation

2021-03-16 Thread Martin Liška

On 3/16/21 4:09 PM, David Malcolm wrote:

[updating subject for greater visibility]

On Tue, 2021-03-16 at 08:51 -0600, Martin Sebor wrote:

On 3/16/21 3:08 AM, Martin Liška wrote:

On 3/15/21 9:57 PM, Martin Sebor wrote:

Any plans to integrate it into the testsuite?  (That way we presumably
wouldn't need to remember to run it by hand.)


Likely not, I'm not so big friend with DejaGNU.
Are you willing to help me with that?


I'm not a fan either but that's what we've got.  And sure, I'll help
if/when I can.  I think it should be "straightforward" to rewrite
the script in Expect (as much as anything is straightforward in
Expect).  Or, it could stay as is and be invoked from some .exp
file in testsuite.  Although is Python a required prerequisite
for running the testsuite?  If not, it might be better to either
rewrite the script in something that is (e.g., AWK) or in Expect.


I find it painful writing non-trivial logic in Tcl.


Me too!



We already have run-gcov-pytest in gcov.exp which calls out to a
python3 script, checking if python3 is supported, and bailing with
UNSUPPORTED otherwise.

FWIW I think it's reasonable to have something similar for this case.


Yep, let's run the existing Python script.

Martin



Dave





testsuite: automatically checking for param documentation

2021-03-16 Thread David Malcolm via Gcc-patches
[updating subject for greater visibility]

On Tue, 2021-03-16 at 08:51 -0600, Martin Sebor wrote:
> On 3/16/21 3:08 AM, Martin Liška wrote:
> > On 3/15/21 9:57 PM, Martin Sebor wrote:
> > > Any plans to integrate it into the testsuite?  (That way we presumably
> > > wouldn't need to remember to run it by hand.)
> > 
> > Likely not, I'm not so big friend with DejaGNU.
> > Are you willing to help me with that?
> 
> I'm not a fan either but that's what we've got.  And sure, I'll help
> if/when I can.  I think it should be "straightforward" to rewrite
> the script in Expect (as much as anything is straightforward in
> Expect).  Or, it could stay as is and be invoked from some .exp
> file in testsuite.  Although is Python a required prerequisite
> for running the testsuite?  If not, it might be better to either
> rewrite the script in something that is (e.g., AWK) or in Expect.

I find it painful writing non-trivial logic in Tcl.

We already have run-gcov-pytest in gcov.exp which calls out to a
python3 script, checking if python3 is supported, and bailing with
UNSUPPORTED otherwise.

FWIW I think it's reasonable to have something similar for this case.

Dave



Re: [PATCH][pushed] analyzer: document new param

2021-03-16 Thread Martin Sebor via Gcc-patches

On 3/16/21 3:08 AM, Martin Liška wrote:

On 3/15/21 9:57 PM, Martin Sebor wrote:

Any plans to integrate it into the testsuite?  (That way we presumably
wouldn't need to remember to run it by hand.)


Likely not, I'm not so big friend with DejaGNU.
Are you willing to help me with that?


I'm not a fan either but that's what we've got.  And sure, I'll help
if/when I can.  I think it should be "straightforward" to rewrite
the script in Expect (as much as anything is straightforward in
Expect).  Or, it could stay as is and be invoked from some .exp
file in testsuite.  Although is Python a required prerequisite
for running the testsuite?  If not, it might be better to either
rewrite the script in something that is (e.g., AWK) or in Expect.

Martin



Thanks,
Martin




Re: RFA: libiberty: silence static analyzer warning

2021-03-16 Thread Ian Lance Taylor via Gcc-patches
On Tue, Mar 16, 2021, 4:48 AM Nick Clifton via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Hi Ian,
>
>   One of the static analyzers we use is throwing up an error report for
>   one of the libiberty source files:
>
> Error: BUFFER_SIZE (CWE-474):
> libiberty/sha1.c:261: overlapping_buffer: The source buffer
> "&ctx->buffer[16]" potentially overlaps with the destination buffer
> "ctx->buffer", which results in undefined behavior for "memcpy".
> libiberty/sha1.c:261: remediation: Use memmove instead of "memcpy".
> #  259|   sha1_process_block (ctx->buffer, 64, ctx);
> #  260|   left_over -= 64;
> #  261|-> memcpy (ctx->buffer, &ctx->buffer[16], left_over);
> #  262| }
> #  263| ctx->buflen = left_over;
>
>   Looking at the source code I am not sure if the problem can actually
>   be triggered in reality, but there seems to be no harm in being
>   cautious, so I would like to ask for permission to apply the following
>   patch:
>
> diff --git a/libiberty/sha1.c b/libiberty/sha1.c
> index e3d7f86e351..7d15d48d11d 100644
> --- a/libiberty/sha1.c
> +++ b/libiberty/sha1.c
> @@ -258,7 +258,7 @@ sha1_process_bytes (const void *buffer, size_t len,
> struct sha1_ctx *ctx)
> {
>   sha1_process_block (ctx->buffer, 64, ctx);
>   left_over -= 64;
> - memcpy (ctx->buffer, &ctx->buffer[16], left_over);
> + memmove (ctx->buffer, &ctx->buffer[16], left_over);
> }
>ctx->buflen = left_over;
>  }


That is ok.

Thanks.

Ian

>
>


Re: [PATCH 1/2, rs6000] Add const_anchor for rs6000 [PR33699]

2021-03-16 Thread David Edelsohn via Gcc-patches
I suspect that this is incorrect.  Did you look at the discussion of
the choice of anchor limits when first implemented?  We specifically
chose 32 bit range.

Thanks, David

On Tue, Mar 16, 2021 at 10:21 AM will schmidt  wrote:
>
> On Mon, 2021-03-15 at 11:11 +0800, HAO CHEN GUI via Gcc-patches wrote:
> > Hi,
> >
> >  This patch adds const_anchor for rs6000. The const_anchor is used
> > in cse pass.
> >
> >  The attachment are the patch diff and change log file.
> >
> >  Bootstrapped and tested on powerpc64le with no regressions. Is this
> > okay for trunk? Any  recommendations? Thanks a lot.
>
>
> Be sure to CC David and Segher to help ensure they see the arch
> specific patch. :-)
> >
>
> > * config/rs6000/rs6000.c (rs6000_option_override_internal): Set
> > targetm.const_anchor, targetm.min_anchor_offset
> > and targetm.max_anchor_offset.
>
>
> lgtm.
>
>
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index ec068c58aa5..2b2350c53ae 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -4911,6 +4911,13 @@ rs6000_option_override_internal (bool global_init_p)
> >  warning (0, "%qs is deprecated and not recommended in any 
> > circumstances",
> >"-mno-speculate-indirect-jumps");
> >
> > +  if (TARGET_64BIT)
> > +{
> > +  targetm.min_anchor_offset = -32768;
> > +  targetm.max_anchor_offset = 32767;
> > +  targetm.const_anchor = 0x8000;
> > +}
> > +
>
>
> The mix of decimal and hexadecimal notation catches my eye, but
> this matches the style I see for other architectures, mips in
> particular.
>
> Do we want/need to explicitly set the values for !TARGET_64BIT ?   (I
> can't immediately tell what the default values are).
>
> lgtm.
>
>
> >return ret;
> >  }
> >
>


Re: [PATCH 1/2, rs6000] Add const_anchor for rs6000 [PR33699]

2021-03-16 Thread will schmidt via Gcc-patches
On Mon, 2021-03-15 at 11:11 +0800, HAO CHEN GUI via Gcc-patches wrote:
> Hi,
> 
>  This patch adds const_anchor for rs6000. The const_anchor is
> used 
> in cse pass.
> 
>  The attachment are the patch diff and change log file.
> 
>  Bootstrapped and tested on powerpc64le with no regressions. Is
> this 
> okay for trunk? Any  recommendations? Thanks a lot.
> 


> * config/rs6000/rs6000.c (rs6000_option_override_internal): Set
> targetm.const_anchor, targetm.min_anchor_offset
> and targetm.max_anchor_offset.


Part two of my review (i missed this first time)..  :-)

Some variation of "PR Target/33699 " should be included as part of the
changelog blurb.


Thanks
-WIll



Re: [PATCH 1/2, rs6000] Add const_anchor for rs6000 [PR33699]

2021-03-16 Thread will schmidt via Gcc-patches
On Mon, 2021-03-15 at 11:11 +0800, HAO CHEN GUI via Gcc-patches wrote:
> Hi,
> 
>  This patch adds const_anchor for rs6000. The const_anchor is used 
> in cse pass.
> 
>  The attachment are the patch diff and change log file.
> 
>  Bootstrapped and tested on powerpc64le with no regressions. Is this 
> okay for trunk? Any  recommendations? Thanks a lot.


Be sure to CC David and Segher to help ensure they see the arch
specific patch. :-) 
> 

> * config/rs6000/rs6000.c (rs6000_option_override_internal): Set
> targetm.const_anchor, targetm.min_anchor_offset
> and targetm.max_anchor_offset.


lgtm.


> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index ec068c58aa5..2b2350c53ae 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -4911,6 +4911,13 @@ rs6000_option_override_internal (bool global_init_p)
>  warning (0, "%qs is deprecated and not recommended in any circumstances",
>"-mno-speculate-indirect-jumps");
>  
> +  if (TARGET_64BIT)
> +{
> +  targetm.min_anchor_offset = -32768;
> +  targetm.max_anchor_offset = 32767;
> +  targetm.const_anchor = 0x8000;
> +}
> +


The mix of decimal and hexadecimal notation catches my eye, but 
this matches the style I see for other architectures, mips in
particular.

Do we want/need to explicitly set the values for !TARGET_64BIT ?   (I
can't immediately tell what the default values are).

lgtm.


>return ret;
>  }
>  



[PATCH][pushed] options: ignore flag_ipa_ra in cl_optimization_compare

2021-03-16 Thread Martin Liška

One more exception for cl_optimization_compare.

Pushed to master,
thanks,
Martin

gcc/ChangeLog:

PR target/99592
* optc-save-gen.awk: Add flag_ipa_ra to exceptions for
cl_optimization_compare function.

gcc/testsuite/ChangeLog:

PR target/99592
* gcc.target/arm/pr99592.c: New test.
---
 gcc/optc-save-gen.awk  | 1 +
 gcc/testsuite/gcc.target/arm/pr99592.c | 7 +++
 2 files changed, 8 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr99592.c

diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk
index 14b8d03888e..19afa895930 100644
--- a/gcc/optc-save-gen.awk
+++ b/gcc/optc-save-gen.awk
@@ -1445,6 +1445,7 @@ checked_options["TARGET_CASE_VECTOR_PC_RELATIVE"]++
 checked_options["arc_size_opt_level"]++
 # arm exceptions
 checked_options["arm_fp16_format"]++
+checked_options["flag_ipa_ra"]++
 # s390 exceptions
 checked_options["param_max_completely_peel_times"]++
 checked_options["param_max_completely_peeled_insns"]++
diff --git a/gcc/testsuite/gcc.target/arm/pr99592.c 
b/gcc/testsuite/gcc.target/arm/pr99592.c
new file mode 100644
index 000..23d6591ba16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr99592.c
@@ -0,0 +1,7 @@
+/* PR target/99592 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -pg" } */
+
+#pragma GCC push_options
+#pragma GCC target ""
+#pragma GCC pop_options
--
2.30.2



Re: [PATCH 2/2] libstdc++: Remove symbols for new std::call_once implementation [PR 99341]

2021-03-16 Thread Jonathan Wakely via Gcc-patches

On 12/03/21 17:46 +, Jonathan Wakely wrote:

This removes the new symbols added for the new futex-based
std::call_once implementation. These symbols were new on trunk, so not
in any released version. However, they are already present in some
beta distro releases (Fedora Linux 34) and in Fedora Linux rawhide. This
change can be locally reverted by distros that need to keep the symbols
present until affected packages have been rebuilt.

libstdc++-v3/ChangeLog:

   PR libstdc++/99341
   * config/abi/post/aarch64-linux-gnu/baseline_symbols.txt: Remove
   std::once_flag symbols.
   * config/abi/post/ia64-linux-gnu/baseline_symbols.txt: Likewise.
   * config/abi/post/m68k-linux-gnu/baseline_symbols.txt: Likewise.
   * config/abi/post/riscv64-linux-gnu/baseline_symbols.txt:
   Likewise.
   * config/abi/pre/gnu.ver: Likewise.
   * src/c++11/mutex.cc [_GLIBCXX_HAVE_LINUX_FUTEX]
   (struct __once_flag_compat): Remove.
   (_ZNSt9once_flag11_M_activateEv): Remove.
   (_ZNSt9once_flag9_M_finishEb): Remove.

Tested x86_64-linux, powerpc64-linux and powerpc64le-linux, but not yet
committed (it's not really appropriate for a last-minute Friday
change!)


I've now pushed [PATCH 1/2] and this one.



Re: [PATCH] i386: Avoid mutual recursion between two peephole2s [PR99600]

2021-03-16 Thread Uros Bizjak via Gcc-patches
On Tue, Mar 16, 2021 at 12:52 PM Jakub Jelinek  wrote:
>
> On Tue, Mar 16, 2021 at 11:55:01AM +0100, Uros Bizjak wrote:
> > Maybe we could simply emit a special form of a ASHIFT pattern, tagged
> > with some unspec (similar to e.g. divmod4_1), and teach
> > ix86_split_lea_for_addr to emit it instead? Peephole pass is so late
> > in the pass sequence that we won't lose anything. We only need one
> > additional SWI48mode ASHIFT pattern with a const123_operand immediate.
>
> Ok.  Any reason not to use just MULT for that and split it back into
> ASHIFT during the split pass that follows shortly after peephole2?

Works for me.

Uros.

>
> 2021-03-16  Jakub Jelinek  
>
> PR target/99600
> * config/i386/i386-expand.c (ix86_split_lea_for_addr): Emit a MULT
> rather than ASHIFT.
> * config/i386/i386.md (mult by 1248 into ashift): New splitter.
>
> * gcc.target/i386/pr99600.c: New test.
>
> --- gcc/config/i386/i386-expand.c.jj2021-03-16 11:16:08.487860451 +0100
> +++ gcc/config/i386/i386-expand.c   2021-03-16 12:26:20.331083409 +0100
> @@ -1348,9 +1348,10 @@ ix86_split_lea_for_addr (rtx_insn *insn,
>   if (regno0 != regno2)
> emit_insn (gen_rtx_SET (target, parts.index));
>
> - /* Use shift for scaling.  */
> - ix86_emit_binop (ASHIFT, mode, target,
> -  GEN_INT (exact_log2 (parts.scale)));
> + /* Use shift for scaling, but emit it as MULT instead
> +to avoid it being immediately peephole2 optimized back
> +into lea.  */
> + ix86_emit_binop (MULT, mode, target, GEN_INT (parts.scale));
>
>   if (parts.base)
> ix86_emit_binop (PLUS, mode, target, parts.base);
> --- gcc/config/i386/i386.md.jj  2021-03-16 00:21:12.192422264 +0100
> +++ gcc/config/i386/i386.md 2021-03-16 12:41:27.384022356 +0100
> @@ -5219,6 +5219,18 @@ (define_peephole2
>
>DONE;
>  })
> +
> +;; ix86_split_lea_for_addr emits the shifts as MULT to avoid it from being
> +;; peephole2 optimized back into a lea.  Split that into the shift during
> +;; the following split pass.
> +(define_split
> +  [(set (match_operand:SWI48 0 "general_reg_operand")
> +   (mult:SWI48 (match_dup 0) (match_operand:SWI48 1 
> "const1248_operand")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "reload_completed"
> +  [(parallel [(set (match_dup 0) (ashift:SWI48 (match_dup 0) (match_dup 1)))
> +  (clobber (reg:CC FLAGS_REG))])]
> +  "operands[1] = GEN_INT (exact_log2 (INTVAL (operands[1])));")
>
>  ;; Add instructions
>
> --- gcc/testsuite/gcc.target/i386/pr99600.c.jj  2021-03-16 12:26:50.610747515 
> +0100
> +++ gcc/testsuite/gcc.target/i386/pr99600.c 2021-03-16 12:26:50.610747515 
> +0100
> @@ -0,0 +1,16 @@
> +/* PR target/99600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=atom" } */
> +
> +char a, b;
> +long c;
> +
> +long
> +foo (void)
> +{
> +  if (a)
> +c = b == 1 ? 1 << 3 : 1 << 2;
> +  else
> +c = 0;
> +  return 0;
> +}
>
>
> Jakub
>


Re: [PATCH] i386: Avoid mutual recursion between two peephole2s [PR99600]

2021-03-16 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 16, 2021 at 11:55:01AM +0100, Uros Bizjak wrote:
> Maybe we could simply emit a special form of a ASHIFT pattern, tagged
> with some unspec (similar to e.g. divmod4_1), and teach
> ix86_split_lea_for_addr to emit it instead? Peephole pass is so late
> in the pass sequence that we won't lose anything. We only need one
> additional SWI48mode ASHIFT pattern with a const123_operand immediate.

Ok.  Any reason not to use just MULT for that and split it back into
ASHIFT during the split pass that follows shortly after peephole2?

2021-03-16  Jakub Jelinek  

PR target/99600
* config/i386/i386-expand.c (ix86_split_lea_for_addr): Emit a MULT
rather than ASHIFT.
* config/i386/i386.md (mult by 1248 into ashift): New splitter.

* gcc.target/i386/pr99600.c: New test.

--- gcc/config/i386/i386-expand.c.jj2021-03-16 11:16:08.487860451 +0100
+++ gcc/config/i386/i386-expand.c   2021-03-16 12:26:20.331083409 +0100
@@ -1348,9 +1348,10 @@ ix86_split_lea_for_addr (rtx_insn *insn,
  if (regno0 != regno2)
emit_insn (gen_rtx_SET (target, parts.index));
 
- /* Use shift for scaling.  */
- ix86_emit_binop (ASHIFT, mode, target,
-  GEN_INT (exact_log2 (parts.scale)));
+ /* Use shift for scaling, but emit it as MULT instead
+to avoid it being immediately peephole2 optimized back
+into lea.  */
+ ix86_emit_binop (MULT, mode, target, GEN_INT (parts.scale));
 
  if (parts.base)
ix86_emit_binop (PLUS, mode, target, parts.base);
--- gcc/config/i386/i386.md.jj  2021-03-16 00:21:12.192422264 +0100
+++ gcc/config/i386/i386.md 2021-03-16 12:41:27.384022356 +0100
@@ -5219,6 +5219,18 @@ (define_peephole2
 
   DONE;
 })
+
+;; ix86_split_lea_for_addr emits the shifts as MULT to avoid it from being
+;; peephole2 optimized back into a lea.  Split that into the shift during
+;; the following split pass.
+(define_split
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+   (mult:SWI48 (match_dup 0) (match_operand:SWI48 1 "const1248_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (ashift:SWI48 (match_dup 0) (match_dup 1)))
+  (clobber (reg:CC FLAGS_REG))])]
+  "operands[1] = GEN_INT (exact_log2 (INTVAL (operands[1])));")
 
 ;; Add instructions
 
--- gcc/testsuite/gcc.target/i386/pr99600.c.jj  2021-03-16 12:26:50.610747515 
+0100
+++ gcc/testsuite/gcc.target/i386/pr99600.c 2021-03-16 12:26:50.610747515 
+0100
@@ -0,0 +1,16 @@
+/* PR target/99600 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=atom" } */
+
+char a, b;
+long c;
+
+long
+foo (void)
+{
+  if (a)
+c = b == 1 ? 1 << 3 : 1 << 2;
+  else
+c = 0;
+  return 0;
+}


Jakub



RFA: libiberty: silence static analyzer warning

2021-03-16 Thread Nick Clifton via Gcc-patches
Hi Ian,

  One of the static analyzers we use is throwing up an error report for
  one of the libiberty source files:

Error: BUFFER_SIZE (CWE-474):
libiberty/sha1.c:261: overlapping_buffer: The source buffer "&ctx->buffer[16]"
potentially overlaps with the destination buffer "ctx->buffer", which results 
in undefined behavior for "memcpy".
libiberty/sha1.c:261: remediation: Use memmove instead of "memcpy".
#  259|   sha1_process_block (ctx->buffer, 64, ctx);
#  260|   left_over -= 64;
#  261|-> memcpy (ctx->buffer, &ctx->buffer[16], left_over);
#  262| }
#  263| ctx->buflen = left_over;

  Looking at the source code I am not sure if the problem can actually
  be triggered in reality, but there seems to be no harm in being
  cautious, so I would like to ask for permission to apply the following
  patch:

diff --git a/libiberty/sha1.c b/libiberty/sha1.c
index e3d7f86e351..7d15d48d11d 100644
--- a/libiberty/sha1.c
+++ b/libiberty/sha1.c
@@ -258,7 +258,7 @@ sha1_process_bytes (const void *buffer, size_t len, struct 
sha1_ctx *ctx)
{
  sha1_process_block (ctx->buffer, 64, ctx);
  left_over -= 64;
- memcpy (ctx->buffer, &ctx->buffer[16], left_over);
+ memmove (ctx->buffer, &ctx->buffer[16], left_over);
}
   ctx->buflen = left_over;
 }

Cheers
  Nick
  



Re: c++: Fix 2 testcases [PR 99601]

2021-03-16 Thread Nathan Sidwell

On 3/15/21 7:29 PM, Jakub Jelinek wrote:

On Mon, Mar 15, 2021 at 03:28:06PM -0400, Nathan Sidwell wrote:


I'd failed to correctly restrict some checks to lp64 x86 targets.

PR c++/99601
gcc/testsuite/
* g++.dg/modules/builtin-3_a.C: Fix lp64 x86 detection.
* g++.dg/modules/builtin-3_b.C: Fix lp64 x86 detection.


ERROR: tcl error sourcing 
/home/jakub/src/gcc/gcc/testsuite/g++.dg/modules/modules.exp.
ERROR: unmatched open brace in list


Sigh, I always fall for the trap of 'check the fail went away', not 
'verify it now passes'.  Thanks for fixing.


nathan

--
Nathan Sidwell


Re: GCC: v850-elf

2021-03-16 Thread Nick Clifton via Gcc-patches

Hi Jan-Benedict,


With my re-started testing efforts, I've got another one for you I
guess (`./configure --target=v850-elf && make all-gcc`):

../.././gcc/config/v850/v850.c: In function ‘char* construct_restore_jr(rtx)’:
../.././gcc/config/v850/v850.c:2260:22: error: ‘%s’ directive writing up to 39 
bytes into a region of size between 32 and 71 [-Werror=format-overflow=]
  2260 |   sprintf (buff, "movhi hi(%s), r0, r6\n\tmovea lo(%s), r6, r6\n\tjmp 
r6",
   |  
^~~~
  2261 | name, name);
   |   


I could not reproduce these in my local environment, but I suspect that
you are using a more recent version of gcc than me.  The fix looks obvious
however, so please could you try out the attached patch and let me know
if it resolves the problem ?

Cheers
  Nick
diff --git a/gcc/config/v850/v850.c b/gcc/config/v850/v850.c
index 249cb400177..db3002a2cfb 100644
--- a/gcc/config/v850/v850.c
+++ b/gcc/config/v850/v850.c
@@ -2181,7 +2181,7 @@ construct_restore_jr (rtx op)
   unsigned long int first;
   unsigned long int last;
   int i;
-  static char buff [100]; /* XXX */
+  static char buff [256]; /* XXX */
   
   if (count <= 2)
 {


GCC 11.0.1 Status Report (2021-03-16)

2021-03-16 Thread Richard Biener


Status
==

GCC trunk which eventually will become GCC 11 is still in
regression and documentation fixes only mode (Stage 4).

If history should repeat itself then a first release candidate
of GCC 11 will be released mid April.  For this to happen
we need to resolve the remaining 17 P1 regressions - 8 of which
are classified as target or bootstrap issues.  Please have
a look at the set of P1 (and preferably also P2) regressions
which can be conveniently accessed via the 'Serious regressions'
link on https://gcc.gnu.org


Quality Data


Priority  #   Change from last report
---   ---
P1   17   - 45
P2  288   - 46
P3   37   +  1
P4  192   +  2
P5   24
---   ---
Total P1-P3 342   - 90
Total   558   - 88


Previous Report
===

https://gcc.gnu.org/pipermail/gcc/2021-January/234703.html


Re: [PATCH] i386: Avoid mutual recursion between two peephole2s [PR99600]

2021-03-16 Thread Uros Bizjak via Gcc-patches
On Tue, Mar 16, 2021 at 11:10 AM Jakub Jelinek  wrote:
>
> Hi!
>
> As the testcase shows, the compiler hangs and eats all memory when compiling
> it.  This is because in r11-7274-gdecd8fb0128870d0d768ba53dae626913d6d9c54
> I have changed the ix86_avoid_lea_for_addr splitting from a splitter
> into a peephole2 (because during splitting passes we don't have guaranteed
> df, while during peephole2 we do).
> The problem is we have another peephole2 that works in the opposite way,
> when seeing split lea (in particular ASHIFT followed by PLUS) it attempts
> to turn it back into a lea.
> In the past, they were fighting against each other, but as they were in
> different passes, simply the last one won.  So, split after reload
> split the lea into shift left and plus, peephole2 reverted that (but, note
> not perfectly, the peephole2 doesn't understand that something can be placed
> into lea disp; to be fixed for GCC12) and then another split pass split the
> lea appart again.
> But my changes and the way peephole2 works means that we endlessly iterate
> over those two, the first peephole2 splits the lea, the second one reverts
> it, the first peephole2 splits the new lea back into new 2 insns and so
> forth forever.
> So, we need to break the cycle somehow.  Best would be to call
> ix86_avoid_lea_for_addr in the second peephole2 and if it says it would be
> split, undo, but unfortunately calling that function is very non-trivial,
> because it needs to walk insns forwards and backwards from the given insn
> and uses df in those walks.  Furthermore, one of the registers in the new
> lea is set by the newly added insn.  So I'd be afraid we'd need to
> temporarily register the new insns with df and replace in the IL the old
> insns with new insns (and undo their df), then call the function and then
> undo everything we did.
> So, this patch instead breaks the cycle by remembering INSN_UID of the last
> ASHIFT ix86_split_lea_for_addr emitted and punting on the ASHIFT + PLUS
> peephole2 when it is that uid.  We need to reset it somewhere, so I've
> added clearing of the var to ix86_expand_prologue which is guaranteed to be
> a few passes before peephole2.

Maybe we could simply emit a special form of a ASHIFT pattern, tagged
with some unspec (similar to e.g. divmod4_1), and teach
ix86_split_lea_for_addr to emit it instead? Peephole pass is so late
in the pass sequence that we won't lose anything. We only need one
additional SWI48mode ASHIFT pattern with a const123_operand immediate.

Uros.

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Or would you e.g. prefer if I don't use a global var but stick it into
> cfun->machine->last_lea_split_shift_uid ?  That way it would be cleared
> automatically.
>
> 2021-03-16  Jakub Jelinek  
>
> PR target/99600
> * config/i386/i386-protos.h (ix86_last_lea_split_shift_uid): Declare.
> * config/i386/i386-expand.c (ix86_last_lea_split_shift_uid): New
> variable.
> (ix86_split_lea_for_addr): Set it to INSN_UID of the emitted shift
> insn.
> * config/i386/i386.md (ashift + add peephole2 into lea): Punt if
> INSN_UID of ashift is ix86_last_lea_split_shift_uid.
> * config/i386/i386.c (ix86_expand_prologue): Clear
> ix86_last_lea_split_shift_uid.
>
> * gcc.target/i386/pr99600.c: New test.
>
> --- gcc/config/i386/i386-protos.h.jj2021-01-04 10:25:45.436159056 +0100
> +++ gcc/config/i386/i386-protos.h   2021-03-15 20:06:36.267515383 +0100
> @@ -109,6 +109,7 @@ extern bool ix86_binary_operator_ok (enu
>  extern bool ix86_avoid_lea_for_add (rtx_insn *, rtx[]);
>  extern bool ix86_use_lea_for_mov (rtx_insn *, rtx[]);
>  extern bool ix86_avoid_lea_for_addr (rtx_insn *, rtx[]);
> +extern int ix86_last_lea_split_shift_uid;
>  extern void ix86_split_lea_for_addr (rtx_insn *, rtx[], machine_mode);
>  extern bool ix86_lea_for_add_ok (rtx_insn *, rtx[]);
>  extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool 
> high);
> --- gcc/config/i386/i386-expand.c.jj2021-03-15 12:34:26.549901726 +0100
> +++ gcc/config/i386/i386-expand.c   2021-03-15 20:08:52.992016204 +0100
> @@ -1291,6 +1291,10 @@ find_nearest_reg_def (rtx_insn *insn, in
>return false;
>  }
>
> +/* INSN_UID of the last ASHIFT insn emitted by ix86_split_lea_for_addr
> +   or zero if none has been emitted yet.  */
> +int ix86_last_lea_split_shift_uid;
> +
>  /* Split lea instructions into a sequence of instructions
> which are executed on ALU to avoid AGU stalls.
> It is assumed that it is allowed to clobber flags register
> @@ -1352,6 +1356,9 @@ ix86_split_lea_for_addr (rtx_insn *insn,
>   ix86_emit_binop (ASHIFT, mode, target,
>GEN_INT (exact_log2 (parts.scale)));
>
> + rtx_insn *ashift_insn = get_last_insn ();
> + ix86_last_lea_split_shift_uid = INSN_UID (ashift_insn);
> +
>   if (parts.base)
> ix86_emit_binop 

[PATCH][pushed] gcc-changelog: skip broken commit in git_update_version.py.

2021-03-16 Thread Martin Liška

Hi.

For an unknown reason the following commit modified a ChangeLog file
together with other changes:

$ git show --stat c2be82058fb40f3ae891c68d185ff53e07f14f45
...
 libstdc++-v3/ChangeLog   | 9 +
 libstdc++-v3/src/Makefile.am | 4 +++-
 libstdc++-v3/src/Makefile.in | 3 ++-

Let's ignore the commit in git_update_version.py.

Martin

contrib/ChangeLog:

* gcc-changelog/git_update_version.py: Skip one problematic
commit.
---
 contrib/gcc-changelog/git_update_version.py | 4 
 1 file changed, 4 insertions(+)

diff --git a/contrib/gcc-changelog/git_update_version.py 
b/contrib/gcc-changelog/git_update_version.py
index d2cadb8811c..1e2b22b008b 100755
--- a/contrib/gcc-changelog/git_update_version.py
+++ b/contrib/gcc-changelog/git_update_version.py
@@ -26,6 +26,9 @@ from git_repository import parse_git_revisions
 
 current_timestamp = datetime.datetime.now().strftime('%Y%m%d\n')
 
+# Skip the following commits, they cannot be correctly processed
+IGNORED_COMMITS = ('c2be82058fb40f3ae891c68d185ff53e07f14f45')
+
 
 def read_timestamp(path):
     with open(path) as f:
@@ -98,6 +101,7 @@ def update_current_branch():
 head = head.parents[1]
 commits = parse_git_revisions(args.git_path, '%s..%s'
   % (commit.hexsha, head.hexsha))
+commits = [c for c in commits if c.info.hexsha not in IGNORED_COMMITS]
 for git_commit in reversed(commits):
 prepend_to_changelog_files(repo, args.git_path, git_commit,
not args.dry_mode)
--
2.30.2



Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]

2021-03-16 Thread Richard Biener via Gcc-patches
On Tue, Mar 16, 2021 at 11:19 AM Jakub Jelinek  wrote:
>
> On Tue, Mar 16, 2021 at 11:16:54AM +0100, Richard Biener wrote:
> > > Not varargs semantics, but unprototyped function semantics, i.e. the same
> > > as for
> > > int foo ();
> > > Which means callers can pass anything, but it is not ..., i.e. callee 
> > > can't
> > > use va_start/va_end/va_arg and the ... calling conventions are not in 
> > > place
> > > (e.g. the passing of arguments in two places on some targets, or the
> > > extra %rax on x86_64 etc.).
> >
> > Right, so for the caller side looking at DECL_ARGUMENTS is wrong as well
> > then, the actual argument list is what matters for the used ABI.
> >
> > Anyway, I suppose the hook may just give up for calls to unprototyped 
> > functions?
>
> It can't.  We can punt at trying to optimize them that way in the
> vectorizer, that is purely an optimization, but on the function definition,
> it is an ABI thing and perhaps some caller in some other TU could have correct
> prototype and call clones that don't exist.

I suppose we could simply reject OMP simd on K&R style definitions, OTOH they
seem to be supported even in C18 ...

> Jakub
>


Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]

2021-03-16 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 16, 2021 at 11:16:54AM +0100, Richard Biener wrote:
> > Not varargs semantics, but unprototyped function semantics, i.e. the same
> > as for
> > int foo ();
> > Which means callers can pass anything, but it is not ..., i.e. callee can't
> > use va_start/va_end/va_arg and the ... calling conventions are not in place
> > (e.g. the passing of arguments in two places on some targets, or the
> > extra %rax on x86_64 etc.).
> 
> Right, so for the caller side looking at DECL_ARGUMENTS is wrong as well
> then, the actual argument list is what matters for the used ABI.
> 
> Anyway, I suppose the hook may just give up for calls to unprototyped 
> functions?

It can't.  We can punt at trying to optimize them that way in the
vectorizer, that is purely an optimization, but on the function definition,
it is an ABI thing and perhaps some caller in some other TU could have correct
prototype and call clones that don't exist.
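
For illustration only (not from the patch; the file names are made up),
the kind of situation meant here, compiled with -fopenmp-simd:

/* tu1.c -- K&R style definition; whether the simd clones are emitted is
   decided here.  */
#pragma omp declare simd
int
foo (x)
     int x;
{
  return x + 1;
}

/* tu2.c -- prototyped declaration; a vectorized loop here may emit calls
   to the _ZGV* clones of foo even though tu1.c never created them.  */
#pragma omp declare simd
extern int foo (int x);

void
bar (int *a, int n)
{
  #pragma omp simd
  for (int i = 0; i < n; i++)
    a[i] = foo (a[i]);
}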

Jakub



Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]

2021-03-16 Thread Richard Biener via Gcc-patches
On Tue, Mar 16, 2021 at 10:40 AM Jakub Jelinek  wrote:
>
> On Tue, Mar 16, 2021 at 10:36:35AM +0100, Richard Biener wrote:
> > On Tue, Mar 16, 2021 at 10:29 AM Jakub Jelinek  wrote:
> > >
> > > On Tue, Mar 16, 2021 at 09:46:33AM +0100, Richard Biener wrote:
> > > > > shows we have an interface problem here -- we shouldn't need 
> > > > > something this
> > > > > convoluted to walk through an argument list.  But that's not your 
> > > > > problem
> > > > > or a problem with the patch.
> > > >
> > > > The caller side should IMHO always look at TYPE_ARG_TYPES since
> > > > that's what determines the ABI.  But IIRC there were problems in the 
> > > > past
> > > > that TYPE_ARG_TYPES might be NULL but DECL_ARGUMENTS not?!
> > >
> > > Yes, for the K&R definitions
> > > #pragma omp declare simd
> > > int
> > > foo (a, b, c)
> > >   int a, b, c;
> > > {
> > >   return a + b + c;
> > > }
> >
> > The C frontend could still populate TYPE_ARG_TYPES here, OTOH IIRC
> > K&R definitions work like unprototyped functions on the caller side, thus
> > having varargs argument passing semantics?
>
> Not varargs semantics, but unprototyped function semantics, i.e. the same
> as for
> int foo ();
> Which means callers can pass anything, but it is not ..., i.e. callee can't
> use va_start/va_end/va_arg and the ... calling conventions are not in place
> (e.g. the passing of arguments in two places on some targets, or the
> extra %rax on x86_64 etc.).

Right, so for the caller side looking at DECL_ARGUMENTS is wrong as well
then, the actual argument list is what matters for the used ABI.

Anyway, I suppose the hook may just give up for calls to unprototyped functions?

Richard.

>
> Jakub
>


[PATCH] i386: Avoid mutual recursion between two peephole2s [PR99600]

2021-03-16 Thread Jakub Jelinek via Gcc-patches
Hi!

As the testcase shows, the compiler hangs and eats all memory when compiling
it.  This is because in r11-7274-gdecd8fb0128870d0d768ba53dae626913d6d9c54
I have changed the ix86_avoid_lea_for_addr splitting from a splitter
into a peephole2 (because during splitting passes we don't have guaranteed
df, while during peephole2 we do).
The problem is that we have another peephole2 that works in the opposite
way: when it sees a split lea (in particular ASHIFT followed by PLUS), it
attempts to turn it back into a lea.
In the past, they were fighting against each other, but as they were in
different passes, the last one simply won.  So, the post-reload split pass
split the lea into a shift left and a plus, peephole2 reverted that (though
not perfectly; the peephole2 doesn't understand that something can be placed
into the lea disp, to be fixed for GCC 12) and then another split pass split
the lea apart again.
But with my changes and the way peephole2 works, we endlessly iterate
over those two: the first peephole2 splits the lea, the second one reverts
it, the first peephole2 splits the new lea back into 2 new insns, and so
forth forever.
So, we need to break the cycle somehow.  Best would be to call
ix86_avoid_lea_for_addr in the second peephole2 and if it says it would be
split, undo, but unfortunately calling that function is very non-trivial,
because it needs to walk insns forwards and backwards from the given insn
and uses df in those walks.  Furthermore, one of the registers in the new
lea is set by the newly added insn.  So I'd be afraid we'd need to
temporarily register the new insns with df and replace in the IL the old
insns with new insns (and undo their df), then call the function and then
undo everything we did.
So, this patch instead breaks the cycle by remembering the INSN_UID of
the last ASHIFT that ix86_split_lea_for_addr emitted, and punting on the
ASHIFT + PLUS peephole2 when the shift has that uid.  We need to reset it
somewhere, so I've added clearing of the variable to ix86_expand_prologue,
which is guaranteed to run a few passes before peephole2.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Or would you e.g. prefer if I don't use a global var but stick it into
cfun->machine->last_lea_split_shift_uid ?  That way it would be cleared
automatically.
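
For reference, a rough sketch of that variant (not the posted patch, and
the exact spelling of the peephole2 condition is my assumption):

/* i386.h, in struct machine_function:  */
/* INSN_UID of the last ASHIFT emitted by ix86_split_lea_for_addr,
   or zero if none has been emitted in this function.  */
int last_lea_split_shift_uid;

/* i386-expand.c, in ix86_split_lea_for_addr, right after emitting the
   shift:  */
cfun->machine->last_lea_split_shift_uid = INSN_UID (get_last_insn ());

/* i386.md, extra condition on the ashift + add -> lea peephole2, assuming
   the ashift is the first insn of the peephole2 window:  */
&& INSN_UID (peep2_next_insn (0)) != cfun->machine->last_lea_split_shift_uid

As machine_function is allocated cleared per function, the field would
start out zero and no explicit clearing in ix86_expand_prologue would be
needed.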

2021-03-16  Jakub Jelinek  

PR target/99600
* config/i386/i386-protos.h (ix86_last_lea_split_shift_uid): Declare.
* config/i386/i386-expand.c (ix86_last_lea_split_shift_uid): New
variable.
(ix86_split_lea_for_addr): Set it to INSN_UID of the emitted shift
insn.
* config/i386/i386.md (ashift + add peephole2 into lea): Punt if
INSN_UID of ashift is ix86_last_lea_split_shift_uid.
* config/i386/i386.c (ix86_expand_prologue): Clear
ix86_last_lea_split_shift_uid.

* gcc.target/i386/pr99600.c: New test.

--- gcc/config/i386/i386-protos.h.jj2021-01-04 10:25:45.436159056 +0100
+++ gcc/config/i386/i386-protos.h   2021-03-15 20:06:36.267515383 +0100
@@ -109,6 +109,7 @@ extern bool ix86_binary_operator_ok (enu
 extern bool ix86_avoid_lea_for_add (rtx_insn *, rtx[]);
 extern bool ix86_use_lea_for_mov (rtx_insn *, rtx[]);
 extern bool ix86_avoid_lea_for_addr (rtx_insn *, rtx[]);
+extern int ix86_last_lea_split_shift_uid;
 extern void ix86_split_lea_for_addr (rtx_insn *, rtx[], machine_mode);
 extern bool ix86_lea_for_add_ok (rtx_insn *, rtx[]);
 extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool high);
--- gcc/config/i386/i386-expand.c.jj2021-03-15 12:34:26.549901726 +0100
+++ gcc/config/i386/i386-expand.c   2021-03-15 20:08:52.992016204 +0100
@@ -1291,6 +1291,10 @@ find_nearest_reg_def (rtx_insn *insn, in
   return false;
 }
 
+/* INSN_UID of the last ASHIFT insn emitted by ix86_split_lea_for_addr
+   or zero if none has been emitted yet.  */
+int ix86_last_lea_split_shift_uid;
+
 /* Split lea instructions into a sequence of instructions
which are executed on ALU to avoid AGU stalls.
It is assumed that it is allowed to clobber flags register
@@ -1352,6 +1356,9 @@ ix86_split_lea_for_addr (rtx_insn *insn,
  ix86_emit_binop (ASHIFT, mode, target,
   GEN_INT (exact_log2 (parts.scale)));
 
+ rtx_insn *ashift_insn = get_last_insn ();
+ ix86_last_lea_split_shift_uid = INSN_UID (ashift_insn);
+
  if (parts.base)
ix86_emit_binop (PLUS, mode, target, parts.base);
 
--- gcc/config/i386/i386.md.jj  2021-03-05 21:51:33.675350463 +0100
+++ gcc/config/i386/i386.md 2021-03-15 20:16:03.796292449 +0100
@@ -20680,6 +20680,11 @@ (define_peephole2
 (match_operand 4 "x86_64_general_operand")))
   (clobber (reg:CC FLAGS_REG))])]
   "IN_RANGE (INTVAL (operands[2]), 1, 3)
+   /* Avoid the ix86_avoid_lea_for_addr peephole2 from splitting
+  lea and this peephole2 to undo that into another lea that
+  would be split again.  Break that cycle by ignoring 

Re: [PATCH] i386: Fix up _mm256_vzeroupper() handling [PR99563]

2021-03-16 Thread Uros Bizjak via Gcc-patches
On Tue, Mar 16, 2021 at 10:51 AM Jakub Jelinek  wrote:
>
> Hi!
>
> My r10-6451-gb7b3378f91c0641f2ef4d88db22af62a571c9359 fix for
> vzeroupper vs. ms ABI apparently broke the explicit vzeroupper handling
> when the implicit vzeroupper handling is disabled.
> The epilogue_completed splitter for vzeroupper now adds clobbers for all
> registers which don't have explicit sets in the pattern and the sets are
> added during vzeroupper pass.  Before my changes, for explicit user
> vzeroupper, we just weren't modelling its effects at all, it was just
> unspec that didn't tell that it clobbers the upper parts of all XMM < %xmm16
> registers.  But now the splitter will even for those add clobbers and as
> it has no sets, it will add clobbers for all registers, which means
> we optimize away anything that lived across that vzeroupper.
>
> The vzeroupper pass has two parts, one is the mode switching that computes
> where to put the implicit vzeroupper calls and puts them there, and then
> another that uses df to figure out what sets to add to all the vzeroupper.
> The former part should be done only under the conditions we have in the
> gate, but the latter as this PR shows needs to happen either if we perform
> the implicit vzeroupper additions, or if there are (or could be) any
> explicit vzeroupper instructions.  As that function does df_analyze and
> walks the whole IL, I think it would be too expensive to run it always
> whenever TARGET_AVX, so this patch remembers if we've expanded at least
> one __builtin_ia32_vzeroupper in the function and runs that part of the
> vzeroupper pass both when the old condition is true or when this new
> flag is set.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-03-16  Jakub Jelinek  
>
> PR target/99563
> * config/i386/i386.h (struct machine_function): Add
> has_explicit_vzeroupper bitfield.
> * config/i386/i386-expand.c (ix86_expand_builtin): Set
> cfun->machine->has_explicit_vzeroupper when expanding
> IX86_BUILTIN_VZEROUPPER.
> * config/i386/i386-features.c (rest_of_handle_insert_vzeroupper):
> Do the mode switching only when TARGET_VZEROUPPER, expensive
> optimizations turned on and not optimizing for size.
> (pass_insert_vzeroupper::gate): Enable even when
> cfun->machine->has_explicit_vzeroupper is set.
>
> * gcc.target/i386/avx-pr99563.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386.h.jj   2021-02-22 17:54:05.617799002 +0100
> +++ gcc/config/i386/i386.h  2021-03-15 12:30:00.814841624 +0100
> @@ -2941,6 +2941,10 @@ struct GTY(()) machine_function {
>/* True if the function needs a stack frame.  */
>BOOL_BITFIELD stack_frame_required : 1;
>
> +  /* True if __builtin_ia32_vzeroupper () has been expanded in current
> + function.  */
> +  BOOL_BITFIELD has_explicit_vzeroupper : 1;
> +
>/* The largest alignment, in bytes, of stack slot actually used.  */
>unsigned int max_used_stack_alignment;
>
> --- gcc/config/i386/i386-expand.c.jj2021-02-09 12:28:14.069323264 +0100
> +++ gcc/config/i386/i386-expand.c   2021-03-15 12:34:26.549901726 +0100
> @@ -13210,6 +13210,10 @@ rdseed_step:
>
>return 0;
>
> +case IX86_BUILTIN_VZEROUPPER:
> +  cfun->machine->has_explicit_vzeroupper = true;
> +  break;
> +
>  default:
>break;
>  }
> --- gcc/config/i386/i386-features.c.jj  2021-02-01 09:55:45.953519272 +0100
> +++ gcc/config/i386/i386-features.c 2021-03-15 12:37:07.886116827 +0100
> @@ -1837,19 +1837,22 @@ ix86_add_reg_usage_to_vzerouppers (void)
>  static unsigned int
>  rest_of_handle_insert_vzeroupper (void)
>  {
> -  int i;
> -
> -  /* vzeroupper instructions are inserted immediately after reload to
> - account for possible spills from 256bit or 512bit registers.  The pass
> - reuses mode switching infrastructure by re-running mode insertion
> - pass, so disable entities that have already been processed.  */
> -  for (i = 0; i < MAX_386_ENTITIES; i++)
> -ix86_optimize_mode_switching[i] = 0;
> +  if (TARGET_VZEROUPPER
> +  && flag_expensive_optimizations
> +  && !optimize_size)
> +{
> +  /* vzeroupper instructions are inserted immediately after reload to
> +account for possible spills from 256bit or 512bit registers.  The 
> pass
> +reuses mode switching infrastructure by re-running mode insertion
> +pass, so disable entities that have already been processed.  */
> +  for (int i = 0; i < MAX_386_ENTITIES; i++)
> +   ix86_optimize_mode_switching[i] = 0;
>
> -  ix86_optimize_mode_switching[AVX_U128] = 1;
> +  ix86_optimize_mode_switching[AVX_U128] = 1;
>
> -  /* Call optimize_mode_switching.  */
> -  g->get_passes ()->execute_pass_mode_switching ();
> +  /* Call optimize_mode_switching.  */
> +  g->get_passes ()->execute_pass_mode_switching ();
> +}
>ix86_add_reg_usage_to_vzerouppers ();
>

[PATCH] i386: Fix up _mm256_vzeroupper() handling [PR99563]

2021-03-16 Thread Jakub Jelinek via Gcc-patches
Hi!

My r10-6451-gb7b3378f91c0641f2ef4d88db22af62a571c9359 fix for
vzeroupper vs. ms ABI apparently broke the explicit vzeroupper handling
when the implicit vzeroupper handling is disabled.
The epilogue_completed splitter for vzeroupper now adds clobbers for all
registers which don't have explicit sets in the pattern and the sets are
added during the vzeroupper pass.  Before my changes, an explicit user
vzeroupper wasn't modelled at all; it was just an unspec that didn't say
that it clobbers the upper parts of all XMM < %xmm16 registers.  But now
the splitter will add clobbers even for those, and as the pattern has no
sets, it will add clobbers for all registers, which means we optimize away
anything that lived across that vzeroupper.
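
For illustration (this is not the avx-pr99563.c testcase, just a sketch of
the affected pattern), a value living across an explicit vzeroupper was
treated as clobbered:

#include <immintrin.h>

/* Compile with -mavx.  'lo' lives across the explicit vzeroupper and must
   survive it; with the blanket clobbers the compiler considered it
   clobbered.  */
double
keep_across_vzeroupper (__m256d v, double x)
{
  double lo = _mm256_cvtsd_f64 (v);
  _mm256_zeroupper ();
  return lo + x;
}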

The vzeroupper pass has two parts, one is the mode switching that computes
where to put the implicit vzeroupper calls and puts them there, and then
another that uses df to figure out what sets to add to all the vzeroupper.
The former part should be done only under the conditions we have in the
gate, but the latter, as this PR shows, needs to happen either if we perform
the implicit vzeroupper additions, or if there are (or could be) any
explicit vzeroupper instructions.  As that function does df_analyze and
walks the whole IL, I think it would be too expensive to run it always
whenever TARGET_AVX, so this patch remembers whether we've expanded at least
one __builtin_ia32_vzeroupper in the function and runs that part of the
vzeroupper pass either when the old condition is true or when this new
flag is set.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-03-16  Jakub Jelinek  

PR target/99563
* config/i386/i386.h (struct machine_function): Add
has_explicit_vzeroupper bitfield.
* config/i386/i386-expand.c (ix86_expand_builtin): Set
cfun->machine->has_explicit_vzeroupper when expanding
IX86_BUILTIN_VZEROUPPER.
* config/i386/i386-features.c (rest_of_handle_insert_vzeroupper):
Do the mode switching only when TARGET_VZEROUPPER, expensive
optimizations turned on and not optimizing for size.
(pass_insert_vzeroupper::gate): Enable even when
cfun->machine->has_explicit_vzeroupper is set.

* gcc.target/i386/avx-pr99563.c: New test.

--- gcc/config/i386/i386.h.jj   2021-02-22 17:54:05.617799002 +0100
+++ gcc/config/i386/i386.h  2021-03-15 12:30:00.814841624 +0100
@@ -2941,6 +2941,10 @@ struct GTY(()) machine_function {
   /* True if the function needs a stack frame.  */
   BOOL_BITFIELD stack_frame_required : 1;
 
+  /* True if __builtin_ia32_vzeroupper () has been expanded in current
+ function.  */
+  BOOL_BITFIELD has_explicit_vzeroupper : 1;
+
   /* The largest alignment, in bytes, of stack slot actually used.  */
   unsigned int max_used_stack_alignment;
 
--- gcc/config/i386/i386-expand.c.jj2021-02-09 12:28:14.069323264 +0100
+++ gcc/config/i386/i386-expand.c   2021-03-15 12:34:26.549901726 +0100
@@ -13210,6 +13210,10 @@ rdseed_step:
 
   return 0;
 
+case IX86_BUILTIN_VZEROUPPER:
+  cfun->machine->has_explicit_vzeroupper = true;
+  break;
+
 default:
   break;
 }
--- gcc/config/i386/i386-features.c.jj  2021-02-01 09:55:45.953519272 +0100
+++ gcc/config/i386/i386-features.c 2021-03-15 12:37:07.886116827 +0100
@@ -1837,19 +1837,22 @@ ix86_add_reg_usage_to_vzerouppers (void)
 static unsigned int
 rest_of_handle_insert_vzeroupper (void)
 {
-  int i;
-
-  /* vzeroupper instructions are inserted immediately after reload to
- account for possible spills from 256bit or 512bit registers.  The pass
- reuses mode switching infrastructure by re-running mode insertion
- pass, so disable entities that have already been processed.  */
-  for (i = 0; i < MAX_386_ENTITIES; i++)
-ix86_optimize_mode_switching[i] = 0;
+  if (TARGET_VZEROUPPER
+  && flag_expensive_optimizations
+  && !optimize_size)
+{
+  /* vzeroupper instructions are inserted immediately after reload to
+account for possible spills from 256bit or 512bit registers.  The pass
+reuses mode switching infrastructure by re-running mode insertion
+pass, so disable entities that have already been processed.  */
+  for (int i = 0; i < MAX_386_ENTITIES; i++)
+   ix86_optimize_mode_switching[i] = 0;
 
-  ix86_optimize_mode_switching[AVX_U128] = 1;
+  ix86_optimize_mode_switching[AVX_U128] = 1;
 
-  /* Call optimize_mode_switching.  */
-  g->get_passes ()->execute_pass_mode_switching ();
+  /* Call optimize_mode_switching.  */
+  g->get_passes ()->execute_pass_mode_switching ();
+}
   ix86_add_reg_usage_to_vzerouppers ();
   return 0;
 }
@@ -1880,8 +1883,10 @@ public:
   virtual bool gate (function *)
 {
   return TARGET_AVX
-&& TARGET_VZEROUPPER && flag_expensive_optimizations
-&& !optimize_size;
+&& ((TARGET_VZEROUPPER
+ && 

Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]

2021-03-16 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 16, 2021 at 10:36:35AM +0100, Richard Biener wrote:
> On Tue, Mar 16, 2021 at 10:29 AM Jakub Jelinek  wrote:
> >
> > On Tue, Mar 16, 2021 at 09:46:33AM +0100, Richard Biener wrote:
> > > > shows we have an interface problem here -- we shouldn't need something 
> > > > this
> > > > convoluted to walk through an argument list.  But that's not your 
> > > > problem
> > > > or a problem with the patch.
> > >
> > > The caller side should IMHO always look at TYPE_ARG_TYPES since
> > > that's what determines the ABI.  But IIRC there were problems in the past
> > > that TYPE_ARG_TYPES might be NULL but DECL_ARGUMENTS not?!
> >
> > Yes, for the K&R definitions
> > #pragma omp declare simd
> > int
> > foo (a, b, c)
> >   int a, b, c;
> > {
> >   return a + b + c;
> > }
> 
> The C frontend could still populate TYPE_ARG_TYPES here, OTOH IIRC
> K&R definitions work like unprototyped functions on the caller side, thus
> having varargs argument passing semantics?

Not varargs semantics, but unprototyped function semantics, i.e. the same
as for
int foo ();
Which means callers can pass anything, but it is not ..., i.e. callee can't
use va_start/va_end/va_arg and the ... calling conventions are not in place
(e.g. the passing of arguments in two places on some targets, or the
extra %rax on x86_64 etc.).
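
A tiny example of the distinction, for reference:

int f_unprototyped ();     /* callers may pass anything, but the callee
                              still can't use va_start/va_arg           */
int f_variadic (int, ...); /* real varargs: the ... calling conventions
                              apply (e.g. the extra %rax on x86_64)     */

int
use (void)
{
  /* Both calls compile; only the second one uses the varargs ABI.  */
  return f_unprototyped (1, 2.5) + f_variadic (1, 2.5);
}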

Jakub



Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]

2021-03-16 Thread Richard Biener via Gcc-patches
On Tue, Mar 16, 2021 at 10:29 AM Jakub Jelinek  wrote:
>
> On Tue, Mar 16, 2021 at 09:46:33AM +0100, Richard Biener wrote:
> > > shows we have an interface problem here -- we shouldn't need something 
> > > this
> > > convoluted to walk through an argument list.  But that's not your problem
> > > or a problem with the patch.
> >
> > The caller side should IMHO always look at TYPE_ARG_TYPES since
> > that's what determines the ABI.  But IIRC there were problems in the past
> > that TYPE_ARG_TYPES might be NULL but DECL_ARGUMENTS not?!
>
> Yes, for the K&R definitions
> #pragma omp declare simd
> int
> foo (a, b, c)
>   int a, b, c;
> {
>   return a + b + c;
> }

The C frontend could still populate TYPE_ARG_TYPES here, OTOH IIRC
K&R definitions work like unprototyped functions on the caller side, thus
having varargs argument passing semantics?

> Jakub
>


Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]

2021-03-16 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 16, 2021 at 09:46:33AM +0100, Richard Biener wrote:
> > shows we have an interface problem here -- we shouldn't need something this
> > convoluted to walk through an argument list.  But that's not your problem
> > or a problem with the patch.
> 
> The caller side should IMHO always look at TYPE_ARG_TYPES since
> that's what determines the ABI.  But IIRC there were problems in the past
> that TYPE_ARG_TYPES might be NULL but DECL_ARGUMENTS not?!

Yes, for the K&R definitions
#pragma omp declare simd
int
foo (a, b, c)
  int a, b, c;
{
  return a + b + c;
}

Jakub



Re: [PATCH][pushed] analyzer: document new param

2021-03-16 Thread Martin Liška

On 3/15/21 9:57 PM, Martin Sebor wrote:

Any plans to integrate it into the testsuite?  (That way we presumably
wouldn't need to remember to run it by hand.)


Likely not, I'm not such a big friend of DejaGNU.
Are you willing to help me with that?

Thanks,
Martin


Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]

2021-03-16 Thread Richard Biener via Gcc-patches
On Tue, Mar 16, 2021 at 9:43 AM Richard Sandiford via Gcc-patches
 wrote:
>
> Kyrylo Tkachov  writes:
> > Hi Jakub,
> >
> >> -Original Message-
> >> From: Jakub Jelinek 
> >> Sent: 13 March 2021 20:34
> >> To: Richard Sandiford ; Richard Earnshaw
> >> ; Kyrylo Tkachov 
> >> Cc: gcc-patches@gcc.gnu.org
> >> Subject: [PATCH] aarch64: Fix up
> >> aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]
> >>
> >> Hi!
> >>
> >> As the patch shows, there are several bugs in
> >> aarch64_simd_clone_compute_vecsize_and_simdlen.
> >> One is that unlike for function declarations that aren't definitions
> >> it completely ignores argument types.  Such decls don't have
> >> DECL_ARGUMENTS,
> >> but we can walk TYPE_ARG_TYPES instead, like the i386 backend does or like
> >> the simd cloning code in the middle end does too.
> >>
> >> Another problem is that it checks types of uniform arguments.  That is
> >> unnecessary, uniform arguments are passed the way it normally is, it is
> >> a scalar argument rather than vector, so there is no reason not to support
> >> uniform argument of different size, or long double, structure etc.
> >>
> >> Fixed thusly, bootstrapped/regtested on aarch64-linux, ok for trunk?
> >
> > I don't have much experience in this part of GCC and I think you're best 
> > placed to make a judgement on the implementation of this hook.
> > It's okay from my perspective (it doesn't look like it would break any SVE 
> > logic), but if you'd like to wait for a review from Richard I won't object.
>
> Yeah, it looks good to me too, sorry for the slow response.
>
> IMO:
>
>   tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
>   bool decl_arg_p = (node->definition || type_arg_types == NULL_TREE);
>
>   for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
>t && t != void_list_node; t = TREE_CHAIN (t), i++)
> {
>   tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
>
> shows we have an interface problem here -- we shouldn't need something this
> convoluted to walk through an argument list.  But that's not your problem
> or a problem with the patch.

The caller side should IMHO always look at TYPE_ARG_TYPES since
that's what determines the ABI.  But IIRC there were problems in the past
that TYPE_ARG_TYPES might be NULL but DECL_ARGUMENTS not?!

>
> Thanks,
> Richard


Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]

2021-03-16 Thread Richard Sandiford via Gcc-patches
Kyrylo Tkachov  writes:
> Hi Jakub,
>
>> -Original Message-
>> From: Jakub Jelinek 
>> Sent: 13 March 2021 20:34
>> To: Richard Sandiford ; Richard Earnshaw
>> ; Kyrylo Tkachov 
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: [PATCH] aarch64: Fix up
>> aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]
>>
>> Hi!
>>
>> As the patch shows, there are several bugs in
>> aarch64_simd_clone_compute_vecsize_and_simdlen.
>> One is that unlike for function declarations that aren't definitions
>> it completely ignores argument types.  Such decls don't have
>> DECL_ARGUMENTS,
>> but we can walk TYPE_ARG_TYPES instead, like the i386 backend does or like
>> the simd cloning code in the middle end does too.
>>
>> Another problem is that it checks types of uniform arguments.  That is
>> unnecessary, uniform arguments are passed the way it normally is, it is
>> a scalar argument rather than vector, so there is no reason not to support
>> uniform argument of different size, or long double, structure etc.
>>
>> Fixed thusly, bootstrapped/regtested on aarch64-linux, ok for trunk?
>
> I don't have much experience in this part of GCC and I think you're best 
> placed to make a judgement on the implementation of this hook.
> It's okay from my perspective (it doesn't look like it would break any SVE 
> logic), but if you'd like to wait for a review from Richard I won't object.

Yeah, it looks good to me too, sorry for the slow response.

IMO:

  tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
  bool decl_arg_p = (node->definition || type_arg_types == NULL_TREE);

  for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
   t && t != void_list_node; t = TREE_CHAIN (t), i++)
{
  tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);

shows we have an interface problem here -- we shouldn't need something this
convoluted to walk through an argument list.  But that's not your problem
or a problem with the patch.

Thanks,
Richard


RE: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]

2021-03-16 Thread Kyrylo Tkachov via Gcc-patches
Hi Jakub,

> -Original Message-
> From: Jakub Jelinek 
> Sent: 13 March 2021 20:34
> To: Richard Sandiford ; Richard Earnshaw
> ; Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [PATCH] aarch64: Fix up
> aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]
> 
> Hi!
> 
> As the patch shows, there are several bugs in
> aarch64_simd_clone_compute_vecsize_and_simdlen.
> One is that unlike for function declarations that aren't definitions
> it completely ignores argument types.  Such decls don't have
> DECL_ARGUMENTS,
> but we can walk TYPE_ARG_TYPES instead, like the i386 backend does or like
> the simd cloning code in the middle end does too.
> 
> Another problem is that it checks types of uniform arguments.  That is
> unnecessary, uniform arguments are passed the way it normally is, it is
> a scalar argument rather than vector, so there is no reason not to support
> uniform argument of different size, or long double, structure etc.
> 
> Fixed thusly, bootstrapped/regtested on aarch64-linux, ok for trunk?

I don't have much experience in this part of GCC and I think you're best placed 
to make a judgement on the implementation of this hook.
It's okay from my perspective (it doesn't look like it would break any SVE 
logic), but if you'd like to wait for a review from Richard I won't object.

Thanks,
Kyrill

> 
> 2021-03-13  Jakub Jelinek  
> 
>   PR target/99542
>   * config/aarch64/aarch64.c
>   (aarch64_simd_clone_compute_vecsize_and_simdlen): If not a
> function
>   definition, walk TYPE_ARG_TYPES list if non-NULL for argument
> types
>   instead of DECL_ARGUMENTS.  Ignore types for uniform arguments.
> 
>   * gcc.dg/gomp/pr99542.c: New test.
>   * gcc.dg/gomp/pr59669-2.c (bar): Don't expect a warning on
> aarch64.
>   * gcc.dg/gomp/simd-clones-2.c (setArray): Likewise.
>   * g++.dg/vect/simd-clone-7.cc (bar): Likewise.
>   * g++.dg/gomp/declare-simd-1.C (f37): Expect a different warning
>   on aarch64.
>   * gcc.dg/declare-simd.c (fn2): Expect a new warning on aarch64.
> 
> --- gcc/config/aarch64/aarch64.c.jj   2021-03-12 19:01:48.672296344
> +0100
> +++ gcc/config/aarch64/aarch64.c  2021-03-13 09:16:42.585045538
> +0100
> @@ -23412,11 +23412,17 @@
> aarch64_simd_clone_compute_vecsize_and_s
>return 0;
>  }
> 
> -  for (t = DECL_ARGUMENTS (node->decl); t; t = DECL_CHAIN (t))
> +  int i;
> +  tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
> +  bool decl_arg_p = (node->definition || type_arg_types == NULL_TREE);
> +
> +  for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i
> = 0;
> +   t && t != void_list_node; t = TREE_CHAIN (t), i++)
>  {
> -  arg_type = TREE_TYPE (t);
> +  tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
> 
> -  if (!currently_supported_simd_type (arg_type, base_type))
> +  if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
> +   && !currently_supported_simd_type (arg_type, base_type))
>   {
> if (TYPE_SIZE (arg_type) != TYPE_SIZE (base_type))
>   warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> --- gcc/testsuite/gcc.dg/gomp/pr99542.c.jj2021-03-13
> 09:16:42.586045527 +0100
> +++ gcc/testsuite/gcc.dg/gomp/pr99542.c   2021-03-13
> 09:16:42.586045527 +0100
> @@ -0,0 +1,17 @@
> +/* PR middle-end/89246 */
> +/* { dg-do compile { target int128 } } */
> +/* { dg-options "-O0 -fopenmp-simd" } */
> +
> +#pragma omp declare simd
> +extern int foo (__int128 x); /* { dg-warning "GCC does not currently
> support mixed size types for 'simd' function" "" { target aarch64*-*-* } } */
> +/* { dg-warning "unsupported argument type '__int128' for simd" "" { target
> i?86-*-* x86_64-*-* } .-1 } */
> +
> +#pragma omp declare simd uniform (x)
> +extern int baz (__int128 x);
> +
> +#pragma omp declare simd
> +int
> +bar (int x)
> +{
> +  return x + foo (0) + baz (0);
> +}
> --- gcc/testsuite/gcc.dg/gomp/pr59669-2.c.jj  2020-01-12
> 11:54:37.430398065 +0100
> +++ gcc/testsuite/gcc.dg/gomp/pr59669-2.c 2021-03-13
> 09:35:07.100807877 +0100
> @@ -7,4 +7,3 @@ void
>  bar (int *a)
>  {
>  }
> -/* { dg-warning "GCC does not currently support mixed size types for 'simd'
> functions" "" { target aarch64*-*-* } .-3 } */
> --- gcc/testsuite/gcc.dg/gomp/simd-clones-2.c.jj  2020-01-12
> 11:54:37.431398050 +0100
> +++ gcc/testsuite/gcc.dg/gomp/simd-clones-2.c 2021-03-13
> 09:36:21.143988256 +0100
> @@ -15,7 +15,6 @@ float setArray(float *a, float x, int k)
>return a[k];
>  }
> 
> -/* { dg-warning "GCC does not currently support mixed size types for 'simd'
> functions" "" { target aarch64*-*-* } .-6 } */
>  /* { dg-final { scan-tree-dump "_ZGVbN4ua32vl_setArray" "optimized"
> { target i?86-*-* x86_64-*-* } } } */
>  /* { dg-final { scan-tree-dump "_ZGVbN4vvva32_addit" "optimized" { target
> i?86-*-* x86_64-*-* } } } */
>  /* { dg-final { scan-tree-dump "_ZGVbM4vl66u_addit" "optimized" { target
> i?86-*-* x86_64-*-* } 

Re: [PATCH] IBM Z: Fix "+fvm" constraint with long doubles

2021-03-16 Thread Andreas Krebbel via Gcc-patches
On 3/16/21 1:16 AM, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?
> 
> 
> 
> When a long double is passed to an asm statement with a "+fvm"
> constraint, a LRA loop occurs.  This happens, because LRA chooses the
> widest register class in this case (VEC_REGS), but the code generated
> by s390_md_asm_adjust() always wants FP_REGS.  Mismatching register
> classes cause infinite reloading.
> 
> Fix by treating "fv" constraints as "v" in s390_md_asm_adjust().
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.c (f_constraint_p): Treat "fv" constraints
>   as "v".
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/vector/long-double-asm-fprvrmem.c: New test.

Ok. Thanks!

Andreas

> ---
>  gcc/config/s390/s390.c   | 12 ++--
>  .../s390/vector/long-double-asm-fprvrmem.c   | 11 +++
>  2 files changed, 21 insertions(+), 2 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c
> 
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index 151136bedbc..f7b1c03561e 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -16714,13 +16714,21 @@ s390_shift_truncation_mask (machine_mode mode)
>  static bool
>  f_constraint_p (const char *constraint)
>  {
> +  bool seen_f_p = false;
> +  bool seen_v_p = false;
> +
>for (size_t i = 0, c_len = strlen (constraint); i < c_len;
> i += CONSTRAINT_LEN (constraint[i], constraint + i))
>  {
>if (constraint[i] == 'f')
> - return true;
> + seen_f_p = true;
> +  if (constraint[i] == 'v')
> + seen_v_p = true;
>  }
> -  return false;
> +
> +  /* Treat "fv" constraints as "v", because LRA will choose the widest 
> register
> +   * class.  */
> +  return seen_f_p && !seen_v_p;
>  }
>  
>  /* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f"
> diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c 
> b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c
> new file mode 100644
> index 000..f95656c5723
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=z14 -mzarch" } */
> +
> +long double
> +foo (long double x)
> +{
> +  x = x * x;
> +  asm("# %0" : "+fvm"(x));
> +  x = x + x;
> +  return x;
> +}
>