Committed: gcc.dg/tree-ssa/ssa-dse-26.c: Skip on mmix-knuth-mmixware

2021-07-30 Thread Hans-Peter Nilsson
Commit r12-432, rewriting the dg-stuff, reverted the
adjustment for mmix-knuth-mmixware that I added in r11-2335.
(See those commits for context.)

Hopefully this variant will age better, just skipping it
with a trivial extra line less prone to pile-on.  (Not much
is won by covering this generic case for MMIX too; might as
well skip it.)

Beware that the dg-skip-if text can't say
"temporary variables are not x and y but x::3 and y::4"
because that leads to (on one line):

ERROR: gcc.dg/tree-ssa/ssa-dse-26.c: can't set "{temporary
 variables are not x and y but x::3 and y::4} {
 mmix-knuth-mmixware }": parent namespace doesn't exist for
 " dg-skip-if 4 "temporary variables are not x and y but
 x::3 and y::4" { mmix-knuth-mmixware } "

gcc/testsuite:
* gcc.dg/tree-ssa/ssa-dse-26.c: Skip on mmix-knuth-mmixware.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
index 5eabfb464d3c..e3c33f49ef60 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-dse1-details -fno-short-enums -fno-tree-fre" } */
 /* { dg-skip-if "we want a BIT_FIELD_REF from fold_truth_andor" { ! lp64 } } */
+/* { dg-skip-if "temporary variable names are not x and y" { mmix-knuth-mmixware } } */

 enum constraint_expr_type
 {
-- 
2.20.1



Committed: gcc.dg/uninit-pred-9_b.c: Xfail for MMIX too

2021-07-30 Thread Hans-Peter Nilsson
Looks like MMIX is the "correct target" too (cf. 2f6bdd51cfe15)
and from
https://gcc.gnu.org/pipermail/gcc-testresults/2021-July/710188.html
it seems powerpc-ibm-aix7.2.3.0 is too, but I've not found
other targets failing.

gcc/testsuite:
PR middle-end/101674
* gcc.dg/uninit-pred-9_b.c: Xfail for mmix-*-* too.
---
 gcc/testsuite/gcc.dg/uninit-pred-9_b.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
index e0dc21405f7d..9383c559507c 100644
--- a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
+++ b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
@@ -20,7 +20,7 @@ int foo (int n, int l, int m, int r)
   blah(v); /* { dg-bogus "uninitialized" "bogus warning" } */

   if ( (n <= 8) &&  (m < 99)  && (r < 19) )
-  blah(v); /* { dg-bogus "uninitialized" "pr101674" { xfail powerpc64*-*-* } } */
+  blah(v); /* { dg-bogus "uninitialized" "pr101674" { xfail powerpc64*-*-* mmix-*-* } } */

   return 0;
 }
-- 
2.20.1



Re: [PATCH v2 4/6] rs6000: Add tests for SSE4.1 "ceil" intrinsics

2021-07-30 Thread Paul A. Clarke via Gcc-patches
On Wed, Jul 28, 2021 at 05:16:32PM -0500, Segher Boessenkool wrote:
> On Fri, Jul 16, 2021 at 08:50:20AM -0500, Paul A. Clarke wrote:
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
> > @@ -0,0 +1,27 @@
> > +#include 
> > +#include 
> > +#include "sse4_1-check.h"
> > +
> > +#define DIM(a) (sizeof (a) / sizeof ((a)[0]))
> 
> Pet peeve: sizeof is an operator, not a function, so even if you want to
> protect the macro parameter this just is
>   #define DIM(a) (sizeof (a) / sizeof (a)[0])
> 
> > +  (void) fesetround (round_save);
> 
> Please don't cast to (void).  That never does *anything*.
> 
> Okay for trunk (these are all testsuite files after all, and we should
> test horrrible style as well! :-P )

I didn't want to be responsible for promulgating horrible style, so
I incorporated the above changes and pushed as
d656a3d3ce88d402a14e8c120f1b0e78a3979deb.  :-)

PC
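Segher's point about sizeof can be checked directly.  A minimal sketch (the two arrays are arbitrary examples, not taken from sse4_1-round.h) showing that the divisor without the extra parentheses still yields the element count:

```c
/* Element count of a true array (not a pointer).  The divisor needs no
   extra parentheses: the postfix [] operator binds tighter than unary
   sizeof, so "sizeof (a)[0]" already means "sizeof ((a)[0])".  */
#define DIM(a) (sizeof (a) / sizeof (a)[0])

/* Arbitrary example arrays for illustration only.  */
static const double inputs[] = { -1.5, -0.5, 0.0, 0.5, 1.5 };
static const char *const modes[] = { "floor", "ceil", "round" };
```

Passing a pointer instead of an array would silently compute sizeof (pointer) / sizeof (element), so the macro is only meaningful on arrays.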


Re: [PATCH][gcc/doc] Improve nonnull attribute documentation

2021-07-30 Thread Martin Sebor via Gcc-patches

On 7/30/21 2:21 PM, Tom de Vries wrote:

On 7/30/21 6:17 PM, Martin Sebor wrote:

On 7/28/21 9:20 AM, Tom de Vries wrote:

Hi,

Improve nonnull attribute documentation in a number of ways:

Reorganize discussion of effects into:
- effects for calls to functions with nonnull-marked parameters, and
- effects for function definitions with nonnull-marked parameters.
This makes it clear that -fno-delete-null-pointer-checks has no effect for
optimizations based on nonnull-marked parameters in function definitions
(see PR100404).


This resolves half of PR 101665 that I raised the other day (i.e.,
updates the docs).  Thank you!


You're welcome :)


Since PR 100404 was resolved as invalid,


Yeah, I can also live with reopening that one as documentation PR.


can you please reference the other PR in the changelog?


Done.


The other half (warning when attribute nonnull is specified along
with attribute optimize "-fno-delete-null-pointer-checks") remains.
I plan to look into it unless someone beats me to it or unless some
other solution emerges.



FWIW, in my reply to Richi here (
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576415.html ) I
proposed to split the existing nonnull attribute functionality into
assume_nonnull/verify_nonnull attributes.  I'm curious what you think of
that proposal.


I think it may have been worth considering at the time nonnull was
being proposed but it's too late now.  Users are familiar with it
under its current name, it's embedded in lots of code, and it's
supported by GCC-compatible compilers.  Introducing a pair of new
attributes with slightly different semantics would make it difficult
to write code that's meant to be portable across these implementations
(including prior versions of GCC).  Not to mention that it wouldn't
help users of nonnull.

But since the problem is limited to function definitions, we don't
need a solution for declarations.  What's missing is an intuitive
way to tell GCC that an attribute on the declaration should not
have an effect in its definition.  A nicer name for Andrew's asm
trick.  Coming up with a generic enough name would make it usable
for arguments with other attributes with similar effect (the only
one that comes to mind at the moment is attribute access which
also triggers warnings in function bodies, although it's not yet
used for optimization; others might pop up in the future).

Martin




A few comments on the documentation changes below.



Mention -Wnonnull-compare.

Mention workaround from PR100404 comment 7.

The workaround can be used for this scenario.  Say we have a test.c:
...
   #include 

   extern int isnull (char *ptr) __attribute__ ((nonnull));
   int isnull (char *ptr)
   {
     if (ptr == 0)
       return 1;
     return 0;
   }

   int
   main (void)
   {
     char *ptr = NULL;
     if (isnull (ptr)) __builtin_abort ();
     return 0;
   }
...

The test-case contains a mistake: ptr == NULL, and we want to detect the
mistake using an abort:
...
$ gcc test.c
$ ./a.out
Aborted (core dumped)
...

At -O2 however, the mistake is not detected:
...
$ gcc test.c -O2
$ ./a.out
...
which is what -Wnonnull-compare (not shown here) warns about.

The easiest way to fix this is by dropping the nonnull attribute.  But that
also disables -Wnonnull, which would detect something like:
...
    if (isnull (NULL)) __builtin_abort ();
...
at compile time.

Using this workaround:
...
   int isnull (char *ptr)
   {
+  asm ("" : "+r"(ptr));
     if (ptr == 0)
       return 1;
     return 0;
   }
...
we still manage to detect the problem at runtime with -O2:
...
$ ~/gcc_versions/devel/install/bin/gcc test.c -O2
$ ./a.out
Aborted (core dumped)
...
while keeping the possibility to detect "isnull (NULL)" at compile time.
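For reference, the two fragments combine into one file roughly as follows.  This is a sketch, not part of the patch: `<stddef.h>` is assumed as the header for NULL (the header name on the include line above did not survive), and `__asm__` is spelled with underscores so the file also builds in strict ISO C modes.  The declaration keeps attribute nonnull, so -Wnonnull can still flag a literal `isnull (NULL)`, while the empty asm makes the value of `ptr` opaque to the optimizer.

```c
#include <stddef.h>

/* The declaration keeps the attribute, so calls such as isnull (NULL)
   can still be diagnosed by -Wnonnull at compile time.  */
extern int isnull (char *ptr) __attribute__ ((nonnull));

int
isnull (char *ptr)
{
  /* Empty asm that claims to read and modify PTR: after it, GCC can no
     longer treat PTR as the nonnull-marked incoming parameter, so the
     comparison below is not folded away at -O2 (PR100404 comment 7).  */
  __asm__ ("" : "+r" (ptr));
  if (ptr == NULL)
    return 1;
  return 0;
}
```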

OK for trunk?

Thanks,
- Tom

[gcc/doc] Improve nonnull attribute documentation

gcc/ChangeLog:

2021-07-28  Tom de Vries  

 * doc/extend.texi (nonnull attribute): Improve documentation.

---
   gcc/doc/extend.texi | 51
---
   1 file changed, 40 insertions(+), 11 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b83cd4919bb..3389effd70c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3488,17 +3488,46 @@ my_memcpy (void *dest, const void *src, size_t len)
   @end smallexample
     @noindent
-causes the compiler to check that, in calls to @code{my_memcpy},
-arguments @var{dest} and @var{src} are non-null.  If the compiler
-determines that a null pointer is passed in an argument slot marked
-as non-null, and the @option{-Wnonnull} option is enabled, a warning
-is issued.  @xref{Warning Options}.  Unless disabled by
-the @option{-fno-delete-null-pointer-checks} option the compiler may
-also perform optimizations based on the knowledge that certain function
-arguments cannot be null. In addition,
-the @option{-fisolate-erroneous-paths-attribute} option can be specified
-to have GCC transform calls with null arguments to non-null functions

Committed: MMIX: remove generic placeholders parameters in call insn patterns

2021-07-30 Thread Hans-Peter Nilsson
I guess the best way to describe these operands, at least for MMIX, is
"ballast".  Some targets seem to drag along one or two of the incoming
pattern operands through the rtl passes, not dropping them until
assembly output.  Let's stop doing that for MMIX.  There really are
*two* unused parameters: one is a number corresponding to the
stack-size of arguments as a const_int and the other is whatever the
target yields for targetm.calls.function_arg (args_so_far,
function_arg_info::end_marker ()).  There's a mandatory second
argument to the "call" RTX, but the target doesn't have to keep it a
variable number; it can be replaced by (const_int 0) early, like this.

Astute readers may object that as the MMIX call-type insns (PUSHJ,
PUSHGO) have a parameter in addition to the address of the called
function, so should the emitted RTL.  But, that parameter depends only
on the local function, not the called function (IOW, it's the same for
all calls in a function), and its value isn't known until frame layout
time.  Having it as a parameter in the emitted RTL for the call would
just be confusing.  (Maybe this will be amended later, if/when
improving "shrink-wrapping".)

gcc:
* config/mmix/mmix.md ("call", "call_value", "*call_real")
("*call_value_real"): Don't generate rtx mentioning the generic
operands 1 and 2 to "call", and similarly for "call_value".
* config/mmix/mmix.c (mmix_print_operand_punct_valid_p)
(mmix_print_operand): Use '!' instead of 'p'.
---
 gcc/config/mmix/mmix.c  | 20 +++
 gcc/config/mmix/mmix.md | 56 +
 2 files changed, 33 insertions(+), 43 deletions(-)

diff --git a/gcc/config/mmix/mmix.c b/gcc/config/mmix/mmix.c
index db7af7b75b6d..010cd4773eac 100644
--- a/gcc/config/mmix/mmix.c
+++ b/gcc/config/mmix/mmix.c
@@ -1624,6 +1624,12 @@ mmix_print_operand (FILE *stream, rtx x, int code)
   fprintf (stream, "%d", MMIX_POP_ARGUMENT ());
   return;

+case '!':
+  /* The number of registers we want to save.  This was setup by the
+prologue.  */
+  fprintf (stream, "%d", cfun->machine->highest_saved_stack_register + 1);
+  return;
+
 case 'B':
   if (GET_CODE (x) != CONST_INT)
fatal_insn ("MMIX Internal: Expected a CONST_INT, not this", x);
@@ -1712,15 +1718,6 @@ mmix_print_operand (FILE *stream, rtx x, int code)
   (int64_t) (mmix_intval (x) - 1));
   return;

-case 'p':
-  /* Store the number of registers we want to save.  This was setup
-by the prologue.  The actual operand contains the number of
-registers to pass, but we don't use it currently.  Anyway, we
-need to output the number of saved registers here.  */
-  fprintf (stream, "%d",
-  cfun->machine->highest_saved_stack_register + 1);
-  return;
-
 case 'r':
   /* Store the register to output a constant to.  */
   if (! REG_P (x))
@@ -1830,7 +1827,10 @@ mmix_print_operand_punct_valid_p (unsigned char code)
   /* A '+' is used for branch prediction, similar to other ports.  */
   return code == '+'
 /* A '.' is used for the %d in the POP %d,0 return insn.  */
-|| code == '.';
+|| code == '.'
+/* A '!' is used for the number of saved registers, like when outputting
+   PUSHJ and PUSHGO. */
+|| code == '!';
 }

 /* TARGET_PRINT_OPERAND_ADDRESS.  */
diff --git a/gcc/config/mmix/mmix.md b/gcc/config/mmix/mmix.md
index 33e9c60982d6..99be8263a1a1 100644
--- a/gcc/config/mmix/mmix.md
+++ b/gcc/config/mmix/mmix.md
@@ -974,11 +974,9 @@ (define_insn "*bCC_inverted"
   "%+B%D1 %2,%0")

 (define_expand "call"
-  [(parallel [(call (match_operand:QI 0 "memory_operand" "")
-   (match_operand 1 "general_operand" ""))
- (use (match_operand 2 "general_operand" ""))
- (clobber (match_dup 4))])
-   (set (match_dup 4) (match_dup 3))]
+  [(parallel [(call (match_operand:QI 0 "memory_operand" "") (const_int 0))
+ (clobber (match_dup 1))])
+   (set (match_dup 1) (match_dup 2))]
   ""
   "
 {
@@ -992,28 +990,24 @@ (define_expand "call"
   = replace_equiv_address (operands[0],
   force_reg (Pmode, XEXP (operands[0], 0)));

+  /* Note that we overwrite the generic operands[1] and operands[2]; we
+ don't use those values.  */
+  operands[1] = gen_rtx_REG (DImode, MMIX_INCOMING_RETURN_ADDRESS_REGNUM);
+
   /* Since the epilogue 'uses' the return address, and it is clobbered
  in the call, and we set it back after every call (all but one setting
  will be optimized away), integrity is maintained.  */
-  operands[3]
+  operands[2]
 = mmix_get_hard_reg_initial_val (Pmode,
 MMIX_INCOMING_RETURN_ADDRESS_REGNUM);
-
-  /* NULL gets passed as operand[2] when we get out of registers,
- which later confuses gcc.  Replace it with const_int 0.  */
-  if (operands[2] == NULL_RTX)
-operands[2] = const0_rtx;

[PATCH v6 05/10] x86: Add tests for piecewise move and store

2021-07-30 Thread H.J. Lu via Gcc-patches
* gcc.target/i386/pieces-memcpy-10.c: New test.
* gcc.target/i386/pieces-memcpy-11.c: Likewise.
* gcc.target/i386/pieces-memcpy-12.c: Likewise.
* gcc.target/i386/pieces-memcpy-13.c: Likewise.
* gcc.target/i386/pieces-memcpy-14.c: Likewise.
* gcc.target/i386/pieces-memcpy-15.c: Likewise.
* gcc.target/i386/pieces-memcpy-16.c: Likewise.
* gcc.target/i386/pieces-memcpy-17.c: Likewise.
* gcc.target/i386/pieces-memcpy-18.c: Likewise.
* gcc.target/i386/pieces-memcpy-19.c: Likewise.
* gcc.target/i386/pieces-memset-1.c: Likewise.
* gcc.target/i386/pieces-memset-2.c: Likewise.
* gcc.target/i386/pieces-memset-3.c: Likewise.
* gcc.target/i386/pieces-memset-4.c: Likewise.
* gcc.target/i386/pieces-memset-5.c: Likewise.
* gcc.target/i386/pieces-memset-6.c: Likewise.
* gcc.target/i386/pieces-memset-7.c: Likewise.
* gcc.target/i386/pieces-memset-8.c: Likewise.
* gcc.target/i386/pieces-memset-9.c: Likewise.
* gcc.target/i386/pieces-memset-10.c: Likewise.
* gcc.target/i386/pieces-memset-11.c: Likewise.
* gcc.target/i386/pieces-memset-12.c: Likewise.
* gcc.target/i386/pieces-memset-13.c: Likewise.
* gcc.target/i386/pieces-memset-14.c: Likewise.
* gcc.target/i386/pieces-memset-15.c: Likewise.
* gcc.target/i386/pieces-memset-16.c: Likewise.
* gcc.target/i386/pieces-memset-17.c: Likewise.
* gcc.target/i386/pieces-memset-18.c: Likewise.
* gcc.target/i386/pieces-memset-19.c: Likewise.
* gcc.target/i386/pieces-memset-20.c: Likewise.
* gcc.target/i386/pieces-memset-21.c: Likewise.
* gcc.target/i386/pieces-memset-22.c: Likewise.
* gcc.target/i386/pieces-memset-23.c: Likewise.
* gcc.target/i386/pieces-memset-24.c: Likewise.
* gcc.target/i386/pieces-memset-25.c: Likewise.
* gcc.target/i386/pieces-memset-26.c: Likewise.
* gcc.target/i386/pieces-memset-27.c: Likewise.
* gcc.target/i386/pieces-memset-28.c: Likewise.
* gcc.target/i386/pieces-memset-29.c: Likewise.
* gcc.target/i386/pieces-memset-30.c: Likewise.
* gcc.target/i386/pieces-memset-31.c: Likewise.
* gcc.target/i386/pieces-memset-32.c: Likewise.
* gcc.target/i386/pieces-memset-33.c: Likewise.
* gcc.target/i386/pieces-memset-34.c: Likewise.
* gcc.target/i386/pieces-memset-35.c: Likewise.
* gcc.target/i386/pieces-memset-36.c: Likewise.
* gcc.target/i386/pieces-memset-37.c: Likewise.
* gcc.target/i386/pieces-memset-38.c: Likewise.
* gcc.target/i386/pieces-memset-39.c: Likewise.
* gcc.target/i386/pieces-memset-40.c: Likewise.
* gcc.target/i386/pieces-memset-41.c: Likewise.
* gcc.target/i386/pieces-memset-42.c: Likewise.
* gcc.target/i386/pieces-memset-43.c: Likewise.
* gcc.target/i386/pieces-memset-44.c: Likewise.
---
 .../gcc.target/i386/pieces-memcpy-10.c | 16 
 .../gcc.target/i386/pieces-memcpy-11.c | 17 +
 .../gcc.target/i386/pieces-memcpy-12.c | 16 
 .../gcc.target/i386/pieces-memcpy-13.c | 16 
 .../gcc.target/i386/pieces-memcpy-14.c | 17 +
 .../gcc.target/i386/pieces-memcpy-15.c | 16 
 .../gcc.target/i386/pieces-memcpy-16.c | 16 
 .../gcc.target/i386/pieces-memcpy-7.c  | 15 +++
 .../gcc.target/i386/pieces-memcpy-8.c  | 14 ++
 .../gcc.target/i386/pieces-memcpy-9.c  | 14 ++
 .../gcc.target/i386/pieces-memset-1.c  | 16 
 .../gcc.target/i386/pieces-memset-10.c | 16 
 .../gcc.target/i386/pieces-memset-11.c | 16 
 .../gcc.target/i386/pieces-memset-12.c | 16 
 .../gcc.target/i386/pieces-memset-13.c | 16 
 .../gcc.target/i386/pieces-memset-14.c | 16 
 .../gcc.target/i386/pieces-memset-15.c | 16 
 .../gcc.target/i386/pieces-memset-16.c | 16 
 .../gcc.target/i386/pieces-memset-17.c | 16 
 .../gcc.target/i386/pieces-memset-18.c | 16 
 .../gcc.target/i386/pieces-memset-19.c | 17 +
 .../gcc.target/i386/pieces-memset-2.c  | 12 
 .../gcc.target/i386/pieces-memset-20.c | 17 +
 .../gcc.target/i386/pieces-memset-21.c | 18 ++
 .../gcc.target/i386/pieces-memset-22.c | 17 +
 .../gcc.target/i386/pieces-memset-23.c | 17 +
 .../gcc.target/i386/pieces-memset-24.c | 17 +
 .../gcc.target/i386/pieces-memset-25.c | 17 +
 

[PATCH v6 09/10] x86: Update gcc.target/i386/incoming-11.c

2021-07-30 Thread H.J. Lu via Gcc-patches
Expect no stack realignment since we no longer realign stack when
copying data.
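The old expectation in incoming-11.c scanned for `andl $-16, %esp`, the instruction that realigns the 32-bit stack pointer to a 16-byte boundary.  A minimal C sketch of the same arithmetic, using a hypothetical helper name:

```c
#include <stdint.h>

/* Round ADDR down to a multiple of ALIGN, which must be a power of
   two.  Unsigned negation turns 16 into 0xfffffff0, so the AND clears
   the low four bits: the effect "andl $-16, %esp" has on %esp.  */
static uint32_t
align_down_u32 (uint32_t addr, uint32_t align)
{
  return addr & -align;
}
```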

* gcc.target/i386/incoming-11.c: Expect no stack realignment.
---
 gcc/testsuite/gcc.target/i386/incoming-11.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/incoming-11.c b/gcc/testsuite/gcc.target/i386/incoming-11.c
index a830c96f7d1..4b822684b88 100644
--- a/gcc/testsuite/gcc.target/i386/incoming-11.c
+++ b/gcc/testsuite/gcc.target/i386/incoming-11.c
@@ -15,4 +15,4 @@ void f()
for (i = 0; i < 100; i++) q[i] = 1;
 }
 
-/* { dg-final { scan-assembler "andl\[\\t \]*\\$-16,\[\\t \]*%esp" } } */
+/* { dg-final { scan-assembler-not "andl\[\\t \]*\\$-16,\[\\t \]*%esp" } } */
-- 
2.31.1



[PATCH v6 10/10] x86: Also pass -mno-sse to vect8-ret.c

2021-07-30 Thread H.J. Lu via Gcc-patches
Also pass -mno-sse to vect8-ret.c to disable XMM load/store when running
GCC tests with "-march=x86-64 -m32".

* gcc.target/i386/vect8-ret.c: Also pass -mno-sse.
---
 gcc/testsuite/gcc.target/i386/vect8-ret.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/vect8-ret.c b/gcc/testsuite/gcc.target/i386/vect8-ret.c
index 2b2b81ecf7a..6ace07e6e0c 100644
--- a/gcc/testsuite/gcc.target/i386/vect8-ret.c
+++ b/gcc/testsuite/gcc.target/i386/vect8-ret.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ia32 && { ! *-*-vxworks* } } } } */
-/* { dg-options "-mmmx -mvect8-ret-in-mem" } */
+/* { dg-options "-mmmx -mno-sse -mvect8-ret-in-mem" } */
 
 #include 
 
-- 
2.31.1



[PATCH v6 08/10] x86: Also pass -mno-avx to sw-1.c for ia32

2021-07-30 Thread H.J. Lu via Gcc-patches
Also pass -mno-avx to sw-1.c for ia32 since copying data with YMM or ZMM
registers disables shrink-wrapping when the second argument is passed on
stack.

* gcc.target/i386/sw-1.c: Also pass -mno-avx for ia32.
---
 gcc/testsuite/gcc.target/i386/sw-1.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/i386/sw-1.c b/gcc/testsuite/gcc.target/i386/sw-1.c
index aec095eda62..a9c89fca4ec 100644
--- a/gcc/testsuite/gcc.target/i386/sw-1.c
+++ b/gcc/testsuite/gcc.target/i386/sw-1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mtune=generic -fshrink-wrap -fdump-rtl-pro_and_epilogue" } */
+/* { dg-additional-options "-mno-avx" { target ia32 } } */
 /* { dg-skip-if "No shrink-wrapping preformed" { x86_64-*-mingw* } } */
 
 #include 
-- 
2.31.1



[PATCH v6 03/10] x86: Update piecewise move and store

2021-07-30 Thread H.J. Lu via Gcc-patches
We can use TImode/OImode/XImode integers for piecewise move and store.
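TImode, OImode and XImode are the 16-, 32- and 64-byte integer modes.  As a rough illustration of what "piecewise" means, here is a simplified, hypothetical model (not GCC's by-pieces code, which has more strategies) that splits a store greedily into the largest power-of-two pieces allowed by a maximum piece size.  With a 16-byte maximum it gives 16 + 1 for 17 bytes, the same shape pr90773-15 expects (a 16-byte vmovdqu8 plus a movb at offset 16), and 16 + 4 for 20 bytes, the shape of the pr90773-14 expectations.

```c
#include <stddef.h>

/* Hypothetical model: split an N-byte store into pieces, greedily
   taking the largest power-of-two size that fits both the remaining
   length and MAX_PIECE.  MAX_PIECE must be a power of two >= 1.
   Piece sizes go to OUT (room for CAP entries); returns the count.  */
static size_t
plan_pieces (size_t n, size_t max_piece, size_t *out, size_t cap)
{
  size_t count = 0;

  while (n > 0 && count < cap)
    {
      size_t piece = max_piece;

      /* Halve until the piece fits in what is left.  */
      while (piece > n)
        piece /= 2;

      out[count++] = piece;
      n -= piece;
    }
  return count;
}
```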

1. Define MAX_MOVE_MAX to 64, which is the constant maximum number of
bytes that a single instruction can move quickly between memory and
registers or between two memory locations.
2. Define MOVE_MAX to MOVE_MAX_PIECES, which is the maximum number of
bytes we can move from memory to memory in one reasonably fast instruction.
The difference between MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX
must be a constant, independent of compiler options, since it is used in
reload.h to define struct target_reload and MOVE_MAX can vary, depending
on compiler options.
3. When vector registers are used for piecewise move and store, we don't
increase stack_alignment_needed, since piecewise move and store never
spill vector registers.  Because stack_realign_needed is set to true by
checking stack_alignment_estimated, which is updated by pseudo vector
register usage, we also need to check stack_realign_needed to eliminate
the frame pointer.
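The split between a fixed MAX_MOVE_MAX and an option-dependent MOVE_MAX mirrors a general C constraint: an array member needs a constant size, while a per-compilation choice can only be a run-time value bounded by that constant.  A minimal sketch with hypothetical stand-in names (nothing here is copied from i386.h or reload.h, and the ISA selection logic is an assumption):

```c
#include <stdbool.h>

/* Option-independent bound: a constant expression, so it can size an
   array member, the role the commit message gives MAX_MOVE_MAX (64)
   in reload.h's struct target_reload.  */
#define SKETCH_MAX_MOVE_MAX 64

struct sketch_target_reload
{
  unsigned char scratch[SKETCH_MAX_MOVE_MAX];
};

/* Option-dependent value, in the spirit of MOVE_MAX: the widest
   integer a vector register can hold under the enabled ISA.  */
static unsigned int
sketch_move_max (bool avx512f, bool avx)
{
  if (avx512f)
    return 64; /* XImode */
  if (avx)
    return 32; /* OImode */
  return 16;   /* TImode */
}
```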

gcc/

* config/i386/i386.c (ix86_finalize_stack_frame_flags): Also
check stack_realign_needed for stack realignment.
(ix86_legitimate_constant_p): Always allow CONST_WIDE_INT smaller
than the largest integer supported by vector register.
* config/i386/i386.h (MAX_MOVE_MAX): New.  Set to 64.
(MOVE_MAX_PIECES): Set to bytes of the largest integer supported
by vector register.
(MOVE_MAX): Defined to MOVE_MAX_PIECES.
(STORE_MAX_PIECES): New.

gcc/testsuite/

* gcc.target/i386/pr90773-1.c: Adjust to expect movq for 32-bit.
* gcc.target/i386/pr90773-4.c: Also run for 32-bit.
* gcc.target/i386/pr90773-15.c: Likewise.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr90773-24.c: Likewise.
* gcc.target/i386/pr90773-25.c: Likewise.
* gcc.target/i386/pr100865-1.c: Likewise.
* gcc.target/i386/pr100865-2.c: Likewise.
* gcc.target/i386/pr100865-3.c: Likewise.
* gcc.target/i386/pr90773-14.c: Also run for 32-bit and expect
XMM movd to store 4 bytes.
* gcc.target/i386/pr100865-4a.c: Also run for 32-bit and expect
YMM registers.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-10a.c: Expect YMM registers.
* gcc.target/i386/pr100865-10b.c: Likewise.
---
 gcc/config/i386/i386.c   | 21 --
 gcc/config/i386/i386.h   | 40 
 gcc/testsuite/gcc.target/i386/pr100865-1.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-10a.c |  4 +-
 gcc/testsuite/gcc.target/i386/pr100865-10b.c |  4 +-
 gcc/testsuite/gcc.target/i386/pr100865-2.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-3.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-4a.c  |  6 +--
 gcc/testsuite/gcc.target/i386/pr100865-4b.c  |  8 ++--
 gcc/testsuite/gcc.target/i386/pr90773-1.c| 10 ++---
 gcc/testsuite/gcc.target/i386/pr90773-14.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-15.c   |  6 +--
 gcc/testsuite/gcc.target/i386/pr90773-16.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-17.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-24.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-25.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-4.c|  2 +-
 17 files changed, 76 insertions(+), 41 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 5d20ca2067f..842eb0e6786 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -7953,8 +7953,17 @@ ix86_finalize_stack_frame_flags (void)
  assumed stack realignment might be needed or -fno-omit-frame-pointer
  is used, but in the end nothing that needed the stack alignment had
  been spilled nor stack access, clear frame_pointer_needed and say we
- don't need stack realignment.  */
-  if ((stack_realign || (!flag_omit_frame_pointer && optimize))
+ don't need stack realignment.
+
+ When vector register is used for piecewise move and store, we don't
+ increase stack_alignment_needed as there is no register spill for
+ piecewise move and store.  Since stack_realign_needed is set to true
+ by checking stack_alignment_estimated which is updated by pseudo
+ vector register usage, we also need to check stack_realign_needed to
+ eliminate frame pointer.  */
+  if ((stack_realign
+   || (!flag_omit_frame_pointer && optimize)
+   || crtl->stack_realign_needed)
   && frame_pointer_needed
   && crtl->is_leaf
   && crtl->sp_is_unchanging
@@ -10418,7 +10427,13 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
  /* FALLTHRU */
case E_OImode:
case E_XImode:
- if (!standard_sse_constant_p (x, mode))
+ if (!standard_sse_constant_p (x, mode)
+ && GET_MODE_SIZE (TARGET_AVX512F
+   ? XImode
+   

[PATCH v6 06/10] x86: Also pass -mno-avx to pr72839.c

2021-07-30 Thread H.J. Lu via Gcc-patches
Also pass -mno-avx to pr72839.c to avoid copying data with YMM or ZMM
registers.

* gcc.target/i386/pr72839.c: Also pass -mno-avx.
---
 gcc/testsuite/gcc.target/i386/pr72839.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr72839.c b/gcc/testsuite/gcc.target/i386/pr72839.c
index ea724f70377..6888d9d0a55 100644
--- a/gcc/testsuite/gcc.target/i386/pr72839.c
+++ b/gcc/testsuite/gcc.target/i386/pr72839.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target ia32 } */
-/* { dg-options "-O2 -mtune=lakemont" } */
+/* { dg-options "-O2 -mtune=lakemont -mno-avx" } */
 
 extern char *strcpy (char *, const char *);
 
-- 
2.31.1



[PATCH v6 07/10] x86: Also pass -mno-avx to cold-attribute-1.c

2021-07-30 Thread H.J. Lu via Gcc-patches
Also pass -mno-avx to cold-attribute-1.c to avoid copying data with YMM or ZMM
registers.

* gcc.target/i386/cold-attribute-1.c: Also pass -mno-avx.
---
 gcc/testsuite/gcc.target/i386/cold-attribute-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/cold-attribute-1.c b/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
index 57666ac60b6..658eb3e25bb 100644
--- a/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
+++ b/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -mno-avx" } */
 #include 
 static inline
 __attribute__ ((cold)) void
-- 
2.31.1



[PATCH v5 00/10] Allow TImode/OImode/XImode in op_by_pieces operations

2021-07-30 Thread H.J. Lu via Gcc-patches
Changes in the v6 patches:

1. No need to add TARGET_GEN_MEMSET_SCRATCH_RTX nor change the memset
expanders since they have been checked into master branch.

Changes in the v5 patches:

1. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
scratch register to avoid stack realignment when expanding memset.
2. Use vec_duplicate, instead of adding TARGET_READ_MEMSET_VALUE and
TARGET_GEN_MEMSET_VALUE, to expand memset if available.
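For contrast with vec_duplicate, the scalar way to replicate a memset byte is a multiplication by a constant with 0x01 in every byte lane; that is what constants such as the `movabs $0x101010101010101,%rdx` in the libc.so comparison and the `$16843009` (0x01010101) in the old pr90773-14 expectation implement.  A minimal sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Replicate byte C into every byte lane.  The multiplier has 0x01 in
   each byte, so the product is C shifted by 0, 8, 16, ... bits and
   summed; because C <= 0xff the copies never overlap, so no carries
   propagate between lanes.  */
static uint32_t
splat_byte_32 (unsigned char c)
{
  return (uint32_t) c * UINT32_C (0x01010101);
}

static uint64_t
splat_byte_64 (unsigned char c)
{
  return (uint64_t) c * UINT64_C (0x0101010101010101);
}
```

A vector broadcast (for example vpbroadcastb, as expected in pr90773-15) produces the same replicated pattern directly in an XMM/YMM/ZMM register, without the wide scalar constant.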

Changes in the v4 patches:

1. Define x86 MAX_MOVE_MAX to 64, which is the constant maximum number
of bytes that a single instruction can move quickly between memory and
registers or between two memory locations.
2. Define x86 MOVE_MAX to MOVE_MAX_PIECES, which is the maximum number of
bytes we can move from memory to memory in one reasonably fast instruction.
The difference between MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX
must be a constant, independent of compiler options, since it is used in
reload.h to define struct target_reload and MOVE_MAX can vary, depending
on compiler options.

Changes in the v3 patches:

1. Split the TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE changes
into the generic part and the x86 part.


1. Add TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE to support
target instructions to duplicate QImode value to TImode/OImode/XImode
value for memset.
2. x86: Avoid stack realignment when copying data
3. x86: Remove MAX_BITSIZE_MODE_ANY_INT.  Only the x86 backend defines it.
4. x86: Use TImode/OImode/XImode integers for piecewise move and store.
5. x86: Add tests for TImode/OImode/XImode for piecewise move and store.
6. x86: Adjust existing tests.

On x86-64, SPEC CPU 2017 performance impact is neutral.  Glibc code size
differences with -O2 build are:

         Before    After
libc.so  1906572   1906444

Some code sequence differences in libc.so are:

:
...
jne                            | jne
test   %r15,%r15                 test   %r15,%r15
je                             | je
mov    %r13d,(%r14)              mov    %r13d,(%r14)
lea    0x10(%r14),%rdi           lea    0x10(%r14),%rdi
mov    $0x1,%ecx                 mov    $0x1,%ecx
mov    %r13d,%edx                mov    %r13d,%edx
mov    %r15,0x40(%r12)           mov    %r15,0x40(%r12)
mov    %r15,%rsi                 mov    %r15,%rsi
call                             call
lea    0xa2f9b(%rip),%rax  #   | lea    0xa2fab(%rip),%rax  #
xor    %esi,%esi                 xor    %esi,%esi
mov    %ebp,%edi                 mov    %ebp,%edi
mov    %rax,0x8(%r12)            mov    %rax,0x8(%r12)
movzwl 0x12(%rsp),%eax           movzwl 0x12(%rsp),%eax
mov    $0x8,%edx               <
lea    0xc(%rsp),%rcx            lea    0xc(%rsp),%rcx
mov    %r14,0x48(%r12)         <
add    $0x40,%r14              <
mov    $0x4,%r8d                 mov    $0x4,%r8d
                               > movq   $0x0,0x1d0(%r14)
                               > mov    $0x8,%edx
rol    $0x8,%ax                  rol    $0x8,%ax
mov    %ebp,(%r12)             | mov    %r14,0x48(%r12)
movq   $0x0,0x190(%r14)        | add    $0x40,%r14
mov    %ax,0x4(%r12)           <
mov    %r14,0x30(%r12)           mov    %r14,0x30(%r12)
                               > mov    %ax,0x4(%r12)
                               > mov    %ebp,(%r12)
movl   $0x1,0xc(%rsp)            movl   $0x1,0xc(%rsp)
call                             call
mov    %r12,%rdi                 mov    %r12,%rdi
movabs $0x101010101010101,%rdx <
test   %eax,%eax                 test   %eax,%eax
mov    $0xff,%eax                mov    $0xff,%eax
cmove  %eax,%ebx                 cmove  %eax,%ebx
movzbl %bl,%eax                |

[PATCH v6 01/10] x86: Add TARGET_GEN_MEMSET_SCRATCH_RTX

2021-07-30 Thread H.J. Lu via Gcc-patches
Define TARGET_GEN_MEMSET_SCRATCH_RTX to ix86_gen_scratch_sse_rtx to
return a scratch SSE register for memset.

gcc/

PR middle-end/90773
* config/i386/i386.c (TARGET_GEN_MEMSET_SCRATCH_RTX): New.

gcc/testsuite/

PR middle-end/90773
* gcc.target/i386/pr90773-5.c: Updated to expect XMM register.
* gcc.target/i386/pr90773-15.c: New test.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr90773-18.c: Likewise.
* gcc.target/i386/pr90773-19.c: Likewise.
---
 gcc/config/i386/i386.c |  6 +-
 gcc/testsuite/gcc.target/i386/pr90773-14.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-15.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr90773-16.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr90773-17.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr90773-18.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr90773-19.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr90773-5.c  |  2 +-
 8 files changed, 78 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-19.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a0285e659ad..5d20ca2067f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -23313,7 +23313,8 @@ ix86_optab_supported_p (int op, machine_mode mode1, machine_mode,
 }
 }
 
-/* Return a scratch register in MODE for vector load and store.  */
+/* Implement the TARGET_GEN_MEMSET_SCRATCH_RTX hook.  Return a scratch
+   register in MODE for vector load and store.  */
 
 rtx
 ix86_gen_scratch_sse_rtx (machine_mode mode)
@@ -24232,6 +24233,9 @@ static bool ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
 #undef TARGET_LIBC_HAS_FAST_FUNCTION
 #define TARGET_LIBC_HAS_FAST_FUNCTION ix86_libc_has_fast_function
 
+#undef TARGET_GEN_MEMSET_SCRATCH_RTX
+#define TARGET_GEN_MEMSET_SCRATCH_RTX ix86_gen_scratch_sse_rtx
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-14.c b/gcc/testsuite/gcc.target/i386/pr90773-14.c
index 6364916ecac..e5c19f49cf5 100644
--- a/gcc/testsuite/gcc.target/i386/pr90773-14.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-14.c
@@ -10,4 +10,4 @@ foo (void)
 }
 
 /* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
-/* { dg-final { scan-assembler-times "movl\[\\t \]+\\\$16843009, 16\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movd\[\\t \]+%xmm\[0-9\]+, 16\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-15.c b/gcc/testsuite/gcc.target/i386/pr90773-15.c
new file mode 100644
index 000..185ea60e1d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-15.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+  __builtin_memset (dst, c, 17);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%edi, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+%dil, 16\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-16.c b/gcc/testsuite/gcc.target/i386/pr90773-16.c
new file mode 100644
index 000..d820cc318c3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-16.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, -1, 17);
+}
+
+/* { dg-final { scan-assembler-times "(?:vpcmpeqd|vpternlogd)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+\\\$-1, 16\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-17.c b/gcc/testsuite/gcc.target/i386/pr90773-17.c
new file mode 100644
index 000..f6f179e9b5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-17.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 12, 19);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastb" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovd\[\\t \]+%xmm\[0-9\]+, 15\\(%\[\^,\]+\\)" 1 } } */

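The scan-assembler expectations above pin down a store pattern: 17 bytes become one 16-byte vector store plus a byte store at offset 16, while 19 bytes become a 16-byte store plus a 4-byte store at offset 15 that overlaps it; the AVX2 tests that follow expect a 32-byte store plus a byte or word store at offset 32 for 33 and 34 bytes. The C sketch below models that pattern as inferred from the tests only. The names `plan_memset` and `struct store`, and the fallback for sizes too short to overlap, are hypothetical and are not GCC's expansion code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the store sequence the pr90773 tests expect;
   this is not GCC's memset expansion code.  */
struct store
{
  size_t offset;  /* byte offset from the destination */
  size_t width;   /* bytes written by this store */
};

/* Smallest power of two that is >= N (N >= 1).  */
static size_t
pow2_ceil (size_t n)
{
  size_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

/* Plan the stores for setting N bytes with VEC-byte vector stores
   followed by a scalar tail.  A tail that is not a power of two is
   covered by one wider store that ends at N and overlaps bytes already
   written.  OUT needs room for N / VEC + 8 entries (VEC <= 128).
   Returns the number of stores.  */
static size_t
plan_memset (size_t n, size_t vec, struct store *out)
{
  assert (vec != 0 && (vec & (vec - 1)) == 0);
  size_t count = 0, off = 0;
  while (n - off >= vec)
    {
      out[count++] = (struct store) { off, vec };
      off += vec;
    }
  size_t rem = n - off;
  if (rem == 0)
    return count;
  size_t w = pow2_ceil (rem);
  if (w <= n)
    {
      /* Overlapping tail: shift the store back so it ends at N.  */
      out[count++] = (struct store) { n - w, w };
      return count;
    }
  /* Too short to overlap: split the tail into descending powers of two.  */
  while (rem != 0)
    {
      w = pow2_ceil (rem);
      if (w > rem)
        w >>= 1;
      out[count++] = (struct store) { off, w };
      off += w;
      rem -= w;
    }
  return count;
}
```

For example, `plan_memset (19, 16, s)` yields a store of 16 bytes at offset 0 and a store of 4 bytes at offset 15, matching the `vmovd ... 15(...)` expectation in pr90773-17.c.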
[PATCH v6 04/10] x86: Add AVX2 tests for PR middle-end/90773

2021-07-30 Thread H.J. Lu via Gcc-patches
PR middle-end/90773
* gcc.target/i386/pr90773-20.c: New test.
* gcc.target/i386/pr90773-21.c: Likewise.
* gcc.target/i386/pr90773-22.c: Likewise.
* gcc.target/i386/pr90773-23.c: Likewise.
* gcc.target/i386/pr90773-26.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr90773-20.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-21.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-22.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-23.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-26.c | 21 +
 5 files changed, 73 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-22.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-23.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-26.c

diff --git a/gcc/testsuite/gcc.target/i386/pr90773-20.c b/gcc/testsuite/gcc.target/i386/pr90773-20.c
new file mode 100644
index 000..e61e405f2b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-20.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+  __builtin_memset (dst, c, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-21.c b/gcc/testsuite/gcc.target/i386/pr90773-21.c
new file mode 100644
index 000..16ad17f3cbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-21.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+  __builtin_memset (dst, c, 34);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movw\[\\t \]%.*, 32\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-22.c b/gcc/testsuite/gcc.target/i386/pr90773-22.c
new file mode 100644
index 000..45a8ff65a84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-22.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-23.c b/gcc/testsuite/gcc.target/i386/pr90773-23.c
new file mode 100644
index 000..9256ce10ff0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-23.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 34);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movw\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-26.c b/gcc/testsuite/gcc.target/i386/pr90773-26.c
new file mode 100644
index 000..b2513c3a9c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-26.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+struct S
+{
+  long long s1 __attribute__ ((aligned (8)));
+  unsigned s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+};
+
+const struct S array[] = {
+  { 0, 60, 640, 2112543726, 39682, 48, 16, 33, 10, 96, 2, 0, 0, 4 }
+};
+
+void
+foo (struct S *x)
+{
+  x[0] = array[0];
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 32\\(%\[\^,\]+\\)" 1 } } */
-- 
2.31.1



[PATCH v6 02/10] x86: Avoid stack realignment when copying data

2021-07-30 Thread H.J. Lu via Gcc-patches
To avoid stack realignment, use SCRATCH_SSE_REG to copy data from one
memory location to another.

gcc/

* config/i386/i386-expand.c (ix86_expand_vector_move): Call
ix86_gen_scratch_sse_rtx to get a scratch SSE register to copy
data from one memory location to another.

gcc/testsuite/

* gcc.target/i386/eh_return-1.c: New test.
---
 gcc/config/i386/i386-expand.c   |  4 +++-
 gcc/testsuite/gcc.target/i386/eh_return-1.c | 26 +
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/eh_return-1.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 896bd685b1f..1d469bf7221 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -625,7 +625,9 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[])
   && !register_operand (op0, mode)
   && !register_operand (op1, mode))
 {
-  emit_move_insn (op0, force_reg (GET_MODE (op0), op1));
+  rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));
+  emit_move_insn (tmp, op1);
+  emit_move_insn (op0, tmp);
   return;
 }
 
diff --git a/gcc/testsuite/gcc.target/i386/eh_return-1.c b/gcc/testsuite/gcc.target/i386/eh_return-1.c
new file mode 100644
index 000..671ba635e88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/eh_return-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell -mno-avx512f" } */
+
+struct _Unwind_Context
+{
+  void *ra;
+  char array[48];
+};
+
+extern long uw_install_context_1 (struct _Unwind_Context *);
+
+void
+_Unwind_RaiseException (void)
+{
+  struct _Unwind_Context this_context, cur_context;
+  long offset = uw_install_context_1 (_context);
+  __builtin_memcpy (_context, _context,
+   sizeof (struct _Unwind_Context));
+  void *handler = __builtin_frob_return_addr ((_context)->ra);
+  uw_install_context_1 (_context);
+  __builtin_eh_return (offset, handler);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 4 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
-- 
2.31.1



Re: [PATCH v2] Fix for powerpc64 long double complex divide failure

2021-07-30 Thread Segher Boessenkool
Hi!

On Thu, Jul 29, 2021 at 05:11:49PM +, Patrick McGehearty wrote:
> The MAX and MIN values have only modest changes since the exponent
> field for IBM 128-bit floating point values is the same size as
> the exponent field for IBM 64-bit floating point values.

This is misleading / wrong / not enough of the story to mean much, as I
explained before.  "The maximum and minimum values are about the same as
for double precision" is more correct, and more direct as well :-)

> However
> the EPSILON field is considerably different. Due to how small
> values can be represented in the lower 64 bits of the IBM 128-bit
> floating point,

... "while the high 64 bits are a not-so-small number".  Yes.

> EPSILON is extremely small, so far beyond the
> desired value that inversion of the value overflows and even
> without the overflow, the RMAX2 is so small as to eliminate
> most usage of the test.

Right.

> Instead of just replacing the use of KF_EPSILON with DF_ESPILON, we

(typo, s/SP/PS/)

> replace all uses of KF_* with DF_*. Since the exponent fields are
> essentially the same, we gain the positive benefits from the new
> formula while avoiding all under/overflow issues in the #defines.
> 
> The change has been tested on gcc135.fsffrance.org and gains the
> expected improvements in accuracy for long double complex divide.
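To illustrate why switching to the DF_* values avoids the overflow, the constants named in the ChangeLog can be computed from the C `<float.h>` double limits. This is a sketch: the formulas (RBIG = MAX/2, RMIN = MIN, RMIN2 = EPSILON, RMINSCAL = 1/EPSILON, RMAX2 = RBIG*RMIN2) are assumed to follow the generic definitions in libgcc2.c, and the smallest subnormal double is used only as a stand-in for an "extremely small" EPSILON.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Sketch: the constants named in the ChangeLog, computed from the
   double-precision (DF) limits.  The formulas are assumed to match the
   generic libgcc2.c definitions.  */
#define RBIG     (DBL_MAX / 2)
#define RMIN     (DBL_MIN)
#define RMIN2    (DBL_EPSILON)
#define RMINSCAL (1 / DBL_EPSILON)
#define RMAX2    (RBIG * RMIN2)

/* Return nonzero if 1/EPS is finite, i.e. EPS can be used to form a
   scale factor such as RMINSCAL without overflowing.  */
static int
inverse_is_finite (double eps)
{
  assert (eps > 0);
  return isfinite (1 / eps);
}
```

With IEEE doubles, RMINSCAL is exactly 2^52 and RMAX2 is a large finite value, while `inverse_is_finite (DBL_MIN * DBL_EPSILON)` is false because the reciprocal of 2^-1074 overflows to infinity.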

> libgcc/
>   PR target/101104
>   * config/rs6000/_divkc3.c (RBIG, RMIN, RMIN2, RMINSCAL, RMAX2):
>   Fix long double complex divide for native IBM 128-bit.

"Use more correct values."?

Okay for trunk.  Thank you!


Segher


Re: [PATCH][gcc/doc] Improve nonnull attribute documentation

2021-07-30 Thread Tom de Vries
On 7/30/21 6:17 PM, Martin Sebor wrote:
> On 7/28/21 9:20 AM, Tom de Vries wrote:
>> Hi,
>>
>> Improve nonnull attribute documentation in a number of ways:
>>
>> Reorganize discussion of effects into:
>> - effects for calls to functions with nonnull-marked parameters, and
>> - effects for function definitions with nonnull-marked parameters.
>> This makes it clear that -fno-delete-null-pointer-checks has no effect
>> for optimizations based on nonnull-marked parameters in function
>> definitions (see PR100404).
> 
> This resolves half of PR 101665 that I raised the other day (i.e.,
> updates the docs).  Thank you!

You're welcome :)

> Since PR 100404 was resolved as
> invalid,

Yeah, I can also live with reopening that one as documentation PR.

> can you please reference the other PR in the changelog?

Done.

> The other half (warning when attribute nonnull is specified along
> with attribute optimize "-fno-delete-null-pointer-checks") remains.
> I plan to look into it unless someone beats me to it or unless some
> other solution emerges.
> 

FWIW, in my reply to Richi here (
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576415.html ) I
proposed to split the existing nonnull attribute functionality into
assume_nonnull/verify_nonnull attributes.  I'm curious what you think of
that proposal.

> A few comments on the documentation changes below.
> 
>>
>> Mention -Wnonnull-compare.
>>
>> Mention workaround from PR100404 comment 7.
>>
>> The workaround can be used for this scenario.  Say we have a test.c:
>> ...
>>   #include 
>>
>>   extern int isnull (char *ptr) __attribute__ ((nonnull));
>>   int isnull (char *ptr)
>>   {
>>     if (ptr == 0)
>>   return 1;
>>     return 0;
>>   }
>>
>>   int
>>   main (void)
>>   {
>>     char *ptr = NULL;
>>     if (isnull (ptr)) __builtin_abort ();
>>     return 0;
>>   }
>> ...
>>
>> The test-case contains a mistake: ptr == NULL, and we want to detect the
>> mistake using an abort:
>> ...
>> $ gcc test.c
>> $ ./a.out
>> Aborted (core dumped)
>> ...
>>
>> At -O2 however, the mistake is not detected:
>> ...
>> $ gcc test.c -O2
>> $ ./a.out
>> ...
>> which is what -Wnonnull-compare (not shown here) warns about.
>>
>> The easiest way to fix this is by dropping the nonnull attribute.  But
>> that also disables -Wnonnull, which would detect something like:
>> ...
>>    if (isnull (NULL)) __builtin_abort ();
>> ...
>> at compile time.
>>
>> Using this workaround:
>> ...
>>   int isnull (char *ptr)
>>   {
>> +  asm ("" : "+r"(ptr));
>>     if (ptr == 0)
>>   return 1;
>>     return 0;
>>   }
>> ...
>> we still manage to detect the problem at runtime with -O2:
>> ...
>> $ ~/gcc_versions/devel/install/bin/gcc test.c -O2
>> $ ./a.out
>> Aborted (core dumped)
>> ...
>> while keeping the possibility to detect "isnull (NULL)" at compile time.
>>
>> OK for trunk?
>>
>> Thanks,
>> - Tom
>>
>> [gcc/doc] Improve nonnull attribute documentation
>>
>> gcc/ChangeLog:
>>
>> 2021-07-28  Tom de Vries  
>>
>> * doc/extend.texi (nonnull attribute): Improve documentation.
>>
>> ---
>>   gcc/doc/extend.texi | 51
>> ---
>>   1 file changed, 40 insertions(+), 11 deletions(-)
>>
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index b83cd4919bb..3389effd70c 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -3488,17 +3488,46 @@ my_memcpy (void *dest, const void *src, size_t len)
>>   @end smallexample
>>     @noindent
>> -causes the compiler to check that, in calls to @code{my_memcpy},
>> -arguments @var{dest} and @var{src} are non-null.  If the compiler
>> -determines that a null pointer is passed in an argument slot marked
>> -as non-null, and the @option{-Wnonnull} option is enabled, a warning
>> -is issued.  @xref{Warning Options}.  Unless disabled by
>> -the @option{-fno-delete-null-pointer-checks} option the compiler may
>> -also perform optimizations based on the knowledge that certain function
>> -arguments cannot be null. In addition,
>> -the @option{-fisolate-erroneous-paths-attribute} option can be specified
>> -to have GCC transform calls with null arguments to non-null functions
>> -into traps. @xref{Optimize Options}.
>> +informs the compiler that, in calls to @code{my_memcpy}, arguments
>> +@var{dest} and @var{src} must be non-null.
>> +
>> +The attribute has effect both for functions calls and function definitions.
> 
> Missing article: has an  effect.  Also, an effect on
> (rather than for) might be more appropriate.
> 

Done.

>> +
>> +For function calls:
>> +@itemize @bullet
>> +@item If the compiler determines that a null pointer is
>> +passed in an argument slot marked as non-null, and the
>> +@option{-Wnonnull} option is enabled, a warning is issued.
>> +@xref{Warning Options}.
>> +@item The @option{-fisolate-erroneous-paths-attribute} option can be
>> +specified to have GCC transform calls with null arguments to non-null
>> +functions into traps.  @xref{Optimize 

Re: [PATCH] diagnostics: Support for -finput-charset [PR93067]

2021-07-30 Thread Lewis Hyatt via Gcc-patches
On Fri, Jan 29, 2021 at 10:46:30AM -0500, Lewis Hyatt wrote:
> On Tue, Jan 26, 2021 at 04:02:52PM -0500, David Malcolm wrote:
> > On Fri, 2020-12-18 at 18:03 -0500, Lewis Hyatt wrote:
> > > Hello-
> > > 
> > > The attached patch addresses PR93067:
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93067#c0
> > > 
> > > This is similar to the patch I posted last year on the PR, with some
> > > tweaks to make it a little simpler. Recapping some of the commentary
> > > on the PR:
> > > 
> > > When source lines are needed for diagnostics output, they are
> > > retrieved from the source file by the fcache infrastructure in
> > > input.c, since libcpp has generally already forgotten them (plus not
> > > all front ends are using libcpp). This infrastructure does not read
> > > the files in the same way as libcpp does; in particular, it does not
> > > translate the encoding as requested by -finput-charset, and it does
> > > not strip a UTF-8 byte-order mark if present. The patch adds this
> > > ability. My thinking in deciding how to do it was the following:
> > > 
> > > - Use of -finput-charset is rare, and use of UTF-8 BOMs must be rarer
> > >   still, so this patch should try hard not to introduce any worse
> > >   performance unless these things are needed.
> > > 
> > > - It is desirable to reuse libcpp's encoding infrastructure from
> > >   charset.c rather than repeat it in input.c. (Notably, libcpp uses
> > >   iconv but it also has hand-coded routines for certain charsets to
> > >   make sure they are available.)
> > > 
> > > - There is a performance degradation required in order to make use of
> > >   libcpp directly, because the input.c infrastructure only reads as
> > >   much of the source file as necessary, whereas libcpp interfaces
> > >   as-is require reading the entire file into memory.
> > > 
> > > - It can't be quite as simple as just "only delegate to libcpp if
> > >   -finput-charset was specified", because the stripping of the UTF-8
> > >   BOM has to happen with or without this option.
> > > 
> > > - So it seemed a reasonable compromise to me: if -finput-charset is
> > >   specified, then use libcpp to convert the file; otherwise, strip
> > >   the BOM in input.c and then process the file the same way it is
> > >   done now. There's a little bit of leakage of charset logic from
> > >   libcpp this way (for the BOM), but it seems worthwhile, since
> > >   otherwise, diagnostics would always be reading the entire file into
> > >   memory, which is not a cost paid currently.
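The BOM-stripping step in the no-`-finput-charset` path amounts to checking for the three bytes EF BB BF at the start of the buffer. The sketch below is a minimal illustration, not the code from the patch; `utf8_bom_length` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the length of a leading UTF-8 byte-order mark in BUF (3), or 0
   if the LEN bytes of BUF do not start with one.  Hypothetical helper.  */
static size_t
utf8_bom_length (const unsigned char *buf, size_t len)
{
  static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
  assert (buf != NULL || len == 0);
  if (len >= sizeof bom && memcmp (buf, bom, sizeof bom) == 0)
    return sizeof bom;
  return 0;
}
```

A caller would advance past this many bytes before splitting the file into lines, so that column 1 refers to the first character after the BOM, consistent with the libcpp behavior described later in this thread.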
> > 
> > Thanks for the patch; sorry about the delay in reviewing it.
> >
> 
> Thanks for the comments! Here is an updated patch that addresses your
> feedback, plus some responses inline below.
> 
> Bootstrap + regtest all languages was done on x86-64 GNU/Linux. All tests
> the same before and after, plus 6 new PASS.
> 
> FAIL 85 85
> PASS 479191 479197
> UNSUPPORTED 13664 13664
> UNTESTED 129 129
> XFAIL 2292 2292
> XPASS 30 30
> 
> 
> > This mostly seems good to me.
> > 
> > One aspect I'm not quite convinced about is the
> > input_cpp_context.in_use flag.  The input.c machinery is used by
> > diagnostics, and so could be used by middle-end warnings for frontends
> > that don't use libcpp.  Presumably we'd still want to remove the UTF-8
> > BOM for those, and do encoding fixups if necessary - is it more a case
> > of initializing things to express what the expected input charset is?
> > (since that is part of the cpp_options)
> > 
> > c.opt has:
> >   finput-charset=
> >   C ObjC C++ ObjC++ Joined RejectNegative
> >   -finput-charset=Specify the default character set for
> > source files.
> > 
> > I believe that D and Go are the two frontends that don't use libcpp for
> > parsing.  I believe Go source is required to be UTF-8 (unsurprisingly
> > considering the heritage of both).  I don't know what source encodings
> > D supports.
> >
> 
> For this patch I was rather singularly focused on libcpp, so I looked
> deeper at the other frontends now. It seems to me that there are basically
> two questions to answer, and the three frontend styles answer this pair in
> three different ways.
> 
> Q1: What is the input charset?
> A1:
> 
> libcpp: Whatever was passed to -finput-charset (note, for Fortran,
> -finput-charset is not supported though.)
> 
> go: Assume UTF-8.
> 
> D: UTF-8, UTF-16, or UTF-32 (the latter two in either
>    endianness); determined by inspecting the first bytes of the file.
> 
> Q2: How should a UTF-8 BOM be handled?
> A2:
> 
> libcpp: Treat entirely the same, as if it was not present at all. So
> a diagnostic referring to the first non-BOM character in the file will
> point to column 1, not to column 4.
> 
> go: Treat it like whitespace, ignored for parsing purposes, but still
> logically part of the file. A diagnostic referring to the first non-BOM
> character in the file will point to 

[PATCH 2/3] targhooks: New target hook for CTF/BTF debug info emission

2021-07-30 Thread Indu Bhagat via Gcc-patches
This patch adds a new target hook to detect whether the CTF container can allow
the emission of CTF/BTF debug info at DWARF debug info early finish time.  Some
backends, e.g., BPF when generating code for the CO-RE use case, may need to
emit the CTF/BTF debug info sections around the time when late DWARF debug
info is finalized (dwarf2out_finish).
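The hook follows GCC's usual pattern: a generic default in targhooks.c that a backend replaces via `#undef`/`#define`. The sketch below is a simplified, hypothetical model of that pattern in plain C; the real table is `struct gcc_target`, filled in through `TARGET_INITIALIZER`, and `bpf_core_enabled` only stands in for `TARGET_BPF_CORE`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TARGET_BPF_CORE, which -mcore sets.  */
static bool bpf_core_enabled = false;

/* Generic default, as in targhooks.c: early emission is always fine.  */
static bool
default_ctfc_debuginfo_early_finish_p (void)
{
  return true;
}

/* BPF override: with -mcore, BTF must wait for dwarf2out_finish.  */
static bool
bpf_ctfc_debuginfo_early_finish_p (void)
{
  return !bpf_core_enabled;
}

/* Toy version of GCC's hook table (the real one is struct gcc_target).  */
struct toy_target
{
  bool (*ctfc_debuginfo_early_finish_p) (void);
};

static const struct toy_target generic_target
  = { default_ctfc_debuginfo_early_finish_p };
static const struct toy_target bpf_target
  = { bpf_ctfc_debuginfo_early_finish_p };

/* What a caller such as the CTF container would query.  */
static bool
early_finish_allowed (const struct toy_target *t)
{
  assert (t != NULL && t->ctfc_debuginfo_early_finish_p != NULL);
  return t->ctfc_debuginfo_early_finish_p ();
}
```

Targets that do not override the hook keep the default and continue to emit CTF/BTF early.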

gcc/ChangeLog:

* config/bpf/bpf.c (ctfc_debuginfo_early_finish_p): New definition.
(TARGET_CTFC_DEBUGINFO_EARLY_FINISH_P): Undefine and override.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in: Document the new hook.
* target.def: Add a new hook.
* targhooks.c (default_ctfc_debuginfo_early_finish_p): Likewise.
* targhooks.h (default_ctfc_debuginfo_early_finish_p): Likewise.
---
 gcc/config/bpf/bpf.c | 14 ++
 gcc/doc/tm.texi  |  6 ++
 gcc/doc/tm.texi.in   |  2 ++
 gcc/target.def   | 10 ++
 gcc/targhooks.c  |  6 ++
 gcc/targhooks.h  |  2 ++
 6 files changed, 40 insertions(+)

diff --git a/gcc/config/bpf/bpf.c b/gcc/config/bpf/bpf.c
index 028013e..85f6b76 100644
--- a/gcc/config/bpf/bpf.c
+++ b/gcc/config/bpf/bpf.c
@@ -178,6 +178,20 @@ bpf_option_override (void)
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE bpf_option_override
 
+/* Return FALSE iff -mcore has been specified.  */
+
+static bool
+ctfc_debuginfo_early_finish_p (void)
+{
+  if (TARGET_BPF_CORE)
+    return false;
+  else
+    return true;
+}
+
+#undef TARGET_CTFC_DEBUGINFO_EARLY_FINISH_P
+#define TARGET_CTFC_DEBUGINFO_EARLY_FINISH_P ctfc_debuginfo_early_finish_p
+
 /* Define target-specific CPP macros.  This function in used in the
definition of TARGET_CPU_CPP_BUILTINS in bpf.h */
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index a464d26..df408ee 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -10400,6 +10400,12 @@ Define this macro if GCC should produce debugging output in BTF debug
 format in response to the @option{-gbtf} option.
 @end defmac
 
+@deftypefn {Target Hook} bool TARGET_CTFC_DEBUGINFO_EARLY_FINISH_P (void)
+This target hook returns nonzero if the CTF Container can allow the
+ emission of the CTF/BTF debug info at the DWARF debuginfo early finish
+ time.
+@end deftypefn
+
 @node Floating Point
 @section Cross Compilation and Floating Point
 @cindex cross compilation and floating point
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 0b60342..6119a30 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7020,6 +7020,8 @@ Define this macro if GCC should produce debugging output in BTF debug
 format in response to the @option{-gbtf} option.
 @end defmac
 
+@hook TARGET_CTFC_DEBUGINFO_EARLY_FINISH_P
+
 @node Floating Point
 @section Cross Compilation and Floating Point
 @cindex cross compilation and floating point
diff --git a/gcc/target.def b/gcc/target.def
index 6b4226c..67bdcba 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4007,6 +4007,16 @@ clobbered parts of a register altering the frame register size",
  machine_mode, (int regno),
  default_dwarf_frame_reg_mode)
 
+/* Return nonzero if CTF Container can finalize the CTF/BTF emission
+   at DWARF debuginfo early finish time.  */
+DEFHOOK
+(ctfc_debuginfo_early_finish_p,
+ "This target hook returns nonzero if the CTF Container can allow the\n\
+ emission of the CTF/BTF debug info at the DWARF debuginfo early finish\n\
+ time.",
+ bool, (void),
+ default_ctfc_debuginfo_early_finish_p)
+
 /* If expand_builtin_init_dwarf_reg_sizes needs to fill in table
entries not corresponding directly to registers below
FIRST_PSEUDO_REGISTER, this hook should generate the necessary
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index eb51909..e38566c 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -2112,6 +2112,12 @@ default_dwarf_frame_reg_mode (int regno)
   return save_mode;
 }
 
+bool
+default_ctfc_debuginfo_early_finish_p (void)
+{
+  return true;
+}
+
 /* To be used by targets where reg_raw_mode doesn't return the right
mode for registers used in apply_builtin_return and apply_builtin_arg.  */
 
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index f92e102..55dc443 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -255,6 +255,8 @@ extern unsigned int default_dwarf_poly_indeterminate_value (unsigned int,
unsigned int *,
int *);
 extern machine_mode default_dwarf_frame_reg_mode (int);
+extern bool default_ctfc_debuginfo_early_finish_p (void);
+
 extern fixed_size_mode default_get_reg_raw_mode (int);
 extern bool default_keep_leaf_when_profiled ();
 
-- 
1.8.3.1



[PATCH 1/3] bpf: Add new -mcore option for BPF CO-RE

2021-07-30 Thread Indu Bhagat via Gcc-patches
-mcore in the BPF backend enables code generation for the CO-RE use case.  LTO
is not supported for CO-RE compilations: combining -flto with -mcore is an error.

gcc/ChangeLog:

* config/bpf/bpf.c (bpf_option_override): For BPF backend, disable LTO
support when compiling for CO-RE.
* config/bpf/bpf.opt: Add new command line option -mcore.

gcc/testsuite/ChangeLog:

* gcc.dg/debug/btf/btf-mcore-1.c: New test.
---
 gcc/config/bpf/bpf.c | 15 +++
 gcc/config/bpf/bpf.opt   |  4 
 gcc/testsuite/gcc.dg/debug/btf/btf-mcore-1.c | 14 ++
 3 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-mcore-1.c

diff --git a/gcc/config/bpf/bpf.c b/gcc/config/bpf/bpf.c
index e635f9e..028013e 100644
--- a/gcc/config/bpf/bpf.c
+++ b/gcc/config/bpf/bpf.c
@@ -158,6 +158,21 @@ bpf_option_override (void)
 {
   /* Set the initializer for the per-function status structure.  */
   init_machine_status = bpf_init_machine_status;
+
+  /* To support the portability needs of the BPF CO-RE approach, BTF debug
+     information includes the BPF CO-RE relocations.  The information
+     necessary for these relocations is added to the CTF container by the
+     BPF backend.  Enabling LTO poses challenges in the generation of the BPF
+     CO-RE relocations because if LTO is in effect, they need to be
+     generated late in the LTO link phase.  This in turn means the compiler
+     needs to provide means to combine the early and late BTF debug info,
+     similar to DWARF debug info.
+
+     In any case, in the absence of linker support for BTF sections at this
+     time, it is acceptable to simply disallow LTO for BPF CO-RE
+     compilations.  */
+
+  if (flag_lto && TARGET_BPF_CORE)
+    error ("BPF CO-RE does not support LTO");
 }
 
 #undef TARGET_OPTION_OVERRIDE
diff --git a/gcc/config/bpf/bpf.opt b/gcc/config/bpf/bpf.opt
index 916b53c..e8926f5 100644
--- a/gcc/config/bpf/bpf.opt
+++ b/gcc/config/bpf/bpf.opt
@@ -127,3 +127,7 @@ Generate little-endian eBPF.
 mframe-limit=
 Target Joined RejectNegative UInteger IntegerRange(0, 32767) Var(bpf_frame_limit) Init(512)
 Set a hard limit for the size of each stack frame, in bytes.
+
+mcore
+Target Mask(BPF_CORE)
+Generate all necessary information for BPF Compile Once - Run Everywhere.
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-mcore-1.c b/gcc/testsuite/gcc.dg/debug/btf/btf-mcore-1.c
new file mode 100644
index 000..58f20d0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-mcore-1.c
@@ -0,0 +1,14 @@
+/* Testcase for BPF CO-RE.
+  
+   -mcore is used to generate information for the BPF CO-RE use case.  To
+   support the generation of the .BTF and .BTF.ext sections in GCC, -flto is
+   disabled with -mcore.  */
+
+/* { dg-do compile { target bpf-*-* } } */
+/* { dg-error "BPF CO-RE does not support LTO" "" { target bpf-*-* } 0 } */
+
+/* { dg-require-effective-target lto } */
+
+/* { dg-options "-gbtf -mcore -flto" } */
+
+void func(void) { }
-- 
1.8.3.1



[PATCH 0/3] Allow means for late BTF generation for BPF CO-RE

2021-07-30 Thread Indu Bhagat via Gcc-patches
Hello,

This patch series puts the framework in place for late BTF generation (in
dwarf2out_finish). This is needed for the landing of BPF CO-RE support in GCC.

BPF's Compile Once - Run Everywhere (CO-RE) feature is used to make a compiled
BPF program portable across kernel versions without the need to recompile the
BPF program.  A key part of the BPF CO-RE capability is the BTF debug info
generated for the BPF program.

A traditional (non-CO-RE) BPF program will have a .BTF section, which contains
the type information in the BTF debug format.  In the case of CO-RE, however,
an additional .BTF.ext section is generated.  The .BTF.ext section contains
the CO-RE relocations.  A BPF loader will use the .BTF.ext section along with
the associated .BTF section to adjust some references in the instructions of
the BPF program to ensure it is compatible with the required kernel version /
headers.

Roughly, each CO-RE relocation record will contain the following info:
 - offset of BPF instruction to be patched,
 - the BTF ID of the data structure being accessed by the instruction, and 
 - an offset to the BTF string which encodes a series of field accesses to
   retrieve the field of interest in the instruction.
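The three pieces of information listed above can be pictured as a small fixed-size record. The sketch below is illustrative only: the field names and widths are assumptions rather than the exact .BTF.ext encoding, and it assumes the access string is a colon-separated list of member/element indices such as "0:1:2".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative record with the three fields listed above.  Names and
   widths are assumptions, not the exact .BTF.ext layout.  */
struct core_reloc_record
{
  uint32_t insn_off;       /* offset of the BPF instruction to patch */
  uint32_t type_id;        /* BTF ID of the accessed data structure */
  uint32_t access_str_off; /* .BTF string-table offset of the access string */
};

/* Split an access string such as "0:1:2" (assumed format) into IDX;
   return the count, or -1 if the string is malformed or has more than
   MAX entries.  */
static int
parse_access_string (const char *s, unsigned long *idx, int max)
{
  assert (s != NULL && idx != NULL);
  int n = 0;
  while (*s != '\0')
    {
      char *end;
      if (n == max || *s < '0' || *s > '9')
        return -1;
      idx[n++] = strtoul (s, &end, 10);
      s = end;
      if (*s == ':' && s[1] != '\0')
        s++;                    /* skip the separator */
      else if (*s != '\0')
        return -1;              /* trailing ':' or stray character */
    }
  return n;
}
```

Note that `access_str_off` points into the .BTF string table rather than into .BTF.ext, which is why the relocations affect the contents of the .BTF section.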

High-level design
-----------------
- The CTF container (CTFC) is populated with the compiler-internal
representation for the CTF/BTF "type information" at dwarf2out_early_finish
time.
- In case of CO-RE compilation, the information needed to generate .BTF.ext
section will be added by the BPF backend to the CTF container (CTFC) at expand
time. This introduces challenges in having LTO support for CO-RE - CO-RE
relocations can only be generated late in the compilation process, much like
late DWARF.
- While .BTF.ext is a separate section, the format requires that the string
encodings of field accesses (in the CO-RE relocation record) are added to the
.BTF string table.  Recall that .BTF strings are owned by the .BTF section.
Hence, the .BTF section cannot simply be emitted "early", because the CO-RE
relocation records will need to add additional strings to the .BTF section.
- This patch set disallows the use of LTO together with CO-RE for the BPF
target.  Combining late and early BTF is not done in this patch series; BTF
debug info emission for CO-RE compilations is done at dwarf2out_finish time.
- A new target hook is added for the CTFC (CTF Container) to know whether early
emission of CTF/BTF is allowed for the target. The hook returns false when
"-mcore" for the BPF target is in effect.

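The string-table constraint from the design notes can be seen with a toy model: strings are stored back to back, and a CO-RE relocation added at expand time appends its access string after the type names were already recorded, so the table (and thus the .BTF section) keeps changing after early finish. The layout below, with offset 0 reserved for the empty string, and the names `strtab`, `strtab_init` and `strtab_add` are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a BTF-style string table: NUL-terminated strings stored
   back to back, offset 0 reserved for the empty string.  Hypothetical.  */
struct strtab
{
  char data[256];
  size_t len;
};

static void
strtab_init (struct strtab *t)
{
  t->data[0] = '\0';
  t->len = 1;
}

/* Append S and return its offset, or (size_t) -1 if it does not fit.  */
static size_t
strtab_add (struct strtab *t, const char *s)
{
  assert (t != NULL && s != NULL);
  size_t n = strlen (s) + 1;
  if (t->len + n > sizeof t->data)
    return (size_t) -1;
  memcpy (t->data + t->len, s, n);
  size_t off = t->len;
  t->len += n;
  return off;
}
```

Adding "task_struct" during type emission and then an access string such as "0:1" for a relocation grows the table from 13 to 17 bytes; a .BTF section written out before the second call would be missing the string the relocation refers to.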
Testing notes
-------------
- Bootstrapped and reg tested (make check-gcc) on x86_64-pc-linux.
- make all-gcc for --target=bpf-unknown-none.

Thanks,

Indu Bhagat (3):
  bpf: Add new -mcore option for BPF CO-RE
  targhooks: New target hook for CTF/BTF debug info emission
  dwarf2out: Emit BTF in dwarf2out_finish for BPF CO-RE usecase

 gcc/config/bpf/bpf.c | 29 +++
 gcc/config/bpf/bpf.opt   |  4 ++
 gcc/doc/tm.texi  |  6 +++
 gcc/doc/tm.texi.in   |  2 +
 gcc/dwarf2ctf.c  | 55 ++--
 gcc/dwarf2ctf.h  |  4 +-
 gcc/dwarf2out.c  |  9 -
 gcc/target.def   | 10 +
 gcc/targhooks.c  |  6 +++
 gcc/targhooks.h  |  2 +
 gcc/testsuite/gcc.dg/debug/btf/btf-mcore-1.c | 14 +++
 11 files changed, 126 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-mcore-1.c

-- 
1.8.3.1



[PATCH 3/3] dwarf2out: Emit BTF in dwarf2out_finish for BPF CO-RE usecase

2021-07-30 Thread Indu Bhagat via Gcc-patches
DWARF generation is split between early and late phases when LTO is in effect.
This poses challenges for CTF/BTF generation especially if late debug info
generation is desirable, as turns out to be the case for BPF CO-RE.

In case of BPF CO-RE, the BPF backend adds information about CO-RE relocations
to the CTF container. This information is what needs to be emitted as a
separate .BTF.ext section when -mcore is in effect.  Further, each CO-RE
relocation record holds an offset to a string specifying the access to the
structure's field.  This means that the .BTF string table needs to be modified
"late" in the compilation process. In other words, this implies that the BTF
sections cannot be finalized in dwarf2out_early_finish when -mcore for the BPF
backend is in effect.

Now, the emission of CTF/BTF debug info cannot be moved unconditionally to
dwarf2out_finish because dwarf2out_finish is not invoked at all for the LTO
compile phase for slim LTO objects, thus breaking CTF/BTF generation for other
targets when used with LTO.

The approach taken here in this patch is that -

1. LTO is disabled for BPF CO-RE
The reason to disable LTO for BPF CO-RE is that if LTO is in effect, BPF CO-RE
relocations need to be generated in the LTO link phase _after_ the optimizations
are done. This means we need to devise a way to combine early and late BTF. At
this time, in absence of linker support for BTF sections, it makes sense to
steer clear of LTO for BPF CO-RE and bypass the issue.

2. Use a target hook to allow BPF backend to cleanly convey the case when late
finalization of the CTF container is desirable.

So, in other words,

dwarf2out_early_finish
  - Always emit CTF here.
  - if (BTF && ctfc_debuginfo_early_finish_p), emit BTF now.

dwarf2out_finish
  - if (BTF && !ctfc_debuginfo_early_finish_p && !in_lto_p) emit BTF now.
  - Use of in_lto_p to make sure LTO link phase does not affect BTF sections
for other targets.
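The two rules above can be modeled as a small decision function. A minimal sketch in C that assumes nothing about GCC internals beyond what is described here: `btf_emit_point` and `enum emit_point` are hypothetical names, `early_finish_p` stands in for the new `ctfc_debuginfo_early_finish_p` target hook, and `in_lto` stands in for `in_lto_p`.

```c
#include <stdbool.h>

/* Hypothetical model of the decision above; these names are not GCC's.  */
enum emit_point { EMIT_NONE, EMIT_EARLY, EMIT_LATE };

/* Where BTF gets finalized.  BTF says whether BTF output was requested,
   EARLY_FINISH_P stands in for the ctfc_debuginfo_early_finish_p hook
   (false for BPF with -mcore), and IN_LTO for in_lto_p.  CTF is not
   modeled here: it is always emitted early.  */
static enum emit_point
btf_emit_point (bool btf, bool early_finish_p, bool in_lto)
{
  if (!btf)
    return EMIT_NONE;
  if (early_finish_p)
    return EMIT_EARLY;   /* dwarf2out_early_finish */
  if (!in_lto)
    return EMIT_LATE;    /* dwarf2out_finish */
  /* Neither point applies; LTO is disabled for BPF CO-RE anyway.  */
  return EMIT_NONE;
}
```

With `-mcore` the hook returns false, so outside LTO the function lands on `EMIT_LATE`; every other BTF target keeps the early path.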

gcc/ChangeLog:

* dwarf2ctf.c (ctf_debug_finalize): Make it static.
(ctf_debug_early_finish): New definition.
(ctf_debug_finish): Likewise.
* dwarf2ctf.h (ctf_debug_finalize): Remove declaration.
(ctf_debug_early_finish): New declaration.
(ctf_debug_finish): Likewise.
* dwarf2out.c (dwarf2out_finish): Invoke ctf_debug_finish.
(dwarf2out_early_finish): Invoke ctf_debug_early_finish.
---
 gcc/dwarf2ctf.c | 55 +++
 gcc/dwarf2ctf.h |  4 +++-
 gcc/dwarf2out.c |  9 +++--
 3 files changed, 53 insertions(+), 15 deletions(-)

diff --git a/gcc/dwarf2ctf.c b/gcc/dwarf2ctf.c
index 5e8a725..0fa429c 100644
--- a/gcc/dwarf2ctf.c
+++ b/gcc/dwarf2ctf.c
@@ -917,6 +917,27 @@ gen_ctf_type (ctf_container_ref ctfc, dw_die_ref die)
   return type_id;
 }
 
+/* Prepare for output and write out the CTF debug information.  */
+
+static void
+ctf_debug_finalize (const char *filename, bool btf)
+{
+  if (btf)
+    {
+      btf_output (filename);
+      btf_finalize ();
+    }
+
+  else
+    {
+      /* Emit the collected CTF information.  */
+      ctf_output (filename);
+
+      /* Reset the CTF state.  */
+      ctf_finalize ();
+    }
+}
+
 bool
 ctf_do_die (dw_die_ref die)
 {
@@ -966,25 +987,35 @@ ctf_debug_init_postprocess (bool btf)
 btf_init_postprocess ();
 }
 
-/* Prepare for output and write out the CTF debug information.  */
+/* Early finish CTF/BTF debug info.  */
 
 void
-ctf_debug_finalize (const char *filename, bool btf)
+ctf_debug_early_finish (const char * filename)
 {
-  if (btf)
+  /* Emit CTF debug info early always.  */
+  if (ctf_debug_info_level > CTFINFO_LEVEL_NONE
+      /* Emit BTF debug info early if the target does not require late
+         emission.  */
+      || (btf_debuginfo_p ()
+          && targetm.ctfc_debuginfo_early_finish_p ()))
     {
-      btf_output (filename);
-      btf_finalize ();
+      /* Emit CTF/BTF debug info.  */
+      ctf_debug_finalize (filename, btf_debuginfo_p ());
     }
+}
 
-  else
-    {
-      /* Emit the collected CTF information.  */
-      ctf_output (filename);
+/* Finish CTF/BTF debug info emission.  */
 
-      /* Reset the CTF state.  */
-      ctf_finalize ();
-    }
+void
+ctf_debug_finish (const char * filename)
+{
+  /* Emit BTF debug info here when the target needs to update the CTF container
+     (ctfc) in the backend.  An example of this, at this time is the BPF CO-RE
+     usecase.  */
+  if (btf_debuginfo_p ()
+      && (!in_lto_p && !targetm.ctfc_debuginfo_early_finish_p ()))
+    /* Emit BTF debug info.  */
+    ctf_debug_finalize (filename, btf_debuginfo_p ());
 }
 
 #include "gt-dwarf2ctf.h"
diff --git a/gcc/dwarf2ctf.h b/gcc/dwarf2ctf.h
index a3cf567..9edbde0 100644
--- a/gcc/dwarf2ctf.h
+++ b/gcc/dwarf2ctf.h
@@ -24,13 +24,15 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_DWARF2CTF_H 1
 
 #include "dwarf2out.h"
+#include "flags.h"
 
 /* Debug Format Interface.  Used in dwarf2out.c.  */
 
 extern void ctf_debug_init 

Re: [PATCH] c++: Reject anonymous struct with bases

2021-07-30 Thread Andrew Pinski via Gcc-patches
On Fri, Jul 30, 2021 at 9:26 AM Jason Merrill via Gcc-patches
 wrote:
>
> In discussion of jakub's patch for C++20 pointer-interconvertibility, it
> came up that we allow anonymous structs to have bases, but don't do anything
> usable with them.  Let's reject it.
>
> The comment change is something I noticed while looking for the right place
> to diagnose this: finish_struct_anon does not actually check for anything
> invalid, so it shouldn't claim to.

This should fix PR 96636 by rejecting the code.

Thanks,
Andrew Pinski

>
> Tested x86_64-pc-linux-gnu, applying to trunk.
>
> gcc/cp/ChangeLog:
>
> * class.c (finish_struct_anon): Improve comment.
> * decl.c (fixup_anonymous_aggr): Reject anonymous struct
> with bases.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/ext/anon-struct8.C: New test.
> ---
>  gcc/cp/class.c  | 3 +--
>  gcc/cp/decl.c   | 3 +++
>  gcc/testsuite/g++.dg/ext/anon-struct8.C | 9 +
>  3 files changed, 13 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ext/anon-struct8.C
>
> diff --git a/gcc/cp/class.c b/gcc/cp/class.c
> index 14db06692dc..6f31700c06c 100644
> --- a/gcc/cp/class.c
> +++ b/gcc/cp/class.c
> @@ -3072,8 +3072,7 @@ finish_struct_anon_r (tree field)
>  }
>  }
>
> -/* Check for things that are invalid.  There are probably plenty of other
> -   things we should check for also.  */
> +/* Fix up any anonymous union/struct members of T.  */
>
>  static void
>  finish_struct_anon (tree t)
> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> index 01d64a16125..71308a06c63 100644
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -5084,6 +5084,9 @@ fixup_anonymous_aggr (tree t)
>  {
>    tree field, type;
>
> +  if (BINFO_N_BASE_BINFOS (TYPE_BINFO (t)))
> +    error_at (location_of (t), "anonymous struct with base classes");
> +
>    for (field = TYPE_FIELDS (t); field; field = DECL_CHAIN (field))
>      if (TREE_CODE (field) == FIELD_DECL)
>        {
> diff --git a/gcc/testsuite/g++.dg/ext/anon-struct8.C b/gcc/testsuite/g++.dg/ext/anon-struct8.C
> new file mode 100644
> index 000..f4e3f11b678
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ext/anon-struct8.C
> @@ -0,0 +1,9 @@
> +// { dg-options "" }
> +
> +struct A { };
> +struct B {
> +  struct: A { int i; };  // { dg-error "anonymous struct with base" }
> +};
> +union U {
> +  struct: A { int i; };  // { dg-error "anonymous struct with base" }
> +};
>
> base-commit: 0ba2003cf306aa98b6ec91c9d849ab9bafcf17c2
> --
> 2.27.0
>


[PATCH] Fix PR 101683: FP exceptions for float->unsigned

2021-07-30 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

Just like the old bug PR9651, unsigned_fix rtl should
also be handled as a trapping instruction.
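A sketch of the kind of source pattern at stake, assuming a 32-bit `unsigned` (an illustration, not the testcase from PR 101683): the conversion, which becomes `UNSIGNED_FIX` in RTL, is only reached after a range check. If `may_trap_p_1` reports it as non-trapping, passes that speculate non-trapping instructions may hoist it above the check, and under `-ftrapping-math` a NaN or out-of-range input could then raise a spurious `FE_INVALID`.

```c
/* Convert D to unsigned, or return 0 if D is negative, NaN or too
   large.  Assumes unsigned int is 32 bits wide.  */
static unsigned
to_unsigned_or_zero (double d)
{
  /* Comparisons with NaN are false, so NaN takes the fallback path.  */
  if (d >= 0.0 && d < 4294967296.0)
    return (unsigned) d;   /* well defined: D is in range here */
  return 0;
}
```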

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR rtl-optimization/101683
* rtlanal.c (may_trap_p_1): Handle UNSIGNED_FIX.
---
 gcc/rtlanal.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 3b8d88afd4d..f7f3acb75db 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3261,6 +3261,7 @@ may_trap_p_1 (const_rtx x, unsigned flags)
       break;
 
     case FIX:
+    case UNSIGNED_FIX:
       /* Conversion of floating point might trap.  */
       if (flag_trapping_math && HONOR_NANS (XEXP (x, 0)))
         return 1;
-- 
2.27.0



[COMMITTED] Handle constants in wi_fold for trunc_mod.

2021-07-30 Thread Andrew MacLeod via Gcc-patches
When resolving issues with divide by 0 returning UNDEFINED, I discovered
that although we treat % 0 as undefined, the implementation of wi_fold
for modulus doesn't expect constants, and with the earlier changes to
wi_fold_in_parts, it can now get constants to calculate and combine.  For
example, [10,10] % [4,4] was returning [0,3] instead of the more precise
[2,2].  This patch fixes that, and I updated the testcase to include a
more comprehensive test of modulus with constants.
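A minimal sketch of the idea in C, restricted to non-negative operands and using a hypothetical `struct range` instead of GCC's `irange`/`wide_int`: singleton operands are folded exactly, and otherwise the generic bound `0 <= A % B <= min (A, B - 1)` applies. Without the singleton branch, [10,10] % [4,4] falls through to the generic bound and yields [0,3], as before the patch.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an integer range [lb, ub];
   only non-negative operands are modeled.  */
struct range
{
  long lb, ub;
  bool undefined;
};

/* Range of A % B for A in [lh_lb, lh_ub] and B in [rh_lb, rh_ub].  */
static struct range
mod_range (long lh_lb, long lh_ub, long rh_lb, long rh_ub)
{
  struct range r = { 0, 0, false };

  /* Modulus by [0, 0] is undefined, so the result is UNDEFINED.  */
  if (rh_lb == 0 && rh_ub == 0)
    {
      r.undefined = true;
      return r;
    }

  /* Both operands are singletons: compute the exact value.  */
  if (lh_lb == lh_ub && rh_lb == rh_ub)
    {
      r.lb = r.ub = lh_lb % rh_lb;
      return r;
    }

  /* Generic bound: 0 <= A % B <= min (A, B - 1).  */
  r.ub = rh_ub - 1 < lh_ub ? rh_ub - 1 : lh_ub;
  return r;
}
```

Applied pair-wise to the sub-ranges in the new test, every `{12, 24, 48} % {3, 6, 12}` pair folds to [0,0], and the `[0,0]` divisor yields UNDEFINED.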


Bootstrapped on x86_64-pc-linux-gnu and powerpc64-unknown-linux-gnu with 
no regressions.  Pushed.


Andrew

>From 145bc41dae7c7bfa093d61e77346f98e6a595a0e Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 29 Jul 2021 11:22:28 -0400
Subject: [PATCH 3/3] Handle constants in wi_fold for trunc_mod.

Handle const % const, as wi_fold_in_parts may now provide this.  Before this
[10, 10] % [4, 4] would produce [0, 3] instead of [2, 2].

	gcc/
	* range-op.cc (operator_trunc_mod::wi_fold): Fold constants.

	gcc/testsuite/
	* gcc.dg/tree-ssa/pr61839_2.c: Adjust.  Add new const fold test.
---
 gcc/range-op.cc   | 12 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr61839_2.c | 39 ---
 2 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 69228882930..eb66e12677f 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -3240,6 +3240,18 @@ operator_trunc_mod::wi_fold (irange , tree type,
       return;
     }
 
+  // Check for constant and try to fold.
+  if (lh_lb == lh_ub && rh_lb == rh_ub)
+    {
+      wi::overflow_type ov = wi::OVF_NONE;
+      tmp = wi::mod_trunc (lh_lb, rh_lb, sign, );
+      if (ov == wi::OVF_NONE)
+	{
+	  r = int_range<2> (type, tmp, tmp);
+	  return;
+	}
+    }
+
   // ABS (A % B) < ABS (B) and either 0 <= A % B <= A or A <= A % B <= 0.
   new_ub = rh_ub - 1;
   if (sign == SIGNED)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_2.c
index f1b8feb4e9d..0e0f4c02113 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_2.c
@@ -45,9 +45,40 @@ int bar2 ()
   return 0;
 }
 
-/* Dont optimize 972195717 / 0 in function foo.  */
+/* Ensure we are folding modulus sub-ranges properly.  */
+__attribute__ ((noinline))
+int mod (int a, int b)
+{
+  int v1, v2;
+  v1 = (a < 10) ? 12 : 24;
+  v2 = (b > 20) ? 3 : 6;
+
+  if (a > 20)
+    v1 = v1 * 2;
+  if (b > 20)
+    v2 = v2 * 2;
+
+  if (a == b)
+    v2 = 0;
+
+  /* v1 == 12, 24, or 48.  v2 == 0, 3, 6, or 12. */
+  int c = v1 % v2;
+  if (c == 0)
+    ;
+  else
+    __builtin_abort ();
+  return 0;
+}
+
+/* EVRP now makes transformations in all functions, leaving a single
+ * builtin_abort call in bar2. */
+/* { dg-final { scan-tree-dump-times "__builtin_abort" 1 "evrp" } } */
+
+/* Make sure to optimize 972195717 / 0 in function foo.  */
 /* { dg-final { scan-tree-dump-times "972195717 / " 0  "evrp" } } */
-/* Dont optimize 972195717 % 0 in function bar.  */
-/* { dg-final { scan-tree-dump-times "972195717 % " 1 "evrp" } } */
-/* May optimize in function bar2, but EVRP doesn't perform this yet.  */
+/* Make sure  to optimize 972195717 % 0 in function bar.  */
+/* { dg-final { scan-tree-dump-times "972195717 % " 0 "evrp" } } */
+/* Make sure to optimize 972195717 % [1,2] function bar2.  */
 /* { dg-final { scan-tree-dump-times "972195715 % " 0 "evrp" } } */
+/* [12,12][24,24][48,48] % [0,0][3,3][6,6][12,12] == [0,0] */
+/* { dg-final { scan-tree-dump-times "%" 0 "evrp" } } */
-- 
2.17.2



[COMMITTED] Change integral divide by zero to produce UNDEFINED range.

2021-07-30 Thread Andrew MacLeod via Gcc-patches
This patch changes divide by 0 to produce an UNDEFINED range rather than 
VARYING.   This can help in propagating values by ignoring / 0 results 
rather than bailing.
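The benefit shows up when ranges merge. A minimal sketch in C, using a hypothetical three-state lattice rather than GCC's `irange` API: in a union, UNDEFINED acts as the identity, so a `/ 0` operand simply drops out and the other operand's range survives. VARYING absorbs everything it meets.

```c
#include <stdbool.h>

/* Hypothetical three-state lattice, loosely modeled on the
   UNDEFINED / [lb, ub] / VARYING states of a value range.  */
enum vr_kind { K_UNDEFINED, K_RANGE, K_VARYING };
struct vr { enum vr_kind kind; long lb, ub; };

/* Union of two ranges, as done where values merge (e.g. at a PHI).
   UNDEFINED is the identity; VARYING absorbs everything.  */
static struct vr
vr_union (struct vr a, struct vr b)
{
  if (a.kind == K_UNDEFINED)
    return b;
  if (b.kind == K_UNDEFINED)
    return a;
  if (a.kind == K_VARYING || b.kind == K_VARYING)
    return (struct vr) { K_VARYING, 0, 0 };
  return (struct vr) { K_RANGE, a.lb < b.lb ? a.lb : b.lb,
                       a.ub > b.ub ? a.ub : b.ub };
}

/* Division of constants.  A zero divisor now gives UNDEFINED
   (before the patch the result would have been VARYING).  */
static struct vr
vr_div_const (long n, long d)
{
  if (d == 0)
    return (struct vr) { K_UNDEFINED, 0, 0 };
  return (struct vr) { K_RANGE, n / d, n / d };
}
```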


Bootstrapped on x86_64-pc-linux-gnu and powerpc64-unknown-linux-gnu with 
no regressions.  Pushed.


Andrew


>From ebbcdd7fae1f802763850e4afedfdfa09cf10e1a Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 28 Jul 2021 13:14:22 -0400
Subject: [PATCH 2/3] Change integral divide by zero to produce UNDEFINED.

Instead of VARYING, we can get better results by treating divide by zero
as producing an undefined result.

	gcc/
	* range-op.cc (operator_div::wi_fold): Return UNDEFINED for [0, 0] divisor.

	gcc/testsuite/
	* gcc.dg/tree-ssa/pr61839_2.c: Adjust.
---
 gcc/range-op.cc   | 9 +
 gcc/testsuite/gcc.dg/tree-ssa/pr61839_2.c | 3 +--
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index b1fb25c77f8..69228882930 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -1785,13 +1785,6 @@ operator_div::wi_fold (irange , tree type,
 		   const wide_int _lb, const wide_int _ub,
 		   const wide_int _lb, const wide_int _ub) const
 {
-  // If we know we will divide by zero...
-  if (rh_lb == 0 && rh_ub == 0)
-    {
-      r.set_varying (type);
-      return;
-    }
-
   const wide_int dividend_min = lh_lb;
   const wide_int dividend_max = lh_ub;
   const wide_int divisor_min = rh_lb;
@@ -1818,7 +1811,7 @@ operator_div::wi_fold (irange , tree type,
   // If we're definitely dividing by zero, there's nothing to do.
   if (wi_zero_p (type, divisor_min, divisor_max))
     {
-      r.set_varying (type);
+      r.set_undefined ();
       return;
     }
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_2.c
index cfec54de991..f1b8feb4e9d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_2.c
@@ -45,9 +45,8 @@ int bar2 ()
   return 0;
 }
 
-
 /* Dont optimize 972195717 / 0 in function foo.  */
-/* { dg-final { scan-tree-dump-times "972195717 / " 1  "evrp" } } */
+/* { dg-final { scan-tree-dump-times "972195717 / " 0  "evrp" } } */
 /* Dont optimize 972195717 % 0 in function bar.  */
 /* { dg-final { scan-tree-dump-times "972195717 % " 1 "evrp" } } */
 /* May optimize in function bar2, but EVRP doesn't perform this yet.  */
-- 
2.17.2



[COMMITTED] Change const basic_block to const_basic_block in gimple-range-cache.

2021-07-30 Thread Andrew MacLeod via Gcc-patches

As mentioned elsewhere, it's const_basic_block, not const basic_block.

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.
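The distinction matters because `const` applied to a pointer typedef qualifies the pointer, not the pointee. A minimal C illustration with a simplified, hypothetical stand-in for GCC's `basic_block_def`. The two typedefs follow the pattern used for `basic_block` and `const_basic_block`, but the struct here has only one field.

```c
struct basic_block_def { int index; };      /* simplified stand-in */
typedef struct basic_block_def *basic_block;
typedef const struct basic_block_def *const_basic_block;

/* `const basic_block bb` means `struct basic_block_def *const bb`:
   the pointer cannot be reseated, but the block can be modified.  */
static void
clobber_index (const basic_block bb)
{
  bb->index = 42;       /* compiles without complaint */
}

/* `const_basic_block bb` points to a const block; writing through it
   (e.g. `bb->index = 42;`) would be rejected by the compiler.  */
static int
read_index (const_basic_block bb)
{
  return bb->index;
}
```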

Andrew

>From d242acc396d645267cd1ccbdb4d0d73cc9b1ef48 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 29 Jul 2021 09:15:45 -0400
Subject: [PATCH 1/3] Change const basic_block to const_basic_block.

	* gimple-range-cache.cc (*::set_bb_range): Change const basic_block to
	const_basic_block.
	(*::get_bb_range): Ditto.
	(*::bb_range_p): Ditto.
	* gimple-range-cache.h: Change prototypes.
---
 gcc/gimple-range-cache.cc | 36 ++--
 gcc/gimple-range-cache.h  |  6 +++---
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 265a64bacca..91541f12c3c 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -160,9 +160,9 @@ non_null_ref::process_name (tree name)
 class ssa_block_ranges
 {
 public:
-  virtual bool set_bb_range (const basic_block bb, const irange ) = 0;
-  virtual bool get_bb_range (irange , const basic_block bb) = 0;
-  virtual bool bb_range_p (const basic_block bb) = 0;
+  virtual bool set_bb_range (const_basic_block bb, const irange ) = 0;
+  virtual bool get_bb_range (irange , const_basic_block bb) = 0;
+  virtual bool bb_range_p (const_basic_block bb) = 0;
 
   void dump(FILE *f);
 };
@@ -193,9 +193,9 @@ class sbr_vector : public ssa_block_ranges
 public:
   sbr_vector (tree t, irange_allocator *allocator);
 
-  virtual bool set_bb_range (const basic_block bb, const irange ) OVERRIDE;
-  virtual bool get_bb_range (irange , const basic_block bb) OVERRIDE;
-  virtual bool bb_range_p (const basic_block bb) OVERRIDE;
+  virtual bool set_bb_range (const_basic_block bb, const irange ) OVERRIDE;
+  virtual bool get_bb_range (irange , const_basic_block bb) OVERRIDE;
+  virtual bool bb_range_p (const_basic_block bb) OVERRIDE;
 protected:
   irange **m_tab;	// Non growing vector.
   int m_tab_size;
@@ -225,7 +225,7 @@ sbr_vector::sbr_vector (tree t, irange_allocator *allocator)
 // Set the range for block BB to be R.
 
 bool
-sbr_vector::set_bb_range (const basic_block bb, const irange )
+sbr_vector::set_bb_range (const_basic_block bb, const irange )
 {
   irange *m;
   gcc_checking_assert (bb->index < m_tab_size);
@@ -243,7 +243,7 @@ sbr_vector::set_bb_range (const basic_block bb, const irange )
 // there is no range.
 
 bool
-sbr_vector::get_bb_range (irange , const basic_block bb)
+sbr_vector::get_bb_range (irange , const_basic_block bb)
 {
   gcc_checking_assert (bb->index < m_tab_size);
   irange *m = m_tab[bb->index];
@@ -258,7 +258,7 @@ sbr_vector::get_bb_range (irange , const basic_block bb)
 // Return true if a range is present.
 
 bool
-sbr_vector::bb_range_p (const basic_block bb)
+sbr_vector::bb_range_p (const_basic_block bb)
 {
   gcc_checking_assert (bb->index < m_tab_size);
   return m_tab[bb->index] != NULL;
@@ -281,9 +281,9 @@ class sbr_sparse_bitmap : public ssa_block_ranges
 {
 public:
   sbr_sparse_bitmap (tree t, irange_allocator *allocator, bitmap_obstack *bm);
-  virtual bool set_bb_range (const basic_block bb, const irange ) OVERRIDE;
-  virtual bool get_bb_range (irange , const basic_block bb) OVERRIDE;
-  virtual bool bb_range_p (const basic_block bb) OVERRIDE;
+  virtual bool set_bb_range (const_basic_block bb, const irange ) OVERRIDE;
+  virtual bool get_bb_range (irange , const_basic_block bb) OVERRIDE;
+  virtual bool bb_range_p (const_basic_block bb) OVERRIDE;
 private:
   void bitmap_set_quad (bitmap head, int quad, int quad_value);
   int bitmap_get_quad (const_bitmap head, int quad);
@@ -342,7 +342,7 @@ sbr_sparse_bitmap::bitmap_get_quad (const_bitmap head, int quad)
 // Set the range on entry to basic block BB to R.
 
 bool
-sbr_sparse_bitmap::set_bb_range (const basic_block bb, const irange )
+sbr_sparse_bitmap::set_bb_range (const_basic_block bb, const irange )
 {
   if (r.undefined_p ())
 {
@@ -368,7 +368,7 @@ sbr_sparse_bitmap::set_bb_range (const basic_block bb, const irange )
 // there is no range.
 
 bool
-sbr_sparse_bitmap::get_bb_range (irange , const basic_block bb)
+sbr_sparse_bitmap::get_bb_range (irange , const_basic_block bb)
 {
   int value = bitmap_get_quad (bitvec, bb->index);
 
@@ -386,7 +386,7 @@ sbr_sparse_bitmap::get_bb_range (irange , const basic_block bb)
 // Return true if a range is present.
 
 bool
-sbr_sparse_bitmap::bb_range_p (const basic_block bb)
+sbr_sparse_bitmap::bb_range_p (const_basic_block bb)
 {
   return (bitmap_get_quad (bitvec, bb->index) != 0);
 }
@@ -417,7 +417,7 @@ block_range_cache::~block_range_cache ()
 // If it has not been accessed yet, allocate it first.
 
 bool
-block_range_cache::set_bb_range (tree name, const basic_block bb,
+block_range_cache::set_bb_range (tree name, const_basic_block bb,
  const irange )
 {
   unsigned v = SSA_NAME_VERSION (name);
@@ -464,7 +464,7 @@ block_range_cache::query_block_ranges (tree name)
 // is one.
 
 bool

Re: [RFC] Mark gcc.dg/shrink-wrap-loop.c as XFAIL.

2021-07-30 Thread Aldy Hernandez via Gcc-patches
On Fri, Jul 30, 2021 at 8:24 PM Jeff Law  wrote:
>
>
>
> On 7/30/2021 4:39 AM, Aldy Hernandez via Gcc-patches wrote:
> > It occurs to me that I should not have disabled early jump threading in
> > this test, as it may hide an actual defect.  I have reverted my change
> > and XFAILed the test instead.  I have also opened a PR101690 to keep track
> > of this problem.
> >
> > I have pushed this patch, but could benefit from someone with knowledge
> > of loop-ch and/or RTL shrink wrapping to look at the PR, as here we have
> > a valid jump thread that is causing loop_ch to drastically change the
> > probabilities ultimately having us fail at shrink wrapping.
> >
> > Thanks.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/shrink-wrap-loop.c: Enable early jump threading.  Mark as
> >   XFAIL.
> Getting the profile data right with jump threading can be painful. See
> the huge comment at the start of
> tree-ssa-threadupdate.c::compute_path_counts.

*head explodes*

>
> The backwards threader uses a completely different copying/update
> mechanism.  It's probably not using compute_path_counts (and it's not
> even clear if it could) and I suspect the copier/updater that is being
> used for backwards threading doesn't handle those cases right.

Ughhh ok.

I wonder if this has something to do with
gcc.dg/tree-prof/20050826-2.c which is failing for x86 32-bits.  The
test just makes sure there is no magic "Invalid sum" after compiling
with -fprofile-use.

Aldy



Re: [PATCH] doc: correct documentation of "call" (et al) operand 2.

2021-07-30 Thread Jeff Law via Gcc-patches

On 7/29/2021 5:41 PM, Hans-Peter Nilsson wrote:
> An old itch being scratched: the documentation lies; it's not "the
> number of registers used as operands", unless the target makes a
> special arrangement to that effect, and there's nothing in the guts of
> gcc setting up or assuming those semantics.
>
> Instead, see calls.c:expand_call, variable next_arg_reg.  Or just
> consider the variable name.  The text is somewhat transcribed from the
> head comment of emit_call_1 for parameter next_arg_reg.  Most
> important is to document the relation to function_arg_info::end_marker()
> and the TARGET_FUNCTION_ARG hook.
>
> The "normally" in the head comment, in "normally it is the first
> arg-register beyond those used for args in this call, or 0 if all the
> arg-registers are used in this call" means "by default", unless the
> target tests end_marker_p and does something special, but the port is
> free to return whatever it likes when it sees the end-marker.
>
> And, I do mean "whatever it likes" because if the port doesn't
> actually mention that operand in the RTX emitted for its "call" or
> "call_value" patterns ("usually" define_expands), it can be any
> mumbo-jumbo, such as a VOIDmode register, which seems like it happens
> for some targets, or NULL, that happens for others.  Returning a
> VOIDmode register until recently included MMIX, where it made it into
> the emitted RTL, confusing later passes, recently exposed as an ICE.
>
> Tested by inspecting the info and generated pdf for sanity.
>
> Ok for the doc part?
>
> gcc:
> * doc/md.texi (call): Correct information about operand 2.
> * config/mmix/mmix.md ("call", "call_value"): Remove fixed FIXMEs.

OK
jeff



Re: [RFC] Mark gcc.dg/shrink-wrap-loop.c as XFAIL.

2021-07-30 Thread Jeff Law via Gcc-patches

On 7/30/2021 4:39 AM, Aldy Hernandez via Gcc-patches wrote:
> It occurs to me that I should not have disabled early jump threading in
> this test, as it may hide an actual defect.  I have reverted my change
> and XFAILed the test instead.  I have also opened a PR101690 to keep track
> of this problem.
>
> I have pushed this patch, but could benefit from someone with knowledge
> of loop-ch and/or RTL shrink wrapping to look at the PR, as here we have
> a valid jump thread that is causing loop_ch to drastically change the
> probabilities ultimately having us fail at shrink wrapping.
>
> Thanks.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/shrink-wrap-loop.c: Enable early jump threading.  Mark as
> XFAIL.
Getting the profile data right with jump threading can be painful.  See
the huge comment at the start of
tree-ssa-threadupdate.c::compute_path_counts.

The backwards threader uses a completely different copying/update 
mechanism.  It's probably not using compute_path_counts (and it's not 
even clear if it could) and I suspect the copier/updater that is being 
used for backwards threading doesn't handle those cases right.


Jeff



---
  gcc/testsuite/gcc.dg/shrink-wrap-loop.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/shrink-wrap-loop.c b/gcc/testsuite/gcc.dg/shrink-wrap-loop.c
index ba872fa23f6..6e1be8937fe 100644
--- a/gcc/testsuite/gcc.dg/shrink-wrap-loop.c
+++ b/gcc/testsuite/gcc.dg/shrink-wrap-loop.c
@@ -1,6 +1,5 @@
  /* { dg-do compile { target { { { i?86-*-* x86_64-*-* } && lp64 } || { arm_thumb2 } } } } */
  /* { dg-options "-O2 -fdump-rtl-pro_and_epilogue"  } */
-// { dg-additional-options "-fdisable-tree-ethread" }
  
  /*

  Our new threader is threading things a bit too early, and causing the
@@ -69,4 +68,4 @@ test (int *p1, int *p2)
  
return 1;

  }
-/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue"  } } */
+/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" { xfail *-*-* } } } */




committed: [PATCH] mips: Fix up mips_atomic_assign_expand_fenv [PR94780]

2021-07-30 Thread Xi Ruoyao via Gcc-patches
On Fri, 2021-07-30 at 16:23 +0800, Xi Ruoyao via Gcc-patches wrote:
> On Fri, 2021-07-30 at 09:11 +0100, Richard Sandiford wrote:
> > Xi Ruoyao  writes:
> > > Ping again.
> > > 
> > > On Wed, 2021-06-23 at 11:11 +0800, Xi Ruoyao wrote:
> > > > Commit message shamelessly copied from 1777beb6b129 by jakub:
> > > > 
> > > > This function, because it is sometimes called even outside of
> > > > function bodies, uses create_tmp_var_raw rather than
> > > > create_tmp_var.  But in order for that to work, when first
> > > > referenced, the VAR_DECLs need to appear in a TARGET_EXPR so that
> > > > during gimplification the var gets the right DECL_CONTEXT and is
> > > > added to local decls.
> > > > 
> > > > Bootstrapped & regtested on mips64el-linux-gnu.  Ok for trunk and
> > > > backport to 11, 10, and 9?
> > 
> > OK for all, thanks.
> > 
> > Similar comments to the previous message about the appropriateness
> > of me reviewing the patch, but like you say, this is doing for MIPS
> > what we've already had to do for other targets.
> 
> Thanks for reviewing.
> 
> Will bootstrap and test it again, and commit if there are no
> regressions.

Committed to master at 20656544 and releases/gcc-11 at 7db1795a.

Will do it for gcc-10 and gcc-9 tomorrow.



committed: [PATCH] mips: add MSA vec_cmp and vec_cmpu expand pattern [PR101132]

2021-07-30 Thread Xi Ruoyao via Gcc-patches
On Fri, 2021-07-30 at 16:17 +0800, Xi Ruoyao via Gcc-patches wrote:
> On Fri, 2021-07-30 at 09:04 +0100, Richard Sandiford wrote:
> > Xi Ruoyao  writes:
> > > Ping again.
> > 
> > Sorry that this has gone unreviewed for so long.  I think in
> > practice the MIPS port is essentially unmaintained at this point --
> > it would be great if someone would volunteer :-)
> 
> A company working on MIPS has contacted me and said one of their
> employees may contact the SC and take the role of MIPS maintainer.
> Not sure about their progress, though.
> 
> > It isn't really appropriate for me to review MIPS stuff given that I
> > work for a company that has a competing architecture.  I think Jeff
> > expressed similar concerns given his new role.
> 
> > That said, the patch looks clearly correct to me, so please go ahead
> > and apply (to trunk and GCC 11).  Thanks for your patience.
> 
> Thanks!
> 
> It has been 5 weeks so it's better to rebase and bootstrap & test it
> again.  I'll commit it if there is no regression.

Committed to master at 45cb789e and releases/gcc-11 at 2a47ee78.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University



Re: [PATCH v6] Add QI vector mode support to by-pieces for memset

2021-07-30 Thread H.J. Lu via Gcc-patches
On Fri, Jul 30, 2021 at 8:40 AM Richard Sandiford
 wrote:
>
> "H.J. Lu via Gcc-patches"  writes:
> > +/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
> > +   bytes from constant string DATA + OFFSET and return it as target
> > +   constant.  If PREV isn't nullptr, it has the RTL info from the
> > +   previous iteration.  */
> >
> > +rtx
> > +builtin_memset_read_str (void *data, void *prev,
> > +                         HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> > +                         fixed_size_mode mode)
> > +{
> >    const char *c = (const char *) data;
> > -  char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode));
> > +  unsigned int size = GET_MODE_SIZE (mode);
> >
> > -  memset (p, *c, GET_MODE_SIZE (mode));
> > +  rtx target = gen_memset_value_from_prev ((by_pieces_prev *) prev,
> > +                                           mode);
> > +  if (target != nullptr)
> > +    return target;
> > +  rtx src = gen_int_mode (*c, QImode);
> >
> > -  return c_readstr (p, mode);
> > +  if (VECTOR_MODE_P (mode))
> > +    {
> > +      gcc_assert (GET_MODE_INNER (mode) == QImode);
> > +
> > +      rtx const_vec = gen_const_vec_duplicate (mode, src);
> > +      if (prev == NULL)
> > +        /* Return CONST_VECTOR when called by a query function.  */
> > +        target = const_vec;
> > +      else
> > +        {
> > +          /* Use the move expander with CONST_VECTOR.  */
> > +          target = targetm.gen_memset_scratch_rtx (mode);
> > +          emit_move_insn (target, const_vec);
> > +        }
> > +
> > +      return target;
>
> I guess this is personal preference, sorry, but it seems more obvious
> to me with an early return rather than an assignment to target:
>
>       if (prev == NULL)
>         /* Return CONST_VECTOR when called by a query function.  */
>         return const_vec;
>
>       /* Use the move expander with CONST_VECTOR.  */
>       target = targetm.gen_memset_scratch_rtx (mode);
>       emit_move_insn (target, const_vec);
>       return target;

I made the change and checked it in.   Thank you very much for your help.

> OK with that change, thanks, no need to repost.  (And thanks for
> your patience.)
>
> Richard



-- 
H.J.
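The value built by `builtin_memset_read_str` is the memset byte repeated across the whole mode. The vector path above does this with `gen_const_vec_duplicate`. A minimal C sketch of the scalar equivalent for a 64-bit word follows. The helper name is hypothetical and this is not GCC code.

```c
#include <stdint.h>
#include <string.h>

/* Replicate byte C into all eight bytes of a 64-bit word: the scalar
   analogue of a QImode constant duplicated across a vector mode.
   Multiplying by 0x0101...01 places a copy of C in each byte and
   cannot carry between bytes, since C <= 0xff.  */
static uint64_t
splat_byte (unsigned char c)
{
  return (uint64_t) c * UINT64_C (0x0101010101010101);
}
```

Because every byte is identical, the result matches a `memset` buffer reinterpreted as a word, whatever the target's endianness.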


[committed] libstdc++: Use secure_getenv for filesystem::temp_directory_path() [PR65018]

2021-07-30 Thread Jonathan Wakely via Gcc-patches
This adds a configure check for the GNU extension secure_getenv and then
uses it for looking up TMPDIR and similar variables.
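A minimal sketch of the lookup order follows. It uses `secure_getenv` when glibc 2.17 or later is detected and falls back to `std::getenv` otherwise. The helper name and the `SKETCH_HAVE_SECURE_GETENV` macro are hypothetical; libstdc++ makes this choice through the new configure check. `secure_getenv` returns null when the process runs in secure-execution mode (e.g. setuid), so attacker-controlled variables are ignored and the lookup falls back to `/tmp`.

```cpp
#include <cstdlib>
#include <initializer_list>

// Hypothetical feature macro: glibc 2.17+ declares ::secure_getenv in
// <stdlib.h> (g++ defines _GNU_SOURCE by default on GNU/Linux).
#if defined(__GLIBC__) \
    && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 17))
# define SKETCH_HAVE_SECURE_GETENV 1
#endif

// Return the first of TMPDIR, TMP, TEMP, TEMPDIR that is set, else "/tmp".
// The caller still has to check that the path is an accessible directory.
inline const char*
temp_dir_from_env() noexcept
{
  for (const char* name : { "TMPDIR", "TMP", "TEMP", "TEMPDIR" })
    {
#ifdef SKETCH_HAVE_SECURE_GETENV
      // Returns nullptr in secure-execution mode (e.g. setuid programs).
      if (const char* dir = ::secure_getenv(name))
#else
      if (const char* dir = std::getenv(name))
#endif
        return dir;
    }
  return "/tmp";
}
```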

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/65018
* configure.ac: Check for secure_getenv.
* config.h.in: Regenerate.
* configure: Regenerate.
* src/filesystem/ops-common.h (get_temp_directory_from_env): New
helper function to obtain path from the environment.
* src/c++17/fs_ops.cc (fs::temp_directory_path): Use new helper.
* src/filesystem/ops.cc (fs::temp_directory_path): Likewise.
* testsuite/27_io/filesystem/operations/temp_directory_path.cc:
Print messages if test cannot be run.
* testsuite/experimental/filesystem/operations/temp_directory_path.cc:
Likewise. Fix incorrect condition. Use "TMP" to work with
Windows as well as POSIX.

Tested powerpc64le-linux. Committed to trunk.

commit 3dbd4d94bf380f3efa8bba9b203ce7d4c8f47fbb
Author: Jonathan Wakely 
Date:   Fri Jul 30 13:56:14 2021

libstdc++: Use secure_getenv for filesystem::temp_directory_path() [PR65018]

This adds a configure check for the GNU extension secure_getenv and then
uses it for looking up TMPDIR and similar variables.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/65018
* configure.ac: Check for secure_getenv.
* config.h.in: Regenerate.
* configure: Regenerate.
* src/filesystem/ops-common.h (get_temp_directory_from_env): New
helper function to obtain path from the environment.
* src/c++17/fs_ops.cc (fs::temp_directory_path): Use new helper.
* src/filesystem/ops.cc (fs::temp_directory_path): Likewise.
* testsuite/27_io/filesystem/operations/temp_directory_path.cc:
Print messages if test cannot be run.
* testsuite/experimental/filesystem/operations/temp_directory_path.cc:
Likewise. Fix incorrect condition. Use "TMP" to work with
Windows as well as POSIX.

diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index a816ff79d16..9d70ae7b1d0 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -273,6 +273,7 @@ if $GLIBCXX_IS_NATIVE; then
   AC_CHECK_FUNCS(__cxa_thread_atexit_impl __cxa_thread_atexit)
   AC_CHECK_FUNCS(aligned_alloc posix_memalign memalign _aligned_malloc)
   AC_CHECK_FUNCS(_wfopen)
+  AC_CHECK_FUNCS(secure_getenv)
 
   # C11 functions for C++17 library
   AC_CHECK_FUNCS(timespec_get)
diff --git a/libstdc++-v3/src/c++17/fs_ops.cc b/libstdc++-v3/src/c++17/fs_ops.cc
index ceaf0291d64..db2250e4841 100644
--- a/libstdc++-v3/src/c++17/fs_ops.cc
+++ b/libstdc++-v3/src/c++17/fs_ops.cc
@@ -1591,7 +1591,8 @@ fs::symlink_status(const fs::path& p)
   return result;
 }
 
-fs::path fs::temp_directory_path()
+fs::path
+fs::temp_directory_path()
 {
   error_code ec;
   path tmp = temp_directory_path(ec);
@@ -1600,32 +1601,10 @@ fs::path fs::temp_directory_path()
   return tmp;
 }
 
-fs::path fs::temp_directory_path(error_code& ec)
+fs::path
+fs::temp_directory_path(error_code& ec)
 {
-  path p;
-#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
-  unsigned len = 1024;
-  std::wstring buf;
-  do
-    {
-      buf.resize(len);
-      len = GetTempPathW(buf.size(), buf.data());
-    } while (len > buf.size());
-
-  if (len == 0)
-    {
-      ec.assign((int)GetLastError(), std::system_category());
-      return p;
-    }
-  buf.resize(len);
-  p = std::move(buf);
-#else
-  const char* tmpdir = nullptr;
-  const char* env[] = { "TMPDIR", "TMP", "TEMP", "TEMPDIR", nullptr };
-  for (auto e = env; tmpdir == nullptr && *e != nullptr; ++e)
-    tmpdir = ::getenv(*e);
-  p = tmpdir ? tmpdir : "/tmp";
-#endif
+  path p = fs::get_temp_directory_from_env();
   auto st = status(p, ec);
   if (ec)
     p.clear();
diff --git a/libstdc++-v3/src/filesystem/ops-common.h b/libstdc++-v3/src/filesystem/ops-common.h
index 43311e6c38f..b8bbf446883 100644
--- a/libstdc++-v3/src/filesystem/ops-common.h
+++ b/libstdc++-v3/src/filesystem/ops-common.h
@@ -568,6 +568,46 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
 
 #endif // _GLIBCXX_HAVE_SYS_STAT_H
 
+  // Find OS-specific name of temporary directory from the environment,
+  // Caller must check that the path is an accessible directory.
+#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+  inline wstring
+  get_temp_directory_from_env()
+  {
+    unsigned len = 1024;
+    std::wstring buf;
+    do
+      {
+        buf.resize(len);
+        len = GetTempPathW(buf.size(), buf.data());
+      } while (len > buf.size());
+
+    if (len == 0)
+      {
+        ec.assign((int)GetLastError(), std::system_category());
+        return p;
+      }
+    buf.resize(len);
+    return buf;
+  }
+#else
+  inline const char*
+  get_temp_directory_from_env() noexcept
+  {
+    for (auto env : { "TMPDIR", "TMP", "TEMP", "TEMPDIR" })
+      {
+#if 

Re: [PATCH] c++: Implement P0466R5 __cpp_lib_is_pointer_interconvertible compiler helpers [PR101539]

2021-07-30 Thread Jason Merrill via Gcc-patches

On 7/30/21 11:23 AM, Jakub Jelinek wrote:

On Fri, Jul 30, 2021 at 11:00:26AM -0400, Jason Merrill wrote:

Patch attached.


LGTM (which would mean I'll need to replace that particular test union
with a different one which will have just a non-std-layout member of the
anon struct, see below), but I guess we want a testcase for this, e.g.
struct E { };
struct X { int a; struct : public E { short b; long c; }; long long d; };
union Y { int a; struct : public E { short b; long c; }; long long d; };
will do it.


I've now committed this change.


But standard layout means that even all the non-static members of the struct
need to be standard-layout, that seems an unnecessary requirement for
anon structures to me.


Good point.

But then, if the anonymous struct is non-standard-layout, that should make
the enclosing class non-standard-layout as well, so we should never need to
consider in the pointer-interconv code whether the anonymous struct is
standard-layout.


For non-std-layout anon struct in a non-union class sure.
But, for non-std-layout anon struct in a union, while it makes the union
also non-std-layout, pointer-interconvertibility doesn't care about
std-layout, even in non-std-layout unions address of each of the union member
is pointer-interconvertible with the address of the whole union object.


Ah, right, I remembered that yesterday, but forgot this morning...

So the handling in your revised patch is good; let's not worry about the 
case you mentioned with an anonymous struct member of the same type as 
another union member.


Though


+ if ((TREE_CODE (type) != UNION_TYPE


This line could use either a comment or to be dropped; it's true that if 
type is non-union, we can skip the other checks because we know that any 
anonymous struct member must be standard-layout, but the other checks 
are cheap enough that I'm inclined to omit this one.


OK either way and with the testcase adjustments for my patch.


+  || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE
   || std_layout_type_p (TREE_TYPE (field)))
  && first_nonstatic_data_member_p (TREE_TYPE (field), membertype))




[PATCH] c++: Reject anonymous struct with bases

2021-07-30 Thread Jason Merrill via Gcc-patches
In discussion of jakub's patch for C++20 pointer-interconvertibility, it
came up that we allow anonymous structs to have bases, but don't do anything
usable with them.  Let's reject it.

The comment change is something I noticed while looking for the right place
to diagnose this: finish_struct_anon does not actually check for anything
invalid, so it shouldn't claim to.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* class.c (finish_struct_anon): Improve comment.
* decl.c (fixup_anonymous_aggr): Reject anonymous struct
with bases.

gcc/testsuite/ChangeLog:

* g++.dg/ext/anon-struct8.C: New test.
---
 gcc/cp/class.c  | 3 +--
 gcc/cp/decl.c   | 3 +++
 gcc/testsuite/g++.dg/ext/anon-struct8.C | 9 +
 3 files changed, 13 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/anon-struct8.C

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 14db06692dc..6f31700c06c 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -3072,8 +3072,7 @@ finish_struct_anon_r (tree field)
 }
 }
 
-/* Check for things that are invalid.  There are probably plenty of other
-   things we should check for also.  */
+/* Fix up any anonymous union/struct members of T.  */
 
 static void
 finish_struct_anon (tree t)
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 01d64a16125..71308a06c63 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -5084,6 +5084,9 @@ fixup_anonymous_aggr (tree t)
 {
   tree field, type;
 
+  if (BINFO_N_BASE_BINFOS (TYPE_BINFO (t)))
+   error_at (location_of (t), "anonymous struct with base classes");
+
   for (field = TYPE_FIELDS (t); field; field = DECL_CHAIN (field))
if (TREE_CODE (field) == FIELD_DECL)
  {
diff --git a/gcc/testsuite/g++.dg/ext/anon-struct8.C b/gcc/testsuite/g++.dg/ext/anon-struct8.C
new file mode 100644
index 000..f4e3f11b678
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/anon-struct8.C
@@ -0,0 +1,9 @@
+// { dg-options "" }
+
+struct A { };
+struct B {
+  struct: A { int i; };	// { dg-error "anonymous struct with base" }
+};
+union U {
+  struct: A { int i; };	// { dg-error "anonymous struct with base" }
+};

base-commit: 0ba2003cf306aa98b6ec91c9d849ab9bafcf17c2
-- 
2.27.0



Re: [PATCH][gcc/doc] Improve nonnull attribute documentation

2021-07-30 Thread Martin Sebor via Gcc-patches

On 7/28/21 9:20 AM, Tom de Vries wrote:

Hi,

Improve nonnull attribute documentation in a number of ways:

Reorganize discussion of effects into:
- effects for calls to functions with nonnull-marked parameters, and
- effects for function definitions with nonnull-marked parameters.
This makes it clear that -fno-delete-null-pointer-checks has no effect for
optimizations based on nonnull-marked parameters in function definitions
(see PR100404).


This resolves half of PR 101665 that I raised the other day (i.e.,
updates the docs).  Thank you!  Since PR 100404 was resolved as
invalid, can you please reference the other PR in the changelog?
The other half (warning when attribute nonnull is specified along
with attribute optimize "-fno-delete-null-pointer-checks") remains.
I plan to look into it unless someone beats me to it or unless some
other solution emerges.

A few comments on the documentation changes below.



Mention -Wnonnull-compare.

Mention workaround from PR100404 comment 7.

The workaround can be used for this scenario.  Say we have a test.c:
...
  #include 

  extern int isnull (char *ptr) __attribute__ ((nonnull));
  int isnull (char *ptr)
  {
if (ptr == 0)
  return 1;
return 0;
  }

  int
  main (void)
  {
char *ptr = NULL;
if (isnull (ptr)) __builtin_abort ();
return 0;
  }
...

The test-case contains a mistake: ptr == NULL, and we want to detect the
mistake using an abort:
...
$ gcc test.c
$ ./a.out
Aborted (core dumped)
...

At -O2 however, the mistake is not detected:
...
$ gcc test.c -O2
$ ./a.out
...
which is what -Wnonnull-compare (not shown here) warns about.

The easiest way to fix this is by dropping the nonnull attribute.  But that
also disables -Wnonnull, which would detect something like:
...
   if (isnull (NULL)) __builtin_abort ();
...
at compile time.

Using this workaround:
...
  int isnull (char *ptr)
  {
+  asm ("" : "+r"(ptr));
if (ptr == 0)
  return 1;
return 0;
  }
...
we still manage to detect the problem at runtime with -O2:
...
$ ~/gcc_versions/devel/install/bin/gcc test.c -O2
$ ./a.out
Aborted (core dumped)
...
while keeping the possibility to detect "isnull (NULL)" at compile time.

OK for trunk?

Thanks,
- Tom

[gcc/doc] Improve nonnull attribute documentation

gcc/ChangeLog:

2021-07-28  Tom de Vries  

* doc/extend.texi (nonnull attribute): Improve documentation.

---
  gcc/doc/extend.texi | 51 ---
  1 file changed, 40 insertions(+), 11 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b83cd4919bb..3389effd70c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3488,17 +3488,46 @@ my_memcpy (void *dest, const void *src, size_t len)
  @end smallexample
  
  @noindent

-causes the compiler to check that, in calls to @code{my_memcpy},
-arguments @var{dest} and @var{src} are non-null.  If the compiler
-determines that a null pointer is passed in an argument slot marked
-as non-null, and the @option{-Wnonnull} option is enabled, a warning
-is issued.  @xref{Warning Options}.  Unless disabled by
-the @option{-fno-delete-null-pointer-checks} option the compiler may
-also perform optimizations based on the knowledge that certain function
-arguments cannot be null. In addition,
-the @option{-fisolate-erroneous-paths-attribute} option can be specified
-to have GCC transform calls with null arguments to non-null functions
-into traps. @xref{Optimize Options}.
+informs the compiler that, in calls to @code{my_memcpy}, arguments
+@var{dest} and @var{src} must be non-null.
+
+The attribute has effect both for functions calls and function definitions.


Missing article: has an effect.  Also, an effect on
(rather than for) might be more appropriate.


+
+For function calls:
+@itemize @bullet
+@item If the compiler determines that a null pointer is
+passed in an argument slot marked as non-null, and the
+@option{-Wnonnull} option is enabled, a warning is issued.
+@xref{Warning Options}.
+@item The @option{-fisolate-erroneous-paths-attribute} option can be
+specified to have GCC transform calls with null arguments to non-null
+functions into traps.  @xref{Optimize Options}.
+@item The compiler may also perform optimizations based on the
+knowledge that certain function arguments cannot be null.  These
+optimizations can be disabled by the
+@option{-fno-delete-null-pointer-checks} option. @xref{Optimize Options}.
+@end itemize
+
+For function definitions:
+@itemize @bullet
+@item If the compiler determines that a function parameter that is
+marked with non-null


Either no "with" or no hyphen in nonnull (when it names the attribute).


is compared with null, and
+@option{-Wnonnull-compare} option is enabled, a warning is issued.
+@xref{Warning Options}.
+@item The compiler may also perform optimizations based on the
+knowledge that certain function parameters cannot be null.


"certain function parameters" makes me think it might include others
besides 

Re: [PATCH] gcov-profile/71672 Fix indirect call inlining with AutoFDO

2021-07-30 Thread Andi Kleen via Gcc-patches



On 7/30/2021 12:08 AM, Eugene Rozenfeld wrote:

This patch has the following changes:



Great thanks. Thanks for working on this. Looks all good to me (except I 
guess the patches could be split up for commit)



-Andi



[PATCH] Add a simple fraction class

2021-07-30 Thread Richard Sandiford via Gcc-patches
This patch adds a simple class for holding A/B fractions.
As the comments in the patch say, the class isn't designed
to have nice numerical properties at the extremes.

The motivating use case was some aarch64 costing work,
where being able to represent fractions was much easier
than using single integers and avoided the rounding errors
that would come with using floats.  (Unlike things like
COSTS_N_INSNS, there was no sensible constant base factor
that could be used.)
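To make the idea concrete, here is a minimal sketch of a fraction type along the lines described. It normalises to lowest terms when constructed. This is a hypothetical illustration: the class name `fraction` and its private `gcd` helper are made up here, and it is not the patch's `simple_fraction`. Like the patch, it makes no attempt to guard against multiplication overflow.

```cpp
#include <cassert>

// Hypothetical minimal fraction type (not the patch's simple_fraction).
// Values are kept in lowest terms with a positive denominator, so equality
// is memberwise.  No overflow checking; the minimum value of a signed T is
// not handled.
template<typename T>
class fraction
{
public:
  constexpr fraction (T num = 0) : m_num (num), m_den (1) {}

  fraction (T num, T den)
  {
    assert (den != 0);
    T g = gcd (num, den);   // g >= 1 because den != 0
    num /= g;
    den /= g;
    if (den < 0)            // move the sign into the numerator
      {
        num = -num;
        den = -den;
      }
    m_num = num;
    m_den = den;
  }

  fraction operator+ (const fraction &o) const
  { return fraction (m_num * o.m_den + o.m_num * m_den, m_den * o.m_den); }
  fraction operator- (const fraction &o) const
  { return fraction (m_num * o.m_den - o.m_num * m_den, m_den * o.m_den); }
  fraction operator* (const fraction &o) const
  { return fraction (m_num * o.m_num, m_den * o.m_den); }
  fraction operator/ (const fraction &o) const
  { return fraction (m_num * o.m_den, m_den * o.m_num); }

  bool operator== (const fraction &o) const
  { return m_num == o.m_num && m_den == o.m_den; }
  bool operator!= (const fraction &o) const { return !(*this == o); }
  // Denominators are positive, so cross-multiplying preserves the order.
  bool operator< (const fraction &o) const
  { return m_num * o.m_den < o.m_num * m_den; }

  // Built-in division truncates toward zero; adjust to round down or up.
  T floor () const
  {
    T q = m_num / m_den;
    return (m_num % m_den != 0 && m_num < 0) ? q - 1 : q;
  }
  T ceil () const
  {
    T q = m_num / m_den;
    return (m_num % m_den != 0 && m_num > 0) ? q + 1 : q;
  }

private:
  // Euclid's algorithm on absolute values.
  static T gcd (T a, T b)
  {
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0)
      {
        T t = a % b;
        a = b;
        b = t;
      }
    return a;
  }

  T m_num, m_den;
};
```

Keeping the denominator positive and the value reduced means `==` can compare members directly, and `<` can cross-multiply without worrying about sign flips.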

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* simple-fraction.h: New file.
* simple-fraction.cc: Likewise.
* Makefile.in (OBJS): Add simple-fraction.o.
* selftest.h (simple_fraction_cc_tests): Declare.
* selftest-run-tests.c (selftest::run_tests): Call it.
---
 gcc/Makefile.in  |   1 +
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.h   |   1 +
 gcc/simple-fraction.cc   | 160 
 gcc/simple-fraction.h| 308 +++
 5 files changed, 471 insertions(+)
 create mode 100644 gcc/simple-fraction.cc
 create mode 100644 gcc/simple-fraction.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 1666ef84d6a..8eaaab84143 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1572,6 +1572,7 @@ OBJS = \
selftest-run-tests.o \
sese.o \
shrink-wrap.o \
+   simple-fraction.o \
simplify-rtx.o \
sparseset.o \
spellcheck.o \
diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
index 1b5583e476a..f17d4e24884 100644
--- a/gcc/selftest-run-tests.c
+++ b/gcc/selftest-run-tests.c
@@ -80,6 +80,7 @@ selftest::run_tests ()
   opt_problem_cc_tests ();
   ordered_hash_map_tests_cc_tests ();
   splay_tree_cc_tests ();
+  simple_fraction_cc_tests ();
 
   /* Mid-level data structures.  */
   input_c_tests ();
diff --git a/gcc/selftest.h b/gcc/selftest.h
index 80459d63a39..716ba41f6bf 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -254,6 +254,7 @@ extern void read_rtl_function_c_tests ();
 extern void rtl_tests_c_tests ();
 extern void sbitmap_c_tests ();
 extern void selftest_c_tests ();
+extern void simple_fraction_cc_tests ();
 extern void simplify_rtx_c_tests ();
 extern void spellcheck_c_tests ();
 extern void spellcheck_tree_c_tests ();
diff --git a/gcc/simple-fraction.h b/gcc/simple-fraction.h
new file mode 100644
index 000..8d3ff2bdd2d
--- /dev/null
+++ b/gcc/simple-fraction.h
@@ -0,0 +1,308 @@
+// Simple fraction utilities
+// Copyright (C) 2021 Free Software Foundation, Inc.
+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify it under
+// the terms of the GNU General Public License as published by the Free
+// Software Foundation; either version 3, or (at your option) any later
+// version.
+//
+// GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+// WARRANTY; without even the implied warranty of MERCHANTABILITY or
+// FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+// for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with GCC; see the file COPYING3.  If not see
+// .
+
+// A simple fraction with nominator and denominator of integral type T.
+// There is little attempt to handle multiplication overflow, so the class
+// shouldn't be used in cases where that's a risk.  It also doesn't cope
+// gracefully with the minimum T value, if T is signed.
+template 
+class simple_fraction
+{
+public:
+  // Construct a fraction equal to NOMINATOR.
+  template
+  constexpr simple_fraction (T1 nominator = 0)
+: m_nominator (nominator), m_denominator (1) {}
+
+  // Construct a fraction equal to NOMINATOR / DENOMINATOR (simplifying
+  // where possible).
+  template
+  simple_fraction (T1 nominator, T2 denominator)
+: simple_fraction (nominator, denominator, gcd (nominator, denominator)) {}
+
+  simple_fraction operator- () const;
+  simple_fraction operator+ (const simple_fraction &) const;
+  simple_fraction operator- (const simple_fraction &) const;
+  simple_fraction operator* (const simple_fraction &) const;
+  simple_fraction operator/ (const simple_fraction &) const;
+
+  simple_fraction &operator+= (const simple_fraction &);
+  simple_fraction &operator-= (const simple_fraction &);
+  simple_fraction &operator*= (const simple_fraction &);
+  simple_fraction &operator/= (const simple_fraction &);
+
+  bool operator== (const simple_fraction &) const;
+  bool operator!= (const simple_fraction &) const;
+  bool operator< (const simple_fraction &) const;
+  bool operator<= (const simple_fraction &) const;
+  bool operator>= (const simple_fraction &) const;
+  bool operator> (const simple_fraction &) const;
+
+  explicit operator bool () const { return m_nominator != 0; }
+
+  T floor () const;
+  T ceil () const;
+
+  // Convert the value to a double.
+  double as_double () const { return double 

Re: PING^5: [PATCH] mips: add MSA vec_cmp and vec_cmpu expand pattern [PR101132]

2021-07-30 Thread Jeff Law via Gcc-patches




On 7/30/2021 2:04 AM, Richard Sandiford via Gcc-patches wrote:

Xi Ruoyao  writes:

Ping again.

Sorry that this has gone unreviewed for so long.  I think in practice
the MIPS port is essentially unmaintained at this point -- it would
be great if someone would volunteer :-)

Yup.



It isn't really appropriate for me to review MIPS stuff given that I work
for a company that has a competing architecture.  I think Jeff expressed
similar concerns given his new role.
Right, I'm largely in the same boat as well.  I've been given a degree
of freedom, but I'm very cognizant of not raising conflict of interest
concerns with my employer.  Some trivial patch review for a port is OK, 
but the more substantial it is, the closer it is to the line that I 
don't want to cross.


Jeff


Re: [PATCH v6] Add QI vector mode support to by-pieces for memset

2021-07-30 Thread Richard Sandiford via Gcc-patches
"H.J. Lu via Gcc-patches"  writes:
> +/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
> +   bytes from constant string DATA + OFFSET and return it as target
> +   constant.  If PREV isn't nullptr, it has the RTL info from the
> +   previous iteration.  */
>  
> +rtx
> +builtin_memset_read_str (void *data, void *prev,
> +  HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> +  fixed_size_mode mode)
> +{
>const char *c = (const char *) data;
> -  char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode));
> +  unsigned int size = GET_MODE_SIZE (mode);
>  
> -  memset (p, *c, GET_MODE_SIZE (mode));
> +  rtx target = gen_memset_value_from_prev ((by_pieces_prev *) prev,
> +mode);
> +  if (target != nullptr)
> +return target;
> +  rtx src = gen_int_mode (*c, QImode);
>  
> -  return c_readstr (p, mode);
> +  if (VECTOR_MODE_P (mode))
> +{
> +  gcc_assert (GET_MODE_INNER (mode) == QImode);
> +
> +  rtx const_vec = gen_const_vec_duplicate (mode, src);
> +  if (prev == NULL)
> + /* Return CONST_VECTOR when called by a query function.  */
> + target = const_vec;
> +  else
> + {
> +   /* Use the move expander with CONST_VECTOR.  */
> +   target = targetm.gen_memset_scratch_rtx (mode);
> +   emit_move_insn (target, const_vec);
> + }
> +
> +  return target;

I guess this is personal preference, sorry, but it seems more obvious
to me with an early return rather than an assignment to target:

  if (prev == NULL)
    /* Return CONST_VECTOR when called by a query function.  */
    return const_vec;

  /* Use the move expander with CONST_VECTOR.  */
  target = targetm.gen_memset_scratch_rtx (mode);
  emit_move_insn (target, const_vec);
  return target;

OK with that change, thanks, no need to repost.  (And thanks for
your patience.)

Richard
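For readers less familiar with the by-pieces code: the vector branch builds a CONST_VECTOR in which every QImode lane holds the memset byte. That is the vector counterpart of replicating the byte across a wider integer. Below is a plain C++ analogy of the value being built. The function names are hypothetical, and this is not GCC's RTL API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar analogue: replicate byte C into all eight bytes of a 64-bit word.
// Multiplying by 0x0101...01 places a copy of C in each byte; no carries
// occur because C <= 0xff.
constexpr std::uint64_t
splat_byte_u64 (unsigned char c)
{
  return UINT64_C (0x0101010101010101) * c;
}

// Vector analogue: every lane of an N-lane byte vector holds C, which is
// what duplicating a QImode constant across a vector mode describes.
template<std::size_t N>
std::array<unsigned char, N>
splat_byte_vec (unsigned char c)
{
  std::array<unsigned char, N> v;
  v.fill (c);
  return v;
}
```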


Re: [PATCH] c++: Implement P0466R5 __cpp_lib_is_pointer_interconvertible compiler helpers [PR101539]

2021-07-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 30, 2021 at 11:00:26AM -0400, Jason Merrill wrote:
> Patch attached.

LGTM (which would mean I'll need to replace that particular test union
with a different one which will have just a non-std-layout member of the
anon struct, see below), but I guess we want a testcase for this, e.g.
struct E { };
struct X { int a; struct : public E { short b; long c; }; long long d; };
union Y { int a; struct : public E { short b; long c; }; long long d; };
will do it.

> > But standard layout means that even all the non-static members of the struct
> > need to be standard-layout, that seems an unnecessary requirement for
> > anon structures to me.
> 
> Good point.
> 
> But then, if the anonymous struct is non-standard-layout, that should make
> the enclosing class non-standard-layout as well, so we should never need to
> consider in the pointer-interconv code whether the anonymous struct is
> standard-layout.

For non-std-layout anon struct in a non-union class sure.
But, for non-std-layout anon struct in a union, while it makes the union
also non-std-layout, pointer-interconvertibility doesn't care about
std-layout, even in non-std-layout unions address of each of the union member
is pointer-interconvertible with the address of the whole union object.
But when we recurse into the anon struct, the rule for non-union classes
applies and it only handles std-layout.

So something like:
struct D { int x; private: int y; };
union Y { int a; struct { short b; long c; D z; }; long long d; };

static_assert (std::is_pointer_interconvertible_with_class (&Y::a));
static_assert (!std::is_pointer_interconvertible_with_class (&Y::b));
static_assert (!std::is_pointer_interconvertible_with_class (&Y::c));
static_assert (std::is_pointer_interconvertible_with_class (&Y::d));

D is not std-layout, therefore the anon-struct is not std-layout either
and neither is union Y, for &Y::a and &Y::d the union
pointer-interconvertibility rule applies and both are
pointer-interconvertible, for the anon-struct it is not std-layout and
therefore address of the artificial member with anon-struct type
is not pointer-interconvertible with address of the b member.

Jakub
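The union rule Jakub describes can be observed at run time in ordinary C++: every member of a union starts at the union's own address, whether or not the union is standard-layout. The sketch below mirrors the shape of his example with one change, made so the code stays within standard C++: it uses a named member `s` instead of an anonymous struct, because anonymous structs in C++ are a GNU extension. Address equality is only a consequence of pointer-interconvertibility. The C++20 trait `std::is_pointer_interconvertible_with_class` is the compile-time query for the property itself and is not used here.

```cpp
#include <cassert>
#include <type_traits>

struct D { int x; private: int y; };  // mixed access: not standard-layout
struct S { short b; long c; };        // standard-layout
union Y { int a; S s; D dd; long long d; };

static_assert (!std::is_standard_layout<D>::value, "D mixes access control");
static_assert (std::is_standard_layout<S>::value, "S is standard-layout");
// A union with a non-standard-layout member is itself not standard-layout,
// yet each of its members still shares the union's address.
static_assert (!std::is_standard_layout<Y>::value, "Y has a member of type D");
```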



Re: [PING][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-07-30 Thread Jason Merrill via Gcc-patches

On 7/27/21 2:56 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575690.html

Are there any other suggestions or comments or is the latest revision
okay to commit?


OK.


On 7/20/21 12:34 PM, Martin Sebor wrote:

On 7/14/21 10:23 AM, Jason Merrill wrote:

On 7/14/21 10:46 AM, Martin Sebor wrote:

On 7/13/21 9:39 PM, Jason Merrill wrote:

On 7/13/21 4:02 PM, Martin Sebor wrote:

On 7/13/21 12:37 PM, Jason Merrill wrote:

On 7/13/21 10:08 AM, Jonathan Wakely wrote:

On Mon, 12 Jul 2021 at 12:02, Richard Biener wrote:

Somebody with more C++ knowledge than me needs to approve the
vec.h changes - I don't feel competent to assess all effects of 
the change.


They look OK to me except for:

-extern vnull vNULL;
+static constexpr vnull vNULL{ };

Making vNULL have static linkage can make it an ODR violation to use
vNULL in templates and inline functions, because different
instantiations will refer to a different "vNULL" in each translation
unit.


The ODR says this is OK because it's a literal constant with the 
same value (6.2/12.2.1).


But it would be better without the explicit 'static'; then in 
C++17 it's implicitly inline instead of static.


I'll remove the static.



But then, do we really want to keep vNULL at all?  It's a weird 
blurring of the object/pointer boundary that is also dependent on 
vec being a thin wrapper around a pointer.  In almost all cases 
it can be replaced with {}; one exception is == comparison, where 
it seems to be testing that the embedded pointer is null, which 
is a weird thing to want to test.


The one use case I know of for vNULL where I can't think of
an equally good substitute is in passing a vec as an argument by
value.  The only way to do that that I can think of is to name
the full vec type (i.e., the specialization) which is more typing
and less generic than vNULL.  I don't use vNULL myself so I wouldn't
miss this trick if it were to be removed but others might feel
differently.


In C++11, it can be replaced by {} in that context as well.
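Below is a small illustration of the two spellings for passing an empty vector by value. It assumes a hypothetical `thin_vec` that, like vec, is a thin wrapper around a pointer. It also uses a hypothetical `null_vec` sentinel modelled on vNULL's templated conversion operator.

```cpp
#include <cassert>

// Hypothetical thin wrapper around a pointer, standing in for GCC's vec<>.
template<typename T>
struct thin_vec
{
  T *m_data;  // null when no storage has been allocated
};

// A vNULL-like sentinel: converts to an empty thin_vec of any element type.
struct null_vec_t
{
  template<typename T>
  operator thin_vec<T> () const { return thin_vec<T> { nullptr }; }
};
constexpr null_vec_t null_vec {};

// A function that takes the vector by value.
bool
is_empty (thin_vec<int> v)
{
  return v.m_data == nullptr;
}
```

With C++11 list-initialization, `is_empty ({})` value-initializes the parameter, so the pointer is null. This needs neither the sentinel nor spelling out the full specialization.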


Cool.  I thought I'd tried { } here but I guess not.




If not, I'm all for getting rid of vNULL but with over 350 uses
of it left, unless there's some clever trick to make the removal
(mostly) effortless and seamless, I'd much rather do it independently
of this initial change. I also don't know if I can commit to making
all this cleanup.


I already have a patch to replace all but one use of vNULL, but 
I'll hold off with it until after your patch.


So what's the next step?  The patch only removes a few uses of vNULL
but doesn't add any.  Is it good to go as is (without the static and
with the additional const changes Richard suggested)?  This patch is
attached to my reply to Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575199.html


As Richard wrote:


The pieces where you change vec<> passing to const vec<>& and the few
where you change vec<> * to const vec<> * are OK - this should make the
rest a smaller piece to review.


Please go ahead and apply those changes and send a new patch with the 
remainder of the changes.


I have just pushed r12-2418:
https://gcc.gnu.org/pipermail/gcc-cvs/2021-July/350886.html



A few other comments:


-   omp_declare_simd_clauses);
+   *omp_declare_simd_clauses);


Instead of doing this indirection in all of the callers, let's change 
c_finish_omp_declare_simd to take a pointer as well, and do the 
indirection in initializing a reference variable at the top of the 
function.


Okay.




+    sched_init_luids (bbs.to_vec ());
+    haifa_init_h_i_d (bbs.to_vec ());


Why are these to_vec changes needed when you are also changing the 
functions to take const&?


Calling to_vec() here isn't necessary so I've removed it.




-  vec checks = LOOP_VINFO_CHECK_NONZERO (loop_vinfo);
+  vec checks = LOOP_VINFO_CHECK_NONZERO (loop_vinfo).to_vec ();


Why not use a reference here and in other similar spots?


Sure, that works too.

Attached is what's left of the original changes now that r12-2418
has been applied.

Martin






Re: [PATCH] c++: Fix up attribute rollbacks in cp_parser_statement

2021-07-30 Thread Jason Merrill via Gcc-patches

On 7/30/21 6:44 AM, Jakub Jelinek wrote:

Hi!

During the OpenMP directives using C++ attribute syntax work, I've noticed
that cp_parser_statement when parsing various block declarations that do
not allow attribute-specifier-seq at the start rolls back the attributes
only if std_attrs is non-NULL (i.e. some attributes have been parsed),
but doesn't roll back if some tokens were parsed as attribute-specifier-seq,
but didn't yield any attributes (e.g. [[]][[]][[]][[]]), which means
we accept those empty attributes even in places where they don't appear
in the grammar.

The following patch fixes that by instead checking if there are any
tokens to roll back.  This makes the parsing handle the first
function the same as the second one (where some attribute appears).

The testcase contains two xfails: using namespace ... apparently
allows attributes at the start, and in that case the attributes shall
appertain to the using-directive.  To be fixed incrementally.
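A toy token stream shows the difference between the two checks. The sketch below is hypothetical: `toy_parser`, `scan_result` and `rollback_leading_attrs` are made up here and are not GCC's cp_lexer. It records the position before parsing an attribute-specifier-seq and compares positions afterwards. That is the idea behind `has_std_attrs` in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy token stream (not GCC's cp_lexer).
struct toy_parser
{
  std::vector<std::string> toks;
  std::size_t pos = 0;

  explicit toy_parser (std::vector<std::string> t) : toks (std::move (t)) {}

  bool next_is (const char *s) const
  { return pos < toks.size () && toks[pos] == s; }

  // Parse zero or more "[[ name... ]]" groups; return the names seen.
  std::vector<std::string> parse_attr_seq ()
  {
    std::vector<std::string> attrs;
    while (next_is ("[["))
      {
        ++pos;
        while (pos < toks.size () && toks[pos] != "]]")
          attrs.push_back (toks[pos++]);
        if (next_is ("]]"))
          ++pos;
      }
    return attrs;
  }
};

struct scan_result
{
  bool consumed;       // were any tokens parsed as an attribute-specifier-seq?
  std::size_t nattrs;  // how many attributes did they yield?
};

// For a construct that does not accept leading attributes: parse them,
// then un-parse them so the construct's own parser sees the raw tokens.
scan_result
rollback_leading_attrs (toy_parser &p)
{
  std::size_t start = p.pos;
  std::vector<std::string> attrs = p.parse_attr_seq ();
  // Testing !attrs.empty () would miss "[[]][[]]", which yields no
  // attributes but still consumes tokens; compare positions instead.
  bool consumed = p.pos != start;
  if (consumed)
    p.pos = start;
  return scan_result { consumed, attrs.size () };
}
```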

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2021-07-30  Jakub Jelinek  

* parser.c (cp_parser_statement): Rollback attributes not just
when std_attrs is non-NULL, but whenever
cp_parser_std_attribute_spec_seq parsed any tokens.

* g++.dg/cpp0x/gen-attrs-76.C: New test.

--- gcc/cp/parser.c.jj  2021-07-30 11:19:39.431614703 +0200
+++ gcc/cp/parser.c 2021-07-30 12:22:16.995130642 +0200
@@ -11909,6 +11909,7 @@ cp_parser_statement (cp_parser* parser,
cp_token *token;
location_t statement_location, attrs_loc;
bool in_omp_attribute_pragma = parser->lexer->in_omp_attribute_pragma;
+  bool has_std_attrs;
  
   restart:

if (if_p != NULL)
@@ -11917,7 +11918,8 @@ cp_parser_statement (cp_parser* parser,
statement = NULL_TREE;
  
saved_token_sentinel saved_tokens (parser->lexer);

-  attrs_loc = cp_lexer_peek_token (parser->lexer)->location;
+  token = cp_lexer_peek_token (parser->lexer);
+  attrs_loc = token->location;
if (c_dialect_objc ())
  /* In obj-c++, seeing '[[' might be the either the beginning of
 c++11 attributes, or a nested objc-message-expression.  So
@@ -11931,6 +11933,7 @@ cp_parser_statement (cp_parser* parser,
if (!cp_parser_parse_definitely (parser))
std_attrs = NULL_TREE;
  }
+  has_std_attrs = cp_lexer_peek_token (parser->lexer) != token;
  
if (std_attrs && (flag_openmp || flag_openmp_simd))

  std_attrs = cp_parser_handle_statement_omp_attributes (parser, std_attrs);
@@ -11999,7 +12002,7 @@ cp_parser_statement (cp_parser* parser,
  
  	case RID_NAMESPACE:

  /* This must be a namespace alias definition.  */
- if (std_attrs != NULL_TREE)
+ if (has_std_attrs)
{
  /* Attributes should be parsed as part of the
 declaration, so let's un-parse them.  */
@@ -12104,7 +12107,7 @@ cp_parser_statement (cp_parser* parser,
  {
if (cp_lexer_next_token_is_not (parser->lexer, CPP_SEMICOLON))
{
- if (std_attrs != NULL_TREE)
+ if (has_std_attrs)
/* Attributes should be parsed as part of the declaration,
   so let's un-parse them.  */
saved_tokens.rollback();
@@ -12116,7 +12119,7 @@ cp_parser_statement (cp_parser* parser,
  if (cp_parser_parse_definitely (parser))
return;
  /* It didn't work, restore the post-attribute position.  */
- if (std_attrs)
+ if (has_std_attrs)
cp_lexer_set_token_position (parser->lexer, statement_token);
}
/* All preceding labels have been parsed at this point.  */
--- gcc/testsuite/g++.dg/cpp0x/gen-attrs-76.C.jj	2021-07-30 12:16:59.472477365 +0200
+++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-76.C	2021-07-30 12:21:32.440740569 +0200
@@ -0,0 +1,31 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wno-attributes" }
+
+namespace N {}
+namespace O { typedef int T; };
+
+void
+foo ()
+{
+  [[]] asm (""); // { dg-error "expected" }
+  [[]] __extension__ asm ("");   // { dg-error "expected" }
+  __extension__ [[]] asm ("");   // { dg-error "expected" }
+  [[]] namespace M = ::N;  // { dg-error "expected" }
+  [[]] using namespace N;  // { dg-bogus "expected" "" { xfail *-*-* } }
+  [[]] using O::T; // { dg-error "expected" }
+  [[]] __label__ foo;  // { dg-error "expected" }
+  [[]] static_assert (true, ""); // { dg-error "expected" }
+}
+
+void
+bar ()
+{
+  [[gnu::unused]] asm ("");  // { dg-error "expected" }
+  [[gnu::unused]] __extension__ asm ("");// { dg-error "expected" }
+  __extension__ [[gnu::unused]] asm ("");// { dg-error "expected" }
+  [[gnu::unused]] namespace M = ::N;   // { dg-error "expected" }
+  [[gnu::unused]] using namespace N;   // { dg-bogus "expected" "" { 

Re: [PATCH] c++: Implement P0466R5 __cpp_lib_is_pointer_interconvertible compiler helpers [PR101539]

2021-07-30 Thread Jason Merrill via Gcc-patches

On 7/30/21 5:51 AM, Jakub Jelinek wrote:

On Thu, Jul 29, 2021 at 04:38:44PM -0400, Jason Merrill wrote:

We don't already reject an anonymous struct with bases?  I think we should
do so, in fixup_anonymous_aggr.  We might even require anonymous structs to
be standard-layout.


Not having base classes seems reasonable requirement for the anonymous
structures, after all, I couldn't find a way to refer to the members
in the base class - ::e is rejected with the above.


Patch attached.


But standard layout means that even all the non-static members of the struct
need to be standard-layout, that seems an unnecessary requirement for
anon structures to me.


Good point.

But then, if the anonymous struct is non-standard-layout, that should 
make the enclosing class non-standard-layout as well, so we should never 
need to consider in the pointer-interconv code whether the anonymous 
struct is standard-layout.



+/* Helper function for pointer_interconvertible_base_of_p.  Verify
+   that BINFO_TYPE (BINFO) is pointer interconvertible with BASE.  */
+
+static bool
+pointer_interconvertible_base_of_p_1 (tree binfo, tree base)
+{
+  for (tree field = TYPE_FIELDS (BINFO_TYPE (binfo));
+   field; field = DECL_CHAIN (field))
+if (TREE_CODE (field) == FIELD_DECL && !DECL_FIELD_IS_BASE (field))
+  return false;


I think checking non-static data members is a bug in the resolution of CWG
2254, which correctly changed 11.4 to say that the address of a
standard-layout class is the same as the address of each base whether or not
the class has non-static data members, but didn't change
pointer-interconvertibility enough to match.  I've raised this with CWG.

I think we don't need this function at all.
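The property under discussion can be checked with ordinary standard-layout classes. In this sketch the class names are made up: `X` has a non-static data member and `N` has none. In both cases the empty base subobject `E` shares the address of the complete object. As described above, the CWG 2254 change guarantees this for standard-layout classes regardless of whether they have non-static data members.

```cpp
#include <cassert>
#include <type_traits>

struct E { };              // empty base
struct X : E { int i; };   // standard-layout: only X declares data members
struct N : E { };          // standard-layout, no non-static data members

static_assert (std::is_standard_layout<X>::value, "X is standard-layout");
static_assert (std::is_standard_layout<N>::value, "N is standard-layout");
```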


Ok.

...

Instead of checking !UNION_TYPE twice above, you could check it once here
and return false.


Here is an updated patch, which includes the incremental patch for
non-std-layout unions (with no changes for non-stdlayout in anon structure
in union though) and has your review comments above incorporated.
All the changes from the combination of the original and incremental patch
are in gcc/cp/semantics.c and
gcc/testsuite/g++.dg/cpp2a/is-pointer-interconvertible-base-of1.C.
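The layout fact behind the union and anonymous-structure handling can be sketched in C11 terms. This is only an analogy: the patch implements C++ standard-layout and pointer-interconvertibility rules in gcc/cp/semantics.c, not C rules. In C, the first member of a struct and every member of a union start at the enclosing object's address, so a member that comes first in an anonymous struct nested inside an anonymous union still shares the address of the outer object. The type `struct S` and the helper `union_members_share_address` are hypothetical names for illustration.

```c
#include <stddef.h>

/* Hypothetical layout: an anonymous union whose first alternative is
   an anonymous struct (C11 anonymous members).  */
struct S
{
  union
  {
    struct
    {
      int a;   /* first member of the anonymous struct */
      int b;   /* follows a, so it does not share S's address */
    };
    long c;    /* another union alternative, also at offset 0 */
  };
  int d;       /* after the union */
};

/* Returns 1 when both A (first in the anonymous struct) and C (a union
   member) have the same address as the enclosing object.  */
static int
union_members_share_address (const struct S *s)
{
  const void *base = s;
  return (const void *) &s->a == base && (const void *) &s->c == base;
}
```

Here `a` and `c` both sit at offset 0 even though they are reached through two levels of anonymous aggregates, while `b` and `d` do not.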

2021-07-30  Jakub Jelinek  

PR c++/101539
gcc/c-family/
* c-common.h (enum rid): Add RID_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
* c-common.c (c_common_reswords): Add
__is_pointer_interconvertible_base_of.
gcc/cp/
* cp-tree.h (enum cp_trait_kind): Add
CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
(enum cp_built_in_function): Add
CP_BUILT_IN_IS_POINTER_INTERCONVERTIBLE_WITH_CLASS.
(fold_builtin_is_pointer_inverconvertible_with_class): Declare.
* parser.c (cp_parser_primary_expression): Handle
RID_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
(cp_parser_trait_expr): Likewise.
* cp-objcp-common.c (names_builtin_p): Likewise.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
* decl.c (cxx_init_decl_processing): Register
__builtin_is_pointer_interconvertible_with_class builtin.
* constexpr.c (cxx_eval_builtin_function_call): Handle
CP_BUILT_IN_IS_POINTER_INTERCONVERTIBLE_WITH_CLASS builtin.
* semantics.c (pointer_interconvertible_base_of_p,
first_nonstatic_data_member_p,
fold_builtin_is_pointer_inverconvertible_with_class): New functions.
(trait_expr_value): Handle CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
(finish_trait_expr): Likewise.  Formatting fix.
* cp-gimplify.c (cp_gimplify_expr): Fold
CP_BUILT_IN_IS_POINTER_INTERCONVERTIBLE_WITH_CLASS.  Call
fndecl_built_in_p just once.
(cp_fold): Likewise.
* tree.c (builtin_valid_in_constant_expr_p): Handle
CP_BUILT_IN_IS_POINTER_INTERCONVERTIBLE_WITH_CLASS.  Call
fndecl_built_in_p just once.
* cxx-pretty-print.c (pp_cxx_trait_expression): Handle
CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
gcc/testsuite/
* g++.dg/cpp2a/is-pointer-interconvertible-base-of1.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class1.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class2.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class3.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class4.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class5.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class6.C: New test.

--- gcc/c-family/c-common.h.jj  2021-07-29 13:24:08.368481637 +0200
+++ gcc/c-family/c-common.h 2021-07-30 11:19:39.391615252 +0200
@@ -174,6 +174,7 @@ enum rid
RID_IS_BASE_OF,  RID_IS_CLASS,
RID_IS_EMPTY,RID_IS_ENUM,
RID_IS_FINAL,RID_IS_LITERAL_TYPE,
+  RID_IS_POINTER_INTERCONVERTIBLE_BASE_OF,
RID_IS_POD,  

Re: [PATCH] Add emulated gather capability to the vectorizer

2021-07-30 Thread Kewen.Lin via Gcc-patches
Hi Richi,

on 2021/7/30 7:34 PM, Richard Biener wrote:
> This adds a gather vectorization capability to the vectorizer
> without target support by decomposing the offset vector, doing
> scalar loads and then building a vector from the result.  This
> is aimed mainly at cases where vectorizing the rest of the loop
> offsets the cost of vectorizing the gather.
> 
> Note it's difficult to avoid vectorizing the offset load, but in
> some cases later passes can turn the vector load + extract into
> scalar loads, see the followup patch.
> 
> On SPEC CPU 2017 510.parest_r this improves runtime from 250s
> to 219s on a Zen2 CPU which has its native gather instructions
> disabled (using those the runtime instead increases to 254s)
> using -Ofast -march=znver2 [-flto].  It turns out the critical
> loops in this benchmark all perform gather operations.
> 

Wow, it sounds promising!
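A source-level C sketch of what "decomposing the offset vector, doing scalar loads and then building a vector" amounts to, hand-applied to the vmul loop from vect-gather-1.c. It assumes two-lane vectors via the GCC/Clang `vector_size` extension as a stand-in for an SSE2 V2DF register; the vectorizer itself works on GIMPLE, and the names `emulated_gather` and `vmul_emulated` are hypothetical.

```c
/* Two-lane vectors via the GCC/Clang vector_size extension, standing in
   for an SSE2 V2DF register and its offset vector (an assumption).  */
typedef double v2df __attribute__ ((vector_size (2 * sizeof (double))));
typedef int v2si __attribute__ ((vector_size (2 * sizeof (int))));

/* Emulated gather: extract each offset lane, do one scalar load per
   lane, then build a vector from the loaded values.  */
static v2df
emulated_gather (const double *base, v2si offsets)
{
  double lane0 = base[offsets[0]];
  double lane1 = base[offsets[1]];
  return (v2df) { lane0, lane1 };
}

/* Hand-vectorized form of the vmul loop from vect-gather-1.c.  */
double
vmul_emulated (const int *rowstart, const int *rowend,
               const double *luval, const double *dst)
{
  v2df acc = { 0.0, 0.0 };
  const int *col = rowstart;

  /* Main loop: two elements per iteration.  */
  for (; rowend - col >= 2; col += 2, luval += 2)
    {
      v2si offs = { col[0], col[1] };      /* vector load of the offsets */
      v2df vals = { luval[0], luval[1] };  /* contiguous load */
      acc += vals * emulated_gather (dst, offs);
    }

  /* Combine the two partial sums, then handle a leftover element.  */
  double res = acc[0] + acc[1];
  for (; col != rowend; ++col, ++luval)
    res += *luval * dst[*col];
  return res;
}
```

Keeping one partial sum per lane and adding the lanes at the end reassociates the floating-point reduction relative to the scalar loop, so vectorizing it requires reassociation to be allowed, as it is under the -Ofast the testcase uses.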

> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> Any comments?  I still plan to run this over full SPEC and
> I have to apply TLC to the followup patch before I can post it.
> 
> I think neither power nor z has gather so I'm curious if the
> patch helps 510.parest_r there, I'm unsure about neon/advsimd.

Yes, Power (latest Power10) doesn't support gather load.
I just measured 510.parest_r with this patch on Power9 with options
-Ofast -mcpu=power9 {,-funroll-loops}; both are neutral.

It fails to vectorize the loop in vect-gather-1.c:

vect-gather.c:12:28: missed:  failed: evolution of base is not affine.
vect-gather.c:11:46: missed:   not vectorized: data ref analysis failed _6 = *_5;
vect-gather.c:12:28: missed:   not vectorized: data ref analysis failed: _6 = *_5;
vect-gather.c:11:46: missed:  bad data references.
vect-gather.c:11:46: missed: couldn't vectorize loop

BR,
Kewen

> Both might need the followup patch - I was surprised about
> the speedup without it on Zen (the followup improves runtime
> to 198s there).
> 
> Thanks,
> Richard.
> 
> 2021-07-30  Richard Biener  
> 
>   * tree-vect-data-refs.c (vect_check_gather_scatter):
>   Include widening conversions only when the result is
>   still handled by native gather or the current offset
>   size does not already match the data size.
>   Also succeed analysis in case there's no native support,
>   noted by an IFN_LAST ifn and a NULL decl.
>   * tree-vect-patterns.c (vect_recog_gather_scatter_pattern):
>   Test for no IFN gather rather than decl gather.
>   * tree-vect-stmts.c (vect_model_load_cost): Pass in the
>   gather-scatter info and cost emulated gathers accordingly.
>   (vect_truncate_gather_scatter_offset): Properly test for
>   no IFN gather.
>   (vect_use_strided_gather_scatters_p): Likewise.
>   (get_load_store_type): Handle emulated gathers and their
>   restrictions.
>   (vectorizable_load): Likewise.  Emulate them by extracting
>   scalar offsets, doing scalar loads and a vector construct.
> 
>   * gcc.target/i386/vect-gather-1.c: New testcase.
>   * gfortran.dg/vect/vect-8.f90: Adjust.
> ---
>  gcc/testsuite/gcc.target/i386/vect-gather-1.c |  18 
>  gcc/testsuite/gfortran.dg/vect/vect-8.f90 |   2 +-
>  gcc/tree-vect-data-refs.c |  29 +++--
>  gcc/tree-vect-patterns.c  |   2 +-
>  gcc/tree-vect-stmts.c | 100 --
>  5 files changed, 136 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-gather-1.c
> 
> diff --git a/gcc/testsuite/gcc.target/i386/vect-gather-1.c b/gcc/testsuite/gcc.target/i386/vect-gather-1.c
> new file mode 100644
> index 000..134aef39666
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-gather-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -msse2 -fdump-tree-vect-details" } */
> +
> +#ifndef INDEXTYPE
> +#define INDEXTYPE int
> +#endif
> +double vmul(INDEXTYPE *rowstart, INDEXTYPE *rowend,
> +            double *luval, double *dst)
> +{
> +  double res = 0;
> +  for (const INDEXTYPE * col = rowstart; col != rowend; ++col, ++luval)
> +    res += *luval * dst[*col];
> +  return res;
> +}
> +
> +/* With gather emulation this should be profitable to vectorize
> +   even with plain SSE2.  */
> +/* { dg-final { scan-tree-dump "loop vectorized" "vect" } } */
> diff --git a/gcc/testsuite/gfortran.dg/vect/vect-8.f90 b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> index 9994805d77f..cc1aebfbd84 100644
> --- a/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> +++ b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> @@ -706,5 +706,5 @@ END SUBROUTINE kernel
>  
>  ! { dg-final { scan-tree-dump-times "vectorized 24 loops" 1 "vect" { target aarch64_sve } } }
>  ! { dg-final { scan-tree-dump-times "vectorized 23 loops" 1 "vect" { target { aarch64*-*-* && { ! aarch64_sve } } } } }
> -! { dg-final { scan-tree-dump-times "vectorized 2\[23\] loops" 1 "vect" { target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
> +! 

RE: [PATCH] Add emulated gather capability to the vectorizer

2021-07-30 Thread Tamar Christina via Gcc-patches
> -----Original Message-----
> From: Gcc-patches  bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Richard Biener
> Sent: Friday, July 30, 2021 12:34 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford 
> Subject: [PATCH] Add emulated gather capability to the vectorizer
> 
> This adds a gather vectorization capability to the vectorizer without target
> support by decomposing the offset vector, doing scalar loads and then
> building a vector from the result.  This is aimed mainly at cases where
> vectorizing the rest of the loop offsets the cost of vectorizing the gather.
> 
> Note it's difficult to avoid vectorizing the offset load, but in some cases
> later passes can turn the vector load + extract into scalar loads, see the
> followup patch.
> 
> On SPEC CPU 2017 510.parest_r this improves runtime from 250s to 219s on a
> Zen2 CPU which has its native gather instructions disabled (using those the
> runtime instead increases to 254s) using -Ofast -march=znver2 [-flto].  It
> turns out the critical loops in this benchmark all perform gather operations.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> Any comments?  I still plan to run this over full SPEC and I have to apply
> TLC to the followup patch before I can post it.
> 

Interesting patch. I wonder if this can be used to vectorize the first loop
of satd in x264. There, the loads look like a gather because of how the DRs
are detected; you could transform it into a linear load and widening
operations.

But we could never pattern-match it, because no SLP tree was ever created
because of the loads.  Maybe this will allow us to do so?

Cheers,
Tamar

> I think neither power nor z has gather so I'm curious if the patch helps
> 510.parest_r there, I'm unsure about neon/advsimd.
> Both might need the followup patch - I was surprised about the speedup
> without it on Zen (the followup improves runtime to 198s there).
> 
> Thanks,
> Richard.
> 
> 2021-07-30  Richard Biener  
> 
>   * tree-vect-data-refs.c (vect_check_gather_scatter):
>   Include widening conversions only when the result is
>   still handled by native gather or the current offset
>   size does not already match the data size.
>   Also succeed analysis in case there's no native support,
>   noted by an IFN_LAST ifn and a NULL decl.
>   * tree-vect-patterns.c (vect_recog_gather_scatter_pattern):
>   Test for no IFN gather rather than decl gather.
>   * tree-vect-stmts.c (vect_model_load_cost): Pass in the
>   gather-scatter info and cost emulated gathers accordingly.
>   (vect_truncate_gather_scatter_offset): Properly test for
>   no IFN gather.
>   (vect_use_strided_gather_scatters_p): Likewise.
>   (get_load_store_type): Handle emulated gathers and their
>   restrictions.
>   (vectorizable_load): Likewise.  Emulate them by extracting
>   scalar offsets, doing scalar loads and a vector construct.
> 
>   * gcc.target/i386/vect-gather-1.c: New testcase.
>   * gfortran.dg/vect/vect-8.f90: Adjust.
> ---
>  gcc/testsuite/gcc.target/i386/vect-gather-1.c |  18 
>  gcc/testsuite/gfortran.dg/vect/vect-8.f90 |   2 +-
>  gcc/tree-vect-data-refs.c |  29 +++--
>  gcc/tree-vect-patterns.c  |   2 +-
>  gcc/tree-vect-stmts.c | 100 --
>  5 files changed, 136 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-gather-1.c
> 
> diff --git a/gcc/testsuite/gcc.target/i386/vect-gather-1.c b/gcc/testsuite/gcc.target/i386/vect-gather-1.c
> new file mode 100644
> index 000..134aef39666
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-gather-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -msse2 -fdump-tree-vect-details" } */
> +
> +#ifndef INDEXTYPE
> +#define INDEXTYPE int
> +#endif
> +double vmul(INDEXTYPE *rowstart, INDEXTYPE *rowend,
> +            double *luval, double *dst)
> +{
> +  double res = 0;
> +  for (const INDEXTYPE * col = rowstart; col != rowend; ++col, ++luval)
> +    res += *luval * dst[*col];
> +  return res;
> +}
> +
> +/* With gather emulation this should be profitable to vectorize
> +   even with plain SSE2.  */
> +/* { dg-final { scan-tree-dump "loop vectorized" "vect" } } */
> diff --git a/gcc/testsuite/gfortran.dg/vect/vect-8.f90 b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> index 9994805d77f..cc1aebfbd84 100644
> --- a/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> +++ b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> @@ -706,5 +706,5 @@ END SUBROUTINE kernel
> 
>  ! { dg-final { scan-tree-dump-times "vectorized 24 loops" 1 "vect" { target aarch64_sve } } }
>  ! { dg-final { scan-tree-dump-times "vectorized 23 loops" 1 "vect" { target { aarch64*-*-* && { ! aarch64_sve } } } } }
> -! { dg-final { scan-tree-dump-times "vectorized 2\[23\] loops" 1 "vect" { target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
> +! { 

Re: [PATCH] x86: Don't enable LZCNT/POPCNT if disabled explicitly

2021-07-30 Thread H.J. Lu via Gcc-patches
On Fri, Jul 30, 2021 at 6:42 AM Jakub Jelinek  wrote:
>
> On Fri, Jul 30, 2021 at 03:39:03PM +0200, Uros Bizjak via Gcc-patches wrote:
> > On Fri, Jul 30, 2021 at 3:04 PM H.J. Lu  wrote:
> > >
> > > gcc/
> > >
> > > PR target/101685
> > > * config/i386/i386-options.c (ix86_option_override_internal):
> > > Don't enable LZCNT/POPCNT if they have been disabled explicitly.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/101685
> > > * gcc.target/i386/pr-101685.c: New test.
>
> Can you please remove the hyphen from the test name?  We don't have
> any pr-*.c tests...

Will do.

> >
> > OK.
>
> Jakub
>


-- 
H.J.


Re: [PATCH] Add emulated gather capability to the vectorizer

2021-07-30 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> This adds a gather vectorization capability to the vectorizer
> without target support by decomposing the offset vector, doing
> scalar loads and then building a vector from the result.  This
> is aimed mainly at cases where vectorizing the rest of the loop
> offsets the cost of vectorizing the gather.
>
> Note it's difficult to avoid vectorizing the offset load, but in
> some cases later passes can turn the vector load + extract into
> scalar loads, see the followup patch.
>
> On SPEC CPU 2017 510.parest_r this improves runtime from 250s
> to 219s on a Zen2 CPU which has its native gather instructions
> disabled (using those the runtime instead increases to 254s)
> using -Ofast -march=znver2 [-flto].  It turns out the critical
> loops in this benchmark all perform gather operations.

Nice!

> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Any comments?  I still plan to run this over full SPEC and
> I have to apply TLC to the followup patch before I can post it.
>
> I think neither power nor z has gather so I'm curious if the
> patch helps 510.parest_r there, I'm unsure about neon/advsimd.
> Both might need the followup patch - I was surprised about
> the speedup without it on Zen (the followup improves runtime
> to 198s there).
>
> Thanks,
> Richard.
>
> 2021-07-30  Richard Biener  
>
>   * tree-vect-data-refs.c (vect_check_gather_scatter):
>   Include widening conversions only when the result is
>   still handled by native gather or the current offset
>   size does not already match the data size.
>   Also succeed analysis in case there's no native support,
>   noted by an IFN_LAST ifn and a NULL decl.
>   * tree-vect-patterns.c (vect_recog_gather_scatter_pattern):
>   Test for no IFN gather rather than decl gather.
>   * tree-vect-stmts.c (vect_model_load_cost): Pass in the
>   gather-scatter info and cost emulated gathers accordingly.
>   (vect_truncate_gather_scatter_offset): Properly test for
>   no IFN gather.
>   (vect_use_strided_gather_scatters_p): Likewise.
>   (get_load_store_type): Handle emulated gathers and their
>   restrictions.
>   (vectorizable_load): Likewise.  Emulate them by extracting
>   scalar offsets, doing scalar loads and a vector construct.
>
>   * gcc.target/i386/vect-gather-1.c: New testcase.
>   * gfortran.dg/vect/vect-8.f90: Adjust.
> ---
>  gcc/testsuite/gcc.target/i386/vect-gather-1.c |  18 
>  gcc/testsuite/gfortran.dg/vect/vect-8.f90 |   2 +-
>  gcc/tree-vect-data-refs.c |  29 +++--
>  gcc/tree-vect-patterns.c  |   2 +-
>  gcc/tree-vect-stmts.c | 100 --
>  5 files changed, 136 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-gather-1.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-gather-1.c b/gcc/testsuite/gcc.target/i386/vect-gather-1.c
> new file mode 100644
> index 000..134aef39666
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-gather-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -msse2 -fdump-tree-vect-details" } */
> +
> +#ifndef INDEXTYPE
> +#define INDEXTYPE int
> +#endif
> +double vmul(INDEXTYPE *rowstart, INDEXTYPE *rowend,
> +            double *luval, double *dst)
> +{
> +  double res = 0;
> +  for (const INDEXTYPE * col = rowstart; col != rowend; ++col, ++luval)
> +    res += *luval * dst[*col];
> +  return res;
> +}
> +
> +/* With gather emulation this should be profitable to vectorize
> +   even with plain SSE2.  */
> +/* { dg-final { scan-tree-dump "loop vectorized" "vect" } } */
> diff --git a/gcc/testsuite/gfortran.dg/vect/vect-8.f90 b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> index 9994805d77f..cc1aebfbd84 100644
> --- a/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> +++ b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> @@ -706,5 +706,5 @@ END SUBROUTINE kernel
>  
>  ! { dg-final { scan-tree-dump-times "vectorized 24 loops" 1 "vect" { target aarch64_sve } } }
>  ! { dg-final { scan-tree-dump-times "vectorized 23 loops" 1 "vect" { target { aarch64*-*-* && { ! aarch64_sve } } } } }
> -! { dg-final { scan-tree-dump-times "vectorized 2\[23\] loops" 1 "vect" { target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
> +! { dg-final { scan-tree-dump-times "vectorized 2\[234\] loops" 1 "vect" { target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
>  ! { dg-final { scan-tree-dump-times "vectorized 17 loops" 1 "vect" { target { { ! vect_intdouble_cvt } && { ! aarch64*-*-* } } } } }
> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index 6995efba899..0279e75fa8e 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -4007,8 +4007,26 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> continue;
>   }
>  
> -   if (TYPE_PRECISION (TREE_TYPE (op0))
> -  

Re: [PATCH] x86: Don't enable LZCNT/POPCNT if disabled explicitly

2021-07-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 30, 2021 at 03:39:03PM +0200, Uros Bizjak via Gcc-patches wrote:
> On Fri, Jul 30, 2021 at 3:04 PM H.J. Lu  wrote:
> >
> > gcc/
> >
> > PR target/101685
> > * config/i386/i386-options.c (ix86_option_override_internal):
> > Don't enable LZCNT/POPCNT if they have been disabled explicitly.
> >
> > gcc/testsuite/
> >
> > PR target/101685
> > * gcc.target/i386/pr-101685.c: New test.

Can you please remove the hyphen from the test name?  We don't have
any pr-*.c tests...
> 
> OK.

Jakub



Re: [PATCH] x86: Don't enable LZCNT/POPCNT if disabled explicitly

2021-07-30 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 30, 2021 at 3:04 PM H.J. Lu  wrote:
>
> gcc/
>
> PR target/101685
> * config/i386/i386-options.c (ix86_option_override_internal):
> Don't enable LZCNT/POPCNT if they have been disabled explicitly.
>
> gcc/testsuite/
>
> PR target/101685
> * gcc.target/i386/pr-101685.c: New test.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-options.c|  6 --
>  gcc/testsuite/gcc.target/i386/pr-101685.c | 10 ++
>  2 files changed, 14 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr-101685.c
>
> diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> index 3416a4f1752..6b789988baa 100644
> --- a/gcc/config/i386/i386-options.c
> +++ b/gcc/config/i386/i386-options.c
> @@ -2124,8 +2124,10 @@ ix86_option_override_internal (bool main_args_p,
> if (((processor_alias_table[i].flags & PTA_ABM) != 0)
> && !TARGET_EXPLICIT_ABM_P (opts))
>   {
> -   SET_TARGET_LZCNT (opts);
> -   SET_TARGET_POPCNT (opts);
> +   if (!TARGET_EXPLICIT_LZCNT_P (opts))
> + SET_TARGET_LZCNT (opts);
> +   if (!TARGET_EXPLICIT_POPCNT_P (opts))
> + SET_TARGET_POPCNT (opts);
>   }
>
> if ((processor_alias_table[i].flags
> diff --git a/gcc/testsuite/gcc.target/i386/pr-101685.c b/gcc/testsuite/gcc.target/i386/pr-101685.c
> new file mode 100644
> index 000..0c743ecad00
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr-101685.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=amdfam10 -mno-lzcnt -mno-popcnt" } */
> +
> +#ifdef __LZCNT__
> +# error LZCNT should be disabled
> +#endif
> +
> +#ifdef __POPCNT__
> +# error POPCNT should be disabled
> +#endif
> --
> 2.31.1
>


Re: [PATCH][gcc/doc] Improve nonnull attribute documentation

2021-07-30 Thread Tom de Vries
On 7/30/21 9:25 AM, Richard Biener wrote:
> On Wed, 28 Jul 2021, Tom de Vries wrote:
> 
>> Hi,
>>
>> Improve nonnull attribute documentation in a number of ways:
>>
>> Reorganize discussion of effects into:
>> - effects for calls to functions with nonnull-marked parameters, and
>> - effects for function definitions with nonnull-marked parameters.
>> This makes it clear that -fno-delete-null-pointer-checks has no effect for
>> optimizations based on nonnull-marked parameters in function definitions
>> (see PR100404).
>>
>> Mention -Wnonnull-compare.
>>
>> Mention workaround from PR100404 comment 7.
>>
>> The workaround can be used for this scenario.  Say we have a test.c:
>> ...
>>  #include 
>>
>>  extern int isnull (char *ptr) __attribute__ ((nonnull));
>>  int isnull (char *ptr)
>>  {
>>    if (ptr == 0)
>>      return 1;
>>    return 0;
>>  }
>>
>>  int
>>  main (void)
>>  {
>>    char *ptr = NULL;
>>    if (isnull (ptr)) __builtin_abort ();
>>    return 0;
>>  }
>> ...
>>
>> The test-case contains a mistake: ptr == NULL, and we want to detect the
>> mistake using an abort:
>> ...
>> $ gcc test.c
>> $ ./a.out
>> Aborted (core dumped)
>> ...
>>
>> At -O2 however, the mistake is not detected:
>> ...
>> $ gcc test.c -O2
>> $ ./a.out
>> ...
>> which is what -Wnonnull-compare (not shown here) warns about.
>>
>> The easiest way to fix this is by dropping the nonnull attribute.  But that
>> also disables -Wnonnull, which would detect something like:
>> ...
>>   if (isnull (NULL)) __builtin_abort ();
>> ...
>> at compile time.
>>
>> Using this workaround:
>> ...
>>  int isnull (char *ptr)
>>  {
>> +  asm ("" : "+r"(ptr));
>>    if (ptr == 0)
>>      return 1;
>>    return 0;
>>  }
>> ...
>> we still manage to detect the problem at runtime with -O2:
>> ...
>> $ ~/gcc_versions/devel/install/bin/gcc test.c -O2
>> $ ./a.out
>> Aborted (core dumped)
>> ...
>> while keeping the possibility to detect "isnull (NULL)" at compile time.
>>
>> OK for trunk?
> 
> I think it's an improvement over the current situation but the
> inline-assembler suggestion to "fix" definition side optimizations
> is IMHO a hint that we need a better solution here.

Agreed.

[ If you want I can resubmit without that bit, if it's not acceptable. I
merely included it because it's the only solution I found advertised (in
bugzilla). ]

> Splitting
> the attribute into a caller and a callee side one, for example,
> or making -fno-delete-null-pointer-checks do what it suggests.
> 

I think the problem is that in fact there are two attributes:
- assume_nonnull
- verify_nonnull (or, want_nonnull or some such)
which got conflated in the nonnull attribute.  [ Which still could be
fine if you got an -fnonnull=assume/verify to switch between the two
interpretations. ]

Anyway, more concretely, my idea is that this should generate a -Wnonnull warning:
...
extern void foo (void *ptr) __attribute__((verify_nonnull (1)));
void bar (void) {
  foo (nullptr);
}
...
and the same with assume_nonnull (after all, the assumption is violated).

And this assert could be optimized away (with -Wnonnull-compare):
...
extern void foo (void *ptr) __attribute__((assume_nonnull (1)));
void foo (void *ptr) {
  assert (ptr != nullptr);
}
...
while the assert shouldn't be optimized away with verify_nonnull.

And likewise, this if could be optimized away by
-fdelete-null-pointer-checks:
... 
extern void foo (void *ptr) __attribute__((assume_nonnull (1)));
void bar (void *ptr) {
  foo (ptr);
  if (ptr != nullptr)
    { ... }
}
...
while the if shouldn't be optimized away with verify_nonnull.

I think this way of splitting up the functionality conforms to the
principle of least surprise and does not require thinking about caller /
callee distinction, nor does it require extension of
-fno-delete-null-pointer-checks.

Thanks,
- Tom

> And as suggested elsewhere the effect of -fno-delete-null-pointer-checks
> making objects at NULL address valid should be a target hook based
> on the address-space with the default implementation considering
> only the default address-space having no objects at NULL.
> 



Re: [PATCH] i386: Improve extensions of __builtin_clz and constant - __builtin_clz for -mno-lzcnt [PR78103]

2021-07-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 30, 2021 at 12:27:39PM +0200, Uros Bizjak wrote:
> Please put some space here, e.g.:
...
> Can you just name the relevant insn pattern and use
> 
> emit_insn (gen_bsr_1)?

Here is the updated patch.  I'll bootstrap/regtest it tonight.

2021-07-30  Jakub Jelinek  

PR target/78103
* config/i386/i386.md (bsr_rex64_1, bsr_1, bsr_zext_1): New
define_insn patterns.
(*bsr_rex64_2, *bsr_2): New define_insn_and_split patterns.
Add combine splitters for constant - clz.
(clz2): Use a temporary pseudo for bsr result.

* gcc.target/i386/pr78103-1.c: New test.
* gcc.target/i386/pr78103-2.c: New test.
* gcc.target/i386/pr78103-3.c: New test.
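The new patterns and splitters rest on a few identities about bsr and clz for nonzero inputs. As the RTL in the patch suggests, without LZCNT a clz is expressed as bsr (that is, 31 - clz or 63 - clz) followed by an xor with 31 or 63. A small C check of the 32-bit versions, assuming `unsigned int` is 32 bits; the function names are illustrative, not GCC internals:

```c
#include <stdint.h>

/* Index of the highest set bit, as bsr computes it.  Like bsr (and
   __builtin_clz), the result is undefined for x == 0, so callers must
   pass a nonzero value.  */
static unsigned
bsr32 (uint32_t x)
{
  return 31u - (unsigned) __builtin_clz (x);
}

/* bsr32 (x) is in [0, 31], and for v in that range 31 - v == (v ^ 31),
   so clz can be recovered from the bsr result with a single xor.  */
static unsigned
clz_from_bsr (uint32_t x)
{
  return bsr32 (x) ^ 31u;
}

/* 32 - __builtin_clz (x) is the bit width of x, i.e. bsr32 (x) + 1,
   so "constant - clz" needs no xor at all.  */
static unsigned
bit_width32 (uint32_t x)
{
  return 32u - (unsigned) __builtin_clz (x);
}
```

The xor identity only holds because the bsr result is known to lie in [0, 31] (or [0, 63]), which is exactly the range argument the comments in the patch make for dropping the sign extensions.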

--- gcc/config/i386/i386.md.jj  2021-07-28 12:05:56.857977764 +0200
+++ gcc/config/i386/i386.md 2021-07-30 15:13:49.994946550 +0200
@@ -14761,6 +14761,18 @@ (define_insn "bsr_rex64"
(set_attr "znver1_decode" "vector")
(set_attr "mode" "DI")])
 
+(define_insn "bsr_rex64_1"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (minus:DI (const_int 63)
+ (clz:DI (match_operand:DI 1 "nonimmediate_operand" "rm"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "!TARGET_LZCNT && TARGET_64BIT"
+  "bsr{q}\t{%1, %0|%0, %1}"
+  [(set_attr "type" "alu1")
+   (set_attr "prefix_0f" "1")
+   (set_attr "znver1_decode" "vector")
+   (set_attr "mode" "DI")])
+
 (define_insn "bsr"
   [(set (reg:CCZ FLAGS_REG)
(compare:CCZ (match_operand:SI 1 "nonimmediate_operand" "rm")
@@ -14775,17 +14787,204 @@ (define_insn "bsr"
(set_attr "znver1_decode" "vector")
(set_attr "mode" "SI")])
 
+(define_insn "bsr_1"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (minus:SI (const_int 31)
+ (clz:SI (match_operand:SI 1 "nonimmediate_operand" "rm"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "!TARGET_LZCNT"
+  "bsr{l}\t{%1, %0|%0, %1}"
+  [(set_attr "type" "alu1")
+   (set_attr "prefix_0f" "1")
+   (set_attr "znver1_decode" "vector")
+   (set_attr "mode" "SI")])
+
+(define_insn "bsr_zext_1"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (zero_extend:DI
+ (minus:SI
+   (const_int 31)
+   (clz:SI (match_operand:SI 1 "nonimmediate_operand" "rm")))))
+   (clobber (reg:CC FLAGS_REG))]
+  "!TARGET_LZCNT && TARGET_64BIT"
+  "bsr{l}\t{%1, %k0|%k0, %1}"
+  [(set_attr "type" "alu1")
+   (set_attr "prefix_0f" "1")
+   (set_attr "znver1_decode" "vector")
+   (set_attr "mode" "SI")])
+
+; As bsr is undefined behavior on zero and for other input
+; values it is in range 0 to 63, we can optimize away sign-extends.
+(define_insn_and_split "*bsr_rex64_2"
+  [(set (match_operand:DI 0 "register_operand")
+   (xor:DI
+ (sign_extend:DI
+   (minus:SI
+ (const_int 63)
+ (subreg:SI (clz:DI (match_operand:DI 1 "nonimmediate_operand"))
+0)))
+ (const_int 63)))
+   (clobber (reg:CC FLAGS_REG))]
+  "!TARGET_LZCNT && TARGET_64BIT && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel [(set (reg:CCZ FLAGS_REG)
+  (compare:CCZ (match_dup 1) (const_int 0)))
+ (set (match_dup 2)
+  (minus:DI (const_int 63) (clz:DI (match_dup 1))))])
+   (parallel [(set (match_dup 0)
+  (zero_extend:DI (xor:SI (match_dup 3) (const_int 63))))
+ (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[2] = gen_reg_rtx (DImode);
+  operands[3] = lowpart_subreg (SImode, operands[2], DImode);
+})
+
+(define_insn_and_split "*bsr_2"
+  [(set (match_operand:DI 0 "register_operand")
+   (sign_extend:DI
+ (xor:SI
+   (minus:SI
+ (const_int 31)
+ (clz:SI (match_operand:SI 1 "nonimmediate_operand")))
+   (const_int 31))))
+   (clobber (reg:CC FLAGS_REG))]
+  "!TARGET_LZCNT && TARGET_64BIT && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel [(set (reg:CCZ FLAGS_REG)
+  (compare:CCZ (match_dup 1) (const_int 0)))
+ (set (match_dup 2)
+  (minus:SI (const_int 31) (clz:SI (match_dup 1))))])
+   (parallel [(set (match_dup 0)
+  (zero_extend:DI (xor:SI (match_dup 2) (const_int 31))))
+ (clobber (reg:CC FLAGS_REG))])]
+  "operands[2] = gen_reg_rtx (SImode);")
+
+; Splitters to optimize 64 - __builtin_clzl (x) or 32 - __builtin_clz (x).
+; Again, as for !TARGET_LZCNT CLZ is UB at zero, CLZ is guaranteed to be
+; in [0, 63] or [0, 31] range.
+(define_split
+  [(set (match_operand:SI 0 "register_operand")
+   (minus:SI
+ (match_operand:SI 2 "const_int_operand")
+ (xor:SI
+   (minus:SI (const_int 63)
+ (subreg:SI
+   (clz:DI (match_operand:DI 1 "nonimmediate_operand"))
+   0))
+   (const_int 63))))]
+  "!TARGET_LZCNT && TARGET_64BIT && ix86_pre_reload_split ()"
+  [(set (match_dup 3)
+   (minus:DI (const_int 63) (clz:DI (match_dup 1

[PATCH] x86: Don't enable LZCNT/POPCNT if disabled explicitly

2021-07-30 Thread H.J. Lu via Gcc-patches
gcc/

PR target/101685
* config/i386/i386-options.c (ix86_option_override_internal):
Don't enable LZCNT/POPCNT if they have been disabled explicitly.

gcc/testsuite/

PR target/101685
* gcc.target/i386/pr-101685.c: New test.
---
 gcc/config/i386/i386-options.c|  6 --
 gcc/testsuite/gcc.target/i386/pr-101685.c | 10 ++
 2 files changed, 14 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr-101685.c

diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 3416a4f1752..6b789988baa 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2124,8 +2124,10 @@ ix86_option_override_internal (bool main_args_p,
if (((processor_alias_table[i].flags & PTA_ABM) != 0)
&& !TARGET_EXPLICIT_ABM_P (opts))
  {
-   SET_TARGET_LZCNT (opts);
-   SET_TARGET_POPCNT (opts);
+   if (!TARGET_EXPLICIT_LZCNT_P (opts))
+ SET_TARGET_LZCNT (opts);
+   if (!TARGET_EXPLICIT_POPCNT_P (opts))
+ SET_TARGET_POPCNT (opts);
  }
 
if ((processor_alias_table[i].flags
diff --git a/gcc/testsuite/gcc.target/i386/pr-101685.c b/gcc/testsuite/gcc.target/i386/pr-101685.c
new file mode 100644
index 000..0c743ecad00
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr-101685.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=amdfam10 -mno-lzcnt -mno-popcnt" } */
+
+#ifdef __LZCNT__
+# error LZCNT should be disabled
+#endif
+
+#ifdef __POPCNT__
+# error POPCNT should be disabled
+#endif
-- 
2.31.1
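The fix matters for code that keys off these predefined macros. When -mno-lzcnt or -mno-popcnt is given explicitly, `__LZCNT__` and `__POPCNT__` should stay undefined even for a -march whose PTA_ABM flag would otherwise turn them on. A minimal C sketch of such feature checks, plus the zero-input difference between lzcnt and `__builtin_clz`. The function names are hypothetical, and the code assumes `unsigned int` is 32 bits.

```c
#include <stdint.h>

/* Bit 0: __LZCNT__ defined; bit 1: __POPCNT__ defined.  With the new
   test's options (-march=amdfam10 -mno-lzcnt -mno-popcnt) both macros
   should be undefined once the fix is in, so this would return 0.  */
static unsigned
enabled_bit_features (void)
{
  unsigned mask = 0;
#ifdef __LZCNT__
  mask |= 1u;
#endif
#ifdef __POPCNT__
  mask |= 2u;
#endif
  return mask;
}

/* Leading-zero count that is also defined for 0.  __builtin_clz (0) is
   undefined, whereas the lzcnt instruction returns the operand width
   (32) for a zero input.  */
static unsigned
lzcnt32 (uint32_t x)
{
  return x ? (unsigned) __builtin_clz (x) : 32u;
}

/* __builtin_popcount is available either way; the popcnt instruction
   can only be used to implement it when POPCNT is enabled.  */
static unsigned
popcount32 (uint32_t x)
{
  return (unsigned) __builtin_popcount (x);
}
```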



Re: [PATCH v4] Add QI vector mode support to by-pieces for memset

2021-07-30 Thread H.J. Lu via Gcc-patches
On Fri, Jul 30, 2021 at 2:06 AM Richard Sandiford
 wrote:
>
> "H.J. Lu via Gcc-patches"  writes:
> > On Mon, Jul 26, 2021 at 2:53 PM Richard Sandiford
> >  wrote:
> >>
> >> "H.J. Lu via Gcc-patches"  writes:
> >> > On Mon, Jul 26, 2021 at 11:42 AM Richard Sandiford
> >> >  wrote:
> >> >>
> >> >> "H.J. Lu via Gcc-patches"  writes:
> >> >> > +to avoid stack realignment when expanding memset.  The default is
> >> >> > +@code{gen_reg_rtx}.
> >> >> > +@end deftypefn
> >> >> > +
> >> >> >  @deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST (unsigned @var{nunroll}, class loop *@var{loop})
> >> >> >  This target hook returns a new value for the number of times @var{loop}
> >> >> >  should be unrolled. The parameter @var{nunroll} is the number of times
> >> >> > […]
> >> >> > @@ -1446,7 +1511,10 @@ can_store_by_pieces (unsigned HOST_WIDE_INT len,
> >> >> >max_size = STORE_MAX_PIECES + 1;
> >> >> >while (max_size > 1 && l > 0)
> >> >> >   {
> >> >> > -   scalar_int_mode mode = widest_int_mode_for_size (max_size);
> >> >> > +   /* Since this can be called before virtual registers are ready
> >> >> > +  to use, avoid QI vector mode here.  */
> >> >> > +   fixed_size_mode mode
> >> >> > + = widest_fixed_size_mode_for_size (max_size, false);
> >> >>
> >> >> I think I might have asked this before, sorry, but: when is that true
> >> >> and why does it matter?
> >> >
> >> > can_store_by_pieces may be called:
> >> >
> >> > value-prof.c:  if (!can_store_by_pieces (val, builtin_memset_read_str,
> >> > value-prof.c:  if (!can_store_by_pieces (val, builtin_memset_read_str,
> >> >
> >> > before virtual registers can be used.   When true is passed to
> >> > widest_fixed_size_mode_for_size,  virtual registers may be used
> >> > to expand memset to broadcast, which leads to ICE.   Since for the
> >> > purpose of can_store_by_pieces, we don't need to expand memset
> >> > to broadcast and pass false here can avoid ICE.
> >>
> >> Ah, I see, thanks.
> >>
> >> That sounds like a problem in the way that the memset const function is
> >> written though.  can_store_by_pieces is just a query function, so I don't
> >> think it should be trying to create new registers for can_store_by_pieces,
> >> even if it could.  At the same time, can_store_by_pieces should make the
> >> same choices as the real expander would.
> >>
> >> I think this means that:
> >>
> >> - gen_memset_broadcast should be inlined into its callers, with the
> >>   builtin_memset_read_str getting the CONST_INT_P case and
> >>   builtin_memset_gen_str getting the variable case.
> >>
> >> - builtin_memset_read_str should then stop at and return the
> >>   gen_const_vec_duplicate when the prev argument is null.
> >>   Only when prev is nonnull should it go on to call the hook
> >>   and copy the constant to the register that the hook returns.
> >
> > How about keeping gen_memset_broadcast and passing PREV to it:
> >
> >   rtx target;
> >   if (CONST_INT_P (data))
> >     {
> >       rtx const_vec = gen_const_vec_duplicate (mode, data);
> >       if (prev == NULL)
> >         /* Return CONST_VECTOR when called by a query function.  */
> >         target = const_vec;
> >       else
> >         {
> >           /* Use the move expander with CONST_VECTOR.  */
> >           target = targetm.gen_memset_scratch_rtx (mode);
> >           emit_move_insn (target, const_vec);
> >         }
> >     }
> >   else
> >     {
> >       target = targetm.gen_memset_scratch_rtx (mode);
> >       class expand_operand ops[2];
> >       create_output_operand (&ops[0], target, mode);
> >       create_input_operand (&ops[1], data, QImode);
> >       expand_insn (icode, 2, ops);
> >       if (!rtx_equal_p (target, ops[0].value))
> >         emit_move_insn (target, ops[0].value);
> >     }
>
> TBH I think that complicates the interface too much.  The constant
> and non-constant cases are now very different.

I inlined gen_memset_broadcast in the v6 patch.
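In the CONST_INT_P case discussed above, gen_const_vec_duplicate describes a vector in which every QImode element is the memset byte. The sketch below shows what that value contains. It models a 16-byte QI vector as a plain byte array; the `v16qi` type and both helper names are hypothetical and are not GCC's RTL. Next to it is the 64-bit integer pattern that the scalar by-pieces path would store:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-byte QI vector value.  */
typedef struct { uint8_t elt[16]; } v16qi;

static_assert (sizeof (v16qi) == 16, "v16qi must be 16 bytes");

/* Vector form: duplicate the byte C into every element.  */
static v16qi
duplicate_byte_v16qi (uint8_t c)
{
  v16qi v;
  for (int i = 0; i < 16; i++)
    v.elt[i] = c;
  return v;
}

/* Scalar form: replicate the byte C into all 8 bytes of a 64-bit word.  */
static uint64_t
duplicate_byte_u64 (uint8_t c)
{
  return (uint64_t) c * UINT64_C (0x0101010101010101);
}
```

Both results are pure constants. A query such as can_store_by_pieces can therefore inspect them without creating a register, which is the property requested above.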

Thanks.

-- 
H.J.


Re: [PATCH v4] Add QI vector mode support to by-pieces for memset

2021-07-30 Thread H.J. Lu via Gcc-patches
On Fri, Jul 30, 2021 at 2:05 AM Richard Sandiford
 wrote:
>
> "H.J. Lu"  writes:
> > On Tue, Jul 27, 2021 at 8:31 AM H.J. Lu  wrote:
> >>
> >> On Mon, Jul 26, 2021 at 4:19 PM H.J. Lu  wrote:
> >> >
> >> > On Mon, Jul 26, 2021 at 3:56 PM H.J. Lu  wrote:
> >> > >
> >> > > On Mon, Jul 26, 2021 at 2:53 PM Richard Sandiford
> >> > >  wrote:
> >> > > >
> >> > > > "H.J. Lu via Gcc-patches"  writes:
> >> > > > > On Mon, Jul 26, 2021 at 11:42 AM Richard Sandiford
> >> > > > >  wrote:
> >> > > > >>
> >> > > > >> "H.J. Lu via Gcc-patches"  writes:
> >> > > > >> > +to avoid stack realignment when expanding memset.  The default is
> >> > > > >> > +@code{gen_reg_rtx}.
> >> > > > >> > +@end deftypefn
> >> > > > >> > +
> >> > > > >> >  @deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST (unsigned @var{nunroll}, class loop *@var{loop})
> >> > > > >> >  This target hook returns a new value for the number of times @var{loop}
> >> > > > >> >  should be unrolled. The parameter @var{nunroll} is the number of times
> >> > > > >> > […]
> >> > > > >> > @@ -1446,7 +1511,10 @@ can_store_by_pieces (unsigned HOST_WIDE_INT len,
> >> > > > >> >max_size = STORE_MAX_PIECES + 1;
> >> > > > >> >while (max_size > 1 && l > 0)
> >> > > > >> >   {
> >> > > > >> > -   scalar_int_mode mode = widest_int_mode_for_size (max_size);
> >> > > > >> > +   /* Since this can be called before virtual registers are ready
> >> > > > >> > +  to use, avoid QI vector mode here.  */
> >> > > > >> > +   fixed_size_mode mode
> >> > > > >> > + = widest_fixed_size_mode_for_size (max_size, false);
> >> > > > >>
> >> > > > >> I think I might have asked this before, sorry, but: when is that true
> >> > > > >> and why does it matter?
> >> > > > >
> >> > > > > can_store_by_pieces may be called:
> >> > > > >
> >> > > > > value-prof.c:  if (!can_store_by_pieces (val, builtin_memset_read_str,
> >> > > > > value-prof.c:  if (!can_store_by_pieces (val, builtin_memset_read_str,
> >> > > > >
> >> > > > > before virtual registers can be used.   When true is passed to
> >> > > > > widest_fixed_size_mode_for_size,  virtual registers may be used
> >> > > > > to expand memset to broadcast, which leads to ICE.   Since for the
> >> > > > > purpose of can_store_by_pieces, we don't need to expand memset
> >> > > > > to broadcast and pass false here can avoid ICE.
> >> > > >
> >> > > > Ah, I see, thanks.
> >> > > >
> >> > > > That sounds like a problem in the way that the memset const function is
> >> > > > written though.  can_store_by_pieces is just a query function, so I don't
> >> > > > think it should be trying to create new registers for can_store_by_pieces,
> >> > > > even if it could.  At the same time, can_store_by_pieces should make the
> >> > > > same choices as the real expander would.
> >> > > >
> >> > > > I think this means that:
> >> > > >
> >> > > > - gen_memset_broadcast should be inlined into its callers, with the
> >> > > >   builtin_memset_read_str getting the CONST_INT_P case and
> >> > > >   builtin_memset_gen_str getting the variable case.
> >> > > >
> >> > > > - builtin_memset_read_str should then stop at and return the
> >> > > >   gen_const_vec_duplicate when the prev argument is null.
> >> >
> >> > This doesn't work since can_store_by_pieces has
> >> >
> >> >  cst = (*constfun) (constfundata, nullptr, offset, mode);
> >> >   if (!targetm.legitimate_constant_p (mode, cst))
> >>
> >> We can add a target hook, targetm.legitimate_memset_constant_p,
> >> which defaults to targetm.legitimate_constant_p.  Will it be acceptable?
> >
> > In the v5 patch,  I changed it to
> >
> >  cst = (*constfun) (constfundata, nullptr, offset, mode);
> >   /* All CONST_VECTORs are legitimate if vec_duplicate
> >  is supported.  */
>
> Maybe “can be loaded” rather than “are legitimate”, since they're

Fixed.

> not necessarily legitimate in the sense of legitimate_constant_p
> (hence the patch).  Also, since we assume elsewhere that
> vec_duplicate is a precondition for picking a vector mode,
> I think we should do the same here (and note that in the comment).

Fixed.

> So…
>
> >   if (!((memsetp
> >  && VECTOR_MODE_P (mode)
> >  && GET_MODE_INNER (mode) == QImode
> >  && (optab_handler (vec_duplicate_optab, mode)
> >  != CODE_FOR_nothing))
>
> I think we need only the (memsetp && VECTOR_MODE_P (mode)) check.
>
> This feels a bit of a hack TBH.  I think the same principles apply
> to vectors and integers here: forcing the constant to memory is
> still likely to be an optimisation, but is an extra overhead that
> we 

[PATCH v6] Add QI vector mode support to by-pieces for memset

2021-07-30 Thread H.J. Lu via Gcc-patches
1. Replace scalar_int_mode with fixed_size_mode in the by-pieces
infrastructure to allow non-integer mode.
2. Rename widest_int_mode_for_size to widest_fixed_size_mode_for_size
to return QI vector mode for memset.
3. Add op_by_pieces_d::smallest_fixed_size_mode_for_size to return the
smallest integer or QI vector mode.
4. Remove clear_by_pieces_1 and use builtin_memset_read_str in
clear_by_pieces to support vector mode broadcast.
5. Add lowpart_subreg_regno, a wrapper around simplify_subreg_regno that
uses subreg_lowpart_offset (mode, prev_mode) as the offset.
6. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
scratch register to avoid stack realignment when expanding memset.
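Items 1 to 3 widen the set of modes the by-pieces code can choose from. The sketch below is a simplified model, not GCC's by_pieces_ninsns; it ignores alignment, the target's piece-size limits and any overlapping tail stores, and its function names are hypothetical. A greedy decomposition always uses the widest power-of-two piece that still fits. Raising the maximum piece from 8 bytes (a DImode integer) to 16 bytes (for example a 16-byte QI vector) therefore reduces the number of stores:

```c
#include <assert.h>
#include <stddef.h>

/* Widest power-of-two piece, at most MAX_PIECE bytes, that fits in N > 0.  */
static size_t
widest_piece_for (size_t n, size_t max_piece)
{
  size_t p = max_piece;
  while (p > n)
    p >>= 1;
  return p;
}

/* Count the stores needed to cover LEN bytes, always using the widest
   piece that still fits.  MAX_PIECE must be a power of two.  */
static unsigned
count_store_pieces (size_t len, size_t max_piece)
{
  assert (max_piece != 0 && (max_piece & (max_piece - 1)) == 0);
  unsigned pieces = 0;
  while (len > 0)
    {
      len -= widest_piece_for (len, max_piece);
      pieces++;
    }
  return pieces;
}
```

For a 31-byte memset this model needs 6 stores with 8-byte pieces (8+8+8+4+2+1) but only 5 with 16-byte pieces (16+8+4+2+1).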

gcc/

PR middle-end/90773
* builtins.c (builtin_memcpy_read_str): Change the mode argument
from scalar_int_mode to fixed_size_mode.
(builtin_strncpy_read_str): Likewise.
(gen_memset_value_from_prev): New function.
(builtin_memset_read_str): Change the mode argument from
scalar_int_mode to fixed_size_mode.  Use gen_memset_value_from_prev
and support CONST_VECTOR.
(builtin_memset_gen_str): Likewise.
(try_store_by_multiple_pieces): Use by_pieces_constfn to declare
constfun.
* builtins.h (builtin_strncpy_read_str): Replace scalar_int_mode
with fixed_size_mode.
(builtin_memset_read_str): Likewise.
* expr.c (widest_int_mode_for_size): Renamed to ...
(widest_fixed_size_mode_for_size): Add a bool argument to
indicate if QI vector mode can be used.
(by_pieces_ninsns): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.
(pieces_addr::adjust): Change the mode argument from
scalar_int_mode to fixed_size_mode.
(op_by_pieces_d): Make m_len read-only.  Add a bool member,
m_qi_vector_mode, to indicate that QI vector mode can be used.
(op_by_pieces_d::op_by_pieces_d): Add a bool argument to
initialize m_qi_vector_mode.  Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.
(op_by_pieces_d::get_usable_mode): Change the mode argument from
scalar_int_mode to fixed_size_mode.  Call
widest_fixed_size_mode_for_size instead of
widest_int_mode_for_size.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): New member
function to return the smallest integer or QI vector mode.
(op_by_pieces_d::run): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.  Call
smallest_fixed_size_mode_for_size instead of
smallest_int_mode_for_size.
(store_by_pieces_d::store_by_pieces_d): Add a bool argument to
indicate that QI vector mode can be used and pass it to
op_by_pieces_d::op_by_pieces_d.
(can_store_by_pieces): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.  Pass memsetp to
widest_fixed_size_mode_for_size to support QI vector mode.
Allow all CONST_VECTORs for memset if vec_duplicate is supported.
(store_by_pieces): Pass memsetp to
store_by_pieces_d::store_by_pieces_d.
(clear_by_pieces_1): Removed.
(clear_by_pieces): Replace clear_by_pieces_1 with
builtin_memset_read_str and pass true to store_by_pieces_d to
support vector mode broadcast.
(string_cst_read_str): Change the mode argument from
scalar_int_mode to fixed_size_mode.
* expr.h (by_pieces_constfn): Change scalar_int_mode to
fixed_size_mode.
(by_pieces_prev): Likewise.
* rtl.h (lowpart_subreg_regno): New.
* rtlanal.c (lowpart_subreg_regno): New.  A wrapper around
simplify_subreg_regno.
* target.def (gen_memset_scratch_rtx): New hook.
* doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX.
* doc/tm.texi: Regenerated.

gcc/testsuite/

* gcc.target/i386/pr100865-3.c: Expect vmovdqu8 instead of
vmovdqu.
* gcc.target/i386/pr100865-4b.c: Likewise.
---
 gcc/builtins.c  | 174 
 gcc/builtins.h  |   4 +-
 gcc/doc/tm.texi |   7 +
 gcc/doc/tm.texi.in  |   2 +
 gcc/expr.c  | 172 +--
 gcc/expr.h  |   4 +-
 gcc/rtl.h   |   2 +
 gcc/rtlanal.c   |  11 ++
 gcc/target.def  |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-3.c  |   2 +-
 gcc/testsuite/gcc.target/i386/pr100865-4b.c |   2 +-
 11 files changed, 300 insertions(+), 89 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 845a8bb1201..cddef63cfc2 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3119,13 +3119,16 @@ expand_builtin_strnlen (tree exp, rtx target, machine_mode 

Re: PING^5: [PATCH] mips: add MSA vec_cmp and vec_cmpu expand pattern [PR101132]

2021-07-30 Thread Maciej W. Rozycki
On Fri, 30 Jul 2021, Richard Biener wrote:

> > It isn't really appropriate for me to review MIPS stuff given that I work
> > for a company that has a competing architecture.  I think Jeff expressed
> > similar concerns given his new role.
> 
> I think that should be a non-issue unless it is an issue between you
> and your employer (I realize some companies even restrict what you
> can do in your spare time).

 That is exactly the point, and I understand the ethical concerns even if 
such activity has not been explicitly restricted by an employment contract 
one has entered into and has been bound by.

>  We trust maintainers / reviewers to do
> the right thing(TM) for the GCC project even when it is against the
> interest of the company they are employed by.  That is, not push
> crap even if it is in the area of your maintainership.

 That I think is undoubtable.

  Maciej


Re: OpenMP 5.1: omp_display_env

2021-07-30 Thread Thomas Schwinge
Hi!

On 2021-07-30T12:05:56+0200, I wrote:
> On 2021-07-30T12:02:00+0200, Jakub Jelinek via Gcc-patches 
>  wrote:
>> On Fri, Jul 30, 2021 at 11:54:00AM +0200, Ulrich Drepper wrote:
>>> On Fri, Jul 30, 2021 at 10:50 AM Jakub Jelinek  wrote:
>>>
>>> > I think for now it would be better to guard the omp_display_env_*
>>> > in fortran.c with #ifndef LIBGOMP_OFFLOADED_ONLY
>>>
>>> OK, easy enough.  This compiles for me.
>>
>> Ok (with ChangeLog entry), thanks.
>
> Heh, I had just come up with the same patch, and pushed
> "[libgomp] Restore offloading 'libgomp/fortran.c'" to master branch in
> commit 28665ddc7efa48f9b39615e313a2c4a7a66cdb24, see attached.

I'm sorry if I stepped on anyone's toes: Tobias told me that you were
still discussing on IRC the proper way of fixing this, while I went ahead
and pushed what I considered "obvious enough", meaning that it fixes the
regression, whilst not breaking anything else.  (Adding more
functionality can of course be done, incrementally.)
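The committed fix compiles the Fortran entry points only when LIBGOMP_OFFLOADED_ONLY is undefined. Offload builds therefore no longer reference a symbol they do not provide. The sketch below has the same shape, with hypothetical names: `OFFLOADED_ONLY`, `display_env` and `display_env_` stand in for the libgomp ones. The wrapper takes its argument by reference, as Fortran passes it, and `!!` folds any nonzero value to true:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Records the argument seen by the C entry point, for checking.  */
static int last_verbose = -1;

/* Hypothetical stand-in for the C API function; the real one prints
   the OpenMP environment.  */
static void
display_env (bool verbose)
{
  last_verbose = verbose;
}

#ifndef OFFLOADED_ONLY /* hypothetical stand-in for LIBGOMP_OFFLOADED_ONLY */
/* Fortran-callable wrapper: the argument arrives by reference.  */
void
display_env_ (const int32_t *verbose)
{
  assert (verbose != NULL);
  display_env (!!*verbose);
}
#endif
```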


Grüße
 Thomas


> From 28665ddc7efa48f9b39615e313a2c4a7a66cdb24 Mon Sep 17 00:00:00 2001
> From: Thomas Schwinge 
> Date: Fri, 30 Jul 2021 11:48:54 +0200
> Subject: [PATCH] [libgomp] Restore offloading 'libgomp/fortran.c'
>
> GCN:
>
> ld: error: undefined symbol: gomp_ialias_omp_display_env
> >>> referenced by fortran.c:744 ([...]/source-gcc/libgomp/fortran.c:744)
> >>>   fortran.o:(omp_display_env_) in archive [...]/build-gcc-offload-amdgcn-amdhsa/amdgcn-amdhsa/libgomp/.libs/libgomp.a
> >>> referenced by fortran.c:744 ([...]/source-gcc/libgomp/fortran.c:744)
> >>>   fortran.o:(omp_display_env_) in archive [...]/build-gcc-offload-amdgcn-amdhsa/amdgcn-amdhsa/libgomp/.libs/libgomp.a
> >>> referenced by fortran.c:750 ([...]/source-gcc/libgomp/fortran.c:750)
> >>>   fortran.o:(omp_display_env_8_) in archive [...]/build-gcc-offload-amdgcn-amdhsa/amdgcn-amdhsa/libgomp/.libs/libgomp.a
> >>> referenced by fortran.c:750 ([...]/source-gcc/libgomp/fortran.c:750)
> >>>   fortran.o:(omp_display_env_8_) in archive [...]/build-gcc-offload-amdgcn-amdhsa/amdgcn-amdhsa/libgomp/.libs/libgomp.a
> collect2: error: ld returned 1 exit status
> mkoffload: fatal error: build-gcc/gcc/x86_64-pc-linux-gnu-accel-amdgcn-amdhsa-gcc returned 1 exit status
>
> nvptx:
>
> unresolved symbol omp_display_env
> collect2: error: ld returned 1 exit status
> mkoffload: fatal error: [...]/build-gcc/./gcc/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
>
> Fix-up for commit 7123ae2455b5a1a2f19f13fa82c377cfda157f23
> "Implement OpenMP 5.1 section 3.15: omp_display_env".
>
>   libgomp/
>   * fortran.c (omp_display_env_, omp_display_env_8_): Only
>   '#ifndef LIBGOMP_OFFLOADED_ONLY'.
>
> Co-Authored-By: Ulrich Drepper 
> ---
>  libgomp/fortran.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/libgomp/fortran.c b/libgomp/fortran.c
> index 76285d4376b..e042702ac91 100644
> --- a/libgomp/fortran.c
> +++ b/libgomp/fortran.c
> @@ -738,6 +738,8 @@ omp_get_default_allocator_ ()
>return (intptr_t) omp_get_default_allocator ();
>  }
>
> +#ifndef LIBGOMP_OFFLOADED_ONLY
> +
>  void
>  omp_display_env_ (const int32_t *verbose)
>  {
> @@ -749,3 +751,5 @@ omp_display_env_8_ (const int64_t *verbose)
>  {
>omp_display_env (!!*verbose);
>  }
> +
> +#endif /* LIBGOMP_OFFLOADED_ONLY */
> --
> 2.30.2
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[PATCH] Add emulated gather capability to the vectorizer

2021-07-30 Thread Richard Biener
This adds a gather vectorization capability to the vectorizer
without target support by decomposing the offset vector, doing
scalar loads and then building a vector from the result.  This
is aimed mainly at cases where vectorizing the rest of the loop
offsets the cost of vectorizing the gather.
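A scalar C sketch of the emulation described above follows. It assumes a two-lane double vector, as with SSE2; the `v2df`/`v2si` structs and the function names are hypothetical stand-ins, not GCC types. The offsets are loaded as a vector, each lane is extracted and used for a scalar load, and the loaded values are assembled into a vector. The sketch applies this to the same loop as the new vect-gather-1.c test, combining the two lane sums at the end as -Ofast permits:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-lane vectors, standing in for V2DF and V2SI.  */
typedef struct { double lane[2]; } v2df;
typedef struct { int lane[2]; } v2si;

/* Emulated gather: one scalar load per extracted offset, then build
   the result vector from the loaded values.  */
static v2df
emulated_gather_v2df (const double *base, v2si offsets)
{
  v2df r;
  for (int i = 0; i < 2; i++)
    r.lane[i] = base[offsets.lane[i]];
  return r;
}

/* The vmul loop, as a two-lane "vector" body plus a scalar epilogue.  */
static double
vmul_emulated (const int *rowstart, const int *rowend,
               const double *luval, const double *dst)
{
  assert (rowend >= rowstart);
  size_t n = (size_t) (rowend - rowstart);
  v2df acc = { { 0.0, 0.0 } };
  size_t i = 0;

  for (; i + 2 <= n; i += 2)
    {
      v2si off = { { rowstart[i], rowstart[i + 1] } }; /* contiguous offset load */
      v2df g = emulated_gather_v2df (dst, off);
      acc.lane[0] += luval[i] * g.lane[0];
      acc.lane[1] += luval[i + 1] * g.lane[1];
    }

  double res = acc.lane[0] + acc.lane[1]; /* reduce the two lanes */
  for (; i < n; i++)                       /* scalar epilogue */
    res += luval[i] * dst[rowstart[i]];
  return res;
}
```

The per-lane loads are what makes the gather "emulated". The surrounding multiply-adds are ordinary vector work, which is where the cost-model win has to come from.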

Note it's difficult to avoid vectorizing the offset load, but in
some cases later passes can turn the vector load + extract into
scalar loads, see the followup patch.

On SPEC CPU 2017 510.parest_r this improves runtime from 250s
to 219s on a Zen2 CPU with its native gather instructions
disabled (using them instead increases the runtime to 254s),
using -Ofast -march=znver2 [-flto].  It turns out the critical
loops in this benchmark all perform gather operations.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Any comments?  I still plan to run this over full SPEC and
I have to apply TLC to the followup patch before I can post it.

I think neither Power nor Z has gather, so I'm curious whether the
patch helps 510.parest_r there; I'm unsure about NEON/AdvSIMD.
Both might need the followup patch: I was surprised by
the speedup without it on Zen (the followup improves runtime
to 198s there).

Thanks,
Richard.

2021-07-30  Richard Biener  

* tree-vect-data-refs.c (vect_check_gather_scatter):
Include widening conversions only when the result is
still handled by native gather or the current offset
size does not already match the data size.
Also succeed analysis in case there's no native support,
noted by an IFN_LAST ifn and a NULL decl.
* tree-vect-patterns.c (vect_recog_gather_scatter_pattern):
Test for no IFN gather rather than decl gather.
* tree-vect-stmts.c (vect_model_load_cost): Pass in the
gather-scatter info and cost emulated gathers accordingly.
(vect_truncate_gather_scatter_offset): Properly test for
no IFN gather.
(vect_use_strided_gather_scatters_p): Likewise.
(get_load_store_type): Handle emulated gathers and their
restrictions.
(vectorizable_load): Likewise.  Emulate them by extracting
scalar offsets, doing scalar loads and a vector construct.

* gcc.target/i386/vect-gather-1.c: New testcase.
* gfortran.dg/vect/vect-8.f90: Adjust.
---
 gcc/testsuite/gcc.target/i386/vect-gather-1.c |  18 
 gcc/testsuite/gfortran.dg/vect/vect-8.f90 |   2 +-
 gcc/tree-vect-data-refs.c |  29 +++--
 gcc/tree-vect-patterns.c  |   2 +-
 gcc/tree-vect-stmts.c | 100 --
 5 files changed, 136 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-gather-1.c

diff --git a/gcc/testsuite/gcc.target/i386/vect-gather-1.c b/gcc/testsuite/gcc.target/i386/vect-gather-1.c
new file mode 100644
index 000..134aef39666
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-gather-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -msse2 -fdump-tree-vect-details" } */
+
+#ifndef INDEXTYPE
+#define INDEXTYPE int
+#endif
+double vmul(INDEXTYPE *rowstart, INDEXTYPE *rowend,
+   double *luval, double *dst)
+{
+  double res = 0;
+  for (const INDEXTYPE * col = rowstart; col != rowend; ++col, ++luval)
+res += *luval * dst[*col];
+  return res;
+}
+
+/* With gather emulation this should be profitable to vectorize
+   even with plain SSE2.  */
+/* { dg-final { scan-tree-dump "loop vectorized" "vect" } } */
diff --git a/gcc/testsuite/gfortran.dg/vect/vect-8.f90 b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
index 9994805d77f..cc1aebfbd84 100644
--- a/gcc/testsuite/gfortran.dg/vect/vect-8.f90
+++ b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
@@ -706,5 +706,5 @@ END SUBROUTINE kernel
 
 ! { dg-final { scan-tree-dump-times "vectorized 24 loops" 1 "vect" { target aarch64_sve } } }
 ! { dg-final { scan-tree-dump-times "vectorized 23 loops" 1 "vect" { target { aarch64*-*-* && { ! aarch64_sve } } } } }
-! { dg-final { scan-tree-dump-times "vectorized 2\[23\] loops" 1 "vect" { target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
+! { dg-final { scan-tree-dump-times "vectorized 2\[234\] loops" 1 "vect" { target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
 ! { dg-final { scan-tree-dump-times "vectorized 17 loops" 1 "vect" { target { { ! vect_intdouble_cvt } && { ! aarch64*-*-* } } } } }
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 6995efba899..0279e75fa8e 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -4007,8 +4007,26 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
  continue;
}
 
- if (TYPE_PRECISION (TREE_TYPE (op0))
- < TYPE_PRECISION (TREE_TYPE (off)))
+ /* Include the conversion if it is widening and we're using
+the IFN path or the target can handle the converted from
+

[PATCH] Fix typos in move_sese_region_to_fn

2021-07-30 Thread Kewen.Lin via Gcc-patches
Hi,

This patch fixes the typos in move_sese_region_to_fn.
As mentioned here [1], I tried to debug the test case
gcc.dg/graphite/pr83359.c with trunk, but found that it didn't
enter the hunk guarded by "if (moved_orig_loop_num)".  So
I switched to commit 555758de90074 (and reproduced the ICE
with 555758de90074~ to make sure my command steps were correct),
and noticed that compiling the test case only covers the
hunk

else
  {
moved_orig_loop_num[dloop->orig_loop_num] = -1;
dloop->orig_loop_num = 0;
  }

but it doesn't reach the hunk

if ((*larray)[dloop->orig_loop_num] != NULL
&& get_loop (saved_cfun, dloop->orig_loop_num) == NULL)
  {
if (moved_orig_loop_num[dloop->orig_loop_num] >= 0
&& moved_orig_loop_num[dloop->orig_loop_num] < 2)
  moved_orig_loop_num[dloop->orig_loop_num]++;
dloop->orig_loop_num = (*larray)[dloop->orig_loop_num]->num;
  }

so the following hunk, which uses dloop and is guarded by
"if (moved_orig_loop_num[orig_loop_num] == 2)", doesn't get executed.

This explains why the problem wasn't exposed before.

Looking at the code that uses dloop, I think it's a copy/paste typo;
the assertion

  gcc_assert ((*larray)[dloop->orig_loop_num] != NULL
  && (get_loop (saved_cfun, dloop->orig_loop_num)
  == NULL));

is meant to ensure that the condition checked in the preceding loop
over dest_cfun's loops is true, that is:

if ((*larray)[dloop->orig_loop_num] != NULL
&& get_loop (saved_cfun, dloop->orig_loop_num) == NULL)

But in this context, I think the expected original loop number has
already been assigned to the variable orig_loop_num, extracted from
arg0 of the IFN_LOOP_DIST_ALIAS call.  So use orig_loop_num instead.
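Why did the compiler accept the typo? The function also declared an outer `class loop *dloop = NULL;`, which this patch removes. The range-based for loop shadows that variable, so code after the loop refers to the outer, never-assigned dloop rather than to any loop from the iteration. Below is a reduced, hypothetical C illustration; the struct and function names are made up for this sketch. -Wshadow would diagnose the inner declaration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced loop record.  */
struct loop
{
  int num;
  int orig_loop_num;
};

/* Return the last nonzero orig_loop_num in LOOPS (0 if none) and store
   in *STALE what the outer DLOOP holds after the loop.  */
static int
last_orig_loop_num (const struct loop *loops, size_t n,
                    const struct loop **stale)
{
  const struct loop *dloop = NULL;  /* outer variable, as before the patch */
  int orig_loop_num = 0;

  assert (stale != NULL);
  for (size_t i = 0; i < n; i++)
    {
      const struct loop *dloop = &loops[i];  /* shadows the outer DLOOP */
      if (dloop->orig_loop_num)
        orig_loop_num = dloop->orig_loop_num;
    }

  /* Here DLOOP names the outer variable again, which is still NULL.
     The value saved in ORIG_LOOP_NUM is the one to use instead.  */
  *stale = dloop;
  return orig_loop_num;
}
```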

Is it ok for trunk?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576367.html

BR,
Kewen
-
gcc/ChangeLog:

* tree-cfg.c (move_sese_region_to_fn): Fix typos on dloop.
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 48ee8c011ab..9883eaaa9bf 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -7747,9 +7747,8 @@ move_sese_region_to_fn (struct function *dest_cfun, basic_block entry_bb,
 
   /* Fix up orig_loop_num.  If the block referenced in it has been moved
  to dest_cfun, update orig_loop_num field, otherwise clear it.  */
-  class loop *dloop = NULL;
   signed char *moved_orig_loop_num = NULL;
-  for (class loop *dloop : loops_list (dest_cfun, 0))
+  for (auto dloop : loops_list (dest_cfun, 0))
 if (dloop->orig_loop_num)
   {
if (moved_orig_loop_num == NULL)
@@ -7787,11 +7786,10 @@ move_sese_region_to_fn (struct function *dest_cfun, basic_block entry_bb,
  /* If we have moved both loops with this orig_loop_num into
 dest_cfun and the LOOP_DIST_ALIAS call is being moved there
 too, update the first argument.  */
- gcc_assert ((*larray)[dloop->orig_loop_num] != NULL
- && (get_loop (saved_cfun, dloop->orig_loop_num)
- == NULL));
+ gcc_assert ((*larray)[orig_loop_num] != NULL
+ && (get_loop (saved_cfun, orig_loop_num) == NULL));
  tree t = build_int_cst (integer_type_node,
- (*larray)[dloop->orig_loop_num]->num);
+ (*larray)[orig_loop_num]->num);
  gimple_call_set_arg (g, 0, t);
  update_stmt (g);
  /* Make sure the following loop will not update it.  */


Re: [PATCH 1/2] Fix debug info for ignored decls at start of assembly

2021-07-30 Thread Bernd Edlinger


On 7/29/21 9:23 AM, Richard Biener wrote:
> On Wed, 28 Jul 2021, Bernd Edlinger wrote:
> 
>> On 7/28/21 2:51 PM, Richard Biener wrote:
>>> On Mon, 26 Jul 2021, Bernd Edlinger wrote:
>>>
 Ignored function decls that are compiled at the start of
 the assembly have bogus line numbers until the first .file
 directive, as reported in PR101575.

 The work around for this issue is to emit a dummy .file
 directive when the first function is DECL_IGNORED_P, when
 that is not already done, mostly for -gdwarf-4.
>>>
>>> I wonder if it makes sense to unconditionally announce the
>>> TU with a .file directive at the beginning.  ISTR this is
>>> what we now do with -gdwarf-5.
>>>
>>
>> Yes, that would work, even when the file name is not guessed
>> correctly.
>>
>> Initially I had "" unconditionally here, and it did
>> not really hurt, except that it is visible with readelf.
> 
> I think I'd prefer that, since if we don't announce a .file
> before the first assembler statement but ask gas to produce
> line info it might be tempted to create line info referencing
> the possibly temporary filename of the assembler file which
> is undesirable from a build reproducibility point of view.
> 

Yeah, I understand.

Meanwhile I found a simple C test case without ignored functions

$ cat test1.c
asm("nop");
int main () 
{
  return 0;
}

$ gcc -g test1.c
$ readelf --debug-dump=decodedline a.out 
Contents of the .debug_line section:

CU: ./test1.c:
File name                            Line number    Starting address    View    Stmt
test1.c                                        5            0x401106               x
test1.c                                        3            0x401107               x
test1.c                                        4            0x40110b               x
test1.c                                        5            0x401110               x
test1.c                                        -            0x401112

even with the proposed patch, so I agree it is incomplete.

I tried the gdb test case and compiled it with different LTO
options, but the result of get_AT_string was always valid; in some
cases an LTO debug section together with a couple of .file
directives was output before the .file 0.
So I'd like to use the file name from get_AT_string, since
it's accurate most of the time and avoids unnecessary confusion
for readers of the produced debug info.

So I'd propose the attached patch instead.
Is it OK for trunk?


> Richard.
> 
>>> Note get_AT_string (comp_unit_die (), DW_AT_name) doesn't
>>> work with LTO, you'll get  then.
>>>
>>
>> Yeah, that's why I wanted to restrict that to the case where
>> it's absolutely necessary.
>>
>>> Is the dwarf assembler bug reported/fixed?  Can you include
>>> a reference please?
>>>
>>
>> I've just added a bug report, it's unlikely to be fixed IMHO:
>> https://sourceware.org/bugzilla/show_bug.cgi?id=28149
>>
>> I will add that to the patch description:
>>
>> Ignored function decls that are compiled at the start of
>> the assembly have bogus line numbers until the first .file
>> directive, as reported in PR101575.
>>
>> The corresponding binutils bug report is
>> https://sourceware.org/bugzilla/show_bug.cgi?id=28149
>>
>> The work around for this issue is to emit a dummy .file
>> directive when the first function is DECL_IGNORED_P, when
>> that is not already done, mostly for -gdwarf-4.
>>
>>
>> Thanks
>> Bernd.
>>
>>> Thanks,
>>> Richard.
>>>
 2021-07-24  Bernd Edlinger  

PR ada/101575
* dwarf2out.c (dwarf2out_begin_prologue): Move init
of fde->ignored_debug to dwarf2out_set_ignored_loc.
(dwarf2out_set_ignored_loc): This is now also called
when no .loc statement is to be generated, in that case
we emit a dummy .file statement when needed.
* final.c (final_start_function_1,
final_scan_insn_1): Call debug_hooks->set_ignored_loc
for all DECL_IGNORED_P functions.
 ---
  gcc/dwarf2out.c | 29 +
  gcc/final.c |  5 ++---
  2 files changed, 27 insertions(+), 7 deletions(-)

 diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
 index 884f1e1..8de0d6f 100644
 --- a/gcc/dwarf2out.c
 +++ b/gcc/dwarf2out.c
 @@ -1115,7 +1115,6 @@ dwarf2out_begin_prologue (unsigned int line ATTRIBUTE_UNUSED,
fde->dw_fde_current_label = dup_label;
fde->in_std_section = (fnsec == text_section
 || (cold_text_section && fnsec == cold_text_section));
 -  fde->ignored_debug = DECL_IGNORED_P (current_function_decl);
in_text_section_p = fnsec == text_section;
  
/* We only want to output line number information for the genuine dwarf2
 @@ -28546,10 +28545,32 @@ dwarf2out_set_ignored_loc (unsigned int line, unsigned int column,
  {
dw_fde_ref fde = cfun->fde;
  
 -  fde->ignored_debug = false;
 -  set_cur_line_info_table (function_section 

[committed 12/12] d: Remove dead code from binary_op.

2021-07-30 Thread Iain Buclaw via Gcc-patches
The front-end ensures that both sides have been cast to the same type
before being given to the lowering pass.

gcc/d/ChangeLog:

* expr.cc (binary_op): Remove dead code.
---
 gcc/d/expr.cc | 8 
 1 file changed, 8 deletions(-)

diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index 73e0abeaa43..e293cf2a4cd 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -101,8 +101,6 @@ binary_op (tree_code code, tree type, tree arg0, tree arg1)
   tree t1 = TREE_TYPE (arg1);
   tree ret = NULL_TREE;
 
-  bool unsignedp = TYPE_UNSIGNED (t0) || TYPE_UNSIGNED (t1);
-
   /* Deal with float mod expressions immediately.  */
   if (code == FLOAT_MOD_EXPR)
 return build_float_modulus (type, arg0, arg1);
@@ -130,12 +128,6 @@ binary_op (tree_code code, tree type, tree arg0, tree arg1)
   else
ret = fold_build2 (POINTER_DIFF_EXPR, ptrtype, arg0, arg1);
 }
-  else if (INTEGRAL_TYPE_P (type) && (TYPE_UNSIGNED (type) != unsignedp))
-{
-  tree inttype = (unsignedp)
-   ? d_unsigned_type (type) : d_signed_type (type);
-  ret = fold_build2 (code, inttype, arg0, arg1);
-}
   else
 {
   /* If the operation needs excess precision.  */
-- 
2.30.2



[committed 11/12] d: Always layout initializer for the m_RTInfo field in TypeInfo_Class

2021-07-30 Thread Iain Buclaw via Gcc-patches
Makes it explicit that the default value is set to NULL.

gcc/d/ChangeLog:

* typeinfo.cc (TypeInfoVisitor::visit (TypeInfoClassDeclaration *)):
Always layout initializer for the m_RTInfo field.
---
 gcc/d/typeinfo.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/d/typeinfo.cc b/gcc/d/typeinfo.cc
index c9126f4c6b5..978c73e65f6 100644
--- a/gcc/d/typeinfo.cc
+++ b/gcc/d/typeinfo.cc
@@ -934,6 +934,8 @@ public:
  this->layout_field (build_expr (cd->getRTInfo, true));
else if (!(flags & ClassFlags::noPointers))
  this->layout_field (size_one_node);
+   else
+ this->layout_field (null_pointer_node);
   }
 else
   {
-- 
2.30.2



[committed 10/12] d: Don't generate a PREDICT_EXPR when assert contracts are turned off.

2021-07-30 Thread Iain Buclaw via Gcc-patches
This expression is just discarded by add_stmt, so never reaches the
middle-end.

gcc/d/ChangeLog:

* expr.cc (ExprVisitor::visit (AssertExp *)): Don't generate
PREDICT_EXPR.
---
 gcc/d/expr.cc | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index 76c1e613e77..73e0abeaa43 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -2085,15 +2085,9 @@ public:
   }
 else
   {
-   /* Assert contracts are turned off, if the contract condition has no
-  side effects can still use it as a predicate for the optimizer.  */
-   if (TREE_SIDE_EFFECTS (arg))
- {
-   this->result_ = void_node;
-   return;
- }
-
-   assert_fail = build_predict_expr (PRED_NORETURN, NOT_TAKEN);
+   /* Assert contracts are turned off.  */
+   this->result_ = void_node;
+   return;
   }
 
 /* Build condition that we are asserting in this contract.  */
-- 
2.30.2



[committed 09/12] d: Clarify comment for generating static array assignment with literal.

2021-07-30 Thread Iain Buclaw via Gcc-patches
The code block is done as an optimization to elide a call to the runtime
library helpers _d_arrayctor or _d_arrayassign.

gcc/d/ChangeLog:

* expr.cc (ExprVisitor::visit (AssignExp *)): Clarify comment
  for generating static array assignment with literal.
---
 gcc/d/expr.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index 85269c6b2be..76c1e613e77 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -1163,9 +1163,9 @@ public:
bool destructor = needs_dtor (etype);
bool lvalue = lvalue_p (e->e2);
 
-   /* Even if the elements in rhs are all rvalues and don't have
-  to call postblits, this assignment should call dtors on old
-  assigned elements.  */
+   /* Optimize static array assignment with array literal.  Even if the
+  elements in rhs are all rvalues and don't have to call postblits,
+  this assignment should call dtors on old assigned elements.  */
if ((!postblit && !destructor)
|| (e->op == TOKconstruct && e->e2->op == TOKarrayliteral)
|| (e->op == TOKconstruct && !lvalue && postblit)
-- 
2.30.2



[committed 08/12] d: Only handle named enums in enum_initializer_decl

2021-07-30 Thread Iain Buclaw via Gcc-patches
Anonymous enums generate neither an initializer nor a typeinfo symbol, so
it's safe to assert that all enum declarations passed to this function
have an identifier.

gcc/d/ChangeLog:

* decl.cc (enum_initializer_decl): Only handle named enums.
---
 gcc/d/decl.cc | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/gcc/d/decl.cc b/gcc/d/decl.cc
index cf61cd49159..0d46ee180e7 100644
--- a/gcc/d/decl.cc
+++ b/gcc/d/decl.cc
@@ -2218,13 +2218,10 @@ enum_initializer_decl (EnumDeclaration *decl)
   if (decl->sinit)
 return decl->sinit;
 
-  tree type = build_ctype (decl->type);
+  gcc_assert (decl->ident);
 
-  Identifier *ident_save = decl->ident;
-  if (!decl->ident)
-decl->ident = Identifier::generateId ("__enum");
+  tree type = build_ctype (decl->type);
   tree ident = mangle_internal_decl (decl, "__init", "Z");
-  decl->ident = ident_save;
 
   decl->sinit = declare_extern_var (ident, type);
   DECL_LANG_SPECIFIC (decl->sinit) = build_lang_decl (NULL);
-- 
2.30.2



[committed 07/12] d: Set COMDAT and visibility of thunks only if they are public.

2021-07-30 Thread Iain Buclaw via Gcc-patches
It is not expected to have a member function that can be non-public, but
this guards against any internal errors that might occur should that
ever change in the front-end.

gcc/d/ChangeLog:

* decl.cc (make_thunk): Set COMDAT and visibility of thunks only if
they are public.
---
 gcc/d/decl.cc | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/d/decl.cc b/gcc/d/decl.cc
index 59991c3c255..cf61cd49159 100644
--- a/gcc/d/decl.cc
+++ b/gcc/d/decl.cc
@@ -1781,9 +1781,12 @@ make_thunk (FuncDeclaration *decl, int offset)
   DECL_ARTIFICIAL (thunk) = 1;
   DECL_DECLARED_INLINE_P (thunk) = 0;
 
-  DECL_VISIBILITY (thunk) = DECL_VISIBILITY (function);
-  DECL_COMDAT (thunk) = DECL_COMDAT (function);
-  DECL_WEAK (thunk) = DECL_WEAK (function);
+  if (TREE_PUBLIC (thunk))
+{
+  DECL_VISIBILITY (thunk) = DECL_VISIBILITY (function);
+  DECL_COMDAT (thunk) = DECL_COMDAT (function);
+  DECL_WEAK (thunk) = DECL_WEAK (function);
+}
 
   /* When the thunk is for an extern C++ function, let C++ do the thunk
  generation and just reference the symbol as extern, instead of
-- 
2.30.2



[committed 06/12] d: Factor aggregate_initializer_decl to set the sinit for aggregate declarations.

2021-07-30 Thread Iain Buclaw via Gcc-patches
The self-hosted implementation of the D front-end changes the type of
`sinit' to a void pointer, which requires an explicit cast to `tree'.

gcc/d/ChangeLog:

* decl.cc (DeclVisitor::visit (StructDeclaration *)): Don't use sinit
for declaration directly.
(DeclVisitor::visit (ClassDeclaration *)): Likewise.
(aggregate_initializer_decl): Likewise.  Set sinit after creating.
---
 gcc/d/decl.cc | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/gcc/d/decl.cc b/gcc/d/decl.cc
index 7d1378255bd..59991c3c255 100644
--- a/gcc/d/decl.cc
+++ b/gcc/d/decl.cc
@@ -386,9 +386,9 @@ public:
   create_typeinfo (d->type, NULL);
 
 /* Generate static initializer.  */
-d->sinit = aggregate_initializer_decl (d);
-DECL_INITIAL (d->sinit) = layout_struct_initializer (d);
-d_finish_decl (d->sinit);
+tree sinit = aggregate_initializer_decl (d);
+DECL_INITIAL (sinit) = layout_struct_initializer (d);
+d_finish_decl (sinit);
 
 /* Put out the members.  There might be static constructors in the members
list, and they cannot be put in separate object files.  */
@@ -496,11 +496,11 @@ public:
 /* Generate C symbols.  */
 d->csym = get_classinfo_decl (d);
 d->vtblsym = get_vtable_decl (d);
-d->sinit = aggregate_initializer_decl (d);
+tree sinit = aggregate_initializer_decl (d);
 
 /* Generate static initializer.  */
-DECL_INITIAL (d->sinit) = layout_class_initializer (d);
-d_finish_decl (d->sinit);
+DECL_INITIAL (sinit) = layout_class_initializer (d);
+d_finish_decl (sinit);
 
 /* Put out the TypeInfo.  */
 if (have_typeinfo_p (Type::dtypeinfo))
@@ -2151,7 +2151,7 @@ tree
 aggregate_initializer_decl (AggregateDeclaration *decl)
 {
   if (decl->sinit)
-return decl->sinit;
+return (tree) decl->sinit;
 
   /* Class is a reference, want the record type.  */
   tree type = build_ctype (decl->type);
@@ -2161,20 +2161,21 @@ aggregate_initializer_decl (AggregateDeclaration *decl)
 
   tree ident = mangle_internal_decl (decl, "__init", "Z");
 
-  decl->sinit = declare_extern_var (ident, type);
-  DECL_LANG_SPECIFIC (decl->sinit) = build_lang_decl (NULL);
+  tree sinit = declare_extern_var (ident, type);
+  DECL_LANG_SPECIFIC (sinit) = build_lang_decl (NULL);
 
-  DECL_CONTEXT (decl->sinit) = type;
-  TREE_READONLY (decl->sinit) = 1;
+  DECL_CONTEXT (sinit) = type;
+  TREE_READONLY (sinit) = 1;
 
   /* Honor struct alignment set by user.  */
   if (sd && sd->alignment != STRUCTALIGN_DEFAULT)
 {
-  SET_DECL_ALIGN (decl->sinit, sd->alignment * BITS_PER_UNIT);
-  DECL_USER_ALIGN (decl->sinit) = true;
+  SET_DECL_ALIGN (sinit, sd->alignment * BITS_PER_UNIT);
+  DECL_USER_ALIGN (sinit) = true;
 }
 
-  return decl->sinit;
+  decl->sinit = sinit;
+  return sinit;
 }
 
 /* Generate the data for the static initializer.  */
-- 
2.30.2



[committed 05/12] d: Use Identifier::idPool to generate anonymous field name.

2021-07-30 Thread Iain Buclaw via Gcc-patches
The self-hosted implementation of the D front-end does not export
Identifier::generateId, so handle name generation inline instead.

gcc/d/ChangeLog:

* d-builtins.cc (build_frontend_type): Use Identifier::idPool to
generate anonymous field name.
---
 gcc/d/d-builtins.cc | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/d/d-builtins.cc b/gcc/d/d-builtins.cc
index 9db46c0c5ca..328711fc745 100644
--- a/gcc/d/d-builtins.cc
+++ b/gcc/d/d-builtins.cc
@@ -241,8 +241,8 @@ build_frontend_type (tree type)
   sdecl->type->merge2 ();
 
   /* Add both named and anonymous fields as members of the struct.
-Anonymous fields still need a name in D, so call them "__pad%d".  */
-  int anonfield_id = 0;
+Anonymous fields still need a name in D, so call them "__pad%u".  */
+  unsigned anonfield_id = 0;
   sdecl->members = new Dsymbols;
 
   for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
@@ -259,7 +259,11 @@ build_frontend_type (tree type)
 
  Identifier *fident;
  if (DECL_NAME (field) == NULL_TREE)
-   fident = Identifier::generateId ("__pad", anonfield_id++);
+   {
+ char name[16];
+ snprintf (name, sizeof (name), "__pad%u", anonfield_id++);
+ fident = Identifier::idPool (name);
+   }
  else
{
  const char *name = IDENTIFIER_POINTER (DECL_NAME (field));
-- 
2.30.2
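The patch formats anonymous field names into a `char name[16]` buffer. As a rough size check, assuming a 32-bit `unsigned`, the longest possible name is "__pad" (5 characters) plus the 10 digits of 4294967295 plus the terminating NUL, which is exactly 16 bytes. The sketch below reproduces only the formatting step; `make_pad_name` is a hypothetical helper, not part of GCC, and the result would then be handed to `Identifier::idPool` as in the diff.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper mirroring the snprintf call in the patch: write
   "__pad<N>" into BUF (of SIZE bytes) using the current value of
   *COUNTER, then advance the counter.  Returns the number of characters
   the full name needs, excluding the NUL, so a result >= SIZE would
   mean the name was truncated.  */
int
make_pad_name (char *buf, size_t size, unsigned *counter)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, "__pad%u", (*counter)++);
}
```

Starting from a counter of 0, successive calls yield `__pad0`, `__pad1`, and so on; with the counter at 4294967295 the call returns 15, so the worst case still fits in 16 bytes.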



[committed 04/12] d: Use hasMonitor to determine whether to emit a __monitor field in D classes

2021-07-30 Thread Iain Buclaw via Gcc-patches
This helper introduced by the front-end is a better gate, and allows the
front-end to change rules for what gets a monitor in the future.

gcc/d/ChangeLog:

* types.cc (layout_aggregate_type): Call hasMonitor.
* typeinfo.cc (TypeInfoVisitor::layout_base): Likewise.
(layout_cpp_typeinfo): Likewise.  Don't emit vtable unless
have_typeinfo_p.
---
 gcc/d/typeinfo.cc | 21 ++---
 gcc/d/types.cc|  2 +-
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/gcc/d/typeinfo.cc b/gcc/d/typeinfo.cc
index a1f0543d58e..c9126f4c6b5 100644
--- a/gcc/d/typeinfo.cc
+++ b/gcc/d/typeinfo.cc
@@ -423,7 +423,8 @@ class TypeInfoVisitor : public Visitor
 else
   this->layout_field (null_pointer_node);
 
-this->layout_field (null_pointer_node);
+if (cd->hasMonitor ())
+  this->layout_field (null_pointer_node);
   }
 
   /* Write out the interfaces field of class CD.
@@ -1457,9 +1458,17 @@ layout_cpp_typeinfo (ClassDeclaration *cd)
   /* Use the vtable of __cpp_type_info_ptr, the EH personality routine
  expects this, as it uses .classinfo identity comparison to test for
  C++ catch handlers.  */
-  tree vptr = get_vtable_decl (ClassDeclaration::cpp_type_info_ptr);
-  CONSTRUCTOR_APPEND_ELT (init, NULL_TREE, build_address (vptr));
-  CONSTRUCTOR_APPEND_ELT (init, NULL_TREE, null_pointer_node);
+  ClassDeclaration *cppti = ClassDeclaration::cpp_type_info_ptr;
+  if (have_typeinfo_p (cppti))
+{
+  tree vptr = get_vtable_decl (cppti);
+  CONSTRUCTOR_APPEND_ELT (init, NULL_TREE, build_address (vptr));
+}
+  else
+CONSTRUCTOR_APPEND_ELT (init, NULL_TREE, null_pointer_node);
+
+  if (cppti->hasMonitor ())
+CONSTRUCTOR_APPEND_ELT (init, NULL_TREE, null_pointer_node);
 
   /* Let C++ do the RTTI generation, and just reference the symbol as
  extern, knowing the underlying type is not required.  */
@@ -1471,9 +1480,7 @@ layout_cpp_typeinfo (ClassDeclaration *cd)
 
   /* Build the initializer and emit.  */
   DECL_INITIAL (decl) = build_struct_literal (TREE_TYPE (decl), init);
-  DECL_EXTERNAL (decl) = 0;
-  d_pushdecl (decl);
-  rest_of_decl_compilation (decl, 1, 0);
+  d_finish_decl (decl);
 }
 
 /* Get the VAR_DECL of the __cpp_type_info_ptr for DECL.  If this does not yet
diff --git a/gcc/d/types.cc b/gcc/d/types.cc
index ba2d6d4dc66..8e674618004 100644
--- a/gcc/d/types.cc
+++ b/gcc/d/types.cc
@@ -469,7 +469,7 @@ layout_aggregate_type (AggregateDeclaration *decl, tree type,
  insert_aggregate_field (type, field, 0);
}
 
- if (!id && !cd->isCPPclass ())
+ if (!id && cd->hasMonitor ())
{
  tree field = create_field_decl (ptr_type_node, "__monitor", 1,
  inherited_p);
-- 
2.30.2



[committed 03/12] d: Insert null terminator in obstack buffers

2021-07-30 Thread Iain Buclaw via Gcc-patches
Covers cases where functions that handle the extracted strings ignore
the explicit length.  This isn't something that's known to happen in the
current front-end, but the self-hosted front-end has been observed to do
this in its conversions between D and C-style strings.

gcc/d/ChangeLog:

* d-lang.cc (deps_add_target): Insert null terminator in buffer.
(deps_write): Likewise.
(d_parse_file): Likewise.
---
 gcc/d/d-lang.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/d/d-lang.cc b/gcc/d/d-lang.cc
index ac0945b1f34..4386a489ff2 100644
--- a/gcc/d/d-lang.cc
+++ b/gcc/d/d-lang.cc
@@ -108,7 +108,7 @@ deps_add_target (const char *target, bool quoted)
 
   if (!quoted)
 {
-  obstack_grow (, target, strlen (target));
+  obstack_grow0 (, target, strlen (target));
   d_option.deps_target.safe_push ((const char *) obstack_finish ());
   return;
 }
@@ -149,6 +149,7 @@ deps_add_target (const char *target, bool quoted)
   obstack_1grow (, *p);
 }
 
+  obstack_1grow (, '\0');
   d_option.deps_target.safe_push ((const char *) obstack_finish ());
 }
 
@@ -278,6 +279,8 @@ deps_write (Module *module, obstack *buffer)
   obstack_grow (buffer, str, strlen (str));
   obstack_grow (buffer, ":\n", 2);
 }
+
+  obstack_1grow (buffer, '\0');
 }
 
 /* Implements the lang_hooks.init_options routine for language D.
@@ -884,6 +887,7 @@ d_parse_file (void)
  obstack_grow (, str, strlen (str));
}
 
+ obstack_1grow (, '\0');
  message ("%s", (char *) obstack_finish ());
}
 }
-- 
2.30.2
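The point of the change is that `obstack_grow` copies exactly the given bytes, whereas `obstack_grow0` (or a trailing `obstack_1grow (..., '\0')`) also appends a terminator, so a consumer that calls `strlen` on the finished object stops in the right place. To keep the sketch portable, it uses a tiny fixed-capacity buffer as a hypothetical stand-in for an obstack; `struct strbuf`, `sb_grow`, `sb_1grow`, and `sb_grow0` are illustrative names, not libiberty's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an obstack: a fixed-capacity byte buffer.  */
struct strbuf
{
  char data[64];
  size_t len;
};

/* Like obstack_grow: append LEN bytes, with no terminator.  */
void
sb_grow (struct strbuf *sb, const char *src, size_t len)
{
  assert (sb->len + len <= sizeof (sb->data));
  memcpy (sb->data + sb->len, src, len);
  sb->len += len;
}

/* Like obstack_1grow: append a single byte.  */
void
sb_1grow (struct strbuf *sb, char c)
{
  assert (sb->len < sizeof (sb->data));
  sb->data[sb->len++] = c;
}

/* Like obstack_grow0: append LEN bytes followed by a NUL.  */
void
sb_grow0 (struct strbuf *sb, const char *src, size_t len)
{
  sb_grow (sb, src, len);
  sb_1grow (sb, '\0');
}
```

In the patch, `deps_add_target` uses the `obstack_grow0` form on the unquoted path and an explicit `obstack_1grow (..., '\0')` elsewhere; both leave the finished object NUL-terminated, and in both the terminator counts toward the object's size.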



[committed 02/12] d: Drop any field or parameter types that got cached before conversion failed.

2021-07-30 Thread Iain Buclaw via Gcc-patches
This ensures there are no dangling references to AST members that have
been freed, either explicitly or by the garbage collector.

gcc/d/ChangeLog:

* d-builtins.cc (build_frontend_type): Restore builtin_converted_decls
length on conversion failure.
---
 gcc/d/d-builtins.cc | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/d/d-builtins.cc b/gcc/d/d-builtins.cc
index ff2a5776dc5..9db46c0c5ca 100644
--- a/gcc/d/d-builtins.cc
+++ b/gcc/d/d-builtins.cc
@@ -80,7 +80,8 @@ build_frontend_type (tree type)
 mod |= MODshared;
 
   /* If we've seen the type before, re-use the converted decl.  */
-  for (size_t i = 0; i < builtin_converted_decls.length (); ++i)
+  unsigned saved_builtin_decls_length = builtin_converted_decls.length ();
+  for (size_t i = 0; i < saved_builtin_decls_length; ++i)
 {
   tree t = builtin_converted_decls[i].ctype;
   if (TYPE_MAIN_VARIANT (t) == TYPE_MAIN_VARIANT (type))
@@ -249,6 +250,9 @@ build_frontend_type (tree type)
  Type *ftype = build_frontend_type (TREE_TYPE (field));
  if (!ftype)
{
+ /* Drop any field types that got cached before the conversion
+of this record type failed.  */
+ builtin_converted_decls.truncate (saved_builtin_decls_length);
  delete sdecl->members;
  return NULL;
}
@@ -307,6 +311,9 @@ build_frontend_type (tree type)
  Type *targ = build_frontend_type (argtype);
  if (!targ)
{
+ /* Drop any parameter types that got cached before the
+conversion of this function type failed.  */
+ builtin_converted_decls.truncate (saved_builtin_decls_length);
  delete args;
  return NULL;
}
-- 
2.30.2
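The fix follows a save-and-truncate pattern: record the cache length before converting the members of a compound type, and on any failure truncate the cache back to that length, so entries created for already-converted members (which may refer to AST nodes about to be freed) are discarded. A minimal sketch of the pattern, assuming a hypothetical fixed-size cache of integer "types" in place of `builtin_converted_decls` and its `truncate` method:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache standing in for builtin_converted_decls.  */
static int cache[32];
static size_t cache_len = 0;

static void
cache_push (int t)
{
  assert (cache_len < sizeof (cache) / sizeof (cache[0]));
  cache[cache_len++] = t;
}

/* Hypothetical conversion of a single member: negative "types" fail,
   everything else succeeds and is cached.  */
static int
convert_member (int t)
{
  if (t < 0)
    return 0;
  cache_push (t);
  return 1;
}

/* Convert a compound type made of N members.  On failure, drop any
   entries cached during this call, as the patch does with
   builtin_converted_decls.truncate (saved_builtin_decls_length).  */
int
convert_compound (const int *members, size_t n)
{
  size_t saved_len = cache_len;
  for (size_t i = 0; i < n; i++)
    if (!convert_member (members[i]))
      {
        cache_len = saved_len;   /* truncate back to the saved length */
        return 0;
      }
  return 1;
}
```

A successful conversion keeps its entries; a failed one leaves the cache exactly as it was before the call.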



[committed 01/12] d: Factor d_nested_class and d_nested_struct into single function.

2021-07-30 Thread Iain Buclaw via Gcc-patches
Both do the exact same operation, just on different AST nodes.

gcc/d/ChangeLog:

* d-codegen.cc (d_nested_class): Rename to ...
(get_outer_function): ... this.  Handle all aggregate declarations.
(d_nested_struct): Remove.
(find_this_tree): Use get_outer_function.
(get_framedecl): Likewise.
---
 gcc/d/d-codegen.cc | 54 --
 1 file changed, 14 insertions(+), 40 deletions(-)

diff --git a/gcc/d/d-codegen.cc b/gcc/d/d-codegen.cc
index f35de90b54c..fe2ad98e60a 100644
--- a/gcc/d/d-codegen.cc
+++ b/gcc/d/d-codegen.cc
@@ -2354,41 +2354,24 @@ get_frame_for_symbol (Dsymbol *sym)
   return null_pointer_node;
 }
 
-/* Return the parent function of a nested class CD.  */
+/* Return the parent function of a nested class or struct AD.  */
 
 static FuncDeclaration *
-d_nested_class (ClassDeclaration *cd)
+get_outer_function (AggregateDeclaration *ad)
 {
   FuncDeclaration *fd = NULL;
-  while (cd && cd->isNested ())
+  while (ad && ad->isNested ())
 {
-  Dsymbol *dsym = cd->toParent2 ();
+  Dsymbol *dsym = ad->toParent2 ();
   if ((fd = dsym->isFuncDeclaration ()))
return fd;
   else
-   cd = dsym->isClassDeclaration ();
+   ad = dsym->isAggregateDeclaration ();
 }
-  return NULL;
-}
-
-/* Return the parent function of a nested struct SD.  */
 
-static FuncDeclaration *
-d_nested_struct (StructDeclaration *sd)
-{
-  FuncDeclaration *fd = NULL;
-  while (sd && sd->isNested ())
-{
-  Dsymbol *dsym = sd->toParent2 ();
-  if ((fd = dsym->isFuncDeclaration ()))
-   return fd;
-  else
-   sd = dsym->isStructDeclaration ();
-}
   return NULL;
 }
 
-
 /* Starting from the current function FD, try to find a suitable value of
`this' in nested function instances.  A suitable `this' value is an
instance of OCD or a class that has OCD as a base.  */
@@ -2411,18 +2394,17 @@ find_this_tree (ClassDeclaration *ocd)
return convert_expr (get_decl_tree (fd->vthis),
 cd->type, ocd->type);
 
- fd = d_nested_class (cd);
+ fd = get_outer_function (cd);
+ continue;
}
-  else
-   {
- if (fd->isNested ())
-   {
- fd = fd->toParent2 ()->isFuncDeclaration ();
- continue;
-   }
 
- fd = NULL;
+  if (fd->isNested ())
+   {
+ fd = fd->toParent2 ()->isFuncDeclaration ();
+ continue;
}
+
+  fd = NULL;
 }
 
   return NULL_TREE;
@@ -2760,10 +2742,6 @@ get_framedecl (FuncDeclaration *inner, FuncDeclaration *outer)
 
   while (fd && fd != outer)
 {
-  AggregateDeclaration *ad;
-  ClassDeclaration *cd;
-  StructDeclaration *sd;
-
   /* Parent frame link is the first field.  */
   if (FRAMEINFO_CREATES_FRAME (get_frameinfo (fd)))
result = indirect_ref (ptr_type_node, result);
@@ -2773,12 +2751,8 @@ get_framedecl (FuncDeclaration *inner, FuncDeclaration *outer)
   /* The frame/closure record always points to the outer function's
 frame, even if there are intervening nested classes or structs.
 So, we can just skip over these.  */
-  else if ((ad = fd->isThis ()) && (cd = ad->isClassDeclaration ()))
-   fd = d_nested_class (cd);
-  else if ((ad = fd->isThis ()) && (sd = ad->isStructDeclaration ()))
-   fd = d_nested_struct (sd);
   else
-   break;
+   fd = get_outer_function (fd->isThis ());
 }
 
   if (fd != outer)
-- 
2.30.2
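The two functions can be merged because the walk is the same for classes and structs: while the aggregate is nested, look at its parent, stop at the first enclosing function, and otherwise keep walking through enclosing aggregates. The sketch below models that walk with a hypothetical `struct sym` node whose `kind`, `nested`, and `parent` fields stand in for the `isFuncDeclaration`/`isAggregateDeclaration` casts, `isNested`, and `toParent2`; it is not the front-end's data model.

```c
#include <assert.h>
#include <stddef.h>

enum sym_kind { SYM_FUNC, SYM_CLASS, SYM_STRUCT, SYM_MODULE };

/* Hypothetical symbol node: KIND plays the role of the isXxx () casts,
   NESTED of isNested (), PARENT of toParent2 ().  */
struct sym
{
  enum sym_kind kind;
  int nested;
  struct sym *parent;
};

static int
is_aggregate (const struct sym *s)
{
  return s->kind == SYM_CLASS || s->kind == SYM_STRUCT;
}

/* Return the nearest enclosing function of aggregate AD, or NULL if
   AD is NULL or not nested inside a function.  Classes and structs
   go through the same loop.  */
const struct sym *
get_outer_function (const struct sym *ad)
{
  while (ad && ad->nested)
    {
      const struct sym *parent = ad->parent;
      assert (parent != NULL);
      if (parent->kind == SYM_FUNC)
        return parent;
      ad = is_aggregate (parent) ? parent : NULL;
    }
  return NULL;
}
```

As written in the diff, the merged loop also continues through mixed chains such as a class nested in a struct, where the old `d_nested_class` loop would have stopped at the struct because `isClassDeclaration` returns NULL for it.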



[committed 00/12] d: Series of refactorings to the D front-end

2021-07-30 Thread Iain Buclaw via Gcc-patches
Hi,

This small series of patches contains non-mechanical refactorings of the
D front-end that were either required to work with the self-hosted
version or made during the port to the self-hosted compiler.

Each individual change has been pulled out into its own patch, rather
than lumped together into one big commit.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32, and
committed to mainline.

Regards,
Iain.

Iain Buclaw (12):
  d: Factor d_nested_class and d_nested_struct into single function.
  d: Drop any field or parameter types that got cached before conversion
failed.
  d: Insert null terminator in obstack buffers
  d: Use hasMonitor to determine whether to emit a __monitor field in D
classes
  d: Use Identifier::idPool to generate anonymous field name.
  d: Factor aggregate_initializer_decl to set the sinit for aggregate
declarations.
  d: Set COMDAT and visibility of thunks only if they are public.
  d: Only handle named enums in enum_initializer_decl
  d: Clarify comment for generating static array assignment with
literal.
  d: Don't generate a PREDICT_EXPR when assert contracts are turned off.
  d: Always layout initializer for the m_RTInfo field in TypeInfo_Class
  d: Remove dead code from binary_op.

 gcc/d/d-builtins.cc | 19 
 gcc/d/d-codegen.cc  | 54 -
 gcc/d/d-lang.cc |  6 -
 gcc/d/decl.cc   | 45 +++--
 gcc/d/expr.cc   | 26 +-
 gcc/d/typeinfo.cc   | 23 +--
 gcc/d/types.cc  |  2 +-
 7 files changed, 80 insertions(+), 95 deletions(-)

-- 
2.30.2



[PATCH] c++: Fix up attribute rollbacks in cp_parser_statement

2021-07-30 Thread Jakub Jelinek via Gcc-patches
Hi!

While working on OpenMP directives using C++ attribute syntax, I noticed
that cp_parser_statement, when parsing various block declarations that do
not allow an attribute-specifier-seq at the start, rolls back the attributes
only if std_attrs is non-NULL (i.e. some attributes have been parsed).
It doesn't roll back if some tokens were parsed as an attribute-specifier-seq
but didn't yield any attributes (e.g. [[]][[]][[]][[]]), which means
we accept those empty attributes even in places where the grammar does
not allow them.

The following patch fixes that by instead checking whether any tokens
were consumed and need to be rolled back.  This makes the parser handle
the first function in the testcase (foo, with empty attributes) the same
as the second one (bar, where an attribute appears).

The testcase contains two xfails: using namespace ... apparently
allows attributes at the start, and in that case the attributes shall
appertain to the using-directive.  To be fixed incrementally.
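The core of the fix is deciding whether to roll back by comparing the lexer position before and after parsing the attribute-specifier-seq, rather than by checking whether any attributes were produced. A toy sketch of that distinction, assuming a hypothetical character-level "lexer" in which each `[[...]]` group is one attribute-specifier (this is not the C++ parser's API):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy lexer: POS indexes into SRC.  */
struct lexer
{
  const char *src;
  size_t pos;
};

/* Parse a sequence of "[[...]]" groups at the current position and
   return how many non-empty ones were found (a stand-in for "attributes
   parsed").  Empty groups "[[]]" consume input but yield nothing.  */
int
parse_attr_seq (struct lexer *lx)
{
  int attrs = 0;
  while (lx->src[lx->pos] == '[' && lx->src[lx->pos + 1] == '[')
    {
      size_t start = lx->pos + 2;
      size_t end = start;
      while (lx->src[end]
             && !(lx->src[end] == ']' && lx->src[end + 1] == ']'))
        end++;
      if (!lx->src[end])
        break;                  /* unterminated: stop without consuming */
      if (end > start)
        attrs++;
      lx->pos = end + 2;
    }
  return attrs;
}

/* Mirror of the patch's has_std_attrs test: report whether parsing
   consumed any input, independent of whether it produced attributes.  */
int
parse_and_check (struct lexer *lx, int *attrs_out)
{
  size_t before = lx->pos;
  *attrs_out = parse_attr_seq (lx);
  return lx->pos != before;
}
```

For input such as `[[]][[]] asm`, the attribute count is 0 (so the old `std_attrs != NULL_TREE` check would not roll back), while the position check reports that input was consumed.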

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-07-30  Jakub Jelinek  

* parser.c (cp_parser_statement): Rollback attributes not just
when std_attrs is non-NULL, but whenever
cp_parser_std_attribute_spec_seq parsed any tokens.

* g++.dg/cpp0x/gen-attrs-76.C: New test.

--- gcc/cp/parser.c.jj  2021-07-30 11:19:39.431614703 +0200
+++ gcc/cp/parser.c 2021-07-30 12:22:16.995130642 +0200
@@ -11909,6 +11909,7 @@ cp_parser_statement (cp_parser* parser,
   cp_token *token;
   location_t statement_location, attrs_loc;
   bool in_omp_attribute_pragma = parser->lexer->in_omp_attribute_pragma;
+  bool has_std_attrs;
 
  restart:
   if (if_p != NULL)
@@ -11917,7 +11918,8 @@ cp_parser_statement (cp_parser* parser,
   statement = NULL_TREE;
 
   saved_token_sentinel saved_tokens (parser->lexer);
-  attrs_loc = cp_lexer_peek_token (parser->lexer)->location;
+  token = cp_lexer_peek_token (parser->lexer);
+  attrs_loc = token->location;
   if (c_dialect_objc ())
 /* In obj-c++, seeing '[[' might be the either the beginning of
c++11 attributes, or a nested objc-message-expression.  So
@@ -11931,6 +11933,7 @@ cp_parser_statement (cp_parser* parser,
   if (!cp_parser_parse_definitely (parser))
std_attrs = NULL_TREE;
 }
+  has_std_attrs = cp_lexer_peek_token (parser->lexer) != token;
 
   if (std_attrs && (flag_openmp || flag_openmp_simd))
 std_attrs = cp_parser_handle_statement_omp_attributes (parser, std_attrs);
@@ -11999,7 +12002,7 @@ cp_parser_statement (cp_parser* parser,
 
case RID_NAMESPACE:
  /* This must be a namespace alias definition.  */
- if (std_attrs != NULL_TREE)
+ if (has_std_attrs)
{
  /* Attributes should be parsed as part of the
 declaration, so let's un-parse them.  */
@@ -12104,7 +12107,7 @@ cp_parser_statement (cp_parser* parser,
 {
   if (cp_lexer_next_token_is_not (parser->lexer, CPP_SEMICOLON))
{
- if (std_attrs != NULL_TREE)
+ if (has_std_attrs)
/* Attributes should be parsed as part of the declaration,
   so let's un-parse them.  */
saved_tokens.rollback();
@@ -12116,7 +12119,7 @@ cp_parser_statement (cp_parser* parser,
  if (cp_parser_parse_definitely (parser))
return;
  /* It didn't work, restore the post-attribute position.  */
- if (std_attrs)
+ if (has_std_attrs)
cp_lexer_set_token_position (parser->lexer, statement_token);
}
   /* All preceding labels have been parsed at this point.  */
--- gcc/testsuite/g++.dg/cpp0x/gen-attrs-76.C.jj  2021-07-30 12:16:59.472477365 +0200
+++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-76.C   2021-07-30 12:21:32.440740569 +0200
@@ -0,0 +1,31 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wno-attributes" }
+
+namespace N {}
+namespace O { typedef int T; };
+
+void
+foo ()
+{
+  [[]] asm ("");   // { dg-error "expected" }
+  [[]] __extension__ asm (""); // { dg-error "expected" }
+  __extension__ [[]] asm (""); // { dg-error "expected" }
+  [[]] namespace M = ::N;  // { dg-error "expected" }
+  [[]] using namespace N;  // { dg-bogus "expected" "" { xfail *-*-* } }
+  [[]] using O::T; // { dg-error "expected" }
+  [[]] __label__ foo;  // { dg-error "expected" }
+  [[]] static_assert (true, "");   // { dg-error "expected" }
+}
+
+void
+bar ()
+{
+  [[gnu::unused]] asm ("");// { dg-error "expected" }
+  [[gnu::unused]] __extension__ asm ("");  // { dg-error "expected" }
+  __extension__ [[gnu::unused]] asm ("");  // { dg-error "expected" }
+  [[gnu::unused]] namespace M = ::N;   // { dg-error "expected" }
+  [[gnu::unused]] using namespace N;   // { dg-bogus "expected" "" { xfail *-*-* } }
+  [[gnu::unused]] using O::T;  // { 

[RFC] Mark gcc.dg/shrink-wrap-loop.c as XFAIL.

2021-07-30 Thread Aldy Hernandez via Gcc-patches
It occurs to me that I should not have disabled early jump threading in
this test, as doing so may hide an actual defect.  I have reverted my change
and XFAILed the test instead.  I have also opened PR101690 to keep track
of this problem.

I have pushed this patch, but it would benefit from someone with knowledge
of loop-ch and/or RTL shrink-wrapping looking at the PR: here a valid
jump thread causes loop-ch to drastically change the probabilities,
which ultimately makes us fail at shrink-wrapping.

Thanks.

gcc/testsuite/ChangeLog:

* gcc.dg/shrink-wrap-loop.c: Enable early jump threading.  Mark as
XFAIL.
---
 gcc/testsuite/gcc.dg/shrink-wrap-loop.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/shrink-wrap-loop.c b/gcc/testsuite/gcc.dg/shrink-wrap-loop.c
index ba872fa23f6..6e1be8937fe 100644
--- a/gcc/testsuite/gcc.dg/shrink-wrap-loop.c
+++ b/gcc/testsuite/gcc.dg/shrink-wrap-loop.c
@@ -1,6 +1,5 @@
 /* { dg-do compile { target { { { i?86-*-* x86_64-*-* } && lp64 } || { arm_thumb2 } } } } */
 /* { dg-options "-O2 -fdump-rtl-pro_and_epilogue"  } */
-// { dg-additional-options "-fdisable-tree-ethread" }
 
 /*
 Our new threader is threading things a bit too early, and causing the
@@ -69,4 +68,4 @@ test (int *p1, int *p2)
 
   return 1;
 }
-/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue"  } } */
+/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" { xfail *-*-* } } } */
-- 
2.31.1



Re: [PATCH] i386: Improve extensions of __builtin_clz and constant - __builtin_clz for -mno-lzcnt [PR78103]

2021-07-30 Thread Uros Bizjak via Gcc-patches
On Wed, Jul 28, 2021 at 10:36 AM Jakub Jelinek  wrote:
>
> Hi!
>
> This patch improves emitted code for the non-TARGET_LZCNT case.
> As __builtin_clz* is UB on 0 argument and for !TARGET_LZCNT
> CLZ_VALUE_DEFINED_AT_ZERO is 0, it is UB even at RTL time and so we
> can take advantage of that and assume the result will be 0 to 31 or
> 0 to 63.
> Given that, sign or zero extension of that result are the same and
> are actually already performed by bsrl or xorl instructions.
> And constant - __builtin_clz* can be simplified into
> bsr + constant - bitmask.
> For TARGET_LZCNT, a lot of this is already fine as is (e.g. the sign or
> zero extensions), and other optimizations are IMHO not possible
> (if we have lzcnt, we've lost information on whether it is UB at
> zero or not and so can't transform it into bsr even when that is
> 1-2 insns shorter).
> The changes on the 3 testcases between unpatched and patched gcc
> are for -m64:
> pr78103-1.s:
>     bsrq    %rdi, %rax
> -   xorq    $63, %rax
> -   cltq
> +   xorl    $63, %eax
> ...
>     bsrq    %rdi, %rax
> -   xorq    $63, %rax
> -   cltq
> +   xorl    $63, %eax
> ...
>     bsrl    %edi, %eax
>     xorl    $31, %eax
> -   cltq
> ...
>     bsrl    %edi, %eax
>     xorl    $31, %eax
> -   cltq
> pr78103-2.s:
>     bsrl    %edi, %edi
> -   movl    $32, %eax
> -   xorl    $31, %edi
> -   subl    %edi, %eax
> +   leal    1(%rdi), %eax
> ...
> -   bsrl    %edi, %edi
> -   movl    $31, %eax
> -   xorl    $31, %edi
> -   subl    %edi, %eax
> +   bsrl    %edi, %eax
> ...
>     bsrq    %rdi, %rdi
> -   movl    $64, %eax
> -   xorq    $63, %rdi
> -   subl    %edi, %eax
> +   leal    1(%rdi), %eax
> ...
> -   bsrq    %rdi, %rdi
> -   movl    $63, %eax
> -   xorq    $63, %rdi
> -   subl    %edi, %eax
> +   bsrq    %rdi, %rax
> pr78103-3.s:
>     bsrl    %edi, %edi
> -   movl    $32, %eax
> -   xorl    $31, %edi
> -   movslq  %edi, %rdi
> -   subq    %rdi, %rax
> +   leaq    1(%rdi), %rax
> ...
> -   bsrl    %edi, %edi
> -   movl    $31, %eax
> -   xorl    $31, %edi
> -   movslq  %edi, %rdi
> -   subq    %rdi, %rax
> +   bsrl    %edi, %eax
> ...
>     bsrq    %rdi, %rdi
> -   movl    $64, %eax
> -   xorq    $63, %rdi
> -   movslq  %edi, %rdi
> -   subq    %rdi, %rax
> +   leaq    1(%rdi), %rax
> ...
> -   bsrq    %rdi, %rdi
> -   movl    $63, %eax
> -   xorq    $63, %rdi
> -   movslq  %edi, %rdi
> -   subq    %rdi, %rax
> +   bsrq    %rdi, %rax
>
> Most of the changes are done with combine splitters, but for
> *bsr_rex64_2 and *bsr_2 I had to use define_insn_and_split, because
> as mentioned in the PR the combiner unfortunately doesn't create LOG_LINKS
> in between the two insns created by combine splitter, so it can't be
> combined further with following instructions.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-07-28  Jakub Jelinek  
>
> PR target/78103
> * config/i386/i386.md (*bsr_rex64_1, *bsr_1, *bsr_zext_1): New
> define_insn patterns.
> (*bsr_rex64_2, *bsr_2): New define_insn_and_split patterns.
> Add combine splitters for constant - clz.
> (clz2): Use a temporary pseudo for bsr result.
>
> * gcc.target/i386/pr78103-1.c: New test.
> * gcc.target/i386/pr78103-2.c: New test.
> * gcc.target/i386/pr78103-3.c: New test.
>
> --- gcc/config/i386/i386.md.jj  2021-07-27 09:47:30.311970004 +0200
> +++ gcc/config/i386/i386.md 2021-07-27 15:37:59.011394624 +0200
> @@ -14761,6 +14761,18 @@ (define_insn "bsr_rex64"
> (set_attr "znver1_decode" "vector")
> (set_attr "mode" "DI")])
>
> +(define_insn "*bsr_rex64_1"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +   (minus:DI (const_int 63)
> + (clz:DI (match_operand:DI 1 "nonimmediate_operand" "rm"
> +   (clobber (reg:CC FLAGS_REG))]
> +  "!TARGET_LZCNT && TARGET_64BIT"
> +  "bsr{q}\t{%1, %0|%0, %1}"
> +  [(set_attr "type" "alu1")
> +   (set_attr "prefix_0f" "1")
> +   (set_attr "znver1_decode" "vector")
> +   (set_attr "mode" "DI")])
> +
>  (define_insn "bsr"
>[(set (reg:CCZ FLAGS_REG)
> (compare:CCZ (match_operand:SI 1 "nonimmediate_operand" "rm")
> @@ -14775,17 +14787,210 @@ (define_insn "bsr"
> (set_attr "znver1_decode" "vector")
> (set_attr "mode" "SI")])
>
> +(define_insn "*bsr_1"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +   (minus:SI (const_int 31)
> + (clz:SI (match_operand:SI 1 "nonimmediate_operand" "rm"
> +   (clobber (reg:CC FLAGS_REG))]
> +  "!TARGET_LZCNT"
> +  "bsr{l}\t{%1, %0|%0, %1}"
> +  [(set_attr "type" "alu1")
> +   (set_attr "prefix_0f" "1")
> +   (set_attr "znver1_decode" "vector")
> +   (set_attr "mode" "SI")])
> +
> +(define_insn "*bsr_zext_1"
> +  [(set 

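The rewrites in the quoted PR78103 description rest on a few bit identities that hold because `__builtin_clz` is undefined at 0 for `!TARGET_LZCNT`, so its result lies in 0..31. For nonzero 32-bit x, BSR returns the index of the highest set bit, so `31 - clz (x) == bsr (x)`; since 31 is all ones in the low 5 bits, `31 - clz` also equals `31 ^ clz` (the `xorl $31` in the unpatched output); and `32 - clz (x) == bsr (x) + 1` (the `leal 1(%rdi)` in the patched output). The sketch below checks these identities, assuming GCC or Clang for `__builtin_clz` and a 32-bit `unsigned int`; `bsr32` is a hypothetical portable stand-in for the instruction.

```c
#include <assert.h>

/* Portable stand-in for x86 BSR on a 32-bit value: index of the
   highest set bit.  As with the instruction, X == 0 has no meaningful
   result, so callers must not pass 0.  */
int
bsr32 (unsigned int x)
{
  int idx = -1;
  assert (x != 0);
  while (x)
    {
      x >>= 1;
      idx++;
    }
  return idx;
}

/* Check the identities used by the patch for one nonzero X.  */
int
clz_identities_hold (unsigned int x)
{
  int clz = __builtin_clz (x);
  return 31 - clz == bsr32 (x)          /* *bsr_1: 31 - clz is bsr */
         && (31 ^ clz) == 31 - clz      /* xor form, valid for 0..31 */
         && 32 - clz == bsr32 (x) + 1;  /* constant - clz */
}
```

The 64-bit variants follow the same reasoning with 63 and `__builtin_clzll`.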
Re: OpenMP 5.1: omp_display_env

2021-07-30 Thread Thomas Schwinge
Hi!

On 2021-07-30T12:02:00+0200, Jakub Jelinek via Gcc-patches wrote:
> On Fri, Jul 30, 2021 at 11:54:00AM +0200, Ulrich Drepper wrote:
>> On Fri, Jul 30, 2021 at 10:50 AM Jakub Jelinek  wrote:
>>
>> > I think for now it would be better to guard the omp_display_env_*
>> > in fortran.c with #ifndef LIBGOMP_OFFLOADED_ONLY
>>
>> OK, easy enough.  This compiles for me.
>
> Ok (with ChangeLog entry), thanks.

Heh, I had just come up with the same patch, and pushed
"[libgomp] Restore offloading 'libgomp/fortran.c'" to master branch in
commit 28665ddc7efa48f9b39615e313a2c4a7a66cdb24, see attached.


Grüße
 Thomas


>> diff --git a/libgomp/fortran.c b/libgomp/fortran.c
>> index 76285d4376b..26ec8ce30d8 100644
>> --- a/libgomp/fortran.c
>> +++ b/libgomp/fortran.c
>> @@ -738,6 +738,7 @@ omp_get_default_allocator_ ()
>>return (intptr_t) omp_get_default_allocator ();
>>  }
>>
>> +#ifndef LIBGOMP_OFFLOADED_ONLY
>>  void
>>  omp_display_env_ (const int32_t *verbose)
>>  {
>> @@ -749,3 +750,4 @@ omp_display_env_8_ (const int64_t *verbose)
>>  {
>>omp_display_env (!!*verbose);
>>  }
>> +#endif /* LIBGOMP_OFFLOADED_ONLY */
>
>   Jakub


From 28665ddc7efa48f9b39615e313a2c4a7a66cdb24 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 30 Jul 2021 11:48:54 +0200
Subject: [PATCH] [libgomp] Restore offloading 'libgomp/fortran.c'

GCN:

ld: error: undefined symbol: gomp_ialias_omp_display_env
>>> referenced by fortran.c:744 ([...]/source-gcc/libgomp/fortran.c:744)
>>>   fortran.o:(omp_display_env_) in archive [...]/build-gcc-offload-amdgcn-amdhsa/amdgcn-amdhsa/libgomp/.libs/libgomp.a
>>> referenced by fortran.c:744 ([...]/source-gcc/libgomp/fortran.c:744)
>>>   fortran.o:(omp_display_env_) in archive [...]/build-gcc-offload-amdgcn-amdhsa/amdgcn-amdhsa/libgomp/.libs/libgomp.a
>>> referenced by fortran.c:750 ([...]/source-gcc/libgomp/fortran.c:750)
>>>   fortran.o:(omp_display_env_8_) in archive [...]/build-gcc-offload-amdgcn-amdhsa/amdgcn-amdhsa/libgomp/.libs/libgomp.a
>>> referenced by fortran.c:750 ([...]/source-gcc/libgomp/fortran.c:750)
>>>   fortran.o:(omp_display_env_8_) in archive [...]/build-gcc-offload-amdgcn-amdhsa/amdgcn-amdhsa/libgomp/.libs/libgomp.a
collect2: error: ld returned 1 exit status
mkoffload: fatal error: build-gcc/gcc/x86_64-pc-linux-gnu-accel-amdgcn-amdhsa-gcc returned 1 exit status

nvptx:

unresolved symbol omp_display_env
collect2: error: ld returned 1 exit status
mkoffload: fatal error: [...]/build-gcc/./gcc/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status

Fix-up for commit 7123ae2455b5a1a2f19f13fa82c377cfda157f23
"Implement OpenMP 5.1 section 3.15: omp_display_env".

	libgomp/
	* fortran.c (omp_display_env_, omp_display_env_8_): Only
	'#ifndef LIBGOMP_OFFLOADED_ONLY'.

Co-Authored-By: Ulrich Drepper 
---
 libgomp/fortran.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index 76285d4376b..e042702ac91 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -738,6 +738,8 @@ omp_get_default_allocator_ ()
   return (intptr_t) omp_get_default_allocator ();
 }
 
+#ifndef LIBGOMP_OFFLOADED_ONLY
+
 void
 omp_display_env_ (const int32_t *verbose)
 {
@@ -749,3 +751,5 @@ omp_display_env_8_ (const int64_t *verbose)
 {
   omp_display_env (!!*verbose);
 }
+
+#endif /* LIBGOMP_OFFLOADED_ONLY */
-- 
2.30.2

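For readers unfamiliar with the link failure above, here is a minimal, self-contained C sketch of the pattern the patch applies: the Fortran entry points are compiled only when the C function they forward to is also compiled. The `omp_display_env` body below is a hypothetical stub that only records its argument (the real one in `env.c` prints the ICVs), and the sketch assumes `LIBGOMP_OFFLOADED_ONLY` is left undefined, as in a host build.

```c
#include <assert.h>
#include <stdint.h>

#ifndef LIBGOMP_OFFLOADED_ONLY
/* Hypothetical stub standing in for the host-only omp_display_env;
   it only remembers the verbosity it was asked for.  */
static int last_verbose = -1;

static void
omp_display_env (int verbose)
{
  last_verbose = verbose;
}

/* Fortran passes arguments by reference; "!!" folds any nonzero value
   to 1.  These wrappers exist only where omp_display_env is defined,
   so an offload build never references a missing symbol.  */
void
omp_display_env_ (const int32_t *verbose)
{
  omp_display_env (!!*verbose);
}

void
omp_display_env_8_ (const int64_t *verbose)
{
  omp_display_env (!!*verbose);
}
#endif /* LIBGOMP_OFFLOADED_ONLY */
```

Building the same file with `-DLIBGOMP_OFFLOADED_ONLY` would drop all three functions together, which is the property that avoids the unresolved `omp_display_env` reference on GCN and nvptx.
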


Re: PING^1 [PATCH v2] x86: Check AVX512 without mask instructions

2021-07-30 Thread Uros Bizjak via Gcc-patches
On Mon, Jul 26, 2021 at 5:33 AM Hongtao Liu  wrote:
>
> On Wed, Jul 14, 2021 at 8:27 PM H.J. Lu  wrote:
> >
> > On Fri, Jun 25, 2021 at 5:39 AM H.J. Lu  wrote:
> > >
> > > On Fri, Jun 25, 2021 at 12:50 AM Uros Bizjak  wrote:
> > > >
> > > > On Fri, Jun 25, 2021 at 4:51 AM Hongtao Liu  wrote:
> > > > >
> > > > > On Fri, Jun 25, 2021 at 12:13 AM Uros Bizjak via Gcc-patches
> > > > >  wrote:
> > > > > >
> > > > > > On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu  wrote:
> > > > > > >
> > > > > > > CPUID functions are used to detect CPU features.  If vector ISAs
> > > > > > > are enabled, compiler is free to use them in these functions.  Add
> > > > > > > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > > > > > > to avoid vector instructions.
> > > > > >
> > > > > > These functions are intended to be inlined, so how does target
> > > > > > attribute affect inlining?
> > > > > I guess w/ -O0. they may not be inlined, that's why H.J adds those
> > > > > attributes to those functions.
> > > >
> > > > The problem is not with these functions, but with surrounding checks
> > > > for cpuid features. These checks are implemented with logic
> > > > instructions, and nothing prevents RA from allocating mask registers,
> > > > and consequently mask insn is emitted. Regarding mentioned functions,
> > > > cpuid insn pattern has four GPR single-reg constraints, so mask
> > > > registers can't be allocated here.
> > > >
> > > > > pr96814.dump:
> > > > > 0804aa40 :
> > > > >  804aa40: 8d 4c 24 04          lea    0x4(%esp),%ecx
> > > > > ...
> > > > >  804aa63: 6a 07                push   $0x7
> > > > >  804aa65: e8 e0 e7 ff ff       call   804924a <__get_cpuid_count>
> > > > >
> > > > > Also we need to add a target attribute to avx512f_os_support (), and
> > > > > that would be enough to fix the AVX512 part.
> > > > >
> > > > > Moreover, all check functions in below files may also need to deal with:
> > > > > adx-check.h
> > > > > aes-avx-check.h
> > > > > aes-check.h
> > > > > amx-check.h
> > > > > attr-nocf-check-1a.c
> > > > > attr-nocf-check-3a.c
> > > > > avx2-check.h
> > > > > avx2-vpop-check.h
> > > > > avx512bw-check.h
> > > > > avx512-check.h
> > > > > avx512dq-check.h
> > > > > avx512er-check.h
> > > > > avx512f-check.h
> > > > > avx512vl-check.h
> > > > > avx-check.h
> > > > > bmi2-check.h
> > > > > bmi-check.h
> > > > > cf_check-1.c
> > > > > cf_check-2.c
> > > > > cf_check-3.c
> > > > > cf_check-4.c
> > > > > cf_check-5.c
> > > > > f16c-check.h
> > > > > fma4-check.h
> > > > > fma-check.h
> > > > > isa-check.h
> > > > > lzcnt-check.h
> > > > > m128-check.h
> > > > > m256-check.h
> > > > > m512-check.h
> > > > > mmx-3dnow-check.h
> > > > > mmx-check.h
> > > > > pclmul-avx-check.h
> > > > > pclmul-check.h
> > > > > pr39315-check.c
> > > > > rtm-check.h
> > > > > sha-check.h
> > > > > spellcheck-options-1.c
> > > > > spellcheck-options-2.c
> > > > > spellcheck-options-3.c
> > > > > spellcheck-options-4.c
> > > > > spellcheck-options-5.c
> > > > > sse2-check.h
> > > > > sse3-check.h
> > > > > sse4_1-check.h
> > > > > sse4_2-check.h
> > > > > sse4a-check.h
> > > > > sse-check.h
> > > > > ssse3-check.h
> > > > > stack-check-11.c
> > > > > stack-check-12.c
> > > > > stack-check-17.c
> > > > > stack-check-18.c
> > > > > stack-check-19.c
> > > > > xop-check.h
> > > >
> > > > True, but this would just paper over the real problem. Now, it is
> > > > expected that the user decorates the function that checks CPUID
> > > > features with the target attribute. I'm not sure if this is OK.
> vmovw is enabled by AVX512FP16, and compiling a cpuid check function with
> avx512fp16 may result in SIGILL on non-avx512fp16 targets (though we
> don't have a testcase yet).

In struct processor_costs (i386.h) we have:

  const int sse_to_integer;  /* cost of moving SSE register to integer.  */
  const int integer_to_sse;  /* cost of moving integer register to SSE. */
  const int mask_to_integer; /* cost of moving mask register to integer.  */
  const int integer_to_mask; /* cost of moving integer register to mask.  */

These are currently set sufficiently high, so we won't get vmovw for
the same reason we don't get vmovd and vmovq.

> Would that be a sufficient reason to disable avx512 for cpuid check?

We would like to avoid inter-unit moves, and keep values in their
respective register set as much as possible. This is the reason for
relatively high values for the above costs and special passes were
introduced (STV) to avoid excessive moves between register sets.
Without this approach, the register allocator is free to generate e.g.
instructions with mask registers instead of integer registers
(especially under register pressure), trading spills for inter-unit
moves.

We tried to spill to SSE registers, and the experiment ended with a
nice list of PRs. See ix86_spill_class in i386.c.

Decorating the function with general_regs_only would just paper over
the above problem. Regarding mask registers, some 

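For reference, a CPUID check decorated in the way H.J.'s patch proposes might look like the sketch below. It assumes GCC 11 or later on x86, where `target ("general-regs-only")` is accepted, and uses `__get_cpuid_count` and `bit_AVX512F` from GCC's `<cpuid.h>`; the function name `cpu_has_avx512f` is illustrative. A complete AVX512 check would also need to confirm OS support for the extended register state (the `avx512f_os_support ()` step mentioned in the thread), which is omitted here. Uros's objection is that this decoration hides, rather than fixes, the allocator's freedom to use mask registers in surrounding code.

```c
#include <assert.h>

#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 11 \
    && (defined (__x86_64__) || defined (__i386__))
#include <cpuid.h>

/* Restrict this function to general-purpose registers so the feature
   test itself cannot be compiled into vector or mask instructions.  */
__attribute__ ((target ("general-regs-only")))
static int
cpu_has_avx512f (void)
{
  unsigned int eax, ebx, ecx, edx;

  /* CPUID leaf 7, subleaf 0: structured extended feature flags.  */
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return 0;
  return (ebx & bit_AVX512F) != 0;
}
#else
/* Fallback for compilers or targets without the attribute.  */
static int
cpu_has_avx512f (void)
{
  return 0;
}
#endif
```
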
Re: OpenMP 5.1: omp_display_env

2021-07-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 30, 2021 at 11:54:00AM +0200, Ulrich Drepper wrote:
> On Fri, Jul 30, 2021 at 10:50 AM Jakub Jelinek  wrote:
> 
> > I think for now it would be better to guard the omp_display_env_*
> > in fortran.c with #ifndef LIBGOMP_OFFLOADED_ONLY
> >
> 
> OK, easy enough.  This compiles for me.

Ok (with ChangeLog entry), thanks.

> diff --git a/libgomp/fortran.c b/libgomp/fortran.c
> index 76285d4376b..26ec8ce30d8 100644
> --- a/libgomp/fortran.c
> +++ b/libgomp/fortran.c
> @@ -738,6 +738,7 @@ omp_get_default_allocator_ ()
>    return (intptr_t) omp_get_default_allocator ();
>  }
> 
> +#ifndef LIBGOMP_OFFLOADED_ONLY
>  void
>  omp_display_env_ (const int32_t *verbose)
>  {
> @@ -749,3 +750,4 @@ omp_display_env_8_ (const int64_t *verbose)
>  {
>    omp_display_env (!!*verbose);
>  }
> +#endif /* LIBGOMP_OFFLOADED_ONLY */

Jakub



Re: OpenMP 5.1: omp_display_env

2021-07-30 Thread Ulrich Drepper via Gcc-patches
On Fri, Jul 30, 2021 at 10:50 AM Jakub Jelinek  wrote:

> I think for now it would be better to guard the omp_display_env_*
> in fortran.c with #ifndef LIBGOMP_OFFLOADED_ONLY
>

OK, easy enough.  This compiles for me.


diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index 76285d4376b..26ec8ce30d8 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -738,6 +738,7 @@ omp_get_default_allocator_ ()
   return (intptr_t) omp_get_default_allocator ();
 }

+#ifndef LIBGOMP_OFFLOADED_ONLY
 void
 omp_display_env_ (const int32_t *verbose)
 {
@@ -749,3 +750,4 @@ omp_display_env_8_ (const int64_t *verbose)
 {
   omp_display_env (!!*verbose);
 }
+#endif /* LIBGOMP_OFFLOADED_ONLY */


Re: [PATCH] c++: Implement P0466R5 __cpp_lib_is_pointer_interconvertible compiler helpers [PR101539]

2021-07-30 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 29, 2021 at 04:38:44PM -0400, Jason Merrill wrote:
> > +/* Helper function for pointer_interconvertible_base_of_p.  Verify
> > +   that BINFO_TYPE (BINFO) is pointer interconvertible with BASE.  */
> > +
> > +static bool
> > +pointer_interconvertible_base_of_p_1 (tree binfo, tree base)
> > +{
> > +  for (tree field = TYPE_FIELDS (BINFO_TYPE (binfo));
> > +       field; field = DECL_CHAIN (field))
> > +    if (TREE_CODE (field) == FIELD_DECL && !DECL_FIELD_IS_BASE (field))
> > +      return false;
> 
> I think checking non-static data members is a bug in the resolution of CWG
> 2254, which correctly changed 11.4 to say that the address of a
> standard-layout class is the same as the address of each base whether or not
> the class has non-static data members, but didn't change
> pointer-interconvertibility enough to match.  I've raised this with CWG.
> 
> I think we don't need this function at all.

Ok.

...
> Instead of checking !UNION_TYPE twice above, you could check it once here
> and return false.

Here is an updated patch, which includes the incremental patch for
non-std-layout unions (with no changes for non-stdlayout in anon structure
in union though) and has your review comments above incorporated.
All the changes from the combination of the original and incremental patch
are in gcc/cp/semantics.c and
gcc/testsuite/g++.dg/cpp2a/is-pointer-interconvertible-base-of1.C.
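As a rough C analogue of the property `__builtin_is_pointer_interconvertible_with_class` tests: in C, a pointer to a structure, suitably converted, points to its initial member, and all members of a union start at the same address. The sketch below checks both; the type and function names are illustrative. It does not capture the C++ rules for base classes or non-standard-layout types (the CWG 2254 question Jason raises), which have no C counterpart.

```c
#include <assert.h>
#include <stddef.h>

struct base { int x; };
struct derived { struct base b; short y; long z; };
union u { int a; struct base s; long long d; };

/* A structure's address, suitably converted, is the address of its
   first member.  */
static int
first_member_shares_address (const struct derived *p)
{
  return (const void *) p == (const void *) &p->b;
}

/* Every member of a union starts at the union's own address.  */
static int
union_members_share_address (const union u *p)
{
  return (const void *) &p->a == (const void *) &p->s
         && (const void *) &p->s == (const void *) &p->d;
}
```
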

2021-07-30  Jakub Jelinek  

PR c++/101539
gcc/c-family/
* c-common.h (enum rid): Add RID_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
* c-common.c (c_common_reswords): Add
__is_pointer_interconvertible_base_of.
gcc/cp/
* cp-tree.h (enum cp_trait_kind): Add
CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
(enum cp_built_in_function): Add
CP_BUILT_IN_IS_POINTER_INTERCONVERTIBLE_WITH_CLASS.
(fold_builtin_is_pointer_inverconvertible_with_class): Declare.
* parser.c (cp_parser_primary_expression): Handle
RID_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
(cp_parser_trait_expr): Likewise.
* cp-objcp-common.c (names_builtin_p): Likewise.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
* decl.c (cxx_init_decl_processing): Register
__builtin_is_pointer_interconvertible_with_class builtin.
* constexpr.c (cxx_eval_builtin_function_call): Handle
CP_BUILT_IN_IS_POINTER_INTERCONVERTIBLE_WITH_CLASS builtin.
* semantics.c (pointer_interconvertible_base_of_p,
first_nonstatic_data_member_p,
fold_builtin_is_pointer_inverconvertible_with_class): New functions.
(trait_expr_value): Handle CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
(finish_trait_expr): Likewise.  Formatting fix.
* cp-gimplify.c (cp_gimplify_expr): Fold
CP_BUILT_IN_IS_POINTER_INTERCONVERTIBLE_WITH_CLASS.  Call
fndecl_built_in_p just once.
(cp_fold): Likewise.
* tree.c (builtin_valid_in_constant_expr_p): Handle
CP_BUILT_IN_IS_POINTER_INTERCONVERTIBLE_WITH_CLASS.  Call
fndecl_built_in_p just once.
* cxx-pretty-print.c (pp_cxx_trait_expression): Handle
CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
gcc/testsuite/
* g++.dg/cpp2a/is-pointer-interconvertible-base-of1.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class1.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class2.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class3.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class4.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class5.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class6.C: New test.

--- gcc/c-family/c-common.h.jj  2021-07-29 13:24:08.368481637 +0200
+++ gcc/c-family/c-common.h 2021-07-30 11:19:39.391615252 +0200
@@ -174,6 +174,7 @@ enum rid
   RID_IS_BASE_OF,  RID_IS_CLASS,
   RID_IS_EMPTY,RID_IS_ENUM,
   RID_IS_FINAL,RID_IS_LITERAL_TYPE,
+  RID_IS_POINTER_INTERCONVERTIBLE_BASE_OF,
   RID_IS_POD,  RID_IS_POLYMORPHIC,
   RID_IS_SAME_AS,
   RID_IS_STD_LAYOUT,   RID_IS_TRIVIAL,
--- gcc/c-family/c-common.c.jj  2021-07-29 13:24:42.667013302 +0200
+++ gcc/c-family/c-common.c 2021-07-30 11:19:39.411614978 +0200
@@ -421,6 +421,8 @@ const struct c_common_resword c_common_r
   { "__is_enum",   RID_IS_ENUM,D_CXXONLY },
   { "__is_final",  RID_IS_FINAL,   D_CXXONLY },
   { "__is_literal_type", RID_IS_LITERAL_TYPE, D_CXXONLY },
+  { "__is_pointer_interconvertible_base_of",
+   RID_IS_POINTER_INTERCONVERTIBLE_BASE_OF, D_CXXONLY },
   { "__is_pod",RID_IS_POD, D_CXXONLY },
   { "__is_polymorphic",RID_IS_POLYMORPHIC, D_CXXONLY },
   { "__is_same", RID_IS_SAME_AS, D_CXXONLY },
--- gcc/cp/cp-tree.h.jj 2021-07-29 13:24:08.673477473 +0200
+++ 

Re: [PATCH] Replace evrp use in loop versioning with ranger.

2021-07-30 Thread Aldy Hernandez via Gcc-patches




On 7/30/21 10:39 AM, Richard Sandiford wrote:

Aldy Hernandez  writes:

On Mon, Jul 26, 2021 at 7:28 PM Richard Sandiford
 wrote:


Aldy Hernandez  writes:

On Mon, Jul 26, 2021 at 4:18 PM Richard Sandiford
 wrote:


Aldy Hernandez  writes:

This patch replaces the evrp_range_analyzer in the loop versioning code
with an on-demand ranger.

Everything was pretty straightforward, except that range_of_expr requires
a gimple statement as context to provide context-aware ranges.  I didn't see
a convenient place where the statement was saved, so I made a vector indexed
by SSA names.  As an alternative, I tried to use the loop's first statement,
but that proved to be insufficient.


The mapping is one-to-many though: there can be multiple statements
for each SSA name.  Maybe that doesn't matter in this context and
any of the statements can act as a representative.

I'm surprised that the loop's first statement didn't work though,
since the SSA name is supposedly known to be loop-invariant.  What went
wrong when you tried that?


I was looking at the first statement of loop_info->block_list and one
of the dg.exp=loop-versioning* tests failed.  Perhaps I should have
used the loop itself, as in the attached patch.  With this patch all
of the loop-versioning tests pass.




I am not familiar with loop versioning, but if the DOM walk was only
necessary for the calls to record_ranges_from_stmt, this too could be
removed as the ranger will work without it.


Yeah, that was the only reason.  If the information is available at
version_for_unity (I guess it is) then we should just avoid recording
the versioning there if so.

How expensive is the check?  If the result is worth caching, perhaps
we should have two bitmaps: the existing one, and one that records
whether we've checked a particular SSA name.

If the check is relatively cheap then that won't be worth it though.
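Richard's two-bitmap suggestion can be sketched in C as follows: one bit per SSA name version records whether the check has run, and a second bit records its answer. `MAX_NAMES`, `expensive_check` and the fixed-size arrays are hypothetical simplifications (GCC would use its own bitmap types). As the following replies note, the ranger already caches `range_of_expr` results, so this extra layer turned out not to be worthwhile.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical upper bound on SSA name versions; callers stay below it.  */
#define MAX_NAMES 256

/* Bit I of CHECKED: the check for version I has already run.
   Bit I of RESULT: the answer it gave.  */
static unsigned char checked[MAX_NAMES / 8];
static unsigned char result[MAX_NAMES / 8];

/* Counts how many times the expensive check actually runs.  */
static int check_calls;

static bool
bit_p (const unsigned char *map, unsigned i)
{
  return (map[i / 8] >> (i % 8)) & 1;
}

static void
set_bit (unsigned char *map, unsigned i)
{
  map[i / 8] |= (unsigned char) (1u << (i % 8));
}

/* Hypothetical stand-in for a costly per-name query.  */
static bool
expensive_check (unsigned version)
{
  check_calls++;
  return version % 2 == 0;
}

/* Run the check at most once per version, then answer from RESULT.  */
static bool
cached_check (unsigned version)
{
  if (!bit_p (checked, version))
    {
      if (expensive_check (version))
        set_bit (result, version);
      set_bit (checked, version);
    }
  return bit_p (result, version);
}
```
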


If you're asking about the range_of_expr check, that's all cached, so
it should be pretty cheap.  Besides, we're no longer calculating
ranges for each statement in the IL, as we were doing in lv_dom_walker
with evrp's record_ranges_from_stmt.  Only statements of interest are
queried.


Sounds good.  If the results are already cached then another level
of caching (via the second bitmap I mentioned above) would obviously
be a waste of time.


My callgrind harness for performance testing wasn't able to pick up
enough samples to measure the time spent in
pass_loop_versioning::execute.  I've seen this happen before with
passes that run too fast.  I'm afraid I don't have enough cycles to
continue working on this.


Yeah, any testing of this was above and beyond IMO.  Hearing that the
range query does its own caching was enough for me. :-)


How about this patch, pending tests?


OK, thanks, as a strict improvement over the status quo.  But it'd be
even better without the dom walk :-)


I've removed the DOM walk, and re-tested.

OK to push?


Sorry for asking for another iteration, but…


It looks like this is a bit more involved than I originally envisioned.

I've pushed the original (approved) patch that just removes the use of 
evrp, which was my main goal.


I'll follow-up with the dom walk removal and your suggested changes next 
week when I have more cycles.


Aldy



Re: [x86_64 PATCH] Decrement followed by cmov improvements.

2021-07-30 Thread Uros Bizjak via Gcc-patches
On Mon, Jul 26, 2021 at 1:27 PM Roger Sayle  wrote:
>
>
> The following patch to the x86_64 backend improves the code generated
> for a decrement followed by a conditional move.  The primary change is
> to recognize that after subtracting one, checking the result is -1 (or
> equivalently that the original value was zero) can be implemented using
> the borrow/carry flag instead of requiring an explicit test instruction.
> This is achieved by a new define_insn_and_split that allows combine to
> split the desired sequence/composite into a *subsi_3 and *movsicc_noc.
>
> The other change in this patch is a pair of peephole2 optimizations
> to eliminate register-to-register moves generated during register
> allocation.  During reload, the compiler doesn't know that inverting
> the condition of a conditional move can sometimes reduce register
> pressure, but this is easy to tidy up during the peephole2 pass (where
> swapping the order of the insn's operands performs the required
> logic inversion).
>
> Both improvements are demonstrated by the case below:
>
> int foo(int x) {
>   if (x == 0)
>     x = 16;
>   else x--;
>   return x;
> }
>
> Before:
> foo:    leal    -1(%rdi), %eax
>         testl   %edi, %edi
>         movl    $16, %edx
>         cmove   %edx, %eax
>         ret
>
> After:
> foo:    subl    $1, %edi
>         movl    $16, %eax
>         cmovnc  %edi, %eax
>         ret
>
> And the value of the peephole2 clean-up can be seen on its own in:
>
> int bar(int x) {
>   x--;
>   if (x == 0)
>     x = 16;
>   return x;
> }
>
> Before:
> bar:    movl    %edi, %eax
>         movl    $16, %edx
>         subl    $1, %eax
>         cmove   %edx, %eax
>         ret
>
> After:
> bar:    subl    $1, %edi
>         movl    $16, %eax
>         cmovne  %edi, %eax
>         ret
>
> These idioms were inspired by the source code of NIST SciMark4's
> Random_nextDouble function, where the tweaks above result in
> a ~1% improvement in the MonteCarlo benchmark kernel.
>
> This patch has been tested on x86_64-pc-linux-gnu with a
> "make bootstrap" and "make -k check" with no new failures.
>
> Ok for mainline?
>
>
> 2021-07-26  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.md (*dec_cmov): New define_insn_and_split
> to generate a conditional move using the carry flag after sub $1.
> (peephole2): Eliminate a register-to-register move by inverting
> the condition of a conditional move.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/dec-cmov-1.c: New test.
> * gcc.target/i386/dec-cmov-2.c: New test.
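The carry-flag argument in the description can be checked in plain C. The sketch below models the new `foo` sequence: after an unsigned `x - 1`, the borrow (CF) is set exactly when `x == 0`, so `cmovnc` selects `x - 1` otherwise and 16 when `x` was zero. `foo_via_borrow` is a hypothetical model written for this comparison, not code from the patch. Converting a large unsigned `diff` back to `int` is implementation-defined in C, and the comparison assumes the usual modulo behavior of GCC and Clang.

```c
#include <assert.h>

/* The source idiom from the patch description.  */
static int
foo (int x)
{
  if (x == 0)
    x = 16;
  else
    x--;
  return x;
}

/* Model of "subl $1, %edi; movl $16, %eax; cmovnc %edi, %eax".  */
static int
foo_via_borrow (int x)
{
  unsigned int ux = (unsigned int) x;
  unsigned int diff = ux - 1u;     /* value subl $1 leaves in %edi */
  int carry = ux < 1u;             /* CF after subl $1: set iff x == 0 */
  return carry ? 16 : (int) diff;  /* cmovnc takes diff when CF == 0 */
}
```
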

Please also allow ia32 in the testcases. Guard the 64-bit specific
(long long) tests with #ifdef __x86_64__ and add:

/* { dg-additional-options "-march=pentiumpro -mregparm=3" { target ia32 } } */

(cmov generation uses the ancient ix86_arch_features; it gets enabled by
using -march=pentiumpro).

OK with the above change.
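A testcase following Uros's request might be laid out as below. The `dg-additional-options` line is the one he gives; the `dg-options`, the function names and the `scan-assembler` pattern are assumptions for illustration, not the contents of the committed `dec-cmov-1.c`.

```c
/* { dg-do compile } */
/* { dg-options "-O2" } */
/* { dg-additional-options "-march=pentiumpro -mregparm=3" { target ia32 } } */

int
foo (int x)
{
  if (x == 0)
    x = 16;
  else
    x--;
  return x;
}

#ifdef __x86_64__
/* 64-bit (long long) variant, compiled only on x86_64.  */
long long
foo_ll (long long x)
{
  if (x == 0)
    x = 16;
  else
    x--;
  return x;
}
#endif

/* { dg-final { scan-assembler "cmov" } } */
```
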

Thanks,
Uros.


Re: [PATCH] c++: __builtin_is_pointer_interconvertible_with_class incremental fix [PR101539]

2021-07-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 30, 2021 at 01:10:33AM -0400, Jason Merrill wrote:
> > It also shows that in the case (we're beyond the standard in this case
> > because anonymous structures are not in the standard) of union with
> > non-std-layout anonymous structure in it, in the case in the testcases like:
> > struct D {};
> > struct E { [[no_unique_address]] D e; };
> > union Y { int a; struct : public E { short b; long c; }; long long d; };
> 
> We don't already reject an anonymous struct with bases?  I think we should
> do so, in fixup_anonymous_aggr.  We might even require anonymous structs to
> be standard-layout.

Not having base classes seems a reasonable requirement for anonymous
structures; after all, I couldn't find a way to refer to the members
in the base class - ::e is rejected with the above.
But standard layout means that even all the non-static members of the struct
need to be standard-layout, which seems an unnecessary requirement for
anon structures to me.

Jakub
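For comparison, C11 anonymous structures already have the restricted shape being discussed: C has no base classes, and the members of an anonymous structure are accessed as if they were members of the enclosing union. The sketch reuses the shape of `union Y` from the quoted testcase without the base class `E`; it illustrates C11 semantics, not what G++ will accept.

```c
#include <assert.h>
#include <stddef.h>

/* C11 anonymous structure inside a union; its members b and c are
   named directly through the union.  */
union Y
{
  int a;
  struct { short b; long c; };
  long long d;
};
```
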



Re: [PATCH v4] Add QI vector mode support to by-pieces for memset

2021-07-30 Thread Richard Sandiford via Gcc-patches
"H.J. Lu via Gcc-patches"  writes:
> On Mon, Jul 26, 2021 at 2:53 PM Richard Sandiford
>  wrote:
>>
>> "H.J. Lu via Gcc-patches"  writes:
>> > On Mon, Jul 26, 2021 at 11:42 AM Richard Sandiford
>> >  wrote:
>> >>
>> >> "H.J. Lu via Gcc-patches"  writes:
>> >> > +to avoid stack realignment when expanding memset.  The default is
>> >> > +@code{gen_reg_rtx}.
>> >> > +@end deftypefn
>> >> > +
>> >> >  @deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST (unsigned 
>> >> > @var{nunroll}, class loop *@var{loop})
>> >> >  This target hook returns a new value for the number of times @var{loop}
>> >> >  should be unrolled. The parameter @var{nunroll} is the number of times
>> >> > […]
>> >> > @@ -1446,7 +1511,10 @@ can_store_by_pieces (unsigned HOST_WIDE_INT len,
>> >> >max_size = STORE_MAX_PIECES + 1;
>> >> >while (max_size > 1 && l > 0)
>> >> >   {
>> >> > -   scalar_int_mode mode = widest_int_mode_for_size (max_size);
>> >> > +   /* Since this can be called before virtual registers are ready
>> >> > +  to use, avoid QI vector mode here.  */
>> >> > +   fixed_size_mode mode
>> >> > + = widest_fixed_size_mode_for_size (max_size, false);
>> >>
>> >> I think I might have asked this before, sorry, but: when is that true
>> >> and why does it matter?
>> >
>> > can_store_by_pieces may be called:
>> >
>> > value-prof.c:  if (!can_store_by_pieces (val, builtin_memset_read_str,
>> > value-prof.c:  if (!can_store_by_pieces (val, builtin_memset_read_str,
>> >
>> > before virtual registers can be used.   When true is passed to
>> > widest_fixed_size_mode_for_size,  virtual registers may be used
>> > to expand memset to broadcast, which leads to ICE.   Since for the
>> > purpose of can_store_by_pieces, we don't need to expand memset
>> > to broadcast and pass false here can avoid ICE.
>>
>> Ah, I see, thanks.
>>
>> That sounds like a problem in the way that the memset const function is
>> written though.  can_store_by_pieces is just a query function, so I don't
>> think it should be trying to create new registers for can_store_by_pieces,
>> even if it could.  At the same time, can_store_by_pieces should make the
>> same choices as the real expander would.
>>
>> I think this means that:
>>
>> - gen_memset_broadcast should be inlined into its callers, with the
>>   builtin_memset_read_str getting the CONST_INT_P case and
>>   builtin_memset_gen_str getting the variable case.
>>
>> - builtin_memset_read_str should then stop at and return the
>>   gen_const_vec_duplicate when the prev argument is null.
>>   Only when prev is nonnull should it go on to call the hook
>>   and copy the constant to the register that the hook returns.
>
> How about keeping gen_memset_broadcast and passing PREV to it:
>
>   rtx target;
>   if (CONST_INT_P (data))
>     {
>       rtx const_vec = gen_const_vec_duplicate (mode, data);
>       if (prev == NULL)
>         /* Return CONST_VECTOR when called by a query function.  */
>         target = const_vec;
>       else
>         {
>           /* Use the move expander with CONST_VECTOR.  */
>           target = targetm.gen_memset_scratch_rtx (mode);
>           emit_move_insn (target, const_vec);
>         }
>     }
>   else
>     {
>       target = targetm.gen_memset_scratch_rtx (mode);
>       class expand_operand ops[2];
>       create_output_operand (&ops[0], target, mode);
>       create_input_operand (&ops[1], data, QImode);
>       expand_insn (icode, 2, ops);
>       if (!rtx_equal_p (target, ops[0].value))
>         emit_move_insn (target, ops[0].value);
>     }

TBH I think that complicates the interface too much.  The constant
and non-constant cases are now very different.

Thanks,
Richard
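To make the by-pieces idea concrete, here is a scalar C sketch: a QImode value is duplicated across a wider chunk (the role `gen_const_vec_duplicate` plays for vector modes), and the destination is filled with the widest chunk that still fits at each step. `broadcast_byte` and `memset_by_pieces` are hypothetical helpers; the sketch says nothing about RTL, target hooks, or the `prev == NULL` query path Richard describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar analogue of duplicating a QImode byte across a wider mode.  */
static uint64_t
broadcast_byte (unsigned char c)
{
  return (uint64_t) c * UINT64_C (0x0101010101010101);
}

/* Hypothetical memset by pieces: at each step store the widest chunk
   (8, 4, 2 or 1 bytes) that still fits in the remaining length.
   Returns the number of stores issued.  */
static int
memset_by_pieces (unsigned char *dst, unsigned char c, size_t len)
{
  uint64_t pattern = broadcast_byte (c);
  size_t width = sizeof pattern;
  int stores = 0;

  while (len > 0)
    {
      while (width > len)
        width /= 2;
      /* Every byte of PATTERN equals C, so byte order does not matter.  */
      memcpy (dst, &pattern, width);
      dst += width;
      len -= width;
      stores++;
    }
  return stores;
}
```

For a 13-byte buffer this issues three stores (8, 4 and 1 bytes).
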


Re: [PATCH v4] Add QI vector mode support to by-pieces for memset

2021-07-30 Thread Richard Sandiford via Gcc-patches
"H.J. Lu"  writes:
> On Tue, Jul 27, 2021 at 8:31 AM H.J. Lu  wrote:
>>
>> On Mon, Jul 26, 2021 at 4:19 PM H.J. Lu  wrote:
>> >
>> > On Mon, Jul 26, 2021 at 3:56 PM H.J. Lu  wrote:
>> > >
>> > > On Mon, Jul 26, 2021 at 2:53 PM Richard Sandiford
>> > >  wrote:
>> > > >
>> > > > "H.J. Lu via Gcc-patches"  writes:
>> > > > > On Mon, Jul 26, 2021 at 11:42 AM Richard Sandiford
>> > > > >  wrote:
>> > > > >>
>> > > > >> "H.J. Lu via Gcc-patches"  writes:
>> > > > >> > +to avoid stack realignment when expanding memset.  The default is
>> > > > >> > +@code{gen_reg_rtx}.
>> > > > >> > +@end deftypefn
>> > > > >> > +
>> > > > >> >  @deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST 
>> > > > >> > (unsigned @var{nunroll}, class loop *@var{loop})
>> > > > >> >  This target hook returns a new value for the number of times 
>> > > > >> > @var{loop}
>> > > > >> >  should be unrolled. The parameter @var{nunroll} is the number of 
>> > > > >> > times
>> > > > >> > […]
>> > > > >> > @@ -1446,7 +1511,10 @@ can_store_by_pieces (unsigned 
>> > > > >> > HOST_WIDE_INT len,
>> > > > >> >max_size = STORE_MAX_PIECES + 1;
>> > > > >> >while (max_size > 1 && l > 0)
>> > > > >> >   {
>> > > > >> > -   scalar_int_mode mode = widest_int_mode_for_size 
>> > > > >> > (max_size);
>> > > > >> > +   /* Since this can be called before virtual registers are 
>> > > > >> > ready
>> > > > >> > +  to use, avoid QI vector mode here.  */
>> > > > >> > +   fixed_size_mode mode
>> > > > >> > + = widest_fixed_size_mode_for_size (max_size, false);
>> > > > >>
>> > > > >> I think I might have asked this before, sorry, but: when is that 
>> > > > >> true
>> > > > >> and why does it matter?
>> > > > >
>> > > > > can_store_by_pieces may be called:
>> > > > >
>> > > > > value-prof.c:  if (!can_store_by_pieces (val, 
>> > > > > builtin_memset_read_str,
>> > > > > value-prof.c:  if (!can_store_by_pieces (val, 
>> > > > > builtin_memset_read_str,
>> > > > >
>> > > > > before virtual registers can be used.   When true is passed to
>> > > > > widest_fixed_size_mode_for_size,  virtual registers may be used
>> > > > > to expand memset to broadcast, which leads to ICE.   Since for the
>> > > > > purpose of can_store_by_pieces, we don't need to expand memset
>> > > > > to broadcast and pass false here can avoid ICE.
>> > > >
>> > > > Ah, I see, thanks.
>> > > >
>> > > > That sounds like a problem in the way that the memset const function is
>> > > > written though.  can_store_by_pieces is just a query function, so I 
>> > > > don't
>> > > > think it should be trying to create new registers for 
>> > > > can_store_by_pieces,
>> > > > even if it could.  At the same time, can_store_by_pieces should make 
>> > > > the
>> > > > same choices as the real expander would.
>> > > >
>> > > > I think this means that:
>> > > >
>> > > > - gen_memset_broadcast should be inlined into its callers, with the
>> > > >   builtin_memset_read_str getting the CONST_INT_P case and
>> > > >   builtin_memset_gen_str getting the variable case.
>> > > >
>> > > > - builtin_memset_read_str should then stop at and return the
>> > > >   gen_const_vec_duplicate when the prev argument is null.
>> >
>> > This doesn't work since can_store_by_pieces has
>> >
>> >  cst = (*constfun) (constfundata, nullptr, offset, mode);
>> >   if (!targetm.legitimate_constant_p (mode, cst))
>>
>> We can add a target hook, targetm.legitimate_memset_constant_p,
>> which defaults to targetm.legitimate_constant_p.  Will it be acceptable?
>
> In the v5 patch,  I changed it to
>
>  cst = (*constfun) (constfundata, nullptr, offset, mode);
>   /* All CONST_VECTORs are legitimate if vec_duplicate
>  is supported.  */

Maybe “can be loaded” rather than “are legitimate”, since they're
not necessarily legitimate in the sense of legitimate_constant_p
(hence the patch).  Also, since we assume elsewhere that
vec_duplicate is a precondition for picking a vector mode,
I think we should do the same here (and note that in the comment).
So…

>   if (!((memsetp
>  && VECTOR_MODE_P (mode)
>  && GET_MODE_INNER (mode) == QImode
>  && (optab_handler (vec_duplicate_optab, mode)
>  != CODE_FOR_nothing))

I think we need only the (memsetp && VECTOR_MODE_P (mode)) check.

This feels a bit of a hack TBH.  I think the same principles apply
to vectors and integers here: forcing the constant to memory is
still likely to be an optimisation, but is an extra overhead that
we should probably account for.

However, I agree this is probably the most practical way forward
at the moment.

Thanks,
Richard

> || targetm.legitimate_constant_p (mode, cst)))
> return 0;
>
>> > ix86_legitimate_constant_p only allows 0 or -1 for CONST_VECTOR.
>> > 

Re: [PATCH v2] c++: Accept C++11 attribute-definition [PR101582]

2021-07-30 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 29, 2021 at 03:03:50PM -0400, Jason Merrill wrote:
> Let's not mention the obscure attribute-declaration grammar nonterminal,
> "attribute ignored" seems sufficient.

Ok, thanks.  Here is what I've committed and I have also filed
PR101686 for the module related xfails.

2021-07-30  Jakub Jelinek  

PR c++/101582
* parser.c (cp_parser_skip_std_attribute_spec_seq): Add a forward
declaration.
(cp_parser_declaration): Parse empty-declaration and
attribute-declaration.
(cp_parser_toplevel_declaration): Don't parse empty-declaration here.

* g++.dg/cpp0x/gen-attrs-45.C: Expect a warning about ignored
attributes instead of error.
* g++.dg/cpp0x/gen-attrs-75.C: New test.
* g++.dg/modules/pr101582-1.C: New test.

--- gcc/cp/parser.c.jj  2021-07-28 23:06:38.658443554 +0200
+++ gcc/cp/parser.c 2021-07-28 23:12:10.955941089 +0200
@@ -2507,6 +2507,8 @@ static tree cp_parser_std_attribute_spec
   (cp_parser *);
 static tree cp_parser_std_attribute_spec_seq
   (cp_parser *);
+static size_t cp_parser_skip_std_attribute_spec_seq
+  (cp_parser *, size_t);
 static size_t cp_parser_skip_attributes_opt
   (cp_parser *, size_t);
 static bool cp_parser_extension_opt
@@ -14410,6 +14412,30 @@ cp_parser_declaration (cp_parser* parser
   cp_token *token2 = (token1->type == CPP_EOF
  ? token1 : cp_lexer_peek_nth_token (parser->lexer, 2));
 
+  if (token1->type == CPP_SEMICOLON)
+{
+  cp_lexer_consume_token (parser->lexer);
+  /* A declaration consisting of a single semicolon is invalid
+   * before C++11.  Allow it unless we're being pedantic.  */
+  if (cxx_dialect < cxx11)
+   pedwarn (input_location, OPT_Wpedantic, "extra %<;%>");
+  return;
+}
+  else if (cp_lexer_nth_token_is (parser->lexer,
+ cp_parser_skip_std_attribute_spec_seq (parser,
+1),
+ CPP_SEMICOLON))
+{
+  location_t attrs_loc = token1->location;
+  tree std_attrs = cp_parser_std_attribute_spec_seq (parser);
+  if (std_attrs != NULL_TREE)
+   warning_at (make_location (attrs_loc, attrs_loc, parser->lexer),
+   OPT_Wattributes, "attribute ignored");
+  if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
+   cp_lexer_consume_token (parser->lexer);
+  return;
+}
+
   /* Get the high-water mark for the DECLARATOR_OBSTACK.  */
   void *p = obstack_alloc (&declarator_obstack, 0);
 
@@ -14560,14 +14587,6 @@ cp_parser_toplevel_declaration (cp_parse
cp_parser_declaration.  (A #pragma at block scope is
handled in cp_parser_statement.)  */
 cp_parser_pragma (parser, pragma_external, NULL);
-  else if (token->type == CPP_SEMICOLON)
-{
-  cp_lexer_consume_token (parser->lexer);
-  /* A declaration consisting of a single semicolon is invalid
-   * before C++11.  Allow it unless we're being pedantic.  */
-  if (cxx_dialect < cxx11)
-   pedwarn (input_location, OPT_Wpedantic, "extra %<;%>");
-}
   else
 /* Parse the declaration itself.  */
 cp_parser_declaration (parser, NULL_TREE);
--- gcc/testsuite/g++.dg/cpp0x/gen-attrs-45.C.jj	2021-07-26 09:13:08.504121494 +0200
+++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-45.C	2021-07-28 23:07:05.095085351 +0200
@@ -1,4 +1,4 @@
 // PR c++/52906
 // { dg-do compile { target c++11 } }
 
-[[gnu::deprecated]]; // { dg-error "does not declare anything" }
+[[gnu::deprecated]]; // { dg-warning "attribute ignored" }
--- gcc/testsuite/g++.dg/cpp0x/gen-attrs-75.C.jj	2021-07-28 23:07:05.095085351 +0200
+++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-75.C	2021-07-29 10:59:09.630326797 +0200
@@ -0,0 +1,35 @@
+// PR c++/101582
+// { dg-do compile }
+// { dg-options "" }
+
+;
+[[]] [[]] [[]];// { dg-warning "attributes only available with" "" { target c++98_only } }
+[[foobar]];// { dg-warning "attribute ignored" }
+// { dg-warning "attributes only available with" "" { target c++98_only } .-1 }
+
+extern "C" ;
+extern "C" [[]];   // { dg-warning "attributes only available with" "" { target c++98_only } }
+extern "C" extern "C" ;
+extern "C" extern "C" [[]][[]][[]];// { dg-warning "attributes only available with" "" { target c++98_only } }
+__extension__ ;
+__extension__ [[]];// { dg-warning "attributes only available with" "" { target c++98_only } }
+__extension__ __extension__ ;
+__extension__ __extension__ [[]][[]];  // { dg-warning "attributes only available with" "" { target c++98_only } }
+
+namespace N {
+
+;
+[[]] [[]] [[]];// { dg-warning "attributes only available with" "" { target c++98_only } }
+[[foobar]];// { dg-warning "attribute ignored" }
+// { dg-warning "attributes only available with" "" { target c++98_only } .-1 }
+
+extern "C" ;
+extern "C" [[]];   // { dg-warning "attributes only 

Re: OpenMP 5.1: omp_display_env

2021-07-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 30, 2021 at 10:43:01AM +0200, Ulrich Drepper wrote:
> On Fri, Jul 30, 2021 at 9:50 AM Tobias Burnus 
> wrote:
> 
> > this patch breaks offloading. The reason is that most code
> > in env.c is enclosed in:
> >
> 
> Indeed, normally I test that configuration but my setup currently has a few
> problems.
> 
> Although the env vars aren't parsed for those targets it seems to be
> appropriate to still provide the complete implementation.  There are other
> functions which print something; this is likely bogus as well and/or the
> output isn't seen.
> 
> How about this change?  Compiles for me for NVPTX.  It doesn't change
> anything but the location of the function definition in the file and
> includes for LIBGOMP_OFFLOADED_ONLY some headers which aren't included in
> this file before (but are present).

I think for now it would be better to guard the omp_display_env_*
in fortran.c with #ifndef LIBGOMP_OFFLOADED_ONLY
It is true that we have e.g. omp_display_affinity supported in offloaded
regions, but in that case we don't really support affinity in the offloaded
regions and so it prints the limited information (like it does even on
hosts that don't support affinity).
But for omp_display_env we shouldn't just print the variables with
default unmodified values, but need to have a structure with all of that
info copied from host to target during load image time.

Jakub
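Jakub's suggestion amounts to compiling the Fortran entry points only into the host library. Below is a minimal sketch of that guard. It assumes the wrappers are named `omp_display_env_` and `omp_display_env_8_` and take their argument by reference as 32- and 64-bit integers (modeled on libgomp's usual Fortran wrapper convention, not copied from fortran.c); the recording stub that stands in for the real `omp_display_env` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for the real omp_display_env in env.c:
   it only records its argument so the wrappers can be checked.  */
static int last_verbose = -1;

void
omp_display_env (int verbose)
{
  last_verbose = verbose;
}

/* Assumed Fortran wrappers: arguments arrive by reference, and the
   _8_ variant is assumed to serve 64-bit default-integer callers.
   Offload-only builds of the library get no wrappers at all.  */
#ifndef LIBGOMP_OFFLOADED_ONLY
void
omp_display_env_ (const int32_t *verbose)
{
  omp_display_env (!!*verbose);
}

void
omp_display_env_8_ (const int64_t *verbose)
{
  omp_display_env (!!*verbose);
}
#endif /* LIBGOMP_OFFLOADED_ONLY */
```

The `!!` normalizes any nonzero Fortran logical/integer to 1 before it reaches the C entry point.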



Re: OpenMP 5.1: omp_display_env

2021-07-30 Thread Ulrich Drepper via Gcc-patches
Hi,

On Fri, Jul 30, 2021 at 9:50 AM Tobias Burnus 
wrote:

> this patch breaks offloading. The reason is that most code
> in env.c is enclosed in:
>

Indeed, normally I test that configuration but my setup currently has a few
problems.

Although the env vars aren't parsed for those targets it seems to be
appropriate to still provide the complete implementation.  There are other
functions which print something; this is likely bogus as well and/or the
output isn't seen.

How about this change?  Compiles for me for NVPTX.  It doesn't change
anything but the location of the function definition in the file and
includes for LIBGOMP_OFFLOADED_ONLY some headers which aren't included in
this file before (but are present).

diff --git a/libgomp/env.c b/libgomp/env.c
index 5220877d533..e7ef294139d 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -30,15 +30,16 @@
 #include "libgomp.h"
 #include "gomp-constants.h"
 #include 
+#include 
+#include "thread-stacksize.h"
+#ifdef HAVE_INTTYPES_H
+# include  /* For PRIu64.  */
+#endif
 #ifndef LIBGOMP_OFFLOADED_ONLY
 #include "libgomp_f.h"
 #include "oacc-int.h"
 #include 
 #include 
-#include 
-#ifdef HAVE_INTTYPES_H
-# include  /* For PRIu64.  */
-#endif
 #ifdef STRING_WITH_STRINGS
 # include 
 # include 
@@ -52,7 +53,6 @@
 # endif
 #endif
 #include 
-#include "thread-stacksize.h"

 #ifndef HAVE_STRTOULL
 # define strtoull(ptr, eptr, base) strtoul (ptr, eptr, base)
@@ -97,11 +97,178 @@ char *goacc_device_type;
 int goacc_device_num;
 int goacc_default_dims[GOMP_DIM_MAX];

-#ifndef LIBGOMP_OFFLOADED_ONLY
-
 static int wait_policy;
 static unsigned long stacksize = GOMP_DEFAULT_STACKSIZE;

+
+void
+omp_display_env (int verbose)
+{
+  int i;
+
+  fputs ("\nOPENMP DISPLAY ENVIRONMENT BEGIN\n", stderr);
+
+  fputs ("  _OPENMP = '201511'\n", stderr);
+  fprintf (stderr, "  OMP_DYNAMIC = '%s'\n",
+           gomp_global_icv.dyn_var ? "TRUE" : "FALSE");
+  fprintf (stderr, "  OMP_NESTED = '%s'\n",
+           gomp_global_icv.max_active_levels_var > 1 ? "TRUE" : "FALSE");
+
+  fprintf (stderr, "  OMP_NUM_THREADS = '%lu", gomp_global_icv.nthreads_var);
+  for (i = 1; i < gomp_nthreads_var_list_len; i++)
+    fprintf (stderr, ",%lu", gomp_nthreads_var_list[i]);
+  fputs ("'\n", stderr);
+
+  fprintf (stderr, "  OMP_SCHEDULE = '");
+  if ((gomp_global_icv.run_sched_var & GFS_MONOTONIC))
+    {
+      if (gomp_global_icv.run_sched_var != (GFS_MONOTONIC | GFS_STATIC))
+        fputs ("MONOTONIC:", stderr);
+    }
+  else if (gomp_global_icv.run_sched_var == GFS_STATIC)
+    fputs ("NONMONOTONIC:", stderr);
+  switch (gomp_global_icv.run_sched_var & ~GFS_MONOTONIC)
+    {
+    case GFS_RUNTIME:
+      fputs ("RUNTIME", stderr);
+      if (gomp_global_icv.run_sched_chunk_size != 1)
+        fprintf (stderr, ",%d", gomp_global_icv.run_sched_chunk_size);
+      break;
+    case GFS_STATIC:
+      fputs ("STATIC", stderr);
+      if (gomp_global_icv.run_sched_chunk_size != 0)
+        fprintf (stderr, ",%d", gomp_global_icv.run_sched_chunk_size);
+      break;
+    case GFS_DYNAMIC:
+      fputs ("DYNAMIC", stderr);
+      if (gomp_global_icv.run_sched_chunk_size != 1)
+        fprintf (stderr, ",%d", gomp_global_icv.run_sched_chunk_size);
+      break;
+    case GFS_GUIDED:
+      fputs ("GUIDED", stderr);
+      if (gomp_global_icv.run_sched_chunk_size != 1)
+        fprintf (stderr, ",%d", gomp_global_icv.run_sched_chunk_size);
+      break;
+    case GFS_AUTO:
+      fputs ("AUTO", stderr);
+      break;
+    }
+  fputs ("'\n", stderr);
+
+  fputs ("  OMP_PROC_BIND = '", stderr);
+  switch (gomp_global_icv.bind_var)
+    {
+    case omp_proc_bind_false:
+      fputs ("FALSE", stderr);
+      break;
+    case omp_proc_bind_true:
+      fputs ("TRUE", stderr);
+      break;
+    case omp_proc_bind_master:
+      fputs ("MASTER", stderr);
+      break;
+    case omp_proc_bind_close:
+      fputs ("CLOSE", stderr);
+      break;
+    case omp_proc_bind_spread:
+      fputs ("SPREAD", stderr);
+      break;
+    }
+  for (i = 1; i < gomp_bind_var_list_len; i++)
+    switch (gomp_bind_var_list[i])
+      {
+      case omp_proc_bind_master:
+        fputs (",MASTER", stderr);
+        break;
+      case omp_proc_bind_close:
+        fputs (",CLOSE", stderr);
+        break;
+      case omp_proc_bind_spread:
+        fputs (",SPREAD", stderr);
+        break;
+      }
+  fputs ("'\n", stderr);
+  fputs ("  OMP_PLACES = '", stderr);
+  for (i = 0; i < gomp_places_list_len; i++)
+    {
+      fputs ("{", stderr);
+      gomp_affinity_print_place (gomp_places_list[i]);
+      fputs (i + 1 == gomp_places_list_len ? "}" : "},", stderr);
+    }
+  fputs ("'\n", stderr);
+
+  fprintf (stderr, "  OMP_STACKSIZE = '%lu'\n", stacksize);
+
+  /* GOMP's default value is actually neither active nor passive.  */
+  fprintf (stderr, "  OMP_WAIT_POLICY = '%s'\n",
+           wait_policy > 0 ? "ACTIVE" : "PASSIVE");
+  fprintf (stderr, "  OMP_THREAD_LIMIT = '%u'\n",
+           gomp_global_icv.thread_limit_var);
+  fprintf (stderr, "  OMP_MAX_ACTIVE_LEVELS = '%u'\n",
+           gomp_global_icv.max_active_levels_var);
+
+  

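The OMP_SCHEDULE branch of `omp_display_env` above prints a `MONOTONIC:` or `NONMONOTONIC:` modifier only when it differs from what the code treats as the default (static monotonic, every other kind nonmonotonic), and it omits the chunk size when the chunk has its default value (0 for static, 1 otherwise). The standalone sketch below reproduces that logic into a buffer so it can be checked in isolation. The local `SCHED_*` enum, the `SCHED_MONOTONIC` bit and `format_schedule` itself are hypothetical stand-ins, not libgomp's `GFS_*` definitions.

```c
#include <stdio.h>
#include <string.h>   /* strcmp, for checking results */

/* Hypothetical local copies of the schedule kinds; libgomp's real
   GFS_* values live in libgomp.h and may differ.  */
enum sched_kind
{
  SCHED_RUNTIME,
  SCHED_STATIC,
  SCHED_DYNAMIC,
  SCHED_GUIDED,
  SCHED_AUTO
};
#define SCHED_MONOTONIC 0x80000000u

/* Format KIND (possibly with SCHED_MONOTONIC set) and CHUNK the way
   the OMP_SCHEDULE line is printed, writing at most LEN bytes.  */
void
format_schedule (unsigned kind, int chunk, char *buf, size_t len)
{
  static const char *const names[]
    = { "RUNTIME", "STATIC", "DYNAMIC", "GUIDED", "AUTO" };
  unsigned base = kind & ~SCHED_MONOTONIC;
  const char *modifier = "";
  int default_chunk = (base == SCHED_STATIC) ? 0 : 1;

  /* Print a modifier only when it differs from the default.  */
  if (kind & SCHED_MONOTONIC)
    {
      if (base != SCHED_STATIC)
        modifier = "MONOTONIC:";
    }
  else if (base == SCHED_STATIC)
    modifier = "NONMONOTONIC:";

  /* AUTO never prints a chunk; the others only a non-default one.  */
  if (base == SCHED_AUTO || chunk == default_chunk)
    snprintf (buf, len, "%s%s", modifier, names[base]);
  else
    snprintf (buf, len, "%s%s,%d", modifier, names[base], chunk);
}
```

For example, `SCHED_STATIC | SCHED_MONOTONIC` with chunk 0 formats as plain `STATIC`, while `SCHED_STATIC` alone with chunk 4 formats as `NONMONOTONIC:STATIC,4`.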
Re: [PATCH] Replace evrp use in loop versioning with ranger.

2021-07-30 Thread Richard Sandiford via Gcc-patches
Aldy Hernandez  writes:
> On Mon, Jul 26, 2021 at 7:28 PM Richard Sandiford
>  wrote:
>>
>> Aldy Hernandez  writes:
>> > On Mon, Jul 26, 2021 at 4:18 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> Aldy Hernandez  writes:
>> >> > This patch replaces the evrp_range_analyzer in the loop versioning code
>> >> > with an on-demand ranger.
>> >> >
>> >> > Everything was pretty straightforward, except that range_of_expr requires
>> >> > a gimple statement as context to provide context-aware ranges.  I didn't see
>> >> > a convenient place where the statement was saved, so I made a vector indexed
>> >> > by SSA names.  As an alternative, I tried to use the loop's first statement,
>> >> > but that proved to be insufficient.
>> >>
>> >> The mapping is one-to-many though: there can be multiple statements
>> >> for each SSA name.  Maybe that doesn't matter in this context and
>> >> any of the statements can act as a representative.
>> >>
>> >> I'm surprised that the loop's first statement didn't work though,
>> >> since the SSA name is supposedly known to be loop-invariant.  What went
>> >> wrong when you tried that?
>> >
>> > I was looking at the first statement of loop_info->block_list and one
>> > of the dg.exp=loop-versioning* tests failed.  Perhaps I should have
>> > used the loop itself, as in the attached patch.  With this patch all
>> > of the loop-versioning tests pass.
>> >
>> >>
>> >> > I am not familiar with loop versioning, but if the DOM walk was only
>> >> > necessary for the calls to record_ranges_from_stmt, this too could be
>> >> > removed as the ranger will work without it.
>> >>
>> >> Yeah, that was the only reason.  If the information is available at
>> >> version_for_unity (I guess it is) then we should just avoid recording
>> >> the versioning there if so.
>> >>
>> >> How expensive is the check?  If the result is worth caching, perhaps
>> >> we should have two bitmaps: the existing one, and one that records
>> >> whether we've checked a particular SSA name.
>> >>
>> >> If the check is relatively cheap then that won't be worth it though.
>> >
>> > If you're asking about the range_of_expr check, that's all cached, so
>> > it should be pretty cheap.  Besides, we're no longer calculating
>> > ranges for each statement in the IL, as we were doing in lv_dom_walker
>> > with evrp's record_ranges_from_stmt.  Only statements of interest are
>> > queried.
>>
>> Sounds good.  If the results are already cached then another level
>> of caching (via the second bitmap I mentioned above) would obviously
>> be a waste of time.
>
> My callgrind harness for performance testing wasn't able to pick up
> enough samples to measure the time spent in
> pass_loop_versioning::execute.  I've seen this happen before with
> passes that run too fast.  I'm afraid I don't have enough cycles to
> continue working on this.

Yeah, any testing of this was above and beyond IMO.  Hearing that the
range query does its own caching was enough for me. :-)

>> > How about this patch, pending tests?
>>
>> OK, thanks, as a strict improvement over the status quo.  But it'd be
>> even better without the dom walk :-)
>
> I've removed the DOM walk, and re-tested.
>
> OK to push?

Sorry for asking for another iteration, but…

> Aldy
>
> From 9b1cba95377e7b26b4f0495b1b5998d2f7f33a14 Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez 
> Date: Sat, 24 Jul 2021 12:29:28 +0200
> Subject: [PATCH] Replace evrp use in loop versioning with ranger.
>
> This patch replaces the evrp_range_analyzer in the loop versioning code
> with a ranger.
>
> Tested on x86-64 Linux.
>
> gcc/ChangeLog:
>
>   * gimple-loop-versioning.cc (lv_dom_walker::lv_dom_walker): Remove.
>   (loop_versioning::lv_dom_walker::before_dom_children): Remove.
>   (loop_versioning::lv_dom_walker::after_dom_children): Remove.
>   (loop_versioning::prune_loop_conditions): Replace vr_values use
>   with range_query interface.
>   (loop_versioning::prune_conditions): Replace dom walk with
>   straight iteration.
>   (pass_loop_versioning::execute): Use ranger.
> ---
>  gcc/gimple-loop-versioning.cc | 78 ---
>  1 file changed, 18 insertions(+), 60 deletions(-)
>
> diff --git a/gcc/gimple-loop-versioning.cc b/gcc/gimple-loop-versioning.cc
> index 4b70c5a4aab..52eb6429171 100644
> --- a/gcc/gimple-loop-versioning.cc
> +++ b/gcc/gimple-loop-versioning.cc
> @@ -30,19 +30,17 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-loop.h"
>  #include "ssa.h"
>  #include "tree-scalar-evolution.h"
> -#include "tree-chrec.h"
>  #include "tree-ssa-loop-ivopts.h"
>  #include "fold-const.h"
>  #include "tree-ssa-propagate.h"
>  #include "tree-inline.h"
>  #include "domwalk.h"
> -#include "alloc-pool.h"
> -#include "vr-values.h"
> -#include "gimple-ssa-evrp-analyze.h"
>  #include "tree-vectorizer.h"
>  #include "omp-general.h"
>  #include "predict.h"
>  #include "tree-into-ssa.h"
> +#include 

Re: PING^5: [PATCH] mips: add MSA vec_cmp and vec_cmpu expand pattern [PR101132]

2021-07-30 Thread Richard Biener via Gcc-patches
On Fri, Jul 30, 2021 at 10:06 AM Richard Sandiford via Gcc-patches
 wrote:
>
> Xi Ruoyao  writes:
> > Ping again.
>
> Sorry that this has gone unreviewed for so long.  I think in practice
> the MIPS port is essentially unmaintained at this point -- it would
> be great if someone would volunteer :-)
>
> It isn't really appropriate for me to review MIPS stuff given that I work
> for a company that has a competing architecture.  I think Jeff expressed
> similar concerns given his new role.

I think that should be a non-issue unless it is an issue between you
and your employer (I realize some companies even restrict what you
can do in your spare time).  We trust maintainers / reviewers to do
the right thing(TM) for the GCC project even when it is against the
interest of the company they are employed by.  That is, not push
crap even if it is in the area of your maintainership.

Richard.


Re: PING^5: [PATCH] mips: Fix up mips_atomic_assign_expand_fenv [PR94780]

2021-07-30 Thread Xi Ruoyao via Gcc-patches
On Fri, 2021-07-30 at 09:11 +0100, Richard Sandiford wrote:
> Xi Ruoyao  writes:
> > Ping again.
> > 
> > On Wed, 2021-06-23 at 11:11 +0800, Xi Ruoyao wrote:
> > > Commit message shamelessly copied from 1777beb6b129 by jakub:
> > > 
> > > This function, because it is sometimes called even outside of function
> > > bodies, uses create_tmp_var_raw rather than create_tmp_var.  But in order
> > > for that to work, when first referenced, the VAR_DECLs need to appear in a
> > > TARGET_EXPR so that during gimplification the var gets the right
> > > DECL_CONTEXT and is added to local decls.
> > > 
> > > Bootstrapped & regtested on mips64el-linux-gnu.  Ok for trunk and backport
> > > to 11, 10, and 9?
> 
> OK for all, thanks.
> 
> Similar comments to the previous message about the appropriateness
> of me reviewing the patch, but like you say, this is doing for MIPS
> what we've already had to do for other targets.

Thanks for reviewing.

Will bootstrap and test it again, and commit if there are no regressions.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University



Re: PING^5: [PATCH] mips: add MSA vec_cmp and vec_cmpu expand pattern [PR101132]

2021-07-30 Thread Xi Ruoyao via Gcc-patches
On Fri, 2021-07-30 at 09:04 +0100, Richard Sandiford wrote:
> Xi Ruoyao  writes:
> > Ping again.
> 
> Sorry that this has gone unreviewed for so long.  I think in practice
> the MIPS port is essentially unmaintained at this point -- it would
> be great if someone would volunteer :-)

A company working on MIPS has contacted me and said one of their
employees may contact the SC and take the role of MIPS maintainer.  I'm
not sure about their progress, though.

> It isn't really appropriate for me to review MIPS stuff given that I work
> for a company that has a competing architecture.  I think Jeff expressed
> similar concerns given his new role.

> That said, the patch looks clearly correct to me, so please go ahead
> and apply (to trunk and GCC 11).  Thanks for your patience.

Thanks!

It has been 5 weeks so it's better to rebase and bootstrap & test it
again.  I'll commit it if there is no regression.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University



Re: PING^5: [PATCH] mips: Fix up mips_atomic_assign_expand_fenv [PR94780]

2021-07-30 Thread Richard Sandiford via Gcc-patches
Xi Ruoyao  writes:
> Ping again.
>
> On Wed, 2021-06-23 at 11:11 +0800, Xi Ruoyao wrote:
>> Commit message shamelessly copied from 1777beb6b129 by jakub:
>> 
>> This function, because it is sometimes called even outside of function
>> bodies, uses create_tmp_var_raw rather than create_tmp_var.  But in order
>> for that to work, when first referenced, the VAR_DECLs need to appear in a
>> TARGET_EXPR so that during gimplification the var gets the right
>> DECL_CONTEXT and is added to local decls.
>> 
>> Bootstrapped & regtested on mips64el-linux-gnu.  Ok for trunk and backport
>> to 11, 10, and 9?

OK for all, thanks.

Similar comments to the previous message about the appropriateness
of me reviewing the patch, but like you say, this is doing for MIPS
what we've already had to do for other targets.

Richard

>> 
>> gcc/
>> 
>> * config/mips/mips.c (mips_atomic_assign_expand_fenv): Use
>>   TARGET_EXPR instead of MODIFY_EXPR.
>> ---
>>  gcc/config/mips/mips.c | 12 ++--
>>  1 file changed, 6 insertions(+), 6 deletions(-)
>> 
>> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
>> index 8f043399a8e..89d1be6cea6 100644
>> --- a/gcc/config/mips/mips.c
>> +++ b/gcc/config/mips/mips.c
>> @@ -22439,12 +22439,12 @@ mips_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
>>    tree get_fcsr = mips_builtin_decls[MIPS_GET_FCSR];
>>    tree set_fcsr = mips_builtin_decls[MIPS_SET_FCSR];
>>    tree get_fcsr_hold_call = build_call_expr (get_fcsr, 0);
>> -  tree hold_assign_orig = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
>> -                                  fcsr_orig_var, get_fcsr_hold_call);
>> +  tree hold_assign_orig = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
>> +                                  fcsr_orig_var, get_fcsr_hold_call, NULL, NULL);
>>    tree hold_mod_val = build2 (BIT_AND_EXPR, MIPS_ATYPE_USI, fcsr_orig_var,
>>                                build_int_cst (MIPS_ATYPE_USI, 0xf003));
>> -  tree hold_assign_mod = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
>> -                                 fcsr_mod_var, hold_mod_val);
>> +  tree hold_assign_mod = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
>> +                                 fcsr_mod_var, hold_mod_val, NULL, NULL);
>>    tree set_fcsr_hold_call = build_call_expr (set_fcsr, 1, fcsr_mod_var);
>>    tree hold_all = build2 (COMPOUND_EXPR, MIPS_ATYPE_USI,
>>                            hold_assign_orig, hold_assign_mod);
>> @@ -22454,8 +22454,8 @@ mips_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
>>    *clear = build_call_expr (set_fcsr, 1, fcsr_mod_var);
>>  
>>    tree get_fcsr_update_call = build_call_expr (get_fcsr, 0);
>> -  *update = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
>> -                    exceptions_var, get_fcsr_update_call);
>> +  *update = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
>> +                    exceptions_var, get_fcsr_update_call, NULL, NULL);
>>    tree set_fcsr_update_call = build_call_expr (set_fcsr, 1, fcsr_orig_var);
>>    *update = build2 (COMPOUND_EXPR, void_type_node, *update,
>>                      set_fcsr_update_call);


Re: PING^5: [PATCH] mips: add MSA vec_cmp and vec_cmpu expand pattern [PR101132]

2021-07-30 Thread Richard Sandiford via Gcc-patches
Xi Ruoyao  writes:
> Ping again.

Sorry that this has gone unreviewed for so long.  I think in practice
the MIPS port is essentially unmaintained at this point -- it would
be great if someone would volunteer :-)

It isn't really appropriate for me to review MIPS stuff given that I work
for a company that has a competing architecture.  I think Jeff expressed
similar concerns given his new role.

That said, the patch looks clearly correct to me, so please go ahead
and apply (to trunk and GCC 11).  Thanks for your patience.

Richard

> On Mon, 2021-06-21 at 21:42 +0800, Xi Ruoyao wrote:
>> The middle end has emitted vec_cmp and vec_cmpu since GCC 11, causing an
>> ICE on MIPS with MSA enabled.  Add the patterns to prevent it.
>> 
>> Bootstrapped and regression tested on mips64el-linux-gnu.
>> Ok for trunk?
>> 
>> gcc/
>> 
>> * config/mips/mips-protos.h (mips_expand_vec_cmp_expr): Declare.
>> * config/mips/mips.c (mips_expand_vec_cmp_expr): New function.
>> * config/mips/mips-msa.md (vec_cmp): New expander.
>>   (vec_cmpu): New expander.
>> ---
>>  gcc/config/mips/mips-msa.md   | 22 ++
>>  gcc/config/mips/mips-protos.h |  1 +
>>  gcc/config/mips/mips.c    | 11 +++
>>  3 files changed, 34 insertions(+)
>> 
>> diff --git a/gcc/config/mips/mips-msa.md b/gcc/config/mips/mips-msa.md
>> index 3ecf2bde19f..3a67f25be56 100644
>> --- a/gcc/config/mips/mips-msa.md
>> +++ b/gcc/config/mips/mips-msa.md
>> @@ -435,6 +435,28 @@
>>    DONE;
>>  })
>>  
>> +(define_expand "vec_cmp"
>> +  [(match_operand: 0 "register_operand")
>> +   (match_operator 1 ""
>> + [(match_operand:MSA 2 "register_operand")
>> +  (match_operand:MSA 3 "register_operand")])]
>> +  "ISA_HAS_MSA"
>> +{
>> +  mips_expand_vec_cmp_expr (operands);
>> +  DONE;
>> +})
>> +
>> +(define_expand "vec_cmpu"
>> +  [(match_operand: 0 "register_operand")
>> +   (match_operator 1 ""
>> + [(match_operand:IMSA 2 "register_operand")
>> +  (match_operand:IMSA 3 "register_operand")])]
>> +  "ISA_HAS_MSA"
>> +{
>> +  mips_expand_vec_cmp_expr (operands);
>> +  DONE;
>> +})
>> +
>>  (define_insn "msa_insert_"
>>    [(set (match_operand:MSA 0 "register_operand" "=f,f")
>> (vec_merge:MSA
>> diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
>> index 2cf4ed50292..a685f7f7dd5 100644
>> --- a/gcc/config/mips/mips-protos.h
>> +++ b/gcc/config/mips/mips-protos.h
>> @@ -385,6 +385,7 @@ extern mulsidi3_gen_fn mips_mulsidi3_gen_fn (enum rtx_code);
>>  
>>  extern void mips_register_frame_header_opt (void);
>>  extern void mips_expand_vec_cond_expr (machine_mode, machine_mode, rtx *);
>> +extern void mips_expand_vec_cmp_expr (rtx *);
>>  
>>  /* Routines implemented in mips-d.c  */
>>  extern void mips_d_target_versions (void);
>> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
>> index 00a8eef96aa..8f043399a8e 100644
>> --- a/gcc/config/mips/mips.c
>> +++ b/gcc/config/mips/mips.c
>> @@ -22321,6 +22321,17 @@ mips_expand_msa_cmp (rtx dest, enum rtx_code cond, rtx op0, rtx op1)
>>  }
>>  }
>>  
>> +void
>> +mips_expand_vec_cmp_expr (rtx *operands)
>> +{
>> +  rtx cond = operands[1];
>> +  rtx op0 = operands[2];
>> +  rtx op1 = operands[3];
>> +  rtx res = operands[0];
>> +
>> +  mips_expand_msa_cmp (res, GET_CODE (cond), op0, op1);
>> +}
>> +
>>  /* Expand VEC_COND_EXPR, where:
>>     MODE is mode of the result
>>     VIMODE equivalent integer mode
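For reference, a vector comparison whose result is used as a lane mask, as in the GNU C vector-extension sketch below, is the kind of operation the middle end can expand through the `vec_cmp` (signed) and `vec_cmpu` (unsigned) optabs when a target provides them. This is an illustrative example, not the PR101132 testcase; the type and function names are hypothetical.

```c
/* GNU C vector extension types: four 32-bit lanes each.  */
typedef int v4si __attribute__ ((vector_size (16)));
typedef unsigned int v4su __attribute__ ((vector_size (16)));

/* Signed lane-wise compare: each result lane is -1 (all bits set)
   where a < b and 0 elsewhere.  */
v4si
lt_mask (v4si a, v4si b)
{
  return a < b;
}

/* Unsigned lane-wise compare; comparing unsigned vectors still yields
   a signed integer vector of the same width and lane count.  */
v4si
ltu_mask (v4su a, v4su b)
{
  return a < b;
}
```

With `ua = { 0, 0xffffffff, 1, 2 }` and `ub = { 1, 0, 1, 3 }`, `ltu_mask` gives `{ -1, 0, 0, -1 }`; a signed comparison would instead treat lane 1 as -1 < 0, which is why the unsigned expander is a separate pattern.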


Re: [PATCH] c++: __builtin_is_pointer_interconvertible_with_class incremental fix [PR101539]

2021-07-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 30, 2021 at 01:10:33AM -0400, Jason Merrill wrote:
> On 7/29/21 10:21 AM, Jakub Jelinek wrote:
> > On Thu, Jul 29, 2021 at 09:50:10AM +0200, Jakub Jelinek via Gcc-patches wrote:
> > > Now that I'm writing the above text and rereading the
> > > pointer-interconvertibility definition, I think my first_nonstatic_data_member_p
> > > and fold_builtin_is_pointer_inverconvertible_with_class have one bug,
> > > for unions the pointer inter-convertibility doesn't talk about std layout at
> > > all, so I think I need to check for std_layout_type_p only for non-union
> > > class types and accept any union, std_layout_type_p or not.  But when
> > > recursing from a union type into anonymous structure type punt if the
> > > anonymous structure type is not std_layout_type_p + add testcase coverage.
> > 
> > For this part, here is an incremental fix.  Tested on x86_64-linux.
> > 
> > It also shows that in the case (we're beyond the standard in this case
> > because anonymous structures are not in the standard) of union with
> > non-std-layout anonymous structure in it, in the case in the testcases like:
> > struct D {};
> > struct E { [[no_unique_address]] D e; };
> > union Y { int a; struct : public E { short b; long c; }; long long d; };
> 
> We don't already reject an anonymous struct with bases?  I think we should
> do so, in fixup_anonymous_aggr.  We might even require anonymous structs to
> be standard-layout.

Apparently not, the above is accepted.  I was looking for an example of
non-stdlayout anon aggregate in union and my first try (mixing
private/public members) has been rejected.

> I'm inclined not to handle this extension case specifically.

You mean not to even recurse into anonymous structures in the function,
or something else?

Jakub



[PATCH committed] ipa-devirt: check precision mismatch of enum values [PR101396]

2021-07-30 Thread Xi Ruoyao via Gcc-patches
On Fri, 2021-07-30 at 15:21 +0800, Xi Ruoyao via Gcc-patches wrote:
> On Fri, 2021-07-30 at 15:00 +0800, Kewen.Lin wrote:
> > Hi Ruoyao,
> > 
> > on 2021/7/30 12:57 PM, Xi Ruoyao via Gcc-patches wrote:
> > > Ping again.
> > > 
> > 
> > This ping-ed patch has been approved by Richard at
> > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576001.html
> > 
> > Just chime in as I guess you didn't receive his mail somehow.
> 
> Hi Kewen,
> 
> I guess I missed some mail because I'm not subscribed to gcc-patches.
> Thanks for the reminder!
> 

Rebased and bootstrapped & regtested on x86_64-linux-gnu.

Committed at 291416d3.


