Re: [PATCH v3] LoongArch: add addr_global attribute

2022-07-29 Thread Huacai Chen via Gcc-patches
Hi, Lulu,

On Sat, Jul 30, 2022 at 11:13 AM Lulu Cheng  wrote:
>
>
> 在 2022/7/30 上午1:17, Xi Ruoyao 写道:
>
> Change v2 to v3:
> - Disable section anchor for addr_global symbols.
> - Use -O2 in test to make sure section anchor is disabled.
>
> --
>
> Background:
> https://lore.kernel.org/loongarch/d7670b60-2782-4642-995b-7baa01779...@loongson.cn/T/#e1d47e2fe185f2e2be8fdc0784f0db2f644119379
>
> Question:  Do you have a better name than "addr_global"?
>
> I think the name can be changed to "get_through_got". What do you think of it?
Is this the same thing as "movable" once we used internally?

Huacai
>
> Alternatives:
>
> 1. Just use "-mno-explicit-relocs -mla-local-with-abs" for kernel
> modules.  It's stupid IMO.
> 2. Implement a "-maddress-local-with-got" option for GCC and use it for
> kernel modules.  It seems too overkill: we might create many unnecessary
> GOT entries.
> 3. For all variables with a section attribute, consider it global.  It
> may make sense, but I just checked x86_64 and riscv and they don't do
> this.
> 4. Implement -mcmodel=extreme and use it for kernel modules.  To me
> "extreme" seems really too extreme.
> 5. More hacks in kernel. (Convert relocations against .data..percpu with
> objtool?  But objtool is not even implemented for LoongArch yet.)
>
> Note: I'll be mostly AFK in the following week.  My attempt to finish
> the kernel support for new relocs before going AFK now failed miserably
> :(.
>
> -- >8 --
>
> A linker script and/or a section attribute may locate a local object in
> some way unexpected by the code model, leading to a link failure.  This
> happens when the Linux kernel loads a module with "local" per-CPU
> variables.
>
> Add an attribute to explicitly mark an variable with the address
> unlimited by the code model so we would be able to work around such
> problems.
>
>
> Others I think are ok.


Re: [PATCH 2/2] xtensa: Fix conflicting hard regno between indirect sibcall fixups and EH_RETURN_STACKADJ_RTX

2022-07-29 Thread Max Filippov via Gcc-patches
On Fri, Jul 29, 2022 at 12:34 PM Takayuki 'January June' Suwa
 wrote:
>
> The hard register A10 was already allocated for EH_RETURN_STACKADJ_RTX.
> (although exception handling and sibling call may not apply at the same time,
>  but for safety)
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.md: Change hard register number used in
> the split patterns for indirect sibling call fixups from 10 to 11,
> the last free one for the CALL0 ABI.
> ---
>  gcc/config/xtensa/xtensa.md | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [PATCH 1/2] xtensa: Add RTX costs for if_then_else

2022-07-29 Thread Max Filippov via Gcc-patches
On Fri, Jul 29, 2022 at 12:34 PM Takayuki 'January June' Suwa
 wrote:
>
> It takes one machine instruction for both condtional branch and move.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.cc (xtensa_rtx_costs):
> Add new case for IF_THEN_ELSE.
> ---
>  gcc/config/xtensa/xtensa.cc | 1 +
>  1 file changed, 1 insertion(+)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [PATCH v3] LoongArch: add addr_global attribute

2022-07-29 Thread Lulu Cheng



在 2022/7/30 上午1:17, Xi Ruoyao 写道:

Change v2 to v3:
- Disable section anchor for addr_global symbols.
- Use -O2 in test to make sure section anchor is disabled.

--

Background:
https://lore.kernel.org/loongarch/d7670b60-2782-4642-995b-7baa01779...@loongson.cn/T/#e1d47e2fe185f2e2be8fdc0784f0db2f644119379

Question:  Do you have a better name than "addr_global"?


I think the name can be changed to "get_through_got".What do you think of it?



Alternatives:

1. Just use "-mno-explicit-relocs -mla-local-with-abs" for kernel
modules.  It's stupid IMO.
2. Implement a "-maddress-local-with-got" option for GCC and use it for
kernel modules.  It seems too overkill: we might create many unnecessary
GOT entries.
3. For all variables with a section attribute, consider it global.  It
may make sense, but I just checked x86_64 and riscv and they don't do
this.
4. Implement -mcmodel=extreme and use it for kernel modules.  To me
"extreme" seems really too extreme.
5. More hacks in kernel. (Convert relocations against .data..percpu with
objtool?  But objtool is not even implemented for LoongArch yet.)

Note: I'll be mostly AFK in the following week.  My attempt to finish
the kernel support for new relocs before going AFK now failed miserably
:(.

-- >8 --

A linker script and/or a section attribute may locate a local object in
some way unexpected by the code model, leading to a link failure.  This
happens when the Linux kernel loads a module with "local" per-CPU
variables.

Add an attribute to explicitly mark an variable with the address
unlimited by the code model so we would be able to work around such
problems.



Others I think are ok.



Re: [PATCH] analyzer: support for creat, dup, dup2 and dup3 in sm-fd.cc [PR106300]

2022-07-29 Thread David Malcolm via Gcc-patches
On Fri, 2022-07-29 at 21:59 +0530, Immad Mir wrote:
> This patch extends the state machine in sm-fd.cc to support
> creat, dup, dup2 and dup3 functions.

Thanks for the patch.

Please can you use PR 106298 for this (in the ChangeLog and subject
line), rather than 106300, as it's more specific.

It's always a battle to keep the subject line a reasonable length,
especially with an "analyzer: " prefix and a " [PRnn]" suffix.  I
think you can leave off the " in sm-fd.cc" from the subject line, which
will help.

I think the patch is close to ready; review comments inline...

> 
> Lightly tested on x86_64 Linux.
> 
> gcc/analyzer/ChangeLog:
> PR analyzer/106300
> * sm-fd.cc (fd_state_machine::on_open): Add
> creat, dup, dup2 and dup3 functions.
> (enum dup): New.
> (fd_state_machine::valid_to_unchecked_state): New.
> (fd_state_machine::on_creat): New.
> (fd_state_machine::on_dup): New.
> 
> gcc/testsuite/ChangeLog:
> PR analyzer/106300
> * gcc.dg/analyzer/fd-1.c: Add tests for 'creat'.
> * gcc.dg/analyzer/fd-2.c: Likewise.
> * gcc.dg/analyzer/fd-4.c: Likewise.
> * gcc.dg/analyzer/fd-6.c: New tests.

At some point we'll want to add documentation to invoke.texi about what
functions we're special-casing in -fanalyzer, but you can postpone the
sm-fd.cc part of that to a follow-up patch if you like.

[...snip...]

> --- a/gcc/analyzer/sm-fd.cc
> +++ b/gcc/analyzer/sm-fd.cc
> @@ -69,6 +69,13 @@ enum access_directions
>    DIRS_WRITE
>  };
>  
> +enum dup
> +{
> +  DUP_1,
> +  DUP_2,
> +  DUP_3
> +};

Please add a comment documenting this enum.

[...snip...]

> +void
> +fd_state_machine::check_for_dup (sm_context *sm_ctxt, const
> supernode *node,
> +    const gimple *stmt, const gcall
> *call,
> +    const tree callee_fndecl, enum dup
> kind) const
> +{
> +  tree lhs = gimple_call_lhs (call);
> +  tree arg_1 = gimple_call_arg (call, 0);
> +  state_t state_arg_1 = sm_ctxt->get_state (stmt, arg_1);
> +  tree diag_arg_1 = sm_ctxt->get_diagnostic_tree (arg_1);
> +  if (state_arg_1 == m_stop)
> +    return;
> +  if (!(is_constant_fd_p (state_arg_1) || is_valid_fd_p
> (state_arg_1)))
> +    {
> +  sm_ctxt->warn (
> + node, stmt, arg_1,
> + new fd_use_without_check (*this, diag_arg_1,
> callee_fndecl));
> +  if (kind == DUP_1)
> +   return;
> +    }

I don't see test coverage for dup of a closed fd; I think we'd want to
warn for that with an fd_use_after_close.  That ought to be tested for
here, but if I'm reading it right, the check is missing.  Can you reuse
fd_state_machine::check_for_open_fd here, rather than duplicating the
logic for use-without-check and for use-after-close ? (since I think
the code is almost the same, apart, perhaps from that early return when
kind is DUP_1).

> +  switch (kind)
> +    {
> +    case DUP_1:
> +  if (!is_constant_fd_p (state_arg_1))
> +   if (lhs)
> + sm_ctxt->set_next_state (stmt, lhs,
> +  valid_to_unchecked_state
> (state_arg_1));
> +  break;

What happens on dup() of an integer constant?
My understanding is that:
* in terms of POSIX: dup will return either a new FD, or -1.
* as for this patch, the LHS won't be checked (for validity, or leaks).
Shouldn't we transition the LHS to m_unchecked_read_write?  Or am I
missing something?

I don't see test coverage for dup of integer constants - please add
some.


> +
> +    case DUP_2:
> +    case DUP_3:
> +  tree arg_2 = gimple_call_arg (call, 1);
> +  state_t state_arg_2 = sm_ctxt->get_state (stmt, arg_2);
> +  tree diag_arg_2 = sm_ctxt->get_diagnostic_tree (arg_2);
> +  if (state_arg_2 == m_stop)
> +   return;
> +  if (!(is_constant_fd_p (state_arg_2) || is_valid_fd_p
> (state_arg_2)))
> +   {
> + sm_ctxt->warn (
> + node, stmt, arg_2,
> + new fd_use_without_check (*this, diag_arg_2,
> callee_fndecl));
> + return;
> +   }
> +
> +  if (!is_constant_fd_p (state_arg_2))
> +   {
> + if (lhs)
> +   sm_ctxt->set_next_state (stmt, lhs,
> +    valid_to_unchecked_state
> (state_arg_2));

I got a bit confused by the logic here, but I think it's correct. 
Maybe add a comment clarifying that we want to make sure we don't pass
-1 as the 2nd argument, and therefor we want to check for
is_valid_fd_p.

We're not yet modeling that lhs == arg_2 on success, but maybe we don't
need to.  Maybe add a comment to that effect.

[...snip...]

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/fd-6.c

I get a bit lost when there are lots of numbered test files.  Perhaps
you could renaming this, say, to fd-dup-1.c, to reflect the theme of
what's being tested.  But that's up to you.

[...snip...]

Thanks again for the patch.

Dave



[PATCH, v3] Fortran: detect blanks within literal constants in free-form mode [PR92805]

2022-07-29 Thread Harald Anlauf via Gcc-patches

Hi Mikael,

Am 29.07.22 um 22:36 schrieb Mikael Morin:

Indeed, I overlooked that, but my opinion remains that we shouldn’t play
with fixed vs free form considerations here.
So the options I can see are:
  - handle the locus in get_kind; we do it a lot already in matching
functions, so it wouldn’t be different here.
  - implement a variant of gfc_match_char without space gobbling.
  - use gfc_match(...), which is a bit heavy weight to match a single
char string, but otherwise would keep things concise.

My preference goes to the third option, but I’m fine with either of them
if you have a different one.



how about the attached?

This introduces the helper function gfc_match_next_char, which is
your second option.

Thanks,
Harald
From 0a95d103e4855fbcc20fd24d44b97b690d570333 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 28 Jul 2022 22:07:02 +0200
Subject: [PATCH] Fortran: detect blanks within literal constants in free-form
 mode [PR92805]

gcc/fortran/ChangeLog:

	PR fortran/92805
	* gfortran.h (gfc_match_next_char): Declare it.
	* primary.cc (get_kind): Do not skip over blanks in free-form mode.
	(match_string_constant): Likewise.
	* scanner.cc (gfc_match_next_char): New.  Match next character of
	input, treating whitespace depending on fixed or free form.

gcc/testsuite/ChangeLog:

	PR fortran/92805
	* gfortran.dg/literal_constants.f: New test.
	* gfortran.dg/literal_constants.f90: New test.

Co-authored-by: Steven G. Kargl 
---
 gcc/fortran/gfortran.h|  1 +
 gcc/fortran/primary.cc| 17 +
 gcc/fortran/scanner.cc| 17 +
 gcc/testsuite/gfortran.dg/literal_constants.f | 20 
 .../gfortran.dg/literal_constants.f90 | 24 +++
 5 files changed, 68 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/literal_constants.f
 create mode 100644 gcc/testsuite/gfortran.dg/literal_constants.f90

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 696aadd7db6..645a30e7d49 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3209,6 +3209,7 @@ gfc_char_t gfc_next_char (void);
 char gfc_next_ascii_char (void);
 gfc_char_t gfc_peek_char (void);
 char gfc_peek_ascii_char (void);
+match gfc_match_next_char (gfc_char_t);
 void gfc_error_recovery (void);
 void gfc_gobble_whitespace (void);
 void gfc_new_file (void);
diff --git a/gcc/fortran/primary.cc b/gcc/fortran/primary.cc
index 3f01f67cd49..9fa6779200f 100644
--- a/gcc/fortran/primary.cc
+++ b/gcc/fortran/primary.cc
@@ -92,14 +92,17 @@ get_kind (int *is_iso_c)
 {
   int kind;
   match m;
+  char c;
 
   *is_iso_c = 0;
 
-  if (gfc_match_char ('_') != MATCH_YES)
+  if (gfc_match_next_char ('_') != MATCH_YES)
 return -2;
 
-  m = match_kind_param (&kind, is_iso_c);
-  if (m == MATCH_NO)
+  m = MATCH_NO;
+  c = gfc_peek_ascii_char ();
+  if ((gfc_current_form == FORM_FREE && gfc_is_whitespace (c))
+  || (m = match_kind_param (&kind, is_iso_c)) == MATCH_NO)
 gfc_error ("Missing kind-parameter at %C");
 
   return (m == MATCH_YES) ? kind : -1;
@@ -1074,17 +1077,9 @@ match_string_constant (gfc_expr **result)
   c = gfc_next_char ();
 }
 
-  if (c == ' ')
-{
-  gfc_gobble_whitespace ();
-  c = gfc_next_char ();
-}
-
   if (c != '_')
 goto no_match;
 
-  gfc_gobble_whitespace ();
-
   c = gfc_next_char ();
   if (c != '\'' && c != '"')
 goto no_match;
diff --git a/gcc/fortran/scanner.cc b/gcc/fortran/scanner.cc
index 2dff2514700..2d1980c074c 100644
--- a/gcc/fortran/scanner.cc
+++ b/gcc/fortran/scanner.cc
@@ -1690,6 +1690,23 @@ gfc_peek_ascii_char (void)
 }
 
 
+/* Match next character of input.  In fixed form mode, we also ignore
+   spaces.  */
+
+match
+gfc_match_next_char (gfc_char_t c)
+{
+  locus where;
+
+  where = gfc_current_locus;
+  if (gfc_next_char () == c)
+return MATCH_YES;
+
+  gfc_current_locus = where;
+  return MATCH_NO;
+}
+
+
 /* Recover from an error.  We try to get past the current statement
and get lined up for the next.  The next statement follows a '\n'
or a ';'.  We also assume that we are not within a character
diff --git a/gcc/testsuite/gfortran.dg/literal_constants.f b/gcc/testsuite/gfortran.dg/literal_constants.f
new file mode 100644
index 000..4d1f1b7eb4c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/literal_constants.f
@@ -0,0 +1,20 @@
+! { dg-do compile }
+! { dg-options "-ffixed-form" }
+! PR fortran/92805 - blanks within literal constants in fixed-form mode
+
+  implicit none
+  integer, parameter :: ck = kind ("a")  ! default character kind
+  integer, parameter :: rk = kind (1.0)  ! default real kind
+  print *, 1_"abc"
+  print *, 1 _"abc"
+  print *, 1_ "abc"
+  print *, ck_"a"
+  print *, ck _"ab"
+  print *, ck_ "ab"
+  print *, 3.1415_4
+  print *, 3.1415 _4
+  print *, 3.1415_ 4
+  print *, 3.1415_rk
+  print *, 3.1415 _rk
+ 

Re: [PATCH, v2] Fortran: detect blanks within literal constants in free-form mode [PR92805]

2022-07-29 Thread Mikael Morin

Le 29/07/2022 à 21:59, Harald Anlauf via Fortran a écrit :


Am 29.07.22 um 13:11 schrieb Mikael Morin:

 > and use gfc_next_char instead of gfc_match_char

in get_kind.


There is one important functionality in gfc_match_char(): it manages
the locus.  We would need then to add this explicitly to get_kind,
which does not look to me like a big improvement over the present
solution.  Otherwise I get test regressions.

Indeed, I overlooked that, but my opinion remains that we shouldn’t play 
with fixed vs free form considerations here.

So the options I can see are:
 - handle the locus in get_kind; we do it a lot already in matching 
functions, so it wouldn’t be different here.

 - implement a variant of gfc_match_char without space gobbling.
 - use gfc_match(...), which is a bit heavy weight to match a single 
char string, but otherwise would keep things concise.


My preference goes to the third option, but I’m fine with either of them 
if you have a different one.


Re: [PATCH] openmp-simd-clone: Match shift type

2022-07-29 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 29, 2022 at 06:03:18PM +0100, Andrew Stubbs wrote:
> On 29/07/2022 16:59, Jakub Jelinek wrote:
> > Doing the fold_convert seems to be a wasted effort to me.
> > Can't this be done conditional on whether some change is needed at all
> > and just using gimple_build_assign with NOP_EXPR, so something like:
> 
> I'm just not familiar enough with this stuff to run fold_convert in my head
> with confidence.

The thing with fold_convert is that if some conversion is needed (and
fold_convert actually is strict, so even if the conversion is useless
but the type isn't exactly the same) it creates a NOP_EXPR around the
argument, and then gimple_build_assign notices it should create a NOP_EXPR
assign rhs op and just uses the argument of NOP_EXPR, where the NOP_EXPR
will be GC later.
Plus, if the conversion isn't needed, it creates an extra assignment that
will be only later in some other pass optimized away.
> 
> >   tree shift_cvt_conv = shift_cnt;
> >   if (!useless_type_conversion_p (TREE_TYPE (mask),
> >   TREE_TYPE (shift_cnt)))
> > {
> >   shift_cnt_conv = make_ssa_name (TREE_TYPE (mask));
> >   g = gimple_build_assign (shift_cnt_conv, NOP_EXPR, shift_cnt);
> >   gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
> > }
> > 
> 
> Your version gives the same output mine does, at least on amdgcn anyway.
> 
> Am I OK to commit this version?

Yes, thanks.

> openmp-simd-clone: Match shift types
> 
> Ensure that both parameters to vector shifts use the same mode.  This is most
> important for amdgcn where the masks are DImode.
> 
> gcc/ChangeLog:
> 
>   * omp-simd-clone.cc (simd_clone_adjust): Convert shift_cnt to match
>   the mask type.
> 
> Co-authored-by: Jakub Jelinek  
> 
> diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
> index 32649bc3f9a..58bd68b129b 100644
> --- a/gcc/omp-simd-clone.cc
> +++ b/gcc/omp-simd-clone.cc
> @@ -1305,8 +1305,16 @@ simd_clone_adjust (struct cgraph_node *node)
>  build_int_cst (TREE_TYPE (iter1), c));
> gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
>   }
> +   tree shift_cnt_conv = shift_cnt;
> +   if (!useless_type_conversion_p (TREE_TYPE (mask),
> +   TREE_TYPE (shift_cnt)))
> + {
> +   shift_cnt_conv = make_ssa_name (TREE_TYPE (mask));
> +   g = gimple_build_assign (shift_cnt_conv, NOP_EXPR, shift_cnt);
> +   gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
> + }
> g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),
> -RSHIFT_EXPR, mask, shift_cnt);
> +RSHIFT_EXPR, mask, shift_cnt_conv);
> gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
> mask = gimple_assign_lhs (g);
> g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),


Jakub



Re: [PATCH, v2] Fortran: fix invalid rank error in ASSOCIATED when rank is remapped [PR77652]

2022-07-29 Thread Harald Anlauf via Gcc-patches

Am 28.07.22 um 22:19 schrieb Mikael Morin:

Hello,

Le 27/07/2022 à 21:45, Harald Anlauf via Fortran a écrit :

ok, I have thought about your comments in the review process,
and played with the Cray compiler.  Attached is a refined version
of the patch that now rejects in addition these cases for which there
are no possible related pointer assignments with bounds remapping:

   ASSOCIATED (scalar, array) ! impossible, cannot remap bounds
   ASSOCIATED (array, scalar) ! a scalar is not simply contiguous


In principle, it could make sense to construct a one-sized array pointer
(of any rank) pointing to a scalar, but I agree that if we follow the
rules of the standard to the letter, it should be rejected (and we do
reject such a pointer assignment).
So, with this case eliminated, this patch looks good to me as is.


OK, so I will push that version soon.


Regarding Toon’s suggestion to ask the fortran committee, it sounds
sensible.  I propose to prepare something tomorrow.



Depending on the answer we can later refine the compile-time check
and relax if needed.

Harald


[PATCH, v2] Fortran: detect blanks within literal constants in free-form mode [PR92805]

2022-07-29 Thread Harald Anlauf via Gcc-patches

Hi Mikael,

Am 29.07.22 um 13:11 schrieb Mikael Morin:

Hello,

Le 28/07/2022 à 22:11, Harald Anlauf via Fortran a écrit :

Dear all,

in free-form mode, blanks are significant, so they cannot appear
in literal constants, especially not before or after the "_" that
separates the literal and the kind specifier.

The initial patch from Steve addressed numerical literals, which
I completed by adjusting the parsing of string literals.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


It looks correct, but I think we should continue to have the free vs
fixed form abstracted away from the parsing code.


yes, that makes sense.


So, I suggest instead to remove the calls to gfc_gobble_whitespace in
match_string_constant,


Indeed, removing these simplifies the function and indeed works!

> and use gfc_next_char instead of gfc_match_char

in get_kind.


There is one important functionality in gfc_match_char(): it manages
the locus.  We would need then to add this explicitly to get_kind,
which does not look to me like a big improvement over the present
solution.  Otherwise I get test regressions.


Mikael



I've attached a revised version with improved match_string_constant().
What do you think?

Thanks,
Harald
From f8e7c297b7c9e5a2b22185c7e0d638764c33aa71 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 28 Jul 2022 22:07:02 +0200
Subject: [PATCH] Fortran: detect blanks within literal constants in free-form
 mode [PR92805]

gcc/fortran/ChangeLog:

	PR fortran/92805
	* primary.cc (get_kind): Do not skip over blanks in free-form mode.
	(match_string_constant): Likewise.

gcc/testsuite/ChangeLog:

	PR fortran/92805
	* gfortran.dg/literal_constants.f: New test.
	* gfortran.dg/literal_constants.f90: New test.

Co-authored-by: Steven G. Kargl 
---
 gcc/fortran/primary.cc| 19 +++
 gcc/testsuite/gfortran.dg/literal_constants.f | 20 
 .../gfortran.dg/literal_constants.f90 | 24 +++
 3 files changed, 53 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/literal_constants.f
 create mode 100644 gcc/testsuite/gfortran.dg/literal_constants.f90

diff --git a/gcc/fortran/primary.cc b/gcc/fortran/primary.cc
index 3f01f67cd49..604f98a8dd9 100644
--- a/gcc/fortran/primary.cc
+++ b/gcc/fortran/primary.cc
@@ -92,14 +92,21 @@ get_kind (int *is_iso_c)
 {
   int kind;
   match m;
+  char c;
 
   *is_iso_c = 0;
 
+  c = gfc_peek_ascii_char ();
+  if (gfc_current_form == FORM_FREE && gfc_is_whitespace (c))
+return -2;
+
   if (gfc_match_char ('_') != MATCH_YES)
 return -2;
 
-  m = match_kind_param (&kind, is_iso_c);
-  if (m == MATCH_NO)
+  m = MATCH_NO;
+  c = gfc_peek_ascii_char ();
+  if ((gfc_current_form == FORM_FREE && gfc_is_whitespace (c))
+  || (m = match_kind_param (&kind, is_iso_c)) == MATCH_NO)
 gfc_error ("Missing kind-parameter at %C");
 
   return (m == MATCH_YES) ? kind : -1;
@@ -1074,17 +1081,9 @@ match_string_constant (gfc_expr **result)
   c = gfc_next_char ();
 }
 
-  if (c == ' ')
-{
-  gfc_gobble_whitespace ();
-  c = gfc_next_char ();
-}
-
   if (c != '_')
 goto no_match;
 
-  gfc_gobble_whitespace ();
-
   c = gfc_next_char ();
   if (c != '\'' && c != '"')
 goto no_match;
diff --git a/gcc/testsuite/gfortran.dg/literal_constants.f b/gcc/testsuite/gfortran.dg/literal_constants.f
new file mode 100644
index 000..4d1f1b7eb4c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/literal_constants.f
@@ -0,0 +1,20 @@
+! { dg-do compile }
+! { dg-options "-ffixed-form" }
+! PR fortran/92805 - blanks within literal constants in fixed-form mode
+
+  implicit none
+  integer, parameter :: ck = kind ("a")  ! default character kind
+  integer, parameter :: rk = kind (1.0)  ! default real kind
+  print *, 1_"abc"
+  print *, 1 _"abc"
+  print *, 1_ "abc"
+  print *, ck_"a"
+  print *, ck _"ab"
+  print *, ck_ "ab"
+  print *, 3.1415_4
+  print *, 3.1415 _4
+  print *, 3.1415_ 4
+  print *, 3.1415_rk
+  print *, 3.1415 _rk
+  print *, 3.1415_ rk
+  end
diff --git a/gcc/testsuite/gfortran.dg/literal_constants.f90 b/gcc/testsuite/gfortran.dg/literal_constants.f90
new file mode 100644
index 000..f8908f9ad76
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/literal_constants.f90
@@ -0,0 +1,24 @@
+! { dg-do compile }
+! { dg-options "-ffree-form" }
+! PR fortran/92805 - blanks within literal constants in free-form mode
+
+  implicit none
+  integer, parameter :: ck = kind ("a")  ! default character kind
+  integer, parameter :: rk = kind (1.0)  ! default real kind
+  print *, 1_"abc"
+  print *, 1 _"abc"   ! { dg-error "Syntax error" }
+  print *, 1_ "abc"   ! { dg-error "Missing kind-parameter" }
+  print *, 1 _ "abc"  ! { dg-error "Syntax error" }
+  print *, ck_"a"
+  print *, ck _"ab"   ! { dg-error "Syntax error" }
+  print *, ck_ "ab"   ! { dg-error "Syntax error

Re: [GCC13][Patch][V2][1/2]Add a new option -fstrict-flex-array[=n] and attribute strict_flex_array(n) and use it in PR101836

2022-07-29 Thread Qing Zhao via Gcc-patches
Hi, Richard,

Thanks a lot for your comments and suggestions. (And sorry for my late reply).

> On Jul 28, 2022, at 3:26 AM, Richard Biener  wrote:
> 
> On Tue, 19 Jul 2022, Qing Zhao wrote:
> 
>> From 3854004802b8e2f132ebf218fc35a632f5e80c6a Mon Sep 17 00:00:00 2001
>> From: Qing Zhao 
>> Date: Mon, 18 Jul 2022 17:04:12 +
>> Subject: [PATCH 1/2] Add a new option -fstrict-flex-array[=n] and new
>> attribute strict_flex_array
>> 
>> Add the following new option -fstrict-flex-array[=n] and a corresponding
>> attribute strict_flex_array to GCC:
>> 
>> '-fstrict-flex-array'
>>Treat the trailing array of a structure as a flexible array member
>>in a stricter way.  The positive form is equivalent to
>>'-fstrict-flex-array=3', which is the strictest.  A trailing array
>>is treated as a flexible array member only when it is declared as a
>>flexible array member per C99 standard onwards.  The negative form
>>is equivalent to '-fstrict-flex-array=0', which is the least
>>strict.  All trailing arrays of structures are treated as flexible
>>array members.
>> 
>> '-fstrict-flex-array=LEVEL'
>>Treat the trailing array of a structure as a flexible array member
>>in a stricter way.  The value of LEVEL controls the level of
>>strictness.
>> 
>>The possible values of LEVEL are the same as for the
>>'strict_flex_array' attribute (*note Variable Attributes::).
>> 
>>You can control this behavior for a specific trailing array field
>>of a structure by using the variable attribute 'strict_flex_array'
>>attribute (*note Variable Attributes::).
>> 
>> 'strict_flex_array (LEVEL)'
>>The 'strict_flex_array' attribute should be attached to the
>>trailing array field of a structure.  It specifies the level of
>>strictness of treating the trailing array field of a structure as a
>>flexible array member.  LEVEL must be an integer betwen 0 to 3.
>> 
>>LEVEL=0 is the least strict level, all trailing arrays of
>>structures are treated as flexible array members.  LEVEL=3 is the
>>strictest level, only when the trailing array is declared as a
>>flexible array member per C99 standard onwards ([]), it is treated
>>as a flexible array member.
>> 
>>There are two more levels in between 0 and 3, which are provided to
>>support older codes that use GCC zero-length array extension ([0])
>>or one-size array as flexible array member ([1]): When LEVEL is 1,
>>the trailing array is treated as a flexible array member when it is
>>declared as either [], [0], or [1]; When LEVEL is 2, the trailing
>>array is treated as a flexible array member when it is declared as
>>either [], or [0].
>> 
>>This attribute can be used with or without '-fstrict-flex-array'.
>>When both the attribute and the option present at the same time,
>>the level of the strictness for the specific trailing array field
>>is determined by the attribute.
>> 
>> gcc/c-family/ChangeLog:
>> 
>>  * c-attribs.cc (handle_strict_flex_array_attribute): New function.
>>  (c_common_attribute_table): New item for strict_flex_array.
>>  * c.opt (fstrict-flex-array): New option.
>>  (fstrict-flex-array=): New option.
>> 
>> gcc/c/ChangeLog:
>> 
>>  * c-decl.cc (add_flexible_array_elts_to_size): Call new utility
>>  routine flexible_array_member_p.
>>  (is_flexible_array_member_p): New function.
>>  (finish_struct): Set the new DECL_NOT_FLEXARRAY flag.
>> 
>> gcc/ChangeLog:
>> 
>>  * doc/extend.texi: Document strict_flex_array attribute.
>>  * doc/invoke.texi: Document -fstrict-flex-array[=n] option.
>>  * tree-core.h (struct tree_decl_common): New bit field
>>  decl_not_flexarray.
>>  * tree.cc (component_ref_size): Reorg by using new utility functions.
>>  (flexible_array_member_p): New function.
>>  (zero_length_array_p): Likewise.
>>  (one_element_array_p): Likewise.
>>  (flexible_array_type_p): Likewise.
>>  * tree.h (DECL_NOT_FLEXARRAY): New flag.
>>  (zero_length_array_p): New function prototype.
>>  (one_element_array_p): Likewise.
>>  (flexible_array_member_p): Likewise.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.dg/strict-flex-array-1.c: New test.
>> ---
>> gcc/c-family/c-attribs.cc  |  47 
>> gcc/c-family/c.opt |   7 ++
>> gcc/c/c-decl.cc|  91 +--
>> gcc/doc/extend.texi|  25 
>> gcc/doc/invoke.texi|  27 -
>> gcc/testsuite/gcc.dg/strict-flex-array-1.c |  31 +
>> gcc/tree-core.h|   5 +-
>> gcc/tree.cc| 130 ++---
>> gcc/tree.h |  16 ++-
>> 9 files changed, 322 insertions(+), 57 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/strict-flex-array-1.c
>> 
>> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-f

[PATCH 2/2] xtensa: Fix conflicting hard regno between indirect sibcall fixups and EH_RETURN_STACKADJ_RTX

2022-07-29 Thread Takayuki 'January June' Suwa via Gcc-patches
The hard register A10 was already allocated for EH_RETURN_STACKADJ_RTX.
(although exception handling and sibling call may not apply at the same time,
 but for safety)

gcc/ChangeLog:

* config/xtensa/xtensa.md: Change hard register number used in
the split patterns for indirect sibling call fixups from 10 to 11,
the last free one for the CALL0 ABI.
---
 gcc/config/xtensa/xtensa.md | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 899ce2755aa..1294aab6c5d 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -25,7 +25,7 @@
   (A7_REG  7)
   (A8_REG  8)
   (A9_REG  9)
-  (A10_REG 10)
+  (A11_REG 11)
 
   (UNSPEC_NOP  2)
   (UNSPEC_PLT  3)
@@ -2295,9 +2295,9 @@
   "reload_completed
&& !TARGET_WINDOWED_ABI && SIBLING_CALL_P (insn)
&& ! call_used_or_fixed_reg_p (REGNO (operands[0]))"
-  [(set (reg:SI A10_REG)
+  [(set (reg:SI A11_REG)
(match_dup 0))
-   (call (mem:SI (reg:SI A10_REG))
+   (call (mem:SI (reg:SI A11_REG))
 (match_dup 1))])
 
 (define_expand "sibcall_value"
@@ -2328,10 +2328,10 @@
   "reload_completed
&& !TARGET_WINDOWED_ABI && SIBLING_CALL_P (insn)
&& ! call_used_or_fixed_reg_p (REGNO (operands[1]))"
-  [(set (reg:SI A10_REG)
+  [(set (reg:SI A11_REG)
(match_dup 1))
(set (match_dup 0)
-   (call (mem:SI (reg:SI A10_REG))
+   (call (mem:SI (reg:SI A11_REG))
  (match_dup 2)))])
 
 (define_insn "entry"
-- 
2.20.1


[PATCH 1/2] xtensa: Add RTX costs for if_then_else

2022-07-29 Thread Takayuki 'January June' Suwa via Gcc-patches
It takes one machine instruction for both condtional branch and move.

gcc/ChangeLog:

* config/xtensa/xtensa.cc (xtensa_rtx_costs):
Add new case for IF_THEN_ELSE.
---
 gcc/config/xtensa/xtensa.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index a851a7ae6b3..6ac879c38fb 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -4273,6 +4273,7 @@ xtensa_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
 
 case ZERO_EXTRACT:
 case ZERO_EXTEND:
+case IF_THEN_ELSE:
   *total = COSTS_N_INSNS (1);
   return true;
 
-- 
2.20.1


[PATCH v3] LoongArch: add addr_global attribute

2022-07-29 Thread Xi Ruoyao via Gcc-patches
Change v2 to v3:
- Disable section anchor for addr_global symbols.
- Use -O2 in test to make sure section anchor is disabled.

--

Background:
https://lore.kernel.org/loongarch/d7670b60-2782-4642-995b-7baa01779...@loongson.cn/T/#e1d47e2fe185f2e2be8fdc0784f0db2f644119379

Question:  Do you have a better name than "addr_global"?

Alternatives:

1. Just use "-mno-explicit-relocs -mla-local-with-abs" for kernel
modules.  It's stupid IMO.
2. Implement a "-maddress-local-with-got" option for GCC and use it for
kernel modules.  It seems too overkill: we might create many unnecessary
GOT entries.
3. For all variables with a section attribute, consider it global.  It
may make sense, but I just checked x86_64 and riscv and they don't do
this.
4. Implement -mcmodel=extreme and use it for kernel modules.  To me
"extreme" seems really too extreme.
5. More hacks in kernel. (Convert relocations against .data..percpu with
objtool?  But objtool is not even implemented for LoongArch yet.)

Note: I'll be mostly AFK in the following week.  My attempt to finish
the kernel support for new relocs before going AFK now failed miserably
:(.

-- >8 --

A linker script and/or a section attribute may locate a local object in
some way unexpected by the code model, leading to a link failure.  This
happens when the Linux kernel loads a module with "local" per-CPU
variables.

Add an attribute to explicitly mark an variable with the address
unlimited by the code model so we would be able to work around such
problems.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_attribute_table):
New attribute table.
(TARGET_ATTRIBUTE_TABLE): Define the target hook.
(loongarch_handle_addr_global_attribute): New static function.
(loongarch_classify_symbol): Return SYMBOL_GOT_DISP for
SYMBOL_REF_DECL with addr_global attribute.
(loongarch_use_anchors_for_symbol_p): New static function.
(TARGET_USE_ANCHORS_FOR_SYMBOL_P): Define the target hook.
* doc/extend.texi (Variable Attributes): Document new
LoongArch specific attribute.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/addr-global.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 61 +++
 gcc/doc/extend.texi   | 17 ++
 .../gcc.target/loongarch/addr-global.c| 28 +
 3 files changed, 106 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/addr-global.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 79687340dfd..db6f84d4e66 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1643,6 +1643,13 @@ loongarch_classify_symbol (const_rtx x)
   && !loongarch_symbol_binds_local_p (x))
 return SYMBOL_GOT_DISP;
 
+  if (SYMBOL_REF_P (x))
+{
+  tree decl = SYMBOL_REF_DECL (x);
+  if (decl && lookup_attribute ("addr_global", DECL_ATTRIBUTES (decl)))
+   return SYMBOL_GOT_DISP;
+}
+
   return SYMBOL_PCREL;
 }
 
@@ -6068,6 +6075,54 @@ loongarch_starting_frame_offset (void)
   return crtl->outgoing_args_size;
 }
 
+static tree
+loongarch_handle_addr_global_attribute (tree *node, tree name, tree, int,
+bool *no_add_attrs)
+{
+  tree decl = *node;
+  if (TREE_CODE (decl) == VAR_DECL)
+{
+  if (DECL_CONTEXT (decl)
+ && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL
+ && !TREE_STATIC (decl))
+   {
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "%qE attribute cannot be specified for local "
+   "variables", name);
+ *no_add_attrs = true;
+   }
+}
+  else
+{
+  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  *no_add_attrs = true;
+}
+  return NULL_TREE;
+}
+
+static const struct attribute_spec loongarch_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "addr_global", 0, 0, true, false, false, false,
+loongarch_handle_addr_global_attribute, NULL },
+  /* The last attribute spec is set to be NULL.  */
+  {}
+};
+
+bool
+loongarch_use_anchors_for_symbol_p (const_rtx symbol)
+{
+  tree decl = SYMBOL_REF_DECL (symbol);
+
+  /* addr_global indicates we don't know how the linker will locate the symbol,
+ so the use of anchor may cause relocation overflow.  */
+  if (decl && lookup_attribute ("addr_global", DECL_ATTRIBUTES (decl)))
+return false;
+
+  return default_use_anchors_for_symbol_p (symbol);
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -6256,6 +6311,12 @@ loongarch_starting_frame_offset (void)
 #undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef  TARGET_ATTRIBUTE_TABLE
+#define TARGET_ATTRIBUTE_TABLE loongarch

Re: [PATCH] openmp-simd-clone: Match shift type

2022-07-29 Thread Andrew Stubbs

On 29/07/2022 16:59, Jakub Jelinek wrote:

Doing the fold_convert seems to be a wasted effort to me.
Can't this be done conditional on whether some change is needed at all
and just using gimple_build_assign with NOP_EXPR, so something like:


I'm just not familiar enough with this stuff to run fold_convert in my 
head with confidence.



  tree shift_cvt_conv = shift_cnt;
  if (!useless_type_conversion_p (TREE_TYPE (mask),
  TREE_TYPE (shift_cnt)))
{
  shift_cnt_conv = make_ssa_name (TREE_TYPE (mask));
  g = gimple_build_assign (shift_cnt_conv, NOP_EXPR, shift_cnt);
  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
}



Your version gives the same output mine does, at least on amdgcn anyway.

Am I OK to commit this version?

Andrew
openmp-simd-clone: Match shift types

Ensure that both parameters to vector shifts use the same mode.  This is most
important for amdgcn where the masks are DImode.

gcc/ChangeLog:

* omp-simd-clone.cc (simd_clone_adjust): Convert shift_cnt to match
the mask type.

Co-authored-by: Jakub Jelinek  

diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 32649bc3f9a..58bd68b129b 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -1305,8 +1305,16 @@ simd_clone_adjust (struct cgraph_node *node)
   build_int_cst (TREE_TYPE (iter1), c));
  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
}
+ tree shift_cnt_conv = shift_cnt;
+ if (!useless_type_conversion_p (TREE_TYPE (mask),
+ TREE_TYPE (shift_cnt)))
+   {
+ shift_cnt_conv = make_ssa_name (TREE_TYPE (mask));
+ g = gimple_build_assign (shift_cnt_conv, NOP_EXPR, shift_cnt);
+ gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
+   }
  g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),
-  RSHIFT_EXPR, mask, shift_cnt);
+  RSHIFT_EXPR, mask, shift_cnt_conv);
  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
  mask = gimple_assign_lhs (g);
  g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),


[PATCH] analyzer: support for creat, dup, dup2 and dup3 in sm-fd.cc [PR106300]

2022-07-29 Thread Immad Mir via Gcc-patches
This patch extends the state machine in sm-fd.cc to support
creat, dup, dup2 and dup3 functions.

Lightly tested on x86_64 Linux.

gcc/analyzer/ChangeLog:
PR analyzer/106300
* sm-fd.cc (fd_state_machine::on_open): Add
creat, dup, dup2 and dup3 functions.
(enum dup): New.
(fd_state_machine::valid_to_unchecked_state): New.
(fd_state_machine::on_creat): New.
(fd_state_machine::on_dup): New.

gcc/testsuite/ChangeLog:
PR analyzer/106300
* gcc.dg/analyzer/fd-1.c: Add tests for 'creat'.
* gcc.dg/analyzer/fd-2.c: Likewise.
* gcc.dg/analyzer/fd-4.c: Likewise.
* gcc.dg/analyzer/fd-6.c: New tests.

Signed-off-by: Immad Mir 
---
 gcc/analyzer/sm-fd.cc| 117 ++-
 gcc/testsuite/gcc.dg/analyzer/fd-1.c |  21 
 gcc/testsuite/gcc.dg/analyzer/fd-2.c |  15 +++
 gcc/testsuite/gcc.dg/analyzer/fd-4.c |  31 -
 gcc/testsuite/gcc.dg/analyzer/fd-6.c | 168 +++
 5 files changed, 350 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/fd-6.c

diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
index ed923ade100..7906034599c 100644
--- a/gcc/analyzer/sm-fd.cc
+++ b/gcc/analyzer/sm-fd.cc
@@ -69,6 +69,13 @@ enum access_directions
   DIRS_WRITE
 };
 
+enum dup
+{
+  DUP_1,
+  DUP_2,
+  DUP_3
+};
+
 class fd_state_machine : public state_machine
 {
 public:
@@ -114,7 +121,9 @@ public:
   bool is_readonly_fd_p (state_t s) const;
   bool is_writeonly_fd_p (state_t s) const;
   enum access_mode get_access_mode_from_flag (int flag) const;
-
+  /* Function for one-to-one correspondence between valid
+ and unchecked states.  */
+  state_t valid_to_unchecked_state (state_t state) const;
   /* State for a constant file descriptor (>= 0) */
   state_t m_constant_fd;
 
@@ -147,6 +156,8 @@ public:
 private:
   void on_open (sm_context *sm_ctxt, const supernode *node, const gimple *stmt,
const gcall *call) const;
+  void on_creat (sm_context *sm_ctxt, const supernode *node, const gimple 
*stmt,
+   const gcall *call) const;
   void on_close (sm_context *sm_ctxt, const supernode *node, const gimple 
*stmt,
 const gcall *call) const;
   void on_read (sm_context *sm_ctxt, const supernode *node, const gimple *stmt,
@@ -170,6 +181,9 @@ private:
   const gimple *stmt, const gcall *call,
   const tree callee_fndecl, const char *attr_name,
   access_directions fd_attr_access_dir) const;
+  void check_for_dup (sm_context *sm_ctxt, const supernode *node,
+   const gimple *stmt, const gcall *call, const tree callee_fndecl,
+   enum dup kind) const;
 };
 
 /* Base diagnostic class relative to fd_state_machine.  */
@@ -723,6 +737,20 @@ fd_state_machine::is_constant_fd_p (state_t state) const
   return (state == m_constant_fd);
 }
 
+fd_state_machine::state_t
+fd_state_machine::valid_to_unchecked_state (state_t state) const
+{
+  if (state == m_valid_read_write)
+return m_unchecked_read_write;
+  else if (state == m_valid_write_only)
+return m_unchecked_write_only;
+  else if (state == m_valid_read_only)
+return m_unchecked_read_only;
+  else
+gcc_unreachable ();
+  return NULL;
+}
+
 bool
 fd_state_machine::on_stmt (sm_context *sm_ctxt, const supernode *node,
   const gimple *stmt) const
@@ -736,6 +764,11 @@ fd_state_machine::on_stmt (sm_context *sm_ctxt, const 
supernode *node,
return true;
  } //  "open"
 
+   if (is_named_call_p (callee_fndecl, "creat", call, 2))
+ {
+   on_creat (sm_ctxt, node, stmt, call);
+ } // "creat"
+
if (is_named_call_p (callee_fndecl, "close", call, 1))
  {
on_close (sm_ctxt, node, stmt, call);
@@ -754,6 +787,23 @@ fd_state_machine::on_stmt (sm_context *sm_ctxt, const 
supernode *node,
return true;
  } // "read"
 
+   if (is_named_call_p (callee_fndecl, "dup", call, 1))
+ {
+   check_for_dup (sm_ctxt, node, stmt, call, callee_fndecl, DUP_1);
+   return true;
+ }
+
+   if (is_named_call_p (callee_fndecl, "dup2", call, 2))
+ {
+   check_for_dup (sm_ctxt, node, stmt, call, callee_fndecl, DUP_2);
+   return true;
+ }
+
+   if (is_named_call_p (callee_fndecl, "dup3", call, 3))
+ {
+   check_for_dup (sm_ctxt, node, stmt, call, callee_fndecl, DUP_3);
+   return true;
+ }
 
{
  // Handle __attribute__((fd_arg))
@@ -899,6 +949,71 @@ fd_state_machine::on_open (sm_context *sm_ctxt, const 
supernode *node,
 }
 }
 
+void
+fd_state_machine::on_creat (sm_context *sm_ctxt, const supernode *node,
+   const gimple *stmt, const gcall *call) const
+{
+  tree lhs = gimple_call_lhs (call);
+  if (lhs)
+sm_ctxt->on_transition (node, stmt, lhs, m_start, m_unc

Re: [PATCH] openmp-simd-clone: Match shift type

2022-07-29 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 29, 2022 at 04:53:51PM +0100, Andrew Stubbs wrote:
> This patch adjusts the generation of SIMD "inbranch" clones that use integer
> masks to ensure that it vectorizes on amdgcn.
> 
> The problem was only that an amdgcn mask is DImode and the shift amount was
> SImode, and the difference causes vectorization to fail.
> 
> OK for mainline?
> 
> Andrew

> openmp-simd-clone: Match shift types
> 
> Ensure that both parameters to vector shifts use the same mode.  This is most
> important for amdgcn where the masks are DImode.
> 
> gcc/ChangeLog:
> 
>   * omp-simd-clone.cc (simd_clone_adjust): Convert shift_cnt to match
>   the mask type.
> 
> diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
> index 32649bc3f9a..5d3a90730e7 100644
> --- a/gcc/omp-simd-clone.cc
> +++ b/gcc/omp-simd-clone.cc
> @@ -1305,8 +1305,12 @@ simd_clone_adjust (struct cgraph_node *node)
>  build_int_cst (TREE_TYPE (iter1), c));
> gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
>   }
> +   tree shift_cnt_conv = make_ssa_name (TREE_TYPE (mask));
> +   g = gimple_build_assign (shift_cnt_conv,
> +fold_convert (TREE_TYPE (mask), shift_cnt));
> +   gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);

Doing the fold_convert seems to be a wasted effort to me.
Can't this be done conditional on whether some change is needed at all
and just using gimple_build_assign with NOP_EXPR, so something like:
  tree shift_cvt_conv = shift_cnt;
  if (!useless_type_conversion_p (TREE_TYPE (mask),
  TREE_TYPE (shift_cnt)))
{
  shift_cnt_conv = make_ssa_name (TREE_TYPE (mask));
  g = gimple_build_assign (shift_cnt_conv, NOP_EXPR, shift_cnt);
  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
}

> g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),
> -RSHIFT_EXPR, mask, shift_cnt);
> +RSHIFT_EXPR, mask, shift_cnt_conv);
> gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
> mask = gimple_assign_lhs (g);
> g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),

?

Jakub



[PATCH] openmp-simd-clone: Match shift type

2022-07-29 Thread Andrew Stubbs
This patch adjusts the generation of SIMD "inbranch" clones that use 
integer masks to ensure that it vectorizes on amdgcn.


The problem was only that an amdgcn mask is DImode and the shift amount 
was SImode, and the difference causes vectorization to fail.


OK for mainline?

Andrewopenmp-simd-clone: Match shift types

Ensure that both parameters to vector shifts use the same mode.  This is most
important for amdgcn where the masks are DImode.

gcc/ChangeLog:

* omp-simd-clone.cc (simd_clone_adjust): Convert shift_cnt to match
the mask type.

diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 32649bc3f9a..5d3a90730e7 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -1305,8 +1305,12 @@ simd_clone_adjust (struct cgraph_node *node)
   build_int_cst (TREE_TYPE (iter1), c));
  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
}
+ tree shift_cnt_conv = make_ssa_name (TREE_TYPE (mask));
+ g = gimple_build_assign (shift_cnt_conv,
+  fold_convert (TREE_TYPE (mask), shift_cnt));
+ gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
  g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),
-  RSHIFT_EXPR, mask, shift_cnt);
+  RSHIFT_EXPR, mask, shift_cnt_conv);
  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
  mask = gimple_assign_lhs (g);
  g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),


[PATCH v2] LoongArch: add addr_global attribute

2022-07-29 Thread Xi Ruoyao via Gcc-patches
Don't look at V1 patch: I selected wrong file and sent a draft with bugs
mistakenly.

Background:
https://lore.kernel.org/loongarch/d7670b60-2782-4642-995b-7baa01779...@loongson.cn/T/#e1d47e2fe185f2e2be8fdc0784f0db2f644119379

Question:  Do you have a better name than "addr_global"?

Alternatives:

1. Just use "-mno-explicit-relocs -mla-local-with-abs" for kernel
modules.  It's stupid IMO.
2. Implement a "-maddress-local-with-got" option for GCC and use it for
kernel modules.  It seems too overkill: we might create many unnecessary
GOT entries.
3. For all variables with a section attribute, consider it global.  It
may make sense, but I just checked x86_64 and riscv and they don't do
this.
4. Implement -mcmodel=extreme and use it for kernel modules.  To me
"extreme" seems really too extreme.
5. More hacks in kernel. (Convert relocations against .data..percpu with
objtool?  But objtool is not even implemented for LoongArch yet.)

Note: I'll be mostly AFK in the following week.  My attempt to finish
the kernel support for new relocs before going AFK now failed miserably
:(.

-- >8 --

A linker script and/or a section attribute may locate a local object in
some way unexpected by the code model, leading to a link failure.  This
happens when the Linux kernel loads a module with "local" per-CPU
variables.

Add an attribute to explicitly mark an variable with the address
unlimited by the code model so we would be able to work around such
problems.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_attribute_table):
New attribute table.
(TARGET_ATTRIBUTE_TABLE): Define the target hook.
(loongarch_handle_addr_global_attribute): New static function.
(loongarch_classify_symbol): Return SYMBOL_GOT_DISP for
SYMBOL_REF_DECL with addr_global attribute.
* doc/extend.texi (Variable Attributes): Document new
LoongArch specific attribute.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/addr-global.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 45 +++
 gcc/doc/extend.texi   | 17 +++
 .../gcc.target/loongarch/addr-global.c| 28 
 3 files changed, 90 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/addr-global.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 79687340dfd..cdf6bf44e36 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1643,6 +1643,13 @@ loongarch_classify_symbol (const_rtx x)
   && !loongarch_symbol_binds_local_p (x))
 return SYMBOL_GOT_DISP;
 
+  if (SYMBOL_REF_P (x))
+{
+  tree decl = SYMBOL_REF_DECL (x);
+  if (decl && lookup_attribute ("addr_global", DECL_ATTRIBUTES (decl)))
+   return SYMBOL_GOT_DISP;
+}
+
   return SYMBOL_PCREL;
 }
 
@@ -6068,6 +6075,41 @@ loongarch_starting_frame_offset (void)
   return crtl->outgoing_args_size;
 }
 
+static tree
+loongarch_handle_addr_global_attribute (tree *node, tree name, tree, int,
+bool *no_add_attrs)
+{
+  tree decl = *node;
+  if (TREE_CODE (decl) == VAR_DECL)
+{
+  if (DECL_CONTEXT (decl)
+ && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL
+ && !TREE_STATIC (decl))
+   {
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "%qE attribute cannot be specified for local "
+   "variables", name);
+ *no_add_attrs = true;
+   }
+}
+  else
+{
+  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  *no_add_attrs = true;
+}
+  return NULL_TREE;
+}
+
+static const struct attribute_spec loongarch_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "addr_global", 0, 0, true, false, false, false,
+loongarch_handle_addr_global_attribute, NULL },
+  /* The last attribute spec is set to be NULL.  */
+  {}
+};
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -6256,6 +6298,9 @@ loongarch_starting_frame_offset (void)
 #undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef  TARGET_ATTRIBUTE_TABLE
+#define TARGET_ATTRIBUTE_TABLE loongarch_attribute_table
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-loongarch.h"
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7fe7f8817cd..4a1cd7ab5e4 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7314,6 +7314,7 @@ attributes.
 * Blackfin Variable Attributes::
 * H8/300 Variable Attributes::
 * IA-64 Variable Attributes::
+* LoongArch Variable Attributes::
 * M32R/D Variable Attributes::
 * MeP Variable Attributes::
 * Microsoft Windows Variable Attributes::
@@ -8098,6 +8099,22 @@ defined by shared libraries.
 
 @end table
 

Re: [GCC13][Patch][V2][2/2]Add a new option -fstrict-flex-array[=n] and attribute strict_flex_array(n) and use it in PR101836

2022-07-29 Thread Qing Zhao via Gcc-patches


> On Jul 28, 2022, at 3:28 AM, Richard Biener  wrote:
> 
> On Tue, 19 Jul 2022, Qing Zhao wrote:
> 
>> From a09f39ded462611286a44d9e8273de8342673ba2 Mon Sep 17 00:00:00 2001
>> From: Qing Zhao 
>> Date: Mon, 18 Jul 2022 18:12:26 +
>> Subject: [PATCH 2/2] Use new flag DECL_NOT_FLEXARRAY in __builtin_object_size
>> [PR101836]
>> 
>> Use new flag DECL_NOT_FLEXARRAY to determine whether the trailing array
>> of a structure is flexible array member in __builtin_object_size.
>> 
>> gcc/ChangeLog:
>> 
>>  PR tree-optimization/101836
>>  * tree-object-size.cc (addr_object_size): Use array_at_struct_end_p
>>  and DECL_NOT_FLEXARRAY to determine a flexible array member reference.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  PR tree-optimization/101836
>>  * gcc.dg/pr101836.c: New test.
>>  * gcc.dg/pr101836_1.c: New test.
>>  * gcc.dg/pr101836_2.c: New test.
>>  * gcc.dg/pr101836_3.c: New test.
>>  * gcc.dg/pr101836_4.c: New test.
>>  * gcc.dg/pr101836_5.c: New test.
>>  * gcc.dg/strict-flex-array-2.c: New test.
>>  * gcc.dg/strict-flex-array-3.c: New test.
>> ---
>> gcc/testsuite/gcc.dg/pr101836.c| 60 ++
>> gcc/testsuite/gcc.dg/pr101836_1.c  | 60 ++
>> gcc/testsuite/gcc.dg/pr101836_2.c  | 60 ++
>> gcc/testsuite/gcc.dg/pr101836_3.c  | 60 ++
>> gcc/testsuite/gcc.dg/pr101836_4.c  | 60 ++
>> gcc/testsuite/gcc.dg/pr101836_5.c  | 60 ++
>> gcc/testsuite/gcc.dg/strict-flex-array-2.c | 60 ++
>> gcc/testsuite/gcc.dg/strict-flex-array-3.c | 60 ++
>> gcc/tree-object-size.cc| 18 +++
>> 9 files changed, 489 insertions(+), 9 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/pr101836.c
>> create mode 100644 gcc/testsuite/gcc.dg/pr101836_1.c
>> create mode 100644 gcc/testsuite/gcc.dg/pr101836_2.c
>> create mode 100644 gcc/testsuite/gcc.dg/pr101836_3.c
>> create mode 100644 gcc/testsuite/gcc.dg/pr101836_4.c
>> create mode 100644 gcc/testsuite/gcc.dg/pr101836_5.c
>> create mode 100644 gcc/testsuite/gcc.dg/strict-flex-array-2.c
>> create mode 100644 gcc/testsuite/gcc.dg/strict-flex-array-3.c
>> 
>> diff --git a/gcc/testsuite/gcc.dg/pr101836.c 
>> b/gcc/testsuite/gcc.dg/pr101836.c
>> new file mode 100644
>> index 000..e5b4e5160a4
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr101836.c
>> @@ -0,0 +1,60 @@
>> +/* -fstrict-flex-array is aliased with -ftrict-flex-array=3, which is the
>> +   strictest, only [] is treated as flexible array.  */ 
>> +/* PR tree-optimization/101836 */
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -fstrict-flex-array" } */
>> +
>> +#include 
>> +
>> +#define expect(p, _v) do { \
>> +size_t v = _v; \
>> +if (p == v) \
>> +printf("ok:  %s == %zd\n", #p, p); \
>> +else \
>> +{  \
>> +  printf("WAT: %s == %zd (expected %zd)\n", #p, p, v); \
>> +  __builtin_abort (); \
>> +} \
>> +} while (0);
>> +
>> +struct trailing_array_1 {
>> +int a;
>> +int b;
>> +int c[4];
>> +};
>> +
>> +struct trailing_array_2 {
>> +int a;
>> +int b;
>> +int c[1];
>> +};
>> +
>> +struct trailing_array_3 {
>> +int a;
>> +int b;
>> +int c[0];
>> +};
>> +struct trailing_array_4 {
>> +int a;
>> +int b;
>> +int c[];
>> +};
>> +
>> +void __attribute__((__noinline__)) stuff(
>> +struct trailing_array_1 *normal,
>> +struct trailing_array_2 *trailing_1,
>> +struct trailing_array_3 *trailing_0,
>> +struct trailing_array_4 *trailing_flex)
>> +{
>> +expect(__builtin_object_size(normal->c, 1), 16);
>> +expect(__builtin_object_size(trailing_1->c, 1), 4);
>> +expect(__builtin_object_size(trailing_0->c, 1), 0);
>> +expect(__builtin_object_size(trailing_flex->c, 1), -1);
>> +}
>> +
>> +int main(int argc, char *argv[])
>> +{
>> +stuff((void *)argv[0], (void *)argv[0], (void *)argv[0], (void 
>> *)argv[0]);
>> +
>> +return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/pr101836_1.c 
>> b/gcc/testsuite/gcc.dg/pr101836_1.c
>> new file mode 100644
>> index 000..30ea20427a5
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr101836_1.c
>> @@ -0,0 +1,60 @@
>> +/* -fstrict-flex-array=3 is the strictest, only [] is treated as
>> +   flexible array.  */ 
>> +/* PR tree-optimization/101836 */
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -fstrict-flex-array=3" } */
>> +
>> +#include 
>> +
>> +#define expect(p, _v) do { \
>> +size_t v = _v; \
>> +if (p == v) \
>> +printf("ok:  %s == %zd\n", #p, p); \
>> +else \
>> +{  \
>> +  printf("WAT: %s == %zd (expected %zd)\n", #p, p, v); \
>> +  __builtin_abort (); \
>> +} \
>> +} while (0);
>> +
>> +struct trailing_array_1 {
>> +int a;
>> +int b;
>> +int c[4];
>> +};
>> +
>> +struct trailing_array_2 {
>> +int a;
>> +int b;
>> +int c[

[PATCH] LoongArch: add addr_global attribute

2022-07-29 Thread Xi Ruoyao via Gcc-patches
Background:
https://lore.kernel.org/loongarch/d7670b60-2782-4642-995b-7baa01779...@loongson.cn/T/#e1d47e2fe185f2e2be8fdc0784f0db2f644119379

Question:  Do you have a better name than "addr_global"?

Alternatives:

1. Just use "-mno-explicit-relocs -mla-local-with-abs" for kernel
modules.  It's stupid IMO.
2. Implement a "-maddress-local-with-got" option for GCC and use it for
kernel modules.  It seems too overkill: we might create many unnecessary
GOT entries.
3. For all variables with a section attribute, consider it global.  It
may make sense, but I just checked x86_64 and riscv and they don't do
this.
4. Implement -mcmodel=extreme and use it for kernel modules.  To me
"extreme" seems really too extreme.
5. More hacks in kernel. (Convert relocations against .data..percpu with
objtool?  But objtool is not even implemented for LoongArch yet.)

Note: I'll be mostly AFK in the following week.  My attempt to finish
the kernel support for new relocs before going AFK now failed miserably
:(.

-- >8 --

A linker script and/or a section attribute may locate a local object in
some way unexpected by the code model, leading to a link failure.  This
happens when the Linux kernel loads a module with "local" per-CPU
variables.

Add an attribute to explicitly mark an variable with the address
unlimited by the code model so we would be able to work around such
problems.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_attribute_table):
New attribute table.
(TARGET_ATTRIBUTE_TABLE): Define the target hook.
(loongarch_handle_addr_global_attribute): New static function.
(loongarch_classify_symbol): Return SYMBOL_GOT_DISP for
SYMBOL_REF_DECL with addr_global attribute.
* doc/extend.texi (Variable Attributes): Document new
LoongArch specific attribute.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/addr-global.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 43 +++
 gcc/doc/extend.texi   | 17 
 .../gcc.target/loongarch/addr-global.c| 28 
 3 files changed, 88 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/addr-global.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 79687340dfd..5f4e6114fc9 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1643,6 +1643,13 @@ loongarch_classify_symbol (const_rtx x)
   && !loongarch_symbol_binds_local_p (x))
 return SYMBOL_GOT_DISP;
 
+  if (SYMBOL_REF_P (x))
+{
+  tree decl = SYMBOL_REF_DECL (x);
+  if (decl && lookup_attribute ("addr_global", DECL_ATTRIBUTES (decl)))
+   return SYMBOL_GOT_DISP;
+}
+
   return SYMBOL_PCREL;
 }
 
@@ -6068,6 +6075,39 @@ loongarch_starting_frame_offset (void)
   return crtl->outgoing_args_size;
 }
 
+static tree
+loongarch_handle_addr_global_attribute (tree *node, tree name, tree, int,
+bool *no_add_attrs)
+{
+  tree decl = *node;
+  if (TREE_CODE (decl) == VAR_DECL)
+{
+  if (DECL_CONTEXT (decl)
+ && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL
+ && !TREE_STATIC (decl))
+   {
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "%qE attribute cannot be specified for local "
+   "variables", name);
+ *no_add_attrs = true;
+   }
+}
+  else
+{
+  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  *no_add_attrs = true;
+}
+  return NULL_TREE;
+}
+
+static const struct attribute_spec loongarch_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "addr_global", 0, 0, true, false, false, false,
+loongarch_handle_addr_global_attribute, NULL }
+};
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -6256,6 +6296,9 @@ loongarch_starting_frame_offset (void)
 #undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef  TARGET_ATTRIBUTE_TABLE
+#define TARGET_ATTRIBUTE_TABLE loongarch_attribute_table
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-loongarch.h"
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7fe7f8817cd..4a1cd7ab5e4 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7314,6 +7314,7 @@ attributes.
 * Blackfin Variable Attributes::
 * H8/300 Variable Attributes::
 * IA-64 Variable Attributes::
+* LoongArch Variable Attributes::
 * M32R/D Variable Attributes::
 * MeP Variable Attributes::
 * Microsoft Windows Variable Attributes::
@@ -8098,6 +8099,22 @@ defined by shared libraries.
 
 @end table
 
+@node LoongArch Variable Attributes
+@subsection LoongArch Variable Attributes
+
+One attribute is currently defined for the LoongArch.
+
+@tab

[PATCH v2] cselib: add function to check if SET is redundant [PR106187]

2022-07-29 Thread Richard Earnshaw via Gcc-patches
A SET operation that writes memory may have the same value as an earlier 
store but if the alias sets of the new and earlier store do not conflict 
then the set is not truly redundant.  This can happen, for example, if 
objects of different types share a stack slot.


To fix this we define a new function in cselib that first checks for
equality and if that is successful then finds the earlier store in the
value history and checks the alias sets.

The routine is used in two places elsewhere in the compiler.  Firstly
in cfgcleanup and secondly in postreload.

gcc/ChangeLog:
* alias.h (mems_same_for_tbaa_p): Declare.
* alias.cc (mems_same_for_tbaa_p): New function.
* dse.cc (record_store): Use it instead of open-coding
alias check.
* cselib.h (cselib_redundant_set_p): Declare.
* cselib.cc: Include alias.h
(cselib_redundant_set_p): New function.
* cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead
of rtx_equal_for_cselib_p.
* postreload.c (reload_cse_simplify): Use cselib_redundant_set_p.
(reload_cse_noop_set_p): Delete.diff --git a/gcc/alias.cc b/gcc/alias.cc
index 8c08452e0ac..d54feb15268 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -389,6 +389,20 @@ refs_same_for_tbaa_p (tree earlier, tree later)
 	  || alias_set_subset_of (later_base_set, earlier_base_set));
 }
 
+/* Similar to refs_same_for_tbaa_p() but for use on MEM rtxs.  */
+bool
+mems_same_for_tbaa_p (rtx earlier, rtx later)
+{
+  gcc_assert (MEM_P (earlier));
+  gcc_assert (MEM_P (later));
+
+  return ((MEM_ALIAS_SET (earlier) == MEM_ALIAS_SET (later)
+	   || alias_set_subset_of (MEM_ALIAS_SET (later),
+   MEM_ALIAS_SET (earlier)))
+	  && (!MEM_EXPR (earlier)
+	  || refs_same_for_tbaa_p (MEM_EXPR (earlier), MEM_EXPR (later;
+}
+
 /* Returns a pointer to the alias set entry for ALIAS_SET, if there is
such an entry, or NULL otherwise.  */
 
diff --git a/gcc/alias.h b/gcc/alias.h
index b2596518ac9..ee3db466763 100644
--- a/gcc/alias.h
+++ b/gcc/alias.h
@@ -40,6 +40,7 @@ tree reference_alias_ptr_type_1 (tree *);
 bool alias_ptr_types_compatible_p (tree, tree);
 int compare_base_decls (tree, tree);
 bool refs_same_for_tbaa_p (tree, tree);
+bool mems_same_for_tbaa_p (rtx, rtx);
 
 /* This alias set can be used to force a memory to conflict with all
other memories, creating a barrier across which no memory reference
diff --git a/gcc/cfgcleanup.cc b/gcc/cfgcleanup.cc
index 18047da7b24..a8b0139bb4d 100644
--- a/gcc/cfgcleanup.cc
+++ b/gcc/cfgcleanup.cc
@@ -208,7 +208,7 @@ mark_effect (rtx exp, regset nonequal)
   return false;
 
 case SET:
-  if (rtx_equal_for_cselib_p (SET_DEST (exp), SET_SRC (exp)))
+  if (cselib_redundant_set_p (exp))
 	return false;
   dest = SET_DEST (exp);
   if (dest == pc_rtx)
diff --git a/gcc/cselib.cc b/gcc/cselib.cc
index 6769beeeaf8..6a5609786fa 100644
--- a/gcc/cselib.cc
+++ b/gcc/cselib.cc
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "cselib.h"
 #include "function-abi.h"
+#include "alias.h"
 
 /* A list of cselib_val structures.  */
 struct elt_list
@@ -1157,6 +1158,75 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, machine_mode memmode, int depth)
   return 1;
 }
 
+/* Wrapper for rtx_equal_for_cselib_p to determine whether a SET is
+   truly redundant, taking into account aliasing information.  */
+bool
+cselib_redundant_set_p (rtx set)
+{
+  gcc_assert (GET_CODE (set) == SET);
+  rtx dest = SET_DEST (set);
+  if (cselib_reg_set_mode (dest) != GET_MODE (dest))
+return false;
+
+  if (!rtx_equal_for_cselib_p (dest, SET_SRC (set)))
+return false;
+
+  while (GET_CODE (dest) == SUBREG
+	 || GET_CODE (dest) == ZERO_EXTRACT
+	 || GET_CODE (dest) == STRICT_LOW_PART)
+dest = XEXP (dest, 0);
+
+  if (!flag_strict_aliasing || !MEM_P (dest))
+return true;
+
+  /* For a store we need to check that suppressing it will not change
+ the effective alias set.  */
+  rtx dest_addr = XEXP (dest, 0);
+
+  /* Lookup the equivalents to the original dest (rather than just the
+ MEM).  */
+  cselib_val *src_val = cselib_lookup (SET_DEST (set),
+   GET_MODE (SET_DEST (set)),
+   0, VOIDmode);
+
+  if (src_val)
+{
+  /* Walk the list of source equivalents to find the MEM accessing
+	 the same location.  */
+  for (elt_loc_list *l = src_val->locs; l; l = l->next)
+	{
+	  rtx src_equiv = l->loc;
+	  while (GET_CODE (src_equiv) == SUBREG
+		 || GET_CODE (src_equiv) == ZERO_EXTRACT
+		 || GET_CODE (src_equiv) == STRICT_LOW_PART)
+	src_equiv = XEXP (src_equiv, 0);
+
+	  if (MEM_P (src_equiv))
+	{
+	  /* Match the MEMs by comparing the addresses.  We can
+		 only remove the later store if the earlier aliases at
+		 least all the accesses of the later one.  */
+	  if (rtx_equal_for_cselib_1 (dest_addr, XEXP (src_equiv, 0),
+	  GET_MODE (dest), 0))
+		return mems_same_for_tbaa_p (src_equiv, d

Progress of the new linking implementation

2022-07-29 Thread Gaius Mulley via Gcc-patches


Hello,

the non shared library linking is complete and the gm2 driver has been
rewritten using the fortran/c++ code base.  Once the shared library
scaffold is complete the focus will be on tidying up and
platform/architecture testing.

All 11.7k tests pass on amd64 and aarch64 debian

regards,
Gaius


Re: [PATCH] tree-optimization/105679 - disable backward threading of unlikely entry

2022-07-29 Thread Richard Biener via Gcc-patches
On Fri, 29 Jul 2022, Aldy Hernandez wrote:

> On Fri, Jul 29, 2022 at 11:02 AM Richard Biener  wrote:
> >
> > The following makes the backward threader reject threads whose entry
> > edge is probably never executed according to the profile.  That in
> > particular, for the testcase, avoids threading the irq == 1 check
> > on the path where irq > 31, thereby avoiding spurious -Warray-bounds
> > diagnostics
> >
> >   if (irq_1(D) > 31)
> > goto ; [0.00%]
> >   else
> > goto ; [100.00%]
> >
> > ;;   basic block 3, loop depth 0, count 0 (precise), probably never executed
> >   _2 = (unsigned long) irq_1(D);
> >   __builtin___ubsan_handle_shift_out_of_bounds (&*.Lubsan_data0, 1, _2);
> >
> >   _3 = 1 << irq_1(D);
> >   mask_4 = (u32) _3;
> >   entry = instance_5(D)->array[irq_1(D)];
> >   capture (mask_4);
> >   if (level_6(D) != 0)
> > goto ; [34.00%]
> >   else
> > goto ; [66.00%]
> >
> > ;;   basic block 5, loop depth 0, count 708669600 (estimated locally), 
> > maybe hot  if (irq_1(D) == 1)
> > goto ; [20.97%]
> >   else
> > goto ; [79.03%]
> >
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >
> > The testcase in the PR requries both ubsan and sancov so I'm not sure
> > where to put it but IIRC there were quite some duplicate PRs wrt
> > threading unlikely paths exposing diagnostics, eventually some
> > testcase will come out of those (when we identify them).  Note
> > the patch is quite conservative in only disabling likely never
> > executed paths rather than requiring maybe_hot_edge_p (OTOH those
> > are somewhat similar in the end).
> 
> Sounds reasonable, if for no other reason than to quiet down the
> annoyingly large amount of false positives we have because of
> aggressive threading, or better ranges as a whole.
> 
> How does this fit in with Jeff's original idea of isolating known
> problematic paths (null dereferences?), which in theory are also
> unlikely?

If profile estimation gets to see those the paths should turn unlikely.
Path isolation should also make sure to adjust the profile accordingly,
though local profile adjustments are somewhat difficult.

Richard.

> Thanks for doing this.
> Aldy
> 
> >
> > I'm going to push it when testing finishes but maybe there are some
> > testcases to adjust.
> >
> > PR tree-optimization/105679
> > * tree-ssa-threadbackwards.cc
> > (back_threader_profitability::profitable_path_p): Avoid threading
> > when the entry edge is probably never executed.
> > ---
> >  gcc/tree-ssa-threadbackward.cc | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> > index 3519aca84cd..90f5331c265 100644
> > --- a/gcc/tree-ssa-threadbackward.cc
> > +++ b/gcc/tree-ssa-threadbackward.cc
> > @@ -777,6 +777,15 @@ back_threader_profitability::profitable_path_p (const 
> > vec &m_path,
> >  "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
> >   return false;
> > }
> > +  edge entry = find_edge (m_path[m_path.length () - 1],
> > + m_path[m_path.length () - 2]);
> > +  if (probably_never_executed_edge_p (cfun, entry))
> > +   {
> > + if (dump_file && (dump_flags & TDF_DETAILS))
> > +   fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> > +"path entry is probably never executed.\n");
> > + return false;
> > +   }
> >  }
> >else if (!m_speed_p && n_insns > 1)
> >  {
> > --
> > 2.35.3
> >
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[committed] amdgcn: 64-bit vector shifts

2022-07-29 Thread Andrew Stubbs
I've committed this patch to implement V64DImode vector-vector and 
vector-scalar shifts.


In particular, these are used by the SIMD "inbranch" clones that I'm 
working on right now, but it's an omission that ought to have been fixed 
anyway.


Andrewamdgcn: 64-bit vector shifts

Enable 64-bit vector-vector and vector-scalar shifts.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (V_INT_noHI): New iterator.
(3): Use V_INT_noHI.
(v3): Likewise.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index abe46201344..8c33ae0c717 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -60,6 +60,8 @@ (define_mode_iterator V_noHI
 
 (define_mode_iterator V_INT_noQI
  [V64HI V64SI V64DI])
+(define_mode_iterator V_INT_noHI
+ [V64SI V64DI])
 
 ; All of above
 (define_mode_iterator V_ALL
@@ -2086,10 +2088,10 @@ (define_expand "3"
   })
 
 (define_insn "3"
-  [(set (match_operand:V_SI 0 "register_operand"  "= v")
-   (shiftop:V_SI
- (match_operand:V_SI 1 "gcn_alu_operand" "  v")
- (vec_duplicate:V_SI
+  [(set (match_operand:V_INT_noHI 0 "register_operand"  "= v")
+   (shiftop:V_INT_noHI
+ (match_operand:V_INT_noHI 1 "gcn_alu_operand" "  v")
+ (vec_duplicate:
(match_operand:SI 2 "gcn_alu_operand"  "SvB"]
   ""
   "v_0\t%0, %2, %1"
@@ -2117,10 +2119,10 @@ (define_expand "v3"
   })
 
 (define_insn "v3"
-  [(set (match_operand:V_SI 0 "register_operand"  "=v")
-   (shiftop:V_SI
- (match_operand:V_SI 1 "gcn_alu_operand" " v")
- (match_operand:V_SI 2 "gcn_alu_operand" "vB")))]
+  [(set (match_operand:V_INT_noHI 0 "register_operand"  "=v")
+   (shiftop:V_INT_noHI
+ (match_operand:V_INT_noHI 1 "gcn_alu_operand" " v")
+ (match_operand: 2 "gcn_alu_operand" "vB")))]
   ""
   "v_0\t%0, %2, %1"
   [(set_attr "type" "vop2")


[committed] amdgcn: 64-bit not

2022-07-29 Thread Andrew Stubbs

I've committed this patch to enable DImode one's-complement on amdgcn.

The hardware doesn't have 64-bit not, and this isn't needed by expand 
which is happy to use two SImode operations, but the vectorizer isn't so 
clever. Vector condition masks are DImode on amdgcn, so this has been 
causing lots of conditional code to fail to vectorize.


Andrewamdgcn: 64-bit not

This makes the auto-vectorizer happier when handling masks.

gcc/ChangeLog:

* config/gcn/gcn.md (one_cmpldi2): New.

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 033c1708e88..70a769babc4 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -1676,6 +1676,26 @@ (define_expand "si3_scc"
 ;; }}}
 ;; {{{ ALU: generic 64-bit
 
+(define_insn_and_split "one_cmpldi2"
+  [(set (match_operand:DI 0 "register_operand""=Sg,v")
+   (not:DI (match_operand:DI 1 "gcn_alu_operand" "SgA,vSvDB")))
+   (clobber (match_scratch:BI 2  "=cs,X"))]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 3) (not:SI (match_dup 4)))
+ (clobber (match_dup 2))])
+   (parallel [(set (match_dup 5) (not:SI (match_dup 6)))
+ (clobber (match_dup 2))])]
+  {
+operands[3] = gcn_operand_part (DImode, operands[0], 0);
+operands[4] = gcn_operand_part (DImode, operands[1], 0);
+operands[5] = gcn_operand_part (DImode, operands[0], 1);
+operands[6] = gcn_operand_part (DImode, operands[1], 1);
+  }
+  [(set_attr "type" "mult")]
+)
+
 (define_code_iterator vec_and_scalar64_com [and ior xor])
 
 (define_insn_and_split "di3"


Re: [PATCH] tree-optimization/105679 - disable backward threading of unlikely entry

2022-07-29 Thread Aldy Hernandez via Gcc-patches
On Fri, Jul 29, 2022 at 11:02 AM Richard Biener  wrote:
>
> The following makes the backward threader reject threads whose entry
> edge is probably never executed according to the profile.  That in
> particular, for the testcase, avoids threading the irq == 1 check
> on the path where irq > 31, thereby avoiding spurious -Warray-bounds
> diagnostics
>
>   if (irq_1(D) > 31)
> goto ; [0.00%]
>   else
> goto ; [100.00%]
>
> ;;   basic block 3, loop depth 0, count 0 (precise), probably never executed
>   _2 = (unsigned long) irq_1(D);
>   __builtin___ubsan_handle_shift_out_of_bounds (&*.Lubsan_data0, 1, _2);
>
>   _3 = 1 << irq_1(D);
>   mask_4 = (u32) _3;
>   entry = instance_5(D)->array[irq_1(D)];
>   capture (mask_4);
>   if (level_6(D) != 0)
> goto ; [34.00%]
>   else
> goto ; [66.00%]
>
> ;;   basic block 5, loop depth 0, count 708669600 (estimated locally), maybe 
> hot  if (irq_1(D) == 1)
> goto ; [20.97%]
>   else
> goto ; [79.03%]
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> The testcase in the PR requries both ubsan and sancov so I'm not sure
> where to put it but IIRC there were quite some duplicate PRs wrt
> threading unlikely paths exposing diagnostics, eventually some
> testcase will come out of those (when we identify them).  Note
> the patch is quite conservative in only disabling likely never
> executed paths rather than requiring maybe_hot_edge_p (OTOH those
> are somewhat similar in the end).

Sounds reasonable, if for no other reason than to quiet down the
annoyingly large amount of false positives we have because of
aggressive threading, or better ranges as a whole.

How does this fit in with Jeff's original idea of isolating known
problematic paths (null dereferences?), which in theory are also
unlikely?

Thanks for doing this.
Aldy

>
> I'm going to push it when testing finishes but maybe there are some
> testcases to adjust.
>
> PR tree-optimization/105679
> * tree-ssa-threadbackwards.cc
> (back_threader_profitability::profitable_path_p): Avoid threading
> when the entry edge is probably never executed.
> ---
>  gcc/tree-ssa-threadbackward.cc | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> index 3519aca84cd..90f5331c265 100644
> --- a/gcc/tree-ssa-threadbackward.cc
> +++ b/gcc/tree-ssa-threadbackward.cc
> @@ -777,6 +777,15 @@ back_threader_profitability::profitable_path_p (const 
> vec &m_path,
>  "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
>   return false;
> }
> +  edge entry = find_edge (m_path[m_path.length () - 1],
> + m_path[m_path.length () - 2]);
> +  if (probably_never_executed_edge_p (cfun, entry))
> +   {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> +   fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> +"path entry is probably never executed.\n");
> + return false;
> +   }
>  }
>else if (!m_speed_p && n_insns > 1)
>  {
> --
> 2.35.3
>



Re: [patch] libgompd: Add thread handles

2022-07-29 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 04, 2022 at 10:34:03PM +0200, Ahmed Sayed Mousse wrote:
> *This patch is the initial implementation of OpenMP-API specs book section
> **20.5.5 with title "Thread Handles".*

Sorry for the delay, have been on vacation.

> *I have fixed the first version after revising the notes on it.*
> 
> *libgomp/ChangeLog
> 
> 2022-07-01  Ahmed Sayed   >
> *
> 
> ** Makefile.am (libgompd_la_SOURCES): Add ompd-threads.c.*
> 
> ** Makefile.in: Regenerate.*
> 
> ** team.c ( gomp_free_thread ): Called ompd_bp_thread_end ().*
> 
> ** ompd-support.c ( gompd_thread_initial_tls_bias ): New Variable.*
> 
> *   (gompd_load): Initialize gompd_thread_initial_tls_bias.*
> 
> ** ompd-threads.c: New File.*

The ChangeLog formatting is wrong, so wouldn't go through the commit
hook checking.  Unclear what part of it is just a fault of your mailer
setting and what is really wrong.  But
There should be just > after gmail.com, not another email address,
all the non-empty lines should be indented by a single tab,
there shouldn't be an extra * at the start of end of lines.
There shouldn't be empty lines in between the different changes, just
between the date/name/email line and the ret.
There shouldn't be spaces after ( or before ).
Instead of Called ompd_bp_thread_end (). say just Call ompd_bp_thread_end.
New variable. rather than New Variable.
The line with (gompd_load) is weirdly extra indented.
New file. rather than New File.

> diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
> index 6d913a93e7f..23f5bede1bf 100644
> --- a/libgomp/Makefile.am
> +++ b/libgomp/Makefile.am
> @@ -94,7 +94,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
> env.c error.c \
>   priority_queue.c affinity-fmt.c teams.c allocator.c oacc-profiling.c \
>   oacc-target.c ompd-support.c
>  
> -libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c
> +libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c ompd-threads.c
>  
>  include $(top_srcdir)/plugin/Makefrag.am
>  

You've just changed libgompd_la_SOURCES but there are many changes
in the generated file, that means either you didn't use the right
libtool version (1.15.1) or something else wrong is happening.

> diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
> index 40f896b5f03..8bbc46cca25 100644
> --- a/libgomp/Makefile.in
> +++ b/libgomp/Makefile.in
> @@ -133,21 +133,8 @@ target_triplet = @target@
>  @USE_FORTRAN_TRUE@am__append_7 = openacc.f90
>  subdir = .
>  ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
> -am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
> - $(top_srcdir)/../config/ax_count_cpus.m4 \
> - $(top_srcdir)/../config/depstand.m4 \
> - $(top_srcdir)/../config/enable.m4 \
> - $(top_srcdir)/../config/futex.m4 \
> - $(top_srcdir)/../config/lead-dot.m4 \
> - $(top_srcdir)/../config/lthostflags.m4 \
> - $(top_srcdir)/../config/multi.m4 \
> - $(top_srcdir)/../config/override.m4 \
> - $(top_srcdir)/../config/tls.m4 \
> - $(top_srcdir)/../config/toolexeclibdir.m4 \
> - $(top_srcdir)/../ltoptions.m4 $(top_srcdir)/../ltsugar.m4 \
> - $(top_srcdir)/../ltversion.m4 $(top_srcdir)/../lt~obsolete.m4 \
> - $(top_srcdir)/acinclude.m4 $(top_srcdir)/../libtool.m4 \
> - $(top_srcdir)/../config/cet.m4 \
> +am__aclocal_m4_deps = $(top_srcdir)/acinclude.m4 \
> + $(top_srcdir)/../libtool.m4 $(top_srcdir)/../config/cet.m4 \
>   $(top_srcdir)/plugin/configfrag.ac $(top_srcdir)/configure.ac

The above certainly shouldn't be changed.

>  am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \
>   $(ACLOCAL_M4)
> @@ -233,7 +220,8 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo 
> critical.lo \
>   affinity-fmt.lo teams.lo allocator.lo oacc-profiling.lo \
>   oacc-target.lo ompd-support.lo $(am__objects_1)
>  libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
> -am_libgompd_la_OBJECTS = ompd-init.lo ompd-helper.lo ompd-icv.lo
> +am_libgompd_la_OBJECTS = ompd-init.lo ompd-helper.lo ompd-icv.lo \
> + ompd-threads.lo
>  libgompd_la_OBJECTS = $(am_libgompd_la_OBJECTS)
>  AM_V_P = $(am__v_P_@AM_V@)
>  am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)

The above yes.

> @@ -485,7 +473,6 @@ dvidir = @dvidir@
>  enable_shared = @enable_shared@
>  enable_static = @enable_static@
>  exec_prefix = @exec_prefix@
> -get_gcc_base_ver = @get_gcc_base_ver@
>  host = @host@
>  host_alias = @host_alias@
>  host_cpu = @host_cpu@
> @@ -501,10 +488,8 @@ libtool_VERSION = @libtool_VERSION@
>  link_gomp = @link_gomp@
>  localedir = @localedir@
>  localstatedir = @localstatedir@
> -lt_host_flags = @lt_host_flags@
>  mandir = @mandir@
>  mkdir_p = @mkdir_p@
> -multi_basedir = @multi_basedir@
>  offload_additional_lib_paths = @offload_additional_lib_paths@
>  offload_additional_options = @offload_additional_options@
>  offload_plugins = @offload_plugins@
> @@ -514,6 +499,7 @@ pdfdir = @pdfdir@
>  prefix = @prefix@
>  program_transform_name = @program_transform_name@
>  psdir = @psdir@
> +runstatedir = @runs

Re: [PATCH] Fortran: detect blanks within literal constants in free-form mode [PR92805]

2022-07-29 Thread Mikael Morin

Hello,

Le 28/07/2022 à 22:11, Harald Anlauf via Fortran a écrit :

Dear all,

in free-form mode, blanks are significant, so they cannot appear
in literal constants, especially not before or after the "_" that
separates the literal and the kind specifier.

The initial patch from Steve addressed numerical literals, which
I completed by adjusting the parsing of string literals.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

It looks correct, but I think we should continue to have the free vs 
fixed form abstracted away from the parsing code.
So, I suggest instead to remove the calls to gfc_gobble_whitespace in 
match_string_constant, and use gfc_next_char instead of gfc_match_char 
in get_kind.


Mikael


Re: [RFC] Teach vectorizer to deal with bitfield reads

2022-07-29 Thread Richard Biener via Gcc-patches
On Fri, 29 Jul 2022, Jakub Jelinek wrote:

> On Fri, Jul 29, 2022 at 09:57:29AM +0100, Andre Vieira (lists) via 
> Gcc-patches wrote:
> > The 'only on the vectorized code path' remains the same though as vect_recog
> > also only happens on the vectorized code path right?
> 
> if conversion (in some cases) duplicates a loop and guards one copy with
> an ifn which resolves to true if that particular loop is vectorized and
> false otherwise.  So, then changes that shouldn't be done in case of
> vectorization failure can be done on the for vectorizer only copy of the
> loop.

And just to mention, one issue with lowering of bitfield accesses
is bitfield inserts which, on some architectures (hello m68k) have
instructions operating on memory directly.  For those it's difficult
to not regress in code quality if a bitfield store becomes a
read-modify-write cycle.  That's one of the things holding this
back.  One idea would be to lower to .INSV directly for those targets
(but presence of insv isn't necessarily indicating support for
memory destinations).

Richard.


[PATCH] Mips: Enable TSAN for 64-bit ABIs

2022-07-29 Thread Dimitrije Milosevic
The following patch enables TSAN for mips64, on which it is supported.

Signed-off-by: Dimitrije Milosevic .

libsanitizer/ChangeLog:

* configure.tgt: Enable
TSAN for 64-bit ABIs.
---
 libsanitizer/configure.tgt | 4 
 1 file changed, 4 insertions(+)

diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
index fb89df4935c..6855a6ca9e7 100644
--- a/libsanitizer/configure.tgt
+++ b/libsanitizer/configure.tgt
@@ -55,6 +55,10 @@ case "${target}" in
   arm*-*-linux*)
;;
   mips*-*-linux*)
+   if test x$ac_cv_sizeof_void_p = x8; then
+   TSAN_SUPPORTED=yes
+   TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_mips64.lo
+   fi
;;
   aarch64*-*-linux*)
if test x$ac_cv_sizeof_void_p = x8; then
-- 
2.25.1

Re: [Patch] Add libgomp.c-c++-common/pr106449-2.c (was: [committed] openmp: Fix up handling of non-rectangular simd loops with pointer type iterators [PR106449])

2022-07-29 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 29, 2022 at 11:48:08AM +0200, Tobias Burnus wrote:
> On 29.07.22 10:03, Jakub Jelinek wrote:
> > There were 2 issues visible on this new testcase, one that we didn't have
> > special POINTER_TYPE_P handling in a few spots of expand_omp_simd ...
> > The other issue was that we put n2 expression directly into a
> > comparison in a condition and regimplified that, for the &a[512] case that
> > and with gimplification being destructed that unfortunately meant 
> > modification
> > of original fd->loops[?].n2.  Fixed by unsharing the expression.
> 
> I created a testcase for the non-simd case – and due to messing up, it failed;
> hence, I filled PR middle-end/106467.  After fixing the testcase, it passes.
> (→ closed PR as invalid).
> 
> However, given that the testcase now exists, I think it makes sense to add it 
> :-)
> 
> Changes compared to the simd testcase: replaced '(parallel for) simd' by 
> 'for',
> removed 'linear', used now 'b' and 'c' instead of storing both ptrs in 'b'.
> 
> Side remark: Before GCC 12, GCC complained about "q = p + n" with
> "error: initializer expression refers to iteration variable ‘p’".
> 
> OK for mainline?

Ok, thanks.

Jakub



Re: [Patch] OpenMP/Fortran: Permit assumed-size arrays in uniform clause

2022-07-29 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 29, 2022 at 11:47:54AM +0200, Tobias Burnus wrote:
> Testcase wise, the run-time testcase libgomp.fortran/examples-4/simd-2.f90
> checks essentially the same, except that it uses an array-descriptor array
> (assumed shape) while this testcase uses an assumed-size array.
> 
> I decided for an extra compile-time only testcase, but it could be also be
> moved to libgomp as run-time test or the other test could be extended to
> also test assumed-size arrays.
> 
> The OpenMP examples document contains two testcases which now pass,
> but are reject without this patch:
> - SIMD/sources/SIMD.2.f90 (for OpenMP 4.0)
> - SIMD/sources/linear_modifier.3.f90 (for OpenMP 5.2)
> 
> OK for mainline?

Ok, thanks.

Jakub



Re: [RFC] Teach vectorizer to deal with bitfield reads

2022-07-29 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 29, 2022 at 09:57:29AM +0100, Andre Vieira (lists) via Gcc-patches 
wrote:
> The 'only on the vectorized code path' remains the same though as vect_recog
> also only happens on the vectorized code path right?

if conversion (in some cases) duplicates a loop and guards one copy with
an ifn which resolves to true if that particular loop is vectorized and
false otherwise.  So, then changes that shouldn't be done in case of
vectorization failure can be done on the for vectorizer only copy of the
loop.

Jakub



Re: cselib: add function to check if SET is redundant [PR106187]

2022-07-29 Thread Richard Biener via Gcc-patches
On Fri, Jul 29, 2022 at 11:52 AM Richard Earnshaw
 wrote:
>
>
>
> On 29/07/2022 08:06, Richard Biener via Gcc-patches wrote:
> > On Thu, Jul 28, 2022 at 6:46 PM Richard Earnshaw
> >  wrote:
> >>
> >> [resend with correct subject line]
> >>
> >> A SET operation that writes memory may have the same value as an earlier
> >> store but if the alias sets of the new and earlier store do not conflict
> >> then the set is not truly redundant.  This can happen, for example, if
> >> objects of different types share a stack slot.
> >>
> >> To fix this we define a new function in cselib that first checks for
> >> equality and if that is successful then finds the earlier store in the
> >> value history and checks the alias sets.
> >>
> >> The routine is used in two places elsewhere in the compiler.  Firstly
> >> in cfgcleanup and secondly in postreload.
> >
> > I can't comment on the stripping on SUBREGs and friends but it seems
> > to be conservative apart from
> >
> > +  if (!flag_strict_aliasing || !MEM_P (dest))
> > +return true;
> >
> > where if dest is not a MEM but were to contain one we'd miss it.
> > Double-checking
> > from more RTL literate people appreciated.
>
> There are very few things that can wrap a MEM in a SET_DEST.  I'm pretty
> sure that's all of them.  It certainly matches the code in
> cselib_invalidate_rtx which has to deal with this sort of case.
>
> >
> > +  /* Lookup the equivalents to the dest.  This is more likely to succeed
> > + than looking up the equivalents to the source (for example, when the
> > + src is some form of constant).  */
> >
> > I think the comment is misleading - we _do_ have to lookup the MEM,
> > looking up equivalences of a reg or an expression on the RHS isn't
> > what we are interested in.
>
> OK, I'll try to reword it.
>
> >
> > +   return alias_sets_conflict_p (MEM_ALIAS_SET (dest),
> > + MEM_ALIAS_SET (src_equiv));
> >
> > that's not conservative enough - dse.cc has correct boilerplate, we have
> > to check both MEM_ALIAS_SET and MEM_EXPR here (the latter only
> > if the former load/store has a MEM_EXPR).  Note in particular
> > using alias_set_subset_of instead of alias_sets_conflict_p.
> >
> >/* We can only remove the later store if the earlier aliases
> >   at least all accesses the later one.  */
> >&& ((MEM_ALIAS_SET (mem) == MEM_ALIAS_SET (s_info->mem)
> > || alias_set_subset_of (MEM_ALIAS_SET (mem),
> > MEM_ALIAS_SET (s_info->mem)))
> >&& (!MEM_EXPR (s_info->mem)
> >|| refs_same_for_tbaa_p (MEM_EXPR (s_info->mem),
> > MEM_EXPR (mem)
> >
>
> OK, that's an easy enough change.
>
> > +  /* We failed to find a recorded value in the cselib history, so try the
> > + source of this set.  */
> > +  rtx src = SET_SRC (set);
> > +  while (GET_CODE (src) == SUBREG)
> > +src = XEXP (src, 0);
> > +
> > +  if (MEM_P (src) && rtx_equal_for_cselib_1 (dest_addr, XEXP (src, 0),
> > +GET_MODE (dest), 0))
> > +return alias_sets_conflict_p (MEM_ALIAS_SET (dest),
> > + MEM_ALIAS_SET (src));
> >
> > this looks like an odd case to me - wouldn't that only catch things
> > like self-assignments, aka *p = *p?  So I'd simply drop this fallback.
>
> It catches the case of *p = *q when p and q have the same value.  It did
> come up in testing on x86 (when previously I was aborting to make sure
> I'd caught everything).  We could leave it out as the fallback case in
> this instance is to record a conflict, but it's not a path that's likely
> to be performance critical and the probability of this being a redundant
> store is quite high.  I'll update the comment to make this clearer.

Ah OK - if it did actually catch cases then it's fine to keep.  Note the
alias check needs to be updated the same as above.

Richard.

>
>
> R.
>
> >
> > Otherwise it looks OK to me.
> >
> > Thanks,
> > Richard.
> >
> >> gcc/ChangeLog:
> >>  * cselib.h (cselib_redundant_set_p): Declare.
> >>  * cselib.cc: Include alias.h
> >>  (cselib_redundant_set_p): New function.
> >>  * cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead
> >>  of rtx_equal_for_cselib_p.
> >>  * postreload.c (reload_cse_simplify): Use cselib_redundant_set_p.
> >>  (reload_cse_noop_set_p): Delete.


Re: cselib: add function to check if SET is redundant [PR106187]

2022-07-29 Thread Richard Earnshaw via Gcc-patches




On 29/07/2022 08:06, Richard Biener via Gcc-patches wrote:

On Thu, Jul 28, 2022 at 6:46 PM Richard Earnshaw
 wrote:


[resend with correct subject line]

A SET operation that writes memory may have the same value as an earlier
store but if the alias sets of the new and earlier store do not conflict
then the set is not truly redundant.  This can happen, for example, if
objects of different types share a stack slot.

To fix this we define a new function in cselib that first checks for
equality and if that is successful then finds the earlier store in the
value history and checks the alias sets.

The routine is used in two places elsewhere in the compiler.  Firstly
in cfgcleanup and secondly in postreload.


I can't comment on the stripping on SUBREGs and friends but it seems
to be conservative apart from

+  if (!flag_strict_aliasing || !MEM_P (dest))
+return true;

where if dest is not a MEM but were to contain one we'd miss it.
Double-checking
from more RTL literate people appreciated.


There are very few things that can wrap a MEM in a SET_DEST.  I'm pretty 
sure that's all of them.  It certainly matches the code in 
cselib_invalidate_rtx which has to deal with this sort of case.




+  /* Lookup the equivalents to the dest.  This is more likely to succeed
+ than looking up the equivalents to the source (for example, when the
+ src is some form of constant).  */

I think the comment is misleading - we _do_ have to lookup the MEM,
looking up equivalences of a reg or an expression on the RHS isn't
what we are interested in.


OK, I'll try to reword it.



+   return alias_sets_conflict_p (MEM_ALIAS_SET (dest),
+ MEM_ALIAS_SET (src_equiv));

that's not conservative enough - dse.cc has correct boilerplate, we have
to check both MEM_ALIAS_SET and MEM_EXPR here (the latter only
if the former load/store has a MEM_EXPR).  Note in particular
using alias_set_subset_of instead of alias_sets_conflict_p.

   /* We can only remove the later store if the earlier aliases
  at least all accesses the later one.  */
   && ((MEM_ALIAS_SET (mem) == MEM_ALIAS_SET (s_info->mem)
|| alias_set_subset_of (MEM_ALIAS_SET (mem),
MEM_ALIAS_SET (s_info->mem)))
   && (!MEM_EXPR (s_info->mem)
   || refs_same_for_tbaa_p (MEM_EXPR (s_info->mem),
MEM_EXPR (mem)



OK, that's an easy enough change.


+  /* We failed to find a recorded value in the cselib history, so try the
+ source of this set.  */
+  rtx src = SET_SRC (set);
+  while (GET_CODE (src) == SUBREG)
+src = XEXP (src, 0);
+
+  if (MEM_P (src) && rtx_equal_for_cselib_1 (dest_addr, XEXP (src, 0),
+GET_MODE (dest), 0))
+return alias_sets_conflict_p (MEM_ALIAS_SET (dest),
+ MEM_ALIAS_SET (src));

this looks like an odd case to me - wouldn't that only catch things
like self-assignments, aka *p = *p?  So I'd simply drop this fallback.


It catches the case of *p = *q when p and q have the same value.  It did 
come up in testing on x86 (when previously I was aborting to make sure 
I'd caught everything).  We could leave it out as the fallback case in 
this instance is to record a conflict, but it's not a path that's likely 
to be performance critical and the probability of this being a redundant 
store is quite high.  I'll update the comment to make this clearer.



R.



Otherwise it looks OK to me.

Thanks,
Richard.


gcc/ChangeLog:
 * cselib.h (cselib_redundant_set_p): Declare.
 * cselib.cc: Include alias.h
 (cselib_redundant_set_p): New function.
 * cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead
 of rtx_equal_for_cselib_p.
 * postreload.c (reload_cse_simplify): Use cselib_redundant_set_p.
 (reload_cse_noop_set_p): Delete.


[Patch] Add libgomp.c-c++-common/pr106449-2.c (was: [committed] openmp: Fix up handling of non-rectangular simd loops with pointer type iterators [PR106449])

2022-07-29 Thread Tobias Burnus

On 29.07.22 10:03, Jakub Jelinek wrote:

There were 2 issues visible on this new testcase, one that we didn't have
special POINTER_TYPE_P handling in a few spots of expand_omp_simd ...
The other issue was that we put n2 expression directly into a
comparison in a condition and regimplified that, for the &a[512] case that
and with gimplification being destructed that unfortunately meant modification
of original fd->loops[?].n2.  Fixed by unsharing the expression.


I created a testcase for the non-simd case – and due to messing up, it failed;
hence, I filled PR middle-end/106467.  After fixing the testcase, it passes.
(→ closed PR as invalid).

However, given that the testcase now exists, I think it makes sense to add it 
:-)

Changes compared to the simd testcase: replaced '(parallel for) simd' by 'for',
removed 'linear', used now 'b' and 'c' instead of storing both ptrs in 'b'.

Side remark: Before GCC 12, GCC complained about "q = p + n" with
"error: initializer expression refers to iteration variable ‘p’".

OK for mainline?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
Add libgomp.c-c++-common/pr106449-2.c

This run-time test test pointer-based iteration with collapse,
similar to the '(parallel) simd' test for PR106449 but for 'for'.

libgomp/ChangeLog:

	* testsuite/libgomp.c-c++-common/pr106449-2.c: New test.

 .../testsuite/libgomp.c-c++-common/pr106449-2.c| 64 ++
 1 file changed, 64 insertions(+)

diff --git a/libgomp/testsuite/libgomp.c-c++-common/pr106449-2.c b/libgomp/testsuite/libgomp.c-c++-common/pr106449-2.c
new file mode 100644
index 000..7fef7461bcf
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/pr106449-2.c
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+
+/* Based on pr106449.c - but using 'for' instead of 'simd'.
+   Cf. PR middle-end/106449 (for pr106449.c) and PR middle-end/106467.  */
+
+void
+foo (void)
+{
+  int a[1024], *b[65536], *c[65536];
+  int *p, *q, **r = &b[0], **r2 = &c[0], i;
+  #pragma omp for collapse(2)
+  for (p = &a[0]; p < &a[512]; p++)
+for (q = p + 64; q < p + 128; q++)
+  {
+	*r++ = p;
+	*r2++ = q;
+  }
+  for (i = 0; i < 32768; i++)
+if (b[i] != &a[i / 64] || c[i] != &a[(i / 64) + 64 + (i % 64)])
+  __builtin_abort ();
+}
+
+void
+bar (int n, int m)
+{
+  int a[1024], *b[32768], *c[32768];
+  int *p, *q, **r = &b[0], **r2 = &c[0], i;
+  #pragma omp for collapse(2)
+  for (p = &a[0]; p < &a[512]; p++)
+for (q = p + n; q < p + m; q++)
+  {
+	*r++ = p;
+	*r2++ = q;
+  }
+  for (i = 0; i < 32768; i++)
+if (b[i] != &a[i / 64] || c[i] != &a[(i / 64) + 64 + (i % 64)])
+  __builtin_abort ();
+}
+
+void
+baz (int n, int m)
+{
+  int a[1024], *b[8192], *c[8192];
+  int *p, *q, **r = &b[0], **r2 = &c[0], i;
+  #pragma omp for collapse(2)
+  for (p = &a[0]; p < &a[512]; p += 4)
+for (q = p + n; q < p + m; q += 2)
+  {
+	*r++ = p;
+	*r2++ = q;
+  }
+  for (i = 0; i < 4096; i++)
+if (b[i] != &a[(i / 32) * 4] || c[i] != &a[(i / 32) * 4 + 64 + (i % 32) * 2])
+  __builtin_abort ();
+}
+
+int
+main ()
+{
+  foo ();
+  bar (64, 128);
+  baz (64, 128);
+  return 0;
+}


[Patch] OpenMP/Fortran: Permit assumed-size arrays in uniform clause

2022-07-29 Thread Tobias Burnus

Testcase wise, the run-time testcase libgomp.fortran/examples-4/simd-2.f90
checks essentially the same, except that it uses an array-descriptor array
(assumed shape) while this testcase uses an assumed-size array.

I decided for an extra compile-time only testcase, but it could be also be
moved to libgomp as run-time test or the other test could be extended to
also test assumed-size arrays.

The OpenMP examples document contains two testcases which now pass,
but are reject without this patch:
- SIMD/sources/SIMD.2.f90 (for OpenMP 4.0)
- SIMD/sources/linear_modifier.3.f90 (for OpenMP 5.2)

OK for mainline?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
OpenMP/Fortran: Permit assumed-size arrays in uniform clause

gcc/fortran/ChangeLog:

	* openmp.cc (resolve_omp_clauses): Permit assumed-size arrays
	in uniform clause.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/declare-simd-3.f90: New test.

 gcc/fortran/openmp.cc |  3 ++-
 gcc/testsuite/gfortran.dg/gomp/declare-simd-3.f90 | 30 +++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index df9cdf43eb7..a7eb6c3e8f4 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -7386,7 +7386,8 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 			|| code->op == EXEC_OACC_PARALLEL
 			|| code->op == EXEC_OACC_SERIAL))
 		  check_array_not_assumed (n->sym, n->where, name);
-		else if (n->sym->as && n->sym->as->type == AS_ASSUMED_SIZE)
+		else if (list != OMP_LIST_UNIFORM
+			 && n->sym->as && n->sym->as->type == AS_ASSUMED_SIZE)
 		  gfc_error ("Assumed size array %qs in %s clause at %L",
 			 n->sym->name, name, &n->where);
 		if (n->sym->attr.in_namelist && !is_reduction)
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-simd-3.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-simd-3.f90
new file mode 100644
index 000..b94587ef35a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-simd-3.f90
@@ -0,0 +1,30 @@
+! { dg-do compile }
+
+module m
+  implicit none (type, external)
+contains
+  real function add(x, y, j) result(res)
+!$omp declare simd(add) uniform(x, y) linear(j : 1) simdlen(4)
+integer, value :: j
+real, intent(in) :: x(*), y(*)
+res = x(j) + y(j)
+  end function
+end module m
+
+program main
+  use m
+  implicit none (type, external)
+  real, allocatable :: A(:), B(:), C(:)
+  integer :: i, N
+  N = 128
+  A = [(3*i, i = 1, N)]
+  B = [(7*i, i = 1, N)]
+  allocate (C(N))
+
+  !$omp simd
+  do i = 1, N
+C(i) = add(A, B, i)
+  end do
+
+  if (any (C /= [(10*i, i = 1, N)])) error stop
+end program main


Re: [RFC] Teach vectorizer to deal with bitfield reads

2022-07-29 Thread Richard Biener via Gcc-patches
On Fri, 29 Jul 2022, Andre Vieira (lists) wrote:

> Hi Richard,
> 
> Thanks for the review, I don't completely understand all of the below, so I
> added some extra questions to help me understand :)
> 
> On 27/07/2022 12:37, Richard Biener wrote:
> > On Tue, 26 Jul 2022, Andre Vieira (lists) wrote:
> >
> > I don't think this is a good approach for what you gain and how
> > necessarily limited it will be.  Similar to the recent experiment with
> > handling _Complex loads/stores this is much better tackled by lowering
> > things earlier (it will be lowered at RTL expansion time).
> I assume the approach you are referring to here is the lowering of the
> BIT_FIELD_DECL to BIT_FIELD_REF in the vect_recog part of the vectorizer. I am
> all for lowering earlier, the reason I did it there was as a 'temporary'
> approach until we have that earlier loading.

I understood, but "temporary" things in GCC tend to be still around
10 years later, so ...

> >
> > One place to do this experimentation would be to piggy-back on the
> > if-conversion pass so the lowering would happen only on the
> > vectorized code path.
> This was one of my initial thoughts, though the if-conversion changes are a
> bit more intrusive for a temporary approach and not that much earlier. It does
> however have the added benefit of not having to make any changes to the
> vectorizer itself later if we do do the earlier lowering, assuming the
> lowering results in the same.
> 
> The 'only on the vectorized code path' remains the same though as vect_recog
> also only happens on the vectorized code path right?
> >   Note that the desired lowering would look like
> > the following for reads:
> >
> >_1 = a.b;
> >
> > to
> >
> >_2 = a.;
> >_1 = BIT_FIELD_REF <2, ...>; // extract bits
> I don't yet have a well formed idea of what '' is
> supposed to look like in terms of tree expressions. I understand what it's
> supposed to be representing, the 'larger than bit-field'-load. But is it going
> to be a COMPONENT_REF with a fake 'FIELD_DECL' with the larger size? Like I
> said on IRC, the description of BIT_FIELD_REF makes it sound like this isn't
> how we are supposed to use it, are we intending to make a change to that here?

 is what DECL_BIT_FIELD_REPRESENTATIVE 
(FIELD_DECL-for-b) gives you, it is a "fake" FIELD_DECL for the underlying
storage, conveniently made available to you by the middle-end.  For
your 31bit field it would be simply 'int' typed.

The BIT_FIELD_REF then extracts the actual bitfield from the underlying
storage, but it's now no longer operating on memory but on the register
holding the underlying data.  To the vectorizer we'd probably have to
pattern-match this to shifts & masks and hope for the conversion to
combine with a later one.

> > and for writes:
> >
> >a.b = _1;
> >
> > to
> >
> >_2 = a.;
> >_3 = BIT_INSERT_EXPR <_2, _1, ...>; // insert bits
> >a. = _3;
> I was going to avoid writes for now because they are somewhat more
> complicated, but maybe it's not that bad, I'll add them too.

Only handling loads at start is probably fine as experiment, but
handling stores should be straight forward - of course the
BIT_INSERT_EXPR lowering to shifts & masks will be more
complicated.

> > so you trade now handled loads/stores with not handled
> > BIT_FIELD_REF / BIT_INSERT_EXPR which you would then need to
> > pattern match to shifts and logical ops in the vectorizer.
> Yeah that vect_recog pattern already exists in my RFC patch, though I can
> probably simplify it by moving the bit-field-ref stuff to ifcvt.
> >
> > There's a separate thing of actually promoting all uses, for
> > example
> >
> > struct { long long x : 33; } a;
> >
> >   a.a = a.a + 1;
> >
> > will get you 33bit precision adds (for bit fields less than 32bits
> > they get promoted to int but not for larger bit fields).  RTL
> > expansion again will rewrite this into larger ops plus masking.
> Not sure I understand why this is relevant here? The current way I am doing
> this would likely lower a  bit-field like that to a 64-bit load  followed by
> the masking away of the top 31 bits, same would happen with a ifcvt-lowering
> approach.

Yes, but if there's anything besides loading or storing you will have
a conversion from, say int:31 to int in the IL before any arithmetic.
I've not looked but your patch probably handles conversions to/from
bitfield types by masking / extending.  What I've mentioned with the
33bit example is that with that you can have arithmetic in 33 bits
_without_ intermediate conversions, so you'd have to properly truncate
after every such operation (or choose not to vectorize which I think
is what would happen now).

> >
> > So I think the time is better spent in working on the lowering of
> > bitfield accesses, if sufficiently separated it could be used
> > from if-conversion by working on loop SEME regions.
> I will start to look at modifying ifcvt to add the lowering there. Will likely
> require two pass though becaus

[committed] libstdc++: Tweak common_iterator::operator-> return type [PR104443]

2022-07-29 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

This adjusts the return type to match the resolution of LWG 3672. There
is no functional difference, because decltype(auto) always deduced a
value anyway, but this makes it simpler and consistent with the working
draft.

libstdc++-v3/ChangeLog:

PR libstdc++/104443
* include/bits/stl_iterator.h (common_iterator::operator->):
Change return type to just auto.
---
 libstdc++-v3/include/bits/stl_iterator.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index 9cd262cd1d9..003cebbec83 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -2058,7 +2058,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
 [[nodiscard]]
-constexpr decltype(auto)
+constexpr auto
 operator->() const requires __detail::__common_iter_has_arrow<_It>
 {
   __glibcxx_assert(_M_index == 0);
-- 
2.37.1



Re: [RFC] Teach vectorizer to deal with bitfield reads

2022-07-29 Thread Andre Vieira (lists) via Gcc-patches

Hi Richard,

Thanks for the review, I don't completely understand all of the below, 
so I added some extra questions to help me understand :)


On 27/07/2022 12:37, Richard Biener wrote:

On Tue, 26 Jul 2022, Andre Vieira (lists) wrote:

I don't think this is a good approach for what you gain and how
necessarily limited it will be.  Similar to the recent experiment with
handling _Complex loads/stores this is much better tackled by lowering
things earlier (it will be lowered at RTL expansion time).
I assume the approach you are referring to here is the lowering of the 
BIT_FIELD_DECL to BIT_FIELD_REF in the vect_recog part of the 
vectorizer. I am all for lowering earlier, the reason I did it there was 
as a 'temporary' approach until we have that earlier loading.


One place to do this experimentation would be to piggy-back on the
if-conversion pass so the lowering would happen only on the
vectorized code path.
This was one of my initial thoughts, though the if-conversion changes 
are a bit more intrusive for a temporary approach and not that much 
earlier. It does however have the added benefit of not having to make 
any changes to the vectorizer itself later if we do do the earlier 
lowering, assuming the lowering results in the same.


The 'only on the vectorized code path' remains the same though as 
vect_recog also only happens on the vectorized code path right?

  Note that the desired lowering would look like
the following for reads:

   _1 = a.b;

to

   _2 = a.;
   _1 = BIT_FIELD_REF <2, ...>; // extract bits
I don't yet have a well formed idea of what '' is 
supposed to look like in terms of tree expressions. I understand what 
it's supposed to be representing, the 'larger than bit-field'-load. But 
is it going to be a COMPONENT_REF with a fake 'FIELD_DECL' with the 
larger size? Like I said on IRC, the description of BIT_FIELD_REF makes 
it sound like this isn't how we are supposed to use it, are we intending 
to make a change to that here?



and for writes:

   a.b = _1;

to

   _2 = a.;
   _3 = BIT_INSERT_EXPR <_2, _1, ...>; // insert bits
   a. = _3;
I was going to avoid writes for now because they are somewhat more 
complicated, but maybe it's not that bad, I'll add them too.

so you trade now handled loads/stores with not handled
BIT_FIELD_REF / BIT_INSERT_EXPR which you would then need to
pattern match to shifts and logical ops in the vectorizer.
Yeah that vect_recog pattern already exists in my RFC patch, though I 
can probably simplify it by moving the bit-field-ref stuff to ifcvt.


There's a separate thing of actually promoting all uses, for
example

struct { long long x : 33; } a;

  a.a = a.a + 1;

will get you 33bit precision adds (for bit fields less than 32bits
they get promoted to int but not for larger bit fields).  RTL
expansion again will rewrite this into larger ops plus masking.
Not sure I understand why this is relevant here? The current way I am 
doing this would likely lower a  bit-field like that to a 64-bit load  
followed by the masking away of the top 31 bits, same would happen with 
a ifcvt-lowering approach.


So I think the time is better spent in working on the lowering of
bitfield accesses, if sufficiently separated it could be used
from if-conversion by working on loop SEME regions.
I will start to look at modifying ifcvt to add the lowering there. Will 
likely require two pass though because we can no longer look at the 
number of BBs to determine whether ifcvt is even needed, so we will 
first need to look for bit-field-decls, then version the loops and then 
look for them again for transformation, but I guess that's fine?

The patches
doing previous implementations are probably not too useful anymore
(I find one from 2011 and one from 2016, both pre-dating BIT_INSERT_EXPR)

Richard.


[PATCH] tree-optimization/105679 - disable backward threading of unlikely entry

2022-07-29 Thread Richard Biener via Gcc-patches
The following makes the backward threader reject threads whose entry
edge is probably never executed according to the profile.  That in
particular, for the testcase, avoids threading the irq == 1 check
on the path where irq > 31, thereby avoiding spurious -Warray-bounds
diagnostics

  if (irq_1(D) > 31)
goto ; [0.00%]
  else
goto ; [100.00%]

;;   basic block 3, loop depth 0, count 0 (precise), probably never executed
  _2 = (unsigned long) irq_1(D);
  __builtin___ubsan_handle_shift_out_of_bounds (&*.Lubsan_data0, 1, _2);

  _3 = 1 << irq_1(D);
  mask_4 = (u32) _3;
  entry = instance_5(D)->array[irq_1(D)];
  capture (mask_4);
  if (level_6(D) != 0)
goto ; [34.00%]
  else
goto ; [66.00%]

;;   basic block 5, loop depth 0, count 708669600 (estimated locally), maybe 
hot  if (irq_1(D) == 1)
goto ; [20.97%]
  else
goto ; [79.03%]

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

The testcase in the PR requries both ubsan and sancov so I'm not sure
where to put it but IIRC there were quite some duplicate PRs wrt
threading unlikely paths exposing diagnostics, eventually some
testcase will come out of those (when we identify them).  Note
the patch is quite conservative in only disabling likely never
executed paths rather than requiring maybe_hot_edge_p (OTOH those
are somewhat similar in the end).

I'm going to push it when testing finishes but maybe there are some
testcases to adjust.

PR tree-optimization/105679
* tree-ssa-threadbackwards.cc
(back_threader_profitability::profitable_path_p): Avoid threading
when the entry edge is probably never executed.
---
 gcc/tree-ssa-threadbackward.cc | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 3519aca84cd..90f5331c265 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -777,6 +777,15 @@ back_threader_profitability::profitable_path_p (const 
vec &m_path,
 "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
  return false;
}
+  edge entry = find_edge (m_path[m_path.length () - 1],
+ m_path[m_path.length () - 2]);
+  if (probably_never_executed_edge_p (cfun, entry))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+"path entry is probably never executed.\n");
+ return false;
+   }
 }
   else if (!m_speed_p && n_insns > 1)
 {
-- 
2.35.3


[PATCH] tree-optimization/106422 - verify block copying in forward threading

2022-07-29 Thread Richard Biener via Gcc-patches
The forward threader failed to check whether it can actually duplicate
blocks.  The following adds this in a similar place the backwards threader
performs this check.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/106422
* tree-ssa-threadupdate.cc (fwd_jt_path_registry::update_cfg):
Check whether we can copy thread blocks and cancel the thread if not.

* gcc.dg/torture/pr106422.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr106422.c | 14 ++
 gcc/tree-ssa-threadupdate.cc|  4 +++-
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr106422.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr106422.c 
b/gcc/testsuite/gcc.dg/torture/pr106422.c
new file mode 100644
index 000..a2cef1aecb6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr106422.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+
+void vfork() __attribute__((__leaf__));
+void semanage_reload_policy(char *arg, void cb(void))
+{
+  if (!arg)
+{
+  cb();
+  return;
+}
+  vfork();
+  if (arg)
+__builtin_free(arg);
+}
diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-threadupdate.cc
index f901c7759e3..0f2b319d44a 100644
--- a/gcc/tree-ssa-threadupdate.cc
+++ b/gcc/tree-ssa-threadupdate.cc
@@ -2678,7 +2678,9 @@ fwd_jt_path_registry::update_cfg (bool 
may_peel_loop_headers)
for (j = 0; j < path->length (); j++)
  {
edge e = (*path)[j]->e;
-   if (m_removed_edges->find_slot (e, NO_INSERT))
+   if (m_removed_edges->find_slot (e, NO_INSERT)
+   || ((*path)[j]->type == EDGE_COPY_SRC_BLOCK
+   && !can_duplicate_block_p (e->src)))
  break;
  }
 
-- 
2.35.3


[PATCH 1/1] PR 106101: IBM zSystems: Fix strict_low_part problem

2022-07-29 Thread Andreas Krebbel via Gcc-patches
This avoids generating illegal (strict_low_part (reg ...)) RTXs. This
required two changes:

1. Do not use gen_lowpart to generate the inner expression of a
STRICT_LOW_PART.  gen_lowpart might fold the SUBREG either because
there is already a paradoxical subreg or because it can directly be
applied to the register. A new wrapper function makes sure that we
always end up having an actual SUBREG.

2. Change the movstrict patterns to enforce a SUBREG as inner operand
of the STRICT_LOW_PARTs.  The new predicate introduced for the
destination operand requires a SUBREG expression with a
register_operand as inner operand.  However, since reload strips away
the majority of the SUBREGs we have to accept single registers as well
once we reach reload.

Bootstrapped and regression tested on IBM zSystems 64 bit.

gcc/ChangeLog:

PR target/106101
* config/s390/predicates.md (subreg_register_operand): New
predicate.
* config/s390/s390-protos.h (s390_gen_lowpart_subreg): New
function prototype.
* config/s390/s390.cc (s390_gen_lowpart_subreg): New function.
(s390_expand_insv): Use s390_gen_lowpart_subreg instead of
gen_lowpart.
* config/s390/s390.md ("*get_tp_64", "*zero_extendhisi2_31")
("*zero_extendqisi2_31", "*zero_extendqihi2_31"): Likewise.
("movstrictqi", "movstricthi", "movstrictsi"): Use the
subreg_register_operand predicate instead of register_operand.

gcc/testsuite/ChangeLog:

PR target/106101
* gcc.c-torture/compile/pr106101.c: New test.
---
 gcc/config/s390/predicates.md | 12 
 gcc/config/s390/s390-protos.h |  1 +
 gcc/config/s390/s390.cc   | 27 +++-
 gcc/config/s390/s390.md   | 36 +--
 .../gcc.c-torture/compile/pr106101.c  | 62 +++
 5 files changed, 116 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr106101.c

diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index 33194d3f3d6..430cf6edfd6 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -594,3 +594,15 @@
 (define_predicate "addv_const_operand"
   (and (match_code "const_int")
(match_test "INTVAL (op) >= -32768 && INTVAL (op) <= 32767")))
+
+; Match (subreg (reg ...)) operands.
+; Used for movstrict destination operands
+; When replacing pseudos with hard regs reload strips away the
+; subregs. Accept also plain registers then to prevent the insn from
+; becoming unrecognizable.
+(define_predicate "subreg_register_operand"
+  (ior (and (match_code "subreg")
+   (match_test "register_operand (SUBREG_REG (op), GET_MODE 
(SUBREG_REG (op)))"))
+   (and (match_code "reg")
+   (match_test "reload_completed || reload_in_progress")
+   (match_test "register_operand (op, GET_MODE (op))"
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index fd4acaae44a..765d843a418 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -50,6 +50,7 @@ extern void s390_set_has_landing_pad_p (bool);
 extern bool s390_hard_regno_rename_ok (unsigned int, unsigned int);
 extern int s390_class_max_nregs (enum reg_class, machine_mode);
 extern bool s390_return_addr_from_memory(void);
+extern rtx s390_gen_lowpart_subreg (machine_mode, rtx);
 extern bool s390_fma_allowed_p (machine_mode);
 #if S390_USE_TARGET_ATTRIBUTE
 extern tree s390_valid_target_attribute_tree (tree args,
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 5aaf76a9490..5e06bf9350c 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -458,6 +458,31 @@ s390_return_addr_from_memory ()
   return cfun_gpr_save_slot(RETURN_REGNUM) == SAVE_SLOT_STACK;
 }
 
+/* Generate a SUBREG for the MODE lowpart of EXPR.
+
+   In contrast to gen_lowpart it will always return a SUBREG
+   expression.  This is useful to generate STRICT_LOW_PART
+   expressions.  */
+rtx
+s390_gen_lowpart_subreg (machine_mode mode, rtx expr)
+{
+  rtx lowpart = gen_lowpart (mode, expr);
+
+  /* There might be no SUBREG in case it could be applied to the hard
+ REG rtx or it could be folded with a paradoxical subreg.  Bring
+ it back.  */
+  if (!SUBREG_P (lowpart))
+{
+  machine_mode reg_mode = TARGET_ZARCH ? DImode : SImode;
+  gcc_assert (REG_P (lowpart));
+  lowpart = gen_lowpart_SUBREG (mode,
+   gen_rtx_REG (reg_mode,
+REGNO (lowpart)));
+}
+
+  return lowpart;
+}
+
 /* Return nonzero if it's OK to use fused multiply-add for MODE.  */
 bool
 s390_fma_allowed_p (machine_mode mode)
@@ -6520,7 +6545,7 @@ s390_expand_insv (rtx dest, rtx op1, rtx op2, rtx src)
   /* Emit a strict_low_part pattern if possible.  */
   if (smode_bsize == bitsize && bitpos == mode_bsize - smode_bsize)
{
- rtx low

[committed] openmp: Reject invalid forms of C++ #pragma omp atomic compare [PR106448]

2022-07-29 Thread Jakub Jelinek via Gcc-patches
Hi!

The allowed syntaxes of atomic compare don't allow ()s around the condition
of ?:, but we were accepting it in one case for C++.

Fixed thusly.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2022-07-29  Jakub Jelinek  

PR c++/106448
* parser.cc (cp_parser_omp_atomic): For simple cast followed by
CPP_QUERY token, don't try cp_parser_binary_operation if compare
is true.

* c-c++-common/gomp/atomic-32.c: New test.

--- gcc/cp/parser.cc.jj 2022-07-26 10:32:23.557273390 +0200
+++ gcc/cp/parser.cc2022-07-28 15:12:44.288195066 +0200
@@ -41535,7 +41535,9 @@ restart:
  goto saw_error;
}
  token = cp_lexer_peek_token (parser->lexer);
- if (token->type != CPP_SEMICOLON && !cp_tree_equal (lhs, rhs1))
+ if (token->type != CPP_SEMICOLON
+ && (!compare || token->type != CPP_QUERY)
+ && !cp_tree_equal (lhs, rhs1))
{
  cp_parser_abort_tentative_parse (parser);
  cp_parser_parse_tentatively (parser);
--- gcc/testsuite/c-c++-common/gomp/atomic-32.c.jj  2022-07-28 
15:23:54.567237524 +0200
+++ gcc/testsuite/c-c++-common/gomp/atomic-32.c 2022-07-28 15:25:25.127027325 
+0200
@@ -0,0 +1,14 @@
+/* PR c++/106448 */
+
+int x, expr;
+  
+void
+foo (void)
+{
+  #pragma omp atomic compare
+  x = (expr > x) ? expr : x;   /* { dg-error "invalid (form|operator)" } */
+  #pragma omp atomic compare
+  x = (x < expr) ? expr : x;   /* { dg-error "invalid (form|operator)" } */
+  #pragma omp atomic compare
+  x = (x == expr) ? expr : x;  /* { dg-error "invalid (form|operator)" } */
+}

Jakub



[committed] openmp: Fix up handling of non-rectangular simd loops with pointer type iterators [PR106449]

2022-07-29 Thread Jakub Jelinek via Gcc-patches
Hi!

There were 2 issues visible on this new testcase, one that we didn't have
special POINTER_TYPE_P handling in a few spots of expand_omp_simd - for
pointers we need to use POINTER_PLUS_EXPR and need to have the non-pointer
part in sizetype, for non-rectangular loop on the other side we can rely on
multiplication factor 1, pointers can't be multiplied, without those changes
we'd ICE.  The other issue was that we put n2 expression directly into a
comparison in a condition and regimplified that, for the &a[512] case that
and with gimplification being destructed that unfortunately meant modification
of original fd->loops[?].n2.  Fixed by unsharing the expression.  This was
causing a runtime failure on the testcase.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2022-07-29  Jakub Jelinek  

PR middle-end/106449
* omp-expand.cc (expand_omp_simd): Fix up handling of pointer
iterators in non-rectangular simd loops.  Unshare fd->loops[i].n2
or n2 before regimplifying it inside of a condition.

* testsuite/libgomp.c-c++-common/pr106449.c: New test.

--- gcc/omp-expand.cc.jj2022-06-28 13:03:30.930689700 +0200
+++ gcc/omp-expand.cc   2022-07-28 10:47:03.138939297 +0200
@@ -6714,7 +6696,7 @@ expand_omp_simd (struct omp_region *regi
   if (fd->loops[i].m2)
t = n2v = create_tmp_var (itype);
   else
-   t = fold_convert (itype, fd->loops[i].n2);
+   t = fold_convert (itype, unshare_expr (fd->loops[i].n2));
   t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
false, GSI_CONTINUE_LINKING);
   tree v = fd->loops[i].v;
@@ -6728,7 +6710,7 @@ expand_omp_simd (struct omp_region *regi
   if (fd->collapse > 1 && !broken_loop)
t = n2var;
   else
-   t = fold_convert (type, n2);
+   t = fold_convert (type, unshare_expr (n2));
   t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
false, GSI_CONTINUE_LINKING);
   tree v = fd->loop.v;
@@ -6840,7 +6819,7 @@ expand_omp_simd (struct omp_region *regi
  if (fd->loops[i].m2)
t = nextn2v = create_tmp_var (itype);
  else
-   t = fold_convert (itype, fd->loops[i].n2);
+   t = fold_convert (itype, unshare_expr (fd->loops[i].n2));
  t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
false, GSI_CONTINUE_LINKING);
  tree v = fd->loops[i].v;
@@ -6870,17 +6849,25 @@ expand_omp_simd (struct omp_region *regi
  ne->probability = e->probability.invert ();
 
  gsi = gsi_after_labels (init_bb);
- t = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
-   fd->loops[i + 1].n1);
  if (fd->loops[i + 1].m1)
{
- tree t2 = fold_convert (TREE_TYPE (t),
+ tree t2 = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
  fd->loops[i + 1
- fd->loops[i + 1].outer].v);
- tree t3 = fold_convert (TREE_TYPE (t), fd->loops[i + 1].m1);
- t2 = fold_build2 (MULT_EXPR, TREE_TYPE (t), t2, t3);
- t = fold_build2 (PLUS_EXPR, TREE_TYPE (t), t, t2);
+ if (POINTER_TYPE_P (TREE_TYPE (t2)))
+   t = fold_build_pointer_plus (t2, fd->loops[i + 1].n1);
+ else
+   {
+ t = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
+   fd->loops[i + 1].n1);
+ tree t3 = fold_convert (TREE_TYPE (t), fd->loops[i + 1].m1);
+ t2 = fold_build2 (MULT_EXPR, TREE_TYPE (t), t2, t3);
+ t = fold_build2 (PLUS_EXPR, TREE_TYPE (t), t, t2);
+   }
}
+ else
+   t = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
+ fd->loops[i + 1].n1);
  expand_omp_build_assign (&gsi, fd->loops[i + 1].v, t);
  if (fd->loops[i + 1].m2)
{
@@ -6889,14 +6876,19 @@ expand_omp_simd (struct omp_region *regi
  gcc_assert (n2v == NULL_TREE);
  n2v = create_tmp_var (TREE_TYPE (fd->loops[i + 1].v));
}
- t = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
-   fd->loops[i + 1].n2);
- tree t2 = fold_convert (TREE_TYPE (t),
+ tree t2 = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
  fd->loops[i + 1
- fd->loops[i + 1].outer].v);
- tree t3 = fold_convert (TREE_TYPE (t), fd->loops[i + 1].m2);
- t2 = fold_build2 (MULT_EXPR, TREE_TYPE (t), t2, t3);
- t = fold_build2 (PLUS_EXPR, TREE_TYPE (t), t, t2);
+ if (POINTER_TYPE_P (TREE_TYPE (t2)))
+   t = fold_build_pointer_plus (t2, fd->loops[i + 1].n2);
+  

[committed] openmp: Simplify fold_build_pointer_plus callers in omp-expand

2022-07-29 Thread Jakub Jelinek via Gcc-patches
Hi!

Tobias mentioned in PR106449 that fold_build_pointer_plus already
fold_converts the second argument to sizetype if it doesn't already
have an integral type gimple compatible with sizetype.

So, this patch simplifies the callers of fold_build_pointer_plus in
omp-expand so that they don't do those conversions manually.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2022-07-29  Jakub Jelinek  

* omp-expand.cc (expand_omp_for_init_counts, expand_omp_for_init_vars,
extract_omp_for_update_vars, expand_omp_for_ordered_loops,
expand_omp_simd): Don't fold_convert second argument to
fold_build_pointer_plus to sizetype.

--- gcc/omp-expand.cc.jj2022-06-28 13:03:30.930689700 +0200
+++ gcc/omp-expand.cc   2022-07-28 10:47:03.138939297 +0200
@@ -2267,8 +2267,7 @@ expand_omp_for_init_counts (struct omp_f
  else if (POINTER_TYPE_P (itype))
{
  gcc_assert (integer_onep (fd->loops[i].m1));
- t = fold_convert (sizetype,
-   unshare_expr (fd->loops[i].n1));
+ t = unshare_expr (fd->loops[i].n1);
  n1 = fold_build_pointer_plus (vs[i - fd->loops[i].outer], t);
}
  else
@@ -2291,8 +2290,7 @@ expand_omp_for_init_counts (struct omp_f
  else if (POINTER_TYPE_P (itype))
{
  gcc_assert (integer_onep (fd->loops[i].m2));
- t = fold_convert (sizetype,
-   unshare_expr (fd->loops[i].n2));
+ t = unshare_expr (fd->loops[i].n2);
  n2 = fold_build_pointer_plus (vs[i - fd->loops[i].outer], t);
}
  else
@@ -2353,8 +2351,7 @@ expand_omp_for_init_counts (struct omp_f
  tree step = fold_convert (itype,
unshare_expr (fd->loops[i].step));
  if (POINTER_TYPE_P (TREE_TYPE (vs[i])))
-   t = fold_build_pointer_plus (vs[i],
-fold_convert (sizetype, step));
+   t = fold_build_pointer_plus (vs[i], step);
  else
t = fold_build2 (PLUS_EXPR, itype, vs[i], step);
  t = force_gimple_operand_gsi (&gsi2, t, true, NULL_TREE,
@@ -2794,8 +2791,7 @@ expand_omp_for_init_vars (struct omp_for
  else if (POINTER_TYPE_P (itype))
{
  gcc_assert (integer_onep (fd->loops[j].m1));
- t = fold_convert (sizetype,
-   unshare_expr (fd->loops[j].n1));
+ t = unshare_expr (fd->loops[j].n1);
  n1 = fold_build_pointer_plus (vs[j - fd->loops[j].outer], t);
}
  else
@@ -2818,8 +2814,7 @@ expand_omp_for_init_vars (struct omp_for
  else if (POINTER_TYPE_P (itype))
{
  gcc_assert (integer_onep (fd->loops[j].m2));
- t = fold_convert (sizetype,
-   unshare_expr (fd->loops[j].n2));
+ t = unshare_expr (fd->loops[j].n2);
  n2 = fold_build_pointer_plus (vs[j - fd->loops[j].outer], t);
}
  else
@@ -2895,8 +2890,7 @@ expand_omp_for_init_vars (struct omp_for
  tree step
= fold_convert (itype, unshare_expr (fd->loops[j].step));
  if (POINTER_TYPE_P (vtype))
-   t = fold_build_pointer_plus (vs[j], fold_convert (sizetype,
- step));
+   t = fold_build_pointer_plus (vs[j], step);
  else
t = fold_build2 (PLUS_EXPR, itype, vs[j], step);
}
@@ -2959,8 +2953,7 @@ expand_omp_for_init_vars (struct omp_for
= fold_convert (itype, unshare_expr (fd->loops[j].step));
  t = fold_build2 (MULT_EXPR, itype, t, t2);
  if (POINTER_TYPE_P (vtype))
-   t = fold_build_pointer_plus (n1,
-fold_convert (sizetype, t));
+   t = fold_build_pointer_plus (n1, t);
  else
t = fold_build2 (PLUS_EXPR, itype, n1, t);
}
@@ -2970,8 +2963,7 @@ expand_omp_for_init_vars (struct omp_for
  t = fold_build2 (MULT_EXPR, itype, t,
   fold_convert (itype, fd->loops[j].step));
  if (POINTER_TYPE_P (vtype))
-   t = fold_build_pointer_plus (fd->loops[j].n1,
-fold_convert (sizetype, t));
+   t = fold_build_pointer_plus (fd->loops[j].n1, t);
  else
t = fold_build2 (PLUS_EXPR, itype, fd->loops[j].n1, t);
}
@@ -3035,9 +3027,8 @@ expand_omp_for

RE: [PATCH] Some additional zero-extension related optimizations in simplify-rtx.

2022-07-29 Thread Roger Sayle


Hi Segher,
> > > To implement this, and some closely related transformations, we
> > > build upon the existing val_signbit_known_clear_p predicate.  In the
> > > first chunk, nonzero_bits knows that FFS and ABS can't leave the
> > > sign-bit bit set,
> >
> > Is that guaranteed in all cases?  Also at -O0, also for args bigger
> > than 64 bits?
> 
> val_signbit_known_clear_p should work for any size/precision arg.

No, you're right!  Please forgive/excuse me.  Neither val_signbit_p nor
nonzero_bits have yet been updated to use "wide_int", so don't work
for TImode or wider modes.  Doh!

I'm shocked.

Roger
--




Re: cselib: add function to check if SET is redundant [PR106187]

2022-07-29 Thread Richard Biener via Gcc-patches
On Thu, Jul 28, 2022 at 6:46 PM Richard Earnshaw
 wrote:
>
> [resend with correct subject line]
>
> A SET operation that writes memory may have the same value as an earlier
> store but if the alias sets of the new and earlier store do not conflict
> then the set is not truly redundant.  This can happen, for example, if
> objects of different types share a stack slot.
>
> To fix this we define a new function in cselib that first checks for
> equality and if that is successful then finds the earlier store in the
> value history and checks the alias sets.
>
> The routine is used in two places elsewhere in the compiler.  Firstly
> in cfgcleanup and secondly in postreload.

I can't comment on the stripping on SUBREGs and friends but it seems
to be conservative apart from

+  if (!flag_strict_aliasing || !MEM_P (dest))
+return true;

where if dest is not a MEM but were to contain one we'd miss it.
Double-checking
from more RTL literate people appreciated.

+  /* Lookup the equivalents to the dest.  This is more likely to succeed
+ than looking up the equivalents to the source (for example, when the
+ src is some form of constant).  */

I think the comment is misleading - we _do_ have to lookup the MEM,
looking up equivalences of a reg or an expression on the RHS isn't
what we are interested in.

+   return alias_sets_conflict_p (MEM_ALIAS_SET (dest),
+ MEM_ALIAS_SET (src_equiv));

that's not conservative enough - dse.cc has correct boilerplate, we have
to check both MEM_ALIAS_SET and MEM_EXPR here (the latter only
if the former load/store has a MEM_EXPR).  Note in particular
using alias_set_subset_of instead of alias_sets_conflict_p.

  /* We can only remove the later store if the earlier aliases
 at least all accesses the later one.  */
  && ((MEM_ALIAS_SET (mem) == MEM_ALIAS_SET (s_info->mem)
   || alias_set_subset_of (MEM_ALIAS_SET (mem),
   MEM_ALIAS_SET (s_info->mem)))
  && (!MEM_EXPR (s_info->mem)
  || refs_same_for_tbaa_p (MEM_EXPR (s_info->mem),
   MEM_EXPR (mem)

+  /* We failed to find a recorded value in the cselib history, so try the
+ source of this set.  */
+  rtx src = SET_SRC (set);
+  while (GET_CODE (src) == SUBREG)
+src = XEXP (src, 0);
+
+  if (MEM_P (src) && rtx_equal_for_cselib_1 (dest_addr, XEXP (src, 0),
+GET_MODE (dest), 0))
+return alias_sets_conflict_p (MEM_ALIAS_SET (dest),
+ MEM_ALIAS_SET (src));

this looks like an odd case to me - wouldn't that only catch things
like self-assignments, aka *p = *p?  So I'd simply drop this fallback.

Otherwise it looks OK to me.

Thanks,
Richard.

> gcc/ChangeLog:
> * cselib.h (cselib_redundant_set_p): Declare.
> * cselib.cc: Include alias.h
> (cselib_redundant_set_p): New function.
> * cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead
> of rtx_equal_for_cselib_p.
> * postreload.c (reload_cse_simplify): Use cselib_redundant_set_p.
> (reload_cse_noop_set_p): Delete.


Re: [PATCH v2] LoongArch: Define the macro ASM_PREFERRED_EH_DATA_FORMAT by checking the assembler's support for eh_frame encoding.

2022-07-29 Thread Lulu Cheng



在 2022/7/29 下午12:02, Lulu Cheng 写道:

The ASM_PREFERRED_EH_DATA_FORMAT macro before and after modification is as 
follows:

  #define ASM_PREFERRED_EH_DATA_FORMAT(CODE, GLOBAL) \
-  (((GLOBAL) ? DW_EH_PE_indirect : 0) | DW_EH_PE_absptr)
+  (((GLOBAL) ? DW_EH_PE_indirect : 0) | DW_EH_PE_pcrel | DW_EH_PE_sdata4)

Use the following tests to describe the effect of modifying this macro on the 
generated
assembly code:

#include 
#include 

using namespace std;
int main()
{

  try {
  throw 1;
  }
  catch (int i)
  {
cout << i << endl;
  }
}

The main comparisons related to the eh_frame section are as follows:

   OLD  
  NEW
.LFB1997 = .  |  .LFB1780 = .
   .cfi_startproc  |  
.cfi_startproc
   .cfi_personality 0x80,DW.ref.__gxx_personality_v0   |  
.cfi_personality 0x9b,DW.ref.__gxx_personality_v0
   .cfi_lsda 0,.LLSDA1997  |  
.cfi_lsda 0x1b,.LLSDA1780

If the assembly file generated by the new gcc is compiled with the binutils of 
version 2.38, the
following error will be reported:
test.s:74: Error: invalid or unsupported encoding in .cfi_personality
test.s:75: Error: invalid or unsupported encoding in .cfi_lsda

So I think I should judge whether binutils supports this encoding when gcc is 
configured, and then choose how to define
macro ASM_PREFERRED_EH_DATA_FORMAT.





.eh_frame DW_EH_PE_pcrel encoding format is not supported by gas <= 2.39.
Check if the assembler support DW_EH_PE_PCREL encoding and define .eh_frame
encoding type.

gcc/ChangeLog:

* config.in: Regenerate.
* config/loongarch/loongarch.h (ASM_PREFERRED_EH_DATA_FORMAT):
Select the value of the macro definition according to whether
HAVE_AS_EH_FRAME_PCREL_ENCODING_SUPPORT is defined.
* configure: Regenerate.
* configure.ac: Reinstate HAVE_AS_EH_FRAME_PCREL_ENCODING_SUPPORT test.
---
  gcc/config.in|  8 +++-
  gcc/config/loongarch/loongarch.h |  5 +
  gcc/configure| 34 
  gcc/configure.ac |  8 
  4 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/gcc/config.in b/gcc/config.in
index 16bb963b45b..413b2bd36cb 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -404,13 +404,19 @@
  #endif
  
  
+/* Define if your assembler supports eh_frame pcrel encoding. */

+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_EH_FRAME_PCREL_ENCODING_SUPPORT
+#endif
+
+
  /* Define if your assembler supports the R_PPC64_ENTRY relocation. */
  #ifndef USED_FOR_TARGET
  #undef HAVE_AS_ENTRY_MARKERS
  #endif
  
  
-/* Define if your assembler supports explicit relocations. */

+/* Define if your assembler supports explicit relocation. */
  #ifndef USED_FOR_TARGET
  #undef HAVE_AS_EXPLICIT_RELOCS
  #endif
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 89a5bd728fe..8b1288961e4 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -1127,8 +1127,13 @@ struct GTY (()) machine_function
  };
  #endif
  
+#ifdef HAVE_AS_EH_FRAME_PCREL_ENCODING_SUPPORT

+#define ASM_PREFERRED_EH_DATA_FORMAT(CODE, GLOBAL) \
+  (((GLOBAL) ? DW_EH_PE_indirect : 0) | DW_EH_PE_pcrel | DW_EH_PE_sdata4)
+#else
  #define ASM_PREFERRED_EH_DATA_FORMAT(CODE, GLOBAL) \
(((GLOBAL) ? DW_EH_PE_indirect : 0) | DW_EH_PE_absptr)
+#endif
  
  /* Do emit .note.GNU-stack by default.  */

  #ifndef NEED_INDICATE_EXEC_STACK
diff --git a/gcc/configure b/gcc/configure
index 7eb9479ae8e..05efa5b01ef 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -28836,6 +28836,40 @@ if test $gcc_cv_as_loongarch_explicit_relocs = yes; 
then
  
  $as_echo "#define HAVE_AS_EXPLICIT_RELOCS 1" >>confdefs.h
  
+fi

+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for eh_frame pcrel 
encoding support" >&5
+$as_echo_n "checking assembler for eh_frame pcrel encoding support... " >&6; }
+if ${gcc_cv_as_loongarch_eh_frame_pcrel_encoding_support+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_loongarch_eh_frame_pcrel_encoding_support=no
+  if test x$gcc_cv_as != x; then
+$as_echo '.cfi_startproc
+   .cfi_personality 0x9b,a
+   .cfi_lsda 0x1b,b
+   .cfi_endproc' > conftest.s
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags  -o conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+   gcc_cv_as_loongarch_eh_frame_pcrel_encoding_support=yes
+else
+  echo "configure: failed program was" >&5
+  cat conftest.s >&5
+