date:20230629

[PATCH 3/3] rs6000: Teach legitimate_address_p about LEN_{LOAD, STORE} [PR110248]

2023-06-29 Thread Kewen.Lin via Gcc-patches

Hi,

This patch teaches rs6000_legitimate_address_p to
handle the queried rtx constructed for LEN_{LOAD,STORE}.
Since lxvl and stxvl don't support x-form or ds-form
addressing, the address is considered not legitimate when
the outer code is PLUS.

This helps fix the SPEC2017 503.bwaves_r degradation
reported in PR110248 (seen with the explicit option
--param=vect-partial-vector-usage=2).  It also gives +1.69%
for 507.cactuBSSN_r and +2.35% for 510.parest_r (likely
noise); the others are neutral.

I didn't add a test case for this, as I think
checking ivopts dumps or final assembly insns
would be quite fragile.

Bootstrapped and regtested on powerpc64-linux-gnu
P7/P8/P9 and powerpc64le-linux-gnu P9/P10.

Is it ok for trunk?

BR,
Kewen
--
PR tree-optimization/110248

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_legitimate_address_p): Check if
the given code is for ifn LEN_{LOAD,STORE}, if yes then make it not
legitimate when outer code is PLUS.
---
 gcc/config/rs6000/rs6000.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index efc54528b23..f5c9289fda0 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -9856,7 +9856,7 @@ use_toc_relative_ref (rtx sym, machine_mode mode)
during assembly output.  */
 static bool
 rs6000_legitimate_address_p (machine_mode mode, rtx x, bool reg_ok_strict,
-code_helper = ERROR_MARK)
+code_helper ch = ERROR_MARK)
 {
   bool reg_offset_p = reg_offset_addressing_ok_p (mode);
   bool quad_offset_p = mode_supports_dq_form (mode);
@@ -9864,6 +9864,12 @@ rs6000_legitimate_address_p (machine_mode mode, rtx x, bool reg_ok_strict,
   if (TARGET_ELF && RS6000_SYMBOL_REF_TLS_P (x))
 return 0;

+  /* lxvl and stxvl doesn't support any addressing modes with PLUS.  */
+  if (ch.is_internal_fn ()
+  && (ch == IFN_LEN_LOAD || ch == IFN_LEN_STORE)
+  && GET_CODE (x) == PLUS)
+return 0;
+
   /* Handle unaligned altivec lvx/stvx type addresses.  */
   if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
   && GET_CODE (x) == AND
--
2.39.3

[PATCH 2/3] ivopts: Call valid_mem_ref_p with code_helper [PR110248]

2023-06-29 Thread Kewen.Lin via Gcc-patches

Hi,

As PR110248 shows, to get the expected query results in
cases where the internal functions LEN_{LOAD,STORE} are
unable to adopt some addressing modes, we need to pass down
the related IFN code as well.  This patch makes IVOPTs
pass down the ifn code for USE_PTR_ADDRESS type uses, and
adjusts the related {strict_,}memory_address_addr_space_p
and valid_mem_ref_p functions accordingly.
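
One C++ detail behind the ChangeLog below is worth spelling out: a default
argument belongs to a declaration, not to the function type, so a call made
through a function pointer has to pass the code explicitly.  The sketch below
illustrates that with hypothetical stand-ins (demo_code, demo_address_ok and
demo_query_via_pointer are invented for illustration and are not GCC
functions).

```cpp
// Hypothetical stand-in for the code argument; not GCC's code_helper.
enum demo_code { DEMO_ERROR_MARK, DEMO_LEN_LOAD };

// Direct callers may omit the last argument thanks to the default.
bool demo_address_ok (int mode, bool has_plus,
                      demo_code ch = DEMO_ERROR_MARK)
{
  (void) mode;  // the toy ignores the mode
  // Only the length-controlled load rejects PLUS addresses here.
  return !(ch == DEMO_LEN_LOAD && has_plus);
}

// Through a pointer the default is not visible, so every call site
// must name the code it wants, e.g. DEMO_ERROR_MARK.
bool demo_query_via_pointer (bool has_plus, demo_code ch)
{
  bool (*addressp) (int, bool, demo_code) = demo_address_ok;
  return addressp (0, has_plus, ch);
}
```

This would explain why the call sites that go through a function pointer in
offsettable_address_addr_space_p gain an explicit ERROR_MARK, while direct
callers can stay unchanged.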

Bootstrapped and regtested on x86_64-redhat-linux and
powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-
PR tree-optimization/110248

gcc/ChangeLog:

* recog.cc (memory_address_addr_space_p): Add one more argument ch of
type code_helper and pass it to targetm.addr_space.legitimate_address_p
instead of ERROR_MARK.
(offsettable_address_addr_space_p): Update one function pointer with
one more argument of type code_helper as its assignees
memory_address_addr_space_p and strict_memory_address_addr_space_p
have been adjusted, and adjust some call sites with ERROR_MARK.
* recog.h (tree.h): New include header file for tree_code ERROR_MARK.
(memory_address_addr_space_p): Adjust with one more unnamed argument
of type code_helper with default ERROR_MARK.
(strict_memory_address_addr_space_p): Likewise.
* reload.cc (strict_memory_address_addr_space_p): Add one unnamed
argument of type code_helper.
* tree-ssa-address.cc (valid_mem_ref_p): Add one more argument ch of
type code_helper and pass it to memory_address_addr_space_p.
* tree-ssa-address.h (valid_mem_ref_p): Adjust the declaration with
one more unnamed argument of type code_helper with default value
ERROR_MARK.
* tree-ssa-loop-ivopts.cc (get_address_cost): Use ERROR_MARK as code
by default, change it with ifn code for USE_PTR_ADDRESS type use, and
pass it to all valid_mem_ref_p calls.
---
 gcc/recog.cc                | 13 ++---
 gcc/recog.h                 | 10 +++---
 gcc/reload.cc               |  2 +-
 gcc/tree-ssa-address.cc     |  4 ++--
 gcc/tree-ssa-address.h      |  3 ++-
 gcc/tree-ssa-loop-ivopts.cc | 18 +-
 6 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/gcc/recog.cc b/gcc/recog.cc
index 692c258def6..2bff6c03e4d 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -1802,8 +1802,8 @@ pop_operand (rtx op, machine_mode mode)
for mode MODE in address space AS.  */

 bool
-memory_address_addr_space_p (machine_mode mode ATTRIBUTE_UNUSED,
-rtx addr, addr_space_t as)
+memory_address_addr_space_p (machine_mode mode ATTRIBUTE_UNUSED, rtx addr,
+addr_space_t as, code_helper ch)
 {
 #ifdef GO_IF_LEGITIMATE_ADDRESS
   gcc_assert (ADDR_SPACE_GENERIC_P (as));
@@ -1813,8 +1813,7 @@ memory_address_addr_space_p (machine_mode mode ATTRIBUTE_UNUSED,
  win:
   return true;
 #else
-  return targetm.addr_space.legitimate_address_p (mode, addr, 0, as,
- ERROR_MARK);
+  return targetm.addr_space.legitimate_address_p (mode, addr, 0, as, ch);
 #endif
 }

@@ -2430,7 +2429,7 @@ offsettable_address_addr_space_p (int strictp, machine_mode mode, rtx y,
   rtx z;
   rtx y1 = y;
   rtx *y2;
-  bool (*addressp) (machine_mode, rtx, addr_space_t) =
+  bool (*addressp) (machine_mode, rtx, addr_space_t, code_helper) =
 (strictp ? strict_memory_address_addr_space_p
 : memory_address_addr_space_p);
   poly_int64 mode_sz = GET_MODE_SIZE (mode);
@@ -2469,7 +2468,7 @@ offsettable_address_addr_space_p (int strictp, machine_mode mode, rtx y,
   *y2 = plus_constant (address_mode, *y2, mode_sz - 1);
   /* Use QImode because an odd displacement may be automatically invalid
 for any wider mode.  But it should be valid for a single byte.  */
-  good = (*addressp) (QImode, y, as);
+  good = (*addressp) (QImode, y, as, ERROR_MARK);

   /* In any case, restore old contents of memory.  */
   *y2 = y1;
@@ -2504,7 +2503,7 @@ offsettable_address_addr_space_p (int strictp, machine_mode mode, rtx y,

   /* Use QImode because an odd displacement may be automatically invalid
  for any wider mode.  But it should be valid for a single byte.  */
-  return (*addressp) (QImode, z, as);
+  return (*addressp) (QImode, z, as, ERROR_MARK);
 }

 /* Return true if ADDR is an address-expression whose effect depends
diff --git a/gcc/recog.h b/gcc/recog.h
index badf8e3dc1c..c6ef619c5dd 100644
--- a/gcc/recog.h
+++ b/gcc/recog.h
@@ -20,6 +20,9 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_RECOG_H
 #define GCC_RECOG_H

+/* For enum tree_code ERROR_MARK.  */
+#include "tree.h"
+
 /* Random number that should be large enough for all purposes.  Also define
a type that has at least MAX_RECOG_ALTERNATIVES + 1 bits, with the extra
bit giving an invalid value that can be used to mean "uninitialized".  */
@@ -200,11 +203,12 @@ extern void

[PATCH 1/3] targhooks: Extend legitimate_address_p with code_helper [PR110248]

2023-06-29 Thread Kewen.Lin via Gcc-patches

Hi,

As PR110248 shows, some middle-end passes like IVOPTs can
query the target hook legitimate_address_p with an
artificially constructed rtx to determine whether some
addressing modes are supported by the target for some gimple
statement.  But the existing legitimate_address_p only
checks the given mode, so it is unable to distinguish
some special cases.  For example, the LEN_LOAD ifn on the
Power port is expanded to the lxvl hardware insn, which
supports only one register to hold the address (the other
register holds the length), so the base (reg) + index (reg)
addressing mode is not supported.  But the hook
legitimate_address_p only considers the given mode, which
would be some vector mode for the LEN_LOAD ifn, and we do
support the base + index addressing mode for normal vector
load and store insns, so the hook unexpectedly returns
true for the query.

This patch introduces one extra argument of type
code_helper for the hook legitimate_address_p, which lets
targets handle special cases like the one described
above.  The subsequent patches show how to leverage
the new code_helper argument.
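
The problem and the fix described above can be pictured with a toy model.
This is only a sketch: toy_mode, toy_addr, toy_code and
toy_legitimate_address_p are hypothetical stand-ins, not GCC's machine_mode,
rtx, code_helper or the real hook.

```cpp
// Toy stand-ins; the real hook works on machine_mode, rtx and code_helper.
enum toy_mode { TOY_SCALAR, TOY_VECTOR };
enum toy_addr { TOY_REG, TOY_REG_PLUS_REG };
enum toy_code { TOY_ERROR_MARK, TOY_LEN_LOAD, TOY_LEN_STORE };

// With only the mode, a vector reg+reg address looks fine.  The extra
// code lets the target reject it for the length-controlled accesses.
bool toy_legitimate_address_p (toy_mode mode, toy_addr addr,
                               toy_code ch = TOY_ERROR_MARK)
{
  if ((ch == TOY_LEN_LOAD || ch == TOY_LEN_STORE)
      && addr == TOY_REG_PLUS_REG)
    return false;
  (void) mode;  // in this toy every mode accepts both address shapes
  return true;
}
```

A query without the code behaves as before, while the same query tagged with
TOY_LEN_LOAD or TOY_LEN_STORE is rejected for the reg+reg shape.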

I didn't separate the target-specific adjustments into
their own patches, since those changes are not functional
changes.  The typical change is to add one unnamed argument
with default ERROR_MARK; some ports need to include tree.h
in their {port}-protos.h since the hook is used in some
machine description files.  I've cross-built a corresponding
cc1 successfully for at least one triple of each affected
target, so I believe they are safe.  But feel free to correct
me if separating is needed for the review of this patch.

Besides, it's bootstrapped and regtested on
x86_64-redhat-linux and powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
--
PR tree-optimization/110248

gcc/ChangeLog:

* coretypes.h (class code_helper): Add forward declaration.
* doc/tm.texi: Regenerate.
* lra-constraints.cc (valid_address_p): Call target hook
targetm.addr_space.legitimate_address_p with an extra parameter
ERROR_MARK as its prototype changes.
* recog.cc (memory_address_addr_space_p): Likewise.
* reload.cc (strict_memory_address_addr_space_p): Likewise.
* target.def (legitimate_address_p, addr_space.legitimate_address_p):
Extend with one more argument of type code_helper, update the
documentation accordingly.
* targhooks.cc (default_legitimate_address_p): Adjust for the
new code_helper argument.
(default_addr_space_legitimate_address_p): Likewise.
* targhooks.h (default_legitimate_address_p): Likewise.
(default_addr_space_legitimate_address_p): Likewise.
* config/aarch64/aarch64.cc (aarch64_legitimate_address_hook_p): Adjust
with extra unnamed code_helper argument with default ERROR_MARK.
* config/alpha/alpha.cc (alpha_legitimate_address_p): Likewise.
* config/arc/arc.cc (arc_legitimate_address_p): Likewise.
* config/arm/arm-protos.h (arm_legitimate_address_p): Likewise.
(tree.h): New include for tree_code ERROR_MARK.
* config/arm/arm.cc (arm_legitimate_address_p): Adjust with extra
unnamed code_helper argument with default ERROR_MARK.
* config/avr/avr.cc (avr_addr_space_legitimate_address_p): Likewise.
* config/bfin/bfin.cc (bfin_legitimate_address_p): Likewise.
* config/bpf/bpf.cc (bpf_legitimate_address_p): Likewise.
* config/c6x/c6x.cc (c6x_legitimate_address_p): Likewise.
* config/cris/cris-protos.h (cris_legitimate_address_p): Likewise.
(tree.h): New include for tree_code ERROR_MARK.
* config/cris/cris.cc (cris_legitimate_address_p): Adjust with extra
unnamed code_helper argument with default ERROR_MARK.
* config/csky/csky.cc (csky_legitimate_address_p): Likewise.
* config/epiphany/epiphany.cc (epiphany_legitimate_address_p):
Likewise.
* config/frv/frv.cc (frv_legitimate_address_p): Likewise.
* config/ft32/ft32.cc (ft32_addr_space_legitimate_address_p): Likewise.
* config/gcn/gcn.cc (gcn_addr_space_legitimate_address_p): Likewise.
* config/h8300/h8300.cc (h8300_legitimate_address_p): Likewise.
* config/i386/i386.cc (ix86_legitimate_address_p): Likewise.
* config/ia64/ia64.cc (ia64_legitimate_address_p): Likewise.
* config/iq2000/iq2000.cc (iq2000_legitimate_address_p): Likewise.
* config/lm32/lm32.cc (lm32_legitimate_address_p): Likewise.
* config/loongarch/loongarch.cc (loongarch_legitimate_address_p):
Likewise.
* config/m32c/m32c.cc (m32c_legitimate_address_p): Likewise.
(m32c_addr_space_legitimate_address_p): Likewise.
* config/m32r/m32r.cc (m32r_legitimate_address_p): Likewise.
* config/m68k/m68k.cc (m68k_legitimate_address_p): Likewise.
* config/mcore/mcore.cc

[PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap

2023-06-29 Thread Eugene Rozenfeld via Gcc-patches

When we collect just user events for autofdo with lbr we get some events where
branch sources are kernel addresses and branch targets are user addresses.
Without kernel MMAP events create_gcov can't make sense of kernel addresses.
Currently create_gcov fails if it can't map at least 95% of events. We
sometimes get below this threshold with just user events. The change is to
collect both user events and kernel events.
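
To make the threshold concrete, here is a toy model of the mapping check.  It
is only a sketch: demo_branch, demo_kernel_base and demo_mapped_fraction are
hypothetical, and create_gcov's real bookkeeping is not shown in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct demo_branch { std::uint64_t from, to; };

// Assumption for the toy: addresses at or above this value are kernel ones.
constexpr std::uint64_t demo_kernel_base = 0xffff800000000000ULL;

// Fraction of branch records whose source and target can both be mapped.
// Kernel addresses are only mappable when kernel MMAP events were recorded.
double demo_mapped_fraction (const std::vector<demo_branch> &events,
                             bool have_kernel_maps)
{
  if (events.empty ())
    return 1.0;
  std::size_t mapped = 0;
  for (const demo_branch &e : events)
    {
      bool from_ok = have_kernel_maps || e.from < demo_kernel_base;
      bool to_ok = have_kernel_maps || e.to < demo_kernel_base;
      if (from_ok && to_ok)
        ++mapped;
    }
  return static_cast<double> (mapped) / events.size ();
}
```

In this model, one kernel-sourced branch among ten records drops the mapped
fraction to 90%, below the 95% limit, unless kernel maps are available.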

Tested on x86_64-pc-linux-gnu.

ChangeLog:

* Makefile.in: Collect both kernel and user events for autofdo
* Makefile.tpl: Collect both kernel and user events for autofdo

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Collect both kernel and user events for autofdo
---
 Makefile.in   | 2 +-
 Makefile.tpl  | 2 +-
 gcc/testsuite/lib/target-supports.exp | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index f19a9db621e..04307ca561b 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -404,7 +404,7 @@ MAKEINFO = @MAKEINFO@
 EXPECT = @EXPECT@
 RUNTEST = @RUNTEST@
 
-AUTO_PROFILE = gcc-auto-profile -c 1000
+AUTO_PROFILE = gcc-auto-profile --all -c 1000
 
 # This just becomes part of the MAKEINFO definition passed down to
 # sub-makes.  It lets flags be given on the command line while still
diff --git a/Makefile.tpl b/Makefile.tpl
index 3a5b7ed3c92..d0fe7e2fb77 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -407,7 +407,7 @@ MAKEINFO = @MAKEINFO@
 EXPECT = @EXPECT@
 RUNTEST = @RUNTEST@
 
-AUTO_PROFILE = gcc-auto-profile -c 1000
+AUTO_PROFILE = gcc-auto-profile --all -c 1000
 
 # This just becomes part of the MAKEINFO definition passed down to
 # sub-makes.  It lets flags be given on the command line while still
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4d04df2a709..b16853d76df 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -704,7 +704,7 @@ proc check_effective_target_keeps_null_pointer_checks { } {
 # this allows parallelism of 16 and higher of parallel gcc-auto-profile
 proc profopt-perf-wrapper { } {
 global srcdir
-return "$srcdir/../config/i386/gcc-auto-profile -m8 "
+return "$srcdir/../config/i386/gcc-auto-profile --all -m8 "
 }
 
 # Return true if profiling is supported on the target.
-- 
2.25.1

[PATCH] tree.h: Hide wi::from_mpz from GENERATOR_FILE

2023-06-29 Thread Kewen.Lin via Gcc-patches

Hi,

Similar to r0-85707-g34917a102a4e0c for PR35051, the uses
of mpz_t should be guarded with "#ifndef GENERATOR_FILE".
This patch is to fix it and avoid some possible build
errors.
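
The guard pattern can be sketched in isolation.  The names below (demo_big,
demo_wi) are hypothetical stand-ins; the assumption is that the guarded type
is simply not declared when GENERATOR_FILE is defined, as is the case for
mpz_t in generator programs.

```cpp
// Hypothetical stand-in for a type that is only declared outside
// generator programs (the role mpz_t plays in the real header).
#ifndef GENERATOR_FILE
struct demo_big { long value; };
#endif

namespace demo_wi
{
  // Usable everywhere: needs no guarded type.
  inline long min_value (int bits) { return -(1L << (bits - 1)); }

#ifndef GENERATOR_FILE
  // Declared only where demo_big itself is declared, so a build with
  // -DGENERATOR_FILE never sees a reference to a missing type.
  inline long from_big (const demo_big &x) { return x.value; }
#endif
}
```

Compiling this with and without -DGENERATOR_FILE succeeds in both cases;
without the inner guard the first build would fail on the unknown type.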

Bootstrapped and regress-tested on x86_64-redhat-linux
and powerpc64{,le}-linux-gnu.  Also cross-built successfully
on Power for 40+ different ports.

Is it ok for trunk?

gcc/ChangeLog:

* tree.h (wi::from_mpz): Hide from GENERATOR_FILE.
---
 gcc/tree.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/tree.h b/gcc/tree.h
index 1854fe4a7d4..7e92a12f9cb 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -6460,7 +6460,9 @@ namespace wi

   wide_int min_value (const_tree);
   wide_int max_value (const_tree);
+#ifndef GENERATOR_FILE
   wide_int from_mpz (const_tree, mpz_t, bool);
+#endif
 }

 template 
--
2.39.3

Re: PR82943 - Suggested patch to fix

2023-06-29 Thread Steve Kargl via Gcc-patches

On Thu, Jun 29, 2023 at 10:38:42PM -0500, Alexander Westbrooks via Fortran wrote:
> I have finished my testing, and updated my patch and relevant Changelogs. I
> added 4 new tests and all the existing tests in the current testsuite
> for gfortran passed or failed as expected. Do I need to attach the test
> results here?

Yes.  It helps others who also do testing to have one self-contained
patch (which I don't know how to generate with git and new files :-( ).
It may also be a good idea to attach the patch and test cases to
the PR in bugzilla so that they don't accidentally get lost.

> The platform I tested on was a Docker container running in Docker Desktop,
> running the "mcr.microsoft.com/devcontainers/universal:2-linux" image.
> 
> I also made sure that my code changes followed the coding standards. Please
> let me know if there is anything else that I need to do. I don't have
> write-access to the repository.

See the legal link that Harald provided.  At one time, one needed to
assign copyright to the FSF with a wet-ink signature on some form.
Now, I think you just need to attest that you have the right to
provide the code to the gcc project.

PS: Welcome to the gfortran development world.  Don't be put off
if there is a delay in getting feedback/review.  There are too 
few contributors and too little time.   If a week passes simply
ping the mailing list.  I'll try to carve out some time to look
over your patch this weekend.

-- 
steve


> 
> Thanks,
> 
> Alexander
> 
> On Wed, Jun 28, 2023 at 4:14 PM Harald Anlauf  wrote:
> 
> > Hi Alex,
> >
> > welcome to the gfortran community.  It is great that you are trying
> > to get actively involved.
> >
> > You already did quite a few things right: patches shall be sent to
> > the gcc-patches ML, but Fortran reviewers usually notice them only
> > where they are copied to the fortran ML.
> >
> > There are some general recommendations on the formatting of C code,
> > like indentation, of the patches, and of the commit log entries.
> >
> > Regarding coding standards, see https://www.gnu.org/prep/standards/ .
> >
> > Regarding testcases, a recommendation is to have a look at
> > existing testcases, e.g. in gcc/testsuite/gfortran.dg/, and then
> > decide if the testcase shall test the compile-time or run-time
> > behaviour, and add the necessary dejagnu directives.
> >
> > You should also verify if your patch passes regression testing.
> > For changes to gfortran, it is usually sufficient to run
> >
> > make check-fortran -j 
> >
> > where  is the number of parallel tests.
> > You would need to report also the platform where you tested on.
> >
> > There is also a legal issue to consider before non-trivial patches can
> > be accepted for incorporation: https://gcc.gnu.org/contribute.html#legal
> >
> > If your patch is accepted and if you do not have write-access to the
> > repository, one of the maintainers will likely take care of it.
> > If you become a regular contributor, you will probably want to consider
> > getting write access.
> >
> > Cheers,
> > Harald
> >
> >
> >
> > On 6/24/23 19:17, Alexander Westbrooks via Gcc-patches wrote:
> > > Hello,
> > >
> > > I am new to the GFortran community. Over the past two weeks I created a
> > > patch that should fix PR82943 for GFortran. I have attached it to this
> > > email. The patch allows the code below to compile successfully. I am
> > > > working on creating test cases next, but I am new to the process so it may
> > > > take me some time. After I make test cases, do I email them to you as well?
> > > > Do I need to make a pull-request on github in order to get the patch
> > > > reviewed?
> > >
> > > Thank you,
> > >
> > > Alexander Westbrooks
> > >
> > > module testmod
> > >
> > >  public :: foo
> > >
> > >  type, public :: tough_lvl_0(a, b)
> > >  integer, kind :: a = 1
> > >  integer, len :: b
> > >  contains
> > >  procedure :: foo
> > >  end type
> > >
> > >  type, public, EXTENDS(tough_lvl_0) :: tough_lvl_1 (c)
> > >  integer, len :: c
> > >  contains
> > >  procedure :: bar
> > >  end type
> > >
> > >  type, public, EXTENDS(tough_lvl_1) :: tough_lvl_2 (d)
> > >  integer, len :: d
> > >  contains
> > >  procedure :: foobar
> > >  end type
> > >
> > > contains
> > >  subroutine foo(this)
> > >  class(tough_lvl_0(1,*)), intent(inout) :: this
> > >  end subroutine
> > >
> > >  subroutine bar(this)
> > >  class(tough_lvl_1(1,*,*)), intent(inout) :: this
> > >  end subroutine
> > >
> > >  subroutine foobar(this)
> > >  class(tough_lvl_2(1,*,*,*)), intent(inout) :: this
> > >  end subroutine
> > >
> > > end module
> > >
> > > PROGRAM testprogram
> > >  USE testmod
> > >
> > >  TYPE(tough_lvl_0(1,5)) :: test_pdt_0
> > >  TYPE(tough_lvl_1(1,5,6))   :: test_pdt_1
> > >  TYPE(tough_lvl_2(1,5,6,7)) :: test_pdt_2
> > >
> > >  CALL test_pdt_0%foo()
> > >
> > >

PR108672 re-fixed after [PATCH] libstdc++: Synchronize PSTL with upstream

2023-06-29 Thread Hans-Peter Nilsson via Gcc-patches

> Date: Mon, 26 Jun 2023 11:57:49 -0700
> From: Thomas Rodgers via Gcc-patches 

> On Wed, May 17, 2023 at 12:32 PM Jonathan Wakely  wrote:
> > All the actual code changes look good.

Unfortunately, this overwrote the fix for PR108672.  I take
it there's a step missing from the synchronization process;
a check that no local commits are overwritten?  Sounds like
something that can be fully scripted (not volunteering) or
already available (like, "list all commits affecting
contents touched by/between two named commits").

I did *not* check whether any other local commits were also
overwritten.  Also, not sure about whether better try to get
this upstreamed: __INT32_TYPE__ seems gcc-specific.
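
For reference, a sketch of how a header can get a 32-bit type without
including <cstdint>, with a fallback for compilers that do not predefine the
macro.  demo_int32 and demo_any are hypothetical names; the loop only mimics
the flag-reduction shape of the patched code and is not the PSTL
implementation.

```cpp
// GCC (and compilers that mimic it) predefine __INT32_TYPE__, so no
// header is required to name a 32-bit integer type.
#if defined(__INT32_TYPE__)
typedef __INT32_TYPE__ demo_int32;
#else
#include <cstdint>
typedef std::int32_t demo_int32;
#endif

static_assert (sizeof (demo_int32) == 4, "expected a 32-bit type");

// Returns true if any element in [first, last) satisfies pred.
template <typename It, typename Pred>
bool demo_any (It first, It last, Pred pred)
{
  demo_int32 flag = 1;
  for (; first != last; ++first)
    if (pred (*first))
      flag = 0;  // cleared once a match is seen
  return !flag;
}
```

The fallback branch is one possible answer to the upstreaming concern, since
it keeps the code usable where the macro is not predefined.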

Anyway, r13-5702-g72058eea9d407e was "re-committed" per
below as obvious after regtesting cris-elf.

brgds, H-P

-- >8 --
Subject: libstdc++: Re-apply PR108672 fix (avoid use of naked int32_t in unseq_backend_simd.h)

The fix was overwritten by r14-2109-g3162ca09dbdc2e "libstdc++:
Synchronize PSTL with upstream".

libstdc++-v3:

PR libstdc++/108672
* include/pstl/unseq_backend_simd.h (__simd_or): Re-apply using
__INT32_TYPE__ instead of int32_t.
---
 libstdc++-v3/include/pstl/unseq_backend_simd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/pstl/unseq_backend_simd.h b/libstdc++-v3/include/pstl/unseq_backend_simd.h
index 69784bcdbe66..f3c38fbbbc2a 100644
--- a/libstdc++-v3/include/pstl/unseq_backend_simd.h
+++ b/libstdc++-v3/include/pstl/unseq_backend_simd.h
@@ -74,7 +74,7 @@ __simd_or(_Index __first, _DifferenceType __n, _Pred __pred) noexcept
 const _Index __last = __first + __n;
 while (__last != __first)
 {
-int32_t __flag = 1;
+__INT32_TYPE__ __flag = 1;
 _PSTL_PRAGMA_SIMD_REDUCTION(& : __flag)
 for (_DifferenceType __i = 0; __i < __block_size; ++__i)
 if (__pred(*(__first + __i)))
-- 
2.30.2

Re: PR82943 - Suggested patch to fix

2023-06-29 Thread Alexander Westbrooks via Gcc-patches

Hello,

I have finished my testing, and updated my patch and relevant Changelogs. I
added 4 new tests and all the existing tests in the current testsuite
for gfortran passed or failed as expected. Do I need to attach the test
results here?

The platform I tested on was a Docker container running in Docker Desktop,
running the "mcr.microsoft.com/devcontainers/universal:2-linux" image.

I also made sure that my code changes followed the coding standards. Please
let me know if there is anything else that I need to do. I don't have
write-access to the repository.

Thanks,

Alexander

On Wed, Jun 28, 2023 at 4:14 PM Harald Anlauf  wrote:

> Hi Alex,
>
> welcome to the gfortran community.  It is great that you are trying
> to get actively involved.
>
> You already did quite a few things right: patches shall be sent to
> the gcc-patches ML, but Fortran reviewers usually notice them only
> where they are copied to the fortran ML.
>
> There are some general recommendations on the formatting of C code,
> like indentation, of the patches, and of the commit log entries.
>
> Regarding coding standards, see https://www.gnu.org/prep/standards/ .
>
> Regarding testcases, a recommendation is to have a look at
> existing testcases, e.g. in gcc/testsuite/gfortran.dg/, and then
> decide if the testcase shall test the compile-time or run-time
> behaviour, and add the necessary dejagnu directives.
>
> You should also verify if your patch passes regression testing.
> For changes to gfortran, it is usually sufficient to run
>
> make check-fortran -j 
>
> where  is the number of parallel tests.
> You would need to report also the platform where you tested on.
>
> There is also a legal issue to consider before non-trivial patches can
> be accepted for incorporation: https://gcc.gnu.org/contribute.html#legal
>
> If your patch is accepted and if you do not have write-access to the
> repository, one of the maintainers will likely take care of it.
> If you become a regular contributor, you will probably want to consider
> getting write access.
>
> Cheers,
> Harald
>
>
>
> On 6/24/23 19:17, Alexander Westbrooks via Gcc-patches wrote:
> > Hello,
> >
> > I am new to the GFortran community. Over the past two weeks I created a
> > patch that should fix PR82943 for GFortran. I have attached it to this
> > email. The patch allows the code below to compile successfully. I am
> > working on creating test cases next, but I am new to the process so it may
> > take me some time. After I make test cases, do I email them to you as well?
> > Do I need to make a pull-request on github in order to get the patch
> > reviewed?
> >
> > Thank you,
> >
> > Alexander Westbrooks
> >
> > module testmod
> >
> >  public :: foo
> >
> >  type, public :: tough_lvl_0(a, b)
> >  integer, kind :: a = 1
> >  integer, len :: b
> >  contains
> >  procedure :: foo
> >  end type
> >
> >  type, public, EXTENDS(tough_lvl_0) :: tough_lvl_1 (c)
> >  integer, len :: c
> >  contains
> >  procedure :: bar
> >  end type
> >
> >  type, public, EXTENDS(tough_lvl_1) :: tough_lvl_2 (d)
> >  integer, len :: d
> >  contains
> >  procedure :: foobar
> >  end type
> >
> > contains
> >  subroutine foo(this)
> >  class(tough_lvl_0(1,*)), intent(inout) :: this
> >  end subroutine
> >
> >  subroutine bar(this)
> >  class(tough_lvl_1(1,*,*)), intent(inout) :: this
> >  end subroutine
> >
> >  subroutine foobar(this)
> >  class(tough_lvl_2(1,*,*,*)), intent(inout) :: this
> >  end subroutine
> >
> > end module
> >
> > PROGRAM testprogram
> >  USE testmod
> >
> >  TYPE(tough_lvl_0(1,5)) :: test_pdt_0
> >  TYPE(tough_lvl_1(1,5,6))   :: test_pdt_1
> >  TYPE(tough_lvl_2(1,5,6,7)) :: test_pdt_2
> >
> >  CALL test_pdt_0%foo()
> >
> >  CALL test_pdt_1%foo()
> >  CALL test_pdt_1%bar()
> >
> >  CALL test_pdt_2%foo()
> >  CALL test_pdt_2%bar()
> >  CALL test_pdt_2%foobar()
> >
> >
> > END PROGRAM testprogram
>
>


0001-Fix-fortran-PR82943-PR86148-and-PR86268.patch
Description: Binary data

Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-29 Thread Kewen.Lin via Gcc-patches

Hi Carl,

on 2023/6/30 05:36, Carl Love wrote:
> Kewen:
> 
> On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote:
>>> Yea, I was going with a runnable test and didn't include the
>>> instruction counts.  Added back in.  Rather than doing it by processor
>>> version (P8, P9, P10) I was able to do it by BE/LE.  The instruction
>>> counts were the same for LE across processor versions but there are a
>>> few instruction counts that vary with BE and LE.
>>
>> But the original test case only checks for cpu-types (processor version)
>> but not for endianness, which means that for the bif usages there should
>> not be any difference for endianness.  Why does this change with your new
>> test case?  Could you have a further look and make it consistent with some
>> adjustment if possible?  As we know, checking insn counts is sometimes
>> fragile, so I think we should try our best to make it as robust as
>> possible in the first place.
>>
>> Besides, the original case also has some differences between p7/p8 and
>> p9.
>>   
> 
> There are differences on P8 LE versus BE.  I did a diff between the P8
> and P9 tests:
> 
>  diff vsx-vector-6.p8.c vsx-vector-6.p9.c
> 3,4c3,4
> < /* { dg-require-effective-target powerpc_p8vector_ok } */
> < /* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> ---
>> /* { dg-require-effective-target powerpc_p9vector_ok } */
>> /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> 12c12
> < /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
> ---
>> /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
> 23d22
> < /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
> 37c36
> < /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */
> ---
>> /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
> 
> So we can see the vperm, vpermr, xxpermr, xvmsubadp, xvmsubmdp and
> xvsubdp instruction count checks are different
> between the two architectures.  I then wrote a script to compile the
> CPU specific test on Power 8, Power 9 and Power 10 architectures and
> then grep for the above list of instructions.  If I run the script on P8
> BE and LE I get:
> 
> instruction   Power 8 BE    Power 8 LE   Power 9 LE   Power 9 BE   Power 10 LE*
>               (makalu-lp1)  (genoa)      (marlin)     (nilram)     (ltcd97-lp3)
>               count         count        count        count        count
> vperm         1             1            0            0            0
> vpermr        0             0            0            0            0
> xxpermr       0             0            1            0            1
> xvmsubadp     1             0            1            1            1
> xvmsubmdp     0             1            0            0            0
> xvsubdp       1             1            1            1            1
> 

Thanks for looking into this and collecting these statistics.

Is there a typo in the nilram column?   Otherwise, the insn check below

/* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */

would fail there.
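
As a quick way to see what that pattern accepts, here is a sketch using
std::regex.  It approximates Tcl's \m and \M word anchors with \b, which
behaves the same for this pattern; demo_count_perm is a hypothetical helper,
not part of the testsuite.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Counts whole-word occurrences of vperm, vpermr, xxperm or xxpermr.
// xxpermdi is not counted because no word boundary follows "xxperm".
long demo_count_perm (const std::string &asm_text)
{
  static const std::regex re (R"(\b(?:v|xx)permr?\b)");
  return std::distance (std::sregex_iterator (asm_text.begin (),
                                              asm_text.end (), re),
                        std::sregex_iterator ());
}
```

With counts of zero for vperm, vpermr and xxpermr, as listed for nilram, such
a count comes out as 0 rather than the expected 1.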

> 
> From the diff we see 
> 
>   { dg-final {scan-assembler-times {\mxvmsub[am]dp\M} 1 } }
> 
> This test picks up the correct subtraction instruction for LE versus BE
> so this "masks" the LE/BE difference.  I changed the check in vsx-
> vector-6-func-3op.c to match.  This eliminates the LE and BE checks and
> reduces the number of specific checks.

OK, nice.

> 
> In vsx-vector-6-func-3op.c, the new test checks the counts for
> xxpermdi, which the original test does not check.  The checks for
> xxpermdi are not needed.  They are not directly related to the builtin
> tests.  I removed them.

OK.

> 
> Looking at the LE/BE checks in the other test file vsx-vector-6-func-
> 2op.c, instructions xvmaxsp, xvminsp and xvmaxdp were not checked in
> the original test.  The functions where these instructions are used get
> inlined.  On LE, the binary instructions show up in the inlined code as
> well as what appears to be the binary for the original, non-inlined
> function.  Best I can see, the binary for the original function is dead
> code.  I don't see any calls to it.  Seems like it shouldn't be there
> as it would make the binary smaller. On BE, I don't see the binary for
> the original non-inlined function.  
> 
> I had played with putting -Wno-inline on the command line but that
> didn't seem to make any difference.  However, your suggestion of
> __attribute__ ((noipa)) does prevent the inlining and we don't get the
> second copy of the instructions showing up.  Preventing the inlining
> eliminated the LE/BE differences for xvmaxsp, xvminsp and xvmaxdp.

-Winline is an option for a warning: "Warn if a function that is declared
as inline cannot be inlined."  I think what you wanted is -fno-inline,
and it's good to know noipa helps here.
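
A minimal sketch of the noipa usage, for reference.  DEMO_NOIPA, demo_max and
demo_caller are hypothetical names; the attribute is GCC-specific, so the
macro expands to nothing elsewhere (an assumption made to keep the sketch
portable).

```cpp
// noipa disables interprocedural optimizations (including inlining and
// cloning) between the function and its callers.  GCC-specific.
#if defined(__GNUC__) && !defined(__clang__)
#define DEMO_NOIPA __attribute__ ((noipa))
#else
#define DEMO_NOIPA /* assumption: no equivalent attribute available */
#endif

// Kept out of line, so its instructions appear exactly once.
DEMO_NOIPA double demo_max (double a, double b)
{
  return a > b ? a : b;
}

// The call below stays a real call instead of an inlined copy.
double demo_caller (double a, double b)
{
  return demo_max (a, b);
}
```

Because the attribute sits on the function itself, the instruction counts do
not depend on which optimization flags the test happens to be run with.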

> 
> The instruction count test for xxlor in vsx-vector-6-func-2lop.c
> differs on LE and BE vsx-vector-6-func-2op.c.  I believe the
> instruction is used with loads to

Re: [PATCH] rs6000, __builtin_set_fpscr_rn add return value

2023-06-29 Thread Peter Bergner via Gcc-patches

On 6/29/23 4:13 AM, Kewen.Lin wrote:
> on 2023/6/19 23:57, Carl Love wrote:
>> The following patch changes the return type of the
>>  __builtin_set_fpscr_rn builtin from void to double.  The return value
>> is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
>> XE, NI, RN bit positions when the builtin is called.  The builtin then
>> updated the RN field with the new value provided as an argument to the
>> builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
>> check that the builtin returns the current value of the FPSCR fields
>> and then updates the RN field.
> 
> But this patch also introduces one more overloading instance with argument
> type double, I had a check on glibc usage of mffscrn/mffscrni, I don't
> think it's necessary to add this new instance, as it takes the given
> rounding mode as integral type.

I agree with Ke Wen, I don't know why there is an extra overloaded
instance.  I assumed we'd have just the one we had before, except now
its return type is double and not void.

That said, we need to announce to users that the return type has
changed, so new code compiled with a new GCC can get the new return
value, but if that new code is compiled with an old GCC (still has void
return type), it knows it needs to use some other method to get the
FPSCR value, if it needs it.  We normally do that with a predefined
macro (#define ...) that the user can test for in their code, à la:

#ifdef __SET_FPSCR_RN_RETURNS_FPSCR__  /* Or whatever name.  */
  ret = __builtin_set_fpscr_rn (..);
#else
  __builtin_set_fpscr_rn (..);
  ret = ;
#endif

We add those predefined macros in rs6000-c.cc:rs6000_target_modify_macros()
and they should be gated by the same conditions under which the built-in
itself is enabled.  Look at my addition of the __TM_FENCE__ macro that lets
users know our HTM insn patterns were fixed to now act as memory barriers.
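
A rough, portable sketch of that pattern follows.  Everything named demo_*
or DEMO_* is a stand-in invented for illustration: the real built-in only
exists on PowerPC, and the real macro name has not been decided in this
thread.  The "FPSCR" here is just a variable whose low two bits model RN.

```c
#include <assert.h>

/* Stand-in for the FPSCR; only the two RN bits are modelled.  */
static unsigned demo_fpscr = 0;

/* Hypothetical feature macro: defined when the "set RN" operation
   returns the previous register contents.  */
#define DEMO_SET_RN_RETURNS_OLD 1

#ifdef DEMO_SET_RN_RETURNS_OLD
/* New behaviour: return the old value, then update RN.  */
static unsigned
demo_set_rn (unsigned rn)
{
  unsigned old = demo_fpscr;
  demo_fpscr = (demo_fpscr & ~3u) | (rn & 3u);
  return old;
}
#else
/* Old behaviour: update RN, return nothing.  */
static void
demo_set_rn (unsigned rn)
{
  demo_fpscr = (demo_fpscr & ~3u) | (rn & 3u);
}
#endif

/* Some other way to read the register, needed with the old behaviour.  */
unsigned
demo_read_fpscr (void)
{
  return demo_fpscr;
}

/* User code that works with either behaviour by testing the macro.  */
unsigned
set_rn_and_get_previous (unsigned rn)
{
#ifdef DEMO_SET_RN_RETURNS_OLD
  return demo_set_rn (rn);
#else
  unsigned old = demo_read_fpscr ();
  demo_set_rn (rn);
  return old;
#endif
}
```

The point of the macro is that the same source compiles with both old and
new compilers, without relying on a compiler version check.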

Peter

[PATCH V2] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern

2023-06-29 Thread juzhe . zhong

From: Ju-Zhe Zhong 

Hi, Richi and Richard.

This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to
handle flow control by mask and loop control by length on gather/scatter memory
operations. Consider the following case:

#include <stdint.h>
void
f (uint8_t *restrict a,
   uint8_t *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i * step + base] = b[i * step + base];
    }
}

We hope RVV can vectorize such a case into the following IR:

loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask)
LEN_MASK_SCATTER_STORE (... v, ..., loop_len, control_mask)
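
For illustration only, the intended per-lane behaviour can be modelled in
scalar C as below.  This is a sketch under assumptions, not the expander or
any target implementation: the function names are invented, the offsets are
taken as already scaled element indices, and a lane is treated as active when
i < len and its mask element is nonzero.  Inactive lanes of the gather result
are simply left untouched here, whereas the pattern leaves them undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a length- and mask-controlled gather load:
   lane i is loaded only if i < len and mask[i] is set.  */
void
model_len_mask_gather_load (uint8_t *dst, const uint8_t *base,
                            const int *offsets, size_t len,
                            const uint8_t *mask)
{
  for (size_t i = 0; i < len; ++i)
    if (mask[i])
      dst[i] = base[offsets[i]];
}

/* Scalar model of a length- and mask-controlled scatter store:
   lane i is stored only if i < len and mask[i] is set.  */
void
model_len_mask_scatter_store (uint8_t *base, const int *offsets,
                              const uint8_t *src, size_t len,
                              const uint8_t *mask)
{
  for (size_t i = 0; i < len; ++i)
    if (mask[i])
      base[offsets[i]] = src[i];
}
```

In the loop from the example above, the offsets would correspond to
i * step + base and the mask to cond[i] != 0.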

This patch doesn't apply these patterns in the vectorizer; it just adds the
patterns and updates the documentation.

I will send a patch that applies these patterns in the vectorizer soon after
this patch is approved.

Thanks.

gcc/ChangeLog:

* doc/md.texi: Add LEN_MASK_{GATHER_LOAD,SCATTER_STORE}.
* internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
(expand_gather_load_optab_fn): Ditto.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_gather_scatter_fn_p): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
* internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
(LEN_MASK_SCATTER_STORE): Ditto.
* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi | 17 +
 gcc/internal-fn.cc  | 32 ++--
 gcc/internal-fn.def |  8 ++--
 gcc/optabs.def  |  2 ++
 4 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 9648fdc846a..b84aaab7075 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5040,6 +5040,15 @@ operand 5.  Bit @var{i} of the mask is set if element 
@var{i}
 of the result should be loaded from memory and clear if element @var{i}
 of the result should be set to zero.
 
+@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_gather_load@var{m}@var{n}}
+Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand 
(operand 5)
+as well as a mask operand (operand 6). Similar to len_maskload, the 
instruction loads
+at most (operand 5) elements from memory.
+Bit @var{i} of the mask is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be 
undefined.
+Mask elements @var{i} with i > (operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an 
extra mask operand as
 operand 5.  Bit @var{i} of the mask is set if element @var{i}
 of the result should be stored to memory.
 
+@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_scatter_store@var{m}@var{n}}
+Like @samp{scatter_store@var{m}@var{n}}, but takes an extra length operand 
(operand 5)
+as well as a mask operand (operand 6). The instruction stores at most (operand 
5) elements
+of (operand 4) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
stored.
+Mask elements @var{i} with i > (operand 5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9017176dc7a..e4b558e33d8 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3537,7 +3537,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, 
direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);
   rtx rhs_rtx = expand_normal (rhs);
 
-  class expand_operand ops[6];
+  class expand_operand ops[7];
   int i = 0;
   create_address_operand (&ops[i++], base_rtx);
   create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
@@ -3546,6 +3546,14 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, 
direct_optab optab)
   create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
   if (mask_index >= 0)
 {
+  if (optab == len_mask_scatter_store_optab)
+   {
+ tree len = gimple_call_arg (stmt, mask_index - 1);
+ rtx len_rtx = expand_normal (len);
+ create_convert_operand_from (&ops[i++], len_rtx,
+  TYPE_MODE (TREE_TYPE (len)),
+  TYPE_UNSIGNED (TREE_TYPE (len)));
+   }
   tree mask = gimple_call_arg (stmt, mask_index);
   rtx mask_rtx = expand_normal (mask);
   create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
@@ -3572,7 +3580,7 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, 
direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);

[PATCH v1 4/6] LoongArch: Added Loongson ASX vector directive compilation framework.

2023-06-29 Thread Chenghui Pan

From: Lulu Cheng 

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings: Added compilation 
framework.
* config/loongarch/genopts/loongarch.opt.in: Ditto.
* config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins): Ditto.
* config/loongarch/loongarch-def.c: Ditto.
* config/loongarch/loongarch-def.h (N_ISA_EXT_TYPES): Ditto.
(ISA_EXT_SIMD_LASX): Ditto.
(N_SWITCH_TYPES): Ditto.
(SW_LASX): Ditto.
* config/loongarch/loongarch-driver.cc (driver_get_normalized_m_opts): 
Ditto.
* config/loongarch/loongarch-driver.h (driver_get_normalized_m_opts): 
Ditto.
* config/loongarch/loongarch-opts.cc (isa_str): Ditto.
* config/loongarch/loongarch-opts.h (ISA_HAS_LSX): Ditto.
(ISA_HAS_LASX): Ditto.
* config/loongarch/loongarch-str.h (OPTSTR_LASX): Ditto.
* config/loongarch/loongarch.opt: Ditto.
---
 gcc/config/loongarch/genopts/loongarch-strings |  1 +
 gcc/config/loongarch/genopts/loongarch.opt.in  |  4 
 gcc/config/loongarch/loongarch-c.cc| 11 +++
 gcc/config/loongarch/loongarch-def.c   |  4 +++-
 gcc/config/loongarch/loongarch-def.h   |  6 --
 gcc/config/loongarch/loongarch-driver.cc   |  2 +-
 gcc/config/loongarch/loongarch-driver.h|  1 +
 gcc/config/loongarch/loongarch-opts.cc |  9 -
 gcc/config/loongarch/loongarch-opts.h  |  4 +++-
 gcc/config/loongarch/loongarch-str.h   |  1 +
 gcc/config/loongarch/loongarch.opt |  4 
 11 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch-strings 
b/gcc/config/loongarch/genopts/loongarch-strings
index 24a5025061f..35d08f5967d 100644
--- a/gcc/config/loongarch/genopts/loongarch-strings
+++ b/gcc/config/loongarch/genopts/loongarch-strings
@@ -42,6 +42,7 @@ OPTSTR_DOUBLE_FLOAT   double-float
 
 # SIMD extensions
 OPTSTR_LSX lsx
+OPTSTR_LASX lasx
 
 # -mabi=
 OPTSTR_ABI_BASE  abi
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 7ea3f0dea5d..d1c2d2fef34 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -80,6 +80,10 @@ m@@OPTSTR_LSX@@
 Target RejectNegative Var(la_opt_switches) Mask(LSX) Negative(m@@OPTSTR_LSX@@)
 Enable LoongArch SIMD Extension (LSX).
 
+m@@OPTSTR_LASX@@
+Target RejectNegative Var(la_opt_switches) Mask(LASX) 
Negative(m@@OPTSTR_LASX@@)
+Enable LoongArch Advanced SIMD Extension (LASX).
+
 ;; Base target models (implies ISA & tune parameters)
 Enum
 Name(cpu_type) Type(int)
diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index b065921adc3..2747fb9e472 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -104,8 +104,19 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   builtin_define ("__loongarch_simd");
   builtin_define ("__loongarch_sx");
   builtin_define ("__loongarch_sx_width=128");
+
+  if (!ISA_HAS_LASX)
+   builtin_define ("__loongarch_simd_width=128");
 }
 
+  if (ISA_HAS_LASX)
+{
+  builtin_define ("__loongarch_asx");
+  builtin_define ("__loongarch_asx_width=256");
+  builtin_define ("__loongarch_simd_width=256");
+}
+
+
   /* Native Data Sizes.  */
   builtin_define_with_int_value ("_LOONGARCH_SZINT", INT_TYPE_SIZE);
   builtin_define_with_int_value ("_LOONGARCH_SZLONG", LONG_TYPE_SIZE);
diff --git a/gcc/config/loongarch/loongarch-def.c 
b/gcc/config/loongarch/loongarch-def.c
index 28e24c62249..bff92c86532 100644
--- a/gcc/config/loongarch/loongarch-def.c
+++ b/gcc/config/loongarch/loongarch-def.c
@@ -54,7 +54,7 @@ loongarch_cpu_default_isa[N_ARCH_TYPES] = {
   [CPU_LA464] = {
   .base = ISA_BASE_LA64V100,
   .fpu = ISA_EXT_FPU64,
-  .simd = ISA_EXT_SIMD_LSX,
+  .simd = ISA_EXT_SIMD_LASX,
   },
 };
 
@@ -150,6 +150,7 @@ loongarch_isa_ext_strings[N_ISA_EXT_TYPES] = {
   [ISA_EXT_FPU32] = STR_ISA_EXT_FPU32,
   [ISA_EXT_NOFPU] = STR_ISA_EXT_NOFPU,
   [ISA_EXT_SIMD_LSX] = OPTSTR_LSX,
+  [ISA_EXT_SIMD_LASX] = OPTSTR_LASX,
 };
 
 const char*
@@ -180,6 +181,7 @@ loongarch_switch_strings[] = {
   [SW_SINGLE_FLOAT]  = OPTSTR_SINGLE_FLOAT,
   [SW_DOUBLE_FLOAT]  = OPTSTR_DOUBLE_FLOAT,
   [SW_LSX]   = OPTSTR_LSX,
+  [SW_LASX]  = OPTSTR_LASX,
 };
 
 
diff --git a/gcc/config/loongarch/loongarch-def.h 
b/gcc/config/loongarch/loongarch-def.h
index f34cffcfb9b..0bbcdb03d22 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -64,7 +64,8 @@ extern const char* loongarch_isa_ext_strings[];
 #define ISA_EXT_FPU64         2
 #define N_ISA_EXT_FPU_TYPES   3
 #define ISA_EXT_SIMD_LSX  3
-#define N_ISA_EXT_TYPES  4
+#define ISA_EXT_SIMD_LASX 4
+#define N_ISA_EXT_TYPES  5
 
 /* enum abi_base */
 extern const char*

[PATCH v1 1/6] LoongArch: Added Loongson SX vector directive compilation framework.

2023-06-29 Thread Chenghui Pan

From: Lulu Cheng 

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings: Added compilation 
framework.
* config/loongarch/genopts/loongarch.opt.in: Ditto.
* config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins): Ditto.
* config/loongarch/loongarch-def.c: Ditto.
* config/loongarch/loongarch-def.h (N_ISA_EXT_TYPES): Ditto.
(ISA_EXT_SIMD_LSX): Ditto.
(N_SWITCH_TYPES): Ditto.
(SW_LSX): Ditto.
(struct loongarch_isa): Ditto.
* config/loongarch/loongarch-driver.cc (APPEND_SWITCH): Ditto.
(driver_get_normalized_m_opts): Ditto.
* config/loongarch/loongarch-driver.h (driver_get_normalized_m_opts): 
Ditto.
* config/loongarch/loongarch-opts.cc (loongarch_config_target): Ditto.
(isa_str): Ditto.
* config/loongarch/loongarch-opts.h (ISA_HAS_LSX): Ditto.
* config/loongarch/loongarch-str.h (OPTSTR_LSX): Ditto.
* config/loongarch/loongarch.opt: Ditto.
---
 .../loongarch/genopts/loongarch-strings   |  3 +
 gcc/config/loongarch/genopts/loongarch.opt.in | 12 ++-
 gcc/config/loongarch/loongarch-c.cc   |  7 ++
 gcc/config/loongarch/loongarch-def.c  |  4 +
 gcc/config/loongarch/loongarch-def.h  |  7 +-
 gcc/config/loongarch/loongarch-driver.cc  | 10 +++
 gcc/config/loongarch/loongarch-driver.h   |  1 +
 gcc/config/loongarch/loongarch-opts.cc| 82 ++-
 gcc/config/loongarch/loongarch-opts.h |  1 +
 gcc/config/loongarch/loongarch-str.h  |  2 +
 gcc/config/loongarch/loongarch.opt| 12 ++-
 11 files changed, 136 insertions(+), 5 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch-strings 
b/gcc/config/loongarch/genopts/loongarch-strings
index a40998ead97..24a5025061f 100644
--- a/gcc/config/loongarch/genopts/loongarch-strings
+++ b/gcc/config/loongarch/genopts/loongarch-strings
@@ -40,6 +40,9 @@ OPTSTR_SOFT_FLOAT soft-float
 OPTSTR_SINGLE_FLOAT   single-float
 OPTSTR_DOUBLE_FLOAT   double-float
 
+# SIMD extensions
+OPTSTR_LSX lsx
+
 # -mabi=
 OPTSTR_ABI_BASE  abi
 STR_ABI_BASE_LP64D lp64d
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 4b9b4ac273e..7ea3f0dea5d 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -76,6 +76,9 @@ m@@OPTSTR_DOUBLE_FLOAT@@
 Target Driver RejectNegative Var(la_opt_switches) Mask(FORCE_F64) 
Negative(m@@OPTSTR_SOFT_FLOAT@@)
 Allow hardware floating-point instructions to cover both 32-bit and 64-bit 
operations.
 
+m@@OPTSTR_LSX@@
+Target RejectNegative Var(la_opt_switches) Mask(LSX) Negative(m@@OPTSTR_LSX@@)
+Enable LoongArch SIMD Extension (LSX).
 
 ;; Base target models (implies ISA & tune parameters)
 Enum
@@ -125,11 +128,18 @@ Target RejectNegative Joined ToLower Enum(abi_base) 
Var(la_opt_abi_base) Init(M_
 Variable
 int la_opt_abi_ext = M_OPTION_NOT_SEEN
 
-
 mbranch-cost=
 Target RejectNegative Joined UInteger Var(loongarch_branch_cost)
 -mbranch-cost=COST Set the cost of branches to roughly COST instructions.
 
+mvecarg
+Target Var(TARGET_VECARG) Init(1)
+Target pass vect arg uses vector register.
+
+mmemvec-cost=
+Target RejectNegative Joined UInteger Var(loongarch_vector_access_cost) 
IntegerRange(1, 5)
+mmemvec-cost=COST  Set the cost of vector memory access instructions.
+
 mcheck-zero-division
 Target Mask(CHECK_ZERO_DIV)
 Trap on integer divide by zero.
diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index 67911b78f28..b065921adc3 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -99,6 +99,13 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   else
 builtin_define ("__loongarch_frlen=0");
 
+  if (ISA_HAS_LSX)
+{
+  builtin_define ("__loongarch_simd");
+  builtin_define ("__loongarch_sx");
+  builtin_define ("__loongarch_sx_width=128");
+}
+
   /* Native Data Sizes.  */
   builtin_define_with_int_value ("_LOONGARCH_SZINT", INT_TYPE_SIZE);
   builtin_define_with_int_value ("_LOONGARCH_SZLONG", LONG_TYPE_SIZE);
diff --git a/gcc/config/loongarch/loongarch-def.c 
b/gcc/config/loongarch/loongarch-def.c
index 6729c857f7c..28e24c62249 100644
--- a/gcc/config/loongarch/loongarch-def.c
+++ b/gcc/config/loongarch/loongarch-def.c
@@ -49,10 +49,12 @@ loongarch_cpu_default_isa[N_ARCH_TYPES] = {
   [CPU_LOONGARCH64] = {
   .base = ISA_BASE_LA64V100,
   .fpu = ISA_EXT_FPU64,
+  .simd = 0,
   },
   [CPU_LA464] = {
   .base = ISA_BASE_LA64V100,
   .fpu = ISA_EXT_FPU64,
+  .simd = ISA_EXT_SIMD_LSX,
   },
 };
 
@@ -147,6 +149,7 @@ loongarch_isa_ext_strings[N_ISA_EXT_TYPES] = {
   [ISA_EXT_FPU64] = STR_ISA_EXT_FPU64,
   [ISA_EXT_FPU32] = STR_ISA_EXT_FPU32,
   [ISA_EXT_NOFPU] = STR_ISA_EXT_NOFPU,
+  [ISA_EXT_SIMD_LSX] = OPTSTR_LSX,
 };
 
 const char*
@@ -176,6 +179,7 @@

[PATCH v1 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-06-29 Thread Chenghui Pan

These patches add the Loongson SX/ASX instruction support to the LoongArch
target, and can be utilized by using the new "-mlsx" and
"-mlasx" option.

Patches are bootstrapped and tested on loongarch64-linux-gnu target.
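
As a hedged sketch of how user code might react to the new options: patches
1/6 and 4/6 in this series, as posted, predefine __loongarch_sx for LSX and
__loongarch_asx for LASX.  The DEMO_SIMD_BYTES macro and demo_simd_lanes
function below are invented for this example, and the macro names could still
change before the series is committed.

```c
#include <assert.h>

/* Pick the widest available vector size in bytes from the predefined
   macros; 0 means no LoongArch SIMD extension is enabled (or the code
   is being built for another target).  */
#if defined(__loongarch_asx)
# define DEMO_SIMD_BYTES 32   /* LASX: 256-bit vectors.  */
#elif defined(__loongarch_sx)
# define DEMO_SIMD_BYTES 16   /* LSX: 128-bit vectors.  */
#else
# define DEMO_SIMD_BYTES 0
#endif

/* Number of elements of the given size that fit in one vector.  */
unsigned
demo_simd_lanes (unsigned elem_bytes)
{
  assert (elem_bytes != 0);
  return DEMO_SIMD_BYTES / elem_bytes;
}
```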

Lulu Cheng (6):
  LoongArch: Added Loongson SX vector directive compilation framework.
  LoongArch: Added Loongson SX base instruction support.
  LoongArch: Added Loongson SX directive builtin function support.
  LoongArch: Added Loongson ASX vector directive compilation framework.
  LoongArch: Added Loongson ASX base instruction support.
  LoongArch: Added Loongson ASX directive builtin function support.

 gcc/config.gcc|2 +-
 gcc/config/loongarch/constraints.md   |  128 +-
 .../loongarch/genopts/loongarch-strings   |4 +
 gcc/config/loongarch/genopts/loongarch.opt.in |   16 +-
 gcc/config/loongarch/lasx.md  | 5147 
 gcc/config/loongarch/lasxintrin.h | 5342 +
 gcc/config/loongarch/loongarch-builtins.cc| 2686 -
 gcc/config/loongarch/loongarch-c.cc   |   18 +
 gcc/config/loongarch/loongarch-def.c  |6 +
 gcc/config/loongarch/loongarch-def.h  |9 +-
 gcc/config/loongarch/loongarch-driver.cc  |   10 +
 gcc/config/loongarch/loongarch-driver.h   |2 +
 gcc/config/loongarch/loongarch-ftypes.def |  666 +-
 gcc/config/loongarch/loongarch-modes.def  |   39 +
 gcc/config/loongarch/loongarch-opts.cc|   89 +-
 gcc/config/loongarch/loongarch-opts.h |3 +
 gcc/config/loongarch/loongarch-protos.h   |   35 +
 gcc/config/loongarch/loongarch-str.h  |3 +
 gcc/config/loongarch/loongarch.cc | 4615 +-
 gcc/config/loongarch/loongarch.h  |  117 +-
 gcc/config/loongarch/loongarch.md |   56 +-
 gcc/config/loongarch/loongarch.opt|   16 +-
 gcc/config/loongarch/lsx.md   | 4490 ++
 gcc/config/loongarch/lsxintrin.h  | 5181 
 gcc/config/loongarch/predicates.md|  333 +-
 25 files changed, 28723 insertions(+), 290 deletions(-)
 create mode 100644 gcc/config/loongarch/lasx.md
 create mode 100644 gcc/config/loongarch/lasxintrin.h
 create mode 100644 gcc/config/loongarch/lsx.md
 create mode 100644 gcc/config/loongarch/lsxintrin.h

-- 
2.36.0

Re: Re: [PATCH] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern

2023-06-29 Thread juzhe.zh...@rivai.ai

Thanks for the comments.

I will fix the doc description as you suggested.

I personally prefer **NOT** to include BIAS in the gather/scatter since I don't 
know how it will be used.

Let's wait for more comments from Richi or Richard.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-29 23:04
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; richard.sandiford; rguenther
Subject: Re: [PATCH] Machine Description: Add LEN_MASK_{GATHER_LOAD, 
SCATTER_STORE} pattern
Hi Juzhe,
 
just looking at the documentation changes.
 
> +@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_gather_load@var{m}@var{n}}
> +Like @samp{gather_load@var{m}@var{n}}, but takes an extra len operand
> +as operand 5 and an extra mask operand as operand 6.  Bit @var{i} of
> +the mask is set and i < len if element @var{i} of the result should be
> +loaded from memory.  Element @var{i} of the result should be undefined
> +value when either Bit @var{i} of the mask is clear or i >= len.
> +
 
I would suggest rephrasing this slightly as:
 
"Like ... but takes an extra length operand (operand 5) as well as a
mask operand (operand 6).  Similar to len_maskload, the instruction
loads at most (operand 5) elements from memory.  
Bit @var{i} of the mask is set if element @var{i} of the result should
be loaded from memory and clear if element @var{i} of the result
should be undefined.  Mask elements @var{i} with i > (operand 5) are
ignored."
 
to make it more similar to mask_gather_load.  Further improvements
welcome though - not sure if we should refer to len_maskload at all
because it has a bias operand and mask_gather_load doesn't.
 
> +@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_scatter_store@var{m}@var{n}}
> +Like @samp{scatter_store@var{m}@var{n}}, but takes an extra len operand as
> +operand 5 and an extra mask operand as operand 6.  Bit @var{i} of the mask
> +is set and i < len if element @var{i} of the result should be stored to 
> memory.
> +
"an extra length operand (operand 5)... The instruction stores
at most (operand 5) elements of (operand 4) to memory.
Bit @var{i} of the mask is set if element @var{i} of (operand 4)
should be stored.  Mask elements @var{i} with i > (operand 5) are
ignored."
 
Regards
Robin

RE: Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode

2023-06-29 Thread Li, Pan2 via Gcc-patches

That’s very cool, thanks Thomas for the help! Let’s wait for the AMD test 
results before the final version of the patch.

Pan

From: juzhe.zh...@rivai.ai
Sent: Friday, June 30, 2023 9:27 AM
To: Thomas Schwinge; Li, Pan2; gcc-patches; rguenther; jakub
Cc: Robin Dapp; jeffreyalaw; Wang, Yanzhang; kito.cheng; Tobias Burnus
Subject: Re: Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode

Thanks a lot!

Really appreciate your help! That's really helpful for RVV (RISC-V vector).
Could you merge your patch after you have tested it?

Thanks.

juzhe.zh...@rivai.ai

From: Thomas Schwinge
Date: 2023-06-30 04:14
To: Pan Li; gcc-patches@gcc.gnu.org; Richard Biener; Jakub Jelinek
CC: juzhe.zh...@rivai.ai; rdapp@gmail.com; jeffreya...@gmail.com; 
yanzhang.w...@intel.com; kito.ch...@gmail.com; Tobias Burnus
Subject: Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode
Hi!

On 2023-06-29T11:29:57+0200, I wrote:
> On 2023-06-21T15:58:24+0800, Pan Li via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> We extend the machine mode from 8 to 16 bits already. But there still
>> one placing missing from the streamer. It has one hard coded array
>> for the machine code like size 256.
>>
>> In the lto pass, we memset the array by MAX_MACHINE_MODE count but the
>> value of the MAX_MACHINE_MODE will grow as more and more modes are
>> added. While the machine mode array in tree-streamer still leave 256 as is.
>>
>> Then, when the MAX_MACHINE_MODE is greater than 256, the memset of
>> lto_output_init_mode_table will touch the memory out of range unexpected.
>
> Uh.  :-O
>
>> This patch would like to take the MAX_MACHINE_MODE as the size of the
>> array in streamer, to make sure there is no potential unexpected
>> memory access in future. Meanwhile, this patch also adjust some place
>> which has MAX_MACHINE_MODE <= 256 assumption.
>
> Thanks to Jakub and Richard for guidance re the offloading compilation
> case, where we've got different 'MAX_MACHINE_MODE's between stream-out
> and stream-in, and a modes mapping table.
>
> However, with this patch, there are ICEs all over the place...  I'm
> having a look.

Your patch has all the right ideas, there are just a few additional
changes necessary.  Please merge in the attached
"f into Streamer: Fix out of range memory access of machine mode", with
'Co-authored-by: Thomas Schwinge <tho...@codesourcery.com>'.  This has
already survived compiler-side 'lto.exp' testing and
'check-target-libgomp' with Nvidia GPU offloading; AMD GPU testing is now
running (not expecting any bad surprises).  Will let you know by (my)
tomorrow morning in case there are any more problems.

Explanation:

>> --- a/gcc/lto-streamer-in.cc
>> +++ b/gcc/lto-streamer-in.cc
>> @@ -1985,8 +1985,6 @@ lto_input_mode_table (struct lto_file_decl_data 
>> *file_data)
>>  internal_error ("cannot read LTO mode table from %s",
>>   file_data->file_name);
>>
>> -  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
>> -  file_data->mode_table = table;
>>const struct lto_simple_header_with_strings *header
>>  = (const struct lto_simple_header_with_strings *) data;
>>int string_offset;
>> @@ -1998,16 +1996,22 @@ lto_input_mode_table (struct lto_file_decl_data 
>> *file_data)
>>   header->string_size, vNULL);
>>bitpack_d bp = streamer_read_bitpack (&ib);
>>
>> +  unsigned mode_bits = bp_unpack_value (&bp, 5);
>> +  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << mode_bits);
>> +
>> +  file_data->mode_table = table;
>> +  file_data->mode_bits = mode_bits;

Here, we set 'file_data->mode_bits' for the offloading case (where
'lto_input_mode_table' is called) -- but it's not set for the
non-offloading case (where 'lto_input_mode_table' isn't called).  (See my
'gcc/lto/lto-common.cc:lto_read_decls' change.)  That's "not currently a
problem", as 'file_data->mode_bits' isn't used anywhere...

>> --- a/gcc/lto-streamer.h
>> +++ b/gcc/lto-streamer.h
>> @@ -604,6 +604,8 @@ struct GTY(()) lto_file_decl_data
>>int order_base;
>>
>>int unit_base;
>> +
>> +  unsigned mode_bits;
>>  };

>>  inline machine_mode
>>  bp_unpack_machine_mode (struct bitpack_d *bp)
>>  {
>> -  return (machine_mode)
>> -((class lto_input_block *)
>> - bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
>> +  int last = 1 << ceil_log2 (MAX_MACHINE_MODE);
>> +  lto_input_block *input_block = (class lto_input_block *) bp->stream;
>> +  int index = bp_unpack_enum (bp, machine_mode, last);
>> +
>> +  return (machine_mode)

Re: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-29 Thread juzhe.zh...@rivai.ai

>> I've triple checked this already.
You mean you still didn't see vfwmul.vv?

That's odd. Let's wait for Kito or Robin to test this patch.
Then, I believe they will know what I am saying.

>> I would strongly suggest looking at a dependency height reduction
>> pattern if you want to optimize that code further.
I did that a long time ago. It turns out it's better to do it in the combine 
pass in both GCC and LLVM.

Never mind. I already have this implementation downstream, and this won't 
affect my downstream GCC maintenance.
It's OK if this patch is not approved, since I can get the desired codegen 
downstream.

Thanks.

juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2023-06-30 09:26
To: juzhe.zh...@rivai.ai; gcc-patches
CC: kito.cheng; Kito.cheng; palmer; palmer; Robin Dapp
Subject: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

On 6/29/23 19:14, juzhe.zh...@rivai.ai wrote:
> No, reduction patterns won't help.
> As I said in vfwmul patch. You should make sure your environment is 
> working then try again.
I've triple checked this already.

I checked it again and your patch does not impact behavior, nor should 
it.   I checked it on top of these trunk commits:

14bfda6084eaca07c842566a34316974907958e2
e714af12e3bee0032d8d226f87d92c9bc46f0269

I checked it with the code from the godbolt links you suggested with the 
options shown in those links.

More importantly, your explanation of what the pattern is supposed to do 
shows a misunderstanding of what combine's capabilities actually are.  A 
bridge or intermediate pattern is not needed here, combine can 
substitute multiple sources in combination attempts as can be clearly 
seen from the dump fragments I posted.

The only reason I didn't reject the patch at the outset was the 
possibility that maybe we were trying to combine more than 4 
instructions or the possibility that something about the number of 
operands, unspecs, whatever was getting in the way.

This patch is not needed and does not affect code generation.

I would strongly suggest looking at a dependency height reduction 
pattern if you want to optimize that code further.

Jeff

Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-29 Thread Jeff Law via Gcc-patches





On 6/29/23 19:14, juzhe.zh...@rivai.ai wrote:

No, reduction patterns won't help.
As I said in vfwmul patch. You should make sure your environment is 
working then try again.

I've triple checked this already.

I checked it again and your patch does not impact behavior, nor should 
it.   I checked it on top of these trunk commits:


14bfda6084eaca07c842566a34316974907958e2
e714af12e3bee0032d8d226f87d92c9bc46f0269

I checked it with the code from the godbolt links you suggested with the 
options shown in those links.


More importantly, your explanation of what the pattern is supposed to do 
shows a misunderstanding of what combine's capabilities actually are.  A 
bridge or intermediate pattern is not needed here, combine can 
substitute multiple sources in combination attempts as can be clearly 
seen from the dump fragments I posted.


The only reason I didn't reject the patch at the outset was the 
possibility that maybe we were trying to combine more than 4 
instructions or the possibility that something about the number of 
operands, unspecs, whatever was getting in the way.


This patch is not needed and does not affect code generation.

I would strongly suggest looking at a dependency height reduction 
pattern if you want to optimize that code further.


Jeff

Re: Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode

2023-06-29 Thread juzhe.zh...@rivai.ai

Thanks a lot!

Really appreciate your help! That's really helpful for RVV (RISC-V vector).
Could you merge your patch after you have tested it?

Thanks.

juzhe.zh...@rivai.ai

From: Thomas Schwinge
Date: 2023-06-30 04:14
To: Pan Li; gcc-patches@gcc.gnu.org; Richard Biener; Jakub Jelinek
CC: juzhe.zh...@rivai.ai; rdapp@gmail.com; jeffreya...@gmail.com; 
yanzhang.w...@intel.com; kito.ch...@gmail.com; Tobias Burnus
Subject: Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode
Hi!

On 2023-06-29T11:29:57+0200, I wrote:
> On 2023-06-21T15:58:24+0800, Pan Li via Gcc-patches wrote:
>> We extend the machine mode from 8 to 16 bits already. But there still
>> one placing missing from the streamer. It has one hard coded array
>> for the machine code like size 256.
>>
>> In the lto pass, we memset the array by MAX_MACHINE_MODE count but the
>> value of the MAX_MACHINE_MODE will grow as more and more modes are
>> added. While the machine mode array in tree-streamer still leave 256 as is.
>>
>> Then, when the MAX_MACHINE_MODE is greater than 256, the memset of
>> lto_output_init_mode_table will touch the memory out of range unexpected.
>
> Uh.  :-O
>
>> This patch would like to take the MAX_MACHINE_MODE as the size of the
>> array in streamer, to make sure there is no potential unexpected
>> memory access in future. Meanwhile, this patch also adjust some place
>> which has MAX_MACHINE_MODE <= 256 assumption.
>
> Thanks to Jakub and Richard for guidance re the offloading compilation
> case, where we've got different 'MAX_MACHINE_MODE's between stream-out
> and stream-in, and a modes mapping table.
>
> However, with this patch, there are ICEs all over the place...  I'm
> having a look.

Your patch has all the right ideas, there are just a few additional
changes necessary.  Please merge in the attached
"f into Streamer: Fix out of range memory access of machine mode", with
'Co-authored-by: Thomas Schwinge '.  This has
already survived compiler-side 'lto.exp' testing and
'check-target-libgomp' with Nvidia GPU offloading; AMD GPU testing is now
running (not expecting any bad surprises).  Will let you know by (my)
tomorrow morning in case there are any more problems.

Explanation:

>> --- a/gcc/lto-streamer-in.cc
>> +++ b/gcc/lto-streamer-in.cc
>> @@ -1985,8 +1985,6 @@ lto_input_mode_table (struct lto_file_decl_data 
>> *file_data)
>>  internal_error ("cannot read LTO mode table from %s",
>>   file_data->file_name);
>>
>> -  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
>> -  file_data->mode_table = table;
>>const struct lto_simple_header_with_strings *header
>>  = (const struct lto_simple_header_with_strings *) data;
>>int string_offset;
>> @@ -1998,16 +1996,22 @@ lto_input_mode_table (struct lto_file_decl_data 
>> *file_data)
>>   header->string_size, vNULL);
>>bitpack_d bp = streamer_read_bitpack (&ib);
>>
>> +  unsigned mode_bits = bp_unpack_value (&bp, 5);
>> +  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << mode_bits);
>> +
>> +  file_data->mode_table = table;
>> +  file_data->mode_bits = mode_bits;

Here, we set 'file_data->mode_bits' for the offloading case (where
'lto_input_mode_table' is called) -- but it's not set for the
non-offloading case (where 'lto_input_mode_table' isn't called).  (See my
'gcc/lto/lto-common.cc:lto_read_decls' change.)  That's "not currently a
problem", as 'file_data->mode_bits' isn't used anywhere...

>> --- a/gcc/lto-streamer.h
>> +++ b/gcc/lto-streamer.h
>> @@ -604,6 +604,8 @@ struct GTY(()) lto_file_decl_data
>>int order_base;
>>
>>int unit_base;
>> +
>> +  unsigned mode_bits;
>>  };

>>  inline machine_mode
>>  bp_unpack_machine_mode (struct bitpack_d *bp)
>>  {
>> -  return (machine_mode)
>> -((class lto_input_block *)
>> - bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
>> +  int last = 1 << ceil_log2 (MAX_MACHINE_MODE);
>> +  lto_input_block *input_block = (class lto_input_block *) bp->stream;
>> +  int index = bp_unpack_enum (bp, machine_mode, last);
>> +
>> +  return (machine_mode) input_block->mode_table[index];
>>  }

..., but 'file_data->mode_bits' needs to be considered here, in the
stream-in for offloading, where 'file_data->mode_bits' -- that is, the
host 'MAX_MACHINE_MODE' -- very likely is different from the offload
device 'MAX_MACHINE_MODE'.

Easiest is in 'gcc/lto-streamer.h:class lto_input_block' to capture
'lto_file_decl_data *file_data' instead of just
'unsigned char *mode_table', and adjust all users.

That's it.  :-)
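
To make the host/device mismatch concrete, here is a minimal, self-contained sketch. It is not GCC code: `ToyStream`, `write_modes`, `read_modes` and `toy_ceil_log2` are hypothetical names, and the only point is that the reader has to use the bit width the writer recorded in the stream rather than a width derived from its own mode count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest b such that (1 << b) >= n; a toy stand-in for ceil_log2.
static unsigned toy_ceil_log2(unsigned n) {
  unsigned bits = 0;
  while ((1u << bits) < n) ++bits;
  return bits;
}

// A toy bit stream: every value is written and read with an explicit width.
struct ToyStream {
  std::vector<bool> bits;
  std::size_t pos = 0;

  void put(unsigned value, unsigned width) {
    for (unsigned i = 0; i < width; ++i) bits.push_back(((value >> i) & 1u) != 0);
  }

  unsigned get(unsigned width) {
    assert(pos + width <= bits.size());
    unsigned value = 0;
    for (unsigned i = 0; i < width; ++i) {
      bool b = bits[pos++];
      value |= (b ? 1u : 0u) << i;
    }
    return value;
  }
};

// Writer (host): store the width once in 5 bits, then pack each mode with it.
static void write_modes(ToyStream &s, unsigned host_max_mode,
                        const std::vector<unsigned> &modes) {
  unsigned mode_bits = toy_ceil_log2(host_max_mode);
  s.put(mode_bits, 5);
  for (unsigned m : modes) s.put(m, mode_bits);
}

// Reader (offload device): take the width from the stream, not from the
// reader's own idea of how many modes exist.
static std::vector<unsigned> read_modes(ToyStream &s, std::size_t count) {
  unsigned mode_bits = s.get(5);
  std::vector<unsigned> out;
  for (std::size_t i = 0; i < count; ++i) out.push_back(s.get(mode_bits));
  return out;
}
```

If the reader instead computed the width from its own mode count, every mode after the first would be read from the wrong bit offset whenever the two counts need a different number of bits.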

>> --- a/gcc/tree-streamer.h
>> +++ b/gcc/tree-streamer.h

>> @@ -108,15 +108,19 @@ inline void
>>  bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
>>  {
>>streamer_mode_table[mode] = 1;
>> -  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
>> +  int last = 1 << ceil_log2 (MAX_MACHINE_MODE);
>> +
>> +  bp_pack_enum (bp, machine_mode, last, mode);
>>  }

That use of 'MAX_MACHINE_MODE'

Re: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-29 Thread juzhe.zh...@rivai.ai

No, reduction patterns won't help. 
As I said in the vfwmul patch thread, you should make sure your environment is
working, then try again.

Thanks.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-30 07:43
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering
 
 
On 6/28/23 16:56, 钟居哲 wrote:
> 
> 
> 
> juzhe.zh...@rivai.ai
> 
> *From:* Jeff Law 
> *Date:* 2023-06-29 06:43
> *To:* 钟居哲 ; gcc-patches
> 
> *CC:* kito.cheng ; kito.cheng
> ; palmer ;
> palmer ; rdapp.gcc
> 
> *Subject:* Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac
> combine lowering
> On 6/28/23 16:10, 钟居哲 wrote:
>  > Sure.
>  >
>  > https://godbolt.org/z/8857KzTno 
>  >
>  > Failed to match this instruction:
>  > (set (reg:VNx2DF 134 [ vect__31.47 ])
>  >  (fma:VNx2DF (neg:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 136 [
>  > vect__28.44 ])))
>  >  (reg:VNx2DF 150 [ vect__8.12 ])
>  >  (reg:VNx2DF 171 [ vect__29.45 ])))
> Please attach the full dump.  I would expect to see additional attempts
> with more operands replaced.
Thanks for the dump.  I think this is fundamentally the same issue as the 
widening problem.
 
Drop those intermediate patterns.  They're not needed/helpful.  You may 
need a dependency height reduction pattern to get the code you want, but 
I see no evidence those extra patterns will solve anything.
 
jeff

Re: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-29 Thread juzhe.zh...@rivai.ai

Hi, Jeff.

That's odd. Maybe you should clean up your environment first?
Or perhaps you didn't build the toolchain correctly with this patch?

Compile option: --param=riscv-autovec-preference=scalable -O3 -ffast-math
Before this patch:
https://godbolt.org/z/Y5d44WMqs 

fail.s:

lw t5,0(sp)
ble t5,zero,.L5
.L3:
vsetvli t1,t5,e32,mf2,ta,ma
vle32.v v2,0(a4)
vle32.v v1,0(a5)
vsetvli t0,zero,e32,mf2,ta,ma
vfwcvt.f.f.v v3,v2
vfwcvt.f.f.v v2,v1
vsetvli zero,t1,e32,mf2,ta,ma
vle32.v v5,0(a6)
vle32.v v4,0(a7)
vsetvli t0,zero,e32,mf2,ta,ma
vfwcvt.f.f.v v1,v5
vsetvli zero,zero,e64,m1,ta,ma
vfmul.vv v5,v2,v3
vfmul.vv v2,v1,v2
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v2,0(a1)
vse64.v v5,0(a0)
vsetvli t6,zero,e64,m1,ta,ma
vfmul.vv v1,v1,v3
vsetvli zero,zero,e32,mf2,ta,ma
vfwcvt.f.f.v v2,v4
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v1,0(a2)
vsetvli t6,zero,e64,m1,ta,ma
slli t4,t1,2
slli t3,t1,3
vfmul.vv v1,v2,v3
sub t5,t5,t1
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v1,0(a3)
add a4,a4,t4
add a5,a5,t4
add a0,a0,t3
add a6,a6,t4
add a1,a1,t3
add a2,a2,t3
add a7,a7,t4
add a3,a3,t3
bne t5,zero,.L3
.L5:
ret

After this patch:
pass.s:

lw t5,0(sp)
ble t5,zero,.L5
.L3:
vsetvli t1,t5,e32,mf2,ta,ma
vle32.v v1,0(a4)
vle32.v v3,0(a5)
vle32.v v2,0(a6)
vle32.v v4,0(a7)
vsetvli t6,zero,e32,mf2,ta,ma
vfwmul.vv v5,v3,v2
vfwmul.vv v6,v1,v3
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v6,0(a0)
vse64.v v5,0(a1)
vsetvli t6,zero,e32,mf2,ta,ma
slli t4,t1,2
slli t3,t1,3
vfwmul.vv v3,v2,v1
sub t5,t5,t1
vfwmul.vv v2,v1,v4
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v3,0(a2)
vse64.v v2,0(a3)
add a4,a4,t4
add a5,a5,t4
add a0,a0,t3
add a6,a6,t4
add a1,a1,t3
add a2,a2,t3
add a7,a7,t4
add a3,a3,t3
bne t5,zero,.L3
.L5:
ret

It's very obvious the codegen with this patch is perfect.

I have attached both .s files to this message.

I am not claiming that this patch is the only solution.

You are welcome to provide another solution, as long as it produces the same
codegen that this patch achieves.

Please make sure you are using a toolchain that was actually built with the
patch.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-30 07:48
To: juzhe.zhong
CC: gcc-patches; kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
 
On 6/29/23 17:46, juzhe.zhong wrote:
> You should try the example and check the codegen before and after the patch.
> You will understand it.
I've already done that.  It makes _no_ difference on the godbolt example.
 
Jeff
 


fail.s
Description: Binary data


pass.s
Description: Binary data

Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-29 Thread Jeff Law via Gcc-patches





On 6/29/23 17:46, juzhe.zhong wrote:
You should try the example and check the codegen before and after the patch.
You will understand it.

I've already done that.  It makes _no_ difference on the godbolt example.

Jeff

Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-29 Thread Jeff Law via Gcc-patches





On 6/28/23 16:56, 钟居哲 wrote:




juzhe.zh...@rivai.ai

*From:* Jeff Law 
*Date:* 2023-06-29 06:43
*To:* 钟居哲 ; gcc-patches

*CC:* kito.cheng ; kito.cheng
; palmer ;
palmer ; rdapp.gcc

*Subject:* Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac
combine lowering
On 6/28/23 16:10, 钟居哲 wrote:
 > Sure.
 >
 > https://godbolt.org/z/8857KzTno 
 >
 > Failed to match this instruction:
 > (set (reg:VNx2DF 134 [ vect__31.47 ])
 >      (fma:VNx2DF (neg:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 136 [
 > vect__28.44 ])))
 >          (reg:VNx2DF 150 [ vect__8.12 ])
 >          (reg:VNx2DF 171 [ vect__29.45 ])))
Please attach the full dump.  I would expect to see additional attempts
with more operands replaced.
Thanks for the dump.  I think this is fundamentally the same issue as the 
widening problem.


Drop those intermediate patterns.  They're not needed/helpful.  You may 
need a dependency height reduction pattern to get the code you want, but 
I see no evidence those extra patterns will solve anything.


jeff

Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-29 Thread Jeff Law via Gcc-patches





On 6/28/23 16:00, 钟居哲 wrote:

You can see here:

https://godbolt.org/z/d78646hWb 
Your patch doesn't help that code and your patch is a result of 
fundamentally misunderstanding combine's capabilities AFAICT.


Jeff

Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-29 Thread Jeff Law via Gcc-patches





On 6/28/23 16:00, 钟居哲 wrote:

You can see here:

https://godbolt.org/z/d78646hWb 
So just to be explicit, I see no difference with that test before/after 
your proposed change.  Nor would I expect one based on my understanding 
of the patch.


The explicit conversions I see are because we need the output of the 
conversion in multiple vfmul instructions.  That won't be helped by the 
patch you've proposed.


To be more concrete:


        vsetvli t1,t5,e32,mf2,ta,ma     # 99    [c=0 l=4]   vsetvldi
        vle32.v v2,0(a4)                # 23    [c=4 l=4]   pred_movvnx2sf/1
        vle32.v v1,0(a5)                # 25    [c=4 l=4]   pred_movvnx2sf/1
        vsetvli t0,zero,e32,mf2,ta,ma   # 101   [c=0 l=4]   vsetvldi
        vfwcvt.f.f.v    v3,v2           # 77    [c=4 l=4]   pred_extendvnx2df/0
        vfwcvt.f.f.v    v2,v1           # 79    [c=4 l=4]   pred_extendvnx2df/0
        vsetvli zero,t1,e32,mf2,ta,ma   # 102   [c=0 l=4]   vsetvl_discard_resultdi
        vle32.v v5,0(a6)                # 31    [c=4 l=4]   pred_movvnx2sf/1
        vle32.v v4,0(a7)                # 39    [c=4 l=4]   pred_movvnx2sf/1
        vsetvli t0,zero,e32,mf2,ta,ma   # 103   [c=0 l=4]   vsetvldi
        vfwcvt.f.f.v    v1,v5           # 81    [c=4 l=4]   pred_extendvnx2df/0
        vsetvli zero,zero,e64,m1,ta,ma  # 104   [c=16 l=4]  vsetvl_vtype_change_only
        vfmul.vv        v5,v2,v3        # 29    [c=4 l=4]   pred_mulvnx2df/2
        vfmul.vv        v2,v1,v2        # 34    [c=4 l=4]   pred_mulvnx2df/2
        vsetvli zero,t1,e64,m1,ta,ma    # 105   [c=0 l=4]   vsetvl_discard_resultdi
        vse64.v v2,0(a1)                # 35    [c=4 l=4]   pred_storevnx2df
        vse64.v v5,0(a0)                # 30    [c=4 l=4]   pred_storevnx2df
        vsetvli t6,zero,e64,m1,ta,ma    # 106   [c=0 l=4]   vsetvldi
        vfmul.vv        v1,v1,v3        # 37    [c=4 l=4]   pred_mulvnx2df/2
        vsetvli zero,zero,e32,mf2,ta,ma # 107   [c=20 l=4]  vsetvl_vtype_change_only
        vfwcvt.f.f.v    v2,v4           # 83    [c=4 l=4]   pred_extendvnx2df/0
        vsetvli zero,t1,e64,m1,ta,ma    # 108   [c=0 l=4]   vsetvl_discard_resultdi
        vse64.v v1,0(a2)                # 38    [c=4 l=4]   pred_storevnx2df
        vsetvli t6,zero,e64,m1,ta,ma    # 109   [c=0 l=4]   vsetvldi
        slli    t4,t1,2                 # 22    [c=4 l=4]   ashldi3
        slli    t3,t1,3                 # 27    [c=4 l=4]   ashldi3
        vfmul.vv        v1,v2,v3        # 42    [c=4 l=4]   pred_mulvnx2df/2



Note how the output of the explicit conversion done in insn 77 is used 
by the vfmul in insns 29, 37 and 42.  Similarly for the other explicit 
conversions.


Your pattern isn't going to help that problem.

You could model this as a dependency height reduction.  I think that 
will get you where you want to go.


You'll need a pattern that matches this:

(parallel [ 
(set (reg:VNx2DF 160 [ vect__11.15 ])

(if_then_else:VNx2DF (unspec:VNx2BI [
(const_vector:VNx2BI repeat [
(const_int 1 [0x1])
])  
(reg:DI 169)

(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1]) 
(const_int 7 [0x7])

(reg:SI 66 vl)
(reg:SI 67 vtype)
(reg:SI 69 frm)
] UNSPEC_VPREDICATE)
(mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 144 [ vect__7.13 
]))
(float_extend:VNx2DF (reg:VNx2SF 146 [ vect__4.9 ])))
(unspec:VNx2DF [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)))
(set (reg:VNx2DF 143 [ vect__8.14 ])
(float_extend:VNx2DF (reg:VNx2SF 144 [ vect__7.13 ])))
(set (reg:VNx2DF 145 [ vect__5.10 ])
(float_extend:VNx2DF (reg:VNx2SF 146 [ vect__4.9 ])))
])


It'll need to be a define_insn_and_split as it's a 3->3 splitter.  The 
split will emit the two extensions and the widening multiply as 3 
distinct insns.


This has two positive effects.  First the widening multiply is no longer 
data dependent on the float_extend and so it can issue whenever r144 
and r146 are ready rather than when r143 and r145 are ready.


The second effect is I think this pattern will end up matching all the 
multiplies in this sample code.  As a result all the float_extend insns 
you generated when splitting become dead and should be removed by DCE.



Jeff

[PATCH ver 3] rs6000: Update the vsx-vector-6.* tests.

2023-06-29 Thread Carl Love via Gcc-patches

GCC maintainers:

Ver 3.  Added __attribute__ ((noipa)) to the test files.  Changed some
of the scan-assembler-times checks to cover multiple similar
instructions.  Changed the function check macro to a macro that generates a
function to do the test and check the results.  Retested on the various
processor types and BE/LE versions.

Ver 2.  Switched to using code macros to generate the call to the
builtin and test the results.  Added in instruction counts for the key
instruction for the builtin.  Moved the tests into an additional
function call to ensure the compiler doesn't replace the builtin call
code with the statically computed results.  The compiler was doing this
for a few of the simpler tests.  
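
As a rough illustration of that structure, the sketch below uses a macro to generate one non-inlinable checker function per operation. It is an assumption-laden stand-in, not the test code from the patch: `UNARY_TEST`, `TOY_NOIPA` and the `check_*` names are hypothetical, and plain scalar `<cmath>` calls take the place of the `vec_*` builtins so the example builds on any target.

```cpp
#include <cassert>
#include <cmath>

// noipa is a GCC attribute; expand to nothing elsewhere so the sketch still builds.
#if defined(__GNUC__) && !defined(__clang__)
#define TOY_NOIPA __attribute__ ((noipa))
#else
#define TOY_NOIPA
#endif

// Generates `bool check_NAME (double src, double expected)`.  Because the
// function is not subject to interprocedural optimization under GCC, the
// operation has to be computed inside the function rather than folded at
// the call site from the constant arguments.
#define UNARY_TEST(NAME, EXPR)                                  \
  TOY_NOIPA bool check_##NAME (double src, double expected)     \
  {                                                             \
    double result = (EXPR);                                     \
    return result == expected;                                  \
  }

UNARY_TEST (abs, std::fabs (src))
UNARY_TEST (floor, std::floor (src))
```

A caller would then invoke `check_abs (-2.5, 2.5)` and abort on a false result.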

The following patch takes the tests in vsx-vector-6-p7.h,  vsx-vector-
6-p8.h, vsx-vector-6-p9.h and reorganizes them into a series of smaller
test files by functionality rather than processor version.

Tested the patch on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE with
no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl


-
rs6000: Update the vsx-vector-6.* tests.

The vsx-vector-6.h file is included into the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
contains a large number of vsx vector builtin tests.  The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor.  The tests are compile only.

The tests are broken up into a series of files for related tests.  The
new tests are runnable tests to verify the builtin argument types and the
functional correctness of each test rather than verifying the type and
number of instructions generated.

gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-1op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-2lop.c: New test file.
* gcc.target/powerpc/vsx-vector-6-2op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-3op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-cmp-all.c: New test file.
* gcc.target/powerpc/vsx-vector-6-cmp.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p7.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p8.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p9.h: Remove test file.
---
 .../powerpc/vsx-vector-6-func-1op.c   | 141 ++
 .../powerpc/vsx-vector-6-func-2lop.c  | 217 +++
 .../powerpc/vsx-vector-6-func-2op.c   | 133 +
 .../powerpc/vsx-vector-6-func-3op.c   | 257 ++
 .../powerpc/vsx-vector-6-func-cmp-all.c   | 211 ++
 .../powerpc/vsx-vector-6-func-cmp.c   | 121 +
 .../gcc.target/powerpc/vsx-vector-6.h | 154 ---
 .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 ---
 .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 ---
 .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 ---
 10 files changed, 1080 insertions(+), 282 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2lop.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-3op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp-all.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p7.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p8.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p9.c

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
new file mode 100644
index 000..52c7ae3e983
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
@@ -0,0 +1,141 @@
+/* { dg-do run { target lp64 } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-options "-O2 -save-temps" } */
+
+/* Functional test of the one operand vector builtins.  */
+
+#include 
+#include 
+#include 
+
+#define DEBUG 0
+
+void abort (void);
+
+/* Macro to check the results for the various floating point argument tests.
+ */
+#define FLOAT_TEST(NAME)  \
+  void __attribute__ ((noipa))\
+  float_##NAME (vector float f_src, vector float f_##NAME##_expected) \
+  {  \
+vector float f_result = vec_##NAME(f_src);   \
+  \
+if

Re: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-29 Thread 钟居哲

Or do you have a better solution that lets this case combine into vfwmul?
I am OK with any solution.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-30 06:59
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
 
On 6/28/23 16:00, 钟居哲 wrote:
> You can see here:
> 
> https://godbolt.org/z/d78646hWb 
> 
> The first case can't generate vfwmul.vv but the second case succeeds.
> 
> Failed to match this instruction:
> (set (reg:VNx2DF 150 [ vect__11.50 ])
>  (if_then_else:VNx2DF (unspec:VNx2BI [
>  (const_vector:VNx2BI repeat [
>  (const_int 1 [0x1])
>  ])
>  (reg:DI 153)
>  (const_int 2 [0x2]) repeated x2
>  (const_int 1 [0x1])
>  (const_int 7 [0x7])
>  (reg:SI 66 vl)
>  (reg:SI 67 vtype)
>  (reg:SI 69 N/A)
>  ] UNSPEC_VPREDICATE)
>  (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 149 [ vect__5.45 ]))
>  (reg:VNx2DF 148 [ vect__8.49 ]))
>  (unspec:VNx2DF [
>  (reg:SI 0 zero)
>  ] UNSPEC_VUNDEF)))
Right.  We try combining:
   24 -> 27
   25 -> 27
   23, 24 -> 27
   22, 25 -> 27
 
All of which fail, as expected.  24 -> 27 and 25-> 27 only put an 
extension on one operand of the mult.  The other two try to substitute a 
float extend of an if-then-else which I fully expect to fail.  All as 
expected.
 
The next one that gets tried is:
 
> Trying 25, 24 -> 27:
>25: r149:VNx2DF=float_extend(r141:VNx2SF)
>   REG_DEAD r141:VNx2SF
>24: r148:VNx2DF=float_extend(r139:VNx2SF)
>   REG_DEAD r139:VNx2SF
>27: 
> r150:VNx2DF={(unspec[const_vector,r153:DI,0x2,0x2,0x1,0x7,vl:SI,vtype:SI,N/A:SI]
>  69)?r148:VNx2DF*r149:VNx2DF:unspec[zero:SI] 68}
>   REG_DEAD r149:VNx2DF
>   REG_DEAD r148:VNx2DF
>   REG_DEAD N/A:SI
>   REG_DEAD zero:SI
>   REG_EQUAL r148:VNx2DF*r149:VNx2DF
> Successfully matched this instruction:
> (set (reg:VNx2DF 150 [ vect__11.50 ])
> (if_then_else:VNx2DF (unspec:VNx2BI [
> (const_vector:VNx2BI repeat [
> (const_int 1 [0x1])
> ])
> (reg:DI 153)
> (const_int 2 [0x2]) repeated x2
> (const_int 1 [0x1])
> (const_int 7 [0x7])
> (reg:SI 66 vl)
> (reg:SI 67 vtype)
> (reg:SI 69 N/A)
> ] UNSPEC_VPREDICATE)
> (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 141 [ vect__4.44 ]))
> (float_extend:VNx2DF (reg:VNx2SF 139 [ vect__7.48 ])))
> (unspec:VNx2DF [
> (reg:SI 0 zero)
> ] UNSPEC_VUNDEF)))
> allowing combination of insns 24, 25 and 27
> original costs 4 + 4 + 4 = 12
> replacement cost 4
 
Note how it replaced both operands of the mult with extended versions 
and the pattern matches, as expected.
 
The point being that I don't think those helper patterns are needed to 
handle the problem you suggested they were there to handle.  Combine 
knows how to handle multiple substitutions just fine.
 
Right now I don't see a need for this patch.
 
 
 
Jeff

Re: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-29 Thread 钟居哲

>> Right now I don't see a need for this patch.
No, we need this patch.

With this patch, the following case can be combined into vfwmul.vv:
#define TEST_TYPE(TYPE1, TYPE2)\
  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 ( \
TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3, \
TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,  \
TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n) \
  {\
for (int i = 0; i < n; i++)\
  {\
dst[i] = (TYPE1) a[i] * (TYPE1) b[i];  \
dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];\
dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];\
dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];\
  }\
  }
TEST_TYPE (double, float)
You should try this; then you will see what I am saying.
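
For anyone who wants to sanity-check the results locally, a scalar reference of the same loop body might look like the sketch below. The function name `widen_mul4` is illustrative only (it is not from the patch or the testsuite), and the noipa attribute is left out for portability.

```cpp
#include <cassert>

// Scalar reference for the loop above, fixed to double results and float
// inputs.  Note that each float input feeds more than one widening multiply:
// a[i] is used three times, b[i] and a2[i] twice each.
void widen_mul4 (double *__restrict dst, double *__restrict dst2,
                 double *__restrict dst3, double *__restrict dst4,
                 const float *__restrict a, const float *__restrict b,
                 const float *__restrict a2, const float *__restrict b2,
                 int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    {
      dst[i] = (double) a[i] * (double) b[i];
      dst2[i] = (double) a2[i] * (double) b[i];
      dst3[i] = (double) a2[i] * (double) a[i];
      dst4[i] = (double) a[i] * (double) b2[i];
    }
}
```

The shared inputs are what make this case different from a single multiply: each extension has several consumers, so it cannot simply be folded into one of them.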


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-30 06:59
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
 
On 6/28/23 16:00, 钟居哲 wrote:
> You can see here:
> 
> https://godbolt.org/z/d78646hWb 
> 
> The first case can't generate vfwmul.vv but the second case succeeds.
> 
> Failed to match this instruction:
> (set (reg:VNx2DF 150 [ vect__11.50 ])
>  (if_then_else:VNx2DF (unspec:VNx2BI [
>  (const_vector:VNx2BI repeat [
>  (const_int 1 [0x1])
>  ])
>  (reg:DI 153)
>  (const_int 2 [0x2]) repeated x2
>  (const_int 1 [0x1])
>  (const_int 7 [0x7])
>  (reg:SI 66 vl)
>  (reg:SI 67 vtype)
>  (reg:SI 69 N/A)
>  ] UNSPEC_VPREDICATE)
>  (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 149 [ vect__5.45 ]))
>  (reg:VNx2DF 148 [ vect__8.49 ]))
>  (unspec:VNx2DF [
>  (reg:SI 0 zero)
>  ] UNSPEC_VUNDEF)))
Right.  We try combining:
   24 -> 27
   25 -> 27
   23, 24 -> 27
   22, 25 -> 27
 
All of which fail, as expected.  24 -> 27 and 25-> 27 only put an 
extension on one operand of the mult.  The other two try to substitute a 
float extend of an if-then-else which I fully expect to fail.  All as 
expected.
 
The next one that gets tried is:
 
> Trying 25, 24 -> 27:
>25: r149:VNx2DF=float_extend(r141:VNx2SF)
>   REG_DEAD r141:VNx2SF
>24: r148:VNx2DF=float_extend(r139:VNx2SF)
>   REG_DEAD r139:VNx2SF
>27: 
> r150:VNx2DF={(unspec[const_vector,r153:DI,0x2,0x2,0x1,0x7,vl:SI,vtype:SI,N/A:SI]
>  69)?r148:VNx2DF*r149:VNx2DF:unspec[zero:SI] 68}
>   REG_DEAD r149:VNx2DF
>   REG_DEAD r148:VNx2DF
>   REG_DEAD N/A:SI
>   REG_DEAD zero:SI
>   REG_EQUAL r148:VNx2DF*r149:VNx2DF
> Successfully matched this instruction:
> (set (reg:VNx2DF 150 [ vect__11.50 ])
> (if_then_else:VNx2DF (unspec:VNx2BI [
> (const_vector:VNx2BI repeat [
> (const_int 1 [0x1])
> ])
> (reg:DI 153)
> (const_int 2 [0x2]) repeated x2
> (const_int 1 [0x1])
> (const_int 7 [0x7])
> (reg:SI 66 vl)
> (reg:SI 67 vtype)
> (reg:SI 69 N/A)
> ] UNSPEC_VPREDICATE)
> (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 141 [ vect__4.44 ]))
> (float_extend:VNx2DF (reg:VNx2SF 139 [ vect__7.48 ])))
> (unspec:VNx2DF [
> (reg:SI 0 zero)
> ] UNSPEC_VUNDEF)))
> allowing combination of insns 24, 25 and 27
> original costs 4 + 4 + 4 = 12
> replacement cost 4
 
Note how it replaced both operands of the mult with extended versions 
and the pattern matches, as expected.
 
The point being that I don't think those helper patterns are needed to 
handle the problem you suggested they were there to handle.  Combine 
knows how to handle multiple substitutions just fine.
 
Right now I don't see a need for this patch.
 
 
 
Jeff

Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-29 Thread Jeff Law via Gcc-patches





On 6/28/23 16:00, 钟居哲 wrote:

You can see here:

https://godbolt.org/z/d78646hWb 

The first case can't generate vfwmul.vv but the second case succeeds.

Failed to match this instruction:
(set (reg:VNx2DF 150 [ vect__11.50 ])
     (if_then_else:VNx2DF (unspec:VNx2BI [
                 (const_vector:VNx2BI repeat [
                         (const_int 1 [0x1])
                     ])
                 (reg:DI 153)
                 (const_int 2 [0x2]) repeated x2
                 (const_int 1 [0x1])
                 (const_int 7 [0x7])
                 (reg:SI 66 vl)
                 (reg:SI 67 vtype)
                 (reg:SI 69 N/A)
             ] UNSPEC_VPREDICATE)
         (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 149 [ vect__5.45 ]))
             (reg:VNx2DF 148 [ vect__8.49 ]))
         (unspec:VNx2DF [
                 (reg:SI 0 zero)
             ] UNSPEC_VUNDEF)))

Right.  We try combining:
  24 -> 27
  25 -> 27
  23, 24 -> 27
  22, 25 -> 27

All of which fail, as expected.  24 -> 27 and 25-> 27 only put an 
extension on one operand of the mult.  The other two try to substitute a 
float extend of an if-then-else which I fully expect to fail.  All as 
expected.


The next one that gets tried is:


Trying 25, 24 -> 27:
   25: r149:VNx2DF=float_extend(r141:VNx2SF)
  REG_DEAD r141:VNx2SF
   24: r148:VNx2DF=float_extend(r139:VNx2SF)
  REG_DEAD r139:VNx2SF
   27: 
r150:VNx2DF={(unspec[const_vector,r153:DI,0x2,0x2,0x1,0x7,vl:SI,vtype:SI,N/A:SI]
 69)?r148:VNx2DF*r149:VNx2DF:unspec[zero:SI] 68}
  REG_DEAD r149:VNx2DF
  REG_DEAD r148:VNx2DF
  REG_DEAD N/A:SI
  REG_DEAD zero:SI
  REG_EQUAL r148:VNx2DF*r149:VNx2DF
Successfully matched this instruction:
(set (reg:VNx2DF 150 [ vect__11.50 ])
(if_then_else:VNx2DF (unspec:VNx2BI [
(const_vector:VNx2BI repeat [
(const_int 1 [0x1])
])
(reg:DI 153)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(const_int 7 [0x7])
(reg:SI 66 vl)
(reg:SI 67 vtype)
(reg:SI 69 N/A)
] UNSPEC_VPREDICATE)
(mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 141 [ vect__4.44 ]))
(float_extend:VNx2DF (reg:VNx2SF 139 [ vect__7.48 ])))
(unspec:VNx2DF [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)))
allowing combination of insns 24, 25 and 27
original costs 4 + 4 + 4 = 12
replacement cost 4


Note how it replaced both operands of the mult with extended versions 
and the pattern matches, as expected.


The point being that I don't think those helper patterns are needed to 
handle the problem you suggested they were there to handle.  Combine 
knows how to handle multiple substitutions just fine.


Right now I don't see a need for this patch.



Jeff

Re: [PATCH] c++: fix up caching of level lowered ttps

2023-06-29 Thread Jason Merrill via Gcc-patches


On 6/1/23 17:42, Patrick Palka wrote:

Due to level/depth mismatches between the template parameters of a level
lowered ttp and the original ttp, the ttp comparison check added by
r14-418-g0bc2a1dc327af9 never actually holds outside of erroneous cases.
Moreover, it'd be good to cache the overall TEMPLATE_TEMPLATE_PARM
instead of just the corresponding TEMPLATE_PARM_INDEX.

It's tricky to cache all level lowered ttps since the result of level
lowering may depend on more than just the depth of the arguments, e.g.
for TT in

   template
   struct A
   {
 template class TT>
 void f();
   }

the substitution T=int yields a different level-lowered ttp than T=char.
But these kinds of ttps seem to be rare in practice, and "simple" ttps
that don't depend on outer template parameters are easy enough to
cache like so.  Unfortunately, this means we're back to expecting a
duplicate error in nontype12.C again since the ttp in question is
not "simple" so caching of the (erroneous) lowered ttp doesn't happen.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  This reduces memory usage of range-v3's zip.cpp by 1%.

gcc/cp/ChangeLog:

* cp-tree.h (TEMPLATE_PARM_DESCENDANTS): Harden.
(TEMPLATE_TYPE_DESCENDANTS): Define.
(TEMPLATE_TEMPLATE_PARM_SIMPLE_P): Define.
* pt.cc (reduce_template_parm_level): Revert
r14-418-g0bc2a1dc327af9 change.
(process_template_parm): Set TEMPLATE_TEMPLATE_PARM_SIMPLE_P
appropriately.
(uses_outer_template_parms): Determine the outer depth of
a template template parm without relying on DECL_CONTEXT.
(tsubst) : Cache lowering a
simple template template parm.  Consistently use 'code'.

gcc/testsuite/ChangeLog:

* g++.dg/template/nontype12.C: Expect a duplicate error again.
---
  gcc/cp/cp-tree.h  | 10 +-
  gcc/cp/pt.cc  | 37 +--
  gcc/testsuite/g++.dg/template/nontype12.C |  3 +-
  3 files changed, 31 insertions(+), 19 deletions(-)

+++ b/gcc/testsuite/g++.dg/template/nontype12.C
@@ -4,8 +4,7 @@
  template struct A
  {
template int foo();// { dg-error "double" "" { 
target c++17_down } }
-  template class> int bar();// { dg-bogus 
{double.*C:7:[^\n]*double} }


Let's xfail the duplicate error rather than remove the dg-bogus.  OK 
with that change.


Jason

Re: [PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector

2023-06-29 Thread Jason Merrill via Gcc-patches


On 6/28/23 09:41, Tamar Christina wrote:

Hi All,

FORTRAN currently has a pragma NOVECTOR for indicating that vectorization should
not be applied to a particular loop.

ICC/ICX also has such a pragma for C and C++ called #pragma novector.

As part of this patch series I need a way to easily turn off vectorization of
particular loops, particularly for testsuite reasons.

This patch proposes a #pragma GCC novector that does the same for C and C++
as gfortran does for FORTRAN and what ICC/ICX does for C and C++.

I added only some basic tests here, but the next patch in the series uses this
in the testsuite in about 800 tests.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/c-family/ChangeLog:

* c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
* c-pragma.cc (init_pragma): Use it.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_while_statement, c_parser_do_statement,
c_parser_for_statement, c_parser_statement_after_labels,
c_parse_pragma_novector, c_parser_pragma): Wire through novector and
default to false.


I'll let the C maintainers review the C changes.


gcc/cp/ChangeLog:

* cp-tree.def (RANGE_FOR_STMT): Update comment.
* cp-tree.h (RANGE_FOR_NOVECTOR): New.
(cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Add novector param.
* init.cc (build_vec_init): Default novector to false.
* method.cc (build_comparison_op): Likewise.
* parser.cc (cp_parser_statement): Likewise.
(cp_parser_for, cp_parser_c_for, cp_parser_range_for,
cp_convert_range_for, cp_parser_iteration_statement,
cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
(cp_parser_pragma_novector): New.
* pt.cc (tsubst_expr): Likewise.
* semantics.cc (finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Likewise.

gcc/ChangeLog:

* doc/extend.texi: Document it.
* tree-core.h (struct tree_base): Add lang_flag_7 and reduce spare0.
* tree.h (TREE_LANG_FLAG_7): New.


This doesn't seem necessary; I think only flags 1 and 6 are currently 
used in RANGE_FOR_STMT.



gcc/testsuite/ChangeLog:

* g++.dg/vect/vect-novector-pragma.cc: New test.
* gcc.dg/vect/vect-novector-pragma.c: New test.

--- inline copy of patch --
...
@@ -13594,7 +13595,8 @@ cp_parser_condition (cp_parser* parser)
 not included. */
  
  static tree

-cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)
+cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll,
+  bool novector)


I wonder about combining the ivdep and novector parameters here and in 
other functions?  Up to you.



@@ -49613,17 +49633,33 @@ cp_parser_pragma (cp_parser *parser, enum 
pragma_context context, bool *if_p)
break;
  }
const bool ivdep = cp_parser_pragma_ivdep (parser, pragma_tok);
-   unsigned short unroll;
+   unsigned short unroll = 0;
+   bool novector = false;
cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
-   if (tok->type == CPP_PRAGMA
-   && cp_parser_pragma_kind (tok) == PRAGMA_UNROLL)
+
+   while (tok->type == CPP_PRAGMA)
  {
-   tok = cp_lexer_consume_token (parser->lexer);
-   unroll = cp_parser_pragma_unroll (parser, tok);
-   tok = cp_lexer_peek_token (the_parser->lexer);
+   switch (cp_parser_pragma_kind (tok))
+ {
+   case PRAGMA_UNROLL:
+ {
+   tok = cp_lexer_consume_token (parser->lexer);
+   unroll = cp_parser_pragma_unroll (parser, tok);
+   tok = cp_lexer_peek_token (the_parser->lexer);
+   break;
+ }
+   case PRAGMA_NOVECTOR:
+ {
+   tok = cp_lexer_consume_token (parser->lexer);
+   novector = cp_parser_pragma_novector (parser, tok);
+   tok = cp_lexer_peek_token (the_parser->lexer);
+   break;
+ }
+   default:
+ gcc_unreachable ();
+ }
  }


Repeating this pattern three times for the three related pragmas is too 
much; please combine the three cases into one.
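
One possible shape for such a combined helper is sketched below. It is a toy model, not the C++ front-end API: `ToyPragma`, `ToyToken`, `LoopPragmas` and `parse_loop_pragmas` are hypothetical names. The idea is a single loop that accepts the loop pragmas in any order and returns all flags in one struct, which would also let the separate ivdep/novector parameters travel together.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy token kinds; NotAPragma stands for whatever token starts the loop itself.
enum class ToyPragma { Ivdep, Unroll, Novector, NotAPragma };

struct ToyToken {
  ToyPragma kind;
  unsigned short arg;  // only meaningful for Unroll
};

// All loop annotations gathered in one place instead of separate parameters.
struct LoopPragmas {
  bool ivdep = false;
  unsigned short unroll = 0;
  bool novector = false;
};

// Consume a run of loop pragmas starting at `pos`, in any order, and stop at
// the first token that is not one of them.  `pos` is left on that token.
static LoopPragmas parse_loop_pragmas(const std::vector<ToyToken> &toks,
                                      std::size_t &pos) {
  assert(pos <= toks.size());
  LoopPragmas result;
  while (pos < toks.size()) {
    const ToyToken &tok = toks[pos];
    switch (tok.kind) {
      case ToyPragma::Ivdep:
        result.ivdep = true;
        break;
      case ToyPragma::Unroll:
        result.unroll = tok.arg;
        break;
      case ToyPragma::Novector:
        result.novector = true;
        break;
      case ToyPragma::NotAPragma:
        return result;
    }
    ++pos;
  }
  return result;
}
```

With this shape, each of the three pragma entry points would call the same helper instead of carrying its own copy of the loop.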


Jason

Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-29 Thread Carl Love via Gcc-patches

Kewen:

On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote:
> > Yea, I was going with a runnable test and didn't include the
> > instruction counts.  Added back in.  Rather than doing by processor
> > version (P8, P9, P10) I was able to do it by BE/LE.  The
> > instruction
> > counts were the same for LE across processor versions but there
> > are a
> > few instruction counts that vary with BE and LE.
> 
> But the original test case only checks for cpu-types (processor
> version)
> but not for endianness, it means for the bif usages, there should not
> be
> different for endianness.  Why does this changes with your new test
> case?
> Could you have a further look and make it consistent with some
> adjustment
> if possible?  As we know, checking insn counts sometimes are fragile,
> so
> I think we should try our best to make it as robust as possible in
> the
> first place.
> 
> Besides, the original case also have some differences between p7/p8
> and
> p9.
>   

There are differences on P8 LE versus BE.  I did a diff between the P8
and P9 tests:

 diff vsx-vector-6.p8.c vsx-vector-6.p9.c
3,4c3,4
< /* { dg-require-effective-target powerpc_p8vector_ok } */
< /* { dg-options "-O2 -mdejagnu-cpu=power8" } */
---
> /* { dg-require-effective-target powerpc_p9vector_ok } */
> /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
12c12
< /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
---
> /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
23d22
< /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
37c36
< /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */
---
> /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */

So we can see the vperm, vpermr, xxpermr, xvmsubadp, xvmsubmdp,
xvsubdp, xvmsubadp, xvmsubmdp instruction count checks are different
between the two architectures.  I then wrote a script to compile the
CPU specific test on Power 8, Power 9 and Power 10 architectures and
then grep for the above list of instructions.  If I run the script on P8
BE and LE I get

              Power 8 BE    Power 8 LE  Power 9 LE  Power 9 BE  Power 10 LE*
              (makalu-lp1)  (genoa)     (marlin)    (nilram)    (ltcd97-lp3)
instruction   count         count       count       count       count
vperm         1             1           0           0           0
vpermr        0             0           0           0           0
xxpermr       0             0           1           0           1
xvmsubadp     1             0           1           1           1
xvmsubmdp     0             1           0           0           0
xvsubdp       1             1           1           1           1

From the diff we see

  { dg-final {scan-assembler-times {\mxvmsub[am]dp\M} 1 } }

This test picks up the correct subtraction instruction for LE versus BE
so this "masks" the LE/BE difference.  I changed the check in
vsx-vector-6-func-3op.c to match.  This eliminates the LE and BE checks and
reduces the number of specific checks.
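
To illustrate why a single character-class pattern hides the LE/BE
difference, here is a minimal sketch.  It is not part of the testsuite: it
uses std::regex, whose \b word boundary stands in for the \m and \M anchors
in the dg-final pattern, and the two assembly fragments are hypothetical
strings, not real compiler output.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Count whole-word matches of `pattern` in `assembly`.
// ECMAScript \b is used here in place of Tcl's \m / \M word anchors.
int count_insns(const std::string &assembly, const std::string &pattern)
{
  const std::regex re("\\b(?:" + pattern + ")\\b");
  auto first = std::sregex_iterator(assembly.begin(), assembly.end(), re);
  return static_cast<int>(std::distance(first, std::sregex_iterator()));
}

// Hypothetical fragments: one uses the "a" form, the other the "m" form.
const std::string be_like = "\txvmsubadp 34,35,36\n\txvsubdp 34,34,35\n";
const std::string le_like = "\txvmsubmdp 34,35,36\n\txvsubdp 34,34,35\n";
```

With the pattern xvmsub[am]dp both fragments give a count of 1, whereas the
exact name xvmsubadp gives 1 for the first fragment and 0 for the second.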

In vsx-vector-6-func-3op.c, the new test checks the counts for
xxpermdi, which the original test does not check.  The checks for
xxpermdi are not needed, as they are not directly related to the builtin
tests, so I removed them.

Looking at the LE/BE checks in the other test file,
vsx-vector-6-func-2op.c, instructions xvmaxsp, xvminsp and xvmaxdp were not
checked in the original test.  The functions where these instructions are
used get inlined.  On LE, the instructions show up in the inlined code as
well as in what appears to be the binary for the original, non-inlined
function.  As best I can see, the binary for the original function is dead
code; I don't see any calls to it.  It seems like it shouldn't be there,
as removing it would make the binary smaller.  On BE, I don't see the binary
for the original non-inlined function.

I had played with putting -Wno-inline on the command line but that
didn't seem to make any difference.  However, your suggestion of
__attribute__ ((noipa)) does prevent the inlining and we don't get the
second copy of the instructions showing up.  Preventing the inlining
eliminated the LE/BE differences for xvmaxsp, xvminsp and xvmaxdp.

The instruction count test for xxlor in vsx-vector-6-func-2lop.c
differs on LE and BE vsx-vector-6-func-2op.c.  I believe the
instruction is used with loads to reorder the data.  I don't see any way
to get around the extra xxlor instructions and verify the vec_or
builtin test generates the instruction.  

I was able to eliminate all of the LE/BE qualifiers in the instruction
counts with the exception of xxlor.  By using the same checks that look
for multiple versions of xvmsumb*, as was done in the original test, we
can also eliminate LE/BE specific tests and account for different
instructions across CPU versions.  We could go back to checking for
specific instructions being generated on Power 8, Power 9, Power 10 if
you prefer not using checks that cover multiple flavors of a given

Re: [PATCH v1] RISC-V: Refactor vxrm_mode attr for type attr equal

2023-06-29 Thread Jeff Law via Gcc-patches





On 6/29/23 00:00, pan2...@intel.com wrote:

From: Pan Li 

This patch refactors the vxrm_mode attr to remove the duplicated
eq_attr condition.  The common condition of the attr is extracted to one
place instead of many places.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/vector.md: Refactor the common condition.

OK
jeff

Extend ipa-fnsummary to skip __builtin_expect

2023-06-29 Thread Jan Hubicka via Gcc-patches

Compute ipa-predicates for conditionals involving __builtin_expect_p

std::vector allocator looks as follows:

__attribute__((nodiscard))
struct pair * std::__new_allocator 
>::allocate (struct __new_allocator * const this, size_type __n, const void * 
D.27753)
{
  bool _1;
  long int _2;
  long int _3;
  long unsigned int _5;
  struct pair * _9;

  <bb 2> [local count: 1073741824]:
  _1 = __n_7(D) > 1152921504606846975;
  _2 = (long int) _1;
  _3 = __builtin_expect (_2, 0);
  if (_3 != 0)
    goto <bb 3>; [10.00%]
  else
    goto <bb 6>; [90.00%]

  <bb 3> [local count: 107374184]:
  if (__n_7(D) > 2305843009213693951)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 4> [local count: 53687092]:
  std::__throw_bad_array_new_length ();

  <bb 5> [local count: 53687092]:
  std::__throw_bad_alloc ();

  <bb 6> [local count: 966367641]:
  _5 = __n_7(D) * 8;
  _9 = operator new (_5);
  return _9;
}


So there is a check for the allocated block size being greater than max_size,
which is wrapped in __builtin_expect.  This makes ipa-fnsummary give up
analyzing predicates, and it will miss the fact that the two different calls
to __throw will be optimized out if __n is already smaller than
1152921504606846975, which it is after _M_check_len.

This patch extends ipa-fnsummary to understand functions that return their
parameter.
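
As a rough illustration of "looking through" a call that returns its
argument, here is a toy sketch.  The Def table, DefKind and find_param are
invented for this example and are not GCC's data structures; the comparison
step in the dump above is left out to keep it short.

```cpp
#include <vector>

// Toy stand-in for SSA definitions (illustrative only, not GCC types).
enum class DefKind { Param, Cast, Call };

struct Def {
  DefKind kind;
  int operand;       // table index of the defining operand (Cast, Call)
  bool returns_arg;  // for Call: the call is known to return its argument
};

// Walk from value `v` back towards a parameter, looking through casts and
// through calls that return their argument.  Returns the table index of the
// parameter, or -1 when the chain cannot be decomposed.
int find_param(const std::vector<Def> &defs, int v)
{
  while (v >= 0 && v < static_cast<int>(defs.size())) {
    const Def &d = defs[v];
    switch (d.kind) {
    case DefKind::Param:
      return v;
    case DefKind::Cast:
      v = d.operand;
      break;
    case DefKind::Call:
      if (!d.returns_arg)
        return -1;  // opaque call: give up, as before the patch
      v = d.operand;
      break;
    }
  }
  return -1;
}
```

In this model, a chain parameter -> cast -> call reaches the parameter only
when the call is marked as returning its argument; otherwise the walk gives
up.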

We still do not get the value range propagated since _M_check_len is not
inlined early and ipa-prop misses return functions, but we get closer :)

Bootstrapped/regtested x86_64-linux, committed.


gcc/ChangeLog:

PR tree-optimization/109849
* ipa-fnsummary.cc (decompose_param_expr): Skip
functions returning its parameter.
(set_cond_stmt_execution_predicate): Return early
if predicate was constructed.

gcc/testsuite/ChangeLog:

PR tree-optimization/109849
* gcc.dg/ipa/pr109849.c: New test.

diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
index 78cbb60d056..a09f6305c63 100644
--- a/gcc/ipa-fnsummary.cc
+++ b/gcc/ipa-fnsummary.cc
@@ -1516,6 +1516,19 @@ decompose_param_expr (struct ipa_func_body_info *fbi,
 
   if (TREE_CODE (expr) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (expr))
break;
+  stmt = SSA_NAME_DEF_STMT (expr);
+
+  if (gcall *call = dyn_cast <gcall *> (stmt))
+   {
+ int flags = gimple_call_return_flags (call);
+ if (!(flags & ERF_RETURNS_ARG))
+   goto fail;
+ int arg = flags & ERF_RETURN_ARG_MASK;
+ if (arg >= (int)gimple_call_num_args (call))
+   goto fail;
+ expr = gimple_call_arg (stmt, arg);
+ continue;
+   }
 
   if (!is_gimple_assign (stmt = SSA_NAME_DEF_STMT (expr)))
break;
@@ -1664,6 +1677,7 @@ set_cond_stmt_execution_predicate (struct 
ipa_func_body_info *fbi,
}
}
   vec_free (param_ops);
+  return;
 }
 
   if (TREE_CODE (op) != SSA_NAME)
diff --git a/gcc/testsuite/gcc.dg/ipa/pr109849.c 
b/gcc/testsuite/gcc.dg/ipa/pr109849.c
new file mode 100644
index 000..09b62f90d70
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr109849.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -fdump-ipa-inline-details" } */
+void bad (void);
+void
+test(int a)
+{
+   if (__builtin_expect (a>3, 0))
+   {
+   bad ();
+   bad ();
+   bad ();
+   bad ();
+   bad ();
+   bad ();
+   bad ();
+   bad ();
+   }
+}
+void
+foo (int a)
+{
+   if (a>0)
+   __builtin_unreachable ();
+   test (a);
+}
+/* { dg-final { scan-ipa-dump "Inlined 2 calls" "inline"  } } */
+/* { dg-final { scan-ipa-dump "Inlining test" "inline"  } } */

Re: [PATCH] analyzer: Fix regression bug after r14-1632-g9589a46ddadc8b [PR110198]

2023-06-29 Thread David Malcolm via Gcc-patches

On Thu, 2023-06-29 at 20:45 +0200, priour...@gmail.com wrote:
> From: benjamin priour 
> 
> See below formatting updates on my patch.
> In mail
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623140.html,
> David Malcolm says regtesting failed for him.
> 
> So I did it once more this morning rebased on fresh trunk dc93a0f633b
> and target x86_64-linux-gnu, the output was similar to last time:
> 
>     # from gcc_sources_testing 
>     gcc/contrib/compare_tests ../gcc_sources_control/build build
> 
>     # Comparing directories
>     ## Dir1=../gcc_sources_control/build: 8 sum files
>     ## Dir2=build: 8 sum files
>     
>     # Comparing 8 common sum files
>     ## /bin/sh gcc/contrib/compare_tests  /tmp/gxx-sum1.750468
> /tmp/gxx-sum2.750468
>     Tests that now work, but didn't before (3 tests):
> 
>     g++.dg/analyzer/pr100244.C  -std=c++14  (test for warnings, line
> 17)
>     g++.dg/analyzer/pr100244.C  -std=c++17  (test for warnings, line
> 17)
>     g++.dg/analyzer/pr100244.C  -std=c++20  (test for warnings, line
> 17)
>     
>     # No differences found in 8 common sum files
> 
> Can you confirm formatting of the patch below, and perhaps regtest it
> ?

Looks good to me; you can go ahead and push this to trunk.

Thanks
Dave

Re: [PATCH] testsuite: Use -fno-report-bug in gcc.dg/plugin/

2023-06-29 Thread David Malcolm via Gcc-patches

On Thu, 2023-06-29 at 15:03 -0400, Marek Polacek wrote:
> Certain downstream compilers (for example, in Fedora) default to
> -freport-bug.  The extra output breaks the following tests.  We can
> use
> -fno-report-bug to fix that.  Patch verified with:
> 
> $ make check RUNTESTFLAGS='--target_board=unix\{,-freport-bug\}
> plugin.exp'
> 
> Tested x86_64-pc-linux-gnu, ok for trunk/13?

Looks good to me; thanks

Dave

Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode

2023-06-29 Thread Thomas Schwinge

Hi!

On 2023-06-29T11:29:57+0200, I wrote:
> On 2023-06-21T15:58:24+0800, Pan Li via Gcc-patches  
> wrote:
>> We extend the machine mode from 8 to 16 bits already. But there still
>> one placing missing from the streamer. It has one hard coded array
>> for the machine code like size 256.
>>
>> In the lto pass, we memset the array by MAX_MACHINE_MODE count but the
>> value of the MAX_MACHINE_MODE will grow as more and more modes are
>> added. While the machine mode array in tree-streamer still leave 256 as is.
>>
>> Then, when the MAX_MACHINE_MODE is greater than 256, the memset of
>> lto_output_init_mode_table will touch the memory out of range unexpected.
>
> Uh.  :-O
>
>> This patch would like to take the MAX_MACHINE_MODE as the size of the
>> array in streamer, to make sure there is no potential unexpected
>> memory access in future. Meanwhile, this patch also adjust some place
>> which has MAX_MACHINE_MODE <= 256 assumption.
>
> Thanks to Jakub and Richard for guidance re the offloading compilation
> case, where we've got different 'MAX_MACHINE_MODE's between stream-out
> and stream-in, and a modes mapping table.
>
> However, with this patch, there are ICEs all over the place...  I'm
> having a look.

Your patch has all the right ideas, there are just a few additional
changes necessary.  Please merge in the attached
"f into Streamer: Fix out of range memory access of machine mode", with
'Co-authored-by: Thomas Schwinge '.  This has
already survived compiler-side 'lto.exp' testing and
'check-target-libgomp' with Nvidia GPU offloading; AMD GPU testing is now
running (not expecting any bad surprises).  Will let you know by (my)
tomorrow morning in case there are any more problems.

Explanation:

>> --- a/gcc/lto-streamer-in.cc
>> +++ b/gcc/lto-streamer-in.cc
>> @@ -1985,8 +1985,6 @@ lto_input_mode_table (struct lto_file_decl_data 
>> *file_data)
>>  internal_error ("cannot read LTO mode table from %s",
>>   file_data->file_name);
>>
>> -  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
>> -  file_data->mode_table = table;
>>const struct lto_simple_header_with_strings *header
>>  = (const struct lto_simple_header_with_strings *) data;
>>int string_offset;
>> @@ -1998,16 +1996,22 @@ lto_input_mode_table (struct lto_file_decl_data 
>> *file_data)
>>   header->string_size, vNULL);
>>bitpack_d bp = streamer_read_bitpack ();
>>
>> +  unsigned mode_bits = bp_unpack_value (&bp, 5);
>> +  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 
>> mode_bits);
>> +
>> +  file_data->mode_table = table;
>> +  file_data->mode_bits = mode_bits;

Here, we set 'file_data->mode_bits' for the offloading case (where
'lto_input_mode_table' is called) -- but it's not set for the
non-offloading case (where 'lto_input_mode_table' isn't called).  (See my
'gcc/lto/lto-common.cc:lto_read_decls' change.)  That's "not currently a
problem", as 'file_data->mode_bits' isn't used anywhere...

>> --- a/gcc/lto-streamer.h
>> +++ b/gcc/lto-streamer.h
>> @@ -604,6 +604,8 @@ struct GTY(()) lto_file_decl_data
>>int order_base;
>>
>>int unit_base;
>> +
>> +  unsigned mode_bits;
>>  };

>>  inline machine_mode
>>  bp_unpack_machine_mode (struct bitpack_d *bp)
>>  {
>> -  return (machine_mode)
>> -((class lto_input_block *)
>> - bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
>> +  int last = 1 << ceil_log2 (MAX_MACHINE_MODE);
>> +  lto_input_block *input_block = (class lto_input_block *) bp->stream;
>> +  int index = bp_unpack_enum (bp, machine_mode, last);
>> +
>> +  return (machine_mode) input_block->mode_table[index];
>>  }

..., but 'file_data->mode_bits' needs to be considered here, in the
stream-in for offloading, where 'file_data->mode_bits' -- that is, the
host 'MAX_MACHINE_MODE' -- very likely is different from the offload
device 'MAX_MACHINE_MODE'.

Easiest is in 'gcc/lto-streamer.h:class lto_input_block' to capture
'lto_file_decl_data *file_data' instead of just
'unsigned char *mode_table', and adjust all users.
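
To make the width mismatch concrete, here is a small self-contained sketch.
It is a toy model, not GCC's bitpack code: ceil_log2_sketch, BitStream and
second_mode are names made up for this example, and only the 5-bit width
header is taken from the patch above.

```cpp
#include <cstdint>

// Smallest n such that (1 << n) >= x, for x >= 1.  Written out here as a
// stand-in for ceil_log2 so the sketch is self-contained.
unsigned ceil_log2_sketch(std::uint64_t x)
{
  unsigned n = 0;
  while ((std::uint64_t{1} << n) < x)
    ++n;
  return n;
}

// Toy bit stream: fields are appended and consumed least-significant first.
struct BitStream {
  std::uint64_t word = 0;
  unsigned write_pos = 0;
  unsigned read_pos = 0;

  void pack(std::uint64_t value, unsigned bits) {
    word |= value << write_pos;
    write_pos += bits;
  }
  std::uint64_t unpack(unsigned bits) {
    std::uint64_t v = (word >> read_pos) & ((std::uint64_t{1} << bits) - 1);
    read_pos += bits;
    return v;
  }
};

// Write a 5-bit width header followed by two mode indices of that width,
// then read the second index back.  reader_bits == 0 means "use the width
// from the header"; any other value models a reader that assumes its own
// width instead.
std::uint64_t second_mode(unsigned writer_max_mode, unsigned reader_bits,
                          std::uint64_t m0, std::uint64_t m1)
{
  const unsigned writer_bits = ceil_log2_sketch(writer_max_mode);
  BitStream s;
  s.pack(writer_bits, 5);
  s.pack(m0, writer_bits);
  s.pack(m1, writer_bits);
  const unsigned header = static_cast<unsigned>(s.unpack(5));
  const unsigned bits = reader_bits ? reader_bits : header;
  s.unpack(bits);         // first index, discarded
  return s.unpack(bits);  // second index
}
```

For example, with a hypothetical writer that has 300 modes (9 bits per index)
and a reader that assumes 7 bits, the second index no longer round-trips;
reading with the width stored in the header does.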

That's it.  :-)

>> --- a/gcc/tree-streamer.h
>> +++ b/gcc/tree-streamer.h

>> @@ -108,15 +108,19 @@ inline void
>>  bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
>>  {
>>streamer_mode_table[mode] = 1;
>> -  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
>> +  int last = 1 << ceil_log2 (MAX_MACHINE_MODE);
>> +
>> +  bp_pack_enum (bp, machine_mode, last, mode);
>>  }

That use of 'MAX_MACHINE_MODE' is safe, as that only concerns the
stream-out phase.

>> --- a/gcc/tree-streamer.cc
>> +++ b/gcc/tree-streamer.cc
>> @@ -35,7 +35,7 @@ along with GCC; see the file COPYING3.  If not see
>> During streaming in, we translate the on the disk mode using this
>> table.  For normal LTO it is set to identity, for ACCEL_COMPILER
>> depending on the mode_table content.  */
>> -unsigned char streamer_mode_table[1 << 8];
>> +unsigned char streamer_mode_table[MAX_MACHINE_MODE];

Likewise.


Grüße
 Thomas

Re: [PATCH] i386: add -fno-stack-protector to two tests

2023-06-29 Thread Marek Polacek via Gcc-patches

On Fri, Jun 30, 2023 at 04:11:44AM +0800, Xi Ruoyao wrote:
> On Fri, 2023-06-30 at 04:08 +0800, Xi Ruoyao wrote:
> > On Thu, 2023-06-29 at 16:01 -0400, Marek Polacek via Gcc-patches wrote:
> > > These tests fail when the testsuite is executed with -fstack-
> > > protector-strong.
> > > To avoid this, this patch adds -fno-stack-protector to dg-options.
> > > 
> > > Tested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > LGTM, we've noticed these two failures in Linux From Scratch [1].  But
> > this is not an approval because I'm not a maintainer.

Thanks.

> And can we backport them to gcc-13 branch too?  These two tests were
> added in the cycle of GCC 13, so we could consider the failures
> "regression".

Yeah, it would be good for Fedora gcc as well.  I'll put the patch in 13
as well.
 
Marek

Re: [PATCH] i386: add -fno-stack-protector to two tests

2023-06-29 Thread Jakub Jelinek via Gcc-patches

On Fri, Jun 30, 2023 at 04:11:44AM +0800, Xi Ruoyao via Gcc-patches wrote:
> On Fri, 2023-06-30 at 04:08 +0800, Xi Ruoyao wrote:
> > On Thu, 2023-06-29 at 16:01 -0400, Marek Polacek via Gcc-patches wrote:
> > > These tests fail when the testsuite is executed with -fstack-
> > > protector-strong.
> > > To avoid this, this patch adds -fno-stack-protector to dg-options.
> > > 
> > > Tested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > LGTM, we've noticed these two failures in Linux From Scratch [1].  But
> > this is not an approval because I'm not a maintainer.
> 
> And can we backport them to gcc-13 branch too?  These two tests were
> added in the cycle of GCC 13, so we could consider the failures
> "regression".

It is ok even for 13 branch.

Jakub

Re: [PATCH] i386: add -fno-stack-protector to two tests

2023-06-29 Thread Xi Ruoyao via Gcc-patches

On Fri, 2023-06-30 at 04:08 +0800, Xi Ruoyao wrote:
> On Thu, 2023-06-29 at 16:01 -0400, Marek Polacek via Gcc-patches wrote:
> > These tests fail when the testsuite is executed with -fstack-
> > protector-strong.
> > To avoid this, this patch adds -fno-stack-protector to dg-options.
> > 
> > Tested on x86_64-pc-linux-gnu, ok for trunk?
> 
> LGTM, we've noticed these two failures in Linux From Scratch [1].  But
> this is not an approval because I'm not a maintainer.

And can we backport them to gcc-13 branch too?  These two tests were
added in the cycle of GCC 13, so we could consider the failures
"regression".

> 
> [1]:https://www.linuxfromscratch.org/lfs/view/development/chapter08/gcc.html
> 
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/i386/pr104610.c: Use -fno-stack-protector.
> > * gcc.target/i386/pr69482-1.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.target/i386/pr104610.c  | 2 +-
> >  gcc/testsuite/gcc.target/i386/pr69482-1.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/gcc/testsuite/gcc.target/i386/pr104610.c
> > b/gcc/testsuite/gcc.target/i386/pr104610.c
> > index fe39cbe5b8a..5173fc8898c 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr104610.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr104610.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -mavx -mmove-max=256 -mstore-max=256" } */
> > +/* { dg-options "-O2 -mavx -mmove-max=256 -mstore-max=256 -fno-stack-
> > protector" } */
> >  /* { dg-final { scan-assembler-times {(?n)vptest.*ymm} 1 } } */
> >  /* { dg-final { scan-assembler-times {sete} 1 } } */
> >  /* { dg-final { scan-assembler-not {(?n)je.*L[0-9]} } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr69482-1.c
> > b/gcc/testsuite/gcc.target/i386/pr69482-1.c
> > index f192261b104..99bb6ad5a37 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr69482-1.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr69482-1.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O3" } */
> > +/* { dg-options "-O3 -fno-stack-protector" } */
> >  
> >  static inline void memset_s(void* s, int n) {
> >    volatile unsigned char * p = s;
> > 
> > base-commit: 070a6bf0bdc6761ad77ac97404c98f00a7007d54
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] i386: add -fno-stack-protector to two tests

2023-06-29 Thread Jakub Jelinek via Gcc-patches

On Thu, Jun 29, 2023 at 04:01:20PM -0400, Marek Polacek via Gcc-patches wrote:
> These tests fail when the testsuite is executed with -fstack-protector-strong.
> To avoid this, this patch adds -fno-stack-protector to dg-options.
> 
> Tested on x86_64-pc-linux-gnu, ok for trunk?
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/pr104610.c: Use -fno-stack-protector.
>   * gcc.target/i386/pr69482-1.c: Likewise.

Ok, thanks.

Jakub

Re: [PATCH] i386: add -fno-stack-protector to two tests

2023-06-29 Thread Xi Ruoyao via Gcc-patches

On Thu, 2023-06-29 at 16:01 -0400, Marek Polacek via Gcc-patches wrote:
> These tests fail when the testsuite is executed with -fstack-
> protector-strong.
> To avoid this, this patch adds -fno-stack-protector to dg-options.
> 
> Tested on x86_64-pc-linux-gnu, ok for trunk?

LGTM, we've noticed these two failures in Linux From Scratch [1].  But
this is not an approval because I'm not a maintainer.

[1]:https://www.linuxfromscratch.org/lfs/view/development/chapter08/gcc.html

> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/i386/pr104610.c: Use -fno-stack-protector.
> * gcc.target/i386/pr69482-1.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/i386/pr104610.c  | 2 +-
>  gcc/testsuite/gcc.target/i386/pr69482-1.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/pr104610.c
> b/gcc/testsuite/gcc.target/i386/pr104610.c
> index fe39cbe5b8a..5173fc8898c 100644
> --- a/gcc/testsuite/gcc.target/i386/pr104610.c
> +++ b/gcc/testsuite/gcc.target/i386/pr104610.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mavx -mmove-max=256 -mstore-max=256" } */
> +/* { dg-options "-O2 -mavx -mmove-max=256 -mstore-max=256 -fno-stack-
> protector" } */
>  /* { dg-final { scan-assembler-times {(?n)vptest.*ymm} 1 } } */
>  /* { dg-final { scan-assembler-times {sete} 1 } } */
>  /* { dg-final { scan-assembler-not {(?n)je.*L[0-9]} } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr69482-1.c
> b/gcc/testsuite/gcc.target/i386/pr69482-1.c
> index f192261b104..99bb6ad5a37 100644
> --- a/gcc/testsuite/gcc.target/i386/pr69482-1.c
> +++ b/gcc/testsuite/gcc.target/i386/pr69482-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3" } */
> +/* { dg-options "-O3 -fno-stack-protector" } */
>  
>  static inline void memset_s(void* s, int n) {
>    volatile unsigned char * p = s;
> 
> base-commit: 070a6bf0bdc6761ad77ac97404c98f00a7007d54

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] i386: add -fno-stack-protector to two tests

2023-06-29 Thread Marek Polacek via Gcc-patches

These tests fail when the testsuite is executed with -fstack-protector-strong.
To avoid this, this patch adds -fno-stack-protector to dg-options.

Tested on x86_64-pc-linux-gnu, ok for trunk?

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr104610.c: Use -fno-stack-protector.
* gcc.target/i386/pr69482-1.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr104610.c  | 2 +-
 gcc/testsuite/gcc.target/i386/pr69482-1.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr104610.c 
b/gcc/testsuite/gcc.target/i386/pr104610.c
index fe39cbe5b8a..5173fc8898c 100644
--- a/gcc/testsuite/gcc.target/i386/pr104610.c
+++ b/gcc/testsuite/gcc.target/i386/pr104610.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mavx -mmove-max=256 -mstore-max=256" } */
+/* { dg-options "-O2 -mavx -mmove-max=256 -mstore-max=256 
-fno-stack-protector" } */
 /* { dg-final { scan-assembler-times {(?n)vptest.*ymm} 1 } } */
 /* { dg-final { scan-assembler-times {sete} 1 } } */
 /* { dg-final { scan-assembler-not {(?n)je.*L[0-9]} } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr69482-1.c 
b/gcc/testsuite/gcc.target/i386/pr69482-1.c
index f192261b104..99bb6ad5a37 100644
--- a/gcc/testsuite/gcc.target/i386/pr69482-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr69482-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -fno-stack-protector" } */
 
 static inline void memset_s(void* s, int n) {
   volatile unsigned char * p = s;

base-commit: 070a6bf0bdc6761ad77ac97404c98f00a7007d54
-- 
2.41.0

[PATCH] testsuite: Use -fno-report-bug in gcc.dg/plugin/

2023-06-29 Thread Marek Polacek via Gcc-patches

Certain downstream compilers (for example, in Fedora) default to
-freport-bug.  The extra output breaks the following tests.  We can use
-fno-report-bug to fix that.  Patch verified with:

$ make check RUNTESTFLAGS='--target_board=unix\{,-freport-bug\} plugin.exp'

Tested x86_64-pc-linux-gnu, ok for trunk/13?

gcc/testsuite/ChangeLog:

* gcc.dg/plugin/crash-test-ice-sarif.c: Use -fno-report-bug.  Adjust
scan-sarif-file.
* gcc.dg/plugin/crash-test-ice-stderr.c: Use -fno-report-bug.
* gcc.dg/plugin/crash-test-write-though-null-sarif.c: Use
-fno-report-bug.  Adjust scan-sarif-file.
* gcc.dg/plugin/crash-test-write-though-null-stderr.c: Use
-fno-report-bug.
---
 gcc/testsuite/gcc.dg/plugin/crash-test-ice-sarif.c | 3 ++-
 gcc/testsuite/gcc.dg/plugin/crash-test-ice-stderr.c| 1 +
 .../gcc.dg/plugin/crash-test-write-though-null-sarif.c | 3 ++-
 .../gcc.dg/plugin/crash-test-write-though-null-stderr.c| 1 +
 4 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/plugin/crash-test-ice-sarif.c 
b/gcc/testsuite/gcc.dg/plugin/crash-test-ice-sarif.c
index 3b773a9a84c..84a4347a17e 100644
--- a/gcc/testsuite/gcc.dg/plugin/crash-test-ice-sarif.c
+++ b/gcc/testsuite/gcc.dg/plugin/crash-test-ice-sarif.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-fdiagnostics-format=sarif-file" } */
+/* { dg-additional-options "-fno-report-bug" } */
 
 extern void inject_ice (void);
 
@@ -56,7 +57,7 @@ void test_inject_ice (void)
  { dg-final { scan-sarif-file "\"contextRegion\": " } }
  { dg-final { scan-sarif-file "\"artifactLocation\": " } }
  { dg-final { scan-sarif-file "\"region\": " } }
-   { dg-final { scan-sarif-file "\"startLine\": 8" } }
+   { dg-final { scan-sarif-file "\"startLine\": 9" } }
{ dg-final { scan-sarif-file "\"startColumn\": 3" } }
{ dg-final { scan-sarif-file "\"endColumn\": 16" } }
  { dg-final { scan-sarif-file "\"message\": " } }
diff --git a/gcc/testsuite/gcc.dg/plugin/crash-test-ice-stderr.c 
b/gcc/testsuite/gcc.dg/plugin/crash-test-ice-stderr.c
index cee701b135c..0064d3bc447 100644
--- a/gcc/testsuite/gcc.dg/plugin/crash-test-ice-stderr.c
+++ b/gcc/testsuite/gcc.dg/plugin/crash-test-ice-stderr.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-additional-options "-fno-report-bug" } */
 
 extern void inject_ice (void);
 
diff --git a/gcc/testsuite/gcc.dg/plugin/crash-test-write-though-null-sarif.c 
b/gcc/testsuite/gcc.dg/plugin/crash-test-write-though-null-sarif.c
index 57caa20155f..83b38d2ffb5 100644
--- a/gcc/testsuite/gcc.dg/plugin/crash-test-write-though-null-sarif.c
+++ b/gcc/testsuite/gcc.dg/plugin/crash-test-write-though-null-sarif.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-fdiagnostics-format=sarif-file" } */
+/* { dg-additional-options "-fno-report-bug" } */
 
 extern void inject_write_through_null (void);
 
@@ -56,7 +57,7 @@ void test_inject_write_through_null (void)
  { dg-final { scan-sarif-file "\"contextRegion\": " } }
  { dg-final { scan-sarif-file "\"artifactLocation\": " } }
  { dg-final { scan-sarif-file "\"region\": " } }
-   { dg-final { scan-sarif-file "\"startLine\": 8" } }
+   { dg-final { scan-sarif-file "\"startLine\": 9" } }
{ dg-final { scan-sarif-file "\"startColumn\": 3" } }
{ dg-final { scan-sarif-file "\"endColumn\": 31" } }
  { dg-final { scan-sarif-file "\"message\": " } }
diff --git a/gcc/testsuite/gcc.dg/plugin/crash-test-write-though-null-stderr.c 
b/gcc/testsuite/gcc.dg/plugin/crash-test-write-though-null-stderr.c
index 7b43e423633..a9a211a3b1f 100644
--- a/gcc/testsuite/gcc.dg/plugin/crash-test-write-though-null-stderr.c
+++ b/gcc/testsuite/gcc.dg/plugin/crash-test-write-though-null-stderr.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-additional-options "-fno-report-bug" } */
 
 extern void inject_write_through_null (void);
 

base-commit: 070a6bf0bdc6761ad77ac97404c98f00a7007d54
-- 
2.41.0

Re: [PATCH] analyzer: Fix regression bug after r14-1632-g9589a46ddadc8b [PR110198]

2023-06-29 Thread Benjamin Priour via Gcc-patches

From: benjamin priour 

See below formatting updates on my patch.
In mail https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623140.html,
David Malcolm says regtesting failed for him.

So I did it once more this morning rebased on fresh trunk dc93a0f633b
and target x86_64-linux-gnu, the output was similar to last time:

# from gcc_sources_testing 
gcc/contrib/compare_tests ../gcc_sources_control/build build

# Comparing directories
## Dir1=../gcc_sources_control/build: 8 sum files
## Dir2=build: 8 sum files

# Comparing 8 common sum files
## /bin/sh gcc/contrib/compare_tests  /tmp/gxx-sum1.750468 
/tmp/gxx-sum2.750468
Tests that now work, but didn't before (3 tests):

g++.dg/analyzer/pr100244.C  -std=c++14  (test for warnings, line 17)
g++.dg/analyzer/pr100244.C  -std=c++17  (test for warnings, line 17)
g++.dg/analyzer/pr100244.C  -std=c++20  (test for warnings, line 17)

# No differences found in 8 common sum files

Can you confirm the formatting of the patch below, and perhaps regtest it?

Thanks a lot, as this regression fix is now long overdue.
Benjamin. 

---

g++.dg/analyzer/pr100244.C was failing after a patch for PR109439.
The reason was a spurious early return from get_store_value upon an
out-of-bounds read, which prevented further checks.  Now, instead,
a boolean check_poisoned is set to false when an OOB read is detected,
and is later passed to get_or_create_initial_value.
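
The change in control flow can be sketched with a toy model.  The function
and field names below only mirror the description above; nothing here is the
analyzer's real API, and the string results are placeholders.

```cpp
#include <string>

// Toy result type: which kind of value was produced, and whether the code
// after the out-of-bounds check was reached.
struct Result {
  std::string value;
  bool later_checks_ran;
};

// Before: an out-of-bounds read returned immediately, skipping later checks.
Result get_store_value_before(bool oob, bool can_have_initial)
{
  if (oob)
    return {"unknown", false};
  if (!can_have_initial)
    return {"poisoned", true};
  return {"initial", true};
}

// After: an out-of-bounds read only clears a flag; the later lookup still
// runs and uses the flag to skip the poisoning check.
Result get_store_value_after(bool oob, bool can_have_initial)
{
  const bool check_poisoned = !oob;
  if (!can_have_initial && check_poisoned)
    return {"poisoned", true};
  return {"initial", true};
}
```

In this model the "after" variant always reaches the later lookup, and an
out-of-bounds read never yields a poisoned value.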

gcc/analyzer/ChangeLog:

* region-model-manager.cc
(region_model_manager::get_or_create_initial_value): Take an
optional boolean value to bypass poisoning checks.
* region-model-manager.h: Update declaration of the above function.
* region-model.cc (region_model::get_store_value): No longer returns
on OOB, but rather gives a boolean to get_or_create_initial_value.
(region_model::check_region_access): Update docstring.
(region_model::check_region_for_write): Update docstring.

Signed-off-by: benjamin priour 
---
 gcc/analyzer/region-model-manager.cc |  5 +++--
 gcc/analyzer/region-model-manager.h  |  3 ++-
 gcc/analyzer/region-model.cc | 15 ---
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/gcc/analyzer/region-model-manager.cc 
b/gcc/analyzer/region-model-manager.cc
index 1453acf7bc9..4f11ef4bd29 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -293,9 +293,10 @@ region_model_manager::create_unique_svalue (tree type)
necessary.  */
 
 const svalue *
-region_model_manager::get_or_create_initial_value (const region *reg)
+region_model_manager::get_or_create_initial_value (const region *reg,
+  bool check_poisoned)
 {
-  if (!reg->can_have_initial_svalue_p ())
+  if (!reg->can_have_initial_svalue_p () && check_poisoned)
 return get_or_create_poisoned_svalue (POISON_KIND_UNINIT,
  reg->get_type ());
 
diff --git a/gcc/analyzer/region-model-manager.h 
b/gcc/analyzer/region-model-manager.h
index 3340c3ebd1e..ff5333bf07c 100644
--- a/gcc/analyzer/region-model-manager.h
+++ b/gcc/analyzer/region-model-manager.h
@@ -49,7 +49,8 @@ public:
 tree type);
   const svalue *get_or_create_poisoned_svalue (enum poison_kind kind,
   tree type);
-  const svalue *get_or_create_initial_value (const region *reg);
+  const svalue *get_or_create_initial_value (const region *reg,
+bool check_poisoned = true);
   const svalue *get_ptr_svalue (tree ptr_type, const region *pointee);
   const svalue *get_or_create_unaryop (tree type, enum tree_code op,
   const svalue *arg);
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 6bc60f89f3d..187013a37cc 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -2373,8 +2373,9 @@ region_model::get_store_value (const region *reg,
   if (reg->empty_p ())
 return m_mgr->get_or_create_unknown_svalue (reg->get_type ());
 
+  bool check_poisoned = true;
   if (check_region_for_read (reg, ctxt))
-return m_mgr->get_or_create_unknown_svalue(reg->get_type());
+check_poisoned = false;
 
   /* Special-case: handle var_decls in the constant pool.  */
   if (const decl_region *decl_reg = reg->dyn_cast_decl_region ())
@@ -2427,7 +2428,7 @@ region_model::get_store_value (const region *reg,
   == RK_GLOBALS)
 return get_initial_value_for_global (reg);
 
-  return m_mgr->get_or_create_initial_value (reg);
+  return m_mgr->get_or_create_initial_value (reg, check_poisoned);
 }
 
 /* Return false if REG does not exist, true if it may do.
@@ -2790,7 +2791,7 @@ region_model::get_string_size (const region *reg) const
 
 /* If CTXT is non-NULL, use it to warn about any problems accessing REG,
using DIR to

Re: [PATCH] aarch64: Remove architecture dependencies from intrinsics

2023-06-29 Thread Andrew Carlotti via Gcc-patches

On Tue, Jun 27, 2023 at 07:23:32AM +0100, Richard Sandiford wrote:
> Andrew Carlotti via Gcc-patches  writes:
> > Many intrinsics currently depend on both an architecture version and a
> > feature, despite the corresponding instructions being available within
> > GCC at lower architecture versions.
> >
> > LLVM has already removed these explicit architecture version
> > dependences; this patch does the same for GCC, as well as removing an
> > unecessary simd dependency for the scalar fp16 intrinsics.
> >
> > Binutils does not support all of these architecture+feature combinations
> > yet, but this is an existing problem that is already reachable from GCC.
> > For example, compiling the test gcc.target/aarch64/usadv16qi-dotprod.c
> > with -O3 -march=armv8-a+dotprod has resulted in an assembler error since
> > GCC 10. I intend to patch this in binutils.
> >
> > This patch retains explicit architecture version dependencies for
> > features that do not currently have a separate feature flag.
> >
> > Ok for master, and backport to GCC 13?
> >
> > gcc/ChangeLog:
> >
> >  * config/aarch64/aarch64.h (TARGET_MEMTAG): Remove armv8.5
> >  dependency.
> >  * config/aarch64/arm_acle.h: Remove unnecessary armv8.x
> >  dependencies from target pragmas.
> >  * config/aarch64/arm_fp16.h (target): Likewise.
> 
> The change to this file is a bit different from the others,
> since it's removing an implicit dependency on +simd, rather
> than a dependency on an architecture level.  I think it'd be
> worth mentioning that explicitly in the changelog.
> 
> OK with that change, thanks.
> 
> (Arguably we should add +nosimd to many of the other pragmas in
> arm_acle.h, but that's logically a separate patch.)
> 
> Richard

Actually, I think I should just remove the +nosimd from the patch, because
+fp16 doesn't enable simd (unlike +bf16, which has simd as an 'explicit on'
implication).

Aside from +bf16, the only other feature with simd as an 'explicit on' is
+rdma. However, there appear to be no non-simd rdma instructions, so
+nothing+rdma+nosimd is effectively the same as +nothing.

> > ...
> >
> > diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
> > index 
> > a8fa4dbbdfe1bab4aa604bb311ef66d4e1de18ac..84b2ed66f9ba19fba6ccd8be33940d7239bfa22e
> >  100644
> > --- a/gcc/config/aarch64/arm_fp16.h
> > +++ b/gcc/config/aarch64/arm_fp16.h
> > @@ -30,7 +30,7 @@
> >  #include 
> >  
> >  #pragma GCC push_options
> > -#pragma GCC target ("arch=armv8.2-a+fp16")
> > +#pragma GCC target ("+nothing+fp16+nosimd")
> >  
> >  typedef __fp16 float16_t;
> >

Re: [PATCH] Relax type-printer regexp in libstdc++ test suite

2023-06-29 Thread Jonathan Wakely via Gcc-patches

On Thu, 29 Jun 2023 at 17:59, Tom Tromey  wrote:

> > Jonathan Wakely  writes:
>
> > Looks good. OK for trunk, and OK to backport after some soak time on
> > trunk. Thanks.
>
> AdaCore doesn't need a backport of this, and I don't think it's
> extremely important; so unless you want me to do it, I don't plan to.
>

OK, we can always backport it later if anybody else needs it.



> I did check it in on trunk earlier today.
>
>
Thanks.

Re: [PATCH] c++: unpropagated CONSTRUCTOR_MUTABLE_POISON [PR110463]

2023-06-29 Thread Jason Merrill via Gcc-patches


On 6/29/23 11:36, Marek Polacek wrote:

On Thu, Jun 29, 2023 at 11:22:55AM -0400, Patrick Palka via Gcc-patches wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13?

-- >8 --

cp_fold is neglecting to propagate CONSTRUCTOR_MUTABLE_POISON when folding
a CONSTRUCTOR initializer, which for the below testcase causes us to fail
to reject a mutable member access of a constexpr variable during constexpr
evaluation.


LGTM.


Agreed, OK.


PR c++/110463

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold) : Propagate
CONSTRUCTOR_MUTABLE_POISON.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-mutable6.C: New test.
---
  gcc/cp/cp-gimplify.cc  |  2 ++
  .../g++.dg/cpp0x/constexpr-mutable6.C  | 18 ++
  2 files changed, 20 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-mutable6.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 853b1e44236..f5734197774 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -3079,6 +3079,8 @@ cp_fold (tree x, fold_flags_t flags)
x = build_constructor (TREE_TYPE (x), nelts);
CONSTRUCTOR_PLACEHOLDER_BOUNDARY (x)
  = CONSTRUCTOR_PLACEHOLDER_BOUNDARY (org_x);
+   CONSTRUCTOR_MUTABLE_POISON (x)
+ = CONSTRUCTOR_MUTABLE_POISON (org_x);
  }
if (VECTOR_TYPE_P (TREE_TYPE (x)))
  x = fold (x);
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable6.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable6.C
new file mode 100644
index 000..2c946e388ab
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable6.C
@@ -0,0 +1,18 @@
+// PR c++/110463
+// { dg-do compile { target c++11 } }
+
+struct U {
+  mutable int x = 1;
+};
+
+struct V {
+  mutable int y = 1+1;
+};
+
+int main() {
+  constexpr U u = {};
+  constexpr int x = u.x; // { dg-error "mutable" }
+
+  constexpr V v = {};
+  constexpr int y = v.y; // { dg-error "mutable" }
+}
--
2.41.0.199.ga9e066fa63



Marek

Re: [PATCH] c++: NSDMI instantiation during overload resolution [PR110468]

2023-06-29 Thread Jason Merrill via Gcc-patches


On 6/29/23 11:22, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13/12?


OK.


-- >8 --

Here we find ourselves instantiating the NSDMI for A<1>::m when
computing argument conversions during overload resolution, and
thus tf_conv is set.  This causes mark_used for the constructor
used in the NSDMI to exit early and not instantiate its noexcept-spec,
leading to an ICE from nothrow_spec_p.

This patch fixes this by clearing any unusual tsubst flags during
instantiation of an NSDMI, since the result should be independent of
the context that requires the instantiation.
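
The masking step described above can be sketched outside the compiler as
follows.  This is only an illustration: the toy_* flag names and their bit
values are hypothetical stand-ins, not the real tsubst_flags_t definitions
from the C++ front end.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for a few tsubst flag bits; the real
// enumerators and values in GCC may differ.
enum toy_flags : std::uint32_t {
  toy_none = 0,
  toy_error = 1u << 0,
  toy_warning = 1u << 1,
  toy_conv = 1u << 2,  // set while computing argument conversions
  toy_warning_or_error = toy_error | toy_warning
};

// Keep only the diagnostic bits and drop context-specific ones such as
// toy_conv, so the instantiation behaves the same from any caller.
inline std::uint32_t mask_for_nsdmi (std::uint32_t complain)
{
  return complain & toy_warning_or_error;
}
```

With this mask, a caller that passes toy_conv alongside the diagnostic bits
gets the same instantiation behaviour as one that does not.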

PR c++/110468

gcc/cp/ChangeLog:

* init.cc (maybe_instantiate_nsdmi_init): Mask out all
tsubst flags except for tf_warning_or_error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept79.C: New test.
---
  gcc/cp/init.cc  |  4 
  gcc/testsuite/g++.dg/cpp0x/noexcept79.C | 18 ++
  2 files changed, 22 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept79.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index af6e30f511e..f01a11c5299 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -579,6 +579,10 @@ maybe_instantiate_nsdmi_init (tree member, tsubst_flags_t 
complain)
/* tsubst_decl uses void_node to indicate an uninstantiated DMI.  */
if (init == void_node)
  {
+  /* The result of NSDMI instantiation should be independent of
+the tsubst flags we're given.  */
+  complain &= tf_warning_or_error;
+
init = DECL_INITIAL (DECL_TI_TEMPLATE (member));
location_t expr_loc
= cp_expr_loc_or_loc (init, DECL_SOURCE_LOCATION (member));
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept79.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept79.C
new file mode 100644
index 000..d1f54d14431
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept79.C
@@ -0,0 +1,18 @@
+// PR c++/110468
+// { dg-do compile { target c++11 } }
+
+template
+struct variant {
+  variant() noexcept(T > 0);
+};
+
+template
+struct A {
+  variant m = {};
+};
+
+struct B {
+  B(A<1>);
+};
+
+B b = {{}};

Re: [PATCH] c++: Fix ICE with parameter pack of decltype(auto) [PR103497]

2023-06-29 Thread Jason Merrill via Gcc-patches


On 6/24/23 09:24, Nathaniel Shead wrote:

On Fri, Jun 23, 2023 at 11:59:51AM -0400, Patrick Palka wrote:

Hi,

On Sat, 22 Apr 2023, Nathaniel Shead via Gcc-patches wrote:


Bootstrapped and tested on x86_64-pc-linux-gnu.

-- 8< --

This patch raises an error early when the decltype(auto) specifier is
used as a parameter of a function. This prevents any issues with an
unexpected tree type later on when performing the call.


Thanks very much for the patch!  Some minor comments below.



PR 103497


We should include the bug component name when referring to the PR in the
commit message (i.e. PR c++/103497) so that upon pushing the patch the
post-commit hook automatically adds a comment to the PR referring to the
commit.  I could be wrong but AFAIK the hook only performs this when the
component name is included.


Thanks for the review! Fixed.



gcc/cp/ChangeLog:

* parser.cc (cp_parser_simple_type_specifier): Add check for
decltype(auto) as function parameter.

gcc/testsuite/ChangeLog:

* g++.dg/pr103497.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/parser.cc| 10 ++
  gcc/testsuite/g++.dg/pr103497.C |  7 +++
  2 files changed, 17 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/pr103497.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index e5f032f2330..1415e07e152 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -19884,6 +19884,16 @@ cp_parser_simple_type_specifier (cp_parser* parser,
&& cp_lexer_peek_nth_token (parser->lexer, 2)->type != CPP_SCOPE)
  {
type = saved_checks_value (token->u.tree_check_value);
+  /* Within a function parameter declaration, decltype(auto) is always an
+error.  */
+  if (parser->auto_is_implicit_function_template_parm_p
+ && TREE_CODE (type) == TEMPLATE_TYPE_PARM


We could check is_auto (type) here instead, to avoid any confusion with
checking AUTO_IS_DECLTYPE for a non-auto TEMPLATE_TYPE_PARM.


+ && AUTO_IS_DECLTYPE (type))
+   {
+ error_at (token->location,
+   "cannot declare a parameter with %<decltype(auto)%>");
+ type = error_mark_node;
+   }
if (decl_specs)
{
  cp_parser_set_decl_spec_type (decl_specs, type,
diff --git a/gcc/testsuite/g++.dg/pr103497.C b/gcc/testsuite/g++.dg/pr103497.C
new file mode 100644
index 000..bcd421c2907
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr103497.C
@@ -0,0 +1,7 @@
+// { dg-do compile { target c++14 } }
+
+void foo(decltype(auto)... args);  // { dg-error "parameter with 
.decltype.auto..|no parameter packs" }


I noticed for

   void foo(decltype(auto) arg);

we already issue an identical error from grokdeclarator.  Perhaps we could
instead extend the error handling there to detect decltype(auto)... as well,
rather than adding new error handling in cp_parser_simple_type_specifier?


Ah thanks, I didn't notice this; this simplifies the change a fair bit.
How about this patch instead?

Regtested on x86_64-pc-linux-gnu.

-- 8< --

This patch ensures that checks for usages of 'auto' in function
parameters also consider parameter packs, since 'type_uses_auto' does
not seem to consider this case.
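
As a rough illustration of the idea (a toy model, not the front end's tree
representation), the extra lookup into the pack expansion pattern could be
sketched like this.  toy_type, toy_uses_auto and find_auto_in_parm are
hypothetical names, and toy_uses_auto is assumed to stop at a pack
expansion, mirroring the behaviour described above.

```cpp
#include <cassert>

// Toy type node: INNER is the pointee or the pack-expansion pattern.
struct toy_type {
  enum kind_t { INT, AUTO, POINTER, PACK_EXPANSION } kind;
  const toy_type *inner;
};

// Toy analogue of type_uses_auto; it is assumed not to look inside a
// pack expansion.
inline const toy_type *toy_uses_auto (const toy_type *t)
{
  for (; t; t = t->inner)
    {
      if (t->kind == toy_type::AUTO)
        return t;
      if (t->kind == toy_type::PACK_EXPANSION)
        return nullptr;
    }
  return nullptr;
}

// The caller retries on the expansion pattern when the parameter is a pack.
inline const toy_type *find_auto_in_parm (const toy_type *type,
                                          bool parameter_pack_p)
{
  const toy_type *auto_node = toy_uses_auto (type);
  if (!auto_node && parameter_pack_p)
    auto_node = toy_uses_auto (type->inner);
  return auto_node;
}
```

The alternative would be to teach the lookup itself to descend into the
pattern, which would cover every caller at once.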

PR c++/103497

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Check for decltype(auto) in
parameter pack.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/decltype-auto-103497.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/decl.cc| 3 +++
  gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C | 8 
  2 files changed, 11 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 60f107d50c4..aaf691fce68 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -14044,6 +14044,9 @@ grokdeclarator (const cp_declarator *declarator,
error ("cannot use %<::%> in parameter declaration");
  
tree auto_node = type_uses_auto (type);

+  if (!auto_node && parameter_pack_p)
+   auto_node = type_uses_auto (PACK_EXPANSION_PATTERN (type));


Hmm, I wonder if type_uses_auto should look into PACK_EXPANSION_PATTERN 
itself.  Would that break anything?



+
if (auto_node && !(cxx_dialect >= cxx17 && template_parm_flag))
{
  if (cxx_dialect >= cxx14)
diff --git a/gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C 
b/gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C
new file mode 100644
index 000..cedd661710c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C
@@ -0,0 +1,8 @@
+// PR c++/103497
+// { dg-do compile { target c++14 } }
+
+void foo(decltype(auto)... args);  // { dg-error "cannot declare a parameter with 
.decltype.auto.." }
+
+int main() {
+  foo();
+}

Re: [PATCH] Relax type-printer regexp in libstdc++ test suite

2023-06-29 Thread Tom Tromey via Gcc-patches

> Jonathan Wakely  writes:

> Looks good. OK for trunk, and OK to backport after some soak time on trunk. 
> Thanks.

AdaCore doesn't need a backport of this, and I don't think it's
extremely important; so unless you want me to do it, I don't plan to.

I did check it in on trunk earlier today.

thanks,
Tom

[COMMITTED] Tidy up the range normalization code.

2023-06-29 Thread Aldy Hernandez via Gcc-patches

There are a few spots where a range is altered in place, but we fail
to normalize the range.  This patch makes sure we always call
normalize_kind(), and that normalize_kind() in turn calls
verify_range() to make sure everything is canonical.
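
The pattern can be sketched with a toy single-interval range (this is not
irange or frange; toy_range and its members are made up for illustration,
and the toy verifies unconditionally rather than under flag_checking).
Both operands are assumed to share the same type bounds.

```cpp
#include <cassert>

class toy_range
{
public:
  enum kind_t { RANGE, VARYING };

  toy_range (int lo, int hi, int type_min, int type_max)
    : m_lo (lo), m_hi (hi), m_min (type_min), m_max (type_max),
      m_kind (RANGE)
  {
    normalize_kind ();
  }

  // Every in-place mutator finishes by calling normalize_kind ().
  void union_ (const toy_range &r)
  {
    if (r.m_lo < m_lo)
      m_lo = r.m_lo;
    if (r.m_hi > m_hi)
      m_hi = r.m_hi;
    m_kind = RANGE;
    normalize_kind ();
  }

  kind_t kind () const { return m_kind; }
  int lower () const { return m_lo; }
  int upper () const { return m_hi; }

private:
  // Canonicalize a full-width RANGE to VARYING, then verify.
  // Returns true if the kind changed.
  bool normalize_kind ()
  {
    bool changed = false;
    if (m_kind == RANGE && m_lo == m_min && m_hi == m_max)
      {
        m_kind = VARYING;
        changed = true;
      }
    verify_range ();
    return changed;
  }

  // The canonical form: bounds are ordered and inside the type, and the
  // kind is VARYING exactly when the whole type is covered.
  void verify_range () const
  {
    assert (m_min <= m_lo && m_lo <= m_hi && m_hi <= m_max);
    bool full = m_lo == m_min && m_hi == m_max;
    assert ((m_kind == VARYING) == full);
  }

  int m_lo, m_hi, m_min, m_max;
  kind_t m_kind;
};
```

Because verification lives inside normalize_kind (), a mutator cannot
normalize without also being checked.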

gcc/ChangeLog:

* value-range.cc (frange::set): Do not call verify_range.
(frange::normalize_kind): Verify range.
(frange::union_nans): Do not call verify_range.
(frange::union_): Same.
(frange::intersect): Same.
(irange::irange_single_pair_union): Call normalize_kind if
necessary.
(irange::union_): Same.
(irange::intersect): Same.
(irange::set_range_from_nonzero_bits): Verify range.
(irange::set_nonzero_bits): Call normalize_kind if necessary.
(irange::get_nonzero_bits): Tweak comment.
(irange::intersect_nonzero_bits): Call normalize_kind if
necessary.
(irange::union_nonzero_bits): Same.
* value-range.h (irange::normalize_kind): Verify range.
---
 gcc/value-range.cc | 99 ++
 gcc/value-range.h  |  2 +
 2 files changed, 50 insertions(+), 51 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 6f46f7c9875..f5d4bf3bb4a 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -411,9 +411,6 @@ frange::set (tree type,
   gcc_checking_assert (real_compare (LE_EXPR, , ));
 
   normalize_kind ();
-
-  if (flag_checking)
-verify_range ();
 }
 
 // Setter for an frange defaulting the NAN possibility to +-NAN when
@@ -462,6 +459,8 @@ frange::normalize_kind ()
  m_kind = VR_RANGE;
  m_min = frange_val_min (m_type);
  m_max = frange_val_max (m_type);
+ if (flag_checking)
+   verify_range ();
  return true;
}
 }
@@ -524,8 +523,6 @@ frange::union_nans (const frange )
   m_pos_nan |= r.m_pos_nan;
   m_neg_nan |= r.m_neg_nan;
   normalize_kind ();
-  if (flag_checking)
-verify_range ();
   return true;
 }
 
@@ -569,8 +566,6 @@ frange::union_ (const vrange )
 changed |= combine_zeros (r, true);
 
   changed |= normalize_kind ();
-  if (flag_checking)
-verify_range ();
   return changed;
 }
 
@@ -648,8 +643,6 @@ frange::intersect (const vrange )
 changed |= combine_zeros (r, false);
 
   changed |= normalize_kind ();
-  if (flag_checking)
-verify_range ();
   return changed;
 }
 
@@ -1197,7 +1190,12 @@ irange::irange_single_pair_union (const irange )
  m_base[3] = r.m_base[1];
  m_num_ranges = 2;
}
-  union_nonzero_bits (r);
+  // The range has been altered, so normalize it even if nothing
+  // changed in the mask.
+  if (!union_nonzero_bits (r))
+   normalize_kind ();
+  if (flag_checking)
+   verify_range ();
   return true;
 }
 
@@ -1221,7 +1219,12 @@ irange::irange_single_pair_union (const irange )
   m_base[3] = m_base[1];
   m_base[1] = r.m_base[1];
 }
-  union_nonzero_bits (r);
+  // The range has been altered, so normalize it even if nothing
+  // changed in the mask.
+  if (!union_nonzero_bits (r))
+normalize_kind ();
+  if (flag_checking)
+verify_range ();
   return true;
 }
 
@@ -1351,7 +1354,12 @@ irange::union_ (const vrange )
   m_num_ranges = i / 2;
 
   m_kind = VR_RANGE;
-  union_nonzero_bits (r);
+  // The range has been altered, so normalize it even if nothing
+  // changed in the mask.
+  if (!union_nonzero_bits (r))
+normalize_kind ();
+  if (flag_checking)
+verify_range ();
   return true;
 }
 
@@ -1518,7 +1526,12 @@ irange::intersect (const vrange )
 }
 
   m_kind = VR_RANGE;
-  intersect_nonzero_bits (r);
+  // The range has been altered, so normalize it even if nothing
+  // changed in the mask.
+  if (!intersect_nonzero_bits (r))
+normalize_kind ();
+  if (flag_checking)
+verify_range ();
   return true;
 }
 
@@ -1585,10 +1598,7 @@ irange::intersect (const wide_int& lb, const wide_int& 
ub)
 }
 
   m_kind = VR_RANGE;
-  // No need to call normalize_kind(), as the caller will do this
-  // while intersecting the nonzero mask.
-  if (flag_checking)
-verify_range ();
+  normalize_kind ();
   return true;
 }
 
@@ -1758,6 +1768,8 @@ irange::set_range_from_nonzero_bits ()
  zero.set_zero (type ());
  union_ (zero);
}
+  if (flag_checking)
+   verify_range ();
   return true;
 }
   else if (popcount == 0)
@@ -1778,10 +1790,8 @@ irange::set_nonzero_bits (const wide_int )
 m_kind = VR_RANGE;
 
   m_nonzero_mask = bits;
-  if (set_range_from_nonzero_bits ())
-return;
-
-  normalize_kind ();
+  if (!set_range_from_nonzero_bits ())
+normalize_kind ();
   if (flag_checking)
 verify_range ();
 }
@@ -1807,8 +1817,8 @@ irange::get_nonzero_bits () const
 return m_nonzero_mask & get_nonzero_bits_from_range ();
 }
 
-// Intersect the nonzero bits in R into THIS and normalize the range.
-// Return TRUE if the intersection changed

[COMMITTED] Move maybe_set_nonzero_bits() to its only user.

2023-06-29 Thread Aldy Hernandez via Gcc-patches

gcc/ChangeLog:

* tree-vrp.cc (maybe_set_nonzero_bits): Move from here...
* tree-ssa-dom.cc (maybe_set_nonzero_bits): ...to here.
* tree-vrp.h (maybe_set_nonzero_bits): Remove.
---
 gcc/tree-ssa-dom.cc | 65 +
 gcc/tree-vrp.cc | 65 -
 gcc/tree-vrp.h  |  1 -
 3 files changed, 65 insertions(+), 66 deletions(-)
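
For readers unfamiliar with the function being moved, the arithmetic in its
comment can be sketched standalone.  This is a simplified model using plain
64-bit integers instead of wide_int; the helper names are hypothetical, and
the limit of 32 is an assumed stand-in for HOST_BITS_PER_INT.

```cpp
#include <cassert>
#include <cstdint>

// If X & CST is known to be zero, every bit set in CST is known to be
// clear in X, so drop those bits from the "may be nonzero" mask.
inline std::uint64_t clear_known_zero_bits (std::uint64_t nonzero,
                                            std::uint64_t cst)
{
  return nonzero & ~cst;
}

// For a pointer, the same fact implies an alignment: count the trailing
// zero bits of ~CST.  Returns 0 when no usable alignment follows.
inline unsigned implied_alignment (std::uint64_t cst)
{
  std::uint64_t w = ~cst;
  unsigned bits = 0;
  while (bits < 64 && ((w >> bits) & 1) == 0)
    ++bits;
  if (bits == 0 || bits >= 32)
    return 0;
  return 1u << bits;
}
```

For the `x_3 & 31` case in the comment, the low five bits are cleared from
the mask, and a pointer would be known to be 32-byte aligned.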

diff --git a/gcc/tree-ssa-dom.cc b/gcc/tree-ssa-dom.cc
index 9f534b5a190..f7f8b730877 100644
--- a/gcc/tree-ssa-dom.cc
+++ b/gcc/tree-ssa-dom.cc
@@ -1338,6 +1338,71 @@ all_uses_feed_or_dominated_by_stmt (tree name, gimple 
*stmt)
   return true;
 }
 
+/* Handle
+   _4 = x_3 & 31;
+   if (_4 != 0)
+ goto ;
+   else
+ goto ;
+   :
+   __builtin_unreachable ();
+   :
+
+   If x_3 has no other immediate uses (checked by caller), var is the
+   x_3 var, we can clear low 5 bits from the non-zero bitmask.  */
+
+static void
+maybe_set_nonzero_bits (edge e, tree var)
+{
+  basic_block cond_bb = e->src;
+  gcond *cond = safe_dyn_cast  (*gsi_last_bb (cond_bb));
+  tree cst;
+
+  if (cond == NULL
+  || gimple_cond_code (cond) != ((e->flags & EDGE_TRUE_VALUE)
+? EQ_EXPR : NE_EXPR)
+  || TREE_CODE (gimple_cond_lhs (cond)) != SSA_NAME
+  || !integer_zerop (gimple_cond_rhs (cond)))
+return;
+
+  gimple *stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (cond));
+  if (!is_gimple_assign (stmt)
+  || gimple_assign_rhs_code (stmt) != BIT_AND_EXPR
+  || TREE_CODE (gimple_assign_rhs2 (stmt)) != INTEGER_CST)
+return;
+  if (gimple_assign_rhs1 (stmt) != var)
+{
+  gimple *stmt2;
+
+  if (TREE_CODE (gimple_assign_rhs1 (stmt)) != SSA_NAME)
+   return;
+  stmt2 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt));
+  if (!gimple_assign_cast_p (stmt2)
+ || gimple_assign_rhs1 (stmt2) != var
+ || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))
+ || (TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+ != TYPE_PRECISION (TREE_TYPE (var
+   return;
+}
+  cst = gimple_assign_rhs2 (stmt);
+  if (POINTER_TYPE_P (TREE_TYPE (var)))
+{
+  struct ptr_info_def *pi = SSA_NAME_PTR_INFO (var);
+  if (pi && pi->misalign)
+   return;
+  wide_int w = wi::bit_not (wi::to_wide (cst));
+  unsigned int bits = wi::ctz (w);
+  if (bits == 0 || bits >= HOST_BITS_PER_INT)
+   return;
+  unsigned int align = 1U << bits;
+  if (pi == NULL || pi->align < align)
+   set_ptr_info_alignment (get_ptr_info (var), align, 0);
+}
+  else
+set_nonzero_bits (var, wi::bit_and_not (get_nonzero_bits (var),
+   wi::to_wide (cst)));
+}
+
 /* Set global ranges that can be determined from the C->M edge:
 
:
diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index c52e9971faa..d61b087b730 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -633,71 +633,6 @@ overflow_comparison_p (tree_code code, tree name, tree 
val, tree *new_cst)
  true, new_cst);
 }
 
-/* Handle
-   _4 = x_3 & 31;
-   if (_4 != 0)
- goto ;
-   else
- goto ;
-   :
-   __builtin_unreachable ();
-   :
-
-   If x_3 has no other immediate uses (checked by caller), var is the
-   x_3 var, we can clear low 5 bits from the non-zero bitmask.  */
-
-void
-maybe_set_nonzero_bits (edge e, tree var)
-{
-  basic_block cond_bb = e->src;
-  gcond *cond = safe_dyn_cast  (*gsi_last_bb (cond_bb));
-  tree cst;
-
-  if (cond == NULL
-  || gimple_cond_code (cond) != ((e->flags & EDGE_TRUE_VALUE)
-? EQ_EXPR : NE_EXPR)
-  || TREE_CODE (gimple_cond_lhs (cond)) != SSA_NAME
-  || !integer_zerop (gimple_cond_rhs (cond)))
-return;
-
-  gimple *stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (cond));
-  if (!is_gimple_assign (stmt)
-  || gimple_assign_rhs_code (stmt) != BIT_AND_EXPR
-  || TREE_CODE (gimple_assign_rhs2 (stmt)) != INTEGER_CST)
-return;
-  if (gimple_assign_rhs1 (stmt) != var)
-{
-  gimple *stmt2;
-
-  if (TREE_CODE (gimple_assign_rhs1 (stmt)) != SSA_NAME)
-   return;
-  stmt2 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt));
-  if (!gimple_assign_cast_p (stmt2)
- || gimple_assign_rhs1 (stmt2) != var
- || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))
- || (TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (stmt)))
- != TYPE_PRECISION (TREE_TYPE (var
-   return;
-}
-  cst = gimple_assign_rhs2 (stmt);
-  if (POINTER_TYPE_P (TREE_TYPE (var)))
-{
-  struct ptr_info_def *pi = SSA_NAME_PTR_INFO (var);
-  if (pi && pi->misalign)
-   return;
-  wide_int w = wi::bit_not (wi::to_wide (cst));
-  unsigned int bits = wi::ctz (w);
-  if (bits == 0 || bits >= HOST_BITS_PER_INT)
-   return;
-  unsigned int align = 1U << bits;
-  if (pi == NULL

Re: [PATCH] configure: Implement --enable-host-bind-now

2023-06-29 Thread Marek Polacek via Gcc-patches

On Thu, Jun 29, 2023 at 05:58:22PM +0200, Martin Jambor wrote:
> Hi,
> 
> On Tue, Jun 27 2023, Marek Polacek wrote:
> > On Tue, Jun 27, 2023 at 01:39:16PM +0200, Martin Jambor wrote:
> >> Hello,
> >> 
> >> On Tue, May 16 2023, Marek Polacek via Gcc-patches wrote:
> >> > As promised in the --enable-host-pie patch, this patch adds another
> >> > configure option, --enable-host-bind-now, which adds -z now when linking
> >> > the compiler executables in order to extend hardening.  BIND_NOW with 
> >> > RELRO
> >> > allows the GOT to be marked RO; this prevents GOT modification attacks.
> >> >
> >> > This option does not affect linking of target libraries; you can use
> >> > LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now to enable RELRO/BIND_NOW.
> >> >
> >> > With this patch:
> >> > $ readelf -Wd cc1{,plus} | grep FLAGS
> >> >  0x001e (FLAGS)  BIND_NOW
> >> >  0x6ffb (FLAGS_1)Flags: NOW PIE
> >> >  0x001e (FLAGS)  BIND_NOW
> >> >  0x6ffb (FLAGS_1)Flags: NOW PIE
> >> >
> >> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> >> >
> >> > c++tools/ChangeLog:
> >> >
> >> >  * configure.ac (--enable-host-bind-now): New check.
> >> >  * configure: Regenerate.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >  * configure.ac (--enable-host-bind-now): New check.  Add
> >> >  -Wl,-z,now to LD_PICFLAG if --enable-host-bind-now.
> >> >  * configure: Regenerate.
> >> >  * doc/install.texi: Document --enable-host-bind-now.
> >> >
> >> > lto-plugin/ChangeLog:
> >> >
> >> >  * configure.ac (--enable-host-bind-now): New check.  Link with
> >> >  -z,now.
> >> >  * configure: Regenerate.
> >> 
> >> Our reconfiguration checking script complains about a missing hunk in
> >> lto-plugin/Makefile.in:
> >> 
> >> diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
> >> index cb568e1e09f..f6f5b020ff5 100644
> >> --- a/lto-plugin/Makefile.in
> >> +++ b/lto-plugin/Makefile.in
> >> @@ -298,6 +298,7 @@ datadir = @datadir@
> >>  datarootdir = @datarootdir@
> >>  docdir = @docdir@
> >>  dvidir = @dvidir@
> >> +enable_host_bind_now = @enable_host_bind_now@
> >>  exec_prefix = @exec_prefix@
> >>  gcc_build_dir = @gcc_build_dir@
> >>  get_gcc_base_ver = @get_gcc_base_ver@
> >> 
> >> 
> >> I am somewhat puzzled why the line is not missing in any of the other
> >> Makefile.in files.  Can you please check whether that is the only thing
> >> that is missing (assuming it is actually missing)?
> >
> > Arg, once again, I'm sorry.  I don't know how this happened.  It would
> > be trivial to fix it but since
> >
> > commit 4a48a38fa99f067b8f3a3d1a5dc7a1e602db351f
> > Author: Eric Botcazou 
> > Date:   Wed Jun 21 18:19:36 2023 +0200
> >
> > ada: Fix build of GNAT tools
> >
> > the build with Ada included fails with --enable-host-pie.  So that needs
> > to be fixed first.
> >
> > Eric, I'm not asking you to fix that, but I'm curious, what did the
> > commit above fix?  The patch looks correct; I'm just puzzled why I
> > hadn't seen any build failures.
> >
> > The --enable-host-pie patch has been a nightmare :(.
> >
> 
> No worries, I can see how these things can easily get difficult.
> 
> Unfortunately I won't have time to actually look at this in the next 2-3
> weeks, so I am inclined to just trust the verification script (which
> essentially runs autoconf/automake everywhere and then expects no diff)
> and commit the one-line change.  What do you think, does that make sense
> (even without looking at why other Makefile.in files did not change)?

Yes please, go ahead with the one line change meanwhile.  Thanks!

I've opened PR110467 for the build problem.

Marek

Re: [PATCH] configure: Implement --enable-host-bind-now

2023-06-29 Thread Martin Jambor

Hi,

On Tue, Jun 27 2023, Marek Polacek wrote:
> On Tue, Jun 27, 2023 at 01:39:16PM +0200, Martin Jambor wrote:
>> Hello,
>> 
>> On Tue, May 16 2023, Marek Polacek via Gcc-patches wrote:
>> > As promised in the --enable-host-pie patch, this patch adds another
>> > configure option, --enable-host-bind-now, which adds -z now when linking
>> > the compiler executables in order to extend hardening.  BIND_NOW with RELRO
>> > allows the GOT to be marked RO; this prevents GOT modification attacks.
>> >
>> > This option does not affect linking of target libraries; you can use
>> > LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now to enable RELRO/BIND_NOW.
>> >
>> > With this patch:
>> > $ readelf -Wd cc1{,plus} | grep FLAGS
>> >  0x001e (FLAGS)  BIND_NOW
>> >  0x6ffb (FLAGS_1)Flags: NOW PIE
>> >  0x001e (FLAGS)  BIND_NOW
>> >  0x6ffb (FLAGS_1)Flags: NOW PIE
>> >
>> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
>> >
>> > c++tools/ChangeLog:
>> >
>> >* configure.ac (--enable-host-bind-now): New check.
>> >* configure: Regenerate.
>> >
>> > gcc/ChangeLog:
>> >
>> >* configure.ac (--enable-host-bind-now): New check.  Add
>> >-Wl,-z,now to LD_PICFLAG if --enable-host-bind-now.
>> >* configure: Regenerate.
>> >* doc/install.texi: Document --enable-host-bind-now.
>> >
>> > lto-plugin/ChangeLog:
>> >
>> >* configure.ac (--enable-host-bind-now): New check.  Link with
>> >-z,now.
>> >* configure: Regenerate.
>> 
>> Our reconfiguration checking script complains about a missing hunk in
>> lto-plugin/Makefile.in:
>> 
>> diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
>> index cb568e1e09f..f6f5b020ff5 100644
>> --- a/lto-plugin/Makefile.in
>> +++ b/lto-plugin/Makefile.in
>> @@ -298,6 +298,7 @@ datadir = @datadir@
>>  datarootdir = @datarootdir@
>>  docdir = @docdir@
>>  dvidir = @dvidir@
>> +enable_host_bind_now = @enable_host_bind_now@
>>  exec_prefix = @exec_prefix@
>>  gcc_build_dir = @gcc_build_dir@
>>  get_gcc_base_ver = @get_gcc_base_ver@
>> 
>> 
>> I am somewhat puzzled why the line is not missing in any of the other
>> Makefile.in files.  Can you please check whether that is the only thing
>> that is missing (assuming it is actually missing)?
>
> Arg, once again, I'm sorry.  I don't know how this happened.  It would
> be trivial to fix it but since
>
> commit 4a48a38fa99f067b8f3a3d1a5dc7a1e602db351f
> Author: Eric Botcazou 
> Date:   Wed Jun 21 18:19:36 2023 +0200
>
> ada: Fix build of GNAT tools
>
> the build with Ada included fails with --enable-host-pie.  So that needs
> to be fixed first.
>
> Eric, I'm not asking you to fix that, but I'm curious, what did the
> commit above fix?  The patch looks correct; I'm just puzzled why I
> hadn't seen any build failures.
>
> The --enable-host-pie patch has been a nightmare :(.
>

No worries, I can see how these things can easily get difficult.

Unfortunately I won't have time to actually look at this in the next 2-3
weeks, so I am inclined to just trust the verification script (which
essentially runs autoconf/automake everywhere and then expects no diff)
and commit the one-line change.  What do you think, does that make sense
(even without looking at why other Makefile.in files did not change)?

Thanks,

Martin

Re: [PATCH] c++: unpropagated CONSTRUCTOR_MUTABLE_POISON [PR110463]

2023-06-29 Thread Marek Polacek via Gcc-patches

On Thu, Jun 29, 2023 at 11:22:55AM -0400, Patrick Palka via Gcc-patches wrote:
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk/13?
> 
> -- >8 --
> 
> cp_fold is neglecting to propagate CONSTRUCTOR_MUTABLE_POISON when folding
> a CONSTRUCTOR initializer, which for the below testcase causes us to fail
> to reject a mutable member access of a constexpr variable during constexpr
> evaluation.

LGTM.
 
>   PR c++/110463
> 
> gcc/cp/ChangeLog:
> 
>   * cp-gimplify.cc (cp_fold) : Propagate
>   CONSTRUCTOR_MUTABLE_POISON.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp0x/constexpr-mutable6.C: New test.
> ---
>  gcc/cp/cp-gimplify.cc  |  2 ++
>  .../g++.dg/cpp0x/constexpr-mutable6.C  | 18 ++
>  2 files changed, 20 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-mutable6.C
> 
> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> index 853b1e44236..f5734197774 100644
> --- a/gcc/cp/cp-gimplify.cc
> +++ b/gcc/cp/cp-gimplify.cc
> @@ -3079,6 +3079,8 @@ cp_fold (tree x, fold_flags_t flags)
>   x = build_constructor (TREE_TYPE (x), nelts);
>   CONSTRUCTOR_PLACEHOLDER_BOUNDARY (x)
> = CONSTRUCTOR_PLACEHOLDER_BOUNDARY (org_x);
> + CONSTRUCTOR_MUTABLE_POISON (x)
> +   = CONSTRUCTOR_MUTABLE_POISON (org_x);
> }
>   if (VECTOR_TYPE_P (TREE_TYPE (x)))
> x = fold (x);
> diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable6.C 
> b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable6.C
> new file mode 100644
> index 000..2c946e388ab
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable6.C
> @@ -0,0 +1,18 @@
> +// PR c++/110463
> +// { dg-do compile { target c++11 } }
> +
> +struct U {
> +  mutable int x = 1;
> +};
> +
> +struct V {
> +  mutable int y = 1+1;
> +};
> +
> +int main() {
> +  constexpr U u = {};
> +  constexpr int x = u.x; // { dg-error "mutable" }
> +
> +  constexpr V v = {};
> +  constexpr int y = v.y; // { dg-error "mutable" }
> +}
> -- 
> 2.41.0.199.ga9e066fa63
> 

Marek

[committed] cselib+expr+bitmap: Change return type of predicate functions from int to bool

2023-06-29 Thread Uros Bizjak via Gcc-patches

gcc/ChangeLog:

* cselib.h (rtx_equal_for_cselib_1):
Change return type from int to bool.
(references_value_p): Ditto.
(rtx_equal_for_cselib_p): Ditto.
* expr.h (can_store_by_pieces): Ditto.
(try_casesi): Ditto.
(try_tablejump): Ditto.
(safe_from_p): Ditto.
* sbitmap.h (bitmap_equal_p): Ditto.
* cselib.cc (references_value_p): Change return type
from int to bool and adjust function body accordingly.
(rtx_equal_for_cselib_1): Ditto.
* expr.cc (is_aligning_offset): Ditto.
(can_store_by_pieces): Ditto.
(mostly_zeros_p): Ditto.
(all_zeros_p): Ditto.
(safe_from_p): Ditto.
(is_aligning_offset): Ditto.
(try_casesi): Ditto.
(try_tablejump): Ditto.
(store_constructor): Change "need_to_clear" and
"const_bounds_p" variables to bool.
* sbitmap.cc (bitmap_equal_p): Change return type from int to bool.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/cselib.cc b/gcc/cselib.cc
index 065867b4a84..5b9843a5942 100644
--- a/gcc/cselib.cc
+++ b/gcc/cselib.cc
@@ -636,7 +636,7 @@ cselib_find_slot (machine_mode mode, rtx x, hashval_t hash,
element has been set to zero, which implies the cselib_val will be
removed.  */
 
-int
+bool
 references_value_p (const_rtx x, int only_useless)
 {
   const enum rtx_code code = GET_CODE (x);
@@ -646,19 +646,19 @@ references_value_p (const_rtx x, int only_useless)
   if (GET_CODE (x) == VALUE
   && (! only_useless
  || (CSELIB_VAL_PTR (x)->locs == 0 && !PRESERVED_VALUE_P (x
-return 1;
+return true;
 
   for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
 {
   if (fmt[i] == 'e' && references_value_p (XEXP (x, i), only_useless))
-   return 1;
+   return true;
   else if (fmt[i] == 'E')
for (j = 0; j < XVECLEN (x, i); j++)
  if (references_value_p (XVECEXP (x, i, j), only_useless))
-   return 1;
+   return true;
 }
 
-  return 0;
+  return false;
 }
 
 /* Return true if V is a useless VALUE and can be discarded as such.  */
@@ -926,13 +926,13 @@ autoinc_split (rtx x, rtx *off, machine_mode memmode)
   return x;
 }
 
-/* Return nonzero if we can prove that X and Y contain the same value,
+/* Return true if we can prove that X and Y contain the same value,
taking our gathered information into account.  MEMMODE holds the
mode of the enclosing MEM, if any, as required to deal with autoinc
addressing modes.  If X and Y are not (known to be) part of
addresses, MEMMODE should be VOIDmode.  */
 
-int
+bool
 rtx_equal_for_cselib_1 (rtx x, rtx y, machine_mode memmode, int depth)
 {
   enum rtx_code code;
@@ -956,7 +956,7 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, machine_mode memmode, int depth)
 }
 
   if (x == y)
-return 1;
+return true;
 
   if (GET_CODE (x) == VALUE)
 {
@@ -973,11 +973,11 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, machine_mode memmode, int depth)
  rtx yoff = NULL;
  rtx yr = autoinc_split (y, &yoff, memmode);
  if ((yr == x || yr == e->val_rtx) && yoff == NULL_RTX)
-   return 1;
+   return true;
}
 
   if (depth == 128)
-   return 0;
+   return false;
 
   for (l = e->locs; l; l = l->next)
{
@@ -989,10 +989,10 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, machine_mode memmode, int depth)
  if (REG_P (t) || MEM_P (t) || GET_CODE (t) == VALUE)
continue;
  else if (rtx_equal_for_cselib_1 (t, y, memmode, depth + 1))
-   return 1;
+   return true;
}
 
-  return 0;
+  return false;
 }
   else if (GET_CODE (y) == VALUE)
 {
@@ -1006,11 +1006,11 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, machine_mode memmode, int depth)
  rtx xoff = NULL;
  rtx xr = autoinc_split (x, &xoff, memmode);
  if ((xr == y || xr == e->val_rtx) && xoff == NULL_RTX)
-   return 1;
+   return true;
}
 
   if (depth == 128)
-   return 0;
+   return false;
 
   for (l = e->locs; l; l = l->next)
{
@@ -1019,14 +1019,14 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, machine_mode memmode, int depth)
  if (REG_P (t) || MEM_P (t) || GET_CODE (t) == VALUE)
continue;
  else if (rtx_equal_for_cselib_1 (x, t, memmode, depth + 1))
-   return 1;
+   return true;
}
 
-  return 0;
+  return false;
 }
 
   if (GET_MODE (x) != GET_MODE (y))
-return 0;
+return false;
 
   if (GET_CODE (x) != GET_CODE (y)
   || (GET_CODE (x) == PLUS
@@ -1044,16 +1044,16 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, machine_mode memmode, int depth)
   if (x != xorig || y != yorig)
{
  if (!xoff != !yoff)
-   return 0;
+   return false;
 
  if (xoff && !rtx_equal_for_cselib_1 (xoff, yoff, memmode, depth))
-   return 0;
+   return false;
 
  return rtx_equal_for_cselib_1 (x, y, memmode,

[committed] libstdc++: Do not use off64_t in calls to copy_file_range [PR110462]

2023-06-29 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk.

-- >8 --

Although the copy_file_range(2) man page shows the arguments as off64_t*
that is not portable. For musl there is no off64_t type, as off_t is
always 64-bit. Use the loff_t type which is always 64-bit even if off_t
isn't. We could just use off_t because the filesystem library is
compiled with _FILE_OFFSET_BITS=64, but loff_t is the more correct type
for this interface.

libstdc++-v3/ChangeLog:

PR libstdc++/110462
* acinclude.m4 (GLIBCXX_CHECK_FILESYSTEM_DEPS): Check that
copy_file_range can be called with loff_t* arguments.
* configure: Regenerate.
* src/filesystem/ops-common.h (copy_file_copy_file_range):
Use loff_t for offsets.
---
 libstdc++-v3/acinclude.m4| 2 +-
 libstdc++-v3/configure   | 4 ++--
 libstdc++-v3/src/filesystem/ops-common.h | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index efc27aa493e..277ae10e031 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -5160,7 +5160,7 @@ dnl
   linux*)
GCC_TRY_COMPILE_OR_LINK(
  [#include <unistd.h>],
- [copy_file_range(1, nullptr, 2, nullptr, 1, 0);],
+ [copy_file_range(1, (loff_t*)nullptr, 2, (loff_t*)nullptr, 1, 0);],
  [glibcxx_cv_copy_file_range=yes],
  [glibcxx_cv_copy_file_range=no])
;;
diff --git a/libstdc++-v3/src/filesystem/ops-common.h b/libstdc++-v3/src/filesystem/ops-common.h
index f04bbc66d7d..2e4331bb682 100644
--- a/libstdc++-v3/src/filesystem/ops-common.h
+++ b/libstdc++-v3/src/filesystem/ops-common.h
@@ -374,7 +374,7 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
return false;
   }
 size_t bytes_left = length;
-off64_t off_in = 0, off_out = 0;
+loff_t off_in = 0, off_out = 0;
 ssize_t bytes_copied;
 do
   {
-- 
2.41.0

[PATCH] c++: unpropagated CONSTRUCTOR_MUTABLE_POISON [PR110463]

2023-06-29 Thread Patrick Palka via Gcc-patches

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13?

-- >8 --

cp_fold is neglecting to propagate CONSTRUCTOR_MUTABLE_POISON when folding
a CONSTRUCTOR initializer, which for the below testcase causes us to fail
to reject a mutable member access of a constexpr variable during constexpr
evaluation.

PR c++/110463

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold) <case CONSTRUCTOR>: Propagate
CONSTRUCTOR_MUTABLE_POISON.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-mutable6.C: New test.
---
 gcc/cp/cp-gimplify.cc  |  2 ++
 .../g++.dg/cpp0x/constexpr-mutable6.C  | 18 ++
 2 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-mutable6.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 853b1e44236..f5734197774 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -3079,6 +3079,8 @@ cp_fold (tree x, fold_flags_t flags)
x = build_constructor (TREE_TYPE (x), nelts);
CONSTRUCTOR_PLACEHOLDER_BOUNDARY (x)
  = CONSTRUCTOR_PLACEHOLDER_BOUNDARY (org_x);
+   CONSTRUCTOR_MUTABLE_POISON (x)
+ = CONSTRUCTOR_MUTABLE_POISON (org_x);
  }
if (VECTOR_TYPE_P (TREE_TYPE (x)))
  x = fold (x);
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable6.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable6.C
new file mode 100644
index 00000000000..2c946e388ab
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable6.C
@@ -0,0 +1,18 @@
+// PR c++/110463
+// { dg-do compile { target c++11 } }
+
+struct U {
+  mutable int x = 1;
+};
+
+struct V {
+  mutable int y = 1+1;
+};
+
+int main() {
+  constexpr U u = {};
+  constexpr int x = u.x; // { dg-error "mutable" }
+
+  constexpr V v = {};
+  constexpr int y = v.y; // { dg-error "mutable" }
+}
-- 
2.41.0.199.ga9e066fa63

[PATCH] c++: NSDMI instantiation during overload resolution [PR110468]

2023-06-29 Thread Patrick Palka via Gcc-patches

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13/12?

-- >8 --

Here we find ourselves instantiating the NSDMI for A<1>::m when
computing argument conversions during overload resolution, and
thus tf_conv is set.  This causes mark_used for the constructor
used in the NSDMI to exit early and not instantiate its noexcept-spec,
leading to an ICE from nothrow_spec_p.

This patch fixes this by clearing any unusual tsubst flags during
instantiation of an NSDMI, since the result should be independent of
the context that requires the instantiation.
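
As a rough illustration of the masking step (a standalone sketch, not GCC
code): the enum below is only a stand-in for tsubst_flags_t with assumed bit
values, and strip_context_flags is a hypothetical helper name.  It shows how
and-ing with tf_warning_or_error keeps the diagnostic bits while dropping
context-dependent ones such as tf_conv.

```cpp
#include <cassert>

// Illustrative stand-in for GCC's tsubst_flags_t.  The enumerator names
// mirror the real ones, but the numeric values here are assumptions.
enum tsubst_flags : unsigned
{
  tf_none = 0,
  tf_error = 1u << 0,
  tf_warning = 1u << 1,
  tf_conv = 1u << 2,  // hypothetical bit position
  tf_warning_or_error = tf_error | tf_warning
};

// Hypothetical helper: keep only the diagnostic bits, which is the effect
// of "complain &= tf_warning_or_error" in the patch.
inline unsigned
strip_context_flags (unsigned complain)
{
  return complain & tf_warning_or_error;
}
```

With tf_conv cleared this way, the early exit in mark_used described above
would no longer be taken during the NSDMI instantiation.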

PR c++/110468

gcc/cp/ChangeLog:

* init.cc (maybe_instantiate_nsdmi_init): Mask out all
tsubst flags except for tf_warning_or_error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept79.C: New test.
---
 gcc/cp/init.cc  |  4 
 gcc/testsuite/g++.dg/cpp0x/noexcept79.C | 18 ++
 2 files changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept79.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index af6e30f511e..f01a11c5299 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -579,6 +579,10 @@ maybe_instantiate_nsdmi_init (tree member, tsubst_flags_t complain)
   /* tsubst_decl uses void_node to indicate an uninstantiated DMI.  */
   if (init == void_node)
 {
+  /* The result of NSDMI instantiation should be independent of
+the tsubst flags we're given.  */
+  complain &= tf_warning_or_error;
+
   init = DECL_INITIAL (DECL_TI_TEMPLATE (member));
   location_t expr_loc
= cp_expr_loc_or_loc (init, DECL_SOURCE_LOCATION (member));
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept79.C b/gcc/testsuite/g++.dg/cpp0x/noexcept79.C
new file mode 100644
index 00000000000..d1f54d14431
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept79.C
@@ -0,0 +1,18 @@
+// PR c++/110468
+// { dg-do compile { target c++11 } }
+
+template<int T>
+struct variant {
+  variant() noexcept(T > 0);
+};
+
+template<int N>
+struct A {
+  variant<N> m = {};
+};
+
+struct B {
+  B(A<1>);
+};
+
+B b = {{}};
-- 
2.41.0.199.ga9e066fa63

[committed] libstdc++: Fix src/c++20/tzdb.cc for non-constexpr std::mutex

2023-06-29 Thread Jonathan Wakely via Gcc-patches

This build failure for riscv32-esp-elf was "reported" on the gcc-help list:
https://gcc.gnu.org/pipermail/gcc-help/2023-June/142641.html

Tested x86_64-linux. Pushed to trunk.

-- >8 --

Building libstdc++ reportedly fails for targets without lock-free
std::atomic which don't define __GTHREAD_MUTEX_INIT:

src/c++20/tzdb.cc:110:21: error: 'constinit' variable 
'std::chrono::{anonymous}::list_mutex' does not have a constant initializer
src/c++20/tzdb.cc:110:21: error: call to non-'constexpr' function 
'std::mutex::mutex()'

The solution implemented by this commit is to use a local static mutex
when it can't be constinit, so that it's constructed on first use.

With this change, we can also simplify the preprocessor logic for
defining USE_ATOMIC_SHARED_PTR. It now depends on the same conditions as
USE_ATOMIC_LIST_HEAD, so in theory we could have a single macro. Keeping
them separate would allow us to replace the use of atomic<shared_ptr<>>
with a mutex if that performs better, without having to give up on the
lock-free cache for fast access to the list head.
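
A minimal standalone sketch of the construct-on-first-use fallback described
above, written against plain std::mutex rather than the libstdc++ internals.
locked_increment is a hypothetical caller added only to show the call-site
change from list_mutex to list_mutex().

```cpp
#include <cassert>
#include <mutex>
#include <new>

// The buffer is zero-initialized static storage.  The mutex is created in
// it with placement new the first time the function runs and is never
// destroyed, so it remains usable during static destruction.
inline std::mutex&
list_mutex ()
{
  alignas (std::mutex) static char buf[sizeof (std::mutex)];
  static std::mutex& m = *::new (buf) std::mutex ();
  return m;
}

// Hypothetical caller: takes the lock through the accessor function.
inline int
locked_increment (int& counter)
{
  std::lock_guard<std::mutex> l (list_mutex ());
  return ++counter;
}
```

Initialization of the local static reference is thread-safe in C++11 and
later, which is what makes this usable where the constructor is not
constexpr.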

libstdc++-v3/ChangeLog:

* src/c++20/tzdb.cc (USE_ATOMIC_SHARED_PTR): Define consistently
with USE_ATOMIC_LIST_HEAD.
(list_mutex): Replace global object with function. Use local
static object when std::mutex constructor isn't constexpr.
---
 libstdc++-v3/src/c++20/tzdb.cc | 46 ++
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/libstdc++-v3/src/c++20/tzdb.cc b/libstdc++-v3/src/c++20/tzdb.cc
index a43b4f33eba..8d27726016e 100644
--- a/libstdc++-v3/src/c++20/tzdb.cc
+++ b/libstdc++-v3/src/c++20/tzdb.cc
@@ -41,22 +41,17 @@
 # include <cstdlib>   // getenv
 #endif
 
-#ifndef __GTHREADS
-# define USE_ATOMIC_SHARED_PTR 0
-#elif _WIN32
-// std::mutex cannot be constinit, so Windows must use atomic<shared_ptr<>>.
-# define USE_ATOMIC_SHARED_PTR 1
-#elif ATOMIC_POINTER_LOCK_FREE < 2
-# define USE_ATOMIC_SHARED_PTR 0
-#else
-// TODO benchmark atomic<shared_ptr<>> vs mutex.
-# define USE_ATOMIC_SHARED_PTR 1
-#endif
-
 #if defined __GTHREADS && ATOMIC_POINTER_LOCK_FREE == 2
 # define USE_ATOMIC_LIST_HEAD 1
+// TODO benchmark atomic<shared_ptr<>> vs mutex.
+# define USE_ATOMIC_SHARED_PTR 1
 #else
 # define USE_ATOMIC_LIST_HEAD 0
+# define USE_ATOMIC_SHARED_PTR 0
+#endif
+
+#if USE_ATOMIC_SHARED_PTR && ! USE_ATOMIC_LIST_HEAD
+# error Unsupported combination
 #endif
 
 #if ! __cpp_constinit
@@ -106,9 +101,18 @@ namespace std::chrono
 // Dummy no-op mutex type for single-threaded targets.
 struct mutex { void lock() { } void unlock() { } };
 #endif
-/// XXX std::mutex::mutex() not constexpr on Windows, so can't be constinit
-constinit mutex list_mutex;
+inline mutex& list_mutex()
+{
+#ifdef __GTHREAD_MUTEX_INIT
+  constinit static mutex m;
+#else
+  // Cannot use a constinit mutex, so use a local static.
+  alignas(mutex) constinit static char buf[sizeof(mutex)];
+  static mutex& m = *::new(buf) mutex();
 #endif
+  return m;
+}
+#endif // ! USE_ATOMIC_SHARED_PTR
 
 struct Rule;
   }
@@ -154,7 +158,7 @@ namespace std::chrono
 static _Node*
 _S_list_head(memory_order)
 {
-  lock_guard l(list_mutex);
+  lock_guard l(list_mutex());
   return _S_head_owner.get();
 }
 
@@ -1279,7 +1283,7 @@ namespace std::chrono
   }
 // XXX small window here where _S_head_cache still points to previous tzdb.
 #else
-lock_guard l(list_mutex);
+lock_guard l(list_mutex());
 if (const _Node* h = _S_head_owner.get())
   {
if (h->db.version == new_head_ptr->db.version)
@@ -1406,11 +1410,12 @@ namespace std::chrono
 #else
 if (Node::_S_list_head(memory_order::relaxed) != nullptr) [[likely]]
 {
-  lock_guard l(list_mutex);
+  lock_guard l(list_mutex());
   const tzdb& current = Node::_S_head_owner->db;
   if (current.version == version)
return current;
 }
+shared_ptr<Node> head; // Passed as unused arg to _S_replace_head.
 #endif
 
 auto [leaps, leaps_ok] = Node::_S_read_leap_seconds();
@@ -1499,9 +1504,6 @@ namespace std::chrono
 ranges::sort(node->db.links, {}, &time_zone_link::name);
 ranges::stable_sort(node->rules, {}, ::name);
 
-#if ! USE_ATOMIC_SHARED_PTR
-shared_ptr<Node> head;
-#endif
 return Node::_S_replace_head(std::move(head), std::move(node));
 #else
 __throw_disabled();
@@ -1526,7 +1528,7 @@ namespace std::chrono
 #if USE_ATOMIC_SHARED_PTR
 return const_iterator{_Node::_S_head_owner.load()};
 #else
-lock_guard l(list_mutex);
+lock_guard l(list_mutex());
 return const_iterator{_Node::_S_head_owner};
 #endif
   }
@@ -1539,7 +1541,7 @@ namespace std::chrono
 if (p._M_node) [[likely]]
 {
 #if ! USE_ATOMIC_SHARED_PTR
-  lock_guard l(list_mutex);
+  lock_guard l(list_mutex());
 #endif
   if (auto next = p._M_node->next) [[likely]]
return const_iterator{p._M_node->next = std::move(next->next)};
-- 
2.41.0

Re: [PATCH] Introduce hardbool attribute for C

2023-06-29 Thread Qing Zhao via Gcc-patches

Hi, Alexandre,

Thank you for the explanation.
I am now clear on the interaction between hardbool and
-ftrivial-auto-var-init, and I also agree that clarifying the
documentation on their interaction is good enough.

Qing

> On Jun 29, 2023, at 6:30 AM, Alexandre Oliva  wrote:
> 
> On Jun 28, 2023, Qing Zhao  wrote:
> 
> >> In summary, Ada’s Boolean variables (whether hardened or not) are
> >> represented as enumeration types in GNU IR.
> 
> Not quite.  Only boolean types with representation clauses are.  Those
> without (such as Standard.Boolean) are BOOLEAN_TYPEs.  But those without
> a representation clause are not so relevant and could be disregarded,
> for purposes of this conversation.
> 
>> FE takes care of the converting between non-boolean_type_node
>> enumeration types and boolean_type_node as needed, no special handling
>> in Middle end.
> 
>> So, is this exactly the same situation as the new hardbool attribute
>> for C being implemented in this patch?
> 
> That's correct.
> 
> >> (Another question: for Ada’s Boolean variables, does the Ada FE also
> >> insert BUILT_IN_TRAP when the value is neither true_value nor
> >> false_value?)
> 
> Ada raises exceptions when validity checking fails, such as upon using a
> boolean variable with a representation clause that holds a value that is
> neither true nor false.
> 
>>> The middle-end doesn't know (and ATM cannot know) that those represented
>>> as enumeration types are conceptually booleans, so they are treated as
>>> enumeration types, not as booleans.
> 
>> They should know it’s a boolean if using the lookup_attribute to get
>> the attribute info -:)
> 
> I meant boolean types that have a representation clause but are not
> hardbools.  Those don't have any attribute whatsoever.
> 
>>> You mean more than what's in the patch posted last week?
>> No, the updated doc is good I think.
> 
> Great, thanks
> 
>> So, from my current understanding, a summary on my major concern and
>> the possible solution to this concern:
> 
> That was a good summary.
> 
>> Is it necessary to fix such inconsistency?
> 
> I don't think it is even desirable.
> 
> Initialization of static variables is well-defined, one is allowed to
> count on a specific value after initialization, and we have that
> covered.
> 
> Automatic variables, OTOH, when not explicitly initialized, may hold
> undefined, random, even malformed values.  Picking an initializer to
> make them predictable need not follow the semantics of zero
> initialization for static variables.  =pattern makes it clear that using
> something other than zero initialization is useful to catch errors.  The
> Ada language standard even suggests that compilers may set uninitialized
> variables to out-of-range values so as to catch this sort of error.  So,
> though it might seem desirable, for symmetry, to have automatic
> variables implicitly initialized similarly to static variables, it's not
> clear that doing so serves a useful purpose, at least for such types as
> hardened booleans, that are *intended* to catch malformed values.
> 
> -- 
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>   Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about

Re: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-29 Thread Jeff Law via Gcc-patches





On 6/29/23 02:11, Richard Biener wrote:
> On Wed, 28 Jun 2023, Jeff Law wrote:
>
>> On 6/28/23 22:04, Li, Pan2 wrote:
>>> It seems this patch may result in many test ICE failures on the RISC-V
>>> backend.  Could you help to double-check it, following possible
>>> reproduction steps like the ones below?  Thank you!
>>
>> I've one ICE due to this change as well, but it wasn't in the
>> tree-ssa-math-opts.cc code like this one is.  In my case we're in a place
>> where it doesn't look like we expect a vector type to show up, but it does
>> and we can likely just prune it away.
>>
>> Anyway, your fault is in here:
>>
>> divmod_candidate_p:
>>
>>   if (TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
>>       && TYPE_PRECISION (type) <= BITS_PER_WORD)
>>     return false;
>>
>> TYPE is almost certainly a vector type.  The question we need to answer (and
>> I'm not likely to get to it tomorrow) would be whether or not TYPE can
>> legitimately be a vector type here.
>
> I think GCN people wanted to make this code work for vectors, the
> most obvious local fix is to use element_precision (type) above.
>
> Note usually vector integer divisions are not a thing so this might
> explain why you're seeing this only with RVV?

Vector integer division is available in RVV, it's also available in
Tachyum's chip.  So GCN isn't alone in wanting this capability.


jeff

Re: [PATCH] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern

2023-06-29 Thread Robin Dapp via Gcc-patches

Hi Juzhe,

just looking at the documentation changes.

> +@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_gather_load@var{m}@var{n}}
> +Like @samp{gather_load@var{m}@var{n}}, but takes an extra len operand
> +as operand 5 and an extra mask operand as operand 6.  Bit @var{i} of
> +the mask is set and i < len if element @var{i} of the result should be
> +loaded from memory.  Element @var{i} of the result should be undefined
> +value when either Bit @var{i} of the mask is clear or i >= len.
> +

I would suggest rephrasing this slightly as:

"Like ... but takes an extra length operand (operand 5) as well as a
mask operand (operand 6).  Similar to len_maskload, the instruction
loads at most (operand 5) elements from memory.  
Bit @var{i} of the mask is set if element @var{i} of the result should
be loaded from memory and clear if element @var{i} of the result
should be undefined.  Mask elements @var{i} with i >= (operand 5) are
ignored."

to make it more similar to mask_gather_load.  Further improvements
welcome though - not sure if we should refer to len_maskload at all
because it has a bias operand and mask_gather_load doesn't.
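
To make the intended semantics concrete, here is a scalar reference model in
C++.  It is only a sketch, not GCC code: the function name and signature are
made up, the real pattern's base/offset/scale operands are simplified to a
base pointer plus element indices, and "undefined" result lanes are modelled
by leaving the previous contents untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of a length- and mask-controlled gather load.
// Lane i is loaded only when i < len and mask[i] is set; all other lanes
// keep their old value here, standing in for "undefined".
template <typename T>
void
len_mask_gather_load_model (std::vector<T>& result, const T* base,
                            const std::vector<std::size_t>& index,
                            std::size_t len, const std::vector<bool>& mask)
{
  for (std::size_t i = 0; i < result.size (); ++i)
    if (i < len && mask[i])
      result[i] = base[index[i]];
}
```

In this model a lane with i >= len is inactive regardless of its mask bit,
which matches the "i < len" condition in the patch's documentation text.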

> +@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_scatter_store@var{m}@var{n}}
> +Like @samp{scatter_store@var{m}@var{n}}, but takes an extra len operand as
> +operand 5 and an extra mask operand as operand 6.  Bit @var{i} of the mask
> +is set and i < len if element @var{i} of the result should be stored to 
> memory.
> +
"an extra length operand (operand 5)... The instruction stores
at most (operand 5) elements of (operand 4) to memory.
Bit @var{i} of the mask is set if element @var{i} of (operand 4)
should be stored.  Mask elements @var{i} with i >= (operand 5) are
ignored."
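
The store side can be sketched as a scalar model as well.  This is an
illustration under assumptions, not GCC code: the function name and
signature are hypothetical, and the pattern's base/offset/scale operands are
simplified to a base pointer plus element indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of a length- and mask-controlled scatter
// store.  Lane i of `value` is written only when i < len and mask[i] is
// set; memory for all other lanes is left unchanged.
template <typename T>
void
len_mask_scatter_store_model (T* base, const std::vector<std::size_t>& index,
                              const std::vector<T>& value, std::size_t len,
                              const std::vector<bool>& mask)
{
  for (std::size_t i = 0; i < value.size (); ++i)
    if (i < len && mask[i])
      base[index[i]] = value[i];
}
```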

Regards
 Robin

Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread Robin Dapp via Gcc-patches

>> Hi Robin:
>>
>>> diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc
>>> index 52d7626e92e..14d419c2013 100644
>>> --- a/gcc/lto/lto-lang.cc
>>> +++ b/gcc/lto/lto-lang.cc
>>> @@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
>>>else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
>>>&& valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
>>>  {
>>> -  unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE 
>>> (mode),
>>> +  unsigned int elem_bits = vectwhereor_element_size 
>>> (GET_MODE_PRECISION (mode),
>>
>> This seems weird?

Indeed :D Must be an accidental middle-click in Thunderbird.  I just
re-checked and the diff itself is fine.

> FWIW, I bootstrapped & regression-tested the patch with that fixed
> on aarch64-linux-gnu (all languages).
> 
> So OK with the above fixed from my POV.
Oh, thanks!  Mine is still running, not even with all languages.  I picked
the M1 from the compile farm which only has eight cores.

Kito (or somebody else), would you mind doing a RISC-V bootstrap?  It would
take forever on my machine.  Thank you.

Regards
 Robin

Re: [PATCH 10/11] riscv: thead: Add support for the XTheadMemIdx ISA extension

2023-06-29 Thread Jeff Law via Gcc-patches





On 6/29/23 01:39, Christoph Müllner wrote:
> On Wed, Jun 28, 2023 at 8:23 PM Jeff Law wrote:
>> On 6/28/23 06:39, Christoph Müllner wrote:
>>>>> +;; XTheadMemIdx overview:
>>>>> +;; All peephole passes attempt to improve the operand utilization of
>>>>> +;; XTheadMemIdx instructions, where one sign or zero extended
>>>>> +;; register-index-operand can be shifted left by a 2-bit immediate.
>>>>> +;;
>>>>> +;; The basic idea is the following optimization:
>>>>> +;; (set (reg 0) (op (reg 1) (imm 2)))
>>>>> +;; (set (reg 3) (mem (plus (reg 0) (reg 4))))
>>>>> +;; ==>
>>>>> +;; (set (reg 3) (mem (plus (reg 4) (op2 (reg 1) (imm 2)))))
>>>>> +;; This optimization is only valid if (reg 0) has no further uses.
>>>>
>>>> Couldn't this be done by combine if you created define_insn patterns
>>>> rather than define_peephole2 patterns?  Similarly for the other cases
>>>> handled here.
>>>
>>> I was inspired by XTheadMemPair, which merges two memory accesses
>>> into a mem-pair instruction (and which got inspiration from
>>> gcc/config/aarch64/aarch64-ldpstp.md).
>>
>> Right.  I'm pretty familiar with those.  They cover a different case,
>> specifically the two insns being optimized don't have a true data
>> dependency between them.  ie, the first instruction does not produce a
>> result used in the second insn.
>>
>> In the case above there is a data dependency on reg0.  ie, the first
>> instruction generates a result used in the second instruction.  combine
>> is usually the best place to handle the data dependency case.
>
> Ok, understood.
>
> It is a bit of a special case here, because the peephole is restricted
> to those cases, where reg0 is not used elsewhere (peep2_reg_dead_p()).
> I have not seen how to do this for combiner optimizations.

If the value is used elsewhere, then the combiner will generate a
parallel with two sets.  If the value dies, then the combiner generates
the one set.  ie given

(set (t) (op0 (a) (b)))
(set (r) (op1 (c) (t)))

If "t" is dead, then combine will present you with:

(set (r) (op1 (c) (op0 (a) (b))))

If "t" is used elsewhere, then combine will present you with:

(parallel
  [(set (r) (op1 (c) (op0 (a) (b))))
   (set (t) (op0 (a) (b)))])

Which makes perfect sense if you think about it for a while.  If you
still need "t", then the first sequence simply isn't valid as it doesn't
preserve that side effect.  Hence it tries to produce a sequence with
the combined operation, but with the side effect of the first statement
included as well.



Jeff

Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread Richard Sandiford via Gcc-patches

Kito Cheng  writes:
> Hi Robin:
>
>> diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc
>> index 52d7626e92e..14d419c2013 100644
>> --- a/gcc/lto/lto-lang.cc
>> +++ b/gcc/lto/lto-lang.cc
>> @@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
>>else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
>>&& valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
>>  {
>> -  unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
>> +  unsigned int elem_bits = vectwhereor_element_size (GET_MODE_PRECISION 
>> (mode),
>
> This seems weird?

FWIW, I bootstrapped & regression-tested the patch with that fixed
on aarch64-linux-gnu (all languages).

So OK with the above fixed from my POV.

Thanks,
Richard

>
>> GET_MODE_NUNITS (mode));
>>tree bool_type = build_nonstandard_boolean_type (elem_bits);
>>return build_vector_type_for_mode (bool_type, mode);

Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread Kito Cheng via Gcc-patches

Hi Robin:

> diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc
> index 52d7626e92e..14d419c2013 100644
> --- a/gcc/lto/lto-lang.cc
> +++ b/gcc/lto/lto-lang.cc
> @@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
>else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
>&& valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
>  {
> -  unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
> +  unsigned int elem_bits = vectwhereor_element_size (GET_MODE_PRECISION 
> (mode),

This seems weird?

> GET_MODE_NUNITS (mode));
>tree bool_type = build_nonstandard_boolean_type (elem_bits);
>return build_vector_type_for_mode (bool_type, mode);

Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

2023-06-29 Thread Richard Biener via Gcc-patches

On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod  wrote:
>
>
>
>
>
> From: Richard Biener 
> Date: Tuesday, June 27, 2023 at 12:58 PM
> To: Tejas Belagod 
> Cc: gcc-patches@gcc.gnu.org 
> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
>
> On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod  wrote:
> >
> >
> >
> >
> >
> > From: Richard Biener 
> > Date: Monday, June 26, 2023 at 2:23 PM
> > To: Tejas Belagod 
> > Cc: gcc-patches@gcc.gnu.org 
> > Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
> >
> > On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
> >  wrote:
> > >
> > > Hi,
> > >
> > > Packed Boolean Vectors
> > > --
> > >
> > > > I'd like to propose a feature addition to GNU Vector extensions to add
> > > > packed boolean vectors (PBV).  This has been discussed in the past here[1]
> > > > and a variant has been implemented in Clang recently[2].
> > > >
> > > > With predication features being added to vector architectures (SVE, MVE,
> > > > AVX), it is a useful feature to have to model predication on targets.  This
> > > > could find its use in intrinsics or just used as is as a GNU vector
> > > > extension being mapped to underlying target features.  For example, the
> > > > packed boolean vector could directly map to a predicate register on SVE.
> > >
> > > Also, this new packed boolean type GNU extension can be used with SVE ACLE
> > > intrinsics to replace a fixed-length svbool_t.
> > >
> > > Here are a few options to represent the packed boolean vector type.
> >
> > The GIMPLE frontend uses a new 'vector_mask' attribute:
> >
> > typedef int v8si __attribute__((vector_size(8*sizeof(int))));
> > typedef v8si v8sib __attribute__((vector_mask));
> >
> > it gets you a vector type that's the appropriate (dependent on the
> > target) vector
> > mask type for the vector data type (v8si in this case).
> >
> >
> >
> > Thanks Richard.
> >
> > Having had a quick look at the implementation, it does seem to tick the
> > boxes.  I must admit I haven't dug deep, but if the target hook allows the
> > mask to be defined in a way that is target-friendly (and I don't know how
> > much effort it will be to migrate the attribute to more front-ends), it
> > should do the job nicely.
> >
> > Let me go back and dig a bit deeper and get back with questions if any.
>
>
> Let me add that the advantage of this is the compiler doesn't need
> to support weird explicitly laid out packed boolean vectors that do
> not match what the target supports and the user doesn't need to know
> what the target supports (and thus have an #ifdef maze around explicitly
> specified layouts).
>
> Sorry for the delayed response – I spent a day experimenting with vector_mask.
>
> Yeah, this is what option 4 in the RFC is trying to achieve – be portable
> enough to avoid having to sprinkle the code with ifdefs.
>
>
> It does remove some flexibility though, for example with -mavx512f -mavx512vl
> you'll get AVX512 style masks for V4SImode data vectors but of course the
> target still supports SSE2/AVX2 style masks as well, but those would not be
> available as "packed boolean vectors", though they are of course in fact
> equal to V4SImode data vectors with -1 or 0 values, so in this particular
> case it might not matter.
>
> That said, the vector_mask attribute will get you V4SImode vectors with
> signed boolean elements of 32 bits for V4SImode data vectors with
> SSE2/AVX2.
>
>
>
> This sounds very much like what the scenario would be with NEON vs SVE.
> Coming to think of it, vector_mask resembles option 4 in the proposal with
> ‘n’ implied by the ‘base’ vector type and a ‘w’ specified for the type.
>
> Given its current implementation, if vector_mask is exposed to the CFE, would
> there be any major challenges wrt implementation or defining behaviour
> semantics? I played around with a few examples from the testsuite and wrote
> some new ones. I mostly tried operations that the new type would have to
> support (unary, binary bitwise, initializations etc) – with a couple of
> exceptions most of the ops seem to be supported. I also triggered a couple of
> ICEs in some tests involving implicit conversions to wider/narrower
> vector_mask types (will raise reports for these). Correct me if I’m wrong
> here, but we’d probably have to support a couple of new ops if vector_mask is
> exposed to the CFE – initialization and subscript operations?

Yes, either that or restrict how the mask vectors can be used, thus
properly diagnose improper uses.  A question would be for example how to
write common mask test operations like if (any (mask)) or if (all (mask)).
Likewise writing merge operations - do those as

 a = a | (mask ? b : 0);

thus use ternary ?: for this?  For initialization regular vector
syntax should work:

mtype mask = (mtype){ -1, -1, 0, 0, ... };
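
For comparison, here is how those operations look today with ordinary GNU
vector types in C++, using a plain integer vector with lanes of -1 or 0 as
the mask.  This is only a sketch: it does not use the vector_mask attribute,
and merge, any and all are hypothetical helper names rather than existing
builtins.

```cpp
#include <cassert>

typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));

// Lane-wise select using the vector form of ?: that g++ accepts:
// take b where the mask lane is nonzero, otherwise a.
inline v4si
merge (v4si a, v4si b, v4si mask)
{
  return mask ? b : a;
}

// Hypothetical reductions over the mask lanes, written with subscripts.
inline bool
any (v4si mask)
{
  return mask[0] || mask[1] || mask[2] || mask[3];
}

inline bool
all (v4si mask)
{
  return mask[0] && mask[1] && mask[2] && mask[3];
}
```

Whether a packed mask type should support the same ?: and subscript forms is
exactly the open question raised above.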

there's the question of the

Re: [committed] libstdc++: Fix preprocessor conditions for std::from_chars [PR109921]

2023-06-29 Thread Christophe Lyon via Gcc-patches

On Thu, 29 Jun 2023 at 14:50, Jonathan Wakely  wrote:

>
>
> On Thu, 1 Jun 2023 at 12:05, Jonathan Wakely 
> wrote:
>
>> On Thu, 1 Jun 2023 at 10:30, Christophe Lyon via Libstdc++
>>  wrote:
>> >
>> > Hi,
>> >
>> >
>> > On Wed, 31 May 2023 at 14:25, Jonathan Wakely via Gcc-patches <
>> > gcc-patches@gcc.gnu.org> wrote:
>> >
>> > > Tested powerpc64le-linux. Pushed to trunk.
>> > >
>> > > -- >8 --
>> > >
>> > > We use the from_chars_strtod function with __strtof128 to read a
>> > > _Float128 value, but from_chars_strtod is not defined unless uselocale
>> > > is available. This can lead to compilation failures for some targets,
>> > > because we try to define the _Float128 overload in terms of a
>> > > non-existing from_chars_strtod function.
>> > >
>> > > Only try to use __strtof128 if uselocale is available, otherwise
>> > > fallback to the long double overload of std::from_chars (which might
>> > > fallback to the double overload, which should use fast_float).
>> > >
>> > > This ensures we always define the full set of overloads, even if they
>> > > are not always accurate for all values of the wider types.
>> > >
>> > > libstdc++-v3/ChangeLog:
>> > >
>> > > PR libstdc++/109921
>> > > * src/c++17/floating_from_chars.cc (USE_STRTOF128_FOR_FROM_CHARS):
>> > > Only define when USE_STRTOD_FOR_FROM_CHARS is also defined.
>> > > (USE_STRTOD_FOR_FROM_CHARS): Do not undefine when long double is
>> > > binary64.
>> > > (from_chars(const char*, const char*, double&, chars_format)):
>> > > Check __LDBL_MANT_DIG__ == __DBL_MANT_DIG__ here.
>> > > (from_chars(const char*, const char*, _Float128&, chars_format))
>> > > Only use from_chars_strtod when USE_STRTOD_FOR_FROM_CHARS is
>> > > defined, otherwise parse a long double and convert to _Float128.
>> > >
>> >
>> >
>> > This is causing a regression on aarch64:
>> >  FAIL: libstdc++-abi/abi_check
>>
>> This is now PR 110077.
>>
>
> Hi Christophe,
>
> Is this fixed for aarch64 now? I think it should be.
>
Hi Jonathan,

Yes, I now see
PASS: libstdc++-abi/abi_check

Thanks for fixing this.

Christophe



>
>>
>>
>> >
>> > The log says:
>> >
>> > 3 added symbols
>> > 0
>> > _ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE11_S_allocateERS3_m
>> > std::__cxx11::basic_string<wchar_t, std::char_traits<wchar_t>,
>> > std::allocator<wchar_t> >::_S_allocate(std::allocator<wchar_t>&,
>> > unsigned long)
>> > version status: compatible
>> > GLIBCXX_3.4.32
>> > type: function
>> > status: added
>> >
>> > 1
>> > _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE11_S_allocateERS3_m
>> > std::__cxx11::basic_string<char, std::char_traits<char>,
>> > std::allocator<char> >::_S_allocate(std::allocator<char>&,
>> > unsigned long)
>> > version status: compatible
>> > GLIBCXX_3.4.32
>> > type: function
>> > status: added
>> >
>> > 2
>> > _ZSt10from_charsPKcS0_RDF128_St12chars_format
>> > std::from_chars(char const*, char const*, _Float128&, std::chars_format)
>> > version status: incompatible
>> > GLIBCXX_3.4.31
>> > type: function
>> > status: added
>> >
>> >
>> > 2 undesignated symbols
>> > 0
>> > _ZSt11__once_call
>> > std::__once_call
>> > version status: compatible
>> > GLIBCXX_3.4.11
>> > type: tls
>> > type size: 8
>> > status: undesignated
>> >
>> > 1
>> > _ZSt15__once_callable
>> > std::__once_callable
>> > version status: compatible
>> > GLIBCXX_3.4.11
>> > type: tls
>> > type size: 8
>> > status: undesignated
>> >
>> >
>> > 1 incompatible symbols
>> > 0
>> > _ZSt10from_charsPKcS0_RDF128_St12chars_format
>> > std::from_chars(char const*, char const*, _Float128&, std::chars_format)
>> > version status: incompatible
>> > GLIBCXX_3.4.31
>> > type: function
>> > status: added
>> >
>> >
>> >
>> >  libstdc++-v3 check-abi Summary 
>> >
>> > # of added symbols:  3
>> > # of missing symbols:0
>> > # of undesignated symbols:   2
>> > # of incompatible symbols:   1
>> >
>> >
>> > Can you have a look?
>> >
>> > Thanks,
>> > Christophe
>> >
>> > ---
>> > >  libstdc++-v3/src/c++17/floating_from_chars.cc | 20
>> ---
>> > >  1 file changed, 13 insertions(+), 7 deletions(-)
>> > >
>> > > diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc
>> > > b/libstdc++-v3/src/c++17/floating_from_chars.cc
>> > > index ebd428d5be3..eea878072b0 100644
>> > > --- a/libstdc++-v3/src/c++17/floating_from_chars.cc
>> > > +++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
>> > > @@ -64,7 +64,7 @@
>> > >  // strtold for __ieee128
>> > >  extern "C" __ieee128 __strtoieee128(const char*, char**);
>> > >  #elif __FLT128_MANT_DIG__ == 113 && __LDBL_MANT_DIG__ != 113 \
>> > > -  && defined(__GLIBC_PREREQ)
>> > > +  && defined(__GLIBC_PREREQ) &&
>> defined(USE_STRTOD_FOR_FROM_CHARS)
>> > >  #define USE_STRTOF128_FOR_FROM_CHARS 1
>> > >  extern "C" _Float128 __strtof128(const char*, char**)
>> > >__asm ("strtof128")
>> > > @@ -77,10 +77,6 @@ extern "C" _Float128 __strtof128(const char*,
>> char**)
>> > >  #if

Re: [PATCH][RFC] target/110456 - avoid loop masking with zero distance dependences

2023-06-29 Thread Richard Biener via Gcc-patches

On Thu, 29 Jun 2023, Richard Sandiford wrote:

> Richard Biener  writes:
> > With applying loop masking to epilogues on x86_64 AVX512 we see
> > some significant performance regressions when evaluating SPEC CPU 2017
> > that are caused by store-to-load forwarding fails across outer
> > loop iterations when the inner loop does not iterate.  Consider
> >
> >   for (j = 0; j < m; ++j)
> > for (i = 0; i < n; ++i)
> >   a[j*n + i] += b[j*n + i];
> >
> > with 'n' chosen so that the inner loop vectorized code is fully
> > executed by the masked epilogue and that masked epilogue
> > storing O > n elements (with elements >= n masked of course).
> > Then the masked load performed for the next outer loop iteration
> > will get a hit in the store queue but it obviously cannot forward
> > so we have to wait for the store to retire.
> >
> > That causes a significant hit to performance especially if 'n'
> > would have made a non-masked epilogue to fully cover 'n' as well
> > (say n == 4 for a V4DImode epilogue), avoiding the need for
> > store-forwarding and waiting for the retiring of the store.
> >
> > The following applies a very simple heuristic, disabling
> > the use of loop masking when there's a memory reference pair
> > with dependence distance zero.  That resolves the issue
> > (other problematic dependence distances seem to be less common
> > at least).
> >
> > I have applied this heuristic in generic vectorizer code but
> > restricted it to non-VL vector sizes.  There currently isn't
> > a way for the target to request disabling of masking only,
> > while we can reject the vectorization at costing time that will
> > not re-consider the same vector mode but without masking.
> > It seems simply re-costing with masking disabled should be
> > possible though, we'd just need an indication whether that
> > should be done?  Maybe always when the current vector mode is
> > of fixed size?
> >
> > I wonder how SVE vectorized code behaves in these situations?
> > The affected SPEC CPU 2017 benchmarks were 527.cam4_r and
> > 503.bwaves_r though I think both will need a hardware vector
> > size covering at least 8 doubles to show the issue.  527.cam4_r
> > has 4 elements in the inner loop, 503.bwaves_r 5 IIRC.
> >
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >
> > Any comments?
> >
> > Thanks,
> > Richard.
> >
> > PR target/110456
> > * tree-vectorizer.h (vec_info_shared::has_zero_dep_dist): New.
> > * tree-vectorizer.cc (vec_info_shared::vec_info_shared):
> > Initialize has_zero_dep_dist.
> > * tree-vect-data-refs.cc (vect_analyze_data_ref_dependence):
> > Remember if we've seen a dependence distance of zero.
> > * tree-vect-stmts.cc (check_load_store_for_partial_vectors):
> > When we've seen a dependence distance of zero and the vector
> > type has constant size disable the use of partial vectors.
> > ---
> >  gcc/tree-vect-data-refs.cc |  2 ++
> >  gcc/tree-vect-stmts.cc | 10 ++
> >  gcc/tree-vectorizer.cc |  1 +
> >  gcc/tree-vectorizer.h  |  3 +++
> >  4 files changed, 16 insertions(+)
> >
> > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > index ebe93832b1e..40cde95c16a 100644
> > --- a/gcc/tree-vect-data-refs.cc
> > +++ b/gcc/tree-vect-data-refs.cc
> > @@ -470,6 +470,8 @@ vect_analyze_data_ref_dependence (struct 
> > data_dependence_relation *ddr,
> >  "dependence distance == 0 between %T and %T\n",
> >  DR_REF (dra), DR_REF (drb));
> >  
> > + loop_vinfo->shared->has_zero_dep_dist = true;
> > +
> >   /* When we perform grouped accesses and perform implicit CSE
> >  by detecting equal accesses and doing disambiguation with
> >  runtime alias tests like for
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index d642d3c257f..3bcbc000323 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -1839,6 +1839,16 @@ check_load_store_for_partial_vectors (loop_vec_info 
> > loop_vinfo, tree vectype,
> >using_partial_vectors_p = true;
> >  }
> >  
> > +  if (loop_vinfo->shared->has_zero_dep_dist
> > +  && TYPE_VECTOR_SUBPARTS (vectype).is_constant ())
> 
> I don't think it makes sense to treat VLA and VLS differently here.
> 
> But RMW operations are very common, so it seems like we're giving up a
> lot on the off chance that the inner loop is applied iteratively
> to successive memory locations.

Yes ...

> Maybe that's still OK for AVX512, where I guess loop masking is more
> of a niche use case.  But if so, then yeah, I think a hook to disable
> masking might be better here.

It's a nice use case in that if you'd cost the main vector loop
with and without masking the not masked case is always winning.
I understand with SVE it would be the same if you fix the
vector size but otherwise it would be a win to mask as there's
the chance the HW implementation uses bigger vectors than the

Re: [PATCH][RFC] target/110456 - avoid loop masking with zero distance dependences

2023-06-29 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
> With applying loop masking to epilogues on x86_64 AVX512 we see
> some significant performance regressions when evaluating SPEC CPU 2017
> that are caused by store-to-load forwarding fails across outer
> loop iterations when the inner loop does not iterate.  Consider
>
>   for (j = 0; j < m; ++j)
> for (i = 0; i < n; ++i)
>   a[j*n + i] += b[j*n + i];
>
> with 'n' chosen so that the inner loop vectorized code is fully
> executed by the masked epilogue and that masked epilogue
> storing O > n elements (with elements >= n masked of course).
> Then the masked load performed for the next outer loop iteration
> will get a hit in the store queue but it obviously cannot forward
> so we have to wait for the store to retire.
>
> That causes a significant hit to performance especially if 'n'
> would have made a non-masked epilogue to fully cover 'n' as well
> (say n == 4 for a V4DImode epilogue), avoiding the need for
> store-forwarding and waiting for the retiring of the store.
>
> The following applies a very simple heuristic, disabling
> the use of loop masking when there's a memory reference pair
> with dependence distance zero.  That resolves the issue
> (other problematic dependence distances seem to be less common
> at least).
>
> I have applied this heuristic in generic vectorizer code but
> restricted it to non-VL vector sizes.  There currently isn't
> a way for the target to request disabling of masking only,
> while we can reject the vectorization at costing time that will
> not re-consider the same vector mode but without masking.
> It seems simply re-costing with masking disabled should be
> possible though, we'd just need an indication whether that
> should be done?  Maybe always when the current vector mode is
> of fixed size?
>
> I wonder how SVE vectorized code behaves in these situations?
> The affected SPEC CPU 2017 benchmarks were 527.cam4_r and
> 503.bwaves_r though I think both will need a hardware vector
> size covering at least 8 doubles to show the issue.  527.cam4_r
> has 4 elements in the inner loop, 503.bwaves_r 5 IIRC.
>
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>
> Any comments?
>
> Thanks,
> Richard.
>
>   PR target/110456
>   * tree-vectorizer.h (vec_info_shared::has_zero_dep_dist): New.
>   * tree-vectorizer.cc (vec_info_shared::vec_info_shared):
>   Initialize has_zero_dep_dist.
>   * tree-vect-data-refs.cc (vect_analyze_data_ref_dependence):
>   Remember if we've seen a dependence distance of zero.
>   * tree-vect-stmts.cc (check_load_store_for_partial_vectors):
>   When we've seen a dependence distance of zero and the vector
>   type has constant size disable the use of partial vectors.
> ---
>  gcc/tree-vect-data-refs.cc |  2 ++
>  gcc/tree-vect-stmts.cc | 10 ++
>  gcc/tree-vectorizer.cc |  1 +
>  gcc/tree-vectorizer.h  |  3 +++
>  4 files changed, 16 insertions(+)
>
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index ebe93832b1e..40cde95c16a 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -470,6 +470,8 @@ vect_analyze_data_ref_dependence (struct 
> data_dependence_relation *ddr,
>"dependence distance == 0 between %T and %T\n",
>DR_REF (dra), DR_REF (drb));
>  
> +   loop_vinfo->shared->has_zero_dep_dist = true;
> +
> /* When we perform grouped accesses and perform implicit CSE
>by detecting equal accesses and doing disambiguation with
>runtime alias tests like for
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index d642d3c257f..3bcbc000323 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1839,6 +1839,16 @@ check_load_store_for_partial_vectors (loop_vec_info 
> loop_vinfo, tree vectype,
>using_partial_vectors_p = true;
>  }
>  
> +  if (loop_vinfo->shared->has_zero_dep_dist
> +  && TYPE_VECTOR_SUBPARTS (vectype).is_constant ())

I don't think it makes sense to treat VLA and VLS differently here.

But RMW operations are very common, so it seems like we're giving up a
lot on the off chance that the inner loop is applied iteratively
to successive memory locations.

Maybe that's still OK for AVX512, where I guess loop masking is more
of a niche use case.  But if so, then yeah, I think a hook to disable
masking might be better here.

Thanks,
Richard

> +{
> +  if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +  "disabling partial vectors because of possible "
> +  "STLF issues\n");
> +  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> +}
> +
>if (!using_partial_vectors_p)
>  {
>if (dump_enabled_p ())
> diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> index a048e9d8917..74457259b6e 100644
> --- a/gcc/tree-vectorizer.cc
> +++

[PATCH] LoongArch: Fix bug in loongarch_emit_stack_tie [PR110484].

2023-06-29 Thread chenglulu

From: Lulu Cheng 

The current code may result in implicit references to $fp when
frame_pointer_needed is false, causing regs_ever_live[$fp] to be true
when $fp is not explicitly used, which in turn results in $fp being used
as the target replacement register in the rnreg pass.

The bug originates from SPEC2017 541.leela_r(-flto).

gcc/ChangeLog:

PR target/110484
* config/loongarch/loongarch.cc (loongarch_emit_stack_tie): Use the
frame_pointer_needed to determine whether to use the $fp register.

Co-authored-by: Guo Jie 
---
 gcc/config/loongarch/loongarch.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 5b8b93eb24b..aa470aafd7b 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1112,7 +1112,9 @@ loongarch_first_stack_step (struct loongarch_frame_info 
*frame)
 static void
 loongarch_emit_stack_tie (void)
 {
-  emit_insn (gen_stack_tie (Pmode, stack_pointer_rtx, hard_frame_pointer_rtx));
+  emit_insn (gen_stack_tie (Pmode, stack_pointer_rtx,
+   frame_pointer_needed ? hard_frame_pointer_rtx
+   : stack_pointer_rtx));
 }
 
 #define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP)
-- 
2.31.1

Re: [committed] libstdc++: Fix preprocessor conditions for std::from_chars [PR109921]

2023-06-29 Thread Jonathan Wakely via Gcc-patches

On Thu, 1 Jun 2023 at 12:05, Jonathan Wakely  wrote:

> On Thu, 1 Jun 2023 at 10:30, Christophe Lyon via Libstdc++
>  wrote:
> >
> > Hi,
> >
> >
> > On Wed, 31 May 2023 at 14:25, Jonathan Wakely via Gcc-patches <
> > gcc-patches@gcc.gnu.org> wrote:
> >
> > > Tested powerpc64le-linux. Pushed to trunk.
> > >
> > > -- >8 --
> > >
> > > We use the from_chars_strtod function with __strtof128 to read a
> > > _Float128 value, but from_chars_strtod is not defined unless uselocale
> > > is available. This can lead to compilation failures for some targets,
> > > because we try to define the _Float128 overload in terms of a
> > > non-existing from_chars_strtod function.
> > >
> > > Only try to use __strtof128 if uselocale is available, otherwise
> > > fallback to the long double overload of std::from_chars (which might
> > > fallback to the double overload, which should use fast_float).
> > >
> > > This ensures we always define the full set of overloads, even if they
> > > are not always accurate for all values of the wider types.
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > > PR libstdc++/109921
> > > * src/c++17/floating_from_chars.cc
> (USE_STRTOF128_FOR_FROM_CHARS):
> > > Only define when USE_STRTOD_FOR_FROM_CHARS is also defined.
> > > (USE_STRTOD_FOR_FROM_CHARS): Do not undefine when long double
> is
> > > binary64.
> > > (from_chars(const char*, const char*, double&, chars_format)):
> > > Check __LDBL_MANT_DIG__ == __DBL_MANT_DIG__ here.
> > > (from_chars(const char*, const char*, _Float128&,
> chars_format))
> > > Only use from_chars_strtod when USE_STRTOD_FOR_FROM_CHARS is
> > > defined, otherwise parse a long double and convert to
> _Float128.
> > >
> >
> >
> > This is causing a regression on aarch64:
> >  FAIL: libstdc++-abi/abi_check
>
> This is now PR 110077.
>

Hi Christophe,

Is this fixed for aarch64 now? I think it should be.



>
>
> >
> > The log says:
> >
> > 3 added symbols
> > 0
> > _ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE11_S_allocateERS3_m
> > std::__cxx11::basic_string<wchar_t, std::char_traits<wchar_t>,
> > std::allocator<wchar_t> >::_S_allocate(std::allocator<wchar_t>&,
> > unsigned long)
> > version status: compatible
> > GLIBCXX_3.4.32
> > type: function
> > status: added
> >
> > 1
> > _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE11_S_allocateERS3_m
> > std::__cxx11::basic_string<char, std::char_traits<char>,
> > std::allocator<char> >::_S_allocate(std::allocator<char>&, unsigned long)
> > version status: compatible
> > GLIBCXX_3.4.32
> > type: function
> > status: added
> >
> > 2
> > _ZSt10from_charsPKcS0_RDF128_St12chars_format
> > std::from_chars(char const*, char const*, _Float128&, std::chars_format)
> > version status: incompatible
> > GLIBCXX_3.4.31
> > type: function
> > status: added
> >
> >
> > 2 undesignated symbols
> > 0
> > _ZSt11__once_call
> > std::__once_call
> > version status: compatible
> > GLIBCXX_3.4.11
> > type: tls
> > type size: 8
> > status: undesignated
> >
> > 1
> > _ZSt15__once_callable
> > std::__once_callable
> > version status: compatible
> > GLIBCXX_3.4.11
> > type: tls
> > type size: 8
> > status: undesignated
> >
> >
> > 1 incompatible symbols
> > 0
> > _ZSt10from_charsPKcS0_RDF128_St12chars_format
> > std::from_chars(char const*, char const*, _Float128&, std::chars_format)
> > version status: incompatible
> > GLIBCXX_3.4.31
> > type: function
> > status: added
> >
> >
> >
> >  libstdc++-v3 check-abi Summary 
> >
> > # of added symbols:  3
> > # of missing symbols:0
> > # of undesignated symbols:   2
> > # of incompatible symbols:   1
> >
> >
> > Can you have a look?
> >
> > Thanks,
> > Christophe
> >
> > ---
> > >  libstdc++-v3/src/c++17/floating_from_chars.cc | 20 ---
> > >  1 file changed, 13 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc
> > > b/libstdc++-v3/src/c++17/floating_from_chars.cc
> > > index ebd428d5be3..eea878072b0 100644
> > > --- a/libstdc++-v3/src/c++17/floating_from_chars.cc
> > > +++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
> > > @@ -64,7 +64,7 @@
> > >  // strtold for __ieee128
> > >  extern "C" __ieee128 __strtoieee128(const char*, char**);
> > >  #elif __FLT128_MANT_DIG__ == 113 && __LDBL_MANT_DIG__ != 113 \
> > > -  && defined(__GLIBC_PREREQ)
> > > +  && defined(__GLIBC_PREREQ) && defined(USE_STRTOD_FOR_FROM_CHARS)
> > >  #define USE_STRTOF128_FOR_FROM_CHARS 1
> > >  extern "C" _Float128 __strtof128(const char*, char**)
> > >__asm ("strtof128")
> > > @@ -77,10 +77,6 @@ extern "C" _Float128 __strtof128(const char*,
> char**)
> > >  #if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 &&
> _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 \
> > >  && __SIZE_WIDTH__ >= 32
> > >  # define USE_LIB_FAST_FLOAT 1
> > > -# if __LDBL_MANT_DIG__ == __DBL_MANT_DIG__
> > > -// No need to use strtold.
> > > -#  undef USE_STRTOD_FOR_FROM_CHARS
> > > -# endif
> > >  #endif
> > >
> > >  #if USE_LIB_FAST_FLOAT
> > > @@ -1261,7

[PATCH][RFC] target/110456 - avoid loop masking with zero distance dependences

2023-06-29 Thread Richard Biener via Gcc-patches

With applying loop masking to epilogues on x86_64 AVX512 we see
some significant performance regressions when evaluating SPEC CPU 2017
that are caused by store-to-load forwarding fails across outer
loop iterations when the inner loop does not iterate.  Consider

  for (j = 0; j < m; ++j)
for (i = 0; i < n; ++i)
  a[j*n + i] += b[j*n + i];

with 'n' chosen so that the inner loop vectorized code is fully
executed by the masked epilogue and that masked epilogue
storing O > n elements (with elements >= n masked of course).
Then the masked load performed for the next outer loop iteration
will get a hit in the store queue but it obviously cannot forward
so we have to wait for the store to retire.

That causes a significant hit to performance especially if 'n'
would have made a non-masked epilogue to fully cover 'n' as well
(say n == 4 for a V4DImode epilogue), avoiding the need for
store-forwarding and waiting for the retiring of the store.
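
To make the scenario concrete, here is a toy sketch in plain C.  It is not
the vectorizer's logic: add_rows is just the scalar loop nest from above,
and masked_off_lanes is a hypothetical helper that counts how many lanes
of a masked epilogue of a given width are switched off for a row of n
elements.

```c
#include <stddef.h>

/* Scalar reference for the loop nest: each row of N elements is a
   read-modify-write of A, so the dependence distance is zero.  */
void
add_rows (double *a, const double *b, size_t m, size_t n)
{
  for (size_t j = 0; j < m; ++j)
    for (size_t i = 0; i < n; ++i)
      a[j * n + i] += b[j * n + i];
}

/* Hypothetical helper: number of inactive lanes in the last (masked)
   vector of LANES elements when a row holds N elements.  */
size_t
masked_off_lanes (size_t n, size_t lanes)
{
  size_t tail = n % lanes;
  return tail == 0 ? 0 : lanes - tail;
}
```

For n == 5 and 8 lanes the helper returns 3: the masked store for row j
spans the addresses of 8 elements with 3 lanes inactive, and the masked
load for row j + 1 starts 5 elements later, inside that span.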

The following applies a very simple heuristic, disabling
the use of loop masking when there's a memory reference pair
with dependence distance zero.  That resolves the issue
(other problematic dependence distances seem to be less common
at least).

I have applied this heuristic in generic vectorizer code but
restricted it to non-VL vector sizes.  There currently isn't
a way for the target to request disabling of masking only,
while we can reject the vectorization at costing time that will
not re-consider the same vector mode but without masking.
It seems simply re-costing with masking disabled should be
possible though, we'd just need an indication whether that
should be done?  Maybe always when the current vector mode is
of fixed size?

I wonder how SVE vectorized code behaves in these situations?
The affected SPEC CPU 2017 benchmarks were 527.cam4_r and
503.bwaves_r though I think both will need a hardware vector
size covering at least 8 doubles to show the issue.  527.cam4_r
has 4 elements in the inner loop, 503.bwaves_r 5 IIRC.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Any comments?

Thanks,
Richard.

PR target/110456
* tree-vectorizer.h (vec_info_shared::has_zero_dep_dist): New.
* tree-vectorizer.cc (vec_info_shared::vec_info_shared):
Initialize has_zero_dep_dist.
* tree-vect-data-refs.cc (vect_analyze_data_ref_dependence):
Remember if we've seen a dependence distance of zero.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors):
When we've seen a dependence distance of zero and the vector
type has constant size disable the use of partial vectors.
---
 gcc/tree-vect-data-refs.cc |  2 ++
 gcc/tree-vect-stmts.cc | 10 ++
 gcc/tree-vectorizer.cc |  1 +
 gcc/tree-vectorizer.h  |  3 +++
 4 files changed, 16 insertions(+)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index ebe93832b1e..40cde95c16a 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -470,6 +470,8 @@ vect_analyze_data_ref_dependence (struct 
data_dependence_relation *ddr,
 "dependence distance == 0 between %T and %T\n",
 DR_REF (dra), DR_REF (drb));
 
+ loop_vinfo->shared->has_zero_dep_dist = true;
+
  /* When we perform grouped accesses and perform implicit CSE
 by detecting equal accesses and doing disambiguation with
 runtime alias tests like for
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index d642d3c257f..3bcbc000323 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1839,6 +1839,16 @@ check_load_store_for_partial_vectors (loop_vec_info 
loop_vinfo, tree vectype,
   using_partial_vectors_p = true;
 }
 
+  if (loop_vinfo->shared->has_zero_dep_dist
+  && TYPE_VECTOR_SUBPARTS (vectype).is_constant ())
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"disabling partial vectors because of possible "
+"STLF issues\n");
+  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
+}
+
   if (!using_partial_vectors_p)
 {
   if (dump_enabled_p ())
diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
index a048e9d8917..74457259b6e 100644
--- a/gcc/tree-vectorizer.cc
+++ b/gcc/tree-vectorizer.cc
@@ -478,6 +478,7 @@ vec_info::~vec_info ()
 
 vec_info_shared::vec_info_shared ()
   : n_stmts (0),
+has_zero_dep_dist (false),
 datarefs (vNULL),
 datarefs_copy (vNULL),
 ddrs (vNULL)
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index a36974c2c0d..7626cda2a73 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -419,6 +419,9 @@ public:
   /* The number of scalar stmts.  */
   unsigned n_stmts;
 
+  /* Whether there's a dependence with zero distance.  */
+  bool has_zero_dep_dist;
+
   /* All data references.  Freed by free_data_refs, so not an

Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread Robin Dapp via Gcc-patches

> This should probably use GET_MODE_PRECISION as well.
> 
> OK if it bootstraps/tests on both aarch64 and riscv.
> 
> Richard.

I found several other instances, also in the frontends, that
I'm not exactly sure about.  I'm currently testing this, but the aarch64
bootstrap is still going to take a while (various aarch64 compile
farm machines are down?).

Regards
 Robin

From ef919a27f4a156afeca6b4825e6029d9f44be556 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Wed, 28 Jun 2023 20:59:29 +0200
Subject: [PATCH] mode_bitsize -> precision.

bitsize -> precision.
---
 gcc/c-family/c-common.cc  |  2 +-
 gcc/fortran/trans-types.cc|  2 +-
 gcc/go/go-lang.cc |  2 +-
 gcc/lto/lto-lang.cc   |  2 +-
 gcc/rust/backend/rust-tree.cc |  2 +-
 gcc/simplify-rtx.cc   | 10 +-
 gcc/tree.cc   |  2 +-
 gcc/varasm.cc |  2 +-
 8 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 34566a342bd..6ab63dae997 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -2458,7 +2458,7 @@ c_common_type_for_mode (machine_mode mode, int unsignedp)
   else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
   && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
 {
-  unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+  unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
   tree bool_type = build_nonstandard_boolean_type (elem_bits);
   return build_vector_type_for_mode (bool_type, mode);
diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index d718f28cc86..987e3d26c46 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -3403,7 +3403,7 @@ gfc_type_for_mode (machine_mode mode, int unsignedp)
   else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
   && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
 {
-  unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+  unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
   tree bool_type = build_nonstandard_boolean_type (elem_bits);
   return build_vector_type_for_mode (bool_type, mode);
diff --git a/gcc/go/go-lang.cc b/gcc/go/go-lang.cc
index e85a4bfe949..d5c871a533c 100644
--- a/gcc/go/go-lang.cc
+++ b/gcc/go/go-lang.cc
@@ -414,7 +414,7 @@ go_langhook_type_for_mode (machine_mode mode, int unsignedp)
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
   && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
 {
-  unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+  unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
   tree bool_type = build_nonstandard_boolean_type (elem_bits);
   return build_vector_type_for_mode (bool_type, mode);
diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc
index 52d7626e92e..14d419c2013 100644
--- a/gcc/lto/lto-lang.cc
+++ b/gcc/lto/lto-lang.cc
@@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
   else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
   && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
 {
-  unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+  unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
   tree bool_type = build_nonstandard_boolean_type (elem_bits);
   return build_vector_type_for_mode (bool_type, mode);
diff --git a/gcc/rust/backend/rust-tree.cc b/gcc/rust/backend/rust-tree.cc
index 8243d4cf5c6..66e859cd70c 100644
--- a/gcc/rust/backend/rust-tree.cc
+++ b/gcc/rust/backend/rust-tree.cc
@@ -5320,7 +5320,7 @@ c_common_type_for_mode (machine_mode mode, int unsignedp)
   && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
 {
   unsigned int elem_bits
-   = vector_element_size (GET_MODE_BITSIZE (mode), GET_MODE_NUNITS (mode));
+   = vector_element_size (GET_MODE_PRECISION (mode), GET_MODE_NUNITS 
(mode));
   tree bool_type = build_nonstandard_boolean_type (elem_bits);
   return build_vector_type_for_mode (bool_type, mode);
 }
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 99cbdd47d93..d7315d82aa3 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -7076,7 +7076,7 @@ native_encode_rtx (machine_mode mode, rtx x, 
vec ,
   /* CONST_VECTOR_ELT follows target memory order, so no shuffling
 is necessary.  The only complication is that MODE_VECTOR_BOOL
 vectors can have several elements per byte.  */
-  unsigned int elt_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+  unsigned int

Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread Richard Biener via Gcc-patches

On Thu, Jun 29, 2023 at 11:10 AM Robin Dapp via Gcc-patches
 wrote:
>
> > Yeah, that part is OK, and was the case I was thinking about when
> > I said OK yesterday.  But now that we allow BITSIZE != PRECISION,
> > it's possible for BITSIZE - PRECISION to be more than a full byte,
> > in which case the new loop would not initialise every byte of
> > the mode.
>
> Ah, I see, so when e.g. BITSIZE == 16 and PRECISION == 1.  Luckily
> this cannot happen with RVV as all we do is adjust the precision
> of the modes that have BITSIZE == 8.  I'm going to add an assert.
> Juzhe would rather work around that in the backend, though.
>
> The other thing I just noticed is
>
> tree
> build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode)
> {
>   gcc_assert (mask_mode != BLKmode);
>
>   unsigned HOST_WIDE_INT esize;
>   if (VECTOR_MODE_P (mask_mode))
> {
>   poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode);
>   esize = vector_element_size (vsize, nunits);
> }
>   else
> esize = 1;
>
>   tree bool_type = build_nonstandard_boolean_type (esize);
>
>   return make_vector_type (bool_type, nunits, mask_mode);
> }
>
> which gives us wrong precision as we rely on the BITSIZE here as well.
> This results in a precision of 1 for VNx8BI, 2 for VNx4BI and 4 for
> VNx2BI.

This should probably use GET_MODE_PRECISION as well.

OK if it bootstraps/tests on both aarch64 and riscv.

Richard.

>
> Maybe this isn't a problem per se but to me it appears
> just wrong.
>
> Regards
>  Robin
>
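
The arithmetic behind the quoted element sizes can be sketched as follows.
This is only an illustration: elem_bits is a hypothetical stand-in for
vector_element_size that works on plain integers instead of poly_uint64,
and the mode values in the note below are the minimum sizes assumed for
this example (BITSIZE of 8 for all three modes, PRECISION equal to the
number of units for the adjusted modes).

```c
/* Hypothetical stand-in for vector_element_size: bits per element of a
   vector of VECTOR_BITS bits holding NUNITS elements.  */
unsigned int
elem_bits (unsigned int vector_bits, unsigned int nunits)
{
  return vector_bits / nunits;
}
```

With BITSIZE, elem_bits (8, 8) is 1 for VNx8BI, elem_bits (8, 4) is 2 for
VNx4BI and elem_bits (8, 2) is 4 for VNx2BI, matching the numbers quoted
above.  With the assumed PRECISION values, elem_bits (4, 4) and
elem_bits (2, 2) are both 1.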

Re: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-29 Thread Robin Dapp via Gcc-patches

>> Since nobody else has provided a patch yet, is the attached OK as long
>> as x86 bootstrap and testsuite are clean?
> 
> Yes.

Bootstrap and testsuite are good.  Going to commit.

Thanks.

Regards
 Robin

[Committed] Add -mmove-max=128 -mstore-max=128 to pieces-memcmp-2.c

2023-06-29 Thread Roger Sayle


Adding -mmove-max=128 and -mstore-max=128 to the dg-options of the
recently added gcc.target/i386/pieces-memcmp-2.c avoids changing the
intent of this testcase when adding -march=cascadelake to RUNTESTFLAGS.
Tested on x86_64-pc-linux-gnu.  Committed as obvious.


2023-06-29  Roger Sayle  

gcc/testsuite/ChangeLog
* gcc.target/i386/pieces-memcmp-2.c: Specify that 128-bit
comparisons are desired, to see if 256-bit instructions are
generated inappropriately (fixes test on -march=cascadelake).

diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcmp-2.c 
b/gcc/testsuite/gcc.target/i386/pieces-memcmp-2.c
index 6f996faeced..6061c911165 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memcmp-2.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcmp-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target ia32 } } */
-/* { dg-options "-O2 -mavx2" } */
+/* { dg-options "-O2 -mavx2 -mmove-max=128 -mstore-max=128" } */
 
 int foo(char *a)
 {

Re: [PATCH v2] mips: Fix overaligned function arguments [PR109435]

2023-06-29 Thread Jovan Dmitrovic

> Ohh, my fault: the `-flto` option should always be skipped, when run test.

Right, if the tests are run with the `-flto` option, they will fail. However, I
believe they are run that way only if LTO support is enabled; that's why my
tests all passed without explicitly skipping that option.

Your modification looks good to me.

Regards,
Jovan

Re: [PATCH] Introduce hardbool attribute for C

2023-06-29 Thread Alexandre Oliva via Gcc-patches

On Jun 28, 2023, Qing Zhao  wrote:

> In summary, Ada’s Boolean variables (whether it’s hardened or not) are
> represented as
> enumeration types in GNU IR.

Not quite.  Only boolean types with representation clauses are.  Those
without (such as Standard.Boolean) are BOOLEAN_TYPEs.  But those without
a representation clause are not so relevant and could be disregarded,
for purposes of this conversation.

> FE takes care of the converting between non-boolean_type_node
> enumeration types and boolean_type_node as needed, no special handling
> in Middle end.

> So, is this exactly the same situation as the new hardbool attribute
> for C being implemented in this patch?

That's correct.

> (Another question, for Ada’s Boolean variables, does the ada FE also
> insert BUILT_IN_TRAP when
>   The value is neither true_value nor false_value?)

Ada raises exceptions when validity checking fails, such as upon using a
boolean variable with a representation clause that holds a value that is
neither true nor false.

>> The middle-end doesn't know (and ATM cannot know) that those represented
>> as enumeration types are conceptually booleans, so they are treated as
>> enumeration types, not as booleans.

> They should know it’s a boolean if using the lookup_attribute to get
> the attribute info -:)

I meant boolean types that have a representation clause but are not
hardbools.  Those don't have any attribute whatsoever.

>> You mean more than what's in the patch posted last week?
> No, the updated doc is good I think.

Great, thanks

> So, from my current understanding, a summary on my major concern and
> the possible solution to this concern:

That was a good summary.

> Is it necessary to fix such inconsistency?

I don't think it is even desirable.

Initialization of static variables is well-defined, one is allowed to
count on a specific value after initialization, and we have that
covered.

Automatic variables, OTOH, when not explicitly initialized, may hold
undefined, random, even malformed values.  Picking an initializer to
make them predictable needs not follow the semantics of zero
initialization for static variables.  =pattern makes it clear that using
something other than zero initialization is useful to catch errors.  The
Ada language standard even suggests that compilers may set uninitialized
variables to out-of-range values so as to catch this sort of error.  So,
though it might seem desirable, for symmetry, to have automatic
variables implicitly initialized similarly to static variables, it's not
clear that doing so serves a useful purpose, at least for such types as
hardened booleans, that are *intended* to catch malformed values.
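
For illustration only (this is not code from the patch): a minimal C sketch of
why a non-zero fill helps with such types.  The representation values 0x5a and
0xa5 and the names HB_FALSE, HB_TRUE and hb_decode are assumptions made up for
this example.  The decoder returns -1 at the point where a hardened boolean
would be expected to trap.

```c
#include <stdint.h>

/* Hypothetical representation values, chosen only for this sketch.  */
#define HB_FALSE 0x5a
#define HB_TRUE  0xa5

/* Decode a stored representation into a truth value.
   Returns 1 for the true encoding, 0 for the false encoding, and -1 for
   any other bit pattern, which is where a hardened boolean would trap.  */
int
hb_decode (uint8_t repr)
{
  if (repr == HB_TRUE)
    return 1;
  if (repr == HB_FALSE)
    return 0;
  return -1;
}
```

With these made-up encodings, a byte filled with an arbitrary pattern such as
0xfe decodes to -1, so a read of a variable that was never assigned is caught
instead of silently reading as true or false.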

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about

Re: [PATCH 2/2] AArch64: New RTL for ABDL

2023-06-29 Thread Richard Sandiford via Gcc-patches

Oluwatamilore Adebayo  writes:
> From: oluade01 
>
> This patch adds new RTL for ABDL (sabdl, sabdl2, uabdl, uabdl2).
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-simd.md
>   (vec_widen_abdl_lo_, vec_widen_abdl_hi_):
>   Expansions for abd vec widen optabs.
>   (aarch64_abdl_insn): VQW based abdl RTL.
>   * config/aarch64/iterators.md (USMAX_EXT): Code attributes
>   that give the appropriate extend RTL for the max RTL.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/abd_2.c: Added ABDL testcases.
>   * gcc.target/aarch64/abd_3.c: Added ABDL testcases.
>   * gcc.target/aarch64/abd_4.c: Added ABDL testcases.
>   * gcc.target/aarch64/abd_none_2.c: Added ABDL testcases.
>   * gcc.target/aarch64/abd_none_3.c: Added ABDL testcases.
>   * gcc.target/aarch64/abd_none_4.c: Added ABDL testcases.
>   * gcc.target/aarch64/abd_run_1.c: Added ABDL testcases.
>   * gcc.target/aarch64/sve/abd_1.c: Added ABDL testcases.
>   * gcc.target/aarch64/sve/abd_2.c: Added ABDL testcases.
>   * gcc.target/aarch64/sve/abd_none_1.c: Added ABDL testcases.
>   * gcc.target/aarch64/sve/abd_none_2.c: Added ABDL testcases.
> ---
>  gcc/config/aarch64/aarch64-simd.md| 65 ++
>  gcc/config/aarch64/iterators.md   |  3 +
>  gcc/testsuite/gcc.target/aarch64/abd_2.c  | 33 +---
>  gcc/testsuite/gcc.target/aarch64/abd_3.c  | 36 +---
>  gcc/testsuite/gcc.target/aarch64/abd_4.c  | 34 
>  gcc/testsuite/gcc.target/aarch64/abd_none_2.c | 73 
>  gcc/testsuite/gcc.target/aarch64/abd_none_3.c | 73 
>  gcc/testsuite/gcc.target/aarch64/abd_none_4.c | 84 +++
>  gcc/testsuite/gcc.target/aarch64/abd_run_1.c  | 29 +++
>  .../gcc.target/aarch64/abd_widen_2.c  | 62 ++
>  .../gcc.target/aarch64/abd_widen_3.c  | 62 ++
>  .../gcc.target/aarch64/abd_widen_4.c  | 56 +
>  gcc/testsuite/gcc.target/aarch64/sve/abd_1.c  | 57 +++--
>  gcc/testsuite/gcc.target/aarch64/sve/abd_2.c  | 47 +--
>  .../gcc.target/aarch64/sve/abd_none_1.c   | 73 
>  .../gcc.target/aarch64/sve/abd_none_2.c   | 80 ++
>  16 files changed, 811 insertions(+), 56 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_widen_2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_widen_3.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_widen_4.c
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> bf90202ba2ad3f62f2020486d21256f083effb07..9acf0ab3067a76c0ba49d61e2857558c8482e77d
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -975,6 +975,71 @@ (define_expand "aarch64_abdl2"
>}
>  )
>  
> +(define_insn "aarch64_abdl_hi_internal"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (abs:
> +   (minus:
> + (ANY_EXTEND:
> +   (vec_select:
> + (match_operand:VQW 1 "register_operand" "w")
> + (match_operand:VQW 3 "vect_par_cnst_hi_half" "")))
> + (ANY_EXTEND:
> +   (vec_select:
> + (match_operand:VQW 2 "register_operand" "w")
> + (match_dup 3))]
> +  "TARGET_SIMD"
> +  "abdl2\t%0., %1., %2."
> +  [(set_attr "type" "neon_abd_long")]
> +)
> +
> +(define_insn "aarch64_abdl_lo_internal"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (minus:
> +   (USMAX:
> + (:
> +   (vec_select:
> + (match_operand:VQW 1 "register_operand" "w")
> + (match_operand:VQW 3 "vect_par_cnst_lo_half" "")))
> + (:
> +   (vec_select:
> + (match_operand:VQW 2 "register_operand" "w")
> + (match_dup 3
> +   (:
> + (:
> +   (vec_select: (match_dup 1) (match_dup 3)))
> + (:
> +   (vec_select: (match_dup 2) (match_dup 3))]

Sorry, my fault, but I meant the comment about avoiding
(minus (max…) (min…)) for both patterns, not just the first.
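
As a scalar illustration of the two forms (a sketch, not the RTL itself, and
the function names are made up for this example): on extended inputs, taking
the absolute value of the difference and subtracting the minimum from the
maximum give the same widened result for every pair of 8-bit lanes.

```c
#include <stdint.h>
#include <stdlib.h>

/* Widening signed absolute difference of one lane, written as
   abs (ext (a) - ext (b)).  The result is at most 255, so it fits in
   the 16-bit widened element.  */
uint16_t
sabd_widen_abs (int8_t a, int8_t b)
{
  int wide_a = a;
  int wide_b = b;
  return (uint16_t) abs (wide_a - wide_b);
}

/* The same value written as max (ext (a), ext (b)) - min (ext (a), ext (b)).  */
uint16_t
sabd_widen_maxmin (int8_t a, int8_t b)
{
  int wide_a = a;
  int wide_b = b;
  int hi = wide_a > wide_b ? wide_a : wide_b;
  int lo = wide_a > wide_b ? wide_b : wide_a;
  return (uint16_t) (hi - lo);
}
```

Since the values agree, the choice between the two is only about which RTL
shape the patterns should use.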

I think the review suggestions for 1/2 will change the tests.
For example:

TEST2(signed, short, char)

shouldn't use IFN_WIDEN_ABD, since:

.L2:
ldr q30, [x5, x3]
ldr q28, [x4, x3]
ldr q31, [x0, x3]
ldr q29, [x1, x3]
add x3, x3, 32
sabd    v30.8h, v30.8h, v28.8h
sabd    v31.8h, v31.8h, v29.8h
uzp1    v31.16b, v31.16b, v30.16b
str q31, [x2], 16
cmp x3, 2048
bne .L2
 
is better than:

.L2:
ldr q28, [x1, x3]
ldr q29, [x0, x3]
ldr q30, [x5, x3]
ldr q27, [x4, x3]
add x3, x3, 32
sabdl   v31.4s, v29.4h, v28.4h
sabdl2  v29.4s, v29.8h, v28.8h
sabdl   v28.4s, v30.4h, v27.4h
sabdl2  v30.4s, v30.8h, v27.8h
uzp1    v31.8h,

Re: [PATCH 1/2] Mid engine setup [SU]ABDL

2023-06-29 Thread Richard Sandiford via Gcc-patches

Oluwatamilore Adebayo  writes:
> From: oluade01 
>
> This updates vect_recog_abd_pattern to recognize the widening
> variant of absolute difference (ABDL, ABDL2).
>
> gcc/ChangeLog:
>
>   * internal-fn.cc (widening_fn_p, decomposes_to_hilo_fn_p):
>   Add IFN_VEC_WIDEN_ABD to the switch statement.
>   * internal-fn.def (VEC_WIDEN_ABD): New internal hilo optab.
>   * optabs.def (vec_widen_sabd_optab,
>   vec_widen_sabd_hi_optab, vec_widen_sabd_lo_optab,
>   vec_widen_sabd_odd_even, vec_widen_sabd_even_optab,
>   vec_widen_uabd_optab,
>   vec_widen_uabd_hi_optab, vec_widen_uabd_lo_optab,
>   vec_widen_uabd_odd_even, vec_widen_uabd_even_optab):
>   New optabs.
>   * tree-vect-patterns.cc (vect_recog_abd_pattern): Update to
>   to build a VEC_WIDEN_ABD call if the input precision is smaller
>   than the precision of the output.
>   (vect_recog_widen_abd_pattern): Should an ABD expression be
>   found preceeding an extension, replace the two with a
>   VEC_WIDEN_ABD.
> ---
>  gcc/doc/md.texi   |  11 ++
>  gcc/internal-fn.def   |   5 +
>  gcc/optabs.def|  10 ++
>  gcc/tree-vect-patterns.cc | 205 +-
>  4 files changed, 183 insertions(+), 48 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 
> e11b10d2fca11016232921bc85e47975f700e6c6..2ae6182b925d0cf8950dc830d083cf93baf2eaa1
>  100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5617,6 +5617,17 @@ signed/unsigned elements of size S@.  Subtract the 
> high/low elements of 2 from
>  1 and widen the resulting elements. Put the N/2 results of size 2*S in the
>  output vector (operand 0).
>  
> +@cindex @code{vec_widen_sabdl_hi_@var{m}} instruction pattern
> +@cindex @code{vec_widen_sabdl_lo_@var{m}} instruction pattern
> +@cindex @code{vec_widen_uabdl_hi_@var{m}} instruction pattern
> +@cindex @code{vec_widen_uabdl_lo_@var{m}} instruction pattern
> +@item @samp{vec_widen_uabdl_hi_@var{m}}, @samp{vec_widen_uabdl_lo_@var{m}}
> +@itemx @samp{vec_widen_sabdl_hi_@var{m}}, @samp{vec_widen_sabdl_lo_@var{m}}

The optabs don't have the trailing “l” (long).  (Which is a good thing!)

The list should include the even/odd patterns as well.

> +Signed/Unsigned widening absolute difference long.  Operands 1 and 2 are

Similarly no “long” here.

> +vectors with N signed/unsigned elements of size S@.  Find the absolute
> +difference between 1 and 2 and widen the resulting elements.  Put the N/2

Maybe “operands 1 and 2”, or just “them”.

> +results of size 2*S in the output vector (operand 0).
> +
>  @cindex @code{vec_addsub@var{m}3} instruction pattern
>  @item @samp{vec_addsub@var{m}3}
>  Alternating subtract, add with even lanes doing subtract and odd
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 
> 116965f4830cec8f60642ff011a86b6562e2c509..d67274d68b49943a88c531e903fd03b42343ab97
>  100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -352,6 +352,11 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_MINUS,
>   first,
>   vec_widen_ssub, vec_widen_usub,
>   binary)
> +DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
> + ECF_CONST | ECF_NOTHROW,
> + first,
> + vec_widen_sabd, vec_widen_uabd,
> + binary)
>  DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
>  DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
>  
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 
> 35b835a6ac56d72417dac8ddfd77a8a7e2475e65..68dfa1550f791a2fe833012157601ecfa68f1e09
>  100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -418,6 +418,11 @@ OPTAB_D (vec_widen_sadd_hi_optab, "vec_widen_sadd_hi_$a")
>  OPTAB_D (vec_widen_sadd_lo_optab, "vec_widen_sadd_lo_$a")
>  OPTAB_D (vec_widen_sadd_odd_optab, "vec_widen_sadd_odd_$a")
>  OPTAB_D (vec_widen_sadd_even_optab, "vec_widen_sadd_even_$a")
> +OPTAB_D (vec_widen_sabd_optab, "vec_widen_sabd_$a")
> +OPTAB_D (vec_widen_sabd_hi_optab, "vec_widen_sabd_hi_$a")
> +OPTAB_D (vec_widen_sabd_lo_optab, "vec_widen_sabd_lo_$a")
> +OPTAB_D (vec_widen_sabd_odd_optab, "vec_widen_sabd_odd_$a")
> +OPTAB_D (vec_widen_sabd_even_optab, "vec_widen_sabd_even_$a")
>  OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
>  OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
>  OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
> @@ -436,6 +441,11 @@ OPTAB_D (vec_widen_uadd_hi_optab, "vec_widen_uadd_hi_$a")
>  OPTAB_D (vec_widen_uadd_lo_optab, "vec_widen_uadd_lo_$a")
>  OPTAB_D (vec_widen_uadd_odd_optab, "vec_widen_uadd_odd_$a")
>  OPTAB_D (vec_widen_uadd_even_optab, "vec_widen_uadd_even_$a")
> +OPTAB_D (vec_widen_uabd_optab, "vec_widen_uabd_$a")
> +OPTAB_D (vec_widen_uabd_hi_optab, "vec_widen_uabd_hi_$a")
> +OPTAB_D

Re: Re: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-29 Thread juzhe.zh...@rivai.ai

Yes. Thanks for taking care of it!

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-06-29 18:07
To: juzhe.zh...@rivai.ai; pan2.li; Richard Biener
CC: rdapp.gcc; gcc-patches; jeffreyalaw; kito.cheng
Subject: Re: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs
Ah, the one sub-thread continued before you were CC'ed.
Sorry about that.

Regards
Robin

Re: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-29 Thread Robin Dapp via Gcc-patches

Ah, the one sub-thread continued before you were CC'ed.
Sorry about that.

Regards
 Robin

Re: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-29 Thread Robin Dapp via Gcc-patches

> Currently, I have no idea how to work around this ICE in the RISC-V port.
> Do you have any suggestions?

I'm already bootstrapping this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623184.html

I replied to all but it seems you got lost in the thread?

Regards
 Robin

Re: [PATCH v2] mips: Fix overaligned function arguments [PR109435]

2023-06-29 Thread YunQiang Su via Gcc-patches

YunQiang Su  wrote on Thu, Jun 29, 2023 at 14:04:
>
> Jovan Dmitrovic  wrote on Tue, Jun 27, 2023 at 16:54:
> >
> > Hi,
> > I am sending a revised patch, now with different tests for N64/N32 and O32 
> > ABIs.
> > For the O32 ABI, I've skipped the -O0 and -Os pipelines, considering there 
> > is a
> > difference between exact offsets for store instructions (the registers used 
> > remain
> > the same).
> >
> > Skipping -flto isn't really necessary, so I've removed that part.
> >
> > I've fixed the Changelog, hopefully I've corrected the mistakes I made.
> >
>
> Looks good.
> I will submit this patch with some format improvement.
>

Ohh, my fault: the `-flto` option should always be skipped when running the tests.
Also, you skipped the -O0 test on O32, but this bug affects -O0 only, so it
should not be skipped.

The below is my modification to your patch.
Is it OK for you?

--- xx.patch 2023-06-29 14:32:59.805474033 +0800
+++ build/0001-mips-Fix-overaligned-function-arguments-PR109435.patch
2023-06-29 18:01:19.245478275 +0800
@@ -1,4 +1,4 @@
-From 05e4ff4d2fbb91ea8040fb10d8d6a130ad24bba7 Mon Sep 17 00:00:00 2001
+From 7b5af22bb7c8fadce27e94c37c96101a06acd286 Mon Sep 17 00:00:00 2001
 From: Jovan Dmitrovic 
 Date: Mon, 26 Jun 2023 17:00:20 +0200
 Subject: [PATCH] mips: Fix overaligned function arguments [PR109435]
@@ -16,12 +16,13 @@
 2023-06-27  Jovan Dmitrović  

 gcc/ChangeLog:
-PR target/109435
+
+ PR target/109435
  * config/mips/mips.cc (mips_function_arg_alignment): Returns
-the alignment of function argument. In case of typedef type,
-it returns the aligment of the aliased type.
+ the alignment of function argument. In case of typedef type,
+ it returns the aligment of the aliased type.
  (mips_function_arg_boundary): Relocated calculation of the
-aligment of function arguments.
+ aligment of function arguments.

 gcc/testsuite/ChangeLog:

@@ -29,9 +30,9 @@
  * gcc.target/mips/align-1-o32.c: New test.
 ---
  gcc/config/mips/mips.cc | 19 ++-
- gcc/testsuite/gcc.target/mips/align-1-n64.c | 19 +++
+ gcc/testsuite/gcc.target/mips/align-1-n64.c | 20 
  gcc/testsuite/gcc.target/mips/align-1-o32.c | 20 
- 3 files changed, 57 insertions(+), 1 deletion(-)
+ 3 files changed, 58 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/mips/align-1-n64.c
  create mode 100644 gcc/testsuite/gcc.target/mips/align-1-o32.c

@@ -75,14 +76,15 @@
if (alignment > STACK_BOUNDARY)
 diff --git a/gcc/testsuite/gcc.target/mips/align-1-n64.c
b/gcc/testsuite/gcc.target/mips/align-1-n64.c
 new file mode 100644
-index 000..46e718d548d
+index 000..3ede539c3a4
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/mips/align-1-n64.c
-@@ -0,0 +1,19 @@
+@@ -0,0 +1,20 @@
 +/* Check that typedef alignment does not affect passing of function
 +   parameters for N64/N32 ABIs.  */
 +/* { dg-do compile { target { "mips*-*-*" } } } */
 +/* { dg-options "-mabi=64"  } */
++/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
 +
 +typedef struct ui8
 +{
@@ -100,7 +102,7 @@
 +/* { dg-final { scan-assembler "\tsd\t\\\$7,16\\(\\\$\[0-9\]\\)" } } */
 diff --git a/gcc/testsuite/gcc.target/mips/align-1-o32.c
b/gcc/testsuite/gcc.target/mips/align-1-o32.c
 new file mode 100644
-index 000..a548632b7f6
+index 000..e043d6a3eca
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/mips/align-1-o32.c
 @@ -0,0 +1,20 @@
@@ -108,7 +110,7 @@
 +   parameters for O32 ABI.  */
 +/* { dg-do compile { target { "mips*-*-*" } } } */
 +/* { dg-options "-mabi=32"  } */
-+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" } { "" } } */
++/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
 +
 +typedef struct ui8
 +{
@@ -121,10 +123,9 @@
 +  return a.v[0];
 +}
 +
-+/* { dg-final { scan-assembler "\tsw\t\\\$5,100\\(\\\$sp\\)" } } */
-+/* { dg-final { scan-assembler "\tsw\t\\\$6,104\\(\\\$sp\\)" } } */
-+/* { dg-final { scan-assembler "\tsw\t\\\$7,108\\(\\\$sp\\)" } } */
++/* { dg-final { scan-assembler "\tsw\t\\\$5,1\\d\\d\\(\\\$(sp|fp)\\)" } } */
++/* { dg-final { scan-assembler "\tsw\t\\\$6,1\\d\\d\\(\\\$(sp|fp)\\)" } } */
++/* { dg-final { scan-assembler "\tsw\t\\\$7,1\\d\\d\\(\\\$(sp|fp)\\)" } } */
 --
-2.34.1
-
+2.30.2


> > Regards,
> > Jovan
>
>
>
> --
> YunQiang Su



-- 
YunQiang Su

Re: RE: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-29 Thread juzhe.zh...@rivai.ai

Hi, Richi.

I tried to debug this ICE:

(gdb) call print_gimple_stmt(stdout,stmt,0,0)
vect_patt_2876.406_2862 = vect_patt_2877.405_2864 % { 6, ... };
(gdb) p type->type_common.mode
$2 = E_VNx4HImode

Currently, I have no idea how to work around this ICE in the RISC-V port.
Do you have any suggestions?

Thanks.


juzhe.zh...@rivai.ai
 
From: Li, Pan2
Date: 2023-06-29 12:18
To: Jakub Jelinek; Richard Biener
CC: gcc-patches@gcc.gnu.org; jeffreya...@gmail.com; kito.ch...@gmail.com; 
juzhe.zh...@rivai.ai; rdapp@gmail.com
Subject: RE: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs
Sorry for disturbing, cc kito, juzhe and robin for awareness.
 
Pan
 
-Original Message-
From: Li, Pan2 
Sent: Thursday, June 29, 2023 12:05 PM
To: Jakub Jelinek ; Richard Biener 
Cc: gcc-patches@gcc.gnu.org; jeffreya...@gmail.com
Subject: RE: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs
 
It seems this patch may cause many tests to ICE on the RISC-V backend.
Could you help double-check by following the reproduction steps
below? Thank you!
 
cd gcc && mkdir __BUILD__ && cd __BUILD__
../configure \
  --target=riscv64-unknown-elf \
  --prefix= \
  --disable-shared \
  --enable-threads \
  --enable-tls \
  --enable-languages=c,c++ \
  --with-system-zlib \
  --with-newlib \
  --disable-libmudflap \
  --disable-libssp \
  --disable-libquadmath \
  --disable-libgomp \
  --enable-nls \
  --disable-tm-clone-registry \
  --enable-multilib \
  --src=`pwd`/../ \
  --with-abi=lp64d \
  --with-arch=rv64imafdcv \
  --with-tune=rocket \
  --with-isa-spec=20191213 \
  --enable-werror \
  --enable-bootstrap \
  CFLAGS_FOR_BUILD="-O0 -g" \
  CXXFLAGS_FOR_BUILD="-O0 -g" \
  CFLAGS_FOR_TARGET="-O0 -g" \
  CXXFLAGS_FOR_TARGET="-O0 -g" \
  BOOT_CFLAGS="-O0 -g" \
  CFLAGS="-O0 -g" \
  CXXFLAGS="-O0 -g" \
  GM2FLAGS_FOR_TARGET="-O0 -g" \
  GOCFLAGS_FOR_TARGET="-O0 -g" \
  GDCFLAGS_FOR_TARGET="-O0 -g"
make -j $(nproc) all-gcc && make install-gcc
 
Then build one test file as shown below, and you should see an ICE similar to
the following.
 
../__RISC-V_INSTALL_/bin/riscv64-unknown-elf-gcc -O2 --param 
riscv-autovec-preference=fixed-vlmax 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c
during GIMPLE pass: widening_mul
In file included from 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c:4:
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c: In 
function 'f3_init':
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c:249:1: 
internal compiler error: tree check: expected none of vector_type, have 
vector_type in divmod_candidate_p, at tree-ssa-math-opts.cc:4998
  249 | f3_init (int8_t *__restrict x, int8_t *__restrict x2, int64_t 
*__restrict y,
  | ^~~
0x1b1584e tree_not_check_failed(tree_node const*, char const*, int, char 
const*, ...)

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree.cc:8936
0xd74e9e tree_not_check(tree_node*, char const*, int, char const*, tree_code)

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree.h:3581
0x196150c divmod_candidate_p

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-ssa-math-opts.cc:4998
0x196164f convert_to_divmod

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-ssa-math-opts.cc:5041
0x196383d after_dom_children

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-ssa-math-opts.cc:5580
0x299bcb4 dom_walker::walk(basic_block_def*)

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/domwalk.cc:354
0x1963d09 execute

/home/pli/repos/gcc/444/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-ssa-math-opts.cc:5666
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
 
Pan
 
-Original Message-
From: Gcc-patches  On Behalf 
Of Jakub Jelinek via Gcc-patches
Sent: Tuesday, June 27, 2023 5:47 PM
To: Richard Biener 
Cc: gcc-patches@gcc.gnu.org; jeffreya...@gmail.com
Subject: Re: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs
 
On Tue, Jun 27, 2023 at 11:45:33AM +0200, Richard Biener wrote:
> The following makes sure that using TYPE_PRECISION on VECTOR_TYPE
> ICEs when tree checking is enabled.  This should avoid wrong-code
> in cases like PR110182 and instead ICE.
> 
> It also introduces a TYPE_PRECISION_RAW accessor and adjusts
> places I found that are eligible to use that.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu with all
> languages enabled.
> 
> OK for trunk?  There is definitely going to be fallout but it
> should be straight-forward to fix with quick fixes using
> TYPE_PRECISION_RAW possible.
> 
> Thanks,
> Richard.
> 
> * tree.h (TYPE_PRECISION): Check for non-VECTOR_TYPE.
> (TYPE_PRECISION_RAW): Provide raw access to the

Re: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-29 Thread Richard Biener via Gcc-patches

On Thu, 29 Jun 2023, Robin Dapp wrote:

> > I think GCN people wanted to make this code work for vectors, the
> > most obvious local fix is to use element_precision (type) above.
> > 
> > Note usually vector integer divisions are not a thing so this might
> > explain why you're seeing this only with RVV?
> 
> Since nobody else has provided a patch yet, is the attached OK as long
> as x86 bootstrap and testsuite are clean?

Yes.

Thanks,
Richard.

> Regards
>  Robin
> 
> From 5ac3bb96cae0af99cefeaa225806de67e268e8f5 Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Thu, 29 Jun 2023 11:35:02 +0200
> Subject: [PATCH] ssa-math-opts: Use element_precision.
> 
> The recent TYPE_PRECISION changes to detect improper usage
> cause an ICE in divmod_candidate_p for RVV when called with
> a vector type.  Therefore, use element_precision instead.
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-math-opts.cc (divmod_candidate_p): Use
>   element_precision.
> ---
>  gcc/tree-ssa-math-opts.cc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index da01d4ab2b6..701fce2ab61 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -4995,8 +4995,8 @@ divmod_candidate_p (gassign *stmt)
>if (integer_pow2p (op2))
>   return false;
>  
> -  if (TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
> -   && TYPE_PRECISION (type) <= BITS_PER_WORD)
> +  if (element_precision (type) <= HOST_BITS_PER_WIDE_INT
> +   && element_precision (type) <= BITS_PER_WORD)
>   return false;
>  
>/* If the divisor is not power of 2 and the precision wider than
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

[PATCH] tree-optimization/110460 - fend off vector types from vectorizer

2023-06-29 Thread Richard Biener via Gcc-patches

The following makes fending off existing vector types from vectorization
also apply to word_mode vector types.  I've chosen to add a positive
list of allowed scalar types here for clarity.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110460
* tree-vect-stmts.cc (get_related_vectype_for_scalar_type):
Only allow integral, pointer and scalar float type scalar_type.
---
 gcc/tree-vect-stmts.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index d642d3c257f..68faa8ead39 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12174,8 +12174,11 @@ get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
   machine_mode simd_mode;
   tree vectype;
 
-  if (!is_int_mode (TYPE_MODE (scalar_type), &inner_mode)
-  && !is_float_mode (TYPE_MODE (scalar_type), &inner_mode))
+  if ((!INTEGRAL_TYPE_P (scalar_type)
+   && !POINTER_TYPE_P (scalar_type)
+   && !SCALAR_FLOAT_TYPE_P (scalar_type))
+  || (!is_int_mode (TYPE_MODE (scalar_type), &inner_mode)
+ && !is_float_mode (TYPE_MODE (scalar_type), &inner_mode)))
 return NULL_TREE;
 
   unsigned int nbytes = GET_MODE_SIZE (inner_mode);
-- 
2.35.3

Re: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-29 Thread Robin Dapp via Gcc-patches

> I think GCN people wanted to make this code work for vectors, the
> most obvious local fix is to use element_precision (type) above.
> 
> Note usually vector integer divisions are not a thing so this might
> explain why you're seeing this only with RVV?

Since nobody else has provided a patch yet, is the attached OK as long
as x86 bootstrap and testsuite are clean?

Regards
 Robin

From 5ac3bb96cae0af99cefeaa225806de67e268e8f5 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Thu, 29 Jun 2023 11:35:02 +0200
Subject: [PATCH] ssa-math-opts: Use element_precision.

The recent TYPE_PRECISION changes to detect improper usage
cause an ICE in divmod_candidate_p for RVV when called with
a vector type.  Therefore, use element_precision instead.

gcc/ChangeLog:

* tree-ssa-math-opts.cc (divmod_candidate_p): Use
element_precision.
---
 gcc/tree-ssa-math-opts.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index da01d4ab2b6..701fce2ab61 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -4995,8 +4995,8 @@ divmod_candidate_p (gassign *stmt)
   if (integer_pow2p (op2))
return false;
 
-  if (TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
- && TYPE_PRECISION (type) <= BITS_PER_WORD)
+  if (element_precision (type) <= HOST_BITS_PER_WIDE_INT
+ && element_precision (type) <= BITS_PER_WORD)
return false;
 
   /* If the divisor is not power of 2 and the precision wider than
-- 
2.41.0
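
To illustrate the idea behind the change, here is a toy model in C.  It is a
sketch only: the struct and the toy_* functions are made up for this example
and are not GCC's tree API.  The scalar accessor refuses vector types, while
the element accessor first looks through a vector to its element type and
then reads the precision.

```c
#include <stddef.h>

enum toy_kind { TOY_INTEGER, TOY_VECTOR };

/* Toy stand-in for a type node (hypothetical, not GCC's tree).  */
struct toy_type
{
  enum toy_kind kind;
  unsigned precision;            /* Meaningful for TOY_INTEGER only.  */
  const struct toy_type *elem;   /* Element type for TOY_VECTOR.  */
};

/* Checked scalar accessor: returns 0 to signal misuse on a vector type,
   which stands in for the tree-check ICE in a checking build.  */
unsigned
toy_type_precision (const struct toy_type *t)
{
  return t->kind == TOY_VECTOR ? 0 : t->precision;
}

/* Element accessor: strips one vector level, then reads the precision,
   so it is usable for both scalar and vector types.  */
unsigned
toy_element_precision (const struct toy_type *t)
{
  if (t->kind == TOY_VECTOR)
    t = t->elem;
  return t->precision;
}
```

For a scalar type both accessors agree, so switching the comparison to the
element accessor leaves scalar behaviour alone and only changes what happens
when a vector type reaches the check.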

Re: Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode

2023-06-29 Thread juzhe.zh...@rivai.ai

Ok. Thanks for taking care of it!

>> That looks like a different issue, though?
Yes, it's a different issue and I am trying to fix it in the RISC-V backend.

Thanks a lot.


juzhe.zh...@rivai.ai
 
From: Thomas Schwinge
Date: 2023-06-29 17:47
To: juzhe.zh...@rivai.ai
CC: pan2...@intel.com; gcc-patches@gcc.gnu.org; Robin Dapp; 
jeffreya...@gmail.com; yanzhang.w...@intel.com; kito.ch...@gmail.com; 
rguent...@suse.de; ja...@redhat.com; Tobias Burnus
Subject: Re: Re: [PATCH v3] Streamer: Fix out of range memory access of machine 
mode
Hi!
 
On 2023-06-29T17:33:14+0800, "juzhe.zh...@rivai.ai"  
wrote:
> Not sure what you mean by ICEs all over the place...
 
Ah, sorry for not providing proper context here.  My comment was about
heterogeneous GCC configurations involving code offloading, like: x86_64
host with GCN, nvptx offload targets, which I'd been asked to test this
"Streamer: Fix out of range memory access of machine mode" for (in
private email) -- for good reasons, as we've now found.  ;-|
 
 
> Actually, even without this patch, current upstream GCC in the RISC-V port already
> ICEs all over the place:
>
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c (internal 
> compiler error: tree check: expected none of vector_type, have vector_type in 
> divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-1.c (internal compiler 
> error: tree check: expected none of vector_type, have vector_type in 
> divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-1.c (test for excess 
> errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-17.c (internal compiler 
> error: tree check: expected none of vector_type, have vector_type in 
> divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-17.c (test for excess 
> errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c (internal 
> compiler error: tree check: expected none of vector_type, have vector_type in 
> divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-2.c (internal compiler 
> error: tree check: expected none of vector_type, have vector_type in 
> divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-2.c (test for excess 
> errors)
> FAIL: gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c (internal compiler 
> error: tree check: expected none of vector_type, have vector_type in 
> divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
> FAIL: gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c (test for excess 
> errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-16.c (internal compiler 
> error: tree check: expected none of vector_type, have vector_type in 
> divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-16.c (test for excess 
> errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-3.c (internal compiler 
> error: tree check: expected none of vector_type, have vector_type in 
> divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-3.c (test for excess 
> errors)
 
That looks like a different issue, though?
 
 
Grüße
Thomas
 
 
> From: Thomas Schwinge
> Date: 2023-06-29 17:29
> To: Pan Li
> CC: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; rdapp@gmail.com; 
> jeffreya...@gmail.com; yanzhang.w...@intel.com; kito.ch...@gmail.com; 
> rguent...@suse.de; ja...@redhat.com; Tobias Burnus
> Subject: Re: [PATCH v3] Streamer: Fix out of range memory access of machine 
> mode
> Hi!
>
> On 2023-06-21T15:58:24+0800, Pan Li via Gcc-patches  
> wrote:
>> We extended the machine mode from 8 to 16 bits already, but there is still
>> one place missing in the streamer: it has a hard-coded array for the
>> machine modes, with size 256.
>>
>> In the lto pass, we memset the array by MAX_MACHINE_MODE count, but the
>> value of MAX_MACHINE_MODE will grow as more and more modes are
>> added, while the machine mode array in tree-streamer is still left at 256.
>>
>> Then, when MAX_MACHINE_MODE is greater than 256, the memset in
>> lto_output_init_mode_table will unexpectedly touch memory out of range.
>
> Uh.  :-O
>
>> This patch would like to take the MAX_MACHINE_MODE as the size of the
>> array in streamer, to make sure there is no potential unexpected
>> memory access in future. Meanwhile, this patch also adjust some place
>> which has MAX_MACHINE_MODE <= 256 assumption.
>
> Thanks to Jakub and Richard for guidance re the offloading compilation
> case, where we've got different 'MAX_MACHINE_MODE's between stream-out
> and stream-in, and a modes mapping table.
>
> However, with this

Re: Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode

2023-06-29 Thread Thomas Schwinge

Hi!

On 2023-06-29T17:33:14+0800, "juzhe.zh...@rivai.ai"  
wrote:
> I am not sure what you mean by ICEs all over the place...

Ah, sorry for not providing proper context here.  My comment was about
heterogeneous GCC configurations involving code offloading, like: x86_64
host with GCN, nvptx offload targets, which I'd been asked to test this
"Streamer: Fix out of range memory access of machine mode" for (in
private email) -- for good reasons, as we've now found.  ;-|


> Actually, even without this patch, current upstream GCC already ICEs all
> over the place in the RISC-V port:
>
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c (internal 
> compiler error: tree check: expected none of vector_type, have vector_type in 
> divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-1.c (internal compiler 
> error: tree check: expected none of vector_type, have vector_type in 
> divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-1.c (test for excess 
> errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-17.c (internal compiler 
> error: tree check: expected none of vector_type, have vector_type in 
> divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-17.c (test for excess 
> errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c (internal 
> compiler error: tree check: expected none of vector_type, have vector_type in 
> divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-2.c (internal compiler 
> error: tree check: expected none of vector_type, have vector_type in 
> divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-2.c (test for excess 
> errors)
> FAIL: gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c (internal compiler 
> error: tree check: expected none of vector_type, have vector_type in 
> divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
> FAIL: gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c (test for excess 
> errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-16.c (internal compiler 
> error: tree check: expected none of vector_type, have vector_type in 
> divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-16.c (test for excess 
> errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-3.c (internal compiler 
> error: tree check: expected none of vector_type, have vector_type in 
> divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
> FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-3.c (test for excess 
> errors)

That looks like a different issue, though?


Grüße
 Thomas


> From: Thomas Schwinge
> Date: 2023-06-29 17:29
> To: Pan Li
> CC: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; rdapp@gmail.com; 
> jeffreya...@gmail.com; yanzhang.w...@intel.com; kito.ch...@gmail.com; 
> rguent...@suse.de; ja...@redhat.com; Tobias Burnus
> Subject: Re: [PATCH v3] Streamer: Fix out of range memory access of machine 
> mode
> Hi!
>
> On 2023-06-21T15:58:24+0800, Pan Li via Gcc-patches  
> wrote:
>> We extend the machine mode from 8 to 16 bits already. But there still
>> one placing missing from the streamer. It has one hard coded array
>> for the machine code like size 256.
>>
>> In the lto pass, we memset the array by MAX_MACHINE_MODE count but the
>> value of the MAX_MACHINE_MODE will grow as more and more modes are
>> added. While the machine mode array in tree-streamer still leave 256 as is.
>>
>> Then, when the MAX_MACHINE_MODE is greater than 256, the memset of
>> lto_output_init_mode_table will touch the memory out of range unexpected.
>
> Uh.  :-O
>
>> This patch would like to take the MAX_MACHINE_MODE as the size of the
>> array in streamer, to make sure there is no potential unexpected
>> memory access in future. Meanwhile, this patch also adjust some place
>> which has MAX_MACHINE_MODE <= 256 assumption.
>
> Thanks to Jakub and Richard for guidance re the offloading compilation
> case, where we've got different 'MAX_MACHINE_MODE's between stream-out
> and stream-in, and a modes mapping table.
>
> However, with this patch, there are ICEs all over the place...  I'm
> having a look.
>
>
> Grüße
> Thomas
>
>
>> gcc/ChangeLog:
>>
>>   * lto-streamer-in.cc (lto_input_mode_table): Stream in the mode
>>   bits for machine mode table.
>>   * lto-streamer-out.cc (lto_write_mode_table): Stream out the
>>   HOST machine mode bits.
>>   * lto-streamer.h (struct lto_file_decl_data): New fields mode_bits.
>>   * tree-streamer.cc (streamer_mode_table): Take MAX_MACHINE_MODE
>>   as the table size.
>>   * tree-streamer.h

RE: [PATCH] PR gcc/110148:Avoid adding loop-carried ops to long chains

2023-06-29 Thread Cui, Lili via Gcc-patches



> -Original Message-
> From: Richard Biener 
> Sent: Thursday, June 29, 2023 2:42 PM
> To: Cui, Lili 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] PR gcc/110148:Avoid adding loop-carried ops to long
> chains
> 
> On Thu, Jun 29, 2023 at 3:49 AM Cui, Lili  wrote:
> >
> > From: Lili Cui 
> >
> > Hi Maintainer
> >
> > This patch is to fix TSVC242 regression related to loop-carried ops.
> >
> > Bootstrapped and regtested. Ok for trunk?
> 
> OK.
> 
Committed, thanks Richard.

Regards,
Lili.

> Thanks,
> Richard.
> 
> > Regards
> > Lili.
> >
> > Avoid adding loop-carried ops to long chains, otherwise the whole
> > chain will have dependencies across the loop iteration. Just keep
> > loop-carried ops in a separate chain.
> >E.g.
> >x_1 = phi(x_0, x_2)
> >y_1 = phi(y_0, y_2)
> >
> >a + b + c + d + e + x1 + y1
> >
> >SSA1 = a + b;
> >SSA2 = c + d;
> >SSA3 = SSA1 + e;
> >SSA4 = SSA3 + SSA2;
> >SSA5 = x1 + y1;
> >SSA6 = SSA4 + SSA5;
> >
> > With the patch applied, these test cases improved by 32%~100%.
> >
> > S242:
> > for (int i = 1; i < LEN_1D; ++i) {
> > a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i];}
> >
> > Case 1:
> > for (int i = 1; i < LEN_1D; ++i) {
> > a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i] + e[i];}
> >
> > Case 2:
> > for (int i = 1; i < LEN_1D; ++i) {
> > a[i] = a[i - 1] + b[i - 1] + s1 + s2 + b[i] + c[i] + d[i] + e[i];}
> >
> > The value is the execution time
> > A: original version
> > B: with FMA patch g:e5405f065bace0685cb3b8878d1dfc7a6e7ef409 (based on A)
> > C: with current patch (based on B)
> >
> >           A       B       C       B/A          C/A
> > s242      2.859   5.152   2.859   1.802028681  1
> > case 1    5.489   5.488   3.511   0.999818     0.64
> > case 2    7.216   7.499   4.885   1.039218     0.68
> >
> > gcc/ChangeLog:
> > PR tree-optimization/110148
> > * tree-ssa-reassoc.cc (rewrite_expr_tree_parallel): Handle 
> > loop-carried
> > ops in this function.
> > ---
> >  gcc/tree-ssa-reassoc.cc | 236
> > 
> >  1 file changed, 167 insertions(+), 69 deletions(-)
> >
> > diff --git a/gcc/tree-ssa-reassoc.cc b/gcc/tree-ssa-reassoc.cc index
> > 96c88ec003e..f5da385e0b2 100644
> > --- a/gcc/tree-ssa-reassoc.cc
> > +++ b/gcc/tree-ssa-reassoc.cc
> > @@ -5471,37 +5471,62 @@ get_reassociation_width (int ops_num, enum
> tree_code opc,
> >return width;
> >  }
> >
> > +#define SPECIAL_BIASED_END_STMT 0 /* It is the end stmt of all ops.  */
> > +#define BIASED_END_STMT 1 /* It is the end stmt of normal or biased ops.  */
> > +#define NORMAL_END_STMT 2 /* It is the end stmt of normal ops.  */
> > +
> >  /* Rewrite statements with dependency chain with regard the chance to
> generate
> > FMA.
> > For the chain with FMA: Try to keep fma opportunity as much as possible.
> > For the chain without FMA: Putting the computation in rank order and
> trying
> > to allow operations to be executed in parallel.
> > E.g.
> > -   e + f + g + a * b + c * d;
> > +   e + f + a * b + c * d;
> >
> > -   ssa1 = e + f;
> > -   ssa2 = g + a * b;
> > -   ssa3 = ssa1 + c * d;
> > -   ssa4 = ssa2 + ssa3;
> > +   ssa1 = e + a * b;
> > +   ssa2 = f + c * d;
> > +   ssa3 = ssa1 + ssa2;
> >
> > This reassociation approach preserves the chance of fma generation as
> much
> > -   as possible.  */
> > +   as possible.
> > +
> > +   Another thing is to avoid adding loop-carried ops to long chains,
> otherwise
> > +   the whole chain will have dependencies across the loop iteration. Just
> keep
> > +   loop-carried ops in a separate chain.
> > +   E.g.
> > +   x_1 = phi(x_0, x_2)
> > +   y_1 = phi(y_0, y_2)
> > +
> > +   a + b + c + d + e + x1 + y1
> > +
> > +   SSA1 = a + b;
> > +   SSA2 = c + d;
> > +   SSA3 = SSA1 + e;
> > +   SSA4 = SSA3 + SSA2;
> > +   SSA5 = x1 + y1;
> > +   SSA6 = SSA4 + SSA5;
> > + */
> >  static void
> >  rewrite_expr_tree_parallel (gassign *stmt, int width, bool has_fma,
> > -const vec<operand_entry *> &ops)
> > +   const vec<operand_entry *> &ops)
> >  {
> >enum tree_code opcode = gimple_assign_rhs_code (stmt);
> >int op_num = ops.length ();
> > +  int op_normal_num = op_num;
> >gcc_assert (op_num > 0);
> >int stmt_num = op_num - 1;
> >gimple **stmts = XALLOCAVEC (gimple *, stmt_num);
> > -  int op_index = op_num - 1;
> > -  int width_count = width;
> >int i = 0, j = 0;
> >tree tmp_op[2], op1;
> >operand_entry *oe;
> >gimple *stmt1 = NULL;
> >tree last_rhs1 = gimple_assign_rhs1 (stmt);
> > +  int last_rhs1_stmt_index = 0, last_rhs2_stmt_index = 0;
> > +  int width_active = 0, width_count = 0;
> > +  bool has_biased = false, ops_changed = false;
> > +  auto_vec<operand_entry *> ops_normal;
> > +  auto_vec<operand_entry *> ops_biased;
> > +  vec<operand_entry *> *ops1;
> >
> >/* We start expression rewriting from the top statements.
> >   So, in this loop we create a full list of statements @@ -5510,83
> > +5535,155 @@
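
To make the chain splitting described in the patch above concrete, here is a
minimal standalone sketch of the S242 loop written both ways.  It is not code
from the patch: the function names and the vec_d alias are made up for
illustration, and since reassociating floating-point adds can change rounding,
a compiler is only expected to do this when that is permitted (e.g. with
-ffast-math).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using vec_d = std::vector<double>;  // hypothetical alias, for brevity only

// One long chain: every addition waits on the loop-carried a[i - 1].
inline void s242_one_chain (vec_d &a, const vec_d &b, const vec_d &c,
                            const vec_d &d, double s1, double s2)
{
  assert (a.size () == b.size () && b.size () == c.size ()
          && c.size () == d.size ());
  for (std::size_t i = 1; i < a.size (); ++i)
    a[i] = ((((a[i - 1] + s1) + s2) + b[i]) + c[i]) + d[i];
}

// Separate chains: operands that do not depend on the previous iteration
// are summed first, and the loop-carried operand joins in the last add.
inline void s242_separate_chains (vec_d &a, const vec_d &b, const vec_d &c,
                                  const vec_d &d, double s1, double s2)
{
  assert (a.size () == b.size () && b.size () == c.size ()
          && c.size () == d.size ());
  for (std::size_t i = 1; i < a.size (); ++i)
    {
      double t1 = s1 + s2;
      double t2 = b[i] + c[i];
      double t3 = t1 + d[i];
      double t4 = t3 + t2;   // no dependence on the previous iteration
      a[i] = t4 + a[i - 1];  // the only add on the loop-carried path
    }
}
```

In the second form only one addition per iteration sits on the cross-iteration
dependency, which is the effect the patch description aims for.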

Re: Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode

2023-06-29 Thread juzhe.zh...@rivai.ai

I am not sure what you mean by ICEs all over the place...

Actually, even without this patch, current upstream GCC already ICEs all
over the place in the RISC-V port:

FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c (internal 
compiler error: tree check: expected none of vector_type, have vector_type in 
divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c (test for excess 
errors)
FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-1.c (internal compiler 
error: tree check: expected none of vector_type, have vector_type in 
divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-1.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-17.c (internal compiler 
error: tree check: expected none of vector_type, have vector_type in 
divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-17.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c (internal 
compiler error: tree check: expected none of vector_type, have vector_type in 
divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c (test for 
excess errors)
FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-2.c (internal compiler 
error: tree check: expected none of vector_type, have vector_type in 
divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-2.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c (internal compiler 
error: tree check: expected none of vector_type, have vector_type in 
divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
FAIL: gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-16.c (internal compiler 
error: tree check: expected none of vector_type, have vector_type in 
divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-16.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-3.c (internal compiler 
error: tree check: expected none of vector_type, have vector_type in 
divmod_candidate_p, at tree-ssa-math-opts.cc:4998)
FAIL: gcc.target/riscv/rvv/autovec/partial/slp_run-3.c (test for excess errors)



juzhe.zh...@rivai.ai
 
From: Thomas Schwinge
Date: 2023-06-29 17:29
To: Pan Li
CC: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; rdapp@gmail.com; 
jeffreya...@gmail.com; yanzhang.w...@intel.com; kito.ch...@gmail.com; 
rguent...@suse.de; ja...@redhat.com; Tobias Burnus
Subject: Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode
Hi!
 
On 2023-06-21T15:58:24+0800, Pan Li via Gcc-patches  
wrote:
> We already extended the machine mode from 8 to 16 bits, but one place in
> the streamer was missed: it still has a hard-coded array of size 256 for
> the machine modes.
>
> In the lto pass, we memset the array by MAX_MACHINE_MODE count, but the
> value of MAX_MACHINE_MODE will grow as more and more modes are
> added, while the machine mode array in tree-streamer still has size 256.
>
> Then, when MAX_MACHINE_MODE is greater than 256, the memset in
> lto_output_init_mode_table will unexpectedly touch memory out of range.
 
Uh.  :-O
 
> This patch takes MAX_MACHINE_MODE as the size of the
> array in the streamer, to make sure there is no unexpected
> memory access in the future.  Meanwhile, this patch also adjusts some places
> that assume MAX_MACHINE_MODE <= 256.
 
Thanks to Jakub and Richard for guidance re the offloading compilation
case, where we've got different 'MAX_MACHINE_MODE's between stream-out
and stream-in, and a modes mapping table.
 
However, with this patch, there are ICEs all over the place...  I'm
having a look.
 
 
Grüße
Thomas
 
 
> gcc/ChangeLog:
>
>   * lto-streamer-in.cc (lto_input_mode_table): Stream in the mode
>   bits for machine mode table.
>   * lto-streamer-out.cc (lto_write_mode_table): Stream out the
>   HOST machine mode bits.
>   * lto-streamer.h (struct lto_file_decl_data): New fields mode_bits.
>   * tree-streamer.cc (streamer_mode_table): Take MAX_MACHINE_MODE
>   as the table size.
>   * tree-streamer.h (streamer_mode_table): Ditto.
>   (bp_pack_machine_mode): Take 1 << ceil_log2 (MAX_MACHINE_MODE)
>   as the packing limit.
>   (bp_unpack_machine_mode): Ditto.
> ---
>  gcc/lto-streamer-in.cc  | 12 
>  gcc/lto-streamer-out.cc | 11 ---
>  gcc/lto-streamer.h  |  2 ++
>  gcc/tree-streamer.cc|  2 +-
>  gcc/tree-streamer.h | 14 +-
>  5 files changed, 28 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
> index 2cb83406db5..2a0720b4e6f 100644
> --- a/gcc/lto-streamer-in.cc
> +++ b/gcc/lto-streamer-in.cc
> @@

Re: [PATCH] rs6000: Remove redundant initialization [PR106907]

2023-06-29 Thread Kewen.Lin via Gcc-patches

Hi Jeevitha,

on 2023/6/7 13:44, P Jeevitha via Gcc-patches wrote:
> PR106907 has a few warnings spotted by cppcheck.  This patch addresses the
> redundant initialization issue among them.  Here the initialized value of
> 'new_addr' was overwritten before it was read.  Updated the source by
> removing the unnecessary initialization of 'new_addr'.

This is okay for trunk (no backports needed btw), this fix can even be
taken as obvious, thanks!

> 
> 2023-06-07  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/106907

One question out of curiosity: PR106907 does not seem to report this issue;
is there another PR reporting it?  Or am I missing something?

BR,
Kewen

>   * gcc/config/rs6000/rs6000.cc (rs6000_expand_vector_extract): Remove 
> redundant
>   initialization of new_addr.
> 
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 42f49e4a56b..d994e004bd3 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -7660,12 +7660,11 @@ rs6000_expand_vector_extract (rtx target, rtx vec, 
> rtx elt)
>  {
>unsigned int ele_size = GET_MODE_SIZE (inner_mode);
>rtx num_ele_m1 = GEN_INT (GET_MODE_NUNITS (mode) - 1);
> -  rtx new_addr = gen_reg_rtx (Pmode);
>  
>elt = gen_rtx_AND (Pmode, elt, num_ele_m1);
>if (ele_size > 1)
>   elt = gen_rtx_MULT (Pmode, elt, GEN_INT (ele_size));
> -  new_addr = gen_rtx_PLUS (Pmode, XEXP (mem, 0), elt);
> +  rtx new_addr = gen_rtx_PLUS (Pmode, XEXP (mem, 0), elt);
>new_addr = change_address (mem, inner_mode, new_addr);
>emit_move_insn (target, new_addr);
>  }
>

Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode

2023-06-29 Thread Thomas Schwinge

Hi!

On 2023-06-21T15:58:24+0800, Pan Li via Gcc-patches  
wrote:
> We already extended the machine mode from 8 to 16 bits, but one place in
> the streamer was missed: it still has a hard-coded array of size 256 for
> the machine modes.
>
> In the lto pass, we memset the array by MAX_MACHINE_MODE count, but the
> value of MAX_MACHINE_MODE will grow as more and more modes are
> added, while the machine mode array in tree-streamer still has size 256.
>
> Then, when MAX_MACHINE_MODE is greater than 256, the memset in
> lto_output_init_mode_table will unexpectedly touch memory out of range.

Uh.  :-O

> This patch takes MAX_MACHINE_MODE as the size of the
> array in the streamer, to make sure there is no unexpected
> memory access in the future.  Meanwhile, this patch also adjusts some places
> that assume MAX_MACHINE_MODE <= 256.

Thanks to Jakub and Richard for guidance re the offloading compilation
case, where we've got different 'MAX_MACHINE_MODE's between stream-out
and stream-in, and a modes mapping table.

However, with this patch, there are ICEs all over the place...  I'm
having a look.


Grüße
 Thomas


> gcc/ChangeLog:
>
>   * lto-streamer-in.cc (lto_input_mode_table): Stream in the mode
>   bits for machine mode table.
>   * lto-streamer-out.cc (lto_write_mode_table): Stream out the
>   HOST machine mode bits.
>   * lto-streamer.h (struct lto_file_decl_data): New fields mode_bits.
>   * tree-streamer.cc (streamer_mode_table): Take MAX_MACHINE_MODE
>   as the table size.
>   * tree-streamer.h (streamer_mode_table): Ditto.
>   (bp_pack_machine_mode): Take 1 << ceil_log2 (MAX_MACHINE_MODE)
>   as the packing limit.
>   (bp_unpack_machine_mode): Ditto.
> ---
>  gcc/lto-streamer-in.cc  | 12 
>  gcc/lto-streamer-out.cc | 11 ---
>  gcc/lto-streamer.h  |  2 ++
>  gcc/tree-streamer.cc|  2 +-
>  gcc/tree-streamer.h | 14 +-
>  5 files changed, 28 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
> index 2cb83406db5..2a0720b4e6f 100644
> --- a/gcc/lto-streamer-in.cc
> +++ b/gcc/lto-streamer-in.cc
> @@ -1985,8 +1985,6 @@ lto_input_mode_table (struct lto_file_decl_data 
> *file_data)
>  internal_error ("cannot read LTO mode table from %s",
>   file_data->file_name);
>
> -  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
> -  file_data->mode_table = table;
>const struct lto_simple_header_with_strings *header
>  = (const struct lto_simple_header_with_strings *) data;
>int string_offset;
> @@ -1998,16 +1996,22 @@ lto_input_mode_table (struct lto_file_decl_data 
> *file_data)
>   header->string_size, vNULL);
>    bitpack_d bp = streamer_read_bitpack (&ib);
>
> +  unsigned mode_bits = bp_unpack_value (&bp, 5);
> +  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << mode_bits);
> +
> +  file_data->mode_table = table;
> +  file_data->mode_bits = mode_bits;
> +
>table[VOIDmode] = VOIDmode;
>table[BLKmode] = BLKmode;
>unsigned int m;
> -  while ((m = bp_unpack_value (&bp, 8)) != VOIDmode)
> +  while ((m = bp_unpack_value (&bp, mode_bits)) != VOIDmode)
>      {
>        enum mode_class mclass
>          = bp_unpack_enum (&bp, mode_class, MAX_MODE_CLASS);
>        poly_uint16 size = bp_unpack_poly_value (&bp, 16);
>        poly_uint16 prec = bp_unpack_poly_value (&bp, 16);
> -      machine_mode inner = (machine_mode) bp_unpack_value (&bp, 8);
> +      machine_mode inner = (machine_mode) bp_unpack_value (&bp, mode_bits);
>        poly_uint16 nunits = bp_unpack_poly_value (&bp, 16);
>unsigned int ibit = 0, fbit = 0;
>unsigned int real_fmt_len = 0;
> diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
> index 5ab2eb4301e..36899283ded 100644
> --- a/gcc/lto-streamer-out.cc
> +++ b/gcc/lto-streamer-out.cc
> @@ -3196,6 +3196,11 @@ lto_write_mode_table (void)
>   if (inner_m != m)
> streamer_mode_table[(int) inner_m] = 1;
>}
> +
> +  /* Pack the mode_bits value within 5 bits (up to 31) in the beginning.  */
> +  unsigned mode_bits = ceil_log2 (MAX_MACHINE_MODE);
> +  bp_pack_value (&bp, mode_bits, 5);
> +
>/* First stream modes that have GET_MODE_INNER (m) == m,
>   so that we can refer to them afterwards.  */
>for (int pass = 0; pass < 2; pass++)
> @@ -3205,11 +3210,11 @@ lto_write_mode_table (void)
> machine_mode m = (machine_mode) i;
> if ((GET_MODE_INNER (m) == m) ^ (pass == 0))
>   continue;
> -          bp_pack_value (&bp, m, 8);
> +          bp_pack_value (&bp, m, mode_bits);
>            bp_pack_enum (&bp, mode_class, MAX_MODE_CLASS, GET_MODE_CLASS (m));
>            bp_pack_poly_value (&bp, GET_MODE_SIZE (m), 16);
>            bp_pack_poly_value (&bp, GET_MODE_PRECISION (m), 16);
> -          bp_pack_value (&bp, GET_MODE_INNER (m), 8);
> +          bp_pack_value (&bp, GET_MODE_INNER (m), mode_bits);
>            bp_pack_poly_value (&bp, GET_MODE_NUNITS (m), 16);
>
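
For readers following the quoted patch, the scheme it uses (write the field
width mode_bits = ceil_log2 (MAX_MACHINE_MODE) in a 5-bit header, then stream
every mode with that width, and size the reader's table as 1 << mode_bits)
can be sketched in isolation roughly as follows.  This is a toy model, not the
GCC streamer: toy_bitpack, ceil_log2_u, write_modes and read_modes are
hypothetical names, and the real bitpack_d API differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest n with (1 << n) >= x; assumed to match what ceil_log2 yields.
inline unsigned ceil_log2_u (unsigned x)
{
  assert (x > 0 && x <= (1u << 31));
  unsigned n = 0;
  while ((1u << n) < x)
    ++n;
  return n;
}

// Toy stand-in for a bit-pack stream: values are appended LSB first.
struct toy_bitpack
{
  std::vector<bool> bits;
  std::size_t pos = 0;

  void pack (unsigned value, unsigned nbits)
  {
    for (unsigned i = 0; i < nbits; ++i)
      bits.push_back (((value >> i) & 1u) != 0);
  }

  unsigned unpack (unsigned nbits)
  {
    unsigned value = 0;
    for (unsigned i = 0; i < nbits; ++i)
      {
        bool b = bits[pos++];
        value |= (b ? 1u : 0u) << i;
      }
    return value;
  }
};

// Writer: record the field width first, then every mode with that width.
inline toy_bitpack write_modes (const std::vector<unsigned> &modes,
                                unsigned max_machine_mode)
{
  toy_bitpack bp;
  unsigned mode_bits = ceil_log2_u (max_machine_mode);
  assert (mode_bits < 32);  // must fit in the 5-bit header
  bp.pack (mode_bits, 5);
  for (unsigned m : modes)
    bp.pack (m, mode_bits);
  return bp;
}

// Reader: learn the width from the header and size the table from it,
// instead of assuming 8 bits and 256 entries.
inline std::vector<unsigned> read_modes (toy_bitpack bp, std::size_t count,
                                         std::size_t *table_size)
{
  unsigned mode_bits = bp.unpack (5);
  *table_size = std::size_t (1) << mode_bits;
  std::vector<unsigned> out;
  for (std::size_t i = 0; i < count; ++i)
    out.push_back (bp.unpack (mode_bits));
  return out;
}
```

Because the reader takes the width from the stream rather than from its own
MAX_MACHINE_MODE, writer and reader can in principle disagree on the number of
modes, which is the situation in the offloading case mentioned above.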

Re: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread juzhe.zh...@rivai.ai

No, I am not saying I want to fix it in the RISC-V backend.
Actually, if you can quickly land the fix in the generic code without blocking
the following RISC-V patches, I would be glad to see that.  Otherwise, I prefer
to fix it in the RISC-V backend for now, if it is not a big performance issue,
and defer the proper fix to GCC 15.

The reason for this plan is that global reviewers' bandwidth is very limited.
We should give the highest priority to the auto-vectorization middle-end
support first and then come back to the corner-case issues.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-29 17:09
To: Robin Dapp via Gcc-patches; 钟居哲; Jeff Law; kito.cheng; kito.cheng; palmer; 
palmer; richard.sandiford
CC: rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for 
VNx1BI, VNx2BI and VNx4BI
> Yeah, that part is OK, and was the case I was thinking about when
> I said OK yesterday.  But now that we allow BITSIZE != PRECISION,
> it's possible for BITSIZE - PRECISION to be more than a full byte,
> in which case the new loop would not initialise every byte of
> the mode.
 
Ah, I see, so when e.g. BITSIZE == 16 and PRECISION == 1.  Luckily
this cannot happen with RVV as all we do is adjust the precision
of the modes that have BITSIZE == 8.  I'm going to add an assert.
Juzhe would rather work around that in the backend, though.
 
The other thing I just noticed is
 
tree
build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode)
{
  gcc_assert (mask_mode != BLKmode);
 
  unsigned HOST_WIDE_INT esize;
  if (VECTOR_MODE_P (mask_mode))
    {
      poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode);
      esize = vector_element_size (vsize, nunits);
    }
  else
    esize = 1;
 
  tree bool_type = build_nonstandard_boolean_type (esize);
 
  return make_vector_type (bool_type, nunits, mask_mode);
}
 
which gives us wrong precision as we rely on the BITSIZE here as well.
This results in a precision of 1 for VNx8BI, 2 for VNx4BI and 4 for
VNx2BI.
 
Maybe this isn't a problem per se but to me it appears
just wrong.
 
Regards
Robin

Re: [PATCH] rs6000, __builtin_set_fpscr_rn add retrun value

2023-06-29 Thread Kewen.Lin via Gcc-patches

Hi Carl,

on 2023/6/19 23:57, Carl Love wrote:
> GCC maintainers:
> 
> 
> The GLibC team requested a builtin to replace the mffscrn and mffscrni inline
> asm instructions in the GLibC code.  Previously there was discussion on adding
> builtins for the mffscrn instructions.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html
> 
> In the end, it was felt that it would be better to extend the existing
> __builtin_set_fpscr_rn builtin to return a double instead of a void
> type.  The desire is that we could have the functionality of the
> mffscrn and mffscrni instructions on older ISAs.  The two instructions
> were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has the
> needed functionality to set the RN field using the mffscrn and mffscrni
> instructions if ISA 3.0 is supported or fall back to using logical
> instructions to mask and set the bits for earlier ISAs.  The
> instructions return the current value of the FPSCR fields DRN, VE, OE,
> UE, ZE, XE, NI, RN bit positions then update the RN bit positions with
> the new RN value provided.
> 
> The current __builtin_set_fpscr_rn builtin has a return type of void. 
> So, changing the return type to double and returning the  FPSCR fields
> DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
> functionally equivalent of the mffscrn and mffscrni instructions.  Any
> current uses of the builtin would just ignore the return value yet any
> new uses could use the return value.  So the requirement is for the
> change to the __builtin_set_fpscr_rn builtin to be backwardly
> compatible and work for all ISAs.
> 
> The following patch changes the return type of the
>  __builtin_set_fpscr_rn builtin from void to double.  The return value
> is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
> XE, NI, RN bit positions when the builtin is called.  The builtin then
> updates the RN field with the new value provided as an argument to the
> builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
> check that the builtin returns the current value of the FPSCR fields
> and then updates the RN field.

But this patch also introduces one more overloaded instance with argument
type double.  I checked the glibc usage of mffscrn/mffscrni, and I don't
think it's necessary to add this new instance, as glibc passes the given
rounding mode as an integral type.

For example:

#define __fe_mffscrn(rn)\
  ({register fenv_union_t __fr; \
if (__builtin_constant_p (rn))  \
  __asm__ __volatile__ (\
".machine push; .machine \"power9\"; mffscrni %0,%1; .machine pop" \
: "=f" (__fr.fenv) : "n" (rn)); \
else\
{   \
  __fr.l = (rn);\
  __asm__ __volatile__ (\
".machine push; .machine \"power9\"; mffscrn %0,%1; .machine pop" \
: "=f" (__fr.fenv) : "f" (__fr.fenv));  \
}   \
__fr.fenv;  \
  })


/* Same as __fesetround_inline, however without runtime check to use DFP
   mtfsfi syntax (as relax_fenv_state) or if round value is valid.  */
static inline void
__fesetround_inline_nocheck (const int round)
{
#ifdef _ARCH_PWR9
  __fe_mffscrn (round);
#else
  if (__glibc_likely (GLRO(dl_hwcap2) & PPC_FEATURE2_ARCH_3_00))
    __fe_mffscrn (round);
  else
    asm volatile ("mtfsfi 7,%0" : : "n" (round));
#endif
}

So could you just extend return type (from void to double) but without one
more overloading instance?

Without overloading, we can still use the original bif instance SET_FPSCR_RN
and its corresponding expander rs6000_set_fpscr_rn, and just add some more
handling to fetch the bits for the return value.  It would be simpler IMHO.
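
To spell out the intended semantics (return the old DRN, VE, OE, UE, ZE, XE,
NI, RN fields, then update RN), a rough model in plain C++ could look like the
sketch below.  It is only an illustration under assumptions: toy_fpscr and
toy_set_fpscr_rn are hypothetical names, the mask assumes DRN in IBM bits
29:31 and VE..RN in bits 56:63, and the model returns the raw bits as an
integer whereas the builtin would return them in a double.

```cpp
#include <cassert>
#include <cstdint>

// Assumed field layout (IBM bit numbering, bit 63 is the LSB):
// DRN at 29:31; VE, OE, UE, ZE, XE, NI at 56:61; RN at 62:63.
constexpr std::uint64_t returned_fields_mask = 0x00000007000000FFull;
constexpr std::uint64_t rn_mask = 0x3ull;

// Toy register image; the real FPSCR is of course not a C++ object.
struct toy_fpscr
{
  std::uint64_t bits = 0;
};

// Return the selected fields as they were before the update, then replace
// only the two RN bits with the requested rounding mode.
inline std::uint64_t toy_set_fpscr_rn (toy_fpscr &fpscr, unsigned rn)
{
  assert (rn < 4);  // RN is a two-bit field
  std::uint64_t old_fields = fpscr.bits & returned_fields_mask;
  fpscr.bits = (fpscr.bits & ~rn_mask) | (rn & rn_mask);
  return old_fields;
}
```

Callers that ignore the result behave as before, which is the backward
compatibility the description asks for.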

> 
> The GLibC team has reviewed the patch to make sure it met their needs
> as a drop in replacement for the inline asm mffscr and mffscrni
> statements in the GLibC code.  T
> 
> The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> LE.
> 
> Please let me know if the patch is acceptable for mainline.  Thanks.
> 
>Carl 
> 
> 
> rs6000, __builtin_set_fpscr_rn add retrun value
> 
> Change the return value from void to double.  The return value consists of
> the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit positions.  Add an
> overloaded version which accepts a double argument.
> 
> The test powerpc/test_fpscr_rn_builtin.c is updated to add tests for the
> double return value and the new double argument.
> 
> gcc/ChangeLog:

Re: [PATCH V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI

2023-06-29 Thread Robin Dapp via Gcc-patches

> Yeah, that part is OK, and was the case I was thinking about when
> I said OK yesterday.  But now that we allow BITSIZE != PRECISION,
> it's possible for BITSIZE - PRECISION to be more than a full byte,
> in which case the new loop would not initialise every byte of
> the mode.

Ah, I see, so when e.g. BITSIZE == 16 and PRECISION == 1.  Luckily
this cannot happen with RVV as all we do is adjust the precision
of the modes that have BITSIZE == 8.  I'm going to add an assert.
Juzhe would rather work around that in the backend, though.

The other thing I just noticed is

tree
build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode)
{
  gcc_assert (mask_mode != BLKmode);

  unsigned HOST_WIDE_INT esize;
  if (VECTOR_MODE_P (mask_mode))
    {
      poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode);
      esize = vector_element_size (vsize, nunits);
    }
  else
    esize = 1;

  tree bool_type = build_nonstandard_boolean_type (esize);

  return make_vector_type (bool_type, nunits, mask_mode);
}

which gives us wrong precision as we rely on the BITSIZE here as well.
This results in a precision of 1 for VNx8BI, 2 for VNx4BI and 4 for
VNx2BI.

Maybe this isn't a problem per se but to me it appears
just wrong.

Regards
 Robin
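
For reference, the arithmetic behind those numbers can be reproduced with a
tiny sketch.  It assumes a fixed BITSIZE of 8 for the affected mask modes and
constant element counts (ignoring the poly_int scaling), and
element_bits_from_bitsize is a hypothetical stand-in for what
vector_element_size computes here.

```cpp
#include <cassert>

// Bits per element when a mode of VSIZE_BITS bits holds NUNITS elements.
// Hypothetical helper; assumes the division is exact.
inline unsigned element_bits_from_bitsize (unsigned vsize_bits,
                                           unsigned nunits)
{
  assert (nunits != 0 && vsize_bits % nunits == 0);
  return vsize_bits / nunits;
}
```

With BITSIZE 8 this gives 8 / 8 = 1 for VNx8BI, 8 / 4 = 2 for VNx4BI and
8 / 2 = 4 for VNx2BI, i.e. the element precision follows the byte-padded
BITSIZE rather than the adjusted PRECISION of the mode.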
