RE: [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs

2022-11-06 Thread Tamar Christina via Gcc-patches
> -----Original Message-----
> From: Richard Biener 
> Sent: Saturday, November 5, 2022 11:33 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de
> Subject: Re: [PATCH 1/8]middle-end: Recognize scalar reductions from
> bitfields and array_refs
> 
> On Mon, Oct 31, 2022 at 1:00 PM Tamar Christina via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > This patch series is to add recognition of pairwise operations
> > (reductions) in match.pd such that we can benefit from them even at
> > -O1 when the vectorizer isn't enabled.
> >
> > The use of these allows for much simpler codegen in AArch64 and allows
> > us to avoid quite a lot of codegen warts.
> >
> > As an example a simple:
> >
> > typedef float v4sf __attribute__((vector_size (16)));
> >
> > float
> > foo3 (v4sf x)
> > {
> >   return x[1] + x[2];
> > }
> >
> > currently generates:
> >
> > foo3:
> > dup s1, v0.s[1]
> > dup s0, v0.s[2]
> > fadd s0, s1, s0
> > ret
> >
> > while with this patch series now generates:
> >
> > foo3:
> > ext v0.16b, v0.16b, v0.16b, #4
> > faddp   s0, v0.2s
> > ret
> >
> > This patch will not perform the operation if the source is not a
> > gimple register and leaves memory sources to the vectorizer as it's
> > able to deal correctly with clobbers.
> 
> But the vectorizer should also be able to cope with the above.  

There are several problems with leaving it up to the vectorizer to do:

1. We only get it at -O2 and higher.
2. The way the vectorizer costs the reduction makes the resulting cost always 
too high for AArch64.

As an example the following:

typedef unsigned int u32v4 __attribute__((vector_size(16)));
unsigned int f (u32v4 a, u32v4 b)
{
return a[0] + a[1];
}

Doesn't get SLP'ed because the vectorizer costs it as:

node 0x485eb30 0 times vec_perm costs 0 in body
_1 + _2 1 times vector_stmt costs 1 in body
_1 + _2 1 times vec_perm costs 2 in body
_1 + _2 1 times vec_to_scalar costs 2 in body

And so ultimately you fail because:

/app/example.c:8:17: note: Cost model analysis for part in loop 0:
  Vector cost: 5
  Scalar cost: 3

This looks like it's because the vectorizer costs the operation that creates
the BIT_FIELD_REF for the reduction as requiring two scalar extracts and a
permute. While it ultimately does produce a BIT_FIELD_REF, that's not what it
costs.

This makes the reduction almost always more expensive, so unless the rest of
the SLP tree amortizes the cost we never generate them.

3. SLP only happens on operations that are SLP shaped and where SLP didn't
fail.

As a simple example, the vectorizer can't SLP the following:

unsigned int f (u32v4 a, u32v4 b)
{
a[0] += b[0];
return a[0] + a[1];
}

Because there's not enough VF here and it can't unroll. This and many others 
fail because they're not an
SLP-able operation, or SLP build fails.

This causes us to generate for e.g. this example:

f:
dup s2, v0.s[1]
fmov w1, s1
add v0.2s, v2.2s, v0.2s
fmov w0, s0
add w0, w0, w1
ret

instead of with my patch:

f:
addp v0.2s, v0.2s, v0.2s
add v0.2s, v0.2s, v1.2s
fmov w0, s0
ret

which is significantly better code.  So I don't think the vectorizer is the 
right solution for this.
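
For reference, the two source patterns discussed above can be written as one small translation unit using the GCC vector extension. This is only a sketch of the input shape the match.pd pattern is meant to catch, not part of the patch; the names `sum_low_pair`, `update_then_sum` and `check_pairwise_examples` are made up for illustration.

```cpp
#include <cassert>

// GCC/Clang vector extension: four 32-bit unsigned lanes in a 128-bit vector.
typedef unsigned int u32v4 __attribute__((vector_size(16)));

// Adds two adjacent lanes: the scalar form of a pairwise add.
unsigned int sum_low_pair(u32v4 a)
{
  return a[0] + a[1];
}

// Same reduction after a lane update, as in the second example above.
unsigned int update_then_sum(u32v4 a, u32v4 b)
{
  a[0] += b[0];
  return a[0] + a[1];
}

// Checks the scalar semantics that any generated code has to preserve.
void check_pairwise_examples()
{
  u32v4 a = {1, 2, 3, 4};
  u32v4 b = {10, 20, 30, 40};
  assert(sum_low_pair(a) == 3u);         // 1 + 2
  assert(update_then_sum(a, b) == 13u);  // (1 + 10) + 2
}
```

Compiling this for AArch64 with and without the series should be one way to compare the output against the sequences quoted above.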

> I don't think
> we want to do this as part of general folding.  Iff, then this belongs in 
> specific
> points of the pass pipeline, no?

The reason I currently have it as such is because in general the compiler 
doesn't really deal with
horizontal reductions at all.  Also since the vectorizer itself can introduce 
reductions I figured it's
better to have one representation for this.  So admittedly perhaps this should 
only be done after
vectorization as that's when today we expect reductions to be in Gimple.

As for having it in a specific point in the pass pipeline, I have it as a 
general one since a number of
passes could create the form for the reduction, for instance vec_lower could 
break up an operation
to allow this to match.  The bigger BIT_FIELD_EXPR it creates could also lead 
to other optimizations.

Additionally you had mentioned last time that Andrew was trying to move min/max 
detection to match.pd
So I had figured this was the correct place for it.

That said, I have no intuition for what would be better here, since the check
is quite cheap.  But do you have a particular place you want this moved to?
Ideally I'd want it before the last FRE pass, but perhaps isel?

Thanks,
Tamar

> 
> > The use of these instruction makes a significant difference in codegen
> > quality for AArch64 and Arm.
> >
> > NOTE: The last entry in the series contains tests for all of the
> > previous patches as it's a bit of an all or nothing thing.
> >
> > Bootstrapped and regtested on aarch64-none-linux-gnu and
> > x86_64-pc-linux-gnu with no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > 

Re: why does gccgit require pthread?

2022-11-06 Thread LIU Hao via Gcc-patches

On 2022/11/7 15:03, Andrew Pinski wrote:


> > The win32 thread model does not have `std::mutex`; but there is no
> > `pthread_mutex_t` either, so it does not build either way.
>
> Oh, but I would assume it will later on right?



There has been an effort to add C++11 threading support to the win32 thread model, but I am not in 
favor of it.


Another solution is to use `__gthread_mutex_t` instead of `pthread_mutex_t`, which is also available 
in the win32 thread model. Actually I prefer this approach as it keeps code structure like what we 
have at this moment.




> Also I think you might need to change some more than you did.
> That is:
> -#define INCLUDE_PTHREAD_H
>  #include "system.h"
>
> You most likely need a macro, INCLUDE_MUTEX, and define that and
> include mutex in system.h like it was done for pthread.h.
> GCC loves to poison identifiers while lexing to make sure those
> identifiers are not used inside GCC and the include of mutex should be
> done early.



Well I am not familiar with such behavior. Feel free to amend the patch, until it looks good to you. 
I hope we can check this in before GCC 13 RC.




--
Best regards,
LIU Hao





Re: why does gccgit require pthread?

2022-11-06 Thread Andrew Pinski via Gcc-patches
On Sun, Nov 6, 2022 at 10:51 PM LIU Hao  wrote:
>
> On 2022-11-07 12:37, Andrew Pinski wrote:
> >
> > The original code which used pthread was added in GCC 5 way before GCC
> > moved to being written in C++11 which was only in the last 3 years.
> > pthread_* functions were the best choice at the time (2014) but now
> > GCC is written in C++11, I don't see any reason not to move them over
> > to using C++11 threading code.
> >
> >
>
> Attached is the proposed patch.
>
> The win32 thread model does not have `std::mutex`; but there is no 
> `pthread_mutex_t` either, so it
> does not build either way.
Oh, but I would assume it will later on right?

Also I think you might need to change some more than you did.
That is:
-#define INCLUDE_PTHREAD_H
 #include "system.h"

You most likely need a macro, INCLUDE_MUTEX, and define that and
include mutex in system.h like it was done for pthread.h.
GCC loves to poison identifiers while lexing to make sure those
identifiers are not used inside GCC and the include of mutex should be
done early.

Thanks,
Andrew

>
> Tested bootstrapping GCC on `{i686,x86_64}-w64-mingw32` with languages
> `c,lto,c++,fortran,objc,obj-c++` and with the `mcf` thread model; no errors 
> observed. The built
> `libgccjit-0.dll` does not have imports from winpthread any more.
>
> Please review.
>
>
> --
> Best regards,
> LIU Hao
>


Re: why does gccgit require pthread?

2022-11-06 Thread LIU Hao via Gcc-patches

On 2022-11-07 12:37, Andrew Pinski wrote:


> The original code which used pthread was added in GCC 5 way before GCC
> moved to being written in C++11 which was only in the last 3 years.
> pthread_* functions were the best choice at the time (2014) but now
> GCC is written in C++11, I don't see any reason not to move them over
> to using C++11 threading code.




Attached is the proposed patch.

The win32 thread model does not have `std::mutex`; but there is no `pthread_mutex_t` either, so it 
does not build either way.


Tested bootstrapping GCC on `{i686,x86_64}-w64-mingw32` with languages 
`c,lto,c++,fortran,objc,obj-c++` and with the `mcf` thread model; no errors observed. The built 
`libgccjit-0.dll` does not have imports from winpthread any more.


Please review.


--
Best regards,
LIU Hao

From ceb65f21b5ac23ce218efee82f40f641ebe44361 Mon Sep 17 00:00:00 2001
From: LIU Hao 
Date: Mon, 7 Nov 2022 13:00:12 +0800
Subject: [PATCH] gcc/jit: Use C++11 mutex instead of pthread's

This allows JIT to be built with a different thread model from `posix`
where pthread isn't available

gcc/jit/ChangeLog:

* jit-playback.cc: Use `std::mutex` instead of `pthread_mutex_t`
(playback::context::acquire_mutex): Likewise
(playback::context::release_mutex): Likewise
* jit-recording.cc: Remove the unused `INCLUDE_PTHREAD_H`
* libgccjit.cc: Use `std::mutex` instead of `pthread_mutex_t`
---
 gcc/jit/jit-playback.cc  | 9 +
 gcc/jit/jit-recording.cc | 1 -
 gcc/jit/libgccjit.cc | 8 
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/jit/jit-playback.cc b/gcc/jit/jit-playback.cc
index d227d36283a..17ff98c149b 100644
--- a/gcc/jit/jit-playback.cc
+++ b/gcc/jit/jit-playback.cc
@@ -19,7 +19,6 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
-#define INCLUDE_PTHREAD_H
 #include "system.h"
 #include "coretypes.h"
 #include "target.h"
@@ -51,6 +50,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "jit-w32.h"
 #endif
 
+#include <mutex>
+
 /* Compare with gcc/c-family/c-common.h: DECL_C_BIT_FIELD,
SET_DECL_C_BIT_FIELD.
These are redefined here to avoid depending from the C frontend.  */
@@ -2662,7 +2663,7 @@ playback::compile_to_file::copy_file (const char 
*src_path,
 /* This mutex guards gcc::jit::recording::context::compile, so that only
one thread can be accessing the bulk of GCC's state at once.  */
 
-static pthread_mutex_t jit_mutex = PTHREAD_MUTEX_INITIALIZER;
+static std::mutex jit_mutex;
 
 /* Acquire jit_mutex and set "this" as the active playback ctxt.  */
 
@@ -2673,7 +2674,7 @@ playback::context::acquire_mutex ()
 
   /* Acquire the big GCC mutex. */
   JIT_LOG_SCOPE (get_logger ());
-  pthread_mutex_lock (&jit_mutex);
+  jit_mutex.lock ();
   gcc_assert (active_playback_ctxt == NULL);
   active_playback_ctxt = this;
 }
@@ -2687,7 +2688,7 @@ playback::context::release_mutex ()
   JIT_LOG_SCOPE (get_logger ());
   gcc_assert (active_playback_ctxt == this);
   active_playback_ctxt = NULL;
-  pthread_mutex_unlock (&jit_mutex);
+  jit_mutex.unlock ();
 }
 
 /* Callback used by gcc::jit::playback::context::make_fake_args when
diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc
index f78daed2d71..6ae5a667e90 100644
--- a/gcc/jit/jit-recording.cc
+++ b/gcc/jit/jit-recording.cc
@@ -19,7 +19,6 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
-#define INCLUDE_PTHREAD_H
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
diff --git a/gcc/jit/libgccjit.cc b/gcc/jit/libgccjit.cc
index ca862662777..a5105fbc1f9 100644
--- a/gcc/jit/libgccjit.cc
+++ b/gcc/jit/libgccjit.cc
@@ -19,7 +19,6 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
-#define INCLUDE_PTHREAD_H
 #include "system.h"
 #include "coretypes.h"
 #include "timevar.h"
@@ -30,6 +29,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "jit-recording.h"
 #include "jit-result.h"
 
+#include <mutex>
+
 /* The opaque types used by the public API are actually subclasses
of the gcc::jit::recording classes.  */
 
@@ -4060,7 +4061,7 @@ gcc_jit_context_new_rvalue_from_vector (gcc_jit_context 
*ctxt,
Ideally this would be within parse_basever, but the mutex is only needed
by libgccjit.  */
 
-static pthread_mutex_t version_mutex = PTHREAD_MUTEX_INITIALIZER;
+static std::mutex version_mutex;
 
 struct jit_version_info
 {
@@ -4068,9 +4069,8 @@ struct jit_version_info
  guarded by version_mutex.  */
   jit_version_info ()
   {
-pthread_mutex_lock (&version_mutex);
+std::lock_guard<std::mutex> g (version_mutex);
 parse_basever (&major, &minor, &patchlevel);
-pthread_mutex_unlock (&version_mutex);
   }
 
   int major;
-- 
2.38.1
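
The two locking styles used in the patch above can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the libgccjit code: `demo_mutex`, `active_owner`, `acquire`, `release` and `guarded_increment` are hypothetical stand-ins. It shows manual `lock ()`/`unlock ()` for a lock held across two separate calls, and `std::lock_guard` for a lock scoped to one block.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for the JIT's global state; not the real GCC code.
// A std::mutex with static storage needs no initializer macro.
static std::mutex demo_mutex;
static const void *active_owner = nullptr;

// Manual locking: the mutex stays held after this function returns.
void acquire(const void *owner)
{
  demo_mutex.lock();
  active_owner = owner;
}

// Counterpart of acquire(); must be called by the thread that holds the lock.
void release()
{
  active_owner = nullptr;
  demo_mutex.unlock();
}

// Scoped locking: the guard unlocks automatically when it goes out of scope.
int guarded_increment(int &counter)
{
  std::lock_guard<std::mutex> guard(demo_mutex);
  return ++counter;
}

// Exercises both styles from a single thread.
void check_locking_styles()
{
  int counter = 0;
  int token = 0;
  acquire(&token);
  assert(active_owner == &token);
  release();
  assert(active_owner == nullptr);
  assert(guarded_increment(counter) == 1);
}
```

The scoped form suits the version lookup, where the critical section is a single call; the manual form matches the acquire/release pair, where the lock outlives the function that takes it.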





[PATCH v4, rs6000] Change mode and insn condition for VSX scalar extract/insert instructions

2022-11-06 Thread HAO CHEN GUI via Gcc-patches
Hi,
  For scalar extract/insert instructions, the exponent field can be stored in a
32-bit register, so this patch changes the mode of the exponent field from DI
to SI. These instructions can then be generated in a 32-bit environment, and
the patch removes the TARGET_64BIT check for them.

  The instructions using DI registers can be invoked with -mpowerpc64 in a
32-bit environment. The patch changes insn condition from TARGET_64BIT to
TARGET_POWERPC64 for those instructions.

  This patch also changes prototypes and categories of relevant built-ins and
effective target checks of test cases.

  Compared to the last version, the main changes are to set the categories of
relevant built-ins from power9-64 to power9 and to remove some unnecessary
test cases.
Last version: 
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601196.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.
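
The point about the exponent field fitting in a 32-bit register can be illustrated with a portable model of the bit layout. This is a sketch that assumes an IEEE 754 binary64 `double`; the names `model_extract_exp` and `model_insert_exp` are hypothetical and are not the built-ins themselves.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit IEEE 754 double");

// Biased exponent: bits 62..52 of the binary64 encoding, 11 bits wide,
// so the result always fits in a 32-bit integer.
std::uint32_t model_extract_exp(double x)
{
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return static_cast<std::uint32_t>((bits >> 52) & 0x7ffu);
}

// Replaces the exponent field of x with the low 11 bits of biased_exp,
// keeping the sign and fraction bits of x.
double model_insert_exp(double x, std::uint32_t biased_exp)
{
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  const std::uint64_t exp_mask = std::uint64_t{0x7ff} << 52;
  bits = (bits & ~exp_mask)
         | (static_cast<std::uint64_t>(biased_exp & 0x7ffu) << 52);
  double result;
  std::memcpy(&result, &bits, sizeof result);
  return result;
}

// A few values with known encodings (the exponent bias is 1023).
void check_exponent_model()
{
  assert(model_extract_exp(1.0) == 1023u);
  assert(model_extract_exp(0.5) == 1022u);
  assert(model_insert_exp(1.0, 1024u) == 2.0);
  assert(model_insert_exp(-1.5, 1025u) == -6.0);
}
```

The significand, by contrast, is wider than 32 bits, which is consistent with the significand-related patterns keeping a 64-bit register requirement.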


ChangeLog
2022-11-07  Haochen Gui  

gcc/
* config/rs6000/rs6000-builtins.def
(__builtin_vsx_scalar_extract_exp): Set return type to const unsigned
int and move it from power9-64 to power9 catalog.
(__builtin_vsx_scalar_extract_sig): Set return type to const unsigned
long long.
(__builtin_vsx_scalar_insert_exp): Set type of second argument to
unsigned int.
(__builtin_vsx_scalar_insert_exp_dp): Set type of second argument to
unsigned int and move it from power9-64 to power9 catalog.
* config/rs6000/vsx.md (xsxexpdp): Set mode of first operand to
SImode.  Remove TARGET_64BIT from insn condition.
(xsxsigdp): Change insn condition from TARGET_64BIT to TARGET_POWERPC64.
(xsiexpdp): Change insn condition from TARGET_64BIT to
TARGET_POWERPC64.  Set mode of third operand to SImode.
(xsiexpdpf): Set mode of third operand to SImode.  Remove TARGET_64BIT
from insn condition.

gcc/testsuite/
* gcc.target/powerpc/bfp/scalar-extract-exp-0.c: Remove lp64 check.
* gcc.target/powerpc/bfp/scalar-extract-exp-1.c: Remove lp64 check.
* gcc.target/powerpc/bfp/scalar-extract-exp-2.c: Deleted as case is
invalid now.
* gcc.target/powerpc/bfp/scalar-extract-exp-6.c: Replace lp64 check
with has_arch_ppc64.
* gcc.target/powerpc/bfp/scalar-extract-sig-0.c: Likewise.
* gcc.target/powerpc/bfp/scalar-extract-sig-6.c: Likewise.
* gcc.target/powerpc/bfp/scalar-insert-exp-0.c: Replace lp64 check
with has_arch_ppc64. Set type of exponent to unsigned int.
* gcc.target/powerpc/bfp/scalar-insert-exp-1.c: Set type of exponent
to unsigned int.
* gcc.target/powerpc/bfp/scalar-insert-exp-12.c: Replace lp64 check
with has_arch_ppc64. Set type of exponent to unsigned int.
* gcc.target/powerpc/bfp/scalar-insert-exp-13.c: Remove lp64 check.
Set type of exponent to unsigned int.
* gcc.target/powerpc/bfp/scalar-insert-exp-2.c: Set type of exponent to
unsigned int.
* gcc.target/powerpc/bfp/scalar-insert-exp-3.c: Remove lp64 check. Set
type of exponent to unsigned int.
* gcc.target/powerpc/bfp/scalar-insert-exp-4.c: Likewise.
* gcc.target/powerpc/bfp/scalar-insert-exp-5.c: Deleted as case is
invalid now.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index f76f54793d7..d8d67fa0cad 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2833,6 +2833,11 @@
   const signed int __builtin_dtstsfi_ov_td (const int<6>, _Decimal128);
 TSTSFI_OV_TD dfptstsfi_unordered_td {}

+  const unsigned int __builtin_vsx_scalar_extract_exp (double);
+VSEEDP xsxexpdp {}
+
+  const double __builtin_vsx_scalar_insert_exp_dp (double, unsigned int);
+VSIEDPF xsiexpdpf {}

 [power9-64]
   void __builtin_altivec_xst_len_r (vsc, void *, long);
@@ -2847,19 +2852,13 @@
   pure vsc __builtin_vsx_lxvl (const void *, signed long);
 LXVL lxvl {}

-  const signed long __builtin_vsx_scalar_extract_exp (double);
-VSEEDP xsxexpdp {}
-
-  const signed long __builtin_vsx_scalar_extract_sig (double);
+  const unsigned long long __builtin_vsx_scalar_extract_sig (double);
 VSESDP xsxsigdp {}

   const double __builtin_vsx_scalar_insert_exp (unsigned long long, \
-unsigned long long);
+   unsigned int);
 VSIEDP xsiexpdp {}

-  const double __builtin_vsx_scalar_insert_exp_dp (double, unsigned long long);
-VSIEDPF xsiexpdpf {}
-
   pure vsc __builtin_vsx_xl_len_r (void *, signed long);
 XL_LEN_R xl_len_r {}

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e226a93bbe5..9d3a2340a79 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5095,10 +5095,10 @@ (define_insn "xsxexpqp_"


Re: [PATCH 1/2] Initial Grand Ridge support

2022-11-06 Thread Hongtao Liu via Gcc-patches
On Mon, Nov 7, 2022 at 9:41 AM Haochen Jiang via Gcc-patches
 wrote:
>
> gcc/ChangeLog:
>
> * common/config/i386/i386-common.cc
> (processor_names): Add grandridge.
> (processor_alias_table): Ditto.
> * common/config/i386/i386-cpuinfo.h:
> (enum processor_types): Add INTEL_GRANDRIDGE.
> * config.gcc: Add -march=grandridge.
> * config/i386/driver-i386.cc (host_detect_local_cpu):
> Handle grandridge.
> * config/i386/i386-c.cc (ix86_target_macros_internal):
> Ditto.
> * config/i386/i386-options.cc (m_GRANDRIDGE): New define.
> (processor_cost_table): Add grandridge.
> * config/i386/i386.h (enum processor_type):
> Add PROCESSOR_GRANDRIDGE.
> (PTA_GRANDRIDGE): Ditto.
> * doc/extend.texi: Add grandridge.
> * doc/invoke.texi: Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/mv16.C: Add grandridge.
> * gcc.target/i386/funcspec-56.inc: Handle new march.
> ---
LGTM.
>  gcc/common/config/i386/cpuinfo.h  | 6 ++
>  gcc/common/config/i386/i386-common.cc | 3 +++
>  gcc/common/config/i386/i386-cpuinfo.h | 1 +
>  gcc/config.gcc| 2 +-
>  gcc/config/i386/driver-i386.cc| 5 -
>  gcc/config/i386/i386-c.cc | 7 +++
>  gcc/config/i386/i386-options.cc   | 2 ++
>  gcc/config/i386/i386.h| 2 ++
>  gcc/doc/extend.texi   | 3 +++
>  gcc/doc/invoke.texi   | 9 +
>  gcc/testsuite/g++.target/i386/mv16.C  | 6 ++
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc | 1 +
>  12 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/common/config/i386/cpuinfo.h 
> b/gcc/common/config/i386/cpuinfo.h
> index df3500adc83..4d1bcffb978 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -573,6 +573,12 @@ get_intel_cpu (struct __processor_model *cpu_model,
>cpu_model->__cpu_type = INTEL_COREI7;
>cpu_model->__cpu_subtype = INTEL_COREI7_GRANITERAPIDS;
>break;
> +case 0xb6:
> +  /* Grand Ridge.  */
> +  cpu = "grandridge";
> +  CHECK___builtin_cpu_is ("grandridge");
> +  cpu_model->__cpu_type = INTEL_GRANDRIDGE;
> +  break;
>  case 0x17:
>  case 0x1d:
>/* Penryn.  */
> diff --git a/gcc/common/config/i386/i386-common.cc 
> b/gcc/common/config/i386/i386-common.cc
> index 60a193a651c..431fd0d3ad1 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -1920,6 +1920,7 @@ const char *const processor_names[] =
>"goldmont-plus",
>"tremont",
>"sierraforest",
> +  "grandridge",
>"knl",
>"knm",
>"skylake",
> @@ -2071,6 +2072,8 @@ const pta processor_alias_table[] =
>  M_CPU_TYPE (INTEL_TREMONT), P_PROC_SSE4_2},
>{"sierraforest", PROCESSOR_SIERRAFOREST, CPU_HASWELL, PTA_SIERRAFOREST,
>  M_CPU_SUBTYPE (INTEL_SIERRAFOREST), P_PROC_AVX2},
> +  {"grandridge", PROCESSOR_GRANDRIDGE, CPU_HASWELL, PTA_GRANDRIDGE,
> +M_CPU_TYPE (INTEL_GRANDRIDGE), P_PROC_AVX2},
>{"knl", PROCESSOR_KNL, CPU_SLM, PTA_KNL,
>  M_CPU_TYPE (INTEL_KNL), P_PROC_AVX512F},
>{"knm", PROCESSOR_KNM, CPU_SLM, PTA_KNM,
> diff --git a/gcc/common/config/i386/i386-cpuinfo.h 
> b/gcc/common/config/i386/i386-cpuinfo.h
> index 345fda648ff..fe2e9e21fd2 100644
> --- a/gcc/common/config/i386/i386-cpuinfo.h
> +++ b/gcc/common/config/i386/i386-cpuinfo.h
> @@ -61,6 +61,7 @@ enum processor_types
>AMDFAM19H,
>ZHAOXIN_FAM7H,
>INTEL_SIERRAFOREST,
> +  INTEL_GRANDRIDGE,
>CPU_TYPE_MAX,
>BUILTIN_CPU_TYPE_MAX = CPU_TYPE_MAX
>  };
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 84c040746dc..b5eda046033 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -669,7 +669,7 @@ silvermont knl knm skylake-avx512 cannonlake 
> icelake-client icelake-server \
>  skylake goldmont goldmont-plus tremont cascadelake tigerlake cooperlake \
>  sapphirerapids alderlake rocketlake eden-x2 nano nano-1000 nano-2000 
> nano-3000 \
>  nano-x2 eden-x4 nano-x4 lujiazui x86-64 x86-64-v2 x86-64-v3 x86-64-v4 \
> -sierraforest graniterapids native"
> +sierraforest graniterapids grandridge native"
>
>  # Additional x86 processors supported by --with-cpu=.  Each processor
>  # MUST be separated by exactly one space.
> diff --git a/gcc/config/i386/driver-i386.cc b/gcc/config/i386/driver-i386.cc
> index 3117d83de00..95c16c23c7f 100644
> --- a/gcc/config/i386/driver-i386.cc
> +++ b/gcc/config/i386/driver-i386.cc
> @@ -591,8 +591,11 @@ const char *host_detect_local_cpu (int argc, const char 
> **argv)
>   /* This is unknown family 0x6 CPU.  */
>   if (has_feature (FEATURE_AVX))
> {
> + /* Assume Grand Ridge.  */
> + if (has_feature (FEATURE_RAOINT))
> +  

[PATCH 2/2] Add m_CORE_ATOM for atom cores

2022-11-06 Thread Haochen Jiang via Gcc-patches
gcc/ChangeLog:

* config/i386/i386-options.cc (m_CORE_ATOM): New.
* config/i386/x86-tune.def
(X86_TUNE_SCHEDULE): Initial tune for CORE_ATOM.
(X86_TUNE_PARTIAL_REG_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Ditto.
(X86_TUNE_DEST_FALSE_DEP_FOR_GLC): Ditto.
(X86_TUNE_MEMORY_MISMATCH_STALL): Ditto.
(X86_TUNE_USE_LEAVE): Ditto.
(X86_TUNE_PUSH_MEMORY): Ditto.
(X86_TUNE_USE_INCDEC): Ditto.
(X86_TUNE_INTEGER_DFMODE_MOVES): Ditto.
(X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Ditto.
(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Ditto.
(X86_TUNE_USE_SAHF): Ditto.
(X86_TUNE_USE_BT): Ditto.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Ditto.
(X86_TUNE_ONE_IF_CONV_INSN): Ditto.
(X86_TUNE_AVOID_MFENCE): Ditto.
(X86_TUNE_USE_SIMODE_FIOP): Ditto.
(X86_TUNE_EXT_80387_CONSTANTS): Ditto.
(X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Ditto.
(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Ditto.
(X86_TUNE_SSE_TYPELESS_STORES): Ditto.
(X86_TUNE_SSE_LOAD0_BY_PXOR): Ditto.
(X86_TUNE_AVOID_4BYTE_PREFIXES): Ditto.
(X86_TUNE_USE_GATHER_2PARTS): Ditto.
(X86_TUNE_USE_GATHER_4PARTS): Ditto.
(X86_TUNE_USE_GATHER): Ditto.
---
 gcc/config/i386/i386-options.cc |  1 +
 gcc/config/i386/x86-tune.def| 71 +++--
 2 files changed, 41 insertions(+), 31 deletions(-)

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 23ab1f867d0..e5c77f3a84d 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -139,6 +139,7 @@ along with GCC; see the file COPYING3.  If not see
 #define m_TREMONT (HOST_WIDE_INT_1U<> (W-1) ^ x) -
@@ -372,7 +379,7 @@ DEF_TUNE (X86_TUNE_USE_SIMODE_FIOP, "use_simode_fiop",
   ~(m_PENT | m_LAKEMONT | m_PPRO | m_CORE_ALL | m_BONNELL
| m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_AMD_MULTIPLE
| m_LUJIAZUI | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT
-   | m_ALDERLAKE | m_GENERIC))
+   | m_ALDERLAKE | m_CORE_ATOM | m_GENERIC))
 
 /* X86_TUNE_USE_FFREEP: Use freep instruction instead of fstp.  */
 DEF_TUNE (X86_TUNE_USE_FFREEP, "use_ffreep", m_AMD_MULTIPLE | m_LUJIAZUI)
@@ -381,7 +388,8 @@ DEF_TUNE (X86_TUNE_USE_FFREEP, "use_ffreep", m_AMD_MULTIPLE 
| m_LUJIAZUI)
 DEF_TUNE (X86_TUNE_EXT_80387_CONSTANTS, "ext_80387_constants",
   m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT
  | m_KNL | m_KNM | m_INTEL | m_K6_GEODE | m_ATHLON_K8 | m_LUJIAZUI
- | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE | m_GENERIC)
+ | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE | m_CORE_ATOM
+ | m_GENERIC)
 
 /*/
 /* SSE instruction selection tuning  */
@@ -397,14 +405,15 @@ DEF_TUNE (X86_TUNE_GENERAL_REGS_SSE_SPILL, 
"general_regs_sse_spill",
 DEF_TUNE (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL, "sse_unaligned_load_optimal",
  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_SILVERMONT | m_KNL | m_KNM
  | m_INTEL | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE
- | m_AMDFAM10 | m_BDVER | m_BTVER | m_ZNVER | m_LUJIAZUI | m_GENERIC)
+ | m_CORE_ATOM | m_AMDFAM10 | m_BDVER | m_BTVER | m_ZNVER | m_LUJIAZUI
+ | m_GENERIC)
 
 /* X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL: Use movups for misaligned stores
instead of a sequence loading registers by parts.  */
 DEF_TUNE (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL, "sse_unaligned_store_optimal",
  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_SILVERMONT | m_KNL | m_KNM
  | m_INTEL | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE
- | m_BDVER | m_ZNVER | m_LUJIAZUI | m_GENERIC)
+ | m_CORE_ATOM | m_BDVER | m_ZNVER | m_LUJIAZUI | m_GENERIC)
 
 /* X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL: Use packed single
precision 128bit instructions instead of double where possible.   */
@@ -414,13 +423,13 @@ DEF_TUNE (X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL, 
"sse_packed_single_insn_optim
 /* X86_TUNE_SSE_TYPELESS_STORES: Always movaps/movups for 128bit stores.   */
 DEF_TUNE (X86_TUNE_SSE_TYPELESS_STORES, "sse_typeless_stores",
  m_AMD_MULTIPLE | m_LUJIAZUI | m_CORE_ALL | m_TREMONT | m_ALDERLAKE
- | m_GENERIC)
+ | m_CORE_ATOM | m_GENERIC)
 
 /* X86_TUNE_SSE_LOAD0_BY_PXOR: Always use pxor to load0 as opposed to
xorps/xorpd and other variants.  */
 DEF_TUNE (X86_TUNE_SSE_LOAD0_BY_PXOR, "sse_load0_by_pxor",
  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BDVER | m_BTVER | m_ZNVER
- | m_LUJIAZUI | m_TREMONT | m_ALDERLAKE | m_GENERIC)
+ | m_LUJIAZUI | m_TREMONT | m_ALDERLAKE

[PATCH 1/2] Initial Grand Ridge support

2022-11-06 Thread Haochen Jiang via Gcc-patches
gcc/ChangeLog:

* common/config/i386/i386-common.cc
(processor_names): Add grandridge.
(processor_alias_table): Ditto.
* common/config/i386/i386-cpuinfo.h:
(enum processor_types): Add INTEL_GRANDRIDGE.
* config.gcc: Add -march=grandridge.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Handle grandridge.
* config/i386/i386-c.cc (ix86_target_macros_internal):
Ditto.
* config/i386/i386-options.cc (m_GRANDRIDGE): New define.
(processor_cost_table): Add grandridge.
* config/i386/i386.h (enum processor_type):
Add PROCESSOR_GRANDRIDGE.
(PTA_GRANDRIDGE): Ditto.
* doc/extend.texi: Add grandridge.
* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv16.C: Add grandridge.
* gcc.target/i386/funcspec-56.inc: Handle new march.
---
 gcc/common/config/i386/cpuinfo.h  | 6 ++
 gcc/common/config/i386/i386-common.cc | 3 +++
 gcc/common/config/i386/i386-cpuinfo.h | 1 +
 gcc/config.gcc| 2 +-
 gcc/config/i386/driver-i386.cc| 5 -
 gcc/config/i386/i386-c.cc | 7 +++
 gcc/config/i386/i386-options.cc   | 2 ++
 gcc/config/i386/i386.h| 2 ++
 gcc/doc/extend.texi   | 3 +++
 gcc/doc/invoke.texi   | 9 +
 gcc/testsuite/g++.target/i386/mv16.C  | 6 ++
 gcc/testsuite/gcc.target/i386/funcspec-56.inc | 1 +
 12 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index df3500adc83..4d1bcffb978 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -573,6 +573,12 @@ get_intel_cpu (struct __processor_model *cpu_model,
   cpu_model->__cpu_type = INTEL_COREI7;
   cpu_model->__cpu_subtype = INTEL_COREI7_GRANITERAPIDS;
   break;
+case 0xb6:
+  /* Grand Ridge.  */
+  cpu = "grandridge";
+  CHECK___builtin_cpu_is ("grandridge");
+  cpu_model->__cpu_type = INTEL_GRANDRIDGE;
+  break;
 case 0x17:
 case 0x1d:
   /* Penryn.  */
diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index 60a193a651c..431fd0d3ad1 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -1920,6 +1920,7 @@ const char *const processor_names[] =
   "goldmont-plus",
   "tremont",
   "sierraforest",
+  "grandridge",
   "knl",
   "knm",
   "skylake",
@@ -2071,6 +2072,8 @@ const pta processor_alias_table[] =
 M_CPU_TYPE (INTEL_TREMONT), P_PROC_SSE4_2},
   {"sierraforest", PROCESSOR_SIERRAFOREST, CPU_HASWELL, PTA_SIERRAFOREST,
 M_CPU_SUBTYPE (INTEL_SIERRAFOREST), P_PROC_AVX2},
+  {"grandridge", PROCESSOR_GRANDRIDGE, CPU_HASWELL, PTA_GRANDRIDGE,
+M_CPU_TYPE (INTEL_GRANDRIDGE), P_PROC_AVX2},
   {"knl", PROCESSOR_KNL, CPU_SLM, PTA_KNL,
 M_CPU_TYPE (INTEL_KNL), P_PROC_AVX512F},
   {"knm", PROCESSOR_KNM, CPU_SLM, PTA_KNM,
diff --git a/gcc/common/config/i386/i386-cpuinfo.h 
b/gcc/common/config/i386/i386-cpuinfo.h
index 345fda648ff..fe2e9e21fd2 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -61,6 +61,7 @@ enum processor_types
   AMDFAM19H,
   ZHAOXIN_FAM7H,
   INTEL_SIERRAFOREST,
+  INTEL_GRANDRIDGE,
   CPU_TYPE_MAX,
   BUILTIN_CPU_TYPE_MAX = CPU_TYPE_MAX
 };
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 84c040746dc..b5eda046033 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -669,7 +669,7 @@ silvermont knl knm skylake-avx512 cannonlake icelake-client 
icelake-server \
 skylake goldmont goldmont-plus tremont cascadelake tigerlake cooperlake \
 sapphirerapids alderlake rocketlake eden-x2 nano nano-1000 nano-2000 nano-3000 
\
 nano-x2 eden-x4 nano-x4 lujiazui x86-64 x86-64-v2 x86-64-v3 x86-64-v4 \
-sierraforest graniterapids native"
+sierraforest graniterapids grandridge native"
 
 # Additional x86 processors supported by --with-cpu=.  Each processor
 # MUST be separated by exactly one space.
diff --git a/gcc/config/i386/driver-i386.cc b/gcc/config/i386/driver-i386.cc
index 3117d83de00..95c16c23c7f 100644
--- a/gcc/config/i386/driver-i386.cc
+++ b/gcc/config/i386/driver-i386.cc
@@ -591,8 +591,11 @@ const char *host_detect_local_cpu (int argc, const char 
**argv)
  /* This is unknown family 0x6 CPU.  */
  if (has_feature (FEATURE_AVX))
{
+ /* Assume Grand Ridge.  */
+ if (has_feature (FEATURE_RAOINT))
+   cpu = "grandridge";
  /* Assume Granite Rapids.  */
- if (has_feature (FEATURE_AMX_FP16))
+ else if (has_feature (FEATURE_AMX_FP16))
cpu = "graniterapids";
  /* Assume Sierra Forest.  */
  else if (has_fea
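
The driver hunk above is an if/else-if chain, so the first feature test that matches decides the CPU name. A minimal sketch of that ordering follows, under stated assumptions: the `Features` struct and `guess_cpu` are hypothetical, the real code calls has_feature (FEATURE_...), and the checks that follow in the truncated text are not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical feature flags standing in for has_feature (FEATURE_...).
struct Features
{
  bool avx;
  bool raoint;
  bool amx_fp16;
};

// First match wins, mirroring the order visible in the hunk:
// RAO-INT is tested before AMX-FP16.
std::string guess_cpu(const Features &f)
{
  if (!f.avx)
    return "unhandled";      // placeholder: outside the AVX branch shown above
  if (f.raoint)
    return "grandridge";
  else if (f.amx_fp16)
    return "graniterapids";
  return "unhandled";        // placeholder: later checks are not modelled here
}

// Shows that a CPU reporting both features is classified by the earlier test.
void check_detection_order()
{
  assert(guess_cpu(Features{true, true, true}) == "grandridge");
  assert(guess_cpu(Features{true, false, true}) == "graniterapids");
  assert(guess_cpu(Features{false, true, true}) == "unhandled");
}
```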

[PATCH 0/2] Intel Grand Ridge Support

2022-11-06 Thread Haochen Jiang via Gcc-patches
Hi all,

These patches aim to add initial Grand Ridge support to GCC.
Also we added a new m_CORE_ATOM for future atom core tune. They need
to be checked in after RAO-INT patch.
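
As a rough illustration of how a grouped mask such as m_CORE_ATOM is used in tuning tables, here is a sketch with made-up names. The processor list, the `demo_` identifiers and the membership of the group mask are assumptions for the example, not the definitions from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of processor ids; each id gets one bit in a mask.
enum DemoProcessor : unsigned
{
  DEMO_TREMONT,
  DEMO_SIERRAFOREST,
  DEMO_GRANDRIDGE,
  DEMO_GENERIC
};

constexpr std::uint64_t demo_mask(DemoProcessor p)
{
  return std::uint64_t{1} << static_cast<unsigned>(p);
}

constexpr std::uint64_t demo_m_TREMONT = demo_mask(DEMO_TREMONT);
constexpr std::uint64_t demo_m_SIERRAFOREST = demo_mask(DEMO_SIERRAFOREST);
constexpr std::uint64_t demo_m_GRANDRIDGE = demo_mask(DEMO_GRANDRIDGE);
constexpr std::uint64_t demo_m_GENERIC = demo_mask(DEMO_GENERIC);

// Assumed grouping: one name that covers several atom cores at once.
constexpr std::uint64_t demo_m_CORE_ATOM
  = demo_m_SIERRAFOREST | demo_m_GRANDRIDGE;

// A tuning entry is the OR of the processors it applies to.
constexpr std::uint64_t demo_tune = demo_m_TREMONT | demo_m_CORE_ATOM
                                    | demo_m_GENERIC;

// A tuning applies to a processor when that processor's bit is set.
constexpr bool demo_tune_enabled(std::uint64_t tune_mask, DemoProcessor p)
{
  return (tune_mask & demo_mask(p)) != 0;
}

void check_group_mask()
{
  assert(demo_tune_enabled(demo_tune, DEMO_GRANDRIDGE));
  assert(demo_tune_enabled(demo_tune, DEMO_SIERRAFOREST));
  assert(!demo_tune_enabled(demo_m_CORE_ATOM, DEMO_TREMONT));
}
```

With a group mask, adding a future atom core means extending one definition instead of touching every tuning entry.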

The information for Grand Ridge comes from the following:
https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Regtested on x86_64-pc-linux-gnu. Ok for trunk?

BRs,
Haochen




Re: [PATCH] Support Intel RAO-INT

2022-11-06 Thread Hongtao Liu via Gcc-patches
On Sun, Nov 6, 2022 at 8:56 PM Kong, Lingling via Gcc-patches
 wrote:
>
> Hi,
> These patches aim to add Intel RAO-INT support.
>
> The information is based on newly released
> Intel Architecture Instruction Set Extensions and Future Features.
>
> The document is available at:
> https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html.
>
> OK for trunk?
Ok.
>
> gcc/ChangeLog:
>
> * common/config/i386/cpuinfo.h (get_available_features):
> Detect raoint.
> * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_RAOINT_SET,
> OPTION_MASK_ISA2_RAOINT_UNSET): New.
> (ix86_handle_option): Handle -mraoint.
> * common/config/i386/i386-cpuinfo.h (enum processor_features):
> Add FEATURE_RAOINT.
> * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
> raoint.
> * config.gcc: Add raointintrin.h
> * config/i386/cpuid.h (bit_RAOINT): New.
> * config/i386/i386-builtin.def (BDESC): Add new builtins.
> * config/i386/i386-c.cc (ix86_target_macros_internal): Define
> __RAOINT__.
> * config/i386/i386-isa.def (RAOINT): Add DEF_PTA(RAOINT).
> * config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
> Add -mraoint.
> * config/i386/sync.md (rao_a): New define insn.
> * config/i386/i386.opt: Add option -mraoint.
> * config/i386/x86gprintrin.h: Include raointintrin.h.
> * doc/extend.texi: Document raoint.
> * doc/invoke.texi: Document -mraoint.
> * doc/sourcebuild.texi: Document target raoint.
> * config/i386/raointintrin.h: New file.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/other/i386-2.C: Add -mraoint.
> * g++.dg/other/i386-3.C: Ditto.
> * gcc.target/i386/funcspec-56.inc: Add new target attribute.
> * gcc.target/i386/sse-12.c: Add -mraoint.
> * gcc.target/i386/sse-13.c: Ditto.
> * gcc.target/i386/sse-14.c: Ditto.
> * gcc.target/i386/sse-22.c: Add raoint target.
> * gcc.target/i386/sse-23.c: Ditto.
> * lib/target-supports.exp: Add check_effective_target_raoint.
> * gcc.target/i386/rao-helper.h: New test.
> * gcc.target/i386/raoint-1.c: Ditto.
> * gcc.target/i386/raoint-aadd-2.c: Ditto.
> * gcc.target/i386/raoint-aand-2.c: Ditto.
> * gcc.target/i386/raoint-aor-2.c: Ditto.
> * gcc.target/i386/raoint-axor-2.c: Ditto.
> * gcc.target/i386/x86gprintrin-1.c: Ditto.
> * gcc.target/i386/x86gprintrin-2.c: Ditto.
> * gcc.target/i386/x86gprintrin-3.c: Ditto.
> * gcc.target/i386/x86gprintrin-4.c: Ditto.
> * gcc.target/i386/x86gprintrin-5.c: Ditto.
> ---
>  gcc/common/config/i386/cpuinfo.h  |   2 +
>  gcc/common/config/i386/i386-common.cc |  15 +++
>  gcc/common/config/i386/i386-cpuinfo.h |   1 +
>  gcc/common/config/i386/i386-isas.h|   1 +
>  gcc/config.gcc|   3 +-
>  gcc/config/i386/cpuid.h   |   1 +
>  gcc/config/i386/i386-builtin.def  |  10 ++
>  gcc/config/i386/i386-c.cc |   2 +
>  gcc/config/i386/i386-isa.def  |   1 +
>  gcc/config/i386/i386-options.cc   |   4 +-
>  gcc/config/i386/i386.opt  |   4 +
>  gcc/config/i386/raointintrin.h| 101 ++
>  gcc/config/i386/sync.md   |  16 +++
>  gcc/config/i386/x86gprintrin.h|   2 +
>  gcc/doc/extend.texi   |   5 +
>  gcc/doc/invoke.texi   |  11 +-
>  gcc/doc/sourcebuild.texi  |   3 +
>  gcc/testsuite/g++.dg/other/i386-2.C   |   2 +-
>  gcc/testsuite/g++.dg/other/i386-3.C   |   2 +-
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
>  gcc/testsuite/gcc.target/i386/rao-helper.h|  79 ++
>  gcc/testsuite/gcc.target/i386/raoint-1.c  |  31 ++
>  gcc/testsuite/gcc.target/i386/raoint-aadd-2.c |  24 +
>  gcc/testsuite/gcc.target/i386/raoint-aand-2.c |  25 +
>  gcc/testsuite/gcc.target/i386/raoint-aor-2.c  |  25 +
>  gcc/testsuite/gcc.target/i386/raoint-axor-2.c |  25 +
>  gcc/testsuite/gcc.target/i386/sse-12.c|   2 +-
>  gcc/testsuite/gcc.target/i386/sse-13.c|   2 +-
>  gcc/testsuite/gcc.target/i386/sse-14.c|   2 +-
>  gcc/testsuite/gcc.target/i386/sse-22.c|   4 +-
>  gcc/testsuite/gcc.target/i386/sse-23.c|   2 +-
>  .../gcc.target/i386/x86gprintrin-1.c  |   2 +-
>  .../gcc.target/i386/x86gprintrin-2.c  |   2 +-
>  .../gcc.target/i386/x86gprintrin-3.c  |   2 +-
>  .../gcc.target/i386/x86gprintrin-4.c  |   4 +-
>  .../gcc.target/i386/x86gprintrin-5.c  |   4 +-
>  gcc/testsuite/lib/target-supports.exp |  11 ++
>  37 files changed

Re: [PATCH] i386: Prefer remote atomic insn for atomic_fetch{add, and, or, xor}

2022-11-06 Thread Hongtao Liu via Gcc-patches
On Sun, Nov 6, 2022 at 9:00 PM Kong, Lingling via Gcc-patches
 wrote:
>
> Hi
>
> This patch adds the flag -mprefer-remote-atomic to control whether
> RAO-INT insns are generated for atomic operations.
> Ok for trunk?
OK with the two small adjustments below.
>
> BRs,
> Lingling
>
> gcc/ChangeLog:
>
> * config/i386/i386.opt: Add -mprefer-remote-atomic.
Please also update *x86 options* in gcc/doc/invoke.texi.
> * config/i386/sync.md (atomic_<plus_logic><mode>):
> New define_expand.
> (atomic_add<mode>): Rename to below one.
> (atomic_add<mode>_1): To this.
> (atomic_<logic><mode>): Ditto.
> (atomic_<logic><mode>_1): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/raoint-atomic-fetch.c: New test.
> ---
>  gcc/config/i386/i386.opt  |  4 +++
>  gcc/config/i386/sync.md   | 29 ---
>  .../gcc.target/i386/raoint-atomic-fetch.c | 29 +++
>  3 files changed, 58 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/raoint-atomic-fetch.c
>
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 415c52e1bb4..abb1e5ecbdc 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1246,3 +1246,7 @@ Support PREFETCHI built-in functions and code generation.
>  mraoint
>  Target Mask(ISA2_RAOINT) Var(ix86_isa_flags2) Save
>  Support RAOINT built-in functions and code generation.
> +
> +mprefer-remote-atomic
> +Target Var(flag_prefer_remote_atomic) Init(0)
> +Prefer use remote atomic insn for atomic operations.
> diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
> index e6543a5efb0..08e944fc9b7 100644
> --- a/gcc/config/i386/sync.md
> +++ b/gcc/config/i386/sync.md
> @@ -37,7 +37,7 @@
>UNSPECV_CMPXCHG
>UNSPECV_XCHG
>UNSPECV_LOCK
> -
> +
Please remove this change.
>;; For CMPccXADD support
>UNSPECV_CMPCCXADD
>
> @@ -791,7 +791,28 @@
>  (define_code_iterator any_plus_logic [and ior xor plus])
>  (define_code_attr plus_logic [(and "and") (ior "or") (xor "xor") (plus "add")])
>
> -(define_insn "rao_a<plus_logic><mode>"
> +(define_expand "atomic_<plus_logic><mode>"
> +  [(match_operand:SWI 0 "memory_operand")
> +   (any_plus_logic:SWI (match_dup 0)
> +                       (match_operand:SWI 1 "nonmemory_operand"))
> +   (match_operand:SI 2 "const_int_operand")]
> +  ""
> +{
> +  if (flag_prefer_remote_atomic
> +      && TARGET_RAOINT && operands[2] == const0_rtx
> +      && (<MODE>mode == SImode || <MODE>mode == DImode))
> +  {
> +    if (CONST_INT_P (operands[1]))
> +      operands[1] = force_reg (<MODE>mode, operands[1]);
> +    emit_insn (maybe_gen_rao_a (<CODE>, <MODE>mode, operands[0],
> +                                operands[1]));
> +  }
> +  else
> +    emit_insn (gen_atomic_<plus_logic><mode>_1 (operands[0], operands[1],
> +                                                operands[2]));
> +  DONE;
> +})
> +
> +(define_insn "@rao_a<plus_logic><mode>"
>    [(set (match_operand:SWI48 0 "memory_operand" "+m")
>         (unspec_volatile:SWI48
>           [(any_plus_logic:SWI48 (match_dup 0)
> @@ -801,7 +822,7 @@
>    "TARGET_RAOINT"
>    "a<plus_logic>\t{%1, %0|%0, %1}")
>
> -(define_insn "atomic_add<mode>"
> +(define_insn "atomic_add<mode>_1"
>[(set (match_operand:SWI 0 "memory_operand" "+m")
> (unspec_volatile:SWI
>   [(plus:SWI (match_dup 0)
> @@ -855,7 +876,7 @@
>return "lock{%;} %K2sub{}\t{%1, %0|%0, %1}";
>  })
>
> -(define_insn "atomic_<logic><mode>"
> +(define_insn "atomic_<logic><mode>_1"
>[(set (match_operand:SWI 0 "memory_operand" "+m")
> (unspec_volatile:SWI
>   [(any_logic:SWI (match_dup 0)
> diff --git a/gcc/testsuite/gcc.target/i386/raoint-atomic-fetch.c b/gcc/testsuite/gcc.target/i386/raoint-atomic-fetch.c
> new file mode 100644
> index 000..ac4099d888e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/raoint-atomic-fetch.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mraoint -O2 -mprefer-remote-atomic" } */
> +/* { dg-final { scan-assembler-times "aadd" 2 { target {! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "aand" 2 { target {! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "aor" 2 { target {! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "axor" 2 { target {! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "aadd" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "aand" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "aor" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "axor" 1 { target ia32 } } } */
> +volatile int x;
> +volatile long long y;
> +int *a;
> +long long *b;
> +
> +void extern
> +rao_int_test (void)
> +{
> +  __atomic_add_fetch (a, x, __ATOMIC_RELAXED);
> +  __atomic_and_fetch (a, x, __ATOMIC_RELAXED);
> +  __atomic_or_fetch (a, x, __ATOMIC_RELAXED);
> +  __atomic_xor_fetch (a, x, __ATOMIC_RELAXED);
> +#ifdef __x86_64__
> +  __atomic_add_fetch (b, y, __ATOMIC_RELAXED);
> +  __atomic_and_fetch (b, y, __ATOMIC_RELAXED);
> +  __atomic_or_fetch (b, y, __ATOMIC_RELAXED);
> +  __atomic_xor_fetch (b, y, __ATOMIC_RELAXED);
> +#endif
> +}
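
For readers less familiar with the builtins exercised by the test above, here is a minimal, portable sketch of the source pattern the new option targets: relaxed read-modify-write operations on 32-bit (or 64-bit) objects whose result is discarded. The names `counter`, `flags`, `update_stats`, `read_counter` and `read_flags` are hypothetical and not part of the patch. Whether GCC emits `aadd`/`aand`/`aor`/`axor` for this code depends on `-mraoint -mprefer-remote-atomic` and on the expander conditions quoted above (relaxed memory model, SImode or DImode); the sketch only shows the C-level behaviour, which is the same either way.

```c
#include <stdint.h>

/* Hypothetical shared state, for illustration only.  */
static int32_t counter;
static int32_t flags;

/* Relaxed read-modify-write operations whose results are ignored.
   These are the kind of operations the expander in the patch may map
   to RAO-INT insns; otherwise the usual lock-prefixed forms are used.  */
void
update_stats (int32_t delta, int32_t set_bits, int32_t keep_bits)
{
  __atomic_add_fetch (&counter, delta, __ATOMIC_RELAXED);
  __atomic_or_fetch (&flags, set_bits, __ATOMIC_RELAXED);
  __atomic_and_fetch (&flags, keep_bits, __ATOMIC_RELAXED);
  __atomic_xor_fetch (&flags, 1, __ATOMIC_RELAXED);
}

/* Relaxed loads so the effect of update_stats can be observed.  */
int32_t
read_counter (void)
{
  return __atomic_load_n (&counter, __ATOMIC_RELAXED);
}

int32_t
read_flags (void)
{
  return __atomic_load_n (&flags, __ATOMIC_RELAXED);
}
```

Calling `update_stats (5, 0x6, 0x4)` on zero-initialised state leaves `counter` at 5 and `flags` at 5 (0 | 6 = 6, 6 & 4 = 4, 4 ^ 1 = 5).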

Re: [PATCH] Support Intel prefetchit0/t1

2022-11-06 Thread Hongtao Liu via Gcc-patches
On Fri, Nov 4, 2022 at 3:47 PM Haochen Jiang via Gcc-patches
 wrote:
>
> Hi all,
>
> We are withdrawing the patches that added a new parameter to the original
> builtin_prefetch and implemented instruction prefetch on top of it.
>
> Since we only do this for one specific backend, we also concluded that
> there is no need to add a new rtl for it.
>
> This patch only supports instruction prefetch for the x86 backend.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
Ok.
>
> BRs,
> Haochen
>
> gcc/ChangeLog:
>
> * common/config/i386/cpuinfo.h (get_available_features):
> Detect PREFETCHI.
> * common/config/i386/i386-common.cc
> (OPTION_MASK_ISA2_PREFETCHI_SET,
> OPTION_MASK_ISA2_PREFETCHI_UNSET): New.
> (ix86_handle_option): Handle -mprefetchi.
> * common/config/i386/i386-cpuinfo.h
> (enum processor_features): Add FEATURE_PREFETCHI.
> * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY
> for prefetchi.
> * config.gcc: Add prfchiintrin.h.
> * config/i386/cpuid.h (bit_PREFETCHI): New.
> * config/i386/i386-builtin-types.def:
> Add DEF_FUNCTION_TYPE (VOID, PCVOID, INT)
> and DEF_FUNCTION_TYPE (VOID, PCVOID, INT, INT, INT).
> * config/i386/i386-builtin.def (BDESC): Add new builtins.
> * config/i386/i386-c.cc (ix86_target_macros_internal):
> Define __PREFETCHI__.
> * config/i386/i386-expand.cc: Handle new builtins.
> * config/i386/i386-isa.def (PREFETCHI):
> Add DEF_PTA(PREFETCHI).
> * config/i386/i386-options.cc
> (ix86_valid_target_attribute_inner_p): Handle prefetchi.
> * config/i386/i386.md (prefetchi): New define_insn.
> * config/i386/i386.opt: Add option -mprefetchi.
> * config/i386/predicates.md (local_func_symbolic_operand):
> New predicates.
> * config/i386/x86gprintrin.h: Include prfchiintrin.h.
> * config/i386/xmmintrin.h (enum _mm_hint): New enum for
> prefetchi.
> (_mm_prefetch): Handle the highest bit of enum.
> * doc/extend.texi: Document prefetchi.
> * doc/invoke.texi: Document -mprefetchi.
> * doc/sourcebuild.texi: Document target prefetchi.
> * config/i386/prfchiintrin.h: New file.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/other/i386-2.C: Add -mprefetchi.
> * g++.dg/other/i386-3.C: Ditto.
> * gcc.target/i386/avx-1.c: Ditto.
> * gcc.target/i386/funcspec-56.inc: Add new target attribute.
> * gcc.target/i386/sse-13.c: Add -mprefetchi.
> * gcc.target/i386/sse-23.c: Ditto.
> * gcc.target/i386/x86gprintrin-1.c: Ditto.
> * gcc.target/i386/x86gprintrin-2.c: Ditto.
> * gcc.target/i386/x86gprintrin-3.c: Ditto.
> * gcc.target/i386/x86gprintrin-4.c: Ditto.
> * gcc.target/i386/x86gprintrin-5.c: Ditto.
> * gcc.target/i386/prefetchi-1.c: New test.
> * gcc.target/i386/prefetchi-2.c: Ditto.
> * gcc.target/i386/prefetchi-3.c: Ditto.
> * gcc.target/i386/prefetchi-4.c: Ditto.
>
> Co-authored-by: Hongtao Liu 
> ---
>  gcc/common/config/i386/cpuinfo.h  |  2 +
>  gcc/common/config/i386/i386-common.cc | 15 
>  gcc/common/config/i386/i386-cpuinfo.h |  1 +
>  gcc/common/config/i386/i386-isas.h|  1 +
>  gcc/config.gcc|  2 +-
>  gcc/config/i386/cpuid.h   |  1 +
>  gcc/config/i386/i386-builtin-types.def|  4 +
>  gcc/config/i386/i386-builtin.def  |  4 +
>  gcc/config/i386/i386-c.cc |  2 +
>  gcc/config/i386/i386-expand.cc| 77 +++
>  gcc/config/i386/i386-isa.def  |  1 +
>  gcc/config/i386/i386-options.cc   |  4 +-
>  gcc/config/i386/i386.md   | 23 ++
>  gcc/config/i386/i386.opt  |  4 +
>  gcc/config/i386/predicates.md | 15 
>  gcc/config/i386/prfchiintrin.h| 49 
>  gcc/config/i386/x86gprintrin.h|  2 +
>  gcc/config/i386/xmmintrin.h   |  7 +-
>  gcc/doc/extend.texi   |  5 ++
>  gcc/doc/invoke.texi   | 10 ++-
>  gcc/doc/sourcebuild.texi  |  3 +
>  gcc/testsuite/g++.dg/other/i386-2.C   |  2 +-
>  gcc/testsuite/g++.dg/other/i386-3.C   |  2 +-
>  gcc/testsuite/gcc.target/i386/avx-1.c |  4 +-
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |  2 +
>  gcc/testsuite/gcc.target/i386/prefetchi-1.c   | 40 ++
>  gcc/testsuite/gcc.target/i386/prefetchi-2.c   | 26 +++
>  gcc/testsuite/gcc.target/i386/prefetchi-3.c   | 20 +
>  gcc/testsuite/gcc.target/i386/prefetchi-4.c   | 19 +
>  gcc/testsuite/gcc.target/i386/sse-13.c|  4 +-
>  gcc/testsuite/gcc.target/i386/sse-23.c|  4 +-
>  .../gcc.target/i386/x86gprintrin-1.c 

Re: [PATCH] Initial Granite Rapids support

2022-11-06 Thread Hongtao Liu via Gcc-patches
On Fri, Nov 4, 2022 at 4:14 PM Haochen Jiang via Gcc-patches
 wrote:
>
> From: "Hu, Lin1" 
>
> Hi all,
>
> This patch aims to add initial Granite Rapids support to GCC.
> It needs to be checked in after the prefetchit0/t1 patch.
>
> The information for Granite Rapids comes from the following document:
> https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
Ok.
>
> BRs,
> Haochen
>
> gcc/Changelog:
>
> * common/config/i386/cpuinfo.h:
> (get_intel_cpu): Handle Granite Rapids.
> * common/config/i386/i386-common.cc:
> (processor_names): Add graniterapids.
> (processor_alias_table): Ditto.
> * common/config/i386/i386-cpuinfo.h:
> (enum processor_types): Add INTEL_GRANITERAPIDS.
> * config.gcc: Add -march=graniterapids.
> * config/i386/driver-i386.cc (host_detect_local_cpu):
> Handle graniterapids.
> * config/i386/i386-c.cc (ix86_target_macros_internal):
> Ditto.
> * config/i386/i386-options.cc (m_GRANITERAPIDS): New define.
> (processor_cost_table): Add graniterapids.
> * config/i386/i386.h (enum processor_type):
> Add PROCESSOR_GRANITERAPIDS.
> (PTA_GRANITERAPIDS): Ditto.
> * doc/extend.texi: Add graniterapids.
> * doc/invoke.texi: Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc/testsuite/g++.target/i386/mv16.C: Add graniterapids.
> * gcc.target/i386/funcspec-56.inc: Handle new march.
> ---
>  gcc/common/config/i386/cpuinfo.h  |  9 +
>  gcc/common/config/i386/i386-common.cc |  3 +++
>  gcc/common/config/i386/i386-cpuinfo.h |  1 +
>  gcc/config.gcc|  2 +-
>  gcc/config/i386/driver-i386.cc|  5 -
>  gcc/config/i386/i386-c.cc |  7 +++
>  gcc/config/i386/i386-options.cc   |  4 +++-
>  gcc/config/i386/i386.h|  3 +++
>  gcc/doc/extend.texi   |  3 +++
>  gcc/doc/invoke.texi   | 11 +++
>  gcc/testsuite/g++.target/i386/mv16.C  |  6 ++
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |  1 +
>  12 files changed, 52 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
> index ac7761699af..42c25b8a636 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -564,6 +564,15 @@ get_intel_cpu (struct __processor_model *cpu_model,
>CHECK___builtin_cpu_is ("sierraforest");
>cpu_model->__cpu_type = INTEL_SIERRAFOREST;
>break;
> +case 0xad:
> +case 0xae:
> +  /* Granite Rapids.  */
> +  cpu = "graniterapids";
> +  CHECK___builtin_cpu_is ("corei7");
> +  CHECK___builtin_cpu_is ("graniterapids");
> +  cpu_model->__cpu_type = INTEL_COREI7;
> +  cpu_model->__cpu_subtype = INTEL_COREI7_GRANITERAPIDS;
> +  break;
>  case 0x17:
>  case 0x1d:
>/* Penryn.  */
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index 9bcae020a00..c828ae5b7d7 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -1918,6 +1918,7 @@ const char *const processor_names[] =
>"sapphirerapids",
>"alderlake",
>"rocketlake",
> +  "graniterapids",
>"intel",
>"lujiazui",
>"geode",
> @@ -2037,6 +2038,8 @@ const pta processor_alias_table[] =
>  M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
>{"meteorlake", PROCESSOR_ALDERLAKE, CPU_HASWELL, PTA_ALDERLAKE,
>  M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
> +  {"graniterapids", PROCESSOR_GRANITERAPIDS, CPU_HASWELL, PTA_GRANITERAPIDS,
> +M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS), P_PROC_AVX512F},
>{"bonnell", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
>  M_CPU_TYPE (INTEL_BONNELL), P_PROC_SSSE3},
>{"atom", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
> diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
> index 68eda7a8696..c06f089b0c5 100644
> --- a/gcc/common/config/i386/i386-cpuinfo.h
> +++ b/gcc/common/config/i386/i386-cpuinfo.h
> @@ -96,6 +96,7 @@ enum processor_subtypes
>INTEL_COREI7_ROCKETLAKE,
>ZHAOXIN_FAM7H_LUJIAZUI,
>AMDFAM19H_ZNVER4,
> +  INTEL_COREI7_GRANITERAPIDS,
>CPU_SUBTYPE_MAX
>  };
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 5c782b2f298..03c1523f7af 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -668,7 +668,7 @@ silvermont knl knm skylake-avx512 cannonlake 
> icelake-client icelake-server \
>  skylake goldmont goldmont-plus tremont cascadelake tigerlake cooperlake \
>  sapphirerapids alderlake rocketlake eden-x2 nano nano-1000 nano-2000 
> nano-3000 \
>  nano-x2 eden-x4 nano-x4 lujiazui x86-64 x86-64-v2 x86-64-v3 x86
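
The model-number dispatch in the cpuinfo.h hunk above (cases 0xad and 0xae) can be illustrated with a small stand-alone sketch. It decodes the family and model fields from a CPUID leaf 1 EAX signature and checks for the two model numbers the patch associates with Granite Rapids. The function names are hypothetical and this is not the code in cpuinfo.h; it only assumes the standard CPUID signature layout (stepping in bits 3:0, model in 7:4, family in 11:8, extended model in 19:16, extended family in 27:20).

```c
#include <stdint.h>

/* Display family: the extended family field is only added when the
   base family is 0xf.  */
unsigned int
cpuid_family (uint32_t eax)
{
  unsigned int family = (eax >> 8) & 0x0f;
  if (family == 0x0f)
    family += (eax >> 20) & 0xff;
  return family;
}

/* Display model: for base family 0x6 or 0xf the extended model field
   supplies the high nibble.  */
unsigned int
cpuid_model (uint32_t eax)
{
  unsigned int base_family = (eax >> 8) & 0x0f;
  unsigned int model = (eax >> 4) & 0x0f;
  if (base_family == 0x06 || base_family == 0x0f)
    model |= (eax >> 12) & 0xf0;
  return model;
}

/* Return 1 if the signature is family 6 with one of the model numbers
   the patch maps to "graniterapids", 0 otherwise.  */
int
is_graniterapids_signature (uint32_t eax)
{
  if (cpuid_family (eax) != 0x06)
    return 0;
  switch (cpuid_model (eax))
    {
    case 0xad:
    case 0xae:
      return 1;
    default:
      return 0;
    }
}
```

For example, a signature of 0x000A06D0 decodes to family 6, model 0xad, while 0x000306C3 decodes to family 6, model 0x3c and is not matched.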

Re: [Patch] Fortran: Fix reallocation on assignment for kind=4 strings [PR107508]

2022-11-06 Thread Tobias Burnus

Hello,

On 06.11.22 21:32, Mikael Morin wrote:

On 05/11/2022 at 23:28, Tobias Burnus wrote:

OK for mainline?

The trans-array.cc part looks good.
A couple of nits for the trans-expr.cc part:


-  /* Use the rhs string length and the lhs element size.  */
-  size = string_length;
-  tmp = TREE_TYPE (gfc_typenode_for_spec (&expr1->ts));
-  tmp = TYPE_SIZE_UNIT (tmp);
+  /* Use the rhs string length and the lhs element size. Note that 'size' is
+ used below for the string-length comparison, only.  */
+  size = string_length,

s/,/;/ ?

+  tmp = TYPE_SIZE_UNIT (gfc_get_char_type (expr2->ts.kind));

Here you are using the rhs element size, which contradicts the
comment, so there is certainly something to fix here (either the
comment or the code).


I did remove it in between for testing – but obviously completely messed up 
when re-adding it :-/

However, testing indicates that expr1 vs. expr2 does not make a difference for 
the kind calculation:
  character(len=:,kind=1), allocatable :: c1l
  character(len=:,kind=4), allocatable :: c4l
  c1l = c4l
  c4l = c1l
as the code path is different and the result is in either case:
c1l = (character(kind=1)[1:.c1l] *) __builtin_realloc ((void *) c1l, MAX_EXPR <(sizetype) .c4l, 1>);
c4l = (character(kind=4)[1:.c4l] *) __builtin_realloc ((void *) c4l, MAX_EXPR <(sizetype) .c1l * 4, 1>);

Still, matching the comment makes sense.


As for the testcase, do you keep the code commented on purpose?

I think it happened when I did 'git add' after adding the PR to the
testcase, missing the commented lines I added for the explaining dumps :-/

Can some of it be removed or uncommented?


It should be all uncommented, except for the 'print' line.

Updated patch attached; passed quick testing + I will fully regtest it.
— I will commit it, unless more comments come up.

Tobias

PS: Writing patches while being tired works, but writing clean patches
obviously does not.
-
Fortran: Fix reallocation on assignment for kind=4 strings [PR107508]

The check whether reallocation on assignment was required did not handle
kind=4 characters correctly such that there was always a reallocation,
implying issues with pointer addresses and lower bounds.  Additionally,
with all deferred strings, the old memory was not freed on reallocation.
And, finally, inside the block which was only executed if string lengths
or bounds or dynamic types changed, was a subcheck of the same, which
was effectively a no op but still confusing and at least added with -O0
extra instructions to the binary.

	PR fortran/107508

gcc/fortran/ChangeLog:

	* trans-array.cc (gfc_alloc_allocatable_for_assignment): Fix
	string-length check, plug memory leak, and avoid generation of
	effectively no-op code.
	* trans-expr.cc (alloc_scalar_allocatable_for_assignment): Extend
	comment; minor cleanup.

gcc/testsuite/ChangeLog:

	* gfortran.dg/widechar_11.f90: New test.

 gcc/fortran/trans-array.cc| 57 ---
 gcc/fortran/trans-expr.cc |  6 ++--
 gcc/testsuite/gfortran.dg/widechar_11.f90 | 51 +++
 3 files changed, 60 insertions(+), 54 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 514cb057afb..b7d4c41b5fe 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -10527,7 +10527,6 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
   tree offset;
   tree jump_label1;
   tree jump_label2;
-  tree neq_size;
   tree lbd;
   tree class_expr2 = NULL_TREE;
   int n;
@@ -10607,6 +10606,11 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
 	elemsize1 = expr1->ts.u.cl->backend_decl;
   else
 	elemsize1 = lss->info->string_length;
+  tree unit_size = TYPE_SIZE_UNIT (gfc_get_char_type (expr1->ts.kind));
+  elemsize1 = fold_build2_loc (input_location, MULT_EXPR,
+   TREE_TYPE (elemsize1), elemsize1,
+   fold_convert (TREE_TYPE (elemsize1), unit_size));
+
 }
   else if (expr1->ts.type == BT_CLASS)
 {
@@ -10699,19 +10703,7 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
   /* Allocate if data is NULL.  */
   cond_null = fold_build2_loc (input_location, EQ_EXPR, logical_type_node,
 			 array1, build_int_cst (TREE_TYPE (array1), 0));
-
-  if (expr1->ts.type == BT_CHARACTER && expr1->ts.deferred)
-{
-  tmp = fold_build2_loc (input_location, NE_EXPR,
-			 logical_type_node,
-			 lss->info->string_length,
-			 rss->info->string_length);
-  cond_null = fold_build2_loc (input_location, TRUTH_OR_EXPR,
-   logical_type_node, tmp, cond_null);
-  cond_null= gfc_evaluate_now (cond_null, &fblock);
-}
-  else
-cond_null= gfc_evalu

Re: [Patch] Fortran: Fix reallocation on assignment for kind=4 strings [PR107508]

2022-11-06 Thread Mikael Morin

Hello,

On 05/11/2022 at 23:28, Tobias Burnus wrote:

Prior to the attached patch, there is a problem with realloc on assignment
with kind=4 characters as the string length was compared with the byte size,
which was always true.


(...)


OK for mainline?


The trans-array.cc part looks good.
A couple of nits for the trans-expr.cc part:


diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index e7b9211f17e..44c373cc495 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -11236,10 +11236,10 @@ alloc_scalar_allocatable_for_assignment (stmtblock_t *block,
 
   if (expr1->ts.type == BT_CHARACTER && expr1->ts.deferred)

 {
-  /* Use the rhs string length and the lhs element size.  */
-  size = string_length;
-  tmp = TREE_TYPE (gfc_typenode_for_spec (&expr1->ts));
-  tmp = TYPE_SIZE_UNIT (tmp);
+  /* Use the rhs string length and the lhs element size. Note that 'size' is
+used below for the string-length comparison, only.  */
+  size = string_length,

s/,/;/ ?

+  tmp = TYPE_SIZE_UNIT (gfc_get_char_type (expr2->ts.kind));
Here you are using the rhs element size, which contradicts the comment, 
so there is certainly something to fix here (either the comment or the 
code).



   size_in_bytes = fold_build2_loc (input_location, MULT_EXPR,
   TREE_TYPE (tmp), tmp,
   fold_convert (TREE_TYPE (tmp), size));


As for the testcase, do you keep the code commented on purpose?
Can some of it be removed or uncommented?

Mikael
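
The unit mismatch described at the top of this message (a string length in characters compared against a size in bytes) can be sketched in plain C. This is only an illustration of the arithmetic, not the code generated by gfortran; the function names and the `char_size` parameter (1 for kind=1, 4 for kind=4) are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mismatched units: the left-hand side is a length in characters, the
   right-hand side a size in bytes.  For char_size == 4 and any nonzero
   length the two never compare equal, so reallocation is always
   requested.  */
bool
needs_realloc_mixed_units (size_t lhs_len_chars, size_t rhs_len_chars,
                           size_t char_size)
{
  size_t rhs_bytes = rhs_len_chars * char_size;
  return lhs_len_chars != rhs_bytes;
}

/* Consistent units: scale the left-hand length by the character size
   before comparing, so both sides are in bytes.  */
bool
needs_realloc_same_units (size_t lhs_len_chars, size_t rhs_len_chars,
                          size_t char_size)
{
  return lhs_len_chars * char_size != rhs_len_chars * char_size;
}
```

With two kind=4 strings of length 5, the mixed-units check compares 5 against 20 and reports a change, while the same-units check compares 20 against 20 and does not. For kind=1 both checks agree.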


Re: optabs: Variable index vec_set

2022-11-06 Thread Uros Bizjak via Gcc-patches
On Sat, Nov 5, 2022 at 12:25 PM Richard Biener
 wrote:
>
> On Wed, Nov 2, 2022 at 1:46 PM Uros Bizjak  wrote:
> >
> > On Wed, Nov 2, 2022 at 1:45 PM Robin Dapp  wrote:
> > >
> > > > IIRC, I was trying to "fix" modeless operand by giving it a mode, but
> > > > since it made no difference for x86, I later dropped the patch.
> > > > However, operand with a known mode is preferred, so if it works for
> > > > you, just include my patch in your submission. My patch is somehow
> > > > trivial if we want operand to have known mode.
> > >
> > > I'd prefer to push it separately as my patch changes several things in
> > > the s390 backend that are kind of unrelated.  Is it OK to do an x86
> > > bootstrap and regtest and push it if everything looks good?  You can of
> > > course also do it yourself :)
> >
> > It is a middle-end patch, someone will have to approve it.
>
> The patch is OK

Thanks, pushed with the following ChangeLog:

optabs: Use operand[2] mode in can_vec_set_var_idx_p

Use operand[2] mode in can_vec_set_var_idx_p when checking vec_set_optab.

This change allows non-VOID index operand in vec_set_optab.

2022-11-06  Uroš Bizjak  

gcc/ChangeLog:

* optabs.cc (can_vec_set_var_idx_p): Use operand[2]
mode when checking vec_set_optab.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index c2a6f971d74..9fc9b1fc6e9 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -4344,12 +4344,17 @@ can_vec_set_var_idx_p (machine_mode vec_mode)
 return false;
 
   machine_mode inner_mode = GET_MODE_INNER (vec_mode);
+
   rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
   rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
-  rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
 
   enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
 
+  const struct insn_data_d *data = &insn_data[icode];
+  machine_mode idx_mode = data->operand[2].mode;
+
+  rtx reg3 = alloca_raw_REG (idx_mode, LAST_VIRTUAL_REGISTER + 3);
+
   return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, reg1)
 && insn_operand_matches (icode, 1, reg2)
 && insn_operand_matches (icode, 2, reg3);
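
For context, the source-level construct that a variable-index vec_set serves looks like the following, using the GNU C vector extension. This is a hedged sketch: the type `v4si` and the helper names are hypothetical, and whether a given target expands the store through vec_set_optab or through some other sequence depends on the backend.

```c
/* Four 32-bit integers in one 16-byte vector (GNU C extension).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Store VALUE into the lane selected at run time.  The index is masked
   so that it always names a valid lane.  */
v4si
set_lane (v4si v, int value, unsigned int idx)
{
  v[idx & 3] = value;
  return v;
}

/* Read back the lane selected at run time.  */
int
get_lane (v4si v, unsigned int idx)
{
  return v[idx & 3];
}
```

Because `idx` is not a compile-time constant, the compiler cannot pick a fixed insert instruction and has to rely on a pattern that accepts the index as a register operand, which is the operand whose mode the patch now takes from the insn data.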


[PATCH] Use bit-CCP in range-ops.

2022-11-06 Thread Aldy Hernandez via Gcc-patches
After Jakub and Richi's suggestion of using the same representation
for tracking known bits as we do in CCP, I took a peek at the code and
realized there's a plethora of bit-tracking code there that we could
be sharing with range-ops.  For example, the multiplication
optimizations are way better than what I had cobbled together.  For
that matter, our maybe nonzero tracking as a whole has a lot of room
for improvement.  Being the lazy ass that I am, I think we should just
use one code base (CCP's).

This patch provides a thin wrapper for converting the irange maybe
nonzero bits to what CCP requires, and uses that to call into
bit_value_binop().  I have so far converted the MULT_EXPR range-op
entry to use it, as the DIV_EXPR entry we have gets a case CCP doesn't
get so I'd like to contribute the enhancement to CCP before converting
over.

I'd like to use this approach with the dozen or so tree_code's that
are handled in CCP, thus saving us from having to implement any of
them :).

Early next season I'd like to change irange's internal representation
to a pair of value / mask, and start tracking all known bits.  This
ties in nicely with our plan for tracking known set bits.

Perhaps if the stars align, we could merge the bit twiddling in CCP
into range-ops and have a central repository for it.  That is, once we
make the switch to wide-ints, and assuming there are no performance
issues.  Note that range-ops is our lowest level abstraction.
i.e. it's just the math, there's no GORI or ranger, or even the
concept of a symbolic or SSA.

I'd love to hear comments and ideas, and if no one objects push this.
Please let me know if I missed anything.

Tested on x86-64 Linux.

gcc/ChangeLog:

* range-op.cc (irange_to_masked_value): New.
(update_known_bitmask): New.
(operator_mult::fold_range): Call update_known_bitmask.
---
 gcc/range-op.cc | 63 +++--
 1 file changed, 50 insertions(+), 13 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 25c004d8287..6d9914d8d12 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -46,6 +46,54 @@ along with GCC; see the file COPYING3.  If not see
 #include "wide-int.h"
 #include "value-relation.h"
 #include "range-op.h"
+#include "tree-ssa-ccp.h"
+
+// Convert irange bitmasks into a VALUE MASK pair suitable for calling CCP.
+
+static void
+irange_to_masked_value (const irange &r, widest_int &value, widest_int &mask)
+{
+  if (r.singleton_p ())
+{
+  mask = 0;
+  value = widest_int::from (r.lower_bound (), TYPE_SIGN (r.type ()));
+}
+  else
+{
+  mask = widest_int::from (r.get_nonzero_bits (), TYPE_SIGN (r.type ()));
+  value = 0;
+}
+}
+
+// Update the known bitmasks in R when applying the operation CODE to
+// LH and RH.
+
+static void
+update_known_bitmask (irange &r, tree_code code,
+ const irange &lh, const irange &rh)
+{
+  if (r.undefined_p ())
+return;
+
+  widest_int value, mask, lh_mask, rh_mask, lh_value, rh_value;
+  tree type = r.type ();
+  signop sign = TYPE_SIGN (type);
+  int prec = TYPE_PRECISION (type);
+  signop lh_sign = TYPE_SIGN (lh.type ());
+  signop rh_sign = TYPE_SIGN (rh.type ());
+  int lh_prec = TYPE_PRECISION (lh.type ());
+  int rh_prec = TYPE_PRECISION (rh.type ());
+
+  irange_to_masked_value (lh, lh_value, lh_mask);
+  irange_to_masked_value (rh, rh_value, rh_mask);
+  bit_value_binop (code, sign, prec, &value, &mask,
+  lh_sign, lh_prec, lh_value, lh_mask,
+  rh_sign, rh_prec, rh_value, rh_mask);
+
+  int_range<2> tmp (type);
+  tmp.set_nonzero_bits (value | mask);
+  r.intersect (tmp);
+}
 
 // Return the upper limit for a type.
 
@@ -1774,21 +1822,10 @@ operator_mult::fold_range (irange &r, tree type,
   if (!cross_product_operator::fold_range (r, type, lh, rh, trio))
 return false;
 
-  if (lh.undefined_p ())
+  if (lh.undefined_p () || rh.undefined_p ())
 return true;
 
-  tree t;
-  if (rh.singleton_p (&t))
-{
-  wide_int w = wi::to_wide (t);
-  int shift = wi::exact_log2 (w);
-  if (shift != -1)
-   {
- wide_int nz = lh.get_nonzero_bits ();
- nz = wi::lshift (nz, shift);
- r.set_nonzero_bits (nz);
-   }
-}
+  update_known_bitmask (r, MULT_EXPR, lh, rh);
   return true;
 }
 
-- 
2.38.1
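
To make the value/mask representation mentioned above concrete, here is a small stand-alone sketch on 32-bit unsigned values. It mirrors the conversion in `irange_to_masked_value` (a constant has no unknown bits; otherwise every "maybe nonzero" bit is unknown and the rest are known zero) and the final `value | mask` step. The struct, the function names, and the two transfer functions are assumptions written for illustration; the bitwise-AND rule is the usual known-bits rule and is not copied from tree-ssa-ccp.cc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A bit set in MASK is unknown.  Where MASK is clear, the matching bit
   of VALUE is the known bit.  */
struct known_bits
{
  uint32_t value;
  uint32_t mask;
};

/* Build a lattice value from range information, in the spirit of the
   wrapper in the patch.  */
struct known_bits
from_range (bool singleton, uint32_t constant, uint32_t maybe_nonzero_bits)
{
  struct known_bits k;
  if (singleton)
    {
      k.value = constant;
      k.mask = 0;
    }
  else
    {
      k.value = 0;
      k.mask = maybe_nonzero_bits;
    }
  return k;
}

/* Bitwise AND: a result bit is unknown only if it is unknown in some
   input and not forced to zero by a known-zero bit in either input.  */
struct known_bits
known_and (struct known_bits a, struct known_bits b)
{
  struct known_bits r;
  r.mask = (a.mask | b.mask) & (a.value | a.mask) & (b.value | b.mask);
  r.value = a.value & b.value;
  return r;
}

/* Multiplication by 2**SHIFT: known and unknown bits move left together,
   which is the special case the removed range-op code handled.  */
struct known_bits
known_shl (struct known_bits a, unsigned int shift)
{
  assert (shift < 32);
  struct known_bits r;
  r.value = a.value << shift;
  r.mask = a.mask << shift;
  return r;
}

/* Bits that may be nonzero: known ones plus unknown bits.  */
uint32_t
maybe_nonzero (struct known_bits k)
{
  return k.value | k.mask;
}
```

For an operand with maybe-nonzero bits 0x0F, multiplying by 4 yields maybe-nonzero bits 0x3C, and ANDing it with the constant 0x3C yields 0x0C.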



[PATCH] i386: Prefer remote atomic insn for atomic_fetch{add, and, or, xor}

2022-11-06 Thread Kong, Lingling via Gcc-patches
Hi

This patch adds the flag -mprefer-remote-atomic to control whether
RAO-INT insns are generated for atomic operations.
Ok for trunk?

BRs,
Lingling

gcc/ChangeLog:

* config/i386/i386.opt: Add -mprefer-remote-atomic.
* config/i386/sync.md (atomic_<plus_logic><mode>):
New define_expand.
(atomic_add<mode>): Rename to below one.
(atomic_add<mode>_1): To this.
(atomic_<logic><mode>): Ditto.
(atomic_<logic><mode>_1): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/raoint-atomic-fetch.c: New test.
---
 gcc/config/i386/i386.opt  |  4 +++
 gcc/config/i386/sync.md   | 29 ---
 .../gcc.target/i386/raoint-atomic-fetch.c | 29 +++
 3 files changed, 58 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/raoint-atomic-fetch.c

diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 415c52e1bb4..abb1e5ecbdc 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1246,3 +1246,7 @@ Support PREFETCHI built-in functions and code generation.
 mraoint
 Target Mask(ISA2_RAOINT) Var(ix86_isa_flags2) Save
 Support RAOINT built-in functions and code generation.
+
+mprefer-remote-atomic
+Target Var(flag_prefer_remote_atomic) Init(0)
+Prefer use remote atomic insn for atomic operations.
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index e6543a5efb0..08e944fc9b7 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -37,7 +37,7 @@
   UNSPECV_CMPXCHG
   UNSPECV_XCHG
   UNSPECV_LOCK
- 
+
   ;; For CMPccXADD support
   UNSPECV_CMPCCXADD
 
@@ -791,7 +791,28 @@
 (define_code_iterator any_plus_logic [and ior xor plus])  (define_code_attr 
plus_logic [(and "and") (ior "or") (xor "xor") (plus "add")])
 
-(define_insn "rao_a"
+(define_expand "atomic_"
+  [(match_operand:SWI 0 "memory_operand")
+   (any_plus_logic:SWI (match_dup 0)
+  (match_operand:SWI 1 "nonmemory_operand"))
+   (match_operand:SI 2 "const_int_operand")]
+  ""
+{
+  if (flag_prefer_remote_atomic
+  && TARGET_RAOINT && operands[2] == const0_rtx
+  && (mode == SImode || mode == DImode))
+  {
+if (CONST_INT_P (operands[1]))
+  operands[1] = force_reg (mode, operands[1]);
+emit_insn (maybe_gen_rao_a (, mode, operands[0], 
+operands[1]));
+  }
+  else
+emit_insn (gen_atomic__1 (operands[0], operands[1],
+   operands[2]));
+  DONE;
+})
+
+(define_insn "@rao_a"
   [(set (match_operand:SWI48 0 "memory_operand" "+m")
(unspec_volatile:SWI48
  [(any_plus_logic:SWI48 (match_dup 0)
@@ -801,7 +822,7 @@
   "TARGET_RAOINT"
   "a\t{%1, %0|%0, %1}")
 
-(define_insn "atomic_add"
+(define_insn "atomic_add_1"
   [(set (match_operand:SWI 0 "memory_operand" "+m")
(unspec_volatile:SWI
  [(plus:SWI (match_dup 0)
@@ -855,7 +876,7 @@
   return "lock{%;} %K2sub{}\t{%1, %0|%0, %1}";
 })
 
-(define_insn "atomic_"
+(define_insn "atomic__1"
   [(set (match_operand:SWI 0 "memory_operand" "+m")
(unspec_volatile:SWI
  [(any_logic:SWI (match_dup 0)
diff --git a/gcc/testsuite/gcc.target/i386/raoint-atomic-fetch.c b/gcc/testsuite/gcc.target/i386/raoint-atomic-fetch.c
new file mode 100644
index 000..ac4099d888e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/raoint-atomic-fetch.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mraoint -O2 -mprefer-remote-atomic" } */
+/* { dg-final { scan-assembler-times "aadd" 2 { target {! ia32 } } } } */
+/* { dg-final { scan-assembler-times "aand" 2 { target {! ia32 } } } } */
+/* { dg-final { scan-assembler-times "aor" 2 { target {! ia32 } } } } */
+/* { dg-final { scan-assembler-times "axor" 2 { target {! ia32 } } } } */
+/* { dg-final { scan-assembler-times "aadd" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "aand" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "aor" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "axor" 1 { target ia32 } } } */
+volatile int x;
+volatile long long y;
+int *a;
+long long *b;
+
+void extern
+rao_int_test (void)
+{
+  __atomic_add_fetch (a, x, __ATOMIC_RELAXED);
+  __atomic_and_fetch (a, x, __ATOMIC_RELAXED);
+  __atomic_or_fetch (a, x, __ATOMIC_RELAXED);
+  __atomic_xor_fetch (a, x, __ATOMIC_RELAXED);
+#ifdef __x86_64__
+  __atomic_add_fetch (b, y, __ATOMIC_RELAXED);
+  __atomic_and_fetch (b, y, __ATOMIC_RELAXED);
+  __atomic_or_fetch (b, y, __ATOMIC_RELAXED);
+  __atomic_xor_fetch (b, y, __ATOMIC_RELAXED);
+#endif
+}
--
2.27.0
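
For readers unfamiliar with the kind of source code the new expander
targets, here is a minimal sketch in C. It is not taken from the patch, and
the variable and function names are hypothetical. It assumes, from the
condition in the define_expand above, that only relaxed-order 32-bit and
64-bit operations whose result is unused are candidates. Whether they are
really emitted as aadd/aor depends on building with -mraoint
-mprefer-remote-atomic on a compiler that includes this patch; with any
other compiler the code still behaves the same, just with the usual
lock-prefixed instructions.

```c
/* Hypothetical counters for illustration; not part of the patch.  */
static int counter32;
static long long counter64;

/* Relaxed ordering, result discarded: the shape the new expander
   checks for (operands[2] == const0_rtx, SImode or DImode).  */
void
bump_relaxed (int v32, long long v64)
{
  __atomic_fetch_add (&counter32, v32, __ATOMIC_RELAXED);
  __atomic_fetch_or (&counter64, v64, __ATOMIC_RELAXED);
}

/* Result used: the old value has to come back to the caller, so this
   is assumed to keep using the existing fetch patterns.  */
int
bump_and_read (int v32)
{
  return __atomic_add_fetch (&counter32, v32, __ATOMIC_RELAXED);
}
```

Calling bump_relaxed (5, 0x10) followed by bump_and_read (2) leaves
counter32 at 7 and counter64 at 0x10 regardless of which instructions the
compiler selects; the flag only changes instruction selection, not the
visible result.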



[PATCH] Support Intel RAO-INT

2022-11-06 Thread Kong, Lingling via Gcc-patches
Hi,
This patch adds support for Intel RAO-INT.

The information is based on the newly released
Intel Architecture Instruction Set Extensions and Future Features.

The document is available at:
https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

OK for trunk?
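
A rough sketch of how user code might consume this once it is in, under
some assumptions: the ChangeLog below says __RAOINT__ is defined with
-mraoint and that x86gprintrin.h includes raointintrin.h, but the intrinsic
name _aadd_i32 is not shown in this message and is assumed here from the
header as shipped in GCC. The wrapper add_relaxed is hypothetical. Without
-mraoint the sketch falls back to the generic __atomic builtin, so it
compiles and runs on any x86 or non-x86 target.

```c
#if defined (__RAOINT__)
#include <x86gprintrin.h>
#endif

/* Hypothetical wrapper: add V to *P atomically without returning the
   old value.  */
void
add_relaxed (int *p, int v)
{
#if defined (__RAOINT__)
  /* Assumed intrinsic name from raointintrin.h; needs RAO-INT hardware.  */
  _aadd_i32 (p, v);
#else
  /* Portable fallback when the compiler is not targeting RAO-INT.  */
  __atomic_fetch_add (p, v, __ATOMIC_RELAXED);
#endif
}
```

Guarding on __RAOINT__ keeps the choice at compile time; a runtime check
would instead go through the cpuinfo feature bit the patch adds.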

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features):
Detect raoint.
* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_RAOINT_SET,
OPTION_MASK_ISA2_RAOINT_UNSET): New.
(ix86_handle_option): Handle -mraoint.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_RAOINT.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
raoint.
* config.gcc: Add raointintrin.h
* config/i386/cpuid.h (bit_RAOINT): New.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-c.cc (ix86_target_macros_internal): Define
__RAOINT__.
* config/i386/i386-isa.def (RAOINT): Add DEF_PTA(RAOINT).
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Add -mraoint.
* config/i386/sync.md (rao_a): New define insn.
* config/i386/i386.opt: Add option -mraoint.
* config/i386/x86gprintrin.h: Include raointintrin.h.
* doc/extend.texi: Document raoint.
* doc/invoke.texi: Document -mraoint.
* doc/sourcebuild.texi: Document target raoint.
* config/i386/raointintrin.h: New file.

gcc/testsuite/ChangeLog:

* g++.dg/other/i386-2.C: Add -mraoint.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/sse-12.c: Add -mraoint.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add raoint target.
* gcc.target/i386/sse-23.c: Ditto.
* lib/target-supports.exp: Add check_effective_target_raoint.
* gcc.target/i386/rao-helper.h: New test.
* gcc.target/i386/raoint-1.c: Ditto.
* gcc.target/i386/raoint-aadd-2.c: Ditto.
* gcc.target/i386/raoint-aand-2.c: Ditto.
* gcc.target/i386/raoint-aor-2.c: Ditto.
* gcc.target/i386/raoint-axor-2.c: Ditto.
* gcc.target/i386/x86gprintrin-1.c: Ditto.
* gcc.target/i386/x86gprintrin-2.c: Ditto.
* gcc.target/i386/x86gprintrin-3.c: Ditto.
* gcc.target/i386/x86gprintrin-4.c: Ditto.
* gcc.target/i386/x86gprintrin-5.c: Ditto.
---
 gcc/common/config/i386/cpuinfo.h  |   2 +
 gcc/common/config/i386/i386-common.cc |  15 +++
 gcc/common/config/i386/i386-cpuinfo.h |   1 +
 gcc/common/config/i386/i386-isas.h|   1 +
 gcc/config.gcc|   3 +-
 gcc/config/i386/cpuid.h   |   1 +
 gcc/config/i386/i386-builtin.def  |  10 ++
 gcc/config/i386/i386-c.cc |   2 +
 gcc/config/i386/i386-isa.def  |   1 +
 gcc/config/i386/i386-options.cc   |   4 +-
 gcc/config/i386/i386.opt  |   4 +
 gcc/config/i386/raointintrin.h| 101 ++
 gcc/config/i386/sync.md   |  16 +++
 gcc/config/i386/x86gprintrin.h|   2 +
 gcc/doc/extend.texi   |   5 +
 gcc/doc/invoke.texi   |  11 +-
 gcc/doc/sourcebuild.texi  |   3 +
 gcc/testsuite/g++.dg/other/i386-2.C   |   2 +-
 gcc/testsuite/g++.dg/other/i386-3.C   |   2 +-
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
 gcc/testsuite/gcc.target/i386/rao-helper.h|  79 ++
 gcc/testsuite/gcc.target/i386/raoint-1.c  |  31 ++
 gcc/testsuite/gcc.target/i386/raoint-aadd-2.c |  24 +
 gcc/testsuite/gcc.target/i386/raoint-aand-2.c |  25 +
 gcc/testsuite/gcc.target/i386/raoint-aor-2.c  |  25 +
 gcc/testsuite/gcc.target/i386/raoint-axor-2.c |  25 +
 gcc/testsuite/gcc.target/i386/sse-12.c|   2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c|   2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c|   2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c|   4 +-
 gcc/testsuite/gcc.target/i386/sse-23.c|   2 +-
 .../gcc.target/i386/x86gprintrin-1.c  |   2 +-
 .../gcc.target/i386/x86gprintrin-2.c  |   2 +-
 .../gcc.target/i386/x86gprintrin-3.c  |   2 +-
 .../gcc.target/i386/x86gprintrin-4.c  |   4 +-
 .../gcc.target/i386/x86gprintrin-5.c  |   4 +-
 gcc/testsuite/lib/target-supports.exp |  11 ++
 37 files changed, 413 insertions(+), 21 deletions(-)
 create mode 100644 gcc/config/i386/raointintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/rao-helper.h
 create mode 100644 gcc/testsuite/gcc.target/i386/raoint-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/rao

[PATCH] RISC-V: Add RVV registers register spilling

2022-11-06 Thread juzhe . zhong
From: Ju-Zhe Zhong 

This patch supports RVV scalable register spilling.
The prologue and epilogue handling is based on a prototype from Monk Chiang.
Co-authored-by: Monk Chiang

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_pred_move): Adjust for scalable 
register spilling.
(legitimize_move): Ditto.
* config/riscv/riscv.cc (riscv_v_adjust_scalable_frame): New function.
(riscv_first_stack_step): Adjust for scalable register spilling.
(riscv_expand_prologue): Ditto.
(riscv_expand_epilogue): Ditto.
(riscv_dwarf_poly_indeterminate_value): New function.
(TARGET_DWARF_POLY_INDETERMINATE_VALUE): New target hook support for 
register spilling.
* config/riscv/riscv.h (RISCV_DWARF_VLENB): New macro.
(RISCV_PROLOGUE_TEMP2_REGNUM): Ditto.
(RISCV_PROLOGUE_TEMP2): Ditto.
* config/riscv/vector-iterators.md: New iterators.
* config/riscv/vector.md (*mov): Fix it for register spilling.
(*mov_whole): New pattern.
(*mov_fract): New pattern.
(@pred_mov): Fix it for register spilling.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/mov-9.c:
* gcc.target/riscv/rvv/base/macro.h: New test.
* gcc.target/riscv/rvv/base/spill-1.c: New test.
* gcc.target/riscv/rvv/base/spill-10.c: New test.
* gcc.target/riscv/rvv/base/spill-11.c: New test.
* gcc.target/riscv/rvv/base/spill-12.c: New test.
* gcc.target/riscv/rvv/base/spill-2.c: New test.
* gcc.target/riscv/rvv/base/spill-3.c: New test.
* gcc.target/riscv/rvv/base/spill-4.c: New test.
* gcc.target/riscv/rvv/base/spill-5.c: New test.
* gcc.target/riscv/rvv/base/spill-6.c: New test.
* gcc.target/riscv/rvv/base/spill-7.c: New test.
* gcc.target/riscv/rvv/base/spill-8.c: New test.
* gcc.target/riscv/rvv/base/spill-9.c: New test.

---
 gcc/config/riscv/riscv-v.cc   |  47 +--
 gcc/config/riscv/riscv.cc | 147 ++-
 gcc/config/riscv/riscv.h  |   3 +
 gcc/config/riscv/vector-iterators.md  |  23 ++
 gcc/config/riscv/vector.md| 136 +--
 .../gcc.target/riscv/rvv/base/macro.h |   6 +
 .../gcc.target/riscv/rvv/base/mov-9.c |   8 +-
 .../gcc.target/riscv/rvv/base/spill-1.c   | 385 ++
 .../gcc.target/riscv/rvv/base/spill-10.c  |  41 ++
 .../gcc.target/riscv/rvv/base/spill-11.c  |  60 +++
 .../gcc.target/riscv/rvv/base/spill-12.c  |  47 +++
 .../gcc.target/riscv/rvv/base/spill-2.c   | 320 +++
 .../gcc.target/riscv/rvv/base/spill-3.c   | 254 
 .../gcc.target/riscv/rvv/base/spill-4.c   | 196 +
 .../gcc.target/riscv/rvv/base/spill-5.c   | 130 ++
 .../gcc.target/riscv/rvv/base/spill-6.c   | 101 +
 .../gcc.target/riscv/rvv/base/spill-7.c   | 114 ++
 .../gcc.target/riscv/rvv/base/spill-8.c   |  51 +++
 .../gcc.target/riscv/rvv/base/spill-9.c   |  42 ++
 19 files changed, 2021 insertions(+), 90 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/macro.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-9.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 6615a5c7ffe..e0459e3f610 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -106,28 +106,25 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
 
 /* Emit an RVV unmask && vl mov from SRC to DEST.  */
 static void
-emit_pred_move (rtx dest, rtx src, rtx vl, machine_mode mask_mode)
+emit_pred_move (rtx dest, rtx src, machine_mode mask_mode)
 {
   insn_expander<7> e;
-
   machine_mode mode = GET_MODE (dest);
-  if (register_operand (src, mode) && register_operand (dest, mode))
-{
-  emit_move_insn (dest, src);
-  return;
-}
+  rtx vl = gen_reg_rtx (Pmode);
+  unsigned int sew = GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
+  ? 8
+  : GET_MODE_BITSIZE (GET_MODE_INNER (mode));
+
+  emit_insn (gen_vsetvl_no_side_effects (
+Pmode, vl, gen_rtx_REG (Pmode, 0), gen_int_mode (sew, Pmode),
+gen_int_mo

Re: [PATCH] LoongArch: fix signed overflow in loongarch_emit_int_compare

2022-11-06 Thread Xi Ruoyao via Gcc-patches
On Sun, 2022-11-06 at 09:46 +0800, Lulu Cheng wrote:
> I think it should be here:
> 
>   if (!increment && !decrement)
>  continue;
> 
> + if ((increment && rhs == HOST_WIDE_INT_MAX)
> + || (decrement && rhs == HOST_WIDE_INT_MIN))
> +   break;
> +
> 
> It is not necessary to continue when *code matches one of 
> mag_comparisons[i].

Ah yes, I misread the code :(.
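
To make the point of the guard concrete, here is a small standalone sketch
in C. It is not the LoongArch code: long long stands in for HOST_WIDE_INT
and the helper adjust_rhs is hypothetical. It only shows why the bound has
to be tested before rhs is adjusted, since rhs + 1 at the maximum (or
rhs - 1 at the minimum) would be signed overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: try to move RHS by one in the requested
   direction.  Returns false, leaving *ADJUSTED untouched, when no
   adjustment is requested or when it would overflow.  */
bool
adjust_rhs (long long rhs, bool increment, bool decrement,
            long long *adjusted)
{
  if (!increment && !decrement)
    return false;

  /* Same shape as the suggested check: bail out before the arithmetic.  */
  if ((increment && rhs == LLONG_MAX)
      || (decrement && rhs == LLONG_MIN))
    return false;

  *adjusted = increment ? rhs + 1 : rhs - 1;
  return true;
}
```

In the quoted loop the bail-out is a break rather than a return, because
once *code has matched an entry no later entry needs to be examined.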

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] RISC-V: Fix RVV related testsuite

2022-11-06 Thread Andreas Schwab
Perhaps rvv.exp should add -I. so that the wrapper is found regardless?

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."