[committed] i386/testsuite: Add testcase for fixed PR [PR51492]

2024-07-30 Thread Uros Bizjak
PR target/51492

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr51492.c: New test.

Tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/testsuite/gcc.target/i386/pr51492.c b/gcc/testsuite/gcc.target/i386/pr51492.c
new file mode 100644
index 000..0892e0c79a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr51492.c
@@ -0,0 +1,19 @@
+/* PR target/51492 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+
+#define SIZE 65536
+#define WSIZE 64
+unsigned short head[SIZE] __attribute__((aligned(64)));
+
+void
+f(void)
+{
+  for (unsigned n = 0; n < SIZE; ++n) {
+    unsigned short m = head[n];
+    head[n] = (unsigned short)(m >= WSIZE ? m-WSIZE : 0);
+  }
+}
+
+/* { dg-final { scan-assembler "psubusw" } } */
+/* { dg-final { scan-assembler-not "paddw" } } */


[gcc r15-2419] i386/testsuite: Add testcase for fixed PR [PR51492]

2024-07-30 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:8b737ec289da83e9e2a9672be0336980616e8932

commit r15-2419-g8b737ec289da83e9e2a9672be0336980616e8932
Author: Uros Bizjak 
Date:   Tue Jul 30 20:02:36 2024 +0200

i386/testsuite: Add testcase for fixed PR [PR51492]

PR target/51492

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr51492.c: New test.

Diff:
---
 gcc/testsuite/gcc.target/i386/pr51492.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/pr51492.c b/gcc/testsuite/gcc.target/i386/pr51492.c
new file mode 100644
index ..0892e0c79a7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr51492.c
@@ -0,0 +1,19 @@
+/* PR target/51492 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+
+#define SIZE 65536
+#define WSIZE 64
+unsigned short head[SIZE] __attribute__((aligned(64)));
+
+void
+f(void)
+{
+  for (unsigned n = 0; n < SIZE; ++n) {
+    unsigned short m = head[n];
+    head[n] = (unsigned short)(m >= WSIZE ? m-WSIZE : 0);
+  }
+}
+
+/* { dg-final { scan-assembler "psubusw" } } */
+/* { dg-final { scan-assembler-not "paddw" } } */


Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-30 Thread Uros Bizjak
On Tue, Jul 30, 2024 at 3:00 PM Richard Biener  wrote:
>
> On Tue, 30 Jul 2024, Alexander Monakov wrote:
>
> >
> > On Tue, 30 Jul 2024, Richard Biener wrote:
> >
> > > > Oh, and please add a small comment why we don't use XFmode here.
> > >
> > > Will do.
> > >
> > > /* Do not enable XFmode, there is padding in it and it suffers
> > >from normalization upon load like SFmode and DFmode when
> > >not using SSE.  */
> >
> > Is it really true? I have no evidence of FLDT performing normalization
> > (as mentioned in PR 114659, if it did, there would be no way to spill/reload
> > x87 registers).
>
> What mangling fld performs depends on the contents of the FP control
> word which is awkward.  IIRC there's at least a bugreport that it
> turns sNaN into a qNaN, it seems I was wrong about denormals
> (when DM is not masked).  And yes, IIRC x87 instability is also
> related to spills (IIRC we spill in the actual mode of the reg, not in
> XFmode), but -fexcess-precision=standard should hopefully avoid that.
> It's also not clear whether all implementations conformed to the
> specs wrt extended-precision format loads.

FYI, FLDT does not mangle long-double values and does not generate
exceptions. Please see [1], but ignore shadowed text and instead read
the "Floating-Point Exceptions" section. So, as far as hardware is
concerned, it *can* be used to transfer 10-byte values, but I don't
want to judge from the compiler PoV if this is the way to go. We can
enable it, perhaps temporarily to experiment a bit - it is easy to
disable if it causes problems.

Let's CC Intel folks for their opinion on whether it is worth using the
aging x87 to transfer 80-bit data.

[1] https://www.felixcloutier.com/x86/fld

Uros.


Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-30 Thread Uros Bizjak
On Tue, Jul 30, 2024 at 1:07 PM Uros Bizjak  wrote:
>
> On Tue, Jul 30, 2024 at 12:18 PM Richard Biener  wrote:
> >
> > The following implements the hook, excluding x87 modes for scalar
> > and complex float modes.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK?
> >
> > Thanks,
> > Richard.
> >
> > * i386.cc (TARGET_MODE_CAN_TRANSFER_BITS): Define.
> > (ix86_mode_can_transfer_bits): New function.
> > ---
> >  gcc/config/i386/i386.cc | 21 +++++++++++++++++++++
> >  1 file changed, 21 insertions(+)
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 12d15feb5e9..5184366916b 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -26113,6 +26113,24 @@ ix86_have_ccmp ()
> >return (bool) TARGET_APX_CCMP;
> >  }
> >
> > +/* Implement TARGET_MODE_CAN_TRANSFER_BITS.  */
> > +static bool
> > +ix86_mode_can_transfer_bits (machine_mode mode)
> > +{
> > +  if (GET_MODE_CLASS (mode) == MODE_FLOAT
> > +  || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT)
> > +switch (GET_MODE_INNER (mode))
> > +  {
> > +  case SFmode:
> > +  case DFmode:
> > +   return TARGET_SSE_MATH && !TARGET_MIX_SSE_I387;
>
> This can be simplified to:
>
> return !(ix86_fpmath & FPMATH_387);
>
> (Which implies that we should introduce TARGET_I387_MATH to parallel
> TARGET_SSE_MATH some day...)
>
> > +  default:
> > +   return false;
>
> We don't want to enable HFmode for transfers?

Oh, and please add a small comment why we don't use XFmode here.

Uros.


Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-30 Thread Uros Bizjak
On Tue, Jul 30, 2024 at 12:18 PM Richard Biener  wrote:
>
> The following implements the hook, excluding x87 modes for scalar
> and complex float modes.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK?
>
> Thanks,
> Richard.
>
> * i386.cc (TARGET_MODE_CAN_TRANSFER_BITS): Define.
> (ix86_mode_can_transfer_bits): New function.
> ---
>  gcc/config/i386/i386.cc | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 12d15feb5e9..5184366916b 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -26113,6 +26113,24 @@ ix86_have_ccmp ()
>return (bool) TARGET_APX_CCMP;
>  }
>
> +/* Implement TARGET_MODE_CAN_TRANSFER_BITS.  */
> +static bool
> +ix86_mode_can_transfer_bits (machine_mode mode)
> +{
> +  if (GET_MODE_CLASS (mode) == MODE_FLOAT
> +  || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT)
> +switch (GET_MODE_INNER (mode))
> +  {
> +  case SFmode:
> +  case DFmode:
> +   return TARGET_SSE_MATH && !TARGET_MIX_SSE_I387;

This can be simplified to:

return !(ix86_fpmath & FPMATH_387);

(Which implies that we should introduce TARGET_I387_MATH to parallel
TARGET_SSE_MATH some day...)

> +  default:
> +   return false;

We don't want to enable HFmode for transfers?

Uros.

> +  }
> +
> +  return true;
> +}
> +
>  /* Target-specific selftests.  */
>
>  #if CHECKING_P
> @@ -26959,6 +26977,9 @@ ix86_libgcc_floating_mode_supported_p
>  #undef TARGET_HAVE_CCMP
>  #define TARGET_HAVE_CCMP ix86_have_ccmp
>
> +#undef TARGET_MODE_CAN_TRANSFER_BITS
> +#define TARGET_MODE_CAN_TRANSFER_BITS ix86_mode_can_transfer_bits
> +
>  static bool
>  ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
>  {
> --
> 2.43.0
>


Re: [PATCH v2] i386: Change prefetchi output template

2024-07-22 Thread Uros Bizjak
On Tue, Jul 23, 2024 at 4:59 AM Haochen Jiang  wrote:
>
> Hi all,
>
> I tested with %a and it works. Therefore I suppose it is a better solution.
>
> Bootstrapped and regtested on x86-64-pc-linux-gnu. Ok for trunk and backport
> to GCC 13 and 14?

OK, also for backports.

Thanks,
Uros.

>
> Thx,
> Haochen
>
> ---
>
> Changes in v2: Use %a in pattern
>
> ---
>
> For prefetchi instructions, a RIP-relative address is explicitly required
> for the operand, and the assembler enforces that rule strictly. This makes
> an instruction like:
>
> prefetchit0 bar
>
> illegal for the assembler, even though it should be a common use of prefetchi.
>
> Change the output template to %a to explicitly add (%rip) after the function
> label, so the assembler accepts it and the linker can resolve the real address.
>
> gcc/ChangeLog:
>
> * config/i386/i386.md (prefetchi): Change to %a.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/prefetchi-1.c: Check (%rip).
> ---
>  gcc/config/i386/i386.md | 2 +-
>  gcc/testsuite/gcc.target/i386/prefetchi-1.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 90d3aa450f0..6207036a2a0 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -28004,7 +28004,7 @@
>"TARGET_PREFETCHI && TARGET_64BIT"
>  {
>static const char * const patterns[2] = {
> -"prefetchit1\t%0", "prefetchit0\t%0"
> +"prefetchit1\t%a0", "prefetchit0\t%a0"
>};
>
>int locality = INTVAL (operands[1]);
> diff --git a/gcc/testsuite/gcc.target/i386/prefetchi-1.c b/gcc/testsuite/gcc.target/i386/prefetchi-1.c
> index 80f25e70e8e..03dfdc55e86 100644
> --- a/gcc/testsuite/gcc.target/i386/prefetchi-1.c
> +++ b/gcc/testsuite/gcc.target/i386/prefetchi-1.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target { ! ia32 } } } */
>  /* { dg-options "-mprefetchi -O2" } */
> -/* { dg-final { scan-assembler-times "\[ \\t\]+prefetchit0\[ \\t\]+" 2 } } */
> -/* { dg-final { scan-assembler-times "\[ \\t\]+prefetchit1\[ \\t\]+" 2 } } */
> +/* { dg-final { scan-assembler-times "\[ \\t\]+prefetchit0\[ \\t\]+bar\\(%rip\\)" 2 } } */
> +/* { dg-final { scan-assembler-times "\[ \\t\]+prefetchit1\[ \\t\]+bar\\(%rip\\)" 2 } } */
>
>  #include 
>
> --
> 2.31.1
>


Re: [PATCH] Relax ix86_hardreg_mov_ok after split1.

2024-07-22 Thread Uros Bizjak
On Tue, Jul 23, 2024 at 3:08 AM liuhongt  wrote:
>
> ix86_hardreg_mov_ok was added by r11-5066-gbe39636d9f68c4
>
> >The solution proposed here is to have the x86 backend/recog prevent
> >early RTL passes composing instructions (that set likely_spilled hard
> >registers) that they (combine) can't simplify, until after reload.
> >We allow sets from pseudo registers, immediate constants and memory
> >accesses, but anything more complicated is performed via a temporary
> >pseudo.  Not only does this simplify things for the register allocator,
> >but any remaining register-to-register moves are easily cleaned up
> >by the late optimization passes after reload, such as peephole2 and
> >cprop_hardreg.
>
> The restriction is mainly for rtl optimization passes before pass_combine.
>
> But split1 splits
>
> ```
> (insn 17 13 18 2 (set (reg/i:V4SI 20 xmm0)
> (vec_merge:V4SI (const_vector:V4SI [
> (const_int -1 [0x]) repeated x4
> ])
> (const_vector:V4SI [
> (const_int 0 [0]) repeated x4
> ])
> (unspec:QI [
> (reg:V4SF 106)
> (reg:V4SF 102)
> (const_int 0 [0])
> ] UNSPEC_PCMP))) "/app/example.cpp":20:1 2929 {*avx_cmpv4sf3_1}
>  (expr_list:REG_DEAD (reg:V4SF 102)
> (expr_list:REG_DEAD (reg:V4SF 106)
> (nil
> ```
>
> into:
> ```
> (insn 23 13 24 2 (set (reg:V4SF 107)
> (unspec:V4SF [
> (reg:V4SF 106)
> (reg:V4SF 102)
> (const_int 0 [0])
> ] UNSPEC_PCMP)) "/app/example.cpp":20:1 -1
>  (nil))
> (insn 24 23 18 2 (set (reg/i:V4SI 20 xmm0)
> (subreg:V4SI (reg:V4SF 107) 0)) "/app/example.cpp":20:1 -1
>  (nil))
> ```
>
> There are many splitters generating a MOV insn with a SUBREG that would
> have the same problem.
> Instead of changing those splitters one by one, the patch relaxes
> ix86_hardreg_mov_ok to allow a mov of a subreg to a hard register after
> split1. ix86_pre_reload_split () is used to replace
> !reload_completed && !lra_in_progress.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_hardreg_mov_ok): Relax mov subreg
> to hard register after split1.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/pr115982.C: New test.

LGTM, but please watch out for fallout.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc  |  5 ++---
>  gcc/testsuite/g++.target/i386/pr115982.C | 11 +++
>  2 files changed, 13 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr115982.C
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 9c2ebe74fc9..77c441893b4 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -20212,7 +20212,7 @@ ix86_class_likely_spilled_p (reg_class_t rclass)
>  }
>
>  /* Return true if a set of DST by the expression SRC should be allowed.
> -   This prevents complex sets of likely_spilled hard regs before reload.  */
> +   This prevents complex sets of likely_spilled hard regs before split1.  */
>
>  bool
>  ix86_hardreg_mov_ok (rtx dst, rtx src)
> @@ -20224,8 +20224,7 @@ ix86_hardreg_mov_ok (rtx dst, rtx src)
>? standard_sse_constant_p (src, GET_MODE (dst))
>: x86_64_immediate_operand (src, GET_MODE (dst)))
>&& ix86_class_likely_spilled_p (REGNO_REG_CLASS (REGNO (dst)))
> -  && !reload_completed
> -  && !lra_in_progress)
> +  && ix86_pre_reload_split ())
>  return false;
>return true;
>  }
> diff --git a/gcc/testsuite/g++.target/i386/pr115982.C b/gcc/testsuite/g++.target/i386/pr115982.C
> new file mode 100644
> index 000..4b91618405d
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr115982.C
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512vl -O2" } */
> +
> +typedef float VF __attribute__((__vector_size__(16)));
> +typedef int VI __attribute__((__vector_size__(16)));
> +
> +VI
> +foo (VF x)
> +{
> +  return !x;
> +}
> --
> 2.31.1
>


[committed] libatomic: Handle AVX+CX16 ZHAOXIN like intel for 16b atomic [PR104688]

2024-07-18 Thread Uros Bizjak
From: mayshao 

PR target/104688

libatomic/ChangeLog:

* config/x86/init.c (__libat_feat1_init): Don't clear
bit_AVX on ZHAOXIN CPUs.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c
index 26168d46832..c6ce997a5af 100644
--- a/libatomic/config/x86/init.c
+++ b/libatomic/config/x86/init.c
@@ -41,11 +41,15 @@ __libat_feat1_init (void)
{
  /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned
 address is atomic, and AMD is going to do something similar soon.
-We don't have a guarantee from vendors of other CPUs with AVX,
-like Zhaoxin and VIA.  */
+Zhaoxin also guarantees this.  We don't have a guarantee
+from vendors of other CPUs with AVX, like VIA.  */
+ unsigned int family = (eax >> 8) & 0x0f;
  unsigned int ecx2;
  __cpuid (0, eax, ebx, ecx2, edx);
- if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
+ if (ecx2 != signature_INTEL_ecx
+ && ecx2 != signature_AMD_ecx
+ && !(ecx2 == signature_CENTAUR_ecx && family > 6)
+ && ecx2 != signature_SHANGHAI_ecx)
FEAT1_REGISTER &= ~bit_AVX;
}
 #endif


[gcc r15-2147] libatomic: Handle AVX+CX16 ZHAOXIN like Intel for 16b atomic [PR104688]

2024-07-18 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:9846b0916c1a9b9f3e9df4657670ef4419617134

commit r15-2147-g9846b0916c1a9b9f3e9df4657670ef4419617134
Author: mayshao 
Date:   Thu Jul 18 22:43:00 2024 +0200

libatomic: Handle AVX+CX16 ZHAOXIN like Intel for 16b atomic [PR104688]

PR target/104688

libatomic/ChangeLog:

* config/x86/init.c (__libat_feat1_init): Don't clear
bit_AVX on ZHAOXIN CPUs.

Diff:
---
 libatomic/config/x86/init.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c
index 26168d468324..c6ce997a5af4 100644
--- a/libatomic/config/x86/init.c
+++ b/libatomic/config/x86/init.c
@@ -41,11 +41,15 @@ __libat_feat1_init (void)
{
  /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned
 address is atomic, and AMD is going to do something similar soon.
-We don't have a guarantee from vendors of other CPUs with AVX,
-like Zhaoxin and VIA.  */
+Zhaoxin also guarantees this.  We don't have a guarantee
+from vendors of other CPUs with AVX, like VIA.  */
+ unsigned int family = (eax >> 8) & 0x0f;
  unsigned int ecx2;
  __cpuid (0, eax, ebx, ecx2, edx);
- if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
+ if (ecx2 != signature_INTEL_ecx
+ && ecx2 != signature_AMD_ecx
+ && !(ecx2 == signature_CENTAUR_ecx && family > 6)
+ && ecx2 != signature_SHANGHAI_ecx)
FEAT1_REGISTER &= ~bit_AVX;
}
 #endif


[committed] libatomic: Improve cpuid usage in __libat_feat1_init

2024-07-18 Thread Uros Bizjak
Check the result of __get_cpuid and process FEAT1_REGISTER only when
__get_cpuid returns success.  Use __cpuid instead of nested __get_cpuid.

libatomic/ChangeLog:

* config/x86/init.c (__libat_feat1_init): Check the result of
__get_cpuid and process FEAT1_REGISTER only when __get_cpuid
returns success.  Use __cpuid instead of nested __get_cpuid.

Bootstrapped and regression tested libatomic on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c
index a75be3f175c..26168d46832 100644
--- a/libatomic/config/x86/init.c
+++ b/libatomic/config/x86/init.c
@@ -33,21 +33,23 @@ __libat_feat1_init (void)
 {
   unsigned int eax, ebx, ecx, edx;
   FEAT1_REGISTER = 0;
-  __get_cpuid (1, &eax, &ebx, &ecx, &edx);
-#ifdef __x86_64__
-  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
-  == (bit_AVX | bit_CMPXCHG16B))
+  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
 {
-  /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
-is atomic, and AMD is going to do something similar soon.
-We don't have a guarantee from vendors of other CPUs with AVX,
-like Zhaoxin and VIA.  */
-  unsigned int ecx2 = 0;
-  __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
-  if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
-   FEAT1_REGISTER &= ~bit_AVX;
-}
+#ifdef __x86_64__
+  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
+ == (bit_AVX | bit_CMPXCHG16B))
+   {
+ /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned
+address is atomic, and AMD is going to do something similar soon.
+We don't have a guarantee from vendors of other CPUs with AVX,
+like Zhaoxin and VIA.  */
+ unsigned int ecx2;
+ __cpuid (0, eax, ebx, ecx2, edx);
+ if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
+   FEAT1_REGISTER &= ~bit_AVX;
+   }
 #endif
+}
   /* See the load in load_feat1.  */
   __atomic_store_n (&__libat_feat1, FEAT1_REGISTER, __ATOMIC_RELAXED);
   return FEAT1_REGISTER;


[gcc r15-2142] libatomic: Improve cpuid usage in __libat_feat1_init

2024-07-18 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:f7d01e080a54ea94586c8847857e5aef17906519

commit r15-2142-gf7d01e080a54ea94586c8847857e5aef17906519
Author: Uros Bizjak 
Date:   Thu Jul 18 16:58:09 2024 +0200

libatomic: Improve cpuid usage in __libat_feat1_init

Check the result of __get_cpuid and process FEAT1_REGISTER only when
__get_cpuid returns success.  Use __cpuid instead of nested __get_cpuid.

libatomic/ChangeLog:

* config/x86/init.c (__libat_feat1_init): Check the result of
__get_cpuid and process FEAT1_REGISTER only when __get_cpuid
returns success.  Use __cpuid instead of nested __get_cpuid.

Diff:
---
 libatomic/config/x86/init.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c
index a75be3f175c3..26168d468324 100644
--- a/libatomic/config/x86/init.c
+++ b/libatomic/config/x86/init.c
@@ -33,21 +33,23 @@ __libat_feat1_init (void)
 {
   unsigned int eax, ebx, ecx, edx;
   FEAT1_REGISTER = 0;
-  __get_cpuid (1, &eax, &ebx, &ecx, &edx);
-#ifdef __x86_64__
-  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
-  == (bit_AVX | bit_CMPXCHG16B))
+  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
 {
-  /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
-is atomic, and AMD is going to do something similar soon.
-We don't have a guarantee from vendors of other CPUs with AVX,
-like Zhaoxin and VIA.  */
-  unsigned int ecx2 = 0;
-  __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
-  if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
-   FEAT1_REGISTER &= ~bit_AVX;
-}
+#ifdef __x86_64__
+  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
+ == (bit_AVX | bit_CMPXCHG16B))
+   {
+ /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned
+address is atomic, and AMD is going to do something similar soon.
+We don't have a guarantee from vendors of other CPUs with AVX,
+like Zhaoxin and VIA.  */
+ unsigned int ecx2;
+ __cpuid (0, eax, ebx, ecx2, edx);
+ if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
+   FEAT1_REGISTER &= ~bit_AVX;
+   }
 #endif
+}
   /* See the load in load_feat1.  */
   __atomic_store_n (&__libat_feat1, FEAT1_REGISTER, __ATOMIC_RELAXED);
   return FEAT1_REGISTER;


[gcc r12-10623] alpha: Fix duplicate !tlsgd!62 assemble error [PR115526]

2024-07-18 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:c5a26fc24b0af61498fae65ccad69d51d63d2a8b

commit r12-10623-gc5a26fc24b0af61498fae65ccad69d51d63d2a8b
Author: Uros Bizjak 
Date:   Wed Jul 17 18:11:26 2024 +0200

alpha: Fix duplicate !tlsgd!62 assemble error [PR115526]

Add missing "cannot_copy" attribute to instructions that have to
stay in 1-1 correspondence with another insn.

PR target/115526

gcc/ChangeLog:

* config/alpha/alpha.md (movdi_er_high_g): Add cannot_copy 
attribute.
(movdi_er_tlsgd): Ditto.
(movdi_er_tlsldm): Ditto.
(call_value_osf_): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/alpha/pr115526.c: New test.

(cherry picked from commit 0841fd4c42ab053be951b7418233f0478282d020)

Diff:
---
 gcc/config/alpha/alpha.md | 10 +--
 gcc/testsuite/gcc.target/alpha/pr115526.c | 46 +++
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 442953fe50e1..b6795e1d2638 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -3933,7 +3933,8 @@
   else
 return "ldq %0,%2(%1)\t\t!literal!%3";
 }
-  [(set_attr "type" "ldsym")])
+  [(set_attr "type" "ldsym")
+   (set_attr "cannot_copy" "true")])
 
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -3957,7 +3958,8 @@
 return "lda %0,%2(%1)\t\t!tlsgd";
   else
 return "lda %0,%2(%1)\t\t!tlsgd!%3";
-})
+}
+  [(set_attr "cannot_copy" "true")])
 
 (define_insn "movdi_er_tlsldm"
   [(set (match_operand:DI 0 "register_operand" "=r")
@@ -3970,7 +3972,8 @@
 return "lda %0,%&(%1)\t\t!tlsldm";
   else
 return "lda %0,%&(%1)\t\t!tlsldm!%2";
-})
+}
+  [(set_attr "cannot_copy" "true")])
 
 (define_insn "*movdi_er_gotdtp"
   [(set (match_operand:DI 0 "register_operand" "=r")
@@ -5939,6 +5942,7 @@
   "HAVE_AS_TLS"
  "ldq $27,%1($29)\t\t!literal!%2\;jsr $26,($27),%1\t\t!lituse_!%2\;ldah $29,0($26)\t\t!gpdisp!%*\;lda $29,0($29)\t\t!gpdisp!%*"
   [(set_attr "type" "jsr")
+   (set_attr "cannot_copy" "true")
(set_attr "length" "16")])
 
 ;; We must use peep2 instead of a split because we need accurate life
diff --git a/gcc/testsuite/gcc.target/alpha/pr115526.c b/gcc/testsuite/gcc.target/alpha/pr115526.c
new file mode 100644
index ..2f57903fec34
--- /dev/null
+++ b/gcc/testsuite/gcc.target/alpha/pr115526.c
@@ -0,0 +1,46 @@
+/* PR target/115526 */
+/* { dg-do assemble } */
+/* { dg-options "-O2 -Wno-attributes -fvisibility=hidden -fPIC -mcpu=ev4" } */
+
+struct _ts {
+  struct _dtoa_state *interp;
+};
+struct Bigint {
+  int k;
+} *_Py_dg_strtod_bs;
+struct _dtoa_state {
+  struct Bigint p5s;
+  struct Bigint *freelist[];
+};
+extern _Thread_local struct _ts _Py_tss_tstate;
+typedef struct Bigint Bigint;
+int pow5mult_k;
+long _Py_dg_strtod_ndigits;
+void PyMem_Free();
+void Bfree(Bigint *v) {
+  if (v)
+{
+  if (v->k)
+   PyMem_Free();
+  else {
+   struct _dtoa_state *interp = _Py_tss_tstate.interp;
+   interp->freelist[v->k] = v;
+  }
+}
+}
+static Bigint *pow5mult(Bigint *b) {
+  for (;;) {
+if (pow5mult_k & 1) {
+  Bfree(b);
+  if (b == 0)
+return 0;
+}
+if (!(pow5mult_k >>= 1))
+  break;
+  }
+  return 0;
+}
+void _Py_dg_strtod() {
+  if (_Py_dg_strtod_ndigits)
+pow5mult(_Py_dg_strtod_bs);
+}


Re: [PATCH v2] [libatomic]: Handle AVX+CX16 ZHAOXIN like intel for 16b atomic [PR104688]

2024-07-18 Thread Uros Bizjak
On Thu, Jul 18, 2024 at 2:07 PM Jakub Jelinek  wrote:
>
> On Thu, Jul 18, 2024 at 01:57:11PM +0200, Uros Bizjak wrote:
> > Attached patch illustrates the proposed improvement with nested cpuid
> > calls. Bootstrapped and teased with libatomic testsuite.
> >
> > Jakub, WDYT?
>
> I'd probably keep the FEAT1_REGISTER = 0; before the if (__get_cpuid (1, ...)
> to avoid the else, I think that could result in smaller code, but otherwise

OK, I'll keep the initialization this way.

> LGTM, especially the use of just __cpuid there.  And note your patch doesn't
> incorporate the Zhaoxin changes.

This will be a separate patch.

Thanks,
Uros.

>
> > diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c
> > index a75be3f175c..94d45683567 100644
> > --- a/libatomic/config/x86/init.c
> > +++ b/libatomic/config/x86/init.c
> > @@ -32,22 +32,25 @@ unsigned int
> >  __libat_feat1_init (void)
> >  {
> >unsigned int eax, ebx, ecx, edx;
> > -  FEAT1_REGISTER = 0;
> > -  __get_cpuid (1, &eax, &ebx, &ecx, &edx);
> > -#ifdef __x86_64__
> > -  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
> > -  == (bit_AVX | bit_CMPXCHG16B))
> > +  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
> >  {
> > -  /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
> > -  is atomic, and AMD is going to do something similar soon.
> > -  We don't have a guarantee from vendors of other CPUs with AVX,
> > -  like Zhaoxin and VIA.  */
> > -  unsigned int ecx2 = 0;
> > -  __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
> > -  if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
> > - FEAT1_REGISTER &= ~bit_AVX;
> > -}
> > +#ifdef __x86_64__
> > +  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
> > +   == (bit_AVX | bit_CMPXCHG16B))
> > + {
> > +   /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned
> > +  address is atomic, and AMD is going to do something similar soon.
> > +  We don't have a guarantee from vendors of other CPUs with AVX,
> > +  like Zhaoxin and VIA.  */
> > +   unsigned int ecx2;
> > +   __cpuid (0, eax, ebx, ecx2, edx);
> > +   if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
> > + FEAT1_REGISTER &= ~bit_AVX;
> > + }
> >  #endif
> > +}
> > +  else
> > +FEAT1_REGISTER = 0;
> >/* See the load in load_feat1.  */
> >__atomic_store_n (&__libat_feat1, FEAT1_REGISTER, __ATOMIC_RELAXED);
> >return FEAT1_REGISTER;
>
>
> Jakub
>


Re: [PATCH v2] [libatomic]: Handle AVX+CX16 ZHAOXIN like intel for 16b atomic [PR104688]

2024-07-18 Thread Uros Bizjak
On Thu, Jul 18, 2024 at 10:31 AM Uros Bizjak  wrote:
>
> On Thu, Jul 18, 2024 at 10:21 AM Jakub Jelinek  wrote:
> >
> > On Thu, Jul 18, 2024 at 10:12:46AM +0200, Uros Bizjak wrote:
> > > On Thu, Jul 18, 2024 at 9:50 AM Jakub Jelinek  wrote:
> > > >
> > > > On Thu, Jul 18, 2024 at 09:34:14AM +0200, Uros Bizjak wrote:
> > > > > > > +  unsigned int ecx2 = 0, family = 0;
> > > > >
> > > > > No need to initialize these two variables.
> > > >
> > > > The function ignores __get_cpuid result, so at least the
> > > > FEAT1_REGISTER = 0; is needed before the first __get_cpuid.
> > > > Do you mean the ecx2 = 0 initialization is useless because
> > > > __get_cpuid (0, ...) on x86_64 will always succeed (especially
> > > > when __get_cpuid (1, ...) had to succeed otherwise FEAT1_REGISTER
> > > > would be zero)?
> > > > I guess that is true, but won't that cause -Wmaybe-uninitialized warnings?
> > >
> > > Yes, if the __get_cpuid (1, ...) works OK, then we are sure that
> > > __get_cpuid (0, ...) will also work.
> > >
> > > > I agree initializing family to 0 is not needed, but I don't understand
> > > > why it isn't just
> > > >   unsigned family = (eax >> 8) & 0x0f;
> > > > Though, guess even that might fail with -Wmaybe-uninitialized too, as
> > > > eax isn't unconditionally initialized.
> > >
> > > Perhaps we should check the result of __get_cpuid (1, ...) and use eax
> > > only if the function returns 1? IMO, this would solve the
> > > uninitialized issue, and we could use __cpuid in the second case (we
> > > would know that leaf 0 is supported, because leaf 1 support was
> > > checked with __get_cpuid (1, ...)).
> >
> > We know the code is ok if FEAT1_REGISTER = 0; is done before __get_cpuid (1,
> > ...).
> > Everything else is implied from it, all we need to ensure is that
> > -Wmaybe-uninitialized is happy about it.
> > Whatever doesn't report the warning and ideally doesn't increase the size of
> > the function.
> > I think the reason it is written the way it is before the AVX hacks in it
> > is that we need to handle even the case when __get_cpuid (1, ...) returns 0,
> > and we want in that case FEAT1_REGISTER = 0.
> > So it could be
>
> Yes, I think this is better, see below.
>
> >   FEAT1_REGISTER = 0;
> > #ifdef __x86_64__
> >   if (__get_cpuid (1, &eax, &ebx, &ecx, &edx)
> >   && (FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
> >   == (bit_AVX | bit_CMPXCHG16B))
> > {
>
> Here we can simply use

Attached patch illustrates the proposed improvement with nested cpuid
calls. Bootstrapped and teased with libatomic testsuite.

Jakub, WDYT?

Uros.
diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c
index a75be3f175c..94d45683567 100644
--- a/libatomic/config/x86/init.c
+++ b/libatomic/config/x86/init.c
@@ -32,22 +32,25 @@ unsigned int
 __libat_feat1_init (void)
 {
   unsigned int eax, ebx, ecx, edx;
-  FEAT1_REGISTER = 0;
-  __get_cpuid (1, &eax, &ebx, &ecx, &edx);
-#ifdef __x86_64__
-  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
-  == (bit_AVX | bit_CMPXCHG16B))
+  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
 {
-  /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
-is atomic, and AMD is going to do something similar soon.
-We don't have a guarantee from vendors of other CPUs with AVX,
-like Zhaoxin and VIA.  */
-  unsigned int ecx2 = 0;
-  __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
-  if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
-   FEAT1_REGISTER &= ~bit_AVX;
-}
+#ifdef __x86_64__
+  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
+ == (bit_AVX | bit_CMPXCHG16B))
+   {
+ /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned
+address is atomic, and AMD is going to do something similar soon.
+We don't have a guarantee from vendors of other CPUs with AVX,
+like Zhaoxin and VIA.  */
+ unsigned int ecx2;
+ __cpuid (0, eax, ebx, ecx2, edx);
+ if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
+   FEAT1_REGISTER &= ~bit_AVX;
+   }
 #endif
+}
+  else
+FEAT1_REGISTER = 0;
   /* See the load in load_feat1.  */
   __atomic_store_n (&__libat_feat1, FEAT1_REGISTER, __ATOMIC_RELAXED);
   return FEAT1_REGISTER;


Re: [PATCH v2] [libatomic]: Handle AVX+CX16 ZHAOXIN like intel for 16b atomic [PR104688]

2024-07-18 Thread Uros Bizjak
On Thu, Jul 18, 2024 at 10:21 AM Jakub Jelinek  wrote:
>
> On Thu, Jul 18, 2024 at 10:12:46AM +0200, Uros Bizjak wrote:
> > On Thu, Jul 18, 2024 at 9:50 AM Jakub Jelinek  wrote:
> > >
> > > On Thu, Jul 18, 2024 at 09:34:14AM +0200, Uros Bizjak wrote:
> > > > > > +  unsigned int ecx2 = 0, family = 0;
> > > >
> > > > No need to initialize these two variables.
> > >
> > > The function ignores __get_cpuid result, so at least the
> > > FEAT1_REGISTER = 0; is needed before the first __get_cpuid.
> > > Do you mean the ecx2 = 0 initialization is useless because
> > > __get_cpuid (0, ...) on x86_64 will always succeed (especially
> > > when __get_cpuid (1, ...) had to succeed otherwise FEAT1_REGISTER
> > > would be zero)?
> > > I guess that is true, but won't that cause -Wmaybe-uninitialized warnings?
> >
> > Yes, if the __get_cpuid (1, ...) works OK, then we are sure that
> > __get_cpuid (0, ...) will also work.
> >
> > > I agree initializing family to 0 is not needed, but I don't understand
> > > why it isn't just
> > >   unsigned family = (eax >> 8) & 0x0f;
> > > Though, guess even that might fail with -Wmaybe-uninitialized too, as
> > > eax isn't unconditionally initialized.
> >
> > Perhaps we should check the result of __get_cpuid (1, ...) and use eax
> > only if the function returns 1? IMO, this would solve the
> > uninitialized issue, and we could use __cpuid in the second case (we
> > would know that leaf 0 is supported, because leaf 1 support was
> > checked with __get_cpuid (1, ...)).
>
> We know the code is ok if FEAT1_REGISTER = 0; is done before __get_cpuid (1,
> ...).
> Everything else is implied from it, all we need to ensure is that
> -Wmaybe-uninitialized is happy about it.
> Whatever doesn't report the warning and ideally doesn't increase the size of
> the function.
> I think the reason it is written the way it is before the AVX hacks in it
> is that we need to handle even the case when __get_cpuid (1, ...) returns 0,
> and we want in that case FEAT1_REGISTER = 0.
> So it could be

Yes, I think this is better, see below.

>   FEAT1_REGISTER = 0;
> #ifdef __x86_64__
>   if (__get_cpuid (1, &eax, &ebx, &ecx, &edx)
>   && (FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
>   == (bit_AVX | bit_CMPXCHG16B))
> {

Here we can simply use

unsigned int family = (eax >> 8) & 0x0f;
unsigned int ecx2;

__cpuid (0, eax, ebx, ecx2, edx);

if (ecx2 ...)

> ...
> }
> #else
>   __get_cpuid (1, &eax, &ebx, &ecx, &edx);
> #endif
> etc.
>
> Jakub

Uros.
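
For readers following the thread: below is a compilable sketch of the control flow Jakub and Uros converge on, with the cpuid accesses stubbed out so the logic can be exercised off-target. The bit values and vendor signatures are restated from `<cpuid.h>`, `FEAT1_REGISTER` is modeled as a plain local, and the stub names are made up; this is illustrative, not the committed libatomic code.

```c
#include <assert.h>

/* Constants restated from <cpuid.h> (standard values).  */
#define bit_AVX        (1u << 28)
#define bit_CMPXCHG16B (1u << 13)
#define signature_INTEL_ecx 0x6c65746e  /* "ntel" of "GenuineIntel" */
#define signature_AMD_ecx   0x444d4163  /* "cAMD" of "AuthenticAMD" */

/* Stubs standing in for the real cpuid accesses; a test sets these.  */
static unsigned stub_leaf1_ecx;   /* feature bits from leaf 1 */
static unsigned stub_leaf0_ecx;   /* vendor signature from leaf 0 */

static unsigned
feat1_init_model (void)
{
  unsigned feat1 = 0;           /* FEAT1_REGISTER = 0; before cpuid.  */
  /* In the real code __get_cpuid (1, ...) can fail and leave feat1 at 0;
     the stub always "succeeds".  */
  feat1 = stub_leaf1_ecx;
  if ((feat1 & (bit_AVX | bit_CMPXCHG16B)) == (bit_AVX | bit_CMPXCHG16B))
    {
      /* Only Intel and AMD guarantee atomic 16-byte VMOVDQA, so drop
         the AVX bit for other vendors (checked via leaf 0's ECX).  */
      unsigned ecx2 = stub_leaf0_ecx;
      if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
        feat1 &= ~bit_AVX;
    }
  return feat1;
}
```

Since leaf 1 support implies leaf 0 support, the second query can use plain `__cpuid`, exactly as suggested above.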


Re: [PATCH v2] [libatomic]: Handle AVX+CX16 ZHAOXIN like intel for 16b atomic [PR104688]

2024-07-18 Thread Uros Bizjak
On Thu, Jul 18, 2024 at 9:50 AM Jakub Jelinek  wrote:
>
> On Thu, Jul 18, 2024 at 09:34:14AM +0200, Uros Bizjak wrote:
> > > > +  unsigned int ecx2 = 0, family = 0;
> >
> > No need to initialize these two variables.
>
> The function ignores __get_cpuid result, so at least the
> FEAT1_REGISTER = 0; is needed before the first __get_cpuid.
> Do you mean the ecx2 = 0 initialization is useless because
> __get_cpuid (0, ...) on x86_64 will always succeed (especially
> when __get_cpuid (1, ...) had to succeed otherwise FEAT1_REGISTER
> would be zero)?
> I guess that is true, but won't that cause -Wmaybe-uninitialized warnings?

Yes, if the __get_cpuid (1, ...) works OK, then we are sure that
__get_cpuid (0, ...) will also work.

> I agree initializing family to 0 is not needed, but I don't understand
> why it isn't just
>   unsigned family = (eax >> 8) & 0x0f;
> Though, guess even that might fail with -Wmaybe-uninitialized too, as
> eax isn't unconditionally initialized.

Perhaps we should check the result of __get_cpuid (1, ...) and use eax
only if the function returns 1? IMO, this would solve the
uninitialized issue, and we could use __cpuid in the second case (we
would know that leaf 0 is supported, because leaf 1 support was
checked with __get_cpuid (1, ...)).

Uros.


Re: [PATCH v2] [libatomic]: Handle AVX+CX16 ZHAOXIN like intel for 16b atomic [PR104688]

2024-07-18 Thread Uros Bizjak
On Thu, Jul 18, 2024 at 9:29 AM Jakub Jelinek  wrote:
>
> On Thu, Jul 18, 2024 at 03:23:05PM +0800, MayShao-oc wrote:
> > From: mayshao 
> >
> > Hi Jakub:
> >
> > Thanks for your review, we should just amend this to handle Zhaoxin.
> >
> > Bootstrapped/regtested on x86_64.
> >
> > Ok for trunk?
> > BR
> > Mayshao
> >
> > libatomic/ChangeLog:
> >
> >   PR target/104688
> >   * config/x86/init.c (__libat_feat1_init): Don't clear
> >   bit_AVX on ZHAOXIN CPUs.
> > ---
> >  libatomic/config/x86/init.c | 13 -
> >  1 file changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c
> > index a75be3f175c..0d6864909bb 100644
> > --- a/libatomic/config/x86/init.c
> > +++ b/libatomic/config/x86/init.c
> > @@ -39,12 +39,15 @@ __libat_feat1_init (void)
> >== (bit_AVX | bit_CMPXCHG16B))
> >  {
> >/* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
> > -  is atomic, and AMD is going to do something similar soon.
> > -  We don't have a guarantee from vendors of other CPUs with AVX,
> > -  like Zhaoxin and VIA.  */
> > -  unsigned int ecx2 = 0;
> > +  is atomic, and AMD is going to do something similar soon. Zhaoxin also
>
> Two spaces before Zhaoxin (and also should go on another line).
>
> > +  guarantees this. We don't have a guarantee from vendors of other CPUs
>
> Two spaces before We (and again, the line will be too long).
>
> > +  with AVX,like VIA.  */
>
> Space before like
>
> > +  unsigned int ecx2 = 0, family = 0;

No need to initialize these two variables. Please also add one line of
vertical space after variable declarations.

OK with the above change and with Jakub's proposed formatting changes.

Thanks,
Uros.

> > +  family = (eax >> 8) & 0x0f;
> >__get_cpuid (0, &eax, &ebx, &ecx2, &edx);
> > -  if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
> > +  if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx
>
> If the whole condition can't fit on one line, then each subcondition should
> be on a separate line, so linebreak + indentation should be added also
> before && ecx2 != signature_AMD_ecx
>
> > +  && !(ecx2 == signature_CENTAUR_ecx && family > 0x6)
> > +  && ecx2 != signature_SHANGHAI_ecx)
> >   FEAT1_REGISTER &= ~bit_AVX;
> >  }
> >  #endif
> > --
> > 2.27.0
>
> Otherwise LGTM, but please give Uros a day or two to chime in.
>
> Jakub
>
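
The `family > 0x6` gate above reads the base family field out of leaf-1 EAX. For reference, a sketch of the conventional family decoding, where the extended family field (bits 20..27) is added only when the base field saturates at 0xf; the patch itself only needs the base field, the helper name and the EAX values in the test are synthetic:

```c
#include <assert.h>

/* Decode the display family from a CPUID leaf-1 EAX value:
   base family in bits 8..11, extended family in bits 20..27,
   the latter added only when the base field reads 0xf.  */
static unsigned
cpuid_display_family (unsigned eax)
{
  unsigned family = (eax >> 8) & 0x0f;   /* as in the patch */
  if (family == 0x0f)
    family += (eax >> 20) & 0xff;
  return family;
}
```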


Re: [PATCH v2] i386: Fix testcases generating invalid asm

2024-07-18 Thread Uros Bizjak
On Thu, Jul 18, 2024 at 8:52 AM Haochen Jiang  wrote:
>
> Hi all,
>
> I revised the patch according to the comment.
>
> Ok for trunk?
>
> Thx,
> Haochen
>
> ---
>
> Changes in v2: Add suffix for mov to make the test more robust.
>
> ---
>
> For compile test, we should generate valid asm except for special purposes.
> Fix the compile test that generates invalid asm.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/apx-egprs-names.c: Use ax for short and
> al for char instead of eax.
> * gcc.target/i386/avx512bw-kandnq-1.c: Do not run the test
> under -m32 since kmovq with register is invalid. Use long
> long to use 64 bit register instead of 32 bit register for
> kmovq.
> * gcc.target/i386/avx512bw-kandq-1.c: Ditto.
> * gcc.target/i386/avx512bw-knotq-1.c: Ditto.
> * gcc.target/i386/avx512bw-korq-1.c: Ditto.
> * gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
> * gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
> * gcc.target/i386/avx512bw-kxnorq-1.c: Ditto.
> * gcc.target/i386/avx512bw-kxorq-1.c: Ditto.

LGTM.

Thanks,
Uros.

> ---
>  gcc/testsuite/gcc.target/i386/apx-egprs-names.c | 8 
>  gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c   | 6 +++---
>  gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c| 6 +++---
>  gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c| 4 ++--
>  gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c | 6 +++---
>  gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c | 4 ++--
>  gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c | 4 ++--
>  gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c   | 6 +++---
>  gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c| 6 +++---
>  9 files changed, 25 insertions(+), 25 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/apx-egprs-names.c b/gcc/testsuite/gcc.target/i386/apx-egprs-names.c
> index f0517e47c33..917ef505495 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-egprs-names.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-egprs-names.c
> @@ -10,8 +10,8 @@ void foo ()
>register int b __asm ("r30");
>register short c __asm ("r29");
>register char d __asm ("r28");
> -  __asm__ __volatile__ ("mov %0, %%rax" : : "r" (a) : "rax");
> -  __asm__ __volatile__ ("mov %0, %%eax" : : "r" (b) : "eax");
> -  __asm__ __volatile__ ("mov %0, %%eax" : : "r" (c) : "eax");
> -  __asm__ __volatile__ ("mov %0, %%eax" : : "r" (d) : "eax");
> +  __asm__ __volatile__ ("movq %0, %%rax" : : "r" (a) : "rax");
> +  __asm__ __volatile__ ("movl %0, %%eax" : : "r" (b) : "eax");
> +  __asm__ __volatile__ ("movw %0, %%ax" : : "r" (c) : "ax");
> +  __asm__ __volatile__ ("movb %0, %%al" : : "r" (d) : "al");
>  }
> diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
> index e8b7a5f9aa2..f9f03c90782 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */
> +/* { dg-do compile { target { ! ia32 } } } */
>  /* { dg-options "-mavx512bw -O2" } */
>  /* { dg-final { scan-assembler-times "kandnq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
>
> @@ -10,8 +10,8 @@ avx512bw_test ()
>__mmask64 k1, k2, k3;
>volatile __m512i x = _mm512_setzero_si512 ();
>
> -  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
> -  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
> +  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1ULL) );
> +  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2ULL) );
>
>k3 = _kandn_mask64 (k1, k2);
>x = _mm512_mask_add_epi8 (x, k3, x, x);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
> index a1aaed67c66..6ad836087ad 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */
> +/* { dg-do compile { target { ! ia32 } } } */
>  /* { dg-options "-mavx512bw -O2" } */
>  /* { dg-final { scan-assembler-times "kandq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
>
> @@ -10,8 +10,8 @@ avx512bw_test ()
>__mmask64 k1, k2, k3;
>volatile __m512i x = _mm512_setzero_epi32();
>
> -  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
> -  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
> +  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1ULL) );
> +  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2ULL) );
>
>k3 = _kand_mask64 (k1, k2);
>x = _mm512_mask_add_epi8 (x, k3, x, x);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
> index deb65795760..341bbc03847 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */
> +/* { dg-do compile { target { ! ia32 } } } */
>  /* { dg-options "-mavx512bw 
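
The reason the literals change from `1` to `1ULL`: in GCC extended asm, an `"r"` input is allocated a register matching the operand's C type width, and `kmovq` only accepts a 64-bit GPR — so an `int` operand yields invalid asm, and under -m32 there is no valid 64-bit GPR form at all (hence `{ target { ! ia32 } }`). A small, portable illustration of the type distinction that drives the fix; the macro is a made-up helper, not testsuite code:

```c
#include <assert.h>

/* Classify an asm input operand the way register selection sees it:
   int -> 32-bit GPR, unsigned long long -> 64-bit GPR on x86-64.  */
#define OPERAND_WIDTH(x) _Generic((x),          \
    int: 32,                                    \
    unsigned long long: 64,                     \
    default: 0)
```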

Re: [PATCH v2] [x86][avx512] Optimize maskstore when mask is 0 or -1 in UNSPEC_MASKMOV

2024-07-18 Thread Uros Bizjak
On Thu, Jul 18, 2024 at 3:35 AM liuhongt  wrote:
>
> > Also, in case the insn is deleted, do:
> >
> > emit_note (NOTE_INSN_DELETED);
> >
> > DONE;
> >
> > instead of leaving (const_int 0) in the stream.
> >
> > So, the above insn preparation statements should read:
> >
> > --cut here--
> > if (constm1_operand (operands[2], mode))
> >   emit_move_insn (operands[0], operands[1]);
> > else
> >   emit_note (NOTE_INSN_DELETED);
> >
> > DONE;
> > --cut here--
> Changed.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/115843
> * config/i386/predicates.md (const0_or_m1_operand): New
> predicate.
> * config/i386/sse.md (*_store_mask_1): New
> pre_reload define_insn_and_split.
> (V): Add V32BF,V16BF,V8BF.
> (V4SF_V8HF): Rename to ..
> (V24F_128): .. this.
> (*vec_concat): Adjust with V24F_128.
> (*vec_concat_0): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr115843.c: New test.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/predicates.md|  5 
>  gcc/config/i386/sse.md   | 33 
>  gcc/testsuite/gcc.target/i386/pr115843.c | 38 
>  3 files changed, 70 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr115843.c
>
> diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
> index 5d0bb1e0f54..680594871de 100644
> --- a/gcc/config/i386/predicates.md
> +++ b/gcc/config/i386/predicates.md
> @@ -825,6 +825,11 @@ (define_predicate "constm1_operand"
>(and (match_code "const_int")
> (match_test "op == constm1_rtx")))
>
> +;; Match 0 or -1.
> +(define_predicate "const0_or_m1_operand"
> +  (ior (match_operand 0 "const0_operand")
> +   (match_operand 0 "constm1_operand")))
> +
>  ;; Match exactly eight.
>  (define_predicate "const8_operand"
>(and (match_code "const_int")
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index e44822f705b..f54e966bdbb 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -294,6 +294,7 @@ (define_mode_iterator V
> (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
> (V8DI "TARGET_AVX512F && TARGET_EVEX512")  (V4DI "TARGET_AVX") V2DI
> (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF
> +   (V32BF "TARGET_AVX512F && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF
> (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
> (V8DF "TARGET_AVX512F && TARGET_EVEX512")  (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
>
> @@ -430,8 +431,8 @@ (define_mode_iterator VFB_512
> (V16SF "TARGET_EVEX512")
> (V8DF "TARGET_EVEX512")])
>
> -(define_mode_iterator V4SF_V8HF
> -  [V4SF V8HF])
> +(define_mode_iterator V24F_128
> +  [V4SF V8HF V8BF])
>
>  (define_mode_iterator VI48_AVX512VL
>[(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
> @@ -11543,8 +11544,8 @@ (define_insn "*vec_concatv2sf_sse"
> (set_attr "mode" "V4SF,SF,DI,DI")])
>
>  (define_insn "*vec_concat"
> -  [(set (match_operand:V4SF_V8HF 0 "register_operand"   "=x,v,x,v")
> -   (vec_concat:V4SF_V8HF
> +  [(set (match_operand:V24F_128 0 "register_operand"   "=x,v,x,v")
> +   (vec_concat:V24F_128
>   (match_operand: 1 "register_operand" " 0,v,0,v")
>   (match_operand: 2 "nonimmediate_operand" " x,v,m,m")))]
>"TARGET_SSE"
> @@ -11559,8 +11560,8 @@ (define_insn "*vec_concat"
> (set_attr "mode" "V4SF,V4SF,V2SF,V2SF")])
>
>  (define_insn "*vec_concat_0"
> -  [(set (match_operand:V4SF_V8HF 0 "register_operand"   "=v")
> -   (vec_concat:V4SF_V8HF
> +  [(set (match_operand:V24F_128 0 "register_operand"   "=v")
> +   (vec_concat:V24F_128
>   (match_operand: 1 "nonimmediate_operand" "vm")
>   (match_operand: 2 "const0_operand")))]
>"TARGET_SSE2"
> @@ -28574,6 +28575,26 @@ (define_insn "_store_mask"
> (set_attr "memory" "store")
> (set_attr "mode" "")])
>
> +(define_insn_and_split "*_store_mask_1"
> +  [(set (match_operand:V 0 "memory_operand")
> +   (unspec:V
> + [(match_operand:V 1 "register_operand")
> +  (match_dup 0)
> +  (match_operand: 2 "const0_or_m1_operand")]
> + UNSPEC_MASKMOV))]
> +  "TARGET_AVX512F && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +{
> +  if (constm1_operand (operands[2], mode))
> +emit_move_insn (operands[0], operands[1]);
> +  else
> +emit_note (NOTE_INSN_DELETED);
> +
> +  DONE;
> +})
> +
>  (define_expand "cbranch4"
>[(set (reg:CC FLAGS_REG)
> (compare:CC (match_operand:VI_AVX_AVX512F 1 "register_operand")
> diff --git a/gcc/testsuite/gcc.target/i386/pr115843.c b/gcc/testsuite/gcc.target/i386/pr115843.c
> new file mode 100644
> index 000..00d8605757a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr115843.c
> 
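
A scalar model of what the new define_insn_and_split exploits: a masked store whose mask is all ones is just a plain store (hence `emit_move_insn`), and one whose mask is all zeros writes nothing (hence the insn is simply deleted). The function below is an illustrative model, not GCC internals:

```c
#include <assert.h>

/* Lane i of SRC is stored to DST only when mask bit i is set, the
   semantics of the UNSPEC_MASKMOV store being optimized above.  */
static void
masked_store (int *dst, const int *src, unsigned mask, int nelts)
{
  for (int i = 0; i < nelts; i++)
    if (mask & (1u << i))
      dst[i] = src[i];
}
```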

Re: [PATCH] i386: Fix testcases generating invalid asm

2024-07-17 Thread Uros Bizjak
On Thu, Jul 18, 2024 at 3:46 AM Haochen Jiang  wrote:
>
> Hi all,
>
> For compile test, we should generate valid asm except for special purposes.
> Fix the compile test that generates invalid asm.
>
> Regtested on x86-64-pc-linux-gnu. Ok for trunk?
>
> Thx,
> Haochen
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/apx-egprs-names.c: Use ax for short and
> al for char instead of eax.
> * gcc.target/i386/avx512bw-kandnq-1.c: Do not run the test
> under -m32 since kmovq with register is invalid. Use long
> long to use 64 bit register instead of 32 bit register for
> kmovq.
> * gcc.target/i386/avx512bw-kandq-1.c: Ditto.
> * gcc.target/i386/avx512bw-knotq-1.c: Ditto.
> * gcc.target/i386/avx512bw-korq-1.c: Ditto.
> * gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
> * gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
> * gcc.target/i386/avx512bw-kxnorq-1.c: Ditto.
> * gcc.target/i386/avx512bw-kxorq-1.c: Ditto.
> ---
>  gcc/testsuite/gcc.target/i386/apx-egprs-names.c | 4 ++--
>  gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c   | 6 +++---
>  gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c| 6 +++---
>  gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c| 4 ++--
>  gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c | 6 +++---
>  gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c | 4 ++--
>  gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c | 4 ++--
>  gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c   | 6 +++---
>  gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c| 6 +++---
>  9 files changed, 23 insertions(+), 23 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/apx-egprs-names.c b/gcc/testsuite/gcc.target/i386/apx-egprs-names.c
> index f0517e47c33..5b342aa385b 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-egprs-names.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-egprs-names.c
> @@ -12,6 +12,6 @@ void foo ()
>register char d __asm ("r28");
>__asm__ __volatile__ ("mov %0, %%rax" : : "r" (a) : "rax");
>__asm__ __volatile__ ("mov %0, %%eax" : : "r" (b) : "eax");
> -  __asm__ __volatile__ ("mov %0, %%eax" : : "r" (c) : "eax");
> -  __asm__ __volatile__ ("mov %0, %%eax" : : "r" (d) : "eax");
> +  __asm__ __volatile__ ("mov %0, %%ax" : : "r" (c) : "ax");
> +  __asm__ __volatile__ ("mov %0, %%al" : : "r" (d) : "al");

You can use the insn suffix (movq, movl, movw and movb) to make the
asm even more robust.

Uros.

>  }
> diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
> index e8b7a5f9aa2..f9f03c90782 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */
> +/* { dg-do compile { target { ! ia32 } } } */
>  /* { dg-options "-mavx512bw -O2" } */
>  /* { dg-final { scan-assembler-times "kandnq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
>
> @@ -10,8 +10,8 @@ avx512bw_test ()
>__mmask64 k1, k2, k3;
>volatile __m512i x = _mm512_setzero_si512 ();
>
> -  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
> -  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
> +  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1ULL) );
> +  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2ULL) );
>
>k3 = _kandn_mask64 (k1, k2);
>x = _mm512_mask_add_epi8 (x, k3, x, x);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
> index a1aaed67c66..6ad836087ad 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */
> +/* { dg-do compile { target { ! ia32 } } } */
>  /* { dg-options "-mavx512bw -O2" } */
>  /* { dg-final { scan-assembler-times "kandq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
>
> @@ -10,8 +10,8 @@ avx512bw_test ()
>__mmask64 k1, k2, k3;
>volatile __m512i x = _mm512_setzero_epi32();
>
> -  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
> -  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
> +  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1ULL) );
> +  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2ULL) );
>
>k3 = _kand_mask64 (k1, k2);
>x = _mm512_mask_add_epi8 (x, k3, x, x);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
> index deb65795760..341bbc03847 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */
> +/* { dg-do compile { target { ! ia32 } } } */
>  /* { dg-options "-mavx512bw -O2" } */
>  /* { dg-final { scan-assembler-times "knotq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
>
> @@ -10,7 +10,7 @@ avx512bw_test ()
>__mmask64 k1, k2;
>volatile __m512i x = _mm512_setzero_si512 ();
>
> -  
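
Uros's suffix suggestion works because in AT&T syntax the mnemonic suffix encodes the operand width, so a suffixed mov cannot silently pick the wrong register size. A tiny lookup sketching that mapping; the helper name is made up for illustration:

```c
#include <assert.h>

/* Map an operand size in bytes to the AT&T mnemonic suffix:
   movb/movw/movl/movq for 1/2/4/8-byte operands.  */
static char
att_suffix (int size_in_bytes)
{
  switch (size_in_bytes)
    {
    case 1: return 'b';
    case 2: return 'w';
    case 4: return 'l';
    case 8: return 'q';
    default: return '?';
    }
}
```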

[gcc r13-8920] alpha: Fix duplicate !tlsgd!62 assemble error [PR115526]

2024-07-17 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:37bd7d5c4e17c97d2b7d50f630b1cf8b347a31f4

commit r13-8920-g37bd7d5c4e17c97d2b7d50f630b1cf8b347a31f4
Author: Uros Bizjak 
Date:   Wed Jul 17 18:11:26 2024 +0200

alpha: Fix duplicate !tlsgd!62 assemble error [PR115526]

Add missing "cannot_copy" attribute to instructions that have to
stay in 1-1 correspondence with another insn.

PR target/115526

gcc/ChangeLog:

* config/alpha/alpha.md (movdi_er_high_g): Add cannot_copy 
attribute.
(movdi_er_tlsgd): Ditto.
(movdi_er_tlsldm): Ditto.
(call_value_osf_): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/alpha/pr115526.c: New test.

(cherry picked from commit 0841fd4c42ab053be951b7418233f0478282d020)

Diff:
---
 gcc/config/alpha/alpha.md | 10 +--
 gcc/testsuite/gcc.target/alpha/pr115526.c | 46 +++
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 17dfc4a58689..0752c5a001ca 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -3933,7 +3933,8 @@
   else
 return "ldq %0,%2(%1)\t\t!literal!%3";
 }
-  [(set_attr "type" "ldsym")])
+  [(set_attr "type" "ldsym")
+   (set_attr "cannot_copy" "true")])
 
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -3957,7 +3958,8 @@
 return "lda %0,%2(%1)\t\t!tlsgd";
   else
 return "lda %0,%2(%1)\t\t!tlsgd!%3";
-})
+}
+  [(set_attr "cannot_copy" "true")])
 
 (define_insn "movdi_er_tlsldm"
   [(set (match_operand:DI 0 "register_operand" "=r")
@@ -3970,7 +3972,8 @@
 return "lda %0,%&(%1)\t\t!tlsldm";
   else
 return "lda %0,%&(%1)\t\t!tlsldm!%2";
-})
+}
+  [(set_attr "cannot_copy" "true")])
 
 (define_insn "*movdi_er_gotdtp"
   [(set (match_operand:DI 0 "register_operand" "=r")
@@ -5939,6 +5942,7 @@
   "HAVE_AS_TLS"
  "ldq $27,%1($29)\t\t!literal!%2\;jsr $26,($27),%1\t\t!lituse_!%2\;ldah $29,0($26)\t\t!gpdisp!%*\;lda $29,0($29)\t\t!gpdisp!%*"
   [(set_attr "type" "jsr")
+   (set_attr "cannot_copy" "true")
(set_attr "length" "16")])
 
 ;; We must use peep2 instead of a split because we need accurate life
diff --git a/gcc/testsuite/gcc.target/alpha/pr115526.c b/gcc/testsuite/gcc.target/alpha/pr115526.c
new file mode 100644
index ..2f57903fec34
--- /dev/null
+++ b/gcc/testsuite/gcc.target/alpha/pr115526.c
@@ -0,0 +1,46 @@
+/* PR target/115526 */
+/* { dg-do assemble } */
+/* { dg-options "-O2 -Wno-attributes -fvisibility=hidden -fPIC -mcpu=ev4" } */
+
+struct _ts {
+  struct _dtoa_state *interp;
+};
+struct Bigint {
+  int k;
+} *_Py_dg_strtod_bs;
+struct _dtoa_state {
+  struct Bigint p5s;
+  struct Bigint *freelist[];
+};
+extern _Thread_local struct _ts _Py_tss_tstate;
+typedef struct Bigint Bigint;
+int pow5mult_k;
+long _Py_dg_strtod_ndigits;
+void PyMem_Free();
+void Bfree(Bigint *v) {
+  if (v)
+{
+  if (v->k)
+   PyMem_Free();
+  else {
+   struct _dtoa_state *interp = _Py_tss_tstate.interp;
+   interp->freelist[v->k] = v;
+  }
+}
+}
+static Bigint *pow5mult(Bigint *b) {
+  for (;;) {
+if (pow5mult_k & 1) {
+  Bfree(b);
+  if (b == 0)
+return 0;
+}
+if (!(pow5mult_k >>= 1))
+  break;
+  }
+  return 0;
+}
+void _Py_dg_strtod() {
+  if (_Py_dg_strtod_ndigits)
+pow5mult(_Py_dg_strtod_bs);
+}


[gcc r14-10448] alpha: Fix duplicate !tlsgd!62 assemble error [PR115526]

2024-07-17 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:3a963d441a68797956a5f67dcb351b2dbd4ac1d0

commit r14-10448-g3a963d441a68797956a5f67dcb351b2dbd4ac1d0
Author: Uros Bizjak 
Date:   Wed Jul 17 18:11:26 2024 +0200

alpha: Fix duplicate !tlsgd!62 assemble error [PR115526]

Add missing "cannot_copy" attribute to instructions that have to
stay in 1-1 correspondence with another insn.

PR target/115526

gcc/ChangeLog:

* config/alpha/alpha.md (movdi_er_high_g): Add cannot_copy 
attribute.
(movdi_er_tlsgd): Ditto.
(movdi_er_tlsldm): Ditto.
(call_value_osf_): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/alpha/pr115526.c: New test.

(cherry picked from commit 0841fd4c42ab053be951b7418233f0478282d020)

Diff:
---
 gcc/config/alpha/alpha.md | 10 +--
 gcc/testsuite/gcc.target/alpha/pr115526.c | 46 +++
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 1e2de5a4d15b..bd92392878e2 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -3902,7 +3902,8 @@
   else
 return "ldq %0,%2(%1)\t\t!literal!%3";
 }
-  [(set_attr "type" "ldsym")])
+  [(set_attr "type" "ldsym")
+   (set_attr "cannot_copy" "true")])
 
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -3926,7 +3927,8 @@
 return "lda %0,%2(%1)\t\t!tlsgd";
   else
 return "lda %0,%2(%1)\t\t!tlsgd!%3";
-})
+}
+  [(set_attr "cannot_copy" "true")])
 
 (define_insn "movdi_er_tlsldm"
   [(set (match_operand:DI 0 "register_operand" "=r")
@@ -3939,7 +3941,8 @@
 return "lda %0,%&(%1)\t\t!tlsldm";
   else
 return "lda %0,%&(%1)\t\t!tlsldm!%2";
-})
+}
+  [(set_attr "cannot_copy" "true")])
 
 (define_insn "*movdi_er_gotdtp"
   [(set (match_operand:DI 0 "register_operand" "=r")
@@ -5908,6 +5911,7 @@
   "HAVE_AS_TLS"
  "ldq $27,%1($29)\t\t!literal!%2\;jsr $26,($27),%1\t\t!lituse_!%2\;ldah $29,0($26)\t\t!gpdisp!%*\;lda $29,0($29)\t\t!gpdisp!%*"
   [(set_attr "type" "jsr")
+   (set_attr "cannot_copy" "true")
(set_attr "length" "16")])
 
 ;; We must use peep2 instead of a split because we need accurate life
diff --git a/gcc/testsuite/gcc.target/alpha/pr115526.c b/gcc/testsuite/gcc.target/alpha/pr115526.c
new file mode 100644
index ..2f57903fec34
--- /dev/null
+++ b/gcc/testsuite/gcc.target/alpha/pr115526.c
@@ -0,0 +1,46 @@
+/* PR target/115526 */
+/* { dg-do assemble } */
+/* { dg-options "-O2 -Wno-attributes -fvisibility=hidden -fPIC -mcpu=ev4" } */
+
+struct _ts {
+  struct _dtoa_state *interp;
+};
+struct Bigint {
+  int k;
+} *_Py_dg_strtod_bs;
+struct _dtoa_state {
+  struct Bigint p5s;
+  struct Bigint *freelist[];
+};
+extern _Thread_local struct _ts _Py_tss_tstate;
+typedef struct Bigint Bigint;
+int pow5mult_k;
+long _Py_dg_strtod_ndigits;
+void PyMem_Free();
+void Bfree(Bigint *v) {
+  if (v)
+{
+  if (v->k)
+   PyMem_Free();
+  else {
+   struct _dtoa_state *interp = _Py_tss_tstate.interp;
+   interp->freelist[v->k] = v;
+  }
+}
+}
+static Bigint *pow5mult(Bigint *b) {
+  for (;;) {
+if (pow5mult_k & 1) {
+  Bfree(b);
+  if (b == 0)
+return 0;
+}
+if (!(pow5mult_k >>= 1))
+  break;
+  }
+  return 0;
+}
+void _Py_dg_strtod() {
+  if (_Py_dg_strtod_ndigits)
+pow5mult(_Py_dg_strtod_bs);
+}


[committed] alpha: Fix duplicate !tlsgd!62 assemble error [PR115526]

2024-07-17 Thread Uros Bizjak
Add missing "cannot_copy" attribute to instructions that have to
stay in 1-1 correspondence with another insn.

PR target/115526

gcc/ChangeLog:

* config/alpha/alpha.md (movdi_er_high_g): Add cannot_copy attribute.
(movdi_er_tlsgd): Ditto.
(movdi_er_tlsldm): Ditto.
(call_value_osf_): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/alpha/pr115526.c: New test.

Tested by Maciej on Alpha/Linux target and reported in the PR.

Uros.
diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 1e2de5a4d15..bd92392878e 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -3902,7 +3902,8 @@ (define_insn "movdi_er_high_g"
   else
 return "ldq %0,%2(%1)\t\t!literal!%3";
 }
-  [(set_attr "type" "ldsym")])
+  [(set_attr "type" "ldsym")
+   (set_attr "cannot_copy" "true")])
 
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -3926,7 +3927,8 @@ (define_insn "movdi_er_tlsgd"
 return "lda %0,%2(%1)\t\t!tlsgd";
   else
 return "lda %0,%2(%1)\t\t!tlsgd!%3";
-})
+}
+  [(set_attr "cannot_copy" "true")])
 
 (define_insn "movdi_er_tlsldm"
   [(set (match_operand:DI 0 "register_operand" "=r")
@@ -3939,7 +3941,8 @@ (define_insn "movdi_er_tlsldm"
 return "lda %0,%&(%1)\t\t!tlsldm";
   else
 return "lda %0,%&(%1)\t\t!tlsldm!%2";
-})
+}
+  [(set_attr "cannot_copy" "true")])
 
 (define_insn "*movdi_er_gotdtp"
   [(set (match_operand:DI 0 "register_operand" "=r")
@@ -5908,6 +5911,7 @@ (define_insn "call_value_osf_"
   "HAVE_AS_TLS"
  "ldq $27,%1($29)\t\t!literal!%2\;jsr $26,($27),%1\t\t!lituse_!%2\;ldah $29,0($26)\t\t!gpdisp!%*\;lda $29,0($29)\t\t!gpdisp!%*"
   [(set_attr "type" "jsr")
+   (set_attr "cannot_copy" "true")
(set_attr "length" "16")])
 
 ;; We must use peep2 instead of a split because we need accurate life
diff --git a/gcc/testsuite/gcc.target/alpha/pr115526.c b/gcc/testsuite/gcc.target/alpha/pr115526.c
new file mode 100644
index 000..2f57903fec3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/alpha/pr115526.c
@@ -0,0 +1,46 @@
+/* PR target/115526 */
+/* { dg-do assemble } */
+/* { dg-options "-O2 -Wno-attributes -fvisibility=hidden -fPIC -mcpu=ev4" } */
+
+struct _ts {
+  struct _dtoa_state *interp;
+};
+struct Bigint {
+  int k;
+} *_Py_dg_strtod_bs;
+struct _dtoa_state {
+  struct Bigint p5s;
+  struct Bigint *freelist[];
+};
+extern _Thread_local struct _ts _Py_tss_tstate;
+typedef struct Bigint Bigint;
+int pow5mult_k;
+long _Py_dg_strtod_ndigits;
+void PyMem_Free();
+void Bfree(Bigint *v) {
+  if (v)
+{
+  if (v->k)
+   PyMem_Free();
+  else {
+   struct _dtoa_state *interp = _Py_tss_tstate.interp;
+   interp->freelist[v->k] = v;
+  }
+}
+}
+static Bigint *pow5mult(Bigint *b) {
+  for (;;) {
+if (pow5mult_k & 1) {
+  Bfree(b);
+  if (b == 0)
+return 0;
+}
+if (!(pow5mult_k >>= 1))
+  break;
+  }
+  return 0;
+}
+void _Py_dg_strtod() {
+  if (_Py_dg_strtod_ndigits)
+pow5mult(_Py_dg_strtod_bs);
+}
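
Why cannot_copy matters here: each `!tlsgd!N` (and its companion `!lituse_…!N`) sequence number must be emitted by exactly one producer insn paired with one consumer; when RTL duplication copies the `lda`, the assembler sees the same `!tlsgd!62` twice and rejects the file. A toy checker modeling that uniqueness invariant (illustrative only, not GCC or assembler code):

```c
#include <assert.h>

/* Each relocation sequence number must appear on exactly one producer.
   Returns 1 when the list of producer sequence numbers is duplicate-free.  */
static int
sequence_numbers_unique (const int *seq, int n)
{
  for (int i = 0; i < n; i++)
    for (int j = i + 1; j < n; j++)
      if (seq[i] == seq[j])
        return 0;   /* e.g. a duplicated !tlsgd!62 -> assemble error */
  return 1;
}
```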


[gcc r15-2104] alpha: Fix duplicate !tlsgd!62 assemble error [PR115526]

2024-07-17 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:0841fd4c42ab053be951b7418233f0478282d020

commit r15-2104-g0841fd4c42ab053be951b7418233f0478282d020
Author: Uros Bizjak 
Date:   Wed Jul 17 18:11:26 2024 +0200

alpha: Fix duplicate !tlsgd!62 assemble error [PR115526]

Add missing "cannot_copy" attribute to instructions that have to
stay in 1-1 correspondence with another insn.

PR target/115526

gcc/ChangeLog:

* config/alpha/alpha.md (movdi_er_high_g): Add cannot_copy 
attribute.
(movdi_er_tlsgd): Ditto.
(movdi_er_tlsldm): Ditto.
(call_value_osf_): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/alpha/pr115526.c: New test.

Diff:
---
 gcc/config/alpha/alpha.md | 10 +--
 gcc/testsuite/gcc.target/alpha/pr115526.c | 46 +++
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 1e2de5a4d15b..bd92392878e2 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -3902,7 +3902,8 @@
   else
 return "ldq %0,%2(%1)\t\t!literal!%3";
 }
-  [(set_attr "type" "ldsym")])
+  [(set_attr "type" "ldsym")
+   (set_attr "cannot_copy" "true")])
 
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -3926,7 +3927,8 @@
 return "lda %0,%2(%1)\t\t!tlsgd";
   else
 return "lda %0,%2(%1)\t\t!tlsgd!%3";
-})
+}
+  [(set_attr "cannot_copy" "true")])
 
 (define_insn "movdi_er_tlsldm"
   [(set (match_operand:DI 0 "register_operand" "=r")
@@ -3939,7 +3941,8 @@
 return "lda %0,%&(%1)\t\t!tlsldm";
   else
 return "lda %0,%&(%1)\t\t!tlsldm!%2";
-})
+}
+  [(set_attr "cannot_copy" "true")])
 
 (define_insn "*movdi_er_gotdtp"
   [(set (match_operand:DI 0 "register_operand" "=r")
@@ -5908,6 +5911,7 @@
   "HAVE_AS_TLS"
  "ldq $27,%1($29)\t\t!literal!%2\;jsr $26,($27),%1\t\t!lituse_!%2\;ldah $29,0($26)\t\t!gpdisp!%*\;lda $29,0($29)\t\t!gpdisp!%*"
   [(set_attr "type" "jsr")
+   (set_attr "cannot_copy" "true")
(set_attr "length" "16")])
 
 ;; We must use peep2 instead of a split because we need accurate life
diff --git a/gcc/testsuite/gcc.target/alpha/pr115526.c b/gcc/testsuite/gcc.target/alpha/pr115526.c
new file mode 100644
index ..2f57903fec34
--- /dev/null
+++ b/gcc/testsuite/gcc.target/alpha/pr115526.c
@@ -0,0 +1,46 @@
+/* PR target/115526 */
+/* { dg-do assemble } */
+/* { dg-options "-O2 -Wno-attributes -fvisibility=hidden -fPIC -mcpu=ev4" } */
+
+struct _ts {
+  struct _dtoa_state *interp;
+};
+struct Bigint {
+  int k;
+} *_Py_dg_strtod_bs;
+struct _dtoa_state {
+  struct Bigint p5s;
+  struct Bigint *freelist[];
+};
+extern _Thread_local struct _ts _Py_tss_tstate;
+typedef struct Bigint Bigint;
+int pow5mult_k;
+long _Py_dg_strtod_ndigits;
+void PyMem_Free();
+void Bfree(Bigint *v) {
+  if (v)
+{
+  if (v->k)
+   PyMem_Free();
+  else {
+   struct _dtoa_state *interp = _Py_tss_tstate.interp;
+   interp->freelist[v->k] = v;
+  }
+}
+}
+static Bigint *pow5mult(Bigint *b) {
+  for (;;) {
+if (pow5mult_k & 1) {
+  Bfree(b);
+  if (b == 0)
+return 0;
+}
+if (!(pow5mult_k >>= 1))
+  break;
+  }
+  return 0;
+}
+void _Py_dg_strtod() {
+  if (_Py_dg_strtod_ndigits)
+pow5mult(_Py_dg_strtod_bs);
+}


Re: [PATCH] [x86][avx512] Optimize maskstore when mask is 0 or -1 in UNSPEC_MASKMOV

2024-07-17 Thread Uros Bizjak
On Wed, Jul 17, 2024 at 8:54 AM Liu, Hongtao  wrote:
>
>
>
> > -Original Message-
> > From: Uros Bizjak 
> > Sent: Wednesday, July 17, 2024 2:52 PM
> > To: Liu, Hongtao 
> > Cc: gcc-patches@gcc.gnu.org; crazy...@gmail.com; hjl.to...@gmail.com
> > Subject: Re: [PATCH] [x86][avx512] Optimize maskstore when mask is 0 or -1
> > in UNSPEC_MASKMOV
> >
> > On Wed, Jul 17, 2024 at 3:27 AM liuhongt  wrote:
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > Ready push to trunk.
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/115843
> > > * config/i386/predicates.md (const0_or_m1_operand): New
> > > predicate.
> > > * config/i386/sse.md (*_store_mask_1): New
> > > pre_reload define_insn_and_split.
> > > (V): Add V32BF,V16BF,V8BF.
> > > (V4SF_V8HF): Rename to ..
> > > (V24F_128): .. this.
> > > (*vec_concat): Adjust with V24F_128.
> > > (*vec_concat_0): Ditto.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/pr115843.c: New test.
> > > ---
> > >  gcc/config/i386/predicates.md|  5 
> > >  gcc/config/i386/sse.md   | 32 
> > >  gcc/testsuite/gcc.target/i386/pr115843.c | 38
> > > 
> > >  3 files changed, 69 insertions(+), 6 deletions(-)  create mode 100644
> > > gcc/testsuite/gcc.target/i386/pr115843.c
> > >
> > > diff --git a/gcc/config/i386/predicates.md
> > > b/gcc/config/i386/predicates.md index 5d0bb1e0f54..680594871de
> > 100644
> > > --- a/gcc/config/i386/predicates.md
> > > +++ b/gcc/config/i386/predicates.md
> > > @@ -825,6 +825,11 @@ (define_predicate "constm1_operand"
> > >(and (match_code "const_int")
> > > (match_test "op == constm1_rtx")))
> > >
> > > +;; Match 0 or -1.
> > > +(define_predicate "const0_or_m1_operand"
> > > +  (ior (match_operand 0 "const0_operand")
> > > +   (match_operand 0 "constm1_operand")))
> > > +
> > >  ;; Match exactly eight.
> > >  (define_predicate "const8_operand"
> > >(and (match_code "const_int")
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > > e44822f705b..e11610f4b88 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -294,6 +294,7 @@ (define_mode_iterator V
> > > (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX")
> > V4SI
> > > (V8DI "TARGET_AVX512F && TARGET_EVEX512")  (V4DI "TARGET_AVX")
> > V2DI
> > > (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF
> > "TARGET_AVX")
> > > V8HF
> > > +   (V32BF "TARGET_AVX512F && TARGET_EVEX512") (V16BF
> > "TARGET_AVX")
> > > + V8BF
> > > (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX")
> > V4SF
> > > (V8DF "TARGET_AVX512F && TARGET_EVEX512")  (V4DF "TARGET_AVX")
> > > (V2DF "TARGET_SSE2")])
> > >
> > > @@ -430,8 +431,8 @@ (define_mode_iterator VFB_512
> > > (V16SF "TARGET_EVEX512")
> > > (V8DF "TARGET_EVEX512")])
> > >
> > > -(define_mode_iterator V4SF_V8HF
> > > -  [V4SF V8HF])
> > > +(define_mode_iterator V24F_128
> > > +  [V4SF V8HF V8BF])
> > >
> > >  (define_mode_iterator VI48_AVX512VL
> > >[(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI
> > > "TARGET_AVX512VL") @@ -11543,8 +11544,8 @@ (define_insn
> > "*vec_concatv2sf_sse"
> > > (set_attr "mode" "V4SF,SF,DI,DI")])
> > >
> > >  (define_insn "*vec_concat"
> > > -  [(set (match_operand:V4SF_V8HF 0 "register_operand"   "=x,v,x,v")
> > > -   (vec_concat:V4SF_V8HF
> > > +  [(set (match_operand:V24F_128 0 "register_operand"   "=x,v,x,v")
> > > +   (vec_concat:V24F_128
> > >   (match_operand: 1 "register_operand" " 
> > > 0,v,0,v")
> > >   (match_operand: 2 "nonim

Re: [PATCH] [x86][avx512] Optimize maskstore when mask is 0 or -1 in UNSPEC_MASKMOV

2024-07-17 Thread Uros Bizjak
On Wed, Jul 17, 2024 at 3:27 AM liuhongt  wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ready push to trunk.
>
> gcc/ChangeLog:
>
> PR target/115843
> * config/i386/predicates.md (const0_or_m1_operand): New
> predicate.
> * config/i386/sse.md (*_store_mask_1): New
> pre_reload define_insn_and_split.
> (V): Add V32BF,V16BF,V8BF.
> (V4SF_V8HF): Rename to ..
> (V24F_128): .. this.
> (*vec_concat): Adjust with V24F_128.
> (*vec_concat_0): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr115843.c: New test.
> ---
>  gcc/config/i386/predicates.md|  5 
>  gcc/config/i386/sse.md   | 32 
>  gcc/testsuite/gcc.target/i386/pr115843.c | 38 
>  3 files changed, 69 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr115843.c
>
> diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
> index 5d0bb1e0f54..680594871de 100644
> --- a/gcc/config/i386/predicates.md
> +++ b/gcc/config/i386/predicates.md
> @@ -825,6 +825,11 @@ (define_predicate "constm1_operand"
>(and (match_code "const_int")
> (match_test "op == constm1_rtx")))
>
> +;; Match 0 or -1.
> +(define_predicate "const0_or_m1_operand"
> +  (ior (match_operand 0 "const0_operand")
> +   (match_operand 0 "constm1_operand")))
> +
>  ;; Match exactly eight.
>  (define_predicate "const8_operand"
>(and (match_code "const_int")
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index e44822f705b..e11610f4b88 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -294,6 +294,7 @@ (define_mode_iterator V
> (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
> (V8DI "TARGET_AVX512F && TARGET_EVEX512")  (V4DI "TARGET_AVX") V2DI
> (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF
> +   (V32BF "TARGET_AVX512F && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF
> (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
> (V8DF "TARGET_AVX512F && TARGET_EVEX512")  (V4DF "TARGET_AVX") (V2DF 
> "TARGET_SSE2")])
>
> @@ -430,8 +431,8 @@ (define_mode_iterator VFB_512
> (V16SF "TARGET_EVEX512")
> (V8DF "TARGET_EVEX512")])
>
> -(define_mode_iterator V4SF_V8HF
> -  [V4SF V8HF])
> +(define_mode_iterator V24F_128
> +  [V4SF V8HF V8BF])
>
>  (define_mode_iterator VI48_AVX512VL
>[(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
> @@ -11543,8 +11544,8 @@ (define_insn "*vec_concatv2sf_sse"
> (set_attr "mode" "V4SF,SF,DI,DI")])
>
>  (define_insn "*vec_concat"
> -  [(set (match_operand:V4SF_V8HF 0 "register_operand"   "=x,v,x,v")
> -   (vec_concat:V4SF_V8HF
> +  [(set (match_operand:V24F_128 0 "register_operand"   "=x,v,x,v")
> +   (vec_concat:V24F_128
>   (match_operand: 1 "register_operand" " 0,v,0,v")
>   (match_operand: 2 "nonimmediate_operand" " 
> x,v,m,m")))]
>"TARGET_SSE"
> @@ -11559,8 +11560,8 @@ (define_insn "*vec_concat"
> (set_attr "mode" "V4SF,V4SF,V2SF,V2SF")])
>
>  (define_insn "*vec_concat_0"
> -  [(set (match_operand:V4SF_V8HF 0 "register_operand"   "=v")
> -   (vec_concat:V4SF_V8HF
> +  [(set (match_operand:V24F_128 0 "register_operand"   "=v")
> +   (vec_concat:V24F_128
>   (match_operand: 1 "nonimmediate_operand" "vm")
>   (match_operand: 2 "const0_operand")))]
>"TARGET_SSE2"
> @@ -28574,6 +28575,25 @@ (define_insn "_store_mask"
> (set_attr "memory" "store")
> (set_attr "mode" "")])
>
> +(define_insn_and_split "*_store_mask_1"
> +  [(set (match_operand:V 0 "memory_operand")
> +   (unspec:V
> + [(match_operand:V 1 "register_operand")
> +  (match_dup 0)
> +  (match_operand: 2 "const0_or_m1_operand")]
> + UNSPEC_MASKMOV))]
> +  "TARGET_AVX512F"

Please add "ix86_pre_reload_split ()" condition to insn constraint for
instructions that have to be split before reload.

Uros.

> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +{
> +  if (constm1_operand (operands[2], mode))
> +  {
> +emit_move_insn (operands[0], operands[1]);
> +DONE;
> +  }
> +})
> +
>  (define_expand "cbranch4"
>[(set (reg:CC FLAGS_REG)
> (compare:CC (match_operand:VI_AVX_AVX512F 1 "register_operand")
> diff --git a/gcc/testsuite/gcc.target/i386/pr115843.c 
> b/gcc/testsuite/gcc.target/i386/pr115843.c
> new file mode 100644
> index 000..00d8605757a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr115843.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512vl --param vect-partial-vector-usage=2 
> -mtune=znver5 -mprefer-vector-width=512" } */
> +/* { dg-final { scan-assembler-not "kxor\[bw]" } } */
> +
> +typedef unsigned long long BITBOARD;
> +BITBOARD KingPressureMask1[64], KingSafetyMask1[64];
> +
> +void __attribute__((noinline))
> +foo()
> 
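For readers following along, the optimization under review — collapsing a masked store whose mask is a compile-time constant of all-zeros or all-ones — is easy to model in scalar C. The sketch below is illustrative of the per-lane UNSPEC_MASKMOV semantics only; the function name and types are invented, not GCC code:

```c
#include <assert.h>
#include <string.h>

/* Scalar model of a masked vector store: lane i of src is written to
   dst only where mask[i] is nonzero (all-ones per lane in the real
   vector instruction).  */
static void maskstore(int *dst, const int *src, const int *mask, int n)
{
    for (int i = 0; i < n; ++i)
        if (mask[i])
            dst[i] = src[i];
}
```

With a constant all-ones mask the loop degenerates to a plain copy, and with a constant zero mask to a no-op — the two cases the new pre-reload define_insn_and_split recognizes via the const0_or_m1_operand predicate.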

Re: [x86 PATCH] Tweak i386-expand.cc to restore bootstrap on RHEL.

2024-07-14 Thread Uros Bizjak
On Sun, Jul 14, 2024 at 3:42 PM Roger Sayle  wrote:
>
>
> This is a minor change to restore bootstrap on systems using gcc 4.8
> as a host compiler.  The fatal error is:
>
> In file included from gcc/gcc/coretypes.h:471:0,
>  from gcc/gcc/config/i386/i386-expand.cc:23:
> gcc/gcc/config/i386/i386-expand.cc: In function 'void
> ix86_expand_fp_absneg_operator(rtx_code, machine_mode, rtx_def**)':
> ./insn-modes.h:315:75: error: temporary of non-literal type
> 'scalar_float_mode' in a constant expression
>  #define HFmode (scalar_float_mode ((scalar_float_mode::from_int) E_HFmode))
>^
> gcc/gcc/config/i386/i386-expand.cc:2179:8: note: in expansion of macro
> 'HFmode'
>case HFmode:
> ^
>
>
> The solution is to use the E_?Fmode enumeration constants as case values
> in switch statements.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures (from this change).  Ok for mainline?
>
>
> 2024-07-14  Roger Sayle  
>
> * config/i386/i386-expand.cc (ix86_expand_fp_absneg_operator):
> Use E_?Fmode enumeration constants in switch statement.
> (ix86_expand_copysign): Likewise.
> (ix86_expand_xorsign): Likewise.

OK, also for backports.

Thanks,
Uros.

>
>
> Thanks in advance,
> Roger
> --
>


Re: [r15-1936 Regression] FAIL: gcc.target/i386/avx512vl-vpmovuswb-2.c execution test on Linux/x86_64

2024-07-10 Thread Uros Bizjak
On Wed, Jul 10, 2024 at 3:42 PM haochen.jiang
 wrote:
>
> On Linux/x86_64,
>
> 80e446e829d818dc19daa6e671b9626e93ee4949 is the first bad commit
> commit 80e446e829d818dc19daa6e671b9626e93ee4949
> Author: Pan Li 
> Date:   Fri Jul 5 20:36:35 2024 +0800
>
> Match: Support form 2 for the .SAT_TRUNC
>
> caused
>
> FAIL: gcc.target/i386/avx512f-vpmovusqb-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpmovusdb-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpmovusdw-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpmovusqb-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpmovusqd-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpmovusqw-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpmovuswb-2.c execution test

This is fixed by [1].

The consequence of a last-minute "impossible-to-fail-so-no-need-to-test" change.

Lesson learned.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656898.html

Uros.


[committed] i386: Swap compare operands in ustrunc patterns

2024-07-10 Thread Uros Bizjak
A last-minute change led to a wrong operand order in the compare insn.

gcc/ChangeLog:

* config/i386/i386.md (ustruncdi2): Swap compare operands.
(ustruncsi2): Ditto.
(ustrunchiqi2): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
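Why the order matters: `cmp` computes op0 - op1 and sets the carry flag on borrow, so the SBB-based sequence needs the comparison written as sat - x to set carry exactly when x exceeds the saturation constant. A minimal C model of the fixed sequence follows; it mirrors, rather than reproduces, the expander, and the helper name is invented:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the fixed expander: the compare is emitted as sat - x, so
   the borrow (x86 carry flag) is set exactly when x > sat.  SBB then
   materializes an all-ones mask that the OR uses to saturate.  */
static uint16_t ustrunc_u32_u16(uint32_t x)
{
    const uint32_t sat = 0xFFFFu;                 /* GET_MODE_MASK of the
                                                     narrow mode */
    uint32_t borrow_mask = (sat < x) ? ~0u : 0u;  /* cmp sat, x ; sbb eax, eax */
    return (uint16_t)(x | borrow_mask);
}
```

With the operands in the original (wrong) order the borrow would be set when x < sat, inverting the mask and saturating exactly the values that should pass through unchanged.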
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e2f30695d70..de9f4ba0496 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9990,7 +9990,7 @@ (define_expand "ustruncdi2"
   rtx sat = force_reg (DImode, GEN_INT (GET_MODE_MASK (mode)));
   rtx dst;
 
-  emit_insn (gen_cmpdi_1 (op1, sat));
+  emit_insn (gen_cmpdi_1 (sat, op1));
 
   if (TARGET_CMOVE)
 {
@@ -10026,7 +10026,7 @@ (define_expand "ustruncsi2"
   rtx sat = force_reg (SImode, GEN_INT (GET_MODE_MASK (mode)));
   rtx dst;
 
-  emit_insn (gen_cmpsi_1 (op1, sat));
+  emit_insn (gen_cmpsi_1 (sat, op1));
 
   if (TARGET_CMOVE)
 {
@@ -10062,7 +10062,7 @@ (define_expand "ustrunchiqi2"
   rtx sat = force_reg (HImode, GEN_INT (GET_MODE_MASK (QImode)));
   rtx dst;
 
-  emit_insn (gen_cmphi_1 (op1, sat));
+  emit_insn (gen_cmphi_1 (sat, op1));
 
   if (TARGET_CMOVE)
 {


[gcc r15-1954] i386: Swap compare operands in ustrunc patterns

2024-07-10 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:aae535f3a870659d1f002f82bd585de0bcec7905

commit r15-1954-gaae535f3a870659d1f002f82bd585de0bcec7905
Author: Uros Bizjak 
Date:   Wed Jul 10 23:00:00 2024 +0200

i386: Swap compare operands in ustrunc patterns

A last minute change led to a wrong operand order in the compare insn.

gcc/ChangeLog:

* config/i386/i386.md (ustruncdi2): Swap compare operands.
(ustruncsi2): Ditto.
(ustrunchiqi2): Ditto.

Diff:
---
 gcc/config/i386/i386.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e2f30695d70e..de9f4ba04962 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9990,7 +9990,7 @@
   rtx sat = force_reg (DImode, GEN_INT (GET_MODE_MASK (mode)));
   rtx dst;
 
-  emit_insn (gen_cmpdi_1 (op1, sat));
+  emit_insn (gen_cmpdi_1 (sat, op1));
 
   if (TARGET_CMOVE)
 {
@@ -10026,7 +10026,7 @@
   rtx sat = force_reg (SImode, GEN_INT (GET_MODE_MASK (mode)));
   rtx dst;
 
-  emit_insn (gen_cmpsi_1 (op1, sat));
+  emit_insn (gen_cmpsi_1 (sat, op1));
 
   if (TARGET_CMOVE)
 {
@@ -10062,7 +10062,7 @@
   rtx sat = force_reg (HImode, GEN_INT (GET_MODE_MASK (QImode)));
   rtx dst;
 
-  emit_insn (gen_cmphi_1 (op1, sat));
+  emit_insn (gen_cmphi_1 (sat, op1));
 
   if (TARGET_CMOVE)
 {


[gcc r11-11568] middle-end: Fix stalled swapped condition code value [PR115836]

2024-07-10 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:d67566cefe7325998cc2471a28e9d3a3016455a0

commit r11-11568-gd67566cefe7325998cc2471a28e9d3a3016455a0
Author: Uros Bizjak 
Date:   Wed Jul 10 09:27:27 2024 +0200

middle-end: Fix stalled swapped condition code value [PR115836]

emit_store_flag_1 calculates scode (the swapped condition code) at the
beginning of the function from the value of the code variable.  However,
code may change before the scode use site, leaving scode with an
invalid, stale value.

Move the calculation of scode to just before its only use site to
avoid using a stale value.

PR middle-end/115836

gcc/ChangeLog:

* expmed.c (emit_store_flag_1): Move calculation of
scode just before its only usage site.

(cherry picked from commit 44933fdeb338e00c972e42224b9a83d3f8f6a757)

Diff:
---
 gcc/expmed.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/expmed.c b/gcc/expmed.c
index 3143f38e0570..2c916eab43b6 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -5589,11 +5589,9 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
   enum insn_code icode;
   machine_mode compare_mode;
   enum mode_class mclass;
-  enum rtx_code scode;
 
   if (unsignedp)
 code = unsigned_condition (code);
-  scode = swap_condition (code);
 
   /* If one operand is constant, make it the second one.  Only do this
  if the other operand is not constant as well.  */
@@ -5761,6 +5759,8 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
 
  if (GET_MODE_CLASS (mode) == MODE_FLOAT)
{
+ enum rtx_code scode = swap_condition (code);
+
  tem = emit_cstore (target, icode, scode, mode, compare_mode,
 unsignedp, op1, op0, normalizep, target_mode);
  if (tem)


[gcc r12-10610] middle-end: Fix stalled swapped condition code value [PR115836]

2024-07-10 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:10904e051f1b970cd8e030dff7dec8374c946b12

commit r12-10610-g10904e051f1b970cd8e030dff7dec8374c946b12
Author: Uros Bizjak 
Date:   Wed Jul 10 09:27:27 2024 +0200

middle-end: Fix stalled swapped condition code value [PR115836]

emit_store_flag_1 calculates scode (the swapped condition code) at the
beginning of the function from the value of the code variable.  However,
code may change before the scode use site, leaving scode with an
invalid, stale value.

Move the calculation of scode to just before its only use site to
avoid using a stale value.

PR middle-end/115836

gcc/ChangeLog:

* expmed.cc (emit_store_flag_1): Move calculation of
scode just before its only usage site.

(cherry picked from commit 44933fdeb338e00c972e42224b9a83d3f8f6a757)

Diff:
---
 gcc/expmed.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 1bb4da8d094e..39e53faec70e 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -5601,11 +5601,9 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
   enum insn_code icode;
   machine_mode compare_mode;
   enum mode_class mclass;
-  enum rtx_code scode;
 
   if (unsignedp)
 code = unsigned_condition (code);
-  scode = swap_condition (code);
 
   /* If one operand is constant, make it the second one.  Only do this
  if the other operand is not constant as well.  */
@@ -5773,6 +5771,8 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
 
  if (GET_MODE_CLASS (mode) == MODE_FLOAT)
{
+ enum rtx_code scode = swap_condition (code);
+
  tem = emit_cstore (target, icode, scode, mode, compare_mode,
 unsignedp, op1, op0, normalizep, target_mode);
  if (tem)


[gcc r13-8903] middle-end: Fix stalled swapped condition code value [PR115836]

2024-07-10 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:cc47ad09e571016f498710fbd8a19f302c9004de

commit r13-8903-gcc47ad09e571016f498710fbd8a19f302c9004de
Author: Uros Bizjak 
Date:   Wed Jul 10 09:27:27 2024 +0200

middle-end: Fix stalled swapped condition code value [PR115836]

emit_store_flag_1 calculates scode (the swapped condition code) at the
beginning of the function from the value of the code variable.  However,
code may change before the scode use site, leaving scode with an
invalid, stale value.

Move the calculation of scode to just before its only use site to
avoid using a stale value.

PR middle-end/115836

gcc/ChangeLog:

* expmed.cc (emit_store_flag_1): Move calculation of
scode just before its only usage site.

(cherry picked from commit 44933fdeb338e00c972e42224b9a83d3f8f6a757)

Diff:
---
 gcc/expmed.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 1553ea8e31eb..e06cdd47e9e6 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -5607,11 +5607,9 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
   enum insn_code icode;
   machine_mode compare_mode;
   enum mode_class mclass;
-  enum rtx_code scode;
 
   if (unsignedp)
 code = unsigned_condition (code);
-  scode = swap_condition (code);
 
   /* If one operand is constant, make it the second one.  Only do this
  if the other operand is not constant as well.  */
@@ -5726,6 +5724,8 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
 
  if (GET_MODE_CLASS (mode) == MODE_FLOAT)
{
+ enum rtx_code scode = swap_condition (code);
+
  tem = emit_cstore (target, icode, scode, mode, compare_mode,
 unsignedp, op1, op0, normalizep, target_mode);
  if (tem)


[gcc r14-10404] middle-end: Fix stalled swapped condition code value [PR115836]

2024-07-10 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:47a8b464d2dd9a586a9e15242c9825e39e1ecd4c

commit r14-10404-g47a8b464d2dd9a586a9e15242c9825e39e1ecd4c
Author: Uros Bizjak 
Date:   Wed Jul 10 09:27:27 2024 +0200

middle-end: Fix stalled swapped condition code value [PR115836]

emit_store_flag_1 calculates scode (the swapped condition code) at the
beginning of the function from the value of the code variable.  However,
code may change before the scode use site, leaving scode with an
invalid, stale value.

Move the calculation of scode to just before its only use site to
avoid using a stale value.

PR middle-end/115836

gcc/ChangeLog:

* expmed.cc (emit_store_flag_1): Move calculation of
scode just before its only usage site.

(cherry picked from commit 44933fdeb338e00c972e42224b9a83d3f8f6a757)

Diff:
---
 gcc/expmed.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 4ec035e4843b..19765311b954 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -5617,11 +5617,9 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
   enum insn_code icode;
   machine_mode compare_mode;
   enum mode_class mclass;
-  enum rtx_code scode;
 
   if (unsignedp)
 code = unsigned_condition (code);
-  scode = swap_condition (code);
 
   /* If one operand is constant, make it the second one.  Only do this
  if the other operand is not constant as well.  */
@@ -5736,6 +5734,8 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
 
  if (GET_MODE_CLASS (mode) == MODE_FLOAT)
{
+ enum rtx_code scode = swap_condition (code);
+
  tem = emit_cstore (target, icode, scode, mode, compare_mode,
 unsignedp, op1, op0, normalizep, target_mode);
  if (tem)


[gcc r15-1939] middle-end: Fix stalled swapped condition code value [PR115836]

2024-07-10 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:44933fdeb338e00c972e42224b9a83d3f8f6a757

commit r15-1939-g44933fdeb338e00c972e42224b9a83d3f8f6a757
Author: Uros Bizjak 
Date:   Wed Jul 10 09:27:27 2024 +0200

middle-end: Fix stalled swapped condition code value [PR115836]

emit_store_flag_1 calculates scode (the swapped condition code) at the
beginning of the function from the value of the code variable.  However,
code may change before the scode use site, leaving scode with an
invalid, stale value.

Move the calculation of scode to just before its only use site to
avoid using a stale value.

PR middle-end/115836

gcc/ChangeLog:

* expmed.cc (emit_store_flag_1): Move calculation of
scode just before its only usage site.

Diff:
---
 gcc/expmed.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 8bbbc94a98cb..154964bd0687 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -5632,11 +5632,9 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
   enum insn_code icode;
   machine_mode compare_mode;
   enum mode_class mclass;
-  enum rtx_code scode;
 
   if (unsignedp)
 code = unsigned_condition (code);
-  scode = swap_condition (code);
 
   /* If one operand is constant, make it the second one.  Only do this
  if the other operand is not constant as well.  */
@@ -5751,6 +5749,8 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
 
  if (GET_MODE_CLASS (mode) == MODE_FLOAT)
{
+ enum rtx_code scode = swap_condition (code);
+
  tem = emit_cstore (target, icode, scode, mode, compare_mode,
 unsignedp, op1, op0, normalizep, target_mode);
  if (tem)


[PATCH] middle-end: Fix stalled swapped condition code value [PR115836]

2024-07-10 Thread Uros Bizjak
emit_store_flag_1 calculates scode (the swapped condition code) at the
beginning of the function from the value of the code variable.  However,
code may change before the scode use site, leaving scode with an
invalid, stale value.

Move the calculation of scode to just before its only use site to
avoid using a stale value.

PR middle-end/115836

gcc/ChangeLog:

* expmed.cc (emit_store_flag_1): Move calculation of
scode just before its only usage site.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Also tested with original and minimized preprocessed source.
Unfortunately, even with the minimized source, the compilation takes
~5 minutes, and IMO such a trivial fix does not warrant that much
resource consumption.

OK for master and release branches?

Uros.
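The underlying bug pattern is generic: a value derived from a mutable variable is captured before the variable's last write. A minimal, self-contained C sketch of both shapes follows — the names mirror the GCC variables, but the types and logic are invented for illustration:

```c
#include <assert.h>

/* Illustrative two-value stand-in for GCC's rtx_code.  */
typedef enum { CODE_LT, CODE_GT } code_t;

static code_t swap_condition(code_t c)
{
    return c == CODE_LT ? CODE_GT : CODE_LT;
}

/* Buggy shape: scode is derived up front, then code is rewritten.  */
static code_t scode_computed_early(code_t code)
{
    code_t scode = swap_condition(code);  /* computed too early */
    code = CODE_GT;                       /* code changes afterwards */
    (void)code;
    return scode;                         /* stale w.r.t. final code */
}

/* Fixed shape: scode is derived at its only use site.  */
static code_t scode_computed_at_use(code_t code)
{
    code = CODE_GT;                       /* same rewrite as above */
    return swap_condition(code);          /* always consistent */
}
```

Calling both with the same input shows the divergence the patch eliminates: the early computation can disagree with the value derived at the use site.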
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 8bbbc94a98c..154964bd068 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -5632,11 +5632,9 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
   enum insn_code icode;
   machine_mode compare_mode;
   enum mode_class mclass;
-  enum rtx_code scode;
 
   if (unsignedp)
 code = unsigned_condition (code);
-  scode = swap_condition (code);
 
   /* If one operand is constant, make it the second one.  Only do this
  if the other operand is not constant as well.  */
@@ -5751,6 +5749,8 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
 
  if (GET_MODE_CLASS (mode) == MODE_FLOAT)
{
+ enum rtx_code scode = swap_condition (code);
+
  tem = emit_cstore (target, icode, scode, mode, compare_mode,
 unsignedp, op1, op0, normalizep, target_mode);
  if (tem)


Re: [PATCH] [alpha] adjust MEM alignment for block move [PR115459] (was: Re: [PATCH v2] [PR100106] Reject unaligned subregs when strict alignment is required)

2024-07-10 Thread Uros Bizjak
On Thu, Jun 13, 2024 at 9:37 AM Alexandre Oliva  wrote:
>
> Hello, Maciej,
>
> On Jun 12, 2024, "Maciej W. Rozycki"  wrote:
>
> >  This has regressed building the `alpha-linux-gnu' target, in libada, as
> > from commit d6b756447cd5 including GCC 14 and up to current GCC 15 trunk:
>
> > | Error detected around g-debpoo.adb:1896:8|
>
> > I have filed PR #115459.
>
> Thanks!
>
> This was tricky to duplicate without access to an alpha-linux-gnu
> machine.  I ended up building an uberbaum tree with --disable-shared
> --disable-threads --enable-languages=ada up to all-target-libgcc, then I
> replaced gcc/collect2 with a wrapper script that dropped crt[1in].o and
> -lc, so that link tests in libada/configure would succeed without glibc
> for the target.  libada still wouldn't build, because of the missing
> glibc headers, but I could compile g-depboo.adb with -I pointing at a
> x86_64-linux-gnu's gcc/ada/rts build tree, and with that, at -O2, I
> could trigger the problem and investigate it.  And with the following
> patch, the problem seems to be gone.
>
> Maciej, would you be so kind as to give it a spin with a native
> regstrap?  TIA,
>
> Richard, is this ok to install if regstrapping succeeds?
>
>
> Before issuing loads or stores for a block move, adjust the MEM
> alignments if analysis of the addresses enabled the inference of
> stricter alignment.  This ensures that the MEMs are sufficiently
> aligned for the corresponding insns, which avoids trouble in case of
> e.g. substitutions into SUBREGs.
>
>
> for  gcc/ChangeLog
>
> PR target/115459
> * config/alpha/alpha.cc (alpha_expand_block_move): Adjust
> MEMs to match inferred alignment.

LGTM, based on a successful bootstrap/regtest report down the reply thread.

Thanks,
Uros.

> ---
>  gcc/config/alpha/alpha.cc |   12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
> index 1126cea1f7ba2..e090e74b9d073 100644
> --- a/gcc/config/alpha/alpha.cc
> +++ b/gcc/config/alpha/alpha.cc
> @@ -3820,6 +3820,12 @@ alpha_expand_block_move (rtx operands[])
>else if (a >= 16 && c % 2 == 0)
> src_align = 16;
> }
> +
> +  if (MEM_P (orig_src) && MEM_ALIGN (orig_src) < src_align)
> +   {
> + orig_src = shallow_copy_rtx (orig_src);
> + set_mem_align (orig_src, src_align);
> +   }
>  }
>
>tmp = XEXP (orig_dst, 0);
> @@ -3841,6 +3847,12 @@ alpha_expand_block_move (rtx operands[])
>else if (a >= 16 && c % 2 == 0)
> dst_align = 16;
> }
> +
> +  if (MEM_P (orig_dst) && MEM_ALIGN (orig_dst) < dst_align)
> +   {
> + orig_dst = shallow_copy_rtx (orig_dst);
> + set_mem_align (orig_dst, dst_align);
> +   }
>  }
>
>ofs = 0;
>
>
> --
> Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive
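The alignment inference feeding the new set_mem_align calls follows the shape visible in the hunk: given a base alignment and a constant offset, pick the strongest alignment both agree on. The quoted diff shows only the 16-bit rung, so the fuller ladder below is an assumption, sketched in stand-alone C purely for illustration (not the GCC code):

```c
#include <assert.h>

/* Given a provable base alignment `a` (in bits) and a constant byte
   offset `c`, return the alignment provable for base+c, following the
   a >= N && c % (N/8) == 0 ladder seen in alpha_expand_block_move.
   The 32- and 64-bit rungs are assumed by analogy with the visible
   16-bit case.  */
static unsigned provable_align_bits(unsigned a, long c)
{
    if (a >= 64 && c % 8 == 0)
        return 64;
    if (a >= 32 && c % 4 == 0)
        return 32;
    if (a >= 16 && c % 2 == 0)
        return 16;
    return 8;
}
```

The patch then records this stronger alignment on a shallow copy of the MEM, so later substitutions into SUBREGs see an alignment consistent with the loads and stores actually emitted.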


[committed] i386: Implement .SAT_TRUNC for unsigned integers

2024-07-09 Thread Uros Bizjak
The following testcase:

unsigned short foo (unsigned int x)
{
  _Bool overflow = x > (unsigned int)(unsigned short)(-1);
  return ((unsigned short)x | (unsigned short)-overflow);
}

currently compiles (-O2) to:

foo:
xorl%eax, %eax
cmpl$65535, %edi
seta%al
negl%eax
orl%edi, %eax
ret

We can expand through ustrunc{m}{n}2 optab to use carry flag from the
comparison and generate code using SBB:

foo:
cmpl$65535, %edi
sbbl%eax, %eax
orl%edi, %eax
ret

or CMOV instruction:

foo:
movl$65535, %eax
cmpl%eax, %edi
cmovnc%edi, %eax
ret

gcc/ChangeLog:

* config/i386/i386.md (@cmp_1): Use SWI mode iterator.
(ustruncdi2): New expander.
(ustruncsi2): Ditto.
(ustrunchiqi2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/sattrunc-1.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
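For reference, the scalar semantics of the .SAT_TRUNC internal function being expanded here: clamp to the narrower type's maximum, then truncate. A stand-alone C version of the same shape as the testcase above (function name chosen for this sketch):

```c
#include <stdint.h>

/* Unsigned saturating truncation from 32 to 16 bits: values above
   UINT16_MAX collapse to UINT16_MAX instead of wrapping.  */
static uint16_t sat_trunc_u32_u16(uint32_t x)
{
    _Bool overflow = x > (uint32_t)UINT16_MAX;
    /* (uint16_t)-overflow is 0xFFFF when overflowing, 0 otherwise,
       so the OR either saturates or leaves the truncated value alone.  */
    return (uint16_t)x | (uint16_t)-overflow;
}
```

This branch-free formulation is what lets the backend pick either the SBB mask sequence or a CMOV, as shown in the assembly above.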
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 214cb2e239a..e2f30695d70 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1533,8 +1533,8 @@ (define_insn "@ccmp"
 
 (define_expand "@cmp_1"
   [(set (reg:CC FLAGS_REG)
-   (compare:CC (match_operand:SWI48 0 "nonimmediate_operand")
-   (match_operand:SWI48 1 "")))])
+   (compare:CC (match_operand:SWI 0 "nonimmediate_operand")
+   (match_operand:SWI 1 "")))])
 
 (define_mode_iterator SWI1248_AVX512BWDQ_64
   [(QI "TARGET_AVX512DQ") HI
@@ -9981,6 +9981,114 @@ (define_expand "ussub3"
   DONE;
 })
 
+(define_expand "ustruncdi2"
+  [(set (match_operand:SWI124 0 "register_operand")
+   (us_truncate:DI (match_operand:DI 1 "nonimmediate_operand")))]
+  "TARGET_64BIT"
+{
+  rtx op1 = force_reg (DImode, operands[1]);
+  rtx sat = force_reg (DImode, GEN_INT (GET_MODE_MASK (mode)));
+  rtx dst;
+
+  emit_insn (gen_cmpdi_1 (op1, sat));
+
+  if (TARGET_CMOVE)
+{
+  rtx cmp = gen_rtx_GEU (VOIDmode, gen_rtx_REG (CCCmode, FLAGS_REG),
+const0_rtx);
+
+  dst = force_reg (mode, operands[0]);
+  emit_insn (gen_movsicc (gen_lowpart (SImode, dst), cmp,
+ gen_lowpart (SImode, op1),
+ gen_lowpart (SImode, sat)));
+}
+  else
+{
+  rtx msk = gen_reg_rtx (mode);
+
+  emit_insn (gen_x86_movcc_0_m1_neg (msk));
+  dst = expand_simple_binop (mode, IOR,
+gen_lowpart (mode, op1), msk,
+operands[0], 1, OPTAB_WIDEN);
+}
+
+  if (!rtx_equal_p (dst, operands[0]))
+emit_move_insn (operands[0], dst);
+  DONE;
+})
+
+(define_expand "ustruncsi2"
+  [(set (match_operand:SWI12 0 "register_operand")
+   (us_truncate:SI (match_operand:SI 1 "nonimmediate_operand")))]
+  ""
+{
+  rtx op1 = force_reg (SImode, operands[1]);
+  rtx sat = force_reg (SImode, GEN_INT (GET_MODE_MASK (mode)));
+  rtx dst;
+
+  emit_insn (gen_cmpsi_1 (op1, sat));
+
+  if (TARGET_CMOVE)
+{
+  rtx cmp = gen_rtx_GEU (VOIDmode, gen_rtx_REG (CCCmode, FLAGS_REG),
+const0_rtx);
+
+  dst = force_reg (mode, operands[0]);
+  emit_insn (gen_movsicc (gen_lowpart (SImode, dst), cmp,
+ gen_lowpart (SImode, op1),
+ gen_lowpart (SImode, sat)));
+}
+  else
+{
+  rtx msk = gen_reg_rtx (mode);
+
+  emit_insn (gen_x86_movcc_0_m1_neg (msk));
+  dst = expand_simple_binop (mode, IOR,
+gen_lowpart (mode, op1), msk,
+operands[0], 1, OPTAB_WIDEN);
+}
+
+  if (!rtx_equal_p (dst, operands[0]))
+emit_move_insn (operands[0], dst);
+  DONE;
+})
+
+(define_expand "ustrunchiqi2"
+  [(set (match_operand:QI 0 "register_operand")
+   (us_truncate:HI (match_operand:HI 1 "nonimmediate_operand")))]
+  ""
+{
+  rtx op1 = force_reg (HImode, operands[1]);
+  rtx sat = force_reg (HImode, GEN_INT (GET_MODE_MASK (QImode)));
+  rtx dst;
+
+  emit_insn (gen_cmphi_1 (op1, sat));
+
+  if (TARGET_CMOVE)
+{
+  rtx cmp = gen_rtx_GEU (VOIDmode, gen_rtx_REG (CCCmode, FLAGS_REG),
+const0_rtx);
+
+  dst = force_reg (QImode, operands[0]);
+  emit_insn (gen_movsicc (gen_lowpart (SImode, dst), cmp,
+ gen_lowpart (SImode, op1),
+ gen_lowpart (SImode, sat)));
+}
+  else
+{
+  rtx msk = gen_reg_rtx (QImode);
+
+  emit_insn (gen_x86_movqicc_0_m1_neg (msk));
+  dst = expand_simple_binop (QImode, IOR,
+gen_lowpart (QImode, op1), msk,
+operands[0], 1, OPTAB_WIDEN);
+}
+
+  if (!rtx_equal_p (dst, operands[0]))
+emit_move_insn (operands[0], dst);
+  DONE;
+})
+
 ;; The patterns that match these are at the end of this file.
 
 (define_expand "xf3"
diff --git a/gcc/testsuite/gcc.target/i386/sattrunc-1.c 

[gcc r15-1914] i386: Implement .SAT_TRUNC for unsigned integers

2024-07-09 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:d17889dbffd5dcdb2df22d42586ac0363704e1f1

commit r15-1914-gd17889dbffd5dcdb2df22d42586ac0363704e1f1
Author: Uros Bizjak 
Date:   Tue Jul 9 17:34:25 2024 +0200

i386: Implement .SAT_TRUNC for unsigned integers

The following testcase:

unsigned short foo (unsigned int x)
{
  _Bool overflow = x > (unsigned int)(unsigned short)(-1);
  return ((unsigned short)x | (unsigned short)-overflow);
}

currently compiles (-O2) to:

foo:
        xorl    %eax, %eax
        cmpl    $65535, %edi
        seta    %al
        negl    %eax
        orl     %edi, %eax
        ret

We can expand through the ustrunc{m}{n}2 optab to use the carry flag from
the comparison and generate code using SBB:

foo:
        cmpl    $65535, %edi
        sbbl    %eax, %eax
        orl     %edi, %eax
        ret

or CMOV instruction:

foo:
        movl    $65535, %eax
        cmpl    %eax, %edi
        cmovnc  %edi, %eax
        ret

gcc/ChangeLog:

* config/i386/i386.md (@cmp<mode>_1): Use SWI mode iterator.
(ustruncdi<mode>2): New expander.
(ustruncsi<mode>2): Ditto.
(ustrunchiqi2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/sattrunc-1.c: New test.

Diff:
---
 gcc/config/i386/i386.md| 112 -
 gcc/testsuite/gcc.target/i386/sattrunc-1.c |  24 +++
 2 files changed, 134 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 214cb2e239ae..e2f30695d70e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1533,8 +1533,8 @@
 
(define_expand "@cmp<mode>_1"
   [(set (reg:CC FLAGS_REG)
-   (compare:CC (match_operand:SWI48 0 "nonimmediate_operand")
-   (match_operand:SWI48 1 "")))])
+   (compare:CC (match_operand:SWI 0 "nonimmediate_operand")
+   (match_operand:SWI 1 "")))])
 
 (define_mode_iterator SWI1248_AVX512BWDQ_64
   [(QI "TARGET_AVX512DQ") HI
@@ -9981,6 +9981,114 @@
   DONE;
 })
 
+(define_expand "ustruncdi<mode>2"
+  [(set (match_operand:SWI124 0 "register_operand")
+   (us_truncate:DI (match_operand:DI 1 "nonimmediate_operand")))]
+  "TARGET_64BIT"
+{
+  rtx op1 = force_reg (DImode, operands[1]);
+  rtx sat = force_reg (DImode, GEN_INT (GET_MODE_MASK (<MODE>mode)));
+  rtx dst;
+
+  emit_insn (gen_cmpdi_1 (op1, sat));
+
+  if (TARGET_CMOVE)
+{
+  rtx cmp = gen_rtx_GEU (VOIDmode, gen_rtx_REG (CCCmode, FLAGS_REG),
+const0_rtx);
+
+  dst = force_reg (<MODE>mode, operands[0]);
+  emit_insn (gen_movsicc (gen_lowpart (SImode, dst), cmp,
+ gen_lowpart (SImode, op1),
+ gen_lowpart (SImode, sat)));
+}
+  else
+{
+  rtx msk = gen_reg_rtx (<MODE>mode);
+
+  emit_insn (gen_x86_mov<mode>cc_0_m1_neg (msk));
+  dst = expand_simple_binop (<MODE>mode, IOR,
+gen_lowpart (<MODE>mode, op1), msk,
+operands[0], 1, OPTAB_WIDEN);
+}
+
+  if (!rtx_equal_p (dst, operands[0]))
+emit_move_insn (operands[0], dst);
+  DONE;
+})
+
+(define_expand "ustruncsi<mode>2"
+  [(set (match_operand:SWI12 0 "register_operand")
+   (us_truncate:SI (match_operand:SI 1 "nonimmediate_operand")))]
+  ""
+{
+  rtx op1 = force_reg (SImode, operands[1]);
+  rtx sat = force_reg (SImode, GEN_INT (GET_MODE_MASK (<MODE>mode)));
+  rtx dst;
+
+  emit_insn (gen_cmpsi_1 (op1, sat));
+
+  if (TARGET_CMOVE)
+{
+  rtx cmp = gen_rtx_GEU (VOIDmode, gen_rtx_REG (CCCmode, FLAGS_REG),
+const0_rtx);
+
+  dst = force_reg (<MODE>mode, operands[0]);
+  emit_insn (gen_movsicc (gen_lowpart (SImode, dst), cmp,
+ gen_lowpart (SImode, op1),
+ gen_lowpart (SImode, sat)));
+}
+  else
+{
+  rtx msk = gen_reg_rtx (<MODE>mode);
+
+  emit_insn (gen_x86_mov<mode>cc_0_m1_neg (msk));
+  dst = expand_simple_binop (<MODE>mode, IOR,
+gen_lowpart (<MODE>mode, op1), msk,
+operands[0], 1, OPTAB_WIDEN);
+}
+
+  if (!rtx_equal_p (dst, operands[0]))
+emit_move_insn (operands[0], dst);
+  DONE;
+})
+
+(define_expand "ustrunchiqi2"
+  [(set (match_operand:QI 0 "register_operand")
+   (us_truncate:HI (match_operand:HI 1 "nonimmediate_operand")))]
+  ""
+{
+  rtx op1 = force_reg (HImode, operands[1]);
+  rtx sat = force_reg (HImode, GEN_INT (GET_MODE_MASK (QImode)));
+  rtx dst;
+
+  emit_insn (gen_cmphi_1 (op1, sat));
+
+  if (TARGET_CMOVE)
+{
+  rtx cmp = gen_rtx_GEU (VOIDmode, gen_rtx_REG (CCCmode, FLAGS_REG),
+const0_rtx);

Re: [PATCH] i386: Correct AVX10 CPUID emulation

2024-07-09 Thread Uros Bizjak
On Tue, Jul 9, 2024 at 10:38 AM Haochen Jiang  wrote:
>
> Hi all,
>
> AVX10 Documentation has specified ecx value as 0 for AVX10 version and
> vector size under 0x24 subleaf. Although for ecx=1, the bits are all
> reserved for now, we still need to specify ecx as 0 to avoid dirty
> value in ecx.
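The failure mode is easy to model. Below is a toy simulation (the table values are invented for illustration, not real CPUID data): a query that pins the subleaf index behaves like __cpuid_count (0x24, 0, ...), while one that trusts whatever is already in ecx can land on the reserved subleaf.

```c
#include <assert.h>

/* Toy model of leaf 0x24: subleaf 0 carries the AVX10 version in the
   low byte of EBX; subleaf 1 is reserved (reads as zero).  The values
   are made up for illustration only.  */
static unsigned int
toy_leaf_0x24_ebx (unsigned int subleaf)
{
  return subleaf == 0 ? 0x00010001u : 0u;
}

/* Like __cpuid_count (0x24, 0, ...): the subleaf is explicit.  */
static unsigned int
avx10_version_pinned (void)
{
  return toy_leaf_0x24_ebx (0) & 0xff;
}

/* Like a plain __cpuid (0x24, ...): the effective subleaf is whatever
   stale value happens to sit in ecx at the time of the query.  */
static unsigned int
avx10_version_dirty (unsigned int stale_ecx)
{
  return toy_leaf_0x24_ebx (stale_ecx) & 0xff;
}
```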
>
> Bootstrapped on x86-64-pc-linux-gnu. Ok for trunk and backport to GCC14?
>
> Reference:
>
> Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification
>
> https://cdrdv2.intel.com/v1/dl/getContent/784267
>
> It describes the Intel Advanced Vector Extensions 10 Instruction Set 
> Architecture.
>
> Thx,
> Haochen
>
> gcc/ChangeLog:
>
> * common/config/i386/cpuinfo.h (get_available_features): Correct
> AVX10 CPUID emulation to specify ecx value.

OK.

Thanks,
Uros.

> ---
>  gcc/common/config/i386/cpuinfo.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/common/config/i386/cpuinfo.h 
> b/gcc/common/config/i386/cpuinfo.h
> index 936039725ab..2ae77d335d2 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -998,10 +998,10 @@ get_available_features (struct __processor_model 
> *cpu_model,
> }
>  }
>
> -  /* Get Advanced Features at level 0x24 (eax = 0x24).  */
> +  /* Get Advanced Features at level 0x24 (eax = 0x24, ecx = 0).  */
>if (avx10_set && max_cpuid_level >= 0x24)
>  {
> -  __cpuid (0x24, eax, ebx, ecx, edx);
> +  __cpuid_count (0x24, 0, eax, ebx, ecx, edx);
>version = ebx & 0xff;
>if (ebx & bit_AVX10_256)
> switch (version)
> --
> 2.31.1
>


[committed] i386: Promote {QI, HI}mode x86_movcc_0_m1_neg to SImode

2024-07-08 Thread Uros Bizjak
Promote HImode x86_mov<mode>cc_0_m1_neg insn to SImode to avoid
redundant prefixes. Also promote QImode insn when TARGET_PROMOTE_QImode
is set. This is similar to promotable_binary_operator splitter, where we
promote the result to SImode.

Also correct insn condition for splitters to SImode of NEG and NOT
instructions. The sizes of QImode and SImode instructions are always
the same, so there is no need for optimize_insn_for_size bypass.
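The promotion is correct because only the low subword of the result is ever used: negating (or complementing) in 32 bits and then taking the low 16 or 8 bits yields the same bits as the narrow operation. A quick C check of that equivalence (our own sketch, independent of the patch):

```c
#include <assert.h>

/* 16-bit negation done natively...  */
static unsigned short
neg16 (unsigned short x)
{
  return (unsigned short) -x;
}

/* ...and done in 32 bits, keeping only the low word, which is what the
   promoted SImode insn leaves in the register's low half.  */
static unsigned short
neg16_via_32 (unsigned short x)
{
  return (unsigned short) (0u - x);
}

/* Same check for NOT.  */
static unsigned short
not16_via_32 (unsigned short x)
{
  return (unsigned short) ~(unsigned int) x;
}
```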

gcc/ChangeLog:

* config/i386/i386.md (x86_mov<mode>cc_0_m1_neg splitter to SImode):
New splitter.
(NEG and NOT splitter to SImode): Remove optimize_insn_for_size_p
predicate from insn condition.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b24c4fe5875..214cb2e239a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -26576,9 +26576,7 @@ (define_split
(clobber (reg:CC FLAGS_REG))]
   "! TARGET_PARTIAL_REG_STALL && reload_completed
&& (GET_MODE (operands[0]) == HImode
-   || (GET_MODE (operands[0]) == QImode
-  && (TARGET_PROMOTE_QImode
-  || optimize_insn_for_size_p ())))"
+   || (GET_MODE (operands[0]) == QImode && TARGET_PROMOTE_QImode))"
   [(parallel [(set (match_dup 0)
   (neg:SI (match_dup 1)))
  (clobber (reg:CC FLAGS_REG))])]
@@ -26593,15 +26591,30 @@ (define_split
(not (match_operand 1 "general_reg_operand")))]
   "! TARGET_PARTIAL_REG_STALL && reload_completed
&& (GET_MODE (operands[0]) == HImode
-   || (GET_MODE (operands[0]) == QImode
-  && (TARGET_PROMOTE_QImode
-  || optimize_insn_for_size_p ())))"
+   || (GET_MODE (operands[0]) == QImode && TARGET_PROMOTE_QImode))"
   [(set (match_dup 0)
(not:SI (match_dup 1)))]
 {
   operands[0] = gen_lowpart (SImode, operands[0]);
   operands[1] = gen_lowpart (SImode, operands[1]);
 })
+
+(define_split
+  [(set (match_operand 0 "general_reg_operand")
+   (neg (match_operator 1 "ix86_carry_flag_operator"
+ [(reg FLAGS_REG) (const_int 0)])))
+   (clobber (reg:CC FLAGS_REG))]
+  "! TARGET_PARTIAL_REG_STALL && reload_completed
+   && (GET_MODE (operands[0]) == HImode
+   || (GET_MODE (operands[0]) == QImode && TARGET_PROMOTE_QImode))"
+  [(parallel [(set (match_dup 0)
+  (neg:SI (match_dup 1)))
+ (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[0] = gen_lowpart (SImode, operands[0]);
+  operands[1] = shallow_copy_rtx (operands[1]);
+  PUT_MODE (operands[1], SImode);
+})
 
 ;; RTL Peephole optimizations, run before sched2.  These primarily look to
 ;; transform a complex memory operation into two memory to register operations.


[gcc r15-1899] i386: Promote {QI, HI}mode x86_movcc_0_m1_neg to SImode

2024-07-08 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:2b3027bea3f218599d36379d3d593841df7a1559

commit r15-1899-g2b3027bea3f218599d36379d3d593841df7a1559
Author: Uros Bizjak 
Date:   Mon Jul 8 20:47:52 2024 +0200

i386: Promote {QI,HI}mode x86_movcc_0_m1_neg to SImode

Promote HImode x86_mov<mode>cc_0_m1_neg insn to SImode to avoid
redundant prefixes. Also promote QImode insn when TARGET_PROMOTE_QImode
is set. This is similar to promotable_binary_operator splitter, where we
promote the result to SImode.

Also correct insn condition for splitters to SImode of NEG and NOT
instructions. The sizes of QImode and SImode instructions are always
the same, so there is no need for optimize_insn_for_size bypass.

gcc/ChangeLog:

* config/i386/i386.md (x86_mov<mode>cc_0_m1_neg splitter to SImode):
New splitter.
(NEG and NOT splitter to SImode): Remove optimize_insn_for_size_p
predicate from insn condition.

Diff:
---
 gcc/config/i386/i386.md | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b24c4fe58750..214cb2e239ae 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -26576,9 +26576,7 @@
(clobber (reg:CC FLAGS_REG))]
   "! TARGET_PARTIAL_REG_STALL && reload_completed
&& (GET_MODE (operands[0]) == HImode
-   || (GET_MODE (operands[0]) == QImode
-  && (TARGET_PROMOTE_QImode
-  || optimize_insn_for_size_p ())))"
+   || (GET_MODE (operands[0]) == QImode && TARGET_PROMOTE_QImode))"
   [(parallel [(set (match_dup 0)
   (neg:SI (match_dup 1)))
  (clobber (reg:CC FLAGS_REG))])]
@@ -26593,15 +26591,30 @@
(not (match_operand 1 "general_reg_operand")))]
   "! TARGET_PARTIAL_REG_STALL && reload_completed
&& (GET_MODE (operands[0]) == HImode
-   || (GET_MODE (operands[0]) == QImode
-  && (TARGET_PROMOTE_QImode
-  || optimize_insn_for_size_p ())))"
+   || (GET_MODE (operands[0]) == QImode && TARGET_PROMOTE_QImode))"
   [(set (match_dup 0)
(not:SI (match_dup 1)))]
 {
   operands[0] = gen_lowpart (SImode, operands[0]);
   operands[1] = gen_lowpart (SImode, operands[1]);
 })
+
+(define_split
+  [(set (match_operand 0 "general_reg_operand")
+   (neg (match_operator 1 "ix86_carry_flag_operator"
+ [(reg FLAGS_REG) (const_int 0)])))
+   (clobber (reg:CC FLAGS_REG))]
+  "! TARGET_PARTIAL_REG_STALL && reload_completed
+   && (GET_MODE (operands[0]) == HImode
+   || (GET_MODE (operands[0]) == QImode && TARGET_PROMOTE_QImode))"
+  [(parallel [(set (match_dup 0)
+  (neg:SI (match_dup 1)))
+ (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[0] = gen_lowpart (SImode, operands[0]);
+  operands[1] = shallow_copy_rtx (operands[1]);
+  PUT_MODE (operands[1], SImode);
+})
 
 ;; RTL Peephole optimizations, run before sched2.  These primarily look to
 ;; transform a complex memory operation into two memory to register operations.


Re: [PATCH v2] i386: Refactor ssedoublemode

2024-07-05 Thread Uros Bizjak
On Fri, Jul 5, 2024 at 9:07 AM Hu, Lin1  wrote:
>
> I Modified the changelog and comments.
>
> ssedoublemode's double should mean double type, like SI -> DI.
> And we need to refactor some patterns with <ssedoublevecmode> instead of
> <ssedoublemode>.
>
> BRs,
> Lin
>
> gcc/ChangeLog:
>
> * config/i386/sse.md (ssedoublemode): Remove mappings to double
>   of elements and mapping vector mode to the same number of
>   double sized elements.

Better write: "Remove mappings to twice the number of same-sized
elements.  Add mappings to the same number of double-sized elements."

>   (define_split for vec_concat_minus_plus): Change mode_attr from
>   ssedoublemode to ssedoublevecmode.
>   (define_split for vec_concat_plus_minus): Ditto.
>   (avx512dq_shuf_<shuffletype>64x2_1<mask_name>):
>   Ditto.
>   (avx512f_shuf_<shuffletype>64x2_1<mask_name>): Ditto.
>   (avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Ditto.
>   (avx512f_shuf_<shuffletype>32x4_1<mask_name>): Ditto.

OK with the above ChangeLog adjustment.

Thanks,
Uros.

> ---
>  gcc/config/i386/sse.md | 19 +--
>  1 file changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index d71b0f2567e..bda66d5e121 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -808,13 +808,12 @@ (define_mode_attr ssedoublemodelower
> (V8HI "v8si")   (V16HI "v16si") (V32HI "v32si")
> (V4SI "v4di")   (V8SI "v8di")   (V16SI "v16di")])
>
> +;; Map vector mode to the same number of double sized elements.
>  (define_mode_attr ssedoublemode
> -  [(V4SF "V8SF") (V8SF "V16SF") (V16SF "V32SF")
> -   (V2DF "V4DF") (V4DF "V8DF") (V8DF "V16DF")
> +  [(V4SF "V4DF") (V8SF "V8DF") (V16SF "V16DF")
> (V16QI "V16HI") (V32QI "V32HI") (V64QI "V64HI")
> (V8HI "V8SI") (V16HI "V16SI") (V32HI "V32SI")
> -   (V4SI "V4DI") (V8SI "V16SI") (V16SI "V32SI")
> -   (V4DI "V8DI") (V8DI "V16DI")])
> +   (V4SI "V4DI") (V8SI "V8DI") (V16SI "V16DI")])
>
>  (define_mode_attr ssebytemode
>[(V8DI "V64QI") (V4DI "V32QI") (V2DI "V16QI")
> @@ -3319,7 +3318,7 @@ (define_split
>  (define_split
>[(set (match_operand:VF_128_256 0 "register_operand")
> (match_operator:VF_128_256 7 "addsub_vs_operator"
> - [(vec_concat:<ssedoublemode>
> + [(vec_concat:<ssedoublevecmode>
>  (minus:VF_128_256
>(match_operand:VF_128_256 1 "register_operand")
>(match_operand:VF_128_256 2 "vector_operand"))
> @@ -3353,7 +3352,7 @@ (define_split
>  (define_split
>[(set (match_operand:VF_128_256 0 "register_operand")
> (match_operator:VF_128_256 7 "addsub_vs_operator"
> - [(vec_concat:<ssedoublemode>
> + [(vec_concat:<ssedoublevecmode>
>  (plus:VF_128_256
>(match_operand:VF_128_256 1 "vector_operand")
>(match_operand:VF_128_256 2 "vector_operand"))
> @@ -19869,7 +19868,7 @@ (define_expand "avx512dq_shuf_<shuffletype>64x2_mask"
>  (define_insn "<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>"
>[(set (match_operand:VI8F_256 0 "register_operand" "=x,v")
> (vec_select:VI8F_256
> - (vec_concat:<ssedoublemode>
> + (vec_concat:<ssedoublevecmode>
> (match_operand:VI8F_256 1 "register_operand" "x,v")
> (match_operand:VI8F_256 2 "nonimmediate_operand" "xjm,vm"))
>   (parallel [(match_operand 3 "const_0_to_3_operand")
> @@ -19922,7 +19921,7 @@ (define_expand "avx512f_shuf_<shuffletype>64x2_mask"
>  (define_insn "avx512f_shuf_<shuffletype>64x2_1<mask_name>"
>[(set (match_operand:V8FI 0 "register_operand" "=v")
> (vec_select:V8FI
> - (vec_concat:<ssedoublemode>
> + (vec_concat:<ssedoublevecmode>
> (match_operand:V8FI 1 "register_operand" "v")
> (match_operand:V8FI 2 "nonimmediate_operand" "vm"))
>   (parallel [(match_operand 3 "const_0_to_7_operand")
> @@ -20020,7 +20019,7 @@ (define_expand "avx512vl_shuf_<shuffletype>32x4_mask"
>  (define_insn "<mask_codefor>avx512vl_shuf_<shuffletype>32x4_1<mask_name>"
>[(set (match_operand:VI4F_256 0 "register_operand" "=x,v")
> (vec_select:VI4F_256
> - (vec_concat:<ssedoublemode>
> + (vec_concat:<ssedoublevecmode>
> (match_operand:VI4F_256 1 "register_operand" "x,v")
> (match_operand:VI4F_256 2 "nonimmediate_operand" "xjm,vm"))
>   (parallel [(match_operand 3 "const_0_to_7_operand")
> @@ -20091,7 +20090,7 @@ (define_expand "avx512f_shuf_<shuffletype>32x4_mask"
>  (define_insn "avx512f_shuf_<shuffletype>32x4_1<mask_name>"
>[(set (match_operand:V16FI 0 "register_operand" "=v")
> (vec_select:V16FI
> - (vec_concat:<ssedoublemode>
> + (vec_concat:<ssedoublevecmode>
> (match_operand:V16FI 1 "register_operand" "v")
> (match_operand:V16FI 2 "nonimmediate_operand" "vm"))
>   (parallel [(match_operand 3 "const_0_to_15_operand")
> --
> 2.31.1
>


Re: [PATCH] i386: Refactor ssedoublemode

2024-07-05 Thread Uros Bizjak
On Fri, Jul 5, 2024 at 7:48 AM Hu, Lin1  wrote:
>
> Hi, all
>
> ssedoublemode's double should mean double type, like SI -> DI.
> And we need to refactor some patterns with <ssedoublevecmode> instead of
> <ssedoublemode>.
>
> Bootstrapped and regtested on x86-64-linux-gnu, OK for trunk?
>
> BRs,
> Lin
>
> gcc/ChangeLog:
>
> * config/i386/sse.md (ssedoublemode): Fix the mode_attr.

Please be more descriptive, like ": Remove mappings to double of
elements". Please also add names of changed patterns to ChangeLog.

> ---
>  gcc/config/i386/sse.md | 19 +--
>  1 file changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index d71b0f2567e..d06ce94fa55 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -808,13 +808,12 @@ (define_mode_attr ssedoublemodelower
> (V8HI "v8si")   (V16HI "v16si") (V32HI "v32si")
> (V4SI "v4di")   (V8SI "v8di")   (V16SI "v16di")])
>
> +;; ssedoublemode means vector mode with sanme number of double-size.

Better say: Map vector mode to the same number of double sized elements.

Uros.

>  (define_mode_attr ssedoublemode
> -  [(V4SF "V8SF") (V8SF "V16SF") (V16SF "V32SF")
> -   (V2DF "V4DF") (V4DF "V8DF") (V8DF "V16DF")
> +  [(V4SF "V4DF") (V8SF "V8DF") (V16SF "V16DF")
> (V16QI "V16HI") (V32QI "V32HI") (V64QI "V64HI")
> (V8HI "V8SI") (V16HI "V16SI") (V32HI "V32SI")
> -   (V4SI "V4DI") (V8SI "V16SI") (V16SI "V32SI")
> -   (V4DI "V8DI") (V8DI "V16DI")])
> +   (V4SI "V4DI") (V8SI "V8DI") (V16SI "V16DI")])
>
>  (define_mode_attr ssebytemode
>[(V8DI "V64QI") (V4DI "V32QI") (V2DI "V16QI")
> @@ -3319,7 +3318,7 @@ (define_split
>  (define_split
>[(set (match_operand:VF_128_256 0 "register_operand")
> (match_operator:VF_128_256 7 "addsub_vs_operator"
> - [(vec_concat:<ssedoublemode>
> + [(vec_concat:<ssedoublevecmode>
>  (minus:VF_128_256
>(match_operand:VF_128_256 1 "register_operand")
>(match_operand:VF_128_256 2 "vector_operand"))
> @@ -3353,7 +3352,7 @@ (define_split
>  (define_split
>[(set (match_operand:VF_128_256 0 "register_operand")
> (match_operator:VF_128_256 7 "addsub_vs_operator"
> - [(vec_concat:<ssedoublemode>
> + [(vec_concat:<ssedoublevecmode>
>  (plus:VF_128_256
>(match_operand:VF_128_256 1 "vector_operand")
>(match_operand:VF_128_256 2 "vector_operand"))
> @@ -19869,7 +19868,7 @@ (define_expand "avx512dq_shuf_<shuffletype>64x2_mask"
>  (define_insn "<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>"
>[(set (match_operand:VI8F_256 0 "register_operand" "=x,v")
> (vec_select:VI8F_256
> - (vec_concat:<ssedoublemode>
> + (vec_concat:<ssedoublevecmode>
> (match_operand:VI8F_256 1 "register_operand" "x,v")
> (match_operand:VI8F_256 2 "nonimmediate_operand" "xjm,vm"))
>   (parallel [(match_operand 3 "const_0_to_3_operand")
> @@ -19922,7 +19921,7 @@ (define_expand "avx512f_shuf_<shuffletype>64x2_mask"
>  (define_insn "avx512f_shuf_<shuffletype>64x2_1<mask_name>"
>[(set (match_operand:V8FI 0 "register_operand" "=v")
> (vec_select:V8FI
> - (vec_concat:<ssedoublemode>
> + (vec_concat:<ssedoublevecmode>
> (match_operand:V8FI 1 "register_operand" "v")
> (match_operand:V8FI 2 "nonimmediate_operand" "vm"))
>   (parallel [(match_operand 3 "const_0_to_7_operand")
> @@ -20020,7 +20019,7 @@ (define_expand "avx512vl_shuf_<shuffletype>32x4_mask"
>  (define_insn "<mask_codefor>avx512vl_shuf_<shuffletype>32x4_1<mask_name>"
>[(set (match_operand:VI4F_256 0 "register_operand" "=x,v")
> (vec_select:VI4F_256
> - (vec_concat:<ssedoublemode>
> + (vec_concat:<ssedoublevecmode>
> (match_operand:VI4F_256 1 "register_operand" "x,v")
> (match_operand:VI4F_256 2 "nonimmediate_operand" "xjm,vm"))
>   (parallel [(match_operand 3 "const_0_to_7_operand")
> @@ -20091,7 +20090,7 @@ (define_expand "avx512f_shuf_<shuffletype>32x4_mask"
>  (define_insn "avx512f_shuf_<shuffletype>32x4_1<mask_name>"
>[(set (match_operand:V16FI 0 "register_operand" "=v")
> (vec_select:V16FI
> - (vec_concat:<ssedoublemode>
> + (vec_concat:<ssedoublevecmode>
> (match_operand:V16FI 1 "register_operand" "v")
> (match_operand:V16FI 2 "nonimmediate_operand" "vm"))
>   (parallel [(match_operand 3 "const_0_to_15_operand")
> --
> 2.31.1
>


Re: [x86 PATCH] Add additional variant of bswaphisi2_lowpart peephole2.

2024-07-01 Thread Uros Bizjak
On Mon, Jul 1, 2024 at 3:20 PM Roger Sayle  wrote:
>
>
> This patch adds an additional variation of the peephole2 used to convert
> bswaphisi2_lowpart into rotlhi3_1_slp, which converts xchgb %ah,%al into
> rotw if the flags register isn't live.  The motivating example is:
>
> void ext(int x);
> void foo(int x)
> {
>   ext((x&~0xffff)|((x>>8)&0xff)|((x&0xff)<<8));
> }
>
> where GCC with -O2 currently produces:
>
> foo:    movl    %edi, %eax
>         rolw    $8, %ax
>         movl    %eax, %edi
>         jmp     ext
>
> The issue is that the original xchgb (bswaphisi2_lowpart) can only be
> performed in "Q" registers that allow the %?h register to be used, so
> reload generates the above two movl.  However, it's later in peephole2
> where we see that CC_FLAGS can be clobbered, so we can use a rotate word,
> which is more forgiving with register allocations.  With the additional
> peephole2 proposed here, we now generate:
>
> foo:    rolw    $8, %di
>         jmp     ext
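For reference, the equivalence the peephole exploits is easy to state in C: swapping the two low bytes of x while preserving the high half is exactly a rotate of the low 16-bit word by 8 (a standalone sketch; names are ours):

```c
#include <assert.h>

/* The motivating expression: swap the two low bytes of x.  */
static unsigned int
swap_low_bytes (unsigned int x)
{
  return (x & ~0xffffu) | ((x >> 8) & 0xffu) | ((x & 0xffu) << 8);
}

/* The same thing phrased as a 16-bit rotate of the low word, which is
   what rolw $8 performs on the low half of the register.  */
static unsigned int
rotw8_low (unsigned int x)
{
  unsigned short lo = (unsigned short) x;
  lo = (unsigned short) ((lo << 8) | (lo >> 8));
  return (x & ~0xffffu) | lo;
}
```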
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-07-01  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.md (bswaphisi2_lowpart peephole2): New
> peephole2 variant to eliminate register shuffling.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/xchg-4.c: New test case.

OK.

Thanks,
Uros.

>
>
> Thanks again,
> Roger
> --
>


Re: [x86 PATCH]: Additional peephole2 to use lea in round-up integer division.

2024-06-30 Thread Uros Bizjak
On Sun, Jun 30, 2024 at 9:09 PM Roger Sayle  wrote:
>
>
> Hi Uros,
> > On Sat, Jun 29, 2024 at 6:21 PM Roger Sayle 
> > wrote:
> > > A common idiom for implementing an integer division that rounds
> > > upwards is to write (x + y - 1) / y.  Conveniently on x86, the two
> > > additions to form the numerator can be performed by a single lea
> > > instruction, and indeed gcc currently generates a lea when x and y are both
> > registers.
> > >
> > > int foo(int x, int y) {
> > >   return (x+y-1)/y;
> > > }
> > >
> > > generates with -O2:
> > >
> > > foo:    leal    -1(%rsi,%rdi), %eax // 4 bytes
> > >         cltd
> > >         idivl   %esi
> > >         ret
> > >
> > > Oddly, however, if x is a memory, gcc currently uses two instructions:
> > >
> > > int m;
> > > int bar(int y) {
> > >   return (m+y-1)/y;
> > > }
> > >
> > > generates:
> > >
> > > foo:    movl    m(%rip), %eax
> > >         addl    %edi, %eax  // 2 bytes
> > >         subl    $1, %eax    // 3 bytes
> > >         cltd
> > >         idivl   %edi
> > >         ret
> > >
> > > This discrepancy is caused by the late decision (in peephole2) to
> > > split an addition with a memory operand, into a load followed by a
> > > reg-reg addition.  This patch improves this situation by adding a
> > > peephole2 to recognize consecutive additions and transform them into
> > > lea if profitable.
> > >
> > > My first attempt at fixing this was to use a define_insn_and_split:
> > >
> > > (define_insn_and_split "*lea<mode>3_reg_mem_imm"
> > >   [(set (match_operand:SWI48 0 "register_operand")
> > >(plus:SWI48 (plus:SWI48 (match_operand:SWI48 1 "register_operand")
> > >(match_operand:SWI48 2 "memory_operand"))
> > >(match_operand:SWI48 3 "x86_64_immediate_operand")))]
> > >   "ix86_pre_reload_split ()"
> > >   "#"
> > >   "&& 1"
> > >   [(set (match_dup 4) (match_dup 2))
> > >(set (match_dup 0) (plus:SWI48 (plus:SWI48 (match_dup 1) (match_dup 4))
> > >  (match_dup 3)))]
> > >   "operands[4] = gen_reg_rtx (<MODE>mode);")
> > >
> > > using combine to combine instructions.  Unfortunately, this approach
> > > interferes with (reload's) subtle balance of deciding when to
> > > use/avoid lea, which can be observed as a code size regression in
> > > CSiBE.  The peephole2 approach (proposed here) uniformly improves CSiBE
> > results.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32}
> > > with no new failures.  Ok for mainline?
> > >
> > >
> > > 2024-06-29  Roger Sayle  
> > >
> > > gcc/ChangeLog
> > > * config/i386/i386.md (peephole2): Transform two consecutive
> > > additions into a 3-component lea if !TARGET_AVOID_LEA_FOR_ADDR.
> > >
> > > gcc/testsuite/ChangeLog
> > > * gcc.target/i386/lea-3.c: New test case.
> >
> > Is the assumption that one LEA is always faster than two ADD instructions
> > universally correct for TARGET_AVOID_LEA_FOR_ADDR?
> >
> > Please note ix86_lea_outperforms predicate and its uses in
> > ix86_avoid_lea_for_add(), ix86_use_lea_for_mov(),
> > ix86_avoid_lea_for_addr() and ix86_lea_for_add_ok(). IMO,
> > !avoid_lea_for_addr() should be used here, but I didn't check it thoroughly.
> >
> > The function comment of avoid_lea_for_addr() says:
> >
> > /* Return true if we need to split lea into a sequence of
> >instructions to avoid AGU stalls during peephole2. */
> >
> > And your peephole tries to reverse the above split.
>
> I completely agree that understanding when/why i386.md converts
> an lea into a sequence of additions (and avoiding reversing this split)
> is vitally important to understanding my patch.  You're quite right that
> the logic governing this ultimately calls ix86_lea_outperforms, but as
> I'll explain below the shape of those APIs (requiring an insn) is not as
> convenient for instruction merging as for splitting.
>
> The current location in i386.md where it decides whether the
> lea in the foo example above needs to be split, is at line 6293:
>
> (define_peephole2
>   [(set (match_operand:SWI48 0 "register_operand")
> (match_operand:SWI48 1 "address_no_seg_operand"))]
>   "ix86_hardreg_mov_ok (operands[0], operands[1])
>&& peep2_regno_dead_p (0, FLAGS_REG)
>&& ix86_avoid_lea_for_addr (peep2_next_insn (0), operands)"
> ...
>
> Hence, we transform lea->add+add when ix86_avoid_lea_for_addr
> returns true, so by symmetry is not unreasonable to turn add+add->lea
> when ix86_avoid_lea_for_addr would return false.  The relevant part
> of ix86_avoid_lea_for_addr is then around line 15974 of i386.cc:
>
>   /* Check we need to optimize.  */
>   if (!TARGET_AVOID_LEA_FOR_ADDR || optimize_function_for_size_p (cfun))
> return false;
>
> which you'll recognize is precisely the condition under which my
> proposed peephole2 fires.  Technically, we also know that this is
> a 3-component lea, 

Re: [x86 PATCH]: Additional peephole2 to use lea in round-up integer division.

2024-06-30 Thread Uros Bizjak
On Sat, Jun 29, 2024 at 6:21 PM Roger Sayle  wrote:
>
>
> A common idiom for implementing an integer division that rounds upwards is
> to write (x + y - 1) / y.  Conveniently on x86, the two additions to form
> the numerator can be performed by a single lea instruction, and indeed gcc
> currently generates a lea when x and y are both registers.
>
> int foo(int x, int y) {
>   return (x+y-1)/y;
> }
>
> generates with -O2:
>
> foo:    leal    -1(%rsi,%rdi), %eax // 4 bytes
>         cltd
>         idivl   %esi
>         ret
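The idiom itself is easy to pin down (a standalone sketch; the helper name is ours): for positive y, (x + y - 1) / y is round-up division, and the numerator x + y - 1 is a single base+index+displacement address.

```c
#include <assert.h>

/* Round-up (ceiling) division for y > 0.  On x86 the numerator maps to
   one lea: base x, index y, displacement -1.  */
static int
ceil_div (int x, int y)
{
  return (x + y - 1) / y;
}
```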
>
> Oddly, however, if x is a memory, gcc currently uses two instructions:
>
> int m;
> int bar(int y) {
>   return (m+y-1)/y;
> }
>
> generates:
>
> foo:    movl    m(%rip), %eax
>         addl    %edi, %eax  // 2 bytes
>         subl    $1, %eax    // 3 bytes
>         cltd
>         idivl   %edi
>         ret
>
> This discrepancy is caused by the late decision (in peephole2) to split
> an addition with a memory operand, into a load followed by a reg-reg
> addition.  This patch improves this situation by adding a peephole2
> to recognize consecutive additions and transform them into lea if
> profitable.
>
> My first attempt at fixing this was to use a define_insn_and_split:
>
> (define_insn_and_split "*lea<mode>3_reg_mem_imm"
>   [(set (match_operand:SWI48 0 "register_operand")
>(plus:SWI48 (plus:SWI48 (match_operand:SWI48 1 "register_operand")
>(match_operand:SWI48 2 "memory_operand"))
>(match_operand:SWI48 3 "x86_64_immediate_operand")))]
>   "ix86_pre_reload_split ()"
>   "#"
>   "&& 1"
>   [(set (match_dup 4) (match_dup 2))
>(set (match_dup 0) (plus:SWI48 (plus:SWI48 (match_dup 1) (match_dup 4))
>  (match_dup 3)))]
>   "operands[4] = gen_reg_rtx (<MODE>mode);")
>
> using combine to combine instructions.  Unfortunately, this approach
> interferes with (reload's) subtle balance of deciding when to use/avoid lea,
> which can be observed as a code size regression in CSiBE.  The peephole2
> approach (proposed here) uniformly improves CSiBE results.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-06-29  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.md (peephole2): Transform two consecutive
> additions into a 3-component lea if !TARGET_AVOID_LEA_FOR_ADDR.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/lea-3.c: New test case.

Is the assumption that one LEA is always faster than two ADD
instructions universally correct for TARGET_AVOID_LEA_FOR_ADDR?

Please note ix86_lea_outperforms predicate and its uses in
ix86_avoid_lea_for_add(), ix86_use_lea_for_mov(),
ix86_avoid_lea_for_addr() and ix86_lea_for_add_ok(). IMO,
!avoid_lea_for_addr() should be used here, but I didn't check it
thoroughly.

The function comment of avoid_lea_for_addr() says:

/* Return true if we need to split lea into a sequence of
   instructions to avoid AGU stalls during peephole2. */

And your peephole tries to reverse the above split.

Uros.

>
>
> Thanks in advance,
> Roger
> --
>


[PATCH] i386: Cleanup tmp variable usage in ix86_expand_move

2024-06-28 Thread Uros Bizjak
Remove extra assignment, extra temp variable and variable shadowing.

No functional changes intended.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_move): Remove extra
assignment to tmp variable, reuse tmp variable instead of
declaring new temporary variable and remove tmp variable shadowing.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Also built crosscompiler to x86_64-pc-cygwin and x86_64-apple-darwin16.

Uros.
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index a4434c19272..a773b45bf03 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -414,9 +414,6 @@ ix86_expand_move (machine_mode mode, rtx operands[])
{
 #if TARGET_PECOFF
  tmp = legitimize_pe_coff_symbol (op1, addend != NULL_RTX);
-#else
- tmp = NULL_RTX;
-#endif
 
  if (tmp)
{
@@ -425,6 +422,7 @@ ix86_expand_move (machine_mode mode, rtx operands[])
break;
}
  else
+#endif
{
  op1 = operands[1];
  break;
@@ -482,12 +480,12 @@ ix86_expand_move (machine_mode mode, rtx operands[])
  /* dynamic-no-pic */
  if (MACHOPIC_INDIRECT)
{
- rtx temp = (op0 && REG_P (op0) && mode == Pmode)
-? op0 : gen_reg_rtx (Pmode);
- op1 = machopic_indirect_data_reference (op1, temp);
+ tmp = (op0 && REG_P (op0) && mode == Pmode)
+   ? op0 : gen_reg_rtx (Pmode);
+ op1 = machopic_indirect_data_reference (op1, tmp);
  if (MACHOPIC_PURE)
op1 = machopic_legitimize_pic_address (op1, mode,
-  temp == op1 ? 0 : temp);
+  tmp == op1 ? 0 : tmp);
}
  if (op0 != op1 && GET_CODE (op0) != MEM)
{
@@ -542,9 +540,9 @@ ix86_expand_move (machine_mode mode, rtx operands[])
  op1 = validize_mem (force_const_mem (mode, op1));
  if (!register_operand (op0, mode))
{
- rtx temp = gen_reg_rtx (mode);
- emit_insn (gen_rtx_SET (temp, op1));
- emit_move_insn (op0, temp);
+ tmp = gen_reg_rtx (mode);
+ emit_insn (gen_rtx_SET (tmp, op1));
+ emit_move_insn (op0, tmp);
  return;
}
}
@@ -565,7 +563,7 @@ ix86_expand_move (machine_mode mode, rtx operands[])
   if (SUBREG_BYTE (op0) == 0)
{
  wide_int mask = wi::mask (64, true, 128);
- rtx tmp = immed_wide_int_const (mask, TImode);
+ tmp = immed_wide_int_const (mask, TImode);
  op0 = SUBREG_REG (op0);
  tmp = gen_rtx_AND (TImode, copy_rtx (op0), tmp);
  if (mode == DFmode)
@@ -577,7 +575,7 @@ ix86_expand_move (machine_mode mode, rtx operands[])
   else if (SUBREG_BYTE (op0) == 8)
{
  wide_int mask = wi::mask (64, false, 128);
- rtx tmp = immed_wide_int_const (mask, TImode);
+ tmp = immed_wide_int_const (mask, TImode);
  op0 = SUBREG_REG (op0);
  tmp = gen_rtx_AND (TImode, copy_rtx (op0), tmp);
  if (mode == DFmode)


[gcc r15-1711] i386: Cleanup tmp variable usage in ix86_expand_move

2024-06-28 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:7419b4fe48b48e44b27e2dadc9ff870f5e049077

commit r15-1711-g7419b4fe48b48e44b27e2dadc9ff870f5e049077
Author: Uros Bizjak 
Date:   Fri Jun 28 17:49:43 2024 +0200

i386: Cleanup tmp variable usage in ix86_expand_move

Remove extra assignment, extra temp variable and variable shadowing.

No functional changes intended.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_move): Remove extra
assignment to tmp variable, reuse tmp variable instead of
declaring new temporary variable and remove tmp variable shadowing.

Diff:
---
 gcc/config/i386/i386-expand.cc | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index a4434c19272..a773b45bf03 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -414,9 +414,6 @@ ix86_expand_move (machine_mode mode, rtx operands[])
{
 #if TARGET_PECOFF
  tmp = legitimize_pe_coff_symbol (op1, addend != NULL_RTX);
-#else
- tmp = NULL_RTX;
-#endif
 
  if (tmp)
{
@@ -425,6 +422,7 @@ ix86_expand_move (machine_mode mode, rtx operands[])
break;
}
  else
+#endif
{
  op1 = operands[1];
  break;
@@ -482,12 +480,12 @@ ix86_expand_move (machine_mode mode, rtx operands[])
  /* dynamic-no-pic */
  if (MACHOPIC_INDIRECT)
{
- rtx temp = (op0 && REG_P (op0) && mode == Pmode)
-? op0 : gen_reg_rtx (Pmode);
- op1 = machopic_indirect_data_reference (op1, temp);
+ tmp = (op0 && REG_P (op0) && mode == Pmode)
+   ? op0 : gen_reg_rtx (Pmode);
+ op1 = machopic_indirect_data_reference (op1, tmp);
  if (MACHOPIC_PURE)
op1 = machopic_legitimize_pic_address (op1, mode,
-  temp == op1 ? 0 : temp);
+  tmp == op1 ? 0 : tmp);
}
  if (op0 != op1 && GET_CODE (op0) != MEM)
{
@@ -542,9 +540,9 @@ ix86_expand_move (machine_mode mode, rtx operands[])
  op1 = validize_mem (force_const_mem (mode, op1));
  if (!register_operand (op0, mode))
{
- rtx temp = gen_reg_rtx (mode);
- emit_insn (gen_rtx_SET (temp, op1));
- emit_move_insn (op0, temp);
+ tmp = gen_reg_rtx (mode);
+ emit_insn (gen_rtx_SET (tmp, op1));
+ emit_move_insn (op0, tmp);
  return;
}
}
@@ -565,7 +563,7 @@ ix86_expand_move (machine_mode mode, rtx operands[])
   if (SUBREG_BYTE (op0) == 0)
{
  wide_int mask = wi::mask (64, true, 128);
- rtx tmp = immed_wide_int_const (mask, TImode);
+ tmp = immed_wide_int_const (mask, TImode);
  op0 = SUBREG_REG (op0);
  tmp = gen_rtx_AND (TImode, copy_rtx (op0), tmp);
  if (mode == DFmode)
@@ -577,7 +575,7 @@ ix86_expand_move (machine_mode mode, rtx operands[])
   else if (SUBREG_BYTE (op0) == 8)
{
  wide_int mask = wi::mask (64, false, 128);
- rtx tmp = immed_wide_int_const (mask, TImode);
+ tmp = immed_wide_int_const (mask, TImode);
  op0 = SUBREG_REG (op0);
  tmp = gen_rtx_AND (TImode, copy_rtx (op0), tmp);
  if (mode == DFmode)


Re: [PATCH] i386: Fix regression after refactoring legitimize_pe_coff_symbol, ix86_GOT_alias_set and PE_COFF_LEGITIMIZE_EXTERN_DECL

2024-06-28 Thread Uros Bizjak
On Fri, Jun 28, 2024 at 1:41 PM Evgeny Karpov
 wrote:
>
> Thursday, June 27, 2024 8:13 PM
> Uros Bizjak  wrote:
>
> >
> > So, there is no problem having #endif just after else.
> >
> > Anyway, it's your call, this is not a hill I'm willing to die on. ;)
> >
> > Thanks,
> > Uros.
>
> It looks like the patch resolves 3 reported issues.
> Uros, I suggest merging the patch as it is, without minor refactoring, to 
> avoid triggering another round of testing, if you agree.

Yes, please go ahead.

Thanks,
Uros.


Re: [PATCH 3/3] [x86] Enable late-combine.

2024-06-28 Thread Uros Bizjak
On Fri, Jun 28, 2024 at 7:29 AM liuhongt  wrote:
>
> Move pass_stv2 and pass_rpad after the pre_reload pass_late_combine, and
> define target_insn_cost to prevent the post_reload pass_late_combine from
> reverting the optimization done in pass_rpad.
>
> Adjust testcases since pass_late_combine generates better code but
> breaks the scan-assembler checks.
>
> I.e., under a 32-bit target, gcc used to generate a broadcast from the
> stack and then do the real operation.
> After late_combine, these are combined into embedded broadcast
> operations.
>
> gcc/ChangeLog:
>
> * config/i386/i386-features.cc (ix86_rpad_gate): New function.
> * config/i386/i386-options.cc (ix86_override_options_after_change):
> Don't disable late_combine.
> * config/i386/i386-passes.def: Move pass_stv2 and pass_rpad
> after pre_reload pass_late_combine.
> * config/i386/i386-protos.h (ix86_rpad_gate): New declare.
> * config/i386/i386.cc (ix86_insn_cost): New function.
> (TARGET_INSN_COST): Define.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Adjust
> testcase.
> * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Ditto.
> * gcc.target/i386/avx512f-fmadd-sf-zmm-7.c: Ditto.
> * gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Ditto.
> * gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c: Ditto.
> * gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c: Ditto.
> * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Ditto.
> * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Ditto.
> * gcc.target/i386/pr91333.c: Ditto.
> * gcc.target/i386/vect-strided-4.c: Ditto.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-features.cc   | 16 +++-
>  gcc/config/i386/i386-options.cc|  4 
>  gcc/config/i386/i386-passes.def|  4 ++--
>  gcc/config/i386/i386-protos.h  |  1 +
>  gcc/config/i386/i386.cc| 18 ++
>  .../i386/avx512f-broadcast-pr87767-1.c |  4 ++--
>  .../i386/avx512f-broadcast-pr87767-5.c |  1 -
>  .../gcc.target/i386/avx512f-fmadd-sf-zmm-7.c   |  2 +-
>  .../gcc.target/i386/avx512f-fmsub-sf-zmm-7.c   |  2 +-
>  .../gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c  |  2 +-
>  .../gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c  |  2 +-
>  .../i386/avx512vl-broadcast-pr87767-1.c|  4 ++--
>  .../i386/avx512vl-broadcast-pr87767-5.c|  2 --
>  gcc/testsuite/gcc.target/i386/pr91333.c|  2 +-
>  gcc/testsuite/gcc.target/i386/vect-strided-4.c |  2 +-
>  15 files changed, 42 insertions(+), 24 deletions(-)
>
> diff --git a/gcc/config/i386/i386-features.cc 
> b/gcc/config/i386/i386-features.cc
> index 607d1991460..fc224ed06b0 100644
> --- a/gcc/config/i386/i386-features.cc
> +++ b/gcc/config/i386/i386-features.cc
> @@ -2995,6 +2995,16 @@ make_pass_insert_endbr_and_patchable_area 
> (gcc::context *ctxt)
>return new pass_insert_endbr_and_patchable_area (ctxt);
>  }
>
> +bool
> +ix86_rpad_gate ()
> +{
> +  return (TARGET_AVX
> + && TARGET_SSE_PARTIAL_REG_DEPENDENCY
> + && TARGET_SSE_MATH
> + && optimize
> + && optimize_function_for_speed_p (cfun));
> +}
> +
>  /* At entry of the nearest common dominator for basic blocks with
> conversions/rcp/sqrt/rsqrt/round, generate a single
> vxorps %xmmN, %xmmN, %xmmN
> @@ -3232,11 +3242,7 @@ public:
>/* opt_pass methods: */
>bool gate (function *) final override
>  {
> -  return (TARGET_AVX
> - && TARGET_SSE_PARTIAL_REG_DEPENDENCY
> - && TARGET_SSE_MATH
> - && optimize
> - && optimize_function_for_speed_p (cfun));
> +  return ix86_rpad_gate ();
>  }
>
>unsigned int execute (function *) final override
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index 9c12d498928..1ef2c71a7a2 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -1944,10 +1944,6 @@ ix86_override_options_after_change (void)
> flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
>  }
>
> -  /* Late combine tends to undo some of the effects of STV and RPAD,
> - by combining instructions back to their original form.  */
> -  if (!OPTION_SET_P (flag_late_combine_instructions))
> -flag_late_combine_instructions = 0;
>  }
>
>  /* Clear stack slot assignments remembered from previous functions.
> diff --git a/gcc/config/i386/i386-passes.def b/gcc/config/i386/i386-passes.def
> index 7d96766f7b9..2d29f65da88 100644
> --- a/gcc/config/i386/i386-passes.def
> +++ b/gcc/config/i386/i386-passes.def
> @@ -25,11 +25,11 @@ along with GCC; see the file COPYING3.  If not see
>   */
>
>INSERT_PASS_AFTER (pass_postreload_cse, 1, pass_insert_vzeroupper);
> -  INSERT_PASS_AFTER (pass_combine, 1, pass_stv, false /* timode_p */);
> +  INSERT_PASS_AFTER (pass_late_combine, 1, pass_stv, false /* 

Re: [PATCH 2/3] Extend lshifrtsi3_1_zext to ?k alternative.

2024-06-28 Thread Uros Bizjak
On Fri, Jun 28, 2024 at 7:29 AM liuhongt  wrote:
>
> late_combine will combine lshift + zero into *lshifrtsi3_1_zext, which
> causes an extra mov between gpr and kmask; add ?k to the pattern.
>
> gcc/ChangeLog:
>
> PR target/115610
> * config/i386/i386.md (<*insnsi3_zext): Add alternative ?k,
> enable it only for lshiftrt and under avx512bw.
> * config/i386/sse.md (*klshrsi3_1_zext): New define_insn, and
> add corresponding define_split after it.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.md | 19 +--
>  gcc/config/i386/sse.md  | 28 
>  2 files changed, 41 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index fd48e764469..57a10c1af48 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -16836,10 +16836,10 @@ (define_insn "*bmi2_si3_1_zext"
> (set_attr "mode" "SI")])
>
>  (define_insn "*si3_1_zext"
> -  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
> +  [(set (match_operand:DI 0 "register_operand" "=r,r,r,?k")
> (zero_extend:DI
> - (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" 
> "0,rm,rm")
> - (match_operand:QI 2 "nonmemory_operand" 
> "cI,r,cI"
> + (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" 
> "0,rm,rm,k")
> + (match_operand:QI 2 "nonmemory_operand" 
> "cI,r,cI,I"
> (clobber (reg:CC FLAGS_REG))]
>"TARGET_64BIT
> && ix86_binary_operator_ok (, SImode, operands, TARGET_APX_NDD)"
> @@ -16850,6 +16850,8 @@ (define_insn "*si3_1_zext"
>  case TYPE_ISHIFTX:
>return "#";
>
> +case TYPE_MSKLOG:
> +  return "#";
>  default:
>if (operands[2] == const1_rtx
>   && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
> @@ -16860,8 +16862,8 @@ (define_insn "*si3_1_zext"
>: "{l}\t{%2, %k0|%k0, %2}";
>  }
>  }
> -  [(set_attr "isa" "*,bmi2,apx_ndd")
> -   (set_attr "type" "ishift,ishiftx,ishift")
> +  [(set_attr "isa" "*,bmi2,apx_ndd,avx512bw")
> +   (set_attr "type" "ishift,ishiftx,ishift,msklog")
> (set (attr "length_immediate")
>   (if_then_else
> (and (match_operand 2 "const1_operand")
> @@ -16869,7 +16871,12 @@ (define_insn "*si3_1_zext"
>  (match_test "optimize_function_for_size_p (cfun)")))
> (const_string "0")
> (const_string "*")))
> -   (set_attr "mode" "SI")])
> +   (set_attr "mode" "SI")
> +   (set (attr "enabled")
> +   (if_then_else
> + (eq_attr "alternative" "3")
> + (symbol_ref " == LSHIFTRT && TARGET_AVX512BW")
> + (const_string "*")))])
>
>  ;; Convert shift to the shiftx pattern to avoid flags dependency.
>  (define_split
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 0be2dcd8891..20665a6f097 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -2179,6 +2179,34 @@ (define_split
>  (match_dup 2)))
>(unspec [(const_int 0)] UNSPEC_MASKOP)])])
>
> +(define_insn "*klshrsi3_1_zext"
> +  [(set (match_operand:DI 0 "register_operand" "=k")
> +   (zero_extend:DI
> + (lshiftrt:SI (match_operand:SI 1 "register_operand" "k")
> +  (match_operand 2 "const_0_to_31_operand" "I"
> +  (unspec [(const_int 0)] UNSPEC_MASKOP)]
> +  "TARGET_AVX512BW"
> +  "kshiftrd\t{%2, %1, %0|%0, %1, %2}"
> +[(set_attr "type" "msklog")
> +   (set_attr "prefix" "vex")
> +   (set_attr "mode" "SI")])
> +
> +(define_split
> +  [(set (match_operand:DI 0 "mask_reg_operand")
> +   (zero_extend:DI
> + (lshiftrt:SI
> +   (match_operand:SI 1 "mask_reg_operand")
> +   (match_operand 2 "const_0_to_31_operand"
> +(clobber (reg:CC FLAGS_REG))]
> +  "TARGET_AVX512BW && reload_completed"
> +  [(parallel
> + [(set (match_dup 0)
> +  (zero_extend:DI
> +(lshiftrt:SI
> +  (match_dup 1)
> +  (match_dup 2
> +  (unspec [(const_int 0)] UNSPEC_MASKOP)])])
> +
>  (define_insn "ktest"
>[(set (reg:CC FLAGS_REG)
> (unspec:CC
> --
> 2.31.1
>


Re: [x86 PATCH] Handle sign_extend like zero_extend in *concatditi3_[346]

2024-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2024 at 9:40 PM Roger Sayle  wrote:
>
>
> This patch generalizes some of the patterns in i386.md that recognize
> double word concatenation, so they handle sign_extend the same way that
> they handle zero_extend in appropriate contexts.
>
> As a motivating example consider the following function:
>
> __int128 foo(long long x, unsigned long long y)
> {
>   return ((__int128)x<<64) | y;
> }
>
> when compiled with -O2, x86_64 currently generates:
>
> foo:    movq    %rdi, %rdx
>         xorl    %eax, %eax
>         xorl    %edi, %edi
>         orq     %rsi, %rax
>         orq     %rdi, %rdx
>         ret
>
> with this patch we now generate (the same as if x is unsigned):
>
> foo:    movq    %rsi, %rax
>         movq    %rdi, %rdx
>         ret
>
> Treating both extensions the same way using any_extend is valid as
> the top (extended) bits are "unused" after the shift by 64 (or more).
> In theory, the RTL optimizers might consider canonicalizing the form
> of extension used in these cases, but zero_extend is faster on some
> machines, whereas sign extension is supported via addressing modes on
> others, so handling both in the machine description is probably best.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-06-27  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.md (*concat3_3): Change zero_extend
> to any_extend in first operand to left shift by mode precision.
> (*concat3_4): Likewise.
> (*concat3_6): Likewise.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/concatditi-1.c: New test case.

OK.

Thanks,
Uros.

>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH] i386: Fix regression after refactoring legitimize_pe_coff_symbol, ix86_GOT_alias_set and PE_COFF_LEGITIMIZE_EXTERN_DECL

2024-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2024 at 12:50 PM Evgeny Karpov
 wrote:
>
> Thursday, June 27, 2024 10:39 AM
> Uros Bizjak  wrote:
>
> > > diff --git a/gcc/config/i386/i386-expand.cc 
> > > b/gcc/config/i386/i386-expand.cc
> > > index 5dfa7d49f58..20adb42e17b 100644
> > > --- a/gcc/config/i386/i386-expand.cc
> > > +++ b/gcc/config/i386/i386-expand.cc
> > > @@ -414,6 +414,10 @@ ix86_expand_move (machine_mode mode, rtx
> > operands[])
> > >   {
> > >  #if TARGET_PECOFF
> > >tmp = legitimize_pe_coff_symbol (op1, addend != NULL_RTX);
> > > +#else
> > > +tmp = NULL_RTX;
> > > +#endif
> > > +
> > >if (tmp)
> > >  {
> > >op1 = tmp;
> > > @@ -425,7 +429,6 @@ ix86_expand_move (machine_mode mode, rtx
> > operands[])
> > >op1 = operands[1];
> > >break;
> > >  }
> > > -#endif
> > >   }
> > >
> > >if (addend)
> >
> > tmp can only be set by legitimize_pe_coff_symbol, so !TARGET_PECOFF
> > will always get to the "else" part. Do this change simply by moving
> > #endif, like the below:
> >
> > --cut here--
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> > index 5dfa7d49f58..407db6c215b 100644
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > @@ -421,11 +421,11 @@ ix86_expand_move (machine_mode mode, rtx
> > operands[])
> >break;
> >}
> >  else
> > +#endif
> >{
> >  op1 = operands[1];
> >  break;
> >}
> > -#endif
> >}
> >
> >   if (addend)
> > --cut here--
> >
>
> I would prefer readability in the original version if there are no objections.

The proposed form is how existing TARGET_MACHO handles similar issue.
Please see e.g. i386.cc, around line 6216 and elsewhere:

#if TARGET_MACHO
  if (TARGET_MACHO)
{
  switch_to_section (darwin_sections[picbase_thunk_section]);
  fputs ("\t.weak_definition\t", asm_out_file);
  assemble_name (asm_out_file, name);
  fputs ("\n\t.private_extern\t", asm_out_file);
  assemble_name (asm_out_file, name);
  putc ('\n', asm_out_file);
  ASM_OUTPUT_LABEL (asm_out_file, name);
  DECL_WEAK (decl) = 1;
}
  else
#endif
if (USE_HIDDEN_LINKONCE)
...

So, there is no problem having #endif just after else.

Anyway, it's your call, this is not a hill I'm willing to die on. ;)

Thanks,
Uros.


Re: [PATCH] libgccjit: Add support for machine-dependent builtins

2024-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2024 at 12:49 AM David Malcolm  wrote:
>
> On Thu, 2023-11-23 at 17:17 -0500, Antoni Boucher wrote:
> > Hi.
> > I did split the patch and sent one for the bfloat16 support and
> > another
> > one for the vector support.
> >
> > Here's the updated patch for the machine-dependent builtins.
> >
>
> Thanks for the patch; sorry about the long delay in reviewing it.
>
> CCing Jan and Uros re the i386 part of that patch; for reference the
> patch being discussed is here:
>   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638027.html
>
> > From e025f95f4790ae861e709caf23cbc0723c1a3804 Mon Sep 17 00:00:00 2001
> > From: Antoni Boucher 
> > Date: Mon, 23 Jan 2023 17:21:15 -0500
> > Subject: [PATCH] libgccjit: Add support for machine-dependent builtins
>
> [...snip...]
>
> > diff --git a/gcc/config/i386/i386-builtins.cc 
> > b/gcc/config/i386/i386-builtins.cc
> > index 42fc3751676..5cc1d6f4d2e 100644
> > --- a/gcc/config/i386/i386-builtins.cc
> > +++ b/gcc/config/i386/i386-builtins.cc
> > @@ -225,6 +225,22 @@ static GTY(()) tree ix86_builtins[(int) 
> > IX86_BUILTIN_MAX];
> >
> >  struct builtin_isa ix86_builtins_isa[(int) IX86_BUILTIN_MAX];
> >
> > +static void
> > +clear_builtin_types (void)
> > +{
> > +  for (int i = 0 ; i < IX86_BT_LAST_CPTR + 1 ; i++)
> > +ix86_builtin_type_tab[i] = NULL;
> > +
> > +  for (int i = 0 ; i < IX86_BUILTIN_MAX ; i++)
> > +  {
> > +ix86_builtins[i] = NULL;
> > +ix86_builtins_isa[i].set_and_not_built_p = true;
> > +  }
> > +
> > +  for (int i = 0 ; i < IX86_BT_LAST_ALIAS + 1 ; i++)
> > +ix86_builtin_func_type_tab[i] = NULL;
> > +}
> > +
> >  tree get_ix86_builtin (enum ix86_builtins c)
> >  {
> >return ix86_builtins[c];
> > @@ -1483,6 +1499,8 @@ ix86_init_builtins (void)
> >  {
> >tree ftype, decl;
> >
> > +  clear_builtin_types ();
> > +
> >ix86_init_builtin_types ();
> >
> >/* Builtins to get CPU type and features. */
>
> Please can one of the i386 maintainers check this?
> (CCing Jan and Uros: this is for the case where the compiler code runs
> multiple times in-process due to being linked into libgccjit.so.  We
> want to restore state within i386-builtins.cc to an initial state, and
> ensure that no GC-managed objects persist from previous in-memory
> compiles).

Can we rather introduce TARGET_CLEANUP_BUILTINS hook and call it from
the JIT compiler at some appropriate time? IMO, this unnecessarily
burdens non-JIT compilation.

Uros.


Re: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2024 at 9:01 AM Li, Pan2  wrote:
>
> It only requires the backend to implement the standard name for the vector
> mode, I bet.

There are several standard names present for x86:
{ss,us}{add,sub}{v8qi,v16qi,v32qi,v64qi,v4hi,v8hi,v16hi,v32hi},
defined in sse.md:

(define_expand "3"
  [(set (match_operand:VI12_AVX2_AVX512BW 0 "register_operand")
(sat_plusminus:VI12_AVX2_AVX512BW
  (match_operand:VI12_AVX2_AVX512BW 1 "vector_operand")
  (match_operand:VI12_AVX2_AVX512BW 2 "vector_operand")))]
  "TARGET_SSE2 &&  && "
  "ix86_fixup_binary_operands_no_copy (, mode, operands);")

but all of these handle only 8 and 16 bit elements.

> How about a simpler one like below.
>
>   #define DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(OUT_T, IN_T)   \
>   void __attribute__((noinline))   \
>   vec_sat_u_sub_trunc_##OUT_T##_fmt_1 (OUT_T *out, IN_T *op_1, IN_T y, \
>unsigned limit) \
>   {\
> unsigned i;\
> for (i = 0; i < limit; i++)\
>   {\
> IN_T x = op_1[i];  \
> out[i] = (OUT_T)(x >= y ? x - y : 0);  \
>   }\
>   }
>
> DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint32_t, uint64_t);

I tried with:

DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint8_t, uint16_t);

And the compiler was able to detect several .SAT_SUB patterns:

$ grep SAT_SUB pr51492-1.c.266t.optimized
 vect_patt_37.14_85 = .SAT_SUB (vect_x_13.12_81, vect_cst__84);
 vect_patt_37.14_86 = .SAT_SUB (vect_x_13.13_83, vect_cst__84);
 vect_patt_42.26_126 = .SAT_SUB (vect_x_62.24_122, vect_cst__125);
 vect_patt_42.26_127 = .SAT_SUB (vect_x_62.25_124, vect_cst__125);
 iftmp.0_24 = .SAT_SUB (x_3, y_14(D));

Uros.

>
> The riscv backend is able to detect the pattern similar as below. I can help 
> to check x86 side after the running test suites.
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   if (limit_11(D) != 0)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
> ;;succ:   3
> ;;5
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   vect_cst__71 = [vec_duplicate_expr] y_14(D);
>   _78 = (unsigned long) limit_11(D);
> ;;succ:   4
>
> ;;   basic block 4, loop depth 1
> ;;pred:   4
> ;;3
>   # vectp_op_1.7_68 = PHI 
>   # vectp_out.12_75 = PHI 
>   # ivtmp_79 = PHI 
>   _81 = .SELECT_VL (ivtmp_79, POLY_INT_CST [2, 2]);
>   ivtmp_67 = _81 * 8;
>   vect_x_13.9_70 = .MASK_LEN_LOAD (vectp_op_1.7_68, 64B, { -1, ... }, _81, 0);
>   vect_patt_48.10_72 = .SAT_SUB (vect_x_13.9_70, vect_cst__71);   
>// .SAT_SUB pattern
>   vect_patt_49.11_73 = (vector([2,2]) unsigned int) vect_patt_48.10_72;
>   ivtmp_74 = _81 * 4;
>   .MASK_LEN_STORE (vectp_out.12_75, 32B, { -1, ... }, _81, 0, 
> vect_patt_49.11_73);
>   vectp_op_1.7_69 = vectp_op_1.7_68 + ivtmp_67;
>   vectp_out.12_76 = vectp_out.12_75 + ivtmp_74;
>   ivtmp_80 = ivtmp_79 - _81;
>
> riscv64-unknown-elf-gcc (GCC) 15.0.0 20240627 (experimental)
> Copyright (C) 2024 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> Pan
>
> -Original Message-
> From: Uros Bizjak 
> Sent: Thursday, June 27, 2024 2:48 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> richard.guent...@gmail.com; jeffreya...@gmail.com; pins...@gmail.com
> Subject: Re: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
>
> On Mon, Jun 24, 2024 at 3:55 PM  wrote:
> >
> > From: Pan Li 
> >
> > The zip benchmark of coremark-pro have one SAT_SUB like pattern but
> > truncated as below:
> >
> > void test (uint16_t *x, unsigned b, unsigned n)
> > {
> >   unsigned a = 0;
> >   register uint16_t *p = x;
> >
> >   do {
> > a = *--p;
> > *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
> >   } while (--n);
> > }
> >

No, the current compiler does not recognize .SAT_SUB for x86 with the
above code, although many vector sat sub instructions involving 16-bit
elements are present.

Uros.


Re: [PATCH] i386: Fix regression after refactoring legitimize_pe_coff_symbol, ix86_GOT_alias_set and PE_COFF_LEGITIMIZE_EXTERN_DECL

2024-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2024 at 9:16 AM Evgeny Karpov
 wrote:
>
> Thank you for reporting the issues and discussing the root causes.
> It helped in preparing the patch.
>
> This patch fixes 3 bugs reported after merging
> the "Add DLL import/export implementation to AArch64" series.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653955.html
> The series refactors the i386 codebase to reuse it in AArch64, which
> triggers some bugs.
>
> Bug 115661 - [15 Regression] wrong code at -O{2,3} on
> x86_64-linux-gnu since r15-1599-g63512c72df09b4
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115661
>
> Bug 115635 - [15 regression] Bootstrap fails with failed
> self-test with the rust fe (diagnostic-path.cc:1153:
> test_empty_path: FAIL: ASSERT_FALSE
> ((path.interprocedural_p ( since r15-1599-g63512c72df09b4
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115635
>
> Issue 1. In some places, the i386 code relies on calling
> legitimize_pe_coff_symbol on all platforms and expects it to return
> NULL_RTX when PE/COFF is not supported.
>
> Fix: NULL_RTX handling has been added when the target does not
> support PECOFF.
>
> Issue 2. ix86_GOT_alias_set is used on all platforms and cannot be
> extracted to mingw.
>
> Fix: ix86_GOT_alias_set has been returned as it was and is used on
> all platforms for i386.
>
> Bug 115643 - [15 regression] aarch64-w64-mingw32 support today breaks
> x86_64-w64-mingw32 build cannot represent relocation type
> BFD_RELOC_64 since r15-1602-ged20feebd9ea31
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115643
>
> Issue 3. PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED has been added and used
> with a negative operator for a complex expression without braces.
>
> Fix: Braces have been added, and
> PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED has been renamed to
> PE_COFF_LEGITIMIZE_EXTERN_DECL.
>
>
> The patch has been attached as a text file because it contains special
> characters that are usually removed by the mail client.

> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 5dfa7d49f58..20adb42e17b 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -414,6 +414,10 @@ ix86_expand_move (machine_mode mode, rtx operands[])
>   {
>  #if TARGET_PECOFF
>tmp = legitimize_pe_coff_symbol (op1, addend != NULL_RTX);
> +#else
> +tmp = NULL_RTX;
> +#endif
> +
>if (tmp)
>  {
>op1 = tmp;
> @@ -425,7 +429,6 @@ ix86_expand_move (machine_mode mode, rtx operands[])
>op1 = operands[1];
>break;
>  }
> -#endif
>   }
>
>if (addend)

tmp can only be set by legitimize_pe_coff_symbol, so !TARGET_PECOFF
will always get to the "else" part. Do this change simply by moving
#endif, like the below:

--cut here--
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 5dfa7d49f58..407db6c215b 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -421,11 +421,11 @@ ix86_expand_move (machine_mode mode, rtx operands[])
   break;
   }
 else
+#endif
   {
 op1 = operands[1];
 break;
   }
-#endif
   }

  if (addend)
--cut here--

Side note, legitimize_pe_coff_symbol is always called from #if
TARGET_PECOFF, so:

rtx
legitimize_pe_coff_symbol (rtx addr, bool inreg)
{
  if (!TARGET_PECOFF)
return NULL_RTX;

should be removed or converted to gcc_assert.

> +alias_set_type
> +ix86_GOT_alias_set (void)
> +{
> +  static alias_set_type set = -1;

Please add a line of vertical space here.

> +  if (set == -1)
> +set = new_alias_set ();
> +  return set;

OK, but please allow Richard B. to look at the alias_set changes.

Thanks,
Uros.


Re: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-27 Thread Uros Bizjak
On Mon, Jun 24, 2024 at 3:55 PM  wrote:
>
> From: Pan Li 
>
> The zip benchmark of coremark-pro have one SAT_SUB like pattern but
> truncated as below:
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
>
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
>   } while (--n);
> }
>
> It will have gimple before vect pass,  it cannot hit any pattern of
> SAT_SUB and then cannot vectorize to SAT_SUB.
>
> _2 = a_11 - b_12(D);
> iftmp.0_13 = (short unsigned int) _2;
> _18 = a_11 >= b_12(D);
> iftmp.0_5 = _18 ? iftmp.0_13 : 0;
>
> This patch would like to improve the pattern match to recog above
> as truncate after .SAT_SUB pattern.  Then we will have the pattern
> similar to below,  as well as eliminate the first 3 dead stmt.
>
> _2 = a_11 - b_12(D);
> iftmp.0_13 = (short unsigned int) _2;
> _18 = a_11 >= b_12(D);
> iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D));
>
> The below tests are passed for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.

I have tried this patch with x86_64 on the testcase from PR51492, but
the compiler does not recognize the .SAT_SUB pattern here.

Is there anything else missing for successful detection?

Uros.

>
> gcc/ChangeLog:
>
> * match.pd: Add convert description for minus and capture.
> * tree-vect-patterns.cc (vect_recog_build_binary_gimple_call): Add
> new logic to handle when in_type is incompatible with out_type, as
> well as rename from.
> (vect_recog_build_binary_gimple_stmt): Rename to.
> (vect_recog_sat_add_pattern): Leverage above renamed func.
> (vect_recog_sat_sub_pattern): Ditto.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  |  4 +--
>  gcc/tree-vect-patterns.cc | 51 ---
>  2 files changed, 33 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 3d0689c9312..4a4b0b2e72f 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3164,9 +3164,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned saturation sub, case 2 (branch with ge):
> SAT_U_SUB = X >= Y ? X - Y : 0.  */
>  (match (unsigned_integer_sat_sub @0 @1)
> - (cond^ (ge @0 @1) (minus @0 @1) integer_zerop)
> + (cond^ (ge @0 @1) (convert? (minus (convert1? @0) (convert1? @1))) 
> integer_zerop)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> -  && types_match (type, @0, @1
> +  && TYPE_UNSIGNED (TREE_TYPE (@0)) && types_match (@0, @1
>
>  /* Unsigned saturation sub, case 3 (branchless with gt):
> SAT_U_SUB = (X - Y) * (X > Y).  */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index cef901808eb..3d887d36050 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4490,26 +4490,37 @@ vect_recog_mult_pattern (vec_info *vinfo,
>  extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
>  extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
>
> -static gcall *
> -vect_recog_build_binary_gimple_call (vec_info *vinfo, gimple *stmt,
> +static gimple *
> +vect_recog_build_binary_gimple_stmt (vec_info *vinfo, stmt_vec_info 
> stmt_info,
>  internal_fn fn, tree *type_out,
> -tree op_0, tree op_1)
> +tree lhs, tree op_0, tree op_1)
>  {
>tree itype = TREE_TYPE (op_0);
> -  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
> +  tree otype = TREE_TYPE (lhs);
> +  tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
> +  tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
>
> -  if (vtype != NULL_TREE
> -&& direct_internal_fn_supported_p (fn, vtype, OPTIMIZE_FOR_BOTH))
> +  if (v_itype != NULL_TREE && v_otype != NULL_TREE
> +&& direct_internal_fn_supported_p (fn, v_itype, OPTIMIZE_FOR_BOTH))
>  {
>gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
> +  tree in_ssa = vect_recog_temp_ssa_var (itype, NULL);
>
> -  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
> +  gimple_call_set_lhs (call, in_ssa);
>gimple_call_set_nothrow (call, /* nothrow_p */ false);
> -  gimple_set_location (call, gimple_location (stmt));
> +  gimple_set_location (call, gimple_location (STMT_VINFO_STMT 
> (stmt_info)));
> +
> +  *type_out = v_otype;
>
> -  *type_out = vtype;
> +  if (types_compatible_p (itype, otype))
> +   return call;
> +  else
> +   {
> + append_pattern_def_seq (vinfo, stmt_info, call, v_itype);
> + tree out_ssa = vect_recog_temp_ssa_var (otype, NULL);
>
> -  return call;
> + return gimple_build_assign (out_ssa, CONVERT_EXPR, in_ssa);
> +   }
>  }
>
>return NULL;
> @@ -4541,13 +4552,13 @@ 

Re: [PATCH V2] Fix wrong cost of MEM when addr is a lea.

2024-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2024 at 5:57 AM liuhongt  wrote:
>
> > But rtx_cost invokes targetm.rtx_cost which allows to avoid that
> > recursive processing at any level.  You're dealing with MEM [addr]
> > here, so why's rtx_cost (addr, Pmode, MEM, 0, speed) not always
> > the best way to deal with this?  Since this is the MEM [addr] case
> > we know it's not LEA, no?
> The patch restricts the MEM rtx_cost reduction to register_operand + disp only.
>
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?

LGTM.

Thanks,
Uros.

>
>
> 416.gamess regressed 4-6% on x86_64 since my r15-882-g1d6199e5f8c1c0.
> The commit adjusted the rtx_cost of MEM to reduce the cost of (add op0 disp).
> But the cost of ADDR can be cheaper than that of XEXP (addr, 0) when it's a
> LEA.  That is the case in the PR, so the patch adjusts rtx_cost to only
> handle reg + disp; the other forms are basically all LEAs, which don't have
> the additional cost of an ADD.
>
> gcc/ChangeLog:
>
> PR target/115462
> * config/i386/i386.cc (ix86_rtx_costs): Make cost of MEM (reg +
> disp) just a little bit more than MEM (reg).
>
> gcc/testsuite/ChangeLog:
> * gcc.target/i386/pr115462.c: New test.
> ---
>  gcc/config/i386/i386.cc  |  5 -
>  gcc/testsuite/gcc.target/i386/pr115462.c | 22 ++
>  2 files changed, 26 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr115462.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index d4ccc24be6e..ef2a1e4f4f2 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22339,7 +22339,10 @@ ix86_rtx_costs (rtx x, machine_mode mode, int 
> outer_code_i, int opno,
>  address_cost should be used, but it reduce cost too much.
>  So current solution is make constant disp as cheap as possible.  
> */
>   if (GET_CODE (addr) == PLUS
> - && x86_64_immediate_operand (XEXP (addr, 1), Pmode))
> + && x86_64_immediate_operand (XEXP (addr, 1), Pmode)
> + /* Only handle (reg + disp) since other forms of addr are 
> mostly LEA,
> +there's no additional cost for the plus of disp.  */
> + && register_operand (XEXP (addr, 0), Pmode))
> {
>   *total += 1;
>   *total += rtx_cost (XEXP (addr, 0), Pmode, PLUS, 0, speed);
> diff --git a/gcc/testsuite/gcc.target/i386/pr115462.c 
> b/gcc/testsuite/gcc.target/i386/pr115462.c
> new file mode 100644
> index 000..ad50a6382bc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr115462.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx2 -fno-tree-vectorize -fno-pic" } */
> +/* { dg-final { scan-assembler-times {(?n)movl[ \t]+.*, p1\.0\+[0-9]*\(,} 3 
> } } */
> +
> +int
> +foo (long indx, long indx2, long indx3, long indx4, long indx5, long indx6, 
> long n, int* q)
> +{
> +  static int p1[1];
> +  int* p2 = p1 + 1000;
> +  int* p3 = p1 + 4000;
> +  int* p4 = p1 + 8000;
> +
> +  for (long i = 0; i != n; i++)
> +{
> +  /* scan for  movl %edi, p1.0+3996(,%rax,4),
> +p1.0+3996 should be propagated into the loop.  */
> +  p2[indx++] = q[indx++];
> +  p3[indx2++] = q[indx2++];
> +  p4[indx3++] = q[indx3++];
> +}
> +  return p1[indx6] + p1[indx5];
> +}
> --
> 2.31.1
>
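For reference, the addressing-mode folding that motivates the cost tweak can be sketched in C; this is a hand-written illustration under assumed flags (-O2 on x86-64), not code from the patch, and all names are made up for the example:

```c
#include <assert.h>

/* Minimal sketch of the PR target/115462 scenario discussed above:
   a store through "base + constant displacement".  On x86-64 the
   constant folds into the addressing mode, e.g.
   "movl %edi, p1+4000(,%rax,4)", so costing the (plus reg disp)
   address as a full ADD would overstate the cost of such a MEM.  */

static int p1[16000];

void
store_disp (long n, const int *q)
{
  int *p2 = p1 + 1000;		/* constant displacement from p1 */
  for (long i = 0; i != n; i++)
    p2[i] = q[i];		/* address is p1 + 4000 + 4*i */
}

int
read_disp (long i)
{
  return p1[1000 + i];
}
```

Compiled at -O2, the store ideally stays a single mov with the displacement folded into the address, with no separate add or lea inside the loop.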


Re: [PATCH] i386: Fix some ISA bit test in option_override

2024-06-19 Thread Uros Bizjak
On Thu, Jun 20, 2024 at 3:16 AM Hongyu Wang  wrote:
>
> Hi,
>
> This patch adjusts several new feature checks in ix86_option_override_internal
> that directly use TARGET_* instead of TARGET_*_P (opts->ix86_isa_flags),
> which caused the command-line option to override the target_attribute ISA flag.
>
> Bootstrapped && regtested on x86_64-pc-linux-gnu.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386-options.cc (ix86_option_override_internal):
> Use TARGET_*_P (opts->x_ix86_isa_flags*) instead of TARGET_*
> for UINTR, LAM and APX_F.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/apx-ccmp-2.c: Remove -mno-apxf in option.
> * gcc.target/i386/funcspec-56.inc: Drop uintr tests.
> * gcc.target/i386/funcspec-6.c: Add uintr tests.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-options.cc   | 14 +-
>  gcc/testsuite/gcc.target/i386/apx-ccmp-2.c|  2 +-
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |  2 --
>  gcc/testsuite/gcc.target/i386/funcspec-6.c|  2 ++
>  4 files changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index f2cecc0e254..34adedb3127 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -2113,15 +2113,18 @@ ix86_option_override_internal (bool main_args_p,
>opts->x_ix86_stringop_alg = no_stringop;
>  }
>
> -  if (TARGET_APX_F && !TARGET_64BIT)
> +  if (TARGET_APX_F_P (opts->x_ix86_isa_flags2)
> +  && !TARGET_64BIT_P (opts->x_ix86_isa_flags))
>  error ("%<-mapxf%> is not supported for 32-bit code");
> -  else if (opts->x_ix86_apx_features != apx_none && !TARGET_64BIT)
> +  else if (opts->x_ix86_apx_features != apx_none
> +  && !TARGET_64BIT_P (opts->x_ix86_isa_flags))
>  error ("%<-mapx-features=%> option is not supported for 32-bit code");
>
> -  if (TARGET_UINTR && !TARGET_64BIT)
> +  if (TARGET_UINTR_P (opts->x_ix86_isa_flags2)
> +  && !TARGET_64BIT_P (opts->x_ix86_isa_flags))
>  error ("%<-muintr%> not supported for 32-bit code");
>
> -  if (ix86_lam_type && !TARGET_LP64)
> +  if (ix86_lam_type && !TARGET_LP64_P (opts->x_ix86_isa_flags))
>  error ("%<-mlam=%> option: [u48|u57] not supported for 32-bit code");
>
>if (!opts->x_ix86_arch_string)
> @@ -2502,7 +2505,8 @@ ix86_option_override_internal (bool main_args_p,
>init_machine_status = ix86_init_machine_status;
>
>/* Override APX flag here if ISA bit is set.  */
> -  if (TARGET_APX_F && !OPTION_SET_P (ix86_apx_features))
> +  if (TARGET_APX_F_P (opts->x_ix86_isa_flags2)
> +  && !OPTION_SET_P (ix86_apx_features))
>  opts->x_ix86_apx_features = apx_all;
>
>/* Validate -mregparm= value.  */
> diff --git a/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c 
> b/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c
> index 4a0784394c3..192c0458728 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c
> @@ -1,6 +1,6 @@
>  /* { dg-do run { target { ! ia32 } } } */
>  /* { dg-require-effective-target apxf } */
> -/* { dg-options "-O3 -mno-apxf" } */
> +/* { dg-options "-O3" } */
>
>  __attribute__((noinline, noclone, target("apxf")))
>  int foo_apx(int a, int b, int c, int d)
> diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc 
> b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> index 2a50f5bf67c..8825e88768a 100644
> --- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> +++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> @@ -69,7 +69,6 @@ extern void test_avx512vp2intersect (void)
> __attribute__((__target__("avx512vp2i
>  extern void test_amx_tile (void)   
> __attribute__((__target__("amx-tile")));
>  extern void test_amx_int8 (void)   
> __attribute__((__target__("amx-int8")));
>  extern void test_amx_bf16 (void)   
> __attribute__((__target__("amx-bf16")));
> -extern void test_uintr (void)  
> __attribute__((__target__("uintr")));
>  extern void test_hreset (void) 
> __attribute__((__target__("hreset")));
>  extern void test_keylocker (void)  
> __attribute__((__target__("kl")));
>  extern void test_widekl (void) 
> __attribute__((__target__("widekl")));
> @@ -158,7 +157,6 @@ extern void test_no_avx512vp2intersect (void)   
> __attribute__((__target__("no-avx5
>  extern void test_no_amx_tile (void)
> __attribute__((__target__("no-amx-tile")));
>  extern void test_no_amx_int8 (void)
> __attribute__((__target__("no-amx-int8")));
>  extern void test_no_amx_bf16 (void)
> __attribute__((__target__("no-amx-bf16")));
> -extern void test_no_uintr (void)   
> __attribute__((__target__("no-uintr")));
>  extern void test_no_hreset (void)  
> __attribute__((__target__("no-hreset")));
>  extern void test_no_keylocker (void)   
> __attribute__((__target__("no-kl")));
>  extern void 

Re: [PATCH] [x86_64]: Zhaoxin shijidadao enablement

2024-06-19 Thread Uros Bizjak
On Tue, Jun 18, 2024 at 9:21 AM mayshao-oc  wrote:
>
>
>
> On 5/28/24 14:15, Uros Bizjak wrote:
> >
> >
> >
> > On Mon, May 27, 2024 at 10:33 AM MayShao  wrote:
> >>
> >> From: mayshao 
> >>
> >> Hi all:
> >>  This patch enables -march/-mtune=shijidadao, costs and tunings are 
> >> set according to the characteristics of the processor.
> >>
> >>  Bootstrapped /regtested X86_64.
> >>
> >>  Ok for trunk?
> >
> > OK.
> >
> > Thanks,
> > Uros.
>
> Thanks for your review, please help me commit.

Done, committed as r15-1454 [1].

[1] https://gcc.gnu.org/pipermail/gcc-cvs/2024-June/404474.html

Thanks,
Uros.


[gcc r15-1454] i386: Zhaoxin shijidadao enablement

2024-06-19 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:6f6ea27d17e9bbc917b94ffea1c933755e736bdc

commit r15-1454-g6f6ea27d17e9bbc917b94ffea1c933755e736bdc
Author: mayshao 
Date:   Wed Jun 19 16:03:25 2024 +0200

i386: Zhaoxin shijidadao enablement

This patch enables -march/-mtune=shijidadao, costs and tunings are set
according to the characteristics of the processor.

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize 
shijidadao.
* common/config/i386/i386-common.cc: Add shijidadao.
* common/config/i386/i386-cpuinfo.h (enum processor_subtypes):
Add ZHAOXIN_FAM7H_SHIJIDADAO.
* config.gcc: Add shijidadao.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Let -march=native recognize shijidadao processors.
* config/i386/i386-c.cc (ix86_target_macros_internal): Add 
shijidadao.
* config/i386/i386-options.cc (m_ZHAOXIN): Add m_SHIJIDADAO.
(m_SHIJIDADAO): New definition.
* config/i386/i386.h (enum processor_type): Add 
PROCESSOR_SHIJIDADAO.
* config/i386/x86-tune-costs.h (struct processor_costs):
Add shijidadao_cost.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Add shijidadao.
(ix86_adjust_cost): Ditto.
* config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Add 
m_SHIJIDADAO.
(X86_TUNE_USE_GATHER_4PARTS): Ditto.
(X86_TUNE_USE_GATHER_8PARTS): Ditto.
(X86_TUNE_AVOID_128FMA_CHAINS): Ditto.
* doc/extend.texi: Add details about shijidadao.
* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv32.C: Handle new -march
* gcc.target/i386/funcspec-56.inc: Ditto.

Diff:
---
 gcc/common/config/i386/cpuinfo.h  |   8 +-
 gcc/common/config/i386/i386-common.cc |   8 +-
 gcc/common/config/i386/i386-cpuinfo.h |   1 +
 gcc/config.gcc|  14 +++-
 gcc/config/i386/driver-i386.cc|  11 ++-
 gcc/config/i386/i386-c.cc |   7 ++
 gcc/config/i386/i386-options.cc   |   4 +-
 gcc/config/i386/i386.h|   1 +
 gcc/config/i386/x86-tune-costs.h  | 116 ++
 gcc/config/i386/x86-tune-sched.cc |   2 +
 gcc/config/i386/x86-tune.def  |   8 +-
 gcc/doc/extend.texi   |   3 +
 gcc/doc/invoke.texi   |   6 ++
 gcc/testsuite/g++.target/i386/mv32.C  |   6 ++
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
 15 files changed, 183 insertions(+), 14 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 4610bf6d6a45..936039725ab6 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -667,12 +667,18 @@ get_zhaoxin_cpu (struct __processor_model *cpu_model,
  reset_cpu_feature (cpu_model, cpu_features2, FEATURE_F16C);
  cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_LUJIAZUI;
}
- else if (model >= 0x5b)
+ else if (model == 0x5b)
{
  cpu = "yongfeng";
  CHECK___builtin_cpu_is ("yongfeng");
  cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_YONGFENG;
}
+ else if (model >= 0x6b)
+   {
+ cpu = "shijidadao";
+ CHECK___builtin_cpu_is ("shijidadao");
+ cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_SHIJIDADAO;
+   }
   break;
 default:
   break;
diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index 5d9c188c9c7d..e38b1b22ffb1 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2066,6 +2066,7 @@ const char *const processor_names[] =
   "intel",
   "lujiazui",
   "yongfeng",
+  "shijidadao",
   "geode",
   "k6",
   "athlon",
@@ -2271,10 +2272,13 @@ const pta processor_alias_table[] =
   | PTA_SSSE3 | PTA_SSE4_1 | PTA_FXSR, 0, P_NONE},
   {"lujiazui", PROCESSOR_LUJIAZUI, CPU_LUJIAZUI,
PTA_LUJIAZUI,
-   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_LUJIAZUI), P_NONE},
+   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_LUJIAZUI), P_PROC_BMI},
   {"yongfeng", PROCESSOR_YONGFENG, CPU_YONGFENG,
PTA_YONGFENG,
-   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_YONGFENG), P_NONE},
+   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_YONGFENG), P_PROC_AVX2},
+  {"shijidadao", PROCESSOR_SHIJIDADAO, CPU_YONGFENG,
+   PTA_YONGFENG,
+   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_SHIJIDADAO), P_PROC_AVX2},
   {"k8", PROCESSOR_K8, CPU_K8,
 PTA_64BIT | PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE
   | PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR, 0, P_NONE},
diff --git a/gcc/common/config/i386/i386-cpuinfo.h 
b/gcc/common/config/i386/i386-cpuinfo.h
index 3ec9e005a6ad..ccc6deb63853 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -104,6 

Re: [PATCH] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-13 Thread Uros Bizjak
On Thu, Jun 13, 2024 at 3:44 AM Hongyu Wang  wrote:
>
> Thanks for the advice, updated patch in attachment.
>
> Bootstrapped/regtested on x86-64-pc-linux-gnu. Ok for trunk?
>
> Uros Bizjak  于2024年6月12日周三 18:12写道:
> >
> > On Wed, Jun 12, 2024 at 12:00 PM Uros Bizjak  wrote:
> > >
> > > On Wed, Jun 12, 2024 at 5:12 AM Hongyu Wang  wrote:
> > > >
> > > > Hi,
> > > >
> > > > For CTEST, we don't have conditional AND so there's no optimization
> > > > opportunity to write a new ctest pattern. Emit ctest when ccmp did
> > > > comparison to const 0 to save bytes.
> > > >
> > > > Bootstrapped & regtested under x86-64-pc-linux-gnu.
> > > >
> > > > Ok for trunk?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * config/i386/i386.md (@ccmp): Use ctestcc when
> > > > operands[3] is const0_rtx.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/i386/apx-ccmp-1.c: Adjust output to scan ctest.
> > > > * gcc.target/i386/apx-ccmp-2.c: Adjust some condition to
> > > > compare with 0.

LGTM.

+  (minus:SWI (match_operand:SWI 2 "nonimmediate_operand" ",m,")
+ (match_operand:SWI 3 "" "C,,"))

Perhaps the constraint can be slightly optimized to avoid repeating
(,) pairs.

",m,"
"C  ,,"

Uros.

> > > > ---
> > > >  gcc/config/i386/i386.md|  6 +-
> > > >  gcc/testsuite/gcc.target/i386/apx-ccmp-1.c | 10 ++
> > > >  gcc/testsuite/gcc.target/i386/apx-ccmp-2.c |  4 ++--
> > > >  3 files changed, 13 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > > index a64f2ad4f5f..014d48cddd6 100644
> > > > --- a/gcc/config/i386/i386.md
> > > > +++ b/gcc/config/i386/i386.md
> > > > @@ -1522,7 +1522,11 @@ (define_insn "@ccmp"
> > > >   [(match_operand:SI 4 "const_0_to_15_operand")]
> > > >   UNSPEC_APX_DFV)))]
> > > >   "TARGET_APX_CCMP"
> > > > - "ccmp%C1{}\t%G4 {%3, %2|%2, %3}"
> > > > + {
> > > > +   if (operands[3] == const0_rtx && !MEM_P (operands[2]))
> > > > + return "ctest%C1{}\t%G4 %2, %2";
> > > > +   return "ccmp%C1{}\t%G4 {%3, %2|%2, %3}";
> > > > + }
> > >
> > > This could be implemented as an alternative using "r,C" constraint as
> > > the first constraint for operands[2,3]. Then the register allocator
> > > will match the constraints for you.
> >
> > Like in the attached (lightly tested) patch.
> >
> > Uros.


Re: [PATCH] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-12 Thread Uros Bizjak
On Wed, Jun 12, 2024 at 12:00 PM Uros Bizjak  wrote:
>
> On Wed, Jun 12, 2024 at 5:12 AM Hongyu Wang  wrote:
> >
> > Hi,
> >
> > For CTEST, we don't have conditional AND so there's no optimization
> > opportunity to write a new ctest pattern. Emit ctest when ccmp did
> > comparison to const 0 to save bytes.
> >
> > Bootstrapped & regtested under x86-64-pc-linux-gnu.
> >
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.md (@ccmp): Use ctestcc when
> > operands[3] is const0_rtx.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/apx-ccmp-1.c: Adjust output to scan ctest.
> > * gcc.target/i386/apx-ccmp-2.c: Adjust some condition to
> > compare with 0.
> > ---
> >  gcc/config/i386/i386.md|  6 +-
> >  gcc/testsuite/gcc.target/i386/apx-ccmp-1.c | 10 ++
> >  gcc/testsuite/gcc.target/i386/apx-ccmp-2.c |  4 ++--
> >  3 files changed, 13 insertions(+), 7 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index a64f2ad4f5f..014d48cddd6 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -1522,7 +1522,11 @@ (define_insn "@ccmp"
> >   [(match_operand:SI 4 "const_0_to_15_operand")]
> >   UNSPEC_APX_DFV)))]
> >   "TARGET_APX_CCMP"
> > - "ccmp%C1{}\t%G4 {%3, %2|%2, %3}"
> > + {
> > +   if (operands[3] == const0_rtx && !MEM_P (operands[2]))
> > + return "ctest%C1{}\t%G4 %2, %2";
> > +   return "ccmp%C1{}\t%G4 {%3, %2|%2, %3}";
> > + }
>
> This could be implemented as an alternative using "r,C" constraint as
> the first constraint for operands[2,3]. Then the register allocator
> will match the constraints for you.

Like in the attached (lightly tested) patch.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a64f2ad4f5f..14d4d8cddca 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1515,14 +1515,17 @@ (define_insn "@ccmp"
 (match_operator 1 "comparison_operator"
  [(reg:CC FLAGS_REG) (const_int 0)])
(compare:CC
- (minus:SWI (match_operand:SWI 2 "nonimmediate_operand" "m,")
-(match_operand:SWI 3 "" ","))
+ (minus:SWI (match_operand:SWI 2 "nonimmediate_operand" ",m,")
+(match_operand:SWI 3 "" 
"C,,"))
  (const_int 0))
(unspec:SI
  [(match_operand:SI 4 "const_0_to_15_operand")]
  UNSPEC_APX_DFV)))]
  "TARGET_APX_CCMP"
- "ccmp%C1{}\t%G4 {%3, %2|%2, %3}"
+ "@
+  ctest%C1{}\t%G4 %2, %2
+  ccmp%C1{}\t%G4 {%3, %2|%2, %3}
+  ccmp%C1{}\t%G4 {%3, %2|%2, %3}"
  [(set_attr "type" "icmp")
   (set_attr "mode" "")
   (set_attr "length_immediate" "1")


Re: [PATCH] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-12 Thread Uros Bizjak
On Wed, Jun 12, 2024 at 5:12 AM Hongyu Wang  wrote:
>
> Hi,
>
> For CTEST, we don't have conditional AND so there's no optimization
> opportunity to write a new ctest pattern. Emit ctest when ccmp did
> comparison to const 0 to save bytes.
>
> Bootstrapped & regtested under x86-64-pc-linux-gnu.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386.md (@ccmp): Use ctestcc when
> operands[3] is const0_rtx.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/apx-ccmp-1.c: Adjust output to scan ctest.
> * gcc.target/i386/apx-ccmp-2.c: Adjust some condition to
> compare with 0.
> ---
>  gcc/config/i386/i386.md|  6 +-
>  gcc/testsuite/gcc.target/i386/apx-ccmp-1.c | 10 ++
>  gcc/testsuite/gcc.target/i386/apx-ccmp-2.c |  4 ++--
>  3 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index a64f2ad4f5f..014d48cddd6 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -1522,7 +1522,11 @@ (define_insn "@ccmp"
>   [(match_operand:SI 4 "const_0_to_15_operand")]
>   UNSPEC_APX_DFV)))]
>   "TARGET_APX_CCMP"
> - "ccmp%C1{}\t%G4 {%3, %2|%2, %3}"
> + {
> +   if (operands[3] == const0_rtx && !MEM_P (operands[2]))
> + return "ctest%C1{}\t%G4 %2, %2";
> +   return "ccmp%C1{}\t%G4 {%3, %2|%2, %3}";
> + }

This could be implemented as an alternative using "r,C" constraint as
the first constraint for operands[2,3]. Then the register allocator
will match the constraints for you.

Uros.

>   [(set_attr "type" "icmp")
>(set_attr "mode" "")
>(set_attr "length_immediate" "1")
> diff --git a/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c 
> b/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c
> index e4e112f07e0..a8b70576760 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c
> @@ -96,9 +96,11 @@ f15 (double a, double b, int c, int d)
>
>  /* { dg-final { scan-assembler-times "ccmpg" 2 } } */
>  /* { dg-final { scan-assembler-times "ccmple" 2 } } */
> -/* { dg-final { scan-assembler-times "ccmpne" 4 } } */
> -/* { dg-final { scan-assembler-times "ccmpe" 3 } } */
> +/* { dg-final { scan-assembler-times "ccmpne" 2 } } */
> +/* { dg-final { scan-assembler-times "ccmpe" 1 } } */
>  /* { dg-final { scan-assembler-times "ccmpbe" 1 } } */
> +/* { dg-final { scan-assembler-times "ctestne" 2 } } */
> +/* { dg-final { scan-assembler-times "cteste" 2 } } */
>  /* { dg-final { scan-assembler-times "ccmpa" 1 } } */
> -/* { dg-final { scan-assembler-times "ccmpbl" 2 } } */
> -
> +/* { dg-final { scan-assembler-times "ccmpbl" 1 } } */
> +/* { dg-final { scan-assembler-times "ctestbl" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c 
> b/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c
> index 0123a686d2c..4a0784394c3 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c
> @@ -12,7 +12,7 @@ int foo_apx(int a, int b, int c, int d)
>c += d;
>a += b;
>sum += a + c;
> -  if (b != d && sum < c || sum > d)
> +  if (b > d && sum != 0 || sum > d)
> {
>   b += d;
>   sum += b;
> @@ -32,7 +32,7 @@ int foo_noapx(int a, int b, int c, int d)
>c += d;
>a += b;
>sum += a + c;
> -  if (b != d && sum < c || sum > d)
> +  if (b > d && sum != 0 || sum > d)
> {
>   b += d;
>   sum += b;
> --
> 2.31.1
>
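The compare-against-zero case that becomes ctest can also be seen from the source side. A hand-written sketch follows; the function name is illustrative and the flags are assumptions (e.g. -O3 with APX CCMP enabled), so this is not code from the testsuite:

```c
/* Sketch of the source pattern the ccmp/ctest change targets: in a
   chain of conditions, the second compare is against constant 0.
   Under APX CCMP the conditional compare against 0 can be emitted as
   "ctest reg, reg", which is shorter than "ccmp" with an immediate.  */

int
cond_chain (int a, int b, int sum)
{
  if (b > a && sum != 0)	/* sum != 0: candidate for ctest */
    return sum + b;
  return sum;
}
```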


Re: [PATCH] rust: Do not link with libdl and libpthread unconditionally

2024-06-12 Thread Uros Bizjak
On Tue, Jun 11, 2024 at 11:21 AM Arthur Cohen  wrote:
>
> Thanks Richi!
>
> Tested again and pushed on trunk.


This patch introduced a couple of errors during ./configure:

checking for library containing dlopen... none required
checking for library containing pthread_create... none required
/git/gcc/configure: line 8997: test: too many arguments
/git/gcc/configure: line 8999: test: too many arguments
/git/gcc/configure: line 9003: test: too many arguments
/git/gcc/configure: line 9005: test: =: unary operator expected

You have to wrap arguments of the test with double quotes.

Uros.

> Best,
>
> Arthur
>
> On 5/31/24 15:02, Richard Biener wrote:
> > On Fri, May 31, 2024 at 12:24 PM Arthur Cohen  
> > wrote:
> >>
> >> Hi Richard,
> >>
> >> On 4/30/24 09:55, Richard Biener wrote:
> >>> On Fri, Apr 19, 2024 at 11:49 AM Arthur Cohen  
> >>> wrote:
> 
>  Hi everyone,
> 
>  This patch checks for the presence of dlopen and pthread_create in libc. 
>  If that is not the
>  case, we check for the existence of -ldl and -lpthread, as these 
>  libraries are required to
>  link the Rust runtime to our Rust frontend.
> 
>  If these libs are not present on the system, then we disable the Rust 
>  frontend.
> 
>  This was tested on x86_64, in an environment with a recent GLIBC and in 
>  a container with GLIBC
>  2.27.
> 
>  Apologies for sending it in so late.
> >>>
> >>> For example GCC_ENABLE_PLUGINS simply does
> >>>
> >>># Check -ldl
> >>>saved_LIBS="$LIBS"
> >>>AC_SEARCH_LIBS([dlopen], [dl])
> >>>if test x"$ac_cv_search_dlopen" = x"-ldl"; then
> >>>  pluginlibs="$pluginlibs -ldl"
> >>>fi
> >>>LIBS="$saved_LIBS"
> >>>
> >>> which I guess would also work for pthread_create?  This would simplify
> >>> the code a bit.
> >>
> >> Thanks a lot for the review. I've updated the patch's content in
> >> configure.ac per your suggestion. Tested similarly on x86_64 and in a
> >> container with libc 2.27
> >
> > LGTM.
> >
> > Thanks,
> > Richard.
> >
> >>   From 00669b600a75743523c358ee41ab999b6e9fa0f6 Mon Sep 17 00:00:00 2001
> >> From: Arthur Cohen 
> >> Date: Fri, 12 Apr 2024 13:52:18 +0200
> >> Subject: [PATCH] rust: Do not link with libdl and libpthread 
> >> unconditionally
> >>
> >> ChangeLog:
> >>
> >>  * Makefile.tpl: Add CRAB1_LIBS variable.
> >>  * Makefile.in: Regenerate.
> >>  * configure: Regenerate.
> >>  * configure.ac: Check if -ldl and -lpthread are needed, and if 
> >> so, add
> >>  them to CRAB1_LIBS.
> >>
> >> gcc/rust/ChangeLog:
> >>
> >>  * Make-lang.in: Remove overzealous LIBS = -ldl -lpthread line, 
> >> link
> >>  crab1 against CRAB1_LIBS.
> >> ---
> >>Makefile.in   |   3 +
> >>Makefile.tpl  |   3 +
> >>configure | 154 ++
> >>configure.ac  |  41 +++
> >>gcc/rust/Make-lang.in |   6 +-
> >>5 files changed, 203 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/Makefile.in b/Makefile.in
> >> index edb0c8a9a42..1753fb6b862 100644
> >> --- a/Makefile.in
> >> +++ b/Makefile.in
> >> @@ -197,6 +197,7 @@ HOST_EXPORTS = \
> >>  $(BASE_EXPORTS) \
> >>  CC="$(CC)"; export CC; \
> >>  ADA_CFLAGS="$(ADA_CFLAGS)"; export ADA_CFLAGS; \
> >> +   CRAB1_LIBS="$(CRAB1_LIBS)"; export CRAB1_LIBS; \
> >>  CFLAGS="$(CFLAGS)"; export CFLAGS; \
> >>  CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
> >>  CXX="$(CXX)"; export CXX; \
> >> @@ -450,6 +451,8 @@ GOCFLAGS = $(CFLAGS)
> >>GDCFLAGS = @GDCFLAGS@
> >>GM2FLAGS = $(CFLAGS)
> >>
> >> +CRAB1_LIBS = @CRAB1_LIBS@
> >> +
> >>PKG_CONFIG_PATH = @PKG_CONFIG_PATH@
> >>
> >>GUILE = guile
> >> diff --git a/Makefile.tpl b/Makefile.tpl
> >> index adbcbdd1d57..4aeaad3c1a5 100644
> >> --- a/Makefile.tpl
> >> +++ b/Makefile.tpl
> >> @@ -200,6 +200,7 @@ HOST_EXPORTS = \
> >>  $(BASE_EXPORTS) \
> >>  CC="$(CC)"; export CC; \
> >>  ADA_CFLAGS="$(ADA_CFLAGS)"; export ADA_CFLAGS; \
> >> +   CRAB1_LIBS="$(CRAB1_LIBS)"; export CRAB1_LIBS; \
> >>  CFLAGS="$(CFLAGS)"; export CFLAGS; \
> >>  CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
> >>  CXX="$(CXX)"; export CXX; \
> >> @@ -453,6 +454,8 @@ GOCFLAGS = $(CFLAGS)
> >>GDCFLAGS = @GDCFLAGS@
> >>GM2FLAGS = $(CFLAGS)
> >>
> >> +CRAB1_LIBS = @CRAB1_LIBS@
> >> +
> >>PKG_CONFIG_PATH = @PKG_CONFIG_PATH@
> >>
> >>GUILE = guile
> >> diff --git a/configure b/configure
> >> index 02b435c1163..a9ea5258f0f 100755
> >> --- a/configure
> >> +++ b/configure
> >> @@ -690,6 +690,7 @@ extra_host_zlib_configure_flags
> >>extra_host_libiberty_configure_flags
> >>stage1_languages
> >>host_libs_picflag
> >> +CRAB1_LIBS
> >>PICFLAG
> >>host_shared
> >>gcc_host_pie
> >> @@ -8826,6 +8827,139 @@ fi
> >>
> >>
> >>
> >> +# Rust 

[committed] i386: Use CMOV in .SAT_{ADD|SUB} expansion for TARGET_CMOV [PR112600]

2024-06-11 Thread Uros Bizjak
For TARGET_CMOV targets emit insn sequence involving conditional move.

.SAT_ADD:

addl%esi, %edi
movl$-1, %eax
cmovnc  %edi, %eax
ret

.SAT_SUB:

subl%esi, %edi
movl$0, %eax
cmovnc  %edi, %eax
ret

PR target/112600

gcc/ChangeLog:

* config/i386/i386.md (usadd3): Emit insn sequence
involving conditional move for TARGET_CMOVE targets.
(ussub3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-a.c: Also scan for cmov.
* gcc.target/i386/pr112600-b.c: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d69bc8d6e48..a64f2ad4f5f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9885,13 +9885,35 @@ (define_expand "usadd3"
   ""
 {
   rtx res = gen_reg_rtx (mode);
-  rtx msk = gen_reg_rtx (mode);
   rtx dst;
 
   emit_insn (gen_add3_cc_overflow_1 (res, operands[1], operands[2]));
-  emit_insn (gen_x86_movcc_0_m1_neg (msk));
-  dst = expand_simple_binop (mode, IOR, res, msk,
-operands[0], 1, OPTAB_WIDEN);
+
+  if (TARGET_CMOVE)
+{
+  rtx cmp = gen_rtx_GEU (VOIDmode, gen_rtx_REG (CCCmode, FLAGS_REG),
+const0_rtx);
+
+  if ( < GET_MODE_SIZE (SImode))
+   {
+ dst = force_reg (mode, operands[0]);
+ emit_insn (gen_movsicc (gen_lowpart (SImode, dst), cmp,
+ gen_lowpart (SImode, res), constm1_rtx));
+   }
+   else
+   {
+ dst = operands[0];
+ emit_insn (gen_movcc (dst, cmp, res, constm1_rtx));
+   }
+}
+  else
+{
+  rtx msk = gen_reg_rtx (mode);
+
+  emit_insn (gen_x86_movcc_0_m1_neg (msk));
+  dst = expand_simple_binop (mode, IOR, res, msk,
+operands[0], 1, OPTAB_WIDEN);
+}
 
   if (!rtx_equal_p (dst, operands[0]))
 emit_move_insn (operands[0], dst);
@@ -9905,14 +9927,36 @@ (define_expand "ussub3"
   ""
 {
   rtx res = gen_reg_rtx (mode);
-  rtx msk = gen_reg_rtx (mode);
   rtx dst;
 
   emit_insn (gen_sub_3 (res, operands[1], operands[2]));
-  emit_insn (gen_x86_movcc_0_m1_neg (msk));
-  msk = expand_simple_unop (mode, NOT, msk, NULL, 1);
-  dst = expand_simple_binop (mode, AND, res, msk,
-operands[0], 1, OPTAB_WIDEN);
+
+  if (TARGET_CMOVE)
+{
+  rtx cmp = gen_rtx_GEU (VOIDmode, gen_rtx_REG (CCCmode, FLAGS_REG),
+const0_rtx);
+
+  if ( < GET_MODE_SIZE (SImode))
+   {
+ dst = force_reg (mode, operands[0]);
+ emit_insn (gen_movsicc (gen_lowpart (SImode, dst), cmp,
+ gen_lowpart (SImode, res), const0_rtx));
+   }
+   else
+   {
+ dst = operands[0];
+ emit_insn (gen_movcc (dst, cmp, res, const0_rtx));
+   }
+}
+  else
+{
+  rtx msk = gen_reg_rtx (mode);
+
+  emit_insn (gen_x86_movcc_0_m1_neg (msk));
+  msk = expand_simple_unop (mode, NOT, msk, NULL, 1);
+  dst = expand_simple_binop (mode, AND, res, msk,
+operands[0], 1, OPTAB_WIDEN);
+}
 
   if (!rtx_equal_p (dst, operands[0]))
 emit_move_insn (operands[0], dst);
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-a.c 
b/gcc/testsuite/gcc.target/i386/pr112600-a.c
index fa122bc7a3f..2b084860451 100644
--- a/gcc/testsuite/gcc.target/i386/pr112600-a.c
+++ b/gcc/testsuite/gcc.target/i386/pr112600-a.c
@@ -1,7 +1,7 @@
 /* PR target/112600 */
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
-/* { dg-final { scan-assembler-times "sbb" 4 } } */
+/* { dg-final { scan-assembler-times "sbb|cmov" 4 } } */
 
 unsigned char
 add_sat_char (unsigned char x, unsigned char y)
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-b.c b/gcc/testsuite/gcc.target/i386/pr112600-b.c
index ea14bb9738b..ac4e26423b6 100644
--- a/gcc/testsuite/gcc.target/i386/pr112600-b.c
+++ b/gcc/testsuite/gcc.target/i386/pr112600-b.c
@@ -1,7 +1,7 @@
 /* PR target/112600 */
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
-/* { dg-final { scan-assembler-times "sbb" 4 } } */
+/* { dg-final { scan-assembler-times "sbb|cmov" 4 } } */
 
 unsigned char
 sub_sat_char (unsigned char x, unsigned char y)


[gcc r15-1183] i386: Use CMOV in .SAT_{ADD|SUB} expansion for TARGET_CMOV [PR112600]

2024-06-11 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:05b95238be648c9cf8af2516930af6a7b637a2b8

commit r15-1183-g05b95238be648c9cf8af2516930af6a7b637a2b8
Author: Uros Bizjak 
Date:   Tue Jun 11 16:00:31 2024 +0200

i386: Use CMOV in .SAT_{ADD|SUB} expansion for TARGET_CMOV [PR112600]

For TARGET_CMOV targets emit insn sequence involving conditional move.

.SAT_ADD:

        addl    %esi, %edi
        movl    $-1, %eax
        cmovnc  %edi, %eax
        ret

.SAT_SUB:

        subl    %esi, %edi
        movl    $0, %eax
        cmovnc  %edi, %eax
        ret
ret

PR target/112600

gcc/ChangeLog:

* config/i386/i386.md (usadd<mode>3): Emit insn sequence
involving conditional move for TARGET_CMOVE targets.
(ussub<mode>3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-a.c: Also scan for cmov.
* gcc.target/i386/pr112600-b.c: Ditto.

Diff:
---
 gcc/config/i386/i386.md| 62 +-
 gcc/testsuite/gcc.target/i386/pr112600-a.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr112600-b.c |  2 +-
 3 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d69bc8d6e482..a64f2ad4f5f0 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9885,13 +9885,35 @@
   ""
 {
   rtx res = gen_reg_rtx (<MODE>mode);
-  rtx msk = gen_reg_rtx (<MODE>mode);
   rtx dst;
 
   emit_insn (gen_add<mode>3_cc_overflow_1 (res, operands[1], operands[2]));
-  emit_insn (gen_x86_mov<mode>cc_0_m1_neg (msk));
-  dst = expand_simple_binop (<MODE>mode, IOR, res, msk,
-			     operands[0], 1, OPTAB_WIDEN);
+
+  if (TARGET_CMOVE)
+    {
+      rtx cmp = gen_rtx_GEU (VOIDmode, gen_rtx_REG (CCCmode, FLAGS_REG),
+			     const0_rtx);
+
+      if (<MODE_SIZE> < GET_MODE_SIZE (SImode))
+	{
+	  dst = force_reg (<MODE>mode, operands[0]);
+	  emit_insn (gen_movsicc (gen_lowpart (SImode, dst), cmp,
+				  gen_lowpart (SImode, res), constm1_rtx));
+	}
+      else
+	{
+	  dst = operands[0];
+	  emit_insn (gen_mov<mode>cc (dst, cmp, res, constm1_rtx));
+	}
+    }
+  else
+    {
+      rtx msk = gen_reg_rtx (<MODE>mode);
+
+      emit_insn (gen_x86_mov<mode>cc_0_m1_neg (msk));
+      dst = expand_simple_binop (<MODE>mode, IOR, res, msk,
+				 operands[0], 1, OPTAB_WIDEN);
+    }
 
   if (!rtx_equal_p (dst, operands[0]))
 emit_move_insn (operands[0], dst);
@@ -9905,14 +9927,36 @@
   ""
 {
   rtx res = gen_reg_rtx (<MODE>mode);
-  rtx msk = gen_reg_rtx (<MODE>mode);
   rtx dst;
 
   emit_insn (gen_sub<mode>_3 (res, operands[1], operands[2]));
-  emit_insn (gen_x86_mov<mode>cc_0_m1_neg (msk));
-  msk = expand_simple_unop (<MODE>mode, NOT, msk, NULL, 1);
-  dst = expand_simple_binop (<MODE>mode, AND, res, msk,
-			     operands[0], 1, OPTAB_WIDEN);
+
+  if (TARGET_CMOVE)
+    {
+      rtx cmp = gen_rtx_GEU (VOIDmode, gen_rtx_REG (CCCmode, FLAGS_REG),
+			     const0_rtx);
+
+      if (<MODE_SIZE> < GET_MODE_SIZE (SImode))
+	{
+	  dst = force_reg (<MODE>mode, operands[0]);
+	  emit_insn (gen_movsicc (gen_lowpart (SImode, dst), cmp,
+				  gen_lowpart (SImode, res), const0_rtx));
+	}
+      else
+	{
+	  dst = operands[0];
+	  emit_insn (gen_mov<mode>cc (dst, cmp, res, const0_rtx));
+	}
+    }
+  else
+    {
+      rtx msk = gen_reg_rtx (<MODE>mode);
+
+      emit_insn (gen_x86_mov<mode>cc_0_m1_neg (msk));
+      msk = expand_simple_unop (<MODE>mode, NOT, msk, NULL, 1);
+      dst = expand_simple_binop (<MODE>mode, AND, res, msk,
+				 operands[0], 1, OPTAB_WIDEN);
+    }
 
   if (!rtx_equal_p (dst, operands[0]))
 emit_move_insn (operands[0], dst);
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-a.c b/gcc/testsuite/gcc.target/i386/pr112600-a.c
index fa122bc7a3fd..2b0848604512 100644
--- a/gcc/testsuite/gcc.target/i386/pr112600-a.c
+++ b/gcc/testsuite/gcc.target/i386/pr112600-a.c
@@ -1,7 +1,7 @@
 /* PR target/112600 */
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
-/* { dg-final { scan-assembler-times "sbb" 4 } } */
+/* { dg-final { scan-assembler-times "sbb|cmov" 4 } } */
 
 unsigned char
 add_sat_char (unsigned char x, unsigned char y)
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-b.c b/gcc/testsuite/gcc.target/i386/pr112600-b.c
index ea14bb9738b7..ac4e26423b6f 100644
--- a/gcc/testsuite/gcc.target/i386/pr112600-b.c
+++ b/gcc/testsuite/gcc.target/i386/pr112600-b.c
@@ -1,7 +1,7 @@
 /* PR target/112600 */
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
-/* { dg-final { scan-assembler-times "sbb" 4 } } */
+/* { dg-final { scan-assembler-times "sbb|cmov" 4 } } */
 
 unsigned char
 sub_sat_char (unsigned char x, unsigned char y)


[committed] i386: Implement .SAT_SUB for unsigned scalar integers [PR112600]

2024-06-09 Thread Uros Bizjak
The following testcase:

unsigned
sub_sat (unsigned x, unsigned y)
{
  unsigned res;
  res = x - y;
  res &= -(x >= y);
  return res;
}

currently compiles (-O2) to:

sub_sat:
        movl    %edi, %edx
        xorl    %eax, %eax
        subl    %esi, %edx
        cmpl    %esi, %edi
        setnb   %al
        negl    %eax
        andl    %edx, %eax
        ret

We can expand through ussub{m}3 optab to use carry flag from the subtraction
and generate code using SBB instruction implementing:

unsigned res = x - y;
res &= ~(-(x < y));

sub_sat:
        subl    %esi, %edi
        sbbl    %eax, %eax
        notl    %eax
        andl    %edi, %eax
        ret

PR target/112600

gcc/ChangeLog:

* config/i386/i386.md (ussub<mode>3): New expander.
(sub<mode>_3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-b.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index bc2ef819df6..d69bc8d6e48 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8436,6 +8436,14 @@ (define_expand "usubv<mode>4"
   "ix86_fixup_binary_operands_no_copy (MINUS, <MODE>mode, operands,
				       TARGET_APX_NDD);")
 
+(define_expand "sub<mode>_3"
+  [(parallel [(set (reg:CC FLAGS_REG)
+		   (compare:CC
+		     (match_operand:SWI 1 "nonimmediate_operand")
+		     (match_operand:SWI 2 "<general_operand>")))
+	      (set (match_operand:SWI 0 "register_operand")
+		   (minus:SWI (match_dup 1) (match_dup 2)))])])
+
 (define_insn "*sub<mode>_3"
   [(set (reg FLAGS_REG)
	(compare (match_operand:SWI 1 "nonimmediate_operand" "0,0,rm,r")
@@ -9883,7 +9891,28 @@ (define_expand "usadd<mode>3"
   emit_insn (gen_add<mode>3_cc_overflow_1 (res, operands[1], operands[2]));
   emit_insn (gen_x86_mov<mode>cc_0_m1_neg (msk));
   dst = expand_simple_binop (<MODE>mode, IOR, res, msk,
-			     operands[0], 1, OPTAB_DIRECT);
+			     operands[0], 1, OPTAB_WIDEN);
+
+  if (!rtx_equal_p (dst, operands[0]))
+    emit_move_insn (operands[0], dst);
+  DONE;
+})
+
+(define_expand "ussub<mode>3"
+  [(set (match_operand:SWI 0 "register_operand")
+	(us_minus:SWI (match_operand:SWI 1 "register_operand")
+		      (match_operand:SWI 2 "<general_operand>")))]
+  ""
+{
+  rtx res = gen_reg_rtx (<MODE>mode);
+  rtx msk = gen_reg_rtx (<MODE>mode);
+  rtx dst;
+
+  emit_insn (gen_sub<mode>_3 (res, operands[1], operands[2]));
+  emit_insn (gen_x86_mov<mode>cc_0_m1_neg (msk));
+  msk = expand_simple_unop (<MODE>mode, NOT, msk, NULL, 1);
+  dst = expand_simple_binop (<MODE>mode, AND, res, msk,
+			     operands[0], 1, OPTAB_WIDEN);
 
   if (!rtx_equal_p (dst, operands[0]))
 emit_move_insn (operands[0], dst);


[gcc r15-1122] i386: Implement .SAT_SUB for unsigned scalar integers [PR112600]

2024-06-09 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:8bb6b2f4ae19c3aab7d7a5e5c8f5965f89d90e01

commit r15-1122-g8bb6b2f4ae19c3aab7d7a5e5c8f5965f89d90e01
Author: Uros Bizjak 
Date:   Sun Jun 9 12:09:13 2024 +0200

i386: Implement .SAT_SUB for unsigned scalar integers [PR112600]

The following testcase:

unsigned
sub_sat (unsigned x, unsigned y)
{
  unsigned res;
  res = x - y;
  res &= -(x >= y);
  return res;
}

currently compiles (-O2) to:

sub_sat:
        movl    %edi, %edx
        xorl    %eax, %eax
        subl    %esi, %edx
        cmpl    %esi, %edi
        setnb   %al
        negl    %eax
        andl    %edx, %eax
        ret

We can expand through ussub{m}3 optab to use carry flag from the subtraction
and generate code using SBB instruction implementing:

unsigned res = x - y;
res &= ~(-(x < y));

sub_sat:
        subl    %esi, %edi
        sbbl    %eax, %eax
        notl    %eax
        andl    %edi, %eax
        ret

PR target/112600

gcc/ChangeLog:

* config/i386/i386.md (ussub<mode>3): New expander.
(sub<mode>_3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-b.c: New test.

Diff:
---
 gcc/config/i386/i386.md| 31 ++-
 gcc/testsuite/gcc.target/i386/pr112600-b.c | 40 ++
 2 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index bc2ef819df6..d69bc8d6e48 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8436,6 +8436,14 @@
   "ix86_fixup_binary_operands_no_copy (MINUS, <MODE>mode, operands,
				       TARGET_APX_NDD);")
 
+(define_expand "sub<mode>_3"
+  [(parallel [(set (reg:CC FLAGS_REG)
+		   (compare:CC
+		     (match_operand:SWI 1 "nonimmediate_operand")
+		     (match_operand:SWI 2 "<general_operand>")))
+	      (set (match_operand:SWI 0 "register_operand")
+		   (minus:SWI (match_dup 1) (match_dup 2)))])])
+
 (define_insn "*sub<mode>_3"
   [(set (reg FLAGS_REG)
	(compare (match_operand:SWI 1 "nonimmediate_operand" "0,0,rm,r")
@@ -9883,7 +9891,28 @@
   emit_insn (gen_add<mode>3_cc_overflow_1 (res, operands[1], operands[2]));
   emit_insn (gen_x86_mov<mode>cc_0_m1_neg (msk));
   dst = expand_simple_binop (<MODE>mode, IOR, res, msk,
-			     operands[0], 1, OPTAB_DIRECT);
+			     operands[0], 1, OPTAB_WIDEN);
+
+  if (!rtx_equal_p (dst, operands[0]))
+    emit_move_insn (operands[0], dst);
+  DONE;
+})
+
+(define_expand "ussub<mode>3"
+  [(set (match_operand:SWI 0 "register_operand")
+	(us_minus:SWI (match_operand:SWI 1 "register_operand")
+		      (match_operand:SWI 2 "<general_operand>")))]
+  ""
+{
+  rtx res = gen_reg_rtx (<MODE>mode);
+  rtx msk = gen_reg_rtx (<MODE>mode);
+  rtx dst;
+
+  emit_insn (gen_sub<mode>_3 (res, operands[1], operands[2]));
+  emit_insn (gen_x86_mov<mode>cc_0_m1_neg (msk));
+  msk = expand_simple_unop (<MODE>mode, NOT, msk, NULL, 1);
+  dst = expand_simple_binop (<MODE>mode, AND, res, msk,
+			     operands[0], 1, OPTAB_WIDEN);
 
   if (!rtx_equal_p (dst, operands[0]))
 emit_move_insn (operands[0], dst);
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-b.c b/gcc/testsuite/gcc.target/i386/pr112600-b.c
new file mode 100644
index 000..ea14bb9738b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112600-b.c
@@ -0,0 +1,40 @@
+/* PR target/112600 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times "sbb" 4 } } */
+
+unsigned char
+sub_sat_char (unsigned char x, unsigned char y)
+{
+  unsigned char res;
+  res = x - y;
+  res &= -(x >= y);
+  return res;
+}
+
+unsigned short
+sub_sat_short (unsigned short x, unsigned short y)
+{
+  unsigned short res;
+  res = x - y;
+  res &= -(x >= y);
+  return res;
+}
+
+unsigned int
+sub_sat_int (unsigned int x, unsigned int y)
+{
+  unsigned int res;
+  res = x - y;
+  res &= -(x >= y);
+  return res;
+}
+
+unsigned long
+sub_sat_long (unsigned long x, unsigned long y)
+{
+  unsigned long res;
+  res = x - y;
+  res &= -(x >= y);
+  return res;
+}


Re: [committed] i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]

2024-06-08 Thread Uros Bizjak
On Sat, Jun 8, 2024 at 2:09 PM Gerald Pfeifer  wrote:
>
> On Sat, 8 Jun 2024, Uros Bizjak wrote:
> > gcc/ChangeLog:
> >
> > * config/i386/i386.md (usadd<mode>3): New expander.
> > (x86_mov<mode>cc_0_m1_neg): Use SWI mode iterator.
>
> When you write "committed", did you actually push?

Yes, IIRC, the request was to mark pushed change with the word "committed".

> If so, us being on Git now it might be good to adjust terminology.

No problem, I can say "pushed" if that is more descriptive.

Thanks,
Uros.


[committed] i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]

2024-06-08 Thread Uros Bizjak
The following testcase:

unsigned
add_sat(unsigned x, unsigned y)
{
unsigned z;
return __builtin_add_overflow(x, y, &z) ? -1u : z;
}

currently compiles (-O2) to:

add_sat:
        addl    %esi, %edi
        jc      .L3
        movl    %edi, %eax
        ret
.L3:
        orl     $-1, %eax
        ret

We can expand through usadd{m}3 optab to use carry flag from the addition
and generate branchless code using SBB instruction implementing:

unsigned res = x + y;
res |= -(res < x);

add_sat:
        addl    %esi, %edi
        sbbl    %eax, %eax
        orl     %edi, %eax
        ret

PR target/112600

gcc/ChangeLog:

* config/i386/i386.md (usadd<mode>3): New expander.
(x86_mov<mode>cc_0_m1_neg): Use SWI mode iterator.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-a.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ffcf63e1cba..bc2ef819df6 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9870,6 +9870,26 @@ (define_insn_and_split "*sub<mode>3_ne_0"
     operands[1] = force_reg (<MODE>mode, operands[1]);
 })
 
+(define_expand "usadd<mode>3"
+  [(set (match_operand:SWI 0 "register_operand")
+	(us_plus:SWI (match_operand:SWI 1 "register_operand")
+		     (match_operand:SWI 2 "<general_operand>")))]
+  ""
+{
+  rtx res = gen_reg_rtx (<MODE>mode);
+  rtx msk = gen_reg_rtx (<MODE>mode);
+  rtx dst;
+
+  emit_insn (gen_add<mode>3_cc_overflow_1 (res, operands[1], operands[2]));
+  emit_insn (gen_x86_mov<mode>cc_0_m1_neg (msk));
+  dst = expand_simple_binop (<MODE>mode, IOR, res, msk,
+			     operands[0], 1, OPTAB_DIRECT);
+
+  if (!rtx_equal_p (dst, operands[0]))
+    emit_move_insn (operands[0], dst);
+  DONE;
+})
+
 ;; The patterns that match these are at the end of this file.
 
 (define_expand "<plusminus_insn>xf3"
@@ -24945,8 +24965,8 @@ (define_insn "*x86_mov<mode>cc_0_m1_neg"
 
 (define_expand "x86_mov<mode>cc_0_m1_neg"
   [(parallel
-    [(set (match_operand:SWI48 0 "register_operand")
-	  (neg:SWI48 (ltu:SWI48 (reg:CCC FLAGS_REG) (const_int 0))))
+    [(set (match_operand:SWI 0 "register_operand")
+	  (neg:SWI (ltu:SWI (reg:CCC FLAGS_REG) (const_int 0))))
      (clobber (reg:CC FLAGS_REG))])])
 
 (define_split
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-a.c b/gcc/testsuite/gcc.target/i386/pr112600-a.c
new file mode 100644
index 000..fa122bc7a3f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112600-a.c
@@ -0,0 +1,32 @@
+/* PR target/112600 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times "sbb" 4 } } */
+
+unsigned char
+add_sat_char (unsigned char x, unsigned char y)
+{
+  unsigned char z;
+  return __builtin_add_overflow(x, y, &z) ? -1u : z;
+}
+
+unsigned short
+add_sat_short (unsigned short x, unsigned short y)
+{
+  unsigned short z;
+  return __builtin_add_overflow(x, y, &z) ? -1u : z;
+}
+
+unsigned int
+add_sat_int (unsigned int x, unsigned int y)
+{
+  unsigned int z;
+  return __builtin_add_overflow(x, y, &z) ? -1u : z;
+}
+
+unsigned long
+add_sat_long (unsigned long x, unsigned long y)
+{
+  unsigned long z;
+  return __builtin_add_overflow(x, y, &z) ? -1ul : z;
+}


[gcc r15-1113] i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]

2024-06-08 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:de05e44b2ad9638d04173393b1eae3c38b2c3864

commit r15-1113-gde05e44b2ad9638d04173393b1eae3c38b2c3864
Author: Uros Bizjak 
Date:   Sat Jun 8 12:17:11 2024 +0200

i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]

The following testcase:

unsigned
add_sat(unsigned x, unsigned y)
{
unsigned z;
return __builtin_add_overflow(x, y, &z) ? -1u : z;
}

currently compiles (-O2) to:

add_sat:
        addl    %esi, %edi
        jc      .L3
        movl    %edi, %eax
        ret
.L3:
        orl     $-1, %eax
        ret

We can expand through usadd{m}3 optab to use carry flag from the addition
and generate branchless code using SBB instruction implementing:

unsigned res = x + y;
res |= -(res < x);

add_sat:
        addl    %esi, %edi
        sbbl    %eax, %eax
        orl     %edi, %eax
        ret

PR target/112600

gcc/ChangeLog:

* config/i386/i386.md (usadd<mode>3): New expander.
(x86_mov<mode>cc_0_m1_neg): Use SWI mode iterator.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-a.c: New test.

Diff:
---
 gcc/config/i386/i386.md| 24 --
 gcc/testsuite/gcc.target/i386/pr112600-a.c | 32 ++
 2 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ffcf63e1cba..bc2ef819df6 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9870,6 +9870,26 @@
     operands[1] = force_reg (<MODE>mode, operands[1]);
 })
 
+(define_expand "usadd<mode>3"
+  [(set (match_operand:SWI 0 "register_operand")
+	(us_plus:SWI (match_operand:SWI 1 "register_operand")
+		     (match_operand:SWI 2 "<general_operand>")))]
+  ""
+{
+  rtx res = gen_reg_rtx (<MODE>mode);
+  rtx msk = gen_reg_rtx (<MODE>mode);
+  rtx dst;
+
+  emit_insn (gen_add<mode>3_cc_overflow_1 (res, operands[1], operands[2]));
+  emit_insn (gen_x86_mov<mode>cc_0_m1_neg (msk));
+  dst = expand_simple_binop (<MODE>mode, IOR, res, msk,
+			     operands[0], 1, OPTAB_DIRECT);
+
+  if (!rtx_equal_p (dst, operands[0]))
+    emit_move_insn (operands[0], dst);
+  DONE;
+})
+
 ;; The patterns that match these are at the end of this file.
 
 (define_expand "<plusminus_insn>xf3"
@@ -24945,8 +24965,8 @@
 
 (define_expand "x86_mov<mode>cc_0_m1_neg"
   [(parallel
-    [(set (match_operand:SWI48 0 "register_operand")
-	  (neg:SWI48 (ltu:SWI48 (reg:CCC FLAGS_REG) (const_int 0))))
+    [(set (match_operand:SWI 0 "register_operand")
+	  (neg:SWI (ltu:SWI (reg:CCC FLAGS_REG) (const_int 0))))
      (clobber (reg:CC FLAGS_REG))])])
 
 (define_split
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-a.c b/gcc/testsuite/gcc.target/i386/pr112600-a.c
new file mode 100644
index 000..fa122bc7a3f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112600-a.c
@@ -0,0 +1,32 @@
+/* PR target/112600 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times "sbb" 4 } } */
+
+unsigned char
+add_sat_char (unsigned char x, unsigned char y)
+{
+  unsigned char z;
+  return __builtin_add_overflow(x, y, &z) ? -1u : z;
+}
+
+unsigned short
+add_sat_short (unsigned short x, unsigned short y)
+{
+  unsigned short z;
+  return __builtin_add_overflow(x, y, &z) ? -1u : z;
+}
+
+unsigned int
+add_sat_int (unsigned int x, unsigned int y)
+{
+  unsigned int z;
+  return __builtin_add_overflow(x, y, &z) ? -1u : z;
+}
+
+unsigned long
+add_sat_long (unsigned long x, unsigned long y)
+{
+  unsigned long z;
+  return __builtin_add_overflow(x, y, &z) ? -1ul : z;
+}


Re: [PATCH v2 2/6] Extract ix86 dllimport implementation to mingw

2024-06-07 Thread Uros Bizjak
On Fri, Jun 7, 2024 at 11:48 AM Evgeny Karpov
 wrote:
>
> This patch extracts the ix86 implementation for expanding a SYMBOL
> into its corresponding dllimport, far-address, or refptr symbol.
> It will be reused in the aarch64-w64-mingw32 target.
> The implementation is copied as is from i386/i386.cc with
> minor changes to follow to the code style.
>
> Also this patch replaces the original DLL import/export
> implementation in ix86 with mingw.
>
> gcc/ChangeLog:
>
> * config.gcc: Add winnt-dll.o, which contains the DLL
> import/export implementation.
> * config/i386/cygming.h (SUB_TARGET_RECORD_STUB): Remove the
> old implementation. Rename the required function to MinGW.
> Use MinGW implementation for COFF and nothing otherwise.
> (GOT_ALIAS_SET): Likewise.
> * config/i386/i386-expand.cc (ix86_expand_move): Likewise.
> * config/i386/i386-expand.h (ix86_GOT_alias_set): Likewise.
> (legitimize_pe_coff_symbol): Likewise.
> * config/i386/i386-protos.h (i386_pe_record_stub): Likewise.
> * config/i386/i386.cc (is_imported_p): Likewise.
> (legitimate_pic_address_disp_p): Likewise.
> (ix86_GOT_alias_set): Likewise.
> (legitimize_pic_address): Likewise.
> (legitimize_tls_address): Likewise.
> (struct dllimport_hasher): Likewise.
> (GTY): Likewise.
> (get_dllimport_decl): Likewise.
> (legitimize_pe_coff_extern_decl): Likewise.
> (legitimize_dllimport_symbol): Likewise.
> (legitimize_pe_coff_symbol): Likewise.
> (ix86_legitimize_address): Likewise.
> * config/i386/i386.h (GOT_ALIAS_SET): Likewise.
> * config/mingw/winnt.cc (i386_pe_record_stub): Likewise.
> (mingw_pe_record_stub): Likewise.
> * config/mingw/winnt.h (mingw_pe_record_stub): Likewise.
> * config/mingw/t-cygming: Add the winnt-dll.o compilation.
> * config/mingw/winnt-dll.cc: New file.
> * config/mingw/winnt-dll.h: New file.

LGTM for generic x86 changes.

Thanks,
Uros.

> ---
>  gcc/config.gcc |  12 +-
>  gcc/config/i386/cygming.h  |   5 +-
>  gcc/config/i386/i386-expand.cc |   4 +-
>  gcc/config/i386/i386-expand.h  |   2 -
>  gcc/config/i386/i386-protos.h  |   1 -
>  gcc/config/i386/i386.cc| 205 ++---
>  gcc/config/i386/i386.h |   2 +
>  gcc/config/mingw/t-cygming |   6 +
>  gcc/config/mingw/winnt-dll.cc  | 231 +
>  gcc/config/mingw/winnt-dll.h   |  30 +
>  gcc/config/mingw/winnt.cc  |   2 +-
>  gcc/config/mingw/winnt.h   |   1 +
>  12 files changed, 298 insertions(+), 203 deletions(-)
>  create mode 100644 gcc/config/mingw/winnt-dll.cc
>  create mode 100644 gcc/config/mingw/winnt-dll.h
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 553a310f4bd..d053b98efa8 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -2177,11 +2177,13 @@ i[4567]86-wrs-vxworks*|x86_64-wrs-vxworks7*)
>  i[34567]86-*-cygwin*)
> tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h i386/cygming.h 
> i386/cygwin.h i386/cygwin-stdint.h"
> tm_file="${tm_file} mingw/winnt.h"
> +   tm_file="${tm_file} mingw/winnt-dll.h"
> xm_file=i386/xm-cygwin.h
> tmake_file="${tmake_file} mingw/t-cygming t-slibgcc"
> target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
> +   target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt-dll.cc"
> extra_options="${extra_options} mingw/cygming.opt i386/cygwin.opt"
> -   extra_objs="${extra_objs} winnt.o winnt-stubs.o"
> +   extra_objs="${extra_objs} winnt.o winnt-stubs.o winnt-dll.o"
> c_target_objs="${c_target_objs} msformat-c.o"
> cxx_target_objs="${cxx_target_objs} winnt-cxx.o msformat-c.o"
> d_target_objs="${d_target_objs} cygwin-d.o"
> @@ -2196,11 +2198,13 @@ x86_64-*-cygwin*)
> need_64bit_isa=yes
> tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h i386/cygming.h 
> i386/cygwin.h i386/cygwin-w64.h i386/cygwin-stdint.h"
> tm_file="${tm_file} mingw/winnt.h"
> +   tm_file="${tm_file} mingw/winnt-dll.h"
> xm_file=i386/xm-cygwin.h
> tmake_file="${tmake_file} mingw/t-cygming t-slibgcc"
> target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
> +   target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt-dll.cc"
> extra_options="${extra_options} mingw/cygming.opt i386/cygwin.opt"
> -   extra_objs="${extra_objs} winnt.o winnt-stubs.o"
> +   extra_objs="${extra_objs} winnt.o winnt-stubs.o winnt-dll.o"
> c_target_objs="${c_target_objs} msformat-c.o"
> cxx_target_objs="${cxx_target_objs} winnt-cxx.o msformat-c.o"
> d_target_objs="${d_target_objs} cygwin-d.o"
> @@ -2266,6 +2270,7 @@ i[34567]86-*-mingw* | x86_64-*-mingw*)
> esac
> tm_file="${tm_file} mingw/mingw-stdint.h"
> 

Re: [x86 PATCH] PR target/115351: RTX costs for *concatditi3 and *insvti_highpart.

2024-06-07 Thread Uros Bizjak
On Fri, Jun 7, 2024 at 11:21 AM Roger Sayle  wrote:
>
>
> This patch addresses PR target/115351, which is a code quality regression
> on x86 when passing floating point complex numbers.  The ABI considers
> these arguments to have TImode, requiring interunit moves to place the
> FP values (which are actually passed in SSE registers) into the upper
> and lower parts of a TImode pseudo, and then similar moves back again
> before they can be used.
>
> The cause of the regression is that changes in how TImode initialization
> is represented in RTL now prevents the RTL optimizers from eliminating
> these redundant moves.  The specific cause is that the *concatditi3
> pattern, (zext(hi)<<64)|zext(lo), has an inappropriately high (default)
> rtx_cost, preventing fwprop1 from propagating it.  This pattern just
> sets the hipart and lopart of a double-word register, typically two
> instructions (less if reload can allocate things appropriately) but
> the current ix86_rtx_costs actually returns INSN_COSTS(13), i.e. 52.
>
> propagating insn 5 into insn 6, replacing:
> (set (reg:TI 110)
> (ior:TI (and:TI (reg:TI 110)
> (const_wide_int 0x0))
> (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ]) 0))
> (const_int 64 [0x40]
> successfully matched this instruction to *concatditi3_3:
> (set (reg:TI 110)
> (ior:TI (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ])
> 0))
> (const_int 64 [0x40]))
> (zero_extend:TI (subreg:DI (reg:DF 111 [ zD.2796 ]) 0
> change not profitable (cost 50 -> cost 52)
>
> This issue is resolved by having ix86_rtx_costs return more reasonable
> values for these (place-holder) patterns.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-06-07  Roger Sayle  
>
> gcc/ChangeLog
> PR target/115351
> * config/i386/i386.cc (ix86_rtx_costs): Provide estimates for the
> *concatditi3 and *insvti_highpart patterns, about two insns.
>
> gcc/testsuite/ChangeLog
> PR target/115351
> * g++.target/i386/pr115351.C: New test case.

LGTM.

Thanks,
Uros.

>
>
> Thanks in advance (and sorry for any inconvenience),
> Roger
> --
>


[committed] testsuite/i386: Add vector sat_sub testcases [PR112600]

2024-06-06 Thread Uros Bizjak
PR middle-end/112600

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-2a.c: New test.
* gcc.target/i386/pr112600-2b.c: New test.

Tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-2a.c b/gcc/testsuite/gcc.target/i386/pr112600-2a.c
new file mode 100644
index 000..4df38e5a720
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112600-2a.c
@@ -0,0 +1,15 @@
+/* PR middle-end/112600 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+
+typedef unsigned char T;
+
+void foo (T *out, T *x, T *y, int n)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+out[i] = (x[i] - y[i]) & (-(T)(x[i] >= y[i]));
+}
+
+/* { dg-final { scan-assembler "psubusb" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-2b.c b/gcc/testsuite/gcc.target/i386/pr112600-2b.c
new file mode 100644
index 000..0f6345de704
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112600-2b.c
@@ -0,0 +1,15 @@
+/* PR middle-end/112600 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+
+typedef unsigned short T;
+
+void foo (T *out, T *x, T *y, int n)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+out[i] = (x[i] - y[i]) & (-(T)(x[i] >= y[i]));
+}
+
+/* { dg-final { scan-assembler "psubusw" } } */


[gcc r15-1077] testsuite/i386: Add vector sat_sub testcases [PR112600]

2024-06-06 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:366d45c8d4911dc7874d2e64cf2583c0133b8dd5

commit r15-1077-g366d45c8d4911dc7874d2e64cf2583c0133b8dd5
Author: Uros Bizjak 
Date:   Thu Jun 6 19:18:41 2024 +0200

testsuite/i386: Add vector sat_sub testcases [PR112600]

PR middle-end/112600

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-2a.c: New test.
* gcc.target/i386/pr112600-2b.c: New test.

Diff:
---
 gcc/testsuite/gcc.target/i386/pr112600-2a.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr112600-2b.c | 15 +++
 2 files changed, 30 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/pr112600-2a.c b/gcc/testsuite/gcc.target/i386/pr112600-2a.c
new file mode 100644
index 000..4df38e5a720
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112600-2a.c
@@ -0,0 +1,15 @@
+/* PR middle-end/112600 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+
+typedef unsigned char T;
+
+void foo (T *out, T *x, T *y, int n)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+out[i] = (x[i] - y[i]) & (-(T)(x[i] >= y[i]));
+}
+
+/* { dg-final { scan-assembler "psubusb" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-2b.c b/gcc/testsuite/gcc.target/i386/pr112600-2b.c
new file mode 100644
index 000..0f6345de704
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112600-2b.c
@@ -0,0 +1,15 @@
+/* PR middle-end/112600 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+
+typedef unsigned short T;
+
+void foo (T *out, T *x, T *y, int n)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+out[i] = (x[i] - y[i]) & (-(T)(x[i] >= y[i]));
+}
+
+/* { dg-final { scan-assembler "psubusw" } } */


Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Uros Bizjak
On Wed, Jun 5, 2024 at 10:52 AM Li, Pan2  wrote:
>
> Thanks for explaining. I see, cmove is well designed for such cases.

If the question is if it is worth it to convert using
__builtin_sub_overflow here if the target doesn't provide scalar
saturating optab, I think the answer is yes. For x86, the compare will
be eliminated.

Please consider this testcase:

--cut here--
unsigned int
__attribute__((noinline))
foo (unsigned int x, unsigned int y)
{
  return x > y ? x - y : 0;
}

unsigned int
__attribute__((noinline))
bar (unsigned int x, unsigned int y)
{
  unsigned int z;

>   return __builtin_sub_overflow (x, y, &z) ? 0 : z;
}
--cut here--

This will compile to:

0000000000000000 <foo>:
   0:   89 f8                   mov    %edi,%eax
   2:   31 d2                   xor    %edx,%edx
   4:   29 f0                   sub    %esi,%eax
   6:   39 fe                   cmp    %edi,%esi
   8:   0f 43 c2                cmovae %edx,%eax
   b:   c3                      ret
   c:   0f 1f 40 00             nopl   0x0(%rax)

0000000000000010 <bar>:
  10:   29 f7                   sub    %esi,%edi
  12:   72 03                   jb     17 <bar+0x7>
  14:   89 f8                   mov    %edi,%eax
  16:   c3                      ret
  17:   31 c0                   xor    %eax,%eax
  19:   c3                      ret

Please note that the compare was eliminated in the later test. So, if
the target does not provide saturated optab but provides
__builtin_sub_overflow, I think it is worth emitting .SAT_SUB via
__builtin_sub_overflow (and in similar way for saturated add).

Uros.


>
> Pan
>
> -Original Message-
> From: Uros Bizjak 
> Sent: Wednesday, June 5, 2024 4:46 PM
> To: Li, Pan2 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org; 
> juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
> Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned 
> scalar int
>
> On Wed, Jun 5, 2024 at 10:38 AM Li, Pan2  wrote:
> >
> > > I see. x86 doesn't have scalar saturating instructions, so the scalar
> > > version indeed can't be converted.
> >
> > > I will amend x86 testcases after the vector part of your patch is 
> > > committed.
> >
> > Thanks for the confirmation. Just curious, the .SAT_SUB for scalar has 
> > sorts of forms, like a branch version as below.
> >
> > .SAT_SUB (x, y) = x > y ? x - y : 0. // or leverage __builtin_sub_overflow 
> > here
> >
> > It is reasonable to implement the scalar .SAT_SUB for x86? Given somehow we 
> > can eliminate the branch here.
>
> x86 will emit cmove in the above case:
>
>    movl    %edi, %eax
>    xorl    %edx, %edx
>    subl    %esi, %eax
>    cmpl    %edi, %esi
>    cmovnb  %edx, %eax
>
> Maybe we can reuse flags from the subtraction here to avoid the compare.
>
> Uros.


Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Uros Bizjak
On Wed, Jun 5, 2024 at 10:38 AM Li, Pan2  wrote:
>
> > I see. x86 doesn't have scalar saturating instructions, so the scalar
> > version indeed can't be converted.
>
> > I will amend x86 testcases after the vector part of your patch is committed.
>
> Thanks for the confirmation. Just curious, the .SAT_SUB for scalar has sorts 
> of forms, like a branch version as below.
>
> .SAT_SUB (x, y) = x > y ? x - y : 0. // or leverage __builtin_sub_overflow 
> here
>
> It is reasonable to implement the scalar .SAT_SUB for x86? Given somehow we 
> can eliminate the branch here.

x86 will emit cmove in the above case:

    movl    %edi, %eax
    xorl    %edx, %edx
    subl    %esi, %eax
    cmpl    %edi, %esi
    cmovnb  %edx, %eax

Maybe we can reuse flags from the subtraction here to avoid the compare.

Uros.


Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Uros Bizjak
On Wed, Jun 5, 2024 at 10:22 AM Li, Pan2  wrote:
>
> > Is the above testcase correct? You need "(x + y)" as the first term.
>
> Thanks for comments, should be copy issue here, you can take SAT_SUB (x, y) 
> => (x - y) & (-(TYPE)(x >= y)) or below template for reference.
>
> +#define DEF_SAT_U_SUB_FMT_1(T) \
> +T __attribute__((noinline))\
> +sat_u_sub_##T##_fmt_1 (T x, T y)   \
> +{  \
> +  return (x - y) & (-(T)(x >= y)); \
> +}
> +
> +#define DEF_SAT_U_SUB_FMT_2(T)\
> +T __attribute__((noinline))   \
> +sat_u_sub_##T##_fmt_2 (T x, T y)  \
> +{ \
> +  return (x - y) & (-(T)(x > y)); \
> +}
>
> > BTW: After applying your patch, I'm not able to produce .SAT_SUB with
> > x86_64 and the following testcase:
>
> You mean the vectorized part? This patch is only for unsigned scalar int
> (see the title); the link below is the vector part.
> Could you please double-check whether you see .SAT_SUB after the widen_mul
> pass on x86 for unsigned scalar int?
> Of course, I will also try it later; I am in the middle of something else.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653024.html

I see. x86 doesn't have scalar saturating instructions, so the scalar
version indeed can't be converted.

I will amend x86 testcases after the vector part of your patch is committed.

Thanks,
Uros.


Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Uros Bizjak
On Wed, Jun 5, 2024 at 9:38 AM Li, Pan2  wrote:
>
> Thanks Richard, will commit after the rebased version passes the regression test.
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, June 5, 2024 3:19 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> tamar.christ...@arm.com
> Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned 
> scalar int
>
> On Tue, May 28, 2024 at 10:29 AM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to add the middle-end presentation for the
> > saturation sub, i.e. set the result of the subtraction to the minimum
> > value on underflow.
> > It will take the pattern similar as below.
> >
> > SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));
> >
> > For example for uint8_t, we have
> >
> > * SAT_SUB (255, 0)   => 255
> > * SAT_SUB (1, 2) => 0
> > * SAT_SUB (254, 255) => 0
> > * SAT_SUB (0, 255)   => 0
> >
> > Given below SAT_SUB for uint64
> >
> > uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) & (- (uint64_t)((x >= y)));
> > }

Is the above testcase correct? You need "(x - y)" as the first term.

BTW: After applying your patch, I'm not able to produce .SAT_SUB with
x86_64 and the following testcase:

--cut here--
typedef unsigned short T;

void foo (T *out, T *x, T *y, int n)
{
  int i;

  for (i = 0; i < n; i++)
out[i] = (x[i] - y[i]) & (-(T)(x[i] >= y[i]));
}
--cut here--

with gcc -O2 -ftree-vectorize -msse2

I think that all relevant optabs were added for x86 in

https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=b59de4113262f2bee14147eb17eb3592f03d9556

as part of the commit for PR112600, comment 8.

Uros.

> >
> > Before this patch:
> > uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
> > {
> >   _Bool _1;
> >   long unsigned int _3;
> >   uint64_t _6;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _1 = x_4(D) >= y_5(D);
> >   _3 = x_4(D) - y_5(D);
> >   _6 = _1 ? _3 : 0;
> >   return _6;
> > ;;succ:   EXIT
> > }
> >
> > After this patch:
> > uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _6;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
> >   return _6;
> > ;;succ:   EXIT
> > }
> >
> > The below tests are running for this patch:
> > *. The riscv fully regression tests.
> > *. The x86 bootstrap tests.
> > *. The x86 fully regression tests.
>
> OK.
>
> Thanks,
> Richard.
>
> > PR target/51492
> > PR target/112600
> >
> > gcc/ChangeLog:
> >
> > * internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
> > * match.pd: Add new match for SAT_SUB.
> > * optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
> > * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add
> > new decl for generated in match.pd.
> > (build_saturation_binary_arith_call): Add new helper function
> > to build the gimple call to binary SAT alu.
> > (match_saturation_arith): Rename from.
> > (match_unsigned_saturation_add): Rename to.
> > (match_unsigned_saturation_sub): Add new func to match the
> > unsigned sat sub.
> > (math_opts_dom_walker::after_dom_children): Add SAT_SUB matching
> > try when COND_EXPR.
> >
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/internal-fn.def   |  1 +
> >  gcc/match.pd  | 14 
> >  gcc/optabs.def|  4 +--
> >  gcc/tree-ssa-math-opts.cc | 67 +++
> >  4 files changed, 64 insertions(+), 22 deletions(-)
> >
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 25badbb86e5..24539716e5b 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -276,6 +276,7 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
> >   smulhrs, umulhrs, binary)
> >
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, binary)
> > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_SUB, ECF_CONST, first, sssub, ussub, binary)
> >
> >  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
> >  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 024e3350465..3e334533ff8 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3086,6 +3086,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (match (unsigned_integer_sat_add @0 @1)
> >   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> >
> > +/* Unsigned saturation sub, case 1 (branch with gt):
> > +   SAT_U_SUB = X > Y ? X - Y : 0  */
> > +(match (unsigned_integer_sat_sub @0 @1)
> > + (cond (gt @0 @1) (minus @0 @1) integer_zerop)
> > + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> > +  && types_match (type, @0, @1
> > +
> > +/* Unsigned saturation sub, case 2 (branch with ge):
> > +   SAT_U_SUB = X >= Y ? X - Y : 0.  */
> > 

Re: [PATCH v1 0/6] Add DLL import/export implementation to AArch64

2024-06-05 Thread Uros Bizjak
On Tue, Jun 4, 2024 at 10:10 PM Evgeny Karpov
 wrote:
>
> Richard and Uros, could you please review the changes for v2?

LGTM for the generic x86 part, OS-specific part (cygming) should also
be reviewed by OS port maintainer (CC'd).

Thanks,
Uros.

> Additionally, we have detected an issue with GCC GC in winnt-dll.cc. The fix 
> will be included in v2.
>
> >> -ix86_handle_selectany_attribute (tree *node, tree name, tree, int,
> >> +mingw_handle_selectany_attribute (tree *node, tree name, tree, int,
> >>   bool *no_add_attrs)
>
> > please reindent the parameters for the new name length.
>
> Richard, could you please clarify how it should be done?
> Thanks!
>
> Regards,
> Evgeny
>
>
> ---
>  gcc/config/aarch64/cygming.h   |  6 +
>  gcc/config/i386/cygming.h  |  6 +
>  gcc/config/i386/i386-expand.cc |  6 +++--
>  gcc/config/i386/i386-expand.h  |  2 --
>  gcc/config/i386/i386.cc| 42 ++
>  gcc/config/i386/i386.h |  2 ++
>  gcc/config/mingw/winnt-dll.cc  |  8 ++-
>  gcc/config/mingw/winnt-dll.h   |  2 +-
>  8 files changed, 33 insertions(+), 41 deletions(-)
>
> diff --git a/gcc/config/aarch64/cygming.h b/gcc/config/aarch64/cygming.h
> index 4beebf9e093..0ff475754e0 100644
> --- a/gcc/config/aarch64/cygming.h
> +++ b/gcc/config/aarch64/cygming.h
> @@ -183,4 +183,10 @@ still needed for compilation.  */
>  #undef MAX_OFILE_ALIGNMENT
>  #define MAX_OFILE_ALIGNMENT (8192 * 8)
>
> +#define CMODEL_IS_NOT_LARGE_OR_MEDIUM_PIC 0
> +
> +#define HAVE_64BIT_POINTERS 1
> +
> +#define GOT_ALIAS_SET mingw_GOT_alias_set ()
> +
>  #endif
> diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
> index ee01e6bb6ce..cd240533dbc 100644
> --- a/gcc/config/i386/cygming.h
> +++ b/gcc/config/i386/cygming.h
> @@ -469,3 +469,9 @@ do {\
>  #ifndef HAVE_GAS_ALIGNED_COMM
>  # define HAVE_GAS_ALIGNED_COMM 0
>  #endif
> +
> +#define CMODEL_IS_NOT_LARGE_OR_MEDIUM_PIC ix86_cmodel != CM_LARGE_PIC && ix86_cmodel != CM_MEDIUM_PIC
> +
> +#define HAVE_64BIT_POINTERS TARGET_64BIT_DEFAULT
> +
> +#define GOT_ALIAS_SET mingw_GOT_alias_set ()
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index fb460e30d0a..267d0ba257b 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -408,11 +408,12 @@ ix86_expand_move (machine_mode mode, rtx operands[])
>  : UNSPEC_GOT));
>   op1 = gen_rtx_CONST (Pmode, op1);
>   op1 = gen_const_mem (Pmode, op1);
> - set_mem_alias_set (op1, ix86_GOT_alias_set ());
> + set_mem_alias_set (op1, GOT_ALIAS_SET);
> }
>else
> {
> - tmp = ix86_legitimize_pe_coff_symbol (op1, addend != NULL_RTX);
> +#if TARGET_PECOFF
> + tmp = legitimize_pe_coff_symbol (op1, addend != NULL_RTX);
>   if (tmp)
> {
>   op1 = tmp;
> @@ -424,6 +425,7 @@ ix86_expand_move (machine_mode mode, rtx operands[])
>   op1 = operands[1];
>   break;
> }
> +#endif
> }
>
>if (addend)
> diff --git a/gcc/config/i386/i386-expand.h b/gcc/config/i386/i386-expand.h
> index a8c20993954..5e02df1706d 100644
> --- a/gcc/config/i386/i386-expand.h
> +++ b/gcc/config/i386/i386-expand.h
> @@ -34,9 +34,7 @@ struct expand_vec_perm_d
>  };
>
>  rtx legitimize_tls_address (rtx x, enum tls_model model, bool for_mov);
> -alias_set_type ix86_GOT_alias_set (void);
>  rtx legitimize_pic_address (rtx orig, rtx reg);
> -rtx ix86_legitimize_pe_coff_symbol (rtx addr, bool inreg);
>
>  bool insn_defines_reg (unsigned int regno1, unsigned int regno2,
>rtx_insn *insn);
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 66845b30446..ee3a59ed498 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -11807,30 +11807,6 @@ constant_address_p (rtx x)
>  }
>
>
>
> -#if TARGET_PECOFF
> -rtx ix86_legitimize_pe_coff_symbol (rtx addr, bool inreg)
> -{
> -  return legitimize_pe_coff_symbol (addr, inreg);
> -}
> -
> -alias_set_type
> -ix86_GOT_alias_set (void)
> -{
> -  return mingw_GOT_alias_set ();
> -}
> -#else
> -rtx ix86_legitimize_pe_coff_symbol (rtx addr, bool inreg)
> -{
> -  return NULL_RTX;
> -}
> -
> -alias_set_type
> -ix86_GOT_alias_set (void)
> -{
> -  return -1;
> -}
> -#endif
> -
>  /* Return a legitimate reference for ORIG (an address) using the
> register REG.  If REG is 0, a new pseudo is generated.
>
> @@ -11867,9 +11843,11 @@ legitimize_pic_address (rtx orig, rtx reg)
>
>if (TARGET_64BIT && TARGET_DLLIMPORT_DECL_ATTRIBUTES)
>  {
> -  rtx tmp = ix86_legitimize_pe_coff_symbol (addr, true);
> +#if TARGET_PECOFF
> +  rtx tmp = legitimize_pe_coff_symbol (addr, true);
>if (tmp)
>  return tmp;
> +#endif
>  }
>
>if (TARGET_64BIT && legitimate_pic_address_disp_p (addr))
> @@ -11912,9 

[gcc r11-11463] alpha: Fix invalid RTX in divmodsi insn patterns [PR115297]

2024-06-03 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:835b913aff1b1a813df3b9d2bbef170ae7d8856d

commit r11-11463-g835b913aff1b1a813df3b9d2bbef170ae7d8856d
Author: Uros Bizjak 
Date:   Fri May 31 15:52:03 2024 +0200

alpha: Fix invalid RTX in divmodsi insn patterns [PR115297]

any_divmod instructions are modelled with invalid RTX:

  [(set (match_operand:DI 0 "register_operand" "=c")
(sign_extend:DI (match_operator:SI 3 "divmod_operator"
[(match_operand:DI 1 "register_operand" "a")
 (match_operand:DI 2 "register_operand" "b")])))
   (clobber (reg:DI 23))
   (clobber (reg:DI 28))]

where SImode divmod_operator (div,mod,udiv,umod) has DImode operands.

Wrap input operand with truncate:SI to make machine modes consistent.

PR target/115297

gcc/ChangeLog:

* config/alpha/alpha.md (si3): Wrap DImode
operands 3 and 4 with truncate:SI RTX.
(*divmodsi_internal_er): Ditto for operands 1 and 2.
(*divmodsi_internal_er_1): Ditto.
(*divmodsi_internal): Ditto.
* config/alpha/constraints.md ("b"): Correct register
number in the description.

gcc/testsuite/ChangeLog:

* gcc.target/alpha/pr115297.c: New test.

(cherry picked from commit 0ac802064c2a018cf166c37841697e867de65a95)

Diff:
---
 gcc/config/alpha/alpha.md | 21 -
 gcc/config/alpha/constraints.md   |  2 +-
 gcc/testsuite/gcc.target/alpha/pr115297.c | 13 +
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 98d09d43721..6ee8eb81df8 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -756,7 +756,8 @@
(sign_extend:DI (match_operand:SI 2 "nonimmediate_operand")))
(parallel [(set (match_dup 5)
   (sign_extend:DI
-   (any_divmod:SI (match_dup 3) (match_dup 4
+   (any_divmod:SI (truncate:SI (match_dup 3))
+  (truncate:SI (match_dup 4)
  (clobber (reg:DI 23))
  (clobber (reg:DI 28))])
(set (match_operand:SI 0 "nonimmediate_operand")
@@ -782,9 +783,10 @@
 
 (define_insn_and_split "*divmodsi_internal_er"
   [(set (match_operand:DI 0 "register_operand" "=c")
-   (sign_extend:DI (match_operator:SI 3 "divmod_operator"
-   [(match_operand:DI 1 "register_operand" "a")
-(match_operand:DI 2 "register_operand" "b")])))
+   (sign_extend:DI
+(match_operator:SI 3 "divmod_operator"
+ [(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+  (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(clobber (reg:DI 23))
(clobber (reg:DI 28))]
   "TARGET_EXPLICIT_RELOCS && TARGET_ABI_OSF"
@@ -826,8 +828,8 @@
 (define_insn "*divmodsi_internal_er_1"
   [(set (match_operand:DI 0 "register_operand" "=c")
(sign_extend:DI (match_operator:SI 3 "divmod_operator"
-[(match_operand:DI 1 "register_operand" "a")
- (match_operand:DI 2 "register_operand" "b")])))
+[(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+ (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(use (match_operand:DI 4 "register_operand" "c"))
(use (match_operand 5 "const_int_operand"))
(clobber (reg:DI 23))
@@ -839,9 +841,10 @@
 
 (define_insn "*divmodsi_internal"
   [(set (match_operand:DI 0 "register_operand" "=c")
-   (sign_extend:DI (match_operator:SI 3 "divmod_operator"
-   [(match_operand:DI 1 "register_operand" "a")
-(match_operand:DI 2 "register_operand" "b")])))
+   (sign_extend:DI
+(match_operator:SI 3 "divmod_operator"
+ [(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+  (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(clobber (reg:DI 23))
(clobber (reg:DI 28))]
   "TARGET_ABI_OSF"
diff --git a/gcc/config/alpha/constraints.md b/gcc/config/alpha/constraints.md
index e75a1489b4b..4b245233644 100644
--- a/gcc/config/alpha/constraints.md
+++ b/gcc/config/alpha/constraints.md
@@ -27,7 +27,7 @@
  "General register 24, input to division routine")
 
 (define_register_constraint 

[gcc r12-10486] alpha: Fix invalid RTX in divmodsi insn patterns [PR115297]

2024-06-03 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:c6c2a6cebabc5f78cef3d81cedb4b3b578478b9f

commit r12-10486-gc6c2a6cebabc5f78cef3d81cedb4b3b578478b9f
Author: Uros Bizjak 
Date:   Fri May 31 15:52:03 2024 +0200

alpha: Fix invalid RTX in divmodsi insn patterns [PR115297]

any_divmod instructions are modelled with invalid RTX:

  [(set (match_operand:DI 0 "register_operand" "=c")
(sign_extend:DI (match_operator:SI 3 "divmod_operator"
[(match_operand:DI 1 "register_operand" "a")
 (match_operand:DI 2 "register_operand" "b")])))
   (clobber (reg:DI 23))
   (clobber (reg:DI 28))]

where SImode divmod_operator (div,mod,udiv,umod) has DImode operands.

Wrap input operand with truncate:SI to make machine modes consistent.

PR target/115297

gcc/ChangeLog:

* config/alpha/alpha.md (si3): Wrap DImode
operands 3 and 4 with truncate:SI RTX.
(*divmodsi_internal_er): Ditto for operands 1 and 2.
(*divmodsi_internal_er_1): Ditto.
(*divmodsi_internal): Ditto.
* config/alpha/constraints.md ("b"): Correct register
number in the description.

gcc/testsuite/ChangeLog:

* gcc.target/alpha/pr115297.c: New test.

(cherry picked from commit 0ac802064c2a018cf166c37841697e867de65a95)

Diff:
---
 gcc/config/alpha/alpha.md | 21 -
 gcc/config/alpha/constraints.md   |  2 +-
 gcc/testsuite/gcc.target/alpha/pr115297.c | 13 +
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 87514330c22..442953fe50e 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -756,7 +756,8 @@
(sign_extend:DI (match_operand:SI 2 "nonimmediate_operand")))
(parallel [(set (match_dup 5)
   (sign_extend:DI
-   (any_divmod:SI (match_dup 3) (match_dup 4
+   (any_divmod:SI (truncate:SI (match_dup 3))
+  (truncate:SI (match_dup 4)
  (clobber (reg:DI 23))
  (clobber (reg:DI 28))])
(set (match_operand:SI 0 "nonimmediate_operand")
@@ -782,9 +783,10 @@
 
 (define_insn_and_split "*divmodsi_internal_er"
   [(set (match_operand:DI 0 "register_operand" "=c")
-   (sign_extend:DI (match_operator:SI 3 "divmod_operator"
-   [(match_operand:DI 1 "register_operand" "a")
-(match_operand:DI 2 "register_operand" "b")])))
+   (sign_extend:DI
+(match_operator:SI 3 "divmod_operator"
+ [(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+  (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(clobber (reg:DI 23))
(clobber (reg:DI 28))]
   "TARGET_EXPLICIT_RELOCS && TARGET_ABI_OSF"
@@ -826,8 +828,8 @@
 (define_insn "*divmodsi_internal_er_1"
   [(set (match_operand:DI 0 "register_operand" "=c")
(sign_extend:DI (match_operator:SI 3 "divmod_operator"
-[(match_operand:DI 1 "register_operand" "a")
- (match_operand:DI 2 "register_operand" "b")])))
+[(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+ (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(use (match_operand:DI 4 "register_operand" "c"))
(use (match_operand 5 "const_int_operand"))
(clobber (reg:DI 23))
@@ -839,9 +841,10 @@
 
 (define_insn "*divmodsi_internal"
   [(set (match_operand:DI 0 "register_operand" "=c")
-   (sign_extend:DI (match_operator:SI 3 "divmod_operator"
-   [(match_operand:DI 1 "register_operand" "a")
-(match_operand:DI 2 "register_operand" "b")])))
+   (sign_extend:DI
+(match_operator:SI 3 "divmod_operator"
+ [(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+  (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(clobber (reg:DI 23))
(clobber (reg:DI 28))]
   "TARGET_ABI_OSF"
diff --git a/gcc/config/alpha/constraints.md b/gcc/config/alpha/constraints.md
index a41b6471b9c..fd93525e36c 100644
--- a/gcc/config/alpha/constraints.md
+++ b/gcc/config/alpha/constraints.md
@@ -27,7 +27,7 @@
  "General register 24, input to division routine")
 
 (define_register_constraint 

[gcc r13-8820] alpha: Fix invalid RTX in divmodsi insn patterns [PR115297]

2024-06-03 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:ed06ca80bae174f1179222ff8e6b93464006e86a

commit r13-8820-ged06ca80bae174f1179222ff8e6b93464006e86a
Author: Uros Bizjak 
Date:   Fri May 31 15:52:03 2024 +0200

alpha: Fix invalid RTX in divmodsi insn patterns [PR115297]

any_divmod instructions are modelled with invalid RTX:

  [(set (match_operand:DI 0 "register_operand" "=c")
(sign_extend:DI (match_operator:SI 3 "divmod_operator"
[(match_operand:DI 1 "register_operand" "a")
 (match_operand:DI 2 "register_operand" "b")])))
   (clobber (reg:DI 23))
   (clobber (reg:DI 28))]

where SImode divmod_operator (div,mod,udiv,umod) has DImode operands.

Wrap input operand with truncate:SI to make machine modes consistent.

PR target/115297

gcc/ChangeLog:

* config/alpha/alpha.md (si3): Wrap DImode
operands 3 and 4 with truncate:SI RTX.
(*divmodsi_internal_er): Ditto for operands 1 and 2.
(*divmodsi_internal_er_1): Ditto.
(*divmodsi_internal): Ditto.
* config/alpha/constraints.md ("b"): Correct register
number in the description.

gcc/testsuite/ChangeLog:

* gcc.target/alpha/pr115297.c: New test.

(cherry picked from commit 0ac802064c2a018cf166c37841697e867de65a95)

Diff:
---
 gcc/config/alpha/alpha.md | 21 -
 gcc/config/alpha/constraints.md   |  2 +-
 gcc/testsuite/gcc.target/alpha/pr115297.c | 13 +
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index d91742496d0..17dfc4a5868 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -756,7 +756,8 @@
(sign_extend:DI (match_operand:SI 2 "nonimmediate_operand")))
(parallel [(set (match_dup 5)
   (sign_extend:DI
-   (any_divmod:SI (match_dup 3) (match_dup 4
+   (any_divmod:SI (truncate:SI (match_dup 3))
+  (truncate:SI (match_dup 4)
  (clobber (reg:DI 23))
  (clobber (reg:DI 28))])
(set (match_operand:SI 0 "nonimmediate_operand")
@@ -782,9 +783,10 @@
 
 (define_insn_and_split "*divmodsi_internal_er"
   [(set (match_operand:DI 0 "register_operand" "=c")
-   (sign_extend:DI (match_operator:SI 3 "divmod_operator"
-   [(match_operand:DI 1 "register_operand" "a")
-(match_operand:DI 2 "register_operand" "b")])))
+   (sign_extend:DI
+(match_operator:SI 3 "divmod_operator"
+ [(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+  (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(clobber (reg:DI 23))
(clobber (reg:DI 28))]
   "TARGET_EXPLICIT_RELOCS && TARGET_ABI_OSF"
@@ -826,8 +828,8 @@
 (define_insn "*divmodsi_internal_er_1"
   [(set (match_operand:DI 0 "register_operand" "=c")
(sign_extend:DI (match_operator:SI 3 "divmod_operator"
-[(match_operand:DI 1 "register_operand" "a")
- (match_operand:DI 2 "register_operand" "b")])))
+[(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+ (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(use (match_operand:DI 4 "register_operand" "c"))
(use (match_operand 5 "const_int_operand"))
(clobber (reg:DI 23))
@@ -839,9 +841,10 @@
 
 (define_insn "*divmodsi_internal"
   [(set (match_operand:DI 0 "register_operand" "=c")
-   (sign_extend:DI (match_operator:SI 3 "divmod_operator"
-   [(match_operand:DI 1 "register_operand" "a")
-(match_operand:DI 2 "register_operand" "b")])))
+   (sign_extend:DI
+(match_operator:SI 3 "divmod_operator"
+ [(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+  (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(clobber (reg:DI 23))
(clobber (reg:DI 28))]
   "TARGET_ABI_OSF"
diff --git a/gcc/config/alpha/constraints.md b/gcc/config/alpha/constraints.md
index ac3a5293732..2c0c276d491 100644
--- a/gcc/config/alpha/constraints.md
+++ b/gcc/config/alpha/constraints.md
@@ -27,7 +27,7 @@
  "General register 24, input to division routine")
 
 (define_register_constraint 

[committed] i386: Force operand 1 of bswapsi2 to a register for !TARGET_BSWAP [PR115321]

2024-06-03 Thread Uros Bizjak
PR target/115321

gcc/ChangeLog:

* config/i386/i386.md (bswapsi2): Force operand 1
to a register also for !TARGET_BSWAP.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115321.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2c95395b7be..ef83984d00e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -21193,18 +21193,19 @@ (define_expand "bswapsi2"
(bswap:SI (match_operand:SI 1 "nonimmediate_operand")))]
   ""
 {
-  if (TARGET_MOVBE)
-;
-  else if (TARGET_BSWAP)
-operands[1] = force_reg (SImode, operands[1]);
-  else
+  if (!TARGET_MOVBE)
 {
-  rtx x = gen_reg_rtx (SImode);
+  operands[1] = force_reg (SImode, operands[1]);
 
-  emit_insn (gen_bswaphisi2_lowpart (x, operands[1]));
-  emit_insn (gen_rotlsi3 (x, x, GEN_INT (16)));
-  emit_insn (gen_bswaphisi2_lowpart (operands[0], x));
-  DONE;
+  if (!TARGET_BSWAP)
+   {
+ rtx x = gen_reg_rtx (SImode);
+
+ emit_insn (gen_bswaphisi2_lowpart (x, operands[1]));
+ emit_insn (gen_rotlsi3 (x, x, GEN_INT (16)));
+ emit_insn (gen_bswaphisi2_lowpart (operands[0], x));
+ DONE;
+   }
 }
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/pr115321.c b/gcc/testsuite/gcc.target/i386/pr115321.c
new file mode 100644
index 000..0ddab9bd7a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr115321.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-march=i386" } */
+
+unsigned foo (unsigned x) { return __builtin_bswap32 (x); }


[gcc r15-993] i386: Force operand 1 of bswapsi2 to a register for !TARGET_BSWAP [PR115321]

2024-06-03 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:6ab5145825ca7e96fcbe3aa505d42e4ae8f81009

commit r15-993-g6ab5145825ca7e96fcbe3aa505d42e4ae8f81009
Author: Uros Bizjak 
Date:   Mon Jun 3 15:48:18 2024 +0200

i386: Force operand 1 of bswapsi2 to a register for !TARGET_BSWAP [PR115321]

PR target/115321

gcc/ChangeLog:

* config/i386/i386.md (bswapsi2): Force operand 1
to a register also for !TARGET_BSWAP.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115321.c: New test.

Diff:
---
 gcc/config/i386/i386.md  | 21 +++--
 gcc/testsuite/gcc.target/i386/pr115321.c |  4 
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2c95395b7be..ef83984d00e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -21193,18 +21193,19 @@
(bswap:SI (match_operand:SI 1 "nonimmediate_operand")))]
   ""
 {
-  if (TARGET_MOVBE)
-;
-  else if (TARGET_BSWAP)
-operands[1] = force_reg (SImode, operands[1]);
-  else
+  if (!TARGET_MOVBE)
 {
-  rtx x = gen_reg_rtx (SImode);
+  operands[1] = force_reg (SImode, operands[1]);
 
-  emit_insn (gen_bswaphisi2_lowpart (x, operands[1]));
-  emit_insn (gen_rotlsi3 (x, x, GEN_INT (16)));
-  emit_insn (gen_bswaphisi2_lowpart (operands[0], x));
-  DONE;
+  if (!TARGET_BSWAP)
+   {
+ rtx x = gen_reg_rtx (SImode);
+
+ emit_insn (gen_bswaphisi2_lowpart (x, operands[1]));
+ emit_insn (gen_rotlsi3 (x, x, GEN_INT (16)));
+ emit_insn (gen_bswaphisi2_lowpart (operands[0], x));
+ DONE;
+   }
 }
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/pr115321.c b/gcc/testsuite/gcc.target/i386/pr115321.c
new file mode 100644
index 000..0ddab9bd7a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr115321.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-march=i386" } */
+
+unsigned foo (unsigned x) { return __builtin_bswap32 (x); }


Re: [PATCH] [x86] Add some preference for floating point rtl ifcvt when sse4.1 is not available

2024-06-03 Thread Uros Bizjak
On Mon, Jun 3, 2024 at 5:11 AM liuhongt  wrote:
>
> W/o TARGET_SSE4_1, it takes 3 instructions (pand, pandn and por) for
> movdfcc/movsfcc, which could fail the cost comparison. Increasing the
> branch cost could hurt performance for other modes, so instead add some
> preference for floating point ifcvt.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_noce_conversion_profitable_p): Add
> some preference for floating point ifcvt when SSE4.1 is not
> available.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr115299.c: New test.
> * gcc.target/i386/pr86722.c: Adjust testcase.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc  | 17 +
>  gcc/testsuite/gcc.target/i386/pr115299.c | 10 ++
>  gcc/testsuite/gcc.target/i386/pr86722.c  |  2 +-
>  3 files changed, 28 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr115299.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 1a0206ab573..271da127a89 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -24879,6 +24879,23 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
> return false;
> }
>  }
> +
> +  /* W/o TARGET_SSE4_1, it takes 3 instructions (pand, pandn and por)
> + for movdfcc/movsfcc, and could possibly fail cost comparison.
> + Increase branch cost will hurt performance for other modes, so
> + specially add some preference for floating point ifcvt.  */
> +  if (!TARGET_SSE4_1 && if_info->x
> +  && GET_MODE_CLASS (GET_MODE (if_info->x)) == MODE_FLOAT
> +  && if_info->speed_p)
> +{
> +  unsigned cost = seq_cost (seq, true);
> +
> +  if (cost <= if_info->original_cost)
> +   return true;
> +
> +  return cost <= (if_info->max_seq_cost + COSTS_N_INSNS (2));
> +}
> +
>return default_noce_conversion_profitable_p (seq, if_info);
>  }
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr115299.c b/gcc/testsuite/gcc.target/i386/pr115299.c
> new file mode 100644
> index 000..53c5899136a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr115299.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mno-sse4.1 -msse2" } */
> +
> +void f(double*d,double*e){
> +  for(;d<e;d++)
> +*d=(*d<.5)?.7:0;
> +}
> +
> +/* { dg-final { scan-assembler {(?n)(?:cmpnltsd|cmpltsd)} } } */
> +/* { dg-final { scan-assembler {(?n)(?:andnpd|andpd)} } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr86722.c b/gcc/testsuite/gcc.target/i386/pr86722.c
> index 4de2ca1a6c0..e266a1e56c2 100644
> --- a/gcc/testsuite/gcc.target/i386/pr86722.c
> +++ b/gcc/testsuite/gcc.target/i386/pr86722.c
> @@ -6,5 +6,5 @@ void f(double*d,double*e){
>  *d=(*d<.5)?.7:0;
>  }
>
> -/* { dg-final { scan-assembler-not "andnpd" } } */
> +/* { dg-final { scan-assembler-times {(?n)(?:andnpd|andpd)} 1 } } */
>  /* { dg-final { scan-assembler-not "orpd" } } */
> --
> 2.31.1
>


Re: [PATCH 39/52] i386: New hook implementation ix86_c_mode_for_floating_type

2024-06-03 Thread Uros Bizjak
On Mon, Jun 3, 2024 at 5:02 AM Kewen Lin  wrote:
>
> This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
> defines in i386 port, and add new port specific hook
> implementation ix86_c_mode_for_floating_type.
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_c_mode_for_floating_type): New
> function.
> (TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
> * config/i386/i386.h (FLOAT_TYPE_SIZE): Remove.
> (DOUBLE_TYPE_SIZE): Likewise.
> (LONG_DOUBLE_TYPE_SIZE): Likewise.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc | 15 +++
>  gcc/config/i386/i386.h  |  4 
>  2 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 3e2a3a194f1..6abb6d7a1ca 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -25794,6 +25794,19 @@ ix86_bitint_type_info (int n, struct bitint_info *info)
>return true;
>  }
>
> +/* Implement TARGET_C_MODE_FOR_FLOATING_TYPE.  Return DFmode, TFmode
> +   or XFmode for TI_LONG_DOUBLE_TYPE which is for long double type,
> +   based on long double bits, go with the default one for the others.  */
> +
> +static machine_mode
> +ix86_c_mode_for_floating_type (enum tree_index ti)
> +{
> +  if (ti == TI_LONG_DOUBLE_TYPE)
> +return (TARGET_LONG_DOUBLE_64 ? DFmode
> + : (TARGET_LONG_DOUBLE_128 ? TFmode : 
> XFmode));
> +  return default_mode_for_floating_type (ti);
> +}
> +
>  /* Returns modified FUNCTION_TYPE for cdtor callabi.  */
>  tree
>  ix86_cxx_adjust_cdtor_callabi_fntype (tree fntype)
> @@ -26419,6 +26432,8 @@ static const scoped_attribute_specs *const 
> ix86_attribute_table[] =
>  #define TARGET_C_EXCESS_PRECISION ix86_get_excess_precision
>  #undef TARGET_C_BITINT_TYPE_INFO
>  #define TARGET_C_BITINT_TYPE_INFO ix86_bitint_type_info
> +#undef TARGET_C_MODE_FOR_FLOATING_TYPE
> +#define TARGET_C_MODE_FOR_FLOATING_TYPE ix86_c_mode_for_floating_type
>  #undef TARGET_CXX_ADJUST_CDTOR_CALLABI_FNTYPE
> #define TARGET_CXX_ADJUST_CDTOR_CALLABI_FNTYPE ix86_cxx_adjust_cdtor_callabi_fntype
>  #undef TARGET_PROMOTE_PROTOTYPES
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 359a8408263..fad434c10d6 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -675,10 +675,6 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>  #define LONG_TYPE_SIZE (TARGET_X32 ? 32 : BITS_PER_WORD)
>  #define POINTER_SIZE (TARGET_X32 ? 32 : BITS_PER_WORD)
>  #define LONG_LONG_TYPE_SIZE 64
> -#define FLOAT_TYPE_SIZE 32
> -#define DOUBLE_TYPE_SIZE 64
> -#define LONG_DOUBLE_TYPE_SIZE \
> -  (TARGET_LONG_DOUBLE_64 ? 64 : (TARGET_LONG_DOUBLE_128 ? 128 : 80))
>
>  #define WIDEST_HARDWARE_FP_SIZE 80
>
> --
> 2.43.0
>


[gcc r14-10264] alpha: Fix invalid RTX in divmodsi insn patterns [PR115297]

2024-05-31 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:ec92744de552303a1424085203e1311bd9146f21

commit r14-10264-gec92744de552303a1424085203e1311bd9146f21
Author: Uros Bizjak 
Date:   Fri May 31 15:52:03 2024 +0200

alpha: Fix invalid RTX in divmodsi insn patterns [PR115297]

any_divmod instructions are modelled with invalid RTX:

  [(set (match_operand:DI 0 "register_operand" "=c")
(sign_extend:DI (match_operator:SI 3 "divmod_operator"
[(match_operand:DI 1 "register_operand" "a")
 (match_operand:DI 2 "register_operand" "b")])))
   (clobber (reg:DI 23))
   (clobber (reg:DI 28))]

where SImode divmod_operator (div,mod,udiv,umod) has DImode operands.

Wrap input operand with truncate:SI to make machine modes consistent.

PR target/115297

gcc/ChangeLog:

* config/alpha/alpha.md (si3): Wrap DImode
operands 3 and 4 with truncate:SI RTX.
(*divmodsi_internal_er): Ditto for operands 1 and 2.
(*divmodsi_internal_er_1): Ditto.
(*divmodsi_internal): Ditto.
* config/alpha/constraints.md ("b"): Correct register
number in the description.

gcc/testsuite/ChangeLog:

* gcc.target/alpha/pr115297.c: New test.

(cherry picked from commit 0ac802064c2a018cf166c37841697e867de65a95)

Diff:
---
 gcc/config/alpha/alpha.md | 21 -
 gcc/config/alpha/constraints.md   |  2 +-
 gcc/testsuite/gcc.target/alpha/pr115297.c | 13 +
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 79f12c53c16..1e2de5a4d15 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -725,7 +725,8 @@
(sign_extend:DI (match_operand:SI 2 "nonimmediate_operand")))
(parallel [(set (match_dup 5)
   (sign_extend:DI
-   (any_divmod:SI (match_dup 3) (match_dup 4))))
+   (any_divmod:SI (truncate:SI (match_dup 3))
+  (truncate:SI (match_dup 4)))))
  (clobber (reg:DI 23))
  (clobber (reg:DI 28))])
(set (match_operand:SI 0 "nonimmediate_operand")
@@ -751,9 +752,10 @@
 
 (define_insn_and_split "*divmodsi_internal_er"
   [(set (match_operand:DI 0 "register_operand" "=c")
-   (sign_extend:DI (match_operator:SI 3 "divmod_operator"
-   [(match_operand:DI 1 "register_operand" "a")
-(match_operand:DI 2 "register_operand" "b")])))
+   (sign_extend:DI
+(match_operator:SI 3 "divmod_operator"
+ [(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+  (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(clobber (reg:DI 23))
(clobber (reg:DI 28))]
   "TARGET_EXPLICIT_RELOCS && TARGET_ABI_OSF"
@@ -795,8 +797,8 @@
 (define_insn "*divmodsi_internal_er_1"
   [(set (match_operand:DI 0 "register_operand" "=c")
(sign_extend:DI (match_operator:SI 3 "divmod_operator"
-[(match_operand:DI 1 "register_operand" "a")
- (match_operand:DI 2 "register_operand" "b")])))
+[(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+ (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(use (match_operand:DI 4 "register_operand" "c"))
(use (match_operand 5 "const_int_operand"))
(clobber (reg:DI 23))
@@ -808,9 +810,10 @@
 
 (define_insn "*divmodsi_internal"
   [(set (match_operand:DI 0 "register_operand" "=c")
-   (sign_extend:DI (match_operator:SI 3 "divmod_operator"
-   [(match_operand:DI 1 "register_operand" "a")
-(match_operand:DI 2 "register_operand" "b")])))
+   (sign_extend:DI
+(match_operator:SI 3 "divmod_operator"
+ [(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+  (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(clobber (reg:DI 23))
(clobber (reg:DI 28))]
   "TARGET_ABI_OSF"
diff --git a/gcc/config/alpha/constraints.md b/gcc/config/alpha/constraints.md
index 0d001ba26f1..4383f1fa895 100644
--- a/gcc/config/alpha/constraints.md
+++ b/gcc/config/alpha/constraints.md
@@ -27,7 +27,7 @@
  "General register 24, input to division routine")
 
(define_register_constraint "b" "R25_REG"

[committed] alpha: Fix invalid RTX in divmodsi insn patterns [PR115297]

2024-05-31 Thread Uros Bizjak
any_divmod instructions are modelled with invalid RTX:

  [(set (match_operand:DI 0 "register_operand" "=c")
(sign_extend:DI (match_operator:SI 3 "divmod_operator"
[(match_operand:DI 1 "register_operand" "a")
 (match_operand:DI 2 "register_operand" "b")])))
   (clobber (reg:DI 23))
   (clobber (reg:DI 28))]

where SImode divmod_operator (div,mod,udiv,umod) has DImode operands.

Wrap input operand with truncate:SI to make machine modes consistent.

PR target/115297

gcc/ChangeLog:

* config/alpha/alpha.md (si3): Wrap DImode
operands 3 and 4 with truncate:SI RTX.
(*divmodsi_internal_er): Ditto for operands 1 and 2.
(*divmodsi_internal_er_1): Ditto.
(*divmodsi_internal): Ditto.
* config/alpha/constraints.md ("b"): Correct register
number in the description.

gcc/testsuite/ChangeLog:

* gcc.target/alpha/pr115297.c: New test.

Tested by building an alpha-linux-gnu crosscompiler.

Uros.
diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 79f12c53c16..1e2de5a4d15 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -725,7 +725,8 @@ (define_expand "si3"
(sign_extend:DI (match_operand:SI 2 "nonimmediate_operand")))
(parallel [(set (match_dup 5)
   (sign_extend:DI
-   (any_divmod:SI (match_dup 3) (match_dup 4))))
+   (any_divmod:SI (truncate:SI (match_dup 3))
+  (truncate:SI (match_dup 4)))))
  (clobber (reg:DI 23))
  (clobber (reg:DI 28))])
(set (match_operand:SI 0 "nonimmediate_operand")
@@ -751,9 +752,10 @@ (define_expand "di3"
 
 (define_insn_and_split "*divmodsi_internal_er"
   [(set (match_operand:DI 0 "register_operand" "=c")
-   (sign_extend:DI (match_operator:SI 3 "divmod_operator"
-   [(match_operand:DI 1 "register_operand" "a")
-(match_operand:DI 2 "register_operand" "b")])))
+   (sign_extend:DI
+(match_operator:SI 3 "divmod_operator"
+ [(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+  (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(clobber (reg:DI 23))
(clobber (reg:DI 28))]
   "TARGET_EXPLICIT_RELOCS && TARGET_ABI_OSF"
@@ -795,8 +797,8 @@ (define_insn_and_split "*divmodsi_internal_er"
 (define_insn "*divmodsi_internal_er_1"
   [(set (match_operand:DI 0 "register_operand" "=c")
(sign_extend:DI (match_operator:SI 3 "divmod_operator"
-[(match_operand:DI 1 "register_operand" "a")
- (match_operand:DI 2 "register_operand" "b")])))
+[(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+ (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(use (match_operand:DI 4 "register_operand" "c"))
(use (match_operand 5 "const_int_operand"))
(clobber (reg:DI 23))
@@ -808,9 +810,10 @@ (define_insn "*divmodsi_internal_er_1"
 
 (define_insn "*divmodsi_internal"
   [(set (match_operand:DI 0 "register_operand" "=c")
-   (sign_extend:DI (match_operator:SI 3 "divmod_operator"
-   [(match_operand:DI 1 "register_operand" "a")
-(match_operand:DI 2 "register_operand" "b")])))
+   (sign_extend:DI
+(match_operator:SI 3 "divmod_operator"
+ [(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+  (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(clobber (reg:DI 23))
(clobber (reg:DI 28))]
   "TARGET_ABI_OSF"
diff --git a/gcc/config/alpha/constraints.md b/gcc/config/alpha/constraints.md
index 0d001ba26f1..4383f1fa895 100644
--- a/gcc/config/alpha/constraints.md
+++ b/gcc/config/alpha/constraints.md
@@ -27,7 +27,7 @@ (define_register_constraint "a" "R24_REG"
  "General register 24, input to division routine")
 
 (define_register_constraint "b" "R25_REG"
- "General register 24, input to division routine")
+ "General register 25, input to division routine")
 
 (define_register_constraint "c" "R27_REG"
  "General register 27, function call address")
diff --git a/gcc/testsuite/gcc.target/alpha/pr115297.c 
b/gcc/testsuite/gcc.target/alpha/pr115297.c
new file mode 100644
index 000..4d5890ec8d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/alpha/pr115297.c
@@ -0,0 +1,13 @@
+/* PR target/115297 */
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+enum { BPF_F_USER_BUILD_ID } __bpf_get_stack_size;
+long __bpf_get_stack_flags, bpf_get_stack___trans_tmp_2;
+
+void bpf_get_stack() {
+  unsigned elem_size;
+  int err = elem_size = __bpf_get_stack_flags ?: sizeof(long);
+  if (__builtin_expect(__bpf_get_stack_size % elem_size, 0))
+bpf_get_stack___trans_tmp_2 = err;
+}
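
The corrected RTL semantics can be modeled in plain C (an illustrative sketch, not part of the patch; the function name is made up): both DImode inputs are truncated to SImode before the divide, and the SImode result is sign-extended back to DImode.

```c
#include <stdint.h>

/* Model of the fixed pattern: truncate:SI both 64-bit inputs,
   divide in SImode, then sign_extend:DI the 32-bit result.
   The high 32 bits of a and b are deliberately ignored -- which is
   exactly why the original RTX (an SImode operator applied directly
   to DImode operands) was inconsistent.  */
static int64_t model_divsi(int64_t a, int64_t b)
{
    int32_t lo_a = (int32_t)a;   /* truncate:SI operand 1 */
    int32_t lo_b = (int32_t)b;   /* truncate:SI operand 2 */
    int32_t q = lo_a / lo_b;     /* any_divmod:SI, div case */
    return (int64_t)q;           /* sign_extend:DI */
}
```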


[gcc r15-943] alpha: Fix invalid RTX in divmodsi insn patterns [PR115297]

2024-05-31 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:0ac802064c2a018cf166c37841697e867de65a95

commit r15-943-g0ac802064c2a018cf166c37841697e867de65a95
Author: Uros Bizjak 
Date:   Fri May 31 15:52:03 2024 +0200

alpha: Fix invalid RTX in divmodsi insn patterns [PR115297]

any_divmod instructions are modelled with invalid RTX:

  [(set (match_operand:DI 0 "register_operand" "=c")
(sign_extend:DI (match_operator:SI 3 "divmod_operator"
[(match_operand:DI 1 "register_operand" "a")
 (match_operand:DI 2 "register_operand" "b")])))
   (clobber (reg:DI 23))
   (clobber (reg:DI 28))]

where SImode divmod_operator (div,mod,udiv,umod) has DImode operands.

Wrap input operand with truncate:SI to make machine modes consistent.

PR target/115297

gcc/ChangeLog:

* config/alpha/alpha.md (si3): Wrap DImode
operands 3 and 4 with truncate:SI RTX.
(*divmodsi_internal_er): Ditto for operands 1 and 2.
(*divmodsi_internal_er_1): Ditto.
(*divmodsi_internal): Ditto.
* config/alpha/constraints.md ("b"): Correct register
number in the description.

gcc/testsuite/ChangeLog:

* gcc.target/alpha/pr115297.c: New test.

Diff:
---
 gcc/config/alpha/alpha.md | 21 -
 gcc/config/alpha/constraints.md   |  2 +-
 gcc/testsuite/gcc.target/alpha/pr115297.c | 13 +
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 79f12c53c16..1e2de5a4d15 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -725,7 +725,8 @@
(sign_extend:DI (match_operand:SI 2 "nonimmediate_operand")))
(parallel [(set (match_dup 5)
   (sign_extend:DI
-   (any_divmod:SI (match_dup 3) (match_dup 4))))
+   (any_divmod:SI (truncate:SI (match_dup 3))
+  (truncate:SI (match_dup 4)))))
  (clobber (reg:DI 23))
  (clobber (reg:DI 28))])
(set (match_operand:SI 0 "nonimmediate_operand")
@@ -751,9 +752,10 @@
 
 (define_insn_and_split "*divmodsi_internal_er"
   [(set (match_operand:DI 0 "register_operand" "=c")
-   (sign_extend:DI (match_operator:SI 3 "divmod_operator"
-   [(match_operand:DI 1 "register_operand" "a")
-(match_operand:DI 2 "register_operand" "b")])))
+   (sign_extend:DI
+(match_operator:SI 3 "divmod_operator"
+ [(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+  (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(clobber (reg:DI 23))
(clobber (reg:DI 28))]
   "TARGET_EXPLICIT_RELOCS && TARGET_ABI_OSF"
@@ -795,8 +797,8 @@
 (define_insn "*divmodsi_internal_er_1"
   [(set (match_operand:DI 0 "register_operand" "=c")
(sign_extend:DI (match_operator:SI 3 "divmod_operator"
-[(match_operand:DI 1 "register_operand" "a")
- (match_operand:DI 2 "register_operand" "b")])))
+[(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+ (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(use (match_operand:DI 4 "register_operand" "c"))
(use (match_operand 5 "const_int_operand"))
(clobber (reg:DI 23))
@@ -808,9 +810,10 @@
 
 (define_insn "*divmodsi_internal"
   [(set (match_operand:DI 0 "register_operand" "=c")
-   (sign_extend:DI (match_operator:SI 3 "divmod_operator"
-   [(match_operand:DI 1 "register_operand" "a")
-(match_operand:DI 2 "register_operand" "b")])))
+   (sign_extend:DI
+(match_operator:SI 3 "divmod_operator"
+ [(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+  (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(clobber (reg:DI 23))
(clobber (reg:DI 28))]
   "TARGET_ABI_OSF"
diff --git a/gcc/config/alpha/constraints.md b/gcc/config/alpha/constraints.md
index 0d001ba26f1..4383f1fa895 100644
--- a/gcc/config/alpha/constraints.md
+++ b/gcc/config/alpha/constraints.md
@@ -27,7 +27,7 @@
  "General register 24, input to division routine")
 
 (define_register_constraint "b" "R25_REG"
- "General register 24, input to division routine")

[committed] i386: Rewrite bswaphi2 handling [PR115102]

2024-05-30 Thread Uros Bizjak
Introduce *bswaphi2 instruction pattern and enable bswaphi2 expander
also for non-movbe targets.  The testcase:

unsigned short bswap8 (unsigned short val)
{
  return ((val & 0xff00) >> 8) | ((val & 0xff) << 8);
}

now expands through bswaphi2 named expander.

Rewrite bswaphi_lowpart insn pattern as bswaphisi2_lowpart in the RTX form
that combine pass can use to simplify:

Trying 6, 9, 8 -> 10:
6: r99:SI=bswap(r103:SI)
9: {r107:SI=r103:SI&0xffff0000;clobber flags:CC;}
  REG_DEAD r103:SI
  REG_UNUSED flags:CC
8: {r106:SI=r99:SI 0>>0x10;clobber flags:CC;}
  REG_DEAD r99:SI
  REG_UNUSED flags:CC
   10: {r104:SI=r106:SI|r107:SI;clobber flags:CC;}
  REG_DEAD r107:SI
  REG_DEAD r106:SI
  REG_UNUSED flags:CC

Successfully matched this instruction:
(set (reg:SI 104 [ _8 ])
(ior:SI (and:SI (reg/v:SI 103 [ val ])
(const_int -65536 [0xffffffffffff0000]))
(lshiftrt:SI (bswap:SI (reg/v:SI 103 [ val ]))
(const_int 16 [0x10]
allowing combination of insns 6, 8, 9 and 10

when compiling the following testcase:

unsigned int bswap8 (unsigned int val)
{
  return (val & 0xffff0000) | ((val & 0xff00) >> 8) | ((val & 0xff) << 8);
}

to produce:

movl%edi, %eax
xchgb   %ah, %al
ret

The expansion now always goes through a clobberless form of the bswaphi
instruction.  The instruction is conditionally converted to a rotate at
peephole2 pass.  This significantly simplifies bswaphisi2_lowpart
insn pattern attributes.

PR target/115102

gcc/ChangeLog:

* config/i386/i386.md (bswaphi2): Also enable for !TARGET_MOVBE.
(*bswaphi2): New insn pattern.
(bswaphisi2_lowpart): Rename from bswaphi_lowpart.  Rewrite
insn RTX to match the expected form of the combine pass.
Remove rol{w} alternative and corresponding attributes.
(bswaphisi2_lowpart peephole2): New peephole2 pattern to
conditionally convert bswaphisi2_lowpart to rotlhi3_1_slp.
(bswapsi2): Update expander for rename.
(rotlhi3_1_slp splitter): Conditionally split to bswaphi2.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115102.c: New test.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c162cd42386..375654cf74e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -17210,9 +17210,7 @@ (define_split
   (clobber (reg:CC FLAGS_REG))]
  "reload_completed
   && (TARGET_USE_XCHGB || optimize_function_for_size_p (cfun))"
- [(parallel [(set (strict_low_part (match_dup 0))
- (bswap:HI (match_dup 0)))
-(clobber (reg:CC FLAGS_REG))])])
+ [(set (match_dup 0) (bswap:HI (match_dup 0)))])
 
 ;; Rotations through carry flag
 (define_insn "rcrsi2"
@@ -20730,12 +20728,11 @@ (define_expand "bswapsi2"
 operands[1] = force_reg (SImode, operands[1]);
   else
 {
-  rtx x = operands[0];
+  rtx x = gen_reg_rtx (SImode);
 
-  emit_move_insn (x, operands[1]);
-  emit_insn (gen_bswaphi_lowpart (gen_lowpart (HImode, x)));
+  emit_insn (gen_bswaphisi2_lowpart (x, operands[1]));
   emit_insn (gen_rotlsi3 (x, x, GEN_INT (16)));
-  emit_insn (gen_bswaphi_lowpart (gen_lowpart (HImode, x)));
+  emit_insn (gen_bswaphisi2_lowpart (operands[0], x));
   DONE;
 }
 })
@@ -20767,7 +20764,11 @@ (define_insn "*bswap2"
 (define_expand "bswaphi2"
   [(set (match_operand:HI 0 "register_operand")
(bswap:HI (match_operand:HI 1 "nonimmediate_operand")))]
-  "TARGET_MOVBE")
+  ""
+{
+  if (!TARGET_MOVBE)
+operands[1] = force_reg (HImode, operands[1]);
+})
 
 (define_insn "*bswaphi2_movbe"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=Q,r,m")
@@ -20788,33 +20789,55 @@ (define_insn "*bswaphi2_movbe"
(set_attr "bdver1_decode" "double,*,*")
(set_attr "mode" "QI,HI,HI")])
 
+(define_insn "*bswaphi2"
+  [(set (match_operand:HI 0 "register_operand" "=Q")
+   (bswap:HI (match_operand:HI 1 "register_operand" "0")))]
+  "!TARGET_MOVBE"
+  "xchg{b}\t{%h0, %b0|%b0, %h0}"
+  [(set_attr "type" "imov")
+   (set_attr "pent_pair" "np")
+   (set_attr "athlon_decode" "vector")
+   (set_attr "amdfam10_decode" "double")
+   (set_attr "bdver1_decode" "double")
+   (set_attr "mode" "QI")])
+
 (define_peephole2
   [(set (match_operand:HI 0 "general_reg_operand")
(bswap:HI (match_dup 0)))]
-  "TARGET_MOVBE
-   && !(TARGET_USE_XCHGB || optimize_function_for_size_p (cfun))
+  "!(TARGET_USE_XCHGB ||
+ TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
&& peep2_regno_dead_p (0, FLAGS_REG)"
   [(parallel [(set (match_dup 0) (rotate:HI (match_dup 0) (const_int 8)))
  (clobber (reg:CC FLAGS_REG))])])
 
-(define_insn "bswaphi_lowpart"
-  [(set (strict_low_part (match_operand:HI 0 "register_operand" "+Q,r"))
-   (bswap:HI (match_dup 0)))
-   (clobber (reg:CC FLAGS_REG))]
+(define_insn "bswaphisi2_lowpart"
+  [(set (match_operand:SI 0 "register_operand" 
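
The two identities this patch handles can be checked directly in plain C (a hedged sketch with made-up names, validated against the masks in the testcases rather than against GCC's internal patterns):

```c
#include <stdint.h>

/* 16-bit byte swap -- now expanded through the bswaphi2 expander.  */
static uint16_t bswap16_ref(uint16_t val)
{
    return (uint16_t)(((val & 0xff00) >> 8) | ((val & 0xff) << 8));
}

/* The SImode form the combine pass now matches: keep the high half,
   swap only the two low bytes (emitted as xchgb %ah, %al).  */
static uint32_t bswap_low16(uint32_t val)
{
    return (val & 0xffff0000u)
         | ((val & 0xff00) >> 8)
         | ((val & 0xff) << 8);
}
```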

[gcc r15-930] i386: Rewrite bswaphi2 handling [PR115102]

2024-05-30 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:e715204f203d318524ae86f3f2a1e8d5d7cb08dc

commit r15-930-ge715204f203d318524ae86f3f2a1e8d5d7cb08dc
Author: Uros Bizjak 
Date:   Thu May 30 21:27:42 2024 +0200

i386: Rewrite bswaphi2 handling [PR115102]

Introduce *bswaphi2 instruction pattern and enable bswaphi2 expander
also for non-movbe targets.  The testcase:

unsigned short bswap8 (unsigned short val)
{
  return ((val & 0xff00) >> 8) | ((val & 0xff) << 8);
}

now expands through bswaphi2 named expander.

Rewrite bswaphi_lowpart insn pattern as bswaphisi2_lowpart in the RTX form
that combine pass can use to simplify:

Trying 6, 9, 8 -> 10:
6: r99:SI=bswap(r103:SI)
9: {r107:SI=r103:SI&0xffff0000;clobber flags:CC;}
  REG_DEAD r103:SI
  REG_UNUSED flags:CC
8: {r106:SI=r99:SI 0>>0x10;clobber flags:CC;}
  REG_DEAD r99:SI
  REG_UNUSED flags:CC
   10: {r104:SI=r106:SI|r107:SI;clobber flags:CC;}
  REG_DEAD r107:SI
  REG_DEAD r106:SI
  REG_UNUSED flags:CC

Successfully matched this instruction:
(set (reg:SI 104 [ _8 ])
(ior:SI (and:SI (reg/v:SI 103 [ val ])
(const_int -65536 [0xffffffffffff0000]))
(lshiftrt:SI (bswap:SI (reg/v:SI 103 [ val ]))
(const_int 16 [0x10]
allowing combination of insns 6, 8, 9 and 10

when compiling the following testcase:

unsigned int bswap8 (unsigned int val)
{
  return (val & 0xffff0000) | ((val & 0xff00) >> 8) | ((val & 0xff) << 8);
}

to produce:

movl%edi, %eax
xchgb   %ah, %al
ret

The expansion now always goes through a clobberless form of the bswaphi
instruction.  The instruction is conditionally converted to a rotate at
peephole2 pass.  This significantly simplifies bswaphisi2_lowpart
insn pattern attributes.

PR target/115102

gcc/ChangeLog:

* config/i386/i386.md (bswaphi2): Also enable for !TARGET_MOVBE.
(*bswaphi2): New insn pattern.
(bswaphisi2_lowpart): Rename from bswaphi_lowpart.  Rewrite
insn RTX to match the expected form of the combine pass.
Remove rol{w} alternative and corresponding attributes.
(bswaphisi2_lowpart peephole2): New peephole2 pattern to
conditionally convert bswaphisi2_lowpart to rotlhi3_1_slp.
(bswapsi2): Update expander for rename.
(rotlhi3_1_slp splitter): Conditionally split to bswaphi2.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115102.c: New test.

Diff:
---
 gcc/config/i386/i386.md  | 77 +---
 gcc/testsuite/gcc.target/i386/pr115102.c | 10 +
 2 files changed, 60 insertions(+), 27 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c162cd42386..375654cf74e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -17210,9 +17210,7 @@
   (clobber (reg:CC FLAGS_REG))]
  "reload_completed
   && (TARGET_USE_XCHGB || optimize_function_for_size_p (cfun))"
- [(parallel [(set (strict_low_part (match_dup 0))
- (bswap:HI (match_dup 0)))
-(clobber (reg:CC FLAGS_REG))])])
+ [(set (match_dup 0) (bswap:HI (match_dup 0)))])
 
 ;; Rotations through carry flag
 (define_insn "rcrsi2"
@@ -20730,12 +20728,11 @@
 operands[1] = force_reg (SImode, operands[1]);
   else
 {
-  rtx x = operands[0];
+  rtx x = gen_reg_rtx (SImode);
 
-  emit_move_insn (x, operands[1]);
-  emit_insn (gen_bswaphi_lowpart (gen_lowpart (HImode, x)));
+  emit_insn (gen_bswaphisi2_lowpart (x, operands[1]));
   emit_insn (gen_rotlsi3 (x, x, GEN_INT (16)));
-  emit_insn (gen_bswaphi_lowpart (gen_lowpart (HImode, x)));
+  emit_insn (gen_bswaphisi2_lowpart (operands[0], x));
   DONE;
 }
 })
@@ -20767,7 +20764,11 @@
 (define_expand "bswaphi2"
   [(set (match_operand:HI 0 "register_operand")
(bswap:HI (match_operand:HI 1 "nonimmediate_operand")))]
-  "TARGET_MOVBE")
+  ""
+{
+  if (!TARGET_MOVBE)
+operands[1] = force_reg (HImode, operands[1]);
+})
 
 (define_insn "*bswaphi2_movbe"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=Q,r,m")
@@ -20788,33 +20789,55 @@
(set_attr "bdver1_decode" "double,*,*")
(set_attr "mode" "QI,HI,HI")])
 
+(define_insn "*bswaphi2"
+  [(set (match_operand:HI 0 "register_operand" "=Q")
+   (bswap:HI (match_operand:HI 1 "register_operand" "0")))]
+  "!TARGET_MOVBE"
+  "xchg{b}\t{%h0, %b0|%b0, %h0}"
+  [(set_attr "type" "imov")

[committed] i386: Improve access to _Atomic DImode location via XMM regs for SSE4.1 x86_32 targets

2024-05-28 Thread Uros Bizjak
Use MOVD/PEXTRD and MOVD/PINSRD insn sequences to move DImode value
between XMM and GPR register sets for SSE4.1 x86_32 targets in order
to avoid spilling the value to stack.

The load from _Atomic location a improves from:

movqa, %xmm0
movq%xmm0, (%esp)
movl(%esp), %eax
movl4(%esp), %edx

to:
movqa, %xmm0
movd%xmm0, %eax
pextrd  $1, %xmm0, %edx

The store to _Atomic location b improves from:

movl%eax, (%esp)
movl%edx, 4(%esp)
movq(%esp), %xmm0
movq%xmm0, b

to:
movd%eax, %xmm0
pinsrd  $1, %edx, %xmm0
movq%xmm0, b

gcc/ChangeLog:

* config/i386/sync.md (atomic_loaddi_fpu): Use movd/pextrd
to move DImode value from XMM to GPR for TARGET_SSE4_1.
(atomic_storedi_fpu): Use movd/pinsrd to move DImode value
from GPR to XMM for TARGET_SSE4_1.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 8317581ebe2..f2b3ba0aa7a 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -215,8 +215,18 @@ (define_insn_and_split "atomic_loaddi_fpu"
}
   else
{
+ rtx tmpdi = gen_lowpart (DImode, tmp);
+
  emit_insn (gen_loaddi_via_sse (tmp, src));
- emit_insn (gen_storedi_via_sse (mem, tmp));
+
+ if (GENERAL_REG_P (dst)
+ && TARGET_SSE4_1 && TARGET_INTER_UNIT_MOVES_FROM_VEC)
+   {
+ emit_move_insn (dst, tmpdi);
+ DONE;
+   }
+ else
+   emit_move_insn (mem, tmpdi);
}
 
   if (mem != dst)
@@ -294,20 +304,30 @@ (define_insn_and_split "atomic_storedi_fpu"
 emit_move_insn (dst, src);
   else
 {
-  if (REG_P (src))
-   {
- emit_move_insn (mem, src);
- src = mem;
-   }
-
   if (STACK_REG_P (tmp))
{
+ if (GENERAL_REG_P (src))
+   {
+ emit_move_insn (mem, src);
+ src = mem;
+   }
+
  emit_insn (gen_loaddi_via_fpu (tmp, src));
  emit_insn (gen_storedi_via_fpu (dst, tmp));
}
   else
{
- emit_insn (gen_loaddi_via_sse (tmp, src));
+ rtx tmpdi = gen_lowpart (DImode, tmp);
+
+ if (GENERAL_REG_P (src)
+ && !(TARGET_SSE4_1 && TARGET_INTER_UNIT_MOVES_TO_VEC))
+   {
+ emit_move_insn (mem, src);
+ src = mem;
+   }
+
+ emit_move_insn (tmpdi, src);
+
  emit_insn (gen_storedi_via_sse (dst, tmp));
}
 }
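
The access pattern being optimized is an ordinary C11 64-bit atomic load and store; a minimal sketch (variable names chosen to mirror the a/b locations in the example codegen above):

```c
#include <stdatomic.h>
#include <stdint.h>

/* On SSE4.1 x86_32, the load of 'a' now becomes movq/movd/pextrd and
   the store to 'b' becomes movd/pinsrd/movq, with no stack round trip.  */
static _Atomic int64_t a = 0x1122334455667788LL;
static _Atomic int64_t b;

static int64_t copy_atomic(void)
{
    int64_t v = atomic_load(&a);  /* atomic 8-byte load  */
    atomic_store(&b, v);          /* atomic 8-byte store */
    return v;
}
```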


[gcc r15-876] i386: Improve access to _Atomic DImode location via XMM regs for SSE4.1 x86_32 targets

2024-05-28 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:91d79053f2b416cb9e97d9c0c3fb5b73075289e6

commit r15-876-g91d79053f2b416cb9e97d9c0c3fb5b73075289e6
Author: Uros Bizjak 
Date:   Tue May 28 20:25:14 2024 +0200

i386: Improve access to _Atomic DImode location via XMM regs for SSE4.1 
x86_32 targets

Use MOVD/PEXTRD and MOVD/PINSRD insn sequences to move DImode value
between XMM and GPR register sets for SSE4.1 x86_32 targets in order
to avoid spilling the value to stack.

The load from _Atomic location a improves from:

movqa, %xmm0
movq%xmm0, (%esp)
movl(%esp), %eax
movl4(%esp), %edx

to:
movqa, %xmm0
movd%xmm0, %eax
pextrd  $1, %xmm0, %edx

The store to _Atomic location b improves from:

movl%eax, (%esp)
movl%edx, 4(%esp)
movq(%esp), %xmm0
movq%xmm0, b

to:
movd%eax, %xmm0
pinsrd  $1, %edx, %xmm0
movq%xmm0, b

gcc/ChangeLog:

* config/i386/sync.md (atomic_loaddi_fpu): Use movd/pextrd
to move DImode value from XMM to GPR for TARGET_SSE4_1.
(atomic_storedi_fpu): Use movd/pinsrd to move DImode value
from GPR to XMM for TARGET_SSE4_1.

Diff:
---
 gcc/config/i386/sync.md | 36 
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 8317581ebe2..f2b3ba0aa7a 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -215,8 +215,18 @@
}
   else
{
+ rtx tmpdi = gen_lowpart (DImode, tmp);
+
  emit_insn (gen_loaddi_via_sse (tmp, src));
- emit_insn (gen_storedi_via_sse (mem, tmp));
+
+ if (GENERAL_REG_P (dst)
+ && TARGET_SSE4_1 && TARGET_INTER_UNIT_MOVES_FROM_VEC)
+   {
+ emit_move_insn (dst, tmpdi);
+ DONE;
+   }
+ else
+   emit_move_insn (mem, tmpdi);
}
 
   if (mem != dst)
@@ -294,20 +304,30 @@
 emit_move_insn (dst, src);
   else
 {
-  if (REG_P (src))
-   {
- emit_move_insn (mem, src);
- src = mem;
-   }
-
   if (STACK_REG_P (tmp))
{
+ if (GENERAL_REG_P (src))
+   {
+ emit_move_insn (mem, src);
+ src = mem;
+   }
+
  emit_insn (gen_loaddi_via_fpu (tmp, src));
  emit_insn (gen_storedi_via_fpu (dst, tmp));
}
   else
{
- emit_insn (gen_loaddi_via_sse (tmp, src));
+ rtx tmpdi = gen_lowpart (DImode, tmp);
+
+ if (GENERAL_REG_P (src)
+ && !(TARGET_SSE4_1 && TARGET_INTER_UNIT_MOVES_TO_VEC))
+   {
+ emit_move_insn (mem, src);
+ src = mem;
+   }
+
+ emit_move_insn (tmpdi, src);
+
  emit_insn (gen_storedi_via_sse (dst, tmp));
}
 }


[gcc r11-11454] ubsan: Use right address space for MEM_REF created for bool/enum sanitization [PR115172]

2024-05-28 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:d8985ea10c911c994e00dbd6a08dcae907ebc1f7

commit r11-11454-gd8985ea10c911c994e00dbd6a08dcae907ebc1f7
Author: Jakub Jelinek 
Date:   Wed May 22 09:12:28 2024 +0200

ubsan: Use right address space for MEM_REF created for bool/enum 
sanitization [PR115172]

The following testcase is miscompiled, because -fsanitize=bool,enum
creates a MEM_REF without propagating there address space qualifiers,
so what should be normally loaded using say %gs:/%fs: segment prefix
isn't.  Together with asan it then causes that load to be sanitized.

2024-05-22  Jakub Jelinek  

PR sanitizer/115172
* ubsan.c (instrument_bool_enum_load): If rhs is not in generic
address space, use qualified version of utype with the right
address space.  Formatting fix.

* gcc.dg/asan/pr115172.c: New test.

(cherry picked from commit d3c506eff54fcbac389a529c2e98da108a410b7f)

Diff:
---
 gcc/testsuite/gcc.dg/asan/pr115172.c | 20 
 gcc/ubsan.c  |  6 +-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/asan/pr115172.c 
b/gcc/testsuite/gcc.dg/asan/pr115172.c
new file mode 100644
index 000..8707e615733
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/pr115172.c
@@ -0,0 +1,20 @@
+/* PR sanitizer/115172 */
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fsanitize=address,bool -ffat-lto-objects 
-fdump-tree-asan1" } */
+/* { dg-final { scan-tree-dump-not "\.ASAN_CHECK " "asan1" } } */
+
+#ifdef __x86_64__
+#define SEG __seg_gs
+#else
+#define SEG __seg_fs
+#endif
+
+extern struct S { _Bool b; } s;
+void bar (void);
+
+void
+foo (void)
+{
+  if (*(volatile _Bool SEG *) (__UINTPTR_TYPE__) &s)
+bar ();
+}
diff --git a/gcc/ubsan.c b/gcc/ubsan.c
index 2b12651b440..f77dee5fddd 100644
--- a/gcc/ubsan.c
+++ b/gcc/ubsan.c
@@ -1703,13 +1703,17 @@ instrument_bool_enum_load (gimple_stmt_iterator *gsi)
   || TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
 return;
 
+  addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (rhs));
+  if (as != TYPE_ADDR_SPACE (utype))
+utype = build_qualified_type (utype, TYPE_QUALS (utype)
+| ENCODE_QUAL_ADDR_SPACE (as));
   bool ends_bb = stmt_ends_bb_p (stmt);
   location_t loc = gimple_location (stmt);
   tree lhs = gimple_assign_lhs (stmt);
   tree ptype = build_pointer_type (TREE_TYPE (rhs));
   tree atype = reference_alias_ptr_type (rhs);
   gimple *g = gimple_build_assign (make_ssa_name (ptype),
- build_fold_addr_expr (rhs));
+  build_fold_addr_expr (rhs));
   gimple_set_location (g, loc);
   gsi_insert_before (gsi, g, GSI_SAME_STMT);
   tree mem = build2 (MEM_REF, utype, gimple_assign_lhs (g),


[gcc r12-10477] ubsan: Use right address space for MEM_REF created for bool/enum sanitization [PR115172]

2024-05-28 Thread Uros Bizjak via Gcc-cvs
https://gcc.gnu.org/g:da9b7a507ef38287cc16bc88e808293019f9f531

commit r12-10477-gda9b7a507ef38287cc16bc88e808293019f9f531
Author: Jakub Jelinek 
Date:   Wed May 22 09:12:28 2024 +0200

ubsan: Use right address space for MEM_REF created for bool/enum 
sanitization [PR115172]

The following testcase is miscompiled, because -fsanitize=bool,enum
creates a MEM_REF without propagating there address space qualifiers,
so what should be normally loaded using say %gs:/%fs: segment prefix
isn't.  Together with asan it then causes that load to be sanitized.

2024-05-22  Jakub Jelinek  

PR sanitizer/115172
* ubsan.cc (instrument_bool_enum_load): If rhs is not in generic
address space, use qualified version of utype with the right
address space.  Formatting fix.

* gcc.dg/asan/pr115172.c: New test.

(cherry picked from commit d3c506eff54fcbac389a529c2e98da108a410b7f)

Diff:
---
 gcc/testsuite/gcc.dg/asan/pr115172.c | 20 
 gcc/ubsan.cc |  6 +-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/asan/pr115172.c 
b/gcc/testsuite/gcc.dg/asan/pr115172.c
new file mode 100644
index 000..8707e615733
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/pr115172.c
@@ -0,0 +1,20 @@
+/* PR sanitizer/115172 */
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fsanitize=address,bool -ffat-lto-objects 
-fdump-tree-asan1" } */
+/* { dg-final { scan-tree-dump-not "\.ASAN_CHECK " "asan1" } } */
+
+#ifdef __x86_64__
+#define SEG __seg_gs
+#else
+#define SEG __seg_fs
+#endif
+
+extern struct S { _Bool b; } s;
+void bar (void);
+
+void
+foo (void)
+{
+  if (*(volatile _Bool SEG *) (__UINTPTR_TYPE__) &s)
+bar ();
+}
diff --git a/gcc/ubsan.cc b/gcc/ubsan.cc
index 4d8e7cd86c5..70a5ef66bd9 100644
--- a/gcc/ubsan.cc
+++ b/gcc/ubsan.cc
@@ -1703,13 +1703,17 @@ instrument_bool_enum_load (gimple_stmt_iterator *gsi)
   || TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
 return;
 
+  addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (rhs));
+  if (as != TYPE_ADDR_SPACE (utype))
+utype = build_qualified_type (utype, TYPE_QUALS (utype)
+| ENCODE_QUAL_ADDR_SPACE (as));
   bool ends_bb = stmt_ends_bb_p (stmt);
   location_t loc = gimple_location (stmt);
   tree lhs = gimple_assign_lhs (stmt);
   tree ptype = build_pointer_type (TREE_TYPE (rhs));
   tree atype = reference_alias_ptr_type (rhs);
   gimple *g = gimple_build_assign (make_ssa_name (ptype),
- build_fold_addr_expr (rhs));
+  build_fold_addr_expr (rhs));
   gimple_set_location (g, loc);
   gsi_insert_before (gsi, g, GSI_SAME_STMT);
   tree mem = build2 (MEM_REF, utype, gimple_assign_lhs (g),


Re: [PATCH V2] Reduce cost of MEM (A + imm).

2024-05-28 Thread Uros Bizjak
On Tue, May 28, 2024 at 12:48 PM liuhongt  wrote:
>
> > IMO, there is no need for CONST_INT_P condition, we should also allow
> > symbol_ref, label_ref and const (all allowed by
> > x86_64_immediate_operand predicate), these all decay to an immediate
> > value.
>
> Changed.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk.
>
> For MEM, rtx_cost iterates each subrtx, and adds up the costs,
> so for MEM (reg) and MEM (reg + 4), the former costs 5,
> the latter costs 9, it is not accurate for x86. Ideally
> address_cost should be used, but it reduce cost too much.
> So current solution is make constant disp as cheap as possible.
>
> gcc/ChangeLog:
>
> PR target/67325
> * config/i386/i386.cc (ix86_rtx_costs): Reduce cost of MEM (A
> + imm) to "cost of MEM (A)" + 1.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr67325.c: New test.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc | 18 +-
>  gcc/testsuite/gcc.target/i386/pr67325.c |  7 +++
>  2 files changed, 24 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67325.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 3e2a3a194f1..85d87b9f778 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22194,7 +22194,23 @@ ix86_rtx_costs (rtx x, machine_mode mode, int 
> outer_code_i, int opno,
>/* An insn that accesses memory is slightly more expensive
>   than one that does not.  */
>if (speed)
> -*total += 1;
> +   {
> + *total += 1;
> + rtx addr = XEXP (x, 0);
> + /* For MEM, rtx_cost iterates each subrtx, and adds up the costs,
> +so for MEM (reg) and MEM (reg + 4), the former costs 5,
> +the latter costs 9, it is not accurate for x86. Ideally
> +address_cost should be used, but it reduce cost too much.
> +So current solution is make constant disp as cheap as possible.  
> */
> + if (GET_CODE (addr) == PLUS
> + && x86_64_immediate_operand (XEXP (addr, 1), Pmode))
> +   {
> + *total += 1;
> + *total += rtx_cost (XEXP (addr, 0), Pmode, PLUS, 0, speed);
> + return true;
> +   }
> +   }
> +
>return false;
>
>  case ZERO_EXTRACT:
> diff --git a/gcc/testsuite/gcc.target/i386/pr67325.c 
> b/gcc/testsuite/gcc.target/i386/pr67325.c
> new file mode 100644
> index 000..c3c1e4c5b4d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr67325.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "(?:sar|shr)" } } */
> +
> +int f(long*l){
> +  return *l>>32;
> +}
> --
> 2.31.1
>
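
What the cheaper MEM (A + imm) cost buys in the pr67325.c testcase: `*l >> 32` can be compiled as a single 4-byte load at offset 4 instead of an 8-byte load plus a shift. A sketch of the equivalence (little-endian layout assumed; the offset-load form is only valid there):

```c
#include <stdint.h>
#include <string.h>

/* The source form from pr67325.c.  */
static int32_t high_shift(const int64_t *l)
{
    return (int32_t)(*l >> 32);
}

/* The form the cost tweak lets the compiler pick: a plain dword load
   at offset 4, i.e. MEM (A + 4).  Little-endian targets only.  */
static int32_t high_offset(const int64_t *l)
{
    int32_t hi;
    memcpy(&hi, (const unsigned char *)l + 4, sizeof hi);
    return hi;
}
```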


  1   2   3   4   5   6   7   8   9   10   >