[PATCH] rs6000: Fix ICE on IEEE128 long double without vsx [PR114402]

2024-05-07 Kewen.Lin
Hi,

As PR114402 shows, we support IEEE128 format long double
even if there is no VSX support, but there is an ICE related
to cbranch, as the test case shows.  For now, we only support
the compare:CCFP pattern for IEEE128 fp if TARGET_FLOAT128_HW,
so rs6000_generate_compare has a check
!TARGET_FLOAT128_HW && FLOAT128_VECTOR_P (mode) to make
!TARGET_FLOAT128_HW IEEE128 fp handling go through a libcall.
Unfortunately, IEEE128 without VSX support doesn't satisfy
FLOAT128_VECTOR_P (mode), so it goes on to the unmatched
compare:CCFP pattern, which triggers the ICE.

This patch makes rs6000_generate_compare also handle IEEE128
without VSX, so that it ends up using a libcall.
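To make the predicate mismatch concrete, here is a small, hypothetical C model of the decision in rs6000_generate_compare. It is not GCC code: the enum, struct and function names are made up, and it assumes that FLOAT128_VECTOR_P is FLOAT128_IEEE_P combined with TARGET_FLOAT128_TYPE, and that TARGET_FLOAT128_TYPE requires VSX. Under those assumptions the old check sends IEEE128 without VSX to the CCFP compare, while the new check sends it to the libcall path.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the rs6000_generate_compare decision.
   ASSUMPTION: FLOAT128_VECTOR_P == FLOAT128_IEEE_P && TARGET_FLOAT128_TYPE,
   and TARGET_FLOAT128_TYPE is only enabled with VSX.  */

enum fp_mode { MODE_DF, MODE_TF_IEEE, MODE_KF };
enum cmp_kind { CMP_LIBCALL_CCmode, CMP_INSN_CCFPmode };

struct target_flags
{
  bool vsx;          /* stands in for TARGET_VSX */
  bool float128_hw;  /* stands in for TARGET_FLOAT128_HW */
};

static bool
float128_ieee_p (enum fp_mode m)
{
  return m == MODE_TF_IEEE || m == MODE_KF;
}

static bool
float128_vector_p (const struct target_flags *t, enum fp_mode m)
{
  /* Without VSX, no IEEE128 mode counts as a "vector" mode.  */
  return t->vsx && float128_ieee_p (m);
}

/* Decision before the patch.  */
enum cmp_kind
pick_compare_old (const struct target_flags *t, enum fp_mode m)
{
  if (!t->float128_hw && float128_vector_p (t, m))
    return CMP_LIBCALL_CCmode;
  /* For IEEE128 without hardware support this has no matching pattern.  */
  return CMP_INSN_CCFPmode;
}

/* Decision after the patch.  */
enum cmp_kind
pick_compare_new (const struct target_flags *t, enum fp_mode m)
{
  if (!t->float128_hw && float128_ieee_p (m))
    return CMP_LIBCALL_CCmode;
  return CMP_INSN_CCFPmode;
}
```

Only the no-VSX IEEE128 case changes; with VSX but without float128 hardware both versions already chose the libcall.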

Bootstrapped and regress-tested on powerpc64-linux-gnu
P8/P9 and powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen
-
PR target/114402

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_generate_compare): Make IEEE128
handling without vsx go with libcall.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr114402.c: New test.
---
 gcc/config/rs6000/rs6000.cc |  4 ++--
 gcc/testsuite/gcc.target/powerpc/pr114402.c | 16 
 2 files changed, 18 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr114402.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index d6214bd672b..7ae6cf43da4 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15283,7 +15283,7 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
   rtx op0 = XEXP (cmp, 0);
   rtx op1 = XEXP (cmp, 1);

-  if (!TARGET_FLOAT128_HW && FLOAT128_VECTOR_P (mode))
+  if (!TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode))
     comp_mode = CCmode;
   else if (FLOAT_MODE_P (mode))
     comp_mode = CCFPmode;
@@ -15315,7 +15315,7 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)

   /* IEEE 128-bit support in VSX registers when we do not have hardware
      support.  */
-  if (!TARGET_FLOAT128_HW && FLOAT128_VECTOR_P (mode))
+  if (!TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode))
     {
       rtx libfunc = NULL_RTX;
       bool check_nan = false;
diff --git a/gcc/testsuite/gcc.target/powerpc/pr114402.c b/gcc/testsuite/gcc.target/powerpc/pr114402.c
new file mode 100644
index 000..b927138382f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr114402.c
@@ -0,0 +1,16 @@
+/* Explicitly disable VSX when VSX is on.  */
+/* { dg-options "-mno-vsx" { target powerpc_vsx } } */
+
+/* Verify there is no ICE.  */
+
+long double a;
+long double b;
+
+int
+foo ()
+{
+  if (a > b)
+    return 0;
+  else
+    return 1;
+}
--
2.39.1


[PATCH 2/2] RISC-V: Add cmpmemsi expansion

2024-05-07 Christoph Müllner
GCC has a generic cmpmemsi expansion via the by-pieces framework,
which leaves some room for target-specific optimizations.
E.g. for comparing two aligned memory blocks of 15 bytes
we get the following sequence:

my_mem_cmp_aligned_15:
        li      a4,0
        j       .L2
.L8:
        bgeu    a4,a7,.L7
.L2:
        add     a2,a0,a4
        add     a3,a1,a4
        lbu     a5,0(a2)
        lbu     a6,0(a3)
        addi    a4,a4,1
        li      a7,15       // missed hoisting
        subw    a5,a5,a6
        andi    a5,a5,0xff  // useless
        beq     a5,zero,.L8
        lbu     a0,0(a2)    // loading again!
        lbu     a5,0(a3)    // loading again!
        subw    a0,a0,a5
        ret
.L7:
        li      a0,0
        ret

Diff first byte: 15 insns
Diff second byte: 25 insns
No diff: 25 insns

Possible improvements:
* unroll the loop and use load-with-displacement to avoid offset increments
* load and compare multiple (aligned) bytes at once
* Use the bitmanip/strcmp result calculation (reverse words and
  synthesize (a2 >= a3) ? 1 : -1 in a branchless sequence)
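A portable C sketch of the word-at-a-time comparison and the branchless result calculation listed above, for illustration only (it is not the expander in riscv-string.cc). It assumes GCC or Clang, using __builtin_bswap64 in place of the Zbb rev8 instruction; the helper names are hypothetical. Bytes past the length are zero-filled when loading, which, like the slli in the 15-byte sequence, keeps the extra byte out of the comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Load N (1..8) bytes as a little-endian word; missing high bytes are
   zero, so they never influence the comparison.  */
static uint64_t
load_le (const unsigned char *p, size_t n)
{
  uint64_t w = 0;
  for (size_t i = 0; i < n; i++)
    w |= (uint64_t) p[i] << (8 * i);
  return w;
}

/* Branchless result for two differing words, mirroring
   rev8; rev8; sltu; neg; ori 1  ->  -1 if a sorts first, else 1.  */
static int
word_result (uint64_t a, uint64_t b)
{
  uint64_t ra = __builtin_bswap64 (a);  /* first byte becomes most significant */
  uint64_t rb = __builtin_bswap64 (b);
  int64_t lt = ra < rb;                 /* sltu */
  return (int) (-lt | 1);               /* neg; ori 1 */
}

/* Compare LEN bytes word by word, returning -1, 0 or 1.  */
int
memcmp_by_words (const unsigned char *a, const unsigned char *b, size_t len)
{
  for (size_t off = 0; off < len; off += 8)
    {
      size_t n = len - off < 8 ? len - off : 8;
      uint64_t wa = load_le (a + off, n);
      uint64_t wb = load_le (b + off, n);
      if (wa != wb)
        return word_result (wa, wb);
    }
  return 0;
}
```

The byte swap is what makes an unsigned word comparison agree with bytewise memcmp order on a little-endian load; returning only -1, 0 or 1 is fine because callers of memcmp may rely only on the sign.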

When applying these improvements we get the following sequence:

my_mem_cmp_aligned_15:
        ld      a5,0(a0)
        ld      a4,0(a1)
        bne     a5,a4,.L2
        ld      a5,8(a0)
        ld      a4,8(a1)
        slli    a5,a5,8
        slli    a4,a4,8
        bne     a5,a4,.L2
        li      a0,0
.L3:
        sext.w  a0,a0
        ret
.L2:
        rev8    a5,a5
        rev8    a4,a4
        sltu    a5,a5,a4
        neg     a5,a5
        ori     a0,a5,1
        j       .L3

Diff first byte: 11 insns
Diff second byte: 16 insns
No diff: 11 insns

This patch implements these improvements.

The tests consist of an execution test (similar to
gcc/testsuite/gcc.dg/torture/inline-mem-cmp-1.c) and a few tests
that check the expansion conditions (known length and alignment).

Similar to the cpymemsi expansion, this patch does not introduce any
gating for the cmpmemsi expansion beyond requiring a known length,
known alignment, and Zbb.

Bootstrapped and SPEC CPU 2017 tested.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_block_compare): New
prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER2): New helper.
(do_load_from_addr): Add support for HI and SI/64 modes.
(emit_memcmp_scalar_load_and_compare): New helper to emit memcmp.
(emit_memcmp_scalar_result_calculation): Likewise.
(riscv_expand_block_compare_scalar): Likewise.
(riscv_expand_block_compare): New RISC-V expander for memory compare.
* config/riscv/riscv.md (cmpmemsi): New cmpmem expansion.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmpmemsi-1.c: New test.
* gcc.target/riscv/cmpmemsi-2.c: New test.
* gcc.target/riscv/cmpmemsi-3.c: New test.
* gcc.target/riscv/cmpmemsi.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv-protos.h |   1 +
 gcc/config/riscv/riscv-string.cc| 161 
 gcc/config/riscv/riscv.md   |  15 ++
 gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c |   6 +
 gcc/testsuite/gcc.target/riscv/cmpmemsi-2.c |  42 +
 gcc/testsuite/gcc.target/riscv/cmpmemsi-3.c |  43 ++
 gcc/testsuite/gcc.target/riscv/cmpmemsi.c   |  22 +++
 7 files changed, 290 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmpmemsi-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmpmemsi-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmpmemsi.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index e5aebf3fc3d..30ffe30be1d 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -188,6 +188,7 @@ rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
 rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
 
 /* Routines implemented in riscv-string.c.  */
+extern bool riscv_expand_block_compare (rtx, rtx, rtx, rtx);
 extern bool riscv_expand_block_move (rtx, rtx, rtx);
 
 /* Information about one CPU we know about.  */
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index b09b51d7526..9d4dc0cb827 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -86,6 +86,7 @@ GEN_EMIT_HELPER2(th_rev) /* do_th_rev2  */
 GEN_EMIT_HELPER2(th_tstnbz) /* do_th_tstnbz2  */
 GEN_EMIT_HELPER3(xor) /* do_xor3  */
 GEN_EMIT_HELPER2(zero_extendqi) /* do_zero_extendqi2  */
+GEN_EMIT_HELPER2(zero_extendhi) /* do_zero_extendhi2  */
 
 #undef GEN_EMIT_HELPER2
 #undef GEN_EMIT_HELPER3
@@ -109,6 +110,10 @@ do_load_from_addr (machine_mode mode, rtx dest, rtx addr_reg, rtx addr)
 
   if (mode == QImode)
     do_zero_extendqi2 (dest, mem);
+  else if (mode == HImode)
+    do_zero_extendhi2 (dest, mem);
+  else if (mode == SImode && TARGET_64BIT)
+    emit_insn (gen_zero_extendsidi2 (dest, mem));
   else if (mode == Xmode)
  

[PATCH 1/2] RISC-V: Add tests for cpymemsi expansion

2024-05-07 Christoph Müllner
cpymemsi expansion has been available for RISC-V since the initial port.
However, there are no tests to detect regressions.
This patch adds such tests.

Three of the tests target the expansion requirements (known length and
alignment). One test reuses an existing memcpy test from the by-pieces
framework (gcc/testsuite/gcc.dg/torture/inline-mem-cpy-1.c).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymemsi-1.c: New test.
* gcc.target/riscv/cpymemsi-2.c: New test.
* gcc.target/riscv/cpymemsi-3.c: New test.
* gcc.target/riscv/cpymemsi.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/testsuite/gcc.target/riscv/cpymemsi-1.c |  9 +
 gcc/testsuite/gcc.target/riscv/cpymemsi-2.c | 42 
 gcc/testsuite/gcc.target/riscv/cpymemsi-3.c | 43 +
 gcc/testsuite/gcc.target/riscv/cpymemsi.c   | 22 +++
 4 files changed, 116 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymemsi-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymemsi-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymemsi.c

diff --git a/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c b/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
new file mode 100644
index 000..983b564ccaf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
@@ -0,0 +1,9 @@
+/* { dg-do run } */
+/* { dg-options "-march=rv32gc -save-temps -g0 -fno-lto" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc -save-temps -g0 -fno-lto" { target { rv64 } } } */
+/* { dg-additional-options "-DRUN_FRACTION=11" { target simulator } } */
+/* { dg-timeout-factor 2 } */
+
+#include "../../gcc.dg/memcmp-1.c"
+/* Yeah, this memcmp test exercises plenty of memcpy, more than any of the
+   memcpy tests.  */
diff --git a/gcc/testsuite/gcc.target/riscv/cpymemsi-2.c b/gcc/testsuite/gcc.target/riscv/cpymemsi-2.c
new file mode 100644
index 000..833d1c04487
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cpymemsi-2.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+#include 
+#define aligned32 __attribute__ ((aligned (32)))
+
+const char myconst15[] aligned32 = { 1, 2, 3, 4, 5, 6, 7,
+0, 1, 2, 3, 4, 5, 6, 7 };
+const char myconst23[] aligned32 = { 1, 2, 3, 4, 5, 6, 7,
+0, 1, 2, 3, 4, 5, 6, 7,
+0, 1, 2, 3, 4, 5, 6, 7 };
+const char myconst31[] aligned32 = { 1, 2, 3, 4, 5, 6, 7,
+0, 1, 2, 3, 4, 5, 6, 7,
+0, 1, 2, 3, 4, 5, 6, 7,
+0, 1, 2, 3, 4, 5, 6, 7 };
+
+/* No expansion (unknown alignment) */
+#define MY_MEM_CPY_N(N)\
+void my_mem_cpy_##N (char *b1, const char *b2) \
+{  \
+  __builtin_memcpy (b1, b2, N);\
+}
+
+/* No expansion (unknown alignment) */
+#define MY_MEM_CPY_CONST_N(N)  \
+void my_mem_cpy_const_##N (char *b1)   \
+{  \
+  __builtin_memcpy (b1, myconst##N, sizeof(myconst##N));\
+}
+
+MY_MEM_CPY_N(15)
+MY_MEM_CPY_CONST_N(15)
+
+MY_MEM_CPY_N(23)
+MY_MEM_CPY_CONST_N(23)
+
+MY_MEM_CPY_N(31)
+MY_MEM_CPY_CONST_N(31)
+
+/* { dg-final { scan-assembler-times "\t(call|tail)\tmemcpy" 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/cpymemsi-3.c b/gcc/testsuite/gcc.target/riscv/cpymemsi-3.c
new file mode 100644
index 000..803765195b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cpymemsi-3.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+#include 
+#define aligned32 __attribute__ ((aligned (32)))
+
+const char myconst15[] aligned32 = { 1, 2, 3, 4, 5, 6, 7,
+0, 1, 2, 3, 4, 5, 6, 7 };
+const char myconst23[] aligned32 = { 1, 2, 3, 4, 5, 6, 7,
+0, 1, 2, 3, 4, 5, 6, 7,
+0, 1, 2, 3, 4, 5, 6, 7 };
+const char myconst31[] aligned32 = { 1, 2, 3, 4, 5, 6, 7,
+0, 1, 2, 3, 4, 5, 6, 7,
+0, 1, 2, 3, 4, 5, 6, 7,
+0, 1, 2, 3, 4, 5, 6, 7 };
+
+#define MY_MEM_CPY_ALIGNED_N(N)\
+void my_mem_cpy_aligned_##N(char *b1, const char *b2)  \
+{  \
+  b1 = __builtin_assume_aligned (b1, 4096);\
+  b2 = __builtin_assume_aligned 

[PATCH] driver: Move -fdiagnostics-urls= early like -fdiagnostics-color= [PR114980]

2024-05-07 Xi Ruoyao
In GCC 14 we started to emit URLs for "command-line option  is
valid for  but not " and "-Werror= argument
'-Werror=' is not valid for " warnings.  So -fdiagnostics-urls= needs
to be moved early like -fdiagnostics-color=; otherwise it cannot
control the URLs in these warnings.
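The reordering that prune_options performs can be modeled in a few lines of C. This is a hypothetical sketch, not the GCC driver code: it matches option spellings by prefix, whereas GCC works on decoded option indices (OPT_fdiagnostics_color_, OPT_fdiagnostics_urls_), and the function names are made up. As in the diff, only the last occurrence of each option is kept and placed right after argv[0], color first, then URLs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int
starts_with (const char *s, const char *prefix)
{
  return strncmp (s, prefix, strlen (prefix)) == 0;
}

static int
is_early_option (const char *s)
{
  return starts_with (s, "-fdiagnostics-color=")
         || starts_with (s, "-fdiagnostics-urls=");
}

/* IN has N >= 1 entries (IN[0] is argv[0]); OUT must have room for N.
   Returns the number of entries written to OUT.  */
size_t
hoist_diagnostic_options (const char *const *in, size_t n, const char **out)
{
  size_t color_idx = 0, urls_idx = 0, k = 0;

  assert (n >= 1);
  /* Remember the last occurrence of each early option.  */
  for (size_t i = 1; i < n; i++)
    {
      if (starts_with (in[i], "-fdiagnostics-color="))
        color_idx = i;
      else if (starts_with (in[i], "-fdiagnostics-urls="))
        urls_idx = i;
    }

  /* argv[0], then the hoisted options, then everything else in order.  */
  out[k++] = in[0];
  if (color_idx != 0)
    out[k++] = in[color_idx];
  if (urls_idx != 0)
    out[k++] = in[urls_idx];
  for (size_t i = 1; i < n; i++)
    if (!is_early_option (in[i]))
      out[k++] = in[i];
  return k;
}
```

Because the hoisted options come first, their settings are in effect before any later option can trigger one of the warnings mentioned above.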

No test cases are added because with TERM=xterm-256colors PR114980
already triggers some test failures.

gcc/ChangeLog:

PR driver/114980
* opts-common.cc (prune_options): Move -fdiagnostics-urls=
early like -fdiagnostics-color=.
---

Bootstrapped and regtested on x86_64-linux-gnu.  Ok for trunk and
releases/gcc-14?

 gcc/opts-common.cc | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/opts-common.cc b/gcc/opts-common.cc
index 4a2dff243b0..2d1e86ff94f 100644
--- a/gcc/opts-common.cc
+++ b/gcc/opts-common.cc
@@ -1152,6 +1152,7 @@ prune_options (struct cl_decoded_option **decoded_options,
   unsigned int options_to_prepend = 0;
   unsigned int Wcomplain_wrong_lang_idx = 0;
   unsigned int fdiagnostics_color_idx = 0;
+  unsigned int fdiagnostics_urls_idx = 0;
 
   /* Remove arguments which are negated by others after them.  */
   new_decoded_options_count = 0;
@@ -1185,6 +1186,12 @@ prune_options (struct cl_decoded_option **decoded_options,
             ++options_to_prepend;
           fdiagnostics_color_idx = i;
           continue;
+        case OPT_fdiagnostics_urls_:
+          gcc_checking_assert (i != 0);
+          if (fdiagnostics_urls_idx == 0)
+            ++options_to_prepend;
+          fdiagnostics_urls_idx = i;
+          continue;
 
         default:
           gcc_assert (opt_idx < cl_options_count);
@@ -1248,6 +1255,12 @@ keep:
         = old_decoded_options[fdiagnostics_color_idx];
       new_decoded_options_count++;
     }
+  if (fdiagnostics_urls_idx != 0)
+    {
+      new_decoded_options[argv_0 + options_prepended++]
+        = old_decoded_options[fdiagnostics_urls_idx];
+      new_decoded_options_count++;
+    }
   gcc_checking_assert (options_to_prepend == options_prepended);
 }
 
-- 
2.45.0



[PATCH] rs6000: Adjust -fpatchable-function-entry* support for dual entry [PR112980]

2024-05-07 Kewen.Lin
Hi,

As discussed in PR112980, although the current implementation
of -fpatchable-function-entry* conforms to the documentation
(which makes the N NOPs consecutive), it's inefficient for
both kernel and userspace livepatching (see the comments in
the PR for details).

So this patch changes the current implementation to emit
the "before" NOPs before the global entry point and the
"after" NOPs after the local entry point.  With the new
behavior the NOPs are no longer consecutive, so the
documentation is updated to emphasize this.
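A hypothetical C sketch of the new NOP placement for -fpatchable-function-entry=N,M on an ELFv2 function with dual entry: M NOPs go before the global entry label and N - M NOPs after .localentry, following the patch_area_size > patch_area_entry check in the diff. The struct and function names are illustrative, the two-instruction global entry sequence is printed only schematically, and the sketch assumes M <= N.

```c
#include <stdio.h>

struct nop_split
{
  unsigned before_global_entry;  /* "before" NOPs: ahead of the label */
  unsigned after_local_entry;    /* "after" NOPs: after .localentry */
};

/* Assumes n_before <= n_total.  */
struct nop_split
split_patch_area (unsigned n_total, unsigned n_before)
{
  struct nop_split s = { n_before, 0 };
  /* Mirrors the "patch_area_size > patch_area_entry" check.  */
  if (n_total > n_before)
    s.after_local_entry = n_total - n_before;
  return s;
}

/* Print a schematic layout of the function start.  */
void
print_layout (FILE *f, const char *fn, unsigned n_total, unsigned n_before)
{
  struct nop_split s = split_patch_area (n_total, n_before);

  for (unsigned i = 0; i < s.before_global_entry; i++)
    fputs ("\tnop\n", f);
  fprintf (f, "%s:\n", fn);                     /* global entry point */
  fprintf (f, "\taddis 2,12,.TOC.-%s@ha\n", fn);
  fprintf (f, "\taddi 2,2,.TOC.-%s@l\n", fn);
  fprintf (f, "\t.localentry\t%s,.-%s\n", fn, fn);
  for (unsigned i = 0; i < s.after_local_entry; i++)
    fputs ("\tnop\n", f);
}
```

For example, N=5, M=2 gives two NOPs before foo: and three after the .localentry directive. Under the old scheme M was restricted to 2, 6 or 14, and those NOPs sat between the two entry points.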

Bootstrapped and regress-tested on powerpc64-linux-gnu
P8/P9 and powerpc64le-linux-gnu P9 and P10.

Is it ok for trunk?  And backporting to active branches
after burn-in time?  I guess we should also mention this
change in changes.html?

BR,
Kewen
-
PR target/112980

gcc/ChangeLog:

* config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
Adjust the handling of patch area emission with dual entry: remove
the restriction on the "before" NOPs count, and emit only the "after"
NOPs here instead of the "before" NOPs.
* config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
Adjust by respecting cfun->machine->stop_patch_area_print.
(rs6000_elf_declare_function_name): For ELFv2 with dual entry, set
cfun->machine->stop_patch_area_print to true.
* config/rs6000/rs6000.h (struct machine_function): Remove member
global_entry_emitted, add new member stop_patch_area_print.
* doc/invoke.texi (option -fpatchable-function-entry): Adjust the
documentation for PowerPC ELFv2 dual entry.

gcc/testsuite/ChangeLog:

* c-c++-common/patchable_function_entry-default.c: Adjust.
* gcc.target/powerpc/pr99888-4.c: Likewise.
* gcc.target/powerpc/pr99888-5.c: Likewise.
* gcc.target/powerpc/pr99888-6.c: Likewise.
---
 gcc/config/rs6000/rs6000-logue.cc | 40 +--
 gcc/config/rs6000/rs6000.cc   | 15 +--
 gcc/config/rs6000/rs6000.h| 10 +++--
 gcc/doc/invoke.texi   |  8 ++--
 .../patchable_function_entry-default.c|  3 --
 gcc/testsuite/gcc.target/powerpc/pr99888-4.c  |  4 +-
 gcc/testsuite/gcc.target/powerpc/pr99888-5.c  |  4 +-
 gcc/testsuite/gcc.target/powerpc/pr99888-6.c  |  4 +-
 8 files changed, 33 insertions(+), 55 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-logue.cc b/gcc/config/rs6000/rs6000-logue.cc
index 60ba15a8bc3..0eb019b44b3 100644
--- a/gcc/config/rs6000/rs6000-logue.cc
+++ b/gcc/config/rs6000/rs6000-logue.cc
@@ -4006,43 +4006,21 @@ rs6000_output_function_prologue (FILE *file)
           fprintf (file, "\tadd 2,2,12\n");
         }

-      unsigned short patch_area_size = crtl->patch_area_size;
-      unsigned short patch_area_entry = crtl->patch_area_entry;
-      /* Need to emit the patching area.  */
-      if (patch_area_size > 0)
-        {
-          cfun->machine->global_entry_emitted = true;
-          /* As ELFv2 ABI shows, the allowable bytes between the global
-             and local entry points are 0, 4, 8, 16, 32 and 64 when
-             there is a local entry point.  Considering there are two
-             non-prefixed instructions for global entry point prologue
-             (8 bytes), the count for patchable nops before local entry
-             point would be 2, 6 and 14.  It's possible to support those
-             other counts of nops by not making a local entry point, but
-             we don't have clear use cases for them, so leave them
-             unsupported for now.  */
-          if (patch_area_entry > 0)
-            {
-              if (patch_area_entry != 2
-                  && patch_area_entry != 6
-                  && patch_area_entry != 14)
-                error ("unsupported number of nops before function entry (%u)",
-                       patch_area_entry);
-              rs6000_print_patchable_function_entry (file, patch_area_entry,
-                                                     true);
-              patch_area_size -= patch_area_entry;
-            }
-        }
-
       fputs ("\t.localentry\t", file);
       assemble_name (file, name);
       fputs (",.-", file);
       assemble_name (file, name);
       fputs ("\n", file);
       /* Emit the nops after local entry.  */
-      if (patch_area_size > 0)
-        rs6000_print_patchable_function_entry (file, patch_area_size,
-                                               patch_area_entry == 0);
+      unsigned short patch_area_size = crtl->patch_area_size;
+      unsigned short patch_area_entry = crtl->patch_area_entry;
+      if (patch_area_size > patch_area_entry)
+        {
+          cfun->machine->stop_patch_area_print = false;
+          patch_area_size -= patch_area_entry;
+          rs6000_print_patchable_function_entry (file, patch_area_size,
+                                                 patch_area_entry == 0);
+        }
     }

   else if (rs6000_pcrel_p ())
diff --git 

[PATCH 3/3] RISC-V: Add memset-zero expansion to cbo.zero

2024-05-07 Christoph Müllner
The Zicboz extension offers the cbo.zero instruction, which can be used
to zero the memory region corresponding to a cache block.
The Zic64b extension defines the cache block size as 64 bytes.
If both extensions are available, cbo.zero can be used to clear
memory when the alignment and size constraints are met.
This patch implements this.
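The conditions checked by riscv_expand_block_clear_zicboz_zic64b in this patch (a constant length of at least 64 bytes, at least 64-byte alignment, a straight-line size limit, then 64-byte cbo.zero blocks followed by a clear_by_pieces tail) can be summarized with a small C model. It is only a sketch: the struct and function are hypothetical, and the limit is passed in as parameters because the value of RISCV_MAX_MOVE_BYTES_STRAIGHT is not shown in the patch.

```c
#include <stdbool.h>

#define CBO_BYTES 64u  /* cache block size guaranteed by Zic64b */

struct clear_plan
{
  bool expand;                 /* false: FAIL, fall back to a memset call */
  unsigned long cbo_zero_ops;  /* number of cbo.zero instructions */
  unsigned long tail_bytes;    /* bytes left for clear_by_pieces */
};

/* LENGTH and ALIGN_BYTES describe the destination; MAX_STRAIGHT_BYTES and
   UNITS_PER_WORD stand in for the target macros used by the expander.  */
struct clear_plan
plan_block_clear (unsigned long length, unsigned long align_bytes,
                  unsigned long max_straight_bytes,
                  unsigned long units_per_word)
{
  struct clear_plan p = { false, 0, 0 };
  /* Same formula as in the patch: no loops, so cap the straight-line size.  */
  unsigned long max_bytes = max_straight_bytes / units_per_word * CBO_BYTES;

  if (length < CBO_BYTES || align_bytes < CBO_BYTES || length > max_bytes)
    return p;

  p.expand = true;
  p.cbo_zero_ops = length / CBO_BYTES;  /* one cbo.zero per full block */
  p.tail_bytes = length % CBO_BYTES;    /* remainder cleared by pieces */
  return p;
}
```

With a limit of 64 straight bytes and 8-byte words (arbitrary example values), a 200-byte, 64-byte-aligned clear becomes three cbo.zero operations plus an 8-byte tail. The real expander additionally requires the stored value to be zero and skips the expansion when optimizing for size.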

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_block_clear): New prototype.
* config/riscv/riscv-string.cc (riscv_expand_block_clear_zicboz_zic64b):
New function to expand a block-clear with cbo.zero.
(riscv_expand_block_clear): New RISC-V block-clear expansion function.
* config/riscv/riscv.md (setmem): New setmem expansion.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicboz-zic64-1.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-string.cc  | 59 +++
 gcc/config/riscv/riscv.md | 24 
 .../gcc.target/riscv/cmo-zicboz-zic64-1.c | 43 ++
 4 files changed, 127 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index e5aebf3fc3d..255fd6a0de9 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -189,6 +189,7 @@ rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
 
 /* Routines implemented in riscv-string.c.  */
 extern bool riscv_expand_block_move (rtx, rtx, rtx);
+extern bool riscv_expand_block_clear (rtx, rtx);
 
 /* Information about one CPU we know about.  */
 struct riscv_cpu_info {
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index b09b51d7526..cf92256bc4e 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -787,6 +787,65 @@ riscv_expand_block_move (rtx dest, rtx src, rtx length)
   return false;
 }
 
+/* Expand a block-clear instruction via cbo.zero instructions.  */
+
+static bool
+riscv_expand_block_clear_zicboz_zic64b (rtx dest, rtx length)
+{
+  unsigned HOST_WIDE_INT hwi_length;
+  unsigned HOST_WIDE_INT align;
+  const unsigned HOST_WIDE_INT cbo_bytes = 64;
+
+  gcc_assert (TARGET_ZICBOZ && TARGET_ZIC64B);
+
+  if (!CONST_INT_P (length))
+    return false;
+
+  hwi_length = UINTVAL (length);
+  if (hwi_length < cbo_bytes)
+    return false;
+
+  align = MEM_ALIGN (dest) / BITS_PER_UNIT;
+  if (align < cbo_bytes)
+    return false;
+
+  /* We don't emit loops.  Instead apply move-bytes limitation.  */
+  unsigned HOST_WIDE_INT max_bytes = RISCV_MAX_MOVE_BYTES_STRAIGHT /
+                                     UNITS_PER_WORD * cbo_bytes;
+  if (hwi_length > max_bytes)
+    return false;
+
+  unsigned HOST_WIDE_INT offset = 0;
+  while (offset + cbo_bytes <= hwi_length)
+    {
+      rtx mem = adjust_address (dest, BLKmode, offset);
+      rtx addr = force_reg (Pmode, XEXP (mem, 0));
+      emit_insn (gen_riscv_zero_di (addr));
+      offset += cbo_bytes;
+    }
+
+  if (offset < hwi_length)
+    {
+      rtx mem = adjust_address (dest, BLKmode, offset);
+      clear_by_pieces (mem, hwi_length - offset, align);
+    }
+
+  return true;
+}
+
+bool
+riscv_expand_block_clear (rtx dest, rtx length)
+{
+  /* Only use setmem-zero expansion for Zicboz + Zic64b.  */
+  if (!TARGET_ZICBOZ || !TARGET_ZIC64B)
+    return false;
+
+  if (optimize_function_for_size_p (cfun))
+    return false;
+
+  return riscv_expand_block_clear_zicboz_zic64b (dest, length);
+}
+
 /* --- Vector expanders --- */
 
 namespace riscv_vector {
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index d4676507b45..729c102812c 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2598,6 +2598,30 @@ (define_expand "cpymem"
 FAIL;
 })
 
+;; Fill memory with constant byte.
+;; Argument 0 is the destination
+;; Argument 1 is the length
+;; Argument 2 is the constant byte
+;; Argument 3 is the alignment
+
+(define_expand "setmem"
+  [(parallel [(set (match_operand:BLK 0 "memory_operand")
+                   (match_operand:QI 2 "const_int_operand"))
+              (use (match_operand:P 1 ""))
+              (use (match_operand:SI 3 "const_int_operand"))])]
+  ""
+{
+  /* If value to set is not zero, use the library routine.  */
+  if (operands[2] != const0_rtx)
+    FAIL;
+
+  if (riscv_expand_block_clear (operands[0], operands[1]))
+    DONE;
+  else
+    FAIL;
+})
+
+
 ;; Expand in-line code to clear the instruction cache between operand[0] and
 ;; operand[1].
 (define_expand "clear_cache"
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
new file mode 100644
index 000..c2d79eb7ae6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zic64b_zicboz" { target { rv64 } } } */
+/* { dg-options 

[PATCH 2/3] RISC-V: testsuite: Make cmo tests LTO safe

2024-05-07 Christoph Müllner
Let's add '\t' to the instruction match pattern to avoid false positive
matches when compiling with -flto.
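A short C illustration of why the trailing tab helps, using POSIX regcomp/regexec: in a scan pattern, '.' matches any character, so "cbo.clean" also matches text such as cbo_clean, while "cbo.clean\t" only matches the mnemonic followed by the tab that separates it from its operands. DejaGnu's scan-assembler uses Tcl regular expressions, but '.' behaves the same there. The sample strings are assumptions for illustration; the patch does not say which text produced the extra matches under -flto.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if PATTERN (POSIX extended regex) matches anywhere in TEXT.  */
static bool
matches (const char *pattern, const char *text)
{
  regex_t re;
  bool found;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

A pattern without the tab counts both an instruction line like "\tcbo.clean\t0(a0)" and an identifier like __builtin_riscv_zicbom_cbo_clean; with the tab, only the instruction line matches.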

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicbom-1.c: Add \t to test pattern.
* gcc.target/riscv/cmo-zicbom-2.c: Likewise.
* gcc.target/riscv/cmo-zicbop-1.c: Likewise.
* gcc.target/riscv/cmo-zicbop-2.c: Likewise.
* gcc.target/riscv/cmo-zicboz-1.c: Likewise.
* gcc.target/riscv/cmo-zicboz-2.c: Likewise.

Signed-off-by: Christoph Müllner 
---
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c | 6 +++---
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c | 6 +++---
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c | 6 +++---
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c | 6 +++---
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c | 2 +-
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c | 2 +-
 6 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
index 6341f7874d3..02c38e201fa 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
@@ -24,6 +24,6 @@ void foo3()
 __builtin_riscv_zicbom_cbo_inval((void*)0x111);
 }
 
-/* { dg-final { scan-assembler-times "cbo.clean" 3 } } */
-/* { dg-final { scan-assembler-times "cbo.flush" 3 } } */
-/* { dg-final { scan-assembler-times "cbo.inval" 3 } } */
+/* { dg-final { scan-assembler-times "cbo.clean\t" 3 } } */
+/* { dg-final { scan-assembler-times "cbo.flush\t" 3 } } */
+/* { dg-final { scan-assembler-times "cbo.inval\t" 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
index a04f106c8b0..040b96952bc 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
@@ -24,6 +24,6 @@ void foo3()
 __builtin_riscv_zicbom_cbo_inval((void*)0x111);
 }
 
-/* { dg-final { scan-assembler-times "cbo.clean" 3 } } */
-/* { dg-final { scan-assembler-times "cbo.flush" 3 } } */
-/* { dg-final { scan-assembler-times "cbo.inval" 3 } } */
+/* { dg-final { scan-assembler-times "cbo.clean\t" 3 } } */
+/* { dg-final { scan-assembler-times "cbo.flush\t" 3 } } */
+/* { dg-final { scan-assembler-times "cbo.inval\t" 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
index c5d78c1763d..97181154d85 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
@@ -18,6 +18,6 @@ int foo1()
   return __builtin_riscv_zicbop_cbo_prefetchi(1);
 }
 
-/* { dg-final { scan-assembler-times "prefetch.i" 1 } } */
-/* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
-/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */
+/* { dg-final { scan-assembler-times "prefetch.i\t" 1 } } */
+/* { dg-final { scan-assembler-times "prefetch.r\t" 4 } } */
+/* { dg-final { scan-assembler-times "prefetch.w\t" 4 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
index 6576365b39c..4871a97b21a 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
@@ -18,6 +18,6 @@ int foo1()
   return __builtin_riscv_zicbop_cbo_prefetchi(1);
 }
 
-/* { dg-final { scan-assembler-times "prefetch.i" 1 } } */
-/* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
-/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */ 
+/* { dg-final { scan-assembler-times "prefetch.i\t" 1 } } */
+/* { dg-final { scan-assembler-times "prefetch.r\t" 4 } } */
+/* { dg-final { scan-assembler-times "prefetch.w\t" 4 } } */ 
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
index 5eb78ab94b5..63b8782bf89 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
@@ -10,4 +10,4 @@ void foo1()
 __builtin_riscv_zicboz_cbo_zero((void*)0x121);
 }
 
-/* { dg-final { scan-assembler-times "cbo.zero" 3 } } */ 
+/* { dg-final { scan-assembler-times "cbo.zero\t" 3 } } */ 
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c
index fdc9c719669..cc3bd505ec0 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c
@@ -10,4 +10,4 @@ void foo1()
 __builtin_riscv_zicboz_cbo_zero((void*)0x121);
 }
 
-/* { dg-final { scan-assembler-times "cbo.zero" 3 } } */ 
+/* { dg-final { scan-assembler-times "cbo.zero\t" 3 } } */ 
-- 
2.44.0



[PATCH 1/3] expr: Export clear_by_pieces()

2024-05-07 Christoph Müllner
Make clear_by_pieces() available to other parts of the compiler,
similar to store_by_pieces().

gcc/ChangeLog:

* expr.cc (clear_by_pieces): Remove static from clear_by_pieces.
* expr.h (clear_by_pieces): Add prototype for clear_by_pieces.

Signed-off-by: Christoph Müllner 
---
 gcc/expr.cc | 6 +-
 gcc/expr.h  | 5 +
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index d4414e242cb..eaf86d3d842 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -85,7 +85,6 @@ static void emit_block_move_via_sized_loop (rtx, rtx, rtx, unsigned, unsigned);
 static void emit_block_move_via_oriented_loop (rtx, rtx, rtx, unsigned, unsigned);
 static rtx emit_block_cmp_via_loop (rtx, rtx, rtx, tree, rtx, bool,
                                     unsigned, unsigned);
-static void clear_by_pieces (rtx, unsigned HOST_WIDE_INT, unsigned int);
 static rtx_insn *compress_float_constant (rtx, rtx);
 static rtx get_subtarget (rtx);
 static rtx store_field (rtx, poly_int64, poly_int64, poly_uint64, poly_uint64,
@@ -1832,10 +1831,7 @@ store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
 return to;
 }
 
-/* Generate several move instructions to clear LEN bytes of block TO.  (A MEM
-   rtx with BLKmode).  ALIGN is maximum alignment we can assume.  */
-
-static void
+void
 clear_by_pieces (rtx to, unsigned HOST_WIDE_INT len, unsigned int align)
 {
   if (len == 0)
diff --git a/gcc/expr.h b/gcc/expr.h
index 64956f63029..75181584108 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -245,6 +245,11 @@ extern bool can_store_by_pieces (unsigned HOST_WIDE_INT,
 extern rtx store_by_pieces (rtx, unsigned HOST_WIDE_INT, by_pieces_constfn,
void *, unsigned int, bool, memop_ret);
 
+/* Generate several move instructions to clear LEN bytes of block TO.  (A MEM
+   rtx with BLKmode).  ALIGN is maximum alignment we can assume.  */
+
+extern void clear_by_pieces (rtx, unsigned HOST_WIDE_INT, unsigned int);
+
 /* If can_store_by_pieces passes for worst-case values near MAX_LEN, call
    store_by_pieces within conditionals so as to handle variable LEN efficiently,
    storing VAL, if non-NULL_RTX, or valc instead.  */
-- 
2.44.0



[PATCH 0/3] RISC-V: Add memset-zero expansion with Zicboz+Zic64b

2024-05-07 Christoph Müllner
I mentioned this patchset a few weeks ago in the RISC-V call.
I'm sending it now, as the release is out.

Christoph Müllner (3):
  expr: Export clear_by_pieces()
  RISC-V: testsuite: Make cmo tests LTO safe
  RISC-V: Add memset-zero expansion to cbo.zero

 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-string.cc  | 59 +++
 gcc/config/riscv/riscv.md | 24 
 gcc/expr.cc   |  6 +-
 gcc/expr.h|  5 ++
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c |  6 +-
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c |  6 +-
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c |  6 +-
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c |  6 +-
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c |  2 +-
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c |  2 +-
 .../gcc.target/riscv/cmo-zicboz-zic64-1.c | 43 ++
 12 files changed, 147 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c

-- 
2.44.0



[PATCH 4/4] tree: Remove KFmode workaround [PR112993]

2024-05-07 Kewen.Lin
Hi,

The fix for PR112993 gives KFmode a 128-bit mode precision,
so we no longer need this workaround to fix up the type
precision and can just use the mode precision.  This patch
removes the KFmode workaround.
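The removed workaround computed min_precision = p + ceil_log2 (emax - emin) from the real_format. The C snippet below redoes that arithmetic, assuming the IEEE parameters GCC records in its real_format tables (binary128: p = 113, emin = -16381, emax = 16384, consistent with the emin + emax == 3 assertion). For binary128 this gives 113 + ceil_log2 (32765) = 113 + 15 = 128, so once KFmode's mode precision is 128 the adjustment no longer changes anything. ceil_log2_int is a local stand-in for GCC's own ceil_log2.

```c
#include <assert.h>

/* Smallest k such that 2^k >= x (x >= 1).  */
static int
ceil_log2_int (unsigned long x)
{
  int k = 0;
  while ((1ul << k) < x)
    k++;
  return k;
}

/* Same formula as the removed code in build_common_tree_nodes.  */
int
min_precision (int p, int emin, int emax)
{
  assert (emin + emax == 3);  /* GCC's real_format convention */
  return p + ceil_log2_int ((unsigned long) (emax - emin));
}
```

The same formula yields 64 for binary64 and 32 for binary32, matching their mode sizes, which is why only KFmode (previously at 113) ever needed the fix-up.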

Bootstrapped and regress-tested on:
  - powerpc64-linux-gnu P8/P9 (with ibm128 by default)
  - powerpc64le-linux-gnu P9/P10 (with ibm128 by default)
  - powerpc64le-linux-gnu P9 (with ieee128 by default)

Is it OK for trunk if {1,2}/4 in this series get landed?

BR,
Kewen
-

PR target/112993

gcc/ChangeLog:

* tree.cc (build_common_tree_nodes): Drop the workaround for rs6000
KFmode precision adjustment.
---
 gcc/tree.cc | 9 -
 1 file changed, 9 deletions(-)

diff --git a/gcc/tree.cc b/gcc/tree.cc
index f801712c9dd..f730981ec8b 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -9575,15 +9575,6 @@ build_common_tree_nodes (bool signed_char)
   if (!targetm.floatn_mode (n, extended).exists ())
continue;
   int precision = GET_MODE_PRECISION (mode);
-  /* Work around the rs6000 KFmode having precision 113 not
-128.  */
-  const struct real_format *fmt = REAL_MODE_FORMAT (mode);
-  gcc_assert (fmt->b == 2 && fmt->emin + fmt->emax == 3);
-  int min_precision = fmt->p + ceil_log2 (fmt->emax - fmt->emin);
-  if (!extended)
-   gcc_assert (min_precision == n);
-  if (precision < min_precision)
-   precision = min_precision;
   FLOATN_NX_TYPE_NODE (i) = make_node (REAL_TYPE);
   TYPE_PRECISION (FLOATN_NX_TYPE_NODE (i)) = precision;
   layout_type (FLOATN_NX_TYPE_NODE (i));
--
2.39.1


[PATCH 3/4] ranger: Revert the workaround introduced in PR112788 [PR112993]

2024-05-07 Thread Kewen.Lin
Hi,

This reverts commit r14-6478-gfda8e2f8292a90 "range:
Workaround different type precision between _Float128 and
long double [PR112788]".  The fixes for PR112993 make all
128-bit scalar floating-point modes have the same 128-bit
precision, so this workaround is no longer needed.
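To make the effect of the revert concrete, here is a toy model of the two versions of range_compatible_p.  The struct and the shared mode id are hypothetical stand-ins for the tree accessors (TYPE_SIGN, TYPE_PRECISION, TYPE_MODE) used in the real code.  With the old precisions (128 vs. 127, same mode), only the workaround accepts the pair.  With both at 128, the plain sign-and-precision check is enough.

```cpp
#include <cassert>

// Toy model of the type properties involved; not GCC's tree API.
struct type_sketch
{
  bool is_unsigned;
  int precision;
  int mode;             // hypothetical mode id
  bool is_scalar_float;
};

// Shape of range_compatible_p with the PR112788 workaround.
static bool
compatible_with_workaround (const type_sketch &t1, const type_sketch &t2)
{
  return t1.is_unsigned == t2.is_unsigned
	 && (t1.precision == t2.precision
	     || (t1.is_scalar_float && t1.mode == t2.mode));
}

// Shape after the revert: only sign and precision must match.
static bool
compatible_reverted (const type_sketch &t1, const type_sketch &t2)
{
  return t1.precision == t2.precision && t1.is_unsigned == t2.is_unsigned;
}

const int shared_mode = 1;  // hypothetical id for the mode both types use

// Before PR112993: same mode, type precisions 128 and 127.
const type_sketch float128_old = { false, 128, shared_mode, true };
const type_sketch ldouble_old = { false, 127, shared_mode, true };
// After PR112993: both types have precision 128.
const type_sketch float128_new = { false, 128, shared_mode, true };
const type_sketch ldouble_new = { false, 128, shared_mode, true };
```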

Bootstrapped and regress-tested on:
  - powerpc64-linux-gnu P8/P9 (with ibm128 by default)
  - powerpc64le-linux-gnu P9/P10 (with ibm128 by default)
  - powerpc64le-linux-gnu P9 (with ieee128 by default)

Is it OK for trunk once {1,2}/4 in this series have landed?

BR,
Kewen
-

PR target/112993

gcc/ChangeLog:

* value-range.h (range_compatible_p): Remove the workaround on
different type precision between _Float128 and long double.
---
 gcc/value-range.h | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/gcc/value-range.h b/gcc/value-range.h
index 9531df56988..39de7daf3d9 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -1558,13 +1558,7 @@ range_compatible_p (tree type1, tree type2)
   // types_compatible_p requires conversion in both directions to be useless.
   // GIMPLE only requires a cast one way in order to be compatible.
   // Ranges really only need the sign and precision to be the same.
-  return TYPE_SIGN (type1) == TYPE_SIGN (type2)
-&& (TYPE_PRECISION (type1) == TYPE_PRECISION (type2)
-// FIXME: As PR112788 shows, for now on rs6000 _Float128 has
-// type precision 128 while long double has type precision 127
-// but both have the same mode so their precision is actually
-// the same, workaround it temporarily.
-|| (SCALAR_FLOAT_TYPE_P (type1)
-&& TYPE_MODE (type1) == TYPE_MODE (type2)));
+  return (TYPE_PRECISION (type1) == TYPE_PRECISION (type2)
+ && TYPE_SIGN (type1) == TYPE_SIGN (type2));
 }
 #endif // GCC_VALUE_RANGE_H
--
2.39.1


[PATCH 2/4] fortran: Teach get_real_kind_from_node for Power 128 fp modes [PR112993]

2024-05-07 Thread Kewen.Lin
Hi,

Previously, the effective target fortran_real_c_float128 never
passed on Power, regardless of whether the default 128-bit long
double is ibmlongdouble or ieeelongdouble.  This is because TFmode,
which has precision 127, is always used for kind 16 real, while
the node float128_type_node for c_float128 has type precision 128,
so get_real_kind_from_node can't find a match, as it only compares
gfc_real_kinds[i].mode_precision with the type precision.

With TFmode/IFmode/KFmode changed to have the same mode precision
128, fortran_real_c_float128 now passes with ieeelongdouble enabled
by default, and the test cases guarded by it get tested accordingly.
But with ibmlongdouble enabled by default, since TFmode has precision
128, which is the same as the type precision 128 of
float128_type_node, get_real_kind_from_node considers the kind for
TFmode a match for float128_type_node.  That is wrong, because at
this point TFmode uses the IBM extended format.  So this patch
teaches get_real_kind_from_node to check one more field that
reflects the underlying real format, which avoids the unexpected
match when more than one mode has the same precision.
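The lookup change can be modelled in a few lines.  The kind tables below are hypothetical simplifications (kind, mode precision, significand digits) of what gfc_real_kinds might hold on Power after patch 1/4, where both 128-bit formats report precision 128 and differ only in digits (106 for IBM double-double, 113 for IEEE quad).  A type with precision 128 and 113 digits, like float128_type_node, then no longer matches kind 16 when kind 16 is IBM extended.

```cpp
#include <cassert>

// Hypothetical subset of a gfc_real_kinds entry; kind 0 ends the table.
struct real_kind_sketch
{
  int kind;
  int mode_precision;
  int digits;
};

// Assumed tables: kind 16 is IBM extended in one, IEEE quad in the other.
const real_kind_sketch kinds_ibm[] = {
  { 4, 32, 24 }, { 8, 64, 53 }, { 16, 128, 106 }, { 0, 0, 0 } };
const real_kind_sketch kinds_ieee[] = {
  { 4, 32, 24 }, { 8, 64, 53 }, { 16, 128, 113 }, { 0, 0, 0 } };

// Mirrors the patched loop: precision must match and, for kind 16, the
// format's significand digits (fmt->p) must match too.  -4 means "none".
static int
real_kind_from (const real_kind_sketch *kinds, int type_precision,
		int type_digits)
{
  for (int i = 0; kinds[i].kind != 0; i++)
    if (kinds[i].mode_precision == type_precision)
      {
	if (kinds[i].kind == 16 && type_digits != kinds[i].digits)
	  continue;
	return kinds[i].kind;
      }
  return -4;
}
```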

Bootstrapped and regress-tested on:
  - powerpc64-linux-gnu P8/P9 (with ibm128 by default)
  - powerpc64le-linux-gnu P9/P10 (with ibm128 by default)
  - powerpc64le-linux-gnu P9 (with ieee128 by default)

BR,
Kewen
-
PR target/112993

gcc/fortran/ChangeLog:

* trans-types.cc (get_real_kind_from_node): Consider the case where
more than one mode has the same precision.
---
 gcc/fortran/trans-types.cc | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index 676014e9b98..dd94ef77741 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -183,7 +183,21 @@ get_real_kind_from_node (tree type)

   for (i = 0; gfc_real_kinds[i].kind != 0; i++)
 if (gfc_real_kinds[i].mode_precision == TYPE_PRECISION (type))
-  return gfc_real_kinds[i].kind;
+  {
+   /* On Power, we have three 128-bit scalar floating-point modes
+  and all of their types have 128 bit type precision, so we
+  should check underlying real format details further.  */
+#if defined(HAVE_TFmode) && defined(HAVE_IFmode) && defined(HAVE_KFmode)
+   if (gfc_real_kinds[i].kind == 16)
+ {
+   machine_mode mode = TYPE_MODE (type);
+   const struct real_format *fmt = REAL_MODE_FORMAT (mode);
+   if (fmt->p != gfc_real_kinds[i].digits)
+ continue;
+ }
+#endif
+   return gfc_real_kinds[i].kind;
+  }

   return -4;
 }
--
2.39.1


[PATCH 1/4] rs6000: Make all 128 bit scalar FP modes have 128 bit precision [PR112993]

2024-05-07 Thread Kewen.Lin
Hi,

On rs6000, there are three 128-bit scalar floating-point
modes: TFmode, IFmode and KFmode.  For historical reasons,
we define them with different mode precisions: KFmode 126,
TFmode 127 and IFmode 128.  But in fact all of them should
have the same mode precision 128.  This special setting has
caused some issues, such as the unexpected failures mentioned
in [1], and has also forced us to introduce some workarounds,
such as the one in build_common_tree_nodes for the KFmode
precision 126 and the one in range_compatible_p for the
same-mode but different-precision issue.

This patch makes the three 128-bit scalar floating-point
modes TFmode, IFmode and KFmode have 128-bit mode precision,
and keeps their order the same as before so that the
machine-independent parts of the compiler don't try to widen
IFmode to TFmode.  To let build_common_tree_nodes find the
correct mode for the long double type node, it introduces a
new hook, mode_for_longdouble, which offers the target a way
to specify the mode used for the long double type node.
I previously tried ordering the modes as TF/KF/IF so that the
long double type node would pick up TFmode as expected, but
then we would need to teach some functions not to try
conversions from IF (TF) to KF.  More importantly, we plan to
remove TFmode later and leave only two 128-bit floating-point
modes.  Without such a hook, the first 128-bit mode would be
chosen for the long double type node, and whichever mode comes
first might not be the expected one, since the right choice
depends on the selected long double format.
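The ambiguity that motivates the hook can be shown with a toy model.  Everything here is hypothetical: the mode list, its order (KFmode happening to come first among the 128-bit modes), and the hook signature, where returning MODE_NONE stands for "no preference".  It only illustrates that once all three modes report precision 128, "first mode with precision 128" is decided by list order, and a target hook can override that.

```cpp
#include <cassert>

// Hypothetical mode ids; not GCC's machine_mode enumeration.
enum mode_sketch { MODE_SF, MODE_DF, MODE_KF, MODE_TF, MODE_IF, MODE_NONE };

struct mode_info
{
  mode_sketch mode;
  int precision;
};

// Hypothetical class order; all 128-bit modes now share precision 128.
const mode_info float_modes[] = {
  { MODE_SF, 32 }, { MODE_DF, 64 },
  { MODE_KF, 128 }, { MODE_TF, 128 }, { MODE_IF, 128 } };

// Generic fallback: the first float mode with the requested precision.
static mode_sketch
first_mode_with_precision (int precision)
{
  for (const mode_info &m : float_modes)
    if (m.precision == precision)
      return m.mode;
  return MODE_NONE;
}

// Stand-in for a target hook that prefers TFmode for 128-bit long double.
static mode_sketch
rs6000_like_hook (int long_double_size)
{
  return long_double_size == 128 ? MODE_TF : MODE_NONE;
}

// Use the hook's answer if it has one, else fall back to the generic search.
static mode_sketch
long_double_mode (int long_double_size, mode_sketch (*hook) (int))
{
  mode_sketch m = hook ? hook (long_double_size) : MODE_NONE;
  return m != MODE_NONE ? m : first_mode_with_precision (long_double_size);
}
```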

Function convert_mode_scalar uses sext_optab for conversions
between modes of the same precision if !DECIMAL_FLOAT_MODE_P,
so we only need to support sext_optab for every possible
conversion.  Thus this patch removes some useless trunc optab
support, adds one new sext_optab expander that calls the
common handler rs6000_expand_float128_convert, and unnames two
define_insn_and_splits to avoid conflicts and make them
clearer.  Since the current implementation has no chance of a
KF <-> IF conversion (one of them would already be TF), it
adds two dummy define_expands to assert this.
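The optab selection described above can be sketched as a small decision function.  This is a simplified model, assuming that same-precision conversions use trunc_optab when the source mode is decimal and sext_optab otherwise; it is not a copy of expr.cc.  It shows why rs6000 only needs sext patterns once all three 128-bit modes share the same precision.

```cpp
#include <cassert>

enum conv_optab_sketch { SEXT_OPTAB, TRUNC_OPTAB };

// Narrower source extends, wider source truncates; equal precision uses
// sext_optab unless the source mode is decimal floating point (assumed).
static conv_optab_sketch
choose_conversion_optab (int from_precision, int to_precision,
			 bool from_is_decimal)
{
  if (from_precision < to_precision)
    return SEXT_OPTAB;
  if (from_precision > to_precision)
    return TRUNC_OPTAB;
  return from_is_decimal ? TRUNC_OPTAB : SEXT_OPTAB;
}
```

With KF, TF and IF all at precision 128, every conversion among them takes the last branch with from_is_decimal false, that is, sext_optab.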

Bootstrapped and regress-tested (with patch 2/4) on:
  - powerpc64-linux-gnu P8/P9 (with ibm128 by default)
  - powerpc64le-linux-gnu P9/P10 (with ibm128 by default)
  - powerpc64le-linux-gnu P9 (with ieee128 by default)

Is it OK for trunk (especially the generic code change)?

[1] https://inbox.sourceware.org/gcc-patches/718677e7-614d-7977-312d-05a75e1fd...@linux.ibm.com/

BR,
Kewen
-
PR target/112993

gcc/ChangeLog:

* config/rs6000/rs6000-modes.def (IFmode, KFmode, TFmode): Define
with FLOAT_MODE instead of FRACTIONAL_FLOAT_MODE, don't use special
precisions any more.
(rs6000-modes.h): Remove include.
* config/rs6000/rs6000-modes.h: Remove.
* config/rs6000/rs6000.h (rs6000-modes.h): Remove include.
* config/rs6000/t-rs6000: Remove rs6000-modes.h include.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Replace
all uses of FLOAT_PRECISION_TFmode with 128.
(TARGET_C_MODE_FOR_LONGDOUBLE): New macro.
(rs6000_c_mode_for_longdouble): New hook implementation.
* config/rs6000/rs6000.md (define_expand trunciftf2): Remove.
(define_expand truncifkf2): Remove.
(define_expand trunckftf2): Remove.
(define_expand trunctfif2): Remove.
(define_expand expandtfkf2, expandtfif2): Merge to ...
(define_expand expandtf2): ... this, new.
(define_expand expandiftf2): Merge to ...
(define_expand expandtf2): ... this, new.
(define_expand expandiftf2): Update with assert.
(define_expand expandkfif2): New.
(define_insn_and_split extendkftf2): Rename to  ...
(define_insn_and_split *extendkftf2): ... this.
(define_insn_and_split trunctfkf2): Rename to ...
(define_insn_and_split *extendtfkf2): ... this.
* expr.cc (convert_mode_scalar): Allow same precision conversion
between scalar floating point modes whose underlying format is
ibm_extended_format or ieee_quad_format, and refactor the assertion
with the new lambda function acceptable_same_precision_modes.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_C_MODE_FOR_LONGDOUBLE): New hook.
* target.def (mode_for_longdouble): Likewise.
* targhooks.cc (default_mode_for_longdouble): New hook default
implementation.
* targhooks.h (default_mode_for_longdouble): New hook declaration.
* tree.cc (build_common_tree_nodes): Call newly added hook
targetm.c.mode_for_longdouble.
---
 gcc/config/rs6000/rs6000-modes.def | 31 +
 gcc/config/rs6000/rs6000-modes.h   | 36 ---
 gcc/config/rs6000/rs6000.cc| 18 +---
 gcc/config/rs6000/rs6000.h |  5 ---
 gcc/config/rs6000/rs6000.md| 72 

[PATCH 4/4] RISC-V: Allow by-pieces to do overlapping accesses in block_move_straight

2024-05-07 Thread Christoph Müllner
The current implementation of riscv_block_move_straight() emits a couple
of loads/stores with maximum width (e.g. 8-byte for RV64).
The remainder is handed over to move_by_pieces().
The by-pieces framework utilizes target hooks to decide about the emitted
instructions (e.g. unaligned accesses or overlapping accesses).

Since the current implementation always hands less than one full-width
access (fewer than XLEN bits) to the by-pieces infrastructure, overlapping
memory accesses can never be emitted (the by-pieces code does not know
about any previous instructions that were emitted by the backend).

This patch changes riscv_block_move_straight() so that it stops emitting
full-width loads/stores once fewer than two of them would fit and hands
the remaining data (less than 2*XLEN bits) to the by-pieces framework,
which is sufficient to enable overlapping memory accesses (if the
requirements for them are met).

The changes in the expansion can be seen in the adjustments of the
cpymem-NN-ooo test cases. The changes in the cpymem-NN tests are
caused by the different instruction ordering of the code emitted
by the by-pieces infrastructure, which emits alternating load/store
sequences.
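The effect of the changed loop bound can be checked by hand.  The sketch below reproduces only the two loop conditions from the diff (offset + delta <= length before, offset + 2 * delta <= length after) and reports how many DELTA-byte chunks are emitted and how many bytes are left for move_by_pieces.  With delta = 4 (RV32 word accesses), a 15-byte copy leaves 7 bytes instead of 3, which by-pieces can cover with 4-byte accesses at offsets 8 and 11, consistent with the lw/sw at offset 11 in the adjusted cpymem-32-ooo.c expectations.  The chunk count also equals length / delta - 1, the new XALLOCAVEC size.

```cpp
#include <cassert>

// How a LENGTH-byte copy splits into DELTA-byte chunks plus a remainder.
struct split_sketch
{
  unsigned long chunks;     // DELTA-byte loads/stores emitted directly
  unsigned long remainder;  // bytes handed to move_by_pieces
};

// Old loop condition: offset + delta <= length.
static split_sketch
split_old (unsigned long length, unsigned long delta)
{
  unsigned long offset = 0, i = 0;
  for (; offset + delta <= length; offset += delta, i++)
    ;
  return { i, length - offset };
}

// New loop condition: offset + 2 * delta <= length, so the remainder lies
// in [delta, 2 * delta) whenever length >= 2 * delta.
static split_sketch
split_new (unsigned long length, unsigned long delta)
{
  unsigned long offset = 0, i = 0;
  for (; offset + 2 * delta <= length; offset += delta, i++)
    ;
  return { i, length - offset };
}
```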

gcc/ChangeLog:

* config/riscv/riscv-string.cc (riscv_block_move_straight):
Hand over up to 2xXLEN bytes to move_by_pieces().

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: Adjustments for overlapping
access.
* gcc.target/riscv/cpymem-32.c: Adjustments for code emitted by
by-pieces.
* gcc.target/riscv/cpymem-64-ooo.c: Adjustments for overlapping
access.
* gcc.target/riscv/cpymem-64.c: Adjustments for code emitted by
by-pieces.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv-string.cc   |  6 +++---
 gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c | 16 
 gcc/testsuite/gcc.target/riscv/cpymem-32.c | 10 --
 gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c |  8 
 gcc/testsuite/gcc.target/riscv/cpymem-64.c |  9 +++--
 5 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 8fc082f..38cf60eb9cf 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -630,18 +630,18 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
   delta = bits / BITS_PER_UNIT;
 
   /* Allocate a buffer for the temporary registers.  */
-  regs = XALLOCAVEC (rtx, length / delta);
+  regs = XALLOCAVEC (rtx, length / delta - 1);
 
   /* Load as many BITS-sized chunks as possible.  Use a normal load if
  the source has enough alignment, otherwise use left/right pairs.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
 {
   regs[i] = gen_reg_rtx (mode);
   riscv_emit_move (regs[i], adjust_address (src, mode, offset));
 }
 
   /* Copy the chunks to the destination.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
 riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
 
   /* Mop up any left-over bytes.  */
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
index 947d58c30fa..2a48567353a 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
@@ -91,8 +91,8 @@ COPY_ALIGNED_N(11)
 **...
 **sw\t[at][0-9],0\([at][0-9]\)
 **...
-**lbu\t[at][0-9],14\([at][0-9]\)
-**sb\t[at][0-9],14\([at][0-9]\)
+**lw\t[at][0-9],11\([at][0-9]\)
+**sw\t[at][0-9],11\([at][0-9]\)
 **...
 */
 COPY_N(15)
@@ -104,8 +104,8 @@ COPY_N(15)
 **...
 **sw\t[at][0-9],0\([at][0-9]\)
 **...
-**lbu\t[at][0-9],14\([at][0-9]\)
-**sb\t[at][0-9],14\([at][0-9]\)
+**lw\t[at][0-9],11\([at][0-9]\)
+**sw\t[at][0-9],11\([at][0-9]\)
 **...
 */
 COPY_ALIGNED_N(15)
@@ -117,8 +117,8 @@ COPY_ALIGNED_N(15)
 **...
 **sw\t[at][0-9],20\([at][0-9]\)
 **...
-**lbu\t[at][0-9],26\([at][0-9]\)
-**sb\t[at][0-9],26\([at][0-9]\)
+**lw\t[at][0-9],23\([at][0-9]\)
+**sw\t[at][0-9],23\([at][0-9]\)
 **...
 */
 COPY_N(27)
@@ -130,8 +130,8 @@ COPY_N(27)
 **...
 **sw\t[at][0-9],20\([at][0-9]\)
 **...
-**lbu\t[at][0-9],26\([at][0-9]\)
-**sb\t[at][0-9],26\([at][0-9]\)
+**lw\t[at][0-9],23\([at][0-9]\)
+**sw\t[at][0-9],23\([at][0-9]\)
 **...
 */
 COPY_ALIGNED_N(27)
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32.c b/gcc/testsuite/gcc.target/riscv/cpymem-32.c
index 44ba14a1d51..2030a39ca97 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-32.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32.c
@@ -24,10 +24,10 @@ void copy_aligned_##N (void *to, void *from) \
 **...
 **

[PATCH 3/4] RISC-V: tune: Add setting for overlapping mem ops to tuning struct

2024-05-07 Thread Christoph Müllner
This patch adds the field overlap_op_by_pieces to the struct
riscv_tune_param, which is used by the TARGET_OVERLAP_OP_BY_PIECES_P()
hook. This hook is used by the by-pieces infrastructure to decide
if overlapping memory accesses should be emitted.

The new property is set to false in all tune structs except for
generic-ooo.

The changes in the expansion can be seen in the adjustments of the
cpymem test cases. These tests also reveal a limitation in the
RISC-V cpymem expansion that prevents this optimization as only
by-pieces cpymem expansions emit overlapping memory accesses.
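As an illustration of what an overlapping access buys, the plain C++ sketch below copies any length from 4 to 8 bytes with exactly two 4-byte loads and two 4-byte stores, the second one at offset n - 4.  For n < 8 the two ranges overlap and the middle bytes are simply written twice with the same value, instead of finishing with byte or halfword accesses.  It is a hand-written illustration of the technique, not code generated by GCC, and it assumes that the source and destination buffers themselves do not overlap (as for memcpy).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy N bytes (4 <= N <= 8) using two possibly overlapping 4-byte accesses.
static void
copy_4_to_8 (unsigned char *dst, const unsigned char *src, std::size_t n)
{
  assert (n >= 4 && n <= 8);
  std::uint32_t head, tail;
  // Both loads first, then both stores; a fixed 4-byte memcpy is
  // typically lowered to a single (possibly unaligned) word access.
  std::memcpy (&head, src, 4);
  std::memcpy (&tail, src + n - 4, 4);
  std::memcpy (dst, &head, 4);
  std::memcpy (dst + n - 4, &tail, 4);
}
```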

gcc/ChangeLog:

* config/riscv/riscv.cc (struct riscv_tune_param): New field
overlap_op_by_pieces.
(riscv_overlap_op_by_pieces): New function.
(TARGET_OVERLAP_OP_BY_PIECES_P): Connect to
riscv_overlap_op_by_pieces.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: Adjust for overlapping
access.
* gcc.target/riscv/cpymem-64-ooo.c: Likewise.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.cc | 20 +++
 .../gcc.target/riscv/cpymem-32-ooo.c  | 20 +--
 .../gcc.target/riscv/cpymem-64-ooo.c  | 33 +++
 3 files changed, 40 insertions(+), 33 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 44945d47fd6..793ec3155b9 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -286,6 +286,7 @@ struct riscv_tune_param
   unsigned short memory_cost;
   unsigned short fmv_cost;
   bool slow_unaligned_access;
+  bool overlap_op_by_pieces;
   bool use_divmod_expansion;
   unsigned int fusible_ops;
   const struct cpu_vector_cost *vec_costs;
@@ -425,6 +426,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   5,   /* memory_cost */
   8,   /* fmv_cost */
   true,/* slow_unaligned_access */
+  false,   /* overlap_op_by_pieces */
   false,   /* use_divmod_expansion */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
   NULL,/* vector cost */
@@ -442,6 +444,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   3,   /* memory_cost */
   8,   /* fmv_cost */
   true,/* slow_unaligned_access */
+  false,   /* overlap_op_by_pieces */
   false,   /* use_divmod_expansion */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
   NULL,/* vector cost */
@@ -459,6 +462,7 @@ static const struct riscv_tune_param sifive_p400_tune_info = {
   3,   /* memory_cost */
   4,   /* fmv_cost */
   true,/* slow_unaligned_access */
+  false,   /* overlap_op_by_pieces */
   false,   /* use_divmod_expansion */
   RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI,  /* fusible_ops */
   _vector_cost,/* vector cost */
@@ -476,6 +480,7 @@ static const struct riscv_tune_param sifive_p600_tune_info = {
   3,   /* memory_cost */
   4,   /* fmv_cost */
   true,/* slow_unaligned_access */
+  false,   /* overlap_op_by_pieces */
   false,   /* use_divmod_expansion */
   RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI,  /* fusible_ops */
   _vector_cost,/* vector cost */
@@ -493,6 +498,7 @@ static const struct riscv_tune_param thead_c906_tune_info = {
   5,/* memory_cost */
   8,   /* fmv_cost */
   false,/* slow_unaligned_access */
+  false,   /* overlap_op_by_pieces */
   false,   /* use_divmod_expansion */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
   NULL,/* vector cost */
@@ -510,6 +516,7 @@ static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
   3,   /* memory_cost */
   3,   /* fmv_cost */
   true,/* slow_unaligned_access */
+  false,   /* overlap_op_by_pieces */
   false,   /* use_divmod_expansion */
   RISCV_FUSE_ZEXTW | 

[PATCH 2/4] RISC-V: Allow unaligned accesses in cpymemsi expansion

2024-05-07 Thread Christoph Müllner
The RISC-V cpymemsi expansion is called whenever the by-pieces
infrastructure does not take care of the builtin expansion.
The code emitted by the by-pieces infrastructure may include
unaligned accesses if riscv_slow_unaligned_access_p is false.

The RISC-V cpymemsi expansion is handled via riscv_expand_block_move().
The current implementation of this function does not check
riscv_slow_unaligned_access_p and never emits unaligned accesses.

Since by-pieces emits unaligned accesses, it is reasonable to implement
the same behaviour in the cpymemsi expansion, which is what this patch
does.

The patch checks riscv_slow_unaligned_access_p at the entry and sets
the allowed alignment accordingly. This alignment is then propagated
down to the routines that emit the actual instructions.
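A rough model of the resulting access width follows.  The access_bits formula is the one in the patched riscv_block_move_straight, MAX (BITS_PER_UNIT, MIN (BITS_PER_WORD, align)).  How the entry point picks ALIGN is not visible in the quoted diff, so choose_align is a hypothetical policy: keep the known MEM_ALIGN of source and destination when unaligned access is slow, otherwise allow word-sized accesses.

```cpp
#include <algorithm>
#include <cassert>

const unsigned bits_per_unit_sketch = 8;

// Hypothetical entry-point policy (assumption, not the patch's code).
static unsigned
choose_align (bool slow_unaligned_access, unsigned src_align_bits,
	      unsigned dest_align_bits, unsigned bits_per_word)
{
  if (slow_unaligned_access)
    return std::min (src_align_bits, dest_align_bits);
  return bits_per_word;
}

// Access width used by the straight-line copy for a given ALIGN.
static unsigned
access_bits (unsigned align, unsigned bits_per_word)
{
  return std::max (bits_per_unit_sketch, std::min (bits_per_word, align));
}
```

Under this policy, a byte-aligned source on an RV64 target with slow unaligned access is copied bytewise, while the same copy with fast unaligned access uses 64-bit accesses.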

The changes introduced by this patch can be seen in the adjustments
of the cpymem tests.

gcc/ChangeLog:

* config/riscv/riscv-string.cc (riscv_block_move_straight): Add
parameter align.
(riscv_adjust_block_mem): Replace parameter length by align.
(riscv_block_move_loop): Add parameter align.
(riscv_expand_block_move_scalar): Set alignment properly if the
target has fast unaligned access.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: Adjust for unaligned access.
* gcc.target/riscv/cpymem-64-ooo.c: Likewise.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv-string.cc  | 53 +++
 .../gcc.target/riscv/cpymem-32-ooo.c  | 20 +--
 .../gcc.target/riscv/cpymem-64-ooo.c  | 14 -
 3 files changed, 59 insertions(+), 28 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index b09b51d7526..8fc082f 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -610,11 +610,13 @@ riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align)
   return false;
 }
 
-/* Emit straight-line code to move LENGTH bytes from SRC to DEST.
+/* Emit straight-line code to move LENGTH bytes from SRC to DEST
+   with accesses that are ALIGN bytes aligned.
Assume that the areas do not overlap.  */
 
 static void
-riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
+riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
+  unsigned HOST_WIDE_INT align)
 {
   unsigned HOST_WIDE_INT offset, delta;
   unsigned HOST_WIDE_INT bits;
@@ -622,8 +624,7 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
   enum machine_mode mode;
   rtx *regs;
 
-  bits = MAX (BITS_PER_UNIT,
- MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest))));
+  bits = MAX (BITS_PER_UNIT, MIN (BITS_PER_WORD, align));
 
   mode = mode_for_size (bits, MODE_INT, 0).require ();
   delta = bits / BITS_PER_UNIT;
@@ -648,21 +649,20 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
 {
   src = adjust_address (src, BLKmode, offset);
   dest = adjust_address (dest, BLKmode, offset);
-  move_by_pieces (dest, src, length - offset,
- MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), RETURN_BEGIN);
+  move_by_pieces (dest, src, length - offset, align, RETURN_BEGIN);
 }
 }
 
 /* Helper function for doing a loop-based block operation on memory
-   reference MEM.  Each iteration of the loop will operate on LENGTH
-   bytes of MEM.
+   reference MEM.
 
Create a new base register for use within the loop and point it to
the start of MEM.  Create a new memory reference that uses this
-   register.  Store them in *LOOP_REG and *LOOP_MEM respectively.  */
+   register and has an alignment of ALIGN.  Store them in *LOOP_REG
+   and *LOOP_MEM respectively.  */
 
 static void
-riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT length,
+riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT align,
rtx *loop_reg, rtx *loop_mem)
 {
   *loop_reg = copy_addr_to_reg (XEXP (mem, 0));
@@ -670,15 +670,17 @@ riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT length,
   /* Although the new mem does not refer to a known location,
  it does keep up to LENGTH bytes of alignment.  */
   *loop_mem = change_address (mem, BLKmode, *loop_reg);
-  set_mem_align (*loop_mem, MIN (MEM_ALIGN (mem), length * BITS_PER_UNIT));
+  set_mem_align (*loop_mem, align);
 }
 
 /* Move LENGTH bytes from SRC to DEST using a loop that moves BYTES_PER_ITER
-   bytes at a time.  LENGTH must be at least BYTES_PER_ITER.  Assume that
-   the memory regions do not overlap.  */
+   bytes at a time.  LENGTH must be at least BYTES_PER_ITER.  The alignment
+   of the access can be set by ALIGN.  Assume that the memory regions do not
+   overlap.  */
 
 static void
 riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
+  unsigned HOST_WIDE_INT align,
 

[PATCH 1/4] RISC-V: Add test cases for cpymem expansion

2024-05-07 Thread Christoph Müllner
We have two mechanisms in the RISC-V backend that expand
cpymem pattern: a) by-pieces, b) riscv_expand_block_move()
in riscv-string.cc. The by-pieces framework has higher priority
and emits a sequence of up to 15 instructions
(see use_by_pieces_infrastructure_p() for more details).

As a rule of thumb, by-pieces emits alternating load/store sequences,
while the cpymem expansion in the backend emits a sequence of loads
followed by a sequence of stores.
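The two orderings can be pictured with a tiny generator.  It produces schematic instruction lists for copying N words and is only an illustration of the rule of thumb above, not output captured from GCC.

```cpp
#include <cassert>
#include <string>
#include <vector>

// by-pieces style: each load is immediately followed by its store.
static std::vector<std::string>
by_pieces_order (int n)
{
  std::vector<std::string> seq;
  for (int i = 0; i < n; i++)
    {
      seq.push_back ("load " + std::to_string (i));
      seq.push_back ("store " + std::to_string (i));
    }
  return seq;
}

// Backend straight-line style: all loads into temporaries, then all stores.
static std::vector<std::string>
straight_line_order (int n)
{
  std::vector<std::string> seq;
  for (int i = 0; i < n; i++)
    seq.push_back ("load " + std::to_string (i));
  for (int i = 0; i < n; i++)
    seq.push_back ("store " + std::to_string (i));
  return seq;
}
```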

Let's add some test cases to document the current behaviour
and to have tests to identify regressions.

Signed-off-by: Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: New test.
* gcc.target/riscv/cpymem-32.c: New test.
* gcc.target/riscv/cpymem-64-ooo.c: New test.
* gcc.target/riscv/cpymem-64.c: New test.
---
 .../gcc.target/riscv/cpymem-32-ooo.c  | 131 +
 gcc/testsuite/gcc.target/riscv/cpymem-32.c| 138 ++
 .../gcc.target/riscv/cpymem-64-ooo.c  | 129 
 gcc/testsuite/gcc.target/riscv/cpymem-64.c| 138 ++
 4 files changed, 536 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-64.c

diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
new file mode 100644
index 000..33fb9891d82
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
@@ -0,0 +1,131 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv32 } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -mtune=generic-ooo" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-allow-blank-lines-in-output 1 } */
+
+#define COPY_N(N)  \
+void copy_##N (void *to, void *from)   \
+{  \
+  __builtin_memcpy (to, from, N);  \
+}
+
+#define COPY_ALIGNED_N(N)  \
+void copy_aligned_##N (void *to, void *from)   \
+{  \
+  to = __builtin_assume_aligned(to, sizeof(long)); \
+  from = __builtin_assume_aligned(from, sizeof(long)); \
+  __builtin_memcpy (to, from, N);  \
+}
+
+/*
+**copy_7:
+**...
+**lw\t[at][0-9],0\([at][0-9]\)
+**sw\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],6\([at][0-9]\)
+**sb\t[at][0-9],6\([at][0-9]\)
+**...
+*/
+COPY_N(7)
+
+/*
+**copy_aligned_7:
+**...
+**lw\t[at][0-9],0\([at][0-9]\)
+**sw\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],6\([at][0-9]\)
+**sb\t[at][0-9],6\([at][0-9]\)
+**...
+*/
+COPY_ALIGNED_N(7)
+
+/*
+**copy_8:
+**...
+**lw\ta[0-9],0\(a[0-9]\)
+**sw\ta[0-9],0\(a[0-9]\)
+**...
+*/
+COPY_N(8)
+
+/*
+**copy_aligned_8:
+**...
+**lw\ta[0-9],0\(a[0-9]\)
+**sw\ta[0-9],0\(a[0-9]\)
+**...
+*/
+COPY_ALIGNED_N(8)
+
+/*
+**copy_11:
+**...
+**lbu\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],10\([at][0-9]\)
+**...
+**sb\t[at][0-9],0\([at][0-9]\)
+**...
+**sb\t[at][0-9],10\([at][0-9]\)
+**...
+*/
+COPY_N(11)
+
+/*
+**copy_aligned_11:
+**...
+**lw\t[at][0-9],0\([at][0-9]\)
+**...
+**sw\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],10\([at][0-9]\)
+**sb\t[at][0-9],10\([at][0-9]\)
+**...
+*/
+COPY_ALIGNED_N(11)
+
+/*
+**copy_15:
+**...
+**(call|tail)\tmemcpy
+**...
+*/
+COPY_N(15)
+
+/*
+**copy_aligned_15:
+**...
+**lw\t[at][0-9],0\([at][0-9]\)
+**...
+**sw\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],14\([at][0-9]\)
+**sb\t[at][0-9],14\([at][0-9]\)
+**...
+*/
+COPY_ALIGNED_N(15)
+
+/*
+**copy_27:
+**...
+**(call|tail)\tmemcpy
+**...
+*/
+COPY_N(27)
+
+/*
+**copy_aligned_27:
+**...
+**lw\t[at][0-9],20\([at][0-9]\)
+**...
+**sw\t[at][0-9],20\([at][0-9]\)
+**...
+**lbu\t[at][0-9],26\([at][0-9]\)
+**sb\t[at][0-9],26\([at][0-9]\)
+**...
+*/
+COPY_ALIGNED_N(27)
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32.c b/gcc/testsuite/gcc.target/riscv/cpymem-32.c
new file mode 100644
index 000..44ba14a1d51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32.c
@@ -0,0 +1,138 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv32 } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -mtune=rocket" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-allow-blank-lines-in-output 1 } */
+
+#define COPY_N(N)  \
+void copy_##N (void *to, void 

[PATCH 0/4] RISC-V: Enhance unaligned/overlapping codegen

2024-05-07 Thread Christoph Müllner
I mentioned some improvements for unaligned and overlapping code
generation in the RISC-V call a few weeks ago.  I'm sending these
patches now that the release is out.

Christoph Müllner (4):
  RISC-V: Add test cases for cpymem expansion
  RISC-V: Allow unaligned accesses in cpymemsi expansion
  RISC-V: tune: Add setting for overlapping mem ops to tuning struct
  RISC-V: Allow by-pieces to do overlapping accesses in
block_move_straight

 gcc/config/riscv/riscv-string.cc  |  59 +---
 gcc/config/riscv/riscv.cc |  20 +++
 .../gcc.target/riscv/cpymem-32-ooo.c  | 137 ++
 gcc/testsuite/gcc.target/riscv/cpymem-32.c| 136 +
 .../gcc.target/riscv/cpymem-64-ooo.c  | 130 +
 gcc/testsuite/gcc.target/riscv/cpymem-64.c| 135 +
 6 files changed, 593 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-64.c

-- 
2.44.0



[PATCH v22 20/23] c++: Implement __is_invocable built-in trait

2024-05-07 Thread Ken Matsui
Fixed the reference_wrapper case.  I used non_ref_datum_type to avoid
potentially multiple build_trait_object calls.

-- >8 --

This patch implements a built-in trait for std::is_invocable.
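For readers less familiar with the trait, the cases build_invoke has to distinguish (plain callables, pointers to member functions, pointers to data members, and std::reference_wrapper arguments) follow the INVOKE rules of [func.require].  The examples below use the standard std::is_invocable (C++17), whose answers a compiler built-in such as __is_invocable is meant to reproduce; widget and add are illustrative names only.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

struct widget
{
  int value;
  int get () const { return value; }
};

int add (int a, int b) { return a + b; }

// Plain function type: argument count and types must fit.
static_assert (std::is_invocable<decltype (add), int, int>::value, "");
static_assert (!std::is_invocable<decltype (add), int>::value, "");
// Pointer to member function: the first argument supplies the object.
static_assert (std::is_invocable<int (widget::*) () const,
				 const widget &>::value, "");
// Pointer to data member: invocable with just the object.
static_assert (std::is_invocable<int widget::*, widget &>::value, "");
// std::reference_wrapper is unwrapped when invoking a pointer to member.
static_assert (std::is_invocable<int (widget::*) () const,
				 std::reference_wrapper<widget>>::value, "");
```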

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_invocable.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_INVOCABLE.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.
* cp-tree.h (build_invoke): New function.
* method.cc (build_invoke): New function.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_invocable.
* g++.dg/ext/is_invocable1.C: New test.
* g++.dg/ext/is_invocable2.C: New test.
* g++.dg/ext/is_invocable3.C: New test.
* g++.dg/ext/is_invocable4.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc |   6 +
 gcc/cp/cp-trait.def  |   1 +
 gcc/cp/cp-tree.h |   2 +
 gcc/cp/method.cc | 142 +
 gcc/cp/semantics.cc  |   5 +
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |   3 +
 gcc/testsuite/g++.dg/ext/is_invocable1.C | 349 +++
 gcc/testsuite/g++.dg/ext/is_invocable2.C | 139 +
 gcc/testsuite/g++.dg/ext/is_invocable3.C |  51 
 gcc/testsuite/g++.dg/ext/is_invocable4.C |  33 +++
 10 files changed, 731 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable2.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable3.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable4.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index c28d7bf428e..6d14ef7dcc7 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3792,6 +3792,12 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_FUNCTION:
   inform (loc, "  %qT is not a function", t1);
   break;
+case CPTK_IS_INVOCABLE:
+  if (!t2)
+inform (loc, "  %qT is not invocable", t1);
+  else
+inform (loc, "  %qT is not invocable by %qE", t1, t2);
+  break;
 case CPTK_IS_LAYOUT_COMPATIBLE:
   inform (loc, "  %qT is not layout compatible with %qT", t1, t2);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index b1c875a6e7d..4e420d5390a 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -75,6 +75,7 @@ DEFTRAIT_EXPR (IS_EMPTY, "__is_empty", 1)
 DEFTRAIT_EXPR (IS_ENUM, "__is_enum", 1)
 DEFTRAIT_EXPR (IS_FINAL, "__is_final", 1)
 DEFTRAIT_EXPR (IS_FUNCTION, "__is_function", 1)
+DEFTRAIT_EXPR (IS_INVOCABLE, "__is_invocable", -1)
 DEFTRAIT_EXPR (IS_LAYOUT_COMPATIBLE, "__is_layout_compatible", 2)
 DEFTRAIT_EXPR (IS_LITERAL_TYPE, "__is_literal_type", 1)
 DEFTRAIT_EXPR (IS_MEMBER_FUNCTION_POINTER, "__is_member_function_pointer", 1)
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 52d6841559c..8aa41f7147f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7340,6 +7340,8 @@ extern tree get_copy_assign   (tree);
 extern tree get_default_ctor   (tree);
 extern tree get_dtor   (tree, tsubst_flags_t);
 extern tree build_stub_object  (tree);
+extern tree build_invoke   (tree, const_tree,
+tsubst_flags_t);
 extern tree strip_inheriting_ctors (tree);
 extern tree inherited_ctor_binfo   (tree);
 extern bool base_ctor_omit_inherited_parms (tree);
diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index 08a3d34fb01..1c3233ca5d7 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -1928,6 +1928,148 @@ build_trait_object (tree type)
   return build_stub_object (type);
 }
 
+/* [func.require] Build an expression of INVOKE(FN_TYPE, ARG_TYPES...).  If the
+   given is not invocable, returns error_mark_node.  */
+
+tree
+build_invoke (tree fn_type, const_tree arg_types, tsubst_flags_t complain)
+{
+  if (error_operand_p (fn_type) || error_operand_p (arg_types))
+return error_mark_node;
+
+  gcc_assert (TYPE_P (fn_type));
+  gcc_assert (TREE_CODE (arg_types) == TREE_VEC);
+
+  /* Access check is required to determine if the given is invocable.  */
+  deferring_access_check_sentinel acs (dk_no_deferred);
+
+  /* INVOKE is an unevaluated context.  */
+  cp_unevaluated cp_uneval_guard;
+
+  bool is_ptrdatamem;
+  bool is_ptrmemfunc;
+  if (TREE_CODE (fn_type) == REFERENCE_TYPE)
+{
+  tree non_ref_fn_type = TREE_TYPE (fn_type);
+  is_ptrdatamem = TYPE_PTRDATAMEM_P (non_ref_fn_type);
+  is_ptrmemfunc = TYPE_PTRMEMFUNC_P (non_ref_fn_type);
+
+  /* Dereference fn_type if it is a pointer to member.  */
+  if (is_ptrdatamem || is_ptrmemfunc)
+   fn_type = non_ref_fn_type;
+}
+  else
+{
+  is_ptrdatamem = TYPE_PTRDATAMEM_P (fn_type);
+  is_ptrmemfunc = TYPE_PTRMEMFUNC_P (fn_type);
+}
+
+  if (is_ptrdatamem && 

Re: [PATCH 2/4] df: Add DF_LIVE_SUBREG problem

2024-05-07 Thread 陈硕
Hi Dimitar


I have just sent a patch that addresses these comments.


Some comments:


> Nit: Should have two spaces after the dot, per GNU coding style.  I'd suggest
> to run the contrib/check_GNU_style.py script on your patches.
Do you mean "star" by "dot", i.e. that "/*" should be "/* "?


> These names seem a bit too short for global variables.  Perhaps tuck
> them in a namespace?
>
> Also, since these must remain empty, shouldn't they be declared as const?
>
> namespace df {
>  const bitmap_head empty_bitmap;
>  const subregs_live empty_live;
> }



Would it be better if "namespace df" contained all DF-related code?  As a minor
modification, I added a "df_" prefix to the variables instead.
Also, const seems inappropriate here.  Some functions return these variables
as plain (non-const) pointers, so making them const would break the return
type check, and a const_cast would make the const meaningless.


For more details, see the patch.


regards
Shuo



-- Original --
From: "Dimitar Dimitrov"

[PATCH 1/1] Emit cvtne2ps2bf16 for odd increasing perm in __builtin_shufflevector

2024-05-07 Thread Levy Hsu
Hi All

This patch updates the GCC x86 backend to handle BF16 vector permutations
that select the odd elements in increasing order (1, 3, 5, ...) using the
cvtne2ps2bf16 instruction.  It extends ix86_vectorize_vec_perm_const to
convert BF vectors to HI vectors via subreg, as is already done for HF
vectors.  It also adds a predicate and a define_insn_and_split that match
this selector.
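The selector accepted by the new vcvtne2ps2bf_parallel predicate, and the shuffle it stands for, can be modelled in plain C. This is a minimal sketch under stated assumptions: the permutation indices are given as an int array, and BF16 elements are given as raw uint16_t bit patterns. The helper names is_odd_increasing and shuffle_odd are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if SEL[i] == 2 * i + 1 for every i, i.e. the
   selector is 1, 3, 5, ...  Same check as the loop in the predicate.  */
static bool
is_odd_increasing (const int *sel, size_t n)
{
  assert (sel != NULL || n == 0);
  for (size_t i = 0; i < n; ++i)
    if (sel[i] != (int) (2 * i + 1))
      return false;
  return true;
}

/* Hypothetical model of __builtin_shufflevector (a, b, 1, 3, 5, ...) on
   two N-element vectors: index K < N selects A[K], otherwise B[K - N].  */
static void
shuffle_odd (const uint16_t *a, const uint16_t *b, uint16_t *out, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      size_t k = 2 * i + 1;
      out[i] = k < n ? a[k] : b[k - n];
    }
}
```

The model covers only the element selection. It does not model the round-to-nearest-even conversion from single precision that vcvtne2ps2bf16 itself performs.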

Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?

BRs,
Levy

gcc/ChangeLog:

* config/i386/i386-expand.cc
(ix86_vectorize_vec_perm_const): Convert BF to HI using subreg.
* config/i386/predicates.md
(vcvtne2ps2bf_parallel): New predicate.
* config/i386/sse.md
(hi_cvt_bf, HI_CVT_BF): New mode attributes.
(vpermt2_sepcial_bf16_shuffle_): New define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/i386/vpermt2-special-bf16-shufflue.c: New test.
---
 gcc/config/i386/i386-expand.cc|  4 +--
 gcc/config/i386/predicates.md | 12 +++
 gcc/config/i386/sse.md| 35 +++
 .../i386/vpermt2-special-bf16-shufflue.c  | 27 ++
 4 files changed, 76 insertions(+), 2 deletions(-)
 create mode 100755 gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 2f27bfb484c..e2e1e93f2bb 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -23894,8 +23894,8 @@ ix86_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
   if (GET_MODE_SIZE (vmode) == 64 && !TARGET_EVEX512)
 return false;
 
-  /* For HF mode vector, convert it to HI using subreg.  */
-  if (GET_MODE_INNER (vmode) == HFmode)
+  /* For HF and BF mode vector, convert it to HI using subreg.  */
+  if (GET_MODE_INNER (vmode) == HFmode || GET_MODE_INNER (vmode) == BFmode)
 {
   machine_mode orig_mode = vmode;
   vmode = mode_for_vector (HImode,
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 2a97776fc32..9813739daf7 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -2317,3 +2317,15 @@
 
   return true;
 })
+
+;; Check that each element is odd and incrementally increasing from 1
+(define_predicate "vcvtne2ps2bf_parallel"
+  (and (match_code "const_vector")
+   (match_code "const_int" "a"))
+{
+  for (int i = 0; i < XVECLEN (op, 0); ++i)
+if (INTVAL (XVECEXP (op, 0, i)) != (2 * i + 1))
+  return false;
+  return true;
+})
+
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f57f36ae380..39b52cd00ca 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -31110,3 +31110,38 @@
   "TARGET_AVXVNNIINT16"
   "vpdp\t{%3, %2, %0|%0, %2, %3}"
[(set_attr "prefix" "vex")])
+
+(define_mode_attr hi_cvt_bf
+  [(V8HI "v8bf") (V16HI "v16bf") (V32HI "v32bf")])
+
+(define_mode_attr HI_CVT_BF
+  [(V8HI "V8BF") (V16HI "V16BF") (V32HI "V32BF")])
+
+(define_insn_and_split "vpermt2_sepcial_bf16_shuffle_"
+  [(set (match_operand:VI2_AVX512F 0 "register_operand")
+   (unspec:VI2_AVX512F
+ [(match_operand:VI2_AVX512F 1 "vcvtne2ps2bf_parallel")
+  (match_operand:VI2_AVX512F 2 "register_operand")
+  (match_operand:VI2_AVX512F 3 "nonimmediate_operand")]
+ UNSPEC_VPERMT2))]
+  "TARGET_AVX512VL && TARGET_AVX512BF16 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx op0 = gen_reg_rtx (mode);
+  operands[2] = lowpart_subreg (mode,
+   force_reg (mode, operands[2]),
+   mode);
+  operands[3] = lowpart_subreg (mode,
+   force_reg (mode, operands[3]),
+   mode);
+
+  emit_insn (gen_avx512f_cvtne2ps2bf16_(op0,
+  operands[3],
+  operands[2]));
+  emit_move_insn (operands[0], lowpart_subreg (mode, op0,
+  mode));
+DONE;
+}
+[(set_attr "mode" "")])
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c
new file mode 100755
index 000..5d36c03442b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512bf16 -mavx512vl" } */
+/* { dg-final { scan-assembler-not "vpermi2b" } } */
+/* { dg-final { scan-assembler-times "vcvtne2ps2bf16" 3 } } */
+
+typedef __bf16 v8bf __attribute__((vector_size(16)));
+typedef __bf16 v16bf __attribute__((vector_size(32)));
+typedef __bf16 v32bf __attribute__((vector_size(64)));
+
+v8bf foo0(v8bf a, v8bf b)
+{
+  return __builtin_shufflevector(a, b, 1, 3, 5, 7, 9, 11, 13, 15);
+}
+
+v16bf foo1(v16bf a, v16bf b)
+{
+  return __builtin_shufflevector(a, b, 1, 3, 5, 7, 9, 11, 13, 15,
+ 17, 19, 21, 23, 25, 27, 29, 31);
+}
+
+v32bf foo2(v32bf a, v32bf b)
+{
+  return 

[df-Add-DF_LIVE_SUBREG-problem] df: Add DF_LIVE_SUBREG problem

2024-05-07 Thread shuo . chen
From: Lehua Ding 

This patch adds a new DF problem, named DF_LIVE_SUBREG.  This problem
extends the DF_LR problem and supports tracking the subreg liveness of
multireg pseudos that satisfy the following conditions:

  1. The mode size is greater than its REGMODE_NATURAL_SIZE.
  2. The reg is used in insns via a subreg pattern.
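The two tracking conditions can be written as a simple predicate. This is a minimal sketch that assumes plain byte sizes stand in for the machine mode and its REGMODE_NATURAL_SIZE. The names nblocks and need_track_subreg are hypothetical; they do not reproduce the patch's need_track_subreg_p or get_nblocks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: number of natural-size registers needed to hold
   MODE_SIZE bytes, rounding up.  */
static unsigned
nblocks (unsigned mode_size, unsigned natural_size)
{
  assert (natural_size != 0);
  return (mode_size + natural_size - 1) / natural_size;
}

/* Condition 1: the pseudo spans more than one natural-size register
   (equivalent to MODE_SIZE > NATURAL_SIZE).
   Condition 2: some insn accesses it through a subreg.  */
static bool
need_track_subreg (unsigned mode_size, unsigned natural_size,
                   bool used_via_subreg)
{
  return nblocks (mode_size, natural_size) > 1 && used_via_subreg;
}
```

For example, take a target whose natural register size is 8 bytes. A 16-byte pseudo on that target spans two blocks, so it qualifies once some insn uses it through a subreg.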

The main methods are as follows:

  1. Split the in/out/def/use bitmap fields into full_in/out/def/use and
     partial_in/out/def/use.  If the subreg liveness of a pseudo needs to be
     tracked, the pseudo is recorded in the partial_in/out/def/use fields.
     In addition, the range_in/out/def/use fields record the live range of
     each tracked pseudo.
  2. In the df_live_subreg_finalize function, move a tracked pseudo from
     partial_in/out/def/use to full_in/out/def/use if its live range is
     full, i.e. covers every part of the pseudo.
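The full/partial split and the finalize step can be illustrated with a small model. This is a sketch under two assumptions. First, each pseudo has at most 32 blocks. Second, a live range is a bitmask with one bit per block. The struct live_state and the functions add_range and finalize are hypothetical. They do not correspond to the patch's subregs_live class or its bitmap-based data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-pseudo liveness state.  */
struct live_state
{
  bool full;       /* pseudo is in the full_* set */
  bool partial;    /* pseudo is in the partial_* set */
  uint32_t range;  /* bit I set: block I is live (used while partial) */
};

/* Mark blocks [START, START + COUNT) of a pseudo with NBLOCKS blocks live.  */
static void
add_range (struct live_state *s, unsigned nblocks, unsigned start,
           unsigned count)
{
  assert (nblocks > 0 && nblocks <= 32 && start + count <= nblocks);
  if (s->full || count == 0)
    return;
  uint32_t bits = count == 32 ? UINT32_MAX : (UINT32_C (1) << count) - 1;
  s->range |= bits << start;
  s->partial = true;
}

/* Finalize: promote a partially live pseudo to full when every block
   is live.  */
static void
finalize (struct live_state *s, unsigned nblocks)
{
  uint32_t all = nblocks == 32 ? UINT32_MAX : (UINT32_C (1) << nblocks) - 1;
  if (s->partial && s->range == all)
    {
      s->partial = false;
      s->full = true;
      s->range = 0;
    }
}
```

In this sketch, a two-block pseudo stays partial while only its low block is live. It moves to the full set once the high block is marked live too.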

gcc/ChangeLog:

* Makefile.in: Add subreg-live-range object file.
* df-problems.cc (struct df_live_subreg_problem_data): Private struct
for DF_LIVE_SUBREG problem.
(df_live_subreg_get_bb_info): Get the in/out data of bb regs.
(get_live_subreg_local_bb_info): Get the use/def data of bb regs.
(multireg_p): Check whether the regno is a multireg pseudo.
(need_track_subreg_p): Check whether the regno needs to be tracked.
(init_range): Get the range of a subreg rtx.
(remove_subreg_range): Remove use data for the reg/subreg rtx.
(add_subreg_range_def): Add def data for the reg/subreg rtx.
(add_subreg_range_use): Add use data for the reg/subreg rtx.
(df_live_subreg_free_bb_info): Free basic block df data.
(df_live_subreg_alloc): Allocate and init df data.
(df_live_subreg_reset): Reset the live in/out df data.
(df_live_subreg_bb_local_compute): Compute basic block df data.
(df_live_subreg_local_compute): Compute all basic blocks df data.
(df_live_subreg_init): Init the in/out df data.
(df_live_subreg_check_result): Assert the full and partial df data.
(df_live_subreg_confluence_0): Confluence function for infinite loops.
(df_live_subreg_confluence_n): Confluence function for normal edge.
(df_live_subreg_transfer_function): Transfer function.
(df_live_subreg_finalize): Finalize the all_in/all_out df data.
(df_live_subreg_free): Free the df data.
(df_live_subreg_top_dump): Dump top df data.
(df_live_subreg_bottom_dump): Dump bottom df data.
(df_live_subreg_add_problem): Add the DF_LIVE_SUBREG problem.
* df.h (enum df_problem_id): Add DF_LIVE_SUBREG.
(class subregs_live): Forward declaration.
(class df_live_subreg_local_bb_info): New class for full/partial def/use
df data.
(class df_live_subreg_bb_info): New class for full/partial in/out
df data.
(df_live_subreg): Get the df_live_subreg data.
(df_live_subreg_add_problem): Exported.
(df_live_subreg_finalize): Ditto.
(df_live_subreg_check_result): Ditto.
(multireg_p): Ditto.
(init_range): Ditto.
(add_subreg_range): Ditto.
(remove_subreg_range): Ditto.
(df_get_subreg_live_in): Accessor for the all_in df data.
(df_get_subreg_live_out): Accessor for the all_out df data.
(df_get_subreg_live_full_in): Accessor for the full_in df data.
(df_get_subreg_live_full_out): Accessor for the full_out df data.
(df_get_subreg_live_partial_in): Accessor for the partial_in df data.
(df_get_subreg_live_partial_out): Accessor for the partial_out df data.
(df_get_subreg_live_range_in): Accessor for the range_in df data.
(df_get_subreg_live_range_out): Accessor for the range_out df data.
* regs.h (get_nblocks): Get the number of blocks of a mode.
* sbitmap.cc (bitmap_full_p): New sbitmap predicate.
(bitmap_same_p): Likewise.
(test_full): Test bitmap_full_p.
(test_same): Test bitmap_same_p.
(sbitmap_cc_tests): Add test_full and test_same.
* sbitmap.h (bitmap_full_p): Exported.
(bitmap_same_p): Ditto.
* timevar.def (TV_DF_LIVE_SUBREG): Add DF_LIVE_SUBREG timevar.
* subreg-live-range.cc: New file.
* subreg-live-range.h: New file.
---
 gcc/Makefile.in  |1 +
 gcc/df-problems.cc   | 1179 +-
 gcc/df.h |  158 +
 gcc/regs.h   |5 +
 gcc/sbitmap.cc   |   98 
 gcc/sbitmap.h|2 +
 gcc/subreg-live-range.cc |   53 ++
 gcc/subreg-live-range.h  |  206 +++
 gcc/timevar.def  |1 +
 9 files changed, 1563 insertions(+), 140 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a74761b7ab3..e195238f6ab 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1684,6 +1684,7 @@ OBJS = \
store-motion.o \
 

[PATCH] match: `a CMP nonnegative ? a : ABS` simplified to just `ABS` [PR112392]

2024-05-07 Thread Andrew Pinski
We can optimize `a == nonnegative ? a : ABS`, `a > nonnegative ? a : ABS`
and `a >= nonnegative ? a : ABS` into `ABS`.  This allows removal of
some extra comparisons and conditional moves in some cases.
I don't remember where I found this, but it is simple to add, so
let's add it.
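Why the fold is valid: suppose Y >= 0 and the comparison X == Y, X > Y or X >= Y holds. Then X >= 0, so the selected X equals ABS (X). If the comparison fails, the ABS arm is taken anyway. Below is a brute-force check of that reasoning, written as a sketch with hypothetical function names. The X range is kept small to stay clear of INT_MIN, where abs is undefined.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* abs (INT_MIN) is undefined behaviour, so guard against it.  */
static int
checked_abs (int x)
{
  assert (x != INT_MIN);
  return abs (x);
}

/* The source forms X CMP Y ? X : ABS (X) and X == Y ? Y : ABS (X).  */
static int eq_x (int x, int y) { return x == y ? x : checked_abs (x); }
static int eq_y (int x, int y) { return x == y ? y : checked_abs (x); }
static int gt_x (int x, int y) { return x > y ? x : checked_abs (x); }
static int ge_x (int x, int y) { return x >= y ? x : checked_abs (x); }

/* Count inputs where a form differs from plain ABS (X), with Y ranging
   over the values of an unsigned char (all nonnegative), as in the test.  */
static int
count_mismatches (void)
{
  int bad = 0;
  for (int y = 0; y <= UCHAR_MAX; ++y)
    for (int x = -300; x <= 300; ++x)
      {
        int expected = checked_abs (x);
        bad += eq_x (x, y) != expected;
        bad += eq_y (x, y) != expected;
        bad += gt_x (x, y) != expected;
        bad += ge_x (x, y) != expected;
      }
  return bad;
}
```

The nonnegativity requirement matters. With X = -5 and Y = -10, the `>` form returns -5 rather than 5.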

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Note that I added a secondary pattern for the equality case, since either a
or the nonnegative operand could be used as the result.

PR tree-optimization/112392

gcc/ChangeLog:

* match.pd (`x CMP nonnegative ? x : ABS`): New pattern;
where CMP is ==, > and >=.
(`x CMP nonnegative@y ? y : ABS`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-41.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd   | 15 ++
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c | 34 ++
 2 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 03a03c31233..07e743ae464 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5876,6 +5876,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (convert (absu:utype @0)))
 @3
 
+/* X >  Positive ? X : ABS(X) -> ABS(X) */
+/* X >= Positive ? X : ABS(X) -> ABS(X) */
+/* X == Positive ? X : ABS(X) -> ABS(X) */
+(for cmp (eq gt ge)
+ (simplify
+  (cond (cmp:c @0 tree_expr_nonnegative_p@1) @0 (abs@3 @0))
+  (if (INTEGRAL_TYPE_P (type))
+   @3)))
+
+/* X == Positive ? Positive : ABS(X) -> ABS(X) */
+(simplify
+ (cond (eq:c @0 tree_expr_nonnegative_p@1) @1 (abs@3 @0))
+ (if (INTEGRAL_TYPE_P (type))
+  @3))
+
 /* (X + 1) > Y ? -X : 1 simplifies to X >= Y ? -X : 1 when
X is unsigned, as when X + 1 overflows, X is -1, so -X == 1.  */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
new file mode 100644
index 000..9774e283a7b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-phiopt1" } */
+/* PR tree-optimization/112392 */
+
+int feq_1(int a, unsigned char b)
+{
+  int absb = b;
+  if (a == absb)  return absb;
+  return a > 0 ? a : -a;
+}
+int feq_2(int a, unsigned char b)
+{
+  int absb = b;
+  if (a == absb)  return a;
+  return a > 0 ? a : -a;
+}
+
+int fgt(int a, unsigned char b)
+{
+  int absb = b;
+  if (a > absb)  return a;
+  return a > 0 ? a : -a;
+}
+
+int fge(int a, unsigned char b)
+{
+  int absb = b;
+  if (a >= absb)  return a;
+  return a > 0 ? a : -a;
+}
+
+
+/* { dg-final { scan-tree-dump-not "if " "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "ABS_EXPR <" 4 "phiopt1" } } */
-- 
2.43.0



[PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-07 Thread Levy Hsu
Hi All

This patch introduces a new subroutine in ix86_expand_vec_perm_const_1.
It implements the permutation that swaps each pair of adjacent bytes
(selector 1, 0, 3, 2, 5, 4, ...) of a V16QI vector, or of a V8QI vector
with TARGET_MMX_WITH_SSE.  It uses the three-instruction sequence psrlw,
psllw and por.  The change improves code generation for configurations
supporting SSE2 and addresses the issue reported in PR107563.
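In scalar terms, each 16-bit lane is rotated by 8 bits. On a little-endian target this swaps its two bytes, which is exactly the 1, 0, 3, 2, ... byte permutation. Below is a minimal sketch of one lane and of the reference permutation. The helper names swap_lane and shuffle_pairs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model one 16-bit lane: a logical right shift by 8 (like psrlw) moves
   the high byte down, a left shift by 8 (like psllw) moves the low byte
   up, and an OR (like por) merges them.  */
static uint16_t
swap_lane (uint16_t w)
{
  uint16_t hi_down = (uint16_t) (w >> 8);
  uint16_t lo_up = (uint16_t) (w << 8);
  return (uint16_t) (hi_down | lo_up);
}

/* Reference byte permutation 1, 0, 3, 2, ... on N bytes.  */
static void
shuffle_pairs (const uint8_t *in, uint8_t *out, size_t n)
{
  assert (n % 2 == 0);
  for (size_t i = 0; i < n; i += 2)
    {
      out[i] = in[i + 1];
      out[i + 1] = in[i];
    }
}
```

The right shift must be logical. An arithmetic shift (psraw) copies the sign bit of the high byte into the upper 8 bits. The following OR would then corrupt the byte moved up by the left shift.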

Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?

BRs,
Levy

gcc/ChangeLog:

* config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New
subroutine.
(ix86_expand_vec_perm_const_1): Call expand_vec_perm_psrlw_psllw_por.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr107563.C: New test.
---
 gcc/config/i386/i386-expand.cc   | 64 
 gcc/testsuite/g++.target/i386/pr107563.C | 23 +
 2 files changed, 87 insertions(+)
 create mode 100755 gcc/testsuite/g++.target/i386/pr107563.C

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 2f27bfb484c..2718b0acb87 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -22362,6 +22362,67 @@ expand_vec_perm_2perm_pblendv (struct expand_vec_perm_d *d, bool two_insn)
   return true;
 }
 
+/* A subroutine of ix86_expand_vec_perm_const_1.
+   Implement a permutation with psrlw, psllw and por.
+   It handles case:
+   __builtin_shufflevector (v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
+   __builtin_shufflevector (v,v,1,0,3,2,5,4,7,6); */
+
+static bool
+expand_vec_perm_psrlw_psllw_por (struct expand_vec_perm_d *d)
+{
+  unsigned i;
+  rtx (*gen_shr) (rtx, rtx, rtx);
+  rtx (*gen_shl) (rtx, rtx, rtx);
+  rtx (*gen_or) (rtx, rtx, rtx);
+  machine_mode mode = VOIDmode;
+
+  if (!TARGET_SSE2 || !d->one_operand_p)
+return false;
+
+  switch (d->vmode)
+{
+case E_V8QImode:
+  if (!TARGET_MMX_WITH_SSE)
+   return false;
+  mode = V4HImode;
+  gen_shr = gen_ashrv4hi3;
+  gen_shl = gen_ashlv4hi3;
+  gen_or = gen_iorv4hi3;
+  break;
+case E_V16QImode:
+  mode = V8HImode;
+  gen_shr = gen_vlshrv8hi3;
+  gen_shl = gen_vashlv8hi3;
+  gen_or = gen_iorv8hi3;
+  break;
+default: return false;
+}
+
+  if (!rtx_equal_p (d->op0, d->op1))
+return false;
+
+  for (i = 0; i < d->nelt; i += 2)
+if (d->perm[i] != i + 1 || d->perm[i + 1] != i)
+  return false;
+
+  if (d->testing_p)
+return true;
+
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx tmp2 = gen_reg_rtx (mode);
+  rtx op0 = force_reg (d->vmode, d->op0);
+
+  emit_move_insn (tmp1, lowpart_subreg (mode, op0, d->vmode));
+  emit_move_insn (tmp2, lowpart_subreg (mode, op0, d->vmode));
+  emit_insn (gen_shr (tmp1, tmp1, GEN_INT (8)));
+  emit_insn (gen_shl (tmp2, tmp2, GEN_INT (8)));
+  emit_insn (gen_or (tmp1, tmp1, tmp2));
+  emit_move_insn (d->target, lowpart_subreg (d->vmode, tmp1, mode));
+
+  return true;
+}
+
 /* A subroutine of ix86_expand_vec_perm_const_1.  Implement a V4DF
permutation using two vperm2f128, followed by a vshufpd insn blending
the two vectors together.  */
@@ -23781,6 +23842,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
 
   if (expand_vec_perm_2perm_pblendv (d, false))
 return true;
+
+  if (expand_vec_perm_psrlw_psllw_por (d))
+return true;
 
   /* Try sequences of four instructions.  */
 
diff --git a/gcc/testsuite/g++.target/i386/pr107563.C b/gcc/testsuite/g++.target/i386/pr107563.C
new file mode 100755
index 000..5b0c648e8f1
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr107563.C
@@ -0,0 +1,23 @@
+/* PR target/107563.C */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-std=c++2b -O3 -msse2" } */
+/* { dg-final { scan-assembler-not "movzbl" } } */
+/* { dg-final { scan-assembler-not "salq" } } */
+/* { dg-final { scan-assembler-not "orq" } } */
+/* { dg-final { scan-assembler-not "punpcklqdq" } } */
+/* { dg-final { scan-assembler-times "psllw" 2 } } */
+/* { dg-final { scan-assembler-times "psrlw" 1 } } */
+/* { dg-final { scan-assembler-times "psraw" 1 } } */
+/* { dg-final { scan-assembler-times "por" 2 } } */
+
+using temp_vec_type [[__gnu__::__vector_size__ (16)]] = char;
+void foo (temp_vec_type& v) noexcept
+{
+  v = __builtin_shufflevector(v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
+}
+
+using temp_vec_type2 [[__gnu__::__vector_size__ (8)]] = char;
+void foo2 (temp_vec_type2& v) noexcept
+{
+  v=__builtin_shufflevector(v,v,1,0,3,2,5,4,7,6);
+}
-- 
2.31.1



Re: [PATCH 4/4] lra: Apply DF_LIVE_SUBREG data

2024-05-07 Thread Lehua Ding

Hi Vladimir,

I'll send V3 patches based on these comments.  Note that these four
patches only add subreg liveness tracking and apply it to the IRA and LRA
passes.  Therefore, no performance changes are expected until we support
subreg coalescing.  New patches will follow later to complete the subreg
coalescing functionality.  Supporting subreg coalescing requires support for
subreg copies, i.e. modifying the conflict detection logic.


On 2024/5/2 00:24, Vladimir Makarov wrote:


On 2/3/24 05:50, Lehua Ding wrote:

This patch applies DF_LIVE_SUBREG to the LRA pass.  More changes were made
to LRA than to IRA, since LRA modifies the DF data directly.
The main changes are concentrated in the lra-lives.cc file.

gcc/ChangeLog:

* lra-coalesce.cc (update_live_info): Extend to DF_LIVE_SUBREG.
(lra_coalesce): Ditto.
* lra-constraints.cc (update_ebb_live_info): Ditto.
(get_live_on_other_edges): Ditto.
(inherit_in_ebb): Ditto.
(lra_inheritance): Ditto.
(fix_bb_live_info): Ditto.
(remove_inheritance_pseudos): Ditto.
* lra-int.h (GCC_LRA_INT_H): include subreg-live-range.h
(struct lra_insn_reg): Add op filed to record the corresponding rtx.
* lra-lives.cc (class bb_data_pseudos): Extend the bb_data_pseudos to
include new partial_def/use and range_def/use fileds for DF_LIVE_SUBREG
problem.

Typo "fileds".

(need_track_subreg_p): checking is the regno need to be tracked.
(make_hard_regno_live): switch to live_subreg filed.

The same typo.

(make_hard_regno_dead): Ditto.
(mark_regno_live): Support record subreg liveness.
(mark_regno_dead): Ditto.
(live_trans_fun): Adjust transfer function to support subreg liveness.
(live_con_fun_0): Adjust Confluence function to support subreg liveness.
(live_con_fun_n): Ditto.
(initiate_live_solver): Ditto.
(finish_live_solver): Ditto.
(process_bb_lives): Ditto.
(lra_create_live_ranges_1): Dump subreg liveness.
* lra-remat.cc (dump_candidates_and_remat_bb_data): Switch to
DF_LIVE_SUBREG df data.
(calculate_livein_cands): Ditto.
(do_remat): Ditto.
* lra-spills.cc (spill_pseudos): Ditto.
* lra.cc (new_insn_reg): New argument op.
(add_regs_to_insn_regno_info): Add new argument op.


The patch is ok for me with some minor requests:

You missed a log entry for collect_non_operand_hard_regs.  The log entry for
lra_create_live_ranges_1 is incomplete (at least, it should be "Ditto. ...").


You also changed the signatures of the functions update_live_info,
fix_bb_live_info, mark_regno_live, mark_regno_dead and new_insn_reg but did
not update their function comments.  Outdated comments are even worse
than absent comments.  Please fix them.


Also some variable naming could be improved but it is up to you.

So now you just need approval for the rest of the patches to commit your
work, but they are not in my area of responsibility.


For patches of this size, it is difficult to predict how they will work on
other targets.  I tested your patches on aarch64 and ppc64le.  They seem to
work correctly, but please be prepared to switch them off (it is easy) if
the patches create issues for other targets, until the issues are fixed.


And thank you for your contribution.  Improving GCC performance these
days is a challenging task, as so many people are working on GCC, but you
found such an opportunity and, most importantly, implemented it.





--
Best,
Lehua



Re: [PATCH 2/4] df: Add DF_LIVE_SUBREG problem

2024-05-07 Thread Lehua Ding

Hi Dimitar,


Thanks for helping to review the code!  I will send a V3 patch that
addresses these comments.



Best,

Lehua


On 2024/4/26 04:56, Dimitar Dimitrov wrote:

On Wed, Apr 24, 2024 at 06:05:03PM +0800, Lehua Ding wrote:

This patch adds a new DF problem, named DF_LIVE_SUBREG.  This problem
extends the DF_LR problem and supports tracking the subreg liveness of
multireg pseudos that satisfy the following conditions:

   1. The mode size is greater than its REGMODE_NATURAL_SIZE.
   2. The reg is used in insns via a subreg pattern.

The main methods are as follows:

   1. Split the in/out/def/use bitmap fields into full_in/out/def/use and
      partial_in/out/def/use.  If the subreg liveness of a pseudo needs to be
      tracked, the pseudo is recorded in the partial_in/out/def/use fields.
      In addition, the range_in/out/def/use fields record the live range of
      each tracked pseudo.
   2. In the df_live_subreg_finalize function, move a tracked pseudo from
      partial_in/out/def/use to full_in/out/def/use if its live range is
      full, i.e. covers every part of the pseudo.

Hi Lehua,

I'm not familiar with LRA, so my comments below could be totally off
point.  Please treat them as mild suggestions.


gcc/ChangeLog:

* Makefile.in: Add subreg-live-range object file.
* df-problems.cc (struct df_live_subreg_problem_data): Private struct
for DF_LIVE_SUBREG problem.
(df_live_subreg_get_bb_info): Get the in/out data of bb regs.
(get_live_subreg_local_bb_info): Get the use/def data of bb regs.
(multireg_p): Check whether the regno is a multireg pseudo.
(need_track_subreg_p): Check whether the regno needs to be tracked.
(init_range): Get the range of a subreg rtx.
(remove_subreg_range): Remove use data for the reg/subreg rtx.
(add_subreg_range): Add def/use data for the reg/subreg rtx.
(df_live_subreg_free_bb_info): Free basic block df data.
(df_live_subreg_alloc): Allocate and init df data.
(df_live_subreg_reset): Reset the live in/out df data.
(df_live_subreg_bb_local_compute): Compute basic block df data.
(df_live_subreg_local_compute): Compute all basic blocks df data.
(df_live_subreg_init): Init the in/out df data.
(df_live_subreg_check_result): Assert the full and partial df data.
(df_live_subreg_confluence_0): Confluence function for infinite loops.
(df_live_subreg_confluence_n): Confluence function for normal edge.
(df_live_subreg_transfer_function): Transfer function.
(df_live_subreg_finalize): Finalize the all_in/all_out df data.
(df_live_subreg_free): Free the df data.
(df_live_subreg_top_dump): Dump top df data.
(df_live_subreg_bottom_dump): Dump bottom df data.
(df_live_subreg_add_problem): Add the DF_LIVE_SUBREG problem.
* df.h (enum df_problem_id): Add DF_LIVE_SUBREG.
(class subregs_live): Forward declaration.
(class df_live_subreg_local_bb_info): New class for full/partial def/use
df data.
(class df_live_subreg_bb_info): New class for full/partial in/out
df data.
(df_live_subreg): Get the df_live_subreg data.
(df_live_subreg_add_problem): Exported.
(df_live_subreg_finalize): Ditto.
(df_live_subreg_check_result): Ditto.
(multireg_p): Ditto.
(init_range): Ditto.
(add_subreg_range): Ditto.
(remove_subreg_range): Ditto.
(df_get_subreg_live_in): Accessor for the all_in df data.
(df_get_subreg_live_out): Accessor for the all_out df data.
(df_get_subreg_live_full_in): Accessor for the full_in df data.
(df_get_subreg_live_full_out): Accessor for the full_out df data.
(df_get_subreg_live_partial_in): Accessor for the partial_in df data.
(df_get_subreg_live_partial_out): Accessor for the partial_out df data.
(df_get_subreg_live_range_in): Accessor for the range_in df data.
(df_get_subreg_live_range_out): Accessor for the range_out df data.
* regs.h (get_nblocks): Get the number of blocks of a mode.
* sbitmap.cc (bitmap_full_p): New sbitmap predicate.
(bitmap_same_p): Likewise.
(test_full): Test bitmap_full_p.
(test_same): Test bitmap_same_p.
(sbitmap_cc_tests): Add test_full and test_same.
* sbitmap.h (bitmap_full_p): Exported.
(bitmap_same_p): Ditto.
* timevar.def (TV_DF_LIVE_SUBREG): Add DF_LIVE_SUBREG timevar.
* subreg-live-range.cc: New file.
* subreg-live-range.h: New file.
---
  gcc/Makefile.in  |   1 +
  gcc/df-problems.cc   | 855 ++-
  gcc/df.h | 155 +++
  gcc/regs.h   |   5 +
  gcc/sbitmap.cc   |  98 +
  gcc/sbitmap.h|   2 +
  gcc/subreg-live-range.cc |  53 +++
  gcc/subreg-live-range.h  | 206 ++
  gcc/timevar.def  |   1 +
  9 files changed, 1375 insertions(+), 1 

[PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-07 Thread Levy Hsu
PR target/107563

gcc/ChangeLog:

* config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New
subroutine.
(ix86_expand_vec_perm_const_1): Call expand_vec_perm_psrlw_psllw_por.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr107563.C: New test.
---
 gcc/config/i386/i386-expand.cc   | 64 
 gcc/testsuite/g++.target/i386/pr107563.C | 23 +
 2 files changed, 87 insertions(+)
 create mode 100755 gcc/testsuite/g++.target/i386/pr107563.C

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 2f27bfb484c..2718b0acb87 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -22362,6 +22362,67 @@ expand_vec_perm_2perm_pblendv (struct expand_vec_perm_d *d, bool two_insn)
   return true;
 }
 
+/* A subroutine of ix86_expand_vec_perm_const_1.
+   Implement a permutation with psrlw, psllw and por.
+   It handles case:
+   __builtin_shufflevector (v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
+   __builtin_shufflevector (v,v,1,0,3,2,5,4,7,6); */
+
+static bool
+expand_vec_perm_psrlw_psllw_por (struct expand_vec_perm_d *d)
+{
+  unsigned i;
+  rtx (*gen_shr) (rtx, rtx, rtx);
+  rtx (*gen_shl) (rtx, rtx, rtx);
+  rtx (*gen_or) (rtx, rtx, rtx);
+  machine_mode mode = VOIDmode;
+
+  if (!TARGET_SSE2 || !d->one_operand_p)
+return false;
+
+  switch (d->vmode)
+{
+case E_V8QImode:
+  if (!TARGET_MMX_WITH_SSE)
+   return false;
+  mode = V4HImode;
+  gen_shr = gen_ashrv4hi3;
+  gen_shl = gen_ashlv4hi3;
+  gen_or = gen_iorv4hi3;
+  break;
+case E_V16QImode:
+  mode = V8HImode;
+  gen_shr = gen_vlshrv8hi3;
+  gen_shl = gen_vashlv8hi3;
+  gen_or = gen_iorv8hi3;
+  break;
+default: return false;
+}
+
+  if (!rtx_equal_p (d->op0, d->op1))
+return false;
+
+  for (i = 0; i < d->nelt; i += 2)
+if (d->perm[i] != i + 1 || d->perm[i + 1] != i)
+  return false;
+
+  if (d->testing_p)
+return true;
+
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx tmp2 = gen_reg_rtx (mode);
+  rtx op0 = force_reg (d->vmode, d->op0);
+
+  emit_move_insn (tmp1, lowpart_subreg (mode, op0, d->vmode));
+  emit_move_insn (tmp2, lowpart_subreg (mode, op0, d->vmode));
+  emit_insn (gen_shr (tmp1, tmp1, GEN_INT (8)));
+  emit_insn (gen_shl (tmp2, tmp2, GEN_INT (8)));
+  emit_insn (gen_or (tmp1, tmp1, tmp2));
+  emit_move_insn (d->target, lowpart_subreg (d->vmode, tmp1, mode));
+
+  return true;
+}
+
 /* A subroutine of ix86_expand_vec_perm_const_1.  Implement a V4DF
permutation using two vperm2f128, followed by a vshufpd insn blending
the two vectors together.  */
@@ -23781,6 +23842,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
 
   if (expand_vec_perm_2perm_pblendv (d, false))
 return true;
+
+  if (expand_vec_perm_psrlw_psllw_por (d))
+return true;
 
   /* Try sequences of four instructions.  */
 
diff --git a/gcc/testsuite/g++.target/i386/pr107563.C b/gcc/testsuite/g++.target/i386/pr107563.C
new file mode 100755
index 000..5b0c648e8f1
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr107563.C
@@ -0,0 +1,23 @@
+/* PR target/107563.C */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-std=c++2b -O3 -msse2" } */
+/* { dg-final { scan-assembler-not "movzbl" } } */
+/* { dg-final { scan-assembler-not "salq" } } */
+/* { dg-final { scan-assembler-not "orq" } } */
+/* { dg-final { scan-assembler-not "punpcklqdq" } } */
+/* { dg-final { scan-assembler-times "psllw" 2 } } */
+/* { dg-final { scan-assembler-times "psrlw" 1 } } */
+/* { dg-final { scan-assembler-times "psraw" 1 } } */
+/* { dg-final { scan-assembler-times "por" 2 } } */
+
+using temp_vec_type [[__gnu__::__vector_size__ (16)]] = char;
+void foo (temp_vec_type& v) noexcept
+{
+  v = __builtin_shufflevector(v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
+}
+
+using temp_vec_type2 [[__gnu__::__vector_size__ (8)]] = char;
+void foo2 (temp_vec_type2& v) noexcept
+{
+  v=__builtin_shufflevector(v,v,1,0,3,2,5,4,7,6);
+}
-- 
2.31.1



[COMMITTED] Revert "Revert "testsuite/gcc.target/cris/pr93372-2.c: Handle xpass from combine improvement"" combine improvement

2024-05-07 Thread Hans-Peter Nilsson
> From: Hans-Peter Nilsson 
> Date: Thu, 11 Apr 2024 01:16:32 +0200

I committed this revert of a revert, as r15-311, as the
prerequisite was also revert-reverted, in r15-268.

-- >8 --
This reverts commit 39f81924d88e3cc197fc3df74204c9b5e01e12f7.
---
 gcc/testsuite/gcc.target/cris/pr93372-2.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.target/cris/pr93372-2.c b/gcc/testsuite/gcc.target/cris/pr93372-2.c
index 912069c018d5..2ef6471a990b 100644
--- a/gcc/testsuite/gcc.target/cris/pr93372-2.c
+++ b/gcc/testsuite/gcc.target/cris/pr93372-2.c
@@ -1,19 +1,20 @@
 /* Check that eliminable compare-instructions are eliminated. */
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
-/* { dg-final { scan-assembler-not "\tcmp|\ttest" { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-not "\tnot" { xfail cc0 } } } */
-/* { dg-final { scan-assembler-not "\tlsr" { xfail cc0 } } } */
+/* { dg-final { scan-assembler-not "\tcmp|\ttest" } } */
+/* { dg-final { scan-assembler-not "\tnot" } } */
+/* { dg-final { scan-assembler-not "\tlsr" } } */
+/* We should get just one move, storing the result into *d.  */
+/* { dg-final { scan-assembler-times "\tmove" 1 } } */
 
 int f(int a, int b, int *d)
 {
   int c = a - b;
 
-  /* Whoops!  We get a cmp.d with the original operands here. */
+  /* We used to get a cmp.d with the original operands here. */
   *d = (c == 0);
 
-  /* Whoops!  While we don't get a test.d for the result here for cc0,
- we get a sequence of insns: a move, a "not" and a shift of the
- subtraction-result, where a simple "spl" would have done. */
+  /* We used to get a suboptimal sequence, but now we get the optimal "sge"
+ (a.k.a "spl") re-using flags from the subtraction. */
   return c >= 0;
 }
-- 
2.30.2

brgds, H-P


Ping^3 [PATCH, rs6000] Split TImode for logical operations in expand pass [PR100694]

2024-05-07 Thread HAO CHEN GUI
Hi,
  As it's now stage 1, gently ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611550.html

Gui Haochen
Thanks

On 2023/4/24 13:35, HAO CHEN GUI wrote:
> Hi,
>   Gently ping this:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611550.html
> 
> Thanks
> Gui Haochen
> 
> On 2023/2/20 10:10, HAO CHEN GUI wrote:
>> Hi,
>>   Gently ping this:
>>   https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611550.html
>>
>> Gui Haochen
>> Thanks
>>
>> On 2023/2/8 13:08, HAO CHEN GUI wrote:
>>> Hi,
>>>   Logical operations on TImode are currently split after the reload pass.
>>> Some potential optimizations are missed because the split happens too late.
>>> This patch removes TImode from the "AND", "IOR", "XOR" and "NOT" expanders so
>>> that these logical operations can be split at the expand pass.  The new test
>>> case illustrates the optimization.
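As a rough, hypothetical illustration (not taken from the patch or its new test case), the C functions below perform the kind of 128-bit logical operations that are expanded in TImode on powerpc64. `and128_split` is an invented helper that spells out at the source level what splitting a TImode AND into two 64-bit operations amounts to; splitting early lets later passes see and simplify each half separately. The sketch assumes a 64-bit target where `unsigned __int128` is available.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* A plain 128-bit AND, as the user writes it.  */
static u128 and128 (u128 a, u128 b)
{
  return a & b;
}

/* Hypothetical source-level picture of the split: the TImode AND
   becomes two independent 64-bit ANDs on the low and high halves.  */
static u128 and128_split (u128 a, u128 b)
{
  uint64_t lo = (uint64_t) a & (uint64_t) b;
  uint64_t hi = (uint64_t) (a >> 64) & (uint64_t) (b >> 64);
  return ((u128) hi << 64) | lo;
}

/* An IOR with a zero-extended 64-bit value only changes the low half,
   which is easy to exploit once the operation has been split.  */
static u128 ior_low (u128 a, uint64_t b)
{
  return a | b;
}
```

Both AND variants compute the same value; the difference the patch targets is only in when the compiler performs the split.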
>>>
>>>   Two test cases of pr92398 are merged into one, as all sub-targets generate
>>> the same sequence of instructions with the patch.
>>>
>>>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
>>>
>>> Thanks
>>> Gui Haochen
>>>
>>>
>>> ChangeLog
>>> 2023-02-08  Haochen Gui 
>>>
>>> gcc/
>>> PR target/100694
>>> * config/rs6000/rs6000.md (BOOL_128_V): New mode iterator for 128-bit
>>> vector types.
>>> (and3): Replace BOOL_128 with BOOL_128_V.
>>> (ior3): Likewise.
>>> (xor3): Likewise.
>>> (one_cmpl2 expander): New expander with BOOL_128_V.
>>> (one_cmpl2 insn_and_split): Rename to ...
>>> (*one_cmpl2): ... this.
>>>
>>> gcc/testsuite/
>>> PR target/100694
>>> * gcc.target/powerpc/pr100694.c: New.
>>> * gcc.target/powerpc/pr92398.c: New.
>>> * gcc.target/powerpc/pr92398.h: Remove.
>>> * gcc.target/powerpc/pr92398.p9-.c: Remove.
>>> * gcc.target/powerpc/pr92398.p9+.c: Remove.
>>>
>>>
>>> patch.diff
>>> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>>> index 4bd1dfd3da9..455b7329643 100644
>>> --- a/gcc/config/rs6000/rs6000.md
>>> +++ b/gcc/config/rs6000/rs6000.md
>>> @@ -743,6 +743,15 @@ (define_mode_iterator BOOL_128 [TI
>>>  (V2DF  "TARGET_ALTIVEC")
>>>  (V1TI  "TARGET_ALTIVEC")])
>>>
>>> +;; Mode iterator for logical operations on 128-bit vector types
>>> +(define_mode_iterator BOOL_128_V   [(V16QI "TARGET_ALTIVEC")
>>> +(V8HI  "TARGET_ALTIVEC")
>>> +(V4SI  "TARGET_ALTIVEC")
>>> +(V4SF  "TARGET_ALTIVEC")
>>> +(V2DI  "TARGET_ALTIVEC")
>>> +(V2DF  "TARGET_ALTIVEC")
>>> +(V1TI  "TARGET_ALTIVEC")])
>>> +
>>>  ;; For the GPRs we use 3 constraints for register outputs, two that are the
>>>  ;; same as the output register, and a third where the output register is an
>>>  ;; early clobber, so we don't have to deal with register overlaps.  For the
>>> @@ -7135,23 +7144,23 @@ (define_expand "subti3"
>>>  ;; 128-bit logical operations expanders
>>>
>>>  (define_expand "and3"
>>> -  [(set (match_operand:BOOL_128 0 "vlogical_operand")
>>> -   (and:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
>>> - (match_operand:BOOL_128 2 "vlogical_operand")))]
>>> +  [(set (match_operand:BOOL_128_V 0 "vlogical_operand")
>>> +   (and:BOOL_128_V (match_operand:BOOL_128_V 1 "vlogical_operand")
>>> +   (match_operand:BOOL_128_V 2 "vlogical_operand")))]
>>>""
>>>"")
>>>
>>>  (define_expand "ior3"
>>> -  [(set (match_operand:BOOL_128 0 "vlogical_operand")
>>> -(ior:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
>>> - (match_operand:BOOL_128 2 "vlogical_operand")))]
>>> +  [(set (match_operand:BOOL_128_V 0 "vlogical_operand")
>>> +   (ior:BOOL_128_V (match_operand:BOOL_128_V 1 "vlogical_operand")
>>> +   (match_operand:BOOL_128_V 2 "vlogical_operand")))]
>>>""
>>>"")
>>>
>>>  (define_expand "xor3"
>>> -  [(set (match_operand:BOOL_128 0 "vlogical_operand")
>>> -(xor:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
>>> - (match_operand:BOOL_128 2 "vlogical_operand")))]
>>> +  [(set (match_operand:BOOL_128_V 0 "vlogical_operand")
>>> +   (xor:BOOL_128_V (match_operand:BOOL_128_V 1 "vlogical_operand")
>>> +   (match_operand:BOOL_128_V 2 "vlogical_operand")))]
>>>""
>>>"")
>>>
>>> @@ -7449,7 +7458,14 @@ (define_insn_and_split "*eqv3_internal2"
>>>  (const_string "16")))])
>>>
>>>  ;; 128-bit one's complement
>>> -(define_insn_and_split "one_cmpl2"
>>> +(define_expand "one_cmpl2"
>>> +[(set (match_operand:BOOL_128_V 0 "vlogical_operand" "=")
>>> +   (not:BOOL_128_V
>>> + (match_operand:BOOL_128_V 1 "vlogical_operand" "")))]
>>> +  ""
>>> +  "")
>>> +
>>> +(define_insn_and_split "*one_cmpl2"
>>>[(set (match_operand:BOOL_128 0 

Ping [PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits

2024-05-07 Thread HAO CHEN GUI
Hi,
  Gently ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647533.html

Thanks
Gui Haochen

On 2024/3/18 17:10, HAO CHEN GUI wrote:
> Hi,
>   Gently ping this:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647533.html
> 
> Thanks
> Gui Haochen
> 
> On 2024/3/11 13:41, HAO CHEN GUI wrote:
>> Hi,
>>   This patch tries to fix a problem where a canonical form brings no
>> benefit on a specific target. In the combine pass, the const operand of AND
>> is ANDed with the nonzero bits of the other operand. This is a canonical
>> form, but it brings no benefit on targets that have rotate-and-mask insns:
>> once the mask is truncated, it can no longer match insn conditions that it
>> originally matched. For example, the following insn condition checks the
>> sum of two AND masks. When one of the masks is truncated, the condition
>> no longer holds.
>>
>> (define_insn "*rotlsi3_insert_5"
>>   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
>>  (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
>>  (match_operand:SI 2 "const_int_operand" "n,n"))
>>  (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
>>  (match_operand:SI 4 "const_int_operand" "n,n"]
>>   "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
>>&& UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
>>&& UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
>> ...
>>
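To see why the truncation hurts, the arithmetic below models only the mask part of the *rotlsi3_insert_5 condition (the rs6000_is_valid_mask check is left out) together with the `constop &= nonzero` canonicalization. The helper names are hypothetical; this is a sketch of the arithmetic on 32-bit masks, not of combine itself.

```c
#include <assert.h>
#include <stdint.h>

/* Mask part of the *rotlsi3_insert_5 condition: both masks nonzero and
   mask2 + mask4 + 1 == 0 (mod 2^32), i.e. the masks are complementary.  */
static int insert_5_masks_ok (uint32_t mask2, uint32_t mask4)
{
  return mask2 != 0 && mask4 != 0 && (uint32_t) (mask2 + mask4 + 1) == 0;
}

/* What the canonicalization does to an AND constant: drop every bit
   that is already known to be zero in the other operand.  */
static uint32_t truncate_and_const (uint32_t constop, uint32_t nonzero_bits)
{
  return constop & nonzero_bits;
}
```

If operand 3 is known to fit in 4 bits (nonzero bits 0x0f), the mask 0xff is truncated to 0x0f; the pair 0xffffff00/0x0f no longer sums to -1, so the pattern that matched the original masks stops matching.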
>>   This patch tries to fix the problem by comparing rtx costs. If the other
>> operand (varop) is unchanged and the rtx cost with the new mask is not less
>> than with the original one, the original mask is restored.
>>
>>   I'm not sure whether comparing rtx costs here is appropriate. The outer
>> code is unknown, so I assume it is "SET". Also, the rtx cost might not be
>> accurate. From my understanding, canonical forms should always be beneficial,
>> as they can't be undone in the combine pass. Do we have a perfect solution
>> for this kind of issue? Looking forward to your advice.
>>
>>   Another similar issue with canonical forms: is the widened mode for
>> lshiftrt always good?
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> Combine: Don't truncate const operand of AND if it's no benefits
>>
>> In the combine pass, the canonical form for AND is to turn off all bits in
>> the constant that are known to already be zero.
>>
>>   /* Turn off all bits in the constant that are known to already be zero.
>>      Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
>>      which is tested below.  */
>>
>>   constop &= nonzero;
>>
>> But it doesn't help when the target has rotate-and-mask-insert insns.
>> The AND mask is truncated and loses its information, so it can no longer
>> match the insn conditions.  For example, the following insn condition checks
>> the sum of two AND masks.
>>
>> (define_insn "*rotlsi3_insert_5"
>>   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
>>  (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
>>  (match_operand:SI 2 "const_int_operand" "n,n"))
>>  (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
>>  (match_operand:SI 4 "const_int_operand" "n,n"]
>>   "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
>>&& UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
>>&& UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
>> ...
>>
>> This patch restores the const operand of AND if the other operand is
>> not optimized and the truncated const operand doesn't reduce the rtx cost.
>>
>> gcc/
>>  * combine.cc (simplify_and_const_int_1): Restore the const operand
>>  of AND if varop is not optimized and the rtx cost of the new const
>>  operand is not reduced.
>>
>> gcc/testsuite/
>>  * gcc.target/powerpc/rlwimi-0.c: Reduce the total number of insns and
>>  adjust the number of rotate and mask insns.
>>  * gcc.target/powerpc/rlwimi-1.c: Likewise.
>>  * gcc.target/powerpc/rlwimi-2.c: Likewise.
>>
>> patch.diff
>> diff --git a/gcc/combine.cc b/gcc/combine.cc
>> index a4479f8d836..16ff09ea854 100644
>> --- a/gcc/combine.cc
>> +++ b/gcc/combine.cc
>> @@ -10161,8 +10161,23 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx varop,
>>if (constop == nonzero)
>>  return varop;
>>
>> -  if (varop == orig_varop && constop == orig_constop)
>> -return NULL_RTX;
>> +  if (varop == orig_varop)
>> +{
>> +  if (constop == orig_constop)
>> +return NULL_RTX;
>> +  else
>> +{
>> +  rtx tmp = simplify_gen_binary (AND, mode, varop,
>> + gen_int_mode (constop, mode));
>> +  rtx orig = simplify_gen_binary (AND, mode, varop,
>> +  gen_int_mode (orig_constop, mode));
>> +  if (set_src_cost (tmp, mode, optimize_this_for_speed_p)
>> +  < set_src_cost (orig, mode, optimize_this_for_speed_p))
>> +

[PATCH] i386: Fix some intrinsics without alignment requirements.

2024-05-07 Thread Hu, Lin1
Hi all,

This patch fixes several intrinsics that have no alignment requirement but
still raised runtime errors when given unaligned addresses.
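The core technique in the patch is a scalar typedef carrying `__aligned__ (1)` and `__may_alias__`, used to access memory that may be misaligned. The sketch below copies the `double_u` typedef from the emmintrin.h hunk; the two helper functions are hypothetical and only mirror what the patched `_mm_load_sd`/`_mm_store_sd` do with the scalar element.

```c
#include <assert.h>
#include <string.h>

/* Same shape as the double_u typedef the patch adds to emmintrin.h:
   alignment lowered to 1 byte, and allowed to alias any object.  */
typedef double double_u __attribute__ ((__may_alias__, __aligned__ (1)));

/* Hypothetical helper: scalar load through the unaligned type.  */
static double load_sd_unaligned (const void *p)
{
  return *(const double_u *) p;
}

/* Hypothetical helper: scalar store through the unaligned type.  */
static void store_sd_unaligned (void *p, double v)
{
  *(double_u *) p = v;
}
```

Accessing `buf + 1` of a byte buffer through a plain `double *` would be a misaligned access of the kind `-fsanitize=alignment` reports; through `double_u *`, GCC knows the pointer may be misaligned and emits a suitable load or store.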

Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?

BRs,
Lin

gcc/ChangeLog:

PR target/84508
* config/i386/emmintrin.h
(_mm_load_sd): Remove alignment requirement.
(_mm_store_sd): Ditto.
(_mm_loadh_pd): Ditto.
(_mm_loadl_pd): Ditto.
(_mm_storel_pd): Add alignment requirement.
* config/i386/xmmintrin.h
(_mm_loadh_pi): Remove alignment requirement.
(_mm_loadl_pi): Ditto.
(_mm_load_ss): Ditto.
(_mm_store_ss): Ditto.

gcc/testsuite/ChangeLog:

PR target/84508
* gcc.target/i386/pr84508-1.c: New test.
* gcc.target/i386/pr84508-2.c: Ditto.
---
 gcc/config/i386/emmintrin.h   | 11 ++-
 gcc/config/i386/xmmintrin.h   |  9 +
 gcc/testsuite/gcc.target/i386/pr84508-1.c | 11 +++
 gcc/testsuite/gcc.target/i386/pr84508-2.c | 11 +++
 4 files changed, 33 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr84508-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr84508-2.c

diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
index 915a5234c38..d7fc1af9687 100644
--- a/gcc/config/i386/emmintrin.h
+++ b/gcc/config/i386/emmintrin.h
@@ -56,6 +56,7 @@ typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__));
 /* Unaligned version of the same types.  */
 typedef long long __m128i_u __attribute__ ((__vector_size__ (16), __may_alias__, __aligned__ (1)));
 typedef double __m128d_u __attribute__ ((__vector_size__ (16), __may_alias__, __aligned__ (1)));
+typedef double double_u __attribute__ ((__may_alias__, __aligned__ (1)));
 
 /* Create a selector for use with the SHUFPD instruction.  */
 #define _MM_SHUFFLE2(fp1,fp0) \
@@ -145,7 +146,7 @@ _mm_load1_pd (double const *__P)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_load_sd (double const *__P)
 {
-  return _mm_set_sd (*__P);
+  return __extension__ (__m128d){ *(double_u *)__P, 0.0 };
 }
 
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
@@ -180,7 +181,7 @@ _mm_storeu_pd (double *__P, __m128d __A)
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_store_sd (double *__P, __m128d __A)
 {
-  *__P = ((__v2df)__A)[0];
+  *(double_u *)__P = ((__v2df)__A)[0] ;
 }
 
 extern __inline double __attribute__((__gnu_inline__, __always_inline__, __artificial__))
@@ -192,7 +193,7 @@ _mm_cvtsd_f64 (__m128d __A)
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_storel_pd (double *__P, __m128d __A)
 {
-  _mm_store_sd (__P, __A);
+  *__P = ((__v2df)__A)[0];
 }
 
 /* Stores the upper DPFP value.  */
@@ -973,13 +974,13 @@ _mm_unpacklo_pd (__m128d __A, __m128d __B)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_loadh_pd (__m128d __A, double const *__B)
 {
-  return (__m128d)__builtin_ia32_loadhpd ((__v2df)__A, __B);
+  return __extension__ (__m128d) { ((__v2df)__A)[0], *(double_u*)__B };
 }
 
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_loadl_pd (__m128d __A, double const *__B)
 {
-  return (__m128d)__builtin_ia32_loadlpd ((__v2df)__A, __B);
+  return __extension__ (__m128d) { *(double_u*)__B, ((__v2df)__A)[1] };
 }
 
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
index 71b9955b843..9e20f262839 100644
--- a/gcc/config/i386/xmmintrin.h
+++ b/gcc/config/i386/xmmintrin.h
@@ -73,6 +73,7 @@ typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
 
 /* Unaligned version of the same type.  */
 typedef float __m128_u __attribute__ ((__vector_size__ (16), __may_alias__, __aligned__ (1)));
+typedef float float_u __attribute__ ((__may_alias__, __aligned__ (1)));
 
 /* Internal data types for implementing the intrinsics.  */
 typedef float __v4sf __attribute__ ((__vector_size__ (16)));
@@ -774,7 +775,7 @@ _mm_unpacklo_ps (__m128 __A, __m128 __B)
 /* Sets the upper two SPFP values with 64-bits of data loaded from P;
the lower two values are passed through from A.  */
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_loadh_pi (__m128 __A, __m64 const *__P)
+_mm_loadh_pi (__m128 __A, __m64_u const *__P)
 {
   return (__m128) __builtin_ia32_loadhps ((__v4sf)__A, (const __v2sf *)__P);
 }
@@ -803,7 +804,7 @@ _mm_movelh_ps (__m128 __A, __m128 __B)
 /* Sets the lower two SPFP values with 64-bits of data loaded from P;
the upper two values are passed through from A.  */
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_loadl_pi (__m128 

Re: [PATCH] i386: fix ix86_hardreg_mov_ok with lra_in_progress

2024-05-07 Thread Hongtao Liu
On Mon, May 6, 2024 at 3:40 PM Kong, Lingling  wrote:
>
> Hi,
> Originally, eliminate_regs_in_insn transforms
> (parallel [
>   (set (reg:QI 130)
> (plus:QI (subreg:QI (reg:DI 19 frame) 0)
>   (const_int 96)))
>   (clobber (reg:CC 17 flag))]) {*addqi_1}
> to
> (set (reg:QI 130)
>   (subreg:QI (reg:DI 19 frame) 0)) {*movqi_internal}
> in verify_changes.
>
> But with the No Flags add, it transforms
> (set (reg:QI 5 di)
>   (plus:QI (subreg:QI (reg:DI 19 frame) 0)
>(const_int 96))) {*addqi_1_nf}
> to
> (set (reg:QI 5 di)
>  (subreg:QI (reg:DI 19 frame) 0)) {*addqi_1_nf}.
> There are no extra clobbers at the end, and its dest reg is just a hard reg.
> ix86_hardreg_mov_ok returns false for it, so updating the insn fails, which
> causes the ICE when transforming it into movqi_internal.
>
> But it is actually OK and safe for ix86_hardreg_mov_ok to return true when
> lra_in_progress.
>
> SPEC2017 was also tested, and performance was not affected.
> Bootstrapped and regtested on x86_64-pc-linux-gnu. OK for trunk?
Ok.
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_hardreg_mov_ok): Relax
> hard reg mov restriction when lra in progress.
> ---
>  gcc/config/i386/i386.cc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 4d6b2b98761..ca4348a18bf 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -20357,7 +20357,8 @@ ix86_hardreg_mov_ok (rtx dst, rtx src)
>? standard_sse_constant_p (src, GET_MODE (dst))
>: x86_64_immediate_operand (src, GET_MODE (dst)))
>&& ix86_class_likely_spilled_p (REGNO_REG_CLASS (REGNO (dst)))
> -  && !reload_completed)
> +  && !reload_completed
> +  && !lra_in_progress)
>  return false;
>return true;
>  }
> --
> 2.31.1
>


-- 
BR,
Hongtao


Re: [Patch, rs6000] Enable overlap memory store for block memory clear

2024-05-07 Thread HAO CHEN GUI
Hi,
  As it's now stage 1, gently ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646478.html

Thanks
Gui Haochen

On 2024/2/26 10:25, HAO CHEN GUI wrote:
> Hi,
>   This patch enables overlapping memory stores for block memory clear, which
> reduces the number of store instructions. The expander calls
> widest_fixed_size_mode_for_block_clear to get the mode for the looped block
> clear and calls smallest_fixed_size_mode_for_block_clear to get the mode
> for the last overlapped clear.
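The overlapping-store idea can be sketched in plain C. The planner below is hypothetical and simplified: it assumes power-of-two chunk sizes up to `max_chunk` (16 bytes standing in for V4SImode), ignores alignment, and covers the tail with a single store that ends exactly at the end of the block, so it may re-clear a few bytes. That tail placement is an assumption about the general technique, since the quoted diff is truncated before the tail handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One planned store: clear SIZE bytes starting at OFFSET.  */
struct clear_op
{
  size_t offset;
  size_t size;
};

/* Largest power of two that is <= N, capped at MAX_CHUNK.
   MAX_CHUNK must be a power of two and N must be >= 1.  */
static size_t widest_chunk (size_t n, size_t max_chunk)
{
  size_t s = max_chunk;
  while (s > n)
    s /= 2;
  return s;
}

/* Smallest power of two that is >= N (N >= 1).  */
static size_t smallest_chunk (size_t n)
{
  size_t s = 1;
  while (s < n)
    s *= 2;
  return s;
}

/* Plan the stores needed to clear BYTES bytes.  Full-width stores are
   used while they fit; any tail is covered by one store that ends exactly
   at BYTES and may overlap bytes already cleared.  Stops early if OPS
   (capacity MAX_OPS) fills up.  Returns the number of planned stores.  */
static size_t plan_block_clear (size_t bytes, size_t max_chunk,
                                struct clear_op *ops, size_t max_ops)
{
  size_t n = 0, offset = 0;

  if (bytes == 0)
    return 0;

  size_t size = widest_chunk (bytes, max_chunk);
  while (bytes - offset >= size && n < max_ops)
    {
      ops[n++] = (struct clear_op) { offset, size };
      offset += size;
    }

  size_t rest = bytes - offset;
  if (rest > 0 && n < max_ops)
    {
      /* Here REST < SIZE, so LAST <= SIZE <= BYTES and BYTES - LAST
         cannot underflow.  */
      size_t last = smallest_chunk (rest);
      ops[n++] = (struct clear_op) { bytes - last, last };
    }
  return n;
}

/* Execute a plan on a real buffer, one memset per planned store.  */
static void apply_plan (unsigned char *dst, const struct clear_op *ops,
                        size_t n)
{
  for (size_t i = 0; i < n; i++)
    memset (dst + ops[i].offset, 0, ops[i].size);
}
```

For 23 bytes with 16-byte chunks this plans two stores (16 bytes at offset 0, 8 bytes at offset 15) instead of the four stores (16 + 4 + 2 + 1) that a greedy loop without overlap would need.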
> 
> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk or next stage 1?
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Enable overlap memory store for block memory clear
> 
> gcc/
>   * config/rs6000/rs6000-string.cc
>   (widest_fixed_size_mode_for_block_clear): New.
>   (smallest_fixed_size_mode_for_block_clear): New.
>   (expand_block_clear): Call widest_fixed_size_mode_for_block_clear to
>   get the mode for looped memory stores and call
>   smallest_fixed_size_mode_for_block_clear to get the mode for the last
>   overlapped memory store.
> 
> gcc/testsuite
>   * gcc.target/powerpc/block-clear-1.c: New.
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
> index 133e5382af2..c2a6095a586 100644
> --- a/gcc/config/rs6000/rs6000-string.cc
> +++ b/gcc/config/rs6000/rs6000-string.cc
> @@ -38,6 +38,49 @@
>  #include "profile-count.h"
>  #include "predict.h"
> 
> +/* Return the widest mode which mode size is less than or equal to the
> +   size.  */
> +static fixed_size_mode
> +widest_fixed_size_mode_for_block_clear (unsigned int size, unsigned int align,
> + bool unaligned_vsx_ok)
> +{
> +  machine_mode mode;
> +
> +  if (TARGET_ALTIVEC
> +  && size >= 16
> +  && (align >= 128
> +   || unaligned_vsx_ok))
> +mode = V4SImode;
> +  else if (size >= 8
> +&& TARGET_POWERPC64
> +&& (align >= 64
> +|| !STRICT_ALIGNMENT))
> +mode = DImode;
> +  else if (size >= 4
> +&& (align >= 32
> +|| !STRICT_ALIGNMENT))
> +mode = SImode;
> +  else if (size >= 2
> +&& (align >= 16
> +|| !STRICT_ALIGNMENT))
> +mode = HImode;
> +  else
> +mode = QImode;
> +
> +  return as_a  (mode);
> +}
> +
> +/* Return the smallest mode which mode size is smaller than or equal to
> +   the size.  */
> +static fixed_size_mode
> +smallest_fixed_size_mode_for_block_clear (unsigned int size)
> +{
> +  if (size > UNITS_PER_WORD)
> +return as_a  (V4SImode);
> +
> +  return smallest_int_mode_for_size (size * BITS_PER_UNIT);
> +}
> +
>  /* Expand a block clear operation, and return 1 if successful.  Return 0
> if we should let the compiler generate normal code.
> 
> @@ -55,7 +98,6 @@ expand_block_clear (rtx operands[])
>HOST_WIDE_INT align;
>HOST_WIDE_INT bytes;
>int offset;
> -  int clear_bytes;
>int clear_step;
> 
>/* If this is not a fixed size move, just call memcpy */
> @@ -89,62 +131,36 @@ expand_block_clear (rtx operands[])
> 
>bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);
> 
> -  for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
> +  auto mode = widest_fixed_size_mode_for_block_clear (bytes, align,
> +   unaligned_vsx_ok);
> +  offset = 0;
> +  rtx dest;
> +
> +  do
>  {
> -  machine_mode mode = BLKmode;
> -  rtx dest;
> +  unsigned int size = GET_MODE_SIZE (mode);
> 
> -  if (TARGET_ALTIVEC
> -   && (bytes >= 16 && (align >= 128 || unaligned_vsx_ok)))
> +  while (bytes >= size)
>   {
> -   clear_bytes = 16;
> -   mode = V4SImode;
> - }
> -  else if (bytes >= 8 && TARGET_POWERPC64
> -&& (align >= 64 || !STRICT_ALIGNMENT))
> - {
> -   clear_bytes = 8;
> -   mode = DImode;
> -   if (offset == 0 && align < 64)
> - {
> -   rtx addr;
> +   dest = adjust_address (orig_dest, mode, offset);
> +   emit_move_insn (dest, CONST0_RTX (mode));
> 
> -   /* If the address form is reg+offset with offset not a
> -  multiple of four, reload into reg indirect form here
> -  rather than waiting for reload.  This way we get one
> -  reload, not one per store.  */
> -   addr = XEXP (orig_dest, 0);
> -   if ((GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
> -   && CONST_INT_P (XEXP (addr, 1))
> -   && (INTVAL (XEXP (addr, 1)) & 3) != 0)
> - {
> -   addr = copy_addr_to_reg (addr);
> -   orig_dest = replace_equiv_address (orig_dest, addr);
> - }
> - }
> - }
> -  else if (bytes >= 4 && (align >= 32 || !STRICT_ALIGNMENT))
> - {   /* move 4 bytes */
> -   clear_bytes = 4;
> -   mode = SImode;
> -

[PATCH] vect: generate suitable convert insn for int -> int, float -> float and int <-> float.

2024-05-07 Thread Hu, Lin1
Hi, all

This patch aims to optimize __builtin_convertvector. We want the function
to generate more efficient insns in some situations, such as v2si -> v2di.
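For reference, `__builtin_convertvector` converts a vector element by element, as if each element were cast individually. The sketch below shows the kinds of conversion the patch targets (narrowing and widening int -> int, and int -> float) using GCC generic vector types; the typedef and function names are local to this example. Which instructions are actually emitted (the new tests scan for ones such as vpmovqd) depends on the target options.

```c
#include <assert.h>

typedef long long v2di __attribute__ ((__vector_size__ (16)));
typedef int v2si __attribute__ ((__vector_size__ (8)));
typedef float v2sf __attribute__ ((__vector_size__ (8)));

/* Narrowing int -> int: each 64-bit element is truncated to 32 bits.  */
static v2si narrow_v2di_to_v2si (v2di a)
{
  return __builtin_convertvector (a, v2si);
}

/* Widening int -> int: each 32-bit element is sign-extended to 64 bits.  */
static v2di widen_v2si_to_v2di (v2si a)
{
  return __builtin_convertvector (a, v2di);
}

/* int -> float: each element is converted as if by a cast to float.  */
static v2sf v2si_to_v2sf (v2si a)
{
  return __builtin_convertvector (a, v2sf);
}
```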

The patch has been bootstrapped and regtested on x86_64-pc-linux-gnu, OK for
trunk?

BRs,
Lin

gcc/ChangeLog:

PR target/107432
* tree-vect-generic.cc (expand_vector_conversion): Support
convert for int -> int, float -> float and int <-> float.
(expand_vector_conversion_no_vec_pack): Check whether int <-> int,
float <-> float and int <-> float can be converted directly.
Support indirect conversion when the direct optab is not supported.

gcc/testsuite/ChangeLog:

PR target/107432
* gcc.target/i386/pr107432-1.c: New test.
* gcc.target/i386/pr107432-2.c: Ditto.
* gcc.target/i386/pr107432-3.c: Ditto.
* gcc.target/i386/pr107432-4.c: Ditto.
* gcc.target/i386/pr107432-5.c: Ditto.
* gcc.target/i386/pr107432-6.c: Ditto.
* gcc.target/i386/pr107432-7.c: Ditto.
---
 gcc/testsuite/gcc.target/i386/pr107432-1.c | 234 +
 gcc/testsuite/gcc.target/i386/pr107432-2.c | 105 +
 gcc/testsuite/gcc.target/i386/pr107432-3.c |  55 +
 gcc/testsuite/gcc.target/i386/pr107432-4.c |  56 +
 gcc/testsuite/gcc.target/i386/pr107432-5.c |  72 +++
 gcc/testsuite/gcc.target/i386/pr107432-6.c | 139 
 gcc/testsuite/gcc.target/i386/pr107432-7.c | 156 ++
 gcc/tree-vect-generic.cc   | 107 +-
 8 files changed, 918 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-7.c

diff --git a/gcc/testsuite/gcc.target/i386/pr107432-1.c 
b/gcc/testsuite/gcc.target/i386/pr107432-1.c
new file mode 100644
index 000..a4f37447eb4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr107432-1.c
@@ -0,0 +1,234 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64 -mavx512bw -mavx512vl -O3" } */
+/* { dg-final { scan-assembler-times "vpmovqd" 6 } } */
+/* { dg-final { scan-assembler-times "vpmovqw" 6 } } */
+/* { dg-final { scan-assembler-times "vpmovqb" 6 } } */
+/* { dg-final { scan-assembler-times "vpmovdw" 6 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpmovdw" 8 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpmovdb" 6 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpmovdb" 8 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpmovwb" 8 } } */
+
+#include 
+
+typedef short __v2hi __attribute__ ((__vector_size__ (4)));
+typedef char __v2qi __attribute__ ((__vector_size__ (2)));
+typedef char __v4qi __attribute__ ((__vector_size__ (4)));
+typedef char __v8qi __attribute__ ((__vector_size__ (8)));
+
+typedef unsigned short __v2hu __attribute__ ((__vector_size__ (4)));
+typedef unsigned short __v4hu __attribute__ ((__vector_size__ (8)));
+typedef unsigned char __v2qu __attribute__ ((__vector_size__ (2)));
+typedef unsigned char __v4qu __attribute__ ((__vector_size__ (4)));
+typedef unsigned char __v8qu __attribute__ ((__vector_size__ (8)));
+typedef unsigned int __v2su __attribute__ ((__vector_size__ (8)));
+
+__v2si mm_cvtepi64_epi32_builtin_convertvector(__m128i a)
+{
+  return __builtin_convertvector((__v2di)a, __v2si);
+}
+
+__m128i mm256_cvtepi64_epi32_builtin_convertvector(__m256i a)
+{
+  return (__m128i)__builtin_convertvector((__v4di)a, __v4si);
+}
+
+__m256i mm512_cvtepi64_epi32_builtin_convertvector(__m512i a)
+{
+  return (__m256i)__builtin_convertvector((__v8di)a, __v8si);
+}
+
+__v2hi mm_cvtepi64_epi16_builtin_convertvector(__m128i a)
+{
+  return __builtin_convertvector((__v2di)a, __v2hi);
+}
+
+__v4hi mm256_cvtepi64_epi16_builtin_convertvector(__m256i a)
+{
+  return __builtin_convertvector((__v4di)a, __v4hi);
+}
+
+__m128i mm512_cvtepi64_epi16_builtin_convertvector(__m512i a)
+{
+  return (__m128i)__builtin_convertvector((__v8di)a, __v8hi);
+}
+
+__v2qi mm_cvtepi64_epi8_builtin_convertvector(__m128i a)
+{
+  return __builtin_convertvector((__v2di)a, __v2qi);
+}
+
+__v4qi mm256_cvtepi64_epi8_builtin_convertvector(__m256i a)
+{
+  return __builtin_convertvector((__v4di)a, __v4qi);
+}
+
+__v8qi mm512_cvtepi64_epi8_builtin_convertvector(__m512i a)
+{
+  return __builtin_convertvector((__v8di)a, __v8qi);
+}
+
+__v2hi mm64_cvtepi32_epi16_builtin_convertvector(__v2si a)
+{
+  return __builtin_convertvector((__v2si)a, __v2hi);
+}
+
+__v4hi mm_cvtepi32_epi16_builtin_convertvector(__m128i a)
+{
+  return __builtin_convertvector((__v4si)a, __v4hi);
+}
+
+__m128i

[PATCH v1 1/1] RISC-V: Nan-box the result of movbf on soft-bf16

2024-05-07 Thread Xiao Zeng
1 This patch implements NaN-boxing for bf16.

2 Please refer to the NaN-boxing implementation for hf16 in:


3 The discussion about NaN-boxing can be found at:


4 The following tests pass with this patch:
* The full RISC-V regression test suite.
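NaN-boxing means that a narrower floating-point value held in a wider FP register has all upper bits set to 1, so the register, read as the wider format, is a NaN. The integer steps the patch emits before fmv.w.x (zero-extend the 16-bit payload, then IOR with -65536) can be sketched in C as follows; the helper names are hypothetical and `uint32_t` stands in for an SImode word_mode.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* NaN-box a 16-bit (bf16 or fp16) bit pattern into 32 bits, mirroring
   the zero_extend + ior-with-mask sequence in riscv_legitimize_move.  */
static uint32_t nanbox16_to_32 (uint16_t bits)
{
  uint32_t mask = (uint32_t) -65536; /* 0xffff0000 */
  uint32_t temp = (uint32_t) bits;   /* zero-extend the payload */
  return mask | temp;
}

/* Reinterpret a 32-bit pattern as an IEEE single-precision float.  */
static float as_float (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

For example, bf16 1.0 is 0x3f80; boxed, it becomes 0xffff3f80, whose low 16 bits still hold the original value while the whole word reads as a NaN in single precision.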

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Expand movbf
with Nan-boxing value.
* config/riscv/riscv.md (*movbf_softfloat_boxing): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/_Bfloat16-nanboxing.c: New test.
---
 gcc/config/riscv/riscv.cc | 51 ++-
 gcc/config/riscv/riscv.md | 12 -
 .../gcc.target/riscv/_Bfloat16-nanboxing.c| 38 ++
 3 files changed, 76 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/_Bfloat16-nanboxing.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 545e68566dc..be2cb245733 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3120,35 +3120,38 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx src)
 }
 
   /* In order to fit NaN boxing, expand
- (set FP_REG (reg:HF src))
+ (set FP_REG (reg:HF/BF src))
  to
  (set (reg:SI/DI mask) (const_int -65536)
- (set (reg:SI/DI temp) (zero_extend:SI/DI (subreg:HI (reg:HF src) 0)))
+ (set (reg:SI/DI temp) (zero_extend:SI/DI (subreg:HI (reg:HF/BF src) 0)))
  (set (reg:SI/DI temp) (ior:SI/DI (reg:SI/DI mask) (reg:SI/DI temp)))
- (set (reg:HF dest) (unspec:HF [ (reg:SI/DI temp) ] UNSPEC_FMV_SFP16_X))
+ (set (reg:HF/BF dest) (unspec:HF/BF[ (reg:SI/DI temp) ]
+   UNSPEC_FMV_SFP16_X/UNSPEC_FMV_SBF16_X))
  */
 
- if (TARGET_HARD_FLOAT
- && !TARGET_ZFHMIN && mode == HFmode
- && REG_P (dest) && FP_REG_P (REGNO (dest))
- && REG_P (src) && !FP_REG_P (REGNO (src))
- && can_create_pseudo_p ())
-   {
- rtx mask = force_reg (word_mode, gen_int_mode (-65536, word_mode));
- rtx temp = gen_reg_rtx (word_mode);
- emit_insn (gen_extend_insn (temp,
-simplify_gen_subreg (HImode, src, mode, 0),
-word_mode, HImode, 1));
- if (word_mode == SImode)
-   emit_insn (gen_iorsi3 (temp, mask, temp));
- else
-   emit_insn (gen_iordi3 (temp, mask, temp));
-
- riscv_emit_move (dest, gen_rtx_UNSPEC (HFmode, gen_rtvec (1, temp),
-   UNSPEC_FMV_SFP16_X));
-
- return true;
-   }
+  if (TARGET_HARD_FLOAT
+  && ((!TARGET_ZFHMIN && mode == HFmode)
+ || (!TARGET_ZFBFMIN && mode == BFmode))
+  && REG_P (dest) && FP_REG_P (REGNO (dest)) && REG_P (src)
+  && !FP_REG_P (REGNO (src)) && can_create_pseudo_p ())
+{
+  rtx mask = force_reg (word_mode, gen_int_mode (-65536, word_mode));
+  rtx temp = gen_reg_rtx (word_mode);
+  emit_insn (gen_extend_insn (temp,
+ simplify_gen_subreg (HImode, src, mode, 0),
+ word_mode, HImode, 1));
+  if (word_mode == SImode)
+   emit_insn (gen_iorsi3 (temp, mask, temp));
+  else
+   emit_insn (gen_iordi3 (temp, mask, temp));
+
+  riscv_emit_move (dest,
+  gen_rtx_UNSPEC (mode, gen_rtvec (1, temp),
+  mode == HFmode ? UNSPEC_FMV_SFP16_X
+ : UNSPEC_FMV_SBF16_X));
+
+  return true;
+}
 
   /* We need to deal with constants that would be legitimate
  immediate_operands but aren't legitimate move_operands.  */
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 24558682eb8..236293e2fcd 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -86,8 +86,9 @@
   ;; String unspecs
   UNSPEC_STRLEN
 
-  ;; Workaround for HFmode without hardware extension
+  ;; Workaround for HFmode and BFmode without hardware extension
   UNSPEC_FMV_SFP16_X
+  UNSPEC_FMV_SBF16_X
 
   ;; XTheadFmv moves
   UNSPEC_XTHEADFMV
@@ -1926,6 +1927,15 @@
   [(set_attr "type" "fmove")
(set_attr "mode" "SF")])
 
+(define_insn "*movbf_softfloat_boxing"
+  [(set (match_operand:BF 0 "register_operand"   "=f")
+   (unspec:BF [(match_operand:X 1 "register_operand" " r")]
+UNSPEC_FMV_SBF16_X))]
+  "!TARGET_ZFBFMIN"
+  "fmv.w.x\t%0,%1"
+  [(set_attr "type" "fmove")
+   (set_attr "mode" "SF")])
+
 ;;
 ;;  
 ;;
diff --git a/gcc/testsuite/gcc.target/riscv/_Bfloat16-nanboxing.c 
b/gcc/testsuite/gcc.target/riscv/_Bfloat16-nanboxing.c
new file mode 100644
index 000..11a73d22234
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/_Bfloat16-nanboxing.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options 

[PATCH v1 0/1] RISC-V: Nan-box the result of movbf on soft-bf16

2024-05-07 Thread Xiao Zeng
Compared to the initial patch:


1 Fixed the formatting issue; although the modified format could pass the
CI format check, it looked strange.

2 Because CI did not build the patch on top of the latest code, the build of
the initial patch failed.

3 Sending v1 will trigger CI again, which will hopefully resolve this issue.
If not, I will email the CI maintainers and ask them to run it manually.

4 For information about CI, you can refer to the following email conversation:
---
On 5/7/24 01:25, Xiao Zeng wrote:
> Hi, while using CI, I discovered a possible issue and am reporting it to you.
>
> https://github.com/ewlu/gcc-precommit-ci/issues/1481
>
> The RISCV_Nanbox_the_result_of_movbf_on_softbf16 patch relies on the mainline
> at commit <8c7cee80eb50792e57d514be1418c453ddd1073e>, but in CI,  is used as
> the parent CommitID, which obviously leads to a patch compilation failure.
>
> I expect each CI run to be performed on the latest code.
>
> Of course, perhaps there are other considerations here.
>
> How can I operate to enable CI to use the latest mainline code?
>
> Looking forward to your reply very much.
>
> Thanks
> Xiao Zeng
>
Hi Xiao,

Thanks for reaching out.

Currently we rely on postcommit CI to generate a baseline of known failures:

https://github.com/patrick-rivos/gcc-postcommit-ci/issues

Precommit then applies patches to that same baseline hash and compares
the results.

We don't currently have enough compute to generate a baseline for each
patch sent to the mailing list. We'll be asking for more compute from
the RISE project in the next week or so to allow us to use the most
recent GCC hash.

What you noticed yesterday was a bad change to the postcommit-CI which
prevented a new baseline from being generated.

It's expected that precommit uses a ~8 hour old commit as a baseline. If
a patch ever requires a recent commit, feel free to email us and we'll
rerun the precommit CI once a new baseline exists.

We should have a new baseline in ~8 hours and I'll rerun your patch for
you once that happens.

Thanks,
Patrick
---

Xiao Zeng (1):
  RISC-V: Nan-box the result of movbf on soft-bf16

 gcc/config/riscv/riscv.cc | 51 ++-
 gcc/config/riscv/riscv.md | 12 -
 .../gcc.target/riscv/_Bfloat16-nanboxing.c| 38 ++
 3 files changed, 76 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/_Bfloat16-nanboxing.c

-- 
2.17.1



Re: [V2][PATCH] gcc-14/changes.html: Deprecate a GCC C extension on flexible array members.

2024-05-07 Thread Kees Cook
On Tue, May 07, 2024 at 06:34:19PM +, Qing Zhao wrote:
> On May 7, 2024, at 13:57, Sebastian Huber 
>  wrote:
> > On 07.05.24 16:26, Qing Zhao wrote:
> > > Hi, Sebastian,
> > > Thanks for your explanation.
> > > Our goal is to deprecate the GCC extension on a structure
> > > containing a flexible array member not at the end of another
> > > structure. In order to achieve this goal, we provided the warning option
> > > -Wflex-array-member-not-at-end for the users to locate all such
> > > cases in their source code and update the source code to eliminate
> > > such cases.
> >
> > What is the benefit of deprecating this GCC extension? If GCC
> > extensions are removed, then it would be nice to enable the associated
> > warnings by default.

The goal of all of the recent array bounds and flexible array work is to
make sizing information unambiguous (e.g. via __builtin_object_size(),
__builtin_dynamic_object_size(), and the array-bounds sanitizer). For
the compiler to be able to deterministically report size information
on arrays, we needed to deprecate this case even though it had been
supported in the past. (Though we also _added_ extensions to support
other things, like flexible arrays in unions, and the coming
__counted_by attribute.)

For example:

#include <stdio.h>

struct flex { int length; char data[]; };
struct mid_flex { int m; struct flex flex_data; int n; int o; };

#define SZ(p)   __builtin_dynamic_object_size(p, 1)

void foo(struct flex *f, struct mid_flex *m)
{
	printf("%zu\n", SZ(f));
	printf("%zu\n", SZ(&m->flex_data));
}

int main(void)
{
	struct mid_flex m = { .flex_data.length = 8 };
	foo(&m.flex_data, &m);
	return 0;
}

This is printing the size of the same object. But the desired results
are ambiguous. Does m->flex_data have an unknown size (i.e. SIZE_MAX)
because it's a flex array, or does it contain 8 bytes, since it overlaps
with the other structure's trailing 2 ints?

The answer from GCC 13 was neither:

18446744073709551615
4

It considered flex_data to be only the size of its non-flex-array
members, but only when there was semantic context that it was part of
another structure. (Yet more ambiguity.)

In GCC 14, this is "resolved" to be unknown since it is a flex array
which has no sizing info, and context doesn't matter:

18446744073709551615
18446744073709551615

But this paves the way for the coming 'counted_by' attribute which will
allow for struct flex above to be defined as:

struct flex { int length; char data[] __attribute__((counted_by(length))); };

At which point GCC can deterministically report the object size.
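Until then, the size that counted_by is meant to describe can be computed by hand. A minimal sketch, assuming `length` counts the elements of `data` (as `counted_by(length)` would express); the helpers `flex_size` and `flex_alloc` are hypothetical and not part of GCC:

```c
#include <stddef.h>
#include <stdlib.h>

struct flex { int length; char data[]; };

/* Bytes occupied by a struct flex whose data[] holds f->length elements.
   offsetof() is used rather than sizeof() because sizeof(struct flex)
   may include trailing padding that data[] overlaps. */
size_t flex_size(const struct flex *f)
{
	return offsetof(struct flex, data) + (size_t)f->length * sizeof f->data[0];
}

/* Allocate a struct flex with room for n chars and record the count.
   n is assumed to be non-negative. */
struct flex *flex_alloc(int n)
{
	struct flex *f = malloc(offsetof(struct flex, data) + (size_t)n * sizeof(char));
	if (f != NULL)
		f->length = n;
	return f;
}
```

With a compiler that supports counted_by, `__builtin_dynamic_object_size(f->data, 1)` is expected to report the array part of this computation, `f->length` bytes, without the hand-written helper.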

Hopefully I've captured this all correctly -- Qing can correct me. :)

> >
> > > We had a long discussion before deciding to deprecate this GCC
> > > extension. Please see details here:
> > >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832
> > >
> > > Yes, we do plan to enable this warning by default before final
> > > deprecation.  (We might consider enabling this warning by default in
> > > GCC 15, and then deprecating the extension in the next release.)
> > >
> > > Right now, there is ongoing work in the Linux kernel to get rid of
> > > all such cases. Kees might have more information on this.
> > >
> > >
> > > The static initialization of structures with flexible array members
> > > will still work as long as the flexible array members are at the end of
> > > the structures.
> >
> > Removing the support for flexible array members in the middle of
> > compounds will make the static initialization practically infeasible.
>
> If the flexible array member is moved to the end of the compound,
> static initialization still works. What’s the issue here?
>
> > > My question: is it possible to update your source code to move
> > > the structure with flexible array member to the end of the containing
> > > structure?
> > >
> > > i.e, in your example, in the struct Thread_Configured_control,
> > > move the field “Thread_Control Control” to the end of the structure?
> >
> > If we move the Thread_Control to the end, how would I add a
> > configuration defined number of elements at the end?
>
> I don’t understand this: why would moving “Thread_Control Control” to
> the end of the containing structure make this a problem?
> Could you please explain this with a simplified example?

I found your example at [2] and tried to trim/summarize it here:


struct _Thread_Control {
	Objects_Control Object;
	...
	void *extensions[];
};
typedef struct _Thread_Control Thread_Control;

struct Thread_Configured_control {
  Thread_Control Control;

  #if CONFIGURE_MAXIMUM_USER_EXTENSIONS > 0
void *extensions[ CONFIGURE_MAXIMUM_USER_EXTENSIONS + 1 ];
  #endif
  Configuration_Scheduler_node Scheduler_nodes[ _CONFIGURE_SCHEDULER_COUNT ];
  RTEMS_API_Control API_RTEMS;
  #ifdef RTEMS_POSIX_API
POSIX_API_Control API_POSIX;
  #endif
  #if CONFIGURE_MAXIMUM_THREAD_NAME_SIZE > 1
char name[ CONFIGURE_MAXIMUM_THREAD_NAME_SIZE ];
  #endif
  #if 

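In the layout summarized above, the flexible array `extensions[]` at the end of `Thread_Control` appears to be backed by the `extensions[ CONFIGURE_MAXIMUM_USER_EXTENSIONS + 1 ]` array that directly follows it in `Thread_Configured_control`, which is why `Control` cannot simply be moved to the end. A reduced sketch of that overlay, using hypothetical simplified names (`struct control`, `MAX_EXTENSIONS`, `overlay_matches`) rather than the real RTEMS definitions. Note that it relies on exactly the GCC extension under discussion, which GCC 14 reports with -Wflex-array-member-not-at-end:

```c
#include <stddef.h>

/* Simplified stand-in for Thread_Control; not the real definition. */
struct control {
	int id;
	void *extensions[];		/* flexible array member */
};

#define MAX_EXTENSIONS 4		/* hypothetical configuration value */

struct configured_control {
	struct control control;		/* FAM struct NOT at the end: GCC extension */
	void *extensions[MAX_EXTENSIONS + 1];	/* storage the FAM overlays */
	int other_config_data;
};

/* Nonzero if control.extensions and the trailing storage array start at
   the same address, i.e. the overlay works with this struct layout. */
int overlay_matches(struct configured_control *c)
{
	return (void *)c->control.extensions == (void *)c->extensions;
}
```

Moving `control` after the storage array would break the overlay: the flexible array would then refer to memory past the end of the containing structure.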
[committed][RISC-V] Turn on overlap_op_by_pieces for generic-ooo tuning

2024-05-07 Thread Jeff Law
Per quick email exchange with Palmer.  Given the triviality, I'm just 
pushing it.


jeff

commit 9f14f1978260148d4d6208dfd73df1858e623758
Author: Jeff Law 
Date:   Tue May 7 15:34:16 2024 -0600

[committed][RISC-V] Turn on overlap_op_by_pieces for generic-ooo tuning

Per quick email exchange with Palmer.  Given the triviality, I'm just
pushing it.

gcc/
* config/riscv/riscv.cc (generic_ooo_tune_info): Turn on
overlap_op_by_pieces.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a9b57d41184..62207b6b227 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -536,7 +536,7 @@ static const struct riscv_tune_param generic_ooo_tune_info = {
   4,   /* fmv_cost */
   false,   /* slow_unaligned_access */
   false,   /* use_divmod_expansion */
-  false,   /* overlap_op_by_pieces */
+  true,    /* overlap_op_by_pieces */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
   _vector_cost,/* vector cost */
 };


Re: [committed] [RISC-V] Allow uarchs to set TARGET_OVERLAP_OP_BY_PIECES_P

2024-05-07 Thread Jeff Law




On 5/7/24 3:24 PM, Palmer Dabbelt wrote:


@@ -529,6 +536,7 @@ static const struct riscv_tune_param generic_ooo_tune_info = {
4,  /* fmv_cost */
false,  /* slow_unaligned_access */
false,  /* use_divmod_expansion */
+  false,   /* overlap_op_by_pieces */


IMO we should turn this on for the generic OOO tuning -- the benchmarks
say it's not faster for the T-Head OOO cores, but we were all so
surprised to find that I don't think we even fully trust the benchmarks.
I'd assume OOO cores are faster with the overlapping stores, so we
should just lean into it and let vendors say something if that's the
wrong assumption.
Several factors likely come into play (branch prediction, OOO 
properties, write combining, etc etc).


But sure, I don't think that'd be terribly controversial.  I can go 
ahead and make that change now given its triviality.


Jeff





Re: [committed] [RISC-V] Allow uarchs to set TARGET_OVERLAP_OP_BY_PIECES_P

2024-05-07 Thread Palmer Dabbelt
On Tue, 07 May 2024 14:18:36 PDT (-0700), Jeff Law wrote:
> This is almost exclusively work from the VRULL team.
>
> As we've discussed in the Tuesday meeting in the past, we'd like to have
> a knob in the tuning structure to indicate that overlapped stores during
> move_by_pieces expansion of memcpy & friends are acceptable.
>
> This patch adds that capability to our tuning structure.  It's off
> for all the uarchs upstream, but we have been using it inside Ventana
> for our uarch with success.  So technically it's NFC upstream, but puts
> in the infrastructure multiple organizations likely need.
>
>
> Built and tested rv64gc.  Pushing to the trunk shortly.
> jeff
> commit 300393484dbfa9fd3891174ea47aa3fb41915abc
> Author: Christoph Müllner 
> Date:   Tue May 7 15:16:21 2024 -0600
>
> [committed] [RISC-V] Allow uarchs to set TARGET_OVERLAP_OP_BY_PIECES_P
>
> This is almost exclusively work from the VRULL team.
>
> As we've discussed in the Tuesday meeting in the past, we'd like to have a knob
> in the tuning structure to indicate that overlapped stores during
> move_by_pieces expansion of memcpy & friends are acceptable.
>
> This patch adds that capability to our tuning structure.  It's off for all
> the uarchs upstream, but we have been using it inside Ventana for our uarch
> with success.  So technically it's NFC upstream, but puts in the infrastructure
> multiple organizations likely need.
>
> gcc/
>
> * config/riscv/riscv.cc (struct riscv_tune_param): Add new
> "overlap_op_by_pieces" field.
> (rocket_tune_info, sifive_7_tune_info): Set it.
> (sifive_p400_tune_info, sifive_p600_tune_info): Likewise.
> (thead_c906_tune_info, xiangshan_nanhu_tune_info): Likewise.
> (generic_ooo_tune_info, optimize_size_tune_info): Likewise.
> (riscv_overlap_op_by_pieces): New function.
> (TARGET_OVERLAP_OP_BY_PIECES_P): define.
>
> gcc/testsuite/
>
> * gcc.target/riscv/memcpy-nonoverlapping.c: New test.
> * gcc.target/riscv/memset-nonoverlapping.c: New test.
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 545e68566dc..a9b57d41184 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -288,6 +288,7 @@ struct riscv_tune_param
>unsigned short fmv_cost;
>bool slow_unaligned_access;
>bool use_divmod_expansion;
> +  bool overlap_op_by_pieces;
>unsigned int fusible_ops;
>const struct cpu_vector_cost *vec_costs;
>  };
> @@ -427,6 +428,7 @@ static const struct riscv_tune_param rocket_tune_info = {
>8, /* fmv_cost */
>true,  /* slow_unaligned_access */
>false, /* use_divmod_expansion */
> +  false, /* overlap_op_by_pieces */
>RISCV_FUSE_NOTHING,   /* fusible_ops */
>NULL,  /* vector cost */
>  };
> @@ -444,6 +446,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
>8, /* fmv_cost */
>true,  /* slow_unaligned_access */
>false, /* use_divmod_expansion */
> +  false, /* overlap_op_by_pieces */
>RISCV_FUSE_NOTHING,   /* fusible_ops */
>NULL,  /* vector cost */
>  };
> @@ -461,6 +464,7 @@ static const struct riscv_tune_param sifive_p400_tune_info = {
>4, /* fmv_cost */
>true,  /* slow_unaligned_access */
>false, /* use_divmod_expansion */
> +  false, /* overlap_op_by_pieces */
>RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI,  /* fusible_ops */
>_vector_cost,  /* vector cost */
>  };
> @@ -478,6 +482,7 @@ static const struct riscv_tune_param sifive_p600_tune_info = {
>4, /* fmv_cost */
>true,  /* slow_unaligned_access */
>false, /* use_divmod_expansion */
> +  false, /* overlap_op_by_pieces */
>RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI,  /* fusible_ops */
>_vector_cost,  /* vector cost */
>  };
> @@ -495,6 +500,7 @@ static const struct riscv_tune_param thead_c906_tune_info = {
>8, /* fmv_cost */
>false,/* slow_unaligned_access */
>false, /* use_divmod_expansion */
> +  false, /* 

[committed] [RISC-V] Allow uarchs to set TARGET_OVERLAP_OP_BY_PIECES_P

2024-05-07 Thread Jeff Law

This is almost exclusively work from the VRULL team.

As we've discussed in the Tuesday meeting in the past, we'd like to have 
a knob in the tuning structure to indicate that overlapped stores during 
move_by_pieces expansion of memcpy & friends are acceptable.


This patch adds that capability to our tuning structure.  It's off
for all the uarchs upstream, but we have been using it inside Ventana
for our uarch with success.  So technically it's NFC upstream, but puts
in the infrastructure multiple organizations likely need.

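As a rough illustration of what overlapped stores buy, consider a 7-byte copy. Without overlap it has to be split into 4-, 2- and 1-byte pieces; with overlap, two 4-byte accesses at offsets 0 and 3 cover all 7 bytes, writing byte 3 twice with the same value. The following hand-written C sketch only models the idea; it is not GCC's move_by_pieces code, and the function names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Non-overlapping decomposition of a 7-byte copy: 4 + 2 + 1 bytes.
   Fixed-size memcpy calls stand in for (possibly unaligned) loads/stores. */
void copy7_pieces(unsigned char *dst, const unsigned char *src)
{
	uint32_t w;
	uint16_t h;

	memcpy(&w, src, 4);
	memcpy(dst, &w, 4);
	memcpy(&h, src + 4, 2);
	memcpy(dst + 4, &h, 2);
	dst[6] = src[6];
}

/* Overlapping decomposition: two 4-byte accesses at offsets 0 and 3.
   Byte 3 is stored twice; this is fine because memcpy semantics already
   require src and dst not to overlap. */
void copy7_overlap(unsigned char *dst, const unsigned char *src)
{
	uint32_t lo, hi;

	memcpy(&lo, src, 4);
	memcpy(&hi, src + 3, 4);
	memcpy(dst, &lo, 4);
	memcpy(dst + 3, &hi, 4);
}
```

The overlapping version needs two loads and two stores instead of three of each; whether that is actually faster depends on the microarchitecture (unaligned access cost, store merging and so on), which is why it is exposed as a per-uarch tuning flag rather than enabled unconditionally.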


Built and tested rv64gc.  Pushing to the trunk shortly.
jeff

commit 300393484dbfa9fd3891174ea47aa3fb41915abc
Author: Christoph Müllner 
Date:   Tue May 7 15:16:21 2024 -0600

[committed] [RISC-V] Allow uarchs to set TARGET_OVERLAP_OP_BY_PIECES_P

This is almost exclusively work from the VRULL team.

As we've discussed in the Tuesday meeting in the past, we'd like to have a knob
in the tuning structure to indicate that overlapped stores during
move_by_pieces expansion of memcpy & friends are acceptable.

This patch adds that capability to our tuning structure.  It's off for all
the uarchs upstream, but we have been using it inside Ventana for our uarch
with success.  So technically it's NFC upstream, but puts in the infrastructure
multiple organizations likely need.

gcc/

* config/riscv/riscv.cc (struct riscv_tune_param): Add new
"overlap_op_by_pieces" field.
(rocket_tune_info, sifive_7_tune_info): Set it.
(sifive_p400_tune_info, sifive_p600_tune_info): Likewise.
(thead_c906_tune_info, xiangshan_nanhu_tune_info): Likewise.
(generic_ooo_tune_info, optimize_size_tune_info): Likewise.
(riscv_overlap_op_by_pieces): New function.
(TARGET_OVERLAP_OP_BY_PIECES_P): define.

gcc/testsuite/

* gcc.target/riscv/memcpy-nonoverlapping.c: New test.
* gcc.target/riscv/memset-nonoverlapping.c: New test.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 545e68566dc..a9b57d41184 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -288,6 +288,7 @@ struct riscv_tune_param
   unsigned short fmv_cost;
   bool slow_unaligned_access;
   bool use_divmod_expansion;
+  bool overlap_op_by_pieces;
   unsigned int fusible_ops;
   const struct cpu_vector_cost *vec_costs;
 };
@@ -427,6 +428,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   8,   /* fmv_cost */
   true,    /* slow_unaligned_access */
   false,   /* use_divmod_expansion */
+  false,   /* overlap_op_by_pieces */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
   NULL,/* vector cost */
 };
@@ -444,6 +446,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   8,   /* fmv_cost */
   true,    /* slow_unaligned_access */
   false,   /* use_divmod_expansion */
+  false,   /* overlap_op_by_pieces */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
   NULL,/* vector cost */
 };
@@ -461,6 +464,7 @@ static const struct riscv_tune_param sifive_p400_tune_info = {
   4,   /* fmv_cost */
   true,    /* slow_unaligned_access */
   false,   /* use_divmod_expansion */
+  false,   /* overlap_op_by_pieces */
   RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI,  /* fusible_ops */
   _vector_cost,/* vector cost */
 };
@@ -478,6 +482,7 @@ static const struct riscv_tune_param sifive_p600_tune_info = {
   4,   /* fmv_cost */
   true,    /* slow_unaligned_access */
   false,   /* use_divmod_expansion */
+  false,   /* overlap_op_by_pieces */
   RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI,  /* fusible_ops */
   _vector_cost,/* vector cost */
 };
@@ -495,6 +500,7 @@ static const struct riscv_tune_param thead_c906_tune_info = {
   8,   /* fmv_cost */
   false,/* slow_unaligned_access */
   false,   /* use_divmod_expansion */
+  false,   /* overlap_op_by_pieces */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
   NULL,/* vector cost */
 };
@@ -512,6 +518,7 @@ static 

Re: [PATCH 3/4] gcc/c-family/c-opts: fix quoting for `-fdeps-format=` error message

2024-05-07 Thread Joseph Myers
On Sat, 4 May 2024, Ben Boeckel wrote:

> diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
> index be3058dca63..4a164ad0c0b 100644
> --- a/gcc/c-family/c-opts.cc
> +++ b/gcc/c-family/c-opts.cc
> @@ -370,7 +370,7 @@ c_common_handle_option (size_t scode, const char *arg, HOST_WIDE_INT value,
>if (!strcmp (arg, "p1689r5"))
>   cpp_opts->deps.fdeps_format = FDEPS_FMT_P1689R5;
>else
> - error ("%<-fdeps-format=%> unknown format %<%s%>", arg);
> + error ("%<-fdeps-format=%> unknown format %q", arg);
>break;

That can't be right.  The GCC %q is a modifier that needs to have an 
actual format specifier it modifies (so %qs - which produces the same 
output as %<%s%> - but not %q by itself).

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] MATCH: Add some more value_replacement simplifications (a != 0 ? expr : 0) to match

2024-05-07 Thread Andrew Pinski
On Tue, May 7, 2024 at 1:45 PM Jeff Law  wrote:
>
>
>
> On 4/30/24 9:21 PM, Andrew Pinski wrote:
> > This adds a few more of what is currently done in phiopt's value_replacement
> > to match. I noticed this when I was hooking up phiopt's value_replacement
> > code to use match and disabling the old code. But this can be done
> > independently of hooking up phiopt's value_replacement, as phiopt
> > is already hooked up for the simplified versions.
> >
> > /* a != 0 ? a / b : 0  -> a / b iff b is nonzero. */
> > /* a != 0 ? a * b : 0 -> a * b */
> > /* a != 0 ? a & b : 0 -> a & b */
> >
> > We prefer the `cond ? a : 0` forms to allow optimization of `a * cond` which
> > uses that form.
> >
> > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> >   PR tree-optimization/114894
> >
> > gcc/ChangeLog:
> >
> >   * match.pd (`a != 0 ? a / b : 0`): New pattern.
> >   (`a != 0 ? a * b : 0`): New pattern.
> >   (`a != 0 ? a & b : 0`): New pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/tree-ssa/phi-opt-value-5.c: New test.
> Is there any need to also handle the reversed conditional with the arms
> swapped?  If not, this is fine as-is.  If yes, then fine with the
> obvious generalization.

The answer is yes and no. While the PHI-OPT pass tries both cases,
the other (all?) passes do not. This is something I have been
thinking about solving in a generic way instead of adding many
more patterns here. I will start working on that in the middle of
June.
Most of the time, cond patterns in match are used inside phiopt, so
handling the reversed conditional has not been high on my priority list,
but with VRP, SCEV and match (itself) producing more COND_EXPRs, we
should fix this once and for all for GCC 15.

Thanks,
Andrew Pinski

>
> jeff
>


Re: [PATCH] PR middle-end/111701: signbit(x*x) vs -fsignaling-nans

2024-05-07 Thread Joseph Myers
On Fri, 3 May 2024, Richard Biener wrote:

> So what I do not necessarily agree with is that we need to preserve
> the multiplication with -fsignaling-nans.  Do we consider a program doing
> 
> handler() { exit(0); }
> 
>  x = sNaN;
> ...
>  sigaction(SIGFPE, ... handler)
>  x*x;
>  format_hard_drive();
> 
> and expecting the program to exit(0) rather than formatting the hard disk
> to be expecting something the C standard guarantees?  And is it enough
> for the program to enable -fsignaling-nans for this?
> 
> If so then the first and foremost bug is that 'x*x' doesn't have
> TREE_SIDE_EFFECTS set and thus we do not preserve it when optimizing
> __builtin_signbit () of it.

Signaling NaNs don't seem relevant here.  "Signal" means "set the 
exception flag" - and 0 * Inf raises the same "invalid" exception flag as 
sNaN * sNaN.  Changing flow of control on an exception is outside the 
scope of standard C and requires nonstandard extensions such as 
feenableexcept.  (At present -ftrapping-math covers both kinds of 
exception handling - the default setting of a flag, and the nonstandard 
change of flow of control.)

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] MATCH: Add some more value_replacement simplifications (a != 0 ? expr : 0) to match

2024-05-07 Thread Jeff Law




On 4/30/24 9:21 PM, Andrew Pinski wrote:

This adds a few more of what is currently done in phiopt's value_replacement
to match. I noticed this when I was hooking up phiopt's value_replacement
code to use match and disabling the old code. But this can be done
independently of hooking up phiopt's value_replacement, as phiopt
is already hooked up for the simplified versions.

/* a != 0 ? a / b : 0  -> a / b iff b is nonzero. */
/* a != 0 ? a * b : 0 -> a * b */
/* a != 0 ? a & b : 0 -> a & b */

We prefer the `cond ? a : 0` forms to allow optimization of `a * cond` which
uses that form.
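The three rewrites hold because when a == 0 the simplified expression already evaluates to 0: 0 * b == 0, 0 & b == 0, and 0 / b == 0 provided b is nonzero (division by zero being undefined, hence the "iff b is nonzero" condition). A small C sketch that checks these identities over a range of values, including the reversed conditional with swapped arms raised in review; the function names are illustrative only and do not correspond to anything in match.pd:

```c
/* Conditional forms as written in the source. */
int cond_mul(int a, int b) { return a != 0 ? a * b : 0; }
int cond_and(int a, int b) { return a != 0 ? (a & b) : 0; }

/* Only equivalent to a / b when b != 0 (otherwise a / b is undefined). */
int cond_div(int a, int b) { return a != 0 ? a / b : 0; }

/* Reversed conditional with the arms swapped. */
int cond_mul_rev(int a, int b) { return a == 0 ? 0 : a * b; }
```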

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/114894

gcc/ChangeLog:

* match.pd (`a != 0 ? a / b : 0`): New pattern.
(`a != 0 ? a * b : 0`): New pattern.
(`a != 0 ? a & b : 0`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-value-5.c: New test.
Is there any need to also handle the reversed conditional with the arms
swapped?  If not, this is fine as-is.  If yes, then fine with the
obvious generalization.


jeff



Re: [PATCH v3] DCE __cxa_atexit calls where the function is pure/const [PR19661]

2024-05-07 Thread Jeff Law




On 5/4/24 5:58 PM, Andrew Pinski wrote:

In C++ sometimes you have a destructor function which is "empty", for
example with unions or with arrays.  The front-end might not know it is empty
either, so this should be done during optimization.
To implement it I added it to DCE where we mark if a statement is necessary or
not.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Changes since v1:
   * v2: Add support for __aeabi_atexit for arm-*eabi. Add extra comments.
 Add cxa_atexit-5.C testcase for -fPIC case.
   * v3: Fix testcases for the __aeabi_atexit (forgot to do in the v2).

PR tree-optimization/19661

gcc/ChangeLog:

* tree-ssa-dce.cc (is_cxa_atexit): New function.
(is_removable_cxa_atexit_call): New function.
(mark_stmt_if_obviously_necessary): Don't mark removable
cxa_at_exit calls.
(mark_all_reaching_defs_necessary_1): Likewise.
(propagate_necessity): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/cxa_atexit-1.C: New test.
* g++.dg/tree-ssa/cxa_atexit-2.C: New test.
* g++.dg/tree-ssa/cxa_atexit-3.C: New test.
* g++.dg/tree-ssa/cxa_atexit-4.C: New test.
* g++.dg/tree-ssa/cxa_atexit-5.C: New test.
* g++.dg/tree-ssa/cxa_atexit-6.C: New test.

OK
jeff



Re: [patch,avr] PR114975: Better 8-bit parity detection.

2024-05-07 Thread Jeff Law




On 5/7/24 11:23 AM, Georg-Johann Lay wrote:

Add a combine pattern for parity detection.

Ok for master?

Johann

AVR: target/114975 - Add combine-pattern for __parityqi2.

 PR target/114975
gcc/
 * config/avr/avr.md: Add combine pattern for
 8-bit parity detection.

gcc/testsuite/
 * gcc.target/avr/pr114975-parity.c: New test.

OK
jeff
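For reference, a minimal C illustration of 8-bit parity, assuming the case of interest is GCC's `__builtin_parity` applied to a zero-extended byte. This is not the contents of pr114975-parity.c, and the function names are hypothetical; it simply checks the builtin against a plain bit loop:

```c
#include <stdint.h>

/* Reference 8-bit parity: XOR of all eight bits. */
unsigned parity8_loop(uint8_t x)
{
	unsigned p = 0;
	for (int i = 0; i < 8; i++)
		p ^= (x >> i) & 1u;
	return p;
}

/* Same result via the builtin; x is zero-extended to unsigned int,
   so the upper bits contribute nothing to the parity. */
unsigned parity8_builtin(uint8_t x)
{
	return (unsigned)__builtin_parity(x);
}
```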



Re: [patch,avr] PR114975: Better 8-bit popcount detection.

2024-05-07 Thread Jeff Law




On 5/7/24 11:25 AM, Georg-Johann Lay wrote:

Add a pattern for better popcount detection.

Ok for master?

Johann

--

AVR: target/114975 - Add combine-pattern for __popcountqi2.

 PR target/114975
gcc/
 * config/avr/avr.md: Add combine pattern for
 8-bit popcount detection.

gcc/testsuite/
 * gcc.target/avr/pr114975-popcount.c: New test.

OK
jeff
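Similarly, a minimal C illustration of 8-bit population count, assuming the case of interest is `__builtin_popcount` on a zero-extended byte. Again this is not the contents of pr114975-popcount.c, and the function names are hypothetical:

```c
#include <stdint.h>

/* Reference 8-bit popcount: repeatedly clear the lowest set bit. */
unsigned popcount8_loop(uint8_t x)
{
	unsigned n = 0;
	while (x) {
		x &= (uint8_t)(x - 1);
		n++;
	}
	return n;
}

/* Same result via the builtin on the zero-extended byte. */
unsigned popcount8_builtin(uint8_t x)
{
	return (unsigned)__builtin_popcount(x);
}
```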



Re: [PATCH] c++: Implement C++26 P2893R3 - Variadic friends [PR114459]

2024-05-07 Thread Jason Merrill

On 5/3/24 12:35, Jakub Jelinek wrote:

Hi!

The following patch implements the C++26 P2893R3 - Variadic friends
paper.  The paper allows for the friend type declarations to specify
more than one friend type specifier and allows specifying ... at
the end of each.  The patch doesn't introduce tentative parsing of
friend-type-declaration non-terminal, but rather just extends existing
parsing where it is a friend declaration which ends with ; after the
declaration specifiers to the cases where it ends with ...; or , or ...,
In that case it pedwarns for cxx_dialect < cxx26, handles the ... and
if there is , continues in a loop to parse the further friend type
specifiers.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-05-03  Jakub Jelinek  

PR c++/114459
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Predefine
__cpp_variadic_friend=202403L for C++26.
gcc/cp/
* parser.cc (cp_parser_member_declaration): Implement C++26
P2893R3 - Variadic friends.  Parse friend type declarations
with ... or with more than one friend type specifier.
* friend.cc (make_friend_class): Allow TYPE_PACK_EXPANSION.
* pt.cc (instantiate_class_template): Handle PACK_EXPANSION_P
in friend classes.
gcc/testsuite/
* g++.dg/cpp26/feat-cxx26.C (__cpp_variadic_friend): Add test.
* g++.dg/cpp26/variadic-friend1.C: New test.

--- gcc/c-family/c-cppbuiltin.cc.jj 2024-05-02 09:31:17.746298275 +0200
+++ gcc/c-family/c-cppbuiltin.cc2024-05-03 14:50:08.008242950 +0200
@@ -1093,6 +1093,7 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_placeholder_variables=202306L");
  cpp_define (pfile, "__cpp_structured_bindings=202403L");
  cpp_define (pfile, "__cpp_deleted_function=202403L");
+ cpp_define (pfile, "__cpp_variadic_friend=202403L");
}
if (flag_concepts)
  {
--- gcc/cp/parser.cc.jj 2024-05-03 09:43:47.781511477 +0200
+++ gcc/cp/parser.cc2024-05-03 13:26:38.208088017 +0200
@@ -28102,7 +28102,14 @@ cp_parser_member_declaration (cp_parser*
  goto out;
/* If there is no declarator, then the decl-specifier-seq should
   specify a type.  */


Let's mention C++26 variadic friends in this comment.  OK with that change.


-  if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
+  if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON)
+  || (cp_parser_friend_p (_specifiers)
+ && cxx_dialect >= cxx11
+ && (cp_lexer_next_token_is (parser->lexer, CPP_COMMA)
+ || (cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS)
+ && (cp_lexer_nth_token_is (parser->lexer, 2, CPP_SEMICOLON)
+ || cp_lexer_nth_token_is (parser->lexer, 2,
+   CPP_COMMA))
  {
/* If there was no decl-specifier-seq, and the next token is a
 `;', then we have something like:
@@ -28137,44 +28144,81 @@ cp_parser_member_declaration (cp_parser*
{
  /* If the `friend' keyword was present, the friend must
 be introduced with a class-key.  */
-  if (!declares_class_or_enum && cxx_dialect < cxx11)
-pedwarn (decl_spec_token_start->location, OPT_Wpedantic,
- "in C++03 a class-key must be used "
- "when declaring a friend");
-  /* In this case:
+ if (!declares_class_or_enum && cxx_dialect < cxx11)
+   pedwarn (decl_spec_token_start->location, OPT_Wpedantic,
+"in C++03 a class-key must be used "
+"when declaring a friend");
+ if (!cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON)
+ && cxx_dialect < cxx26)
+   pedwarn (cp_lexer_peek_token (parser->lexer)->location,
+OPT_Wc__26_extensions,
+"variadic friends or friend type declarations with "
+"multiple types only available with "
+"%<-std=c++2c%> or %<-std=gnu++2c%>");
+ location_t friend_loc = decl_specifiers.locations[ds_friend];
+ do
+   {
+ /* In this case:
  
-		template  struct A {
- friend struct A::B;
-   };
+template  struct A {
+  friend struct A::B;
+};
  
-		  A::B will be represented by a TYPENAME_TYPE, and
- therefore not recognized by check_tag_decl.  */
-  if (!type)
-{
-  type = decl_specifiers.type;
-  if (type && TREE_CODE (type) == TYPE_DECL)
-type = TREE_TYPE (type);
-}
-  /* Warn if an attribute cannot appear here, as per
- [dcl.attr.grammar]/5.  But not when 

Re: [V2][PATCH] gcc-14/changes.html: Deprecate a GCC C extension on flexible array members.

2024-05-07 Thread Qing Zhao
(Resending since my previous email was in HTML and the inline quoting didn’t
work; I changed my mail settings, so hopefully it’s good this time.) Sorry for
the inconvenience.


> On May 7, 2024, at 13:57, Sebastian Huber 
>  wrote:
> 
> On 07.05.24 16:26, Qing Zhao wrote:
>> Hi, Sebastian,
>> Thanks for your explanation.
>> Our goal is to deprecate the GCC extension on a structure containing a 
>> flexible array member not at the end of another structure. In order to 
>> achieve this goal, we provided the warning option 
>> -Wflex-array-member-not-at-end for the users to
>> locate all such cases in their source code and update the source code to 
>> eliminate such cases.
> 
> What is the benefit of deprecating this GCC extension? If GCC extensions are 
> removed, then it would be nice to enable the associated warnings by default.

We had a long discussion before deciding to deprecate this GCC extension.
Please see details here:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832

Yes, we do plan to enable this warning by default before final deprecation.
(We might consider enabling this warning by default in GCC 15, and then
deprecating the extension in the next release.)

Right now, there is ongoing work in the Linux kernel to get rid of all such
cases. Kees might have more information on this.

> 
>> The static initialization of structures with flexible array members will 
>> still work as long as the flexible array members are at
>> the end of the structures.
> 
> Removing the support for flexible array members in the middle of compounds 
> will make the static initialization practically infeasible.

If the flexible array member is moved to the end of the compound, static
initialization still works. What’s the issue here?
> 
>> My question: is it possible to update your source code to move the structure 
>> with flexible array member to the end of the
>> containing structure?
>> i.e, in your example, in the struct Thread_Configured_control, move the 
>> field “Thread_Control Control” to the end of the structure?
> 
> If we move the Thread_Control to the end, how would I add a configuration 
> defined number of elements at the end?

I don’t understand this: why would moving “Thread_Control Control” to the end
of the containing structure make this a problem?
Could you please explain this with a simplified example?

Thanks.

Qing
> 
> -- 
> embedded brains GmbH & Co. KG
> Mr. Sebastian HUBER
> Dornierstr. 4
> 82178 Puchheim
> Germany
> email: sebastian.hu...@embedded-brains.de
> phone: +49-89-18 94 741 - 16
> fax:   +49-89-18 94 741 - 08
> 
> Register court: Amtsgericht München (Munich Local Court)
> Registration number: HRB 157899
> Authorized managing directors: Peter Rasmussen, Thomas Dörfler
> Our privacy policy can be found here:
> https://embedded-brains.de/datenschutzerklaerung/



Re: [PATCH] expansion: Use __trunchfbf2 calls rather than __extendhfbf2 [PR114907]

2024-05-07 Thread Jakub Jelinek
On Tue, May 07, 2024 at 08:57:00PM +0200, Richard Biener wrote:
> 
> 
> > Am 07.05.2024 um 18:02 schrieb Jakub Jelinek :
> > 
> > Hi!
> > 
> > The HF and BF modes have the same size/precision and neither is
> > a subset nor superset of the other.
> > So, using either __extendhfbf2 or __trunchfbf2 is weird.
> > The expansion apparently emits __extendhfbf2, but on the libgcc side
> > we apparently have __trunchfbf2 implemented.
> > 
> > I think it is easier to switch to using what is available rather than
> > adding new entrypoints to libgcc, even aliases, because this is backportable.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Ok - do we have any target patterns that need adjustments?

I don't think so.
BFmode is i386/aarch64/arm/riscv backend only from what I can see,
I've done make mddump for all of them and none of the tmp-mddump.md
files show any matches for hfbf (nor bfhf).

Jakub
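A small worked sketch of why neither HF nor BF is a superset of the other, assuming the standard layouts (IEEE binary16 for HF: 5 exponent bits, 10 stored mantissa bits; bfloat16 for BF: 8 exponent bits, 7 stored mantissa bits). HF is more precise while BF has a far larger range, so a conversion in either direction can lose information. The helper names below are hypothetical:

```c
/* 2^k for integer k, computed exactly by repeated doubling/halving
   (powers of two in this range are exact in double). */
double pow2(int k)
{
	double r = 1.0;
	for (; k > 0; k--)
		r *= 2.0;
	for (; k < 0; k++)
		r /= 2.0;
	return r;
}

/* Largest finite value of a format with e exponent bits and m stored
   mantissa bits: (2 - 2^-m) * 2^emax, where emax = 2^(e-1) - 1. */
double max_finite(int e, int m)
{
	int emax = (1 << (e - 1)) - 1;
	return (2.0 - pow2(-m)) * pow2(emax);
}

/* Gap between 1.0 and the next representable value. */
double epsilon(int m)
{
	return pow2(-m);
}
```

For HF, max_finite(5, 10) is 65504 and epsilon(10) is 2^-10; for BF, max_finite(8, 7) is about 3.39e38 while epsilon(7) is only 2^-7.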



Re: [PATCH] c++/modules: Stream unmergeable temporaries by value again [PR114856]

2024-05-07 Thread Jason Merrill

On 5/7/24 01:35, Nathaniel Shead wrote:

On Thu, May 02, 2024 at 01:53:44PM -0400, Jason Merrill wrote:

On 5/2/24 10:40, Patrick Palka wrote:

On Thu, 2 May 2024, Nathaniel Shead wrote:


Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/14.2?

Another alternative would be to stream such !DECL_NAME temporaries with
a merge key of MK_unique rather than attempting to find the matching
(nonexistent) field of the class context.


Both approaches sound good to me, hard to say which one is preferable..

The handling of function-scope vs class-scope temporaries seems to start
diverging in:

@@ -8861,28 +8861,6 @@ trees_out::decl_node (tree decl, walk_kind ref)
 return false;
   }
!  tree ctx = CP_DECL_CONTEXT (decl);
!  depset *dep = NULL;
!  if (streaming_p ())
!dep = dep_hash->find_dependency (decl);
!  else if (TREE_CODE (ctx) != FUNCTION_DECL
!  || TREE_CODE (decl) == TEMPLATE_DECL
!  || DECL_IMPLICIT_TYPEDEF_P (decl)
!  || (DECL_LANG_SPECIFIC (decl)
!  && DECL_MODULE_IMPORT_P (decl)))
!{
!  auto kind = (TREE_CODE (decl) == NAMESPACE_DECL
!  && !DECL_NAMESPACE_ALIAS (decl)
!  ? depset::EK_NAMESPACE : depset::EK_DECL);
!  dep = dep_hash->add_dependency (decl, kind);
!}
!
!  if (!dep)
!{
!  /* Some internal entity of context.  Do by value.  */
!  decl_value (decl, NULL);
!  return false;
!}
 if (dep->get_entity_kind () == depset::EK_REDIRECT)
   {

where for a class-scope temporary we add a dependency for it, stream
it by reference, and then stream it by value separately, which seems
unnecessary.

So if we decide to keep the create_temporary_var change, we probably
would want to unify this code path's handling of temporaries (i.e.
don't add_dependency a temporary regardless of its context).

If we decide to partially revert the create_temporary_var change,
your patch LGTM.


Streaming by value sounds right, but as noted an important difference
between reference temps and others is DECL_NAME.  Perhaps the code Patrick
quotes could look at that as well as the context?

Jason



With my patch we would no longer go through the code that Patrick quotes
for class-scope temporaries that I can see; we would instead first hit
the following code in 'tree_node':


  if (DECL_P (t))
    {
      if (DECL_TEMPLATE_PARM_P (t))
        {
          tpl_parm_value (t);
          goto done;
        }

      if (!DECL_CONTEXT (t))
        {
          /* There are a few cases of decls with no context.  We'll write
             these by value, but first assert they are cases we expect.  */
          gcc_checking_assert (ref == WK_normal);
          switch (TREE_CODE (t))
            {
            default: gcc_unreachable ();

            case LABEL_DECL:
              /* CASE_LABEL_EXPRs contain uncontexted LABEL_DECLs.  */
              gcc_checking_assert (!DECL_NAME (t));
              break;

            case VAR_DECL:
              /* AGGR_INIT_EXPRs cons up anonymous uncontexted VAR_DECLs.  */
              gcc_checking_assert (!DECL_NAME (t)
                                   && DECL_ARTIFICIAL (t));
              break;

            case PARM_DECL:
              /* REQUIRES_EXPRs have a tree list of uncontexted
                 PARM_DECLS.  It'd be nice if they had a
                 distinguishing flag to double check.  */
              break;
            }
          goto by_value;
        }
    }

 skip_normal:
  if (DECL_P (t) && !decl_node (t, ref))
    goto done;

  /* Otherwise by value */
 by_value:
  tree_value (t);


I think modifying what Patrick pointed out should only be necessary if
we maintain these nameless temporaries as having a class context; for
clarity, is that the direction you'd prefer me to go in to solve this?


I was thinking in this code that it seems fragile to require null 
DECL_CONTEXT to identify something produced by 
create_temporary_var/build_local_temp (which should really be merged).
We could even use is_local_temp instead of checking particular qualities 
directly.


But your patch is OK; please just add a comment to the 
make_temporary_var_for_ref_to_temp change indicating that you're setting 
DECL_CONTEXT to make the variable mergeable.


Thanks,
Jason



Re: [PATCH] expansion: Use __trunchfbf2 calls rather than __extendhfbf2 [PR114907]

2024-05-07 Thread Richard Biener



> On 07.05.2024, at 18:02, Jakub Jelinek wrote:
> 
> Hi!
> 
> The HF and BF modes have the same size/precision and neither is
> a subset nor superset of the other.
> So, using either __extendhfbf2 or __trunchfbf2 is weird.
> The expansion apparently emits __extendhfbf2, but on the libgcc side
> we apparently have __trunchfbf2 implemented.
> 
> I think it is easier to switch to using what is available rather than
> adding new entrypoints to libgcc, even alias, because this is backportable.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok - do we have any target patterns that need adjustments?

Thanks,
Richard 

> 2024-05-07  Jakub Jelinek  
> 
>PR middle-end/114907
>* expr.cc (convert_mode_scalar): Use trunc_optab rather than
>sext_optab for HF->BF conversions.
>* optabs-libfuncs.cc (gen_trunc_conv_libfunc): Likewise.
> 
>* gcc.dg/pr114907.c: New test.
> 
> --- gcc/expr.cc.jj	2024-04-09 09:29:04.0 +0200
> +++ gcc/expr.cc	2024-05-06 13:21:33.933798494 +0200
> @@ -355,8 +355,16 @@ convert_mode_scalar (rtx to, rtx from, i
>  && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
> 
>   if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
> -/* Conversion between decimal float and binary float, same size.  */
> -tab = DECIMAL_FLOAT_MODE_P (from_mode) ? trunc_optab : sext_optab;
> +{
> +  if (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
> +  && REAL_MODE_FORMAT (from_mode) == &ieee_half_format)
> +/* libgcc implements just __trunchfbf2, not __extendhfbf2.  */
> +tab = trunc_optab;
> +  else
> +/* Conversion between decimal float and binary float, same
> +   size.  */
> +tab = DECIMAL_FLOAT_MODE_P (from_mode) ? trunc_optab : sext_optab;
> +}
>   else if (GET_MODE_PRECISION (from_mode) < GET_MODE_PRECISION (to_mode))
>tab = sext_optab;
>   else
> --- gcc/optabs-libfuncs.cc.jj	2024-01-03 11:51:31.739728303 +0100
> +++ gcc/optabs-libfuncs.cc	2024-05-06 15:50:21.611027802 +0200
> @@ -589,7 +589,9 @@ gen_trunc_conv_libfunc (convert_optab ta
>   if (GET_MODE_CLASS (float_tmode) != GET_MODE_CLASS (float_fmode))
> gen_interclass_conv_libfunc (tab, opname, float_tmode, float_fmode);
> 
> -  if (GET_MODE_PRECISION (float_fmode) <= GET_MODE_PRECISION (float_tmode))
> +  if (GET_MODE_PRECISION (float_fmode) <= GET_MODE_PRECISION (float_tmode)
> +  && (REAL_MODE_FORMAT (float_tmode) != &arm_bfloat_half_format
> +  || REAL_MODE_FORMAT (float_fmode) != &ieee_half_format))
> return;
> 
>   if (GET_MODE_CLASS (float_tmode) == GET_MODE_CLASS (float_fmode))
> --- gcc/testsuite/gcc.dg/pr114907.c.jj	2024-05-06 15:59:08.734958523 +0200
> +++ gcc/testsuite/gcc.dg/pr114907.c	2024-05-06 16:02:38.914139829 +0200
> @@ -0,0 +1,27 @@
> +/* PR middle-end/114907 */
> +/* { dg-do run } */
> +/* { dg-options "" } */
> +/* { dg-add-options float16 } */
> +/* { dg-require-effective-target float16_runtime } */
> +/* { dg-add-options bfloat16 } */
> +/* { dg-require-effective-target bfloat16_runtime } */
> +
> +__attribute__((noipa)) _Float16
> +foo (__bf16 x)
> +{
> +  return (_Float16) x;
> +}
> +
> +__attribute__((noipa)) __bf16
> +bar (_Float16 x)
> +{
> +  return (__bf16) x;
> +}
> +
> +int
> +main ()
> +{
> +  if (foo (11.125bf16) != 11.125f16
> +  || bar (11.125f16) != 11.125bf16)
> +__builtin_abort ();
> +}
> 
>Jakub
> 
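
The same-precision case that the patch adds to convert_mode_scalar can be modelled in plain C. This is only a sketch. The enums and `choose_conv_optab` are hypothetical stand-ins for GCC's real_format pointers and optabs. The final `else` branch is cut off in the quoted hunk, so it is assumed here to select trunc_optab.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for REAL_MODE_FORMAT values and optabs.  */
enum fmt { FMT_IEEE_HALF, FMT_BFLOAT16, FMT_OTHER };
enum conv_optab { SEXT_OPTAB, TRUNC_OPTAB };

/* Model of the optab choice in convert_mode_scalar after the patch.  */
enum conv_optab
choose_conv_optab (unsigned from_prec, unsigned to_prec,
                   enum fmt from_fmt, enum fmt to_fmt, bool from_is_decimal)
{
  if (from_prec == to_prec)
    {
      /* HF -> BF: libgcc only provides __trunchfbf2.  */
      if (to_fmt == FMT_BFLOAT16 && from_fmt == FMT_IEEE_HALF)
        return TRUNC_OPTAB;
      /* Same-size decimal <-> binary conversion.  */
      return from_is_decimal ? TRUNC_OPTAB : SEXT_OPTAB;
    }
  if (from_prec < to_prec)
    return SEXT_OPTAB;
  return TRUNC_OPTAB;   /* assumed: narrowing conversions use trunc_optab */
}
```

In this model, HF -> BF now yields trunc_optab, and therefore a __trunchfbf2 call. BF -> HF still takes the sext_optab path.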


Re: [PATCH] tree-inline: Remove .ASAN_MARK calls when inlining functions into no_sanitize callers [PR114956]

2024-05-07 Thread Richard Biener



> On 07.05.2024, at 17:54, Jakub Jelinek wrote:
> 
> Hi!
> 
> In r9-5742 we've started allowing always_inline functions to be inlined into
> functions which have disabled e.g. address sanitization, even when the
> always_inline function is implicitly sanitized due to command-line options.
> 
> This mostly works fine because most of the asan instrumentation is done only
> late after ipa, but as the following testcase shows, the .ASAN_MARK ifn calls
> that the gimplifier adds can result in ICEs.
> 
> Fixed by dropping those during inlining, similarly to how we drop
> .TSAN_FUNC_EXIT calls.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2024-05-07  Jakub Jelinek  
> 
>PR sanitizer/114956
>* tree-inline.cc: Include asan.h.
>(copy_bb): Remove also .ASAN_MARK calls if id->dst_fn has asan/hwasan
>sanitization disabled.
> 
>* gcc.dg/asan/pr114956.c: New test.
> 
> --- gcc/tree-inline.cc.jj	2024-05-03 09:44:21.199055899 +0200
> +++ gcc/tree-inline.cc	2024-05-06 10:45:37.231349328 +0200
> @@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.
> #include "symbol-summary.h"
> #include "symtab-thunks.h"
> #include "symtab-clones.h"
> +#include "asan.h"
> 
> /* I'm not real happy about this, but we need to handle gimple and
>non-gimple trees.  */
> @@ -2226,13 +2227,26 @@ copy_bb (copy_body_data *id, basic_block
>}
>  else if (call_stmt
>   && id->call_stmt
> -   && gimple_call_internal_p (stmt)
> -   && gimple_call_internal_fn (stmt) == IFN_TSAN_FUNC_EXIT)
> -{
> -  /* Drop TSAN_FUNC_EXIT () internal calls during inlining.  */
> -  gsi_remove (&copy_gsi, false);
> -  continue;
> -}
> +   && gimple_call_internal_p (stmt))
> +switch (gimple_call_internal_fn (stmt))
> +  {
> +  case IFN_TSAN_FUNC_EXIT:
> +/* Drop .TSAN_FUNC_EXIT () internal calls during inlining.  */
> +gsi_remove (&copy_gsi, false);
> +continue;
> +  case IFN_ASAN_MARK:
> +/* Drop .ASAN_MARK internal calls during inlining into
> +   no_sanitize functions.  */
> +if (!sanitize_flags_p (SANITIZE_ADDRESS, id->dst_fn)
> +&& !sanitize_flags_p (SANITIZE_HWADDRESS, id->dst_fn))
> +  {
> +gsi_remove (&copy_gsi, false);
> +continue;
> +  }
> +break;
> +  default:
> +break;
> +  }
> 
>  /* Statements produced by inlining can be unfolded, especially
> when we constant propagated some operands.  We can't fold
> --- gcc/testsuite/gcc.dg/asan/pr114956.c.jj	2024-05-06 10:54:52.601892840 +0200
> +++ gcc/testsuite/gcc.dg/asan/pr114956.c	2024-05-06 10:54:33.920143734 +0200
> @@ -0,0 +1,26 @@
> +/* PR sanitizer/114956 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fsanitize=address,null" } */
> +
> +int **a;
> +void qux (int *);
> +
> +__attribute__((always_inline)) static inline int *
> +foo (void)
> +{
> +  int b[1];
> +  qux (b);
> +  return a[1];
> +}
> +
> +__attribute__((no_sanitize_address)) void
> +bar (void)
> +{
> +  *a = foo ();
> +}
> +
> +void
> +baz (void)
> +{
> +  bar ();
> +}
> 
>Jakub
> 
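
A toy C model shows the effect of the new switch in copy_bb. Everything here is hypothetical. Statements are reduced to an enum. A single flag stands for "the caller has address or hwaddress sanitization enabled", which corresponds to the two sanitize_flags_p checks in the patch. .TSAN_FUNC_EXIT is always dropped. .ASAN_MARK is dropped only when the caller is not sanitized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statement kinds standing in for GIMPLE statements.  */
enum stmt_kind { STMT_OTHER, STMT_TSAN_FUNC_EXIT, STMT_ASAN_MARK };

/* Copy the N statements of SRC into DST, filtering internal calls the way
   the patched copy_bb does.  CALLER_SANITIZED models "address or hwaddress
   sanitization is enabled for the caller".  Returns the number copied.  */
size_t copy_inlined_body (const enum stmt_kind *src, size_t n,
                          enum stmt_kind *dst, bool caller_sanitized)
{
  size_t out = 0;
  for (size_t i = 0; i < n; i++)
    {
      switch (src[i])
        {
        case STMT_TSAN_FUNC_EXIT:
          continue;                /* always dropped when inlining */
        case STMT_ASAN_MARK:
          if (!caller_sanitized)
            continue;              /* dropped in no_sanitize callers */
          break;
        default:
          break;
        }
      dst[out++] = src[i];
    }
  return out;
}
```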


Re: [PATCH v21 20/23] c++: Implement __is_invocable built-in trait

2024-05-07 Thread Ken Matsui
On Tue, May 7, 2024 at 11:36 AM Jason Merrill  wrote:
>
> On 5/3/24 16:52, Ken Matsui wrote:
> > Fixed datum reference problem.  Ok for trunk?
> >
> > -- >8 --
> >
> > This patch implements built-in trait for std::is_invocable.
> >
> > gcc/cp/ChangeLog:
> >
> >   * cp-trait.def: Define __is_invocable.
> >   * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_INVOCABLE.
> >   * semantics.cc (trait_expr_value): Likewise.
> >   (finish_trait_expr): Likewise.
> >   * cp-tree.h (build_invoke): New function.
> >   * method.cc (build_invoke): New function.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/ext/has-builtin-1.C: Test existence of __is_invocable.
> >   * g++.dg/ext/is_invocable1.C: New test.
> >   * g++.dg/ext/is_invocable2.C: New test.
> >   * g++.dg/ext/is_invocable3.C: New test.
> >   * g++.dg/ext/is_invocable4.C: New test.
> >
> > Signed-off-by: Ken Matsui 
> > ---
> >   gcc/cp/constraint.cc |   6 +
> >   gcc/cp/cp-trait.def  |   1 +
> >   gcc/cp/cp-tree.h |   2 +
> >   gcc/cp/method.cc | 137 +
> >   gcc/cp/semantics.cc  |   5 +
> >   gcc/testsuite/g++.dg/ext/has-builtin-1.C |   3 +
> >   gcc/testsuite/g++.dg/ext/is_invocable1.C | 349 +++
> >   gcc/testsuite/g++.dg/ext/is_invocable2.C | 139 +
> >   gcc/testsuite/g++.dg/ext/is_invocable3.C |  51 
> >   gcc/testsuite/g++.dg/ext/is_invocable4.C |  33 +++
> >   10 files changed, 726 insertions(+)
> >   create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable1.C
> >   create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable2.C
> >   create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable3.C
> >   create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable4.C
> >
> > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > index c28d7bf428e..6d14ef7dcc7 100644
> > --- a/gcc/cp/constraint.cc
> > +++ b/gcc/cp/constraint.cc
> > @@ -3792,6 +3792,12 @@ diagnose_trait_expr (tree expr, tree args)
> >   case CPTK_IS_FUNCTION:
> > inform (loc, "  %qT is not a function", t1);
> > break;
> > +case CPTK_IS_INVOCABLE:
> > +  if (!t2)
> > +inform (loc, "  %qT is not invocable", t1);
> > +  else
> > +inform (loc, "  %qT is not invocable by %qE", t1, t2);
> > +  break;
> >   case CPTK_IS_LAYOUT_COMPATIBLE:
> > inform (loc, "  %qT is not layout compatible with %qT", t1, t2);
> > break;
> > diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
> > index b1c875a6e7d..4e420d5390a 100644
> > --- a/gcc/cp/cp-trait.def
> > +++ b/gcc/cp/cp-trait.def
> > @@ -75,6 +75,7 @@ DEFTRAIT_EXPR (IS_EMPTY, "__is_empty", 1)
> >   DEFTRAIT_EXPR (IS_ENUM, "__is_enum", 1)
> >   DEFTRAIT_EXPR (IS_FINAL, "__is_final", 1)
> >   DEFTRAIT_EXPR (IS_FUNCTION, "__is_function", 1)
> > +DEFTRAIT_EXPR (IS_INVOCABLE, "__is_invocable", -1)
> >   DEFTRAIT_EXPR (IS_LAYOUT_COMPATIBLE, "__is_layout_compatible", 2)
> >   DEFTRAIT_EXPR (IS_LITERAL_TYPE, "__is_literal_type", 1)
> >   DEFTRAIT_EXPR (IS_MEMBER_FUNCTION_POINTER, "__is_member_function_pointer", 1)
> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> > index 52d6841559c..8aa41f7147f 100644
> > --- a/gcc/cp/cp-tree.h
> > +++ b/gcc/cp/cp-tree.h
> > @@ -7340,6 +7340,8 @@ extern tree get_copy_assign (tree);
> >   extern tree get_default_ctor(tree);
> >   extern tree get_dtor(tree, tsubst_flags_t);
> >   extern tree build_stub_object   (tree);
> > +extern tree build_invoke (tree, const_tree,
> > +  tsubst_flags_t);
> >   extern tree strip_inheriting_ctors  (tree);
> >   extern tree inherited_ctor_binfo(tree);
> >   extern bool base_ctor_omit_inherited_parms  (tree);
> > diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
> > index 08a3d34fb01..80791227a0a 100644
> > --- a/gcc/cp/method.cc
> > +++ b/gcc/cp/method.cc
> > @@ -1928,6 +1928,143 @@ build_trait_object (tree type)
> > return build_stub_object (type);
> >   }
> >
> > +/* [func.require] Build an expression of INVOKE(FN_TYPE, ARG_TYPES...).  If the
> > +   given is not invocable, returns error_mark_node.  */
> > +
> > +tree
> > +build_invoke (tree fn_type, const_tree arg_types, tsubst_flags_t complain)
> > +{
> > +  if (error_operand_p (fn_type) || error_operand_p (arg_types))
> > +return error_mark_node;
> > +
> > +  gcc_assert (TYPE_P (fn_type));
> > +  gcc_assert (TREE_CODE (arg_types) == TREE_VEC);
> > +
> > +  /* Access check is required to determine if the given is invocable.  */
> > +  deferring_access_check_sentinel acs (dk_no_deferred);
> > +
> > +  /* INVOKE is an unevaluated context.  */
> > +  cp_unevaluated cp_uneval_guard;
> > +
> > +  bool is_ptrdatamem;
> > +  bool is_ptrmemfunc;
> > +  if (TREE_CODE (fn_type) == 

Re: [PATCH v21 20/23] c++: Implement __is_invocable built-in trait

2024-05-07 Thread Jason Merrill

On 5/3/24 16:52, Ken Matsui wrote:

Fixed datum reference problem.  Ok for trunk?

-- >8 --

This patch implements built-in trait for std::is_invocable.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_invocable.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_INVOCABLE.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.
* cp-tree.h (build_invoke): New function.
* method.cc (build_invoke): New function.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_invocable.
* g++.dg/ext/is_invocable1.C: New test.
* g++.dg/ext/is_invocable2.C: New test.
* g++.dg/ext/is_invocable3.C: New test.
* g++.dg/ext/is_invocable4.C: New test.

Signed-off-by: Ken Matsui 
---
  gcc/cp/constraint.cc |   6 +
  gcc/cp/cp-trait.def  |   1 +
  gcc/cp/cp-tree.h |   2 +
  gcc/cp/method.cc | 137 +
  gcc/cp/semantics.cc  |   5 +
  gcc/testsuite/g++.dg/ext/has-builtin-1.C |   3 +
  gcc/testsuite/g++.dg/ext/is_invocable1.C | 349 +++
  gcc/testsuite/g++.dg/ext/is_invocable2.C | 139 +
  gcc/testsuite/g++.dg/ext/is_invocable3.C |  51 
  gcc/testsuite/g++.dg/ext/is_invocable4.C |  33 +++
  10 files changed, 726 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable1.C
  create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable2.C
  create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable3.C
  create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable4.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index c28d7bf428e..6d14ef7dcc7 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3792,6 +3792,12 @@ diagnose_trait_expr (tree expr, tree args)
  case CPTK_IS_FUNCTION:
inform (loc, "  %qT is not a function", t1);
break;
+case CPTK_IS_INVOCABLE:
+  if (!t2)
+inform (loc, "  %qT is not invocable", t1);
+  else
+inform (loc, "  %qT is not invocable by %qE", t1, t2);
+  break;
  case CPTK_IS_LAYOUT_COMPATIBLE:
inform (loc, "  %qT is not layout compatible with %qT", t1, t2);
break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index b1c875a6e7d..4e420d5390a 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -75,6 +75,7 @@ DEFTRAIT_EXPR (IS_EMPTY, "__is_empty", 1)
  DEFTRAIT_EXPR (IS_ENUM, "__is_enum", 1)
  DEFTRAIT_EXPR (IS_FINAL, "__is_final", 1)
  DEFTRAIT_EXPR (IS_FUNCTION, "__is_function", 1)
+DEFTRAIT_EXPR (IS_INVOCABLE, "__is_invocable", -1)
  DEFTRAIT_EXPR (IS_LAYOUT_COMPATIBLE, "__is_layout_compatible", 2)
  DEFTRAIT_EXPR (IS_LITERAL_TYPE, "__is_literal_type", 1)
  DEFTRAIT_EXPR (IS_MEMBER_FUNCTION_POINTER, "__is_member_function_pointer", 1)
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 52d6841559c..8aa41f7147f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7340,6 +7340,8 @@ extern tree get_copy_assign   (tree);
  extern tree get_default_ctor  (tree);
  extern tree get_dtor  (tree, tsubst_flags_t);
  extern tree build_stub_object (tree);
+extern tree build_invoke   (tree, const_tree,
+tsubst_flags_t);
  extern tree strip_inheriting_ctors(tree);
  extern tree inherited_ctor_binfo  (tree);
  extern bool base_ctor_omit_inherited_parms(tree);
diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index 08a3d34fb01..80791227a0a 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -1928,6 +1928,143 @@ build_trait_object (tree type)
return build_stub_object (type);
  }
  
+/* [func.require] Build an expression of INVOKE(FN_TYPE, ARG_TYPES...).  If the
+   given is not invocable, returns error_mark_node.  */
+
+tree
+build_invoke (tree fn_type, const_tree arg_types, tsubst_flags_t complain)
+{
+  if (error_operand_p (fn_type) || error_operand_p (arg_types))
+return error_mark_node;
+
+  gcc_assert (TYPE_P (fn_type));
+  gcc_assert (TREE_CODE (arg_types) == TREE_VEC);
+
+  /* Access check is required to determine if the given is invocable.  */
+  deferring_access_check_sentinel acs (dk_no_deferred);
+
+  /* INVOKE is an unevaluated context.  */
+  cp_unevaluated cp_uneval_guard;
+
+  bool is_ptrdatamem;
+  bool is_ptrmemfunc;
+  if (TREE_CODE (fn_type) == REFERENCE_TYPE)
+{
+  tree deref_fn_type = TREE_TYPE (fn_type);
+  is_ptrdatamem = TYPE_PTRDATAMEM_P (deref_fn_type);
+  is_ptrmemfunc = TYPE_PTRMEMFUNC_P (deref_fn_type);
+
+  /* Dereference fn_type if it is a pointer to member.  */
+  if (is_ptrdatamem || is_ptrmemfunc)
+   fn_type = deref_fn_type;
+}
+  else
+{
+  is_ptrdatamem = TYPE_PTRDATAMEM_P (fn_type);
+  is_ptrmemfunc = TYPE_PTRMEMFUNC_P (fn_type);
+}
+
+  if (is_ptrdatamem && 

Re: [V2][PATCH] gcc-14/changes.html: Deprecate a GCC C extension on flexible array members.

2024-05-07 Thread Qing Zhao


On May 7, 2024, at 13:57, Sebastian Huber wrote:

On 07.05.24 16:26, Qing Zhao wrote:
Hi, Sebastian,
Thanks for your explanation.
Our goal is to deprecate the GCC extension on a structure containing a flexible
array member not at the end of another structure. To achieve this goal, we
provided the warning option -Wflex-array-member-not-at-end for users to
locate all such cases in their source code and update the source code to
eliminate them.

What is the benefit of deprecating this GCC extension? If GCC extensions are 
removed, then it would be nice to enable the associated warnings by default.

We had a long discussion before deciding to deprecate this GCC extension.
Please see the details here:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832

Yes, we do plan to enable this warning by default before the final deprecation.
(We might consider enabling this warning by default in GCC 15 and then
deprecating the extension in the next release.)

Right now, there is ongoing work in the Linux kernel to get rid of all such
cases. Kees might have more information on this.


The static initialization of structures with flexible array members will still 
work as long as the flexible array members are at
the end of the structures.

Removing the support for flexible array members in the middle of compounds will 
make the static initialization practically infeasible.

If the flexible array member is moved to the end of the compounds, the static
initialization still works. What’s the issue here?

My question: is it possible to update your source code to move the structure 
with flexible array member to the end of the
containing structure?
i.e., in your example, in the struct Thread_Configured_control, move the field
“Thread_Control Control” to the end of the structure?

If we move the Thread_Control to the end, how would I add a configuration 
defined number of elements at the end?
I don’t understand this: why would moving “Thread_Control Control” to the end of
the containing structure be a problem?
Could you please explain this with a simplified example?

Thanks.

Qing

--
embedded brains GmbH & Co. KG
Mr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Register court: Amtsgericht München
Registration number: HRB 157899
Managing directors authorized to represent the company: Peter Rasmussen, Thomas Dörfler
Our privacy policy can be found here:
https://embedded-brains.de/datenschutzerklaerung/
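
The following minimal C sketch shows the two shapes under discussion. It uses hypothetical type names, not the actual Thread_Control definitions from Sebastian's code. The deprecated shape appears only in a comment. The supported shape keeps the flexible array member at the end of the outermost object and fixes the element count when the storage is allocated. This sketch uses dynamic allocation, so it does not by itself answer the static-initialization question raised above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical structure ending in a flexible array member.  */
struct control
{
  int state;
  int nodes[];          /* flexible array member: must be the last member */
};

/* Deprecated GCC extension, diagnosed by -Wflex-array-member-not-at-end:
     struct configured { struct control ctl; int extra[4]; };
   i.e. a structure with a flexible array member embedded before other
   members of an enclosing structure.  */

/* Supported shape: the flexible array member ends the outermost object and
   the number of elements is fixed when the storage is allocated.  */
struct control *control_alloc (size_t n_nodes)
{
  struct control *c = malloc (sizeof *c + n_nodes * sizeof c->nodes[0]);
  if (c == NULL)
    return NULL;
  c->state = 0;
  for (size_t i = 0; i < n_nodes; i++)
    c->nodes[i] = (int) i;
  return c;
}
```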



Re: [V2][PATCH] gcc-14/changes.html: Deprecate a GCC C extension on flexible array members.

2024-05-07 Thread Sebastian Huber

On 07.05.24 16:26, Qing Zhao wrote:

Hi, Sebastian,

Thanks for your explanation.

Our goal is to deprecate the GCC extension on a structure containing a flexible
array member not at the end of another structure. To achieve this goal, we
provided the warning option -Wflex-array-member-not-at-end for users to
locate all such cases in their source code and update the source code to
eliminate them.


What is the benefit of deprecating this GCC extension? If GCC extensions 
are removed, then it would be nice to enable the associated warnings by 
default.




The static initialization of structures with flexible array members will still 
work as long as the flexible array members are at
the end of the structures.


Removing the support for flexible array members in the middle of 
compounds will make the static initialization practically infeasible.




My question: is it possible to update your source code to move the structure 
with flexible array member to the end of the
containing structure?

i.e., in your example, in the struct Thread_Configured_control, move the field
“Thread_Control Control” to the end of the structure?


If we move the Thread_Control to the end, how would I add a 
configuration defined number of elements at the end?




[patch,avr] PR114975: Better 8-bit popcount detection.

2024-05-07 Thread Georg-Johann Lay

Add a pattern for better popcount detection.

Ok for master?

Johann

--

AVR: target/114975 - Add combine-pattern for __popcountqi2.

PR target/114975
gcc/
* config/avr/avr.md: Add combine pattern for
8-bit popcount detection.

gcc/testsuite/
* gcc.target/avr/pr114975-popcount.c: New test.

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index 97f42be7729..d4fcff46123 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -8527,6 +8542,19 @@ (define_expand "popcountsi2"
 operands[2] = gen_reg_rtx (HImode);
   })
 
+(define_insn_and_split "*popcounthi2.split8"
+  [(set (reg:HI 24)
+(zero_extend:HI (popcount:QI (match_operand:QI 0 "register_operand"))))]
+  "! reload_completed"
+  { gcc_unreachable(); }
+  "&& 1"
+  [(set (reg:QI 24)
+(match_dup 0))
+   (set (reg:QI 24)
+(popcount:QI (reg:QI 24)))
+   (set (reg:QI 25)
+(const_int 0))])
+
 (define_insn_and_split "*popcounthi2.libgcc_split"
   [(set (reg:HI 24)
 (popcount:HI (reg:HI 24)))]
diff --git a/gcc/testsuite/gcc.target/avr/pr114975-popcount.c b/gcc/testsuite/gcc.target/avr/pr114975-popcount.c
new file mode 100644
index 000..87eb56b56c5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/pr114975-popcount.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Os" } */
+
+typedef __UINT8_TYPE__ uint8_t;
+
+uint8_t use_pop1 (int y, uint8_t x)
+{
+return 1 + __builtin_popcount (x);
+}
+
+uint8_t use_pop2 (uint8_t x)
+{
+	x += 1;
+return 1 - __builtin_popcount (x);
+}
+
+/* { dg-final { scan-assembler-times "__popcountqi2" 2 } } */
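
The new split turns zero_extend:HI (popcount:QI x) into the 8-bit popcount (the test expects __popcountqi2), followed by clearing r25. This is presumably valid because the popcount of a zero-extended byte equals its 8-bit popcount, and that count never exceeds 8, so the high byte is always zero. A small C check of that identity follows. `popcount8` and `popcount_identity_holds` are hypothetical helpers, and `__builtin_popcount` is the GCC builtin used in the test.

```c
#include <assert.h>
#include <stdint.h>

/* Reference 8-bit popcount.  */
unsigned popcount8 (uint8_t x)
{
  unsigned n = 0;
  for (; x != 0; x &= (uint8_t) (x - 1))   /* clear the lowest set bit */
    n++;
  return n;
}

/* Return 1 if, for every byte, the popcount of its zero extension equals
   the 8-bit popcount and fits in the low byte of a 16-bit result.  */
int popcount_identity_holds (void)
{
  for (unsigned v = 0; v < 256; v++)
    {
      uint16_t widened = (uint16_t) v;        /* zero extension */
      unsigned wide = (unsigned) __builtin_popcount (widened);
      if (wide != popcount8 ((uint8_t) v) || (wide >> 8) != 0)
        return 0;
    }
  return 1;
}
```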


[patch,avr] PR114975: Better 8-bit parity detection.

2024-05-07 Thread Georg-Johann Lay

Add a combine pattern for parity detection.

Ok for master?

Johann

AVR: target/114975 - Add combine-pattern for __parityqi2.

PR target/114975
gcc/
* config/avr/avr.md: Add combine pattern for
8-bit parity detection.

gcc/testsuite/
* gcc.target/avr/pr114975-parity.c: New test.

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index 97f42be7729..d4fcff46123 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -8418,7 +8418,22 @@ (define_insn_and_split "*parityhi2"
(set (match_dup 0)
 (reg:HI 24))])
 
-(define_insn_and_split "*parityqihi2"
+(define_insn_and_split "*parityqihi2.1"
+  [(set (match_operand:HI 0 "register_operand""=r")
+(zero_extend:HI
+ (parity:QI (match_operand:QI 1 "register_operand" "r"))))
+   (clobber (reg:HI 24))]
+  "!reload_completed"
+  { gcc_unreachable(); }
+  "&& 1"
+  [(set (reg:QI 24)
+(match_dup 1))
+   (set (reg:HI 24)
+(zero_extend:HI (parity:QI (reg:QI 24))))
+   (set (match_dup 0)
+(reg:HI 24))])
+
+(define_insn_and_split "*parityqihi2.2"
   [(set (match_operand:HI 0 "register_operand"   "=r")
 (parity:HI (match_operand:QI 1 "register_operand" "r")))
(clobber (reg:HI 24))]
diff --git a/gcc/testsuite/gcc.target/avr/pr114975-parity.c b/gcc/testsuite/gcc.target/avr/pr114975-parity.c
new file mode 100644
index 000..767ced0a464
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/pr114975-parity.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Os" } */
+
+typedef __UINT8_TYPE__ uint8_t;
+
+uint8_t use_pary1 (int y, uint8_t x)
+{
+return 1 + __builtin_parity (x);
+}
+
+uint8_t use_pary2 (uint8_t x)
+{
+	x += 1;
+return 1 - __builtin_parity (x);
+}
+
+/* { dg-final { scan-assembler-times "__parityqi2" 2 } } */
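
As with popcount, the new *parityqihi2.1 pattern operates on the zero-extended 8-bit parity. Zero-extending the operand first cannot change the result, because the extra bits are all zero. Below is a hypothetical C reference for the 8-bit case, checked exhaustively against `__builtin_parity`.

```c
#include <assert.h>
#include <stdint.h>

/* Reference 8-bit parity: 1 if an odd number of bits is set.  */
unsigned parity8 (uint8_t x)
{
  x ^= x >> 4;   /* fold the high nibble into the low nibble */
  x ^= x >> 2;
  x ^= x >> 1;
  return x & 1u;
}

/* Return 1 if parity of every zero-extended byte matches parity8.  */
int parity_identity_holds (void)
{
  for (unsigned v = 0; v < 256; v++)
    if ((unsigned) __builtin_parity (v) != parity8 ((uint8_t) v))
      return 0;
  return 1;
}
```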


[r15-268 Regression] FAIL: gcc.target/i386/pr101950-2.c scan-assembler-times \txor[ql]\t 2 on Linux/x86_64

2024-05-07 Thread haochen.jiang
On Linux/x86_64,

9dbff9c05520a74e6cd337578f27b56c941f64f3 is the first bad commit
commit 9dbff9c05520a74e6cd337578f27b56c941f64f3
Author: Richard Biener 
Date:   Tue May 7 10:14:19 2024 +0200

Revert "Revert "combine: Don't combine if I2 does not change""

caused

FAIL: gcc.target/i386/pr101950-2.c scan-assembler-times \txor[ql]\t 2

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-268/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr101950-2.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr101950-2.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-07 Thread Jonathan Wakely
On Tue, 7 May 2024 at 17:39, Jonathan Wakely  wrote:
>
> On Tue, 7 May 2024 at 17:33, Jeff Law wrote:
> >
> >
> >
> > On 5/7/24 9:36 AM, Andreas Schwab wrote:
> > > On May 07 2024, Jonathan Wakely wrote:
> > >
> > >> +#ifdef __riscv
> > >> +return _M_insert(__builtin_copysign((double)__f,
> > >> +(double)-__builtin_signbit(__f));
> > >
> > > Should this use static_cast?
>
> Meh. It wouldn't fit in 80 columns any more with static_cast, and it
> means exactly the same thing.
>
> > And it's missing a close paren.
>
> Now that's more important! Thanks.

Also, I've just realised that signbit might return a negative value if
the signbit is set. The spec only says it returns non-zero if the
signbit is set.

So maybe we want:

#ifdef __riscv
const int __neg = __builtin_signbit(__f) ? -1 : 0;
return _M_insert(__builtin_copysign(static_cast<double>(__f),
  static_cast<double>(__neg)));
#else
return _M_insert(static_cast<double>(__f));
#endif
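
The signbit concern can be illustrated in C. The hypothetical helper `widen_keep_sign` mirrors the suggestion above. It normalizes signbit() to a -1.0 or 0.0 sign source instead of negating the raw return value, which is only guaranteed to be nonzero for negative inputs. On targets whose float-to-double conversion keeps the NaN sign (such as x86), the plain cast already does this. The patch targets RISC-V, where, as the thread's subject implies, it does not.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: widen F to double without losing the sign of a NaN.
   signbit() only promises a nonzero result for negative values, so map it
   to -1.0 / 0.0 and use that as the copysign sign source.  */
double widen_keep_sign (float f)
{
  return __builtin_copysign ((double) f, signbit (f) ? -1.0 : 0.0);
}
```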


Re: [PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-07 Thread Jonathan Wakely
On Tue, 7 May 2024 at 17:33, Jeff Law wrote:
>
>
>
> On 5/7/24 9:36 AM, Andreas Schwab wrote:
> > On May 07 2024, Jonathan Wakely wrote:
> >
> >> +#ifdef __riscv
> >> +return _M_insert(__builtin_copysign((double)__f,
> >> +(double)-__builtin_signbit(__f));
> >
> > Should this use static_cast?

Meh. It wouldn't fit in 80 columns any more with static_cast, and it
means exactly the same thing.

> And it's missing a close paren.

Now that's more important! Thanks.


RE: [EXTERNAL] Re: [PATCH v3 00/12] Add aarch64-w64-mingw32 target

2024-05-07 Thread Zac Walker
Cool - congratulations everyone!!
Thanks for getting it completed. Fantastic effort from you all.

Zac

-----Original Message-----
From: Christophe Lyon  
Sent: Tuesday, May 7, 2024 6:06 PM
To: Evgeny Karpov 
Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com; Richard Earnshaw 
(lists) ; Maxim Kuvyrkov ; 
Radek Barton ; Zac Walker 
Subject: [EXTERNAL] Re: [PATCH v3 00/12] Add aarch64-w64-mingw32 target

Hi,

I've just pushed this patch series, congratulations!

Thanks,

Christophe


On Thu, 11 Apr 2024 at 15:40, Evgeny Karpov  wrote:
>
> Hello,
>
> Thank you for reviewing v2!
> v3 addresses all comments on v2.
>
> v3 Changes:
> - Exclude the aarch64_calling_abi declaration from the patch series.
> - Refactor x18 adjustment for MS ABI.
> - Remove unnecessary headers.
> - Add an extra comment to explain empty definitions.
> - Use gcc_unreachable for definitions that are needed for compilation, 
> but not used by the aarch64-w64-mingw32 target.
> - Retain old index entries.
> - Rebase from 11th April 2024
>
> Regards,
> Evgeny
>
>
> Zac Walker (12):
>   Introduce aarch64-w64-mingw32 target
>   aarch64: Mark x18 register as a fixed register for MS ABI
>   aarch64: Add aarch64-w64-mingw32 COFF
>   Reuse MinGW from i386 for AArch64
>   Rename section and encoding functions from i386 which will be used in
> aarch64
>   Exclude i386 functionality from aarch64 build
>   aarch64: Add Cygwin and MinGW environments for AArch64
>   aarch64: Add SEH to machine_function
>   Rename "x86 Windows Options" to "Cygwin and MinGW Options"
>   aarch64: Build and add objects for Cygwin and MinGW for AArch64
>   aarch64: Add aarch64-w64-mingw32 target to libatomic
>   Add aarch64-w64-mingw32 target to libgcc
>
>  fixincludes/mkfixinc.sh   |   3 +-
>  gcc/config.gcc|  47 +++--
>  gcc/config/aarch64/aarch64-abi-ms.h   |  34 
>  gcc/config/aarch64/aarch64-coff.h |  91 +
>  gcc/config/aarch64/aarch64-protos.h   |   5 +
>  gcc/config/aarch64/aarch64.h  |  13 +-
>  gcc/config/aarch64/cygming.h  | 172 ++
>  gcc/config/i386/cygming.h |  18 +-
>  gcc/config/i386/cygming.opt.urls  |  30 ---
>  gcc/config/i386/i386-protos.h |  12 +-
>  gcc/config/i386/mingw-w64.opt.urls|   2 +-
>  gcc/config/lynx.opt.urls  |   2 +-
>  gcc/config/{i386 => mingw}/cygming.opt|   0
>  gcc/config/mingw/cygming.opt.urls |  30 +++
>  gcc/config/{i386 => mingw}/cygwin-d.cc|   0
>  gcc/config/{i386 => mingw}/mingw-stdint.h |   9 +-
>  gcc/config/{i386 => mingw}/mingw.opt  |   0
>  gcc/config/{i386 => mingw}/mingw.opt.urls |   2 +-
>  gcc/config/{i386 => mingw}/mingw32.h  |   4 +-
>  gcc/config/{i386 => mingw}/msformat-c.cc  |   0
>  gcc/config/{i386 => mingw}/t-cygming  |  23 ++-
>  gcc/config/{i386 => mingw}/winnt-cxx.cc   |   0
>  gcc/config/{i386 => mingw}/winnt-d.cc |   0
>  gcc/config/{i386 => mingw}/winnt-stubs.cc |   0
>  gcc/config/{i386 => mingw}/winnt.cc   |  30 +--
>  gcc/doc/invoke.texi   |  10 +
>  gcc/varasm.cc |   2 +-
>  libatomic/configure.tgt   |   2 +-
>  libgcc/config.host|  23 ++-
>  libgcc/config/aarch64/t-no-eh |   2 +
>  libgcc/config/{i386 => mingw}/t-gthr-win32|   0
>  libgcc/config/{i386 => mingw}/t-mingw-pthread |   0
>  32 files changed, 473 insertions(+), 93 deletions(-)
>  create mode 100644 gcc/config/aarch64/aarch64-abi-ms.h
>  create mode 100644 gcc/config/aarch64/aarch64-coff.h
>  create mode 100644 gcc/config/aarch64/cygming.h
>  delete mode 100644 gcc/config/i386/cygming.opt.urls
>  rename gcc/config/{i386 => mingw}/cygming.opt (100%)
>  create mode 100644 gcc/config/mingw/cygming.opt.urls
>  rename gcc/config/{i386 => mingw}/cygwin-d.cc (100%)
>  rename gcc/config/{i386 => mingw}/mingw-stdint.h (86%)
>  rename gcc/config/{i386 => mingw}/mingw.opt (100%)
>  rename gcc/config/{i386 => mingw}/mingw.opt.urls (86%)
>  rename gcc/config/{i386 => mingw}/mingw32.h (99%)
>  rename gcc/config/{i386 => mingw}/msformat-c.cc (100%)
>  rename gcc/config/{i386 => mingw}/t-cygming (73%)
>  rename gcc/config/{i386 => mingw}/winnt-cxx.cc (100%)
>  rename gcc/config/{i386 => mingw}/winnt-d.cc (100%)
>  rename gcc/config/{i386 => mingw}/winnt-stubs.cc (100%)
>  rename gcc/config/{i386 => mingw}/winnt.cc (97%)
>  create mode 100644 libgcc/config/aarch64/t-no-eh
>  rename libgcc/config/{i386 => mingw}/t-gthr-win32 (100%)
>  rename libgcc/config/{i386 => mingw}/t-mingw-pthread (100%)
>
> --
> 2.25.1
>


Re: [PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-07 Thread Jeff Law




On 5/7/24 9:36 AM, Andreas Schwab wrote:
> On May 07 2024, Jonathan Wakely wrote:
>
>> +#ifdef __riscv
>> +   return _M_insert(__builtin_copysign((double)__f,
>> +   (double)-__builtin_signbit(__f));
>
> Should this use static_cast?

And it's missing a close paren.

jeff


Re: [PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-07 Thread Palmer Dabbelt
[+Adhemerval and Letu, who handled the glibc side of things, in case 
they have any more context.]


On Tue, 07 May 2024 07:11:08 PDT (-0700), jwak...@redhat.com wrote:

> On Tue, 7 May 2024 at 15:06, Jonathan Wakely wrote:
>>
>> On Tue, 7 May 2024 at 14:57, Jeff Law wrote:
>> >
>> > On 5/7/24 7:49 AM, Jonathan Wakely wrote:
>> > > Do we want this change for RISC-V, to fix PR113578?
>> > >
>> > > I haven't tested it on RISC-V, only on x86_64-linux (where it doesn't do
>> > > anything).
>> > >
>> > > -- >8 --
>> > >
>> > > libstdc++-v3/ChangeLog:
>> > >
>> > >   PR libstdc++/113578
>> > >   * include/std/ostream (operator<<(basic_ostream&, float)):
>> > >   Restore signbit after converting to double.
>> > No strong opinion. One could argue that the existence of a
>> > conditional like that inherently implies the generic code is dependent
>> > on specific processor behavior which probably is unwise.  But again, no
>> > strong opinion.
>>
>> Yes, but I'm not aware of any other processors that lose the signbit
>> like this, so in practice it's always worked fine to cast the float to
>> double.
>
> The similar glibc fix for strfrom is specific to RISC-V:
> https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0cc0033ef19bd3378445c2b851e53d7255cb1b1e


I missed the glibc patch, but IIUC the issue here is NaN 
canonicalization losing sign bits.  Presumably it's OK to lose the other 
bits?  Otherwise we'd need some different twiddling.


Either way, I think having the signed-NaN-preserving conversion is 
reasonable as it's what users are going to expect (even if it's only 
recommended by IEEE).  So


Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 

in case you want to pick it up.  I guess we should backport this too?

Maybe we should also have some sort of arch-independent 
`double __builtin_float_to_double_with_nan_sign_bits(float)` sort of 
thing?  Then we could just use it everywhere rather than duplicating 
this logic all over the place.



> My patch uses copysign unconditionally, to avoid branching on isnan. I
> don't know if that's the right choice.


IMO it's fine: it looks like this can get inlined so having the slightly 
shorter code sequence would help, and it's on an IO path so I doubt 
unconditionally executing the extra conversion instructions really 
matters.
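The two options weighed here (unconditional `copysign` versus branching on `isnan`) can be sketched in portable C99. This is a minimal illustration, not the libstdc++ patch: it uses the standard `copysign`, `signbit` and `isnan` from `<math.h>` rather than the `__builtin_` spellings, takes the sign from -1.0 or 1.0 instead of negating the `signbit` result, and the names `widen_keep_sign` and `widen_keep_sign_branchy` are hypothetical (the `__builtin_float_to_double_with_nan_sign_bits` idea above is likewise only a proposal).

```c
#include <math.h>

/* Branchless (hypothetical helper): always reapply the float's sign bit
   after widening.  Where float->double conversion already keeps a NaN's
   sign (e.g. x86_64) this gives the same result as a plain cast; where
   the conversion yields a canonical NaN, as described for RISC-V in this
   thread, it restores the sign.  */
static double widen_keep_sign (float f)
{
  return copysign ((double) f, signbit (f) ? -1.0 : 1.0);
}

/* Branching alternative (hypothetical helper): only NaNs can lose their
   sign, so test for them first and fix up just that case.  */
static double widen_keep_sign_branchy (float f)
{
  double d = (double) f;
  if (isnan (f))
    d = copysign (d, signbit (f) ? -1.0 : 1.0);
  return d;
}
```

The branchless form pays for the `copysign` on every value, the trade-off Palmer considers acceptable on an I/O path; the branching form only pays for it on NaNs.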


[PATCH v2] arm: [MVE intrinsics] Fix support for predicate constants [PR target/114801]

2024-05-07 Thread Christophe Lyon
In this PR, we have to handle a case where MVE predicates are supplied
as a const_int, where individual predicates have illegal boolean
values (such as 0xc for a 4-bit boolean predicate).  To avoid the ICE,
we hide the constant behind an unspec.

On MVE, V8BI and V4BI multi-bit masks are interpreted byte-by-byte at
the instruction level; see
https://developer.arm.com/documentation/101028/0012/14--M-profile-Vector-Extension--MVE--intrinsics.

This is a workaround until we change the representation of such
predicates to V16BImode.
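The validity rule behind this workaround can be sketched in a few lines of C. This is a loop-based formulation for illustration only, not the bit-mask comparison the patch uses, and `mve_pred_uniform` is a hypothetical helper name: a 16-bit predicate constant is well formed when every lane's group of bits (2 bits per lane for V8BI, 4 for V4BI, per the comment in the patch) is all zeros or all ones. The constants 52428 (0xcccc) and 6927 (0x1b0f) that the new test expects in its `movw` lines both fail this check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: return true if every BITS_PER_LANE-bit group of the
   16-bit predicate MASK is either all zeros or all ones, i.e. each lane
   carries a well-formed boolean.  BITS_PER_LANE is 1, 2 or 4.  */
static bool mve_pred_uniform (uint16_t mask, unsigned bits_per_lane)
{
  unsigned lane_ones = (1u << bits_per_lane) - 1;

  for (unsigned shift = 0; shift < 16; shift += bits_per_lane)
    {
      unsigned lane = (mask >> shift) & lane_ones;
      if (lane != 0 && lane != lane_ones)
        return false;   /* mixed bits, e.g. 0xc in a 4-bit lane */
    }
  return true;
}
```

Constants that fail such a check are the ones the patch moves into a register through the new `MVE_PRED` unspec instead of handing the raw CONST_INT to `gen_lowpart`.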

2024-05-06  Christophe Lyon  
Jakub Jelinek  

PR target/114801
gcc/
* config/arm/arm-mve-builtins.cc
(function_expander::add_input_operand): Handle CONST_INT
predicates.
* config/arm/mve.md (set_mve_const_pred): New pattern.
* config/arm/unspecs.md (MVE_PRED): New unspec.

gcc/testsuite/
* gcc.target/arm/mve/pr114801.c: New test.
---
 gcc/config/arm/arm-mve-builtins.cc  | 27 ++-
 gcc/config/arm/mve.md   | 12 +++
 gcc/config/arm/unspecs.md   |  1 +
 gcc/testsuite/gcc.target/arm/mve/pr114801.c | 37 +
 4 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr114801.c

diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
index 6a5775c67e5..7d5af649857 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -2205,7 +2205,32 @@ function_expander::add_input_operand (insn_code icode, rtx x)
   mode = GET_MODE (x);
 }
   else if (VALID_MVE_PRED_MODE (mode))
-x = gen_lowpart (mode, x);
+{
+  if (CONST_INT_P (x) && (mode == V8BImode || mode == V4BImode))
+   {
+ /* In V8BI or V4BI each element has 2 or 4 bits, if those
+bits aren't all the same, gen_lowpart might ICE.  Hide
+the move behind an unspec to avoid this.
+V8BI and V4BI multi-bit masks are interpreted
+byte-by-byte at instruction level, see
+
https://developer.arm.com/documentation/101028/0012/14--M-profile-Vector-Extension--MVE--intrinsics.
  */
+ unsigned HOST_WIDE_INT xi = UINTVAL (x);
+ if ((xi & 0x) != ((xi >> 1) & 0x)
+ || (mode == V4BImode
+ && (xi & 0x) != ((xi >> 2) & 0x)))
+   {
+ rtx unspec_x;
+ unspec_x = gen_rtx_UNSPEC (HImode, gen_rtvec (1, x), MVE_PRED);
+ x = force_reg (HImode, unspec_x);
+   }
+
+   }
+  else if (SUBREG_P (x))
+   /* gen_lowpart on a SUBREG can ICE.  */
+   x = force_reg (GET_MODE (x), x);
+
+  x = gen_lowpart (mode, x);
+}
 
   m_ops.safe_grow (m_ops.length () + 1, true);
   create_input_operand (_ops.last (), x, mode);
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 35916f62604..d337422d695 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -6621,3 +6621,15 @@ (define_expand "@arm_mve_reinterpret"
   }
   }
 )
+
+;; Hide predicate constants from optimizers
+(define_insn "set_mve_const_pred"
+ [(set
+   (match_operand:HI 0 "s_register_operand" "=r")
+   (unspec:HI [(match_operand:HI 1 "general_operand" "n")] MVE_PRED))]
+  "TARGET_HAVE_MVE"
+{
+return "movw%?\t%0, %L1\t%@ set_mve_const_pred";
+}
+  [(set_attr "type" "mov_imm")]
+)
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 4713ec840ab..336f2fe08e6 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -1256,4 +1256,5 @@ (define_c_enum "unspec" [
   SQRSHRL_48
   VSHLCQ_M_
   REINTERPRET
+  MVE_PRED
 ])
diff --git a/gcc/testsuite/gcc.target/arm/mve/pr114801.c b/gcc/testsuite/gcc.target/arm/mve/pr114801.c
new file mode 100644
index 000..fb3e4d855f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/pr114801.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include 
+
+/*
+** test_32:
+**...
+** movw r[0-9]+, 52428  @ set_mve_const_pred
+**...
+*/
+uint32x4_t test_32() {
+  return vdupq_m_n_u32(vdupq_n_u32(0x), 0, 0x);
+}
+
+/*
+** test_16:
+**...
+** movw r[0-9]+, 6927   @ set_mve_const_pred
+**...
+*/
+uint16x8_t test_16() {
+  return vdupq_m_n_u16(vdupq_n_u16(0x), 0, 0x1b0f);
+}
+
+/*
+** test_8:
+**...
+** mov r[0-9]+, #23055 @ movhi
+**...
+*/
+uint8x16_t test_8() {
+  return vdupq_m_n_u8(vdupq_n_u8(0xff), 0, 0x5a0f);
+}
-- 
2.34.1



Re: Fix gnu versioned namespace mode 00/03

2024-05-07 Thread Iain Sandoe
Hi François

> On 4 May 2024, at 22:11, François Dumont  wrote:
> 
> Here is the list of patches to restore gnu versioned namespace mode.
> 
> 1/3: Bump gnu version namespace
> 
> This needs to be done first so that, once the gnu versioned namespace build
> is fixed, there is no chance of another '__8' build with a different ABI
> from the last successful '__8' build.
> 
> 2/3: Fix build using cxx11 abi for versioned namespace
> 
> 3/3: Proposal to default to the "new" ABI when dual ABI is disabled, and to
> accept any default-libstdcxx-abi whether or not dual ABI is enabled.
> 
> All testsuite run for following configs:
> 
> - dual abi
> 
> - gcc4-compatible only abi
> 
> - new only abi
> 
> - versioned namespace abi

At the risk of delaying this (a bit) - I think we should also consider items 
like once_call that have broken impls. in the current library - and at least 
get proposed replacements available behind the versioned namespace; rather than 
using up a namespace version with the current broken code.

I have a proposed once_call replacement (but I think Jonathan also has one or 
more alternatives there)

Please can we try to identify any other similar blocked fixes?

Iain



Re: Fix gnu versioned namespace mode 01/03

2024-05-07 Thread Iain Sandoe
Hi François

As you know I am keen to see this land - but having had some experience with 
applying previous patches to actual toolchain builds .. 

> On 4 May 2024, at 22:11, François Dumont  wrote:
> 
> libstdc++: Bump gnu versioned namespace to __9

I think that the namespace version should be a top-level configure choice
(for example, --with-libstdcxx-namespace-version=); we ought to be able to make
that the only thing that is required to trigger the process.

The reasons are:

 1. (significant) The information is needed by both the (FE) testsuites and the
library.  I do not think it's a nice maintenance job to have to go through
all the testcases that have the namespace visible and change them for each GCC
release; instead we should arrange to set some variable in gcc/site.exp that
the FE tests can consume (I do not think that this is too hard to arrange,
although it might be necessary to figure out how to make scan* tests work with
it).

 2. (nice-to-have) For targets which have never used a versioned namespace it 
seems odd to jump straight from 6 to 9.

Iain

> 
> libstdc++-v3/ChangeLog:
> 
> * acinclude.m4 (libtool_VERSION): Bump to 9:0:0.
> * config/abi/pre/gnu-versioned-namespace.ver (GLIBCXX_8.0): 
> Replace by GLIBCXX_9.0.
> Adapt all references to __8 namespace.
> * configure: Regenerate.
> * include/bits/c++config (_GLIBCXX_BEGIN_NAMESPACE_VERSION): 
> Define as 'namespace __9{'.
> (_GLIBCXX_STD_V): Adapt.
> * include/std/format (to_chars): Update namespace version in 
> symbols alias definitions.
> (__format::_Arg_store): Update namespace version in 
> make_format_args friend
> declaration.
> * python/libstdcxx/v6/printers.py (_versioned_namespace): Assign 
> '__9::'.
> * python/libstdcxx/v6/xmethods.py: Likewise.
> * testsuite/23_containers/map/48101_neg.cc: Adapt dg-error.
> * testsuite/23_containers/multimap/48101_neg.cc: Likewise.
> * testsuite/20_util/function/cons/70692.cc: Likewise.
> * testsuite/20_util/function_objects/bind_back/111327.cc: 
> Likewise.
> * testsuite/20_util/function_objects/bind_front/111327.cc: 
> Likewise.
> * testsuite/lib/prune.exp (libstdc++-dg-prune): Bump version 
> namespace.
> 
> Ok to commit ?
> 
> François
> 



Re: [PATCH v3 00/12] Add aarch64-w64-mingw32 target

2024-05-07 Thread Christophe Lyon
Hi,

I've just pushed this patch series, congratulations!

Thanks,

Christophe


On Thu, 11 Apr 2024 at 15:40, Evgeny Karpov  wrote:
>
> Hello,
>
> Thank you for reviewing v2!
> v3 addresses all comments on v2.
>
> v3 Changes:
> - Exclude the aarch64_calling_abi declaration from the patch series.
> - Refactor x18 adjustment for MS ABI.
> - Remove unnecessary headers.
> - Add an extra comment to explain empty definitions.
> - Use gcc_unreachable for definitions that are needed for compilation,
> but not used by the aarch64-w64-mingw32 target.
> - Retain old index entries.
> - Rebase from 11th April 2024
>
> Regards,
> Evgeny
>
>
> Zac Walker (12):
>   Introduce aarch64-w64-mingw32 target
>   aarch64: Mark x18 register as a fixed register for MS ABI
>   aarch64: Add aarch64-w64-mingw32 COFF
>   Reuse MinGW from i386 for AArch64
>   Rename section and encoding functions from i386 which will be used in
> aarch64
>   Exclude i386 functionality from aarch64 build
>   aarch64: Add Cygwin and MinGW environments for AArch64
>   aarch64: Add SEH to machine_function
>   Rename "x86 Windows Options" to "Cygwin and MinGW Options"
>   aarch64: Build and add objects for Cygwin and MinGW for AArch64
>   aarch64: Add aarch64-w64-mingw32 target to libatomic
>   Add aarch64-w64-mingw32 target to libgcc
>
>  fixincludes/mkfixinc.sh   |   3 +-
>  gcc/config.gcc|  47 +++--
>  gcc/config/aarch64/aarch64-abi-ms.h   |  34 
>  gcc/config/aarch64/aarch64-coff.h |  91 +
>  gcc/config/aarch64/aarch64-protos.h   |   5 +
>  gcc/config/aarch64/aarch64.h  |  13 +-
>  gcc/config/aarch64/cygming.h  | 172 ++
>  gcc/config/i386/cygming.h |  18 +-
>  gcc/config/i386/cygming.opt.urls  |  30 ---
>  gcc/config/i386/i386-protos.h |  12 +-
>  gcc/config/i386/mingw-w64.opt.urls|   2 +-
>  gcc/config/lynx.opt.urls  |   2 +-
>  gcc/config/{i386 => mingw}/cygming.opt|   0
>  gcc/config/mingw/cygming.opt.urls |  30 +++
>  gcc/config/{i386 => mingw}/cygwin-d.cc|   0
>  gcc/config/{i386 => mingw}/mingw-stdint.h |   9 +-
>  gcc/config/{i386 => mingw}/mingw.opt  |   0
>  gcc/config/{i386 => mingw}/mingw.opt.urls |   2 +-
>  gcc/config/{i386 => mingw}/mingw32.h  |   4 +-
>  gcc/config/{i386 => mingw}/msformat-c.cc  |   0
>  gcc/config/{i386 => mingw}/t-cygming  |  23 ++-
>  gcc/config/{i386 => mingw}/winnt-cxx.cc   |   0
>  gcc/config/{i386 => mingw}/winnt-d.cc |   0
>  gcc/config/{i386 => mingw}/winnt-stubs.cc |   0
>  gcc/config/{i386 => mingw}/winnt.cc   |  30 +--
>  gcc/doc/invoke.texi   |  10 +
>  gcc/varasm.cc |   2 +-
>  libatomic/configure.tgt   |   2 +-
>  libgcc/config.host|  23 ++-
>  libgcc/config/aarch64/t-no-eh |   2 +
>  libgcc/config/{i386 => mingw}/t-gthr-win32|   0
>  libgcc/config/{i386 => mingw}/t-mingw-pthread |   0
>  32 files changed, 473 insertions(+), 93 deletions(-)
>  create mode 100644 gcc/config/aarch64/aarch64-abi-ms.h
>  create mode 100644 gcc/config/aarch64/aarch64-coff.h
>  create mode 100644 gcc/config/aarch64/cygming.h
>  delete mode 100644 gcc/config/i386/cygming.opt.urls
>  rename gcc/config/{i386 => mingw}/cygming.opt (100%)
>  create mode 100644 gcc/config/mingw/cygming.opt.urls
>  rename gcc/config/{i386 => mingw}/cygwin-d.cc (100%)
>  rename gcc/config/{i386 => mingw}/mingw-stdint.h (86%)
>  rename gcc/config/{i386 => mingw}/mingw.opt (100%)
>  rename gcc/config/{i386 => mingw}/mingw.opt.urls (86%)
>  rename gcc/config/{i386 => mingw}/mingw32.h (99%)
>  rename gcc/config/{i386 => mingw}/msformat-c.cc (100%)
>  rename gcc/config/{i386 => mingw}/t-cygming (73%)
>  rename gcc/config/{i386 => mingw}/winnt-cxx.cc (100%)
>  rename gcc/config/{i386 => mingw}/winnt-d.cc (100%)
>  rename gcc/config/{i386 => mingw}/winnt-stubs.cc (100%)
>  rename gcc/config/{i386 => mingw}/winnt.cc (97%)
>  create mode 100644 libgcc/config/aarch64/t-no-eh
>  rename libgcc/config/{i386 => mingw}/t-gthr-win32 (100%)
>  rename libgcc/config/{i386 => mingw}/t-mingw-pthread (100%)
>
> --
> 2.25.1
>


[PATCH] expansion: Use __trunchfbf2 calls rather than __extendhfbf2 [PR114907]

2024-05-07 Thread Jakub Jelinek
Hi!

The HF and BF modes have the same size/precision and neither is
a subset nor superset of the other.
So, using either __extendhfbf2 or __trunchfbf2 is weird.
The expansion apparently emits __extendhfbf2, but on the libgcc side
we apparently have __trunchfbf2 implemented.

I think it is easier to switch to using what is available than to add new
entry points to libgcc (even aliases), because this is backportable.
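To make the "neither is a subset of the other" point concrete: IEEE binary16 (HF) has 5 exponent and 10 fraction bits, bfloat16 (BF) has 8 and 7, so HF->BF gains exponent range but can lose up to 3 fraction bits and therefore has to round. The sketch below models that value conversion in portable C, assuming `float` is IEEE binary32 and round-to-nearest-even; it is an illustration only, not libgcc's `__trunchfbf2`, and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Decode IEEE binary16 (HF) bits.  Every binary16 value is exactly
   representable as a binary32 float, so this step never rounds.  */
static float half_bits_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h >> 15) << 31;
  uint32_t expo = (h >> 10) & 0x1f;
  uint32_t mant = h & 0x3ff;
  uint32_t bits;

  if (expo == 0x1f)                /* Inf or NaN */
    bits = sign | 0x7f800000u | (mant << 13);
  else if (expo != 0)              /* normal: rebias 15 -> 127 */
    bits = sign | ((expo + 112) << 23) | (mant << 13);
  else if (mant == 0)              /* signed zero */
    bits = sign;
  else                             /* subnormal: normalize */
    {
      int e = -14;
      while ((mant & 0x400) == 0)
        {
          mant <<= 1;
          e--;
        }
      bits = sign | ((uint32_t) (e + 127) << 23) | ((mant & 0x3ff) << 13);
    }

  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Round a binary32 float to bfloat16 (BF) bits, ties to even.  */
static uint16_t float_to_bf16_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  if ((bits & 0x7fffffffu) > 0x7f800000u)   /* NaN: keep sign, force quiet */
    return (uint16_t) ((bits >> 16) | 0x0040);
  bits += 0x7fffu + ((bits >> 16) & 1);     /* round to nearest even */
  return (uint16_t) (bits >> 16);
}

/* HF -> BF: exact widening to float, then a single rounding step.  */
static uint16_t hf_bits_to_bf16_bits (uint16_t h)
{
  return float_to_bf16_bits (half_bits_to_float (h));
}
```

The testcase value 11.125 is exact in both formats (0x4990 as binary16, 0x4132 as bfloat16), whereas 1 + 2^-10 (binary16 0x3c01) rounds to 1.0 (0x3f80) in bfloat16.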

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-05-07  Jakub Jelinek  

PR middle-end/114907
* expr.cc (convert_mode_scalar): Use trunc_optab rather than
sext_optab for HF->BF conversions.
* optabs-libfuncs.cc (gen_trunc_conv_libfunc): Likewise.

* gcc.dg/pr114907.c: New test.

--- gcc/expr.cc.jj  2024-04-09 09:29:04.0 +0200
+++ gcc/expr.cc 2024-05-06 13:21:33.933798494 +0200
@@ -355,8 +355,16 @@ convert_mode_scalar (rtx to, rtx from, i
  && REAL_MODE_FORMAT (from_mode) == _half_format));
 
   if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
-   /* Conversion between decimal float and binary float, same size.  */
-   tab = DECIMAL_FLOAT_MODE_P (from_mode) ? trunc_optab : sext_optab;
+   {
+ if (REAL_MODE_FORMAT (to_mode) == _bfloat_half_format
+ && REAL_MODE_FORMAT (from_mode) == _half_format)
+   /* libgcc implements just __trunchfbf2, not __extendhfbf2.  */
+   tab = trunc_optab;
+ else
+   /* Conversion between decimal float and binary float, same
+  size.  */
+   tab = DECIMAL_FLOAT_MODE_P (from_mode) ? trunc_optab : sext_optab;
+   }
   else if (GET_MODE_PRECISION (from_mode) < GET_MODE_PRECISION (to_mode))
tab = sext_optab;
   else
--- gcc/optabs-libfuncs.cc.jj   2024-01-03 11:51:31.739728303 +0100
+++ gcc/optabs-libfuncs.cc  2024-05-06 15:50:21.611027802 +0200
@@ -589,7 +589,9 @@ gen_trunc_conv_libfunc (convert_optab ta
   if (GET_MODE_CLASS (float_tmode) != GET_MODE_CLASS (float_fmode))
 gen_interclass_conv_libfunc (tab, opname, float_tmode, float_fmode);
 
-  if (GET_MODE_PRECISION (float_fmode) <= GET_MODE_PRECISION (float_tmode))
+  if (GET_MODE_PRECISION (float_fmode) <= GET_MODE_PRECISION (float_tmode)
+  && (REAL_MODE_FORMAT (float_tmode) != _bfloat_half_format
+ || REAL_MODE_FORMAT (float_fmode) != _half_format))
 return;
 
   if (GET_MODE_CLASS (float_tmode) == GET_MODE_CLASS (float_fmode))
--- gcc/testsuite/gcc.dg/pr114907.c.jj  2024-05-06 15:59:08.734958523 +0200
+++ gcc/testsuite/gcc.dg/pr114907.c 2024-05-06 16:02:38.914139829 +0200
@@ -0,0 +1,27 @@
+/* PR middle-end/114907 */
+/* { dg-do run } */
+/* { dg-options "" } */
+/* { dg-add-options float16 } */
+/* { dg-require-effective-target float16_runtime } */
+/* { dg-add-options bfloat16 } */
+/* { dg-require-effective-target bfloat16_runtime } */
+
+__attribute__((noipa)) _Float16
+foo (__bf16 x)
+{
+  return (_Float16) x;
+}
+
+__attribute__((noipa)) __bf16
+bar (_Float16 x)
+{
+  return (__bf16) x;
+}
+
+int
+main ()
+{
+  if (foo (11.125bf16) != 11.125f16
+  || bar (11.125f16) != 11.125bf16)
+__builtin_abort ();
+}

Jakub



[PATCH] tree-inline: Remove .ASAN_MARK calls when inlining functions into no_sanitize callers [PR114956]

2024-05-07 Thread Jakub Jelinek
Hi!

In r9-5742 we started allowing always_inline functions to be inlined into
functions which have, e.g., address sanitization disabled, even when the
always_inline function itself is sanitized implicitly via command-line options.

This mostly works fine because most of the asan instrumentation is done only
late, after IPA, but as the following testcase shows, the .ASAN_MARK internal
function calls that the gimplifier adds can result in ICEs.

Fixed by dropping those during inlining, similarly to how we drop
.TSAN_FUNC_EXIT calls.
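The decision the new `switch` in `copy_bb` makes can be summarized with a toy model. Everything below is hypothetical: `toy_ifn`, the `TOY_SANITIZE_*` bits and `toy_drop_on_inline` merely stand in for GCC's internal function codes and `sanitize_flags_p`, and the model assumes the statement is being copied because of inlining (the `id->call_stmt` condition in the patch).

```c
#include <stdbool.h>

/* Toy stand-ins, not GCC API: internal call kinds and caller sanitizer bits.  */
enum toy_ifn { TOY_IFN_TSAN_FUNC_EXIT, TOY_IFN_ASAN_MARK, TOY_IFN_OTHER };
enum { TOY_SANITIZE_ADDRESS = 1 << 0, TOY_SANITIZE_HWADDRESS = 1 << 1 };

/* Return true if an internal call copied from the inlined body should be
   dropped, given the sanitizer flags of the caller (dst_fn).  */
static bool toy_drop_on_inline (enum toy_ifn fn, unsigned dst_flags)
{
  switch (fn)
    {
    case TOY_IFN_TSAN_FUNC_EXIT:
      return true;   /* always dropped when inlining */
    case TOY_IFN_ASAN_MARK:
      /* Dropped only if the caller has neither asan nor hwasan enabled.  */
      return (dst_flags & (TOY_SANITIZE_ADDRESS | TOY_SANITIZE_HWADDRESS)) == 0;
    default:
      return false;  /* everything else is copied as before */
    }
}
```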

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-05-07  Jakub Jelinek  

PR sanitizer/114956
* tree-inline.cc: Include asan.h.
(copy_bb): Remove also .ASAN_MARK calls if id->dst_fn has asan/hwasan
sanitization disabled.

* gcc.dg/asan/pr114956.c: New test.

--- gcc/tree-inline.cc.jj   2024-05-03 09:44:21.199055899 +0200
+++ gcc/tree-inline.cc  2024-05-06 10:45:37.231349328 +0200
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.
 #include "symbol-summary.h"
 #include "symtab-thunks.h"
 #include "symtab-clones.h"
+#include "asan.h"
 
 /* I'm not real happy about this, but we need to handle gimple and
non-gimple trees.  */
@@ -2226,13 +2227,26 @@ copy_bb (copy_body_data *id, basic_block
}
  else if (call_stmt
   && id->call_stmt
-  && gimple_call_internal_p (stmt)
-  && gimple_call_internal_fn (stmt) == IFN_TSAN_FUNC_EXIT)
-   {
- /* Drop TSAN_FUNC_EXIT () internal calls during inlining.  */
- gsi_remove (_gsi, false);
- continue;
-   }
+  && gimple_call_internal_p (stmt))
+   switch (gimple_call_internal_fn (stmt))
+ {
+ case IFN_TSAN_FUNC_EXIT:
+   /* Drop .TSAN_FUNC_EXIT () internal calls during inlining.  */
+   gsi_remove (_gsi, false);
+   continue;
+ case IFN_ASAN_MARK:
+   /* Drop .ASAN_MARK internal calls during inlining into
+  no_sanitize functions.  */
+   if (!sanitize_flags_p (SANITIZE_ADDRESS, id->dst_fn)
+   && !sanitize_flags_p (SANITIZE_HWADDRESS, id->dst_fn))
+ {
+   gsi_remove (_gsi, false);
+   continue;
+ }
+   break;
+ default:
+   break;
+ }
 
  /* Statements produced by inlining can be unfolded, especially
 when we constant propagated some operands.  We can't fold
--- gcc/testsuite/gcc.dg/asan/pr114956.c.jj 2024-05-06 10:54:52.601892840 +0200
+++ gcc/testsuite/gcc.dg/asan/pr114956.c    2024-05-06 10:54:33.920143734 +0200
@@ -0,0 +1,26 @@
+/* PR sanitizer/114956 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsanitize=address,null" } */
+
+int **a;
+void qux (int *);
+
+__attribute__((always_inline)) static inline int *
+foo (void)
+{
+  int b[1];
+  qux (b);
+  return a[1];
+}
+
+__attribute__((no_sanitize_address)) void
+bar (void)
+{
+  *a = foo ();
+}
+
+void
+baz (void)
+{
+  bar ();
+}

Jakub



Re: [wwwdocs] Specify AArch64 BitInt support for little-endian only

2024-05-07 Thread Jakub Jelinek
On Tue, May 07, 2024 at 02:12:07PM +0100, Andre Vieira (lists) wrote:
> Hey Jakub,
> 
> This what ya had in mind?

Yes.

> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index ca5174de991bb088f653468f77485c15a61526e6..924e045a15a78b5702a0d6997953f35c6b47efd1 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -325,7 +325,7 @@ You may also want to check out our
>Bit-precise integer types (_BitInt (N)
>and unsigned _BitInt (N)): integer types with
>a specified number of bits.  These are only supported on
> -  IA-32, x86-64 and AArch64 at present.
> +  IA-32, x86-64 and AArch64 (little-endian) at present.
>Structure, union and enumeration types may be defined more
>than once in the same scope with the same contents and the same
>tag; if such types are defined with the same contents and the


Jakub



Re: [PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-07 Thread Andreas Schwab
On May 07 2024, Jonathan Wakely wrote:

> +#ifdef __riscv
> + return _M_insert(__builtin_copysign((double)__f,
> + (double)-__builtin_signbit(__f));

Should this use static_cast?

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [RFA][RISC-V] [PATCH v2] Enable inlining str* by default

2024-05-07 Thread Jeff Law




On 5/4/24 8:41 AM, Jeff Law wrote:
> The CI system caught a latent bug in the inline string comparison code
> that shows up with rv32+zbb.  It was hardcoding 64 when AFAICT it should
> have been using BITS_PER_WORD.
>
> So v2 with that fixed.

So per the discussion in today's call I reviewed a couple of spaces, 
particularly -Os and interactions with vector expansion of these routines.



WRT vector expansion.  We *always* use loops for this stuff right now 
(str[n]cmp, strlen).   Vector expansion of these routines is suppressed 
with -Os enabled, which is good as it's hard to see how the vector loops 
will ever be smaller than a function call.


WRT scalar expansion.  -Os generally turns off scalar expansion as well, 
with the exception of trivial cases involving str[n]cmp with one arg 
being a constant string.


These shouldn't interact at all with Sergei's setmem, clrmem, movmem 
expanders.


If we look to improve the vector expansion case (say by handling cases 
with small counts for strncmp or when one argument to str[n]cmp is a 
constant string) in the future, we'll have to revisit.


Overall conclusion is we should go ahead with the patch.

jeff



Re: [PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-07 Thread Jeff Law




On 5/7/24 8:06 AM, Jonathan Wakely wrote:

> On Tue, 7 May 2024 at 14:57, Jeff Law wrote:
>>
>> On 5/7/24 7:49 AM, Jonathan Wakely wrote:
>>> Do we want this change for RISC-V, to fix PR113578?
>>>
>>> I haven't tested it on RISC-V, only on x86_64-linux (where it doesn't do
>>> anything).
>>>
>>> -- >8 --
>>>
>>> libstdc++-v3/ChangeLog:
>>>
>>>    PR libstdc++/113578
>>>    * include/std/ostream (operator<<(basic_ostream&, float)):
>>>    Restore signbit after converting to double.
>> No strong opinion. One could argue that the existence of a
>> conditional like that inherently implies the generic code is dependent
>> on specific processor behavior which probably is unwise.  But again, no
>> strong opinion.
>
> Yes, but I'm not aware of any other processors that lose the signbit
> like this, so in practice it's always worked fine to cast the float to
> double.

We kicked it around a bit in our meeting today and the thinking is that
while the RISC-V implementation is IEEE 754 compliant, it does differ from
other implementations.


So do we want to be stuck explaining this corner of IEEE 754 compliance 
to end users?  If not, then we probably want to go with your fix.


Similarly, if there's a reasonable chance that a standard higher in the
software stack mandates the behavior that everyone else has, then we'd
want to go with your fix as well.


So after further review, I'd lean towards fixing this in libstdc++ by 
whatever means you think is cleanest.


jeff


Re: [PATCH] aarch64: Fix typo in aarch64-ldp-fusion.cc:combine_reg_notes [PR114936]

2024-05-07 Thread Richard Earnshaw (lists)
On 03/05/2024 15:45, Alex Coplan wrote:
> This fixes a typo in combine_reg_notes in the load/store pair fusion
> pass.  As it stands, the calls to filter_notes store any
> REG_FRAME_RELATED_EXPR to fr_expr with the following association:
> 
>  - i2 -> fr_expr[0]
>  - i1 -> fr_expr[1]
> 
> but then the checks inside the following if statement expect the
> opposite (more natural) association, i.e.:
> 
>  - i2 -> fr_expr[1]
>  - i1 -> fr_expr[0]
> 
> this patch fixes the oversight by swapping the fr_expr indices in the
> calls to filter_notes.
> 
> In hindsight it would probably have been less confusing / error-prone to
> have combine_reg_notes take an array of two insns, then we wouldn't have
> to mix 1-based and 0-based indexing as well as remembering to call
> filter_notes in reverse program order.  This however is a minimal fix
> for backporting purposes.
> 
> Many thanks to Matthew for spotting this typo and pointing it out to me.
> 
> Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk and the 14
> branch after the 14.1 release?
> 
> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   PR target/114936
>   * config/aarch64/aarch64-ldp-fusion.cc (combine_reg_notes):
>   Ensure insn iN has its REG_FRAME_RELATED_EXPR (if any) stored in
>   FR_EXPR[N-1], thus matching the correspondence expected by the
>   copy_rtx calls.


OK.

R.


[patch,avr,applied] PR target/114835 - Tweak __popcountqi2

2024-05-07 Thread Georg-Johann Lay

Applied this tweak as proposed in the PR.

Johann

--

commit 6b73a9879a4503ebee2cb1a3ad243f60c922ca31
Author: Wolfgang Hospital 
Date:   Tue May 7 16:24:39 2024 +0200

AVR: target/114835 - Tweak popcountqi2

libgcc/
PR target/114835
* config/avr/lib1funcs.S (__popcountqi2): Use code that
is one instruction shorter / faster.

diff --git a/libgcc/config/avr/lib1funcs.S b/libgcc/config/avr/lib1funcs.S
index af4d7d97016..4ac31fa104e 100644
--- a/libgcc/config/avr/lib1funcs.S
+++ b/libgcc/config/avr/lib1funcs.S
@@ -3050,21 +3050,21 @@ DEFUN __popcountdi2
 ;; r24 = popcount8 (r24)
 ;; clobbers: __tmp_reg__
 DEFUN __popcountqi2
-mov  __tmp_reg__, r24
-andi r24, 1
-lsr  __tmp_reg__
-lsr  __tmp_reg__
-adc  r24, __zero_reg__
-lsr  __tmp_reg__
-adc  r24, __zero_reg__
-lsr  __tmp_reg__
-adc  r24, __zero_reg__
-lsr  __tmp_reg__
-adc  r24, __zero_reg__
-lsr  __tmp_reg__
-adc  r24, __zero_reg__
-lsr  __tmp_reg__
-adc  r24, __tmp_reg__
+mov  __tmp_reg__, r24; oeoeoeoe
+andi r24, 0xAA   ; o0o0o0o0
+lsr  r24 ; 0o0o0o0o
+;; Four values 0, 1 or 2: # bits set o+e
+sub  __tmp_reg__, r24; 44332211
+mov  r24, __tmp_reg__; 44332211
+andi r24, 0x33   ; 00330011
+eor  __tmp_reg__, r24; 44002200
+lsr  __tmp_reg__ ; 04400220
+lsr  __tmp_reg__ ; 00440022
+add  r24, __tmp_reg__; 04210421
+mov  __tmp_reg__, r24; h421l421
+swap __tmp_reg__ ; l421h421
+add  r24, __tmp_reg__; 84218421
+andi r24, 0xf; 8421 /17
 ret
 ENDF __popcountqi2
 #endif /* defined (L_popcountqi2) */
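For readers less used to AVR assembly, the new sequence is the classic SWAR (SIMD within a register) bit count. The sketch below is a hand translation into C that follows the per-instruction comments above; it is an illustration only, not code from the patch or from libgcc, and `popcount8` is a hypothetical name.

```c
#include <stdint.h>

/* C rendering of the new __popcountqi2 steps (illustrative only).  */
static unsigned popcount8 (uint8_t x)
{
  /* "andi 0xAA; lsr; sub": each 2-bit field becomes its bit count (0..2).  */
  uint8_t t = (uint8_t) (x - ((x & 0xAA) >> 1));
  /* "andi 0x33; eor; lsr; lsr; add": sum adjacent pairs into 4-bit counts.  */
  t = (uint8_t) ((t & 0x33) + ((t & 0xCC) >> 2));
  /* "swap; add": add the two nibbles; the low nibble now holds the total.  */
  t = (uint8_t) (t + (uint8_t) ((t << 4) | (t >> 4)));
  /* "andi 0xf": keep the low nibble (0..8).  */
  return t & 0x0F;
}
```

Each step keeps partial counts in progressively wider fields of the same byte, so no loop or lookup table is needed.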


Re: [V2][PATCH] gcc-14/changes.html: Deprecate a GCC C extension on flexible array members.

2024-05-07 Thread Qing Zhao
Hi, Sebastian,

Thanks for your explanation.

Our goal is to deprecate the GCC extension on a structure containing a flexible
array member not at the end of another structure. In order to achieve this
goal, we provided the warning option -Wflex-array-member-not-at-end for
users to locate all such cases in their source code and update the source
code to eliminate them.

The static initialization of structures with flexible array members will still
work as long as the flexible array members are at the end of the structures.

My question: is it possible to update your source code to move the structure
with the flexible array member to the end of the containing structure?

That is, in your example, in struct Thread_Configured_control, move the field
“Thread_Control Control” to the end of the structure?
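A minimal sketch of the layout change being asked about, using hypothetical, heavily simplified stand-ins for the RTEMS types quoted below (`thread_control`, `configured_ok` and their fields are illustrative only, not the RTEMS definitions):

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a thread control block that ends
   in a C99 flexible array member.  */
struct thread_control
{
  int id;
  void *extensions[];              /* flexible array member */
};

/* Deprecated layout, diagnosed by -Wflex-array-member-not-at-end:
     struct configured_bad { struct thread_control control; int sched; };  */

/* Suggested layout: the structure with the flexible array member is the
   last member.  Embedding it at all is still a GCC extension, so
   -Wpedantic continues to warn about it.  */
struct configured_ok
{
  int sched;
  char name[16];
  struct thread_control control;   /* must remain the last member */
};
```

Note that with `control` last, nothing inside the enclosing structure follows it, so storage meant to back `control.extensions` (the role the separate `void *extensions[...]` member plays in the RTEMS code quoted below) cannot simply be placed after it within the same structure.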

Thanks

Qing

> On May 7, 2024, at 09:15, Sebastian Huber wrote:
> 
> On 06.05.24 16:20, Qing Zhao wrote:
>> Hi, Sebastian,
>> Looks like that the behavior you described is correct.
>> What’s your major concern? ( a little confused).
> 
> I am concerned that the static initialization of structures with flexible 
> array members no longer works. In the RTEMS open source real-time operating 
> system, we use flexible array members in some parts. One example is the 
> thread control block which is used to manage a thread:
> 
> struct _Thread_Control {
>  /** This field is the object management structure for each thread. */
>  Objects_Control  Object;
> 
> [...]
> 
>  /**
>   * @brief Variable length array of user extension pointers.
>   *
>   * The length is defined by the application via .
>   */
>  void *extensions[];
> };
> 
> In a static configuration of the operating system we have something like this:
> 
> struct Thread_Configured_control {
> /*
> * This was added to address the following warning.
> * warning: invalid use of structure with flexible array member
> */
> #pragma GCC diagnostic push
> #pragma GCC diagnostic ignored "-Wpedantic"
>  Thread_Control Control;
> #pragma GCC diagnostic pop
> 
>  #if CONFIGURE_MAXIMUM_USER_EXTENSIONS > 0
>    void *extensions[ CONFIGURE_MAXIMUM_USER_EXTENSIONS + 1 ];
>  #endif
>  Configuration_Scheduler_node Scheduler_nodes[ _CONFIGURE_SCHEDULER_COUNT ];
>  RTEMS_API_Control API_RTEMS;
>  #ifdef RTEMS_POSIX_API
>    POSIX_API_Control API_POSIX;
>  #endif
>  #if CONFIGURE_MAXIMUM_THREAD_NAME_SIZE > 1
>    char name[ CONFIGURE_MAXIMUM_THREAD_NAME_SIZE ];
>  #endif
>  #if defined(_CONFIGURE_ENABLE_NEWLIB_REENTRANCY) && \
>    !defined(_REENT_THREAD_LOCAL)
>    struct _reent Newlib;
>  #endif
> };
> 
> This is used to define a table of thread control blocks:
> 
> Thread_Configured_control \
> name##_Objects[ _Objects_Maximum_per_allocation( max ) ]; \
> static RTEMS_SECTION( ".noinit.rtems.content.objects." #name ) \
> 
> I would like to know what consequences the deprecation of this GCC extension
> has.
> 
> -- 
> embedded brains GmbH & Co. KG
> Herr Sebastian HUBER
> Dornierstr. 4
> 82178 Puchheim
> Germany
> email: sebastian.hu...@embedded-brains.de
> phone: +49-89-18 94 741 - 16
> fax:   +49-89-18 94 741 - 08
> 
> Registergericht: Amtsgericht München
> Registernummer: HRB 157899
> Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
> Unsere Datenschutzerklärung finden Sie hier:
> https://embedded-brains.de/datenschutzerklaerung/



Re: Ping * 2 [PATCH v9 0/5] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-05-07 Thread Qing Zhao


On May 7, 2024, at 10:02, Qing Zhao  wrote:

2nd Ping for the middle-end change approval. -:)

**Approval status:

All C FE changes have been approved.

**Review status:

All Middle-end changes have been reviewed by Sid, no remaining issue.

Okay for GCC15?

For convenience, the following are the links to the 9th version:
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649389.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649390.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649391.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649392.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649394.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649393.html

One more note: Clang has supported this attribute since last year.

Qing

thanks.

Qing


Re: [PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-07 Thread Jonathan Wakely
On Tue, 7 May 2024 at 15:06, Jonathan Wakely wrote:
>
> On Tue, 7 May 2024 at 14:57, Jeff Law wrote:
> >
> >
> >
> > On 5/7/24 7:49 AM, Jonathan Wakely wrote:
> > > Do we want this change for RISC-V, to fix PR113578?
> > >
> > > I haven't tested it on RISC-V, only on x86_64-linux (where it doesn't do
> > > anything).
> > >
> > > -- >8 --
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > >   PR libstdc++/113578
> > >   * include/std/ostream (operator<<(basic_ostream&, float)):
> > >   Restore signbit after converting to double.
> > No strong opinion. One could argue that the existence of a
> > conditional like that inherently implies the generic code is dependent
> > on specific processor behavior which probably is unwise.  But again, no
> > strong opinion.
>
> Yes, but I'm not aware of any other processors that lose the signbit
> like this, so in practice it's always worked fine to cast the float to
> double.

The similar glibc fix for strfrom is specific to RISC-V:
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0cc0033ef19bd3378445c2b851e53d7255cb1b1e

My patch uses copysign unconditionally, to avoid branching on isnan. I
don't know if that's the right choice.
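Below is a minimal standalone sketch of the same idea, using only standard <cmath>. The float is widened, and then the sign bit read from the original float is re-applied with std::copysign, so the code never branches on isnan. The helper name widen_keep_sign is hypothetical. This is not the libstdc++ patch, which uses __builtin_copysign and __builtin_signbit inside operator<<.

```cpp
#include <cmath>

// Hypothetical helper (not part of libstdc++): convert a float to double
// and explicitly restore the float's sign bit. On a target whose
// float->double conversion yields a canonical (positive) NaN, the plain
// cast can lose the sign of a NaN; copysign puts it back. For all other
// values the cast already preserves the sign, so copysign changes nothing.
inline double widen_keep_sign(float f)
{
  const double d = static_cast<double>(f);
  // std::signbit(f) inspects the float itself, before any conversion.
  return std::copysign(d, std::signbit(f) ? -1.0 : 1.0);
}
```

The sign is read from the float rather than from the converted double, and that is the whole point. On x86_64 both approaches give the same answer, which is why such a change makes no observable difference there.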



Re: [PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-07 Thread Jonathan Wakely
On Tue, 7 May 2024 at 14:57, Jeff Law wrote:
>
>
>
> On 5/7/24 7:49 AM, Jonathan Wakely wrote:
> > Do we want this change for RISC-V, to fix PR113578?
> >
> > I haven't tested it on RISC-V, only on x86_64-linux (where it doesn't do
> > anything).
> >
> > -- >8 --
> >
> > libstdc++-v3/ChangeLog:
> >
> >   PR libstdc++/113578
> >   * include/std/ostream (operator<<(basic_ostream&, float)):
> >   Restore signbit after converting to double.
> No strong opinion. One could argue that the existence of a
> conditional like that inherently implies the generic code is dependent
> on specific processor behavior which probably is unwise.  But again, no
> strong opinion.

Yes, but I'm not aware of any other processors that lose the signbit
like this, so in practice it's always worked fine to cast the float to
double.



[PATCH 2/2] libstdc++: Fix data races in std::ctype [PR77704]

2024-05-07 Thread Jonathan Wakely
Tested x86_64-linux. This one is less "obviously correct", as calling
the single-character narrow(char, char) overload no longer lazily
populates individual characters in the cache (because doing that is
racy). And the single-character widen(char) no longer calls
_M_widen_init() to populate the whole widening cache.

The current code definitely has a data race, i.e. undefined behaviour,
so we need to do _something_. But maybe not this. Maybe it would be
better to keep calling _M_widen_init() from widen(char), so that
iostream construction will fill the cache on the first call to the
global locale's widen(' '), and then be faster after that (which is the
current behaviour). Maybe we want to add that to narrow(char, char) too
(which is not the current behaviour).

I raised the question on the LWG list of whether it's OK for calls to
ctype::narrow(char, char) to result in calls to the virtual function
ctype::do_narrow(const char*, const char*, char, char*), and for calls
to ctype::narrow(const char*, const char*, char, char*) to result in
calls to the virtual function ctype::do_narrow(char, char). If that
isn't OK then our entire caching scheme in std::ctype is not
allowed, and we'd need to ensure each call to narrow results in exactly
one call to the corresponding do_narrow, and nothing else.

-- >8 --

The std::ctype specialization uses mutable data members to cache
the results of do_narrow calls, to avoid virtual calls.  However, the
accesses to those mutable members are not synchronized and so there are
data races when using the facet in multiple threads.

This change ensures that the _M_narrow_ok status flag is only accessed
atomically, avoiding any races on that member.

The _M_narrow_init() member function is changed to use a mutex (with
double-checked locking), so that writing to the _M_narrow array only
happens in one thread. The function is rearranged so that the virtual
calls and comparing the arrays are done outside the critical section,
then all writes to member variables are done last, inside the critical
section.  Importantly, the _M_narrow_ok member is not set until after
the _M_narrow array has been populated.

The narrow(char, char) function will now only read from _M_narrow if
_M_narrow_ok is non-zero. This means that populating the array
happens-before reading from it. If the cache isn't available and a
virtual call to do_narrow(c, d) is needed, this function no longer
stores the result in the cache, because only _M_narrow_init() can write
to the cache now. This means that repeated calls to narrow(c, d) with
the same value of c will no longer avoid calling do_narrow(c, d). If
this impacts performance too significantly then we could make
narrow(char, char) call _M_narrow_init() to populate the cache, or just
call _M_narrow_init() on construction so the cache is always available.
In the current code widen(char) always calls _M_widen_init() to
populate that cache, but I've removed that call to be consistent with
narrow(char, char) which doesn't initialize the narrow cache. This will
impact std::basic_ios::init (used when constructing any iostream
object) which calls widen(' ') on the global locale's std::ctype
facet, so maybe we do want to warm up that cache still.

The narrow(const char*, const char*, char, char*) overload now re-checks
the _M_narrow_ok status flag after calling _M_narrow_init(), so that we
don't make an unnecessary virtual call if _M_narrow_init() set the
status flag to 1, meaning the base class version of do_narrow (using
memcpy) can be used. Reloading the status flag after calling
_M_narrow_init() can be a relaxed load, because _M_narrow_init() either
did a load with acquire ordering, or set the flag itself in the current
thread.
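The synchronization scheme described above can be sketched in isolation. This is a simplified, hypothetical class, not the libstdc++ std::ctype code. The class and member names are invented, and the status flag has only two states (0 = not initialized, 1 = cache published), whereas the real _M_narrow_ok may carry more information. The sketch shows four things:
- an atomic status flag read with acquire ordering;
- a mutex used with double-checked locking;
- virtual calls made outside the critical section;
- a cache written only by init_cache(), with the flag published last.

```cpp
#include <atomic>
#include <cstring>
#include <mutex>

// Hypothetical, simplified facet illustrating the locking scheme only.
class narrow_facet
{
public:
  virtual ~narrow_facet() = default;

  char narrow(char c, char dfault) const
  {
    const unsigned char uc = static_cast<unsigned char>(c);
    // Acquire load: if status_ is 1, the writes to cache_ made before the
    // release store in init_cache() are visible here. A cached '\0' is
    // treated as "not cached" so that dfault is still honoured.
    if (status_.load(std::memory_order_acquire) == 1 && cache_[uc] != '\0')
      return cache_[uc];
    // Fall back to the virtual call. The result is NOT stored, because
    // only init_cache() is allowed to write to the cache.
    return do_narrow(c, dfault);
  }

  void init_cache() const
  {
    if (status_.load(std::memory_order_acquire) != 0)
      return;  // fast path: the cache has already been published

    // Do the virtual calls outside the critical section.
    char tmp[256];
    for (int i = 0; i < 256; ++i)
      tmp[i] = do_narrow(static_cast<char>(i), '\0');

    std::lock_guard<std::mutex> lock(mutex_);
    if (status_.load(std::memory_order_relaxed) == 0)  // re-check under lock
      {
        std::memcpy(cache_, tmp, sizeof tmp);
        status_.store(1, std::memory_order_release);  // publish last
      }
  }

protected:
  // Identity mapping, standing in for the base-class behaviour for char.
  virtual char do_narrow(char c, char) const { return c; }

private:
  mutable std::atomic<int> status_{0};
  mutable std::mutex mutex_;
  mutable char cache_[256] = {};
};
```

The relaxed re-check inside the lock is sufficient. Any thread that stored 1 did so while holding the same mutex, so acquiring the lock synchronizes with that store.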

Similar changes are needed for the std::ctype::widen members,
which are also defined in terms of mutable data members without
synchronization.

The 22_locale/ctype/narrow/char/19955.cc test needs to be fixed to work
with the new code, because it currently assumes that the library will
only use the array form of do_narrow, and the Ctype1::do_narrow override
is not idempotent.

libstdc++-v3/ChangeLog:

PR libstdc++/77704
* include/bits/locale_facets.h (ctype::widen(char)): Check
if cache is initialized before using it.
(ctype::narrow(char, char)): Likewise.
(ctype::widen(const char*, const char*, char, char*)):
Check again if memcpy can be used after initializing the cache.
(ctype::narrow(const char*, const char*, char, char*)):
Likewise.
(ctype::_M_narrow_cache_status(int)): New member function.
(ctype::_M_widen_cache_status(int)): New member function.
* src/c++11/ctype.cc (ctype::_M_narrow_init) [__GTHREADS]:
Use atomics and a mutex to synchronize accesses to _M_narrow_ok
and _M_narrow.
(ctype::_M_widen_init) [__GTHREADS]: Likewise.
* testsuite/22_locale/ctype/narrow/char/19955.cc: Fix test
facets so that the array 

[PATCH 1/2] libstdc++: Fix data race in std::basic_ios::fill() [PR77704]

2024-05-07 Thread Jonathan Wakely
Tested x86_64-linux. This seems "obviously correct", and I'd like to
push it. The current code definitely has a data race, i.e. undefined
behaviour.

-- >8 --

The lazy caching in std::basic_ios::fill() updates a mutable member
without synchronization, which can cause a data race if two threads both
call fill() on the same stream object when _M_fill_init is false.

To avoid this we can just cache the _M_fill member and set _M_fill_init
early in std::basic_ios::init, instead of doing it lazily. As explained
by the comment in init, there's a good reason for doing it lazily. When
char_type is neither char nor wchar_t, the locale might not have a
std::ctype, so getting the fill character would throw an
exception. The current lazy init allows using unformatted I/O with such
a stream, because the fill character is never needed and so it doesn't
matter if the locale doesn't have a ctype facet. We can
maintain this property by only setting the fill character in
std::basic_ios::init if the ctype facet is present at that time. If
fill() is called later and the fill character wasn't set by init, we can
get it from the stream's current locale at the point when fill() is
called (and not try to cache it without synchronization).
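A minimal sketch of the eager-initialization approach, under simplifying assumptions. The types mini_ios and widener are hypothetical stand-ins, not libstdc++'s basic_ios and ctype, and the "missing facet" failure is modelled by throwing std::bad_cast. The constructor caches the fill character only if a facet is present. The const fill() never writes to any member, so concurrent calls cannot race.

```cpp
#include <typeinfo>

// Hypothetical stand-in for a ctype facet that can widen ' '.
struct widener
{
  char32_t widen(char c) const { return static_cast<char32_t>(c); }
};

// Hypothetical, simplified stream base (not libstdc++'s basic_ios).
class mini_ios
{
public:
  explicit mini_ios(const widener* w) : ctype_(w)
  {
    // Cache the fill character only if the facet is available now.
    if (ctype_)
      {
        fill_ = ctype_->widen(' ');
        fill_init_ = true;
      }
  }

  // Never modifies members, so it is safe to call from several threads.
  char32_t fill() const
  {
    if (!fill_init_)
      return widen(' ');  // computed from the current facet, not cached
    return fill_;
  }

  char32_t fill(char32_t ch)
  {
    const char32_t old = fill_;
    fill_ = ch;
    fill_init_ = true;
    return old;
  }

  void imbue(const widener* w) { ctype_ = w; }

private:
  char32_t widen(char c) const
  {
    if (!ctype_)
      throw std::bad_cast();  // no facet: fail only when fill is needed
    return ctype_->widen(c);
  }

  const widener* ctype_;
  char32_t fill_ = 0;
  bool fill_init_ = false;
};
```

A stream built without a facet still works until fill() is actually needed. A stream built with a facet keeps its construction-time fill character even after a later imbue, which matches the behaviour change described next.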

This causes a change in behaviour for the following program:

  std::ostringstream out;
  out.imbue(loc);
  auto fill = out.fill();

Previously the fill character would have been set when fill() is called,
and so would have used the new locale. This commit changes it so that
the fill character is set on construction and isn't affected by the new
locale being imbued later. This new behaviour seems to be what the
standard requires, and matches MSVC.

The new 27_io/basic_ios/fill/char/1.cc test verifies that it's still
possible to use a std::basic_ios without the ctype facet
being present at construction.

libstdc++-v3/ChangeLog:

PR libstdc++/77704
* include/bits/basic_ios.h (basic_ios::fill()): Do not modify
_M_fill and _M_fill_init in a const member function.
(basic_ios::fill(char_type)): Use _M_fill directly instead of
calling fill(). Set _M_fill_init to true.
* include/bits/basic_ios.tcc (basic_ios::init): Set _M_fill and
_M_fill_init here instead.
* testsuite/27_io/basic_ios/fill/char/1.cc: New test.
* testsuite/27_io/basic_ios/fill/wchar_t/1.cc: New test.
---
 libstdc++-v3/include/bits/basic_ios.h | 10 +--
 libstdc++-v3/include/bits/basic_ios.tcc   | 15 +++-
 .../testsuite/27_io/basic_ios/fill/char/1.cc  | 78 +++
 .../27_io/basic_ios/fill/wchar_t/1.cc | 55 +
 4 files changed, 148 insertions(+), 10 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/27_io/basic_ios/fill/char/1.cc
 create mode 100644 libstdc++-v3/testsuite/27_io/basic_ios/fill/wchar_t/1.cc

diff --git a/libstdc++-v3/include/bits/basic_ios.h b/libstdc++-v3/include/bits/basic_ios.h
index 258e6042b8f..bc3be4d2e37 100644
--- a/libstdc++-v3/include/bits/basic_ios.h
+++ b/libstdc++-v3/include/bits/basic_ios.h
@@ -373,11 +373,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   char_type
   fill() const
   {
-   if (!_M_fill_init)
- {
-   _M_fill = this->widen(' ');
-   _M_fill_init = true;
- }
+   if (__builtin_expect(!_M_fill_init, false))
+ return this->widen(' ');
return _M_fill;
   }
 
@@ -393,8 +390,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   char_type
   fill(char_type __ch)
   {
-   char_type __old = this->fill();
+   char_type __old = _M_fill;
_M_fill = __ch;
+   _M_fill_init = true;
return __old;
   }
 
diff --git a/libstdc++-v3/include/bits/basic_ios.tcc b/libstdc++-v3/include/bits/basic_ios.tcc
index a9313736e32..0197bdf8f67 100644
--- a/libstdc++-v3/include/bits/basic_ios.tcc
+++ b/libstdc++-v3/include/bits/basic_ios.tcc
@@ -138,13 +138,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // return without throwing an exception. Unfortunately,
   // ctype is not necessarily a required facet, so
   // streams with char_type != [char, wchar_t] will not have it by
-  // default. Because of this, the correct value for _M_fill is
-  // constructed on the first call of fill(). That way,
+  // default. If the ctype facet is available now,
+  // _M_fill is set here, but otherwise no fill character will be
+  // cached and a call to fill() will check for the facet again later
+  // (and will throw if the facet is still not present). This way
   // unformatted input and output with non-required basic_ios
   // instantiations is possible even without imbuing the expected
   // ctype facet.
-  _M_fill = _CharT();
-  _M_fill_init = false;
+  if (_M_ctype)
+   {
+ _M_fill = _M_ctype->widen(' ');
+ _M_fill_init = true;
+   }
+  else
+   _M_fill_init = false;
 
   _M_tie = 0;
   _M_exception = goodbit;
diff --git 

Ping * 2 [PATCH v9 0/5] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-05-07 Thread Qing Zhao
2nd Ping for the middle-end change approval. -:)

**Approval status:

All C FE changes have been approved.

**Review status:

All Middle-end changes have been reviewed by Sid, no remaining issue.

Okay for GCC15?

thanks.

Qing

Begin forwarded message:

From: Qing Zhao 
Subject: Re: [PATCH v9 0/5] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
Date: April 23, 2024 at 15:56:26 EDT
To: Richard Biener , Siddhesh Poyarekar 

Cc: Joseph Myers , "gcc-patches@gcc.gnu.org" 
, "isanb...@gmail.com" , Kees Cook 
, "uec...@tugraz.at" 

Ping for the middle-end change approval.

And an update on the status of the patch set:

**Approval status:

All C FE changes have been approved.

**Review status:

All Middle-end changes have been reviewed by Sid, no remaining issue.

Okay for GCC15?

thanks.

Qing

On Apr 12, 2024, at 09:54, Qing Zhao  wrote:

Hi,

This is the 9th version of the patch.

Compared with the 8th version, the differences are:

updates per Joseph's comments:

1. In the C FE, add checking of the counted_by attribute for the multiple
definitions of the same tag that C23 newly permits, in the routine
"tagged_types_tu_compatible_p".
 Add a new testing case flex-array-counted-by-8.c for this.
 This is for Patch 1;

2. Two minor typo fixes in c-typeck.cc.
 This is for Patch 2;

Approval status:

 Patch 2's C FE change has been approved with minor typo fixes (the above 2);
 Patch 4 has been approved;
 Patch 5's C FE change has been approved;

Review status:

 Patch 3, Patch 2 and Patch 5's middle-end changes have been reviewed by Sid, no
issue.

More review needed:

 Patch 1's new change to C FE (the above 1);
 Patch 2, 3 and 5's middle-end changes need to be approved

The 8th version is here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648559.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648560.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648561.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648562.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648563.html

It is based on the following original proposal:

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635884.html
Represent the missing dependence for the "counted_by" attribute and its consumers

**The summary of the proposal is:

* Add a new internal function ".ACCESS_WITH_SIZE" to carry the size information for every reference to a FAM field;
* In the C FE, replace every reference to a FAM field whose TYPE has the "counted_by" attribute with a call to the new internal function ".ACCESS_WITH_SIZE";
* In every consumer of the size information, for example BDOS or the array bound sanitizer, query the size information or ACCESS_MODE information from the new internal function;
* When expanding to RTL, replace the internal function with the actual reference to the FAM field;
* Make some adjustments to IPA alias analysis and other SSA passes to mitigate the impact on the optimizer and code generation.


**The new internal function

.ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, TYPE_OF_SIZE, ACCESS_MODE, TYPE_OF_REF)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)

which returns "REF_TO_OBJ", the same as its 1st argument;

Both the return type and the type of the first argument of this function have 
been converted from the incomplete array type to the corresponding pointer type.

The call to .ACCESS_WITH_SIZE is wrapped with an INDIRECT_REF, whose type is
the original incomplete array type.

Please see the following link for why:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html

1st argument "REF_TO_OBJ": The reference to the object;
2nd argument "REF_TO_SIZE": The reference to the size of the object;
3rd argument "CLASS_OF_SIZE": What the size referenced by REF_TO_SIZE represents:
 0: the number of bytes;
 1: the number of elements of the object type;
4th argument "TYPE_OF_SIZE": A constant 0 with the TYPE of the object
referenced by REF_TO_SIZE;
5th argument "ACCESS_MODE":
-1: Unknown access semantics
 0: none
 1: read_only
 2: write_only
 3: read_write
6th argument "TYPE_OF_REF": A constant 0 with the pointer TYPE to
the original flexible array type.
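The CLASS_OF_SIZE argument can be illustrated with a small model. It is a hypothetical C++ sketch, not GCC code: .ACCESS_WITH_SIZE is an internal function in the middle end and cannot be called from user code, and the names below are invented. The model converts the value read through REF_TO_SIZE into a byte count. It then derives the kind of index check that a consumer such as an array bound sanitizer could perform.

```cpp
#include <cstddef>

// Hypothetical model of the CLASS_OF_SIZE argument values.
enum class size_class : int { bytes = 0, elements = 1 };

// Bytes the flexible array member is known to occupy, given the value
// read through REF_TO_SIZE and the size of one array element.
constexpr std::size_t fam_size_bytes(std::size_t size_value, size_class cls,
                                     std::size_t elem_size)
{
  return cls == size_class::bytes ? size_value : size_value * elem_size;
}

// Illustrative bounds check: does element `index` lie entirely inside
// the annotated object?
constexpr bool index_in_bounds(std::size_t index, std::size_t size_value,
                               size_class cls, std::size_t elem_size)
{
  return (index + 1) * elem_size <= fam_size_bytes(size_value, cls, elem_size);
}
```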

** The Patch sets included:

1. Provide counted_by attribute to flexible array member field;
which includes:
* "counted_by" attribute documentation;
* C FE handling of the new attribute;
  syntax checking, error reporting;
* testing cases;

2. Convert "counted_by" attribute to/from .ACCESS_WITH_SIZE.
which includes:
* The definition of the new internal function .ACCESS_WITH_SIZE in 
internal-fn.def.
* C FE converts every reference to a FAM with "counted_by" attribute to a 
call to the internal function .ACCESS_WITH_SIZE.
  (build_component_ref in c_typeck.cc)
  This includes the case when the object is statically allocated and initialized.
  In order to make this work, we should update 

Re: [PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-07 Thread Jeff Law




On 5/7/24 7:49 AM, Jonathan Wakely wrote:

Do we want this change for RISC-V, to fix PR113578?

I haven't tested it on RISC-V, only on x86_64-linux (where it doesn't do
anything).

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/113578
* include/std/ostream (operator<<(basic_ostream&, float)):
Restore signbit after converting to double.
No strong opinion. One could argue that the existence of a 
conditional like that inherently implies the generic code is dependent 
on specific processor behavior which probably is unwise.  But again, no 
strong opinion.


jeff


Re: [PATCH v18 02/26] libstdc++: Optimize std::is_const compilation performance

2024-05-07 Thread Ken Matsui
Hi Jonathan,

Since __is_const, __is_volatile, and __is_pointer were approved, could
you please review these patches for libstdc++?  I guess that you
already reviewed almost equivalent patches, but I wanted to make sure.

Sincerely,
Ken Matsui


On Thu, May 2, 2024 at 1:16 PM Ken Matsui  wrote:
>
> This patch optimizes the compilation performance of std::is_const
> by dispatching to the new __is_const built-in trait.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/type_traits (is_const): Use __is_const built-in
> trait.
> (is_const_v): Likewise.
>
> Signed-off-by: Ken Matsui 
> ---
>  libstdc++-v3/include/std/type_traits | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
> index b441bf9908f..8df0cf3ac3b 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -835,6 +835,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>// Type properties.
>
>/// is_const
> +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_const)
> +  template
> +struct is_const
> +: public __bool_constant<__is_const(_Tp)>
> +{ };
> +#else
>template
>  struct is_const
>  : public false_type { };
> @@ -842,6 +848,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>template
>  struct is_const<_Tp const>
>  : public true_type { };
> +#endif
>
>/// is_volatile
>template
> @@ -3331,10 +3338,15 @@ template 
>inline constexpr bool is_member_pointer_v = is_member_pointer<_Tp>::value;
>  #endif
>
> +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_const)
> +template 
> +  inline constexpr bool is_const_v = __is_const(_Tp);
> +#else
>  template 
>inline constexpr bool is_const_v = false;
>  template 
>inline constexpr bool is_const_v = true;
> +#endif
>
>  #if _GLIBCXX_USE_BUILTIN_TRAIT(__is_function)
>  template 
> --
> 2.44.0
>
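The dispatch pattern used in this patch can be sketched outside libstdc++. The sketch uses the compiler's __is_const built-in when __has_builtin reports it, and otherwise falls back to partial specialization. The names my_is_const, my_is_const_v and MY_USE_BUILTIN_IS_CONST are hypothetical. The __has_builtin check stands in for the _GLIBCXX_USE_BUILTIN_TRAIT(__is_const) test in the patch.

```cpp
#include <type_traits>

// Detect the built-in trait; compilers without __has_builtin, or without
// the built-in, take the fallback path.
#if defined(__has_builtin)
#  if __has_builtin(__is_const)
#    define MY_USE_BUILTIN_IS_CONST 1
#  endif
#endif

#ifdef MY_USE_BUILTIN_IS_CONST
// One instantiation per type, no partial-specialization matching.
template<typename T>
  struct my_is_const : std::bool_constant<__is_const(T)> { };

template<typename T>
  inline constexpr bool my_is_const_v = __is_const(T);
#else
// Classic library implementation.
template<typename T>
  struct my_is_const : std::false_type { };

template<typename T>
  struct my_is_const<T const> : std::true_type { };

template<typename T>
  inline constexpr bool my_is_const_v = false;

template<typename T>
  inline constexpr bool my_is_const_v<T const> = true;
#endif
```

Both branches must give identical answers; only compile-time cost differs.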


Re: [PATCH] libstdc++: Rewrite std::variant comparisons without macros

2024-05-07 Thread Ville Voutilainen
On Tue, 7 May 2024 at 16:47, Jonathan Wakely  wrote:
>
> I don't think using a macro for these really saves us much, we can do
> this to avoid duplication instead. And now it's not a big, multi-line
> macro that's a pain to edit.
>
> Any objections?

No, that's beautiful, ship it.


[PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-07 Thread Jonathan Wakely
Do we want this change for RISC-V, to fix PR113578?

I haven't tested it on RISC-V, only on x86_64-linux (where it doesn't do
anything).

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/113578
* include/std/ostream (operator<<(basic_ostream&, float)):
Restore signbit after converting to double.
---
 libstdc++-v3/include/std/ostream | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libstdc++-v3/include/std/ostream b/libstdc++-v3/include/std/ostream
index 8a21758d0a3..d492168ca0e 100644
--- a/libstdc++-v3/include/std/ostream
+++ b/libstdc++-v3/include/std/ostream
@@ -233,7 +233,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
// _GLIBCXX_RESOLVE_LIB_DEFECTS
// 117. basic_ostream uses nonexistent num_put member functions.
+#ifdef __riscv
+   return _M_insert(__builtin_copysign((double)__f,
+   (double)-__builtin_signbit(__f));
+#else
return _M_insert(static_cast(__f));
+#endif
   }
 
   __ostream_type&
-- 
2.44.0



[committed] libstdc++: Fix handling of incomplete UTF-8 sequences in _Unicode_view

2024-05-07 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. gcc-14 backport to follow.

-- >8 --

Eddie Nolan reported to me that _Unicode_view was not correctly
implementing the substitution of ill-formed subsequences with U+FFFD,
due to failing to increment the counter when the iterator reaches the
end of the sequence before a multibyte sequence is complete.  As a
result, the incomplete sequence was not completely consumed, and then
the remaining character was treated as another ill-formed sequence,
giving two U+FFFD characters instead of one.

To avoid similar mistakes in future, this change introduces a lambda
that increments the iterator and the counter together. This ensures the
counter is always incremented when the iterator is incremented, so that
we always know how many characters have been consumed.
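The increment-together technique can be shown in a standalone decoder. It is a hypothetical sketch, not libstdc++'s _Unicode_view. The names decode_one, decode_all and incr are invented, and the bounds on the first continuation byte follow the Unicode Standard's table of well-formed UTF-8 byte sequences. Because incr advances the position and the consumed-byte count in one step, a sequence cut short by the end of input is consumed as a whole. It then yields a single U+FFFD instead of two.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

struct decoded { char32_t cp; std::size_t consumed; };

// Decode one code point starting at s[start] (start < s.size()).
decoded decode_one(std::string_view s, std::size_t start)
{
  constexpr char32_t error = 0xFFFD;
  std::size_t pos = start + 1;  // just past the lead byte
  std::size_t consumed = 1;
  auto incr = [&] { ++consumed; ++pos; };  // always move both together

  const auto u = static_cast<std::uint8_t>(s[start]);
  if (u <= 0x7F)
    return {u, consumed};

  int need = 0;                       // continuation bytes required
  std::uint8_t lo = 0x80, hi = 0xBF;  // bounds for the FIRST continuation
  char32_t c = 0;
  if (u >= 0xC2 && u <= 0xDF)
    { need = 1; c = u & 0x1F; }
  else if (u >= 0xE0 && u <= 0xEF)
    {
      need = 2; c = u & 0x0F;
      if (u == 0xE0) lo = 0xA0;       // reject overlong forms
      else if (u == 0xED) hi = 0x9F;  // reject surrogates
    }
  else if (u >= 0xF0 && u <= 0xF4)
    {
      need = 3; c = u & 0x07;
      if (u == 0xF0) lo = 0x90;       // reject overlong forms
      else if (u == 0xF4) hi = 0x8F;  // reject values above U+10FFFF
    }
  else
    return {error, consumed};         // invalid lead byte

  for (int k = 0; k < need; ++k)
    {
      if (pos == s.size())
        return {error, consumed};     // truncated: one U+FFFD for all of it
      const auto b = static_cast<std::uint8_t>(s[pos]);
      if (b < (k == 0 ? lo : 0x80) || b > (k == 0 ? hi : 0xBF))
        return {error, consumed};     // leave the bad byte for the next call
      c = (c << 6) | (b & 0x3F);
      incr();
    }
  return {c, consumed};
}

std::u32string decode_all(std::string_view s)
{
  std::u32string out;
  for (std::size_t i = 0; i < s.size(); )
    {
      const decoded d = decode_one(s, i);
      out.push_back(d.cp);
      i += d.consumed;
    }
  return out;
}
```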

libstdc++-v3/ChangeLog:

* include/bits/unicode.h (_Unicode_view::_M_read_utf8): Ensure
count of characters consumed is correct when the end of the
input is reached unexpectedly.
* testsuite/ext/unicode/view.cc: Test incomplete UTF-8
sequences.
---
 libstdc++-v3/include/bits/unicode.h| 24 ++
 libstdc++-v3/testsuite/ext/unicode/view.cc |  7 +++
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/libstdc++-v3/include/bits/unicode.h b/libstdc++-v3/include/bits/unicode.h
index 29813b743dc..46238143fb6 100644
--- a/libstdc++-v3/include/bits/unicode.h
+++ b/libstdc++-v3/include/bits/unicode.h
@@ -261,9 +261,13 @@ namespace __unicode
   {
_Guard<_Iter> __g{this, _M_curr()};
char32_t __c{};
-   uint8_t __u = *_M_curr()++;
const uint8_t __lo_bound = 0x80, __hi_bound = 0xBF;
+   uint8_t __u = *_M_curr()++;
uint8_t __to_incr = 1;
+   auto __incr = [&, this] {
+ ++__to_incr;
+ return ++_M_curr();
+   };
 
if (__u <= 0x7F) [[likely]]  // 0x00 to 0x7F
  __c = __u;
@@ -281,8 +285,7 @@ namespace __unicode
else
  {
__c = (__c << 6) | (__u & 0x3F);
-   ++_M_curr();
-   ++__to_incr;
+   __incr();
  }
  }
else if (__u <= 0xEF) // 0xE0 to 0xEF
@@ -295,11 +298,10 @@ namespace __unicode
 
if (__u < __lo_bound_2 || __u > __hi_bound_2) [[unlikely]]
  __c = _S_error();
-   else if (++_M_curr() == _M_last) [[unlikely]]
+   else if (__incr() == _M_last) [[unlikely]]
  __c = _S_error();
else
  {
-   ++__to_incr;
__c = (__c << 6) | (__u & 0x3F);
__u = *_M_curr();
 
@@ -308,8 +310,7 @@ namespace __unicode
else
  {
__c = (__c << 6) | (__u & 0x3F);
-   ++_M_curr();
-   ++__to_incr;
+   __incr();
  }
  }
  }
@@ -323,21 +324,19 @@ namespace __unicode
 
if (__u < __lo_bound_2 || __u > __hi_bound_2) [[unlikely]]
  __c = _S_error();
-   else if (++_M_curr() == _M_last) [[unlikely]]
+   else if (__incr() == _M_last) [[unlikely]]
  __c = _S_error();
else
  {
-   ++__to_incr;
__c = (__c << 6) | (__u & 0x3F);
__u = *_M_curr();
 
if (__u < __lo_bound || __u > __hi_bound) [[unlikely]]
  __c = _S_error();
-   else if (++_M_curr() == _M_last) [[unlikely]]
+   else if (__incr() == _M_last) [[unlikely]]
  __c = _S_error();
else
  {
-   ++__to_incr;
__c = (__c << 6) | (__u & 0x3F);
__u = *_M_curr();
 
@@ -346,8 +345,7 @@ namespace __unicode
else
  {
__c = (__c << 6) | (__u & 0x3F);
-   ++_M_curr();
-   ++__to_incr;
+   __incr();
  }
  }
  }
diff --git a/libstdc++-v3/testsuite/ext/unicode/view.cc b/libstdc++-v3/testsuite/ext/unicode/view.cc
index ee23b0b1d8a..6f3c099bd84 100644
--- a/libstdc++-v3/testsuite/ext/unicode/view.cc
+++ b/libstdc++-v3/testsuite/ext/unicode/view.cc
@@ -55,6 +55,13 @@ test_illformed_utf8()
  VERIFY( std::ranges::equal(v5, u8"\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\x41\uFFFD\uFFFD\x42"sv) );
   uc::_Utf8_view v6("\xe1\x80\xe2\xf0\x91\x92\xf1\xbf\x41"sv); // Table 3-11
   VERIFY( std::ranges::equal(v6, u8"\uFFFD\uFFFD\uFFFD\uFFFD\x41"sv) );
+
+  uc::_Utf32_view v7("\xe1\x80"sv);
+  VERIFY( std::ranges::equal(v7, U"\uFFFD"sv) );
+  uc::_Utf32_view v8("\xf1\x80"sv);
+  VERIFY( std::ranges::equal(v8, U"\uFFFD"sv) );
+  uc::_Utf32_view v9("\xf1\x80\x80"sv);
+  VERIFY( std::ranges::equal(v9, U"\uFFFD"sv) );
 }
 
 constexpr void
-- 
2.44.0



[committed] libstdc++: Fix for -std=c++23 -ffreestanding [PR114866]

2024-05-07 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. gcc-14 backport to follow.

-- >8 --

std::shared_ptr isn't declared for freestanding, so guard uses of it
with #if _GLIBCXX_HOSTED in .
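The guard pattern can be sketched in portable code. The sketch uses the standard predefined macro __STDC_HOSTED__, which is 0 under -ffreestanding, as a stand-in for libstdc++'s internal _GLIBCXX_HOSTED. The trait name is_shared_ptr_v is hypothetical, not libstdc++'s __is_shared_ptr. The idea is that every mention of std::shared_ptr sits inside the hosted-only branch, so a freestanding build never names it.

```cpp
#if __STDC_HOSTED__
#  include <memory>
#endif

// Primary template: usable in both hosted and freestanding builds.
template<typename T>
  inline constexpr bool is_shared_ptr_v = false;

#if __STDC_HOSTED__
// Only refer to std::shared_ptr when the hosted library declares it.
template<typename T>
  inline constexpr bool is_shared_ptr_v<std::shared_ptr<T>> = true;
#endif
```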

libstdc++-v3/ChangeLog:

PR libstdc++/114866
* include/bits/out_ptr.h [!_GLIBCXX_HOSTED]: Don't refer to
shared_ptr, __shared_ptr or __is_shared_ptr.
* testsuite/20_util/headers/memory/114866.cc: New test.
---
 libstdc++-v3/include/bits/out_ptr.h| 10 ++
 .../testsuite/20_util/headers/memory/114866.cc |  4 
 2 files changed, 14 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/20_util/headers/memory/114866.cc

diff --git a/libstdc++-v3/include/bits/out_ptr.h b/libstdc++-v3/include/bits/out_ptr.h
index aeeb6640441..d74c9f52d3b 100644
--- a/libstdc++-v3/include/bits/out_ptr.h
+++ b/libstdc++-v3/include/bits/out_ptr.h
@@ -54,9 +54,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 class out_ptr_t
 {
+#if _GLIBCXX_HOSTED
   static_assert(!__is_shared_ptr<_Smart> || sizeof...(_Args) != 0,
"a deleter must be used when adapting std::shared_ptr "
"with std::out_ptr");
+#endif
 
 public:
   explicit
@@ -216,6 +218,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  [[no_unique_address]] _Del2 _M_del;
};
 
+#if _GLIBCXX_HOSTED
   // Partial specialization for std::shared_ptr.
   // This specialization gives direct access to the private member
   // of the shared_ptr, avoiding the overhead of storing a separate
@@ -274,6 +277,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  using _Impl<_Smart, _Pointer, _Del, allocator>::_Impl;
};
+#endif
 
   using _Impl_t = _Impl<_Smart, _Pointer, _Args...>;
 
@@ -293,8 +297,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 class inout_ptr_t
 {
+#if _GLIBCXX_HOSTED
   static_assert(!__is_shared_ptr<_Smart>,
"std::inout_ptr can not be used to wrap std::shared_ptr");
+#endif
 
 public:
   explicit
@@ -320,11 +326,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
 private:
+#if _GLIBCXX_HOSTED
   // Avoid an invalid instantiation of out_ptr_t, ...>
   using _Out_ptr_t
= __conditional_t<__is_shared_ptr<_Smart>,
  out_ptr_t,
  out_ptr_t<_Smart, _Pointer, _Args...>>;
+#else
+  using _Out_ptr_t = out_ptr_t<_Smart, _Pointer, _Args...>;
+#endif
   using _Impl_t = typename _Out_ptr_t::_Impl_t;
   _Impl_t _M_impl;
 };
diff --git a/libstdc++-v3/testsuite/20_util/headers/memory/114866.cc b/libstdc++-v3/testsuite/20_util/headers/memory/114866.cc
new file mode 100644
index 000..7cf6be0539d
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/headers/memory/114866.cc
@@ -0,0 +1,4 @@
+// { dg-options "-ffreestanding" }
+// { dg-do compile }
+// PR libstdc++/114866  & out_ptr in freestanding
+#include 
-- 
2.44.0



[PATCH] libstdc++: Rewrite std::variant comparisons without macros

2024-05-07 Thread Jonathan Wakely
I don't think using a macro for these really saves us much; we can do
this to avoid duplication instead. And now it's not a big, multi-line
macro that's a pain to edit.

Any objections?

Tested x86_64-linux.
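The idea of passing the operator as a function object instead of pasting it in with a macro can be sketched with the public `std::variant` API. This is not the patch's `__compare` (which uses the internal `__raw_idx_visit`); `variant_compare`, `variant_equal` and `variant_less` are hypothetical names, and the sketch follows the same "compare `index() + 1` when the alternatives differ" rule, so a valueless variant (`index() == std::variant_npos`, which wraps to 0) orders first.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <type_traits>
#include <variant>

// Hypothetical helper: one body shared by every relational operator.
template<typename Op, typename... Ts>
bool
variant_compare(const std::variant<Ts...>& lhs,
                const std::variant<Ts...>& rhs, Op op)
{
  // Different alternatives (or both valueless): compare shifted indices.
  if (lhs.index() != rhs.index() || lhs.valueless_by_exception())
    return op(lhs.index() + 1, rhs.index() + 1);
  // Same alternative: compare the contained values.
  return std::visit([&op](const auto& l, const auto& r) -> bool {
      if constexpr (std::is_same_v<decltype(l), decltype(r)>)
        return op(l, r);
      else
        return false; // never reached: both indices are equal here
    }, lhs, rhs);
}

// Each operator becomes a short, ordinary function.
template<typename... Ts>
bool variant_equal(const std::variant<Ts...>& a, const std::variant<Ts...>& b)
{ return variant_compare(a, b, std::equal_to<>{}); }

template<typename... Ts>
bool variant_less(const std::variant<Ts...>& a, const std::variant<Ts...>& b)
{ return variant_compare(a, b, std::less<>{}); }
```

A variant holding `int` therefore compares less than one holding `double` in `std::variant<int, double>`, whatever the values, because index 0 precedes index 1.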

-- >8 --

libstdc++-v3/ChangeLog:

* include/std/variant (__detail::__variant::__compare): New
function template.
(operator==, operator!=, operator<, operator>, operator<=)
(operator>=): Replace macro definition with handwritten function
calling __detail::__variant::__compare.
(operator<=>): Call __detail::__variant::__compare.
---
 libstdc++-v3/include/std/variant | 167 +--
 1 file changed, 114 insertions(+), 53 deletions(-)

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index bf05eec9a6b..cfb4bcdbcc9 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -48,6 +48,7 @@
 #include 
 #include  // in_place_index_t
 #if __cplusplus >= 202002L
+# include 
 # include 
 #endif
 
@@ -1237,47 +1238,119 @@ namespace __variant
 
   struct monostate { };
 
-#if __cpp_lib_concepts
-# define _VARIANT_RELATION_FUNCTION_CONSTRAINTS(TYPES, OP) \
-  requires ((requires (const TYPES& __t) { \
-   { __t OP __t } -> __detail::__boolean_testable; }) && ...)
-#else
-# define _VARIANT_RELATION_FUNCTION_CONSTRAINTS(TYPES, OP)
-#endif
+namespace __detail::__variant
+{
+  template
+constexpr _Ret
+__compare(_Ret __ret, const _Vp& __lhs, const _Vp& __rhs, _Op __op)
+{
+  __variant::__raw_idx_visit(
+   [&__ret, &__lhs, __op] (auto&& __rhs_mem, auto __rhs_index) mutable
+   {
+ if constexpr (__rhs_index != variant_npos)
+   {
+ if (__lhs.index() == __rhs_index.value)
+   {
+ auto& __this_mem = std::get<__rhs_index>(__lhs);
+ __ret = __op(__this_mem, __rhs_mem);
+ return;
+   }
+   }
+ __ret = __op(__lhs.index() + 1, __rhs_index + 1);
+   }, __rhs);
+  return __ret;
+}
+} // namespace __detail::__variant
 
-#define _VARIANT_RELATION_FUNCTION_TEMPLATE(__OP) \
-  template \
-_VARIANT_RELATION_FUNCTION_CONSTRAINTS(_Types, __OP) \
-constexpr bool \
-operator __OP [[nodiscard]] (const variant<_Types...>& __lhs, \
-const variant<_Types...>& __rhs) \
-{ \
-  bool __ret = true; \
-  __detail::__variant::__raw_idx_visit( \
-[&__ret, &__lhs] (auto&& __rhs_mem, auto __rhs_index) mutable \
-{ \
- if constexpr (__rhs_index != variant_npos) \
-   { \
- if (__lhs.index() == __rhs_index) \
-   { \
- auto& __this_mem = std::get<__rhs_index>(__lhs);  \
-  __ret = __this_mem __OP __rhs_mem; \
- return; \
-} \
-} \
- __ret = (__lhs.index() + 1) __OP (__rhs_index + 1); \
-   }, __rhs); \
-  return __ret; \
+  template
+#if __cpp_lib_concepts
+requires ((requires (const _Types& __t) {
+  { __t == __t } -> convertible_to; }) && ...)
+#endif
+constexpr bool
+operator== [[nodiscard]] (const variant<_Types...>& __lhs,
+ const variant<_Types...>& __rhs)
+{
+  return __detail::__variant::__compare(true, __lhs, __rhs,
+   [](auto&& __l, auto&& __r) {
+ return __l == __r;
+   });
 }
 
-  _VARIANT_RELATION_FUNCTION_TEMPLATE(<)
-  _VARIANT_RELATION_FUNCTION_TEMPLATE(<=)
-  _VARIANT_RELATION_FUNCTION_TEMPLATE(==)
-  _VARIANT_RELATION_FUNCTION_TEMPLATE(!=)
-  _VARIANT_RELATION_FUNCTION_TEMPLATE(>=)
-  _VARIANT_RELATION_FUNCTION_TEMPLATE(>)
+  template
+#if __cpp_lib_concepts
+requires ((requires (const _Types& __t) {
+  { __t != __t } -> convertible_to; }) && ...)
+#endif
+constexpr bool
+operator!= [[nodiscard]] (const variant<_Types...>& __lhs,
+ const variant<_Types...>& __rhs)
+{
+  return __detail::__variant::__compare(true, __lhs, __rhs,
+   [](auto&& __l, auto&& __r) {
+ return __l != __r;
+   });
+}
 
-#undef _VARIANT_RELATION_FUNCTION_TEMPLATE
+  template
+#if __cpp_lib_concepts
+requires ((requires (const _Types& __t) {
+  { __t < __t } -> convertible_to; }) && ...)
+#endif
+constexpr bool
+operator< [[nodiscard]] (const variant<_Types...>& __lhs,
+const variant<_Types...>& __rhs)
+{
+  return __detail::__variant::__compare(true, __lhs, __rhs,
+   [](auto&& __l, auto&& __r) {
+ return __l < __r;
+   });
+}
+
+  template
+#if 

Re: [PATCH 4/4] libstdc++: Simplify std::variant comparison operators

2024-05-07 Thread Jonathan Wakely
On Wed, 10 Apr 2024 at 09:51, Jonathan Wakely wrote:
>
> Tested x86_64-linux.
>
> This is just a minor clean-up and could wait for stage 1.

Pushed now.

>
> -- >8 --
>
> libstdc++-v3/ChangeLog:
>
> * include/std/variant (_VARIANT_RELATION_FUNCTION_TEMPLATE):
> Simplify.
> ---
>  libstdc++-v3/include/std/variant | 20 +---
>  1 file changed, 9 insertions(+), 11 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
> index 5ba6d9d42e3..2be0f0c1db7 100644
> --- a/libstdc++-v3/include/std/variant
> +++ b/libstdc++-v3/include/std/variant
> @@ -1245,7 +1245,7 @@ namespace __variant
>  # define _VARIANT_RELATION_FUNCTION_CONSTRAINTS(TYPES, OP)
>  #endif
>
> -#define _VARIANT_RELATION_FUNCTION_TEMPLATE(__OP, __NAME) \
> +#define _VARIANT_RELATION_FUNCTION_TEMPLATE(__OP) \
>template \
>  _VARIANT_RELATION_FUNCTION_CONSTRAINTS(_Types, __OP) \
>  constexpr bool \
> @@ -1262,22 +1262,20 @@ namespace __variant
> { \
>   auto& __this_mem = std::get<__rhs_index>(__lhs);  \
>__ret = __this_mem __OP __rhs_mem; \
> + return; \
>  } \
> - else \
> -   __ret = (__lhs.index() + 1) __OP (__rhs_index + 1); \
>  } \
> -  else \
> -__ret = (__lhs.index() + 1) __OP (__rhs_index + 1); \
> + __ret = (__lhs.index() + 1) __OP (__rhs_index + 1); \
> }, __rhs); \
>return __ret; \
>  }
>
> -  _VARIANT_RELATION_FUNCTION_TEMPLATE(<, less)
> -  _VARIANT_RELATION_FUNCTION_TEMPLATE(<=, less_equal)
> -  _VARIANT_RELATION_FUNCTION_TEMPLATE(==, equal)
> -  _VARIANT_RELATION_FUNCTION_TEMPLATE(!=, not_equal)
> -  _VARIANT_RELATION_FUNCTION_TEMPLATE(>=, greater_equal)
> -  _VARIANT_RELATION_FUNCTION_TEMPLATE(>, greater)
> +  _VARIANT_RELATION_FUNCTION_TEMPLATE(<)
> +  _VARIANT_RELATION_FUNCTION_TEMPLATE(<=)
> +  _VARIANT_RELATION_FUNCTION_TEMPLATE(==)
> +  _VARIANT_RELATION_FUNCTION_TEMPLATE(!=)
> +  _VARIANT_RELATION_FUNCTION_TEMPLATE(>=)
> +  _VARIANT_RELATION_FUNCTION_TEMPLATE(>)
>
>  #undef _VARIANT_RELATION_FUNCTION_TEMPLATE
>
> --
> 2.44.0
>



Re: [PATCH v2] aarch64: Preserve mem info on change of base for ldp/stp [PR114674]

2024-05-07 Thread Alex Coplan
On 12/04/2024 12:13, Richard Sandiford wrote:
> Alex Coplan  writes:
> > This is a v2 because I accidentally sent a WIP version of the patch last
> > time round which used replace_equiv_address instead of
> > replace_equiv_address_nv; that caused some ICEs (pointed out by the
> > Linaro CI) since pair addressing modes aren't a subset of the addresses
> > that are accepted by memory_operand for a given mode.
> >
> > This patch should otherwise be identical to v1.  Bootstrapped/regtested
> > on aarch64-linux-gnu (indeed this is the patch I actually tested last
> > time), is this version also OK for GCC 15?
> 
> OK, thanks.  Sorry for missing this in the first review.

Now pushed to trunk, thanks.

Alex

> 
> Richard
> 
> > Thanks,
> > Alex
> >
> > --- >8 ---
> >
> > The ldp/stp fusion pass can change the base of an access so that the two
> > accesses end up using a common base register.  So far we have been using
> > adjust_address_nv to do this, but this means that we don't preserve
> > other properties of the mem we're replacing.  It seems better to use
> > replace_equiv_address_nv, as this will preserve e.g. the MEM_ALIGN of the
> > mem whose address we're changing.
> >
> > The PR shows that by adjusting the other mem we lose alignment
> > information about the original access and therefore end up rejecting an
> > otherwise viable pair when --param=aarch64-stp-policy=aligned is passed.
> > This patch fixes that by using replace_equiv_address_nv instead.
> >
> > Notably this is the same approach as taken by
> > aarch64_check_consecutive_mems when a change of base is required, so
> > this at least makes things more consistent between the ldp fusion pass
> > and the peepholes.
> >
> > gcc/ChangeLog:
> >
> > PR target/114674
> > * config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::fuse_pair):
> > Use replace_equiv_address_nv on a change of base instead of
> > adjust_address_nv on the other access.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/114674
> > * gcc.target/aarch64/pr114674.c: New test.
> >
> > diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > index 365dcf48b22..d07d79df06c 100644
> > --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > @@ -1730,11 +1730,11 @@ ldp_bb_info::fuse_pair (bool load_p,
> > adjust_amt *= -1;
> >  
> >rtx change_reg = XEXP (change_pat, !load_p);
> > -  machine_mode mode_for_mem = GET_MODE (change_mem);
> >rtx effective_base = drop_writeback (base_mem);
> > -  rtx new_mem = adjust_address_nv (effective_base,
> > -  mode_for_mem,
> > -  adjust_amt);
> > +  rtx adjusted_addr = plus_constant (Pmode,
> > +XEXP (effective_base, 0),
> > +adjust_amt);
> > +  rtx new_mem = replace_equiv_address_nv (change_mem, adjusted_addr);
> >rtx new_set = load_p
> > ? gen_rtx_SET (change_reg, new_mem)
> > : gen_rtx_SET (new_mem, change_reg);
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr114674.c b/gcc/testsuite/gcc.target/aarch64/pr114674.c
> > new file mode 100644
> > index 000..944784fd008
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr114674.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 --param=aarch64-stp-policy=aligned" } */
> > +typedef struct {
> > +   unsigned int f1;
> > +   unsigned int f2;
> > +} test_struct;
> > +
> > +static test_struct ts = {
> > +   123, 456
> > +};
> > +
> > +void foo(void)
> > +{
> > +   ts.f2 = 36969 * (ts.f2 & 65535) + (ts.f1 >> 16);
> > +   ts.f1 = 18000 * (ts.f2 & 65535) + (ts.f2 >> 16);
> > +}
> > +/* { dg-final { scan-assembler-times "stp" 1 } } */


[committed] libstdc++: Constrain equality ops for std::pair, std::tuple, std::variant

2024-05-07 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

Implement the changes from P2944R3 which add constraints to the
comparison operators of std::pair, std::tuple, and std::variant.

The paper also changes std::optional, but we already constrain its
comparisons using SFINAE on the return type. However, we need some
additional constraints on the [optional.comp.with.t] operators that
compare an optional with a value. The paper doesn't say to do that, but
I think it's needed because otherwise when the comparison for two
optional objects fails its constraints, the two overloads that are
supposed to be for comparing to a non-optional become the best overload
candidates, but are ambiguous (and we don't even get as far as checking
the constraints for satisfaction). I reported LWG 4072 for this.

The paper does not change std::expected, but probably should have done.
I'll submit an LWG issue about that and implement it separately.

Also add [[nodiscard]] to all these comparison operators.

libstdc++-v3/ChangeLog:

* include/bits/stl_pair.h (operator==): Add constraint.
* include/bits/version.def (constrained_equality): Define.
* include/bits/version.h: Regenerate.
* include/std/optional: Define feature test macro.
(__optional_rep_op_t): Use is_convertible_v instead of
is_convertible.
* include/std/tuple: Define feature test macro.
(operator==, __tuple_cmp, operator<=>): Reimplement C++20
comparisons using lambdas. Add constraints.
* include/std/utility: Define feature test macro.
* include/std/variant: Define feature test macro.
(_VARIANT_RELATION_FUNCTION_TEMPLATE): Add constraints.
(variant): Remove unnecessary friend declarations for comparison
operators.
* testsuite/20_util/optional/relops/constrained.cc: New test.
* testsuite/20_util/pair/comparison_operators/constrained.cc:
New test.
* testsuite/20_util/tuple/comparison_operators/constrained.cc:
New test.
* testsuite/20_util/variant/relops/constrained.cc: New test.
* testsuite/20_util/tuple/comparison_operators/overloaded.cc:
Disable for C++20 and later.
* testsuite/20_util/tuple/comparison_operators/overloaded2.cc:
Remove dg-error line for target c++20.
---
 libstdc++-v3/include/bits/stl_pair.h  |  16 +-
 libstdc++-v3/include/bits/version.def |   9 +
 libstdc++-v3/include/bits/version.h   |  10 +
 libstdc++-v3/include/std/optional |  50 +++-
 libstdc++-v3/include/std/tuple| 102 ---
 libstdc++-v3/include/std/utility  |   1 +
 libstdc++-v3/include/std/variant  |  28 +-
 .../20_util/optional/relops/constrained.cc| 258 ++
 .../pair/comparison_operators/constrained.cc  |  48 
 .../tuple/comparison_operators/constrained.cc |  50 
 .../tuple/comparison_operators/overloaded.cc  |   6 +-
 .../tuple/comparison_operators/overloaded2.cc |   1 -
 .../20_util/variant/relops/constrained.cc | 175 
 13 files changed, 679 insertions(+), 75 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/optional/relops/constrained.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/pair/comparison_operators/constrained.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/tuple/comparison_operators/constrained.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/variant/relops/constrained.cc

diff --git a/libstdc++-v3/include/bits/stl_pair.h b/libstdc++-v3/include/bits/stl_pair.h
index 45317417c9c..0c1e5719a1a 100644
--- a/libstdc++-v3/include/bits/stl_pair.h
+++ b/libstdc++-v3/include/bits/stl_pair.h
@@ -1000,14 +1000,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template pair(_T1, _T2) -> pair<_T1, _T2>;
 #endif
 
-#if __cpp_lib_three_way_comparison && __cpp_lib_concepts
+#if __cpp_lib_three_way_comparison
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 3865. Sorting a range of pairs
 
   /// Two pairs are equal iff their members are equal.
   template
-inline _GLIBCXX_CONSTEXPR bool
+[[nodiscard]]
+constexpr bool
 operator==(const pair<_T1, _T2>& __x, const pair<_U1, _U2>& __y)
+requires requires {
+  { __x.first == __y.first } -> __detail::__boolean_testable;
+  { __x.second == __y.second } -> __detail::__boolean_testable;
+}
 { return __x.first == __y.first && __x.second == __y.second; }
 
   /** Defines a lexicographical order for pairs.
@@ -1018,6 +1023,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* less than `Q.second`.
   */
   template
+[[nodiscard]]
 constexpr common_comparison_category_t<__detail::__synth3way_t<_T1, _U1>,
   __detail::__synth3way_t<_T2, _U2>>
 operator<=>(const pair<_T1, _T2>& __x, const pair<_U1, _U2>& __y)
@@ -1029,6 +1035,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #else
   /// Two pairs of the same type are equal iff their members are equal.
   template
+

[PATCH] libstdc++: Use __builtin_shufflevector for simd split and concat

2024-05-07 Thread Matthias Kretz
Tested on x86_64-linux-gnu and aarch64-linux-gnu and with Clang 18 on x86_64-linux-gnu.

OK for trunk and backport(s)?

-- 8< 

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

PR libstdc++/114958
* include/experimental/bits/simd.h (__as_vector): Return scalar
simd as one-element vector. Return vector from single-vector
fixed_size simd.
(__vec_shuffle): New.
(__extract_part): Adjust return type signature.
(split): Use __extract_part for any split into non-fixed_size
simds.
(concat): If the return type stores a single vector, use
__vec_shuffle (which calls __builtin_shufflevector) to produce
the return value.
* include/experimental/bits/simd_builtin.h
(__shift_elements_right): Removed.
(__extract_part): Return single elements directly. Use
__vec_shuffle (which calls __builtin_shufflevector) for all
non-trivial cases.
* include/experimental/bits/simd_fixed_size.h (__extract_part):
Return single elements directly.
* testsuite/experimental/simd/pr114958.cc: New test.
---
 libstdc++-v3/include/experimental/bits/simd.h | 161 +-
 .../include/experimental/bits/simd_builtin.h  | 152 +
 .../experimental/bits/simd_fixed_size.h   |   4 +-
 .../testsuite/experimental/simd/pr114958.cc   |  20 +++
 4 files changed, 145 insertions(+), 192 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/experimental/simd/pr114958.cc


--
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 stdₓ::simd
──
diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 6ef9c955cfa..6a6fd4f109d 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -1651,7 +1651,24 @@ __as_vector(_V __x)
 if constexpr (__is_vector_type_v<_V>)
   return __x;
 else if constexpr (is_simd<_V>::value || is_simd_mask<_V>::value)
-  return __data(__x)._M_data;
+  {
+	if constexpr (__is_fixed_size_abi_v)
+	  {
+	static_assert(is_simd<_V>::value);
+	static_assert(_V::abi_type::template __traits<
+			typename _V::value_type>::_SimdMember::_S_tuple_size == 1);
+	return __as_vector(__data(__x).first);
+	  }
+	else if constexpr (_V::size() > 1)
+	  return __data(__x)._M_data;
+	else
+	  {
+	static_assert(is_simd<_V>::value);
+	using _Tp = typename _V::value_type;
+	using _RV [[__gnu__::__vector_size__(sizeof(_Tp))]] = _Tp;
+	return _RV{__data(__x)};
+	  }
+  }
 else if constexpr (__is_vectorizable_v<_V>)
   return __vector_type_t<_V, 2>{__x};
 else
@@ -2061,6 +2078,60 @@ __not(_Tp __a) noexcept
   return ~__a;
   }
 
+// }}}
+// __vec_shuffle{{{
+template 
+  _GLIBCXX_SIMD_INTRINSIC constexpr auto
+  __vec_shuffle(_T0 __x, _T1 __y, index_sequence<_Is...> __seq, _Fun __idx_perm)
+  {
+constexpr int _N0 = sizeof(__x) / sizeof(__x[0]);
+constexpr int _N1 = sizeof(__y) / sizeof(__y[0]);
+#if __has_builtin(__builtin_shufflevector)
+#ifdef __clang__
+// Clang requires _T0 == _T1
+if constexpr (sizeof(__x) > sizeof(__y) and _N1 == 1)
+  return __vec_shuffle(__x, _T0{__y[0]}, __seq, __idx_perm);
+else if constexpr (sizeof(__x) > sizeof(__y))
+  return __vec_shuffle(__x, __intrin_bitcast<_T0>(__y), __seq, __idx_perm);
+else if constexpr (sizeof(__x) < sizeof(__y) and _N0 == 1)
+  return __vec_shuffle(_T1{__x[0]}, __y, __seq, [=](int __i) {
+	   __i = __idx_perm(__i);
+	   return __i < _N0 ? __i : __i - _N0 + _N1;
+	 });
+else if constexpr (sizeof(__x) < sizeof(__y))
+  return __vec_shuffle(__intrin_bitcast<_T1>(__x), __y, __seq, [=](int __i) {
+	   __i = __idx_perm(__i);
+	   return __i < _N0 ? __i : __i - _N0 + _N1;
+	 });
+else
+#endif
+  return __builtin_shufflevector(__x, __y, [=] {
+	   constexpr int __j = __idx_perm(_Is);
+	   static_assert(__j < _N0 + _N1);
+	   return __j;
+	 }()...);
+#else
+using _Tp = __remove_cvref_t;
+return __vector_type_t<_Tp, sizeof...(_Is)> {
+  [=]() -> _Tp {
+	constexpr int __j = __idx_perm(_Is);
+	static_assert(__j < _N0 + _N1);
+	if constexpr (__j < 0)
+	  return 0;
+	else if constexpr (__j < _N0)
+	  return __x[__j];
+	else
+	  return __y[__j - _N0];
+  }()...
+};
+#endif
+  }
+
+template 
+  _GLIBCXX_SIMD_INTRINSIC constexpr auto
+  __vec_shuffle(_T0 __x, _Seq __seq, _Fun __idx_perm)
+  { return __vec_shuffle(__x, _T0(), __seq, __idx_perm); }
+
 // }}}
 // __concat{{{
 template ,
@@ -3947,7 +4018,7 @@ clamp(const simd<_Tp, _Ap>& __v, const simd<_Tp, _Ap>& __lo, const 

[PATCH] Fix guard for IDF pruning by dominator

2024-05-07 Thread Richard Biener
When insert_updated_phi_nodes_for tries to skip pruning the IDF to
blocks dominated by the nearest common dominator of the set of
definition blocks, it compares that dominator against ENTRY_BLOCK.
But ENTRY_BLOCK is never going to be the common dominator; at most
the common dominator is ENTRY_BLOCK's single successor.
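The argument can be illustrated on a toy dominator tree. This is not GCC's implementation: `ncd` and `ncd_for_set` are hypothetical names, block 0 plays the role of ENTRY_BLOCK, and the example CFG is assumed to be ENTRY (0) → 1, 1 → 2 and 3, 2 → 4. Since definition blocks are real blocks below ENTRY, their nearest common dominator is at most block 1, the entry's single successor, never block 0.

```cpp
#include <cassert>
#include <vector>

// Toy dominator tree (hypothetical): idom[b] is the immediate dominator
// of block b, depth[b] is its depth, and idom[0] == 0 for the entry.
int ncd(const std::vector<int>& idom, const std::vector<int>& depth,
        int a, int b)
{
  // Lift the deeper block to the same depth, then walk both up in
  // lockstep until they meet at the nearest common dominator.
  while (depth[a] > depth[b]) a = idom[a];
  while (depth[b] > depth[a]) b = idom[b];
  while (a != b) { a = idom[a]; b = idom[b]; }
  return a;
}

// Nearest common dominator of a non-empty set of blocks.
int ncd_for_set(const std::vector<int>& idom, const std::vector<int>& depth,
                const std::vector<int>& blocks)
{
  int result = blocks.front();
  for (int b : blocks)
    result = ncd(idom, depth, result, b);
  return result;
}
```

For the example CFG, `idom = {0, 0, 1, 1, 2}` and `depth = {0, 1, 2, 2, 3}`; the set {3, 4} gives block 1.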

Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-into-ssa.cc (insert_updated_phi_nodes_for): Skip
pruning when the nearest common dominator is the successor
of ENTRY_BLOCK.
---
 gcc/tree-into-ssa.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc
index 705e4119ba3..858c3840475 100644
--- a/gcc/tree-into-ssa.cc
+++ b/gcc/tree-into-ssa.cc
@@ -3262,7 +3262,7 @@ insert_updated_phi_nodes_for (tree var, bitmap_head *dfs,
 common dominator of all the definition blocks.  */
  entry = nearest_common_dominator_for_set (CDI_DOMINATORS,
db->def_blocks);
- if (entry != ENTRY_BLOCK_PTR_FOR_FN (cfun))
+ if (entry != single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)))
EXECUTE_IF_SET_IN_BITMAP (idf, 0, i, bi)
  if (BASIC_BLOCK_FOR_FN (cfun, i) != entry
  && dominated_by_p (CDI_DOMINATORS,
-- 
2.35.3


Re: [V2][PATCH] gcc-14/changes.html: Deprecate a GCC C extension on flexible array members.

2024-05-07 Thread Sebastian Huber

On 06.05.24 16:20, Qing Zhao wrote:
> Hi, Sebastian,
>
> Looks like that the behavior you described is correct.
> What’s your major concern? ( a little confused).


I am concerned that the static initialization of structures with 
flexible array members no longer works. In the RTEMS open source 
real-time operating system, we use flexible array members in some parts. 
One example is the thread control block which is used to manage a thread:


struct _Thread_Control {
  /** This field is the object management structure for each thread. */
  Objects_Control  Object;

[...]

  /**
   * @brief Variable length array of user extension pointers.
   *
   * The length is defined by the application via .
   */
  void *extensions[];
};

In a static configuration of the operating system we have something like 
this:


struct Thread_Configured_control {
/*
 * This was added to address the following warning.
 * warning: invalid use of structure with flexible array member
 */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"
  Thread_Control Control;
#pragma GCC diagnostic pop

  #if CONFIGURE_MAXIMUM_USER_EXTENSIONS > 0
void *extensions[ CONFIGURE_MAXIMUM_USER_EXTENSIONS + 1 ];
  #endif
  Configuration_Scheduler_node Scheduler_nodes[ _CONFIGURE_SCHEDULER_COUNT ];

  RTEMS_API_Control API_RTEMS;
  #ifdef RTEMS_POSIX_API
POSIX_API_Control API_POSIX;
  #endif
  #if CONFIGURE_MAXIMUM_THREAD_NAME_SIZE > 1
char name[ CONFIGURE_MAXIMUM_THREAD_NAME_SIZE ];
  #endif
  #if defined(_CONFIGURE_ENABLE_NEWLIB_REENTRANCY) && \
!defined(_REENT_THREAD_LOCAL)
struct _reent Newlib;
  #endif
};

This is used to define a table of thread control blocks:

Thread_Configured_control \
name##_Objects[ _Objects_Maximum_per_allocation( max ) ]; \
static RTEMS_SECTION( ".noinit.rtems.content.objects." #name ) \

I would like to know what consequences the deprecation of this GCC
extension has.


--
embedded brains GmbH & Co. KG
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Register court: Amtsgericht München (Munich Local Court)
Registration number: HRB 157899
Authorized managing directors: Peter Rasmussen, Thomas Dörfler
Our privacy policy can be found here:
https://embedded-brains.de/datenschutzerklaerung/


[wwwdocs] Specify AArch64 BitInt support for little-endian only

2024-05-07 Thread Andre Vieira (lists)

Hey Jakub,

Is this what ya had in mind?

Kind regards,
Andre Vieira
diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index ca5174de991bb088f653468f77485c15a61526e6..924e045a15a78b5702a0d6997953f35c6b47efd1 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -325,7 +325,7 @@ You may also want to check out our
   Bit-precise integer types (_BitInt (N)
   and unsigned _BitInt (N)): integer types with
   a specified number of bits.  These are only supported on
-  IA-32, x86-64 and AArch64 at present.
+  IA-32, x86-64 and AArch64 (little-endian) at present.
   Structure, union and enumeration types may be defined more
   than once in the same scope with the same contents and the same
   tag; if such types are defined with the same contents and the


[committed] libstdc++: Use https instead of http in some comments

2024-05-07 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/backward/auto_ptr.h: Use https for URL in comment.
* include/bits/basic_ios.h: Likewise.
* include/std/iostream: Likewise.
---
 libstdc++-v3/include/backward/auto_ptr.h | 2 +-
 libstdc++-v3/include/bits/basic_ios.h| 6 +++---
 libstdc++-v3/include/std/iostream| 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/backward/auto_ptr.h b/libstdc++-v3/include/backward/auto_ptr.h
index dccd459f1e5..271a64d1de0 100644
--- a/libstdc++-v3/include/backward/auto_ptr.h
+++ b/libstdc++-v3/include/backward/auto_ptr.h
@@ -265,7 +265,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @endcode
*
*  But it doesn't work, and won't be fixed. For further details see
-   *  http://cplusplus.github.io/LWG/lwg-closed.html#463
+   *  https://cplusplus.github.io/LWG/lwg-closed.html#463
*/
   auto_ptr(auto_ptr_ref __ref) throw()
   : _M_ptr(__ref._M_ptr) { }
diff --git a/libstdc++-v3/include/bits/basic_ios.h b/libstdc++-v3/include/bits/basic_ios.h
index 44a77149112..258e6042b8f 100644
--- a/libstdc++-v3/include/bits/basic_ios.h
+++ b/libstdc++-v3/include/bits/basic_ios.h
@@ -408,7 +408,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  with this stream, calls that buffer's @c pubimbue(loc).
*
*  Additional l10n notes are at
-   *  http://gcc.gnu.org/onlinedocs/libstdc++/manual/localization.html
+   *  https://gcc.gnu.org/onlinedocs/libstdc++/manual/localization.html
   */
   locale
   imbue(const locale& __loc);
@@ -428,7 +428,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @endcode
*
*  Additional l10n notes are at
-   *  http://gcc.gnu.org/onlinedocs/libstdc++/manual/localization.html
+   *  https://gcc.gnu.org/onlinedocs/libstdc++/manual/localization.html
   */
   char
   narrow(char_type __c, char __dfault) const
@@ -447,7 +447,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @endcode
*
*  Additional l10n notes are at
-   *  http://gcc.gnu.org/onlinedocs/libstdc++/manual/localization.html
+   *  https://gcc.gnu.org/onlinedocs/libstdc++/manual/localization.html
   */
   char_type
   widen(char __c) const
diff --git a/libstdc++-v3/include/std/iostream b/libstdc++-v3/include/std/iostream
index 0c6a2d8a4b3..4f4fa6880d5 100644
--- a/libstdc++-v3/include/std/iostream
+++ b/libstdc++-v3/include/std/iostream
@@ -50,7 +50,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*
*  The `` header declares the eight *standard stream objects*.
*  For other declarations, see
-   *  http://gcc.gnu.org/onlinedocs/libstdc++/manual/io.html
+   *  https://gcc.gnu.org/onlinedocs/libstdc++/manual/io.html
*  and the @link iosfwd I/O forward declarations @endlink
*
*  They are required by default to cooperate with the global C
-- 
2.44.0


