date:20230429

[PATCH 0/2] Unify and deduplicate FTM code

2023-04-29 Thread Arsen Arsenović via Gcc-patches

Greetings!

This patch set replaces all code that involves defining feature test
macros based on loosely put together conditionals in the standard
library with a unified helper for specifying and requiring feature test
macros, as well as updating most usage sites, many of which have been
migrated to following a pattern similar, in structure, to:

  ...
  #define __glibcxx_want_foo
  #include <bits/version.h>
  ...
  namespace std {
  ...
  #ifdef __cpp_lib_foo
template<typename T>
void foonicate(T&& t)
{ __builtin_foonicate_address(std::__addressof(t)); }
  #endif // __cpp_lib_foo
  ...
  } // namespace std

In the future this should aid in preventing <version> from being
dishonest about what the implementation provides, as well as reducing
the amount of finicky work it takes to update FTMs.

Note that this patchset is not perfect.  The usage sites of various
feature test macros still include "wide" condition blocks that shadow
over the blocks that check for FTMs, mostly in places where features
with FTMs are the exception, rather than the norm.

That said, using a pair of scripts[1][2], I've verified that the code
emitted in bits/stdc++.h remains unchanged (save for a misdeclared
__cpp_lib_constexpr_string in !HOSTED).  I've also regression-tested
--enable-languages=c,c++,lto on x86_64-pc-linux-gnu and run the
libstdc++ testsuite with

  --target_board="unix{,-std=c++98,-std=gnu++11,-std=gnu++20,
  -D_GLIBCXX_USE_CXX11_ABI=0/-D_GLIBCXX_DEBUG,-D_GLIBCXX_DEBUG,
  -std=gnu++23}{-fno-freestanding,-ffreestanding}"

(without the line breaks) to find no relevant failures.

OK for trunk?

Thanks in advance, have a lovely day.

[1] https://git.sr.ht/~arsen/scripts/tree/master/item/difall.bash
[2] https://git.sr.ht/~arsen/scripts/tree/master/item/vercmp.bash

Arsen Arsenović (2):
  libstdc++: Implement more maintainable <version> header
  libstdc++: Replace all manual FTM definitions and use

 libstdc++-v3/include/Makefile.am  |   10 +-
 libstdc++-v3/include/Makefile.in  |   10 +-
 libstdc++-v3/include/bits/algorithmfwd.h  |7 +-
 libstdc++-v3/include/bits/align.h |8 +-
 libstdc++-v3/include/bits/alloc_traits.h  |   11 +-
 libstdc++-v3/include/bits/allocator.h |3 +-
 libstdc++-v3/include/bits/atomic_base.h   |   14 +-
 libstdc++-v3/include/bits/atomic_wait.h   |   10 +-
 libstdc++-v3/include/bits/basic_string.h  |   24 +-
 libstdc++-v3/include/bits/char_traits.h   |   11 +-
 libstdc++-v3/include/bits/chrono.h|   18 +-
 libstdc++-v3/include/bits/cow_string.h|9 +-
 libstdc++-v3/include/bits/erase_if.h  |   11 +-
 libstdc++-v3/include/bits/forward_list.h  |6 +-
 libstdc++-v3/include/bits/hashtable.h |9 +-
 libstdc++-v3/include/bits/ios_base.h  |6 +-
 libstdc++-v3/include/bits/move.h  |8 +-
 .../include/bits/move_only_function.h |9 +-
 libstdc++-v3/include/bits/node_handle.h   |8 +-
 libstdc++-v3/include/bits/ptr_traits.h|   15 +-
 libstdc++-v3/include/bits/range_access.h  |   16 +-
 libstdc++-v3/include/bits/ranges_algo.h   |   27 +-
 libstdc++-v3/include/bits/ranges_cmp.h|   14 +-
 libstdc++-v3/include/bits/shared_ptr.h|   10 +-
 libstdc++-v3/include/bits/shared_ptr_atomic.h |6 +-
 libstdc++-v3/include/bits/shared_ptr_base.h   |   17 +-
 libstdc++-v3/include/bits/specfun.h   |6 +-
 libstdc++-v3/include/bits/stl_algo.h  |   20 +-
 libstdc++-v3/include/bits/stl_algobase.h  |   13 +-
 libstdc++-v3/include/bits/stl_function.h  |   28 +-
 libstdc++-v3/include/bits/stl_iterator.h  |   21 +-
 libstdc++-v3/include/bits/stl_list.h  |6 +-
 libstdc++-v3/include/bits/stl_map.h   |6 +-
 libstdc++-v3/include/bits/stl_pair.h  |   12 +-
 libstdc++-v3/include/bits/stl_queue.h |9 +-
 libstdc++-v3/include/bits/stl_stack.h |7 +-
 libstdc++-v3/include/bits/stl_tree.h  |7 +-
 libstdc++-v3/include/bits/stl_uninitialized.h |9 +-
 libstdc++-v3/include/bits/stl_vector.h|4 +-
 libstdc++-v3/include/bits/unique_ptr.h|   13 +-
 libstdc++-v3/include/bits/unordered_map.h |8 +-
 .../include/bits/uses_allocator_args.h|   10 +-
 libstdc++-v3/include/bits/utility.h   |   21 +-
 libstdc++-v3/include/bits/version.def | 1591 ++
 libstdc++-v3/include/bits/version.h   | 1937 +
 libstdc++-v3/include/bits/version.tpl |  209 ++
 .../include/c_compatibility/stdatomic.h   |9 +-
 libstdc++-v3/include/c_global/cmath   |   18 +-
 libstdc++-v3/include/c_global/cstddef |9 +-
 libstdc++-v3/include/std/algorithm|   10 +-
 libstdc++-v3/include/std/any  |9 +-
 libstdc++-v3/include/std/array|9 +-
 libstdc++-v3/include/std/atomic   |   67 +-
 libstdc++-v3/include/std/barrier  |   11 +-
 libstdc++-v

[PATCH 1/2] libstdc++: Implement more maintainable <version> header

2023-04-29 Thread Arsen Arsenović via Gcc-patches

This commit replaces the ad-hoc logic in <version> with an AutoGen
database that (mostly) declaratively generates a version.h bit which
combines all of the FTM logic across all headers together.

This generated header defines macros of the form __glibcxx_foo,
equivalent to their __cpp_lib_foo variants, according to rules specified
in version.def and, optionally, if __glibcxx_want_foo or
__glibcxx_want_all are defined, also defines __cpp_lib_foo forms with
the same definition.
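
The gating described above can be sketched as follows.  This is only an
illustration, not the generated header: the MYLIB_* names, the value and
the precondition are hypothetical stand-ins for __glibcxx_foo,
__glibcxx_want_foo, __glibcxx_want_all and __cpp_lib_foo, and consuming
the request with #undef is an assumption made for the sketch.

```cpp
// Hypothetical stand-in for one generated block of bits/version.h.
// MYLIB_FOO plays the role of __glibcxx_foo, MYLIB_WANT_FOO of
// __glibcxx_want_foo, MYLIB_WANT_ALL of __glibcxx_want_all and
// MYLIB_CPP_LIB_FOO of __cpp_lib_foo.

// A header that provides the feature requests the public macro first.
#define MYLIB_WANT_FOO

// --- simulated generated block for the feature "foo" ---
#if __cplusplus >= 201103L            // precondition (made up for the sketch)
# define MYLIB_FOO 202304L            // internal macro: defined whenever usable
# if defined(MYLIB_WANT_ALL) || defined(MYLIB_WANT_FOO)
#  define MYLIB_CPP_LIB_FOO 202304L   // public macro: defined only on request
# endif
#endif
#undef MYLIB_WANT_FOO                 // assumed: a request is consumed once
// --- end of simulated block ---

#ifdef MYLIB_CPP_LIB_FOO
// The feature itself is guarded by the public macro.
constexpr long foo_feature_level() { return MYLIB_CPP_LIB_FOO; }
#else
constexpr long foo_feature_level() { return 0; }
#endif
```

With this shape, a header that never defines the request macro still sees
the internal macro but does not advertise the public one.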

libstdc++-v3/ChangeLog:

* include/Makefile.am (bits_freestanding): Add version.h.
(allcreated): Add version.h.
(${bits_srcdir}/version.h): New rule.  Regenerates
version.h out of version.{def,tpl}.
* include/Makefile.in: Regenerate.
* include/bits/version.def: New file.  Declares a list of
all feature test macros, their values and their preconditions.
* include/bits/version.tpl: New file.  Turns version.def
into a sequence of #if blocks.
* include/bits/version.h: New file.  Generated from
version.def.
* include/std/version: Replace with a __glibcxx_want_all define
and bits/version.h include.
---
 libstdc++-v3/include/Makefile.am  |   10 +-
 libstdc++-v3/include/Makefile.in  |   10 +-
 libstdc++-v3/include/bits/version.def | 1591 
 libstdc++-v3/include/bits/version.h   | 1937 +
 libstdc++-v3/include/bits/version.tpl |  209 +++
 libstdc++-v3/include/std/version  |  350 +
 6 files changed, 3758 insertions(+), 349 deletions(-)
 create mode 100644 libstdc++-v3/include/bits/version.def
 create mode 100644 libstdc++-v3/include/bits/version.h
 create mode 100644 libstdc++-v3/include/bits/version.tpl

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index a880e8ee227..a07b4c18585 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -154,6 +154,7 @@ bits_freestanding = \
${bits_srcdir}/stl_raw_storage_iter.h \
${bits_srcdir}/stl_relops.h \
${bits_srcdir}/stl_uninitialized.h \
+   ${bits_srcdir}/version.h \
${bits_srcdir}/string_view.tcc \
${bits_srcdir}/uniform_int_dist.h \
${bits_srcdir}/unique_ptr.h \
@@ -1113,7 +1114,8 @@ allcreated = \
${host_builddir}/c++config.h \
${host_builddir}/largefile-config.h \
${thread_host_headers} \
-   ${pch_build}
+   ${pch_build} \
+   ${bits_srcdir}/version.h
 
 # Here are the rules for building the headers
 all-local: ${allstamped} ${allcreated}
@@ -1463,6 +1465,12 @@ ${pch3_output}: ${pch3_source} ${pch2_output}
-mkdir -p ${pch3_output_builddir}
$(CXX) $(PCHFLAGS) $(AM_CPPFLAGS) -O2 -g ${pch3_source} -o $@
 
+# AutoGen .
+${bits_srcdir}/version.h: ${bits_srcdir}/version.def \
+   ${bits_srcdir}/version.tpl
+   cd $(@D) && \
+   autogen version.def
+
 # The real deal.
 install-data-local: install-headers
 install-headers:
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 0ff875b280b..f5b04d3fe8a 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -509,6 +509,7 @@ bits_freestanding = \
${bits_srcdir}/stl_raw_storage_iter.h \
${bits_srcdir}/stl_relops.h \
${bits_srcdir}/stl_uninitialized.h \
+   ${bits_srcdir}/version.h \
${bits_srcdir}/string_view.tcc \
${bits_srcdir}/uniform_int_dist.h \
${bits_srcdir}/unique_ptr.h \
@@ -1441,7 +1442,8 @@ allcreated = \
${host_builddir}/c++config.h \
${host_builddir}/largefile-config.h \
${thread_host_headers} \
-   ${pch_build}
+   ${pch_build} \
+   ${bits_srcdir}/version.h
 
 
 # Host includes for threads
@@ -1937,6 +1939,12 @@ ${pch3_output}: ${pch3_source} ${pch2_output}
-mkdir -p ${pch3_output_builddir}
$(CXX) $(PCHFLAGS) $(AM_CPPFLAGS) -O2 -g ${pch3_source} -o $@
 
+# AutoGen .
+${bits_srcdir}/version.h: ${bits_srcdir}/version.def \
+   ${bits_srcdir}/version.tpl
+   cd $(@D) && \
+   autogen version.def
+
 # The real deal.
 install-data-local: install-headers
 install-headers:
diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
new file mode 100644
index 000..afdec9acfe3
--- /dev/null
+++ b/libstdc++-v3/include/bits/version.def
@@ -0,0 +1,1591 @@
+// Feature test macro definitions  -*- C++ -*-
+// Copyright (C) 2023 Free Software Foundation, Inc.
+
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT AN

[PATCH] OpenACC: Further attach/detach clause fixes for Fortran [PR109622]

2023-04-29 Thread Julian Brown

This patch moves several tests introduced by the following patch:

  https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616939.html

into the proper location for OpenACC testing (thanks to Thomas for
spotting my mistake!), and also fixes a few additional problems --
missing diagnostics for non-pointer attaches, and a case where a pointer
was incorrectly dereferenced. Tests are also adjusted for vector-length
warnings on nvidia accelerators.

Tested with offloading to nvptx. OK?

2023-04-29  Julian Brown  

PR fortran/109622

gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_clauses): Add diagnostic for
non-pointer/non-allocatable attach/detach.  Remove dereference for
pointer-to-scalar derived type component attach/detach.

gcc/testsuite/
* gfortran.dg/goacc/pr109622-5.f90: New test.

libgomp/
* testsuite/libgomp.fortran/pr109622.f90: Move test...
* testsuite/libgomp.oacc-fortran/pr109622.f90: ...to here. Ignore
vector length warning.
* testsuite/libgomp.fortran/pr109622-2.f90: Move test...
* testsuite/libgomp.oacc-fortran/pr109622-2.f90: ...to here.  Add
missing copyin/copyout variable. Ignore vector length warnings.
* testsuite/libgomp.fortran/pr109622-3.f90: Move test...
* testsuite/libgomp.oacc-fortran/pr109622-3.f90: ...to here.  Ignore
vector length warnings.
* testsuite/libgomp.oacc-fortran/pr109622-4.f90: New test.
---
 gcc/fortran/trans-openmp.cc   | 38 ---
 .../gfortran.dg/goacc/pr109622-5.f90  | 45 ++
 .../pr109622-2.f90|  7 ++-
 .../pr109622-3.f90|  3 ++
 .../libgomp.oacc-fortran/pr109622-4.f90   | 47 +++
 .../pr109622.f90  |  3 ++
 6 files changed, 135 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/pr109622-5.f90
 rename libgomp/testsuite/{libgomp.fortran => 
libgomp.oacc-fortran}/pr109622-2.f90 (63%)
 rename libgomp/testsuite/{libgomp.fortran => 
libgomp.oacc-fortran}/pr109622-3.f90 (76%)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pr109622-4.f90
 rename libgomp/testsuite/{libgomp.fortran => 
libgomp.oacc-fortran}/pr109622.f90 (78%)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 6ee22faa836a..b9a4ae3e53a8 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -3395,6 +3395,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
  && (n->u.map_op == OMP_MAP_ATTACH
  || n->u.map_op == OMP_MAP_DETACH))
{
+ OMP_CLAUSE_DECL (node)
+   = build_fold_addr_expr (OMP_CLAUSE_DECL (node));
  OMP_CLAUSE_SIZE (node) = size_zero_node;
  goto finalize_map_clause;
}
@@ -3430,6 +3432,13 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
= TYPE_SIZE_UNIT (gfc_charlen_type_node);
}
}
+ else if (openacc
+  && (n->u.map_op == OMP_MAP_ATTACH
+  || n->u.map_op == OMP_MAP_DETACH))
+   gfc_error ("%qs clause argument not pointer or "
+  "allocatable at %L",
+  (n->u.map_op == OMP_MAP_ATTACH)
+  ? "attach" : "detach", &where);
}
  else if (n->expr
   && n->expr->expr_type == EXPR_VARIABLE
@@ -3510,6 +3519,13 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
}
  else
{
+ if (openacc
+   && (n->u.map_op == OMP_MAP_ATTACH
+   || n->u.map_op == OMP_MAP_DETACH))
+   gfc_error ("%qs clause argument not pointer or "
+  "allocatable at %L",
+  (n->u.map_op == OMP_MAP_ATTACH)
+  ? "attach" : "detach", &where);
  OMP_CLAUSE_DECL (node) = inner;
  OMP_CLAUSE_SIZE (node)
= TYPE_SIZE_UNIT (TREE_TYPE (inner));
@@ -3523,15 +3539,25 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
  if (n->u.map_op == OMP_MAP_ATTACH
  || n->u.map_op == OMP_MAP_DETACH)
{
- if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (inner)))
+ if (POINTER_TYPE_P (TREE_TYPE (inner))
+ || GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (inner)))

[PATCH V2] RISC-V: decouple stack allocation for rv32e w/o save-restore.

2023-04-29 Thread Fei Gao

Currently in rv32e, stack allocation for GPR callee-saved registers is
always 12 bytes w/o save-restore. Actually, for the case without save-restore,
less stack memory can be reserved. This patch decouples stack allocation for
rv32e w/o save-restore and makes riscv_compute_frame_info more readable.

output of testcase rv32e_stack.c
before patch:
addi  sp,sp,-16
sw    ra,12(sp)
call  getInt
sw    a0,0(sp)
lw    a0,0(sp)
call  PrintInts
lw    a5,0(sp)
mv    a0,a5
lw    ra,12(sp)
addi  sp,sp,16
jr    ra

after patch:
addi  sp,sp,-8
sw    ra,4(sp)
call  getInt
sw    a0,0(sp)
lw    a0,0(sp)
call  PrintInts
lw    a5,0(sp)
mv    a0,a5
lw    ra,4(sp)
addi  sp,sp,8
jr    ra
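
The frame sizes in the two listings can be reproduced with a small
arithmetic sketch.  It is a simplified model, not the code in
riscv_compute_frame_info: it assumes 4-byte stack alignment for rv32e,
one saved GPR (ra) and one 4-byte local, as the listings suggest, and all
names in it are hypothetical.

```cpp
// Simplified model of the rv32e frame size without save-restore.
// Assumptions (not taken from riscv.cc): 4-byte stack alignment, 4-byte
// GPRs, and at most three callee-saved GPRs.

// Round SIZE up to a multiple of ALIGN (ALIGN must be a power of two).
constexpr unsigned stack_align(unsigned size, unsigned align = 4)
{ return (size + align - 1) & ~(align - 1); }

constexpr unsigned word_size = 4;            // rv32 GPR width in bytes
constexpr unsigned libcall_save_size = 12;   // the fixed 12 bytes from the text

// Before the patch the GPR save area is always 12 bytes; after it, only
// the registers that are actually saved are counted.
constexpr unsigned gpr_save_area(unsigned num_saved, bool patched)
{ return patched ? stack_align(num_saved * word_size) : libcall_save_size; }

// Total frame: locals plus the GPR save area.
constexpr unsigned frame_size(unsigned num_saved, unsigned local_bytes,
                              bool patched)
{ return stack_align(local_bytes) + gpr_save_area(num_saved, patched); }
```

Under these assumptions frame_size (1, 4, false) gives the 16 bytes of the
first listing and frame_size (1, 4, true) the 8 bytes of the second.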


gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_avoid_save_libcall): helper function for 
riscv_use_save_libcall.
(riscv_use_save_libcall): call riscv_avoid_save_libcall.
(riscv_compute_frame_info): restructure to decouple stack allocation 
for rv32e w/o save-restore.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_stack.c: New test.
---
 gcc/config/riscv/riscv.cc| 58 
 gcc/testsuite/gcc.target/riscv/rv32e_stack.c | 14 +
 2 files changed, 50 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_stack.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5d2550871c7..8b32977e296 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4772,12 +4772,27 @@ riscv_save_reg_p (unsigned int regno)
   return false;
 }
 
+/* Return TRUE if a libcall to save/restore GPRs should be
+   avoided.  FALSE otherwise.  */
+static bool
+riscv_avoid_save_libcall (void)
+{
+  if (!TARGET_SAVE_RESTORE
+  || crtl->calls_eh_return
+  || frame_pointer_needed
+  || cfun->machine->interrupt_handler_p
+  || cfun->machine->varargs_size != 0
+  || crtl->args.pretend_args_size != 0)
+return true;
+
+  return false;
+}
+
 /* Determine whether to call GPR save/restore routines.  */
 static bool
 riscv_use_save_libcall (const struct riscv_frame_info *frame)
 {
-  if (!TARGET_SAVE_RESTORE || crtl->calls_eh_return || frame_pointer_needed
-  || cfun->machine->interrupt_handler_p)
+  if (riscv_avoid_save_libcall ())
 return false;
 
   return frame->save_libcall_adjustment != 0;
@@ -4857,7 +4872,7 @@ riscv_compute_frame_info (void)
   struct riscv_frame_info *frame;
   poly_int64 offset;
   bool interrupt_save_prologue_temp = false;
-  unsigned int regno, i, num_x_saved = 0, num_f_saved = 0;
+  unsigned int regno, i, num_x_saved = 0, num_f_saved = 0, x_save_size = 0;
 
   frame = &cfun->machine->frame;
 
@@ -4895,24 +4910,14 @@ riscv_compute_frame_info (void)
frame->fmask |= 1 << (regno - FP_REG_FIRST), num_f_saved++;
 }
 
-  /* At the bottom of the frame are any outgoing stack arguments. */
-  offset = riscv_stack_align (crtl->outgoing_args_size);
-  /* Next are local stack variables. */
-  offset += riscv_stack_align (get_frame_size ());
-  /* The virtual frame pointer points above the local variables. */
-  frame->frame_pointer_offset = offset;
-  /* Next are the callee-saved FPRs. */
-  if (frame->fmask)
-offset += riscv_stack_align (num_f_saved * UNITS_PER_FP_REG);
-  frame->fp_sp_offset = offset - UNITS_PER_FP_REG;
-  /* Next are the callee-saved GPRs. */
   if (frame->mask)
 {
-  unsigned x_save_size = riscv_stack_align (num_x_saved * UNITS_PER_WORD);
+  x_save_size = riscv_stack_align (num_x_saved * UNITS_PER_WORD);
   unsigned num_save_restore = 1 + riscv_save_libcall_count (frame->mask);
 
   /* Only use save/restore routines if they don't alter the stack size.  */
-  if (riscv_stack_align (num_save_restore * UNITS_PER_WORD) == x_save_size)
+  if (riscv_stack_align (num_save_restore * UNITS_PER_WORD) == x_save_size
+  && !riscv_avoid_save_libcall ())
{
  /* Libcall saves/restores 3 registers at once, so we need to
 allocate 12 bytes for callee-saved register.  */
@@ -4921,9 +4926,21 @@ riscv_compute_frame_info (void)
 
  frame->save_libcall_adjustment = x_save_size;
}
-
-  offset += x_save_size;
 }
+
+  /* At the bottom of the frame are any outgoing stack arguments. */
+  offset = riscv_stack_align (crtl->outgoing_args_size);
+  /* Next are local stack variables. */
+  offset += riscv_stack_align (get_frame_size ());
+  /* The virtual frame pointer points above the local variables. */
+  frame->frame_pointer_offset = offset;
+  /* Next are the callee-saved FPRs. */
+  if (frame->fmask)
+offset += riscv_stack_align (num_f_saved * UNITS_PER_FP_REG);
+  frame->fp_sp_offset = offset - UNITS_PER_FP_REG;
+  /* Next are the callee-saved GPRs. */
+  if (frame->mask)
+

[PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Pan Li via Gcc-patches

From: Pan Li 

When some RVV integer compare operators act on the same vector registers
without a mask, they can be simplified to VMSET.

This patch allows eq, le, leu, ge and geu to perform this kind of
simplification by adding vector bool support to relational_result in
simplify-rtx.

Given we have:
vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl)
{
  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
}

Before this patch:
vsetvli  zero,a2,e8,m8,ta,ma
vl8re8.v v8,0(a1)
vmseq.vv v8,v8,v8
vsetvli  a5,zero,e8,m8,ta,ma
vsm.v    v8,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,m8,ta,ma
vmset.m v1  <- optimized to vmset.m
vsetvli a5,zero,e8,m8,ta,ma
vsm.v   v1,0(a0)
ret

As above, we may have one instruction eliminated and require fewer vector
registers.
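
The reason the simplification is sound for the active elements can be
shown with a toy model.  This is a scalar sketch, not RVV intrinsics: the
64-bit mask, the Cmp enum and compare_mask are hypothetical, and the loop
in a constexpr function needs C++14.

```cpp
#include <cstddef>
#include <cstdint>

// Toy model of an element-wise integer compare that produces one mask
// bit per element.  Nothing here is an RVV API.
enum class Cmp { eq, le, ge };

constexpr bool holds(Cmp c, int x, int y)
{
  return c == Cmp::eq ? x == y : c == Cmp::le ? x <= y : x >= y;
}

// Bit i of the result is set iff the comparison holds for element i.
// Only the first VL elements are compared.
template <std::size_t N>
constexpr std::uint64_t
compare_mask(Cmp c, const std::int8_t (&a)[N], const std::int8_t (&b)[N],
             std::size_t vl)
{
  static_assert(N <= 64, "the toy mask holds at most 64 bits");
  std::uint64_t mask = 0;
  for (std::size_t i = 0; i < vl && i < N; ++i)
    if (holds(c, a[i], b[i]))
      mask |= std::uint64_t{1} << i;
  return mask;
}

// Comparing a vector with itself: every active element satisfies eq, le
// and ge, so all VL mask bits come out set.
constexpr std::int8_t v1[4] = {1, -2, 3, 127};
constexpr std::uint64_t eq_self = compare_mask(Cmp::eq, v1, v1, 4);
constexpr std::uint64_t le_self = compare_mask(Cmp::le, v1, v1, 4);
constexpr std::uint64_t ge_self = compare_mask(Cmp::ge, v1, v1, 4);
```

For integer elements x == x, x <= x and x >= x always hold, which is why
only these reflexive operators qualify.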

gcc/ChangeLog:

* machmode.h (VECTOR_BOOL_MODE_P): Add new predication macro.
* simplify-rtx.cc (relational_result): Add vector bool support.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
  Adjust test check condition.

Signed-off-by: Pan Li 
---
 gcc/machmode.h  | 4 
 gcc/simplify-rtx.cc | 4 
 .../riscv/rvv/base/integer_compare_insn_shortcut.c  | 6 +-
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/machmode.h b/gcc/machmode.h
index f1865c1ef42..5fbece0042f 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -134,6 +134,10 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
|| GET_MODE_CLASS (MODE) == MODE_VECTOR_ACCUM   \
|| GET_MODE_CLASS (MODE) == MODE_VECTOR_UACCUM)
 
+/* Nonzero if MODE is a vector bool mode.  */
+#define VECTOR_BOOL_MODE_P(MODE)   \
+  (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL)
+
 /* Nonzero if MODE is a scalar integral mode.  */
 #define SCALAR_INT_MODE_P(MODE)\
   (GET_MODE_CLASS (MODE) == MODE_INT   \
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index d4aeebc7a5f..12aba4c4b05 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -2535,6 +2535,10 @@ relational_result (machine_mode mode, machine_mode 
cmp_mode, rtx res)
 {
   if (res == const0_rtx)
return CONST0_RTX (mode);
+
+  if (VECTOR_BOOL_MODE_P (mode) && res == const1_rtx)
+   return CONSTM1_RTX (mode);
+
 #ifdef VECTOR_STORE_FLAG_VALUE
   rtx val = VECTOR_STORE_FLAG_VALUE (mode);
   if (val == NULL_RTX)
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
index 8954adad09d..1bca8467a16 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
@@ -283,9 +283,5 @@ vbool64_t test_shortcut_for_riscv_vmsgeu_case_6(vuint8mf8_t 
v1, size_t vl) {
   return __riscv_vmsgeu_vv_u8mf8_b64(v1, v1, vl);
 }
 
-/* { dg-final { scan-assembler-times {vmseq\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 
} } */
-/* { dg-final { scan-assembler-times {vmsle\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 
} } */
-/* { dg-final { scan-assembler-times {vmsleu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 
7 } } */
-/* { dg-final { scan-assembler-times {vmsge\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 
} } */
-/* { dg-final { scan-assembler-times {vmsgeu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 
7 } } */
 /* { dg-final { scan-assembler-times {vmclr\.m\sv[0-9]} 35 } } */
+/* { dg-final { scan-assembler-times {vmset\.m\sv[0-9]} 35 } } */
-- 
2.34.1

RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Li, Pan2 via Gcc-patches

Hi Jeff

I gave this optimization a try in simplify_rtx in PATCH v2. Could you
please share your thoughts on it when you are free? Thank you!

https://gcc.gnu.org/pipermail/gcc-patches/2023-April/617117.html

Pan

-----Original Message-----
From: Li, Pan2 
Sent: Saturday, April 29, 2023 10:55 AM
To: Jeff Law ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, Yanzhang 

Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

Thanks Jeff for comments.

It makes sense to me. For the EQ operator we should have CONSTM1. Does this
mean the s390 port has a similar issue here? Then for instructions like VMSEQ,
we need to adjust simplify_rtx up to a point.

Please correct me if I have made any mistake. Thank you again.

Pan

-----Original Message-----
From: Jeff Law  
Sent: Saturday, April 29, 2023 5:48 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, Yanzhang 

Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET



On 4/28/23 09:21, Pan Li via Gcc-patches wrote:
> From: Pan Li 
> 
> When some RVV integer compare operators act on the same vector 
> registers without mask. They can be simplified to VMSET.
> 
> This PATCH allows the eq, le, leu, ge, geu to perform such kind of the 
> simplification by adding one macro in riscv for simplify rtx.
> 
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) 
> {
>return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl); }
> 
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v8,0(a1)
> vmseq.vv v8,v8,v8
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.v    v8,0(a0)
> ret
> 
> After this patch:
> vsetvli zero,a2,e8,m8,ta,ma
> vmset.m v1  <- optimized to vmset.m
> vsetvli a5,zero,e8,m8,ta,ma
> vsm.v   v1,0(a0)
> ret
> 
> As above, we may have one instruction eliminated and require less 
> vector registers.
> 
> Signed-off-by: Pan Li 
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
> consumed by simplify_rtx.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
> Adjust test check condition.
I'm not sure this is 100% correct.

What happens to the high bits in the resultant mask register?  My understanding 
is we have one output bit per input element in the comparison.  So unless the 
number of elements matches the bit width of the mask register, this isn't going 
to work.

Am I missing something?
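
The concern can be made concrete with a toy model.  It does not claim
anything about what RVV hardware does with the bits beyond the active
elements, which is exactly the open question; the 64-bit register width
and all names are hypothetical.

```cpp
#include <cstdint>

// Toy model: a 64-bit mask register where a compare writes one bit per
// active element.  What happens to the remaining bits is NOT modelled
// after any real hardware; here they simply keep their old value.

// A value with the low N bits set.
constexpr std::uint64_t low_bits(unsigned n)
{
  return n >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << n) - 1;
}

// Result of comparing a vector with itself when only VL bits are written
// and the rest of the register keeps OLD.
constexpr std::uint64_t compare_self_result(std::uint64_t old, unsigned vl)
{
  return (old & ~low_bits(vl)) | low_bits(vl);
}

// The all-ones constant that a compile-time simplification would produce.
constexpr std::uint64_t all_ones_constant()
{
  return ~std::uint64_t{0};
}
```

In this model the two results agree only when the number of elements
equals the register width, or when the untouched bits happened to be set
already.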

Jeff

Re: [PATCH] Turn on LRA on all targets

2023-04-29 Thread Roger Sayle

Segher Boessenkool wrote:
> I send this patch now so that people can start testing.
>
> diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
> index 89349dae9e62..e32f17377525 100644
> --- a/gcc/config/nvptx/nvptx.cc
> +++ b/gcc/config/nvptx/nvptx.cc
> @@ -7601,9 +7601,6 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree
name, tree value)
> #undef TARGET_ATTRIBUTE_TABLE
> #define TARGET_ATTRIBUTE_TABLE nvptx_attribute_table
>
>-#undef TARGET_LRA_P
>-#define TARGET_LRA_P hook_bool_void_false
>-
> #undef TARGET_LEGITIMATE_ADDRESS_P
> #define TARGET_LEGITIMATE_ADDRESS_P nvptx_legitimate_address_p

I've tested Segher's patch on nvptx-none with make and make -k check and
can confirm there are no new regressions.  Nvptx is unique in that it
doesn't use register allocation, i.e. GCC's only
TARGET_NO_REGISTER_ALLOCATION target, so it's a little odd that it
specifies which register allocator it doesn't use.

I hope this helps,
Roger
--

Re: [PATCH] Turn on LRA on all targets

2023-04-29 Thread Segher Boessenkool

Hi!

On Mon, Apr 24, 2023 at 11:46:50AM +0200, Uros Bizjak wrote:
> On Mon, Apr 24, 2023 at 11:19 AM Segher Boessenkool
>  wrote:
> > We still need someone to test this on alpha now, years later, and give
> > a final okay, but hearing this is encouraging :-)
> 
> Please note that bootstrap worked on alpha*EV6*, not plain alpha.
> 
> Plain alpha is !BWX architecture and uses {un,}aligned_memory_operand
> predicates that call resolve_reload_operand function. Unfortunately,
> this function peeks deep into reload internals (reg_equiv_memory_loc)
> that has no equivalent in LRA. As said in the comment, this internal
> function resolves what reload is going to do with OP if it is a
> register.

Bootstrap works with everything I tried, but building Linux fails with a
few things like
/home/segher/src/kernel/drivers/tty/serial/serial_core.c:1029:1: internal 
compiler error: maximum number of generated reload insns per insn achieved (90)
(it uses -mcpu=ev5 there; to reproduce just (try to) build a defconfig).

Segher

Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Jeff Law via Gcc-patches





On 4/28/23 20:55, Li, Pan2 wrote:
> Thanks Jeff for comments.
>
> It makes sense to me. For the EQ operator we should have CONSTM1.

That's not the way I interpret the RVV documentation.  Of course it's
not terribly clear.  I guess one could do some experiments with qemu
or try to dig into the sail code and figure out the intent from those.

> Does this mean the s390 port has a similar issue here? Then for
> instructions like VMSEQ, we need to adjust simplify_rtx up to a point.

You'd have to refer to the s390 instruction set reference to understand
precisely how the vector compares work.

But as it stands this really isn't a simplify-rtx question, but a
question of the semantics of risc-v.  What happens with the high bits
in the destination mask register is critical -- and if risc-v doesn't
set them to all ones in this case, then that would mean that defining
that macro is simply wrong for risc-v.


jeff

Re: [PATCH] Turn on LRA on all targets

2023-04-29 Thread Jeff Law via Gcc-patches





On 4/29/23 07:37, Roger Sayle wrote:
> Segher Boessenkool wrote:
>> I send this patch now so that people can start testing.
>>
>> diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
>> index 89349dae9e62..e32f17377525 100644
>> --- a/gcc/config/nvptx/nvptx.cc
>> +++ b/gcc/config/nvptx/nvptx.cc
>> @@ -7601,9 +7601,6 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value)
>> #undef TARGET_ATTRIBUTE_TABLE
>> #define TARGET_ATTRIBUTE_TABLE nvptx_attribute_table
>>
>> -#undef TARGET_LRA_P
>> -#define TARGET_LRA_P hook_bool_void_false
>> -
>> #undef TARGET_LEGITIMATE_ADDRESS_P
>> #define TARGET_LEGITIMATE_ADDRESS_P nvptx_legitimate_address_p
>
> I've tested Segher's patch on nvptx-none with make and make -k check and
> can confirm there are no new regressions.  Nvptx is unique in that it
> doesn't use register allocation, i.e. GCC's only
> TARGET_NO_REGISTER_ALLOCATION target, so it's a little odd that it
> specifies which register allocator it doesn't use.
>
> I hope this helps,

It does.  Consider a patch which flips the nvptx port to LRA as
pre-approved.

I tried the FRV just for fun.  It faulted all over the place :(

jeff

Re: [PATCH V2] RISC-V: decouple stack allocation for rv32e w/o save-restore.

2023-04-29 Thread Jeff Law via Gcc-patches





On 4/29/23 04:59, Fei Gao wrote:
> Currently in rv32e, stack allocation for GPR callee-saved registers is
> always 12 bytes w/o save-restore. Actually, for the case without save-restore,
> less stack memory can be reserved. This patch decouples stack allocation for
> rv32e w/o save-restore and makes riscv_compute_frame_info more readable.
>
> output of testcase rv32e_stack.c
> before patch:
> addi  sp,sp,-16
> sw    ra,12(sp)
> call  getInt
> sw    a0,0(sp)
> lw    a0,0(sp)
> call  PrintInts
> lw    a5,0(sp)
> mv    a0,a5
> lw    ra,12(sp)
> addi  sp,sp,16
> jr    ra
>
> after patch:
> addi  sp,sp,-8
> sw    ra,4(sp)
> call  getInt
> sw    a0,0(sp)
> lw    a0,0(sp)
> call  PrintInts
> lw    a5,0(sp)
> mv    a0,a5
> lw    ra,4(sp)
> addi  sp,sp,8
> jr    ra
>
> gcc/ChangeLog:
>
>  * config/riscv/riscv.cc (riscv_avoid_save_libcall): helper function
> for riscv_use_save_libcall.
>  (riscv_use_save_libcall): call riscv_avoid_save_libcall.
>  (riscv_compute_frame_info): restructure to decouple stack allocation
> for rv32e w/o save-restore.
>
> gcc/testsuite/ChangeLog:
>
>  * gcc.target/riscv/rv32e_stack.c: New test.

Thanks.  I rewrapped the ChangeLog and pushed this to the trunk.

jeff

[committed] [PR target/109549] Adjust mips test for recent ifcvt costing changes

2023-04-29 Thread Jeff Law



MIPS ports have been failing a few tests since the change to add cost 
checks in another path through the if-converter pass.


As with the other ports, these look like cases where we don't do good 
costing in the MIPS port.  Someone who cares about MIPS will need to fix 
this properly.


In the meantime this patch adjusts the branch cost when running the two 
affected tests and skips them at -Os.  This is enough to verify that if 
conversion can still happen if the costs are adjusted.


Committed to the trunk.

Jeff

commit ef6c3095aabe75af727a269d91d9ffa37f982ace
Author: Jeff Law 
Date:   Sat Apr 29 10:16:21 2023 -0600

Adjust mips test for recent ifcvt costing changes

MIPS ports have been failing a few tests since the change to add cost
checks in another path through the if-converter pass.

As with the other ports, these look like cases where we don't do good
costing in the MIPS port.  Someone who cares about MIPS will need to
fix this properly.

In the meantime this patch adjusts the branch cost when running the
two affected tests and skips them at -Os.  This is enough to verify
that if conversion can still happen if the costs are adjusted.

gcc/testsuite
* gcc.target/mips/mips-ps-type-2.c: Adjust branch cost to
encourage if-conversion.  Skip for -Os.
* gcc.target/mips/movcc-3.c: Similarly.

diff --git a/gcc/testsuite/gcc.target/mips/mips-ps-type-2.c 
b/gcc/testsuite/gcc.target/mips/mips-ps-type-2.c
index ed5d6ee1663..e5cb7d48dae 100644
--- a/gcc/testsuite/gcc.target/mips/mips-ps-type-2.c
+++ b/gcc/testsuite/gcc.target/mips/mips-ps-type-2.c
@@ -1,8 +1,8 @@
 /* Test v2sf calculations.  The nmadd and nmsub patterns need
-ffinite-math-only.  */
 /* { dg-do compile } */
-/* { dg-options "(HAS_MADDPS) -mmadd4 -mgp32 -mpaired-single 
-ffinite-math-only forbid_cpu=octeon.*" } */
-/* { dg-skip-if "nmadd and nmsub need combine" { *-*-* } { "-O0" } { "" } } */
+/* { dg-options "(HAS_MADDPS) -mmadd4 -mgp32 -mpaired-single 
-ffinite-math-only forbid_cpu=octeon.* -mbranch-cost=2" } */
+/* { dg-skip-if "nmadd and nmsub need combine" { *-*-* } { "-O0" "-Os" } { "" 
} } */
 /* { dg-final { scan-assembler "\tcvt.ps.s\t" } } */
 /* { dg-final { scan-assembler "\tmov.ps\t" } } */
 /* { dg-final { scan-assembler "\tldc1\t" } } */
diff --git a/gcc/testsuite/gcc.target/mips/movcc-3.c 
b/gcc/testsuite/gcc.target/mips/movcc-3.c
index 55434b72c72..80d44098a3f 100644
--- a/gcc/testsuite/gcc.target/mips/movcc-3.c
+++ b/gcc/testsuite/gcc.target/mips/movcc-3.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "(HAS_MOVN) -mhard-float" } */
-/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+/* { dg-options "(HAS_MOVN) -mhard-float -mbranch-cost=2" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" "-Os" } { "" } } */
 /* { dg-final { scan-assembler "\tmovt\t" } } */
 /* { dg-final { scan-assembler "\tmovf\t" } } */
 /* { dg-final { scan-assembler "\tmovz.s\t" } } */

[xstormy16 PATCH] Recognize/support swpn (swap nibbles) instruction.

2023-04-29 Thread Roger Sayle


This patch adds support for xstormy16's swap nibbles instruction (swpn).
For the test case:

short foo(short x) {
  return (x&0xff00) | ((x<<4)&0xf0) | ((x>>4)&0x0f);
}

GCC with -O2 currently generates the nine instruction sequence:
foo:    mov r7,r2
        asr r2,#4
        and r2,#15
        mov.w r6,#-256
        and r6,r7
        or r2,r6
        shl r7,#4
        and r7,#255
        or r2,r7
        ret

with this patch, we now generate:
foo:    swpn r2
        ret

To achieve this using combine's four instruction "combinations" requires
a little wizardry.  Firstly, define_insn_and_split patterns are introduced to
treat logical shifts followed by bitwise-AND as macro instructions that
are split after reload.  This is sufficient to recognize a QImode
nibble swap, which can be implemented by swpn followed by either a
zero-extension or a sign-extension from QImode to HImode.  Then finally,
in the correct context, a QImode swap-nibbles pattern can be combined to
preserve the high-byte of a HImode word, matching the xstormy16's swpn
semantics.
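
As a rough illustration of the semantics described above, the following C
sketch models swpn as "swap the two nibbles of the low byte, keep the high
byte" and compares it with the expression from the test case.  The function
names are hypothetical, the model of swpn is an assumption drawn from the
description in this mail rather than from the xstormy16 manual, and unsigned
types are used so the check runs on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of swpn: swap the nibbles of the low byte and leave the
   high byte untouched.  Hypothetical helper, not part of GCC.  */
static uint16_t swpn_model(uint16_t x)
{
    return (uint16_t)((x & 0xff00u) | ((x & 0x000fu) << 4) | ((x & 0x00f0u) >> 4));
}

/* The expression from the test case, written on an unsigned 16-bit type.  */
static uint16_t foo_expr(uint16_t x)
{
    return (uint16_t)((x & 0xff00u) | ((x << 4) & 0xf0u) | ((x >> 4) & 0x0fu));
}

/* Exhaustively compare the two over all 65536 inputs; returns 1 if they
   agree everywhere, 0 otherwise.  */
static int swpn_matches_everywhere(void)
{
    for (uint32_t i = 0; i <= 0xffffu; i++)
        if (swpn_model((uint16_t)i) != foo_expr((uint16_t)i))
            return 0;
    return 1;
}

/* Small self-check tying the pieces together.  */
static void check_swpn(void)
{
    assert(swpn_model(0x1234) == 0x1243);
    assert(swpn_matches_everywhere());
}
```

Calling check_swpn() from a test driver exercises both the single example
and the exhaustive comparison.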

The naming of the new code iterators is taken from i386.md.
The any_rotate code iterator is used in my next (split out) patch.

This patch has been tested by building a cross-compiler to xstormy16-elf
from x86_64-pc-linux-gnu and confirming the new test cases pass.
Ok for mainline?


2023-04-29  Roger Sayle  

gcc/ChangeLog
* config/stormy16/stormy16.md (any_lshift): New code iterator.
(any_or_plus): Likewise.
(any_rotate): Likewise.
(*_and_internal): New define_insn_and_split to
recognize a logical shift followed by an AND, and split it
again after reload.
(*swpn): New define_insn matching xstormy16's swpn.
(*swpn_zext): New define_insn recognizing swpn followed by
zero_extendqihi2, i.e. with the high byte set to zero.
(*swpn_sext): Likewise, for swpn followed by cbw.
(*swpn_sext_2): Likewise, for an alternate RTL form.
(*swpn_zext_ior): A pre-reload splitter so that an swpn+zext+ior
sequence is split in the correct place to recognize the *swpn_zext
followed by any_or_plus (ior, xor or plus) instruction.

gcc/testsuite/ChangeLog
* gcc.target/xstormy16/swpn-1.c: New QImode test case.
* gcc.target/xstormy16/swpn-2.c: New zero_extend test case.
* gcc.target/xstormy16/swpn-3.c: New sign_extend test case.
* gcc.target/xstormy16/swpn-4.c: New HImode test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/stormy16/stormy16.md b/gcc/config/stormy16/stormy16.md
index b2e86ee..be1ee04 100644
--- a/gcc/config/stormy16/stormy16.md
+++ b/gcc/config/stormy16/stormy16.md
@@ -48,6 +48,10 @@
 (CARRY_REG 16)
   ]
 )
+
+(define_code_iterator any_lshift [ashift lshiftrt])
+(define_code_iterator any_or_plus [plus ior xor])
+(define_code_iterator any_rotate [rotate rotatert])
 
 ;; 
 ;; ::
@@ -1301,3 +1323,86 @@
   [(parallel [(set (match_dup 2) (match_dup 1))
   (set (match_dup 1) (match_dup 2))])])
 
+;; Recognize shl+and and shr+and as macro instructions.
+(define_insn_and_split "*_and_internal"
+  [(set (match_operand:HI 0 "register_operand" "=r")
+(and:HI (any_lshift:HI (match_operand 1 "register_operand" "0")
+  (match_operand 2 "const_int_operand" "i"))
+   (match_operand 3 "const_int_operand" "i")))
+   (clobber (reg:BI CARRY_REG))]
+  "IN_RANGE (INTVAL (operands[2]), 0, 15)"
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (any_lshift:HI (match_dup 1) (match_dup 2)))
+ (clobber (reg:BI CARRY_REG))])
+   (set (match_dup 0) (and:HI (match_dup 0) (match_dup 3)))])
+
+;; Swap nibbles instruction
+(define_insn "*swpn"
+  [(set (match_operand:HI 0 "register_operand" "=r")
+   (any_or_plus:HI
+ (any_or_plus:HI
+   (and:HI (ashift:HI (match_operand:HI 1 "register_operand" "0")
+  (const_int 4))
+   (const_int 240))
+   (and:HI (lshiftrt:HI (match_dup 1) (const_int 4))
+   (const_int 15)))
+ (and:HI (match_dup 1) (const_int -256))))]
+  ""
+  "swpn %0")
+
+(define_insn "*swpn_zext"
+  [(set (match_operand:HI 0 "register_operand" "=r")
+   (any_or_plus:HI
+ (and:HI (ashift:HI (match_operand:HI 1 "register_operand" "0")
+(const_int 4))
+ (const_int 240))
+ (and:HI (lshiftrt:HI (match_dup 1) (const_int 4))
+ (const_int 15))))]
+  ""
+  "swpn %0 | and %0,#255"
+  [(set_attr "length" "6")])
+
+(define_insn "*swpn_sext"
+  [(set (match_operand:HI 0 "register_operand" "=r")
+   (sign_extend:HI
+ (rotate:QI (subreg:QI (match_operand:HI 1 "register_operand" "0") 0)
+(const_int 4))))]
+  ""
+  "swpn %0 | cbw %0"
+  [(set_attr "length" "4")])
+
+(define_insn "*swpn_sext_2"
+  [(set (match_operand:HI 0 "register_operand" "=r")
+   (sign

[xstormy16 PATCH] Efficient HImode rotate left by a single bit.

2023-04-29 Thread Roger Sayle

This patch contains some minor tweaks to xstormy16's machine description,
most significantly providing a pattern for HImode rotate left by a single
bit that requires only two instructions.

unsigned short foo(unsigned short x)
{
  return (x << 1) | (x >> 15);
}

currently with -O2 generates:
foo:    mov r7,r2
        shr r7,#15
        shl r2,#1
        or r2,r7
        ret

with this patch, GCC now generates:
foo:    shl r2,#1 | adc r2,#0
        ret

Additionally neghi2 is converted to a define_insn (so that the RTL
optimizers see the negation semantics), and HImode rotations by
8-bits can now be recognized and implemented using swpb.
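
For readers unfamiliar with the trick, here is a small C sketch of why a
shift followed by an add-with-carry of zero behaves as a rotate left by one.
It assumes that shl leaves the bit shifted out of bit 15 in the carry flag
and that adc adds that flag; the helper names are hypothetical and the code
only models the arithmetic, not the actual instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "shl r,#1 | adc r,#0" under the assumption stated above.  */
static uint16_t rotl1_shl_adc(uint16_t x)
{
    unsigned carry = (x >> 15) & 1u;        /* bit shifted out by shl */
    uint16_t shifted = (uint16_t)(x << 1);  /* shl r,#1: bit 0 is now 0 */
    return (uint16_t)(shifted + carry);     /* adc r,#0: carry lands in bit 0 */
}

/* Reference rotate, as written in the test case.  */
static uint16_t rotl1_ref(uint16_t x)
{
    return (uint16_t)((x << 1) | (x >> 15));
}

/* Returns 1 if both forms agree for every 16-bit input.  */
static int rotl1_agrees_everywhere(void)
{
    for (uint32_t i = 0; i <= 0xffffu; i++)
        if (rotl1_shl_adc((uint16_t)i) != rotl1_ref((uint16_t)i))
            return 0;
    return 1;
}

/* Small self-check.  */
static void check_rotl1(void)
{
    assert(rotl1_shl_adc(0x8001) == 0x0003);
    assert(rotl1_agrees_everywhere());
}
```

Because the shift always clears bit 0, adding the carry can never overflow
into bit 1, which is why the addition is equivalent to the OR in the source.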

This patch has been tested by building a cross-compiler to xstormy16-elf
from x86_64-pc-linux-gnu and confirming the new test cases pass.
Ok for mainline?


2023-04-29  Roger Sayle  

gcc/ChangeLog
* config/stormy16/stormy16.md (neghi2): Convert from a define_expand
to a define_insn.
(*rotatehi_1): New define_insn for efficient 2 insn sequence.
(*rotatehi_8, *rotaterthi_8): New define_insn to emit a swpb.

gcc/testsuite/ChangeLog
* gcc.target/xstormy16/neghi2.c: New test case.
* gcc.target/xstormy16/rotatehi-1.c: Likewise.


Thanks in advance,
Roger
--

diff --git a/gcc/config/stormy16/stormy16.md b/gcc/config/stormy16/stormy16.md
index b2e86ee..be1ee04 100644
--- a/gcc/config/stormy16/stormy16.md
+++ b/gcc/config/stormy16/stormy16.md
@@ -514,13 +518,13 @@
 
 ;; Negation
 
-(define_expand "neghi2"
-  [(set (match_operand:HI 0 "register_operand" "")
-   (not:HI (match_operand:HI 1 "register_operand" "")))
-   (parallel [(set (match_dup 0) (plus:HI (match_dup 0) (const_int 1)))
+(define_insn "neghi2"
+  [(parallel [(set (match_operand:HI 0 "register_operand" "=r")
+  (neg:HI (match_operand:HI 1 "register_operand" "0")))
  (clobber (reg:BI CARRY_REG))])]
   ""
-  "")
+  "not %0 | add %0,#1"
+  [(set_attr "length" "4")])
 
 ;; 
 ;; ::
@@ -554,6 +558,24 @@
(clobber (reg:BI CARRY_REG))]
   ""
   "shr %0,%2")
+
+;; HImode rotate left by 1 bit
+(define_insn "*rotatehi_1"
+  [(set (match_operand:HI 0 "register_operand" "=r")
+   (rotate:HI (match_operand:HI 1 "register_operand" "0")
+  (const_int 1)))
+   (clobber (reg:BI CARRY_REG))]
+  ""
+  "shl %0,#1 | adc %0,#0"
+  [(set_attr "length" "4")])
+
+;; HImode rotate left by 8 bits
+(define_insn "*<code>hi_8"
+  [(set (match_operand:HI 0 "register_operand" "=r")
+   (any_rotate:HI (match_operand:HI 1 "register_operand" "0")
+  (const_int 8)))]
+  ""
+  "swpb %0")
 
 ;; 
 ;; ::
diff --git a/gcc/testsuite/gcc.target/xstormy16/neghi2.c 
b/gcc/testsuite/gcc.target/xstormy16/neghi2.c
new file mode 100644
index 000..dd3dd1e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/xstormy16/neghi2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+short neg(short x)
+{
+  return -x;
+}
+/* { dg-final { scan-assembler "not r2 | add r2,#1" } } */
diff --git a/gcc/testsuite/gcc.target/xstormy16/rotatehi-1.c 
b/gcc/testsuite/gcc.target/xstormy16/rotatehi-1.c
new file mode 100644
index 000..586e7dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/xstormy16/rotatehi-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned short foo(unsigned short x)
+{
+  return (x << 1) | (x >> 15);
+}
+
+/* { dg-final { scan-assembler "shl r2,#1" } } */
+/* { dg-final { scan-assembler "adc r2,#0" } } */

Re: [xstormy16 PATCH] Efficient HImode rotate left by a single bit.

2023-04-29 Thread Jeff Law via Gcc-patches





On 4/29/23 10:25, Roger Sayle wrote:

This patch contains some minor tweaks to xstormy16's machine description,
most significantly providing a pattern for HImode rotate left by a single
bit that requires only two instructions.

unsigned short foo(unsigned short x)
{
   return (x << 1) | (x >> 15);
}

currently with -O2 generates:
foo:mov r7,r2
 shr r7,#15
 shl r2,#1
 or r2,r7
 ret

with this patch, GCC now generates:
foo:shl r2,#1 | adc r2,#0
 ret

Additionally neghi2 is converted to a define_insn (so that the RTL
optimizers see the negation semantics), and HImode rotations by
8-bits can now be recognized and implemented using swpb.

This patch has been tested by building a cross-compiler to xstormy16-elf
from x86_64-pc-linux-gnu and confirming the new test cases pass.
Ok for mainline?


2023-04-29  Roger Sayle  

gcc/ChangeLog
 * config/stormy16/stormy16.md (neghi2): Convert from a define_expand
 to a define_insn.
 (*rotatehi_1): New define_insn for efficient 2 insn sequence.
 (*rotatehi_8, *rotaterthi_8): New define_insn to emit a swpb.

gcc/testsuite/ChangeLog
 * gcc.target/xstormy16/neghi2.c: New test case.
 * gcc.target/xstormy16/rotatehi-1.c: Likewise.

It may be the case that exposing negation as a not + add sequence was 
thought to potentially produce better code by exposing the component 
instructions.  Or it may have simply been the case that nobody 
considered the tradeoffs.
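
For reference, the identity behind the "not + add" expansion is the usual
two's-complement one, -x == ~x + 1.  The sketch below checks it for every
16-bit value; the helper names are hypothetical and nothing here is taken
from the xstormy16 backend itself.

```c
#include <assert.h>
#include <stdint.h>

/* Negation spelled as the two component operations: not, then add 1.  */
static uint16_t neg_not_add(uint16_t x)
{
    return (uint16_t)(~x + 1u);
}

/* Negation as a single operation, modulo 2^16.  */
static uint16_t neg_ref(uint16_t x)
{
    return (uint16_t)(0u - x);
}

/* Returns 1 if both forms agree for every 16-bit input.  */
static int neg_agrees_everywhere(void)
{
    for (uint32_t i = 0; i <= 0xffffu; i++)
        if (neg_not_add((uint16_t)i) != neg_ref((uint16_t)i))
            return 0;
    return 1;
}

/* Small self-check.  */
static void check_neg(void)
{
    assert(neg_not_add(1) == 0xffff);
    assert(neg_not_add(0) == 0);
    assert(neg_agrees_everywhere());
}
```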



Either way, I think the patch is fine.  As is always the case, figure 
~24hrs after committing we'll have test results.




jeff

Re: [xstormy16 PATCH] Recognize/support swpn (swap nibbles) instruction.

2023-04-29 Thread Jeff Law via Gcc-patches





On 4/29/23 10:24, Roger Sayle wrote:


This patch adds support for xstormy16's swap nibbles instruction (swpn).
For the test case:

short foo(short x) {
   return (x&0xff00) | ((x<<4)&0xf0) | ((x>>4)&0x0f);
}

GCC with -O2 currently generates the nine instruction sequence:
foo:mov r7,r2
 asr r2,#4
 and r2,#15
 mov.w r6,#-256
 and r6,r7
 or r2,r6
 shl r7,#4
 and r7,#255
 or r2,r7
 ret

with this patch, we now generate:
foo:swpn r2
 ret

To achieve this using combine's four instruction "combinations" requires
a little wizardry.  Firstly, define_insn_and_split patterns are introduced to
treat logical shifts followed by bitwise-AND as macro instructions that
are split after reload.  This is sufficient to recognize a QImode
nibble swap, which can be implemented by swpn followed by either a
zero-extension or a sign-extension from QImode to HImode.  Then finally,
in the correct context, a QImode swap-nibbles pattern can be combined to
preserve the high-byte of a HImode word, matching the xstormy16's swpn
semantics.

The naming of the new code iterators is taken from i386.md.
The any_rotate code iterator is used in my next (split out) patch.

This patch has been tested by building a cross-compiler to xstormy16-elf
from x86_64-pc-linux-gnu and confirming the new test cases pass.
Ok for mainline?


2023-04-29  Roger Sayle  

gcc/ChangeLog
 * config/stormy16/stormy16.md (any_lshift): New code iterator.
 (any_or_plus): Likewise.
 (any_rotate): Likewise.
 (*_and_internal): New define_insn_and_split to
 recognize a logical shift followed by an AND, and split it
 again after reload.
 (*swpn): New define_insn matching xstormy16's swpn.
 (*swpn_zext): New define_insn recognizing swpn followed by
 zero_extendqihi2, i.e. with the high byte set to zero.
 (*swpn_sext): Likewise, for swpn followed by cbw.
 (*swpn_sext_2): Likewise, for an alternate RTL form.
 (*swpn_zext_ior): A pre-reload splitter so that an swpn+zext+ior
 sequence is split in the correct place to recognize the *swpn_zext
 followed by any_or_plus (ior, xor or plus) instruction.

gcc/testsuite/ChangeLog
 * gcc.target/xstormy16/swpn-1.c: New QImode test case.
 * gcc.target/xstormy16/swpn-2.c: New zero_extend test case.
 * gcc.target/xstormy16/swpn-3.c: New sign_extend test case.
 * gcc.target/xstormy16/swpn-4.c: New HImode test case.

Ah, bridge patterns.

OK for the trunk.

jeff

Re: [PATCH] add glibc-stdint.h to vax and lm32 linux target (PR target/105525)

2023-04-29 Thread Jeff Law via Gcc-patches





On 4/28/23 11:45, Mikael Pettersson via Gcc-patches wrote:

PR target/105525 is a build regression for the vax and lm32 linux
targets present in gcc-12/13/head, where the builds fail due to
unsatisfied references to __INTPTR_TYPE__ and __UINTPTR_TYPE__,
caused by these two targets failing to provide glibc-stdint.h.

Fixed thusly, tested by building crosses, which now succeeds.

Ok for trunk? (Note I don't have commit rights.)

2023-04-28  Mikael Pettersson

PR target/105525
* config.gcc (vax-*-linux*): Add glibc-stdint.h.
(lm32-*-uclinux*): Likewise.

Thanks.  I've pushed this to the trunk.
jeff

Re: [PATCH V2] RISC-V: decouple stack allocation for rv32e w/o save-restore.

2023-04-29 Thread Palmer Dabbelt


On Sat, 29 Apr 2023 08:38:06 PDT (-0700), jeffreya...@gmail.com wrote:



On 4/29/23 04:59, Fei Gao wrote:

Currently in rv32e, stack allocation for GPR callee-saved registers is
always 12 bytes w/o save-restore. Actually, for the case without save-restore,
less stack memory can be reserved. This patch decouples stack allocation for
rv32e w/o save-restore and makes riscv_compute_frame_info more readable.


Are you guys using rv32e?  It's not widely tested, at least by most 
upstream folks.  If you're actively trying to ship it then we should 
probably add it to the various lists of targets that get tested, as I'd 
bet there's a lot of oddness floating around.



output of testcase rv32e_stack.c
before patch:
        addi    sp,sp,-16
        sw      ra,12(sp)
        call    getInt
        sw      a0,0(sp)
        lw      a0,0(sp)
        call    PrintInts
        lw      a5,0(sp)
        mv      a0,a5
        lw      ra,12(sp)
        addi    sp,sp,16
        jr      ra

after patch:
        addi    sp,sp,-8
        sw      ra,4(sp)
        call    getInt
        sw      a0,0(sp)
        lw      a0,0(sp)
        call    PrintInts
        lw      a5,0(sp)
        mv      a0,a5
        lw      ra,4(sp)
        addi    sp,sp,8
        jr      ra
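
The frame sizes in the two listings can be reproduced with a toy model.
This is only a sketch: it is not riscv_compute_frame_info, the function
names are hypothetical, and the 4-byte stack alignment and the single
4-byte local are assumptions read off the assembly above.

```c
#include <assert.h>

enum {
    WORD_BYTES = 4,               /* size of one GPR save slot on rv32 */
    STACK_ALIGN = 4,              /* assumed stack alignment for this example */
    OLD_FIXED_GPR_SAVE_BYTES = 12 /* fixed reservation described in the mail */
};

/* Round n up to a multiple of align.  */
static unsigned align_up(unsigned n, unsigned align)
{
    return (n + align - 1) / align * align;
}

/* Before the patch: 12 bytes are reserved for callee-saved GPRs no matter
   how many are actually saved.  */
static unsigned frame_bytes_old(unsigned local_bytes)
{
    return align_up(OLD_FIXED_GPR_SAVE_BYTES + local_bytes, STACK_ALIGN);
}

/* After the patch: only the registers that are really saved take space.  */
static unsigned frame_bytes_new(unsigned saved_gprs, unsigned local_bytes)
{
    return align_up(saved_gprs * WORD_BYTES + local_bytes, STACK_ALIGN);
}

/* The example above saves only ra and has one 4-byte local.  */
static void check_rv32e_example(void)
{
    assert(frame_bytes_old(4) == 16);
    assert(frame_bytes_new(1, 4) == 8);
}
```

With one saved register and one word of locals, the model gives 16 bytes
for the old scheme and 8 for the new one, matching the addi immediates in
the two listings.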


gcc/ChangeLog:

 * config/riscv/riscv.cc (riscv_avoid_save_libcall): helper function 
for riscv_use_save_libcall.
 (riscv_use_save_libcall): call riscv_avoid_save_libcall.
 (riscv_compute_frame_info): restructure to decouple stack allocation 
for rv32e w/o save-restore.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rv32e_stack.c: New test.

Thanks.  I rewrapped the ChangeLog and pushed this to the trunk.


Works for me, thanks for reviewing all this stuff -- we're all pretty 
buried ;)




jeff

Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Andrew Waterman via Gcc-patches

On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 4/28/23 20:55, Li, Pan2 wrote:
> > Thanks Jeff for comments.
> >
> > It makes sense to me. For the EQ operator we should have CONSTM1.
> That's not the way I interpret the RVV documentation.  Of course it's
> not terribly clear.  I guess one could do some experiments with qemu
> or try to dig into the sail code and figure out the intent from those.
>
>
>
> Does this mean s390 parts has similar issue here? Then for instructions
> like VMSEQ, we need to adjust the simplify_rtx up to a point.
> You'd have to refer to the s390 instruction set reference to understand
> precisely how the vector compares work.
>
> But as it stands this really isn't a simplify-rtx question, but a
> question of the semantics of risc-v.   What happens with the high bits
> in the destination mask register is critical -- and if risc-v doesn't
> set them to all ones in this case, then that would mean that defining
> that macro is simply wrong for risc-v.

The relevant statement in the spec is that "the tail elements are always
updated with a tail-agnostic policy".  The vmset.m instruction will cause
mask register bits [0, vl-1] to be set to 1; elements [vl, VLMAX-1] will
either be undisturbed or set to 1, i.e., effectively unspecified.
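
To make the consequence concrete, here is a toy C model of the behaviour
described above.  It is not taken from the spec, Sail, or QEMU; the function
names are hypothetical and the boolean parameter simply selects between the
two outcomes a tail-agnostic implementation is allowed to produce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of vmset.m: elements [0, vl-1] become 1; tail elements
   [vl, vlmax-1] are either left undisturbed or set to 1.  */
static void vmset_model(bool *mask, size_t vl, size_t vlmax,
                        bool fill_tail_with_ones)
{
    for (size_t i = 0; i < vlmax; i++) {
        if (i < vl || fill_tail_with_ones)
            mask[i] = true;
        /* otherwise: tail element keeps its previous value */
    }
}

/* True if every element of the whole register is 1.  */
static bool all_ones(const bool *mask, size_t vlmax)
{
    for (size_t i = 0; i < vlmax; i++)
        if (!mask[i])
            return false;
    return true;
}

/* Both permitted outcomes, starting from an all-zero register.  */
static void check_vmset_model(void)
{
    bool undisturbed[8] = { false };
    bool filled[8] = { false };

    vmset_model(undisturbed, 3, 8, false);
    vmset_model(filled, 3, 8, true);

    assert(undisturbed[0] && undisturbed[1] && undisturbed[2]);
    assert(!all_ones(undisturbed, 8)); /* tail kept its old zeros */
    assert(all_ones(filled, 8));       /* tail was overwritten with ones */
}
```

Since both results are legal, code that relies on the whole register being
all ones after the instruction would only be correct on some
implementations, which is the concern raised in this thread.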

>
> jeff

Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Palmer Dabbelt

On Sat, 29 Apr 2023 10:21:53 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

On 4/28/23 20:55, Li, Pan2 wrote:
> Thanks Jeff for comments.
>
> It makes sense to me. For the EQ operator we should have CONSTM1.
That's not the way I interpret the RVV documentation.  Of course it's
not terribly clear.  I guess one could do some experiments with qemu
or try to dig into the sail code and figure out the intent from those.

QEMU specifically takes advantage of the behavior Andrew is pointing out 
in the spec, and will soon do so more aggressively (assuming the patches 
Daniel just sent out get merged).

Does this mean s390 parts has similar issue here? Then for instructions
like VMSEQ, we need to adjust the simplify_rtx up to a point.
You'd have to refer to the s390 instruction set reference to understand
precisely how the vector compares work.

But as it stands this really isn't a simplify-rtx question, but a
question of the semantics of risc-v.   What happens with the high bits
in the destination mask register is critical -- and if risc-v doesn't
set them to all ones in this case, then that would mean that defining
that macro is simply wrong for risc-v.

The relevant statement in the spec is that "the tail elements are always
updated with a tail-agnostic policy".  The vmset.m instruction will cause
mask register bits [0, vl-1] to be set to 1; elements [vl, VLMAX-1] will
either be undisturbed or set to 1, i.e., effectively unspecified.

jeff

Re: [PATCH V2] RISC-V: decouple stack allocation for rv32e w/o save-restore.

2023-04-29 Thread Jeff Law via Gcc-patches





On 4/29/23 11:00, Palmer Dabbelt wrote:

On Sat, 29 Apr 2023 08:38:06 PDT (-0700), jeffreya...@gmail.com wrote:



On 4/29/23 04:59, Fei Gao wrote:

Currently in rv32e, stack allocation for GPR callee-saved registers is
always 12 bytes w/o save-restore. Actually, for the case without 
save-restore,
less stack memory can be reserved. This patch decouples stack 
allocation for

rv32e w/o save-restore and makes riscv_compute_frame_info more readable.


Are you guys using rv32e?  It's not widely tested, at least by most 
upstream folks.  If you're actively trying to ship it then we should 
probably add it to the various lists of targets that get tested, as I'd 
bet there's a lot of oddness floating around.

No interest at all in rv32 at Ventana.



Thanks.  I rewrapped the ChangeLog and pushed this to the trunk.


Works for me, thanks for reviewing all this stuff -- we're all pretty 
buried ;)
Just standard procedure with the trunk re-opened.  In the past I would 
have ignored anything in the risc-v space.  I've traded that for 
ignoring x86 :-)




Jeff

Re: [PATCH V2] RISC-V: decouple stack allocation for rv32e w/o save-restore.

2023-04-29 Thread Palmer Dabbelt


On Sat, 29 Apr 2023 10:44:08 PDT (-0700), jeffreya...@gmail.com wrote:



On 4/29/23 11:00, Palmer Dabbelt wrote:

On Sat, 29 Apr 2023 08:38:06 PDT (-0700), jeffreya...@gmail.com wrote:



On 4/29/23 04:59, Fei Gao wrote:

Currently in rv32e, stack allocation for GPR callee-saved registers is
always 12 bytes w/o save-restore. Actually, for the case without
save-restore,
less stack memory can be reserved. This patch decouples stack
allocation for
rv32e w/o save-restore and makes riscv_compute_frame_info more readable.


Are you guys using rv32e?  It's not widely tested, at least by most
upstream folks.  If you're actively trying to ship it then we should
probably add it to the various lists of targets that get tested, as I'd
bet there's a lot of oddness floating around.

No interest at all in rv32 at Ventana.


Makes sense, I was mostly wondering about the Eswin folks though.





Thanks.  I rewrapped the ChangeLog and pushed this to the trunk.


Works for me, thanks for reviewing all this stuff -- we're all pretty
buried ;)

Just standard procedure with the trunk re-opened.  In the past I would
have ignored anything in the risc-v space.  I've traded that for
ignoring x86 :-)



Jeff

Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Jeff Law via Gcc-patches

On 4/29/23 11:28, Palmer Dabbelt wrote:

On Sat, 29 Apr 2023 10:21:53 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

On 4/28/23 20:55, Li, Pan2 wrote:
> Thanks Jeff for comments.
>
> It makes sense to me. For the EQ operator we should have CONSTM1.
That's not the way I interpret the RVV documentation.  Of course it's
not terribly clear.    I guess one could do some experiments with qemu
or try to dig into the sail code and figure out the intent from those.

QEMU specifically takes advantage of the behavior Andrew is pointing out 
in the spec, and will soon do so more aggressively (assuming the patches 
Daniel just sent out get merged).
Yea.  And taking advantage of that behavior is definitely a performance 
issue for QEMU.  There's still work to do though.  QEMU on vector code 
is running crazy slow.

jeff

Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Palmer Dabbelt

On Sat, 29 Apr 2023 10:46:37 PDT (-0700), jeffreya...@gmail.com wrote:

On 4/29/23 11:28, Palmer Dabbelt wrote:

On Sat, 29 Apr 2023 10:21:53 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

On 4/28/23 20:55, Li, Pan2 wrote:
> Thanks Jeff for comments.
>
> It makes sense to me. For the EQ operator we should have CONSTM1.
That's not the way I interpret the RVV documentation.  Of course it's
not terribly clear.    I guess one could do some experiments with qemu
or try to dig into the sail code and figure out the intent from those.

QEMU specifically takes advantage of the behavior Andrew is pointing out
in the spec, and will soon do so more aggressively (assuming the patches
Daniel just sent out get merged).

Yea.  And taking advantage of that behavior is definitely a performance
issue for QEMU.  There's still work to do though.  QEMU on vector code
is running crazy slow.

I guess we're kind of off the rails for a GCC patch, but that's 
definitely true.  Across the board RVV is going to just need a lot of 
work, it's very different than SVE or AVX.

Unfortunately QEMU performance isn't really a priority on our end, but 
it's great to see folks digging into it.

Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Jeff Law via Gcc-patches





On 4/29/23 11:21, Andrew Waterman wrote:



The relevant statement in the spec is that "the tail elements are always 
updated with a tail-agnostic policy".  The vmset.m instruction will 
cause mask register bits [0, vl-1] to be set to 1; elements [vl, 
VLMAX-1] will either be undisturbed or set to 1, i.e., effectively 
unspecified.
Makes sense.  Just have to stitch together bits from different locations 
in the manual.


The net being that I don't think we can define that macro for RISC-V in 
the way that Pan wants, the semantics just don't line up correctly.


jeff

Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Jeff Law via Gcc-patches





On 4/29/23 11:48, Palmer Dabbelt wrote:


Yea.  And taking advantage of that behavior is definitely a performance
issue for QEMU.  There's still work to do though.  QEMU on vector code
is running crazy slow.


I guess we're kind of off the rails for a GCC patch, but that's 
definitely true.  Across the board RVV is going to just need a lot of 
work, it's very different than SVE or AVX.


Unfortunately QEMU performance isn't really a priority on our end, but 
it's great to see folks digging into it.
Well, when a user mode SPEC run goes from ~15 minutes to multiple hours 
for a single input workload within specint it becomes a development 
problem.  Daniel is loosely affiliated with my group in Ventana, so I 
can bug him with this kind of stuff.


jeff

Re: [PATCH] reload: Handle generating reloads that also clobbers flags

2023-04-29 Thread Jeff Law via Gcc-patches

On 4/18/23 08:12, Hans-Peter Nilsson wrote:

Date: Tue, 18 Apr 2023 07:43:41 -0600
From: Jeff Law 

On 2/15/23 08:34, Hans-Peter Nilsson via Gcc-patches wrote:

Regtested cris-elf with its LEGITIMIZE_RELOAD_ADDRESS
disabled, where it regresses gcc.target/cris/rld-legit1.c;
as expected, because that test guards proper function of its
LEGITIMIZE_RELOAD_ADDRESS i.e., that there's no sign of
decomposed address elements.

LRA also causes a similar decomposition (and worse, in even
smaller bits), but it can create valid insns as-is.
Unfortunately, it doesn't have something equivalent to
LEGITIMIZE_RELOAD_ADDRESS so it generates worse code for
cases where that hook helped reload.

I fear reload-related patches these days are treated like a
redheaded stepchild and even worse as this one is intended
for stage 1.  Either way, I need to create a reference to
it, and it's properly tested and has been a help when
working towards LRA, thus might help other targets: ok to
install for the next stage 1?

-- >8 --
When LEGITIMIZE_RELOAD_ADDRESS for cris-elf is disabled,
this code is now required for reload to generate valid insns
from some reload-decomposed addresses, for example the
(plus:SI
   (sign_extend:SI (mem:HI (reg/v/f:SI 32 [ a ]) [1 *a_6(D)+0 S2 A8]))
   (reg/v/f:SI 33 [ y ]))
generated in gcc.target/cris/rld-legit1.c (a valid address
but with two registers needing reload).  Now after decc0:ing,
most SET insns for former cc0 targets need to be a parallel
with a clobber of the flags register.  Such targets
typically have TARGET_FLAGS_REGNUM set to a valid register.

* reload1.cc (emit_insn_if_valid_for_reload_1): Rename from
emit_insn_if_valid_for_reload.
(emit_insn_if_valid_for_reload): Call new helper, and if a SET fails
to be recognized, also try emitting a parallel that clobbers
TARGET_FLAGS_REGNUM, as applicable.

But isn't it the case that we're not supposed to be exposing the flags
register until after reload?   And if that's the case, then why would
this be necessary?  Clearly I must be missing something.

That "supposed to" is only *one* possible implementation.
The one in CRIS - and I believe the preferred one; one I
should advocate more - is to *always* expose clobbering of
the flags.  (I managed to do the CRIS decc0ification
transformation without loss of performance.  There were much
fewer issues with code taking PATTERN (insn) and failing on
it being PARALLEL than I had expected, much thanks to use of
rtx_single_set.)

Think about it: why should the semantics of a valid insn
change after a "random" pass?  That's almost as crazy as the
implied semantics of cc0.
Ah, yes, thanks for the reminder that there's multiple approaches here. 
If I cared enough it'd probably make more sense at this point to expose 
cc0 early on the H8 as doing so would allow easier codegen for overflow 
tests which in turn could significantly speed up the testsuite.

OK for the trunk.

jeff

Re: [PATCH] build: Use -nostdinc generating macro_list [PR109522]

2023-04-29 Thread Jeff Law via Gcc-patches





On 4/15/23 06:01, Xi Ruoyao via Gcc-patches wrote:

This prevents a spurious message building a cross-compiler when target
libc is not installed yet:

 cc1: error: no include path in which to search for stdc-predef.h

As stdc-predef.h was added to define __STDC_* macros by libc, it's
unlikely the header will ever contain some bad definitions w/o "__"
prefix so it should be safe.

gcc/ChangeLog:

PR other/109522
* Makefile.in (s-macro_list): Pass -nostdinc to
$(GCC_FOR_TARGET).

OK.  Thanks.

jeff

Re: [PATCH] Handle Windows nul device in unlink-if-ordinary.c

2023-04-29 Thread Jeff Law via Gcc-patches





On 3/12/23 23:15, Himal wrote:

On 3/12/2023 1:48 AM, Jeff Law wrote:



On 1/6/23 01:31, anothername27-unity--- via Gcc-patches wrote:

From: Himal 

Hi,

This might be a better fix.

Regards.

PS. I had to use a different email.

---
  libiberty/unlink-if-ordinary.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/libiberty/unlink-if-ordinary.c 
b/libiberty/unlink-if-ordinary.c

index 84328b216..e765ac8b1 100644
--- a/libiberty/unlink-if-ordinary.c
+++ b/libiberty/unlink-if-ordinary.c
@@ -62,6 +62,12 @@ was made to unlink the file because it is special.
  int
  unlink_if_ordinary (const char *name)
  {
+/* MS-Windows 'stat' function (and in turn, S_ISREG)
+   reports the null device as a regular file.  */
+#ifdef _WIN32
+  if (stricmp (name, "nul") == 0)
+    return 1;
+#endif


Hi Jeff, Thanks for the response.

Umm, wouldn't this return true for a real file called nul in the 
current directory?  ie, don't you need to distinguish between the nul 
device and a file named nul based on the full path?


I don't think that we can create a file called nul under Windows.

And not being a windows person, I'd really like to see some 
documentation which indicates that stat on the null device will 
indicate it's a regular file.  Alternately if one of the windows 
experts here can chime in, it'd be appreciated.

jeff


I found these patches that might indicate the same thing.

https://src.fedoraproject.org/rpms/binutils/blob/0b119dd9d51a3763db7d6fea1b51a03494cb96d8/f/binutils-CVE-2021-20197.patch#_121-135

https://github.com/msys2/MINGW-packages/pull/10541/files

I would like to see some input from a Windows developer as well.

BTW, This doesn't affect anything. I stumbled upon this while 
debugging another 
[bug](https://sourceware.org/bugzilla/show_bug.cgi?id=29947). I noticed 
it's calling unlink function for the nul device as well, but it wasn't 
throwing any errors or anything like that.
I'm inclined to go ahead and commit this.  I think the only other 
question I have is the use of stricmp.  That's not strictly ISO, 
strcasecmp would be preferred.  But I don't know enough about the 
windows environment to know if they picked up strcasecmp over time.
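
One way to sidestep the stricmp/strcasecmp question is to do the comparison
by hand with tolower from <ctype.h>, which is ISO C.  The sketch below is a
hypothetical helper and not part of libiberty; like the patch, it only
matches the bare name "nul" and does not address the earlier question about
names that include a path.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return 1 if NAME is "nul" in any letter case, 0 otherwise.
   Hypothetical helper for illustration only.  */
static int is_nul_device_name(const char *name)
{
    static const char nul[] = "nul";
    size_t i;

    for (i = 0; nul[i] != '\0'; i++)
        /* A shorter NAME hits its terminator here and fails the compare,
           so we never read past the end of the string.  */
        if (tolower((unsigned char)name[i]) != nul[i])
            return 0;

    /* Reject longer names such as "null".  */
    return name[i] == '\0';
}

/* Small self-check.  */
static void check_is_nul_device_name(void)
{
    assert(is_nul_device_name("nul"));
    assert(is_nul_device_name("NUL"));
    assert(!is_nul_device_name("null"));
    assert(!is_nul_device_name("nu"));
}
```

Inside the existing _WIN32 guard, such a helper could stand in for the
stricmp call without depending on which string functions the Windows C
runtime provides.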


jeff

Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Palmer Dabbelt


On Sat, 29 Apr 2023 10:52:50 PDT (-0700), jeffreya...@gmail.com wrote:



On 4/29/23 11:48, Palmer Dabbelt wrote:


Yea.  And taking advantage of that behavior is definitely a performance
issue for QEMU.  There's still work to do though.  QEMU on vector code
is running crazy slow.


I guess we're kind of off the rails for a GCC patch, but that's
definitely true.  Across the board RVV is going to just need a lot of
work, it's very different than SVE or AVX.

Unfortunately QEMU performance isn't really a priority on our end, but
it's great to see folks digging into it.

Well, when a user mode SPEC run goes from ~15 minutes to multiple hours
for a single input workload within specint it becomes a development
problem.  Daniel is loosely affiliated with my group in Ventana, so I
can bug him with this kind of stuff.


We've got another team actually doing the mechanics of the SPEC runs, we 
just do the compiler.  So while I guess it is a problem, it's not my 
problem ;)


Maybe not the best way to go about things, but there's only so much that 
can be done...

Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Kito Cheng via Gcc-patches

Hi Jeff:

The RTL pattern already models tail element and vector length well,
so I don't feel the first version of Pan's patch has any problem?

Input RTL pattern:

#(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
#(if_then_else:VNx2BI (unspec:VNx2BI [
#(const_vector:VNx2BI repeat [
#(const_int 1 [0x1])
#])  # all-1 mask
#(reg:DI 143)  # AVL reg, or vector length
#(const_int 2 [0x2]) # mask policy
#(const_int 0 [0])   # avl type
#(reg:SI 66 vl)
#(reg:SI 67 vtype)
#] UNSPEC_VPREDICATE)
#(geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
#(reg/v:VNx2QI 137 [ v1 ]))
#(unspec:VNx2BI [
#(reg:SI 0 zero)
#] UNSPEC_VUNDEF))) # maskoff and tail operand
# (expr_list:REG_DEAD (reg:DI 143)
#(expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
#(nil

And the split pattern only applies when the tail/maskoff operand is an undefined value:

(define_split
 [(set (match_operand:VB      0 "register_operand")
   (if_then_else:VB
     (unspec:VB
       [(match_operand:VB 1 "vector_all_trues_mask_operand")
        (match_operand    4 "vector_length_operand")
        (match_operand    5 "const_int_operand")
        (match_operand    6 "const_int_operand")
        (reg:SI VL_REGNUM)
        (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
     (match_operand:VB    3 "vector_move_operand")
     (match_operand:VB    2 "vector_undef_operand")))] # maskoff and tail operand, only match undef value

Then it turns into vmset, and also discards the mask policy operand (since
maskoff being undef means don't care, IMO):

(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
   (if_then_else:VNx2BI (unspec:VNx2BI [
   (const_vector:VNx2BI repeat [
   (const_int 1 [0x1])
   ])  # all-1 mask
   (reg:DI 143) # AVL reg, or vector length
   (const_int 2 [0x2]) # mask policy
   (reg:SI 66 vl)
   (reg:SI 67 vtype)
   ] UNSPEC_VPREDICATE)
   (const_vector:VNx2BI repeat [
   (const_int 1 [0x1])
   ])# all-1
   (unspec:VNx2BI [
   (reg:SI 0 zero)
   ] UNSPEC_VUNDEF))) # still vundef
(expr_list:REG_DEAD (reg:DI 143)
   (nil)))
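
The argument above can be checked with a small model: comparing a register
with itself using geu is true for every active element, so the body is the
same as what vmset writes, and the tail is irrelevant when the merge operand
is undef.  This is only a toy sketch; vmsgeu_self and vmset are illustrative
names, the tail is modelled as "left unchanged", and nothing here claims to
describe what real hardware does with the bits past vl:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model, not GCC or hardware behaviour: element-wise unsigned ">="
// (the geu in the RTL above) of a vector with itself.  Only the first vl
// elements (the body) are written; elements from vl onwards are the tail
// and keep whatever the caller passed in, standing in for "don't care".
std::vector<bool> vmsgeu_self(const std::vector<std::uint8_t> &v1,
                              std::size_t vl,
                              std::vector<bool> dest)
{
  for (std::size_t i = 0; i < vl && i < v1.size() && i < dest.size(); ++i)
    dest[i] = v1[i] >= v1[i];   // true for any input value
  return dest;
}

// What a mask "set all" limited to the same vl produces in this model.
std::vector<bool> vmset(std::size_t vl, std::vector<bool> dest)
{
  for (std::size_t i = 0; i < vl && i < dest.size(); ++i)
    dest[i] = true;
  return dest;
}
```

For any input and any vl no larger than the vector, the two functions return
the same mask, which is the equivalence the split relies on.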



On Sat, Apr 29, 2023 at 11:05 PM Jeff Law wrote:
>
>
>
> On 4/28/23 20:55, Li, Pan2 wrote:
> > Thanks Jeff for comments.
> >
> > It makes sense to me. For the EQ operator we should have CONSTM1.
> That's not the way I interpret the RVV documentation.  Of course it's
> not terribly clear.  I guess one could do some experiments with qemu
> or try to dig into the sail code and figure out the intent from those.
>
>
>
> Does this mean s390 parts have a similar issue here?  Then for instructions
> like VMSEQ, we need to adjust the simplify_rtx up to a point.
> You'd have to refer to the s390 instruction set reference to understand
> precisely how the vector compares work.
>
> But as it stands this really isn't a simplify-rtx question, but a
> question of the semantics of risc-v.   What happens with the high bits
> in the destination mask register is critical -- and if risc-v doesn't
> set them to all ones in this case, then that would mean that defining
> that macro is simply wrong for risc-v.
>
> jeff

Re: [PATCH V2] RISC-V: decouple stack allocation for rv32e w/o save-restore.

2023-04-29 Thread Kito Cheng via Gcc-patches

SiFive has tests and delivers RV32E.

On Sun, Apr 30, 2023 at 1:45 AM Palmer Dabbelt wrote:
>
> On Sat, 29 Apr 2023 10:44:08 PDT (-0700), jeffreya...@gmail.com wrote:
> >
> >
> > On 4/29/23 11:00, Palmer Dabbelt wrote:
> >> On Sat, 29 Apr 2023 08:38:06 PDT (-0700), jeffreya...@gmail.com wrote:
> >>>
> >>>
> >>> On 4/29/23 04:59, Fei Gao wrote:
>  Currently in rv32e, stack allocation for GPR callee-saved registers is
>  always 12 bytes w/o save-restore. Actually, for the case without
>  save-restore,
>  less stack memory can be reserved. This patch decouples stack
>  allocation for
>  rv32e w/o save-restore and makes riscv_compute_frame_info more readable.
> >>
> >> Are you guys using rv32e?  It's not widely tested, at least by most
> >> upstream folks.  If you're actively trying to ship it then we should
> >> probably add it to the various lists of targets that get tested, as I'd
> >> bet there's a lot of oddness floating around.
> > No interest at all in rv32 at Ventana.
>
> Makes sense, I was mostly wondering about the Eswin folks though.
>
> >
> >
> >>> Thanks.  I rewrapped the ChangeLog and pushed this to the trunk.
> >>
> >> Works for me, thanks for reviewing all this stuff -- we're all pretty
> >> buried ;)
> > Just standard procedure with the trunk re-opened.  In the past I would
> > have ignored anything in the risc-v space.  I've traded that for
> > ignoring x86 :-)
> >
> >
> >
> > Jeff

[PATCH] c++: Report invalid id-expression in decltype [PR100482]

2023-04-29 Thread Nathaniel Shead via Gcc-patches

This patch ensures that any errors raised by finish_id_expression when
parsing a decltype expression are properly reported, rather than
potentially going ignored and causing invalid code to be accepted.

We can also now remove the separate check for templates without args as
this is also checked for in finish_id_expression.

PR 100482

gcc/cp/ChangeLog:

* parser.cc (cp_parser_decltype_expr): Report errors raised by
finish_id_expression.

gcc/testsuite/ChangeLog:

* g++.dg/pr100482.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/parser.cc| 22 +++---
 gcc/testsuite/g++.dg/pr100482.C | 11 +++
 2 files changed, 22 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr100482.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index e5f032f2330..20ebcdc3cfd 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -16508,10 +16508,6 @@ cp_parser_decltype_expr (cp_parser *parser,
expr = cp_parser_lookup_name_simple (parser, expr,
 id_expr_start_token->location);
 
-  if (expr && TREE_CODE (expr) == TEMPLATE_DECL)
-   /* A template without args is not a complete id-expression.  */
-   expr = error_mark_node;
-
   if (expr
   && expr != error_mark_node
   && TREE_CODE (expr) != TYPE_DECL
@@ -16532,13 +16528,17 @@ cp_parser_decltype_expr (cp_parser *parser,
&error_msg,
   id_expr_start_token->location));
 
-  if (expr == error_mark_node)
-/* We found an id-expression, but it was something that we
-   should not have found. This is an error, not something
-   we can recover from, so note that we found an
-   id-expression and we'll recover as gracefully as
-   possible.  */
-id_expression_or_member_access_p = true;
+ if (error_msg)
+   {
+ /* We found an id-expression, but it was something that we
+should not have found. This is an error, not something
+we can recover from, so report the error we found and
+we'll recover as gracefully as possible.  */
+ cp_parser_parse_definitely (parser);
+ cp_parser_error (parser, error_msg);
+ id_expression_or_member_access_p = true;
+ return error_mark_node;
+   }
 }
 
   if (expr
diff --git a/gcc/testsuite/g++.dg/pr100482.C b/gcc/testsuite/g++.dg/pr100482.C
new file mode 100644
index 000..dcf6722fda5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr100482.C
@@ -0,0 +1,11 @@
+// { dg-do compile { target c++11 } }
+
+namespace N {}
+decltype(std) x;   // { dg-error "expected primary-expression" }
+
+struct S {};
+decltype(S) y;  // { dg-error "argument to .decltype. must be an expression" }
+
+template 
+struct U {};
+decltype(U) z;  // { dg-error "missing template arguments" }
-- 
2.40.0
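
For contrast with the rejected forms in the new test, here is a sketch of
operands that decltype does accept, since each one is an expression.  The
names N::value, S::member and U<int>::item are made up for illustration and
are not part of the patch or its testcase:

```cpp
#include <type_traits>

namespace N { int value = 0; }
struct S { int member; };
template <typename T> struct U { T item; };

// Each operand below is an expression, so decltype yields a type.
decltype(N::value) a = 1;        // id-expression naming a variable
decltype(S{}.member) b = 2;      // class member access
decltype(U<int>{}.item) c = 3;   // template with its arguments supplied

static_assert(std::is_same<decltype(a), int>::value, "a is int");
static_assert(std::is_same<decltype(b), int>::value, "b is int");
static_assert(std::is_same<decltype(c), int>::value, "c is int");

// Forms of the kind the patch is about; none of these operands is an
// expression, so they are expected to be rejected:
//   decltype(N) x;   // a namespace
//   decltype(S) y;   // a type
//   decltype(U) z;   // a template without arguments
```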

Re: [PATCH] apply debug-remap to file names in .su files

2023-04-29 Thread Jeff Law via Gcc-patches





On 2/13/23 12:27, Rasmus Villemoes wrote:

The .su files generated with -fstack-usage are arguably debug info. In
order to make builds more reproducible, apply the same remapping logic
to the recorded file names as for when producing the debug info
embedded in the object files.

To this end, teach print_decl_identifier() a new
PRINT_DECL_REMAP_DEBUG flag and use that from output_stack_usage_1().

gcc/ChangeLog:

* print-tree.h (PRINT_DECL_REMAP_DEBUG): New flag.
* print-tree.cc (print_decl_identifier): Implement it.
* toplev.cc (output_stack_usage_1): Use it.

OK for the trunk.
jeff
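
For readers unfamiliar with the remapping being reused here, the idea behind
an option such as -fdebug-prefix-map=OLD=NEW is a prefix rewrite of recorded
file names.  The following is a simplified stand-in and not GCC's
implementation; remap_prefix is a hypothetical name:

```cpp
#include <string>

// Simplified illustration of an OLD=NEW prefix rewrite (an assumption for
// this example, not the code in GCC).  If path starts with old_prefix,
// that prefix is replaced by new_prefix; otherwise path is returned
// unchanged.
std::string remap_prefix(const std::string &path,
                         const std::string &old_prefix,
                         const std::string &new_prefix)
{
  if (path.compare(0, old_prefix.size(), old_prefix) == 0)
    return new_prefix + path.substr(old_prefix.size());
  return path;
}
```

With old_prefix "/build/tmp" and new_prefix ".", a name like
"/build/tmp/src/foo.c" becomes "./src/foo.c", so the build directory no
longer leaks into the output, which is the reproducibility benefit the patch
description is after for .su files.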
