Re: [PATCH] x86-64: Remove HAVE_LD_PIE_COPYRELOC

2021-05-11 Thread Rainer Orth
Hi Fangrui,

> This was introduced in 2014-12 to use local binding for external symbols
> for -fPIE. Now that we have had H.J. Lu's GOTPCRELX for years, which mostly
> nullifies the benefit of HAVE_LD_PIE_COPYRELOC, HAVE_LD_PIE_COPYRELOC
> should be retired.

Solaris/x86 ld doesn't support this, so HAVE_LD_PIE_COPYRELOC needs to
stay.  The Solaris 11.3/x86 assembler doesn't support
R_X86_64_*GOTPCRELX.

Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] config: delete unused sim macros

2021-05-11 Thread Mike Frysinger via Gcc-patches
Nothing in gcc or binutils or gdb or anything anywhere uses these.

config/

* acinclude.m4 (CYG_AC_PATH_SIM, CYG_AC_PATH_DEVO): Delete.
---
 config/acinclude.m4 | 102 
 1 file changed, 102 deletions(-)

diff --git a/config/acinclude.m4 b/config/acinclude.m4
index 8242b2c7a8ac..0abccafa0353 100644
--- a/config/acinclude.m4
+++ b/config/acinclude.m4
@@ -373,88 +373,6 @@ fi
 AC_SUBST(INTLLIB)
 ])
 
-dnl 
-dnl Find the simulator library.
-AC_DEFUN([CYG_AC_PATH_SIM], [
-dirlist=".. ../../ ../../../ ../../../../ ../../../../../ ../../../../../../ ../../../../../../.. ../../../../../../../.. ../../../../../../../../.. ../../../../../../../../../.. ../../../../../../../../../.."
-case "$target_cpu" in
-powerpc)   target_dir=ppc ;;
-sparc*)target_dir=erc32 ;;
-mips*) target_dir=mips ;;
-*) target_dir=$target_cpu ;;
-esac
-dnl First look for the header file
-AC_MSG_CHECKING(for the simulator header file)
-AC_CACHE_VAL(ac_cv_c_simh,[
-for i in $dirlist; do
-if test -f "${srcdir}/$i/include/remote-sim.h" ; then
-   ac_cv_c_simh=`(cd ${srcdir}/$i/include; ${PWDCMD-pwd})`
-   break
-fi
-done
-])
-if test x"${ac_cv_c_simh}" != x; then
-SIMHDIR="-I${ac_cv_c_simh}"
-AC_MSG_RESULT(${ac_cv_c_simh})
-else
-AC_MSG_RESULT(none)
-fi
-AC_SUBST(SIMHDIR)
-
-dnl See whether it's a devo or Foundry branch simulator
-AC_MSG_CHECKING(Whether this is a devo simulator )
-AC_CACHE_VAL(ac_cv_c_simdevo,[
-CPPFLAGS="$CPPFLAGS $SIMHDIR"
-AC_EGREP_HEADER([SIM_DESC sim_open.*struct _bfd], remote-sim.h,
-ac_cv_c_simdevo=yes,
-ac_cv_c_simdevo=no)
-])
-if test x"$ac_cv_c_simdevo" = x"yes" ; then
-AC_DEFINE(HAVE_DEVO_SIM)
-fi
-AC_MSG_RESULT(${ac_cv_c_simdevo})
-AC_SUBST(HAVE_DEVO_SIM)
-
-dnl Next look for the library
-AC_MSG_CHECKING(for the simulator library)
-AC_CACHE_VAL(ac_cv_c_simlib,[
-for i in $dirlist; do
-if test -f "$i/sim/$target_dir/Makefile" ; then
-   ac_cv_c_simlib=`(cd $i/sim/$target_dir; ${PWDCMD-pwd})`
-fi
-done
-])
-if test x"${ac_cv_c_simlib}" != x; then
-SIMLIB="-L${ac_cv_c_simlib}"
-else
-AC_MSG_RESULT(none)
-dnl FIXME: this is kinda bogus, cause umtimately the TM will build
-dnl all the libraries for several architectures. But for now, this
-dnl will work till then.
-dnl AC_MSG_CHECKING(for the simulator installed with the compiler libraries)
-dnl Transform the name of the compiler to it's cross variant, unless
-dnl CXX is set. This is also what CXX gets set to in the generated
-dnl Makefile.
-CROSS_GCC=`echo gcc | sed -e "s/^/$target/"`
-
-dnl Get G++'s full path to libgcc.a
-changequote(,)
-gccpath=`${CROSS_GCC} --print-libgcc | sed -e 's:[a-z0-9A-Z\.\-]*/libgcc.a::' -e 's:lib/gcc-lib/::'`lib
-changequote([,])
-if test -f $gccpath/libsim.a -o -f $gccpath/libsim.so ; then
-ac_cv_c_simlib="$gccpath/"
-SIMLIB="-L${ac_cv_c_simlib}"
-   AC_MSG_RESULT(${ac_cv_c_simlib})
-else
-AM_CONDITIONAL(PSIM, test x$psim = xno)
-   SIMLIB=""
-   AC_MSG_RESULT(none)
-dnl ac_cv_c_simlib=none
-fi
-fi
-AC_SUBST(SIMLIB)
-])
-
 dnl 
 dnl Find the libiberty library.
 AC_DEFUN([CYG_AC_PATH_LIBIBERTY], [
@@ -476,26 +394,6 @@ fi
 AC_SUBST(LIBIBERTY)
 ])
 
-dnl 
-AC_DEFUN([CYG_AC_PATH_DEVO], [
-AC_MSG_CHECKING(for devo headers in the source tree)
-dirlist=".. ../../ ../../../ ../../../../ ../../../../../ ../../../../../../ ../../../../../../.. ../../../../../../../.. ../../../../../../../../.. ../../../../../../../../../.."
-AC_CACHE_VAL(ac_cv_c_devoh,[
-for i in $dirlist; do
-if test -f "${srcdir}/$i/include/remote-sim.h" ; then
-   ac_cv_c_devoh=`(cd ${srcdir}/$i/include; ${PWDCMD-pwd})`
-fi
-done
-])
-if test x"${ac_cv_c_devoh}" != x; then
-DEVOHDIR="-I${ac_cv_c_devoh}"
-AC_MSG_RESULT(${ac_cv_c_devoh})
-else
-AC_MSG_RESULT(none)
-fi
-AC_SUBST(DEVOHDIR)
-])
-
 dnl 
 dnl Find all the ILU headers and libraries
 AC_DEFUN([CYG_AC_PATH_ILU], [
-- 
2.31.1



[PATCH] x86-64: Remove HAVE_LD_PIE_COPYRELOC

2021-05-11 Thread Fangrui Song via Gcc-patches
This was introduced in 2014-12 to use local binding for external symbols
for -fPIE. Now that we have had H.J. Lu's GOTPCRELX for years, which mostly
nullifies the benefit of HAVE_LD_PIE_COPYRELOC, HAVE_LD_PIE_COPYRELOC
should be retired.

One design goal of -fPIE was to avoid copy relocations.
HAVE_LD_PIE_COPYRELOC has deviated from the goal.  With this change, the
-fPIE behavior of x86-64 will be closer to x86-32 and other targets.

---

See https://gcc.gnu.org/legacy-ml/gcc/2019-05/msg00215.html for a list
of fixed and unfixed (e.g. gold incompatibility with protected
https://sourceware.org/bugzilla/show_bug.cgi?id=19823) issues.

If you prefer a longer write-up, see
https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected
---
 gcc/config.in |  6 ---
 gcc/config/i386/i386.c| 11 +---
 gcc/configure | 52 ---
 gcc/configure.ac  | 48 -
 gcc/doc/sourcebuild.texi  |  3 --
 .../gcc.target/i386/pie-copyrelocs-1.c| 14 -
 .../gcc.target/i386/pie-copyrelocs-2.c| 14 -
 .../gcc.target/i386/pie-copyrelocs-3.c| 14 -
 .../gcc.target/i386/pie-copyrelocs-4.c| 17 --
 gcc/testsuite/lib/target-supports.exp | 47 -
 10 files changed, 2 insertions(+), 224 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-1.c
 delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-2.c
 delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-3.c
 delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-4.c

diff --git a/gcc/config.in b/gcc/config.in
index e54f59ce0c3..a65bf5d4176 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1659,12 +1659,6 @@
 #endif
 
 
-/* Define 0/1 if your linker supports -pie option with copy reloc. */
-#ifndef USED_FOR_TARGET
-#undef HAVE_LD_PIE_COPYRELOC
-#endif
-
-
 /* Define if your PowerPC linker has .gnu.attributes long double support. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_LD_PPC_GNU_ATTR_LONG_DOUBLE
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 915f89f571a..5ec3c6fd0c9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10579,11 +10579,7 @@ legitimate_pic_address_disp_p (rtx disp)
return true;
}
  else if (!SYMBOL_REF_FAR_ADDR_P (op0)
-  && (SYMBOL_REF_LOCAL_P (op0)
-  || (HAVE_LD_PIE_COPYRELOC
-  && flag_pie
-  && !SYMBOL_REF_WEAK (op0)
-  && !SYMBOL_REF_FUNCTION_P (op0)))
+  && SYMBOL_REF_LOCAL_P (op0)
   && ix86_cmodel != CM_LARGE_PIC)
return true;
  break;
@@ -22892,10 +22888,7 @@ ix86_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
 static bool
 ix86_binds_local_p (const_tree exp)
 {
-  return default_binds_local_p_3 (exp, flag_shlib != 0, true, true,
- (!flag_pic
-  || (TARGET_64BIT
-  && HAVE_LD_PIE_COPYRELOC != 0)));
+  return default_binds_local_p_3 (exp, flag_shlib != 0, true, true, !flag_pic);
 }
 #endif
 
diff --git a/gcc/configure b/gcc/configure
index f03fe888384..c500f5ca11e 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -29968,58 +29968,6 @@ fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld_pie" >&5
 $as_echo "$gcc_cv_ld_pie" >&6; }
 
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking linker PIE support with copy reloc" >&5
-$as_echo_n "checking linker PIE support with copy reloc... " >&6; }
-gcc_cv_ld_pie_copyreloc=no
-if test $gcc_cv_ld_pie = yes ; then
-  if test $in_tree_ld = yes ; then
-if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 25 -o "$gcc_cv_gld_major_version" -gt 2; then
-  gcc_cv_ld_pie_copyreloc=yes
-fi
-  elif test x$gcc_cv_as != x -a x$gcc_cv_ld != x ; then
-# Check if linker supports -pie option with copy reloc
-case "$target" in
-i?86-*-linux* | x86_64-*-linux*)
-  cat > conftest1.s < conftest2.s < /dev/null 2>&1 \
- && $gcc_cv_ld -shared -melf_x86_64 -o conftest1.so conftest1.o > /dev/null 2>&1 \
- && $gcc_cv_as --64 -o conftest2.o conftest2.s > /dev/null 2>&1 \
- && $gcc_cv_ld -pie -melf_x86_64 -o conftest conftest2.o conftest1.so > /dev/null 2>&1; then
-gcc_cv_ld_pie_copyreloc=yes
-  fi
-  rm -f conftest conftest1.so conftest1.o conftest2.o conftest1.s conftest2.s
-  ;;
-esac
-  fi
-fi
-
-cat >>confdefs.h <<_ACEOF
-#define HAVE_LD_PIE_COPYRELOC `if test x"$gcc_cv_ld_pie_copyreloc" = xyes; then echo 1; else echo 0; fi`
-_ACEOF
-
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld_pie_copyreloc" >&5
-$as_echo "$gcc_cv_ld_pie_copyreloc" >&6; }
-
 { $as_echo 

[PATCH] c++: Check attributes on friend declarations [PR99032]

2021-05-11 Thread Marek Polacek via Gcc-patches
This patch implements [dcl.attr.grammar]/5: "If an attribute-specifier-seq
appertains to a friend declaration ([class.friend]), that declaration shall
be a definition."

This restriction only applies to C++11-style attributes.  There are
various forms of friend declarations: friend templates, C++11
extended friend declarations, and so on.  In some cases we already
ignore the attribute and warn that it was ignored.  But certain cases
weren't diagnosed, and with this patch we'll give a hard error.  I tried
hard not to emit both a warning and an error, and I think it worked out.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/99032
* decl.c (grokdeclarator): Diagnose when an attribute appertains to
a friend declaration that is not a definition.
* parser.c (cp_parser_elaborated_type_specifier): Likewise.
(cp_parser_member_declaration): Likewise.

gcc/testsuite/ChangeLog:

PR c++/99032
* g++.dg/cpp0x/friend7.C: New test.
---
 gcc/cp/decl.c|  4 +++
 gcc/cp/parser.c  | 15 +-
 gcc/testsuite/g++.dg/cpp0x/friend7.C | 41 
 3 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/friend7.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index bc3928d7f85..687a59d49e3 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -13741,6 +13741,10 @@ grokdeclarator (const cp_declarator *declarator,
 
if (friendp)
  {
+   if (attrlist && !funcdef_flag
+   && cxx11_attribute_p (*attrlist))
+ error_at (id_loc, "attribute appertains to a friend declaration "
+   "that is not a definition");
/* Friends are treated specially.  */
if (ctype == current_class_type)
  ;  /* We already issued a permerror.  */
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 0fe29c658d2..612ca4598b9 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -19764,11 +19764,15 @@ cp_parser_elaborated_type_specifier (cp_parser* parser,
   && ! processing_explicit_instantiation)
warning (OPT_Wattributes,
 "attributes ignored on template instantiation");
+  else if (is_friend && cxx11_attribute_p (attributes))
+   error ("attribute appertains to a friend declaration that is not "
+  "a definition");
   else if (is_declaration && cp_parser_declares_only_class_p (parser))
cplus_decl_attributes (&type, attributes, (int) ATTR_FLAG_TYPE_IN_PLACE);
   else
warning (OPT_Wattributes,
-"attributes ignored on elaborated-type-specifier that is not a forward declaration");
+"attributes ignored on elaborated-type-specifier that is "
+"not a forward declaration");
 }
 
   if (tag_type == enum_type)
@@ -26054,6 +26058,15 @@ cp_parser_member_declaration (cp_parser* parser)
 error_at (decl_spec_token_start->location,
   "friend declaration does not name a class or "
   "function");
+  /* Give an error if an attribute cannot appear here, as per
+ [dcl.attr.grammar]/5.  But not when declares_class_or_enum:
+ we ignore attributes in elaborated-type-specifiers.  */
+  else if (!declares_class_or_enum
+   && (cxx11_attribute_p (decl_specifiers.std_attributes)
+   || cxx11_attribute_p (decl_specifiers.attributes)))
+error_at (decl_spec_token_start->location,
+  "attribute appertains to a friend declaration "
+  "that is not a definition");
   else
 make_friend_class (current_class_type, type,
/*complain=*/true);
diff --git a/gcc/testsuite/g++.dg/cpp0x/friend7.C b/gcc/testsuite/g++.dg/cpp0x/friend7.C
new file mode 100644
index 000..4aa7b14cf7d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/friend7.C
@@ -0,0 +1,41 @@
+// PR c++/99032
+// { dg-do compile { target c++11 } }
+
+class X { };
+template
+void foo (T1, T2);
+
+struct S {
+  [[deprecated]] friend void f(); // { dg-error "attribute appertains" }
+  [[deprecated]] friend void f2() { }
+  __attribute__((deprecated)) friend void f3();
+  friend void f3 [[deprecated]] (); // { dg-error "attribute appertains" }
+  friend void f4 [[deprecated]] () { }
+  [[deprecated]] friend void; // { dg-error "attribute appertains" }
+  friend [[deprecated]] void; // { dg-error "attribute appertains" }
+  __attribute__((deprecated)) friend int;
+  friend __attribute__((deprecated)) int;
+  friend int __attribute__((deprecated));
+  [[deprecated]] friend X; // { dg-error "attribute appertains" }
+  [[deprecated]] friend class N; // { dg-warning "attribute ignored" }
+  friend class [[deprecated]] N2; // { 

retry zero-call-used-regs from zeroed regs

2021-05-11 Thread Alexandre Oliva


default_zero_call_used_regs currently requires all potentially zeroed
registers to offer a move opcode that accepts zero as an operand.

This is not the case e.g. for ARM's r12/ip in Thumb mode, and it was
not the case for FP registers on AArch64 as of GCC 10.

This patch introduces a fallback strategy to zero out registers,
copying from registers that have already been zeroed.  Adjacent
sources to make up wider modes are also supported.

This does not guarantee that there will be some zeroed-out register to
use as the source, but it expands the cases in which the default
implementation works out of the box.
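
The retry strategy described above can be sketched outside of GCC as a
small simulation.  This is only an illustration under stated assumptions:
the register file size, `struct toy_target`, and `toy_zero_regs` are all
hypothetical names, plain bitmasks stand in for HARD_REG_SET, and the
lookup tables stand in for emitting a move and checking it with
valid_insn_p.  Multi-register modes are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define NREGS 8  /* toy register file size (hypothetical) */

/* Toy target description: which registers accept a "load zero" move,
   and which register-to-register copies are valid.  Both tables are
   assumptions made for illustration only.  */
struct toy_target
{
  bool can_load_zero[NREGS];
  bool can_copy[NREGS][NREGS];  /* indexed as can_copy[dst][src] */
};

/* Try to zero every register in NEED.  Return the set of registers
   that could not be zeroed; 0 means every register was handled.  */
uint32_t
toy_zero_regs (const struct toy_target *t, uint32_t need)
{
  uint32_t failed = 0;
  bool progress = false;

  /* First pass: load zero directly where the target allows it.  */
  for (unsigned r = 0; r < NREGS; r++)
    if (need & (1u << r))
      {
        if (t->can_load_zero[r])
          progress = true;
        else
          failed |= 1u << r;
      }

  /* Later passes: copy from registers zeroed in an earlier pass, for
     as long as the previous pass zeroed at least one register.  */
  while (progress && failed)
    {
      uint32_t retrying = failed;
      failed = 0;
      progress = false;

      for (unsigned r = 0; r < NREGS; r++)
        if (retrying & (1u << r))
          {
            bool success = false;
            for (unsigned src = 0; src < NREGS && !success; src++)
              {
                /* SRC must be a register we were asked to zero and
                   must not still be waiting for a retry itself.  */
                if (!(need & (1u << src)) || (retrying & (1u << src)))
                  continue;
                success = t->can_copy[r][src];
              }
            if (success)
              progress = true;
            else
              failed |= 1u << r;
          }
    }

  return failed;
}
```

A register zeroed in one retry round only becomes a usable source in the
next round, which is why the loop keeps going while progress is made.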

This patch was regstrapped on x86_64-linux-gnu, where it is not supposed
to make any difference.  It was also tested with an older tree to which the
feature was backported, with flag_zero_call_used_regs defaulting to ALL
and to USED on x86_64-linux-gnu (bootstrap), arm-eabi, aarch64-elf, and
a few other targets that didn't hit problems before or after the patch.
ARM and AArch64 failed to build target libs in that tree without this
patch, and succeeded with it.  AArch64 wouldn't have required the patch
in mainline, because of improvements to movti.  Ok to install?


for  gcc/ChangeLog

* targhooks.c (default_zero_call_used_regs): Retry using
successfully-zeroed registers as sources.
---
 gcc/targhooks.c |   93 +++
 1 file changed, 86 insertions(+), 7 deletions(-)

diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 2e0fdb797e093..1947ef26fd644 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1001,6 +1001,13 @@ default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 {
   gcc_assert (!hard_reg_set_empty_p (need_zeroed_hardregs));
 
+  HARD_REG_SET failed;
+  CLEAR_HARD_REG_SET (failed);
+  bool progress = false;
+
+  /* First, try to zero each register in need_zeroed_hardregs by
+ loading a zero into it, taking note of any failures in
+ FAILED.  */
   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
   {
@@ -1010,16 +1017,88 @@ default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
rtx_insn *insn = emit_move_insn (regno_reg_rtx[regno], zero);
if (!valid_insn_p (insn))
  {
-   static bool issued_error;
-   if (!issued_error)
- {
-   issued_error = true;
-   sorry ("%qs not supported on this target",
-   "-fzero-call-used-regs");
- }
+   SET_HARD_REG_BIT (failed, regno);
delete_insns_since (last_insn);
  }
+   else
+ progress = true;
   }
+
+  /* Now retry with copies from zeroed registers, as long as we've
+ made some PROGRESS, and registers remain to be zeroed in
+ FAILED.  */
+  while (progress && !hard_reg_set_empty_p (failed))
+{
+  HARD_REG_SET retrying = failed;
+
+  CLEAR_HARD_REG_SET (failed);
+  progress = false;
+
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+   if (TEST_HARD_REG_BIT (retrying, regno))
+ {
+   machine_mode mode = GET_MODE (regno_reg_rtx[regno]);
+   bool success = false;
+   /* Look for a source.  */
+   for (unsigned int src = 0; src < FIRST_PSEUDO_REGISTER; src++)
+ {
+   /* If SRC hasn't been zeroed (yet?), skip it.  */
+   if (! TEST_HARD_REG_BIT (need_zeroed_hardregs, src))
+ continue;
+   if (TEST_HARD_REG_BIT (retrying, src))
+ continue;
+
+   /* Check that SRC can hold MODE, and that any other
+  registers needed to hold MODE in SRC have also been
+  zeroed.  */
+   if (!targetm.hard_regno_mode_ok (src, mode))
+ continue;
+   unsigned n = targetm.hard_regno_nregs (src, mode);
+   bool ok = true;
+   for (unsigned i = 1; ok && i < n; i++)
+ ok = (TEST_HARD_REG_BIT (need_zeroed_hardregs, src + i)
+   && !TEST_HARD_REG_BIT (retrying, src + i));
+   if (!ok)
+ continue;
+
+   /* SRC is usable, try to copy from it.  */
+   rtx_insn *last_insn = get_last_insn ();
+   rtx zsrc = gen_rtx_REG (mode, src);
+   rtx_insn *insn = emit_move_insn (regno_reg_rtx[regno], zsrc);
+   if (!valid_insn_p (insn))
+ /* It didn't work, remove any inserts.  We'll look
+for another SRC.  */
+ delete_insns_since (last_insn);
+   else
+ {
+   /* We're done for REGNO.  */
+   success = true;
+   break;
+ }
+ }
+
+   /* If nothing worked for REGNO this round, marked it to be
+  retried if we get 

[committed] preprocessor: Support C2X #elifdef, #elifndef

2021-05-11 Thread Joseph Myers
C2X adds #elifdef and #elifndef preprocessor directives; these have
also been proposed for C++.  Implement these directives in libcpp
accordingly.

In this implementation, #elifdef and #elifndef are treated as
non-directives for any language version other than c2x and gnu2x (if
the feature is accepted for C++, it can trivially be enabled for
relevant C++ versions).  In strict conformance modes for prior
language versions, this is required, as illustrated by the
c11-elifdef-1.c test added.
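
As a rough illustration of what the new directives abbreviate, the sketch
below selects a code path with the long-standing `#elif defined` spelling
and notes the C2X spelling in comments, so that it also compiles with
preprocessors that lack the feature.  The macro names HAVE_FAST_PATH,
HAVE_SLOW_PATH and SELECTED_PATH, and the function selected_path, are
hypothetical and do not come from the patch.

```c
/* Feature macros chosen for illustration only.  */
#define HAVE_FAST_PATH
#undef HAVE_SLOW_PATH

#if 0
#elif defined (HAVE_FAST_PATH)    /* C2X: #elifdef HAVE_FAST_PATH */
#define SELECTED_PATH 1
#elif !defined (HAVE_SLOW_PATH)   /* C2X: #elifndef HAVE_SLOW_PATH */
#define SELECTED_PATH 2
#else
#define SELECTED_PATH 0
#endif

/* Report which conditional group was kept by the preprocessor.  */
int
selected_path (void)
{
  return SELECTED_PATH;
}
```

Only the first group whose condition holds is kept, so the second branch
is skipped here even though HAVE_SLOW_PATH is undefined.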

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  Applied to mainline.

libcpp/
* include/cpplib.h (struct cpp_options): Add elifdef.
* init.c (struct lang_flags): Add elifdef.
(lang_defaults): Update to include elifdef initializers.
(cpp_set_lang): Set elifdef for pfile based on language.
* directives.c (STDC2X, ELIFDEF): New macros.
(EXTENSION): Increase value to 3.
(DIRECTIVE_TABLE): Add #elifdef and #elifndef.
(_cpp_handle_directive): Do not treat ELIFDEF directives as
directives for language versions without the #elifdef feature.
(do_elif): Handle #elifdef and #elifndef.
(do_elifdef, do_elifndef): New functions.

gcc/testsuite/
* gcc.dg/cpp/c11-elifdef-1.c, gcc.dg/cpp/c2x-elifdef-1.c,
gcc.dg/cpp/c2x-elifdef-2.c: New tests.

diff --git a/gcc/testsuite/gcc.dg/cpp/c11-elifdef-1.c b/gcc/testsuite/gcc.dg/cpp/c11-elifdef-1.c
new file mode 100644
index 000..2d5809a8378
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/cpp/c11-elifdef-1.c
@@ -0,0 +1,16 @@
+/* Test #elifdef and #elifndef not in C11.  */
+/* { dg-do preprocess } */
+/* { dg-options "-std=c11 -pedantic-errors" } */
+
+#define A
+#undef B
+
+#if 0
+#elifdef A
+#error "#elifdef A applied"
+#endif
+
+#if 0
+#elifndef B
+#error "#elifndef B applied"
+#endif
diff --git a/gcc/testsuite/gcc.dg/cpp/c2x-elifdef-1.c b/gcc/testsuite/gcc.dg/cpp/c2x-elifdef-1.c
new file mode 100644
index 000..b23e3117daf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/cpp/c2x-elifdef-1.c
@@ -0,0 +1,57 @@
+/* Test #elifdef and #elifndef in C2x.  */
+/* { dg-do preprocess } */
+/* { dg-options "-std=c2x -pedantic-errors" } */
+
+#define A
+#undef B
+
+#if 0
+#elifdef A
+#define M1 1
+#endif
+
+#if M1 != 1
+#error "#elifdef A did not apply"
+#endif
+
+#if 0
+#elifdef B
+#error "#elifdef B applied"
+#endif
+
+#if 0
+#elifndef A
+#error "#elifndef A applied"
+#endif
+
+#if 0
+#elifndef B
+#define M2 2
+#endif
+
+#if M2 != 2
+#error "#elifndef B did not apply"
+#endif
+
+#if 0
+#elifdef A
+#else
+#error "#elifdef A did not apply"
+#endif
+
+#if 0
+#elifndef B
+#else
+#error "#elifndef B did not apply"
+#endif
+
+/* As with #elif, the syntax of the new directives is relaxed after a
+   non-skipped group.  */
+
+#if 1
+#elifdef x * y
+#endif
+
+#if 1
+#elifndef !
+#endif
diff --git a/gcc/testsuite/gcc.dg/cpp/c2x-elifdef-2.c b/gcc/testsuite/gcc.dg/cpp/c2x-elifdef-2.c
new file mode 100644
index 000..9132832416d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/cpp/c2x-elifdef-2.c
@@ -0,0 +1,63 @@
+/* Test #elifdef and #elifndef in C2x: erroneous usages.  */
+/* { dg-do preprocess } */
+/* { dg-options "-std=c2x -pedantic-errors" } */
+
+#define A
+#undef B
+
+#elifdef A /* { dg-error "#elifdef without #if" } */
+#elifdef B /* { dg-error "#elifdef without #if" } */
+#elifndef A /* { dg-error "#elifndef without #if" } */
+#elifndef B /* { dg-error "#elifndef without #if" } */
+
+#if 1 /* { dg-error "-:began here" } */
+#else
+#elifdef A /* { dg-error "#elifdef after #else" } */
+#endif
+
+#if 1 /* { dg-error "-:began here" } */
+#else
+#elifdef B /* { dg-error "#elifdef after #else" } */
+#endif
+
+#if 1 /* { dg-error "-:began here" } */
+#else
+#elifndef A /* { dg-error "#elifndef after #else" } */
+#endif
+
+#if 1 /* { dg-error "-:began here" } */
+#else
+#elifndef B /* { dg-error "#elifndef after #else" } */
+#endif
+
+#if 0
+#elifdef A = /* { dg-error "extra tokens at end of #elifdef directive" } */
+#endif
+
+#if 0
+#elifdef B = /* { dg-error "extra tokens at end of #elifdef directive" } */
+#endif
+
+#if 0
+#elifndef A = /* { dg-error "extra tokens at end of #elifndef directive" } */
+#endif
+
+#if 0
+#elifndef B = /* { dg-error "extra tokens at end of #elifndef directive" } */
+#endif
+
+#if 0
+#elifdef /* { dg-error "no macro name given in #elifdef directive" } */
+#endif
+
+#if 0
+#elifndef /* { dg-error "no macro name given in #elifndef directive" } */
+#endif
+
+#if 0
+#elifdef , /* { dg-error "macro names must be identifiers" } */
+#endif
+
+#if 0
+#elifndef , /* { dg-error "macro names must be identifiers" } */
+#endif
diff --git a/libcpp/directives.c b/libcpp/directives.c
index 795f93e664b..261a584c550 100644
--- a/libcpp/directives.c
+++ b/libcpp/directives.c
@@ -56,10 +56,12 @@ struct pragma_entry
 
 /* Values for the origin field of struct directive.  KANDR directives
come from traditional (K&R) C.  STDC89 directives come from the
-   1989 C standard.  EXTENSION 

[PATCH v2 06/11] x86: Add tests for piecewise move and store

2021-05-11 Thread H.J. Lu via Gcc-patches
* gcc.target/i386/pieces-memcpy-10.c: New test.
* gcc.target/i386/pieces-memcpy-11.c: Likewise.
* gcc.target/i386/pieces-memcpy-12.c: Likewise.
* gcc.target/i386/pieces-memcpy-13.c: Likewise.
* gcc.target/i386/pieces-memcpy-14.c: Likewise.
* gcc.target/i386/pieces-memcpy-15.c: Likewise.
* gcc.target/i386/pieces-memcpy-16.c: Likewise.
* gcc.target/i386/pieces-memcpy-17.c: Likewise.
* gcc.target/i386/pieces-memcpy-18.c: Likewise.
* gcc.target/i386/pieces-memcpy-19.c: Likewise.
* gcc.target/i386/pieces-memset-1.c: Likewise.
* gcc.target/i386/pieces-memset-2.c: Likewise.
* gcc.target/i386/pieces-memset-3.c: Likewise.
* gcc.target/i386/pieces-memset-4.c: Likewise.
* gcc.target/i386/pieces-memset-5.c: Likewise.
* gcc.target/i386/pieces-memset-6.c: Likewise.
* gcc.target/i386/pieces-memset-7.c: Likewise.
* gcc.target/i386/pieces-memset-8.c: Likewise.
* gcc.target/i386/pieces-memset-9.c: Likewise.
* gcc.target/i386/pieces-memset-10.c: Likewise.
* gcc.target/i386/pieces-memset-11.c: Likewise.
* gcc.target/i386/pieces-memset-12.c: Likewise.
* gcc.target/i386/pieces-memset-13.c: Likewise.
* gcc.target/i386/pieces-memset-14.c: Likewise.
* gcc.target/i386/pieces-memset-15.c: Likewise.
* gcc.target/i386/pieces-memset-16.c: Likewise.
* gcc.target/i386/pieces-memset-17.c: Likewise.
* gcc.target/i386/pieces-memset-18.c: Likewise.
* gcc.target/i386/pieces-memset-19.c: Likewise.
* gcc.target/i386/pieces-memset-20.c: Likewise.
* gcc.target/i386/pieces-memset-21.c: Likewise.
* gcc.target/i386/pieces-memset-22.c: Likewise.
* gcc.target/i386/pieces-memset-23.c: Likewise.
* gcc.target/i386/pieces-memset-24.c: Likewise.
* gcc.target/i386/pieces-memset-25.c: Likewise.
* gcc.target/i386/pieces-memset-26.c: Likewise.
* gcc.target/i386/pieces-memset-27.c: Likewise.
* gcc.target/i386/pieces-memset-28.c: Likewise.
* gcc.target/i386/pieces-memset-29.c: Likewise.
* gcc.target/i386/pieces-memset-30.c: Likewise.
* gcc.target/i386/pieces-memset-31.c: Likewise.
* gcc.target/i386/pieces-memset-32.c: Likewise.
* gcc.target/i386/pieces-memset-33.c: Likewise.
* gcc.target/i386/pieces-memset-34.c: Likewise.
* gcc.target/i386/pieces-memset-35.c: Likewise.
* gcc.target/i386/pieces-memset-36.c: Likewise.
* gcc.target/i386/pieces-memset-37.c: Likewise.
* gcc.target/i386/pieces-memset-38.c: Likewise.
* gcc.target/i386/pieces-memset-39.c: Likewise.
* gcc.target/i386/pieces-memset-40.c: Likewise.
* gcc.target/i386/pieces-memset-41.c: Likewise.
* gcc.target/i386/pieces-memset-42.c: Likewise.
* gcc.target/i386/pieces-memset-43.c: Likewise.
* gcc.target/i386/pieces-memset-44.c: Likewise.
---
 .../gcc.target/i386/pieces-memcpy-10.c | 16 
 .../gcc.target/i386/pieces-memcpy-11.c | 17 +
 .../gcc.target/i386/pieces-memcpy-12.c | 16 
 .../gcc.target/i386/pieces-memcpy-13.c | 16 
 .../gcc.target/i386/pieces-memcpy-14.c | 17 +
 .../gcc.target/i386/pieces-memcpy-15.c | 16 
 .../gcc.target/i386/pieces-memcpy-16.c | 16 
 .../gcc.target/i386/pieces-memcpy-7.c  | 15 +++
 .../gcc.target/i386/pieces-memcpy-8.c  | 14 ++
 .../gcc.target/i386/pieces-memcpy-9.c  | 14 ++
 .../gcc.target/i386/pieces-memset-1.c  | 16 
 .../gcc.target/i386/pieces-memset-10.c | 16 
 .../gcc.target/i386/pieces-memset-11.c | 16 
 .../gcc.target/i386/pieces-memset-12.c | 16 
 .../gcc.target/i386/pieces-memset-13.c | 16 
 .../gcc.target/i386/pieces-memset-14.c | 16 
 .../gcc.target/i386/pieces-memset-15.c | 16 
 .../gcc.target/i386/pieces-memset-16.c | 16 
 .../gcc.target/i386/pieces-memset-17.c | 16 
 .../gcc.target/i386/pieces-memset-18.c | 16 
 .../gcc.target/i386/pieces-memset-19.c | 17 +
 .../gcc.target/i386/pieces-memset-2.c  | 12 
 .../gcc.target/i386/pieces-memset-20.c | 17 +
 .../gcc.target/i386/pieces-memset-21.c | 17 +
 .../gcc.target/i386/pieces-memset-22.c | 17 +
 .../gcc.target/i386/pieces-memset-23.c | 17 +
 .../gcc.target/i386/pieces-memset-24.c | 17 +
 .../gcc.target/i386/pieces-memset-25.c | 17 +
 

[PATCH v2 11/11] constructor: Check if it is faster to load constant from memory

2021-05-11 Thread H.J. Lu via Gcc-patches
When expanding a constant constructor, don't call expand_constructor if
it is more efficient to load the data from the memory via move by pieces.
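
The size check this patch adds can be sketched as a standalone predicate.
This is a simplified model, not the GCC code: `struct toy_target_sizes`
and `toy_prefer_memory_load` are hypothetical names, the two size fields
stand in for the UNITS_PER_WORD and MOVE_MAX_PIECES target macros, and
the boolean parameter stands in for the result of can_move_by_pieces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the target macros consulted by the patch.  */
struct toy_target_sizes
{
  size_t units_per_word;   /* bytes per word */
  size_t move_max_pieces;  /* widest by-pieces move, in bytes */
};

/* Return true when a constant of BYTES bytes should be loaded from
   memory instead of being expanded element by element: it spans more
   than two words, the target can move pieces wider than a word, and
   a by-pieces move is possible at all.  */
bool
toy_prefer_memory_load (size_t bytes, const struct toy_target_sizes *t,
                        bool can_move_by_pieces)
{
  return (bytes / t->units_per_word) > 2
         && t->move_max_pieces > t->units_per_word
         && can_move_by_pieces;
}

/* Same layout as the struct used in pr90773-24.c.  */
struct S
{
  long long s1 __attribute__ ((aligned (8)));
  unsigned s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
};
```

With an assumed 8-byte word and 16-byte pieces, the 64-byte struct S
passes the check, which matches the four 16-byte stores that
pr90773-24.c scans for.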

gcc/

PR middle-end/90773
* expr.c (expand_expr_real_1): Don't call expand_constructor if
it is more efficient to load the data from the memory.

gcc/testsuite/

PR middle-end/90773
* gcc.target/i386/pr90773-24.c: New test.
* gcc.target/i386/pr90773-25.c: Likewise.
---
 gcc/expr.c | 10 ++
 gcc/testsuite/gcc.target/i386/pr90773-24.c | 22 ++
 gcc/testsuite/gcc.target/i386/pr90773-25.c | 20 
 3 files changed, 52 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-24.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-25.c

diff --git a/gcc/expr.c b/gcc/expr.c
index 42ef5bdf5d5..6ad7265702e 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -10885,6 +10885,16 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
unsigned HOST_WIDE_INT ix;
tree field, value;
 
+   /* Check if it is more efficient to load the data from
+  the memory directly.  FIXME: How many stores do we
+  need here if not moved by pieces?  */
+   unsigned HOST_WIDE_INT bytes
+ = tree_to_uhwi (TYPE_SIZE_UNIT (type));
+   if ((bytes / UNITS_PER_WORD) > 2
+   && MOVE_MAX_PIECES > UNITS_PER_WORD
+   && can_move_by_pieces (bytes, TYPE_ALIGN (type)))
+ goto normal_inner_ref;
+
FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (init), ix,
  field, value)
  if (tree_int_cst_equal (field, index))
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-24.c b/gcc/testsuite/gcc.target/i386/pr90773-24.c
new file mode 100644
index 000..4a4b62533dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-24.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+
+struct S
+{
+  long long s1 __attribute__ ((aligned (8)));
+  unsigned s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+};
+
+const struct S array[] = {
+  { 0, 60, 640, 2112543726, 39682, 48, 16, 33, 10, 96, 2, 0, 0, 4 }
+};
+
+void
+foo (struct S *x)
+{
+  x[0] = array[0];
+}
+/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 16\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 32\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 48\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-25.c b/gcc/testsuite/gcc.target/i386/pr90773-25.c
new file mode 100644
index 000..2520b670989
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-25.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+struct S
+{
+  long long s1 __attribute__ ((aligned (8)));
+  unsigned s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+};
+
+const struct S array[] = {
+  { 0, 60, 640, 2112543726, 39682, 48, 16, 33, 10, 96, 2, 0, 0, 4 }
+};
+
+void
+foo (struct S *x)
+{
+  x[0] = array[0];
+}
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 32\\(%\[\^,\]+\\)" 1 } } */
-- 
2.31.1



[PATCH v2 10/11] x86: Update gcc.target/i386/incoming-11.c

2021-05-11 Thread H.J. Lu via Gcc-patches
Expect no stack realignment since we no longer realign stack when
copying data.

* gcc.target/i386/incoming-11.c: Expect no stack realignment.
---
 gcc/testsuite/gcc.target/i386/incoming-11.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/incoming-11.c b/gcc/testsuite/gcc.target/i386/incoming-11.c
index a830c96f7d1..4b822684b88 100644
--- a/gcc/testsuite/gcc.target/i386/incoming-11.c
+++ b/gcc/testsuite/gcc.target/i386/incoming-11.c
@@ -15,4 +15,4 @@ void f()
for (i = 0; i < 100; i++) q[i] = 1;
 }
 
-/* { dg-final { scan-assembler "andl\[\\t \]*\\$-16,\[\\t \]*%esp" } } */
+/* { dg-final { scan-assembler-not "andl\[\\t \]*\\$-16,\[\\t \]*%esp" } } */
-- 
2.31.1



[PATCH v2 09/11] x86: Also pass -mno-avx to sw-1.c for ia32

2021-05-11 Thread H.J. Lu via Gcc-patches
Also pass -mno-avx to sw-1.c for ia32 since copying data with YMM or ZMM
registers disables shrink-wrapping when the second argument is passed on
stack.

* gcc.target/i386/sw-1.c: Also pass -mno-avx for ia32.
---
 gcc/testsuite/gcc.target/i386/sw-1.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/i386/sw-1.c b/gcc/testsuite/gcc.target/i386/sw-1.c
index aec095eda62..a9c89fca4ec 100644
--- a/gcc/testsuite/gcc.target/i386/sw-1.c
+++ b/gcc/testsuite/gcc.target/i386/sw-1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mtune=generic -fshrink-wrap -fdump-rtl-pro_and_epilogue" } */
+/* { dg-additional-options "-mno-avx" { target ia32 } } */
 /* { dg-skip-if "No shrink-wrapping preformed" { x86_64-*-mingw* } } */
 
 #include 
-- 
2.31.1



[PATCH v2 08/11] x86: Also pass -mno-avx to cold-attribute-1.c

2021-05-11 Thread H.J. Lu via Gcc-patches
Also pass -mno-avx to cold-attribute-1.c to avoid copying data with YMM or
ZMM registers.

* gcc.target/i386/cold-attribute-1.c: Also pass -mno-avx.
---
 gcc/testsuite/gcc.target/i386/cold-attribute-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/cold-attribute-1.c b/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
index 57666ac60b6..658eb3e25bb 100644
--- a/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
+++ b/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -mno-avx" } */
 #include 
 static inline
 __attribute__ ((cold)) void
-- 
2.31.1



[PATCH v2 05/11] x86: Add AVX2 tests for PR middle-end/90773

2021-05-11 Thread H.J. Lu via Gcc-patches
PR middle-end/90773
* gcc.target/i386/pr90773-20.c: New test.
* gcc.target/i386/pr90773-21.c: Likewise.
* gcc.target/i386/pr90773-22.c: Likewise.
* gcc.target/i386/pr90773-23.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr90773-20.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-21.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-22.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-23.c | 13 +
 4 files changed, 52 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-22.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-23.c

diff --git a/gcc/testsuite/gcc.target/i386/pr90773-20.c 
b/gcc/testsuite/gcc.target/i386/pr90773-20.c
new file mode 100644
index 000..e61e405f2b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-20.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+  __builtin_memset (dst, c, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 
\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } 
} */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-21.c 
b/gcc/testsuite/gcc.target/i386/pr90773-21.c
new file mode 100644
index 000..16ad17f3cbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-21.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+  __builtin_memset (dst, c, 34);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 
\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movw\[\\t \]%.*, 32\\(%\[\^,\]+\\)" 1 } 
} */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-22.c 
b/gcc/testsuite/gcc.target/i386/pr90773-22.c
new file mode 100644
index 000..45a8ff65a84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-22.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 
\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } 
} */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-23.c 
b/gcc/testsuite/gcc.target/i386/pr90773-23.c
new file mode 100644
index 000..9256ce10ff0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-23.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 34);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 
\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movw\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } 
} */
-- 
2.31.1
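
For illustration only (not part of the patch): the four tests above expect a 33- or 34-byte memset to become one 32-byte YMM store plus one 1- or 2-byte store at offset 32. A minimal sketch of that kind of greedy, non-overlapping split is below. The names `Piece` and `split_by_pieces` are hypothetical, and GCC's real by-pieces code is more involved (it may choose different modes or overlapping stores in other situations).

```cpp
#include <cstddef>
#include <vector>

// One store produced by the sketch: byte offset and width in bytes.
struct Piece {
  std::size_t offset;
  std::size_t size;
};

// Greedy, non-overlapping split of `len` bytes into power-of-two pieces
// no wider than `max_piece`.  Hypothetical helper, not a GCC function.
// `max_piece` is assumed to be a nonzero power of two.
std::vector<Piece> split_by_pieces(std::size_t len, std::size_t max_piece) {
  std::vector<Piece> pieces;
  std::size_t offset = 0;
  // Try the widest piece first, then halve the width until it reaches 1.
  for (std::size_t width = max_piece; width > 0; width /= 2) {
    while (len - offset >= width) {
      pieces.push_back({offset, width});
      offset += width;
    }
  }
  return pieces;
}
```

With `max_piece` set to 32, a length of 33 yields a 32-byte piece at offset 0 and a 1-byte piece at offset 32, matching the `vmovdqu` plus `movb` pattern the tests scan for.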



[PATCH v2 07/11] x86: Also pass -mno-avx to pr72839.c

2021-05-11 Thread H.J. Lu via Gcc-patches
Also pass -mno-avx to pr72839.c to avoid copying data with YMM or ZMM
registers.

* gcc.target/i386/pr72839.c: Also pass -mno-avx.
---
 gcc/testsuite/gcc.target/i386/pr72839.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr72839.c 
b/gcc/testsuite/gcc.target/i386/pr72839.c
index ea724f70377..6888d9d0a55 100644
--- a/gcc/testsuite/gcc.target/i386/pr72839.c
+++ b/gcc/testsuite/gcc.target/i386/pr72839.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target ia32 } */
-/* { dg-options "-O2 -mtune=lakemont" } */
+/* { dg-options "-O2 -mtune=lakemont -mno-avx" } */
 
 extern char *strcpy (char *, const char *);
 
-- 
2.31.1



[PATCH v2 01/11] Add TARGET_READ_MEMSET_VALUE/TARGET_GEN_MEMSET_VALUE

2021-05-11 Thread H.J. Lu via Gcc-patches
Add TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE to support
target instructions to duplicate QImode value to TImode/OImode/XImode
value for memset.  Define SCRATCH_SSE_REG as a scratch register for
ix86_gen_memset_value.
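
As a rough scalar analogue of "duplicate a QImode value to a wider mode" (a sketch only, not code from this patch): a single byte can be replicated across a 64-bit integer by multiplying with 0x0101010101010101. The function name `broadcast_byte` is hypothetical; the point of the new hooks is that a target can use its own vector broadcast instructions instead.

```cpp
#include <cstdint>

// Replicate one byte into all eight bytes of a 64-bit value.
// Hypothetical helper for illustration; each partial product lands in
// its own byte lane, so no carries occur.
std::uint64_t broadcast_byte(unsigned char c) {
  return static_cast<std::uint64_t>(c) * 0x0101010101010101ULL;
}
```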

gcc/

PR middle-end/90773
* builtins.c (builtin_memset_read_str): Call
targetm.read_memset_value.
(builtin_memset_gen_str): Call targetm.gen_memset_value.
* target.def (read_memset_value): New hook.
(gen_memset_value): Likewise.
* targhooks.c: Include "builtins.h".
(default_read_memset_value): New function.
(default_gen_memset_value): Likewise.
* targhooks.h (default_read_memset_value): New prototype.
(default_gen_memset_value): Likewise.
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
Make it global.
* config/i386/i386-protos.h (ix86_minimum_incoming_stack_boundary):
New.
(ix86_expand_vector_init_duplicate): Likewise.
* config/i386/i386.c (ix86_minimum_incoming_stack_boundary): Add
an argument to ignore stack_alignment_estimated.  It is passed
as false by default.
(ix86_gen_memset_value_from_prev): New function.
(ix86_gen_memset_value): Likewise.
(ix86_read_memset_value): Likewise.
(TARGET_GEN_MEMSET_VALUE): New.
(TARGET_READ_MEMSET_VALUE): Likewise.
* config/i386/i386.h (SCRATCH_SSE_REG): New.
* doc/tm.texi.in: Add TARGET_READ_MEMSET_VALUE and
TARGET_GEN_MEMSET_VALUE hooks.
* doc/tm.texi: Regenerated.

gcc/testsuite/

PR middle-end/90773
* gcc.target/i386/pr90773-15.c: New test.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr90773-18.c: Likewise.
* gcc.target/i386/pr90773-19.c: Likewise.
---
 gcc/builtins.c |  47 +---
 gcc/config/i386/i386-expand.c  |   2 +-
 gcc/config/i386/i386-protos.h  |   5 +
 gcc/config/i386/i386.c | 268 -
 gcc/config/i386/i386.h |   4 +
 gcc/doc/tm.texi|  16 ++
 gcc/doc/tm.texi.in |   4 +
 gcc/expr.c |   1 -
 gcc/target.def |  20 ++
 gcc/targhooks.c|  56 +
 gcc/targhooks.h|   4 +
 gcc/testsuite/gcc.target/i386/pr90773-15.c |  14 ++
 gcc/testsuite/gcc.target/i386/pr90773-16.c |  14 ++
 gcc/testsuite/gcc.target/i386/pr90773-17.c |  14 ++
 gcc/testsuite/gcc.target/i386/pr90773-18.c |  15 ++
 gcc/testsuite/gcc.target/i386/pr90773-19.c |  14 ++
 16 files changed, 449 insertions(+), 49 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-19.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 2f0efae11e8..6951f2d3633 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6584,24 +6584,11 @@ expand_builtin_strncpy (tree exp, rtx target)
previous iteration.  */
 
 rtx
-builtin_memset_read_str (void *data, void *prevp,
+builtin_memset_read_str (void *data, void *prev,
 HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
 scalar_int_mode mode)
 {
-  by_pieces_prev *prev = (by_pieces_prev *) prevp;
-  if (prev != nullptr && prev->data != nullptr)
-{
-  /* Use the previous data in the same mode.  */
-  if (prev->mode == mode)
-   return prev->data;
-}
-
-  const char *c = (const char *) data;
-  char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode));
-
-  memset (p, *c, GET_MODE_SIZE (mode));
-
-  return c_readstr (p, mode);
+  return targetm.read_memset_value ((const char *) data, prev, mode);
 }
 
 /* Callback routine for store_by_pieces.  Return the RTL of a register
@@ -6611,37 +6598,11 @@ builtin_memset_read_str (void *data, void *prevp,
nullptr, it has the RTL info from the previous iteration.  */
 
 static rtx
-builtin_memset_gen_str (void *data, void *prevp,
+builtin_memset_gen_str (void *data, void *prev,
HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
scalar_int_mode mode)
 {
-  rtx target, coeff;
-  size_t size;
-  char *p;
-
-  by_pieces_prev *prev = (by_pieces_prev *) prevp;
-  if (prev != nullptr && prev->data != nullptr)
-{
-  /* Use the previous data in the same mode.  */
-  if (prev->mode == mode)
-   return prev->data;
-
-  target = simplify_gen_subreg (mode, prev->data, prev->mode, 0);
-  if (target != nullptr)
-   return target;
-}
-
-  size = GET_MODE_SIZE (mode);
-  if (size == 1)
-return (rtx) data;
-
-  p = XALLOCAVEC (char, 

[PATCH v2 00/11] Allow TImode/OImode/XImode in op_by_pieces operations

2021-05-11 Thread H.J. Lu via Gcc-patches
1. Add TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE to support
target instructions to duplicate QImode value to TImode/OImode/XImode
value for memset.
2. x86: Avoid stack realignment when copying data
3. x86: Remove MAX_BITSIZE_MODE_ANY_INT.  Only the x86 backend defines it.
4. x86: Use TImode/OImode/XImode integers for piecewise move and store.
5. x86: Add tests for TImode/OImode/XImode for piecewise move and store.
6. x86: Adjust existing tests.

On x86-64, SPEC CPU 2017 performance impact is neutral.  Glibc code size
differences with -O2 build are:

         Before   After
libc.so  1906572  1906444

Some code sequence differences in libc.so are:

:
...
jne                              | jne

test   %r15,%r15                   test   %r15,%r15
je                               | je

mov    %r13d,(%r14)                mov    %r13d,(%r14)
lea    0x10(%r14),%rdi             lea    0x10(%r14),%rdi
mov    $0x1,%ecx                   mov    $0x1,%ecx
mov    %r13d,%edx                  mov    %r13d,%edx
mov    %r15,0x40(%r12)             mov    %r15,0x40(%r12)
mov    %r15,%rsi                   mov    %r15,%rsi
call                               call

lea    0xa2f9b(%rip),%rax #      | lea    0xa2fab(%rip),%rax #
xor    %esi,%esi                   xor    %esi,%esi
mov    %ebp,%edi                   mov    %ebp,%edi
mov    %rax,0x8(%r12)              mov    %rax,0x8(%r12)
movzwl 0x12(%rsp),%eax             movzwl 0x12(%rsp),%eax
mov    $0x8,%edx                 <
lea    0xc(%rsp),%rcx              lea    0xc(%rsp),%rcx
mov    %r14,0x48(%r12)           <
add    $0x40,%r14                <
mov    $0x4,%r8d                   mov    $0x4,%r8d
                                 > movq   $0x0,0x1d0(%r14)
                                 > mov    $0x8,%edx
rol    $0x8,%ax                    rol    $0x8,%ax
mov    %ebp,(%r12)               | mov    %r14,0x48(%r12)
movq   $0x0,0x190(%r14)          | add    $0x40,%r14
mov    %ax,0x4(%r12)             <
mov    %r14,0x30(%r12)             mov    %r14,0x30(%r12)
                                 > mov    %ax,0x4(%r12)
                                 > mov    %ebp,(%r12)
movl   $0x1,0xc(%rsp)              movl   $0x1,0xc(%rsp)
call                               call

mov    %r12,%rdi                   mov    %r12,%rdi
movabs $0x101010101010101,%rdx   <
test   %eax,%eax                   test   %eax,%eax
mov    $0xff,%eax                  mov    $0xff,%eax
cmove  %eax,%ebx                   cmove  %eax,%ebx
movzbl %bl,%eax                  | movd   %ebx,%xmm0
mov    %ebx,0xc(%rsp)              mov    %ebx,0xc(%rsp)
mov    %rax,%rsi                 | punpcklbw %xmm0,%xmm0
imul   %rdx,%rsi                 | punpcklwd %xmm0,%xmm0
mul    %rdx                      | pshufd $0x0,%xmm0,%xmm0
add    %rsi,%rdx                 | movups %xmm0,0x50(%r12)
mov    %rax,0x50(%r12)           | movups %xmm0,0x60(%r12)
mov    %rdx,0x58(%r12)           | movups %xmm0,0x70(%r12)
mov    %rax,0x60(%r12)           | movups %xmm0,0x80(%r12)
mov    %rdx,0x68(%r12)           | movups %xmm0,0x90(%r12)
mov    %rax,0x70(%r12)           | movups %xmm0,0xa0(%r12)
mov    %rdx,0x78(%r12)           | movups %xmm0,0xb0(%r12)
mov    %rax,0x80(%r12)           | movups %xmm0,0xc0(%r12)
mov

[PATCH v2 04/11] x86: Update piecewise move and store

2021-05-11 Thread H.J. Lu via Gcc-patches
We can use TImode/OImode/XImode integers for piecewise move and store.
When a vector register is used for piecewise move and store, we don't
increase stack_alignment_needed since no vector register spill is
required for piecewise move and store.  Since stack_realign_needed is
set to true by checking stack_alignment_estimated, which is set by pseudo
vector register usage, we also need to check stack_realign_needed to
eliminate the frame pointer.
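
A sketch of how the new MOVE_MAX_PIECES value is selected, written as an ordinary function for readability. It mirrors the macro in the i386.h hunk of this patch, but `TargetFlags` and `move_max_pieces` are hypothetical names standing in for the TARGET_* macros and UNITS_PER_WORD; treat it as an illustration rather than the backend's code.

```cpp
// Hypothetical stand-in for the TARGET_* feature and tuning macros.
struct TargetFlags {
  bool avx512f = false;
  bool prefer_avx256 = false;
  bool avx = false;
  bool prefer_avx128 = false;
  bool avx256_split_unaligned_load = false;
  bool avx256_split_unaligned_store = false;
  bool sse2 = false;
  bool sse_unaligned_load_optimal = false;
  bool sse_unaligned_store_optimal = false;
  int units_per_word = 8;  // assumption: 64-bit target
};

// Widest piece, in bytes, used for piecewise moves under these flags.
int move_max_pieces(const TargetFlags& t) {
  if (t.avx512f && !t.prefer_avx256)
    return 64;  // ZMM-sized pieces
  if (t.avx && !t.prefer_avx128
      && !t.avx256_split_unaligned_load
      && !t.avx256_split_unaligned_store)
    return 32;  // YMM-sized pieces
  if (t.sse2 && t.sse_unaligned_load_optimal
      && t.sse_unaligned_store_optimal)
    return 16;  // XMM-sized pieces
  return t.units_per_word;
}
```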

gcc/

* config/i386/i386.c (ix86_finalize_stack_frame_flags): Also
check stack_realign_needed for stack realignment.
(ix86_legitimate_constant_p): Always allow CONST_WIDE_INT smaller
than the largest integer supported by vector register.
* config/i386/i386.h (MOVE_MAX): Set to 64.
(MOVE_MAX_PIECES): Set to bytes of the largest integer supported
by vector register.
(STORE_MAX_PIECES): New.

gcc/testsuite/

* gcc.target/i386/pr90773-1.c: Adjust to expect movq for 32-bit.
* gcc.target/i386/pr90773-4.c: Also run for 32-bit.
* gcc.target/i386/pr90773-14.c: Likewise.
* gcc.target/i386/pr90773-15.c: Likewise.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
---
 gcc/config/i386/i386.c | 21 ---
 gcc/config/i386/i386.h | 31 +-
 gcc/testsuite/gcc.target/i386/pr90773-1.c  | 10 +++
 gcc/testsuite/gcc.target/i386/pr90773-14.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-15.c |  6 ++---
 gcc/testsuite/gcc.target/i386/pr90773-16.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-17.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-4.c  |  2 +-
 8 files changed, 53 insertions(+), 23 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f9cbc1d10eb..98bf08b854b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -7943,8 +7943,17 @@ ix86_finalize_stack_frame_flags (void)
  assumed stack realignment might be needed or -fno-omit-frame-pointer
  is used, but in the end nothing that needed the stack alignment had
  been spilled nor stack access, clear frame_pointer_needed and say we
- don't need stack realignment.  */
-  if ((stack_realign || (!flag_omit_frame_pointer && optimize))
+ don't need stack realignment.
+
+ When vector register is used for piecewise move and store, we don't
+ increase stack_alignment_needed as there is no register spill for
+ piecewise move and store.  Since stack_realign_needed is set to true
+ by checking stack_alignment_estimated which is updated by pseudo
+ vector register usage, we also need to check stack_realign_needed to
+ eliminate frame pointer.  */
+  if ((stack_realign
+   || (!flag_omit_frame_pointer && optimize)
+   || crtl->stack_realign_needed)
   && frame_pointer_needed
   && crtl->is_leaf
   && crtl->sp_is_unchanging
@@ -10403,7 +10412,13 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
  /* FALLTHRU */
case E_OImode:
case E_XImode:
- if (!standard_sse_constant_p (x, mode))
+ if (!standard_sse_constant_p (x, mode)
+ && GET_MODE_SIZE (TARGET_AVX512F
+   ? XImode
+   : (TARGET_AVX
+  ? OImode
+  : (TARGET_SSE2
+ ? TImode : DImode))) < GET_MODE_SIZE 
(mode))
return false;
default:
  break;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 45d86802c51..677afbf7031 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1754,7 +1754,7 @@ typedef struct ix86_args {
 
 /* Max number of bytes we can move from memory to memory
in one reasonably fast instruction.  */
-#define MOVE_MAX 16
+#define MOVE_MAX 64
 
 /* MOVE_MAX_PIECES is the number of bytes at a time which we can
move efficiently, as opposed to  MOVE_MAX which is the maximum
@@ -1765,11 +1765,30 @@ typedef struct ix86_args {
widest mode with MAX_FIXED_MODE_SIZE, we can only use TImode in
64-bit mode.  */
 #define MOVE_MAX_PIECES \
-  ((TARGET_64BIT \
-&& TARGET_SSE2 \
-&& TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
-&& TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
-   ? GET_MODE_SIZE (TImode) : UNITS_PER_WORD)
+  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
+   ? 64 \
+   : ((TARGET_AVX \
+   && !TARGET_PREFER_AVX128 \
+   && !TARGET_AVX256_SPLIT_UNALIGNED_LOAD \
+   && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
+  ? 32 \
+  : ((TARGET_SSE2 \
+ && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
+ && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
+? 16 : UNITS_PER_WORD)))
+
+/* STORE_MAX_PIECES is the number of bytes at a time that we can
+   store efficiently.  */
+#define STORE_MAX_PIECES \
+  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
+   ? 64 \
+   : ((TARGET_AVX \
+   && 

[PATCH v2 02/11] x86: Avoid stack realignment when copying data

2021-05-11 Thread H.J. Lu via Gcc-patches
To avoid stack realignment, use SCRATCH_SSE_REG to copy data from one
memory location to another.
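
The decision made in ix86_expand_vector_move can be summarized roughly as below. This is a simplified sketch with hypothetical names (`CopyVia`, `choose_copy_register`); the real code works on RTL and machine modes, not plain integers.

```cpp
// Which kind of register the memory-to-memory copy goes through.
enum class CopyVia { ScratchSseReg, FreshPseudo };

// Use the fixed scratch SSE register when the mode needs more alignment
// than the incoming stack guarantees, so that no spill slot (and hence
// no stack realignment) is required.  Alignments are in bits.
CopyVia choose_copy_register(bool target_sse,
                             unsigned mode_align_bits,
                             unsigned min_incoming_stack_boundary_bits) {
  if (target_sse && mode_align_bits > min_incoming_stack_boundary_bits)
    return CopyVia::ScratchSseReg;
  return CopyVia::FreshPseudo;
}
```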

gcc/

* config/i386/i386-expand.c (ix86_expand_vector_move): Use
SCRATCH_SSE_REG to copy data from one memory location to
another.

gcc/testsuite/

* gcc.target/i386/eh_return-1.c: New test.
---
 gcc/config/i386/i386-expand.c   | 16 -
 gcc/testsuite/gcc.target/i386/eh_return-1.c | 26 +
 2 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/eh_return-1.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 7f1dff6337c..09d5e5d88af 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -431,7 +431,21 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[])
   && !register_operand (op0, mode)
   && !register_operand (op1, mode))
 {
-  emit_move_insn (op0, force_reg (GET_MODE (op0), op1));
+  rtx tmp;
+  mode = GET_MODE (op0);
+  if (TARGET_SSE
+ && (GET_MODE_ALIGNMENT (mode)
+ > ix86_minimum_incoming_stack_boundary (false, true)))
+   {
+ /* NB: Don't increase stack alignment requirement by using
+a scratch SSE register to copy data from one memory
+location to another since it doesn't require a spill.  */
+ tmp = gen_rtx_REG (mode, SCRATCH_SSE_REG);
+ emit_move_insn (tmp, op1);
+   }
+  else
+   tmp = force_reg (mode, op1);
+  emit_move_insn (op0, tmp);
   return;
 }
 
diff --git a/gcc/testsuite/gcc.target/i386/eh_return-1.c 
b/gcc/testsuite/gcc.target/i386/eh_return-1.c
new file mode 100644
index 000..671ba635e88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/eh_return-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell -mno-avx512f" } */
+
+struct _Unwind_Context
+{
+  void *ra;
+  char array[48];
+};
+
+extern long uw_install_context_1 (struct _Unwind_Context *);
+
+void
+_Unwind_RaiseException (void)
+{
+  struct _Unwind_Context this_context, cur_context;
+  long offset = uw_install_context_1 (_context);
+  __builtin_memcpy (_context, _context,
+   sizeof (struct _Unwind_Context));
+  void *handler = __builtin_frob_return_addr ((_context)->ra);
+  uw_install_context_1 (_context);
+  __builtin_eh_return (offset, handler);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 4 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
-- 
2.31.1



[PATCH v2 03/11] Remove MAX_BITSIZE_MODE_ANY_INT

2021-05-11 Thread H.J. Lu via Gcc-patches
It is defined only for i386; every other target uses the default:

 #define MAX_BITSIZE_MODE_ANY_INT (64*BITS_PER_UNIT)

Whatever problems we had before, they have been fixed now.
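
The arithmetic behind the two limits, as a small sketch. The constant and function names are hypothetical, and BITS_PER_UNIT is assumed to be 8, as on x86.

```cpp
constexpr int kBitsPerUnit = 8;                     // assumed BITS_PER_UNIT
constexpr int kDefaultMaxBits = 64 * kBitsPerUnit;  // default limit: 512 bits
constexpr int kOldI386MaxBits = 160;                // value removed by the patch

// Does an integer mode of `mode_bytes` bytes fit under `max_bits`?
constexpr bool fits(int mode_bytes, int max_bits) {
  return mode_bytes * kBitsPerUnit <= max_bits;
}

static_assert(fits(64, kDefaultMaxBits),
              "XImode (64 bytes) fits under the default limit");
static_assert(!fits(32, kOldI386MaxBits),
              "OImode (32 bytes) did not fit under the old 160-bit limit");
```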

* config/i386/i386-modes.def (MAX_BITSIZE_MODE_ANY_INT): Removed.
---
 gcc/config/i386/i386-modes.def | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index dbddfd8e48f..4e7014be034 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -107,19 +107,10 @@ INT_MODE (XI, 64);
 PARTIAL_INT_MODE (HI, 16, P2QI);
 PARTIAL_INT_MODE (SI, 32, P2HI);
 
-/* Mode used for signed overflow checking of TImode.  As
-   MAX_BITSIZE_MODE_ANY_INT is only 160, wide-int.h reserves only that
-   rounded up to multiple of HOST_BITS_PER_WIDE_INT bits in wide_int etc.,
-   so OImode is too large.  For the overflow checking we actually need
-   just 1 or 2 bits beyond TImode precision.  Use 160 bits to have
-   a multiple of 32.  */
+/* Mode used for signed overflow checking of TImode.  For the overflow
+   checking we actually need just 1 or 2 bits beyond TImode precision.
+   Use 160 bits to have a multiple of 32.  */
 PARTIAL_INT_MODE (OI, 160, POI);
 
-/* Keep the OI and XI modes from confusing the compiler into thinking
-   that these modes could actually be used for computation.  They are
-   only holders for vectors during data movement.  Include POImode precision
-   though.  */
-#define MAX_BITSIZE_MODE_ANY_INT (160)
-
 /* The symbol Pmode stands for one of the above machine modes (usually SImode).
The tm.h file specifies which one.  It is not a distinct mode.  */
-- 
2.31.1



Re: [PATCH 00/57] Replace the Power target-specific built-in machinery

2021-05-11 Thread Segher Boessenkool
On Tue, May 11, 2021 at 10:57:56AM -0500, Bill Schmidt wrote:
> Hi!  I'd like to ping this series.  This is a big change, so I'd like to 
> get it committed fairly early in stage 1.  I know you have a lot stacked 
> up, though.

I haven't received most of this series (only the last three patches).
I'll dig it up from the archives.


Segher


[Patch] OpenMP: detach - fix firstprivate handling

2021-05-11 Thread Tobias Burnus

The sfield / firstprivate lookup used the wrong var decl
for the lookup – hence it failed.
I used an extra long diff to make it easier to follow why
'c' and not 'detach_clause' has the proper clause for the
decl to be used as key.

Testsuite run ongoing.
OK for mainline, when it passes?
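
For readers unfamiliar with the list manipulation in finish_taskreg_scan: once the right field has been looked up, it is unlinked from the chain and put first using a pointer-to-pointer walk. A standalone sketch of that idiom follows; `Field` and `move_to_front` are hypothetical names, not GCC's tree API.

```cpp
// Minimal singly linked list node standing in for a FIELD_DECL chain.
struct Field {
  int id;
  Field* next;
};

// Unlink `field` from the list headed by `*head`, then reinsert it as
// the first element.  `p` always points at the link that may need to be
// rewritten, so the head needs no special case.
void move_to_front(Field** head, Field* field) {
  Field** p = head;
  while (*p) {
    if (*p == field)
      *p = (*p)->next;   // splice the field out of the chain
    else
      p = &(*p)->next;   // advance to the next link
  }
  field->next = *head;   // old head follows the moved field
  *head = field;
}
```

If the lookup returns the wrong node, as happened with the wrong decl key here, the wrong field ends up first, which is why the key matters.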

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
OpenMP: detach - fix firstprivate handling 
gcc/ChangeLog:

	* omp-low.c (finish_taskreg_scan): Use the proper detach decl.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/detach-1.f90: New test.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index c0ce1a4990e..cadca7e201f 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2436,55 +2436,55 @@ finish_taskreg_scan (omp_context *ctx)
 	  /* Look for a firstprivate clause with the detach event handle.  */
 	  for (c = gimple_omp_taskreg_clauses (ctx->stmt);
 	   c; c = OMP_CLAUSE_CHAIN (c))
 	{
 	  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_FIRSTPRIVATE)
 		continue;
 	  if (maybe_lookup_decl_in_outer_ctx (OMP_CLAUSE_DECL (c), ctx)
 		  == OMP_CLAUSE_DECL (detach_clause))
 		break;
 	}
 
 	  gcc_assert (c);
 	  field = lookup_field (OMP_CLAUSE_DECL (c), ctx);
 
 	  /* Move field corresponding to the detach clause first.
 	 This is filled by GOMP_task and needs to be in a
 	 specific position.  */
 	  p = &TYPE_FIELDS (ctx->record_type);
 	  while (*p)
 	if (*p == field)
 	  *p = DECL_CHAIN (*p);
 	else
 	  p = &DECL_CHAIN (*p);
 	  DECL_CHAIN (field) = TYPE_FIELDS (ctx->record_type);
 	  TYPE_FIELDS (ctx->record_type) = field;
 	  if (ctx->srecord_type)
 	{
-	  field = lookup_sfield (OMP_CLAUSE_DECL (detach_clause), ctx);
+	  field = lookup_sfield (OMP_CLAUSE_DECL (c), ctx);
 	  p = &TYPE_FIELDS (ctx->srecord_type);
 	  while (*p)
 		if (*p == field)
 		  *p = DECL_CHAIN (*p);
 		else
 		  p = &DECL_CHAIN (*p);
 	  DECL_CHAIN (field) = TYPE_FIELDS (ctx->srecord_type);
 	  TYPE_FIELDS (ctx->srecord_type) = field;
 	}
 	}
   layout_type (ctx->record_type);
   fixup_child_record_type (ctx);
   if (ctx->srecord_type)
 	layout_type (ctx->srecord_type);
   tree t = fold_convert_loc (loc, long_integer_type_node,
  TYPE_SIZE_UNIT (ctx->record_type));
   if (TREE_CODE (t) != INTEGER_CST)
 	{
 	  t = unshare_expr (t);
 	  walk_tree (&t, finish_taskreg_remap, ctx, NULL);
 	}
   gimple_omp_task_set_arg_size (ctx->stmt, t);
   t = build_int_cst (long_integer_type_node,
 			 TYPE_ALIGN_UNIT (ctx->record_type));
   gimple_omp_task_set_arg_align (ctx->stmt, t);
 }
 }
diff --git a/libgomp/testsuite/libgomp.fortran/detach-1.f90 b/libgomp/testsuite/libgomp.fortran/detach-1.f90
new file mode 100644
index 000..88546fe473b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/detach-1.f90
@@ -0,0 +1,22 @@
+program test
+use omp_lib
+implicit none
+integer(omp_event_handle_kind) :: oevent, ievent
+integer :: i
+integer, allocatable :: temp(:)
+ALLOCATE(temp(5))
+
+!$omp parallel num_threads(3)
+!$omp single
+DO i=1,5
+!$omp task firstprivate(i) firstprivate(temp)  detach(oevent)
+  temp(:) = 0;
+  temp(1) = -1;
+  !print *,temp
+  call omp_fulfill_event(oevent)
+!$omp end task
+ENDDO
+!$omp taskwait
+!$omp end single
+!$omp end parallel
+end program


Re: [PATCH] forwprop: Support vec perm fed by CTOR and CTOR/CST [PR99398]

2021-05-11 Thread Segher Boessenkool
Hi!

On Fri, May 07, 2021 at 10:40:21AM +0800, Kewen.Lin wrote:
>  .../gcc.target/powerpc/vec-perm-ctor-run.c| 124 +
>  .../gcc.target/powerpc/vec-perm-ctor.c|   9 +
>  .../gcc.target/powerpc/vec-perm-ctor.h| 163 ++

The new testcases are fine (as far as rs6000 is concerned anyway).

> +bool
> +vec_perm_indices::new_shrinked_vector (const vec_perm_indices ,
> +unsigned int factor)

The past participle is "shrunk", not "shrinked" (shrink/shrank/shrunk).

And that is as technical as I can review this :-)


Segher


[PING 2][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-05-11 Thread Martin Sebor via Gcc-patches

Ping 2:
https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568901.html


On 5/3/21 3:50 PM, Martin Sebor wrote:

Ping:

https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568901.html

On 4/27/21 9:52 AM, Martin Sebor wrote:

On 4/27/21 8:04 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor  wrote:


On 4/27/21 1:58 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
 wrote:


PR 90904 notes that auto_vec is unsafe to copy and assign because
the class manages its own memory but doesn't define (or delete)
either special function.  Since I first ran into the problem,
auto_vec has grown a move ctor and move assignment from
a dynamically-allocated vec but still no copy ctor or copy
assignment operator.

The attached patch adds the two special functions to auto_vec along
with a few simple tests.  It makes auto_vec safe to use in containers
that expect copyable and assignable element types and passes bootstrap
and regression testing on x86_64-linux.


The question is whether we want such uses to appear since those
can be quite inefficient?  Thus the option is to delete those operators?


I would strongly prefer the generic vector class to have the properties
expected of any other generic container: copyable and assignable.  If
we also want another vector type with this restriction I suggest to add
another "noncopyable" type and make that property explicit in its name.
I can submit one in a followup patch if you think we need one.


I'm not sure (and not strictly against the copy and assign).  Looking around
I see that vec<> does not do deep copying.  Making auto_vec<> do it
might be surprising (I added the move capability to match how vec<>
is used - as "reference" to a vector)


The vec base classes are special: they have no ctors at all (because
of their use in unions).  That's something we might have to live with
but it's not a model to follow in ordinary containers.

The auto_vec class was introduced to fill the need for a conventional
sequence container with a ctor and dtor.  The missing copy ctor and
assignment operators were an oversight, not a deliberate feature.
This change fixes that oversight.

The revised patch also adds a copy ctor/assignment to the auto_vec
primary template (that's also missing it).  In addition, it adds
a new class called auto_vec_ncopy that disables copying and
assignment as you prefer.  It also disables copying for
the auto_string_vec class.
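
To make the two options concrete, here is a generic sketch (not the patch itself) of a memory-owning class with a deep copy ctor and assignment, next to a variant that deletes both. `OwningBuf` and `OwningBufNoCopy` are hypothetical names and do not reflect auto_vec's actual interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <type_traits>
#include <utility>

// Owns a heap array; copying duplicates the elements (deep copy).
class OwningBuf {
 public:
  explicit OwningBuf(std::size_t n) : size_(n), data_(new int[n]()) {}
  OwningBuf(const OwningBuf& other)
      : size_(other.size_), data_(new int[other.size_]) {
    std::copy(other.data_, other.data_ + size_, data_);
  }
  OwningBuf& operator=(const OwningBuf& other) {
    if (this != &other) {
      OwningBuf tmp(other);        // copy first, then swap into place
      std::swap(size_, tmp.size_);
      std::swap(data_, tmp.data_);
    }
    return *this;
  }
  ~OwningBuf() { delete[] data_; }

  int& operator[](std::size_t i) { return data_[i]; }
  std::size_t size() const { return size_; }

 private:
  std::size_t size_;
  int* data_;
};

// Same ownership, but copying is a compile-time error.
class OwningBufNoCopy {
 public:
  explicit OwningBufNoCopy(std::size_t n) : data_(new int[n]()) {}
  OwningBufNoCopy(const OwningBufNoCopy&) = delete;
  OwningBufNoCopy& operator=(const OwningBufNoCopy&) = delete;
  ~OwningBufNoCopy() { delete[] data_; }

 private:
  int* data_;
};

static_assert(std::is_copy_constructible<OwningBuf>::value,
              "deep-copying variant is copyable");
static_assert(!std::is_copy_constructible<OwningBufNoCopy>::value,
              "noncopyable variant rejects copies");
```

Without either choice, the implicitly generated copy would share the pointer and free it twice, which is the hazard PR 90904 describes.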

Martin






Re: [stage 1 patch] remove unreachable code in expand_expr_real_1 (PR 21433)

2021-05-11 Thread Martin Sebor via Gcc-patches

On 2/12/21 1:55 AM, Richard Biener wrote:

On Fri, Feb 12, 2021 at 1:35 AM Martin Sebor via Gcc-patches
 wrote:


While trawling through old bugs I came across one from 2005: PR 21433
- The COMPONENT_REF case of expand_expr_real_1 is probably wrong.

The report looks correct in that argument 0 in COMPONENT_REF cannot
be a CONSTRUCTOR.  In my tests it's only been one of the following
codes:

array_ref
component_ref
mem_ref
parm_decl
result_decl
var_decl

The attached patch removes the CONSTRUCTOR code and replaces it with
an assert verifying it doesn't come up there.  Besides testing on
x86_64-linux, the change is supported by comments in code and also
in the internals manual (although that looks incorrect and should
be changed to avoid suggesting the first operand is a decl).


Note the CTOR operand is valid GENERIC and likely came up before
we introduced GIMPLE.  GIMPLE simply feeds more restrictive GENERIC
to the RTL expansion routines nowadays (so the patch is OK
eventually), but please
avoid altering GENERIC or tree.def documentation which documents
_GENERIC_.


I have committed the patch in r12-728.

Martin



The restricted boundary of GIMPLE -> RTL expansion is not documented
and in theory we might even run into your assert when processing
global initializers (in case the CTOR ends up TREE_CONSTANT).


tree.def:

/* Value is structure or union component.
 Operand 0 is the structure or union (an expression).
 Operand 1 is the field (a node of type FIELD_DECL).
 Operand 2, if present, is the value of DECL_FIELD_OFFSET, measured
 in units of DECL_OFFSET_ALIGN / BITS_PER_UNIT.  */
DEFTREECODE (COMPONENT_REF, "component_ref", tcc_reference, 3)

generic.texi:

@item COMPONENT_REF
These nodes represent non-static data member accesses.  The first
operand is the object (rather than a pointer to it); the second operand
is the @code{FIELD_DECL} for the data member.  The third operand represents
the byte offset of the field, but should not be used directly; call
@code{component_ref_field_offset} instead.




Re: [PATCH] PR libstdc++/89728 diagnose some misuses of [locale.convenience] functions

2021-05-11 Thread Jonathan Wakely via Gcc-patches

On 11/05/21 21:27 +0300, Antony Polukhin via Libstdc++ wrote:

This patch provides compile time diagnostics for common misuse of
[locale.convenience] functions with std::string as a character type.


2021-05-11  Antony Polukhin  

PR libstdc++/89728
 * include/bits/locale_facets.h (ctype) Add static assert.
 * testsuite/22_locale/ctype/is/string/89728_neg.cc New test.

--
Best regards,
Antony Polukhin



diff --git a/libstdc++-v3/include/bits/locale_facets.h 
b/libstdc++-v3/include/bits/locale_facets.h
index 03724cf..012857f 100644
--- a/libstdc++-v3/include/bits/locale_facets.h
+++ b/libstdc++-v3/include/bits/locale_facets.h
@@ -136,6 +136,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  return __s;
}

+  template
+struct __is_string
+{
+   enum _Value { _value = 0 };


The _value member needs to use double underscores.

But since this is only used in a static_assert, which is only
available in C++11, I think it can just derive from std::false_type
instead.


+};
+
+  template
+struct __is_string >
+{
+   enum _Value { _value = 1 };
+};

  // 22.2.1.1  Template class ctype
  // Include host and configuration specific ctype enums for ctype_base.
@@ -614,6 +625,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  template
class ctype : public __ctype_abstract_base<_CharT>
{
+#if __cplusplus >= 201103L
+  static_assert(!__is_string<_CharT>::_value,
+   "std::basic_string used as a character type");
+#endif
public:
  // Types:
  typedef _CharTchar_type;


Alternatively, would it be even simpler to just define a partial
specialization of ctype?

template
  class ctype >
  {
#if __cplusplus >= 201103L
  static_assert(something dependent,
"std::basic_string used as a character type");
#endif
  private:
ctype();
~ctype();
  };

This will work in C++98 too.
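
A standalone sketch of the first suggestion, a trait deriving from std::false_type and std::true_type that feeds a static_assert. It uses ordinary names rather than libstdc++'s reserved ones: `is_basic_string` and `char_facet` are hypothetical, and `char_facet` only stands in for the ctype primary template.

```cpp
#include <string>
#include <type_traits>

// Primary template: an arbitrary type is not a basic_string.
template <typename T>
struct is_basic_string : std::false_type {};

// Partial specialization: any basic_string instantiation matches.
template <typename CharT, typename Traits, typename Alloc>
struct is_basic_string<std::basic_string<CharT, Traits, Alloc> >
    : std::true_type {};

// Hypothetical facet-like template that rejects string "character" types
// when it is instantiated.
template <typename CharT>
struct char_facet {
  static_assert(!is_basic_string<CharT>::value,
                "std::basic_string used as a character type");
  typedef CharT char_type;
};
```

Instantiating `char_facet<std::string>` then fails to compile with the message above, while `char_facet<char>` is unaffected.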



[PATCH] avoid a couple of missing -Wuninitialized (PR 98583, 93100)

2021-05-11 Thread Martin Sebor via Gcc-patches

The attached change teaches the uninitialized pass about
__builtin_stack_restore and __builtin___asan_mark to avoid two
classes of -Wuninitialized false negatives.

Richard, you already approved the __builtin_stack_restore change
in the bug but I figured I'd submit a patch with both changes for
approval since they affect the same piece of code.

Martin
Avoid -Wuninitialized false negatives with sanitization and VLAs.

Resolves:
PR tree-optimization/93100 - gcc -fsanitize=address inhibits -Wuninitialized
PR middle-end/98583 - missing -Wuninitialized reading from a second VLA in its own block

gcc/ChangeLog:

	PR tree-optimization/93100
	PR middle-end/98583
	* tree-ssa-uninit.c (check_defs):

gcc/testsuite/ChangeLog:

	PR tree-optimization/93100
	PR middle-end/98583
	* g++.dg/warn/uninit-pr93100.C: New test.
	* gcc.dg/uninit-pr93100.c: New test.
	* gcc.dg/uninit-pr98583.c: New test.

diff --git a/gcc/testsuite/g++.dg/warn/uninit-pr93100.C b/gcc/testsuite/g++.dg/warn/uninit-pr93100.C
new file mode 100644
index 000..c9cd3ef0174
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/uninit-pr93100.C
@@ -0,0 +1,59 @@
+/* PR tree-optimization/98508 - Sanitizer disable -Wall and -Wextra
+   { dg-do compile }
+   { dg-options "-O0 -Wall -fsanitize=address" } */
+
+struct S
+{
+  int a;
+};
+
+void warn_init_self_O0 ()
+{
+  S s = S (s);  // { dg-warning "\\\[-Wuninitialized" }
+  (void)&s;
+}
+
+
+void warn_init_self_use_O0 ()
+{
+  S s = S (s);  // { dg-warning "\\\[-Wuninitialized" }
+
+  void sink (void*);
+  sink (&s);
+}
+
+
+#pragma GCC optimize ("1")
+
+void warn_init_self_O1 ()
+{
+  S s = S (s);  // { dg-warning "\\\[-Wuninitialized" }
+  (void)&s;
+}
+
+
+void warn_init_self_use_O1 ()
+{
+  S s = S (s);  // { dg-warning "\\\[-Wuninitialized" }
+
+  void sink (void*);
+  sink (&s);
+}
+
+
+#pragma GCC optimize ("2")
+
+void warn_init_self_O2 ()
+{
+  S s = S (s);  // { dg-warning "\\\[-Wuninitialized" }
+  (void)&s;
+}
+
+
+void warn_init_self_use_O2 ()
+{
+  S s = S (s);  // { dg-warning "\\\[-Wuninitialized" }
+
+  void sink (void*);
+  sink (&s);
+}
diff --git a/gcc/testsuite/gcc.dg/uninit-pr93100.c b/gcc/testsuite/gcc.dg/uninit-pr93100.c
new file mode 100644
index 000..61b7e434038
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/uninit-pr93100.c
@@ -0,0 +1,74 @@
+/* PR tree-optimization/93100 - gcc -fsanitize=address inhibits -Wuninitialized
+   { dg-do compile }
+   { dg-options "-Wall -fsanitize=address" } */
+
+struct A
+{
+  _Bool b;
+  int i;
+};
+
+void warn_A_b_O0 (void)
+{
+  struct A a;
+
+  if (a.b)  // { dg-warning "\\\[-Wuninitialized" }
+{
+  (void)&a;
+}
+}
+
+void warn_A_i_O0 (void)
+{
+  struct A a;
+
+  if (a.i)  // { dg-warning "\\\[-Wuninitialized" }
+{
+  (void)&a;
+}
+}
+
+#pragma GCC optimize ("1")
+
+void warn_A_b_O1 (void)
+{
+  struct A a;
+
+  if (a.b)  // { dg-warning "\\\[-Wuninitialized" }
+{
+  (void)&a;
+}
+}
+
+void warn_A_i_O1 (void)
+{
+  struct A a;
+
+  if (a.i)  // { dg-warning "\\\[-Wuninitialized" }
+{
+  (void)&a;
+}
+}
+
+
+#pragma GCC optimize ("2")
+
+void warn_A_b_O2 (void)
+{
+  struct A a;
+
+  if (a.b)  // { dg-warning "\\\[-Wuninitialized" }
+{
+  (void)&a;
+}
+}
+
+void warn_A_i_O2 (void)
+{
+  struct A a;
+
+  if (a.i)  // { dg-warning "\\\[-Wuninitialized" }
+{
+  (void)&a;
+}
+}
diff --git a/gcc/testsuite/gcc.dg/uninit-pr98583.c b/gcc/testsuite/gcc.dg/uninit-pr98583.c
new file mode 100644
index 000..638b0295809
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/uninit-pr98583.c
@@ -0,0 +1,31 @@
+/* PR middle-end/98583 - missing -Wuninitialized reading from a second VLA
+   in its own block
+   { dg-do compile }
+   { dg-options "-O2 -Wall" } */
+
+void f (int*);
+void g (int);
+
+void h1 (int n)
+{
+  int a[n];
+  f (a);
+
+  int b[n];
+  g (b[1]); // { dg-warning "\\\[-Wuninitialized" }
+}
+
+void h2 (int n, int i, int j)
+{
+  if (i)
+{
+  int a[n];
+  f (a);
+}
+
+  if (j)
+{
+  int b[n];
+  g (b[1]); // { dg-warning "\\\[-Wmaybe-uninitialized" }
+}
+}
diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
index 0800f596ab1..f55ce1939ac 100644
--- a/gcc/tree-ssa-uninit.c
+++ b/gcc/tree-ssa-uninit.c
@@ -209,6 +209,16 @@ check_defs (ao_ref *ref, tree vdef, void *data_)
 {
   check_defs_data *data = (check_defs_data *)data_;
   gimple *def_stmt = SSA_NAME_DEF_STMT (vdef);
+
+  /* The ASAN_MARK intrinsic doesn't modify the variable.  */
+  if (is_gimple_call (def_stmt)
+  && gimple_call_internal_p (def_stmt, IFN_ASAN_MARK))
+return false;
+
+  /* End of VLA scope is not a kill.  */
+  if (gimple_call_builtin_p (def_stmt, BUILT_IN_STACK_RESTORE))
+return false;
+
   /* If this is a clobber then if it is not a kill walk past it.  */
   if (gimple_clobber_p (def_stmt))
 {
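
The effect of the two new early returns can be modelled roughly as follows. This is a toy sketch, not GCC code: `DefKind`, `Def` and `read_is_uninitialized` are invented for illustration, and the real walker considers aliasing and clobbers that are left out here.

```cpp
#include <vector>

// Hypothetical, much simplified model; none of these are GCC's own types.
enum class DefKind { Store, OtherCall, AsanMark, StackRestore };

struct Def
{
  DefKind kind;
};

// Returns true when no statement on the path may have written the variable,
// i.e. when a read of it would be diagnosed as uninitialized.
bool read_is_uninitialized(const std::vector<Def>& path)
{
  for (const Def& def : path)
    {
      // Poisoning and unpoisoning marks do not modify the variable.
      if (def.kind == DefKind::AsanMark)
        continue;
      // Leaving a VLA scope does not initialize anything either.
      if (def.kind == DefKind::StackRestore)
        continue;
      // Anything else is conservatively assumed to write the variable.
      return false;
    }
  return true;
}
```

Without the two `continue` cases, every sanitizer mark or stack restore would count as a possible write and suppress the warning, which is the class of false negative the patch description refers to.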


[committed] preprocessor: Fix cpp_avoid_paste for digit separators

2021-05-11 Thread Joseph Myers
The libcpp function cpp_avoid_paste is used to insert whitespace in
preprocessed output where needed to avoid two consecutive
preprocessing tokens, that logically (e.g. when stringized) do not
have whitespace between them, from being incorrectly lexed as one when
the preprocessed input is reread by a compiler.

This fails to allow for digit separators, meaning that invalid
code, that has a CPP_NUMBER (from a macro expansion) followed by a
character literal, can result in preprocessed output with a valid use
of digit separators, so that required syntax errors do not occur when
compiling with -save-temps.  Fix this by handling that case in
cpp_avoid_paste (as with other cases in cpp_avoid_paste, this doesn't
try to check whether the language version in use supports digit
separators; it's always OK to have unnecessary whitespace in
preprocessed output).
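
As an illustration only, the decision for a token that follows a number can be modelled like this. `Kind`, `Token`, `avoid_paste` and `print_tokens` are simplified stand-ins invented for this sketch; the real cpp_avoid_paste handles many more token types.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, reduced set of token kinds.
enum class Kind { Number, Name, Char, Other };

struct Token
{
  Kind kind;
  std::string text;
};

// Returns true if a space is needed between a and b so that they are not
// lexed as a single token when the output is read back.  Only the case
// where the first token is a number is modelled.
bool avoid_paste(const Token& a, const Token& b)
{
  if (a.kind != Kind::Number)
    return false;
  const char c = b.text.empty() ? '\0' : b.text[0];
  return b.kind == Kind::Number || b.kind == Kind::Name
         || b.kind == Kind::Char   // the case added by the patch
         || c == '.' || c == '+' || c == '-';
}

// Prints a token sequence, inserting spaces only where avoid_paste asks.
std::string print_tokens(const std::vector<Token>& toks)
{
  std::string out;
  for (std::size_t i = 0; i < toks.size(); ++i)
    {
      if (i > 0 && avoid_paste(toks[i - 1], toks[i]))
        out += ' ';
      out += toks[i].text;
    }
  return out;
}
```

For the token sequence of the new tests (a number `0` from a macro, the character literal `'0'`, then `0`), the sketch keeps the number and the character literal apart, so rereading the output cannot turn them into one number with digit separators.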

Note: there are other cases, with various kinds of wide character or
string literal following a CPP_NUMBER, where spurious pasting of
preprocessing tokens can occur but the sequence of tokens remains
invalid both before and after that pasting.  Maybe cpp_avoid_paste
should also handle those cases (and similar cases after a CPP_NAME),
to ensure the sequence of preprocessing tokens in preprocessed output
is exactly right, whether or not it affects whether syntax errors
occur.  This patch only addresses the case with digit separators where
invalid code can fail to be diagnosed without the space inserted.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  Applied to 
mainline.

libcpp/
* lex.c (cpp_avoid_paste): Do not allow pasting CPP_NUMBER with
CPP_CHAR.

gcc/testsuite/
* g++.dg/cpp1y/digit-sep-paste.C, gcc.dg/c2x-digit-separators-3.c:
New tests.

diff --git a/gcc/testsuite/g++.dg/cpp1y/digit-sep-paste.C b/gcc/testsuite/g++.dg/cpp1y/digit-sep-paste.C
new file mode 100644
index 000..41fb967ef8d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/digit-sep-paste.C
@@ -0,0 +1,11 @@
+// Test token pasting with digit separators avoided for preprocessed output.
+// { dg-do compile { target c++14 } }
+// { dg-options "-save-temps" }
+
+#define ZERO 0
+
+int
+f ()
+{
+  return ZERO'0'0; /* { dg-error "expected" } */
+}
diff --git a/gcc/testsuite/gcc.dg/c2x-digit-separators-3.c b/gcc/testsuite/gcc.dg/c2x-digit-separators-3.c
new file mode 100644
index 000..cddb88fa880
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2x-digit-separators-3.c
@@ -0,0 +1,12 @@
+/* Test C2x digit separators.  Test token pasting avoided for preprocessed
+   output.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x -save-temps" } */
+
+#define ZERO 0
+
+int
+f (void)
+{
+  return ZERO'0'0; /* { dg-error "expected" } */
+}
diff --git a/libcpp/lex.c b/libcpp/lex.c
index b7ce85a0331..36cd2e30630 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -3725,6 +3725,7 @@ cpp_avoid_paste (cpp_reader *pfile, const cpp_token *token1,
|| b == CPP_NAME
|| b == CPP_CHAR || b == CPP_STRING); /* L */
 case CPP_NUMBER:   return (b == CPP_NUMBER || b == CPP_NAME
+   || b == CPP_CHAR
|| c == '.' || c == '+' || c == '-');
  /* UCNs */
 case CPP_OTHER:return ((token1->val.str.text[0] == '\\'

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] PR libstdc++/89728 diagnose some misuses of [locale.convenience] functions

2021-05-11 Thread Antony Polukhin via Gcc-patches
This patch provides compile time diagnostics for common misuse of
[locale.convenience] functions with std::string as a character type.
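
For contrast, a short sketch of the intended usage: the [locale.convenience] functions classify one character at a time, so a string has to be traversed by the caller. The helper name `count_spaces` is made up for this example.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Counts the whitespace characters in text according to loc.
std::size_t count_spaces(const std::string& text, const std::locale& loc)
{
  std::size_t n = 0;
  for (std::string::const_iterator it = text.begin(); it != text.end(); ++it)
    if (std::isspace(*it, loc))   // character argument: well-formed
      ++n;
  return n;
}
```

Passing `text` itself as the first argument of `std::isspace` is the kind of misuse the static assertion is meant to report.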


2021-05-11  Antony Polukhin  

PR libstdc++/89728
  * include/bits/locale_facets.h (ctype): Add static assert.
  * testsuite/22_locale/ctype/is/string/89728_neg.cc: New test.

-- 
Best regards,
Antony Polukhin
diff --git a/libstdc++-v3/include/bits/locale_facets.h b/libstdc++-v3/include/bits/locale_facets.h
index 03724cf..012857f 100644
--- a/libstdc++-v3/include/bits/locale_facets.h
+++ b/libstdc++-v3/include/bits/locale_facets.h
@@ -136,6 +136,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __s;
 }
 
+  template
+struct __is_string
+{
+   enum _Value { _value = 0 };
+};
+
+  template
+struct __is_string >
+{
+   enum _Value { _value = 1 };
+};
 
   // 22.2.1.1  Template class ctype
   // Include host and configuration specific ctype enums for ctype_base.
@@ -614,6 +625,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 class ctype : public __ctype_abstract_base<_CharT>
 {
+#if __cplusplus >= 201103L
+  static_assert(!__is_string<_CharT>::_value,
+   "std::basic_string used as a character type");
+#endif
 public:
   // Types:
   typedef _CharT   char_type;
diff --git a/libstdc++-v3/testsuite/22_locale/ctype/is/string/89728_neg.cc b/libstdc++-v3/testsuite/22_locale/ctype/is/string/89728_neg.cc
new file mode 100644
index 000..987fa8e
--- /dev/null
+++ b/libstdc++-v3/testsuite/22_locale/ctype/is/string/89728_neg.cc
@@ -0,0 +1,73 @@
+// { dg-do compile { target c++11 } }
+
+// Copyright (C) 2021 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-error "used as a character type" "" { target *-*-* } 0 }
+
+#include 
+
+template 
+struct trait: std::char_traits {};
+
+template 
+auto make_str()
+{
+  return std::basic_string>{};
+}
+
+void test01()
+{
+  const auto& loc = std::locale::classic();
+
+  std::isspace(std::string{}, loc);// { dg-error "required from here" }
+  std::isprint(make_str(), loc);  // { dg-error "required from here" }
+  std::iscntrl(make_str(), loc);  // { dg-error "required from here" }
+  std::isupper(make_str(), loc);  // { dg-error "required from here" }
+  std::islower(make_str(), loc);  // { dg-error "required from here" }
+  std::isalpha(make_str(), loc);  // { dg-error "required from here" }
+  std::isdigit(make_str(), loc);  // { dg-error "required from here" }
+  std::ispunct(make_str(), loc);  // { dg-error "required from here" }
+  std::isxdigit(make_str(), loc); // { dg-error "required from here" }
+  std::isalnum(make_str(), loc);  // { dg-error "required from here" }
+  std::isgraph(make_str(), loc);  // { dg-error "required from here" }
+  std::isblank(make_str(), loc); // { dg-error "required from here" }
+  std::toupper(make_str(), loc); // { dg-error "required from here" }
+  std::tolower(make_str(), loc); // { dg-error "required from here" }
+}
+
+#ifdef _GLIBCXX_USE_WCHAR_T
+void test02()
+{
+  const auto& loc = std::locale::classic();
+
+  std::isspace(std::wstring{}, loc);   // { dg-error "required from here" }
+  std::isprint(make_str(), loc);   // { dg-error "required from here" }
+  std::iscntrl(make_str(), loc);   // { dg-error "required from here" }
+  std::isupper(make_str(), loc);   // { dg-error "required from here" }
+  std::islower(make_str(), loc);   // { dg-error "required from here" }
+  std::isalpha(make_str(), loc);   // { dg-error "required from here" }
+  std::isdigit(make_str(), loc);   // { dg-error "required from here" }
+  std::ispunct(make_str(), loc);   // { dg-error "required from here" }
+  std::isxdigit(make_str(), loc);  // { dg-error "required from here" }
+  std::isalnum(make_str(), loc);   // { dg-error "required from here" }
+  std::isgraph(make_str(), loc);   // { dg-error "required from here" }
+  std::isblank(make_str(), loc);  // { dg-error "required from here" }
+  std::toupper(make_str(), loc);  // { dg-error "required from here" }
+  std::tolower(make_str(), loc);  // { dg-error "required from here" }
+}
+#endif


[PATCH] ada: do not use binary mode in conf.py

2021-05-11 Thread Martin Liška

This is a further step in porting the script to Python 3.

Ready for master?
Thanks,
Martin

gcc/ada/ChangeLog:

* doc/share/conf.py: Do not use binary mode.
Do not use u' literals as Python3 uses unicode by default.
---
 gcc/ada/doc/share/conf.py | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/ada/doc/share/conf.py b/gcc/ada/doc/share/conf.py
index debd71688b8..705a6787056 100644
--- a/gcc/ada/doc/share/conf.py
+++ b/gcc/ada/doc/share/conf.py
@@ -18,9 +18,9 @@ import latex_elements
 
 DOCS = {

 'gnat_rm': {
-'title': u'GNAT Reference Manual'},
+'title': 'GNAT Reference Manual'},
 'gnat_ugn': {
-'title': u'GNAT User\'s Guide for Native Platforms'}}
+'title': 'GNAT User\'s Guide for Native Platforms'}}
 
 # Then retrieve the source directory

 root_source_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
@@ -28,17 +28,17 @@ gnatvsn_spec = os.path.join(root_source_dir, '..', 'gnatvsn.ads')
 basever = os.path.join(root_source_dir, '..', '..', 'BASE-VER')
 texi_fsf = True  # Set to False when FSF doc is switched to sphinx by default
 
-with open(gnatvsn_spec, 'rb') as fd:

+with open(gnatvsn_spec, 'r') as fd:
 gnatvsn_content = fd.read()
 
 
 def get_copyright():

-return u'2008-%s, Free Software Foundation' % time.strftime('%Y')
+return '2008-%s, Free Software Foundation' % time.strftime('%Y')
 
 
 def get_gnat_version():

-m = re.search(br'Gnat_Static_Version_String : ' +
-  br'constant String := "([^\(\)]+)\(.*\)?";',
+m = re.search(r'Gnat_Static_Version_String : ' +
+  r'constant String := "([^\(\)]+)\(.*\)?";',
   gnatvsn_content)
 if m:
 return m.group(1).strip().decode()
@@ -57,12 +57,12 @@ def get_gnat_version():
 
 
 def get_gnat_build_type():

-m = re.search(br'Build_Type : constant Gnat_Build_Type := (.+);',
+m = re.search(r'Build_Type : constant Gnat_Build_Type := (.+);',
   gnatvsn_content)
 if m:
-return {b'Gnatpro': 'PRO',
-b'FSF': 'FSF',
-b'GPL': 'GPL'}[m.group(1).strip()]
+return {'Gnatpro': 'PRO',
+'FSF': 'FSF',
+'GPL': 'GPL'}[m.group(1).strip()]
 else:
 print('cannot compute GNAT build type')
 sys.exit(1)
@@ -119,8 +119,8 @@ copyright_macros = {
 'date': time.strftime("%b %d, %Y"),
 'edition': 'GNAT %s Edition' % 'Pro' if get_gnat_build_type() == 'PRO'
else 'GPL',
-'name': u'GNU Ada',
-'tool': u'GNAT',
+'name': 'GNU Ada',
+'tool': 'GNAT',
 'version': version}
 
 latex_elements = {

@@ -134,11 +134,11 @@ latex_elements = {
 'tableofcontents': latex_elements.TOC % copyright_macros}
 
 latex_documents = [

-(master_doc, '%s.tex' % doc_name, project, u'AdaCore', 'manual')]
+(master_doc, '%s.tex' % doc_name, project, 'AdaCore', 'manual')]
 
 texinfo_documents = [

 (master_doc, doc_name, project,
- u'AdaCore', doc_name, doc_name, '')]
+ 'AdaCore', doc_name, doc_name, '')]
 
 
 def setup(app):

--
2.31.1



Re: [PATCH] libstdc++: Remove extern "C" from Ryu sources

2021-05-11 Thread Jakub Jelinek via Gcc-patches
On Tue, May 11, 2021 at 06:07:19PM +0100, Jonathan Wakely via Gcc-patches wrote:
> > I'm not sure if the abort call is necessary since the link step already
> > fails with a multiple definition error (without the fix) even if the
> > function is defined with an empty body.  But since Jakub included an
> > abort call in his testcase I carried it over :)  Shall I just make it
> > dg-do run, or perhaps keep it dg-do link and make the function body
> > empty?  Either seems to do the right thing.
> 
> OK, if it works as-is then let's leave it as a link test. I think
> having the abort there is likely to confuse me again in future when I
> forget this conversation, so let's go with an empty body.

When mentioning it on IRC, I didn't think of it failing already at link
time, had the mental model of binary + shared library that just exports
that symbol, so kind of like a small shared library containing that
std::to_chars(x, x+64, 42.L, std::chars_format::scientific); in some
function, linked with -shared -fpic -static-libstdc++ and then
binary with that generic_to_chars function extern "C" and main calling
the shared library case.

In the end, both that and the dg-do link testcase should catch it fine.
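
For readers following along, the link-time scenario can be sketched as below, assuming a C++17 library with floating-point std::to_chars. The helper `format_long_double` is invented for this example, and the sketch is not the testcase from the patch: user code defines a function with C linkage named like the Ryu helper and then uses the long double overload, so that the library code in question is pulled in.

```cpp
#include <charconv>
#include <string>
#include <system_error>

// A user-provided function with C linkage that happens to share its name
// with the Ryu helper.  With the fix this must not clash with libstdc++.a.
extern "C" void generic_to_chars(void) { }

// Formats a long double so that the library's long double path is used.
std::string format_long_double()
{
  char buf[64];
  std::to_chars_result res
    = std::to_chars(buf, buf + 64, 42.L, std::chars_format::scientific);
  return res.ec == std::errc() ? std::string(buf, res.ptr) : std::string();
}
```

A clash of the kind discussed in this thread would only be expected when linking statically against an unfixed libstdc++.a.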

Jakub



Re: [PATCH] libstdc++: Remove extern "C" from Ryu sources

2021-05-11 Thread Jonathan Wakely via Gcc-patches

On 11/05/21 13:04 -0400, Patrick Palka wrote:

On Tue, 11 May 2021, Jonathan Wakely wrote:


On 11/05/21 11:16 -0400, Patrick Palka via Libstdc++ wrote:
> On Tue, 11 May 2021, Patrick Palka wrote:
>
> > floating_to_chars.cc includes the Ryu sources into an anonymous
> > namespace as a convenient way to give all its symbols internal linkage.
> > But an entity declared extern "C" always has external linkage, even
> > from within an anonymous namespace, so this trick doesn't work in the
> > presence of extern "C", and it causes the Ryu function generic_to_chars
> > to be visible from libstdc++.a.
> >
> > This patch removes the only use of extern "C" from our local copy of
> > Ryu, along with some declarations for never-defined functions that GCC
> > now warns about.
> >
> > Tested on x86_64-pc-linux-gnu, and also verified that generic_to_chars
> > is not visible from libstdc++.a.  Does this look OK for trunk and the 11
> > branch?
>
> Now with a testcase:
>
> -- >8 --
>
> Subject: [PATCH] libstdc++: Remove extern "C" from Ryu sources
>
> floating_to_chars.cc includes the Ryu sources into an anonymous
> namespace as a convenient way to give all its symbols internal linkage.
> But an entity declared extern "C" always has external linkage, even
> from within an anonymous namespace, so this trick doesn't work in the
> presence of extern "C", and it causes the Ryu function generic_to_chars
> to be visible from libstdc++.a.
>
> This patch removes the only use of extern "C" from our local copy of
> Ryu along with some declarations for never-defined functions that GCC
> now warns about.
>
> Tested on x86_64-pc-linux-gnu, and also verified that generic_to_chars
> is not visible from libstdc++.a.  Does this look OK for trunk and the 11
> branch?
>
> libstdc++-v3/ChangeLog:
>
>* src/c++17/ryu/LOCAL_PATCHES: Update.
>* src/c++17/ryu/ryu_generic_128.h: Remove extern "C".
>Remove declarations for never-defined functions.
>* testsuite/20_util/to_chars/4.cc: New test.
> ---
> libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES |  1 +
> libstdc++-v3/src/c++17/ryu/ryu_generic_128.h | 21 ++--
> libstdc++-v3/testsuite/20_util/to_chars/4.cc | 36 
> 3 files changed, 40 insertions(+), 18 deletions(-)
> create mode 100644 libstdc++-v3/testsuite/20_util/to_chars/4.cc
>
> diff --git a/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
> b/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
> index 51e504cb6ea..72ffad9662d 100644
> --- a/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
> +++ b/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
> @@ -1,2 +1,3 @@
> r11-6248
> r11-7636
> +r12-XXX
> diff --git a/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
> b/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
> index 2afbf274e11..6d988ab01eb 100644
> --- a/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
> +++ b/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
> @@ -18,9 +18,9 @@
> #define RYU_GENERIC_128_H
>
>
> -#ifdef __cplusplus
> -extern "C" {
> -#endif
> +// NOTE: These symbols are declared extern "C" upstream, but we don't want
> that
> +// because it'd override the internal linkage of the anonymous namespace
> into
> +// which this header is included.
>
> // This is a generic 128-bit implementation of float to shortest conversion
> // using the Ryu algorithm. It can handle any IEEE-compatible floating-point
> @@ -42,18 +42,6 @@ struct floating_decimal_128 {
>   bool sign;
> };
>
> -struct floating_decimal_128 float_to_fd128(float f);
> -struct floating_decimal_128 double_to_fd128(double d);
> -
> -// According to wikipedia (https://en.wikipedia.org/wiki/Long_double), this
> likely only works on
> -// x86 with specific compilers (clang?). May need an ifdef.
> -struct floating_decimal_128 long_double_to_fd128(long double d);
> -
> -// Converts the given binary floating point number to the shortest decimal
> floating point number
> -// that still accurately represents it.
> -struct floating_decimal_128 generic_binary_to_decimal(
> -const uint128_t bits, const uint32_t mantissaBits, const uint32_t
> exponentBits, const bool explicitLeadingBit);
> -
> // Converts the given decimal floating point number to a string, writing to
> result, and returning
> // the number characters written. Does not terminate the buffer with a 0. In
> the worst case, this
> // function can write up to 53 characters.
> @@ -63,8 +51,5 @@ struct floating_decimal_128 generic_binary_to_decimal(
> // = 1 + 39 + 1 + 1 + 1 + 10 = 53
> int generic_to_chars(const struct floating_decimal_128 v, char* const
> result);
>
> -#ifdef __cplusplus
> -}
> -#endif
>
> #endif // RYU_GENERIC_128_H
> diff --git a/libstdc++-v3/testsuite/20_util/to_chars/4.cc
> b/libstdc++-v3/testsuite/20_util/to_chars/4.cc
> new file mode 100644
> index 000..96f6e5d010c
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/20_util/to_chars/4.cc
> @@ -0,0 +1,36 @@
> +// Copyright (C) 2021 Free Software Foundation, Inc.
> +//
> +// This file is part of the GNU ISO C++ Library.  This library is free
> +// 

Re: [PATCH] libstdc++: Remove extern "C" from Ryu sources

2021-05-11 Thread Patrick Palka via Gcc-patches
On Tue, 11 May 2021, Jonathan Wakely wrote:

> On 11/05/21 11:16 -0400, Patrick Palka via Libstdc++ wrote:
> > On Tue, 11 May 2021, Patrick Palka wrote:
> > 
> > > floating_to_chars.cc includes the Ryu sources into an anonymous
> > > namespace as a convenient way to give all its symbols internal linkage.
> > > But an entity declared extern "C" always has external linkage, even
> > > from within an anonymous namespace, so this trick doesn't work in the
> > > presence of extern "C", and it causes the Ryu function generic_to_chars
> > > to be visible from libstdc++.a.
> > > 
> > > This patch removes the only use of extern "C" from our local copy of
> > > Ryu, along with some declarations for never-defined functions that GCC
> > > now warns about.
> > > 
> > > Tested on x86_64-pc-linux-gnu, and also verified that generic_to_chars
> > > is not visible from libstdc++.a.  Does this look OK for trunk and the 11
> > > branch?
> > 
> > Now with a testcase:
> > 
> > -- >8 --
> > 
> > Subject: [PATCH] libstdc++: Remove extern "C" from Ryu sources
> > 
> > floating_to_chars.cc includes the Ryu sources into an anonymous
> > namespace as a convenient way to give all its symbols internal linkage.
> > But an entity declared extern "C" always has external linkage, even
> > from within an anonymous namespace, so this trick doesn't work in the
> > presence of extern "C", and it causes the Ryu function generic_to_chars
> > to be visible from libstdc++.a.
> > 
> > This patch removes the only use of extern "C" from our local copy of
> > Ryu along with some declarations for never-defined functions that GCC
> > now warns about.
> > 
> > Tested on x86_64-pc-linux-gnu, and also verified that generic_to_chars
> > is not visible from libstdc++.a.  Does this look OK for trunk and the 11
> > branch?
> > 
> > libstdc++-v3/ChangeLog:
> > 
> > * src/c++17/ryu/LOCAL_PATCHES: Update.
> > * src/c++17/ryu/ryu_generic_128.h: Remove extern "C".
> > Remove declarations for never-defined functions.
> > * testsuite/20_util/to_chars/4.cc: New test.
> > ---
> > libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES |  1 +
> > libstdc++-v3/src/c++17/ryu/ryu_generic_128.h | 21 ++--
> > libstdc++-v3/testsuite/20_util/to_chars/4.cc | 36 
> > 3 files changed, 40 insertions(+), 18 deletions(-)
> > create mode 100644 libstdc++-v3/testsuite/20_util/to_chars/4.cc
> > 
> > diff --git a/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
> > b/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
> > index 51e504cb6ea..72ffad9662d 100644
> > --- a/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
> > +++ b/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
> > @@ -1,2 +1,3 @@
> > r11-6248
> > r11-7636
> > +r12-XXX
> > diff --git a/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
> > b/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
> > index 2afbf274e11..6d988ab01eb 100644
> > --- a/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
> > +++ b/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
> > @@ -18,9 +18,9 @@
> > #define RYU_GENERIC_128_H
> > 
> > 
> > -#ifdef __cplusplus
> > -extern "C" {
> > -#endif
> > +// NOTE: These symbols are declared extern "C" upstream, but we don't want
> > that
> > +// because it'd override the internal linkage of the anonymous namespace
> > into
> > +// which this header is included.
> > 
> > // This is a generic 128-bit implementation of float to shortest conversion
> > // using the Ryu algorithm. It can handle any IEEE-compatible floating-point
> > @@ -42,18 +42,6 @@ struct floating_decimal_128 {
> >   bool sign;
> > };
> > 
> > -struct floating_decimal_128 float_to_fd128(float f);
> > -struct floating_decimal_128 double_to_fd128(double d);
> > -
> > -// According to wikipedia (https://en.wikipedia.org/wiki/Long_double), this
> > likely only works on
> > -// x86 with specific compilers (clang?). May need an ifdef.
> > -struct floating_decimal_128 long_double_to_fd128(long double d);
> > -
> > -// Converts the given binary floating point number to the shortest decimal
> > floating point number
> > -// that still accurately represents it.
> > -struct floating_decimal_128 generic_binary_to_decimal(
> > -const uint128_t bits, const uint32_t mantissaBits, const uint32_t
> > exponentBits, const bool explicitLeadingBit);
> > -
> > // Converts the given decimal floating point number to a string, writing to
> > result, and returning
> > // the number characters written. Does not terminate the buffer with a 0. In
> > the worst case, this
> > // function can write up to 53 characters.
> > @@ -63,8 +51,5 @@ struct floating_decimal_128 generic_binary_to_decimal(
> > // = 1 + 39 + 1 + 1 + 1 + 10 = 53
> > int generic_to_chars(const struct floating_decimal_128 v, char* const
> > result);
> > 
> > -#ifdef __cplusplus
> > -}
> > -#endif
> > 
> > #endif // RYU_GENERIC_128_H
> > diff --git a/libstdc++-v3/testsuite/20_util/to_chars/4.cc
> > b/libstdc++-v3/testsuite/20_util/to_chars/4.cc
> > new file mode 100644
> > index 000..96f6e5d010c
> > --- /dev/null
> > 

Re: [PATCH] libstdc++: Remove extern "C" from Ryu sources

2021-05-11 Thread Jonathan Wakely via Gcc-patches

On 11/05/21 11:16 -0400, Patrick Palka via Libstdc++ wrote:

On Tue, 11 May 2021, Patrick Palka wrote:


floating_to_chars.cc includes the Ryu sources into an anonymous
namespace as a convenient way to give all its symbols internal linkage.
But an entity declared extern "C" always has external linkage, even
from within an anonymous namespace, so this trick doesn't work in the
presence of extern "C", and it causes the Ryu function generic_to_chars
to be visible from libstdc++.a.
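
A small sketch of the situation described above, with made-up function names. According to the description in this thread, the first function remains visible to the linker under its unmangled name despite the anonymous namespace, while the second one is private to the translation unit:

```cpp
namespace
{
  // C language linkage: reported in this thread to stay externally
  // visible even though it is declared in an anonymous namespace.
  extern "C" int visible_helper(int x) { return x + 1; }

  // Ordinary declaration: internal linkage, private to this file.
  int hidden_helper(int x) { return x + 2; }
}

// Uses both helpers so that neither is discarded as unused.
int use_helpers(int x)
{
  return visible_helper(x) + hidden_helper(x);
}
```

Inspecting the resulting object file with a symbol-listing tool is one way to check which of the two names is exported by a given compiler.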

This patch removes the only use of extern "C" from our local copy of
Ryu, along with some declarations for never-defined functions that GCC
now warns about.

Tested on x86_64-pc-linux-gnu, and also verified that generic_to_chars
is not visible from libstdc++.a.  Does this look OK for trunk and the 11
branch?


Now with a testcase:

-- >8 --

Subject: [PATCH] libstdc++: Remove extern "C" from Ryu sources

floating_to_chars.cc includes the Ryu sources into an anonymous
namespace as a convenient way to give all its symbols internal linkage.
But an entity declared extern "C" always has external linkage, even
from within an anonymous namespace, so this trick doesn't work in the
presence of extern "C", and it causes the Ryu function generic_to_chars
to be visible from libstdc++.a.

This patch removes the only use of extern "C" from our local copy of
Ryu along with some declarations for never-defined functions that GCC
now warns about.

Tested on x86_64-pc-linux-gnu, and also verified that generic_to_chars
is not visible from libstdc++.a.  Does this look OK for trunk and the 11
branch?

libstdc++-v3/ChangeLog:

* src/c++17/ryu/LOCAL_PATCHES: Update.
* src/c++17/ryu/ryu_generic_128.h: Remove extern "C".
Remove declarations for never-defined functions.
* testsuite/20_util/to_chars/4.cc: New test.
---
libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES |  1 +
libstdc++-v3/src/c++17/ryu/ryu_generic_128.h | 21 ++--
libstdc++-v3/testsuite/20_util/to_chars/4.cc | 36 
3 files changed, 40 insertions(+), 18 deletions(-)
create mode 100644 libstdc++-v3/testsuite/20_util/to_chars/4.cc

diff --git a/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES 
b/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
index 51e504cb6ea..72ffad9662d 100644
--- a/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
+++ b/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
@@ -1,2 +1,3 @@
r11-6248
r11-7636
+r12-XXX
diff --git a/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h 
b/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
index 2afbf274e11..6d988ab01eb 100644
--- a/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
+++ b/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
@@ -18,9 +18,9 @@
#define RYU_GENERIC_128_H


-#ifdef __cplusplus
-extern "C" {
-#endif
+// NOTE: These symbols are declared extern "C" upstream, but we don't want that
+// because it'd override the internal linkage of the anonymous namespace into
+// which this header is included.

// This is a generic 128-bit implementation of float to shortest conversion
// using the Ryu algorithm. It can handle any IEEE-compatible floating-point
@@ -42,18 +42,6 @@ struct floating_decimal_128 {
  bool sign;
};

-struct floating_decimal_128 float_to_fd128(float f);
-struct floating_decimal_128 double_to_fd128(double d);
-
-// According to wikipedia (https://en.wikipedia.org/wiki/Long_double), this 
likely only works on
-// x86 with specific compilers (clang?). May need an ifdef.
-struct floating_decimal_128 long_double_to_fd128(long double d);
-
-// Converts the given binary floating point number to the shortest decimal 
floating point number
-// that still accurately represents it.
-struct floating_decimal_128 generic_binary_to_decimal(
-const uint128_t bits, const uint32_t mantissaBits, const uint32_t 
exponentBits, const bool explicitLeadingBit);
-
// Converts the given decimal floating point number to a string, writing to 
result, and returning
// the number characters written. Does not terminate the buffer with a 0. In 
the worst case, this
// function can write up to 53 characters.
@@ -63,8 +51,5 @@ struct floating_decimal_128 generic_binary_to_decimal(
// = 1 + 39 + 1 + 1 + 1 + 10 = 53
int generic_to_chars(const struct floating_decimal_128 v, char* const result);

-#ifdef __cplusplus
-}
-#endif

#endif // RYU_GENERIC_128_H
diff --git a/libstdc++-v3/testsuite/20_util/to_chars/4.cc 
b/libstdc++-v3/testsuite/20_util/to_chars/4.cc
new file mode 100644
index 000..96f6e5d010c
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/to_chars/4.cc
@@ -0,0 +1,36 @@
+// Copyright (C) 2021 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without 

[committed] libstdc++: Fix tests that fail in C++98 mode

2021-05-11 Thread Jonathan Wakely via Gcc-patches
The header synopsis test fails to define NOTHROW for C++98.

The shared_ptr test should be skipped for C++98.

The debug mode one should work for C++98 too, it just needs to avoid
C++11 syntax that isn't valid in C++98.
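
A compact sketch of both techniques, not taken from the tests themselves; the function names `answer` and `copy_first_element` are invented for this example. The macro expands to nothing before C++11, and the set is built from an array range instead of a braced initializer list:

```cpp
#include <algorithm>
#include <set>
#include <vector>

#if __cplusplus >= 201103L
# define NOTHROW noexcept
#else
# define NOTHROW
#endif

// Declared with the macro so the same line is valid in every -std mode.
int answer() NOTHROW { return 42; }

// Copies the first element of a set into a one-element vector using only
// syntax that is also valid in C++98.
std::vector<int> copy_first_element()
{
  int two[] = { 0, 1 };
  const std::set<int> source(two, two + 2);
  std::vector<int> dest(1);
  std::set<int>::const_iterator second = source.begin();
  ++second;
  std::copy(source.begin(), second, dest.begin());
  return dest;
}
```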

libstdc++-v3/ChangeLog:

* testsuite/20_util/headers/memory/synopsis.cc: Define C++98
alternative for macro.
* testsuite/20_util/shared_ptr/creation/99006.cc: Add effective
target keyword.
* testsuite/25_algorithms/copy/debug/99402.cc: Avoid C++11
syntax.

Tested powerpc64le-linux. Committed to trunk.

commit 37407a2ae701c0a93377106a2938ab5474062fc3
Author: Jonathan Wakely 
Date:   Tue May 11 17:14:26 2021

libstdc++: Fix tests that fail in C++98 mode

The header synopsis test fails to define NOTHROW for C++98.

The shared_ptr test should be skipped for C++98.

The debug mode one should work for C++98 too, it just needs to avoid
C++11 syntax that isn't valid in C++98.

libstdc++-v3/ChangeLog:

* testsuite/20_util/headers/memory/synopsis.cc: Define C++98
alternative for macro.
* testsuite/20_util/shared_ptr/creation/99006.cc: Add effective
target keyword.
* testsuite/25_algorithms/copy/debug/99402.cc: Avoid C++11
syntax.

diff --git a/libstdc++-v3/testsuite/20_util/headers/memory/synopsis.cc 
b/libstdc++-v3/testsuite/20_util/headers/memory/synopsis.cc
index 1463fcf9468..9a4264a0759 100644
--- a/libstdc++-v3/testsuite/20_util/headers/memory/synopsis.cc
+++ b/libstdc++-v3/testsuite/20_util/headers/memory/synopsis.cc
@@ -22,6 +22,8 @@
 
 #if __cplusplus >= 201103L
 # define NOTHROW noexcept
+#else
+# define NOTHROW
 #endif
 
 namespace std {
diff --git a/libstdc++-v3/testsuite/20_util/shared_ptr/creation/99006.cc 
b/libstdc++-v3/testsuite/20_util/shared_ptr/creation/99006.cc
index d5f7a5da5e9..e070fb9d420 100644
--- a/libstdc++-v3/testsuite/20_util/shared_ptr/creation/99006.cc
+++ b/libstdc++-v3/testsuite/20_util/shared_ptr/creation/99006.cc
@@ -1,5 +1,5 @@
-// FIXME: This should use { target { ! c++20 } }
-// { dg-do compile }
+// FIXME: This should use { target { c++11 && { ! c++20 } } }
+// { dg-do compile { target { c++11 } } }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/25_algorithms/copy/debug/99402.cc 
b/libstdc++-v3/testsuite/25_algorithms/copy/debug/99402.cc
index 041d222d079..9a9c97af605 100644
--- a/libstdc++-v3/testsuite/25_algorithms/copy/debug/99402.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/copy/debug/99402.cc
@@ -28,8 +28,9 @@ using namespace std;
 
 int main()
 {
+int two[] = { 0, 1 };
 // any container with non-random access iterators:
-const set<int> source = { 0, 1 };
+const set<int> source(two, two + 2);
 vector<int> dest(1);
 copy(source.begin(), ++source.begin(), dest.begin());
 }


[committed] libstdc++: Fix missing members in std::allocator

2021-05-11 Thread Jonathan Wakely via Gcc-patches
The changes in 75c6a925dab5b7af9ab47c10906cb0e140261cc2 were slightly
incorrect, because the converting constructor should be noexcept, and
the POCMA and is_always_equal traits should still be present in C++20.
This fixes it, and slightly refactors the preprocessor conditions and
order of members. Also add comments explaining things.

The non-standard construct and destroy members added for PR 78052 can be
private if allocator_traits<allocator<void>> is made a friend.

libstdc++-v3/ChangeLog:

* include/bits/allocator.h (allocator) [C++20]: Add
missing noexcept to constructor. Restore missing POCMA and
is_always_equal_traits.
[C++17]: Make construct and destroy members private and
declare allocator_traits as a friend.
* include/bits/memoryfwd.h (allocator_traits): Declare.
* include/ext/malloc_allocator.h (malloc_allocator::allocate):
Add nodiscard attribute. Add static assertion for LWG 3307.
* include/ext/new_allocator.h (new_allocator::allocate): Add
static assertion for LWG 3307.
* testsuite/20_util/allocator/void.cc: Check that converting
constructor is noexcept. Check for propagation traits and
size_type and difference_type. Check that pointer and
const_pointer are gone in C++20.

Tested powerpc64le-linux. Committed to trunk. This needs to be
backported to 10 and 11, but I won't make the construct/destroy
members private on the branches.
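
As a quick way to see the user-visible effect, the following sketch queries
the two traits and, in C++20 mode only, the noexcept-ness of the converting
constructor.  It goes through std::allocator_traits rather than naming the
members directly, so it is an illustration under that assumption and not a
copy of the new test; the helper names propagates_on_move and always_equal
are made up.

```cpp
#include <memory>
#include <type_traits>

typedef std::allocator<void> void_alloc;
typedef std::allocator_traits<void_alloc> void_traits;

// Queried through allocator_traits, so this compiles even if a library
// does not declare the member typedefs on allocator<void> itself.
bool propagates_on_move()
{ return void_traits::propagate_on_container_move_assignment::value; }

bool always_equal()
{ return void_traits::is_always_equal::value; }

#if __cplusplus >= 202002L
// C++20 only: allocator<void> must be constructible from another
// std::allocator specialization, and that constructor must not throw.
static_assert(std::is_nothrow_constructible<void_alloc,
					    const std::allocator<int>&>::value,
	      "converting constructor should be noexcept");
#endif
```

The static assertion is guarded because the converting constructor for
allocator<void> is only expected in C++20 and later.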



commit 5e3a1ea3d89d62972e1f036b2ede37a80b880bdf
Author: Jonathan Wakely 
Date:   Tue May 11 15:01:01 2021

libstdc++: Fix missing members in std::allocator

The changes in 75c6a925dab5b7af9ab47c10906cb0e140261cc2 were slightly
incorrect, because the converting constructor should be noexcept, and
the POCMA and is_always_equal traits should still be present in C++20.
This fixes it, and slightly refactors the preprocessor conditions and
order of members. Also add comments explaining things.

The non-standard construct and destroy members added for PR 78052 can be
private if allocator_traits<allocator<void>> is made a friend.

libstdc++-v3/ChangeLog:

* include/bits/allocator.h (allocator) [C++20]: Add
missing noexcept to constructor. Restore missing POCMA and
is_always_equal_traits.
[C++17]: Make construct and destroy members private and
declare allocator_traits as a friend.
* include/bits/memoryfwd.h (allocator_traits): Declare.
* include/ext/malloc_allocator.h (malloc_allocator::allocate):
Add nodiscard attribute. Add static assertion for LWG 3307.
* include/ext/new_allocator.h (new_allocator::allocate): Add
static assertion for LWG 3307.
* testsuite/20_util/allocator/void.cc: Check that converting
constructor is noexcept. Check for propagation traits and
size_type and difference_type. Check that pointer and
const_pointer are gone in C++20.

diff --git a/libstdc++-v3/include/bits/allocator.h 
b/libstdc++-v3/include/bits/allocator.h
index c5c1f28b3d0..73d5d7a25be 100644
--- a/libstdc++-v3/include/bits/allocator.h
+++ b/libstdc++-v3/include/bits/allocator.h
@@ -60,6 +60,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @{
*/
 
+  // Since C++20 the primary template should be used for allocator<void>,
+  // but then it would have a non-trivial default ctor and dtor, which
+  // would be an ABI change. So C++20 still uses the allocator<void> explicit
+  // specialization, with the historical ABI properties, but with the same
+  // members that are present in the primary template.
+
+#if ! _GLIBCXX_INLINE_VERSION
   /// allocator<void> specialization.
   template<>
 class allocator<void>
@@ -68,28 +75,40 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typedef void    value_type;
   typedef size_t  size_type;
   typedef ptrdiff_t   difference_type;
+
 #if __cplusplus <= 201703L
+  // These were removed for C++20.
   typedef void*   pointer;
   typedef const void* const_pointer;
 
   template<typename _Tp1>
struct rebind
{ typedef allocator<_Tp1> other; };
-#else
-  allocator() = default;
+#endif
 
-  template<typename _Up>
-   constexpr
-   allocator(const allocator<_Up>&) { }
-#endif // ! C++20
-
-#if __cplusplus >= 201103L && __cplusplus <= 201703L
+#if __cplusplus >= 201103L
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 2103. std::allocator propagate_on_container_move_assignment
   typedef true_type propagate_on_container_move_assignment;
 
   typedef true_type is_always_equal;
 
+#if __cplusplus >= 202002L
+  allocator() = default;
+
+  template<typename _Up>
+   constexpr
+   allocator(const allocator<_Up>&) noexcept { }
+
+  // No allocate member because it's ill-formed by LWG 3307.
+  // No deallocate member because it would be undefined to call it
+  // with any pointer which wasn't obtained from allocate.
+
+#else // ! C++20
+   

Re: [PATCH 02/57] Support scanning of build-time GC roots in gengtype

2021-05-11 Thread Bill Schmidt via Gcc-patches
Hi!  I'd like to ping this specific patch from the series, which is the 
only one remaining that affects common code.  I confess that I don't 
know whom to ask for a review for gengtype; I didn't get any good ideas 
from MAINTAINERS.  If you know of a good reviewer candidate, please CC them.


In any case, this is a reasonably straightforward patch.  It allows 
generated header files to be identified as "./header.h" and included 
among the files that gengtype scans for GC roots.


Thank you!
Bill

On 4/27/21 10:32 AM, Bill Schmidt wrote:

Currently gengtype supports scanning target-specific files for GC roots,
but those files must exist in the source tree.  This patch extends the
support to include header files generated into the build directory.  It
also allows targets to specify build dependencies for s-gtype to ensure
the built headers are up to date prior to running gengtype.
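
To make the naming convention concrete, here is a small standalone sketch
of the check described above: a build header is a name that starts with
"./" and ends in ".h", and the leading "./" is stripped before use.  This
is not the gengtype code itself; is_build_header and build_header_name are
hypothetical helpers, and the final comparison against 'h' is inferred from
the description rather than copied from the patch.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: true for names of the form "./<something>.h".
// The shortest such name, "./x.h", has five characters.
bool is_build_header(const char *fnam)
{
  const std::size_t len = std::strlen(fnam);
  return len >= 5
	 && fnam[0] == '.' && fnam[1] == '/'
	 && fnam[len - 2] == '.' && fnam[len - 1] == 'h';  // 'h' is assumed
}

// Hypothetical helper: returns the name with the initial "./" removed
// for build headers, and the name unchanged for everything else.
std::string build_header_name(const char *fnam)
{
  return is_build_header(fnam) ? std::string(fnam + 2) : std::string(fnam);
}
```

With these helpers, "./rs6000-builtins.h" would be treated as a build
header named "rs6000-builtins.h", while a source-tree path is left alone.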

2021-04-02  Bill Schmidt  

gcc/
* Makefile.in (EXTRA_GTYPE_DEPS): New variable.
(s-gtype): Depend on EXTRA_GTYPE_DEPS.
* gengtype-state.c (state_writer::write_state_files_list): Add a
parameter to the fileslist expression for the number of build
headers to scan.
(read_state_file_list): Detect build headers and strip the initial
"./" from their names.
* gengtype.c (build_headers): New global variable.
(num_build_headers): Likewise.
(open_base_files): Emit #include for each build header.
(main): Detect and count build headers.
* gengtype.h (build_headers): New extern variable.
(num_build_headers): Likewise.
---
  gcc/Makefile.in  |  5 +++--
  gcc/gengtype-state.c | 29 +++--
  gcc/gengtype.c   | 19 ---
  gcc/gengtype.h   |  5 +
  4 files changed, 47 insertions(+), 11 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2fd94fc7dba..1a253256042 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -561,6 +561,7 @@ out_object_file=@out_object_file@
  OUT_FILE_DEPS=
  common_out_file=$(srcdir)/common/config/@common_out_file@
  common_out_object_file=@common_out_object_file@
+EXTRA_GTYPE_DEPS=
  md_file=$(srcdir)/common.md $(srcdir)/config/@md_file@
  tm_file_list=@tm_file_list@
  tm_include_list=@tm_include_list@
@@ -2740,8 +2741,8 @@ s-gtyp-input: Makefile
$(SHELL) $(srcdir)/../move-if-change tmp-gi.list gtyp-input.list
$(STAMP) s-gtyp-input
  
-s-gtype: build/gengtype$(build_exeext) $(filter-out [%], $(GTFILES)) \
-gtyp-input.list
+s-gtype: $(EXTRA_GTYPE_DEPS) build/gengtype$(build_exeext) \
+   $(filter-out [%], $(GTFILES)) gtyp-input.list
  # First, parse all files and save a state file.
$(RUN_GEN) build/gengtype$(build_exeext) $(GENGTYPE_FLAGS) \
  -S $(srcdir) -I gtyp-input.list -w tmp-gtype.state
diff --git a/gcc/gengtype-state.c b/gcc/gengtype-state.c
index 891f2e18a61..be3549dce33 100644
--- a/gcc/gengtype-state.c
+++ b/gcc/gengtype-state.c
@@ -1269,7 +1269,7 @@ state_writer::write_state_files_list (void)
int i = 0;
/* Write the list of files with their lang_bitmap.  */
begin_s_expr ("fileslist");
-  fprintf (state_file, "%d", (int) num_gt_files);
+  fprintf (state_file, "%d %d", (int) num_gt_files, (int) num_build_headers);
for (i = 0; i < (int) num_gt_files; i++)
  {
const char *cursrcrelpath = NULL;
@@ -2456,16 +2456,20 @@ read_state_files_list (void)
struct state_token_st *t0 = peek_state_token (0);
struct state_token_st *t1 = peek_state_token (1);
struct state_token_st *t2 = peek_state_token (2);
+  struct state_token_st *t3 = peek_state_token (3);
  
if (state_token_kind (t0) == STOK_LEFTPAR
&& state_token_is_name (t1, "!fileslist")
-  && state_token_kind (t2) == STOK_INTEGER)
+  && state_token_kind (t2) == STOK_INTEGER
+  && state_token_kind (t3) == STOK_INTEGER)
  {
-  int i = 0;
+  int i = 0, j = 0;
num_gt_files = t2->stok_un.stok_num;
-  next_state_tokens (3);
-  t0 = t1 = t2 = NULL;
+  num_build_headers = t3->stok_un.stok_num;
+  next_state_tokens (4);
+  t0 = t1 = t2 = t3 = NULL;
gt_files = XCNEWVEC (const input_file *, num_gt_files);
+  build_headers = XCNEWVEC (const char *, num_build_headers);
for (i = 0; i < (int) num_gt_files; i++)
{
  bool issrcfile = FALSE;
@@ -2498,7 +2502,20 @@ read_state_files_list (void)
  free (fullpath);
}
  else
-   curgt = input_file_by_name (fnam);
+   {
+ curgt = input_file_by_name (fnam);
+ /* Look for a header file created during the build,
+which looks like "./.h".  */
+ int len = strlen (fnam);
+ if (len >= 5 && fnam[0] == '.' && fnam[1] == '/'
+ && fnam[len-2] == '.' && fnam[len-1] == 

Re: [PATCH 00/57] Replace the Power target-specific built-in machinery

2021-05-11 Thread Bill Schmidt via Gcc-patches
Hi!  I'd like to ping this series.  This is a big change, so I'd like to 
get it committed fairly early in stage 1.  I know you have a lot stacked 
up, though.


Thanks!
Bill

On 4/27/21 10:32 AM, Bill Schmidt wrote:

The design of the target-specific built-in function support in the
Power back end has not stood the test of time.  The machinery is
grossly inefficient, confusing, and arcane; and adding new built-in
functions is inefficient and error-prone.  This patch set introduces a
replacement.

Because of the scope of the changes, it's important to be able to
verify that the new system makes only intended changes to the
functions that are supported.  Therefore this patch set adds a new
mechanism, and (in the final patch) enables it instead of the existing
support, but does not yet remove the old support.  That will happen in
a follow-up patch once we're comfortable with the new system.

Most of the patches in this set are specific to the rs6000 back end.
However, the first two patches make changes in common code and require
review from the appropriate maintainers.  Jakub and Jeff, I would
appreciate it if you could look at these two small patches.

After these changes are upstream, adding new built-in functions will
usually be as simple as adding two lines to a file,
rs6000-builtin-new.def, that give the prototype of the function and a
little additional information.  Adding new overloaded functions will
require adding a new section to another file, rs6000-overload.def,
with one line describing the overload information, and two lines for
each function to be dispatched to from the overloaded function.

The patches are divided into the following sections.

Patches 0001-0002: Common code patches

   Patch 0001 adds a mechanism to the Makefile to allow specifying
   additional dependencies for "out_object_file", which is rs6000.o for
   the rs6000 back end.  I found this necessary to be able to have
   rs6000.o depend on a header file generated during the build.

   Patch 0002 expands the gengtype machinery to scan header files
   created during the build for GC roots.

Patches 0003, 0005-0023: Generator program

   A new program, rs6000-gen-builtins, is created and executed during
   the build.  It reads rs6000-builtin-new.def and rs6000-overload.def
   and produces three output files:  rs6000-builtins.h,
   rs6000-builtins.c, and rs6000-vecdefines.h.  rs6000-builtins.h
   defines the data structures representing the built-in functions,
   overloaded functions, overload instantiations, and function type
   specifiers.  rs6000-builtins.c contains static initializers for the
   data structures, as well as the function rs6000_autoinit_builtins
   that performs additional run-time initialization.
   rs6000-vecdefines.h contains a set of #defines that map external
   identifiers such as vec_add to their internal builtin names, such as
   __builtin_vec_add.  This replaces most of the similar #defines
   previously contained in altivec.h, which now #includes the new file
   instead.

   This set of patches adds the source for the generator program.

Patches 0024-0025: Target build machinery

   These patches make changes to config.gcc and t-rs6000 to build and
   run the new generator program, and to ensure that the garbage
   collection roots in rs6000-builtins.h are scanned by gengtype.

Patches 0004, 0026-0031, 0033-0037: Input files

   These patches build up the input files to the generator program,
   listing all of the built-in functions and overloads to be
   processed.

Patch 0032: Add pointer types

   This patch creates and caches a bunch of pointer type nodes.  The
   existing built-in machinery, for some reason, only created base
   types up front and created the pointer types on demand (over and
   over and over again).  The new mechanism needs all the type nodes
   available, so we add them here.

Patch 0038: Call rs6000_autoinit_builtins

Patch 0039: A little special handling for Darwin

Patches 0040-0041: Miscellaneous support patches

Patch 0042: Rewrite the overload processing

   Most of this code remains largely the same as before, with the same
   special handling for a few interesting built-in functions.  But the
   general handling of overloaded functions is now much more efficient
   since the new data structures are designed for quick lookup, whereas
   the old machinery does a brutal linear search.

Patch 0043: Rewrite gimple folding

   The "rewrite" here consists entirely of changing the names of the
   builtins to be processed, since we need a separate enumeration of
   builtins for the new machinery.

Patch 0044: Vectorization support

   Small updates to the functions used for mapping built-ins to their
   vectorized counterparts.

Patches 0045-0050: Rewrite built-in function expansion

   This is where most of the meat comes in.  Lookup of built-ins at
   expand time is again much more efficient, replacing the old
   mechanism of multiple linear searches over the whole built-in
   

Re: [PATCH 0/4] [rs6000] ROP support

2021-05-11 Thread Bill Schmidt via Gcc-patches
Hi!  I'd like to ping this series.  It has slightly higher priority from 
my perspective, since I'd like this to be backported in time for GCC 11.2.


Thanks!
Bill

On 4/25/21 8:50 PM, Bill Schmidt via Gcc-patches wrote:

Add POWER10 support for hashst[p] and hashchk[p] operations.  When
the -mrop-protect option is selected, any function that loads the link
register from memory before returning must have protection in the
prologue and epilogue to ensure the link register save location has
not been compromised.  If -mprivileged is also specified, the
protection instructions generated require supervisor privilege.

The patches are broken up into logical chunks:
  - Option handling
  - Instruction generation
  - Predefined macro handling
  - Test cases

Bootstrapped and tested on a POWER10 system with no regressions.
Tests on a kernel that enables user-space ROP mitigation were
successful.  Is this series ok for trunk?  I would also like to later
backport these patches to GCC for the 11.2 release.

Thanks!
Bill

Bill Schmidt (4):
   rs6000: Add -mrop-protect and -mprivileged flags
   rs6000: Emit ROP-protect instructions in prologue and epilogue
   rs6000: Conditionally define __ROP_PROTECT__
   rs6000: Add ROP tests

  gcc/config/rs6000/rs6000-c.c |  3 +
  gcc/config/rs6000/rs6000-internal.h  |  2 +
  gcc/config/rs6000/rs6000-logue.c | 86 +---
  gcc/config/rs6000/rs6000.c   |  7 ++
  gcc/config/rs6000/rs6000.md  | 39 +++
  gcc/config/rs6000/rs6000.opt |  6 ++
  gcc/doc/invoke.texi  | 19 +-
  gcc/testsuite/gcc.target/powerpc/rop-1.c | 16 +
  gcc/testsuite/gcc.target/powerpc/rop-2.c | 16 +
  gcc/testsuite/gcc.target/powerpc/rop-3.c | 19 ++
  gcc/testsuite/gcc.target/powerpc/rop-4.c | 14 
  gcc/testsuite/gcc.target/powerpc/rop-5.c | 17 +
  12 files changed, 231 insertions(+), 13 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/rop-1.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/rop-2.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/rop-3.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/rop-4.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/rop-5.c



Re: [PATCH] OpenMP: Add support for 'close' in map clause

2021-05-11 Thread Jakub Jelinek via Gcc-patches
On Tue, May 11, 2021 at 05:30:19PM +0200, Tobias Burnus wrote:
> On 11.05.21 17:20, Jakub Jelinek via Gcc-patches wrote:
> > One extra thing, sorry, forgot to mention, for the translators it might be
> > better to use "too many %qs modifiers", "always" (or, "close").
> > That way they can translate it just once instead of twice.
> 
> That won't work for
> 
> c_parser_error (parser, "expected modifier % only once");
> 
> as the error function is not variadic, contrary to warning_at/error_at; namely
> 
> c/c-parser.h:extern bool c_parser_error (c_parser *parser, const char 
> *gmsgid);

Ok, ignore that part then.  Sorry.

Jakub



Re: [PATCH] OpenMP: Add support for 'close' in map clause

2021-05-11 Thread Tobias Burnus

On 11.05.21 17:20, Jakub Jelinek via Gcc-patches wrote:

One extra thing, sorry, forgot to mention, for the translators it might be
better to use "too many %qs modifiers", "always" (or, "close").
That way they can translate it just once instead of twice.


That won't work for

c_parser_error (parser, "expected modifier % only once");

as the error function is not variadic, contrary to warning_at/error_at; namely

c/c-parser.h:extern bool c_parser_error (c_parser *parser, const char *gmsgid);

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf


Re: [PATCH] OpenMP: Add support for 'close' in map clause

2021-05-11 Thread Jakub Jelinek via Gcc-patches
On Tue, May 11, 2021 at 04:27:55PM +0200, Marcel Vollweiler wrote:
> > The usual wording would be
> > "too many % modifiers"
> > 
> 
> Changed for 'always' and 'close' for C and C++.

One extra thing, sorry, forgot to mention, for the translators it might be
better to use "too many %qs modifiers", "always" (or, "close").
That way they can translate it just once instead of twice.

> > IMHO you should at least check that tok->type == CPP_NAME before
> > checking pos + 1 token's type, you don't want to skip over CPP_EOF,
> > CPP_PRAGMA_EOF, or even CPP_CLOSE_PAREN etc.
> > Perhaps by adding
> >if (tok->type != CPP_NAME)
> >   break;
> > right after c_token *tok = c_parser_peek_nth_token_raw (parser, pos); ?
> 
> The check of the token's type at position 'pos' is done in the condition
> of the while loop, that means
>'c_parser_peek_nth_token_raw (parser, pos + 1)->type == CPP_COLON'
> is only reached when
>'c_parser_peek_nth_token_raw (parser, pos)->type == CPP_NAME'
> holds (since 'pos' is not changed in between).

You're right.

> > And, IMHO something should clear always and close (btw, might be better
> > to use close_modifier as variable name and for consistency always_modifier)
> > unless we reach the CPP_COLON case.
> > 
> 
> Good point, I agree with both. Cleared and renamed :)

I think the clearing is still insufficient.
It will clear on e.g. map (always, close, foobar)
but not on map (always) or map (always, close)
because in that case the loop terminates by the while loop condition
no longer being true.

And there is another thing I have missed (and again should be in the
testsuite):
map (always, always)
or
map (always, close, close)
etc. will with the patch be diagnosed as too many 'always' (or 'close')
modifiers, but that isn't the correct diagnostic: there aren't any
modifiers, the same variable is just mapped multiple times.

So, one possibility is to remember details like:
potential always modifier has been seen
potential always modifier has been seen more than once
potential close modifier has been seen
potential close modifier has been seen more than once
and only when seeing the colon enact them and diagnose too many modifiers
(but then not with cp_parser_error but error with a location_t of one of the
modifiers), e.g. always_modifier == -1 would mean 1 potential has been seen,
== -2 more than one potential and == 1 it was modifier.

Or another one is not to do much in the first raw token lookup loop,
just check if it is a sequence of
always
close
,
tokens followed by
CPP_NAME (other than always, close) + CPP_COLON combo
and in that case just set a bool flag that map-kind is present,
but don't consume any tokens.
And then in another loop if that bool flag is set, lookup non-raw
tokens and parse them, setting flags, doing diagnostics etc.
Basically do the look-ahead only to check if it is
map (var1, var2, ...)
case
or
map (modifiers map-kind: var1, var2, ...)
case.
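
The second option could be sketched roughly like this.  It is only an
illustration on a simplified, hypothetical token model (tok, tok_kind and
map_kind_present are made-up names, not the c_token/cp_token API), and it
shows just the look-ahead: decide whether a map-kind is present without
consuming anything.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified token model standing in for the real tokens.
enum tok_kind { TOK_NAME, TOK_COMMA, TOK_COLON, TOK_CLOSE_PAREN };

struct tok
{
  tok_kind kind;
  std::string text;	// only meaningful for TOK_NAME
};

static bool is_modifier(const tok &t)
{
  return t.kind == TOK_NAME && (t.text == "always" || t.text == "close");
}

// Returns true if the tokens following "map (" look like
//   { always | close | , }*  NAME ':'
// where NAME is neither "always" nor "close".  Nothing is consumed and
// no flags are set; that is left to a later, non-raw parsing loop.
bool map_kind_present(const std::vector<tok> &toks)
{
  std::size_t pos = 0;
  // Skip the run of potential modifiers and commas.
  while (pos < toks.size()
	 && (is_modifier(toks[pos]) || toks[pos].kind == TOK_COMMA))
    ++pos;
  // The loop stopped at a token that is not a modifier, so a NAME here
  // is necessarily something other than always/close.
  return pos + 1 < toks.size()
	 && toks[pos].kind == TOK_NAME
	 && toks[pos + 1].kind == TOK_COLON;
}
```

With this shape, map (always, close) and map (always, always) are seen as
plain variable lists because no NAME + colon pair follows, while
map (always, close, to: x) is recognized as having a map-kind.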

Jakub



Re: [PATCH] libstdc++: Remove extern "C" from Ryu sources

2021-05-11 Thread Patrick Palka via Gcc-patches
On Tue, 11 May 2021, Patrick Palka wrote:

> floating_to_chars.cc includes the Ryu sources into an anonymous
> namespace as a convenient way to give all its symbols internal linkage.
> But an entity declared extern "C" always has external linkage, even
> from within an anonymous namespace, so this trick doesn't work in the
> presence of extern "C", and it causes the Ryu function generic_to_chars
> to be visible from libstdc++.a.
> 
> This patch removes the only use of extern "C" from our local copy of
> Ryu, along with some declarations for never-defined functions that GCC
> now warns about.
> 
> Tested on x86_64-pc-linux-gnu, and also verified that generic_to_chars
> is not visible from libstdc++.a.  Does this look OK for trunk and the 11
> branch?

Now with a testcase:

-- >8 --

Subject: [PATCH] libstdc++: Remove extern "C" from Ryu sources

floating_to_chars.cc includes the Ryu sources into an anonymous
namespace as a convenient way to give all its symbols internal linkage.
But an entity declared extern "C" always has external linkage, even
from within an anonymous namespace, so this trick doesn't work in the
presence of extern "C", and it causes the Ryu function generic_to_chars
to be visible from libstdc++.a.

This patch removes the only use of extern "C" from our local copy of
Ryu along with some declarations for never-defined functions that GCC
now warns about.

Tested on x86_64-pc-linux-gnu, and also verified that generic_to_chars
is not visible from libstdc++.a.  Does this look OK for trunk and the 11
branch?

libstdc++-v3/ChangeLog:

* src/c++17/ryu/LOCAL_PATCHES: Update.
* src/c++17/ryu/ryu_generic_128.h: Remove extern "C".
Remove declarations for never-defined functions.
* testsuite/20_util/to_chars/4.cc: New test.
---
 libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES |  1 +
 libstdc++-v3/src/c++17/ryu/ryu_generic_128.h | 21 ++--
 libstdc++-v3/testsuite/20_util/to_chars/4.cc | 36 
 3 files changed, 40 insertions(+), 18 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/to_chars/4.cc

diff --git a/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES 
b/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
index 51e504cb6ea..72ffad9662d 100644
--- a/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
+++ b/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
@@ -1,2 +1,3 @@
 r11-6248
 r11-7636
+r12-XXX
diff --git a/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h 
b/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
index 2afbf274e11..6d988ab01eb 100644
--- a/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
+++ b/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
@@ -18,9 +18,9 @@
 #define RYU_GENERIC_128_H
 
 
-#ifdef __cplusplus
-extern "C" {
-#endif
+// NOTE: These symbols are declared extern "C" upstream, but we don't want that
+// because it'd override the internal linkage of the anonymous namespace into
+// which this header is included.
 
 // This is a generic 128-bit implementation of float to shortest conversion
 // using the Ryu algorithm. It can handle any IEEE-compatible floating-point
@@ -42,18 +42,6 @@ struct floating_decimal_128 {
   bool sign;
 };
 
-struct floating_decimal_128 float_to_fd128(float f);
-struct floating_decimal_128 double_to_fd128(double d);
-
-// According to wikipedia (https://en.wikipedia.org/wiki/Long_double), this 
likely only works on
-// x86 with specific compilers (clang?). May need an ifdef.
-struct floating_decimal_128 long_double_to_fd128(long double d);
-
-// Converts the given binary floating point number to the shortest decimal 
floating point number
-// that still accurately represents it.
-struct floating_decimal_128 generic_binary_to_decimal(
-const uint128_t bits, const uint32_t mantissaBits, const uint32_t 
exponentBits, const bool explicitLeadingBit);
-
 // Converts the given decimal floating point number to a string, writing to 
result, and returning
 // the number characters written. Does not terminate the buffer with a 0. In 
the worst case, this
 // function can write up to 53 characters.
@@ -63,8 +51,5 @@ struct floating_decimal_128 generic_binary_to_decimal(
 // = 1 + 39 + 1 + 1 + 1 + 10 = 53
 int generic_to_chars(const struct floating_decimal_128 v, char* const result);
 
-#ifdef __cplusplus
-}
-#endif
 
 #endif // RYU_GENERIC_128_H
diff --git a/libstdc++-v3/testsuite/20_util/to_chars/4.cc 
b/libstdc++-v3/testsuite/20_util/to_chars/4.cc
new file mode 100644
index 000..96f6e5d010c
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/to_chars/4.cc
@@ -0,0 +1,36 @@
+// Copyright (C) 2021 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the 

Re: [PATCH, OG10, OpenMP 5.0, committed] Remove array section base-pointer mapping semantics, and other front-end adjustments.

2021-05-11 Thread Thomas Schwinge
Hi Chung-Lin!

On 2021-05-11T19:28:04+0800, Chung-Lin Tang  wrote:
> This patch largely implements three pieces of functionality:
>
> (1) Per discussion and clarification on the omp-lang mailing list,
> standards conforming behavior for mapping array sections should *NOT* also 
> map the base-pointer,
> i.e for this code:
>
> struct S { int *ptr; ... };
> struct S s;
> #pragma omp target enter data map(to: s.ptr[:100])
>
> Currently we generate after gimplify:
> #pragma omp target enter data map(struct:s [len: 1]) map(alloc:s.ptr [len: 
> 8]) \
>map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 
> 0])
>
> which is deemed incorrect. After this patch, the gimplify results are now 
> adjusted to:
> #pragma omp target enter data map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 
> 0])
> (the attach operation is still generated, and if s.ptr is already mapped 
> prior, attachment will happen)
>
> The correct way of achieving the base-pointer-also-mapped behavior would be 
> to use:
> #pragma omp target enter data map(to: s.ptr, s.ptr[:100])
>
> This adjustment in behavior required a number of small adjustments here and 
> there in gimplify, including
> to accommodate map sequences for C++ references.

I'm a bit confused by that -- this mandates the bulk of the testsuite
changes that you've included, and these seem a step backwards in terms of
user experience, but then, I have no state on the exact OpenMP
specification requirements, so you certainly may be right on that.  (And
also, as Julian mentioned, how this relates to OpenACC semantics, which I
also haven't considered in detail -- but I note you didn't adjust any
OpenACC testcases for that, so I suppose that's really conditionalized to
OpenMP only.)

> There is also a small Fortran front-end patch involved (hence CCing Tobias).
> The new gimplify processing changed behavior in handling 
> GOMP_MAP_ALWAYS_POINTER maps such that
> the libgomp.fortran/struct-elem-map-1.f90 regressed. It appeared that the 
> Fortran FE was generating
> a GOMP_MAP_ALWAYS_POINTER for array types, which didn't seem quite correct, 
> and the pre-patch behavior
> was removing this map anyways. I have a small change in 
> trans-openmp.c:gfc_trans_omp_array_section
> to not generate the map in this case, and so far no bad test results.

Makes sense to argue that one separately, with testcases, for the master
branch submission?

> (2) The second part (though kind of related to the first above) are fixes in 
> libgomp/target.c
> to not overwrite attached pointers when handling device<->host copies, mainly 
> for the "always" case.
> This behavior is also noted in the 5.0 spec, but not yet properly coded 
> before.

Likewise, if that makes sense?

> (3) The third is a set of changes to the C/C++ front-ends to extend the 
> allowed component access syntax
> in map clauses. This is actually mainly an effort to allow SPEC HPC to 
> compile, so despite in the long
> term the entire map clause syntax parsing is probably going to be revamped, 
> we're still adding this in
> for now. These changes are enabled for both OpenACC and OpenMP.

Likewise, if that makes sense?  ;-)

> Tested on x86_64-linux with nvptx offloading with no regressions.

I'm seeing a regression with
'libgomp.oacc-c-c++-common/noncontig_array-1.c' execution testing, both C
and C++, for '-O2' (but not '-O0'), and only for about half of the
invocations.  But it seems to reproduce reliably in GDB:

Thread 1 "a.out" received signal SIGSEGV, Segmentation fault.
gomp_decrement_refcount (do_remove=<optimized out>, do_copy=<optimized out>, delete_p=false, refcount_set=0x0, k=0xc4d450) at 
[...]/source-gcc/libgomp/target.c:468
468   uintptr_t orig_refcount = *refcount_ptr;
(gdb) bt
#0  gomp_decrement_refcount (do_remove=<optimized out>, 
do_copy=<optimized out>, delete_p=false, refcount_set=0x0, k=0xc4d450) at 
[...]/source-gcc/libgomp/target.c:468
#1  gomp_unmap_vars_internal (aq=0x0, aq@entry=0x8223c0, refcount_set=0x0, 
do_copyfrom=<optimized out>, do_copyfrom@entry=true, tgt=tgt@entry=0xc696a0) at 
[...]/source-gcc/libgomp/target.c:2065
#2  goacc_unmap_vars (tgt=tgt@entry=0xc696a0, 
do_copyfrom=do_copyfrom@entry=true, aq=aq@entry=0x0) at 
[...]/source-gcc/libgomp/target.c:2118
#3  0x77daa41c in GOACC_parallel_keyed (flags_m=flags_m@entry=-1, 
fn=fn@entry=0x400ae0 , mapnum=mapnum@entry=2, 
hostaddrs=hostaddrs@entry=0x7fffd7a0, sizes=sizes@entry=0x604500 
, kinds=kinds@entry=0x6044f0 ) at 
[...]/source-gcc/libgomp/oacc-parallel.c:639
#4  0x00400f11 in test3 () at 
source-gcc/libgomp/testsuite/libgomp.oacc-c/../libgomp.oacc-c-c++-common/noncontig_array-1.c:75
#5  0x004008f3 in main () at 
source-gcc/libgomp/testsuite/libgomp.oacc-c/../libgomp.oacc-c-c++-common/noncontig_array-1.c:101
(gdb) print refcount_ptr
$1 = (uintptr_t *) 0x1
(gdb) list 457,468
457   uintptr_t *refcount_ptr = &k->refcount;
458
459   if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
460

Re: [PATCH] testsuite: Fix input operands of gcc.dg/guality/pr43077-1.c

2021-05-11 Thread Jakub Jelinek via Gcc-patches
On Tue, May 11, 2021 at 04:36:47PM +0200, Stefan Schulze Frielinghaus via 
Gcc-patches wrote:
> The type of the output operands *p and *q of the extended asm statement
> of function foo is unsigned long whereas the type of the corresponding
> input operands is int.  This results, e.g. on IBM Z, in the case that
> the immediates 2 and 3 are written into registers in SI mode and read in
> DI mode resulting in wrong values.  Fixed by lifting the input operands
> to type long.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/guality/pr43077-1.c: Align types of output and input
>   operands by lifting immediates to type long.
> 
> Ok for mainline?

Ok, thanks.

> diff --git a/gcc/testsuite/gcc.dg/guality/pr43077-1.c 
> b/gcc/testsuite/gcc.dg/guality/pr43077-1.c
> index 39bd26aae01..2d9376298d4 100644
> --- a/gcc/testsuite/gcc.dg/guality/pr43077-1.c
> +++ b/gcc/testsuite/gcc.dg/guality/pr43077-1.c
> @@ -24,7 +24,7 @@ int __attribute__((noinline))
>  foo (unsigned long *p, unsigned long *q)
>  {
>int ret;
> -  asm volatile ("" : "=r" (ret), "=r" (*p), "=r" (*q) : "0" (1), "1" (2), 
> "2" (3));
> +  asm volatile ("" : "=r" (ret), "=r" (*p), "=r" (*q) : "0" (1), "1" (2l), 
> "2" (3l));
>return ret;
>  }
>  
> -- 
> 2.23.0

Jakub



[PATCH,V2 2/2] dwarf: new dwarf_debuginfo_p predicate

2021-05-11 Thread Indu Bhagat via Gcc-patches
[Changes from V1]
  - included checks in
- config/darwin.c
- config/i386/cygming.h
- config/i386/darwin.h
- config/mips/mips.c
- config/rs6000/rs6000.c
[End of changes from V1]

This patch introduces a dwarf_debuginfo_p predicate that abstracts and
replaces complex checks on write_symbols.
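
As a rough illustration of what such a predicate wraps, here is a minimal
sketch.  The `SK_` mask values and the `_sketch` suffix are assumptions made
up for this example; the real definition lives in gcc/opts.c and is not
reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical one-bit-per-format masks; the real values are defined in
// gcc/flag-types.h and may differ.
constexpr uint32_t SK_NO_DEBUG = 0;
constexpr uint32_t SK_DBX_DEBUG = 1u << 0;
constexpr uint32_t SK_DWARF2_DEBUG = 1u << 1;
constexpr uint32_t SK_VMS_DEBUG = 1u << 2;
constexpr uint32_t SK_VMS_AND_DWARF2_DEBUG = SK_VMS_DEBUG | SK_DWARF2_DEBUG;

// One call stands in for the repeated
//   write_symbols == DWARF2_DEBUG || write_symbols == VMS_AND_DWARF2_DEBUG
// comparisons: it is true whenever the DWARF bit is present in the set.
constexpr bool
dwarf_debuginfo_p_sketch (uint32_t write_symbols)
{
  return (write_symbols & SK_DWARF2_DEBUG) != 0;
}
```

With this shape, a caller does not need to know which combined formats exist;
any set that contains the DWARF bit answers true.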

gcc/c-family/ChangeLog:

* c-lex.c (init_c_lex): Use dwarf_debuginfo_p.

gcc/ChangeLog:

* config/c6x/c6x.c (c6x_output_file_unwind): Use dwarf_debuginfo_p.
* config/darwin.c (darwin_override_options): Likewise.
* config/i386/cygming.h (DBX_REGISTER_NUMBER): Likewise.
* config/i386/darwin.h (DBX_REGISTER_NUMBER): Likewise.
(DWARF2_FRAME_REG_OUT): Likewise.
* config/mips/mips.c (mips_output_filename): Likewise.
* config/rs6000/rs6000.c (rs6000_xcoff_declare_function_name):
Likewise.
(rs6000_dbx_register_number): Likewise.
* dwarf2cfi.c (cfi_label_required_p): Likewise.
(dwarf2out_do_frame): Likewise.
* final.c (dwarf2_debug_info_emitted_p): Likewise.
(final_scan_insn_1): Likewise.
* flags.h (dwarf_debuginfo_p): New function declaration.
* opts.c (dwarf_debuginfo_p): New function definition.
* targhooks.c (default_debug_unwind_info): Use dwarf_debuginfo_p.
* toplev.c (process_options): Likewise.
---
 gcc/c-family/c-lex.c   |  4 ++--
 gcc/config/c6x/c6x.c   |  3 +--
 gcc/config/darwin.c|  2 +-
 gcc/config/i386/cygming.h  |  2 +-
 gcc/config/i386/darwin.h   |  4 ++--
 gcc/config/mips/mips.c |  2 +-
 gcc/config/rs6000/rs6000.c |  4 ++--
 gcc/dwarf2cfi.c|  9 -
 gcc/final.c| 15 ++-
 gcc/flags.h|  4 
 gcc/opts.c |  8 
 gcc/targhooks.c|  2 +-
 gcc/toplev.c   |  6 ++
 13 files changed, 35 insertions(+), 30 deletions(-)

diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index 6374b72..5174b22 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stor-layout.h"
 #include "c-pragma.h"
 #include "debug.h"
+#include "flags.h"
 #include "file-prefix-map.h" /* remap_macro_filename()  */
 #include "langhooks.h"
 #include "attribs.h"
@@ -87,8 +88,7 @@ init_c_lex (void)
 
   /* Set the debug callbacks if we can use them.  */
   if ((debug_info_level == DINFO_LEVEL_VERBOSE
-   && (write_symbols == DWARF2_DEBUG
-  || write_symbols == VMS_AND_DWARF2_DEBUG))
+   && dwarf_debuginfo_p ())
   || flag_dump_go_spec != NULL)
 {
   cb->define = cb_define;
diff --git a/gcc/config/c6x/c6x.c b/gcc/config/c6x/c6x.c
index f9ad1e5..a10e2f8 100644
--- a/gcc/config/c6x/c6x.c
+++ b/gcc/config/c6x/c6x.c
@@ -439,8 +439,7 @@ c6x_output_file_unwind (FILE * f)
 {
   if (flag_unwind_tables || flag_exceptions)
{
- if (write_symbols == DWARF2_DEBUG
- || write_symbols == VMS_AND_DWARF2_DEBUG)
+ if (dwarf_debuginfo_p ())
asm_fprintf (f, "\t.cfi_sections .debug_frame, .c6xabi.exidx\n");
  else
asm_fprintf (f, "\t.cfi_sections .c6xabi.exidx\n");
diff --git a/gcc/config/darwin.c b/gcc/config/darwin.c
index 5d17391..b937914 100644
--- a/gcc/config/darwin.c
+++ b/gcc/config/darwin.c
@@ -3348,7 +3348,7 @@ darwin_override_options (void)
   && generating_for_darwin_version >= 9
   && (flag_gtoggle ? (debug_info_level == DINFO_LEVEL_NONE)
   : (debug_info_level >= DINFO_LEVEL_NORMAL))
-  && write_symbols == DWARF2_DEBUG)
+  && dwarf_debuginfo_p ())
 flag_var_tracking_uninit = flag_var_tracking;
 
   /* Final check on PCI options; for Darwin these are not dependent on the PIE
diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index cfbca34..ac458cd 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -82,7 +82,7 @@ along with GCC; see the file COPYING3.  If not see
 #undef DBX_REGISTER_NUMBER
 #define DBX_REGISTER_NUMBER(n) \
   (TARGET_64BIT ? dbx64_register_map[n]\
-   : (write_symbols == DWARF2_DEBUG\
+   : (dwarf_debuginfo_p () \
   ? svr4_dbx_register_map[n] : dbx_register_map[n]))
 
 /* Map gcc register number to DWARF 2 CFA column number. For 32 bit
diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h
index afa9f1b..5312003 100644
--- a/gcc/config/i386/darwin.h
+++ b/gcc/config/i386/darwin.h
@@ -275,13 +275,13 @@ along with GCC; see the file COPYING3.  If not see
 #undef DBX_REGISTER_NUMBER
 #define DBX_REGISTER_NUMBER(n) \
   (TARGET_64BIT ? dbx64_register_map[n]\
-   : write_symbols == DWARF2_DEBUG ? svr4_dbx_register_map[n]  \
+   : dwarf_debuginfo_p () ? svr4_dbx_register_map[n]   \
: dbx_register_map[n])
 
 /* 

[PATCH,V2 1/2] opts: change write_symbols to support bitmasks

2021-05-11 Thread Indu Bhagat via Gcc-patches
[Changes from V1]
  - Use debug_set_names API and remove asserts around the diagnostics
  - Reword diagnostic and adjust testsuite
[End of changes from V1]

To support multiple debug formats, we need to move away from explicit
enumeration of each individual combination of debug formats.

gcc/c-family/ChangeLog:

* c-opts.c (c_common_post_options): Adjust access to debug_type_names.
* c-pch.c (struct c_pch_validity): Use type uint32_t.
(pch_init): Renamed member.
(c_common_valid_pch): Adjust access to debug_type_names.

gcc/ChangeLog:

* common.opt: Change type to support bitmasks.
* flag-types.h (enum debug_info_type): Rename enumerator constants.
(NO_DEBUG): New bitmask.
(DBX_DEBUG): Likewise.
(DWARF2_DEBUG): Likewise.
(XCOFF_DEBUG): Likewise.
(VMS_DEBUG): Likewise.
(VMS_AND_DWARF2_DEBUG): Likewise.
* flags.h (debug_set_to_format): New function declaration.
(debug_set_count): Likewise.
(debug_set_names): Likewise.
* opts.c (debug_type_masks): Array of bitmasks for debug formats.
(debug_set_to_format): New function definition.
(debug_set_count): Likewise.
(debug_set_names): Likewise.
(set_debug_level): Update access to debug_type_names.
* toplev.c: Likewise.

gcc/objc/ChangeLog:

* objc-act.c (synth_module_prologue): Use uint32_t instead of enum
debug_info_type.

gcc/testsuite/ChangeLog:

* gcc.dg/pch/valid-1.c: Adjust diagnostic message in testcase.
* lib/dg-pch.exp: Adjust diagnostic message.
---
 gcc/c-family/c-opts.c  |   7 ++-
 gcc/c-family/c-pch.c   |  12 ++--
 gcc/common.opt |   2 +-
 gcc/flag-types.h   |  29 +++---
 gcc/flags.h|  17 +-
 gcc/objc/objc-act.c|   2 +-
 gcc/opts.c | 109 +
 gcc/testsuite/gcc.dg/pch/valid-1.c |   2 +-
 gcc/testsuite/lib/dg-pch.exp   |   4 +-
 gcc/toplev.c   |   9 ++-
 10 files changed, 157 insertions(+), 36 deletions(-)

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 89e05a4..60b5802 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -1112,9 +1112,10 @@ c_common_post_options (const char **pfilename)
  /* Only -g0 and -gdwarf* are supported with PCH, for other
 debug formats we warn here and refuse to load any PCH files.  */
  if (write_symbols != NO_DEBUG && write_symbols != DWARF2_DEBUG)
-   warning (OPT_Wdeprecated,
-"the %qs debug format cannot be used with "
-"pre-compiled headers", debug_type_names[write_symbols]);
+ warning (OPT_Wdeprecated,
+  "the %qs debug info cannot be used with "
+  "pre-compiled headers",
+  debug_set_names (write_symbols & ~DWARF2_DEBUG));
}
   else if (write_symbols != NO_DEBUG && write_symbols != DWARF2_DEBUG)
c_common_no_more_pch ();
diff --git a/gcc/c-family/c-pch.c b/gcc/c-family/c-pch.c
index fd94c37..8f0f760 100644
--- a/gcc/c-family/c-pch.c
+++ b/gcc/c-family/c-pch.c
@@ -52,7 +52,7 @@ enum {
 
 struct c_pch_validity
 {
-  unsigned char debug_info_type;
+  uint32_t pch_write_symbols;
   signed char match[MATCH_SIZE];
   void (*pch_init) (void);
   size_t target_data_length;
@@ -108,7 +108,7 @@ pch_init (void)
   pch_outfile = f;
 
   memset (&v, '\0', sizeof (v));
-  v.debug_info_type = write_symbols;
+  v.pch_write_symbols = write_symbols;
   {
 size_t i;
 for (i = 0; i < MATCH_SIZE; i++)
@@ -252,13 +252,13 @@ c_common_valid_pch (cpp_reader *pfile, const char *name, 
int fd)
   /* The allowable debug info combinations are that either the PCH file
  was built with the same as is being used now, or the PCH file was
  built for some kind of debug info but now none is in use.  */
-  if (v.debug_info_type != write_symbols
+  if (v.pch_write_symbols != write_symbols
   && write_symbols != NO_DEBUG)
 {
   cpp_warning (pfile, CPP_W_INVALID_PCH,
-  "%s: created with -g%s, but used with -g%s", name,
-  debug_type_names[v.debug_info_type],
-  debug_type_names[write_symbols]);
+  "%s: created with '%s' debug info, but used with '%s'", name,
+  debug_set_names (v.pch_write_symbols),
+  debug_set_names (write_symbols));
   return 2;
 }
 
diff --git a/gcc/common.opt b/gcc/common.opt
index a75b44e..ffb968d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -109,7 +109,7 @@ bool exit_after_options
 ; flag-types.h for the definitions of the different possible types of
 ; debugging information.
 Variable
-enum debug_info_type write_symbols = NO_DEBUG
+uint32_t write_symbols = NO_DEBUG
 
 ; Level of debugging information we are producing.  

[PATCH, V2 0/2] Fix write_symbols for supporting multiple debug formats

2021-05-11 Thread Indu Bhagat via Gcc-patches
[Changes from V1]
  - (Addressed Richard's comments)
  - For patch 1/2 [opts: change write_symbols to support bitmasks], use
  debug_set_names more uniformly. Reworded the diagnostics in c-family/c-opts.c
  and c-family/c-pch.c as there can be multiple debug formats. Updated the
  testsuite files accordingly. 
  - Included more backend files for patch 2/2 [dwarf: new dwarf_debuginfo_p
  predicate]
  - Regression tested on x86_64. Open for any suggestions on further testing.
[End of changes from V1]

Hello,

Over the last year, we have discussed and agreed that in order to support
multiple debug formats, we keep DWARF as the de facto internal format, and any
new debug format to be supported feeds off DWARF DIEs. This requirement
specification has worked well for the addition of CTF/BTF overall.

There are some existing issues that need to be discussed and fixed in this
regard, though. One of these is the definition and handling of write_symbols.

The current issue is that write_symbols is defined as 

   enum debug_info_type write_symbols = NO_DEBUG;

This means any new combination of debug formats needs to be explicitly
enumerated, like CTF_AND_DWARF2_DEBUG, VMS_AND_DWARF2_DEBUG etc. So to support,
say, -gctf -gbtf -g, or any other combination of debug formats working
together, each combination needs to be spelled out explicitly, which will
make the handling ugly.

This patch set updates write_symbols to use bitmasks.
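
To make the difference concrete, here is a small sketch of the bitmask idea.
All names and bit values below (`SK_CTF`, `sk_enable`, `sk_count`, ...) are
hypothetical and only serve the illustration; they are not the identifiers
used in the patches.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical masks, one bit per debug format (values are assumptions).
enum : uint32_t
{
  SK_NO_DEBUG = 0,
  SK_DWARF2 = 1u << 0,
  SK_CTF = 1u << 1,
  SK_BTF = 1u << 2
};

// Enabling a format is a plain OR, so no combined enumerator such as
// CTF_AND_DWARF2_DEBUG has to exist.
inline uint32_t
sk_enable (uint32_t set, uint32_t format)
{
  return set | format;
}

// Count the enabled formats by clearing the lowest set bit each round.
inline int
sk_count (uint32_t set)
{
  int n = 0;
  for (; set != 0; set &= set - 1)
    ++n;
  return n;
}
```

Something like -gctf -gbtf -g then becomes three calls to `sk_enable` on the
same variable, and a question like "is more than one format active?" is a
comparison on `sk_count`.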

Thanks,
Indu Bhagat (2):
  opts: change write_symbols to support bitmasks
  dwarf: new dwarf_debuginfo_p predicate

 gcc/c-family/c-lex.c   |   4 +-
 gcc/c-family/c-opts.c  |   7 ++-
 gcc/c-family/c-pch.c   |  12 ++--
 gcc/common.opt |   2 +-
 gcc/config/c6x/c6x.c   |   3 +-
 gcc/config/darwin.c|   2 +-
 gcc/config/i386/cygming.h  |   2 +-
 gcc/config/i386/darwin.h   |   4 +-
 gcc/config/mips/mips.c |   2 +-
 gcc/config/rs6000/rs6000.c |   4 +-
 gcc/dwarf2cfi.c|   9 ++-
 gcc/final.c|  15 ++---
 gcc/flag-types.h   |  29 ++---
 gcc/flags.h|  21 ++-
 gcc/objc/objc-act.c|   2 +-
 gcc/opts.c | 117 +
 gcc/targhooks.c|   2 +-
 gcc/testsuite/gcc.dg/pch/valid-1.c |   2 +-
 gcc/testsuite/lib/dg-pch.exp   |   4 +-
 gcc/toplev.c   |  15 ++---
 20 files changed, 192 insertions(+), 66 deletions(-)

-- 
1.8.3.1



[PATCH] testsuite: Fix input operands of gcc.dg/guality/pr43077-1.c

2021-05-11 Thread Stefan Schulze Frielinghaus via Gcc-patches
The type of the output operands *p and *q of the extended asm statement
of function foo is unsigned long whereas the type of the corresponding
input operands is int.  This results, e.g. on IBM Z, in the case that
the immediates 2 and 3 are written into registers in SI mode and read in
DI mode resulting in wrong values.  Fixed by lifting the input operands
to type long.
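
For readers unfamiliar with matching constraints, the following sketch shows
the fixed form in isolation.  It assumes a GCC-compatible compiler; the
function mirrors the one in the testcase but is otherwise stand-alone.

```cpp
#include <cassert>

// Each input with a digit constraint ("0", "1", "2") is tied to the output
// operand with that number, so both share one register.  The asm body is
// empty, which means every output simply receives its tied input value.
// Writing the immediates as long (2l, 3l) keeps the input width in line
// with the unsigned long outputs they are tied to.
int __attribute__ ((noinline))
foo (unsigned long *p, unsigned long *q)
{
  int ret;
  asm volatile ("" : "=r" (ret), "=r" (*p), "=r" (*q)
		   : "0" (1), "1" (2l), "2" (3l));
  return ret;
}
```

After a call, ret is 1 and the two pointees hold 2 and 3.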

gcc/testsuite/ChangeLog:

* gcc.dg/guality/pr43077-1.c: Align types of output and input
operands by lifting immediates to type long.

Ok for mainline?

---
 gcc/testsuite/gcc.dg/guality/pr43077-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/guality/pr43077-1.c 
b/gcc/testsuite/gcc.dg/guality/pr43077-1.c
index 39bd26aae01..2d9376298d4 100644
--- a/gcc/testsuite/gcc.dg/guality/pr43077-1.c
+++ b/gcc/testsuite/gcc.dg/guality/pr43077-1.c
@@ -24,7 +24,7 @@ int __attribute__((noinline))
 foo (unsigned long *p, unsigned long *q)
 {
   int ret;
-  asm volatile ("" : "=r" (ret), "=r" (*p), "=r" (*q) : "0" (1), "1" (2), "2" 
(3));
+  asm volatile ("" : "=r" (ret), "=r" (*p), "=r" (*q) : "0" (1), "1" (2l), "2" 
(3l));
   return ret;
 }
 
-- 
2.23.0



Re: [PATCH] OpenMP: Add support for 'close' in map clause

2021-05-11 Thread Marcel Vollweiler


Am 10.05.2021 um 20:34 schrieb Jakub Jelinek:

On Mon, May 10, 2021 at 04:11:39PM +0200, Marcel Vollweiler wrote:

@@ -15660,37 +15665,54 @@ c_parser_omp_clause_map (c_parser *parser, tree list)
if (!parens.require_open (parser))
  return list;

-  if (c_parser_next_token_is (parser, CPP_NAME))
+  int always = 0;
+  int close = 0;
+  int pos = 1;
+  while (c_parser_peek_nth_token_raw (parser, pos)->type == CPP_NAME)


Nice, totally missed that Joseph has added this.


  {
-  c_token *tok = c_parser_peek_token (parser);
+  c_token *tok = c_parser_peek_nth_token_raw (parser, pos);
const char *p = IDENTIFIER_POINTER (tok->value);
-  always_id_kind = tok->id_kind;
-  always_loc = tok->location;
-  always_id = tok->value;
if (strcmp ("always", p) == 0)
 {
-  c_token *sectok = c_parser_peek_2nd_token (parser);
-  if (sectok->type == CPP_COMMA)
+  if (always)
 {
-  c_parser_consume_token (parser);
-  c_parser_consume_token (parser);
-  always = 2;
+  c_parser_error (parser, "expected modifier %<always%> only once");


The usual wording would be
"too many %<always%> modifiers"



Changed for 'always' and 'close' for C and C++.


+  parens.skip_until_found_close (parser);
+  return list;
+}
+
+  always_id_kind = tok->id_kind;
+  always_loc = tok->location;
+  always_id = tok->value;


But you don't need any of the always_{id_kind,loc,id} variables anymore,
so they should be removed and everything that touches them too.



That's true. I removed them.


+
+  always++;
+}
+  else if (strcmp ("close", p) == 0)
+{
+  if (close)
+{
+  c_parser_error (parser, "expected modifier %<close%> only once");


Similarly.


+  parens.skip_until_found_close (parser);
+  return list;
 }
-  else if (sectok->type == CPP_NAME)
+
+  close++;
+}
+  else if (c_parser_peek_nth_token_raw (parser, pos + 1)->type == 
CPP_COLON)


IMHO you should at least check that tok->type == CPP_NAME before
checking pos + 1 token's type, you don't want to skip over CPP_EOF,
CPP_PRAGMA_EOF, or even CPP_CLOSE_PAREN etc.
Perhaps by adding
   if (tok->type != CPP_NAME)
  break;
right after c_token *tok = c_parser_peek_nth_token_raw (parser, pos); ?


The check of the token's type at position 'pos' is done in the condition
of the while loop, that means
   'c_parser_peek_nth_token_raw (parser, pos + 1)->type == CPP_COLON'
is only reached when
   'c_parser_peek_nth_token_raw (parser, pos)->type == CPP_NAME'
holds (since 'pos' is not changed in between).




+{
+  for (int i = 1; i < pos; ++i)
 {
-  p = IDENTIFIER_POINTER (sectok->value);
-  if (strcmp ("alloc", p) == 0
-  || strcmp ("to", p) == 0
-  || strcmp ("from", p) == 0
-  || strcmp ("tofrom", p) == 0
-  || strcmp ("release", p) == 0
-  || strcmp ("delete", p) == 0)
-{
-  c_parser_consume_token (parser);
-  always = 1;
-}
+  c_parser_peek_token(parser);


Formatting, space before (



Corrected.


+  c_parser_consume_token (parser);
 }
+  break;


And, IMHO something should clear always and close (btw, might be better
to use close_modifier as variable name and for consistency always_modifier)
unless we reach the CPP_COLON case.



Good point, I agree with both. Cleared and renamed :)


Because we don't want
   map (always, close)
to imply
   map (always, close, tofrom: always, close)
but
   map (tofrom: always, close)
and my reading of your changes suggests that we actually use the
*_ALWAYS* kinds in that case.
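
The rule described here can be sketched outside the parser as follows.  This
is only an illustration on a pre-split token list, not the c-parser.c code;
`map_modifiers`, `scan_map_modifiers` and `is_name` are made-up names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct map_modifiers
{
  bool always_modifier = false;
  bool close_modifier = false;
  bool error = false;	    // a modifier was given more than once
  std::size_t consumed = 0; // modifier and comma tokens to skip
};

static bool
is_name (const std::string &tok)
{
  return !tok.empty ()
	 && (std::isalpha (static_cast<unsigned char> (tok[0]))
	     || tok[0] == '_');
}

// Scan the tokens inside "map (...)".  Modifiers only take effect if the
// scan reaches a name that is followed by ':' (the map type); otherwise
// "always" and "close" are ordinary variable names and both flags are reset.
inline map_modifiers
scan_map_modifiers (const std::vector<std::string> &toks)
{
  map_modifiers r;
  bool reached_colon = false;
  std::size_t pos = 0;
  while (pos < toks.size () && is_name (toks[pos]))
    {
      bool *flag = nullptr;
      if (toks[pos] == "always")
	flag = &r.always_modifier;
      else if (toks[pos] == "close")
	flag = &r.close_modifier;
      else
	{
	  // Not a modifier: commit only if this is "<map-type> :".
	  reached_colon = pos + 1 < toks.size () && toks[pos + 1] == ":";
	  break;
	}
      if (*flag)
	{
	  // Duplicate modifier: report an error and drop everything.
	  r = map_modifiers ();
	  r.error = true;
	  return r;
	}
      *flag = true;
      if (pos + 1 < toks.size () && toks[pos + 1] == ",")
	++pos;
      ++pos;
    }
  if (reached_colon)
    r.consumed = pos;
  else
    r.always_modifier = r.close_modifier = false;
  return r;
}
```

For the tokens of "always, close, to: x" both flags end up set and four
tokens are skipped; for "always, close" alone nothing is set, so the two
names are left to be parsed as list items with the default map type.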


+  cp_parser_error (parser,
+   "expected modifier %<always%> only once");


See above.


+  cp_parser_skip_to_closing_parenthesis (parser,
+ /*recovering=*/true,
+ /*or_comma=*/false,
+ /*consume_paren=*/true);
+  return list;
+}
+
+  always = true;
+}
+  else if (strcmp ("close", p) == 0)
+{
+  if (close)
+{
+  cp_parser_error (parser,
+   "expected modifier %<close%> only once");


Likewise.


+  else if (cp_lexer_peek_nth_token (parser->lexer, pos + 1)->type
+   == CPP_COLON)
+{
+  for (int i = 1; i < pos; ++i)
+cp_lexer_consume_token (parser->lexer);
+  break;
+}
+  else
+break;
+
+  if (cp_lexer_peek_nth_token (parser->lexer, pos + 1)->type == CPP_COMMA)
+pos++;
+  pos++;
  }


Again, I don't see anything that would clear always/close if it didn't reach
the CPP_COLON case.

And it should be covered in the testcase.


I added a test case similar to your example above:
   '#pragma omp target map (always, close, to: always, close, 

[committed] preprocessor: Enable digit separators for C2X

2021-05-11 Thread Joseph Myers
C2X adds digit separators, as in C++.  Enable them accordingly in
libcpp and c-lex.c.  Some basic tests are added that digit separators
behave as expected for C2X and are properly disabled for C11; further
test coverage is included in the existing g++.dg/cpp1y/digit-sep*.C
tests.
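
For a quick feel of the syntax, here is a short sketch written as C++14,
which already accepts the same separators; with this patch the same literals
should be accepted in C under -std=c2x.  The helper name
`float_separators_ok` is made up for the example.

```cpp
#include <cassert>

// The separator never changes the value of the literal.
static_assert (123'45'6 == 123456, "decimal");
static_assert (0'123 == 0123, "octal");
static_assert (0x1'23 == 0x123, "hexadecimal");

// Separators are also allowed inside the exponent of a floating constant.
inline bool
float_separators_ok ()
{
  return 314'159e-0'5f == 3.14159f;
}
```

Invalid placements, such as two adjacent separators or a separator right
after the 0x prefix, are covered by the dg-error lines in
c2x-digit-separators-2.c.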

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  Applied to 
mainline.

gcc/c-family/
* c-lex.c (interpret_float): Handle digit separators for C2X.

libcpp/
* init.c (lang_defaults): Enable digit separators for GNUC2X and
STDC2X.

gcc/testsuite/
* gcc.dg/c11-digit-separators-1.c,
gcc.dg/c2x-digit-separators-1.c, gcc.dg/c2x-digit-separators-2.c:
New tests.

diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index 6374b72ed2d..1c66ecd8fc4 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -1001,7 +1001,7 @@ interpret_float (const cpp_token *token, unsigned int 
flags,
 }
 
   copy = (char *) alloca (copylen + 1);
-  if (cxx_dialect > cxx11)
+  if (c_dialect_cxx () ? cxx_dialect > cxx11 : flag_isoc2x)
 {
   size_t maxlen = 0;
   for (size_t i = 0; i < copylen; ++i)
diff --git a/gcc/testsuite/gcc.dg/c11-digit-separators-1.c 
b/gcc/testsuite/gcc.dg/c11-digit-separators-1.c
new file mode 100644
index 000..fc832260acb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c11-digit-separators-1.c
@@ -0,0 +1,7 @@
+/* Test C2x digit separators not in C11.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c11 -pedantic-errors" } */
+
+#define m(x) 0
+
+_Static_assert (m(1'2)+(3'4) == 0, "digit separators");
diff --git a/gcc/testsuite/gcc.dg/c2x-digit-separators-1.c 
b/gcc/testsuite/gcc.dg/c2x-digit-separators-1.c
new file mode 100644
index 000..6eadf2ea87f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2x-digit-separators-1.c
@@ -0,0 +1,39 @@
+/* Test C2x digit separators.  Valid usages.  */
+/* { dg-do run } */
+/* { dg-options "-std=c2x -pedantic-errors" } */
+
+_Static_assert (123'45'6 == 123456);
+_Static_assert (0'123 == 0123);
+_Static_assert (0x1'23 == 0x123);
+
+#define m(x) 0
+
+_Static_assert (m(1'2)+(3'4) == 34);
+
+_Static_assert (0x0'e-0xe == 0);
+
+#define a0 '.' -
+#define acat(x) a ## x
+_Static_assert (acat (0'.') == 0);
+
+#define c0(x) 0
+#define b0 c0 (
+#define bcat(x) b ## x
+_Static_assert (bcat (0'\u00c0')) == 0);
+
+extern void exit (int);
+extern void abort (void);
+
+int
+main (void)
+{
+  if (314'159e-0'5f != 3.14159f)
+abort ();
+  exit (0);
+}
+
+#line 0'123
+_Static_assert (__LINE__ == 123);
+
+#line 4'56'7'8'9
+_Static_assert (__LINE__ == 456789);
diff --git a/gcc/testsuite/gcc.dg/c2x-digit-separators-2.c 
b/gcc/testsuite/gcc.dg/c2x-digit-separators-2.c
new file mode 100644
index 000..d72f8adc6cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2x-digit-separators-2.c
@@ -0,0 +1,25 @@
+/* Test C2x digit separators.  Invalid usages.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x -pedantic-errors" } */
+
+void
+tf (void)
+{
+  int i;
+  i = 1''2; /* { dg-error "adjacent digit separators" } */
+  i = 0x'0; /* { dg-error "digit separator after base indicator" } */
+  i = 0X'1; /* { dg-error "digit separator after base indicator" } */
+  i = 0b'0; /* { dg-error "digit separator after base indicator" } */
+  i = 0B'1; /* { dg-error "digit separator after base indicator" } */
+  i = 1'u; /* { dg-error "digit separator outside digit sequence" } */
+  float f = 1.2e-3'f; /* { dg-error "digit separator outside digit sequence" } 
*/
+  i = 1'2'3'; /* { dg-error "12:missing terminating" } */
+  ;
+  double d;
+  d = 1'.2'3e-4; /* { dg-warning "multi-character" } */
+  /* { dg-error "expected" "parse error" { target *-*-* } .-1 } */
+  d = 1.2''3; /* { dg-error "adjacent digit separators" } */
+  d = 1.23e-4''5; /* { dg-error "adjacent digit separators" } */
+  d = 1.2'3e-4'5'; /* { dg-error "17:missing terminating" } */
+  /* { dg-error "expected" "parse error" { target *-*-* } .-1 } */
+}
diff --git a/libcpp/init.c b/libcpp/init.c
index 68ed2c761b9..18a2341c2d0 100644
--- a/libcpp/init.c
+++ b/libcpp/init.c
@@ -103,13 +103,13 @@ static const struct lang_flags lang_defaults[] =
   /* GNUC99   */  { 1,  0,  1,  1,  0,  0,  1,   1,   1,   0,0, 0, 
0,   0,  1,   1, 0,   0 },
   /* GNUC11   */  { 1,  0,  1,  1,  1,  0,  1,   1,   1,   0,0, 0, 
0,   0,  1,   1, 0,   0 },
   /* GNUC17   */  { 1,  0,  1,  1,  1,  0,  1,   1,   1,   0,0, 0, 
0,   0,  1,   1, 0,   0 },
-  /* GNUC2X   */  { 1,  0,  1,  1,  1,  0,  1,   1,   1,   0,1, 0, 
0,   1,  1,   1, 1,   0 },
+  /* GNUC2X   */  { 1,  0,  1,  1,  1,  0,  1,   1,   1,   0,1, 1, 
0,   1,  1,   1, 1,   0 },
   /* STDC89   */  { 0,  0,  0,  0,  0,  1,  0,   0,   0,   0,0, 0, 
1,   0,  0,   0, 0,   0 },
   /* STDC94   */  { 0,  0,  0,  0,  0,  1,  1,   0,   0,   0,0, 0, 
1,   0,  0,   0, 0,   0 },
   /* STDC99   */  { 1,  0,  1,  1,  0,  

Re: [PATCH 1/2] opts: change write_symbols to support bitmasks

2021-05-11 Thread Indu Bhagat via Gcc-patches

On 5/10/21 6:11 AM, Richard Biener wrote:

On Thu, May 6, 2021 at 2:31 AM Indu Bhagat via Gcc-patches
 wrote:


To support multiple debug formats, we need to move away from explicit
enumeration of each individual combination of debug formats.


debug_set_names with its static buffer seems unused?  You wire quite some
APIs with gcc_assert on having a single bit set - that doesn't look forward
looking.

I suppose the BTF followups will "fix" this, but see comments below.


Thanks for your feedback.

Yes, I intended to fix them when I added the CTF/BTF as then I would 
have a way to debug/test it more meaningfully. Because of this, the 
debug_set_names and the associated static buffer were unused in V1 indeed.


For V2, taking your inputs, I have now fixed the uses in c-opts.c and 
c-pch.c at least.





gcc/c-family/ChangeLog:

 * c-opts.c (c_common_post_options): Adjust access to debug_type_names.
 * c-pch.c (struct c_pch_validity): Use type uint32_t.
 (pch_init): Renamed member.
 (c_common_valid_pch): Adjust access to debug_type_names.

gcc/ChangeLog:

 * common.opt: Change type to support bitmasks.
 * flag-types.h (enum debug_info_type): Rename enumerator constants.
 (NO_DEBUG): New bitmask.
 (DBX_DEBUG): Likewise.
 (DWARF2_DEBUG): Likewise.
 (XCOFF_DEBUG): Likewise.
 (VMS_DEBUG): Likewise.
 (VMS_AND_DWARF2_DEBUG): Likewise.
 * flags.h (debug_set_to_format): New function declaration.
 (debug_set_count): Likewise.
 (debug_set_names): Likewise.
 * opts.c (debug_type_masks): Array of bitmasks for debug formats.
 (debug_set_to_format): New function definition.
 (debug_set_count): Likewise.
 (debug_set_names): Likewise.
 (set_debug_level): Update access to debug_type_names.
 * toplev.c: Likewise.

gcc/objc/ChangeLog:

 * objc-act.c (synth_module_prologue): Use uint32_t instead of enum
 debug_info_type.
---
  gcc/c-family/c-opts.c |  10 +++--
  gcc/c-family/c-pch.c  |  12 +++---
  gcc/common.opt|   2 +-
  gcc/flag-types.h  |  29 ++
  gcc/flags.h   |  17 +++-
  gcc/objc/objc-act.c   |   2 +-
  gcc/opts.c| 109 +-
  gcc/toplev.c  |   9 +++--
  8 files changed, 158 insertions(+), 32 deletions(-)

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 89e05a4..e463240 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -1112,9 +1112,13 @@ c_common_post_options (const char **pfilename)
   /* Only -g0 and -gdwarf* are supported with PCH, for other
  debug formats we warn here and refuse to load any PCH files.  */
   if (write_symbols != NO_DEBUG && write_symbols != DWARF2_DEBUG)
-   warning (OPT_Wdeprecated,
-"the %qs debug format cannot be used with "
-"pre-compiled headers", debug_type_names[write_symbols]);
+   {
+ gcc_assert (debug_set_count (write_symbols) <= 1);


Why this assert?  Iff then simply include the count check in the
condition of the warning.


+ warning (OPT_Wdeprecated,
+  "the %qs debug format cannot be used with "
+  "pre-compiled headers",
+  debug_type_names[debug_set_to_format (write_symbols)]);


Maybe simply emit another diagnostic if debug_set_count > 1.



OK. I have removed the asserts. The code now uses debug_set_names 
uniformly. I have changed the diagnostic message, as there can be 
multiple debug formats for PCH at some point. So,


from "the 'XXX' debug format cannot be used with pre-compiled headers"
to   "the 'XXX YYY' debug info cannot be used with pre-compiled headers"
if multiple debug formats were enabled.
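
A sketch of how such a name list could be assembled from the bitmask is shown
below.  It is not the opts.c implementation (which uses a static buffer);
std::string is swapped in to keep the example short, and the table contents,
bit values and `sk_` names are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct sk_format
{
  uint32_t mask;
  const char *name;
};

// Hypothetical table: one entry per debug format bit.
static const sk_format sk_formats[] = {
  { 1u << 0, "dbx" },
  { 1u << 1, "dwarf-2" },
  { 1u << 2, "xcoff" },
  { 1u << 3, "vms" }
};

// Return the names of all formats present in SET, separated by spaces,
// or "none" for the empty set.
inline std::string
sk_set_names (uint32_t set)
{
  if (set == 0)
    return "none";
  std::string out;
  for (const sk_format &f : sk_formats)
    if (set & f.mask)
      {
	if (!out.empty ())
	  out += ' ';
	out += f.name;
      }
  return out;
}
```

A set with two bits then yields a string such as "dbx vms", which is the
'XXX YYY' shape of the reworded diagnostic.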


+   }
 }
else if (write_symbols != NO_DEBUG && write_symbols != DWARF2_DEBUG)
 c_common_no_more_pch ();
diff --git a/gcc/c-family/c-pch.c b/gcc/c-family/c-pch.c
index fd94c37..6804388 100644
--- a/gcc/c-family/c-pch.c
+++ b/gcc/c-family/c-pch.c
@@ -52,7 +52,7 @@ enum {

  struct c_pch_validity
  {
-  unsigned char debug_info_type;
+  uint32_t pch_write_symbols;
signed char match[MATCH_SIZE];
void (*pch_init) (void);
size_t target_data_length;
@@ -108,7 +108,7 @@ pch_init (void)
pch_outfile = f;

 memset (&v, '\0', sizeof (v));
-  v.debug_info_type = write_symbols;
+  v.pch_write_symbols = write_symbols;
{
  size_t i;
  for (i = 0; i < MATCH_SIZE; i++)
@@ -252,13 +252,15 @@ c_common_valid_pch (cpp_reader *pfile, const char *name, 
int fd)
/* The allowable debug info combinations are that either the PCH file
   was built with the same as is being used now, or the PCH file was
   built for some kind of debug info but now none is in use.  */
-  if (v.debug_info_type != write_symbols
+  if (v.pch_write_symbols != write_symbols

[PATCH] libstdc++: Remove extern "C" from Ryu sources

2021-05-11 Thread Patrick Palka via Gcc-patches
floating_to_chars.cc includes the Ryu sources into an anonymous
namespace as a convenient way to give all its symbols internal linkage.
But an entity declared extern "C" always has external linkage, even
from within an anonymous namespace, so this trick doesn't work in the
presence of extern "C", and it causes the Ryu function generic_to_chars
to be visible from libstdc++.a.

This patch removes the only use of extern "C" from our local copy of
Ryu, along with some declarations for never-defined functions that GCC
now warns about.

Tested on x86_64-pc-linux-gnu, and also verified that generic_to_chars
is not visible from libstdc++.a.  Does this look OK for trunk and the 11
branch?

libstdc++-v3/ChangeLog:

* src/c++17/ryu/LOCAL_PATCHES: Update.
* src/c++17/ryu/ryu_generic_128.h: Remove extern "C".
Remove declarations for never-defined functions.
---
 libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES |  1 +
 libstdc++-v3/src/c++17/ryu/ryu_generic_128.h | 21 +++-
 2 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES 
b/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
index 51e504cb6ea..72ffad9662d 100644
--- a/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
+++ b/libstdc++-v3/src/c++17/ryu/LOCAL_PATCHES
@@ -1,2 +1,3 @@
 r11-6248
 r11-7636
+r12-XXX
diff --git a/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h 
b/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
index 2afbf274e11..6d988ab01eb 100644
--- a/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
+++ b/libstdc++-v3/src/c++17/ryu/ryu_generic_128.h
@@ -18,9 +18,9 @@
 #define RYU_GENERIC_128_H
 
 
-#ifdef __cplusplus
-extern "C" {
-#endif
+// NOTE: These symbols are declared extern "C" upstream, but we don't want that
+// because it'd override the internal linkage of the anonymous namespace into
+// which this header is included.
 
 // This is a generic 128-bit implementation of float to shortest conversion
 // using the Ryu algorithm. It can handle any IEEE-compatible floating-point
@@ -42,18 +42,6 @@ struct floating_decimal_128 {
   bool sign;
 };
 
-struct floating_decimal_128 float_to_fd128(float f);
-struct floating_decimal_128 double_to_fd128(double d);
-
-// According to wikipedia (https://en.wikipedia.org/wiki/Long_double), this 
likely only works on
-// x86 with specific compilers (clang?). May need an ifdef.
-struct floating_decimal_128 long_double_to_fd128(long double d);
-
-// Converts the given binary floating point number to the shortest decimal 
floating point number
-// that still accurately represents it.
-struct floating_decimal_128 generic_binary_to_decimal(
-const uint128_t bits, const uint32_t mantissaBits, const uint32_t 
exponentBits, const bool explicitLeadingBit);
-
 // Converts the given decimal floating point number to a string, writing to 
result, and returning
 // the number characters written. Does not terminate the buffer with a 0. In 
the worst case, this
 // function can write up to 53 characters.
@@ -63,8 +51,5 @@ struct floating_decimal_128 generic_binary_to_decimal(
 // = 1 + 39 + 1 + 1 + 1 + 10 = 53
 int generic_to_chars(const struct floating_decimal_128 v, char* const result);
 
-#ifdef __cplusplus
-}
-#endif
 
 #endif // RYU_GENERIC_128_H
-- 
2.31.1.527.g2d677e5b15



[pushed] c++: ICE casting class to vector [PR100517]

2021-05-11 Thread Jason Merrill via Gcc-patches
My recent change to reject calling rvalue() with an argument of class type
crashes on this testcase, where we use rvalue() on what we expect to be an
argument of integer or vector type.  Fixed by checking first.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/100517
* typeck.c (build_reinterpret_cast_1): Check intype on
cast to vector.

gcc/testsuite/ChangeLog:

PR c++/100517
* g++.dg/ext/vector41.C: New test.
---
 gcc/cp/typeck.c |  2 +-
 gcc/testsuite/g++.dg/ext/vector41.C | 12 
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/vector41.C

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 9002dd14fae..703ddd3cc7a 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -8114,7 +8114,7 @@ build_reinterpret_cast_1 (location_t loc, tree type, tree expr,
"pointer-to-object is conditionally-supported");
   return build_nop_reinterpret (type, expr);
 }
-  else if (gnu_vector_type_p (type))
+  else if (gnu_vector_type_p (type) && scalarish_type_p (intype))
 return convert_to_vector (type, rvalue (expr));
   else if (gnu_vector_type_p (intype)
   && INTEGRAL_OR_ENUMERATION_TYPE_P (type))
diff --git a/gcc/testsuite/g++.dg/ext/vector41.C b/gcc/testsuite/g++.dg/ext/vector41.C
new file mode 100644
index 000..bfc3bb6db4b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/vector41.C
@@ -0,0 +1,12 @@
+// PR c++/100517
+// { dg-options "" }
+
+typedef int __v2si __attribute__ ((__vector_size__ (8)));
+
+struct S { };
+
+void
+f (S s)
+{
+  (void) reinterpret_cast<__v2si> (s); // { dg-error "" }
+}

base-commit: 2301a394607b88f8996efe864350c5f841000f76
-- 
2.27.0



[PATCH] More maybe_fold_reference TLC

2021-05-11 Thread Richard Biener
This removes stale users of maybe_fold_reference where IL constraints
make it never do anything.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-05-11  Richard Biener  

* gimple-fold.c (gimple_fold_call): Do not call
maybe_fold_reference on call arguments or the static chain.
(fold_stmt_1): Do not call maybe_fold_reference on GIMPLE_ASM
inputs.
---
 gcc/gimple-fold.c | 59 ---
 1 file changed, 59 deletions(-)

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 74ec36e3a78..68717cf1542 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -5447,19 +5447,6 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool inplace)
   gcall *stmt = as_a <gcall *> (gsi_stmt (*gsi));
   tree callee;
   bool changed = false;
-  unsigned i;
-
-  /* Fold *& in call arguments.  */
-  for (i = 0; i < gimple_call_num_args (stmt); ++i)
-if (REFERENCE_CLASS_P (gimple_call_arg (stmt, i)))
-  {
-   tree tmp = maybe_fold_reference (gimple_call_arg (stmt, i));
-   if (tmp)
- {
-   gimple_call_set_arg (stmt, i, tmp);
-   changed = true;
- }
-  }
 
   /* Check for virtual calls that became direct calls.  */
   callee = gimple_call_fn (stmt);
@@ -5562,15 +5549,6 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool inplace)
  gimple_call_set_chain (stmt, NULL);
  changed = true;
}
-  else
-   {
- tree tmp = maybe_fold_reference (gimple_call_chain (stmt));
- if (tmp)
-   {
- gimple_call_set_chain (stmt, tmp);
- changed = true;
-   }
-   }
 }
 
   if (inplace)
@@ -6285,43 +6263,6 @@ fold_stmt_1 (gimple_stmt_iterator *gsi, bool inplace, tree (*valueize) (tree))
   changed |= gimple_fold_call (gsi, inplace);
   break;
 
-case GIMPLE_ASM:
-  /* Fold *& in asm operands.  */
-  {
-   gasm *asm_stmt = as_a <gasm *> (stmt);
-   size_t noutputs;
-   const char **oconstraints;
-   const char *constraint;
-   bool allows_mem, allows_reg;
-
-   noutputs = gimple_asm_noutputs (asm_stmt);
-   oconstraints = XALLOCAVEC (const char *, noutputs);
-
-   for (i = 0; i < noutputs; ++i)
- {
-   tree link = gimple_asm_output_op (asm_stmt, i);
-   oconstraints[i]
- = TREE_STRING_POINTER (TREE_VALUE (TREE_PURPOSE (link)));
- }
-   for (i = 0; i < gimple_asm_ninputs (asm_stmt); ++i)
- {
-   tree link = gimple_asm_input_op (asm_stmt, i);
-   tree op = TREE_VALUE (link);
-   constraint
- = TREE_STRING_POINTER (TREE_VALUE (TREE_PURPOSE (link)));
-   parse_input_constraint (&constraint, 0, 0, noutputs, 0,
-   oconstraints, &allows_mem, &allows_reg);
-   if (REFERENCE_CLASS_P (op)
-   && (allows_reg || !allows_mem)
-   && (op = maybe_fold_reference (op)) != NULL_TREE)
- {
-   TREE_VALUE (link) = op;
-   changed = true;
- }
- }
-  }
-  break;
-
 case GIMPLE_DEBUG:
   if (gimple_debug_bind_p (stmt))
{
-- 
2.26.2


New Japanese PO file for 'gcc' (version 11.1.0)

2021-05-11 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Japanese team of translators.  The file is available at:

https://translationproject.org/latest/gcc/ja.po

(This file, 'gcc-11.1.0.ja.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH,rs6000] Test cases for p10 fusion patterns

2021-05-11 Thread Aaron Sawdey via Gcc-patches
Ping.

Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
 

> On Apr 26, 2021, at 2:00 PM, acsaw...@linux.ibm.com wrote:
> 
> From: Aaron Sawdey 
> 
> This adds some test cases to make sure that the combine patterns for p10
> fusion are working.
> 
> OK for trunk?
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/fusion-p10-ldcmpi.c: New file.
>   * gcc.target/powerpc/fusion-p10-2logical.c: New file.
> ---
> .../gcc.target/powerpc/fusion-p10-2logical.c  | 205 ++
> .../gcc.target/powerpc/fusion-p10-ldcmpi.c|  66 ++
> 2 files changed, 271 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/powerpc/fusion-p10-2logical.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-2logical.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-2logical.c
> new file mode 100644
> index 000..9a205373505
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-2logical.c
> @@ -0,0 +1,205 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O3 -dp" } */
> +
> +#include 
> +#include 
> +
> +/* and/andc/eqv/nand/nor/or/orc/xor */
> +#define AND(a,b) ((a)&(b))
> +#define ANDC1(a,b) ((a)&((~b)))
> +#define ANDC2(a,b) ((~(a))&(b))
> +#define EQV(a,b) (~((a)^(b)))
> +#define NAND(a,b) (~((a)&(b)))
> +#define NOR(a,b) (~((a)|(b)))
> +#define OR(a,b) ((a)|(b))
> +#define ORC1(a,b) ((a)|((~b)))
> +#define ORC2(a,b) ((~(a))|(b))
> +#define XOR(a,b) ((a)^(b))
> +#define TEST1(type, func) \
> +  type func ## _and_T_ ## type (type a, type b, type c) { return AND(func(a,b),c); } \
> +  type func ## _andc1_T_   ## type (type a, type b, type c) { return ANDC1(func(a,b),c); } \
> +  type func ## _andc2_T_   ## type (type a, type b, type c) { return ANDC2(func(a,b),c); } \
> +  type func ## _eqv_T_ ## type (type a, type b, type c) { return EQV(func(a,b),c); } \
> +  type func ## _nand_T_## type (type a, type b, type c) { return NAND(func(a,b),c); } \
> +  type func ## _nor_T_ ## type (type a, type b, type c) { return NOR(func(a,b),c); } \
> +  type func ## _or_T_  ## type (type a, type b, type c) { return OR(func(a,b),c); } \
> +  type func ## _orc1_T_## type (type a, type b, type c) { return ORC1(func(a,b),c); } \
> +  type func ## _orc2_T_## type (type a, type b, type c) { return ORC2(func(a,b),c); } \
> +  type func ## _xor_T_ ## type (type a, type b, type c) { return XOR(func(a,b),c); } \
> +  type func ## _rev_and_T_ ## type (type a, type b, type c) { return AND(c,func(a,b)); } \
> +  type func ## _rev_andc1_T_   ## type (type a, type b, type c) { return ANDC1(c,func(a,b)); } \
> +  type func ## _rev_andc2_T_   ## type (type a, type b, type c) { return ANDC2(c,func(a,b)); } \
> +  type func ## _rev_eqv_T_ ## type (type a, type b, type c) { return EQV(c,func(a,b)); } \
> +  type func ## _rev_nand_T_## type (type a, type b, type c) { return NAND(c,func(a,b)); } \
> +  type func ## _rev_nor_T_ ## type (type a, type b, type c) { return NOR(c,func(a,b)); } \
> +  type func ## _rev_or_T_  ## type (type a, type b, type c) { return OR(c,func(a,b)); } \
> +  type func ## _rev_orc1_T_## type (type a, type b, type c) { return ORC1(c,func(a,b)); } \
> +  type func ## _rev_orc2_T_## type (type a, type b, type c) { return ORC2(c,func(a,b)); } \
> +  type func ## _rev_xor_T_ ## type (type a, type b, type c) { return XOR(c,func(a,b)); }
> +#define TEST(type)\
> +  TEST1(type,AND) \
> +  TEST1(type,ANDC1)   \
> +  TEST1(type,ANDC2)   \
> +  TEST1(type,EQV) \
> +  TEST1(type,NAND)\
> +  TEST1(type,NOR) \
> +  TEST1(type,OR)  \
> +  TEST1(type,ORC1)\
> +  TEST1(type,ORC2)\
> +  TEST1(type,XOR)
> +
> +typedef vector bool char vboolchar_t;
> +typedef vector unsigned int vuint_t;
> +
> +TEST(uint8_t);
> +TEST(int8_t);
> +TEST(uint16_t);
> +TEST(int16_t);
> +TEST(uint32_t);
> +TEST(int32_t);
> +TEST(uint64_t);
> +TEST(int64_t);
> +TEST(vboolchar_t);
> +TEST(vuint_t);
> +
> +/* Recreate with:
> +   grep ' \*fuse_' fusion-p10-2logical.s|sed -e 's,^.*\*,,' |sort -k 7,7 |uniq -c|awk '{l=30-length($2); printf("/%s* { %s { scan-assembler-times \"%s\"%-*s%4d } } *%s/\n","","dg-final",$2,l,"",$1,"");}'
> + */
> +  
> +/* { dg-final { scan-assembler-times "fuse_and_and/1"                  16 } } */
> +/* { dg-final { scan-assembler-times "fuse_and_and/2"                  16 } } */
> +/* { dg-final { scan-assembler-times "fuse_andc_and/0"                 16 } } */
> +/* { dg-final { scan-assembler-times "fuse_andc_and/1"                 26 } } */
> +/* { dg-final { scan-assembler-times "fuse_andc_and/2"                 48 } } */
> +/* { 

Re: [PATCH,rs6000] Add insn types for fusion pairs

2021-05-11 Thread Aaron Sawdey via Gcc-patches
Ping.

In answer to Will’s question — some of these are not immediately used but will 
be in other pending patches.

Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
 

> On Apr 26, 2021, at 1:04 PM, acsaw...@linux.ibm.com wrote:
> 
> From: Aaron Sawdey 
> 
> This adds new values for insn attr type for p10 fusion. The genfusion.pl
> script is modified to use them, and fusion.md regenerated to capture
> the new patterns. There are also some formatting only changes to
> fusion.md that apparently weren't captured after a previous commit
> of genfusion.pl.
> 
> If bootstrap/regtest passes, OK for trunk and backport to 11.2?
> 
> Thanks,
>Aaron
> 
> gcc/
>   * rs6000.md (define_attr "type"): Add types for fusion.
>   * genfusion.pl (gen_ld_cmpi_p10): Use new fusion types.
>   (gen_2logical): Use new fusion types.
>   * fusion.md: Regenerate.
> ---
> gcc/config/rs6000/fusion.md| 288 -
> gcc/config/rs6000/genfusion.pl |   8 +-
> gcc/config/rs6000/rs6000.md|   4 +-
> 3 files changed, 152 insertions(+), 148 deletions(-)
> 
> diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
> index 56478fcae1d..6d71bc2df73 100644
> --- a/gcc/config/rs6000/fusion.md
> +++ b/gcc/config/rs6000/fusion.md
> @@ -35,7 +35,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
>(set (match_dup 2)
> (compare:CC (match_dup 0) (match_dup 3)))]
>   ""
> -  [(set_attr "type" "load")
> +  [(set_attr "type" "fused_load_cmpi")
>(set_attr "cost" "8")
>(set_attr "length" "8")])
> 
> @@ -56,7 +56,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
>(set (match_dup 2)
> (compare:CCUNS (match_dup 0) (match_dup 3)))]
>   ""
> -  [(set_attr "type" "load")
> +  [(set_attr "type" "fused_load_cmpi")
>(set_attr "cost" "8")
>(set_attr "length" "8")])
> 
> @@ -77,7 +77,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
>(set (match_dup 2)
> (compare:CC (match_dup 0) (match_dup 3)))]
>   ""
> -  [(set_attr "type" "load")
> +  [(set_attr "type" "fused_load_cmpi")
>(set_attr "cost" "8")
>(set_attr "length" "8")])
> 
> @@ -98,7 +98,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
>(set (match_dup 2)
> (compare:CCUNS (match_dup 0) (match_dup 3)))]
>   ""
> -  [(set_attr "type" "load")
> +  [(set_attr "type" "fused_load_cmpi")
>(set_attr "cost" "8")
>(set_attr "length" "8")])
> 
> @@ -119,7 +119,7 @@ (define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
>(set (match_dup 2)
> (compare:CC (match_dup 0) (match_dup 3)))]
>   ""
> -  [(set_attr "type" "load")
> +  [(set_attr "type" "fused_load_cmpi")
>(set_attr "cost" "8")
>(set_attr "length" "8")])
> 
> @@ -140,7 +140,7 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
>(set (match_dup 2)
> (compare:CCUNS (match_dup 0) (match_dup 3)))]
>   ""
> -  [(set_attr "type" "load")
> +  [(set_attr "type" "fused_load_cmpi")
>(set_attr "cost" "8")
>(set_attr "length" "8")])
> 
> @@ -161,7 +161,7 @@ (define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
>(set (match_dup 2)
> (compare:CC (match_dup 0) (match_dup 3)))]
>   ""
> -  [(set_attr "type" "load")
> +  [(set_attr "type" "fused_load_cmpi")
>(set_attr "cost" "8")
>(set_attr "length" "8")])
> 
> @@ -182,7 +182,7 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
>(set (match_dup 2)
> (compare:CCUNS (match_dup 0) (match_dup 3)))]
>   ""
> -  [(set_attr "type" "load")
> +  [(set_attr "type" "fused_load_cmpi")
>(set_attr "cost" "8")
>(set_attr "length" "8")])
> 
> @@ -203,7 +203,7 @@ (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
>(set (match_dup 2)
> (compare:CC (match_dup 0) (match_dup 3)))]
>   ""
> -  [(set_attr "type" "load")
> +  [(set_attr "type" "fused_load_cmpi")
>(set_attr "cost" "8")
>(set_attr "length" "8")])
> 
> @@ -224,7 +224,7 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"
>(set (match_dup 2)
> (compare:CCUNS (match_dup 0) (match_dup 3)))]
>   ""
> -  [(set_attr "type" "load")
> +  [(set_attr "type" "fused_load_cmpi")
>(set_attr "cost" "8")
>(set_attr "length" "8")])
> 
> @@ -245,7 +245,7 @@ (define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
>(set (match_dup 2)
> (compare:CC (match_dup 0) (match_dup 3)))]
>   ""
> -  [(set_attr "type" "load")
> +  [(set_attr "type" "fused_load_cmpi")
>(set_attr "cost" "8")
>(set_attr "length" "8")])
> 
> @@ -266,7 +266,7 @@ (define_insn_and_split "*lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"
>(set (match_dup 2)
> (compare:CCUNS (match_dup 0) (match_dup 3)))]
>   ""
> -  [(set_attr "type" "load")
> +  [(set_attr "type" "fused_load_cmpi")
>(set_attr "cost" "8")
>(set_attr "length" "8")])
> 
> @@ -287,7 +287,7 @@ (define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
>(set 

Re: [PATCH,rs6000 0/2] p10 add-add and add-logical fusion series

2021-05-11 Thread Aaron Sawdey via Gcc-patches
Ping.

Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
 

> On Apr 26, 2021, at 3:21 PM, acsaw...@linux.ibm.com wrote:
> 
> From: Aaron Sawdey 
> 
> Two more sets of combine patterns for p10 fusion. These require 
> the "Add insn types for fusion pairs" patch I posted earlier today.
> 
> If ok I would like to put these in gcc 12 trunk and backport for 11.2.
> 
> Thanks,
>   Aaron
> 
> Aaron Sawdey (2):
>  combine patterns for add-add fusion
>  Fusion patterns for add-logical/logical-add
> 
> gcc/config/rs6000/fusion.md   | 908 +-
> gcc/config/rs6000/genfusion.pl| 127 ++-
> gcc/config/rs6000/rs6000-cpus.def |   8 +-
> gcc/config/rs6000/rs6000.c|   9 +
> gcc/config/rs6000/rs6000.opt  |  12 +
> .../gcc.target/powerpc/fusion-p10-addadd.c|  41 +
> .../gcc.target/powerpc/fusion-p10-logadd.c|  98 ++
> 7 files changed, 925 insertions(+), 278 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/powerpc/fusion-p10-addadd.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/fusion-p10-logadd.c
> 
> -- 
> 2.27.0
> 



Re: [PATCH] Bump LTO_major_version to 11.

2021-05-11 Thread Richard Biener via Gcc-patches
On Tue, May 11, 2021 at 3:39 PM Jakub Jelinek  wrote:
>
> On Tue, May 11, 2021 at 03:33:58PM +0200, Martin Liška wrote:
> > On 5/11/21 9:49 AM, Richard Biener wrote:
> > > I wonder if we can instead upstream the build-id use and conditionalize
> > > the checksum stuff on some configury?  Some people do seem worried
> > > about "weakening" the checksum.
> >
> > I like the build-id approach. Can we please upstream it?
>
> Not all hosts support build ids...
> So there needs to be an alternative for those.

Well, just keep the old code for those.

Until PCH dies.

Richard.

> Jakub
>


Re: [PATCH] Bump LTO_major_version to 11.

2021-05-11 Thread Jakub Jelinek via Gcc-patches
On Tue, May 11, 2021 at 03:33:58PM +0200, Martin Liška wrote:
> On 5/11/21 9:49 AM, Richard Biener wrote:
> > I wonder if we can instead upstream the build-id use and conditionalize
> > the checksum stuff on some configury?  Some people do seem worried
> > about "weakening" the checksum.
> 
> I like the build-id approach. Can we please upstream it?

Not all hosts support build ids...
So there needs to be an alternative for those.

Jakub



Re: [PATCH] Bump LTO_major_version to 11.

2021-05-11 Thread Martin Liška

On 5/11/21 9:49 AM, Richard Biener wrote:
> I wonder if we can instead upstream the build-id use and conditionalize
> the checksum stuff on some configury?  Some people do seem worried
> about "weakening" the checksum.

I like the build-id approach. Can we please upstream it?

If it's not feasible then we can consider using my approach.

Martin


Re: [PATCH] forwprop: Support vec perm fed by CTOR and CTOR/CST [PR99398]

2021-05-11 Thread Richard Biener
On Fri, 7 May 2021, Kewen.Lin wrote:

> Hi, 
> 
> This patch teaches forwprop to optimize some cases where the
> permuted operands of a vector permutation come from two same-type
> CTORs or from one CTOR and one VECTOR CST.  It aggressively
> treats VIEW_CONVERT_EXPR as a trivial copy and transforms the vector
> permutation into a vector CTOR.
> 
> Bootstrapped/regtested on powerpc64le-linux-gnu P9, powerpc64-linux-gnu P8,
> x86_64-redhat-linux and aarch64-linux-gnu.
> 
> Is it ok for trunk?

Can you please avoid the changes to get_prop_source_stmt and
can_propagate_from?  It should work to add a single match
of a V_C_E after the get_prop_source_stmt call.  Ideally
we'd have

  /* Shuffle of a constructor.  */
  else if (code == CONSTRUCTOR || code == VECTOR_CST)
{
...
}
  else if (code == VIEW_CONVERT_EXPR)
{
   op1 must also be a V_C_E or VECTOR_CST here
}

but then I fear we have no canonicalization of the VECTOR_CST
to 2nd VEC_PERM operand.  But then moving the op1 gathering
out of the if (code == CONSTRUCTOR || code == VECTOR_CST)
case (doesn't need an else) might still make such refactoring
possible as first matching

  if (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
   {
...
  }
  else if (code == CONSTRUCTOR || code == VECTOR_CST)
...

I'd appreciate Richard S.'s comments on the
vec_perm_indices::new_shrinked_vector code.

Thanks,
Richard.

> BR,
> Kewen
> --
> gcc/ChangeLog:
> 
>   PR tree-optimization/99398
>   * tree-ssa-forwprop.c (get_prop_source_stmt): Add optional argument
>   view_conv_prop to indicate whether to take VIEW_CONVERT_EXPR as
>   trivial copy.  Add handlings for this argument.
>   (remove_prop_source_from_use): Likewise.
>   (simplify_permutation): Optimize some cases where the fed operands
>   are CTOR/CST and propagated through VIEW_CONVERT_EXPR.  Add the
>   call to vec_perm_indices::new_shrinked_vector.
>   * vec-perm-indices.c (vec_perm_indices::new_shrinked_vector): New
>   function.
>   * vec-perm-indices.h (vec_perm_indices::new_shrinked_vector): New
>   declare.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/99398
>   * gcc.target/powerpc/vec-perm-ctor-run.c: New test.
>   * gcc.target/powerpc/vec-perm-ctor.c: New test.
>   * gcc.target/powerpc/vec-perm-ctor.h: New test.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH, OG10, OpenMP 5.0, committed] Remove array section base-pointer mapping semantics, and other front-end adjustments.

2021-05-11 Thread Julian Brown
On Tue, 11 May 2021 19:28:04 +0800
Chung-Lin Tang  wrote:

> This patch largely implements three pieces of functionality:
> 
> (1) Per discussion and clarification on the omp-lang mailing list,
> standards conforming behavior for mapping array sections should *NOT*
> also map the base-pointer, i.e. for this code:
> 
> struct S { int *ptr; ... };
> struct S s;
> #pragma omp target enter data map(to: s.ptr[:100])
> 
> Currently we generate after gimplify:
> #pragma omp target enter data map(struct:s [len: 1]) map(alloc:s.ptr [len: 8]) \
>     map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])
> 
> which is deemed incorrect. After this patch, the gimplify results are
> now adjusted to:
> #pragma omp target enter data map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])
> (the attach operation is still generated, and if s.ptr is already mapped
> prior, attachment will happen)

Oh, that's not going to play nicely (eventually?) with the patch series
I just posted... we probably need to clarify what the intention is for
OpenACC, but IIUC "user expectation" (i.e. existing code) expects the
base-pointer mapping to happen.
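
For reference, a minimal sketch of the kind of user code under discussion. It assumes OpenMP 5.0 map semantics; the function name sum_first_100 and its parameter are made up for illustration and are not part of either patch series. Without -fopenmp the pragmas are ignored and the loop simply runs on the host, so it says nothing about which mappings the compiler actually emits.

```cpp
struct S { int *ptr; };

/* Hypothetical helper: sums the first 100 elements through a struct member
   pointer that is mapped as an array section.  */
int sum_first_100 (int *data)
{
  S s;
  s.ptr = data;
  int sum = 0;

  /* The directive from the quoted message: map the section s.ptr[:100].
     Whether the base pointer s.ptr itself also gets mapped here is exactly
     the point being debated.  */
#pragma omp target enter data map(to: s.ptr[:100])

  /* Naming the same section on the target construct lets the region use
     s.ptr regardless of how the enter data directive treated the pointer.  */
#pragma omp target map(to: s.ptr[:100]) map(tofrom: sum)
  for (int i = 0; i < 100; ++i)
    sum += s.ptr[i];

#pragma omp target exit data map(release: s.ptr[:100])
  return sum;
}
```

Code that omits the second map of s.ptr[:100] and relies on the enter data directive alone is where the two behaviors would presumably differ.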

Julian


Re: [PATCH] s390: Add more vcond_mask patterns.

2021-05-11 Thread Andreas Krebbel via Gcc-patches
Hi Robin,


On 5/5/21 5:18 PM, Robin Dapp wrote:
...
> diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
> index c80d582a300..7c730432d80 100644
> --- a/gcc/config/s390/vector.md
> +++ b/gcc/config/s390/vector.md
> @@ -36,6 +36,7 @@
>  (define_mode_iterator V_HW2 [V16QI V8HI V4SI V2DI V2DF (V4SF "TARGET_VXE")
>(V1TF "TARGET_VXE") (TF "TARGET_VXE")])
>
> +

whitespace diff?

>  (define_mode_iterator V_HW_64 [V2DI V2DF])
>  (define_mode_iterator VT_HW_HSDT [V8HI V4SI V4SF V2DI V2DF V1TI V1TF TI TF])
>  (define_mode_iterator V_HW_HSD [V8HI V4SI (V4SF "TARGET_VXE") V2DI V2DF])
> @@ -725,6 +726,26 @@
>"TARGET_VX"
>"operands[4] = CONST0_RTX (mode);")
>
> +(define_expand "vcond_mask_"
> +  [(set (match_operand:VX_VEC_CONV_BFP 0 "register_operand" "")
> + (if_then_else:VX_VEC_CONV_BFP
> +  (eq (match_operand:VX_VEC_CONV_INT 3 "register_operand" "")
> +  (match_dup 4))
> +  (match_operand:VX_VEC_CONV_BFP 2 "register_operand" "")
> +  (match_operand:VX_VEC_CONV_BFP 1 "register_operand" "")))]
> +  "TARGET_VX"
> +  "operands[4] = CONST0_RTX (mode);")

This should be covered by the existing pattern already.

> +
> +(define_expand "vcond_mask_"
> +  [(set (match_operand:VX_VEC_CONV_INT 0 "register_operand" "")
> + (if_then_else:VX_VEC_CONV_INT
> +  (eq (match_operand:VX_VEC_CONV_BFP 3 "register_operand" "")
> +  (match_dup 4))
> +  (match_operand:VX_VEC_CONV_INT 2 "register_operand" "")
> +  (match_operand:VX_VEC_CONV_INT 1 "register_operand" "")))]
> +  "TARGET_VX"
> +  "operands[4] = CONST0_RTX (mode);")

op3 is supposed to be a comparison result operand. A vector float mode looks
wrong here.

I think the real problem is the expander name. That's why it could not be
found by optab. The second mode needs to be the int vector mode of op3. With
that change the testcases work as expected:

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index c80d582a300d..ab605b3d2cf3 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -715,7 +715,7 @@
   DONE;
 })

-(define_expand "vcond_mask_"
+(define_expand "vcond_mask_"
   [(set (match_operand:V 0 "register_operand" "")
(if_then_else:V
 (eq (match_operand: 3 "register_operand" "")


> +
>
>  ; We only have HW support for byte vectors.  The middle-end is
>  ; supposed to lower the mode if required.
> diff --git a/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-double.c b/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-double.c
> new file mode 100644
> index 000..8795d08a732
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-double.c
> @@ -0,0 +1,41 @@
> +/* Check for vectorization of mixed conditionals.  */
> +/* { dg-do compile { target { s390*-*-* } } } */
> +/* { dg-options "-O3 -march=z14 -mzarch" } */

I think you have to add -fdump-tree-vect-details here. Otherwise the dump scan
below will just go as "unresolved".
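
A cut-down sketch of what the adjusted test header might look like, keeping only one of the four loops. The scan count of 1 is an assumption that follows from keeping a single function; it is not taken from the posted patch.

```cpp
/* Check for vectorization of mixed conditionals.  */
/* { dg-do compile { target { s390*-*-* } } } */
/* { dg-options "-O3 -march=z14 -mzarch -fdump-tree-vect-details" } */

double zd[1024];
double wd[1024];
long xl[1024];

/* Select between two double arrays based on a long condition array.  */
void foold ()
{
  int i;
  for (i = 0; i < 1024; ++i)
    zd[i] = xl[i] ? zd[i] : wd[i];
}

/* With the vect dump requested above, this scan has a file to look at.  */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
```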

> +
> +double xd[1024];
> +double zd[1024];
> +double wd[1024];
> +
> +long xl[1024];
> +long zl[1024];
> +long wl[1024];
> +
> +void foold ()
> +{
> +  int i;
> +  for (i = 0; i < 1024; ++i)
> +zd[i] = xl[i] ? zd[i] : wd[i];
> +}
> +
> +void foodl ()
> +{
> +  int i;
> +  for (i = 0; i < 1024; ++i)
> +zl[i] = xd[i] ? zl[i] : wl[i];
> +}
> +
> +void foold2 ()
> +{
> +  int i;
> +  for (i = 0; i < 1024; ++i)
> +zd[i] = (xd[i] > 0) ? zd[i] : wd[i];
> +}
> +
> +void foold3 ()
> +{
> +  int i;
> +  for (i = 0; i < 1024; ++i)
> +zd[i] = (xd[i] > 0. & wd[i] < 0.) ? zd[i] : wd[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-float.c b/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-float.c
> new file mode 100644
> index 000..1153cace420
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-float.c
> @@ -0,0 +1,41 @@
> +/* Check for vectorization of mixed conditionals.  */
> +/* { dg-do compile { target { s390*-*-* } } } */
> +/* { dg-options "-O3 -march=z15 -mzarch" } */

Likewise.

> +
> +float xf[1024];
> +float zf[1024];
> +float wf[1024];
> +
> +int xi[1024];
> +int zi[1024];
> +int wi[1024];
> +
> +void fooif ()
> +{
> +  int i;
> +  for (i = 0; i < 1024; ++i)
> +zf[i] = xi[i] ? zf[i] : wf[i];
> +}
> +
> +void foofi ()
> +{
> +  int i;
> +  for (i = 0; i < 1024; ++i)
> +zi[i] = xf[i] ? zi[i] : wi[i];
> +}
> +
> +void fooif2 ()
> +{
> +  int i;
> +  for (i = 0; i < 1024; ++i)
> +zf[i] = (xf[i] > 0) ? zf[i] : wf[i];
> +}
> +
> +void fooif3 ()
> +{
> +  int i;
> +  for (i = 0; i < 1024; ++i)
> +zf[i] = (xf[i] > 0.f & wf[i] < 0.f) ? zf[i] : wf[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
> --
> 2.23.0
>

Andreas


Re: [PATCH 1.0/2] ipa-sra: Introduce a mini-DCE to tree-inline.c (PR 93385)

2021-05-11 Thread Richard Biener via Gcc-patches
On Mon, May 10, 2021 at 8:52 PM Martin Jambor  wrote:
>
> Hi,
>
> On Mon, May 10 2021, Richard Biener wrote:
> > I've tried to have a look at this patch but it does a lot of IPA specific
> > refactoring(?), so the actual DCE bits are hard to find.  Is it possible
> > to split the patch up or is it too entangled?
> >
>
> Yes:
>
> I was asked by Richi to split my fix for PR 93385 for easier review
> into IPA-SRA materialization refactoring and the actual DCE addition.
> This is the second part that actually contains the DCE of statements
> that IPA-SRA should not leave behind because they can have problematic
> side effects, even if they are useless, so that we do not depend on
> tree-dce to remove them for correctness.
>
> The patch fixes the problem by doing a def-use walk when materializing
> clones, marking which statements should not be copied and which
> SSA_NAMEs do not need to be computed because eventually they would be
> DCEd.  We do this on the original function body and tree-inline simply
> does not copy statements which are "dead."
>
> The only complication is removing dead argument calls because that
> needs to be communicated to callee redirection code using the
> infrastructure introduced by the previous patch.
>
> I added all testcases of the original patch to this one, although some
> probably test behavior introduced in the previous patch.
>
> The patch is so far only lightly tested but I have verified that
> together with the second one they make up pretty much exactly the
> original one (modulo m_new_call_arg_modification_info) which I did
> bootstrap this morning.  I will of course bootstrap it independently
> too.
>
> What do you think?
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2021-05-10  Martin Jambor  
>
> PR ipa/93385
> * ipa-param-manipulation.h (class ipa_param_body_adjustments): New
> members m_dead_stmts and m_dead_ssas.
> * ipa-param-manipulation.c (phi_arg_will_live_p): New function.
> (ipa_param_body_adjustments::mark_dead_statements): Likewise.
> (ipa_param_body_adjustments::common_initialization): Call it on
> all removed but not split parameters.
> (ipa_param_body_adjustments::ipa_param_body_adjustments): Initialize
> new members.
> (ipa_param_body_adjustments::modify_call_stmt): Remove arguments that
> are dead.
> * tree-inline.c (remap_gimple_stmt): Do not copy dead statements,
> reset dead debug statements.
> (copy_phis_for_bb): Do not copy dead PHI nodes.
>
> gcc/testsuite/ChangeLog:
>
> 2021-03-22  Martin Jambor  
>
> PR ipa/93385
> * gcc.dg/ipa/pr93385.c: New test.
> * gcc.dg/ipa/ipa-sra-23.c: Likewise.
> * gcc.dg/ipa/ipa-sra-24.c: Likewise.
> * g++.dg/ipa/ipa-sra-4.C: Likewise.
> ---
>  gcc/ipa-param-manipulation.c  | 142 +++---
>  gcc/ipa-param-manipulation.h  |   6 ++
>  gcc/testsuite/g++.dg/ipa/ipa-sra-4.C  |  37 +++
>  gcc/testsuite/gcc.dg/ipa/ipa-sra-23.c |  24 +
>  gcc/testsuite/gcc.dg/ipa/ipa-sra-24.c |  20 
>  gcc/testsuite/gcc.dg/ipa/pr93385.c|  27 +
>  gcc/tree-inline.c |  18 +++-
>  7 files changed, 256 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ipa/ipa-sra-4.C
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-23.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-24.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr93385.c
>
> diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
> index 424b8e5343f..d7d73542856 100644
> --- a/gcc/ipa-param-manipulation.c
> +++ b/gcc/ipa-param-manipulation.c
> @@ -969,6 +969,97 @@ ipa_param_body_adjustments::carry_over_param (tree t)
>return new_parm;
>  }
>
> +/* Return true if BLOCKS_TO_COPY is NULL or if PHI has an argument ARG in
> +   position that corresponds to an edge that is coming from a block that has
> +   the corresponding bit set in BLOCKS_TO_COPY.  */
> +
> +static bool
> +phi_arg_will_live_p (gphi *phi, bitmap blocks_to_copy, tree arg)
> +{
> +  bool arg_will_survive = false;
> +  if (!blocks_to_copy)
> +arg_will_survive = true;
> +  else
> +for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
> +  if (gimple_phi_arg_def (phi, i) == arg

I think this is prone to quadratic behavior; it would be nice to use the
faster FOR_EACH_IMM_USE_FAST () below and then
phi_arg_index_from_use () to get to the corresponding edge
directly.  This would remove the loop over PHI args here and
you can then inline the function below.
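
To illustrate the difference, here is a toy model, not GCC code. The types PhiArg and Phi and all three functions are hypothetical stand-ins; only the idea of deriving the argument index from the use itself mirrors what phi_arg_index_from_use () does.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one PHI argument slot.
struct PhiArg {
  int def;        // stand-in for the SSA name used in this slot
  int src_block;  // stand-in for gimple_phi_arg_edge (phi, i)->src->index
};

// Hypothetical stand-in for a PHI node.
struct Phi {
  std::vector<PhiArg> args;
};

// Shape of the quoted helper: for a given value, rescan every argument slot.
// Calling this once per use of the value visits each PHI repeatedly.
// An empty blocks_to_copy models the NULL bitmap (everything is copied).
bool arg_survives_by_scan (const Phi &phi,
                           const std::vector<bool> &blocks_to_copy, int def)
{
  if (blocks_to_copy.empty ())
    return true;
  for (std::size_t i = 0; i < phi.args.size (); ++i)
    if (phi.args[i].def == def && blocks_to_copy[phi.args[i].src_block])
      return true;
  return false;
}

// Shape of the suggestion: the caller already holds the use (modelled as a
// pointer into args), so the slot index is plain pointer arithmetic.
std::size_t arg_index_from_use (const Phi &phi, const PhiArg *use)
{
  return static_cast<std::size_t> (use - phi.args.data ());
}

// Constant-time check for a single use, with no loop over the PHI arguments.
bool use_survives (const Phi &phi, const std::vector<bool> &blocks_to_copy,
                   const PhiArg *use)
{
  if (blocks_to_copy.empty ())
    return true;
  return blocks_to_copy[phi.args[arg_index_from_use (phi, use)].src_block];
}
```

In this model an immediate-use walk would call use_survives once per use, so the total work is linear in the number of uses rather than uses times PHI arguments.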

> + && bitmap_bit_p (blocks_to_copy,
> +  gimple_phi_arg_edge (phi, i)->src->index))
> +   {
> + arg_will_survive = true;
> + break;
> +   }
> +  return arg_will_survive;
> +}
> +
> +/* Populate m_dead_stmts given that DEAD_PARAM is going to be removed without
> +   any replacement or splitting.  REPL is the replacement VAR_DECL to base any
> +   

Re: [PATCH] ipa/100513 - fix SSA_NAME_DEF_STMT corruption in IPA param manip

2021-05-11 Thread Martin Jambor
Hi,

On Tue, May 11 2021, Richard Biener wrote:
> This fixes unintended clobbering of SSA_NAME_DEF_STMT of the
> cloned/inlined-from SSA name during IPA parameter manipulation
> of call stmt LHSs.  gimple_call_set_lhs adjusts SSA_NAME_DEF_STMT
> of the lhs to the stmt being modified, but when
> ipa_param_body_adjustments::modify_call_stmt is called the
> cloning/inlining process has not yet remapped the stmt's operands
> to the copy variants, so they are still the originals.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> OK for trunk and branch?

Yes, thank you.

Martin


>
> Thanks,
> Richard.
>
> 2021-05-11  Richard Biener  
>
>   PR ipa/100513
>   * ipa-param-manipulation.c
>   (ipa_param_body_adjustments::modify_call_stmt): Avoid
>   altering SSA_NAME_DEF_STMT by adjusting the calls LHS
>   via gimple_call_lhs_ptr.



Re: [PATCH 1/2] vect: Add costing_for_scalar parameter to init_cost hook

2021-05-11 Thread Richard Biener via Gcc-patches
On Tue, May 11, 2021 at 12:50 PM Kewen.Lin  wrote:
>
> Hi Richi,
>
> > OTOH we already pass scalar_stmt to individual add_stmt_cost,
> > so not sure whether the context really matters.  That said,
> > the density test looks "interesting" ... the intent was that finish_cost
> > might look at gathered data from add_stmt, not that it looks at
> > the GIMPLE IL ... so why are you not counting vector_stmt vs.
> > scalar_stmt entries in vect_body and using that for this metric?
> >
> 
>  Good to know the intention behind finish_cost, thanks!
> 
>  I'm afraid that the check on vector_stmt and scalar_stmt entries
>  from add_stmt_cost doesn't work for the density test here.  The
>  density test focuses on the vector version itself, there are some
>  stmts whose relevants are marked as vect_unused_in_scope, IIUC
>  they won't be passed down when costing for both versions.  But the
>  existing density check would like to know the cost for the
>  non-vectorized part.  The current implementation does:
> 
>   vec_cost = data->cost[vect_body]
> 
>    if (!STMT_VINFO_RELEVANT_P (stmt_info)
>    && !STMT_VINFO_IN_PATTERN_P (stmt_info))
>  not_vec_cost++
> 
>   density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);
> 
>  it takes those unrelevant stmts into account, and then has
>  both costs from the non-vectorized part (not_vec_cost)
>  and vectorized part (cost[vect_body]), it can calculate the
>  vectorization code density ratio.
> >>>
> >>> Yes, but then what "relevant" stmts are actually needed and what
> >>> not is missed by your heuristics.  It's really some GIGO one
> >>> I fear - each vectorized data reference will add a pointer IV
> >>> (eventually commoned by IVOPTs later) and pointer value updates
> >>> that are not accounted for in costing (the IVs and updates in the
> >>> scalar code are marked as not relevant).  Are those the stmts
> >>> this heuristic wants to look at?
> >>
> >> Yes, the IVs and updates (even the comparison for exit) are what
> >> the heuristics tries to count.  In most cases, the non vectorized
> >> part in the loop are IV updates.  And it's so true that the
> >> collected not_vec_cost could be not accurate, but it seems hard
> >> to predict the cost exactly here?
> >>
> >> Assuming this not_vect_cost cost is over priced, it could result
> >> in a lower density ratio than what it should be.  Also assuming
> >> the density threshold is relatively conservative, in this case
> >> if the ratio still exceeds the density threshold, we can say the
> >> loop is really dense.  It could miss to catch some "dense" loops,
> >> but I hope it won't take "non-dense" loops as "dense" unexpectedly.
> >
> > So we could in principle include IVs and updates in the costing but
> > then the vectorizer isn't absolutely careful for doing scalar cleanups
> > and instead expects IVOPTs to create canonical IVs.  Note for
> > the scalar part those stmts are not costed either, we'd have to
> > change that as well.  What this would mean is that for a scalar
> > loop accessing a[i] and b[i] we'd have one original IV + update
> > and the vectorizer generates two pointer IVs + updates.
> >
>
>
> I broke down my understanding a bit below to ensure it's correct.
>
>   - We can pass down those "unrelevant" stmts into add_stmt_cost
> for both scalar and vector versions, then targets can check
> stmts accordingly instead of scanning IL by themselves.
> For scalar version, these are mainly original IV + update
> + some address ref calculation;  while for vector version,
> these are mainly pointer IVs + updates.
>
>   - What's the cost assigned for these "unrelevant" stmts?
> The comments seem to imply we want to cost them?   If so,
> I am worried that it can break some current costing
> heuristics which don't consider these costs.  Besides,
> these "unrelevant" stmts can be optimized later, if we
> consider them somewhere like calculating profitable min
> iter, could result in worse code?
> Can we pass them down but cost them freely?
>
> > But in the end the vector code shouldn't end up worse than the
> > scalar code with respect to IVs - the cases where it would should
> > be already costed.  So I wonder if you have specific examples
> > where things go worse enough for the heuristic to trigger?
> >
>
> One typical case that I worked on to reuse this density check is the
> function mat_times_vec of src file block_solver.fppized.f of SPEC2017
> 503.bwaves_r, the density with the existing heuristic is 83 (doesn't
> exceed the threshold unlikely).  The interesting loop is the innermost
> one while option set is "-O2 -mcpu=power8 -ffast-math -ftree-vectorize".
> We have verified that this loop isn't profitable to be vectorized at
> O2 (without loop-interchange).

Yeah, but that's because the loop only runs 5 iterations, not 

[PATCH, OG10, OpenMP 5.0, committed] Remove array section base-pointer mapping semantics, and other front-end adjustments.

2021-05-11 Thread Chung-Lin Tang

This patch largely implements three pieces of functionality:

(1) Per discussion and clarification on the omp-lang mailing list,
standards conforming behavior for mapping array sections should *NOT* also
map the base-pointer, i.e. for this code:

   struct S { int *ptr; ... };
   struct S s;
   #pragma omp target enter data map(to: s.ptr[:100])

Currently we generate after gimplify:
#pragma omp target enter data map(struct:s [len: 1]) map(alloc:s.ptr [len: 8]) \
  map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

which is deemed incorrect. After this patch, the gimplify results are now
adjusted to:
#pragma omp target enter data map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])
(the attach operation is still generated, and if s.ptr is already mapped
prior, attachment will happen)

The correct way of achieving the base-pointer-also-mapped behavior would be
to use:
#pragma omp target enter data map(to: s.ptr, s.ptr[:100])

This adjustment in behavior required a number of small adjustments here and
there in gimplify, including to accommodate map sequences for C++ references.

There is also a small Fortran front-end patch involved (hence CCing Tobias).
The new gimplify processing changed behavior in handling
GOMP_MAP_ALWAYS_POINTER maps such that
libgomp.fortran/struct-elem-map-1.f90 regressed. It appeared that the
Fortran FE was generating a GOMP_MAP_ALWAYS_POINTER for array types, which
didn't seem quite correct, and the pre-patch behavior was removing this map
anyway. I have a small change in trans-openmp.c:gfc_trans_omp_array_section
to not generate the map in this case, and so far no bad test results.

(2) The second part (though kind of related to the first above) consists of
fixes in libgomp/target.c to not overwrite attached pointers when handling
device<->host copies, mainly for the "always" case.
This behavior is also noted in the 5.0 spec, but was not properly implemented
before.

(3) The third is a set of changes to the C/C++ front-ends to extend the
allowed component access syntax in map clauses. This is actually mainly an
effort to allow SPEC HPC to compile, so although in the long term the entire
map clause syntax parsing is probably going to be revamped, we're still
adding this in for now. These changes are enabled for both OpenACC and OpenMP.

Tested on x86_64-linux with nvptx offloading with no regressions. Pushed to
devel/omp/gcc-10, will send mainline version of patch later.

Chung-Lin

2021-05-11  Chung-Lin Tang  

gcc/c/ChangeLog:

* c-parser.c (struct omp_dim): New struct type for use inside
c_parser_omp_variable_list.
(c_parser_omp_variable_list): Allow multiple levels of array and
component accesses in array section base-pointer expression.
(c_parser_omp_clause_to): Set 'allow_deref' to true in call to
c_parser_omp_var_list_parens.
(c_parser_omp_clause_from): Likewise.
* c-typeck.c (handle_omp_array_sections_1): Extend allowed range
of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
POINTER_PLUS_EXPR.
(c_finish_omp_clauses): Extend allowed range of expressions
involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/cp/ChangeLog:

* parser.c (struct omp_dim): New struct type for use inside
cp_parser_omp_var_list_no_open.
(cp_parser_omp_var_list_no_open): Allow multiple levels of array and
component accesses in array section base-pointer expression.
(cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to
cp_parser_omp_var_list for to/from clauses.
* semantics.c (handle_omp_array_sections_1): Extend allowed range
of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
POINTER_PLUS_EXPR.
(handle_omp_array_sections): Adjust pointer map generation of
references.
(finish_omp_clauses): Extend allowed range of expressions
involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/fortran/ChangeLog:

* trans-openmp.c (gfc_trans_omp_array_section): Do not generate
GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type.


gcc/ChangeLog:

* gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter,
accommodate case where 'offset' return of get_inner_reference is
non-NULL.
(is_or_contains_p): Further robustify conditions.
(omp_target_reorder_clauses): In alloc/to/from sorting phase, also
move following GOMP_MAP_ALWAYS_POINTER maps along.  Add new sorting
phase where we make sure pointers with an attach/detach map are ordered
correctly.
(gimplify_scan_omp_clauses): Add modifications to avoid creating
GOMP_MAP_STRUCT and associated alloc map for attach/detach maps.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/deep-copy-arrayofstruct.c: Adjust testcase.
* c-c++-common/gomp/target-enter-data-1.c: 

[PATCH] ipa/100513 - fix SSA_NAME_DEF_STMT corruption in IPA param manip

2021-05-11 Thread Richard Biener
This fixes unintended clobbering of SSA_NAME_DEF_STMT of the
cloned/inlined-from SSA name during IPA parameter manipulation
of call stmt LHSs.  gimple_call_set_lhs adjusts SSA_NAME_DEF_STMT
of the lhs to the stmt being modified, but when
ipa_param_body_adjustments::modify_call_stmt is called the
cloning/inlining process has not yet remapped the stmt's operands
to the copy variants, so they are still the originals.
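
The effect described above can be illustrated with a small toy model.  This
is only a sketch: the toy_* types and functions are hypothetical stand-ins,
not the real gimple/tree structures, and the setter's side effect is modelled
on what the description says gimple_call_set_lhs does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for a statement and an SSA name; illustrative only,
   not GCC's real data structures.  */
struct toy_stmt;
struct toy_name { struct toy_stmt *def_stmt; };
struct toy_stmt { struct toy_name *lhs; };

/* Setter with a side effect, modelled on the behaviour attributed to
   gimple_call_set_lhs: it also repoints the name's defining statement
   at STMT.  */
static void
toy_set_lhs (struct toy_stmt *stmt, struct toy_name *lhs)
{
  assert (stmt != NULL);
  stmt->lhs = lhs;
  if (lhs)
    lhs->def_stmt = stmt;
}

/* Accessor modelled on gimple_call_lhs_ptr: exposes the LHS slot so the
   caller can store into it without touching the name itself.  */
static struct toy_name **
toy_lhs_ptr (struct toy_stmt *stmt)
{
  assert (stmt != NULL);
  return &stmt->lhs;
}

/* Return 1 if the original statement is still NAME's definition after a
   copy of it has been given the same, not yet remapped, LHS.  */
int
toy_orig_def_survives (int use_setter)
{
  struct toy_name name = { NULL };
  struct toy_stmt orig = { NULL }, copy = { NULL };

  toy_set_lhs (&orig, &name);       /* ORIG defines NAME.  */
  if (use_setter)
    toy_set_lhs (&copy, &name);     /* Clobbers name.def_stmt.  */
  else
    *toy_lhs_ptr (&copy) = &name;   /* Leaves name.def_stmt alone.  */
  return name.def_stmt == &orig;
}
```

In this model, going through the setter makes the original name point at the
copy, while storing through the slot pointer leaves it untouched.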

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK for trunk and branch?

Thanks,
Richard.

2021-05-11  Richard Biener  

PR ipa/100513
* ipa-param-manipulation.c
(ipa_param_body_adjustments::modify_call_stmt): Avoid
altering SSA_NAME_DEF_STMT by adjusting the calls LHS
via gimple_call_lhs_ptr.
---
 gcc/ipa-param-manipulation.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 1d1e64f546a..f2d91476655 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -1692,7 +1692,9 @@ ipa_param_body_adjustments::modify_call_stmt (gcall **stmt_p)
   if (tree lhs = gimple_call_lhs (stmt))
{
  modify_expression (&lhs, false);
- gimple_call_set_lhs (new_stmt, lhs);
+ /* Avoid adjusting SSA_NAME_DEF_STMT of a SSA lhs, SSA names
+have not yet been remapped.  */
+ *gimple_call_lhs_ptr (new_stmt) = lhs;
}
   *stmt_p = new_stmt;
   return true;
-- 
2.26.2


[committed] aarch64: A couple of mul_laneq tweaks

2021-05-11 Thread Richard Sandiford via Gcc-patches
This patch removes the duplication between the mul_laneq3
and the older mul-lane patterns.  The older patterns were previously
divided into two based on whether the indexed operand had the same mode
as the other operands or whether it had the opposite length from the
other operands (64-bit vs. 128-bit).  However, it seemed easier to
divide them instead based on whether the indexed operand was 64-bit or
128-bit, since that maps directly to the arm_neon.h “q” conventions.

Also, it looks like the older patterns were missing cases for
V8HF<->V4HF combinations, which meant that vmul_laneq_f16 and
vmulq_lane_f16 didn't produce single instructions.

There was a typo in the V2SF entry for VCONQ, but in practice
no patterns were using that entry until now.

The test passes for both endiannesses, but endianness does change
the mapping between regexps and functions.

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk.

Richard


gcc/
* config/aarch64/iterators.md (VMUL_CHANGE_NLANES): Delete.
(VMULD): New iterator.
(VCOND): Handle V4HF and V8HF.
(VCONQ): Fix entry for V2SF.
* config/aarch64/aarch64-simd.md (mul_lane3): Use VMULD
instead of VMUL.  Use a 64-bit vector mode for the indexed operand.
(*aarch64_mul3_elt_): Merge with...
(mul_laneq3): ...this define_insn.  Use VMUL instead of VDQSF.
Use a 128-bit vector mode for the indexed operand.  Use stype for
the scheduling type.

gcc/testsuite/
* gcc.target/aarch64/fmul_lane_1.c: New test.
---
 gcc/config/aarch64/aarch64-simd.md| 46 +--
 gcc/config/aarch64/iterators.md   | 13 ++--
 .../gcc.target/aarch64/fmul_lane_1.c  | 59 +++
 3 files changed, 82 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmul_lane_1.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 234762960bd..99620895e78 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -719,51 +719,35 @@ (define_expand "copysign3"
 )
 
 (define_insn "mul_lane3"
- [(set (match_operand:VMUL 0 "register_operand" "=w")
-   (mult:VMUL
-(vec_duplicate:VMUL
+ [(set (match_operand:VMULD 0 "register_operand" "=w")
+   (mult:VMULD
+(vec_duplicate:VMULD
   (vec_select:
-(match_operand:VMUL 2 "register_operand" "")
+(match_operand: 2 "register_operand" "")
 (parallel [(match_operand:SI 3 "immediate_operand" "i")])))
-(match_operand:VMUL 1 "register_operand" "w")))]
+(match_operand:VMULD 1 "register_operand" "w")))]
   "TARGET_SIMD"
   {
-operands[3] = aarch64_endian_lane_rtx (mode, INTVAL (operands[3]));
+operands[3] = aarch64_endian_lane_rtx (mode, INTVAL (operands[3]));
 return "mul\\t%0., %1., %2.[%3]";
   }
   [(set_attr "type" "neon_mul__scalar")]
 )
 
 (define_insn "mul_laneq3"
-  [(set (match_operand:VDQSF 0 "register_operand" "=w")
-   (mult:VDQSF
- (vec_duplicate:VDQSF
-   (vec_select:
- (match_operand:V4SF 2 "register_operand" "w")
- (parallel [(match_operand:SI 3 "immediate_operand" "i")])))
- (match_operand:VDQSF 1 "register_operand" "w")))]
-  "TARGET_SIMD"
-  {
-operands[3] = aarch64_endian_lane_rtx (V4SFmode, INTVAL (operands[3]));
-return "fmul\\t%0., %1., %2.[%3]";
-  }
-  [(set_attr "type" "neon_fp_mul_s_scalar")]
-)
-
-(define_insn "*aarch64_mul3_elt_"
-  [(set (match_operand:VMUL_CHANGE_NLANES 0 "register_operand" "=w")
- (mult:VMUL_CHANGE_NLANES
-   (vec_duplicate:VMUL_CHANGE_NLANES
+  [(set (match_operand:VMUL 0 "register_operand" "=w")
+ (mult:VMUL
+   (vec_duplicate:VMUL
  (vec_select:
-   (match_operand: 1 "register_operand" "")
-   (parallel [(match_operand:SI 2 "immediate_operand")])))
-  (match_operand:VMUL_CHANGE_NLANES 3 "register_operand" "w")))]
+   (match_operand: 2 "register_operand" "")
+   (parallel [(match_operand:SI 3 "immediate_operand")])))
+  (match_operand:VMUL 1 "register_operand" "w")))]
   "TARGET_SIMD"
   {
-operands[2] = aarch64_endian_lane_rtx (mode, INTVAL (operands[2]));
-return "mul\\t%0., %3., %1.[%2]";
+operands[3] = aarch64_endian_lane_rtx (mode, INTVAL (operands[3]));
+return "mul\\t%0., %1., %2.[%3]";
   }
-  [(set_attr "type" "neon_mul__scalar")]
+  [(set_attr "type" "neon_mul__scalar")]
 )
 
 (define_insn "mul_n3"
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index c57aa6bf2f4..69d9dbebe8f 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -312,15 +312,17 @@ (define_mode_iterator SX2 [SI SF])
 (define_mode_iterator DSX [DF DI SF SI])
 
 
-;; Modes available for Advanced SIMD mul lane operations.
+;; Modes available for Advanced SIMD mul operations.
 (define_mode_iterator VMUL [V4HI V8HI V2SI V4SI
   

Re: [PATCH] arm: Avoid emitting bogus CFA adjusts for CMSE nonsecure calls [PR99725]

2021-05-11 Thread Richard Earnshaw via Gcc-patches




On 06/05/2021 09:27, Alex Coplan via Gcc-patches wrote:

Hi all,

The PR shows us attaching REG_CFA_ADJUST_CFA notes to stack pointer
adjustments emitted in cmse_nonsecure_call_inline_register_clear (when
-march=armv8.1-m.main). However, the stack pointer is not guaranteed to
be the CFA reg. If we're at -O0 or we have -fno-omit-frame-pointer, then
the frame pointer will be used as the CFA reg, and these notes on the sp
adjustments will lead to ICEs in dwarf2out_frame_debug_adjust_cfa.

This patch avoids emitting these notes if the current function has a
frame pointer.

Testing:
  * Bootstrapped and regtested on arm-linux-gnueabihf, no regressions.
  * Regtested an arm-eabi cross configured with --with-arch=armv8.1-m.main, no
regressions.

OK for trunk and backports as appropriate?

Thanks,
Alex

gcc/ChangeLog:

PR target/99725
* config/arm/arm.c (cmse_nonsecure_call_inline_register_clear):
Avoid emitting CFA adjusts on the sp if we have the fp.

gcc/testsuite/ChangeLog:

PR target/99725
* gcc.target/arm/cmse/pr99725.c: New test.



OK.

R.


Re: [PATCH] regcprop: Fix another cprop_hardreg bug [PR100342]

2021-05-11 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> On Tue, Jan 19, 2021 at 04:10:33PM +, Richard Sandiford via Gcc-patches wrote:
>> Ah, ok, thanks for the extra context.
>> 
>> So AIUI the problem when recording xmm2<-di isn't just:
>> 
>>  [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
>> 
>> but also that:
>> 
>>  [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
>> 
>> For example, all registers in this sequence can be part of the same chain:
>> 
>> (set (reg:HI R1) (reg:HI R0))
>> (set (reg:SI R2) (reg:SI R1)) // [A]
>> (set (reg:DI R3) (reg:DI R2)) // [A]
>> (set (reg:SI R4) (reg:SI R[0-3]))
>> (set (reg:HI R5) (reg:HI R[0-4]))
>> 
>> But:
>> 
>> (set (reg:SI R1) (reg:SI R0))
>> (set (reg:HI R2) (reg:HI R1))
>> (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
>> 
>> is problematic because it dips below the precision of the oldest regno
>> and then increases again.
>> 
>> When this happens, I guess we have two choices:
>> 
>> (1) what the patch does: treat R3 as the start of a new chain.
>> (2) pretend that the copy occurred in vd->e[sr].mode instead
>> (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
>> 
>> I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
>> Maybe the optimisation provided by (2) compared to (1) isn't common
>> enough to be worth the complication.
>> 
>> I think we should test [B] as well as [A] though.  The pass is set
>> up to do some quite elaborate mode changes and I think rejecting
>> [A] on its own would make some of the other code redundant.
>> It also feels like it should be a separate “if” or “else if”,
>> with its own comment.
>
> Unfortunately, we now have a testcase that shows that testing also [B]
> is a problem (unfortunately now latent on the trunk, only reproduces
> on 10 and 11 branches).

This whole area feels way more complicated than it ought to be :-/

> The comment in the patch tries to list just the interesting instructions,
> we have a 64-bit value, copy low 8 bit of those to another register,
> copy full 64 bits to another register and then clobber the original register.
> Before that (set (reg:DI r14) (const_int ...)) we have a chain
> DI r14, QI si, DI bp , that instruction drops the DI r14 from that chain, so
> we have QI si, DI bp , si being the oldest_regno.
> Next DI si is copied into DI dx.  Only the low 8 bits of that are defined,
> the rest is unspecified, but we would add DI dx into that same chain at the
> end, so QI si, DI bp, DI dx [*].  Next si is overwritten, so the chain is
> DI bp, DI dx.  And then we see (set (reg:DI dx) (reg:DI bp)) and remove it
> as redundant, because we think bp and dx are already equivalent, when in
> reality that is true only for the lowpart 8 bits.
> I believe the [*] marked step above is where the bug is.
>
> The committed regcprop.c (copy_value) change (but only committed to
> trunk/11, not to 10) added
>   else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
>&& partial_subreg_p (vd->e[sr].mode,
> vd->e[vd->e[sr].oldest_regno].mode))
> return;
> and while the first partial_subreg_p call returns true, the second one
> doesn't; before the (set (reg:DI r14) (const_int ...)) insn it would be
> true and we'd return, but as that reg got clobbered, si became the oldest
> regno in the chain and so vd->e[vd->e[sr].oldest_regno].mode is QImode
> and vd->e[sr].mode is QImode too, so the second partial_subreg_p is false.
> But as the testcase shows, what is the oldest_regno in the chain is
> something that changes over time, so relying on it for anything is
> problematic, something could have a different oldest_regno and later
> on get a different oldest_regno (perhaps with different mode) because
> the oldest_regno got overwritten and it can change both ways.
>
> I wrote the following patch (originally against 10 branch because that is
> where Uros has been debugging it) and bootstrapped/regtested it on 11
> branch successfully.
> It effectively implements your (2) above; I'm not sure if
> REG_CAN_CHANGE_MODE_P is needed there, because it is already tested in
> find_oldest_value_reg -> maybe_mode_change -> mode_change_ok.
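
A minimal sketch of the combined [A] && [B] test quoted above, assuming a toy
model in which a mode is just its precision in bits.  The toy_* names are
hypothetical and not regcprop's real data structures; toy_partial_subreg_p
only mimics the "outer is narrower than inner" sense of partial_subreg_p.

```c
#include <assert.h>

/* Precision in bits stands in for a machine mode; this is a toy model,
   not regcprop's real value_data structures.  */
enum { TOY_QI = 8, TOY_HI = 16, TOY_SI = 32, TOY_DI = 64 };

/* Stand-in for partial_subreg_p (OUTER, INNER): true when OUTER is
   narrower than INNER.  */
static int
toy_partial_subreg_p (int outer_bits, int inner_bits)
{
  return outer_bits < inner_bits;
}

/* Return 1 if a copy performed in COPY_BITS from a source register whose
   recorded mode is SRC_BITS is rejected by the quoted check, given that
   the chain's oldest register was recorded in OLDEST_BITS.  The first
   test is condition [A], the second is condition [B].  */
int
toy_copy_rejected (int src_bits, int copy_bits, int oldest_bits)
{
  assert (src_bits > 0 && copy_bits > 0 && oldest_bits > 0);
  return (toy_partial_subreg_p (src_bits, copy_bits)
	  && toy_partial_subreg_p (src_bits, oldest_bits));
}
```

With these definitions the HI/SI/DI chain from the first quoted sequence is
accepted, the SI/HI/SI sequence is rejected, and the QI si to DI dx copy is
rejected only while a DI register is still the oldest in the chain, which is
the dependence on oldest_regno that the testcase exposes.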

The REG_CAN_CHANGE_MODE_P test would in this case be for
vd->e[dr].mode → vd->e[sr].mode, rather than oldest_regno's mode.
I'm just worried that:

   (set (reg:HI R1) (reg:HI R0))
   (set (reg:SI R2) (reg:SI R1))

isn't equivalent to:

   (set (reg:HI R1) (reg:HI R0))
   (set (reg:HI R2) (reg:HI R1))

if REG_CAN_CHANGE_MODE_P is false for either the R2 or R1 change.
If we pretend that it is when building the chain then there's a
risk of GIGO when using it in find_oldest_value_reg.

(Although in this case SI and HI are both valid for R1,
REG_CAN_CHANGE_MODE_P might still be false if the HI bits are
not in the low 16 bits of the SI.  That's unlikely in this case,
but a similar thing can happen for vector modes or multi-register modes.)

I'm not saying the patch is wrong.  I just wanted to clarify
why I 

[PATCH] More maybe_fold_reference TLC

2021-05-11 Thread Richard Biener
This adjusts maybe_fold_reference to adhere to its desired behavior
of performing constant folding and thus explicitly avoid returning
unfolded reference trees.
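
The shape of the change can be sketched as "compute a candidate on every
path, then filter once".  The following is only a toy illustration with
hypothetical toy_* types and helpers, not the GCC tree API: the folder may
hand back either a constant or another reference, and the wrapper returns
the candidate only when it is a constant.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression: either a constant or a reference that may wrap
   another expression.  Hypothetical types for illustration only.  */
enum toy_kind { TOY_CONSTANT, TOY_REFERENCE };

struct toy_expr
{
  enum toy_kind kind;
  int value;
  const struct toy_expr *inner;
};

/* Toy folder: strips one level of reference if there is one.  The
   result may itself still be a reference.  */
static const struct toy_expr *
toy_fold (const struct toy_expr *expr)
{
  assert (expr != NULL);
  return expr->kind == TOY_REFERENCE ? expr->inner : NULL;
}

/* Wrapper following the pattern of the patch: every path assigns
   RESULT, and a single final test lets only constants escape.  */
const struct toy_expr *
toy_maybe_fold (const struct toy_expr *expr)
{
  const struct toy_expr *result = toy_fold (expr);

  if (result && result->kind == TOY_CONSTANT)
    return result;

  return NULL;
}
```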

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-05-11  Richard Biener  

* gimple-fold.c (maybe_fold_reference): Only return
is_gimple_min_invariant values.
---
 gcc/gimple-fold.c | 27 ++-
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 6beb4f3d305..74ec36e3a78 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -311,27 +311,28 @@ get_symbol_constant_value (tree sym)
 static tree
 maybe_fold_reference (tree expr)
 {
-  tree result;
+  tree result = NULL_TREE;
 
   if ((TREE_CODE (expr) == VIEW_CONVERT_EXPR
|| TREE_CODE (expr) == REALPART_EXPR
|| TREE_CODE (expr) == IMAGPART_EXPR)
   && CONSTANT_CLASS_P (TREE_OPERAND (expr, 0)))
-return fold_unary_loc (EXPR_LOCATION (expr),
-  TREE_CODE (expr),
-  TREE_TYPE (expr),
-  TREE_OPERAND (expr, 0));
-  else if (TREE_CODE (expr) == BIT_FIELD_REF
-  && CONSTANT_CLASS_P (TREE_OPERAND (expr, 0)))
-return fold_ternary_loc (EXPR_LOCATION (expr),
+result = fold_unary_loc (EXPR_LOCATION (expr),
 TREE_CODE (expr),
 TREE_TYPE (expr),
-TREE_OPERAND (expr, 0),
-TREE_OPERAND (expr, 1),
-TREE_OPERAND (expr, 2));
+TREE_OPERAND (expr, 0));
+  else if (TREE_CODE (expr) == BIT_FIELD_REF
+  && CONSTANT_CLASS_P (TREE_OPERAND (expr, 0)))
+result = fold_ternary_loc (EXPR_LOCATION (expr),
+  TREE_CODE (expr),
+  TREE_TYPE (expr),
+  TREE_OPERAND (expr, 0),
+  TREE_OPERAND (expr, 1),
+  TREE_OPERAND (expr, 2));
+  else
+result = fold_const_aggregate_ref (expr);
 
-  if ((result = fold_const_aggregate_ref (expr))
-  && is_gimple_min_invariant (result))
+  if (result && is_gimple_min_invariant (result))
 return result;
 
   return NULL_TREE;
-- 
2.26.2


Re: [PATCH 1/2] vect: Add costing_for_scalar parameter to init_cost hook

2021-05-11 Thread Kewen.Lin via Gcc-patches
Hi Richi,

> OTOH we already pass scalar_stmt to individual add_stmt_cost,
> so not sure whether the context really matters.  That said,
> the density test looks "interesting" ... the intent was that finish_cost
> might look at gathered data from add_stmt, not that it looks at
> the GIMPLE IL ... so why are you not counting vector_stmt vs.
> scalar_stmt entries in vect_body and using that for this metric?
>

 Good to know the intention behind finish_cost, thanks!

 I'm afraid that the check on vector_stmt and scalar_stmt entries
 from add_stmt_cost doesn't work for the density test here.  The
 density test focuses on the vector version itself, there are some
 stmts whose relevants are marked as vect_unused_in_scope, IIUC
 they won't be passed down when costing for both versions.  But the
 existing density check would like to know the cost for the
 non-vectorized part.  The current implementation does:

  vec_cost = data->cost[vect_body]

   if (!STMT_VINFO_RELEVANT_P (stmt_info)
   && !STMT_VINFO_IN_PATTERN_P (stmt_info))
 not_vec_cost++

  density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);

 it takes those unrelevant stmts into account, and then has
 both costs from the non-vectorized part (not_vec_cost)
 and vectorized part (cost[vect_body]), it can calculate the
 vectorization code density ratio.
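
The ratio computed by the pseudocode above can be written as a standalone
helper.  This is a sketch only: toy_density_pct is a hypothetical name, and
its two parameters stand for cost[vect_body] and the count of statements
left unvectorized.

```c
#include <assert.h>

/* Percentage of the loop body cost that comes from vectorized
   statements.  VEC_COST plays the role of cost[vect_body] and
   NOT_VEC_COST counts the statements left scalar; the function itself
   is hypothetical.  Uses integer division, as in the pseudocode.  */
int
toy_density_pct (int vec_cost, int not_vec_cost)
{
  assert (vec_cost >= 0 && not_vec_cost >= 0);
  assert (vec_cost + not_vec_cost > 0);
  return (vec_cost * 100) / (vec_cost + not_vec_cost);
}
```

For a fixed vec_cost, a larger not_vec_cost can only lower the result, so an
overestimate of the non-vectorized part makes the check more conservative.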
>>>
>>> Yes, but then what "relevant" stmts are actually needed and what
>>> not is missed by your heuristics.  It's really some GIGO one
>>> I fear - each vectorized data reference will add a pointer IV
>>> (eventually commoned by IVOPTs later) and pointer value updates
>>> that are not accounted for in costing (the IVs and updates in the
>>> scalar code are marked as not relevant).  Are those the stmts
>>> this heuristic wants to look at?
>>
>> Yes, the IVs and updates (even the comparison for exit) are what
>> the heuristics tries to count.  In most cases, the non vectorized
>> part in the loop are IV updates.  And it's so true that the
>> collected not_vec_cost could be not accurate, but it seems hard
>> to predict the cost exactly here?
>>
>> Assuming this not_vect_cost cost is over priced, it could result
>> in a lower density ratio than what it should be.  Also assuming
>> the density threshold is relatively conservative, in this case
>> if the ratio still exceeds the density threshold, we can say the
>> loop is really dense.  It could miss to catch some "dense" loops,
>> but I hope it won't take "non-dense" loops as "dense" unexpectedly.
> 
> So we could in principle include IVs and updates in the costing but
> then the vectorizer isn't absolutely careful for doing scalar cleanups
> and instead expects IVOPTs to create canonical IVs.  Note for
> the scalar part those stmts are not costed either, we'd have to
> change that as well.  What this would mean is that for a scalar
> loop accessing a[i] and b[i] we'd have one original IV + update
> and the vectorizer generates two pointer IVs + updates.
> 


I broke down my understanding a bit below to ensure it's correct.

  - We can pass down those "unrelevant" stmts into add_stmt_cost
for both scalar and vector versions, then targets can check
stmts accordingly instead of scanning IL by themselves.
For scalar version, these are mainly original IV + update
+ some address ref calculation;  while for vector version,
these are mainly pointer IVs + updates.
  
  - What's the cost assigned for these "unrelevant" stmts?
The comments seem to imply we want to cost them?   If so,
I am worried that it can break some current costing
heuristics which don't consider these costs.  Besides,
these "unrelevant" stmts can be optimized later, if we
consider them somewhere like calculating profitable min
iter, could result in worse code?
Can we pass them down but cost them freely?

> But in the end the vector code shouldn't end up worse than the
> scalar code with respect to IVs - the cases where it would should
> be already costed.  So I wonder if you have specific examples
> where things go worse enough for the heuristic to trigger?
> 

One typical case that I worked on to reuse this density check is the
function mat_times_vec of src file block_solver.fppized.f of SPEC2017
503.bwaves_r, the density with the existing heuristic is 83 (doesn't
exceed the threshold unlikely).  The interesting loop is the innermost
one while option set is "-O2 -mcpu=power8 -ffast-math -ftree-vectorize".
We have verified that this loop isn't profitable to be vectorized at
O2 (without loop-interchange).

Another function shell which also comes from 503.bwaves_r src file
shell_lam.fppized.f does hit this threshold, the loop is the one
starting from line 228.

BR,
Kewen


[PATCH] middle-end/100509 - avoid folding constant to aggregate type

2021-05-11 Thread Richard Biener
When folding a constant initializer, looking through aliases to
incompatible types can lead to us trying to fold a constant
to an aggregate type, which can't work.  Simply avoid trying
to constant fold non-register typed symbols.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk
so far.

2021-05-11  Richard Biener  

PR middle-end/100509
* gimple-fold.c (fold_gimple_assign): Only call
get_symbol_constant_value on register type symbols.

* gcc.dg/pr100509.c: New testcase.
---
 gcc/gimple-fold.c   | 3 ++-
 gcc/testsuite/gcc.dg/pr100509.c | 9 +
 2 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr100509.c

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 768ef89d876..6beb4f3d305 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -547,7 +547,8 @@ fold_gimple_assign (gimple_stmt_iterator *si)
   CONSTRUCTOR_ELTS (rhs));
  }
 
-   else if (DECL_P (rhs))
+   else if (DECL_P (rhs)
+&& is_gimple_reg_type (TREE_TYPE (rhs)))
  return get_symbol_constant_value (rhs);
   }
   break;
diff --git a/gcc/testsuite/gcc.dg/pr100509.c b/gcc/testsuite/gcc.dg/pr100509.c
new file mode 100644
index 000..9405e2a27df
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr100509.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+struct X {
+  int a;
+};
+const int a = 0;
+static struct X A __attribute__((alias("a")));
+void foo() { struct X b = A; }
-- 
2.26.2


Re: [PATCH] aarch64: Avoid duplicating bti j insns for jump tables [PR99988]

2021-05-11 Thread Richard Sandiford via Gcc-patches
Alex Coplan  writes:
> Hi Richard,
>
> On 21/04/2021 13:05, Richard Sandiford wrote:
>> Alex Coplan  writes:
>> > Hi Richard,
>> >
>> > On 15/04/2021 18:45, Richard Sandiford wrote:
>> >> Looks good in general, but like you say, it's GCC 12 material.
>> >
>> > Thanks for the review. The attached patch addresses these comments and
>> > bootstraps/regtests OK on aarch64-linux-gnu. OK for trunk?
>> 
>> OK, thanks.
>
> The patch applies cleanly and bootstraps/regtests OK on the GCC 11
> branch. OK to backport there and to 10 and 9 if the same is true for
> those branches?

To summarise what we discussed off-list: this initially looked like it
was “just” a new DCE optimisation, which is why it seemed like GCC 12
material.  However, as Alex pointed out, the unpatched BTI support can
generate code whose size is quadratic in the number of switch cases,
which is a serious regression whichever way you cut it.

So this is OK for backports, and would have been OK during GCC 11
stage 4 too.

Thanks,
Richard


Re: [PATCH] RISC-V: For '-march' and '-mabi' options, add 'Negative' property mentions itself.

2021-05-11 Thread Kito Cheng via Gcc-patches
It seems useful, I've backported to GCC 9/10/11.

Without this patch:
$ riscv64-unknown-elf-gcc -march=rv32imafd ~/hello.c -march=rv64gc
/scratch1/kitoc/riscv-gnu-workspace/rv64gc/install/bin/../lib/gcc/riscv64-unknown-elf/11.1.0/../../../../riscv64-unknown-elf/bin/ld: unrecognised emulation mode: elf3264lriscv
Supported emulations: elf64lriscv elf32lriscv elf64briscv elf32briscv
collect2: error: ld returned 1 exit status

With this patch:
$ riscv64-unknown-elf-gcc -march=rv32imafd ~/hello.c -march=rv64gc
// No error!
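
To illustrate why the duplicated option breaks the link step, here is a
minimal Python sketch.  It assumes the driver builds the linker emulation
name by appending "32" or "64" for every -march= value with a matching
prefix; that is an inference from the error message above, not a copy of
the real specs.  All function names are hypothetical.

```python
def xlen_suffix(options):
    """Hypothetical model: add '32'/'64' for each matching -march= prefix."""
    suffix = ""
    if any(o.startswith("-march=rv32") for o in options):
        suffix += "32"
    if any(o.startswith("-march=rv64") for o in options):
        suffix += "64"
    return suffix


def emulation(options):
    """Assumed shape of the emulation name handed to ld."""
    return "elf" + xlen_suffix(options) + "lriscv"


def prune_self_negating(options, names=("-march=", "-mabi=")):
    """Model of an option that negates itself: only the last occurrence of
    each listed option survives, every other argument is kept in order."""
    last = {}
    for i, opt in enumerate(options):
        for name in names:
            if opt.startswith(name):
                last[name] = i
    return [opt for i, opt in enumerate(options)
            if not any(opt.startswith(n) and last[n] != i for n in names)]


cmdline = ["-march=rv32imafd", "hello.c", "-march=rv64gc"]
print(emulation(cmdline))                       # both prefixes match
print(emulation(prune_self_negating(cmdline)))  # only the last -march= is left
```

With both -march= values visible the model produces the bogus name from the
error above; once the earlier occurrence is dropped, a single XLEN remains.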

On Fri, Apr 30, 2021 at 11:42 AM Jim Wilson  wrote:
>
> On Wed, Apr 28, 2021 at 1:30 AM Geng Qi via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>
> > gcc/ChangeLog:
> > * config/riscv/riscv.opt (march=,mabi=): Negative itself.
> >
>
> Thanks.  I committed this.
>
> Do we need to backport to release branches?  This looks like an uncommon
> problem, or we would have noticed this before.
>
> Jim


[PATCH 7/7] [og10] WIP GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION changes

2021-05-11 Thread Julian Brown
This work-in-progress patch tries to get
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION to behave more like
GOMP_MAP_ATTACH_DETACH -- in that the mapping is made to form groups
to be processed by build_struct_group/build_struct_comp_map.  I think
that's important to integrate with how groups of mappings for array
sections are handled in other cases.

This patch isn't sufficient by itself to fix a couple of broken test cases
at present (libgomp.c++/target-lambda-1.C, libgomp.c++/target-this-4.C),
though.

2021-05-11  Julian Brown  

gcc/
* gimplify.c (build_struct_comp_nodes): Add
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION handling.
(build_struct_group): Process GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION
as part of pointer group.
(gimplify_scan_omp_clauses): Update prev_list_p such that
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION will form part of pointer
group.
---
 gcc/gimplify.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 6d204908c82..c5cb486aa23 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8298,7 +8298,9 @@ build_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end,
   if (grp_mid
   && OMP_CLAUSE_CODE (grp_mid) == OMP_CLAUSE_MAP
   && (OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ALWAYS_POINTER
- || OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ATTACH_DETACH))
+ || OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ATTACH_DETACH
+ || (OMP_CLAUSE_MAP_KIND (grp_mid)
+ == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)))
 {
   tree c3
= build_omp_clause (OMP_CLAUSE_LOCATION (grp_end), OMP_CLAUSE_MAP);
@@ -8774,12 +8776,14 @@ build_struct_group (struct gimplify_omp_ctx *ctx,
? splay_tree_lookup (ctx->variables, (splay_tree_key) decl)
: NULL);
   bool ptr = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALWAYS_POINTER);
-  bool attach_detach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH);
+  bool attach_detach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH
+   || (OMP_CLAUSE_MAP_KIND (c)
+   == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION));
   bool attach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH
 || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_DETACH);
   bool has_attachments = false;
   /* For OpenACC, pointers in structs should trigger an attach action.  */
-  if (attach_detach
+  if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH
   && ((region_type & (ORT_ACC | ORT_TARGET | ORT_TARGET_DATA))
  || code == OMP_TARGET_ENTER_DATA
  || code == OMP_TARGET_EXIT_DATA))
@@ -9784,6 +9788,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  if (!remove
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ALWAYS_POINTER
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ATTACH_DETACH
+ && (OMP_CLAUSE_MAP_KIND (c)
+ != GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_TO_PSET
  && OMP_CLAUSE_CHAIN (c)
  && OMP_CLAUSE_CODE (OMP_CLAUSE_CHAIN (c)) == OMP_CLAUSE_MAP
@@ -9792,7 +9798,9 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c))
  == GOMP_MAP_ATTACH_DETACH)
  || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c))
- == GOMP_MAP_TO_PSET)))
+ == GOMP_MAP_TO_PSET)
+ || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c))
+ == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)))
prev_list_p = list_p;
 
  break;
-- 
2.29.2



[PATCH 6/7] [og10] Rework indirect struct handling for OpenACC/OpenMP in gimplify.c

2021-05-11 Thread Julian Brown
This patch reworks indirect struct handling in gimplify.c (i.e. for struct
components mapped with "mystruct->a[0:n]", "mystruct->b", etc.), for
both OpenACC and OpenMP.  The key observation leading to these changes
was that component mapping of references-to-structures is already
implemented and working, and indirect struct component handling via a
pointer can work quite similarly.  That lets us remove some earlier,
special-case handling for mapping indirect struct component accesses
for OpenACC, which required the pointed-to struct to be manually mapped
before the indirect component mapping.

With this patch, you can map struct components directly (e.g. an array
slice "mystruct->a[0:n]") just like you can map a non-indirect struct
component slice ("mystruct.a[0:n]"). Both references-to-pointers (with
the former syntax) and references to structs (with the latter syntax)
work now.

For Fortran class pointers, we no longer re-use GOMP_MAP_TO_PSET for the
class metadata (the structure that points to the class data and vptr)
-- it is instead treated as any other struct.

For C++, the struct handling also works for class members ("this->foo"),
without having to explicitly map "this[:1]" first.

For OpenACC, we permit chained indirect component references
("mystruct->a->b[0:n]"), though only the last part of such mappings will
trigger an attach/detach operation.  To properly use such a construct
on the target, you must still manually map "mystruct->a[:1]" first --
but there's no need to map "mystruct[:1]" explicitly before that.

2021-05-11  Julian Brown  

gcc/fortran/
* trans-openmp.c (gfc_trans_omp_clauses): Don't create GOMP_MAP_TO_PSET
mappings for class metadata, nor GOMP_MAP_POINTER mappings for
POINTER_TYPE_P decls.

gcc/
* gimplify.c (extract_base_bit_offset): Add BASE_IND parameter.  Handle
pointer-typed indirect references alongside reference-typed ones.
(strip_components, strip_components_and_deref, aggregate_base_p): New
functions.
(build_struct_group): Remove PD parameter.  Add pointer type indirect
ref handling, including chained references.  Handle pointers and
references to structs in OpenACC regions as well as OpenMP ones.
Remove gimplification of non-pointer struct component mappings.
(gimplify_scan_omp_clauses): Remove struct_deref_set handling.  Rework
pointer-type indirect structure access handling to work more like
the reference-typed handling.
* omp-low.c (scan_sharing_clauses): Handle pointer-type indirect struct
references, and references to pointers to structs also.

gcc/testsuite/
* g++.dg/goacc/member-array-acc.C: New test.
* g++.dg/gomp/member-array-omp.C: New test.
* g++.dg/gomp/target-3.C: Adjust scan dump matching patterns.
* g++.dg/gomp/target-this-2.C: Adjust scan dump matching patterns.
* gcc.dg/gomp/target-3.c: Remove XFAIL.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/deep-copy-15.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-16.c: New test.
---
 gcc/fortran/trans-openmp.c|  20 +-
 gcc/gimplify.c| 321 +++---
 gcc/omp-low.c |  16 +-
 gcc/testsuite/g++.dg/goacc/member-array-acc.C |  13 +
 gcc/testsuite/g++.dg/gomp/member-array-omp.C  |  13 +
 gcc/testsuite/g++.dg/gomp/target-3.C  |   4 +-
 gcc/testsuite/g++.dg/gomp/target-this-2.C |   2 +-
 gcc/testsuite/gcc.dg/gomp/target-3.c  |   2 +-
 .../libgomp.oacc-c-c++-common/deep-copy-15.c  |  68 
 .../libgomp.oacc-c-c++-common/deep-copy-16.c  |  95 ++
 10 files changed, 405 insertions(+), 149 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/goacc/member-array-acc.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/member-array-omp.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-15.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-16.c

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index e3df4bbf84e..9098b35c9f1 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -2693,30 +2693,16 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
  tree present = gfc_omp_check_optional_argument (decl, true);
  if (openacc && n->sym->ts.type == BT_CLASS)
{
- tree type = TREE_TYPE (decl);
  if (n->sym->attr.optional)
sorry ("optional class parameter");
- if (POINTER_TYPE_P (type))
-   {
- node4 = build_omp_clause (input_location,
-   OMP_CLAUSE_MAP);
- OMP_CLAUSE_SET_MAP_KIND (node4, GOMP_MAP_POINTER);
- OMP_CLAUSE_DECL (node4) = decl;
-  

[PATCH 5/7] [og10] Rewrite GOMP_MAP_ATTACH_DETACH mappings for OpenMP also

2021-05-11 Thread Julian Brown
It never makes sense for a GOMP_MAP_ATTACH_DETACH mapping to survive
beyond gimplify.c, and with OpenMP making use of that mapping type too
now alongside OpenACC, there are cases where it was making it through
to omp-low.c.  This patch rewrites such mappings to GOMP_MAP_ATTACH or
GOMP_MAP_DETACH unconditionally for both OpenACC and OpenMP, in cases
where it hasn't otherwise been handled already in the preceding code.
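
As a rough illustration of the rewrite described above, the following
Python sketch models the decision with plain strings instead of GCC tree
codes.  The function name is hypothetical; the set of "exit data" codes is
the one named in this patch.

```python
# Directive codes for which the generic mapping becomes a detach.
EXIT_DATA_CODES = {"OACC_EXIT_DATA", "OMP_TARGET_EXIT_DATA"}


def rewrite_attach_detach(code, map_kind):
    """Return the map kind that should leave gimplification.

    Any GOMP_MAP_ATTACH_DETACH still present is turned into a concrete
    attach or detach, for OpenACC and OpenMP directives alike; other map
    kinds pass through unchanged.
    """
    if map_kind != "GOMP_MAP_ATTACH_DETACH":
        return map_kind
    if code in EXIT_DATA_CODES:
        return "GOMP_MAP_DETACH"
    return "GOMP_MAP_ATTACH"


for directive in ("OACC_PARALLEL", "OMP_TARGET", "OMP_TARGET_EXIT_DATA"):
    print(directive, rewrite_attach_detach(directive, "GOMP_MAP_ATTACH_DETACH"))
```

The point of the sketch is only that the directive code no longer gates
whether the rewrite happens, just which of the two results is chosen.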

2021-05-11  Julian Brown  

gcc/
* gimplify.c (gimplify_scan_omp_clauses): Remove OpenACC-only
condition for changing GOMP_MAP_ATTACH_DETACH to GOMP_MAP_ATTACH or
GOMP_MAP_DETACH.  Use detach for OMP_TARGET_EXIT_DATA also.
---
 gcc/gimplify.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index c2072c7188f..86000f8470b 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -9695,16 +9695,11 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  if (cont)
continue;
}
- else if ((code == OACC_ENTER_DATA
-   || code == OACC_EXIT_DATA
-   || code == OACC_DATA
-   || code == OACC_PARALLEL
-   || code == OACC_KERNELS
-   || code == OACC_SERIAL)
-  && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH)
+ else if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH)
{
- gomp_map_kind k = (code == OACC_EXIT_DATA
-? GOMP_MAP_DETACH : GOMP_MAP_ATTACH);
+ gomp_map_kind k
+   = ((code == OACC_EXIT_DATA || code == OMP_TARGET_EXIT_DATA)
+  ? GOMP_MAP_DETACH : GOMP_MAP_ATTACH);
  OMP_CLAUSE_SET_MAP_KIND (c, k);
}
 
-- 
2.29.2



[PATCH 4/7] [og10] Revert gimplify.c parts of "Arrow operator handling for C front-end in OpenMP map clauses"

2021-05-11 Thread Julian Brown
With the "rework indirect struct handling" patch later in this series,
some parts of this earlier patch (by Chung-Lin) become unnecessary.
This patch reverts those bits.

An XFAIL has been added for a test that fails for the time being with
this reversion, until the later patch in the series fixes it again.

2021-05-11  Julian Brown  

gcc/
* gimplify.c (build_struct_group): Remove COMPONENT_REF_P parameter.
Don't call gimplify_expr on decl in non-reference case.  Remove code to
add FIRSTPRIVATE_POINTER for *pointer-to-struct expressions.
(gimplify_scan_omp_clauses): Remove COMPONENT_REF_P handling.

gcc/testsuite/
* gcc.dg/gomp/target-3.c: XFAIL test.
---
 gcc/gimplify.c   | 41 
 gcc/testsuite/gcc.dg/gomp/target-3.c |  2 +-
 2 files changed, 6 insertions(+), 37 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 0674d882861..c2072c7188f 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8686,8 +8686,7 @@ move_concat_nodes_after (tree first_new, tree *last_new_tail, tree *first_ptr,
 static tree
 build_struct_group (struct gimplify_omp_ctx *ctx,
enum omp_region_type region_type, enum tree_code code,
-   tree decl, tree *pd, bool component_ref_p,
-   unsigned int *flags, tree c,
+   tree decl, tree *pd, unsigned int *flags, tree c,
hash_map *_map_to_clause,
tree *_list_p, tree *_p, gimple_seq *pre_p,
bool *cont)
@@ -8737,13 +8736,7 @@ build_struct_group (struct gimplify_omp_ctx *ctx,
   if (base_ref)
OMP_CLAUSE_DECL (l) = unshare_expr (base_ref);
   else
-   {
- OMP_CLAUSE_DECL (l) = unshare_expr (decl);
- if (!DECL_P (OMP_CLAUSE_DECL (l))
- && (gimplify_expr (_CLAUSE_DECL (l), pre_p, NULL,
-is_gimple_lvalue, fb_lvalue) == GS_ERROR))
-   return error_mark_node;
-   }
+   OMP_CLAUSE_DECL (l) = decl;
   OMP_CLAUSE_SIZE (l)
= (!attach ? size_int (1)
   : (DECL_P (OMP_CLAUSE_DECL (l))
@@ -8785,27 +8778,6 @@ build_struct_group (struct gimplify_omp_ctx *ctx,
*flags |= GOVD_SEEN;
   if (has_attachments)
*flags |= GOVD_MAP_HAS_ATTACHMENTS;
-
-  /* If this is a *pointer-to-struct expression, make sure a
-firstprivate map of the base-pointer exists.  */
-  if (component_ref_p
- && ((TREE_CODE (decl) == MEM_REF
-  && integer_zerop (TREE_OPERAND (decl, 1)))
- || INDIRECT_REF_P (decl))
- && DECL_P (TREE_OPERAND (decl, 0))
- && !splay_tree_lookup (ctx->variables,
-((splay_tree_key) TREE_OPERAND (decl, 0
-   {
- decl = TREE_OPERAND (decl, 0);
- tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
- enum gomp_map_kind mkind = GOMP_MAP_FIRSTPRIVATE_POINTER;
- OMP_CLAUSE_SET_MAP_KIND (c2, mkind);
- OMP_CLAUSE_DECL (c2) = decl;
- OMP_CLAUSE_SIZE (c2) = size_zero_node;
- OMP_CLAUSE_CHAIN (c2) = OMP_CLAUSE_CHAIN (c);
- OMP_CLAUSE_CHAIN (c) = c2;
-   }
-
   return decl;
 }
   else if (struct_map_to_clause)
@@ -9660,9 +9632,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH)
OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_ALWAYS_POINTER);
  if ((DECL_P (decl)
-  || (component_ref_p
-  && (INDIRECT_REF_P (decl)
-  || TREE_CODE (decl) == MEM_REF)))
+  || (component_ref_p && INDIRECT_REF_P (decl)))
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_TO_PSET
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ATTACH
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_DETACH
@@ -9710,9 +9680,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  bool cont = false;
  tree add_decl
= build_struct_group (ctx, region_type, code, decl, pd,
- component_ref_p, , c,
- struct_map_to_clause, prev_list_p,
- list_p, pre_p, );
+ , c, struct_map_to_clause,
+ prev_list_p, list_p, pre_p, );
  if (add_decl == error_mark_node)
{
  remove = true;
diff --git a/gcc/testsuite/gcc.dg/gomp/target-3.c b/gcc/testsuite/gcc.dg/gomp/target-3.c
index 3e7921270c9..08e42eeb304 100644
--- a/gcc/testsuite/gcc.dg/gomp/target-3.c
+++ b/gcc/testsuite/gcc.dg/gomp/target-3.c
@@ -13,4 +13,4 @@ void foo (struct S *s)
   #pragma omp target enter data map (alloc: s->a, s->b)
 }
 
-/* { dg-final { 

[PATCH 3/7] [og10] Revert gimplify.c parts of "Fix template case of non-static member access inside member functions"

2021-05-11 Thread Julian Brown
With the "rework indirect struct handling" patch later in this series,
some parts of this earlier patch (by Chung-Lin) become unnecessary.
This patch reverts those bits.

2021-05-11  Julian Brown  

gcc/
* gimplify.c (gimplify_scan_omp_clauses): Don't strip nops in indir_p
case. Don't add map(*pointer_to_struct) mappings to struct_deref_set.
---
 gcc/gimplify.c | 19 ---
 1 file changed, 19 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index ad192b27208..0674d882861 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -9574,7 +9574,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
{
  indir_p = true;
  decl = TREE_OPERAND (decl, 0);
- STRIP_NOPS (decl);
}
  if (TREE_CODE (decl) == INDIRECT_REF
  && DECL_P (TREE_OPERAND (decl, 0))
@@ -9747,24 +9746,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  break;
}
 
- /* If this was of the form map(*pointer_to_struct), then the
-'pointer_to_struct' DECL should be considered deref'ed.  */
- if ((OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALLOC
-  || GOMP_MAP_COPY_TO_P (OMP_CLAUSE_MAP_KIND (c))
-  || GOMP_MAP_COPY_FROM_P (OMP_CLAUSE_MAP_KIND (c)))
- && INDIRECT_REF_P (orig_decl)
- && DECL_P (TREE_OPERAND (orig_decl, 0))
- && TREE_CODE (TREE_TYPE (orig_decl)) == RECORD_TYPE)
-   {
- tree ptr = TREE_OPERAND (orig_decl, 0);
- if (!struct_deref_set || !struct_deref_set->contains (ptr))
-   {
- if (!struct_deref_set)
-   struct_deref_set = new hash_set ();
- struct_deref_set->add (ptr);
-   }
-   }
-
  if (!remove
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ALWAYS_POINTER
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ATTACH_DETACH
-- 
2.29.2



[PATCH 2/7] [og10] Refactor struct lowering for OpenMP/OpenACC in gimplify.c

2021-05-11 Thread Julian Brown
This patch is a second attempt at refactoring struct component mapping
handling for OpenACC/OpenMP during gimplification, after the patch I
posted here:

  https://gcc.gnu.org/pipermail/gcc-patches/2018-November/510503.html

And improved here, post-review:

  https://gcc.gnu.org/pipermail/gcc-patches/2019-November/533394.html

This patch goes further, in that the struct-handling code is outlined
into its own function (to create the "GOMP_MAP_STRUCT" node and the
sorted list of nodes immediately following it, from a set of mappings
of components of a given struct or derived type). I've also gone through
the list-handling code and attempted to add comments documenting how it
works to the best of my understanding, and broken out a couple of helper
functions in order to (hopefully) have the code self-document better also.

2021-05-11  Julian Brown  

gcc/
* gimplify.c (insert_struct_comp_map): Refactor function into...
(build_struct_comp_nodes): This new function.  Remove list handling
and improve self-documentation.
(insert_node_after, move_node_after, move_nodes_after,
move_concat_nodes_after): New helper functions.
(build_struct_group): New function to build up GOMP_MAP_STRUCT node
groups to map struct components. Outlined from...
(gimplify_scan_omp_clauses): Here.
---
 gcc/gimplify.c | 846 +++--
 1 file changed, 540 insertions(+), 306 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index b36b961bf26..ad192b27208 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8253,73 +8253,66 @@ gimplify_omp_depend (tree *list_p, gimple_seq *pre_p)
   return 1;
 }
 
-/* Insert a GOMP_MAP_ALLOC or GOMP_MAP_RELEASE node following a
-   GOMP_MAP_STRUCT mapping.  C is an always_pointer mapping.  STRUCT_NODE is
-   the struct node to insert the new mapping after (when the struct node is
-   initially created).  PREV_NODE is the first of two or three mappings for a
-   pointer, and is either:
- - the node before C, when a pair of mappings is used, e.g. for a C/C++
-   array section.
- - not the node before C.  This is true when we have a reference-to-pointer
-   type (with a mapping for the reference and for the pointer), or for
-   Fortran derived-type mappings with a GOMP_MAP_TO_PSET.
-   If SCP is non-null, the new node is inserted before *SCP.
-   if SCP is null, the new node is inserted before PREV_NODE.
-   The return type is:
- - PREV_NODE, if SCP is non-null.
- - The newly-created ALLOC or RELEASE node, if SCP is null.
- - The second newly-created ALLOC or RELEASE node, if we are mapping a
-   reference to a pointer.  */
+/* For a set of mappings describing an array section pointed to by a struct
+   (or derived type, etc.) component, create an "alloc" or "release" node to
+   insert into a list following a GOMP_MAP_STRUCT node.  For some types of
+   mapping (e.g. Fortran arrays with descriptors), an additional mapping may
+   be created that is inserted into the list of mapping nodes attached to the
+   directive being processed -- not part of the sorted list of nodes after
+   GOMP_MAP_STRUCT.
+
+   CODE is the code of the directive being processed.  GRP_START and GRP_END
+   are the first and last of two or three nodes representing this array section
+   mapping (e.g. a data movement node like GOMP_MAP_{TO,FROM}, optionally a
+   GOMP_MAP_TO_PSET, and finally a GOMP_MAP_ALWAYS_POINTER).  EXTRA_NODE is
+   filled with the additional node described above, if needed.
+
+   This function does not add the new nodes to any lists itself.  It is the
+   responsibility of the caller to do that.  */
 
 static tree
-insert_struct_comp_map (enum tree_code code, tree c, tree struct_node,
-   tree prev_node, tree *scp)
+build_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end,
+tree *extra_node)
 {
   enum gomp_map_kind mkind
 = (code == OMP_TARGET_EXIT_DATA || code == OACC_EXIT_DATA)
   ? GOMP_MAP_RELEASE : GOMP_MAP_ALLOC;
 
-  tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
-  tree cl = scp ? prev_node : c2;
+  gcc_assert (grp_start != grp_end);
+
+  tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (grp_end), OMP_CLAUSE_MAP);
   OMP_CLAUSE_SET_MAP_KIND (c2, mkind);
-  OMP_CLAUSE_DECL (c2) = unshare_expr (OMP_CLAUSE_DECL (c));
-  OMP_CLAUSE_CHAIN (c2) = scp ? *scp : prev_node;
-  if (OMP_CLAUSE_CHAIN (prev_node) != c
-  && OMP_CLAUSE_CODE (OMP_CLAUSE_CHAIN (prev_node)) == OMP_CLAUSE_MAP
-  && (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (prev_node))
- == GOMP_MAP_TO_PSET))
-OMP_CLAUSE_SIZE (c2) = OMP_CLAUSE_SIZE (OMP_CLAUSE_CHAIN (prev_node));
+  OMP_CLAUSE_DECL (c2) = unshare_expr (OMP_CLAUSE_DECL (grp_end));
+  OMP_CLAUSE_CHAIN (c2) = NULL_TREE;
+  tree grp_mid = NULL_TREE;
+  if (OMP_CLAUSE_CHAIN (grp_start) != grp_end)
+grp_mid = OMP_CLAUSE_CHAIN (grp_start);
+

[PATCH 1/7] [og10] Unify ARRAY_REF/INDIRECT_REF stripping code in extract_base_bit_offset

2021-05-11 Thread Julian Brown
For historical reasons, it seems that extract_base_bit_offset
unnecessarily used two different ways to strip ARRAY_REF/INDIRECT_REF
nodes from component accesses. I verified that the two ways of performing
the operation gave the same results across the whole testsuite (and
several additional benchmarks).

The code was like this since an earlier "mechanical" refactoring by me,
first posted here:

  https://gcc.gnu.org/pipermail/gcc-patches/2018-November/510503.html

It was never clear to me if there was an important semantic
difference between the two ways of stripping the base before calling
get_inner_reference, but it appears that there is not, so one can go away.
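
For readers who want the unified stripping logic at a glance, here is a
small Python model of it.  It mirrors the shape of the code kept by the
patch below, using a hypothetical Node class with string tags in place of
GCC trees; it is an illustration only and returns None where the C code
returns NULL_TREE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    code: str                      # stands in for TREE_CODE (node)
    type_code: str = ""            # stands in for TREE_CODE (TREE_TYPE (node))
    op0: Optional["Node"] = None   # stands in for TREE_OPERAND (node, 0)


def strip_base(base):
    """Strip ARRAY_REF/INDIRECT_REF wrappers the same way on every call."""
    if base.code == "ARRAY_REF":
        # Peel all array indexing; what remains must be an array-typed member.
        while base.code == "ARRAY_REF":
            base = base.op0
        if base.code != "COMPONENT_REF" or base.type_code != "ARRAY_TYPE":
            return None
    elif (base.code == "INDIRECT_REF"
          and base.op0 is not None
          and base.op0.code == "COMPONENT_REF"
          and base.op0.type_code == "REFERENCE_TYPE"):
        # Dereference of a reference-typed member: look through it.
        base = base.op0
    return base


member = Node("COMPONENT_REF", "ARRAY_TYPE")
print(strip_base(Node("ARRAY_REF", op0=Node("ARRAY_REF", op0=member))) is member)
```

Anything that is neither an ARRAY_REF nor a dereferenced reference-typed
member is passed through untouched in this model.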

2021-05-11  Julian Brown  

gcc/
* gimplify.c (extract_base_bit_offset): Unify ARRAY_REF/INDIRECT_REF
stripping code in first call/subsequent call cases.
---
 gcc/gimplify.c | 32 +++-
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index ba071e8ae68..b36b961bf26 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8340,31 +8340,21 @@ extract_base_bit_offset (tree base, tree *base_ref, poly_int64 *bitposp,
   poly_offset_int poffset;
 
   if (base_ref)
-{
-  *base_ref = NULL_TREE;
-
-  while (TREE_CODE (base) == ARRAY_REF)
-   base = TREE_OPERAND (base, 0);
+*base_ref = NULL_TREE;
 
-  if (TREE_CODE (base) == INDIRECT_REF)
-   base = TREE_OPERAND (base, 0);
-}
-  else
+  if (TREE_CODE (base) == ARRAY_REF)
 {
-  if (TREE_CODE (base) == ARRAY_REF)
-   {
- while (TREE_CODE (base) == ARRAY_REF)
-   base = TREE_OPERAND (base, 0);
- if (TREE_CODE (base) != COMPONENT_REF
- || TREE_CODE (TREE_TYPE (base)) != ARRAY_TYPE)
-   return NULL_TREE;
-   }
-  else if (TREE_CODE (base) == INDIRECT_REF
-  && TREE_CODE (TREE_OPERAND (base, 0)) == COMPONENT_REF
-  && (TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0)))
-  == REFERENCE_TYPE))
+  while (TREE_CODE (base) == ARRAY_REF)
base = TREE_OPERAND (base, 0);
+  if (TREE_CODE (base) != COMPONENT_REF
+ || TREE_CODE (TREE_TYPE (base)) != ARRAY_TYPE)
+   return NULL_TREE;
 }
+  else if (TREE_CODE (base) == INDIRECT_REF
+  && TREE_CODE (TREE_OPERAND (base, 0)) == COMPONENT_REF
+  && (TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0)))
+  == REFERENCE_TYPE))
+base = TREE_OPERAND (base, 0);
 
   base = get_inner_reference (base, , , , ,
  , , );
-- 
2.29.2



[PATCH 0/7] [og10] OpenACC/OpenMP: Rework struct component handling

2021-05-11 Thread Julian Brown
Hi,

This patch series reworks how indirect struct (or class or derived-type)
component mappings are lowered for OpenMP and OpenACC.  The series is
currently based on the og10 branch, but I'm planning to rebase it to
trunk and repost soon.

Currently on og10, there are some conflicts with Chung-Lin's recent
work to support indirect struct component mapping and (particularly) C++
lambdas for OpenMP -- some of those conflicts are addressed by this
patch series, but some still need attention.  There are a couple of
regressions (see the last patch in the series) so the whole series isn't
quite ready for og10 yet.  Hopefully posting now will help us resolve
those last bits!

The first two patches in the series have been tested by themselves
however, and those ones do *not* cause regressions. If it seems helpful,
I can commit those now.

Julian

Julian Brown (7):
  [og10] Unify ARRAY_REF/INDIRECT_REF stripping code in
extract_base_bit_offset
  [og10] Refactor struct lowering for OpenMP/OpenACC in gimplify.c
  [og10] Revert gimplify.c parts of "Fix template case of non-static
member access inside member functions"
  [og10] Revert gimplify.c parts of "Arrow operator handling for C
front-end in OpenMP map clauses"
  [og10] Rewrite GOMP_MAP_ATTACH_DETACH mappings for OpenMP also
  [og10] Rework indirect struct handling for OpenACC/OpenMP in
gimplify.c
  [og10] WIP GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION changes

 gcc/fortran/trans-openmp.c|   20 +-
 gcc/gimplify.c| 1178 ++---
 gcc/omp-low.c |   16 +-
 gcc/testsuite/g++.dg/goacc/member-array-acc.C |   13 +
 gcc/testsuite/g++.dg/gomp/member-array-omp.C  |   13 +
 gcc/testsuite/g++.dg/gomp/target-3.C  |4 +-
 gcc/testsuite/g++.dg/gomp/target-this-2.C |2 +-
 .../libgomp.oacc-c-c++-common/deep-copy-15.c  |   68 +
 .../libgomp.oacc-c-c++-common/deep-copy-16.c  |   95 ++
 9 files changed, 921 insertions(+), 488 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/goacc/member-array-acc.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/member-array-omp.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-15.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-16.c

-- 
2.29.2



[PATCH] gcc-changelog: Remove non-strict mode.

2021-05-11 Thread Martin Liška

Hello.

I'm going to push a commit that removes non-strict mode. It's useless right now.

Martin
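
As a sketch of what always-strict checking means, the Python snippet below
models the rule that stays in force once the non-strict switch is gone: a
commit may touch only automatically updated files, or none of them, but
not a mix.  The helper names and the exact file list are assumptions based
on the error text in the patch, not the real git_commit.py code.

```python
import os

# Assumed set of automatically updated files (taken from the error message).
MISC_FILES = {"DATESTAMP", "BASE-VER", "DEV-PHASE"}


def is_auto_updated(path):
    """True for ChangeLog files and the other automatically updated files."""
    return os.path.basename(path) in MISC_FILES | {"ChangeLog"}


def check_commit(modified_files):
    """Return an error string for a mixed commit, otherwise None."""
    auto = [f for f in modified_files if is_auto_updated(f)]
    if len(auto) == len(modified_files):
        return None  # only automatically updated files were touched
    if auto:
        return ("ChangeLog, DATESTAMP, BASE-VER and DEV-PHASE updates "
                "should be done separately from normal commits")
    return None


print(check_commit(["gcc/gimplify.c", "gcc/ChangeLog"]))
```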

contrib/ChangeLog:

* gcc-changelog/git_check_commit.py: Remove --non-strict-mode.
* gcc-changelog/git_commit.py: Remove strict mode.
* gcc-changelog/git_email.py: Likewise.
* gcc-changelog/git_repository.py: Likewise.
* gcc-changelog/test_email.py: Likewise.
* gcc-changelog/test_patches.txt: Update patches so that they
don't contain ChangeLog file changes.
---
 contrib/gcc-changelog/git_check_commit.py |   6 +-
 contrib/gcc-changelog/git_commit.py   |   4 +-
 contrib/gcc-changelog/git_email.py|   4 +-
 contrib/gcc-changelog/git_repository.py   |   4 +-
 contrib/gcc-changelog/test_email.py   |  19 +-
 contrib/gcc-changelog/test_patches.txt| 401 +++---
 6 files changed, 132 insertions(+), 306 deletions(-)

diff --git a/contrib/gcc-changelog/git_check_commit.py b/contrib/gcc-changelog/git_check_commit.py
index 246e9735c1d..9a4c5d448fb 100755
--- a/contrib/gcc-changelog/git_check_commit.py
+++ b/contrib/gcc-changelog/git_check_commit.py
@@ -29,14 +29,10 @@ parser.add_argument('-g', '--git-path', default='.',
 help='Path to git repository')
 parser.add_argument('-p', '--print-changelog', action='store_true',
 help='Print final changelog entires')
-parser.add_argument('-n', '--non-strict-mode', action='store_true',
-help='Use non-strict mode (allow changes in ChangeLog and '
-'other automatically updated files).')
 args = parser.parse_args()
 
 retval = 0
-for git_commit in parse_git_revisions(args.git_path, args.revisions,
-  not args.non_strict_mode):
+for git_commit in parse_git_revisions(args.git_path, args.revisions):
 res = 'OK' if git_commit.success else 'FAILED'
 print('Checking %s: %s' % (git_commit.original_info.hexsha, res))
 if git_commit.success:
diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index 4a3f96997c5..c70279e2504 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -288,7 +288,7 @@ class GitInfo:
 
 
 class GitCommit:
-def __init__(self, info, strict=True, commit_to_info_hook=None, ref_name=None):
+def __init__(self, info, commit_to_info_hook=None, ref_name=None):
 self.original_info = info
 self.info = info
 self.message = None
@@ -325,7 +325,7 @@ class GitCommit:
 if len(project_files) == len(self.info.modified_files):
 # All modified files are only MISC files
 return
-elif project_files and strict:
+elif project_files:
 self.errors.append(Error('ChangeLog, DATESTAMP, BASE-VER and '
  'DEV-PHASE updates should be done '
  'separately from normal commits'))
diff --git a/contrib/gcc-changelog/git_email.py b/contrib/gcc-changelog/git_email.py
index 65ccb8cea99..6933e3d71fe 100755
--- a/contrib/gcc-changelog/git_email.py
+++ b/contrib/gcc-changelog/git_email.py
@@ -32,7 +32,7 @@ unidiff_supports_renaming = hasattr(PatchedFile(), 'is_rename')
 
 
 class GitEmail(GitCommit):
-def __init__(self, filename, strict=False):
+def __init__(self, filename):
 self.filename = filename
 diff = PatchSet.from_filename(filename)
 date = None
@@ -68,7 +68,7 @@ class GitEmail(GitCommit):
 t = 'M'
 modified_files.append((target if t != 'D' else source, t))
 git_info = GitInfo(None, date, author, body, modified_files)
-super().__init__(git_info, strict=strict,
+super().__init__(git_info,
  commit_to_info_hook=lambda x: None)
 
 
diff --git a/contrib/gcc-changelog/git_repository.py b/contrib/gcc-changelog/git_repository.py
index 501c0d931f5..2d688826ff8 100755
--- a/contrib/gcc-changelog/git_repository.py
+++ b/contrib/gcc-changelog/git_repository.py
@@ -29,7 +29,7 @@ except ImportError:
 from git_commit import GitCommit, GitInfo, decode_path
 
 
-def parse_git_revisions(repo_path, revisions, strict=True, ref_name=None):
+def parse_git_revisions(repo_path, revisions, ref_name=None):
 repo = Repo(repo_path)
 
 def commit_to_info(commit):
@@ -72,7 +72,7 @@ def parse_git_revisions(repo_path, revisions, strict=True, ref_name=None):
 commits = [repo.commit(revisions)]
 
 for commit in commits:
-git_commit = GitCommit(commit_to_info(commit.hexsha), strict=strict,
+git_commit = GitCommit(commit_to_info(commit.hexsha),
commit_to_info_hook=commit_to_info,
ref_name=ref_name)
 parsed_commits.append(git_commit)
diff --git a/contrib/gcc-changelog/test_email.py b/contrib/gcc-changelog/test_email.py
index d66bf5be4eb..7472762e66d 100755
--- 

Re: [PATCH] match.pd: Optimize (x & y) == x into (x & ~y) == 0 [PR94589]

2021-05-11 Thread Richard Biener
On Tue, 11 May 2021, Jakub Jelinek wrote:

> On Thu, May 06, 2021 at 09:42:41PM +0200, Marc Glisse wrote:
> > We can probably do it in 2 steps, first something like
> > 
> > (for cmp (eq ne)
> >  (simplify
> >   (cmp (bit_and:c @0 @1) @0)
> >   (cmp (@0 (bit_not! @1)) { build_zero_cst (TREE_TYPE (@0)); })))
> > 
> > to get rid of the double use, and then simplify X==0 to X<=~C if C is a
> > mask 111...000 (I thought we already had a function to detect such masks, or
> > the 000...111, but I can't find them anymore).
> 
> Ok, here is the first step then.
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Or should it have cmp:c too given that == and != are commutative too?

I think so.  OK with that change.

Richard.

> 2021-05-11  Jakub Jelinek  
>   Marc Glisse  
> 
>   PR tree-optimization/94589
>   * match.pd ((X & Y) == X -> (X & ~Y) == 0,
>   (X | Y) == Y -> (X & ~Y) == 0): New GIMPLE simplifications.
> 
>   * gcc.dg/tree-ssa/pr94589-1.c: New test.
> 
> --- gcc/match.pd.jj   2021-04-27 14:46:56.583716888 +0200
> +++ gcc/match.pd  2021-05-10 22:31:49.726870421 +0200
> @@ -4764,6 +4764,18 @@ (define_operator_list COND_TERNARY
>(cmp:c (bit_xor:c @0 @1) @0)
>(cmp @1 { build_zero_cst (TREE_TYPE (@1)); }))
>  
> +#if GIMPLE
> + /* (X & Y) == X becomes (X & ~Y) == 0.  */
> + (simplify
> +  (cmp (bit_and:c @0 @1) @0)
> +  (cmp (bit_and @0 (bit_not! @1)) { build_zero_cst (TREE_TYPE (@0)); }))
> +
> + /* (X | Y) == Y becomes (X & ~Y) == 0.  */
> + (simplify
> +  (cmp (bit_ior:c @0 @1) @1)
> +  (cmp (bit_and @0 (bit_not! @1)) { build_zero_cst (TREE_TYPE (@0)); }))
> +#endif
> +
>   /* (X ^ C1) op C2 can be rewritten as X op (C1 ^ C2).  */
>   (simplify
>(cmp (convert?@3 (bit_xor @0 INTEGER_CST@1)) INTEGER_CST@2)
> --- gcc/testsuite/gcc.dg/tree-ssa/pr94589-1.c.jj  2021-05-10 22:36:10.574130179 +0200
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr94589-1.c 2021-05-10 22:36:17.789054362 +0200
> @@ -0,0 +1,21 @@
> +/* PR tree-optimization/94589 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int
> +foo (int x)
> +{
> +  return (x & 23) == x;
> +/* { dg-final { scan-tree-dump " & -24;" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " & 23;" "optimized" } } */
> +/* { dg-final { scan-tree-dump " == 0" "optimized" } } */
> +}
> +
> +int
> +bar (int x)
> +{
> +  return (x | 137) != 137;
> +/* { dg-final { scan-tree-dump " & -138;" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " \\| 137;" "optimized" } } */
> +/* { dg-final { scan-tree-dump " != 0" "optimized" } } */
> +}
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH] match.pd: Optimize (x & y) == x into (x & ~y) == 0 [PR94589]

2021-05-11 Thread Marc Glisse

On Tue, 11 May 2021, Jakub Jelinek via Gcc-patches wrote:


> On Thu, May 06, 2021 at 09:42:41PM +0200, Marc Glisse wrote:
>> We can probably do it in 2 steps, first something like
>>
>> (for cmp (eq ne)
>>  (simplify
>>   (cmp (bit_and:c @0 @1) @0)
>>   (cmp (@0 (bit_not! @1)) { build_zero_cst (TREE_TYPE (@0)); })))
>>
>> to get rid of the double use, and then simplify X==0 to X<=~C if C is a
>> mask 111...000 (I thought we already had a function to detect such masks, or
>> the 000...111, but I can't find them anymore).
>
> Ok, here is the first step then.
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Or should it have cmp:c too given that == and != are commutative too?

Ah, yes, you are right, good point on the cmp:c, thank you.

> 2021-05-11  Jakub Jelinek
>   Marc Glisse
>
>   PR tree-optimization/94589
>   * match.pd ((X & Y) == Y -> (X & ~Y) == 0,
                           ^
X?

>   (X | Y) == Y -> (X & ~Y) == 0): New GIMPLE simplifications.
>
>   * gcc.dg/tree-ssa/pr94589-1.c: New test.
>
> --- gcc/match.pd.jj   2021-04-27 14:46:56.583716888 +0200
> +++ gcc/match.pd  2021-05-10 22:31:49.726870421 +0200
> @@ -4764,6 +4764,18 @@ (define_operator_list COND_TERNARY
>    (cmp:c (bit_xor:c @0 @1) @0)
>    (cmp @1 { build_zero_cst (TREE_TYPE (@1)); }))
>  
> +#if GIMPLE
> + /* (X & Y) == X becomes (X & ~Y) == 0.  */
> + (simplify
> +  (cmp (bit_and:c @0 @1) @0)
> +  (cmp (bit_and @0 (bit_not! @1)) { build_zero_cst (TREE_TYPE (@0)); }))
> +
> + /* (X | Y) == Y becomes (X & ~Y) == 0.  */
> + (simplify
> +  (cmp (bit_ior:c @0 @1) @1)
> +  (cmp (bit_and @0 (bit_not! @1)) { build_zero_cst (TREE_TYPE (@0)); }))
> +#endif
> +
>   /* (X ^ C1) op C2 can be rewritten as X op (C1 ^ C2).  */
>   (simplify
>    (cmp (convert?@3 (bit_xor @0 INTEGER_CST@1)) INTEGER_CST@2)
> --- gcc/testsuite/gcc.dg/tree-ssa/pr94589-1.c.jj  2021-05-10 22:36:10.574130179 +0200
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr94589-1.c 2021-05-10 22:36:17.789054362 +0200
> @@ -0,0 +1,21 @@
> +/* PR tree-optimization/94589 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int
> +foo (int x)
> +{
> +  return (x & 23) == x;
> +/* { dg-final { scan-tree-dump " & -24;" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " & 23;" "optimized" } } */
> +/* { dg-final { scan-tree-dump " == 0" "optimized" } } */
> +}
> +
> +int
> +bar (int x)
> +{
> +  return (x | 137) != 137;
> +/* { dg-final { scan-tree-dump " & -138;" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " \\| 137;" "optimized" } } */
> +/* { dg-final { scan-tree-dump " != 0" "optimized" } } */
> +}
>
>
>   Jakub


--
Marc Glisse


Re: [PATCH 1/2] vect: Add costing_for_scalar parameter to init_cost hook

2021-05-11 Thread Richard Biener via Gcc-patches
On Tue, May 11, 2021 at 9:10 AM Kewen.Lin  wrote:
>
> Hi Richi,
>
> on 2021/5/10 9:55 PM, Richard Biener wrote:
> > On Sat, May 8, 2021 at 10:05 AM Kewen.Lin  wrote:
> >>
> >> Hi Richi,
> >>
> >> Thanks for the comments!
> >>
> >> on 2021/5/7 5:43 PM, Richard Biener wrote:
> >>> On Fri, May 7, 2021 at 5:30 AM Kewen.Lin via Gcc-patches
> >>>  wrote:
> 
> >>>> Hi,
> >>>>
> >>>> When I was investigating density_test heuristics, I noticed that
> >>>> the current rs6000_density_test could be used for single scalar
> >>>> iteration cost calculation, through the call trace:
> >>>>   vect_compute_single_scalar_iteration_cost
> >>>>     -> rs6000_finish_cost
> >>>>       -> rs6000_density_test
> >>>>
> >>>> It looks unexpected as its descriptive function comments and Bill
> >>>> helped to confirm this needs to be fixed (thanks!).
> >>>>
> >>>> So this patch is to check the passed data, if it's the same as
> >>>> the one in loop_vinfo, it indicates it's working on vector version
> >>>> cost calculation, otherwise just early return.
> >>>>
> >>>> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
> >>>>
> >>>> Nothing remarkable was observed with SPEC2017 Power9 full run.
> >>>>
> >>>> Is it ok for trunk?
> >>>
> >>> +  /* Only care about cost of vector version, so exclude scalar
> >>> version here.  */
> >>> +  if (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) != (void *) data)
> >>> +return;
> >>>
> >>> Hmm, looks like a quite "random" test to me.  What about adding a
> >>> parameter to finish_cost () (or init_cost?) indicating the cost kind?
> >>>
> >>
> >> I originally wanted to change the hook interface, but noticed that
> >> the finish_cost in function vect_estimate_min_profitable_iters is
> >> the only invocation with LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
> >> it looks enough to differentiate the scalar version costing or
> >> vector version costing for loop.  Do you mean this observation/
> >> assumption easy to be broken sometime later?
> >
> > Yes, this field is likely to become stale.
> >
> >>
> >> The attached patch to add one new parameter to indicate the
> >> costing kind explicitly as you suggested.
> >>
> >> Does it look better?
> >>
> >> gcc/ChangeLog:
> >>
> >> * doc/tm.texi: Regenerated.
> >> * target.def (init_cost): Add new parameter costing_for_scalar.
> >> * targhooks.c (default_init_cost): Adjust for new parameter.
> >> * targhooks.h (default_init_cost): Likewise.
> >> * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Likewise.
> >> (vect_compute_single_scalar_iteration_cost): Likewise.
> >> (vect_analyze_loop_2): Likewise.
> >> * tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Likewise.
> >> (vect_bb_vectorization_profitable_p): Likewise.
> >> * tree-vectorizer.h (init_cost): Likewise.
> >> * config/aarch64/aarch64.c (aarch64_init_cost): Likewise.
> >> * config/i386/i386.c (ix86_init_cost): Likewise.
> >> * config/rs6000/rs6000.c (rs6000_init_cost): Likewise.
> >>
> >>> OTOH we already pass scalar_stmt to individual add_stmt_cost,
> >>> so not sure whether the context really matters.  That said,
> >>> the density test looks "interesting" ... the intent was that finish_cost
> >>> might look at gathered data from add_stmt, not that it looks at
> >>> the GIMPLE IL ... so why are you not counting vector_stmt vs.
> >>> scalar_stmt entries in vect_body and using that for this metric?
> >>>
> >>
> >> Good to know the intention behind finish_cost, thanks!
> >>
> >> I'm afraid that the check on vector_stmt and scalar_stmt entries
> >> from add_stmt_cost doesn't work for the density test here.  The
> >> density test focuses on the vector version itself, there are some
> >> stmts whose relevants are marked as vect_unused_in_scope, IIUC
> >> they won't be passed down when costing for both versions.  But the
> >> existing density check would like to know the cost for the
> >> non-vectorized part.  The current implementation does:
> >>
> >>  vec_cost = data->cost[vect_body]
> >>
> >>   if (!STMT_VINFO_RELEVANT_P (stmt_info)
> >>   && !STMT_VINFO_IN_PATTERN_P (stmt_info))
> >> not_vec_cost++
> >>
> >>  density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);
> >>
> >> it takes those unrelevant stmts into account, and then has
> >> both costs from the non-vectorized part (not_vec_cost)
> >> and vectorized part (cost[vect_body]), it can calculate the
> >> vectorization code density ratio.
> >
> > Yes, but then what "relevant" stmts are actually needed and what
> > not is missed by your heuristics.  It's really some GIGO one
> > I fear - each vectorized data reference will add a pointer IV
> > (eventually commoned by IVOPTs later) and pointer value updates
> > that are not accounted for in costing (the IVs and updates in the
> > scalar code are marked as not relevant).  Are those the stmts
> > this heuristic wants to look at?
>
> Yes, the IVs and updates (even the 

Re: [PATCH] Bump LTO_major_version to 11.

2021-05-11 Thread Richard Biener via Gcc-patches
On Tue, May 11, 2021 at 8:46 AM Martin Liška  wrote:
>
> On 4/23/21 1:37 PM, Martin Liška wrote:
> > On 4/23/21 12:59 PM, Richard Biener wrote:
> >> True, the question is on how much detail we have to pay attention to.
> >
> > Agree with that.
> >
> >> For us of course the build-id solution works fine.  And hopefully the
> >> days of PCH are counted...
> >
> > Yes.
> >
> > I have a tentative patch that emits the attached checksum.h header file.
> > We also include flags in the checksum:
> >
> > ...
> >   build/genchecksum$(build_exeext) $(C_OBJS) $(BACKEND) $(LIBDEPS) \
> >
> >   checksum-options > cc1-checksum.c.tmp &&   \
> >
> > ...
> >
> > $ cat checksum-options
> >
> > g++ -no-pie   -g   -DIN_GCC -fPIC-fno-exceptions -fno-rtti 
> > -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
> > -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute 
> > -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
> > -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -static-libstdc++ 
> > -static-libgcc
> >
> > Can we ignore them in the checksum calculation?
> > Martin
> >
>
> Richi, what do you think about this part?

We included the linker options in checksumming when changing from checksumming
cc1 to its object files.  I can't find any reasoning other than mimicking what
was there before.  I wonder how the details of the host binary build influence
PCH validity - can you make an experiment and (with fixed PCH checksum) try to
load a PCH file generated with the stage1 compiler with the stage3 compiler?
(thus -O0 vs -O2 or even LTO for LTO bootstrap)?  That is, we're concerned about
layout and semantics of the data structures participating in PCH but since the
layout is exposed no option should change it.

Note this also means that we should be able to share the checksum for all
stages (removing the odd comparison failures on frontend binaries).  Some
configury might change the layout so the stage1 PCH could in theory be
not compatible with the stage2+ one.

I wonder if we can instead upstream the build-id use and conditionalize
the checksum stuff on some configury?  Some people do seem worried
about "weakening" the checksum.

Richard.

> Thanks,
> Martin


Re: [PATCH] testsuite/s390: Fix risbg-ll-3.c f2_cconly test.

2021-05-11 Thread Andreas Krebbel via Gcc-patches
On 5/4/21 5:08 PM, Robin Dapp wrote:
> Hi,
> 
> instead of selecting bits 62 to (wraparound) 59 from r2 and inserting 
> them into r3, we select bits 60 to 62 from r3 and insert them into r2 
> nowadays.  Adjust the test accordingly.
> 
> Is this OK?
> 
> Regards
>   Robin
> 
> gcc/testsuite/ChangeLog:
> 
>  * gcc.target/s390/risbg-ll-3.c: Change match pattern.
> 

Ok. Thanks!

Andreas



[PATCH] match.pd: Optimize (x & y) == x into (x & ~y) == 0 [PR94589]

2021-05-11 Thread Jakub Jelinek via Gcc-patches
On Thu, May 06, 2021 at 09:42:41PM +0200, Marc Glisse wrote:
> We can probably do it in 2 steps, first something like
> 
> (for cmp (eq ne)
>  (simplify
>   (cmp (bit_and:c @0 @1) @0)
>   (cmp (@0 (bit_not! @1)) { build_zero_cst (TREE_TYPE (@0)); })))
> 
> to get rid of the double use, and then simplify X==0 to X<=~C if C is a
> mask 111...000 (I thought we already had a function to detect such masks, or
> the 000...111, but I can't find them anymore).

Ok, here is the first step then.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Or should it have cmp:c too given that == and != are commutative too?

2021-05-11  Jakub Jelinek  
Marc Glisse  

PR tree-optimization/94589
* match.pd ((X & Y) == Y -> (X & ~Y) == 0,
(X | Y) == Y -> (X & ~Y) == 0): New GIMPLE simplifications.

* gcc.dg/tree-ssa/pr94589-1.c: New test.

--- gcc/match.pd.jj	2021-04-27 14:46:56.583716888 +0200
+++ gcc/match.pd	2021-05-10 22:31:49.726870421 +0200
@@ -4764,6 +4764,18 @@ (define_operator_list COND_TERNARY
   (cmp:c (bit_xor:c @0 @1) @0)
   (cmp @1 { build_zero_cst (TREE_TYPE (@1)); }))
 
+#if GIMPLE
+ /* (X & Y) == X becomes (X & ~Y) == 0.  */
+ (simplify
+  (cmp (bit_and:c @0 @1) @0)
+  (cmp (bit_and @0 (bit_not! @1)) { build_zero_cst (TREE_TYPE (@0)); }))
+
+ /* (X | Y) == Y becomes (X & ~Y) == 0.  */
+ (simplify
+  (cmp (bit_ior:c @0 @1) @1)
+  (cmp (bit_and @0 (bit_not! @1)) { build_zero_cst (TREE_TYPE (@0)); }))
+#endif
+
  /* (X ^ C1) op C2 can be rewritten as X op (C1 ^ C2).  */
  (simplify
   (cmp (convert?@3 (bit_xor @0 INTEGER_CST@1)) INTEGER_CST@2)
--- gcc/testsuite/gcc.dg/tree-ssa/pr94589-1.c.jj	2021-05-10 22:36:10.574130179 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr94589-1.c	2021-05-10 22:36:17.789054362 +0200
@@ -0,0 +1,21 @@
+/* PR tree-optimization/94589 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int
+foo (int x)
+{
+  return (x & 23) == x;
+/* { dg-final { scan-tree-dump " & -24;" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " & 23;" "optimized" } } */
+/* { dg-final { scan-tree-dump " == 0" "optimized" } } */
+}
+
+int
+bar (int x)
+{
+  return (x | 137) != 137;
+/* { dg-final { scan-tree-dump " & -138;" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \\| 137;" "optimized" } } */
+/* { dg-final { scan-tree-dump " != 0" "optimized" } } */
+}


Jakub
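
For readers who want to convince themselves that the two rewrites in the patch above are value-preserving, here is a minimal brute-force sketch over all 8-bit operand pairs. It is not part of the patch; the helper name `count_mismatches` is hypothetical, and the 8-bit width is only an assumption made to keep the search small.

```c
#include <stdint.h>

/* Hypothetical helper, not from the patch: counts the 8-bit (x, y) pairs
   for which an original comparison and its rewritten form disagree.  */
unsigned
count_mismatches (void)
{
  unsigned bad = 0;
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      {
        uint8_t a = (uint8_t) x, b = (uint8_t) y;
        /* Rewritten form shared by both patterns: (X & ~Y) == 0.  */
        int rewritten = (uint8_t) (a & (uint8_t) ~b) == 0;
        /* First pattern: (X & Y) == X.  */
        int and_orig = (uint8_t) (a & b) == a;
        /* Second pattern: (X | Y) == Y.  */
        int ior_orig = (uint8_t) (a | b) == b;
        bad += (and_orig != rewritten) + (ior_orig != rewritten);
      }
  return bad;
}
```

All three forms test whether every bit set in X is also set in Y, which is why a single rewritten expression serves both patterns. The constants in the new test follow from the same identity: `~23` is `-24` and `~137` is `-138` in two's complement.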



[committed] openmp: Fix up taskloop reduction ICE if taskloop has no iterations [PR100471]

2021-05-11 Thread Jakub Jelinek via Gcc-patches
Hi!

When a taskloop doesn't have any iterations, GOMP_taskloop* takes an early
return, doesn't create any tasks and more importantly, doesn't create
a taskgroup and doesn't register task reductions.  But, the code emitted
in the callers assumes task reductions have been registered and performs
the reduction handling and task reduction unregistration.  The pointer
to the task reduction private variables is reused, on input it is the alignment
and only on output it is the pointer, so in the case of a taskloop with no
iterations the caller attempts to dereference the alignment value as if it were
a pointer and crashes.  We could in the early returns register the task reductions
only to have them looped over and unregistered in the caller, but I think
it is better to tell the caller there is nothing to task reduce and bypass
all that.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
so far.

2021-05-11  Jakub Jelinek  

PR middle-end/100471
* omp-low.c (lower_omp_task_reductions): For OMP_TASKLOOP, if data
is 0, bypass the reduction loop including
GOMP_taskgroup_reduction_unregister call.

* taskloop.c (GOMP_taskloop): If GOMP_TASK_FLAG_REDUCTION and not
GOMP_TASK_FLAG_NOGROUP, when doing early return clear the task
reduction pointer.
* testsuite/libgomp.c/task-reduction-4.c: New test.

--- gcc/omp-low.c.jj	2021-05-10 12:22:30.391452318 +0200
+++ gcc/omp-low.c	2021-05-10 13:39:39.366205162 +0200
@@ -8781,7 +8781,7 @@ lower_omp_task_reductions (omp_context *
   tree num_thr_sz = create_tmp_var (size_type_node);
   tree lab1 = create_artificial_label (UNKNOWN_LOCATION);
   tree lab2 = create_artificial_label (UNKNOWN_LOCATION);
-  tree lab3 = NULL_TREE;
+  tree lab3 = NULL_TREE, lab7 = NULL_TREE;
   gimple *g;
   if (code == OMP_FOR || code == OMP_SECTIONS)
 {
@@ -8846,6 +8846,14 @@ lower_omp_task_reductions (omp_context *
  NULL_TREE, NULL_TREE);
   tree data = create_tmp_var (pointer_sized_int_node);
   gimple_seq_add_stmt (end, gimple_build_assign (data, t));
+  if (code == OMP_TASKLOOP)
+{
+  lab7 = create_artificial_label (UNKNOWN_LOCATION);
+  g = gimple_build_cond (NE_EXPR, data,
+build_zero_cst (pointer_sized_int_node),
+lab1, lab7);
+  gimple_seq_add_stmt (end, g);
+}
   gimple_seq_add_stmt (end, gimple_build_label (lab1));
   tree ptr;
   if (TREE_CODE (TYPE_SIZE_UNIT (record_type)) == INTEGER_CST)
@@ -9209,6 +9217,8 @@ lower_omp_task_reductions (omp_context *
   g = gimple_build_call (t, 1, build_fold_addr_expr (avar));
 }
   gimple_seq_add_stmt (end, g);
+  if (lab7)
+gimple_seq_add_stmt (end, gimple_build_label (lab7));
   t = build_constructor (atype, NULL);
   TREE_THIS_VOLATILE (t) = 1;
   gimple_seq_add_stmt (end, gimple_build_assign (avar, t));
--- libgomp/taskloop.c.jj   2021-01-04 10:25:56.074038599 +0100
+++ libgomp/taskloop.c  2021-05-10 12:32:04.024191809 +0200
@@ -51,20 +51,32 @@ GOMP_taskloop (void (*fn) (void *), void
 
   /* If parallel or taskgroup has been cancelled, don't start new tasks.  */
   if (team && gomp_team_barrier_cancelled (&team->barrier))
-return;
+{
+early_return:
+  if ((flags & (GOMP_TASK_FLAG_NOGROUP | GOMP_TASK_FLAG_REDUCTION))
+ == GOMP_TASK_FLAG_REDUCTION)
+   {
+ struct gomp_data_head { TYPE t1, t2; uintptr_t *ptr; };
+ uintptr_t *ptr = ((struct gomp_data_head *) data)->ptr;
+ /* Tell callers GOMP_taskgroup_reduction_register has not been
+called.  */
+ ptr[2] = 0;
+   }
+  return;
+}
 
 #ifdef TYPE_is_long
   TYPE s = step;
   if (step > 0)
 {
   if (start >= end)
-   return;
+   goto early_return;
   s--;
 }
   else
 {
   if (start <= end)
-   return;
+   goto early_return;
   s++;
 }
   UTYPE n = (end - start + s) / step;
@@ -73,13 +85,13 @@ GOMP_taskloop (void (*fn) (void *), void
   if (flags & GOMP_TASK_FLAG_UP)
 {
   if (start >= end)
-   return;
+   goto early_return;
   n = (end - start + step - 1) / step;
 }
   else
 {
   if (start <= end)
-   return;
+   goto early_return;
   n = (start - end - step - 1) / -step;
 }
 #endif
--- libgomp/testsuite/libgomp.c/task-reduction-4.c.jj	2021-05-10 12:31:37.628479637 +0200
+++ libgomp/testsuite/libgomp.c/task-reduction-4.c	2021-05-10 12:30:57.966912118 +0200
@@ -0,0 +1,21 @@
+/* PR middle-end/100471 */
+
+extern void abort (void);
+
+int c;
+
+int
+main ()
+{
+#pragma omp parallel
+#pragma omp single
+  {
+int r = 0, i;
+#pragma omp taskloop reduction(+:r)
+for (i = 0; i < c; i++)
+  r++;
+if (r != 0)
+  abort ();
+  }
+  return 0;
+}


Jakub
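
To make the "no iterations" case concrete, the sketch below isolates the iteration-count arithmetic from the `TYPE_is_long` branch of the taskloop.c hunk above. The function name `taskloop_trip_count` is hypothetical, returning 0 stands in for the `goto early_return` paths, and a non-zero `step` is assumed.

```c
/* Hypothetical stand-alone version of the trip-count computation in the
   TYPE_is_long branch.  Returns 0 where GOMP_taskloop would take the
   early return.  Assumes step != 0.  */
unsigned long
taskloop_trip_count (long start, long end, long step)
{
  long s = step;
  if (step > 0)
    {
      if (start >= end)
        return 0;       /* Upward loop with an empty range.  */
      s--;
    }
  else
    {
      if (start <= end)
        return 0;       /* Downward loop with an empty range.  */
      s++;
    }
  /* Round the distance up to a whole number of steps.  */
  return (unsigned long) ((end - start + s) / step);
}
```

In the new test, `c` is 0, so the loop `for (i = 0; i < c; i++)` corresponds to `start == end == 0` with `step == 1`, which is exactly the empty-range case that used to leave the reduction pointer holding the alignment value.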



Re: [PATCH 2/2 v2] rs6000: Guard density_test only for vector version

2021-05-11 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2021/5/11 4:12 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Sat, May 08, 2021 at 04:12:18PM +0800, Kewen.Lin wrote:
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -5234,6 +5234,8 @@ typedef struct _rs6000_cost_data
>>/* For each vectorized loop, this var holds TRUE iff a non-memory vector
>>   instruction is needed by the vectorization.  */
>>bool vect_nonmem;
>> +  /* Indicates costing for the scalar version of a loop or block.  */
>> +  bool costing_for_scalar;
>>  } rs6000_cost_data;
> 
> "... this is costing for ..."?
> 
>> @@ -5255,6 +5257,12 @@ rs6000_density_test (rs6000_cost_data *data)
>>int vec_cost = data->cost[vect_body], not_vec_cost = 0;
>>int i, density_pct;
>>  
>> +  /* This density test only cares about the cost of vector version of the
>> + loop, early return if it's costing for the scalar version (namely
>> + computing single scalar iteration cost).  */
>> +  if (data->costing_for_scalar)
>> +return;
> 
> "..., so immediately return if we are passed costing for ..."?
> 
> The patch is okay for trunk with those or similar changes.  Thanks!
> 
> 

Thanks for the review!

Committed in r12-705 with the requested comment changes.

BR,
Kewen


Re: [PATCH 1/2] vect: Add costing_for_scalar parameter to init_cost hook

2021-05-11 Thread Kewen.Lin via Gcc-patches
Hi Richard,

on 2021/5/10 10:08 PM, Richard Sandiford wrote:
> "Kewen.Lin via Gcc-patches"  writes:
>> on 2021/5/7 5:43 PM, Richard Biener wrote:
>>> On Fri, May 7, 2021 at 5:30 AM Kewen.Lin via Gcc-patches
>>>  wrote:

>>>> Hi,
>>>>
>>>> When I was investigating density_test heuristics, I noticed that
>>>> the current rs6000_density_test could be used for single scalar
>>>> iteration cost calculation, through the call trace:
>>>>   vect_compute_single_scalar_iteration_cost
>>>>     -> rs6000_finish_cost
>>>>       -> rs6000_density_test
>>>>
>>>> It looks unexpected as its descriptive function comments and Bill
>>>> helped to confirm this needs to be fixed (thanks!).
>>>>
>>>> So this patch is to check the passed data, if it's the same as
>>>> the one in loop_vinfo, it indicates it's working on vector version
>>>> cost calculation, otherwise just early return.
>>>>
>>>> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
>>>>
>>>> Nothing remarkable was observed with SPEC2017 Power9 full run.
>>>>
>>>> Is it ok for trunk?
>>>
>>> +  /* Only care about cost of vector version, so exclude scalar
>>> version here.  */
>>> +  if (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) != (void *) data)
>>> +return;
>>>
>>> Hmm, looks like a quite "random" test to me.  What about adding a
>>> parameter to finish_cost () (or init_cost?) indicating the cost kind?
>>>
>>
>> I originally wanted to change the hook interface, but noticed that
>> the finish_cost in function vect_estimate_min_profitable_iters is
>> the only invocation with LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
>> it looks enough to differentiate the scalar version costing or
>> vector version costing for loop.  Do you mean this observation/
>> assumption easy to be broken sometime later?
>>
>> The attached patch to add one new parameter to indicate the
>> costing kind explicitly as you suggested.
>>
>> Does it look better?
>>
>> gcc/ChangeLog:
>>
>>  * doc/tm.texi: Regenerated.
>>  * target.def (init_cost): Add new parameter costing_for_scalar.
>>  * targhooks.c (default_init_cost): Adjust for new parameter.
>>  * targhooks.h (default_init_cost): Likewise.
>>  * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Likewise.
>>  (vect_compute_single_scalar_iteration_cost): Likewise.
>>  (vect_analyze_loop_2): Likewise.
>>  * tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Likewise.
>>  (vect_bb_vectorization_profitable_p): Likewise.
>>  * tree-vectorizer.h (init_cost): Likewise.
>>  * config/aarch64/aarch64.c (aarch64_init_cost): Likewise.
>>  * config/i386/i386.c (ix86_init_cost): Likewise.
>>  * config/rs6000/rs6000.c (rs6000_init_cost): Likewise.  
> 
> Just wanted to say thanks for doing this.  I hit the same problem
> when doing the Neoverse V1 tuning near the end of stage 4.  Due to
> the extreme lateness of the changes, I couldn't reasonably ask for
> target-independent help at that time, but this patch will make
> things simpler for AArch64. :-)
> 


Glad to see that rs6000 port isn't the only port requesting this.  :-)
Thanks for the information!
BR,
Kewen


Re: [PATCH 1/2] vect: Add costing_for_scalar parameter to init_cost hook

2021-05-11 Thread Kewen.Lin via Gcc-patches
Hi Richi,

on 2021/5/10 9:55 PM, Richard Biener wrote:
> On Sat, May 8, 2021 at 10:05 AM Kewen.Lin  wrote:
>>
>> Hi Richi,
>>
>> Thanks for the comments!
>>
>> on 2021/5/7 5:43 PM, Richard Biener wrote:
>>> On Fri, May 7, 2021 at 5:30 AM Kewen.Lin via Gcc-patches
>>>  wrote:

>>>> Hi,
>>>>
>>>> When I was investigating density_test heuristics, I noticed that
>>>> the current rs6000_density_test could be used for single scalar
>>>> iteration cost calculation, through the call trace:
>>>>   vect_compute_single_scalar_iteration_cost
>>>>     -> rs6000_finish_cost
>>>>       -> rs6000_density_test
>>>>
>>>> It looks unexpected as its descriptive function comments and Bill
>>>> helped to confirm this needs to be fixed (thanks!).
>>>>
>>>> So this patch is to check the passed data, if it's the same as
>>>> the one in loop_vinfo, it indicates it's working on vector version
>>>> cost calculation, otherwise just early return.
>>>>
>>>> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
>>>>
>>>> Nothing remarkable was observed with SPEC2017 Power9 full run.
>>>>
>>>> Is it ok for trunk?
>>>
>>> +  /* Only care about cost of vector version, so exclude scalar
>>> version here.  */
>>> +  if (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) != (void *) data)
>>> +return;
>>>
>>> Hmm, looks like a quite "random" test to me.  What about adding a
>>> parameter to finish_cost () (or init_cost?) indicating the cost kind?
>>>
>>
>> I originally wanted to change the hook interface, but noticed that
>> the finish_cost in function vect_estimate_min_profitable_iters is
>> the only invocation with LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
>> it looks enough to differentiate the scalar version costing or
>> vector version costing for loop.  Do you mean this observation/
>> assumption easy to be broken sometime later?
> 
> Yes, this field is likely to become stale.
> 
>>
>> The attached patch to add one new parameter to indicate the
>> costing kind explicitly as you suggested.
>>
>> Does it look better?
>>
>> gcc/ChangeLog:
>>
>> * doc/tm.texi: Regenerated.
>> * target.def (init_cost): Add new parameter costing_for_scalar.
>> * targhooks.c (default_init_cost): Adjust for new parameter.
>> * targhooks.h (default_init_cost): Likewise.
>> * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Likewise.
>> (vect_compute_single_scalar_iteration_cost): Likewise.
>> (vect_analyze_loop_2): Likewise.
>> * tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Likewise.
>> (vect_bb_vectorization_profitable_p): Likewise.
>> * tree-vectorizer.h (init_cost): Likewise.
>> * config/aarch64/aarch64.c (aarch64_init_cost): Likewise.
>> * config/i386/i386.c (ix86_init_cost): Likewise.
>> * config/rs6000/rs6000.c (rs6000_init_cost): Likewise.
>>
>>> OTOH we already pass scalar_stmt to individual add_stmt_cost,
>>> so not sure whether the context really matters.  That said,
>>> the density test looks "interesting" ... the intent was that finish_cost
>>> might look at gathered data from add_stmt, not that it looks at
>>> the GIMPLE IL ... so why are you not counting vector_stmt vs.
>>> scalar_stmt entries in vect_body and using that for this metric?
>>>
>>
>> Good to know the intention behind finish_cost, thanks!
>>
>> I'm afraid that the check on vector_stmt and scalar_stmt entries
>> from add_stmt_cost doesn't work for the density test here.  The
>> density test focuses on the vector version itself, there are some
>> stmts whose relevants are marked as vect_unused_in_scope, IIUC
>> they won't be passed down when costing for both versions.  But the
>> existing density check would like to know the cost for the
>> non-vectorized part.  The current implementation does:
>>
>>  vec_cost = data->cost[vect_body]
>>
>>   if (!STMT_VINFO_RELEVANT_P (stmt_info)
>>   && !STMT_VINFO_IN_PATTERN_P (stmt_info))
>> not_vec_cost++
>>
>>  density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);
>>
>> it takes those unrelevant stmts into account, and then has
>> both costs from the non-vectorized part (not_vec_cost)
>> and vectorized part (cost[vect_body]), it can calculate the
>> vectorization code density ratio.
> 
> Yes, but then what "relevant" stmts are actually needed and what
> not is missed by your heuristics.  It's really some GIGO one
> I fear - each vectorized data reference will add a pointer IV
> (eventually commoned by IVOPTs later) and pointer value updates
> that are not accounted for in costing (the IVs and updates in the
> scalar code are marked as not relevant).  Are those the stmts
> this heuristic wants to look at?

Yes, the IVs and updates (even the comparison for exit) are what
the heuristics tries to count.  In most cases, the non vectorized
part in the loop are IV updates.  And it's so true that the
collected not_vec_cost could be not accurate, but it seems hard
to predict the cost exactly here?
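
For reference, the density ratio quoted earlier in this message can be written as a stand-alone function. This is only a sketch of the arithmetic: the name `density_pct` as a function and the guard against a zero total are assumptions, not taken from rs6000.c.

```c
/* Hypothetical stand-alone version of the quoted ratio: the share of the
   loop body cost that comes from vectorized statements, in percent.
   The zero-total guard is an assumption added for the sketch.  */
int
density_pct (int vec_cost, int not_vec_cost)
{
  int total = vec_cost + not_vec_cost;
  if (total == 0)
    return 0;
  return (vec_cost * 100) / total;
}
```

With a vectorized body cost of 90 and 10 statements counted as not vectorized (IV updates, the exit comparison), the ratio is 90 percent; any inaccuracy in `not_vec_cost` moves the result directly, which is the concern raised above.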

Assuming this 

Re: [PATCH] Bump LTO_major_version to 11.

2021-05-11 Thread Martin Liška

On 4/23/21 1:37 PM, Martin Liška wrote:
> On 4/23/21 12:59 PM, Richard Biener wrote:
>> True, the question is on how much detail we have to pay attention to.
>
> Agree with that.
>
>> For us of course the build-id solution works fine.  And hopefully the
>> days of PCH are counted...
>
> Yes.
>
> I have a tentative patch that emits the attached checksum.h header file.
> We also include flags in the checksum:
>
> ...
>   build/genchecksum$(build_exeext) $(C_OBJS) $(BACKEND) $(LIBDEPS) \
>   checksum-options > cc1-checksum.c.tmp &&   \
> ...
>
> $ cat checksum-options
>
> g++ -no-pie   -g   -DIN_GCC -fPIC-fno-exceptions -fno-rtti 
> -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
> -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute 
> -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
> -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -static-libstdc++ 
> -static-libgcc
>
> Can we ignore them in the checksum calculation?
> Martin

Richi, what do you think about this part?

Thanks,
Martin


Re: [PATCH 02/12] Allow generating pseudo register with specific alignment

2021-05-11 Thread Richard Biener via Gcc-patches
On Mon, May 10, 2021 at 4:12 PM H.J. Lu  wrote:
>
> On Mon, May 10, 2021 at 6:59 AM Richard Biener
>  wrote:
> >
> > On Mon, May 10, 2021 at 3:29 PM H.J. Lu  wrote:
> > >
> > > On Mon, May 10, 2021 at 2:39 AM Richard Sandiford
> > >  wrote:
> > > >
> > > > Richard Biener via Gcc-patches  writes:
> > > > > On Fri, Apr 30, 2021 at 8:30 PM Richard Sandiford via Gcc-patches
> > > > >  wrote:
> > > > >>
> > > > >> "H.J. Lu via Gcc-patches"  writes:
> > > > >> > On Fri, Apr 30, 2021 at 5:49 AM H.J. Lu  
> > > > >> > wrote:
> > > > >> >>
> > > > >> >> On Fri, Apr 30, 2021 at 5:42 AM Richard Sandiford
> > > > >> >>  wrote:
> > > > >> >> >
> > > > >> >> > "H.J. Lu via Gcc-patches"  writes:
> > > > >> >> > > On Fri, Apr 30, 2021 at 2:06 AM Richard Sandiford
> > > > >> >> > >  wrote:
> > > > >> >> > >>
> > > > >> >> > >> "H.J. Lu via Gcc-patches"  writes:
> > > > >> >> > >> > gen_reg_rtx tracks stack alignment needed for pseudo 
> > > > >> >> > >> > registers so that
> > > > >> >> > >> > associated hard registers can be properly spilled onto 
> > > > >> >> > >> > stack.  But there
> > > > >> >> > >> > are cases where associated hard registers will never be 
> > > > >> >> > >> > spilled onto
> > > > >> >> > >> > stack.  gen_reg_rtx is changed to take an argument for 
> > > > >> >> > >> > register alignment
> > > > >> >> > >> > so that stack realignment can be avoided when not needed.
> > > > >> >> > >>
> > > > >> >> > >> How is it guaranteed that they will never be spilled though?
> > > > >> >> > >> I don't think that that guarantee exists for any kind of 
> > > > >> >> > >> pseudo,
> > > > >> >> > >> except perhaps for the temporary pseudos that the RA creates 
> > > > >> >> > >> to
> > > > >> >> > >> replace (match_scratch …)es.
> > > > >> >> > >>
> > > > >> >> > >
> > > > >> >> > > The caller of creating pseudo registers with specific 
> > > > >> >> > > alignment must
> > > > >> >> > > guarantee that they will never be spilled.   I am only using 
> > > > >> >> > > it in
> > > > >> >> > >
> > > > >> >> > >   /* Make operand1 a register if it isn't already.  */
> > > > >> >> > >   if (can_create_pseudo_p ()
> > > > >> >> > >       && !register_operand (op0, mode)
> > > > >> >> > >       && !register_operand (op1, mode))
> > > > >> >> > >     {
> > > > >> >> > >       /* NB: Don't increase stack alignment requirement when forcing
> > > > >> >> > >          operand1 into a pseudo register to copy data from one memory
> > > > >> >> > >          location to another since it doesn't require a spill.  */
> > > > >> >> > >       emit_move_insn (op0,
> > > > >> >> > >                       force_reg (GET_MODE (op0), op1,
> > > > >> >> > >                                  (UNITS_PER_WORD * BITS_PER_UNIT)));
> > > > >> >> > >       return;
> > > > >> >> > >     }
> > > > >> >> > >
> > > > >> >> > > for vector moves.  RA shouldn't spill it.
> > > > >> >> >
> > > > >> >> > But this is the point: it's a case of hoping that the RA won't spill it,
> > > > >> >> > rather than having a guarantee that it won't.
> > > > >> >> >
> > > > >> >> > Even if the moves start out adjacent, they could be separated by later
> > > > >> >> > RTL optimisations, particularly scheduling.  (I realise pre-RA scheduling
> > > > >> >> > isn't enabled by default for x86, but it can still be enabled explicitly.)
> > > > >> >> > Or if the same data is being copied to two locations, we might reuse
> > > > >> >> > values loaded by the first copy for the second copy as well.
> > > > >> >
> > > > >> > There are cases where pseudo vector registers are created as pure
> > > > >> > temporary registers in the backend and they shouldn't ever be spilled
> > > > >> > to the stack.  They will be spilled to the stack only if there is other
> > > > >> > non-temporary vector register usage, in which case the stack will be
> > > > >> > properly re-aligned.  The caller creating pseudo registers with a specific
> > > > >> > alignment guarantees that they are used only as pure temporary registers.
> > > > >>
> > > > >> I don't think there's really a distinct category of pure temporary
> > > > >> registers though.  The things I mentioned above can happen for any
> > > > >> kind of pseudo register.
> > > > >
> > > > > I wonder if for the cases HJ thinks of it is appropriate to use hardregs?
> > > > > Do we generally handle those well?  That is, are they again subject
> > > > > to be allocated by RA when no longer live?
> > > >
> > > > Yeah, using hard registers should work.  Of course, any given fixed choice
> > > > of hard register has the potential to be suboptimal in some situation,
> > > > but it should be safe.
> > >
> > > I tried hard registers.  The generated code isn't as good as with pseudo
> > > registers.  But I want to avoid aligning the stack when