Re: [PATCH] Add a bootstrap-native build config

2024-11-06 Thread Andi Kleen
On Tue, Jul 30, 2024 at 09:40:42AM -0700, Andi Kleen wrote:
> From: Andi Kleen 
> 
> ... that uses -march=native -mtune=native to build a compiler optimized
> for the host.
> 
> config/ChangeLog:
> 
>   * bootstrap-native.mk: New file.
> 
> gcc/ChangeLog:
> 
>   * doc/install.texi: Document bootstrap-native.


I haven't gotten any approval on this patch.
Since multiple people liked it and it borders on the trivial and cannot
break anything else I plan to commit it in a week under the "obvious"
rule, unless someone objects.

Thanks,

-Andi


Re: [PATCH v3] Remove sys/user time in -ftime-report

2024-11-06 Thread Andi Kleen
On Fri, Nov 01, 2024 at 02:01:18PM -0400, John David Anglin wrote:
> This breaks build on hppa64-hp-hpux11.11.  This target has clock_gettime
> but it doesn't have CLOCK_MONOTONIC.  It has CLOCK_REALTIME.  I modified
> timevar.cc as follows to restore build.

Alternative would be to check for CLOCK_GETTIME in autoconf, but I guess
that will work too.


> Dave
> ---
> 
> diff --git a/gcc/timevar.cc b/gcc/timevar.cc
> index e12775e6ff3..412d9b62a8f 100644
> --- a/gcc/timevar.cc
> +++ b/gcc/timevar.cc
> @@ -160,7 +160,11 @@ get_time (struct timevar_time_def *now)
> 
>  #ifdef HAVE_CLOCK_GETTIME
>struct timespec ts;
> +#if _POSIX_TIMERS > 0 && defined(_POSIX_MONOTONIC_CLOCK)
>clock_gettime (CLOCK_MONOTONIC, &ts);
> +#else
> +  clock_gettime (CLOCK_REALTIME, &ts);
> +#endif
>now->wall = ts.tv_sec * 10 + ts.tv_nsec;
>return;
>  #define HAVE_WALL_TIME 1


Re: [PATCH] PR117350: Keep assembler name for abstract decls for autofdo

2024-11-05 Thread Andi Kleen
On Tue, Nov 05, 2024 at 09:47:17AM +0100, Richard Biener wrote:
> On Tue, Nov 5, 2024 at 2:02 AM Jason Merrill  wrote:
> >
> > On 10/31/24 4:40 PM, Andi Kleen wrote:
> > > From: Andi Kleen 
> > >
> > > autofdo looks up inline stacks and tries to match them with the profile
> > > data using their symbol name. Make sure all decls that can be in a inline 
> > > stack
> > > have a valid assembler name.
> > >
> > > This fixes a bootstrap problem with autoprofiledbootstrap and LTO.
> >
> > OK in a week if no other comments.
> 
> Hmm, but DECL_ABSTRACT_P should be only set on entities that generate no code.
> 
> How does autofdo look them up?  Are you sure it's the abstract decl
> autofdo wants
> to lookup?  Or is autofdo processing not serializing the compilation and thus 
> it
> affects code generation on parts that have not been processed yet?


autofdo tries to match inlines to an inline stack derived from dwarf.
So if something appears in dwarf it has to be in its stack. For the
test case the abstract entity is in the dwarf stack.

For the inside gcc lookup it walks the BLOCK_SUPERCONTEXT links
and looks at BLOCK_ABSTRACT_ORIGIN and ignores everything that has unknown
location

Maybe there could be some filtering there, but it would need to be on
both sides, which would be a version break for the file format.

-Andi



[PATCH] Update gcc-auto-profile / gen_autofdo_event.py

2024-10-31 Thread Andi Kleen
From: Andi Kleen 

- Fix warnings with newer python versions about bad escapes by
making all the python string raw.
- Add a fallback for using the builtin perf event list if the
CPU model number is unknown.
- Regenerate the shipped gcc-auto-profile with the changes.

contrib/ChangeLog:

* gen_autofdo_event.py: Convert strings to raw.
Add fallback to using builtin perf event list.

gcc/ChangeLog:

* config/i386/gcc-auto-profile: Regenerate.
---
 contrib/gen_autofdo_event.py | 36 ++--
 gcc/config/i386/gcc-auto-profile | 21 ---
 2 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
index 4c201943b5c7..4e58a5320fff 100755
--- a/contrib/gen_autofdo_event.py
+++ b/contrib/gen_autofdo_event.py
@@ -112,7 +112,7 @@ for j in u:
 u.close()
 
 if args.script:
-print('''#!/bin/sh
+print(r'''#!/bin/sh
 # Profile workload for gcc profile feedback (autofdo) using Linux perf.
 # Auto generated. To regenerate for new CPUs run
 # contrib/gen_autofdo_event.py --script --all in gcc source
@@ -152,22 +152,26 @@ case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
 for event, mod in eventmap.items():
 for m in mod[:-1]:
 print("model*:\ %s|\\" % m)
-print('model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event))
-print('''*)
+print(r'model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event))
+print(r'''*)
+if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; 
then
+E=br_inst_retired.near_taken:p
+else
 echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to 
update script."
-   exit 1 ;;''')
-print("esac")
-print("set -x")
-print('if ! perf record -e $E -b "$@" ; then')
-print('  # PEBS may not actually be working even if the processor supports 
it')
-print('  # (e.g., in a virtual machine). Trying to run without /p.')
-print('  set +x')
-print('  echo >&2 "Retrying without /p."')
-print('  E="$(echo "${E}" | sed -e \'s/\/p/\//\')"')
-print('  set -x')
-print('  exec perf record -e $E -b "$@"')
-print(' set +x')
-print('fi')
+ exit 1
+fi ;;''')
+print(r"esac")
+print(r"set -x")
+print(r'if ! perf record -e $E -b "$@" ; then')
+print(r'  # PEBS may not actually be working even if the processor 
supports it')
+print(r'  # (e.g., in a virtual machine). Trying to run without /p.')
+print(r'  set +x')
+print(r'  echo >&2 "Retrying without /p."')
+print(r'  E="$(echo "${E}" | sed -e \'s/\/p/\//\ -e s/:p//)"')
+print(r'  set -x')
+print(r'  exec perf record -e $E -b "$@"')
+print(r' set +x')
+print(r'fi')
 
 if cpufound == 0 and not args.all:
 sys.exit('CPU %s not found' % cpu)
diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
index 04f7d35dcc51..528b34e42400 100755
--- a/gcc/config/i386/gcc-auto-profile
+++ b/gcc/config/i386/gcc-auto-profile
@@ -82,17 +82,24 @@ model*:\ 126|\
 model*:\ 167|\
 model*:\ 140|\
 model*:\ 141|\
-model*:\ 143|\
-model*:\ 207|\
 model*:\ 106|\
-model*:\ 108) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;;
+model*:\ 108|\
+model*:\ 173|\
+model*:\ 174) E="cpu/event=0xc4,umask=0x20/$FLAGS" ;;
 model*:\ 134|\
 model*:\ 150|\
-model*:\ 156|\
-model*:\ 190) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;;
+model*:\ 156) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;;
+model*:\ 143|\
+model*:\ 207) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;;
+model*:\ 190) E="cpu/event=0xc4,umask=0xc0/$FLAGS" ;;
+model*:\ 190) E="cpu/event=0xc4,umask=0xfe/$FLAGS" ;;
 *)
+if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; 
then
+E=br_inst_retired.near_taken:p
+else
 echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to 
update script."
-   exit 1 ;;
+ exit 1
+fi ;;
 esac
 set -x
 if ! perf record -e $E -b "$@" ; then
@@ -100,7 +107,7 @@ if ! perf record -e $E -b "$@" ; then
   # (e.g., in a virtual machine). Trying to run without /p.
   set +x
   echo >&2 "Retrying without /p."
-  E="$(echo "${E}" | sed -e 's/\/p/\//')"
+  E="$(echo "${E}" | sed -e \'s/\/p/\//\ -e s/:p//)"
   set -x
   exec perf record -e $E -b "$@"
  set +x
-- 
2.46.2



[PATCH] Enable autofdo bootstrap for lto/fortran

2024-10-31 Thread Andi Kleen
From: Andi Kleen 

When autofdo bootstrap support was originally implemented there were
issues with the LTO bootstrap, that is why it wasn't enabled
for them. I retested this now and it works on x86_64-linux.

Fortran was also missing, not sure why. Also enabled now.

gcc/fortran/ChangeLog:

* Make-lang.in: Enable autofdo.

gcc/lto/ChangeLog:

* Make-lang.in: Enable autofdo.
---
 gcc/fortran/Make-lang.in | 13 +++--
 gcc/lto/Make-lang.in | 14 +-
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/gcc/fortran/Make-lang.in b/gcc/fortran/Make-lang.in
index 0be3c6b654b1..7295118185fd 100644
--- a/gcc/fortran/Make-lang.in
+++ b/gcc/fortran/Make-lang.in
@@ -69,6 +69,15 @@ F95_OBJS = $(F95_PARSER_OBJS) $(FORTRAN_TARGET_OBJS) \
 
 fortran_OBJS = $(F95_OBJS) fortran/gfortranspec.o
 
+ifeq ($(if $(wildcard ../stage_current),$(shell cat \
+  ../stage_current)),stageautofeedback)
+$(fortran_OBJS): CFLAGS += -fauto-profile=f95.fda
+$(fortran_OBJS): f95.fda
+endif
+
+f95.fda: create_fdas_for_lto1
+   $(PROFILE_MERGER) $(shell ls -ha f95_*.fda) --output_file f95.fda 
-gcov_version 2
+
 #
 # Define the names for selecting gfortran in LANGUAGES.
 fortran: f951$(exeext)
@@ -272,7 +281,7 @@ fortran.install-info: $(DESTDIR)$(infodir)/gfortran.info
 fortran.install-man: $(DESTDIR)$(man1dir)/$(GFORTRAN_INSTALL_NAME)$(man1ext)
 
 $(DESTDIR)$(man1dir)/$(GFORTRAN_INSTALL_NAME)$(man1ext): doc/gfortran.1 \
-   installdirs
+   installdirs f95*.fda
-rm -f $@
-$(INSTALL_DATA) $< $@
-chmod a-x $@
@@ -293,7 +302,7 @@ fortran.uninstall:
 # We just have to delete files specific to us.
 
 fortran.mostlyclean:
-   -rm -f gfortran$(exeext) gfortran-cross$(exeext) f951$(exeext)
+   -rm -f gfortran$(exeext) gfortran-cross$(exeext) f951$(exeext) f95*.fda
-rm -f fortran/*.o
 
 fortran.clean:
diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in
index b62ddcbe0dc9..4f9f21cdfc9e 100644
--- a/gcc/lto/Make-lang.in
+++ b/gcc/lto/Make-lang.in
@@ -29,15 +29,11 @@ lto_OBJS = $(LTO_OBJS)
 LTO_DUMP_OBJS = lto/lto-lang.o lto/lto-object.o attribs.o lto/lto-partition.o 
lto/lto-symtab.o lto/lto-dump.o lto/lto-common.o
 lto_dump_OBJS = $(LTO_DUMP_OBJS)
 
-# this is only useful in a LTO bootstrap, but this does not work right
-# now. Should reenable after this is fixed, but only when LTO bootstrap
-# is enabled.
-
-#ifeq ($(if $(wildcard ../stage_current),$(shell cat \
-#  ../stage_current)),stageautofeedback)
-#$(LTO_OBJS): CFLAGS += -fauto-profile=lto1.fda
-#$(LTO_OBJS): lto1.fda
-#endif
+ifeq ($(if $(wildcard ../stage_current),$(shell cat \
+  ../stage_current)),stageautofeedback)
+$(LTO_OBJS): CFLAGS += -fauto-profile=lto1.fda
+$(LTO_OBJS): lto1.fda
+endif
 
 # Rules
 
-- 
2.46.2



[PATCH] PR117350: Keep assembler name for abstract decls for autofdo

2024-10-31 Thread Andi Kleen
From: Andi Kleen 

autofdo looks up inline stacks and tries to match them with the profile
data using their symbol name. Make sure all decls that can be in a inline stack
have a valid assembler name.

This fixes a bootstrap problem with autoprofiledbootstrap and LTO.

2024-10-30  Jason Merrill  
Andrew Pinski  
Andi Kleen  
gcc/ChangeLog:

PR bootstrap/117350
* tree.cc (need_assembler_name_p): Keep assembler name
for abstract declarations when autofdo is used.
---
 gcc/tree.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/tree.cc b/gcc/tree.cc
index b4c059d3b0db..92f99eaccd72 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -787,8 +787,9 @@ need_assembler_name_p (tree decl)
   || DECL_ASSEMBLER_NAME_SET_P (decl))
 return false;
 
-  /* Abstract decls do not need an assembler name.  */
-  if (DECL_ABSTRACT_P (decl))
+  /* Abstract decls do not need an assembler name, except they
+ can be looked up by autofdo.  */
+  if (DECL_ABSTRACT_P (decl) && !flag_auto_profile)
 return false;
 
   /* For VAR_DECLs, only static, public and external symbols need an
-- 
2.46.2



Re: [PATCH v3] Remove sys/user time in -ftime-report

2024-10-31 Thread Andi Kleen
> I'm getting a build failure:
> 
> timevar.cc:163: undefined reference to `clock_gettime'
> 
> Our frozen build tools are intended to produce binaries that work
> "everywhere", so they're a few years old, but apparently something didn't
> configure correctly.
> 
> I see that libbacktrace configure correctly detects that clock_gettime is
> missing by default, but that it can be found in -lrt.
> 
> I'm investigating if I have a configuration problem on my end, but I think a
> configure test might be appropriate in gcc/configure, something like the one
> in the other subdirectories.

Here's a patch. I'll commit it as obvious unless someone complains.


Add autoconf check for clock_gettime

Reported by Andrew Stubbs

gcc/ChangeLog:

* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Check for HAVE_CLOCK_GETTIME.
* timevar.cc (get_time): Use HAVE_CLOCK_GETTIME.

diff --git a/gcc/config.in b/gcc/config.in
index 3fc4666d60b5..0a506d1783a4 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -883,6 +883,12 @@
 #endif
 
 
+/* Define to 1 if you have the `clock_gettime' function. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_CLOCK_GETTIME
+#endif
+
+
 /* Define if  defines clock_t. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_CLOCK_T
diff --git a/gcc/configure b/gcc/configure
index 47c58036530f..150ab6164142 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -10613,7 +10613,8 @@ fi
 for ac_func in times clock kill getrlimit setrlimit atoq \
popen sysconf strsignal getrusage nl_langinfo \
gettimeofday mbstowcs wcswidth mmap posix_fallocate setlocale \
-   clearerr_unlocked feof_unlocked   ferror_unlocked fflush_unlocked 
fgetc_unlocked fgets_unlocked   fileno_unlocked fprintf_unlocked fputc_unlocked 
fputs_unlocked   fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked  
 putchar_unlocked putc_unlocked madvise mallinfo mallinfo2 fstatat getauxval
+   clearerr_unlocked feof_unlocked   ferror_unlocked fflush_unlocked 
fgetc_unlocked fgets_unlocked   fileno_unlocked fprintf_unlocked fputc_unlocked 
fputs_unlocked   fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked  
 putchar_unlocked putc_unlocked madvise mallinfo mallinfo2 fstatat getauxval \
+   clock_gettime
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_cxx_check_func "$LINENO" "$ac_func" "$as_ac_var"
@@ -10626,6 +10627,54 @@ fi
 done
 
 
+# At least for glibc, clock_gettime is in librt.  But don't pull that
+# in if it still doesn't give us the function we want.
+if test $ac_cv_func_clock_gettime = no; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for clock_gettime in -lrt" 
>&5
+$as_echo_n "checking for clock_gettime in -lrt... " >&6; }
+if ${ac_cv_lib_rt_clock_gettime+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lrt  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char clock_gettime ();
+int
+main ()
+{
+return clock_gettime ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_link "$LINENO"; then :
+  ac_cv_lib_rt_clock_gettime=yes
+else
+  ac_cv_lib_rt_clock_gettime=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_rt_clock_gettime" 
>&5
+$as_echo "$ac_cv_lib_rt_clock_gettime" >&6; }
+if test "x$ac_cv_lib_rt_clock_gettime" = xyes; then :
+  LIBS="-lrt $LIBS"
+
+$as_echo "#define HAVE_CLOCK_GETTIME 1" >>confdefs.h
+
+fi
+
+fi
+
 if test x$ac_cv_func_mbstowcs = xyes; then
   { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether mbstowcs works" >&5
 $as_echo_n "checking whether mbstowcs works... " >&6; }
@@ -21405,7 +21454,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 21408 "configure"
+#line 21457 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -21511,7 +21560,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 21514 "configure"
+#line 21563 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/gcc/configure.ac b/gcc/configure.ac
index dc8346a7b823..bdb22d53e2ca 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1557,7 +1557,17 @@ define(gcc_UNLOCKED_FUNCS, clearerr_unlocked 
feof_unlocked dnl
 AC_CHECK_FUNCS(times clock kill getrlimit setrlimit atoq \
popen sysconf strsignal getrusage nl_langinfo \
gettimeofday mbstowcs wcswidth mmap posix_fallocate setlocale \
-   gcc_UNLOCKED_FUNCS madvise mallinfo mallinfo2 fstatat getauxval)
+   gcc_UNLO

Re: [PATCH v3] Remove sys/user time in -ftime-report

2024-10-30 Thread Andi Kleen
On Wed, Oct 23, 2024 at 02:56:51PM +0200, Richard Biener wrote:
> On Wed, Oct 9, 2024 at 6:18 PM Andi Kleen  wrote:
> >
> > From: Andi Kleen 
> >
> > Retrieving sys/user time in timevars is quite expensive because it
> > always needs a system call. Only getting the wall time is much
> > cheaper because operating systems have optimized paths for this.
> >
> > The sys time isn't that interesting for a compiler and wall time
> > is usually close to user time except when the system is overloaded.
> > On the other hand when it is not wall time is more accurate because
> > it has less overhead.
> >
> > For building tramp3d with -O0 the -ftime-report overhead drops from
> > 18% to 3%. For -O2 it drops from 8% to not measurable.
> >
> > I changed the code to use gettimeofday as a fallback for clock_gettime
> > CLOCK_MONOTONIC.  If a host has neither of those the time will not
> > be measured. Previously clock was the fallback.
> 
> OK for trunk if there's no serious objection until mid next week.

I committed the patch now.

-Andi


Re: [PATCH v3 1/2][RFC] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation [PR109071]

2024-10-30 Thread Andi Kleen
Qing Zhao  writes:

> Control this with a new option -fdiagnostics-details.

It would be useful to be also able to print the inline call stack,
maybe with a separate option.

In some array bounds cases I looked at the problem was hidden in some inlines
and it wasn't trivial to figure it out.

I wrote this patch for it at some point.


Print inline stack for warn access warnings

The warnings reported by gimple-ssa-warn-access often depend on the
caller with inlining, and when there are a lot of callers it can be
difficult to figure out which caller triggered a warning.

Print the function context including inline stack for these
warnings.

gcc/ChangeLog:

* gimple-ssa-warn-access.cc (maybe_inform_function): New
function to report function context.
(warn_string_no_nul): Use maybe_inform_function.
(maybe_warn_nonstring_arg): Dito.
(maybe_warn_for_bound): Dito.
(warn_for_access): Dito.
(check_access): Dito.
(warn_dealloc_offset): Dito.
(maybe_warn_alloc_args_overflow): Dito.
(pass_waccess::check_strncat): Dito.
(pass_waccess::maybe_check_access_sizes): Dito.
(pass_waccess::maybe_check_dealloc_call): Dito.
(pass_waccess::warn_invalid_pointer): Dito.
(maybe_warn_mismatched_realloc): Dito.
(pass_waccess::check_dangling_stores): Dito.
(pass_waccess::execute): Reset last_function variable.

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 61f9f0f3d310..94c043531988 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -125,6 +125,21 @@ call_arg (tree expr, unsigned argno)
   return CALL_EXPR_ARG (expr, argno);
 }
 
+/* Already printed inform for the function.  */
+static bool printed_function;
+
+/* Inform about the function stack unless warning is suppressed at LOC
+   with opt code OPT.  */
+static void
+maybe_inform_function (location_t loc, int opt)
+{
+  if (printed_function)
+return;
+  printed_function = true;
+  if (!warning_suppressed_at (loc, (opt_code)opt))
+inform (DECL_SOURCE_LOCATION (cfun->decl), "in function %qD", cfun->decl);
+}
+
 /* For a call EXPR at LOC to a function FNAME that expects a string
in the argument ARG, issue a diagnostic due to it being a called
with an argument that is a character array with no terminating
@@ -162,6 +177,8 @@ warn_string_no_nul (location_t loc, GimpleOrTree expr, 
const char *fname,
 
   auto_diagnostic_group d;
 
+  maybe_inform_function (loc, opt);
+
   const tree maxobjsize = max_object_size ();
   const wide_int maxsiz = wi::to_wide (maxobjsize);
   if (expr)
@@ -485,6 +502,7 @@ maybe_warn_nonstring_arg (tree fndecl, GimpleOrTree exp)
   if (tree_int_cst_lt (maxobjsize, bndrng[0]))
{
  bool warned = false;
+ maybe_inform_function (loc, OPT_Wstringop_overread);
  if (tree_int_cst_equal (bndrng[0], bndrng[1]))
warned = warning_at (loc, OPT_Wstringop_overread,
 "%qD specified bound %E "
@@ -638,6 +656,7 @@ maybe_warn_nonstring_arg (tree fndecl, GimpleOrTree exp)
   auto_diagnostic_group d;
   if (wi::ltu_p (asize, wibnd))
{
+ maybe_inform_function (loc, OPT_Wstringop_overread);
  if (bndrng[0] == bndrng[1])
warned = warning_at (loc, OPT_Wstringop_overread,
 "%qD argument %i declared attribute "
@@ -723,6 +742,7 @@ maybe_warn_for_bound (opt_code opt, location_t loc, 
GimpleOrTree exp, tree func,
   auto_diagnostic_group d;
   if (tree_int_cst_lt (maxobjsize, bndrng[0]))
{
+ maybe_inform_function (loc, opt);
  if (bndrng[0] == bndrng[1])
warned = (func
  ? warning_at (loc, opt,
@@ -760,7 +780,9 @@ maybe_warn_for_bound (opt_code opt, location_t loc, 
GimpleOrTree exp, tree func,
   else if (!size || tree_int_cst_le (bndrng[0], size))
return false;
   else if (tree_int_cst_equal (bndrng[0], bndrng[1]))
-   warned = (func
+   {
+ maybe_inform_function (loc, opt);
+ warned = (func
  ? warning_at (loc, opt,
(maybe
 ? G_("%qD specified bound %E may exceed "
@@ -775,8 +797,11 @@ maybe_warn_for_bound (opt_code opt, location_t loc, 
GimpleOrTree exp, tree func,
 : G_("specified bound %E exceeds "
  "source size %E")),
bndrng[0], size));
+   }
   else
-   warned = (func
+   {
+ maybe_inform_function (loc, opt);
+ warned = (func
  ? warning_at (loc, opt,
(maybe
 ? G_("%qD specified bound [%E, %E] may "
@@ -791,6 +816,7 @@ maybe

Re: [PATCH v2 3/3] Simplify switch bit test clustering algorithmg

2024-10-29 Thread Andi Kleen
> > However this exposes PR117352 which is a negative interaction of the
> > more aggressive bit test conversion.  I don't think it's a show stopper,
> > this can be sorted out later.
> 
> I think it is a show stopper for GCC 15 because it is a pretty big
> performance regression with targets that have ccmp (which now includes
> x86_64).

Okay I reverted it.

It showed a weakness in the new algorithm that it doesn't take range
comparisons into account. And yes the cost check probably needs
to be adjust to understand ccmp.

-Andi


Re: [PATCH v2 3/3] Simplify switch bit test clustering algorithm

2024-10-29 Thread Andi Kleen
On Tue, Oct 29, 2024 at 01:50:57PM +0100, Richard Biener wrote:
> On Mon, Oct 28, 2024 at 9:58 PM Andi Kleen  wrote:
> >
> > From: Andi Kleen 
> >
> > The current switch bit test clustering enumerates all possible case
> > clusters combinations to find ones that fit the bit test constrains
> > best.  This causes performance problems with very large switches.
> >
> > For bit test clustering which happens naturally in word sized chunks
> > I don't think such an expensive algorithm is really needed.
> >
> > This patch implements a simple greedy algorithm that walks
> > the sorted list and examines word sized windows and tries
> > to cluster them.
> >
> > Surprisingly the new algorithm gives consistly better clusters
> > for the examples I tried.
> >
> > For example from the gcc bootstrap:
> >
> > old: 0-15 16-31 96-175
> > new: 0-31 96-175
> >
> > I'm not fully sure why that is, probably some bug in the old
> > algorithm? This shows even up in the test suite where if-to-switch-6
> > now can generate a switch, as well as a case in switch-1.c
> >
> > I don't have a proof that the new algorithm is always as good or better,
> > but so far at least I don't see any counter examples.
> >
> > It also fixes the excessive compile time in PR117091,
> > however this was already fixed by an earlier patch
> > that doesn't run clustering when no targets have multiple
> > values.
> 
> OK if you add a comment (as part of the function comment for example)
> explaining the idea of the algorithm.


I added the comment.

I will commit it with this change. I also had to add a few more
-fno-bit-tests to make the Linaro tester happy.

However this exposes PR117352 which is a negative interaction of the 
more aggressive bit test conversion.  I don't think it's a show stopper,
this can be sorted out later.

-Andi


[PATCH v2 2/3] Only do switch bit test clustering when multiple labels point to same bb

2024-10-28 Thread Andi Kleen
From: Andi Kleen 

The bit cluster code generation strategy is only beneficial when
multiple case labels point to the same code. Do a quick check if
that is the case before trying to cluster.

This fixes the switch part of PR117091 where all case labels are unique
however it doesn't address the performance problems for non unique
cases.

gcc/ChangeLog:

PR middle-end/117091
* gimple-if-to-switch.cc (if_chain::is_beneficial): Update
find_bit_test call.
* tree-switch-conversion.cc (bit_test_cluster::find_bit_tests):
Get max_c argument and bail out early if all case labels are
unique.
(switch_decision_tree::compute_cases_per_edge): Record number of
targets per label and return.
(switch_decision_tree::analyze_switch_statement): ... pass to
find_bit_tests.
* tree-switch-conversion.h: Update prototypes.
---
 gcc/gimple-if-to-switch.cc|  2 +-
 gcc/tree-switch-conversion.cc | 23 ---
 gcc/tree-switch-conversion.h  |  5 +++--
 3 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/gcc/gimple-if-to-switch.cc b/gcc/gimple-if-to-switch.cc
index 96ce1c380a59..4151d1bb520e 100644
--- a/gcc/gimple-if-to-switch.cc
+++ b/gcc/gimple-if-to-switch.cc
@@ -254,7 +254,7 @@ if_chain::is_beneficial ()
   else
 output.release ();
 
-  output = bit_test_cluster::find_bit_tests (filtered_clusters);
+  output = bit_test_cluster::find_bit_tests (filtered_clusters, 2);
   r = output.length () < filtered_clusters.length ();
   if (r)
 dump_clusters (&output, "BT can be built");
diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index 00426d46..3436c2a8b98c 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -1772,12 +1772,13 @@ jump_table_cluster::is_beneficial (const vec 
&,
 }
 
 /* Find bit tests of given CLUSTERS, where all members of the vector
-   are of type simple_cluster.  New clusters are returned.  */
+   are of type simple_cluster.   MAX_C is the approx max number of cases per
+   label.  New clusters are returned.  */
 
 vec
-bit_test_cluster::find_bit_tests (vec &clusters)
+bit_test_cluster::find_bit_tests (vec &clusters, int max_c)
 {
-  if (!is_enabled ())
+  if (!is_enabled () || max_c == 1)
 return clusters.copy ();
 
   unsigned l = clusters.length ();
@@ -2206,18 +2207,26 @@ bit_test_cluster::hoist_edge_and_branch_if_true 
(gimple_stmt_iterator *gsip,
 }
 
 /* Compute the number of case labels that correspond to each outgoing edge of
-   switch statement.  Record this information in the aux field of the edge.  */
+   switch statement.  Record this information in the aux field of the edge.
+   Return the approx max number of cases per edge.  */
 
-void
+int
 switch_decision_tree::compute_cases_per_edge ()
 {
+  int max_c = 0;
   reset_out_edges_aux (m_switch);
   int ncases = gimple_switch_num_labels (m_switch);
   for (int i = ncases - 1; i >= 1; --i)
 {
   edge case_edge = gimple_switch_edge (cfun, m_switch, i);
   case_edge->aux = (void *) ((intptr_t) (case_edge->aux) + 1);
+  /* For a range case add one extra. That's enough for the bit
+cluster heuristic.  */
+  if ((intptr_t)case_edge->aux > max_c)
+   max_c = (intptr_t)case_edge->aux +
+   !!CASE_HIGH (gimple_switch_label (m_switch, i));
 }
+  return max_c;
 }
 
 /* Analyze switch statement and return true when the statement is expanded
@@ -2235,7 +2244,7 @@ switch_decision_tree::analyze_switch_statement ()
   m_case_bbs.reserve (l);
   m_case_bbs.quick_push (default_bb);
 
-  compute_cases_per_edge ();
+  int max_c = compute_cases_per_edge ();
 
   for (unsigned i = 1; i < l; i++)
 {
@@ -2256,7 +2265,7 @@ switch_decision_tree::analyze_switch_statement ()
   reset_out_edges_aux (m_switch);
 
   /* Find bit-test clusters.  */
-  vec output = bit_test_cluster::find_bit_tests (clusters);
+  vec output = bit_test_cluster::find_bit_tests (clusters, max_c);
 
   /* Find jump table clusters.  */
   vec output2;
diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
index 6468995eb316..e6a85fa60258 100644
--- a/gcc/tree-switch-conversion.h
+++ b/gcc/tree-switch-conversion.h
@@ -399,7 +399,7 @@ public:
 
   /* Find bit tests of given CLUSTERS, where all members of the vector
  are of type simple_cluster.  New clusters are returned.  */
-  static vec find_bit_tests (vec &clusters);
+  static vec find_bit_tests (vec &clusters, int max_c);
 
   /* Return true when RANGE of case values with UNIQ labels
  can build a bit test.  */
@@ -576,8 +576,9 @@ public:
   bool try_switch_expansion (vec &clusters);
   /* Compute the number of case labels that correspond to each outgoing edge of
  switch statement.  Record this information in the aux field of the edge.
+ Returns approx max number of cases per edge.
  */
-  void compute_ca

[PATCH v2 3/3] Simplify switch bit test clustering algorithm

2024-10-28 Thread Andi Kleen
From: Andi Kleen 

The current switch bit test clustering enumerates all possible case
clusters combinations to find ones that fit the bit test constrains
best.  This causes performance problems with very large switches.

For bit test clustering which happens naturally in word sized chunks
I don't think such an expensive algorithm is really needed.

This patch implements a simple greedy algorithm that walks
the sorted list and examines word sized windows and tries
to cluster them.

Surprisingly the new algorithm gives consistly better clusters
for the examples I tried.

For example from the gcc bootstrap:

old: 0-15 16-31 96-175
new: 0-31 96-175

I'm not fully sure why that is, probably some bug in the old
algorithm? This shows even up in the test suite where if-to-switch-6
now can generate a switch, as well as a case in switch-1.c

I don't have a proof that the new algorithm is always as good or better,
but so far at least I don't see any counter examples.

It also fixes the excessive compile time in PR117091,
however this was already fixed by an earlier patch
that doesn't run clustering when no targets have multiple
values.

gcc/ChangeLog:

PR middle-end/117091
* tree-switch-conversion.cc (bit_test_cluster::find_bit_tests):
Change clustering algorithm to simple greedy.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/if-to-switch-6.c: Allow condition chain.
* gcc.dg/tree-ssa/switch-1.c: Allow more bit tests.
---
 .../gcc.dg/tree-ssa/if-to-switch-6.c  |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/switch-1.c  |  2 +-
 gcc/tree-switch-conversion.cc | 76 ++-
 3 files changed, 42 insertions(+), 38 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-6.c 
b/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-6.c
index b1640673eae1..657af770e438 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-6.c
@@ -39,4 +39,4 @@ int main(int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-not "Condition chain" "iftoswitch" } } */
+/* { dg-final { scan-tree-dump "Condition chain" "iftoswitch" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/switch-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/switch-1.c
index 6f70c9de0c19..f1654aba6d99 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/switch-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/switch-1.c
@@ -107,4 +107,4 @@ int foo5 (int x)
   }
 }
 
-/* { dg-final { scan-tree-dump ";; GIMPLE switch case clusters: BT:10-62 
600-700 JT:1000-1021 11" "switchlower1" } } */
+/* { dg-final { scan-tree-dump ";; GIMPLE switch case clusters: BT:10-62 
600-700 BT:1000-1021 11" "switchlower1" } } */
diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index 3436c2a8b98c..b7736a9853d9 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -1782,55 +1782,59 @@ bit_test_cluster::find_bit_tests (vec 
&clusters, int max_c)
 return clusters.copy ();
 
   unsigned l = clusters.length ();
-  auto_vec min;
-  min.reserve (l + 1);
+  vec output;
 
-  min.quick_push (min_cluster_item (0, 0, 0));
+  output.create (l);
 
-  for (unsigned i = 1; i <= l; i++)
+  unsigned end;
+  for (unsigned i = 0; i < l; i += end)
 {
-  /* Set minimal # of clusters with i-th item to infinite.  */
-  min.quick_push (min_cluster_item (INT_MAX, INT_MAX, INT_MAX));
+  HOST_WIDE_INT values = 0;
+  hash_set targets;
+  cluster *start_cluster = clusters[i];
 
-  for (unsigned j = 0; j < i; j++)
+  end = 0;
+  while (i + end < l)
{
- if (min[j].m_count + 1 < min[i].m_count
- && can_be_handled (clusters, j, i - 1))
-   min[i] = min_cluster_item (min[j].m_count + 1, j, INT_MAX);
+ cluster *end_cluster = clusters[i + end];
+
+ /* Does value range fit into the BITS_PER_WORD window?  */
+ HOST_WIDE_INT w = cluster::get_range (start_cluster->get_low (),
+   end_cluster->get_high ());
+ if (w == 0 || w > BITS_PER_WORD)
+   break;
+
+ /* Compute # of values tested for new case.  */
+ HOST_WIDE_INT r = 1;
+ if (!end_cluster->is_single_value_p ())
+   r = cluster::get_range (end_cluster->get_low (),
+   end_cluster->get_high ());
+ if (r == 0)
+   break;
+
+ /* Check for max # of targets.  */
+ if (targets.elements() == m_max_case_bit_tests
+ && !targets.contains (end_cluster->m_case_bb))
+   break;
+
+ targets.add (end_cluster->m_case_bb);
+ values += r;
+ end++;
}
 
-  gcc_checking_assert (min[i].m_count != INT_MAX);
-}
-
-  /* No result.  */
-  if (min[l].m_count == 

[PATCH v2 1/3] Disable -fbit-tests and -fjump-tables at -O0

2024-10-28 Thread Andi Kleen
From: Andi Kleen 

gcc/ChangeLog:

* common.opt: Enable -fbit-tests and -fjump-tables only at -O1.
* opts.cc (default_options_table): Dito.
---
 gcc/common.opt | 4 ++--
 gcc/opts.cc| 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 12b25ff486de..70a22cdc71a4 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2189,11 +2189,11 @@ Common Var(flag_ivopts) Init(1) Optimization
 Optimize induction variables on trees.
 
 fjump-tables
-Common Var(flag_jump_tables) Init(1) Optimization
+Common Var(flag_jump_tables) Init(0) Optimization
 Use jump tables for sufficiently large switch statements.
 
 fbit-tests
-Common Var(flag_bit_tests) Init(1) Optimization
+Common Var(flag_bit_tests) Init(0) Optimization
 Use bit tests for sufficiently large switch statements.
 
 fkeep-inline-functions
diff --git a/gcc/opts.cc b/gcc/opts.cc
index acd53befdbfc..7adc495a7c2a 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -610,6 +610,7 @@ static const struct default_options default_options_table[] 
=
 { OPT_LEVELS_1_PLUS, OPT_fvar_tracking, NULL, 1 },
 
 /* -O1 (and not -Og) optimizations.  */
+{ OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fbit_tests, NULL, 1 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fbranch_count_reg, NULL, 1 },
 #if DELAY_SLOTS
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fdelayed_branch, NULL, 1 },
@@ -618,6 +619,7 @@ static const struct default_options default_options_table[] 
=
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fif_conversion, NULL, 1 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fif_conversion2, NULL, 1 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 
},
+{ OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fjump_tables, NULL, 1 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fmove_loop_invariants, NULL, 1 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fmove_loop_stores, NULL, 1 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fssa_phiopt, NULL, 1 },
-- 
2.46.2



[PATCH 2/2] Only do switch bit test clustering when multiple labels point to same bb

2024-10-16 Thread Andi Kleen
From: Andi Kleen 

The bit cluster code generation strategy is only beneficial when
multiple case labels point to the same code. Do a quick check if
that is the case before trying to cluster.

This fixes the switch part of PR117091 where all case labels are unique
however it doesn't address the performance problems for non unique
cases.

gcc/ChangeLog:

PR middle-end/117091
* gimple-if-to-switch.cc (if_chain::is_beneficial): Update
find_bit_test call.
* tree-switch-conversion.cc (bit_test_cluster::find_bit_tests):
Get max_c argument and bail out early if all case labels are
unique.
(switch_decision_tree::compute_cases_per_edge): Record number of
targets per label and return.
(switch_decision_tree::analyze_switch_statement): ... pass to
find_bit_tests.
* tree-switch-conversion.h: Update prototypes.
---
 gcc/gimple-if-to-switch.cc|  2 +-
 gcc/tree-switch-conversion.cc | 23 ---
 gcc/tree-switch-conversion.h  |  5 +++--
 3 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/gcc/gimple-if-to-switch.cc b/gcc/gimple-if-to-switch.cc
index 96ce1c380a59..4151d1bb520e 100644
--- a/gcc/gimple-if-to-switch.cc
+++ b/gcc/gimple-if-to-switch.cc
@@ -254,7 +254,7 @@ if_chain::is_beneficial ()
   else
 output.release ();
 
-  output = bit_test_cluster::find_bit_tests (filtered_clusters);
+  output = bit_test_cluster::find_bit_tests (filtered_clusters, 2);
   r = output.length () < filtered_clusters.length ();
   if (r)
 dump_clusters (&output, "BT can be built");
diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index 00426d46..bb7b8cf215a3 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -1772,12 +1772,13 @@ jump_table_cluster::is_beneficial (const vec 
&,
 }
 
 /* Find bit tests of given CLUSTERS, where all members of the vector
-   are of type simple_cluster.  New clusters are returned.  */
+   are of type simple_cluster. max_c is the max number of cases per label.
+   New clusters are returned.  */
 
 vec
-bit_test_cluster::find_bit_tests (vec &clusters)
+bit_test_cluster::find_bit_tests (vec &clusters, int max_c)
 {
-  if (!is_enabled ())
+  if (!is_enabled () || max_c == 1)
 return clusters.copy ();
 
   unsigned l = clusters.length ();
@@ -2206,18 +2207,26 @@ bit_test_cluster::hoist_edge_and_branch_if_true 
(gimple_stmt_iterator *gsip,
 }
 
 /* Compute the number of case labels that correspond to each outgoing edge of
-   switch statement.  Record this information in the aux field of the edge.  */
+   switch statement.  Record this information in the aux field of the edge.
+   Return the approx max number of cases per edge.  */
 
-void
+int
 switch_decision_tree::compute_cases_per_edge ()
 {
+  int max_c = 0;
   reset_out_edges_aux (m_switch);
   int ncases = gimple_switch_num_labels (m_switch);
   for (int i = ncases - 1; i >= 1; --i)
 {
   edge case_edge = gimple_switch_edge (cfun, m_switch, i);
   case_edge->aux = (void *) ((intptr_t) (case_edge->aux) + 1);
+  /* For a range case add one extra. That's enough for the bit
+cluster heuristic.  */
+  if ((intptr_t)case_edge->aux > max_c)
+   max_c = (intptr_t)case_edge->aux +
+   !!CASE_HIGH (gimple_switch_label (m_switch, i));
 }
+  return max_c;
 }
 
 /* Analyze switch statement and return true when the statement is expanded
@@ -2235,7 +2244,7 @@ switch_decision_tree::analyze_switch_statement ()
   m_case_bbs.reserve (l);
   m_case_bbs.quick_push (default_bb);
 
-  compute_cases_per_edge ();
+  int max_c = compute_cases_per_edge ();
 
   for (unsigned i = 1; i < l; i++)
 {
@@ -2256,7 +2265,7 @@ switch_decision_tree::analyze_switch_statement ()
   reset_out_edges_aux (m_switch);
 
   /* Find bit-test clusters.  */
-  vec output = bit_test_cluster::find_bit_tests (clusters);
+  vec output = bit_test_cluster::find_bit_tests (clusters, max_c);
 
   /* Find jump table clusters.  */
   vec output2;
diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
index fbfd7ff7b3ff..15f919f24f9f 100644
--- a/gcc/tree-switch-conversion.h
+++ b/gcc/tree-switch-conversion.h
@@ -399,7 +399,7 @@ public:
 
   /* Find bit tests of given CLUSTERS, where all members of the vector
  are of type simple_cluster.  New clusters are returned.  */
-  static vec find_bit_tests (vec &clusters);
+  static vec find_bit_tests (vec &clusters, int max_c);
 
   /* Return true when RANGE of case values with UNIQ labels
  can build a bit test.  */
@@ -577,8 +577,9 @@ public:
   bool try_switch_expansion (vec &clusters);
   /* Compute the number of case labels that correspond to each outgoing edge of
  switch statement.  Record this information in the aux field of the edge.
+ Returns max number of cases per edge.
  */
-  void compute_cases_per_edge ()

[PATCH 1/2] Disable -fbit-tests and -fjump-tables at -O0

2024-10-16 Thread Andi Kleen
From: Andi Kleen 

gcc/ChangeLog:

* common.opt: Enable -fbit-tests and -fjump-tables only at -O1.
* tree-switch-conversion.h (jump_table_cluster::is_enabled):
  Dito.
---
 gcc/common.opt   | 4 ++--
 gcc/tree-switch-conversion.h | 5 +++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 12b25ff486de..4af7a94fea42 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2189,11 +2189,11 @@ Common Var(flag_ivopts) Init(1) Optimization
 Optimize induction variables on trees.
 
 fjump-tables
-Common Var(flag_jump_tables) Init(1) Optimization
+Common Var(flag_jump_tables) Init(-1) Optimization
 Use jump tables for sufficiently large switch statements.
 
 fbit-tests
-Common Var(flag_bit_tests) Init(1) Optimization
+Common Var(flag_bit_tests) Init(-1) Optimization
 Use bit tests for sufficiently large switch statements.
 
 fkeep-inline-functions
diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
index 6468995eb316..fbfd7ff7b3ff 100644
--- a/gcc/tree-switch-conversion.h
+++ b/gcc/tree-switch-conversion.h
@@ -442,7 +442,7 @@ public:
   /* Return whether bit test expansion is allowed.  */
   static inline bool is_enabled (void)
   {
-return flag_bit_tests;
+return flag_bit_tests >= 0 ? flag_bit_tests : (optimize >= 1);
   }
 
   /* True when the jump table handles an entire switch statement.  */
@@ -524,7 +524,8 @@ bool jump_table_cluster::is_enabled (void)
  over-ruled us, we really have no choice.  */
   if (!targetm.have_casesi () && !targetm.have_tablejump ())
 return false;
-  if (!flag_jump_tables)
+  int flag = flag_jump_tables >= 0 ? flag_jump_tables : (optimize >= 1);
+  if (!flag)
 return false;
 #ifndef ASM_OUTPUT_ADDR_DIFF_ELT
   if (flag_pic)
-- 
2.46.2



[PATCH] PR116510: Add missing fold_converts into tree switch if conversion

2024-10-15 Thread Andi Kleen
From: Andi Kleen 

Passes test suite. Ok to commit?

gcc/ChangeLog:

PR middle-end/116510
* tree-if-conv.cc (predicate_bbs): Add missing fold_converts.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-switch-ifcvt-3.c: New test.
---
 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-3.c | 12 
 gcc/tree-if-conv.cc |  9 ++---
 2 files changed, 18 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-3.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-3.c 
b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-3.c
new file mode 100644
index ..41bc8a1cf129
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+// PR116510
+
+char excmap_def_0;
+int gg_strescape_i;
+void gg_strescape() {
+  for (; gg_strescape_i; gg_strescape_i++)
+switch ((unsigned char)gg_strescape_i)
+case '\\':
+case '"':
+  excmap_def_0 = 0;
+}
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 90c754a48147..376a4642954d 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1477,10 +1477,12 @@ predicate_bbs (loop_p loop)
{
  tree low = build2_loc (loc, GE_EXPR,
 boolean_type_node,
-index, CASE_LOW (label));
+index, fold_convert_loc (loc, 
TREE_TYPE (index),
+CASE_LOW (label)));
  tree high = build2_loc (loc, LE_EXPR,
  boolean_type_node,
- index, CASE_HIGH (label));
+ index, fold_convert_loc (loc, 
TREE_TYPE (index),
+ CASE_HIGH (label)));
  case_cond = build2_loc (loc, TRUTH_AND_EXPR,
  boolean_type_node,
  low, high);
@@ -1489,7 +1491,8 @@ predicate_bbs (loop_p loop)
case_cond = build2_loc (loc, EQ_EXPR,
boolean_type_node,
index,
-   CASE_LOW (gimple_switch_label (sw, i)));
+   fold_convert_loc (loc, TREE_TYPE 
(index),
+ CASE_LOW (label)));
  if (i > 1)
switch_cond = build2_loc (loc, TRUTH_OR_EXPR,
  boolean_type_node,
-- 
2.46.2



[PATCH v3] Remove sys/user time in -ftime-report

2024-10-09 Thread Andi Kleen
From: Andi Kleen 

Retrieving sys/user time in timevars is quite expensive because it
always needs a system call. Only getting the wall time is much
cheaper because operating systems have optimized paths for this.

The sys time isn't that interesting for a compiler and wall time
is usually close to user time except when the system is overloaded.
On the other hand when it is not wall time is more accurate because
it has less overhead.

For building tramp3d with -O0 the -ftime-report overhead drops from
18% to 3%. For -O2 it drops from 8% to not measurable.

I changed the code to use gettimeofday as a fallback for clock_gettime
CLOCK_MONOTONIC.  If a host has neither of those the time will not
be measured. Previously clock was the fallback.

This removes a lot of code in timevar.cc:

 gcc/timevar.cc | 167 ++---
 gcc/timevar.h  |  10 +---

 2 files changed, 17 insertions(+), 160 deletions(-)

Bootstrapped on x86_64-linux with full test suite run.

gcc/ChangeLog:

* timevar.cc (struct tms): Remove.
(RUSAGE_SELF): Remove.
(TICKS_PER_SECOND): Remove.
(USE_TIMES): Remove.
(HAVE_USER_TIME): Remove.
(HAVE_SYS_TIME): Remove.
(HAVE_WALL_TIME): Remove.
(USE_GETRUSAGE): Remove.
(USE_CLOCK): Remove.
(NANOSEC_PER_SEC): Remove.
(TICKS_TO_NANOSEC): Remove.
(CLOCKS_TO_NANOSEC): Remove.
(timer::named_items::push): Remove sys/user.
(get_time): Remove clock and times and getruage code.
(timevar_accumulate): Remove sys/user.
(timevar_diff): Dito.
(timer::validate_phases): Dito.
(timer::print_row): Dito.
(timer::all_zero): Dito.
(timer::print): Dito.
(make_json_for_timevar_time_def): Dito.
* timevar.h (struct timevar_time_def): Dito.

---

v2: Adjust JSON/Sarif output too.
v3: Make unconditional.
---
 gcc/timevar.cc | 189 ++---
 gcc/timevar.h  |  10 +--
 2 files changed, 22 insertions(+), 177 deletions(-)

diff --git a/gcc/timevar.cc b/gcc/timevar.cc
index 68bcf44864f9..4a57e74230d3 100644
--- a/gcc/timevar.cc
+++ b/gcc/timevar.cc
@@ -26,84 +26,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "options.h"
 #include "json.h"
 
-#ifndef HAVE_CLOCK_T
-typedef int clock_t;
-#endif
-
-#ifndef HAVE_STRUCT_TMS
-struct tms
-{
-  clock_t tms_utime;
-  clock_t tms_stime;
-  clock_t tms_cutime;
-  clock_t tms_cstime;
-};
-#endif
-
-#ifndef RUSAGE_SELF
-# define RUSAGE_SELF 0
-#endif
-
-/* Calculation of scale factor to convert ticks to seconds.
-   We mustn't use CLOCKS_PER_SEC except with clock().  */
-#if HAVE_SYSCONF && defined _SC_CLK_TCK
-# define TICKS_PER_SECOND sysconf (_SC_CLK_TCK) /* POSIX 1003.1-1996 */
-#else
-# ifdef CLK_TCK
-#  define TICKS_PER_SECOND CLK_TCK /* POSIX 1003.1-1988; obsolescent */
-# else
-#  ifdef HZ
-#   define TICKS_PER_SECOND HZ  /* traditional UNIX */
-#  else
-#   define TICKS_PER_SECOND 100 /* often the correct value */
-#  endif
-# endif
-#endif
-
-/* Prefer times to getrusage to clock (each gives successively less
-   information).  */
-#ifdef HAVE_TIMES
-# if defined HAVE_DECL_TIMES && !HAVE_DECL_TIMES
-  extern clock_t times (struct tms *);
-# endif
-# define USE_TIMES
-# define HAVE_USER_TIME
-# define HAVE_SYS_TIME
-# define HAVE_WALL_TIME
-#else
-#ifdef HAVE_GETRUSAGE
-# if defined HAVE_DECL_GETRUSAGE && !HAVE_DECL_GETRUSAGE
-  extern int getrusage (int, struct rusage *);
-# endif
-# define USE_GETRUSAGE
-# define HAVE_USER_TIME
-# define HAVE_SYS_TIME
-#else
-#ifdef HAVE_CLOCK
-# if defined HAVE_DECL_CLOCK && !HAVE_DECL_CLOCK
-  extern clock_t clock (void);
-# endif
-# define USE_CLOCK
-# define HAVE_USER_TIME
-#endif
-#endif
-#endif
-
-/* libc is very likely to have snuck a call to sysconf() into one of
-   the underlying constants, and that can be very slow, so we have to
-   precompute them.  Whose wonderful idea was it to make all those
-   _constants_ variable at run time, anyway?  */
-#define NANOSEC_PER_SEC 10
-#ifdef USE_TIMES
-static uint64_t ticks_to_nanosec;
-#define TICKS_TO_NANOSEC (NANOSEC_PER_SEC / TICKS_PER_SECOND)
-#endif
-
-#ifdef USE_CLOCK
-static uint64_t clocks_to_nanosec;
-#define CLOCKS_TO_NANOSEC (NANOSEC_PER_SEC / CLOCKS_PER_SEC)
-#endif
-
 /* Non-NULL if timevars should be used.  In GCC, this happens with
the -ftime-report flag.  */
 
@@ -181,8 +103,6 @@ timer::named_items::push (const char *item_name)
   timer::timevar_def *def = &m_hash_map.get_or_insert (item_name, &existed);
   if (!existed)
 {
-  def->elapsed.user = 0;
-  def->elapsed.sys = 0;
   def->elapsed.wall = 0;
   def->name = item_name;
   def->standalone = 0;
@@ -230,37 +150,27 @@ timer::named_items::make_json () const
   return arr;
 }
 
-/* Fill the current times into TIME.  The definition o

Re: [PATCH v2] Add -ftime-report-wall

2024-10-09 Thread Andi Kleen
> So, shouldn't we go without the new option and simply change
> -ftime-report behavior?

I think it's fine (given the constraints I outlined earlier).
It will slightly change the output, but I guess there aren't that many
users that parse it mechanically.

I can do that unless someoneelse objects.

-Andi


[PATCH v2] Add -ftime-report-wall

2024-10-05 Thread Andi Kleen
From: Andi Kleen 

Time vars normally use times(2) to get the user/sys/wall time, which is always a
system call. I don't think the system time is very useful because most overhead
is in user time. If we only use the wall (or monotonic) time modern OS have an
optimized path to get it directly from a CPU instruction like RDTSC
without system call, which is much faster.

Add a -ftime-report-wall option. It actually uses the POSIX monotonic time,
so strictly it's not wall clock, but it's still a reasonable name.

Comparing the overhead with tramp3d -O0:

  ./gcc/cc1plus -quiet  ../tsrc/tramp3d-v4.i ran
1.03 ± 0.00 times faster than ./gcc/cc1plus -quiet -ftime-report-wall 
../tsrc/tramp3d-v4.i
1.18 ± 0.00 times faster than ./gcc/cc1plus -quiet -ftime-report 
../tsrc/tramp3d-v4.i

-ftime-report costs 18% (excluding the output), while -ftime-report-wall
only costs 3%, so is nearly free. So it would be feasible for some build
system to always enable it and break down the build time into passes.

With -O2 it is a bit less pronounced but still visible:

  ./gcc/cc1plus -O2 -quiet  ../tsrc/tramp3d-v4.i ran
1.00 ± 0.00 times faster than ./gcc/cc1plus -O2 -quiet -ftime-report-wall 
../tsrc/tramp3d-v4.i
1.08 ± 0.01 times faster than ./gcc/cc1plus -O2 -quiet -ftime-report 
../tsrc/tramp3d-v4.i

The drawback is that if there is context switching with other programs
the time will be overestimated, however for the common case that the
system is not oversubscribed it is more accurate because each
measurement has less overhead.

Bootstrapped on x86_64-linux with full test suite run.

gcc/ChangeLog:

* common.opt (ftime-report-wall): Add.
* common.opt.urls: Regenerate.
* doc/invoke.texi: (ftime-report-wall): Document
* gcc.cc (try_generate_repro): Check for -ftime-report-wall.
* timevar.cc (get_time): Use clock_gettime if enabled.
(timer::print): Print only wall time for time_report_wall.
(make_json_for_timevar_time_def): Dito.
* toplev.cc (toplev::start_timevars): Check for time_report_wall.

gcc/testsuite/ChangeLog:

* g++.dg/ext/timevar3.C: New test.

---

v2: Adjust JSON/Sarif output too.
---
 gcc/common.opt  |  4 +++
 gcc/common.opt.urls |  3 +++
 gcc/doc/invoke.texi |  7 ++
 gcc/gcc.cc  |  3 ++-
 gcc/testsuite/g++.dg/ext/timevar3.C | 14 +++
 gcc/timevar.cc  | 38 +++--
 gcc/toplev.cc   |  3 ++-
 7 files changed, 62 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/timevar3.C

diff --git a/gcc/common.opt b/gcc/common.opt
index 12b25ff486de..a200a8a0bc45 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3014,6 +3014,10 @@ ftime-report
 Common Var(time_report)
 Report the time taken by each compiler pass.
 
+ftime-report-wall
+Common Var(time_report_wall)
+Report the wall time taken by each compiler.
+
 ftime-report-details
 Common Var(time_report_details)
 Record times taken by sub-phases separately.
diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index e31736cd9945..6e79a8f9390b 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -1378,6 +1378,9 @@ UrlSuffix(gcc/Optimize-Options.html#index-fthread-jumps)
 ftime-report
 UrlSuffix(gcc/Developer-Options.html#index-ftime-report)
 
+ftime-report-wall
+UrlSuffix(gcc/Developer-Options.html#index-ftime-report-wall)
+
 ftime-report-details
 UrlSuffix(gcc/Developer-Options.html#index-ftime-report-details)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d38c1feb86f7..8c11d12e7521 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -784,6 +784,7 @@ Objective-C and Objective-C++ Dialects}.
 -frandom-seed=@var{string}  -fsched-verbose=@var{n}
 -fsel-sched-verbose  -fsel-sched-dump-cfg  -fsel-sched-pipelining-verbose
 -fstats  -fstack-usage  -ftime-report  -ftime-report-details
+-ftime-report-wall
 -fvar-tracking-assignments-toggle  -gtoggle
 -print-file-name=@var{library}  -print-libgcc-file-name
 -print-multi-directory  -print-multi-lib  -print-multi-os-directory
@@ -21048,6 +21049,12 @@ slightly different place within the compiler.
 @item -ftime-report-details
 Record the time consumed by infrastructure parts separately for each pass.
 
+@opindex ftime-report-wall
+@item -ftime-report-wall
+Report statistics about compiler pass time consumpion, but only using wall
+time.  This is faster than @option{-ftime-report}, but can be more
+influenced by background jobs.
+
 @opindex fira-verbose
 @item -fira-verbose=@var{n}
 Control the verbosity of the dump file for the integrated register allocator.
diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 16fed46fb35f..8d3046eb7874 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -7964,7 +7964,8 @@ try_generate_repro (const char **argv)
it might varry between invocations.  */
 else if (! strcmp (argv[nargs], "-quiet

Re: [PATCH v1] Add -ftime-report-wall

2024-10-03 Thread Andi Kleen
> The only consumer I know of for the JSON time report data is in the
> integration tests I wrote for -fanalyzer, which assumes that all fields
> are present when printing, and then goes on to use the "user" times for
> summarizing; see this commit FWIW:
> https://github.com/davidmalcolm/gcc-analyzer-integration-tests/commit/5420ce968e6eae886e61486555b54fd460e0d35f

It seems to be broken even without my changes:


% ./gcc/cc1plus -ftime-report -fdiagnostics-format=sarif-file 
../tsrc/tramp3d-v4.i
cc1plus: internal compiler error: Segmentation fault
0x27206ee internal_error(char const*, ...)
../../gcc/gcc/diagnostic-global-context.cc:517
0x133401f crash_signal
../../gcc/gcc/toplev.cc:321
0x27e7934 htab_hash_string
../../gcc/libiberty/hashtab.c:838
0x2715dde string_hash::hash(char const*)
../../gcc/gcc/hash-traits.h:239
0x2715dde simple_hashmap_traits, 
sarif_artifact*>::hash(char const* const&)
../../gcc/gcc/hash-map-traits.h:50
0x2715dde hash_map, sarif_artifact*> 
>::get(char const* const&)
../../gcc/gcc/hash-map.h:191
0x2715dde ordered_hash_map, sarif_artifact*> 
>::get(char const* const&)
../../gcc/gcc/ordered-hash-map.h:76
0x2715dde sarif_builder::get_or_create_artifact(char const*, 
diagnostic_artifact_role, bool)
../../gcc/gcc/diagnostic-format-sarif.cc:2892
0x2716403 sarif_output_format::sarif_output_format(diagnostic_context&, 
line_maps const*, char const*, bool)
../../gcc/gcc/diagnostic-format-sarif.cc:3154
0x2716403 
sarif_file_output_format::sarif_file_output_format(diagnostic_context&, 
line_maps const*, char const*, bool, char const*)
../../gcc/gcc/diagnostic-format-sarif.cc:3193
0x2716403 std::enable_if::value, 
std::unique_ptr > >::type 
make_unique(diagnostic_context&, line_maps const*&, char 
const*&, bool&, char const*&)
../../gcc/gcc/make-unique.h:41
0x2716403 diagnostic_output_format_init_sarif_file(diagnostic_context&, 
line_maps const*, char const*, bool, char const*)
../../gcc/gcc/diagnostic-format-sarif.cc:3392
0x26f0522 common_handle_option(gcc_options*, gcc_options*, cl_decoded_option 
const*, unsigned int, int, unsigned int, cl_option_handlers const*, 
diagnostic_context*, void (*)())
../../gcc/gcc/opts.cc:2968
0x26f5728 handle_option
../../gcc/gcc/opts-common.cc:1316
0x26f585e read_cmdline_option(gcc_options*, gcc_options*, cl_decoded_option*, 
unsigned int, unsigned int, cl_option_handlers const*, diagnostic_context*)
../../gcc/gcc/opts-common.cc:1646
0x120f194 read_cmdline_options
../../gcc/gcc/opts-global.cc:242
0x120f194 decode_options(gcc_options*, gcc_options*, cl_decoded_option*, 
unsigned int, unsigned int, diagnostic_context*, void (*)())
../../gcc/gcc/opts-global.cc:329
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.


Re: [PATCH] testsuite: Fix tail_call and musttail effective targets [PR116080]

2024-10-03 Thread Andi Kleen
On Thu, Oct 03, 2024 at 01:48:35PM +, Christophe Lyon wrote:
> Some of the musttail tests (eg musttail7.c) fail on arm-eabi because
> check_effective_target_musttail pass, but the actual code in the test
> is rejected.
 
Looks good to me. Thanks.

-Andi


Re: [PATCH v1] Add -ftime-report-wall

2024-10-03 Thread Andi Kleen
> Note that if the user requests SARIF output e.g. with
>   -fdiagnostics-format=sarif-stderr
> then any timevar data from -ftime-report is written in JSON form as
> part of the SARIF, rather than in text form to stderr (see
> 75d623946d4b6ea80a777b789b116d4b4a2298dc).
> 
> I see that the proposed patch leaves the user and sys stats as zero,
> and conditionalizes what's printed for text output as part of
> timer::print.  Should it also do something similar in
> make_json_for_timevar_time_def for the json output, and not add the
> properties for "user" and "sys" if the data hasn't been gathered?

> Hope I'm reading the patch correctly.

Yes that's right.

I mainly adjusted the human output for cosmetic reasons.

For machine readable i guess it is better to have a stable schema 
and not skip fields to avoid pain for parsers. So I left it alone.

-Andi


[PATCH v1] Add -ftime-report-wall

2024-10-02 Thread Andi Kleen
From: Andi Kleen 

Time vars normally use times(2) to get the user/sys/wall time, which is always a
system call. I don't think the system time is very useful because most overhead
is in user time. If we only use the wall (or monotonic) time modern OS have an
optimized path to get it directly from a CPU instruction like RDTSC
without system call, which is much faster.

Comparing the overhead with tramp3d:

  ./gcc/cc1plus -quiet  ../tsrc/tramp3d-v4.i ran
1.03 ± 0.00 times faster than ./gcc/cc1plus -quiet -ftime-report-wall 
../tsrc/tramp3d-v4.i
1.18 ± 0.00 times faster than ./gcc/cc1plus -quiet -ftime-report 
../tsrc/tramp3d-v4.i

-ftime-report costs 18% (excluding the output), while -ftime-report-wall
only costs 3%, so is nearly free. So it would be feasible for some build
system to always enable it and break down the build time into passes.

The drawback is that if there is context switching with other programs
the time will be overestimated, however for the common case that the
system is not oversubscribed it is more accurate because each
measurement has less overhead.

Add a -ftime-report-wall option. It actually uses the POSIX monotonic time,
so strictly it's not wall clock, but it's still a reasonable name.

Bootstrapped on x86_64-linux with full test suite run.

gcc/ChangeLog:

* common.opt (ftime-report-wall): Add.
* common.opt.urls: Regenerate.
* doc/invoke.texi: (ftime-report-wall): Document
* gcc.cc (try_generate_repro): Check for -ftime-report-wall.
* timevar.cc (get_time): Use clock_gettime if enabled.
(timer::print): Print only wall time for time_report_wall.
* toplev.cc (toplev::start_timevars): Check for time_report_wall.

gcc/testsuite/ChangeLog:

* g++.dg/ext/timevar3.C: New test.
---
 gcc/common.opt  |  4 
 gcc/common.opt.urls |  3 +++
 gcc/doc/invoke.texi |  7 +++
 gcc/gcc.cc  |  3 ++-
 gcc/testsuite/g++.dg/ext/timevar3.C | 14 +
 gcc/timevar.cc  | 31 +++--
 gcc/toplev.cc   |  3 ++-
 7 files changed, 57 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/timevar3.C

diff --git a/gcc/common.opt b/gcc/common.opt
index d270e524ff45..e9fb15e28d80 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3010,6 +3010,10 @@ ftime-report
 Common Var(time_report)
 Report the time taken by each compiler pass.
 
+ftime-report-wall
+Common Var(time_report_wall)
+Report the wall time taken by each compiler.
+
 ftime-report-details
 Common Var(time_report_details)
 Record times taken by sub-phases separately.
diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index e31736cd9945..6e79a8f9390b 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -1378,6 +1378,9 @@ UrlSuffix(gcc/Optimize-Options.html#index-fthread-jumps)
 ftime-report
 UrlSuffix(gcc/Developer-Options.html#index-ftime-report)
 
+ftime-report-wall
+UrlSuffix(gcc/Developer-Options.html#index-ftime-report-wall)
+
 ftime-report-details
 UrlSuffix(gcc/Developer-Options.html#index-ftime-report-details)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e199522f62c7..80cb355f5d79 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -784,6 +784,7 @@ Objective-C and Objective-C++ Dialects}.
 -frandom-seed=@var{string}  -fsched-verbose=@var{n}
 -fsel-sched-verbose  -fsel-sched-dump-cfg  -fsel-sched-pipelining-verbose
 -fstats  -fstack-usage  -ftime-report  -ftime-report-details
+-ftime-report-wall
 -fvar-tracking-assignments-toggle  -gtoggle
 -print-file-name=@var{library}  -print-libgcc-file-name
 -print-multi-directory  -print-multi-lib  -print-multi-os-directory
@@ -21026,6 +21027,12 @@ slightly different place within the compiler.
 @item -ftime-report-details
 Record the time consumed by infrastructure parts separately for each pass.
 
+@opindex ftime-report-wall
+@item -ftime-report-wall
+Report statistics about compiler pass time consumpion, but only using wall
+time.  This is faster than @option{-ftime-report}, but can be more
+influenced by background jobs.
+
 @opindex fira-verbose
 @item -fira-verbose=@var{n}
 Control the verbosity of the dump file for the integrated register allocator.
diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 16fed46fb35f..8d3046eb7874 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -7964,7 +7964,8 @@ try_generate_repro (const char **argv)
it might varry between invocations.  */
 else if (! strcmp (argv[nargs], "-quiet"))
   quiet = 1;
-else if (! strcmp (argv[nargs], "-ftime-report"))
+else if (! strcmp (argv[nargs], "-ftime-report")
+  || ! strcmp (argv[nargs], "-ftime-report-wall"))
   return;
 
   if (out_arg == -1 || !quiet)
diff --git a/gcc/testsuite/g++.dg/ext/timevar3.C 
b/gcc/testsuite/g++.dg/ext/timevar3.C
new file mode 100644
i

Re: [RFC PATCH] Allow limited extended asm at toplevel

2024-10-02 Thread Andi Kleen
Jakub Jelinek  writes:

> And for kernel perhaps we should add some new option which allows
> some dumb parsing of the toplevel asms and gather something from that
> parsing.

See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107779

> The restrictions I've implemented are:
> 1) asm qualifiers aren't still allowed, so asm goto or asm inline can't be
>specified at toplevel, asm volatile has the volatile ignored for C++ with
>a warning and is an error in C like before
> 2) I see good use for mainly input operands, output maybe to make it clear
>that the inline asm may write some memory, I don't see a good use for
>clobbers, so the patch doesn't allow those (and of course labels because
>asm goto can't be specified)

One of the main uses for this is to specify functions that may get
called by the assembler. You proposal is to specify them as input "m" ?
Seems odd.  Perhaps this needs a new syntax.

One issue that asms also often run into is that they don't like
reordering. Some way to specify attribute((no_reorder)) would be useful.

-Andi


Re: [PING^4] [PATCH] Add a bootstrap-native build config

2024-09-10 Thread Andi Kleen
On Tue, Sep 10, 2024 at 03:29:08AM +, Ramana Radhakrishnan wrote:
> > diff --git a/config/bootstrap-native.mk b/config/bootstrap-native.mk
> > new file mode 100644
> > index ..a4a3d8594089
> > --- /dev/null
> > +++ b/config/bootstrap-native.mk
> > @@ -0,0 +1 @@
> > +BOOT_CFLAGS := -march=native -mtune=native $(BOOT_CFLAGS)
> 
> Does every port have a -march=native + -mtune=native that behaves the same 
> way as what is expected here ? 

Not all of them do, but it's getting somewhat common with popular ones.

> 
> > diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> > index 4973f195daf9..29827c5106f8 100644
> > --- a/gcc/doc/install.texi
> > +++ b/gcc/doc/install.texi
> > @@ -3052,6 +3052,12 @@ Removes any @option{-O}-started option from 
> > @code{BOOT_CFLAGS}, and adds
> > @itemx @samp{bootstrap-Og}
> > Analogous to @code{bootstrap-O1}.
> > 
> > +@item @samp{bootstrap-native}
> > +@itemx @samp{bootstrap-native}
> > +Optimize the compiler code for the build host, if supported by the
> > +architecture. Note this only affects the compiler, not the targeted
> > +code. If you want the later use @samp{--with-cpu}.
> > +
> 
>  The defaults suitable for a port can be different , for instance on AArch32 
> additional options to specify float abi and floating point units might be 
> required.
>  I would suggest rewriting this to something like .  “If you want the later , 
> choose options suitable to the target you are looking for.  For e.g. 
> @samp{--with-cpu} would be a good starting point.” 

Ok.

-Andi


[PING^4] [PATCH] Add a bootstrap-native build config

2024-09-09 Thread Andi Kleen
Andi Kleen  writes:

Ping^4

Could someone please approve this (nearly trivial) patch?

Thanks,
-Andi

> Andi Kleen  writes:
>
> Ping^3
>
>> Andi Kleen  writes:
>>
>> PING^2 for the patch.
>>
>> (not sure if there is any maintainer to cc here, this is generic build 
>> infrastructure)
>>
>>> Andi Kleen  writes:
>>>
>>> I wanted to ping this patch:
>>>
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658729.html
>>>
>>>
>>>> From: Andi Kleen 
>>>>
>>>> ... that uses -march=native -mtune=native to build a compiler optimized
>>>> for the host.
>>>>
>>>> config/ChangeLog:
>>>>
>>>>* bootstrap-native.mk: New file.
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>* doc/install.texi: Document bootstrap-native.
>>>> ---
>>>>  config/bootstrap-native.mk | 1 +
>>>>  gcc/doc/install.texi   | 6 ++
>>>>  2 files changed, 7 insertions(+)
>>>>  create mode 100644 config/bootstrap-native.mk
>>>>
>>>> diff --git a/config/bootstrap-native.mk b/config/bootstrap-native.mk
>>>> new file mode 100644
>>>> index ..a4a3d8594089
>>>> --- /dev/null
>>>> +++ b/config/bootstrap-native.mk
>>>> @@ -0,0 +1 @@
>>>> +BOOT_CFLAGS := -march=native -mtune=native $(BOOT_CFLAGS)
>>>> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
>>>> index 4973f195daf9..29827c5106f8 100644
>>>> --- a/gcc/doc/install.texi
>>>> +++ b/gcc/doc/install.texi
>>>> @@ -3052,6 +3052,12 @@ Removes any @option{-O}-started option from 
>>>> @code{BOOT_CFLAGS}, and adds
>>>>  @itemx @samp{bootstrap-Og}
>>>>  Analogous to @code{bootstrap-O1}.
>>>>  
>>>> +@item @samp{bootstrap-native}
>>>> +@itemx @samp{bootstrap-native}
>>>> +Optimize the compiler code for the build host, if supported by the
>>>> +architecture. Note this only affects the compiler, not the targeted
>>>> +code. If you want the later use @samp{--with-cpu}.
>>>> +
>>>>  @item @samp{bootstrap-lto}
>>>>  Enables Link-Time Optimization for host tools during bootstrapping.
>>>>  @samp{BUILD_CONFIG=bootstrap-lto} is equivalent to adding


[PING^3] [PATCH] PR116080: Fix test suite checks for musttail

2024-09-02 Thread Andi Kleen
Andi Kleen  writes:

PING^3

> Andi Kleen  writes:
>
> PING^2 for https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658602.html
>
> This fixes some musttail related test suite failures that cause noise on
> various targets.
>
>> Andi Kleen  writes:
>>
>> I wanted to ping this patch. It fixes test suite noise on various
>> targets.
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658602.html
>>
>>
>>> From: Andi Kleen 
>>>
>>> This is a new attempt to fix PR116080. The previous try was reverted
>>> because it just broke a bunch of tests, hiding the problem.
>>>
>>> - musttail behaves differently than tailcall at -O0. Some of the test
>>> run at -O0, so add separate effective target tests for musttail.
>>> - New effective target tests need to use unique file names
>>> to make dejagnu caching work
>>> - Change the tests to use new targets
>>> - Add a external_musttail test to check for target's ability
>>> to do tail calls between translation units. This covers some powerpc
>>> ABIs.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> PR testsuite/116080
>>> * c-c++-common/musttail1.c: Use musttail target.
>>> * c-c++-common/musttail12.c: Use struct_musttail target.
>>> * c-c++-common/musttail2.c: Use musttail target.
>>> * c-c++-common/musttail3.c: Likewise.
>>> * c-c++-common/musttail4.c: Likewise.
>>> * c-c++-common/musttail7.c: Likewise.
>>> * c-c++-common/musttail8.c: Likewise.
>>> * g++.dg/musttail10.C: Likewise. Replace powerpc checks with
>>> external_musttail.
>>> * g++.dg/musttail11.C: Use musttail target.
>>> * g++.dg/musttail6.C: Use musttail target. Replace powerpc
>>> checks with external_musttail.
>>> * g++.dg/musttail9.C: Use musttail target.
>>> * lib/target-supports.exp: Add musttail, struct_musttail,
>>> external_musttail targets. Remove optimization for musttail.
>>> Use unique file names for musttail.
>>> ---
>>>  gcc/testsuite/c-c++-common/musttail1.c  |  2 +-
>>>  gcc/testsuite/c-c++-common/musttail12.c |  2 +-
>>>  gcc/testsuite/c-c++-common/musttail2.c  |  2 +-
>>>  gcc/testsuite/c-c++-common/musttail3.c  |  2 +-
>>>  gcc/testsuite/c-c++-common/musttail4.c  |  2 +-
>>>  gcc/testsuite/c-c++-common/musttail7.c  |  2 +-
>>>  gcc/testsuite/c-c++-common/musttail8.c  |  2 +-
>>>  gcc/testsuite/g++.dg/musttail10.C   |  4 ++--
>>>  gcc/testsuite/g++.dg/musttail11.C   |  2 +-
>>>  gcc/testsuite/g++.dg/musttail6.C|  4 ++--
>>>  gcc/testsuite/g++.dg/musttail9.C|  2 +-
>>>  gcc/testsuite/lib/target-supports.exp   | 30 -
>>>  12 files changed, 37 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/gcc/testsuite/c-c++-common/musttail1.c 
>>> b/gcc/testsuite/c-c++-common/musttail1.c
>>> index 74efcc2a0bc6..51549672e02a 100644
>>> --- a/gcc/testsuite/c-c++-common/musttail1.c
>>> +++ b/gcc/testsuite/c-c++-common/musttail1.c
>>> @@ -1,4 +1,4 @@
>>> -/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
>>> +/* { dg-do compile { target { musttail && { c || c++11 } } } } */
>>>  /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
>>>  
>>>  int __attribute__((noinline,noclone,noipa))
>>> diff --git a/gcc/testsuite/c-c++-common/musttail12.c 
>>> b/gcc/testsuite/c-c++-common/musttail12.c
>>> index 4140bcd00950..475afc5af3f3 100644
>>> --- a/gcc/testsuite/c-c++-common/musttail12.c
>>> +++ b/gcc/testsuite/c-c++-common/musttail12.c
>>> @@ -1,4 +1,4 @@
>>> -/* { dg-do compile { target { struct_tail_call && { c || c++11 } } } } */
>>> +/* { dg-do compile { target { struct_musttail && { c || c++11 } } } } */
>>>  /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
>>>  
>>>  struct str
>>> diff --git a/gcc/testsuite/c-c++-common/musttail2.c 
>>> b/gcc/testsuite/c-c++-common/musttail2.c
>>> index 86f2c3d77404..1970c4edd670 100644
>>> --- a/gcc/testsuite/c-c++-common/musttail2.c
>>> +++ b/gcc/testsuite/c-c++-common/musttail2.c
>>> @@ -1,4 +1,4 @@
>>> -/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
>>> +/* { dg-do compile { target { musttail && { c || c++11 } } } } */
>>>  
>>&g

[PING^3] [PATCH] Add a bootstrap-native build config

2024-09-02 Thread Andi Kleen
Andi Kleen  writes:

Ping^3

> Andi Kleen  writes:
>
> PING^2 for the patch.
>
> (not sure if there is any maintainer to cc here, this is generic build 
> infrastructure)
>
>> Andi Kleen  writes:
>>
>> I wanted to ping this patch:
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658729.html
>>
>>
>>> From: Andi Kleen 
>>>
>>> ... that uses -march=native -mtune=native to build a compiler optimized
>>> for the host.
>>>
>>> config/ChangeLog:
>>>
>>> * bootstrap-native.mk: New file.
>>>
>>> gcc/ChangeLog:
>>>
>>> * doc/install.texi: Document bootstrap-native.
>>> ---
>>>  config/bootstrap-native.mk | 1 +
>>>  gcc/doc/install.texi   | 6 ++
>>>  2 files changed, 7 insertions(+)
>>>  create mode 100644 config/bootstrap-native.mk
>>>
>>> diff --git a/config/bootstrap-native.mk b/config/bootstrap-native.mk
>>> new file mode 100644
>>> index ..a4a3d8594089
>>> --- /dev/null
>>> +++ b/config/bootstrap-native.mk
>>> @@ -0,0 +1 @@
>>> +BOOT_CFLAGS := -march=native -mtune=native $(BOOT_CFLAGS)
>>> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
>>> index 4973f195daf9..29827c5106f8 100644
>>> --- a/gcc/doc/install.texi
>>> +++ b/gcc/doc/install.texi
>>> @@ -3052,6 +3052,12 @@ Removes any @option{-O}-started option from 
>>> @code{BOOT_CFLAGS}, and adds
>>>  @itemx @samp{bootstrap-Og}
>>>  Analogous to @code{bootstrap-O1}.
>>>  
>>> +@item @samp{bootstrap-native}
>>> +@itemx @samp{bootstrap-native}
>>> +Optimize the compiler code for the build host, if supported by the
>>> +architecture. Note this only affects the compiler, not the targeted
>>> +code. If you want the later use @samp{--with-cpu}.
>>> +
>>>  @item @samp{bootstrap-lto}
>>>  Enables Link-Time Optimization for host tools during bootstrapping.
>>>  @samp{BUILD_CONFIG=bootstrap-lto} is equivalent to adding


[PATCH] Fix test failing on sparc

2024-08-27 Thread Andi Kleen
From: Andi Kleen 

SPARC does not support vectorizing conditions, which this test relies
on. Use vect_condition as effective target.

Committed as obvious.

PR testsuite/116500

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-switch-ifcvt-1.c: Use vect_condition to
check if vectorizing conditions is supported for target.
---
 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
index f5352ef8ed7a..2e3a9ae3c249 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
@@ -1,4 +1,4 @@
-/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_condition } */
 #include "tree-vect.h"
 
 extern void abort (void);
-- 
2.45.2



[PING^2] [PATCH] Add a bootstrap-native build config

2024-08-25 Thread Andi Kleen
Andi Kleen  writes:

PING^2 for the patch.

(not sure if there is any maintainer to cc here, this is generic build 
infrastructure)

> Andi Kleen  writes:
>
> I wanted to ping this patch:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658729.html
>
>
>> From: Andi Kleen 
>>
>> ... that uses -march=native -mtune=native to build a compiler optimized
>> for the host.
>>
>> config/ChangeLog:
>>
>>  * bootstrap-native.mk: New file.
>>
>> gcc/ChangeLog:
>>
>>  * doc/install.texi: Document bootstrap-native.
>> ---
>>  config/bootstrap-native.mk | 1 +
>>  gcc/doc/install.texi   | 6 ++
>>  2 files changed, 7 insertions(+)
>>  create mode 100644 config/bootstrap-native.mk
>>
>> diff --git a/config/bootstrap-native.mk b/config/bootstrap-native.mk
>> new file mode 100644
>> index ..a4a3d8594089
>> --- /dev/null
>> +++ b/config/bootstrap-native.mk
>> @@ -0,0 +1 @@
>> +BOOT_CFLAGS := -march=native -mtune=native $(BOOT_CFLAGS)
>> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
>> index 4973f195daf9..29827c5106f8 100644
>> --- a/gcc/doc/install.texi
>> +++ b/gcc/doc/install.texi
>> @@ -3052,6 +3052,12 @@ Removes any @option{-O}-started option from 
>> @code{BOOT_CFLAGS}, and adds
>>  @itemx @samp{bootstrap-Og}
>>  Analogous to @code{bootstrap-O1}.
>>  
>> +@item @samp{bootstrap-native}
>> +@itemx @samp{bootstrap-native}
>> +Optimize the compiler code for the build host, if supported by the
>> +architecture. Note this only affects the compiler, not the targeted
>> +code. If you want the later use @samp{--with-cpu}.
>> +
>>  @item @samp{bootstrap-lto}
>>  Enables Link-Time Optimization for host tools during bootstrapping.
>>  @samp{BUILD_CONFIG=bootstrap-lto} is equivalent to adding


Re: [PING^2] [PATCH] PR116080: Fix test suite checks for musttail

2024-08-25 Thread Andi Kleen
Andi Kleen  writes:

PING^2 for https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658602.html

This fixes some musttail related test suite failures that cause noise on
various targets.

> Andi Kleen  writes:
>
> I wanted to ping this patch. It fixes test suite noise on various
> targets.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658602.html
>
>
>> From: Andi Kleen 
>>
>> This is a new attempt to fix PR116080. The previous try was reverted
>> because it just broke a bunch of tests, hiding the problem.
>>
>> - musttail behaves differently than tailcall at -O0. Some of the test
>> run at -O0, so add separate effective target tests for musttail.
>> - New effective target tests need to use unique file names
>> to make dejagnu caching work
>> - Change the tests to use new targets
>> - Add a external_musttail test to check for target's ability
>> to do tail calls between translation units. This covers some powerpc
>> ABIs.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  PR testsuite/116080
>>  * c-c++-common/musttail1.c: Use musttail target.
>>  * c-c++-common/musttail12.c: Use struct_musttail target.
>>  * c-c++-common/musttail2.c: Use musttail target.
>>  * c-c++-common/musttail3.c: Likewise.
>>  * c-c++-common/musttail4.c: Likewise.
>>  * c-c++-common/musttail7.c: Likewise.
>>  * c-c++-common/musttail8.c: Likewise.
>>  * g++.dg/musttail10.C: Likewise. Replace powerpc checks with
>>  external_musttail.
>>  * g++.dg/musttail11.C: Use musttail target.
>>  * g++.dg/musttail6.C: Use musttail target. Replace powerpc
>>  checks with external_musttail.
>>  * g++.dg/musttail9.C: Use musttail target.
>>  * lib/target-supports.exp: Add musttail, struct_musttail,
>>  external_musttail targets. Remove optimization for musttail.
>>  Use unique file names for musttail.
>> ---
>>  gcc/testsuite/c-c++-common/musttail1.c  |  2 +-
>>  gcc/testsuite/c-c++-common/musttail12.c |  2 +-
>>  gcc/testsuite/c-c++-common/musttail2.c  |  2 +-
>>  gcc/testsuite/c-c++-common/musttail3.c  |  2 +-
>>  gcc/testsuite/c-c++-common/musttail4.c  |  2 +-
>>  gcc/testsuite/c-c++-common/musttail7.c  |  2 +-
>>  gcc/testsuite/c-c++-common/musttail8.c  |  2 +-
>>  gcc/testsuite/g++.dg/musttail10.C   |  4 ++--
>>  gcc/testsuite/g++.dg/musttail11.C   |  2 +-
>>  gcc/testsuite/g++.dg/musttail6.C|  4 ++--
>>  gcc/testsuite/g++.dg/musttail9.C|  2 +-
>>  gcc/testsuite/lib/target-supports.exp   | 30 -
>>  12 files changed, 37 insertions(+), 19 deletions(-)
>>
>> diff --git a/gcc/testsuite/c-c++-common/musttail1.c 
>> b/gcc/testsuite/c-c++-common/musttail1.c
>> index 74efcc2a0bc6..51549672e02a 100644
>> --- a/gcc/testsuite/c-c++-common/musttail1.c
>> +++ b/gcc/testsuite/c-c++-common/musttail1.c
>> @@ -1,4 +1,4 @@
>> -/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
>> +/* { dg-do compile { target { musttail && { c || c++11 } } } } */
>>  /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
>>  
>>  int __attribute__((noinline,noclone,noipa))
>> diff --git a/gcc/testsuite/c-c++-common/musttail12.c 
>> b/gcc/testsuite/c-c++-common/musttail12.c
>> index 4140bcd00950..475afc5af3f3 100644
>> --- a/gcc/testsuite/c-c++-common/musttail12.c
>> +++ b/gcc/testsuite/c-c++-common/musttail12.c
>> @@ -1,4 +1,4 @@
>> -/* { dg-do compile { target { struct_tail_call && { c || c++11 } } } } */
>> +/* { dg-do compile { target { struct_musttail && { c || c++11 } } } } */
>>  /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
>>  
>>  struct str
>> diff --git a/gcc/testsuite/c-c++-common/musttail2.c 
>> b/gcc/testsuite/c-c++-common/musttail2.c
>> index 86f2c3d77404..1970c4edd670 100644
>> --- a/gcc/testsuite/c-c++-common/musttail2.c
>> +++ b/gcc/testsuite/c-c++-common/musttail2.c
>> @@ -1,4 +1,4 @@
>> -/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
>> +/* { dg-do compile { target { musttail && { c || c++11 } } } } */
>>  
>>  struct box { char field[256]; int i; };
>>  
>> diff --git a/gcc/testsuite/c-c++-common/musttail3.c 
>> b/gcc/testsuite/c-c++-common/musttail3.c
>> index ea9589c59ef2..7499fd6460b4 100644
>> --- a/gcc/testsuite/c-c++-common/musttail3.c
>> +++ b/gcc/testsuite/c-c++-common/musttail3.c
>> @@ -1,4 +1,4 @@
>> -/* { dg-do compile { target { tail

[PING] [PATCH v2] Support if conversion for switches

2024-08-13 Thread Andi Kleen
Andi Kleen  writes:

I wanted to ping this patch. I believe Richard ok'ed most of it earlier
but need an ok for the changes resulting from his review too
(but they were mostly only test suite and comment fixes
apart from some minor tweaks)

-Andi

> The gimple-if-to-switch pass converts if statements with
> multiple equal checks on the same value to a switch. This breaks
> vectorization which cannot handle switches.
>
> Teach the tree-if-conv pass used by the vectorizer to handle
> simple switch statements, like those created by if-to-switch earlier.
> These are switches that only have a single non default block,
> They are handled similar to COND in if conversion.
>
> This makes the vect-bitfield-read-1-not test fail. The test
> checks for a bitfield analysis failing, but it actually
> relied on the ifcvt erroring out early because the test
> is using a switch. The if conversion still does not
> work because the switch is not in a form that this
> patch can handle, but it fails much later and the bitfield
> analysis succeeds, which makes the test fail. I marked
> it xfail because it doesn't seem to be testing what it wants
> to test.
>
> [v2: Fix tests to run correctly. Update comments and commit log.
>  Fix gimple switch accessor use.]
>
> gcc/ChangeLog:
>
>   PR tree-opt/115866
>   * tree-if-conv.cc (if_convertible_switch_p): New function.
>   (if_convertible_stmt_p): Check for switch.
>   (get_loop_body_in_if_conv_order): Handle switch.
>   (predicate_bbs): Likewise.
>   (predicate_statements): Likewise.
>   (remove_conditions_and_labels): Likewise.
>   (ifcvt_split_critical_edges): Likewise.
>   (ifcvt_local_dce): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/vect/vect-switch-ifcvt-1.c: New test.
>   * gcc.dg/vect/vect-switch-ifcvt-2.c: New test.
>   * gcc.dg/vect/vect-switch-search-line-fast.c: New test.
>   * gcc.dg/vect/vect-bitfield-read-1-not.c: Change to xfail.
> ---
>  gcc/doc/cfg.texi  |   4 +-
>  .../gcc.dg/vect/vect-bitfield-read-1-not.c|   2 +-
>  .../gcc.dg/vect/vect-switch-ifcvt-1.c | 115 ++
>  .../gcc.dg/vect/vect-switch-ifcvt-2.c |  49 
>  .../vect/vect-switch-search-line-fast.c   |  17 +++
>  gcc/tree-if-conv.cc   |  93 +-
>  6 files changed, 272 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
>
> diff --git a/gcc/doc/cfg.texi b/gcc/doc/cfg.texi
> index 9a22420f91f..a6f2b9f97d6 100644
> --- a/gcc/doc/cfg.texi
> +++ b/gcc/doc/cfg.texi
> @@ -83,13 +83,13 @@ lexicographical order, except @code{ENTRY_BLOCK} and 
> @code{EXIT_BLOCK}.
>  The macro @code{FOR_ALL_BB} also visits all basic blocks in
>  lexicographical order, including @code{ENTRY_BLOCK} and @code{EXIT_BLOCK}.
>  
> -@findex post_order_compute, inverted_post_order_compute, walk_dominator_tree
> +@findex post_order_compute, inverted_post_order_compute, dom_walker::walk
>  The functions @code{post_order_compute} and 
> @code{inverted_post_order_compute}
>  can be used to compute topological orders of the CFG.  The orders are
>  stored as vectors of basic block indices.  The @code{BASIC_BLOCK} array
>  can be used to iterate each basic block by index.
>  Dominator traversals are also possible using
> -@code{walk_dominator_tree}.  Given two basic blocks A and B, block A
> +@code{dom_walker::walk}.  Given two basic blocks A and B, block A
>  dominates block B if A is @emph{always} executed before B@.
>  
>  Each @code{basic_block} also contains pointers to the first
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c 
> b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
> index 0d91067ebb2..85f4de8464a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
> @@ -55,6 +55,6 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." "ifcvt" } } */
> +/* { dg-final { scan-tree-dump-times "Bitfield OK to lower." 0 "ifcvt" { 
> xfail *-*-* } } } */
>  
>  
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c 
> b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
> new file mode 100644
> index 000..f5352ef8ed7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
> @@ -0,0 +1,115 @@
> +/* { dg-require-effective-target vect_int } */
> +#include "tree-vect.h&

[PING] [PATCH] PR116080: Fix test suite checks for musttail

2024-08-12 Thread Andi Kleen
Andi Kleen  writes:

I wanted to ping this patch. It fixes test suite noise on various
targets.

https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658602.html


> From: Andi Kleen 
>
> This is a new attempt to fix PR116080. The previous try was reverted
> because it just broke a bunch of tests, hiding the problem.
>
> - musttail behaves differently than tailcall at -O0. Some of the test
> run at -O0, so add separate effective target tests for musttail.
> - New effective target tests need to use unique file names
> to make dejagnu caching work
> - Change the tests to use new targets
> - Add a external_musttail test to check for target's ability
> to do tail calls between translation units. This covers some powerpc
> ABIs.
>
> gcc/testsuite/ChangeLog:
>
>   PR testsuite/116080
>   * c-c++-common/musttail1.c: Use musttail target.
>   * c-c++-common/musttail12.c: Use struct_musttail target.
>   * c-c++-common/musttail2.c: Use musttail target.
>   * c-c++-common/musttail3.c: Likewise.
>   * c-c++-common/musttail4.c: Likewise.
>   * c-c++-common/musttail7.c: Likewise.
>   * c-c++-common/musttail8.c: Likewise.
>   * g++.dg/musttail10.C: Likewise. Replace powerpc checks with
>   external_musttail.
>   * g++.dg/musttail11.C: Use musttail target.
>   * g++.dg/musttail6.C: Use musttail target. Replace powerpc
>   checks with external_musttail.
>   * g++.dg/musttail9.C: Use musttail target.
>   * lib/target-supports.exp: Add musttail, struct_musttail,
>   external_musttail targets. Remove optimization for musttail.
>   Use unique file names for musttail.
> ---
>  gcc/testsuite/c-c++-common/musttail1.c  |  2 +-
>  gcc/testsuite/c-c++-common/musttail12.c |  2 +-
>  gcc/testsuite/c-c++-common/musttail2.c  |  2 +-
>  gcc/testsuite/c-c++-common/musttail3.c  |  2 +-
>  gcc/testsuite/c-c++-common/musttail4.c  |  2 +-
>  gcc/testsuite/c-c++-common/musttail7.c  |  2 +-
>  gcc/testsuite/c-c++-common/musttail8.c  |  2 +-
>  gcc/testsuite/g++.dg/musttail10.C   |  4 ++--
>  gcc/testsuite/g++.dg/musttail11.C   |  2 +-
>  gcc/testsuite/g++.dg/musttail6.C|  4 ++--
>  gcc/testsuite/g++.dg/musttail9.C|  2 +-
>  gcc/testsuite/lib/target-supports.exp   | 30 -
>  12 files changed, 37 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/testsuite/c-c++-common/musttail1.c 
> b/gcc/testsuite/c-c++-common/musttail1.c
> index 74efcc2a0bc6..51549672e02a 100644
> --- a/gcc/testsuite/c-c++-common/musttail1.c
> +++ b/gcc/testsuite/c-c++-common/musttail1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
> +/* { dg-do compile { target { musttail && { c || c++11 } } } } */
>  /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
>  
>  int __attribute__((noinline,noclone,noipa))
> diff --git a/gcc/testsuite/c-c++-common/musttail12.c 
> b/gcc/testsuite/c-c++-common/musttail12.c
> index 4140bcd00950..475afc5af3f3 100644
> --- a/gcc/testsuite/c-c++-common/musttail12.c
> +++ b/gcc/testsuite/c-c++-common/musttail12.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { struct_tail_call && { c || c++11 } } } } */
> +/* { dg-do compile { target { struct_musttail && { c || c++11 } } } } */
>  /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
>  
>  struct str
> diff --git a/gcc/testsuite/c-c++-common/musttail2.c 
> b/gcc/testsuite/c-c++-common/musttail2.c
> index 86f2c3d77404..1970c4edd670 100644
> --- a/gcc/testsuite/c-c++-common/musttail2.c
> +++ b/gcc/testsuite/c-c++-common/musttail2.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
> +/* { dg-do compile { target { musttail && { c || c++11 } } } } */
>  
>  struct box { char field[256]; int i; };
>  
> diff --git a/gcc/testsuite/c-c++-common/musttail3.c 
> b/gcc/testsuite/c-c++-common/musttail3.c
> index ea9589c59ef2..7499fd6460b4 100644
> --- a/gcc/testsuite/c-c++-common/musttail3.c
> +++ b/gcc/testsuite/c-c++-common/musttail3.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
> +/* { dg-do compile { target { struct_musttail && { c || c++11 } } } } */
>  
>  extern int foo2 (int x, ...);
>  
> diff --git a/gcc/testsuite/c-c++-common/musttail4.c 
> b/gcc/testsuite/c-c++-common/musttail4.c
> index 23f4b5e1cd68..bd6effa4b931 100644
> --- a/gcc/testsuite/c-c++-common/musttail4.c
> +++ b/gcc/testsuite/c-c++-common/musttail4.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
> +/* { dg-do compile { target {

[PING] [PATCH] Add a bootstrap-native build config

2024-08-12 Thread Andi Kleen
Andi Kleen  writes:

I wanted to ping this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658729.html


> From: Andi Kleen 
>
> ... that uses -march=native -mtune=native to build a compiler optimized
> for the host.
>
> config/ChangeLog:
>
>   * bootstrap-native.mk: New file.
>
> gcc/ChangeLog:
>
>   * doc/install.texi: Document bootstrap-native.
> ---
>  config/bootstrap-native.mk | 1 +
>  gcc/doc/install.texi   | 6 ++
>  2 files changed, 7 insertions(+)
>  create mode 100644 config/bootstrap-native.mk
>
> diff --git a/config/bootstrap-native.mk b/config/bootstrap-native.mk
> new file mode 100644
> index ..a4a3d8594089
> --- /dev/null
> +++ b/config/bootstrap-native.mk
> @@ -0,0 +1 @@
> +BOOT_CFLAGS := -march=native -mtune=native $(BOOT_CFLAGS)
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index 4973f195daf9..29827c5106f8 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -3052,6 +3052,12 @@ Removes any @option{-O}-started option from 
> @code{BOOT_CFLAGS}, and adds
>  @itemx @samp{bootstrap-Og}
>  Analogous to @code{bootstrap-O1}.
>  
> +@item @samp{bootstrap-native}
> +@itemx @samp{bootstrap-native}
> +Optimize the compiler code for the build host, if supported by the
> +architecture. Note this only affects the compiler, not the targeted
> +code. If you want the later use @samp{--with-cpu}.
> +
>  @item @samp{bootstrap-lto}
>  Enables Link-Time Optimization for host tools during bootstrapping.
>  @samp{BUILD_CONFIG=bootstrap-lto} is equivalent to adding


[PATCH] Fix reference to the dom walker function in the documentation

2024-08-08 Thread Andi Kleen
From: Andi Kleen 

It is using a class now with a different name.

I will commit as obvious unless someone complains

Also I included this patch by mistake in my earlier if conversion v2
patch. Please ignore that hunk there.

gcc/ChangeLog:

* doc/cfg.texi: Fix references to dom_walker.
---
 gcc/doc/cfg.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/cfg.texi b/gcc/doc/cfg.texi
index 9a22420f91f..a6f2b9f97d6 100644
--- a/gcc/doc/cfg.texi
+++ b/gcc/doc/cfg.texi
@@ -83,13 +83,13 @@ lexicographical order, except @code{ENTRY_BLOCK} and 
@code{EXIT_BLOCK}.
 The macro @code{FOR_ALL_BB} also visits all basic blocks in
 lexicographical order, including @code{ENTRY_BLOCK} and @code{EXIT_BLOCK}.
 
-@findex post_order_compute, inverted_post_order_compute, walk_dominator_tree
+@findex post_order_compute, inverted_post_order_compute, dom_walker::walk
 The functions @code{post_order_compute} and @code{inverted_post_order_compute}
 can be used to compute topological orders of the CFG.  The orders are
 stored as vectors of basic block indices.  The @code{BASIC_BLOCK} array
 can be used to iterate each basic block by index.
 Dominator traversals are also possible using
-@code{walk_dominator_tree}.  Given two basic blocks A and B, block A
+@code{dom_walker::walk}.  Given two basic blocks A and B, block A
 dominates block B if A is @emph{always} executed before B@.
 
 Each @code{basic_block} also contains pointers to the first
-- 
2.45.2



[PATCH v2] Support if conversion for switches

2024-08-08 Thread Andi Kleen
The gimple-if-to-switch pass converts if statements with
multiple equal checks on the same value to a switch. This breaks
vectorization which cannot handle switches.

Teach the tree-if-conv pass used by the vectorizer to handle
simple switch statements, like those created by if-to-switch earlier.
These are switches that only have a single non default block,
They are handled similar to COND in if conversion.

This makes the vect-bitfield-read-1-not test fail. The test
checks for a bitfield analysis failing, but it actually
relied on the ifcvt erroring out early because the test
is using a switch. The if conversion still does not
work because the switch is not in a form that this
patch can handle, but it fails much later and the bitfield
analysis succeeds, which makes the test fail. I marked
it xfail because it doesn't seem to be testing what it wants
to test.

[v2: Fix tests to run correctly. Update comments and commit log.
 Fix gimple switch accessor use.]

gcc/ChangeLog:

PR tree-opt/115866
* tree-if-conv.cc (if_convertible_switch_p): New function.
(if_convertible_stmt_p): Check for switch.
(get_loop_body_in_if_conv_order): Handle switch.
(predicate_bbs): Likewise.
(predicate_statements): Likewise.
(remove_conditions_and_labels): Likewise.
(ifcvt_split_critical_edges): Likewise.
(ifcvt_local_dce): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-switch-ifcvt-1.c: New test.
* gcc.dg/vect/vect-switch-ifcvt-2.c: New test.
* gcc.dg/vect/vect-switch-search-line-fast.c: New test.
* gcc.dg/vect/vect-bitfield-read-1-not.c: Change to xfail.
---
 gcc/doc/cfg.texi  |   4 +-
 .../gcc.dg/vect/vect-bitfield-read-1-not.c|   2 +-
 .../gcc.dg/vect/vect-switch-ifcvt-1.c | 115 ++
 .../gcc.dg/vect/vect-switch-ifcvt-2.c |  49 
 .../vect/vect-switch-search-line-fast.c   |  17 +++
 gcc/tree-if-conv.cc   |  93 +-
 6 files changed, 272 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c

diff --git a/gcc/doc/cfg.texi b/gcc/doc/cfg.texi
index 9a22420f91f..a6f2b9f97d6 100644
--- a/gcc/doc/cfg.texi
+++ b/gcc/doc/cfg.texi
@@ -83,13 +83,13 @@ lexicographical order, except @code{ENTRY_BLOCK} and 
@code{EXIT_BLOCK}.
 The macro @code{FOR_ALL_BB} also visits all basic blocks in
 lexicographical order, including @code{ENTRY_BLOCK} and @code{EXIT_BLOCK}.
 
-@findex post_order_compute, inverted_post_order_compute, walk_dominator_tree
+@findex post_order_compute, inverted_post_order_compute, dom_walker::walk
 The functions @code{post_order_compute} and @code{inverted_post_order_compute}
 can be used to compute topological orders of the CFG.  The orders are
 stored as vectors of basic block indices.  The @code{BASIC_BLOCK} array
 can be used to iterate each basic block by index.
 Dominator traversals are also possible using
-@code{walk_dominator_tree}.  Given two basic blocks A and B, block A
+@code{dom_walker::walk}.  Given two basic blocks A and B, block A
 dominates block B if A is @emph{always} executed before B@.
 
 Each @code{basic_block} also contains pointers to the first
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
index 0d91067ebb2..85f4de8464a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
@@ -55,6 +55,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." "ifcvt" } } */
+/* { dg-final { scan-tree-dump-times "Bitfield OK to lower." 0 "ifcvt" { xfail 
*-*-* } } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
new file mode 100644
index 000..f5352ef8ed7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
@@ -0,0 +1,115 @@
+/* { dg-require-effective-target vect_int } */
+#include "tree-vect.h"
+
+extern void abort (void);
+
+int
+f1 (char *s)
+{
+  int c = 0;
+  int i;
+  for (i = 0; i < 64; i++)
+{
+  switch (*s)
+   {
+   case ',':
+   case '|':
+ c++;
+   }
+  s++;
+}
+  return c;
+}
+
+int
+f2 (char *s)
+{
+  int c = 0;
+  int i;
+  for (i = 0; i < 64; i++)
+{
+  if (*s != '#')
+   {
+ switch (*s)
+   {
+   case ',':
+   case '|':
+ c++;
+   }
+   }
+  s++;
+}
+  return c;
+}
+
+int
+f3 (char *s)
+{
+  int c = 0;
+  int i;
+  for (i = 0; i < 64; i++)
+{
+  if (*s != '#')
+if (*s == ',' || *s == '|' || *s == '@' || *s == '*')
+ c++;
+  s++;
+}
+  return c;
+}
+
+
+int
+f4 (char *s)
+{
+  

Re: [PATCH] Support if conversion for switches

2024-08-08 Thread Andi Kleen
> > But your comment made me realize there is a major bug.
> >
> > if_convertible_switch_p also needs to check that that the labels don't fall
> > through, so the the flow graph is diamond shape.  Need some easy way to
> > verify that.
> 
> Do we verify this for if()s?  That is,

No we do not. After some consideration it isn't a bug at all.

> 
>   if (i)
> {
>   ...
>goto fallthru;
> }
>   else
>{
> fallthru:
>  ...
>}
> 
> For ifs we seem to add the predicate to both edges even in the degenerate 
> case.

Yes we do.

-Andi


Re: [PATCH] Support if conversion for switches

2024-08-07 Thread Andi Kleen
> > + /* Create chain of switch tests for each case.  */
> > + tree switch_cond = NULL_TREE;
> > + tree index = gimple_switch_index (sw);
> > + for (unsigned i = 1; i < gimple_switch_num_labels (sw); i++)
> > +   {
> > + tree label = gimple_switch_label (sw, i);
> > + tree case_cond;
> > + /* This currently cannot happen because tree-cfg lowers range
> > +switches with a single destination to COND.  */
> 
> But it should also lower non-range switches with a single destination ...?
> See convert_single_case_switch.  You say
> 
>   switch (i)
> {
> case 1:
> case 5 ... 7:
>   return 42;
> default:
>   return 0;
> }
> 
> doesn't hit here with a CASE_HIGH for the 5 ... 7 CASE_LABEL?

Yes it can actually happen. I'll correct the comment/description
and add a test case.

But your comment made me realize there is a major bug.

if_convertible_switch_p also needs to check that that the labels don't fall
through, so the the flow graph is diamond shape.  Need some easy way to 
verify that.

-Andi


Re: [PATCH] PR116080: Fix test suite checks for musttail

2024-08-07 Thread Andi Kleen
> > Okay for trunk? I would like to check that one in to avoid the noise
> > in the regression reports.
> 
> I've tested this version in a few trees.

Thanks Thomas.

> That's because of effective-target 'struct_musttail' for '-m32'
> reporting:
> 
> struct_musttail1494739.cc: In function 'foo bar()':
> struct_musttail1494739.cc:5:88: error: cannot tail-call: return value 
> used after call
> 
> (I'm just mentioning the latter "regressions" in case those are
> unexpected.)

I believe that's because these test cases are handled by the GIMPLE level
tail call handling in tree-tailcall (which avoids any target
restrictions), while the TCL test checks for the generic case using 
an extern (so hits target restrictions).

While this could probably be distinguished in the test case probing
I don't think it's worth it. Some of this is just for the frontend,
which is architecture independent enough.

-Andi


Re: [PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-06 Thread Andi Kleen
On Tue, Aug 06, 2024 at 11:50:00AM -0700, Andi Kleen wrote:
> > -  s += 16;
> > +  v16qi data, t;
> > +  /* Unaligned load.  Reading beyond the final newline is safe, since
> > +files.cc:read_file_guts pads the allocation.  */
> 
> You need to change that function to use 32 byte padding as Jakub
> pointed out (I forgot that too)

Never mind, it's in the next patch.



Re: [PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-06 Thread Andi Kleen
> -  s += 16;
> +  v16qi data, t;
> +  /* Unaligned load.  Reading beyond the final newline is safe, since
> +  files.cc:read_file_guts pads the allocation.  */

You need to change that function to use 32 byte padding as Jakub
pointed out (I forgot that too)

> +  data = *(const v16qi_u *)s;
> +  /* Prevent propagation into pshufb and pcmp as memory operand.  */
> +  __asm__ ("" : "+x" (data));

It would probably make sense to a file a PR on this separately,
to eventually fix the compiler to not need such workarounds.
Not sure how much difference it makes however.

-Andi


Re: [PATCH 0/3] libcpp: improve x86 vectorized helpers

2024-08-06 Thread Andi Kleen
> Andi, can you push your own patch?).

Done.

-Andi


Re: [RFC] libstdc++: Replace Ryu with Teju Jagua for float.

2024-08-06 Thread Andi Kleen
Cassio Neri  writes:

> Implement the template function teju_jagua which finds the shortest
> representation of a floating-point number. The floating-point type is a
> template parameter and the implementation is generic enough to handle all
> floating-point types of interest, namely, IEEE 754, std::bfloat16_t,
> x86 80-bit and IBM128.

So the only benefit is performance, right? So the patch
should come with some performance numbers how it is better 
than the old code. Also how did you validate that it works
correctly?

-Andi


[PATCH] Support if conversion for switches

2024-08-06 Thread Andi Kleen
The gimple-if-to-switch pass converts if statements with
multiple equal checks on the same value to a switch. This breaks
vectorization which cannot handle switches.

Teach the tree-if-conv pass used by the vectorizer to handle
simple switch statements, like those created by if-to-switch earlier.
These are switches that only have a single non default block,
and no ranges. They are handled similar to if in if conversion.

Some notes:

In theory this handles switches with case ranges, but it seems
for the simple "one target label" switch case that is supported
here these are always optimized by the cfg passes to COND,
so this case is latent.

This makes the vect-bitfield-read-1-not test fail. The test
checks for a bitfield analysis failing, but it actually
relied on the ifcvt erroring out early because the test
is using a switch. The if conversion still does not
work because the switch is not in a form that this
patch can handle, but it fails much later and the bitfield
analysis succeeds, which makes the test fail. I marked
it xfail because it doesn't seem to be testing what it wants
to test.

gcc/ChangeLog:

PR tree-opt/115866
* tree-if-conv.cc (if_convertible_switch_p): New function.
(if_convertible_stmt_p): Check for switch.
(get_loop_body_in_if_conv_order): Handle switch.
(predicate_bbs): Likewise.
(predicate_statements): Likewise.
(remove_conditions_and_labels): Likewise.
(ifcvt_split_critical_edges): Likewise.
(ifcvt_local_dce): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-switch-ifcvt-1.c: New test.
* gcc.dg/vect/vect-switch-ifcvt-2.c: New test.
* gcc.dg/vect/vect-switch-search-line-fast.c: New test.
* gcc.dg/vect/vect-bitfield-read-1-not.c: Change to xfail.
---
 .../gcc.dg/vect/vect-bitfield-read-1-not.c|   2 +-
 .../gcc.dg/vect/vect-switch-ifcvt-1.c | 107 ++
 .../gcc.dg/vect/vect-switch-ifcvt-2.c |  28 +
 .../vect/vect-switch-search-line-fast.c   |  17 +++
 gcc/tree-if-conv.cc   |  90 ++-
 5 files changed, 238 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
index 0d91067ebb2..85f4de8464a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
@@ -55,6 +55,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." "ifcvt" } } */
+/* { dg-final { scan-tree-dump-times "Bitfield OK to lower." 0 "ifcvt" { xfail 
*-*-* } } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
new file mode 100644
index 000..0b06d3c84a7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
@@ -0,0 +1,107 @@
+/* { dg-require-effective-target vect_int } */
+
+extern void abort (void);
+
+int
+f1 (char *s)
+{
+  int c = 0;
+  int i;
+  for (i = 0; i < 64; i++)
+{
+  switch (*s)
+   {
+   case ',':
+   case '|':
+ c++;
+   }
+  s++;
+}
+  return c;
+}
+
+int
+f2 (char *s)
+{
+  int c = 0;
+  int i;
+  for (i = 0; i < 64; i++)
+{
+  if (*s != '#')
+   {
+ switch (*s)
+   {
+   case ',':
+   case '|':
+ c++;
+   }
+   }
+  s++;
+}
+  return c;
+}
+
+int
+f3 (char *s)
+{
+  int c = 0;
+  int i;
+  for (i = 0; i < 64; i++)
+{
+  if (*s != '#')
+if (*s == ',' || *s == '|' || *s == '@' || *s == '*')
+ c++;
+  s++;
+}
+  return c;
+}
+
+
+int
+f4 (char *s)
+{
+  int c = 0;
+  int i;
+  for (i = 0; i < 64; i++)
+{
+  if (*s == ',' || *s == '|' || *s == '@' || *s == '*')
+   c++;
+  s++;
+}
+  return c;
+}
+
+#define CHECK(f, str, res) \
+  __builtin_strcpy(buf, str); n = f(buf); if (n != res) abort();
+
+int
+main ()
+{
+  int n;
+  char buf[64];
+
+  CHECK (f1, ",,", 10);
+  CHECK (f1, "||", 10);
+  CHECK (f1, "aa", 0);
+  CHECK (f1, "", 0);
+  CHECK (f1, ",|,|xx", 4);
+
+  CHECK (f2, ",|,|xx", 4);
+  CHECK (f2, ",|,|xx", 4);
+  CHECK (f2, ",|,|xx", 4);
+  CHECK (f2, ",|,|xx", 4);
+
+  CHECK (f3, ",|,|xx", 4);
+  CHECK (f3, ",|,|xx", 4);
+  CHECK (f3, ",|,|xx", 4);
+  CHECK (f3, ",|,|xx", 4);
+
+  CHECK (f4, ",|,|xx", 4);
+  CHECK (f4, ",|,|xx", 4);
+  CHECK (f4, ",|,|xx", 4);
+  CHECK (f4, ",|,|xx", 4);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect"  } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-2.c 
b/gcc/testsuite/gcc.dg/vect/vect-switc

Re: [PATCH 0/1] Initial support for AVX10.2

2024-08-04 Thread Andi Kleen
> BTW, I noticed that in LLVM there is FP8 support for ARM currently
> undergoing. I will have a look on it to see if everything is mature.

There's even FP8 work for ARM work under way for gcc, see
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659248.html


-andi


Re: [PATCH] PR116080: Fix test suite checks for musttail

2024-08-02 Thread Andi Kleen
Andi Kleen  writes:

> From: Andi Kleen 
>
> This is a new attempt to fix PR116080. The previous try was reverted
> because it just broke a bunch of tests, hiding the problem.

The previous version still had one failure on powerpc because
of a template call that needs a dg-error check for external_tail_call.
I fixed that now in the below version.

Okay for trunk? I would like to check that one in to avoid the noise
in the regression reports.

---

This is a new attempt to fix PR116080. The previous try was reverted
because it just broke a bunch of tests, hiding the problem.

- musttail behaves differently than tailcall at -O0. Some of the test
run at -O0, so add separate effective target tests for musttail.
- New effective target tests need to use unique file names
to make dejagnu caching work
- Change the tests to use new targets
- Add a external_musttail test to check for target's ability
to do tail calls between translation units. This covers some powerpc
ABIs.

gcc/testsuite/ChangeLog:

PR testsuite/116080
* c-c++-common/musttail1.c: Use musttail target.
* c-c++-common/musttail12.c: Use struct_musttail target.
* c-c++-common/musttail2.c: Use musttail target.
* c-c++-common/musttail3.c: Likewise.
* c-c++-common/musttail4.c: Likewise.
* c-c++-common/musttail7.c: Likewise.
* c-c++-common/musttail8.c: Likewise.
* g++.dg/musttail10.C: Likewise. Replace powerpc checks with
external_musttail.
* g++.dg/musttail11.C: Use musttail target.
* g++.dg/musttail6.C: Use musttail target. Replace powerpc
checks with external_musttail.
* g++.dg/musttail9.C: Use musttail target.
* lib/target-supports.exp: Add musttail, struct_musttail,
external_musttail targets. Remove optimization for musttail.
Use unique file names for musttail.
---
 gcc/testsuite/c-c++-common/musttail1.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail12.c |  2 +-
 gcc/testsuite/c-c++-common/musttail2.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail3.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail4.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail7.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail8.c  |  2 +-
 gcc/testsuite/g++.dg/musttail10.C   |  6 ++---
 gcc/testsuite/g++.dg/musttail11.C   |  2 +-
 gcc/testsuite/g++.dg/musttail6.C|  4 ++--
 gcc/testsuite/g++.dg/musttail9.C|  2 +-
 gcc/testsuite/lib/target-supports.exp   | 30 -
 12 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/musttail1.c 
b/gcc/testsuite/c-c++-common/musttail1.c
index 74efcc2a0bc6..51549672e02a 100644
--- a/gcc/testsuite/c-c++-common/musttail1.c
+++ b/gcc/testsuite/c-c++-common/musttail1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { musttail && { c || c++11 } } } } */
 /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
 
 int __attribute__((noinline,noclone,noipa))
diff --git a/gcc/testsuite/c-c++-common/musttail12.c 
b/gcc/testsuite/c-c++-common/musttail12.c
index 4140bcd00950..475afc5af3f3 100644
--- a/gcc/testsuite/c-c++-common/musttail12.c
+++ b/gcc/testsuite/c-c++-common/musttail12.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { struct_tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { struct_musttail && { c || c++11 } } } } */
 /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
 
 struct str
diff --git a/gcc/testsuite/c-c++-common/musttail2.c 
b/gcc/testsuite/c-c++-common/musttail2.c
index 86f2c3d77404..1970c4edd670 100644
--- a/gcc/testsuite/c-c++-common/musttail2.c
+++ b/gcc/testsuite/c-c++-common/musttail2.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { musttail && { c || c++11 } } } } */
 
 struct box { char field[256]; int i; };
 
diff --git a/gcc/testsuite/c-c++-common/musttail3.c 
b/gcc/testsuite/c-c++-common/musttail3.c
index ea9589c59ef2..7499fd6460b4 100644
--- a/gcc/testsuite/c-c++-common/musttail3.c
+++ b/gcc/testsuite/c-c++-common/musttail3.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { struct_musttail && { c || c++11 } } } } */
 
 extern int foo2 (int x, ...);
 
diff --git a/gcc/testsuite/c-c++-common/musttail4.c 
b/gcc/testsuite/c-c++-common/musttail4.c
index 23f4b5e1cd68..bd6effa4b931 100644
--- a/gcc/testsuite/c-c++-common/musttail4.c
+++ b/gcc/testsuite/c-c++-common/musttail4.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { musttail && { c || c++11 } } } } */
 
 struct box { char field[64]; int i; };
 
diff --git a/gcc/testsuite/c-c++-common/musttail7.c 
b/gcc/testsuite/c-c++-common/mus

Re: [PATCH 0/1] Initial support for AVX10.2

2024-08-02 Thread Andi Kleen
> 
> INT8 is actually char per my understanding.
> 
> For FP8, currently there is no basic calculation insts yet. So we have no
> support for them in AVX10.2 currently, and treat them just as a piece
> of char.
> 
> Also there might be other issues for FP8 to discuss, like ABI issues, so
> we put the support aside for now. When everything is mature, we may
> add the support for that.

But then it's too late isn't it? You wouldn't be able to change
the types of the existing intrinsics anymore, or later end up with
two sets of intrinsics, and end up with interoperability problems
with full computation.

Better to define proper types from the beginning.

-Andi


Re: [PATCH 0/1] Initial support for AVX10.2

2024-08-01 Thread Andi Kleen
Haochen Jiang  writes:

> Hi all,
>
> AVX10.2 tech details has been just published on July 31st in the
> following link:
>
> https://cdrdv2.intel.com/v1/dl/getContent/828965
>
> For new features and instructions, we could divide them into two parts.
> One is ymm rounding control, the other is the new instructions.
>
> In the following weeks, we plan to upstream ymm rounding part first,
> following by new instructions. After all of them upstreamed, we will
> also upstream several patches optimizing codegen with new AVX10.2
> instructions.

Are there plans to make INT8/FP8 types supported by the compiler?
Or just supporting it through some intrinsics?

It seems explicit types would be much more convenient to use
for developers, although it has some drawbacks (like accuracy
depending on spills)

I realize it's likely a lot more work, but it might be worth it?

-Andi


Re: [PATCH] middle-end/114563 - improve release_pages

2024-07-31 Thread Andi Kleen
On Wed, Jul 31, 2024 at 04:02:22PM +0200, Richard Biener wrote:
> The following improves release_pages when using the madvise path
> to sort the freelist to get more page entries contiguous and possibly
> release them.  This populates the unused prev pointer so the reclaim
> can then easily unlink from the freelist without re-ordering it.
> The paths not having madvise do not keep the memory allocated, so
> I left them untouched.
> 
> Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> I've CCed people messing with release_pages;  This doesn't really
> address PR114563 but I thought I post this patch anyway - the
> actual issue we run into for the PR is the linear search of
> G.free_pages when that list becomes large but a requested allocation
> cannot be served from it.
> 
>   PR middle-end/114563
>   * ggc-page.cc (page_sort): New qsort comparator.
>   (release_pages): Sort the free_pages list entries after their
>   memory block virtual address to improve contiguous memory
>   chunk release.

I saw this in a profile some time ago and tried it with a slightly
different patch. Instead of a full sort it uses an array to keep
multiple free lists. But I couldn't find any speed ups in non checking
builds later.

My feeling is that an array is probably more efficient.

I guess should compare both on that PR.


diff --git a/gcc/ggc-page.cc b/gcc/ggc-page.cc
index 4245f843a29f..af1627b002c6 100644
--- a/gcc/ggc-page.cc
+++ b/gcc/ggc-page.cc
@@ -234,6 +234,8 @@ static struct
 }
 inverse_table[NUM_ORDERS];
 
+struct free_list;
+
 /* A page_entry records the status of an allocation page.  This
structure is dynamically sized to fit the bitmap in_use_p.  */
 struct page_entry
@@ -251,6 +253,9 @@ struct page_entry
  of the host system page size.)  */
   size_t bytes;
 
+  /* Free list of this page size.  */
+  struct free_list *free_list;
+
   /* The address at which the memory is allocated.  */
   char *page;
 
@@ -368,6 +373,15 @@ struct free_object
 };
 #endif
 
+constexpr int num_free_list = 8;
+
+/* A free_list for pages with BYTES size.  */
+struct free_list
+{
+  size_t bytes;
+  page_entry *free_pages;
+};
+
 /* The rest of the global variables.  */
 static struct ggc_globals
 {
@@ -412,8 +426,8 @@ static struct ggc_globals
   int dev_zero_fd;
 #endif
 
-  /* A cache of free system pages.  */
-  page_entry *free_pages;
+  /* A cache of free system pages. Entry 0 is fallback.  */
+  struct free_list free_lists[num_free_list];
 
 #ifdef USING_MALLOC_PAGE_GROUPS
   page_group *page_groups;
@@ -754,6 +768,26 @@ clear_page_group_in_use (page_group *group, char *page)
 }
 #endif
 
+/* Find a free list for ENTRY_SIZE.  */
+
+static inline struct free_list *
+find_free_list (size_t entry_size)
+{
+  int i;
+  for (i = 1; i < num_free_list; i++)
+{
+  if (G.free_lists[i].bytes == entry_size)
+   return &G.free_lists[i];
+  if (G.free_lists[i].bytes == 0)
+   {
+ G.free_lists[i].bytes = entry_size;
+ return &G.free_lists[i];
+   }
+}
+  /* Fallback.  */
+  return &G.free_lists[0];
+}
+
 /* Allocate a new page for allocating objects of size 2^ORDER,
and return an entry for it.  The entry is not added to the
appropriate page_table list.  */
@@ -770,6 +804,7 @@ alloc_page (unsigned order)
 #ifdef USING_MALLOC_PAGE_GROUPS
   page_group *group;
 #endif
+  struct free_list *free_list;
 
   num_objects = OBJECTS_PER_PAGE (order);
   bitmap_size = BITMAP_SIZE (num_objects + 1);
@@ -782,8 +817,10 @@ alloc_page (unsigned order)
   entry = NULL;
   page = NULL;
 
+  free_list = find_free_list (entry_size);
+
   /* Check the list of free pages for one we can use.  */
-  for (pp = &G.free_pages, p = *pp; p; pp = &p->next, p = *pp)
+  for (pp = &free_list->free_pages, p = *pp; p; pp = &p->next, p = *pp)
 if (p->bytes == entry_size)
   break;
 
@@ -816,7 +853,7 @@ alloc_page (unsigned order)
   /* We want just one page.  Allocate a bunch of them and put the
 extras on the freelist.  (Can only do this optimization with
 mmap for backing store.)  */
-  struct page_entry *e, *f = G.free_pages;
+  struct page_entry *e, *f = free_list->free_pages;
   int i, entries = GGC_QUIRE_SIZE;
 
   page = alloc_anon (NULL, G.pagesize * GGC_QUIRE_SIZE, false);
@@ -833,12 +870,13 @@ alloc_page (unsigned order)
  e = XCNEWVAR (struct page_entry, page_entry_size);
  e->order = order;
  e->bytes = G.pagesize;
+ e->free_list = free_list;
  e->page = page + (i << G.lg_pagesize);
  e->next = f;
  f = e;
}
 
-  G.free_pages = f;
+  free_list->free_pages = f;
 }
   else
 page = alloc_anon (NULL, entry_size, true);
@@ -904,12 +942,13 @@ alloc_page (unsigned order)
  e = XCNEWVAR (struct page_entry, page_entry_size);
  e->order = order;
  e->bytes = G.pagesize;
+ e->free_list = free_list;
   

Re: [PATCH] Add a bootstrap-native build config

2024-07-30 Thread Andi Kleen
> > +BOOT_CFLAGS := -march=native -mtune=native $(BOOT_CFLAGS)
> 
> I was under the impression that -mtune=native is useless with
> -march=native. Is that wrong?

On x86 it's right, but not sure about other architectures. I suppose
it doesn't hurt.

-Andi


Re: [PATCH 2/2] Add AVX2 code path to lexer

2024-07-30 Thread Andi Kleen
> Is that from some kind of rigorous measurement under perf? As you
> surely know, 0.6% wall-clock time can be from boost clock variation
> or just run-to-run noise on x86.

I compared it using hyperfine which does rigorous measurements yes.
It was well above the run-to-run variability.

I had some other patches that didn't meet that bar, e.g. 
i've been experimenting with more modern hashes for inchash
and multiple ggc free lists, but so far no above noise
results.

> 
> I have looked at this code before. When AVX2 is available, so is SSSE3,
> and then a much more efficient approach is available: instead of comparing
> against \r \n \\ ? one-by-one, build a vector
> 
>   0  1  2  3  4  5  6  7  8  9a   bc d   e   f
> { 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, '\n', 0, '\\', '\r', 0, '?' }
> 
> where each character C we're seeking is at position (C % 16). Then
> you can match against them all at once using PSHUFB:
> 
>   t = _mm_shuffle_epi8 (lut, data);
>   t = t == data;

I thought the PSHUFB trick only worked for some bit patterns?

At least according to this paper: https://arxiv.org/pdf/1902.08318

But yes if it applies here it's a good idea.


> 
> As you might recognize this handily beats the fancy SSE4.1 loop as well.
> I did not pursue this because I did not measure a substantial improvement
> (we're way into the land of diminishing returns here) and it seemed like
> maintainers might not like to be distracted with that, but if we are
> touching this code, might as well use the more efficient algorithm.
> I'll be happy to propose a patch if people think it's worthwhile.

Yes makes sense.

(of course it would be even better to teach the vectorizer about it,
although this will require fixing some other issues first, see PR116126)

-Andi


[PATCH] Add a bootstrap-native build config

2024-07-30 Thread Andi Kleen
From: Andi Kleen 

... that uses -march=native -mtune=native to build a compiler optimized
for the host.

config/ChangeLog:

* bootstrap-native.mk: New file.

gcc/ChangeLog:

* doc/install.texi: Document bootstrap-native.
---
 config/bootstrap-native.mk | 1 +
 gcc/doc/install.texi   | 6 ++
 2 files changed, 7 insertions(+)
 create mode 100644 config/bootstrap-native.mk

diff --git a/config/bootstrap-native.mk b/config/bootstrap-native.mk
new file mode 100644
index ..a4a3d8594089
--- /dev/null
+++ b/config/bootstrap-native.mk
@@ -0,0 +1 @@
+BOOT_CFLAGS := -march=native -mtune=native $(BOOT_CFLAGS)
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 4973f195daf9..29827c5106f8 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3052,6 +3052,12 @@ Removes any @option{-O}-started option from 
@code{BOOT_CFLAGS}, and adds
 @itemx @samp{bootstrap-Og}
 Analogous to @code{bootstrap-O1}.
 
+@item @samp{bootstrap-native}
+@itemx @samp{bootstrap-native}
+Optimize the compiler code for the build host, if supported by the
+architecture. Note this only affects the compiler, not the targeted
+code. If you want the later use @samp{--with-cpu}.
+
 @item @samp{bootstrap-lto}
 Enables Link-Time Optimization for host tools during bootstrapping.
 @samp{BUILD_CONFIG=bootstrap-lto} is equivalent to adding
-- 
2.45.2



Re: [PATCH 2/2] Add AVX2 code path to lexer

2024-07-30 Thread Andi Kleen
Andrew Pinski  writes:
>
> Using the builtin here seems wrong. Why not use the intrinsic
> _mm256_movemask_epi8 ?

I followed the rest of the vectorized code paths. The original reason was that
there was some incompatibility of the intrinsic header with the source
build. I don't know if it's still true, but I guess it doesn't hurt.

> Also it might make sense to remove the MMX version.

See the previous patch.

-Andi



[PATCH 1/2] Remove MMX code path in lexer

2024-07-30 Thread Andi Kleen
From: Andi Kleen 

Host systems with only MMX and no SSE2 should be really rare now.
Let's remove the MMX code path to keep the number of custom
implementations the same.

The SSE2 code path is also somewhat dubious now (nearly everything
should have SSE4 4.2 which is >15 years old now), but the SSE2
code path is used as fallback for others and also apparently
Solaris uses it due to tool chain deficiencies.

libcpp/ChangeLog:

* lex.cc (search_line_mmx): Remove function.
(init_vectorized_lexer): Remove search_line_mmx.
---
 libcpp/lex.cc | 75 ---
 1 file changed, 75 deletions(-)

diff --git a/libcpp/lex.cc b/libcpp/lex.cc
index 16f2c23af1e1..1591dcdf151a 100644
--- a/libcpp/lex.cc
+++ b/libcpp/lex.cc
@@ -290,71 +290,6 @@ static const char repl_chars[4][16] 
__attribute__((aligned(16))) = {
 '?', '?', '?', '?', '?', '?', '?', '?' },
 };
 
-/* A version of the fast scanner using MMX vectorized byte compare insns.
-
-   This uses the PMOVMSKB instruction which was introduced with "MMX2",
-   which was packaged into SSE1; it is also present in the AMD MMX
-   extension.  Mark the function as using "sse" so that we emit a real
-   "emms" instruction, rather than the 3dNOW "femms" instruction.  */
-
-static const uchar *
-#ifndef __SSE__
-__attribute__((__target__("sse")))
-#endif
-search_line_mmx (const uchar *s, const uchar *end ATTRIBUTE_UNUSED)
-{
-  typedef char v8qi __attribute__ ((__vector_size__ (8)));
-  typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__));
-
-  const v8qi repl_nl = *(const v8qi *)repl_chars[0];
-  const v8qi repl_cr = *(const v8qi *)repl_chars[1];
-  const v8qi repl_bs = *(const v8qi *)repl_chars[2];
-  const v8qi repl_qm = *(const v8qi *)repl_chars[3];
-
-  unsigned int misalign, found, mask;
-  const v8qi *p;
-  v8qi data, t, c;
-
-  /* Align the source pointer.  While MMX doesn't generate unaligned data
- faults, this allows us to safely scan to the end of the buffer without
- reading beyond the end of the last page.  */
-  misalign = (uintptr_t)s & 7;
-  p = (const v8qi *)((uintptr_t)s & -8);
-  data = *p;
-
-  /* Create a mask for the bytes that are valid within the first
- 16-byte block.  The Idea here is that the AND with the mask
- within the loop is "free", since we need some AND or TEST
- insn in order to set the flags for the branch anyway.  */
-  mask = -1u << misalign;
-
-  /* Main loop processing 8 bytes at a time.  */
-  goto start;
-  do
-{
-  data = *++p;
-  mask = -1;
-
-start:
-  t = __builtin_ia32_pcmpeqb(data, repl_nl);
-  c = __builtin_ia32_pcmpeqb(data, repl_cr);
-  t = (v8qi) __builtin_ia32_por ((__m64)t, (__m64)c);
-  c = __builtin_ia32_pcmpeqb(data, repl_bs);
-  t = (v8qi) __builtin_ia32_por ((__m64)t, (__m64)c);
-  c = __builtin_ia32_pcmpeqb(data, repl_qm);
-  t = (v8qi) __builtin_ia32_por ((__m64)t, (__m64)c);
-  found = __builtin_ia32_pmovmskb (t);
-  found &= mask;
-}
-  while (!found);
-
-  __builtin_ia32_emms ();
-
-  /* FOUND contains 1 in bits for which we matched a relevant
- character.  Conversion to the byte index is trivial.  */
-  found = __builtin_ctz(found);
-  return (const uchar *)p + found;
-}
 
 /* A version of the fast scanner using SSE2 vectorized byte compare insns.  */
 
@@ -509,8 +444,6 @@ init_vectorized_lexer (void)
   minimum = 3;
 #elif defined(__SSE2__)
   minimum = 2;
-#elif defined(__SSE__)
-  minimum = 1;
 #endif
 
   if (minimum == 3)
@@ -521,14 +454,6 @@ init_vectorized_lexer (void)
 impl = search_line_sse42;
   else if (minimum == 2 || (edx & bit_SSE2))
impl = search_line_sse2;
-  else if (minimum == 1 || (edx & bit_SSE))
-   impl = search_line_mmx;
-}
-  else if (__get_cpuid (0x8001, &dummy, &dummy, &dummy, &edx))
-{
-  if (minimum == 1
- || (edx & (bit_MMXEXT | bit_CMOV)) == (bit_MMXEXT | bit_CMOV))
-   impl = search_line_mmx;
 }
 
   search_line_fast = impl;
-- 
2.45.2



[PATCH 2/2] Add AVX2 code path to lexer

2024-07-30 Thread Andi Kleen
From: Andi Kleen 

AVX2 is widely available on x86 and it allows to do the scanner line
check with 32 bytes at a time. The code is similar to the SSE2 code
path, just using AVX and 32 bytes at a time instead of SSE2 16 bytes.

Also adjust the code to allow inlining when the compiler
is built for an AVX2 host, following what other architectures
do.

I see about a ~0.6% compile time improvement for compiling i386
insn-recog.i with -O0.

libcpp/ChangeLog:

* config.in (HAVE_AVX2): Add.
* configure: Regenerate.
* configure.ac: Add HAVE_AVX2 check.
* lex.cc (repl_chars): Extend to 32 bytes.
(search_line_avx2): New function to scan line using AVX2.
(init_vectorized_lexer): Check for AVX2 in CPUID.
---
 libcpp/config.in|  3 ++
 libcpp/configure| 17 +
 libcpp/configure.ac |  3 ++
 libcpp/lex.cc   | 91 +++--
 4 files changed, 110 insertions(+), 4 deletions(-)

diff --git a/libcpp/config.in b/libcpp/config.in
index 253ef03a3dea..8fad6bd4b4f5 100644
--- a/libcpp/config.in
+++ b/libcpp/config.in
@@ -213,6 +213,9 @@
 /* Define to 1 if you can assemble SSE4 insns. */
 #undef HAVE_SSE4
 
+/* Define to 1 if you can assemble AVX2 insns. */
+#undef HAVE_AVX2
+
 /* Define to 1 if you have the  header file. */
 #undef HAVE_STDDEF_H
 
diff --git a/libcpp/configure b/libcpp/configure
index 32d6aaa30699..6d9286ac9601 100755
--- a/libcpp/configure
+++ b/libcpp/configure
@@ -9149,6 +9149,23 @@ if ac_fn_c_try_compile "$LINENO"; then :
 
 $as_echo "#define HAVE_SSE4 1" >>confdefs.h
 
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+asm ("vpcmpeqb %%ymm0, %%ymm4, %%ymm5" : : "i"(0))
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+
+$as_echo "#define HAVE_AVX2 1" >>confdefs.h
+
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 esac
diff --git a/libcpp/configure.ac b/libcpp/configure.ac
index b883fec776fe..c06609827924 100644
--- a/libcpp/configure.ac
+++ b/libcpp/configure.ac
@@ -200,6 +200,9 @@ case $target in
 AC_TRY_COMPILE([], [asm ("pcmpestri %0, %%xmm0, %%xmm1" : : "i"(0))],
   [AC_DEFINE([HAVE_SSE4], [1],
 [Define to 1 if you can assemble SSE4 insns.])])
+AC_TRY_COMPILE([], [asm ("vpcmpeqb %%ymm0, %%ymm4, %%ymm5" : : "i"(0))],
+  [AC_DEFINE([HAVE_AVX2], [1],
+[Define to 1 if you can assemble AVX2 insns.])])
 esac
 
 # Enable --enable-host-shared.
diff --git a/libcpp/lex.cc b/libcpp/lex.cc
index 1591dcdf151a..72f3402aac99 100644
--- a/libcpp/lex.cc
+++ b/libcpp/lex.cc
@@ -278,19 +278,31 @@ search_line_acc_char (const uchar *s, const uchar *end 
ATTRIBUTE_UNUSED)
 /* Replicated character data to be shared between implementations.
Recall that outside of a context with vector support we can't
define compatible vector types, therefore these are all defined
-   in terms of raw characters.  */
-static const char repl_chars[4][16] __attribute__((aligned(16))) = {
+   in terms of raw characters.
+   gcc constant propagates this and usually turns it into a
+   vector broadcast, so it actually disappears.  */
+
+static const char repl_chars[4][32] __attribute__((aligned(32))) = {
   { '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n',
+'\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n',
+'\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n',
 '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n' },
   { '\r', '\r', '\r', '\r', '\r', '\r', '\r', '\r',
+'\r', '\r', '\r', '\r', '\r', '\r', '\r', '\r',
+'\r', '\r', '\r', '\r', '\r', '\r', '\r', '\r',
 '\r', '\r', '\r', '\r', '\r', '\r', '\r', '\r' },
   { '\\', '\\', '\\', '\\', '\\', '\\', '\\', '\\',
+'\\', '\\', '\\', '\\', '\\', '\\', '\\', '\\',
+'\\', '\\', '\\', '\\', '\\', '\\', '\\', '\\',
 '\\', '\\', '\\', '\\', '\\

[PATCH] PR116080: Fix test suite checks for musttail

2024-07-29 Thread Andi Kleen
From: Andi Kleen 

This is a new attempt to fix PR116080. The previous try was reverted
because it just broke a bunch of tests, hiding the problem.

- musttail behaves differently than tailcall at -O0. Some of the test
run at -O0, so add separate effective target tests for musttail.
- New effective target tests need to use unique file names
to make dejagnu caching work
- Change the tests to use new targets
- Add a external_musttail test to check for target's ability
to do tail calls between translation units. This covers some powerpc
ABIs.

gcc/testsuite/ChangeLog:

PR testsuite/116080
* c-c++-common/musttail1.c: Use musttail target.
* c-c++-common/musttail12.c: Use struct_musttail target.
* c-c++-common/musttail2.c: Use musttail target.
* c-c++-common/musttail3.c: Likewise.
* c-c++-common/musttail4.c: Likewise.
* c-c++-common/musttail7.c: Likewise.
* c-c++-common/musttail8.c: Likewise.
* g++.dg/musttail10.C: Likewise. Replace powerpc checks with
external_musttail.
* g++.dg/musttail11.C: Use musttail target.
* g++.dg/musttail6.C: Use musttail target. Replace powerpc
checks with external_musttail.
* g++.dg/musttail9.C: Use musttail target.
* lib/target-supports.exp: Add musttail, struct_musttail,
external_musttail targets. Remove optimization for musttail.
Use unique file names for musttail.
---
 gcc/testsuite/c-c++-common/musttail1.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail12.c |  2 +-
 gcc/testsuite/c-c++-common/musttail2.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail3.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail4.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail7.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail8.c  |  2 +-
 gcc/testsuite/g++.dg/musttail10.C   |  4 ++--
 gcc/testsuite/g++.dg/musttail11.C   |  2 +-
 gcc/testsuite/g++.dg/musttail6.C|  4 ++--
 gcc/testsuite/g++.dg/musttail9.C|  2 +-
 gcc/testsuite/lib/target-supports.exp   | 30 -
 12 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/musttail1.c 
b/gcc/testsuite/c-c++-common/musttail1.c
index 74efcc2a0bc6..51549672e02a 100644
--- a/gcc/testsuite/c-c++-common/musttail1.c
+++ b/gcc/testsuite/c-c++-common/musttail1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { musttail && { c || c++11 } } } } */
 /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
 
 int __attribute__((noinline,noclone,noipa))
diff --git a/gcc/testsuite/c-c++-common/musttail12.c 
b/gcc/testsuite/c-c++-common/musttail12.c
index 4140bcd00950..475afc5af3f3 100644
--- a/gcc/testsuite/c-c++-common/musttail12.c
+++ b/gcc/testsuite/c-c++-common/musttail12.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { struct_tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { struct_musttail && { c || c++11 } } } } */
 /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
 
 struct str
diff --git a/gcc/testsuite/c-c++-common/musttail2.c 
b/gcc/testsuite/c-c++-common/musttail2.c
index 86f2c3d77404..1970c4edd670 100644
--- a/gcc/testsuite/c-c++-common/musttail2.c
+++ b/gcc/testsuite/c-c++-common/musttail2.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { musttail && { c || c++11 } } } } */
 
 struct box { char field[256]; int i; };
 
diff --git a/gcc/testsuite/c-c++-common/musttail3.c 
b/gcc/testsuite/c-c++-common/musttail3.c
index ea9589c59ef2..7499fd6460b4 100644
--- a/gcc/testsuite/c-c++-common/musttail3.c
+++ b/gcc/testsuite/c-c++-common/musttail3.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { struct_musttail && { c || c++11 } } } } */
 
 extern int foo2 (int x, ...);
 
diff --git a/gcc/testsuite/c-c++-common/musttail4.c 
b/gcc/testsuite/c-c++-common/musttail4.c
index 23f4b5e1cd68..bd6effa4b931 100644
--- a/gcc/testsuite/c-c++-common/musttail4.c
+++ b/gcc/testsuite/c-c++-common/musttail4.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { musttail && { c || c++11 } } } } */
 
 struct box { char field[64]; int i; };
 
diff --git a/gcc/testsuite/c-c++-common/musttail7.c 
b/gcc/testsuite/c-c++-common/musttail7.c
index c753a3fe9b2a..d17cb71256d7 100644
--- a/gcc/testsuite/c-c++-common/musttail7.c
+++ b/gcc/testsuite/c-c++-common/musttail7.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { musttail && { c || c++11 } } } } */
 /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
 
 void __attribute__((noipa)) f() {}
diff --git a/gcc/testsuite

Re: [PATCH v1 1/2] PR116080: Fix tail call dejagnu checks

2024-07-29 Thread Andi Kleen



I'm going to revert the patch for now. There are two problems:

- The new tests don't have a unique name so the caching confuses 
the results.
- To test with -O2 we need explicit musttail checks because tail call doesn't
run with -O0 w/o musttail.



Re: [PATCH v1 1/2] PR116080: Fix tail call dejagnu checks

2024-07-29 Thread Andi Kleen
> ..., that means that a number of the new test cases are UNSUPPORTED, for
> example, x86_64 GNU/Linux:
> 
> +UNSUPPORTED: c-c++-common/musttail1.c  -Wc++-compat 
> +UNSUPPORTED: c-c++-common/musttail12.c  -Wc++-compat 
> +PASS: c-c++-common/musttail13.c  -Wc++-compat   (test for errors, line 4)
> +PASS: c-c++-common/musttail13.c  -Wc++-compat  (test for excess errors)
> +UNSUPPORTED: c-c++-common/musttail2.c  -Wc++-compat 
> +UNSUPPORTED: c-c++-common/musttail3.c  -Wc++-compat 
> +UNSUPPORTED: c-c++-common/musttail4.c  -Wc++-compat 
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for errors, line 17)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 
> 10)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 
> 11)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 
> 12)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 
> 24)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 
> 25)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 
> 26)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 
> 5)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 
> 6)
> +PASS: c-c++-common/musttail5.c  -Wc++-compat  (test for excess errors)
> +UNSUPPORTED: c-c++-common/musttail7.c  -Wc++-compat 
> +UNSUPPORTED: c-c++-common/musttail8.c  -Wc++-compat 
> 
> (Similarly for their C++ testing.)
> 
> +UNSUPPORTED: g++.dg/musttail10.C  
> +UNSUPPORTED: g++.dg/musttail11.C  
> +UNSUPPORTED: g++.dg/musttail6.C  
> +UNSUPPORTED: g++.dg/musttail9.C  
> 
> ..., and even a few existing test cases "regress" from PASS to
> UNSUPPORTED:
> 
> [-PASS:-]{+UNSUPPORTED:+} gcc.dg/plugin/must-tail-call-1.c 
> -fplugin=./must_tail_call_plugin.so[-(test for excess errors)-]
> [-PASS:-]{+UNSUPPORTED:+} gcc.dg/plugin/must-tail-call-2.c 
> -fplugin=./must_tail_call_plugin.so[-(test for errors, line 18)-]
> [-PASS: gcc.dg/plugin/must-tail-call-2.c 
> -fplugin=./must_tail_call_plugin.so  (test for errors, line 33)-]
> [-PASS: gcc.dg/plugin/must-tail-call-2.c 
> -fplugin=./must_tail_call_plugin.so  (test for errors, line 40)-]
> [-PASS: gcc.dg/plugin/must-tail-call-2.c 
> -fplugin=./must_tail_call_plugin.so  (test for errors, line 49)-]
> [-PASS: gcc.dg/plugin/must-tail-call-2.c 
> -fplugin=./must_tail_call_plugin.so  (test for errors, line 58)-]
> [-PASS: gcc.dg/plugin/must-tail-call-2.c 
> -fplugin=./must_tail_call_plugin.so (test for excess errors)-]
> 
> Similarly for ppc64le GNU/Linux.
> 
> Is that intentional?

Thanks.  I will take a look. At least on x86_64-linux everything should
be supported. On powerpc and ARM I expect some unsupported. 

But the previous test cases shouldn't have changed. Maybe we need
more tail_call dejagnu tests that also enable -O2. 

The whole area is unfortunately somewhat of a mine field because of
lots of varying restrictions on tail calls, both with frontends
and targets.

-Andi


[PATCH v1 2/2] PR116019: Improve tail call error message

2024-07-25 Thread Andi Kleen
From: Andi Kleen 

The "tail call must be the same type" message is common on some
targets with C++, or without optimization. It is generated
when gcc believes there is an access of the return value
after the call. However usually it does not actually corespond
to a type mismatch, but can be caused for other reasons.

Make it slightly more vague to be less misleading.

gcc/ChangeLog:

PR c++/116019
* tree-tailcall.cc (find_tail_calls): Change tail call
error message.
---
 gcc/tree-tailcall.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-tailcall.cc b/gcc/tree-tailcall.cc
index a68079d4f507..1901b1a13f99 100644
--- a/gcc/tree-tailcall.cc
+++ b/gcc/tree-tailcall.cc
@@ -632,7 +632,7 @@ find_tail_calls (basic_block bb, struct tailcall **ret, 
bool only_musttail,
   && may_be_aliased (result_decl)
   && ref_maybe_used_by_stmt_p (call, result_decl, false))
 {
-  maybe_error_musttail (call, _("tail call must be same type"));
+  maybe_error_musttail (call, _("return value used after call"));
   return;
 }
 
-- 
2.45.2



[PATCH v1 1/2] PR116080: Fix tail call dejagnu checks

2024-07-25 Thread Andi Kleen
From: Andi Kleen 

- Run the target_effective tail_call checks without optimization to
match the actual test cases.
- Add an extra check for external tail calls to handle targets like
powerpc that cannot tail call between different object files.
This one will also cover templates.

gcc/testsuite/ChangeLog:

PR testsuite/116080
* g++.dg/musttail10.C: Use external tail call target check.
* g++.dg/musttail6.C: Dito.
* lib/target-supports.exp: Add external_tail_call. Disable
optimization for tail call checks.
---
 gcc/testsuite/g++.dg/musttail10.C |  2 +-
 gcc/testsuite/g++.dg/musttail6.C  |  2 +-
 gcc/testsuite/lib/target-supports.exp | 14 +++---
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/g++.dg/musttail10.C 
b/gcc/testsuite/g++.dg/musttail10.C
index ff7fcc7d8755..bd75affa2220 100644
--- a/gcc/testsuite/g++.dg/musttail10.C
+++ b/gcc/testsuite/g++.dg/musttail10.C
@@ -8,7 +8,7 @@ double g() { [[gnu::musttail]] return f(); } /* { dg-error 
"cannot tail-cal
 
 template 
 __attribute__((noinline, noclone, noipa))
-T g1() { [[gnu::musttail]] return f(); } /* { dg-error "target is not able" 
"" { target powerpc*-*-* } } */
+T g1() { [[gnu::musttail]] return f(); } /* { dg-error "target is not able" 
"" { target { external_tail_call } } } */
 
 template 
 __attribute__((noinline, noclone, noipa))
diff --git a/gcc/testsuite/g++.dg/musttail6.C b/gcc/testsuite/g++.dg/musttail6.C
index 5c6f69407ddb..81f6d9f3ca77 100644
--- a/gcc/testsuite/g++.dg/musttail6.C
+++ b/gcc/testsuite/g++.dg/musttail6.C
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { struct_tail_call } } } */
+/* { dg-require-effective-target external_tail_call } */
 /* A lot of architectures will not build this due to PR115606 and PR115607 */
-/* { dg-skip-if "powerpc does not support sibcall to templates" { powerpc*-*-* 
} } */
 /* { dg-options "-std=gnu++11" } */
 /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
 
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index d368251ef9a4..0a3946e82d4b 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -12741,7 +12741,15 @@ proc check_effective_target_tail_call { } {
 return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
__attribute__((__noipa__)) void foo (void) { }
__attribute__((__noipa__)) void bar (void) { foo(); }
-} {-O2 -fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed 
dump.
+} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
+}
+
+# Return 1 if the target can perform tail-calls for externals
+proc check_effective_target_external_tail_call { } {
+return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
+   extern __attribute__((__noipa__)) void foo (void);
+   __attribute__((__noipa__)) void bar (void) { foo(); }
+} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
 }
 
 # Return 1 if the target can perform tail-call optimizations for structures
@@ -12751,9 +12759,9 @@ proc check_effective_target_struct_tail_call { } {
 return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
// C++
struct foo { int a, b; };
-   __attribute__((__noipa__)) struct foo foo (void) { return {}; }
+   extern __attribute__((__noipa__)) struct foo foo (void);
__attribute__((__noipa__)) struct foo bar (void) { return foo(); }
-} {-O2 -fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed 
dump.
+} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
 }
 
 # Return 1 if the target's calling sequence or its ABI
-- 
2.45.2



Re: [PATCH v10 1/3] C++: Support clang compatible [[musttail]] (PR83324)

2024-07-18 Thread Andi Kleen


Updated patch with the !retval bug fix identified by Marek.

This patch implements a clang compatible [[musttail]] attribute for
returns.
  
musttail is useful as an alternative to computed goto for interpreters.
With computed goto the interpreter function usually ends up very big
which causes problems with register allocation and other per function
optimizations not scaling. With musttail the interpreter can be instead
written as a sequence of smaller functions that call each other. To
avoid unbounded stack growth this requires forcing a sibling call, which
this attribute does. It guarantees an error if the call cannot be tail
called which allows the programmer to fix it instead of risking a stack
overflow. Unlike computed goto it is also type-safe.
   
It turns out that David Malcolm had already implemented middle/backend
support for a musttail attribute back in 2016, but it wasn't exposed
to any frontend other than a special plugin.

This patch adds a [[gnu::musttail]] attribute for C++ that can be added
to return statements. The return statement must be a direct call
(it does not follow dependencies), which is similar to what clang
implements. It then uses the existing must tail infrastructure.

For compatibility it also detects clang::musttail

Passes bootstrap and full test

gcc/c-family/ChangeLog:

* c-attribs.cc (set_musttail_on_return): New function.
* c-common.h (set_musttail_on_return): Declare new function.

gcc/cp/ChangeLog:
 
PR c/83324
* cp-tree.h (AGGR_INIT_EXPR_MUST_TAIL): Add.
* parser.cc (cp_parser_statement): Handle musttail.
(cp_parser_jump_statement): Dito.
* pt.cc (tsubst_expr): Copy CALL_EXPR_MUST_TAIL_CALL.
* semantics.cc (simplify_aggr_init_expr): Handle musttail.

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 5adc7b775eaf..685f212683f4 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -672,6 +672,26 @@ attribute_takes_identifier_p (const_tree attr_id)
 return targetm.attribute_takes_identifier_p (attr_id);
 }
 
+/* Set a musttail attribute MUSTTAIL_P on return expression RETVAL
+   at LOC.  */
+
+void
+set_musttail_on_return (tree retval, location_t loc, bool musttail_p)
+{
+  if (retval && musttail_p)
+{
+  tree t = retval;
+  if (TREE_CODE (t) == TARGET_EXPR)
+   t = TARGET_EXPR_INITIAL (t);
+  if (TREE_CODE (t) != CALL_EXPR)
+   error_at (loc, "cannot tail-call: return value must be a call");
+  else
+   CALL_EXPR_MUST_TAIL_CALL (t) = 1;
+}
+  else if (musttail_p && !retval)
+error_at (loc, "cannot tail-call: return value must be a call");
+}
+
 /* Verify that argument value POS at position ARGNO to attribute NAME
applied to function FN (which is either a function declaration or function
type) refers to a function parameter at position POS and the expected type
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index adee822a3ae0..2510ee4dbc9d 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1648,6 +1648,7 @@ extern tree handle_noreturn_attribute (tree *, tree, 
tree, int, bool *);
 extern tree handle_musttail_attribute (tree *, tree, tree, int, bool *);
 extern bool has_attribute (location_t, tree, tree, tree (*)(tree));
 extern tree build_attr_access_from_parms (tree, bool);
+extern void set_musttail_on_return (tree, location_t, bool);
 
 /* In c-format.cc.  */
 extern bool valid_format_string_type_p (tree);
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index c6f102564ce0..67ba3274eb1b 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -4236,6 +4236,10 @@ templated_operator_saved_lookups (tree t)
 #define AGGR_INIT_FROM_THUNK_P(NODE) \
   (AGGR_INIT_EXPR_CHECK (NODE)->base.protected_flag)
 
+/* Nonzero means that the call was marked musttail.  */
+#define AGGR_INIT_EXPR_MUST_TAIL(NODE) \
+  (AGGR_INIT_EXPR_CHECK (NODE)->base.static_flag)
+
 /* AGGR_INIT_EXPR accessors.  These are equivalent to the CALL_EXPR
accessors, except for AGGR_INIT_EXPR_SLOT (which takes the place of
CALL_EXPR_STATIC_CHAIN).  */
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index efd5d6f29a71..1fa0780944b6 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2467,7 +2467,7 @@ static tree cp_parser_perform_range_for_lookup
 static tree cp_parser_range_for_member_function
   (tree, tree);
 static tree cp_parser_jump_statement
-  (cp_parser *);
+  (cp_parser *, tree &);
 static void cp_parser_declaration_statement
   (cp_parser *);
 
@@ -12757,7 +12757,7 @@ cp_parser_statement (cp_parser* parser, tree 
in_statement_expr,
case RID_CO_RETURN:
case RID_GOTO:
  std_attrs = process_stmt_hotness_attribute (std_attrs, attrs_loc);
- statement = cp_parser_jump_statement (parser);
+ statement = cp_parser_jump_statement (parser, std_attrs);
  break;
 
  /* Objective-C++ exception-handling constructs.  */
@@ -14845,

Re: [PATCH v10 2/3] C: Implement musttail attribute for returns

2024-07-18 Thread Andi Kleen
> > > > +  set_musttail_on_return (retval, xloc, musttail_p);
> > > > +
> > > >if (retval)
> > > >  {
> > > >tree semantic_type = NULL_TREE;
> > > 
> > > Is it deliberate that set_musttail_on_return is called outside the
> > > if (retval) block?  If it can be moved into it, set_musttail_on_return
> > > can be simplified to assume that retval is always non-null.
> > 
> > Yes it can be removed.

Actually I was wrong here, after double checking. The !retval case is
needed to diagnose a [[musttail]] set on a plain return (which is not
allowed following the clang spec)

So the call has to be outside the check.

The C frontend did it correctly, but the C++ part did not (fixed now)

-Andi


Re: [PATCH v10 2/3] C: Implement musttail attribute for returns

2024-07-18 Thread Andi Kleen
On Thu, Jul 18, 2024 at 02:19:21PM -0400, Marek Polacek wrote:
> On Wed, Jul 17, 2024 at 09:30:00PM -0700, Andi Kleen wrote:
> > Implement a C23 clang compatible musttail attribute similar to the earlier
> > C++ implementation in the C parser.
> > 
> > gcc/c/ChangeLog:
> > 
> > PR c/83324
> > * c-parser.cc (struct attr_state): Define with musttail_p.
> > (c_parser_statement_after_labels): Handle [[musttail]].
> > (c_parser_std_attribute): Dito.
> > (c_parser_handle_musttail): Dito.
> > (c_parser_compound_statement_nostart): Dito.
> > (c_parser_all_labels): Dito.
> > (c_parser_statement): Dito.
> > * c-tree.h (c_finish_return): Add musttail_p flag.
> > * c-typeck.cc (c_finish_return): Handle musttail_p flag.
> > ---
> >  gcc/c/c-parser.cc | 70 ++-
> >  gcc/c/c-tree.h|  2 +-
> >  gcc/c/c-typeck.cc |  7 +++--
> >  3 files changed, 63 insertions(+), 16 deletions(-)
> > 
> > diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> > index 12c5ed5d92c7..a8848d01f21a 100644
> > --- a/gcc/c/c-parser.cc
> > +++ b/gcc/c/c-parser.cc
> > @@ -1621,6 +1621,12 @@ struct omp_for_parse_data {
> >bool fail : 1;
> >  };
> >  
> > +struct attr_state
> > +{
> > +  /* True if we parsed a musttail attribute for return.  */
> > +  bool musttail_p;
> > +};
> > +
> >  static bool c_parser_nth_token_starts_std_attributes (c_parser *,
> >   unsigned int);
> >  static tree c_parser_std_attribute_specifier_sequence (c_parser *);
> > @@ -1665,7 +1671,7 @@ static location_t c_parser_compound_statement_nostart 
> > (c_parser *);
> >  static void c_parser_label (c_parser *, tree);
> >  static void c_parser_statement (c_parser *, bool *, location_t * = NULL);
> >  static void c_parser_statement_after_labels (c_parser *, bool *,
> > -vec * = NULL);
> > +vec * = NULL, attr_state = 
> > {});
> 
> Nit: the line seems to go over 80 columns.

Ok.

> >  || c_parser_next_token_is_keyword (parser, RID_DEFAULT)
> >  || (c_parser_next_token_is (parser, CPP_NAME)
> > @@ -7346,7 +7384,10 @@ c_parser_all_labels (c_parser *parser)
> >std_attrs = NULL;
> >if ((have_std_attrs = c_parser_nth_token_starts_std_attributes 
> > (parser,
> >   1)))
> > -   std_attrs = c_parser_std_attribute_specifier_sequence (parser);
> > +   {
> > + std_attrs = c_parser_std_attribute_specifier_sequence (parser);
> > + std_attrs = c_parser_handle_musttail (parser, std_attrs, attr);
> > +   }
> 
> Thanks, I believe this addresses the testcase I mentioned earlier:
> 
>   struct str
>   {
> int a, b;
>   };
> 
>   struct str
>   cstruct (int x)
>   {
> if (x < 10)
>   L: // <
>   [[gnu::musttail]] return cstruct (x + 1);
> return ((struct str){ x, 0 });
>   }
> 
> but I didn't see that being tested in your testsuite patch; apologies if
> I missed it.

It wasn't there. I will add it.

> 
> >  tree
> > -c_finish_return (location_t loc, tree retval, tree origtype)
> > +c_finish_return (location_t loc, tree retval, tree origtype, bool 
> > musttail_p)
> >  {
> >tree valtype = TREE_TYPE (TREE_TYPE (current_function_decl)), ret_stmt;
> >bool no_warning = false;
> > @@ -11742,6 +11743,8 @@ c_finish_return (location_t loc, tree retval, tree 
> > origtype)
> >  warning_at (xloc, 0,
> > "function declared % has a % statement");
> >  
> > +  set_musttail_on_return (retval, xloc, musttail_p);
> > +
> >if (retval)
> >  {
> >tree semantic_type = NULL_TREE;
> 
> Is it deliberate that set_musttail_on_return is called outside the
> if (retval) block?  If it can be moved into it, set_musttail_on_return
> can be simplified to assume that retval is always non-null.

Yes it can be removed.

Is the patchk ok with these changes?

-Andi


[PATCH v10 3/3] Add tests for C/C++ musttail attributes

2024-07-17 Thread Andi Kleen
Some adopted from the existing C musttail plugin tests.
Also extends the ability to query the sibcall capabilities of the
target.

gcc/testsuite/ChangeLog:

* testsuite/lib/target-supports.exp
(check_effective_target_struct_tail_call): New function.
* c-c++-common/musttail1.c: New test.
* c-c++-common/musttail2.c: New test.
* c-c++-common/musttail3.c: New test.
* c-c++-common/musttail4.c: New test.
* c-c++-common/musttail7.c: New test.
* c-c++-common/musttail8.c: New test.
* g++.dg/musttail6.C: New test.
* g++.dg/musttail9.C: New test.
* g++.dg/musttail10.C: New test.
* g++.dg/musttail11.C: New test.
---
 gcc/testsuite/c-c++-common/musttail1.c | 14 ++
 gcc/testsuite/c-c++-common/musttail2.c | 33 ++
 gcc/testsuite/c-c++-common/musttail3.c | 29 +
 gcc/testsuite/c-c++-common/musttail4.c | 17 
 gcc/testsuite/c-c++-common/musttail5.c | 28 
 gcc/testsuite/c-c++-common/musttail7.c | 14 ++
 gcc/testsuite/c-c++-common/musttail8.c | 17 
 gcc/testsuite/g++.dg/musttail10.C  | 40 +
 gcc/testsuite/g++.dg/musttail11.C  | 33 ++
 gcc/testsuite/g++.dg/musttail6.C   | 60 ++
 gcc/testsuite/g++.dg/musttail9.C   | 10 +
 gcc/testsuite/lib/target-supports.exp  |  9 
 12 files changed, 304 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/musttail1.c
 create mode 100644 gcc/testsuite/c-c++-common/musttail2.c
 create mode 100644 gcc/testsuite/c-c++-common/musttail3.c
 create mode 100644 gcc/testsuite/c-c++-common/musttail4.c
 create mode 100644 gcc/testsuite/c-c++-common/musttail5.c
 create mode 100644 gcc/testsuite/c-c++-common/musttail7.c
 create mode 100644 gcc/testsuite/c-c++-common/musttail8.c
 create mode 100644 gcc/testsuite/g++.dg/musttail10.C
 create mode 100644 gcc/testsuite/g++.dg/musttail11.C
 create mode 100644 gcc/testsuite/g++.dg/musttail6.C
 create mode 100644 gcc/testsuite/g++.dg/musttail9.C

diff --git a/gcc/testsuite/c-c++-common/musttail1.c 
b/gcc/testsuite/c-c++-common/musttail1.c
new file mode 100644
index ..74efcc2a0bc6
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/musttail1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
+
+int __attribute__((noinline,noclone,noipa))
+callee (int i)
+{
+  return i * i;
+}
+
+int __attribute__((noinline,noclone,noipa))
+caller (int i)
+{
+  [[gnu::musttail]] return callee (i + 1);
+}
diff --git a/gcc/testsuite/c-c++-common/musttail2.c 
b/gcc/testsuite/c-c++-common/musttail2.c
new file mode 100644
index ..86f2c3d77404
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/musttail2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+
+struct box { char field[256]; int i; };
+
+int __attribute__((noinline,noclone,noipa))
+test_2_callee (int i, struct box b)
+{
+  if (b.field[0])
+return 5;
+  return i * i;
+}
+
+int __attribute__((noinline,noclone,noipa))
+test_2_caller (int i)
+{
+  struct box b;
+  [[gnu::musttail]] return test_2_callee (i + 1, b); /* { dg-error "cannot 
tail-call: " } */
+}
+
+extern void setjmp (void);
+void
+test_3 (void)
+{
+  [[gnu::musttail]] return setjmp (); /* { dg-error "cannot tail-call: " } */
+}
+
+extern float f7(void);
+
+int
+test_6 (void)
+{
+  [[gnu::musttail]] return f7(); /* { dg-error "cannot tail-call: " } */
+}
diff --git a/gcc/testsuite/c-c++-common/musttail3.c 
b/gcc/testsuite/c-c++-common/musttail3.c
new file mode 100644
index ..ea9589c59ef2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/musttail3.c
@@ -0,0 +1,29 @@
+/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+
+extern int foo2 (int x, ...);
+
+struct str
+{
+  int a, b;
+};
+
+struct str
+cstruct (int x)
+{
+  if (x < 10)
+[[clang::musttail]] return cstruct (x + 1);
+  return ((struct str){ x, 0 });
+}
+
+int
+foo (int x)
+{
+  if (x < 10)
+[[clang::musttail]] return foo2 (x, 29);
+  if (x < 100)
+{
+  int k = foo (x + 1);
+  [[clang::musttail]] return k;/* { dg-error "cannot tail-call: " } */
+}
+  return x;
+}
diff --git a/gcc/testsuite/c-c++-common/musttail4.c 
b/gcc/testsuite/c-c++-common/musttail4.c
new file mode 100644
index ..23f4b5e1cd68
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/musttail4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+
+struct box { char field[64]; int i; };
+
+struct box __attribute__((noinline,noclone,noipa))
+returns_struct (int i)
+{
+  struct box b;
+  b.i = i * i;
+  return b;
+}
+
+int __attribute__((noinline,noclone))
+test_1 (int i)
+{
+  [[gnu::musttail]] return returns_struct (i * 5).i; /* { dg-error "cannot 
tail-call: " } */
+}
diff --git a/gcc/testsuite/c-c++-common/musttail5.c 
b/gcc/testsuite/c-c++-common/musttail5.c

[PATCH v10 2/3] C: Implement musttail attribute for returns

2024-07-17 Thread Andi Kleen
Implement a C23 clang compatible musttail attribute similar to the earlier
C++ implementation in the C parser.

gcc/c/ChangeLog:

PR c/83324
* c-parser.cc (struct attr_state): Define with musttail_p.
(c_parser_statement_after_labels): Handle [[musttail]].
(c_parser_std_attribute): Dito.
(c_parser_handle_musttail): Dito.
(c_parser_compound_statement_nostart): Dito.
(c_parser_all_labels): Dito.
(c_parser_statement): Dito.
* c-tree.h (c_finish_return): Add musttail_p flag.
* c-typeck.cc (c_finish_return): Handle musttail_p flag.
---
 gcc/c/c-parser.cc | 70 ++-
 gcc/c/c-tree.h|  2 +-
 gcc/c/c-typeck.cc |  7 +++--
 3 files changed, 63 insertions(+), 16 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 12c5ed5d92c7..a8848d01f21a 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1621,6 +1621,12 @@ struct omp_for_parse_data {
   bool fail : 1;
 };
 
+struct attr_state
+{
+  /* True if we parsed a musttail attribute for return.  */
+  bool musttail_p;
+};
+
 static bool c_parser_nth_token_starts_std_attributes (c_parser *,
  unsigned int);
 static tree c_parser_std_attribute_specifier_sequence (c_parser *);
@@ -1665,7 +1671,7 @@ static location_t c_parser_compound_statement_nostart 
(c_parser *);
 static void c_parser_label (c_parser *, tree);
 static void c_parser_statement (c_parser *, bool *, location_t * = NULL);
 static void c_parser_statement_after_labels (c_parser *, bool *,
-vec * = NULL);
+vec * = NULL, attr_state = 
{});
 static tree c_parser_c99_block_statement (c_parser *, bool *,
  location_t * = NULL);
 static void c_parser_if_statement (c_parser *, bool *, vec *);
@@ -6982,6 +6988,29 @@ c_parser_handle_directive_omp_attributes (tree &attrs,
 }
 }
 
+/* Check if STD_ATTR contains a musttail attribute and remove if it
+   precedes a return.  PARSER is the parser and ATTR is the output
+   attr_state.  */
+
+static tree
+c_parser_handle_musttail (c_parser *parser, tree std_attrs, attr_state &attr)
+{
+  if (c_parser_next_token_is_keyword (parser, RID_RETURN))
+{
+  if (lookup_attribute ("gnu", "musttail", std_attrs))
+   {
+ std_attrs = remove_attribute ("gnu", "musttail", std_attrs);
+ attr.musttail_p = true;
+   }
+  if (lookup_attribute ("clang", "musttail", std_attrs))
+   {
+ std_attrs = remove_attribute ("clang", "musttail", std_attrs);
+ attr.musttail_p = true;
+   }
+}
+  return std_attrs;
+}
+
 /* Parse a compound statement except for the opening brace.  This is
used for parsing both compound statements and statement expressions
(which follow different paths to handling the opening).  */
@@ -6998,6 +7027,7 @@ c_parser_compound_statement_nostart (c_parser *parser)
   bool in_omp_loop_block
 = omp_for_parse_state ? omp_for_parse_state->want_nested_loop : false;
   tree sl = NULL_TREE;
+  attr_state a = {};
 
   if (c_parser_next_token_is (parser, CPP_CLOSE_BRACE))
 {
@@ -7138,7 +7168,10 @@ c_parser_compound_statement_nostart (c_parser *parser)
= c_parser_nth_token_starts_std_attributes (parser, 1);
   tree std_attrs = NULL_TREE;
   if (have_std_attrs)
-   std_attrs = c_parser_std_attribute_specifier_sequence (parser);
+   {
+ std_attrs = c_parser_std_attribute_specifier_sequence (parser);
+ std_attrs = c_parser_handle_musttail (parser, std_attrs, a);
+   }
   if (c_parser_next_token_is_keyword (parser, RID_CASE)
  || c_parser_next_token_is_keyword (parser, RID_DEFAULT)
  || (c_parser_next_token_is (parser, CPP_NAME)
@@ -7286,7 +7319,7 @@ c_parser_compound_statement_nostart (c_parser *parser)
  last_stmt = true;
  mark_valid_location_for_stdc_pragma (false);
  if (!omp_for_parse_state)
-   c_parser_statement_after_labels (parser, NULL);
+   c_parser_statement_after_labels (parser, NULL, NULL, a);
  else
{
  /* In canonical loop nest form, nested loops can only appear
@@ -7328,15 +7361,20 @@ c_parser_compound_statement_nostart (c_parser *parser)
 /* Parse all consecutive labels, possibly preceded by standard
attributes.  In this context, a statement is required, not a
declaration, so attributes must be followed by a statement that is
-   not just a semicolon.  */
+   not just a semicolon.  Returns an attr_state.  */
 
-static void
+static attr_state
 c_parser_all_labels (c_parser *parser)
 {
+  attr_state attr = {};
   bool have_std_attrs;
   tree std_attrs = NULL;
   if ((have_std_attrs = c_parser_nth_token_starts_std_attributes (parser, 1)))
-std_attrs = c_parser_std_attribute_specifier_sequence (parser);
+{
+  std_attrs = c_pars

[PATCH v10 1/3] C++: Support clang compatible [[musttail]] (PR83324)

2024-07-17 Thread Andi Kleen
This patch implements a clang compatible [[musttail]] attribute for
returns.

musttail is useful as an alternative to computed goto for interpreters.
With computed goto the interpreter function usually ends up very big
which causes problems with register allocation and other per function
optimizations not scaling. With musttail the interpreter can be instead
written as a sequence of smaller functions that call each other. To
avoid unbounded stack growth this requires forcing a sibling call, which
this attribute does. It guarantees an error if the call cannot be tail
called which allows the programmer to fix it instead of risking a stack
overflow. Unlike computed goto it is also type-safe.

It turns out that David Malcolm had already implemented middle/backend
support for a musttail attribute back in 2016, but it wasn't exposed
to any frontend other than a special plugin.

This patch adds a [[gnu::musttail]] attribute for C++ that can be added
to return statements. The return statement must be a direct call
(it does not follow dependencies), which is similar to what clang
implements. It then uses the existing must tail infrastructure.

For compatibility it also detects clang::musttail

Passes bootstrap and full test

gcc/c-family/ChangeLog:

* c-attribs.cc (set_musttail_on_return): New function.
* c-common.h (set_musttail_on_return): Declare new function.

gcc/cp/ChangeLog:

PR c/83324
* cp-tree.h (AGGR_INIT_EXPR_MUST_TAIL): Add.
* parser.cc (cp_parser_statement): Handle musttail.
(cp_parser_jump_statement): Dito.
* pt.cc (tsubst_expr): Copy CALL_EXPR_MUST_TAIL_CALL.
* semantics.cc (simplify_aggr_init_expr): Handle musttail.
---
 gcc/c-family/c-attribs.cc | 20 
 gcc/c-family/c-common.h   |  1 +
 gcc/cp/cp-tree.h  |  4 
 gcc/cp/parser.cc  | 32 +---
 gcc/cp/pt.cc  |  9 -
 gcc/cp/semantics.cc   |  1 +
 6 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 5adc7b775eaf..685f212683f4 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -672,6 +672,26 @@ attribute_takes_identifier_p (const_tree attr_id)
 return targetm.attribute_takes_identifier_p (attr_id);
 }
 
+/* Set a musttail attribute MUSTTAIL_P on return expression RETVAL
+   at LOC.  */
+
+void
+set_musttail_on_return (tree retval, location_t loc, bool musttail_p)
+{
+  if (retval && musttail_p)
+{
+  tree t = retval;
+  if (TREE_CODE (t) == TARGET_EXPR)
+   t = TARGET_EXPR_INITIAL (t);
+  if (TREE_CODE (t) != CALL_EXPR)
+   error_at (loc, "cannot tail-call: return value must be a call");
+  else
+   CALL_EXPR_MUST_TAIL_CALL (t) = 1;
+}
+  else if (musttail_p && !retval)
+error_at (loc, "cannot tail-call: return value must be a call");
+}
+
 /* Verify that argument value POS at position ARGNO to attribute NAME
applied to function FN (which is either a function declaration or function
type) refers to a function parameter at position POS and the expected type
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index adee822a3ae0..2510ee4dbc9d 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1648,6 +1648,7 @@ extern tree handle_noreturn_attribute (tree *, tree, 
tree, int, bool *);
 extern tree handle_musttail_attribute (tree *, tree, tree, int, bool *);
 extern bool has_attribute (location_t, tree, tree, tree (*)(tree));
 extern tree build_attr_access_from_parms (tree, bool);
+extern void set_musttail_on_return (tree, location_t, bool);
 
 /* In c-format.cc.  */
 extern bool valid_format_string_type_p (tree);
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index c6f102564ce0..67ba3274eb1b 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -4236,6 +4236,10 @@ templated_operator_saved_lookups (tree t)
 #define AGGR_INIT_FROM_THUNK_P(NODE) \
   (AGGR_INIT_EXPR_CHECK (NODE)->base.protected_flag)
 
+/* Nonzero means that the call was marked musttail.  */
+#define AGGR_INIT_EXPR_MUST_TAIL(NODE) \
+  (AGGR_INIT_EXPR_CHECK (NODE)->base.static_flag)
+
 /* AGGR_INIT_EXPR accessors.  These are equivalent to the CALL_EXPR
accessors, except for AGGR_INIT_EXPR_SLOT (which takes the place of
CALL_EXPR_STATIC_CHAIN).  */
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index efd5d6f29a71..71bffd4a9311 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2467,7 +2467,7 @@ static tree cp_parser_perform_range_for_lookup
 static tree cp_parser_range_for_member_function
   (tree, tree);
 static tree cp_parser_jump_statement
-  (cp_parser *);
+  (cp_parser *, tree &);
 static void cp_parser_declaration_statement
   (cp_parser *);
 
@@ -12757,7 +12757,7 @@ cp_parser_statement (cp_parser* parser, tree 
in_statement_expr,
case RID_CO_RETURN:
case RID_GOTO:
  std_attrs = process_stmt_hotness_attribute (std_attrs, a

Remaining frontend patches for musttail

2024-07-17 Thread Andi Kleen
This patchkit contains the remaining C/C++ frontend patches for the 
[[musttail]] 
extension that still need approval for trunk. I already committed
the tree-ssa and RTL pieces.

C: I addressed Marek's feedback, but need final ack. Marek can you
please take a look?

C++: Fixed support for AGGR_VIEW expressions thanks to Jason's prodding.

Tests: Addressed Jason's feedback and covered now hopefully all the 
class passing cases. I split some tests to create a full set of errors,
otherwise frontend errors would stop the tree optimizers from running.

The class passing tests are showing another problem in the middle-end
code where implicit calls generated by the C++ frontend stop
tree-tailcall early, so it can't identify the user written tail call.
This results in cryptic "cannot tail-call: other reasons" fallback
errors, which is not ideal, but also not a show stopper.
Currently this is hidden in the test suite by running that test at -O0. 

-Andi





Re: [PATCH v9 08/10] Add tests for C/C++ musttail attributes

2024-07-17 Thread Andi Kleen
> Great. Does it also work in a non-template function?

Sadly it did not because there needs to be more AGGR_VIEW_EXPR handling,
as you predicted at some point. I fixed it now. Will send updated patches.

-Andi


Re: [PATCH v9 08/10] Add tests for C/C++ musttail attributes

2024-07-16 Thread Andi Kleen
On Tue, Jul 16, 2024 at 06:06:42PM -0400, Jason Merrill wrote:
> On 7/16/24 5:55 PM, Andi Kleen wrote:
> > On Tue, Jul 16, 2024 at 12:52:31PM -0700, Andi Kleen wrote:
> > > On Tue, Jul 16, 2024 at 02:51:13PM -0400, Jason Merrill wrote:
> > > > On 7/16/24 12:18 PM, Andi Kleen wrote:
> > > > > On Tue, Jul 16, 2024 at 11:17:14AM -0400, Jason Merrill wrote:
> > > > > > On 7/16/24 11:15 AM, Andi Kleen wrote:
> > > > > > > > In the adjusted test it looks like the types of f and g match, 
> > > > > > > > so I wouldn't
> > > > > > > > expect an error.
> > > > > > > 
> > > > > > > Good point! Missing the forest for the trees.
> > > > > > > 
> > > > > > > Anyways are the C++ patches ok with this change?
> > > > > > 
> > > > > > I'm still looking for a test which does error because the types are
> > > > > > different.
> > > > > 
> > > > > Like this?
> > > > 
> > > > Where the called function returns C and the callee function does not.
> > > 
> > > In this case the attribute seems to get lost and it succeeds.
> > 
> > This somewhat hackish patch fixes it here, to handle the case
> > of a TARGET_EXPR where the CALL_EXPR is in the cleanup. extract_call
> > bails on that.
> 
> The CALL_EXPR in the cleanup is calling the destructor, that's not what
> we're trying to tail-call.
> 
> I think the problem here is that the call to f is represented with an
> AGGR_INIT_EXPR instead of CALL_EXPR, so you need to handle the flag on that
> tree_code as well.


Okay this seems to work
(I had to adjust the test case because it now correctly errors out
on passing the class at -O0) 


diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 4bb3e9c4989b..5ec8102c1849 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -4245,6 +4245,10 @@ templated_operator_saved_lookups (tree t)
 #define AGGR_INIT_FROM_THUNK_P(NODE) \
   (AGGR_INIT_EXPR_CHECK (NODE)->base.protected_flag)
 
+/* Nonzero means that the call was marked musttail.  */
+#define AGGR_INIT_EXPR_MUST_TAIL(NODE) \
+  (AGGR_INIT_EXPR_CHECK (NODE)->base.static_flag)
+
 /* AGGR_INIT_EXPR accessors.  These are equivalent to the CALL_EXPR
accessors, except for AGGR_INIT_EXPR_SLOT (which takes the place of
CALL_EXPR_STATIC_CHAIN).  */
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 3b914089a6e2..d668c5af6a23 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -21124,6 +21124,8 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
  CALL_EXPR_REVERSE_ARGS (call) = rev;
  if (TREE_CODE (call) == CALL_EXPR)
CALL_EXPR_MUST_TAIL_CALL (call) = mtc;
+ else if (TREE_CODE (call) == AGGR_INIT_EXPR)
+   AGGR_INIT_EXPR_MUST_TAIL (call) = mtc;
}
if (warning_suppressed_p (t, OPT_Wpessimizing_move))
  /* This also suppresses -Wredundant-move.  */
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index cd3df13772db..fb45974cd90f 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -4979,6 +4979,7 @@ simplify_aggr_init_expr (tree *tp)
 = CALL_EXPR_OPERATOR_SYNTAX (aggr_init_expr);
   CALL_EXPR_ORDERED_ARGS (call_expr) = CALL_EXPR_ORDERED_ARGS (aggr_init_expr);
   CALL_EXPR_REVERSE_ARGS (call_expr) = CALL_EXPR_REVERSE_ARGS (aggr_init_expr);
+  CALL_EXPR_MUST_TAIL_CALL (call_expr) = AGGR_INIT_EXPR_MUST_TAIL 
(aggr_init_expr);
 
   if (style == ctor)
 {
diff --git a/gcc/testsuite/g++.dg/musttail10.C 
b/gcc/testsuite/g++.dg/musttail10.C
index e454a6238a06..93ec32db160a 100644
--- a/gcc/testsuite/g++.dg/musttail10.C
+++ b/gcc/testsuite/g++.dg/musttail10.C
@@ -14,9 +14,11 @@ template 
 __attribute__((noinline, noclone, noipa))
 T g2() { [[gnu::musttail]] return f(); }
 
+#if __OPTIMIZE__ >= 1
 template 
 __attribute__((noinline, noclone, noipa))
 T g3() { [[gnu::musttail]] return f(); }
+#endif
 
 template 
 __attribute__((noinline, noclone, noipa))
@@ -28,12 +30,20 @@ class C
 public:
   C(double x) : x(x) {}
   ~C() { asm("":::"memory"); }
+  operator int() { return x; } 
 };
 
+template 
+__attribute__((noinline, noclone, noipa))
+T g5() { [[gnu::musttail]] return f(); } /* { dg-error "cannot tail-call" } 
*/
+
 int main()
 {
   g1();
   g2();
+#if __OPTIMIZE__ >= 1
   g3();
+#endif
   g4();
+  g5();
 }



Re: [PATCH v9 08/10] Add tests for C/C++ musttail attributes

2024-07-16 Thread Andi Kleen
On Tue, Jul 16, 2024 at 12:52:31PM -0700, Andi Kleen wrote:
> On Tue, Jul 16, 2024 at 02:51:13PM -0400, Jason Merrill wrote:
> > On 7/16/24 12:18 PM, Andi Kleen wrote:
> > > On Tue, Jul 16, 2024 at 11:17:14AM -0400, Jason Merrill wrote:
> > > > On 7/16/24 11:15 AM, Andi Kleen wrote:
> > > > > > In the adjusted test it looks like the types of f and g match, so I 
> > > > > > wouldn't
> > > > > > expect an error.
> > > > > 
> > > > > Good point! Missing the forest for the trees.
> > > > > 
> > > > > Anyways are the C++ patches ok with this change?
> > > > 
> > > > I'm still looking for a test which does error because the types are
> > > > different.
> > > 
> > > Like this?
> > 
> > Where the called function returns C and the callee function does not.
> 
> In this case the attribute seems to get lost and it succeeds.

This somewhat hackish patch fixes it here, to handle the case
of a TARGET_EXPR where the CALL_EXPR is in the cleanup. extract_call
bails on that.

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 3b914089a6e2..8753aa51da52 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -21124,6 +21124,10 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
  CALL_EXPR_REVERSE_ARGS (call) = rev;
  if (TREE_CODE (call) == CALL_EXPR)
CALL_EXPR_MUST_TAIL_CALL (call) = mtc;
+ else if (mtc
+  && TREE_CODE (ret) == TARGET_EXPR
+  && TREE_CODE (TARGET_EXPR_CLEANUP (ret)) == 
CALL_EXPR)
+   CALL_EXPR_MUST_TAIL_CALL (TARGET_EXPR_CLEANUP (ret)) = mtc;
}
if (warning_suppressed_p (t, OPT_Wpessimizing_move))
  /* This also suppresses -Wredundant-move.  */


Re: [PATCH v9 08/10] Add tests for C/C++ musttail attributes

2024-07-16 Thread Andi Kleen
On Tue, Jul 16, 2024 at 02:51:13PM -0400, Jason Merrill wrote:
> On 7/16/24 12:18 PM, Andi Kleen wrote:
> > On Tue, Jul 16, 2024 at 11:17:14AM -0400, Jason Merrill wrote:
> > > On 7/16/24 11:15 AM, Andi Kleen wrote:
> > > > > In the adjusted test it looks like the types of f and g match, so I 
> > > > > wouldn't
> > > > > expect an error.
> > > > 
> > > > Good point! Missing the forest for the trees.
> > > > 
> > > > Anyways are the C++ patches ok with this change?
> > > 
> > > I'm still looking for a test which does error because the types are
> > > different.
> > 
> > Like this?
> 
> Where the called function returns C and the callee function does not.

In this case the attribute seems to get lost and it succeeds.

diff --git a/gcc/testsuite/g++.dg/musttail10.C 
b/gcc/testsuite/g++.dg/musttail10.C
index e454a6238a06..39f0ec38253d 100644
--- a/gcc/testsuite/g++.dg/musttail10.C
+++ b/gcc/testsuite/g++.dg/musttail10.C
@@ -28,12 +28,18 @@ class C
 public:
   C(double x) : x(x) {}
   ~C() { asm("":::"memory"); }
+  operator int() { return x; } 
 };
 
+template 
+__attribute__((noinline, noclone, noipa))
+T g5() { [[gnu::musttail]] return f(); } /* { dg-error "cannot tail-call" } 
*/
+
 int main()
 {
   g1();
   g2();
   g3();
   g4();
+  g5();
 }



Re: [PATCH v9 08/10] Add tests for C/C++ musttail attributes

2024-07-16 Thread Andi Kleen
On Tue, Jul 16, 2024 at 11:17:14AM -0400, Jason Merrill wrote:
> On 7/16/24 11:15 AM, Andi Kleen wrote:
> > > In the adjusted test it looks like the types of f and g match, so I 
> > > wouldn't
> > > expect an error.
> > 
> > Good point! Missing the forest for the trees.
> > 
> > Anyways are the C++ patches ok with this change?
> 
> I'm still looking for a test which does error because the types are
> different.

Like this?

-Andi


diff --git a/gcc/testsuite/g++.dg/musttail10.C 
b/gcc/testsuite/g++.dg/musttail10.C
index 6a8507784a14..e454a6238a06 100644
--- a/gcc/testsuite/g++.dg/musttail10.C
+++ b/gcc/testsuite/g++.dg/musttail10.C
@@ -4,7 +4,7 @@
 
 template  T f();
 
-double h() { [[gnu::musttail]] return f(); } /* { dg-error "cannot 
tail-call" } */
+double g() { [[gnu::musttail]] return f(); } /* { dg-error "cannot 
tail-call" } */
 
 template 
 __attribute__((noinline, noclone, noipa))
@@ -18,6 +18,10 @@ template 
 __attribute__((noinline, noclone, noipa))
 T g3() { [[gnu::musttail]] return f(); }
 
+template 
+__attribute__((noinline, noclone, noipa))
+T g4() { [[gnu::musttail]] return f(); } /* { dg-error "cannot 
tail-call" } */
+
 class C
 {
   double x;
@@ -31,4 +35,5 @@ int main()
   g1();
   g2();
   g3();
+  g4();
 }


Re: [PATCH v9 08/10] Add tests for C/C++ musttail attributes

2024-07-16 Thread Andi Kleen
> In the adjusted test it looks like the types of f and g match, so I wouldn't
> expect an error.

Good point! Missing the forest for the trees.

Anyways are the C++ patches ok with this change?

Thanks,
-Andi


Re: [PATCH v9 08/10] Add tests for C/C++ musttail attributes

2024-07-15 Thread Andi Kleen
On Mon, Jul 15, 2024 at 06:57:57PM -0400, Jason Merrill wrote:
> On 7/8/24 12:56 PM, Andi Kleen wrote:
> > diff --git a/gcc/testsuite/g++.dg/musttail10.C 
> > b/gcc/testsuite/g++.dg/musttail10.C
> > new file mode 100644
> > index ..9b7043b8a306
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/musttail10.C
> > @@ -0,0 +1,34 @@
> > +/* { dg-do compile { target { tail_call } } } */
> > +/* { dg-options "-std=gnu++11" } */
> > +/* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
> > +
> > +int f();
> > +
> > +double h() { [[gnu::musttail]] return f(); } /* { dg-error "cannot 
> > tail-call" } */
> > +
> > +template 
> > +__attribute__((noinline, noclone, noipa))
> > +T g1() { [[gnu::musttail]] return f(); } /* { dg-error "target is not 
> > able" "" { target powerpc*-*-* } } */
> > +
> > +template 
> > +__attribute__((noinline, noclone, noipa))
> > +T g2() { [[gnu::musttail]] return f(); } /* { dg-error "cannot tail-call" 
> > } */
> > +
> > +template 
> > +__attribute__((noinline, noclone, noipa))
> > +T g3() { [[gnu::musttail]] return f(); } /* { dg-error "cannot tail-call" 
> > } */
> > +
> > +class C
> > +{
> > +  double x;
> > +public:
> > +  C(double x) : x(x) {}
> > +  ~C() { asm("":::"memory"); }
> > +};
> > +
> > +int main()
> > +{
> > +  g1();
> > +  g2();
> > +  g3();
> > +}
> 
> I had asked for this test to check the case where the function called with
> [[musttail]] returns a non-trivially-copyable class; the test now includes
> such a class, but all the [[musttail]] calls are still to a function that
> returns int.

Thanks Jason.

I fixed the test case, but now the musttail gets lost, no error for g2/g3.

That means the flag is still lost somewhere. Does something outside tsubst need 
changes too?

Right now tsubst has only this:

@@ -21113,12 +21113,17 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
bool op = CALL_EXPR_OPERATOR_SYNTAX (t);
bool ord = CALL_EXPR_ORDERED_ARGS (t);
bool rev = CALL_EXPR_REVERSE_ARGS (t);
-   if (op || ord || rev)
+   bool mtc = false;
+   if (TREE_CODE (t) == CALL_EXPR)
+ mtc = CALL_EXPR_MUST_TAIL_CALL (t);
+   if (op || ord || rev || mtc)
  if (tree call = extract_call_expr (ret))
{
  CALL_EXPR_OPERATOR_SYNTAX (call) = op;
  CALL_EXPR_ORDERED_ARGS (call) = ord;
  CALL_EXPR_REVERSE_ARGS (call) = rev;
+ if (TREE_CODE (call) == CALL_EXPR)
+   CALL_EXPR_MUST_TAIL_CALL (call) = mtc;
}
if (warning_suppressed_p (t, OPT_Wpessimizing_move))
  /* This also suppresses -Wredundant-move.  */


Fixed test case:


template  T f();

double h() { [[gnu::musttail]] return f(); } /* { dg-error "cannot 
tail-call" } */

template 
__attribute__((noinline, noclone, noipa))
T g1() { [[gnu::musttail]] return f(); } /* { dg-error "target is not able" 
"" { target powerpc*-*-* } } */

template 
__attribute__((noinline, noclone, noipa))
T g2() { [[gnu::musttail]] return f(); } /* { dg-error "cannot tail-call" } 
*/

template 
__attribute__((noinline, noclone, noipa))
T g3() { [[gnu::musttail]] return f(); } /* { dg-error "cannot tail-call" } 
*/

class C
{
  double x;
public:
  C(double x) : x(x) {}
  ~C() { asm("":::"memory"); }
};

int main()
{
  g1();
  g2();
  g3();
}



Re: [PATCH v9 04/10] C++: Support clang compatible [[musttail]] (PR83324)

2024-07-13 Thread Andi Kleen


Updated version with common code for C/C++ extracted in c-family.
Other than that no changes.
Is this version ok to commit?

---


This patch implements a clang compatible [[musttail]] attribute for
returns.

musttail is useful as an alternative to computed goto for interpreters.
With computed goto the interpreter function usually ends up very big
which causes problems with register allocation and other per function
optimizations not scaling. With musttail the interpreter can be instead
written as a sequence of smaller functions that call each other. To
avoid unbounded stack growth this requires forcing a sibling call, which
this attribute does. It guarantees an error if the call cannot be tail
called which allows the programmer to fix it instead of risking a stack
overflow. Unlike computed goto it is also type-safe.

It turns out that David Malcolm had already implemented middle/backend
support for a musttail attribute back in 2016, but it wasn't exposed
to any frontend other than a special plugin.

This patch adds a [[gnu::musttail]] attribute for C++ that can be added
to return statements. The return statement must be a direct call
(it does not follow dependencies), which is similar to what clang
implements. It then uses the existing must tail infrastructure.

For compatibility it also detects clang::musttail

Passes bootstrap and full test

gcc/c-family/ChangeLog:

* c-attribs.cc (set_musttail_on_return): New function.
* c-common.h (set_musttail_on_return): Declare new function.

gcc/cp/ChangeLog:

PR c/83324
* parser.cc (cp_parser_statement): Handle musttail.
(cp_parser_jump_statement): Dito.
* pt.cc (tsubst_expr): Copy CALL_EXPR_MUST_TAIL_CALL.
---
 gcc/c-family/c-attribs.cc | 20 
 gcc/c-family/c-common.h   |  1 +
 gcc/cp/parser.cc  | 26 +++---
 gcc/cp/pt.cc  |  7 ++-
 4 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 5adc7b775eaf..685f212683f4 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -672,6 +672,26 @@ attribute_takes_identifier_p (const_tree attr_id)
 return targetm.attribute_takes_identifier_p (attr_id);
 }
 
+/* Set a musttail attribute MUSTTAIL_P on return expression RETVAL
+   at LOC.  */
+
+void
+set_musttail_on_return (tree retval, location_t loc, bool musttail_p)
+{
+  if (retval && musttail_p)
+{
+  tree t = retval;
+  if (TREE_CODE (t) == TARGET_EXPR)
+   t = TARGET_EXPR_INITIAL (t);
+  if (TREE_CODE (t) != CALL_EXPR)
+   error_at (loc, "cannot tail-call: return value must be a call");
+  else
+   CALL_EXPR_MUST_TAIL_CALL (t) = 1;
+}
+  else if (musttail_p && !retval)
+error_at (loc, "cannot tail-call: return value must be a call");
+}
+
 /* Verify that argument value POS at position ARGNO to attribute NAME
applied to function FN (which is either a function declaration or function
type) refers to a function parameter at position POS and the expected type
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index e84c9c47513b..079c9dc5f08b 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1646,6 +1646,7 @@ extern tree handle_noreturn_attribute (tree *, tree, 
tree, int, bool *);
 extern tree handle_musttail_attribute (tree *, tree, tree, int, bool *);
 extern bool has_attribute (location_t, tree, tree, tree (*)(tree));
 extern tree build_attr_access_from_parms (tree, bool);
+extern void set_musttail_on_return (tree, location_t, bool);
 
 /* In c-format.cc.  */
 extern bool valid_format_string_type_p (tree);
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 31ae9c2fb54d..e2411ee7213c 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2467,7 +2467,7 @@ static tree cp_parser_perform_range_for_lookup
 static tree cp_parser_range_for_member_function
   (tree, tree);
 static tree cp_parser_jump_statement
-  (cp_parser *);
+  (cp_parser *, tree &);
 static void cp_parser_declaration_statement
   (cp_parser *);
 
@@ -12756,7 +12756,7 @@ cp_parser_statement (cp_parser* parser, tree 
in_statement_expr,
case RID_CO_RETURN:
case RID_GOTO:
  std_attrs = process_stmt_hotness_attribute (std_attrs, attrs_loc);
- statement = cp_parser_jump_statement (parser);
+ statement = cp_parser_jump_statement (parser, std_attrs);
  break;
 
  /* Objective-C++ exception-handling constructs.  */
@@ -14844,10 +14844,11 @@ cp_parser_init_statement (cp_parser *parser, tree 
*decl)
jump-statement:
  goto * expression ;
 
+   STD_ATTRS are the statement attributes. They can be modified.
Returns the new BREAK_STMT, CONTINUE_STMT, RETURN_EXPR, or GOTO_EXPR.  */
 
 static tree
-cp_parser_jump_statement (cp_parser* parser)
+cp_parser_jump_statement (cp_parser* parser, tree &std_attrs)
 {
   tree statement = error_mark_node;
   cp_token *token;
@@ -14924,6

Re: [PATCH v9 05/10] C: Implement musttail attribute for returns

2024-07-13 Thread Andi Kleen


Here's an updated patch with your feedback addressed.
Is this version ok?

The common code is in the C++ patch.

---

Implement a C23 clang compatible musttail attribute similar to the earlier
C++ implementation in the C parser.

gcc/c/ChangeLog:

PR c/83324
* c-parser.cc (struct attr_state): Define with musttail_p.
(c_parser_statement_after_labels): Handle [[musttail]].
(c_parser_std_attribute): Dito.
(c_parser_handle_musttail): Dito.
(c_parser_compound_statement_nostart): Dito.
(c_parser_all_labels): Dito.
(c_parser_statement): Dito.
* c-tree.h (c_finish_return): Add musttail_p flag.
* c-typeck.cc (c_finish_return): Handle musttail_p flag.
---
 gcc/c/c-parser.cc | 70 ++-
 gcc/c/c-tree.h|  2 +-
 gcc/c/c-typeck.cc |  7 +++--
 3 files changed, 63 insertions(+), 16 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 8c4e697a4e10..9cb4d5d932ad 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1621,6 +1621,12 @@ struct omp_for_parse_data {
   bool fail : 1;
 };
 
+struct attr_state
+{
+  /* True if we parsed a musttail attribute for return.  */
+  bool musttail_p;
+};
+
 static bool c_parser_nth_token_starts_std_attributes (c_parser *,
  unsigned int);
 static tree c_parser_std_attribute_specifier_sequence (c_parser *);
@@ -1665,7 +1671,7 @@ static location_t c_parser_compound_statement_nostart 
(c_parser *);
 static void c_parser_label (c_parser *, tree);
 static void c_parser_statement (c_parser *, bool *, location_t * = NULL);
 static void c_parser_statement_after_labels (c_parser *, bool *,
-vec * = NULL);
+vec * = NULL, attr_state = 
{});
 static tree c_parser_c99_block_statement (c_parser *, bool *,
  location_t * = NULL);
 static void c_parser_if_statement (c_parser *, bool *, vec *);
@@ -6982,6 +6988,29 @@ c_parser_handle_directive_omp_attributes (tree &attrs,
 }
 }
 
+/* Check if STD_ATTR contains a musttail attribute and remove if it
+   precedes a return.  PARSER is the parser and ATTR is the output
+   attr_state.  */
+
+static tree
+c_parser_handle_musttail (c_parser *parser, tree std_attrs, attr_state &attr)
+{
+  if (c_parser_next_token_is_keyword (parser, RID_RETURN))
+{
+  if (lookup_attribute ("gnu", "musttail", std_attrs))
+   {
+ std_attrs = remove_attribute ("gnu", "musttail", std_attrs);
+ attr.musttail_p = true;
+   }
+  if (lookup_attribute ("clang", "musttail", std_attrs))
+   {
+ std_attrs = remove_attribute ("clang", "musttail", std_attrs);
+ attr.musttail_p = true;
+   }
+}
+  return std_attrs;
+}
+
 /* Parse a compound statement except for the opening brace.  This is
used for parsing both compound statements and statement expressions
(which follow different paths to handling the opening).  */
@@ -6998,6 +7027,7 @@ c_parser_compound_statement_nostart (c_parser *parser)
   bool in_omp_loop_block
 = omp_for_parse_state ? omp_for_parse_state->want_nested_loop : false;
   tree sl = NULL_TREE;
+  attr_state a = {};
 
   if (c_parser_next_token_is (parser, CPP_CLOSE_BRACE))
 {
@@ -7138,7 +7168,10 @@ c_parser_compound_statement_nostart (c_parser *parser)
= c_parser_nth_token_starts_std_attributes (parser, 1);
   tree std_attrs = NULL_TREE;
   if (have_std_attrs)
-   std_attrs = c_parser_std_attribute_specifier_sequence (parser);
+   {
+ std_attrs = c_parser_std_attribute_specifier_sequence (parser);
+ std_attrs = c_parser_handle_musttail (parser, std_attrs, a);
+   }
   if (c_parser_next_token_is_keyword (parser, RID_CASE)
  || c_parser_next_token_is_keyword (parser, RID_DEFAULT)
  || (c_parser_next_token_is (parser, CPP_NAME)
@@ -7286,7 +7319,7 @@ c_parser_compound_statement_nostart (c_parser *parser)
  last_stmt = true;
  mark_valid_location_for_stdc_pragma (false);
  if (!omp_for_parse_state)
-   c_parser_statement_after_labels (parser, NULL);
+   c_parser_statement_after_labels (parser, NULL, NULL, a);
  else
{
  /* In canonical loop nest form, nested loops can only appear
@@ -7328,15 +7361,20 @@ c_parser_compound_statement_nostart (c_parser *parser)
 /* Parse all consecutive labels, possibly preceded by standard
attributes.  In this context, a statement is required, not a
declaration, so attributes must be followed by a statement that is
-   not just a semicolon.  */
+   not just a semicolon.  Returns an attr_state.  */
 
-static void
+static attr_state
 c_parser_all_labels (c_parser *parser)
 {
+  attr_state attr = {};
   bool have_std_attrs;
   tree std_attrs = NULL;
   if ((have_std_attrs = c_parser_nth_token_starts_std_attrib

[PATCH v9 10/10] Mark expand musttail error messages for translation

2024-07-08 Thread Andi Kleen
The musttail error messages are reported to the user, so must be
translated.

gcc/ChangeLog:

PR83324
* calls.cc (initialize_argument_information): Mark messages
for translation.
(can_implement_as_sibling_call_p): Dito.
(expand_call): Dito.
---
 gcc/calls.cc | 56 ++--
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/gcc/calls.cc b/gcc/calls.cc
index 883eb9971257..f28c58217fdf 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -1420,9 +1420,9 @@ initialize_argument_information (int num_actuals 
ATTRIBUTE_UNUSED,
{
  *may_tailcall = false;
  maybe_complain_about_tail_call (exp,
- "a callee-copied argument is"
- " stored in the current"
- " function's frame");
+ _("a callee-copied argument 
is"
+   " stored in the current"
+   " function's frame"));
}
 
  args[i].tree_value = build_fold_addr_expr_loc (loc,
@@ -1489,8 +1489,8 @@ initialize_argument_information (int num_actuals 
ATTRIBUTE_UNUSED,
  type = TREE_TYPE (args[i].tree_value);
  *may_tailcall = false;
  maybe_complain_about_tail_call (exp,
- "argument must be passed"
- " by copying");
+ _("argument must be passed"
+   " by copying"));
}
  arg.pass_by_reference = true;
}
@@ -2508,8 +2508,8 @@ can_implement_as_sibling_call_p (tree exp,
 {
   maybe_complain_about_tail_call
(exp,
-"machine description does not have"
-" a sibcall_epilogue instruction pattern");
+_("machine description does not have"
+  " a sibcall_epilogue instruction pattern"));
   return false;
 }
 
@@ -2519,7 +2519,7 @@ can_implement_as_sibling_call_p (tree exp,
  sibling calls will return a structure.  */
   if (structure_value_addr != NULL_RTX)
 {
-  maybe_complain_about_tail_call (exp, "callee returns a structure");
+  maybe_complain_about_tail_call (exp, _("callee returns a structure"));
   return false;
 }
 
@@ -2528,8 +2528,8 @@ can_implement_as_sibling_call_p (tree exp,
   if (!targetm.function_ok_for_sibcall (fndecl, exp))
 {
   maybe_complain_about_tail_call (exp,
- "target is not able to optimize the"
- " call into a sibling call");
+ _("target is not able to optimize the"
+   " call into a sibling call"));
   return false;
 }
 
@@ -2537,18 +2537,18 @@ can_implement_as_sibling_call_p (tree exp,
  optimized.  */
   if (flags & ECF_RETURNS_TWICE)
 {
-  maybe_complain_about_tail_call (exp, "callee returns twice");
+  maybe_complain_about_tail_call (exp, _("callee returns twice"));
   return false;
 }
   if (flags & ECF_NORETURN)
 {
-  maybe_complain_about_tail_call (exp, "callee does not return");
+  maybe_complain_about_tail_call (exp, _("callee does not return"));
   return false;
 }
 
   if (TYPE_VOLATILE (TREE_TYPE (TREE_TYPE (addr
 {
-  maybe_complain_about_tail_call (exp, "volatile function type");
+  maybe_complain_about_tail_call (exp, _("volatile function type"));
   return false;
 }
 
@@ -2567,7 +2567,7 @@ can_implement_as_sibling_call_p (tree exp,
  the argument areas are shared.  */
   if (fndecl && decl_function_context (fndecl) == current_function_decl)
 {
-  maybe_complain_about_tail_call (exp, "nested function");
+  maybe_complain_about_tail_call (exp, _("nested function"));
   return false;
 }
 
@@ -2579,8 +2579,8 @@ can_implement_as_sibling_call_p (tree exp,
crtl->args.size - crtl->args.pretend_args_size))
 {
   maybe_complain_about_tail_call (exp,
- "callee required more stack slots"
- " than the caller");
+ _("callee required more stack slots"
+   " than the caller"));
   return false;
 }
 
@@ -2594,15 +2594,15 @@ can_implement_as_sibling_call_p (tree exp,
crtl->args.size)))
 {
   maybe_complain_about_tail_call (exp,
- "inconsistent number of"
- " popped arguments");
+ _("inconsistent number of"
+ 

[PATCH v9 05/10] C: Implement musttail attribute for returns

2024-07-08 Thread Andi Kleen
Implement a C23 clang compatible musttail attribute similar to the earlier
C++ implementation in the C parser.

PR83324

gcc/c/ChangeLog:

* c-parser.cc (struct attr_state): Define with musttail_p.
(c_parser_statement_after_labels): Handle [[musttail]]
(c_parser_std_attribute): Dito.
(c_parser_handle_musttail): Dito.
(c_parser_compound_statement_nostart): Dito.
(c_parser_all_labels): Dito.
(c_parser_statement): Dito.
* c-tree.h (c_finish_return): Add musttail_p flag.
* c-typeck.cc (c_finish_return): Handle musttail_p flag.
---
 gcc/c/c-parser.cc | 59 +--
 gcc/c/c-tree.h|  2 +-
 gcc/c/c-typeck.cc | 15 ++--
 3 files changed, 61 insertions(+), 15 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 8c4e697a4e10..ce1c2c2be835 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1621,6 +1621,11 @@ struct omp_for_parse_data {
   bool fail : 1;
 };
 
+struct attr_state
+{
+  bool musttail_p; // parsed a musttail for return
+};
+
 static bool c_parser_nth_token_starts_std_attributes (c_parser *,
  unsigned int);
 static tree c_parser_std_attribute_specifier_sequence (c_parser *);
@@ -1665,7 +1670,7 @@ static location_t c_parser_compound_statement_nostart 
(c_parser *);
 static void c_parser_label (c_parser *, tree);
 static void c_parser_statement (c_parser *, bool *, location_t * = NULL);
 static void c_parser_statement_after_labels (c_parser *, bool *,
-vec * = NULL);
+vec * = NULL, attr_state = 
{});
 static tree c_parser_c99_block_statement (c_parser *, bool *,
  location_t * = NULL);
 static void c_parser_if_statement (c_parser *, bool *, vec *);
@@ -6982,6 +6987,28 @@ c_parser_handle_directive_omp_attributes (tree &attrs,
 }
 }
 
+/* Check if STD_ATTR contains a musttail attribute and handle it
+   PARSER is the parser and A is the output attr_state.  */
+
+static tree
+c_parser_handle_musttail (c_parser *parser, tree std_attrs, attr_state &a)
+{
+  if (c_parser_next_token_is_keyword (parser, RID_RETURN))
+{
+  if (lookup_attribute ("gnu", "musttail", std_attrs))
+   {
+ std_attrs = remove_attribute ("gnu", "musttail", std_attrs);
+ a.musttail_p = true;
+   }
+  if (lookup_attribute ("clang", "musttail", std_attrs))
+   {
+ std_attrs = remove_attribute ("clang", "musttail", std_attrs);
+ a.musttail_p = true;
+   }
+}
+  return std_attrs;
+}
+
 /* Parse a compound statement except for the opening brace.  This is
used for parsing both compound statements and statement expressions
(which follow different paths to handling the opening).  */
@@ -6998,6 +7025,7 @@ c_parser_compound_statement_nostart (c_parser *parser)
   bool in_omp_loop_block
 = omp_for_parse_state ? omp_for_parse_state->want_nested_loop : false;
   tree sl = NULL_TREE;
+  attr_state a = {};
 
   if (c_parser_next_token_is (parser, CPP_CLOSE_BRACE))
 {
@@ -7138,7 +7166,10 @@ c_parser_compound_statement_nostart (c_parser *parser)
= c_parser_nth_token_starts_std_attributes (parser, 1);
   tree std_attrs = NULL_TREE;
   if (have_std_attrs)
-   std_attrs = c_parser_std_attribute_specifier_sequence (parser);
+   {
+ std_attrs = c_parser_std_attribute_specifier_sequence (parser);
+ std_attrs = c_parser_handle_musttail (parser, std_attrs, a);
+   }
   if (c_parser_next_token_is_keyword (parser, RID_CASE)
  || c_parser_next_token_is_keyword (parser, RID_DEFAULT)
  || (c_parser_next_token_is (parser, CPP_NAME)
@@ -7286,7 +7317,7 @@ c_parser_compound_statement_nostart (c_parser *parser)
  last_stmt = true;
  mark_valid_location_for_stdc_pragma (false);
  if (!omp_for_parse_state)
-   c_parser_statement_after_labels (parser, NULL);
+   c_parser_statement_after_labels (parser, NULL, NULL, a);
  else
{
  /* In canonical loop nest form, nested loops can only appear
@@ -7328,15 +7359,18 @@ c_parser_compound_statement_nostart (c_parser *parser)
 /* Parse all consecutive labels, possibly preceded by standard
attributes.  In this context, a statement is required, not a
declaration, so attributes must be followed by a statement that is
-   not just a semicolon.  */
+   not just a semicolon.  Returns an attr_state.  */
 
-static void
+static attr_state
 c_parser_all_labels (c_parser *parser)
 {
+  attr_state a = {};
   bool have_std_attrs;
   tree std_attrs = NULL;
   if ((have_std_attrs = c_parser_nth_token_starts_std_attributes (parser, 1)))
-std_attrs = c_parser_std_attribute_specifier_sequence (parser);
+std_attrs = c_parser_handle_musttail (parser,
+   c_parser_std_attribute_speci

[PATCH v9 08/10] Add tests for C/C++ musttail attributes

2024-07-08 Thread Andi Kleen
Some adopted from the existing C musttail plugin tests.

gcc/testsuite/ChangeLog:

* c-c++-common/musttail1.c: New test.
* c-c++-common/musttail2.c: New test.
* c-c++-common/musttail3.c: New test.
* c-c++-common/musttail4.c: New test.
* c-c++-common/musttail7.c: New test.
* c-c++-common/musttail8.c: New test.
* g++.dg/musttail6.C: New test.
* g++.dg/musttail9.C: New test.
* g++.dg/musttail10.C: New test.
---
 gcc/testsuite/c-c++-common/musttail1.c | 14 ++
 gcc/testsuite/c-c++-common/musttail2.c | 33 ++
 gcc/testsuite/c-c++-common/musttail3.c | 29 
 gcc/testsuite/c-c++-common/musttail4.c | 17 +++
 gcc/testsuite/c-c++-common/musttail5.c | 28 
 gcc/testsuite/c-c++-common/musttail7.c | 14 ++
 gcc/testsuite/c-c++-common/musttail8.c | 17 +++
 gcc/testsuite/g++.dg/musttail10.C  | 34 ++
 gcc/testsuite/g++.dg/musttail6.C   | 61 ++
 gcc/testsuite/g++.dg/musttail9.C   | 10 +
 10 files changed, 257 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/musttail1.c
 create mode 100644 gcc/testsuite/c-c++-common/musttail2.c
 create mode 100644 gcc/testsuite/c-c++-common/musttail3.c
 create mode 100644 gcc/testsuite/c-c++-common/musttail4.c
 create mode 100644 gcc/testsuite/c-c++-common/musttail5.c
 create mode 100644 gcc/testsuite/c-c++-common/musttail7.c
 create mode 100644 gcc/testsuite/c-c++-common/musttail8.c
 create mode 100644 gcc/testsuite/g++.dg/musttail10.C
 create mode 100644 gcc/testsuite/g++.dg/musttail6.C
 create mode 100644 gcc/testsuite/g++.dg/musttail9.C

diff --git a/gcc/testsuite/c-c++-common/musttail1.c 
b/gcc/testsuite/c-c++-common/musttail1.c
new file mode 100644
index ..74efcc2a0bc6
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/musttail1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
+
+int __attribute__((noinline,noclone,noipa))
+callee (int i)
+{
+  return i * i;
+}
+
+int __attribute__((noinline,noclone,noipa))
+caller (int i)
+{
+  [[gnu::musttail]] return callee (i + 1);
+}
diff --git a/gcc/testsuite/c-c++-common/musttail2.c 
b/gcc/testsuite/c-c++-common/musttail2.c
new file mode 100644
index ..86f2c3d77404
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/musttail2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+
+struct box { char field[256]; int i; };
+
+int __attribute__((noinline,noclone,noipa))
+test_2_callee (int i, struct box b)
+{
+  if (b.field[0])
+return 5;
+  return i * i;
+}
+
+int __attribute__((noinline,noclone,noipa))
+test_2_caller (int i)
+{
+  struct box b;
+  [[gnu::musttail]] return test_2_callee (i + 1, b); /* { dg-error "cannot 
tail-call: " } */
+}
+
+extern void setjmp (void);
+void
+test_3 (void)
+{
+  [[gnu::musttail]] return setjmp (); /* { dg-error "cannot tail-call: " } */
+}
+
+extern float f7(void);
+
+int
+test_6 (void)
+{
+  [[gnu::musttail]] return f7(); /* { dg-error "cannot tail-call: " } */
+}
diff --git a/gcc/testsuite/c-c++-common/musttail3.c 
b/gcc/testsuite/c-c++-common/musttail3.c
new file mode 100644
index ..ea9589c59ef2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/musttail3.c
@@ -0,0 +1,29 @@
+/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+
+extern int foo2 (int x, ...);
+
+struct str
+{
+  int a, b;
+};
+
+struct str
+cstruct (int x)
+{
+  if (x < 10)
+[[clang::musttail]] return cstruct (x + 1);
+  return ((struct str){ x, 0 });
+}
+
+int
+foo (int x)
+{
+  if (x < 10)
+[[clang::musttail]] return foo2 (x, 29);
+  if (x < 100)
+{
+  int k = foo (x + 1);
+  [[clang::musttail]] return k;/* { dg-error "cannot tail-call: " } */
+}
+  return x;
+}
diff --git a/gcc/testsuite/c-c++-common/musttail4.c 
b/gcc/testsuite/c-c++-common/musttail4.c
new file mode 100644
index ..23f4b5e1cd68
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/musttail4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+
+struct box { char field[64]; int i; };
+
+struct box __attribute__((noinline,noclone,noipa))
+returns_struct (int i)
+{
+  struct box b;
+  b.i = i * i;
+  return b;
+}
+
+int __attribute__((noinline,noclone))
+test_1 (int i)
+{
+  [[gnu::musttail]] return returns_struct (i * 5).i; /* { dg-error "cannot 
tail-call: " } */
+}
diff --git a/gcc/testsuite/c-c++-common/musttail5.c 
b/gcc/testsuite/c-c++-common/musttail5.c
new file mode 100644
index ..234da0d3f2a9
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/musttail5.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c23" { target c } } */
+/* { dg-options "-std=gnu++11" { target c++ } } */
+
+[[musttail]] int j; /* { dg-warning "attribute" } */
+__attribute__((musttail)) int k; /* { dg-warning "attribute" } */
+
+void foo(void)

[PATCH v9 09/10] Add documentation for musttail attribute

2024-07-08 Thread Andi Kleen
gcc/ChangeLog:

PR83324
* doc/extend.texi: Document [[musttail]]
---
 gcc/doc/extend.texi | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b2e41a581dd1..f83e643da19c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -9921,7 +9921,7 @@ same manner as the @code{deprecated} attribute.
 @section Statement Attributes
 @cindex Statement Attributes
 
-GCC allows attributes to be set on null statements.  @xref{Attribute Syntax},
+GCC allows attributes to be set on statements.  @xref{Attribute Syntax},
 for details of the exact syntax for using attributes.  Other attributes are
 available for functions (@pxref{Function Attributes}), variables
 (@pxref{Variable Attributes}), labels (@pxref{Label Attributes}), enumerators
@@ -9978,6 +9978,25 @@ foo (int x, int y)
 @code{y} is not actually incremented and the compiler can but does not
 have to optimize it to just @code{return 42 + 42;}.
 
+@cindex @code{musttail} statement attribute
+@item musttail
+
+The @code{gnu::musttail} or @code{clang::musttail} attribute
+can be applied to a @code{return} statement with a return-value expression
+that is a function call.  It asserts that the call must be a tail call that
+does not allocate extra stack space, so it is safe to use tail recursion
+to implement long running loops.
+
+@smallexample
+[[gnu::musttail]] return foo();
+@end smallexample
+
+If the compiler cannot generate a @code{musttail} tail call it will report
+an error. On some targets tail calls may never be supported.
+Tail calls cannot reference locals in memory, which may affect
+builds without optimization when passing small structures, or passing
+or returning large structures. Enabling -O1 or -O2 can improve
+the success of tail calls.
 @end table
 
 @node Attribute Syntax
@@ -10101,7 +10120,9 @@ the constant expression, if present.
 
 @subsubheading Statement Attributes
 In GNU C, an attribute specifier list may appear as part of a null
-statement.  The attribute goes before the semicolon.
+statement. The attribute goes before the semicolon.
+Some attributes in new style syntax are also supported
+on non-null statements.
 
 @subsubheading Type Attributes
 
-- 
2.45.2



[PATCH v9 07/10] Give better error messages for musttail

2024-07-08 Thread Andi Kleen
When musttail is set, make tree-tailcall give error messages
when it cannot handle a call. This avoids vague "other reasons"
error messages later at expand time when it sees a musttail
function not marked tail call.

In various cases this requires delaying the error until
the call is discovered.

Also print more information on the failure to the dump file.

gcc/ChangeLog:

PR83324
* tree-tailcall.cc (maybe_error_musttail): New function.
(suitable_for_tail_opt_p): Report error reason.
(suitable_for_tail_call_opt_p): Report error reason.
(find_tail_calls): Accept basic blocks with abnormal edges.
Delay reporting of errors until the call is discovered.
Move top level suitability checks to here.
(tree_optimize_tail_calls_1): Remove top level checks.
---
 gcc/tree-tailcall.cc | 187 +++
 1 file changed, 154 insertions(+), 33 deletions(-)

diff --git a/gcc/tree-tailcall.cc b/gcc/tree-tailcall.cc
index 43e8c25215cb..a68079d4f507 100644
--- a/gcc/tree-tailcall.cc
+++ b/gcc/tree-tailcall.cc
@@ -40,9 +40,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "dbgcnt.h"
 #include "cfgloop.h"
+#include "intl.h"
 #include "common/common-target.h"
 #include "ipa-utils.h"
 #include "tree-ssa-live.h"
+#include "diagnostic-core.h"
 
 /* The file implements the tail recursion elimination.  It is also used to
analyze the tail calls in general, passing the results to the rtl level
@@ -131,14 +133,20 @@ static tree m_acc, a_acc;
 
 static bitmap tailr_arg_needs_copy;
 
+static void maybe_error_musttail (gcall *call, const char *err);
+
 /* Returns false when the function is not suitable for tail call optimization
-   from some reason (e.g. if it takes variable number of arguments).  */
+   from some reason (e.g. if it takes variable number of arguments). CALL
+   is call to report for.  */
 
 static bool
-suitable_for_tail_opt_p (void)
+suitable_for_tail_opt_p (gcall *call)
 {
   if (cfun->stdarg)
-return false;
+{
+  maybe_error_musttail (call, _("caller uses stdargs"));
+  return false;
+}
 
   return true;
 }
@@ -146,35 +154,47 @@ suitable_for_tail_opt_p (void)
 /* Returns false when the function is not suitable for tail call optimization
for some reason (e.g. if it takes variable number of arguments).
This test must pass in addition to suitable_for_tail_opt_p in order to make
-   tail call discovery happen.  */
+   tail call discovery happen. CALL is call to report error for.  */
 
 static bool
-suitable_for_tail_call_opt_p (void)
+suitable_for_tail_call_opt_p (gcall *call)
 {
   tree param;
 
   /* alloca (until we have stack slot life analysis) inhibits
  sibling call optimizations, but not tail recursion.  */
   if (cfun->calls_alloca)
-return false;
+{
+  maybe_error_musttail (call, _("caller uses alloca"));
+  return false;
+}
 
   /* If we are using sjlj exceptions, we may need to add a call to
  _Unwind_SjLj_Unregister at exit of the function.  Which means
  that we cannot do any sibcall transformations.  */
   if (targetm_common.except_unwind_info (&global_options) == UI_SJLJ
   && current_function_has_exception_handlers ())
-return false;
+{
+  maybe_error_musttail (call, _("caller uses sjlj exceptions"));
+  return false;
+}
 
   /* Any function that calls setjmp might have longjmp called from
  any called function.  ??? We really should represent this
  properly in the CFG so that this needn't be special cased.  */
   if (cfun->calls_setjmp)
-return false;
+{
+  maybe_error_musttail (call, _("caller uses setjmp"));
+  return false;
+}
 
   /* Various targets don't handle tail calls correctly in functions
  that call __builtin_eh_return.  */
   if (cfun->calls_eh_return)
-return false;
+{
+  maybe_error_musttail (call, _("caller uses __builtin_eh_return"));
+  return false;
+}
 
   /* ??? It is OK if the argument of a function is taken in some cases,
  but not in all cases.  See PR15387 and PR19616.  Revisit for 4.1.  */
@@ -182,7 +202,10 @@ suitable_for_tail_call_opt_p (void)
param;
param = DECL_CHAIN (param))
 if (TREE_ADDRESSABLE (param))
-  return false;
+  {
+   maybe_error_musttail (call, _("address of caller arguments taken"));
+   return false;
+  }
 
   return true;
 }
@@ -402,16 +425,42 @@ propagate_through_phis (tree var, edge e)
   return var;
 }
 
+/* Report an error for failing to tail convert must call CALL
+   with error message ERR. Also clear the flag to prevent further
+   errors.  */
+
+static void
+maybe_error_musttail (gcall *call, const char *err)
+{
+  if (gimple_call_must_tail_p (call))
+{
+  error_at (call->location, "cannot tail-call: %s", err);
+  /* Avoid another error. ??? If there are multiple reasons why tail
+calls fail it might be useful to repo

[PATCH v9 01/10] Improve must tail in RTL backend

2024-07-08 Thread Andi Kleen
- Give error messages for all causes of non sibling call generation
- When giving error messages clear the musttail flag to avoid ICEs
- Error out when tree-tailcall failed to mark a must-tail call
sibcall. In this case it doesn't know the true reason and only gives
a vague message.

PR83324

gcc/ChangeLog:

* calls.cc (maybe_complain_about_tail_call): Clear must tail
flag on error.
(expand_call): Give error messages for all musttail failures.
---
 gcc/calls.cc | 32 ++--
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/gcc/calls.cc b/gcc/calls.cc
index 21d78f9779fe..883eb9971257 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -1249,6 +1249,7 @@ maybe_complain_about_tail_call (tree call_expr, const 
char *reason)
 return;
 
   error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s", reason);
+  CALL_EXPR_MUST_TAIL_CALL (call_expr) = 0;
 }
 
 /* Fill in ARGS_SIZE and ARGS array based on the parameters found in
@@ -2650,7 +2651,13 @@ expand_call (tree exp, rtx target, int ignore)
   /* The type of the function being called.  */
   tree fntype;
   bool try_tail_call = CALL_EXPR_TAILCALL (exp);
-  bool must_tail_call = CALL_EXPR_MUST_TAIL_CALL (exp);
+  /* tree-tailcall decided not to do tail calls. Error for the musttail case,
+ unfortunately we don't know the reason so it's fairly vague.
+ When tree-tailcall reported an error it already cleared the flag,
+ so this shouldn't really happen unless the
+ the musttail pass gave up walking before finding the call.  */
+  if (!try_tail_call)
+  maybe_complain_about_tail_call (exp, "other reasons");
   int pass;
 
   /* Register in which non-BLKmode value will be returned,
@@ -3022,10 +3029,21 @@ expand_call (tree exp, rtx target, int ignore)
  pushed these optimizations into -O2.  Don't try if we're already
  expanding a call, as that means we're an argument.  Don't try if
  there's cleanups, as we know there's code to follow the call.  */
-  if (currently_expanding_call++ != 0
-  || (!flag_optimize_sibling_calls && !CALL_FROM_THUNK_P (exp))
-  || args_size.var
-  || dbg_cnt (tail_call) == false)
+  if (currently_expanding_call++ != 0)
+{
+  maybe_complain_about_tail_call (exp, "inside another call");
+  try_tail_call = 0;
+}
+  if (!flag_optimize_sibling_calls
+   && !CALL_FROM_THUNK_P (exp)
+   && !CALL_EXPR_MUST_TAIL_CALL (exp))
+try_tail_call = 0;
+  if (args_size.var)
+{
+  maybe_complain_about_tail_call (exp, "variable size arguments");
+  try_tail_call = 0;
+}
+  if (dbg_cnt (tail_call) == false)
 try_tail_call = 0;
 
   /* Workaround buggy C/C++ wrappers around Fortran routines with
@@ -3046,13 +3064,15 @@ expand_call (tree exp, rtx target, int ignore)
if (MEM_P (*iter))
  {
try_tail_call = 0;
+   maybe_complain_about_tail_call (exp,
+   "hidden string length argument passed on 
stack");
break;
  }
}
 
   /* If the user has marked the function as requiring tail-call
  optimization, attempt it.  */
-  if (must_tail_call)
+  if (CALL_EXPR_MUST_TAIL_CALL (exp))
 try_tail_call = 1;
 
   /*  Rest of purposes for tail call optimizations to fail.  */
-- 
2.45.2



[PATCH v9 06/10] Enable musttail tail conversion even when not optimizing

2024-07-08 Thread Andi Kleen
Enable the tailcall optimization for non optimizing builds,
but in this case only checks calls that have the musttail attribute set.
This makes musttail work without optimization.

This is done with a new late musttail pass that is only active when
not optimizing. The new pass relies on tree-cfg to discover musttails.
This avoids a ~0.8% compiler run time penalty at -O0.

gcc/ChangeLog:

PR83324
* function.h (struct function): Add has_musttail.
* lto-streamer-in.cc (input_struct_function_base): Stream
has_musttail.
* lto-streamer-out.cc (output_struct_function_base): Dito.
* passes.def (pass_musttail): Add.
* tree-cfg.cc (notice_special_calls): Record has_musttail.
(clear_special_calls): Clear has_musttail.
* tree-pass.h (make_pass_musttail): Add.
* tree-tailcall.cc (find_tail_calls): Handle only_musttail
  argument.
(tree_optimize_tail_calls_1): Pass on only_musttail.
(execute_tail_calls): Pass only_musttail as false.
(class pass_musttail): Add.
(make_pass_musttail): Add.
---
 gcc/function.h  |  3 ++
 gcc/lto-streamer-in.cc  |  1 +
 gcc/lto-streamer-out.cc |  1 +
 gcc/passes.def  |  1 +
 gcc/tree-cfg.cc |  3 ++
 gcc/tree-pass.h |  1 +
 gcc/tree-tailcall.cc| 68 +++--
 7 files changed, 69 insertions(+), 9 deletions(-)

diff --git a/gcc/function.h b/gcc/function.h
index c0ba6cc1531a..fbeadeaf4104 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -430,6 +430,9 @@ struct GTY(()) function {
   /* Nonzero when the tail call has been identified.  */
   unsigned int tail_call_marked : 1;
 
+  /* Has musttail marked calls.  */
+  unsigned int has_musttail : 1;
+
   /* Nonzero if the current function contains a #pragma GCC unroll.  */
   unsigned int has_unroll : 1;
 
diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index ad0ca24007a0..2e592be80823 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1325,6 +1325,7 @@ input_struct_function_base (struct function *fn, class 
data_in *data_in,
   fn->calls_eh_return = bp_unpack_value (&bp, 1);
   fn->has_force_vectorize_loops = bp_unpack_value (&bp, 1);
   fn->has_simduid_loops = bp_unpack_value (&bp, 1);
+  fn->has_musttail = bp_unpack_value (&bp, 1);
   fn->assume_function = bp_unpack_value (&bp, 1);
   fn->va_list_fpr_size = bp_unpack_value (&bp, 8);
   fn->va_list_gpr_size = bp_unpack_value (&bp, 8);
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index d4f728094ed5..0be381abbd96 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -2290,6 +2290,7 @@ output_struct_function_base (struct output_block *ob, 
struct function *fn)
   bp_pack_value (&bp, fn->calls_eh_return, 1);
   bp_pack_value (&bp, fn->has_force_vectorize_loops, 1);
   bp_pack_value (&bp, fn->has_simduid_loops, 1);
+  bp_pack_value (&bp, fn->has_musttail, 1);
   bp_pack_value (&bp, fn->assume_function, 1);
   bp_pack_value (&bp, fn->va_list_fpr_size, 8);
   bp_pack_value (&bp, fn->va_list_gpr_size, 8);
diff --git a/gcc/passes.def b/gcc/passes.def
index b8c21b1e4351..49ab89387552 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -444,6 +444,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_tsan_O0);
   NEXT_PASS (pass_sanopt);
   NEXT_PASS (pass_cleanup_eh);
+  NEXT_PASS (pass_musttail);
   NEXT_PASS (pass_lower_resx);
   NEXT_PASS (pass_nrv);
   NEXT_PASS (pass_gimple_isel);
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 7fb7b92966be..e6fd1294b958 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -2290,6 +2290,8 @@ notice_special_calls (gcall *call)
 cfun->calls_alloca = true;
   if (flags & ECF_RETURNS_TWICE)
 cfun->calls_setjmp = true;
+  if (gimple_call_must_tail_p (call))
+cfun->has_musttail = true;
 }
 
 
@@ -2301,6 +2303,7 @@ clear_special_calls (void)
 {
   cfun->calls_alloca = false;
   cfun->calls_setjmp = false;
+  cfun->has_musttail = false;
 }
 
 /* Remove PHI nodes associated with basic block BB and all edges out of BB.  */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 9843d189d27d..8093b363bf14 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -368,6 +368,7 @@ extern gimple_opt_pass *make_pass_sra (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sra_early (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tail_recursion (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tail_calls (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_musttail (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_fix_loops (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_no_loop (gcc::context *ctxt);
diff --git a/gcc/tree-tailcall.cc b/gcc/tree-tailcall.cc
index e9f7f8a12b3a..43e8c25215cb 100644
--- a/gcc/tree-tailcall.cc
+++ b/gcc/tree-tailcall.cc
@@ -408,10 +408,10 @@ static live_vars_map *li

[PATCH v9 04/10] C++: Support clang compatible [[musttail]] (PR83324)

2024-07-08 Thread Andi Kleen
This patch implements a clang compatible [[musttail]] attribute for
returns.

musttail is useful as an alternative to computed goto for interpreters.
With computed goto the interpreter function usually ends up very big
which causes problems with register allocation and other per function
optimizations not scaling. With musttail the interpreter can be instead
written as a sequence of smaller functions that call each other. To
avoid unbounded stack growth this requires forcing a sibling call, which
this attribute does. It guarantees an error if the call cannot be tail
called which allows the programmer to fix it instead of risking a stack
overflow. Unlike computed goto it is also type-safe.

It turns out that David Malcolm had already implemented middle/backend
support for a musttail attribute back in 2016, but it wasn't exposed
to any frontend other than a special plugin.

This patch adds a [[gnu::musttail]] attribute for C++ that can be added
to return statements. The return statement must be a direct call
(it does not follow dependencies), which is similar to what clang
implements. It then uses the existing must tail infrastructure.

For compatibility it also detects clang::musttail

Passes bootstrap and full test

gcc/cp/ChangeLog:

PR c/83324
* parser.cc (cp_parser_statement): Handle musttail.
(cp_parser_jump_statement): Dito.
* pt.cc (tsubst_expr): Copy CALL_EXPR_MUST_TAIL_CALL.
---
 gcc/cp/parser.cc | 34 +++---
 gcc/cp/pt.cc |  7 ++-
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 31ae9c2fb54d..c8ed88f7a91b 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2467,7 +2467,7 @@ static tree cp_parser_perform_range_for_lookup
 static tree cp_parser_range_for_member_function
   (tree, tree);
 static tree cp_parser_jump_statement
-  (cp_parser *);
+  (cp_parser *, tree &);
 static void cp_parser_declaration_statement
   (cp_parser *);
 
@@ -12756,7 +12756,7 @@ cp_parser_statement (cp_parser* parser, tree 
in_statement_expr,
case RID_CO_RETURN:
case RID_GOTO:
  std_attrs = process_stmt_hotness_attribute (std_attrs, attrs_loc);
- statement = cp_parser_jump_statement (parser);
+ statement = cp_parser_jump_statement (parser, std_attrs);
  break;
 
  /* Objective-C++ exception-handling constructs.  */
@@ -14844,10 +14844,11 @@ cp_parser_init_statement (cp_parser *parser, tree 
*decl)
jump-statement:
  goto * expression ;
 
+   STD_ATTRS are the statement attributes. They can be modified.
Returns the new BREAK_STMT, CONTINUE_STMT, RETURN_EXPR, or GOTO_EXPR.  */
 
 static tree
-cp_parser_jump_statement (cp_parser* parser)
+cp_parser_jump_statement (cp_parser* parser, tree &std_attrs)
 {
   tree statement = error_mark_node;
   cp_token *token;
@@ -14924,6 +14925,33 @@ cp_parser_jump_statement (cp_parser* parser)
  /* If the next token is a `;', then there is no
 expression.  */
  expr = NULL_TREE;
+
+   if (keyword == RID_RETURN && expr)
+ {
+   bool musttail_p = false;
+   if (lookup_attribute ("gnu", "musttail", std_attrs))
+ {
+   musttail_p = true;
+   std_attrs = remove_attribute ("gnu", "musttail", std_attrs);
+ }
+   /* Support this for compatibility.  */
+   if (lookup_attribute ("clang", "musttail", std_attrs))
+ {
+   musttail_p = true;
+   std_attrs = remove_attribute ("clang", "musttail", std_attrs);
+ }
+   if (musttail_p)
+ {
+   tree t = expr;
+   if (t && TREE_CODE (t) == TARGET_EXPR)
+ t = TARGET_EXPR_INITIAL (t);
+   if (t && TREE_CODE (t) != CALL_EXPR)
+ error_at (token->location, "cannot tail-call: return value 
must be a call");
+   else
+ CALL_EXPR_MUST_TAIL_CALL (t) = 1;
+ }
+ }
+
/* Build the return-statement, check co-return first, since type
   deduction is not valid there.  */
if (keyword == RID_CO_RETURN)
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index d1316483e245..3b914089a6e2 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -21113,12 +21113,17 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
bool op = CALL_EXPR_OPERATOR_SYNTAX (t);
bool ord = CALL_EXPR_ORDERED_ARGS (t);
bool rev = CALL_EXPR_REVERSE_ARGS (t);
-   if (op || ord || rev)
+   bool mtc = false;
+   if (TREE_CODE (t) == CALL_EXPR)
+ mtc = CALL_EXPR_MUST_TAIL_CALL (t);
+   if (op || ord || rev || mtc)
  if (tree call = extract_call_expr (ret))
{
  CALL_EXPR_OPERATOR_SYNTAX (call) = op;
  CALL_EXPR_ORDERED_ARGS (call) = ord;
  CALL_

[PATCH v9 03/10] Add a musttail generic attribute to the c-attribs table

2024-07-08 Thread Andi Kleen
The actual handling is directly in the parser since the
generic mechanism doesn't support statement attributes,
but this gives basic error checking/detection on the attribute.

gcc/c-family/ChangeLog:

PR83324
* c-attribs.cc (handle_musttail_attribute): Add.
* c-common.h (handle_musttail_attribute): Add.
---
 gcc/c-family/c-attribs.cc | 15 +++
 gcc/c-family/c-common.h   |  1 +
 2 files changed, 16 insertions(+)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index f9b229aba7fc..5adc7b775eaf 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -340,6 +340,8 @@ const struct attribute_spec c_common_gnu_attributes[] =
   { "common", 0, 0, true,  false, false, false,
  handle_common_attribute,
  attr_common_exclusions },
+  { "musttail",  0, 0, false, false, false,
+ false, handle_musttail_attribute, NULL },
   /* FIXME: logically, noreturn attributes should be listed as
  "false, true, true" and apply to function types.  But implementing this
  would require all the places in the compiler that use TREE_THIS_VOLATILE
@@ -1222,6 +1224,19 @@ handle_common_attribute (tree *node, tree name, tree 
ARG_UNUSED (args),
   return NULL_TREE;
 }
 
+/* Handle a "musttail" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+tree
+handle_musttail_attribute (tree ARG_UNUSED (*node), tree name, tree ARG_UNUSED 
(args),
+  int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  /* Currently only a statement attribute, handled directly in parser.  */
+  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  *no_add_attrs = true;
+  return NULL_TREE;
+}
+
 /* Handle a "noreturn" attribute; arguments as in
struct attribute_spec.handler.  */
 
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 48c89b603bcd..e84c9c47513b 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1643,6 +1643,7 @@ extern tree find_tm_attribute (tree);
 extern const struct attribute_spec::exclusions attr_cold_hot_exclusions[];
 extern const struct attribute_spec::exclusions attr_noreturn_exclusions[];
 extern tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
+extern tree handle_musttail_attribute (tree *, tree, tree, int, bool *);
 extern bool has_attribute (location_t, tree, tree, tree (*)(tree));
 extern tree build_attr_access_from_parms (tree, bool);
 
-- 
2.45.2



New musttail patchkit

2024-07-08 Thread Andi Kleen
This version addresses all the review feedback (Thanks everyone!)

It is getting close to the finish line. The only missing reviews now
are for the C frontend part (patch 5). Joseph and Marek, I would
appreciate if you could take a look.

- Addressed Richie's feedback with various improvements
and better comments and commit messages.
- Squashed some tree-tailcall patches
- Fix some more test issues pointed out by the Linaro bot
[if there are other architectures with some but
not full tail call support like ARM the test cases
may need further adjustments to skip those]
- Some minor cleanups.

-Andi


[PATCH v9 02/10] Fix pro_and_epilogue for sibcalls at -O0 (PR115255)

2024-07-08 Thread Andi Kleen
Some of the cfg fixups in pro_and_epilogue for sibcalls were dependent on 
"optimize".
Make them check cfun->tail_call_marked instead to handle the -O0 musttail
case. This fixes the musttail test cases on arm targets.

gcc/ChangeLog:

PR target/115255
* function.cc (thread_prologue_and_epilogue_insns): Check
  cfun->tail_call_marked for sibcalls too.
(rest_of_handle_thread_prologue_and_epilogue): Dito.
---
 gcc/function.cc | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/function.cc b/gcc/function.cc
index 4edd4da12474..a6f6de349420 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -2231,6 +2231,7 @@ use_register_for_decl (const_tree decl)
   /* We don't set DECL_IGNORED_P for the function_result_decl.  */
   if (optimize)
return true;
+  /* Needed for [[musttail]] which can operate even at -O0 */
   if (cfun->tail_call_marked)
return true;
   /* We don't set DECL_REGISTER for the function_result_decl.  */
@@ -6259,8 +6260,11 @@ thread_prologue_and_epilogue_insns (void)
 }
 
   /* Threading the prologue and epilogue changes the artificial refs in the
- entry and exit blocks, and may invalidate DF info for tail calls.  */
+ entry and exit blocks, and may invalidate DF info for tail calls.
+ This is also needed for [[musttail]] conversion even when not
+ optimizing.  */
   if (optimize
+  || cfun->tail_call_marked
   || flag_optimize_sibling_calls
   || flag_ipa_icf_functions
   || in_lto_p)
@@ -6557,7 +6561,7 @@ rest_of_handle_thread_prologue_and_epilogue (function 
*fun)
 {
   /* prepare_shrink_wrap is sensitive to the block structure of the control
  flow graph, so clean it up first.  */
-  if (optimize)
+  if (cfun->tail_call_marked || optimize)
 cleanup_cfg (0);
 
   /* On some machines, the prologue and epilogue code, or parts thereof,
-- 
2.45.2



Re: [PATCH v8 07/12] Enable musttail tail conversion even when not optimizing

2024-07-08 Thread Andi Kleen
On Mon, Jul 08, 2024 at 05:27:53PM +0200, Richard Biener wrote:
> 
> 
> > Am 08.07.2024 um 17:22 schrieb Andi Kleen :
> > 
> > On Mon, Jul 08, 2024 at 08:53:27AM +0200, Richard Biener wrote:
> >> Ah, I see.  So this pass is responsible for both -O0 and
> >> -fno-optimized-sibling-calls.
> >> But I'm quite sure the other pass doesn't run with -O0
> >> -foptimize-sibling-calls, does it?
> > 
> > It does run:
> > 
> > ./cc1 -O0 -fdump-passes -foptimize-sibling-calls t.c 2>&1 | grep tail
> > tree-tailr1   :  ON
> >  tree-tailr2  :  ON
> >  tree-tailc   :  ON
> 
> I would not trust -fdump-passes, IIRC that just executes the gate functions.

You're right it's not invoked according to gdb.

So the musttail gate needs a || optimize == 0.

-Andi


Re: [PATCH v3] Target-independent store forwarding avoidance.

2024-07-08 Thread Andi Kleen
> I have added a target hook for this in v4 of this patch. The hook
> receives all the information about the stores, the load, the estimated
> sequence cost and whether we expect to eliminate the load. With this
> information the target should be able to make an informed decision.
> 
> What you mention is also true for AArch64: some microbenchmarking I
> did shows that some cores efficiently handle 32bit->64bit store
> forwarding while others not, so creating a target hook is necessary
> for such cases.

Perhaps for the 32->64 case have a generic simple target flag. I presume it
will be common.

On x86 there are lots of other cases too and the details vary based on
the micro architecture. I wonder if there is an efficient way to encode
that in a table.

> This is still hard to tell. In some cases I have observed either
> improvement or regressions in benchmarks, which are highly susceptible
> to costing and the specific store-forwarding penalties of the CPU.
> I have seen cases where the store-forwarding instance is profitable to
> avoid but we get bad code generation due to other reasons (usually
> store_bit_field lowering not being good enough) and hence a
> regression.

I wonder if there could be some heuristic to avoid it for those cases.

> So I believe more time and testing is needed to really evaluate the
> speedups that can be achieved.

So for now it would be off by default?

-Andi


Re: [PATCH v8 08/12] Give better error messages for musttail

2024-07-08 Thread Andi Kleen
On Mon, Jul 08, 2024 at 09:06:21AM +0200, Richard Biener wrote:
> On Sat, Jul 6, 2024 at 8:45 PM Andi Kleen  wrote:
> >
> > > >if (!single_succ_p (bb))
> > > > -return;
> > > > +{
> > > > +  int num_eh, num_other;
> > > > +  bb_get_succ_edge_count (bb, num_eh, num_other);
> > > > +  /* Allow EH edges so that we can give a better
> > > > +error message later.  */
> > >
> > > Please instead use has_abnormal_or_eh_outgoing_edge_p (bb) instead
> >
> > That's not equivalent, need a num_other == 1 check too.
> 
> There can be at most one regular outgoing edge for a block with an
> outgoing EH or abnormal edge.

GIMPLE_CONDs cannot trigger EH?

> > Do you want me to move the function to a generic place?
> 
> Maybe you can use find_fallthru_edge () instead if you think
> has_abnormal_or_eh_outgoing_edge_p isn't good enough?  That will
> find the single_succ_edge when the BB isn't single_succ_p because
> of EH/abnormal edges.
> 
> I think both choices would be equivalent to your new function and its use.

Okay will do the later.

> The comment above the check is a bit weird in how it talks about types, but
> "tail call must be same type" isn't very helpful and it isn't in any way 
> related
> to the actual check being performed.  "return slot" is supposed to be the
> storage used for return pointed to by the invisible reference parameter to
> space allocated by the caller.  Do you know a more C/C++ standard related
> naming for this?

I don't have a better name. Probably the right thing would be to use
whatever term the respective ABI uses, but that may not be the same for
every target. I used your suggestion.

-Andi


Re: [PATCH v8 09/12] Delay caller error reporting for musttail

2024-07-08 Thread Andi Kleen
> > Overall the logic in this pass is rather convoluted and
> > could deserve some cleanups and separation of concerns.
> > e.g. it would be better to separate tail calls and tail
> > recursion. But I'm not trying to rewrite the pass here.
> 
> Understood.  For a v9, can you squash the tree-tailcall.cc changes
> please?

I squashed all the tree-tailcall error changes. The new pass is still
a separate patch. I would prefer to keep it this way.

BTW I'm surprised you prefer that. Normally smaller patches are better
if bisecting is needed.

-Andi


Re: [PATCH v8 07/12] Enable musttail tail conversion even when not optimizing

2024-07-08 Thread Andi Kleen
On Mon, Jul 08, 2024 at 08:53:27AM +0200, Richard Biener wrote:
> Ah, I see.  So this pass is responsible for both -O0 and
> -fno-optimized-sibling-calls.
> But I'm quite sure the other pass doesn't run with -O0
> -foptimize-sibling-calls, does it?

It does run:

./cc1 -O0 -fdump-passes -foptimize-sibling-calls t.c 2>&1 | grep tail
 tree-tailr1   :  ON
  tree-tailr2  :  ON
  tree-tailc   :  ON

But I suspect without the earlier expand patch to adjust the cfg rebuild it may
ICE on some of the targets.

-Andi


Re: [PATCH v8 08/12] Give better error messages for musttail

2024-07-06 Thread Andi Kleen
> >if (!single_succ_p (bb))
> > -return;
> > +{
> > +  int num_eh, num_other;
> > +  bb_get_succ_edge_count (bb, num_eh, num_other);
> > +  /* Allow EH edges so that we can give a better
> > +error message later.  */
> 
> Please instead use has_abnormal_or_eh_outgoing_edge_p (bb) instead

That's not equivalent, need a num_other == 1 check too.

Do you want me to move the function to a generic place?

> to avoid adding another function like this.  Also only continue searching
> for a musttail call if cfun->has_musttail

Done (although I must say I liked the better dump messages even for non
tailcall)

> >if (gimple_references_memory_p (stmt)
> >   || gimple_has_volatile_ops (stmt))
> > -   return;
> > +   {
> > + bad_stmt = true;
> 
> break here when !cfun->has_musttail?

Done.

> >if (ass_var
> >&& !is_gimple_reg (ass_var)
> >&& !auto_var_in_fn_p (ass_var, cfun->decl))
> > -return;
> > +{
> > +  maybe_error_musttail (call, _("return value in memory"));
> > +  return;
> > +}
> > +
> > +  if (cfun->calls_setjmp)
> > +{
> > +  maybe_error_musttail (call, _("caller uses setjmp"));
> > +  return;
> > +}
> >
> >/* If the call might throw an exception that wouldn't propagate out of
> >   cfun, we can't transform to a tail or sibling call (82081).  */
> > -  if (stmt_could_throw_p (cfun, stmt)
> > -  && !stmt_can_throw_external (cfun, stmt))
> > +  if ((stmt_could_throw_p (cfun, stmt)
> > +   && !stmt_can_throw_external (cfun, stmt)) || !single_succ_p (bb))
> 
> This reports for the found stmt while above we reject any intermediate
> non-fallthru control flow.  I would suggest to, in the above BB check,
> record a gimple *last = last_stmt (bb) and if last == stmt report this reason
> but otherwise "control altering statement between call and return"?

Ok.  I reported "code between call and return". I don't think there
since "control" would imply control flow.

Also there is no last_stmt () or did I miss it? It couldn't be used
anyways because it still needs to skip the nops etc. But the backwards 
loop can easily discover it.

BTW I suspect some of the checks are redundant but it is hard to really
prove it, so I left everything in place.

> > +maybe_error_musttail (call,
> > + _("call may throw exception that does not 
> > propagate"));
> >  return;
> > +  }
> >
> >/* If the function returns a value, then at present, the tail call
> >   must return the same type of value.  There is conceptually a copy
> > @@ -524,7 +593,10 @@ find_tail_calls (basic_block bb, struct tailcall 
> > **ret, bool only_musttail)
> >if (result_decl
> >&& may_be_aliased (result_decl)
> >&& ref_maybe_used_by_stmt_p (call, result_decl, false))
> > -return;
> > +{
> > +  maybe_error_musttail (call, _("tail call must be same type"));
> 
> ?  "call uses the return slot"?
> 
> Otherwise looks OK.

Done. Although I'm not sure what a return slot is, but maybe the users
can figure it out)

-Andi


Re: [PATCH v8 09/12] Delay caller error reporting for musttail

2024-07-06 Thread Andi Kleen
On Fri, Jul 05, 2024 at 01:45:17PM +0200, Richard Biener wrote:
> On Sat, Jun 22, 2024 at 9:00 PM Andi Kleen  wrote:
> >
> > Move the error reporting for caller attributes to be
> > after the tail call discovery, so that we can give proper
> > error messages tagged to the calls.
> 
> Hmm.  This all gets a bit awkward.  I realize that early checking
> gets us less compile-time unnecessarily spent for searching for
> a tail call - but at least for the musttail case parsing constraints
> should put a practical limit on how far to look?

All the top level checks are for obscure situations, so it's unlikely
that it makes much difference for compile time either way.

> 
> So what I wonder is whether it would be better to separate
> searching for a (musttail) candidate separate from validation?
> 
> We could for example invoke find_tail_calls twice, once to
> find a musttail candidate (can there be multiple ones?) and once
> to validate and error?  Would that make the delaying less awkward?

There can be multiple musttails in a function, in theory
one for every return.

I'm not sure I see the awkward part? (other than perhaps
the not-quite-natural accumulation of opt_tailcalls). There
are alots of checks before and after discovery. This just
moves them all to be after.

If the top level checks were done based on a discovered 
list you would need extra loops to walk the candidates 
later and error. It wouldn't be any simpler at least.

Overall the logic in this pass is rather convoluted and
could deserve some cleanups and separation of concerns.
e.g. it would be better to separate tail calls and tail
recursion. But I'm not trying to rewrite the pass here.

-Andi


Re: [PATCH v8 03/12] Add a musttail generic attribute to the c-attribs table

2024-07-06 Thread Andi Kleen
On Fri, Jul 05, 2024 at 12:44:47PM +0200, Richard Biener wrote:
> On Sat, Jun 22, 2024 at 8:57 PM Andi Kleen  wrote:
> >
> > It does nothing currently since statement attributes are handled
> > directly in the parser.
> 
> Is this needed at all?  a "'musttail' attribute ignored" diagnostic isn't
> much more helpful than "'foo' attribute directive ignored"?  Or does
> stmt attribute parsing rely on this table as well?

It avoids an extra check in the C/C++ parser. I will clarify the commit
message to say that.

-Andi


Re: [PATCH v8 07/12] Enable musttail tail conversion even when not optimizing

2024-07-06 Thread Andi Kleen
> > +class pass_musttail : public gimple_opt_pass
> > +{
> > +public:
> > +  pass_musttail (gcc::context *ctxt)
> > +: gimple_opt_pass (pass_data_musttail, ctxt)
> > +  {}
> > +
> > +  /* opt_pass methods: */
> > +  /* This pass is only used when not optimizing to make [[musttail]] still
> > + work.  */
> > +  bool gate (function *) final override { return 
> > !flag_optimize_sibling_calls; }
> 
> Shouldn't this check f->has_musttail only?  That is, I would expect
> -fno-optimize-sibling-calls to still tail-call [[musttail]]?  The comment says
> the pass only runs when not optimizing - so maybe you wanted to do
> return optimize == 0;?

When flag_optimize_sibling_call is set the other tailcall pass will 
take care of the musttails. It is only needed when that one doesn't run.
So I think looking at that flag is correct.

But I should move the f->has_musttail check into the gate (done) and
clarified the comment because it is not specific to optimizing.

Thanks,
-Andi


Re: [PATCH] x86: Update branch hint for Redwood Cove.

2024-07-02 Thread Andi Kleen
liuhongt  writes:

> From: "H.J. Lu" 
>
> According to Intel® 64 and IA-32 Architectures Optimization Reference
> Manual[1], Branch Hint is updated for Redwood Cove.
>
> cut from [1]-
> Starting with the Redwood Cove microarchitecture, if the predictor has
> no stored information about a branch, the branch has the Intel® SSE2
> branch taken hint (i.e., instruction prefix 3EH), When the codec
> decodes the branch, it flips the branch’s prediction from not-taken to
> taken. It then flushes the pipeline in front of it and steers this
> pipeline to fetch the taken path of the branch.
> cut end -
>
> For -mtune-ctrl=branch_prediction_hints, always generate branch hint for
> conditional branches, this tune is disabled by default.
>
> [1] 
> https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ready push to trunk.

So what does it do to code size?
You may not want to do it with -Os.

Maybe it should be only done with actual profile feedback data
available, i'm not sure if the builtin heuristics are good enough to
justify it and there is a risk that it is very wrong.  

Yes as long as it's disabled by default that's all not a problem, but it
would need to be solved to enable it.

-Andi


[PING] Re: Updated musttail patchkit

2024-07-01 Thread Andi Kleen
Andi Kleen  writes:

I wanted to ping this patch kit to add musttail support for C/C++,
to enable future python versions and other users and keep up with clang. 

https://gcc.gnu.org/pipermail/gcc-patches/2024-June/thread.html#655447

It unfortunately touches various different parts of the compiler.
All the previous feedback has been addressed, except for
- cannot make it a warning because that would defeat the purpose
- cannot move all of the checking to expand time (would be a whole
scale rewrite of the whole mechanism)

These are RTL level:
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655448.html
(got some feedback from the two Richards and Jakub earlier)
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655450.html
(got some feedback from Andrew)

C++:
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655449.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655451.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655453.html
(C++, already approved)

C:
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655452.html
(C, got some feedback from Joseph, but never got finally approved) 

https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655455.html

Unreviewed patches, touching both tree-ssa-tailcall and calls.c expand:
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655454.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655457.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655456.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655458.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655459.html

Thanks,
-Andi

> - Fix problems with encoding musttail in tree structure (Thanks Jakub and 
> Jason)
> - Fixes a miscompilation that would break bootstrap with 
> --enable-checking=release
> - Avoids a 0.8% compile time penalty at -O0 for the new musttail pass by 
> using a cfun flag
> that is discovered by tree-cfg
> - Enables translation of musttail error messages
> - Further improves error reporting, avoiding "other reasons" error messages
> for various cases and reporting the correct error in others.
> - Adjusted the test suite to powerpc sibcall limitations
> - Addressed C++ review feedback
> - Improves dump file output
> - Improves the documentation
> - Some random cleanups
> - Rebased on trunk
>
> Tested full bootstrap on x86_64-linux and powerpc64le-linux, as well
> as a x86_64 LTO profiled bootstrap and some x86_64 testing with
> --enable-release=checking.


[PATCH v8 08/12] Give better error messages for musttail

2024-06-22 Thread Andi Kleen
When musttail is set, make tree-tailcall give error messages
when it cannot handle a call. This avoids vague "other reasons"
error messages later at expand time when it sees a musttail
function not marked tail call.

In various cases this requires delaying the error until
the call is discovered.

gcc/ChangeLog:

* tree-tailcall.cc (maybe_error_musttail): New function.
(bb_get_succ_edge_count): New function.
(find_tail_calls): Add error reporting. Handle EH edges
for error reporting.
---
 gcc/tree-tailcall.cc | 116 +--
 1 file changed, 102 insertions(+), 14 deletions(-)

diff --git a/gcc/tree-tailcall.cc b/gcc/tree-tailcall.cc
index 0c6df10e64f7..4687e20e61d0 100644
--- a/gcc/tree-tailcall.cc
+++ b/gcc/tree-tailcall.cc
@@ -40,9 +40,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "dbgcnt.h"
 #include "cfgloop.h"
+#include "intl.h"
 #include "common/common-target.h"
 #include "ipa-utils.h"
 #include "tree-ssa-live.h"
+#include "diagnostic-core.h"
 
 /* The file implements the tail recursion elimination.  It is also used to
analyze the tail calls in general, passing the results to the rtl level
@@ -402,6 +404,41 @@ propagate_through_phis (tree var, edge e)
   return var;
 }
 
+/* Report an error for failing to tail convert must call CALL
+   with error message ERR. Also clear the flag to prevent further
+   errors.  */
+
+static void
+maybe_error_musttail (gcall *call, const char *err)
+{
+  if (gimple_call_must_tail_p (call))
+{
+  error_at (call->location, "cannot tail-call: %s", err);
+  /* Avoid another error. ??? If there are multiple reasons why tail
+calls fail it might be useful to report them all to avoid
+whack-a-mole for the user. But currently there is too much
+redundancy in the reporting, so keep it simple.  */
+  gimple_call_set_must_tail (call, false); /* Avoid another error.  */
+  gimple_call_set_tail (call, false);
+}
+}
+
+/* Count succ edges for BB and return in NUM_OTHER and NUM_EH.  */
+
+static void
+bb_get_succ_edge_count (basic_block bb, int &num_other, int &num_eh)
+{
+  edge e;
+  edge_iterator ei;
+  num_eh = 0;
+  num_other = 0;
+  FOR_EACH_EDGE (e, ei, bb->succs)
+if (e->flags & EDGE_EH)
+  num_eh++;
+else
+  num_other++;
+}
+
 /* Argument for compute_live_vars/live_vars_at_stmt and what compute_live_vars
returns.  Computed lazily, but just once for the function.  */
 static live_vars_map *live_vars;
@@ -426,8 +463,16 @@ find_tail_calls (basic_block bb, struct tailcall **ret, 
bool only_musttail)
   tree var;
 
   if (!single_succ_p (bb))
-return;
+{
+  int num_eh, num_other;
+  bb_get_succ_edge_count (bb, num_eh, num_other);
+  /* Allow EH edges so that we can give a better
+error message later.  */
+  if (num_other != 1)
+   return;
+}
 
+  bool bad_stmt = false;
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
 {
   stmt = gsi_stmt (gsi);
@@ -448,6 +493,12 @@ find_tail_calls (basic_block bb, struct tailcall **ret, 
bool only_musttail)
  /* Handle only musttail calls when not optimizing.  */
  if (only_musttail && !gimple_call_must_tail_p (call))
return;
+ if (bad_stmt)
+   {
+ maybe_error_musttail (call,
+ _("memory reference or volatile after call"));
+ return;
+   }
  ass_var = gimple_call_lhs (call);
  break;
}
@@ -462,9 +513,14 @@ find_tail_calls (basic_block bb, struct tailcall **ret, 
bool only_musttail)
   /* If the statement references memory or volatile operands, fail.  */
   if (gimple_references_memory_p (stmt)
  || gimple_has_volatile_ops (stmt))
-   return;
+   {
+ bad_stmt = true;
+   }
 }
 
+  if (bad_stmt)
+return;
+
   if (gsi_end_p (gsi))
 {
   edge_iterator ei;
@@ -489,13 +545,26 @@ find_tail_calls (basic_block bb, struct tailcall **ret, 
bool only_musttail)
   if (ass_var
   && !is_gimple_reg (ass_var)
   && !auto_var_in_fn_p (ass_var, cfun->decl))
-return;
+{
+  maybe_error_musttail (call, _("return value in memory"));
+  return;
+}
+
+  if (cfun->calls_setjmp)
+{
+  maybe_error_musttail (call, _("caller uses setjmp"));
+  return;
+}
 
   /* If the call might throw an exception that wouldn't propagate out of
  cfun, we can't transform to a tail or sibling call (82081).  */
-  if (stmt_could_throw_p (cfun, stmt)
-  && !stmt_can_throw_external (cfun, stmt))
+  if ((stmt_could_throw_p (cfun, stmt)
+   && !stmt_can_throw_external (cfun, stmt)) || !single_succ_p (bb))
+  {
+maybe_error_musttail (call,
+ _("call may throw exception that does not 
propagate"));
 return;
+  }
 
   /* If the function returns a value, then at present, the tail call
  mu

  1   2   3   4   5   6   7   8   9   >