[PATCH] c++: testsuite: require lto_incremental in pr90990_0.C

2022-06-20 Thread Alexandre Oliva via Gcc-patches


Other LTO tests that use -r require the lto_incremental effective
target.  I suppose pr90990_0.C is missing it due to an oversight.
This patch arranges for this test to also be skipped on
non-lto_incremental targets.

Regstrapped on x86_64-linux-gnu, also tested with a cross to
aarch64-rtems6.  Ok to install?


for  gcc/testsuite/ChangeLog

* g++.dg/lto/pr90990_0.C: Require lto_incremental target.
---
 gcc/testsuite/g++.dg/lto/pr90990_0.C |1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/g++.dg/lto/pr90990_0.C b/gcc/testsuite/g++.dg/lto/pr90990_0.C
index 22a5e3ffaaa45..74cc2bbd92889 100644
--- a/gcc/testsuite/g++.dg/lto/pr90990_0.C
+++ b/gcc/testsuite/g++.dg/lto/pr90990_0.C
@@ -1,5 +1,6 @@
 // { dg-lto-do link }
 /* { dg-extra-ld-options {  -r -nostdlib } } */
+// { dg-require-effective-target lto_incremental }
 class A {
 public:
   float m_floats;

-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] Introduce -nolibstdc++ option

2022-06-20 Thread Alexandre Oliva via Gcc-patches


Using g++ to link without libstdc++, as in g++.dg/abi/pure-virtual1.C,
is error prone, because there's no way to tell g++ to drop libstdc++
without also dropping libc and any other libraries that the target
implicitly links in.

This has often led to the need for manual adjustments to this
testcase.

I figured adding support for -nolibstdc++, even though redundant,
makes some sense.  One could presumably use gcc rather than g++ for
linking, for the same effect, but sometimes changing the link command
is harder than adding an option, as in our testsuite.
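
For illustration only (this is not the testcase itself), here is a minimal sketch of the kind of code at stake: Base has a pure virtual function, so its vtable refers to __cxa_pure_virtual, but nothing else from the C++ runtime library is used.  The names Base, Derived, call_through_base and run_demo are made up for this example.

```cpp
// Hypothetical example: no headers, no RTTI use, no new/delete, so the
// only C++ runtime symbol that may be referenced is __cxa_pure_virtual.
struct Base
{
  virtual int value () const = 0;  // pure virtual slot in Base's vtable
};

struct Derived : Base
{
  int value () const override { return 42; }
};

// Dispatch through the base class, so the call really is virtual.
int
call_through_base (const Base &b)
{
  return b.value ();
}

int
run_demo ()
{
  Derived d;
  return call_through_base (d);
}
```

Assuming the patch is applied, a file like this plus a trivial main could presumably be linked with something like `g++ -fno-rtti -nolibstdc++`, with libc and any other implicitly linked target libraries left on the link line.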

Regstrapped on x86_64-linux-gnu, also tested with a cross to
aarch64-rtems6.  Ok to install?


for  gcc/ChangeLog

* common.opt (nolibstdc++): New.
* doc/invoke.texi (-nolibstdc++): Document it.

for  gcc/cp/ChangeLog

* g++spec.cc (lang_specific_driver): Implement -nolibstdc++.

for  gcc/testsuite/ChangeLog

* g++.dg/abi/pure-virtual1.C: Use -nolibstdc++.
---
 gcc/common.opt   |3 +++
 gcc/cp/g++spec.cc|1 +
 gcc/doc/invoke.texi  |6 +-
 gcc/testsuite/g++.dg/abi/pure-virtual1.C |2 +-
 4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 32917aafcaec1..e00c6fc2fb098 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3456,6 +3456,9 @@ Driver
 nolibc
 Driver
 
+nolibstdc++
+Driver
+
 nostdlib
 Driver
 
diff --git a/gcc/cp/g++spec.cc b/gcc/cp/g++spec.cc
index 8174d652776b1..539e6ca089d85 100644
--- a/gcc/cp/g++spec.cc
+++ b/gcc/cp/g++spec.cc
@@ -160,6 +160,7 @@ lang_specific_driver (struct cl_decoded_option **in_decoded_options,
{
case OPT_nostdlib:
case OPT_nodefaultlibs:
+   case OPT_nolibstdc__:
  library = -1;
  break;
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 50f57877477bc..469b6d97e0dfa 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -652,7 +652,7 @@ Objective-C and Objective-C++ Dialects}.
 @item Linker Options
 @xref{Link Options,,Options for Linking}.
 @gccoptlist{@var{object-file-name}  -fuse-ld=@var{linker}  -l@var{library} @gol
--nostartfiles  -nodefaultlibs  -nolibc  -nostdlib @gol
+-nostartfiles  -nodefaultlibs  -nolibc  -nolibstdc++  -nostdlib @gol
 -e @var{entry}  --entry=@var{entry} @gol
 -pie  -pthread  -r  -rdynamic @gol
 -s  -static  -static-pie  -static-libgcc  -static-libstdc++ @gol
@@ -16787,6 +16787,10 @@ absence of a C library is assumed, for example @option{-lpthread} or
 @option{-lm} in some configurations.  This is intended for bare-board
 targets when there is indeed no C library available.
 
+@item -nolibstdc++
+@opindex nolibstdc++
+Do not link with standard C++ libraries implicitly.
+
 @item -nostdlib
 @opindex nostdlib
 Do not use the standard system startup files or libraries when linking.
diff --git a/gcc/testsuite/g++.dg/abi/pure-virtual1.C b/gcc/testsuite/g++.dg/abi/pure-virtual1.C
index 538e2cb097a0d..889c33e4952f4 100644
--- a/gcc/testsuite/g++.dg/abi/pure-virtual1.C
+++ b/gcc/testsuite/g++.dg/abi/pure-virtual1.C
@@ -1,7 +1,7 @@
 // Test that we don't need libsupc++ just for __cxa_pure_virtual.
 // { dg-do link }
 // { dg-require-weak }
-// { dg-additional-options "-fno-rtti -nodefaultlibs -lc" }
+// { dg-additional-options "-fno-rtti -nolibstdc++" }
 // { dg-additional-options "-Wl,-undefined,dynamic_lookup" { target *-*-darwin* } }
 // { dg-xfail-if "AIX weak" { powerpc-ibm-aix* } }
 



[PATCH] libstdc++: testsuite: call sched_yield for nonpreemptive targets

2022-06-20 Thread Alexandre Oliva via Gcc-patches


As in the gcc testsuite, systems without preemptive multi-threading
require sched_yield calls to be placed at points in which a context
switch might be needed to enable the test to complete.
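
A minimal sketch of the pattern, assuming a POSIX system that declares sched_yield in <sched.h>; wait_for_worker is a hypothetical helper written for this note, not code from the test:

```cpp
#include <atomic>
#include <sched.h>   // sched_yield (POSIX)
#include <thread>

// Spin until the worker thread has announced itself.  On a scheduler
// without preemption, the explicit yield is what gives the worker a
// chance to run; without it the loop could spin forever.
bool
wait_for_worker ()
{
  std::atomic<bool> started (false);
  std::thread worker ([&started] { started.store (true); });

  while (!started.load ())
    sched_yield ();

  worker.join ();
  return started.load ();
}
```

On preemptive systems the yield is merely a hint, so adding it should not change the outcome there.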

Regstrapped on x86_64-linux-gnu, also tested with a cross to
aarch64-rtems6.  Ok to install?


for  libstdc++-v3/ChangeLog

* testsuite/30_threads/this_thread/60421.cc (test02): Call
sched_yield.
---
 .../testsuite/30_threads/this_thread/60421.cc  |1 +
 1 file changed, 1 insertion(+)

diff --git a/libstdc++-v3/testsuite/30_threads/this_thread/60421.cc b/libstdc++-v3/testsuite/30_threads/this_thread/60421.cc
index ad6f9aeffcc80..12dbeba1cc492 100644
--- a/libstdc++-v3/testsuite/30_threads/this_thread/60421.cc
+++ b/libstdc++-v3/testsuite/30_threads/this_thread/60421.cc
@@ -59,6 +59,7 @@ test02()
   while (!sleeping)
   {
 // Wait for the thread to start sleeping.
+sched_yield ();
   }
   while (sleeping)
   {



[PATCH] testsuite: pthread: call sched_yield for non-preemptive targets

2022-06-20 Thread Alexandre Oliva via Gcc-patches


Systems without preemptive multi-threading require sched_yield calls
to be placed at points in which a context switch might be needed to
enable the test to complete.

Regstrapped on x86_64-linux-gnu, also tested with a cross to
aarch64-rtems6.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.dg/atomic/c11-atomic-exec-4.c: Call sched_yield.
* gcc.dg/atomic/c11-atomic-exec-5.c: Likewise.
* gcc.dg/atomic/pr80640-2.c: Likewise.
* gcc.dg/atomic/pr80640.c: Likewise.
* gcc.dg/atomic/pr81316.c: Likewise.
* gcc.dg/di-sync-multithread.c: Likewise.
---
 gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-4.c |   12 +---
 gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c |6 +-
 gcc/testsuite/gcc.dg/atomic/pr80640-2.c |6 --
 gcc/testsuite/gcc.dg/atomic/pr80640.c   |6 --
 gcc/testsuite/gcc.dg/atomic/pr81316.c   |9 +++--
 gcc/testsuite/gcc.dg/di-sync-multithread.c  |8 
 6 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-4.c b/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-4.c
index d6bb629f59ffa..669e7c058c39e 100644
--- a/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-4.c
+++ b/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-4.c
@@ -32,7 +32,10 @@ test_thread_##NAME (void *arg)           \
 {                                          \
   thread_ready = true;                     \
   for (int i = 0; i < ITER_COUNT; i++)     \
-    PRE var_##NAME POST;                   \
+    {                                      \
+      sched_yield ();                      \
+      PRE var_##NAME POST;                 \
+    }                                      \
   return NULL;                             \
 }                                          \
                                            \
@@ -49,9 +52,12 @@ test_main_##NAME (void)                 \
       return 1;                            \
     }                                      \
   while (!thread_ready)                    \
-    ;                                      \
+    sched_yield ();                        \
   for (int i = 0; i < ITER_COUNT; i++)     \
-    PRE var_##NAME POST;                   \
+    {                                      \
+      PRE var_##NAME POST;                 \
+      sched_yield ();                      \
+    }                                      \
   pthread_join (thread_id, NULL);          \
   if (var_##NAME != (FINAL))               \
     {                                      \
diff --git a/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c b/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c
index 692c64ad20737..f8bfa63b4cc8a 100644
--- a/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c
+++ b/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c
@@ -53,8 +53,11 @@ test_thread_##NAME (void *arg)           \
   thread_ready = true;                     \
   while (!thread_stop)                     \
     {                                      \
+      sched_yield ();                      \
       var_##NAME = (INIT1);                \
+      sched_yield ();                      \
       var_##NAME = (INIT2);                \
+      sched_yield ();                      \
     }                                      \
   return NULL;                             \
 }                                          \
@@ -75,13 +78,14 @@ test_main_##NAME (void)                \
     }                                      \
   int num_1_pass = 0, num_1_fail = 0, num_2_pass = 0, num_2_fail = 0;  \
   while (!thread_ready)                    \
-    ;                                      \
+    sched_yield ();

[PATCH] testsuite: outputs.exp: cleanup before running tests (was: Re: [PATCH] testsuite: outputs.exp: test for skip_atsave more thoroughly)

2022-06-20 Thread Alexandre Oliva via Gcc-patches
On Jun 21, 2022, Alexandre Oliva wrote:

>   * gcc.misc-tests/outputs.exp (outest): Introduce quiet mode,

Use the just-added dry-run infrastructure to clean up files that may
have been left over by interrupted runs of outputs.exp, which used to
lead to spurious non-repeatable (self-fixing) failures.

Regstrapped on x86_64-linux-gnu, also tested with a cross to
aarch64-rtems6.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.misc-tests/outputs.exp: Clean up left-overs first.
---
 gcc/testsuite/gcc.misc-tests/outputs.exp |3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.misc-tests/outputs.exp b/gcc/testsuite/gcc.misc-tests/outputs.exp
index a63ce66693b97..ab919db1ccb2d 100644
--- a/gcc/testsuite/gcc.misc-tests/outputs.exp
+++ b/gcc/testsuite/gcc.misc-tests/outputs.exp
@@ -304,6 +304,9 @@ if { "$aout" != "" } then {
 set oaout "-o $aout"
 }
 
+# Clean up any left-overs from an earlier interrupted run.
+outest "$b-cleanup?" $sing "$oaout" {alt/ dir/ o/ od/ obj/} {{} {} {} {} {} {$aout}}
+
 # Sometimes the -I or -L flags that cause the compiler driver to save
 # .args.[01], instead of leaving it for the linker to save .ld1_args,
 # is hiding in driver self specs.




[PATCH] testsuite: outputs.exp: test for skip_atsave more thoroughly

2022-06-20 Thread Alexandre Oliva via Gcc-patches


The presence of -I or -L flags in link command lines changes the
driver's, and thus the linker's, behavior with respect to naming the
files that hold command-line options.  With such flags, the driver
creates .args.0 and .args.1 files, whereas without them it's the linker
(collect2, really) that creates .ld1_args.

I've hit some failures on a target system whose board config file has
no -I or -L flags, but whose configured-in driver self specs add some
of them implicitly.  Alas, the test in outputs.exp doesn't catch that,
so we proceed to run the atsave tests rather than set skip_atsave.

I've reworked the outest procedure to allow dry runs and to return
would-have-been pass/fail results as lists, so we can now test whether
certain files are created and use that to configure the actual test
runs.
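
The actual change is to a Tcl procedure; the idea behind it can be sketched in C++ under some loud assumptions.  Everything here is hypothetical and for illustration only: check_outputs stands in for outest, the set of present files replaces real file system checks, and the file names are invented.  What it shows is the shape of the rework: results are collected into lists, logged only when the test name does not end in a question mark, and returned so a caller can make decisions from a dry run.

```cpp
#include <iostream>
#include <set>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::string> Names;

// Hypothetical analogue of the reworked outest: classify each expected
// output as a pass or a fail, log the results unless the test name ends
// with '?', and return both lists either way.
std::pair<Names, Names>
check_outputs (const std::string &test,
               const std::set<std::string> &present,
               const Names &expected)
{
  const bool quiet = !test.empty () && test[test.size () - 1] == '?';
  Names passes, fails;

  for (const std::string &name : expected)
    {
      if (present.count (name))
        passes.push_back (name);
      else
        fails.push_back (name);
    }

  if (!quiet)
    {
      for (const std::string &p : passes)
        std::cout << "PASS: " << test << ": " << p << '\n';
      for (const std::string &f : fails)
        std::cout << "FAIL: " << test << ": " << f << '\n';
    }

  return std::make_pair (passes, fails);
}

// Dry run: if the (made-up) driver-saved file shows up among the
// passes, the atsave tests should be skipped.
bool
should_skip_atsave (const std::set<std::string> &present)
{
  const Names expected{"outputs.args.1"};
  const std::pair<Names, Names> result
    = check_outputs ("outputs-skip-atsave?", present, expected);
  return !result.first.empty () && result.first.front () == "outputs.args.1";
}
```

Returning the lists instead of logging directly is what lets the same procedure serve both as a probe and as a real test.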

Regstrapped on x86_64-linux-gnu, also tested with a cross to
aarch64-rtems6.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.misc-tests/outputs.exp (outest): Introduce quiet mode,
create and return lists of passes and fails.  Use it to catch
skip_atsave cases where -L flags are implicitly added by
driver self specs.
---
 gcc/testsuite/gcc.misc-tests/outputs.exp |   49 ++
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.misc-tests/outputs.exp b/gcc/testsuite/gcc.misc-tests/outputs.exp
index afae735e92d76..a63ce66693b97 100644
--- a/gcc/testsuite/gcc.misc-tests/outputs.exp
+++ b/gcc/testsuite/gcc.misc-tests/outputs.exp
@@ -116,8 +116,23 @@ if [info exists env(MAKEFLAGS)] {
 # it weren't for
 # https://core.tcl-lang.org/tcl/tktview?name=5bbd044812), but .{i,s,o}
 # and .[iso] will pass even if only the .o is present.
+
+# Return a list containing two lists, the first naming the passes, the
+# second naming the fails.  If test ends with a question mark, the
+# test is taken as a preparatory test or cleanup, and no pass or fail
+# results will be logged, though the lists will still be built and
+# returned.
 array unset outests *
 proc outest { test sources opts dirs outputs } {
+if { [string index $test end] == "?" } {
+   set quiet 1
+} else {
+   set quiet 0
+}
+
+set passes {}
+set fails {}
+
 global b srcdir subdir
 global outests
 
@@ -182,15 +197,15 @@ proc outest { test sources opts dirs outputs } {
set o "$og"
}
if { [file exists $d$o] } then {
-   pass "$test: $d$o"
+   lappend passes "$d$o"
file delete $d$o
} else {
set ogl [glob -nocomplain -path $d -- $o]
if { $ogl != {} } {
-   pass "$test: $d$o"
+   lappend passes "$d$o"
file delete $ogl
} else {
-   fail "$test: $d$o"
+   lappend fails "$d$o"
}
}
}
@@ -219,17 +234,27 @@ proc outest { test sources opts dirs outputs } {
 }
 
 if { [llength $outb] == 0 } then {
-   pass "$test: extra"
+   lappend passes "extra"
 } else {
-   fail "$test: extra\n$outb"
+   lappend fails "extra\n$outb"
 }
 
 if { [string equal "$gcc_output" ""] } then {
-   pass "$test: std out"
+   lappend passes "std out"
 } else {
-   fail "$test: std out\n$gcc_output"
+   lappend fails "std out\n$gcc_output"
 }
 
+if !$quiet {
+   foreach p $passes {
+   pass "$test: $p"
+   }
+   foreach f $fails {
+   fail "$test: $f"
+   }
+}
+
+return [list $passes $fails]
 }
 
 set sing {-0.c}
@@ -279,6 +304,16 @@ if { "$aout" != "" } then {
 set oaout "-o $aout"
 }
 
+# Sometimes the -I or -L flags that cause the compiler driver to save
+# .args.[01], instead of leaving it for the linker to save .ld1_args,
+# is hiding in driver self specs.
+if !$skip_atsave {
+set atsave_test_out [outest "$b-skip-atsave?" $sing "@/dev/null -o $b.exe -save-temps" {} {{.args.1}}]
+if { [lindex [lindex $atsave_test_out 0] 0] == "$b.args.1" } {
+   set skip_atsave 1
+}
+}
+
 # Driver-chosen outputs.
 outest "$b-1 asm default 1" $sing "-S" {} {{-0.s}}
 outest "$b-2 asm default 2" $mult "-S" {} {{-1.s -2.s}}



[PATCH] aarch64: testsuite: symbol-range compile only

2022-06-20 Thread Alexandre Oliva via Gcc-patches


On some of our embedded aarch64 targets, RAM size is too small for
this test to fit.  It doesn't look like this test requires linking,
and if it does, the -tiny version may presumably get most of the
coverage without going overboard in target system requirements.

Regstrapped on x86_64-linux-gnu, also tested with a cross to
aarch64-rtems6.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.target/aarch64/symbol-range.c: Compile only.
---
 gcc/testsuite/gcc.target/aarch64/symbol-range.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range.c b/gcc/testsuite/gcc.target/aarch64/symbol-range.c
index d8e82fa1b2829..cc68c19ca85d9 100644
--- a/gcc/testsuite/gcc.target/aarch64/symbol-range.c
+++ b/gcc/testsuite/gcc.target/aarch64/symbol-range.c
@@ -1,4 +1,4 @@
-/* { dg-do link } */
+/* { dg-do compile } */
 /* { dg-options "-O3 -save-temps -mcmodel=small" } */
 
 char fixed_regs[0x8000];



[PATCH] libstdc++-v3: testsuite: complex proj requirements

2022-06-20 Thread Alexandre Oliva via Gcc-patches


The template version of complex::proj returns its argument without
testing for infinities, and that's all we have when neither C99
complex nor C99 math functions are available, and it seems too hard to
do better without isinf and copysign.
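
To make the requirement concrete, here is a rough sketch of what a correct projection has to do, following the C99 cproj rule: any value with an infinite part maps to positive real infinity, with an imaginary zero that keeps the sign of the original imaginary part.  reference_proj is a made-up name for this illustration, not libstdc++'s implementation; note that it relies on exactly isinf and copysign.

```cpp
#include <cmath>
#include <complex>
#include <limits>

// Illustrative reference only: project z onto the Riemann sphere.
// Finite values are returned unchanged; values with an infinite real
// or imaginary part become (+inf, +0.0) or (+inf, -0.0).
std::complex<double>
reference_proj (const std::complex<double> &z)
{
  if (std::isinf (z.real ()) || std::isinf (z.imag ()))
    return std::complex<double> (std::numeric_limits<double>::infinity (),
                                 std::copysign (0.0, z.imag ()));
  return z;
}
```

A fallback that simply returns its argument agrees with this sketch for finite inputs and differs for the infinite ones.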

I suppose just calling them and expecting users will supply
specializations as needed has been ruled out, and so has refraining
from defining it when it can't be implemented correctly.

It's pointless to run the proj.cc test under these circumstances, so
arrange for it to be skipped.  This is done in an unusual way because I
tried introducing dg-require tests for ccomplex-or-cmath and found
their results misleading due to variations across -std=* versions.

Regstrapped on x86_64-linux-gnu, also tested with a cross to
aarch64-rtems6.  Ok to install?


for  libstdc++-v3/ChangeLog

* testsuite/26_numerics/complex/proj.cc: Skip test in the
circumstances in which the implementation of proj is known to
be broken.
---
 libstdc++-v3/testsuite/26_numerics/complex/proj.cc |   13 +
 1 file changed, 13 insertions(+)

diff --git a/libstdc++-v3/testsuite/26_numerics/complex/proj.cc b/libstdc++-v3/testsuite/26_numerics/complex/proj.cc
index a053119197ccd..69f8153c06f05 100644
--- a/libstdc++-v3/testsuite/26_numerics/complex/proj.cc
+++ b/libstdc++-v3/testsuite/26_numerics/complex/proj.cc
@@ -397,6 +397,19 @@ test03()
 int
 main()
 {
+  /* If neither of these macros is nonzero, proj calls a
+ specialization of the __complex_proj template, that just returns
+ its argument, without testing for infinities, rendering the whole
+ test pointless, and failing (as intended/noted in the
+ implementation) the cases that involve infinities.  Alas, the
+ normal ways to skip tests may not work: we don't have a test for
+ C99_COMPLEX, and these macros may vary depending on -std=*, but
+ macro tests wouldn't take them into account.  */
+#if ! (_GLIBCXX_USE_C99_COMPLEX || _GLIBCXX_USE_C99_MATH_TR1)
+  if (true)
+return 0;
+#endif
+
   test01();
   test02();
   test03();



[PATCH] libstdc++: testsuite: use cmath long double overloads

2022-06-20 Thread Alexandre Oliva via Gcc-patches


In case we need to supplement the C standard library with additional
definitions for float and long double, the declarations expected to be
in the C headers may not be there.  Rely on the cmath overloads
instead.
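
As a small sketch of what relying on the overloads means: with <cmath>, std::nexttoward and std::ldexp called with a long double argument select the long double overloads, so the l-suffixed C names need not be declared.  The helpers next_up and scale_by_power_of_two are hypothetical names for this example.

```cpp
#include <cmath>
#include <limits>
#include <type_traits>

// The overloads taking long double return long double, so no precision
// is lost relative to the l-suffixed C functions.
static_assert (std::is_same<decltype (std::ldexp (1.0L, 1)),
                            long double>::value,
               "ldexp(long double, int) yields long double");
static_assert (std::is_same<decltype (std::nexttoward (1.0L, 2.0L)),
                            long double>::value,
               "nexttoward(long double, long double) yields long double");

// Next representable value above x, via the cmath overload.
long double
next_up (long double x)
{
  return std::nexttoward (x, std::numeric_limits<long double>::infinity ());
}

// x * 2^exponent, via the cmath overload.
long double
scale_by_power_of_two (long double x, int exponent)
{
  return std::ldexp (x, exponent);
}
```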

Regstrapped on x86_64-linux-gnu, also tested with a cross to
aarch64-rtems6.  Ok to install?


for  libstdc++-v3/ChangeLog

* testsuite/20_util/to_chars/long_double.cc: Use cmath
long double overloads for nexttoward and ldexp.
---
 .../testsuite/20_util/to_chars/long_double.cc  |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc b/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
index 0b1c2c2936fdc..498388110b179 100644
--- a/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
+++ b/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
@@ -54,11 +54,11 @@ namespace detail
 {
   long double
   nextupl(long double x)
-  { return nexttowardl(x, numeric_limits<long double>::infinity()); }
+  { return nexttoward(x, numeric_limits<long double>::infinity()); }
 
   long double
   nextdownl(long double x)
-  { return nexttowardl(x, -numeric_limits<long double>::infinity()); }
+  { return nexttoward(x, -numeric_limits<long double>::infinity()); }
 }
 
 // The long double overloads of std::to_chars currently just go through printf
@@ -138,7 +138,7 @@ test01()
   for (int exponent : {-11000, -3000, -300, -50, -7, 0, 7, 50, 300, 3000, 11000})
 for (long double testcase : hex_testcases)
   {
-   testcase = ldexpl(testcase, exponent);
+   testcase = ldexp(testcase, exponent);
if (testcase == 0.0L || isinf(testcase))
  continue;
 



[PATCH] libstdc++: testsuite: require cmath for exp simd

2022-06-20 Thread Alexandre Oliva via Gcc-patches


simd_math.h assumes declarations for many C99 functions to be present,
that libstdc++ doesn't add to target systems that don't have them in
the C library.

Add the C99 math requirement to tests for simd features, so that they
don't fail because of limitations of the target C library.

Regstrapped on x86_64-linux-gnu, also tested with a cross to
aarch64-rtems6.  Ok to install?


for  libstdc++-v3/ChangeLog

* testsuite/experimental/simd/standard_abi_usable.cc: Require
cmath support.
* testsuite/experimental/simd/standard_abi_usable_2.cc:
Likewise.
---
 .../experimental/simd/standard_abi_usable.cc   |1 +
 .../experimental/simd/standard_abi_usable_2.cc |1 +
 2 files changed, 2 insertions(+)

diff --git a/libstdc++-v3/testsuite/experimental/simd/standard_abi_usable.cc b/libstdc++-v3/testsuite/experimental/simd/standard_abi_usable.cc
index 4d7e6726951fe..1b686d9ca095b 100644
--- a/libstdc++-v3/testsuite/experimental/simd/standard_abi_usable.cc
+++ b/libstdc++-v3/testsuite/experimental/simd/standard_abi_usable.cc
@@ -1,5 +1,6 @@
 // { dg-options "-std=c++17 -fno-fast-math" }
 // { dg-do compile { target c++17 } }
+// { dg-require-cmath "" }
 
 // Copyright (C) 2020-2022 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/experimental/simd/standard_abi_usable_2.cc b/libstdc++-v3/testsuite/experimental/simd/standard_abi_usable_2.cc
index a609adaf000b3..a0203d0b4238d 100644
--- a/libstdc++-v3/testsuite/experimental/simd/standard_abi_usable_2.cc
+++ b/libstdc++-v3/testsuite/experimental/simd/standard_abi_usable_2.cc
@@ -1,4 +1,5 @@
 // { dg-options "-std=c++17 -ffast-math" }
 // { dg-do compile }
+// { dg-require-cmath "" }
 
 #include "standard_abi_usable.cc"



[PATCH] libstdc++: testsuite: require cmath for nexttowardl

2022-06-20 Thread Alexandre Oliva via Gcc-patches


nexttowardl is only expected to be available with C99 math, but
20_util/to_chars/long_double.cc uses it unconditionally.

State the cmath requirement in the test.

Regstrapped on x86_64-linux-gnu, also tested with a cross to
aarch64-rtems6.  Ok to install?


for  libstdc++-v3/ChangeLog

* testsuite/20_util/to_chars/long_double.cc: Require cmath.
---
 .../testsuite/20_util/to_chars/long_double.cc  |1 +
 1 file changed, 1 insertion(+)

diff --git a/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc b/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
index 94b5b5967d374..0b1c2c2936fdc 100644
--- a/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
+++ b/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
@@ -36,6 +36,7 @@
 
 // { dg-require-effective-target ieee_floats }
 // { dg-require-effective-target size32plus }
+// { dg-require-cmath "" }
 
 #include 
 




[PATCH] libstdc++: testsuite: work around bitset namespace pollution

2022-06-20 Thread Alexandre Oliva via Gcc-patches


rtems6 declares a global struct bitset in a header file included
indirectly by sys/types.h, that ambiguates the unqualified references
to bitset after "using namespace std" in the testsuite.

Work around the namespace pollution with using declarations of
std::bitset.
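
A minimal sketch of the workaround, with a stand-in for the offending declaration (the global struct below is invented for this example; the real one comes from an rtems header).  A using-directive alone would leave the unqualified name ambiguous between ::bitset and std::bitset, whereas a block-scope using-declaration is found first and hides the global one.

```cpp
#include <bitset>
#include <cstddef>

// Stand-in for the global struct declared by a system header.
struct bitset {};

std::size_t
count_set_bits ()
{
  using namespace std;
  using std::bitset; // Without this line, "bitset<8>" below is ambiguous.

  bitset<8> b;
  b.set (1);
  b.set (5);
  return b.count ();
}
```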

Regstrapped on x86_64-linux-gnu, also tested with a cross to
aarch64-rtems6.0.  Ok to install?


for  libstdc++-v3/ChangeLog

* testsuite/23_containers/bitset/cons/dr1325-2.cc: Work around
global struct bitset.
* testsuite/23_containers/bitset/ext/15361.cc: Likewise.
* testsuite/23_containers/bitset/input/1.cc: Likewise.
* testsuite/23_containers/bitset/to_string/1.cc: Likewise.
* testsuite/23_containers/bitset/to_string/dr396.cc: Likewise.
---
 .../23_containers/bitset/cons/dr1325-2.cc  |1 +
 .../testsuite/23_containers/bitset/ext/15361.cc|1 +
 .../testsuite/23_containers/bitset/input/1.cc  |1 +
 .../testsuite/23_containers/bitset/to_string/1.cc  |1 +
 .../23_containers/bitset/to_string/dr396.cc|1 +
 5 files changed, 5 insertions(+)

diff --git a/libstdc++-v3/testsuite/23_containers/bitset/cons/dr1325-2.cc b/libstdc++-v3/testsuite/23_containers/bitset/cons/dr1325-2.cc
index 4b79c9f046dbc..2371bef09cca7 100644
--- a/libstdc++-v3/testsuite/23_containers/bitset/cons/dr1325-2.cc
+++ b/libstdc++-v3/testsuite/23_containers/bitset/cons/dr1325-2.cc
@@ -39,6 +39,7 @@ template
 void test01()
 {
   using namespace std;
+  using std::bitset; // Work around struct ::bitset on rtems.
 
   const char s1[4] = { '0', '1', '0', '1' };
   VERIFY( bitset<4>(s1, 4) == test01_ref<4>(s1, 4) );
diff --git a/libstdc++-v3/testsuite/23_containers/bitset/ext/15361.cc b/libstdc++-v3/testsuite/23_containers/bitset/ext/15361.cc
index 40cb94966ab8f..392470084aee5 100644
--- a/libstdc++-v3/testsuite/23_containers/bitset/ext/15361.cc
+++ b/libstdc++-v3/testsuite/23_containers/bitset/ext/15361.cc
@@ -22,6 +22,7 @@
 void test01()
 {
   using namespace std;
+  using std::bitset; // Work around struct ::bitset on rtems.
 
   bitset<256> b;
   b.set(225);
diff --git a/libstdc++-v3/testsuite/23_containers/bitset/input/1.cc b/libstdc++-v3/testsuite/23_containers/bitset/input/1.cc
index 8738c77238377..939861b171eaa 100644
--- a/libstdc++-v3/testsuite/23_containers/bitset/input/1.cc
+++ b/libstdc++-v3/testsuite/23_containers/bitset/input/1.cc
@@ -26,6 +26,7 @@
 void test01()
 {
   using namespace std;
+  using std::bitset; // Work around struct ::bitset on rtems.
 
   bitset<5>  b5;
   bitset<0>  b0;
diff --git a/libstdc++-v3/testsuite/23_containers/bitset/to_string/1.cc b/libstdc++-v3/testsuite/23_containers/bitset/to_string/1.cc
index f4af91373cc37..8384eb96d2547 100644
--- a/libstdc++-v3/testsuite/23_containers/bitset/to_string/1.cc
+++ b/libstdc++-v3/testsuite/23_containers/bitset/to_string/1.cc
@@ -25,6 +25,7 @@
 void test01()
 {
   using namespace std;
+  using std::bitset; // Work around struct ::bitset on rtems.
 
   bitset<5> b5;
   string s0 = b5.to_string, allocator >();
diff --git a/libstdc++-v3/testsuite/23_containers/bitset/to_string/dr396.cc b/libstdc++-v3/testsuite/23_containers/bitset/to_string/dr396.cc
index 8faded348479a..dfba27ed3afa1 100644
--- a/libstdc++-v3/testsuite/23_containers/bitset/to_string/dr396.cc
+++ b/libstdc++-v3/testsuite/23_containers/bitset/to_string/dr396.cc
@@ -26,6 +26,7 @@
 void test01()
 {
   using namespace std;
+  using std::bitset; // Work around struct ::bitset on rtems.
 
   bitset<5> b5;
   string s0 = b5.to_string, allocator >('a', 'b');




Re: [PATCH] x86-64: Remove HAVE_LD_PIE_COPYRELOC

2022-06-20 Thread Fangrui Song via Gcc-patches
On Wed, Jun 15, 2022 at 2:34 AM Fangrui Song wrote:
>
> This was introduced in 2014-12 to use local binding for external symbols
> for -fPIE.  It avoids a GOT indirection but the same optimization is
> obtained with ld's R_X86_64_[REX_]GOTPCRELX optimization (albeit with
> slightly longer code).
>
> One design goal of -fPIE was to avoid copy relocations.
> HAVE_LD_PIE_COPYRELOC has deviated from the goal.  By removing
> HAVE_LD_PIE_COPYRELOC, the -fPIE behavior of x86-64 will match x86-32
> and other targets.
>
> The design goal of protected symbols was to improve performance similar
> to -Bsymbolic.  lld rejects copy relocations on data symbols.  Latest
> glibc rtld reports a warning when a protected data symbol is copy
> relocated[1].  With the adoption of PIE most object files are -fPIE or
> -fPIC.  Having -fPIE default to behavior that may introduce copy
> relocations makes protected data symbols infeasible to adopt on x86-64.
>
> [1]: 
> https://sourceware.org/git/?p=glibc.git;a=commit;h=7374c02b683b7110b853a32496a619410364d70b
> ("elf: Refine direct extern access diagnostics to protected symbol")
> ---
>  gcc/config.in |  6 ---
>  gcc/config/i386/i386.cc   | 16 +-
>  gcc/configure | 52 ---
>  gcc/configure.ac  | 48 -
>  gcc/doc/sourcebuild.texi  |  3 --
>  .../gcc.target/i386/pie-copyrelocs-1.c| 14 -
>  .../gcc.target/i386/pie-copyrelocs-2.c| 14 -
>  .../gcc.target/i386/pie-copyrelocs-3.c| 14 -
>  .../gcc.target/i386/pie-copyrelocs-4.c| 17 --
>  gcc/testsuite/gcc.target/i386/pr32219-9.c |  1 -
>  gcc/testsuite/lib/target-supports.exp | 47 -
>  11 files changed, 2 insertions(+), 230 deletions(-)
>  delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-1.c
>  delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-2.c
>  delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-3.c
>  delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-4.c
>
> diff --git a/gcc/config.in b/gcc/config.in
> index 16bb963b45b..ade42625deb 100644
> --- a/gcc/config.in
> +++ b/gcc/config.in
> @@ -1691,12 +1691,6 @@
>  #endif
>
>
> -/* Define 0/1 if your linker supports -pie option with copy reloc. */
> -#ifndef USED_FOR_TARGET
> -#undef HAVE_LD_PIE_COPYRELOC
> -#endif
> -
> -
>  /* Define if your PowerPC linker has .gnu.attributes long double support. */
>  #ifndef USED_FOR_TARGET
>  #undef HAVE_LD_PPC_GNU_ATTR_LONG_DOUBLE
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 3d189e124e4..f9fd9650f7c 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -10790,16 +10790,7 @@ legitimate_pic_address_disp_p (rtx disp)
>   || ix86_cmodel == CM_SMALL_PIC)
> return true;
> }
> - else if (!SYMBOL_REF_FAR_ADDR_P (op0)
> -  && (SYMBOL_REF_LOCAL_P (op0)
> -  || ((ix86_direct_extern_access
> -   && !(SYMBOL_REF_DECL (op0)
> -&& lookup_attribute 
> ("nodirect_extern_access",
> - DECL_ATTRIBUTES 
> (SYMBOL_REF_DECL (op0)
> -  && HAVE_LD_PIE_COPYRELOC
> -  && flag_pie
> -  && !SYMBOL_REF_WEAK (op0)
> -  && !SYMBOL_REF_FUNCTION_P (op0)))
> + else if (!SYMBOL_REF_FAR_ADDR_P (op0) && SYMBOL_REF_LOCAL_P (op0)
>&& ix86_cmodel != CM_LARGE_PIC)
> return true;
>   break;
> @@ -23815,10 +23806,7 @@ ix86_binds_local_p (const_tree exp)
>  ix86_has_no_direct_extern_access = true;
>return default_binds_local_p_3 (exp, flag_shlib != 0, true,
>   direct_extern_access,
> - (direct_extern_access
> -  && (!flag_pic
> -  || (TARGET_64BIT
> -  && HAVE_LD_PIE_COPYRELOC != 0;
> + (direct_extern_access && !flag_pic));
>  }
>
>  /* If flag_pic or ix86_direct_extern_access is false, then neither
> diff --git a/gcc/configure b/gcc/configure
> index f43dc989d02..bf8aaec6e05 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -30081,58 +30081,6 @@ fi
>  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld_pie" >&5
>  $as_echo "$gcc_cv_ld_pie" >&6; }
>
> -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking linker PIE support with 
> copy reloc" >&5
> -$as_echo_n "checking linker PIE support with copy reloc... " >&6; }
> -gcc_cv_ld_pie_copyreloc=no
> -if test $gcc_cv_ld_pie = yes ; then
> -  if test $in_tree_ld = yes ; then
> -if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" 

Re: [PATCH] Revert "[PATCH] RISC-V: Use new linker emulations for glibc ABI."

2022-06-20 Thread Fangrui Song via Gcc-patches
On Mon, Jun 20, 2022 at 1:21 AM Kito Cheng wrote:
>
> Generally I agree we should fix that by GCC driver rather than ld
> emulation, but I think this should be reverted with the -L path fix,
> otherwise that will break multilib on GNU toolchain for linux
> immediately?

Thanks for the careful consideration.  That said, I am not sure any distro
uses this currently.  I think some just work around the possibly
non-existent paths by creating symlinks.  Perhaps we should prioritize
fixing the scheme before distros start to rely on the behavior.

> On Wed, Jun 15, 2022 at 4:00 PM Fangrui Song via Gcc-patches
>  wrote:
> >
> > This reverts commit 37d57ac9a636f2235f9060e84fb8dd7968abd1dc.
> >
> > The resolution to https://sourceware.org/bugzilla/show_bug.cgi?id=22962
> > let GCC pass -m emulation to ld and let the ld emulation configure
> > default library paths.  This scheme is problematic:
> >
> > * It's not ld's business to specify default -L.  Different platforms have
> > different opinions on the hierarchy and all other arches work well without 
> > ld's
> > default -L.
> > * If some ABI derived library paths are desired, the compiler driver is in a
> > better position to make the decision and traditionally has done this.
> > * -m emulation is opaque to the compiler driver.  It doesn't affect -B, so
> > data files like crt*.o, libasan_preinit.o, and libtsan_preinit.o are not 
> > affected.
> >
> > As is, many platforms just use symlinks to fake the 
> > lib64/{ilp32{,f},lp64{,f}}
> > hierarchies needed by the GNU ld emulation.  They can always specify -L
> > explicitly if they want some ABI derived library paths.  See also the 
> > rejected
> > https://reviews.llvm.org/D95755
> >
> > gcc/Changelog:
> >
> > * config/riscv/linux.h (LD_EMUL_SUFFIX): Remove.
> > (LINK_SPEC): Remove LD_EMUL_SUFFIX.
> > ---
> >  gcc/config/riscv/linux.h | 10 +-
> >  1 file changed, 1 insertion(+), 9 deletions(-)
> >
> > diff --git a/gcc/config/riscv/linux.h b/gcc/config/riscv/linux.h
> > index 38803723ba9..e0ff6e6a178 100644
> > --- a/gcc/config/riscv/linux.h
> > +++ b/gcc/config/riscv/linux.h
> > @@ -49,16 +49,8 @@ along with GCC; see the file COPYING3.  If not see
> >
> >  #define CPP_SPEC "%{pthread:-D_REENTRANT}"
> >
> > -#define LD_EMUL_SUFFIX \
> > -  "%{mabi=lp64d:}" \
> > -  "%{mabi=lp64f:_lp64f}" \
> > -  "%{mabi=lp64:_lp64}" \
> > -  "%{mabi=ilp32d:}" \
> > -  "%{mabi=ilp32f:_ilp32f}" \
> > -  "%{mabi=ilp32:_ilp32}"
> > -
> >  #define LINK_SPEC "\
> > --melf" XLEN_SPEC DEFAULT_ENDIAN_SPEC "riscv" LD_EMUL_SUFFIX " \
> > +-melf" XLEN_SPEC DEFAULT_ENDIAN_SPEC "riscv \
> >  %{mno-relax:--no-relax} \
> >  %{mbig-endian:-EB} \
> >  %{mlittle-endian:-EL} \
> > --
> > 2.36.1.476.g0c4daa206d-goog
> >



-- 
宋方睿


Re: [PATCH] if-to-switch: Don't skip the first condition bb when find_conditions in if-to-switch [PR105740]

2022-06-20 Thread Xionghu Luo via Gcc-patches

Correct the format...

 test2:
.LFB0:
        .cfi_startproc
        xorl    %edx, %edx
        cmpl    $3, (%rdi)
        jle     .L1
        movl    16(%rdi), %eax
        cmpl    $1, %eax
        je      .L4
        subl    $2, %eax
        cmpl    $4, %eax
        ja      .L1
        movl    CSWTCH.1(,%rax,4), %edx
.L1:
        movl    %edx, %eax
        ret
        .p2align 4,,10
        .p2align 3
.L4:
        movl    $12, %edx
        jmp     .L1
        .cfi_endproc
.LFE0:
        .size   test2, .-test2
        .section        .rodata
        .align 16
        .type   CSWTCH.1, @object
        .size   CSWTCH.1, 20
CSWTCH.1:
        .long   27
        .long   38
        .long   18
        .long   58
        .long   68



With the patch attached:


 test2:
.LFB0:
        .cfi_startproc
        xorl    %edx, %edx
        cmpl    $3, (%rdi)
        jle     .L1
        movl    16(%rdi), %eax
        subl    $1, %eax
        cmpl    $5, %eax
        jbe     .L6
.L1:
        movl    %edx, %eax
        ret
        .p2align 4,,10
        .p2align 3
.L6:
        movl    CSWTCH.1(,%rax,4), %edx
        movl    %edx, %eax
        ret
        .cfi_endproc
.LFE0:
        .size   test2, .-test2
        .section        .rodata
        .align 16
        .type   CSWTCH.1, @object
        .size   CSWTCH.1, 24
CSWTCH.1:
        .long   12
        .long   27
        .long   38
        .long   18
        .long   58
        .long   68




On 2022/6/21 11:05, xionghuluo(罗雄虎) via Gcc-patches wrote:

Current GCC generates:


test2:
.LFB0:
.cfi_startproc
xorl  %edx, %edx
cmpl  $3, (%rdi)
jle   .L1
movl  16(%rdi), %eax
cmpl  $1, %eax
je   .L4
subl  $2, %eax
cmpl  $4, %eax
ja   .L1
movl  CSWTCH.1(,%rax,4), %edx
.L1:
movl  %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L4:
movl  $12, %edx
jmp   .L1
.cfi_endproc
.LFE0:
.size  test2, .-test2
.section.rodata
.align 16
.type  CSWTCH.1, @object
.size  CSWTCH.1, 20
CSWTCH.1:
.long  27
.long  38
.long  18
.long  58
.long  68




With the patch attached:


test2:
.LFB0:
.cfi_startproc
xorl  %edx, %edx
cmpl  $3, (%rdi)
jle   .L1
movl  16(%rdi), %eax
subl  $1, %eax
cmpl  $5, %eax
jbe   .L6
.L1:
movl  %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L6:
movl  CSWTCH.1(,%rax,4), %edx
movl  %edx, %eax
ret
.cfi_endproc
.LFE0:
.size  test2, .-test2
.section.rodata
.align 16
.type  CSWTCH.1, @object
.size  CSWTCH.1, 24
CSWTCH.1:
.long  12
.long  27
.long  38
.long  18
.long  58
.long  68


Bootstrapped and regression tested on x86_64-linux-gnu without failures. OK for master?




Re: [PING][PATCH][WIP] have configure probe prefix for gmp/mpfr/mpc [PR44425]

2022-06-20 Thread Alexandre Oliva via Gcc-patches
Hello, Eric,

On Jun  9, 2022, Eric Gallager  wrote:

> (cc-ing the build machinery maintainers listed in MAINTAINERS this time)

Thanks, I'd missed it the first time.

> On Thu, Jun 2, 2022 at 11:53 AM Eric Gallager  wrote:

>> So, I'm working on fixing PR bootstrap/44425, and have this patch to
>> have the top-level configure script check in the value passed to
>> `--prefix=` when looking for gmp/mpfr/mpc. It "works" (in that
>> configuring with just `--prefix=` and none of
>> `--with-gmp=`/`--with-mpfr=`/`--with-mpc=` now works where it failed
>> before), but unfortunately it results in a bunch of duplicated
>> `-I`/`-L` flags stuck in ${gmplibs} and ${gmpinc}... is that
>> acceptable or should I try another approach?

I wonder if it would make sense to add -L${libdir} and -I${includedir}
to host flags.  It would obviate the explicit flag-setting for each of
the libs, and it would address the apparent double-setting when prefix
is set, or confusingly partial overrides when --with-*-include and
--with-*-lib are used.

It would IMHO be preferable to use libdir and includedir, rather than
prefix, especially for cases in which exec_prefix != prefix, but it takes
some work for libdir and includedir to be expanded correctly during
configure.  E.g., libdir is normally set to ${exec_prefix}/lib, and
exec_prefix defaults to ${prefix}, but a shell won't expand multiple
layers of macros like make does, so configure needs some help with that.

-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] if-to-switch: Don't skip the first condition bb when find_conditions in if-to-switch [PR105740]

2022-06-20 Thread 罗雄虎
Current GCC generates:


test2:
.LFB0:
.cfi_startproc
xorl  %edx, %edx
cmpl  $3, (%rdi)
jle   .L1
movl  16(%rdi), %eax
cmpl  $1, %eax
je   .L4
subl  $2, %eax
cmpl  $4, %eax
ja   .L1
movl  CSWTCH.1(,%rax,4), %edx
.L1:
movl  %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L4:
movl  $12, %edx
jmp   .L1
.cfi_endproc
.LFE0:
.size  test2, .-test2
.section.rodata
.align 16
.type  CSWTCH.1, @object
.size  CSWTCH.1, 20
CSWTCH.1:
.long  27
.long  38
.long  18
.long  58
.long  68




With the patch attached:


test2:
.LFB0:
.cfi_startproc
xorl  %edx, %edx
cmpl  $3, (%rdi)
jle   .L1
movl  16(%rdi), %eax
subl  $1, %eax
cmpl  $5, %eax
jbe   .L6
.L1:
movl  %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L6:
movl  CSWTCH.1(,%rax,4), %edx
movl  %edx, %eax
ret
.cfi_endproc
.LFE0:
.size  test2, .-test2
.section.rodata
.align 16
.type  CSWTCH.1, @object
.size  CSWTCH.1, 24
CSWTCH.1:
.long  12
.long  27
.long  38
.long  18
.long  58
.long  68


Bootstrapped and regression tested on x86_64-linux-gnu without failures. OK for master?

0001-if-to-switch-Don-t-skip-the-first-condition-bb-when-.patch
Description: Binary data
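
For readers without the attachment, here is a hypothetical C source with the same shape as the listings above. The struct layout and field names are assumptions reconstructed from the assembly (a field at offset 0 compared against 3, a field at offset 16 selecting one of six constants); this is not the PR's actual testcase. The second function spells out the table lookup that the patched output performs.

```c
#include <assert.h>

/* Hypothetical layout: 'a' at offset 0 and 'e' at offset 16, matching the
   (%rdi) and 16(%rdi) accesses in the listings.  */
struct s { int a, b, c, d, e; };

/* An if-else chain over one variable.  If the first condition (e == 1) is
   skipped when collecting conditions, only the cases 2..6 reach the table
   and case 1 needs its own compare and branch.  */
int test2 (const struct s *p)
{
  int ret = 0;
  if (p->a > 3)
    {
      if (p->e == 1) ret = 12;
      else if (p->e == 2) ret = 27;
      else if (p->e == 3) ret = 38;
      else if (p->e == 4) ret = 18;
      else if (p->e == 5) ret = 58;
      else if (p->e == 6) ret = 68;
    }
  return ret;
}

/* The same mapping as one lookup: subtract 1, range-check against 5
   (unsigned), then index a six-entry table.  */
int test2_table (const struct s *p)
{
  static const int cswtch[6] = { 12, 27, 38, 18, 58, 68 };
  if (p->a <= 3)
    return 0;
  unsigned idx = (unsigned) p->e - 1u;
  return idx <= 5u ? cswtch[idx] : 0;
}
```

Both functions return the same value for every input; the difference is only whether case 1 is folded into the table.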


RE: [PATCH] i386: Add syscall to enable AMX for latest kernels

2022-06-20 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Uros Bizjak 
> Sent: Monday, June 20, 2022 10:54 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest kernels
> 
> On Mon, Jun 20, 2022 at 10:04 AM Haochen Jiang 
> wrote:
> >
> > From: "Jiang, Haochen" 
> >
> > Hi all,
> >
> > We need syscall to enable AMX for kernels>=5.4. It is missing in
> > current amx tests, which will cause test fail.
> 
> So this new code is only valid for linux & co?

Thanks for the reminder. I only tested on Linux, since the header file is
only available there.

I have just updated the patch to wrap the new code in a macro, so that the
behavior on Windows does not change.

Regtested on x86_64-pc-linux-gnu.

Thx,
Haochen
> 
> Uros.
> 
> >
> > This patch aims to add them to fix this bug.
> >
> > BRs,
> > Haochen
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/amx-check.h (request_perm_xtile_data):
> > New function to check if AMX is usable and enable AMX.
> > (main): Run test if AMX is usable.
> > ---
> >  gcc/testsuite/gcc.target/i386/amx-check.h | 24
> > +++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/amx-check.h
> > b/gcc/testsuite/gcc.target/i386/amx-check.h
> > index 434b0e59703..92ed8669304 100644
> > --- a/gcc/testsuite/gcc.target/i386/amx-check.h
> > +++ b/gcc/testsuite/gcc.target/i386/amx-check.h
> > @@ -4,11 +4,22 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> >  #ifdef DEBUG
> >  #include 
> >  #endif
> >  #include "cpuid.h"
> >
> > +#define XFEATURE_XTILECFG 17
> > +#define XFEATURE_XTILEDATA 18
> > +#define XFEATURE_MASK_XTILECFG (1 << XFEATURE_XTILECFG)
> > +#define XFEATURE_MASK_XTILEDATA (1 << XFEATURE_XTILEDATA)
> > +#define XFEATURE_MASK_XTILE (XFEATURE_MASK_XTILECFG | XFEATURE_MASK_XTILEDATA)
> > +
> > +#define ARCH_GET_XCOMP_PERM 0x1022
> > +#define ARCH_REQ_XCOMP_PERM 0x1023
> > +
> >  /* TODO: The tmm emulation is temporary for current
> > AMX implementation with no tmm regclass, should
> > be changed in the future. */
> > @@ -44,6 +55,18 @@ typedef struct __tile
> >  /* Stride (colum width in byte) used for tileload/store */
> >  #define _STRIDE 64
> >
> > +/* We need syscall to use amx functions */
> > +int request_perm_xtile_data()
> > +{
> > +  unsigned long bitmask;
> > +
> > +  if (syscall (SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA) ||
> > +      syscall (SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &bitmask))
> > +    return 0;
> > +
> > +  return (bitmask & XFEATURE_MASK_XTILE) != 0;
> > +}
> > +
> >  /* Initialize tile config by setting all tmm size to 16x64 */
> >  void init_tile_config (__tilecfg_u *dst)
> >  {
> > @@ -186,6 +209,7 @@ main ()
> >  #ifdef AMX_BF16
> >        && __builtin_cpu_supports ("amx-bf16")
> >  #endif
> > +      && request_perm_xtile_data ()
> >)
> >  {
> >DO_TEST ();
> > --
> > 2.18.2
> >


0001-i386-Add-syscall-to-enable-AMX-for-latest-kernels.patch
Description: 0001-i386-Add-syscall-to-enable-AMX-for-latest-kernels.patch
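
Since the updated patch is only attached and not shown inline, the following is a minimal sketch of what wrapping the permission request in a platform guard could look like. The guard name HAVE_XCOMP_PERM and the "return 1 elsewhere" fallback are assumptions made for illustration, not taken from the attachment; the constants are the ones from the quoted patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   /* for the syscall() declaration in <unistd.h> */
#endif
#include <assert.h>

#if defined(__linux__) && defined(__x86_64__)
#include <sys/syscall.h>
#include <unistd.h>
#define HAVE_XCOMP_PERM 1   /* hypothetical guard name */
#endif

#define XFEATURE_XTILECFG 17
#define XFEATURE_XTILEDATA 18
#define XFEATURE_MASK_XTILECFG (1UL << XFEATURE_XTILECFG)
#define XFEATURE_MASK_XTILEDATA (1UL << XFEATURE_XTILEDATA)
#define XFEATURE_MASK_XTILE (XFEATURE_MASK_XTILECFG | XFEATURE_MASK_XTILEDATA)

#define ARCH_GET_XCOMP_PERM 0x1022
#define ARCH_REQ_XCOMP_PERM 0x1023

/* Return 1 if AMX tile data may be used, 0 if the kernel refused or does
   not know the request.  */
int request_perm_xtile_data (void)
{
#ifdef HAVE_XCOMP_PERM
  unsigned long bitmask = 0;

  /* Ask for permission first, then read back what was granted.  Either
     call failing means AMX must not be used.  */
  if (syscall (SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA)
      || syscall (SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &bitmask))
    return 0;

  return (bitmask & XFEATURE_MASK_XTILE) != 0;
#else
  /* Assumption: no permission request exists on other systems, so keep
     the previous behaviour and let the CPUID checks alone decide.  */
  return 1;
#endif
}
```

On a Linux machine without AMX the request fails and the function returns 0, so the test body is skipped rather than faulting.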


Re: [rs6000 PATCH] PR target/105991: Recognize PLUS and XOR forms of rldimi.

2022-06-20 Thread Kewen.Lin via Gcc-patches
on 2022/6/21 06:10, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Jun 17, 2022 at 07:13:37AM +0200, Roger Sayle wrote:
>> This patch addresses PR target/105991 where a change to prefer representing
>> shifts and adds at the tree-level as multiplications, causes problems for
>> the rldimi patterns in the powerpc backend.
> 
> Because it now is converted to different RTL at expand time.  Which the
> generic expand code does some premature optimisation on, which makes us
> end up with the addition instead of data manipulation insns.  Oh well.
> 
>> The issue is that rs6000.md
>> models this pattern using IOR, and some variants that have the equivalent
>> PLUS or XOR in the RTL fail to match some *rotl4_insert patterns.
>> This is fixed in this patch by adding a define_insn_and_split to locally
>> canonicalize the PLUS and XOR forms to the backend's preferred IOR form.
> 
> Okay.
> 
>> An alternative fix might be for the RTL optimizers to define a canonical
>> form for these plus_xor_ior equivalent expressions, but the logical
>> choice might be plus (which may appear in an addressing mode), and such
>> a change may require a number of tweaks to update various backends
>> (i.e.  a more intrusive change than the one proposed here).
> 
> This does not make sense in an address at all, thankfully :-)
> 
> The only sane canonicalisation for this is something like VEC_DUPLICATE
> but for submodes of integer modes, instead of the component mode of a
> vector mode.  I don't feel this is worth trying to handle in general
> though.
> 
>> Many thanks for Marek Polacek for bootstrapping and regression testing
>> this change without problems.
> 
> You have an account on the cfarm, it is quick and easy to test there :-)
> I recommend gcc135, a 32 core p9, with oodles of disk space :-)
> 
>> +; Canonicalize the PLUS and XOR forms to IOR for rotl3_insert_3
>> +(define_code_iterator plus_xor [plus xor])
>> +
>> +(define_insn_and_split "*rotl3_insert_3_"
>> +  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
>> +(plus_xor:GPR
>> +  (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0")
>> +   (match_operand:GPR 4 "const_int_operand" "n"))
>> +  (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
>> +  (match_operand:SI 2 "const_int_operand" "n"]
>> +  "INTVAL (operands[2]) == exact_log2 (UINTVAL (operands[4]) + 1)"
> 
> exact_log2 returns -1 if its argument is not a power of two.  Please
> test it is > 0 explicitly here: I don't think this splitter will work
> correctly otherwise.  There shouldn't really be a shift by 0 ever of
> course, but it isn't invalid RTL.
> 
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr105991.c
>> @@ -0,0 +1,11 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +unsigned long long
>> +foo (unsigned long long value)
>> +{
>> +  value &= 0x;
>> +  value |= value << 32;
>> +  return value;
>> +}
>> +/* { dg-final { scan-assembler "rldimi" } } */
> 
> Write
> /* { dg-final { scan-assembler {\mrldimi\M} } } */
> please.
> 

This case also needs effective-target keyword lp64,
that is /* { dg-require-effective-target lp64 } */

since with -m32, it gets:
  mr 3,4

with -m32 -mpowerpc64, it gets:
  rldicl 3,4,0,32


BR,
Kewen


Re: [PATCH 2/2]middle-end: Support recognition of three-way max/min.

2022-06-20 Thread Andrew Pinski via Gcc-patches
On Thu, Jun 16, 2022 at 4:11 AM Tamar Christina via Gcc-patches
 wrote:
>
> Hi All,
>
> This patch adds support for three-way min/max recognition in phi-opts.
>
> Concretely for e.g.
>
> #include 
>
> uint8_t three_min (uint8_t xc, uint8_t xm, uint8_t xy) {
> uint8_t  xk;
> if (xc < xm) {
> xk = (uint8_t) (xc < xy ? xc : xy);
> } else {
> xk = (uint8_t) (xm < xy ? xm : xy);
> }
> return xk;
> }
>
> we generate:
>
>[local count: 1073741824]:
>   _5 = MIN_EXPR ;
>   _7 = MIN_EXPR ;
>   return _7;
>
> instead of
>
>   :
>   if (xc_2(D) < xm_3(D))
> goto ;
>   else
> goto ;
>
>   :
>   xk_5 = MIN_EXPR ;
>   goto ;
>
>   :
>   xk_6 = MIN_EXPR ;
>
>   :
>   # xk_1 = PHI 
>   return xk_1;
>
> The same function also immediately deals with turning a minimization problem
> into a maximization one if the results are inverted.  We do this here since
> doing it in match.pd would end up changing the shape of the BBs and adding
> additional instructions which would prevent various optimizations from 
> working.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (minmax_replacement): Optionally search for the 
> phi
> sequence of a three-way conditional.
> (replace_phi_edge_with_variable): Support deferring of BB removal.
> (tree_ssa_phiopt_worker): Detect diamond phi structure for three-way
> min/max.
> (strip_bit_not, invert_minmax_code): New.

I have been working on getting rid of minmax_replacement and a few
others and only having match_simplify_replacement and having the
simplification logic all in match.pd instead.
Is there a reason why you can't expand match_simplify_replacement and match.pd?

>The reason was that a lot of the foldings checked that the BB contains only
> a single SSA and that that SSA is a phi node.

Could you expand on that?

Thanks,
Andrew
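
To make the transformation under discussion concrete, here is a small source-level sketch. The names min_u8 and three_min_flat are illustrative only, and the sketch checks nothing more than that the branchy form and the two chained minimums compute the same value; it says nothing about whether phiopt or match.pd is the better place to implement it.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy form, as in the quoted example.  */
uint8_t three_min (uint8_t xc, uint8_t xm, uint8_t xy)
{
  uint8_t xk;
  if (xc < xm)
    xk = (uint8_t) (xc < xy ? xc : xy);
  else
    xk = (uint8_t) (xm < xy ? xm : xy);
  return xk;
}

/* Two-operand minimum used by the straight-line form.  */
static inline uint8_t min_u8 (uint8_t a, uint8_t b)
{
  return a < b ? a : b;
}

/* Straight-line form: two chained minimums and no control flow.  */
uint8_t three_min_flat (uint8_t xc, uint8_t xm, uint8_t xy)
{
  return min_u8 (min_u8 (xc, xm), xy);
}
```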

>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/split-path-1.c: Disable phi-opts so we don't 
> optimize
> code away.
> * gcc.dg/tree-ssa/minmax-3.c: New test.
> * gcc.dg/tree-ssa/minmax-4.c: New test.
> * gcc.dg/tree-ssa/minmax-5.c: New test.
> * gcc.dg/tree-ssa/minmax-6.c: New test.
> * gcc.dg/tree-ssa/minmax-7.c: New test.
> * gcc.dg/tree-ssa/minmax-8.c: New test.
>
> --- inline copy of patch --
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c
> new file mode 100644
> index 
> ..de3b2e946e81701e3b75f580e6a843695a05786e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-phiopt" } */
> +
> +#include 
> +
> +uint8_t three_min (uint8_t xc, uint8_t xm, uint8_t xy) {
> +   uint8_t  xk;
> +if (xc < xm) {
> +xk = (uint8_t) (xc < xy ? xc : xy);
> +} else {
> +xk = (uint8_t) (xm < xy ? xm : xy);
> +}
> +return xk;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 3 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "phiopt1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c
> new file mode 100644
> index 
> ..0b6d667be868c2405eaefd17cb522da44bafa0e2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-phiopt" } */
> +
> +#include 
> +
> +uint8_t three_max (uint8_t xc, uint8_t xm, uint8_t xy) {
> +uint8_t xk;
> +if (xc > xm) {
> +xk = (uint8_t) (xc > xy ? xc : xy);
> +} else {
> +xk = (uint8_t) (xm > xy ? xm : xy);
> +}
> +return xk;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 0 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 3 "phiopt1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c
> new file mode 100644
> index 
> ..650601a3cc75d09a9e6e54a35f5b9993074f8510
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-phiopt" } */
> +
> +#include 
> +
> +uint8_t three_minmax1 (uint8_t xc, uint8_t xm, uint8_t xy) {
> +   uint8_t  xk;
> +if (xc > xm) {
> +xk = (uint8_t) (xc < xy ? xc : xy);
> +} else {
> +xk = (uint8_t) (xm < xy ? xm : xy);
> +}
> +return xk;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "phiopt1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-6.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-6.c
> new file mode 100644
> index 
> 

Re: [PATCH v5, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-06-20 Thread Segher Boessenkool
Hi!

On Mon, Jun 20, 2022 at 11:12:50AM +0800, HAO CHEN GUI wrote:
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def

You don't have this in the changelog.  Please fix.

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md

This, too.  And match.pd isn't in the patch.

> +(define_insn "f3"
> +  [(set (match_operand:SFDF 0 "vsx_register_operand" "=wa")
> + (unspec:SFDF [(match_operand:SFDF 1 "vsx_register_operand" "wa")
> +   (match_operand:SFDF 2 "vsx_register_operand" "wa")]
> +  FMINMAX))]
> +  "TARGET_VSX && !flag_finite_math_only"

&& !flag_trapping_math

and/or whatever else is needed as well here.

> +  "xsdp %x0,%x1,%x2"
> +  [(set_attr "type" "fp")]
> +)

Are things like
  fmin(4.0, 2.0);
(still) optimised correctly?
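
As background for why the NaN-related flags matter for this pattern: C's fmin returns the non-NaN operand when exactly one argument is a NaN, which a plain compare-and-select does not. A minimal sketch follows, with fmin_like and cond_min as hypothetical helper names; it models the C library semantics only, not the exact behaviour of the xs[min/max]dp instructions.

```c
#include <assert.h>
#include <math.h>   /* NAN and isnan, used in the usage note below */

/* Hypothetical stand-in for C's fmin: if exactly one operand is a NaN,
   return the other one.  */
double fmin_like (double a, double b)
{
  if (a != a)   /* a is a NaN */
    return b;
  if (b != b)   /* b is a NaN */
    return a;
  return a < b ? a : b;
}

/* Plain conditional minimum: every comparison with a NaN is false, so the
   result depends on which operand is the NaN.  */
double cond_min (double a, double b)
{
  return a < b ? a : b;
}
```

With ordinary operands both agree, e.g. on (4.0, 2.0). They differ for (2.0, NAN): fmin_like returns 2.0, while cond_min returns the NaN.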

> new file mode 100644
> index 000..e43ac40c2d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103605.c
> +/* { dg-options "-O1 -mvsx" } */

Please use -O2 instead.  That way, it will catch it if any of the
optimisations that are normally done (and not with just -O1) sabotage
us here.

Thanks,


Segher


Re: [PATCH] c: Extend the -Wpadded message with actual padding size

2022-06-20 Thread Andrew Pinski via Gcc-patches
On Mon, Jun 20, 2022 at 6:50 AM Vit Kabele  wrote:
>
> I fixed the formatting and added the test.
>
> The test has first element 32bit so that it should work on both 32 and
> 64bit architectures, even without the aligned attribute.
>
> If there is some better way how to write the test properly formatted
> (i.e. not on a single line), please let me know.
>
> -- >8 --
> Subject: [PATCH] c: Extend the -Wpadded message with actual padding size
>
> When the compiler warns about padding struct to alignment boundary, it
> now also informs the user about the size of the alignment that needs to
> be added to get rid of the warning.
>
> This removes the need of using pahole or similar tools, or manually
> determining the padding size.
>
> Tested on x86_64-pc-linux-gnu.
>
> gcc/ChangeLog:
>
> * stor-layout.cc (finalize_record_size): Extend warning message.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/Wpadded.c: New test.
>
> Signed-off-by: Vit Kabele 
> ---
>  gcc/stor-layout.cc   |  7 ++-
>  gcc/testsuite/c-c++-common/Wpadded.c | 10 ++
>  2 files changed, 16 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/c-c++-common/Wpadded.c
>
> diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc
> index 765f22f68b9..88923c4136b 100644
> --- a/gcc/stor-layout.cc
> +++ b/gcc/stor-layout.cc
> @@ -1781,7 +1781,12 @@ finalize_record_size (record_layout_info rli)
>&& simple_cst_equal (unpadded_size, TYPE_SIZE (rli->t)) == 0
>&& input_location != BUILTINS_LOCATION
>&& !TYPE_ARTIFICIAL (rli->t))
> -warning (OPT_Wpadded, "padding struct size to alignment boundary");
> +  {
> +   tree pad_size
> + = size_binop (MINUS_EXPR, TYPE_SIZE_UNIT (rli->t), 
> unpadded_size_unit);
> + warning (OPT_Wpadded,
> +   "padding struct size to alignment boundary with %E bytes", 
> pad_size);
> +  }
>
>if (warn_packed && TREE_CODE (rli->t) == RECORD_TYPE
>&& TYPE_PACKED (rli->t) && ! rli->packed_maybe_necessary
> diff --git a/gcc/testsuite/c-c++-common/Wpadded.c 
> b/gcc/testsuite/c-c++-common/Wpadded.c
> new file mode 100644
> index 000..e8f1044a36b
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/Wpadded.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Wpadded" } */
> +
> +/*
> + * The struct is on single line, because C++ compiler emits the -Wpadded
> + * warning at the first line of the struct, while the C compiler at the last
> + * line of the struct definition. This way the test passes on both
> + */
> +struct S { __UINT32_TYPE__ i; char c; }; /* { dg-warning "padding struct 
> size to alignment boundary with 3 bytes" } */
> +
Note the testcase will fail on some targets where alignment is 1 for everything.
You most likely want the dg-warning to be like it is in gcc.dg/Wpadded.c:
/* { dg-warning "padding struct size to alignment boundary with 3
bytes" ""  { target { ! default_packed } } } */

You might want the following from the same file too:
/* -fpack-struct is necessary because the warning expected requires the initial
   packing to be larger than 1, which cannot be guaranteed for all targets.
   We won't get a warning anyway if the target has "packed" structure
   layout.  */
/* { dg-options "-Wpadded -fpack-struct=8" } */
/* { dg-additional-options "-mno-ms-bitfields" { target *-*-mingw* } } */


Thanks,
Andrew Pinski
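
For reference, the number the extended warning reports can be reproduced by hand. A minimal sketch, using uint32_t in place of the test's __UINT32_TYPE__ and a hypothetical helper name tail_padding_S:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct S { uint32_t i; char c; };

/* Bytes added after the last member to round the struct size up to a
   multiple of its alignment.  */
size_t tail_padding_S (void)
{
  size_t unpadded = offsetof (struct S, c) + sizeof (char);
  return sizeof (struct S) - unpadded;
}
```

On targets where uint32_t is 4-byte aligned this yields 3, matching the "with 3 bytes" text in the test. On a default_packed target it yields 0 and no warning is expected, which is the case the target selector above guards against.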

> --
> 2.30.2


[PATCH] tsystem.h: Add missing stdint.h include.

2022-06-20 Thread Kacper Słomiński via Gcc-patches
Users of tsystem.h expect stdint.h to be included (for example,
gcc/gcov-io.h included by libgcc/libgcov.h), but it is not
included, and it is not provided by any other header included here
according to POSIX.

gcc/ChangeLog:

* tsystem.h: Add missing stdint.h include.
---
Proposed fix for bug report 106036. Found when compiling gcc with
mlibc (https://github.com/managarm/mlibc) as the C library. Apologies
for any mistakes with the patch.

 gcc/tsystem.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tsystem.h b/gcc/tsystem.h
index dfaf9e86a..5a8551cc2 100644
--- a/gcc/tsystem.h
+++ b/gcc/tsystem.h
@@ -41,6 +41,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 #define _GNU_SOURCE 1
 
 /* GCC supplies these headers.  */
+#include <stdint.h>
 #include 
 #include 
 
-- 
2.36.1



Re: [rs6000 PATCH] PR target/105991: Recognize PLUS and XOR forms of rldimi.

2022-06-20 Thread Segher Boessenkool
Hi!

On Fri, Jun 17, 2022 at 07:13:37AM +0200, Roger Sayle wrote:
> This patch addresses PR target/105991 where a change to prefer representing
> shifts and adds at the tree-level as multiplications, causes problems for
> the rldimi patterns in the powerpc backend.

Because it now is converted to different RTL at expand time.  Which the
generic expand code does some premature optimisation on, which makes us
end up with the addition instead of data manipulation insns.  Oh well.

> The issue is that rs6000.md
> models this pattern using IOR, and some variants that have the equivalent
> PLUS or XOR in the RTL fail to match some *rotl4_insert patterns.
> This is fixed in this patch by adding a define_insn_and_split to locally
> canonicalize the PLUS and XOR forms to the backend's preferred IOR form.

Okay.

> An alternative fix might be for the RTL optimizers to define a canonical
> form for these plus_xor_ior equivalent expressions, but the logical
> choice might be plus (which may appear in an addressing mode), and such
> a change may require a number of tweaks to update various backends
> (i.e.  a more intrusive change than the one proposed here).

This does not make sense in an address at all, thankfully :-)

The only sane canonicalisation for this is something like VEC_DUPLICATE
but for submodes of integer modes, instead of the component mode of a
vector mode.  I don't feel this is worth trying to handle in general
though.

> Many thanks for Marek Polacek for bootstrapping and regression testing
> this change without problems.

You have an account on the cfarm, it is quick and easy to test there :-)
I recommend gcc135, a 32 core p9, with oodles of disk space :-)

> +; Canonicalize the PLUS and XOR forms to IOR for rotl3_insert_3
> +(define_code_iterator plus_xor [plus xor])
> +
> +(define_insn_and_split "*rotl3_insert_3_"
> +  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> + (plus_xor:GPR
> +   (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0")
> +(match_operand:GPR 4 "const_int_operand" "n"))
> +   (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
> +   (match_operand:SI 2 "const_int_operand" "n"]
> +  "INTVAL (operands[2]) == exact_log2 (UINTVAL (operands[4]) + 1)"

exact_log2 returns -1 if its argument is not a power of two.  Please
test it is > 0 explicitly here: I don't think this splitter will work
correctly otherwise.  There shouldn't really be a shift by 0 ever of
course, but it isn't invalid RTL.
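
A minimal sketch of the condition with the explicit check added. exact_log2_u64 is a stand-in written for this example, not GCC's own exact_log2, and insert_cond_ok and forms_agree are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for exact_log2: log2 of X if X is a power of two, else -1.  */
int exact_log2_u64 (uint64_t x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while ((x >>= 1) != 0)
    n++;
  return n;
}

/* The insn condition with the explicit "> 0" test: the mask must be a
   non-empty run of low bits and the shift must start right above it.  */
int insert_cond_ok (int shift, uint64_t mask)
{
  int l = exact_log2_u64 (mask + 1);
  return l > 0 && shift == l;
}

/* When the condition holds, the masked low part and the shifted high part
   share no bits, so PLUS, XOR and IOR all give the same result.  */
int forms_agree (uint64_t lo, uint64_t hi, int shift, uint64_t mask)
{
  uint64_t a = lo & mask, b = hi << shift;
  return (a + b) == (a | b) && (a ^ b) == (a | b);
}
```

Without the "> 0" test, a mask such as 5 (so exact_log2 gives -1) would be accepted together with a shift operand of -1, and a zero mask would be accepted with a shift of 0.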

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr105991.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +unsigned long long
> +foo (unsigned long long value)
> +{
> +  value &= 0x;
> +  value |= value << 32;
> +  return value;
> +}
> +/* { dg-final { scan-assembler "rldimi" } } */

Write
/* { dg-final { scan-assembler {\mrldimi\M} } } */
please.


Okay for trunk with those changes.  Thanks!


Segher


[PATCH v2] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-06-20 Thread Noah Goldstein via Gcc-patches
This patch allows strchr(x, c) to be replaced with memchr(x, c,
strlen(x) + 1) if strlen(x) has already been computed earlier in the
tree.

Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821

Since memchr doesn't need to re-find the null terminator it is faster
than strchr.

bootstrapped and tested on x86_64-linux.
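
A minimal source-level sketch of the equivalence the transformation relies on. strchr_via_memchr is a hypothetical helper for illustration; the real replacement happens on GIMPLE in tree-ssa-strlen.cc, not in user code.

```c
#include <assert.h>
#include <string.h>

/* strchr expressed through an already-known length.  The "+ 1" keeps the
   terminating NUL inside the searched range, so looking for '\0' still
   returns a pointer to the terminator, as strchr does.  */
char *strchr_via_memchr (char *s, int c, size_t slen)
{
  return (char *) memchr (s, c, slen + 1);
}
```

This only holds while slen is still the current length of s, which is why the tests with an intervening store through a possibly aliasing pointer expect no memchr.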

PR tree-optimization/95821

gcc/

* tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
memchr instead of strchr if strlen already computed.

gcc/testsuite/

* c-c++-common/pr95821-1.c: New test.
* c-c++-common/pr95821-2.c: New test.
* c-c++-common/pr95821-3.c: New test.
* c-c++-common/pr95821-4.c: New test.
* c-c++-common/pr95821-5.c: New test.
* c-c++-common/pr95821-6.c: New test.
* c-c++-common/pr95821-7.c: New test.
* c-c++-common/pr95821-8.c: New test.
---
 gcc/testsuite/c-c++-common/pr95821-1.c | 15 +
 gcc/testsuite/c-c++-common/pr95821-2.c | 17 +
 gcc/testsuite/c-c++-common/pr95821-3.c | 17 +
 gcc/testsuite/c-c++-common/pr95821-4.c | 16 +
 gcc/testsuite/c-c++-common/pr95821-5.c | 19 ++
 gcc/testsuite/c-c++-common/pr95821-6.c | 18 ++
 gcc/testsuite/c-c++-common/pr95821-7.c | 18 ++
 gcc/testsuite/c-c++-common/pr95821-8.c | 19 ++
 gcc/tree-ssa-strlen.cc | 89 --
 9 files changed, 209 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-1.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-2.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-3.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-4.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-5.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-6.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-7.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-8.c

diff --git a/gcc/testsuite/c-c++-common/pr95821-1.c 
b/gcc/testsuite/c-c++-common/pr95821-1.c
new file mode 100644
index 000..e0beb609ea2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+
+char *
+foo (char *s, char c)
+{
+   size_t slen = __builtin_strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   return __builtin_strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-2.c 
b/gcc/testsuite/c-c++-common/pr95821-2.c
new file mode 100644
index 000..5429f0586be
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "memchr" } } */
+
+#include 
+
+char *
+foo (char *s, char c, char * other)
+{
+   size_t slen = __builtin_strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return __builtin_strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-3.c 
b/gcc/testsuite/c-c++-common/pr95821-3.c
new file mode 100644
index 000..bc929c6044b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+
+char *
+foo (char * __restrict s, char c, char * __restrict other)
+{
+   size_t slen = __builtin_strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return __builtin_strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-4.c 
b/gcc/testsuite/c-c++-common/pr95821-4.c
new file mode 100644
index 000..684b41d5b70
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+#include 
+
+char *
+foo (char *s, char c)
+{
+   size_t slen = strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   return strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-5.c 
b/gcc/testsuite/c-c++-common/pr95821-5.c
new file mode 100644
index 000..00c1d93b614
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "memchr" } } */
+
+#include 
+#include 
+
+char *
+foo (char *s, char c, char * other)
+{
+   size_t slen = strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return strchr(s, c);
+}
+int main() {}
diff --git a/gcc/testsuite/c-c++-common/pr95821-6.c 
b/gcc/testsuite/c-c++-common/pr95821-6.c
new file mode 100644
index 000..dec839de5ea
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-6.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+#include 
+
+char *
+foo (char * __restrict 

Re: [PATCH v1] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-06-20 Thread Noah Goldstein via Gcc-patches
On Mon, Jun 20, 2022 at 10:29 AM Jakub Jelinek  wrote:
>
> On Mon, Jun 20, 2022 at 09:35:36AM -0700, Noah Goldstein via Gcc-patches 
> wrote:
> > This patch allows for strchr(x, c) to be replaced with memchr(x, c,
> > strlen(x) + 1) if strlen(x) has already been computed earlier in the
> > tree.
> >
> > Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821
> >
> > Since memchr doesn't need to re-find the null terminator it is faster
> > than strchr.
>
> Do you have a GCC Copyright assignment on file, or do you want to submit
> this under DCO ( https://gcc.gnu.org/dco.html )?  If the latter, there
> should be a Signed-off-by: line, both in the mail and later commit.
> >
> > bootstrapped and tested on x86_64-linux.
> >
> > gcc/
> >
>
> As it fixes a GCC bugzilla bug, the ChangeLog entry should start with
> PR tree-optimization/95821
> line.
> > * tree-ssa-strlen.cc: Emit memchr instead of strchr if strlen
> >  already computed.
>
> All the indented lines in ChangeLog should be indented by tab.
> You are modifying strlen_pass::handle_builtin_strchr function, so after
> tree-ssa-strlen.cc there should be that function name in parens:
> * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
> memchr ...

Fixed in v2.
>
> >
> > gcc/testsuite/
> >
> > * c-c++-common/pr95821-1.c
> > * c-c++-common/pr95821-2.c
> > * c-c++-common/pr95821-3.c
> > * c-c++-common/pr95821-4.c
> > * c-c++-common/pr95821-5.c
> > * c-c++-common/pr95821-6.c
>
> All the above lines should end with ": New test." after .c

Fixed in V2.
>
> > --- a/gcc/tree-ssa-strlen.cc
> > +++ b/gcc/tree-ssa-strlen.cc
>
> How does the patch relate to the one that H.J. attached in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821#c4 ?
>
> > @@ -2405,9 +2405,12 @@ strlen_pass::handle_builtin_strlen ()
> >  }
> >  }
> >
> > -/* Handle a strchr call.  If strlen of the first argument is known, replace
> > -   the strchr (x, 0) call with the endptr or x + strlen, otherwise remember
> > -   that lhs of the call is endptr and strlen of the argument is endptr - x.  */
> > +/* Handle a strchr call.  If strlen of the first argument is known,
> > +   replace the strchr (x, 0) call with the endptr or x + strlen,
> > +   otherwise remember that lhs of the call is endptr and strlen of the
> > +   argument is endptr - x.  If strlen of x is not known but has been
> > +   computed earlier in the tree then replace strchr(x, c) with
> > +   memchr(x, c, strlen + 1).  */
>
> Space before ( even in comments.

Fixed in V2.
>
>
>
> >  void
> >  strlen_pass::handle_builtin_strchr ()
> > @@ -2418,8 +2421,8 @@ strlen_pass::handle_builtin_strchr ()
> >if (lhs == NULL_TREE)
> >  return;
> >
> > -  if (!integer_zerop (gimple_call_arg (stmt, 1)))
> > -return;
> > +  tree chr = gimple_call_arg (stmt, 1);
> > +  bool is_strchr_zerop = integer_zerop (chr);
> >
> >tree src = gimple_call_arg (stmt, 0);
> >
> > @@ -2452,32 +2455,56 @@ strlen_pass::handle_builtin_strchr ()
> > fprintf (dump_file, "Optimizing: ");
> > print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> >   }
> > -   if (si != NULL && si->endptr != NULL_TREE)
> > +   if (!is_strchr_zerop)
> >   {
> > -   rhs = unshare_expr (si->endptr);
> > -   if (!useless_type_conversion_p (TREE_TYPE (lhs),
> > -   TREE_TYPE (rhs)))
> > - rhs = fold_convert_loc (loc, TREE_TYPE (lhs), rhs);
> > +   /* If its not strchr(s, zerop) then try and convert to
> > +memchr if strlen has already been computed.  */
>
> Again, space before (.  The second line is weirdly formatted, should
> be indented below If.

Fixed in V2.
>
> > +   tree fn = builtin_decl_explicit (BUILT_IN_MEMCHR);
> > +   tree one = build_int_cst (TREE_TYPE (rhs), 1);
> > +   rhs = fold_build2_loc (loc, PLUS_EXPR, TREE_TYPE (rhs),
> > +  unshare_expr (rhs), one);
> > +   tree size = make_ssa_name (TREE_TYPE (rhs));
> > +   gassign *size_stmt = gimple_build_assign (size, rhs);
> > +   gsi_insert_before (&m_gsi, size_stmt, GSI_SAME_STMT);
> > +   rhs = size;
> > +   if (!update_gimple_call (&m_gsi, fn, 3, src, chr, rhs))
> > + return;
>
> I think we should differentiate more.  If integer_nonzerop (chr)
> or perhaps better tree_expr_nonzero_p (chr), then it is better
> to optimize t = strlen (x); ... p = strchr (x, c); to
> t = strlen (x); ... p = memchr (x, c, t);
> the t + 1 is only needed if c might be zero.

Done in V2. Also added the optimization for when the low 8 bits of chr are zero.

Right now:

t=strlen (s);

strchr (s, 0) -> t;
strchr (s, 256) -> t;

strchr (s, 1234) -> memchr (s, 1234, t);
strchr (s, non_zero) -> memchr (s, non_zero, t);
strchr (s, unknown) -> memchr (s, unknown, t + 1);
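
The rewrite table above can be modelled at run time in plain C.  This is
only a sketch of the intended semantics, not the code in
tree-ssa-strlen.cc: the helper name strchr_with_known_len and the
chr_kind enum are made up for illustration, and "-> t" is read here as
the pointer s + t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What is known about the low 8 bits of C (hypothetical names).  */
enum chr_kind { CHR_ZERO, CHR_NONZERO, CHR_UNKNOWN };

/* Model of strchr (s, c) when t == strlen (s) is already available.  */
char *
strchr_with_known_len (const char *s, int c, size_t t, enum chr_kind kind)
{
  assert (s != NULL && t == strlen (s));
  switch (kind)
    {
    case CHR_ZERO:
      /* strchr (s, 0) and strchr (s, 256) both yield the terminator.  */
      return (char *) s + t;
    case CHR_NONZERO:
      /* The terminator can never match, so t bytes are enough.  */
      return (char *) memchr (s, c, t);
    default:
      /* C may be zero at run time, so the terminator must be searched.  */
      return (char *) memchr (s, c, t + 1);
    }
}
```

The CHR_UNKNOWN branch is always safe; the other two only skip work
that the known value of c makes unnecessary.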


>
> > +   /* Don't update strlen of lhs if 

Re: [PATCH RFA] ubsan: default to trap on unreachable at -O0 and -Og [PR104642]

2022-06-20 Thread Jason Merrill via Gcc-patches

On 6/16/22 09:14, Jakub Jelinek wrote:

On Wed, Jun 15, 2022 at 04:38:49PM -0400, Jason Merrill wrote:

Furthermore, handling it the UBSan way means we slow down the compiler
(enable a bunch of extra passes, like sanopt, ubsan), which is undesirable
e.g. for -O0 compilation speed.


The ubsan pass is not enabled for unreachable|return.  sanopt does a single


You're right.


pass over the function to rewrite __builtin_unreachable, but that doesn't
seem like much overhead.


But I think we are trying hard to avoid any kind of unnecessary extra
whole-IL walks, especially for -O0.


OK.


So, I think -funreachable-traps should be a separate flag and not an alias,
enabled by default for -O0 and -Og, which would be handled elsewhere
(I'd say e.g. in fold_builtin_0 and perhaps gimple_fold_builtin too to
avoid allocating trees unnecessarily)


I tried this approach, but it misses some __builtin_unreachable calls added
by e.g. execute_fixup_cfg; it seems they never get folded by any subsequent
pass.


We could also expand BUILT_IN_UNREACHABLE as BUILT_IN_TRAP during expansion
to catch whatever isn't caught by folding.


That was an early thing I tried, but that's too late to prevent it from 
being used for optimization.  More recently I've put an assert in 
expand_builtin_unreachable to catch ones that slip past.



and would be done if
flag_unreachable_traps && !sanitize_flag_p (SANITIZE_UNREACHABLE),
just replacing that __builtin_unreachable call with __builtin_trap.
For the function ends in fact under those conditions we could emit
__builtin_trap right away instead of emitting __builtin_unreachable
and waiting on folding it later to __builtin_trap.


Sure, but I generally prefer to change fewer places.


I'd say this would be very small change and the fastest + most reliable.
Simply replace all builtin_decl_implicit (BUILT_IN_UNREACHABLE) calls
with builtin_decl_unreachable () (12 of them) and define
tree
builtin_decl_unreachable ()
{
   enum built_in_function fncode = BUILT_IN_UNREACHABLE;

   if (sanitize_flag_p (SANITIZE_UNREACHABLE))
 {
   if (flag_sanitize_undefined_trap_on_error)
fncode = BUILT_IN_TRAP;
   /* Otherwise we want __builtin_unreachable () later folded into
 __ubsan_handle_builtin_unreachable with extra args.  */
 }
   else if (flag_unreachable_traps)
 fncode = BUILT_IN_TRAP;
   return builtin_decl_implicit (fncode);
}
and that's it (well, also in build_common_builtin_nodes
declare __builtin_trap for FEs that don't do that - like it is done
for __builtin_unreachable).
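
As a rough standalone model of the decision above (not GCC code), the
choice between the two builtins can be written as a pure function of
three booleans.  The names choose_unreachable_fn and fn_choice are
hypothetical; the parameters stand in for
sanitize_flag_p (SANITIZE_UNREACHABLE),
flag_sanitize_undefined_trap_on_error and flag_unreachable_traps.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for BUILT_IN_UNREACHABLE and BUILT_IN_TRAP.  */
enum fn_choice { FN_UNREACHABLE, FN_TRAP };

/* Pick the builtin to emit for a point the compiler believes is
   unreachable.  */
enum fn_choice
choose_unreachable_fn (bool sanitize_unreachable, bool sanitize_trap,
		       bool unreachable_traps)
{
  if (sanitize_unreachable)
    /* The sanitizer owns the decision: trap only in trap mode,
       otherwise keep the call so it can be rewritten to the ubsan
       handler later.  */
    return sanitize_trap ? FN_TRAP : FN_UNREACHABLE;
  /* No sanitizer: trap only when the separate flag asks for it.  */
  return unreachable_traps ? FN_TRAP : FN_UNREACHABLE;
}
```

Note that in this model the separate flag has no effect once the
sanitizer is enabled, which matches the else-if in the function quoted
above.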


OK, here's another version of the patch using that approach.

From 280713174b2bbda97ebd88cefa90d52df73813f5 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Fri, 10 Jun 2022 16:35:21 -0400
Subject: [PATCH] ubsan: default to trap on unreachable at -O0 and -Og
 [PR104642]
To: gcc-patches@gcc.gnu.org

When not optimizing, we can't do anything useful with unreachability in
terms of code performance, so we might as well improve debugging by turning
__builtin_unreachable into a trap.  In the PR richi suggested introducing an
-funreachable-traps flag for this, but this functionality is already
implemented as -fsanitize=unreachable -fsanitize-trap=unreachable; we
just need to set those flags by default.

I think it also makes sense to do this when we're explicitly optimizing for
the debugging experience.

I then needed to make options-save handle -fsanitize; since it has custom
parsing, that meant handling it explicitly in the awk scripts.

Jakub observed that this would slow down -O0 by default from running the
sanopt pass, so this revision avoids the need for sanopt by rewriting calls
introduced by the compiler to __builtin_trap immediately, and calls written
by the user at fold time.  Many of the calls introduced by the compiler are
also rewritten immediately to ubsan calls, which fixes ubsan-8b.C;
previously the call to f() was optimized away before sanopt.  But this early
rewriting isn't practical for uses of __builtin_unreachable in
devirtualization and such, so sanopt rewriting is still done for
non-trapping sanitize.

Do we still want -funreachable-traps as an alias for
-fsanitize=unreachable,return -fsanitize-trap=unreachable,return?

	PR c++/104642

gcc/ChangeLog:

	* doc/invoke.texi (-fsanitize=unreachable): On by default at -O0.
	* opts.cc (finish_options): At -O0, trap on unreachable code.
	* optc-save-gen.awk, opth-gen.awk: Include flag_sanitize.
	* tree.cc (build_common_builtin_nodes): Add __builtin_trap.
	* sanopt.cc: Don't run for just SANITIZE_RETURN
	or SANITIZE_UNREACHABLE when trapping.
	* ubsan.cc (builtin_decl_unreachable): New.
	(unreachable_1): Factor out.
	(build_builtin_unreachable): Use it.
	(gimple_build_builtin_unreachable): Use it.
	(ubsan_instrument_unreachable): Use it.
	* builtins.cc (expand_builtin_unreachable): Add assert.
	(fold_builtin_0): Call build_builtin_unreachable.
	* tree.h (builtin_decl_unreachable)
	(gimple_build_builtin_unreachable)
	

Re: [PATCH RFA] ubsan: do return check with -fsanitize=unreachable

2022-06-20 Thread Jason Merrill via Gcc-patches

On 6/20/22 07:05, Jakub Jelinek wrote:

On Fri, Jun 17, 2022 at 05:20:02PM -0400, Jason Merrill wrote:

Related to PR104642, the current situation where we get less return checking
with just -fsanitize=unreachable than no sanitize flags seems undesirable; I
propose that we do return checking when -fsanitize=unreachable.


__builtin_unreachable itself (unless turned into trap or
__ubsan_handle_builtin_unreachable) is not any kind of return checking, it
is just an optimization.


Yes, but I'm talking about "when -fsanitize=unreachable".


Looks like clang just traps on missing return if not -fsanitize=return, but
the approach in this patch seems more helpful to me if we're already
sanitizing other should-be-unreachable code.

I'm assuming that the difference in treatment of SANITIZE_UNREACHABLE and
SANITIZE_RETURN with regard to loop optimization is deliberate.


return and unreachable are separate sanitizers and such silent one way
implication can have quite unexpected consequences, especially with
-fsanitize-trap=.
Say with -fsanitize=unreachable -fsanitize-trap=unreachable, both current
trunk and clang will link without -lubsan, because the only enabled UBSan
sanitizers use __builtin_trap () which doesn't need library.
With -fsanitize=unreachable silently meaning -fsanitize=unreachable,return
the above would link in -lubsan, because while SANITIZE_UNREACHABLE uses
__builtin_trap, SANITIZE_RETURN doesn't.
Similarly, one has no_sanitize attribute, one could in certain function
__attribute__((no_sanitize ("unreachable"))) and because on the command
line using -fsanitize=unreachable assume other sanitizers aren't enabled,
but the silent addition of return sanitizer would break that.


Ah, true.  How about this approach instead?
From 439c645fa5197ccbfcb6fbbeda5772a8581c3a7e Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Wed, 15 Jun 2022 15:45:48 -0400
Subject: [PATCH] ubsan: do return check with -fsanitize=unreachable
To: gcc-patches@gcc.gnu.org

The current situation where we get less return checking with just
-fsanitize=unreachable than no sanitize flags seems undesirable; I propose
that we do return checking when -fsanitize=unreachable.

Looks like clang just traps on missing return if not -fsanitize=return, but
the approach in this patch seems more helpful to me if we're already
sanitizing other believed-unreachable code.

I took this approach rather than setting SANITIZE_RETURN in opts.cc so that
attribute no_sanitize ("unreachable") will turn this behavior off as well.

gcc/ChangeLog:

	* doc/invoke.texi: Note that -fsanitize=unreachable includes
	-fsanitize=return.

gcc/c-family/ChangeLog:

	* c-ubsan.cc (ubsan_instrument_return): Also look for trap
	on unreachable.

gcc/cp/ChangeLog:

	* cp-gimplify.cc (cp_maybe_instrument_return): Also instrument
	return if SANITIZE_UNREACHABLE.
---
 gcc/doc/invoke.texi |  2 ++
 gcc/c-family/c-ubsan.cc |  5 -
 gcc/cp/cp-gimplify.cc   | 14 +-
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 16a893ec1da..b6786df73b9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15950,6 +15950,8 @@ built with this option turned on will issue an error message
 when the end of a non-void function is reached without actually
 returning a value.  This option works in C++ only.
 
+This check is also enabled by -fsanitize=unreachable.
+
 @item -fsanitize=signed-integer-overflow
 @opindex fsanitize=signed-integer-overflow
 This option enables signed integer overflow checking.  We check that
diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 360ba82250c..ac6da558473 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -339,7 +339,10 @@ ubsan_instrument_vla (location_t loc, tree size)
 tree
 ubsan_instrument_return (location_t loc)
 {
-  if (flag_sanitize_trap & SANITIZE_RETURN)
+  /* C++ calls this for SANITIZE_RETURN|SANITIZE_UNREACHABLE.  */
+  unsigned mask = flag_sanitize & SANITIZE_RETURN;
+  if (!mask) mask = flag_sanitize & SANITIZE_UNREACHABLE;
+  if (flag_sanitize_trap & mask)
 /* pass_warn_function_return checks for BUILTINS_LOCATION.  */
 return build_call_expr_loc (BUILTINS_LOCATION,
 builtin_decl_explicit (BUILT_IN_TRAP), 0);
diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 7b0465729a3..7214e4be175 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1806,18 +1806,6 @@ cp_maybe_instrument_return (tree fndecl)
   || !targetm.warn_func_return (fndecl))
 return;
 
-  if (!sanitize_flags_p (SANITIZE_RETURN, fndecl)
-  /* Don't add __builtin_unreachable () if not optimizing, it will not
-	 improve any optimizations in that case, just break UB code.
-	 Don't add it if -fsanitize=unreachable -fno-sanitize=return either,
-	 UBSan covers this with ubsan_instrument_return above where sufficient
-	 information is provided, while the __builtin_unreachable () below
-	 if return sanitization is disabled 

Re: [PATCH] c++, v2: Add support for __real__/__imag__ modifications in constant expressions [PR88174]

2022-06-20 Thread Jason Merrill via Gcc-patches

On 6/17/22 13:06, Jakub Jelinek wrote:

On Fri, Jun 10, 2022 at 09:57:06PM +0200, Jakub Jelinek via Gcc-patches wrote:

On Fri, Jun 10, 2022 at 01:27:28PM -0400, Jason Merrill wrote:

Doesn't this assert mean that complex_expr will always be == valp?


No, even when handling the pushed *PART_EXPR, it will set
valp = &TREE_OPERAND (*valp, index != integer_zero_node);
So, valp will be either &TREE_OPERAND (*complex_expr, 0)
or &TREE_OPERAND (*complex_expr, 1).
As *valp = init; is what is usually then stored and we want to store there
the scalar.


I don't understand this block; shouldn't valp point to the real or imag part
of the complex number at this point?  How could complex_part be set without
us handling the complex case in the loop already?


Because for most references, the code will do:
   vec_safe_push (ctors, *valp);
   vec_safe_push (indexes, index);
I chose not to do this for *PART_EXPR, because the COMPLEX_EXPR isn't a
CONSTRUCTOR and code later on e.g. walks all the ctors and accesses
CONSTRUCTOR_NO_CLEARING on them etc.  As the *PART_EXPR is asserted to
be outermost only, complex_expr is a variant of that ctors push and
complex_part of the indexes.
The reason for the above if is just in case the evaluation of the rhs
of the store would store to the complex and could e.g. make it a COMPLEX_CST
again.


I might have added the COMPLEX_EXPR to ctors instead of a separate variable,
but this is fine too.


See above.
The COMPLEX_EXPR needs special handling (conversion into COMPLEX_CST if it
is constant) anyway.


Here is a variant patch which pushes even the *PART_EXPR related entries
into ctors and indexes vectors, so it doesn't need to use extra variables
for the complex stuff.


Thanks.


2022-06-17  Jakub Jelinek  

PR c++/88174
* constexpr.cc (cxx_eval_store_expression): Handle REALPART_EXPR
and IMAGPART_EXPR.  Change ctors from releasing_vec to
auto_vec, adjust all uses.

* g++.dg/cpp1y/constexpr-complex1.C: New test.

--- gcc/cp/constexpr.cc.jj  2022-06-09 17:42:23.606243920 +0200
+++ gcc/cp/constexpr.cc 2022-06-17 18:59:54.809208997 +0200
@@ -5714,6 +5714,20 @@ cxx_eval_store_expression (const constex
  }
  break;
  
+	case REALPART_EXPR:

+ gcc_assert (probe == target);
+ vec_safe_push (refs, probe);
+ vec_safe_push (refs, TREE_TYPE (probe));
+ probe = TREE_OPERAND (probe, 0);
+ break;
+
+   case IMAGPART_EXPR:
+ gcc_assert (probe == target);
+ vec_safe_push (refs, probe);
+ vec_safe_push (refs, TREE_TYPE (probe));
+ probe = TREE_OPERAND (probe, 0);
+ break;
+
default:
  if (evaluated)
object = probe;
@@ -5752,7 +5766,8 @@ cxx_eval_store_expression (const constex
type = TREE_TYPE (object);
bool no_zero_init = true;
  
-  releasing_vec ctors, indexes;

+  auto_vec ctors;
+  releasing_vec indexes;
auto_vec index_pos_hints;
bool activated_union_member_p = false;
bool empty_base = false;
@@ -5792,14 +5807,36 @@ cxx_eval_store_expression (const constex
  *valp = ary_ctor;
}
  
-  /* If the value of object is already zero-initialized, any new ctors for

-subobjects will also be zero-initialized.  */
-  no_zero_init = CONSTRUCTOR_NO_CLEARING (*valp);
-
enum tree_code code = TREE_CODE (type);
tree reftype = refs->pop();
tree index = refs->pop();
  
+  if (code == COMPLEX_TYPE)

+   {
+ if (TREE_CODE (*valp) == COMPLEX_CST)
+   *valp = build2 (COMPLEX_EXPR, type, TREE_REALPART (*valp),
+   TREE_IMAGPART (*valp));
+ else if (TREE_CODE (*valp) == CONSTRUCTOR
+  && CONSTRUCTOR_NELTS (*valp) == 0
+  && CONSTRUCTOR_NO_CLEARING (*valp))
+   {
+ tree r = build_constructor (reftype, NULL);
+ CONSTRUCTOR_NO_CLEARING (r) = 1;
+ *valp = build2 (COMPLEX_EXPR, type, r, r);
+   }
+ gcc_assert (TREE_CODE (*valp) == COMPLEX_EXPR);
+ ctors.safe_push (valp);
+ vec_safe_push (indexes, index);
+ valp = &TREE_OPERAND (*valp, TREE_CODE (index) == IMAGPART_EXPR);
+ gcc_checking_assert (refs->is_empty ());
+ type = reftype;
+ break;
+   }
+
+  /* If the value of object is already zero-initialized, any new ctors for
+subobjects will also be zero-initialized.  */
+  no_zero_init = CONSTRUCTOR_NO_CLEARING (*valp);
+
if (code == RECORD_TYPE && is_empty_field (index))
/* Don't build a sub-CONSTRUCTOR for an empty base or field, as they
   have no data and might have an offset lower than previously declared
@@ -5842,7 +5879,7 @@ cxx_eval_store_expression (const constex
  no_zero_init = true;
}
  
-  vec_safe_push (ctors, *valp);

+  ctors.safe_push (valp);
vec_safe_push (indexes, index);
  
constructor_elt *cep

@@ 

Re: [PATCH v1] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-06-20 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 20, 2022 at 12:12:53PM -0700, Noah Goldstein wrote:
> Got it. Will have that in V2.

Thanks.
> 
> We could also make the initial:
> bool is_strchr_zerop = integer_zerop (chr);
> 
> Only check the lower 8 bits.

Sure.  Though, in that case it is just an optimization,
it is ok not to optimize strchr (x, 256); as
strchr (x, 0);, but it is not ok to optimize strchr (x, 256);
into memchr (x, 256, strlen (x)); so for the strlen (x) vs. strlen (x) + 1
decision it is needed for correctness.
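
To make the correctness point concrete, here is a small C sketch.  The
function names are invented for illustration, and it assumes an 8-bit
char with the usual modular int-to-char conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Wrong for c == 256: memchr looks for (unsigned char) 256 == 0 but
   never inspects the terminator, so it returns NULL where strchr
   would return a pointer to the terminator.  */
char *
bad_rewrite (const char *x, int c)
{
  assert (x != NULL);
  return (char *) memchr (x, c, strlen (x));
}

/* Always safe: the terminator is part of the searched range.  */
char *
safe_rewrite (const char *x, int c)
{
  assert (x != NULL);
  return (char *) memchr (x, c, strlen (x) + 1);
}
```

For any c whose low 8 bits are nonzero the two functions agree, which is
why the shorter length is only a problem for values such as 0 or 256.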

Jakub



Re: PING^1 [PATCH] x86: Skip ENDBR when emitting direct call/jmp to local function

2022-06-20 Thread Uros Bizjak via Gcc-patches
On Mon, Jun 20, 2022 at 8:14 PM H.J. Lu  wrote:
>
> On Tue, May 10, 2022 at 9:25 AM H.J. Lu  wrote:
> >
> > Mark a function with SYMBOL_FLAG_FUNCTION_ENDBR when inserting ENDBR at
> > function entry.  Skip the 4-byte ENDBR when emitting a direct call/jmp
> > to a local function with ENDBR at function entry.
> >
> > This has been tested on Linux kernel.
> >
> > gcc/
> >
> > PR target/102953
> > * config/i386/i386-features.cc
> > (rest_of_insert_endbr_and_patchable_area): Set
> > SYMBOL_FLAG_FUNCTION_ENDBR when inserting ENDBR.
> > * config/i386/i386.cc (ix86_print_operand): Skip the 4-byte ENDBR
> > when calling the local function with ENDBR at function entry.
> > * config/i386/i386.h (SYMBOL_FLAG_FUNCTION_ENDBR): New.
> > (SYMBOL_FLAG_FUNCTION_ENDBR_P): Likewise.
> >
> > gcc/testsuite/
> >
> > PR target/102953
> > * gcc.target/i386/pr102953-1.c: New test.
> > * gcc.target/i386/pr102953-2.c: Likewise.
> > ---
> >  gcc/config/i386/i386-features.cc   |  2 ++
> >  gcc/config/i386/i386.cc| 11 +++-
> >  gcc/config/i386/i386.h |  5 
> >  gcc/testsuite/gcc.target/i386/pr102953-1.c | 25 ++
> >  gcc/testsuite/gcc.target/i386/pr102953-2.c | 30 ++
> >  5 files changed, 72 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102953-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102953-2.c
> >
> > diff --git a/gcc/config/i386/i386-features.cc 
> > b/gcc/config/i386/i386-features.cc
> > index 6fe41c3c24f..3ca1131ed59 100644
> > --- a/gcc/config/i386/i386-features.cc
> > +++ b/gcc/config/i386/i386-features.cc
> > @@ -1979,6 +1979,8 @@ rest_of_insert_endbr_and_patchable_area (bool 
> > need_endbr,
> >   || (TARGET_DLLIMPORT_DECL_ATTRIBUTES
> >   && DECL_DLLIMPORT_P (cfun->decl
> > {
> > + rtx symbol = XEXP (DECL_RTL (cfun->decl), 0);
> > + SYMBOL_REF_FLAGS (symbol) |= SYMBOL_FLAG_FUNCTION_ENDBR;
> >   if (crtl->profile && flag_fentry)
> > {
> >   /* Queue ENDBR insertion to x86_function_profiler.
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 86752a6516a..ad1de239bef 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -13787,7 +13787,16 @@ ix86_print_operand (FILE *file, rtx x, int code)
> >else if (flag_pic || MACHOPIC_INDIRECT)
> > output_pic_addr_const (file, x, code);
> >else
> > -   output_addr_const (file, x);
> > +   {
> > + /* Skip ENDBR when emitting a direct call/jmp to a local
> > +function with ENDBR at function entry.  */
> > + if (code == 'P'
> > + && GET_CODE (x) == SYMBOL_REF
> > + && SYMBOL_REF_LOCAL_P (x)
> > + && SYMBOL_FLAG_FUNCTION_ENDBR_P (x))
> > +   x = gen_rtx_PLUS (Pmode, x, GEN_INT (4));
> > + output_addr_const (file, x);
> > +   }
> >  }
> >  }
> >
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index 363082ba47b..7a6317fea57 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -2792,6 +2792,11 @@ extern GTY(()) tree ms_va_list_type_node;
> >  #define SYMBOL_REF_STUBVAR_P(X) \
> > ((SYMBOL_REF_FLAGS (X) & SYMBOL_FLAG_STUBVAR) != 0)
> >
> > +/* Flag to mark a function with ENDBR at entry.  */
> > +#define SYMBOL_FLAG_FUNCTION_ENDBR (SYMBOL_FLAG_MACH_DEP << 5)
> > +#define SYMBOL_FLAG_FUNCTION_ENDBR_P(X) \
> > +   ((SYMBOL_REF_FLAGS (X) & SYMBOL_FLAG_FUNCTION_ENDBR) != 0)
> > +
> >  extern void debug_ready_dispatch (void);
> >  extern void debug_dispatch_window (int);
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr102953-1.c 
> > b/gcc/testsuite/gcc.target/i386/pr102953-1.c
> > new file mode 100644
> > index 000..2afad391baf
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr102953-1.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile { target { ! *-*-darwin* } } } */
> > +/* { dg-options "-O2 -fno-pic -fplt -fcf-protection" } */
> > +
> > +extern int func (int);
> > +
> > +extern int i;
> > +
> > +__attribute__ ((noclone, noinline, noipa))
> > +static int
> > +bar (int x)
> > +{
> > +  if (x == 0)
> > +return x;
> > +  return bar (x - 1) + func (x);
> > +}
> > +
> > +void *
> > +foo (void)
> > +{
> > +  i = bar (2);
> > +  return bar;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {call\t_?bar\+4\M} 2 } } */
> > +/* { dg-final { scan-assembler-times {call\t_?func\M} 1 } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr102953-2.c 
> > b/gcc/testsuite/gcc.target/i386/pr102953-2.c
> > new file mode 100644
> > index 000..5b8d517f4f2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr102953-2.c
> > @@ -0,0 +1,30 @@
> > +/* { dg-do compile { target { ! *-*-darwin* } } } */
> > +/* { dg-options "-O2 

Re: [PATCH v1] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-06-20 Thread Noah Goldstein via Gcc-patches
On Mon, Jun 20, 2022 at 12:04 PM Jakub Jelinek  wrote:
>
> On Mon, Jun 20, 2022 at 11:48:24AM -0700, Noah Goldstein wrote:
> > > I think we should differentiate more.  If integer_nonzerop (chr)
> > > or perhaps better tree_expr_nonzero_p (chr), then it is better
> > > to optimize t = strlen (x); ... p = strchr (x, c); to
> > > t = strlen (x); ... p = memchr (x, c, t);
> > What do you mean by differentiate more? More comments? Or
> > > separate the logic more?
>
> Different code, don't add the 1 to the strlen value whenever you know
> that chr can't be possibly 0 (either it is a non-zero constant,
> or the compiler can prove it won't be zero at runtime otherwise).
> Because if c is not 0, then memchr (x, c, strlen (x)) == memchr (x, c, strlen (x) + 1),
> either c is among the first strlen (x) chars, or it will return NULL
> because x[strlen (x)] == 0.
>
> It actually is slightly more complicated, strchr second argument is int,
> but we just care about the low 8 bits.
> For TREE_CODE (chr) == INTEGER_CST, it is still trivial,
> say integer_nonzerop (fold_convert (char_type_node, chr))
> or equivalent using wide-int.h APIs.
> For SSA_NAMEs, we'd need get_zero_bits API, but we only have
> get_nonzero_bits, but we could say at least handle the case where
> get_ssa_name_range_info gives a VR_RANGE or set of them where none of
> the ranges include integral multiples of 256.
> But for start perhaps just handling INTEGER_CST chr would be good enough.

Got it. Will have that in V2.

We could also make the initial:
bool is_strchr_zerop = integer_zerop (chr);

Only check the lower 8 bits.
>
> Jakub
>


[PATCH] Fortran: handle explicit-shape specs with constant bounds [PR105954]

2022-06-20 Thread Harald Anlauf via Gcc-patches
Dear all,

after simplification of constant bound expressions of an explicit
shape spec of an array, we need to ensure that we never obtain
negative extents.  In some cases this did happen, and we ICEd
as we hit an assert that this should never happen...

The original testcase by Gerhard exhibited this for sizeof()
of a derived type with an array component, but the issue is
more fundamental and affects other intrinsics during
simplification.

A straightforward solution "fixes up" the upper bound in the
shape spec when it is known to be below the lower bound minus one.
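
The fix-up can be illustrated with a small C sketch that uses plain
long values instead of the front end's multi-precision integers.  The
struct and function names below are hypothetical and are not taken from
the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one dimension of an explicit-shape spec
   with constant bounds.  */
struct dim_bounds
{
  long lower;
  long upper;
};

/* If the upper bound is below the lower bound, pull it up to
   lower - 1 so that the extent comes out as exactly zero.  */
void
clamp_upper_bound (struct dim_bounds *d)
{
  assert (d != NULL);
  if (d->upper < d->lower)
    d->upper = d->lower - 1;
}

/* Extent of the dimension; never negative after clamping.  */
long
extent (const struct dim_bounds *d)
{
  assert (d != NULL);
  return d->upper - d->lower + 1;
}
```

With bounds 7:-99 the upper bound becomes 6 and the extent is 0; bounds
that already describe a non-negative extent are left untouched.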

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 65f7fd793415cb291ffb5bca8cdbcb10fc511ab8 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Mon, 20 Jun 2022 20:59:55 +0200
Subject: [PATCH] Fortran: handle explicit-shape specs with constant bounds
 [PR105954]

gcc/fortran/ChangeLog:

	PR fortran/105954
	* decl.cc (variable_decl): Adjust upper bounds for explicit-shape
	specs with constant bound expressions to ensure non-negative
	extents.

gcc/testsuite/ChangeLog:

	PR fortran/105954
	* gfortran.dg/pr105954.f90: New test.
---
 gcc/fortran/decl.cc| 12 
 gcc/testsuite/gfortran.dg/pr105954.f90 | 26 ++
 2 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pr105954.f90

diff --git a/gcc/fortran/decl.cc b/gcc/fortran/decl.cc
index bd586e75008..26ff54d4684 100644
--- a/gcc/fortran/decl.cc
+++ b/gcc/fortran/decl.cc
@@ -2775,6 +2775,18 @@ variable_decl (int elem)
 		  else
 		gfc_free_expr (n);
 		}
+	  /* For an explicit-shape spec with constant bounds, ensure
+		 that the effective upper bound is not lower than the
+		 respective lower bound minus one.  Otherwise adjust it so
+		 that the extent is trivially derived to be zero.  */
+	  if (as->lower[i]->expr_type == EXPR_CONSTANT
+		  && as->upper[i]->expr_type == EXPR_CONSTANT
+		  && as->lower[i]->ts.type == BT_INTEGER
+		  && as->upper[i]->ts.type == BT_INTEGER
+		  && mpz_cmp (as->upper[i]->value.integer,
+			  as->lower[i]->value.integer) < 0)
+		mpz_sub_ui (as->upper[i]->value.integer,
+			as->lower[i]->value.integer, 1);
 	}
 	}
 }
diff --git a/gcc/testsuite/gfortran.dg/pr105954.f90 b/gcc/testsuite/gfortran.dg/pr105954.f90
new file mode 100644
index 000..89004bf9aa7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr105954.f90
@@ -0,0 +1,26 @@
+! { dg-do compile }
+! { dg-options "-fdump-tree-original" }
+! PR fortran/105954 - ICE in gfc_element_size, at fortran/target-memory.cc:132
+! Contributed by G.Steinmetz
+
+program p
+  use iso_c_binding, only: c_float, c_sizeof
+  implicit none
+  integer, parameter :: n = -99
+  type t
+ real :: b(3,7:n)
+  end type
+  type, bind(c) :: u
+ real(c_float) :: b(3,7:n)
+  end type
+  type(t) :: d
+  type(u) :: e
+  integer, parameter :: k = storage_size(d)
+  integer, parameter :: m = sizeof(d)
+  integer, parameter :: l = c_sizeof(e)
+  if (k /= 0) stop 1
+  if (m /= 0) stop 2
+  if (l /= 0) stop 3
+end
+
+! { dg-final { scan-tree-dump-not "_gfortran_stop_numeric" "original" } }
--
2.35.3



Re: [PATCH v1] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-06-20 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 20, 2022 at 11:48:24AM -0700, Noah Goldstein wrote:
> > I think we should differentiate more.  If integer_nonzerop (chr)
> > or perhaps better tree_expr_nonzero_p (chr), then it is better
> > to optimize t = strlen (x); ... p = strchr (x, c); to
> > t = strlen (x); ... p = memchr (x, c, t);
> What do you mean by differentiate more? More comments? Or
> separate the logic more?

Different code, don't add the 1 to the strlen value whenever you know
that chr can't be possibly 0 (either it is a non-zero constant,
or the compiler can prove it won't be zero at runtime otherwise).
Because if c is not 0, then memchr (x, c, strlen (x)) == memchr (x, c, strlen (x) + 1),
either c is among the first strlen (x) chars, or it will return NULL
because x[strlen (x)] == 0.

It actually is slightly more complicated, strchr second argument is int,
but we just care about the low 8 bits.
For TREE_CODE (chr) == INTEGER_CST, it is still trivial,
say integer_nonzerop (fold_convert (char_type_node, chr))
or equivalent using wide-int.h APIs.
For SSA_NAMEs, we'd need get_zero_bits API, but we only have
get_nonzero_bits, but we could say at least handle the case where
get_ssa_name_range_info gives a VR_RANGE or set of them where none of
the ranges include integral multiples of 256.
But for start perhaps just handling INTEGER_CST chr would be good enough.
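
The range condition for SSA_NAMEs can be sketched in plain C.  This is
not the GCC range API: range_low_byte_nonzero_p and floor_div_256 are
made-up helpers that work on a single closed interval [lo, hi] of int
values, and a set of ranges would need the check applied to each one.

```c
#include <assert.h>
#include <stdbool.h>

/* Division by 256 that rounds towards minus infinity, so that it also
   behaves for negative values.  */
long long
floor_div_256 (long long v)
{
  long long q = v / 256;
  if (v % 256 != 0 && v < 0)
    q--;
  return q;
}

/* Return true if no value in [lo, hi] is a multiple of 256, i.e. the
   low 8 bits of every value in the range are known to be nonzero.  */
bool
range_low_byte_nonzero_p (int lo, int hi)
{
  assert (lo <= hi);
  /* A multiple of 256 lies in [lo, hi] exactly when the floored
     quotient increases somewhere between lo - 1 and hi.  */
  return floor_div_256 (hi) == floor_div_256 ((long long) lo - 1);
}
```

For example, [1, 255] and [257, 511] qualify, while [0, 10] and
[200, 256] do not.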

Jakub



Re: [PATCH] rs6000: Improve .machine

2022-06-20 Thread Segher Boessenkool
On Mon, Jun 20, 2022 at 09:48:34AM +0200, Sebastian Huber wrote:
> On 04/04/2022 11:31, Sebastian Huber wrote:
> >Hello Segher,
> >
> >On 15/03/2022 23:29, Segher Boessenkool wrote:
> >>On Tue, Mar 15, 2022 at 03:29:23PM +0100, Sebastian Huber wrote:
> >>>now that the PR104829 is fixed could I back port
> >>>
> >>>Segher Boessenkool (2):
> >>>   rs6000: Improve .machine
> >>>   rs6000: Do not use rs6000_cpu for .machine ppc and ppc64 (PR104829)
> >>>
> >>>to GCC 10 and 11?
> >>I will do it, in a few days though.
> >>
> >>Thanks for your enthusiasm :-),
> >
> >would now be a good time to back port the fixes or do you want to wait
> >for the GCC 12 release? It would be nice if the fixes are included in the
> >GCC 10.4 release.
> 
> The GCC 10.4 release candidate will be made on 21st June. May I back
> port the two patches today?

Yes.  Thanks!


Segher


Re: [PATCH v1] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-06-20 Thread Noah Goldstein via Gcc-patches
On Mon, Jun 20, 2022 at 10:29 AM Jakub Jelinek  wrote:
>
> On Mon, Jun 20, 2022 at 09:35:36AM -0700, Noah Goldstein via Gcc-patches 
> wrote:
> > This patch allows for strchr(x, c) to be replaced with memchr(x, c,
> > strlen(x) + 1) if strlen(x) has already been computed earlier in the
> > tree.
> >
> > Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821
> >
> > Since memchr doesn't need to re-find the null terminator it is faster
> > than strchr.
>
> Do you have a GCC Copyright assignment on file, or do you want to submit
> this under DCO ( https://gcc.gnu.org/dco.html )?  If the latter, there
> should be a Signed-off-by: line, both in the mail and later commit.
> >
> > bootstrapped and tested on x86_64-linux.
> >
> > gcc/
> >
>
> As it fixes a GCC bugzilla bug, the ChangeLog entry should start with
> PR tree-optimization/95821
> line.
> > * tree-ssa-strlen.cc: Emit memchr instead of strchr if strlen
> >  already computed.
>
> All the indented lines in ChangeLog should be indented by tab.
> You are modifying strlen_pass::handle_builtin_strchr function, so after
> tree-ssa-strlen.cc there should be that function name in parens:
> * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
> memchr ...
>
> >
> > gcc/testsuite/
> >
> > * c-c++-common/pr95821-1.c
> > * c-c++-common/pr95821-2.c
> > * c-c++-common/pr95821-3.c
> > * c-c++-common/pr95821-4.c
> > * c-c++-common/pr95821-5.c
> > * c-c++-common/pr95821-6.c
>
> All the above lines should end with ": New test." after .c
>
> > --- a/gcc/tree-ssa-strlen.cc
> > +++ b/gcc/tree-ssa-strlen.cc
>
> How does the patch relate to the one that H.J. attached in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821#c4 ?
>
> > @@ -2405,9 +2405,12 @@ strlen_pass::handle_builtin_strlen ()
> >  }
> >  }
> >
> > -/* Handle a strchr call.  If strlen of the first argument is known, replace
> > -   the strchr (x, 0) call with the endptr or x + strlen, otherwise remember
> > -   that lhs of the call is endptr and strlen of the argument is endptr - 
> > x.  */
> > +/* Handle a strchr call.  If strlen of the first argument is known,
> > +   replace the strchr (x, 0) call with the endptr or x + strlen,
> > +   otherwise remember that lhs of the call is endptr and strlen of the
> > +   argument is endptr - x.  If strlen of x is not know but has been
> > +   computed earlier in the tree then replace strchr(x, c) to
> > +   memchr(x, c, strlen + 1).  */
>
> Space before ( even in comments.
>
>
>
> >  void
> >  strlen_pass::handle_builtin_strchr ()
> > @@ -2418,8 +2421,8 @@ strlen_pass::handle_builtin_strchr ()
> >if (lhs == NULL_TREE)
> >  return;
> >
> > -  if (!integer_zerop (gimple_call_arg (stmt, 1)))
> > -return;
> > +  tree chr = gimple_call_arg (stmt, 1);
> > +  bool is_strchr_zerop = integer_zerop (chr);
> >
> >tree src = gimple_call_arg (stmt, 0);
> >
> > @@ -2452,32 +2455,56 @@ strlen_pass::handle_builtin_strchr ()
> > fprintf (dump_file, "Optimizing: ");
> > print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> >   }
> > -   if (si != NULL && si->endptr != NULL_TREE)
> > +   if (!is_strchr_zerop)
> >   {
> > -   rhs = unshare_expr (si->endptr);
> > -   if (!useless_type_conversion_p (TREE_TYPE (lhs),
> > -   TREE_TYPE (rhs)))
> > - rhs = fold_convert_loc (loc, TREE_TYPE (lhs), rhs);
> > +   /* If its not strchr(s, zerop) then try and convert to
> > +memchr if strlen has already been computed.  */
>
> Again, space before (.  The second line is weirdly formatted, should
> be indented below If.
>
> > +   tree fn = builtin_decl_explicit (BUILT_IN_MEMCHR);
> > +   tree one = build_int_cst (TREE_TYPE (rhs), 1);
> > +   rhs = fold_build2_loc (loc, PLUS_EXPR, TREE_TYPE (rhs),
> > +  unshare_expr (rhs), one);
> > +   tree size = make_ssa_name (TREE_TYPE (rhs));
> > +   gassign *size_stmt = gimple_build_assign (size, rhs);
> > +   gsi_insert_before (_gsi, size_stmt, GSI_SAME_STMT);
> > +   rhs = size;
> > +   if (!update_gimple_call (_gsi, fn, 3, src, chr, rhs))
> > + return;
>
> I think we should differentiate more.  If integer_nonzerop (chr)
> or perhaps better tree_expr_nonzero_p (chr), then it is better
> to optimize t = strlen (x); ... p = strchr (x, c); to
> t = strlen (x); ... p = memchr (x, c, t);
What do you mean by differentiate more? More comments? Or
separate the logic more?

> the t + 1 is only needed if c might be zero.
>
> > +   /* Don't update strlen of lhs if search-char was non-zero.  */
>
> Wasn't known to be zero is the right thing.
>
> Jakub
>


PING^1 [PATCH] x86: Skip ENDBR when emitting direct call/jmp to local function

2022-06-20 Thread H.J. Lu via Gcc-patches
On Tue, May 10, 2022 at 9:25 AM H.J. Lu  wrote:
>
> Mark a function with SYMBOL_FLAG_FUNCTION_ENDBR when inserting ENDBR at
> function entry.  Skip the 4-byte ENDBR when emitting a direct call/jmp
> to a local function with ENDBR at function entry.
>
> This has been tested on Linux kernel.
>
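
The effect of the patch on the printed call operand can be modelled in a
few lines.  This is only a sketch under stated assumptions: Symbol and
direct_call_target are hypothetical names for illustration and are not
GCC's rtx or ix86_print_operand.  Skipping the marker is harmless for a
direct call because indirect-branch tracking checks only indirect
call/jmp targets.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for a SYMBOL_REF.
struct Symbol
{
  std::string name;
  bool local;      // models SYMBOL_REF_LOCAL_P
  bool has_endbr;  // models SYMBOL_FLAG_FUNCTION_ENDBR_P
};

// ENDBR32 and ENDBR64 are both 4-byte instructions.
constexpr int endbr_size = 4;

// Return the operand text for a direct call or jmp to S.
std::string
direct_call_target (const Symbol &s)
{
  assert (!s.name.empty ());
  // A local function that starts with ENDBR can be entered just past it.
  if (s.local && s.has_endbr)
    return s.name + "+" + std::to_string (endbr_size);
  return s.name;
}
```

This matches what the new tests scan for: "call bar+4" for the local
static function and a plain "call func" for the external one.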
> gcc/
>
> PR target/102953
> * config/i386/i386-features.cc
> (rest_of_insert_endbr_and_patchable_area): Set
> SYMBOL_FLAG_FUNCTION_ENDBR when inserting ENDBR.
> * config/i386/i386.cc (ix86_print_operand): Skip the 4-byte ENDBR
> when calling the local function with ENDBR at function entry.
> * config/i386/i386.h (SYMBOL_FLAG_FUNCTION_ENDBR): New.
> (SYMBOL_FLAG_FUNCTION_ENDBR_P): Likewise.
>
> gcc/testsuite/
>
> PR target/102953
> * gcc.target/i386/pr102953-1.c: New test.
> * gcc.target/i386/pr102953-2.c: Likewise.
> ---
>  gcc/config/i386/i386-features.cc   |  2 ++
>  gcc/config/i386/i386.cc| 11 +++-
>  gcc/config/i386/i386.h |  5 
>  gcc/testsuite/gcc.target/i386/pr102953-1.c | 25 ++
>  gcc/testsuite/gcc.target/i386/pr102953-2.c | 30 ++
>  5 files changed, 72 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102953-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102953-2.c
>
> diff --git a/gcc/config/i386/i386-features.cc 
> b/gcc/config/i386/i386-features.cc
> index 6fe41c3c24f..3ca1131ed59 100644
> --- a/gcc/config/i386/i386-features.cc
> +++ b/gcc/config/i386/i386-features.cc
> @@ -1979,6 +1979,8 @@ rest_of_insert_endbr_and_patchable_area (bool 
> need_endbr,
>   || (TARGET_DLLIMPORT_DECL_ATTRIBUTES
>   && DECL_DLLIMPORT_P (cfun->decl
> {
> + rtx symbol = XEXP (DECL_RTL (cfun->decl), 0);
> + SYMBOL_REF_FLAGS (symbol) |= SYMBOL_FLAG_FUNCTION_ENDBR;
>   if (crtl->profile && flag_fentry)
> {
>   /* Queue ENDBR insertion to x86_function_profiler.
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 86752a6516a..ad1de239bef 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -13787,7 +13787,16 @@ ix86_print_operand (FILE *file, rtx x, int code)
>else if (flag_pic || MACHOPIC_INDIRECT)
> output_pic_addr_const (file, x, code);
>else
> -   output_addr_const (file, x);
> +   {
> + /* Skip ENDBR when emitting a direct call/jmp to a local
> +function with ENDBR at function entry.  */
> + if (code == 'P'
> + && GET_CODE (x) == SYMBOL_REF
> + && SYMBOL_REF_LOCAL_P (x)
> + && SYMBOL_FLAG_FUNCTION_ENDBR_P (x))
> +   x = gen_rtx_PLUS (Pmode, x, GEN_INT (4));
> + output_addr_const (file, x);
> +   }
>  }
>  }
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 363082ba47b..7a6317fea57 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2792,6 +2792,11 @@ extern GTY(()) tree ms_va_list_type_node;
>  #define SYMBOL_REF_STUBVAR_P(X) \
> ((SYMBOL_REF_FLAGS (X) & SYMBOL_FLAG_STUBVAR) != 0)
>
> +/* Flag to mark a function with ENDBR at entry.  */
> +#define SYMBOL_FLAG_FUNCTION_ENDBR (SYMBOL_FLAG_MACH_DEP << 5)
> +#define SYMBOL_FLAG_FUNCTION_ENDBR_P(X) \
> +   ((SYMBOL_REF_FLAGS (X) & SYMBOL_FLAG_FUNCTION_ENDBR) != 0)
> +
>  extern void debug_ready_dispatch (void);
>  extern void debug_dispatch_window (int);
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr102953-1.c 
> b/gcc/testsuite/gcc.target/i386/pr102953-1.c
> new file mode 100644
> index 000..2afad391baf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr102953-1.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile { target { ! *-*-darwin* } } } */
> +/* { dg-options "-O2 -fno-pic -fplt -fcf-protection" } */
> +
> +extern int func (int);
> +
> +extern int i;
> +
> +__attribute__ ((noclone, noinline, noipa))
> +static int
> +bar (int x)
> +{
> +  if (x == 0)
> +return x;
> +  return bar (x - 1) + func (x);
> +}
> +
> +void *
> +foo (void)
> +{
> +  i = bar (2);
> +  return bar;
> +}
> +
> +/* { dg-final { scan-assembler-times {call\t_?bar\+4\M} 2 } } */
> +/* { dg-final { scan-assembler-times {call\t_?func\M} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr102953-2.c 
> b/gcc/testsuite/gcc.target/i386/pr102953-2.c
> new file mode 100644
> index 000..5b8d517f4f2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr102953-2.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile { target { ! *-*-darwin* } } } */
> +/* { dg-options "-O2 -fno-pic -fplt -fcf-protection" } */
> +
> +static int bar (int x);
> +extern int func (int);
> +
> +int
> +foo (int i)
> +{
> +  return bar (i);
> +}
> +
> +void *
> +bar_p (void)
> +{
> +  return bar;
> +}
> +
> +__attribute__ ((noclone, noinline, noipa))
> +static int
> +bar (int x)
> +{
> +  if (x == 0)
> 

Re: [PATCH v1] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-06-20 Thread H.J. Lu via Gcc-patches
On Mon, Jun 20, 2022 at 10:29 AM Jakub Jelinek  wrote:
>
> On Mon, Jun 20, 2022 at 09:35:36AM -0700, Noah Goldstein via Gcc-patches 
> wrote:
> > This patch allows for strchr(x, c) to be replaced with memchr(x, c,
> > strlen(x) + 1) if strlen(x) has already been computed earlier in the
> > tree.
> >
> > Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821
> >
> > Since memchr doesn't need to re-find the null terminator it is faster
> > than strchr.
>
> Do you have a GCC Copyright assignment on file, or do you want to submit

Noah works for Intel and he should be covered.

> this under DCO ( https://gcc.gnu.org/dco.html )?  If the latter, there
> should be a Signed-off-by: line, both in the mail and later commit.
> >
> > bootstrapped and tested on x86_64-linux.
> >
> > gcc/
> >
>
> As it fixes a GCC bugzilla bug, the ChangeLog entry should start with
> PR tree-optimization/95821
> line.
> > * tree-ssa-strlen.cc: Emit memchr instead of strchr if strlen
> >  already computed.
>
> All the indented lines in ChangeLog should be indented by tab.
> You are modifying strlen_pass::handle_builtin_strchr function, so after
> tree-ssa-strlen.cc there should be that function name in parens:
> * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
> memchr ...
>
> >
> > gcc/testsuite/
> >
> > * c-c++-common/pr95821-1.c
> > * c-c++-common/pr95821-2.c
> > * c-c++-common/pr95821-3.c
> > * c-c++-common/pr95821-4.c
> > * c-c++-common/pr95821-5.c
> > * c-c++-common/pr95821-6.c
>
> All the above lines should end with ": New test." after .c
>
> > --- a/gcc/tree-ssa-strlen.cc
> > +++ b/gcc/tree-ssa-strlen.cc
>
> How does the patch relate to the one that H.J. attached in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821#c4 ?

Both patches are very similar.  Mine has a bug.

> > @@ -2405,9 +2405,12 @@ strlen_pass::handle_builtin_strlen ()
> >  }
> >  }
> >
> > -/* Handle a strchr call.  If strlen of the first argument is known, replace
> > -   the strchr (x, 0) call with the endptr or x + strlen, otherwise remember
> > -   that lhs of the call is endptr and strlen of the argument is endptr - 
> > x.  */
> > +/* Handle a strchr call.  If strlen of the first argument is known,
> > +   replace the strchr (x, 0) call with the endptr or x + strlen,
> > +   otherwise remember that lhs of the call is endptr and strlen of the
> > +   argument is endptr - x.  If strlen of x is not know but has been
> > +   computed earlier in the tree then replace strchr(x, c) to
> > +   memchr(x, c, strlen + 1).  */
>
> Space before ( even in comments.
>
>
>
> >  void
> >  strlen_pass::handle_builtin_strchr ()
> > @@ -2418,8 +2421,8 @@ strlen_pass::handle_builtin_strchr ()
> >if (lhs == NULL_TREE)
> >  return;
> >
> > -  if (!integer_zerop (gimple_call_arg (stmt, 1)))
> > -return;
> > +  tree chr = gimple_call_arg (stmt, 1);
> > +  bool is_strchr_zerop = integer_zerop (chr);
> >
> >tree src = gimple_call_arg (stmt, 0);
> >
> > @@ -2452,32 +2455,56 @@ strlen_pass::handle_builtin_strchr ()
> > fprintf (dump_file, "Optimizing: ");
> > print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> >   }
> > -   if (si != NULL && si->endptr != NULL_TREE)
> > +   if (!is_strchr_zerop)
> >   {
> > -   rhs = unshare_expr (si->endptr);
> > -   if (!useless_type_conversion_p (TREE_TYPE (lhs),
> > -   TREE_TYPE (rhs)))
> > - rhs = fold_convert_loc (loc, TREE_TYPE (lhs), rhs);
> > +   /* If its not strchr(s, zerop) then try and convert to
> > +memchr if strlen has already been computed.  */
>
> Again, space before (.  The second line is weirdly formatted, should
> be indented below If.
>
> > +   tree fn = builtin_decl_explicit (BUILT_IN_MEMCHR);
> > +   tree one = build_int_cst (TREE_TYPE (rhs), 1);
> > +   rhs = fold_build2_loc (loc, PLUS_EXPR, TREE_TYPE (rhs),
> > +  unshare_expr (rhs), one);
> > +   tree size = make_ssa_name (TREE_TYPE (rhs));
> > +   gassign *size_stmt = gimple_build_assign (size, rhs);
> > +   gsi_insert_before (_gsi, size_stmt, GSI_SAME_STMT);
> > +   rhs = size;
> > +   if (!update_gimple_call (_gsi, fn, 3, src, chr, rhs))
> > + return;
>
> I think we should differentiate more.  If integer_nonzerop (chr)
> or perhaps better tree_expr_nonzero_p (chr), then it is better
> to optimize t = strlen (x); ... p = strchr (x, c); to
> t = strlen (x); ... p = memchr (x, c, t);
> the t + 1 is only needed if c might be zero.
>
> > +   /* Don't update strlen of lhs if search-char was non-zero.  */
>
> Wasn't known to be zero is the right thing.
>
> Jakub
>


-- 
H.J.


Re: [PATCH v1] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-06-20 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 20, 2022 at 09:35:36AM -0700, Noah Goldstein via Gcc-patches wrote:
> This patch allows for strchr(x, c) to be replaced with memchr(x, c,
> strlen(x) + 1) if strlen(x) has already been computed earlier in the
> tree.
> 
> Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821
> 
> Since memchr doesn't need to re-find the null terminator it is faster
> than strchr.

Do you have a GCC Copyright assignment on file, or do you want to submit
this under DCO ( https://gcc.gnu.org/dco.html )?  If the latter, there
should be a Signed-off-by: line, both in the mail and later commit.
> 
> bootstrapped and tested on x86_64-linux.
> 
> gcc/
> 

As it fixes a GCC bugzilla bug, the ChangeLog entry should start with
PR tree-optimization/95821
line.
> * tree-ssa-strlen.cc: Emit memchr instead of strchr if strlen
>  already computed.

All the indented lines in ChangeLog should be indented by tab.
You are modifying strlen_pass::handle_builtin_strchr function, so after
tree-ssa-strlen.cc there should be that function name in parens:
* tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
memchr ...

> 
> gcc/testsuite/
> 
> * c-c++-common/pr95821-1.c
> * c-c++-common/pr95821-2.c
> * c-c++-common/pr95821-3.c
> * c-c++-common/pr95821-4.c
> * c-c++-common/pr95821-5.c
> * c-c++-common/pr95821-6.c

All the above lines should end with ": New test." after .c

> --- a/gcc/tree-ssa-strlen.cc
> +++ b/gcc/tree-ssa-strlen.cc

How does the patch relate to the one that H.J. attached in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821#c4 ?

> @@ -2405,9 +2405,12 @@ strlen_pass::handle_builtin_strlen ()
>  }
>  }
>  
> -/* Handle a strchr call.  If strlen of the first argument is known, replace
> -   the strchr (x, 0) call with the endptr or x + strlen, otherwise remember
> -   that lhs of the call is endptr and strlen of the argument is endptr - x.  
> */
> +/* Handle a strchr call.  If strlen of the first argument is known,
> +   replace the strchr (x, 0) call with the endptr or x + strlen,
> +   otherwise remember that lhs of the call is endptr and strlen of the
> +   argument is endptr - x.  If strlen of x is not know but has been
> +   computed earlier in the tree then replace strchr(x, c) to
> +   memchr(x, c, strlen + 1).  */

Space before ( even in comments.



>  void
>  strlen_pass::handle_builtin_strchr ()
> @@ -2418,8 +2421,8 @@ strlen_pass::handle_builtin_strchr ()
>if (lhs == NULL_TREE)
>  return;
>  
> -  if (!integer_zerop (gimple_call_arg (stmt, 1)))
> -return;
> +  tree chr = gimple_call_arg (stmt, 1);
> +  bool is_strchr_zerop = integer_zerop (chr);
>  
>tree src = gimple_call_arg (stmt, 0);
>  
> @@ -2452,32 +2455,56 @@ strlen_pass::handle_builtin_strchr ()
> fprintf (dump_file, "Optimizing: ");
> print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>   }
> -   if (si != NULL && si->endptr != NULL_TREE)
> +   if (!is_strchr_zerop)
>   {
> -   rhs = unshare_expr (si->endptr);
> -   if (!useless_type_conversion_p (TREE_TYPE (lhs),
> -   TREE_TYPE (rhs)))
> - rhs = fold_convert_loc (loc, TREE_TYPE (lhs), rhs);
> +   /* If its not strchr(s, zerop) then try and convert to
> +memchr if strlen has already been computed.  */

Again, space before (.  The second line is weirdly formatted, should
be indented below If.

> +   tree fn = builtin_decl_explicit (BUILT_IN_MEMCHR);
> +   tree one = build_int_cst (TREE_TYPE (rhs), 1);
> +   rhs = fold_build2_loc (loc, PLUS_EXPR, TREE_TYPE (rhs),
> +  unshare_expr (rhs), one);
> +   tree size = make_ssa_name (TREE_TYPE (rhs));
> +   gassign *size_stmt = gimple_build_assign (size, rhs);
> +   gsi_insert_before (_gsi, size_stmt, GSI_SAME_STMT);
> +   rhs = size;
> +   if (!update_gimple_call (_gsi, fn, 3, src, chr, rhs))
> + return;

I think we should differentiate more.  If integer_nonzerop (chr)
or perhaps better tree_expr_nonzero_p (chr), then it is better
to optimize t = strlen (x); ... p = strchr (x, c); to
t = strlen (x); ... p = memchr (x, c, t);
the t + 1 is only needed if c might be zero.

> +   /* Don't update strlen of lhs if search-char was non-zero.  */

Wasn't known to be zero is the right thing.

Jakub



Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes

2022-06-20 Thread Yonghong Song via Gcc-patches




On 6/17/22 10:18 AM, Jose E. Marchesi wrote:


Hi Yonghong.


On 6/15/22 1:57 PM, David Faust wrote:


On 6/14/22 22:53, Yonghong Song wrote:



On 6/7/22 2:43 PM, David Faust wrote:

Hello,

This patch series adds support for:

- Two new C-language-level attributes that allow to associate (to "annotate" or
 to "tag") particular declarations and types with arbitrary strings. As
 explained below, this is intended to be used to, for example, characterize
 certain pointer types.

- The conveyance of that information in the DWARF output in the form of a new
 DIE: DW_TAG_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new
 kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.

All of these facilities are being added to the eBPF ecosystem, and support for
them exists in some form in LLVM.

Purpose
=======

1)  Addition of C-family language constructs (attributes) to specify free-text
   tags on certain language elements, such as struct fields.

   The purpose of these annotations is to provide additional information about
   types, variables, and function parameters of interest to the kernel. A
   driving use case is to tag pointer types within the linux kernel and eBPF
   programs with additional semantic information, such as '__user' or '__rcu'.

   For example, consider the linux kernel function do_execve with the
   following declaration:

 static int do_execve(struct filename *filename,
const char __user *const __user *__argv,
const char __user *const __user *__envp);

   Here, __user could be defined with these annotations to record semantic
   information about the pointer parameters (e.g., they are user-provided) in
   DWARF and BTF information. Other kernel facilities such as the eBPF verifier
   can read the tags and make use of the information.

2)  Conveying the tags in the generated DWARF debug info.

   The main motivation for emitting the tags in DWARF is that the Linux kernel
   generates its BTF information via pahole, using DWARF as a source:

   +--------+  BTF              BTF   +----------+
   | pahole |---> vmlinux.btf ------->| verifier |
   +--------+                         +----------+
       ^                                   ^
       |                                   |
 DWARF |                               BTF |
       |                                   |
    vmlinux                         +-------------+
    module1.ko                      | BPF program |
    module2.ko                      +-------------+
      ...

   This is because:

   a)  Unlike GCC, LLVM will only generate BTF for BPF programs.

   b)  GCC can generate BTF for whatever target with -gbtf, but there is no
   support for linking/deduplicating BTF in the linker.

   In the scenario above, the verifier needs access to the pointer tags of
   both the kernel types/declarations (conveyed in the DWARF and translated
   to BTF by pahole) and those of the BPF program (available directly in BTF).

   Another motivation for having the tag information in DWARF, unrelated to
   BPF and BTF, is that the drgn project (another DWARF consumer) also wants
   to benefit from these tags in order to differentiate between different
   kinds of pointers in the kernel.

3)  Conveying the tags in the generated BTF debug info.

   This is easy: the main purpose of having this info in BTF is for the
   compiled eBPF programs. The kernel verifier can then access the tags
   of pointers used by the eBPF programs.


For more information about these tags and the motivation behind them, please
refer to the following linux kernel discussions:

 https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
 https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
 https://lore.kernel.org/bpf/2022012604.1504583-1-...@fb.com/


Implementation Overview
=======================

To enable these annotations, two new C language attributes are added:
__attribute__((debug_annotate_decl("foo"))) and
__attribute__((debug_annotate_type("bar"))). Both attributes accept a single
arbitrary string constant argument, which will be recorded in the generated
DWARF and/or BTF debug information. They have no effect on code generation.

Note that we are not using the same attribute names as LLVM (btf_decl_tag and
btf_type_tag, respectively). While these attributes are functionally very
similar, they have grown beyond purely BTF-specific uses, so inclusion of "btf"
in the attribute name seems misleading.
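
A minimal usage sketch follows, assuming the attribute name proposed in
this series.  Released compilers may not recognise debug_annotate_type,
so the hypothetical USER_TAG macro expands to nothing unless
__has_attribute reports support; first_byte is likewise just an
illustrative function, not kernel code.

```cpp
#include <cassert>

// Assumption: debug_annotate_type is the attribute proposed here.  Where
// the compiler does not know it, USER_TAG is defined as empty.
#if defined(__has_attribute)
# if __has_attribute(debug_annotate_type)
#  define USER_TAG __attribute__((debug_annotate_type("user")))
# endif
#endif
#ifndef USER_TAG
# define USER_TAG
#endif

// Hypothetical function taking a tagged pointer.  The tag is meant to be
// recorded in debug info only and has no effect on code generation.
char
first_byte (const char USER_TAG *p)
{
  assert (p != nullptr);
  return *p;
}
```

The macro sits in the same position as __user in the do_execve
declaration quoted earlier, which is how the kernel would be expected to
use it.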

DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating DWARF,
declarations and types will be checked for the corresponding attributes. If
present, a 

[PATCH 5/5][_Hashtable] Prealloc nodes on copy

2022-06-20 Thread François Dumont via Gcc-patches

libstdc++: [_Hashtable] Prealloc nodes on _Hashtable copy

Prealloc nodes on copy to reduce memory fragmentation of nodes. Create a new
assignment method which copies hashtable instances respecting the order of
nodes in the bucket rather than the order of nodes in the singly-linked list.
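
To illustrate what "order of nodes in the bucket" means, here is a sketch
that walks a container bucket by bucket using only the public
unordered-container interface.  It does not reproduce _M_assign_stable or
the node preallocation; elements_in_bucket_order is a hypothetical helper
name.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: collect the elements of S bucket by bucket
// instead of following the container's single forward list.
template<typename Set>
  std::vector<typename Set::value_type>
  elements_in_bucket_order (const Set &s)
  {
    std::vector<typename Set::value_type> out;
    out.reserve (s.size ());
    for (std::size_t bkt = 0; bkt != s.bucket_count (); ++bkt)
      for (auto it = s.begin (bkt); it != s.end (bkt); ++it)
	out.push_back (*it);
    // Every element lives in exactly one bucket.
    assert (out.size () == s.size ());
    return out;
  }
```

Visiting the source in this order means consecutive nodes of the copy end
up in the same or neighbouring buckets, which is the locality the
preallocated nodes are meant to exploit.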

libstdc++-v3/ChangeLog:

    * include/bits/hashtable_policy.h: Include stl_tempbuf.h.
    (_Hashtable_alloc<>::_M_allocate_node_ptr): New.
    (_PreAllocHashtableNodes<>): New.
    * include/bits/hashtable.h
    (_Hashtable<>::__prealloc_hashtable_nodes_gen_t): New.
    (_Hashtable<>::_M_assign_stable): New.
    (_Hashtable<>::operator=(const _Hashtable&)): Use latter.
    (_Hashtable(const _Hashtable&)): Likewise.
    (_Hashtable(const _Hashtable&, const allocator_type&)): Likewise.
    (_Hashtable(_Hashtable&&, __node_alloc_type&&, false_type)): Likewise.
    * testsuite/util/exception/safety.h (setup_base::compare): Compare
    containers rather than compare iterators.
    * testsuite/23_containers/unordered_multiset/cons/copy.cc (main):
    Likewise.


Tested under Linux x86_64.

François
diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 011a707605f..b497f16d017 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -290,6 +290,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__detail::_ReuseOrAllocNode<__node_alloc_type>;
   using __alloc_node_gen_cache_bbegin_t =
 	__detail::_AllocNode<__node_alloc_type, __detail::_CacheBBeginPolicy>;
+  using __prealloc_hashtable_nodes_gen_t =
+	__detail::_PreAllocHashtableNodes<__node_alloc_type>;
+
   using __node_builder_t =
 	__detail::_NodeBuilder<_ExtractKey>;
   using __no_cache_bbegin_policy_t =
@@ -483,6 +486,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	void
 	_M_assign(_Ht&&, const _NodeGenerator&);
 
+  template
+	void
+	_M_assign_stable(_Ht&&, const _NodeGenerator&);
+
   void
   _M_move_assign(_Hashtable&&, true_type);
 
@@ -1364,10 +1371,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  _M_bucket_count = __ht._M_bucket_count;
 	  _M_element_count = __ht._M_element_count;
 	  _M_rehash_policy = __ht._M_rehash_policy;
-	  __alloc_node_gen_cache_bbegin_t __node_gen(*this);
 	  __try
 		{
-		  _M_assign(__ht, __node_gen);
+		  __prealloc_hashtable_nodes_gen_t __node_gen(__ht, *this);
+		  _M_assign_stable(__ht, __node_gen);
 		}
 	  __catch(...)
 		{
@@ -1487,6 +1494,72 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  }
   }
 
+  template
+template
+  void
+  _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
+		 _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
+  _M_assign_stable(_Ht&& __ht, const _NodeGenerator& __node_gen)
+  {
+	__buckets_ptr __buckets = nullptr;
+	if (!_M_buckets)
+	  _M_buckets = __buckets = _M_allocate_buckets(_M_bucket_count);
+
+	__try
+	  {
+	if (!__ht._M_before_begin._M_nxt)
+	  return;
+
+	__node_ptr __prev_n = nullptr;
+	for (std::size_t __bkt = 0; __bkt != _M_bucket_count; ++__bkt)
+	  {
+		if (__ht._M_buckets[__bkt] == nullptr)
+		  continue;
+
+		__node_ptr __ht_n =
+		  static_cast<__node_ptr>(__ht._M_buckets[__bkt]->_M_nxt);
+		__node_ptr __prev_ht_n = __ht_n;
+		__node_base_ptr __nxt_bkt_n = __bkt < _M_bucket_count - 1
+		  ? __ht._M_buckets[__bkt + 1] : nullptr;
+		do
+		  {
+		__node_ptr __this_n
+		  = __node_gen(__prev_n, __bkt,
+   __fwd_value_for<_Ht>(__ht_n->_M_v()));
+		this->_M_copy_code(*__this_n, *__ht_n);
+		if (__prev_n)
+		  {
+			if (!_M_buckets[__bkt])
+			  _M_buckets[__bkt] = __prev_n;
+			__prev_n->_M_nxt = __this_n;
+		  }
+		else
+		  {
+			_M_buckets[__bkt] = &_M_before_begin;
+			_M_before_begin._M_nxt = __this_n;
+		  }
+
+		__prev_n = __this_n;
+		__prev_ht_n = __ht_n;
+		__ht_n = __ht_n->_M_next();
+		  }
+		while (__ht_n
+		   && __ht._M_is_nxt_in_bucket(__bkt, __prev_ht_n,
+		   __nxt_bkt_n));
+	  }
+	  }
+	__catch(...)
+	  {
+	clear();
+	if (__buckets)
+	  _M_deallocate_buckets();
+	__throw_exception_again;
+	  }
+  }
+
   template::value,
 	const _Hashtable&, _Hashtable&&>;
-	  _M_assign(std::forward<_Fwd_Ht>(__ht), __node_gen);
+	  _M_assign_stable(std::forward<_Fwd_Ht>(__ht), __node_gen);
 	  __ht.clear();
 	}
 }
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index ff206a6ed20..becafcd3249 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -37,6 +37,7 @@
 #include 	// for std::pair
 #include 	// for __gnu_cxx::__aligned_buffer
 #include 	// for std::__alloc_rebind
+#include 	// for std::get_temporary_buffer.
 #include 	// for __gnu_cxx::__int_traits
 
 namespace std _GLIBCXX_VISIBILITY(default)
@@ -291,6 +292,113 @@ namespace __detail
   __hashtable_alloc& _M_h;
 };
 
+  template
+struct _PreAllocHashtableNodes
+{
+private:
+  

[PATCH 4/5][_Hashtable] Before begin cache policy

2022-06-20 Thread François Dumont via Gcc-patches

libstdc++: [_Hashtable] Add before begin bucket index cache on range insert

Add a policy to maintain a cache of the before begin bucket index in the
context of range insertion.

libstdc++-v3/ChangeLog:

    * include/bits/hashtable_policy.h (_CacheBBeginPolicy): New, maintain
    a cache.
    (_NoCacheBBeginPolicy): New, no cache.
    (_ReuseOrAllocNode<>): Inherit _CacheBBeginPolicy.
    (_AllocNode<>): Add cache policy template parameter.
    (_Map_base<>::operator[]): Adapt, use _NoCacheBBeginPolicy.
    (_Insert_base<>::__node_gen_type): Replace by...
    (_Insert_base<>::__alloc_node_gen_t<>): ...this. Use cache policy as a
    template parameter.
    (_Insert_base<>::insert): Adapt.
    (_Insert_base<>::try_emplace): Adapt.
    (_Insert<>::__node_gen_type): Replace by...
    (_Insert<>::__alloc_node_gen_t): ...this, use _NoCacheBBeginPolicy.
    (_Insert<>::insert): Adapt.
    * include/bits/hashtable.h
    (_Hashtable<>::__alloc_node_gen_t): Remove.
    (_Hashtable<>::__alloc_node_gen_cache_bbegin_t): New.
    (_Hashtable<>::__no_cache_bbegin_policy_t): New.
    (_Hashtable<>::__cache_bbegin_policy_t): New.
    (_Hashtable<>::_CacheBBeginPolicy): Add friend declaration.
    (_Hashtable<>::_NoCacheBBeginPolicy): Add friend declaration.
    (_Hashtable<>::_M_insert_bucket_begin): Add BBegin policy.
    (_Hashtable<>::_M_insert_unique_node): Likewise.
    (_Hashtable<>::_M_insert_multi_node): Likewise.

Tested under Linux x64.

François
diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index b0d1bc1f08a..011a707605f 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -288,10 +288,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   using __reuse_or_alloc_node_gen_t =
 	__detail::_ReuseOrAllocNode<__node_alloc_type>;
-  using __alloc_node_gen_t =
-	__detail::_AllocNode<__node_alloc_type>;
+  using __alloc_node_gen_cache_bbegin_t =
+	__detail::_AllocNode<__node_alloc_type, __detail::_CacheBBeginPolicy>;
   using __node_builder_t =
 	__detail::_NodeBuilder<_ExtractKey>;
+  using __no_cache_bbegin_policy_t =
+	__detail::_NoCacheBBeginPolicy;
+  using __cache_bbegin_policy_t =
+	__detail::_CacheBBeginPolicy;
 
   // Simple RAII type for managing a node containing an element
   struct _Scoped_node
@@ -376,6 +380,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	   bool _Unique_keysa>
 	friend struct __detail::_Equality;
 
+  friend struct __detail::_CacheBBeginPolicy;
+  friend struct __detail::_NoCacheBBeginPolicy;
+
 public:
   using size_type = typename __hashtable_base::size_type;
   using difference_type = typename __hashtable_base::difference_type;
@@ -872,8 +879,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   // Insert a node at the beginning of a bucket.
-  void
-  _M_insert_bucket_begin(size_type, __node_ptr);
+  template
+	void
+	_M_insert_bucket_begin(size_type, __node_ptr, const _BBeginPolicy&);
 
   // Remove the bucket first node
   void
@@ -890,15 +898,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Insert node __n with hash code __code, in bucket __bkt if no
   // rehash (assumes no element with same key already present).
   // Takes ownership of __n if insertion succeeds, throws otherwise.
-  iterator
-  _M_insert_unique_node(size_type __bkt, __hash_code,
-			__node_ptr __n, size_type __n_elt = 1);
+  template
+	iterator
+	_M_insert_unique_node(size_type __bkt, __hash_code,
+			  __node_ptr __n, const _BBeginPolicy&,
+			  size_type __n_elt = 1);
 
   // Insert node __n with key __k and hash code __code.
   // Takes ownership of __n if insertion succeeds, throws otherwise.
-  iterator
-  _M_insert_multi_node(__node_ptr __hint,
-			   __hash_code __code, __node_ptr __n);
+  template
+	iterator
+	_M_insert_multi_node(__node_ptr __hint, __hash_code,
+			 __node_ptr __n, const _BBeginPolicy&);
 
   template
 	std::pair
@@ -1087,8 +1098,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  }
 		else
 		  {
+		__no_cache_bbegin_policy_t __bbegin_policy;
 		__ret.position
-		  = _M_insert_unique_node(__bkt, __code, __nh._M_ptr);
+		  = _M_insert_unique_node(__bkt, __code, __nh._M_ptr,
+	  __bbegin_policy);
 		__nh._M_ptr = nullptr;
 		__ret.inserted = true;
 		  }
@@ -1117,8 +1130,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__hint = cend();
 	  }
 
+	__no_cache_bbegin_policy_t __bbegin_policy;
 	auto __ret
-	  = _M_insert_multi_node(__hint._M_cur, __code, __nh._M_ptr);
+	  = _M_insert_multi_node(__hint._M_cur, __code, __nh._M_ptr,
+ __bbegin_policy);
 	__nh._M_ptr = nullptr;
 	return __ret;
   }
@@ -1175,6 +1190,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  node_type>, "Node types are compatible");
 	  __glibcxx_assert(get_allocator() == __src.get_allocator());
 
+	  __cache_bbegin_policy_t __bbegin_policy;
 	  auto __n_elt = __src.size();
 	  for (auto 

[PATCH 3/5][_Hashtable] std::initializer_list insertion

2022-06-20 Thread François Dumont via Gcc-patches

libstdc++: [_Hashtable] Consider all initializer_list elements are inserted

When instantiated using an initializer_list, the container is pre-sized
based on the initializer_list size.
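
A minimal sketch of the pre-sizing idea, written against the public
std::unordered_set interface rather than the internal _Hashtable constructor.
The helper name make_presized_set is hypothetical and not part of the patch;
it only mirrors the "__bkt_count_hint == 0 ? __l.size() : __bkt_count_hint"
choice made in the constructor.

```cpp
#include <cstddef>
#include <initializer_list>
#include <unordered_set>

// Hypothetical helper (not part of the patch): build a set from an
// initializer_list, using the list size as the bucket count hint when the
// caller leaves the hint at its default value of 0.
inline std::unordered_set<int>
make_presized_set(std::initializer_list<int> il,
                  std::size_t bkt_count_hint = 0)
{
  const std::size_t hint = bkt_count_hint == 0 ? il.size() : bkt_count_hint;
  std::unordered_set<int> s(hint);   // requests at least `hint` buckets
  s.insert(il.begin(), il.end());
  return s;
}
```

With the default max_load_factor of 1.0, requesting as many buckets as there
are elements should leave no reason to rehash while the list is inserted.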

libstdc++-v3/ChangeLog:

    * include/bits/hashtable_policy.h
    (_Insert_base<>::insert(initializer_list<>)): Use assignment operator if
    container is empty and has default bucket count.
    * include/bits/hashtable.h (_Hashtable<>(initializer_list<>)): Use
    initializer_list size as bucket count hint if user did not provide any
    value, that is to say if it is the default 0 value.
    * testsuite/23_containers/unordered_set/init-list.cc (test02): New test
    case.


Tested under Linux x86_64.

François
diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index e53cbaf0644..b0d1bc1f08a 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -575,7 +575,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		 const _Hash& __hf = _Hash(),
 		 const key_equal& __eql = key_equal(),
 		 const allocator_type& __a = allocator_type())
-  : _Hashtable(__l.begin(), __l.end(), __bkt_count_hint,
+  : _Hashtable(__l.begin(), __l.end(),
+		   __bkt_count_hint == 0 ? __l.size() : __bkt_count_hint,
 		   __hf, __eql, __a, __unique_keys{})
   { }
 
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index e848ba1d3f7..139d0ec27df 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -969,7 +969,16 @@ namespace __detail
 
   void
   insert(initializer_list __l)
-  { this->insert(__l.begin(), __l.end()); }
+  {
+	__hashtable& __h = _M_conjure_hashtable();
+	if (__h.empty() && __h.bucket_count() == 1)
+	  {
+	__h = __l;
+	return;
+	  }
+
+	this->insert(__l.begin(), __l.end());
+  }
 
   template
 	void
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_set/init-list.cc b/libstdc++-v3/testsuite/23_containers/unordered_set/init-list.cc
index fc11498c718..70789d03e63 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_set/init-list.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_set/init-list.cc
@@ -48,8 +48,27 @@ void test01()
   VERIFY(m.count(1) == 0);
 }
 
+void test02()
+{
+  unordered_set u({ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 });
+  VERIFY( u.size() == 13 );
+  VERIFY( u.count(0) == 1 );
+  VERIFY( u.count(13) == 0 );
+
+  auto bkt_count = u.bucket_count();
+  u.insert({ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 });
+  VERIFY( u.size() == 13 );
+  VERIFY( u.bucket_count() == bkt_count );
+
+  u.clear();
+  u.insert({ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 });
+  VERIFY( u.size() == 13 );
+  VERIFY( u.bucket_count() == bkt_count );
+}
+
 int main()
 {
   __gnu_test::set_memory_limits();
   test01();
+  test02();
 }


[PATCH 2/5][_Hashtable] New method to check current bucket

2022-06-20 Thread François Dumont via Gcc-patches
libstdc++: [_Hashtable] Use next bucket node and equal_to to check if same bucket

To find out if we are still in the same bucket, we can first check that the
current node is not the next bucket's before-begin node, and then that the
hash codes are equal when they are cached. If not, we can also use the
equal_to functor in a multi-container context. As a last resort, compute the
node's bucket index.
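
The order of checks described above can be sketched on a simplified node
type. This is an illustration only: the Node layout, the function name
next_is_in_bucket and the use of a plain modulo for the bucket index are
assumptions, and the equal_to step for multi-containers is left out.

```cpp
#include <cstddef>

// Simplified stand-in for a hash node that caches its hash code.
struct Node
{
  Node*       next;
  std::size_t hash_code;
};

// Returns true if prev->next belongs to bucket `bkt`.
// Preconditions: prev is in bucket `bkt` and prev->next is not null.
inline bool
next_is_in_bucket(std::size_t bkt, const Node* prev,
                  const Node* next_bkt_before_begin,
                  std::size_t bucket_count)
{
  // 1. If prev is the before-begin node of the next bucket, the node that
  //    follows it starts that bucket.
  if (prev == next_bkt_before_begin)
    return false;

  const Node* n = prev->next;

  // 2. Equal cached hash codes imply the same bucket.
  if (prev->hash_code == n->hash_code)
    return true;

  // 3. Last resort: compute the bucket index of the next node.
  return n->hash_code % bucket_count == bkt;
}
```

Only the third step costs a modulo; the first two are pointer and integer
comparisons.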

libstdc++-v3/ChangeLog:

    * include/bits/hashtable_policy.h
    (_Hashtable_base<>::_S_hash_code_equals): New.
    * include/bits/hashtable.h (_Hashtable<>::_M_is_in_bucket): New, use
    latter.
    (_Hashtable<>::_M_find_before_node): Use latter.
    (_Hashtable<>::_M_find_before_node_tr): Likewise.

Tested under Linux x86_64.

François
diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 8318da168e3..e53cbaf0644 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -801,6 +801,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __node_base_ptr
   _M_find_before_node(const key_type&);
 
+  bool
+  _M_is_in_bucket(size_type __bkt, __node_ptr, __node_ptr __n,
+		  true_type /* __uks */) const
+  { return _M_bucket_index(*__n) == __bkt; }
+
+  bool
+  _M_is_in_bucket(size_type __bkt, __node_ptr __prev_n, __node_ptr __n,
+		  false_type /* __uks */) const
+  {
+	return this->_M_key_equals(_ExtractKey{}(__prev_n->_M_v()), *__n)
+	  || _M_bucket_index(*__n) == __bkt;
+  }
+
+  bool
+  _M_is_nxt_in_bucket(size_type __bkt, __node_ptr __prev_n,
+			  __node_base_ptr __nxt_bkt_n) const
+  {
+	if (__prev_n == __nxt_bkt_n)
+	  return false;
+
+	__node_ptr __n = __prev_n->_M_next();
+	if (this->_S_hash_code_equals(*__prev_n, *__n))
+	  return true;
+
+	return _M_is_in_bucket(__bkt, __prev_n, __n, __unique_keys{});
+  }
+
   // Find and insert helper functions and types
   // Find the node before the one matching the criteria.
   __node_base_ptr
@@ -1999,13 +2026,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   if (!__prev_p)
 	return nullptr;
 
+  __node_base_ptr __nxt_bkt_n
+	= __bkt < _M_bucket_count - 1 ? _M_buckets[__bkt + 1] : nullptr;
   for (__node_ptr __p = static_cast<__node_ptr>(__prev_p->_M_nxt);;
 	   __p = __p->_M_next())
 	{
 	  if (this->_M_equals(__k, __code, *__p))
 	return __prev_p;
 
-	  if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
+	  if (!__p->_M_nxt || !_M_is_nxt_in_bucket(__bkt, __p, __nxt_bkt_n))
 	break;
 	  __prev_p = __p;
 	}
@@ -2029,13 +2058,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	if (!__prev_p)
 	  return nullptr;
 
+	__node_base_ptr __nxt_bkt_n
+	  = __bkt < _M_bucket_count - 1 ? _M_buckets[__bkt + 1] : nullptr;
 	for (__node_ptr __p = static_cast<__node_ptr>(__prev_p->_M_nxt);;
 	 __p = __p->_M_next())
 	  {
 	if (this->_M_equals_tr(__k, __code, *__p))
 	  return __prev_p;
 
-	if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
+	if (!__p->_M_nxt || !_M_is_nxt_in_bucket(__bkt, __p, __nxt_bkt_n))
 	  break;
 	__prev_p = __p;
 	  }
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 83a9ff2bb3d..e848ba1d3f7 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -1721,6 +1721,16 @@ namespace __detail
   : __hash_code_base(__hash), _EqualEBO(__eq)
   { }
 
+  static bool
+  _S_hash_code_equals(const _Hash_node_code_cache&,
+			  const _Hash_node_code_cache&)
+  { return false; }
+
+  static bool
+  _S_hash_code_equals(const _Hash_node_code_cache& __lhn,
+			  const _Hash_node_code_cache& __rhn)
+  { return __lhn._M_hash_code == __rhn._M_hash_code; }
+
   bool
   _M_key_equals(const _Key& __k,
 		const _Hash_node_value<_Value,


[PATCH 1/5][_Hashtable] Make more use of user provided hint

2022-06-20 Thread François Dumont via Gcc-patches

libstdc++: [_Hashtable] Make more use of insertion hint

Make use of the user-provided hint iterator in unordered container
operations.

The hint is used:
- As a hint for allocation, to potentially reduce memory fragmentation.
- For unordered_set/unordered_map, we check whether it matches the key of the
element to insert before computing the hash code.
- For unordered_multiset/unordered_multimap, if it is equal to the key of the
element to insert, the hash code is taken from the hint so that we can take
advantage of the potential hash code cache.
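
From the user side, the hint is the const_iterator argument of the standard
insert overloads. A small sketch, assuming nothing beyond the public
std::unordered_multiset interface; the helper name insert_run is
hypothetical, and whether the hint brings any benefit depends on the library
implementation.

```cpp
#include <unordered_set>

// Insert `count` copies of `key`, passing the iterator returned by the
// previous insertion as the hint for the next one, so that each hint refers
// to an element with an equal key.
inline void
insert_run(std::unordered_multiset<int>& ms, int key, int count)
{
  std::unordered_multiset<int>::const_iterator hint = ms.cend();
  for (int i = 0; i < count; ++i)
    hint = ms.insert(hint, key);
}
```

The result is the same as calling insert without a hint; only the work done
by the container may differ.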

libstdc++-v3/ChangeLog:

    * include/bits/hashtable_policy.h (_NodeBuilder<>::_S_build): Add
    _NodePtr template parameter.
    (_ReuseOrAllocNode::operator()): Add __node_ptr parameter.
    (_AllocNode::operator()): Likewise.
    (_Insert_base::try_emplace): Adapt to use hint.
    (_Hashtable_alloc<>::_M_allocate_node(__node_ptr, _Args&&...)): Add
    __node_ptr parameter.
    * include/bits/hashtable.h
    (_Hashtable<>::_Scoped_node<>(__hashtable_alloc*, __node_ptr, _Args&&...)):
    Add __node_ptr parameter.
    (_Hashtable<>::_M_get_hint(size_type, __node_ptr)): New.
    (_Hashtable<>::_M_emplace_unique(const_iterator, _Args&&...)): New.
    (_Hashtable<>::_M_emplace_multi(const_iterator, _Args&&...)): New.
    (_Hashtable<>::_M_emplace()): Adapt to use latter.
    (_Hashtable<>::_M_insert_unique): Add const_iterator parameter.
    (_Hashtable<>::_M_insert(const_iterator, _Arg&&, const _NodeGenerator&, true_type)):
    Use latter.
    (_Hashtable<>::_M_reinsert_node(const_iterator, node_type&&)):
    Add const_iterator parameter, adapt to use it.
    (_Hashtable<>::_M_reinsert_node_multi): Make more use of hint parameter.
    * include/bits/unordered_map.h (unordered_map<>::insert(node_type&&)): Pass
    cend as hint.
    (unordered_map<>::insert(const_iterator, node_type&&)): Adapt to use hint.
    * include/bits/unordered_set.h (unordered_set<>::insert(node_type&&)): Pass
    cend as hint.
    (unordered_set<>::insert(const_iterator, node_type&&)): Adapt to use hint.


Tested under Linux x86_64.

François
diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 1b21b795f89..8318da168e3 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -302,9 +302,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 	// Allocate a node and construct an element within it.
 	template
-	  _Scoped_node(__hashtable_alloc* __h, _Args&&... __args)
+	  _Scoped_node(__hashtable_alloc* __h,
+		   __node_ptr __hint, _Args&&... __args)
 	  : _M_h(__h),
-	_M_node(__h->_M_allocate_node(std::forward<_Args>(__args)...))
+	_M_node(__h->_M_allocate_node(__hint,
+	  std::forward<_Args>(__args)...))
 	  { }
 
 	// Destroy element and deallocate node.
@@ -829,6 +831,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  return nullptr;
 	}
 
+  // Gets a hint after which a node should be allocated given a bucket.
+  __node_ptr
+  _M_get_hint(size_type __bkt, __node_ptr __hint = nullptr) const
+  {
+	__node_base_ptr __node;
+	if (__node = _M_buckets[__bkt])
+	  return __node != &_M_before_begin
+	? static_cast<__node_ptr>(__node) : __hint;
+
+	return __hint;
+  }
+
   // Insert a node at the beginning of a bucket.
   void
   _M_insert_bucket_begin(size_type, __node_ptr);
@@ -860,26 +874,40 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 	std::pair
-	_M_emplace(true_type __uks, _Args&&... __args);
+	_M_emplace_unique(const_iterator, _Args&&... __args);
+
+  template
+	iterator
+	_M_emplace_multi(const_iterator, _Args&&... __args);
+
+  template
+	std::pair
+	_M_emplace(true_type /*__uks*/, _Args&&... __args)
+	{ return _M_emplace_unique(cend(), std::forward<_Args>(__args)...); }
 
   template
 	iterator
-	_M_emplace(false_type __uks, _Args&&... __args)
-	{ return _M_emplace(cend(), __uks, std::forward<_Args>(__args)...); }
+	_M_emplace(false_type /*__uks*/, _Args&&... __args)
+	{ return _M_emplace_multi(cend(), std::forward<_Args>(__args)...); }
 
-  // Emplace with hint, useless when keys are unique.
   template
 	iterator
-	_M_emplace(const_iterator, true_type __uks, _Args&&... __args)
-	{ return _M_emplace(__uks, std::forward<_Args>(__args)...).first; }
+	_M_emplace(const_iterator __hint, true_type /*__uks*/,
+		   _Args&&... __args)
+	{
+	  return _M_emplace_unique(__hint,
+   std::forward<_Args>(__args)...).first;
+	}
 
   template
 	iterator
-	_M_emplace(const_iterator, false_type __uks, _Args&&... __args);
+	_M_emplace(const_iterator __hint, false_type /*__uks*/,
+		   _Args&&... __args)
+	{ return _M_emplace_multi(__hint, std::forward<_Args>(__args)...); }
 
   template
 	std::pair
-	_M_insert_unique(_Kt&&, _Arg&&, const _NodeGenerator&);
+	_M_insert_unique(const_iterator, _Kt&&, _Arg&&, const _NodeGenerator&);
 
   template
 	static __conditional_t<
@@ -899,9 +927,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   

[PATCH] Enhance _Hashtable for range insertion 0/5

2022-06-20 Thread François Dumont via Gcc-patches

Hi

Here is a series of patches to enhance _Hashtable behavior, mostly in the
context of range insertion. I also started considering the problem of memory
fragmentation in this container, with 2 objectives:

- It is easier to find out when you're done with the elements of a bucket if
the last node of bucket N is the before-begin node of bucket N + 1.

- It is faster to loop through the nodes of a bucket if those nodes are close
in memory; ultimately we should have addressof(Node + 1) ==
addressof(Node) + 1.

[1/5] Make more use of user hints as both insertion and allocation hints.

[2/5] Introduce a new method to check if we are still looping through the
same bucket's nodes.

[3/5] Consider that all initializer_list elements are going to be inserted.

[4/5] Introduce a before-begin cache policy to remember which bucket is
currently pointing to it.

[5/5] Preallocate nodes on _Hashtable copy and introduce a new assignment
method which replicates the buckets data structure.


François



[PATCH v1] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-06-20 Thread Noah Goldstein via Gcc-patches
This patch allows strchr(x, c) to be replaced with memchr(x, c,
strlen(x) + 1) if strlen(x) has already been computed earlier in the
tree.

Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821

Since memchr doesn't need to re-find the null terminator it is faster
than strchr.
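
At the source level the replacement amounts to the following sketch. The
helper name strchr_with_known_length is hypothetical; the pass itself works
on GIMPLE, not on source code.

```cpp
#include <cstddef>
#include <cstring>

// Once the length of `s` is known, searching len + 1 bytes with memchr gives
// the same result as strchr. The "+ 1" keeps the terminating null inside the
// searched range, because strchr can also match c == '\0'.
inline const char*
strchr_with_known_length(const char* s, int c, std::size_t len)
{
  return static_cast<const char*>(std::memchr(s, c, len + 1));
}
```

The equivalence only holds while the string is unchanged between the strlen
and the strchr, which is what the tests with an intervening store and no
__restrict are checking.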

bootstrapped and tested on x86_64-linux.

gcc/

* tree-ssa-strlen.cc: Emit memchr instead of strchr if strlen
 already computed.

gcc/testsuite/

* c-c++-common/pr95821-1.c
* c-c++-common/pr95821-2.c
* c-c++-common/pr95821-3.c
* c-c++-common/pr95821-4.c
* c-c++-common/pr95821-5.c
* c-c++-common/pr95821-6.c
---
 gcc/testsuite/c-c++-common/pr95821-1.c | 15 ++
 gcc/testsuite/c-c++-common/pr95821-2.c | 17 +++
 gcc/testsuite/c-c++-common/pr95821-3.c | 17 +++
 gcc/testsuite/c-c++-common/pr95821-4.c | 16 ++
 gcc/testsuite/c-c++-common/pr95821-5.c | 19 +++
 gcc/testsuite/c-c++-common/pr95821-6.c | 18 +++
 gcc/tree-ssa-strlen.cc | 69 +++---
 7 files changed, 152 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-1.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-2.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-3.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-4.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-5.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-6.c

diff --git a/gcc/testsuite/c-c++-common/pr95821-1.c 
b/gcc/testsuite/c-c++-common/pr95821-1.c
new file mode 100644
index 000..e0beb609ea2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+
+char *
+foo (char *s, char c)
+{
+   size_t slen = __builtin_strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   return __builtin_strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-2.c 
b/gcc/testsuite/c-c++-common/pr95821-2.c
new file mode 100644
index 000..5429f0586be
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "memchr" } } */
+
+#include 
+
+char *
+foo (char *s, char c, char * other)
+{
+   size_t slen = __builtin_strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return __builtin_strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-3.c 
b/gcc/testsuite/c-c++-common/pr95821-3.c
new file mode 100644
index 000..bc929c6044b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+
+char *
+foo (char * __restrict s, char c, char * __restrict other)
+{
+   size_t slen = __builtin_strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return __builtin_strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-4.c 
b/gcc/testsuite/c-c++-common/pr95821-4.c
new file mode 100644
index 000..684b41d5b70
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+#include 
+
+char *
+foo (char *s, char c)
+{
+   size_t slen = strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   return strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-5.c 
b/gcc/testsuite/c-c++-common/pr95821-5.c
new file mode 100644
index 000..00c1d93b614
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "memchr" } } */
+
+#include 
+#include 
+
+char *
+foo (char *s, char c, char * other)
+{
+   size_t slen = strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return strchr(s, c);
+}
+int main() {}
diff --git a/gcc/testsuite/c-c++-common/pr95821-6.c 
b/gcc/testsuite/c-c++-common/pr95821-6.c
new file mode 100644
index 000..dec839de5ea
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-6.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+#include 
+
+char *
+foo (char * __restrict s, char c, char * __restrict other)
+{
+   size_t slen = strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return strchr(s, c);
+}
diff --git a/gcc/tree-ssa-strlen.cc b/gcc/tree-ssa-strlen.cc
index 1d4c0f78fbf..d959a530ea0 100644
--- a/gcc/tree-ssa-strlen.cc
+++ b/gcc/tree-ssa-strlen.cc
@@ -2405,9 +2405,12 @@ strlen_pass::handle_builtin_strlen ()
 }
 }
 
-/* Handle a strchr call.  If strlen of the first argument is known, 

Re: [PATCH] Add -fextra-libc-function=memcmpeq for __memcmpeq

2022-06-20 Thread H.J. Lu via Gcc-patches
On Mon, Jun 20, 2022 at 2:39 AM Richard Biener
 wrote:
>
> On Thu, Jun 16, 2022 at 1:38 AM Fangrui Song  wrote:
> >
> > On Wed, Jun 15, 2022 at 2:44 PM H.J. Lu via Gcc-patches
> >  wrote:
> > >
> > > On Mon, Jun 13, 2022 at 9:01 AM Richard Biener
> > >  wrote:
> > > >
> > > >
> > > >
> > > > > Am 13.06.2022 um 16:36 schrieb H.J. Lu :
> > > > >
> > > > > On Mon, Jun 13, 2022 at 3:11 AM Richard Biener
> > > > >  wrote:
> > > > >>
> > > > >>> On Tue, Jun 7, 2022 at 9:02 PM H.J. Lu via Gcc-patches
> > > > >>>  wrote:
> > > > >>>
> > > > >>> Add -fextra-libc-function=memcmpeq to map
> > > > >>>
> > > > >>> extern int __memcmpeq (const void *, const void *, size_t);
> > > > >>>
> > > > >>> which was added to GLIBC 2.35, to __builtin_memcmp_eq.
> > > > >>
> > > > >> Humm.  Can't we instead use the presence of a declaration
> > > > >> of __memcmpeq with a GNU standard dialect as this instead of
> > > > >> adding a weird -fextra-libc-function= option?  Maybe that's even
> > > > >> reasonable with a non-GNU dialect standard in effect since
> > > > >> __ prefixed names are in the implementation namespace?
> > > > >
> > > > > But not all source codes include  and GCC may generate
> > > > > memcmp directly.  How should we handle these cases?
> > > >
> > > > Not.  Similar as to vectorized math functions.
> > > > I think it’s not worth optimizing for this case.
> > >
> > > Another question.  Should we consider any __memcmpeq prototype
> > > or just the one in the system header file?
>
> Any.

Here is the v2 patch:

https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596881.html

> > An idea from https://reviews.llvm.org/D56593#3586673: -fbuiltin-__memcmpeq
> >
> > This requires making -fbuiltin-function available, see
> > https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
> > ("There is no corresponding -fbuiltin-function option")
> >
> > I prefer an option over a magic behavior about whether a declaration exists.
>
> But we already have this behavior for multiple cases.  It's also the only
> way that in practice __memcmpeq will be used - _nobody_ (but maybe
> special crafted SPEC peak runs) will add explicit -fbuiltin-__memcmpeq.
>
> Richard.

Thanks.

-- 
H.J.


[PATCH v2] Enable __memcmpeq after seeing __memcmpeq prototype

2022-06-20 Thread H.J. Lu via Gcc-patches
extern int __memcmpeq (const void *, const void *, size_t);

was added to GLIBC 2.35.  Expand BUILT_IN_MEMCMP_EQ to __memcmpeq
after seeing a __memcmpeq prototype.
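
For illustration, this is the kind of call site where only equality matters.
The helper name bytes_equal is hypothetical; whether a given compiler lowers
such a comparison to a __memcmpeq call depends on this patch, the C library
in use and the optimization level.

```cpp
#include <cstddef>
#include <cstring>

// Only the "== 0" outcome of memcmp is used here, so the ordering
// information that memcmp also computes is not needed by the caller.
inline bool
bytes_equal(const void* a, const void* b, std::size_t n)
{
  return std::memcmp(a, b, n) == 0;
}
```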

gcc/

* builtins.cc (have_memcmpeq_prototype): New.
(expand_builtin): Issue an error for BUILT_IN___MEMCMPEQ if
there is no __memcmpeq prototype.  Expand BUILT_IN_MEMCMP_EQ
to BUILT_IN___MEMCMPEQ if there is a __memcmpeq prototype.
* builtins.def (BUILT_IN___MEMCMPEQ): New.
* builtins.h (have_memcmpeq_prototype): New.

gcc/c/

* c-decl.cc (diagnose_mismatched_decls): Set
have_memcmpeq_prototype to true after seeing __memcmpeq prototype.

gcc/cp/

* decl.cc (duplicate_decls): Set have_memcmpeq_prototype to true
after seeing __memcmpeq prototype.

gcc/testsuite/

* c-c++-common/memcmpeq-1.c: New test.
* c-c++-common/memcmpeq-2.c: Likewise.
* c-c++-common/memcmpeq-3.c: Likewise.
* c-c++-common/memcmpeq-4.c: Likewise.
* c-c++-common/memcmpeq-5.c: Likewise.
* c-c++-common/memcmpeq-6.c: Likewise.
* c-c++-common/memcmpeq.h: Likewise.
---
 gcc/builtins.cc | 17 -
 gcc/builtins.def|  3 +++
 gcc/builtins.h  |  3 +++
 gcc/c/c-decl.cc | 25 ++---
 gcc/cp/decl.cc  |  5 +
 gcc/testsuite/c-c++-common/memcmpeq-1.c | 11 +++
 gcc/testsuite/c-c++-common/memcmpeq-2.c | 11 +++
 gcc/testsuite/c-c++-common/memcmpeq-3.c | 11 +++
 gcc/testsuite/c-c++-common/memcmpeq-4.c | 11 +++
 gcc/testsuite/c-c++-common/memcmpeq-5.c | 11 +++
 gcc/testsuite/c-c++-common/memcmpeq-6.c | 10 ++
 gcc/testsuite/c-c++-common/memcmpeq.h   | 11 +++
 12 files changed, 121 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-1.c
 create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-2.c
 create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-3.c
 create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-4.c
 create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-5.c
 create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-6.c
 create mode 100644 gcc/testsuite/c-c++-common/memcmpeq.h

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 971b18c3745..96e283e5847 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -104,6 +104,9 @@ builtin_info_type builtin_info[(int)END_BUILTINS];
 /* Non-zero if __builtin_constant_p should be folded right away.  */
 bool force_folding_builtin_constant_p;
 
+/* True if there is a __memcmpeq prototype.  */
+bool have_memcmpeq_prototype;
+
 static int target_char_cast (tree, char *);
 static int apply_args_size (void);
 static int apply_result_size (void);
@@ -7392,6 +7395,15 @@ expand_builtin (tree exp, rtx target, rtx subtarget, 
machine_mode mode,
return target;
   break;
 
+case BUILT_IN___MEMCMPEQ:
+  if (!have_memcmpeq_prototype)
+   {
+ error ("use of %<__builtin___memcmpeq ()%> without "
+"%<__memcmpeq%> prototype");
+ return const0_rtx;
+   }
+  break;
+
 /* Expand it as BUILT_IN_MEMCMP_EQ first. If not successful, change it
back to a BUILT_IN_STRCMP. Remember to delete the 3rd parameter
when changing it to a strcmp call.  */
@@ -7445,7 +7457,10 @@ expand_builtin (tree exp, rtx target, rtx subtarget, 
machine_mode mode,
return target;
   if (fcode == BUILT_IN_MEMCMP_EQ)
{
- tree newdecl = builtin_decl_explicit (BUILT_IN_MEMCMP);
+ tree newdecl = builtin_decl_explicit
+   (have_memcmpeq_prototype
+? BUILT_IN___MEMCMPEQ
+: BUILT_IN_MEMCMP);
  TREE_OPERAND (exp, 1) = build_fold_addr_expr (newdecl);
}
   break;
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 005976f34e9..95642c6acdf 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -965,6 +965,9 @@ DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_ALIGN_AND_MAX, 
"__builtin_alloca_with_ali
equality with zero.  */
 DEF_BUILTIN_STUB (BUILT_IN_MEMCMP_EQ, "__builtin_memcmp_eq")
 
+/* Similar to BUILT_IN_MEMCMP_EQ, but is mapped to __memcmpeq.  */
+DEF_EXT_LIB_BUILTIN (BUILT_IN___MEMCMPEQ, "__memcmpeq", 
BT_FN_INT_CONST_PTR_CONST_PTR_SIZE, ATTR_PURE_NOTHROW_NONNULL_LEAF)
+
 /* An internal version of strcmp/strncmp, used when the result is only 
tested for equality with zero.  */
 DEF_BUILTIN_STUB (BUILT_IN_STRCMP_EQ, "__builtin_strcmp_eq")
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 5ad830c9fbf..e3e80b33f6d 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -49,6 +49,9 @@ extern struct target_builtins *this_target_builtins;
 /* Non-zero if __builtin_constant_p should be folded right away.  */
 extern bool force_folding_builtin_constant_p;
 
+/* True if there is a __memcmpeq prototype.  */
+extern bool have_memcmpeq_prototype;
+
 extern bool 

Re: [PATCH] testsuite, asan: Avoid color in asan test output.

2022-06-20 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 20, 2022 at 04:28:21PM +0100, Iain Sandoe wrote:
> The presence of the color markers in some of the asan tests
> appears to confuse the dg-output matching (possibly a platform
> TCL or termios bug) on some Darwin platforms.
> 
> Since the color is not being tested, switch it off (makes the log
> files easier to read too).  This fixes a large number of spurious
> test fails on AVX512 Darwin19.
> 
> tested on x86_64 Darwin / Linux,
> OK for master / backports?
> thanks
> Iain
> 
> Signed-off-by: Iain Sandoe 
> 
> gcc/testsuite/ChangeLog:
> 
>   * lib/asan-dg.exp: Do not apply color to asan output when
>   under test.

Okay.

> diff --git a/gcc/testsuite/lib/asan-dg.exp b/gcc/testsuite/lib/asan-dg.exp
> index 7e0f85dc9b0..87c70d0bebb 100644
> --- a/gcc/testsuite/lib/asan-dg.exp
> +++ b/gcc/testsuite/lib/asan-dg.exp
> @@ -111,6 +111,8 @@ proc asan_init { args } {
>  global asan_saved_TEST_ALWAYS_FLAGS
>  global asan_saved_ALWAYS_CXXFLAGS
>  
> +setenv ASAN_OPTIONS "color=never"
> +
>  set link_flags ""
>  if ![is_remote host] {
>   if [info exists TOOL_OPTIONS] {
> -- 
> 2.24.3 (Apple Git-128)

Jakub



[PATCH] testsuite, asan: Avoid color in asan test output.

2022-06-20 Thread Iain Sandoe via Gcc-patches
The presence of the color markers in some of the asan tests
appears to confuse the dg-output matching (possibly a platform
TCL or termios bug) on some Darwin platforms.

Since the color is not being tested, switch it off (makes the log
files easier to read too).  This fixes a large number of spurious
test fails on AVX512 Darwin19.

tested on x86_64 Darwin / Linux,
OK for master / backports?
thanks
Iain

Signed-off-by: Iain Sandoe 

gcc/testsuite/ChangeLog:

* lib/asan-dg.exp: Do not apply color to asan output when
under test.
---
 gcc/testsuite/lib/asan-dg.exp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/lib/asan-dg.exp b/gcc/testsuite/lib/asan-dg.exp
index 7e0f85dc9b0..87c70d0bebb 100644
--- a/gcc/testsuite/lib/asan-dg.exp
+++ b/gcc/testsuite/lib/asan-dg.exp
@@ -111,6 +111,8 @@ proc asan_init { args } {
 global asan_saved_TEST_ALWAYS_FLAGS
 global asan_saved_ALWAYS_CXXFLAGS
 
+setenv ASAN_OPTIONS "color=never"
+
 set link_flags ""
 if ![is_remote host] {
if [info exists TOOL_OPTIONS] {
-- 
2.24.3 (Apple Git-128)



[pushed] testsuite, Darwin: Skip an unsupported test.

2022-06-20 Thread Iain Sandoe via Gcc-patches
Darwin does not support patchable function entries, skip the test
there.

tested on x86_64 darwin / linux, pushed to master, thanks,
Iain

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr105169_a.C: Skip the test on Darwin.
* g++.dg/modules/pr105169_b.C: Likewise.
---
 gcc/testsuite/g++.dg/modules/pr105169_a.C | 2 +-
 gcc/testsuite/g++.dg/modules/pr105169_b.C | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/modules/pr105169_a.C 
b/gcc/testsuite/g++.dg/modules/pr105169_a.C
index 02660b3a0e4..e4ec855ab49 100644
--- a/gcc/testsuite/g++.dg/modules/pr105169_a.C
+++ b/gcc/testsuite/g++.dg/modules/pr105169_a.C
@@ -1,4 +1,4 @@
-/* { dg-module-do link } */
+/* { dg-module-do link { target { ! *-*-darwin* } } } */
 /* { dg-options "-std=c++11 -fpatchable-function-entry=2 -O2" } */
 /* { dg-additional-options "-std=c++11 -fpatchable-function-entry=2 -O2" } */
 
diff --git a/gcc/testsuite/g++.dg/modules/pr105169_b.C 
b/gcc/testsuite/g++.dg/modules/pr105169_b.C
index 7a9c5863a6a..afbb12927e4 100644
--- a/gcc/testsuite/g++.dg/modules/pr105169_b.C
+++ b/gcc/testsuite/g++.dg/modules/pr105169_b.C
@@ -1,4 +1,4 @@
-/* { dg-module-do link } */
+/* { dg-module-do link { target { ! *-*-darwin* } } } */
 /* { dg-options "-std=c++11 -fpatchable-function-entry=2 -O2" } */
 /* { dg-additional-options "-std=c++11 -fpatchable-function-entry=2 -O2" } */
 
-- 
2.24.3 (Apple Git-128)



[pushed] testsuite, Darwin: Allow for two CTOR bodies in array61 test.

2022-06-20 Thread Iain Sandoe via Gcc-patches
For targets without alias support, we emit two essentially identical function
bodies into the gimple (complete and base CTORs). So this test needs to allow
for that when the target does not support aliases.  The target support alias
test does not seem to be usable in the context of a single scan-tree-dump so
the fix here uses the target designation.

Note that the array has 10 elements, so that if the test were failing (because
we were emitting 10 inits instead of a loop) the count would be expected to
exceed 2 on Darwin, and 1 where there's alias support.

tested on x86_64 darwin / linux, pushed to master, thanks,
Iain

Signed-off-by: Iain Sandoe 

gcc/testsuite/ChangeLog:

* g++.dg/init/array61.C: Allow for two CTOR bodies on Darwin, where
aliases are not currently supported.
---
 gcc/testsuite/g++.dg/init/array61.C | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/init/array61.C 
b/gcc/testsuite/g++.dg/init/array61.C
index eaf535c2546..c8f82b9f155 100644
--- a/gcc/testsuite/g++.dg/init/array61.C
+++ b/gcc/testsuite/g++.dg/init/array61.C
@@ -1,7 +1,8 @@
 // PR c++/92385
 // { dg-do compile { target c++11 } }
 // { dg-additional-options -fdump-tree-gimple }
-// { dg-final { scan-tree-dump-times "item::item" 1 "gimple" } }
+// { dg-final { scan-tree-dump-times "item::item" 1 "gimple" { target { ! 
*-*-darwin* } } } }
+// { dg-final { scan-tree-dump-times "item::item" 2 "gimple" { target { 
*-*-darwin* } } } }
 
 struct item {
   int i;
-- 
2.24.3 (Apple Git-128)



[committed] arm: more testsuite fallout for mve move-immediate changes

2022-06-20 Thread Richard Earnshaw via Gcc-patches

Unfortunately, there is more fall-out in the testsuite for my changes
to use MVE move-immediate operations instead of literal pool loads.
Fixed as follows:

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/mve-vcmp-f32-2.c: Adjust expected output.
* gcc.target/arm/simd/pr100757.c: Likewise.
* gcc.target/arm/simd/pr100757-2.c: Likewise.
* gcc.target/arm/simd/pr100757-3.c: Likewise.
* gcc.target/arm/simd/pr100757-4.c: Likewise.
---
 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c |  6 --
 gcc/testsuite/gcc.target/arm/simd/pr100757-2.c |  9 ++---
 gcc/testsuite/gcc.target/arm/simd/pr100757-3.c |  9 ++---
 gcc/testsuite/gcc.target/arm/simd/pr100757-4.c | 10 +++---
 gcc/testsuite/gcc.target/arm/simd/pr100757.c   |  9 ++---
 5 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
index 917a95bf141..2440cef267e 100644
--- a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
@@ -28,5 +28,7 @@ FUNC(>=, vcmpge)
 /* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } } */
-/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 24 } } */ /* Constant 2.0f.  */
-/* { dg-final { scan-assembler-times {\t.word\t1077936128\n} 24 } } */ /* Constant 3.0f.  */
+/* { dg-final { scan-assembler-times {\tvmov\.f32\tq[0-7], #2\.0e\+0  @ v4sf} 6 } } */
+/* { dg-final { scan-assembler-not {\t.word\t1073741824\n} } } */ /* Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\tvmov\.f32\tq[0-7], #3\.0e\+0  @ v4sf} 6 } } */
+/* { dg-final { scan-assembler-not {\t.word\t1077936128\n} } } */ /* Constant 3.0f.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c b/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
index c2262b4d81e..21426fee370 100644
--- a/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
@@ -13,8 +13,11 @@ int fn1(int d) {
   return c;
 }
 
-/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 4 } } */ /* Constant 2.0f.  */
-/* { dg-final { scan-assembler-times {\t.word\t4\n} 4 } } */ /* Initial value for c.  */
-/* { dg-final { scan-assembler-times {\t.word\t5\n} 4 } } */ /* Possible value for c.  */
+/* { dg-final { scan-assembler-times {\tvmov\.f32\tq[0-7], #2\.0e\+0  @ v4sf} 1 } } */
+/* { dg-final { scan-assembler-not {\t.word\t1073741824\n} } } */
+/* { dg-final { scan-assembler-times {\tvmov\.i32\tq[0-7], #0x4  @ v4si} 1 } } */
+/* { dg-final { scan-assembler-not {\t.word\t4\n} } } */
+/* { dg-final { scan-assembler-times {\tvmov\.i32\tq[0-7], #0x5  @ v4si} 1 } } */
+/* { dg-final { scan-assembler-not {\t.word\t5\n} } } */
 /* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
 /* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c b/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
index e604555c04c..1640a447ee5 100644
--- a/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
@@ -13,8 +13,11 @@ float fn1(int d) {
   return c;
 }
 
-/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 4 } } */ /* Constant 2.0f.  */
-/* { dg-final { scan-assembler-times {\t.word\t1084227584\n} 4 } } */ /* Initial value for c (4.0).  */
-/* { dg-final { scan-assembler-times {\t.word\t1082130432\n} 4 } } */ /* Possible value for c (5.0).  */
+/* { dg-final { scan-assembler-times {\tvmov\.f32\tq[0-7], #2\.0e\+0  @ v4sf} 1 } } */
+/* { dg-final { scan-assembler-not {\t.word\t1073741824\n} } } */
+/* { dg-final { scan-assembler-times {\tvmov\.f32\tq[0-7], #4\.0e\+0  @ v4sf} 1 } } */
+/* { dg-final { scan-assembler-not {\t.word\t1084227584\n} } } */
+/* { dg-final { scan-assembler-times {\tvmov\.f32\tq[0-7], #5\.0e\+0  @ v4sf} 1 } } */
+/* { dg-final { scan-assembler-not {\t.word\t1082130432\n} } } */
 /* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
 /* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c b/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
index c12040c517f..7431494d62d 100644
--- a/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
@@ -13,7 +13,11 @@ int fn1(int d) {
   return c;
 }
 
-/* { dg-final { scan-assembler-times {\t.word\t0\n} 4 } } */ /* 'false' mask.  */
+
+/* { dg-final { scan-assembler-times {\tvmov\.i32\tq[0-7], #0  @ v4si} 1 } } */
+/* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
 /* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */

[pushed] testsuite: Add a missing USER_LABEL_PREFIX to a regex.

2022-06-20 Thread Iain Sandoe via Gcc-patches
Fixes this test on Darwin.
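
As an illustration only (not part of the patch): Darwin prefixes C-level
symbols with an underscore, so a scan pattern has to allow an optional
leading "_".  The sketch below transcribes the old and new patterns to POSIX
extended regexes and tries them on two sample assembler lines.  The helper
name line_matches and the sample lines are assumptions made for this
example; they are not taken from the testsuite or from real compiler output.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Patterns before and after the fix, transcribed to POSIX ERE.
   "\t" is a literal tab inside the bracket expression.  */
const char old_pattern[] = ".(quad|long)[ \t]+_ZGIW3Bar";
const char new_pattern[] = ".(quad|long)[ \t]+_?_ZGIW3Bar";

/* Hypothetical assembler lines: ELF-style (no prefix) and Darwin-style
   (USER_LABEL_PREFIX "_").  */
const char elf_line[] = "\t.quad\t_ZGIW3Bar";
const char darwin_line[] = "\t.quad\t__ZGIW3Bar";

/* Return 1 if LINE contains a match for the extended regex PATTERN,
   0 otherwise (also 0 if PATTERN fails to compile).  */
int
line_matches (const char *pattern, const char *line)
{
  regex_t re;
  int matched;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  matched = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}

/* The old pattern only matches the unprefixed form; the new one
   matches both.  */
void
check_label_prefix_patterns (void)
{
  assert (line_matches (old_pattern, elf_line));
  assert (!line_matches (old_pattern, darwin_line));
  assert (line_matches (new_pattern, elf_line));
  assert (line_matches (new_pattern, darwin_line));
}
```

Calling check_label_prefix_patterns () exercises all four combinations.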

tested on x86_64 darwin / linux, pushed to master, thanks
Iain

Signed-off-by: Iain Sandoe 

gcc/testsuite/ChangeLog:

* g++.dg/modules/init-2_b.C: Add a missing USER_LABEL_PREFIX
to a regex.
---
 gcc/testsuite/g++.dg/modules/init-2_b.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/modules/init-2_b.C b/gcc/testsuite/g++.dg/modules/init-2_b.C
index a98e67616a2..e30a555e39a 100644
--- a/gcc/testsuite/g++.dg/modules/init-2_b.C
+++ b/gcc/testsuite/g++.dg/modules/init-2_b.C
@@ -8,4 +8,4 @@ import Foo;
 // There should be an idempotency check
 // { dg-final { scan-assembler {_ZZ9_ZGIW3BarE9__in_chrg} } }
 // { dg-final { scan-assembler {call[ \t]+_?_ZGIW3Foo} { target i?86-*-* x86_64-*-* } } }
-// { dg-final { scan-assembler {.(quad|long)[ \t]+_ZGIW3Bar} { target i?86-*-* x86_64-*-* } } }
+// { dg-final { scan-assembler {.(quad|long)[ \t]+_?_ZGIW3Bar} { target i?86-*-* x86_64-*-* } } }
-- 
2.24.3 (Apple Git-128)



[pushed] testsuite: Require init_priority target support in a test.

2022-06-20 Thread Iain Sandoe via Gcc-patches
The attr-cdtor-1 test fails on targets without init priority since the
diagnostic emitted concerns the absence of support.  Disable the test
on such targets.
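
For readers unfamiliar with the feature, a minimal sketch (not part of the
patch) of what init priority support provides: on targets that have it,
a constructor with a smaller priority number runs before one with a larger
number, and priorities up to 100 are reserved for the implementation.  The
function and variable names below are made up for this example.

```c
#include <assert.h>

/* Record of the order in which the constructors ran.  Both objects are
   zero-initialized before any constructor executes.  */
int init_order[2];
int init_count;

/* Declared first, but given the larger priority number, so it is
   expected to run second on targets with init priority support.  */
__attribute__ ((constructor (102))) void
second_ctor (void)
{
  init_order[init_count++] = 102;
}

/* Smaller priority number: expected to run first.  */
__attribute__ ((constructor (101))) void
first_ctor (void)
{
  init_order[init_count++] = 101;
}

/* Return 1 if the constructors ran in priority order.  */
int
ctors_ran_in_priority_order (void)
{
  return init_count == 2 && init_order[0] == 101 && init_order[1] == 102;
}
```

On a target without init priority support the compiler diagnoses the
priority argument instead, which is why the test above is now restricted
to init_priority targets.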

tested on x86_64 darwin / linux, pushed to master, thanks
Iain

Signed-off-by: Iain Sandoe 

gcc/testsuite/ChangeLog:

* c-c++-common/attr-cdtor-1.c: Require init_priority support.
---
 gcc/testsuite/c-c++-common/attr-cdtor-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/c-c++-common/attr-cdtor-1.c b/gcc/testsuite/c-c++-common/attr-cdtor-1.c
index ea61336c404..a0d069b014d 100644
--- a/gcc/testsuite/c-c++-common/attr-cdtor-1.c
+++ b/gcc/testsuite/c-c++-common/attr-cdtor-1.c
@@ -1,5 +1,5 @@
 /* PR c/90658 */
-/* { dg-do compile } */
+/* { dg-do compile { target init_priority } } */
 
 void f ();
 void g1 () __attribute__ ((constructor(f))); /* { dg-error "priorities must be integers" } */
-- 
2.24.3 (Apple Git-128)



Re: [PATCH] i386: Add syscall to enable AMX for latest kernels

2022-06-20 Thread Uros Bizjak via Gcc-patches
On Mon, Jun 20, 2022 at 10:04 AM Haochen Jiang  wrote:
>
> From: "Jiang, Haochen" 
>
> Hi all,
>
> A syscall is needed to enable AMX on kernels >= 5.4.  It is missing from
> the current AMX tests, which causes them to fail.

So this new code is only valid for linux & co?

Uros.
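
One possible way to confine the permission request to Linux, sketched here
purely as an illustration of the question above (it is not what the patch
does): guard the syscall with preprocessor checks and fall back to a fixed
answer elsewhere.  The constants are copied from the quoted patch; the
helper xtile_permitted and the choice to return 1 on other systems are
assumptions of this sketch.

```c
#include <assert.h>

#ifdef __linux__
#include <sys/syscall.h>
#include <unistd.h>
#endif

#define XFEATURE_XTILECFG	17
#define XFEATURE_XTILEDATA	18
#define XFEATURE_MASK_XTILECFG	(1UL << XFEATURE_XTILECFG)
#define XFEATURE_MASK_XTILEDATA	(1UL << XFEATURE_XTILEDATA)
#define XFEATURE_MASK_XTILE \
  (XFEATURE_MASK_XTILECFG | XFEATURE_MASK_XTILEDATA)

#ifndef ARCH_GET_XCOMP_PERM
#define ARCH_GET_XCOMP_PERM	0x1022
#endif
#ifndef ARCH_REQ_XCOMP_PERM
#define ARCH_REQ_XCOMP_PERM	0x1023
#endif

/* Same mask test as in the quoted patch: nonzero if any of the tile
   component bits is set in BITMASK.  */
int
xtile_permitted (unsigned long bitmask)
{
  return (bitmask & XFEATURE_MASK_XTILE) != 0;
}

/* Ask the kernel for permission to use the AMX tile data state.
   Returns 1 if the tests may run, 0 otherwise.  */
int
request_perm_xtile_data (void)
{
#if defined (__linux__) && defined (SYS_arch_prctl)
  unsigned long bitmask = 0;

  if (syscall (SYS_arch_prctl, ARCH_REQ_XCOMP_PERM,
	       (unsigned long) XFEATURE_XTILEDATA)
      || syscall (SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &bitmask))
    return 0;

  return xtile_permitted (bitmask);
#else
  /* Assumption of this sketch: no permission protocol exists on this
     system, so the CPUID checks in main alone decide.  */
  return 1;
#endif
}
```

On a Linux machine without AMX, or with a kernel that does not know the
request, the first syscall fails and the function returns 0, so the test
body is skipped.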

>
> This patch adds the missing syscall to fix the failures.
>
> BRs,
> Haochen
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/amx-check.h (request_perm_xtile_data):
> New function to check if AMX is usable and enable AMX.
> (main): Run test if AMX is usable.
> ---
>  gcc/testsuite/gcc.target/i386/amx-check.h | 24 +++
>  1 file changed, 24 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.target/i386/amx-check.h b/gcc/testsuite/gcc.target/i386/amx-check.h
> index 434b0e59703..92ed8669304 100644
> --- a/gcc/testsuite/gcc.target/i386/amx-check.h
> +++ b/gcc/testsuite/gcc.target/i386/amx-check.h
> @@ -4,11 +4,22 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #ifdef DEBUG
>  #include 
>  #endif
>  #include "cpuid.h"
>
> +#define XFEATURE_XTILECFG  17
> +#define XFEATURE_XTILEDATA 18
> +#define XFEATURE_MASK_XTILECFG (1 << XFEATURE_XTILECFG)
> +#define XFEATURE_MASK_XTILEDATA (1 << XFEATURE_XTILEDATA)
> +#define XFEATURE_MASK_XTILE (XFEATURE_MASK_XTILECFG | XFEATURE_MASK_XTILEDATA)
> +
> +#define ARCH_GET_XCOMP_PERM 0x1022
> +#define ARCH_REQ_XCOMP_PERM 0x1023
> +
>  /* TODO: The tmm emulation is temporary for current
> AMX implementation with no tmm regclass, should
> be changed in the future. */
> @@ -44,6 +55,18 @@ typedef struct __tile
>  /* Stride (colum width in byte) used for tileload/store */
>  #define _STRIDE 64
>
> +/* We need syscall to use amx functions */
> +int request_perm_xtile_data()
> +{
> +  unsigned long bitmask;
> +
> +  if (syscall (SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA) ||
> +  syscall (SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &bitmask))
> +return 0;
> +
> +  return (bitmask & XFEATURE_MASK_XTILE) != 0;
> +}
> +
>  /* Initialize tile config by setting all tmm size to 16x64 */
>  void init_tile_config (__tilecfg_u *dst)
>  {
> @@ -186,6 +209,7 @@ main ()
>  #ifdef AMX_BF16
>&& __builtin_cpu_supports ("amx-bf16")
>  #endif
> +  && request_perm_xtile_data ()
>)
>  {
>DO_TEST ();
> --
> 2.18.2
>


Re: PING^1 [PATCH] i386: Disallow sibcall when calling ifunc functions with PIC register

2022-06-20 Thread Uros Bizjak via Gcc-patches
On Mon, Jun 20, 2022 at 4:03 PM H.J. Lu  wrote:
>
> On Tue, Jun 14, 2022 at 12:25 PM H.J. Lu  wrote:
> >
> > Disallow sibcall when calling ifunc functions with PIC register so that
> > PIC register can be restored.
> >
> > gcc/
> >
> > PR target/105960
> > * config/i386/i386.cc (ix86_function_ok_for_sibcall): Return
> > false if PIC register is used when calling ifunc functions.
> >
> > gcc/testsuite/
> >
> > PR target/105960
> > * gcc.target/i386/pr105960.c: New test.

LGTM.

Thanks,
Uros.

> > ---
> >  gcc/config/i386/i386.cc  |  9 +
> >  gcc/testsuite/gcc.target/i386/pr105960.c | 19 +++
> >  2 files changed, 28 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105960.c
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 3d189e124e4..1ca7836e11e 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -1015,6 +1015,15 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
> > }
> >  }
> >
> > +  if (decl && ix86_use_pseudo_pic_reg ())
> > +{
> > +  /* When PIC register is used, it must be restored after ifunc
> > +function returns.  */
> > +   cgraph_node *node = cgraph_node::get (decl);
> > +   if (node && node->ifunc_resolver)
> > +return false;
> > +}
> > +
> > >/* Otherwise okay.  That also includes certain types of indirect calls.  */
> >return true;
> >  }
> > diff --git a/gcc/testsuite/gcc.target/i386/pr105960.c b/gcc/testsuite/gcc.target/i386/pr105960.c
> > new file mode 100644
> > index 000..db137a1642d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr105960.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-ifunc "" } */
> > +/* { dg-options "-O2 -fpic" } */
> > +
> > +__attribute__((target_clones("default","fma")))
> > +static inline double
> > +expfull_ref(double x)
> > +{
> > +  return __builtin_pow(x, 0.1234);
> > +}
> > +
> > +double
> > +exp_ref(double x)
> > +{
> > +  return expfull_ref(x);
> > +}
> > +
> > +/* { dg-final { scan-assembler "jmp\[ \t\]*expfull_ref@PLT" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "call\[ \t\]*expfull_ref@PLT" { target ia32 } } } */
> > --
> > 2.36.1
> >
>
> PING.
>
> --
> H.J.


[PATCH v4] tree-optimization/94899: Remove "+ 0x80000000" in int comparisons

2022-06-20 Thread Arjun Shankar via Gcc-patches
Expressions of the form "X + CST < Y + CST" where:

* CST is an unsigned integer constant with only the MSB set, and
* X and Y's types have integer conversion ranks <= CST's

can be simplified to "(signed) X < (signed) Y".

This is because, assuming 32-bit signed numbers,
(unsigned) INT_MIN + 0x80000000 is 0, and
(unsigned) INT_MAX + 0x80000000 is UINT_MAX.

i.e. the result increases monotonically with signed input.

This means:
((signed) X < (signed) Y) iff (X + 0x80000000 < Y + 0x80000000)
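
A small standalone check of this identity for the 32-bit case, given only as
an illustration (it is not part of the patch or its testcase).  The function
names and the sample values are chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSB32 UINT32_C (0x80000000)

/* Left-hand side of the identity: plain signed comparison.  */
int
signed_less (int32_t x, int32_t y)
{
  return x < y;
}

/* Right-hand side: add the MSB-only constant to both operands with
   wrapping unsigned arithmetic, then compare as unsigned.  */
int
biased_less (int32_t x, int32_t y)
{
  uint32_t bx = (uint32_t) ((uint32_t) x + MSB32);
  uint32_t by = (uint32_t) ((uint32_t) y + MSB32);
  return bx < by;
}

/* Compare both sides on every pair drawn from a few samples that
   include the extremes.  Returns 1 if they always agree.  */
int
identity_holds_on_samples (void)
{
  static const int32_t samples[] = {
    INT32_MIN, INT32_MIN + 1, -2, -1, 0, 1, 2, INT32_MAX - 1, INT32_MAX
  };
  const size_t n = sizeof samples / sizeof samples[0];

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      if (signed_less (samples[i], samples[j])
	  != biased_less (samples[i], samples[j]))
	return 0;
  return 1;
}
```

Adding the constant maps INT32_MIN to 0 and INT32_MAX to UINT32_MAX, so the
unsigned order of the biased values is the signed order of the inputs.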

gcc/
* match.pd (X + C < Y + C -> (signed) X < (signed) Y, if C is
0x80000000): New simplification.
gcc/testsuite/
* gcc.dg/pr94899.c: New test.
---
 gcc/match.pd   | 13 +
 gcc/testsuite/gcc.dg/pr94899.c | 49 ++
 2 files changed, 62 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr94899.c
---
v3: https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596785.html

Notes on v4, based on Richard and Jakub's review comments:

Richard wrote:

> It might be possible to test for zero + or - operations instead?

OK. That seems more fool-proof. I've made the change.

Jakub wrote:

> Can't one just omit the INTEGER_CST part on the second @0?

I hadn't thought of that. Done!

> As a follow-up, it might be useful to make it work for vector integral types
> too,
> typedef unsigned V __attribute__((vector_size (4 * sizeof (int))));
> #define M __INT_MAX__ + 1U
> V foo (V x, V y)
> {
>   return x + (V) { M, M, M, M } < y + (V) { M, M, M, M };
> }
> using uniform_integer_cst_p.

OK. This syntax is unfamiliar to me. I'll read a bit and then try to work on
a follow-up. Thanks!

diff --git a/gcc/match.pd b/gcc/match.pd
index a63b649841b..4a570894b2e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2089,6 +2089,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
&& TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
(op @0 @1))))
+
+/* As a special case, X + C < Y + C is the same as (signed) X < (signed) Y
+   when C is an unsigned integer constant with only the MSB set, and X and
+   Y have types of equal or lower integer conversion rank than C's.  */
+(for op (lt le ge gt)
+ (simplify
+  (op (plus @1 INTEGER_CST@0) (plus @2 @0))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+   && TYPE_UNSIGNED (TREE_TYPE (@0))
+   && wi::only_sign_bit_p (wi::to_wide (@0)))
+   (with { tree stype = signed_type_for (TREE_TYPE (@0)); }
+(op (convert:stype @1) (convert:stype @2))))))
+
 /* For equality and subtraction, this is also true with wrapping overflow.  */
 (for op (eq ne minus)
  (simplify
diff --git a/gcc/testsuite/gcc.dg/pr94899.c b/gcc/testsuite/gcc.dg/pr94899.c
new file mode 100644
index 000..2fc7009a2e7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr94899.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+typedef __INT16_TYPE__ int16_t;
+typedef __INT32_TYPE__ int32_t;
+typedef __UINT16_TYPE__ uint16_t;
+typedef __UINT32_TYPE__ uint32_t;
+
+#define MAGIC (~ (uint32_t) 0 / 2 + 1)
+
+int
+f_i16_i16 (int16_t x, int16_t y)
+{
+  return x + MAGIC < y + MAGIC;
+}
+
+int
+f_i16_i32 (int16_t x, int32_t y)
+{
+  return x + MAGIC < y + MAGIC;
+}
+
+int
+f_i32_i32 (int32_t x, int32_t y)
+{
+  return x + MAGIC < y + MAGIC;
+}
+
+int
+f_u32_i32 (uint32_t x, int32_t y)
+{
+  return x + MAGIC < y + MAGIC;
+}
+
+int
+f_u32_u32 (uint32_t x, uint32_t y)
+{
+  return x + MAGIC < y + MAGIC;
+}
+
+int
+f_i32_i32_sub (int32_t x, int32_t y)
+{
+  return x - MAGIC < y - MAGIC;
+}
+
+/* The addition/subtraction of constants should be optimized away.  */
+/* { dg-final { scan-tree-dump-not "\\+" "optimized"} } */
+/* { dg-final { scan-tree-dump-not "\\-" "optimized"} } */
-- 
2.35.3



PING^1 [PATCH] i386: Disallow sibcall when calling ifunc functions with PIC register

2022-06-20 Thread H.J. Lu via Gcc-patches
On Tue, Jun 14, 2022 at 12:25 PM H.J. Lu  wrote:
>
> Disallow sibcall when calling ifunc functions with PIC register so that
> PIC register can be restored.
>
> gcc/
>
> PR target/105960
> * config/i386/i386.cc (ix86_function_ok_for_sibcall): Return
> false if PIC register is used when calling ifunc functions.
>
> gcc/testsuite/
>
> PR target/105960
> * gcc.target/i386/pr105960.c: New test.
> ---
>  gcc/config/i386/i386.cc  |  9 +
>  gcc/testsuite/gcc.target/i386/pr105960.c | 19 +++
>  2 files changed, 28 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr105960.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 3d189e124e4..1ca7836e11e 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -1015,6 +1015,15 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
> }
>  }
>
> +  if (decl && ix86_use_pseudo_pic_reg ())
> +{
> +  /* When PIC register is used, it must be restored after ifunc
> +function returns.  */
> +   cgraph_node *node = cgraph_node::get (decl);
> +   if (node && node->ifunc_resolver)
> +return false;
> +}
> +
>/* Otherwise okay.  That also includes certain types of indirect calls.  */
>return true;
>  }
> diff --git a/gcc/testsuite/gcc.target/i386/pr105960.c b/gcc/testsuite/gcc.target/i386/pr105960.c
> new file mode 100644
> index 000..db137a1642d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr105960.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-require-ifunc "" } */
> +/* { dg-options "-O2 -fpic" } */
> +
> +__attribute__((target_clones("default","fma")))
> +static inline double
> +expfull_ref(double x)
> +{
> +  return __builtin_pow(x, 0.1234);
> +}
> +
> +double
> +exp_ref(double x)
> +{
> +  return expfull_ref(x);
> +}
> +
> +/* { dg-final { scan-assembler "jmp\[ \t\]*expfull_ref@PLT" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "call\[ \t\]*expfull_ref@PLT" { target ia32 } } } */
> --
> 2.36.1
>

PING.

-- 
H.J.


Re: [PATCH] c: Extend the -Wpadded message with actual padding size

2022-06-20 Thread Vit Kabele
I fixed the formatting and added the test.

The first member of the test struct is 32 bits wide, so the test should
work on both 32-bit and 64-bit architectures, even without the aligned
attribute.

If there is a better way to write the test with proper formatting
(i.e. not on a single line), please let me know.

-- >8 --
Subject: [PATCH] c: Extend the -Wpadded message with actual padding size

When the compiler warns about padding a struct to an alignment boundary, it
now also reports the number of padding bytes, i.e. the size of the member
that would have to be added to get rid of the warning.

This removes the need to use pahole or similar tools, or to determine the
padding size manually.
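
For comparison, the manual computation that the extended message makes
unnecessary can be sketched as follows.  This is an illustration only; the
struct mirrors the one in the new test, and the function name is made up.
The value 3 assumes that a 32-bit integer is 4-byte aligned, which holds on
common ABIs but is not guaranteed everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the struct in the new test.  */
struct S { uint32_t i; char c; };

/* Tail padding: total size minus the offset just past the last member.  */
size_t
tail_padding_of_S (void)
{
  return sizeof (struct S) - (offsetof (struct S, c) + sizeof (char));
}
```

With 4-byte alignment for uint32_t, sizeof (struct S) is 8 and the last
member ends at offset 5, so the function returns 3, matching the
"with 3 bytes" text expected by the test.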

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:

* stor-layout.cc (finalize_record_size): Extend warning message.

gcc/testsuite/ChangeLog:

* c-c++-common/Wpadded.c: New test.

Signed-off-by: Vit Kabele 
---
 gcc/stor-layout.cc   |  7 ++-
 gcc/testsuite/c-c++-common/Wpadded.c | 10 ++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/Wpadded.c

diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc
index 765f22f68b9..88923c4136b 100644
--- a/gcc/stor-layout.cc
+++ b/gcc/stor-layout.cc
@@ -1781,7 +1781,12 @@ finalize_record_size (record_layout_info rli)
   && simple_cst_equal (unpadded_size, TYPE_SIZE (rli->t)) == 0
   && input_location != BUILTINS_LOCATION
   && !TYPE_ARTIFICIAL (rli->t))
-warning (OPT_Wpadded, "padding struct size to alignment boundary");
+  {
+   tree pad_size
+ = size_binop (MINUS_EXPR, TYPE_SIZE_UNIT (rli->t), unpadded_size_unit);
+ warning (OPT_Wpadded,
+   "padding struct size to alignment boundary with %E bytes", pad_size);
+  }
 
   if (warn_packed && TREE_CODE (rli->t) == RECORD_TYPE
   && TYPE_PACKED (rli->t) && ! rli->packed_maybe_necessary
diff --git a/gcc/testsuite/c-c++-common/Wpadded.c b/gcc/testsuite/c-c++-common/Wpadded.c
new file mode 100644
index 000..e8f1044a36b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wpadded.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-Wpadded" } */
+
+/*
+ * The struct is on single line, because C++ compiler emits the -Wpadded
+ * warning at the first line of the struct, while the C compiler at the last
+ * line of the struct definition. This way the test passes on both
+ */
+struct S { __UINT32_TYPE__ i; char c; }; /* { dg-warning "padding struct size to alignment boundary with 3 bytes" } */
+
-- 
2.30.2


Re: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.

2022-06-20 Thread Richard Sandiford via Gcc-patches
Tamar Christina via Gcc-patches  writes:
>> -Original Message-
>> From: Richard Biener 
>> Sent: Monday, June 20, 2022 12:56 PM
>> To: Tamar Christina 
>> Cc: Andrew Pinski via Gcc-patches ; nd
>> 
>> Subject: RE: [PATCH]middle-end Add optimized float addsub without
>> needing VEC_PERM_EXPR.
>> 
>> On Mon, 20 Jun 2022, Tamar Christina wrote:
>> 
>> > > -Original Message-
>> > > From: Richard Biener 
>> > > Sent: Saturday, June 18, 2022 11:49 AM
>> > > To: Andrew Pinski via Gcc-patches 
>> > > Cc: Tamar Christina ; nd 
>> > > Subject: Re: [PATCH]middle-end Add optimized float addsub without
>> > > needing VEC_PERM_EXPR.
>> > >
>> > >
>> > >
>> > > > On 17.06.2022 at 22:34, Andrew Pinski via Gcc-patches wrote:
>> > > >
>> > > > On Thu, Jun 16, 2022 at 3:59 AM Tamar Christina via Gcc-patches
>> > > >  wrote:
>> > > >>
>> > > >> Hi All,
>> > > >>
>> > > >> For IEEE 754 floating point formats we can replace a sequence of
>> > > >> alternating +/- with an fneg of a wider type followed by an fadd.
>> > > >> This eliminates the need for a permutation.  This patch adds a
>> > > >> match.pd rule to recognize and do this rewriting.
>> > > >
>> > > > I don't think this is correct. You don't check the format of the
>> > > > floating point to make sure this is valid (e.g. REAL_MODE_FORMAT's
>> > > > signbit_rw/signbit_ro field).
>> >
>> > Yes I originally had this check, but I wondered whether it would be needed.
>> > I'm not aware of any vector ISA where the 32-bit and 16-bit floats
>> > don't follow the IEEE data layout and semantics here.
>> >
>> > My preference would be to ask the target about the data format of its
>> > vector floating-point types, because strictly speaking I don't think
>> > there needs to be a direct correlation between the scalar and vector
>> > formats.  But I know Richi won't like that, so the check is probably
>> > the most likely outcome.
>> >
>> > > > Also would just be better if you do the xor in integer mode (using
>> > > > signbit_rw field for the correct bit)?
>> > > > And then making sure the target optimizes the xor to the neg
>> > > > instruction when needed?
>> >
>> > I don't really see the advantage of this one. It's not removing an
>> > instruction and it's assuming the vector ISA can do integer ops on a
>> > floating point vector cheaply.  Since match.pd doesn't have the
>> > ability to do costing I'd rather not do this.
>> >
>> > > I’m also worried about using FP operations for the negate here.
>> > > When @1 is constant do we still constant fold this correctly?
>> >
>> > We never did constant folding for this case, the folding
>> > infrastructure doesn't know how to fold the VEC_PERM_EXPR.  So even
>> > with @0 and @1 constant no folding takes place even today if we vectorize.
>> >
>> > >
>> > > For costing purposes it would be nice to make this visible to the
>> > > vectorizer.
>> > >
>> >
>> > I initially wanted to use VEC_ADDSUB for this, but noticed it didn't
>> > trigger in a number of places where I had expected it to.  While looking
>> > into it I noticed that is because it follows the x86 instruction
>> > semantics, so I left it alone.
>> >
>> > It felt like adding a third pattern here might be confusing.  However, I
>> > can also use the SLP pattern matcher to rewrite it without an optab, if
>> > you prefer that.
>> >
>> > The operations will then be costed normally.
>> >
>> > > Also is this really good for all targets?  Can there be issues with
>> > > reformatting when using FP ops as in your patch or with using
>> > > integer XOR as suggested making this more expensive than the blend?
>> >
>> > I don't think so with the FP ops alone, since it uses two FP ops before
>> > the change and two FP ops after it, and I can't imagine that a target
>> > would have a slow -a.
>> 
>> Wouldn't a target need to re-check if lanes are NaN or denormal if after a
>> SFmode lane operation a DFmode lane operation follows?  IIRC that is what
>> usually makes punning "integer" vectors as FP vectors costly.
>
> I guess this really depends on the target.
>
>> 
>> Note one option would be to emit a multiply with { 1, -1, 1, -1 } on GIMPLE
>> where then targets could opt-in to handle this via a DFmode negate via a
>> combine pattern?  Not sure if this can be even done starting from the vec-
>> perm RTL IL.
>>
>
> But multiplies can be so expensive that the VEC_PERM_EXPR would still be
> better.  At least as you say, the target costed for that. 
>
>> I fear whether (neg:V2DF (subreg:V2DF (reg:V4SF))) is a good idea will
>> heavily depend on the target CPU (not only the ISA).  For RISC-V for example
>> I think the DF lanes do not overlap with two SF lanes (so same with gcn I
>> think).
>
> Right, so I think the conclusion is I need to move this to the backend.

I wouldn't go that far :)  It just means that it needs more conditions.

E.g. whether the subreg is cheap should be testable via MODES_TIEABLE_P.
I don't think that macro should be true for modes 

[PATCH] middle-end/106027 - fix types in needle folding

2022-06-20 Thread Richard Biener via Gcc-patches
The fold_to_nonsharp_ineq_using_bound folding ends up creating invalid
typed IL which confuses later foldings.  The following fixes that.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-06-20  Richard Biener  

PR middle-end/106027
* fold-const.cc (fold_to_nonsharp_ineq_using_bound): Use the
type of the prevailing comparison for the new comparison type.
(fold_binary_loc): Use proper types for the A < X && A + 1 > Y
to A < X && A >= Y folding.

* gcc.dg/pr106027.c: New testcase.
---
 gcc/fold-const.cc   | 10 +++---
 gcc/testsuite/gcc.dg/pr106027.c |  8 
 2 files changed, 15 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr106027.c

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index fbdf3c824af..99021a82df4 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -7530,7 +7530,7 @@ tree_swap_operands_p (const_tree arg0, const_tree arg1)
 static tree
 fold_to_nonsharp_ineq_using_bound (location_t loc, tree ineq, tree bound)
 {
-  tree a, typea, type = TREE_TYPE (ineq), a1, diff, y;
+  tree a, typea, type = TREE_TYPE (bound), a1, diff, y;
 
   if (TREE_CODE (bound) == LT_EXPR)
 a = TREE_OPERAND (bound, 0);
@@ -12037,11 +12037,15 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type,
{
  tem = fold_to_nonsharp_ineq_using_bound (loc, arg0, arg1);
  if (tem && !operand_equal_p (tem, arg0, 0))
-   return fold_build2_loc (loc, code, type, tem, arg1);
+   return fold_convert (type,
+fold_build2_loc (loc, code, TREE_TYPE (arg1),
+ tem, arg1));
 
  tem = fold_to_nonsharp_ineq_using_bound (loc, arg1, arg0);
  if (tem && !operand_equal_p (tem, arg1, 0))
-   return fold_build2_loc (loc, code, type, arg0, tem);
+   return fold_convert (type,
+fold_build2_loc (loc, code, TREE_TYPE (arg0),
+ arg0, tem));
}
 
   if ((tem = fold_truth_andor (loc, code, type, arg0, arg1, op0, op1))
diff --git a/gcc/testsuite/gcc.dg/pr106027.c b/gcc/testsuite/gcc.dg/pr106027.c
new file mode 100644
index 000..735205fb252
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr106027.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+int
+foo (unsigned int x, int y)
+{
+  return x <= (((y != y) < 0) ? y < 1 : 0);
+}
-- 
2.35.3


Re: [PATCH 3/3] lto-plugin: implement LDPT_GET_API_VERSION

2022-06-20 Thread Martin Liška
On 6/20/22 11:35, Richard Biener wrote:
> I think this is OK.  Can we get buy-in from mold people?

Sure, I've just pinged Rui:
https://github.com/rui314/mold/issues/454#issuecomment-1160419030

Martin


Re: [PATCH] vect: Respect slp decision when applying suggested uf [PR105940]

2022-06-20 Thread Kewen.Lin via Gcc-patches
on 2022/6/20 15:47, Richard Biener wrote:
> On Fri, Jun 17, 2022 at 12:53 PM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> This follows Richi's suggestion in PR105940, it aims to avoid
>> inconsistent slp decision between when the suggested unroll
>> factor is worked out and when the suggested unroll factor is
>> applied.
>>
>> If the previous slp decision is true when the suggested unroll
>> factor is worked out, when we are applying unroll factor we
>> don't need to start over with slp off if the analysis with slp
>> on fails.  On the other hand, if the previous slp decision is
>> false when the suggested unroll factor is worked out, when we
>> are applying unroll factor we can skip the slp handlings.
>>
>> Function vect_is_simple_reduction saves reduction chains for
>> subsequent slp analyses, we have to disable this early otherwise
>> there is an ICE in vectorizable_reduction for below:
>>
>>   if (REDUC_GROUP_FIRST_ELEMENT (stmt_info))
>> gcc_assert (slp_node
>> && REDUC_GROUP_FIRST_ELEMENT (stmt_info)
>>== stmt_info);
> 
> We ensure this by either decomposing the group in vect_analyze_slp
> if the reduction chain doesn't SLP or when we re-try without SLP
> by not re-trying:
> 
>   /* If there are reduction chains re-trying will fail anyway.  */
>   if (! LOOP_VINFO_REDUCTION_CHAINS (loop_vinfo).is_empty ())
> return ok;
>

Yeah, thanks for the pointer.  I put one alternative in the PR that uses
the undo (decomposing) approach in vect_analyze_loop_2, but I thought
passing the slp flag down looks better, as it avoids the wasted effort.

>> Bootstrapped and regtested on x86_64-redhat-linux,
>> powerpc64{,le}-linux-gnu and aarch64-linux-gnu.
>>
>> Also tested with SPEC2017 build with some rs6000 hacking.
>>
>> Is it ok for trunk?
> 
> OK.
> 

Thanks Richi!  Committed as r13-1173.

BR,
Kewen


RE: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.

2022-06-20 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Monday, June 20, 2022 12:56 PM
> To: Tamar Christina 
> Cc: Andrew Pinski via Gcc-patches ; nd
> 
> Subject: RE: [PATCH]middle-end Add optimized float addsub without
> needing VEC_PERM_EXPR.
> 
> On Mon, 20 Jun 2022, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Saturday, June 18, 2022 11:49 AM
> > > To: Andrew Pinski via Gcc-patches 
> > > Cc: Tamar Christina ; nd 
> > > Subject: Re: [PATCH]middle-end Add optimized float addsub without
> > > needing VEC_PERM_EXPR.
> > >
> > >
> > >
> > > > On 17.06.2022 at 22:34, Andrew Pinski via Gcc-patches wrote:
> > > >
> > > > On Thu, Jun 16, 2022 at 3:59 AM Tamar Christina via Gcc-patches
> > > >  wrote:
> > > >>
> > > >> Hi All,
> > > >>
> > > >> For IEEE 754 floating point formats we can replace a sequence of
> > > >> alternating +/- with an fneg of a wider type followed by an fadd.
> > > >> This eliminates the need for a permutation.  This patch adds a
> > > >> match.pd rule to recognize and do this rewriting.
> > > >
> > > > I don't think this is correct. You don't check the format of the
> > > > floating point to make sure this is valid (e.g. REAL_MODE_FORMAT's
> > > > signbit_rw/signbit_ro field).
> >
> > Yes I originally had this check, but I wondered whether it would be needed.
> > I'm not aware of any vector ISA where the 32-bit and 16-bit floats
> > don't follow the IEEE data layout and semantics here.
> >
> > My preference would be to ask the target about the data format of its
> > vector floating-point types, because strictly speaking I don't think
> > there needs to be a direct correlation between the scalar and vector
> > formats.  But I know Richi won't like that, so the check is probably
> > the most likely outcome.
> >
> > > > Also would just be better if you do the xor in integer mode (using
> > > > signbit_rw field for the correct bit)?
> > > > And then making sure the target optimizes the xor to the neg
> > > > instruction when needed?
> >
> > I don't really see the advantage of this one. It's not removing an
> > instruction and it's assuming the vector ISA can do integer ops on a
> > floating point vector cheaply.  Since match.pd doesn't have the
> > ability to do costing I'd rather not do this.
> >
> > > I’m also worried about using FP operations for the negate here.
> > > When @1 is constant do we still constant fold this correctly?
> >
> > We never did constant folding for this case, the folding
> > infrastructure doesn't know how to fold the VEC_PERM_EXPR.  So even
> > with @0 and @1 constant no folding takes place even today if we vectorize.
> >
> > >
> > > For costing purposes it would be nice to make this visible to the
> > > vectorizer.
> > >
> >
> > > I initially wanted to use VEC_ADDSUB for this, but noticed it didn't
> > > trigger in a number of places I had expected it to. While looking into
> > > it I noticed it's because this follows the x86 instruction semantics, so
> > > I left it alone.
> > >
> > > It felt like adding a third pattern here might be confusing. However I
> > > can also use the SLP pattern matcher to rewrite it without an optab if
> > > you prefer that?
> >
> > The operations will then be costed normally.
> >
> > > Also is this really good for all targets?  Can there be issues with
> > > reformatting when using FP ops as in your patch or with using
> > > integer XOR as suggested making this more expensive than the blend?
> >
> > > I don't think so with the FP ops alone, since it's using two FP ops
> > > already and still two FP ops after the change, and I can't imagine
> > > that a target would have a slow -a.
> 
> Wouldn't a target need to re-check if lanes are NaN or denormal if after a
> SFmode lane operation a DFmode lane operation follows?  IIRC that is what
> usually makes punning "integer" vectors as FP vectors costly.

I guess this really depends on the target.

> 
> Note one option would be to emit a multiply with { 1, -1, 1, -1 } on GIMPLE
> where then targets could opt-in to handle this via a DFmode negate via a
> combine pattern?  Not sure if this can be even done starting from the
> vec-perm RTL IL.
>

But multiplies can be so expensive that the VEC_PERM_EXPR would still be
better.  At least as you say, the target costed for that. 

> I fear whether (neg:V2DF (subreg:V2DF (reg:V4SF))) is a good idea will
> heavily depend on the target CPU (not only the ISA).  For RISC-V for example
> I think the DF lanes do not overlap with two SF lanes (so same with gcn I
> think).

Right, so I think the conclusion is I need to move this to the backend.

Thanks,
Tamar

> 
> Richard.
> 
> > > The XOR one I wouldn't do, as the vector int and vector float could
> > > for instance be in different register files or FP be a co-processor
> > > etc.  Mixing FP and Integer ops in this case I can imagine can lead to
> > > something suboptimal.  Also for targets with masking/predication the
> > > VEC_PERM_EXPR could potentially be

RE: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.

2022-06-20 Thread Richard Biener via Gcc-patches
On Mon, 20 Jun 2022, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener 
> > Sent: Saturday, June 18, 2022 11:49 AM
> > To: Andrew Pinski via Gcc-patches 
> > Cc: Tamar Christina ; nd 
> > Subject: Re: [PATCH]middle-end Add optimized float addsub without
> > needing VEC_PERM_EXPR.
> > 
> > 
> > 
> > > > On 17.06.2022 at 22:34, Andrew Pinski via Gcc-patches wrote:
> > >
> > > On Thu, Jun 16, 2022 at 3:59 AM Tamar Christina via Gcc-patches
> > >  wrote:
> > >>
> > >> Hi All,
> > >>
> > > >> For IEEE 754 floating point formats we can replace a sequence of
> > > >> alternating +/- with fneg of a wider type followed by an fadd.  This
> > > >> eliminates the need for using a permutation.  This patch adds a
> > > >> match.pd rule to recognize and do this rewriting.
> > >
> > > I don't think this is correct. You don't check the format of the
> > > floating point to make sure this is valid (e.g. REAL_MODE_FORMAT's
> > > signbit_rw/signbit_ro field).
> 
> Yes I originally had this check, but I wondered whether it would be needed.
> I'm not aware of any vector ISA where the 32-bit and 16-bit floats don't
> follow the IEEE data layout and semantics here.
>
> My preference would be to ask the target about the data format of its vector
> floating points, because strictly speaking I don't think there needs to be a
> direct correlation between the scalar and vector formats.  But I know Richi
> won't like that, so the check is probably the most likely outcome.
> 
> > > Also would just be better if you do the xor in integer mode (using
> > > signbit_rw field for the correct bit)?
> > > And then making sure the target optimizes the xor to the neg
> > > instruction when needed?
> 
> I don't really see the advantage of this one. It's not removing an instruction
> and it's assuming the vector ISA can do integer ops on a floating point vector
> cheaply.  Since match.pd doesn't have the ability to do costing I'd rather not
> do this.
> 
> > I’m also worried about using FP operations for the negate here.  When @1 is
> > constant do we still constant fold this correctly?
> 
> We never did constant folding for this case, the folding infrastructure
> doesn't know how to fold the VEC_PERM_EXPR.  So even with @0 and @1 constant
> no folding takes place even today if we vectorize.
> 
> > 
> > For costing purposes it would be nice to make this visible to the
> > vectorizer.
> > 
> 
> I initially wanted to use VEC_ADDSUB for this, but noticed it didn't trigger
> in a number of places I had expected it to. While looking into it I noticed
> it's because this follows the x86 instruction semantics, so I left it alone.
>
> It felt like adding a third pattern here might be confusing. However I can
> also use the SLP pattern matcher to rewrite it without an optab if you
> prefer that?
> 
> The operations will then be costed normally.
> 
> > Also is this really good for all targets?  Can there be issues with
> > reformatting when using FP ops as in your patch or with using integer XOR
> > as suggested making this more expensive than the blend?
> 
> I don't think so with the FP ops alone, since it's using two FP ops already
> and still two FP ops after the change, and I can't imagine that a target
> would have a slow -a.

Wouldn't a target need to re-check if lanes are NaN or denormal if after
a SFmode lane operation a DFmode lane operation follows?  IIRC that is
what usually makes punning "integer" vectors as FP vectors costly.

Note one option would be to emit a multiply with { 1, -1, 1, -1 } on
GIMPLE where then targets could opt-in to handle this via a DFmode
negate via a combine pattern?  Not sure if this can be even done
starting from the vec-perm RTL IL.
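
As a scalar illustration of the multiply variant mentioned above, here is a minimal sketch. The function name addsub_mul is made up for this example and nothing here is taken from the patch; it only shows that multiplying the second operand by the pattern { 1, -1, 1, -1 } and then adding yields the same add/sub sequence as the loop in the patch description.

```c
#include <stddef.h>

/* Hypothetical helper, not from the patch: computes
   res[i] = a[i] + b[i] for even i and a[i] - b[i] for odd i
   by multiplying b with the constant pattern { 1, -1, 1, -1, ... }.  */
static void
addsub_mul (const float *a, const float *b, float *res, size_t n)
{
  static const float sign[2] = { 1.0f, -1.0f };

  for (size_t i = 0; i < n; i++)
    res[i] = a[i] + b[i] * sign[i & 1];
}
```

Whether a target can then turn the multiply back into a wider negate is exactly the open question raised above; the sketch says nothing about cost.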

I fear whether (neg:V2DF (subreg:V2DF (reg:V4SF))) is a good idea
will heavily depend on the target CPU (not only the ISA).  For RISC-V
for example I think the DF lanes do not overlap with two SF lanes
(so same with gcn I think).

Richard.

> The XOR one I wouldn't do, as the vector int and vector float could for
> instance be in different register files or FP be a co-processor etc.  Mixing
> FP and Integer ops in this case I can imagine can lead to something
> suboptimal.  Also for targets with masking/predication the VEC_PERM_EXPR
> could potentially be lowered to a mask/predicate in the backend, whereas
> that is far less likely with the XOR approach.
> 
> Thanks,
> Tamar
> 
> > 
> > Richard.
> > 
> > > Thanks,
> > > Andrew Pinski
> > >
> > >
> > >
> > >>
> > >> For
> > >>
> > >> void f (float *restrict a, float *restrict b, float *res, int n) {
> > >>   for (int i = 0; i < (n & -4); i+=2)
> > >>{
> > >>  res[i+0] = a[i+0] + b[i+0];
> > >>  res[i+1] = a[i+1] - b[i+1];
> > >>}
> > >> }
> > >>
> > >> we generate:
> > >>
> > >> .L3:
> > >>ldr q1, [x1, x3]
> > >>ldr q0, [x0, x3]
> > > >>fneg v1.2d, v1.2d
> > > >>fadd v0.4s, v0.4s, v1.4s
> > >>str q0, [x2, x3]
> > >>   

Re: [PATCH RFA] ubsan: do return check with -fsanitize=unreachable

2022-06-20 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 17, 2022 at 05:20:02PM -0400, Jason Merrill wrote:
> Related to PR104642, the current situation where we get less return checking
> with just -fsanitize=unreachable than no sanitize flags seems undesirable; I
> propose that we do return checking when -fsanitize=unreachable.

__builtin_unreachable itself (unless turned into trap or
__ubsan_handle_builtin_unreachable) is not any kind of return checking, it
is just an optimization.

> Looks like clang just traps on missing return if not -fsanitize=return, but
> the approach in this patch seems more helpful to me if we're already
> sanitizing other should-be-unreachable code.
> 
> I'm assuming that the difference in treatment of SANITIZE_UNREACHABLE and
> SANITIZE_RETURN with regard to loop optimization is deliberate.

return and unreachable are separate sanitizers and such silent one way
implication can have quite unexpected consequences, especially with
-fsanitize-trap=.
Say with -fsanitize=unreachable -fsanitize-trap=unreachable, both current
trunk and clang will link without -lubsan, because the only enabled UBSan
sanitizers use __builtin_trap () which doesn't need a library.
With -fsanitize=unreachable silently meaning -fsanitize=unreachable,return
the above would link in -lubsan, because while SANITIZE_UNREACHABLE uses
__builtin_trap, SANITIZE_RETURN doesn't.
Similarly, there is the no_sanitize attribute: one could mark a certain
function with __attribute__((no_sanitize ("unreachable"))) and, because only
-fsanitize=unreachable is used on the command line, assume that no other
sanitizers are enabled in it, but the silent addition of the return
sanitizer would break that.
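
For readers unfamiliar with the attribute, a minimal sketch of the usage pattern described above follows. The function sign_of is invented for this example; it only shows how a single function opts out of the unreachable sanitizer, and says nothing about how the return sanitizer would interact with it.

```c
/* Hypothetical example: with -fsanitize=unreachable on the command line,
   this one function asks not to be instrumented for unreachable code.  */
__attribute__ ((no_sanitize ("unreachable")))
static int
sign_of (int x)
{
  if (x > 0)
    return 1;
  if (x < 0)
    return -1;
  if (x == 0)
    return 0;
  /* All cases are handled above, so this point is never reached.  */
  __builtin_unreachable ();
}
```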

> --- a/gcc/cp/cp-gimplify.cc
> +++ b/gcc/cp/cp-gimplify.cc
> @@ -1806,18 +1806,6 @@ cp_maybe_instrument_return (tree fndecl)
>|| !targetm.warn_func_return (fndecl))
>  return;
>  
> -  if (!sanitize_flags_p (SANITIZE_RETURN, fndecl)
> -  /* Don't add __builtin_unreachable () if not optimizing, it will not
> -  improve any optimizations in that case, just break UB code.
> -  Don't add it if -fsanitize=unreachable -fno-sanitize=return either,
> -  UBSan covers this with ubsan_instrument_return above where sufficient
> -  information is provided, while the __builtin_unreachable () below
> -  if return sanitization is disabled will just result in hard to
> -  understand runtime error without location.  */
> -  && (!optimize
> -   || sanitize_flags_p (SANITIZE_UNREACHABLE, fndecl)))
> -return;
> -
>tree t = DECL_SAVED_TREE (fndecl);
>while (t)
>  {

I think the above is correct, if -fsanitize=return, we want to fall through
and use __ubsan_handle_missing_return (or __builtin_trap if
-fsanitize-trap=return).
Otherwise, for -O0, __builtin_unreachable most likely doesn't offer any
important optimization benefits and just makes debugging bad code harder.
Similarly for -fsanitize=unreachable, the __builtin_unreachable there would
be an optimization which we shouldn't turn into
__ubsan_handle_builtin_unreachable / __builtin_trap.

Now, -funreachable-traps can of course change the condition a little bit,
and so can implementation of builtin_decl_unreachable and stopping of
folding of __builtin_unreachable to __builtin_trap if -fsanitize=unreachable
-fsanitize-trap=unreachable.

The -fsanitize=return case remains the same no matter what.

Otherwise, if -funreachable-traps, we are emitting __builtin_trap rather
than __builtin_unreachable, so it is perfectly fine to fall through
regardless of !optimize or SANITIZE_UNREACHABLE being on, it isn't an
optimization in that case, but checking.

Otherwise, if !optimize, we should return, __builtin_unreachable in there
wouldn't bring many advantages and just punish users of bad code.

Otherwise, if builtin_decl_unreachable is implemented and we never fold
__builtin_unreachable to __builtin_trap, for SANITIZE_UNREACHABLE
enabled and (flag_sanitize_trap & SANITIZE_UNREACHABLE) != 0 we could
emit __builtin_unreachable (but in that case directly, not through
builtin_decl_unreachable).

Otherwise, if SANITIZE_UNREACHABLE is on and
(flag_sanitize_trap & SANITIZE_UNREACHABLE) == 0, I assume we'll still
want to fold __builtin_unreachable to __ubsan_handle_builtin_unreachable
during sanopt etc., we can live without the optimization and not instrument.

Otherwise emit __builtin_unreachable (directly).
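
The cascade above can be summarized as a small standalone decision function. This is only a sketch of the ordering described in this mail, not GCC code: the enum, the struct and the function name are invented here, and the fourth case assumes that builtin_decl_unreachable is implemented and that __builtin_unreachable is never folded to __builtin_trap.

```c
#include <stdbool.h>

/* What to place at the end of a function that may fall off without
   returning.  Illustrative names, not GCC identifiers.  */
enum missing_return_action
{
  ACTION_NONE,          /* Leave the function alone.  */
  ACTION_UBSAN_RETURN,  /* __ubsan_handle_missing_return, or a trap with
			   -fsanitize-trap=return.  */
  ACTION_TRAP,          /* __builtin_trap.  */
  ACTION_UNREACHABLE    /* __builtin_unreachable, emitted directly.  */
};

struct return_flags
{
  bool sanitize_return;       /* -fsanitize=return */
  bool unreachable_traps;     /* -funreachable-traps */
  bool optimize;              /* optimizing, i.e. not -O0 */
  bool sanitize_unreachable;  /* -fsanitize=unreachable */
  bool trap_unreachable;      /* -fsanitize-trap=unreachable */
};

static enum missing_return_action
choose_action (const struct return_flags *f)
{
  if (f->sanitize_return)
    return ACTION_UBSAN_RETURN;
  if (f->unreachable_traps)
    return ACTION_TRAP;   /* Checking, not an optimization.  */
  if (!f->optimize)
    return ACTION_NONE;   /* No benefit, only harder debugging.  */
  if (f->sanitize_unreachable)
    /* Assumption stated above: only safe to emit when the builtin is
       never folded into a trap.  */
    return f->trap_unreachable ? ACTION_UNREACHABLE : ACTION_NONE;
  return ACTION_UNREACHABLE;
}
```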

Jakub



Re: [PATCH 2/3] lto-plugin: make claim_file_handler thread-safe

2022-06-20 Thread Martin Liška
On 6/20/22 11:32, Richard Biener wrote:
> On Thu, Jun 16, 2022 at 9:01 AM Martin Liška  wrote:
>>
>> lto-plugin/ChangeLog:
>>
>> * lto-plugin.c (plugin_lock): New lock.
>> (claim_file_handler): Use mutex for critical section.
>> (onload): Initialize mutex.
>> ---
>>  lto-plugin/lto-plugin.c | 16 +++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
>> index 00b760636dc..13118c4983c 100644
>> --- a/lto-plugin/lto-plugin.c
>> +++ b/lto-plugin/lto-plugin.c
>> @@ -55,6 +55,7 @@ along with this program; see the file COPYING3.  If not see
>>  #include 
>>  #include 
>>  #include 
>> +#include 
> 
> Not sure if we support any non-pthread target for building the LTO
> plugin, but it
> seems we have
> 
>   # Among non-ELF, only Windows platforms support the lto-plugin so far.
>   # Build it unless LTO was explicitly disabled.
>   case $target in
> *-cygwin* | *-mingw*) build_lto_plugin=$enable_lto ;;
> 
> which suggests that at least build validating the above with --enable-lto

Verified that it's fine.

> 
> IIRC we have gthr-*.h in libgcc/, not sure if that's usable in a
> host linker plugin.
> 
>>  #ifdef HAVE_SYS_WAIT_H
>>  #include 
>>  #endif
>> @@ -157,6 +158,9 @@ enum symbol_style
>>ss_uscore,   /* Underscore prefix all symbols.  */
>>  };
>>
>> +/* Plug-in mutex.  */
>> +static pthread_mutex_t plugin_lock;
>> +
>>  static char *arguments_file_name;
>>  static ld_plugin_register_claim_file register_claim_file;
>>  static ld_plugin_register_all_symbols_read register_all_symbols_read;
>> @@ -1262,15 +1266,18 @@ claim_file_handler (const struct 
>> ld_plugin_input_file *file, int *claimed)
>>   lto_file.symtab.syms);
>>check (status == LDPS_OK, LDPL_FATAL, "could not add symbols");
>>
>> +  pthread_mutex_lock (&plugin_lock);
>>num_claimed_files++;
>>claimed_files =
>> xrealloc (claimed_files,
>>   num_claimed_files * sizeof (struct plugin_file_info));
>>claimed_files[num_claimed_files - 1] = lto_file;
>> +  pthread_mutex_unlock (&plugin_lock);
>>
>>*claimed = 1;
>>  }
>>
>> +  pthread_mutex_lock (&plugin_lock);
>>if (offload_files == NULL)
>>  {
>>/* Add dummy item to the start of the list.  */
>> @@ -1333,11 +1340,12 @@ claim_file_handler (const struct 
>> ld_plugin_input_file *file, int *claimed)
>> offload_files_last_lto = ofld;
>>num_offload_files++;
>>  }
>> +  pthread_mutex_unlock (&plugin_lock);
>>
>>goto cleanup;
>>
>>   err:
>> -  non_claimed_files++;
>> +  __atomic_fetch_add (&non_claimed_files, 1, __ATOMIC_RELAXED);
> 
> is it worth "optimizing" this with yet another need for target specific 
> support
> (just use pthread_mutex here as well?)

Sure.

May I install the patch with the change?
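
A minimal sketch of what the agreed change could look like, guarding the counter with the plug-in mutex instead of an atomic builtin. The helper name note_non_claimed_file, the counter's type and the static initializer are assumptions made for this standalone example; the patch itself initializes the mutex with pthread_mutex_init in onload and updates the counter inline.

```c
#include <pthread.h>

/* Stand-ins for the plug-in's globals.  The static initializer is used
   here only to keep the example self-contained.  */
static pthread_mutex_t plugin_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int non_claimed_files;

/* Hypothetical helper: count one file that was not claimed, under the
   same lock that protects the other shared plug-in state.  */
static void
note_non_claimed_file (void)
{
  pthread_mutex_lock (&plugin_lock);
  non_claimed_files++;
  pthread_mutex_unlock (&plugin_lock);
}
```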

Cheers,
Martin

> 
>>free (lto_file.name);
>>
>>   cleanup:
>> @@ -1415,6 +1423,12 @@ onload (struct ld_plugin_tv *tv)
>>struct ld_plugin_tv *p;
>>enum ld_plugin_status status;
>>
>> +  if (pthread_mutex_init (&plugin_lock, NULL) != 0)
>> +{
>> +  fprintf (stderr, "mutex init failed\n");
>> +  abort ();
>> +}
>> +
>>p = tv;
>>while (p->tv_tag)
>>  {
>> --
>> 2.36.1
>>
>>
From 12fb5f8fbb283313f9d5bcadb24c45904128804d Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 16 May 2022 14:18:41 +0200
Subject: [PATCH 1/2] lto-plugin: make claim_file_handler thread-safe

lto-plugin/ChangeLog:

	* lto-plugin.c (plugin_lock): New lock.
	(claim_file_handler): Use mutex for critical section.
	(onload): Initialize mutex.
---
 lto-plugin/lto-plugin.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index 00b760636dc..2d95b7b3803 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -55,6 +55,7 @@ along with this program; see the file COPYING3.  If not see
 #include 
 #include 
 #include 
+#include 
 #ifdef HAVE_SYS_WAIT_H
 #include 
 #endif
@@ -157,6 +158,9 @@ enum symbol_style
   ss_uscore,	/* Underscore prefix all symbols.  */
 };
 
+/* Plug-in mutex.  */
+static pthread_mutex_t plugin_lock;
+
 static char *arguments_file_name;
 static ld_plugin_register_claim_file register_claim_file;
 static ld_plugin_register_all_symbols_read register_all_symbols_read;
@@ -1262,15 +1266,18 @@ claim_file_handler (const struct ld_plugin_input_file *file, int *claimed)
 			  lto_file.symtab.syms);
   check (status == LDPS_OK, LDPL_FATAL, "could not add symbols");
 
+  pthread_mutex_lock (&plugin_lock);
   num_claimed_files++;
   claimed_files =
 	xrealloc (claimed_files,
 		  num_claimed_files * sizeof (struct plugin_file_info));
   claimed_files[num_claimed_files - 1] = lto_file;
+  pthread_mutex_unlock (&plugin_lock);
 
   *claimed = 1;
 }
 
+  pthread_mutex_lock (&plugin_lock);
   if (offload_files == NULL)
 {
   /* Add dummy item to the start of the list.  */
@@ 

RE: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.

2022-06-20 Thread Tamar Christina via Gcc-patches
> -----Original Message-----
> From: Richard Biener 
> Sent: Saturday, June 18, 2022 11:49 AM
> To: Andrew Pinski via Gcc-patches 
> Cc: Tamar Christina ; nd 
> Subject: Re: [PATCH]middle-end Add optimized float addsub without
> needing VEC_PERM_EXPR.
> 
> 
> 
> > On 17.06.2022 at 22:34, Andrew Pinski via Gcc-patches wrote:
> >
> > On Thu, Jun 16, 2022 at 3:59 AM Tamar Christina via Gcc-patches
> >  wrote:
> >>
> >> Hi All,
> >>
> >> For IEEE 754 floating point formats we can replace a sequence of
> >> alternating +/- with fneg of a wider type followed by an fadd.  This
> >> eliminates the need for using a permutation.  This patch adds a
> >> match.pd rule to recognize and do this rewriting.
> >
> > I don't think this is correct. You don't check the format of the
> > floating point to make sure this is valid (e.g. REAL_MODE_FORMAT's
> > signbit_rw/signbit_ro field).

Yes I originally had this check, but I wondered whether it would be needed.
I'm not aware of any vector ISA where the 32-bit and 16-bit floats don't follow
the IEEE data layout and semantics here.
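
To make the layout assumption concrete, here is a scalar sketch of what the rewrite relies on. It assumes a little-endian target with IEEE 754 binary32 floats; the function names are invented for this example. An integer XOR stands in for the wider fneg purely to make the bit-level effect visible, it is not how the patch implements the negate.

```c
#include <stdint.h>
#include <string.h>

/* Emulates negating one 64-bit lane that holds two floats.  On a
   little-endian IEEE 754 target, bit 63 of the lane is the sign bit of
   the float at the higher address (pair[1]); pair[0] is untouched.  */
static void
negate_odd_lane (float pair[2])
{
  uint64_t bits;

  memcpy (&bits, pair, sizeof bits);
  bits ^= UINT64_C (1) << 63;
  memcpy (pair, &bits, sizeof bits);
}

/* res[i] = a[i] + b[i] for even i, a[i] - b[i] for odd i; N must be even.
   The subtraction is done as "negate odd lanes of b, then add".  */
static void
addsub_via_neg (const float *a, const float *b, float *res, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      float t[2] = { b[i], b[i + 1] };

      negate_odd_lane (t);
      res[i] = a[i] + t[0];
      res[i + 1] = a[i + 1] + t[1];
    }
}
```

On a big-endian layout the same 64-bit sign bit would land in the even lane instead, which is one reason the format check discussed here matters.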

My preference would be to ask the target about the data format of its vector
floating points, because strictly speaking I don't think there needs to be a
direct correlation between the scalar and vector formats.  But I know Richi
won't like that, so the check is probably the most likely outcome.

> > Also would just be better if you do the xor in integer mode (using
> > signbit_rw field for the correct bit)?
> > And then making sure the target optimizes the xor to the neg
> > instruction when needed?

I don't really see the advantage of this one. It's not removing an instruction
and it's assuming the vector ISA can do integer ops on a floating point vector
cheaply.  Since match.pd doesn't have the ability to do costing I'd rather not
do this.

> I’m also worried about using FP operations for the negate here.  When @1 is
> constant do we still constant fold this correctly?

We never did constant folding for this case, the folding infrastructure doesn't
know how to fold the VEC_PERM_EXPR.  So even with @0 and @1 constant no
folding takes place even today if we vectorize.

> 
> For costing purposes it would be nice to make this visible to the vectorizer.
> 

I initially wanted to use VEC_ADDSUB for this, but noticed it didn't trigger in
a number of places I had expected it to. While looking into it I noticed it's
because this follows the x86 instruction semantics, so I left it alone.

It felt like adding a third pattern here might be confusing. However I can also
use the SLP pattern matcher to rewrite it without an optab if you prefer that?

The operations will then be costed normally.

> Also is this really good for all targets?  Can there be issues with
> reformatting when using FP ops as in your patch or with using integer XOR as
> suggested making this more expensive than the blend?

I don't think so with the FP ops alone, since it's using two FP ops already and
still two FP ops after the change, and I can't imagine that a target would have
a slow -a.

The XOR one I wouldn't do, as the vector int and vector float could for
instance be in different register files or FP be a co-processor etc.  Mixing FP
and Integer ops in this case I can imagine can lead to something suboptimal.
Also for targets with masking/predication the VEC_PERM_EXPR could potentially
be lowered to a mask/predicate in the backend, whereas that is far less likely
with the XOR approach.

Thanks,
Tamar

> 
> Richard.
> 
> > Thanks,
> > Andrew Pinski
> >
> >
> >
> >>
> >> For
> >>
> >> void f (float *restrict a, float *restrict b, float *res, int n) {
> >>   for (int i = 0; i < (n & -4); i+=2)
> >>{
> >>  res[i+0] = a[i+0] + b[i+0];
> >>  res[i+1] = a[i+1] - b[i+1];
> >>}
> >> }
> >>
> >> we generate:
> >>
> >> .L3:
> >>ldr q1, [x1, x3]
> >>ldr q0, [x0, x3]
> >>fneg v1.2d, v1.2d
> >>fadd v0.4s, v0.4s, v1.4s
> >>str q0, [x2, x3]
> >>add x3, x3, 16
> >>cmp x3, x4
> >>bne .L3
> >>
> >> now instead of:
> >>
> >> .L3:
> >>ldr q1, [x0, x3]
> >>ldr q2, [x1, x3]
> >>fadd v0.4s, v1.4s, v2.4s
> >>fsub v1.4s, v1.4s, v2.4s
> >>tbl v0.16b, {v0.16b - v1.16b}, v3.16b
> >>str q0, [x2, x3]
> >>add x3, x3, 16
> >>cmp x3, x4
> >>bne .L3
> >>
> >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >>
> >> Thanks to George Steed for the idea.
> >>
> >> Ok for master?
> >>
> >> Thanks,
> >> Tamar
> >>
> >> gcc/ChangeLog:
> >>
> >>* match.pd: Add fneg/fadd rule.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>* gcc.target/aarch64/simd/addsub_1.c: New test.
> >>* gcc.target/aarch64/sve/addsub_1.c: New test.
> >>
> >> --- inline copy of patch --
> >> diff --git a/gcc/match.pd b/gcc/match.pd index
> >>
> 

Re: [PATCH V3] RISC-V:Fix a bug that is the CMO builtins are missing parameter

2022-06-20 Thread Kito Cheng via Gcc-patches
Committed, thanks!

On Wed, Jun 8, 2022 at 10:20 AM  wrote:
>
> From: yulong 
>
> We changed the builtin format for the zicbom and zicboz subextensions and
> modified the test cases.
> Differences from the previous version:
> 1. We changed the FUNCTION_TYPE from RISCV_VOID_FTYPE_SI/DI to
>    RISCV_VOID_FTYPE_VOID_PTR.
> 2. We added a new RISCV_ATYPE_VOID_PTR in riscv-builtins.cc and a new
>    DEF_RISCV_FTYPE (1, (VOID, VOID_PTR)) in riscv-ftypes.def.
> 3. We deleted DEF_RISCV_FTYPE (1, (VOID, SI/DI)).
> 4. We modified the input parameters of the test cases.
>
> Thanks, Simon and Kito.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-builtins.cc (RISCV_ATYPE_VOID_PTR): New.
> * config/riscv/riscv-cmo.def (RISCV_BUILTIN): changed the 
> FUNCTION_TYPE of RISCV_BUILTIN.
> * config/riscv/riscv-ftypes.def (0): New.
> (1):
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/cmo-zicbom-1.c: modified the input parameters.
> * gcc.target/riscv/cmo-zicbom-2.c: modified the input parameters.
> * gcc.target/riscv/cmo-zicboz-1.c: modified the input parameters.
> * gcc.target/riscv/cmo-zicboz-2.c: modified the input parameters.
>
> ---
>  gcc/config/riscv/riscv-builtins.cc|  1 +
>  gcc/config/riscv/riscv-cmo.def| 16 ++--
>  gcc/config/riscv/riscv-ftypes.def |  3 +--
>  gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c | 26 ---
>  gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c | 26 ---
>  gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c | 10 ---
>  gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c | 10 ---
>  7 files changed, 58 insertions(+), 34 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-builtins.cc 
> b/gcc/config/riscv/riscv-builtins.cc
> index 795132a0c16..1218fdfc67d 100644
> --- a/gcc/config/riscv/riscv-builtins.cc
> +++ b/gcc/config/riscv/riscv-builtins.cc
> @@ -133,6 +133,7 @@ AVAIL (prefetchi64, TARGET_ZICBOP && TARGET_64BIT)
>  #define RISCV_ATYPE_USI unsigned_intSI_type_node
>  #define RISCV_ATYPE_SI intSI_type_node
>  #define RISCV_ATYPE_DI intDI_type_node
> +#define RISCV_ATYPE_VOID_PTR ptr_type_node
>
>  /* RISCV_FTYPE_ATYPESN takes N RISCV_FTYPES-like type codes and lists
> their associated RISCV_ATYPEs.  */
> diff --git a/gcc/config/riscv/riscv-cmo.def b/gcc/config/riscv/riscv-cmo.def
> index b30ecf96ec1..9fe5094ce1a 100644
> --- a/gcc/config/riscv/riscv-cmo.def
> +++ b/gcc/config/riscv/riscv-cmo.def
> @@ -1,16 +1,16 @@
>  // zicbom
> -RISCV_BUILTIN (clean_si, "zicbom_cbo_clean", RISCV_BUILTIN_DIRECT, 
> RISCV_SI_FTYPE, clean32),
> -RISCV_BUILTIN (clean_di, "zicbom_cbo_clean", RISCV_BUILTIN_DIRECT, 
> RISCV_DI_FTYPE, clean64),
> +RISCV_BUILTIN (clean_si, "zicbom_cbo_clean", RISCV_BUILTIN_DIRECT_NO_TARGET, 
> RISCV_VOID_FTYPE_VOID_PTR, clean32),
> +RISCV_BUILTIN (clean_di, "zicbom_cbo_clean", RISCV_BUILTIN_DIRECT_NO_TARGET, 
> RISCV_VOID_FTYPE_VOID_PTR, clean64),
>
> -RISCV_BUILTIN (flush_si, "zicbom_cbo_flush", RISCV_BUILTIN_DIRECT, 
> RISCV_SI_FTYPE, flush32),
> -RISCV_BUILTIN (flush_di, "zicbom_cbo_flush", RISCV_BUILTIN_DIRECT, 
> RISCV_DI_FTYPE, flush64),
> +RISCV_BUILTIN (flush_si, "zicbom_cbo_flush", RISCV_BUILTIN_DIRECT_NO_TARGET, 
> RISCV_VOID_FTYPE_VOID_PTR, flush32),
> +RISCV_BUILTIN (flush_di, "zicbom_cbo_flush", RISCV_BUILTIN_DIRECT_NO_TARGET, 
> RISCV_VOID_FTYPE_VOID_PTR, flush64),
>
> -RISCV_BUILTIN (inval_si, "zicbom_cbo_inval", RISCV_BUILTIN_DIRECT, 
> RISCV_SI_FTYPE, inval32),
> -RISCV_BUILTIN (inval_di, "zicbom_cbo_inval", RISCV_BUILTIN_DIRECT, 
> RISCV_DI_FTYPE, inval64),
> +RISCV_BUILTIN (inval_si, "zicbom_cbo_inval", RISCV_BUILTIN_DIRECT_NO_TARGET, 
> RISCV_VOID_FTYPE_VOID_PTR, inval32),
> +RISCV_BUILTIN (inval_di, "zicbom_cbo_inval", RISCV_BUILTIN_DIRECT_NO_TARGET, 
> RISCV_VOID_FTYPE_VOID_PTR, inval64),
>
>  // zicboz
> -RISCV_BUILTIN (zero_si, "zicboz_cbo_zero", RISCV_BUILTIN_DIRECT, 
> RISCV_SI_FTYPE, zero32),
> -RISCV_BUILTIN (zero_di, "zicboz_cbo_zero", RISCV_BUILTIN_DIRECT, 
> RISCV_DI_FTYPE, zero64),
> +RISCV_BUILTIN (zero_si, "zicboz_cbo_zero", RISCV_BUILTIN_DIRECT_NO_TARGET, 
> RISCV_VOID_FTYPE_VOID_PTR, zero32),
> +RISCV_BUILTIN (zero_di, "zicboz_cbo_zero", RISCV_BUILTIN_DIRECT_NO_TARGET, 
> RISCV_VOID_FTYPE_VOID_PTR, zero64),
>
>  // zicbop
>  RISCV_BUILTIN (prefetchi_si, "zicbop_cbo_prefetchi", RISCV_BUILTIN_DIRECT, 
> RISCV_SI_FTYPE_SI, prefetchi32),
> diff --git a/gcc/config/riscv/riscv-ftypes.def 
> b/gcc/config/riscv/riscv-ftypes.def
> index 62421292ce7..c2b45c63ea1 100644
> --- a/gcc/config/riscv/riscv-ftypes.def
> +++ b/gcc/config/riscv/riscv-ftypes.def
> @@ -28,7 +28,6 @@ along with GCC; see the file COPYING3.  If not see
>
>  DEF_RISCV_FTYPE (0, (USI))
>  DEF_RISCV_FTYPE (1, (VOID, USI))
> -DEF_RISCV_FTYPE (0, (SI))
> -DEF_RISCV_FTYPE (0, (DI))
> +DEF_RISCV_FTYPE (1, (VOID, VOID_PTR))
>  DEF_RISCV_FTYPE (1, (SI, SI))
>  DEF_RISCV_FTYPE (1, (DI, DI))
> diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c 
> 

Re: [PATCH v1.1] tree-optimization/105736: Don't let error_mark_node escape for ADDR_EXPR

2022-06-20 Thread Jakub Jelinek via Gcc-patches
On Tue, Jun 14, 2022 at 09:01:54PM +0530, Siddhesh Poyarekar wrote:
> The addr_expr computation does not check for error_mark_node before
> returning the size expression.  This used to work in the constant case
> because the conversion to uhwi would end up causing it to return
> size_unknown, but that won't work for the dynamic case.

Regarding subject/first line of commit, it should be something like:

tree-object-size: Don't let error_mark_node escape for ADDR_EXPR [PR105736]
instead of what you have.

> Modify the control flow to explicitly return size_unknown if the offset
> computation returns an error_mark_node.
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/105736
>   * tree-object-size.cc (addr_object_size): Return size_unknown
>   when object offset computation returns an error.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/105736
>   * gcc.dg/builtin-dynamic-object-size-0.c (TV4, val3,
>   test_pr105736): New struct declaration, variable and function to
>   test PR.

If you want to spell the exact changes in the test, it would be better
to do it separately when it is different changes.
* gcc.dg/builtin-dynamic-object-size-0.c (struct TV4): New type.
(val3): New variable.
(test_pr105736): New function.
>   (main): Use them.

Otherwise LGTM, but for GCC 13, it would be nice to add support
for BIT_FIELD_REF if both second and third arguments are multiples of
BITS_PER_UNIT.

Something like:
    case BIT_FIELD_REF:
      if (!tree_fits_shwi_p (TREE_OPERAND (expr, 1))
          || !tree_fits_shwi_p (TREE_OPERAND (expr, 2))
          || (tree_to_shwi (TREE_OPERAND (expr, 1)) % BITS_PER_UNIT)
          || (tree_to_shwi (TREE_OPERAND (expr, 2)) % BITS_PER_UNIT))
        return error_mark_node;

      base = compute_object_offset (TREE_OPERAND (expr, 0), var);
      if (base == error_mark_node)
        return base;

      off = size_int (tree_to_shwi (TREE_OPERAND (expr, 2))
                      / BITS_PER_UNIT);
      break;
or so.
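
The arithmetic in that case can be checked in isolation with a small standalone model. This is a sketch only: bit_field_byte_offset is an invented name, plain long values stand in for the tree operands, and CHAR_BIT stands in for BITS_PER_UNIT.

```c
#include <limits.h>
#include <stdbool.h>

/* Model of the suggested check: a bit-field reference is only handled
   when both its size and its position (in bits) are whole bytes.  On
   success the byte offset is stored in *BYTE_OFF.  */
static bool
bit_field_byte_offset (long size_bits, long pos_bits, long *byte_off)
{
  if (size_bits % CHAR_BIT != 0 || pos_bits % CHAR_BIT != 0)
    return false;	/* Corresponds to returning error_mark_node.  */

  *byte_off = pos_bits / CHAR_BIT;
  return true;
}
```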

> 
> Signed-off-by: Siddhesh Poyarekar 
> ---
> Changes from v1:
> - Used FAIL() instead of __builtin_abort() in the test.
> 
> Tested:
> 
> - x86_64 bootstrap and test
> - --with-build-config=bootstrap-ubsan build
> 
> May I also backport this to gcc12?

Ok.

Jakub



Re: [PATCH] Add -fextra-libc-function=memcmpeq for __memcmpeq

2022-06-20 Thread Richard Biener via Gcc-patches
On Thu, Jun 16, 2022 at 1:38 AM Fangrui Song  wrote:
>
> On Wed, Jun 15, 2022 at 2:44 PM H.J. Lu via Gcc-patches
>  wrote:
> >
> > On Mon, Jun 13, 2022 at 9:01 AM Richard Biener
> >  wrote:
> > >
> > >
> > >
> > > > > On 13.06.2022 at 16:36, H.J. Lu wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 3:11 AM Richard Biener
> > > >  wrote:
> > > >>
> > > >>> On Tue, Jun 7, 2022 at 9:02 PM H.J. Lu via Gcc-patches
> > > >>>  wrote:
> > > >>>
> > > >>> Add -fextra-libc-function=memcmpeq to map
> > > >>>
> > > >>> extern int __memcmpeq (const void *, const void *, size_t);
> > > >>>
> > > >>> which was added to GLIBC 2.35, to __builtin_memcmp_eq.
> > > >>
> > > >> Humm.  Can't we instead use the presence of a declaration
> > > >> of __memcmpeq with a GNU standard dialect as this instead of
> > > >> adding a weird -fextra-libc-function= option?  Maybe that's even
> > > >> reasonable with a non-GNU dialect standard in effect since
> > > >> __ prefixed names are in the implementation namespace?
> > > >
> > > > But not all source codes include  and GCC may generate
> > > > memcmp directly.  How should we handle these cases?
> > >
> > > Not.  Similar as to vectorized math functions.
> > > I think it’s not worth optimizing for this case.
> >
> > Another question.  Should we consider any __memcmpeq prototype
> > or just the one in the system header file?

Any.

> An idea from https://reviews.llvm.org/D56593#3586673: -fbuiltin-__memcmpeq
>
> This requires making -fbuiltin-function available, see
> https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
> ("There is no corresponding -fbuiltin-function option")
>
> I prefer an option over a magic behavior about whether a declaration exists.

But we already have this behavior for multiple cases.  It's also the only
way that in practice __memcmpeq will be used - _nobody_ (but maybe
specially crafted SPEC peak runs) will add explicit -fbuiltin-__memcmpeq.
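
For illustration, the source pattern under discussion looks roughly like the sketch below. The prototype is the one quoted in the patch description; buffers_equal is an invented name. The code never calls __memcmpeq itself; under the declaration-based approach the compiler would be free to use it for the equality-only memcmp, which is an assumption about the proposal and not existing behavior.

```c
#include <stddef.h>
#include <string.h>

/* Prototype as quoted in the patch description.  Its mere presence is
   what the declaration-based approach would key on.  */
extern int __memcmpeq (const void *, const void *, size_t);

/* Only the zero / non-zero result of memcmp is used here, which is the
   case __memcmpeq is meant for.  */
static int
buffers_equal (const void *p, const void *q, size_t n)
{
  return memcmp (p, q, n) == 0;
}
```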

Richard.

> > > Richard.
> > >
> > > >
> > > >> Richard.
> > > >>
> > > >>> gcc/
> > > >>>
> > > >>>* builtins.cc: Include "opts.h".
> > > >>>(expand_builtin): Generate BUILT_IN_MEMCMP_EQ if __memcmpeq is
> > > >>>available.
> > > >>>* builtins.def (BUILT_IN___MEMCMPEQ): New.
> > > >>>* common.opt: Add -fextra-libc-function=.
> > > >>>* opts.cc (extra_libc_functions): New.
> > > >>>(parse_extra_libc_function): New function.
> > > >>>(common_handle_option): Handle -fextra-libc-function=.
> > > >>>* opts.h (extra_libc_function_list): New.
> > > >>>(extra_libc_functions): Likewise.
> > > >>>* doc/invoke.texi: Document -fextra-libc-function=memcmpeq.
> > > >>>
> > > >>> gcc/testsuite/
> > > >>>
> > > >>>* c-c++-common/memcmpeq-1.c: New test.
> > > >>>* c-c++-common/memcmpeq-2.c: Likewise.
> > > >>>* c-c++-common/memcmpeq-3.c: Likewise.
> > > >>>* c-c++-common/memcmpeq-4.c: Likewise.
> > > >>>* c-c++-common/memcmpeq-5.c: Likewise.
> > > >>>* c-c++-common/memcmpeq-6.c: Likewise.
> > > >>>* c-c++-common/memcmpeq-7.c: Likewise.
> > > >>> ---
> > > >>> gcc/builtins.cc |  5 -
> > > >>> gcc/builtins.def|  4 
> > > >>> gcc/common.opt  |  4 
> > > >>> gcc/doc/invoke.texi |  6 ++
> > > >>> gcc/opts.cc | 23 +++
> > > >>> gcc/opts.h  |  7 +++
> > > >>> gcc/testsuite/c-c++-common/memcmpeq-1.c | 11 +++
> > > >>> gcc/testsuite/c-c++-common/memcmpeq-2.c | 11 +++
> > > >>> gcc/testsuite/c-c++-common/memcmpeq-3.c | 11 +++
> > > >>> gcc/testsuite/c-c++-common/memcmpeq-4.c | 11 +++
> > > >>> gcc/testsuite/c-c++-common/memcmpeq-5.c | 11 +++
> > > >>> gcc/testsuite/c-c++-common/memcmpeq-6.c | 11 +++
> > > >>> gcc/testsuite/c-c++-common/memcmpeq-7.c | 11 +++
> > > >>> 13 files changed, 125 insertions(+), 1 deletion(-)
> > > >>> create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-1.c
> > > >>> create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-2.c
> > > >>> create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-3.c
> > > >>> create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-4.c
> > > >>> create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-5.c
> > > >>> create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-6.c
> > > >>> create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-7.c
> > > >>>
> > > >>> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> > > >>> index b9d89b409b8..22269318e8c 100644
> > > >>> --- a/gcc/builtins.cc
> > > >>> +++ b/gcc/builtins.cc
> > > >>> @@ -81,6 +81,7 @@ along with GCC; see the file COPYING3.  If not see
> > > >>> #include "demangle.h"
> > > >>> #include "gimple-range.h"
> > > >>> #include "pointer-query.h"
> > > >>> +#include "opts.h"
> > > >>>
> > > >>> struct target_builtins default_target_builtins;
> > > >>> #if 

Re: [PATCH 3/3] lto-plugin: implement LDPT_GET_API_VERSION

2022-06-20 Thread Richard Biener via Gcc-patches
On Thu, Jun 16, 2022 at 2:25 PM Martin Liška  wrote:
>
> On 6/16/22 10:00, Alexander Monakov wrote:
> > On Thu, 16 Jun 2022, Martin Liška wrote:
> >
> >> Hi.
> >>
> >> I'm sending updated version of the patch where I addressed the comments.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> >
> > I noticed a typo (no objection on the substance of the patch from me):
>
> Good!
>
> >
> >> --- a/include/plugin-api.h
> >> +++ b/include/plugin-api.h
> >> @@ -483,6 +483,34 @@ enum ld_plugin_level
> >>LDPL_FATAL
> >>  };
> >>
> >> +/* Contract between a plug-in and a linker.  */
> >> +
> >> +enum linker_api_version
> >> +{
> >> +   /* The linker/plugin do not implement any of the API levels below, the 
> >> API
> >> +   is determined solely via the transfer vector.  */
> >> +   LAPI_UNSPECIFIED = 0,

I'll note this is somewhat redundant with the presence of
ld_plugin_get_api_version,
but only somewhat since it also provides info about the linker/plugin
identifier.

> >> +   /* API level v1.  The linker provides add_symbols_v3, add_symbols_v2,
> >
> > This should be '*get_*symbols_v3, add_symbols_v2'.
>
> Sure, fixed.
>
> >
> >> +  the plugin will use that and not any lower versions.
> >> +  claim_file is thread-safe on the plugin side and
> >> +  add_symbols on the linker side.  */
> >> +   LAPI_V1 = 1
> >> +};
> >> +
> >> +/* The linker's interface for API version negotiation.  A plug-in calls
> >> +  the function (with its IDENTIFIER and VERSION), plus minimal and maximal
> >> +  version of linker_api_version is provided.  Linker then returns selected
> >> +  API version and provides its IDENTIFIER and VERSION.  */
> >> +
> >> +typedef
> >> +enum linker_api_version
> >> +(*ld_plugin_get_api_version) (const char *plugin_identifier, unsigned 
> >> plugin_version,
> >> +  enum linker_api_version minimal_api_supported,
> >> +  enum linker_api_version maximal_api_supported,
> >> +  const char **linker_identifier,
> >> +  unsigned *linker_version);
> >
> > IIRC Richi asked to mention which side owns the strings (does the receiver 
> > need
> > to 'free' or 'strdup' them). Perhaps we could say they are owned by the
> > originating side, but it might be even better to say they are unchanging to
> > allow simply using string literals. Perhaps add something like this to the
> > comment?
> >
> > Identifier pointers remain valid as long as the plugin is loaded.
>
> I welcome the change and I'm sending a patch that incorporates that.
>
> >
> >>  /* Values for the tv_tag field of the transfer vector.  */
> >>
> >>  enum ld_plugin_tag
> >> @@ -521,6 +549,7 @@ enum ld_plugin_tag
> >>LDPT_REGISTER_NEW_INPUT_HOOK,
> >>LDPT_GET_WRAP_SYMBOLS,
> >>LDPT_ADD_SYMBOLS_V2,
> >> +  LDPT_GET_API_VERSION,
> >>  };
> >
> > I went checking if this is in sync with Binutils header and noticed that
> > get_wrap_symbols and add_symbols_v2 are not even mentioned on the wiki page 
> > with
> > plugin API documentation.
>
> Yes, I know about that. I'm going to update wiki page once we get this in.

I think this is OK.  Can we get buy-in from mold people?

Thanks,
Richard.

> Cheers,
> Martin
>
> >
> > Alexander
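
As an illustration of the negotiation described in the quoted comment, here
is a minimal sketch of a linker-side implementation.  The enum is copied
from the quoted patch; mock_get_api_version, the "mock-ld" identifier and
the policy of returning LAPI_UNSPECIFIED when the ranges do not overlap are
assumptions made for this example, not what any real linker does.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the proposed plugin-api.h addition.  */
enum linker_api_version
{
  LAPI_UNSPECIFIED = 0,
  LAPI_V1 = 1
};

/* Hypothetical linker side: select the highest level that both the
   linker and the plugin support.  The identifier handed back is a
   string literal, so the plugin never has to free or copy it.  */
enum linker_api_version
mock_get_api_version (const char *plugin_identifier, unsigned plugin_version,
                      enum linker_api_version minimal_api_supported,
                      enum linker_api_version maximal_api_supported,
                      const char **linker_identifier,
                      unsigned *linker_version)
{
  const enum linker_api_version linker_max = LAPI_V1;

  (void) plugin_identifier;
  (void) plugin_version;
  assert (minimal_api_supported <= maximal_api_supported);

  *linker_identifier = "mock-ld";
  *linker_version = 1;

  /* Assumption: no overlap falls back to LAPI_UNSPECIFIED.  */
  if (minimal_api_supported > linker_max)
    return LAPI_UNSPECIFIED;
  return maximal_api_supported < linker_max ? maximal_api_supported
                                            : linker_max;
}
```

A plugin that supports everything up to LAPI_V1 would pass LAPI_UNSPECIFIED
and LAPI_V1 as the range and get LAPI_V1 back from this mock.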


Re: [PATCH 2/3] lto-plugin: make claim_file_handler thread-safe

2022-06-20 Thread Richard Biener via Gcc-patches
On Thu, Jun 16, 2022 at 9:01 AM Martin Liška  wrote:
>
> lto-plugin/ChangeLog:
>
> * lto-plugin.c (plugin_lock): New lock.
> (claim_file_handler): Use mutex for critical section.
> (onload): Initialize mutex.
> ---
>  lto-plugin/lto-plugin.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
> index 00b760636dc..13118c4983c 100644
> --- a/lto-plugin/lto-plugin.c
> +++ b/lto-plugin/lto-plugin.c
> @@ -55,6 +55,7 @@ along with this program; see the file COPYING3.  If not see
>  #include 
>  #include 
>  #include 
> +#include <pthread.h>

Not sure if we support any non-pthread target for building the LTO
plugin, but it
seems we have

  # Among non-ELF, only Windows platforms support the lto-plugin so far.
  # Build it unless LTO was explicitly disabled.
  case $target in
*-cygwin* | *-mingw*) build_lto_plugin=$enable_lto ;;

which suggests at least build-validating the above with --enable-lto on
one of those targets.

IIRC we have gthr-*.h in libgcc/, not sure if that's usable in a
host linker plugin.

>  #ifdef HAVE_SYS_WAIT_H
>  #include 
>  #endif
> @@ -157,6 +158,9 @@ enum symbol_style
>ss_uscore,   /* Underscore prefix all symbols.  */
>  };
>
> +/* Plug-in mutex.  */
> +static pthread_mutex_t plugin_lock;
> +
>  static char *arguments_file_name;
>  static ld_plugin_register_claim_file register_claim_file;
>  static ld_plugin_register_all_symbols_read register_all_symbols_read;
> @@ -1262,15 +1266,18 @@ claim_file_handler (const struct ld_plugin_input_file 
> *file, int *claimed)
>   lto_file.symtab.syms);
>check (status == LDPS_OK, LDPL_FATAL, "could not add symbols");
>
> +  pthread_mutex_lock (&plugin_lock);
>num_claimed_files++;
>claimed_files =
> xrealloc (claimed_files,
>   num_claimed_files * sizeof (struct plugin_file_info));
>claimed_files[num_claimed_files - 1] = lto_file;
> +  pthread_mutex_unlock (&plugin_lock);
>
>*claimed = 1;
>  }
>
> +  pthread_mutex_lock (&plugin_lock);
>if (offload_files == NULL)
>  {
>/* Add dummy item to the start of the list.  */
> @@ -1333,11 +1340,12 @@ claim_file_handler (const struct ld_plugin_input_file 
> *file, int *claimed)
> offload_files_last_lto = ofld;
>num_offload_files++;
>  }
> +  pthread_mutex_unlock (&plugin_lock);
>
>goto cleanup;
>
>   err:
> -  non_claimed_files++;
> +  __atomic_fetch_add (&non_claimed_files, 1, __ATOMIC_RELAXED);

is it worth "optimizing" this with yet another need for target specific support
(just use pthread_mutex here as well?)

>free (lto_file.name);
>
>   cleanup:
> @@ -1415,6 +1423,12 @@ onload (struct ld_plugin_tv *tv)
>struct ld_plugin_tv *p;
>enum ld_plugin_status status;
>
> +  if (pthread_mutex_init (&plugin_lock, NULL) != 0)
> +{
> +  fprintf (stderr, "mutex init failed\n");
> +  abort ();
> +}
> +
>p = tv;
>while (p->tv_tag)
>  {
> --
> 2.36.1
>
>
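
To illustrate the review comment about avoiding the atomic builtin, here is
a minimal sketch that guards a plain counter with the same kind of mutex
used for the rest of the critical section.  All names (counter_lock,
note_non_claimed, run_workers) are hypothetical, and the static
PTHREAD_MUTEX_INITIALIZER is an assumption of this example, not what the
patch does; it merely sidesteps the explicit init-and-abort path.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 16
#define BUMPS_PER_WORKER 1000

/* Statically initialized, so no pthread_mutex_init call is needed.  */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int non_claimed_count;

/* Bump the counter under the mutex instead of using an atomic add.  */
static void
note_non_claimed (void)
{
  pthread_mutex_lock (&counter_lock);
  non_claimed_count++;
  pthread_mutex_unlock (&counter_lock);
}

static void *
worker (void *arg)
{
  (void) arg;
  for (int i = 0; i < BUMPS_PER_WORKER; i++)
    note_non_claimed ();
  return NULL;
}

/* Start NTHREADS workers, wait for them, and return the final count.  */
unsigned int
run_workers (int nthreads)
{
  pthread_t tids[MAX_WORKERS];
  int started = 0;

  assert (nthreads > 0 && nthreads <= MAX_WORKERS);
  non_claimed_count = 0;
  for (int i = 0; i < nthreads; i++)
    if (pthread_create (&tids[started], NULL, worker, NULL) == 0)
      started++;
  for (int i = 0; i < started; i++)
    pthread_join (tids[i], NULL);
  return non_claimed_count;
}
```

With every increment serialized by the lock, no update is lost regardless of
how the workers interleave.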


Re: [PATCH 1/3] lto-plugin: support LDPT_GET_SYMBOLS_V3

2022-06-20 Thread Richard Biener via Gcc-patches
On Thu, Jun 16, 2022 at 9:00 AM Martin Liška  wrote:
>
> That supports skipping of an object file (LDPS_NO_SYMS).

OK.

Thanks,
Richard.

> lto-plugin/ChangeLog:
>
> * lto-plugin.c (struct plugin_file_info): Add skip_file flag.
> (write_resolution): Write resolution only if get_symbols != 
> LDPS_NO_SYMS.
> (all_symbols_read_handler): Ignore file if skip_file is true.
> (onload): Handle LDPT_GET_SYMBOLS_V3.
> ---
>  lto-plugin/lto-plugin.c | 42 ++---
>  1 file changed, 35 insertions(+), 7 deletions(-)
>
> diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
> index 47378435612..00b760636dc 100644
> --- a/lto-plugin/lto-plugin.c
> +++ b/lto-plugin/lto-plugin.c
> @@ -136,6 +136,7 @@ struct plugin_file_info
>void *handle;
>struct plugin_symtab symtab;
>struct plugin_symtab conflicts;
> +  bool skip_file;
>  };
>
>  /* List item with name of the file with offloading.  */
> @@ -159,7 +160,7 @@ enum symbol_style
>  static char *arguments_file_name;
>  static ld_plugin_register_claim_file register_claim_file;
>  static ld_plugin_register_all_symbols_read register_all_symbols_read;
> -static ld_plugin_get_symbols get_symbols, get_symbols_v2;
> +static ld_plugin_get_symbols get_symbols, get_symbols_v2, get_symbols_v3;
>  static ld_plugin_register_cleanup register_cleanup;
>  static ld_plugin_add_input_file add_input_file;
>  static ld_plugin_add_input_library add_input_library;
> @@ -547,15 +548,13 @@ free_symtab (struct plugin_symtab *symtab)
>  static void
>  write_resolution (void)
>  {
> -  unsigned int i;
> +  unsigned int i, included_files = 0;
>FILE *f;
>
>check (resolution_file, LDPL_FATAL, "resolution file not specified");
>f = fopen (resolution_file, "w");
>check (f, LDPL_FATAL, "could not open file");
>
> -  fprintf (f, "%d\n", num_claimed_files);
> -
>for (i = 0; i < num_claimed_files; i++)
>  {
>struct plugin_file_info *info = &claimed_files[i];
> @@ -563,13 +562,38 @@ write_resolution (void)
>struct ld_plugin_symbol *syms = symtab->syms;
>
>/* Version 2 of API supports IRONLY_EXP resolution that is
> - accepted by GCC-4.7 and newer.  */
> -  if (get_symbols_v2)
> +accepted by GCC-4.7 and newer.
> +Version 3 can return LDPS_NO_SYMS that means the object
> +will not be used at all.  */
> +  if (get_symbols_v3)
> +   {
> + enum ld_plugin_status status
> +   = get_symbols_v3 (info->handle, symtab->nsyms, syms);
> + if (status == LDPS_NO_SYMS)
> +   {
> + info->skip_file = true;
> + continue;
> +   }
> +   }
> +  else if (get_symbols_v2)
>  get_symbols_v2 (info->handle, symtab->nsyms, syms);
>else
>  get_symbols (info->handle, symtab->nsyms, syms);
>
> +  ++included_files;
> +
>finish_conflict_resolution (symtab, &info->conflicts);
> +}
> +
> +  fprintf (f, "%d\n", included_files);
> +
> +  for (i = 0; i < num_claimed_files; i++)
> +{
> +  struct plugin_file_info *info = &claimed_files[i];
> +  struct plugin_symtab *symtab = &info->symtab;
> +
> +  if (info->skip_file)
> +   continue;
>
>fprintf (f, "%s %d\n", info->name, symtab->nsyms + 
> info->conflicts.nsyms);
>dump_symtab (f, symtab);
> @@ -833,7 +857,8 @@ all_symbols_read_handler (void)
>  {
>struct plugin_file_info *info = &claimed_files[i];
>
> -  *lto_arg_ptr++ = info->name;
> +  if (!info->skip_file)
> +   *lto_arg_ptr++ = info->name;
>  }
>
>*lto_arg_ptr++ = NULL;
> @@ -1410,6 +1435,9 @@ onload (struct ld_plugin_tv *tv)
> case LDPT_REGISTER_ALL_SYMBOLS_READ_HOOK:
>   register_all_symbols_read = p->tv_u.tv_register_all_symbols_read;
>   break;
> +   case LDPT_GET_SYMBOLS_V3:
> + get_symbols_v3 = p->tv_u.tv_get_symbols;
> + break;
> case LDPT_GET_SYMBOLS_V2:
>   get_symbols_v2 = p->tv_u.tv_get_symbols;
>   break;
> --
> 2.36.1
>
>
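
The reason the patch moves the count line after the first loop is that the
number of included files is only known once every file has been asked for
its symbols.  A minimal sketch of that two-pass shape follows; struct
file_entry and format_resolution are hypothetical names, the skip flag is
assumed to be already set (in the patch it is set while calling
get_symbols_v3), and the output goes to a buffer rather than the
resolution file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct plugin_file_info.  */
struct file_entry
{
  const char *name;
  bool skip_file;
};

/* Pass 1 counts the files that are not skipped; pass 2 emits the count
   followed by one line per included file.  Returns the count.  */
unsigned int
format_resolution (char *buf, size_t size,
                   const struct file_entry *files, unsigned int nfiles)
{
  unsigned int i, included = 0;
  size_t used;

  assert (buf != NULL && size > 0);

  for (i = 0; i < nfiles; i++)
    if (!files[i].skip_file)
      included++;

  used = (size_t) snprintf (buf, size, "%u\n", included);
  for (i = 0; i < nfiles && used < size; i++)
    {
      if (files[i].skip_file)
        continue;
      used += (size_t) snprintf (buf + used, size - used, "%s\n",
                                 files[i].name);
    }
  return included;
}
```

Writing the header before the first pass would have reported the number of
claimed files, which is too large whenever a file is skipped.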


RE: [PATCH 2/2]middle-end: Support recognition of three-way max/min.

2022-06-20 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Monday, June 20, 2022 9:36 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; ja...@redhat.com
> Subject: Re: [PATCH 2/2]middle-end: Support recognition of three-way
> max/min.
> 
> On Thu, 16 Jun 2022, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch adds support for three-way min/max recognition in phi-opts.
> >
> > Concretely for e.g.
> >
> > > #include <stdint.h>
> >
> > uint8_t three_min (uint8_t xc, uint8_t xm, uint8_t xy) {
> > uint8_t  xk;
> > if (xc < xm) {
> > xk = (uint8_t) (xc < xy ? xc : xy);
> > } else {
> > xk = (uint8_t) (xm < xy ? xm : xy);
> > }
> > return xk;
> > }
> >
> > we generate:
> >
> >[local count: 1073741824]:
> >   _5 = MIN_EXPR ;
> >   _7 = MIN_EXPR ;
> >   return _7;
> >
> > instead of
> >
> >   :
> >   if (xc_2(D) < xm_3(D))
> > goto ;
> >   else
> > goto ;
> >
> >   :
> >   xk_5 = MIN_EXPR ;
> >   goto ;
> >
> >   :
> >   xk_6 = MIN_EXPR ;
> >
> >   :
> >   # xk_1 = PHI 
> >   return xk_1;
> >
> > The same function also immediately deals with turning a minimization
> > problem into a maximization one if the results are inverted.  We do
> > this here since doing it in match.pd would end up changing the shape
> > of the BBs and adding additional instructions which would prevent various
> optimizations from working.
> 
> Can you explain a bit more?

I'll respond to this one first in case it changes how you want me to proceed.

I initially had used a match.pd rule to do the min to max conversion, but a
number of testcases started to fail.  The reason was that a lot of the foldings
checked that the BB contains only a single SSA and that that SSA is a phi node.

By changing the min into max, the negation of the result ends up in the same BB
and so the optimizations are skipped, leading to less optimal code.

I did look into relaxing those phi opts, but it felt like I'd make a rather
arbitrary exception for minus, and it seemed better to handle it in the minmax
folding.

Thanks,
Tamar
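
For reference, the source-level equivalence behind the three-way minimum can
be checked exhaustively.  The sketch below compares the branchy function
from the patch description against a nested-minimum form; min_u8,
three_min_folded and count_mismatches are names invented for this example,
and the nested form is only meant to mirror the two chained MIN_EXPRs, not
the exact GIMPLE the pass produces.

```c
#include <assert.h>
#include <stdint.h>

/* The branchy form from the patch description.  */
uint8_t
three_min (uint8_t xc, uint8_t xm, uint8_t xy)
{
  uint8_t xk;
  if (xc < xm)
    xk = (uint8_t) (xc < xy ? xc : xy);
  else
    xk = (uint8_t) (xm < xy ? xm : xy);
  return xk;
}

static uint8_t
min_u8 (uint8_t a, uint8_t b)
{
  return a < b ? a : b;
}

/* Hypothetical folded form: two chained minimum operations.  */
uint8_t
three_min_folded (uint8_t xc, uint8_t xm, uint8_t xy)
{
  return min_u8 (min_u8 (xc, xm), xy);
}

/* Count the inputs on which the two forms disagree.  */
unsigned long
count_mismatches (void)
{
  unsigned long bad = 0;
  for (int c = 0; c < 256; c++)
    for (int m = 0; m < 256; m++)
      for (int y = 0; y < 256; y++)
        if (three_min ((uint8_t) c, (uint8_t) m, (uint8_t) y)
            != three_min_folded ((uint8_t) c, (uint8_t) m, (uint8_t) y))
          bad++;
  return bad;
}
```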

> 
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-phiopt.cc (minmax_replacement): Optionally search for
> the phi
> > sequence of a three-way conditional.
> > (replace_phi_edge_with_variable): Support deferring of BB removal.
> > (tree_ssa_phiopt_worker): Detect diamond phi structure for three-
> way
> > min/max.
> > (strip_bit_not, invert_minmax_code): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/split-path-1.c: Disable phi-opts so we don't
> optimize
> > code away.
> > * gcc.dg/tree-ssa/minmax-3.c: New test.
> > * gcc.dg/tree-ssa/minmax-4.c: New test.
> > * gcc.dg/tree-ssa/minmax-5.c: New test.
> > * gcc.dg/tree-ssa/minmax-6.c: New test.
> > * gcc.dg/tree-ssa/minmax-7.c: New test.
> > * gcc.dg/tree-ssa/minmax-8.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c
> > new file mode 100644
> > index
> >
> ..de3b2e946e81701e3b75f580e
> 6a8
> > 43695a05786e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O -fdump-tree-phiopt" } */
> > +
> > > +#include <stdint.h>
> > +
> > +uint8_t three_min (uint8_t xc, uint8_t xm, uint8_t xy) {
> > +   uint8_t  xk;
> > +if (xc < xm) {
> > +xk = (uint8_t) (xc < xy ? xc : xy);
> > +} else {
> > +xk = (uint8_t) (xm < xy ? xm : xy);
> > +}
> > +return xk;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 3 "phiopt1" } } */
> > +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "phiopt1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c
> > new file mode 100644
> > index
> >
> ..0b6d667be868c2405eaefd17c
> b52
> > 2da44bafa0e2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O -fdump-tree-phiopt" } */
> > +
> > > +#include <stdint.h>
> > +
> > +uint8_t three_max (uint8_t xc, uint8_t xm, uint8_t xy) {
> > +uint8_t xk;
> > +if (xc > xm) {
> > +xk = (uint8_t) (xc > xy ? xc : xy);
> > +} else {
> > +xk = (uint8_t) (xm > xy ? xm : xy);
> > +}
> > +return xk;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 0 "phiopt1" } } */
> > +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 3 "phiopt1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c
> > new file mode 100644
> > index
> >
> ..650601a3cc75d09a9e6e54a35f
> 5b
> > 9993074f8510
> > --- /dev/null
> > +++ 

Re: [PATCH]middle-end simplify complex if expressions where comparisons are inverse of one another.

2022-06-20 Thread Richard Biener via Gcc-patches
On Thu, 16 Jun 2022, Tamar Christina wrote:

> Hi All,
> 
> This optimizes the following sequence
> 
>   ((a < b) & c) | ((a >= b) & d)
> 
> into
> 
>   (a < b ? c : d) & 1
> 
> for scalar. On vector we can omit the & 1.
>
> This changes the code generation from
> 
> zoo2:
>   cmp w0, w1
>   cset w0, lt
>   cset w1, ge
>   and w0, w0, w2
>   and w1, w1, w3
>   orr w0, w0, w1
>   ret
> 
> into
> 
>   cmp w0, w1
>   csel w0, w2, w3, lt
>   and w0, w0, 1
>   ret
> 
> and significantly reduces the number of selects we have to do in the vector
> code.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * fold-const.cc (inverse_conditions_p): Traverse if SSA_NAME.
>   * match.pd: Add new rule.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/if-compare_1.c: New test.
>   * gcc.target/aarch64/if-compare_2.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 
> 39a5a52958d87497f301826e706886b290771a2d..f180599b90150acd3ed895a64280aa3255061256
>  100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -2833,15 +2833,38 @@ compcode_to_comparison (enum comparison_code code)
>  bool
>  inverse_conditions_p (const_tree cond1, const_tree cond2)
>  {
> -  return (COMPARISON_CLASS_P (cond1)
> -   && COMPARISON_CLASS_P (cond2)
> -   && (invert_tree_comparison
> -   (TREE_CODE (cond1),
> -HONOR_NANS (TREE_OPERAND (cond1, 0))) == TREE_CODE (cond2))
> -   && operand_equal_p (TREE_OPERAND (cond1, 0),
> -   TREE_OPERAND (cond2, 0), 0)
> -   && operand_equal_p (TREE_OPERAND (cond1, 1),
> -   TREE_OPERAND (cond2, 1), 0));
> +  if (COMPARISON_CLASS_P (cond1)
> +  && COMPARISON_CLASS_P (cond2)
> +  && (invert_tree_comparison
> +(TREE_CODE (cond1),
> + HONOR_NANS (TREE_OPERAND (cond1, 0))) == TREE_CODE (cond2))
> +  && operand_equal_p (TREE_OPERAND (cond1, 0),
> +   TREE_OPERAND (cond2, 0), 0)
> +  && operand_equal_p (TREE_OPERAND (cond1, 1),
> +   TREE_OPERAND (cond2, 1), 0))
> +return true;
> +
> +  if (TREE_CODE (cond1) == SSA_NAME
> +  && TREE_CODE (cond2) == SSA_NAME)
> +{
> +  gimple *gcond1 = SSA_NAME_DEF_STMT (cond1);
> +  gimple *gcond2 = SSA_NAME_DEF_STMT (cond2);
> +  if (!is_gimple_assign (gcond1) || !is_gimple_assign (gcond2))
> + return false;
> +
> +  tree_code code1 = gimple_assign_rhs_code (gcond1);
> +  tree_code code2 = gimple_assign_rhs_code (gcond2);
> +  return TREE_CODE_CLASS (code1) == tcc_comparison
> +  && TREE_CODE_CLASS (code2) == tcc_comparison
> +  && invert_tree_comparison (code1,
> +   HONOR_NANS (gimple_arg (gcond1, 0))) == code2
> +  && operand_equal_p (gimple_arg (gcond1, 0),
> +  gimple_arg (gcond2, 0), 0)
> +  && operand_equal_p (gimple_arg (gcond1, 1),
> +  gimple_arg (gcond2, 1), 0);
> +}
> +
> +  return false;

if we do extend inverse_conditions_p, please add an overload like

bool
inverse_conditions_p (enum tree_code, tree op00, tree op01,
                      enum tree_code, tree op10, tree op11)

so you can avoid some code duplication here.

>  }
>  
>  /* Return a tree for the comparison which is the combination of
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> 6d691d302b339c0e4556b40af158b5208c12d08f..bad49dd348add751d9ec1e3023e34d9ac123194f
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1160,6 +1160,32 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (convert (bit_and (negate (convert:utype { pmop[0]; }))
>  (convert:utype @1)))
>  
> +/* Fold (((a < b) & c) | ((a >= b) & d)) into (a < b ? c : d) & 1.  */
> +(simplify
> + (bit_ior
> +  (bit_and:c (convert? @0) @2)
> +  (bit_and:c (convert? @1) @3))

in case the comparison returns a signed bool this might turn out wrong.
Maybe simply use zero_one_valued_p@0 here instead of (convert? @0)?

> +   (if (inverse_conditions_p (@0, @1)
> + /* The scalar version has to be canonicalized after vectorization
> +because it makes unconditional loads conditional ones, which
> +means we lose vectorization because the loads may trap.  */
> + && canonicalize_math_after_vectorization_p ())
> +(bit_and (cond @0 @2 @3) { build_each_one_cst (type); })))

I think you should restrict this to INTEGRAL_TYPE_P and use
build_one_cst (type) (also see below).

You can do inverse_conditions_p with a lock-step 'for' over
tcc_comparison and inverted_tcc_comparison{,_with_nans} (see existing
examples).
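
As a sanity check of the scalar identity the rule relies on, the following
sketch compares both forms over a small grid of inputs.  The function names
are made up for this example, and it only covers plain int operands where
the comparison yields 0 or 1; it says nothing about the signed-bool case
raised above.

```c
#include <assert.h>

/* Original form: ((a < b) & c) | ((a >= b) & d).  */
int
masked_or (int a, int b, int c, int d)
{
  return ((a < b) & c) | ((a >= b) & d);
}

/* Proposed replacement: (a < b ? c : d) & 1.  */
int
select_and_one (int a, int b, int c, int d)
{
  return (a < b ? c : d) & 1;
}

/* Count disagreements over a small grid of inputs.  */
unsigned int
count_mismatches (void)
{
  unsigned int bad = 0;
  for (int a = -2; a <= 2; a++)
    for (int b = -2; b <= 2; b++)
      for (int c = -4; c <= 4; c++)
        for (int d = -4; d <= 4; d++)
          if (masked_or (a, b, c, d) != select_and_one (a, b, c, d))
            bad++;
  return bad;
}
```

Exactly one of the two comparisons is 1, so the OR keeps the low bit of
either c or d, which is what the select followed by & 1 computes.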

> +(simplify
> + (bit_ior
> +  (bit_and:c (convert? (vec_cond:s @0 @4 integer_zerop)) @2)
> +  (bit_and:c (convert? (vec_cond:s @1 @4 

RE: [PATCH 1/2]middle-end: Simplify subtract where both arguments are being bitwise inverted.

2022-06-20 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Monday, June 20, 2022 9:19 AM
> To: Richard Biener via Gcc-patches 
> Cc: Tamar Christina ; Richard Biener
> ; Richard Guenther ;
> nd 
> Subject: Re: [PATCH 1/2]middle-end: Simplify subtract where both
> arguments are being bitwise inverted.
> 
> Richard Biener via Gcc-patches  writes:
> > On Thu, Jun 16, 2022 at 1:10 PM Tamar Christina via Gcc-patches
> >  wrote:
> >>
> >> Hi All,
> >>
> >> This adds a match.pd rule that drops the bitwise nots when both
> >> arguments to a subtract are inverted, i.e. for:
> >>
> >> float g(float a, float b)
> >> {
> >>   return ~(int)a - ~(int)b;
> >> }
> >>
> >> we instead generate
> >>
> >> float g(float a, float b)
> >> {
> >>   return (int)a - (int)b;
> >> }
> >>
> >> We already do a limited version of this from the fold_binary fold
> >> functions but this makes a more general version in match.pd that applies
> more often.
> >>
> >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >>
> >> Ok for master?
> >>
> >> Thanks,
> >> Tamar
> >>
> >> gcc/ChangeLog:
> >>
> >> * match.pd: New bit_not rule.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.dg/subnot.c: New test.
> >>
> >> --- inline copy of patch --
> >> diff --git a/gcc/match.pd b/gcc/match.pd index
> >>
> a59b6778f661cf9121dd3503f43472871e4da445..51b0a1b562409af535e53828a1
> 0
> >> c30b8a3e1ae2e 100644
> >> --- a/gcc/match.pd
> >> +++ b/gcc/match.pd
> >> @@ -1258,6 +1258,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >> (simplify
> >>   (bit_not (plus:c (bit_not @0) @1))
> >>   (minus @0 @1))
> >> +/* (~X - ~Y) -> X - Y.  */
> >> +(simplify
> >> + (minus (bit_not @0) (bit_not @1))
> >> + (minus @0 @1))
> >
> > It doesn't seem correct.
> >
> > (gdb) p/x ~-1 - ~0x80000000
> > $3 = 0x80000001
> > (gdb) p/x -1 - 0x80000000
> > $4 = 0x7fffffff
> >
> > where I was looking for a case exposing undefined integer overflow.
> 
> Yeah, shouldn't it be folding to (minus @1 @0) instead?
> 
>   ~X = (-X - 1)
>   ~Y = (-Y - 1)
> 
> so:
> 
>   ~X - ~Y = (-X - 1) - (-Y - 1)
>   = -X - 1 + Y + 1
>   = Y - X
>

You're right, sorry, I should have paid more attention when I wrote the patch.
 
Tamar
> Richard
> 
> 
> > Richard.
> >
> >>
> >>  /* ~(X - Y) -> ~X + Y.  */
> >>  (simplify
> >> diff --git a/gcc/testsuite/gcc.dg/subnot.c
> >> b/gcc/testsuite/gcc.dg/subnot.c new file mode 100644 index
> >>
> ..d621bacd27bd3d19a010e4c9f
> 83
> >> 1aa77d28bd02d
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/subnot.c
> >> @@ -0,0 +1,9 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O -fdump-tree-optimized" } */
> >> +
> >> +float g(float a, float b)
> >> +{
> >> +  return ~(int)a - ~(int)b;
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-not "~" "optimized" } } */
> >>
> >>
> >>
> >>
> >> --


Re: [PATCH v3] tree-optimization/94899: Remove "+ 0x80000000" in int comparisons

2022-06-20 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 20, 2022 at 09:36:28AM +0200, Richard Biener wrote:
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -2080,6 +2080,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >(if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
> > (op @0 @1
> > +
> > +/* As a special case, X + C < Y + C is the same as (signed) X < (signed) Y
> > +   when C is an unsigned integer constant with only the MSB set, and X and
> > +   Y have types of equal or lower integer conversion rank than C's.  */
> > +(for op (lt le ge gt)
> > + (simplify
> > +  (op (plus @1 INTEGER_CST@0) (plus @2 INTEGER_CST@0))

Can't one just omit the INTEGER_CST part on the second @0?

> > +  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > +   && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +   && wi::only_sign_bit_p (wi::to_wide (@0)))
> > +   (with { tree stype = signed_type_for (TREE_TYPE (@0)); }
> > +(op (convert:stype @1) (convert:stype @2))

As a follow-up, it might be useful to make it work for vector integral types
too,
typedef unsigned V __attribute__((vector_size (4 * sizeof (int))));
#define M __INT_MAX__ + 1U
V foo (V x, V y)
{
  return x + (V) { M, M, M, M } < y + (V) { M, M, M, M };
}
using uniform_integer_cst_p.

Jakub
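
For the scalar case, the identity the pattern relies on can be checked with
a short program.  This is a sketch for 32-bit values only; biased_less,
signed_less and the sample list are invented for the example, and the cast
from uint32_t to int32_t relies on the modulo conversion that GCC documents
for out-of-range values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSB 0x80000000u

/* Unsigned comparison after adding the sign-bit constant to both sides.  */
int
biased_less (uint32_t x, uint32_t y)
{
  return x + MSB < y + MSB;
}

/* The simplified form: compare the same bits as signed values.  */
int
signed_less (uint32_t x, uint32_t y)
{
  return (int32_t) x < (int32_t) y;
}

/* Count disagreements over values around the interesting boundaries.  */
unsigned int
count_mismatches (void)
{
  static const uint32_t samples[] = {
    0u, 1u, 2u, 0x7ffffffeu, 0x7fffffffu,
    0x80000000u, 0x80000001u, 0xfffffffeu, 0xffffffffu
  };
  const size_t n = sizeof samples / sizeof samples[0];
  unsigned int bad = 0;

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      if (biased_less (samples[i], samples[j])
          != signed_less (samples[i], samples[j]))
        bad++;
  return bad;
}
```

Adding the constant flips only the top bit of each operand, which maps the
unsigned order onto the signed order.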



Re: [PATCH 2/2]middle-end: Support recognition of three-way max/min.

2022-06-20 Thread Richard Biener via Gcc-patches
On Thu, 16 Jun 2022, Tamar Christina wrote:

> Hi All,
> 
> This patch adds support for three-way min/max recognition in phi-opts.
> 
> Concretely for e.g.
> 
> #include <stdint.h>
> 
> uint8_t three_min (uint8_t xc, uint8_t xm, uint8_t xy) {
>   uint8_t  xk;
> if (xc < xm) {
> xk = (uint8_t) (xc < xy ? xc : xy);
> } else {
> xk = (uint8_t) (xm < xy ? xm : xy);
> }
> return xk;
> }
> 
> we generate:
> 
>[local count: 1073741824]:
>   _5 = MIN_EXPR ;
>   _7 = MIN_EXPR ;
>   return _7;
> 
> instead of
> 
>   :
>   if (xc_2(D) < xm_3(D))
> goto ;
>   else
> goto ;
> 
>   :
>   xk_5 = MIN_EXPR ;
>   goto ;
> 
>   :
>   xk_6 = MIN_EXPR ;
> 
>   :
>   # xk_1 = PHI 
>   return xk_1;
> 
> The same function also immediately deals with turning a minimization problem
> into a maximization one if the results are inverted.  We do this here since
> doing it in match.pd would end up changing the shape of the BBs and adding
> additional instructions which would prevent various optimizations from 
> working.

Can you explain a bit more?

> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-phiopt.cc (minmax_replacement): Optionally search for the phi
>   sequence of a three-way conditional.
>   (replace_phi_edge_with_variable): Support deferring of BB removal.
>   (tree_ssa_phiopt_worker): Detect diamond phi structure for three-way
>   min/max.
>   (strip_bit_not, invert_minmax_code): New.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/split-path-1.c: Disable phi-opts so we don't optimize
>   code away.
>   * gcc.dg/tree-ssa/minmax-3.c: New test.
>   * gcc.dg/tree-ssa/minmax-4.c: New test.
>   * gcc.dg/tree-ssa/minmax-5.c: New test.
>   * gcc.dg/tree-ssa/minmax-6.c: New test.
>   * gcc.dg/tree-ssa/minmax-7.c: New test.
>   * gcc.dg/tree-ssa/minmax-8.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c
> new file mode 100644
> index 
> ..de3b2e946e81701e3b75f580e6a843695a05786e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-phiopt" } */
> +
> +#include <stdint.h>
> +
> +uint8_t three_min (uint8_t xc, uint8_t xm, uint8_t xy) {
> + uint8_t  xk;
> +if (xc < xm) {
> +xk = (uint8_t) (xc < xy ? xc : xy);
> +} else {
> +xk = (uint8_t) (xm < xy ? xm : xy);
> +}
> +return xk;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 3 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "phiopt1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c
> new file mode 100644
> index 
> ..0b6d667be868c2405eaefd17cb522da44bafa0e2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-phiopt" } */
> +
> +#include <stdint.h>
> +
> +uint8_t three_max (uint8_t xc, uint8_t xm, uint8_t xy) {
> +uint8_t   xk;
> +if (xc > xm) {
> +xk = (uint8_t) (xc > xy ? xc : xy);
> +} else {
> +xk = (uint8_t) (xm > xy ? xm : xy);
> +}
> +return xk;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 0 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 3 "phiopt1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c
> new file mode 100644
> index 
> ..650601a3cc75d09a9e6e54a35f5b9993074f8510
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-phiopt" } */
> +
> +#include <stdint.h>
> +
> +uint8_t three_minmax1 (uint8_t xc, uint8_t xm, uint8_t xy) {
> + uint8_t  xk;
> +if (xc > xm) {
> +xk = (uint8_t) (xc < xy ? xc : xy);
> +} else {
> +xk = (uint8_t) (xm < xy ? xm : xy);
> +}
> +return xk;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "phiopt1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-6.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-6.c
> new file mode 100644
> index 
> ..a628f6d99222958cfd8c410f0e85639e3a49dd4b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-6.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-phiopt" } */
> +
> +#include <stdint.h>
> +
> +uint8_t three_minmax3 (uint8_t xc, uint8_t xm, uint8_t xy) {
> +uint8_t  xk;
> +if (xc > xm) {
> +xk = (uint8_t) (xy < xc ? xc : xy);
> +} else {
> +xk = (uint8_t) (xm < xy ? xm : xy);
> +

Re: [PATCH 1/2]middle-end: Simplify subtract where both arguments are being bitwise inverted.

2022-06-20 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Thu, Jun 16, 2022 at 1:10 PM Tamar Christina via Gcc-patches
>  wrote:
>>
>> Hi All,
>>
>> This adds a match.pd rule that drops the bitwise nots when both arguments
>> to a subtract are inverted, i.e. for:
>>
>> float g(float a, float b)
>> {
>>   return ~(int)a - ~(int)b;
>> }
>>
>> we instead generate
>>
>> float g(float a, float b)
>> {
>>   return (int)a - (int)b;
>> }
>>
>> We already do a limited version of this from the fold_binary fold functions 
>> but
>> this makes a more general version in match.pd that applies more often.
>>
>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>
>> Ok for master?
>>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>> * match.pd: New bit_not rule.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.dg/subnot.c: New test.
>>
>> --- inline copy of patch --
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 
>> a59b6778f661cf9121dd3503f43472871e4da445..51b0a1b562409af535e53828a10c30b8a3e1ae2e
>>  100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -1258,6 +1258,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>  (simplify
>>   (bit_not (plus:c (bit_not @0) @1))
>>   (minus @0 @1))
>> +/* (~X - ~Y) -> X - Y.  */
>> +(simplify
>> + (minus (bit_not @0) (bit_not @1))
>> + (minus @0 @1))
>
> It doesn't seem correct.
>
> (gdb) p/x ~-1 - ~0x80000000
> $3 = 0x80000001
> (gdb) p/x -1 - 0x80000000
> $4 = 0x7fffffff
>
> where I was looking for a case exposing undefined integer overflow.

Yeah, shouldn't it be folding to (minus @1 @0) instead?

  ~X = (-X - 1)
  ~Y = (-Y - 1)

so:

  ~X - ~Y = (-X - 1) - (-Y - 1)
  = -X - 1 + Y + 1
  = Y - X

Richard
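
The derivation can be double-checked in wrapping 32-bit unsigned arithmetic,
where overflow is not a concern.  The sketch below uses invented function
names and a small sample list; it only demonstrates the arithmetic identity
and does not model what match.pd does with signed or undefined-overflow
types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The expression being simplified.  */
uint32_t
sub_of_nots (uint32_t x, uint32_t y)
{
  return ~x - ~y;
}

/* The fold as posted in the patch.  */
uint32_t
proposed_fold (uint32_t x, uint32_t y)
{
  return x - y;
}

/* The fold with the operands swapped, as derived above.  */
uint32_t
corrected_fold (uint32_t x, uint32_t y)
{
  return y - x;
}

/* Count samples where the corrected fold disagrees with the original.  */
unsigned int
count_mismatches (void)
{
  static const uint32_t samples[] = {
    0u, 1u, 3u, 5u, 0x7fffffffu, 0x80000000u, 0xffffffffu
  };
  const size_t n = sizeof samples / sizeof samples[0];
  unsigned int bad = 0;

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      if (sub_of_nots (samples[i], samples[j])
          != corrected_fold (samples[i], samples[j]))
        bad++;
  return bad;
}
```

For X = 5 and Y = 3 the original expression and the corrected fold both
wrap to 0xfffffffe, while the fold as posted gives 2.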


> Richard.
>
>>
>>  /* ~(X - Y) -> ~X + Y.  */
>>  (simplify
>> diff --git a/gcc/testsuite/gcc.dg/subnot.c b/gcc/testsuite/gcc.dg/subnot.c
>> new file mode 100644
>> index 
>> ..d621bacd27bd3d19a010e4c9f831aa77d28bd02d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/subnot.c
>> @@ -0,0 +1,9 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O -fdump-tree-optimized" } */
>> +
>> +float g(float a, float b)
>> +{
>> +  return ~(int)a - ~(int)b;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-not "~" "optimized" } } */
>>
>>
>>
>>
>> --


[PATCH] i386: Add syscall to enable AMX for latest kernels

2022-06-20 Thread Haochen Jiang via Gcc-patches
From: "Jiang, Haochen" 

Hi all,

We need a syscall to enable AMX on kernels >= 5.4.  It is missing from the
current AMX tests, which causes them to fail.

This patch adds the syscall to fix that.

BRs,
Haochen

gcc/testsuite/ChangeLog:

* gcc.target/i386/amx-check.h (request_perm_xtile_data):
New function to check if AMX is usable and enable AMX.
(main): Run test if AMX is usable.
---
 gcc/testsuite/gcc.target/i386/amx-check.h | 24 +++
 1 file changed, 24 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/amx-check.h 
b/gcc/testsuite/gcc.target/i386/amx-check.h
index 434b0e59703..92ed8669304 100644
--- a/gcc/testsuite/gcc.target/i386/amx-check.h
+++ b/gcc/testsuite/gcc.target/i386/amx-check.h
@@ -4,11 +4,22 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #ifdef DEBUG
 #include 
 #endif
 #include "cpuid.h"
 
+#define XFEATURE_XTILECFG 17
+#define XFEATURE_XTILEDATA 18
+#define XFEATURE_MASK_XTILECFG (1 << XFEATURE_XTILECFG)
+#define XFEATURE_MASK_XTILEDATA (1 << XFEATURE_XTILEDATA)
+#define XFEATURE_MASK_XTILE (XFEATURE_MASK_XTILECFG | XFEATURE_MASK_XTILEDATA)
+
+#define ARCH_GET_XCOMP_PERM 0x1022
+#define ARCH_REQ_XCOMP_PERM 0x1023
+
 /* TODO: The tmm emulation is temporary for current
AMX implementation with no tmm regclass, should
be changed in the future. */
@@ -44,6 +55,18 @@ typedef struct __tile
 /* Stride (colum width in byte) used for tileload/store */
 #define _STRIDE 64
 
+/* We need syscall to use amx functions */
+int request_perm_xtile_data()
+{
+  unsigned long bitmask;
+
+  if (syscall (SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA) ||
+      syscall (SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &bitmask))
+return 0;
+
+  return (bitmask & XFEATURE_MASK_XTILE) != 0;
+}
+
 /* Initialize tile config by setting all tmm size to 16x64 */
 void init_tile_config (__tilecfg_u *dst)
 {
@@ -186,6 +209,7 @@ main ()
 #ifdef AMX_BF16
   && __builtin_cpu_supports ("amx-bf16")
 #endif
+  && request_perm_xtile_data ()
   )
 {
   DO_TEST ();
-- 
2.18.2
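
The final test in request_perm_xtile_data accepts the returned bitmask as
soon as either XTILE bit is set.  The sketch below isolates just that bit
arithmetic, with no syscall involved, and contrasts it with a stricter
"both bits" test.  The two function names are hypothetical, and whether
the stricter form is ever needed in practice is an assumption, not
something the patch states.

```c
#include <stdint.h>

#define XFEATURE_XTILECFG 17
#define XFEATURE_XTILEDATA 18
#define XFEATURE_MASK_XTILECFG (UINT64_C (1) << XFEATURE_XTILECFG)
#define XFEATURE_MASK_XTILEDATA (UINT64_C (1) << XFEATURE_XTILEDATA)
#define XFEATURE_MASK_XTILE (XFEATURE_MASK_XTILECFG | XFEATURE_MASK_XTILEDATA)

/* Hypothetical helper: nonzero if at least one XTILE bit is set,
   which is the form of the check used in the patch.  */
int
xtile_any_permitted (uint64_t bitmask)
{
  return (bitmask & XFEATURE_MASK_XTILE) != 0;
}

/* Hypothetical helper: nonzero only if both the XTILECFG and the
   XTILEDATA bits are set.  */
int
xtile_all_permitted (uint64_t bitmask)
{
  return (bitmask & XFEATURE_MASK_XTILE) == XFEATURE_MASK_XTILE;
}
```

A bitmask with only bit 18 set passes the first test but not the second.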



Re: [PATCH 1/2]middle-end: Simplify subtract where both arguments are being bitwise inverted.

2022-06-20 Thread Richard Biener via Gcc-patches
On Thu, Jun 16, 2022 at 1:10 PM Tamar Christina via Gcc-patches
 wrote:
>
> Hi All,
>
> This adds a match.pd rule that drops the bitwise nots when both arguments to a
> subtract are inverted, i.e. for:
>
> float g(float a, float b)
> {
>   return ~(int)a - ~(int)b;
> }
>
> we instead generate
>
> float g(float a, float b)
> {
>   return (int)a - (int)b;
> }
>
> We already do a limited version of this from the fold_binary fold functions
> but this makes a more general version in match.pd that applies more often.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * match.pd: New bit_not rule.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/subnot.c: New test.
>
> --- inline copy of patch --
> diff --git a/gcc/match.pd b/gcc/match.pd
> index a59b6778f661cf9121dd3503f43472871e4da445..51b0a1b562409af535e53828a10c30b8a3e1ae2e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1258,6 +1258,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (simplify
>   (bit_not (plus:c (bit_not @0) @1))
>   (minus @0 @1))
> +/* (~X - ~Y) -> X - Y.  */
> +(simplify
> + (minus (bit_not @0) (bit_not @1))
> + (minus @0 @1))

It doesn't seem correct.

(gdb) p/x ~-1 - ~0x80000000
$3 = 0x80000001
(gdb) p/x -1 - 0x80000000
$4 = 0x7fffffff

where I was looking for a case exposing undefined integer overflow.

Richard.

>
>  /* ~(X - Y) -> ~X + Y.  */
>  (simplify
> diff --git a/gcc/testsuite/gcc.dg/subnot.c b/gcc/testsuite/gcc.dg/subnot.c
> new file mode 100644
> index 
> ..d621bacd27bd3d19a010e4c9f831aa77d28bd02d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/subnot.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-optimized" } */
> +
> +float g(float a, float b)
> +{
> +  return ~(int)a - ~(int)b;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "~" "optimized" } } */
>
>
>
>
> --


Ping: [RFA configure parts] aarch64: Make cc1 handle --with options

2022-06-20 Thread Richard Sandiford via Gcc-patches
Ping for the configure bits

Richard Sandiford via Gcc-patches  writes:
> On aarch64, --with-arch, --with-cpu and --with-tune only have an
> effect on the driver, so “./xgcc -B./ -O3” can give significantly
> different results from “./cc1 -O3”.  --with-arch did have a limited
> effect on ./cc1 in previous releases, although it didn't work
> entirely correctly.
>
> Being of a lazy persuasion, I've got used to ./cc1 selecting SVE for
> --with-arch=armv8.2-a+sve without having to supply an explicit -march,
> so this patch makes ./cc1 emulate the relevant OPTION_DEFAULT_SPECS.
> It relies on Wilco's earlier clean-ups.
>
> The patch makes config.gcc define WITH_FOO_STRING macros for each
> supported --with-foo option.  This could be done only in aarch64-
> specific code, but I thought it could be useful on other targets
> too (and can be safely ignored otherwise).  There didn't seem to
> be any existing and potentially clashing uses of macros with this
> style of name.
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK for the configure
> bits?
>
> Richard
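
For readers unfamiliar with the rule being emulated, here is a minimal
sketch in plain C of the selection logic as described in the aarch64.h
comment: --with-arch and --with-cpu apply only when neither -march nor
-mcpu was given, and --with-tune applies only when neither -mtune nor
-mcpu was given.  The struct and function names are hypothetical, and
this is not the code from the patch; it simply evaluates every condition
against the options the user gave explicitly.

```c
#include <stddef.h>

/* Hypothetical stand-in for the -march/-mcpu/-mtune strings;
   NULL means the option was not given.  */
struct target_strings
{
  const char *arch;
  const char *cpu;
  const char *tune;
};

/* Fill in configure-time defaults.  Each with_* argument is the
   corresponding --with-foo value, or NULL if it was not configured.  */
void
apply_with_defaults (struct target_strings *opts, const char *with_arch,
                     const char *with_cpu, const char *with_tune)
{
  /* Record what the user gave explicitly before filling anything in.  */
  int user_arch_or_cpu = opts->arch != NULL || opts->cpu != NULL;
  int user_tune_or_cpu = opts->tune != NULL || opts->cpu != NULL;

  if (!user_arch_or_cpu)
    {
      opts->arch = with_arch;
      opts->cpu = with_cpu;
    }
  if (!user_tune_or_cpu)
    opts->tune = with_tune;
}
```

With no explicit options and --with-arch=armv8.2-a+sve configured, the
sketch selects that architecture; an explicit -mcpu suppresses all three
defaults, while an explicit -march leaves --with-tune in effect.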
>
>
> gcc/
>   * config.gcc: Define WITH_FOO_STRING macros for each supported
>   --with-foo option.
>   * config/aarch64/aarch64.cc (aarch64_override_options): Emulate
>   OPTION_DEFAULT_SPECS.
>   * config/aarch64/aarch64.h (OPTION_DEFAULT_SPECS): Reference the above.
> ---
>  gcc/config.gcc| 14 ++
>  gcc/config/aarch64/aarch64.cc |  8 
>  gcc/config/aarch64/aarch64.h  |  5 -
>  3 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index cdbefb5b4f5..e039230431c 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -5865,6 +5865,20 @@ else
>   configure_default_options="{ ${t} }"
>  fi
>  
> +for option in $supported_defaults
> +do
> + lc_option=`echo $option | sed s/-/_/g`
> + uc_option=`echo $lc_option | tr a-z A-Z`
> + eval "val=\$with_$lc_option"
> + if test -n "$val"
> + then
> + val="\\\"$val\\\""
> + else
> + val=nullptr
> + fi
> + tm_defines="$tm_defines WITH_${uc_option}_STRING=$val"
> +done
> +
>  if test "$target_cpu_default2" != ""
>  then
>   if test "$target_cpu_default" != ""
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index d21e041eccb..0bc700b81ad 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -18109,6 +18109,14 @@ aarch64_override_options (void)
>if (aarch64_branch_protection_string)
>  aarch64_validate_mbranch_protection (aarch64_branch_protection_string);
>  
> +  /* Emulate OPTION_DEFAULT_SPECS.  */
> +  if (!aarch64_arch_string && !aarch64_cpu_string)
> +aarch64_arch_string = WITH_ARCH_STRING;
> +  if (!aarch64_arch_string && !aarch64_cpu_string)
> +aarch64_cpu_string = WITH_CPU_STRING;
> +  if (!aarch64_cpu_string && !aarch64_tune_string)
> +aarch64_tune_string = WITH_TUNE_STRING;
> +
>/* -mcpu=CPU is shorthand for -march=ARCH_FOR_CPU, -mtune=CPU.
>   If either of -march or -mtune is given, they override their
>   respective component of -mcpu.  */
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 80cfe4b7407..3122dbd7098 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -1267,7 +1267,10 @@ extern enum aarch64_code_model aarch64_cmodel;
>  /* Support for configure-time --with-arch, --with-cpu and --with-tune.
> --with-arch and --with-cpu are ignored if either -mcpu or -march is used.
> --with-tune is ignored if either -mtune or -mcpu is used (but is not
> -   affected by -march).  */
> +   affected by -march).
> +
> +   There is corresponding code in aarch64_override_options that emulates
> +   this behavior when cc1  are invoked directly.  */
>  #define OPTION_DEFAULT_SPECS \
>{"arch", "%{!march=*:%{!mcpu=*:-march=%(VALUE)}}" },   \
>{"cpu",  "%{!march=*:%{!mcpu=*:-mcpu=%(VALUE)}}" },   \


Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to set the lowpart.

2022-06-20 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches  writes:
> Tamar Christina  writes:
>>> -Original Message-
>>> From: Richard Sandiford 
>>> Sent: Monday, June 13, 2022 9:41 AM
>>> To: Tamar Christina 
>>> Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de
>>> Subject: Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to
>>> set the lowpart.
>>> 
>>> Tamar Christina  writes:
>>> > Hi All,
>>> >
>>> > When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs.  One
>>> > for the lowpart and one for the highpart.
>>> >
>>> > The problem with this is that in RTL the lvalue of the RTX is the only
>>> > thing tying the two instructions together.
>>> >
>>> > This means that e.g. combine is unable to try to combine the two
>>> > instructions for setting the lowpart and highpart.
>>> >
>>> > For ISAs that have bit extract instructions we can eliminate one of
>>> > the extracts if, and only if we're setting the entire complex number.
>>> >
>>> > This change changes the expand code when we're setting the entire
>>> > complex number to generate a subreg for the lowpart instead of a
>>> > vec_extract.
>>> >
>>> > This allows us to optimize sequences such as:
>>> >
>>> > _Complex int f(int a, int b) {
>>> > _Complex int t = a + b * 1i;
>>> > return t;
>>> > }
>>> >
>>> > from:
>>> >
>>> > f:
>>> >   bfi x2, x0, 0, 32
>>> >   bfi x2, x1, 32, 32
>>> >   mov x0, x2
>>> >   ret
>>> >
>>> > into:
>>> >
>>> > f:
>>> >   bfi x0, x1, 32, 32
>>> >   ret
>>> >
>>> > I have also confirmed the codegen for x86_64 did not change.
>>> >
>>> > Bootstrapped and regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
>>> > with no issues.
>>> >
>>> > Ok for master?
>>> 
>>> I'm not sure this is endian-safe.  For big-endian it's the imaginary part
>>> that can be written as a subreg.  The real part might be the high part of
>>> a register.
>>> 
>>> Maybe a more general way to handle this would be to add (yet another)
>>> parameter to store_bit_field that indicates that the current value of the
>>> structure is undefined.  That would also be useful in at least one other
>>> caller (from calls.cc).  write_complex_part could then have a similar
>>> parameter, true for the first write and false for the second.
>>
>> Good morning!
>>
>> I've rewritten it using the approach you requested.  I attempted to set the
>> flag in the correct places as well.
>
> Thanks, looks good.  But rather than treat this as a new case, I think
> we can instead generalise this store_bit_field_1 code:
>
>   else if (constant_multiple_p (bitnum, regsize * BITS_PER_UNIT, &regnum)
>  && multiple_p (bitsize, regsize * BITS_PER_UNIT)
>  && known_ge (GET_MODE_BITSIZE (GET_MODE (op0)), bitsize))



>   {
> sub = simplify_gen_subreg (fieldmode, op0, GET_MODE (op0),
>regnum * regsize);
> if (sub)
>   {
> if (reverse)
>   value = flip_storage_order (fieldmode, value);
> emit_move_insn (sub, value);
> return true;
>   }
>   }
>
> so that the multiple_p test is skipped if the structure is undefined.

Actually, we should probably skip the constant_multiple_p test as well.
Keeping it would only be meaningful for little-endian.

simplify_gen_subreg should already do the necessary checks to make sure
that the subreg can be formed (via validate_subreg).
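
To make the endianness point from earlier in the thread concrete, here is
a simplified model in C.  It assumes a _Complex int held in a single
64-bit register whose element order matches memory (real part first), so
on little-endian the real part sits in the low 32 bits and on big-endian
the imaginary part does.  The function names are hypothetical and this is
not GCC code, only a sketch of which part is the lowpart.

```c
#include <stdint.h>

/* Hypothetical helper: shift (from the least significant bit) of the
   real (imag_p == 0) or imaginary (imag_p != 0) part.  */
unsigned
complex_part_shift (int imag_p, int big_endian_p, unsigned part_bits)
{
  /* The real part comes first in memory.  */
  int first_in_memory = !imag_p;
  /* Little-endian puts the first element in the low bits,
     big-endian puts it in the high bits.  */
  int in_low_bits = big_endian_p ? !first_in_memory : first_in_memory;
  return in_low_bits ? 0 : part_bits;
}

/* Hypothetical helper: pack both 32-bit parts into one 64-bit value
   under the model above.  */
uint64_t
pack_complex_int (uint32_t real, uint32_t imag, int big_endian_p)
{
  return ((uint64_t) real << complex_part_shift (0, big_endian_p, 32))
         | ((uint64_t) imag << complex_part_shift (1, big_endian_p, 32));
}
```

Under this model the little-endian case corresponds to the
"bfi x0, x1, 32, 32" sequence quoted above: the real part is already in
the low half and only the imaginary part needs inserting.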

Thanks,
Richard

> Richard
>
>> Bootstrapped and regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
>> with no issues.
>>
>> Ok for master?
>>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>>  * expmed.cc (store_bit_field): Add parameter that indicates if value is
>>  still undefined and if so emit a subreg move instead.
>>  * expr.h (write_complex_part): Likewise.
>>  * expmed.h (store_bit_field): Add new parameter.
>>  * builtins.cc (expand_ifn_atomic_compare_exchange_into_call): Use new
>>  parameter.
>>  (expand_ifn_atomic_compare_exchange): Likewise.
>>  * calls.cc (store_unaligned_arguments_into_pseudos): Likewise.
>>  * emit-rtl.cc (validate_subreg): Likewise.
>>  * expr.cc (emit_group_store): Likewise.
>>  (copy_blkmode_from_reg): Likewise.
>>  (copy_blkmode_to_reg): Likewise.
>>  (clear_storage_hints): Likewise.
>>  (write_complex_part):  Likewise.
>>  (emit_move_complex_parts): Likewise.
>>  (expand_assignment): Likewise.
>>  (store_expr): Likewise.
>>  (store_field): Likewise.
>>  (expand_expr_real_2): Likewise.
>>  * ifcvt.cc (noce_emit_move_insn): Likewise.
>>  * internal-fn.cc (expand_arith_set_overflow): Likewise.
>>  (expand_arith_overflow_result_store): Likewise.
>>  (expand_addsub_overflow): Likewise.
>>  (expand_neg_overflow): Likewise.
>>  (expand_mul_overflow): Likewise.
>>  (expand_arith_overflow): Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * g++.target/aarch64/complex-init.C: New test.
>>
>> --- inline copy of patch ---

Re: [PATCH] rs6000: Improve .machine

2022-06-20 Thread Sebastian Huber

On 04/04/2022 11:31, Sebastian Huber wrote:

Hello Segher,

On 15/03/2022 23:29, Segher Boessenkool wrote:

On Tue, Mar 15, 2022 at 03:29:23PM +0100, Sebastian Huber wrote:

now that the PR104829 is fixed could I back port

Segher Boessenkool (2):
   rs6000: Improve .machine
   rs6000: Do not use rs6000_cpu for .machine ppc and ppc64 (PR104829)

to GCC 10 and 11?

I will do it, in a few days though.

Thanks for your enthusiasm :-),


would now be a good time to back port the fixes or do you want to wait
for the GCC 12 release? It would be nice if the fixes were included in the
GCC 10.4 release.


The GCC 10.4 release candidate will be made on 21st June. May I back
port the two patches today?


--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


Re: [PATCH] vect: Respect slp decision when applying suggested uf [PR105940]

2022-06-20 Thread Richard Biener via Gcc-patches
On Fri, Jun 17, 2022 at 12:53 PM Kewen.Lin  wrote:
>
> Hi,
>
> This follows Richi's suggestion in PR105940.  It aims to avoid
> inconsistent slp decisions between the point where the suggested
> unroll factor is worked out and the point where it is applied.
>
> If the previous slp decision is true when the suggested unroll
> factor is worked out, then when we are applying the unroll factor
> we don't need to start over with slp off if the analysis with slp
> on fails.  On the other hand, if the previous slp decision is
> false when the suggested unroll factor is worked out, then when we
> are applying the unroll factor we can skip the slp handling.
>
> Function vect_is_simple_reduction saves reduction chains for
> subsequent slp analyses.  We have to disable this early, otherwise
> there is an ICE in vectorizable_reduction at the assertion below:
>
>   if (REDUC_GROUP_FIRST_ELEMENT (stmt_info))
> gcc_assert (slp_node
> && REDUC_GROUP_FIRST_ELEMENT (stmt_info)
>== stmt_info);

We ensure this by either decomposing the group in vect_analyze_slp
if the reduction chain doesn't SLP or when we re-try without SLP
by not re-trying:

  /* If there are reduction chains re-trying will fail anyway.  */
  if (! LOOP_VINFO_REDUCTION_CHAINS (loop_vinfo).is_empty ())
return ok;

> Bootstrapped and regtested on x86_64-redhat-linux,
> powerpc64{,le}-linux-gnu and aarch64-linux-gnu.
>
> Also tested with SPEC2017 build with some rs6000 hacking.
>
> Is it ok for trunk?

OK.

Thanks,
Richard.

> BR,
> Kewen
> -
>
> PR tree-optimization/105940
>
> gcc/ChangeLog:
>
> * tree-vect-loop.cc (vect_analyze_loop_2): Add new parameter
> slp_done_for_suggested_uf and adjust with it accordingly.
> (vect_analyze_loop_1): Add new variable slp_done_for_suggested_uf,
> pass it down to vect_analyze_loop_2 for the initial analysis and
> applying suggested unroll factor.
> (vect_is_simple_reduction): Add parameter slp and adjust with it.
> (vect_analyze_scalar_cycles_1): Add parameter slp and pass down.
> (vect_analyze_scalar_cycles): Likewise.
> ---
>  gcc/tree-vect-loop.cc | 101 --
>  1 file changed, 67 insertions(+), 34 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index e05f8e87f7d..ccab68caf9a 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -157,7 +157,7 @@ along with GCC; see the file COPYING3.  If not see
>  static void vect_estimate_min_profitable_iters (loop_vec_info, int *, int *,
> unsigned *);
>  static stmt_vec_info vect_is_simple_reduction (loop_vec_info, stmt_vec_info,
> -  bool *, bool *);
> +  bool *, bool *, bool);
>
>  /* Subroutine of vect_determine_vf_for_stmt that handles only one
> statement.  VECTYPE_MAYBE_SET_P is true if STMT_VINFO_VECTYPE
> @@ -463,10 +463,12 @@ vect_inner_phi_in_double_reduction_p (loop_vec_info loop_vinfo, gphi *phi)
> Examine the cross iteration def-use cycles of scalar variables
> in LOOP.  LOOP_VINFO represents the loop that is now being
> considered for vectorization (can be LOOP, or an outer-loop
> -   enclosing LOOP).  */
> +   enclosing LOOP).  SLP indicates there will be some subsequent
> +   slp analyses or not.  */
>
>  static void
> -vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, class loop *loop)
> +vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, class loop *loop,
> + bool slp)
>  {
>basic_block bb = loop->header;
>tree init, step;
> @@ -545,7 +547,7 @@ vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, class loop *loop)
>
>stmt_vec_info reduc_stmt_info
> = vect_is_simple_reduction (loop_vinfo, stmt_vinfo, &double_reduc,
> -   &reduc_chain);
> +   &reduc_chain, slp);
>if (reduc_stmt_info)
>  {
>   STMT_VINFO_REDUC_DEF (stmt_vinfo) = reduc_stmt_info;
> @@ -616,11 +618,11 @@ vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, class loop *loop)
>   a[i] = i;  */
>
>  static void
> -vect_analyze_scalar_cycles (loop_vec_info loop_vinfo)
> +vect_analyze_scalar_cycles (loop_vec_info loop_vinfo, bool slp)
>  {
>class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>
> -  vect_analyze_scalar_cycles_1 (loop_vinfo, loop);
> +  vect_analyze_scalar_cycles_1 (loop_vinfo, loop, slp);
>
>/* When vectorizing an outer-loop, the inner-loop is executed sequentially.
>   Reductions in such inner-loop therefore have different properties than
> @@ -632,7 +634,7 @@ vect_analyze_scalar_cycles (loop_vec_info loop_vinfo)
>  current checks are too strict.  */
>
>if (loop->inner)
> -vect_analyze_scalar_cycles_1 (loop_vinfo, loop->inner);
> +vect_analyze_scalar_cycles_1 (loop_vinfo, 

Re: [PATCH] [x86] Replace REGNO with reg_or_subregno in pre_reload splitter.

2022-06-20 Thread Uros Bizjak via Gcc-patches
On Mon, Jun 20, 2022 at 9:27 AM liuhongt  wrote:
>
> The patch is similar to [1], but uses reg_or_subregno instead of REGNO.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596804.html
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/105993
> * config/i386/sse.md (sse4_2_pcmpestr): Replace REGNO with
> reg_or_subregno.
> (sse4_2_pcmpistr): Ditto.

OK, but I think that reg_or_subregno should be improved to return
INVALID_REGNUM when the subreg of memory is processed.

Thanks,
Uros.

> ---
>  gcc/config/i386/sse.md | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 64ac490d272..083a7e8885a 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -23250,8 +23250,8 @@ (define_insn_and_split "sse4_2_pcmpestr"
>"&& 1"
>[(const_int 0)]
>  {
> -  int ecx = !find_regno_note (curr_insn, REG_UNUSED, REGNO (operands[0]));
> -  int xmm0 = !find_regno_note (curr_insn, REG_UNUSED, REGNO (operands[1]));
> +  int ecx = !find_regno_note (curr_insn, REG_UNUSED, reg_or_subregno (operands[0]));
> +  int xmm0 = !find_regno_note (curr_insn, REG_UNUSED, reg_or_subregno (operands[1]));
>int flags = !find_regno_note (curr_insn, REG_UNUSED, FLAGS_REG);
>
>if (ecx)
> @@ -23386,8 +23386,8 @@ (define_insn_and_split "sse4_2_pcmpistr"
>"&& 1"
>[(const_int 0)]
>  {
> -  int ecx = !find_regno_note (curr_insn, REG_UNUSED, REGNO (operands[0]));
> -  int xmm0 = !find_regno_note (curr_insn, REG_UNUSED, REGNO (operands[1]));
> +  int ecx = !find_regno_note (curr_insn, REG_UNUSED, reg_or_subregno (operands[0]));
> +  int xmm0 = !find_regno_note (curr_insn, REG_UNUSED, reg_or_subregno (operands[1]));
>int flags = !find_regno_note (curr_insn, REG_UNUSED, FLAGS_REG);
>
>if (ecx)
> --
> 2.18.1
>


Re: [PATCH] libgo: Recognize off64_t / loff_t type definition of musl libc

2022-06-20 Thread Eric Botcazou via Gcc-patches
> aarch64-suse-linux, of course.

Likewise on x86_64-suse-linux.

> > What is the output of
> > 
> > grep loff_t TARGET/libgo/gen-sysinfo.go
> 
> type ___loff_t int64
> type _loff_t int64
> type ___kernel_loff_t int64

Ditto.

-- 
Eric Botcazou




Re: [PATCH v3] tree-optimization/94899: Remove "+ 0x80000000" in int comparisons

2022-06-20 Thread Richard Biener via Gcc-patches
On Fri, Jun 17, 2022 at 10:57 AM Arjun Shankar  wrote:
>
> Expressions of the form "X + CST < Y + CST" where:
>
> * CST is an unsigned integer constant with only the MSB set, and
> * X and Y's types have integer conversion ranks <= CST's
>
> can be simplified to "(signed) X < (signed) Y".
>
> This is because, assuming 32-bit signed numbers,
> (unsigned) INT_MIN + 0x80000000 is 0, and
> (unsigned) INT_MAX + 0x80000000 is UINT_MAX.
>
> i.e. the result increases monotonically with signed input.
>
> This means:
> ((signed) X < (signed) Y) iff (X + 0x80000000 < Y + 0x80000000)
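
The equivalence stated above can be sketched in portable C for the 32-bit
case, where the constant with only the MSB set is 0x80000000.  The helper
names are made up for illustration; the signed interpretation is computed
through a 64-bit intermediate so that the sketch does not depend on
implementation-defined conversions.

```c
#include <stdint.h>

#define SIGN_BIAS UINT32_C (0x80000000)

/* Hypothetical helper: value of V reinterpreted as a two's complement
   32-bit signed number, widened to 64 bits.  */
static int64_t
as_signed (uint32_t v)
{
  return v >= SIGN_BIAS ? (int64_t) v - INT64_C (4294967296) : (int64_t) v;
}

/* Left-hand form: unsigned comparison after adding the bias
   (the additions wrap modulo 2^32).  */
int
biased_less (uint32_t x, uint32_t y)
{
  return (uint32_t) (x + SIGN_BIAS) < (uint32_t) (y + SIGN_BIAS);
}

/* Right-hand form: plain signed comparison.  */
int
signed_less (uint32_t x, uint32_t y)
{
  return as_signed (x) < as_signed (y);
}
```

For example, with x = 0xffffffff (that is, -1) and y = 0 both forms
return 1, and with x = 0x7fffffff and y = 0x80000000 both return 0.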
>
> gcc/
> * match.pd (X + C < Y + C -> (signed) X < (signed) Y, if C is
> 0x80000000): New simplification.
> gcc/testsuite/
> * gcc.dg/pr94899.c: New test.
> ---
>  gcc/match.pd   | 13 +
>  gcc/testsuite/gcc.dg/pr94899.c | 48 ++
>  2 files changed, 61 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr94899.c
> ---
> v2: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589709.html
>
> Notes on v3, based on Richard and Jakub's review comments:
>
> 1. Canonicalized the match expression to avoid having to use ":c".
> 2. Redefined MAGIC in the test to avoid running afoul of 16-bit int
>machines.
>
> Richard has approved this patch for inclusion in GCC 13:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589852.html
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 3e9572e4c9c..ef42611854a 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2080,6 +2080,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
> && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
> (op @0 @1
> +
> +/* As a special case, X + C < Y + C is the same as (signed) X < (signed) Y
> +   when C is an unsigned integer constant with only the MSB set, and X and
> +   Y have types of equal or lower integer conversion rank than C's.  */
> +(for op (lt le ge gt)
> + (simplify
> +  (op (plus @1 INTEGER_CST@0) (plus @2 INTEGER_CST@0))
> +  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +   && TYPE_UNSIGNED (TREE_TYPE (@0))
> +   && wi::only_sign_bit_p (wi::to_wide (@0)))
> +   (with { tree stype = signed_type_for (TREE_TYPE (@0)); }
> +(op (convert:stype @1) (convert:stype @2))
> +
>  /* For equality and subtraction, this is also true with wrapping overflow.  */
>  (for op (eq ne minus)
>   (simplify
> diff --git a/gcc/testsuite/gcc.dg/pr94899.c b/gcc/testsuite/gcc.dg/pr94899.c
> new file mode 100644
> index 000..685201307ec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr94899.c
> @@ -0,0 +1,48 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +typedef __INT16_TYPE__ int16_t;
> +typedef __INT32_TYPE__ int32_t;
> +typedef __UINT16_TYPE__ uint16_t;
> +typedef __UINT32_TYPE__ uint32_t;
> +
> +#define MAGIC (~ (uint32_t) 0 / 2 + 1)
> +
> +int
> +f_i16_i16 (int16_t x, int16_t y)
> +{
> +  return x + MAGIC < y + MAGIC;
> +}
> +
> +int
> +f_i16_i32 (int16_t x, int32_t y)
> +{
> +  return x + MAGIC < y + MAGIC;
> +}
> +
> +int
> +f_i32_i32 (int32_t x, int32_t y)
> +{
> +  return x + MAGIC < y + MAGIC;
> +}
> +
> +int
> +f_u32_i32 (uint32_t x, int32_t y)
> +{
> +  return x + MAGIC < y + MAGIC;
> +}
> +
> +int
> +f_u32_u32 (uint32_t x, uint32_t y)
> +{
> +  return x + MAGIC < y + MAGIC;
> +}
> +
> +int
> +f_i32_i32_sub (int32_t x, int32_t y)
> +{
> +  return x - MAGIC < y - MAGIC;
> +}
> +
> +/* The constants above should have been optimized away.  */
> +/* { dg-final { scan-tree-dump-times "2147483648" 0 "optimized"} } */

It might be possible to test for zero + or - operations instead?

OK otherwise.

Thanks,
Richard.

> --
> 2.35.3
>


Re: [PATCH] libgompd: Fix sizes in OMPD support and add local ICVs functions.

2022-06-20 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 17, 2022 at 01:20:28AM +0200, Mohamed Atef wrote:
> libgomp/ChangeLog
> 
> 2022-06-17  Mohamed Atef  
> 
> * ompd-helper.h (DEREFERENCE, ACCESS_VALUE): New macros.
> (gompd_get_proc_bind): Change the returned value from ompd_word_t
> to const char *.
> (gompd_get_max_task_priority): Fix format.
> (gompd_stringize_gompd_enabled): Removed.
> (gompd_get_gompd_enabled): New function prototype.
> * ompd-helper.c (gompd_get_affinity_format): Call CHECK_RET.
> Fix format in gompd_enabled GET_VALUE.
> (gompd_stringize_gompd_enabled): Removed.
> (gompd_get_nthread, gompd_get_thread_limit, gompd_get_run_sched,
> gompd_get_run_sched_chunk_size, gompd_get_default_device,
> gompd_get_dynamic, gompd_get_max_active_levels, gompd_get_proc_bind,
> gompd_is_final,
> gompd_is_implicit, gompd_get_team_size): New functions.
> (gompd_get_gompd_enabled): Change the returned value from
> ompd_word_t to const char *.
> * ompd-init.c (ompd_process_initialize): Use sizeof_short instead of
> sizeof_long_long in GET_VALUE argument.
> * ompd-support.h: Change type from __UINT64_TYPE__ to unsigned short.
> (GOMPD_FOREACH_ACCESS): Add entries for gomp_task kind
> and final_task and gomp_team nthreads.
> * ompd-support.c (gompd_get_offset, gompd_get_sizeof_member,
> gompd_get_size, OMPD_SECTION): Define.
> (gompd_access_gomp_thread_handle,
> gompd_sizeof_gomp_thread_handle): New variables.
> (gompd_state): Change type from __UINT64_TYPE__ to
> unsigned short.
> (gompd_load): Remove gompd_init_access, gompd_init_sizeof_members,
> gompd_init_sizes, gompd_access_gomp_thread_handle,
> gompd_sizeof_gomp_thread_handle.
> * ompd-icv.c (ompd_get_icv_from_scope): Add thread_handle,
> task_handle and parallel_handle. Fix format in ashandle definition.

Just a nit.  After . there should be 2 spaces instead of one
unless it is at the end of line.

> Call gompd_get_nthread, gompd_get_thread_limit, gompd_get_run_sched,
> gompd_get_run_sched_chunk_size, gompd_get_default_device,
> gompd_get_dynamic, gompd_get_max_active_levels, gompd_get_proc_bind,
> gompd_is_final,
> gompd_is_implicit,
> and gompd_get_team_size.
> (ompd_get_icv_string_from_scope): Fix format in ashandle definition.
> Add task_handle. Call gompd_get_gompd_enabled, and

Here too.

> gompd_get_proc_bind. Remove the call to
> gompd_stringize_gompd_enabled.

> +
> +unsigned short gompd_access_gomp_thread_handle;
> +unsigned short gompd_sizeof_gomp_thread_handle;

This is undesirable, both because you are then mixing
const and non-const objects in OMPD_SECTION if GOMP_NEEDS_THREAD_HANDLE
is defined and because you need to duplicate the stuff in the macros.
I'd suggest
#ifndef GOMP_NEEDS_THREAD_HANDLE
const unsigned short gompd_access_gomp_thread_handle
  __attribute__ ((used)) OMPD_SECTION = 0;
const unsigned short gompd_sizeof_gomp_thread_handle
  __attribute__ ((used)) OMPD_SECTION = 0;
#endif

> +/* Get offset of the member m in struct t.  */
> +#define gompd_get_offset(t, m) \
> +  const unsigned short gompd_access_##t##_##m __attribute__ ((used)) \
> +OMPD_SECTION \
> +  = (unsigned short) offsetof (struct t, m);
> +  GOMPD_FOREACH_ACCESS (gompd_get_offset)
> +#ifdef GOMP_NEEDS_THREAD_HANDLE
> +  gompd_access_gomp_thread_handle __attribute__ ((used)) OMPD_SECTION
> += (unsigned short) offsetof (gomp_thread, handle);
> +#endif

Remove the above 4 lines.

> +#undef gompd_get_offset
> +/* Get size of member m in struct t.  */
> +#define gompd_get_sizeof_member(t, m) \
> +  const unsigned short gompd_sizeof_##t##_##m __attribute__ ((used)) \
> +OMPD_SECTION \
> +  = sizeof (((struct t *) NULL)->m);
> +  GOMPD_FOREACH_ACCESS (gompd_get_sizeof_member)
> +#ifdef GOMP_NEEDS_THREAD_HANDLE
> +  gompd_sizeof_gomp_thread_handle __attribute__ ((used)) OMPD_SECTION
> += sizeof (((struct gomp_thread *) NULL)->handle);
> +#endif

And these.

> +#undef gompd_get_sizeof_member
> +/* Get size of struct t.  */
> +#define gompd_get_size(t) \
> +  const unsigned short gompd_sizeof_##t##_ __attribute__ ((used)) \
> +OMPD_SECTION \
> +  = sizeof (struct t);
> +  GOMPD_SIZES (gompd_get_size)
> +#undef gompd_get_size
>  
> --- a/libgomp/ompd-support.h
> +++ b/libgomp/ompd-support.h
> @@ -67,7 +67,7 @@
>  #endif
>  
>  void gompd_load (void);
> -extern __UINT64_TYPE__ gompd_state;
> +extern unsigned short gompd_state;
>  
>  #define OMPD_ENABLED 0x1

#ifdef GOMP_NEEDS_THREAD_HANDLE
#define gompd_thread_handle_access gompd_access (gomp_thread, handle)
#else
#define gompd_thread_handle_access
#endif

above the following macro.

> @@ -83,7 +83,10 @@ extern __UINT64_TYPE__ gompd_state;
>gompd_access (gomp_thread_pool, threads) \
>gompd_access (gomp_thread, ts) \
>gompd_access (gomp_team_state, team_id) \
> -  gompd_access (gomp_task, icv)
> +  gompd_access (gomp_task, icv) \
> +  gompd_access (gomp_task, kind) \
> +  gompd_access (gomp_task, final_task) \
> +  gompd_access (gomp_team, nthreads)

and add \
  gompd_thread_handle_access
here.

Otherwise LGTM.
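
For anyone following along, the offset and size tables discussed above
rely on the usual "list macro" pattern: one macro lists the
(struct, member) pairs, and it is expanded several times with different
per-entry macros.  Below is a minimal self-contained sketch of that
pattern.  All demo_* names and the struct layout are invented for
illustration and do not come from libgomp.

```c
#include <stddef.h>

/* Hypothetical struct standing in for a runtime data structure.  */
struct demo_task
{
  int kind;
  char final_task;
  long icv;
};

/* One list of (struct, member) pairs, expanded with different macros.  */
#define DEMO_FOREACH_ACCESS(entry) \
  entry (demo_task, kind) \
  entry (demo_task, final_task) \
  entry (demo_task, icv)

/* Emit one constant per entry holding the member's offset.  */
#define demo_get_offset(t, m) \
  const unsigned short demo_access_##t##_##m \
    = (unsigned short) offsetof (struct t, m);
DEMO_FOREACH_ACCESS (demo_get_offset)
#undef demo_get_offset

/* Emit one constant per entry holding the member's size.  */
#define demo_get_sizeof_member(t, m) \
  const unsigned short demo_sizeof_##t##_##m \
    = (unsigned short) sizeof (((struct t *) NULL)->m);
DEMO_FOREACH_ACCESS (demo_get_sizeof_member)
#undef demo_get_sizeof_member
```

Adding a member to the list macro then produces both its offset and its
size constant without touching the two definitions below it.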

  

[PATCH] [x86] Replace REGNO with reg_or_subregno in pre_reload splitter.

2022-06-20 Thread liuhongt via Gcc-patches
The patch is similar to [1], but uses reg_or_subregno instead of REGNO.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596804.html

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

PR target/105993
* config/i386/sse.md (sse4_2_pcmpestr): Replace REGNO with
reg_or_subregno.
(sse4_2_pcmpistr): Ditto.
---
 gcc/config/i386/sse.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 64ac490d272..083a7e8885a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -23250,8 +23250,8 @@ (define_insn_and_split "sse4_2_pcmpestr"
   "&& 1"
   [(const_int 0)]
 {
-  int ecx = !find_regno_note (curr_insn, REG_UNUSED, REGNO (operands[0]));
-  int xmm0 = !find_regno_note (curr_insn, REG_UNUSED, REGNO (operands[1]));
+  int ecx = !find_regno_note (curr_insn, REG_UNUSED, reg_or_subregno (operands[0]));
+  int xmm0 = !find_regno_note (curr_insn, REG_UNUSED, reg_or_subregno (operands[1]));
   int flags = !find_regno_note (curr_insn, REG_UNUSED, FLAGS_REG);
 
   if (ecx)
@@ -23386,8 +23386,8 @@ (define_insn_and_split "sse4_2_pcmpistr"
   "&& 1"
   [(const_int 0)]
 {
-  int ecx = !find_regno_note (curr_insn, REG_UNUSED, REGNO (operands[0]));
-  int xmm0 = !find_regno_note (curr_insn, REG_UNUSED, REGNO (operands[1]));
+  int ecx = !find_regno_note (curr_insn, REG_UNUSED, reg_or_subregno (operands[0]));
+  int xmm0 = !find_regno_note (curr_insn, REG_UNUSED, reg_or_subregno (operands[1]));
   int flags = !find_regno_note (curr_insn, REG_UNUSED, FLAGS_REG);
 
   if (ecx)
-- 
2.18.1



Re: [statistics.cc] Emit asm name of function with -fdump-statistics-asmname

2022-06-20 Thread Richard Biener via Gcc-patches
On Thu, Jun 16, 2022 at 5:05 PM Prathamesh Kulkarni via Gcc-patches
 wrote:
>
> Hi,
> I just noticed that -fdump-statistics supports the asmname sub-option, which
> the documentation describes as follows:
> "If DECL_ASSEMBLER_NAME has been set for a given decl, use that in the dump
> instead of DECL_NAME. Its primary use is ease of use working backward from
> mangled names in the assembly file."
>
> When passed -fdump-statistics-asmname, however, the dump still contains the
> original names of functions. The patch modifies statistics.cc to emit the asm
> name of the function instead. For C++, it also helps to disambiguate
> overloaded function names in the stats dump file.
> I have attached stats dump for a simple test-case.
>
> Does it look OK ?

decl_assembler_name has the side effect of computing and setting the
assembler name if it is not set already; I think that's unwanted.  You
probably want to use it only if DECL_ASSEMBLER_NAME_SET_P, which then
means that when the name gets set later, the dump for that function
will be split between the two names.
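
A minimal sketch of the guarded lookup described above, written as a
self-contained toy rather than against GCC's tree API.  ToyDecl,
toy_decl_assembler_name and toy_dump_name are hypothetical names, and the
mangling used is made up for illustration; only the shape of the check
(use the assembler name only when it is already set) is the point.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for a function decl; NOT GCC's tree type.
struct ToyDecl {
  std::string name;                     // plays the role of DECL_NAME
  std::optional<std::string> asm_name;  // engaged <=> "assembler name set"
};

// Models the side effect under discussion: the first call computes the
// assembler name and stores it in the decl.
const std::string &toy_decl_assembler_name(ToyDecl &d) {
  if (!d.asm_name)
    d.asm_name = "_Z" + std::to_string(d.name.size()) + d.name + "v";
  return *d.asm_name;
}

// Side-effect-free choice for a dump: use the assembler name only if it
// has already been set, otherwise fall back to the plain name.
const std::string &toy_dump_name(const ToyDecl &d, bool want_asmname) {
  if (want_asmname && d.asm_name)
    return *d.asm_name;
  return d.name;
}
```

With this guard, a decl dumped before its assembler name is set is recorded
under its plain name and afterwards under the assembler name, which is the
split mentioned above.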

Richard.

>
> Thanks,
> Prathamesh


Re: [PATCH] Revert "[PATCH] RISC-V: Use new linker emulations for glibc ABI."

2022-06-20 Thread Kito Cheng via Gcc-patches
Generally I agree we should fix that in the GCC driver rather than in
the ld emulation, but I think this should be reverted together with the
-L path fix; otherwise, won't it immediately break multilib on the GNU
toolchain for Linux?

On Wed, Jun 15, 2022 at 4:00 PM Fangrui Song via Gcc-patches wrote:
>
> This reverts commit 37d57ac9a636f2235f9060e84fb8dd7968abd1dc.
>
> The resolution to https://sourceware.org/bugzilla/show_bug.cgi?id=22962
> let GCC pass -m emulation to ld and let the ld emulation configure
> default library paths.  This scheme is problematic:
>
> * It's not ld's business to specify default -L.  Different platforms have
> different opinions on the hierarchy and all other arches work well without
> ld's default -L.
> * If some ABI derived library paths are desired, the compiler driver is in a
> better position to make the decision and traditionally has done this.
> * -m emulation is opaque to the compiler driver.  It doesn't affect -B, so
> data files like crt*.o, libasan_preinit.o, and libtsan_preinit.o are not
> affected.
>
> As is, many platforms just use symlinks to fake the lib64/{ilp32{,f},lp64{,f}}
> hierarchies needed by the GNU ld emulation.  They can always specify -L
> explicitly if they want some ABI derived library paths.  See also the rejected
> https://reviews.llvm.org/D95755
>
> gcc/ChangeLog:
>
> * config/riscv/linux.h (LD_EMUL_SUFFIX): Remove.
> (LINK_SPEC): Remove LD_EMUL_SUFFIX.
> ---
>  gcc/config/riscv/linux.h | 10 +---------
>  1 file changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/gcc/config/riscv/linux.h b/gcc/config/riscv/linux.h
> index 38803723ba9..e0ff6e6a178 100644
> --- a/gcc/config/riscv/linux.h
> +++ b/gcc/config/riscv/linux.h
> @@ -49,16 +49,8 @@ along with GCC; see the file COPYING3.  If not see
>
>  #define CPP_SPEC "%{pthread:-D_REENTRANT}"
>
> -#define LD_EMUL_SUFFIX \
> -  "%{mabi=lp64d:}" \
> -  "%{mabi=lp64f:_lp64f}" \
> -  "%{mabi=lp64:_lp64}" \
> -  "%{mabi=ilp32d:}" \
> -  "%{mabi=ilp32f:_ilp32f}" \
> -  "%{mabi=ilp32:_ilp32}"
> -
>  #define LINK_SPEC "\
> --melf" XLEN_SPEC DEFAULT_ENDIAN_SPEC "riscv" LD_EMUL_SUFFIX " \
> +-melf" XLEN_SPEC DEFAULT_ENDIAN_SPEC "riscv \
>  %{mno-relax:--no-relax} \
>  %{mbig-endian:-EB} \
>  %{mlittle-endian:-EL} \
> --
> 2.36.1.476.g0c4daa206d-goog
>
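
To make the effect of the quoted diff concrete, here is a rough model of how
the -m argument is assembled with and without LD_EMUL_SUFFIX.  It is a sketch,
not the driver's spec machinery: toy_ld_emul_suffix and toy_emulation are
hypothetical helpers, and the xlen and endian strings are assumed to be
whatever XLEN_SPEC and DEFAULT_ENDIAN_SPEC expand to (for example "64" and
"l").  The suffix table mirrors the removed macro.

```cpp
#include <string>

// Mirrors the table in the removed LD_EMUL_SUFFIX macro: the double-float
// ABIs (lp64d, ilp32d) get no suffix, the others get "_<abi>".
std::string toy_ld_emul_suffix(const std::string &mabi) {
  if (mabi == "lp64f")  return "_lp64f";
  if (mabi == "lp64")   return "_lp64";
  if (mabi == "ilp32f") return "_ilp32f";
  if (mabi == "ilp32")  return "_ilp32";
  return "";  // lp64d, ilp32d, or anything the table does not list
}

// Builds the emulation name passed after -m.  with_suffix = true models
// LINK_SPEC before the revert, false models it after the revert.
std::string toy_emulation(const std::string &xlen, const std::string &endian,
                          const std::string &mabi, bool with_suffix) {
  std::string emul = "elf" + xlen + endian + "riscv";
  if (with_suffix)
    emul += toy_ld_emul_suffix(mabi);
  return emul;
}
```

Under these assumptions, -mabi=lp64f selects elf64lriscv_lp64f before the
revert and plain elf64lriscv after it, so any library search paths that
depended on the suffixed emulation have to come from somewhere else, such as
explicit -L options from the driver.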