Re: [PATCH] LoongArch: Replace UNSPEC_FCOPYSIGN with copysign RTL

2023-10-03 Thread chenglulu

LGTM!

Thanks!

On 2023/10/3 at 11:46 AM, Xi Ruoyao wrote:

When I added copysign support for LoongArch (r13-3702), we did not have
a copysign RTL insn, so I had to use UNSPEC to represent the copysign
instruction. Now the copysign RTX code has been added in r14-1586, so
this patch removes those UNSPECs, and it uses the native RTL copysign
insn.

Inspired by rs6000 patch "Cleanup: Replace UNSPEC_COPYSIGN with copysign
RTL" [1] from Michael Meissner.

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631701.html

gcc/ChangeLog:

* config/loongarch/loongarch.md (UNSPEC_FCOPYSIGN): Delete.
(copysign3): Use copysign RTL instead of UNSPEC.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/loongarch.md | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 2b09209945b..9916c741641 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -37,7 +37,6 @@ (define_c_enum "unspec" [
UNSPEC_FCLASS
UNSPEC_FMAX
UNSPEC_FMIN
-  UNSPEC_FCOPYSIGN
UNSPEC_FTINT
UNSPEC_FTINTRM
UNSPEC_FTINTRP
@@ -1130,9 +1129,8 @@ (define_insn "abs2"
  
  (define_insn "copysign3"

[(set (match_operand:ANYF 0 "register_operand" "=f")
-   (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")
- (match_operand:ANYF 2 "register_operand" "f")]
-UNSPEC_FCOPYSIGN))]
+   (copysign:ANYF (match_operand:ANYF 1 "register_operand" "f")
+  (match_operand:ANYF 2 "register_operand" "f")))]
"TARGET_HARD_FLOAT"
"fcopysign.\t%0,%1,%2"
[(set_attr "type" "fcopysign")




[PATCH 2/2] testsuite: Replace many dg-require-thread-fence with dg-require-atomic-exchange

2023-10-03 Thread Hans-Peter Nilsson
> From: Christophe Lyon 
> Date: Tue, 3 Oct 2023 15:20:39 +0200

> The patch passed almost all our CI configurations, except arm-eabi when
> testing with
>  -mthumb/-march=armv6s-m/-mtune=cortex-m0/-mfloat-abi=soft/-mfpu=auto
> where it causes these failures:
> FAIL: 29_atomics/atomic_flag/clear/1.cc -std=gnu++17 (test for excess
> errors)
> UNRESOLVED: 29_atomics/atomic_flag/clear/1.cc -std=gnu++17 compilation
> failed to produce executable
> FAIL: 29_atomics/atomic_flag/cons/value_init.cc -std=gnu++20 (test for
> excess errors)
> UNRESOLVED: 29_atomics/atomic_flag/cons/value_init.cc -std=gnu++20
> compilation failed to produce executable
> FAIL: 29_atomics/atomic_flag/cons/value_init.cc -std=gnu++26 (test for
> excess errors)
> UNRESOLVED: 29_atomics/atomic_flag/cons/value_init.cc -std=gnu++26
> compilation failed to produce executable
> FAIL: 29_atomics/atomic_flag/test_and_set/explicit.cc -std=gnu++17 (test
> for excess errors)
> UNRESOLVED: 29_atomics/atomic_flag/test_and_set/explicit.cc -std=gnu++17
> compilation failed to produce executable
> FAIL: 29_atomics/atomic_flag/test_and_set/implicit.cc -std=gnu++17 (test
> for excess errors)
> UNRESOLVED: 29_atomics/atomic_flag/test_and_set/implicit.cc -std=gnu++17
> compilation failed to produce executable
> 
> The linker error is:
> undefined reference to `__atomic_test_and_set'

Here's 2/2, fixing those regressions.  After code
inspection, it gates those test-case users of
dg-require-thread-fence that actually use not just
atomic-load/store functionality, but a form of
compare-exchange, including the test-and-set cases listed
above as testsuite regressions.

Again, other libstdc++ test-cases should likely also use
this gate, from what I see of "undefined references" in
libstdc++.log.

Tested together with 1/2.
Ok to commit?

(N.B. there was a stray suffix "non-atomic code" in the
subject of 1/2; that's just a typo which will not be
committed.)

-- >8 --
These tests actually use a form of atomic exchange, not just
atomic loading and storing.  Some targets have the latter,
but not the former, yielding linker errors for missing
library functions (and not supported by libatomic).

This change is just for existing uses of
dg-require-thread-fence.  It does not fix any other tests
that should also be gated on dg-require-atomic-exchange.

* testsuite/29_atomics/atomic/compare_exchange_padding.cc,
testsuite/29_atomics/atomic_flag/clear/1.cc,
testsuite/29_atomics/atomic_flag/cons/value_init.cc,
testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc,
testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc,
testsuite/29_atomics/atomic_ref/compare_exchange_padding.cc,
testsuite/29_atomics/atomic_ref/generic.cc,
testsuite/29_atomics/atomic_ref/integral.cc,
testsuite/29_atomics/atomic_ref/pointer.cc: Replace
dg-require-thread-fence with dg-require-atomic-exchange.
---
 .../testsuite/29_atomics/atomic/compare_exchange_padding.cc | 2 +-
 libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc| 2 +-
 .../testsuite/29_atomics/atomic_flag/cons/value_init.cc | 2 +-
 .../testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc   | 2 +-
 .../testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc   | 2 +-
 .../testsuite/29_atomics/atomic_ref/compare_exchange_padding.cc | 2 +-
 libstdc++-v3/testsuite/29_atomics/atomic_ref/generic.cc | 2 +-
 libstdc++-v3/testsuite/29_atomics/atomic_ref/integral.cc| 2 +-
 libstdc++-v3/testsuite/29_atomics/atomic_ref/pointer.cc | 2 +-
 9 files changed, 9 insertions(+), 9 deletions(-)

diff --git 
a/libstdc++-v3/testsuite/29_atomics/atomic/compare_exchange_padding.cc 
b/libstdc++-v3/testsuite/29_atomics/atomic/compare_exchange_padding.cc
index 01f7475631e6..14698bb82456 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/compare_exchange_padding.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/compare_exchange_padding.cc
@@ -1,5 +1,5 @@
 // { dg-do run { target c++20 } }
-// { dg-require-thread-fence "" }
+// { dg-require-atomic-exchange "" }
 // { dg-add-options libatomic }
 
 #include 
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc 
b/libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc
index 89ed381fe057..0d8a11899ef1 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc
@@ -1,5 +1,5 @@
 // { dg-do run { target c++11 } }
-// { dg-require-thread-fence "" }
+// { dg-require-atomic-exchange "" }
 
 // Copyright (C) 2009-2023 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/value_init.cc 
b/libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/value_init.cc
index f3f38b54dbcd..f95818532107 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/value_init.cc
+++ 

[PATCH 1/2] testsuite: Add dg-require-atomic-exchange non-atomic code

2023-10-03 Thread Hans-Peter Nilsson
> From: Christophe Lyon 
> Date: Tue, 3 Oct 2023 15:20:39 +0200

> Maybe we need a new variant of dg-require-thread-fence ?

Yes: many of the dg-require-thread-fence users need
something stronger.  Tested arm-eabi together with the next
patch (2/2) with
RUNTESTFLAGS=--target_board=arm-sim/-mthumb/-march=armv6s-m/-mtune=cortex-m0/-mfloat-abi=soft/-mfpu=auto\
conformance.exp=29_atomics/\*

(Incidentally, the patch context shows
dg-require-atomic-builtins, which is a misnomer: it should
rather be named "dg-require-lock-atomic-builtins-free".)

Ok to commit?

-- >8 --
Some targets (armv6) support inline atomic load and store,
i.e. dg-require-thread-fence matches, but not atomic operations
like atomic exchange.  This directive will replace uses of
dg-require-thread-fence where an atomic exchange operation
is actually used.

* testsuite/lib/dg-options.exp (dg-require-atomic-exchange): New proc.
* testsuite/lib/libstdc++.exp (check_v3_target_atomic_exchange): Ditto.
---
 libstdc++-v3/testsuite/lib/dg-options.exp |  9 ++
 libstdc++-v3/testsuite/lib/libstdc++.exp  | 35 +++
 2 files changed, 44 insertions(+)

diff --git a/libstdc++-v3/testsuite/lib/dg-options.exp 
b/libstdc++-v3/testsuite/lib/dg-options.exp
index 84ad0c65330b..b13c2f244c63 100644
--- a/libstdc++-v3/testsuite/lib/dg-options.exp
+++ b/libstdc++-v3/testsuite/lib/dg-options.exp
@@ -133,6 +133,15 @@ proc dg-require-thread-fence { args } {
 return
 }
 
+proc dg-require-atomic-exchange { args } {
+if { ![ check_v3_target_atomic_exchange ] } {
+   upvar dg-do-what dg-do-what
+   set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
+   return
+}
+return
+}
+
 proc dg-require-atomic-builtins { args } {
 if { ![ check_v3_target_atomic_builtins ] } {
upvar dg-do-what dg-do-what
diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp 
b/libstdc++-v3/testsuite/lib/libstdc++.exp
index 608056e5068e..481f81711074 100644
--- a/libstdc++-v3/testsuite/lib/libstdc++.exp
+++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
@@ -1221,6 +1221,41 @@ proc check_v3_target_thread_fence { } {
 }]
 }
 
+proc check_v3_target_atomic_exchange { } {
+return [check_v3_target_prop_cached et_atomic_exchange {
+   global cxxflags
+   global DEFAULT_CXXFLAGS
+
+   # Set up and link a C++11 test program that depends
+   # on atomic exchange be available for "int".
+   set src atomic_exchange[pid].cc
+
+   set f [open $src "w"]
+   puts $f "
+int i, j, k;
+   int main() {
+   __atomic_exchange (&i, &j, &k, __ATOMIC_SEQ_CST);
+   return 0;
+   }"
+   close $f
+
+   set cxxflags_saved $cxxflags
+   set cxxflags "$cxxflags $DEFAULT_CXXFLAGS -Werror -std=gnu++11"
+
+   set lines [v3_target_compile $src /dev/null executable ""]
+   set cxxflags $cxxflags_saved
+   file delete $src
+
+   if [string match "" $lines] {
+   # No error message, linking succeeded.
+   return 1
+   } else {
+   verbose "check_v3_target_atomic_exchange: compilation failed" 2
+   return 0
+   }
+}]
+}
+
 # Return 1 if atomics_bool and atomic_int are always lock-free, 0 otherwise.
 proc check_v3_target_atomic_builtins { } {
 return [check_v3_target_prop_cached et_atomic_builtins {
-- 
2.30.2


> 
> Thanks,
> 
> Christophe
> 
> 
> Ok to commit?
> >
> > -- >8 --
> > Make __atomic_test_and_set consistent with other __atomic_ and __sync_
> > builtins: call a matching library function instead of emitting
> > non-atomic code when the target has no direct insn support.
> >
> > There's special-case code handling targetm.atomic_test_and_set_trueval
> > != 1 trying a modified maybe_emit_sync_lock_test_and_set.  Previously,
> > if that worked but its matching emit_store_flag_force returned NULL,
> > we'd segfault later on.  Now that the caller handles NULL, gcc_assert
> > here instead.
> >
> > While the referenced PR:s are ARM-specific, the issue is general.
> >
> > PR target/107567
> > PR target/109166
> > * builtins.cc (expand_builtin) :
> > Handle failure from expand_builtin_atomic_test_and_set.
> > * optabs.cc (expand_atomic_test_and_set): When all attempts fail to
> > generate atomic code through target support, return NULL
> > instead of emitting non-atomic code.  Also, for code handling
> > targetm.atomic_test_and_set_trueval != 1, gcc_assert result
> > from calling emit_store_flag_force instead of returning NULL.
> > ---
> >  gcc/builtins.cc |  5 -
> >  gcc/optabs.cc   | 22 +++---
> >  2 files changed, 11 insertions(+), 16 deletions(-)
> >
> > diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> > index 6e4274bb2a4e..40dfd36a3197 100644
> > --- a/gcc/builtins.cc
> > +++ b/gcc/builtins.cc
> > @@ -8387,7 +8387,10 @@ expand_builtin (tree exp, rtx target, rtx
> > subtarget, machine_mode mode,
> >break;
> >
> >  case 

Re: [PATCH 1/2] libstdc++: Define _versioned_namespace in xmethods.py

2023-10-03 Thread Jonathan Wakely
On Tue, 3 Oct 2023, 18:19 Tom Tromey,  wrote:

> flake8 pointed out that is_specialization_of in xmethods.py looks at a
> global that wasn't added to the file.  This patch corrects the
> oversight.
>

OK, thanks



>
> libstdc++-v3/ChangeLog:
>
> * python/libstdcxx/v6/xmethods.py (_versioned_namespace):
> Define.
> ---
>  libstdc++-v3/python/libstdcxx/v6/xmethods.py | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/libstdc++-v3/python/libstdcxx/v6/xmethods.py
> b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
> index 844c8a2105a..8ccf57c4d6b 100644
> --- a/libstdc++-v3/python/libstdcxx/v6/xmethods.py
> +++ b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
> @@ -28,6 +28,8 @@ def get_bool_type():
>  def get_std_size_type():
>  return gdb.lookup_type('std::size_t')
>
> +_versioned_namespace = '__8::'
> +
>  def is_specialization_of(x, template_name):
>  """
>  Test whether a type is a specialization of the named class template.
> --
> 2.40.1
>
>


Re: [PATCH 2/2] libstdc++: _versioned_namespace is always non-None

2023-10-03 Thread Jonathan Wakely
On Tue, 3 Oct 2023, 23:55 Jonathan Wakely,  wrote:

>
>
> On Tue, 3 Oct 2023, 19:27 Tom Tromey,  wrote:
>
>> Some code in the pretty-printers seems to assume that the
>> _versioned_namespace global might be None (or the empty string).
>> However, this doesn't occur, as the variable is never reassigned.
>>
>
> ok for trunk, but we should just remove that bit from xmethods.py as the
> variable is never even set in that file.
>

Oh I see you already addressed that in another patch :-)


>
>
>> libstdc++-v3/ChangeLog:
>>
>> * python/libstdcxx/v6/printers.py: Assume that
>> _versioned_namespace is non-None.
>> * python/libstdcxx/v6/xmethods.py (is_specialization_of):
>> Assume that _versioned_namespace is non-None.
>> ---
>>  libstdc++-v3/python/libstdcxx/v6/printers.py | 15 ++-
>>  libstdc++-v3/python/libstdcxx/v6/xmethods.py |  3 +--
>>  2 files changed, 7 insertions(+), 11 deletions(-)
>>
>> diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py
>> b/libstdc++-v3/python/libstdcxx/v6/printers.py
>> index 23efbd171ec..e370551cbe1 100644
>> --- a/libstdc++-v3/python/libstdcxx/v6/printers.py
>> +++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
>> @@ -139,7 +139,7 @@ def lookup_templ_spec(templ, *args):
>>  except gdb.error as e:
>>  # Type not found, try again in versioned namespace.
>>  global _versioned_namespace
>> -if _versioned_namespace and _versioned_namespace not in templ:
>> +if _versioned_namespace not in templ:
>>  t = t.replace('::', '::' + _versioned_namespace, 1)
>>  try:
>>  return gdb.lookup_type(t)
>> @@ -211,16 +211,13 @@ def is_specialization_of(x, template_name):
>>  global _versioned_namespace
>>  if isinstance(x, gdb.Type):
>>  x = x.tag
>> -if _versioned_namespace:
>> -template_name = '(%s)?%s' % (_versioned_namespace, template_name)
>> +template_name = '(%s)?%s' % (_versioned_namespace, template_name)
>>  return re.match('^std::%s<.*>$' % template_name, x) is not None
>>
>>
>>  def strip_versioned_namespace(typename):
>>  global _versioned_namespace
>> -if _versioned_namespace:
>> -return typename.replace(_versioned_namespace, '')
>> -return typename
>> +return typename.replace(_versioned_namespace, '')
>>
>>
>>  def strip_inline_namespaces(type_str):
>> @@ -2355,7 +2352,7 @@ class Printer(object):
>>  # Add a name using _GLIBCXX_BEGIN_NAMESPACE_VERSION.
>>  def add_version(self, base, name, function):
>>  self.add(base + name, function)
>> -if _versioned_namespace and '__cxx11' not in base:
>> +if '__cxx11' not in base:
>>  vbase = re.sub('^(std|__gnu_cxx)::', r'\g<0>%s' %
>> _versioned_namespace, base)
>>  self.add(vbase + name, function)
>> @@ -2527,7 +2524,7 @@ def add_one_template_type_printer(obj, name,
>> defargs):
>>  printer = TemplateTypePrinter('std::__debug::' + name, defargs)
>>  gdb.types.register_type_printer(obj, printer)
>>
>> -if _versioned_namespace and '__cxx11' not in name:
>> +if '__cxx11' not in name:
>>  # Add second type printer for same type in versioned namespace:
>>  ns = 'std::' + _versioned_namespace
>>  # PR 86112 Cannot use dict comprehension here:
>> @@ -2628,7 +2625,7 @@ class FilteringTypePrinter(object):
>>  def add_one_type_printer(obj, template, name, targ1=None):
>>  printer = FilteringTypePrinter('std::' + template, 'std::' + name,
>> targ1)
>>  gdb.types.register_type_printer(obj, printer)
>> -if _versioned_namespace and '__cxx11' not in template:
>> +if '__cxx11' not in template:
>>  ns = 'std::' + _versioned_namespace
>>  printer = FilteringTypePrinter(ns + template, ns + name, targ1)
>>  gdb.types.register_type_printer(obj, printer)
>> diff --git a/libstdc++-v3/python/libstdcxx/v6/xmethods.py
>> b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
>> index 8ccf57c4d6b..42e60eb57b1 100644
>> --- a/libstdc++-v3/python/libstdcxx/v6/xmethods.py
>> +++ b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
>> @@ -39,8 +39,7 @@ def is_specialization_of(x, template_name):
>>  """
>>  if isinstance(x, gdb.Type):
>>  x = x.tag
>> -if _versioned_namespace:
>> -template_name = '(%s)?%s' % (_versioned_namespace, template_name)
>> +template_name = '(%s)?%s' % (_versioned_namespace, template_name)
>>  return re.match(r'^std::(__\d::)?%s<.*>$' % template_name, x) is not
>> None
>>
>>  class LibStdCxxXMethod(gdb.xmethod.XMethod):
>> --
>> 2.40.1
>>
>>


Re: [PATCH 2/2] libstdc++: _versioned_namespace is always non-None

2023-10-03 Thread Jonathan Wakely
On Tue, 3 Oct 2023, 19:27 Tom Tromey,  wrote:

> Some code in the pretty-printers seems to assume that the
> _versioned_namespace global might be None (or the empty string).
> However, this doesn't occur, as the variable is never reassigned.
>

ok for trunk, but we should just remove that bit from xmethods.py as the
variable is never even set in that file.



> libstdc++-v3/ChangeLog:
>
> * python/libstdcxx/v6/printers.py: Assume that
> _versioned_namespace is non-None.
> * python/libstdcxx/v6/xmethods.py (is_specialization_of):
> Assume that _versioned_namespace is non-None.
> ---
>  libstdc++-v3/python/libstdcxx/v6/printers.py | 15 ++-
>  libstdc++-v3/python/libstdcxx/v6/xmethods.py |  3 +--
>  2 files changed, 7 insertions(+), 11 deletions(-)
>
> diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py
> b/libstdc++-v3/python/libstdcxx/v6/printers.py
> index 23efbd171ec..e370551cbe1 100644
> --- a/libstdc++-v3/python/libstdcxx/v6/printers.py
> +++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
> @@ -139,7 +139,7 @@ def lookup_templ_spec(templ, *args):
>  except gdb.error as e:
>  # Type not found, try again in versioned namespace.
>  global _versioned_namespace
> -if _versioned_namespace and _versioned_namespace not in templ:
> +if _versioned_namespace not in templ:
>  t = t.replace('::', '::' + _versioned_namespace, 1)
>  try:
>  return gdb.lookup_type(t)
> @@ -211,16 +211,13 @@ def is_specialization_of(x, template_name):
>  global _versioned_namespace
>  if isinstance(x, gdb.Type):
>  x = x.tag
> -if _versioned_namespace:
> -template_name = '(%s)?%s' % (_versioned_namespace, template_name)
> +template_name = '(%s)?%s' % (_versioned_namespace, template_name)
>  return re.match('^std::%s<.*>$' % template_name, x) is not None
>
>
>  def strip_versioned_namespace(typename):
>  global _versioned_namespace
> -if _versioned_namespace:
> -return typename.replace(_versioned_namespace, '')
> -return typename
> +return typename.replace(_versioned_namespace, '')
>
>
>  def strip_inline_namespaces(type_str):
> @@ -2355,7 +2352,7 @@ class Printer(object):
>  # Add a name using _GLIBCXX_BEGIN_NAMESPACE_VERSION.
>  def add_version(self, base, name, function):
>  self.add(base + name, function)
> -if _versioned_namespace and '__cxx11' not in base:
> +if '__cxx11' not in base:
>  vbase = re.sub('^(std|__gnu_cxx)::', r'\g<0>%s' %
> _versioned_namespace, base)
>  self.add(vbase + name, function)
> @@ -2527,7 +2524,7 @@ def add_one_template_type_printer(obj, name,
> defargs):
>  printer = TemplateTypePrinter('std::__debug::' + name, defargs)
>  gdb.types.register_type_printer(obj, printer)
>
> -if _versioned_namespace and '__cxx11' not in name:
> +if '__cxx11' not in name:
>  # Add second type printer for same type in versioned namespace:
>  ns = 'std::' + _versioned_namespace
>  # PR 86112 Cannot use dict comprehension here:
> @@ -2628,7 +2625,7 @@ class FilteringTypePrinter(object):
>  def add_one_type_printer(obj, template, name, targ1=None):
>  printer = FilteringTypePrinter('std::' + template, 'std::' + name,
> targ1)
>  gdb.types.register_type_printer(obj, printer)
> -if _versioned_namespace and '__cxx11' not in template:
> +if '__cxx11' not in template:
>  ns = 'std::' + _versioned_namespace
>  printer = FilteringTypePrinter(ns + template, ns + name, targ1)
>  gdb.types.register_type_printer(obj, printer)
> diff --git a/libstdc++-v3/python/libstdcxx/v6/xmethods.py
> b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
> index 8ccf57c4d6b..42e60eb57b1 100644
> --- a/libstdc++-v3/python/libstdcxx/v6/xmethods.py
> +++ b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
> @@ -39,8 +39,7 @@ def is_specialization_of(x, template_name):
>  """
>  if isinstance(x, gdb.Type):
>  x = x.tag
> -if _versioned_namespace:
> -template_name = '(%s)?%s' % (_versioned_namespace, template_name)
> +template_name = '(%s)?%s' % (_versioned_namespace, template_name)
>  return re.match(r'^std::(__\d::)?%s<.*>$' % template_name, x) is not
> None
>
>  class LibStdCxxXMethod(gdb.xmethod.XMethod):
> --
> 2.40.1
>
>


Re: [PATCH v2] libiberty: Use posix_spawn in pex-unix when available.

2023-10-03 Thread Ian Lance Taylor
On Tue, Oct 3, 2023 at 12:04 PM Brendan Shanks  wrote:
>
> +  ret = posix_spawnattr_init ();
> +  if (ret) { *err = ret; *errmsg = "posix_spawnattr_init"; goto exit; }

Sorry, but let's keep the formatting used in the rest of the file.

if (ret != 0)
  {
    *err = ret;
    *errmsg = "posix_spawnattr_init";
    goto exit;
  }



> +  if (in != STDIN_FILE_NO)
> +close (in);
> +  if (out != STDOUT_FILE_NO)
> +close (out);
> +  if (errdes != STDERR_FILE_NO)
> +close (errdes);

Not a big deal, but the other version of this function checks the
error result of close.

Ian


[RFC gcc13 backport 1/3] RISC-V: Add Ztso atomic mappings

2023-10-03 Thread Patrick O'Neill
The RISC-V Ztso extension currently has no effect on generated code.
With the additional ordering constraints guaranteed by Ztso, we can emit
more optimized atomic mappings than the RVWMO mappings.

This PR implements the Ztso psABI mappings[1].

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/391

2023-08-08 Patrick O'Neill 

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add Ztso and mark Ztso as
dependent on 'a' extension.
* config/riscv/riscv-opts.h (MASK_ZTSO): New mask.
(TARGET_ZTSO): New target.
* config/riscv/riscv.cc (riscv_memmodel_needs_amo_acquire): Add
Ztso case.
(riscv_memmodel_needs_amo_release): Add Ztso case.
(riscv_print_operand): Add Ztso case for LR/SC annotations.
* config/riscv/riscv.md: Import sync-rvwmo.md and sync-ztso.md.
* config/riscv/riscv.opt: Add Ztso target variable.
* config/riscv/sync.md (mem_thread_fence_1): Expand to RVWMO or
Ztso specific insn.
(atomic_load): Expand to RVWMO or Ztso specific insn.
(atomic_store): Expand to RVWMO or Ztso specific insn.
* config/riscv/sync-rvwmo.md: New file. Separate out RVWMO
specific load/store/fence mappings.
* config/riscv/sync-ztso.md: New file. Separate out Ztso
specific load/store/fence mappings.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-ztso-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: New test.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-1.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-2.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-3.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-4.c: New test.
* gcc.target/riscv/amo-table-ztso-fence-5.c: New test.
* gcc.target/riscv/amo-table-ztso-load-1.c: New test.
* gcc.target/riscv/amo-table-ztso-load-2.c: New test.
* gcc.target/riscv/amo-table-ztso-load-3.c: New test.
* gcc.target/riscv/amo-table-ztso-store-1.c: New test.
* gcc.target/riscv/amo-table-ztso-store-2.c: New test.
* gcc.target/riscv/amo-table-ztso-store-3.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: New test.

Signed-off-by: Patrick O'Neill 
---
 gcc/common/config/riscv/riscv-common.cc   |   6 +
 gcc/config/riscv/riscv-opts.h |   4 +
 gcc/config/riscv/riscv.cc |  20 +++-
 gcc/config/riscv/riscv.md |   2 +
 gcc/config/riscv/riscv.opt|   3 +
 gcc/config/riscv/sync-rvwmo.md|  96 +++
 gcc/config/riscv/sync-ztso.md |  80 +
 gcc/config/riscv/sync.md  | 111 ++
 .../riscv/amo-table-ztso-amo-add-1.c  |  15 +++
 .../riscv/amo-table-ztso-amo-add-2.c  |  15 +++
 .../riscv/amo-table-ztso-amo-add-3.c  |  15 +++
 .../riscv/amo-table-ztso-amo-add-4.c  |  15 +++
 .../riscv/amo-table-ztso-amo-add-5.c  |  15 +++
 .../riscv/amo-table-ztso-compare-exchange-1.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-2.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-3.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-4.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-5.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-6.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-7.c |  10 ++
 .../gcc.target/riscv/amo-table-ztso-fence-1.c |  14 +++
 .../gcc.target/riscv/amo-table-ztso-fence-2.c |  14 +++
 .../gcc.target/riscv/amo-table-ztso-fence-3.c |  14 +++
 .../gcc.target/riscv/amo-table-ztso-fence-4.c |  14 +++
 .../gcc.target/riscv/amo-table-ztso-fence-5.c |  15 +++
 .../gcc.target/riscv/amo-table-ztso-load-1.c  |  16 +++
 .../gcc.target/riscv/amo-table-ztso-load-2.c  |  16 +++
 .../gcc.target/riscv/amo-table-ztso-load-3.c  |  17 +++
 .../gcc.target/riscv/amo-table-ztso-store-1.c |  16 +++
 

[RFC gcc13 backport 2/3] RISC-V: Specify -mabi for ztso testcases

2023-10-03 Thread Patrick O'Neill
On rv32 targets, this patch fixes Ztso testcase errors like this:
cc1: error: ABI requires '-march=rv32'

2023-08-11 Patrick O'Neill 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-ztso-amo-add-1.c: Add -mabi=lp64d
to dg-options.
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Ditto.

Signed-off-by: Patrick O'Neill 
---
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-3.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-4.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-5.c   | 2 +-
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-1.c| 2 +-
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-2.c| 2 +-
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-3.c| 2 +-
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-4.c| 2 +-
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-5.c| 2 +-
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-6.c| 2 +-
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-7.c| 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-fence-1.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-fence-2.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-fence-3.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-fence-4.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-fence-5.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-load-1.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-load-2.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-load-3.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-store-1.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-store-2.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-store-3.c | 2 +-
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c | 2 +-
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c | 2 +-
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c | 2 +-
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c | 2 +-
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c | 2 +-
 28 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
index a7097e9aab9..a88d08eb3f4 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* Verify that atomic op mappings match the Ztso suggested mapping.  */
-/* { dg-options "-march=rv64id_ztso -O3" } */
+/* { dg-options "-march=rv64id_ztso -mabi=lp64d -O3" } */
 /* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c
index 8e993903439..ebd240f9dd2 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c
+++ 

[RFC gcc13 backport 3/3] [RISCV][committed] Remove spurious newline in ztso sequence

2023-10-03 Thread Patrick O'Neill
From: Jeff Law 

amo-table-ztso-load-3 failed on the coordination branch after merging up the Ztso
changes due to a spurious newline in the output causing scan-function-body to fail.
There's probably an over-zealous .* or similar regexp in the framework.  I
didn't see it in a quick scan, but could have easily missed it.

Regardless, fixing the extraneous newline is easy :-)

gcc/
* config/riscv/sync-ztso.md (atomic_load_ztso): Avoid extraneous
newline.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/sync-ztso.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/sync-ztso.md b/gcc/config/riscv/sync-ztso.md
index 91c2a48c069..ed94471b96b 100644
--- a/gcc/config/riscv/sync-ztso.md
+++ b/gcc/config/riscv/sync-ztso.md
@@ -52,7 +52,7 @@
 
 if (model == MEMMODEL_SEQ_CST)
   return "fence\trw,rw\;"
-"l\t%0,%1\;";
+"l\t%0,%1";
 else
   return "l\t%0,%1";
   }
@@ -77,4 +77,4 @@
   return "s\t%z1,%0";
   }
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
\ No newline at end of file
+   (set (attr "length") (const_int 8))])
-- 
2.34.1



[RFC gcc13 backport 0/3] Add Ztso atomic mappings

2023-10-03 Thread Patrick O'Neill
I vaguely recall some discussion about backporting the Ztso mappings
along with the RVWMO mappings. Now that the RVWMO mappings have been
backported for 13.3, is there interest in also backporting the Ztso
mappings?

Tested for regressions using rv32gc/rv64gc glibc.

Jeff Law (1):
  [RISCV][committed] Remove spurious newline in ztso sequence

Patrick O'Neill (2):
  RISC-V: Add Ztso atomic mappings
  RISC-V: Specify -mabi for ztso testcases

 gcc/common/config/riscv/riscv-common.cc   |   6 +
 gcc/config/riscv/riscv-opts.h |   4 +
 gcc/config/riscv/riscv.cc |  20 +++-
 gcc/config/riscv/riscv.md |   2 +
 gcc/config/riscv/riscv.opt|   3 +
 gcc/config/riscv/sync-rvwmo.md|  96 +++
 gcc/config/riscv/sync-ztso.md |  80 +
 gcc/config/riscv/sync.md  | 111 ++
 .../riscv/amo-table-ztso-amo-add-1.c  |  15 +++
 .../riscv/amo-table-ztso-amo-add-2.c  |  15 +++
 .../riscv/amo-table-ztso-amo-add-3.c  |  15 +++
 .../riscv/amo-table-ztso-amo-add-4.c  |  15 +++
 .../riscv/amo-table-ztso-amo-add-5.c  |  15 +++
 .../riscv/amo-table-ztso-compare-exchange-1.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-2.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-3.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-4.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-5.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-6.c |  10 ++
 .../riscv/amo-table-ztso-compare-exchange-7.c |  10 ++
 .../gcc.target/riscv/amo-table-ztso-fence-1.c |  14 +++
 .../gcc.target/riscv/amo-table-ztso-fence-2.c |  14 +++
 .../gcc.target/riscv/amo-table-ztso-fence-3.c |  14 +++
 .../gcc.target/riscv/amo-table-ztso-fence-4.c |  14 +++
 .../gcc.target/riscv/amo-table-ztso-fence-5.c |  15 +++
 .../gcc.target/riscv/amo-table-ztso-load-1.c  |  16 +++
 .../gcc.target/riscv/amo-table-ztso-load-2.c  |  16 +++
 .../gcc.target/riscv/amo-table-ztso-load-3.c  |  17 +++
 .../gcc.target/riscv/amo-table-ztso-store-1.c |  16 +++
 .../gcc.target/riscv/amo-table-ztso-store-2.c |  16 +++
 .../gcc.target/riscv/amo-table-ztso-store-3.c |  17 +++
 .../riscv/amo-table-ztso-subword-amo-add-1.c  |  10 ++
 .../riscv/amo-table-ztso-subword-amo-add-2.c  |  10 ++
 .../riscv/amo-table-ztso-subword-amo-add-3.c  |  10 ++
 .../riscv/amo-table-ztso-subword-amo-add-4.c  |  10 ++
 .../riscv/amo-table-ztso-subword-amo-add-5.c  |  10 ++
 36 files changed, 612 insertions(+), 74 deletions(-)
 create mode 100644 gcc/config/riscv/sync-rvwmo.md
 create mode 100644 gcc/config/riscv/sync-ztso.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-ztso-compare-exchange-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-ztso-compare-exchange-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-ztso-compare-exchange-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-ztso-compare-exchange-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-ztso-compare-exchange-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-ztso-compare-exchange-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-ztso-compare-exchange-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-ztso-fence-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-ztso-fence-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-ztso-fence-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-ztso-fence-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-ztso-fence-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-ztso-load-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-ztso-load-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-ztso-load-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-ztso-store-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-ztso-store-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-ztso-store-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c

--
2.34.1



Re: gcc-patches From rewriting mailman settings (Was: [Linaro-TCWG-CI] gcc patch #75674: FAIL: 68 regressions)

2023-10-03 Thread Gerald Pfeifer
On Tue, 19 Sep 2023, Mark Wielaard wrote:
>> Although there were some positive responses (on list and on irc) it is
>> sometimes hard to know if there really is consensus for these kinds of
>> infrastructure tweaks. But I believe there is at least no sustained
>> opposition to changing the gcc-patches mailman setting as proposed
>> above.
> This change is now done for gcc-patches.

Yeah, yeah, yeah. Thank you!

>> And if there are no complaints at Cauldron we could do the same for
>> the other patch lists the week after.

Sadly I missed Cauldron - have there been any complaints there?

Can you adjust the g...@gcc.gnu.org list and others @gcc.gnu.org as well?
I for one would love to see that.

Thanks,
Gerald


[Committed] RISC-V: Unescape chars in pr111566.f90 test

2023-10-03 Thread Patrick O'Neill



On 10/3/23 14:55, Jeff Law wrote:



On 10/3/23 14:19, Patrick O'Neill wrote:

Some characters are escaped which causes the testcase to fail. This
patch restores the original characters.

Tested for regressions using multilib rv32gcv-ilp32d, rv64gcv-lp64d.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/fortran/pr111566.f90: Restore escaped
characters.

LOL.  Yea, this is OK.

jeff


Committed.

Patrick



Re: [PATCH] c++: merge tsubst_copy into tsubst_copy_and_build

2023-10-03 Thread Jason Merrill

On 10/3/23 08:41, Patrick Palka wrote:

On Mon, 2 Oct 2023, Patrick Palka wrote:


Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

The relationship between tsubst_copy_and_build and tsubst_copy (two of
the main template argument substitution routines for expression trees)
is rather hazy.  The former is mostly a superset of the latter, with
some differences.

The main difference is that they handle many tree codes differently, but
much of the tree code handling in tsubst_copy appears to be dead code[1].
This is because tsubst_copy only gets directly called in a few places
and mostly on id-expressions.  The interesting exceptions are PARM_DECL,
VAR_DECL, BIT_NOT_EXPR, SCOPE_REF, TEMPLATE_ID_EXPR and IDENTIFIER_NODE:

  * for PARM_DECL and VAR_DECL, tsubst_copy_and_build calls tsubst_copy
followed by doing some extra handling of its own
  * for BIT_NOT_EXPR tsubst_copy implicitly handles unresolved destructor
calls (i.e. the first operand is an identifier or a type)
  * for SCOPE_REF, TEMPLATE_ID_EXPR and IDENTIFIER_NODE tsubst_copy
refrains from doing name lookup of the terminal name

Other more minor differences are that tsubst_copy exits early when
'args' is null, and it calls maybe_dependent_member_ref


That's curious, since what that function does seems like name lookup; I 
wouldn't think we would want to call it when tf_no_name_lookup.



and finally it dispatches to tsubst for type trees.


And it looks like you fix the callers to avoid that?


Thus tsubst_copy is (at this point) similar enough to tsubst_copy_and_build
that it makes sense to merge the two functions, with the main difference
being the name lookup behavior[2].  So this patch merges tsubst_copy into
tsubst_copy_and_build via a new tsubst flag tf_no_name_lookup which controls
name lookup and resolution of a (top-level) id-expression.

[1]: http://thrifty.mooo.com:8008/gcc-lcov/gcc/cp/pt.cc.gcov.html#17231
[2]: I don't know the history of tsubst_copy but I would guess it was
added before we settled on using processing_template_decl to control
whether our AST building routines perform semantic checking and return
non-templated trees, and so we needed a separate tsubst routine that
avoids semantic checking and always returns a templated tree for e.g.
partial substitution.


Oops, this is wrong -- tsubst_copy_and_build came after tsubst_copy,
and was introduced as an optimization with the intent of getting rid
of tsubst_copy eventually:
https://gcc.gnu.org/pipermail/gcc-patches/2003-January/093659.html


I wonder if we want to add a small tsubst_name wrapper to call 
tsubst_copy_and_build with tf_no_name_lookup?


Can we also merge in tsubst_expr and use that name instead of the 
unwieldy tsubst_copy_and_build?


Jason



Re: [PATCH] RISC-V: Unescape chars in pr111566.f90 test

2023-10-03 Thread Jeff Law




On 10/3/23 14:19, Patrick O'Neill wrote:

Some characters are escaped which causes the testcase to fail. This
patch restores the original characters.

Tested for regressions using multilib rv32gcv-ilp32d, rv64gcv-lp64d.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/fortran/pr111566.f90: Restore escaped
characters.

LOL.  Yea, this is OK.

jeff


[COMMITTED] Don't use range_info_get_range for pointers.

2023-10-03 Thread Andrew MacLeod

Properly check for pointers instead of just using range_info_get_range.

Bootstrapped on x86_64-pc-linux-gnu (and presumably AIX too :-) with no 
regressions.


On 10/3/23 12:53, David Edelsohn wrote:

AIX bootstrap is happier with the patch.

Thanks, David

commit d8808c37d29110872fa51b98e71aef9e160b4692
Author: Andrew MacLeod 
Date:   Tue Oct 3 12:32:10 2023 -0400

Don't use range_info_get_range for pointers.

Pointers only track null and nonnull, so we need to handle them
specially.

* tree-ssanames.cc (set_range_info): Use get_ptr_info for
pointers rather than range_info_get_range.

diff --git a/gcc/tree-ssanames.cc b/gcc/tree-ssanames.cc
index 1eae411ac1c..0a32444fbdf 100644
--- a/gcc/tree-ssanames.cc
+++ b/gcc/tree-ssanames.cc
@@ -420,15 +420,11 @@ set_range_info (tree name, const vrange &r)
 
   // Pick up the current range, or VARYING if none.
   tree type = TREE_TYPE (name);
-  Value_Range tmp (type);
-  if (range_info_p (name))
-range_info_get_range (name, tmp);
-  else
-tmp.set_varying (type);
-
   if (POINTER_TYPE_P (type))
 {
-  if (r.nonzero_p () && !tmp.nonzero_p ())
+  struct ptr_info_def *pi = get_ptr_info (name);
+  // If R is nonnull and pi is not, set nonnull.
+  if (r.nonzero_p () && (!pi || pi->pt.null))
{
  set_ptr_nonnull (name);
  return true;
@@ -436,6 +432,11 @@ set_range_info (tree name, const vrange &r)
   return false;
 }
 
+  Value_Range tmp (type);
+  if (range_info_p (name))
+range_info_get_range (name, tmp);
+  else
+tmp.set_varying (type);
   // If the result doesn't change, or is undefined, return false.
   if (!tmp.intersect (r) || tmp.undefined_p ())
 return false;


Re: [PATCH] c++: print source code in print_instantiation_partial_context_line

2023-10-03 Thread Jason Merrill

On 10/3/23 12:48, David Malcolm wrote:

As mentioned in my Cauldron talk, this patch adds a call to
diagnostic_show_locus to the "required from here" messages
in print_instantiation_partial_context_line, so that e.g., rather
than the rather mystifying:

In file included from ../x86_64-pc-linux-gnu/libstdc++-v3/include/memory:78,
  from ../../src/demo-1.C:1:
../x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h: In instantiation of 
‘std::__detail::__unique_ptr_t<_Tp> std::make_unique(_Args&& ...) [with _Tp = bar; _Args = 
{}; __detail::__unique_ptr_t<_Tp> = __detail::__unique_ptr_t]’:
../../src/demo-1.C:15:32:   required from here
../x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:1066:30: error: 
no matching function for call to ‘bar::bar()’
  1066 | { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); 
}
   |  ^~~
../../src/demo-1.C:10:3: note: candidate: ‘bar::bar(int)’
10 |   bar (int);
   |   ^~~
../../src/demo-1.C:10:3: note:   candidate expects 1 argument, 0 provided
../../src/demo-1.C:7:7: note: candidate: ‘constexpr bar::bar(const bar&)’
 7 | class bar : public foo
   |   ^~~
../../src/demo-1.C:7:7: note:   candidate expects 1 argument, 0 provided
../../src/demo-1.C:7:7: note: candidate: ‘constexpr bar::bar(bar&&)’
../../src/demo-1.C:7:7: note:   candidate expects 1 argument, 0 provided

we emit:

In file included from ../x86_64-pc-linux-gnu/libstdc++-v3/include/memory:78,
  from ../../src/demo-1.C:1:
../x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h: In instantiation of 
‘std::__detail::__unique_ptr_t<_Tp> std::make_unique(_Args&& ...) [with _Tp = bar; _Args = 
{}; __detail::__unique_ptr_t<_Tp> = __detail::__unique_ptr_t]’:
../../src/demo-1.C:15:32:   required from here
15 |   return std::make_unique ();
   |  ~~^~
../x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:1066:30: error: 
no matching function for call to ‘bar::bar()’
  1066 | { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); 
}
   |  ^~~
../../src/demo-1.C:10:3: note: candidate: ‘bar::bar(int)’
10 |   bar (int);
   |   ^~~
../../src/demo-1.C:10:3: note:   candidate expects 1 argument, 0 provided
../../src/demo-1.C:7:7: note: candidate: ‘constexpr bar::bar(const bar&)’
 7 | class bar : public foo
   |   ^~~
../../src/demo-1.C:7:7: note:   candidate expects 1 argument, 0 provided
../../src/demo-1.C:7:7: note: candidate: ‘constexpr bar::bar(bar&&)’
../../src/demo-1.C:7:7: note:   candidate expects 1 argument, 0 provided

which shows the code that's leading to the error (the bad call to
std::make_unique).


Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.

OK for trunk?


OK, thanks.  Now that you mention it, that's long been a small annoyance 
that never quite reached the point that it occurred to me to fix it.


Jason



gcc/cp/ChangeLog:
* error.cc (print_instantiation_partial_context_line): Call
diagnostic_show_locus.

gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/static_assert3.C: Add directives for
additional source printing.
* g++.dg/template/error60.C: New test.

Signed-off-by: David Malcolm 
---
  gcc/cp/error.cc   |  2 +
  .../g++.dg/diagnostic/static_assert3.C|  7 +++-
  gcc/testsuite/g++.dg/template/error60.C   | 37 +++
  3 files changed, 45 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/template/error60.C

diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index ef96e140f24..767478cf5fd 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -3774,6 +3774,8 @@ print_instantiation_partial_context_line 
(diagnostic_context *context,
   ? _("recursively required from here\n")
   : _("required from here\n"));
  }
+  gcc_rich_location rich_loc (loc);
+  diagnostic_show_locus (context, &rich_loc, DK_NOTE);
  }
  
  /* Same as print_instantiation_full_context but less verbose.  */

diff --git a/gcc/testsuite/g++.dg/diagnostic/static_assert3.C 
b/gcc/testsuite/g++.dg/diagnostic/static_assert3.C
index 5d363884508..4ec53f17120 100644
--- a/gcc/testsuite/g++.dg/diagnostic/static_assert3.C
+++ b/gcc/testsuite/g++.dg/diagnostic/static_assert3.C
@@ -5,6 +5,11 @@
  template  struct is_same { static constexpr bool 
value = false; };
  template  struct is_same { static constexpr bool value = 
true; };
  
+/* { dg-begin-multiline-output "" }

+  f(0, 1.3);
+  ~^~~~
+   { dg-end-multiline-output "" } */
+
  template 
  void f(T, U)
  {
@@ -32,5 +37,5 @@ void f(T, U)
  
  void g()

  {
- f(0, 1.3);
+ f(0, 1.3); // { dg-message " required from here" }
  }
diff --git a/gcc/testsuite/g++.dg/template/error60.C 
b/gcc/testsuite/g++.dg/template/error60.C
new file mode 

[PATCH v2 RFA] diagnostic: add permerror variants with opt

2023-10-03 Thread Jason Merrill
This revision changes from using DK_PEDWARN for permerror-with-option to using
DK_PERMERROR.

Tested x86_64-pc-linux-gnu.  OK for trunk?

-- 8< --

In the discussion of promoting some pedwarns to be errors by default, rather
than move them all into -fpermissive it seems to me to make sense to support
DK_PERMERROR with an option flag.  This way will also work with
-fpermissive, but users can also still use -Wno-error=narrowing to downgrade
that specific diagnostic rather than everything affected by -fpermissive.

So, for diagnostics that we want to make errors by default we can just
change the pedwarn call to permerror.

The tests check desired behavior for such a permerror in a system header
with various flags.  The patch preserves the existing permerror behavior of
ignoring -w and system headers by default, but respecting them when
downgraded to a warning by -fpermissive.

This seems similar to but a bit better than the approach of forcing
-pedantic-errors that I previously used for -Wnarrowing: specifically, in
that now -w by itself is not enough to silence the -Wnarrowing
error (integer-pack2.C).
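
The behaviour described above can be modelled with a small sketch in C.
This is not the code in diagnostic.cc: the enum, the struct and
classify_permerror are hypothetical names used only for illustration, and
the sketch assumes that a downgrade via -Wno-error=<option> is treated the
same way as a downgrade via -fpermissive with respect to -w and system
headers.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC's diagnostic machinery.  */
enum severity { SEV_SUPPRESSED, SEV_WARNING, SEV_ERROR };

struct diag_flags
{
  bool fpermissive;       /* -fpermissive was given.  */
  bool wno_error_opt;     /* -Wno-error=<option> for this diagnostic.  */
  bool inhibit_warnings;  /* -w was given.  */
  bool in_system_header;  /* The location is in a system header.  */
};

/* Classify a permerror that carries an option flag.  By default it is
   an error, and -w and system headers are ignored.  Once downgraded to
   a warning, -w and system headers are respected again.  */
static enum severity
classify_permerror (const struct diag_flags *f)
{
  bool downgraded = f->fpermissive || f->wno_error_opt;

  if (!downgraded)
    return SEV_ERROR;
  if (f->inhibit_warnings || f->in_system_header)
    return SEV_SUPPRESSED;
  return SEV_WARNING;
}
```

In this model, -w alone still yields an error, which mirrors the
integer-pack2.C case mentioned above.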

gcc/ChangeLog:

* doc/invoke.texi: Move -fpermissive to Warning Options.
* diagnostic.cc (update_effective_level_from_pragmas): Remove
redundant system header check.
(diagnostic_report_diagnostic): Move down syshdr/-w check.
(diagnostic_impl): Handle DK_PERMERROR with an option number.
(permerror): Add new overloads.
* diagnostic-core.h (permerror): Declare them.

gcc/cp/ChangeLog:

* typeck2.cc (check_narrowing): Use permerror.

gcc/testsuite/ChangeLog:

* g++.dg/ext/integer-pack2.C: Add -fpermissive.
* g++.dg/diagnostic/sys-narrow.h: New test.
* g++.dg/diagnostic/sys-narrow1.C: New test.
* g++.dg/diagnostic/sys-narrow1a.C: New test.
* g++.dg/diagnostic/sys-narrow1b.C: New test.
* g++.dg/diagnostic/sys-narrow1c.C: New test.
* g++.dg/diagnostic/sys-narrow1d.C: New test.
* g++.dg/diagnostic/sys-narrow1e.C: New test.
* g++.dg/diagnostic/sys-narrow1f.C: New test.
* g++.dg/diagnostic/sys-narrow1g.C: New test.
* g++.dg/diagnostic/sys-narrow1h.C: New test.
* g++.dg/diagnostic/sys-narrow1i.C: New test.
---
 gcc/doc/invoke.texi   | 22 +++---
 gcc/diagnostic-core.h |  3 +
 gcc/testsuite/g++.dg/diagnostic/sys-narrow.h  |  2 +
 gcc/cp/typeck2.cc | 10 +--
 gcc/diagnostic.cc | 67 ---
 gcc/testsuite/g++.dg/diagnostic/sys-narrow1.C |  4 ++
 .../g++.dg/diagnostic/sys-narrow1a.C  |  5 ++
 .../g++.dg/diagnostic/sys-narrow1b.C  |  5 ++
 .../g++.dg/diagnostic/sys-narrow1c.C  |  5 ++
 .../g++.dg/diagnostic/sys-narrow1d.C  |  5 ++
 .../g++.dg/diagnostic/sys-narrow1e.C  |  5 ++
 .../g++.dg/diagnostic/sys-narrow1f.C  |  5 ++
 .../g++.dg/diagnostic/sys-narrow1g.C  |  5 ++
 .../g++.dg/diagnostic/sys-narrow1h.C  |  6 ++
 .../g++.dg/diagnostic/sys-narrow1i.C  |  6 ++
 gcc/testsuite/g++.dg/ext/integer-pack2.C  |  2 +-
 16 files changed, 117 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow.h
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1.C
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1a.C
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1b.C
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1c.C
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1d.C
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1e.C
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1f.C
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1g.C
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1h.C
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1i.C

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4085fc90907..6b6506a75b2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -231,7 +231,7 @@ in the following sections.
 -fnew-inheriting-ctors
 -fnew-ttp-matching
 -fno-nonansi-builtins  -fnothrow-opt  -fno-operator-names
--fno-optional-diags  -fpermissive
+-fno-optional-diags
 -fno-pretty-templates
 -fno-rtti  -fsized-deallocation
 -ftemplate-backtrace-limit=@var{n}
@@ -323,7 +323,7 @@ Objective-C and Objective-C++ Dialects}.
 @item Warning Options
 @xref{Warning Options,,Options to Request or Suppress Warnings}.
 @gccoptlist{-fsyntax-only  -fmax-errors=@var{n}  -Wpedantic
--pedantic-errors
+-pedantic-errors -fpermissive
 -w  -Wextra  -Wall  -Wabi=@var{n}
 -Waddress  -Wno-address-of-packed-member  -Waggregate-return
 -Walloc-size-larger-than=@var{byte-size}  -Walloc-zero
@@ -3494,12 +3494,6 @@ Disable diagnostics that the standard says a compiler 
does not need to
 issue.  Currently, the only such diagnostic issued by G++ is the 

Re: [PATCH] RISC-V: Use stdint-gcc.h in rvv testsuite

2023-10-03 Thread Patrick O'Neill

On 10/2/23 06:57, Kito Cheng wrote:


On Tue, Sep 26, 2023 at 10:59 AM Patrick O'Neill  wrote:

stdint.h can be replaced with stdint-gcc.h to resolve some missing
system headers in non-multilib installations.

Tested using glibc rv32gcv and rv64gcv on r14-4258-gc9837443075.

gcc/ChangeLog:

  * config/riscv/riscv_vector.h (__RISCV_VECTOR_H): Replace
  stdint.h with stdint-gcc.h

I don't think this will work when testing an installed compiler which I do.

Thanks,
Andrew

In the riscv target testsuite (gcc.target/riscv) all occurrences of
#include <stdint.h> are currently constrained to the rvv/ subdirectory.
All non-vector tests use #include <stdint-gcc.h> rather than
#include <stdint.h>. Have you encountered any issues when testing
installations with non-vector tests?

I think the concern is about replacing stdint.h with stdint-gcc.h in
riscv_vector.h: users may then end up including stdint-gcc.h *and* stdint.h.
The latter is generally provided by libc, and stdint-gcc.h is typically not
included directly.

Other than the changes in "riscv_vector.h", everything else looks fine to me.


Ah okay, I'll retest and send a v2 that omits the riscv_vector.h change. 
Thanks, Patrick


[PATCH] RISC-V: Unescape chars in pr111566.f90 test

2023-10-03 Thread Patrick O'Neill
Some characters are escaped which causes the testcase to fail. This
patch restores the original characters.

Tested for regressions using multilib rv32gcv-ilp32d, rv64gcv-lp64d.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/fortran/pr111566.f90: Restore escaped
characters.

Signed-off-by: Patrick O'Neill 
---
 gcc/testsuite/gcc.target/riscv/rvv/fortran/pr111566.f90 | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/fortran/pr111566.f90 
b/gcc/testsuite/gcc.target/riscv/rvv/fortran/pr111566.f90
index 265e913b299..2e30dc9bfaa 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/fortran/pr111566.f90
+++ b/gcc/testsuite/gcc.target/riscv/rvv/fortran/pr111566.f90
@@ -1,11 +1,11 @@
 ! { dg-do compile }
-! { dg-options -march=rv64gcv -mabi=lp64d -Ofast 
-fallow-argument-mismatch -fmax-stack-var-size=65536 -S  -std=legacy -w }
+! { dg-options "-march=rv64gcv -mabi=lp64d -Ofast -fallow-argument-mismatch 
-fmax-stack-var-size=65536 -S  -std=legacy -w" }

 module a
   integer,parameter :: SHR_KIND_R8 = selected_real_kind(12)
 end module a
 module b
-  use a,  c = shr_kind_r8
+  use a,  c => shr_kind_r8
 contains
   subroutine d(cg , km, i1, i2)
 real (c) ch(i2,km)
@@ -22,7 +22,7 @@ contains
 enddo
 if ( cq == 0 ) then
do i=i1,i2
-  if( cr =  cs ) then
+  if( cr <=  cs ) then
  cg= sign( min(ct,   cg),  cg)
   endif
enddo
--
2.34.1



Re: [PATCH] Fix coroutine tests for libstdc++ gnu-version-namespace mode

2023-10-03 Thread François Dumont

Indeed ! Here is the right one.

On 03/10/2023 11:52, Jonathan Wakely wrote:

On Mon, 2 Oct 2023 at 18:07, François Dumont  wrote:

Hi

Gentle reminder for this minor patch.

It looks like you attached the wrong patch.



Thanks

On 23/09/2023 22:10, François Dumont wrote:

I'm eventually fixing those tests the same way we manage this problem
in libstdc++ testsuite.

testsuite: Add optional libstdc++ version namespace in expected
diagnostic

 When libstdc++ is built with
--enable-symvers=gnu-versioned-namespace diagnostics are
 showing this namespace, currently __8.

 gcc/testsuite/ChangeLog:

 *
testsuite/g++.dg/coroutines/coro-bad-alloc-00-bad-op-new.C: Add optional
 '__8' version namespace in expected diagnostic.
 *
testsuite/g++.dg/coroutines/coro-bad-alloc-01-bad-op-del.C: Likewise.
 *
testsuite/g++.dg/coroutines/coro-bad-alloc-02-no-op-new-nt.C: Likewise.
 *
testsuite/g++.dg/coroutines/coro-bad-grooaf-01-grooaf-expected.C:
Likewise.
 * testsuite/g++.dg/coroutines/pr97438.C: Likewise.
 * testsuite/g++.dg/coroutines/ramp-return-b.C: Likewise.
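
As a rough illustration of the optional-namespace pattern, the sketch below
checks it with POSIX extended regular expressions in C. The testsuite itself
matches diagnostics with DejaGnu rather than regex.h, so this only
demonstrates the (__8::)? idea; the helper matches and the shortened pattern
expected_re are hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if TEXT contains a match for the
   POSIX extended regular expression PATTERN.  */
static bool
matches (const char *pattern, const char *text)
{
  regex_t re;
  bool ok;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  ok = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}

/* Shortened form of the expected diagnostic, with the version
   namespace made optional.  */
static const char *const expected_re
  = "'std::(__8::)?__n4861::__coroutine_traits_impl";
```

The same pattern then accepts the diagnostic both with and without the
__8 namespace, while a different namespace such as __9 is still rejected.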

Tested under Linux x86_64.

I'm contributing to libstdc++ so I already have write access.

Ok to commit ?

François

diff --git a/gcc/testsuite/g++.dg/coroutines/coro-bad-alloc-00-bad-op-new.C b/gcc/testsuite/g++.dg/coroutines/coro-bad-alloc-00-bad-op-new.C
index 4706deebf4e..928e0c974e1 100644
--- a/gcc/testsuite/g++.dg/coroutines/coro-bad-alloc-00-bad-op-new.C
+++ b/gcc/testsuite/g++.dg/coroutines/coro-bad-alloc-00-bad-op-new.C
@@ -6,7 +6,7 @@
 #include "coro1-allocators.h"
 
 struct coro1
-f ()  /* { dg-error {'operator new' is provided by 'std::__n4861::__coroutine_traits_impl::promise_type' \{aka 'coro1::promise_type'\} but is not usable with the function signature 'coro1 f\(\)'} } */
+f ()  /* { dg-error {'operator new' is provided by 'std::(__8::)?__n4861::__coroutine_traits_impl::promise_type' \{aka 'coro1::promise_type'\} but is not usable with the function signature 'coro1 f\(\)'} } */
 {
   co_return;
 }
diff --git a/gcc/testsuite/g++.dg/coroutines/coro-bad-alloc-01-bad-op-del.C b/gcc/testsuite/g++.dg/coroutines/coro-bad-alloc-01-bad-op-del.C
index 252cb5e442c..6bed524aa0a 100644
--- a/gcc/testsuite/g++.dg/coroutines/coro-bad-alloc-01-bad-op-del.C
+++ b/gcc/testsuite/g++.dg/coroutines/coro-bad-alloc-01-bad-op-del.C
@@ -6,7 +6,7 @@
 #include "coro1-allocators.h"
 
 struct coro1
-f ()  /* { dg-error {'operator delete' is provided by 'std::__n4861::__coroutine_traits_impl::promise_type' \{aka 'coro1::promise_type'\} but is not usable with the function signature 'coro1 f\(\)'} } */
+f ()  /* { dg-error {'operator delete' is provided by 'std::(__8::)?__n4861::__coroutine_traits_impl::promise_type' \{aka 'coro1::promise_type'\} but is not usable with the function signature 'coro1 f\(\)'} } */
 {
   co_return;
 }
diff --git a/gcc/testsuite/g++.dg/coroutines/coro-bad-alloc-02-no-op-new-nt.C b/gcc/testsuite/g++.dg/coroutines/coro-bad-alloc-02-no-op-new-nt.C
index 89972b60945..0a545fed0e3 100644
--- a/gcc/testsuite/g++.dg/coroutines/coro-bad-alloc-02-no-op-new-nt.C
+++ b/gcc/testsuite/g++.dg/coroutines/coro-bad-alloc-02-no-op-new-nt.C
@@ -9,7 +9,7 @@
 #include "coro1-allocators.h"
 
 struct coro1
-f () /* { dg-error {'coro1::promise_type::get_return_object_on_allocation_failure\(\)\(\)' is provided by 'std::__n4861::__coroutine_traits_impl::promise_type' \{aka 'coro1::promise_type'\} but 'operator new' is not marked 'throw\(\)' or 'noexcept'} } */
+f () /* { dg-error {'coro1::promise_type::get_return_object_on_allocation_failure\(\)\(\)' is provided by 'std::(__8::)?__n4861::__coroutine_traits_impl::promise_type' \{aka 'coro1::promise_type'\} but 'operator new' is not marked 'throw\(\)' or 'noexcept'} } */
 {
   co_return;
 }
diff --git a/gcc/testsuite/g++.dg/coroutines/coro-bad-grooaf-01-grooaf-expected.C b/gcc/testsuite/g++.dg/coroutines/coro-bad-grooaf-01-grooaf-expected.C
index 9fa3d64a9f2..b36e88f871a 100644
--- a/gcc/testsuite/g++.dg/coroutines/coro-bad-grooaf-01-grooaf-expected.C
+++ b/gcc/testsuite/g++.dg/coroutines/coro-bad-grooaf-01-grooaf-expected.C
@@ -6,7 +6,7 @@
 int used_grooaf = 0;
 
 struct coro1
-f () noexcept // { dg-warning {'operator new' is marked 'throw\(\)' or 'noexcept' but no usable 'get_return_object_on_allocation_failure' is provided by 'std::__n4861::__coroutine_traits_impl::promise_type' \{aka 'coro1::promise_type'\}} }
+f () noexcept // { dg-warning {'operator new' is marked 'throw\(\)' or 'noexcept' but no usable 'get_return_object_on_allocation_failure' is provided by 'std::(__8::)?__n4861::__coroutine_traits_impl::promise_type' \{aka 'coro1::promise_type'\}} }
 {
   PRINT ("coro1: about to return");
   co_return;
diff --git a/gcc/testsuite/g++.dg/coroutines/pr97438.C b/gcc/testsuite/g++.dg/coroutines/pr97438.C
index 95376648ed7..ac37118eae7 100644
--- a/gcc/testsuite/g++.dg/coroutines/pr97438.C
+++ 

[PATCH v2] libiberty: Use posix_spawn in pex-unix when available.

2023-10-03 Thread Brendan Shanks
Hi,

This patch implements pex_unix_exec_child using posix_spawn when
available.

This should especially benefit recent macOS (where vfork just calls
fork), but should have equivalent or faster performance on all
platforms.
In addition, the implementation is substantially simpler than the
vfork+exec code path.

Tested on x86_64-linux.

v2: Fix error handling (previously the function would be run twice in
case of error), and don't use a macro that changes control flow.
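
For readers unfamiliar with the API, here is a minimal standalone sketch of
the posix_spawn pattern in C. It is not the libiberty implementation:
spawn_with_stdout is a hypothetical helper that only redirects standard
output and waits for the child, whereas the patch also deals with stdin,
stderr and the pex error reporting conventions.

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper, not the libiberty code: run ARGV[0] via
   posix_spawnp with OUT_FD as its standard output, wait for it, and
   return its exit status, or -1 on failure with errno set.  */
static int
spawn_with_stdout (char *const argv[], int out_fd)
{
  posix_spawn_file_actions_t actions;
  pid_t pid;
  int status;
  int ret;

  /* The posix_spawn family returns an error number instead of
     setting errno.  */
  ret = posix_spawn_file_actions_init (&actions);
  if (ret != 0)
    {
      errno = ret;
      return -1;
    }

  if (out_fd != STDOUT_FILENO)
    {
      /* In the child: dup OUT_FD onto stdout, then close the original.  */
      ret = posix_spawn_file_actions_adddup2 (&actions, out_fd,
                                              STDOUT_FILENO);
      if (ret == 0)
        ret = posix_spawn_file_actions_addclose (&actions, out_fd);
    }

  if (ret == 0)
    ret = posix_spawnp (&pid, argv[0], &actions, NULL, argv, environ);

  posix_spawn_file_actions_destroy (&actions);

  if (ret != 0)
    {
      errno = ret;
      return -1;
    }

  /* Retry the wait if it is interrupted by a signal.  */
  while (waitpid (pid, &status, 0) < 0)
    if (errno != EINTR)
      return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Because the file actions are applied in the child by the library, the
caller needs no code between fork and exec, which is where most of the
complexity of a vfork+exec path tends to live.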

libiberty/
* configure.ac (AC_CHECK_HEADERS): Add spawn.h.
(checkfuncs): Add posix_spawn, posix_spawnp.
(AC_CHECK_FUNCS): Add posix_spawn, posix_spawnp.
* configure, config.in: Rebuild.
* pex-unix.c [HAVE_POSIX_SPAWN] (pex_unix_exec_child): New function.

Signed-off-by: Brendan Shanks 
---
 libiberty/configure.ac |  8 ++--
 libiberty/pex-unix.c   | 93 ++
 2 files changed, 98 insertions(+), 3 deletions(-)

diff --git a/libiberty/configure.ac b/libiberty/configure.ac
index 0748c592704..2488b031bc8 100644
--- a/libiberty/configure.ac
+++ b/libiberty/configure.ac
@@ -289,7 +289,7 @@ AC_SUBST_FILE(host_makefile_frag)
 # It's OK to check for header files.  Although the compiler may not be
 # able to link anything, it had better be able to at least compile
 # something.
-AC_CHECK_HEADERS(sys/file.h sys/param.h limits.h stdlib.h malloc.h string.h 
unistd.h strings.h sys/time.h time.h sys/resource.h sys/stat.h sys/mman.h 
fcntl.h alloca.h sys/pstat.h sys/sysmp.h sys/sysinfo.h machine/hal_sysinfo.h 
sys/table.h sys/sysctl.h sys/systemcfg.h stdint.h stdio_ext.h process.h 
sys/prctl.h)
+AC_CHECK_HEADERS(sys/file.h sys/param.h limits.h stdlib.h malloc.h string.h 
unistd.h strings.h sys/time.h time.h sys/resource.h sys/stat.h sys/mman.h 
fcntl.h alloca.h sys/pstat.h sys/sysmp.h sys/sysinfo.h machine/hal_sysinfo.h 
sys/table.h sys/sysctl.h sys/systemcfg.h stdint.h stdio_ext.h process.h 
sys/prctl.h spawn.h)
 AC_HEADER_SYS_WAIT
 AC_HEADER_TIME
 
@@ -412,7 +412,8 @@ funcs="$funcs setproctitle"
 vars="sys_errlist sys_nerr sys_siglist"
 
 checkfuncs="__fsetlocking canonicalize_file_name dup3 getrlimit getrusage \
- getsysinfo gettimeofday on_exit pipe2 psignal pstat_getdynamic 
pstat_getstatic \
+ getsysinfo gettimeofday on_exit pipe2 posix_spawn posix_spawnp psignal \
+ pstat_getdynamic pstat_getstatic \
  realpath setrlimit spawnve spawnvpe strerror strsignal sysconf sysctl \
  sysmp table times wait3 wait4"
 
@@ -435,7 +436,8 @@ if test "x" = "y"; then
 index insque \
 memchr memcmp memcpy memmem memmove memset mkstemps \
 on_exit \
-pipe2 psignal pstat_getdynamic pstat_getstatic putenv \
+pipe2 posix_spawn posix_spawnp psignal \
+pstat_getdynamic pstat_getstatic putenv \
 random realpath rename rindex \
 sbrk setenv setproctitle setrlimit sigsetmask snprintf spawnve spawnvpe \
  stpcpy stpncpy strcasecmp strchr strdup \
diff --git a/libiberty/pex-unix.c b/libiberty/pex-unix.c
index 33b5bce31c2..5e5ac529ad2 100644
--- a/libiberty/pex-unix.c
+++ b/libiberty/pex-unix.c
@@ -58,6 +58,9 @@ extern int errno;
 #ifdef HAVE_PROCESS_H
 #include <process.h>
 #endif
+#ifdef HAVE_SPAWN_H
+#include <spawn.h>
+#endif
 
 #ifdef vfork /* Autoconf may define this to fork for us. */
 # define VFORK_STRING "fork"
@@ -559,6 +562,96 @@ pex_unix_exec_child (struct pex_obj *obj ATTRIBUTE_UNUSED,
   return (pid_t) -1;
 }
 
+#elif defined(HAVE_POSIX_SPAWN) && defined(HAVE_POSIX_SPAWNP)
+/* Implementation of pex->exec_child using posix_spawn.*/
+
+static pid_t
+pex_unix_exec_child (struct pex_obj *obj ATTRIBUTE_UNUSED,
+int flags, const char *executable,
+char * const * argv, char * const * env,
+int in, int out, int errdes,
+int toclose, const char **errmsg, int *err)
+{
+  int ret;
+  pid_t pid = -1;
+  posix_spawnattr_t attr;
+  posix_spawn_file_actions_t actions;
+  int attr_initialized = 0, actions_initialized = 0;
+
+  ret = posix_spawnattr_init (&attr);
+  if (ret) { *err = ret; *errmsg = "posix_spawnattr_init"; goto exit; }
+  attr_initialized = 1;
+
+  /* Use vfork() on glibc <=2.24. */
+#ifdef POSIX_SPAWN_USEVFORK
+  ret = posix_spawnattr_setflags (&attr, POSIX_SPAWN_USEVFORK);
+  if (ret) { *err = ret; *errmsg = "posix_spawnattr_setflags"; goto exit; }
+#endif
+
+  ret = posix_spawn_file_actions_init (&actions);
+  if (ret) { *err = ret; *errmsg = "posix_spawn_file_actions_init"; goto exit; }
+  actions_initialized = 1;
+
+  if (in != STDIN_FILE_NO)
+{
+  ret = posix_spawn_file_actions_adddup2 (&actions, in, STDIN_FILE_NO);
+  if (ret) { *err = ret; *errmsg = "posix_spawn_file_actions_adddup2"; goto exit; }
+
+  ret = posix_spawn_file_actions_addclose (&actions, in);
+  if (ret) { *err = ret; *errmsg = "posix_spawn_file_actions_addclose"; goto exit; }
+}
+  if (out != STDOUT_FILE_NO)
+{
+  ret = posix_spawn_file_actions_adddup2 (&actions, out, STDOUT_FILE_NO);
+  if (ret) { *err = ret; 

[PATCH] match.pd: Avoid other build_nonstandard_integer_type calls [PR111369]

2023-10-03 Thread Jakub Jelinek
Hi!

On Sat, Sep 30, 2023 at 11:57:38AM +0200, Jakub Jelinek wrote:
> > This fixes PR111369, where one of the bitint*.c tests FAILs with
> > GCC_TEST_RUN_EXPENSIVE=1.
> 
> Though, I think there is a preexisting issue which the
> build_nonstandard_integer_type didn't help with; if type is signed 1-bit
> precision, then I think a ? ~t : t could be valid, but -(type)a would invoke
> UB if a is 1 - the cast would make it -1 and negation of -1 in signed 1-bit
> invokes UB.
> So perhaps we should guard this optimization on type having element precision > 1
> or being unsigned.  Plus the (convert:type @2) didn't make sense, @2 already
> must have TREE_TYPE type.

In the light of the PR111668 patch which shows that
build_nonstandard_integer_type is needed (at least for some signed prec > 1
BOOLEAN_TYPEs if we use e.g. negation), I've reworked this patch and handled
the last problematic build_nonstandard_integer_type call in there as well.

In the x == cstN ? cst4 : cst3 optimization it uses
build_nonstandard_integer_type solely for BOOLEAN_TYPEs (I really don't see
why ENUMERAL_TYPEs would be a problem, we treat them in GIMPLE as uselessly
convertible to same precision/sign INTEGER_TYPEs), for INTEGER_TYPEs it is
really a no-op (might return a different type, but always INTEGER_TYPE
with same TYPE_PRECISION same TYPE_UNSIGNED) and for BITINT_TYPE with larger
precisions really harmful (we shouldn't create large precision
INTEGER_TYPEs).

The a?~t:t optimization just omits the negation of a in type for 1-bit
precision types or any BOOLEAN_TYPEs.  I think that is correct, because
for both signed and unsigned 1-bit precision type, cast to type of a bool
value yields already 0, -1 or 0, 1 values and for 1-bit precision negation
of that is still 0, -1 or 0, 1 (except for invoking sometimes UB).
And for signed larger precision BOOLEAN_TYPEs I think it is correct as well,
cast of [0, 1] to type yields 0, -1 and those can be xored with 0 or -1
to yield the proper result, any other values would be UB.
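
The value-level argument above can be checked with a small model.  This is
only a sketch in Python, not GCC code: to_signed, select_form, xor_form and
forms_agree are hypothetical helper names, only signed types are modelled,
and two's-complement wrapping to prec bits is assumed.

```python
def to_signed(value, prec):
    """Wrap value to a signed two's-complement integer of prec bits."""
    value &= (1 << prec) - 1
    return value - (1 << prec) if value >> (prec - 1) else value


def select_form(a, t, prec):
    """Reference semantics: a ? ~t : t, evaluated in a prec-bit signed type."""
    return to_signed(~t if a else t, prec)


def xor_form(a, t, prec):
    """Rewritten form; the negation is skipped for 1-bit precision."""
    if prec == 1:
        # Casting a bool to a signed 1-bit type already yields 0 or -1.
        mask = to_signed(a, 1)
    else:
        # -(type)a yields 0 or -1 in wider types.
        mask = to_signed(-a, prec)
    return to_signed(mask ^ t, prec)


def forms_agree(prec):
    """Exhaustively compare both forms over every prec-bit signed t."""
    lo, hi = -(1 << (prec - 1)), (1 << (prec - 1)) - 1
    return all(select_form(a, t, prec) == xor_form(a, t, prec)
               for a in (0, 1) for t in range(lo, hi + 1))
```

Calling forms_agree(1) and forms_agree(8) compares the two forms over every
representable t, so the 1-bit case is covered without ever negating -1.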

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-10-03  Jakub Jelinek  

PR middle-end/111369
* match.pd (x == cstN ? cst4 : cst3): Use
build_nonstandard_integer_type only if type1 is BOOLEAN_TYPE.
Fix comment typo.  Formatting fix.
(a?~t:t -> (-(a))^t): Always convert to type rather
than using build_nonstandard_integer_type.  Perform negation
only if type has precision > 1 and is not signed BOOLEAN_TYPE.

--- gcc/match.pd.jj 2023-10-03 10:33:30.817614648 +0200
+++ gcc/match.pd2023-10-03 11:29:54.089566764 +0200
@@ -5178,7 +5178,7 @@ (define_operator_list SYNC_FETCH_AND_AND
 
 /* Optimize
# x_5 in range [cst1, cst2] where cst2 = cst1 + 1
-   x_5 ? cstN ? cst4 : cst3
+   x_5 == cstN ? cst4 : cst3
# op is == or != and N is 1 or 2
to r_6 = x_5 + (min (cst3, cst4) - cst1) or
r_6 = (min (cst3, cst4) + cst1) - x_5 depending on op, N and which
@@ -5214,7 +5214,8 @@ (define_operator_list SYNC_FETCH_AND_AND
 type1 = type;
auto prec = TYPE_PRECISION (type1);
auto unsign = TYPE_UNSIGNED (type1);
-   type1 = build_nonstandard_integer_type (prec, unsign);
+   if (TREE_CODE (type1) == BOOLEAN_TYPE)
+   type1 = build_nonstandard_integer_type (prec, unsign);
min = wide_int::from (min, prec,
 TYPE_SIGN (TREE_TYPE (@0)));
wide_int a = wide_int::from (wi::to_wide (arg0), prec,
@@ -5253,14 +5254,7 @@ (define_operator_list SYNC_FETCH_AND_AND
   }
   (if (code == PLUS_EXPR)
(convert (plus (convert:type1 @0) { arg; }))
-   (convert (minus { arg; } (convert:type1 @0)))
-  )
- )
-)
-   )
-  )
- )
-)
+   (convert (minus { arg; } (convert:type1 @0))
 #endif
 
 (simplify
@@ -6758,13 +6752,11 @@ (define_operator_list SYNC_FETCH_AND_AND
  (with { bool wascmp; }
   (if (INTEGRAL_TYPE_P (type)
&& bitwise_inverted_equal_p (@1, @2, wascmp)
-   && (!wascmp || element_precision (type) == 1))
-   (with {
- auto prec = TYPE_PRECISION (type);
- auto unsign = TYPE_UNSIGNED (type);
- tree inttype = build_nonstandard_integer_type (prec, unsign);
-}
-(convert (bit_xor (negate (convert:inttype @0)) (convert:inttype @2)))
+   && (!wascmp || TYPE_PRECISION (type) == 1))
+   (if ((!TYPE_UNSIGNED (type) && TREE_CODE (type) == BOOLEAN_TYPE)
+   || TYPE_PRECISION (type) == 1)
+(bit_xor (convert:type @0) @2)
+(bit_xor (negate (convert:type @0)) @2)
 #endif
 
 /* Simplify pointer equality compares using PTA.  */


Jakub



[PATCH] match.pd: Fix up a ? cst1 : cst2 regression on signed bool [PR111668]

2023-10-03 Thread Jakub Jelinek
Hi!

My relatively recent changes to these simplifiers to avoid
doing build_nonstandard_integer_type (primarily for BITINT_TYPE)
broke PR111668, a recurrence of the PR110487 bug.
I thought the build_nonstandard_integer_type isn't ever needed there,
but there is one special case where it is.
For the a ? -1 : 0 and a ? 0 : -1 simplifications there are actually
3 different cases.  One is for signed 1-bit precision types (signed
kind of implied from integer_all_onesp, because otherwise it would
match integer_onep earlier), where the simplifier weirdly was matching
them using the a ? powerof2cst : 0 -> a << (log2(powerof2cst))
simplification and then another simplifier optimizing away the left shift
when log2(powerof2cst) was 0.  Another one is signed BOOLEAN_TYPE with
precision > 1, where indeed we shouldn't be doing the negation in type,
because it isn't well defined in that type, the type only has 2 valid
values, 0 and -1.  As an alternative, we could also e.g. cast to
signed 1-bit precision BOOLEAN_TYPE and then extend to type.
And the last case is what we were doing for types which have both 1 and -1
(all ones) as valid values (i.e. all signed/unsigned ENUMERAL_TYPEs,
INTEGRAL_TYPEs and BITINT_TYPEs with precision > 1).
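
To illustrate why the signed 1-bit case used to take the detour through the
powerof2cst rule, here is a rough Python model.  It is a sketch under
assumptions: constants are represented by their prec-bit patterns, and
bit_pattern, is_pow2_pattern, log2_of_pattern and classify are hypothetical
names, not GCC predicates.  classify just restates the three cases listed
above.

```python
def bit_pattern(value, prec):
    """Bits of value when stored in a prec-bit type."""
    return value & ((1 << prec) - 1)


def is_pow2_pattern(value, prec):
    """True if exactly one bit is set in the prec-bit pattern."""
    p = bit_pattern(value, prec)
    return p != 0 and (p & (p - 1)) == 0


def log2_of_pattern(value, prec):
    """Index of the highest set bit in the prec-bit pattern."""
    return bit_pattern(value, prec).bit_length() - 1


def classify(prec, is_boolean_type):
    """Which of the three a ? -1 : 0 strategies applies to a type."""
    if prec == 1:
        return "cast bool to type"
    if is_boolean_type:
        return "negate in a same-precision integer type, then convert"
    return "negate in type"
```

In this model -1 in a signed 1-bit type has the single-bit pattern 1 with
log2 equal to 0, which is the "a << 0" hop the patch now avoids; for 8 bits
the pattern of -1 is all ones and is not a power of two.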

The following patch avoids the hops through << 0 for 1-bit precision
and uses build_nonstandard_integer_type solely for the BOOLEAN_TYPE types
(where we have a guarantee the precision is reasonably small, nothing ought
to be creating 129+ bit precision BOOLEAN_TYPEs).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-10-03  Jakub Jelinek  

PR tree-optimization/111668
* match.pd (a ? CST1 : CST2): Handle the a ? -1 : 0 and
a ? 0 : -1 cases before the powerof2cst cases and differentiate
between 1-bit precision types, larger precision boolean types
and other integral types.  Fix comment pastos and formatting.

--- gcc/match.pd.jj 2023-10-02 09:42:01.657836005 +0200
+++ gcc/match.pd2023-10-03 10:33:30.817614648 +0200
@@ -5100,36 +5100,53 @@ (define_operator_list SYNC_FETCH_AND_AND
  (switch
   (if (integer_zerop (@2))
(switch
-/* a ? 1 : 0 -> a if 0 and 1 are integral types. */
+/* a ? 1 : 0 -> a if 0 and 1 are integral types.  */
 (if (integer_onep (@1))
  (convert (convert:boolean_type_node @0)))
+/* a ? -1 : 0 -> -a.  */
+(if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@1))
+ (if (TYPE_PRECISION (type) == 1)
+  /* For signed 1-bit precision just cast bool to the type.  */
+  (convert (convert:boolean_type_node @0))
+  (if (TREE_CODE (type) == BOOLEAN_TYPE)
+   (with {
+ tree intt = build_nonstandard_integer_type (TYPE_PRECISION (type),
+ TYPE_UNSIGNED (type));
+   }
+   (convert (negate (convert:intt (convert:boolean_type_node @0)
+   (negate (convert:type (convert:boolean_type_node @0))
 /* a ? powerof2cst : 0 -> a << (log2(powerof2cst)) */
 (if (INTEGRAL_TYPE_P (type) && integer_pow2p (@1))
  (with {
tree shift = build_int_cst (integer_type_node, tree_log2 (@1));
   }
-  (lshift (convert (convert:boolean_type_node @0)) { shift; })))
-/* a ? -1 : 0 -> -a.  No need to check the TYPE_PRECISION not being 1
-   here as the powerof2cst case above will handle that case correctly.  */
-(if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@1))
- (negate (convert:type (convert:boolean_type_node @0))
+  (lshift (convert (convert:boolean_type_node @0)) { shift; })
   (if (integer_zerop (@1))
(switch
-/* a ? 0 : 1 -> !a. */
+/* a ? 0 : 1 -> !a.  */
 (if (integer_onep (@2))
  (convert (bit_xor (convert:boolean_type_node @0) { boolean_true_node; })))
-/* a ? powerof2cst : 0 -> (!a) << (log2(powerof2cst)) */
+/* a ? 0 : -1 -> -(!a).  */
+(if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@2))
+ (if (TYPE_PRECISION (type) == 1)
+  /* For signed 1-bit precision just cast bool to the type.  */
+  (convert (bit_xor (convert:boolean_type_node @0) { boolean_true_node; }))
+  (if (TREE_CODE (type) == BOOLEAN_TYPE)
+   (with {
+ tree intt = build_nonstandard_integer_type (TYPE_PRECISION (type),
+ TYPE_UNSIGNED (type));
+   }
+   (convert (negate (convert:intt (bit_xor (convert:boolean_type_node @0)
+   { boolean_true_node; })
+   (negate (convert:type (bit_xor (convert:boolean_type_node @0)
+ { boolean_true_node; }))
+/* a ? 0 : powerof2cst -> (!a) << (log2(powerof2cst)) */
 (if (INTEGRAL_TYPE_P (type) && integer_pow2p (@2))
  (with {
tree shift = build_int_cst (integer_type_node, tree_log2 (@2));
   }
   (lshift (convert (bit_xor (convert:boolean_type_node @0)
-   { boolean_true_node; })) 

Re: mvconst_internal splitter gated with !@ira_in_progess (was Re: Yet Another IRA question)

2023-10-03 Thread Jeff Law




On 10/2/23 18:12, Vineet Gupta wrote:
> On 9/28/23 12:52, Vineet Gupta wrote:
> > On 9/28/23 05:53, Jeff Law wrote:
> > > Vineet -- assuming Vlad's patch goes in, the other obvious candidate
> > > for this would be the mvconst_internal define_insn_and_split where
> > > we'd probably want to reject the insn as a whole once IRA has started.
> >
> > Good point, although currently we've kind of papered over it with
> > -fsched-pressure, but I'm sure there are way more cases that this will
> > improve still.
> > I will spin up a full multilib test with that, hopefully with no
> > fallout :-)
>
> I have the results finally. This is testsuite neutral. Same results
> before/after
>
>     = Summary of gcc testsuite =
>                              | # of unexpected case / # of unique unexpected case
>                              |     gcc     |   g++    |   gfortran   |
>     rv32imac/  ilp32/ medlow |  168 /   70 |   13 / 6 |   67 /    12 |
>   rv32imafdc/ ilp32d/ medlow |  168 /   70 |   13 / 6 |   24 /     4 |
>     rv64imac/   lp64/ medlow |  161 /   70 |    9 / 3 |   67 /    12 |
>   rv64imafdc/  lp64d/ medlow |  160 /   69 |    5 / 2 |    6 /     1 |
>
> But the SPEC runs are not affected at all, if anything it seems to be
> way under noise for 5 workloads.
>
> Not sure if we still want to add the gate, your call

I'd tend to go forward with it as I think it'll help -O1 builds as those
don't inherently turn on the scheduler.


Jeff


Re: [COMMITTED] Remove pass counting in VRP.

2023-10-03 Thread David Malcolm
On Tue, 2023-10-03 at 13:11 -0400, Andrew MacLeod wrote:
> 
> On 10/3/23 13:02, David Malcolm wrote:
> > On Tue, 2023-10-03 at 10:32 -0400, Andrew MacLeod wrote:
> > > Pass counting in VRP is used to decide when to call early VRP,
> > > pass
> > > the
> > > flag to enable warnings, and when the final pass is.
> > > 
> > > If you try to add additional passes, this becomes quite fragile.
> > > This
> > > patch simply chooses the pass based on the data pointer passed
> > > in,
> > > and
> > > > removes the pass counter.   The first FULL VRP pass invokes the
> > > warning
> > > code, and the flag passed in now represents the FINAL pass of
> > > VRP.
> > > There is no longer a global flag which, as it turns out, wasn't
> > > working
> > > > well with the JIT compiler, but went undetected.  (Thanks to
> > > dmalcolm
> > > for helping me sort out what was going on there)
> > > 
> > > 
> > > Bootstraps  on x86_64-pc-linux-gnu with no regressions.   Pushed.
> > [CCing jit mailing list]
> > 
> > I'm worried that this patch may have "papered over" an issue with
> > libgccjit.  Specifically:
> 
> well, that isn't the patch that was checked in :-P

Aha!  That makes much more sense.  I took a look at
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=7eb5ce7f58ed4a48641e1786e4fdeb2f7fb8c5ff
and yes, that looks like it will work with libgccjit

Thanks for clarifying (and for fixing the issue)
Dave

> 
> I'm not sure how the old version got into the commit note.
> 
> Attached is the version checked in.
> 



[PATCH 2/2] libstdc++: _versioned_namespace is always non-None

2023-10-03 Thread Tom Tromey
Some code in the pretty-printers seems to assume that the
_versioned_namespace global might be None (or the empty string).
However, this doesn't occur, as the variable is never reassigned.
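
As a standalone illustration of the unconditional form (a sketch only: the
real functions also accept gdb.Type objects, while this version takes plain
strings so it runs without gdb), the optional-group regexp still matches
names with or without the versioned namespace:

```python
import re

# Same value the printers use; it is a constant and never reassigned.
_versioned_namespace = '__8::'


def is_specialization_of(type_name, template_name):
    """String-only variant: does type_name name std::template_name<...>?"""
    # The namespace is wrapped in an optional group, so both
    # std::vector<int> and std::__8::vector<int> match.
    pattern = '(%s)?%s' % (_versioned_namespace, template_name)
    return re.match('^std::%s<.*>$' % pattern, type_name) is not None


def strip_versioned_namespace(type_name):
    """Remove the versioned namespace; a no-op when it is absent."""
    return type_name.replace(_versioned_namespace, '')
```

Because str.replace leaves a string untouched when the substring is absent,
dropping the guard does not change the result for unversioned names.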

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py: Assume that
_versioned_namespace is non-None.
* python/libstdcxx/v6/xmethods.py (is_specialization_of):
Assume that _versioned_namespace is non-None.
---
 libstdc++-v3/python/libstdcxx/v6/printers.py | 15 ++-
 libstdc++-v3/python/libstdcxx/v6/xmethods.py |  3 +--
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py 
b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 23efbd171ec..e370551cbe1 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -139,7 +139,7 @@ def lookup_templ_spec(templ, *args):
 except gdb.error as e:
 # Type not found, try again in versioned namespace.
 global _versioned_namespace
-if _versioned_namespace and _versioned_namespace not in templ:
+if _versioned_namespace not in templ:
 t = t.replace('::', '::' + _versioned_namespace, 1)
 try:
 return gdb.lookup_type(t)
@@ -211,16 +211,13 @@ def is_specialization_of(x, template_name):
 global _versioned_namespace
 if isinstance(x, gdb.Type):
 x = x.tag
-if _versioned_namespace:
-template_name = '(%s)?%s' % (_versioned_namespace, template_name)
+template_name = '(%s)?%s' % (_versioned_namespace, template_name)
 return re.match('^std::%s<.*>$' % template_name, x) is not None
 
 
 def strip_versioned_namespace(typename):
 global _versioned_namespace
-if _versioned_namespace:
-return typename.replace(_versioned_namespace, '')
-return typename
+return typename.replace(_versioned_namespace, '')
 
 
 def strip_inline_namespaces(type_str):
@@ -2355,7 +2352,7 @@ class Printer(object):
 # Add a name using _GLIBCXX_BEGIN_NAMESPACE_VERSION.
 def add_version(self, base, name, function):
 self.add(base + name, function)
-if _versioned_namespace and '__cxx11' not in base:
+if '__cxx11' not in base:
 vbase = re.sub('^(std|__gnu_cxx)::', r'\g<0>%s' %
_versioned_namespace, base)
 self.add(vbase + name, function)
@@ -2527,7 +2524,7 @@ def add_one_template_type_printer(obj, name, defargs):
 printer = TemplateTypePrinter('std::__debug::' + name, defargs)
 gdb.types.register_type_printer(obj, printer)
 
-if _versioned_namespace and '__cxx11' not in name:
+if '__cxx11' not in name:
 # Add second type printer for same type in versioned namespace:
 ns = 'std::' + _versioned_namespace
 # PR 86112 Cannot use dict comprehension here:
@@ -2628,7 +2625,7 @@ class FilteringTypePrinter(object):
 def add_one_type_printer(obj, template, name, targ1=None):
 printer = FilteringTypePrinter('std::' + template, 'std::' + name, targ1)
 gdb.types.register_type_printer(obj, printer)
-if _versioned_namespace and '__cxx11' not in template:
+if '__cxx11' not in template:
 ns = 'std::' + _versioned_namespace
 printer = FilteringTypePrinter(ns + template, ns + name, targ1)
 gdb.types.register_type_printer(obj, printer)
diff --git a/libstdc++-v3/python/libstdcxx/v6/xmethods.py 
b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
index 8ccf57c4d6b..42e60eb57b1 100644
--- a/libstdc++-v3/python/libstdcxx/v6/xmethods.py
+++ b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
@@ -39,8 +39,7 @@ def is_specialization_of(x, template_name):
 """
 if isinstance(x, gdb.Type):
 x = x.tag
-if _versioned_namespace:
-template_name = '(%s)?%s' % (_versioned_namespace, template_name)
+template_name = '(%s)?%s' % (_versioned_namespace, template_name)
 return re.match(r'^std::(__\d::)?%s<.*>$' % template_name, x) is not None
 
 class LibStdCxxXMethod(gdb.xmethod.XMethod):
-- 
2.40.1



[PATCH 1/2] libstdc++: Define _versioned_namespace in xmethods.py

2023-10-03 Thread Tom Tromey
flake8 pointed out that is_specialization_of in xmethods.py looks at a
global that wasn't added to the file.  This patch corrects the
oversight.
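
For context, Python only resolves a global name when the function body runs,
so a module with this kind of omission imports cleanly and fails later.  A
minimal sketch, where _hypothetical_namespace and both function names are
made up for the illustration:

```python
def lookup_without_global(template_name):
    # _hypothetical_namespace is deliberately never defined in this module,
    # mirroring a function that reads a global the file forgot to add.
    return '(%s)?%s' % (_hypothetical_namespace, template_name)


def call_raises_name_error():
    """Defining the function is fine; only calling it raises NameError."""
    try:
        lookup_without_global('vector')
    except NameError:
        return True
    return False
```

This is why a static checker such as flake8 is useful here: it reports the
undefined name without needing to execute that code path.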

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/xmethods.py (_versioned_namespace):
Define.
---
 libstdc++-v3/python/libstdcxx/v6/xmethods.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libstdc++-v3/python/libstdcxx/v6/xmethods.py 
b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
index 844c8a2105a..8ccf57c4d6b 100644
--- a/libstdc++-v3/python/libstdcxx/v6/xmethods.py
+++ b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
@@ -28,6 +28,8 @@ def get_bool_type():
 def get_std_size_type():
 return gdb.lookup_type('std::size_t')
 
+_versioned_namespace = '__8::'
+
 def is_specialization_of(x, template_name):
 """
 Test whether a type is a specialization of the named class template.
-- 
2.40.1



[PATCH 0/2] A couple minor _versioned_namespace patches

2023-10-03 Thread Tom Tromey
While I was working on the flake8/black patches, flake8 pointed out a
bug in xmethods.py.  This is fixed in patch 1.  Then I found the
checks of _versioned_namespace to be a bit odd, so I wrote patch 2.

Tested on x86-64 Fedora 36.

Tom




Re: [COMMITTED] Remove pass counting in VRP.

2023-10-03 Thread Andrew MacLeod


On 10/3/23 13:02, David Malcolm wrote:
> On Tue, 2023-10-03 at 10:32 -0400, Andrew MacLeod wrote:
> > Pass counting in VRP is used to decide when to call early VRP, pass the
> > flag to enable warnings, and when the final pass is.
> >
> > If you try to add additional passes, this becomes quite fragile. This
> > patch simply chooses the pass based on the data pointer passed in, and
> > removes the pass counter.  The first FULL VRP pass invokes the warning
> > code, and the flag passed in now represents the FINAL pass of VRP.
> > There is no longer a global flag which, as it turns out, wasn't working
> > well with the JIT compiler, but went undetected.  (Thanks to dmalcolm
> > for helping me sort out what was going on there)
> >
> > Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.
>
> [CCing jit mailing list]
>
> I'm worried that this patch may have "papered over" an issue with
> libgccjit.  Specifically:

Well, that isn't the patch that was checked in :-P

I'm not sure how the old version got into the commit note.

Attached is the version checked in.

commit 7eb5ce7f58ed4a48641e1786e4fdeb2f7fb8c5ff
Author: Andrew MacLeod 
Date:   Thu Sep 28 09:19:32 2023 -0400

Remove pass counting in VRP.

Rather than using a pass count to decide which parameters are passed to
VRP, make it explicit.

* passes.def (pass_vrp): Pass "final pass" flag as parameter.
* tree-vrp.cc (vrp_pass_num): Remove.
(pass_vrp::my_pass): Remove.
(pass_vrp::pass_vrp): Add warn_p as a parameter.
(pass_vrp::final_p): New.
(pass_vrp::set_pass_param): Set final_p param.
(pass_vrp::execute): Call execute_range_vrp with no conditions.
(make_pass_vrp): Pass additional parameter.
(make_pass_early_vrp): Ditto.

diff --git a/gcc/passes.def b/gcc/passes.def
index 4110a472914..2bafd60bbfb 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -221,7 +221,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_fre, true /* may_iterate */);
   NEXT_PASS (pass_merge_phi);
   NEXT_PASS (pass_thread_jumps_full, /*first=*/true);
-  NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);
+  NEXT_PASS (pass_vrp, false /* final_p*/);
   NEXT_PASS (pass_dse);
   NEXT_PASS (pass_dce);
   /* pass_stdarg is always run and at this point we execute
@@ -348,7 +348,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
   NEXT_PASS (pass_strlen);
   NEXT_PASS (pass_thread_jumps_full, /*first=*/false);
-  NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
+  NEXT_PASS (pass_vrp, true /* final_p */);
   /* Run CCP to compute alignment and nonzero bits.  */
   NEXT_PASS (pass_ccp, true /* nonzero_p */);
   NEXT_PASS (pass_warn_restrict);
diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index d7b194f5904..4f8c7745461 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -1120,36 +1120,32 @@ const pass_data pass_data_early_vrp =
   ( TODO_cleanup_cfg | TODO_update_ssa | TODO_verify_all ),
 };
 
-static int vrp_pass_num = 0;
 class pass_vrp : public gimple_opt_pass
 {
 public:
-  pass_vrp (gcc::context *ctxt, const pass_data &data_)
-: gimple_opt_pass (data_, ctxt), data (data_), warn_array_bounds_p (false),
-  my_pass (vrp_pass_num++)
-  {}
+  pass_vrp (gcc::context *ctxt, const pass_data &data_, bool warn_p)
+: gimple_opt_pass (data_, ctxt), data (data_),
+  warn_array_bounds_p (warn_p), final_p (false)
+{ }
 
   /* opt_pass methods: */
-  opt_pass * clone () final override { return new pass_vrp (m_ctxt, data); }
+  opt_pass * clone () final override
+{ return new pass_vrp (m_ctxt, data, false); }
   void set_pass_param (unsigned int n, bool param) final override
 {
   gcc_assert (n == 0);
-  warn_array_bounds_p = param;
+  final_p = param;
 }
   bool gate (function *) final override { return flag_tree_vrp != 0; }
   unsigned int execute (function *fun) final override
 {
-  // Early VRP pass.
-  if (my_pass == 0)
-	return execute_ranger_vrp (fun, /*warn_array_bounds_p=*/false, false);
-
-  return execute_ranger_vrp (fun, warn_array_bounds_p, my_pass == 2);
+  return execute_ranger_vrp (fun, warn_array_bounds_p, final_p);
 }
 
  private:
   const pass_data &data;
   bool warn_array_bounds_p;
-  int my_pass;
+  bool final_p;
 }; // class pass_vrp
 
 const pass_data pass_data_assumptions =
@@ -1219,13 +1215,13 @@ public:
 gimple_opt_pass *
 make_pass_vrp (gcc::context *ctxt)
 {
-  return new pass_vrp (ctxt, pass_data_vrp);
+  return new pass_vrp (ctxt, pass_data_vrp, true);
 }
 
 gimple_opt_pass *
 make_pass_early_vrp (gcc::context *ctxt)
 {
-  return new pass_vrp (ctxt, pass_data_early_vrp);
+  return new pass_vrp (ctxt, pass_data_early_vrp, false);
 }
 
 gimple_opt_pass *


[committed] ipa-modref: Fix dumping

2023-10-03 Thread Martin Jambor
Hi,

function dump_lto_records ought to dump to its parameter OUT but was
dumping expressions to dump_file.  This is corrected by this patch and
while at at, I also made the modref_summary::dump member function
const so that it is callable from more contexts.
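
The shape of the bug is easy to reproduce in miniature.  The following is a
Python sketch, not the GCC code: dump_file stands in for the global dump
stream, the two function names are hypothetical, and the alias set is fixed
at 0 purely for illustration.

```python
import io

dump_file = io.StringIO()  # stand-in for the global dump stream


def dump_records_buggy(bases, out):
    """Writes the labels to out but the expression to the global stream."""
    for i, base in enumerate(bases):
        out.write("  Base %i:" % i)
        dump_file.write(base)  # bug: ignores the out parameter
        out.write(" (alias set 0)\n")


def dump_records_fixed(bases, out):
    """Writes everything to the stream the caller asked for."""
    for i, base in enumerate(bases):
        out.write("  Base %i:" % i)
        out.write(base)
        out.write(" (alias set 0)\n")
```

With the buggy variant, a caller passing its own stream gets the labels but
not the expressions, which land in the global stream instead.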

I have committed this patch as obvious after including it in a bootstrap
and testing on x86_64-linux.

Thanks,

Martin


gcc/ChangeLog:

2023-09-21  Martin Jambor  

* ipa-modref.h (modref_summary::dump): Make const.
* ipa-modref.cc (modref_summary::dump): Likewise.
(dump_lto_records): Dump to out instead of dump_file.
---
 gcc/ipa-modref.cc | 6 +++---
 gcc/ipa-modref.h  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
index c04f9f44c06..fe55621f007 100644
--- a/gcc/ipa-modref.cc
+++ b/gcc/ipa-modref.cc
@@ -474,7 +474,7 @@ dump_lto_records (modref_records_lto *tt, FILE *out)
   FOR_EACH_VEC_SAFE_ELT (tt->bases, i, n)
 {
   fprintf (out, "  Base %i:", (int)i);
-  print_generic_expr (dump_file, n->base);
+  print_generic_expr (out, n->base);
   fprintf (out, " (alias set %i)\n",
   n->base ? get_alias_set (n->base) : 0);
   if (n->every_ref)
@@ -487,7 +487,7 @@ dump_lto_records (modref_records_lto *tt, FILE *out)
   FOR_EACH_VEC_SAFE_ELT (n->refs, j, r)
{
  fprintf (out, "Ref %i:", (int)j);
- print_generic_expr (dump_file, r->ref);
+ print_generic_expr (out, r->ref);
  fprintf (out, " (alias set %i)\n",
   r->ref ? get_alias_set (r->ref) : 0);
  if (r->every_access)
@@ -567,7 +567,7 @@ remove_modref_edge_summaries (cgraph_node *node)
 /* Dump summary.  */
 
 void
-modref_summary::dump (FILE *out)
+modref_summary::dump (FILE *out) const
 {
   if (loads)
 {
diff --git a/gcc/ipa-modref.h b/gcc/ipa-modref.h
index 2a2d31e86db..f7dedace2da 100644
--- a/gcc/ipa-modref.h
+++ b/gcc/ipa-modref.h
@@ -66,7 +66,7 @@ struct GTY(()) modref_summary
 
   modref_summary ();
   ~modref_summary ();
-  void dump (FILE *);
+  void dump (FILE *) const;
   bool useful_p (int ecf_flags, bool check_flags = true);
   void finalize (tree);
 };
-- 
2.42.0



Re: [PATCH] ipa: Self-DCE of uses of removed call LHSs (PR 108007)

2023-10-03 Thread Martin Jambor
Hello,

On Mon, Sep 25 2023, Jan Hubicka wrote:

[...]

>> >> +static void
>> >> +purge_transitive_uses (tree name, hash_set <tree> *killed_ssas)
>> >> +{
>> >> +  imm_use_iterator imm_iter;
>> >> +  gimple *stmt;
>> >> +
>> >> +  FOR_EACH_IMM_USE_STMT (stmt, imm_iter, name)
>> >> +{
>> >> +  if (gimple_debug_bind_p (stmt))
>> >> + {
>> >> +   /* When runing within tree-inline, we will never end up here but
>> >> +  adding the SSAs to killed_ssas will do the trick in this case and
>> >> +  the respective debug statements will get reset. */
>> >> +
>> >> +   gimple_debug_bind_reset_value (stmt);
>> >> +   update_stmt (stmt);
>> >> +   continue;
>> >> + }
>> >> +
>> >> +  tree lhs = NULL_TREE;
>> >> +  if (is_gimple_assign (stmt))
>> >> + lhs = gimple_assign_lhs (stmt);
>> >> +  else if (gimple_code (stmt) == GIMPLE_PHI)
>> >> + lhs = gimple_phi_result (stmt);
>> >> +  gcc_assert (lhs
>> >> +   && (TREE_CODE (lhs) == SSA_NAME)
>> >> +   && !gimple_vdef (stmt));
>> >> +
>> >> +  if (!killed_ssas->contains (lhs))
>> >> + {
>> >> +   killed_ssas->add (lhs);
>> >> +   purge_transitive_uses (lhs, killed_ssas);
>
> SSA graph may be deep so this may cause stack overflow, so I think we
> should use worklist here (it is also easy to do).
>
> OK with that change.
> Honza

I have just committed the following after a bootstrap and testing on
x86_64-linux.

Thanks,

Martin


PR 108007 is another manifestation where we rely on DCE to clean-up
after IPA-SRA and if the user explicitly switches DCE off, IPA-SRA
can leave behind statements which are fed uninitialized values and
trap, even though their results are themselves never used.

I have already fixed this for unused parameters in callees, this bug
shows that almost the same thing can happen for removed returns, on
the side of callers.  This means that the issue has to be fixed
elsewhere, in call redirection.  This patch adds a function which
looks for (and through, using a work-list) uses of operations fed
specific SSA names and removes them all.
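
The work-list idea can be sketched as follows.  This is a Python model under
assumptions, not the committed function: the SSA graph is represented as a
plain dict mapping each name to the names defined by the statements that use
it, and debug statements are ignored.

```python
def purge_transitive_uses(name, uses, killed):
    """Collect every name transitively fed by `name`, without recursion.

    uses   -- dict: SSA name -> iterable of names defined by its users
    killed -- set that accumulates the names found to be dead
    """
    worklist = [name]
    while worklist:
        current = worklist.pop()
        for lhs in uses.get(current, ()):
            if lhs not in killed:
                killed.add(lhs)
                worklist.append(lhs)
    return killed
```

Since the pending names live in an explicit list rather than on the call
stack, a use chain tens of thousands of statements deep is processed without
any risk of stack overflow, which was the concern raised in review.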

That would have been easy if it wasn't for debug statements during
tree-inline (from which call redirection is also invoked).  Debug
statements are decoupled from the rest at this point and iterating
over uses of SSAs does not bring them up.  During tree-inline they are
handled specially at the end, I assume in order to make sure that
relative ordering of UIDs are the same with and without debug info.

This means that during tree-inline we need to make a hash of killed
SSAs, that we already have in copy_body_data, available to the
function making the purging.  So the patch duly does also that, making
the interface slightly ugly.

gcc/ChangeLog:

2023-09-27  Martin Jambor  

PR ipa/108007
* cgraph.h (cgraph_edge): Add a parameter to
redirect_call_stmt_to_callee.
* ipa-param-manipulation.h (ipa_param_adjustments): Add a
parameter to modify_call.
* cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee): New
parameter killed_ssas, pass it to padjs->modify_call.
* ipa-param-manipulation.cc (purge_transitive_uses): New function.
(ipa_param_adjustments::modify_call): New parameter killed_ssas.
Instead of substituting uses, invoke purge_transitive_uses.  If
hash of killed SSAs has not been provided, create a temporary one
and release SSAs that have been added to it.
* tree-inline.cc (redirect_all_calls): Create
id->killed_new_ssa_names earlier, pass it to edge redirection,
adjust a comment.
(copy_body): Release SSAs in id->killed_new_ssa_names.

gcc/testsuite/ChangeLog:

2023-05-11  Martin Jambor  

PR ipa/108007
* gcc.dg/ipa/pr108007.c: New test.
---
 gcc/cgraph.cc   | 10 +++-
 gcc/cgraph.h|  9 ++-
 gcc/ipa-param-manipulation.cc   | 88 +
 gcc/ipa-param-manipulation.h|  3 +-
 gcc/testsuite/gcc.dg/ipa/pr108007.c | 32 +++
 gcc/tree-inline.cc  | 28 +
 6 files changed, 132 insertions(+), 38 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr108007.c

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index e41e5ad3ae7..b82367ac342 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -1403,11 +1403,17 @@ cgraph_edge::redirect_callee (cgraph_node *n)
speculative indirect call, remove "speculative" of the indirect call and
also redirect stmt to it's final direct target.
 
+   When called from within tree-inline, KILLED_SSAs has to contain the pointer
+   to killed_new_ssa_names within the copy_body_data structure and SSAs
+   discovered to be useless (if LHS is removed) will be added to it, otherwise
+   it needs to be NULL.
+
It is up to caller to iteratively transform each "speculative"
direct call as appropriate.  */
 
 gimple *
-cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)

Re: [COMMITTED] Remove pass counting in VRP.

2023-10-03 Thread David Malcolm
On Tue, 2023-10-03 at 10:32 -0400, Andrew MacLeod wrote:
> Pass counting in VRP is used to decide when to call early VRP, pass
> the 
> flag to enable warnings, and when the final pass is.
> 
> If you try to add additional passes, this becomes quite fragile. This
> patch simply chooses the pass based on the data pointer passed in,
> and 
> removes the pass counter.   The first FULL VRP pass invokes the
> warning 
> code, and the flag passed in now represents the FINAL pass of VRP.  
> There is no longer a global flag which, as it turns out, wasn't
> working 
> well with the JIT compiler, but went undetected.  (Thanks to dmalcolm
> for helping me sort out what was going on there)
> 
> 
> Bootstraps  on x86_64-pc-linux-gnu with no regressions.   Pushed.

[CCing jit mailing list]

I'm worried that this patch may have "papered over" an issue with
libgccjit.  Specifically:

[...snip...]

> diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
> index d7b194f5904..05266dfe34a 100644
> --- a/gcc/tree-vrp.cc
> +++ b/gcc/tree-vrp.cc
> @@ -1120,36 +1120,44 @@ const pass_data pass_data_early_vrp =
>( TODO_cleanup_cfg | TODO_update_ssa | TODO_verify_all ),
>  };
>  
> -static int vrp_pass_num = 0;
> +static bool run_warning_pass = true;

I see the global variable "run_warning_pass" starts out true here

>  class pass_vrp : public gimple_opt_pass
>  {
>  public:
>pass_vrp (gcc::context *ctxt, const pass_data &data_)
> -: gimple_opt_pass (data_, ctxt), data (data_), warn_array_bounds_p (false),
> -  my_pass (vrp_pass_num++)
> -  {}
> +: gimple_opt_pass (data_, ctxt), data (data_),
> +  warn_array_bounds_p (false), final_p (false)
> +  {
> +// Only the frst VRP pass should run warnings.
> +if ( == _data_vrp)
> +  {
> + warn_array_bounds_p = run_warning_pass;
> + run_warning_pass = false;

...and run_warning_pass affects the member data
pass_vrp::warn_array_bounds_p here, and then becomes false, but nothing
seems to ever reset run_warning_pass back to true.

It seems that with this patch, if libgccjit compiles more than one
gcc_jit_context in the same process, the first context compilation will
warn, whereas subsequent ones in that process won't.

Or did I miss something?
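
To make the concern concrete, here is a minimal sketch of the pattern, not the actual GCC code.  Every name prefixed with toy_ is hypothetical, and the model is simplified: in the quoted patch only the pass created with pass_data_vrp consumes the flag.  Each call to toy_compile_warns_p stands for one in-process compilation, such as compiling one gcc_jit_context.

```cpp
#include <cassert>

// Stand-in for the file-scope flag in the quoted patch.
static bool run_warning_pass = true;

// Hypothetical, simplified model of pass_vrp: the first instance that is
// constructed takes the warning role and clears the global flag.
struct toy_vrp_pass
{
  bool warn_array_bounds_p;

  toy_vrp_pass () : warn_array_bounds_p (run_warning_pass)
  {
    run_warning_pass = false;
  }
};

// Hypothetical model of one in-process compilation: it builds a fresh set
// of passes and reports whether any of them would emit warnings.
static bool
toy_compile_warns_p ()
{
  toy_vrp_pass first_vrp;
  toy_vrp_pass second_vrp;
  return first_vrp.warn_array_bounds_p || second_vrp.warn_array_bounds_p;
}

// One possible remedy, purely illustrative: restore the flag whenever
// per-compilation state is torn down.
static void
toy_reset_state ()
{
  run_warning_pass = true;
}
```

In this model the first call to toy_compile_warns_p returns true and later calls return false until toy_reset_state is called, which is the behaviour I am worried about for repeated compilations in one process.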

[...snip...]

Thoughts?
Dave



Re: [COMMITTED] Return TRUE only when a global value is updated.

2023-10-03 Thread Andrew MacLeod

perfect.  I'll check it in when my testrun is done.

Thanks  .. .  and sorry :-)

Andrew

On 10/3/23 12:53, David Edelsohn wrote:

AIX bootstrap is happier with the patch.

Thanks, David

On Tue, Oct 3, 2023 at 12:30 PM Andrew MacLeod  
wrote:


Give this a try.  I'm testing it here, but x86 doesn't seem to show it
anyway for some reason :-P

I think I needed to handle pointers specially, since SSA_NAMEs handle
pointer ranges differently.

Andrew

On 10/3/23 11:47, David Edelsohn wrote:
> This patch caused a bootstrap failure on AIX.
>
> during GIMPLE pass: evrp
>
> /nasfarm/edelsohn/src/src/libgcc/libgcc2.c: In function
'__gcc_bcmp':
>
> /nasfarm/edelsohn/src/src/libgcc/libgcc2.c:2910:1: internal
compiler
> error: in get_irange, at value-range-storage.cc:343
>
> 2910 | }
>
> | ^
>
>
> 0x11b7f4b7 irange_storage::get_irange(irange&, tree_node*) const
>
> /nasfarm/edelsohn/src/src/gcc/value-range-storage.cc:343
>
> 0x11b7e7af vrange_storage::get_vrange(vrange&, tree_node*) const
>
> /nasfarm/edelsohn/src/src/gcc/value-range-storage.cc:178
>
> 0x139f3d77 range_info_get_range(tree_node const*, vrange&)
>
> /nasfarm/edelsohn/src/src/gcc/tree-ssanames.cc:118
>
> 0x1134b463 set_range_info(tree_node*, vrange const&)
>
> /nasfarm/edelsohn/src/src/gcc/tree-ssanames.cc:425
>
> 0x116a7333 gimple_ranger::register_inferred_ranges(gimple*)
>
> /nasfarm/edelsohn/src/src/gcc/gimple-range.cc:487
>
> 0x125cef27 rvrp_folder::fold_stmt(gimple_stmt_iterator*)
>
> /nasfarm/edelsohn/src/src/gcc/tree-vrp.cc:1033
>
> 0x123dd063
>
substitute_and_fold_dom_walker::before_dom_children(basic_block_def*)
>
> /nasfarm/edelsohn/src/src/gcc/tree-ssa-propagate.cc:876
>
> 0x1176cc43 dom_walker::walk(basic_block_def*)
>
> /nasfarm/edelsohn/src/src/gcc/domwalk.cc:311
>
> 0x123dd733
> substitute_and_fold_engine::substitute_and_fold(basic_block_def*)
>
> /nasfarm/edelsohn/src/src/gcc/tree-ssa-propagate.cc:999
>
> 0x123d0f5f execute_ranger_vrp(function*, bool, bool)
>
> /nasfarm/edelsohn/src/src/gcc/tree-vrp.cc:1062
>
> 0x123d14ef execute
>
> /nasfarm/edelsohn/src/src/gcc/tree-vrp.cc:1142
>





Re: Check that passes do not forget to define profile

2023-10-03 Thread Andre Vieira (lists)

Hi Honza,

My current patch set for AArch64 VLA omp codegen started failing on 
gcc.dg/gomp/pr87898.c after this. I traced it back to 
'move_sese_region_to_fn' in tree-cfg.cc not setting count for the bb 
created.


I was able to 'fix' it locally by setting the count of the new bb to the 
accumulation of e->count () of all the entry edges (if initialized). 
I'm however not even close to certain that's the right approach; 
the attached patch is for illustration.


Kind regards,
Andre

On 24/08/2023 14:14, Jan Hubicka via Gcc-patches wrote:

Hi,
this patch extends the verifier to check that all probabilities and counts are
initialized if a profile is supposed to be present.  This is a bit complicated
by the possibility that we inline a !flag_guess_branch_probability function
into a function with a profile defined, in which case we need to stop
verification.  For this reason I added a flag to the cfg structure tracking this.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* cfg.h (struct control_flow_graph): New field full_profile.
* auto-profile.cc (afdo_annotate_cfg): Set full_profile to true.
* cfg.cc (init_flow): Set full_profile to false.
* graphite.cc (graphite_transform_loops): Set full_profile to false.
* lto-streamer-in.cc (input_cfg): Initialize full_profile flag.
* predict.cc (pass_profile::execute): Set full_profile to true.
* symtab-thunks.cc (expand_thunk): Set full_profile to true.
* tree-cfg.cc (gimple_verify_flow_info): Verify that profile is full
if full_profile is set.
* tree-inline.cc (initialize_cfun): Initialize full_profile.
(expand_call_inline): Combine full_profile.


diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index e3af3555e75..ff3b763945c 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1578,6 +1578,7 @@ afdo_annotate_cfg (const stmt_set _stmts)
  }
update_max_bb_count ();
profile_status_for_fn (cfun) = PROFILE_READ;
+  cfun->cfg->full_profile = true;
if (flag_value_profile_transformations)
  {
gimple_value_profile_transformations ();
diff --git a/gcc/cfg.cc b/gcc/cfg.cc
index 9eb9916f61a..b7865f14e7f 100644
--- a/gcc/cfg.cc
+++ b/gcc/cfg.cc
@@ -81,6 +81,7 @@ init_flow (struct function *the_fun)
  = ENTRY_BLOCK_PTR_FOR_FN (the_fun);
the_fun->cfg->edge_flags_allocated = EDGE_ALL_FLAGS;
the_fun->cfg->bb_flags_allocated = BB_ALL_FLAGS;
+  the_fun->cfg->full_profile = false;
  }
  
  /* Helper function for remove_edge and free_cffg.  Frees edge structure
diff --git a/gcc/cfg.h b/gcc/cfg.h
index a0e944979c8..53e2553012c 100644
--- a/gcc/cfg.h
+++ b/gcc/cfg.h
@@ -78,6 +78,9 @@ struct GTY(()) control_flow_graph {
/* Dynamically allocated edge/bb flags.  */
int edge_flags_allocated;
int bb_flags_allocated;
+
+  /* Set if the profile is computed on every edge and basic block.  */
+  bool full_profile;
  };
  
  
diff --git a/gcc/graphite.cc b/gcc/graphite.cc

index 19f8975ffa2..2b387d5b016 100644
--- a/gcc/graphite.cc
+++ b/gcc/graphite.cc
@@ -512,6 +512,8 @@ graphite_transform_loops (void)
  
if (changed)

  {
+  /* FIXME: Graphite does not update profile meaningfully currently.  */
+  cfun->cfg->full_profile = false;
cleanup_tree_cfg ();
profile_status_for_fn (cfun) = PROFILE_ABSENT;
release_recorded_exits (cfun);
diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 0cce14414ca..d3128fcebe4 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1030,6 +1030,7 @@ input_cfg (class lto_input_block *ib, class data_in 
*data_in,
basic_block p_bb;
unsigned int i;
int index;
+  bool full_profile = false;
  
init_empty_tree_cfg_for_function (fn);
  
@@ -1071,6 +1072,8 @@ input_cfg (class lto_input_block *ib, class data_in *data_in,

  data_in->location_cache.input_location_and_block (>goto_locus,
, ib, data_in);
  e->probability = profile_probability::stream_in (ib);
+ if (!e->probability.initialized_p ())
+   full_profile = false;
  
  	}
  
@@ -1145,6 +1148,7 @@ input_cfg (class lto_input_block *ib, class data_in *data_in,
  
/* Rebuild the loop tree.  */

flow_loops_find (loops);
+  cfun->cfg->full_profile = full_profile;
  }
  
  
diff --git a/gcc/predict.cc b/gcc/predict.cc

index 5a1a561cc24..396746cbfd1 100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -4131,6 +4131,7 @@ pass_profile::execute (function *fun)
  scev_initialize ();
  
tree_estimate_probability (false);

+  cfun->cfg->full_profile = true;
  
if (nb_loops > 1)

  scev_finalize ();
diff --git a/gcc/symtab-thunks.cc b/gcc/symtab-thunks.cc
index 4c04235c41b..23ead0d2138 100644
--- a/gcc/symtab-thunks.cc
+++ b/gcc/symtab-thunks.cc
@@ -648,6 +648,7 @@ expand_thunk (cgraph_node *node, bool output_asm_thunks,
  ? PROFILE_READ : PROFILE_GUESSED;
/* FIXME: C++ FE 

Re: [PATCH v3] RISC-V:Optimize the MASK opt generation

2023-10-03 Thread Jeff Law




On 10/2/23 20:38, Kito Cheng wrote:

Proposed fix, verified with "mawk" and "gawk -P" (gawk in POSIX mode)
on my Linux machine; others have reported that it also works on FreeBSD.
Just waiting for review :)

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631785.html

OK
jeff


Re: PING: PR rtl-optimization/110701

2023-10-03 Thread Jeff Law




On 10/3/23 09:55, Roger Sayle wrote:

There are a small handful of middle-end maintainers/reviewers that

understand and appreciate the difference between the RTL statements:

(set (subreg:HI (reg:SI x)) (reg:HI y))

and

(set (strict_lowpart:HI (reg:SI x)) (reg:HI y))

If one (or more) of them could please take a look at

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625532.html 



I’d very much appreciate it (one less wrong-code regression).

This definitely fell through the cracks.

The patch is OK for the trunk.

Thanks again,
jeff


Re: [COMMITTED] Return TRUE only when a global value is updated.

2023-10-03 Thread David Edelsohn
AIX bootstrap is happier with the patch.

Thanks, David

On Tue, Oct 3, 2023 at 12:30 PM Andrew MacLeod  wrote:

> Give this a try.  I'm testing it here, but x86 doesn't seem to show it
> anyway for some reason :-P
>
> I think I needed to handle pointers specially, since SSA_NAMEs handle
> pointer ranges differently.
>
> Andrew
>
> On 10/3/23 11:47, David Edelsohn wrote:
> > This patch caused a bootstrap failure on AIX.
> >
> > during GIMPLE pass: evrp
> >
> > /nasfarm/edelsohn/src/src/libgcc/libgcc2.c: In function '__gcc_bcmp':
> >
> > /nasfarm/edelsohn/src/src/libgcc/libgcc2.c:2910:1: internal compiler
> > error: in get_irange, at value-range-storage.cc:343
> >
> > 2910 | }
> >
> > | ^
> >
> >
> > 0x11b7f4b7 irange_storage::get_irange(irange&, tree_node*) const
> >
> > /nasfarm/edelsohn/src/src/gcc/value-range-storage.cc:343
> >
> > 0x11b7e7af vrange_storage::get_vrange(vrange&, tree_node*) const
> >
> > /nasfarm/edelsohn/src/src/gcc/value-range-storage.cc:178
> >
> > 0x139f3d77 range_info_get_range(tree_node const*, vrange&)
> >
> > /nasfarm/edelsohn/src/src/gcc/tree-ssanames.cc:118
> >
> > 0x1134b463 set_range_info(tree_node*, vrange const&)
> >
> > /nasfarm/edelsohn/src/src/gcc/tree-ssanames.cc:425
> >
> > 0x116a7333 gimple_ranger::register_inferred_ranges(gimple*)
> >
> > /nasfarm/edelsohn/src/src/gcc/gimple-range.cc:487
> >
> > 0x125cef27 rvrp_folder::fold_stmt(gimple_stmt_iterator*)
> >
> > /nasfarm/edelsohn/src/src/gcc/tree-vrp.cc:1033
> >
> > 0x123dd063
> > substitute_and_fold_dom_walker::before_dom_children(basic_block_def*)
> >
> > /nasfarm/edelsohn/src/src/gcc/tree-ssa-propagate.cc:876
> >
> > 0x1176cc43 dom_walker::walk(basic_block_def*)
> >
> > /nasfarm/edelsohn/src/src/gcc/domwalk.cc:311
> >
> > 0x123dd733
> > substitute_and_fold_engine::substitute_and_fold(basic_block_def*)
> >
> > /nasfarm/edelsohn/src/src/gcc/tree-ssa-propagate.cc:999
> >
> > 0x123d0f5f execute_ranger_vrp(function*, bool, bool)
> >
> > /nasfarm/edelsohn/src/src/gcc/tree-vrp.cc:1062
> >
> > 0x123d14ef execute
> >
> > /nasfarm/edelsohn/src/src/gcc/tree-vrp.cc:1142
> >


[PATCH] c++: print source code in print_instantiation_partial_context_line

2023-10-03 Thread David Malcolm
As mentioned in my Cauldron talk, this patch adds a call to
diagnostic_show_locus to the "required from here" messages
in print_instantiation_partial_context_line, so that e.g., instead
of the rather mystifying:

In file included from ../x86_64-pc-linux-gnu/libstdc++-v3/include/memory:78,
 from ../../src/demo-1.C:1:
../x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h: In instantiation 
of ‘std::__detail::__unique_ptr_t<_Tp> std::make_unique(_Args&& ...) [with _Tp 
= bar; _Args = {}; __detail::__unique_ptr_t<_Tp> = 
__detail::__unique_ptr_t]’:
../../src/demo-1.C:15:32:   required from here
../x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:1066:30: error: 
no matching function for call to ‘bar::bar()’
 1066 | { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
  |  ^~~
../../src/demo-1.C:10:3: note: candidate: ‘bar::bar(int)’
   10 |   bar (int);
  |   ^~~
../../src/demo-1.C:10:3: note:   candidate expects 1 argument, 0 provided
../../src/demo-1.C:7:7: note: candidate: ‘constexpr bar::bar(const bar&)’
7 | class bar : public foo
  |   ^~~
../../src/demo-1.C:7:7: note:   candidate expects 1 argument, 0 provided
../../src/demo-1.C:7:7: note: candidate: ‘constexpr bar::bar(bar&&)’
../../src/demo-1.C:7:7: note:   candidate expects 1 argument, 0 provided

we emit:

In file included from ../x86_64-pc-linux-gnu/libstdc++-v3/include/memory:78,
 from ../../src/demo-1.C:1:
../x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h: In instantiation 
of ‘std::__detail::__unique_ptr_t<_Tp> std::make_unique(_Args&& ...) [with _Tp 
= bar; _Args = {}; __detail::__unique_ptr_t<_Tp> = 
__detail::__unique_ptr_t]’:
../../src/demo-1.C:15:32:   required from here
   15 |   return std::make_unique ();
  |  ~~^~
../x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:1066:30: error: 
no matching function for call to ‘bar::bar()’
 1066 | { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
  |  ^~~
../../src/demo-1.C:10:3: note: candidate: ‘bar::bar(int)’
   10 |   bar (int);
  |   ^~~
../../src/demo-1.C:10:3: note:   candidate expects 1 argument, 0 provided
../../src/demo-1.C:7:7: note: candidate: ‘constexpr bar::bar(const bar&)’
7 | class bar : public foo
  |   ^~~
../../src/demo-1.C:7:7: note:   candidate expects 1 argument, 0 provided
../../src/demo-1.C:7:7: note: candidate: ‘constexpr bar::bar(bar&&)’
../../src/demo-1.C:7:7: note:   candidate expects 1 argument, 0 provided

which shows the code that's leading to the error (the bad call to
std::make_unique).


Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.

OK for trunk?


gcc/cp/ChangeLog:
* error.cc (print_instantiation_partial_context_line): Call
diagnostic_show_locus.

gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/static_assert3.C: Add directives for
additional source printing.
* g++.dg/template/error60.C: New test.

Signed-off-by: David Malcolm 
---
 gcc/cp/error.cc   |  2 +
 .../g++.dg/diagnostic/static_assert3.C|  7 +++-
 gcc/testsuite/g++.dg/template/error60.C   | 37 +++
 3 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/error60.C

diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index ef96e140f24..767478cf5fd 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -3774,6 +3774,8 @@ print_instantiation_partial_context_line 
(diagnostic_context *context,
   ? _("recursively required from here\n")
   : _("required from here\n"));
 }
+  gcc_rich_location rich_loc (loc);
+  diagnostic_show_locus (context, _loc, DK_NOTE);
 }
 
 /* Same as print_instantiation_full_context but less verbose.  */
diff --git a/gcc/testsuite/g++.dg/diagnostic/static_assert3.C 
b/gcc/testsuite/g++.dg/diagnostic/static_assert3.C
index 5d363884508..4ec53f17120 100644
--- a/gcc/testsuite/g++.dg/diagnostic/static_assert3.C
+++ b/gcc/testsuite/g++.dg/diagnostic/static_assert3.C
@@ -5,6 +5,11 @@
 template  struct is_same { static constexpr bool value 
= false; };
 template  struct is_same { static constexpr bool value = 
true; };
 
+/* { dg-begin-multiline-output "" }
+  f(0, 1.3);
+  ~^~~~
+   { dg-end-multiline-output "" } */
+
 template 
 void f(T, U)
 {
@@ -32,5 +37,5 @@ void f(T, U)
 
 void g()
 {
- f(0, 1.3);
+ f(0, 1.3); // { dg-message " required from here" }
 }
diff --git a/gcc/testsuite/g++.dg/template/error60.C 
b/gcc/testsuite/g++.dg/template/error60.C
new file mode 100644
index 000..8c2139b207c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/error60.C
@@ -0,0 +1,37 @@
+// { dg-options "-fdiagnostics-show-caret" }
+
+template 
+struct my_pointer
+{
+  my_pointer (Foo *ptr) // { dg-message " 

Re: [COMMITTED] Return TRUE only when a global value is updated.

2023-10-03 Thread Andrew MacLeod
Give this a try.  I'm testing it here, but x86 doesn't seem to show it 
anyway for some reason :-P

I think I needed to handle pointers specially, since SSA_NAMEs handle 
pointer ranges differently.


Andrew

On 10/3/23 11:47, David Edelsohn wrote:

This patch caused a bootstrap failure on AIX.

during GIMPLE pass: evrp

/nasfarm/edelsohn/src/src/libgcc/libgcc2.c: In function '__gcc_bcmp':

/nasfarm/edelsohn/src/src/libgcc/libgcc2.c:2910:1: internal compiler 
error: in get_irange, at value-range-storage.cc:343


2910 | }

| ^


0x11b7f4b7 irange_storage::get_irange(irange&, tree_node*) const

/nasfarm/edelsohn/src/src/gcc/value-range-storage.cc:343

0x11b7e7af vrange_storage::get_vrange(vrange&, tree_node*) const

/nasfarm/edelsohn/src/src/gcc/value-range-storage.cc:178

0x139f3d77 range_info_get_range(tree_node const*, vrange&)

/nasfarm/edelsohn/src/src/gcc/tree-ssanames.cc:118

0x1134b463 set_range_info(tree_node*, vrange const&)

/nasfarm/edelsohn/src/src/gcc/tree-ssanames.cc:425

0x116a7333 gimple_ranger::register_inferred_ranges(gimple*)

/nasfarm/edelsohn/src/src/gcc/gimple-range.cc:487

0x125cef27 rvrp_folder::fold_stmt(gimple_stmt_iterator*)

/nasfarm/edelsohn/src/src/gcc/tree-vrp.cc:1033

0x123dd063 
substitute_and_fold_dom_walker::before_dom_children(basic_block_def*)


/nasfarm/edelsohn/src/src/gcc/tree-ssa-propagate.cc:876

0x1176cc43 dom_walker::walk(basic_block_def*)

/nasfarm/edelsohn/src/src/gcc/domwalk.cc:311

0x123dd733 
substitute_and_fold_engine::substitute_and_fold(basic_block_def*)


/nasfarm/edelsohn/src/src/gcc/tree-ssa-propagate.cc:999

0x123d0f5f execute_ranger_vrp(function*, bool, bool)

/nasfarm/edelsohn/src/src/gcc/tree-vrp.cc:1062

0x123d14ef execute

/nasfarm/edelsohn/src/src/gcc/tree-vrp.cc:1142
diff --git a/gcc/tree-ssanames.cc b/gcc/tree-ssanames.cc
index 1eae411ac1c..1401f67c781 100644
--- a/gcc/tree-ssanames.cc
+++ b/gcc/tree-ssanames.cc
@@ -420,15 +420,11 @@ set_range_info (tree name, const vrange )
 
   // Pick up the current range, or VARYING if none.
   tree type = TREE_TYPE (name);
-  Value_Range tmp (type);
-  if (range_info_p (name))
-range_info_get_range (name, tmp);
-  else
-tmp.set_varying (type);
-
   if (POINTER_TYPE_P (type))
 {
-  if (r.nonzero_p () && !tmp.nonzero_p ())
+  struct ptr_info_def *pi = get_ptr_info (name);
+  // If R is nonnull and pi is not, set nonnull.
+  if (r.nonzero_p () && (!pi || !pi->pt.null))
 	{
 	  set_ptr_nonnull (name);
 	  return true;
@@ -436,6 +432,11 @@ set_range_info (tree name, const vrange )
   return false;
 }
 
+  Value_Range tmp (type);
+  if (range_info_p (name))
+range_info_get_range (name, tmp);
+  else
+tmp.set_varying (type);
   // If the result doesn't change, or is undefined, return false.
   if (!tmp.intersect (r) || tmp.undefined_p ())
 return false;


Re: [ARC PATCH] Split SImode shifts pre-reload on !TARGET_BARREL_SHIFTER.

2023-10-03 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

It is not necessary to do any mods on your patch. I've just answered
the questions which you asked me. The adds are faster for the ARC CPUs
which are still in production, and I suppose we can leverage the LP
instruction use with DBNZ instructions for implementing loops. I'll
come back to you asap, after I've got the nightly results :)

Thank you,
Claudiu

On Tue, Oct 3, 2023 at 6:34 PM Roger Sayle  wrote:
>
>
> Hi Claudiu,
> Thanks for the answers to my technical questions.
> If you'd prefer to update arc.md's add3 pattern first,
> I'm happy to update/revise my patch based on this
> and your feedback, for example preferring add over
> asl_s (or controlling this choice with -Os).
>
> Thanks again.
> Roger
> --
>
> > -Original Message-
> > From: Claudiu Zissulescu 
> > Sent: 03 October 2023 15:26
> > To: Roger Sayle ; gcc-patches@gcc.gnu.org
> > Subject: RE: [ARC PATCH] Split SImode shifts pre-reload on
> > !TARGET_BARREL_SHIFTER.
> >
> > Hi Roger,
> >
> > It was nice to meet you too.
> >
> > Thank you in looking into the ARC's non-Barrel Shifter configurations.  I
> will dive
> > into your patch asap, but before starting here are a few of my comments:
> >
> > -Original Message-
> > From: Roger Sayle 
> > Sent: Thursday, September 28, 2023 2:27 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Claudiu Zissulescu 
> > Subject: [ARC PATCH] Split SImode shifts pre-reload on
> > !TARGET_BARREL_SHIFTER.
> >
> >
> > Hi Claudiu,
> > It was great meeting up with you and the Synopsys ARC team at the GNU
> tools
> > Cauldron in Cambridge.
> >
> > This patch is the first in a series to improve SImode and DImode shifts
> and rotates
> > in the ARC backend.  This first piece splits SImode shifts, for
> > !TARGET_BARREL_SHIFTER targets, after combine and before reload, in the
> split1
> > pass, as suggested by the FIXME comment above output_shift in arc.cc.  To
> do
> > this I've copied the implementation of the x86_pre_reload_split function
> from
> > i386 backend, and renamed it arc_pre_reload_split.
> >
> > Although the actual implementations of shifts remain the same (as in
> > output_shift), having them as explicit instructions in the RTL stream
> allows better
> > scheduling and use of compact forms when available.  The benefits can be
> seen in
> > two short examples below.
> >
> > For the function:
> > unsigned int foo(unsigned int x, unsigned int y) {
> >   return y << 2;
> > }
> >
> > GCC with -O2 -mcpu=em would previously generate:
> > foo:add r1,r1,r1
> > add r1,r1,r1
> > j_s.d   [blink]
> > mov_s   r0,r1   ;4
> >
> > [CZI] The move shouldn't be generated indeed. The use of ADDs are slightly
> > beneficial for older ARCv1 arches.
> >
> > and with this patch now generates:
> > foo:asl_s r0,r1
> > j_s.d   [blink]
> > asl_s r0,r0
> >
> > [CZI] Nice. This new sequence is as fast as we can get for our ARCv2 cpus.
> >
> > Notice the original (from shift_si3's output_shift) requires the shift
> sequence to be
> > monolithic with the same destination register as the source (requiring an
> extra
> > mov_s).  The new version can eliminate this move, and schedule the second
> asl in
> > the branch delay slot of the return.
> >
> > For the function:
> > int x,y,z;
> >
> > void bar()
> > {
> >   x <<= 3;
> >   y <<= 3;
> >   z <<= 3;
> > }
> >
> > GCC -O2 -mcpu=em currently generates:
> > bar:push_s  r13
> > ld.as   r12,[gp,@x@sda] ;23
> > ld.as   r3,[gp,@y@sda]  ;23
> > mov r2,0
> > add3 r12,r2,r12
> > mov r2,0
> > add3 r3,r2,r3
> > ld.as   r2,[gp,@z@sda]  ;23
> > st.as   r12,[gp,@x@sda] ;26
> > mov r13,0
> > add3 r2,r13,r2
> > st.as   r3,[gp,@y@sda]  ;26
> > st.as   r2,[gp,@z@sda]  ;26
> > j_s.d   [blink]
> > pop_s   r13
> >
> > where each shift by 3, uses ARC's add3 instruction, which is similar to
> x86's lea
> > implementing x = (y<<3) + z, but requires the value zero to be placed in a
> > temporary register "z".  Splitting this before reload allows these pseudos
> to be
> > shared/reused.  With this patch, we get
> >
> > bar:ld.as   r2,[gp,@x@sda]  ;23
> > mov_s   r3,0;3
> > add3r2,r3,r2
> > ld.as   r3,[gp,@y@sda]  ;23
> > st.as   r2,[gp,@x@sda]  ;26
> > ld.as   r2,[gp,@z@sda]  ;23
> > mov_s   r12,0   ;3
> > add3r3,r12,r3
> > add3r2,r12,r2
> > st.as   r3,[gp,@y@sda]  ;26
> > st.as   r2,[gp,@z@sda]  ;26
> > j_s [blink]
> >
> > [CZI] Looks great, but it also shows that I've forgot to add to ADD3
> instruction the
> > Ra,LIMM,RC variant, which will lead to have instead of
> > mov_s   r3,0;3
> > add3r2,r3,r2
> > Only this add3,0,r2, Indeed it is longer instruction but faster.
> >
> > Unfortunately, register allocation means that we only share two of the
> three
> > "mov_s z,0", but this is sufficient to 

Re: RFC: attributes documentation

2023-10-03 Thread Joseph Myers
On Tue, 3 Oct 2023, Sandra Loosemore wrote:

> Is __attribute__ also considered more powerful than the standard [[]] syntax,
> enough to recommend it over writing standard-conforming code?

Anything that can be expressed with __attribute__ should also be 
expressible with [[]], so use of [[]] is probably a good idea for users 
not concerned with portability to older GCC versions.
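
As a small illustration of the two spellings, here is a sketch assuming a GCC-compatible compiler and C++11 or later.  The function names are made up for the example, and noinline is just one attribute that can be written both ways.

```cpp
#include <cassert>

// GNU-specific spelling.
__attribute__ ((noinline)) int
square_gnu (int x)
{
  return x * x;
}

// Standard attribute syntax, using the gnu:: namespace prefix for the
// same vendor attribute.
[[gnu::noinline]] int
square_std (int x)
{
  return x * x;
}
```

Both functions behave identically; only the syntax used to attach the attribute differs.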

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [COMMITTED] Return TRUE only when a global value is updated.

2023-10-03 Thread Andrew MacLeod

huh.  thanks,  I'll have a look.


Andrew

On 10/3/23 11:47, David Edelsohn wrote:

This patch caused a bootstrap failure on AIX.

during GIMPLE pass: evrp

/nasfarm/edelsohn/src/src/libgcc/libgcc2.c: In function '__gcc_bcmp':

/nasfarm/edelsohn/src/src/libgcc/libgcc2.c:2910:1: internal compiler 
error: in get_irange, at value-range-storage.cc:343


2910 | }

| ^


0x11b7f4b7 irange_storage::get_irange(irange&, tree_node*) const

/nasfarm/edelsohn/src/src/gcc/value-range-storage.cc:343

0x11b7e7af vrange_storage::get_vrange(vrange&, tree_node*) const

/nasfarm/edelsohn/src/src/gcc/value-range-storage.cc:178

0x139f3d77 range_info_get_range(tree_node const*, vrange&)

/nasfarm/edelsohn/src/src/gcc/tree-ssanames.cc:118

0x1134b463 set_range_info(tree_node*, vrange const&)

/nasfarm/edelsohn/src/src/gcc/tree-ssanames.cc:425

0x116a7333 gimple_ranger::register_inferred_ranges(gimple*)

/nasfarm/edelsohn/src/src/gcc/gimple-range.cc:487

0x125cef27 rvrp_folder::fold_stmt(gimple_stmt_iterator*)

/nasfarm/edelsohn/src/src/gcc/tree-vrp.cc:1033

0x123dd063 
substitute_and_fold_dom_walker::before_dom_children(basic_block_def*)


/nasfarm/edelsohn/src/src/gcc/tree-ssa-propagate.cc:876

0x1176cc43 dom_walker::walk(basic_block_def*)

/nasfarm/edelsohn/src/src/gcc/domwalk.cc:311

0x123dd733 
substitute_and_fold_engine::substitute_and_fold(basic_block_def*)


/nasfarm/edelsohn/src/src/gcc/tree-ssa-propagate.cc:999

0x123d0f5f execute_ranger_vrp(function*, bool, bool)

/nasfarm/edelsohn/src/src/gcc/tree-vrp.cc:1062

0x123d14ef execute

/nasfarm/edelsohn/src/src/gcc/tree-vrp.cc:1142





PING: PR rtl-optimization/110701

2023-10-03 Thread Roger Sayle
 

There are a small handful of middle-end maintainers/reviewers that

understand and appreciate the difference between the RTL statements:

(set (subreg:HI (reg:SI x)) (reg:HI y))

and

(set (strict_lowpart:HI (reg:SI x)) (reg:HI y))

 

If one (or more) of them could please take a look at

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625532.html

I'd very much appreciate it (one less wrong-code regression).
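
For readers less familiar with the distinction between the two statements above, here is a rough model in plain C++ rather than RTL.  It is a sketch assuming a 32-bit SImode register and a 16-bit HImode low part; the toy_ functions are hypothetical, and the second form is written strict_low_part in the RTL documentation.

```cpp
#include <cassert>
#include <cstdint>

// Model of a store through strict_low_part: the low 16 bits of x are
// replaced by y and the upper 16 bits of x are guaranteed to be preserved.
static std::uint32_t
toy_set_strict_low_part (std::uint32_t x, std::uint16_t y)
{
  return (x & 0xFFFF0000u) | y;
}

// Model of a store through a plain lowpart subreg: only the low 16 bits
// are specified afterwards.  The upper bits are not guaranteed to survive,
// which is modelled here by caller-supplied garbage.
static std::uint32_t
toy_set_subreg (std::uint32_t garbage, std::uint16_t y)
{
  return (garbage & 0xFFFF0000u) | y;
}
```

Code that relies on the upper bits after the plain subreg store is only correct by accident, which is how confusing the two forms can lead to wrong code.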

 

Many thanks in advance,

Roger

--

 



Re: [COMMITTED] Return TRUE only when a global value is updated.

2023-10-03 Thread David Edelsohn
This patch caused a bootstrap failure on AIX.

during GIMPLE pass: evrp

/nasfarm/edelsohn/src/src/libgcc/libgcc2.c: In function '__gcc_bcmp':

/nasfarm/edelsohn/src/src/libgcc/libgcc2.c:2910:1: internal compiler error:
in get_irange, at value-range-storage.cc:343

 2910 | }

  | ^


0x11b7f4b7 irange_storage::get_irange(irange&, tree_node*) const

/nasfarm/edelsohn/src/src/gcc/value-range-storage.cc:343

0x11b7e7af vrange_storage::get_vrange(vrange&, tree_node*) const

/nasfarm/edelsohn/src/src/gcc/value-range-storage.cc:178

0x139f3d77 range_info_get_range(tree_node const*, vrange&)

/nasfarm/edelsohn/src/src/gcc/tree-ssanames.cc:118

0x1134b463 set_range_info(tree_node*, vrange const&)

/nasfarm/edelsohn/src/src/gcc/tree-ssanames.cc:425

0x116a7333 gimple_ranger::register_inferred_ranges(gimple*)

/nasfarm/edelsohn/src/src/gcc/gimple-range.cc:487

0x125cef27 rvrp_folder::fold_stmt(gimple_stmt_iterator*)

/nasfarm/edelsohn/src/src/gcc/tree-vrp.cc:1033

0x123dd063
substitute_and_fold_dom_walker::before_dom_children(basic_block_def*)

/nasfarm/edelsohn/src/src/gcc/tree-ssa-propagate.cc:876

0x1176cc43 dom_walker::walk(basic_block_def*)

/nasfarm/edelsohn/src/src/gcc/domwalk.cc:311

0x123dd733 substitute_and_fold_engine::substitute_and_fold(basic_block_def*)

/nasfarm/edelsohn/src/src/gcc/tree-ssa-propagate.cc:999

0x123d0f5f execute_ranger_vrp(function*, bool, bool)

/nasfarm/edelsohn/src/src/gcc/tree-vrp.cc:1062

0x123d14ef execute

/nasfarm/edelsohn/src/src/gcc/tree-vrp.cc:1142


RE: [ARC PATCH] Split SImode shifts pre-reload on !TARGET_BARREL_SHIFTER.

2023-10-03 Thread Roger Sayle


Hi Claudiu,
Thanks for the answers to my technical questions.
If you'd prefer to update arc.md's add3 pattern first,
I'm happy to update/revise my patch based on this
and your feedback, for example preferring add over
asl_s (or controlling this choice with -Os).

Thanks again.
Roger
--

> -Original Message-
> From: Claudiu Zissulescu 
> Sent: 03 October 2023 15:26
> To: Roger Sayle ; gcc-patches@gcc.gnu.org
> Subject: RE: [ARC PATCH] Split SImode shifts pre-reload on
> !TARGET_BARREL_SHIFTER.
> 
> Hi Roger,
> 
> It was nice to meet you too.
> 
> Thank you in looking into the ARC's non-Barrel Shifter configurations.  I
will dive
> into your patch asap, but before starting here are a few of my comments:
> 
> -Original Message-
> From: Roger Sayle 
> Sent: Thursday, September 28, 2023 2:27 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Claudiu Zissulescu 
> Subject: [ARC PATCH] Split SImode shifts pre-reload on
> !TARGET_BARREL_SHIFTER.
> 
> 
> Hi Claudiu,
> It was great meeting up with you and the Synopsys ARC team at the GNU
tools
> Cauldron in Cambridge.
> 
> This patch is the first in a series to improve SImode and DImode shifts
and rotates
> in the ARC backend.  This first piece splits SImode shifts, for
> !TARGET_BARREL_SHIFTER targets, after combine and before reload, in the
split1
> pass, as suggested by the FIXME comment above output_shift in arc.cc.  To
do
> this I've copied the implementation of the x86_pre_reload_split function
from
> i386 backend, and renamed it arc_pre_reload_split.
> 
> Although the actual implementations of shifts remain the same (as in
> output_shift), having them as explicit instructions in the RTL stream
allows better
> scheduling and use of compact forms when available.  The benefits can be
seen in
> two short examples below.
> 
> For the function:
> unsigned int foo(unsigned int x, unsigned int y) {
>   return y << 2;
> }
> 
> GCC with -O2 -mcpu=em would previously generate:
> foo:add r1,r1,r1
> add r1,r1,r1
> j_s.d   [blink]
> mov_s   r0,r1   ;4
> 
> [CZI] The move shouldn't be generated indeed. The use of ADDs are slightly
> beneficial for older ARCv1 arches.
> 
> and with this patch now generates:
> foo:asl_s r0,r1
> j_s.d   [blink]
> asl_s r0,r0
> 
> [CZI] Nice. This new sequence is as fast as we can get for our ARCv2 cpus.
> 
> Notice the original (from shift_si3's output_shift) requires the shift
sequence to be
> monolithic with the same destination register as the source (requiring an
extra
> mov_s).  The new version can eliminate this move, and schedule the second
asl in
> the branch delay slot of the return.
> 
> For the function:
> int x,y,z;
> 
> void bar()
> {
>   x <<= 3;
>   y <<= 3;
>   z <<= 3;
> }
> 
> GCC -O2 -mcpu=em currently generates:
> bar:push_s  r13
> ld.as   r12,[gp,@x@sda] ;23
> ld.as   r3,[gp,@y@sda]  ;23
> mov r2,0
> add3 r12,r2,r12
> mov r2,0
> add3 r3,r2,r3
> ld.as   r2,[gp,@z@sda]  ;23
> st.as   r12,[gp,@x@sda] ;26
> mov r13,0
> add3 r2,r13,r2
> st.as   r3,[gp,@y@sda]  ;26
> st.as   r2,[gp,@z@sda]  ;26
> j_s.d   [blink]
> pop_s   r13
> 
> where each shift by 3, uses ARC's add3 instruction, which is similar to
x86's lea
> implementing x = (y<<3) + z, but requires the value zero to be placed in a
> temporary register "z".  Splitting this before reload allows these pseudos
to be
> shared/reused.  With this patch, we get
> 
> bar:ld.as   r2,[gp,@x@sda]  ;23
> mov_s   r3,0;3
> add3r2,r3,r2
> ld.as   r3,[gp,@y@sda]  ;23
> st.as   r2,[gp,@x@sda]  ;26
> ld.as   r2,[gp,@z@sda]  ;23
> mov_s   r12,0   ;3
> add3r3,r12,r3
> add3r2,r12,r2
> st.as   r3,[gp,@y@sda]  ;26
> st.as   r2,[gp,@z@sda]  ;26
> j_s [blink]
> 
> [CZI] Looks great, but it also shows that I've forgot to add to ADD3
instruction the
> Ra,LIMM,RC variant, which will lead to have instead of
> mov_s   r3,0;3
> add3r2,r3,r2
> Only this add3,0,r2, Indeed it is longer instruction but faster.
> 
> Unfortunately, register allocation means that we only share two of the
three
> "mov_s z,0", but this is sufficient to reduce register pressure enough to
avoid
> spilling r13 in the prologue/epilogue.
> 
> This patch also contains a (latent?) bug fix.  The implementation of the
default
> insn "length" attribute, assumes instructions of type "shift" have two
input
> operands and accesses operands[2], hence specializations of shifts that
don't
> have a operands[2], need to be categorized as type "unary" (which results
in the
> correct length).
> 
> [CZI] The ARC types need an upgrade too.
> 
> This patch has been tested on a cross-compiler to arc-elf (hosted on
x86_64-pc-
> linux-gnu), but because I've an incomplete tool chain many of the
regression test
> fail, but there are no new failures with 

[PATCH 3/6] aarch64: Implement system register validation tools

2023-10-03 Thread Victor Do Nascimento
Given the implementation of a mechanism for encoding system registers
into GCC, this patch provides the mechanism for validating their use by
the compiler.  In particular, this involves:

  1. Ensuring a supplied string corresponds to a known system
 register name.  System registers can be accessed either via their
 name (e.g. `SPSR_EL1') or their encoding (e.g. `S3_0_C4_C0_0').
 Register names are validated via `match_reg', a binary search of
 the `sysreg_names' structure populated from the
 `aarch64-sys-regs.def' file.
 The encoding naming convention is validated via a parser
 implemented in this patch - `is_implem_def_reg'.
  2. Once a given register name is deemed to be valid, it is checked
 against two further criteria:
   a. Is the referenced register implemented in the target
  architecture?  This is achieved by comparing the ARCH field
  in the relevant SYSREG entry from `aarch64-sys-regs.def'
  against `aarch64_feature_flags' flags set at compile-time.
   b. Is the register being used correctly?  Check the requested
  operation against the FLAGS specified in SYSREG.
  This prevents operations like writing to a read-only system
  register.
   NOTE: For registers specified via their encoding
   (e.g. `S3_0_C4_C0_0'), once the encoding value is deemed valid
   (as per step 1) no further checks such as read/write support or
   architectural feature requirements are done and this second step
   is skipped, as is done in gas.
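
The two validation steps described above can be sketched in plain C.  This is
only an illustration under stated assumptions, not the code from the patch:
the `demo_' names, the three-entry table and the bit values are hypothetical,
and the feature check assumes that a register is available when any one of
its listed features is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values; the patch series defines the real ones.  */
#define DEMO_F_REG_READ  (1u << 0)    /* read-only register */
#define DEMO_F_REG_WRITE (1u << 1)    /* write-only register */
#define DEMO_FEAT_V8_1A  (1ull << 0)  /* stand-in for an ISA feature flag */

struct demo_sysreg
{
  const char *name;
  unsigned flags;
  unsigned long long arch;
};

/* Must stay sorted in strcmp order for the binary search to work.  */
static const struct demo_sysreg demo_regs[] =
{
  { "actlr_el1",  0,               0 },
  { "afsr0_el12", 0,               DEMO_FEAT_V8_1A },
  { "aidr_el1",   DEMO_F_REG_READ, 0 },
};

/* Step 1: return the index of NAME in demo_regs, or -1 if unknown.  */
int
demo_find_sysreg (const char *name)
{
  int lo = 0;
  int hi = (int) (sizeof demo_regs / sizeof demo_regs[0]) - 1;

  assert (name != NULL);
  while (lo <= hi)
    {
      int mid = lo + (hi - lo) / 2;
      int cmp = strcmp (name, demo_regs[mid].name);
      if (cmp == 0)
        return mid;
      if (cmp > 0)
        lo = mid + 1;
      else
        hi = mid - 1;
    }
  return -1;
}

/* Step 2: check feature availability (2a) and access direction (2b).
   ENABLED is the set of feature bits switched on for the target.  */
bool
demo_sysreg_usable_p (const char *name, bool write_p,
                      unsigned long long enabled)
{
  int idx = demo_find_sysreg (name);
  if (idx < 0)
    return false;

  const struct demo_sysreg *reg = &demo_regs[idx];
  /* 2a: a register with feature requirements needs one of them enabled.  */
  if (reg->arch != 0 && (reg->arch & enabled) == 0)
    return false;
  /* 2b: reject writes to read-only and reads of write-only registers.  */
  if (write_p && (reg->flags & DEMO_F_REG_READ))
    return false;
  if (!write_p && (reg->flags & DEMO_F_REG_WRITE))
    return false;
  return true;
}
```

With this table, a read of "aidr_el1" is accepted while a write to it is
rejected, and "afsr0_el12" is only accepted once DEMO_FEAT_V8_1A is part of
the enabled set.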

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-protos.h (aarch64_valid_sysreg_name_p): 
New.
(aarch64_retrieve_sysreg): Likewise.
* gcc/config/aarch64/aarch64.cc (match_reg): Likewise.
(is_implem_def_reg): Likewise.
(aarch64_valid_sysreg_name_p): Likewise.
(aarch64_retrieve_sysreg): Likewise.
(aarch64_sysreg_valid_for_rw_p): Likewise.
* gcc/config/aarch64/predicates.md (aarch64_sysreg_string): New.
---
 gcc/config/aarch64/aarch64-protos.h |   2 +
 gcc/config/aarch64/aarch64.cc   | 121 
 gcc/config/aarch64/predicates.md|   4 +
 3 files changed, 127 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 60a55f4bc19..a134e2fcf8e 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -830,6 +830,8 @@ bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
 bool aarch64_sve_ptrue_svpattern_p (rtx, struct simd_immediate_info *);
 bool aarch64_simd_valid_immediate (rtx, struct simd_immediate_info *,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
+bool aarch64_valid_sysreg_name_p (const char *);
+const char *aarch64_retrieve_sysreg (char *, bool);
 rtx aarch64_check_zero_based_sve_index_immediate (rtx);
 bool aarch64_sve_index_immediate_p (rtx);
 bool aarch64_sve_arith_immediate_p (machine_mode, rtx, bool);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 030b39ded1a..dd5ac1cbc8d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -28070,6 +28070,127 @@ aarch64_pars_overlap_p (rtx par1, rtx par2)
   return false;
 }
 
+/* Binary search of a user-supplied system register name against
+   a database of known register names.  Upon match the index of
+   hit in database is returned, else return -1.  */
+int
+match_reg (const char *ref, const char *database[], int db_len)
+{
+  /* Check for named system registers.  */
+  int imin = 0, imax = db_len - 1, mid, cmp_res;
+  while (imin <= imax)
+{
+  mid = (imin + imax) / 2;
+
+  cmp_res = strcmp (ref, database[mid]);
+  if (cmp_res == 0)
+   return mid;
+  else if (cmp_res > 0)
+   imin = mid+1;
+  else
+   imax = mid-1;
+}
+  return -1;
+}
+
+/* Parse an implementation-defined system register name of
+   the form S[0-3]_[0-7]_C[0-15]_C[0-15]_[1-7].
+   Return true if name matched against above pattern, false
+   otherwise.  */
+bool
+is_implem_def_reg (const char *regname)
+{
+/* Check for implementation-defined system registers.  */
+  int name_len = strlen (regname);
+  if (name_len < 12 || name_len > 14)
+return false;
+
+  int pos = 0, i = 0, j = 0;
+  char n[3] = {0}, m[3] = {0};
+  if (regname[pos] != 's' && regname[pos] != 'S')
+return false;
+  pos++;
+  if (regname[pos] < '0' || regname[pos] > '3')
+return false;
+  pos++;
+  if (regname[pos++] != '_')
+return false;
+  if (regname[pos] < '0' || regname[pos] > '7')
+return false;
+  pos++;
+  if (regname[pos++] != '_')
+return false;
+  if (regname[pos] != 'c' && regname[pos] != 'C')
+return false;
+  pos++;
+  while (regname[pos] != '_')
+{
+  if (i > 2)
+   return false;
+  if (!ISDIGIT (regname[pos]))
+   return false;
+  n[i++] = regname[pos++];
+}
+  if (atoi (n) > 15)
+return false;
+  

[PATCH 1/6] aarch64: Sync system register information with Binutils

2023-10-03 Thread Victor Do Nascimento
This patch adds the `aarch64-sys-regs.def' file to GCC, teaching
the compiler about system registers known to the assembler and how
these can be used.

The macros used to hold system register information reflect those in
use by binutils, a design choice made to facilitate the sharing of data
between different parts of the toolchain.

By aligning the representation of data common to different parts of
the toolchain we can greatly reduce the duplication of work,
facilitating the maintenance of the aarch64 back-end across different
parts of the toolchain; any `SYSREG (...)' that is added in one
project can just as easily be added to its counterpart.

GCC does not implement the full range of ISA flags present in
Binutils.  Where this is the case, aliases must be added to aarch64.h
with the unknown architectural extension being mapped to its
associated base architecture, such that any flag present in Binutils
and used in system register definitions is understood in GCC.  Again,
this is done such that flags can be used interchangeably between
projects making use of the aarch64-sys-regs.def file.  This is done
in the next patch in the series.

`.arch' directives missing from the emitted assembly files as a
consequence of this aliasing are accounted for by the compiler using
the S encoding of system registers when issuing mrs/msr instructions.
This design choice ensures the assembler will accept anything that was
deemed acceptable by the compiler.

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-sys-regs.def: New.
---
 gcc/config/aarch64/aarch64-sys-regs.def | 1059 +++
 1 file changed, 1059 insertions(+)
 create mode 100644 gcc/config/aarch64/aarch64-sys-regs.def

diff --git a/gcc/config/aarch64/aarch64-sys-regs.def 
b/gcc/config/aarch64/aarch64-sys-regs.def
new file mode 100644
index 000..d77fee1d5e3
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-sys-regs.def
@@ -0,0 +1,1059 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Arm Ltd
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+/* Array of system registers and their associated arch features.
+
+   Before using #include to read this file, define a macro:
+
+ SYSREG (name, encoding, flags, features)
+
+  The NAME is the system register name, as recognized by the
+  assembler.  ENCODING provides the necessary information for the binary
+  encoding of the system register.  The FLAGS field is a bitmask of
+  relevant behavior information pertaining to the particular register.
+  For example: is it read/write-only? does it alias another register?
+  The FEATURES field maps onto ISA flags and specifies the architectural
+  feature requirements of the system register.  */
+
+  SYSREG ("accdata_el1",   CPENC (3,0,13,0,5), 0,  
AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el1", CPENC (3,0,1,0,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el2", CPENC (3,4,1,0,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el3", CPENC (3,6,1,0,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el1", CPENC (3,0,5,1,0),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el12",CPENC (3,5,5,1,0),  F_ARCHEXT,  
AARCH64_FEATURE (V8_1A))
+  SYSREG ("afsr0_el2", CPENC (3,4,5,1,0),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el3", CPENC (3,6,5,1,0),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el1", CPENC (3,0,5,1,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el12",CPENC (3,5,5,1,1),  F_ARCHEXT,  
AARCH64_FEATURE (V8_1A))
+  SYSREG ("afsr1_el2", CPENC (3,4,5,1,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el3", CPENC (3,6,5,1,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("aidr_el1",  CPENC (3,1,0,0,7),  F_REG_READ, 
AARCH64_NO_FEATURES)
+  SYSREG ("allint",CPENC (3,0,4,3,0),  F_ARCHEXT,  
AARCH64_FEATURE (V8_8A))
+  SYSREG ("amair_el1", CPENC (3,0,10,3,0), 0,  
AARCH64_NO_FEATURES)
+  SYSREG ("amair_el12",CPENC (3,5,10,3,0), F_ARCHEXT,

Re: RFC: attributes documentation

2023-10-03 Thread Sandra Loosemore

On 10/3/23 08:19, Joseph Myers wrote:
> On Mon, 2 Oct 2023, Sandra Loosemore wrote:
>
>> Going beyond that, though, I think we should also document that the standard
>> syntax is now the preferred way to do it, and change the examples (except for
>> the parts documenting the old syntax) to use the new standard syntax.  It's
>> been accepted by the default -std= setting for both C and C++ since GCC 10,
>> and my understanding is that C2x will be official by the time GCC 14 is
>> released (so supporting the new syntax won't be just another GNU extension any
>> more). Does this sound OK to everybody?
>
> If you're documenting attributes in the [[]] form, you need to be a lot
> more careful to distinguish between an attribute on a declaration and one
> on the type of that declaration, for example, because those need to go in
> different places in the standard syntax (the laxity applied with
> __attribute__ that tries to guess what was meant and e.g. move an
> attribute from a declaration to its type doesn't apply with the standard
> syntax, which has precise rules in the standard for what entity an
> attribute in a given syntactic position appertains to).  In some cases
> this means that the [[]] attribute needs to go in a different position
> from __attribute__ in examples [snip]


Right, I was aware I couldn't just do a simple text substitution here without 
at least hand-checking that the examples still have the intended effect!  I'm 
estimating this as a medium-sized documentation project of the kind I typically 
do around the end of the year when we are in stage 3, although since life is 
uncertain I'd like to get in the more minimal fix (to at least mention the 
C/C++ standard attribute syntax) sooner.  My question was more about whether 
there was objection in principle to doing the rewrite.  I recall that for a 
very long time we were telling people to use __attribute__ instead of pragmas, 
and basically telling people to avoid #pragma entirely, for example.  Is 
__attribute__ also considered more powerful than the standard [[]] syntax, 
enough to recommend it over writing standard-conforming code?


-Sandra


[PATCH 5/6] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-03 Thread Victor Do Nascimento
Implement the aarch64 intrinsics for reading and writing system
registers with the following signatures:

uint32_t __arm_rsr(const char *special_register);
uint64_t __arm_rsr64(const char *special_register);
void* __arm_rsrp(const char *special_register);
float __arm_rsrf(const char *special_register);
double __arm_rsrf64(const char *special_register);
void __arm_wsr(const char *special_register, uint32_t value);
void __arm_wsr64(const char *special_register, uint64_t value);
void __arm_wsrp(const char *special_register, const void *value);
void __arm_wsrf(const char *special_register, float value);
void __arm_wsrf64(const char *special_register, double value);

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
Add enums for new builtins.
(aarch64_init_rwsr_builtins): New.
(aarch64_general_init_builtins): Call aarch64_init_rwsr_builtins.
(aarch64_expand_rwsr_builtin):  New.
(aarch64_general_expand_builtin): Call aarch64_expand_rwsr_builtin.
* gcc/config/aarch64/aarch64.md (read_sysregdi): New insn_and_split.
(write_sysregdi): Likewise.
* gcc/config/aarch64/arm_acle.h (__arm_rsr): New.
(__arm_rsrp): Likewise.
(__arm_rsr64): Likewise.
(__arm_rsrf): Likewise.
(__arm_rsrf64): Likewise.
(__arm_wsr): Likewise.
(__arm_wsrp): Likewise.
(__arm_wsr64): Likewise.
(__arm_wsrf): Likewise.
(__arm_wsrf64): Likewise.

gcc/testsuite/ChangeLog:

* gcc/testsuite/gcc.target/aarch64/acle/rwsr.c: New.
* gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c: Likewise.
---
 gcc/config/aarch64/aarch64-builtins.cc| 200 ++
 gcc/config/aarch64/aarch64.md |  17 ++
 gcc/config/aarch64/arm_acle.h |  30 +++
 .../gcc.target/aarch64/acle/rwsr-1.c  |  20 ++
 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  | 144 +
 5 files changed, 411 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 04f59fd9a54..d8bb2a989a5 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -808,6 +808,17 @@ enum aarch64_builtins
   AARCH64_RBIT,
   AARCH64_RBITL,
   AARCH64_RBITLL,
+  /* System register builtins.  */
+  AARCH64_RSR,
+  AARCH64_RSRP,
+  AARCH64_RSR64,
+  AARCH64_RSRF,
+  AARCH64_RSRF64,
+  AARCH64_WSR,
+  AARCH64_WSRP,
+  AARCH64_WSR64,
+  AARCH64_WSRF,
+  AARCH64_WSRF64,
   AARCH64_BUILTIN_MAX
 };
 
@@ -1798,6 +1809,65 @@ aarch64_init_rng_builtins (void)
   AARCH64_BUILTIN_RNG_RNDRRS);
 }
 
+/* Add builtins for reading and writing system registers.  */
+static void
+aarch64_init_rwsr_builtins (void)
+{
+  tree fntype = NULL;
+  tree const_char_ptr_type
+= build_pointer_type (build_type_variant (char_type_node, true, false));
+
+#define AARCH64_INIT_RWSR_BUILTINS_DECL(F, N, T) \
+  aarch64_builtin_decls[AARCH64_##F] \
+= aarch64_general_add_builtin ("__builtin_aarch64_"#N, T, AARCH64_##F);
+
+  fntype
+= build_function_type_list (uint32_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR, rsr, fntype);
+
+  fntype
+= build_function_type_list (ptr_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRP, rsrp, fntype);
+
+  fntype
+= build_function_type_list (uint64_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR64, rsr64, fntype);
+
+  fntype
+= build_function_type_list (float_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF, rsrf, fntype);
+
+  fntype
+= build_function_type_list (double_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF64, rsrf64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint32_type_node, NULL);
+
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR, wsr, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   const_ptr_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSRP, wsrp, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint64_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR64, wsr64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   float_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSRF, wsrf, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   double_type_node, NULL);
+  

[PATCH 2/6] aarch64: Add support for aarch64-sys-regs.def

2023-10-03 Thread Victor Do Nascimento
This patch defines the structure of a new .def file used for
representing the aarch64 system registers, what information it should
hold and the basic framework in GCC to process this file.

Entries in the aarch64-sys-regs.def file should be as follows:

  SYSREG (NAME, CPENC (sn,op1,cn,cm,op2), FLAG1 | ... | FLAGn, ARCH)

Where the arguments to SYSREG correspond to:
  - NAME:  The system register name, as used in the assembly language.
  - CPENC: The system register encoding, mapping to:

       s<sn>_<op1>_c<cn>_c<cm>_<op2>

  - FLAG: The entries in the FLAGS field are bitwise-OR'd together to
  encode extra information required to ensure proper use of
  the system register.  For example, a read-only system
  register will have the flag F_REG_READ, while write-only
  registers will be labeled F_REG_WRITE.  Such flags are
  tested against at compile-time.
  - ARCH: The architectural features the system register is associated
  with.  This is encoded via one of three possible macros:
  1. When a system register is universally implemented, we say
  it has no feature requirements, so we tag it with the
  AARCH64_NO_FEATURES macro.
  2. When a register is only implemented for a single
  architectural extension EXT, the AARCH64_FEATURE (EXT) macro is
  used.
  3. When a given system register is made available by any of N
  possible architectural extensions, the AARCH64_FEATURES(N, ...)
  macro is used to combine them accordingly.

In order to enable proper interpretation of the SYSREG entries by the
compiler, flags defining system register behavior such as `F_REG_READ'
and `F_REG_WRITE' are also defined here, so they can later be used for
the validation of system register properties.

Finally, any architectural feature flags from Binutils missing from GCC
have appropriate aliases defined here so as to ensure
cross-compatibility of SYSREG entries across the toolchain.
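
As a rough illustration of how one list of SYSREG entries can be expanded
into several parallel tables, the sketch below redefines SYSREG before each
expansion.  It is not the patch itself: DEMO_SYSREG_LIST is a hypothetical
stand-in for `#include'-ing the .def file, the `demo_' names are invented,
and the flag bit values are assumptions.  The CPENC definition follows the
stringifying form used for `sysreg_names_generic' in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flag names follow the patch; the bit values here are assumptions.  */
#define F_REG_READ (1u << 0)
#define F_ARCHEXT  (1u << 1)

/* Hypothetical stand-in for the contents of the .def file.  The last
   argument is a placeholder for the ARCH field and is unused here.  */
#define DEMO_SYSREG_LIST \
  SYSREG ("actlr_el1",  CPENC (3,0,1,0,1), 0,          0) \
  SYSREG ("afsr0_el12", CPENC (3,5,5,1,0), F_ARCHEXT,  1) \
  SYSREG ("aidr_el1",   CPENC (3,1,0,0,7), F_REG_READ, 0)

/* Stringify the five encoding fields into the generic register name.  */
#define CPENC(SN, OP1, CN, CM, OP2) \
  "s" #SN "_" #OP1 "_c" #CN "_c" #CM "_" #OP2

/* Table 1: architectural names.  */
const char *const demo_names[] =
{
#define SYSREG(NAME, ENC, FLAGS, ARCH) NAME,
  DEMO_SYSREG_LIST
#undef SYSREG
};

/* Table 2: generic names built from the encoding.  */
const char *const demo_encodings[] =
{
#define SYSREG(NAME, ENC, FLAGS, ARCH) ENC,
  DEMO_SYSREG_LIST
#undef SYSREG
};

/* Table 3: property bits.  */
const unsigned demo_flags[] =
{
#define SYSREG(NAME, ENC, FLAGS, ARCH) FLAGS,
  DEMO_SYSREG_LIST
#undef SYSREG
};

/* All tables come from the same list, so they must have equal length.  */
static_assert (sizeof demo_names / sizeof demo_names[0]
               == sizeof demo_flags / sizeof demo_flags[0],
               "tables must stay parallel");

/* Map an architectural name to its generic encoding string, or NULL.
   A linear scan keeps the sketch short.  */
const char *
demo_encoding_for (const char *name)
{
  size_t n = sizeof demo_names / sizeof demo_names[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (name, demo_names[i]) == 0)
      return demo_encodings[i];
  return NULL;
}
```

Because every table is generated from the same list, index i refers to the
same register in each of them, which is what lets a name lookup yield the
matching flags or encoding.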

gcc/ChangeLog:

* gcc/config/aarch64/aarch64.cc (sysreg_names): New.
(sysreg_names_generic): Likewise.
(sysreg_reqs): Likewise.
(sysreg_properties): Likewise.
(nsysreg): Likewise.
* gcc/config/aarch64/aarch64.h (AARCH64_ISA_V8A): Add missing
ISA flag.
(AARCH64_ISA_V8_1A): Likewise.
(AARCH64_ISA_V8_7A): Likewise.
(AARCH64_ISA_V8_8A): Likewise.
(AARCH64_NO_FEATURES): Likewise.
(AARCH64_FL_RAS): New ISA flag alias.
(AARCH64_FL_LOR): Likewise.
(AARCH64_FL_PAN): Likewise.
(AARCH64_FL_AMU): Likewise.
(AARCH64_FL_SCXTNUM): Likewise.
(AARCH64_FL_ID_PFR2): Likewise.
(F_DEPRECATED): New.
(F_REG_READ): Likewise.
(F_REG_WRITE): Likewise.
(F_ARCHEXT): Likewise.
(F_REG_ALIAS): Likewise.
---
 gcc/config/aarch64/aarch64.cc | 55 +++
 gcc/config/aarch64/aarch64.h  | 36 +++
 2 files changed, 91 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9fbfc548a89..030b39ded1a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -89,6 +89,8 @@
 /* This file should be included last.  */
 #include "target-def.h"
 
+#include "aarch64.h"
+
 /* Defined for convenience.  */
 #define POINTER_BYTES (POINTER_SIZE / BITS_PER_UNIT)
 
@@ -2807,6 +2809,59 @@ static const struct processor all_cores[] =
   {NULL, aarch64_none, aarch64_none, aarch64_no_arch, 0, NULL}
 };
 
+/* Database of system register names.  */
+const char *sysreg_names[] =
+{
+#define SYSREG(NAME, ENC, FLAGS, ARCH) NAME,
+#include "aarch64-sys-regs.def"
+#undef SYSREG
+};
+
+const char *sysreg_names_generic[] =
+{
+#define CPENC(SN, OP1, CN, CM, OP2) "s"#SN"_"#OP1"_c"#CN"_c"#CM"_"#OP2
+#define SYSREG(NAME, ENC, FLAGS, ARCH) ENC,
+#include "aarch64-sys-regs.def"
+#undef SYSREG
+};
+
+/* An aarch64_feature_set initializer for a single feature,
+   AARCH64_FEATURE_.  */
+#define AARCH64_FEATURE(FEAT) AARCH64_FL_##FEAT
+
+/* Used by AARCH64_FEATURES.  */
+#define AARCH64_OR_FEATURES_1(X, F1) \
+  AARCH64_FEATURE (F1)
+#define AARCH64_OR_FEATURES_2(X, F1, F2) \
+  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_1 (X, F2))
+#define AARCH64_OR_FEATURES_3(X, F1, ...) \
+  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_2 (X, __VA_ARGS__))
+
+/* An aarch64_feature_set initializer for the N features listed in "...".  */
+#define AARCH64_FEATURES(N, ...) \
+  AARCH64_OR_FEATURES_##N (0, __VA_ARGS__)
+
+/* Database of system register architectural requirements.  */
+const unsigned long long sysreg_reqs[] =
+{
+#define SYSREG(NAME, ENC, FLAGS, ARCH) ARCH,
+#include "aarch64-sys-regs.def"
+#undef SYSREG
+};
+
+/* Database of system register properties.  Properties assigned unique
+   bits in bitfield and combined via bitwise-OR.  */
+const unsigned sysreg_properties[] =
+{
+#define SYSREG(NAME, ENC, FLAGS, ARCH) FLAGS,
+#include 

[PATCH 4/6] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-03 Thread Victor Do Nascimento
Motivated by the need to print system register names in output
assembly, this patch adds the required logic to
`aarch64_print_operand' to accept rtxs of type CONST_STRING and
process these accordingly.

Consequently, an rtx such as:

  (set (reg/i:DI 0 x0)
 (unspec:DI [(const_string ("amcgcr_el0"))])

can now be output correctly using the following output pattern when
composing `define_insn's:

  "mrs\t%x0, %1"

gcc/ChangeLog:

* gcc/config/aarch64/aarch64.cc (aarch64_print_operand): Add
support for CONST_STRING.
---
 gcc/config/aarch64/aarch64.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index dd5ac1cbc8d..d6dd0586ac1 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -12400,6 +12400,12 @@ aarch64_print_operand (FILE *f, rtx x, int code)
 
   switch (GET_CODE (x))
{
+   case CONST_STRING:
+ {
+   const char *output_op = XSTR (x, 0);
+   asm_fprintf (f, "%s", output_op);
+   break;
+ }
case REG:
  if (aarch64_sve_data_mode_p (GET_MODE (x)))
{
-- 
2.41.0



[PATCH 6/6] aarch64: Add front-end argument type checking for target builtins

2023-10-03 Thread Victor Do Nascimento
In implementing the ACLE read/write system register builtins it was
observed that leaving argument type checking to be done at expand-time
meant that poorly-formed function calls were being "fixed" by certain
optimization passes, meaning bad code wasn't being properly picked up
in checking.

Example:

  const char *regname = "amcgcr_el0";
  long long a = __builtin_aarch64_rsr64 (regname);

is reduced by the ccp1 pass to

  long long a = __builtin_aarch64_rsr64 ("amcgcr_el0");

As these functions require an argument of STRING_CST type, there needs
to be a check carried out by the front-end capable of picking this up.

The introduced `check_general_builtin_call' function will be called by
the TARGET_CHECK_BUILTIN_CALL hook whenever a call to a builtin
belonging to the AARCH64_BUILTIN_GENERAL category is encountered,
carrying out any appropriate checks associated with a particular
builtin function code.

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-builtins.cc (check_general_builtin_call):
New.
* gcc/config/aarch64/aarch64-c.cc (aarch64_check_builtin_call):
Add check_general_builtin_call call.
* gcc/config/aarch64/aarch64-protos.h (check_general_builtin_call):
New.

gcc/testsuite/ChangeLog:

* gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c: New.
---
 gcc/config/aarch64/aarch64-builtins.cc| 33 +++
 gcc/config/aarch64/aarch64-c.cc   |  4 +--
 gcc/config/aarch64/aarch64-protos.h   |  3 ++
 .../gcc.target/aarch64/acle/rwsr-2.c  | 15 +
 4 files changed, 53 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index d8bb2a989a5..6734361f4f4 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2126,6 +2126,39 @@ aarch64_general_builtin_decl (unsigned code, bool)
   return aarch64_builtin_decls[code];
 }
 
+bool
+check_general_builtin_call (location_t location, vec,
+   unsigned int code, tree fndecl,
+   unsigned int nargs ATTRIBUTE_UNUSED, tree *args)
+{
+  switch (code)
+{
+case AARCH64_RSR:
+case AARCH64_RSRP:
+case AARCH64_RSR64:
+case AARCH64_RSRF:
+case AARCH64_RSRF64:
+case AARCH64_WSR:
+case AARCH64_WSRP:
+case AARCH64_WSR64:
+case AARCH64_WSRF:
+case AARCH64_WSRF64:
+  if (TREE_CODE (args[0]) == VAR_DECL
+ || TREE_CODE (TREE_TYPE (args[0])) != POINTER_TYPE
+ || TREE_CODE (TREE_OPERAND (TREE_OPERAND (args[0], 0) , 0))
+ != STRING_CST)
+   {
+ const char  *fn_name, *err_msg;
+ fn_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+ err_msg = "first argument to %<%s%> must be a string literal";
+ error_at (location, err_msg, fn_name);
+ return false;
+   }
+}
+  /* Default behavior.  */
+  return true;
+}
+
 typedef enum
 {
   SIMD_ARG_COPY_TO_REG,
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 578ec6f45b0..6e2b83b8308 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -338,8 +338,8 @@ aarch64_check_builtin_call (location_t loc, vec 
arg_loc,
   switch (code & AARCH64_BUILTIN_CLASS)
 {
 case AARCH64_BUILTIN_GENERAL:
-  return true;
-
+  return check_general_builtin_call (loc, arg_loc, subcode, orig_fndecl,
+nargs, args);
 case AARCH64_BUILTIN_SVE:
   return aarch64_sve::check_builtin_call (loc, arg_loc, subcode,
  orig_fndecl, nargs, args);
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index a134e2fcf8e..9ef96ff511f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -990,6 +990,9 @@ tree aarch64_general_builtin_rsqrt (unsigned int);
 void handle_arm_acle_h (void);
 void handle_arm_neon_h (void);
 
+bool check_general_builtin_call (location_t, vec, unsigned int,
+ tree, unsigned int, tree *);
+
 namespace aarch64_sve {
   void init_builtins ();
   void handle_arm_sve_h ();
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c 
b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c
new file mode 100644
index 000..72e5fb75b21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c
@@ -0,0 +1,15 @@
+/* Test the __arm_[r,w]sr ACLE intrinsics family.  */
+/* Ensure that illegal behavior is rejected by the compiler.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.4-a" } */
+
+#include 
+
+void
+test_non_const_sysreg_name ()
+{
+  const char *regname = "trcseqstr";
+  long long a = __arm_rsr64 (regname); /* { dg-error "first argument to 
'__builtin_aarch64_rsr64' must be a string literal" } */
+  __arm_wsr64 (regname, 

[PATCH 0/6] aarch64: Add support for __arm_rsr and __arm_wsr ACLE function family

2023-10-03 Thread Victor Do Nascimento
This patch series adds support for reading from and writing to
system registers via the relevant ACLE-defined builtins [1], making a
series of additions to the aarch64-specific areas of the compiler to
make this possible. 

Firstly, a mechanism for defining system registers is established via a
new .def file and the new SYSREG macro.  This macro is the same as is
used in Binutils and system register entries are compatible with
either code-base.

Given the information contained in this system register definition
file, a compile-time validation mechanism is implemented, such that any
system register name passed as a string literal argument to these
builtins can be checked against known system registers and its use
for a given target architecture validated.

Finally, patterns for each of these builtins are added to the back-end
such that, if all validation criteria are met, the correct assembly is
emitted.

Thus, the following example of system register access is now valid for
GCC:

long long old = __arm_rsr("trcseqstr");
__arm_wsr("trcseqstr", new);

Testing:
 - Bootstrap/regtest on aarch64-linux-gnu done.

[1] https://arm-software.github.io/acle/main/acle.html

Victor Do Nascimento (6):
  aarch64: Sync system register information with Binutils
  aarch64: Add support for aarch64-sys-regs.def
  aarch64: Implement system register validation tools
  aarch64: Add basic target_print_operand support for CONST_STRING
  aarch64: Implement system register r/w arm ACLE intrinsic functions
  aarch64: Add front-end argument type checking for target builtins

 gcc/config/aarch64/aarch64-builtins.cc|  233 
 gcc/config/aarch64/aarch64-c.cc   |4 +-
 gcc/config/aarch64/aarch64-protos.h   |5 +
 gcc/config/aarch64/aarch64-sys-regs.def   | 1059 +
 gcc/config/aarch64/aarch64.cc |  182 +++
 gcc/config/aarch64/aarch64.h  |   36 +
 gcc/config/aarch64/aarch64.md |   17 +
 gcc/config/aarch64/arm_acle.h |   30 +
 gcc/config/aarch64/predicates.md  |4 +
 .../gcc.target/aarch64/acle/rwsr-1.c  |   20 +
 .../gcc.target/aarch64/acle/rwsr-2.c  |   15 +
 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  |  144 +++
 12 files changed, 1747 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-sys-regs.def
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c

-- 
2.41.0



Re: [PATCH] __atomic_test_and_set: Fall back to library, not non-atomic code

2023-10-03 Thread Hans-Peter Nilsson
> From: Christophe Lyon 
> Date: Tue, 3 Oct 2023 15:20:39 +0200

> The patch passed almost all our CI configurations, except arm-eabi when
> testing with
>  -mthumb/-march=armv6s-m/-mtune=cortex-m0/-mfloat-abi=soft/-mfpu=auto
> where it causes these failures:
> FAIL: 29_atomics/atomic_flag/clear/1.cc -std=gnu++17 (test for excess
> errors)
> UNRESOLVED: 29_atomics/atomic_flag/clear/1.cc -std=gnu++17 compilation
> failed to produce executable
> FAIL: 29_atomics/atomic_flag/cons/value_init.cc -std=gnu++20 (test for
> excess errors)
> UNRESOLVED: 29_atomics/atomic_flag/cons/value_init.cc -std=gnu++20
> compilation failed to produce executable
> FAIL: 29_atomics/atomic_flag/cons/value_init.cc -std=gnu++26 (test for
> excess errors)
> UNRESOLVED: 29_atomics/atomic_flag/cons/value_init.cc -std=gnu++26
> compilation failed to produce executable
> FAIL: 29_atomics/atomic_flag/test_and_set/explicit.cc -std=gnu++17 (test
> for excess errors)
> UNRESOLVED: 29_atomics/atomic_flag/test_and_set/explicit.cc -std=gnu++17
> compilation failed to produce executable
> FAIL: 29_atomics/atomic_flag/test_and_set/implicit.cc -std=gnu++17 (test
> for excess errors)
> UNRESOLVED: 29_atomics/atomic_flag/test_and_set/implicit.cc -std=gnu++17
> compilation failed to produce executable

For which set of multilibs in that set, do you get these
errors?  I'm guessing -march=armv6s-m, but I'm checking.

> The linker error is:
> undefined reference to `__atomic_test_and_set'

I read that as you're saying you have a multilib combination
where you currently don't emit __sync_synchronize but also
don't emit anything for __atomic_test_and_set.

> Maybe we need a new variant of dg-require-thread-fence ?

Perhaps.  Unless of course, there's a multilib combination
for which you *can* emit the proper atomic spell; missing it
because the need for it, has been hidden!

(At first I thought it was related to caching the
thread-fence property across multilib testing, but I don't
think that was correct.)

> 
> Thanks,
> 
> Christophe
> 
> 
> Ok to commit?

ENOPATCH

brgds, H-P


[COMMITTED] Remove pass counting in VRP.

2023-10-03 Thread Andrew MacLeod
Pass counting in VRP is used to decide when to call early VRP, pass the 
flag to enable warnings, and when the final pass is.


If you try to add additional passes, this becomes quite fragile. This
patch simply chooses the pass based on the data pointer passed in, and
removes the pass counter.  The first FULL VRP pass invokes the warning
code, and the flag passed in now represents the FINAL pass of VRP.
There is no longer a global flag which, as it turns out, wasn't working
well with the JIT compiler, but went undetected.  (Thanks to dmalcolm
for helping me sort out what was going on there)



Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 29abc475a360ad14d5f692945f2805fba1fdc679 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 28 Sep 2023 09:19:32 -0400
Subject: [PATCH 2/5] Remove pass counting in VRP.

Rather than using a pass count to decide which parameters are passed to
VRP, make it explicit.

	* passes.def (pass_vrp): Use parameter for final pass flag.
	* tree-vrp.cc (vrp_pass_num): Remove.
	(run_warning_pass): New.
	(pass_vrp::my_pass): Remove.
	(pass_vrp::final_p): New.
	(pass_vrp::set_pass_param): Set final_p param.
	(pass_vrp::execute): Choose specific pass based on data pointer.
---
 gcc/passes.def  |  4 ++--
 gcc/tree-vrp.cc | 26 +-
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 4110a472914..2bafd60bbfb 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -221,7 +221,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_fre, true /* may_iterate */);
   NEXT_PASS (pass_merge_phi);
   NEXT_PASS (pass_thread_jumps_full, /*first=*/true);
-  NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);
+  NEXT_PASS (pass_vrp, false /* final_p*/);
   NEXT_PASS (pass_dse);
   NEXT_PASS (pass_dce);
   /* pass_stdarg is always run and at this point we execute
@@ -348,7 +348,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
   NEXT_PASS (pass_strlen);
   NEXT_PASS (pass_thread_jumps_full, /*first=*/false);
-  NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
+  NEXT_PASS (pass_vrp, true /* final_p */);
   /* Run CCP to compute alignment and nonzero bits.  */
   NEXT_PASS (pass_ccp, true /* nonzero_p */);
   NEXT_PASS (pass_warn_restrict);
diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index d7b194f5904..05266dfe34a 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -1120,36 +1120,44 @@ const pass_data pass_data_early_vrp =
   ( TODO_cleanup_cfg | TODO_update_ssa | TODO_verify_all ),
 };
 
-static int vrp_pass_num = 0;
+static bool run_warning_pass = true;
 class pass_vrp : public gimple_opt_pass
 {
 public:
   pass_vrp (gcc::context *ctxt, const pass_data &data_)
-: gimple_opt_pass (data_, ctxt), data (data_), warn_array_bounds_p (false),
-  my_pass (vrp_pass_num++)
-  {}
+: gimple_opt_pass (data_, ctxt), data (data_),
+  warn_array_bounds_p (false), final_p (false)
+  {
+// Only the first VRP pass should run warnings.
+if (&data == &pass_data_vrp)
+  {
+	warn_array_bounds_p = run_warning_pass;
+	run_warning_pass = false;
+  }
+  }
 
   /* opt_pass methods: */
   opt_pass * clone () final override { return new pass_vrp (m_ctxt, data); }
   void set_pass_param (unsigned int n, bool param) final override
 {
   gcc_assert (n == 0);
-  warn_array_bounds_p = param;
+  final_p = param;
 }
   bool gate (function *) final override { return flag_tree_vrp != 0; }
   unsigned int execute (function *fun) final override
 {
   // Early VRP pass.
-  if (my_pass == 0)
-	return execute_ranger_vrp (fun, /*warn_array_bounds_p=*/false, false);
+  if (&data == &pass_data_early_vrp)
+	return execute_ranger_vrp (fun, /*warn_array_bounds_p=*/false,
+   /*final_p=*/false);
 
-  return execute_ranger_vrp (fun, warn_array_bounds_p, my_pass == 2);
+  return execute_ranger_vrp (fun, warn_array_bounds_p, final_p);
 }
 
  private:
   const pass_data &data;
   bool warn_array_bounds_p;
-  int my_pass;
+  bool final_p;
 }; // class pass_vrp
 
 const pass_data pass_data_assumptions =
-- 
2.41.0



[COMMITTED] Return TRUE only when a global value is updated.

2023-10-03 Thread Andrew MacLeod
set_range_info should return TRUE only when it sets a new value. It was 
previously returning true whenever it set a value, whether it was 
different or not.


With this change, VRP no longer overwrites global ranges DOM has set.  
Two testcases needed adjusting: they expected VRP2 to set a range, but it 
turns out the range was really being set in DOM2.  Instead they now check 
for the range in the final listing.
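
The "return true only on change" rule can be sketched with a toy interval
type.  This is an illustration under stated assumptions, not GCC's
implementation: interval and set_global_range are hypothetical names, and the
only behaviour borrowed from the patch is that intersect reports whether the
range changed.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical closed integer interval; not GCC's vrange or Value_Range.
struct interval
{
  long lo, hi;
  bool empty () const { return lo > hi; }

  // Narrow *this to the overlap with other; return true only if *this changed.
  bool intersect (const interval &other)
  {
    long new_lo = std::max (lo, other.lo);
    long new_hi = std::min (hi, other.hi);
    bool changed = new_lo != lo || new_hi != hi;
    lo = new_lo;
    hi = new_hi;
    return changed;
  }
};

// Update the stored range; report true only when it really changed.
bool set_global_range (interval &stored, const interval &r)
{
  interval tmp = stored;
  if (!tmp.intersect (r) || tmp.empty ())
    return false;
  stored = tmp;
  return true;
}

void demo_set_global_range ()
{
  interval g = { 0, 100 };
  bool changed = set_global_range (g, { 10, 50 });  // narrows to [10, 50]
  assert (changed && g.lo == 10 && g.hi == 50);
  changed = set_global_range (g, { 0, 100 });       // nothing new learned
  assert (!changed);
  changed = set_global_range (g, { 60, 70 });       // empty result is rejected
  assert (!changed && g.lo == 10 && g.hi == 50);
}
```

A later pass that re-derives a range an earlier pass already recorded gets
false back, so it does not claim the update as its own.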


Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.

Andrew
From dae5de2a2353b928cc7099a78d88a40473abefd2 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 27 Sep 2023 12:34:16 -0400
Subject: [PATCH 1/5] Return TRUE only when a global value is updated.

set_range_info should return TRUE only when it sets a new value.  VRP no
longer overwrites global ranges DOM has set.  Check for ranges in the
final listing.

	gcc/
	* tree-ssanames.cc (set_range_info): Return true only if the
	current value changes.

	gcc/testsuite/
	* gcc.dg/pr93917.c: Check for ranges in final optimized listing.
	* gcc.dg/tree-ssa/vrp-unreachable.c: Ditto.
---
 gcc/testsuite/gcc.dg/pr93917.c|  4 ++--
 .../gcc.dg/tree-ssa/vrp-unreachable.c |  4 ++--
 gcc/tree-ssanames.cc  | 24 +--
 3 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr93917.c b/gcc/testsuite/gcc.dg/pr93917.c
index f09e1c41ae8..f636b77f45d 100644
--- a/gcc/testsuite/gcc.dg/pr93917.c
+++ b/gcc/testsuite/gcc.dg/pr93917.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-vrp2" } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-vrp2 -fdump-tree-optimized-alias" } */
 
 void f3(int n);
 
@@ -19,5 +19,5 @@ void f2(int*n)
 
 /* { dg-final { scan-tree-dump-times "Global Export.*0, \\+INF" 1 "vrp1" } } */
 /* { dg-final { scan-tree-dump-times "__builtin_unreachable" 1 "vrp1" } } */
-/* { dg-final { scan-tree-dump-times "Global Export.*0, \\+INF" 1 "vrp2" } } */
 /* { dg-final { scan-tree-dump-times "__builtin_unreachable" 0 "vrp2" } } */
+/* { dg-final { scan-tree-dump-times "0, \\+INF" 2 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp-unreachable.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp-unreachable.c
index 5835dfc8dbc..4aad7f1be5d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp-unreachable.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp-unreachable.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1-alias -fdump-tree-vrp2-alias" } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-vrp2 -fdump-tree-optimized-alias" } */
 
 void dead (unsigned n);
 void alive (unsigned n);
@@ -39,4 +39,4 @@ void func (unsigned n, unsigned m)
 /* { dg-final { scan-tree-dump-not "dead" "vrp1" } } */
 /* { dg-final { scan-tree-dump-times "builtin_unreachable" 1 "vrp1" } } */
 /* { dg-final { scan-tree-dump-not "builtin_unreachable" "vrp2" } } */
-/* { dg-final { scan-tree-dump-times "fff8 VALUE 0x0" 4 "vrp2" } } */
+/* { dg-final { scan-tree-dump-times "fff8 VALUE 0x0" 2 "optimized" } } */
diff --git a/gcc/tree-ssanames.cc b/gcc/tree-ssanames.cc
index 23387b90fe3..1eae411ac1c 100644
--- a/gcc/tree-ssanames.cc
+++ b/gcc/tree-ssanames.cc
@@ -418,10 +418,17 @@ set_range_info (tree name, const vrange &r)
   if (r.undefined_p () || r.varying_p ())
 return false;
 
+  // Pick up the current range, or VARYING if none.
   tree type = TREE_TYPE (name);
+  Value_Range tmp (type);
+  if (range_info_p (name))
+range_info_get_range (name, tmp);
+  else
+tmp.set_varying (type);
+
   if (POINTER_TYPE_P (type))
 {
-  if (r.nonzero_p ())
+  if (r.nonzero_p () && !tmp.nonzero_p ())
 	{
 	  set_ptr_nonnull (name);
 	  return true;
@@ -429,18 +436,11 @@ set_range_info (tree name, const vrange &r)
   return false;
 }
 
-  /* If a global range already exists, incorporate it.  */
-  if (range_info_p (name))
-{
-  Value_Range tmp (type);
-  range_info_get_range (name, tmp);
-  tmp.intersect (r);
-  if (tmp.undefined_p ())
-	return false;
+  // If the result doesn't change, or is undefined, return false.
+  if (!tmp.intersect (r) || tmp.undefined_p ())
+return false;
 
-  return range_info_set_range (name, tmp);
-}
-  return range_info_set_range (name, r);
+  return range_info_set_range (name, tmp);
 }
 
 /* Set nonnull attribute to pointer NAME.  */
-- 
2.41.0



RE: [ARC PATCH] Split SImode shifts pre-reload on !TARGET_BARREL_SHIFTER.

2023-10-03 Thread Claudiu Zissulescu
Hi Roger,

It was nice to meet you too.

Thank you for looking into the ARC's non-Barrel Shifter configurations.  I will 
dive into your patch asap, but before starting, here are a few comments: 

-Original Message-
From: Roger Sayle  
Sent: Thursday, September 28, 2023 2:27 PM
To: gcc-patches@gcc.gnu.org
Cc: Claudiu Zissulescu 
Subject: [ARC PATCH] Split SImode shifts pre-reload on !TARGET_BARREL_SHIFTER.


Hi Claudiu,
It was great meeting up with you and the Synopsys ARC team at the GNU tools 
Cauldron in Cambridge.

This patch is the first in a series to improve SImode and DImode shifts and 
rotates in the ARC backend.  This first piece splits SImode shifts, for 
!TARGET_BARREL_SHIFTER targets, after combine and before reload, in the split1 
pass, as suggested by the FIXME comment above output_shift in arc.cc.  To do 
this I've copied the implementation of the x86_pre_reload_split function from 
i386 backend, and renamed it arc_pre_reload_split.

Although the actual implementations of shifts remain the same (as in 
output_shift), having them as explicit instructions in the RTL stream allows 
better scheduling and use of compact forms when available.  The benefits can be 
seen in two short examples below.

For the function:
unsigned int foo(unsigned int x, unsigned int y) {
  return y << 2;
}

GCC with -O2 -mcpu=em would previously generate:
foo:    add     r1,r1,r1
        add     r1,r1,r1
        j_s.d   [blink]
        mov_s   r0,r1   ;4

[CZI] The move shouldn't be generated, indeed. The use of ADDs is slightly 
beneficial for older ARCv1 arches.

and with this patch now generates:
foo:    asl_s   r0,r1
        j_s.d   [blink]
        asl_s   r0,r0

[CZI] Nice. This new sequence is as fast as we can get for our ARCv2 cpus.

Notice the original (from shift_si3's output_shift) requires the shift sequence 
to be monolithic with the same destination register as the source (requiring an 
extra mov_s).  The new version can eliminate this move, and schedule the second 
asl in the branch delay slot of the return.
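
As a quick check on the arithmetic behind both sequences: each add r1,r1,r1
or single-operand asl_s is a one-bit left shift, so two of them give y << 2.
A minimal sketch, with double_once and shl2 as illustrative names only:

```cpp
#include <cassert>
#include <cstdint>

// One "add r,r,r" step: doubling is a left shift by one bit.
constexpr std::uint32_t double_once (std::uint32_t v) { return v + v; }

// y << 2 expressed as two single-bit steps.
constexpr std::uint32_t shl2 (std::uint32_t y)
{
  return double_once (double_once (y));
}

static_assert (shl2 (5u) == (5u << 2), "two doublings equal a shift by 2");
static_assert (shl2 (0xC0000001u) == (0xC0000001u << 2), "wraps like a shift");

void demo_shl2 ()
{
  std::uint32_t r = shl2 (3u);
  assert (r == 12u);
}
```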

For the function:
int x,y,z;

void bar()
{
  x <<= 3;
  y <<= 3;
  z <<= 3;
}

GCC -O2 -mcpu=em currently generates:
bar:    push_s  r13
        ld.as   r12,[gp,@x@sda] ;23
        ld.as   r3,[gp,@y@sda]  ;23
        mov     r2,0
        add3    r12,r2,r12
        mov     r2,0
        add3    r3,r2,r3
        ld.as   r2,[gp,@z@sda]  ;23
        st.as   r12,[gp,@x@sda] ;26
        mov     r13,0
        add3    r2,r13,r2
        st.as   r3,[gp,@y@sda]  ;26
        st.as   r2,[gp,@z@sda]  ;26
        j_s.d   [blink]
        pop_s   r13

where each shift by 3 uses ARC's add3 instruction, which is similar to x86's 
lea, implementing x = (y<<3) + z, but requires the value zero to be placed in a 
temporary register "z".  Splitting this before reload allows these pseudos to 
be shared/reused.  With this patch, we get

bar:    ld.as   r2,[gp,@x@sda]  ;23
        mov_s   r3,0    ;3
        add3    r2,r3,r2
        ld.as   r3,[gp,@y@sda]  ;23
        st.as   r2,[gp,@x@sda]  ;26
        ld.as   r2,[gp,@z@sda]  ;23
        mov_s   r12,0   ;3
        add3    r3,r12,r3
        add3    r2,r12,r2
        st.as   r3,[gp,@y@sda]  ;26
        st.as   r2,[gp,@z@sda]  ;26
        j_s     [blink]
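
The add3-with-zero idiom used for the shifts by 3 can be modelled as below.
This is only a sketch of the arithmetic described above, x = (y<<3) + z;
add3_model and shl3 are hypothetical names, not anything from the backend.

```cpp
#include <cassert>
#include <cstdint>

// Model of the add3 form: result = z + (y << 3).
constexpr std::uint32_t add3_model (std::uint32_t z, std::uint32_t y)
{
  return z + (y << 3);
}

// A plain shift by 3 is the same operation with the addend set to zero,
// which is why a register holding 0 is needed.
constexpr std::uint32_t shl3 (std::uint32_t y) { return add3_model (0u, y); }

static_assert (shl3 (7u) == 56u, "7 << 3 == 56");

void demo_shl3 ()
{
  std::uint32_t r = shl3 (2u);
  assert (r == 16u);
  assert (add3_model (1u, 2u) == 17u);
}
```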

[CZI] Looks great, but it also shows that I forgot to add the Ra,LIMM,RC 
variant to the ADD3 instruction, which would let us emit, instead of
        mov_s   r3,0    ;3
        add3    r2,r3,r2
only this: add3,0,r2.  It is indeed a longer instruction, but faster.

Unfortunately, register allocation means that we only share two of the three 
"mov_s z,0", but this is sufficient to reduce register pressure enough to avoid 
spilling r13 in the prologue/epilogue.

This patch also contains a (latent?) bug fix.  The implementation of the 
default insn "length" attribute assumes instructions of type "shift" have two 
input operands and accesses operands[2]; hence specializations of shifts that 
don't have an operands[2] need to be categorized as type "unary" (which results 
in the correct length).

[CZI] The ARC types need an upgrade too.

This patch has been tested on a cross-compiler to arc-elf (hosted on 
x86_64-pc-linux-gnu).  Because I have an incomplete tool chain, many of the 
regression tests fail, but there are no new failures with the new test cases 
added below.  If you can confirm that there are no issues from additional 
testing, is this OK for mainline?

Finally a quick technical question.  ARC's zero overhead loops require at least 
two instructions in the loop, so currently the backend's implementation of 
shr20 pads the loop body with a "nop".

lshr20: mov.f   lp_count, 20
        lpnz    2f
        lsr     r0,r0
        nop
2:      # end single insn loop
        j_s     [blink]


[CZI] The ZOLs (LP instructions) are not great when dealing with short loop 
blocks. Hence, the NOP instruction. Personally, I don't fancy using the LP 
instruction in this case, as it prohibits LP usage for a true for-loop.

could this be more efficiently implemented as:


Re: RFC: attributes documentation

2023-10-03 Thread Joseph Myers
On Mon, 2 Oct 2023, Sandra Loosemore wrote:

> Going beyond that, though, I think we should also document that the standard
> syntax is now the preferred way to do it, and change the examples (except for
> the parts documenting the old syntax) to use the new standard syntax.  It's
> been accepted by the default -std= setting for both C and C++ since GCC 10,
> and my understanding is that C2x will be official by the time GCC 14 is
> released (so supporting the new syntax won't be just another GNU extension any
> more). Does this sound OK to everybody?

If you're documenting attributes in the [[]] form, you need to be a lot 
more careful to distinguish between an attribute on a declaration and one 
on the type of that declaration, for example, because those need to go in 
different places in the standard syntax (the laxity applied with 
__attribute__ that tries to guess what was meant and e.g. move an 
attribute from a declaration to its type doesn't apply with the standard 
syntax, which has precise rules in the standard for what entity an 
attribute in a given syntactic position appertains to).  In some cases 
this means that the [[]] attribute needs to go in a different position 
from __attribute__ in examples - in

int f (void) __attribute__ ((foo));

the GNU attribute in that position is considered a declaration attribute 
(but a function type attribute applied to a declaration will automatically 
be applied to the type instead) whereas

int f (void) [[gnu::foo]];

is only a function type attribute, not a declaration attribute [*], and

[[gnu::foo]] int f (void);

is only a declaration attribute, not a function type attribute.  (The 
version

int [[gnu::foo]] f (void);

applies the attribute to the int return type, which probably isn't what 
you want - unless you're e.g. using [[gnu::vector_size (16)]] to declare 
that the function returns a vector.)

To complicate things further, some attributes that are implemented as 
declaration attributes maybe logically should be type attributes but 
support for them on types hasn't been implemented - and if making such a 
GNU attribute into a type attribute in future, we might then need to 
consider appropriate compatibility for allowing it in both places even 
with the standard syntax.


[*] In the GNU syntax an attribute in this position is parsed as an 
attribute following a full declarator and any subsequent asm giving an 
external linkage name for the declaration.  In the standard syntax, an 
attribute in this position is part of a declarator immediately following a 
function declarator (and so could appear on nested function declarators, 
for example).  So while it looks superficially like "the same" position 
for the attributes, it's not at all the same syntactically.
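
A small compilable illustration of the declaration-attribute positions
discussed above, written as C++ and using noinline purely as a stand-in for
"foo" (twice_gnu and twice_std are made-up names).  It only shows the two
positions that name a declaration attribute; it does not try to demonstrate
the function-type or return-type positions.

```cpp
#include <cassert>

// GNU syntax: the attribute may trail the declarator and is still
// treated as applying to the declaration.
int twice_gnu (int x) __attribute__ ((noinline));

// Standard syntax: a declaration attribute goes at the start.
[[gnu::noinline]] int twice_std (int x);

int twice_gnu (int x) { return 2 * x; }
int twice_std (int x) { return 2 * x; }

void demo_attribute_positions ()
{
  int a = twice_gnu (21);
  int b = twice_std (21);
  assert (a == 42 && b == 42);
}
```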

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] ira: Scale save/restore costs of callee save registers with block frequency

2023-10-03 Thread Surya Kumari Jangala
ira: Scale save/restore costs of callee save registers with block frequency

In assign_hard_reg(), when computing the costs of the hard registers, the
cost of saving/restoring a callee-save hard register in prolog/epilog is
taken into consideration. However, this cost is not scaled with the entry
block frequency. Without scaling, the cost of saving/restoring is quite
small and this can result in a callee-save register being chosen by
assign_hard_reg() even though there are free caller-save registers
available. Assigning a callee save register to a pseudo that is live
in the entire function and across a call will cause shrink wrap to fail.
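
A sketch of the arithmetic involved, assuming made-up cost values; the helper
and its parameter names are illustrative and are not GCC identifiers.  It
mirrors the expression in the hunk below, with entry_freq standing in for the
entry block frequency.

```cpp
#include <cassert>

// Hypothetical helper: cost of saving and restoring a callee-save register,
// scaled by how often the function entry executes.
int callee_save_cost (int move_cost_0, int move_cost_1, int saved_nregs,
                      int nregs, int entry_freq)
{
  assert (nregs > 0);
  int per_entry = (move_cost_0 + move_cost_1) * saved_nregs / nregs - 1;
  return per_entry * entry_freq;
}

void demo_callee_save_cost ()
{
  // A frequency of 1 reproduces the old, unscaled value.
  int unscaled = callee_save_cost (4, 4, 1, 1, 1);
  // With an assumed entry frequency of 1000 the cost grows accordingly.
  int scaled = callee_save_cost (4, 4, 1, 1, 1000);
  assert (unscaled == 7);
  assert (scaled == 7000);
}
```

With the unscaled value, the save/restore cost is easily outweighed by
frequency-scaled costs elsewhere, which is how a callee-save register can win
even when a caller-save register is free.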

2023-10-03  Surya Kumari Jangala  

gcc/
PR rtl-optimization/111673
* ira-color.cc (assign_hard_reg): Scale save/restore costs of
callee save registers with block frequency.

gcc/testsuite/
PR rtl-optimization/111673
* gcc.target/powerpc/pr111673.c: New test.
---

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index f2e8ea34152..eb20c52310d 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2175,7 +2175,8 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
add_cost = ((ira_memory_move_cost[mode][rclass][0]
 + ira_memory_move_cost[mode][rclass][1])
* saved_nregs / hard_regno_nregs (hard_regno,
- mode) - 1);
+ mode) - 1)
+   * REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
cost += add_cost;
full_cost += add_cost;
  }
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111673.c b/gcc/testsuite/gcc.target/powerpc/pr111673.c
new file mode 100644
index 000..e0c0f85460a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111673.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue" } */
+
+/* Verify there is an early return without the prolog and shrink-wrap
+   the function. */
+
+int f (int);
+int
+advance (int dz)
+{
+  if (dz > 0)
+return (dz + dz) * dz;
+  else
+return dz * f (dz);
+}
+
+/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 "pro_and_epilogue" } } */


Re: [PATCH] rs6000: Make 32 bit stack_protect support prefixed insn [PR111367]

2023-10-03 Thread David Edelsohn
On Wed, Sep 27, 2023 at 1:38 AM Kewen.Lin  wrote:

> Hi,
>
> As PR111367 shows, with prefixed insns supported, some of
> the checks assume that a prefixed insn can be used for the
> stack-protect-related load/store, but since we don't
> actually change the emitted assembly for 32 bit, this can
> cause the assembler error the PR exposes.
>
> Mike's commit r10-4547-gce6a6c007e5a98 has already handled
> the 64 bit case (DImode); this patch treats the 32
> bit case (SImode) by making use of mode iterator P and
> ptrload attribute iterator, and also fixes the constraints
> to match the emitted operand formats.
>
> Bootstrapped and regtested on powerpc64-linux-gnu P7/P8/P9
> and powerpc64le-linux-gnu P9.
>
> This patch has incorporated Segher's comments in PR111367,
> I'm going to push this soon if no objections.
>

This patch is okay.

Thanks, David


>
> BR,
> Kewen
> -
> PR target/111367
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.md (stack_protect_setsi): Support prefixed
> instruction emission and incorporate to stack_protect_set.
> (stack_protect_setdi): Rename to ...
> (stack_protect_set): ... this, adjust constraint.
> (stack_protect_testsi): Support prefixed instruction emission and
> incorporate to stack_protect_test.
> (stack_protect_testdi): Rename to ...
> (stack_protect_test): ... this, adjust constraint.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/powerpc/pr111367.C: New test.
> ---
>  gcc/config/rs6000/rs6000.md | 73 -
>  gcc/testsuite/g++.target/powerpc/pr111367.C | 22 +++
>  2 files changed, 49 insertions(+), 46 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr111367.C
>
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 1a9a7b1a479..0ac79fc7735 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -12389,33 +12389,26 @@ (define_expand "stack_protect_set"
>DONE;
>  })
>
> -(define_insn "stack_protect_setsi"
> -  [(set (match_operand:SI 0 "memory_operand" "=m")
> -   (unspec:SI [(match_operand:SI 1 "memory_operand" "m")]
> UNSPEC_SP_SET))
> -   (set (match_scratch:SI 2 "=") (const_int 0))]
> -  "TARGET_32BIT"
> -  "lwz%U1%X1 %2,%1\;stw%U0%X0 %2,%0\;li %2,0"
> -  [(set_attr "type" "three")
> -   (set_attr "length" "12")])
> -
>  ;; We can't use the prefixed attribute here because there are two memory
>  ;; instructions.  We can't split the insn due to the fact that this
> operation
>  ;; needs to be done in one piece.
> -(define_insn "stack_protect_setdi"
> -  [(set (match_operand:DI 0 "memory_operand" "=Y")
> -   (unspec:DI [(match_operand:DI 1 "memory_operand" "Y")]
> UNSPEC_SP_SET))
> -   (set (match_scratch:DI 2 "=") (const_int 0))]
> -  "TARGET_64BIT"
> +(define_insn "stack_protect_set"
> +  [(set (match_operand:P 0 "memory_operand" "=YZ")
> +   (unspec:P [(match_operand:P 1 "memory_operand" "YZ")]
> UNSPEC_SP_SET))
> +   (set (match_scratch:P 2 "=") (const_int 0))]
> +  ""
>  {
> -  if (prefixed_memory (operands[1], DImode))
> -output_asm_insn ("pld %2,%1", operands);
> +  if (prefixed_memory (operands[1], mode))
> +/* Prefixed load only supports D-form but no update and X-form.  */
> +output_asm_insn ("p %2,%1", operands);
>else
> -output_asm_insn ("ld%U1%X1 %2,%1", operands);
> +output_asm_insn ("%U1%X1 %2,%1", operands);
>
> -  if (prefixed_memory (operands[0], DImode))
> -output_asm_insn ("pstd %2,%0", operands);
> +  if (prefixed_memory (operands[0], mode))
> +/* Prefixed store only supports D-form but no update and X-form.  */
> +output_asm_insn ("pst %2,%0", operands);
>else
> -output_asm_insn ("std%U0%X0 %2,%0", operands);
> +output_asm_insn ("st%U0%X0 %2,%0", operands);
>
>return "li %2,0";
>  }
> @@ -12461,45 +12454,33 @@ (define_expand "stack_protect_test"
>DONE;
>  })
>
> -(define_insn "stack_protect_testsi"
> -  [(set (match_operand:CCEQ 0 "cc_reg_operand" "=x,?y")
> -(unspec:CCEQ [(match_operand:SI 1 "memory_operand" "m,m")
> - (match_operand:SI 2 "memory_operand" "m,m")]
> -UNSPEC_SP_TEST))
> -   (set (match_scratch:SI 4 "=r,r") (const_int 0))
> -   (clobber (match_scratch:SI 3 "=,"))]
> -  "TARGET_32BIT"
> -  "@
> -   lwz%U1%X1 %3,%1\;lwz%U2%X2 %4,%2\;xor. %3,%3,%4\;li %4,0
> -   lwz%U1%X1 %3,%1\;lwz%U2%X2 %4,%2\;cmplw %0,%3,%4\;li %3,0\;li %4,0"
> -  [(set_attr "length" "16,20")])
> -
>  ;; We can't use the prefixed attribute here because there are two memory
>  ;; instructions.  We can't split the insn due to the fact that this
> operation
>  ;; needs to be done in one piece.
> -(define_insn "stack_protect_testdi"
> +(define_insn "stack_protect_test"
>[(set (match_operand:CCEQ 0 "cc_reg_operand" "=x,?y")
> -(unspec:CCEQ [(match_operand:DI 1 "memory_operand" "Y,Y")
> - (match_operand:DI 2 "memory_operand" "Y,Y")]
> +

[pushed] diagnostics: add ctors to text_info; add m_ prefixes to fields

2023-10-03 Thread David Malcolm
No functional change intended.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-4379-gc44ca7c01226e0.
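
For readers skimming the diff, the shape of the change is roughly the
following.  This is a simplified, hypothetical analogue (message_info is a
made-up name and has fewer fields than the real struct); it only shows the
pattern of replacing field-by-field assignment with a constructor and
"m_"-prefixed members.

```cpp
#include <cassert>
#include <cstdarg>

// Simplified, hypothetical analogue of a text_info-style struct.
struct message_info
{
  message_info () = default;
  message_info (const char *format_spec, va_list *args_ptr, int err_no)
    : m_format_spec (format_spec), m_args_ptr (args_ptr), m_err_no (err_no)
  {}

  const char *m_format_spec = nullptr;
  va_list *m_args_ptr = nullptr;
  int m_err_no = 0;
};

void demo_message_info ()
{
  // One expression instead of three separate assignments.
  message_info mi ("%s: %d", nullptr, 5);
  assert (mi.m_err_no == 5);
  assert (mi.m_args_ptr == nullptr);

  // Default construction leaves every field in a known state.
  message_info empty;
  assert (empty.m_format_spec == nullptr && empty.m_err_no == 0);
}
```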

gcc/ada/ChangeLog:
* gcc-interface/misc.cc: Use text_info ctor.

gcc/analyzer/ChangeLog:
* analyzer-logging.cc (logger::log_va_partial): Use text_info
ctor.
* analyzer.cc (make_label_text): Likewise.
(make_label_text_n): Likewise.
* pending-diagnostic.cc (evdesc::event_desc::formatted_print):
Likewise.

gcc/c/ChangeLog:
* c-objc-common.cc (c_tree_printer): Update for "m_" prefixes to
text_info fields.

gcc/cp/ChangeLog:
* error.cc: Update for "m_" prefixes to text_info fields.

gcc/d/ChangeLog:
* d-diagnostic.cc (d_diagnostic_report_diagnostic): Use text_info
ctor.

gcc/ChangeLog:
* diagnostic.cc (diagnostic_set_info_translated): Update for "m_"
prefixes to text_info fields.
(diagnostic_report_diagnostic): Likewise.
(verbatim): Use text_info ctor.
(simple_diagnostic_path::add_event): Likewise.
(simple_diagnostic_path::add_thread_event): Likewise.
* dumpfile.cc (dump_pretty_printer::decode_format): Update for
"m_" prefixes to text_info fields.
(dump_context::dump_printf_va): Use text_info ctor.
* graphviz.cc (graphviz_out::graphviz_out): Use text_info ctor.
(graphviz_out::print): Likewise.
* opt-problem.cc (opt_problem::opt_problem): Likewise.
* pretty-print.cc (pp_format): Update for "m_" prefixes to
text_info fields.
(pp_printf): Use text_info ctor.
(pp_verbatim): Likewise.
(assert_pp_format_va): Likewise.
* pretty-print.h (struct text_info): Add ctors.  Add "m_" prefix
to all fields.
* text-art/styled-string.cc (styled_string::from_fmt_va): Use
text_info ctor.
* tree-diagnostic.cc (default_tree_printer): Update for "m_"
prefixes to text_info fields.
* tree-pretty-print.h (pp_ti_abstract_origin): Likewise.

gcc/fortran/ChangeLog:
* error.cc (gfc_format_decoder): Update for "m_" prefixes to
text_info fields.
---
 gcc/ada/gcc-interface/misc.cc  |  5 +--
 gcc/analyzer/analyzer-logging.cc   |  5 +--
 gcc/analyzer/analyzer.cc   | 15 +--
 gcc/analyzer/pending-diagnostic.cc |  7 +---
 gcc/c/c-objc-common.cc |  4 +-
 gcc/cp/error.cc|  8 ++--
 gcc/d/d-diagnostic.cc  |  6 +--
 gcc/diagnostic.cc  | 33 +---
 gcc/dumpfile.cc| 13 +++---
 gcc/fortran/error.cc   |  2 +-
 gcc/graphviz.cc| 10 +
 gcc/opt-problem.cc |  6 +--
 gcc/pretty-print.cc| 63 --
 gcc/pretty-print.h | 24 +---
 gcc/text-art/styled-string.cc  |  5 +--
 gcc/tree-diagnostic.cc |  6 +--
 gcc/tree-pretty-print.h|  2 +-
 17 files changed, 81 insertions(+), 133 deletions(-)

diff --git a/gcc/ada/gcc-interface/misc.cc b/gcc/ada/gcc-interface/misc.cc
index 269c15e4b0d..453ae8087a6 100644
--- a/gcc/ada/gcc-interface/misc.cc
+++ b/gcc/ada/gcc-interface/misc.cc
@@ -293,7 +293,6 @@ static void
 internal_error_function (diagnostic_context *context, const char *msgid,
 va_list *ap)
 {
-  text_info tinfo;
   char *buffer, *p, *loc;
   String_Template temp, temp_loc;
   String_Pointer sp, sp_loc;
@@ -309,9 +308,7 @@ internal_error_function (diagnostic_context *context, const char *msgid,
   pp_clear_output_area (context->printer);
 
   /* Format the message into the pretty-printer.  */
-  tinfo.format_spec = msgid;
-  tinfo.args_ptr = ap;
-  tinfo.err_no = errno;
+  text_info tinfo (msgid, ap, errno);
   pp_format_verbatim (context->printer, &tinfo);
 
   /* Extract a (writable) pointer to the formatted text.  */
diff --git a/gcc/analyzer/analyzer-logging.cc b/gcc/analyzer/analyzer-logging.cc
index b78481c4098..ddfbb5b4c04 100644
--- a/gcc/analyzer/analyzer-logging.cc
+++ b/gcc/analyzer/analyzer-logging.cc
@@ -144,10 +144,7 @@ logger::log_partial (const char *fmt, ...)
 void
 logger::log_va_partial (const char *fmt, va_list *ap)
 {
-  text_info text;
-  text.format_spec = fmt;
-  text.args_ptr = ap;
-  text.err_no = 0;
+  text_info text (fmt, ap, 0);
   pp_format (m_pp, &text);
   pp_output_formatted_text (m_pp);
 }
diff --git a/gcc/analyzer/analyzer.cc b/gcc/analyzer/analyzer.cc
index 94c5cf242b2..9d4bc788f31 100644
--- a/gcc/analyzer/analyzer.cc
+++ b/gcc/analyzer/analyzer.cc
@@ -425,19 +425,13 @@ make_label_text (bool can_colorize, const char *fmt, ...)
   if (!can_colorize)
 pp_show_color (pp) = false;
 
-  text_info ti;
   rich_location rich_loc (line_table, UNKNOWN_LOCATION);
 
   va_list ap;
 
   va_start (ap, fmt);
 
-  ti.format_spec = _(fmt);
-  ti.args_ptr = &ap;
-  ti.err_no = 0;
-  ti.x_data = NULL;
-  ti.m_richloc = &rich_loc;
-
+  text_info ti (_(fmt), &ap, 0, 

Re: [PATCH v3] RISC-V:Optimize the MASK opt generation

2023-10-03 Thread David Edelsohn
The patch works on AIX.

I have Gawk installed, but it is a very old release before
multi-dimensional array support was added.

Thanks, David


On Mon, Oct 2, 2023 at 10:38 PM Kito Cheng  wrote:

> Proposed a fix, verified with "mawk" and "gawk -P" (gawk in POSIX
> mode) on my Linux machine; others have also reported that it works on
> FreeBSD.  Just waiting for review :)
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631785.html
>
> On Tue, Oct 3, 2023 at 2:07 AM Jeff Law  wrote:
> >
> >
> >
> > On 10/2/23 12:03, David Edelsohn wrote:
> > > > On Mon, Oct 2, 2023 at 1:59 PM Jeff Law wrote:
> > >
> > >
> > >
> > > On 10/2/23 11:20, David Edelsohn wrote:
> > >  > Wang,
> > >  >
> > >  > The AWK portions of this patch broke bootstrap on AIX.
> > >  >
> > > >  > Also, the AWK portions are common code, not RISC-V specific.  I don't
> > > >  > see anywhere that the common portions of the patch were reviewed or
> > > >  > approved by anyone with authority to approve the changes to the AWK files.
> > > >  >
> > > >  > This patch should not have been committed without approval by a reviewer
> > > >  > with authority for that portion of the compiler and should have been
> > > >  > tested on targets other than RISC-V if common parts of the compiler were
> > > >  > changed.
> > > > I acked the generic bits.  So the lack of testing on another target is
> > > > on me.
> > >
> > >
> > > Hi, Jeff
> > >
> > > Sorry. I didn't see a comment from a global reviewer in the V3 thread.
> > NP.
> >
> > >
> > > I am using Gawk on AIX.  After the change, I see a parse error from
> > > gawk.  I'm rebuilding with a checkout just before the change to confirm
> > > that it was the source of the error, and it seems to be past that
> > > failure location.  I didn't keep the exact error.  Once I get past this
> > > build cycle, I'll reproduce it.
> > I think there's already a patch circulating which fixes this.  It broke
> > at least one other platform.  Hopefully it'll all be sorted out today.
> >
> >
> > jeff
>


Re: [PATCH] __atomic_test_and_set: Fall back to library, not non-atomic code

2023-10-03 Thread Christophe Lyon
Hi!

On Tue, 26 Sept 2023 at 16:34, Hans-Peter Nilsson  wrote:

> Tested cris-elf, native x86_64-pc-linux-gnu and arm-eabi.
>
> For arm-eabi, notably lacking any atomic support for the
> default multilib, with --target_board=arm-sim it regressed
> 29_atomics/atomic_flag/cons/value_init.cc with the expected
> linker failure due to lack of __atomic_test_and_set - which
> is a good thing.  With this one, there are 44 unexpected
> FAILs for libstdc++ at r14-4210-g94982a6b9cf4.  This number
> was 206 as late as r14-3470-g721f7e2c4e5e, but mitigated by
> r14-3980-g62b29347c38394, deliberately.  To fix the
> regression, I'll do the same and follow up with adding
> dg-require-thread-fence on
> 29_atomics/atomic_flag/cons/value_init.cc (and if approved,
> commit it before this one).
>
> Incidentally, the fortran test-results for arm-eabi are
> riddled with missing-__sync_synchronize linker errors
> causing some 18134 unexpected failures, where cris-elf has
> 121.
>
>
The patch passed almost all our CI configurations, except arm-eabi when
testing with
 -mthumb/-march=armv6s-m/-mtune=cortex-m0/-mfloat-abi=soft/-mfpu=auto
where it causes these failures:
FAIL: 29_atomics/atomic_flag/clear/1.cc -std=gnu++17 (test for excess
errors)
UNRESOLVED: 29_atomics/atomic_flag/clear/1.cc -std=gnu++17 compilation
failed to produce executable
FAIL: 29_atomics/atomic_flag/cons/value_init.cc -std=gnu++20 (test for
excess errors)
UNRESOLVED: 29_atomics/atomic_flag/cons/value_init.cc -std=gnu++20
compilation failed to produce executable
FAIL: 29_atomics/atomic_flag/cons/value_init.cc -std=gnu++26 (test for
excess errors)
UNRESOLVED: 29_atomics/atomic_flag/cons/value_init.cc -std=gnu++26
compilation failed to produce executable
FAIL: 29_atomics/atomic_flag/test_and_set/explicit.cc -std=gnu++17 (test
for excess errors)
UNRESOLVED: 29_atomics/atomic_flag/test_and_set/explicit.cc -std=gnu++17
compilation failed to produce executable
FAIL: 29_atomics/atomic_flag/test_and_set/implicit.cc -std=gnu++17 (test
for excess errors)
UNRESOLVED: 29_atomics/atomic_flag/test_and_set/implicit.cc -std=gnu++17
compilation failed to produce executable

The linker error is:
undefined reference to `__atomic_test_and_set'
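
For context, a minimal sketch of the kind of code that ends up needing that
symbol.  With the patch quoted below, a target without direct insn support is
expected to get a call to a library function of the same name instead of
non-atomic inline code; on targets with support, the builtin expands inline.
The wrapper name test_and_set is made up for the example.

```cpp
#include <cassert>

// Returns the previous state of *flag and leaves it set.
bool test_and_set (bool *flag)
{
  return __atomic_test_and_set (flag, __ATOMIC_SEQ_CST);
}

void demo_test_and_set ()
{
  bool flag = false;
  bool was_set = test_and_set (&flag);   // flag was clear
  assert (!was_set);
  was_set = test_and_set (&flag);        // flag is now set
  assert (was_set);
  __atomic_clear (&flag, __ATOMIC_SEQ_CST);
  was_set = test_and_set (&flag);        // clear again after __atomic_clear
  assert (!was_set);
}
```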

Maybe we need a new variant of dg-require-thread-fence ?

Thanks,

Christophe


Ok to commit?
>
> -- >8 --
> Make __atomic_test_and_set consistent with other __atomic_ and __sync_
> builtins: call a matching library function instead of emitting
> non-atomic code when the target has no direct insn support.
>
> There's special-case code handling targetm.atomic_test_and_set_trueval
> != 1 trying a modified maybe_emit_sync_lock_test_and_set.  Previously,
> if that worked but its matching emit_store_flag_force returned NULL,
> we'd segfault later on.  Now that the caller handles NULL, gcc_assert
> here instead.
>
> While the referenced PR:s are ARM-specific, the issue is general.
>
> PR target/107567
> PR target/109166
> * builtins.cc (expand_builtin) :
> Handle failure from expand_builtin_atomic_test_and_set.
> * optabs.cc (expand_atomic_test_and_set): When all attempts fail to
> generate atomic code through target support, return NULL
> instead of emitting non-atomic code.  Also, for code handling
> targetm.atomic_test_and_set_trueval != 1, gcc_assert result
> from calling emit_store_flag_force instead of returning NULL.
> ---
>  gcc/builtins.cc |  5 -
>  gcc/optabs.cc   | 22 +++---
>  2 files changed, 11 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 6e4274bb2a4e..40dfd36a3197 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -8387,7 +8387,10 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
>break;
>
>  case BUILT_IN_ATOMIC_TEST_AND_SET:
> -  return expand_builtin_atomic_test_and_set (exp, target);
> +  target = expand_builtin_atomic_test_and_set (exp, target);
> +  if (target)
> +   return target;
> +  break;
>
>  case BUILT_IN_ATOMIC_CLEAR:
>return expand_builtin_atomic_clear (exp);
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 8b96f23aec05..e1898da22808 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -7080,25 +7080,17 @@ expand_atomic_test_and_set (rtx target, rtx mem, enum memmodel model)
>/* Recall that the legacy lock_test_and_set optab was allowed to do magic
>   things with the value 1.  Thus we try again without trueval.  */
>if (!ret && targetm.atomic_test_and_set_trueval != 1)
> -ret = maybe_emit_sync_lock_test_and_set (subtarget, mem, const1_rtx, model);
> -
> -  /* Failing all else, assume a single threaded environment and simply
> - perform the operation.  */
> -  if (!ret)
>  {
> -  /* If the result is ignored skip the move to target.  */
> -  if (subtarget != const0_rtx)
> -emit_move_insn (subtarget, mem);
> +  ret = 

Re: [PATCH] c++: merge tsubst_copy into tsubst_copy_and_build

2023-10-03 Thread Patrick Palka
On Mon, 2 Oct 2023, Patrick Palka wrote:

> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> OK for trunk?
> 
> -- >8 --
> 
> The relationship between tsubst_copy_and_build and tsubst_copy (two of
> the main template argument substitution routines for expression trees)
> is rather hazy.  The former is mostly a superset of the latter, with
> some differences.
> 
> The main difference is that they handle many tree codes differently, but
> much of the tree code handling in tsubst_copy appears to be dead code[1].
> This is because tsubst_copy only gets directly called in a few places
> and mostly on id-expressions.  The interesting exceptions are PARM_DECL,
> VAR_DECL, BIT_NOT_EXPR, SCOPE_REF, TEMPLATE_ID_EXPR and IDENTIFIER_NODE:
> 
>  * for PARM_DECL and VAR_DECL, tsubst_copy_and_build calls tsubst_copy
>followed by doing some extra handling of its own
>  * for BIT_NOT_EXPR tsubst_copy implicitly handles unresolved destructor
>calls (i.e. the first operand is an identifier or a type)
>  * for SCOPE_REF, TEMPLATE_ID_EXPR and IDENTIFIER_NODE tsubst_copy
>refrains from doing name lookup of the terminal name
> 
> Other more minor differences are that tsubst_copy exits early when
> 'args' is null, and it calls maybe_dependent_member_ref, and finally
> it dispatches to tsubst for type trees.
> 
> Thus tsubst_copy is (at this point) similar enough to tsubst_copy_and_build
> that it makes sense to merge the two functions, with the main difference
> being the name lookup behavior[2].  So this patch merges tsubst_copy into
> tsubst_copy_and_build via a new tsubst tf_no_name_lookup which controls
> name lookup and resolution of a (top-level) id-expression.
> 
> [1]: http://thrifty.mooo.com:8008/gcc-lcov/gcc/cp/pt.cc.gcov.html#17231
> [2]: I don't know the history of tsubst_copy but I would guess it was
> added before we settled on using processing_template_decl to control
> whether our AST building routines perform semantic checking and return
> non-templated trees, and so we needed a separate tsubst routine that
> avoids semantic checking and always returns a templated tree for e.g.
> partial substitution.

Oops, this is wrong -- tsubst_copy_and_build came after tsubst_copy,
and was introduced as an optimization with the intent of getting rid
of tsubst_copy eventually:
https://gcc.gnu.org/pipermail/gcc-patches/2003-January/093659.html

> 
> gcc/cp/ChangeLog:
> 
>   * cp-tree.h (enum tsubst_flags): Add tf_no_name_lookup.
>   * pt.cc (tsubst_copy):
>   (tsubst_pack_expansion): Use tsubst for substituting BASES_TYPE.
>   (tsubst_decl) : Use tsubst_copy_and_build with
>   tf_no_name_lookup instead of tsubst_copy.
>   (tsubst) : Use tsubst_copy_and_build
>   instead of tsubst_copy for substituting
>   CLASS_PLACEHOLDER_TEMPLATE.
>   : Use tsubst_copy_and_build with
>   tf_no_name_lookup instead of tsubst_copy for substituting
>   TYPENAME_TYPE_FULLNAME.
>   (tsubst_qualified_id): Likewise for substituting the component
>   name of a SCOPE_REF.
>   (tsubst_copy): Remove.
>   (tsubst_copy_and_build): Clear tf_no_name_lookup at the start,
>   and remember if it was set.  Call maybe_dependent_member_ref.
>   : Don't do name lookup if tf_no_name_lookup
>   was set.
>   : Don't finish a template-id if
>   tf_no_name_lookup was set.
>   : Handle identifier and type operand (if
>   tf_no_name_lookup was set).
>   : Avoid trying to resolve a SCOPE_REF if
>   tf_no_name_lookup by calling build_qualified_name directly
>   instead of tsubst_qualified_id.
>   : Handling of sizeof...  copied from tsubst_copy.
>   : Use tsubst_copy_and_build with
>   tf_no_name_lookup instead of tsubst_copy to substitute
>   a TEMPLATE_ID_EXPR callee naming an unresolved template.
>   : Likewise to substitute the member.
>   : Copied from tsubst_copy and merged with ...
>   : ... these.  Initial handling copied
>   from tsubst_copy.  Optimize local variable substitution by
>   trying retrieve_local_specialization before checking
>   uses_template_parms.
>   : Copied from tsubst_copy.
>   : Likewise.
>   : Likewise.
>   : Likewise.
>   : Likewise.
>   : Likewise.
>   : Likewise.
>   : Likewise.
>   : Likewise.
>   : Likewise.
>   : Likewise.
>   : Likewise.
>   : Likewise.
>   : Likewise.
>   : Likewise.
>   : Use tsubst and tsubst_copy_and_build instead
>   of tsubst_copy.
>   : Copied from tsubst_copy.
>   (tsubst_initializer_list): Use tsubst and tsubst_copy_and_build
>   instead of tsubst_copy.
> ---
>  gcc/cp/cp-tree.h |3 +
>  gcc/cp/pt.cc | 1742 +++---
>  2 files changed, 719 insertions(+), 1026 deletions(-)
> 
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 8b9a7d58462..919eab34803 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ 

Re: [PATCH] contrib/mklog.py: Fix issues reported by flake8

2023-10-03 Thread Jakub Jelinek
On Tue, Oct 03, 2023 at 02:02:40PM +0200, Martin Jambor wrote:
> Hi,
> 
> the testing infrastructure built by Martin Liška includes a check of a
> few Python scripts in contrib with the flake8 tool.  That tool has
> recently started complaining that:
> 
>   contrib/mklog.py:360:45: E711 comparison to None should be 'if cond is None:'
>   contrib/mklog.py:362:1: E305 expected 2 blank lines after class or function definition, found 1
> 
> I'd like to silence these with the following, hopefully trivial,
> changes.  However, I have only tested the changes by running flake8
> again and running ./contrib/mklog.py --help.
> 
> Is this good for trunk?  (Or should I stop using flake8 instead?)
> 
> Thanks,
> 
> Martin
> 
> 
> contrib/ChangeLog:
> 
> 2023-10-03  Martin Jambor  
> 
>   * mklog.py (skip_line_in_changelog): Compare to None using is instead
>   of ==, add an extra newline after the function.

Ok, thanks.

Jakub



[PATCH] contrib/mklog.py: Fix issues reported by flake8

2023-10-03 Thread Martin Jambor
Hi,

the testing infrastructure built by Martin Liška includes a check of a
few Python scripts in contrib with the flake8 tool.  That tool has
recently started complaining that:

  contrib/mklog.py:360:45: E711 comparison to None should be 'if cond is None:'
  contrib/mklog.py:362:1: E305 expected 2 blank lines after class or function definition, found 1

I'd like to silence these with the following, hopefully trivial,
changes.  However, I have only tested the changes by running flake8
again and running ./contrib/mklog.py --help.

Is this good for trunk?  (Or should I stop using flake8 instead?)

Thanks,

Martin


contrib/ChangeLog:

2023-10-03  Martin Jambor  

* mklog.py (skip_line_in_changelog): Compare to None using is instead
of ==, add an extra newline after the function.
---
 contrib/mklog.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/mklog.py b/contrib/mklog.py
index effe5aa1ca5..1c2c3216e9e 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -357,7 +357,8 @@ def update_copyright(data):
 
 
 def skip_line_in_changelog(line):
-    return FIRST_LINE_OF_END_RE.match(line) == None
+    return FIRST_LINE_OF_END_RE.match(line) is None
+
 
 if __name__ == '__main__':
     extra_args = os.getenv('GCC_MKLOG_ARGS')
-- 
2.42.0



Re: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h

2023-10-03 Thread Jakub Jelinek
On Tue, Oct 03, 2023 at 11:41:01AM +, Tamar Christina wrote:
> > We have stablesort method instead of
> > qsort but that would require consistent ordering in the vector (std::sort
> > doesn't ensure stable sorting either).
> > 
> > If it is a non-issue, the patch is ok with the above nits fixed.  Otherwise
> > perhaps we'd need to push in the first loop into the vector (but that
> >   if (!phi_arg_map.get (arg))
> > args.quick_push (arg);
> >   phi_arg_map.get_or_insert (arg).safe_push (i); in there was quite
> > inefficient, better would be
> >   bool existed;
> >   phi_arg_map.get_or_insert (arg, &existed).safe_push (i);
> >   if (!existed)
> > args.safe_push (ifcvt_arg_entry { arg, 0, 0, vNULL }); or something
> > similar), plus use stablesort.  Or add another compared member which would
> > be the first position.
> 
> Hmm the problem here is that it would make the second loop that fills in the
> len quadratic, as it has to search for arg in the list.  I suppose I could
> push a pointer to the struct instead of `i` in the hashmap and the element
> into args, and update the pointer as we go along?  Would that work?

Only if the second loop traverses the hashmap elements and for each tries to
find the corresponding vector element.
If instead you do what you've done before in the second loop, walk the
vector and for each arg in there look up phi_arg_map.get (v.arg) (but please
just once, vanilla trunk looks it up twice in
  for (int index : phi_arg_map.get (args[i]))
{
  edge e = gimple_phi_arg_edge (phi, index);
  len += get_bb_num_predicate_stmts (e->src);
}
  
  unsigned occur = phi_arg_map.get (args[i])->length ();
), then I don't think it would be quadratic.

Jakub
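
For illustration only (not part of the patch or of GCC): a minimal standalone
sketch of the "walk the vector, look each argument up once" loop described
above.  It assumes std::unordered_map and std::vector as stand-ins for
hash_map and auto_vec, a std::string in place of the PHI argument tree, and a
hypothetical edge_cost table in place of get_bb_num_predicate_stmts.  The
names arg_entry, index_map and fill_rankings are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the per-argument record; a std::string plays
// the role of the PHI argument tree.
struct arg_entry
{
  std::string arg;
  unsigned num_compares;
  unsigned occurs;
};

// Maps each unique argument to the PHI indexes where it appears.
using index_map = std::unordered_map<std::string, std::vector<int>>;

// Fill in the ranking fields of every entry in ARGS.  EDGE_COST[i] is an
// assumed stand-in for the predicate cost of the source block of edge i.
// Returns the number of hash lookups performed.
std::size_t
fill_rankings (std::vector<arg_entry> &args, const index_map &phi_arg_map,
               const std::vector<unsigned> &edge_cost)
{
  std::size_t lookups = 0;
  for (arg_entry &entry : args)
    {
      // Look the argument up once and reuse the result for both fields.
      auto it = phi_arg_map.find (entry.arg);
      lookups++;
      if (it == phi_arg_map.end ())
        continue;

      const std::vector<int> &indexes = it->second;
      for (int index : indexes)
        entry.num_compares += edge_cost[index];
      entry.occurs = static_cast<unsigned> (indexes.size ());
    }
  return lookups;
}
```

Each unique argument costs one lookup, and the inner loops together visit
every PHI index exactly once, so the total work in this sketch is linear in
the number of PHI arguments rather than quadratic.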



Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.

2023-10-03 Thread Manolis Tsamis
On Fri, Sep 29, 2023 at 10:22 PM Jeff Law  wrote:
>
>
>
> On 9/12/23 04:13, Manolis Tsamis wrote:
>
> >>> +
> >>> +/* Get the single reaching definition of an instruction inside a BB.
> >>> +   The definition is desired for REG used in INSN.
> >>> +   Return the definition insn or NULL if there's no definition with
> >>> +   the desired criteria.  */
> >>> +static rtx_insn*
> >>> +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> >>> +{
> >>> +  df_ref use;
> >>> +  struct df_link *ref_chain, *ref_link;
> >>> +
> >>> +  FOR_EACH_INSN_USE (use, insn)
> >>> +{
> >>> +  if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> >>> + return NULL;
> >>> +  if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> >>> + break;
> >>> +}
> >>> +
> >>> +  if (!use)
> >>> +return NULL;
> >>> +
> >>> +  ref_chain = DF_REF_CHAIN (use);
> >> So what if there's two uses of REG in INSN?  I don't think it'd be
> >> common at all, but probably better safe and reject than sorry, right? Or
> >> is that case filtered out earlier?
> >>
> > If the REG is the same won't the definitions be the same even if that
> > REG appears multiple times in INSN?
> Yes.
Good, so no issues here.
>
> > fold_offsets_1 should be able to handle the folding with multiple uses
> > of REG just fine, for example add R1, R1 or add (ashift R1, 1), R1.
> > If there's no other issue here I assume we want to keep that as-is in
> > order to not reduce the propagation power (Which I assume is similar
> > to ree which uses the same logic).
> OK.  I was primarily concerned about the folding and rewriting aspects.
> It probably can only show up on targets with LEA like instructions, and
> even on such targets it's probably rare.
>
OK.
>
>
>
> >>> +/* Test if INSN is a memory load / store that can have an offset folded to it.
> >>> +   Return true iff INSN is such an instruction and return through MEM_OUT,
> >>> +   REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
> >>> +   used as a base address and the offset accordingly.
> >>> +   All of the out pointers may be NULL in which case they will be ignored.  */
> >>> +bool
> >>> +get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out,
> >>> +HOST_WIDE_INT *offset_out)
> >>> +{
> >>> +  rtx set = single_set (insn);
> >>> +  rtx mem = NULL_RTX;
> >>> +
> >>> +  if (set != NULL_RTX)
> >>> +{
> >>> +  rtx src = SET_SRC (set);
> >>> +  rtx dest = SET_DEST (set);
> >>> +
> >>> +  /* Don't fold when we have unspec / volatile.  */
> >>> +  if (GET_CODE (src) == UNSPEC
> >>> +   || GET_CODE (src) == UNSPEC_VOLATILE
> >>> +   || GET_CODE (dest) == UNSPEC
> >>> +   || GET_CODE (dest) == UNSPEC_VOLATILE)
> >>> + return false;
> >>> +
> >>> +  if (MEM_P (src))
> >>> + mem = src;
> >>> +  else if (MEM_P (dest))
> >>> + mem = dest;
> >>> +  else if ((GET_CODE (src) == SIGN_EXTEND
> >>> + || GET_CODE (src) == ZERO_EXTEND)
> >>> +&& MEM_P (XEXP (src, 0)))
> >> Note some architectures allow both a source and destination memory.  It
> >> looks like your code will prefer the source operand in that case.
> >> That's fine, just pointing it out.
> >>
> > Thanks for pointing that possibility out. I thought for a moment that
> > this would be a bug with multiple mentions of the address register.
> > but it should be fine due to:
> > /* Special case: A foldable memory store is not foldable if it
> > mentions DEST outside of the address calculation. */
> ACK.
>
> The other thing I keep pondering is autoincrement style addressing.
> Though I think at some point I convinced myself they weren't a problem.
> I think your checks only allow specific kinds of expressions for the
> memory address and I don't think {PRE,POST}_{INC,DEC,MODIFY} were in the
> list of valid ops.
>
Yes, although I haven't considered pre/post-inc/dec, they're indeed
not allowed due to what is allowed to be a root memory operation
(get_fold_mem_root).
I believe these shouldn't be an issue as anything that transitively
affects even one of these will be rejected.

>
> >>> +
> >>> +  int max_iters = 5;
> >>> +  for (int i = 0; i < max_iters; i++)
> >>> +{
> >>> +  bool made_changes = false;
> >>> +  for (fold_info_map::iterator iter = fold_info->begin ();
> >>> +iter != fold_info->end (); ++iter)
> >>> + {
> >>> +   fold_mem_info *info = (*iter).second;
> >>> +   if (bitmap_intersect_p (_fold_insns, info->fold_insns))
> >>> + made_changes |= bitmap_ior_into (_fold_insns,
> >>> +  info->fold_insns);
> >>> + }
> >>> +
> >>> +  if (!made_changes)
> >>> + return true;
> >>> +}
> >>> +
> >>> +  return false;
> >> So how was the magic value of "5" determined here?  In general we try
> >> not to have magic #s like that and instead find a better way to control
> >> iterations, falling back to a PARAM when all else fails.
> >>
> > It's 

[PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

2023-10-03 Thread Manolis Tsamis
This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

  addi t4,sp,16
  add  t2,a6,t4
  shl  t3,t2,1
  ld   a2,0(t3)
  addi a2,1
  sd   a2,8(t2)

into the following (one instruction fewer):

  add  t2,a6,sp
  shl  t3,t2,1
  ld   a2,32(t3)
  addi a2,1
  sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.
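
For illustration only (not part of the patch): a minimal standalone sketch
that models the two instruction sequences above as plain integer arithmetic
and checks that the load and the store touch the same addresses before and
after folding.  The names accesses, before_fold, after_fold and same_accesses
are hypothetical, and registers are modelled as 64-bit unsigned integers.

```cpp
#include <cassert>
#include <cstdint>

// Addresses touched by the load and the store in one instruction sequence.
struct accesses
{
  std::uint64_t load_addr;
  std::uint64_t store_addr;
};

// Original sequence: the constant 16 is added to sp up front.
accesses
before_fold (std::uint64_t sp, std::uint64_t a6)
{
  std::uint64_t t4 = sp + 16;   // addi t4,sp,16
  std::uint64_t t2 = a6 + t4;   // add  t2,a6,t4
  std::uint64_t t3 = t2 << 1;   // shl  t3,t2,1
  return { t3 + 0, t2 + 8 };    // ld a2,0(t3) and sd a2,8(t2)
}

// Folded sequence: the addi is gone and its constant moved into the
// offsets of the memory accesses.
accesses
after_fold (std::uint64_t sp, std::uint64_t a6)
{
  std::uint64_t t2 = a6 + sp;   // add  t2,a6,sp
  std::uint64_t t3 = t2 << 1;   // shl  t3,t2,1
  return { t3 + 32, t2 + 24 };  // ld a2,32(t3) and sd a2,24(t2)
}

// True if both sequences access the same two addresses.
bool
same_accesses (std::uint64_t sp, std::uint64_t a6)
{
  accesses b = before_fold (sp, a6);
  accesses a = after_fold (sp, a6);
  return b.load_addr == a.load_addr && b.store_addr == a.store_addr;
}
```

The load offset becomes 32 because the folded constant 16 passes through the
shift by one (16 << 1 = 32), while the store offset becomes 8 + 16 = 24
because t2 is used unshifted.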

gcc/ChangeLog:

* Makefile.in: Add fold-mem-offsets.o.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_fold_mem_offsets): Declare.
* common.opt: New options.
* doc/invoke.texi: Document new option.
* fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.

Signed-off-by: Manolis Tsamis 
---

Changes in v6:
- Fix formatting issues.
- Compute maximum validity iterations based on
  flag_expensive_optimizations.

Changes in v5:
- Introduce new helper function fold_offsets_1.
- Fix bug because constants could be partially propagated
  through instructions that weren't understood.
- Introduce helper class fold_mem_info that stores f-m-o
  info for an instruction.
- Calculate fold_offsets only once with do_fold_info_calculation.
- Fix correctness issue by introducing compute_validity_closure.
- Propagate in more cases for PLUS/MINUS with constant.

Changes in v4:
- Add DF_EQ_NOTES flag to avoid incorrect state in notes.
- Remove fold_mem_offsets_driver and enum fold_mem_phase.
- Call recog when patching offsets in do_commit_offset.
- Restore INSN_CODE after modifying insn in do_check_validity.

Changes in v3:
- Added propagation for more codes:
  sub, neg, mul.
- Added folding / elimination for sub and
  const int moves.
- For the validity check of the generated addresses
  also test memory_address_addr_space_p.
- Replaced GEN_INT with gen_int_mode.
- Replaced some bitmap_head with auto_bitmap.
- Refactor each phase into own function for readability.
- Add dump details.
- Replace rtx iteration with reg_mentioned_p.
- Return early for codes that we can't propagate through.

Changes in v2:
- Made the pass target-independent instead of RISCV specific.
- Fixed a number of bugs.
- Add code to handle more ADD patterns as found
  in other targets (x86, aarch64).
- Improved naming and comments.
- Fixed bitmap memory leak.

 gcc/Makefile.in   |   1 +
 gcc/common.opt|   4 +
 gcc/doc/invoke.texi   |   8 +
 gcc/fold-mem-offsets.cc   | 900 ++
 gcc/passes.def|   1 +
 .../gcc.target/riscv/fold-mem-offsets-1.c |  16 +
 .../gcc.target/riscv/fold-mem-offsets-2.c |  24 +
 .../gcc.target/riscv/fold-mem-offsets-3.c |  17 +
 gcc/tree-pass.h   |   1 +
 9 files changed, 972 insertions(+)
 create mode 100644 gcc/fold-mem-offsets.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9cc16268abf..747f749538d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1443,6 +1443,7 @@ OBJS = \
fixed-value.o \
fold-const.o \
fold-const-call.o \
+   fold-mem-offsets.o \
function.o \
function-abi.o \
function-tests.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index f137a1f81ac..b103b8d28ed 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1252,6 +1252,10 @@ fcprop-registers
 Common Var(flag_cprop_registers) Optimization
 Perform a register copy-propagation optimization pass.
 
+ffold-mem-offsets
+Target Bool Var(flag_fold_mem_offsets) Init(1)
+Fold instructions calculating memory offsets to the memory access instruction if possible.
+
 fcrossjumping
 Common Var(flag_crossjumping) Optimization
 Perform cross-jumping optimization.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4085fc90907..3ba28d6c4e5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -543,6 +543,7 @@ Objective-C and Objective-C++ Dialects}.
 -fauto-inc-dec  -fbranch-probabilities
 -fcaller-saves
 -fcombine-stack-adjustments  -fconserve-stack
+-ffold-mem-offsets
 

RE: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h

2023-10-03 Thread Tamar Christina
> -Original Message-
> From: Jakub Jelinek 
> Sent: Tuesday, October 3, 2023 12:02 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; jwak...@redhat.com
> Subject: Re: [PATCH]middle-end: Recursively check
> is_trivially_copyable_or_pair in vec.h
> 
> On Tue, Oct 03, 2023 at 10:27:16AM +, Tamar Christina wrote:
> > +/* Structure used to track meta-data on PHI arguments used to generate
> > +   most efficient comparison sequence to slatten a PHI node.  */
> 
> ^^^ typo (at least, never heard of this word, and
> wiktionary doesn't know it either (except for Danish/Swedish))
> 
> > @@ -2045,6 +2065,25 @@ gen_phi_nest_statement (gphi *phi, gimple_stmt_iterator *gsi,
> >return lhs;
> >  }
> >
> 
> Perhaps add a short function comment here?
> 
> > +static int
> > +cmp_arg_entry (const void *p1, const void *p2) {
> > +  const ifcvt_arg_entry sval1 = *(const ifcvt_arg_entry *)p1;
> > +  const ifcvt_arg_entry sval2 = *(const ifcvt_arg_entry *)p2;
> > +
> > +  if (sval1.num_compares < sval2.num_compares)
> > +return -1;
> > +  else if (sval1.num_compares > sval2.num_compares)
> > +return 1;
> > +
> > +  if (sval1.occurs < sval2.occurs)
> > +return -1;
> > +  else if (sval1.occurs > sval2.occurs)
> > +return 1;
> > +
> > +  return 0;
> > +}
> > +
> 
> > @@ -2167,61 +2206,53 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
> >/* Create hashmap for PHI node which contain vector of argument indexes
> >   having the same value.  */
> >bool swap = false;
> > -  hash_map > phi_arg_map;
> > +  hash_map > phi_arg_map;
> >unsigned int num_args = gimple_phi_num_args (phi);
> >/* Vector of different PHI argument values.  */
> > -  auto_vec args (num_args);
> > +  auto_vec args;
> >
> > -  /* Compute phi_arg_map.  */
> > +  /* Compute phi_arg_map, determine the list of unique PHI args and the indices
> > + where they are in the PHI node.  The indices will be used to determine
> > + the conditions to apply and their complexity.  */
> >for (i = 0; i < num_args; i++)
> >  {
> >tree arg;
> >
> >arg = gimple_phi_arg_def (phi, i);
> > -  if (!phi_arg_map.get (arg))
> > -   args.quick_push (arg);
> >phi_arg_map.get_or_insert (arg).safe_push (i);
> >  }
> >
> > -  /* Determine element with max number of occurrences and complexity.  Looking at only
> > - number of occurrences as a measure for complexity isn't enough as all usages can
> > - be unique but the comparisons to reach the PHI node differ per branch.  */
> > -  typedef std::pair > ArgEntry;
> > -  auto_vec argsKV;
> > -  for (i = 0; i < args.length (); i++)
> > +  /* Determine element with max number of occurrences and complexity.  Looking
> > + at only number of occurrences as a measure for complexity isn't enough as
> > + all usages can be unique but the comparisons to reach the PHI node differ
> > + per branch.  */
> > +  for (auto entry : phi_arg_map)
> >  {
> >unsigned int len = 0;
> > -  for (int index : phi_arg_map.get (args[i]))
> > +  for (int index : entry.second)
> > {
> >   edge e = gimple_phi_arg_edge (phi, index);
> >   len += get_bb_num_predicate_stmts (e->src);
> > }
> >
> > -  unsigned occur = phi_arg_map.get (args[i])->length ();
> > +  unsigned occur = entry.second.length ();
> >if (dump_file && (dump_flags & TDF_DETAILS))
> > fprintf (dump_file, "Ranking %d as len=%d, idx=%d\n", i, len, occur);
> > -  argsKV.safe_push ({ args[i], { len, occur }});
> > +  args.safe_push ({ entry.first, len, occur, entry.second });
> >  }
> >
> >/* Sort elements based on rankings ARGS.  */
> > -  std::sort(argsKV.begin(), argsKV.end(), [](const ArgEntry &left,
> > -                                             const ArgEntry &right) {
> > -return left.second < right.second;
> > -  });
> > -
> > -  for (i = 0; i < args.length (); i++)
> > -args[i] = argsKV[i].first;
> > +  args.qsort (cmp_arg_entry);
> 
> I admit I don't know what you're using the args vector later on for and
> whether its ordering affects code generation, but because you qsort it I
> assume it does. My worry is that a hash_map traversal might not be the same
> order on all hosts and similarly qsort doesn't achieve stable sorting in case
> num_compares and occurs members are equal for two or more different
> arguments.  Can that ever happen?

The order does matter, but only for args.  The hashmap is only used to collect
the unique values and their locations.  While num_compares and occurs can be
the same for two entries, that wouldn't matter for the optimization, as both
are equally expensive.  It would matter when they are the last two entries in
the list, as we never test the last entry: codegen would then select a
different element to test, which could potentially affect reproducibility.

> We have stablesort method 

Re: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h

2023-10-03 Thread Jakub Jelinek
On Tue, Oct 03, 2023 at 10:27:16AM +, Tamar Christina wrote:
> +/* Structure used to track meta-data on PHI arguments used to generate
> +   most efficient comparison sequence to slatten a PHI node.  */

^^^ typo (at least, never heard
of this word, and wiktionary doesn't know it either (except for
Danish/Swedish))

> @@ -2045,6 +2065,25 @@ gen_phi_nest_statement (gphi *phi, gimple_stmt_iterator *gsi,
>return lhs;
>  }
>  

Perhaps add a short function comment here?

> +static int
> +cmp_arg_entry (const void *p1, const void *p2)
> +{
> +  const ifcvt_arg_entry sval1 = *(const ifcvt_arg_entry *)p1;
> +  const ifcvt_arg_entry sval2 = *(const ifcvt_arg_entry *)p2;
> +
> +  if (sval1.num_compares < sval2.num_compares)
> +return -1;
> +  else if (sval1.num_compares > sval2.num_compares)
> +return 1;
> +
> +  if (sval1.occurs < sval2.occurs)
> +return -1;
> +  else if (sval1.occurs > sval2.occurs)
> +return 1;
> +
> +  return 0;
> +}
> +

> @@ -2167,61 +2206,53 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
>/* Create hashmap for PHI node which contain vector of argument indexes
>   having the same value.  */
>bool swap = false;
> -  hash_map > phi_arg_map;
> +  hash_map > phi_arg_map;
>unsigned int num_args = gimple_phi_num_args (phi);
>/* Vector of different PHI argument values.  */
> -  auto_vec args (num_args);
> +  auto_vec args;
>  
> -  /* Compute phi_arg_map.  */
> +  /* Compute phi_arg_map, determine the list of unique PHI args and the indices
> + where they are in the PHI node.  The indices will be used to determine
> + the conditions to apply and their complexity.  */
>for (i = 0; i < num_args; i++)
>  {
>tree arg;
>  
>arg = gimple_phi_arg_def (phi, i);
> -  if (!phi_arg_map.get (arg))
> - args.quick_push (arg);
>phi_arg_map.get_or_insert (arg).safe_push (i);
>  }
>  
> -  /* Determine element with max number of occurrences and complexity.  Looking at only
> - number of occurrences as a measure for complexity isn't enough as all usages can
> - be unique but the comparisons to reach the PHI node differ per branch.  */
> -  typedef std::pair > ArgEntry;
> -  auto_vec argsKV;
> -  for (i = 0; i < args.length (); i++)
> +  /* Determine element with max number of occurrences and complexity.  Looking
> + at only number of occurrences as a measure for complexity isn't enough as
> + all usages can be unique but the comparisons to reach the PHI node differ
> + per branch.  */
> +  for (auto entry : phi_arg_map)
>  {
>unsigned int len = 0;
> -  for (int index : phi_arg_map.get (args[i]))
> +  for (int index : entry.second)
>   {
> edge e = gimple_phi_arg_edge (phi, index);
> len += get_bb_num_predicate_stmts (e->src);
>   }
>  
> -  unsigned occur = phi_arg_map.get (args[i])->length ();
> +  unsigned occur = entry.second.length ();
>if (dump_file && (dump_flags & TDF_DETAILS))
>   fprintf (dump_file, "Ranking %d as len=%d, idx=%d\n", i, len, occur);
> -  argsKV.safe_push ({ args[i], { len, occur }});
> +  args.safe_push ({ entry.first, len, occur, entry.second });
>  }
>  
>/* Sort elements based on rankings ARGS.  */
> -  std::sort(argsKV.begin(), argsKV.end(), [](const ArgEntry &left,
> -                                           const ArgEntry &right) {
> -return left.second < right.second;
> -  });
> -
> -  for (i = 0; i < args.length (); i++)
> -args[i] = argsKV[i].first;
> +  args.qsort (cmp_arg_entry);

I admit I don't know what you're using the args vector later on for and
whether its ordering affects code generation, but because you qsort it I
assume it does.  My worry is that a hash_map traversal might not be the same
order on all hosts and similarly qsort doesn't achieve stable sorting
in case num_compares and occurs members are equal for two or more different
arguments.  Can that ever happen?  We have stablesort method instead of
qsort but that would require consistent ordering in the vector (std::sort
doesn't ensure stable sorting either).

If it is a non-issue, the patch is ok with the above nits fixed.  Otherwise
perhaps we'd need to push in the first loop into the vector (but that
  if (!phi_arg_map.get (arg))
args.quick_push (arg);
  phi_arg_map.get_or_insert (arg).safe_push (i);
in there was quite inefficient, better would be
  bool existed;
  phi_arg_map.get_or_insert (arg, &existed).safe_push (i);
  if (!existed)
args.safe_push (ifcvt_arg_entry { arg, 0, 0, vNULL });
or something similar), plus use stablesort.  Or add another compared member
which would be the first position.

Jakub
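
For illustration only (not GCC code): a minimal standalone sketch of the
suggestion above, i.e. pushing each unique argument into the vector the first
time it is seen and then sorting with a stable sort, so that ties keep their
first-seen order regardless of how the hash table iterates.  It assumes
std::unordered_map, std::vector and std::stable_sort as stand-ins for
hash_map, auto_vec and the stablesort method, and a std::string in place of
the PHI argument tree.  The names arg_entry, collect_unique_args and
rank_args are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for ifcvt_arg_entry.
struct arg_entry
{
  std::string arg;
  unsigned num_compares;
  unsigned occurs;
};

// First loop: record the PHI indexes of every argument in PHI_ARG_MAP and
// return the unique arguments in first-seen order.
std::vector<arg_entry>
collect_unique_args (const std::vector<std::string> &phi_args,
                     std::unordered_map<std::string,
                                        std::vector<int>> &phi_arg_map)
{
  std::vector<arg_entry> args;
  for (std::size_t i = 0; i < phi_args.size (); i++)
    {
      // A single insert-or-find; ins.second is true for a new key, which
      // plays the role of the "existed" out-parameter being false.
      auto ins = phi_arg_map.emplace (phi_args[i], std::vector<int> ());
      bool existed = !ins.second;
      ins.first->second.push_back (static_cast<int> (i));
      if (!existed)
        args.push_back ({ phi_args[i], 0, 0 });
    }
  return args;
}

// Sort by (num_compares, occurs).  std::stable_sort keeps entries that
// compare equal in their existing, first-seen order.
void
rank_args (std::vector<arg_entry> &args)
{
  std::stable_sort (args.begin (), args.end (),
                    [] (const arg_entry &a, const arg_entry &b) -> bool
                    {
                      if (a.num_compares != b.num_compares)
                        return a.num_compares < b.num_compares;
                      return a.occurs < b.occurs;
                    });
}
```

Because the vector is filled in PHI order rather than hash-table order, the
input to the sort is the same on every host, and the stable sort then makes
the final ranking deterministic even when two entries have equal keys.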



Re: [PATCH V2] Emit funcall external declarations only if actually used.

2023-10-03 Thread Jose E. Marchesi


ping

> ping
>
>> [Differences from V1:
>> - Prototype for call_from_call_insn moved before comment block.
>> - Reuse the `call' flag for SYMBOL_REF_LIBCALL.
>> - Fallback to check REG_CALL_DECL in non-direct calls.
>> - New test to check correct behavior for non-direct calls.]
>>
>> There are many places in GCC where alternative local sequences are
>> tried in order to determine what is the cheapest or best alternative
>> to use in the current target.  When any of these sequences involve a
>> libcall, the current implementation of emit_library_call_value_1
>> introduces a side effect consisting of emitting an external declaration
>> for the funcall (such as __divdi3), which is thus emitted even if the
>> sequence that does the libcall is not retained.
>>
>> This is problematic in targets such as BPF, because the kernel loader
>> chokes on the spurious symbol __divdi3 and makes the resulting BPF
>> object unloadable.  Note that BPF objects are not linked before being
>> loaded.
>>
>> This patch changes emit_library_call_value_1 to mark the target
>> SYMBOL_REF as a libcall.  Then, the emission of the external
>> declaration is done in the first loop of final.cc:shorten_branches.
>> This happens only if the corresponding sequence has been kept.
>>
>> Regtested in x86_64-linux-gnu.
>> Tested with host x86_64-linux-gnu with target bpf-unknown-none.
>>
>> gcc/ChangeLog
>>
>>  * rtl.h (SYMBOL_REF_LIBCALL): Define.
>>  * calls.cc (emit_library_call_value_1): Do not emit external
>>  libcall declaration here.
>>  * final.cc (shorten_branches): Do it here.
>>
>> gcc/testsuite/ChangeLog
>>
>>  * gcc.target/bpf/divmod-libcall-1.c: New test.
>>  * gcc.target/bpf/divmod-libcall-2.c: Likewise.
>>  * gcc.c-torture/compile/libcall-2.c: Likewise.
>> ---
>>  gcc/calls.cc  |  9 +++---
>>  gcc/final.cc  | 30 +++
>>  gcc/rtl.h |  5 
>>  .../gcc.c-torture/compile/libcall-2.c |  8 +
>>  .../gcc.target/bpf/divmod-libcall-1.c | 19 
>>  .../gcc.target/bpf/divmod-libcall-2.c | 16 ++
>>  6 files changed, 83 insertions(+), 4 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>>  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>>
>> diff --git a/gcc/calls.cc b/gcc/calls.cc
>> index 1f3a6d5c450..219ea599b16 100644
>> --- a/gcc/calls.cc
>> +++ b/gcc/calls.cc
>> @@ -4388,9 +4388,10 @@ emit_library_call_value_1 (int retval, rtx orgfun, 
>> rtx value,
>>  || argvec[i].partial != 0)
>>    update_stack_alignment_for_call (&argvec[i].locate);
>>  
>> -  /* If this machine requires an external definition for library
>> - functions, write one out.  */
>> -  assemble_external_libcall (fun);
>> +  /* Mark the emitted target as a libcall.  This will be used by final
>> + in order to emit an external symbol declaration if the libcall is
>> + ever used.  */
>> +  SYMBOL_REF_LIBCALL (fun) = 1;
>>  
>>original_args_size = args_size;
>>args_size.constant = (aligned_upper_bound (args_size.constant
>> @@ -4735,7 +4736,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx 
>> value,
>> valreg,
>> old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far);
>>  
>> -  if (flag_ipa_ra)
>> +  if (flag_ipa_ra || SYMBOL_REF_LIBCALL (orgfun))
>>  {
>>rtx datum = orgfun;
>>gcc_assert (GET_CODE (datum) == SYMBOL_REF);
>> diff --git a/gcc/final.cc b/gcc/final.cc
>> index dd3e22547ac..2041e43fdd1 100644
>> --- a/gcc/final.cc
>> +++ b/gcc/final.cc
>> @@ -804,6 +804,8 @@ make_pass_compute_alignments (gcc::context *ctxt)
>>  }
>>  
>>  
>> +static rtx call_from_call_insn (rtx_call_insn *insn);
>> +
>>  /* Make a pass over all insns and compute their actual lengths by shortening
>> any branches of variable length if possible.  */
>>  
>> @@ -850,6 +852,34 @@ shorten_branches (rtx_insn *first)
>>for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn))
>>  {
>>INSN_SHUID (insn) = i++;
>> +
>> +  /* If this is a `call' instruction implementing a libcall, and
>> + this machine requires an external definition for library
>> + functions, write one out.  */
>> +  if (CALL_P (insn))
>> +{
>> +  rtx x;
>> +
>> +  if ((x = call_from_call_insn (dyn_cast <rtx_call_insn *> (insn)))
>> +  && (x = XEXP (x, 0))
>> +  && MEM_P (x)
>> +  && (x = XEXP (x, 0))
>> +  && SYMBOL_REF_P (x)
>> +  && SYMBOL_REF_LIBCALL (x))
>> +{
>> +  /* Direct call.  */
>> +  assemble_external_libcall (x);
>> +}
>> +  else if ((x = find_reg_note (insn, REG_CALL_DECL, NULL_RTX))
>> +   && (x = XEXP (x, 0)))
>> +{
>> +   
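
The idea behind the libcall patch above (mark the symbol when a candidate sequence is built, and emit the external declaration only for calls that survive to the final pass) can be sketched as a toy model.  This is only an illustration under stated assumptions: Symbol, CallInsn, make_libcall and declarations_for are hypothetical stand-ins, not GCC's SYMBOL_REF, rtx_call_insn or assemble_external_libcall.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a SYMBOL_REF: a name plus a "libcall" mark.
struct Symbol {
  std::string name;
  bool is_libcall = false;
};

// Hypothetical stand-in for a call insn; callee may be null when the
// call target is not a known symbol.
struct CallInsn {
  Symbol *callee = nullptr;
};

// Building a candidate sequence only marks the symbol.  Nothing is
// declared at this point.
CallInsn make_libcall(Symbol &sym) {
  sym.is_libcall = true;
  return CallInsn{&sym};
}

// Walk the insns that were actually kept and collect the external
// declarations needed for marked callees.
std::set<std::string> declarations_for(const std::vector<CallInsn> &kept) {
  std::set<std::string> declared;
  for (const CallInsn &insn : kept)
    if (insn.callee != nullptr && insn.callee->is_libcall)
      declared.insert(insn.callee->name);
  return declared;
}

// A sequence calling __divdi3 is tried and then discarded, while a
// sequence calling __moddi3 is kept.
std::set<std::string> run_example() {
  Symbol divdi3{"__divdi3"};
  Symbol moddi3{"__moddi3"};

  std::vector<CallInsn> discarded = {make_libcall(divdi3)};
  discarded.clear();          // the tentative sequence was not retained
  assert(divdi3.is_libcall);  // still marked, but never declared

  std::vector<CallInsn> kept = {make_libcall(moddi3)};
  return declarations_for(kept);
}
```

In this model the discarded __divdi3 sequence leaves no declaration behind, which is the property the BPF loader needs.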

Re: [PATCH 01/12] [contrib] validate_failures.py: Avoid testsuite aliasing

2023-10-03 Thread rep . dot . nop
On 27 September 2023 16:47:27 CEST, Maxim Kuvyrkov  
wrote:
>Hi Bernhard,
>
>Thanks, I meant to fix this, but forgot.

np.

>The underlying problem here is that we want to detect which sub-testsuites had 
>failures.  Current regex doesn't match go's case because there is no "..." at 
>the end: "Running foo" vs "Running foo ..." .
>
>My preferred way of fixing this is to make go's testsuite print out "..." .  
>We have a similar patch for glibc [1].
>
>[1] https://sourceware.org/pipermail/libc-alpha/2023-June/148702.html

Which asks:
---8<---
>> WDYT?
> 
> I looked at the gcc-testresults mailing list, and there appear no
> === … failures === lines at all?  What was the motivation for adding it
> in the first place?

The only motivation is that it looks like a nice header for the following 
FAILs.  What's your preference for the line -- drop it entirely or print out:

=== glibc failures ===
no unexpected failures

?
---8<---

I'd drop the above entirely if there are no failures; it's pretty
superfluous, isn't it?

And concerning gotools and the missing trailing ".exp ...", I guess it's fine
to add that to bring the gotools output in line with all the other existing
sum output.
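
To illustrate the mismatch discussed above, here is a minimal sketch of a strict versus a tolerant match for "Running ..." lines.  The patterns and the sample line "Running cmd/go" are assumptions made for illustration; they are not taken from validate_failures.py or from real gotools output, and the thread's preferred fix is to change the gotools output rather than the parser.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Strict form: requires the trailing " ..." seen in the other .sum files.
std::optional<std::string> exp_strict(const std::string &line) {
  static const std::regex re(R"(^Running (.*) \.\.\.$)");
  std::smatch m;
  if (std::regex_match(line, m, re))
    return m[1].str();
  return std::nullopt;
}

// Tolerant form: the trailing " ..." is optional.  The lazy group keeps
// the suffix out of the captured name when it is present.
std::optional<std::string> exp_tolerant(const std::string &line) {
  static const std::regex re(R"(^Running (.*?)(?: \.\.\.)?$)");
  std::smatch m;
  if (std::regex_match(line, m, re))
    return m[1].str();
  return std::nullopt;
}

// Returns true when a line without the suffix is rejected by the strict
// pattern but accepted by the tolerant one.
bool suffixless_line_needs_tolerant_match() {
  const std::string line = "Running cmd/go";  // hypothetical sample line
  return !exp_strict(line).has_value() && exp_tolerant(line).has_value();
}
```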

TIA,


>
>--
>Maxim Kuvyrkov
>https://www.linaro.org
>
>> On Sep 26, 2023, at 19:46, Bernhard Reutner-Fischer  
>> wrote:
>> 
>> Hi Maxim!
>> 
>> On Mon, 5 Jun 2023 18:06:25 +0400
>> Maxim Kuvyrkov via Gcc-patches  wrote:
>> 
>>>> On Jun 3, 2023, at 19:17, Jeff Law  wrote:
>>>>
>>>> On 6/2/23 09:20, Maxim Kuvyrkov via Gcc-patches wrote:
>>>>> This patch adds tracking of current testsuite "tool" and "exp"
>>>>> to the processing of .sum files.  This avoids aliasing between
>>>>> tests from different testsuites with same name+description.
>>>>> E.g., this is necessary for testsuite/c-c++-common, which is ran
>>>>> for both gcc and g++ "tools".
>>>>> This patch changes manifest format from ...
>>>>>
>>>>> FAIL: gcc_test
>>>>> FAIL: g++_test
>>>>>
>>>>> ... to ...
>>>>>
>>>>> === gcc tests ===
>>>>> Running gcc/foo.exp ...
>>>>> FAIL: gcc_test
>>>>> === gcc Summary ==
>>>>> === g++ tests ===
>>>>> Running g++/bar.exp ...
>>>>> FAIL: g++_test
>>>>> === g++ Summary ==
>>>>> .
>>>>> The new format uses same formatting as DejaGnu's .sum files
>>>>> to specify which "tool" and "exp" the test belongs to.
>>>> I think the series is fine.  You're not likely to hear from Diego or Doug
>>>> I suspect, I don't think either are involved in GNU stuff anymore.
>>>>
>>> 
>>> Thanks, Jeff.  I'll wait for a couple of days and will merge if there are 
>>> no new comments.
>> 
>> Maxim, may i ask you to have a look at the following problem, please?
>> 
>> ISTM that your exp code does not work as expected for go, maybe you
>> forgot to test the changes with go enabled?
>> 
>> Ever since your changes in summer i see the following:
>> 
>> gcc-14.mine$ 
>> /scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py 
>> --clean_build ../gcc-14.orig/
>> Getting actual results from build directory .
>> ./gcc/testsuite/go/go.sum
>> ./gcc/testsuite/gcc/gcc.sum
>> ./gcc/testsuite/objc/objc.sum
>> ./gcc/testsuite/jit/jit.sum
>> ./gcc/testsuite/gdc/gdc.sum
>> ./gcc/testsuite/gnat/gnat.sum
>> ./gcc/testsuite/ada/acats/acats.sum
>> ./gcc/testsuite/g++/g++.sum
>> ./gcc/testsuite/obj-c++/obj-c++.sum
>> ./gcc/testsuite/rust/rust.sum
>> ./gcc/testsuite/gfortran/gfortran.sum
>> ./x86_64-pc-linux-gnu/libgomp/testsuite/libgomp.sum
>> ./x86_64-pc-linux-gnu/libphobos/testsuite/libphobos.sum
>> ./x86_64-pc-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum
>> ./x86_64-pc-linux-gnu/libffi/testsuite/libffi.sum
>> ./x86_64-pc-linux-gnu/libitm/testsuite/libitm.sum
>> ./x86_64-pc-linux-gnu/libgo/libgo.sum
>> ./x86_64-pc-linux-gnu/libatomic/testsuite/libatomic.sum
>> ./gotools/gotools.sum
>> .sum file seems to be broken: tool="gotools", exp="None", 
>> summary_line="FAIL: TestScript"
>> Traceback (most recent call last):
>>  File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 732, in <module>
>>retval = Main(sys.argv)
>>  File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 721, in Main
>>retval = CompareBuilds()
>>  File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 622, in CompareBuilds
>>actual = GetResults(sum_files)
>>  File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 466, in GetResults
>>build_results.update(ParseSummary(sum_fname))
>>  File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 405, in ParseSummary
>>result = result_set.MakeTestResult(line, ordinal)
>>  File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 239, in MakeTestResult
>>return TestResult(summary_line, ordinal,
>>  File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 151, 

RE: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h

2023-10-03 Thread Tamar Christina
> -Original Message-
> From: Jakub Jelinek 
> Sent: Monday, October 2, 2023 2:21 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; jwak...@redhat.com
> Subject: Re: [PATCH]middle-end: Recursively check
> is_trivially_copyable_or_pair in vec.h
> 
> On Mon, Oct 02, 2023 at 01:38:53PM +0100, Tamar Christina wrote:
> > Hi All,
> >
> > I recently committed a patch that uses a nested std::pair in the second
> argument.
> > It temporarily adds a second ranking variable for sorting and then later 
> > drops
> it.
> >
> > This hits the newly added assert in vec.h.  This assert made some
> > relaxation for std::pair but doesn't allow this case through.  The
> > patch allows a recursive std::pair in the second argument which fixes
> bootstrap.
> 
> I must say I still don't understand why using a struct ifcvt_arg_entry { tree 
> arg;
> unsigned len, occur; }; with comments describing what the members mean
> wouldn't be a better fix, in the sorting function what exactly means
> x{1,2}.second.first and x{1,2}.second.second isn't easily understandable,
> neither from the identifiers nor from any comments.
> Seems because you use 2 separate vectors, one with just tree elements and
> another with those tree elements + 2 unsigned values cached from it for the
> sorting purpose and then rewrite the original tree vector after sorting, I 
> don't
> really see why nested std::pair would be a better match for it than a named
> structure.  Furthermore, why populate args first, then compute the extra 2
> integers in another loop pushing to argsKV and finally overwrite args with
> sorted values?  Can't the first loop push tree with the 2 integers already?
> what is the point of not using this structure later on when both args and
> argsKV vectors are live until the end of the same function?
> Can't you either pass that argsKV to others, having just one vector, or at 
> least
> release the other vector when you don't really need it?
> Formatting style, swap? arg1 : arg0 isn't correctly formatted, missing space
> before ?.
> 
> Also, ArgEntry is CamelCase which we (usually) don't use in GCC and
> additionally doesn't seem to be unique enough for ODR purposes.
> Ditto argsKV.
> 

Hi All,

This refactors the code to remove the args cache and index lookups
in favor of a single structure.  It also, again, removes the use of
std::sort as previously requested, while avoiding the new asserts on
trunk.

Bootstrapped and regtested on aarch64-none-linux-gnu and
x86_64-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-if-conv.cc (typedef struct ifcvt_arg_entry): New.
(cmp_arg_entry): New.
(gen_phi_arg_condition, gen_phi_nest_statement,
predicate_scalar_phi): Use them.
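
As a rough illustration of the review point (a named structure with a qsort-style comparator instead of a nested std::pair), here is a minimal standalone sketch.  The names arg_entry and sort_args are hypothetical, `arg` is an int rather than a tree, and the sort order (fewest occurrences first, ties broken by fewest compares) is an assumption made for the example, not necessarily what cmp_arg_entry in the patch does.

```cpp
#include <cassert>
#include <cstdlib>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the patch's ifcvt_arg_entry.
struct arg_entry {
  int arg;                // stands in for the PHI argument (a tree in GCC)
  unsigned num_compares;  // compares needed to reach this argument
  unsigned occurs;        // how many times the argument appears in the PHI
};

// A plain struct of scalars is trivially copyable, so it is safe to move
// around with a qsort-style sort.
static_assert(std::is_trivially_copyable<arg_entry>::value,
              "arg_entry must be trivially copyable");

// qsort-style comparator.  Assumed order: fewer occurrences first, then
// fewer compares.
int cmp_arg_entry(const void *p1, const void *p2) {
  const arg_entry *a = static_cast<const arg_entry *>(p1);
  const arg_entry *b = static_cast<const arg_entry *>(p2);
  if (a->occurs != b->occurs)
    return a->occurs < b->occurs ? -1 : 1;
  if (a->num_compares != b->num_compares)
    return a->num_compares < b->num_compares ? -1 : 1;
  return 0;
}

// Sort the entries in place.
void sort_args(std::vector<arg_entry> &args) {
  if (!args.empty())
    std::qsort(args.data(), args.size(), sizeof(arg_entry), cmp_arg_entry);
}

// Small usage example: returns the `arg` of the first entry after sorting.
int first_arg_after_sort() {
  std::vector<arg_entry> args = {{10, 3, 2}, {20, 1, 1}, {30, 2, 1}};
  sort_args(args);
  assert(args.size() == 3);
  return args[0].arg;
}
```

With named members the comparator reads as "occurs, then num_compares", whereas the pair-based version would have to spell this as second.first and second.second.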

--- inline copy of patch 

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 
a8c915913aed267edfb3ebd2c530aeca7cf51832..f7037bd42494b3982d2efd593ba276812b8d2f4f
 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1927,11 +1927,32 @@ gen_simplified_condition (tree cond, 
scalar_cond_masked_set_type _set)
   return cond;
 }
 
+/* Structure used to track meta-data on PHI arguments used to generate
+   most efficient comparison sequence to flatten a PHI node.  */
+
+typedef struct ifcvt_arg_entry
+{
+  /* The PHI node argument value.  */
+  tree arg;
+
+  /* The number of compares required to reach this PHI node from start of the
+ BB being if-converted.  */
+  unsigned num_compares;
+
+  /* The number of times this PHI node argument appears in the current PHI
+ node.  */
+  unsigned occurs;
+
+  /* The indices at which this PHI arg occurs inside the PHI node.  */
+  vec  indexes;
+} ifcvt_arg_entry_t;
+
 /* Produce condition for all occurrences of ARG in PHI node.  Set *INVERT
as to whether the condition is inverted.  */
 
 static tree
-gen_phi_arg_condition (gphi *phi, vec *occur, gimple_stmt_iterator *gsi,
+gen_phi_arg_condition (gphi *phi, ifcvt_arg_entry_t &arg,
+  gimple_stmt_iterator *gsi,
   scalar_cond_masked_set_type _set, bool *invert)
 {
   int len;
@@ -1941,11 +1962,11 @@ gen_phi_arg_condition (gphi *phi, vec *occur, 
gimple_stmt_iterator *gsi,
   edge e;
 
   *invert = false;
-  len = occur->length ();
+  len = arg.indexes.length ();
   gcc_assert (len > 0);
   for (i = 0; i < len; i++)
 {
-  e = gimple_phi_arg_edge (phi, (*occur)[i]);
+  e = gimple_phi_arg_edge (phi, arg.indexes[i]);
   c = bb_predicate (e->src);
   if (is_true_predicate (c))
{
@@ -2010,22 +2031,21 @@ gen_phi_arg_condition (gphi *phi, vec *occur, 
gimple_stmt_iterator *gsi,
 static tree
 gen_phi_nest_statement (gphi *phi, gimple_stmt_iterator *gsi,
scalar_cond_masked_set_type _set, tree type,
-   hash_map> _arg_map,
-   gimple **res_stmt, tree lhs0, vec ,
-   unsigned idx)
+   gimple **res_stmt, tree lhs0,
+   

[PING^2][PATCH v2] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-10-03 Thread Surya Kumari Jangala
Ping

On 20/09/23 7:31 am, Surya Kumari Jangala wrote:
> Ping
> 
> On 10/09/23 10:58 pm, Surya Kumari Jangala wrote:
>> swap: Fix incorrect lane extraction by vec_extract() [PR106770]
>>
>> In the routine rs6000_analyze_swaps(), special handling of swappable
>> instructions is done even if the webs that contain the swappable
>> instructions are not optimized, i.e., the webs do not contain any
>> permuting load/store instructions along with the associated register
>> swap instructions. Doing special handling in such webs will result in
>> the extracted lane being adjusted unnecessarily for vec_extract.
>>
>> Another issue is that existing code treats non-permuting loads/stores
>> as special swappables. Non-permuting loads/stores (that have not yet
>> been split into a permuting load/store and a swap) are handled by
>> converting them into a permuting load/store (which effectively removes
>> the swap). As a result, if special swappables are handled only in webs
>> containing permuting loads/stores, then non-optimal code is generated
>> for non-permuting loads/stores.
>>
>> Hence, in this patch, all webs containing either permuting loads/
>> stores or non-permuting loads/stores are marked as requiring special
>> handling of swappables. Swaps associated with permuting loads/stores
>> are marked for removal, and non-permuting loads/stores are converted to
>> permuting loads/stores. Then the special swappables in the webs are
>> fixed up.
>>
>> Another issue with always handling swappable instructions is that it is
>> incorrect to do so in webs where loads/stores on quad word aligned
>> addresses are changed to lvx/stvx. Similarly, in webs where
>> swap(load(vector constant)) instructions are replaced with
>> load(swapped vector constant), the swappable instructions should not be
>> modified.
>>
>> 2023-09-10  Surya Kumari Jangala  
>>
>> gcc/
>>  PR rtl-optimization/PR106770
>>  * config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New
>>  function.
>>  (handle_non_permuting_mem_insn): New function.
>>  (rs6000_analyze_swaps): Handle swappable instructions only in
>>  certain webs.
>>  (web_requires_special_handling): New instance variable.
>>  (handle_special_swappables): Remove handling of non-permuting
>>  load/store instructions.
>>
>> gcc/testsuite/
>>  PR rtl-optimization/PR106770
>>  * gcc.target/powerpc/pr106770.c: New test.
>> ---
>>
>> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
>> b/gcc/config/rs6000/rs6000-p8swap.cc
>> index 0388b9bd736..3a695aa1318 100644
>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>> @@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
>>unsigned int special_handling : 4;
>>/* Set if the web represented by this entry cannot be optimized.  */
>>unsigned int web_not_optimizable : 1;
>> +  /* Set if the swappable insns in the web represented by this entry
>> + have to be fixed. Swappable insns have to be fixed in :
>> +   - webs containing permuting loads/stores and the swap insns
>> + in such webs have been marked for removal
>> +   - webs where non-permuting loads/stores have been converted
>> + to permuting loads/stores  */
>> +  unsigned int web_requires_special_handling : 1;
>>/* Set if this insn should be deleted.  */
>>unsigned int will_delete : 1;
>>  };
>> @@ -1468,14 +1475,6 @@ handle_special_swappables (swap_web_entry 
>> *insn_entry, unsigned i)
>>if (dump_file)
>>  fprintf (dump_file, "Adjusting subreg in insn %d\n", i);
>>break;
>> -case SH_NOSWAP_LD:
>> -  /* Convert a non-permuting load to a permuting one.  */
>> -  permute_load (insn);
>> -  break;
>> -case SH_NOSWAP_ST:
>> -  /* Convert a non-permuting store to a permuting one.  */
>> -  permute_store (insn);
>> -  break;
>>  case SH_EXTRACT:
>>/* Change the lane on an extract operation.  */
>>adjust_extract (insn);
>> @@ -2401,6 +2400,25 @@ recombine_lvx_stvx_patterns (function *fun)
>>free (to_delete);
>>  }
>>  
>> +/* Return true if insn is a non-permuting load/store.  */
>> +static bool
>> +non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
>> +{
>> +  return (insn_entry[i].special_handling == SH_NOSWAP_LD ||
>> +  insn_entry[i].special_handling == SH_NOSWAP_ST);
>> +}
>> +
>> +/* Convert a non-permuting load/store insn to a permuting one.  */
>> +static void
>> +handle_non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
>> +{
>> +  rtx_insn *insn = insn_entry[i].insn;
>> +  if (insn_entry[i].special_handling == SH_NOSWAP_LD)
>> +permute_load (insn);
>> +  else if (insn_entry[i].special_handling == SH_NOSWAP_ST)
>> +permute_store (insn);
>> +}
>> +
>>  /* Main entry point for this pass.  */
>>  unsigned int
>>  rs6000_analyze_swaps (function *fun)
>> @@ -2624,25 +2642,56 @@ rs6000_analyze_swaps (function *fun)
>>dump_swap_insn_table 

Re: [PATCH] Fix coroutine tests for libstdc++ gnu-version-namespace mode

2023-10-03 Thread Jonathan Wakely
On Mon, 2 Oct 2023 at 18:07, François Dumont  wrote:
>
> Hi
>
> Gentle reminder for this minor patch.

It looks like you attached the wrong patch.


>
> Thanks
>
> On 23/09/2023 22:10, François Dumont wrote:
> > I'm eventually fixing those tests the same way we manage this problem
> > in libstdc++ testsuite.
> >
> >testsuite: Add optional libstdc++ version namespace in expected
> > diagnostic
> >
> > > When libstdc++ is built with
> > --enable-symvers=gnu-versioned-namespace diagnostics are
> > showing this namespace, currently __8.
> >
> > gcc/testsuite/ChangeLog:
> >
> > *
> > testsuite/g++.dg/coroutines/coro-bad-alloc-00-bad-op-new.C: Add optional
> > '__8' version namespace in expected diagnostic.
> > *
> > testsuite/g++.dg/coroutines/coro-bad-alloc-01-bad-op-del.C: Likewise.
> > *
> > testsuite/g++.dg/coroutines/coro-bad-alloc-02-no-op-new-nt.C: Likewise.
> > *
> > testsuite/g++.dg/coroutines/coro-bad-grooaf-01-grooaf-expected.C:
> > Likewise.
> > * testsuite/g++.dg/coroutines/pr97438.C: Likewise.
> > * testsuite/g++.dg/coroutines/ramp-return-b.C: Likewise.
> >
> > Tested under Linux x86_64.
> >
> > I'm contributing to libstdc++ so I already have write access.
> >
> > Ok to commit ?
> >
> > François


[PATCH][GCC] aarch64: Enable Cortex-X4 CPU

2023-10-03 Thread Saurabh Jha

Hey,


This patch adds support for the Cortex-X4 CPU to GCC.


Regression tested on the aarch64-none-elf target; no regressions found.


Okay for gcc-master? I don't have commit access so if it looks okay, 
could someone please help me commit this?



Thanks, Saurabh


gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add support for
cortex-x4 core.
* config/aarch64/aarch64-tune.md: Regenerated.
* doc/invoke.texi: Add command-line option for cortex-x4 core.

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 73976e9a4c5..e13625d176e 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -182,6 +182,8 @@ AARCH64_CORE("cortex-x2",  cortexx2, cortexa57, V9A,  
(SVE2_BITPERM, MEMTAG, I8M
 
 AARCH64_CORE("cortex-x3",  cortexx3, cortexa57, V9A,  (SVE2_BITPERM, MEMTAG, 
I8MM, BF16), neoversen2, 0x41, 0xd4e, -1)
 
+AARCH64_CORE("cortex-x4",  cortexx4, cortexa57, V9_2A,  (SVE2_BITPERM, MEMTAG, 
PROFILE), neoversen2, 0x41, 0xd81, -1)
+
 AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16, 
SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversen2, 0x41, 0xd49, -1)
 
 AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, 
RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 12d610f0f65..33135f45f85 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,neoversen2,demeter,neoversev2"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,demeter,neoversev2"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 33befee7d6b..5684b55bc06 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20570,9 +20570,9 @@ performance of the code.  Permissible values for this 
option are:
 @samp{cortex-a73.cortex-a35}, @samp{cortex-a73.cortex-a53},
 @samp{cortex-a75.cortex-a55}, @samp{cortex-a76.cortex-a55},
 @samp{cortex-r82}, @samp{cortex-x1}, @samp{cortex-x1c}, @samp{cortex-x2},
-@samp{cortex-x3}, @samp{cortex-a510}, @samp{cortex-a520}, @samp{cortex-a710},
-@samp{cortex-a715}, @samp{cortex-a720}, @samp{ampere1}, @samp{ampere1a},
-and @samp{native}.
+@samp{cortex-x3}, @samp{cortex-x4}, @samp{cortex-a510}, @samp{cortex-a520},
+@samp{cortex-a710}, @samp{cortex-a715}, @samp{cortex-a720}, @samp{ampere1},
+@samp{ampere1a}, and @samp{native}.
 
 The values @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53},
 @samp{cortex-a73.cortex-a35}, @samp{cortex-a73.cortex-a53},


Re: [PATCH 3/3] aarch64: Convert aarch64 multi choice patterns to new syntax

2023-10-03 Thread Andrea Corallo
Richard Sandiford  writes:

> Andrea Corallo  writes:
>> Hi all,
>> this patch converts a number of multi-choice patterns within the
>> aarch64 backend to the new syntax.
>>
>> The list of the converted patterns is in the Changelog.
>>
>> For completeness here follows the list of multi choice patterns that
>> were rejected for conversion by my parser, they typically have some C
>> as asm output and require some manual intervention:
>> aarch64_simd_vec_set, aarch64_get_lane,
>> aarch64_cmdi, aarch64_cmdi, aarch64_cmtstdi,
>> *aarch64_movv8di, *aarch64_be_mov, *aarch64_be_movci,
>> *aarch64_be_mov, *aarch64_be_movxi, *aarch64_sve_mov_le,
>> *aarch64_sve_mov_be, @aarch64_pred_mov,
>> @aarch64_sve_gather_prefetch,
>> @aarch64_sve_gather_prefetch,
>> *aarch64_sve_gather_prefetch_sxtw,
>> *aarch64_sve_gather_prefetch_uxtw,
>> @aarch64_vec_duplicate_vq_le, *vec_extract_0,
>> *vec_extract_v128, *cmp_and,
>> *fcm_and_combine, @aarch64_sve_ext,
>> @aarch64_sve2_aba, *sibcall_insn, *sibcall_value_insn,
>> *xor_one_cmpl3, *insv_reg_,
>> *aarch64_bfi_,
>> *aarch64_bfidi_subreg_, *aarch64_bfxil,
>> *aarch64_bfxilsi_uxtw,
>> *aarch64_cvtf2_mult,
>> atomic_store.
>>
>> Bootstrapped and regtested on aarch64-unknown-linux-gnu, also I
>> analysed tmp-mddump.md (from 'make mddump') and could not find
>> effective differences, okay for trunk?
>
> I'd left this for a few days in case there were any comments on
> the formatting.  Since there weren't:
>
>>
>> Bests
>>
>>   Andrea
>>
>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64.md (@ccmp)
>>  (@ccmp_rev, *call_insn, *call_value_insn)
>>  (*mov_aarch64, load_pair_sw_)
>>  (load_pair_dw_)
>>  (store_pair_sw_)
>>  (store_pair_dw_, *extendsidi2_aarch64)
>>  (*zero_extendsidi2_aarch64, *load_pair_zero_extendsidi2_aarch64)
>>  (*extend2_aarch64)
>>  (*zero_extend2_aarch64)
>>  (*extendqihi2_aarch64, *zero_extendqihi2_aarch64)
>>  (*add3_aarch64, *addsi3_aarch64_uxtw, *add3_poly_1)
>>  (add3_compare0, *addsi3_compare0_uxtw)
>>  (*add3_compareC_cconly, add3_compareC)
>>  (*add3_compareV_cconly_imm, add3_compareV_imm)
>>  (*add3nr_compare0, subdi3, subv_imm)
>>  (*cmpv_insn, sub3_compare1_imm, neg2)
>>  (cmp, fcmp, fcmpe, *cmov_insn)
>>  (*cmovsi_insn_uxtw, 3, *si3_uxtw)
>>  (*and3_compare0, *andsi3_compare0_uxtw, one_cmpl2)
>>  (*_one_cmpl3, *and3nr_compare0)
>>  (*aarch64_ashl_sisd_or_int_3)
>>  (*aarch64_lshr_sisd_or_int_3)
>>  (*aarch64_ashr_sisd_or_int_3, *ror3_insn)
>>  (*si3_insn_uxtw, _trunc2)
>>  (2)
>>  (3)
>>  (3)
>>  (*aarch64_3_cssc, copysign3_insn): Update
>>  to new syntax.
>>
>>  * config/aarch64/aarch64-sve2.md (@aarch64_scatter_stnt)
>>  (@aarch64_scatter_stnt_)
>>  (*aarch64_mul_unpredicated_)
>>  (@aarch64_pred_, *cond__2)
>>  (*cond__3, *cond__any)
>>  (*cond__z, @aarch64_pred_)
>>  (*cond__2, *cond__3)
>>  (*cond__any, @aarch64_sve_)
>>  (@aarch64_sve__lane_)
>>  (@aarch64_sve_add_mul_lane_)
>>  (@aarch64_sve_sub_mul_lane_, @aarch64_sve2_xar)
>>  (*aarch64_sve2_bcax, @aarch64_sve2_eor3)
>>  (*aarch64_sve2_nor, *aarch64_sve2_nand)
>>  (*aarch64_sve2_bsl, *aarch64_sve2_nbsl)
>>  (*aarch64_sve2_bsl1n, *aarch64_sve2_bsl2n)
>>  (*aarch64_sve2_sra, @aarch64_sve_add_)
>>  (*aarch64_sve2_aba, @aarch64_sve_add_)
>>  (@aarch64_sve_add__lane_)
>>  (@aarch64_sve_qadd_)
>>  (@aarch64_sve_qadd__lane_)
>>  (@aarch64_sve_sub_)
>>  (@aarch64_sve_sub__lane_)
>>  (@aarch64_sve_qsub_)
>>  (@aarch64_sve_qsub__lane_)
>>  (@aarch64_sve_, @aarch64__lane_)
>>  (@aarch64_pred_)
>>  (@aarch64_pred_, *cond__2)
>>  (*cond__z, @aarch64_sve_)
>>  (@aarch64__lane_, @aarch64_sve_)
>>  (@aarch64__lane_, @aarch64_pred_)
>>  (*cond__any_relaxed)
>>  (*cond__any_strict)
>>  (@aarch64_pred_, *cond_)
>>  (@aarch64_pred_, *cond_)
>>  (*cond__strict): Update to new syntax.
>>
>>  * config/aarch64/aarch64-sve.md (*aarch64_sve_mov_ldr_str)
>>  (*aarch64_sve_mov_no_ldr_str, @aarch64_pred_mov)
>>  (*aarch64_sve_mov, aarch64_wrffr)
>>  (mask_scatter_store)
>>  (*mask_scatter_store_xtw_unpacked)
>>  (*mask_scatter_store_sxtw)
>>  (*mask_scatter_store_uxtw)
>>  (@aarch64_scatter_store_trunc)
>>  (@aarch64_scatter_store_trunc)
>>  (*aarch64_scatter_store_trunc_sxtw)
>>  (*aarch64_scatter_store_trunc_uxtw)
>>  (*vec_duplicate_reg, vec_shl_insert_)
>>  (vec_series, @extract__)
>>  (@aarch64_pred_, *cond__2)
>>  (*cond__any, @aarch64_pred_)
>>  (@aarch64_sve_revbhw_)
>>  (@cond_)
>>  (*2)
>>  (@aarch64_pred_sxt)
>>  (@aarch64_cond_sxt)
>>  (*cond_uxt_2, *cond_uxt_any, *cnot)
>>  (*cond_cnot_2, *cond_cnot_any)
>>  (@aarch64_pred_, *cond__2_relaxed)
>>  (*cond__2_strict, *cond__any_relaxed)
>>  (*cond__any_strict, @aarch64_pred_)
>>  (*cond__2, *cond__3)

Re:

2023-10-03 Thread Kito Cheng
Ooop, I screwed up when writing my cover letter of the target
attribute patch set...

On Tue, Oct 3, 2023 at 5:10 PM Kito Cheng  wrote:
>
> From: Kito Cheng 
>
> Reply-To:
>
> Subject: [PATCH v1 0/4] RISC-V target attribute
>
> In-Reply-To:
>
> This patch set implements the target attribute for the RISC-V target.
> Similar to other targets such as x86 or ARM, it lets the user apply
> some settings locally, per function, without changing global settings.
>
> We support arch, tune and cpu first, and will support other target
> attributes later.  This version DOES NOT include multi-version function
> support yet; that is future work, probably for GCC 15.
>
> The full proposal is in the RISC-V C-API document [1], which has been
> discussed with the RISC-V LLVM community, so we have consistent syntax
> and semantics.
>
> [1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/35
>
>


[PATCH v1 3/4] RISC-V: Extend riscv_subset_list, preparatory for target attribute support

2023-10-03 Thread Kito Cheng
Previously riscv_subset_list only accepted a full arch string, but
supporting the target attribute requires parsing a single extension.
We may also want to set a riscv_subset_list directly rather than
re-parsing the ISA string.

gcc/ChangeLog:

* config/riscv/riscv-subset.h (riscv_subset_list::parse_single_std_ext):
New.
(riscv_subset_list::parse_single_multiletter_ext): Ditto.
(riscv_subset_list::clone): Ditto.
(riscv_subset_list::parse_single_ext): Ditto.
(riscv_subset_list::set_loc): Ditto.
(riscv_set_arch_by_subset_list): Ditto.
* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse_single_std_ext): New.
(riscv_subset_list::parse_single_multiletter_ext): Ditto.
(riscv_subset_list::clone): Ditto.
(riscv_subset_list::parse_single_ext): Ditto.
(riscv_subset_list::set_loc): Ditto.
(riscv_set_arch_by_subset_list): Ditto.
---
 gcc/common/config/riscv/riscv-common.cc | 210 
 gcc/config/riscv/riscv-subset.h |  11 ++
 2 files changed, 221 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 9a0a68fe5db..76a1378874d 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1036,6 +1036,41 @@ riscv_subset_list::parse_std_ext (const char *p)
   return p;
 }
 
+/* Parsing function for one standard extension.
+
+   Return Value:
+ Points to the end of extensions.
+
+   Arguments:
+ `p`: Current parsing position.  */
+
+const char *
+riscv_subset_list::parse_single_std_ext (const char *p)
+{
+  if (*p == 'x' || *p == 's' || *p == 'z')
+{
+  error_at (m_loc,
+   "%<-march=%s%>: Not single-letter extension. "
+   "%<%c%>",
+   m_arch, *p);
+  return nullptr;
+}
+
+  unsigned major_version = 0;
+  unsigned minor_version = 0;
+  bool explicit_version_p = false;
+  char subset[2] = {0, 0};
+
+  subset[0] = *p;
+
+  p++;
+
+  p = parsing_subset_version (subset, p, &major_version, &minor_version,
+                              /* std_ext_p= */ true, &explicit_version_p);
+
+  add (subset, major_version, minor_version, explicit_version_p, false);
+  return p;
+}
 
 /* Check any implied extensions for EXT.  */
 void
@@ -1138,6 +1173,109 @@ riscv_subset_list::handle_combine_ext ()
 }
 }
 
+/* Parsing function for multi-letter extensions.
+
+   Return Value:
+ Points to the end of extensions.
+
+   Arguments:
+ `p`: Current parsing position.
+ `ext_type`: What kind of extensions, 's', 'z' or 'x'.
+ `ext_type_str`: Full name for kind of extension.  */
+
+
+const char *
+riscv_subset_list::parse_single_multiletter_ext (const char *p,
+const char *ext_type,
+const char *ext_type_str)
+{
+  unsigned major_version = 0;
+  unsigned minor_version = 0;
+  size_t ext_type_len = strlen (ext_type);
+
+{
+  if (strncmp (p, ext_type, ext_type_len) != 0)
+   return NULL;
+
+  char *subset = xstrdup (p);
+  char *q = subset;
+  const char *end_of_version;
+  bool explicit_version_p = false;
+  char *ext;
+  char backup;
+  size_t len;
+  size_t end_of_version_pos, i;
+  bool found_any_number = false;
+  bool found_minor_version = false;
+
+  backup = *q;
+  *q = '\0';
+  len = q - subset;
+  *q = backup;
+
+  end_of_version_pos = len;
+  /* Find the begin of version string.  */
+  for (i = len -1; i > 0; --i)
+   {
+ if (ISDIGIT (subset[i]))
+   {
+ found_any_number = true;
+ continue;
+   }
+ /* Might be version separator, but need to check one more char,
+we only allow p, so we could stop parsing if found
+any more `p`.  */
+ if (subset[i] == 'p' &&
+ !found_minor_version &&
+ found_any_number && ISDIGIT (subset[i-1]))
+   {
+ found_minor_version = true;
+ continue;
+   }
+
+ end_of_version_pos = i + 1;
+ break;
+   }
+
+  backup = subset[end_of_version_pos];
+  subset[end_of_version_pos] = '\0';
+  ext = xstrdup (subset);
+  subset[end_of_version_pos] = backup;
+
+  end_of_version
+   = parsing_subset_version (ext, subset + end_of_version_pos,
+ &major_version, &minor_version,
+ /* std_ext_p= */ false, &explicit_version_p);
+  free (ext);
+
+  if (end_of_version == NULL)
+   return NULL;
+
+  subset[end_of_version_pos] = '\0';
+
+  if (strlen (subset) == 1)
+   {
+ error_at (m_loc, "%<-march=%s%>: name of %s must be more than 1 letter",
+   m_arch, ext_type_str);
+ free (subset);
+ return NULL;
+   }
+
+  add (subset, major_version, minor_version, explicit_version_p, false);
+  
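
The backward scan in the hunk above is easier to follow outside the diff. The following is a minimal standalone sketch, not the GCC implementation: `ext_version` and `split_ext_version` are hypothetical names used only for illustration. It peels a trailing `<major>` or `<major>p<minor>` version off a multi-letter extension name, accepting at most one `p` and only when it sits between digits.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch; not part of GCC.
struct ext_version
{
  std::string name;
  unsigned major_version = 0;
  unsigned minor_version = 0;
  bool explicit_version = false;
};

// Split "zba1p0" into name "zba", major 1, minor 0.  A name with no
// trailing version (e.g. "zbb") is returned with explicit_version == false.
ext_version
split_ext_version (const std::string &s)
{
  assert (!s.empty ());
  std::size_t i = s.size ();
  bool seen_digit = false;
  bool seen_p = false;

  // Walk backward while the characters still belong to the version.
  // Index 0 is never consumed, so the name keeps at least one letter.
  while (i > 1)
    {
      unsigned char c = s[i - 1];
      if (std::isdigit (c))
        seen_digit = true;
      else if (c == 'p' && !seen_p && seen_digit
               && std::isdigit ((unsigned char) s[i - 2]))
        seen_p = true; // The single major/minor separator.
      else
        break;
      --i;
    }

  ext_version r;
  r.name = s.substr (0, i);
  std::string ver = s.substr (i);
  if (!ver.empty ())
    {
      r.explicit_version = true;
      std::size_t p = ver.find ('p');
      r.major_version = static_cast<unsigned> (std::stoul (ver.substr (0, p)));
      if (p != std::string::npos)
        r.minor_version
          = static_cast<unsigned> (std::stoul (ver.substr (p + 1)));
    }
  return r;
}
```

For example, `split_ext_version ("zicsr2")` yields the name `zicsr` with major version 2, while a trailing `p` that is not preceded by a digit is treated as part of the name.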

[PATCH v1 4/4] RISC-V: Implement target attribute

2023-10-03 Thread Kito Cheng
The target attribute, proposed in [1], allows the user to specify
local settings on a per-function basis.

The syntax of the target attribute is `__attribute__((target("<ATTR-STRING>")))`.

The syntax of `<ATTR-STRING>` is described below:
```
ATTR-STRING := ATTR-STRING ';' ATTR
             | ATTR

ATTR        := ARCH-ATTR
             | CPU-ATTR
             | TUNE-ATTR

ARCH-ATTR   := 'arch=' EXTENSIONS-OR-FULLARCH

EXTENSIONS-OR-FULLARCH := <EXTENSIONS>
                        | <FULLARCHSTR>

EXTENSIONS             := <EXTENSION> ',' <EXTENSIONS>
                        | <EXTENSION>

FULLARCHSTR            :=

EXTENSION              := <OP> <EXTENSION-NAME> <VERSION>

OP                     := '+'

VERSION                := [0-9]+ 'p' [0-9]+
                        | [1-9][0-9]*
                        |

EXTENSION-NAME         := Naming rule is defined in RISC-V ISA manual

CPU-ATTR    := 'cpu='
TUNE-ATTR   := 'tune='
```

[1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/35
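
To make the top level of the grammar concrete, here is a small standalone sketch. It is not the code in riscv-target-attr.cc: `attr_kind`, `target_attr`, `classify` and `split_target_attrs` are hypothetical names. It splits an attribute string on `;` and classifies each item by its `arch=`, `cpu=` or `tune=` prefix. Validation of the value after the prefix is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for this sketch; not part of GCC.
enum class attr_kind { arch, cpu, tune, invalid };

struct target_attr
{
  attr_kind kind;
  std::string value; // Text after the "arch=" / "cpu=" / "tune=" prefix.
};

// ATTR := ARCH-ATTR | CPU-ATTR | TUNE-ATTR
static target_attr
classify (const std::string &item)
{
  static const std::pair<const char *, attr_kind> prefixes[] = {
    { "arch=", attr_kind::arch },
    { "cpu=", attr_kind::cpu },
    { "tune=", attr_kind::tune },
  };
  for (const auto &p : prefixes)
    {
      std::string prefix = p.first;
      if (item.compare (0, prefix.size (), prefix) == 0)
        return { p.second, item.substr (prefix.size ()) };
    }
  return { attr_kind::invalid, item };
}

// ATTR-STRING := ATTR-STRING ';' ATTR | ATTR
std::vector<target_attr>
split_target_attrs (const std::string &s)
{
  std::vector<target_attr> out;
  std::size_t begin = 0;
  while (begin <= s.size ())
    {
      std::size_t end = s.find (';', begin);
      if (end == std::string::npos)
        end = s.size ();
      out.push_back (classify (s.substr (begin, end - begin)));
      begin = end + 1;
    }
  return out;
}
```

With the illustrative input `"arch=+zba,+zbb;tune=rocket"` this produces two items: an `arch` item with value `+zba,+zbb` and a `tune` item with value `rocket`.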

gcc/ChangeLog:

* config.gcc (riscv): Add riscv-target-attr.o.
* config/riscv/riscv-opts.h (TARGET_MIN_VLEN_OPTS): New.
* config/riscv/riscv-protos.h (riscv_declare_function_size): New.
(riscv_option_valid_attribute_p): New.
(riscv_override_options_internal): New.
(struct riscv_tune_info): New.
(riscv_parse_tune): New.
* config/riscv/riscv-target-attr.cc
(class riscv_target_attr_parser): New.
(struct riscv_attribute_info): New.
(riscv_attributes): New.
(riscv_target_attr_parser::parse_arch):
(riscv_target_attr_parser::handle_arch):
(riscv_target_attr_parser::handle_cpu):
(riscv_target_attr_parser::handle_tune):
(riscv_target_attr_parser::update_settings):
(riscv_process_one_target_attr):
(num_occurences_in_str):
(riscv_process_target_attr):
(riscv_option_valid_attribute_p):
* config/riscv/riscv.cc: Include target-globals.h and
riscv-subset.h.
(struct riscv_tune_info): Move to riscv-protos.h.
(get_tune_str):
(riscv_parse_tune):
(riscv_declare_function_size):
(riscv_option_override): Build target_option_default_node and
target_option_current_node.
(riscv_save_restore_target_globals):
(riscv_option_restore):
(riscv_previous_fndecl):
(riscv_set_current_function): Apply the target attribute.
(TARGET_OPTION_RESTORE): Define.
(TARGET_OPTION_VALID_ATTRIBUTE_P): Ditto.
* config/riscv/riscv.h (SWITCHABLE_TARGET): Define to 1.
(ASM_DECLARE_FUNCTION_SIZE): Define.
* config/riscv/riscv.opt (mtune=): Add Save attribute.
(mcpu=): Ditto.
(mcmodel=): Ditto.
* config/riscv/t-riscv: Add build rule for riscv-target-attr.o
* doc/extend.texi: Add doc for target attribute.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/target-attr-01.c: New.
* gcc.target/riscv/target-attr-02.c: Ditto.
* gcc.target/riscv/target-attr-03.c: Ditto.
* gcc.target/riscv/target-attr-04.c: Ditto.
* gcc.target/riscv/target-attr-05.c: Ditto.
* gcc.target/riscv/target-attr-06.c: Ditto.
* gcc.target/riscv/target-attr-07.c: Ditto.
* gcc.target/riscv/target-attr-bad-01.c: Ditto.
* gcc.target/riscv/target-attr-bad-02.c: Ditto.
* gcc.target/riscv/target-attr-bad-03.c: Ditto.
* gcc.target/riscv/target-attr-bad-04.c: Ditto.
* gcc.target/riscv/target-attr-bad-05.c: Ditto.
* gcc.target/riscv/target-attr-bad-06.c: Ditto.
* gcc.target/riscv/target-attr-bad-07.c: Ditto.
* gcc.target/riscv/target-attr-warning-01.c: Ditto.
* gcc.target/riscv/target-attr-warning-02.c: Ditto.
* gcc.target/riscv/target-attr-warning-03.c: Ditto.
---
 gcc/config.gcc|   2 +-
 gcc/config/riscv/riscv-opts.h |   6 +
 gcc/config/riscv/riscv-protos.h   |  21 +
 gcc/config/riscv/riscv-target-attr.cc | 396 ++
 gcc/config/riscv/riscv.cc | 192 +++--
 gcc/config/riscv/riscv.h  |   6 +
 gcc/config/riscv/riscv.opt|   6 +-
 gcc/config/riscv/t-riscv  |   5 +
 gcc/doc/extend.texi   |  58 +++
 .../gcc.target/riscv/target-attr-01.c |  31 ++
 .../gcc.target/riscv/target-attr-02.c |  31 ++
 .../gcc.target/riscv/target-attr-03.c |  26 ++
 .../gcc.target/riscv/target-attr-04.c |  28 ++
 .../gcc.target/riscv/target-attr-05.c |  27 ++
 .../gcc.target/riscv/target-attr-06.c |  27 ++
 .../gcc.target/riscv/target-attr-07.c |  25 ++
 .../gcc.target/riscv/target-attr-bad-01.c |  13 +
 .../gcc.target/riscv/target-attr-bad-02.c |  13 +
 .../gcc.target/riscv/target-attr-bad-03.c |  13 +
 .../gcc.target/riscv/target-attr-bad-04.c |  13 +
 .../gcc.target/riscv/target-attr-bad-05.c |  13 +
 

[PATCH v1 2/4] RISC-V: Refactor riscv_option_override and riscv_convert_vector_bits. [NFC]

2023-10-03 Thread Kito Cheng
Allow those functions to take their settings from a local gcc_options
rather than the global options.

This is preparation for the target attribute; the change is separated out
for easier review since it is NFC.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_convert_vector_bits): Get setting
from argument rather than from global setting.
(riscv_override_options_internal): New, split from
riscv_override_options, also takes a gcc_options argument.
(riscv_option_override): Split most of the body out to
riscv_override_options_internal.
---
 gcc/config/riscv/riscv.cc | 93 ++-
 1 file changed, 52 insertions(+), 41 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d5446b63dbf..d089ec1b241 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7982,10 +7982,11 @@ riscv_init_machine_status (void)
 /* Return the VLEN value associated with -march.
TODO: So far we only support length-agnostic value. */
 static poly_uint16
-riscv_convert_vector_bits (void)
+riscv_convert_vector_bits (struct gcc_options *opts)
 {
   int chunk_num;
-  if (TARGET_MIN_VLEN > 32)
+  int min_vlen = TARGET_MIN_VLEN_OPTS (opts);
+  if (min_vlen > 32)
 {
   /* When targetting minimum VLEN > 32, we should use 64-bit chunk size.
 Otherwise we can not include SEW = 64bits.
@@ -8003,7 +8004,7 @@ riscv_convert_vector_bits (void)
   - TARGET_MIN_VLEN = 2048bit: [256,256]
   - TARGET_MIN_VLEN = 4096bit: [512,512]
   FIXME: We currently DON'T support TARGET_MIN_VLEN > 4096bit.  */
-  chunk_num = TARGET_MIN_VLEN / 64;
+  chunk_num = min_vlen / 64;
 }
   else
 {
@@ -8022,10 +8023,10 @@ riscv_convert_vector_bits (void)
  to set RVV mode size. The RVV machine modes size are run-time constant if
  TARGET_VECTOR is enabled. The RVV machine modes size remains default
  compile-time constant if TARGET_VECTOR is disabled.  */
-  if (TARGET_VECTOR)
+  if (TARGET_VECTOR_OPTS_P (opts))
 {
-  if (riscv_autovec_preference == RVV_FIXED_VLMAX)
-   return (int) TARGET_MIN_VLEN / (riscv_bytes_per_vector_chunk * 8);
+  if (opts->x_riscv_autovec_preference == RVV_FIXED_VLMAX)
+   return (int) min_vlen / (riscv_bytes_per_vector_chunk * 8);
   else
return poly_uint16 (chunk_num, chunk_num);
 }
@@ -8033,40 +8034,33 @@ riscv_convert_vector_bits (void)
 return 1;
 }
 
-/* Implement TARGET_OPTION_OVERRIDE.  */
-
-static void
-riscv_option_override (void)
+/* 'Unpack' up the internal tuning structs and update the options
+in OPTS.  The caller must have set up selected_tune and selected_arch
+as all the other target-specific codegen decisions are
+derived from them.  */
+void
+riscv_override_options_internal (struct gcc_options *opts)
 {
   const struct riscv_tune_info *cpu;
 
-#ifdef SUBTARGET_OVERRIDE_OPTIONS
-  SUBTARGET_OVERRIDE_OPTIONS;
-#endif
-
-  flag_pcc_struct_return = 0;
-
-  if (flag_pic)
-g_switch_value = 0;
-
   /* The presence of the M extension implies that division instructions
  are present, so include them unless explicitly disabled.  */
-  if (TARGET_MUL && (target_flags_explicit & MASK_DIV) == 0)
-target_flags |= MASK_DIV;
-  else if (!TARGET_MUL && TARGET_DIV)
+  if (TARGET_MUL_OPTS_P (opts) && (target_flags_explicit & MASK_DIV) == 0)
+opts->x_target_flags |= MASK_DIV;
+  else if (!TARGET_MUL_OPTS_P (opts) && TARGET_DIV_OPTS_P (opts))
 error ("%<-mdiv%> requires %<-march%> to subsume the %<M%> extension");
 
   /* Likewise floating-point division and square root.  */
   if ((TARGET_HARD_FLOAT || TARGET_ZFINX) && (target_flags_explicit & MASK_FDIV) == 0)
-target_flags |= MASK_FDIV;
+opts->x_target_flags |= MASK_FDIV;
 
   /* Handle -mtune, use -mcpu if -mtune is not given, and use default -mtune
  if both -mtune and -mcpu are not given.  */
-  cpu = riscv_parse_tune (riscv_tune_string ? riscv_tune_string :
- (riscv_cpu_string ? riscv_cpu_string :
+  cpu = riscv_parse_tune (opts->x_riscv_tune_string ? opts->x_riscv_tune_string :
+ (opts->x_riscv_cpu_string ? opts->x_riscv_cpu_string :
   RISCV_TUNE_STRING_DEFAULT));
   riscv_microarchitecture = cpu->microarchitecture;
-  tune_param = optimize_size ? &optimize_size_tune_info : cpu->tune_param;
+  tune_param = opts->x_optimize_size ? &optimize_size_tune_info : cpu->tune_param;
 
   /* Use -mtune's setting for slow_unaligned_access, even when optimizing
  for size.  For architectures that trap and emulate unaligned accesses,
@@ -8082,15 +8076,38 @@ riscv_option_override (void)
 
   if ((target_flags_explicit & MASK_STRICT_ALIGN) == 0
   && cpu->tune_param->slow_unaligned_access)
-target_flags |= MASK_STRICT_ALIGN;
+opts->x_target_flags |= MASK_STRICT_ALIGN;
 
   /* If the user hasn't specified a branch cost, use the processor's
  default.  */
-  if (riscv_branch_cost == 0)
-

[no subject]

2023-10-03 Thread Kito Cheng
From: Kito Cheng 

Subject: [PATCH v1 0/4] RISC-V target attribute

This patch set implements the target attribute for the RISC-V target. As on
other targets such as x86 or ARM, it lets the user apply local settings to a
single function without changing the global settings.

We support arch, tune and cpu first, and will support other target attributes
later. This version DOES NOT include multi-version function support yet; that
is future work, probably for GCC 15.

The full proposal is in the RISC-V C-API document [1]. It has been discussed
with the RISC-V LLVM community, so we have consistent syntax and semantics.

[1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/35
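
As a rough illustration of what a per-function setting looks like from the user's side, here is a minimal sketch. The `RISCV_TARGET` wrapper macro and the `count_bits` function are hypothetical and only there so the snippet also builds on non-RISC-V hosts; the attribute string `"arch=+zbb"` follows the syntax proposed in [1], and whether the compiler actually selects a Zbb instruction for it is not guaranteed by this example.

```cpp
#include <cassert>

// Hypothetical helper: apply the attribute only when targeting RISC-V with
// a GNU-compatible compiler, and expand to nothing elsewhere.
#if defined (__riscv) && defined (__GNUC__)
# define RISCV_TARGET(str) __attribute__ ((target (str)))
#else
# define RISCV_TARGET(str)
#endif

// Only this function is compiled with Zbb added to the architecture;
// the rest of the translation unit keeps the command-line -march.
RISCV_TARGET ("arch=+zbb")
unsigned
count_bits (unsigned x)
{
  return static_cast<unsigned> (__builtin_popcount (x));
}
```

The result of `count_bits` is the same with or without the attribute; only the instructions the compiler is allowed to use for that one function change.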




[PATCH v1 1/4] options: Define TARGET_<NAME>_P and TARGET_<NAME>_OPTS_P macro for Mask and InverseMask

2023-10-03 Thread Kito Cheng
We have the TARGET_<NAME>_P macro to test a Mask or InverseMask against a
user-specified target variable; however, we may want to test against a
specific gcc_options variable rather than a target variable.

For example, RISC-V defines many Masks with TargetVariable, which are not
easy to use because we need to know which Mask is associated with which
TargetVariable, so taking a gcc_options variable is a better interface
for such use cases.
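
The difference between the three macro flavours can be sketched in isolation. This is a simplified mock, not the output of opth-gen.awk: the `gcc_options` struct here has a single field, `target_flags` is a plain global rather than GCC's real definition, and the mask value is arbitrary. It also parenthesizes `opts`, which is an addition for macro hygiene in this sketch.

```cpp
#include <cassert>

// Simplified stand-ins for this sketch; the real definitions are generated.
struct gcc_options
{
  int x_target_flags;
};

int target_flags; // Mock of the global flags variable.

#define MASK_DIV (1 << 0)

// Tests the global variable.
#define TARGET_DIV ((target_flags & MASK_DIV) != 0)
// Tests a flags value supplied by the caller.
#define TARGET_DIV_P(target_flags) (((target_flags) & MASK_DIV) != 0)
// Tests a gcc_options object; the caller need not know which field
// the mask lives in.
#define TARGET_DIV_OPTS_P(opts) ((((opts)->x_target_flags) & MASK_DIV) != 0)

// Example use: query a local option set without touching the global one.
bool
has_div (const gcc_options *opts)
{
  return TARGET_DIV_OPTS_P (opts);
}
```

The `_OPTS_P` form is what lets a function such as riscv_override_options_internal work on a local `gcc_options` while the global flags stay unchanged.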

gcc/ChangeLog:

* doc/options.texi (Mask): Document TARGET_<NAME>_P and
TARGET_<NAME>_OPTS_P.
(InverseMask): Ditto.
* opth-gen.awk (Mask): Generate TARGET_<NAME>_P and
TARGET_<NAME>_OPTS_P macro.
(InverseMask): Ditto.
---
 gcc/doc/options.texi | 23 ---
 gcc/opth-gen.awk | 13 -
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/gcc/doc/options.texi b/gcc/doc/options.texi
index 1f7c15b8eb4..715f0a1479c 100644
--- a/gcc/doc/options.texi
+++ b/gcc/doc/options.texi
@@ -404,18 +404,27 @@ You may also specify @code{Var} to select a variable other than
 The options-processing script will automatically allocate a unique bit
 for the option.  If the option is attached to @samp{target_flags} or @code{Var}
 which is defined by @code{TargetVariable},  the script will set the macro
-@code{MASK_@var{name}} to the appropriate bitmask.  It will also declare a 
-@code{TARGET_@var{name}} macro that has the value 1 when the option is active
-and 0 otherwise.  If you use @code{Var} to attach the option to a different variable
-which is not defined by @code{TargetVariable}, the bitmask macro with be
-called @code{OPTION_MASK_@var{name}}.
+@code{MASK_@var{name}} to the appropriate bitmask.  It will also declare a
+@code{TARGET_@var{name}}, @code{TARGET_@var{name}_P} and
+@code{TARGET_@var{name}_OPTS_P}: @code{TARGET_@var{name}} macros that has the
+value 1 when the option is active and 0 otherwise, @code{TARGET_@var{name}_P} is
+similar to @code{TARGET_@var{name}} but take an argument as @samp{target_flags}
+or @code{TargetVariable}, and @code{TARGET_@var{name}_OPTS_P} also similar to
+@code{TARGET_@var{name}} but take an argument as @code{gcc_options}.
+If you use @code{Var} to attach the option to a different variable which is not
+defined by @code{TargetVariable}, the bitmask macro with be called
+@code{OPTION_MASK_@var{name}}.
 
 @item InverseMask(@var{othername})
 @itemx InverseMask(@var{othername}, @var{thisname})
 The option is the inverse of another option that has the
 @code{Mask(@var{othername})} property.  If @var{thisname} is given,
-the options-processing script will declare a @code{TARGET_@var{thisname}}
-macro that is 1 when the option is active and 0 otherwise.
+the options-processing script will declare @code{TARGET_@var{thisname}},
+@code{TARGET_@var{name}_P} and @code{TARGET_@var{name}_OPTS_P} macros:
+@code{TARGET_@var{thisname}} is 1 when the option is active and 0 otherwise,
+@code{TARGET_@var{name}_P} is similar to @code{TARGET_@var{name}} but take an
+argument as @samp{target_flags}, and @code{TARGET_@var{name}_OPTS_P} also
+similar to @code{TARGET_@var{name}} but take an argument as @code{gcc_options}.
 
 @item Enum(@var{name})
 The option's argument is a string from the set of strings associated
diff --git a/gcc/opth-gen.awk b/gcc/opth-gen.awk
index 70ca3d37719..4d498abd130 100644
--- a/gcc/opth-gen.awk
+++ b/gcc/opth-gen.awk
@@ -439,6 +439,10 @@ for (i = 0; i < n_target_vars; i++)
{
print "#define TARGET_" other_masks[i][j] \
  " ((" target_vars[i] " & MASK_" other_masks[i][j] ") != 0)"
+   print "#define TARGET_" other_masks[i][j] "_P(" target_vars[i] ")" \
+ " (((" target_vars[i] ") & MASK_" other_masks[i][j] ") != 0)"
+   print "#define TARGET_" other_masks[i][j] "_OPTS_P(opts)" \
+ " (((opts->x_" target_vars[i] ") & MASK_" other_masks[i][j] ") != 0)"
}
 }
 print ""
@@ -469,15 +473,22 @@ for (i = 0; i < n_opts; i++) {
  " ((" vname " & " mask original_name ") != 0)"
print "#define TARGET_" name "_P(" vname ")" \
  " (((" vname ") & " mask original_name ") != 0)"
+   print "#define TARGET_" name "_OPTS_P(opts)" \
+ " (((opts->x_" vname ") & " mask original_name ") != 0)"
print "#define TARGET_EXPLICIT_" name "_P(opts)" \
  " ((opts->x_" vname "_explicit & " mask original_name ") != 0)"
print "#define SET_TARGET_" name "(opts) opts->x_" vname " |= " mask original_name
}
 }
 for (i = 0; i < n_extra_masks; i++) {
-   if (extra_mask_macros[extra_masks[i]] == 0)
+   if (extra_mask_macros[extra_masks[i]] == 0) {
print "#define TARGET_" extra_masks[i] \
  " ((target_flags & MASK_" extra_masks[i] ") != 0)"
+   print "#define TARGET_" extra_masks[i] "_P(target_flags)" \
+ 

Re: [PATCH v2] ARM: Block predication on atomics [PR111235]

2023-10-03 Thread Maxim Kuvyrkov
> On Oct 1, 2023, at 00:36, Ramana Radhakrishnan wrote:
> 
> + linaro-toolchain as I don't understand the CI issues on patchwork.
> 
> 
...
> Ok if no regressions but as you might get nagged by the post commit CI ...

I don't see any pre-commit failures for this patch, but regardless of what 
results are for pre-commit CI, there's always a chance to identify problems in 
post-commit CI -- simply because we test wa-a-ay more configurations in 
post-commit CI than in pre-commit CI.

> 
> While it is not policy yet to look at these bots but given the
> enthusiasm at Cauldron for patchwork and pre-commit CI and because all
> my armhf boxes are boxed up, I decided to do something a bit novel !
> 
> I tried reviewing this via patchwork
> 
> https://patchwork.sourceware.org/project/gcc/patch/pawpr08mb8982a6aa40749b74cad14c5783...@pawpr08mb8982.eurprd08.prod.outlook.com/
> 
> and notice that
> 
> https://ci.linaro.org/job/tcwg_gcc_build--master-arm-precommit/2393/artifact/artifacts/artifacts.precommit/notify/mail-body.txt
> says nothing could be built.

Um, no.  This says ...
===
Results changed to
# reset_artifacts:
-10
# true:
0
# build_abe gcc:
1

From
# reset_artifacts:
-10
# true:
0
# build_abe gcc:
1
===
... i.e., build succeeded both before and after patch.  We'll change the 
boilerplate intro for successful builds from ...
"Dear contributor, our automatic CI has detected problems related to your 
patch(es)."
... to ...
"Dear contributor, you are awesome, no CI failures related to your patch(es)".

One thing that is strange -- testsuite builds were not triggered; we have only
2 reports from build tests, but are missing another 2 reports from testsuite
tests.

> 
> Possibly worth double checking the status for it being a false
> negative as to why the build failed.

Pre-commit CI is happy with the patch, albeit testsuite checks didn't run for 
some reason.  Regardless, we'll quickly catch and report any fallout in the 
post-commit CI once the patch is merged.

> 
> It was green on patchwork but remembering that Green is not Green for
> CI in patchwork I clicked on the afore mentioned ci.linaro.org link
> and see that it's actually broken.

Unfortunately, I seem to have confused developers about green and red at my
Cauldron presentation.  "Green/Red" in patchwork mean the usual PASS/FAIL.
It's only in the Jenkins interface of post-commit CI that green and red mean
something different.

--
Maxim Kuvyrkov
https://www.linaro.org