[Bug rtl-optimization/101169] [10 regression] test case gcc.target/powerpc/fold-vec-extract-char.p7.c fails after r10-9880

2023-10-18 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101169

--- Comment #6 from Kewen Lin  ---
PR111850 reminded me of this bug. The sub-optimal issue described in comment #4
has been fixed on latest trunk, I think by r14-4664-g04c9cf5c786b94.

[Bug target/111753] [14 Regression] ICE: in extract_constrain_insn, at recog.cc:2692 insn does not satisfy its constraints: {*movsf_internal} with -O2 -mavx512bw -fno-tree-ter starting with r14-4499

2023-10-18 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111753

--- Comment #3 from Haochen Jiang  ---
It seems to be caused by my change to the behavior when trying to use x/ymm16+
without avx512vl specified.

Working on a solution for that.

[Bug c++/111872] New: GCC rejects out of class definition of inner private class template

2023-10-18 Thread jlame646 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111872

Bug ID: 111872
   Summary: GCC rejects out of class definition of inner private
class template
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jlame646 at gmail dot com
  Target Milestone: ---

The following valid (afaik) code is rejected by gcc but accepted by clang and
msvc.
https://godbolt.org/z/x5xMvETPh
```
class A {

struct N;

template struct S; 
};
// works now

class A::N{};
template class  A::S{};//gcc rejects this but clang and msbc accepts this
```

GCC says:

:10:19: error: 'class A::N' is private within this context
   10 | template class  A::S{};//gcc rejects this but clang and msbc accepts this
  |   ^
:9:10: note: declared private here
9 | class A::N{};
  |  ^

[Bug c/100532] ICE: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.c:259

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100532

--- Comment #8 from Andrew Pinski  ---
Maybe the simple fix:
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 6e044b4afbc..8f8562936dc 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -3367,7 +3367,7 @@ convert_argument (location_t ploc, tree function, tree
fundecl,
 {
   error_at (ploc, "type of formal parameter %d is incomplete",
parmnum + 1);
-  return val;
+  return error_mark_node;
 }

   /* Optionally warn about conversions that differ from the default

[PATCH][_Hashtable] Fix merge

2023-10-18 Thread François Dumont

libstdc++: [_Hashtable] Do not reuse untrusted cached hash code

On merge, reuse the merged node's cached hash code only if both containers use
the same type of hash and this hash is stateless. Usage of function pointers or
std::function as hash functor will prevent this optimization.
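
For readers following along, the rule above can be sketched outside of
libstdc++ roughly as follows. This is only an illustration under assumptions:
`Node`, `can_reuse_cached_hash`, `hash_for_merge` and `example_usage` are
hypothetical names, not the identifiers used in the patch, and the sketch
assumes C++17 and that `std::hash<int>` is an empty class, as it is in the
common standard library implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <type_traits>

// Hypothetical stand-in for a hashtable node that caches its hash code.
struct Node {
    int key;
    std::size_t cached_hash;
};

// True only when the source hasher has the same type as the destination
// hasher and that type carries no state, so both must compute the same code.
template<typename DstHash, typename SrcHash>
constexpr bool can_reuse_cached_hash =
    std::is_same_v<DstHash, SrcHash> && std::is_empty_v<DstHash>;

// Return the hash code to use when moving node n into the destination table.
template<typename DstHash, typename SrcHash>
std::size_t hash_for_merge(const DstHash& dst, const SrcHash&, const Node& n) {
    if constexpr (can_reuse_cached_hash<DstHash, SrcHash>)
        return n.cached_hash;  // trusted: same stateless functor produced it
    else
        return dst(n.key);     // untrusted: recompute with the destination hash
}

// Stateless std::hash qualifies; std::function and function pointers do not.
using FnPtr = std::size_t (*)(int);
static_assert(can_reuse_cached_hash<std::hash<int>, std::hash<int>>);
static_assert(!can_reuse_cached_hash<FnPtr, FnPtr>);

inline void example_usage() {
    const Node n{42, 12345};  // the cached value is deliberately arbitrary
    // Same stateless hasher on both sides: the cached code is returned as-is.
    assert(hash_for_merge(std::hash<int>{}, std::hash<int>{}, n) == 12345);
    // std::function may hold any callable, so the code is recomputed.
    std::function<std::size_t(int)> f =
        [](int k) { return static_cast<std::size_t>(k) + 1; };
    assert(hash_for_merge(f, f, n) == 43);
}
```

The point of checking for emptiness as well as type identity is that two
`std::function` objects (or two function pointers) of the same type can still
wrap different hash functions, so type equality alone does not make the cached
code trustworthy.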

libstdc++-v3/ChangeLog

    * include/bits/hashtable_policy.h
    (_Hash_code_base::_M_hash_code(const _Hash&, const 
_Hash_node_value<>&)): Remove.
    (_Hash_code_base::_M_hash_code<_H2>(const _H2&, const 
_Hash_node_value<>&)): Remove.

    * include/bits/hashtable.h
    (_M_src_hash_code<_H2>(const _H2&, const key_type&, const 
__node_value_type&)): New.

    (_M_merge_unique<>, _M_merge_multi<>): Use latter.
    * testsuite/23_containers/unordered_map/modifiers/merge.cc
    (test04, test05, test06): New test cases.

Tested under Linux x86_64, ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 4c12dc895b2..f69acfe5213 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -1109,6 +1109,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return { __n, this->_M_node_allocator() };
   }
 
+  // Check and if needed compute hash code using _Hash as __n _M_hash_code,
+  // if present, was computed using _H2.
+  template
+	__hash_code
+	_M_src_hash_code(const _H2&, const key_type& __k,
+			 const __node_value_type& __src_n) const
+	{
+	  if constexpr (std::is_same_v<_H2, _Hash>)
+	if constexpr (std::is_empty_v<_Hash>)
+	  return this->_M_hash_code(__src_n);
+
+	  return this->_M_hash_code(__k);
+	}
+
 public:
   // Extract a node.
   node_type
@@ -1146,7 +1160,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  auto __pos = __i++;
 	  const key_type& __k = _ExtractKey{}(*__pos);
 	  __hash_code __code
-		= this->_M_hash_code(__src.hash_function(), *__pos._M_cur);
+		= _M_src_hash_code(__src.hash_function(), __k, *__pos._M_cur);
 	  size_type __bkt = _M_bucket_index(__code);
 	  if (_M_find_node(__bkt, __k, __code) == nullptr)
 		{
@@ -1174,8 +1188,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  for (auto __i = __src.cbegin(), __end = __src.cend(); __i != __end;)
 	{
 	  auto __pos = __i++;
+	  const key_type& __k = _ExtractKey{}(*__pos);
 	  __hash_code __code
-		= this->_M_hash_code(__src.hash_function(), *__pos._M_cur);
+		= _M_src_hash_code(__src.hash_function(), __k, *__pos._M_cur);
 	  auto __nh = __src.extract(__pos);
 	  __hint = _M_insert_multi_node(__hint, __code, __nh._M_ptr)._M_cur;
 	  __nh._M_ptr = nullptr;
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 86b32fb15f2..5d162463dc3 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -1319,19 +1319,6 @@ namespace __detail
 	  return _M_hash()(__k);
 	}
 
-  __hash_code
-  _M_hash_code(const _Hash&,
-		   const _Hash_node_value<_Value, true>& __n) const
-  { return __n._M_hash_code; }
-
-  // Compute hash code using _Hash as __n _M_hash_code, if present, was
-  // computed using _H2.
-  template
-	__hash_code
-	_M_hash_code(const _H2&,
-		const _Hash_node_value<_Value, __cache_hash_code>& __n) const
-	{ return _M_hash_code(_ExtractKey{}(__n._M_v())); }
-
   __hash_code
   _M_hash_code(const _Hash_node_value<_Value, false>& __n) const
   { return _M_hash_code(_ExtractKey{}(__n._M_v())); }
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_map/modifiers/merge.cc b/libstdc++-v3/testsuite/23_containers/unordered_map/modifiers/merge.cc
index b140ce452aa..c051b58137a 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_map/modifiers/merge.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_map/modifiers/merge.cc
@@ -17,15 +17,29 @@
 
 // { dg-do run { target c++17 } }
 
+#include 
+#include 
 #include 
 #include 
 #include 
 
 using test_type = std::unordered_map;
 
-struct hash {
-  auto operator()(int i) const noexcept { return ~std::hash()(i); }
-};
+template
+  struct xhash
+  {
+auto operator()(const T& i) const noexcept
+{ return ~std::hash()(i); }
+  };
+
+
+namespace std
+{
+  template
+struct __is_fast_hash> : __is_fast_hash>
+{ };
+}
+
 struct equal : std::equal_to<> { };
 
 template
@@ -64,7 +78,7 @@ test02()
 {
   const test_type c0{ {1, 10}, {2, 20}, {3, 30} };
   test_type c1 = c0;
-  std::unordered_map c2( c0.begin(), c0.end() );
+  std::unordered_map, equal> c2( c0.begin(), c0.end() );
 
   c1.merge(c2);
   VERIFY( c1 == c0 );
@@ -89,7 +103,7 @@ test03()
 {
   const test_type c0{ {1, 10}, {2, 20}, {3, 30} };
   test_type c1 = c0;
-  std::unordered_multimap c2( c0.begin(), c0.end() );
+  std::unordered_multimap, equal> c2( c0.begin(), c0.end() );
   c1.merge(c2);
   VERIFY( c1 == c0 );
   VERIFY( equal_elements(c2, c0) );
@@ -125,10 +139,164 @@ test03()
   VERIFY( c2.empty() );
 }
 
+void
+test04()
+{
+  const 

[Bug middle-end/110986] [14 Regression] aarch64 has support for conditional not (and vectorized conditional not ) after r14-3110-g7fb65f10285

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110986

--- Comment #23 from Andrew Pinski  ---
Final patch posted:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633517.html

The Canonicalization between the 2 forms or doing it in isel will wait until
next I think.

[PATCH] aarch64: [PR110986] Emit csinv again for `a ? ~b : b`

2023-10-18 Thread Andrew Pinski
After r14-3110-g7fb65f10285, the canonical form for
`a ? ~b : b` changed to `-(a) ^ b`. This means that
for aarch64 we need to add a few new insn patterns
to catch this form and change it back to
the canonical form for the aarch64 backend.
A secondary pattern was needed to support a zero_extended
form too; this adds a testcase for all 3 cases.
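
As a quick sanity check of why the two forms are interchangeable, here is a
small sketch. It reads `-(a)` as the negated 0/1 result of the comparison, as
in the `-(cmp) ^ a` comment in the patch below. The function names are made up
for illustration and the code assumes C++17.

```cpp
#include <cassert>
#include <cstdint>

// Source-level form: select ~b when a is non-zero, else b.
constexpr std::uint32_t cond_not(std::uint32_t a, std::uint32_t b) {
    return a ? ~b : b;
}

// Negate-and-xor form: the mask is all-ones when a is non-zero and zero
// otherwise, so the XOR either flips every bit of b or leaves b unchanged.
constexpr std::uint32_t neg_xor(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(a != 0);
    return mask ^ b;
}

static_assert(cond_not(0, 0x1234u) == neg_xor(0, 0x1234u));
static_assert(cond_not(7, 0x1234u) == neg_xor(7, 0x1234u));

// Compare both forms at run time over a small range of inputs.
inline void check_forms_agree() {
    for (std::uint32_t a = 0; a < 4; ++a)
        for (std::uint32_t b = 0; b < 1024; ++b)
            assert(cond_not(a, b) == neg_xor(a, b));
}
```

Both functions compute the same value; the new patterns let the backend
recognize the second shape and still emit a single csinv.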

Bootstrapped and tested on aarch64-linux-gnu with no regressions.

PR target/110986

gcc/ChangeLog:

* config/aarch64/aarch64.md (*cmov_insn_insv): New pattern.
(*cmov_uxtw_insn_insv): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cond_op-1.c: New test.
---
 gcc/config/aarch64/aarch64.md| 46 
 gcc/testsuite/gcc.target/aarch64/cond_op-1.c | 20 +
 2 files changed, 66 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cond_op-1.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 32c7adc8928..59cd0415937 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4413,6 +4413,52 @@ (define_insn "*csinv3_uxtw_insn3"
   [(set_attr "type" "csel")]
 )
 
+;; There are two canonical forms for `cmp ? ~a : a`.
+;; This is the second form and is here to help combine.
+;; Support `-(cmp) ^ a` into `cmp ? ~a : a`
+;; The second pattern is to support the zero extend'ed version.
+
+(define_insn_and_split "*cmov_insn_insv"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+(xor:GPI
+(neg:GPI
+ (match_operator:GPI 1 "aarch64_comparison_operator"
+  [(match_operand 2 "cc_register" "") (const_int 0)]))
+(match_operand:GPI 3 "general_operand" "r")))]
+  "can_create_pseudo_p ()"
+  "#"
+  "&& true"
+  [(set (match_dup 0)
+   (if_then_else:GPI (match_dup 1)
+ (not:GPI (match_dup 3))
+ (match_dup 3)))]
+  {
+operands[3] = force_reg (mode, operands[3]);
+  }
+  [(set_attr "type" "csel")]
+)
+
+(define_insn_and_split "*cmov_uxtw_insn_insv"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(zero_extend:DI
+(xor:SI
+ (neg:SI
+  (match_operator:SI 1 "aarch64_comparison_operator"
+   [(match_operand 2 "cc_register" "") (const_int 0)]))
+ (match_operand:SI 3 "general_operand" "r"))))]
+  "can_create_pseudo_p ()"
+  "#"
+  "&& true"
+  [(set (match_dup 0)
+   (if_then_else:DI (match_dup 1)
+ (zero_extend:DI (not:SI (match_dup 3)))
+ (zero_extend:DI (match_dup 3))))]
+  {
+operands[3] = force_reg (SImode, operands[3]);
+  }
+  [(set_attr "type" "csel")]
+)
+
 ;; If X can be loaded by a single CNT[BHWD] instruction,
 ;;
 ;;A = UMAX (B, X)
diff --git a/gcc/testsuite/gcc.target/aarch64/cond_op-1.c b/gcc/testsuite/gcc.target/aarch64/cond_op-1.c
new file mode 100644
index 000..e6c7821127e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cond_op-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* PR target/110986 */
+
+
+long long full(unsigned a, unsigned b)
+{
+  return a ? ~b : b;
+}
+unsigned fuu(unsigned a, unsigned b)
+{
+  return a ? ~b : b;
+}
+long long f(unsigned long long a, unsigned long long b)
+{
+  return a ? ~b : b;
+}
+
+/* { dg-final { scan-assembler-times "csinv\tw\[0-9\]*" 2 } } */
+/* { dg-final { scan-assembler-times "csinv\tx\[0-9\]*" 1 } } */
-- 
2.39.3



[Bug tree-optimization/111860] [14 Regression] incorrect vUSE after guard block loop skip block during vectorization.

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111860

--- Comment #10 from Andrew Pinski  ---
Just an FYI, I now get a similar ICE on the
libgomp/testsuite/libgomp.fortran/simd3.f90
testcase on aarch64-linux-gnu too.

Maybe since it was in the libgomp testsuite you missed it when you tested your
patch.


/home/ubuntu/src/upstream-gcc-aarch64/gcc/libgomp/testsuite/libgomp.fortran/simd3.f90:56:18:
Error: stmt with wrong VUSE
# VUSE <.MEM_68>
_21 = D.3326[_50];
expected .MEM_95
/home/ubuntu/src/upstream-gcc-aarch64/gcc/libgomp/testsuite/libgomp.fortran/simd3.f90:56:18:
Error: PHI node with wrong VUSE on edge from BB 32
.MEM_131 = PHI <.MEM_68(32), .MEM_68(29)>
expected .MEM_95
/home/ubuntu/src/upstream-gcc-aarch64/gcc/libgomp/testsuite/libgomp.fortran/simd3.f90:56:18:
Error: PHI node with wrong VUSE on edge from BB 29
.MEM_131 = PHI <.MEM_68(32), .MEM_68(29)>
expected .MEM_95
during GIMPLE pass: vect
/home/ubuntu/src/upstream-gcc-aarch64/gcc/libgomp/testsuite/libgomp.fortran/simd3.f90:56:18:
internal compiler error: verify_ssa failed
0x12312eb verify_ssa(bool, bool)

[Bug c/104822] -Wscalar-storage-order warning for initialization from NULL seems useless

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104822

--- Comment #4 from Andrew Pinski  ---
Created attachment 56147
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56147&action=edit
Patch which I am testing

[Bug tree-optimization/111791] RISC-V: Strange loop vectorizaion on popcount function

2023-10-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

--- Comment #7 from JuzheZhong  ---
I don't think this is a popcount vectorization issue.

This code should not be vectorized, and indeed it won't be vectorized if we use
the default cost model.

So this is not an issue.

[Bug modula2/111871] New: invoking gm2 with -pipe and -v does not work

2023-10-18 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111871

Bug ID: 111871
   Summary: invoking gm2 with -pipe and -v does not work
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: modula2
  Assignee: gaius at gcc dot gnu.org
  Reporter: gaius at gcc dot gnu.org
  Target Milestone: ---

Forwarding from the gm2 mailing list.

$ gm2 -v -O2 -pipe -o hello hello.mod -Wl,-rpath=/usr/local/gnu/lib
$ cat hello.mod
MODULE hello;

FROM StrIO IMPORT WriteString, WriteLn;

BEGIN
  WriteString('Hello world from M2!');
  WriteLn
END hello.

Compiling yields this error:
--8<--
Driving: gm2 -v -O2 -pipe -o hello hello.mod -Wl,-rpath=/usr/local/gnu/lib
-fm2-pathname=- -fm2-pathnameI. -fgen-module-list=- -fscaffold-dynamic
-fscaffold-main -flibs=m2cor,m2log,m2pim,m2iso -fplugin=m2rte -l m2cor -l m2log -l
m2pim -l m2iso -l stdc++ -l m -l pthread
new argc = 21, added_libraries = 7
Using built-in specs.
COLLECT_GCC=gm2
COLLECT_LTO_WRAPPER=/usr/local/gnu/libexec/gcc/x86_64-unknown-openbsd7.3/14.0.0/lto-wrapper
Target: x86_64-unknown-openbsd7.3
Configured with: ../gcc/configure --verbose 
--enable-languages=c,c++,d,fortran,lto,m2,objc,obj-c++,rust --enable-libssp 
--enable-threads=posix --enable-wchar_t --disable-libstdcxx-pch 
--enable-default-ssp --enable-default-pie --enable-cpp 
--with-as=/usr/local/gnu/bin/as --with-ld=/usr/bin/ld
--enable-linker-build-id --prefix=/usr/local/gnu --with-local-prefix=/usr/local/gnu
--disable-tls --disable-bootstrap
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20230604 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-v' '-O2' '-pipe' '-o' 'hello' '-fm2-pathname=-' 
'-fm2-pathnameI.' '-fgen-module-list=-' '-fscaffold-dynamic' '-fscaffold-main' 
'-flibs=m2cor,m2log,m2pim,m2iso' '-fplugin=m2rte' '-mtune=generic' 
'-march=x86-64'
 /usr/local/gnu/libexec/gcc/x86_64-unknown-openbsd7.3/14.0.0/cc1gm2 -v 
-iplugindir=/usr/local/gnu/lib/gcc/x86_64-unknown-openbsd7.3/14.0.0/plugin 
-quiet -dumpbase hello.mod -dumpbase-ext .mod -mtune=generic -march=x86-64 
-O2 -version -fm2-pathname=- -fm2-pathnameI. -fgen-module-list=- 
-fscaffold-dynamic -fscaffold-main -flibs=m2cor,m2log,m2pim,m2iso 
-fplugin=m2rte -fm2-pathname=- -fm2-pathnameI. hello.mod -o - |
 /usr/local/gnu/bin/as -v -o /tmp//cceXRr5c.o -
GNU Modula-2 (GCC) version 14.0.0 20230604 (experimental) 
(x86_64-unknown-openbsd7.3)
compiled by GNU C version 14.0.0 20230604 (experimental), GMP version 
6.2.1, MPFR version 4.2.0, MPC version 1.3.1, isl version isl-0.25-GMP
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Versions of loaded plugins:
 m2rte: Unknown version.
GNU assembler version 2.40.50 (x86_64-unknown-openbsd7.3) using BFD version 
(GNU Binutils) 2.40.50.20230604
{standard input}: Assembler messages:
{standard input}:1: Error: no such instruction: `hello.mod'
-->8--

[Bug target/111720] RISC-V: Ugly codegen in RVV

2023-10-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #15 from JuzheZhong  ---
After investigation, this seems to be an issue with variable-length vectors:

https://godbolt.org/z/6Wrjz9ofE

void fn (char * restrict out, int x)
{
   [local count: 1073741824]:
  MEM[(int8x16_t *)out_2(D)] = { 1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9
};
  return;

}


void fn2 (char * restrict out, int x)
{
  svint8_t varr;
  char arr[32];

   [local count: 1073741824]:
  arr =
"\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
  varr_3 = MEM[(svint8_t *)];
  MEM[(svint8_t *)out_4(D)] = varr_3;
  arr ={v} {CLOBBER(eol)};
  return;

}

If we use the ARM NEON type, the gimple IR won't have a CLOBBER, so there is no
transfer through the stack.

fn:
        adrp    x1, .LC0
        ldr     q31, [x1, #:lo12:.LC0]
        str     q31, [x0]
        ret
fn2:
        adrp    x1, .LANCHOR0
        add     x1, x1, :lo12:.LANCHOR0
        sub     sp, sp, #32
        ptrue   p7.b, all
        ldp     q31, q30, [x1]
        stp     q31, q30, [sp]
        ld1b    z31.b, p7/z, [sp]
        st1b    z31.b, p7, [x0]
        add     sp, sp, 32
        ret

The ARM SVE type will have a CLOBBER in the gimple IR, which causes the redundant
stack transfer in the generated assembly.

[Bug tree-optimization/111739] incorrect code with PGO enabled

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111739

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |INVALID

--- Comment #3 from Andrew Pinski  ---
The code is undefined for alias reasons:
```
union {
  int f;
  short g
} h;
*j = 

...
  a = n(h.f++);
  *j = 0;
...
  for (; i < 2; i++)
d = k(h.g || 0, 59376);
  short *b = 
  *b ^= e;
  printf("%d\n", h);
```

In the case of profiling, we are able to optimize away the short load from h
in:
  *b ^= e;
and point it back to the store from `h.f++` and miss the store via *j.

Once you add -fno-strict-aliasing the testcase starts to work.

[Bug tree-optimization/111738] incorrect code when PGO is enabled

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111738

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from Andrew Pinski  ---
=
==1==ERROR: AddressSanitizer: global-buffer-overflow on address 0x004043e0
at pc 0x004012a3 bp 0x7ffd3315e4b0 sp 0x7ffd3315e4a8
WRITE of size 8 at 0x004043e0 thread T0
#0 0x4012a2 in i (/app/output.s+0x4012a2) (BuildId:
4eb338bebafc71b3519003fd1b76487cfb8fb27b)
#1 0x4011c3 in h (/app/output.s+0x4011c3) (BuildId:
4eb338bebafc71b3519003fd1b76487cfb8fb27b)
#2 0x40146f in main (/app/output.s+0x40146f) (BuildId:
4eb338bebafc71b3519003fd1b76487cfb8fb27b)
#3 0x7f28b75e2082 in __libc_start_main
(/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId:
1878e6b475720c7c51969e69ab2d276fae6d1dee)
#4 0x4010fd in _start (/app/output.s+0x4010fd) (BuildId:
4eb338bebafc71b3519003fd1b76487cfb8fb27b)

0x004043e4 is located 0 bytes after global variable 'b' defined in
'/app/example.cpp:1:8' (0x4043e0) of size 4
SUMMARY: AddressSanitizer: global-buffer-overflow (/app/output.s+0x4012a2)
(BuildId: 4eb338bebafc71b3519003fd1b76487cfb8fb27b) in i
Shadow bytes around the buggy address:
  0x00404100: 00 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
  0x00404180: 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9 f9
  0x00404200: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x00404280: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x00404300: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
=>0x00404380: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00[04]f9 f9 f9
  0x00404400: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9
  0x00404480: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
  0x00404500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x00404580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x00404600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:   00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:   fa
  Freed heap region:   fd
  Stack left redzone:  f1
  Stack mid redzone:   f2
  Stack right redzone: f3
  Stack after return:  f5
  Stack use after scope:   f8
  Global redzone:  f9
  Global init order:   f6
  Poisoned by user:f7
  Container overflow:  fc
  Array cookie:ac
  Intra object redzone:bb
  ASan internal:   fe
  Left alloca redzone: ca
  Right alloca redzone:cb
==1==ABORTING


In this case:
```
**c = 
...
  int **k = 
  *k = 
  *g |= **c;
```

Basically once I fixed the size issue (or use -m32) there is profile
difference.

[PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers.

2023-10-18 Thread Michael Meissner
This patch is a preliminary patch to add the full 1,024 bit dense math registers
(DMRs) for -mcpu=future.  The MMA 512-bit accumulators map onto the top of the
DMR register.

This patch only adds the new 1,024 bit register support.  It does not add
support for any instructions that need 1,024 bit registers instead of 512 bit
registers.

I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit
registers.  The 'wD' constraint added in previous patches is used for these
registers.  I added support to do load and store of DMRs via the VSX registers,
since there are no load/store dense math instructions.  I added the new keyword
'__dmr' to create 1,024 bit types that can be loaded into DMRs.  At present, I
don't have aliases for __dmr512 and __dmr1024 that we've discussed internally.

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?

2023-10-18   Michael Meissner  

gcc/

* config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
(UNSPEC_DM_INSERT512_LOWER): Likewise.
(UNSPEC_DM_EXTRACT512): Likewise.
(UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
(UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
(movtdo): New define_expand and define_insn_and_split to implement 1,024
bit DMR registers.
(movtdo_insert512_upper): New insn.
(movtdo_insert512_lower): Likewise.
(movtdo_extract512): Likewise.
(reload_dmr_from_memory): Likewise.
(reload_dmr_to_memory): Likewise.
* config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR
support.
(rs6000_init_builtins): Add support for __dmr keyword.
* config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
for TDOmode.
(rs6000_function_arg): Likewise.
* config/rs6000/rs6000-modes.def (TDOmode): New mode.
* config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
support for TDOmode.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_hard_regno_mode_ok): Likewise.
(rs6000_modes_tieable_p): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Setup reload
hooks for DMR mode.
(reg_offset_addressing_ok_p): Add support for TDOmode.
(rs6000_emit_move): Likewise.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_secondary_reload_class): Likewise.
(rs6000_mangle_type): Add mangling for __dmr type.
(rs6000_dmr_register_move_cost): Add support for TDOmode.
(rs6000_split_multireg_move): Likewise.
(rs6000_invalid_conversion): Likewise.
* config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
(enum rs6000_builtin_type_index): Add DMR type nodes.
(dmr_type_node): Likewise.
(ptr_dmr_type_node): Likewise.

gcc/testsuite/

* gcc.target/powerpc/dm-1024bit.c: New test.
---
 gcc/config/rs6000/mma.md  | 152 ++
 gcc/config/rs6000/rs6000-builtin.cc   |  13 ++
 gcc/config/rs6000/rs6000-call.cc  |  13 +-
 gcc/config/rs6000/rs6000-modes.def|   4 +
 gcc/config/rs6000/rs6000.cc   | 135 
 gcc/config/rs6000/rs6000.h|   7 +-
 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c |  63 
 7 files changed, 351 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index cae407bc37c..0a89db8af99 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -93,6 +93,11 @@ (define_c_enum "unspec"
UNSPEC_MMA_XXMTACC
UNSPEC_MMA_VECTOR_PAIR_MEMORY
UNSPEC_DM_ASSEMBLE_ACC
+   UNSPEC_DM_INSERT512_UPPER
+   UNSPEC_DM_INSERT512_LOWER
+   UNSPEC_DM_EXTRACT512
+   UNSPEC_DMR_RELOAD_FROM_MEMORY
+   UNSPEC_DMR_RELOAD_TO_MEMORY
   ])
 
 (define_c_enum "unspecv"
@@ -916,3 +921,150 @@ (define_insn "mma_"
   [(set_attr "type" "mma")
(set_attr "prefixed" "yes")
(set_attr "isa" "dm,not_dm,not_dm")])
+
+
+;; TDOmode (i.e. __dmr).
+(define_expand "movtdo"
+  [(set (match_operand:TDO 0 "nonimmediate_operand")
+   (match_operand:TDO 1 "input_operand"))]
+  "TARGET_DENSE_MATH"
+{
+  rs6000_emit_move (operands[0], operands[1], TDOmode);
+  DONE;
+})
+
+(define_insn_and_split "*movtdo"
+  [(set (match_operand:TDO 0 "nonimmediate_operand" "=wa,m,wa,wD,wD,wa")
+   (match_operand:TDO 1 "input_operand" "m,wa,wa,wa,wD,wD"))]
+  "TARGET_DENSE_MATH
+   && (gpc_reg_operand (operands[0], TDOmode)
+   || gpc_reg_operand (operands[1], TDOmode))"
+  "@
+   #
+   #
+   #
+   #
+   dmmr %0,%1
+   #"
+  "&& reload_completed
+   && (!dmr_operand (operands[0], TDOmode) || !dmr_operand (operands[1], 
TDOmode))"
+  [(const_int 0)]
+{
+  rtx op0 = operands[0];
+  rtx op1 = 

[PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations.

2023-10-18 Thread Michael Meissner
This patch changes the assembler instruction names for MMA instructions from
the original name used in power10 to the new name when used with the dense math
system.  I.e. xvf64gerpp becomes dmxvf64gerpp.  The assembler will emit the
same bits for either spelling.
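
Judging only from the spellings that appear in this series (`xvf64gerpp` becomes
`dmxvf64gerpp`, `pmxvi4ger8` becomes `pmdmxvi4ger8`), the dense math name looks
like the power10 name with `dm` inserted in front of the `xv` stem. A small
sketch of that observation follows. `dense_math_name` is a hypothetical helper
for illustration and is not how the patch builds the names; the patch lists
them explicitly in the .md file.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, for illustration only: derive the dense math spelling
// of an MMA mnemonic by inserting "dm" in front of the first "xv".
std::string dense_math_name(const std::string& mma_name) {
    const std::string::size_type pos = mma_name.find("xv");
    if (pos == std::string::npos)
        return mma_name;  // no "xv" stem: leave the name unchanged
    std::string result = mma_name;
    result.insert(pos, "dm");
    return result;
}

inline void example_usage() {
    assert(dense_math_name("xvf64gerpp") == "dmxvf64gerpp");
    assert(dense_math_name("pmxvi4ger8") == "pmdmxvi4ger8");
}
```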

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?

2023-10-18   Michael Meissner  

gcc/

* config/rs6000/mma.md (vvi4i4i8_dm): New int attribute.
(avvi4i4i8_dm): Likewise.
(vvi4i4i2_dm): Likewise.
(avvi4i4i2_dm): Likewise.
(vvi4i4_dm): Likewise.
(avvi4i4_dm): Likewise.
(pvi4i2_dm): Likewise.
(apvi4i2_dm): Likewise.
(vvi4i4i4_dm): Likewise.
(avvi4i4i4_dm): Likewise.
(mma_): Add support for running on DMF systems, generating the dense
math instruction and using the dense math accumulators.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.

gcc/testsuite/

* gcc.target/powerpc/dm-double-test.c: New test.
* lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
target test.
---
 gcc/config/rs6000/mma.md  |  98 +++--
 .../gcc.target/powerpc/dm-double-test.c   | 194 ++
 gcc/testsuite/lib/target-supports.exp |  19 ++
 3 files changed, 299 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-double-test.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index e5589d8eccc..cae407bc37c 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -228,13 +228,22 @@ (define_int_attr apv  [(UNSPEC_MMA_XVF64GERPP 
"xvf64gerpp")
 
 (define_int_attr vvi4i4i8  [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")])
 
+(define_int_attr vvi4i4i8_dm   [(UNSPEC_MMA_PMXVI4GER8 
"pmdmxvi4ger8")])
+
 (define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP   
"pmxvi4ger8pp")])
 
+(define_int_attr avvi4i4i8_dm  [(UNSPEC_MMA_PMXVI4GER8PP   
"pmdmxvi4ger8pp")])
+
 (define_int_attr vvi4i4i2  [(UNSPEC_MMA_PMXVI16GER2"pmxvi16ger2")
 (UNSPEC_MMA_PMXVI16GER2S   "pmxvi16ger2s")
 (UNSPEC_MMA_PMXVF16GER2"pmxvf16ger2")
 (UNSPEC_MMA_PMXVBF16GER2   
"pmxvbf16ger2")])
 
+(define_int_attr vvi4i4i2_dm   [(UNSPEC_MMA_PMXVI16GER2"pmdmxvi16ger2")
+(UNSPEC_MMA_PMXVI16GER2S   
"pmdmxvi16ger2s")
+(UNSPEC_MMA_PMXVF16GER2"pmdmxvf16ger2")
+(UNSPEC_MMA_PMXVBF16GER2   
"pmdmxvbf16ger2")])
+
 (define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP  "pmxvi16ger2pp")
 (UNSPEC_MMA_PMXVI16GER2SPP 
"pmxvi16ger2spp")
 (UNSPEC_MMA_PMXVF16GER2PP  "pmxvf16ger2pp")
@@ -246,25 +255,54 @@ (define_int_attr avvi4i4i2
[(UNSPEC_MMA_PMXVI16GER2PP  "pmxvi16ger2pp")
 (UNSPEC_MMA_PMXVBF16GER2NP 
"pmxvbf16ger2np")
 (UNSPEC_MMA_PMXVBF16GER2NN 
"pmxvbf16ger2nn")])
 
+(define_int_attr avvi4i4i2_dm  [(UNSPEC_MMA_PMXVI16GER2PP  
"pmdmxvi16ger2pp")
+(UNSPEC_MMA_PMXVI16GER2SPP 
"pmdmxvi16ger2spp")
+(UNSPEC_MMA_PMXVF16GER2PP  
"pmdmxvf16ger2pp")
+(UNSPEC_MMA_PMXVF16GER2PN  
"pmdmxvf16ger2pn")
+(UNSPEC_MMA_PMXVF16GER2NP  
"pmdmxvf16ger2np")
+(UNSPEC_MMA_PMXVF16GER2NN  
"pmdmxvf16ger2nn")
+(UNSPEC_MMA_PMXVBF16GER2PP 
"pmdmxvbf16ger2pp")
+(UNSPEC_MMA_PMXVBF16GER2PN 
"pmdmxvbf16ger2pn")
+(UNSPEC_MMA_PMXVBF16GER2NP 
"pmdmxvbf16ger2np")
+(UNSPEC_MMA_PMXVBF16GER2NN 
"pmdmxvbf16ger2nn")])
+
 (define_int_attr vvi4i4[(UNSPEC_MMA_PMXVF32GER 
"pmxvf32ger")])
 
+(define_int_attr vvi4i4_dm [(UNSPEC_MMA_PMXVF32GER 
"pmdmxvf32ger")])
+
 (define_int_attr avvi4i4   [(UNSPEC_MMA_PMXVF32GERPP   "pmxvf32gerpp")
 (UNSPEC_MMA_PMXVF32GERPN   "pmxvf32gerpn")
 (UNSPEC_MMA_PMXVF32GERNP   "pmxvf32gernp")
 (UNSPEC_MMA_PMXVF32GERNN   
"pmxvf32gernn")])
 
+(define_int_attr avvi4i4_dm[(UNSPEC_MMA_PMXVF32GERPP   
"pmdmxvf32gerpp")
+(UNSPEC_MMA_PMXVF32GERPN   
"pmdmxvf32gerpn")
+ 

[Bug target/110733] [14 Regression] ICE: in curr_insn_transform, at lra-constraints.cc:4259 (unable to generate reloads for: {*one_cmplv16qi2}) with -O -fno-omit-frame-pointer -mavx512f

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110733

Andrew Pinski  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Andrew Pinski  ---
.

[PATCH 4/6] PowerPC: Make MMA insns support DMR registers.

2023-10-18 Thread Michael Meissner
This patch changes the MMA instructions to use either FPR registers
(-mcpu=power10) or DMRs (-mcpu=future).  In this patch, the existing MMA
instruction names are used.

A macro (__PPC_DMR__) is defined if the MMA instructions use the DMRs.

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?

2023-10-18   Michael Meissner  

gcc/

* config/rs6000/mma.md (mma_): New define_expand to handle
mma_ for dense math and non dense math.
(mma_ insn): Restrict to non dense math.
(mma_xxsetaccz): Convert to define_expand to handle non dense math and
dense math.
(mma_xxsetaccz_vsx): Rename from mma_xxsetaccz and restrict usage to non
dense math.
(mma_xxsetaccz_dm): Dense math version of mma_xxsetaccz.
(mma_): Add support for dense math.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
__PPC_DMR__ if we have dense math instructions.
* config/rs6000/rs6000.cc (print_operand): Make %A handle only DMRs if
dense math and only FPRs if not dense math.
(rs6000_split_multireg_move): Do not generate the xxmtacc instruction to
prime the DMR registers or the xxmfacc instruction to de-prime
instructions if we have dense math register support.
---
 gcc/config/rs6000/mma.md  | 247 +-
 gcc/config/rs6000/rs6000-c.cc |   3 +
 gcc/config/rs6000/rs6000.cc   |  35 ++---
 3 files changed, 176 insertions(+), 109 deletions(-)

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index d2c5b73fa8f..e5589d8eccc 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -596,190 +596,249 @@ (define_insn "*mma_disassemble_acc_dm"
   "dmxxextfdmr256 %0,%1,2"
   [(set_attr "type" "mma")])
 
-(define_insn "mma_"
+;; MMA instructions that do not use their accumulators as an input, still must
+;; not allow their vector operands to overlap the registers used by the
+;; accumulator.  We enforce this by marking the output as early clobber.  If we
+;; have dense math, we don't need the whole prime/de-prime action, so just make
+;; these instructions be NOPs.
+
+(define_expand "mma_"
+  [(set (match_operand:XO 0 "register_operand")
+   (unspec:XO [(match_operand:XO 1 "register_operand")]
+  MMA_ACC))]
+  "TARGET_MMA"
+{
+  if (TARGET_DENSE_MATH)
+{
+  if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+  DONE;
+}
+
+  /* Generate the prime/de-prime code.  */
+})
+
+(define_insn "*mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")
(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
MMA_ACC))]
-  "TARGET_MMA"
+  "TARGET_MMA && !TARGET_DENSE_MATH"
   " %A0"
   [(set_attr "type" "mma")])
 
 ;; We can't have integer constants in XOmode so we wrap this in an
-;; UNSPEC_VOLATILE.
+;; UNSPEC_VOLATILE for the non-dense math case.  For dense math, we don't need
+;; to disable optimization and we can do a normal UNSPEC.
 
-(define_insn "mma_xxsetaccz"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+(define_expand "mma_xxsetaccz"
+  [(set (match_operand:XO 0 "register_operand")
(unspec_volatile:XO [(const_int 0)]
UNSPECV_MMA_XXSETACCZ))]
   "TARGET_MMA"
+{
+  if (TARGET_DENSE_MATH)
+{
+  emit_insn (gen_mma_xxsetaccz_dm (operands[0]));
+  DONE;
+}
+})
+
+(define_insn "*mma_xxsetaccz_vsx"
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+   (unspec_volatile:XO [(const_int 0)]
+   UNSPECV_MMA_XXSETACCZ))]
+  "TARGET_MMA && !TARGET_DENSE_MATH"
   "xxsetaccz %A0"
   [(set_attr "type" "mma")])
 
+
+(define_insn "mma_xxsetaccz_dm"
+  [(set (match_operand:XO 0 "dmr_operand" "=wD")
+   (unspec:XO [(const_int 0)]
+  UNSPECV_MMA_XXSETACCZ))]
+  "TARGET_DENSE_MATH"
+  "dmsetdmrz %0"
+  [(set_attr "type" "mma")])
+
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=,")
-   (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
-   (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,,")
+   (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
+   (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")]
MMA_VV))]
   "TARGET_MMA"
   " %A0,%x1,%x2"
-  [(set_attr "type" "mma")])
+  [(set_attr "type" "mma")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 

[PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.

2023-10-18 Thread Michael Meissner
The MMA subsystem added the notion of accumulator registers as an optional
feature of ISA 3.1 (power10).  In ISA 3.1, these accumulators overlapped with
the traditional floating point registers 0..31, but logically the accumulator
registers were separate from the FPR registers.  In ISA 3.1, it was anticipated
that in future systems, the accumulator registers may not overlap with the FPR
registers.  This patch adds the support for dense math registers as separate
registers.

This particular patch does not change the MMA support to use the accumulators
within the dense math registers.  This patch just adds the basic support for
having separate DMRs.  The next patch will switch the MMA support to use the
accumulators if -mcpu=future is used.

For testing purposes, I added an undocumented option '-mdense-math' to enable
or disable the dense math support.

This patch adds a new constraint (wD).  If MMA is selected but dense math is
not selected (i.e. -mcpu=power10), the wD constraint will allow access to
accumulators that overlap with the VSX vector registers 0..31.  If both MMA and
dense math are selected (i.e. -mcpu=future), the wD constraint will only allow
dense math registers.

This patch modifies the existing %A output modifier.  If MMA is selected but
dense math is not selected, then %A output modifier converts the VSX register
number to the accumulator number, by dividing it by 4.  If both MMA and dense
math are selected, then %A will map the separate DMR registers into 0..7.
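
The non-dense-math mapping can be sketched as below.  This is an illustration
of the arithmetic only, not the print_operand code in rs6000.cc; the helper
names are hypothetical, and the check that an accumulator starts on a register
number that is a multiple of 4 is an assumption drawn from the "4 adjacent
registers" description:

```c
#include <assert.h>

/* Map the first register of an accumulator (0, 4, ..., 28) to the
   accumulator number 0..7, as %A is described to do without dense math.  */
int
accumulator_number_from_vsx (int vsx_regno)
{
  /* Assumed precondition: accumulators start at a multiple of 4 in 0..31.  */
  assert (vsx_regno >= 0 && vsx_regno < 32 && vsx_regno % 4 == 0);
  return vsx_regno / 4;
}

/* Inverse mapping: first of the 4 registers that accumulator ACC overlaps.  */
int
first_vsx_of_accumulator (int acc)
{
  assert (acc >= 0 && acc < 8);
  return acc * 4;
}
```

With dense math there is no such division: the DMR number is used directly,
which is why extended asm should rely on %A rather than computing the number
itself.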

The intention is that user code using extended asm can be modified to run on
both MMA without dense math and MMA with dense math:

1)  If possible, don't use extended asm, but instead use the MMA built-in
functions;

2)  If you do need to write extended asm, change the d constraints
targeting accumulators to use wD;

3)  Only use the built-in zero, assemble and disassemble functions to create
and move data between vector quad types and dense math accumulators.
I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz instructions directly
in the extended asm code.  The reason is these instructions assume there is
a 1-to-1 correspondence between 4 adjacent FPR registers and an
accumulator that overlaps with those registers.  With accumulators
now being separate registers, there no longer is a 1-to-1
correspondence.

It is possible that the mangling for DMRs and the GDB register numbers may
change in the future.

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?

2023-10-18   Michael Meissner  

gcc/

* config/rs6000/constraints.md (wD constraint): New constraint.
* config/rs6000/mma.md (UNSPEC_DM_ASSEMBLE_ACC): New unspec.
(movxo): Convert into define_expand.
(movxo_vsx): Version of movxo where accumulators overlap with VSX vector
registers 0..31.
(movxo_dm): Version of movxo that supports separate dense math
accumulators.
(mma_assemble_acc): Add dense math support to define_expand.
(mma_assemble_acc_vsx): Rename from mma_assemble_acc, and restrict it to
non dense math systems.
(mma_assemble_acc_dm): Dense math version of mma_assemble_acc.
(mma_disassemble_acc): Add dense math support to define_expand.
(mma_disassemble_acc_vsx): Rename from mma_disassemble_acc, and restrict
it to non dense math systems.
(mma_disassemble_acc_dm): Dense math version of mma_disassemble_acc.
* config/rs6000/predicates.md (dmr_operand): New predicate.
(accumulator_operand): Likewise.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add -mdense-math.
(POWERPC_MASKS): Likewise.
* config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
(enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
(LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD
constraint.
(reload_reg_map): Likewise.
(rs6000_reg_names): Likewise.
(alt_reg_names): Likewise.
(rs6000_hard_regno_nregs_internal): Likewise.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Likewise.
(rs6000_option_override_internal): Add checking for -mdense-math.
(rs6000_secondary_reload_memory): Add support for DMR registers.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_preferred_reload_class): Likewise.
(rs6000_secondary_reload_class): Likewise.
(print_operand): Make %A handle both FPRs and DMRs.
(rs6000_dmr_register_move_cost): New helper function.
(rs6000_register_move_cost): Add support for DMR registers.
(rs6000_memory_move_cost): Likewise.
(rs6000_compute_pressure_classes): Likewise.
(rs6000_debugger_regno): 

[PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.

2023-10-18 Thread Michael Meissner
This patch re-enables generating load and store vector pair instructions when
doing certain memory copy operations when -mcpu=future is used.

During power10 development, it was determined that using store vector pair
instructions was problematic in a few cases, so we disabled generating load
and store vector pair instructions for memory copy operations by default.
This patch re-enables generating these instructions if -mcpu=future is used.
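
As an illustration, a fixed-size copy such as the one below is the kind of
memory copy operation this option is about.  The struct and function names are
hypothetical, and whether the compiler actually expands the copy with load and
store vector pair instructions depends on the target, the options, and the copy
size; this sketch only shows the source-level pattern:

```c
#include <string.h>

/* A 64-byte block; the size is chosen only for illustration.  */
struct block64
{
  unsigned char bytes[64];
};

/* A copy with a compile-time constant size, which the compiler may expand
   inline instead of calling memcpy.  */
void
copy_block64 (struct block64 *dst, const struct block64 *src)
{
  memcpy (dst, src, sizeof *dst);
}
```

Comparing the assembly from -mcpu=power10 and -mcpu=future for a function like
this would be one way to see the effect of the changed default.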

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?

2023-10-18   Michael Meissner  

gcc/

* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add
-mblock-ops-vector-pair.
(POWERPC_MASKS): Likewise.
---
 gcc/config/rs6000/rs6000-cpus.def | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-cpus.def 
b/gcc/config/rs6000/rs6000-cpus.def
index a6d9d7bf9a8..849af6b3ac8 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -90,6 +90,7 @@
 
 /* Flags for a potential future processor that may or may not be delivered.  */
 #define ISA_FUTURE_MASKS   (ISA_3_1_MASKS_SERVER   \
+| OPTION_MASK_BLOCK_OPS_VECTOR_PAIR\
 | OPTION_MASK_FUTURE)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
@@ -127,6 +128,7 @@
 
 /* Mask of all options to set the default isa flags based on -mcpu=.  */
 #define POWERPC_MASKS  (OPTION_MASK_ALTIVEC\
+| OPTION_MASK_BLOCK_OPS_VECTOR_PAIR\
 | OPTION_MASK_CMPB \
 | OPTION_MASK_CRYPTO   \
 | OPTION_MASK_DFP  \
-- 
2.41.0


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH 1/6] PowerPC: Add -mcpu=future option

2023-10-18 Thread Michael Meissner
This patch implements support for a potential future PowerPC cpu.  Features
added with -mcpu=future may or may not be added to new PowerPC processors.

This patch adds support for the -mcpu=future option.  If you use -mcpu=future,
the macro _ARCH_PWR_FUTURE is defined, and the assembler .machine directive
"future" is used.  Future patches in this series will add support for new
instructions that may be present in future PowerPC processors.

This particular patch does not add any new features.  It exists as groundwork
for future patches to support a possible future PowerPC processor.

This patch does not implement any differences in tuning when -mcpu=future is
used compared to -mcpu=power10.  If -mcpu=future is used, GCC will use power10
tuning.  If you explicitly use -mtune=future, you will get a warning that
-mtune=future is not supported, and default tuning will be set for power10.
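
The tuning choice described above can be modeled roughly as follows.  This is
a hypothetical sketch, not the code in rs6000_option_override_internal; the
function name, the string-based interface, and the assumption that tuning
otherwise follows -mcpu= are all for illustration only:

```c
#include <string.h>

/* Return the tuning to use for the given -mcpu= value and optional -mtune=
   value (NULL if -mtune= was not given).  *WARNED is set to 1 when the user
   explicitly asked for -mtune=future, which is not supported.  */
const char *
resolve_tune (const char *cpu, const char *tune, int *warned)
{
  *warned = 0;
  if (tune != NULL)
    {
      if (strcmp (tune, "future") == 0)
        {
          *warned = 1;          /* explicit -mtune=future gets a warning */
          return "power10";     /* and falls back to power10 tuning */
        }
      return tune;
    }
  if (strcmp (cpu, "future") == 0)
    return "power10";           /* -mcpu=future silently tunes for power10 */
  return cpu;                   /* assumed default: tune for the chosen cpu */
}
```

So "-mcpu=future" alone gives power10 tuning without a warning, while
"-mcpu=power10 -mtune=future" gives power10 tuning with a warning.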

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?

2023-10-18   Michael Meissner  

gcc/

* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
_ARCH_PWR_FUTURE if -mcpu=future.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): New macro.
(POWERPC_MASKS): Add -mcpu=future support.
* config/rs6000/rs6000-opts.h (enum processor_type): Add
PROCESSOR_FUTURE.
* config/rs6000/rs6000-tables.opt: Regenerate.
* config/rs6000/rs6000.cc (rs600_cpu_index_lookup): New helper
function.
(rs6000_option_override_internal): Make -mcpu=future set
-mtune=power10.  If the user explicitly uses -mtune=future, give a
warning and reset the tuning to power10.
(rs6000_option_override_internal): Use power10 costs for future
machine.
(rs6000_machine_from_flags): Add support for -mcpu=future.
(rs6000_opt_masks): Likewise.
* config/rs6000/rs6000.h (ASM_CPU_SUPPORT): Likewise.
* config/rs6000/rs6000.md (cpu attribute): Likewise.
* config/rs6000/rs6000.opt (-mfuture): New undocumented debug switch.
* doc/invoke.texi (IBM RS/6000 and PowerPC Options): Document 
-mcpu=future.
---
 gcc/config/rs6000/rs6000-c.cc   |  2 +
 gcc/config/rs6000/rs6000-cpus.def   |  6 +++
 gcc/config/rs6000/rs6000-opts.h |  4 +-
 gcc/config/rs6000/rs6000-tables.opt |  3 ++
 gcc/config/rs6000/rs6000.cc | 58 -
 gcc/config/rs6000/rs6000.h  |  1 +
 gcc/config/rs6000/rs6000.md |  2 +-
 gcc/config/rs6000/rs6000.opt|  4 ++
 gcc/doc/invoke.texi |  2 +-
 9 files changed, 69 insertions(+), 13 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 65be0ac43e2..e276c20cccd 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -447,6 +447,8 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT 
flags)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
   if ((flags & OPTION_MASK_POWER10) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10");
+  if ((flags & OPTION_MASK_FUTURE) != 0)
+rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR_FUTURE");
   if ((flags & OPTION_MASK_SOFT_FLOAT) != 0)
 rs6000_define_or_undefine_macro (define_p, "_SOFT_FLOAT");
   if ((flags & OPTION_MASK_RECIP_PRECISION) != 0)
diff --git a/gcc/config/rs6000/rs6000-cpus.def 
b/gcc/config/rs6000/rs6000-cpus.def
index 8c530a22da8..a6d9d7bf9a8 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -88,6 +88,10 @@
 | OPTION_MASK_POWER10  \
 | OTHER_POWER10_MASKS)
 
+/* Flags for a potential future processor that may or may not be delivered.  */
+#define ISA_FUTURE_MASKS   (ISA_3_1_MASKS_SERVER   \
+| OPTION_MASK_FUTURE)
+
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS  (OPTION_MASK_FLOAT128_HW\
 | OPTION_MASK_P9_MINMAX)
@@ -134,6 +138,7 @@
 | OPTION_MASK_FPRND\
 | OPTION_MASK_POWER10  \
 | OPTION_MASK_P10_FUSION   \
+| OPTION_MASK_FUTURE   \
 | OPTION_MASK_HTM  \
 | OPTION_MASK_ISEL \
 | OPTION_MASK_LOAD_VECTOR_PAIR \
@@ -267,3 +272,4 @@ RS6000_CPU ("powerpc64", PROCESSOR_POWERPC64, 
OPTION_MASK_PPC_GFXOPT
 RS6000_CPU ("powerpc64le", PROCESSOR_POWER8, MASK_POWERPC64
| ISA_2_7_MASKS_SERVER | OPTION_MASK_HTM)
 RS6000_CPU ("rs64", PROCESSOR_RS64A, OPTION_MASK_PPC_GFXOPT | MASK_POWERPC64)

[PATCH 0/6] PowerPC Future patches

2023-10-18 Thread Michael Meissner
This patch is very preliminary support for a potential new feature to the
PowerPC that extends the current power10 MMA architecture.  This feature may or
may not be present in any specific future PowerPC processor.

In the current MMA subsystem for Power10, there are 8 512-bit accumulator
registers.  These accumulators are each tied to sets of 4 FPR registers.  When
you issue a prime instruction, it makes sure the accumulator is a copy of the 4
FPR registers the accumulator is tied to.  When you issue a deprime
instruction, it makes sure that the accumulator data content is logically
copied to the matching FPR registers.
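
A toy model of the prime and deprime behavior described above, under the
assumption that each accumulator is tied to 4 adjacent 128-bit registers
(4 x 128 = 512 bits).  The struct, the names, and the byte layout are
hypothetical; this models the logical copy only and says nothing about how the
hardware implements it:

```c
#include <string.h>

enum { NUM_REGS = 32, REG_BYTES = 16, NUM_ACC = 8, REGS_PER_ACC = 4 };

struct mma_model
{
  unsigned char reg[NUM_REGS][REG_BYTES];               /* registers 0..31 */
  unsigned char acc[NUM_ACC][REGS_PER_ACC * REG_BYTES]; /* 512-bit accumulators */
};

/* Prime: make accumulator N a copy of the 4 registers it is tied to.  */
void
model_prime (struct mma_model *m, int n)
{
  for (int i = 0; i < REGS_PER_ACC; i++)
    memcpy (m->acc[n] + i * REG_BYTES, m->reg[REGS_PER_ACC * n + i], REG_BYTES);
}

/* Deprime: copy accumulator N back to the 4 registers it is tied to.  */
void
model_deprime (struct mma_model *m, int n)
{
  for (int i = 0; i < REGS_PER_ACC; i++)
    memcpy (m->reg[REGS_PER_ACC * n + i], m->acc[n] + i * REG_BYTES, REG_BYTES);
}
```

In the dense math system the accumulators are separate registers, so this
tie between an accumulator and its 4 registers no longer exists.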

In the potential dense math system, the accumulators are moved to separate
registers called dense math registers (DM registers or DMR).  The DMRs are then
extended to 1,024 bits and new instructions will be added to deal with all
1,024 bits of the DMRs.

If you take existing MMA code, it will work as long as you don't do anything
with accumulators, and you follow the rules in the ISA 3.1 documentation for
using the MMA subsystem.

These patches add support for the 512-bit accumulators within the dense math
system, and for allocation of the 1,024-bit DMRs.  At this time, no additional
built-in functions will be added to support any dense math features other than
doing data movement between the DMRs and the VSX registers.  Before we can look
at adding any new dense math support other than data movement, we need the GCC
compiler to be able to allocate and use these DMRs.

There are 6 patches in this patch set:

1) The first patch just adds -mcpu=future as an option to add new support.
This is similar to the -mcpu=future that we did before power10 was announced.

2) The second patch enables GCC to use the load and store vector pair
instructions to optimize memory copy operations in the compiler.  For power10,
we needed to just stay with normal vector load/stores for memory copy
operations.

3) The third patch enables 512-bit accumulators that are located within DMRs
instead of the FPRs.  This patch enables the register allocation, but it does
not move the existing MMA to use these registers.

4) The fourth patch switches the MMA subsystem to use 512-bit accumulators
within DMRs if you use -mcpu=future.

5) The fifth patch switches the names of the MMA instructions to use the dense
math equivalent name if -mcpu=future.

6) The sixth patch enables using the full 1,024-bit DMRs.  Right now, all you
can do with DMRs is move a VSX register to a DMR register, and move a DMR
register to a VSX register.

In terms of changes, these patches now use the wD constraint for accumulators.
If you compile with -mcpu=power10, the wD constraint will match the equivalent
FPR register that overlaps with the accumulator.  If you compile with
-mcpu=future, the wD constraint will match the DMR register and not the FPR
register.

These patches also modify the print_operand %A output modifier to print out
DMR register numbers if -mcpu=future, and continue to print out the FPR
register number divided by 4 for -mcpu=power10.

In general, if you only use the built-in functions, things work between the two
systems.  If you use extended asm, you will likely need to modify the code.
Going forward, hopefully if you modify your code to use the wD constraint and
%A output modifier, you can write code that switches more easily between the
two systems.

Again, these are preliminary patches for a potential future machine.  Things
will likely change in terms of implementation and usage over time.

Originally these patches were submitted in November 2022:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605581.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[Bug middle-end/111799] [14 Regression] Missed Dead Code Elimination since r14-2365-g2e406f0753e

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111799

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
  Component|tree-optimization   |middle-end
 Ever confirmed|0   |1
   Keywords||TREE
   Last reconfirmed||2023-10-18

--- Comment #5 from Andrew Pinski  ---
(In reply to Theodoros Theodoridis from comment #4)
> Oops, there was a bug in my reduction, here's the fixed code:
> 
> https://godbolt.org/z/shxffzs8E
> 
> void foo(void);
> typedef unsigned short uint16_t;
> static int b;
> static int c;
> static int *f = 
> static int *ad;
> static char(a)(char g, char h) { return g + h; }
> static char(d)(char g, char h) { return g * h; }
> static void(e)(uint16_t g) {
> if (!(((g) >= 1) && ((g) <= 65459))) {
> __builtin_unreachable();
> }
> }
> int main() {
> b = 0;
> for (;; b = 1) {
> char i = d(126 | 1, 205);
> e(i);
> short j;
> int k = *f;
> j = -21;
> for (; j; j = a(j, 7)) e((j ^ k && *f) <= *f);
> if (b) break;
> ad = 
> }
> if (ad)
> ;
> else
> foo();
> ;
> }


Confirmed with this testcase, but what is interesting is that optimizing away
the call to foo does NOT happen at the gimple level but at the RTL level,
and it happens only on x86_64, NOT on aarch64.

on aarch64 for GCC 13, we even have:
mov w0, 1
str wzr, [x1, #:lo12:.LANCHOR0]
cbz w0, .L13

Which obviously should have been removed ...

Anyways this is a missed jump threading that should have happened at the gimple
level. I have not looked into why it is not done.

[Bug middle-end/110986] [14 Regression] aarch64 has support for conditional not (and vectorized conditional not ) after r14-3110-g7fb65f10285

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110986

Andrew Pinski  changed:

   What|Removed |Added

  Attachment #56134|0   |1
is obsolete||

--- Comment #22 from Andrew Pinski  ---
Created attachment 56146
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56146&action=edit
patch under test

[Bug tree-optimization/111791] RISC-V: Strange loop vectorizaion on popcount function

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

--- Comment #6 from Andrew Pinski  ---
(In reply to Vineet Gupta from comment #5)
> (In reply to Robin Dapp from comment #4)
> 
> > Analyzing loop at pr111791.c:8
> > pr111791.c:8:25: note:  === analyze_loop_nest ===
> > pr111791.c:8:25: note:   === vect_analyze_loop_form ===
> > pr111791.c:8:25: note:=== get_loop_niters ===
> > Matching expression match.pd:1919, generic-match-8.cc:27
> > Applying pattern match.pd:1975, generic-match-2.cc:4670
> > Matching expression match.pd:2707, generic-match-4.cc:36
> > Matching expression match.pd:2710, generic-match-3.cc:53
> > Matching expression match.pd:2717, generic-match-2.cc:23
> > Matching expression match.pd:2707, generic-match-4.cc:36
> > Matching expression match.pd:2710, generic-match-3.cc:53
> > Matching expression match.pd:2717, generic-match-2.cc:23
> > Matching expression match.pd:2707, generic-match-4.cc:36
> > Matching expression match.pd:2710, generic-match-3.cc:53
> > Matching expression match.pd:2717, generic-match-2.cc:23
> > Matching expression match.pd:148, generic-match-10.cc:27
> > Matching expression match.pd:148, generic-match-10.cc:27
> > Applying pattern match.pd:4519, generic-match-4.cc:2923
> > Applying pattern match.pd:201, generic-match-4.cc:3103
> > Applying pattern match.pd:3393, generic-match-2.cc:182
> > pr111791.c:8:25: note:   Symbolic number of iterations is (unsigned intD.4)
> > __builtin_popcountlD.1952 (value_4(D))
> 
> Curious, how did you get this debug output - is this just one of
> -fdump-tree-?

The `applying pattern`/`matching expression` comes from `-folding` option of
`-fdump-tree-`. It is enabled with `-all` at the end too.
So in this case it looks like it was:
`-fdump-tree-vect-all` since both __builtin_popcount and the type `unsigned
int` have the decl ID at the end (that is what `D.4` and `D.1952` are).

[Bug tree-optimization/111791] RISC-V: Strange loop vectorizaion on popcount function

2023-10-18 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

--- Comment #5 from Vineet Gupta  ---
(In reply to Robin Dapp from comment #4)

> Analyzing loop at pr111791.c:8
> pr111791.c:8:25: note:  === analyze_loop_nest ===
> pr111791.c:8:25: note:   === vect_analyze_loop_form ===
> pr111791.c:8:25: note:=== get_loop_niters ===
> Matching expression match.pd:1919, generic-match-8.cc:27
> Applying pattern match.pd:1975, generic-match-2.cc:4670
> Matching expression match.pd:2707, generic-match-4.cc:36
> Matching expression match.pd:2710, generic-match-3.cc:53
> Matching expression match.pd:2717, generic-match-2.cc:23
> Matching expression match.pd:2707, generic-match-4.cc:36
> Matching expression match.pd:2710, generic-match-3.cc:53
> Matching expression match.pd:2717, generic-match-2.cc:23
> Matching expression match.pd:2707, generic-match-4.cc:36
> Matching expression match.pd:2710, generic-match-3.cc:53
> Matching expression match.pd:2717, generic-match-2.cc:23
> Matching expression match.pd:148, generic-match-10.cc:27
> Matching expression match.pd:148, generic-match-10.cc:27
> Applying pattern match.pd:4519, generic-match-4.cc:2923
> Applying pattern match.pd:201, generic-match-4.cc:3103
> Applying pattern match.pd:3393, generic-match-2.cc:182
> pr111791.c:8:25: note:   Symbolic number of iterations is (unsigned intD.4)
> __builtin_popcountlD.1952 (value_4(D))

Curious, how did you get this debug output - is this just one of -fdump-tree-?

[Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|WAITING |RESOLVED

--- Comment #10 from Andrew Pinski  ---
There is no atomicity issue here.
If we need the old (or new) value, we use a compare-and-exchange loop to get
the old value. If we don't need it, we use an atomic or.
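
A minimal C11 sketch of the two cases, assuming <stdatomic.h>; the function
names are hypothetical.  The second function spells out the compare-and-exchange
loop by hand to show why the result is still atomic; which instructions the
compiler actually emits for either function depends on the target and options:

```c
#include <stdatomic.h>

/* The old value is not needed, so a plain atomic OR is sufficient.  */
void
set_bits (atomic_uint *word, unsigned mask)
{
  atomic_fetch_or (word, mask);
}

/* The old value is needed.  Written as an explicit compare-and-exchange
   loop: the store only succeeds if *WORD still holds OLD, so the returned
   value is exactly the value that was replaced.  */
unsigned
set_bits_fetch_old (atomic_uint *word, unsigned mask)
{
  unsigned old = atomic_load (word);
  while (!atomic_compare_exchange_weak (word, &old, old | mask))
    ;  /* on failure OLD has been updated to the current value; retry */
  return old;
}
```

Both forms are atomic read-modify-write operations; they differ only in
whether the previous value is made available to the caller.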

[Bug c/101364] ICE: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in c_type_promotes_to, at c/c-typeck.c:278

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101364

Andrew Pinski  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
   Target Milestone|--- |14.0
 Resolution|--- |FIXED

--- Comment #6 from Andrew Pinski  ---
Fixed.

[Bug middle-end/111863] [14 Regression] Wrong code with "-O3 -fno-tree-ccp -fno-tree-dominator-opts -fno-tree-vrp" since r14-1600

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111863

Andrew Pinski  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #12 from Andrew Pinski  ---
Fixed.

[Bug c/101285] [11/12/13/14 Regression] ICE: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in c_safe_arg_type_equiv_p, at c/c-typeck.c:5830

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101285

Andrew Pinski  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Andrew Pinski  ---
.

[Bug c/101285] [11/12/13/14 Regression] ICE: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in c_safe_arg_type_equiv_p, at c/c-typeck.c:5830

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101285

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|11.5|14.0

--- Comment #8 from Andrew Pinski  ---
Fixed on the trunk, this is an error recovery issue so not backporting.

[Bug middle-end/111863] [14 Regression] Wrong code with "-O3 -fno-tree-ccp -fno-tree-dominator-opts -fno-tree-vrp" since r14-1600

2023-10-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111863

--- Comment #11 from CVS Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:b20dbddcc41120144e700c4e3ef1ec396b1c56ab

commit r14-4729-gb20dbddcc41120144e700c4e3ef1ec396b1c56ab
Author: Andrew Pinski 
Date:   Wed Oct 18 10:26:07 2023 -0700

Fix expansion of `(a & 2) != 1`

I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670
where we would remove the `& CST` part if we ended up not calling
expand_single_bit_test.
This fixes the problem by introducing a new variable that will be used
for calling expand_single_bit_test.
As far as I know this can only show up when disabling optimization
passes as the above form would have been optimized away.

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

PR middle-end/111863

gcc/ChangeLog:

* expr.cc (do_store_flag): Don't overwrite arg0
when stripping off `& POW2`.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr111863-1.c: New test.

[Bug c/101364] ICE: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in c_type_promotes_to, at c/c-typeck.c:278

2023-10-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101364

--- Comment #5 from CVS Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:879c91fcccf93681bd7e13290bfbb384cadcd268

commit r14-4728-g879c91fcccf93681bd7e13290bfbb384cadcd268
Author: Andrew Pinski 
Date:   Sat Oct 14 13:40:05 2023 -0700

[c] Fix PR 101364: ICE after error due to diagnose_arglist_conflict not
checking for error

When checking to see if a function declaration has a conflict due to
promotions, there is no test to see if the type was an error mark before
c_type_promotes_to is called.  c_type_promotes_to is not ready for
error_mark and causes an ICE.

This adds a check for error before the call of c_type_promotes_to.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR c/101364

gcc/c/ChangeLog:

* c-decl.cc (diagnose_arglist_conflict): Test for
error mark before calling of c_type_promotes_to.

gcc/testsuite/ChangeLog:

* gcc.dg/pr101364-1.c: New test.

[Bug c/101285] [11/12/13/14 Regression] ICE: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in c_safe_arg_type_equiv_p, at c/c-typeck.c:5830

2023-10-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101285

--- Comment #7 from CVS Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:11e6bcedb41359c69ee790f38b04033d236336a8

commit r14-4727-g11e6bcedb41359c69ee790f38b04033d236336a8
Author: Andrew Pinski 
Date:   Sat Oct 14 13:18:00 2023 -0700

Fix ICE due to c_safe_arg_type_equiv_p not checking for error_mark node

This is a simple error recovery issue introduced when c_safe_arg_type_equiv_p
was added in r8-5312-gc65e18d3331aa999. The issue is that after
an error, an argument type (of a function type) might turn
into an error mark node and c_safe_arg_type_equiv_p was not ready
for that. So this just adds a check for error operand for its
arguments before getting the main variant.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR c/101285

gcc/c/ChangeLog:

* c-typeck.cc (c_safe_arg_type_equiv_p): Return true for error
operands early.

gcc/testsuite/ChangeLog:

* gcc.dg/pr101285-1.c: New test.

[COMMITTED] Fix expansion of `(a & 2) != 1`

2023-10-18 Thread Andrew Pinski
I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670
where we would remove the `& CST` part if we ended up not calling
expand_single_bit_test.
This fixes the problem by introducing a new variable that will be used
for calling expand_single_bit_test.
As far as I know this can only show up when disabling optimization
passes as the above form would have been optimized away.
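
To make the failure concrete, the two functions below contrast the intended
semantics with what the expression effectively becomes once the `& 2` is
dropped.  The second function is an illustration of the described bug, not
code taken from GCC, and both function names are hypothetical:

```c
/* Intended semantics: (a & 2) is either 0 or 2, so it never equals 1 and
   the result is always 1.  */
int
masked_ne (int a)
{
  return (a & 2) != 1;
}

/* Effective semantics once the `& 2` has been stripped: the comparison is
   done on A itself, so the result is 0 when A is 1.  */
int
unmasked_ne (int a)
{
  return a != 1;
}
```

The two functions disagree for a == 1, which is the input the new test
passes to f.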

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

PR middle-end/111863

gcc/ChangeLog:

* expr.cc (do_store_flag): Don't overwrite arg0
when stripping off `& POW2`.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr111863-1.c: New test.
---
 gcc/expr.cc  |  9 +
 gcc/testsuite/gcc.c-torture/execute/pr111863-1.c | 16 
 2 files changed, 21 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr111863-1.c

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 8aed3fc6cbe..763bd82c59f 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -13206,14 +13206,15 @@ do_store_flag (sepops ops, rtx target, machine_mode 
mode)
  || integer_pow2p (arg1))
   && (TYPE_PRECISION (ops->type) != 1 || TYPE_UNSIGNED (ops->type)))
 {
-  wide_int nz = tree_nonzero_bits (arg0);
-  gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
+  tree narg0 = arg0;
+  wide_int nz = tree_nonzero_bits (narg0);
+  gimple *srcstmt = get_def_for_expr (narg0, BIT_AND_EXPR);
   /* If the defining statement was (x & POW2), then use that instead of
 the non-zero bits.  */
   if (srcstmt && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
{
  nz = wi::to_wide (gimple_assign_rhs2 (srcstmt));
- arg0 = gimple_assign_rhs1 (srcstmt);
+ narg0 = gimple_assign_rhs1 (srcstmt);
}
 
   if (wi::popcount (nz) == 1
@@ -13227,7 +13228,7 @@ do_store_flag (sepops ops, rtx target, machine_mode 
mode)
 
  type = lang_hooks.types.type_for_mode (mode, unsignedp);
  return expand_single_bit_test (loc, tcode,
-arg0,
+narg0,
 bitnum, type, target, mode);
}
 }
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111863-1.c 
b/gcc/testsuite/gcc.c-torture/execute/pr111863-1.c
new file mode 100644
index 000..4e27fe631b2
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr111863-1.c
@@ -0,0 +1,16 @@
+/* { dg-options " -fno-tree-ccp -fno-tree-dominator-opts -fno-tree-vrp" } */
+
+__attribute__((noipa))
+int f(int a)
+{
+a &= 2;
+return a != 1;
+}
+int main(void)
+{
+int t = f(1);
+if (!t)
+__builtin_abort();
+__builtin_printf("%d\n",t);
+return 0;
+}
-- 
2.39.3



[PATCH] c++: Make -Wunknown-pragmas controllable by #pragma GCC diagnostic [PR89038]

2023-10-18 Thread Lewis Hyatt
Hello-

The PR points out that my fix for PR53431 was incomplete and did not handle
-Wunknown-pragmas. This is a one-line fix to correct that. Is it OK for
trunk and for a GCC 13 backport, please? Bootstrapped and regtested all
languages on x86-64 Linux. Thanks!

-Lewis

-- >8 --

As noted on the PR, commit r13-1544, the fix for PR53431, did not handle
the specific case of -Wunknown-pragmas, because that warning is issued
during preprocessing, but not by libcpp directly (it comes from the
cb_def_pragma callback).  Address that by handling this pragma in
addition to libcpp pragmas during the early pragma handler.

gcc/c-family/ChangeLog:

PR c++/89038
* c-pragma.cc (handle_pragma_diagnostic_impl):  Handle
-Wunknown-pragmas during early processing.

gcc/testsuite/ChangeLog:

PR c++/89038
* c-c++-common/cpp/Wunknown-pragmas-1.c: New test.
---
 gcc/c-family/c-pragma.cc|  3 ++-
 gcc/testsuite/c-c++-common/cpp/Wunknown-pragmas-1.c | 13 +
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/Wunknown-pragmas-1.c

diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 293311dd4ce..98dfb0f108b 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -963,7 +963,8 @@ handle_pragma_diagnostic_impl ()
   /* option_string + 1 to skip the initial '-' */
   unsigned int option_index = find_opt (data.option_str + 1, lang_mask);
 
-  if (early && !c_option_is_from_cpp_diagnostics (option_index))
+  if (early && !(c_option_is_from_cpp_diagnostics (option_index)
+|| option_index == OPT_Wunknown_pragmas))
 return;
 
   if (option_index == OPT_SPECIAL_unknown)
diff --git a/gcc/testsuite/c-c++-common/cpp/Wunknown-pragmas-1.c b/gcc/testsuite/c-c++-common/cpp/Wunknown-pragmas-1.c
new file mode 100644
index 000..fb58739e2bc
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/Wunknown-pragmas-1.c
@@ -0,0 +1,13 @@
+/* PR c++/89038 */
+/* { dg-additional-options "-Wunknown-pragmas" } */
+
+#pragma oops /* { dg-warning "-:-Wunknown-pragmas" } */
+#pragma GGC diagnostic push /* { dg-warning "-:-Wunknown-pragmas" } */
+#pragma GCC diagnostics push /* { dg-warning "-:-Wunknown-pragmas" } */
+
+/* Test we can disable the warnings.  */
+#pragma GCC diagnostic ignored "-Wunknown-pragmas"
+
+#pragma oops /* { dg-bogus "-:-Wunknown-pragmas" } */
+#pragma GGC diagnostic push /* { dg-bogus "-:-Wunknown-pragmas" } */
+#pragma GCC diagnostics push /* { dg-bogus "-:-Wunknown-pragmas" } */
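
For readers who want the user-visible effect rather than the dejagnu form, here is a minimal sketch of the usage the patch is meant to enable, assuming a compiler with the fix applied. The pragma name `oops` is taken from the new test; the function `answer` is hypothetical and only makes the file self-contained.

```c
/* Compile with: gcc -Wunknown-pragmas -c example.c  */

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunknown-pragmas"
#pragma oops   /* unknown pragma; expected to be silent here with the fix */
#pragma GCC diagnostic pop

/* GCC ignores unknown pragmas, so they do not change what this returns.  */
int answer (void)
{
  return 42;
}
```

The push/pop pair limits the suppression to the enclosed region, so unknown pragmas elsewhere in the file would still be diagnosed.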


Re: [PATCH V2 7/7] aarch64: Add system register duplication check selftest

2023-10-18 Thread Richard Sandiford
Victor Do Nascimento  writes:
> Add a build-time test to check whether system register data, as
> imported from `aarch64-sys-reg.def' has any duplicate entries.
>
> Duplicate entries are defined as any two SYSREG entries in the .def
> file which share the same encoding values (as specified by its `CPENC'
> field) and where the relationship amongst the two does not fit into
> one of the following categories:
>
>   * Simple aliasing: In some cases, it is observed that one
>   register name serves as an alias to another.  One example of
>   this is where TRCEXTINSELR aliases TRCEXTINSELR0.
>   * Expressing intent: It is possible that when a given register
>   serves two distinct functions depending on how it is used, it
>   is given two distinct names whose use should match the context
>   under which it is being used.  Example:  Debug Data Transfer
>   Register. When used to receive data, it should be accessed as
>   DBGDTRRX_EL0 while when transmitting data it should be
>   accessed via DBGDTRTX_EL0.
>   * Register deprecation: Some register names have been
>   deprecated and should no longer be used, but backwards-
>   compatibility requires that such names continue to be
>   recognized, as is the case for the SPSR_EL1 register, whose
>   access via the SPSR_SVC name is now deprecated.
>   * Same encoding different target: Some encodings are given
>   different meaning depending on the target architecture and, as
>   such, are given different names in each of these contexts.
>   We see an example of this for CPENC(3,4,2,0,0), which
>   corresponds to TTBR0_EL2 for Armv8-A targets and VSCTLR_EL2
>   in Armv8-R targets.
>
> A consequence of these observations is that `CPENC' duplication is
> acceptable iff at least one of the `properties' or `arch_reqs' fields
> of the `sysreg_t' structs associated with the two registers in
> question differ and it's this condition that is checked by the new
> `aarch64_test_sysreg_encoding_clashes' function.
>
> gcc/ChangeLog:
>
>   * gcc/config/aarch64/aarch64.cc
>   (aarch64_test_sysreg_encoding_clashes): New.
>   (aarch64_run_selftests): add call to
>   aarch64_test_sysreg_encoding_clashes selftest.
> ---
>  gcc/config/aarch64/aarch64.cc | 53 +++
>  1 file changed, 53 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index d187e171beb..e0be2877ede 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -22,6 +22,7 @@
>  
>  #define INCLUDE_STRING
>  #define INCLUDE_ALGORITHM
> +#define INCLUDE_VECTOR
>  #include "config.h"
>  #include "system.h"
>  #include "coretypes.h"
> @@ -28332,6 +28333,57 @@ aarch64_test_fractional_cost ()
>ASSERT_EQ (cf (1, 2).as_double (), 0.5);
>  }
>  
> +/* Calculate whether our system register data, as imported from
> +   `aarch64-sys-reg.def' has any duplicate entries.  */
> +static void
> +aarch64_test_sysreg_encoding_clashes (void)
> +{
> +  using dup_counters_t = hash_map;
> +  using dup_instances_t = hash_map +std::vector>;
> +
> +  dup_counters_t duplicate_counts;
> +  dup_instances_t duplicate_instances;
> +
> +  /* Every time an encoding is established to come up more than once
> +  we add it to a "clash-analysis queue", which is then used to extract
> +  necessary information from our hash map when establishing whether
> +  repeated encodings are valid.  */

Formatting nit, sorry, but second and subsequent lines should be
indented to line up with the "E".

> +
> +  /* 1) Collect recurrence information.  */
> +  std::vector testqueue;
> +
> +  for (unsigned i = 0; i < nsysreg; i++)
> +{
> +  const sysreg_t *reg = sysreg_structs + i;
> +
> +  unsigned *tbl_entry = &duplicate_counts.get_or_insert (reg->encoding);
> +  *tbl_entry += 1;
> +
> +  std::vector *tmp
> + = &duplicate_instances.get_or_insert (reg->encoding);
> +
> +  tmp->push_back (reg);
> +  if (*tbl_entry > 1)
> +   testqueue.push_back (reg->encoding);
> +}

Do we need two hash maps here?  It looks like the length of the vector
is always equal to the count.  Also...

> +
> +  /* 2) Carry out analysis on collected data.  */
> +  for (auto enc : testqueue)

...hash_map itself is iterable.  We could iterate over that instead,
which would avoid the need for the queue.

> +{
> +  unsigned nrep = *duplicate_counts.get (enc);
> +  for (unsigned i = 0; i < nrep; i++)
> + for (unsigned j = i+1; j < nrep; j++)

Formatting nit, but "i + 1" rather than "i+1".

Overall, it looks like really nice work.  Thanks for doing this.

Richard

> +   {
> + std::vector *tmp2 = duplicate_instances.get (enc);
> + const sysreg_t *a = (*tmp2)[i];
> + const sysreg_t *b = (*tmp2)[j];
> + ASSERT_TRUE ((a->properties != b->properties)
> +  || (a->arch_reqs != 
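
The acceptance rule described in the commit message (a shared encoding is fine as long as `properties' or `arch_reqs' differ) can be sketched outside GCC with a plain pairwise scan. This is not the patch's hash_map-based implementation: `struct toy_sysreg` and both functions are hypothetical stand-ins, with field names borrowed from the quoted code and simplified types.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the patch's sysreg_t.  */
struct toy_sysreg
{
  const char *name;
  const char *encoding;
  unsigned properties;
  unsigned long long arch_reqs;
};

/* Return 1 if A and B share an encoding while having identical
   properties and arch_reqs, i.e. an unjustified duplicate.  */
int toy_sysreg_clash_p (const struct toy_sysreg *a,
                        const struct toy_sysreg *b)
{
  return strcmp (a->encoding, b->encoding) == 0
         && a->properties == b->properties
         && a->arch_reqs == b->arch_reqs;
}

/* Count clashing pairs in REGS[0..N-1] by comparing every pair once.  */
size_t toy_count_clashes (const struct toy_sysreg *regs, size_t n)
{
  size_t clashes = 0;
  for (size_t i = 0; i < n; i++)
    for (size_t j = i + 1; j < n; j++)
      if (toy_sysreg_clash_p (&regs[i], &regs[j]))
        clashes++;
  return clashes;
}
```

A selftest built on this idea would assert that the clash count over the whole table is zero. The quadratic scan is only reasonable for a small table; grouping by encoding first, as the patch does, avoids comparing unrelated entries.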

[Bug bootstrap/111601] [14 Regression] bootstrap fails in stagestrain in libcody on x86_64-linux-gnu and powerpc64le-linux-gnu

2023-10-18 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111601

--- Comment #6 from Peter Bergner  ---
(In reply to Andrew Pinski from comment #5)
> (In reply to Peter Bergner from comment #4) 
> > CCing richi and jakub to see if they've seen anything like this before?
> 
> I suspect we are miscompiling the final compiler somehow. I linked 2 other
> reports which reported that PGO is causing wrong code; I have not looked
> into confirming them yet though.

Thanks and yes, I agree.  Luckily those test cases are MUCH smaller than gcc
itself.  Hopefully the bug is the same!

Re: [PATCH V2 6/7] aarch64: Add front-end argument type checking for target builtins

2023-10-18 Thread Richard Sandiford
Victor Do Nascimento  writes:
> In implementing the ACLE read/write system register builtins it was
> observed that leaving argument type checking to be done at expand-time
> meant that poorly-formed function calls were being "fixed" by certain
> optimization passes, meaning bad code wasn't being properly picked up
> in checking.
>
> Example:
>
>   const char *regname = "amcgcr_el0";
>   long long a = __builtin_aarch64_rsr64 (regname);
>
> is reduced by the ccp1 pass to
>
>   long long a = __builtin_aarch64_rsr64 ("amcgcr_el0");
>
> As these functions require an argument of STRING_CST type, there needs
> to be a check carried out by the front-end capable of picking this up.
>
> The introduced `check_general_builtin_call' function will be called by
> the TARGET_CHECK_BUILTIN_CALL hook whenever a call to a builtin
> belonging to the AARCH64_BUILTIN_GENERAL category is encountered,
> carrying out any appropriate checks associated with a particular
> builtin function code.
>
> gcc/ChangeLog:
>
>   * gcc/config/aarch64/aarch64-builtins.cc (check_general_builtin_call):
>   New.
>   * gcc/config/aarch64/aarch64-c.cc (aarch64_check_builtin_call):
>   Add check_general_builtin_call call.
>   * gcc/config/aarch64/aarch64-protos.h (check_general_builtin_call):
>   New.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c: New.
> ---
>  gcc/config/aarch64/aarch64-builtins.cc| 33 +++
>  gcc/config/aarch64/aarch64-c.cc   |  4 +--
>  gcc/config/aarch64/aarch64-protos.h   |  3 ++
>  .../gcc.target/aarch64/acle/rwsr-2.c  | 15 +
>  4 files changed, 53 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> index d8bb2a989a5..6734361f4f4 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -2126,6 +2126,39 @@ aarch64_general_builtin_decl (unsigned code, bool)
>return aarch64_builtin_decls[code];
>  }
>  
> +bool
> +check_general_builtin_call (location_t location, vec,
> + unsigned int code, tree fndecl,
> + unsigned int nargs ATTRIBUTE_UNUSED, tree *args)
> +{

How about aarch64_general_check_builtin_call?  It's better to use
aarch64_* prefixes where possible.

> +  switch (code)
> +{
> +case AARCH64_RSR:
> +case AARCH64_RSRP:
> +case AARCH64_RSR64:
> +case AARCH64_RSRF:
> +case AARCH64_RSRF64:
> +case AARCH64_WSR:
> +case AARCH64_WSRP:
> +case AARCH64_WSR64:
> +case AARCH64_WSRF:
> +case AARCH64_WSRF64:
> +  if (TREE_CODE (args[0]) == VAR_DECL
> +   || TREE_CODE (TREE_TYPE (args[0])) != POINTER_TYPE
> +   || TREE_CODE (TREE_OPERAND (TREE_OPERAND (args[0], 0) , 0))
> +   != STRING_CST)

Similarly to the expand code in 5/7, I think this should check
positively for specific tree codes rather than negatively for a
VAR_DECL.  That is, we should ensure TREE_CODE (x) is something
(rather than isn't something) before accessing TREE_OPERAND (x, 0).

> + {
> +   const char  *fn_name, *err_msg;
> +   fn_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
> +   err_msg = "first argument to %<%s%> must be a string literal";
> +   error_at (location, err_msg, fn_name);

The error message needs to remain part of the error_at call,
since being in error_at ensures that it gets picked up for translation.
It's simpler to use %qD rather than %<%s%>, and pass fndecl directly.

> +   return false;
> + }
> +}
> +  /* Default behavior.  */
> +  return true;
> +}
> +
>  typedef enum
>  {
>SIMD_ARG_COPY_TO_REG,
> diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
> index ab8844f6049..c2a9a59df73 100644
> --- a/gcc/config/aarch64/aarch64-c.cc
> +++ b/gcc/config/aarch64/aarch64-c.cc
> @@ -339,8 +339,8 @@ aarch64_check_builtin_call (location_t loc, vec arg_loc,
>switch (code & AARCH64_BUILTIN_CLASS)
>  {
>  case AARCH64_BUILTIN_GENERAL:
> -  return true;
> -
> +  return check_general_builtin_call (loc, arg_loc, subcode, orig_fndecl,
> +  nargs, args);
>  case AARCH64_BUILTIN_SVE:
>return aarch64_sve::check_builtin_call (loc, arg_loc, subcode,
> orig_fndecl, nargs, args);
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index a134e2fcf8e..9ef96ff511f 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -990,6 +990,9 @@ tree aarch64_general_builtin_rsqrt (unsigned int);
>  void handle_arm_acle_h (void);
>  void handle_arm_neon_h (void);
>  
> +bool check_general_builtin_call (location_t, vec, unsigned int,
> +   tree, unsigned int, 
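
The review point about checking positively for tree codes can be illustrated with a toy model. Nothing below is GCC's tree API: `enum toy_code`, `struct toy_node` and `toy_string_literal_arg_p` are hypothetical, and the one-level "address of a string" shape is an assumption made for brevity rather than the exact shape the builtin sees.

```c
#include <stddef.h>

/* Toy expression node; not GCC's tree type.  */
enum toy_code { TOY_VAR, TOY_INT, TOY_ADDR, TOY_STRING };

struct toy_node
{
  enum toy_code code;
  const struct toy_node *operand;  /* only meaningful for TOY_ADDR */
};

/* Return 1 only when ARG is the address of a string literal.  Each level
   is checked positively before its operand is read, so a TOY_VAR or
   TOY_INT node is rejected without ever touching `operand'.  */
int toy_string_literal_arg_p (const struct toy_node *arg)
{
  if (arg == NULL || arg->code != TOY_ADDR)
    return 0;
  const struct toy_node *inner = arg->operand;
  return inner != NULL && inner->code == TOY_STRING;
}
```

Testing for "is a TOY_ADDR" rather than "is not a TOY_VAR" means that node kinds nobody thought of (here TOY_INT) fall into the rejecting path instead of having a meaningless operand dereferenced.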

[Bug driver/103398] configure: Enable --enable-default-pie by default for Linux

2023-10-18 Thread mark.esler at canonical dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103398

Mark Esler  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mark.esler at canonical dot com

--- Comment #4 from Mark Esler  ---
(In reply to Andrew Pinski from comment #1)
> No. The whole reason why there is an option is because it is optional.

Could this issue be re-considered?

Or should -fhardened and related endeavors all re-suggest the specific flags
that --enable-default-pie provides?

Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-18 Thread Qing Zhao


> On Oct 5, 2023, at 4:08 PM, Siddhesh Poyarekar  wrote:
> 
> On 2023-08-25 11:24, Qing Zhao wrote:
>> This is the 3rd version of the patch, per our discussion based on the
>> review comments for the 1st and 2nd version, the major changes in this
>> version are:
> 
> Hi Qing,
> 
> I hope the review was helpful.  Overall, a couple of things to consider:
> 
> 1. How would you handle potential reordering between assignment of the size 
> to the counted_by field with the __bdos call that may consume it? You'll 
> probably need to express some kind of dependency there or in the worst case, 
> insert a barrier to disallow reordering.

Good point! 

So, your example in the response to [V3][PATCH 2/3] Use the counted_by atribute 
info in builtin object size [PR108896]:
“
Maybe another test where the allocation, size assignment and __bdos call happen 
in the same function, where the allocator is not recognized by gcc:

void *
__attribute__ ((noinline))
alloc (size_t sz)
{
 return __builtin_malloc (sz);
}

void test (size_t sz)
{
 array_annotated = alloc (sz);
 array_annotated->b = sz;
 return __builtin_dynamic_object_size (array_annotated->c, 1);
}

The interesting thing to test (and ensure in the codegen) is that the 
assignment to array_annotated->b does not get reordered to below the 
__builtin_dynamic_object_size call since technically there is no data 
dependency between the two.
“
I will test this.

I am not sure whether the current GCC alias analysis is able to distinguish one 
field of a structure from another field of the same structure. If yes, then
we need to add an explicit dependency edge from the write to 
“array_annotated->b” to the call to 
“__builtin_dynamic_object_size(array_annotated->c,1)”.
I will check on this and see how to resolve this issue.

I guess the possible solution is that we can add an implicit ref to 
“array_annotated->b” at the call to 
“__builtin_dynamic_object_size(array_annotated->c, 1)” if the counted_by 
attribute is available. That should resolve the issue.

Richard, what do you think about this?

> 
> 2. How would you handle signedness of the size field?  The size gets 
> converted to sizetype everywhere it is used and overflows/underflows may 
> produce interesting results.  Do you want to limit the types to unsigned or 
> do you want to add a disclaimer in the docs?  The former seems like the 
> *right* thing to do given that it is a new feature; best to enforce the 
> cleaner habit at the outset.

As I replied to Martin in another email, I plan to do the following to resolve 
this issue:

1. No specification for signed or unsigned for counted_by field.
2. Add a sanitizer option -fsanitize=counted-by-bound to catch the cases when 
the size of the counted-by is not positive.

Then, we will be consistent with the handling of VLA. 

So, I will not change anything for the current patch.
However, I will add the sanitizer option in a followup patch set.

Let me know your opinion.

thanks.

Qing

> 
> Thanks,
> Sid
> 
>> ***Against 1st version:
>> 1. change the name "element_count" to "counted_by";
>> 2. change the parameter for the attribute from a STRING to an
>> Identifier;
>> 3. Add logic and testing cases to handle anonymous structure/unions;
>> 4. Clarify documentation to permit the situation when the allocation
>> size is larger than what's specified by "counted_by", at the same time,
>> it's user's error if allocation size is smaller than what's specified by
>> "counted_by";
>> 5. Add a complete testing case for using counted_by attribute in
>> __builtin_dynamic_object_size when there is mismatch between the
>> allocation size and the value of "counted_by", the expecting behavior
>> for each case and the explanation on why in the comments.
>> ***Against 2nd version:
>> 1. Identify a tree node sharing issue and fixed it in the routine
>>"component_ref_get_counted_ty" of tree.cc;
>> 2. Update the documentation and testing cases with the clear usage
>>    of the formula to compute the allocation size:
>> MAX (sizeof (struct A), offsetof (struct A, array[0]) + counted_by * sizeof(element))
>>(the algorithm used in tree-object-size.cc is correct).
>> In this set of patches, the major functionality provided is:
>> 1. a new attribute "counted_by";
>> 2. use this new attribute in bound sanitizer;
>> 3. use this new attribute in dynamic object size for subobject size;
>> As discussed, I plan to add two more separate patches sets after this initial
>> patch set is approved and committed.
>> set 1. A new warning option and a new sanitizer option for the user error
>>   when the allocation size is smaller than the value of "counted_by".
>> set 2. An improvement to __builtin_dynamic_object_size  for whole-object
>>   size of the structure with FAM annotated with counted_by.
>> There are also some existing bugs in tree-object-size.cc identified
>> during the study, and PRs were filed to record them. These bugs will
>> be fixed separately with individual patches:
>> 
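
The allocation-size formula quoted above can be written out in plain C. This is only a sketch: `struct annotated` follows the layout used in the tests of this series but omits the counted_by attribute so that it builds with any compiler, and `annotated_alloc_size` is a hypothetical helper, not something the patch adds.

```c
#include <stddef.h>

struct annotated
{
  int b;
  int c[];   /* the series adds __attribute__ ((counted_by (b))) here */
};

/* MAX (sizeof (struct annotated),
        offsetof (struct annotated, c) + count * sizeof (int)).  */
size_t annotated_alloc_size (size_t count)
{
  size_t with_array = offsetof (struct annotated, c) + count * sizeof (int);
  return with_array > sizeof (struct annotated)
         ? with_array
         : sizeof (struct annotated);
}
```

The MAX matters for small counts: when a structure has trailing padding, the offset of the flexible array plus a few elements can be smaller than sizeof the structure, and the allocation must still cover the whole structure.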

Re: [PATCH V2 4/7] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-18 Thread Richard Sandiford
Victor Do Nascimento  writes:
> Motivated by the need to print system register names in output
> assembly, this patch adds the required logic to
> `aarch64_print_operand' to accept rtxs of type CONST_STRING and
> process these accordingly.
>
> Consequently, an rtx such as:
>
>   (set (reg/i:DI 0 x0)
>  (unspec:DI [(const_string ("s3_3_c13_c2_2"))])
>
> can now be output correctly using the following output pattern when
> composing `define_insn's:
>
>   "mrs\t%x0, %1"
>
> gcc/ChangeLog
>
>   * gcc/config/aarch64/aarch64.cc (aarch64_print_operand): Add
>   support for CONST_STRING.
> ---
>  gcc/config/aarch64/aarch64.cc | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 816c4b69fc8..d187e171beb 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -12430,6 +12430,12 @@ aarch64_print_operand (FILE *f, rtx x, int code)
>  
>switch (GET_CODE (x))
>   {
> + case CONST_STRING:
> +   {
> + const char *output_op = XSTR (x, 0);
> + asm_fprintf (f, "%s", output_op);
> + break;
> +   }

LGTM, but it seems slightly neater to avoid the temporary:

case CONST_STRING:
  asm_fprintf (f, "%s", XSTR (x, 0));
  break;

(Sorry for the micro-comment.)

Thanks,
Richard

>   case REG:
> if (aarch64_sve_data_mode_p (GET_MODE (x)))
>   {


Re: [PATCH V2 2/7] aarch64: Add support for aarch64-sys-regs.def

2023-10-18 Thread Richard Sandiford
Victor Do Nascimento  writes:
> This patch defines the structure of a new .def file used for
> representing the aarch64 system registers, what information it should
> hold and the basic framework in GCC to process this file.
>
> Entries in the aarch64-system-regs.def file should be as follows:
>
>   SYSREG (NAME, CPENC (sn,op1,cn,cm,op2), FLAG1 | ... | FLAGn, ARCH)
>
> Where the arguments to SYSREG correspond to:
>   - NAME:  The system register name, as used in the assembly language.
>   - CPENC: The system register encoding, mapping to:
>
>  s<sn>_<op1>_c<cn>_c<cm>_<op2>
>
>   - FLAG: The entries in the FLAGS field are bitwise-OR'd together to
> encode extra information required to ensure proper use of
> the system register.  For example, a read-only system
> register will have the flag F_REG_READ, while write-only
> registers will be labeled F_REG_WRITE.  Such flags are
> tested against at compile-time.
>   - ARCH: The architectural features the system register is associated
> with.  This is encoded via one of three possible macros:
> 1. When a system register is universally implemented, we say
> it has no feature requirements, so we tag it with the
> AARCH64_NO_FEATURES macro.
> 2. When a register is only implemented for a single
> architectural extension EXT, the AARCH64_FEATURE (EXT), is
> used.
> 3. When a given system register is made available by any of N
> possible architectural extensions, the AARCH64_FEATURES(N, ...)
> macro is used to combine them accordingly.
>
> In order to enable proper interpretation of the SYSREG entries by the
> compiler, flags defining system register behavior such as `F_REG_READ'
> and `F_REG_WRITE' are also defined here, so they can later be used for
> the validation of system register properties.
>
> Finally, any architectural feature flags from Binutils missing from GCC
> have appropriate aliases defined here so as to ensure
> cross-compatibility of SYSREG entries across the toolchain.
>
> gcc/ChangeLog:
>
>   * gcc/config/aarch64/aarch64.cc (sysreg_t): New.
>   (sysreg_structs): Likewise.
>   (nsysreg): Likewise.
>   (AARCH64_FEATURE): Likewise.
>   (AARCH64_FEATURES): Likewise.
>   (AARCH64_NO_FEATURES): Likewise.
>   * gcc/config/aarch64/aarch64.h (AARCH64_ISA_V8A): Add missing
>   ISA flag.
>   (AARCH64_ISA_V8_1A): Likewise.
>   (AARCH64_ISA_V8_7A): Likewise.
>   (AARCH64_ISA_V8_8A): Likewise.
>   (AARCH64_NO_FEATURES): Likewise.
>   (AARCH64_FL_RAS): New ISA flag alias.
>   (AARCH64_FL_LOR): Likewise.
>   (AARCH64_FL_PAN): Likewise.
>   (AARCH64_FL_AMU): Likewise.
>   (AARCH64_FL_SCXTNUM): Likewise.
>   (AARCH64_FL_ID_PFR2): Likewise.
>   (F_DEPRECATED): New.
>   (F_REG_READ): Likewise.
>   (F_REG_WRITE): Likewise.
>   (F_ARCHEXT): Likewise.
>   (F_REG_ALIAS): Likewise.
> ---
>  gcc/config/aarch64/aarch64.cc | 38 +++
>  gcc/config/aarch64/aarch64.h  | 36 +
>  2 files changed, 74 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 9fbfc548a89..69de2366424 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -2807,6 +2807,44 @@ static const struct processor all_cores[] =
>{NULL, aarch64_none, aarch64_none, aarch64_no_arch, 0, NULL}
>  };
>  
> +typedef struct {
> +  const char* name;
> +  const char* encoding;

Formatting nit, but GCC style is:

  const char *foo

rather than:

  const char* foo;

> +  const unsigned properties;
> +  const unsigned long long arch_reqs;

I don't think these two should be const.  There's no reason in principle
why a sysreg_t can't be created and modified dynamically.

It would be useful to have some comments above the fields to say what
they represent.  E.g. the definition on its own doesn't make clear what
"properties" refers to.

arch_reqs should use aarch64_feature_flags rather than unsigned long long.
We're running out of feature flags in GCC too, so aarch64_feature_flags
is soon likely to be a C++ class.

> +} sysreg_t;
> +
> +/* An aarch64_feature_set initializer for a single feature,
> +   AARCH64_FEATURE_<FEAT>.  */
> +#define AARCH64_FEATURE(FEAT) AARCH64_FL_##FEAT
> +
> +/* Used by AARCH64_FEATURES.  */
> +#define AARCH64_OR_FEATURES_1(X, F1) \
> +  AARCH64_FEATURE (F1)
> +#define AARCH64_OR_FEATURES_2(X, F1, F2) \
> +  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_1 (X, F2))
> +#define AARCH64_OR_FEATURES_3(X, F1, ...) \
> +  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_2 (X, __VA_ARGS__))
> +
> +/* An aarch64_feature_set initializer for the N features listed in "...".  */
> +#define AARCH64_FEATURES(N, ...) \
> +  AARCH64_OR_FEATURES_##N (0, __VA_ARGS__)
> +
> +/* Database of system registers, their encodings and architectural
> +   requirements.  */
> +const sysreg_t 
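
To make the CPENC field order concrete, here is a small sketch that formats the five fields into the generic register spelling, assuming the s<sn>_<op1>_c<cn>_c<cm>_<op2> form implied by the argument list above and by the name s3_3_c13_c2_2 quoted in patch 4/7. Both helper functions are hypothetical and are not part of the series.

```c
#include <stdio.h>
#include <string.h>

/* Format the generic register name for (sn, op1, cn, cm, op2) into BUF.
   Returns whatever snprintf returns.  */
int toy_sysreg_generic_name (char *buf, size_t size, unsigned sn,
                             unsigned op1, unsigned cn, unsigned cm,
                             unsigned op2)
{
  return snprintf (buf, size, "s%u_%u_c%u_c%u_%u", sn, op1, cn, cm, op2);
}

/* Return 1 if EXPECTED is the generic name for the given fields.  */
int toy_sysreg_name_matches (const char *expected, unsigned sn, unsigned op1,
                             unsigned cn, unsigned cm, unsigned op2)
{
  char buf[32];
  int len = toy_sysreg_generic_name (buf, sizeof buf, sn, op1, cn, cm, op2);
  /* Treat an encoding error or a truncated name as a mismatch.  */
  if (len < 0 || (size_t) len >= sizeof buf)
    return 0;
  return strcmp (buf, expected) == 0;
}
```

For example, toy_sysreg_name_matches ("s3_3_c13_c2_2", 3, 3, 13, 2, 2) returns 1.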

Re: [PATCH] libcpp: testsuite: Add test for fixed _Pragma bug [PR82335]

2023-10-18 Thread Lewis Hyatt
May I please ping this one, and/or, is it something straightforward
enough I can just commit it as obvious? Thanks!
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631814.html

-Lewis

On Mon, Oct 2, 2023 at 6:23 PM Lewis Hyatt  wrote:
>
> Hello-
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82335 is another
> _Pragma-related bug that got fixed in GCC 12 but is still open. Before
> closing it out, I thought it would be good to add the testcase from that
> PR, which we don't have exactly in the testsuite already. Is it OK please?
> Thanks!
>
> -Lewis
>
> -- >8 --
>
> This PR was fixed by r12-4797 and r12-5454. Add test coverage from the PR
> that is not represented elsewhere.
>
> gcc/testsuite/ChangeLog:
>
> PR preprocessor/82335
> * c-c++-common/cpp/diagnostic-pragma-3.c: New test.
> ---
>  .../c-c++-common/cpp/diagnostic-pragma-3.c| 37 +++
>  1 file changed, 37 insertions(+)
>  create mode 100644 gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-3.c
>
> diff --git a/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-3.c b/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-3.c
> new file mode 100644
> index 000..459dcec73b3
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-3.c
> @@ -0,0 +1,37 @@
> +/* This is like diagnostic-pragma-2.c, but handles the case where everything
> +   is wrapped inside a macro, which previously caused additional issues tracked
> +   in PR preprocessor/82335.  */
> +
> +/* { dg-do compile } */
> +/* { dg-additional-options "-save-temps -Wattributes -Wtype-limits" } */
> +
> +#define B _Pragma("GCC diagnostic push") \
> +  _Pragma("GCC diagnostic ignored \"-Wattributes\"")
> +#define E _Pragma("GCC diagnostic pop")
> +
> +#define X() B int __attribute((unknown_attr)) x; E
> +#define Y   B int __attribute((unknown_attr)) y; E
> +#define WRAP(x) x
> +
> +void test1(void)
> +{
> +  WRAP(X())
> +  WRAP(Y)
> +}
> +
> +/* Additional test provided on the PR.  */
> +#define PRAGMA(...) _Pragma(#__VA_ARGS__)
> +#define PUSH_IGN(X) PRAGMA(GCC diagnostic push) PRAGMA(GCC diagnostic ignored X)
> +#define POP() PRAGMA(GCC diagnostic pop)
> +#define TEST(X, Y) \
> +  PUSH_IGN("-Wtype-limits") \
> +  int Y = (__typeof(X))-1 < 0; \
> +  POP()
> +
> +int test2()
> +{
> +  unsigned x;
> +  TEST(x, i1);
> +  WRAP(TEST(x, i2))
> +  return i1 + i2;
> +}
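
A reduced, runnable variant of the second half of the test may help show what the macros are for: the _Pragma operator lets a macro carry its own diagnostic push/ignore/pop around a comparison that would otherwise trigger -Wtype-limits. The macro and function names below are hypothetical, and `__typeof` is a GNU extension, so this assumes GCC or a compatible compiler.

```c
#define DO_PRAGMA(x) _Pragma (#x)
#define NO_TYPE_LIMITS_BEGIN \
  DO_PRAGMA (GCC diagnostic push) \
  DO_PRAGMA (GCC diagnostic ignored "-Wtype-limits")
#define NO_TYPE_LIMITS_END DO_PRAGMA (GCC diagnostic pop)

/* Declare RESULT as 1 if the type of X is signed, 0 otherwise.  For an
   unsigned X the comparison is always false, which is what
   -Wtype-limits would normally complain about.  */
#define IS_SIGNED_TYPE(X, RESULT) \
  NO_TYPE_LIMITS_BEGIN \
  int RESULT = (__typeof (X)) -1 < 0; \
  NO_TYPE_LIMITS_END

int unsigned_is_signed (void)
{
  unsigned x = 0;
  IS_SIGNED_TYPE (x, r);
  (void) x;
  return r;
}

int int_is_signed (void)
{
  int x = 0;
  IS_SIGNED_TYPE (x, r);
  (void) x;
  return r;
}
```

With the fixes referenced above in place, compiling this with -Wtype-limits should produce no warning from either function, including when the macro invocation is itself wrapped in another macro as the test does.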


[Bug bootstrap/111601] [14 Regression] bootstrap fails in stagestrain in libcody on x86_64-linux-gnu and powerpc64le-linux-gnu

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111601

--- Comment #5 from Andrew Pinski  ---
(In reply to Peter Bergner from comment #4) 
> CCing richi and jakub to see if they've seen anything like this before?

I suspect we are miscompiling the final compiler somehow. I linked 2 other
reports which reported that PGO is causing wrong code; I have not looked into
confirming them yet though.

[Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

--- Comment #9 from Andrew Pinski  ---
If I change your testcase to be:
uint64_t huh2 (_Atomic(uint64_t)* map, int t) {
   return atomic_fetch_or_explicit(map, t, memory_order_relaxed);
}

You will see that it does the `lock cmpxchg` loop too.

[Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

--- Comment #8 from Andrew Pinski  ---
On aarch64, ldset does both a load and an ior; that is unlike the `lock or` on
x86.

[Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

--- Comment #7 from Andrew Pinski  ---
That is, when the fetch part is not used, it is optimized to just `lock or`.

[Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

--- Comment #6 from Andrew Pinski  ---
If you don't use the return value of atomic_fetch_or_explicit, there is no need
for a compare-and-exchange (swap) loop. If you need the fetch part, the
compare-and-exchange loop needs to be used as `lock or` does not provide that.


I still don't understand what exactly you are saying is wrong except you want
to use `lock or` in both cases but you can't since that will not get the right
atomic fetch part of the atomic_fetch_or.
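
The two situations being contrasted in these comments can be put side by side in C11. This is a sketch with made-up function names; which instructions a given compiler picks is as described in the comments, not something the code itself guarantees.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Result unused: only the OR side effect is needed, so no value has to
   be returned from the atomic operation.  */
void set_bits (_Atomic (uint64_t) *map, uint64_t bits)
{
  atomic_fetch_or_explicit (map, bits, memory_order_relaxed);
}

/* Result used: the caller needs the value that was in *MAP before the
   OR, which is the "fetch" part discussed above.  */
uint64_t set_bits_fetch_old (_Atomic (uint64_t) *map, uint64_t bits)
{
  return atomic_fetch_or_explicit (map, bits, memory_order_relaxed);
}
```

Both functions perform the same atomic update; they differ only in whether the previous value is observed, which is what decides between a plain `lock or` and a compare-and-exchange loop in the x86 code discussed here.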

[Bug bootstrap/111601] [14 Regression] bootstrap fails in stagestrain in libcody on x86_64-linux-gnu and powerpc64le-linux-gnu

2023-10-18 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111601

Peter Bergner  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org

--- Comment #4 from Peter Bergner  ---
(In reply to Peter Bergner from comment #3)
> I'll try and see if I can reduce the test case.

cvise reduced this down to:


bergner@ltcden2-lp1:$ cat pr111601.ii 
struct param_type {
  param_type() : param_type(0.5) { }
  param_type(double);
};

bergner@ltcden2-lp1:$
/home/bergner/gcc/build/gcc-fsf-mainline-pr111601-regtest/./gcc/xgcc
-B/home/bergner/gcc/build/gcc-fsf-mainline-pr111601-regtest/./gcc
-shared-libgcc -fno-checking -x c++-header -nostdinc++ -O2 -S pr111601.ii
pr111601.ii: In constructor ‘param_type::param_type()’:
pr111601.ii:2:32: internal compiler error: tree check: expected tree that
contains ‘decl common’ structure, have ‘’ in
build_new_method_call, at cp/call.cc:11630
2 |   param_type() : param_type(0.5) { }
  |^
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
See  for instructions.

I'll note that a xgcc built without using "make profiledbootstrap-lean" does
not ICE.

CCing richi and jakub to see if they've seen anything like this before?

Re: [V3][PATCH 2/3] Use the counted_by atribute info in builtin object size [PR108896]

2023-10-18 Thread Qing Zhao
Hi, Sid,

Thanks a lot for the detailed comments.

See my responses embedded below.

Qing

> On Oct 5, 2023, at 4:01 PM, Siddhesh Poyarekar  wrote:
> 
> 
> 
> On 2023-08-25 11:24, Qing Zhao wrote:
>> Use the counted_by attribute info in builtin object size to compute the
>> subobject size for flexible array members.
>> gcc/ChangeLog:
>>  PR C/108896
>>  * tree-object-size.cc (addr_object_size): Use the counted_by
>>  attribute info.
>>  * tree.cc (component_ref_has_counted_by_p): New function.
>>  (component_ref_get_counted_by): New function.
>>  * tree.h (component_ref_has_counted_by_p): New prototype.
>>  (component_ref_get_counted_by): New prototype.
>> gcc/testsuite/ChangeLog:
>>  PR C/108896
>>  * gcc.dg/flex-array-counted-by-2.c: New test.
>>  * gcc.dg/flex-array-counted-by-3.c: New test.
>> ---
>>  .../gcc.dg/flex-array-counted-by-2.c  |  74 ++
>>  .../gcc.dg/flex-array-counted-by-3.c  | 210 ++
>>  gcc/tree-object-size.cc   |  37 ++-
>>  gcc/tree.cc   |  95 +++-
>>  gcc/tree.h|  10 +
>>  5 files changed, 418 insertions(+), 8 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
>>  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
>> diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-2.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
>> new file mode 100644
>> index ..ec580c1f1f01
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
>> @@ -0,0 +1,74 @@
>> +/* test the attribute counted_by and its usage in
>> + * __builtin_dynamic_object_size.  */
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "builtin-object-size-common.h"
>> +
>> +#define expect(p, _v) do { \
>> +size_t v = _v; \
>> +if (p == v) \
>> +__builtin_printf ("ok:  %s == %zd\n", #p, p); \
>> +else \
>> +{  \
>> +  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v); \
>> +  FAIL (); \
>> +} \
>> +} while (0);
> 
> You're using this in a bunch of tests already; does it make sense to 
> consolidate it into builtin-object-size-common.h?
Will do this. 
> 
>> +
>> +struct flex {
>> +  int b;
>> +  int c[];
>> +} *array_flex;
>> +
>> +struct annotated {
>> +  int b;
>> +  int c[] __attribute__ ((counted_by (b)));
>> +} *array_annotated;
>> +
>> +struct nested_annotated {
>> +  struct {
>> +union {
>> +  int b;
>> +  float f;  
>> +};
>> +int n;
>> +  };
>> +  int c[] __attribute__ ((counted_by (b)));
>> +} *array_nested_annotated;
>> +
>> +void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
>> +{
>> +  array_flex
>> += (struct flex *)malloc (sizeof (struct flex)
>> + + normal_count *  sizeof (int));
>> +  array_flex->b = normal_count;
>> +
>> +  array_annotated
>> += (struct annotated *)malloc (sizeof (struct annotated)
>> +  + attr_count *  sizeof (int));
>> +  array_annotated->b = attr_count;
>> +
>> +  array_nested_annotated
>> += (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
>> + + attr_count *  sizeof (int));
>> +  array_nested_annotated->b = attr_count;
>> +
>> +  return;
>> +}
>> +
>> +void __attribute__((__noinline__)) test ()
>> +{
>> +expect(__builtin_dynamic_object_size(array_flex->c, 1), -1);
>> +expect(__builtin_dynamic_object_size(array_annotated->c, 1),
>> +   array_annotated->b * sizeof (int));
>> +expect(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
>> +   array_nested_annotated->b * sizeof (int));
>> +}
> 
> Maybe another test where the allocation, size assignment and __bdos call 
> happen in the same function, where the allocator is not recognized by gcc:
> 
> void *
> __attribute__ ((noinline))
> alloc (size_t sz)
> {
>  return __builtin_malloc (sz);
> }
> 
> size_t test (size_t sz)
> {
>  array_annotated = alloc (sz);
>  array_annotated->b = sz;
>  return __builtin_dynamic_object_size (array_annotated->c, 1);
> }
> 
> The interesting thing to test (and ensure in the codegen) is that the 
> assignment to array_annotated->b does not get reordered to below the 
> __builtin_dynamic_object_size call since technically there is no data 
> dependency between the two.
Good point.
Will add such a test case.
> 
>> +
>> +int main(int argc, char *argv[])
>> +{
>> +  setup (10,10);
>> +  test ();
>> +  DONE ();
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c 
>> b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
>> new file mode 100644
>> index ..a0c3cb88ec71
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
>> @@ -0,0 +1,210 @@
>> +/* test the attribute counted_by and its usage in
>> +__builtin_dynamic_object_size: what's the correct behavior when the
>> 

Re: [PATCH V2 5/7] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-18 Thread Richard Sandiford
Victor Do Nascimento  writes:
> Implement the aarch64 intrinsics for reading and writing system
> registers with the following signatures:
>
>   uint32_t __arm_rsr(const char *special_register);
>   uint64_t __arm_rsr64(const char *special_register);
>   void* __arm_rsrp(const char *special_register);
>   float __arm_rsrf(const char *special_register);
>   double __arm_rsrf64(const char *special_register);
>   void __arm_wsr(const char *special_register, uint32_t value);
>   void __arm_wsr64(const char *special_register, uint64_t value);
>   void __arm_wsrp(const char *special_register, const void *value);
>   void __arm_wsrf(const char *special_register, float value);
>   void __arm_wsrf64(const char *special_register, double value);
>
> gcc/ChangeLog:
>
>   * gcc/config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
>   Add enums for new builtins.
>   (aarch64_init_rwsr_builtins): New.
>   (aarch64_general_init_builtins): Call aarch64_init_rwsr_builtins.
>   (aarch64_expand_rwsr_builtin):  New.
>   (aarch64_general_expand_builtin): Call aarch64_expand_rwsr_builtin.
>   * gcc/config/aarch64/aarch64.md (read_sysregdi): New insn_and_split.
>   (write_sysregdi): Likewise.
>   * gcc/config/aarch64/arm_acle.h (__arm_rsr): New.
>   (__arm_rsrp): Likewise.
>   (__arm_rsr64): Likewise.
>   (__arm_rsrf): Likewise.
>   (__arm_rsrf64): Likewise.
>   (__arm_wsr): Likewise.
>   (__arm_wsrp): Likewise.
>   (__arm_wsr64): Likewise.
>   (__arm_wsrf): Likewise.
>   (__arm_wsrf64): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc/testsuite/gcc.target/aarch64/acle/rwsr.c: New.
>   * gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c: Likewise.
> ---
>  gcc/config/aarch64/aarch64-builtins.cc| 200 ++
>  gcc/config/aarch64/aarch64.md |  17 ++
>  gcc/config/aarch64/arm_acle.h |  30 +++
>  .../gcc.target/aarch64/acle/rwsr-1.c  |  20 ++
>  gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  | 144 +
>  5 files changed, 411 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 04f59fd9a54..d8bb2a989a5 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -808,6 +808,17 @@ enum aarch64_builtins
>AARCH64_RBIT,
>AARCH64_RBITL,
>AARCH64_RBITLL,
> +  /* System register builtins.  */
> +  AARCH64_RSR,
> +  AARCH64_RSRP,
> +  AARCH64_RSR64,
> +  AARCH64_RSRF,
> +  AARCH64_RSRF64,
> +  AARCH64_WSR,
> +  AARCH64_WSRP,
> +  AARCH64_WSR64,
> +  AARCH64_WSRF,
> +  AARCH64_WSRF64,
>AARCH64_BUILTIN_MAX
>  };
>  
> @@ -1798,6 +1809,65 @@ aarch64_init_rng_builtins (void)
>  AARCH64_BUILTIN_RNG_RNDRRS);
>  }
>  
> +/* Add builtins for reading system register.  */
> +static void
> +aarch64_init_rwsr_builtins (void)
> +{
> +  tree fntype = NULL;
> +  tree const_char_ptr_type
> += build_pointer_type (build_type_variant (char_type_node, true, false));
> +
> +#define AARCH64_INIT_RWSR_BUILTINS_DECL(F, N, T) \
> +  aarch64_builtin_decls[AARCH64_##F] \
> += aarch64_general_add_builtin ("__builtin_aarch64_"#N, T, AARCH64_##F);
> +
> +  fntype
> += build_function_type_list (uint32_type_node, const_char_ptr_type, NULL);
> +  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR, rsr, fntype);
> +
> +  fntype
> += build_function_type_list (ptr_type_node, const_char_ptr_type, NULL);
> +  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRP, rsrp, fntype);
> +
> +  fntype
> += build_function_type_list (uint64_type_node, const_char_ptr_type, NULL);
> +  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR64, rsr64, fntype);
> +
> +  fntype
> += build_function_type_list (float_type_node, const_char_ptr_type, NULL);
> +  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF, rsrf, fntype);
> +
> +  fntype
> += build_function_type_list (double_type_node, const_char_ptr_type, NULL);
> +  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF64, rsrf64, fntype);
> +
> +  fntype
> += build_function_type_list (void_type_node, const_char_ptr_type,
> + uint32_type_node, NULL);
> +
> +  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR, wsr, fntype);
> +
> +  fntype
> += build_function_type_list (void_type_node, const_char_ptr_type,
> + const_ptr_type_node, NULL);
> +  AARCH64_INIT_RWSR_BUILTINS_DECL (WSRP, wsrp, fntype);
> +
> +  fntype
> += build_function_type_list (void_type_node, const_char_ptr_type,
> + uint64_type_node, NULL);
> +  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR64, wsr64, fntype);
> +
> +  fntype
> += build_function_type_list (void_type_node, const_char_ptr_type,
> + float_type_node, NULL);
> +  

[Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)

2023-10-18 Thread fallenleafs at icloud dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

--- Comment #5 from isoosqa  ---
Please forgive me. I typed the wrong thing in the original link.

[Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)

2023-10-18 Thread fallenleafs at icloud dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

--- Comment #4 from isoosqa  ---
Oops, I sent the wrong code. This is the right one: https://godbolt.org/z/GxdvMdP76

[Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

--- Comment #3 from Andrew Pinski  ---
Or maybe the issue is you don't understand the cmpxchg instruction and how it
gives back the original value too.


The RTL form for the "lock;cmpxchg " is:

(insn:TI 14 10 17 5 (parallel [
(set (reg:DI 0 ax [108])
(unspec_volatile:DI [
(mem/v:DI (reg/v/f:DI 5 di [orig:105 map ] [105]) [-1 
S8 A64])
(reg:DI 0 ax [108])
(reg:DI 4 si [107])
(const_int 32773 [0x8005])
] UNSPECV_CMPXCHG))
(set (mem/v:DI (reg/v/f:DI 5 di [orig:105 map ] [105]) [-1  S8
A64])
(unspec_volatile:DI [
(const_int 0 [0])
] UNSPECV_CMPXCHG))
(set (reg:CCZ 17 flags)
(unspec_volatile:CCZ [
(const_int 0 [0])
] UNSPECV_CMPXCHG))
]) "/app/example.c":15:22 9336 {atomic_compare_and_swapdi_1}
 (expr_list:REG_DEAD (reg:DI 4 si [107])
(nil)))

[Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-10-18
 Status|UNCONFIRMED |WAITING

--- Comment #2 from Andrew Pinski  ---
>atomicity of load gets elided. 

No IT DOES NOT.

mov rax, QWORD PTR [rdi]
.L3:
mov rsi, rax
mov rdx, rax
or  rsi, 1
lock cmpxchg QWORD PTR [rdi], rsi
jne .L3

That is correct as far as I know.

It does an atomic load from [rdi], then the OR, and then a compare-and-exchange
with the new value (the old value is in rax for the comparison, with a copy kept
in rdx; if the compare fails, cmpxchg loads the current memory value into rax).

That is very much atomically doing the IOR.
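
For readers following the thread, here is a minimal C11 sketch of the same
idea: a fetch-or built from a compare-and-exchange loop.  The function name
fetch_or_cas is hypothetical and is not taken from the attached testcase; it
only illustrates why a cmpxchg retry loop is a reasonable lowering when the
previous value has to be returned.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomically OR MASK into *OBJ and return the
   previous value, using only compare-and-exchange.  */
uint64_t
fetch_or_cas (_Atomic uint64_t *obj, uint64_t mask)
{
  /* Initial load; it only provides a first guess for the loop.  */
  uint64_t old = atomic_load_explicit (obj, memory_order_relaxed);

  /* On failure the call writes the value currently in *OBJ back into
     OLD, so the next attempt retries with fresh data.  */
  while (!atomic_compare_exchange_weak_explicit (obj, &old, old | mask,
                                                 memory_order_seq_cst,
                                                 memory_order_relaxed))
    ;

  return old;
}
```

The store only happens when memory still holds the value the OR was computed
from, which is what makes the read-modify-write atomic as a whole even though
the first load is a separate instruction.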

Re: [PATCH] RISC-V: Add popcount fallback expander.

2023-10-18 Thread Robin Dapp
> By the way, could you mention this PR:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791
> and add the test from this PR?

Commented in that PR.  This patch does not help there.

Regards
 Robin


[Bug target/111870] Miscompile of atomic rmw or on x86 (not aarch, though)

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

--- Comment #1 from Andrew Pinski  ---
Created attachment 56145
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56145=edit
testcase

Next time please attach the testcase or place it inline rather than giving just
a link.

[Bug c/111870] New: Miscompile of atomic rmw or on x86 (not aarch, though)

2023-10-18 Thread fallenleafs at icloud dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870

Bug ID: 111870
   Summary: Miscompile of atomic rmw or on x86 (not aarch, though)
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: fallenleafs at icloud dot com
  Target Milestone: ---

Recently I wrote one little piece of code that gave me too much headache. After
investigation, I discovered that on x86 the sequence of instructions that GCC
(strangely, llvm as well) produces is invalid because after applying
optimisations, atomicity of load gets elided. You can witness that gcc makes up
nonsense here[https://godbolt.org/z/Thfeq1KGW]. There you can find c code and
if you compile it with gcc13.2 to aarch, atomic_fetch_or_explicit gets
translated to a loop of a pair of special load-store instructions which is
correct lowering, but if you do it for x86, you can, in fact, witness that
generated code does not contain `lock or ...` instruction, which would be
correct code, but instead `lock cmpxchg ...` which is invalid.

Re: [PATCH v2] gcc: Introduce -fhardened

2023-10-18 Thread Qing Zhao
Marek,

Sorry for the late comment (I was just back from a long vacation immediately
after Cauldron).

One question:

Is the option “-fhardened” for production builds or for development builds?

If it’s for development builds, then adding -ftrivial-auto-var-init=pattern is
reasonable, since the major purpose of -ftrivial-auto-var-init=pattern is
debugging; the runtime overhead of -ftrivial-auto-var-init=pattern is higher
than that of -ftrivial-auto-var-init=zero.

However, if it’s for production builds, then adding -ftrivial-auto-var-init=zero
is better, since the major purpose of -ftrivial-auto-var-init=zero is to
eliminate uninitialized automatic variables in production builds, and the
runtime overhead of =zero is smaller than that of =pattern.

Qing
> On Oct 11, 2023, at 4:48 PM, Marek Polacek  wrote:
> 
> On Tue, Sep 19, 2023 at 10:58:19AM -0400, Marek Polacek wrote:
>> On Mon, Sep 18, 2023 at 08:57:39AM +0200, Richard Biener wrote:
>>> On Fri, Sep 15, 2023 at 5:09 PM Marek Polacek via Gcc-patches
>>>  wrote:
 
>>>> Bootstrapped/regtested on x86_64-pc-linux-gnu, 
>>>> powerpc64le-unknown-linux-gnu,
>>>> and aarch64-unknown-linux-gnu; ok for trunk?
>>>>
>>>> -- >8 --
>>>> In 
>>>> I proposed -fhardened, a new umbrella option that enables a reasonable set
>>>> of hardening flags.  The read of the room seems to be that the option
>>>> would be useful.  So here's a patch implementing that option.
>>>>
>>>> Currently, -fhardened enables:
>>>>
>>>>  -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
>>>>  -D_GLIBCXX_ASSERTIONS
>>>>  -ftrivial-auto-var-init=pattern
>>>>  -fPIE  -pie  -Wl,-z,relro,-z,now
>>>>  -fstack-protector-strong
>>>>  -fstack-clash-protection
>>>>  -fcf-protection=full (x86 GNU/Linux only)
>>>>
>>>> -fhardened will not override options that were specified on the command 
>>>> line
>>>> (before or after -fhardened).  For example,
>>>>
>>>> -D_FORTIFY_SOURCE=1 -fhardened
>>>>
>>>> means that _FORTIFY_SOURCE=1 will be used.  Similarly,
>>>>
>>>>  -fhardened -fstack-protector
>>>>
>>>> will not enable -fstack-protector-strong.
>>>>
>>>> In DW_AT_producer it is reflected only as -fhardened; it doesn't expand
>>>> to anything.  I think we need a better way to show what it actually
>>>> enables.
>>> 
>>> I do think we need to find a solution here to solve asserting compliance.
>> 
>> Fair enough.
>> 
>>> Maybe we can have -Whardened that will diagnose any altering of
>>> -fhardened by other options on the command-line or by missed target
>>> implementations?  People might for example use -fstack-protector
>>> but don't really want to make protection lower than requested with 
>>> -fhardened.
>>> 
>>> Any such conflict is much less apparent than when you use the
>>> flags -fhardened composes.
>> 
>> How about: --help=hardened says which options -fhardened attempts to
>> enable, and -Whardened warns when it didn't enable an option?  E.g.,
>> 
>>  -fstack-protector -fhardened -Whardened
>> 
>> would say that it didn't enable -fstack-protector-strong because
>> -fstack-protector was specified on the command line?
>> 
>> If !HAVE_LD_NOW_SUPPORT, --help=hardened probably doesn't even have to
>> list -z now, likewise for -z relro.
>> 
>> Unclear if -Whardened should be enabled by default, but probably yes?
> 
> Here's v2 which adds -Whardened (enabled by default).
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> -- >8 --
> In 
> I proposed -fhardened, a new umbrella option that enables a reasonable set
> of hardening flags.  The read of the room seems to be that the option
> would be useful.  So here's a patch implementing that option.
> 
> Currently, -fhardened enables:
> 
>  -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
>  -D_GLIBCXX_ASSERTIONS
>  -ftrivial-auto-var-init=pattern
>  -fPIE  -pie  -Wl,-z,relro,-z,now
>  -fstack-protector-strong
>  -fstack-clash-protection
>  -fcf-protection=full (x86 GNU/Linux only)
> 
> -fhardened will not override options that were specified on the command line
> (before or after -fhardened).  For example,
> 
> -D_FORTIFY_SOURCE=1 -fhardened
> 
> means that _FORTIFY_SOURCE=1 will be used.  Similarly,
> 
>  -fhardened -fstack-protector
> 
> will not enable -fstack-protector-strong.
> 
> In DW_AT_producer it is reflected only as -fhardened; it doesn't expand
> to anything.  This patch provides -Whardened, enabled by default, which
> warns when -fhardened couldn't enable a particular option.  I think most
> often it will say that _FORTIFY_SOURCE wasn't enabled because optimizations
> were not enabled.
> 
> gcc/c-family/ChangeLog:
> 
>   * c-opts.cc (c_finish_options): Maybe cpp_define _FORTIFY_SOURCE
>   and _GLIBCXX_ASSERTIONS.
> 
> gcc/ChangeLog:
> 
>   * common.opt (Whardened, fhardened): New options.
>   * config.in: Regenerate.
>   * config/bpf/bpf.cc: Include "opts.h".
>   

[Bug target/111857] RISC-V: Failed to vectorize small GNU vector if zvl4096b with fixed-vlmax

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111857

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug tree-optimization/111869] ICE: verify_ssa failed since r14-4710-g60c231cb658

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111869

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Andrew Pinski  ---
Dup.

*** This bug has been marked as a duplicate of bug 111860 ***

[Bug tree-optimization/111860] [14 Regression] incorrect vUSE after guard block loop skip block during vectorization.

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111860

Andrew Pinski  changed:

   What|Removed |Added

 CC||shaohua.li at inf dot ethz.ch

--- Comment #9 from Andrew Pinski  ---
*** Bug 111869 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/111860] [14 Regression] incorrect vUSE after guard block loop skip block during vectorization.

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111860

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #8 from Andrew Pinski  ---
.

[Bug tree-optimization/111791] RISC-V: Strange loop vectorizaion on popcount function

2023-10-18 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

--- Comment #4 from Robin Dapp  ---
This is a scalar popcount and as Kito already noted we will just emit

  cpop a0, a0

once the zbb extension is present.

As to the question what is actually being vectorized here, I'm not so sure :D
It looks like we're generating a vectorized scalar popcount by something like a
reduction?  But we already did the call to __popcountdi2?

Analyzing loop at pr111791.c:8
pr111791.c:8:25: note:  === analyze_loop_nest ===
pr111791.c:8:25: note:   === vect_analyze_loop_form ===
pr111791.c:8:25: note:=== get_loop_niters ===
Matching expression match.pd:1919, generic-match-8.cc:27
Applying pattern match.pd:1975, generic-match-2.cc:4670
Matching expression match.pd:2707, generic-match-4.cc:36
Matching expression match.pd:2710, generic-match-3.cc:53
Matching expression match.pd:2717, generic-match-2.cc:23
Matching expression match.pd:2707, generic-match-4.cc:36
Matching expression match.pd:2710, generic-match-3.cc:53
Matching expression match.pd:2717, generic-match-2.cc:23
Matching expression match.pd:2707, generic-match-4.cc:36
Matching expression match.pd:2710, generic-match-3.cc:53
Matching expression match.pd:2717, generic-match-2.cc:23
Matching expression match.pd:148, generic-match-10.cc:27
Matching expression match.pd:148, generic-match-10.cc:27
Applying pattern match.pd:4519, generic-match-4.cc:2923
Applying pattern match.pd:201, generic-match-4.cc:3103
Applying pattern match.pd:3393, generic-match-2.cc:182
pr111791.c:8:25: note:   Symbolic number of iterations is (unsigned intD.4)
__builtin_popcountlD.1952 (value_4(D))

Ah, interesting: ranger(?) recognizes that the loop runs "popcount" iterations.
Shouldn't that still be 64?  Well, it probably knows better :)
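
A minimal sketch of the kind of loop being discussed, assuming the source in
pr111791.c resembles the classic clear-the-lowest-set-bit loop that the GIMPLE
further down suggests (value & (value - 1) plus a counter).  The function name
count_bits is hypothetical.  Each iteration removes exactly one set bit, so the
trip count is the population count of the input rather than a fixed 64.

```c
/* Hypothetical reconstruction: count set bits by clearing the lowest
   set bit on every iteration.  */
int
count_bits (unsigned long value)
{
  int nbits = 0;
  while (value != 0)
    {
      value &= value - 1;   /* Clears the lowest set bit.  */
      nbits++;
    }
  return nbits;
}
```

Because the loop exits as soon as no set bits remain, an input with two set
bits iterates twice regardless of how wide unsigned long is.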

Regardless, we use this symbolic value as number of iterations:
  _5 = __builtin_popcountlD.1952 (value_4(D));
  niters.4_9 = (unsigned intD.4) _5;
  _2 = __builtin_popcountlD.1952 (value_4(D));
  bnd.5_3 = (unsigned intD.4) _2;
  _23 = (unsigned long) bnd.5_3;

Then, it gets funnier:

  # nbits_11 = PHI 
  # vect_vec_iv_.6_15 = PHI <_16(6), { 0, 1, 2, ... }(5)>
  # ivtmp_24 = PHI 
  _26 = .SELECT_VL (ivtmp_24, POLY_INT_CST [4, 4]);
  _16 = vect_vec_iv_.6_15 + { POLY_INT_CST [4, 4], ... };
  vect_nbits_7.7_18 = vect_vec_iv_.6_15 + { 1, ... };
  # RANGE [irange] int [1, 65]
  nbits_7 = nbits_11 + 1;
  # RANGE [irange] long unsigned int [0, 18446744073709551614]
  _1 = value_10 + 18446744073709551615;
  # RANGE [irange] long unsigned int [0, 18446744073709551614]
  value_8 = _1 & value_10;
  ivtmp_25 = ivtmp_24 - _26;

i.e. we have a vector IV that we add to the vectorized nbits.  Finally we
extract the niter-th (=popcount) element from that vector only to get -
popcount :)

Still not sure why that happens but a vector-mode popcount expander doesn't
help here as everything is scalar.  Maybe the explanation is simple in that we
would vectorize such a loop anyway and here it just looks particularly bad
because we already know the result via ranger?

[Bug tree-optimization/111869] New: ICE: verify_ssa failed since r14-4710-g60c231cb658

2023-10-18 Thread shaohua.li at inf dot ethz.ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111869

Bug ID: 111869
   Summary: ICE: verify_ssa failed since r14-4710-g60c231cb658
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: shaohua.li at inf dot ethz.ch
CC: tamar.christina at arm dot com
  Target Milestone: ---

gcc at -O3 crashed.

Bisected to r14-4710-g60c231cb658

$ cat a.c
int a;
unsigned b() {
  long c;
  unsigned d = 1;
  char *e = "";
  c = 0;
  for (; c < a; c++)
e[0]++;
  c = 0;
  for (; c < a; c++)
if (c)
  d = 0;
  return d;
}
int main() {}
$
$ gcc -O3 a.c
a.c: In function ‘b’:
a.c:2:10: error: PHI node with wrong VUSE on edge from BB 14
2 | unsigned b() {
  |  ^
.MEM_39 = PHI <.MEM_1(14), .MEM_1(13)>
expected .MEM_29
a.c:2:10: error: PHI node with wrong VUSE on edge from BB 13
.MEM_39 = PHI <.MEM_1(14), .MEM_1(13)>
expected .MEM_29
during GIMPLE pass: vect
a.c:2:10: internal compiler error: verify_ssa failed
0x7f10859fa082 __libc_start_main
../csu/libc-start.c:308
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions
$

[Bug bootstrap/111601] [14 Regression] bootstrap fails in stagestrain in libcody on x86_64-linux-gnu and powerpc64le-linux-gnu

2023-10-18 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111601

Peter Bergner  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-10-18

--- Comment #3 from Peter Bergner  ---
(In reply to Matthias Klose from comment #2)
> this seems to be fixed on x86_64-linux-gnu with trunk 20231017.
> powerpc64le-linux now fails in a different way, trying to build the
> libstdc++ pch headers. 
> 
> Full build log at
> https://buildd.debian.org/status/fetch.php?pkg=gcc-
> snapshot=ppc64el=1%3A20231017-1=1697561774=1

Ok, that is the same error I'm seeing, so Confirmed.  I'll try and see if I can
reduce the test case.

Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-18 Thread Qing Zhao


> On Oct 6, 2023, at 4:01 PM, Martin Uecker  wrote:
> 
> Am Freitag, dem 06.10.2023 um 06:50 -0400 schrieb Siddhesh Poyarekar:
>> On 2023-10-06 01:11, Martin Uecker wrote:
>>> Am Donnerstag, dem 05.10.2023 um 15:35 -0700 schrieb Kees Cook:
>>>> On Thu, Oct 05, 2023 at 04:08:52PM -0400, Siddhesh Poyarekar wrote:
>>>>> 2. How would you handle signedness of the size field?  The size gets
>>>>> converted to sizetype everywhere it is used and overflows/underflows may
>>>>> produce interesting results.  Do you want to limit the types to unsigned 
>>>>> or
>>>>> do you want to add a disclaimer in the docs?  The former seems like the
>>>>> *right* thing to do given that it is a new feature; best to enforce the
>>>>> cleaner habit at the outset.
>>>>
>>>> The Linux kernel has a lot of "int" counters, so the goal is to catch
>>>> negative offsets just like too-large offsets at runtime with the sanitizer
>>>> and report 0 for __bdos. Refactoring all these to be unsigned is going
>>>> to take time since at least some of them use the negative values as
>>>> special values unrelated to array indexing. :(
>>>>
>>>> So, perhaps if unsigned counters are worth enforcing, can this be a
>>>> separate warning the kernel can turn off initially?
>>>>
>>> 
>>> I think unsigned counters are much more problematic than signed ones
>>> because wraparound errors are more difficult to find.
>>> 
>>> With unsigned you could potentially diagnose wraparound, but only if we
>>> add -fsanitize=unsigned-overflow *and* add mechanism to mark intentional
>>> wraparound *and* everybody adds this annotation after carefully screening
>>> their code *and* rewriting all operations such as (counter - 3) + 5
>>> where the wraparound in the intermediate expression is harmless.
>>> 
>>> For this reason, I do not think we should ever enforce some rule that
>>> the counter has to be unsigned.
>>> 
>>> What we could do, is detect *storing* negative values into the
>>> counter at run-time using UBSan. (but if negative values are
>>> used for special cases, one also should be able to turn this
>>> off).
>> 
>> All of the object size detection relies on object sizes being sizetype. 
>> The closest we could do with that is detect (sz != SIZE_MAX && sz > 
>> SIZE_MAX / 2), since allocators typically cannot allocate more than 
>> SIZE_MAX / 2.
> 
> I was talking about the counter in:
> 
> struct {
>  int counter;
>  char buf[] __counted_by__((counter));
> };
> 
> which could be checked to be positive either when stored to or 
> when buf is used.
> 
> And yes, we could also check the size of buf.  Not sure what is
> done for VLAs now, but I guess it could be similar.
> 
For VLAs, the bounds expression can be either signed or unsigned.
But we have added a sanitizer option, -fsanitize=vla-bound, to catch the cases
where the size of the VLA is not positive.

For example:

[opc@qinzhao-ol8u3-x86 Martin]$ cat t3.c
#include 
size_t foo(int m)
{
  char t[m];

  return sizeof(t);
}

int main()
{
  printf ("the sizeof flexm is %lu \n", foo(-1));
  return 0;
}
[opc@qinzhao-ol8u3-x86 Martin]$ sh t
/home/opc/Install/latest-d/bin/gcc -fsanitize=undefined -O2 -Wall -Wpedantic 
t3.c
t3.c:4:8: runtime error: variable length array bound evaluates to non-positive 
value -1
the sizeof flexm is 18446744073609551616 


We can do the same thing for “counted_by”, i.e.:

1. Do not require the counted_by field to be either signed or unsigned.
2. Add a sanitizer option -fsanitize=counted-by-bound to catch the cases where
the value of the counted_by field is not positive.

Is this good enough?
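
A minimal sketch of what such a check might amount to at run time, under the
assumption that a non-positive counter should be treated as an empty array
(the thread above mentions reporting 0 for __bdos in that case).  The names
struct packet and packet_buf_size are hypothetical, and this is not how GCC
implements the attribute or the proposed sanitizer option.

```c
#include <stddef.h>

struct packet
{
  int counter;          /* Signed on purpose, as in the discussion.  */
  char buf[];
};

/* Hypothetical helper: the number of bytes of BUF that may be accessed.
   A non-positive counter yields 0 instead of a huge value after
   conversion to size_t.  */
size_t
packet_buf_size (const struct packet *p)
{
  if (p->counter <= 0)
    return 0;
  return (size_t) p->counter * sizeof (p->buf[0]);
}
```

The point of the explicit comparison is that it happens while the counter is
still signed; once the value has been converted to size_t, -1 is no longer
distinguishable from a very large size.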

Qing
> Best,
> Martin
> 
> 
> 
>> 
>> Sid



[Bug fortran/111851] f951: Segmentation fault at gfc_delete_symtree

2023-10-18 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111851

kargl at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kargl at gcc dot gnu.org

--- Comment #2 from kargl at gcc dot gnu.org ---
Oddly, I'm seeing an ICE due to the assert() at line 3131 in 
symbol.cc(gfc_release_symbol).  I don't see the memory hog issue.  If I comment
out that assert() and let gfortran proceed, I see

% gfcx -c a.f90
a.f90:1:28:

1 | SELECT TYPE (rvec2ASSOCIATE(
  |1
Error: Syntax error in argument list at (1)


On FreeBSD I use the following for configure

../gccx/configure --prefix=$HOME/work/x --enable-languages=c,c++,fortran,lto \
  --enable-bootstrap --disable-nls --disable-multilib --enable-libsanitizer

[Bug target/110551] [11/12/13/14 Regression] an extra mov when doing 128bit multiply

2023-10-18 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551

Roger Sayle  changed:

   What|Removed |Added

 CC||roger at nextmovesoftware dot 
com

--- Comment #4 from Roger Sayle  ---
Patch proposed:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/67.html

Re: [PATCH V2 3/7] aarch64: Implement system register validation tools

2023-10-18 Thread Richard Sandiford
Generally looks really good.  Some comments below.

Victor Do Nascimento  writes:
> Given the implementation of a mechanism of encoding system registers
> into GCC, this patch provides the mechanism of validating their use by
> the compiler.  In particular, this involves:
>
>   1. Ensuring a supplied string corresponds to a known system
>  register name.  System registers can be accessed either via their
>  name (e.g. `SPSR_EL1') or their encoding (e.g. `S3_0_C4_C0_0').
>  Register names are validated using a hash map, mapping known
>  system register names to its corresponding `sysreg_t' struct,
>  which is populated from the `aarch64_system_regs.def' file.
>  Register name validation is done via `lookup_sysreg_map', while
>  the encoding naming convention is validated via a parser
>  implemented in this patch - `is_implem_def_reg'.
>   2. Once a given register name is deemed to be valid, it is checked
>  against a further 2 criteria:
>a. Is the referenced register implemented in the target
>   architecture?  This is achieved by comparing the ARCH field
> in the relevant SYSREG entry from `aarch64_system_regs.def'
> against `aarch64_feature_flags' flags set at compile-time.
>b. Is the register being used correctly?  Check the requested
> operation against the FLAGS specified in SYSREG.
> This prevents operations like writing to a read-only system
> register.
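
To make the encoding naming convention in point 1 concrete, here is a minimal
sketch of a checker for names of the form S<op0>_<op1>_C<CRn>_C<CRm>_<op2>,
such as S3_0_C4_C0_0.  This is not the patch's is_implem_def_reg; the function
names are hypothetical and the field limits (3, 7, 15, 15, 7) are an assumption
based on the bit widths of the fields, so the real parser may well be stricter.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Parse an unsigned decimal field at *P that must not exceed MAX.
   On success advance *P past the digits and return true.  */
static bool
parse_field (const char **p, unsigned long max)
{
  char *end;
  if (**p < '0' || **p > '9')
    return false;
  unsigned long v = strtoul (*p, &end, 10);
  if (v > max)
    return false;
  *p = end;
  return true;
}

/* Expect the literal character C at *P and skip it.  */
static bool
expect_char (const char **p, char c)
{
  if (**p != c)
    return false;
  ++*p;
  return true;
}

/* Hypothetical checker for S<op0>_<op1>_C<CRn>_C<CRm>_<op2>.  */
bool
looks_like_sysreg_encoding (const char *name)
{
  const char *p = name;
  return (expect_char (&p, 'S') && parse_field (&p, 3)
          && expect_char (&p, '_') && parse_field (&p, 7)
          && expect_char (&p, '_') && expect_char (&p, 'C')
          && parse_field (&p, 15)
          && expect_char (&p, '_') && expect_char (&p, 'C')
          && parse_field (&p, 15)
          && expect_char (&p, '_') && parse_field (&p, 7)
          && *p == '\0');
}
```

A named register such as SPSR_EL1 fails this check at the first field and would
instead go through the hash-map lookup described above.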
>
> gcc/ChangeLog:
>
>   * gcc/config/aarch64/aarch64-protos.h (aarch64_valid_sysreg_name_p): 
> New.
>   (aarch64_retrieve_sysreg): Likewise.
>   * gcc/config/aarch64/aarch64.cc (is_implem_def_reg): Likewise.
>   (aarch64_valid_sysreg_name_p): Likewise.
>   (aarch64_retrieve_sysreg): Likewise.
>   (aarch64_register_sysreg): Likewise.
>   (aarch64_init_sysregs): Likewise.
>   (aarch64_lookup_sysreg_map): Likewise.
>   * gcc/config/aarch64/predicates.md (aarch64_sysreg_string): New.
> ---
>  gcc/config/aarch64/aarch64-protos.h |   2 +
>  gcc/config/aarch64/aarch64.cc   | 146 
>  gcc/config/aarch64/predicates.md|   4 +
>  3 files changed, 152 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 60a55f4bc19..a134e2fcf8e 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -830,6 +830,8 @@ bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
>  bool aarch64_sve_ptrue_svpattern_p (rtx, struct simd_immediate_info *);
>  bool aarch64_simd_valid_immediate (rtx, struct simd_immediate_info *,
>   enum simd_immediate_check w = AARCH64_CHECK_MOV);
> +bool aarch64_valid_sysreg_name_p (const char *);
> +const char *aarch64_retrieve_sysreg (char *, bool);
>  rtx aarch64_check_zero_based_sve_index_immediate (rtx);
>  bool aarch64_sve_index_immediate_p (rtx);
>  bool aarch64_sve_arith_immediate_p (machine_mode, rtx, bool);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 69de2366424..816c4b69fc8 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -85,6 +85,7 @@
>  #include "config/arm/aarch-common.h"
>  #include "config/arm/aarch-common-protos.h"
>  #include "ssa.h"
> +#include "hash-map.h"
>  
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -2845,6 +2846,52 @@ const sysreg_t sysreg_structs[] =
>  const unsigned nsysreg = TOTAL_ITEMS;
>  #undef TOTAL_ITEMS
>  
> +using sysreg_map_t = hash_map;
> +static sysreg_map_t *sysreg_map = nullptr;

One concern with static, non-GTY, runtime-initialised data is "does it
work with PCH?".  I suspect it does, since all uses of the map go through
aarch64_lookup_sysreg_map, and since nothing seems to rely on persistent
pointer values.  But it would be good to have a PCH test just to make sure.

I'm thinking of something like the tests in gcc/testsuite/gcc.dg/pch.
The header file (.hs) would define a function that does sysreg reads
and writes.  When the .hs is included from the .c file, the reads and
writes would be imported through a PCH load, rather than through the
normal frontend route.

> +
> +/* Map system register names to their hardware metadata: Encoding,

s/Encoding/encoding/

> +   feature flags and architectural feature requirements, all of which
> +   are encoded in a sysreg_t struct.  */
> +void
> +aarch64_register_sysreg (const char *name, const sysreg_t *metadata)
> +{
> +  bool dup = sysreg_map->put (name, metadata);
> +  gcc_checking_assert (!dup);
> +}
> +
> +/* Lazily initialize hash table for system register validation,
> +   checking the validity of supplied register name and returning
> +   register's associated metadata.  */
> +static void
> +aarch64_init_sysregs (void)
> +{
> +  gcc_assert (!sysreg_map);
> +  sysreg_map = new sysreg_map_t;
> +  gcc_assert (sysreg_map);

This assert seems redundant.  
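
For illustration only, here is a minimal sketch of the same lazy-initialisation shape. It assumes std::unordered_map as a stand-in for GCC's hash_map, and sysreg_info, example_regs, init_sysregs and lookup_sysreg are hypothetical names rather than anything from the patch. In standard C++ a plain `new` expression reports failure by throwing (or terminating when exceptions are disabled) rather than by yielding a null pointer, which is why a second assert after the allocation would never fire.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the patch's sysreg_t metadata.
struct sysreg_info { int encoding; };

using sysreg_map_t = std::unordered_map<std::string, const sysreg_info *>;
static sysreg_map_t *sysreg_map = nullptr;

// Made-up entries; the names and encodings are not real system registers.
static const sysreg_info example_regs[] = { {0x1234}, {0x5678} };

static void init_sysregs()
{
  assert(!sysreg_map);            // guards against double initialisation
  sysreg_map = new sysreg_map_t;  // never yields nullptr on failure
  bool inserted = sysreg_map->emplace("reg_a", &example_regs[0]).second;
  inserted = sysreg_map->emplace("reg_b", &example_regs[1]).second && inserted;
  assert(inserted);               // duplicate names would be a table bug
  (void) inserted;
}

// All uses go through this lookup, so the map is built on first use.
const sysreg_info *lookup_sysreg(const std::string &name)
{
  if (!sysreg_map)
    init_sysregs();
  auto it = sysreg_map->find(name);
  return it == sysreg_map->end() ? nullptr : it->second;
}
```

Calling lookup_sysreg("reg_a") returns the first table entry, and an unknown name returns nullptr.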

[Bug tree-optimization/111648] [14 Regression] Wrong code at -O2/3 on x86_64-linux-gnu since r14-3243-ga7dba4a1c05

2023-10-18 Thread prathamesh3492 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111648

prathamesh3492 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from prathamesh3492 at gcc dot gnu.org ---
Fixed.

Re: PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding

2023-10-18 Thread Prathamesh Kulkarni
On Wed, 18 Oct 2023 at 23:22, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Tue, 17 Oct 2023 at 02:40, Richard Sandiford
> >  wrote:
> >> Prathamesh Kulkarni  writes:
> >> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> >> > index 4f8561509ff..55a6a68c16c 100644
> >> > --- a/gcc/fold-const.cc
> >> > +++ b/gcc/fold-const.cc
> >> > @@ -10684,9 +10684,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, 
> >> > tree arg1,
> >> >
> >> >/* Ensure that the stepped sequence always selects from the same
> >> >input pattern.  */
> >> > -  unsigned arg_npatterns
> >> > - = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
> >> > -   : VECTOR_CST_NPATTERNS (arg1);
> >> > +  tree arg = ((q1 & 1) == 0) ? arg0 : arg1;
> >> > +  unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg);
> >> >
> >> >if (!multiple_p (step, arg_npatterns))
> >> >   {
> >> > @@ -10694,6 +10693,29 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, 
> >> > tree arg1,
> >> >   *reason = "step is not multiple of npatterns";
> >> > return false;
> >> >   }
> >> > +
> >> > +  /* If a1 chooses base element from arg, ensure that it's a natural
> >> > +  stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0])
> >> > +  to preserve arg's encoding.  */
> >> > +
> >> > +  unsigned HOST_WIDE_INT index;
> >> > +  if (!r1.is_constant ())
> >> > + return false;
> >> > +  if (index < arg_npatterns)
> >> > + {
> >>
> >> I don't know whether it matters in practice, but I think the two conditions
> >> above are more natural as:
> >>
> >> if (maybe_lt (r1, arg_npatterns))
> >>   {
> >> unsigned HOST_WIDE_INT index;
> >> if (!r1.is_constant ())
> >>   return false;
> >>
> >> ...[code below]...
> >>   }
> >>
> >> > +   tree arg_elem0 = vector_cst_elt (arg, index);
> >> > +   tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns);
> >> > +   tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2);
> >> > +
> >> > +   if (!operand_equal_p (const_binop (MINUS_EXPR, arg_elem2, 
> >> > arg_elem1),
> >> > + const_binop (MINUS_EXPR, arg_elem1, 
> >> > arg_elem0),
> >> > + 0))
> >>
> >> This needs to check whether const_binop returns null.  Maybe:
> >>
> >>tree step1, step2;
> >>if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0))
> >>|| !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1))
> >>|| !operand_equal_p (step1, step2, 0))
> >>
> >> OK with those changes, thanks.
> > Hi Richard,
> > Thanks for the suggestions, updated the attached patch accordingly.
> > Bootstrapped+tested with and without SVE on aarch64-linux-gnu and
> > x86_64-linux-gnu.
> > OK to commit ?
>
> Yes, thanks.
Thanks, committed to trunk in 3ec8ecb8e92faec889bc6f7aeac9ff59e82b4f7f.

Thanks,
Prathamesh
>
> Richard
>
> >
> > Thanks,
> > Prathamesh
> >>
> >> Richard
> >>
> >> > + {
> >> > +   if (reason)
> >> > + *reason = "not a natural stepped sequence";
> >> > +   return false;
> >> > + }
> >> > + }
> >> >  }
> >> >
> >> >return true;
> >> > @@ -17161,7 +17183,8 @@ namespace test_fold_vec_perm_cst {
> >> >  static tree
> >> >  build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
> >> >   unsigned nelts_per_pattern,
> >> > - int step = 0, int threshold = 100)
> >> > + int step = 0, bool natural_stepped = false,
> >> > + int threshold = 100)
> >> >  {
> >> >tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER 
> >> > (vmode), 1);
> >> >tree vectype = build_vector_type_for_mode (inner_type, vmode);
> >> > @@ -17176,17 +17199,28 @@ build_vec_cst_rand (machine_mode vmode, 
> >> > unsigned npatterns,
> >> >
> >> >// Fill a1 for each pattern
> >> >for (unsigned i = 0; i < npatterns; i++)
> >> > -builder.quick_push (build_int_cst (inner_type, rand () % 
> >> > threshold));
> >> > -
> >> > +{
> >> > +  tree a1;
> >> > +  if (natural_stepped)
> >> > + {
> >> > +   tree a0 = builder[i];
> >> > +   wide_int a0_val = wi::to_wide (a0);
> >> > +   wide_int a1_val = a0_val + step;
> >> > +   a1 = wide_int_to_tree (inner_type, a1_val);
> >> > + }
> >> > +  else
> >> > + a1 = build_int_cst (inner_type, rand () % threshold);
> >> > +  builder.quick_push (a1);
> >> > +}
> >> >if (nelts_per_pattern == 2)
> >> >  return builder.build ();
> >> >
> >> >for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; 
> >> > i++)
> >> >  {
> >> >tree prev_elem = builder[i - npatterns];
> >> > -  int prev_elem_val = TREE_INT_CST_LOW (prev_elem);
> >> > -  int val = prev_elem_val + step;
> >> > -  builder.quick_push (build_int_cst (inner_type, val));
> >> > +  wide_int 

[Bug tree-optimization/111648] [14 Regression] Wrong code at -O2/3 on x86_64-linux-gnu since r14-3243-ga7dba4a1c05

2023-10-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111648

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Prathamesh Kulkarni:

https://gcc.gnu.org/g:3ec8ecb8e92faec889bc6f7aeac9ff59e82b4f7f

commit r14-4726-g3ec8ecb8e92faec889bc6f7aeac9ff59e82b4f7f
Author: Prathamesh Kulkarni 
Date:   Thu Oct 19 00:29:38 2023 +0530

PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.

gcc/ChangeLog:
PR tree-optimization/111648
* fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): If a1
chooses base element from arg, ensure that it's a natural stepped
sequence.
(build_vec_cst_rand): New param natural_stepped and use it to
construct a naturally stepped sequence.
(test_nunits_min_2): Add new unit tests Case 6 and Case 7.
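
As a rough sketch of the "natural stepped sequence" condition described in this ChangeLog, the check (arg[2] - arg[1]) == (arg[1] - arg[0]) can be written over plain integers as below. fold_minus is a hypothetical stand-in for const_binop (MINUS_EXPR, ...); modelling its failure as 32-bit overflow is only an assumption made for illustration, not a statement about when const_binop returns null. The point is that both differences are checked for validity before they are compared, as suggested in the review.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Stand-in for const_binop (MINUS_EXPR, a, b): yields no value when the
// subtraction cannot be folded (modelled here as int32_t overflow).
std::optional<int32_t> fold_minus(int32_t a, int32_t b)
{
  int64_t wide = static_cast<int64_t>(a) - static_cast<int64_t>(b);
  if (wide < std::numeric_limits<int32_t>::min()
      || wide > std::numeric_limits<int32_t>::max())
    return std::nullopt;
  return static_cast<int32_t>(wide);
}

// True when e0, e1, e2 form a sequence with a single constant step,
// i.e. (e2 - e1) == (e1 - e0), and both differences could be folded.
bool natural_stepped_p(int32_t e0, int32_t e1, int32_t e2)
{
  std::optional<int32_t> step1 = fold_minus(e1, e0);
  std::optional<int32_t> step2 = fold_minus(e2, e1);
  return step1 && step2 && *step1 == *step2;
}
```

For example, the elements 1, 3, 5 pass the check, while 1, 3, 6 do not.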

[Bug c/110500] gcc: internal compiler error: tree check: expected class 'type', have 'exceptional' (error_mark) in c_parser_omp_clause_allocate

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110500

--- Comment #3 from Andrew Pinski  ---
*** Bug 111862 has been marked as a duplicate of this bug. ***

[Bug c/111862] GCC: internal compiler error: tree check: expected class 'type', have 'exceptional' (error_mark) in c_parser_omp_clause_reduction, at c/c-parser.cc:16234

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111862

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Andrew Pinski  ---
Dup of bug 110500 (which was not fixed yet; just a patch was posted to the bug
report for someone to finish up).

*** This bug has been marked as a duplicate of bug 110500 ***

[Bug target/96347] note: non-delegitimized UNSPEC UNSPEC_TP (19) found in variable location

2023-10-18 Thread ibuclaw at gdcproject dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96347

Iain Buclaw  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #6 from Iain Buclaw  ---
(releases/gcc-9) $ ./gcc/xg++ -B ./gcc/ pr.cc -O2 -g -fchecking
pr.cc: In function ‘int main()’:
pr.cc:45:5: note: non-delegitimized UNSPEC UNSPEC_TP (19) found in variable
location
   45 | int main (void)
  | ^~~~
pr.cc:45:5: note: non-delegitimized UNSPEC UNSPEC_TP (19) found in variable
location

(releases/gcc-10) $ ./gcc/xg++ -B ./gcc/ pr.cc -O2 -g -fchecking
pr.cc: In function ‘int main()’:
pr.cc:45:5: note: non-delegitimized UNSPEC UNSPEC_TP (19) found in variable
location
   45 | int main (void)
  | ^~~~
pr.cc:45:5: note: non-delegitimized UNSPEC UNSPEC_TP (19) found in variable
location
pr.cc:45:5: note: non-delegitimized UNSPEC UNSPEC_TP (19) found in variable
location

(releases/gcc-11) $ ./gcc/xg++ -B ./gcc/ pr.cc -O2 -g -fchecking

(releases/gcc-12) $ ./gcc/xg++ -B ./gcc/ pr.cc -O2 -g -fchecking

(releases/gcc-13) $ ./gcc/xg++ -B ./gcc/ pr.cc -O2 -g -fchecking

(trunk) $ ./gcc/xg++ -B ./gcc/ pr.cc -O2 -g -fchecking

Minimal test is only reproducible on the 9.x and 10.x compilers, and I've not
seen it crop up again in any D testsuite runs. I'll just close this then.

[Bug tree-optimization/111866] [14 regression] ICE when compiling gcc.target/powerpc/p9-vec-length-full-7.c

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111866

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-10-18
   Keywords||ice-on-valid-code
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED

--- Comment #2 from Andrew Pinski  ---
.

[Bug middle-end/61192] Conflict between global register and function name

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61192

--- Comment #7 from Andrew Pinski  ---
Created attachment 56144
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56144&action=edit
Another testcase

[Bug middle-end/61192] Conflict between global register and function name

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61192

Andrew Pinski  changed:

   What|Removed |Added

 CC||141242068 at smail dot nju.edu.cn

--- Comment #6 from Andrew Pinski  ---
*** Bug 111865 has been marked as a duplicate of this bug. ***

[Bug middle-end/111865] [11/12/13/14 Regression] ICE with register decl and extern decl with the same asm name

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111865

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|NEW |RESOLVED

--- Comment #4 from Andrew Pinski  ---
Basically a dup of bug 61192.

*** This bug has been marked as a duplicate of bug 61192 ***

[Bug middle-end/111865] [11/12/13/14 Regression] ICE with register decl and extern decl with the same asm name

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111865

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-10-18
 Ever confirmed|0   |1

--- Comment #3 from Andrew Pinski  ---
Confirmed.

[Bug middle-end/111865] [11/12/13/14 Regression] ICE with register decl and extern decl with the same asm name

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111865

--- Comment #2 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #1)
> Created attachment 56143 [details]
> testcase that could go into the testsuite with more targets supported

Add:
```
#elif defined __aarch64__
# define ASM __asm__("sp")
```

To it too.

[Bug middle-end/111865] [11/12/13/14 Regression] ICE with register decl and extern decl with the same asm name

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111865

--- Comment #1 from Andrew Pinski  ---
Created attachment 56143
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56143&action=edit
testcase that could go into the testsuite with more targets supported

[Bug middle-end/111865] [11/12/13/14 Regression] ICE with register decl and extern decl with the same asm name

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111865

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |11.5
Summary|GCC: 14: internal compiler  |[11/12/13/14 Regression]
   |error: symtab_node::verify  |ICE with register decl and
   |failed  |extern decl with the same
   ||asm name
  Known to fail||9.1.0

[Bug target/111867] aarch64: Wrong code for bf16 literal load when the arch support +fp16

2023-10-18 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111867

--- Comment #4 from Iain Sandoe  ---
(In reply to Andrew Pinski from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > Maybe something like:
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 62b1ae0652f..db2dde84329 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -23788,7 +23788,8 @@ aarch64_float_const_representable_p (rtx x)
> >  return false;
> > 
> >if (GET_MODE (x) == VOIDmode
> > -  || (GET_MODE (x) == HFmode && !TARGET_FP_F16INST))
> > +  || (GET_MODE (x) == HFmode && !TARGET_FP_F16INST)
> > +  || (GET_MODE (x) == BFmode))
> >  return false;
> > 
> >r = *CONST_DOUBLE_REAL_VALUE (x);

Yeah that fixes this case; re-running the testsuite to see if that clears any
other bf16 fails.


> That is there are no fmov instructions for bfmode constants ...

Although I notice there's a spare bit pattern in the "ftype" field (0b10) in
the fmov insn... but I guess that's being kept for something more useful than
bf16.

[Bug middle-end/111868] [14 regression] many ICEs after r14-4710

2023-10-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111868

Tamar Christina  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Tamar Christina  ---
Duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111860; a patch is
going through regression testing.

*** This bug has been marked as a duplicate of bug 111860 ***

[Bug tree-optimization/111860] [14 Regression] incorrect vUSE after guard block loop skip block during vectorization.

2023-10-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111860

Tamar Christina  changed:

   What|Removed |Added

 CC||seurer at gcc dot gnu.org

--- Comment #7 from Tamar Christina  ---
*** Bug 111868 has been marked as a duplicate of this bug. ***

[Bug target/111867] aarch64: Wrong code for bf16 literal load when the arch support +fp16

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111867

--- Comment #3 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #2)
> Maybe something like:
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 62b1ae0652f..db2dde84329 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -23788,7 +23788,8 @@ aarch64_float_const_representable_p (rtx x)
>  return false;
> 
>if (GET_MODE (x) == VOIDmode
> -  || (GET_MODE (x) == HFmode && !TARGET_FP_F16INST))
> +  || (GET_MODE (x) == HFmode && !TARGET_FP_F16INST)
> +  || (GET_MODE (x) == BFmode))
>  return false;
> 
>r = *CONST_DOUBLE_REAL_VALUE (x);

That is there are no fmov instructions for bfmode constants ...

[Bug middle-end/111868] New: [14 regression] many ICEs after r14-4710

2023-10-18 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111868

Bug ID: 111868
   Summary: [14 regression] many ICEs after r14-4710
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:60c231cb65807fb963624acc4f82d2935e305f93, r14-4710-g60c231cb65807f

FAIL: libgomp.fortran/pr100981-2.f90   -O1  (internal compiler error:
verify_ssa failed)
FAIL: libgomp.fortran/pr100981-2.f90   -O1  (test for excess errors)
FAIL: libgomp.fortran/pr100981-2.f90   -O2  (internal compiler error:
verify_ssa failed)
FAIL: libgomp.fortran/pr100981-2.f90   -O2  (test for excess errors)
FAIL: libgomp.fortran/pr100981-2.f90   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  (internal compiler error: verify_ssa
failed)
FAIL: libgomp.fortran/pr100981-2.f90   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: libgomp.fortran/pr100981-2.f90   -O3 -g  (internal compiler error:
verify_ssa failed)
FAIL: libgomp.fortran/pr100981-2.f90   -O3 -g  (test for excess errors)
FAIL: libgomp.fortran/pr100981-2.f90   -Os  (internal compiler error:
verify_ssa failed)
FAIL: libgomp.fortran/pr100981-2.f90   -Os  (test for excess errors)
FAIL: libgomp.fortran/simd3.f90   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  (internal compiler error: verify_ssa
failed)
FAIL: libgomp.fortran/simd3.f90   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: libgomp.fortran/simd3.f90   -O3 -g  (internal compiler error: verify_ssa
failed)
FAIL: libgomp.fortran/simd3.f90   -O3 -g  (test for excess errors)


spawn -ignore SIGHUP
/home/seurer/gcc/git/build/gcc-test/gcc/testsuite/gfortran/../../gfortran
-B/home/seurer/gcc/git/build/gcc-test/gcc/testsuite/gfortran/../../
-B/home/seurer/gcc/git/build/gcc-test/powerpc64le-unknown-linux-gnu/./libgfortran/
/home/seurer/gcc/git/gcc-test/gcc/testsuite/gfortran.dg/vect/pr100981-1.f90
-fdiagnostics-plain-output -fdiagnostics-plain-output -O -O2 -ftree-vectorize
-fvect-cost-model=unlimited -fdump-tree-vect-details -maltivec -mpower9-vector
-O3 -ftree-parallelize-loops=2 -fno-signed-zeros -fno-trapping-math -S -o
pr100981-1.s
/home/seurer/gcc/git/gcc-test/gcc/testsuite/gfortran.dg/vect/pr100981-1.f90:5:23:
Error: stmt with wrong VUSE
# VUSE <.MEM_17(D)>
_82 = REALPART_EXPR <(*cx_22(D))[_83]>;
expected .MEM_39
/home/seurer/gcc/git/gcc-test/gcc/testsuite/gfortran.dg/vect/pr100981-1.f90:5:23:
Error: stmt with wrong VUSE
# VUSE <.MEM_17(D)>
_80 = IMAGPART_EXPR <(*cx_22(D))[_83]>;
expected .MEM_39
/home/seurer/gcc/git/gcc-test/gcc/testsuite/gfortran.dg/vect/pr100981-1.f90:5:23:
Error: PHI node with wrong VUSE on edge from BB 34
.MEM_23 = PHI <.MEM_17(D)(34), .MEM_17(D)(33)>
expected .MEM_39
/home/seurer/gcc/git/gcc-test/gcc/testsuite/gfortran.dg/vect/pr100981-1.f90:5:23:
Error: PHI node with wrong VUSE on edge from BB 33
.MEM_23 = PHI <.MEM_17(D)(34), .MEM_17(D)(33)>
expected .MEM_39
during GIMPLE pass: vect
dump file: pr100981-1.f90.175t.vect
/home/seurer/gcc/git/gcc-test/gcc/testsuite/gfortran.dg/vect/pr100981-1.f90:5:23:
internal compiler error: verify_ssa failed
0x110e9d2f verify_ssa(bool, bool)
/home/seurer/gcc/git/gcc-test/gcc/tree-ssa.cc:1203
0x10bd279f execute_function_todo
/home/seurer/gcc/git/gcc-test/gcc/passes.cc:2095
0x10bd35ab do_per_function
/home/seurer/gcc/git/gcc-test/gcc/passes.cc:1687
0x10bd37cb execute_todo
/home/seurer/gcc/git/gcc-test/gcc/passes.cc:2142


commit 60c231cb65807fb963624acc4f82d2935e305f93 (HEAD)
Author: Tamar Christina 
Date:   Wed Oct 18 09:03:06 2023 +0100

middle-end: maintain LCSSA throughout loop peeling

[Bug target/111867] aarch64: Wrong code for bf16 literal load when the arch support +fp16

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111867

--- Comment #2 from Andrew Pinski  ---
Maybe something like:
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 62b1ae0652f..db2dde84329 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23788,7 +23788,8 @@ aarch64_float_const_representable_p (rtx x)
 return false;

   if (GET_MODE (x) == VOIDmode
-  || (GET_MODE (x) == HFmode && !TARGET_FP_F16INST))
+  || (GET_MODE (x) == HFmode && !TARGET_FP_F16INST)
+  || (GET_MODE (x) == BFmode))
 return false;

   r = *CONST_DOUBLE_REAL_VALUE (x);

[Bug target/111867] aarch64: Wrong code for bf16 literal load when the arch support +fp16

2023-10-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111867

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Target|aarch64-linux-gnu,  |aarch64
   |aarch64-apple-darwin|
   Last reconfirmed||2023-10-18
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski  ---
This is definitely wrong:
(insn 5 2 6 (set (reg:BF 63 v31 [95])
(const_double:BF 1.0e+0 [0x0.8p+1])) "/app/example.cpp":4:10 72
{*movbf_aarch64}
 (nil))


Confirmed.

Re: [PATCH 10/11] aarch64: Generalise TFmode load/store pair patterns

2023-10-18 Thread Richard Sandiford
Alex Coplan  writes:
> This patch generalises the TFmode load/store pair patterns to TImode and
> TDmode.  This brings them in line with the DXmode patterns, and uses the
> same technique with separate mode iterators (TX and TX2) to allow for
> distinct modes in each arm of the load/store pair.
>
> For example, in combination with the post-RA load/store pair fusion pass
> in the following patch, this improves the codegen for the following
> varargs testcase involving TImode stores:
>
> void g(void *);
> int foo(int x, ...)
> {
> __builtin_va_list ap;
> __builtin_va_start (ap, x);
> g(&ap);
> __builtin_va_end (ap);
> }
>
> from:
>
> foo:
> .LFB0:
>   stp x29, x30, [sp, -240]!
> .LCFI0:
>   mov w9, -56
>   mov w8, -128
>   mov x29, sp
>   add x10, sp, 176
>   stp x1, x2, [sp, 184]
>   add x1, sp, 240
>   add x0, sp, 16
>   stp x1, x1, [sp, 16]
>   str x10, [sp, 32]
>   stp w9, w8, [sp, 40]
>   str q0, [sp, 48]
>   str q1, [sp, 64]
>   str q2, [sp, 80]
>   str q3, [sp, 96]
>   str q4, [sp, 112]
>   str q5, [sp, 128]
>   str q6, [sp, 144]
>   str q7, [sp, 160]
>   stp x3, x4, [sp, 200]
>   stp x5, x6, [sp, 216]
>   str x7, [sp, 232]
>   bl  g
>   ldp x29, x30, [sp], 240
> .LCFI1:
>   ret
>
> to:
>
> foo:
> .LFB0:
>   stp x29, x30, [sp, -240]!
> .LCFI0:
>   mov w9, -56
>   mov w8, -128
>   mov x29, sp
>   add x10, sp, 176
>   stp x1, x2, [sp, 184]
>   add x1, sp, 240
>   add x0, sp, 16
>   stp x1, x1, [sp, 16]
>   str x10, [sp, 32]
>   stp w9, w8, [sp, 40]
>   stp q0, q1, [sp, 48]
>   stp q2, q3, [sp, 80]
>   stp q4, q5, [sp, 112]
>   stp q6, q7, [sp, 144]
>   stp x3, x4, [sp, 200]
>   stp x5, x6, [sp, 216]
>   str x7, [sp, 232]
>   bl  g
>   ldp x29, x30, [sp], 240
> .LCFI1:
>   ret
>
> Note that this patch isn't needed if we only use the mode
> canonicalization approach in the new ldp fusion pass (since we
> canonicalize T{I,F,D}mode to V16QImode), but we seem to get slightly
> better performance with mode canonicalization disabled (see
> --param=aarch64-ldp-canonicalize-modes in the following patch).
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md (load_pair_dw_tftf): Rename to ...
>   (load_pair_dw_): ... this.
>   (store_pair_dw_tftf): Rename to ...
>   (store_pair_dw_): ... this.
>   * config/aarch64/iterators.md (TX2): New.

OK, thanks.  It would be nice to investigate & fix the reasons for
the regressions with canonicalised modes, but I agree that this patch
is a strict improvement, since it fixes a hole in the current scheme.

Richard

> ---
>  gcc/config/aarch64/aarch64.md   | 22 +++---
>  gcc/config/aarch64/iterators.md |  3 +++
>  2 files changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 32c7adc8928..e6af09c2e8b 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1757,16 +1757,16 @@ (define_insn "load_pair_dw_"
>}
>  )
>  
> -(define_insn "load_pair_dw_tftf"
> -  [(set (match_operand:TF 0 "register_operand" "=w")
> - (match_operand:TF 1 "aarch64_mem_pair_operand" "Ump"))
> -   (set (match_operand:TF 2 "register_operand" "=w")
> - (match_operand:TF 3 "memory_operand" "m"))]
> +(define_insn "load_pair_dw_"
> +  [(set (match_operand:TX 0 "register_operand" "=w")
> + (match_operand:TX 1 "aarch64_mem_pair_operand" "Ump"))
> +   (set (match_operand:TX2 2 "register_operand" "=w")
> + (match_operand:TX2 3 "memory_operand" "m"))]
> "TARGET_SIMD
>  && rtx_equal_p (XEXP (operands[3], 0),
>   plus_constant (Pmode,
>  XEXP (operands[1], 0),
> -GET_MODE_SIZE (TFmode)))"
> +GET_MODE_SIZE (mode)))"
>"ldp\\t%q0, %q2, %z1"
>[(set_attr "type" "neon_ldp_q")
> (set_attr "fp" "yes")]
> @@ -1805,11 +1805,11 @@ (define_insn "store_pair_dw_"
>}
>  )
>  
> -(define_insn "store_pair_dw_tftf"
> -  [(set (match_operand:TF 0 "aarch64_mem_pair_operand" "=Ump")
> - (match_operand:TF 1 "register_operand" "w"))
> -   (set (match_operand:TF 2 "memory_operand" "=m")
> - (match_operand:TF 3 "register_operand" "w"))]
> +(define_insn "store_pair_dw_"
> +  [(set (match_operand:TX 0 "aarch64_mem_pair_operand" "=Ump")
> + (match_operand:TX 1 "register_operand" "w"))
> +   (set (match_operand:TX2 2 "memory_operand" "=m")
> + (match_operand:TX2 3 "register_operand" "w"))]
> "TARGET_SIMD &&
>  rtx_equal_p (XEXP (operands[2], 0),
>
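
To make the adjacency condition in these patterns concrete, here is a small sketch under simplifying assumptions. The insn condition requires the second memory address to equal the first plus GET_MODE_SIZE of the first operand's mode; below that is reduced to comparing stack offsets. adjacent and pair_offsets are hypothetical helpers, and the greedy pairing is only an illustration, not how the ldp/stp fusion pass works.

```cpp
#include <cstddef>
#include <vector>

// True when the second access starts exactly `size` bytes after the first,
// mirroring the "address 2 == address 1 + mode size" condition.
bool adjacent(long first_off, long second_off, long size)
{
  return second_off == first_off + size;
}

// Greedily pair neighbouring accesses; returns the offset of each pair.
std::vector<long> pair_offsets(const std::vector<long> &offs, long size)
{
  std::vector<long> pairs;
  for (std::size_t i = 0; i + 1 < offs.size();)
    {
      if (adjacent(offs[i], offs[i + 1], size))
        {
          pairs.push_back(offs[i]);
          i += 2;   // both accesses are consumed by the pair
        }
      else
        ++i;        // leave this access unpaired
    }
  return pairs;
}
```

Feeding in the q-register store offsets from the "from" listing (48, 64, ..., 160, each 16 bytes wide) gives pairs at 48, 80, 112 and 144, which matches the four stp instructions in the "to" listing.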

[Bug target/111867] New: aarch64: Wrong code for bf16 literal load when the arch support +fp16

2023-10-18 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111867

Bug ID: 111867
   Summary: aarch64: Wrong code for bf16 literal load when the
arch support +fp16
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: iains at gcc dot gnu.org
  Target Milestone: ---

Analysing some target fails for aarch64-darwin.

The base arch for Apple M1 is (as far as I can determine)
armv8.4-a+fp16+sb+ssbs
So it has fp16 and fp16fml, but not bf16.

(M2 does have bf16).

===

int main ()
{
  __bf16 a = 1.0bf16;
  return (int) (a + a);
}

=== with the arch flags above produces:
_main:
 
fmov h31, 1.0e+0
str h31, [x29, 46]
ldr h15, [x29, 46]
mov v0.h[0], v15.h[0]
bl  ___extendbfsf2
fmov s14, s0


Which seems to be loading an __fp16 value into h31 (not a __bf16 value);
not surprisingly, this fails.


I checked the instruction bit pattern with objdump, and it is 0x1eee101f, which
is clearly a fp16 load.


=== with pruned arch flags armv8.4-a produces:

_main:
 
adrp x0, lC0@PAGE
ldr h31, [x0, #lC0@PAGEOFF]
str h31, [x29, 30]
ldr h0, [x29, 30]
bl  ___extendbfsf2



lC0:
.hword  16256

which looks correct (and produces the expected answer).

So support for fp16 seems to be breaking soft __bf16.
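
The two observations in this report can be checked with a few lines of host code. This is a sketch that assumes float is IEEE binary32; bf16_bits and fmov_imm_ftype are hypothetical helper names. A bf16 value is the top 16 bits of the binary32 pattern, so 1.0 encodes as 0x3F80 (16256, the .hword emitted in the correct output), whereas 1.0 in fp16 is 0x3C00. For FMOV (scalar, immediate), bits 23:22 hold the ftype field, and the value 0b11 there selects half precision.

```cpp
#include <cstdint>
#include <cstring>

// bf16 bit pattern obtained by truncating binary32; exact for values such
// as 1.0f whose low 16 mantissa bits are zero.
uint16_t bf16_bits(float f)
{
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return static_cast<uint16_t>(bits >> 16);
}

// fp16 (IEEE binary16) encoding of 1.0, for comparison.
constexpr uint16_t fp16_one = 0x3C00;

// ftype field (bits 23:22) of an FMOV (scalar, immediate) encoding.
unsigned fmov_imm_ftype(uint32_t insn)
{
  return (insn >> 22) & 0x3;
}
```

Here bf16_bits(1.0f) is 16256, and fmov_imm_ftype(0x1eee101f) is 3, consistent with the report's reading of that instruction as an fp16 move.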

Re: [PATCH 09/11] aarch64, testsuite: Fix up pr71727.c

2023-10-18 Thread Richard Sandiford
Alex Coplan  writes:
> The test is trying to check that we don't use q-register stores with
> -mstrict-align, so actually check specifically for that.
>
> This is a prerequisite to avoid regressing:
>
> scan-assembler-not "add\tx0, x0, :"
>
> with the upcoming ldp fusion pass, as we change where the ldps are
> formed such that a register is used rather than a symbolic (lo_sum)
> address for the first load.
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/pr71727.c: Adjust scan-assembler-not to
>   make sure we don't have q-register stores with -mstrict-align.

OK, thanks.

Richard

> ---
>  gcc/testsuite/gcc.target/aarch64/pr71727.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr71727.c 
> b/gcc/testsuite/gcc.target/aarch64/pr71727.c
> index 41fa72bc67e..226258a76fe 100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr71727.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr71727.c
> @@ -30,4 +30,4 @@ _start (void)
>  }
>  
>  /* { dg-final { scan-assembler-times "mov\tx" 5 {target lp64} } } */
> -/* { dg-final { scan-assembler-not "add\tx0, x0, :" {target lp64} } } */
> +/* { dg-final { scan-assembler-not {st[rp]\tq[0-9]+} {target lp64} } } */
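
As a quick sanity check of what the new pattern does and does not match, the same expression can be tried with std::regex. This is only a sketch: the testsuite uses Tcl regular expressions, and has_q_store is a hypothetical helper, but for this simple pattern the two dialects should agree.

```cpp
#include <regex>
#include <string>

// True when the text contains a str or stp whose first register operand
// is a q-register, i.e. the thing -mstrict-align is meant to avoid here.
bool has_q_store(const std::string &asm_text)
{
  static const std::regex q_store("st[rp]\tq[0-9]+");
  return std::regex_search(asm_text, q_store);
}
```

A line such as "\tstr\tq0, [x0]" or "\tstp\tq0, q1, [x0]" matches, while x-register stores and the add with a symbolic operand that the old pattern looked for do not.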


Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-10-18 Thread Jason Merrill

On 10/18/23 13:28, waffl3x wrote:

I will try to get something done today, but I was struggling with
writing some of the tests, there's also a lot more of them now. I also
wrote a bunch of musings in comments that I would like feedback on.

My most concrete question is, how exactly should I be testing a
pedwarn, I want to test that I get the correct warning and error with
the separate flags, do I have to create two separate tests for each one?



Yes. I tend to use letter suffixes for tests that vary only in flags
(and expected results), e.g. feature1a.C, feature1b.C.


Will do.


Instead of OPT_Wpedantic, this should be controlled by
-Wc++23-extensions (OPT_Wc__23_extensions)


Yeah, I'll do this.


If you wanted, you could add a more specific warning option for this
(e.g. -Wc++23-explicit-this) which is also affected by
-Wc++23-extensions, but I would lean toward just using the existing
flag. Up to you.


I brought it up in irc and there was some pushback to my point of view
on it, so I'll just stick with OPT_Wc__23_extensions for now. I do
think a more sophisticated interface would be beneficial but I will
bring discussion around that up again in the future.

I've seen plenty of these G_ or _ macros on strings around like in
grokfndecl for these errors.

G_("static member function %qD cannot have cv-qualifier")
G_("non-member function %qD cannot have cv-qualifier")

G_("static member function %qD cannot have ref-qualifier")
G_("non-member function %qD cannot have ref-qualifier")

I have been able to figure out it relates to translation, but not
exactly what the protocol around them is.


The protocol is described in gcc/ABOUT-GCC-NLS.  In general, "strings" 
passed directly to a diagnostic function don't need any decoration, but 
if they're assigned to a variable first, they need G_() so they're 
recognized as diagnostic strings to be added to the translation table.


The _() macro is used for strings that are going to be passed to a %s, 
but better to avoid doing that for strings that need translation.  N_() 
is (rarely) used for strings that aren't diagnostic format strings, but 
get passed to another function that passes them to _().


Jason
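
A small sketch of the G_() case described above, with the caveat that the macro definition is an assumption: as far as I understand GCC's intl.h, G_() expands to its argument and exists only so that the string is picked up for the translation table. cv_qualifier_msg is a hypothetical helper, and a real caller would pass the selected string to a diagnostic function such as error ().

```cpp
#include <string>

// Assumed, simplified stand-in for the marker macro.
#define G_(gmsgid) gmsgid

// The format string is chosen first and used later, so each literal is
// wrapped in G_() to stay recognisable as a diagnostic string.
const char *cv_qualifier_msg(bool is_static_member)
{
  const char *msg
    = (is_static_member
       ? G_("static member function %qD cannot have cv-qualifier")
       : G_("non-member function %qD cannot have cv-qualifier"));
  return msg;
}
```

A literal passed directly to the diagnostic call would need no such wrapper.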



Re: [PATCH 08/11] aarch64, testsuite: Tweak sve/pcs/args_9.c to allow stps

2023-10-18 Thread Richard Sandiford
Alex Coplan  writes:
> With the new ldp/stp pass enabled, there is a change in the codegen for
> this test as follows:
>
> add x8, sp, 16
> ptrue   p3.h, mul3
> str p3, [x8]
> -   str x8, [sp, 8]
> -   str x9, [sp]
> +   stp x9, x8, [sp]
> ptrue   p3.d, vl8
> ptrue   p2.s, vl7
> ptrue   p1.h, vl6
>
> i.e. we now form an stp that we were missing previously. This patch
> adjusts the scan-assembler such that it should pass whether or not
> we form the stp.
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/sve/pcs/args_9.c: Adjust scan-assemblers to
>   allow for stp.

OK, thanks.

Richard

> ---
>  gcc/testsuite/gcc.target/aarch64/sve/pcs/args_9.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_9.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_9.c
> index ad9affadf02..942a44ab448 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_9.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_9.c
> @@ -45,5 +45,5 @@ caller (int64_t *x0, int16_t *x1, svbool_t p0)
>return svcntp_b8 (res, res);
>  }
>  
> -/* { dg-final { scan-assembler {\tptrue\t(p[0-9]+)\.b, mul3\n\tstr\t\1, \[(x[0-9]+)\]\n.*\tstr\t\2, \[sp\]\n} } } */
> -/* { dg-final { scan-assembler {\tptrue\t(p[0-9]+)\.h, mul3\n\tstr\t\1, \[(x[0-9]+)\]\n.*\tstr\t\2, \[sp, 8\]\n} } } */
> +/* { dg-final { scan-assembler {\tptrue\t(p[0-9]+)\.b, mul3\n\tstr\t\1, \[(x[0-9]+)\]\n.*\t(?:str\t\2, \[sp\]|stp\t\2, x[0-9]+, \[sp\])\n} } } */
> +/* { dg-final { scan-assembler {\tptrue\t(p[0-9]+)\.h, mul3\n\tstr\t\1, \[(x[0-9]+)\]\n.*\t(?:str\t\2, \[sp, 8\]|stp\tx[0-9]+, \2, \[sp\])\n} } } */
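
To see why the adjusted pattern accepts both code sequences, here is a rough
sketch that ports the ".h" pattern to std::regex. Assumptions: the function
name matches_h_store is made up, ".*" is written as "(?:.|\n)*" because "."
does not match newlines in the ECMAScript grammar (unlike in the Tcl regexps
the testsuite uses), and the assembly strings a caller passes in would be
hand-written, not real compiler output.

```cpp
#include <regex>
#include <string>

// Returns true if ASM_TEXT contains the ptrue/str sequence followed by
// either the separate "str xN, [sp, 8]" or the fused "stp xM, xN, [sp]".
// \1 is the predicate register, \2 the address register that was stored to.
bool matches_h_store (const std::string &asm_text)
{
  static const std::regex pattern (
    R"(\tptrue\t(p[0-9]+)\.h, mul3\n\tstr\t\1, \[(x[0-9]+)\]\n)"
    R"((?:.|\n)*\t(?:str\t\2, \[sp, 8\]|stp\tx[0-9]+, \2, \[sp\])\n)");
  return std::regex_search (asm_text, pattern);
}
```

The backreference \2 is what ties the store of the address register to the
earlier "str pN, [xN]", so an stp that pairs some unrelated register in the
second slot would still be rejected.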


Re: [PATCH 07/11] aarch64, testsuite: Prevent stp in lr_free_1.c

2023-10-18 Thread Richard Sandiford
Alex Coplan  writes:
> The test is looking for individual stores which are able to be merged
> into stp instructions.  The test currently passes -fno-schedule-fusion
> -fno-peephole2, presumably to prevent these stores from being turned
> into stps, but this is no longer sufficient with the new ldp/stp fusion
> pass.
>
> As such, we add --param=aarch64-stp-policy=never to prevent stps being
> formed.
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/lr_free_1.c: Add
>   --param=aarch64-stp-policy=never to dg-options.

OK.  Thanks to Manos for adding this --param.

Richard

> ---
>  gcc/testsuite/gcc.target/aarch64/lr_free_1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/lr_free_1.c 
> b/gcc/testsuite/gcc.target/aarch64/lr_free_1.c
> index 50dcf04e697..9949061096e 100644
> --- a/gcc/testsuite/gcc.target/aarch64/lr_free_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/lr_free_1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-options "-fno-inline -O2 -fomit-frame-pointer -ffixed-x2 -ffixed-x3 
> -ffixed-x4 -ffixed-x5 -ffixed-x6 -ffixed-x7 -ffixed-x8 -ffixed-x9 -ffixed-x10 
> -ffixed-x11 -ffixed-x12 -ffixed-x13 -ffixed-x14 -ffixed-x15 -ffixed-x16 
> -ffixed-x17 -ffixed-x18 -ffixed-x19 -ffixed-x20 -ffixed-x21 -ffixed-x22 
> -ffixed-x23 -ffixed-x24 -ffixed-x25 -ffixed-x26 -ffixed-x27 -ffixed-28 
> -ffixed-29 --save-temps -mgeneral-regs-only -fno-ipa-cp -fno-schedule-fusion 
> -fno-peephole2" } */
> +/* { dg-options "-fno-inline -O2 -fomit-frame-pointer -ffixed-x2 -ffixed-x3 
> -ffixed-x4 -ffixed-x5 -ffixed-x6 -ffixed-x7 -ffixed-x8 -ffixed-x9 -ffixed-x10 
> -ffixed-x11 -ffixed-x12 -ffixed-x13 -ffixed-x14 -ffixed-x15 -ffixed-x16 
> -ffixed-x17 -ffixed-x18 -ffixed-x19 -ffixed-x20 -ffixed-x21 -ffixed-x22 
> -ffixed-x23 -ffixed-x24 -ffixed-x25 -ffixed-x26 -ffixed-x27 -ffixed-28 
> -ffixed-29 --save-temps -mgeneral-regs-only -fno-ipa-cp -fno-schedule-fusion 
> -fno-peephole2 --param=aarch64-stp-policy=never" } */
>  
>  extern void abort ();
>  


Re: [PATCH 04/11] rtl-ssa: Support inferring uses of mem in change_insns

2023-10-18 Thread Richard Sandiford
Alex Coplan  writes:
> Currently, rtl_ssa::change_insns requires all new uses and defs to be
> specified explicitly.  This turns out to be rather inconvenient for
> forming load pairs in the new aarch64 load pair pass, as the pass has to
> determine which mem def the final load pair consumes, and then obtain or
> create a suitable use (i.e. significant bookkeeping, just to keep the
> RTL-SSA IR consistent).  It turns out to be much more convenient to
> allow change_insns to infer which def is consumed and create a suitable
> use of mem itself.  This patch does that.
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
>
> gcc/ChangeLog:
>
>   * rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add new
>   parameter to give final insn position, infer use of mem if it isn't
>   specified explicitly.
>   (function_info::change_insns): Pass down final insn position to
>   finalize_new_accesses.
>   * rtl-ssa/functions.h: Add parameter to finalize_new_accesses.

OK, thanks.

Richard

> ---
>  gcc/rtl-ssa/changes.cc  | 31 ---
>  gcc/rtl-ssa/functions.h |  2 +-
>  2 files changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
> index c48ddd2463c..523ad60d7d8 100644
> --- a/gcc/rtl-ssa/changes.cc
> +++ b/gcc/rtl-ssa/changes.cc
> @@ -370,8 +370,11 @@ update_insn_in_place (insn_change &change)
>  // Finalize the new list of definitions and uses in CHANGE, removing
>  // any uses and definitions that are no longer needed, and converting
>  // pending clobbers into actual definitions.
> +//
> +// POS gives the final position of INSN, which hasn't yet been moved into
> +// place.
>  void
> -function_info::finalize_new_accesses (insn_change &change)
> +function_info::finalize_new_accesses (insn_change &change, insn_info *pos)
>  {
>insn_info *insn = change.insn ();
>  
> @@ -462,13 +465,34 @@ function_info::finalize_new_accesses (insn_change &change)
>// Add (possibly temporary) uses to m_temp_uses for each resource.
>// If there are multiple references to the same resource, aggregate
>// information in the modes and flags.
> +  use_info *mem_use = nullptr;
>for (rtx_obj_reference ref : properties.refs ())
>  if (ref.is_read ())
>{
>   unsigned int regno = ref.regno;
>   machine_mode mode = ref.is_reg () ? ref.mode : BLKmode;
>   use_info *use = find_access (unshared_uses, ref.regno);
> - gcc_assert (use);
> + if (!use)
> +   {
> + // For now, we only support inferring uses of mem.
> + gcc_assert (regno == MEM_REGNO);
> +
> + if (mem_use)
> +   {
> + mem_use->record_reference (ref, false);
> + continue;
> +   }
> +
> + resource_info resource { mode, regno };
> + auto def = find_def (resource, pos).prev_def (pos);
> + auto set = safe_dyn_cast<set_info *> (def);
> + gcc_assert (set);
> + mem_use = allocate<use_info> (insn, resource, set);
> + mem_use->record_reference (ref, true);
> + m_temp_uses.safe_push (mem_use);
> + continue;
> +   }
> +
>   if (use->m_has_been_superceded)
> {
>   // This is the first reference to the resource.
> @@ -656,7 +680,8 @@ function_info::change_insns (array_slice<insn_change *> changes)
>  
> // Finalize the new list of accesses for the change.  Don't install
> // them yet, so that we still have access to the old lists below.
> -   finalize_new_accesses (change);
> +   finalize_new_accesses (change,
> +  placeholder ? placeholder : insn);
>   }
>placeholders[i] = placeholder;
>  }
> diff --git a/gcc/rtl-ssa/functions.h b/gcc/rtl-ssa/functions.h
> index d7da9774213..73690a0e63b 100644
> --- a/gcc/rtl-ssa/functions.h
> +++ b/gcc/rtl-ssa/functions.h
> @@ -265,7 +265,7 @@ private:
>  
>insn_info *add_placeholder_after (insn_info *);
>void possibly_queue_changes (insn_change &);
> -  void finalize_new_accesses (insn_change &);
> +  void finalize_new_accesses (insn_change &, insn_info *);
>void apply_changes_to_insn (insn_change &);
>  
>void init_function_data ();
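
The idea behind the inference can be sketched with a toy model. Everything
here is hypothetical (toy_def, toy_use, prev_def and infer_mem_use are not
RTL-SSA types or functions): memory defs are reduced to integer program
points, and the inferred use simply attaches to the last def before the
instruction's final position, with all memory reads sharing that one use.

```cpp
#include <cassert>
#include <vector>

// Toy model: a memory definition is just the program point of the
// instruction that writes memory.
struct toy_def { int point; };

// A use of memory by the instruction at INSN_POINT, consuming DEF.
struct toy_use { int insn_point; const toy_def *def; int num_refs; };

// Return the last def strictly before POS, or null if there is none.
// Assumes DEFS is sorted by increasing program point.
const toy_def *prev_def (const std::vector<toy_def> &defs, int pos)
{
  const toy_def *result = nullptr;
  for (const toy_def &def : defs)
    {
      if (def.point >= pos)
        break;
      result = &def;
    }
  return result;
}

// Infer one use of memory for an instruction that will end up at POS and
// reads memory NUM_MEM_READS times.  The first read creates the use;
// later reads are folded into it instead of creating new uses.
toy_use infer_mem_use (const std::vector<toy_def> &defs, int pos,
                       int num_mem_reads)
{
  assert (num_mem_reads > 0);
  const toy_def *def = prev_def (defs, pos);
  assert (def);  // Loosely mirrors the gcc_assert (set) in the patch.
  toy_use use { pos, def, 0 };
  for (int i = 0; i < num_mem_reads; ++i)
    ++use.num_refs;
  return use;
}
```

This is why the patch threads the final position through to
finalize_new_accesses: the def that gets consumed depends on where the
instruction ends up, not on where it currently sits.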


Re: [PATCH 03/11] rtl-ssa: Add entry point to allow re-parenting uses

2023-10-18 Thread Richard Sandiford
Alex Coplan  writes:
> This is needed by the upcoming aarch64 load pair pass, as it can
> re-order stores (when alias analysis determines this is safe) and thus
> change which mem def a given use consumes (in the RTL-SSA view, there is
> no alias disambiguation of memory).
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
>
> gcc/ChangeLog:
>
>   * rtl-ssa/accesses.cc (function_info::reparent_use): New.
>   * rtl-ssa/functions.h (function_info): Declare new member
>   function reparent_use.

OK, thanks.

Richard

> ---
>  gcc/rtl-ssa/accesses.cc | 8 
>  gcc/rtl-ssa/functions.h | 3 +++
>  2 files changed, 11 insertions(+)
>
> diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
> index f12b5f4dd77..774ab9d99ee 100644
> --- a/gcc/rtl-ssa/accesses.cc
> +++ b/gcc/rtl-ssa/accesses.cc
> @@ -1239,6 +1239,14 @@ function_info::add_use (use_info *use)
>  insert_use_before (use, neighbor->value ());
>  }
>  
> +void
> +function_info::reparent_use (use_info *use, set_info *new_def)
> +{
> +  remove_use (use);
> +  use->set_def (new_def);
> +  add_use (use);
> +}
> +
>  // If USE has a known definition, remove USE from that definition's list
>  // of uses.  Also remove if it from the associated splay tree, if any.
>  void
> diff --git a/gcc/rtl-ssa/functions.h b/gcc/rtl-ssa/functions.h
> index 8b53b264064..d7da9774213 100644
> --- a/gcc/rtl-ssa/functions.h
> +++ b/gcc/rtl-ssa/functions.h
> @@ -159,6 +159,9 @@ public:
>// Like change_insns, but for a single change CHANGE.
>void change_insn (insn_change &change);
>  
> +  // Given a use USE, re-parent it to get its def from NEW_DEF.
> +  void reparent_use (use_info *use, set_info *new_def);
> +
>// If the changes that have been made to instructions require updates
>// to the CFG, perform those updates now.  Return true if something changed.
>// If it did:
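
The remove / set_def / add sequence in reparent_use can be illustrated with
a small self-contained model. The types below (toy_set, toy_use) are
hypothetical simplifications: real RTL-SSA keeps uses in an ordered list
with an optional splay tree, whereas this sketch uses a plain vector.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct toy_set;

// A use records which definition it consumes (null if none yet).
struct toy_use { toy_set *def = nullptr; };

// A definition keeps the list of uses that consume it.
struct toy_set { std::vector<toy_use *> uses; };

// Unlink USE from its current definition's list, if it has one.
void remove_use (toy_use *use)
{
  if (!use->def)
    return;
  auto &uses = use->def->uses;
  uses.erase (std::remove (uses.begin (), uses.end (), use), uses.end ());
}

// Link USE into the list of the definition it currently points at.
void add_use (toy_use *use)
{
  if (use->def)
    use->def->uses.push_back (use);
}

// Same three steps as the patch: unlink, retarget, relink.
void reparent_use (toy_use *use, toy_set *new_def)
{
  remove_use (use);
  use->def = new_def;
  add_use (use);
}
```

Doing the unlink before changing the def pointer matters, because the old
definition's list is found through that pointer.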

