Re: [17/17] check hash table insertions

2022-12-28 Thread Richard Biener via Gcc-patches



> Am 29.12.2022 um 00:06 schrieb Alexandre Oliva :
> 
> On Dec 28, 2022, Richard Biener  wrote:
> 
>> I wonder whether, on INSERT, pushing a DELETED marker would fix the
>> dangling insert and search-during-delete problems, and whether that
>> would be better from a design point of view?  (It of course requires
>> a DELETED representation.)
> 
> I'm undecided whether a design that rules out the possibility of not
> having DELETED would qualify as unequivocally better.
> 
> Unless DELETED takes over the NULL representation, and something else is
> used for EMPTY, every INSERT point would have to be adjusted to look for
> the non-NULL DELETED representation.

Not sure what you mean - I think turning EMPTY (or DELETED) into DELETED at 
insert time would make the slot occupied for lookups and thus fix lookup during 
insert.  Of course insert during insert would still fail and possibly return 
the same slot again.  So the most foolproof design would be a new INSERTING 
representation.

>  That alone was enough for me to
> rule out going down that path.
> 
> If we were to change the INSERT callers, it would make sense to
> e.g. return a smart pointer that enforced the completion of the
> insertion.  But that wouldn't fix lookups during insertion without
> requiring a separate DELETED representation.
> 
> The one-pending-insertion strategy I implemented was the only one I
> could find that addressed all of the pitfalls without significant
> overhead.  It caught even one case that the mere element-counting had
> failed to catch.

Yes, it’s true that this addresses all the issues while only imposing 
constraints on API use.  But then I wondered: how do you catch the completion 
of the insertion?  (I’ll have to dig into the patch to see.)

Richard 

> 
> -- 
> Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
>   Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 


Re: [15/17] prevent hash set/map insertion of deleted entries

2022-12-28 Thread Jeff Law via Gcc-patches




On 12/28/22 05:32, Alexandre Oliva via Gcc-patches wrote:

On Dec 27, 2022, David Malcolm  wrote:


Would it make sense to also add assertions that such entries aren't
Traits::is_deleted?  (both for hash_map and hash_set)


Yeah, I guess so.  I've come up with something for hash-table proper
too, coming up in 17/17.


Just like the recently-added checks for empty entries, add checks for
deleted entries as well.  This didn't catch any problems, but it might
prevent future accidents.  Suggested by David Malcolm.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* hash-map.h (put, get_or_insert): Check that added entry
doesn't look deleted either.
* hash-set.h (add): Likewise.

OK
jeff


Re: [14/17] parloops: don't request insert that won't be completed

2022-12-28 Thread Jeff Law via Gcc-patches




On 12/28/22 05:30, Alexandre Oliva via Gcc-patches wrote:


In take_address_of, we may refrain from completing a decl_address
INSERT if gsi is NULL, so don't even ask for an INSERT in this case.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* tree-parloops.cc (take_address_of): Skip INSERT if !gsi.

OK
jeff


Re: [PATCH] RISC-V: Optimize min/max with SImode sources on 64-bit

2022-12-28 Thread Jeff Law via Gcc-patches




On 12/28/22 11:18, Raphael Moreira Zinsly wrote:

The Zbb min/max pattern was not matching 32-bit sources when
compiling for 64-bit.
This patch separates the pattern into SImode and DImode, and
uses a define_expand to handle SImode on 64-bit.
zbb-min-max-02.c generates different code as a result of the new
expander.  The resulting code is as efficient as the old code.
Furthermore, the special sh1add pattern that appeared in
zbb-min-max-02.c is tested by the zba-shNadd-* tests.

gcc/ChangeLog:

* config/riscv/bitmanip.md
(<bitmanip_optab><mode>3): Divide pattern into
<bitmanip_optab>si3_insn and <bitmanip_optab>di3.
(<bitmanip_optab>si3): Handle SImode sources on
TARGET_64BIT.

gcc/testsuite:

* gcc.target/riscv/zbb-abs.c: New test.
* gcc.target/riscv/zbb-min-max-02.c: Adapt the
expected output.

So we need to do a bit of x86 debugging.

Given the regressions below on the x86 testsuite, we should assume there 
may be other targets where the optimization might result in testsuite 
regressions.


The good news is we can use my upstream GCC tester to help identify 
some of these cases.  So I'll put the simplify-rtx change into my tester 
and see what pops out overnight on the embedded targets.


You're also missing a ChangeLog entry for the simplify-rtx change. 
Sorry I didn't catch that sooner.




--- /home/jlaw/gcc.sum  2022-12-28 15:59:52.612140312 -0700
+++ gcc/testsuite/gcc/gcc.sum   2022-12-28 17:14:52.661333526 -0700
@@ -182730,9 +182730,9 @@
 PASS: gcc.target/i386/pr106095.c (test for excess errors)
 PASS: gcc.target/i386/pr106122.c (test for excess errors)
 PASS: gcc.target/i386/pr106231-1.c (test for excess errors)
-PASS: gcc.target/i386/pr106231-1.c scan-assembler-not cltq
+FAIL: gcc.target/i386/pr106231-1.c scan-assembler-not cltq
 PASS: gcc.target/i386/pr106231-2.c (test for excess errors)
-PASS: gcc.target/i386/pr106231-2.c scan-assembler-not cltq
+FAIL: gcc.target/i386/pr106231-2.c scan-assembler-not cltq
 PASS: gcc.target/i386/pr106273.c (test for excess errors)
 PASS: gcc.target/i386/pr106273.c scan-assembler-not andn[ \\t]+%rdi, 
%r11, %rdi

 PASS: gcc.target/i386/pr106303.c (test for excess errors)
@@ -186148,7 +186148,7 @@
 PASS: gcc.target/i386/pr91704.c scan-assembler-not \tvpsubusb\t
 PASS: gcc.target/i386/pr91704.c scan-assembler-times \tvpcmpgtb\t%ymm 1
 PASS: gcc.target/i386/pr91824-1.c (test for excess errors)
-PASS: gcc.target/i386/pr91824-1.c scan-assembler-not cltq
+FAIL: gcc.target/i386/pr91824-1.c scan-assembler-not cltq
 PASS: gcc.target/i386/pr91824-2.c (test for excess errors)
 PASS: gcc.target/i386/pr91824-2.c scan-assembler-not cltq
 PASS: gcc.target/i386/pr91824-2.c scan-assembler-not movl\t%eax, %eax
@@ -186561,9 +186561,9 @@
 PASS: gcc.target/i386/pr95483-7.c (test for excess errors)
 PASS: gcc.target/i386/pr95483-7.c scan-assembler-times vmovdqu[ 
\\t]+[^{\n]*\\)[^\n]*%xmm[0-9]+(?:\n|[ \\t]+#) 2

 PASS: gcc.target/i386/pr95535-1.c (test for excess errors)
-PASS: gcc.target/i386/pr95535-1.c scan-assembler-not cltq
+FAIL: gcc.target/i386/pr95535-1.c scan-assembler-not cltq
 PASS: gcc.target/i386/pr95535-2.c (test for excess errors)
-PASS: gcc.target/i386/pr95535-2.c scan-assembler-not cltq
+FAIL: gcc.target/i386/pr95535-2.c scan-assembler-not cltq
 PASS: gcc.target/i386/pr95740.c (test for excess errors)
 PASS: gcc.target/i386/pr95740.c scan-assembler-times (?n)incl[\\t ]*%eax 1
 PASS: gcc.target/i386/pr95740.c scan-assembler-times (?n)incq[\\t ]*%rax 1



Re: [17/17] check hash table insertions

2022-12-28 Thread Alexandre Oliva via Gcc-patches
On Dec 28, 2022, Richard Biener  wrote:

> I wonder whether, on INSERT, pushing a DELETED marker would fix the
> dangling insert and search-during-delete problems, and whether that
> would be better from a design point of view?  (It of course requires
> a DELETED representation.)

I'm undecided whether a design that rules out the possibility of not
having DELETED would qualify as unequivocally better.

Unless DELETED takes over the NULL representation, and something else is
used for EMPTY, every INSERT point would have to be adjusted to look for
the non-NULL DELETED representation.  That alone was enough for me to
rule out going down that path.

If we were to change the INSERT callers, it would make sense to
e.g. return a smart pointer that enforced the completion of the
insertion.  But that wouldn't fix lookups during insertion without
requiring a separate DELETED representation.

The one-pending-insertion strategy I implemented was the only one I
could find that addressed all of the pitfalls without significant
overhead.  It caught even one case that the mere element-counting had
failed to catch.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] RISC-V: Optimize min/max with SImode sources on 64-bit

2022-12-28 Thread Raphael Moreira Zinsly
The Zbb min/max pattern was not matching 32-bit sources when
compiling for 64-bit.
This patch separates the pattern into SImode and DImode, and
uses a define_expand to handle SImode on 64-bit.
zbb-min-max-02.c generates different code as a result of the new
expander.  The resulting code is as efficient as the old code.
Furthermore, the special sh1add pattern that appeared in
zbb-min-max-02.c is tested by the zba-shNadd-* tests.

gcc/ChangeLog:

* config/riscv/bitmanip.md
(<bitmanip_optab><mode>3): Divide pattern into
<bitmanip_optab>si3_insn and <bitmanip_optab>di3.
(<bitmanip_optab>si3): Handle SImode sources on
TARGET_64BIT.

gcc/testsuite:

* gcc.target/riscv/zbb-abs.c: New test.
* gcc.target/riscv/zbb-min-max-02.c: Adapt the
expected output.
---
 gcc/config/riscv/bitmanip.md  | 38 +++++++++++++++++++++++++++++++++-----
 gcc/testsuite/gcc.target/riscv/zbb-abs.c  | 18 ++++++++++++++++++
 .../gcc.target/riscv/zbb-min-max-02.c |  2 +-
 3 files changed, 52 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-abs.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index d17133d58c1..abf08a29e89 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -360,14 +360,42 @@
   DONE;
 })
 
-(define_insn "<bitmanip_optab><mode>3"
-  [(set (match_operand:X 0 "register_operand" "=r")
-(bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
-  (match_operand:X 2 "register_operand" "r")))]
-  "TARGET_ZBB"
+(define_insn "<bitmanip_optab>si3_insn"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
+(match_operand:SI 2 "register_operand" "r")))]
+  "!TARGET_64BIT && TARGET_ZBB"
   "<bitmanip_insn>\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+(define_insn "<bitmanip_optab>di3"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(bitmanip_minmax:DI (match_operand:DI 1 "register_operand" "r")
+(match_operand:DI 2 "register_operand" "r")))]
+  "TARGET_64BIT && TARGET_ZBB"
+  "<bitmanip_insn>\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")])
+
+(define_expand "<bitmanip_optab>si3"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
+(match_operand:SI 2 "register_operand" "r")))]
+  "TARGET_ZBB"
+  "
+{
+  if (TARGET_64BIT)
+{
+  rtx op1_x = gen_reg_rtx (DImode);
+  emit_move_insn (op1_x, gen_rtx_SIGN_EXTEND (DImode, operands[1]));
+  rtx op2_x = gen_reg_rtx (DImode);
+  emit_move_insn (op2_x, gen_rtx_SIGN_EXTEND (DImode, operands[2]));
+  rtx dst_x = gen_reg_rtx (DImode);
+  emit_insn (gen_<bitmanip_optab>di3 (dst_x, op1_x, op2_x));
+  emit_move_insn (operands[0], gen_lowpart (SImode, dst_x));
+  DONE;
+}
+}")
+
 ;; Optimize the common case of a SImode min/max against a constant
 ;; that is safe both for sign- and zero-extension.
 (define_insn_and_split "*minmax"
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-abs.c 
b/gcc/testsuite/gcc.target/riscv/zbb-abs.c
new file mode 100644
index 000..6ef7efdbd49
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-abs.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbb" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+#define ABS(x) (((x) >= 0) ? (x) : -(x))
+
+int
+foo (int x)
+{
+  return ABS(x);
+}
+
+/* { dg-final { scan-assembler-times "neg" 1 } } */
+/* { dg-final { scan-assembler-times "max" 1 } } */
+/* { dg-final { scan-assembler-not "sraiw" } } */
+/* { dg-final { scan-assembler-not "xor" } } */
+/* { dg-final { scan-assembler-not "subw" } } */
+
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c 
b/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
index b462859f10f..b9db655d55d 100644
--- a/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
+++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
@@ -9,6 +9,6 @@ int f(unsigned int* a)
 }
 
 /* { dg-final { scan-assembler-times "minu" 1 } } */
-/* { dg-final { scan-assembler-times "sext.w" 1 } } */
+/* { dg-final { scan-assembler-times "sext.w|addw" 1 } } */
 /* { dg-final { scan-assembler-not "zext.w" } } */
 
-- 
2.38.1



[PATCH 2/2] libstdc++: Implement P1413R3 'deprecate aligned_storage and aligned_union'

2022-12-28 Thread Nathaniel Shead via Gcc-patches
These two patches implement P1413 (deprecate std::aligned_storage and
std::aligned_union) for C++23. Tested on x86_64-linux.

-- >8 --

Adds deprecated attributes for C++23, and makes use of it for
std::aligned_storage, std::aligned_storage_t, std::aligned_union, and
std::aligned_union_t.

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in (PREDEFINED): Add new macros.
* include/bits/c++config (_GLIBCXX23_DEPRECATED)
(_GLIBCXX23_DEPRECATED_SUGGEST): New macros.
* include/std/type_traits (aligned_storage, aligned_union)
(aligned_storage_t, aligned_union_t): Deprecate for C++23.
* testsuite/20_util/aligned_storage/deprecated-2b.cc: New test.
* testsuite/20_util/aligned_union/deprecated-2b.cc: New test.

Signed-off-by: Nathaniel Shead 
---
 libstdc++-v3/doc/doxygen/user.cfg.in  |  2 ++
 libstdc++-v3/include/bits/c++config   | 10 +++
 libstdc++-v3/include/std/type_traits  | 17 +---
 .../20_util/aligned_storage/deprecated-2b.cc  | 26 +++
 .../20_util/aligned_union/deprecated-2b.cc| 26 +++
 5 files changed, 77 insertions(+), 4 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/20_util/aligned_storage/deprecated-2b.cc
 create mode 100644 
libstdc++-v3/testsuite/20_util/aligned_union/deprecated-2b.cc

diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in 
b/libstdc++-v3/doc/doxygen/user.cfg.in
index fc46e722529..31613f51517 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -2396,6 +2396,8 @@ PREDEFINED = __cplusplus=202002L \
  "_GLIBCXX17_DEPRECATED_SUGGEST(E)= " \
  "_GLIBCXX20_DEPRECATED= " \
  "_GLIBCXX20_DEPRECATED_SUGGEST(E)= " \
+ "_GLIBCXX23_DEPRECATED= " \
+ "_GLIBCXX23_DEPRECATED_SUGGEST(E)= " \
  _GLIBCXX17_INLINE=inline \
  _GLIBCXX_CHRONO_INT64_T=int64_t \
  _GLIBCXX_DEFAULT_ABI_TAG \
diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index d2b0cfa15ce..7cec5d3de2d 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -86,6 +86,8 @@
 //   _GLIBCXX17_DEPRECATED_SUGGEST( string-literal )
 //   _GLIBCXX20_DEPRECATED
 //   _GLIBCXX20_DEPRECATED_SUGGEST( string-literal )
+//   _GLIBCXX23_DEPRECATED
+//   _GLIBCXX23_DEPRECATED_SUGGEST( string-literal )
 #ifndef _GLIBCXX_USE_DEPRECATED
 # define _GLIBCXX_USE_DEPRECATED 1
 #endif
@@ -131,6 +133,14 @@
 # define _GLIBCXX20_DEPRECATED_SUGGEST(ALT)
 #endif
 
+#if defined(__DEPRECATED) && (__cplusplus >= 202100L)
+# define _GLIBCXX23_DEPRECATED [[__deprecated__]]
+# define _GLIBCXX23_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT)
+#else
+# define _GLIBCXX23_DEPRECATED
+# define _GLIBCXX23_DEPRECATED_SUGGEST(ALT)
+#endif
+
 // Macros for ABI tag attributes.
 #ifndef _GLIBCXX_ABI_TAG_CXX11
 # define _GLIBCXX_ABI_TAG_CXX11 __attribute ((__abi_tag__ ("cxx11")))
diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 2f4d4bb8d4d..9df833e82be 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2088,10 +2088,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  type shall be a POD type suitable for use as uninitialized
*  storage for any object whose size is at most _Len and whose
*  alignment is a divisor of _Align.
+   *
+   *  @deprecated Deprecated in C++23. Uses can be replaced by an
+   *  array std::byte[_Len] declared with alignas(_Align).
   */
   template::__type)>
-struct aligned_storage
+struct
+_GLIBCXX23_DEPRECATED
+aligned_storage
 {
   union type
   {
@@ -2127,9 +2132,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  least size _Len.
*
*  @see aligned_storage
+   *
+   *  @deprecated Deprecated in C++23.
*/
   template 
-struct aligned_union
+struct
+_GLIBCXX23_DEPRECATED
+aligned_union
 {
 private:
   static_assert(sizeof...(_Types) != 0, "At least one type is required");
@@ -2580,10 +2589,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// Alias template for aligned_storage
   template::__type)>
-using aligned_storage_t = typename aligned_storage<_Len, _Align>::type;
+using aligned_storage_t _GLIBCXX23_DEPRECATED = typename 
aligned_storage<_Len, _Align>::type;
 
   template 
-using aligned_union_t = typename aligned_union<_Len, _Types...>::type;
+using aligned_union_t _GLIBCXX23_DEPRECATED = typename aligned_union<_Len, 
_Types...>::type;
 
   /// Alias template for decay
   template
diff --git a/libstdc++-v3/testsuite/20_util/aligned_storage/deprecated-2b.cc 
b/libstdc++-v3/testsuite/20_util/aligned_storage/deprecated-2b.cc
new file mode 100644
index 000..a0e338a5843
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/aligned_stor

[PATCH 1/2] libstdc++: Normalise _GLIBCXX20_DEPRECATED macro

2022-12-28 Thread Nathaniel Shead via Gcc-patches
These two patches implement P1413 (deprecate std::aligned_storage and
std::aligned_union) for C++23. Tested on x86_64-linux.

-- >8 --

Updates _GLIBCXX20_DEPRECATED to be defined and behave the same as the
versions for other standards (e.g. _GLIBCXX17_DEPRECATED).

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in (PREDEFINED): Update macros.
* include/bits/c++config (_GLIBCXX20_DEPRECATED): Make
consistent with other 'deprecated' macros.
* include/std/type_traits (is_pod, is_pod_v): Use
_GLIBCXX20_DEPRECATED_SUGGEST instead.

Signed-off-by: Nathaniel Shead 
---
 libstdc++-v3/doc/doxygen/user.cfg.in | 4 ++--
 libstdc++-v3/include/bits/c++config  | 6 +++---
 libstdc++-v3/include/std/type_traits | 4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in 
b/libstdc++-v3/doc/doxygen/user.cfg.in
index 834ad9e4fd5..fc46e722529 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -2394,8 +2394,8 @@ PREDEFINED = __cplusplus=202002L \
  "_GLIBCXX11_DEPRECATED_SUGGEST(E)= " \
  "_GLIBCXX17_DEPRECATED= " \
  "_GLIBCXX17_DEPRECATED_SUGGEST(E)= " \
- "_GLIBCXX20_DEPRECATED(E)= " \
- "_GLIBCXX20_DEPRECATED(E)= " \
+ "_GLIBCXX20_DEPRECATED= " \
+ "_GLIBCXX20_DEPRECATED_SUGGEST(E)= " \
  _GLIBCXX17_INLINE=inline \
  _GLIBCXX_CHRONO_INT64_T=int64_t \
  _GLIBCXX_DEFAULT_ABI_TAG \
diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index 50406066afe..d2b0cfa15ce 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -84,7 +84,7 @@
 //   _GLIBCXX14_DEPRECATED_SUGGEST( string-literal )
 //   _GLIBCXX17_DEPRECATED
 //   _GLIBCXX17_DEPRECATED_SUGGEST( string-literal )
-//   _GLIBCXX20_DEPRECATED( string-literal )
+//   _GLIBCXX20_DEPRECATED
 //   _GLIBCXX20_DEPRECATED_SUGGEST( string-literal )
 #ifndef _GLIBCXX_USE_DEPRECATED
 # define _GLIBCXX_USE_DEPRECATED 1
@@ -124,10 +124,10 @@
 #endif
 
 #if defined(__DEPRECATED) && (__cplusplus >= 202002L)
-# define _GLIBCXX20_DEPRECATED(MSG) [[deprecated(MSG)]]
+# define _GLIBCXX20_DEPRECATED [[__deprecated__]]
 # define _GLIBCXX20_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT)
 #else
-# define _GLIBCXX20_DEPRECATED(MSG)
+# define _GLIBCXX20_DEPRECATED
 # define _GLIBCXX20_DEPRECATED_SUGGEST(ALT)
 #endif
 
diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 5dc9e1b2921..2f4d4bb8d4d 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -815,7 +815,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Could use is_standard_layout && is_trivial instead of the builtin.
   template
 struct
-_GLIBCXX20_DEPRECATED("use is_standard_layout && is_trivial instead")
+_GLIBCXX20_DEPRECATED_SUGGEST("is_standard_layout && is_trivial")
 is_pod
 : public integral_constant
 {
@@ -3210,7 +3210,7 @@ template 
 template 
   inline constexpr bool is_standard_layout_v = __is_standard_layout(_Tp);
 template 
-  _GLIBCXX20_DEPRECATED("use is_standard_layout_v && is_trivial_v instead")
+  _GLIBCXX20_DEPRECATED_SUGGEST("is_standard_layout_v && is_trivial_v")
   inline constexpr bool is_pod_v = __is_pod(_Tp);
 template 
   _GLIBCXX17_DEPRECATED
-- 
2.34.1



Re: [17/17] check hash table insertions

2022-12-28 Thread Richard Biener via Gcc-patches



> Am 28.12.2022 um 13:51 schrieb Alexandre Oliva via Gcc-patches 
> :
> 
> On Dec 27, 2022, Alexandre Oliva  wrote:
> 
>> The number of bugs it revealed tells me that leaving callers in charge
>> of completing insertions is too error prone.  I found this
>> verification code a bit too expensive to enable in general.
> 
> Here's a relatively cheap, checking-only test to catch dangling inserts.
> 
> 
> I've noticed a number of potential problems in hash tables, of three
> kinds: insertion of entries that seem empty, dangling insertions, and
> lookups during insertions.
> 
> These problems may all have the effect of replacing a deleted entry
> with one that seems empty, which may disconnect double-hashing chains
> involving that entry, and thus cause entries to go missing.
> 
> This patch detects such problems by recording a pending insertion and
> checking that it's completed before other potentially-conflicting
> operations.  The additional field is only introduced when checking is
> enabled.

I wonder whether, on INSERT, pushing a DELETED marker would fix the dangling 
insert and search-during-delete problems, and whether that would be better 
from a design point of view?  (It of course requires a DELETED representation.)

> Regstrapped on x86_64-linux-gnu.  Ok to install?
> 
> 
> for  gcc/ChangeLog
> 
>* hash-table.h (check_complete_insertion, check_insert_slot):
>New hash_table methods.
>(m_inserting_slot): New hash_table field.
>(begin, hash_table ctors, ~hash_table): Check previous insert.
>(expand, empty_slow, clear_slot, find_with_hash): Likewise.
>(remove_elt_with_hash, traverse_noresize): Likewise.
>(gt_pch_nx): Likewise.
>(find_slot_with_hash): Likewise.  Record requested insert.
> ---
> gcc/hash-table.h |   63 +++---
> 1 file changed, 60 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/hash-table.h b/gcc/hash-table.h
> index f4bda6102323e..33753d04b7bdb 100644
> --- a/gcc/hash-table.h
> +++ b/gcc/hash-table.h
> @@ -495,6 +495,7 @@ public:
> {
>   if (Lazy && m_entries == NULL)
>return iterator ();
> +  check_complete_insertion ();
>   iterator iter (m_entries, m_entries + m_size);
>   iter.slide ();
>   return iter;
> @@ -551,8 +552,39 @@ private:
> Descriptor::mark_empty (v);
>   }
> 
> +public:
> +  void check_complete_insertion () const
> +  {
> +#if CHECKING_P
> +if (!m_inserting_slot)
> +  return;
> +
> +gcc_checking_assert (m_inserting_slot >= &m_entries[0]
> + && m_inserting_slot < &m_entries[m_size]);
> +
> +if (!is_empty (*m_inserting_slot))
> +  m_inserting_slot = NULL;
> +else
> +  gcc_unreachable ();
> +#endif
> +  }
> +
> +private:
> +  value_type *check_insert_slot (value_type *ret)
> +  {
> +#if CHECKING_P
> +gcc_checking_assert (is_empty (*ret));
> +m_inserting_slot = ret;
> +#endif
> +return ret;
> +  }
> +
> +#if CHECKING_P
> +  mutable value_type *m_inserting_slot;
> +#endif
> +
>   /* Table itself.  */
> -  typename Descriptor::value_type *m_entries;
> +  value_type *m_entries;
> 
>   size_t m_size;
> 
> @@ -607,6 +639,9 @@ hash_table<Descriptor, Lazy, Allocator>::hash_table (size_t size, bool ggc,
> ATTRIBUTE_UNUSED,
> mem_alloc_origin origin
> MEM_STAT_DECL) :
> +#if CHECKING_P
> +  m_inserting_slot (0),
> +#endif
>   m_n_elements (0), m_n_deleted (0), m_searches (0), m_collisions (0),
>   m_ggc (ggc), m_sanitize_eq_and_hash (sanitize_eq_and_hash)
> #if GATHER_STATISTICS
> @@ -639,6 +674,9 @@ hash_table<Descriptor, Lazy, Allocator>::hash_table (const hash_table &h,
> ATTRIBUTE_UNUSED,
> mem_alloc_origin origin
> MEM_STAT_DECL) :
> +#if CHECKING_P
> +  m_inserting_slot (0),
> +#endif
>   m_n_elements (h.m_n_elements), m_n_deleted (h.m_n_deleted),
>   m_searches (0), m_collisions (0), m_ggc (ggc),
>   m_sanitize_eq_and_hash (sanitize_eq_and_hash)
> @@ -646,6 +684,8 @@ hash_table<Descriptor, Lazy, Allocator>::hash_table (const hash_table &h,
>   , m_gather_mem_stats (gather_mem_stats)
> #endif
> {
> +  h.check_complete_insertion ();
> +
>   size_t size = h.m_size;
> 
>   if (m_gather_mem_stats)
> @@ -675,6 +715,8 @@ template template class Allocator>
> hash_table::~hash_table ()
> {
> +  check_complete_insertion ();
> +
>   if (!Lazy || m_entries)
> {
>   for (size_t i = m_size - 1; i < m_size; i--)
> @@ -778,6 +820,8 @@ template void
> hash_table::expand ()
> {
> +  check_complete_insertion ();
> +
>   value_type *oentries = m_entries;
>   unsigned int oindex = m_size_prime_index;
>   size_t osize = size ();
> @@ -853,6 +897,8 @@ template void
> hash_table::empty_slow ()
> {
> +  check_complete_insertion ();
> +
>   size_t size = m_size;
>   size_t nsize = size;
>   value_type *entries = m_entries;
> @@ -901,6 +947,8 @@ template void
> hash_table::clear_slot (value_type *slot)
> {
> +  check_complete_insertion ();
> +
>

Re: [x86 PATCH] Provide zero_extend versions/variants of several patterns.

2022-12-28 Thread Uros Bizjak via Gcc-patches
On Wed, Dec 28, 2022 at 1:22 PM Roger Sayle  wrote:
>
>
> Hi Uros,
> Many thanks for your reviews.
>
> > On Wed, Dec 28, 2022 at 2:15 AM Roger Sayle
> >  wrote:
> > >
> > >
> > > Back in September, the review of my patch for PR
> > > rtl-optimization/106594,
> > > https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601501.html
> > > suggested that I submit the x86 backend bits, independently and first.
> > >
> > > The executive summary is that the middle-end doesn't have a preferred
> > > canonical form for expressing zero-extension, sometimes using an AND
> > > and sometimes using zero_extend.  Pending changes to RTL
> > > simplification will/may alter some of these representations, so a few
> > > additional patterns are required to recognize these alternate
> > > representations and avoid any testsuite regressions.
> > >
> > > As an example, *popcountsi2_zext is currently represented as:
> > >   [(set (match_operand:DI 0 "register_operand" "=r")
> > > (and:DI
> > >   (subreg:DI
> > > (popcount:SI
> > >   (match_operand:SI 1 "nonimmediate_operand" "rm")) 0)
> > >   (const_int 63)))
> > >(clobber (reg:CC FLAGS_REG))]
> > >
> > > this patch adds an alternate/equivalent pattern that matches:
> > >   [(set (match_operand:DI 0 "register_operand" "=r")
> > >(zero_extend:DI
> > >  (popcount:SI (match_operand:SI 1 "nonimmediate_operand" "rm"
> > >(clobber (reg:CC FLAGS_REG))]
> > >
> > > Another example is *popcounthi2 which is currently represented as:
> > >   [(set (match_operand:SI 0 "register_operand")
> > > (popcount:SI
> > >   (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand"
> > >(clobber (reg:CC FLAGS_REG))]
> > >
> > > this patch adds an alternate/equivalent pattern that matches:
> > >   [(set (match_operand:SI 0 "register_operand")
> > > (zero_extend:SI
> > >   (popcount:HI (match_operand:HI 1 "nonimmediate_operand"
> > >(clobber (reg:CC FLAGS_REG))]
> > >
> > > The contents of the machine description definitions remain the same,
> > > it's just the expected RTL is slightly different but equivalent.
> > > Providing both forms makes the backend more robust to middle-end
> > > changes [and possibly catches some missed optimizations].
> >
> > It would be nice to have a canonical representation of zero-extended 
> > patterns,
> > but this is what we have now. Unfortunately, a certain HW limitation 
> > requires
> > several patterns for one insn, so the canonical representation is even more
> > desirable here.  Hopefully, a "future"
> > patch will allow us some cleanups in this area.
> >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32},
> > > with no new failures.  Ok for mainline?
> >
> > OK, but please split out HImode popcount&1 pattern to a separate patch to 
> > not
> > mix separate topics in one patch.
>
> The peephole2 is the exact same topic; it's not a new optimization or feature,
> but identical fall-out from the representational changes to ZERO_EXTEND.
>
> The existing peephole2 (line 18305 of i386.md) matches this sequence:
>
>(parallel [(set (match_operand:HI 1 "register_operand")
>(popcount:HI
> (match_operand:HI 2 "nonimmediate_operand")))
>   (clobber (reg:CC FLAGS_REG))])
>(set (match_operand 3 "register_operand")
> (zero_extend (match_dup 1)))
>(set (reg:CCZ FLAGS_REG)
> (compare:CCZ (and:QI (match_operand:QI 4 "register_operand")
>  (const_int 1))
>  (const_int 0)))
>(set (pc) (if_then_else (match_operator 5 "bt_comparison_operator"
> [(reg:CCZ FLAGS_REG)
>  (const_int 0)])
>(label_ref (match_operand 6))
>(pc)))]
>
> but with pending tweaks/changes to middle-end zero_extend handing, the
> peephole2 pass now sees the equivalent (apologies for the renumbering):
>
>(parallel [(set (match_operand:HI 1 "register_operand")
>   (popcount:HI
>(match_operand:HI 2 "nonimmediate_operand")))
>  (clobber (reg:CC FLAGS_REG))])
>(set (reg:CCZ FLAGS_REG)
> (compare:CCZ (and:QI (match_operand:QI 3 "register_operand")
> (const_int 1))
> (const_int 0)))
>(set (pc) (if_then_else (match_operator 4 "bt_comparison_operator"
>[(reg:CCZ FLAGS_REG)
> (const_int 0)])
>(label_ref (match_operand 5))
>(pc)))]
>
> Notice that the zero_extension has been eliminated, as match_dup 3
> and match_dup 4 are the same register, and we only need to inspect
> the lowest bit.  i.e. the current transform matches a redundant
> instructi

[17/17] check hash table insertions

2022-12-28 Thread Alexandre Oliva via Gcc-patches
On Dec 27, 2022, Alexandre Oliva  wrote:

> The number of bugs it revealed tells me that leaving callers in charge
> of completing insertions is too error prone.  I found this
> verification code a bit too expensive to enable in general.

Here's a relatively cheap, checking-only test to catch dangling inserts.


I've noticed a number of potential problems in hash tables, of three
kinds: insertion of entries that seem empty, dangling insertions, and
lookups during insertions.

These problems may all have the effect of replacing a deleted entry
with one that seems empty, which may disconnect double-hashing chains
involving that entry, and thus cause entries to go missing.

This patch detects such problems by recording a pending insertion and
checking that it's completed before other potentially-conflicting
operations.  The additional field is only introduced when checking is
enabled.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* hash-table.h (check_complete_insertion, check_insert_slot):
New hash_table methods.
(m_inserting_slot): New hash_table field.
(begin, hash_table ctors, ~hash_table): Check previous insert.
(expand, empty_slow, clear_slot, find_with_hash): Likewise.
(remove_elt_with_hash, traverse_noresize): Likewise.
(gt_pch_nx): Likewise.
(find_slot_with_hash): Likewise.  Record requested insert.
---
 gcc/hash-table.h |   63 +++---
 1 file changed, 60 insertions(+), 3 deletions(-)

diff --git a/gcc/hash-table.h b/gcc/hash-table.h
index f4bda6102323e..33753d04b7bdb 100644
--- a/gcc/hash-table.h
+++ b/gcc/hash-table.h
@@ -495,6 +495,7 @@ public:
 {
   if (Lazy && m_entries == NULL)
return iterator ();
+  check_complete_insertion ();
   iterator iter (m_entries, m_entries + m_size);
   iter.slide ();
   return iter;
@@ -551,8 +552,39 @@ private:
 Descriptor::mark_empty (v);
   }
 
+public:
+  void check_complete_insertion () const
+  {
+#if CHECKING_P
+if (!m_inserting_slot)
+  return;
+
+gcc_checking_assert (m_inserting_slot >= &m_entries[0]
+&& m_inserting_slot < &m_entries[m_size]);
+
+if (!is_empty (*m_inserting_slot))
+  m_inserting_slot = NULL;
+else
+  gcc_unreachable ();
+#endif
+  }
+
+private:
+  value_type *check_insert_slot (value_type *ret)
+  {
+#if CHECKING_P
+gcc_checking_assert (is_empty (*ret));
+m_inserting_slot = ret;
+#endif
+return ret;
+  }
+
+#if CHECKING_P
+  mutable value_type *m_inserting_slot;
+#endif
+
   /* Table itself.  */
-  typename Descriptor::value_type *m_entries;
+  value_type *m_entries;
 
   size_t m_size;
 
@@ -607,6 +639,9 @@ hash_table::hash_table (size_t size, bool ggc,
 ATTRIBUTE_UNUSED,
 mem_alloc_origin origin
 MEM_STAT_DECL) :
+#if CHECKING_P
+  m_inserting_slot (0),
+#endif
   m_n_elements (0), m_n_deleted (0), m_searches (0), m_collisions (0),
   m_ggc (ggc), m_sanitize_eq_and_hash (sanitize_eq_and_hash)
 #if GATHER_STATISTICS
@@ -639,6 +674,9 @@ hash_table::hash_table (const hash_table &h,
 ATTRIBUTE_UNUSED,
 mem_alloc_origin origin
 MEM_STAT_DECL) :
+#if CHECKING_P
+  m_inserting_slot (0),
+#endif
   m_n_elements (h.m_n_elements), m_n_deleted (h.m_n_deleted),
   m_searches (0), m_collisions (0), m_ggc (ggc),
   m_sanitize_eq_and_hash (sanitize_eq_and_hash)
@@ -646,6 +684,8 @@ hash_table::hash_table (const hash_table &h,
   , m_gather_mem_stats (gather_mem_stats)
 #endif
 {
+  h.check_complete_insertion ();
+
   size_t size = h.m_size;
 
   if (m_gather_mem_stats)
@@ -675,6 +715,8 @@ template class Allocator>
 hash_table::~hash_table ()
 {
+  check_complete_insertion ();
+
   if (!Lazy || m_entries)
 {
   for (size_t i = m_size - 1; i < m_size; i--)
@@ -778,6 +820,8 @@ template::expand ()
 {
+  check_complete_insertion ();
+
   value_type *oentries = m_entries;
   unsigned int oindex = m_size_prime_index;
   size_t osize = size ();
@@ -853,6 +897,8 @@ template::empty_slow ()
 {
+  check_complete_insertion ();
+
   size_t size = m_size;
   size_t nsize = size;
   value_type *entries = m_entries;
@@ -901,6 +947,8 @@ template::clear_slot (value_type *slot)
 {
+  check_complete_insertion ();
+
   gcc_checking_assert (!(slot < m_entries || slot >= m_entries + size ()
 || is_empty (*slot) || is_deleted (*slot)));
 
@@ -927,6 +975,8 @@ hash_table
   if (Lazy && m_entries == NULL)
 m_entries = alloc_entries (size);
 
+  check_complete_insertion ();
+
 #if CHECKING_P
   if (m_sanitize_eq_and_hash)
 verify (comparable, hash);
@@ -976,6 +1026,8 @@ hash_table
 }
   if (insert 

[16/17] check hash table counts at expand

2022-12-28 Thread Alexandre Oliva via Gcc-patches
On Dec 28, 2022, Martin Liška  wrote:

> I really like the verification code you added. It's quite similar to what
> I added to hash-table.h:

*nod*

Prompted by your encouragement, I've combined the element count
verification code with the verify() loop, and also added it to expand,
where it can be done cheaply.


Add cheap verification of element and deleted entry counts during
expand and hash verify.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* hash-table.h (expand): Check elements and deleted counts.
(verify): Likewise.
---
 gcc/hash-table.h |   35 ---
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/gcc/hash-table.h b/gcc/hash-table.h
index 53507daae26c3..f4bda6102323e 100644
--- a/gcc/hash-table.h
+++ b/gcc/hash-table.h
@@ -805,19 +805,28 @@ hash_table::expand ()
 hash_table_usage ().release_instance_overhead (this, sizeof (value_type)
* osize);
 
+  size_t n_deleted = m_n_deleted;
+
   m_entries = nentries;
   m_size = nsize;
   m_size_prime_index = nindex;
   m_n_elements -= m_n_deleted;
   m_n_deleted = 0;
 
+  size_t n_elements = m_n_elements;
+
   value_type *p = oentries;
   do
 {
   value_type &x = *p;
 
-  if (!is_empty (x) && !is_deleted (x))
+  if (is_empty (x))
+   ;
+  else if (is_deleted (x))
+   n_deleted--;
+  else
 {
+ n_elements--;
   value_type *q = find_empty_slot_for_expand (Descriptor::hash (x));
  new ((void*) q) value_type (std::move (x));
  /* After the resources of 'x' have been moved to a new object at 'q',
@@ -829,6 +838,8 @@ hash_table::expand ()
 }
   while (p < olimit);
 
+  gcc_checking_assert (!n_elements && !n_deleted);
+
   if (!m_ggc)
 Allocator  ::data_free (oentries);
   else
@@ -1018,8 +1029,9 @@ hash_table
   return &m_entries[index];
 }
 
-/* Verify that all existing elements in th hash table which are
-   equal to COMPARABLE have an equal HASH value provided as argument.  */
+/* Verify that all existing elements in the hash table which are
+   equal to COMPARABLE have an equal HASH value provided as argument.
+   Also check that the hash table element counts are correct.  */
 
 template class Allocator>
@@ -1027,14 +1039,23 @@ void
 hash_table
 ::verify (const compare_type &comparable, hashval_t hash)
 {
+  size_t n_elements = m_n_elements;
+  size_t n_deleted = m_n_deleted;
   for (size_t i = 0; i < MIN (hash_table_sanitize_eq_limit, m_size); i++)
 {
   value_type *entry = &m_entries[i];
-  if (!is_empty (*entry) && !is_deleted (*entry)
- && hash != Descriptor::hash (*entry)
- && Descriptor::equal (*entry, comparable))
-   hashtab_chk_error ();
+  if (!is_empty (*entry))
+   {
+ n_elements--;
+ if (is_deleted (*entry))
+   n_deleted--;
+ else if (hash != Descriptor::hash (*entry)
+  && Descriptor::equal (*entry, comparable))
+   hashtab_chk_error ();
+   }
 }
+  if (hash_table_sanitize_eq_limit >= m_size)
+gcc_checking_assert (!n_elements && !n_deleted);
 }
 
 /* This function deletes an element with the given COMPARABLE value


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[15/17] prevent hash set/map insertion of deleted entries

2022-12-28 Thread Alexandre Oliva via Gcc-patches
On Dec 27, 2022, David Malcolm  wrote:

> Would it make sense to also add assertions that such entries aren't
> Traits::is_deleted?  (both for hash_map and hash_set)

Yeah, I guess so.  I've come up with something for hash-table proper
too, coming up in 17/17.


Just like the recently-added checks for empty entries, add checks for
deleted entries as well.  This didn't catch any problems, but it might
prevent future accidents.  Suggested by David Malcolm.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* hash-map.h (put, get_or_insert): Check that added entry
doesn't look deleted either.
* hash-set.h (add): Likewise.
---
 gcc/hash-map.h |8 +---
 gcc/hash-set.h |3 ++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/hash-map.h b/gcc/hash-map.h
index 63fa21cf37c5b..e6ca9cf5e6429 100644
--- a/gcc/hash-map.h
+++ b/gcc/hash-map.h
@@ -173,8 +173,9 @@ public:
   if (ins)
{
  e->m_key = k;
- new ((void *) &e->m_value) Value (v);
- gcc_checking_assert (!Traits::is_empty (*e));
+ new ((void *)&e->m_value) Value (v);
+ gcc_checking_assert (!Traits::is_empty (*e)
+  && !Traits::is_deleted (*e));
}
   else
e->m_value = v;
@@ -204,7 +205,8 @@ public:
{
  e->m_key = k;
  new ((void *)&e->m_value) Value ();
- gcc_checking_assert (!Traits::is_empty (*e));
+ gcc_checking_assert (!Traits::is_empty (*e)
+  && !Traits::is_deleted (*e));
}
 
   if (existed != NULL)
diff --git a/gcc/hash-set.h b/gcc/hash-set.h
index a98121a060eed..08e1851d5118d 100644
--- a/gcc/hash-set.h
+++ b/gcc/hash-set.h
@@ -61,7 +61,8 @@ public:
{
  new (e) Key (k);
  // Catch attempts to insert e.g. a NULL pointer.
- gcc_checking_assert (!Traits::is_empty (*e));
+ gcc_checking_assert (!Traits::is_empty (*e)
+  && !Traits::is_deleted (*e));
}
 
   return existed;


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[14/17] parloops: don't request insert that won't be completed

2022-12-28 Thread Alexandre Oliva via Gcc-patches


In take_address_of, we may refrain from completing a decl_address
INSERT if gsi is NULL, so don't even ask for an INSERT in this case.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* tree-parloops.cc (take_address_of): Skip INSERT if !gsi.
---
 gcc/tree-parloops.cc |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-parloops.cc b/gcc/tree-parloops.cc
index e680d97dd0497..b829ba7b873b6 100644
--- a/gcc/tree-parloops.cc
+++ b/gcc/tree-parloops.cc
@@ -1221,8 +1221,11 @@ take_address_of (tree obj, tree type, edge entry,
   uid = DECL_UID (TREE_OPERAND (TREE_OPERAND (*var_p, 0), 0));
   int_tree_map elt;
   elt.uid = uid;
-  int_tree_map *slot = decl_address->find_slot (elt, INSERT);
-  if (!slot->to)
+  int_tree_map *slot = decl_address->find_slot (elt,
+   gsi == NULL
+   ? NO_INSERT
+   : INSERT);
+  if (!slot || !slot->to)
 {
   if (gsi == NULL)
return NULL;


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


RE: [x86 PATCH] Provide zero_extend versions/variants of several patterns.

2022-12-28 Thread Roger Sayle


Hi Uros,
Many thanks for your reviews.

> On Wed, Dec 28, 2022 at 2:15 AM Roger Sayle
>  wrote:
> >
> >
> > Back in September, the review of my patch for PR
> > rtl-optimization/106594,
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601501.html
> > suggested that I submit the x86 backend bits, independently and first.
> >
> > The executive summary is that the middle-end doesn't have a preferred
> > canonical form for expressing zero-extension, sometimes using an AND
> > and sometimes using zero_extend.  Pending changes to RTL
> > simplification will/may alter some of these representations, so a few
> > additional patterns are required to recognize these alternate
> > representations and avoid any testsuite regressions.
> >
> > As an example, *popcountsi2_zext is currently represented as:
> >   [(set (match_operand:DI 0 "register_operand" "=r")
> > (and:DI
> >   (subreg:DI
> > (popcount:SI
> >   (match_operand:SI 1 "nonimmediate_operand" "rm")) 0)
> >   (const_int 63)))
> >(clobber (reg:CC FLAGS_REG))]
> >
> > this patch adds an alternate/equivalent pattern that matches:
> >   [(set (match_operand:DI 0 "register_operand" "=r")
> >(zero_extend:DI
> >  (popcount:SI (match_operand:SI 1 "nonimmediate_operand" "rm"
> >(clobber (reg:CC FLAGS_REG))]
> >
> > Another example is *popcounthi2 which is currently represented as:
> >   [(set (match_operand:SI 0 "register_operand")
> > (popcount:SI
> >   (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand"
> >(clobber (reg:CC FLAGS_REG))]
> >
> > this patch adds an alternate/equivalent pattern that matches:
> >   [(set (match_operand:SI 0 "register_operand")
> > (zero_extend:SI
> >   (popcount:HI (match_operand:HI 1 "nonimmediate_operand"
> >(clobber (reg:CC FLAGS_REG))]
> >
> > The contents of the machine description definitions remain the same,
> > it's just the expected RTL is slightly different but equivalent.
> > Providing both forms makes the backend more robust to middle-end
> > changes [and possibly catches some missed optimizations].
> 
> It would be nice to have a canonical representation of zero-extended patterns,
> but this is what we have now. Unfortunately, a certain HW limitation requires
> several patterns for one insn, so the canonical representation is even more
> desirable here.  Hopefully, a "future"
> patch will allow us some cleanups in this area.
> 
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> 
> OK, but please split out HImode popcount&1 pattern to a separate patch to not
> mix separate topics in one patch.

The peephole2 is the exact same topic; it's not a new optimization or feature, 
but identical fall-out from the representational changes to ZERO_EXTEND.

The existing peephole2 (line 18305 of i386.md) matches this sequence:

   (parallel [(set (match_operand:HI 1 "register_operand")
   (popcount:HI
(match_operand:HI 2 "nonimmediate_operand")))
  (clobber (reg:CC FLAGS_REG))])
   (set (match_operand 3 "register_operand")
(zero_extend (match_dup 1)))
   (set (reg:CCZ FLAGS_REG)
(compare:CCZ (and:QI (match_operand:QI 4 "register_operand")
 (const_int 1))
 (const_int 0)))
   (set (pc) (if_then_else (match_operator 5 "bt_comparison_operator"
[(reg:CCZ FLAGS_REG)
 (const_int 0)])
   (label_ref (match_operand 6))
   (pc)))]

but with pending tweaks/changes to middle-end zero_extend handling, the
peephole2 pass now sees the equivalent (apologies for the renumbering):

   (parallel [(set (match_operand:HI 1 "register_operand")
  (popcount:HI
   (match_operand:HI 2 "nonimmediate_operand")))
 (clobber (reg:CC FLAGS_REG))])
   (set (reg:CCZ FLAGS_REG)
(compare:CCZ (and:QI (match_operand:QI 3 "register_operand")
(const_int 1))
(const_int 0)))
   (set (pc) (if_then_else (match_operator 4 "bt_comparison_operator"
   [(reg:CCZ FLAGS_REG)
(const_int 0)])
   (label_ref (match_operand 5))
   (pc)))]

Notice that the zero_extension has been eliminated, as match_dup 3
and match_dup 4 are the same register, and we only need to inspect
the lowest bit.  i.e. the current transform matches a redundant
instruction, that the RTL optimizers will soon be able to eliminate.

Hence, IMHO, the issue is identical: the RTL that the backend is expecting
is sensitive to the representations and transformations performed upstream.

All the new patterns appear in the MD file immediately after

Re: Re: [PATCH] RISC-V: Fix ICE of visiting non-existing block in CFG.

2022-12-28 Thread 钟居哲
Yeah, I agree with you that mixing FOR_EACH_BB and for (const bb_info *bb...
makes the pass look confusing.
But Jeff is happier if I use FOR_EACH_BB, so I sent a patch to change the
iterator form where the order doesn't matter.
In this patch, both FOR_EACH_BB and for (const bb_info *bb... are fine, so I
changed it as Jeff suggested.

However, in other places in this pass, for example the
compute_global_backward_infos function, I want to iterate over blocks in
reverse order, so I must use
"for (const bb_info *bb : crtl->ssa->reverse_bbs ())", which allows me to
propagate the information backward thoroughly
so that I can do the aggressive and fancy optimization.

Based on these situations, this pass will end up mixing FOR_EACH_BB and
for (const bb_info *bb..., which may make it a little bit confusing.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2022-12-28 19:47
To: Jeff Law via Gcc-patches
CC: juzhe.zhong; Jeff Law; kito.cheng\@gmail.com; palmer\@dabbelt.com
Subject: Re: [PATCH] RISC-V: Fix ICE of visiting non-existing block in CFG.
Jeff Law via Gcc-patches  writes:
> On 12/27/22 16:11, juzhe.zhong wrote:
>> You mean only change to this form you suggested in this patch? Since in 
>> all other places of this PASS,I use RTL_SSA framework to iterate 
>> instructions and blocks. I use RTL_SSA framework to iterate blocks here 
>> to make codes look more consistent even though they are same here.
> The FOR_EACH_BB is used far more widely than the C++ style found in 
> RTL-SSA so I'd slightly prefer that style.
 
I can see where you're coming from, but what the patch does is preferred
for RTL-SSA passes.  There is some additional information in
rtl_ssa::bb_info compared to the underlying basic_block, and even if
this particular loop doesn't use that information, IMO it would be
better to avoid mixing styles within a pass.
 
Also, the list that the patch iterates over is in reverse postorder,
whereas FOR_EACH_BB doesn't guarantee a particular order.  Again,
that might not be important here, but it seems better to stick to the
“native” RTL-SSA approach.
 
Thanks,
Richard
 


Re: [PATCH] RISC-V: Fix ICE of visiting non-existing block in CFG.

2022-12-28 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches  writes:
> On 12/27/22 16:11, juzhe.zhong wrote:
>> You mean only change to this form you suggested in this patch? Since in 
>> all other places of this PASS,I use RTL_SSA framework to iterate 
>> instructions and blocks. I use RTL_SSA framework to iterate blocks here 
>> to make codes look more consistent even though they are same here.
> The FOR_EACH_BB is used far more widely than the C++ style found in 
> RTL-SSA so I'd slightly prefer that style.

I can see where you're coming from, but what the patch does is preferred
for RTL-SSA passes.  There is some additional information in
rtl_ssa::bb_info compared to the underlying basic_block, and even if
this particular loop doesn't use that information, IMO it would be
better to avoid mixing styles within a pass.

Also, the list that the patch iterates over is in reverse postorder,
whereas FOR_EACH_BB doesn't guarantee a particular order.  Again,
that might not be important here, but it seems better to stick to the
“native” RTL-SSA approach.

Thanks,
Richard


RE: [PATCH v2] libgfortran: Replace mutex with rwlock

2022-12-28 Thread Zhu, Lipeng via Gcc-patches
> Hi Lipeng,

> > This patch tries to introduce the rwlock and splits the read/write to 
> > unit_root tree and unit_cache with rwlock instead of the mutex to 
> > increase CPU efficiency. In the get_gfc_unit function, the percentage 
> > to step into the insert_unit function is around 30%, in most 
> > instances, we can get the unit in the phase of reading the unit_cache 
> > or unit_root tree. So split the read/write phase by rwlock would be an 
> > approach to make it more parallel.
>
> No comment on the code itself, as yet... but I'd like to know how throroughly 
> you tested it, using which tools, and on which programs.
> Did you use valgrind --tool=helgrind or --tool=drd?  Since it is prone to 
> race conditions, did you also test Fortran's asynchronous I/O?
>
> Best regards
>
>   Thomas

Hi Thomas,

I didn't use valgrind; make check-fortran succeeded locally.
The tool/program I used is pts/neatbench
https://openbenchmarking.org/test/pts/neatbench

Best Regards,
Lipeng Zhu




[PATCH] libgfortran: Replace mutex with rwlock

2022-12-28 Thread Zhu, Lipeng via Gcc-patches
> libstdc++ implements shared mutex with pthread_rwlock, which can conflict
> with the pthread_rwlock usage in libgcc.  Lipeng, please limit the 
> pthread_rwlock usage in libgcc only when __cplusplus isn't defined.
> 
> 
> --
> H.J.

Thanks for the suggestion; I've sent a V2 patch. 

--
Lipeng Zhu


Re: [x86 PATCH] Provide zero_extend versions/variants of several patterns.

2022-12-28 Thread Uros Bizjak via Gcc-patches
On Wed, Dec 28, 2022 at 2:15 AM Roger Sayle  wrote:
>
>
> Back in September, the review of my patch for PR rtl-optimization/106594,
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601501.html
> suggested that I submit the x86 backend bits, independently and first.
>
> The executive summary is that the middle-end doesn't have a preferred
> canonical form for expressing zero-extension, sometimes using an AND
> and sometimes using zero_extend.  Pending changes to RTL simplification
> will/may alter some of these representations, so a few additional
> patterns are required to recognize these alternate representations
> and avoid any testsuite regressions.
>
> As an example, *popcountsi2_zext is currently represented as:
>   [(set (match_operand:DI 0 "register_operand" "=r")
> (and:DI
>   (subreg:DI
> (popcount:SI
>   (match_operand:SI 1 "nonimmediate_operand" "rm")) 0)
>   (const_int 63)))
>(clobber (reg:CC FLAGS_REG))]
>
> this patch adds an alternate/equivalent pattern that matches:
>   [(set (match_operand:DI 0 "register_operand" "=r")
>(zero_extend:DI
>  (popcount:SI (match_operand:SI 1 "nonimmediate_operand" "rm"
>(clobber (reg:CC FLAGS_REG))]
>
> Another example is *popcounthi2 which is currently represented as:
>   [(set (match_operand:SI 0 "register_operand")
> (popcount:SI
>   (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand"
>(clobber (reg:CC FLAGS_REG))]
>
> this patch adds an alternate/equivalent pattern that matches:
>   [(set (match_operand:SI 0 "register_operand")
> (zero_extend:SI
>   (popcount:HI (match_operand:HI 1 "nonimmediate_operand"
>(clobber (reg:CC FLAGS_REG))]
>
> The contents of the machine description definitions remain the same,
> it's just the expected RTL is slightly different but equivalent.
> Providing both forms makes the backend more robust to middle-end
> changes [and possibly catches some missed optimizations].

It would be nice to have a canonical representation of zero-extended
patterns, but this is what we have now. Unfortunately, a certain HW
limitation requires several patterns for one insn, so the canonical
representation is even more desirable here.  Hopefully, a "future"
patch will allow us some cleanups in this area.

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?

OK, but please split out HImode popcount&1 pattern to a separate patch
to not mix separate topics in one patch.

Thanks,
Uros.

>
>
> 2022-12-28  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.md (*clzsi2_lzcnt_zext_2): define_insn_and_split
> to match ZERO_EXTEND form of *clzsi2_lzcnt_zext.
> (*clzsi2_lzcnt_zext_2_falsedep): Likewise, new define_insn to match
> ZERO_EXTEND form of *clzsi2_lzcnt_zext_falsedep.
> (*bmi2_bzhi_zero_extendsidi_5): Likewise, new define_insn to match
> ZERO_EXTEND form of *bmi2_bzhi_zero_extendsidi.
> (*popcountsi2_zext_2): Likewise, new define_insn_and_split to match
> ZERO_EXTEND form of *popcountsi2_zext.
> (*popcountsi2_zext_2_falsedep): Likewise, new define_insn to match
> ZERO_EXTEND form of *popcountsi2_zext_falsedep.
> (*popcounthi2_2): Likewise, new define_insn_and_split to match
> ZERO_EXTEND form of *popcounthi2.
> (define_peephole2): ZERO_EXTEND variant of HImode popcount&1 using
> parity flag peephole2.
>
> Thanks in advance,
> Roger
> --
>


Re: [x86_64 PATCH] Add post-reload splitter for extendditi2.

2022-12-28 Thread Uros Bizjak via Gcc-patches
On Wed, Dec 28, 2022 at 1:32 AM Roger Sayle  wrote:
>
>
> This is another step towards a possible solution for PR 105137.
> This patch introduces a define_insn_and_split for extendditi2,
> that allows DImode to TImode sign-extension to be represented in
> the early RTL optimizers, before being split post-reload into
> the exact same idiom as currently produced by RTL expansion.

Please see extendsidi2_1 insn pattern and follow-up splitters and
peephole2 patterns that do exactly what you want to achieve, but they
are currently handling only SImode to DImode on 32-bit targets. OTOH,
these patterns handle several more cases (e.g. split to the memory
output) and just have to be macroized with DWIH mode iterator to also
handle DImode to TImode on 64-bit targets. Probably, an extendsidi
expander will have to be slightly adjusted when macroized to signal
middle end the availability of extendditi pattern.

Following macroization, any possible follow-up optimizations and
improvements will then be automatically applied also to 32-bit
targets.

Uros.

>
> Typically this produces the identical code, so the first new
> test case:
> __int128 foo(long long x) { return (__int128)x; }
>
> continues to generate:
> foo:movq%rdi, %rax
> cqto
> ret
>
> The "magic" is that this representation allows combine and the
> other RTL optimizers to do a better job.  Hence, the second
> test case:
>
> __int128 foo(__int128 a, long long b) {
> a += ((__int128)b) << 70;
> return a;
> }
>
> which mainline with -O2 currently generates as:
>
> foo:movq%rsi, %rax
> movq%rdx, %rcx
> movq%rdi, %rsi
> salq$6, %rcx
> movq%rax, %rdi
> xorl%eax, %eax
> movq%rcx, %rdx
> addq%rsi, %rax
> adcq%rdi, %rdx
> ret
>
> with this patch now becomes:
> foo:movl$0, %eax
> salq$6, %rdx
> addq%rdi, %rax
> adcq%rsi, %rdx
> ret
>
> i.e. the same code for the signed and unsigned extension variants.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
> 2022-12-28  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.md (extendditi2): New define_insn_and_split
> to split DImode to TImode sign-extension after reload.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/extendditi2-1.c: New test case.
> * gcc.target/i386/extendditi2-2.c: Likewise.
>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH v2] libgfortran: Replace mutex with rwlock

2022-12-28 Thread Thomas Koenig via Gcc-patches

Hi Lipeng,


This patch tries to introduce the rwlock and splits the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the percentage
to step into the insert_unit function is around 30%, in most instances,
we can get the unit in the phase of reading the unit_cache or unit_root
tree. So split the read/write phase by rwlock would be an approach to
make it more parallel.


No comment on the code itself, as yet... but I'd like to know how
throroughly you tested it, using which tools, and on which programs.
Did you use valgrind --tool=helgrind or --tool=drd?  Since it is
prone to race conditions, did you also test Fortran's asynchronous I/O?

Best regards

Thomas


Re: [PATCH] contrib: add contrib to update-copyright.py script

2022-12-28 Thread Martin Liška
On 12/14/22 14:13, Jakub Jelinek wrote:
> s/Filder/Filter/g ?

Yep. Pushed with the fix as there haven't been any comments on this.

Martin


Re: [PATCH] IPA: do not release body if still needed

2022-12-28 Thread Martin Liška
PING^2

On 12/9/22 09:28, Martin Liška wrote:
> PING^1
> 
> On 12/1/22 10:59, Martin Liška wrote:
>> Hi.
>>
>> Noticed during building of libbackend.a with the LTO partial linking.
>>
>> The function release_body is called even if clone_of is a clone
>> of a another function and thus it shares tree declaration. We should
>> preserve it in that situation.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
>> Thanks,
>> Martin
>>
>>  PR ipa/107944
>>
>> gcc/ChangeLog:
>>
>>  * cgraph.cc (cgraph_node::remove): Do not release body
>>  if a node is clone of another node.
>> ---
>>  gcc/cgraph.cc | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
>> index f15cb47c8b8..2e7d77ffd6c 100644
>> --- a/gcc/cgraph.cc
>> +++ b/gcc/cgraph.cc
>> @@ -1893,7 +1893,7 @@ cgraph_node::remove (void)
>>else if (clone_of)
>>  {
>>clone_of->clones = next_sibling_clone;
>> -  if (!clone_of->analyzed && !clone_of->clones && !clones)
>> +  if (!clone_of->analyzed && !clone_of->clones && !clones && 
>> !clone_of->clone_of)
>>  clone_of->release_body ();
>>  }
>>if (next_sibling_clone)
> 



[PATCH] docs: fix Var documentation for .opt files

2022-12-28 Thread Martin Liška
The Var documentation was somehow wrongly split into 2 pieces.

PR middle-end/107966

gcc/ChangeLog:

* doc/options.texi: Fix Var documentation in internal manual.
---
 gcc/doc/options.texi | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/gcc/doc/options.texi b/gcc/doc/options.texi
index 17ba923890e..39a2d1a0d43 100644
--- a/gcc/doc/options.texi
+++ b/gcc/doc/options.texi
@@ -340,19 +340,6 @@ The state of this option should be stored in variable 
@var{var}
 (actually a macro for @code{global_options.x_@var{var}}).
 The way that the state is stored depends on the type of option:
 
-@item WarnRemoved
-The option is removed and every usage of such option will
-result in a warning.  We use it option backward compatibility.
-
-@item Var(@var{var}, @var{set})
-The option controls an integer variable @var{var} and is active when
-@var{var} equals @var{set}.  The option parser will set @var{var} to
-@var{set} when the positive form of the option is used and @code{!@var{set}}
-when the ``no-'' form is used.
-
-@var{var} is declared in the same way as for the single-argument form
-described above.
-
 @itemize @bullet
 @item
 If the option uses the @code{Mask} or @code{InverseMask} properties,
@@ -390,11 +377,24 @@ and wasn't given.
 The option-processing script will usually zero-initialize @var{var}.
 You can modify this behavior using @code{Init}.
 
+@item Var(@var{var}, @var{set})
+The option controls an integer variable @var{var} and is active when
+@var{var} equals @var{set}.  The option parser will set @var{var} to
+@var{set} when the positive form of the option is used and @code{!@var{set}}
+when the ``no-'' form is used.
+
+@var{var} is declared in the same way as for the single-argument form
+described above.
+
 @item Init(@var{value})
 The variable specified by the @code{Var} property should be statically
 initialized to @var{value}.  If more than one option using the same
 variable specifies @code{Init}, all must specify the same initializer.
 
+@item WarnRemoved
+The option is removed and every usage of such option will
+result in a warning.  We use it option backward compatibility.
+
 @item Mask(@var{name})
 The option is associated with a bit in the @code{target_flags}
 variable (@pxref{Run-time Target}) and is active when that bit is set.
-- 
2.39.0



[PATCH v2] libgfortran: Replace mutex with rwlock

2022-12-28 Thread Lipeng Zhu via Gcc-patches
This patch tries to introduce the rwlock and splits the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the percentage
to step into the insert_unit function is around 30%, in most instances,
we can get the unit in the phase of reading the unit_cache or unit_root
tree. So split the read/write phase by rwlock would be an approach to
make it more parallel.

BTW, the IPC metrics can gain around 9x in our test
server with 220 cores. The benchmark we used is
https://github.com/rwesson/NEAT

libgcc/ChangeLog:

* gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro
(__gthrw): New function
(__gthread_rwlock_rdlock): New function
(__gthread_rwlock_tryrdlock): New function
(__gthread_rwlock_wrlock): New function
(__gthread_rwlock_trywrlock): New function
(__gthread_rwlock_unlock): New function

libgfortran/ChangeLog:

* io/async.c (DEBUG_LINE): New
* io/async.h (RWLOCK_DEBUG_ADD): New macro
(CHECK_RDLOCK): New macro
(CHECK_WRLOCK): New macro
(TAIL_RWLOCK_DEBUG_QUEUE): New macro
(IN_RWLOCK_DEBUG_QUEUE): New macro
(RDLOCK): New macro
(WRLOCK): New macro
(RWUNLOCK): New macro
(RD_TO_WRLOCK): New macro
(INTERN_RDLOCK): New macro
(INTERN_WRLOCK): New macro
(INTERN_RWUNLOCK): New macro
* io/io.h (internal_proto): Define unit_rwlock
* io/transfer.c (st_read_done_worker): Replace unit_lock with unit_rwlock
(st_write_done_worker): Replace unit_lock with unit_rwlock
* io/unit.c (get_gfc_unit): Replace unit_lock with unit_rwlock
(if): Replace unit_lock with unit_rwlock
(close_unit_1): Replace unit_lock with unit_rwlock
(close_units): Replace unit_lock with unit_rwlock
(newunit_alloc): Replace unit_lock with unit_rwlock
* io/unix.c (flush_all_units): Replace unit_lock with unit_rwlock

---
v1 -> v2:
Limit the pthread_rwlock usage in libgcc only when __cplusplus isn't defined.

Signed-off-by: Lipeng Zhu 
---
 libgcc/gthr-posix.h   |  60 +++
 libgfortran/io/async.c|   4 +
 libgfortran/io/async.h| 151 ++
 libgfortran/io/io.h   |  15 ++--
 libgfortran/io/transfer.c |   8 +-
 libgfortran/io/unit.c |  65 
 libgfortran/io/unix.c |  16 ++--
 7 files changed, 273 insertions(+), 46 deletions(-)

diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h
index f1a5ab8e075..cbe7cba20d0 100644
--- a/libgcc/gthr-posix.h
+++ b/libgcc/gthr-posix.h
@@ -48,6 +48,9 @@ typedef pthread_t __gthread_t;
 typedef pthread_key_t __gthread_key_t;
 typedef pthread_once_t __gthread_once_t;
 typedef pthread_mutex_t __gthread_mutex_t;
+#ifndef __cplusplus
+typedef pthread_rwlock_t __gthread_rwlock_t;
+#endif
 typedef pthread_mutex_t __gthread_recursive_mutex_t;
 typedef pthread_cond_t __gthread_cond_t;
 typedef struct timespec __gthread_time_t;
@@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t;
 
 #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER
 #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function
+#ifndef __cplusplus
+#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER
+#endif
 #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT
 #if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER)
 #define __GTHREAD_RECURSIVE_MUTEX_INIT PTHREAD_RECURSIVE_MUTEX_INITIALIZER
@@ -135,6 +141,13 @@ __gthrw(pthread_mutexattr_init)
 __gthrw(pthread_mutexattr_settype)
 __gthrw(pthread_mutexattr_destroy)
 
+#ifndef __cplusplus
+__gthrw(pthread_rwlock_rdlock)
+__gthrw(pthread_rwlock_tryrdlock)
+__gthrw(pthread_rwlock_wrlock)
+__gthrw(pthread_rwlock_trywrlock)
+__gthrw(pthread_rwlock_unlock)
+#endif
 
 #if defined(_LIBOBJC) || defined(_LIBOBJC_WEAK)
 /* Objective-C.  */
@@ -885,6 +898,53 @@ __gthread_cond_destroy (__gthread_cond_t* __cond)
   return __gthrw_(pthread_cond_destroy) (__cond);
 }
 
+#ifndef __cplusplus
+static inline int
+__gthread_rwlock_rdlock (__gthread_rwlock_t *__rwlock)
+{
+  if (__gthread_active_p ())
+    return __gthrw_(pthread_rwlock_rdlock) (__rwlock);
+  else
+    return 0;
+}
+
+static inline int
+__gthread_rwlock_tryrdlock (__gthread_rwlock_t *__rwlock)
+{
+  if (__gthread_active_p ())
+    return __gthrw_(pthread_rwlock_tryrdlock) (__rwlock);
+  else
+    return 0;
+}
+
+static inline int
+__gthread_rwlock_wrlock (__gthread_rwlock_t *__rwlock)
+{
+  if (__gthread_active_p ())
+    return __gthrw_(pthread_rwlock_wrlock) (__rwlock);
+  else
+    return 0;
+}
+
+static inline int
+__gthread_rwlock_trywrlock (__gthread_rwlock_t *__rwlock)
+{
+  if (__gthread_active_p ())
+    return __gthrw_(pthread_rwlock_trywrlock) (__rwlock);
+  else
+    return 0;
+}
+
+static inline int
+__gthread_rwlock_unlock (__gthread_rwlock_t *__rwlock)
+{
+  if (__gthread_active_p ())
+    return __gthrw_(pthread_rwlock_unlock) (__rwlock);
+  else
+    return 0;
+}
+#endif
+
 #endif /* _LIBOBJC */
 
 #endif /* ! GCC_GTHR_POSIX_H */
diff --git a/libgfortran/io/async.c b/libgfortran/io/async.c
in

Re: [x86 PATCH] Use ix86_expand_clear in ix86_split_ashl.

2022-12-28 Thread Uros Bizjak via Gcc-patches
On Wed, Dec 28, 2022 at 1:06 AM Roger Sayle  wrote:
>
> This patch is a one line change, to call ix86_expand_clear instead of
> emit_move_insn with const0_rtx in ix86_split_ashl, allowing the backend
> to use an xor instruction to clear a register if appropriate.
>
> The effect is demonstrated with the following function.
> __int128 foo(__int128 x, unsigned long long b) {
> return ((__int128)b << 72) + x;
> }
>
> previously with -O2, GCC would generate
>
> foo:    movl    $0, %eax
>         salq    $8, %rdx
>         addq    %rdi, %rax
>         adcq    %rsi, %rdx
>         ret
>
> with this patch, it now generates
>
> foo:    xorl    %eax, %eax
>         salq    $8, %rdx
>         addq    %rdi, %rax
>         adcq    %rsi, %rdx
>         ret
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  It's an independent piece of a larger bug fix.
> Ok for mainline?
>
>
> 2022-12-28  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-expand.cc (ix86_split_ashl): Call
> ix86_expand_clear to generate an xor instruction.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/ashlti3-1.c: New test case.

OK.

Thanks,
Uros.

>
>
> Thanks in advance,
> Roger
> --
>


Re: [00/13] check hash table counts

2022-12-28 Thread Martin Liška
On 12/27/22 05:07, Alexandre Oliva via Gcc-patches wrote:
> 
> While looking into another issue that corrupted hash tables, I added
> code to check consistency of element counts, and hit numerous issues
> that were concerning, of three kinds: insertion of entries that seem
> empty, dangling insertions, and lookups during insertions.
> 
> These problems may all have the effect of replacing a deleted entry
> with one that seems empty, which may disconnect double-hashing chains
> involving that entry, and thus cause entries to go missing.
> 
> This patch, opening the patch series, only adds code to detect these
> problems to the hash table verifier method.  I'm not sure it makes
> sense to put it in, but I post it for the record.  Fixes and cheaper
> detectors for some of these errors are going to be posted separately.
> 
> The number of bugs it revealed tells me that leaving callers in charge
> of completing insertions is too error prone.  I found this
> verification code a bit too expensive to enable in general.

Hello.

I really like the verification code you added. It's quite similar to what
I added to hash-table.h:

void
hash_table<Descriptor, Lazy, Allocator>
::verify (const compare_type &comparable, hashval_t hash)

where the check is also expensive, but I guard it with a param value:

hash-table-verification-limit
The number of elements for which hash table verification is done for each 
searched element.

You can utilize the parameter or come up with your own.

Cheers,
Martin

>  We could
> add entry count checks cheaply at expand time and catch some of these
> at very low cost, which would likely catch at least some of the
> errors, but perhaps a safer interface could avoid them entirely.
> WDYT?
> 
> 
> I'm going to post a collection of 13 patches a as a followup to this
> one, fixing various problems it has detected, or adding code to catch
> some of these problems sooner.
> 
> 
> for  gcc/ChangeLog
> 
>   * hash-table.h (verify): Check elements and deleted counts.
> ---
>  gcc/hash-table.h |   17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/gcc/hash-table.h b/gcc/hash-table.h
> index 53507daae26c3..7dbeea05373a2 100644
> --- a/gcc/hash-table.h
> +++ b/gcc/hash-table.h
> @@ -1035,6 +1035,23 @@ hash_table
>           && Descriptor::equal (*entry, comparable))
>         hashtab_chk_error ();
>     }
> +
> +  if (m_size < 2048)
> +    {
> +      size_t elmts = m_n_elements, dels = m_n_deleted;
> +      for (size_t i = 0; i < m_size; i++)
> +        {
> +          value_type *entry = &m_entries[i];
> +          if (!is_empty (*entry))
> +            {
> +              elmts--;
> +              if (is_deleted (*entry))
> +                dels--;
> +            }
> +        }
> +      if (elmts || dels)
> +        hashtab_chk_error ();
> +    }
>  }
>  
>  /* This function deletes an element with the given COMPARABLE value
> 
> 



[PATCH] c: check if target_clone attrs are all string

2022-12-28 Thread Martin Liška
Hi.

The patch checks all attribute arguments if they are string. If not,
an error message is emitted.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

PR c/107993

gcc/c-family/ChangeLog:

* c-attribs.cc (handle_target_clones_attribute): Check for
string constant for all target_clone attribute values.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr107993.c: New test.
---
 gcc/c-family/c-attribs.cc| 14 ++
 gcc/testsuite/gcc.target/i386/pr107993.c |  9 +
 2 files changed, 19 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107993.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index b36dd97802b..33d84cb6e07 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -5574,12 +5574,18 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
   /* Ensure we have a function type.  */
   if (TREE_CODE (*node) == FUNCTION_DECL)
     {
-      if (TREE_CODE (TREE_VALUE (args)) != STRING_CST)
+      for (tree t = args; t != NULL_TREE; t = TREE_CHAIN (t))
        {
-         error ("%qE attribute argument not a string constant", name);
-         *no_add_attrs = true;
+         tree value = TREE_VALUE (t);
+         if (TREE_CODE (value) != STRING_CST)
+           {
+             error ("%qE attribute argument not a string constant", name);
+             *no_add_attrs = true;
+             return NULL_TREE;
+           }
        }
-      else if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
+
+      if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
        {
          warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
                   "with %qs attribute", name, "always_inline");
diff --git a/gcc/testsuite/gcc.target/i386/pr107993.c b/gcc/testsuite/gcc.target/i386/pr107993.c
new file mode 100644
index 000..b0b84a677d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr107993.c
@@ -0,0 +1,9 @@
+/* PR c/107993 */
+/* { dg-do compile } */
+
+typedef union { int x; } u;
+__attribute__((target_clones("arch=alderlake",!"default")))
+int f (u *x)
+{ /* { dg-error ".target_clones. attribute argument not a string constant" } */
+  return 0;
+}
-- 
2.39.0