Re: [PATCH 3/4 v3] ivopts: Consider cost_step on different forms during unrolling

2020-09-02 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2020/9/2 6:25 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Sep 02, 2020 at 11:16:00AM +0800, Kewen.Lin wrote:
>> on 2020/9/1 3:41 AM, Segher Boessenkool wrote:
>>> On Tue, Aug 25, 2020 at 08:46:55PM +0800, Kewen.Lin wrote:
>>>> 1) Currently address_cost hook on rs6000 always return zero, but at least
>>>> from Power7, pre_inc/pre_dec kind instructions are cracked, it means we
>>>> have to take the address update into account (scalar normal operation).
>>>
>>> From Power4 on already (not sure about Power6, but does anyone care?)
>>
>> Thanks for the information, it looks like this issue has existed for a long time.
> 
> Well, *is* it an issue?  The addressing doesn't get more expensive...
> For example, an
>   ldu 3,16(4)
> is cracked to an
>   ld 3,16(4)
> and an
>   addi 4,4,16
> (the addi is not on the critical path of the load).  So it seems to me
> this shouldn't increase the addressing cost at all?  (The instruction of
> course is really two insns in one.)
> 

Good question!  I agree that they can execute in parallel, but it depends
on how we interpret the addressing cost.  If it models the required
execution resources, I think zero is off, since compared with ld, the ldu
has two iops and an extra ALU requirement.  I'm not sure how it is used
elsewhere, but in the context of IVOPTs on Power, a normal candidate has
step cost 4 and the cost for group (1) is zero, so the total cost is 4 for
this combination, for a scenario like:
ldx rx, iv // (1)
...
iv = iv + step // (2)

While for an ainc_use candidate (like ldu), its step cost is 4, but the cost
for group (1) is -4 (minus the step cost), so the total cost is 0.  That
amounts to saying the step update is free.

We can see that (1) and (2) can also execute in parallel (same iteration).
If we consider the next iteration, there is a dependency, but it's
the same for ldu.  So basically they are similar, and I think it's unfair to
have this difference in the cost modeling.  The cracked addi should have
its cost here.  Does it make sense?
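
To make the arithmetic concrete, here is a minimal sketch of the two combinations described above.  The struct and the names are hypothetical and only mirror the numbers quoted in this mail; this is not the actual IVOPTs data structure.

```cpp
// Hypothetical model of one candidate/use-group combination (not GCC's types).
struct Combination {
  int step_cost;   // cost of the "iv = iv + step" update
  int group_cost;  // cost assigned to the address use group, (1) above
};

// Total cost as it is summed in this discussion: step cost plus group cost.
constexpr int total_cost (Combination c)
{
  return c.step_cost + c.group_cost;
}

// Numbers quoted above for Power.
constexpr Combination normal_cand = { 4, 0 };   // ldx plus a separate add
constexpr Combination ainc_cand = { 4, -4 };    // ldu, update form
```

With these inputs total_cost gives 4 for the normal candidate and 0 for the ainc_use candidate, which is the gap in question.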

Apart from that, one P9-specific point is that the update form load isn't
preferred.  The reason is that the instruction cannot retire until both
parts complete, so it can hold up subsequent instructions from retiring.
If the addi stalls (starvation), the instruction cannot retire and
things can get stuck.  It seems this is also something we can model here?

BR,
Kewen


Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-09-02 Thread Hongtao Liu via Gcc-patches
On Wed, Sep 2, 2020 at 5:58 PM Jakub Jelinek  wrote:
>
> On Wed, Sep 02, 2020 at 09:57:08AM +0800, Hongtao Liu via Gcc-patches wrote:
> > +
> > +  first = XVECEXP (constant, 0, 0);
> > +  /* There could be some rtx like
> > +  (mem/u/c:V16QI (symbol_ref/u:DI ("*.LC1")))
> > +  but with "*.LC1" refer to V2DI constant vector.  */
> > +  if (GET_MODE (constant) != mode)
> > + {
> > +   constant = simplify_subreg (mode, constant, GET_MODE (constant), 0);
> > +   if (constant == NULL_RTX || GET_CODE (constant) != CONST_VECTOR)
> > + return;
> > + }
>
> The
>   first = XVECEXP (constant, 0, 0);
> line needs to be after this if, not before it, otherwise it will miscompile
> things or just ICE.
>

Changed.

> > @@ -2197,6 +2272,10 @@ remove_partial_avx_dependency (void)
> > if (!NONDEBUG_INSN_P (insn))
> >   continue;
> >
> > +   /* Hanlde AVX512 embedded broadcast here to save compile time.  */
>
> s/Hanlde/Handle/
>

Changed, sorry for the typo.

> > +  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
> > +{
> > +  if (!INSN_P (insn))
> > + continue;
> > +  replace_constant_pool_with_broadcast (insn);
> > +}
>
> Perhaps instead do:
>   for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
> if (INSN_P (insn))
>   replace_constant_pool_with_broadcast (insn);
> ?
>

Changed.

> > +  /* opt_pass methods: */
> > +  virtual bool gate (function *)
> > +{
> > +  /* Return false if rpad pass gate is true.
> > +  replace_constant_pool_with_broadcast is called
> > +  from both this pass and rpad pass.  */
> > +  return (TARGET_AVX512F
> > +   && !(TARGET_AVX
> > +&& TARGET_SSE_PARTIAL_REG_DEPENDENCY
> > +&& TARGET_SSE_MATH
> > +&& optimize
> > +&& optimize_function_for_speed_p (cfun)));
>
> I think this could be a maintenance nightmare.
> Perhaps instead add
>

Yes, a common interface should be added as below; changed.

> static bool
> remove_partial_avx_dependency_gate ()
> {
>   return (TARGET_AVX
>   && TARGET_SSE_PARTIAL_REG_DEPENDENCY
>   && TARGET_SSE_MATH
>   && optimize
>   && optimize_function_for_speed_p (cfun));
> }
> after the remove_partial_avx_dependency function definition,
> change pass_remove_partial_avx_dependency gate body to
>   return remove_partial_avx_dependency_gate ();
> and in pass_constant_pool_broadcast::gate do
>   return (TARGET_AVX512F && !remove_partial_avx_dependency_gate ());
> (with the comment you have there)?
>
> LGTM with those changes.
>
> Jakub
>

Thanks for the review; updated patch attached.
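
For readers following along, the shape of the shared gate is roughly the following.  This is only a sketch: TargetFlags and its fields are hypothetical stand-ins for the TARGET_* macros and optimize checks, and the functions take the flags as a parameter so the example is self-contained.

```cpp
// Hypothetical stand-ins for the target flags; in GCC these are macros.
struct TargetFlags {
  bool avx;
  bool avx512f;
  bool sse_partial_reg_dependency;
  bool sse_math;
  bool optimize;
  bool optimize_for_speed;
};

// Single place that states when the rpad pass runs.
bool remove_partial_avx_dependency_gate (const TargetFlags &f)
{
  return f.avx
	 && f.sse_partial_reg_dependency
	 && f.sse_math
	 && f.optimize
	 && f.optimize_for_speed;
}

// The broadcast pass runs on its own only when rpad does not run,
// because rpad already calls replace_constant_pool_with_broadcast.
bool constant_pool_broadcast_gate (const TargetFlags &f)
{
  return f.avx512f && !remove_partial_avx_dependency_gate (f);
}
```

Keeping the condition in one function means the two gates cannot drift apart when the rpad condition changes.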

-- 
BR,
Hongtao
From acf3825279190ca0540bb4704f66568fdbe06ce8 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Wed, 8 Jul 2020 17:14:36 +0800
Subject: [PATCH] Optimize memory broadcast for constant vector under AVX512.

For a constant vector whose elements are all the same value, there's no
need to put the whole vector in the constant pool; use an embedded
broadcast instead.

2020-07-09  Hongtao Liu  

gcc/ChangeLog:

	PR target/87767
	* config/i386/i386-features.c
	(replace_constant_pool_with_broadcast): New function.
	(constant_pool_broadcast): Ditto.
	(class pass_constant_pool_broadcast): New pass.
	(make_pass_constant_pool_broadcast): Ditto.
	(remove_partial_avx_dependency): Call
	replace_constant_pool_with_broadcast under TARGET_AVX512F, it
	would save compile time when both pass rpad and cpb are
	available.
	(remove_partial_avx_dependency_gate): New function.
	(class pass_remove_partial_avx_dependency::gate): Call
	remove_partial_avx_dependency_gate.
	* config/i386/i386-passes.def: Insert new pass after combine.
	* config/i386/i386-protos.h
	(make_pass_constant_pool_broadcast): Declare.
	* config/i386/sse.md (*avx512dq_mul3_bcst):
	New define_insn.
	(*avx512f_mul3_bcst): Ditto.
	* config/i386/avx512fintrin.h (_mm512_set1_ps,
	_mm512_set1_pd,_mm512_set1_epi32, _mm512_set1_epi64): Adjusted.

gcc/testsuite/ChangeLog:

	PR target/87767
	* gcc.target/i386/avx2-broadcast-pr87767-1.c: New test.
	* gcc.target/i386/avx512f-broadcast-pr87767-1.c: New test.
	* gcc.target/i386/avx512f-broadcast-pr87767-2.c: New test.
	* gcc.target/i386/avx512f-broadcast-pr87767-3.c: New test.
	* gcc.target/i386/avx512f-broadcast-pr87767-4.c: New test.
	* gcc.target/i386/avx512f-broadcast-pr87767-5.c: New test.
	* gcc.target/i386/avx512f-broadcast-pr87767-6.c: New test.
	* gcc.target/i386/avx512f-broadcast-pr87767-7.c: New test.
	* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: New test.
	* gcc.target/i386/avx512vl-broadcast-pr87767-2.c: New test.
	* gcc.target/i386/avx512vl-broadcast-pr87767-3.c: New test.
	* gcc.target/i386/avx512vl-broadcast-pr87767-4.c: New test.
	* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: New test.
	* gcc.target/i386/avx512vl-broadcast-pr87767-6.c: New test.
---
 gcc/config/i386/avx512fintrin.h   |  

[PATCH v2] c: Silently ignore pragma region [PR85487]

2020-09-02 Thread Austin Morton via Gcc-patches
#pragma region is a feature introduced by Microsoft in order to allow
manual grouping and folding of code within Visual Studio.  It is
entirely ignored by the compiler.  Clang has supported this feature
since 2012 when in MSVC compatibility mode, and enabled it across the
board in 2018.

As it stands, you cannot use #pragma region within GCC without
disabling unknown pragma warnings, which is not advisable.

I propose GCC adopt "#pragma region" and "#pragma endregion" in order
to alleviate these issues.  Because the pragma has no purpose at
compile time, the implementation is trivial.
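
As a small illustration of the intended use (the function and region name here are made up for the example), the pragmas simply bracket a block of code and any trailing text acts as a label for the editor:

```cpp
#pragma region Geometry helpers
// Editors that understand the pragma can fold everything up to endregion.
int area (int width, int height)
{
  return width * height;
}
#pragma endregion Geometry helpers
```

The generated code is the same as if the two pragma lines were not there.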


Microsoft Documentation on the feature:
https://docs.microsoft.com/en-us/cpp/preprocessor/region-endregion

LLVM change which enabled pragma region across the board:
https://reviews.llvm.org/D42248
---
 gcc/ChangeLog|  5 +
 gcc/c-family/ChangeLog   |  5 +
 gcc/c-family/c-pragma.c  | 10 ++
 gcc/doc/cpp.texi |  6 ++
 gcc/testsuite/ChangeLog  |  5 +
 gcc/testsuite/gcc.dg/pragma-region.c | 21 +
 6 files changed, 52 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pragma-region.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9db853dcd..d0ba77b55 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2020-09-02  Austin Morton  
+
+ PR c/85487
+ * doc/cpp.texi (Pragmas): Document pragma region/endregion
+
 2020-08-26  Göran Uddeborg  

  PR gcov-profile/96285
diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
index 1eaa99f31..ccf06095f 100644
--- a/gcc/c-family/ChangeLog
+++ b/gcc/c-family/ChangeLog
@@ -1,3 +1,8 @@
+2020-09-02  Austin Morton  
+
+ PR c/85487
+ * c-pragma.c (handle_pragma_region): Declare.
+
 2020-08-11  Jakub Jelinek  

  PR c/96545
diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index e3169e68f..de0411d07 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -1166,6 +1166,13 @@ handle_pragma_message (cpp_reader *ARG_UNUSED(dummy))
 TREE_STRING_POINTER (message));
 }

+/* Silently ignore region pragmas.  */
+
+static void
+handle_pragma_region (cpp_reader *ARG_UNUSED(dummy))
+{
+}
+
 /* Mark whether the current location is valid for a STDC pragma.  */

 static bool valid_location_for_stdc_pragma;
@@ -1584,6 +1591,9 @@ init_pragma (void)

   c_register_pragma_with_expansion (0, "message", handle_pragma_message);

+  c_register_pragma (0, "region", handle_pragma_region);
+  c_register_pragma (0, "endregion", handle_pragma_region);
+
 #ifdef REGISTER_TARGET_PRAGMAS
   REGISTER_TARGET_PRAGMAS ();
 #endif
diff --git a/gcc/doc/cpp.texi b/gcc/doc/cpp.texi
index 33f876ab7..c868ed695 100644
--- a/gcc/doc/cpp.texi
+++ b/gcc/doc/cpp.texi
@@ -3789,6 +3789,12 @@ file will never be read again, no matter what.  It is a less-portable
 alternative to using @samp{#ifndef} to guard the contents of header files
 against multiple inclusions.

+@item #pragma region
+@itemx #pragma endregion
+These pragmas are silently ignored by the compiler.  Any trailing text is
+also silently ignored.  They exist only to facilitate code organization
+and folding in supported editors.
+
 @end ftable

 @node Other Directives
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 5c1a45716..d90067555 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2020-09-02  Austin Morton  
+
+ PR c/85487
+ * gcc.dg/pragma-region.c: New test.
+
 2020-08-26  Jeff Law  

  * gcc.target/i386/387-7.c: Add dg-require-effective-target c99_runtime.
diff --git a/gcc/testsuite/gcc.dg/pragma-region.c b/gcc/testsuite/gcc.dg/pragma-region.c
new file mode 100644
index 0..72cc2c144
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pragma-region.c
@@ -0,0 +1,21 @@
+/* Verify #pragma region and #pragma endregion do not emit warnings.  */
+
+/* { dg-options "-Wunknown-pragmas" } */
+
+#pragma region
+
+#pragma region name
+
+#pragma region "name"
+
+#pragma region()
+
+#pragma region("name")
+
+#pragma endregion
+
+#pragma endregion garbage
+
+#pragma endregion()
+
+#pragma endregion("garbage")
-- 
2.17.1


[PATCH] Add C++2a synchronization support

2020-09-02 Thread Thomas Rodgers
Note - ignore previous version of this patch, didn't filter out
Makefile.in

Adds support for -
atomic wait/notify_one/notify_all
counting_semaphore
binary_semaphore
latch

* include/Makefile.am (bits_headers): Add new header.
* include/Makefile.in: Regenerate.
* include/bits/atomic_base.h (__atomic_base<_Itp>::wait): Define.
(__atomic_base<_Itp>::notify_one): Likewise.
(__atomic_base<_Itp>::notify_all): Likewise.
(__atomic_base<_Ptp*>::wait): Likewise.
(__atomic_base<_Ptp*>::notify_one): Likewise.
(__atomic_base<_Ptp*>::notify_all): Likewise.
(__atomic_impl::wait): Likewise.
(__atomic_impl::notify_one): Likewise.
(__atomic_impl::notify_all): Likewise.
(__atomic_float<_Fp>::wait): Likewise.
(__atomic_float<_Fp>::notify_one): Likewise.
(__atomic_float<_Fp>::notify_all): Likewise.
(__atomic_ref<_Tp>::wait): Likewise.
(__atomic_ref<_Tp>::notify_one): Likewise.
(__atomic_ref<_Tp>::notify_all): Likewise.
(atomic_wait<_Tp>): Likewise.
(atomic_wait_explicit<_Tp>): Likewise.
(atomic_notify_one<_Tp>): Likewise.
(atomic_notify_all<_Tp>): Likewise.
* include/bits/atomic_wait.h: New file.
* include/bits/atomic_timed_wait.h: New file.
* include/bits/semaphore_base.h: New file.
* include/std/atomic (atomic::wait): Define.
(atomic::notify_one): Likewise.
(atomic::notify_all): Likewise.
(atomic<_Tp>::wait): Likewise.
(atomic<_Tp>::notify_one): Likewise.
(atomic<_Tp>::notify_all): Likewise.
(atomic<_Tp*>::wait): Likewise.
(atomic<_Tp*>::notify_one): Likewise.
(atomic<_Tp*>::notify_all): Likewise.
* include/std/latch: New file.
* include/std/semaphore: New file.
* include/std/version: Add __cpp_lib_semaphore and
__cpp_lib_latch defines.
* testsuite/29_atomics/atomic/wait_notify/atomic_refs.cc: New test.
* testsuite/29_atomics/atomic/wait_notify/bool.cc: Likewise.
* testsuite/29_atomics/atomic/wait_notify/integrals.cc: Likewise.
* testsuite/29_atomics/atomic/wait_notify/floats.cc: Likewise.
* testsuite/29_atomics/atomic/wait_notify/pointers.cc: Likewise.
* testsuite/29_atomics/atomic/wait_notify/generic.cc: Likewise.
* testsuite/29_atomics/atomic/wait_notify/generic.h: New file.
* testsuite/30_threads/semaphore/1.cc: New test.
* testsuite/30_threads/semaphore/2.cc: Likewise.
* testsuite/30_threads/semaphore/least_max_value_neg.cc: Likewise.
* testsuite/30_threads/semaphore/try_acquire.cc: Likewise.
* testsuite/30_threads/semaphore/try_acquire_for.cc: Likewise.
* testsuite/30_threads/semaphore/try_acquire_posix.cc: Likewise.
* testsuite/30_threads/semaphore/try_acquire_until.cc: Likewise.
* testsuite/30_threads/latch/1.cc: New test.
* testsuite/30_threads/latch/2.cc: New test.
* testsuite/30_threads/latch/3.cc: New test.
---
 libstdc++-v3/include/Makefile.am  |   5 +
 libstdc++-v3/include/Makefile.in  |   5 +
 libstdc++-v3/include/bits/atomic_base.h   | 172 +-
 libstdc++-v3/include/bits/atomic_timed_wait.h | 281 
 libstdc++-v3/include/bits/atomic_wait.h   | 301 ++
 libstdc++-v3/include/bits/semaphore_base.h| 283 
 libstdc++-v3/include/std/atomic   |  73 +
 libstdc++-v3/include/std/latch|  90 ++
 libstdc++-v3/include/std/semaphore|  92 ++
 libstdc++-v3/include/std/version  |   2 +
 .../atomic/wait_notify/atomic_refs.cc | 103 ++
 .../29_atomics/atomic/wait_notify/bool.cc |  59 
 .../29_atomics/atomic/wait_notify/floats.cc   |  32 ++
 .../29_atomics/atomic/wait_notify/generic.cc  |  31 ++
 .../29_atomics/atomic/wait_notify/generic.h   | 160 ++
 .../atomic/wait_notify/integrals.cc   |  65 
 .../29_atomics/atomic/wait_notify/pointers.cc |  59 
 libstdc++-v3/testsuite/30_threads/latch/1.cc  |  27 ++
 libstdc++-v3/testsuite/30_threads/latch/2.cc  |  27 ++
 libstdc++-v3/testsuite/30_threads/latch/3.cc  |  50 +++
 .../testsuite/30_threads/semaphore/1.cc   |  27 ++
 .../testsuite/30_threads/semaphore/2.cc   |  27 ++
 .../semaphore/least_max_value_neg.cc  |  30 ++
 .../30_threads/semaphore/try_acquire.cc   |  55 
 .../30_threads/semaphore/try_acquire_for.cc   |  85 +
 .../30_threads/semaphore/try_acquire_posix.cc | 153 +
 .../30_threads/semaphore/try_acquire_until.cc |  94 ++
 27 files changed, 2387 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/include/bits/atomic_timed_wait.h
 create mode 100644 libstdc++-v3/include/bits/atomic_wait.h
 create mode 100644 libstdc++-v3/include/bits/semaphore_base.h
 create mode 100644 libstdc++-v3/include/std/latch
 create 

[PATCH] Add C++2a synchronization support

2020-09-02 Thread Thomas Rodgers
Adds support for -
atomic wait/notify_one/notify_all
counting_semaphore
binary_semaphore
latch

* include/Makefile.am (bits_headers): Add new header.
* include/Makefile.in: Regenerate.
* include/bits/atomic_base.h (__atomic_base<_Itp>::wait): Define.
(__atomic_base<_Itp>::notify_one): Likewise.
(__atomic_base<_Itp>::notify_all): Likewise.
(__atomic_base<_Ptp*>::wait): Likewise.
(__atomic_base<_Ptp*>::notify_one): Likewise.
(__atomic_base<_Ptp*>::notify_all): Likewise.
(__atomic_impl::wait): Likewise.
(__atomic_impl::notify_one): Likewise.
(__atomic_impl::notify_all): Likewise.
(__atomic_float<_Fp>::wait): Likewise.
(__atomic_float<_Fp>::notify_one): Likewise.
(__atomic_float<_Fp>::notify_all): Likewise.
(__atomic_ref<_Tp>::wait): Likewise.
(__atomic_ref<_Tp>::notify_one): Likewise.
(__atomic_ref<_Tp>::notify_all): Likewise.
(atomic_wait<_Tp>): Likewise.
(atomic_wait_explicit<_Tp>): Likewise.
(atomic_notify_one<_Tp>): Likewise.
(atomic_notify_all<_Tp>): Likewise.
* include/bits/atomic_wait.h: New file.
* include/bits/atomic_timed_wait.h: New file.
* include/bits/semaphore_base.h: New file.
* include/std/atomic (atomic::wait): Define.
(atomic::notify_one): Likewise.
(atomic::notify_all): Likewise.
(atomic<_Tp>::wait): Likewise.
(atomic<_Tp>::notify_one): Likewise.
(atomic<_Tp>::notify_all): Likewise.
(atomic<_Tp*>::wait): Likewise.
(atomic<_Tp*>::notify_one): Likewise.
(atomic<_Tp*>::notify_all): Likewise.
* include/std/latch: New file.
* include/std/semaphore: New file.
* include/std/version: Add __cpp_lib_semaphore and
__cpp_lib_latch defines.
* testsuite/29_atomics/atomic/wait_notify/atomic_refs.cc: New test.
* testsuite/29_atomics/atomic/wait_notify/bool.cc: Likewise.
* testsuite/29_atomics/atomic/wait_notify/integrals.cc: Likewise.
* testsuite/29_atomics/atomic/wait_notify/floats.cc: Likewise.
* testsuite/29_atomics/atomic/wait_notify/pointers.cc: Likewise.
* testsuite/29_atomics/atomic/wait_notify/generic.cc: Likewise.
* testsuite/29_atomics/atomic/wait_notify/generic.h: New file.
* testsuite/30_threads/semaphore/1.cc: New test.
* testsuite/30_threads/semaphore/2.cc: Likewise.
* testsuite/30_threads/semaphore/least_max_value_neg.cc: Likewise.
* testsuite/30_threads/semaphore/try_acquire.cc: Likewise.
* testsuite/30_threads/semaphore/try_acquire_for.cc: Likewise.
* testsuite/30_threads/semaphore/try_acquire_posix.cc: Likewise.
* testsuite/30_threads/semaphore/try_acquire_until.cc: Likewise.
* testsuite/30_threads/latch/1.cc: New test.
* testsuite/30_threads/latch/2.cc: New test.
* testsuite/30_threads/latch/3.cc: New test.
---
 libstdc++-v3/include/Makefile.am  |   5 +
 libstdc++-v3/include/Makefile.in  |   5 +
 libstdc++-v3/include/bits/atomic_base.h   | 172 +-
 libstdc++-v3/include/bits/atomic_timed_wait.h | 281 
 libstdc++-v3/include/bits/atomic_wait.h   | 301 ++
 libstdc++-v3/include/bits/semaphore_base.h| 283 
 libstdc++-v3/include/std/atomic   |  73 +
 libstdc++-v3/include/std/latch|  90 ++
 libstdc++-v3/include/std/semaphore|  92 ++
 libstdc++-v3/include/std/version  |   2 +
 .../atomic/wait_notify/atomic_refs.cc | 103 ++
 .../29_atomics/atomic/wait_notify/bool.cc |  59 
 .../29_atomics/atomic/wait_notify/floats.cc   |  32 ++
 .../29_atomics/atomic/wait_notify/generic.cc  |  31 ++
 .../29_atomics/atomic/wait_notify/generic.h   | 160 ++
 .../atomic/wait_notify/integrals.cc   |  65 
 .../29_atomics/atomic/wait_notify/pointers.cc |  59 
 libstdc++-v3/testsuite/30_threads/latch/1.cc  |  27 ++
 libstdc++-v3/testsuite/30_threads/latch/2.cc  |  27 ++
 libstdc++-v3/testsuite/30_threads/latch/3.cc  |  50 +++
 .../testsuite/30_threads/semaphore/1.cc   |  27 ++
 .../testsuite/30_threads/semaphore/2.cc   |  27 ++
 .../semaphore/least_max_value_neg.cc  |  30 ++
 .../30_threads/semaphore/try_acquire.cc   |  55 
 .../30_threads/semaphore/try_acquire_for.cc   |  85 +
 .../30_threads/semaphore/try_acquire_posix.cc | 153 +
 .../30_threads/semaphore/try_acquire_until.cc |  94 ++
 27 files changed, 2387 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/include/bits/atomic_timed_wait.h
 create mode 100644 libstdc++-v3/include/bits/atomic_wait.h
 create mode 100644 libstdc++-v3/include/bits/semaphore_base.h
 create mode 100644 libstdc++-v3/include/std/latch
 create mode 100644 libstdc++-v3/include/std/semaphore
 create mode 100644 

[PING][PATCH 2/5] C front end support to detect out-of-bounds accesses to array parameters

2020-09-02 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552266.html

On 8/25/20 12:44 PM, Martin Sebor wrote:

Joseph, do you have any more comments on the rest of the most recent
revision of the patch?

https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552266.html

Martin

On 8/19/20 6:09 PM, Joseph Myers wrote:

On Wed, 19 Aug 2020, Martin Sebor via Gcc-patches wrote:


I think you need a while loop there, not just an if, to account for the
case of multiple consecutive cdk_attrs.  At least the GNU attribute syntax


 direct-declarator:
[...]
   ( gnu-attributes[opt] declarator )

should produce multiple consecutive cdk_attrs for each level of
parentheses with attributes inside.


I had considered a loop but couldn't find a way to trigger what you
describe (or a test in the testsuite that would do it) so I didn't
use one.  I saw loops like that in other places but I couldn't get
even those to uncover such a test case.  Here's what I tried:

   #define A(N) __attribute__ ((aligned (N), may_alias))
   int n;
   void f (int (* A (2) A (4) (* A (2) A (4) (* A (2) A (4) [n])[n])));

Sequences of consecutive attributes are all chained together.

I've added the loop here but I have no test for it.  It would be
good to add one if it really is needed.


The sort of thing I'm thinking of would be, where A is some attribute:

void f (int (A (A (A arg))));

(that example doesn't involve an array, but it illustrates the syntax I'd
expect to produce multiple consecutive cdk_attrs).







Re: [PATCH] c++: Disable -frounding-math during manifestly constant evaluation [PR96862]

2020-09-02 Thread Marc Glisse

On Wed, 2 Sep 2020, Jason Merrill via Gcc-patches wrote:


On 9/1/20 6:13 AM, Marc Glisse wrote:

On Tue, 1 Sep 2020, Jakub Jelinek via Gcc-patches wrote:


As discussed in the PR, fold-const.c punts on floating point constant
evaluation if the result is inexact and -frounding-math is turned on.
 /* Don't constant fold this floating point operation if the
    result may dependent upon the run-time rounding mode and
    flag_rounding_math is set, or if GCC's software emulation
    is unable to accurately represent the result.  */
 if ((flag_rounding_math
  || (MODE_COMPOSITE_P (mode) && !flag_unsafe_math_optimizations))
 && (inexact || !real_identical (&result, &value)))
   return NULL_TREE;
Jonathan said that we should be evaluating them anyway, e.g. conceptually
as if they are done with the default rounding mode before user had a chance
to change that, and e.g. in C in initializers it is also ignored.
In fact, fold-const.c for C initializers turns off various other options:

/* Perform constant folding and related simplification of initializer
  expression EXPR.  These behave identically to "fold_buildN" but ignore
  potential run-time traps and exceptions that fold must preserve.  */

#define START_FOLD_INIT \
 int saved_signaling_nans = flag_signaling_nans;\
 int saved_trapping_math = flag_trapping_math;\
 int saved_rounding_math = flag_rounding_math;\
 int saved_trapv = flag_trapv;\
 int saved_folding_initializer = folding_initializer;\
 flag_signaling_nans = 0;\
 flag_trapping_math = 0;\
 flag_rounding_math = 0;\
 flag_trapv = 0;\
 folding_initializer = 1;

#define END_FOLD_INIT \
 flag_signaling_nans = saved_signaling_nans;\
 flag_trapping_math = saved_trapping_math;\
 flag_rounding_math = saved_rounding_math;\
 flag_trapv = saved_trapv;\
 folding_initializer = saved_folding_initializer;

So, shall cxx_eval_outermost_constant_expr instead turn off all those
options (then warning_sentinel wouldn't be the right thing to use, but given
the 8 or how many return stmts in cxx_eval_outermost_constant_expr, we'd
need a RAII class for this)?  Not sure about the folding_initializer, that
one is affecting complex multiplication and division constant evaluation
somehow.
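
A RAII class along those lines could look roughly like the sketch below.  The two globals are hypothetical stand-ins for GCC's flags, the class name is made up, and which flags belong in such a class is exactly what is under discussion in this thread, so treat the choice here as a placeholder.

```cpp
// Hypothetical stand-ins for GCC's global flags, for illustration only.
int flag_trapping_math = 1;
int flag_rounding_math = 1;

// Saves the flags on construction, clears them, and restores them on
// destruction, so every return path out of the evaluator restores them.
class fold_init_sentinel {
  int saved_trapping_math_;
  int saved_rounding_math_;
public:
  fold_init_sentinel ()
    : saved_trapping_math_ (flag_trapping_math),
      saved_rounding_math_ (flag_rounding_math)
  {
    flag_trapping_math = 0;
    flag_rounding_math = 0;
  }
  ~fold_init_sentinel ()
  {
    flag_trapping_math = saved_trapping_math_;
    flag_rounding_math = saved_rounding_math_;
  }
  fold_init_sentinel (const fold_init_sentinel &) = delete;
  fold_init_sentinel &operator= (const fold_init_sentinel &) = delete;
};

// Example evaluator with an early return; the destructor still runs.
int evaluate (bool bail_out_early)
{
  fold_init_sentinel s;
  if (bail_out_early)
    return flag_rounding_math;  // 0 while the sentinel is alive
  return flag_rounding_math + flag_trapping_math;
}
```

Unlike the START_FOLD_INIT/END_FOLD_INIT macro pair, nothing has to be repeated before each return statement.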


I don't think we need to turn off flag_signaling_nans or flag_trapv. I 
think we want to turn off flag_trapping_math so we can fold 1./0 to inf 
(still in a context where folding is mandatory). Setting 
folding_initializer seems consistent with that, enabling infinite results 
in complex folding (it also forces folding of __builtin_constant_p, which 
may be redundant with force_folding_builtin_constant_p).


C++ says that division by zero has undefined behavior, and that an expression 
with undefined behavior is not constant, so we shouldn't fold 1./0 to inf 
anyway.  The same is true of other trapping operations.  So clearing 
flag_signaling_nans, flag_trapping_math, and flag_trapv seems wrong for C++. 
And folding_initializer seems to be used for the same sort of thing.


So we should actually force flag_trapping_math to true during constexpr
evaluation? And folding_initializer to false, and never mind trapv but
maybe disable wrapv?

#include <limits>
constexpr double a = std::numeric_limits<double>::infinity();
constexpr double b = a + a;
constexpr double c = a - a;
constexpr double d = 1. / a;
constexpr double e = 1. / d;

clang rejects c and e. MSVC rejects e. Intel warns on c.

Gcc rejects only e, and accepts the whole thing if I pass
-fno-trapping-math.

Almost any FP operation is possibly trapping, 1./3. sets FE_INEXACT just 
as 1./0. sets FE_DIVBYZERO. But the standard says


char array[1 + int(1 + 0.2 - 0.1 - 0.1)]; // Must be evaluated during translation

So it doesn't seem like it cares about that? Division by zero is the only 
one that gets weirdly special-cased...
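
For what it's worth, the run-time side of this can be observed with <cfenv>.  A minimal sketch, assuming an implementation that supports the floating-point exception flags and IEEE double arithmetic (the function name is made up for illustration):

```cpp
#include <cfenv>

// Returns the exception flags raised by computing num / den at run time.
// The volatile objects keep the compiler from folding the division.
int flags_raised_by_division (double num, double den)
{
  volatile double n = num;
  volatile double d = den;
  std::feclearexcept (FE_ALL_EXCEPT);
  volatile double q = n / d;
  (void) q;
  return std::fetestexcept (FE_INEXACT | FE_DIVBYZERO);
}
```

Under those assumptions, 1./3. reports FE_INEXACT and 1./0. reports FE_DIVBYZERO, while an exact quotient such as 1./2. reports neither.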


--
Marc Glisse


Re: [PATCH v3] c++: Fix P0960 in member init list and array [PR92812]

2020-09-02 Thread Marek Polacek via Gcc-patches
On Wed, Sep 02, 2020 at 05:06:45PM -0400, Jason Merrill wrote:
> On 9/2/20 4:37 PM, Marek Polacek wrote:
> > I've added do_aggregate_paren_init to factor some common code.  It's not
> > perfect because the enclosing conditions couldn't really be factored out,
> 
> This condition:
> 
> > +  && (list_length (init) > 1
> > +  /* A single-element list: handle non-standard extensions
> > + like compound literals.  This also prevents triggering
> > + aggregate ()-initialization in compiler-generated code
> > + for =default.  */
> > +  || (list_length (init) == 1
> > +  && !same_type_ignoring_top_level_qualifiers_p
> > +  (type, TREE_TYPE (TREE_VALUE (init))
> 
> seems like it could move into do_aggregate_paren_init as well, even if it's
> redundant with code in the caller in some cases; we never want to add { } to
> a single parenthesized expression of the same type.

True, done in this patch.

> And don't we need to
> check for the same-type situation for the array case in check_initializer?

Yea, I think so.  It can be reached using either

  char c[4]("foo");
 
which is already handled by the string literal case, or with an array prvalue:

  using T = int[2];
  T t(T{1, 1});

> Incidentally, for checking whether a list is length 1 or more, probably
> slightly more efficient to look at TREE_CHAIN directly like
> do_aggregate_paren_init does in this patch, rather than use list_length.

Done.

How does this look?  Testing in progress, but dg.exp and old-deja.exp
are clean.

Ok if testing passes?

-- >8 --
This patch nails down the remaining P0960 case in PR92812:

  struct A {
    int ar[2];
    A(): ar(1, 2) {} // doesn't work without this patch
  };

Note that when the target object is not of array type, this already
works:

  struct S { int x, y; };
  struct A {
    S s;
    A(): s(1, 2) { } // OK in C++20
  };

because build_new_method_call_1 takes care of the P0960 magic.

It proved to be quite hairy.  When the ()-list has more than one
element, we can always create a CONSTRUCTOR, because the code was
previously invalid.  But when the ()-list has just one element, it
gets all kinds of difficult.  As usual, we have to handle a("foo")
so as not to wrap the STRING_CST in a CONSTRUCTOR.  Always turning
x(e) into x{e} would run into trouble as in c++/93790.  Another
issue was what to do about x({e}): previously, this would trigger
"list-initializer for non-class type must not be parenthesized".
I figured I'd make this work in C++20, so that given

  struct S { int x, y; };

you can do

   S a[2];
   [...]
   A(): a({1, 2}) // initialize a[0] with {1, 2} and a[1] with {}

It also turned out that, as an extension, we support compound literals:

  F (): m((S[1]) { 1, 2 })

so this has to keep working as before.  Moreover, make sure not to trigger
in compiler-generated code, like =default, where array assignment is allowed.

I've factored out a function that turns a TREE_LIST into a CONSTRUCTOR
to simplify handling of P0960.

paren-init35.C also tests this with vector types.

gcc/cp/ChangeLog:

PR c++/92812
* cp-tree.h (do_aggregate_paren_init): Declare.
* decl.c (do_aggregate_paren_init): New.
(grok_reference_init): Use it.
(check_initializer): Likewise.
* init.c (perform_member_init): Handle initializing an array from
a ()-list.  Use do_aggregate_paren_init.

gcc/testsuite/ChangeLog:

PR c++/92812
* g++.dg/cpp0x/constexpr-array23.C: Adjust dg-error.
* g++.dg/cpp0x/initlist69.C: Likewise.
* g++.dg/diagnostic/mem-init1.C: Likewise.
* g++.dg/init/array28.C: Likewise.
* g++.dg/cpp2a/paren-init33.C: New test.
* g++.dg/cpp2a/paren-init34.C: New test.
* g++.dg/cpp2a/paren-init35.C: New test.
* g++.old-deja/g++.brendan/crash60.C: Adjust dg-error.
* g++.old-deja/g++.law/init10.C: Likewise.
* g++.old-deja/g++.other/array3.C: Likewise.
---
 gcc/cp/cp-tree.h  |   1 +
 gcc/cp/decl.c |  62 +
 gcc/cp/init.c |  26 +++-
 .../g++.dg/cpp0x/constexpr-array23.C  |   6 +-
 gcc/testsuite/g++.dg/cpp0x/initlist69.C   |   4 +-
 gcc/testsuite/g++.dg/cpp2a/paren-init33.C | 128 ++
 gcc/testsuite/g++.dg/cpp2a/paren-init34.C |  25 
 gcc/testsuite/g++.dg/cpp2a/paren-init35.C |  21 +++
 gcc/testsuite/g++.dg/diagnostic/mem-init1.C   |   4 +-
 gcc/testsuite/g++.dg/init/array28.C   |   2 +-
 .../g++.old-deja/g++.brendan/crash60.C|   2 +-
 gcc/testsuite/g++.old-deja/g++.law/init10.C   |   2 +-
 gcc/testsuite/g++.old-deja/g++.other/array3.C |   3 +-
 13 files changed, 239 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init33.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init34.C
 create 

Re: [PATCH] c++: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-09-02 Thread Jason Merrill via Gcc-patches

On 8/27/20 6:19 AM, Richard Biener wrote:

On Thu, 27 Aug 2020, Jakub Jelinek wrote:


On Fri, Jul 31, 2020 at 04:28:05PM -0400, Jason Merrill via Gcc-patches wrote:

On 7/31/20 6:06 AM, Jakub Jelinek wrote:

On Fri, Jul 31, 2020 at 10:54:46AM +0100, Jonathan Wakely wrote:

Does the standard require that somewhere?  Because that is not what the
compiler implements right now.


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78620


But does that imply that all CONSTRUCTORs without CONSTRUCTOR_NO_CLEARING
need to be treated that way?  I mean, aren't such CONSTRUCTORs used also for
other initializations?


Yes, they are also used to represent constant values of classes that are
initialized by constexpr constructor.


And, are the default copy constructors or assignment operators supposed to
also copy the padding bits, or do they become unspecified again through
that?


For a non-union class, a defaulted copy is defined as memberwise copy, not a
copy of the entire object representation.  So I guess strictly speaking the
padding bits do become unspecified.  But I think if the copy is trivial, in
practice all implementations do copy the object representation; perhaps the
specification should adjust accordingly.


Sorry for not responding earlier.  I think at least in GCC there is no
guarantee the copying is copying the object representation rather than
memberwise copy, both are possible, depending e.g. whether SRA happens or
not.


Note we've basically settled on that SRA needs to copy padding and that
GIMPLE copies all bytes for aggregate copies and thus

   x = {}

is equivalent to a memset.
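
To make the padding question concrete, here is a minimal sketch of what "equivalent to a memset" means at the source level. The type `Padded` and both helper functions are hypothetical names invented for this illustration; nothing here is taken from the GCC sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical example type: on most ABIs there are padding bytes after 'c'.
struct Padded { char c; int i; };

// Bytes of the object representation that belong to no member.
constexpr std::size_t padding_bytes ()
{
  return sizeof (Padded) - sizeof (char) - sizeof (int);
}

// Zero the whole object representation, as memset does, then check that
// every byte (padding included) reads back as zero.
bool all_bytes_zero_after_memset ()
{
  Padded p;
  std::memset (&p, 0, sizeof p);
  unsigned char bytes[sizeof (Padded)];
  std::memcpy (bytes, &p, sizeof p);
  for (std::size_t k = 0; k != sizeof bytes; ++k)
    if (bytes[k] != 0)
      return false;
  return true;
}
```

A memberwise initialization of `c` and `i` alone would give no such guarantee for the `padding_bytes ()` bytes, which is the distinction the thread is about.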


So, shouldn't we have a new CONSTRUCTOR flag that will represent whether
padding bits are cleared or not and then use it e.g. in the gimplifier?
Right now the gimplifier only adds first zero initialization if
CONSTRUCTOR_NO_CLEARING is not set and some initializers are not present,
so if there is a new flag, we'd need to in that case find out if there are
any padding bits and do the zero initialization in that case.
A question is if GIMPLE var = {}; statement (empty CONSTRUCTOR) is handled
as zero initialization of also the padding bits, or if we should treat it
that way only if the CONSTRUCTOR on the rhs has the new bit set and e.g.
when lowering memset 0 into var = {}; set the bit too.
From what I understood on IRC, D has similar need for zero initialization of
padding.


Now indeed the gimplifier will turn an aggregate CTOR initialization
to memberwise init without caring for padding.  Which means GENERIC
has the less strict semantics and we indeed would need some CTOR flag
to tell whether padding is implicitly zero or undefined?


CONSTRUCTOR_NO_CLEARING would seem to already mean that, but the C++ 
front end uses it just to indicate whether some fields are 
uninitialized.  I suppose C++ could use a LANG_FLAG for that instead of 
the generic flag.



In the testcase below, what is and what is not UB?

#include <bit>

struct S { int a : 31; int b; };
struct T { int a, b; };

constexpr int
foo ()
{
   S a = S ();
   S b = { 0, 0 };
   S c = a;
   S d;
   S e;
   d = a;
   e = S ();
   int u = std::bit_cast<T> (a).a; // Is this well defined due to value initialization of a?
   int v = std::bit_cast<T> (b).a; // But this is invalid, right?  There is no difference in the IL though.
   int w = std::bit_cast<T> (c).a; // And this is also invalid, or are default copy ctors required to copy padding bits?
   int x = std::bit_cast<T> (d).a; // Similarly for default copy assignment operators...
   int y = std::bit_cast<T> (e).a; // And this too?
   int z = std::bit_cast<T> (S ()).a; // This one is well defined?
   return u + v + w + x + y + z;
}

constexpr int x = foo ();

Jakub








Re: [PATCH v2] c++: Fix P0960 in member init list and array [PR92812]

2020-09-02 Thread Jason Merrill via Gcc-patches

On 9/2/20 4:37 PM, Marek Polacek wrote:

On Wed, Sep 02, 2020 at 12:00:29PM -0400, Jason Merrill via Gcc-patches wrote:

On 9/1/20 6:23 PM, Marek Polacek wrote:

This patch nails down the remaining P0960 case in PR92812:

struct A {
  int ar[2];
  A(): ar(1, 2) {} // doesn't work without this patch
};

Note that when the target object is not of array type, this already
works:

struct S { int x, y; };
struct A {
  S s;
  A(): s(1, 2) { } // OK in C++20
};

because build_new_method_call_1 takes care of the P0960 magic.
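
As a small, hedged illustration of the non-array case that already works, the sketch below uses the standard feature-test macro `__cpp_aggregate_paren_init` to pick the parenthesized form only when the compiler advertises P0960 support, and falls back to braces otherwise. The function `sum_members` is a hypothetical helper added for the example; the array-member form `ar(1, 2)` is deliberately left out because it needs this patch.

```cpp
#include <cassert>

struct S { int x, y; };

#if defined(__cpp_aggregate_paren_init) && __cpp_aggregate_paren_init >= 201902L
// C++20 (P0960): an aggregate member can be initialized from a ()-list.
struct A {
  S s;
  A () : s (1, 2) { }
};
#else
// Before C++20 only the braced form is valid for an aggregate member.
struct A {
  S s;
  A () : s{1, 2} { }
};
#endif

// Hypothetical helper: both branches above must initialize s the same way.
int sum_members ()
{
  A a;
  return a.s.x + a.s.y;
}
```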

It proved to be quite hairy.  When the ()-list has more than one
element, we can always create a CONSTRUCTOR, because the code was
previously invalid.  But when the ()-list has just one element, it
gets all kinds of difficult.  As usual, we have to handle a("foo")
so as not to wrap the STRING_CST in a CONSTRUCTOR.  Always turning
x(e) into x{e} would run into trouble as in c++/93790.  Another
issue was what to do about x({e}): previously, this would trigger
"list-initializer for non-class type must not be parenthesized".
I figured I'd make this work in C++20, so that given

struct S { int x, y; };

you can do

 S a[2];
 [...]
 A(): a({1, 2}) // initialize a[0] with {1, 2} and a[1] with {}

It also turned out that, as an extension, we support compound literals:

F (): m((S[1]) { 1, 2 })

so this has to keep working as before.

Moreover, make sure not to trigger in compiler-generated code, like
=default, where array assignment is allowed.

paren-init35.C also tests this with vector types.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/92812
* init.c (do_paren_init_for_array_p): New.
(perform_member_init): Use it.  If true, build up a CONSTRUCTOR
from the list of arguments.

gcc/testsuite/ChangeLog:

PR c++/92812
* g++.dg/cpp0x/constexpr-array23.C: Adjust dg-error.
* g++.dg/cpp0x/initlist69.C: Likewise.
* g++.dg/diagnostic/mem-init1.C: Likewise.
* g++.dg/init/array28.C: Likewise.
* g++.dg/cpp2a/paren-init33.C: New test.
* g++.dg/cpp2a/paren-init34.C: New test.
* g++.dg/cpp2a/paren-init35.C: New test.
* g++.old-deja/g++.brendan/crash60.C: Adjust dg-error.
* g++.old-deja/g++.law/init10.C: Likewise.
* g++.old-deja/g++.other/array3.C: Likewise.
---
   gcc/cp/init.c |  64 -
   .../g++.dg/cpp0x/constexpr-array23.C  |   6 +-
   gcc/testsuite/g++.dg/cpp0x/initlist69.C   |   4 +-
   gcc/testsuite/g++.dg/cpp2a/paren-init33.C | 128 ++
   gcc/testsuite/g++.dg/cpp2a/paren-init34.C |  25 
   gcc/testsuite/g++.dg/cpp2a/paren-init35.C |  21 +++
   gcc/testsuite/g++.dg/diagnostic/mem-init1.C   |   4 +-
   gcc/testsuite/g++.dg/init/array28.C   |   2 +-
   .../g++.old-deja/g++.brendan/crash60.C|   2 +-
   gcc/testsuite/g++.old-deja/g++.law/init10.C   |   2 +-
   gcc/testsuite/g++.old-deja/g++.other/array3.C |   3 +-
   11 files changed, 243 insertions(+), 18 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init33.C
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init34.C
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init35.C

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index d4540db3605..2edc9651ad6 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -756,6 +756,41 @@ maybe_warn_list_ctor (tree member, tree init)
 "of the underlying array", member, begin);
   }
+/* Return true if we should attempt to perform the P0960 magic when
+   initializing an array TYPE from a parenthesized list of values LIST.  */
+
+static bool
+do_paren_init_for_array_p (tree list, tree type)
+{
+  if (cxx_dialect < cxx20)
+/* P0960 is a C++20 feature.  */
+return false;
+
+  const int len = list_length (list);
+  if (len == 0)
+/* Value-initialization.  */
+return false;
+  else if (len > 1)
+/* If the list had more than one element, the code is ill-formed
+   pre-C++20, so we should attempt to ()-init.  */
+return true;
+
+  /* Lists with one element are trickier.  */
+  tree elt = TREE_VALUE (list);
+
+  /* For a("foo"), don't wrap the STRING_CST in { }.  */
+  if (char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type)))
+  && TREE_CODE (tree_strip_any_location_wrapper (elt)) == STRING_CST)
+return false;


Hmm, yet another place we need to implement the special treatment of
strings?  Can't we factor this better?  Could there be a general e.g.
maybe_aggregate_paren_init function to turn a list into a CONSTRUCTOR that's
used in various places?


I've added do_aggregate_paren_init to factor some common code.  It's not
perfect because the enclosing conditions couldn't really be factored out,


This condition:


+  && (list_length (init) > 1
+  /* A single-element list: handle non-standard extensions
+ 

Re: [PATCH] c++: Disable -frounding-math during manifestly constant evaluation [PR96862]

2020-09-02 Thread Jason Merrill via Gcc-patches

On 9/1/20 6:13 AM, Marc Glisse wrote:

On Tue, 1 Sep 2020, Jakub Jelinek via Gcc-patches wrote:


As discussed in the PR, fold-const.c punts on floating point constant
evaluation if the result is inexact and -frounding-math is turned on.
 /* Don't constant fold this floating point operation if the
    result may dependent upon the run-time rounding mode and
    flag_rounding_math is set, or if GCC's software emulation
    is unable to accurately represent the result.  */
 if ((flag_rounding_math
      || (MODE_COMPOSITE_P (mode) && !flag_unsafe_math_optimizations))
     && (inexact || !real_identical (&result, &value)))
   return NULL_TREE;
Jonathan said that we should be evaluating them anyway, e.g. conceptually
as if they are done with the default rounding mode before user had a chance
to change that, and e.g. in C in initializers it is also ignored.
In fact, fold-const.c for C initializers turns off various other options:

/* Perform constant folding and related simplification of initializer
  expression EXPR.  These behave identically to "fold_buildN" but ignore
  potential run-time traps and exceptions that fold must preserve.  */

#define START_FOLD_INIT \
 int saved_signaling_nans = flag_signaling_nans;\
 int saved_trapping_math = flag_trapping_math;\
 int saved_rounding_math = flag_rounding_math;\
 int saved_trapv = flag_trapv;\
 int saved_folding_initializer = folding_initializer;\
 flag_signaling_nans = 0;\
 flag_trapping_math = 0;\
 flag_rounding_math = 0;\
 flag_trapv = 0;\
 folding_initializer = 1;

#define END_FOLD_INIT \
 flag_signaling_nans = saved_signaling_nans;\
 flag_trapping_math = saved_trapping_math;\
 flag_rounding_math = saved_rounding_math;\
 flag_trapv = saved_trapv;\
 folding_initializer = saved_folding_initializer;

So, shall cxx_eval_outermost_constant_expr instead turn off all those
options (then warning_sentinel wouldn't be the right thing to use, but given
the 8 or how many return stmts in cxx_eval_outermost_constant_expr, we'd
need a RAII class for this)?  Not sure about the folding_initializer, that
one is affecting complex multiplication and division constant evaluation
somehow.
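
The RAII idea mentioned above could look roughly like the following. This is only a sketch under stated assumptions: `flag_guard` and `evaluate_sketch` are hypothetical names, and the global `flag_rounding_math` here is a local stand-in for GCC's option variable, not the real one.

```cpp
#include <cassert>

// Hypothetical stand-in for GCC's global option variable.
int flag_rounding_math = 1;

// Set a flag for the lifetime of the guard and restore it afterwards,
// no matter which return statement leaves the enclosing function.
class flag_guard
{
  int &flag_;
  int saved_;
public:
  flag_guard (int &flag, int value) : flag_ (flag), saved_ (flag)
  { flag_ = value; }
  ~flag_guard () { flag_ = saved_; }
  flag_guard (const flag_guard &) = delete;
  flag_guard &operator= (const flag_guard &) = delete;
};

// Toy evaluator with more than one return statement.  It returns the flag
// value that was visible while "evaluating".
int evaluate_sketch (bool manifestly_const_eval, bool bail_out_early)
{
  flag_guard guard (flag_rounding_math,
                    manifestly_const_eval ? 0 : flag_rounding_math);
  if (bail_out_early)
    return flag_rounding_math;  // the guard still restores the flag here
  return flag_rounding_math;
}
```

The point of the design is that no return path has to remember the equivalent of END_FOLD_INIT.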


I don't think we need to turn off flag_signaling_nans or flag_trapv. I 
think we want to turn off flag_trapping_math so we can fold 1./0 to inf 
(still in a context where folding is mandatory). Setting 
folding_initializer seems consistent with that, enabling infinite 
results in complex folding (it also forces folding of 
__builtin_constant_p, which may be redundant with 
force_folding_builtin_constant_p).


C++ says that division by zero has undefined behavior, and that an 
expression with undefined behavior is not constant, so we shouldn't fold 
1./0 to inf anyway.  The same is true of other trapping operations.  So 
clearing flag_signaling_nans, flag_trapping_math, and flag_trapv seems 
wrong for C++.  And folding_initializer seems to be used for the same 
sort of thing.



The following patch has been bootstrapped/regtested on x86_64-linux and
i686-linux, but see above, maybe we want something else.

2020-09-01  Jakub Jelinek  

PR c++/96862
* constexpr.c (cxx_eval_outermost_constant_expr): Temporarily disable
flag_rounding_math during manifestly constant evaluation.


OK.


* g++.dg/cpp1z/constexpr-96862.C: New test.

--- gcc/cp/constexpr.c.jj    2020-08-31 14:10:15.826921458 +0200
+++ gcc/cp/constexpr.c    2020-08-31 15:41:26.429964532 +0200
@@ -6680,6 +6680,8 @@ cxx_eval_outermost_constant_expr (tree t
    allow_non_constant, strict,
    manifestly_const_eval || !allow_non_constant };

+  /* Turn off -frounding-math for manifestly constant evaluation.  */
+  warning_sentinel rm (flag_rounding_math, ctx.manifestly_const_eval);
  tree type = initialized_type (t);
  tree r = t;
  bool is_consteval = false;
--- gcc/testsuite/g++.dg/cpp1z/constexpr-96862.C.jj    2020-08-31 15:50:07.847473028 +0200
+++ gcc/testsuite/g++.dg/cpp1z/constexpr-96862.C    2020-08-31 15:49:40.829861168 +0200
@@ -0,0 +1,20 @@
+// PR c++/96862
+// { dg-do compile { target c++17 } }
+// { dg-additional-options "-frounding-math" }
+
+constexpr double a = 0x1.0p+100 + 0x1.0p-100;
+const double b = 0x1.0p+100 + 0x1.0p-100;
+const double &&c = 0x1.0p+100 + 0x1.0p-100;
+static_assert (0x1.0p+100 + 0x1.0p-100 == 0x1.0p+100, "");
+
+void
+foo ()
+{
+  constexpr double d = 0x1.0p+100 + 0x1.0p-100;
+  const double e = 0x1.0p+100 + 0x1.0p-100;
+  const double &&f = 0x1.0p+100 + 0x1.0p-100;
+  static_assert (0x1.0p+100 + 0x1.0p-100 == 0x1.0p+100, "");
+}
+
+const double &g = a;
+const double &h = b;

Jakub
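
For readers wondering why 2^100 + 2^-100 is the chosen test value: the sum is not representable in a double, so its value depends on the rounding mode in effect. The run-time sketch below shows that dependence. `inexact_sum` is a hypothetical helper, the volatile objects are only there to keep the compiler from folding the addition, and the sketch assumes a target that supports `FE_UPWARD`.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Add 2^100 and 2^-100 at run time under the given rounding mode.
double inexact_sum (int rounding_mode)
{
  const int saved = std::fegetround ();
  std::fesetround (rounding_mode);
  volatile double big = std::ldexp (1.0, 100);
  volatile double tiny = std::ldexp (1.0, -100);
  // Storing to a volatile forces the addition to happen before the
  // rounding mode is restored.
  volatile double sum = big + tiny;
  std::fesetround (saved);
  return sum;
}
```

With round-to-nearest the result is exactly 2^100, which is what the static_assert in the test expects; with upward rounding it is the next representable value above 2^100.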






Re: [PATCH v2] c++: Fix P0960 in member init list and array [PR92812]

2020-09-02 Thread Marek Polacek via Gcc-patches
On Wed, Sep 02, 2020 at 12:00:29PM -0400, Jason Merrill via Gcc-patches wrote:
> On 9/1/20 6:23 PM, Marek Polacek wrote:
> > This patch nails down the remaining P0960 case in PR92812:
> > 
> >struct A {
> >  int ar[2];
> >  A(): ar(1, 2) {} // doesn't work without this patch
> >};
> > 
> > Note that when the target object is not of array type, this already
> > works:
> > 
> >struct S { int x, y; };
> >struct A {
> >  S s;
> >  A(): s(1, 2) { } // OK in C++20
> >};
> > 
> > because build_new_method_call_1 takes care of the P0960 magic.
> > 
> > It proved to be quite hairy.  When the ()-list has more than one
> > element, we can always create a CONSTRUCTOR, because the code was
> > previously invalid.  But when the ()-list has just one element, it
> > gets all kinds of difficult.  As usual, we have to handle a("foo")
> > so as not to wrap the STRING_CST in a CONSTRUCTOR.  Always turning
> > x(e) into x{e} would run into trouble as in c++/93790.  Another
> > issue was what to do about x({e}): previously, this would trigger
> > "list-initializer for non-class type must not be parenthesized".
> > I figured I'd make this work in C++20, so that given
> > 
> >struct S { int x, y; };
> > 
> > you can do
> > 
> > S a[2];
> > [...]
> > A(): a({1, 2}) // initialize a[0] with {1, 2} and a[1] with {}
> > 
> > It also turned out that, as an extension, we support compound literals:
> > 
> >F (): m((S[1]) { 1, 2 })
> > 
> > so this has to keep working as before.
> > 
> > Moreover, make sure not to trigger in compiler-generated code, like
> > =default, where array assignment is allowed.
> > 
> > paren-init35.C also tests this with vector types.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > gcc/cp/ChangeLog:
> > 
> > PR c++/92812
> > * init.c (do_paren_init_for_array_p): New.
> > (perform_member_init): Use it.  If true, build up a CONSTRUCTOR
> > from the list of arguments.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR c++/92812
> > * g++.dg/cpp0x/constexpr-array23.C: Adjust dg-error.
> > * g++.dg/cpp0x/initlist69.C: Likewise.
> > * g++.dg/diagnostic/mem-init1.C: Likewise.
> > * g++.dg/init/array28.C: Likewise.
> > * g++.dg/cpp2a/paren-init33.C: New test.
> > * g++.dg/cpp2a/paren-init34.C: New test.
> > * g++.dg/cpp2a/paren-init35.C: New test.
> > * g++.old-deja/g++.brendan/crash60.C: Adjust dg-error.
> > * g++.old-deja/g++.law/init10.C: Likewise.
> > * g++.old-deja/g++.other/array3.C: Likewise.
> > ---
> >   gcc/cp/init.c |  64 -
> >   .../g++.dg/cpp0x/constexpr-array23.C  |   6 +-
> >   gcc/testsuite/g++.dg/cpp0x/initlist69.C   |   4 +-
> >   gcc/testsuite/g++.dg/cpp2a/paren-init33.C | 128 ++
> >   gcc/testsuite/g++.dg/cpp2a/paren-init34.C |  25 
> >   gcc/testsuite/g++.dg/cpp2a/paren-init35.C |  21 +++
> >   gcc/testsuite/g++.dg/diagnostic/mem-init1.C   |   4 +-
> >   gcc/testsuite/g++.dg/init/array28.C   |   2 +-
> >   .../g++.old-deja/g++.brendan/crash60.C|   2 +-
> >   gcc/testsuite/g++.old-deja/g++.law/init10.C   |   2 +-
> >   gcc/testsuite/g++.old-deja/g++.other/array3.C |   3 +-
> >   11 files changed, 243 insertions(+), 18 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init33.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init34.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init35.C
> > 
> > diff --git a/gcc/cp/init.c b/gcc/cp/init.c
> > index d4540db3605..2edc9651ad6 100644
> > --- a/gcc/cp/init.c
> > +++ b/gcc/cp/init.c
> > @@ -756,6 +756,41 @@ maybe_warn_list_ctor (tree member, tree init)
> >  "of the underlying array", member, begin);
> >   }
> > +/* Return true if we should attempt to perform the P0960 magic when
> > +   initializing an array TYPE from a parenthesized list of values LIST.  */
> > +
> > +static bool
> > +do_paren_init_for_array_p (tree list, tree type)
> > +{
> > +  if (cxx_dialect < cxx20)
> > +/* P0960 is a C++20 feature.  */
> > +return false;
> > +
> > +  const int len = list_length (list);
> > +  if (len == 0)
> > +/* Value-initialization.  */
> > +return false;
> > +  else if (len > 1)
> > +/* If the list had more than one element, the code is ill-formed
> > +   pre-C++20, so we should attempt to ()-init.  */
> > +return true;
> > +
> > +  /* Lists with one element are trickier.  */
> > +  tree elt = TREE_VALUE (list);
> > +
> > +  /* For a("foo"), don't wrap the STRING_CST in { }.  */
> > +  if (char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type)))
> > +  && TREE_CODE (tree_strip_any_location_wrapper (elt)) == STRING_CST)
> > +return false;
> 
> Hmm, yet another place we need to implement the special treatment of
> strings?  Can't we factor this better?  Could there be a general e.g.
> maybe_aggregate_paren_init function to turn a list into a 

Re: [PATCH] dwarf: Multi-register CFI address support

2020-09-02 Thread Tom Tromey
> "Andrew" == Andrew Stubbs  writes:

Andrew> http://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html#composite-location-description-operations

Thanks.  Adding that to the appropriate spot in the patch would be
great.

Tom


Re: [PATCH] dwarf: Multi-register CFI address support

2020-09-02 Thread Andrew Stubbs

On 02/09/2020 18:49, Tom Tromey wrote:

"Andrew" == Andrew Stubbs  writes:


Andrew> To be fair, the DWARF standard makes a similar assumption; the
Andrew> engineers working on LLVM and GDB, at AMD, have therefore invented
Andrew> some new DWARF operators that they plan to propose for a future
Andrew> standard. Only one is relevant here, however:
Andrew> DW_OP_LLVM_piece_end. (Unfortunately this clashes with an AArch64
Andrew> extension, but I think we can cope using an alias -- only GCC dumps
Andrew> will be confusing.)

Andrew> +/* AMD GCN extensions (originally for LLVM).  */
Andrew> +// This clashes with DW_OP_AARCH64_operation, so use an alias instead
Andrew> +// DW_OP (DW_OP_LLVM_piece_end, 0xea)
Andrew> +#define DW_OP_LLVM_piece_end DW_OP_AARCH64_operation
Andrew>  DW_END_OP
  
Is it too late to pick a non-clashing value?


Also, we have tried pretty hard in recent years to document all of gcc's
DWARF extensions.  Please add a link to the documentation for this one.
If there aren't docs -- I guess it would be ideal if you could write
them.  Putting them on the GCC wiki is fine.


I didn't select this number; I'm just following out-of-tree LLVM and GDB 
usage from AMD. It's allocated from the user extension range, so I 
imagine it'll enter the standard with a different number (and probably 
name).
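
A minimal sketch of why an alias in the user-extension range makes dumps confusing. Only the value 0xea and the two operator names come from the patch; the enum name, `op_name`, and the fallback string are hypothetical. `DW_OP_lo_user` and `DW_OP_hi_user` are the standard bounds of the DWARF user range.

```cpp
#include <cassert>
#include <cstring>

enum dwarf_op_sketch : unsigned
{
  DW_OP_lo_user = 0xe0,            // start of the DWARF user-extension range
  DW_OP_AARCH64_operation = 0xea,  // existing AArch64 extension
  DW_OP_hi_user = 0xff             // end of the user-extension range
};

// An alias rather than a second enumerator, mirroring the #define above.
constexpr dwarf_op_sketch DW_OP_LLVM_piece_end = DW_OP_AARCH64_operation;

// A dump routine can print only one name per opcode value.
const char *op_name (unsigned op)
{
  switch (op)
    {
    case DW_OP_AARCH64_operation:
      return "DW_OP_AARCH64_operation";
    default:
      return "DW_OP_<unknown>";
    }
}
```

Asking for the name of `DW_OP_LLVM_piece_end` therefore yields the AArch64 name, which is the confusion in GCC dumps mentioned earlier in the thread.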


The documentation is here:

http://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html#composite-location-description-operations

Andrew


Re: [PATCH] c++: Stop defining true, false and bool as macros in <stdbool.h>

2020-09-02 Thread Nathan Sidwell

On 9/2/20 3:13 PM, Jonathan Wakely wrote:

Since r216679 these macros have only been defined in C++98 mode, rather
than all modes. That is permitted as a GNU extension because that header
doesn't exist in the C++ standard until C++11, so we can make it do
whatever we want for C++98. But as discussed in the PR c++/60304
comments, these macros shouldn't ever be defined for C++.

This patch removes the macro definitions for C++98 too.

The new test already passed for C++98 (and the conversion is ill-formed
in C++11 and later) so this new test is arguably unnecessary.

gcc/ChangeLog:

PR c++/60304
* ginclude/stdbool.h (bool, false, true): Never define for C++.

gcc/testsuite/ChangeLog:

PR c++/60304
* g++.dg/warn/Wconversion-null-5.C: New test.


Back in 2012 Gerald argued that we should keep these macros in case
there is code depending on them. We've not been defining them in C++11
and later modes (including our default -std=gnu++14) for nearly six
years now (GCC 5.1 shipped with the change).  I'm not aware of any
reports of problems. I think it's time to stop defining them at all.

Bootstrapped and tested on powerpc64le-linux, OK for trunk?


seems reasonable, thanks



--
Nathan Sidwell


[PATCH] c++: Stop defining true, false and bool as macros in <stdbool.h>

2020-09-02 Thread Jonathan Wakely via Gcc-patches
Since r216679 these macros have only been defined in C++98 mode, rather
than all modes. That is permitted as a GNU extension because that header
doesn't exist in the C++ standard until C++11, so we can make it do
whatever we want for C++98. But as discussed in the PR c++/60304
comments, these macros shouldn't ever be defined for C++.

This patch removes the macro definitions for C++98 too.

The new test already passed for C++98 (and the conversion is ill-formed
in C++11 and later) so this new test is arguably unnecessary.

gcc/ChangeLog:

PR c++/60304
* ginclude/stdbool.h (bool, false, true): Never define for C++.

gcc/testsuite/ChangeLog:

PR c++/60304
* g++.dg/warn/Wconversion-null-5.C: New test.


Back in 2012 Gerald argued that we should keep these macros in case
there is code depending on them. We've not been defining them in C++11
and later modes (including our default -std=gnu++14) for nearly six
years now (GCC 5.1 shipped with the change).  I'm not aware of any
reports of problems. I think it's time to stop defining them at all.

Bootstrapped and tested on powerpc64le-linux, OK for trunk?
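
For anyone who wants to check what their own toolchain does, here is a small sketch that records whether `true` is a preprocessor macro after including the header. The constant `true_is_macro` and the function `keyword_still_works` are hypothetical names for this illustration; the expectation that the macro is absent in C++11 and later is based on the description above, not on a guarantee for every compiler.

```cpp
#include <cassert>
#include <stdbool.h>

// Record whether the header turned 'true' into a preprocessor macro.
#ifdef true
const int true_is_macro = 1;
#else
const int true_is_macro = 0;
#endif

// The keywords themselves are unaffected either way.
bool keyword_still_works ()
{
  bool b = true;
  return b && !false;
}
```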


commit f049cda373d29ea1bce4065b24cbb392cdc5b172
Author: Jonathan Wakely 
Date:   Wed Sep 2 18:51:28 2020

c++: Stop defining true, false and bool as macros in <stdbool.h>

Since r216679 these macros have only been defined in C++98 mode, rather
than all modes. That is permitted as a GNU extension because that header
doesn't exist in the C++ standard until C++11, so we can make it do
whatever we want for C++98. But as discussed in the PR c++/60304
comments, these macros shouldn't ever be defined for C++.

This patch removes the macro definitions for C++98 too.

The new test already passed for C++98 (and the conversion is ill-formed
in C++11 and later) so this new test is arguably unnecessary.

gcc/ChangeLog:

PR c++/60304
* ginclude/stdbool.h (bool, false, true): Never define for C++.

gcc/testsuite/ChangeLog:

PR c++/60304
* g++.dg/warn/Wconversion-null-5.C: New test.

diff --git a/gcc/ginclude/stdbool.h b/gcc/ginclude/stdbool.h
index 72be438692f..1b56498d96f 100644
--- a/gcc/ginclude/stdbool.h
+++ b/gcc/ginclude/stdbool.h
@@ -39,13 +39,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 /* Supporting _Bool in C++ is a GCC extension.  */
 #define _Bool  bool
 
-#if __cplusplus < 201103L
-/* Defining these macros in C++98 is a GCC extension.  */
-#define bool   bool
-#define false  false
-#define true   true
-#endif
-
 #endif /* __cplusplus */
 
 /* Signal that all the definitions are present.  */
diff --git a/gcc/testsuite/g++.dg/warn/Wconversion-null-5.C b/gcc/testsuite/g++.dg/warn/Wconversion-null-5.C
new file mode 100644
index 000..05980ea91ab
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wconversion-null-5.C
@@ -0,0 +1,6 @@
+// PR c++/60304
+// { dg-do compile { target c++98_only } }
+// { dg-options "-Wconversion-null" }
+
+#include <stdbool.h>
+int * foo() {return false;} // { dg-warning "converting 'false' to pointer" }


[r11-2979 Regression] FAIL: g++.old-deja/g++.abi/cxa_vec.C -std=gnu++98 (test for excess errors) on Linux/x86_64 (-m64)

2020-09-02 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

6bdbf0f37bda2587a4e82cbb956de7a159a397ae is the first bad commit
commit 6bdbf0f37bda2587a4e82cbb956de7a159a397ae
Author: Jonathan Wakely 
Date:   Wed Sep 2 13:27:57 2020 +0100

libstdc++: Break header cycle between <new> and <exception>

caused

FAIL: g++.old-deja/g++.abi/cxa_vec.C  -std=gnu++14 (test for excess errors)
FAIL: g++.old-deja/g++.abi/cxa_vec.C  -std=gnu++17 (test for excess errors)
FAIL: g++.old-deja/g++.abi/cxa_vec.C  -std=gnu++2a (test for excess errors)
FAIL: g++.old-deja/g++.abi/cxa_vec.C  -std=gnu++98 (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-2979/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="old-deja.exp=g++.old-deja/g++.abi/cxa_vec.C 
--target_board='unix{-m64}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [PATCH] dwarf: Multi-register CFI address support

2020-09-02 Thread Tom Tromey
> "Andrew" == Andrew Stubbs  writes:

Andrew> To be fair, the DWARF standard makes a similar assumption; the
Andrew> engineers working on LLVM and GDB, at AMD, have therefore invented
Andrew> some new DWARF operators that they plan to propose for a future
Andrew> standard. Only one is relevant here, however:
Andrew> DW_OP_LLVM_piece_end. (Unfortunately this clashes with an AArch64
Andrew> extension, but I think we can cope using an alias -- only GCC dumps
Andrew> will be confusing.)

Andrew> +/* AMD GCN extensions (originally for LLVM).  */
Andrew> +// This clashes with DW_OP_AARCH64_operation, so use an alias instead
Andrew> +// DW_OP (DW_OP_LLVM_piece_end, 0xea)
Andrew> +#define DW_OP_LLVM_piece_end DW_OP_AARCH64_operation
Andrew>  DW_END_OP
 
Is it too late to pick a non-clashing value?

Also, we have tried pretty hard in recent years to document all of gcc's
DWARF extensions.  Please add a link to the documentation for this one.
If there aren't docs -- I guess it would be ideal if you could write
them.  Putting them on the GCC wiki is fine.

Tom


Re: [PATCH] separate reading past the end from -Wstringop-overflow

2020-09-02 Thread Joseph Myers
On Tue, 1 Sep 2020, Jeff Law via Gcc-patches wrote:

> > With this commit:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553109.html
> > the remaining failures should now be gone.  Please let me know if
> > any persist.
> There's a related glibc build failure, but I think Joseph ack'd a fix for it
> today.

Note I'm not sure if Maciej will be committing that fix soon or not.

There is also at least one glibc testsuite build failure that appears on 
those architectures where the glibc build didn't fail.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [r11-2979 Regression] FAIL: g++.old-deja/g++.abi/cxa_vec.C -std=gnu++98 (test for excess errors) on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-02 Thread Jonathan Wakely via Gcc-patches

On 02/09/20 10:22 -0700, sunil.k.pandey wrote:

On Linux/x86_64,

6bdbf0f37bda2587a4e82cbb956de7a159a397ae is the first bad commit
commit 6bdbf0f37bda2587a4e82cbb956de7a159a397ae
Author: Jonathan Wakely 
Date:   Wed Sep 2 13:27:57 2020 +0100

   libstdc++: Break header cycle between <new> and <exception>

caused

FAIL: g++.old-deja/g++.abi/cxa_vec.C  -std=gnu++14 (test for excess errors)
FAIL: g++.old-deja/g++.abi/cxa_vec.C  -std=gnu++17 (test for excess errors)
FAIL: g++.old-deja/g++.abi/cxa_vec.C  -std=gnu++2a (test for excess errors)
FAIL: g++.old-deja/g++.abi/cxa_vec.C  -std=gnu++98 (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-2979/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="old-deja.exp=g++.old-deja/g++.abi/cxa_vec.C --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


The test is invalid, but worked until I changed <new> to stop
including <exception>.

This patch fixes it, pushed to master as obvious.


commit ce90d203cea33a4bfd4e415f601fe4486ecbb45d
Author: Jonathan Wakely 
Date:   Wed Sep 2 18:37:17 2020

testsuite: Add missing <exception> header to testcase

This test no longer compiles because <new> stopped including
<exception>, so std::set_terminate is not defined.

gcc/testsuite/ChangeLog:

* g++.old-deja/g++.abi/cxa_vec.C: Include <exception> for
std::set_terminate.

diff --git a/gcc/testsuite/g++.old-deja/g++.abi/cxa_vec.C b/gcc/testsuite/g++.old-deja/g++.abi/cxa_vec.C
index d52637281fe..de647c4eb69 100644
--- a/gcc/testsuite/g++.old-deja/g++.abi/cxa_vec.C
+++ b/gcc/testsuite/g++.old-deja/g++.abi/cxa_vec.C
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include <exception>
 #include 
 #include 
 


Re: Deque rotate on current node

2020-09-02 Thread François Dumont via Gcc-patches

On 01/09/20 3:25 pm, Jonathan Wakely wrote:

On 01/09/20 14:06 +0200, François Dumont wrote:

Hi

No chance to review this small patch ?


I did review it, and I wasn't convinced it was a good change. It only
helps a particular usage pattern, and might hurt in other cases.


    I shouldn't have illustrated the aim of this patch with its 
impact on the use case of an initial push_front. That is clearly not its 
purpose, and I agree that it doesn't really improve this use case; it 
doesn't make it worse either, however.





I don't agree with your assertion that you use std::deque when you
only use push_front() and you use std::list if you need both
push_front() and push_back().

Ideally we'd keep the most recently reallocated node around for reuse,
and then in the situation you describe the first push_front would
allocate a new node, but if you immediately do pop_back() we wouldn't
deallocate the node. But I haven't figured out a way to do that
caching without an ABI break.
AFAIR I looked for a solution too and couldn't find any that was ABI compatible. 
This is why I thought this patch could be a limited answer to this.


The patch also has no tests. Are our existing tests sufficient to
cover this case? Do we want a test that verifies that we don't
allocate a new node if doing push_front() into an empty deque?


I initially thought that this patch didn't need any specific test, but as 
this patch's purpose is performance we could indeed add a performance 
test. This is what I've done in the attachment. We can now clearly see the 
impact:


Before:

deque.cc     push_back/pop_front      1167r 1167u    0s  528mem    0pf


After:

deque.cc     push_back/pop_front      1018r 1017u    0s    0mem    0pf


The CPU time also improves a little, thanks to the reduced memory usage.

I'll do the same with push_front/pop_back if you eventually validate the 
patch.


But even if the results are great I agree that the conditions to benefit 
from it are limited. You need the deque to be empty when you push_back 
at node past-the-end position to benefit from it.
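
The usage pattern in question can be observed without the performance harness by counting allocations. The sketch below is not the attached test: `counting_allocator`, `stats` and `allocations_in_loop` are hypothetical names, and the number of allocations it reports depends entirely on the deque implementation in use.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>

struct alloc_stats { std::size_t allocations; std::size_t deallocations; };

// Shared counters for every rebound copy of the allocator.
inline alloc_stats &stats ()
{
  static alloc_stats s = { 0, 0 };
  return s;
}

// Minimal C++11 allocator that counts calls and forwards to std::allocator.
template <typename T>
struct counting_allocator
{
  typedef T value_type;
  counting_allocator () = default;
  template <typename U>
  counting_allocator (const counting_allocator<U> &) noexcept { }
  T *allocate (std::size_t n)
  { ++stats ().allocations; return std::allocator<T> ().allocate (n); }
  void deallocate (T *p, std::size_t n) noexcept
  { ++stats ().deallocations; std::allocator<T> ().deallocate (p, n); }
};

template <typename T, typename U>
bool operator== (const counting_allocator<T> &,
                 const counting_allocator<U> &) noexcept
{ return true; }

template <typename T, typename U>
bool operator!= (const counting_allocator<T> &,
                 const counting_allocator<U> &) noexcept
{ return false; }

// Push at the back and pop at the front so that the deque is empty before
// every push_back; return how many allocations the loop itself caused.
std::size_t allocations_in_loop (int iterations)
{
  std::deque<int, counting_allocator<int> > dq;
  const std::size_t before = stats ().allocations;
  for (int i = 0; i != iterations; ++i)
    {
      dq.push_back (i);
      dq.pop_front ();
    }
  return stats ().allocations - before;
}
```

Comparing the value returned by `allocations_in_loop` with and without the patch would show the same effect as the mem column above.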


If you think that this kind of situation is too rare to deserve a 
special piece of code in deque implementation then ok, I won't bother 
you with this proposal anymore.


François


diff --git a/libstdc++-v3/include/bits/deque.tcc b/libstdc++-v3/include/bits/deque.tcc
index 7d1ec86456a..e0fb7f07bc4 100644
--- a/libstdc++-v3/include/bits/deque.tcc
+++ b/libstdc++-v3/include/bits/deque.tcc
@@ -486,6 +486,19 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   _M_push_back_aux(const value_type& __t)
 #endif
   {
+	if (empty())
+	  {
+	// Move iterators to point to the current node begin.
+	this->_M_impl._M_start._M_cur = this->_M_impl._M_start._M_first;
+	this->_M_impl._M_finish._M_cur = this->_M_impl._M_finish._M_first;
+#if __cplusplus >= 201103L
+	emplace_back(std::forward<_Args>(__args)...);
+#else
+	push_back(__t);
+#endif
+	return;
+	  }
+
 	if (size() == max_size())
 	  __throw_length_error(
 	  __N("cannot create std::deque larger than max_size()"));
@@ -525,6 +538,19 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   _M_push_front_aux(const value_type& __t)
 #endif
   {
+	if (empty())
+	  {
+	// Move iterators to point to the current node end.
+	this->_M_impl._M_finish._M_cur = this->_M_impl._M_finish._M_last - 1;
+	this->_M_impl._M_start._M_cur = this->_M_impl._M_start._M_last - 1;
+#if __cplusplus >= 201103L
+	emplace_front(std::forward<_Args>(__args)...);
+#else
+	push_front(__t);
+#endif
+	return;
+	  }
+
 	if (size() == max_size())
 	  __throw_length_error(
 	  __N("cannot create std::deque larger than max_size()"));
diff --git a/libstdc++-v3/testsuite/performance/23_containers/insert_erase/deque.cc b/libstdc++-v3/testsuite/performance/23_containers/insert_erase/deque.cc
new file mode 100644
index 000..1eed79d1202
--- /dev/null
+++ b/libstdc++-v3/testsuite/performance/23_containers/insert_erase/deque.cc
@@ -0,0 +1,45 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+
+#include 
+#include 
+#include 
+
+int main()
+{
+  using namespace __gnu_test;
+
+  time_counter time;
+  resource_counter resource;
+
+  const int nb = 2;
+  std::deque dq;
+

[r11-2979 Regression] FAIL: g++.old-deja/g++.abi/cxa_vec.C -std=gnu++98 (test for excess errors) on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-02 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

6bdbf0f37bda2587a4e82cbb956de7a159a397ae is the first bad commit
commit 6bdbf0f37bda2587a4e82cbb956de7a159a397ae
Author: Jonathan Wakely 
Date:   Wed Sep 2 13:27:57 2020 +0100

libstdc++: Break header cycle between  and 

caused

FAIL: g++.old-deja/g++.abi/cxa_vec.C  -std=gnu++14 (test for excess errors)
FAIL: g++.old-deja/g++.abi/cxa_vec.C  -std=gnu++17 (test for excess errors)
FAIL: g++.old-deja/g++.abi/cxa_vec.C  -std=gnu++2a (test for excess errors)
FAIL: g++.old-deja/g++.abi/cxa_vec.C  -std=gnu++98 (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-2979/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="old-deja.exp=g++.old-deja/g++.abi/cxa_vec.C 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me 
at skpgkp2 at gmail dot com)


Re: dg-options after board/cflags

2020-09-02 Thread Jose E. Marchesi via Gcc-patches


> On Wed, Sep 2, 2020 at 8:31 AM Jose E. Marchesi via Gcc-patches
>  wrote:
>>
>>
>> Hi people!
>>
>> While adding a bpf-sim.exp to dejagnu, I noticed that the flags in
>> board/cflags were included in the final compilation line _after_ the
>> flags in the test's dg-options.
>>
>> Since the test options are more particular than the board options, I
>> would expect them to be placed after any board-defined flags, so I
>> prepared the patch below for dejagnu, which does the right thing for the
>> gcc.target/bpf testsuite.
>>
>> However:
>>
>> 1. There could be tests around that depend (erroneously) on some of
>>their dg-options to not have effect (or a different effect) because
>>they are annulled (or modified) by some flag in a board file.
>>
>> 2. This could also impact other programs using dejagnu.
>>
>> How do you people recommend to proceed?
>> Should we fix dejagnu and then fix buggy tests?
>> Or the other way around?  :-)
>>
>> diff --git a/lib/target.exp b/lib/target.exp
>> index 36ae639..f0bfe20 100644
>> --- a/lib/target.exp
>> +++ b/lib/target.exp
>> @@ -455,7 +455,7 @@ proc default_target_compile {source destfile type 
>> options} {
>> }
>> if {[regexp "^additional_flags=" $i]} {
>> regsub "^additional_flags=" $i "" tmp
>> -   append add_flags " $tmp"
>> +   append additional_flags " $tmp"
>> }
>> if {[regexp "^ldflags=" $i]} {
>> regsub "^ldflags=" $i "" tmp
>> @@ -703,6 +703,8 @@ proc default_target_compile {source destfile type 
>> options} {
>> }
>>  }
>>
>> +append add_flags " $additional_flags"
>> +
>>  verbose "doing compile"
>>
>>  set sources ""
>> @@ -728,7 +730,7 @@ proc default_target_compile {source destfile type 
>> options} {
>> append add_flags " -o $destfile"
>> }
>>  }
>> -
>> +
>>  # This is obscure: we put SOURCES at the end when building an
>>  # object, because otherwise, in some situations, libtool will
>>  # become confused about the name of the actual source file.
>
> Does your dejagnu contain
>
> commit 5256bd82343000c76bc0e48139003f90b6184347
> Author: H.J. Lu 
> Date:   Thu Feb 26 17:53:48 2015 +1100
>
> * lib/target.exp (default_target_compile): Prepend multilib_flags,
> instead of appending it.
>
> Some GCC testcases need explicit GCC options to properly run. For
> example gcc.target/i386/pr32219-1.c has -fpie specified explicitly:
>
> /* { dg-options "-O2 -fpie" } */
>
> But with multlib, eg:
> make check-gcc RUNTESTFLAGS="--target_board='unix{-fpic}'"
>
> -fpic is appended to the command line options, which overrides the command
> line options specified by dg-options.  multlib flags should be placed at
> the beginning of the command line options, not at the end.  This patch
> updates default_target_compile to prepend multilib_flags, instead of
> appending it.

Yeah, this is dejagnu master.

Your patch dealt with board/multilib_flags, but the same problem exists
for board/cflags and many other flag-containing options.
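
As a rough illustration of why the ordering matters, here is a simplified model in which the last of several conflicting -fpic/-fpie style options wins, which is the behaviour the quoted commit message describes. compose_flags and effective_pic_option are hypothetical helpers written for this sketch; they are not DejaGnu or GCC interfaces.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Flags = std::vector<std::string>;

// Build the option list in one of the two orders under discussion:
// board flags first (test options last), or the other way around.
Flags compose_flags(const Flags& board, const Flags& test, bool board_first) {
  Flags out = board_first ? board : test;
  const Flags& rest = board_first ? test : board;
  out.insert(out.end(), rest.begin(), rest.end());
  return out;
}

// Simplified "last one wins" rule for the PIC/PIE family of options.
std::string effective_pic_option(const Flags& flags) {
  std::string chosen = "(none)";
  for (const std::string& f : flags)
    if (f == "-fpic" || f == "-fPIC" || f == "-fpie" || f == "-fPIE")
      chosen = f;
  return chosen;
}
```

With board flags {"-fpic"} and test options {"-O2", "-fpie"}, placing the board flags first leaves -fpie in effect, while placing them last lets -fpic override the test's own request.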


Re: dg-options after board/cflags

2020-09-02 Thread H.J. Lu via Gcc-patches
On Wed, Sep 2, 2020 at 8:31 AM Jose E. Marchesi via Gcc-patches
 wrote:
>
>
> Hi people!
>
> While adding a bpf-sim.exp to dejagnu, I noticed that the flags in
> board/cflags were included in the final compilation line _after_ the
> flags in the test's dg-options.
>
> Since the test options are more particular than the board options, I
> would expect them to be placed after any board-defined flags, so I
> prepared the patch below for dejagnu, which does the right thing for the
> gcc.target/bpf testsuite.
>
> However:
>
> 1. There could be tests around that depend (erroneously) on some of
>their dg-options to not have effect (or a different effect) because
>they are annulled (or modified) by some flag in a board file.
>
> 2. This could also impact other programs using dejagnu.
>
> How do you people recommend to proceed?
> Should we fix dejagnu and then fix buggy tests?
> Or the other way around?  :-)
>
> diff --git a/lib/target.exp b/lib/target.exp
> index 36ae639..f0bfe20 100644
> --- a/lib/target.exp
> +++ b/lib/target.exp
> @@ -455,7 +455,7 @@ proc default_target_compile {source destfile type 
> options} {
> }
> if {[regexp "^additional_flags=" $i]} {
> regsub "^additional_flags=" $i "" tmp
> -   append add_flags " $tmp"
> +   append additional_flags " $tmp"
> }
> if {[regexp "^ldflags=" $i]} {
> regsub "^ldflags=" $i "" tmp
> @@ -703,6 +703,8 @@ proc default_target_compile {source destfile type 
> options} {
> }
>  }
>
> +append add_flags " $additional_flags"
> +
>  verbose "doing compile"
>
>  set sources ""
> @@ -728,7 +730,7 @@ proc default_target_compile {source destfile type 
> options} {
> append add_flags " -o $destfile"
> }
>  }
> -
> +
>  # This is obscure: we put SOURCES at the end when building an
>  # object, because otherwise, in some situations, libtool will
>  # become confused about the name of the actual source file.

Does your dejagnu contain

commit 5256bd82343000c76bc0e48139003f90b6184347
Author: H.J. Lu 
Date:   Thu Feb 26 17:53:48 2015 +1100

* lib/target.exp (default_target_compile): Prepend multilib_flags,
instead of appending it.

Some GCC testcases need explicit GCC options to properly run. For
example gcc.target/i386/pr32219-1.c has -fpie specified explicitly:

/* { dg-options "-O2 -fpie" } */

But with multlib, eg:
make check-gcc RUNTESTFLAGS="--target_board='unix{-fpic}'"

-fpic is appended to the command line options, which overrides the command
line options specified by dg-options.  multlib flags should be placed at
the beginning of the command line options, not at the end.  This patch
updates default_target_compile to prepend multilib_flags, instead of
appending it.


-- 
H.J.


Re: [committed] libstdc++: Fix std::gcd and std::lcm for unsigned integers [PR 92978]

2020-09-02 Thread Jonathan Wakely via Gcc-patches

On 28/08/20 23:11 +0100, Jonathan Wakely wrote:

This fixes a bug with mixed signed and unsigned types, where converting
a negative value to the unsigned result type alters the value. The
solution is to obtain the absolute values of the arguments immediately
and to perform the actual GCD or LCM algorithm on two arguments of the
same type.

In order to operate on the most negative number without overflow when
taking its absolute value, use an unsigned type for the result of the abs
operation. For example, -INT_MIN will overflow, but -(unsigned)INT_MIN
is (unsigned)INT_MAX+1U which is the correct value.
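
The idea can be sketched as follows. This is an illustration only, not the library code: absu, gcd_u and gcd_abs are hypothetical names, and the real implementation lives in __detail::__absu and friends.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// Absolute value returned in the corresponding unsigned type, so that the
// most negative value is representable. Negation happens in unsigned
// arithmetic, which wraps instead of overflowing.
template <typename T>
constexpr typename std::make_unsigned<T>::type absu(T v) {
  using U = typename std::make_unsigned<T>::type;
  return v < 0 ? U(0) - static_cast<U>(v) : static_cast<U>(v);
}

// Plain Euclidean algorithm on values that are already non-negative.
constexpr unsigned long long gcd_u(unsigned long long m, unsigned long long n) {
  return n == 0 ? m : gcd_u(n, m % n);
}

// Take absolute values first, then run the algorithm on one unsigned type.
template <typename M, typename N>
constexpr unsigned long long gcd_abs(M m, N n) {
  return gcd_u(absu(m), absu(n));
}

static_assert(absu(INT_MIN) == static_cast<unsigned>(INT_MAX) + 1u,
              "most negative int has a representable magnitude");
static_assert(gcd_abs(-120, 10U) == 10, "mixed signed/unsigned");

// For contrast: converting -120 straight to a 32-bit unsigned gives
// 4294967176, and gcd(4294967176, 10) is 2, not 10.
```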

libstdc++-v3/ChangeLog:

PR libstdc++/92978
* include/std/numeric (__abs_integral): Replace with ...
(__detail::__absu): New function template that returns an
unsigned type, guaranteeing it can represent the most
negative signed value.
(__detail::__gcd, __detail::__lcm): Require arguments to
be unsigned and therefore already non-negative.
(gcd, lcm): Convert arguments to absolute value as unsigned
type before calling __detail::__gcd or __detail::__lcm.
* include/experimental/numeric (gcd, lcm): Likewise.
* testsuite/26_numerics/gcd/gcd_neg.cc: Adjust expected
errors.
* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
* testsuite/26_numerics/gcd/92978.cc: New test.
* testsuite/26_numerics/lcm/92978.cc: New test.
* testsuite/experimental/numeric/92978.cc: New test.

Tested powerpc64le-linux. Committed to trunk.



[snip]


diff --git a/libstdc++-v3/testsuite/experimental/numeric/92978.cc b/libstdc++-v3/testsuite/experimental/numeric/92978.cc
new file mode 100644
index 000..8408fd4d9ce
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/numeric/92978.cc
@@ -0,0 +1,48 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do compile { target c++14 } }
+
+#include 
+#include 
+
+void
+test01()
+{
+  // PR libstdc++/92978
+  static_assert( std::experimental::gcd(-120, 10U) == 10,
+  "mixed signed/unsigned" );
+  static_assert( std::experimental::gcd(120U, -10) == 10,
+  "mixed signed/unsigned" );
+
+  static_assert( std::lcm(-42, 21U) == 42U );


This test is supposed to be using the experimental functions, but this
calls std::lcm (and so the test fails if run as C++14).

Fixed with the attached patch, committed to trunk.

commit c71644776f4e8477289a4de16239dbb420db6945
Author: Jonathan Wakely 
Date:   Wed Sep 2 17:20:37 2020

libstdc++: Fix test to use correct function

This was copied from a test for std::lcm but I forgot to change one of
the calls to use the experimental version of the function.

libstdc++-v3/ChangeLog:

PR libstdc++/92978
* testsuite/experimental/numeric/92978.cc: Use experimental::lcm
not std::lcm.

diff --git a/libstdc++-v3/testsuite/experimental/numeric/92978.cc b/libstdc++-v3/testsuite/experimental/numeric/92978.cc
index 8408fd4d9ce..e2a4b1adefa 100644
--- a/libstdc++-v3/testsuite/experimental/numeric/92978.cc
+++ b/libstdc++-v3/testsuite/experimental/numeric/92978.cc
@@ -29,7 +29,7 @@ test01()
   static_assert( std::experimental::gcd(120U, -10) == 10,
   "mixed signed/unsigned" );
 
-  static_assert( std::lcm(-42, 21U) == 42U );
+  static_assert( std::experimental::lcm(-42, 21U) == 42U );
 }
 
 void


Re: [PATCH] c++: Fix P0960 in member init list and array [PR92812]

2020-09-02 Thread Jason Merrill via Gcc-patches

On 9/1/20 6:23 PM, Marek Polacek wrote:

This patch nails down the remaining P0960 case in PR92812:

   struct A {
 int ar[2];
 A(): ar(1, 2) {} // doesn't work without this patch
   };

Note that when the target object is not of array type, this already
works:

   struct S { int x, y; };
   struct A {
 S s;
 A(): s(1, 2) { } // OK in C++20
   };

because build_new_method_call_1 takes care of the P0960 magic.
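
For readers who want to try the two forms side by side, here is a small sketch. It uses the standard __cpp_aggregate_paren_init feature-test macro to pick the parenthesized form for the class-type member only where the compiler advertises P0960, and falls back to braces otherwise. The array member is always brace-initialized here, since the parenthesized array form is exactly what the patch adds and may not be accepted by every compiler that defines the macro.

```cpp
#include <cassert>

struct S { int x, y; };

struct A {
  S s;
  int ar[2];
#if defined(__cpp_aggregate_paren_init) && __cpp_aggregate_paren_init >= 201902L
  // C++20: aggregate member initialized from a parenthesized list.
  A() : s(1, 2), ar{1, 2} {}
#else
  // Earlier dialects: the equivalent brace form.
  A() : s{1, 2}, ar{1, 2} {}
#endif
};
```

Either branch leaves s and ar holding the same values, which is the equivalence the paren-init feature is after.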

It proved to be quite hairy.  When the ()-list has more than one
element, we can always create a CONSTRUCTOR, because the code was
previously invalid.  But when the ()-list has just one element, it
gets all kinds of difficult.  As usual, we have to handle a("foo")
so as not to wrap the STRING_CST in a CONSTRUCTOR.  Always turning
x(e) into x{e} would run into trouble as in c++/93790.  Another
issue was what to do about x({e}): previously, this would trigger
"list-initializer for non-class type must not be parenthesized".
I figured I'd make this work in C++20, so that given

   struct S { int x, y; };

you can do

S a[2];
[...]
A(): a({1, 2}) // initialize a[0] with {1, 2} and a[1] with {}

It also turned out that, as an extension, we support compound literals:

   F (): m((S[1]) { 1, 2 })

so this has to keep working as before.

Moreover, make sure not to trigger in compiler-generated code, like
=default, where array assignment is allowed.

paren-init35.C also tests this with vector types.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/92812
* init.c (do_paren_init_for_array_p): New.
(perform_member_init): Use it.  If true, build up a CONSTRUCTOR
from the list of arguments.

gcc/testsuite/ChangeLog:

PR c++/92812
* g++.dg/cpp0x/constexpr-array23.C: Adjust dg-error.
* g++.dg/cpp0x/initlist69.C: Likewise.
* g++.dg/diagnostic/mem-init1.C: Likewise.
* g++.dg/init/array28.C: Likewise.
* g++.dg/cpp2a/paren-init33.C: New test.
* g++.dg/cpp2a/paren-init34.C: New test.
* g++.dg/cpp2a/paren-init35.C: New test.
* g++.old-deja/g++.brendan/crash60.C: Adjust dg-error.
* g++.old-deja/g++.law/init10.C: Likewise.
* g++.old-deja/g++.other/array3.C: Likewise.
---
  gcc/cp/init.c |  64 -
  .../g++.dg/cpp0x/constexpr-array23.C  |   6 +-
  gcc/testsuite/g++.dg/cpp0x/initlist69.C   |   4 +-
  gcc/testsuite/g++.dg/cpp2a/paren-init33.C | 128 ++
  gcc/testsuite/g++.dg/cpp2a/paren-init34.C |  25 
  gcc/testsuite/g++.dg/cpp2a/paren-init35.C |  21 +++
  gcc/testsuite/g++.dg/diagnostic/mem-init1.C   |   4 +-
  gcc/testsuite/g++.dg/init/array28.C   |   2 +-
  .../g++.old-deja/g++.brendan/crash60.C|   2 +-
  gcc/testsuite/g++.old-deja/g++.law/init10.C   |   2 +-
  gcc/testsuite/g++.old-deja/g++.other/array3.C |   3 +-
  11 files changed, 243 insertions(+), 18 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init33.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init34.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init35.C

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index d4540db3605..2edc9651ad6 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -756,6 +756,41 @@ maybe_warn_list_ctor (tree member, tree init)
 "of the underlying array", member, begin);
  }
  
+/* Return true if we should attempt to perform the P0960 magic when
+   initializing an array TYPE from a parenthesized list of values LIST.  */
+
+static bool
+do_paren_init_for_array_p (tree list, tree type)
+{
+  if (cxx_dialect < cxx20)
+/* P0960 is a C++20 feature.  */
+return false;
+
+  const int len = list_length (list);
+  if (len == 0)
+/* Value-initialization.  */
+return false;
+  else if (len > 1)
+/* If the list had more than one element, the code is ill-formed
+   pre-C++20, so we should attempt to ()-init.  */
+return true;
+
+  /* Lists with one element are trickier.  */
+  tree elt = TREE_VALUE (list);
+
+  /* For a("foo"), don't wrap the STRING_CST in { }.  */
+  if (char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type)))
+  && TREE_CODE (tree_strip_any_location_wrapper (elt)) == STRING_CST)
+return false;


Hmm, yet another place we need to implement the special treatment of 
strings?  Can't we factor this better?  Could there be a general e.g. 
maybe_aggregate_paren_init function to turn a list into a CONSTRUCTOR 
that's used in various places?



+  /* Don't trigger in compiler-generated code for = default.  */
+  if (current_function_decl && DECL_DEFAULTED_FN (current_function_decl))
+return false;
+
+  /* Handle non-standard extensions like compound literals.  */
+  return !same_type_ignoring_top_level_qualifiers_p (type, TREE_TYPE (elt));


Isn't the defaulted function case caught by the same-type check?

Jason



[committed] MSP430: Fix -mlarge documentation to indicate size_t is a 20-bit type

2020-09-02 Thread Jozef Lawrynowicz
Minor documentation fix, committed as obvious.
From 0edc2c1a2445dffc7b839d833263c78f7cab01dc Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Wed, 2 Sep 2020 16:34:43 +0100
Subject: [PATCH] MSP430: Fix -mlarge documentation to indicate size_t is a
 20-bit type

gcc/ChangeLog:

* doc/invoke.texi (MSP430 options): Fix -mlarge description to
indicate size_t is a 20-bit type.
---
 gcc/doc/invoke.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5d29a7fa23c..bca8c856dc8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -24777,7 +24777,7 @@ any scripts that would be selected by the @option{-mmcu=} option.
 
 @item -mlarge
 @opindex mlarge
-Use large-model addressing (20-bit pointers, 32-bit @code{size_t}).
+Use large-model addressing (20-bit pointers, 20-bit @code{size_t}).
 
 @item -msmall
 @opindex msmall
-- 
2.28.0



Re: [committed] libstdc++: Fix three-way comparison for std::array [PR 96851]

2020-09-02 Thread Jonathan Wakely via Gcc-patches

On 02/09/20 15:51 +0100, Jonathan Wakely wrote:

The spaceship operator for std::array uses memcmp when the
__is_byte trait is true, but memcmp isn't usable in
constexpr contexts. Also, memcmp should only be used for unsigned byte
types, because it gives the wrong answer for signed chars with negative
values.
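
A short sketch of the signed-char problem, assuming the usual two's complement representation: memcmp compares bytes as unsigned char, so a negative signed char looks larger than a small positive one. less_by_memcmp and less_by_value are hypothetical helpers for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Ordering as memcmp sees it: each byte is interpreted as unsigned char.
bool less_by_memcmp(const signed char* a, const signed char* b, std::size_t n) {
  return std::memcmp(a, b, n) < 0;
}

// Ordering by element value, which is what a lexicographical or
// three-way comparison of signed char elements has to produce.
bool less_by_value(const signed char* a, const signed char* b, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    if (a[i] < b[i]) return true;
    if (b[i] < a[i]) return false;
  }
  return false;
}
```

For the one-element arrays {-1} and {1}, less_by_value says the first is smaller, while less_by_memcmp disagrees because the byte 0xFF compares greater than 0x01.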

We can simply check std::is_constant_evaluated() so that we don't use
memcmp during constant evaluation.

To fix the problem of using memcmp for inappropriate types, this patch
adds new __is_memcmp_ordered and __is_memcmp_ordered_with traits. These
say whether using memcmp will give the right answer for ordering
operations such as lexicographical_compare and three-way comparisons.
The new traits can be used in several places, and can also be used to
implement my suggestion in PR 93059 comment 37 to use memcmp for
unsigned integers larger than one byte on big endian targets.

libstdc++-v3/ChangeLog:

PR libstdc++/96851
* include/bits/cpp_type_traits.h (__is_memcmp_ordered):
New trait that says if memcmp can be used for ordering.
(__is_memcmp_ordered_with): Likewise, for two types.
* include/bits/deque.tcc (__lex_cmp_dit): Use new traits
instead of __is_byte and __numeric_traits.
(__lexicographical_compare_aux1): Likewise.
* include/bits/ranges_algo.h (__lexicographical_compare_fn):
Likewise.
* include/bits/stl_algobase.h (__lexicographical_compare_aux1)
(__is_byte_iter): Likewise.
* include/std/array (operator<=>): Likewise. Only use memcmp
when std::is_constant_evaluated() is false.
* testsuite/23_containers/array/comparison_operators/96851.cc:
New test.
* testsuite/23_containers/array/tuple_interface/get_neg.cc:
Adjust dg-error line numbers.


For the gcc-10 branch I've committed the attached backport, which adds
the new traits and uses them everywhere appropriate, but doesn't
extend the memcmp optimisation to all unsigned integers for big endian
targets.

Tested x86_64-linux. Committed to gcc-10.

commit 33c34c4c2466fd4fd050ed8e2d5996c35ebdeef6
Author: Jonathan Wakely 
Date:   Wed Sep 2 15:17:24 2020

libstdc++: Fix three-way comparison for std::array [PR 96851]

The spaceship operator for std::array uses memcmp when the
__is_byte trait is true, but memcmp isn't usable in
constexpr contexts. Also, memcmp should only be used for unsigned byte
types, because it gives the wrong answer for signed chars with negative
values.

We can simply check std::is_constant_evaluated() so that we don't use
memcmp during constant evaluation.

To fix the problem of using memcmp for inappropriate types, this patch
adds new __is_memcmp_ordered and __is_memcmp_ordered_with traits. These
say whether using memcmp will give the right answer for ordering
operations such as lexicographical_compare and three-way comparisons.
The new traits can be used in several places.

Unlike the trunk commit this was backported from, this commit for the
branch doesn't extend the memcmp optimisations to all unsigned integers
on big endian targets. Only narrow character types and std::byte will
use memcmp.

libstdc++-v3/ChangeLog:

PR libstdc++/96851
* include/bits/cpp_type_traits.h (__is_memcmp_ordered):
New trait that says if memcmp can be used for ordering.
(__is_memcmp_ordered_with): Likewise, for two types.
* include/bits/ranges_algo.h (__lexicographical_compare_fn):
Use new traits instead of __is_byte and __numeric_traits.
* include/bits/stl_algobase.h (__lexicographical_compare_aux1)
(__is_byte_iter): Likewise.
* include/std/array (operator<=>): Likewise. Only use memcmp
when std::is_constant_evaluated() is false.
* testsuite/23_containers/array/comparison_operators/96851.cc:
New test.
* testsuite/23_containers/array/tuple_interface/get_neg.cc:
Adjust dg-error line numbers.

(cherry picked from commit 2f983fa69005b603ea1758a013b4134d5b0f24a8)

diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h b/libstdc++-v3/include/bits/cpp_type_traits.h
index 979ad9c2c69..ca83f590eb4 100644
--- a/libstdc++-v3/include/bits/cpp_type_traits.h
+++ b/libstdc++-v3/include/bits/cpp_type_traits.h
@@ -482,6 +482,50 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
 : __is_nonvolatile_trivially_copyable<_Tp>
 { };
 
+  // Whether memcmp can be used to determine ordering for a type
+  // e.g. in std::lexicographical_compare or three-way comparisons.
+  // True for unsigned narrow character types (and std::byte).
+  template::__value>
+struct __is_memcmp_ordered
+{
+  static const bool __value = _Tp(-1) > _Tp(1); // is unsigned
+};
+
+  template
+struct __is_memcmp_ordered<_Tp, false>
+{
+  static const bool __value = false;
+};
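
The `_Tp(-1) > _Tp(1)` expression in the hunk above is a compact signedness test: converting -1 to an unsigned type wraps around to the maximum value, which is greater than 1, whereas for a signed type -1 stays below 1. A standalone sketch of the same trick, using the hypothetical name is_unsigned_by_wraparound:

```cpp
#include <cassert>

// True when T(-1) wraps to a large value, i.e. when T is unsigned.
template <typename T>
struct is_unsigned_by_wraparound {
  static const bool value = T(-1) > T(1);
};

static_assert(is_unsigned_by_wraparound<unsigned char>::value,
              "unsigned char: 255 > 1");
static_assert(!is_unsigned_by_wraparound<signed char>::value,
              "signed char: -1 < 1");
// Plain char is deliberately not asserted: its signedness is
// implementation-defined.
```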

Re: [committed] libstdc++: Break header cycle between and

2020-09-02 Thread Jonathan Wakely via Gcc-patches

On 02/09/20 14:15 +0100, Jonathan Wakely wrote:

The  and  headers each include each other, which makes
building them as header-units "exciting". The  header only needs
the definition of std::exception (in order to derive from it) which is
already in its own header, so just include that.

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h: Include 
for definitions of __try, __catch and __throw_exception_again.
(counted_iterator::operator++(int)): Use __throw_exception_again
instead of throw.
* libsupc++/new: Include  not .
* libsupc++/new_opvnt.cc: Include .
* testsuite/18_support/destroying_delete.cc: Include
 for std::is_same_v definition.
* testsuite/20_util/variant/index_type.cc: Qualify size_t.

Tested powerpc64le-linux. Committed to trunk.




commit 6bdbf0f37bda2587a4e82cbb956de7a159a397ae
Author: Jonathan Wakely 
Date:   Wed Sep 2 13:27:57 2020

   libstdc++: Break header cycle between  and 

   The  and  headers each include each other, which makes
   building them as header-units "exciting". The  header only needs
   the definition of std::exception (in order to derive from it) which is
   already in its own header, so just include that.

   libstdc++-v3/ChangeLog:

   * include/bits/stl_iterator.h: Include 
   for definitions of __try, __catch and __throw_exception_again.
   (counted_iterator::operator++(int)): Use __throw_exception_again
   instead of throw.
   * libsupc++/new: Include  not .
   * libsupc++/new_opvnt.cc: Include .
   * testsuite/18_support/destroying_delete.cc: Include
for std::is_same_v definition.
   * testsuite/20_util/variant/index_type.cc: Qualify size_t.

diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index f0cf4c55c09..da740e3732e 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -79,6 +79,7 @@
#if __cplusplus > 201703L
# include 
# include 
+# include 
# include 
#endif

@@ -2062,7 +2063,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return _M_current++;
  } __catch(...) {
++_M_length;
-   throw;
+   __throw_exception_again;
  }

  }


I've also changed the line above on the gcc-10 branch. Even though
both GCC and Clang accept it with -fno-exceptions (rather
mysteriously) it should be using the __throw_exception_again macro
instead.
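
For anyone unfamiliar with the pattern, here is a sketch of how such macros let the same source build with and without exceptions. The MY_TRY, MY_CATCH and MY_RETHROW names are hypothetical stand-ins modelled on the library's __try, __catch and __throw_exception_again; ThrowingCursor and Counted are toy types invented for the example.

```cpp
#include <cassert>

#if defined(__cpp_exceptions)
# define MY_TRY       try
# define MY_CATCH(X)  catch (X)
# define MY_RETHROW   throw
#else
// Without exceptions the "try" body always runs, the handler is dead
// code, and the rethrow expands to nothing.
# define MY_TRY       if (true)
# define MY_CATCH(X)  if (false)
# define MY_RETHROW
#endif

// Toy cursor whose post-increment can be made to fail.
struct ThrowingCursor {
  int pos;
  bool fail;
  ThrowingCursor operator++(int) {
#if defined(__cpp_exceptions)
    if (fail) throw 42;
#endif
    ThrowingCursor old = *this;
    ++pos;
    return old;
  }
};

// Same shape as the counted_iterator code in the diff: adjust the length,
// try the increment, and restore the length if the increment throws.
struct Counted {
  ThrowingCursor current;
  int length;
  ThrowingCursor post_increment() {
    --length;
    MY_TRY {
      return current++;
    } MY_CATCH(...) {
      ++length;
      MY_RETHROW;
    }
  }
};

// Returns true if the length is back to its old value after a failed
// increment (trivially true when exceptions are disabled).
bool length_restored_on_throw() {
  Counted c = { {0, true}, 3 };
#if defined(__cpp_exceptions)
  try { c.post_increment(); } catch (int) { return c.length == 3; }
  return false;
#else
  return true;
#endif
}
```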

commit 7eb76b3b1721247bc2c9ab6a41c1655158ed3411
Author: Jonathan Wakely 
Date:   Wed Sep 2 14:50:34 2020

libstdc++: Use __throw_exception_again macro for -fno-exceptions

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (counted_iterator::operator++(int)):
Use __throw_exception_again macro.

diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index 19b1d53f781..d6bb085b3c6 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -2013,7 +2013,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return _M_current++;
 	  } __catch(...) {
 	++_M_length;
-	throw;
+	__throw_exception_again;
 	  }
 
   }


dg-options after board/cflags

2020-09-02 Thread Jose E. Marchesi via Gcc-patches


Hi people!

While adding a bpf-sim.exp to dejagnu, I noticed that the flags in
board/cflags were included in the final compilation line _after_ the
flags in the test's dg-options.

Since the test options are more particular than the board options, I
would expect them to be placed after any board-defined flags, so I
prepared the patch below for dejagnu, which does the right thing for the
gcc.target/bpf testsuite.

However:

1. There could be tests around that depend (erroneously) on some of
   their dg-options to not have effect (or a different effect) because
   they are annulled (or modified) by some flag in a board file.

2. This could also impact other programs using dejagnu.

How do you people recommend to proceed?
Should we fix dejagnu and then fix buggy tests?
Or the other way around?  :-)

diff --git a/lib/target.exp b/lib/target.exp
index 36ae639..f0bfe20 100644
--- a/lib/target.exp
+++ b/lib/target.exp
@@ -455,7 +455,7 @@ proc default_target_compile {source destfile type options} {
}
if {[regexp "^additional_flags=" $i]} {
regsub "^additional_flags=" $i "" tmp
-   append add_flags " $tmp"
+   append additional_flags " $tmp"
}
if {[regexp "^ldflags=" $i]} {
regsub "^ldflags=" $i "" tmp
@@ -703,6 +703,8 @@ proc default_target_compile {source destfile type options} {
}
 }
 
+append add_flags " $additional_flags"
+
 verbose "doing compile"
 
 set sources ""
@@ -728,7 +730,7 @@ proc default_target_compile {source destfile type options} {
append add_flags " -o $destfile"
}
 }
-
+
 # This is obscure: we put SOURCES at the end when building an
 # object, because otherwise, in some situations, libtool will
 # become confused about the name of the actual source file.


[Patch] Fortran: Fixes for pointer function call as variable (PR96896)

2020-09-02 Thread Tobias Burnus

During a discussion, an example like the attached one came up:
  f() = 0.0
where 'f' is a function which returns a pointer to an array.
This gets handled as
  _F.D0 => f()
  _F.D0 = 0.0
However, the first line failed with a rank error because the rank
was taken from the RHS.
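
For readers more at home in C++, a rough analogue of that two-step lowering is binding a reference to the call result and then assigning through it. This is only an analogy: it says nothing about how gfortran represents the temporary, and storage, f and assign_through_call are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in for the array that the pointer-valued function refers to.
static std::vector<double> storage(5, 1.0);

// Analogue of a function returning a pointer to an array.
std::vector<double>& f() { return storage; }

// Analogue of "f() = value":
//   step 1 binds a temporary name to the call result (_F.D0 => f()),
//   step 2 assigns through that name                 (_F.D0 = value).
void assign_through_call(double value) {
  std::vector<double>& tmp = f();
  std::fill(tmp.begin(), tmp.end(), value);
}
```

The shape of tmp is dictated by what f() returns, not by the scalar on the right-hand side, which matches the point that the rank has to come from the LHS.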

Changing this to use the LHS expression failed due to 'use_assoc',
which added an 'extern' to the variable, and 'proc_pointer'
also caused problems; in principle, either problem could
have also occurred for the RHS.

Side effect: the error message for a rank mismatch is better.
For 'f() = a' no pointer assignment is involved (in terms
of the user code), yet before we had the error message
'Different ranks in pointer assignment'.

OK?

Tobias

Fortran: Fixes for pointer function call as variable (PR96896)

gcc/fortran/ChangeLog:

	PR fortran/96896
	* resolve.c (get_temp_from_expr): Also reset proc_pointer +
	use_assoc attribute.
	(resolve_ptr_fcn_assign): Use information from the LHS.

gcc/testsuite/ChangeLog:

	PR fortran/96896
	* gfortran.dg/ptr_func_assign_4.f08:
	* gfortran.dg/ptr-func-3.f90: New test.

 gcc/fortran/resolve.c   |  4 +-
 gcc/testsuite/gfortran.dg/ptr-func-3.f90| 56 +
 gcc/testsuite/gfortran.dg/ptr_func_assign_4.f08 |  4 +-
 3 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index e4232717e42..a3e1e427ba7 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -11173,9 +11173,11 @@ get_temp_from_expr (gfc_expr *e, gfc_namespace *ns)
   /* Add the attributes and the arrayspec to the temporary.  */
   tmp->n.sym->attr = gfc_expr_attr (e);
   tmp->n.sym->attr.function = 0;
+  tmp->n.sym->attr.proc_pointer = 0;
   tmp->n.sym->attr.result = 0;
   tmp->n.sym->attr.flavor = FL_VARIABLE;
   tmp->n.sym->attr.dummy = 0;
+  tmp->n.sym->attr.use_assoc = 0;
   tmp->n.sym->attr.intent = INTENT_UNKNOWN;
 
   if (as)
@@ -11595,7 +11597,7 @@ resolve_ptr_fcn_assign (gfc_code **code, gfc_namespace *ns)
   return false;
 }
 
-  tmp_ptr_expr = get_temp_from_expr ((*code)->expr2, ns);
+  tmp_ptr_expr = get_temp_from_expr ((*code)->expr1, ns);
 
   /* get_temp_from_expression is set up for ordinary assignments. To that
  end, where array bounds are not known, arrays are made allocatable.
diff --git a/gcc/testsuite/gfortran.dg/ptr-func-3.f90 b/gcc/testsuite/gfortran.dg/ptr-func-3.f90
new file mode 100644
index 000..0f1af64002a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/ptr-func-3.f90
@@ -0,0 +1,56 @@
+! { dg-do run }
+! PR fortran/96896
+
+call test1
+call reshape_test
+end
+
+subroutine test1
+implicit none
+integer, target :: B
+integer, pointer :: A(:)
+allocate(A(5))
+A = 1
+B = 10
+get_A() = get_B()
+if (any (A /= 10)) stop 1
+get_A() = get_A()
+if (any (A /= 10)) stop 2
+deallocate(A)
+contains
+  function get_A()
+integer, pointer :: get_A(:)
+get_A => A
+  end
+  function get_B()
+integer, pointer :: get_B
+get_B => B
+  end
+end
+
+subroutine reshape_test
+implicit none
+real, target, dimension (1:9) :: b
+integer :: i
+b = 1.0
+myshape(b) = 3.0
+do i = 1, 3
+  myfunc (b,i,2) = b(i) + i
+  b(i) = b(i) + 2.0
+end do
+if (any (b /= [real::5,5,5,4,5,6,3,3,3])) stop 3
+contains
+  function myfunc(b,i,j)
+real, target, dimension (1:9) :: b
+real, pointer :: myfunc
+real, pointer :: p(:,:)
+integer :: i,j 
+p => myshape(b)
+myfunc => p(i,j)
+  end function myfunc
+  function myshape(b)
+real, target, dimension (1:9) :: b
+real, pointer :: myshape(:,:)
+myshape(1:3,1:3) => b
+  end function myshape
+end subroutine reshape_test
diff --git a/gcc/testsuite/gfortran.dg/ptr_func_assign_4.f08 b/gcc/testsuite/gfortran.dg/ptr_func_assign_4.f08
index 46ef2ac5566..49ba9bcd3d9 100644
--- a/gcc/testsuite/gfortran.dg/ptr_func_assign_4.f08
+++ b/gcc/testsuite/gfortran.dg/ptr_func_assign_4.f08
@@ -10,8 +10,8 @@ program p
   integer :: c
 
   c = 3
-  func (b(2, 2)) = b ! { dg-error "Different ranks" }
-  func (c) = b   ! { dg-error "Different ranks" }
+  func (b(2, 2)) = b ! { dg-error "Incompatible ranks 1 and 2 in assignment" }
+  func (c) = b   ! { dg-error "Incompatible ranks 1 and 2 in assignment" }
 
 contains
   function func(arg) result(r)


[committed] libstdc++: Fix three-way comparison for std::array [PR 96851]

2020-09-02 Thread Jonathan Wakely via Gcc-patches
The spaceship operator for std::array uses memcmp when the
__is_byte trait is true, but memcmp isn't usable in
constexpr contexts. Also, memcmp should only be used for unsigned byte
types, because it gives the wrong answer for signed chars with negative
values.

We can simply check std::is_constant_evaluated() so that we don't use
memcmp during constant evaluation.

To fix the problem of using memcmp for inappropriate types, this patch
adds new __is_memcmp_ordered and __is_memcmp_ordered_with traits. These
say whether using memcmp will give the right answer for ordering
operations such as lexicographical_compare and three-way comparisons.
The new traits can be used in several places, and can also be used to
implement my suggestion in PR 93059 comment 37 to use memcmp for
unsigned integers larger than one byte on big endian targets.
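
To illustrate the signed-char problem described above, here is a minimal
sketch. It is not code from the patch: elementwise_less, memcmp_less and
demo_signed_char_mismatch are hypothetical helper names, and the demo
assumes the usual two's complement representation, where -1 is stored as
the byte 0xFF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Lexicographic "less than" using the element type's own operator<.
template <typename T, std::size_t N>
bool elementwise_less(const T (&a)[N], const T (&b)[N])
{
  for (std::size_t i = 0; i < N; ++i)
    {
      if (a[i] < b[i]) return true;
      if (b[i] < a[i]) return false;
    }
  return false;
}

// The same question answered by memcmp, which compares bytes as
// unsigned char regardless of the element type.
template <typename T, std::size_t N>
bool memcmp_less(const T (&a)[N], const T (&b)[N])
{
  return std::memcmp(a, b, sizeof(T) * N) < 0;
}

// Hypothetical demonstration: -1 sorts before 1 as signed char, but its
// byte pattern (0xFF, assuming two's complement) sorts after 0x01 when
// viewed as unsigned char, so the two answers disagree.
inline void demo_signed_char_mismatch()
{
  const signed char a[1] = { -1 };
  const signed char b[1] = { 1 };
  assert(elementwise_less(a, b));
  assert(!memcmp_less(a, b));
}
```

For unsigned char elements the two helpers agree, which is why a trait
like the one described above can allow memcmp for those types.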

libstdc++-v3/ChangeLog:

PR libstdc++/96851
* include/bits/cpp_type_traits.h (__is_memcmp_ordered):
New trait that says if memcmp can be used for ordering.
(__is_memcmp_ordered_with): Likewise, for two types.
* include/bits/deque.tcc (__lex_cmp_dit): Use new traits
instead of __is_byte and __numeric_traits.
(__lexicographical_compare_aux1): Likewise.
* include/bits/ranges_algo.h (__lexicographical_compare_fn):
Likewise.
* include/bits/stl_algobase.h (__lexicographical_compare_aux1)
(__is_byte_iter): Likewise.
* include/std/array (operator<=>): Likewise. Only use memcmp
when std::is_constant_evaluated() is false.
* testsuite/23_containers/array/comparison_operators/96851.cc:
New test.
* testsuite/23_containers/array/tuple_interface/get_neg.cc:
Adjust dg-error line numbers.

Tested powerpc64le-linux. Committed to trunk.

commit 2f983fa69005b603ea1758a013b4134d5b0f24a8
Author: Jonathan Wakely 
Date:   Wed Sep 2 15:17:24 2020

libstdc++: Fix three-way comparison for std::array [PR 96851]

The spaceship operator for std::array uses memcmp when the
__is_byte trait is true, but memcmp isn't usable in
constexpr contexts. Also, memcmp should only be used for unsigned byte
types, because it gives the wrong answer for signed chars with negative
values.

We can simply check std::is_constant_evaluated() so that we don't use
memcmp during constant evaluation.

To fix the problem of using memcmp for inappropriate types, this patch
adds new __is_memcmp_ordered and __is_memcmp_ordered_with traits. These
say whether using memcmp will give the right answer for ordering
operations such as lexicographical_compare and three-way comparisons.
The new traits can be used in several places, and can also be used to
implement my suggestion in PR 93059 comment 37 to use memcmp for
unsigned integers larger than one byte on big endian targets.

libstdc++-v3/ChangeLog:

PR libstdc++/96851
* include/bits/cpp_type_traits.h (__is_memcmp_ordered):
New trait that says if memcmp can be used for ordering.
(__is_memcmp_ordered_with): Likewise, for two types.
* include/bits/deque.tcc (__lex_cmp_dit): Use new traits
instead of __is_byte and __numeric_traits.
(__lexicographical_compare_aux1): Likewise.
* include/bits/ranges_algo.h (__lexicographical_compare_fn):
Likewise.
* include/bits/stl_algobase.h (__lexicographical_compare_aux1)
(__is_byte_iter): Likewise.
* include/std/array (operator<=>): Likewise. Only use memcmp
when std::is_constant_evaluated() is false.
* testsuite/23_containers/array/comparison_operators/96851.cc:
New test.
* testsuite/23_containers/array/tuple_interface/get_neg.cc:
Adjust dg-error line numbers.

diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h 
b/libstdc++-v3/include/bits/cpp_type_traits.h
index 979ad9c2c69..b48d1adc63c 100644
--- a/libstdc++-v3/include/bits/cpp_type_traits.h
+++ b/libstdc++-v3/include/bits/cpp_type_traits.h
@@ -482,6 +482,66 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
 : __is_nonvolatile_trivially_copyable<_Tp>
 { };
 
+  // Whether memcmp can be used to determine ordering for a type
+  // e.g. in std::lexicographical_compare or three-way comparisons.
+  // True for unsigned integer-like types where comparing each byte in turn
+  // as an unsigned char yields the right result. This is true for all
+  // unsigned integers on big endian targets, but only unsigned narrow
+  // character types (and std::byte) on little endian targets.
+  template::__value
+#else
+   __is_byte<_Tp>::__value
+#endif
+>
+struct __is_memcmp_ordered
+{
+  static const bool __value = _Tp(-1) > _Tp(1); // is unsigned
+};
+
+  template
+struct __is_memcmp_ordered<_Tp, false>
+{
+  static const bool __value = false;
+};
+
+  // Whether two 

Re: [PATCH 3/3] Use more ONE_? in GGC functions.

2020-09-02 Thread Martin Liška

On 9/1/20 2:33 PM, Martin Liška wrote:

The last patch is a refactoring using ONE_* macros.

Thoughts?
Thanks,
Martin


Here's a rebased version of the patch.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
>From 5cefff607077503794a387b612585604c2b3d0f0 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 2 Sep 2020 14:34:21 +0200
Subject: [PATCH 3/3] Use ONE_? macros.

gcc/ChangeLog:

	* ggc-common.c (ggc_rlimit_bound): Use ONE_? macro.
	(ggc_min_expand_heuristic): Likewise.
	(ggc_min_heapsize_heuristic): Likewise.
	* ggc-page.c (ggc_collect): Likewise.
	* system.h (ONE_G): Likewise.
---
 gcc/ggc-common.c | 16 
 gcc/ggc-page.c   |  2 +-
 gcc/system.h |  1 +
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/gcc/ggc-common.c b/gcc/ggc-common.c
index 50c52fe525b..c21886861f0 100644
--- a/gcc/ggc-common.c
+++ b/gcc/ggc-common.c
@@ -742,7 +742,7 @@ ggc_rlimit_bound (double limit)
 	 appears to be ignored.  Ignore such silliness.  If a limit
 	 this small was actually effective for mmap, GCC wouldn't even
 	 start up.  */
-  && rlim.rlim_cur >= 8 * 1024 * 1024)
+  && rlim.rlim_cur >= 8 * ONE_M)
 limit = rlim.rlim_cur;
 # endif /* RLIMIT_AS or RLIMIT_DATA */
 #endif /* HAVE_GETRLIMIT */
@@ -761,7 +761,7 @@ ggc_min_expand_heuristic (void)
 
   /* The heuristic is a percentage equal to 30% + 70%*(RAM/1GB), yielding
  a lower bound of 30% and an upper bound of 100% (when RAM >= 1GB).  */
-  min_expand /= 1024*1024*1024;
+  min_expand /= ONE_G;
   min_expand *= 70;
   min_expand = MIN (min_expand, 70);
   min_expand += 30;
@@ -776,8 +776,8 @@ ggc_min_heapsize_heuristic (void)
   double phys_kbytes = physmem_total ();
   double limit_kbytes = ggc_rlimit_bound (phys_kbytes * 2);
 
-  phys_kbytes /= 1024; /* Convert to Kbytes.  */
-  limit_kbytes /= 1024;
+  phys_kbytes /= ONE_K; /* Convert to Kbytes.  */
+  limit_kbytes /= ONE_K;
 
   /* The heuristic is RAM/8, with a lower bound of 4M and an upper
  bound of 128M (when RAM >= 1GB).  */
@@ -790,7 +790,7 @@ ggc_min_heapsize_heuristic (void)
struct rlimit rlim;
if (getrlimit (RLIMIT_RSS, &rlim) == 0
&& rlim.rlim_cur != (rlim_t) RLIM_INFINITY)
- phys_kbytes = MIN (phys_kbytes, rlim.rlim_cur / 1024);
+ phys_kbytes = MIN (phys_kbytes, rlim.rlim_cur / ONE_K);
  }
 # endif
 
@@ -798,12 +798,12 @@ ggc_min_heapsize_heuristic (void)
  *next* GC would be within 20Mb of the limit or within a quarter of
  the limit, whichever is larger.  If GCC does hit the data limit,
  compilation will fail, so this tries to be conservative.  */
-  limit_kbytes = MAX (0, limit_kbytes - MAX (limit_kbytes / 4, 20 * 1024));
+  limit_kbytes = MAX (0, limit_kbytes - MAX (limit_kbytes / 4, 20 * ONE_K));
   limit_kbytes = (limit_kbytes * 100) / (110 + ggc_min_expand_heuristic ());
   phys_kbytes = MIN (phys_kbytes, limit_kbytes);
 
-  phys_kbytes = MAX (phys_kbytes, 4 * 1024);
-  phys_kbytes = MIN (phys_kbytes, 128 * 1024);
+  phys_kbytes = MAX (phys_kbytes, 4 * ONE_K);
+  phys_kbytes = MIN (phys_kbytes, 128 * ONE_K);
 
   return phys_kbytes;
 }
diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
index 9405f033a7c..07e108f3e9d 100644
--- a/gcc/ggc-page.c
+++ b/gcc/ggc-page.c
@@ -2184,7 +2184,7 @@ ggc_collect (void)
  total allocations haven't expanded much since the last
  collection.  */
   float allocated_last_gc =
-MAX (G.allocated_last_gc, (size_t)param_ggc_min_heapsize * 1024);
+MAX (G.allocated_last_gc, (size_t)param_ggc_min_heapsize * ONE_K);
 
   /* It is also good time to get memory block pool into limits.  */
   memory_block_pool::trim ();
diff --git a/gcc/system.h b/gcc/system.h
index 4f0482be25d..b0f3f1dd019 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -1237,6 +1237,7 @@ void gcc_stablesort (void *, size_t, size_t,
 
 #define ONE_K 1024
 #define ONE_M (ONE_K * ONE_K)
+#define ONE_G (ONE_K * ONE_M)
 
 /* Display a number as an integer multiple of either:
- 1024, if said integer is >= to 10 K (in base 2)
-- 
2.28.0
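
As a standalone illustration of the refactoring above, here is a rough
sketch of the min_expand heuristic written with named size constants.
This is an assumption-laden restatement, not the GCC code: one_k, one_m,
one_g and min_expand_percent are hypothetical names, and constexpr
constants stand in for the ONE_K / ONE_M / ONE_G macros.

```cpp
#include <algorithm>
#include <cassert>

// Stand-ins for the ONE_K / ONE_M / ONE_G macros from gcc/system.h.
constexpr int one_k = 1024;
constexpr int one_m = one_k * one_k;
constexpr int one_g = one_k * one_m;   // 2^30, still fits in a 32-bit int

// Hypothetical restatement of the heuristic from the comment in the patch:
// 30% + 70% * (RAM / 1GB), i.e. a lower bound of 30% and an upper bound
// of 100% once RAM >= 1GB.
inline double min_expand_percent(double ram_bytes)
{
  double pct = ram_bytes / one_g;
  pct *= 70;
  pct = std::min(pct, 70.0);
  pct += 30;
  return pct;
}
```

With these definitions, half a gigabyte of RAM gives 65%, and anything
at or above 1GB saturates at 100%.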



Re: [PATCH 4/N] Change timevar memory allocation to MiB.

2020-09-02 Thread Martin Liška

Please forget about the patch. It's part of 02/N v2.

Martin


Re: [PATCH 2/3] Use MiB unit when displaying memory allocation.

2020-09-02 Thread Martin Liška

On 9/1/20 4:04 PM, Jan Hubicka wrote:

The patch is about usage of MiB in memory allocation reports.
I find it much more readable than values displayed in KiB:

Reading object files: tramp3d-v4.o {GC released 1 MiB} {GC 19 MiB -> 19 MiB} 
{GC 19 MiB}  {heap 12 MiB}
Reading the symbol table:
Merging declarations: {GC released 1 MiB madv_dontneed 0 MiB} {GC 27 MiB -> 27 
MiB} {GC 27 MiB}  {heap 15 MiB}
Reading summaries:  {GC 27 MiB}  {heap 15 MiB}  {GC 27 MiB}  {heap 15 MiB}  
{GC 27 MiB}  {heap 15 MiB}  {GC 27 MiB}  {heap 15 MiB}  {GC 27 MiB}  {heap 15 MiB}  
{GC 30 MiB}  {heap 15 MiB}  {GC 30 MiB}  {heap 15 MiB} {GC 30 MiB}
Merging symbols: {heap 15 MiB}Materializing decls:
   {heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB}  
{heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB}  {heap 15 MiB} 
 {GC released 1 MiB madv_dontneed 2 MiB} {GC trimmed to 27 MiB, 28 MiB mapped} {heap 15 MiB}  {heap 15 MiB}  
{heap 15 MiB}  {heap 15 MiB}
Streaming out {GC trimmed to 27 MiB, 28 MiB mapped} {heap 15 MiB} ./a.ltrans0.o 
( 11257 insns) ./a.ltrans1.o ( 11293 insns) ./a.ltrans2.o ( 8669 insns) 
./a.ltrans3.o ( 138934 insns)


One problem I see here is that while it is OK for Firefox builds, it is
a bit too coarse for smaller testcases where the memory use is still
important.  I guess we may just print KBs until the number gets too
large, just like Norton Commander does? :)


Sure, let's do it using SIZE_AMOUNT macro.
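
For readers who have not used that kind of scaled output, the idea can be
sketched roughly as below. This is only an illustration: format_size is a
hypothetical helper, and the thresholds (plain bytes below 10K, 'k' below
10M, 'M' otherwise) are my assumption about the scaling, not something
stated in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: print small values exactly and scale larger ones,
// so small testcases keep their resolution while big builds stay readable.
inline std::string format_size(std::uint64_t bytes)
{
  constexpr std::uint64_t one_k = 1024;
  constexpr std::uint64_t one_m = one_k * one_k;

  if (bytes < 10 * one_k)            // assumed threshold: below 10K
    return std::to_string(bytes);
  if (bytes < 10 * one_m)            // assumed threshold: below 10M
    return std::to_string(bytes / one_k) + "k";
  return std::to_string(bytes / one_m) + "M";
}
```

The point of the design is that the unit follows the magnitude, so a
512-byte heap and a 27 MiB heap both print with a useful number of digits.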

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin



Honza


Thoughts?
Thanks,
Martin


>From 8826f267175d456121612332b838e41a9542a513 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 2 Sep 2020 14:30:16 +0200
Subject: [PATCH 2/3] Use SIZE_AMOUNT macro for GGC memory allocation numbers.

gcc/ChangeLog:

	* ggc-common.c (ggc_prune_overhead_list): Use SIZE_AMOUNT.
	* ggc-page.c (release_pages): Likewise.
	(ggc_collect): Likewise.
	(ggc_trim): Likewise.
	(ggc_grow): Likewise.
	* timevar.c (timer::print): Likewise.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/timevar1.C: Prune more possible number values.
	* g++.dg/ext/timevar2.C: Likewise.
---
 gcc/ggc-common.c|  6 +++---
 gcc/ggc-page.c  | 15 +++
 gcc/testsuite/g++.dg/ext/timevar1.C |  3 ++-
 gcc/testsuite/g++.dg/ext/timevar2.C |  3 ++-
 gcc/timevar.c   |  8 
 5 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/gcc/ggc-common.c b/gcc/ggc-common.c
index b8782c5824b..50c52fe525b 100644
--- a/gcc/ggc-common.c
+++ b/gcc/ggc-common.c
@@ -1008,7 +1008,7 @@ ggc_prune_overhead_list (void)
   }
 }
 
-/* Print memory used by heap in kb if this info is not available.  */
+/* Print memory used by heap if this info is not available.  */
 
 void
 report_heap_memory_use ()
@@ -1020,7 +1020,7 @@ report_heap_memory_use ()
   #define MALLINFO_FN mallinfo
 #endif
   if (!quiet_flag)
-fprintf (stderr," {heap %luk}",
-	 (unsigned long) MALLINFO_FN ().arena / ONE_K);
+fprintf (stderr, " {heap " PRsa (0) "}",
+	 SIZE_AMOUNT (MALLINFO_FN ().arena));
 #endif
 }
diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
index 53b311c2a52..9405f033a7c 100644
--- a/gcc/ggc-page.c
+++ b/gcc/ggc-page.c
@@ -1164,9 +1164,9 @@ release_pages (void)
 {
   fprintf (stderr, " {GC");
   if (n1)
-	fprintf (stderr, " released %luk", (unsigned long)(n1 / 1024));
+	fprintf (stderr, " released " PRsa (0), SIZE_AMOUNT (n1));
   if (n2)
-	fprintf (stderr, " madv_dontneed %luk", (unsigned long)(n2 / 1024));
+	fprintf (stderr, " madv_dontneed " PRsa (0), SIZE_AMOUNT (n2));
   fprintf (stderr, "}");
 }
 }
@@ -2208,7 +2208,7 @@ ggc_collect (void)
 
   /* Output this later so we do not interfere with release_pages.  */
   if (!quiet_flag)
-fprintf (stderr, " {GC %luk -> ", (unsigned long) allocated / 1024);
+fprintf (stderr, " {GC " PRsa (0) " -> ", SIZE_AMOUNT (allocated));
 
   /* Indicate that we've seen collections at this context depth.  */
   G.context_depth_collections = ((unsigned long)1 << (G.context_depth + 1)) - 1;
@@ -2235,7 +2235,7 @@ ggc_collect (void)
   timevar_pop (TV_GC);
 
   if (!quiet_flag)
-fprintf (stderr, "%luk}", (unsigned long) G.allocated / 1024);
+fprintf (stderr, PRsa (0) "}", SIZE_AMOUNT (G.allocated));
   if (GGC_DEBUG_LEVEL >= 2)
 fprintf (G.debug_file, "END COLLECTING\n");
 }
@@ -2250,9 +2250,8 @@ ggc_trim ()
   sweep_pages ();
   release_pages ();
   if (!quiet_flag)
-fprintf (stderr, " {GC trimmed to %luk, %luk mapped}",
-	 (unsigned long) G.allocated / 1024,
-	 (unsigned long) G.bytes_mapped / 1024);
+fprintf (stderr, " {GC trimmed to " PRsa (0) ", " PRsa (0) " mapped}",
+	 SIZE_AMOUNT (G.allocated), SIZE_AMOUNT (G.bytes_mapped));
   timevar_pop (TV_GC);
 }
 
@@ -2269,7 +2268,7 @@ ggc_grow (void)
   else
 ggc_collect ();
   if (!quiet_flag)
-fprintf (stderr, " {GC %luk} ", (unsigned long) 

Re: [PATCH 1/3] Support new mallinfo2 function.

2020-09-02 Thread Martin Liška

On 9/1/20 2:31 PM, Martin Liška wrote:

Hey.

I've just applied two patches to glibc introducing a new mallinfo2 function.
The limitation of the current mallinfo function is its use of the int type,
which overflows for allocations > 2GB.

The patch adds configure detection and usage of the new one. It also prints
heap usage in MiB.
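
The overflow concern can be shown without calling either function. The
sketch below assumes a platform where int is 32 bits wide; fits_in_int is
a hypothetical helper, not part of glibc or GCC.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical check: can a byte count be stored in an int field
// (such as the fields of the older interface) without overflowing?
inline bool fits_in_int(std::uint64_t bytes)
{
  return bytes
	 <= static_cast<std::uint64_t>(std::numeric_limits<int>::max());
}
```

With a 32-bit int, a 1 GiB arena still fits, but a 3 GiB arena does not,
which is the case a wider field type is meant to handle.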

Ready to be installed after tests?
Thanks,
Martin


All right, there's V2 where I just support mallinfo2.

Martin
>From bdb6dcf8fbd51a9dc62e6a50a7eeedc734c130f9 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 1 Sep 2020 14:14:45 +0200
Subject: [PATCH 1/3] Support new mallinfo2 function.

gcc/ChangeLog:

	* config.in: Regenerate.
	* configure: Likewise.
	* configure.ac: Detect for mallinfo2.
	* ggc-common.c (defined): Use it.
	* system.h: Handle also HAVE_MALLINFO2.
---
 gcc/config.in| 16 ++--
 gcc/configure|  4 ++--
 gcc/configure.ac |  4 ++--
 gcc/ggc-common.c | 12 +---
 gcc/system.h |  2 +-
 5 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index 478e74fac02..1832c112ed9 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -983,13 +983,19 @@
 #endif
 
 
-/* Define to 1 if we found a declaration for 'mallinfo', otherwise define to
-   0. */
+/* Define to 1 if we found a declaration for 'mallinfo */
 #ifndef USED_FOR_TARGET
 #undef HAVE_DECL_MALLINFO
 #endif
 
 
+/* Define to 1 if we found a declaration for 'mallinfo2', otherwise define to
+   0. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_DECL_MALLINFO2
+#endif
+
+
 /* Define to 1 if we found a declaration for 'malloc', otherwise define to 0.
*/
 #ifndef USED_FOR_TARGET
@@ -1665,6 +1671,12 @@
 #endif
 
 
+/* Define to 1 if you have the `mallinfo2' function. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_MALLINFO2
+#endif
+
+
 /* Define to 1 if you have the <malloc.h> header file. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_MALLOC_H
diff --git a/gcc/configure b/gcc/configure
index 0f7a8dbe0f9..b8b9bd3505b 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -10120,7 +10120,7 @@ fi
 for ac_func in times clock kill getrlimit setrlimit atoq \
 	popen sysconf strsignal getrusage nl_langinfo \
 	gettimeofday mbstowcs wcswidth mmap setlocale \
-	clearerr_unlocked feof_unlocked   ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked   fileno_unlocked fprintf_unlocked fputc_unlocked fputs_unlocked   fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked   putchar_unlocked putc_unlocked madvise mallinfo
+	clearerr_unlocked feof_unlocked   ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked   fileno_unlocked fprintf_unlocked fputc_unlocked fputs_unlocked   fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked   putchar_unlocked putc_unlocked madvise mallinfo mallinfo2
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_cxx_check_func "$LINENO" "$ac_func" "$as_ac_var"
@@ -11549,7 +11549,7 @@ fi
 done
 
 
-for ac_func in mallinfo
+for ac_func in mallinfo, mallinfo2
 do
   ac_tr_decl=`$as_echo "HAVE_DECL_$ac_func" | $as_tr_cpp`
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $ac_func is declared" >&5
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 0f11238c19f..18640fdb8a5 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1408,7 +1408,7 @@ define(gcc_UNLOCKED_FUNCS, clearerr_unlocked feof_unlocked dnl
 AC_CHECK_FUNCS(times clock kill getrlimit setrlimit atoq \
 	popen sysconf strsignal getrusage nl_langinfo \
 	gettimeofday mbstowcs wcswidth mmap setlocale \
-	gcc_UNLOCKED_FUNCS madvise mallinfo)
+	gcc_UNLOCKED_FUNCS madvise mallinfo mallinfo2)
 
 if test x$ac_cv_func_mbstowcs = xyes; then
   AC_CACHE_CHECK(whether mbstowcs works, gcc_cv_func_mbstowcs_works,
@@ -1488,7 +1488,7 @@ gcc_AC_CHECK_DECLS(getrlimit setrlimit getrusage, , ,[
 #endif
 ])
 
-gcc_AC_CHECK_DECLS(mallinfo, , ,[
+gcc_AC_CHECK_DECLS([mallinfo, mallinfo2], , ,[
 #include "ansidecl.h"
 #include "system.h"
 #ifdef HAVE_MALLOC_H
diff --git a/gcc/ggc-common.c b/gcc/ggc-common.c
index 94da02f1185..b8782c5824b 100644
--- a/gcc/ggc-common.c
+++ b/gcc/ggc-common.c
@@ -1008,13 +1008,19 @@ ggc_prune_overhead_list (void)
   }
 }
 
-/* Return memory used by heap in kb, 0 if this info is not available.  */
+/* Print memory used by heap in kb if this info is not available.  */
 
 void
 report_heap_memory_use ()
 {
-#ifdef HAVE_MALLINFO
+#if defined(HAVE_MALLINFO) || defined(HAVE_MALLINFO2)
+#ifdef HAVE_MALLINFO2
+  #define MALLINFO_FN mallinfo2
+#else
+  #define MALLINFO_FN mallinfo
+#endif
   if (!quiet_flag)
-fprintf (stderr," {heap %luk}", (unsigned long)(mallinfo().arena / 1024));
+fprintf (stderr," {heap %luk}",
+	 (unsigned long) MALLINFO_FN ().arena / ONE_K);
 #endif
 }
diff --git a/gcc/system.h b/gcc/system.h
index 3c543a005d8..4f0482be25d 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -732,7 +732,7 @@ extern int vsnprintf (char *, size_t, const char *, va_list);
 #endif
 
 #ifdef INCLUDE_MALLOC_H
-#ifdef HAVE_MALLINFO
+#if 

[committed] MSP430: Skip gcc.dg/pr55940.c in the small memory model

2020-09-02 Thread Jozef Lawrynowicz
In the MSP430 small memory model, there is a 16-bit address space and
pointer arithmetic wraps around the address space, so any calculated
address is always within this range.

In this test, pointer arithmetic wraps when 0x1000 is added to the
address of a variable, causing the resulting address to be unexpectedly
less than 0x2000, which breaks the test.
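
The wrap-around can be modelled with 16-bit unsigned arithmetic. This is
only a sketch: add_wrapped is a hypothetical helper and the starting
address 0xF800 is a made-up example, not the address seen in the test.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of address arithmetic in a 16-bit address space:
// the sum is reduced modulo 2^16, so it can end up below the start.
inline std::uint16_t add_wrapped(std::uint16_t addr, std::uint16_t offset)
{
  return static_cast<std::uint16_t>(addr + offset);
}
```

For example, adding 0x1000 to 0xF800 yields 0x0800 in this model, which
is below 0x2000 even though the starting address was far above it.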

Committed as obvious.
>From d45a6c7099a346153e970476688be5bd6a016cef Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Wed, 2 Sep 2020 13:42:39 +0100
Subject: [PATCH] MSP430: Skip gcc.dg/pr55940.c in the small memory model

In the MSP430 small memory model, there is a 16-bit address space and
pointer arithmetic wraps around the address space, so any calculated
address is always within this range.

In this test, pointer arithmetic wraps when 0x1000 is added to the
address of a variable, causing the resulting address to be unexpectedly
less than 0x2000, which breaks the test.

gcc/testsuite/ChangeLog:

* gcc.dg/pr55940.c: Skip for msp430 unless -mlarge is specified.
---
 gcc/testsuite/gcc.dg/pr55940.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/pr55940.c b/gcc/testsuite/gcc.dg/pr55940.c
index d046d0b6912..85761f6c31f 100644
--- a/gcc/testsuite/gcc.dg/pr55940.c
+++ b/gcc/testsuite/gcc.dg/pr55940.c
@@ -1,5 +1,6 @@
 /* PR target/55940 */
 /* { dg-do run } */
+/* { dg-skip-if "pointer arithmetic can wrap" { msp430-*-* } { "*" } { 
"-mlarge" } } */
 /* { dg-options "-Os" } */
 /* { dg-additional-options "-mpreferred-stack-boundary=2" { target { { 
i?86-*-* x86_64-*-* } && ia32 } } } */
 
-- 
2.28.0



[committed] libstdc++: Break header cycle between <new> and <exception>

2020-09-02 Thread Jonathan Wakely via Gcc-patches
The <new> and <exception> headers each include each other, which makes
building them as header-units "exciting". The <new> header only needs
the definition of std::exception (in order to derive from it) which is
already in its own header, so just include that.

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h: Include <bits/exception_defines.h>
for definitions of __try, __catch and __throw_exception_again.
(counted_iterator::operator++(int)): Use __throw_exception_again
instead of throw.
* libsupc++/new: Include <bits/exception.h> not <exception>.
* libsupc++/new_opvnt.cc: Include <bits/exception_defines.h>.
* testsuite/18_support/destroying_delete.cc: Include
<type_traits> for std::is_same_v definition.
* testsuite/20_util/variant/index_type.cc: Qualify size_t.

Tested powerpc64le-linux. Committed to trunk.
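
For context on why counted_iterator::operator++(int) should use
__throw_exception_again rather than a bare throw: the library's
__try / __catch / __throw_exception_again macros let the same source
compile with exceptions disabled. A rough sketch of that pattern follows,
using hypothetical SKETCH_* macro names and a made-up sketch_counter type
so as not to suggest this is the library's own code.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>

// Hypothetical stand-ins for the library's macros.  With exceptions
// disabled the handler becomes dead code and the rethrow expands to
// nothing.
#if defined(__cpp_exceptions)
# define SKETCH_TRY       try
# define SKETCH_CATCH(X)  catch (X)
# define SKETCH_RETHROW   throw
#else
# define SKETCH_TRY       if (true)
# define SKETCH_CATCH(X)  if (false)
# define SKETCH_RETHROW
#endif

// Made-up type that mirrors the shape of the fixed function: update a
// counter, attempt a step, and roll the counter back if the step throws.
struct sketch_counter
{
  int length;
  bool fail;   // when true, step() reports failure

  void step()
  {
    if (!fail)
      return;
#if defined(__cpp_exceptions)
    throw std::runtime_error("step failed");
#else
    std::abort();
#endif
  }

  void advance()
  {
    --length;
    SKETCH_TRY { step(); }
    SKETCH_CATCH(...) { ++length; SKETCH_RETHROW; }
  }
};
```

A bare throw inside the handler would not compile when exceptions are
disabled, whereas the macro form simply disappears in that configuration.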

commit 6bdbf0f37bda2587a4e82cbb956de7a159a397ae
Author: Jonathan Wakely 
Date:   Wed Sep 2 13:27:57 2020

libstdc++: Break header cycle between <new> and <exception>

The <new> and <exception> headers each include each other, which makes
building them as header-units "exciting". The <new> header only needs
the definition of std::exception (in order to derive from it) which is
already in its own header, so just include that.

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h: Include <bits/exception_defines.h>
for definitions of __try, __catch and __throw_exception_again.
(counted_iterator::operator++(int)): Use __throw_exception_again
instead of throw.
* libsupc++/new: Include <bits/exception.h> not <exception>.
* libsupc++/new_opvnt.cc: Include <bits/exception_defines.h>.
* testsuite/18_support/destroying_delete.cc: Include
<type_traits> for std::is_same_v definition.
* testsuite/20_util/variant/index_type.cc: Qualify size_t.

diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index f0cf4c55c09..da740e3732e 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -79,6 +79,7 @@
 #if __cplusplus > 201703L
 # include 
 # include 
+# include 
 # include 
 #endif
 
@@ -2062,7 +2063,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return _M_current++;
  } __catch(...) {
++_M_length;
-   throw;
+   __throw_exception_again;
  }
 
   }
diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
index ebd1c490282..21848a573d1 100644
--- a/libstdc++-v3/libsupc++/new
+++ b/libstdc++-v3/libsupc++/new
@@ -38,7 +38,7 @@
 #pragma GCC system_header
 
 #include 
-#include 
+#include 
 
 #pragma GCC visibility push(default)
 
@@ -52,7 +52,7 @@ namespace std
*
*  @c bad_alloc (or classes derived from it) is used to report allocation
*  errors from the throwing forms of @c new.  */
-  class bad_alloc : public exception 
+  class bad_alloc : public exception
   {
   public:
 bad_alloc() throw() { }
diff --git a/libstdc++-v3/libsupc++/new_opvnt.cc 
b/libstdc++-v3/libsupc++/new_opvnt.cc
index 9f9dace5778..771696d4ba6 100644
--- a/libstdc++-v3/libsupc++/new_opvnt.cc
+++ b/libstdc++-v3/libsupc++/new_opvnt.cc
@@ -25,7 +25,8 @@
 
 #include 
 #include "new"
- 
+#include "exception_defines.h"
+
 _GLIBCXX_WEAK_DEFINITION void*
 operator new[] (std::size_t sz, const std::nothrow_t&) noexcept
 {
diff --git a/libstdc++-v3/testsuite/18_support/destroying_delete.cc 
b/libstdc++-v3/testsuite/18_support/destroying_delete.cc
index 96134d7e010..f0c9bb9fa25 100644
--- a/libstdc++-v3/testsuite/18_support/destroying_delete.cc
+++ b/libstdc++-v3/testsuite/18_support/destroying_delete.cc
@@ -19,6 +19,7 @@
 // { dg-do run { target c++2a } }
 
 #include 
+#include 
 #include 
 
 #ifndef __cpp_lib_destroying_delete
diff --git a/libstdc++-v3/testsuite/20_util/variant/index_type.cc 
b/libstdc++-v3/testsuite/20_util/variant/index_type.cc
index 73863fa677f..1c44758363c 100644
--- a/libstdc++-v3/testsuite/20_util/variant/index_type.cc
+++ b/libstdc++-v3/testsuite/20_util/variant/index_type.cc
@@ -22,4 +22,4 @@
 #include 
 
 static_assert(sizeof(std::variant)
- < sizeof(size_t));
+ < sizeof(std::size_t));


Re: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector

2020-09-02 Thread H.J. Lu via Gcc-patches
On Wed, Sep 2, 2020 at 2:33 AM Hongtao Liu via Gcc-patches
 wrote:
>
> Hi:
>   Add define_peephole2 to eliminate potential redundant conversion
> from mask to vector.
>   Bootstrap is ok, regression test is ok for i386/x86-64 backend.
>   Ok for trunk?
>
> gcc/ChangeLog:
> PR target/96891
> * config/i386/sse.md (VI_128_256): New mode iterator.
> (define_peephole2): Lower avx512 vector compare to avx version
> when dest is vector.
>
> gcc/testsuite/ChangeLog:

Missing PR target/96891

> * gcc.target/i386/avx512bw-pr96891-1.c: New test.
> * gcc.target/i386/avx512f-pr96891-1.c: New test.
> * gcc.target/i386/avx512f-pr96891-2.c: New test.
>
> --
> BR,
> Hongtao



-- 
H.J.


Re: [PATCH] Add if-chain to switch conversion pass.

2020-09-02 Thread Martin Liška

On 9/1/20 4:50 PM, David Malcolm wrote:

Hope this is constructive
Dave


Thank you David. All of them are very useful!

Here's an updated version of the patch.
Martin
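
For readers skimming the thread, the kind of source an if-to-switch
conversion is aimed at might look like the sketch below. It is not taken
from the patch or its tests: classify_if_chain and classify_switch are
hypothetical names, and which chains the pass actually recognizes is
defined by the patch, not by this example.

```cpp
#include <cassert>

// An if-else-if chain comparing one variable against constants.
inline int classify_if_chain(int x)
{
  if (x == 1)
    return 10;
  else if (x == 2)
    return 20;
  else if (x == 3)
    return 30;
  else if (x == 4)
    return 40;
  else
    return 0;
}

// The equivalent switch, which gives the compiler the option of using
// a jump table or similar switch lowering strategies.
inline int classify_switch(int x)
{
  switch (x)
    {
    case 1: return 10;
    case 2: return 20;
    case 3: return 30;
    case 4: return 40;
    default: return 0;
    }
}
```

Both functions return the same value for every input; the conversion is
about giving later passes the switch form to work with.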
>From 6efbec6402a0babb0b8ccc2145fbf681e8030786 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 28 Aug 2020 10:26:13 +0200
Subject: [PATCH] Add if-chain to switch conversion pass.

gcc/ChangeLog:

	PR tree-optimization/14799
	PR ipa/88702
	* Makefile.in: Add new gimple-if-to-switch.o.
	* common.opt: Add new option.
	* dbgcnt.def (DEBUG_COUNTER): Add new debug counter.
	* doc/invoke.texi: Document -fconvert-if-to-switch.
	* opts.c: Add -fconvert-if-to-switch at OPT_LEVELS_2_PLUS.
	* passes.def: Register new pass.
	* timevar.def (TV_TREE_IF_TO_SWITCH): Add new time variable.
	* tree-pass.h (make_pass_if_to_switch): New.
	* tree-switch-conversion.h: New file.
	* gimple-if-to-switch.cc: New file.

gcc/testsuite/ChangeLog:

	PR tree-optimization/14799
	PR ipa/88702
	* gcc.dg/tree-ssa/reassoc-32.c: Add -fno-convert-if-to-switch.
	* gcc.dg/tree-ssa/if-to-switch-1.c: New test.
	* gcc.dg/tree-ssa/if-to-switch-2.c: Likewise.
	* gcc.dg/tree-ssa/if-to-switch-3.c: Likewise.
	* gcc.dg/tree-ssa/if-to-switch-4.c: Likewise.
	* gcc.dg/tree-ssa/if-to-switch-5.c: Likewise.
	* gcc.dg/tree-ssa/if-to-switch-6.c: Likewise.
	* gcc.dg/tree-ssa/if-to-switch-7.c: Likewise.
	* gcc.dg/tree-ssa/if-to-switch-8.c: Likewise.
---
 gcc/Makefile.in   |   1 +
 gcc/common.opt|   4 +
 gcc/dbgcnt.def|   1 +
 gcc/doc/invoke.texi   |  13 +-
 gcc/gimple-if-to-switch.cc| 762 ++
 gcc/opts.c|   1 +
 gcc/passes.def|   1 +
 .../gcc.dg/tree-ssa/if-to-switch-1.c  |  35 +
 .../gcc.dg/tree-ssa/if-to-switch-2.c  |  11 +
 .../gcc.dg/tree-ssa/if-to-switch-3.c  |  11 +
 .../gcc.dg/tree-ssa/if-to-switch-4.c  |  36 +
 .../gcc.dg/tree-ssa/if-to-switch-5.c  |  12 +
 .../gcc.dg/tree-ssa/if-to-switch-6.c  |  42 +
 .../gcc.dg/tree-ssa/if-to-switch-7.c  |  25 +
 .../gcc.dg/tree-ssa/if-to-switch-8.c  |  27 +
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-32.c|   2 +-
 gcc/timevar.def   |   1 +
 gcc/tree-pass.h   |   1 +
 gcc/tree-switch-conversion.h  |  15 +-
 19 files changed, 993 insertions(+), 8 deletions(-)
 create mode 100644 gcc/gimple-if-to-switch.cc
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-3.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-4.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-5.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-6.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-7.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-8.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 79e854aa938..782e9cfe95b 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1361,6 +1361,7 @@ OBJS = \
 	gimple-array-bounds.o \
 	gimple-builder.o \
 	gimple-expr.o \
+	gimple-if-to-switch.o \
 	gimple-iterator.o \
 	gimple-fold.o \
 	gimple-laddress.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index dd68c61ae1d..80e60934f22 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1173,6 +1173,10 @@ fconserve-stack
 Common Var(flag_conserve_stack) Optimization
 Do not perform optimizations increasing noticeably stack usage.
 
+fconvert-if-to-switch
+Common Report Var(flag_convert_if_to_switch) Optimization
+Perform conversions of if-elseif chain into a switch statement.
+
 fcprop-registers
 Common Report Var(flag_cprop_registers) Optimization
 Perform a register copy-propagation optimization pass.
diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index cf8775b2b66..e393e45deb4 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -170,6 +170,7 @@ DEBUG_COUNTER (if_after_combine)
 DEBUG_COUNTER (if_after_reload)
 DEBUG_COUNTER (if_conversion)
 DEBUG_COUNTER (if_conversion_tree)
+DEBUG_COUNTER (if_to_switch)
 DEBUG_COUNTER (ipa_cp_bits)
 DEBUG_COUNTER (ipa_sra_params)
 DEBUG_COUNTER (ipa_sra_retvalues)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5d29a7fa23c..fcaf993e2c2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -467,7 +467,7 @@ Objective-C and Objective-C++ Dialects}.
 -fassociative-math  -fauto-profile  -fauto-profile[=@var{path}] @gol
 -fauto-inc-dec  -fbranch-probabilities @gol
 -fcaller-saves @gol
--fcombine-stack-adjustments  -fconserve-stack @gol
+-fcombine-stack-adjustments  -fconserve-stack -fconvert-if-to-switch  @gol
 -fcompare-elim  -fcprop-registers  -fcrossjumping @gol
 -fcse-follow-jumps  -fcse-skip-blocks  -fcx-fortran-rules @gol
 -fcx-limited-range @gol
@@ -534,7 +534,7 @@ 

Re: [RFC][nvptx, libgomp] Add 128-bit atomic support

2020-09-02 Thread Tom de Vries
On 9/2/20 12:44 PM, Jakub Jelinek wrote:
> On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote:
>> And test-case passes on x86_64 with this patch (obviously, in
>> combination with trigger patch above).
>>
>> Jakub, WDYT?
> 
> I guess the normal answer would be use libatomic, but it isn't ported for
> nvptx.

Ah, I was not aware of that one, filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 to look into that.

> I guess at least temporarily this is ok, though I'm wondering why
> you need __sync_*_16 rather than __atomic_*_16, 

That's what omp-expand.c uses in expand_omp_atomic_pipeline:
BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N .

Thanks,
- Tom

> or perhaps both __sync_* and
> __atomic_*.
> 
> What happens if you try
> unsigned __int128 v;
> #pragma omp declare target (v)
> int
> main ()
> {
>   #pragma omp target
>   {
> __atomic_add_fetch (&v, 1, __ATOMIC_RELAXED);
> __atomic_fetch_add (&v, 1, __ATOMIC_RELAXED);
> unsigned __int128v exp = 2;
> __atomic_compare_exchange_n (&v, &expected, 7, 0, __ATOMIC_RELEASE,
> __ATOMIC_ACQUIRE);
>   }
> }
> etc. (see some gcc.dg/atomic* tests, ditto for __sync_*)?
> I guess better not to throw everything into one test, because not every
> target supports them all (e.g. I think x86_64 doesn't really do 128-bit
> atomic loads because the cmpxchg16b insn are not appropriate for .rodata
> locations).
> 
>   Jakub
> 


Re: [RFC][nvptx, libgomp] Add 128-bit atomic support

2020-09-02 Thread Tobias Burnus

On 9/2/20 12:22 PM, Tom de Vries wrote:


Tobias, can you try on powerpc?


Testcase now compiles and runs w/o error message.

On 9/2/20 12:44 PM, Jakub Jelinek wrote:


I guess the normal answer would be use libatomic, but it isn't ported for
nvptx.
I guess at least temporarily this is ok, though I'm wondering why
you need __sync_*_16 rather than __atomic_*_16, or perhaps both __sync_* and
__atomic_*.

What happens if you try
unsigned __int128 v;


...

I had to change "unsigned __int128" and "unsigned __int128v" to
"__uint128_t" and "expected" to "exp". Result without offloading
configured on x86-64-gnu-linux:

aotmic.c:(.text+0x84): undefined reference to `__atomic_fetch_add_16'
/usr/bin/ld: aotmic.c:(.text+0xa3): undefined reference to `__atomic_fetch_add_16'
/usr/bin/ld: aotmic.c:(.text+0xda): undefined reference to `__atomic_compare_exchange_16'

And on PowerPC with nvptx (without the RFC patch):

atomic.c: In function 'main._omp_fn.0':
atomic.c:6:11: internal compiler error: in write_fn_proto, at config/nvptx/nvptx.c:913
6 |   #pragma omp target
  |   ^

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [RFC][nvptx, libgomp] Add 128-bit atomic support

2020-09-02 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote:
> And test-case passes on x86_64 with this patch (obviously, in
> combination with trigger patch above).
> 
> Jakub, WDYT?

I guess the normal answer would be use libatomic, but it isn't ported for
nvptx.
I guess at least temporarily this is ok, though I'm wondering why
you need __sync_*_16 rather than __atomic_*_16, or perhaps both __sync_* and
__atomic_*.

What happens if you try
unsigned __int128 v;
#pragma omp declare target (v)
int
main ()
{
  #pragma omp target
  {
    __atomic_add_fetch (&v, 1, __ATOMIC_RELAXED);
    __atomic_fetch_add (&v, 1, __ATOMIC_RELAXED);
    unsigned __int128v exp = 2;
    __atomic_compare_exchange_n (&v, &expected, 7, 0, __ATOMIC_RELEASE,
                                 __ATOMIC_ACQUIRE);
  }
}
etc. (see some gcc.dg/atomic* tests, ditto for __sync_*)?
I guess better not to throw everything into one test, because not every
target supports them all (e.g. I think x86_64 doesn't really do 128-bit
atomic loads because the cmpxchg16b insn are not appropriate for .rodata
locations).

Jakub



Re: [PATCH 3/4 v3] ivopts: Consider cost_step on different forms during unrolling

2020-09-02 Thread Segher Boessenkool
Hi!

On Wed, Sep 02, 2020 at 11:16:00AM +0800, Kewen.Lin wrote:
> on 2020/9/1 上午3:41, Segher Boessenkool wrote:
> > On Tue, Aug 25, 2020 at 08:46:55PM +0800, Kewen.Lin wrote:
> >> 1) Currently address_cost hook on rs6000 always return zero, but at least
> >> from Power7, pre_inc/pre_dec kind instructions are cracked, it means we
> >> have to take the address update into account (scalar normal operation).
> > 
> > From Power4 on already (not sure about Power6, but does anyone care?)
> 
> Thanks for the information, it looks this issue exists for a long time.

Well, *is* it an issue?  The addressing doesn't get more expensive...
For example, an
  ldu 3,16(4)
is cracked to an
  ld 3,16(4)
and an
  addi 4,4,16
(the addi is not on the critical path of the load).  So it seems to me
this shouldn't increase the addressing cost at all?  (The instruction of
course is really two insns in one.)


Segher


[RFC][nvptx, libgomp] Add 128-bit atomic support

2020-09-02 Thread Tom de Vries
[ was: Re: [patch][nvptx] libgomp: Split testcase in order to XFAIL
__sync_val_compare_and_swap_16 (was: [PATCH] nvptx: Add support for
subword compare-and-swap) ]

On 9/2/20 9:56 AM, Tom de Vries wrote:
> On 9/1/20 2:58 PM, Tom de Vries wrote:
>> On 9/1/20 1:41 PM, Tobias Burnus wrote:
>>> Hi Tom, hello all,
>>>
>>> it turned out that the testcase fails on PowerPC (but not x86_64)
>>> as the nvptx lto complains: unresolved symbol
>>> __sync_val_compare_and_swap_16
>>>
>>> The testcase uses int128 – and that's the culprit, but I have no idea
>>> why it only fails with PowerPC and not with x86-64.
>>>
>>
> 
> Reproduced on x86_64 using trigger patch:
> ...
> $ git diff
> diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
> index ed17bb00205..eccedac192f 100644
> --- a/gcc/config/i386/sync.md
> +++ b/gcc/config/i386/sync.md
> @@ -153,9 +153,15 @@
>  (DI "TARGET_64BIT || (TARGET_CMPXCHG8B && (TARGET_80387 ||
> TARGET_SSE))")
> ])
> 
> + (define_mode_iterator ATOMIC2
> +[QI HI SI
> + (DI "TARGET_64BIT || (TARGET_CMPXCHG8B && (TARGET_80387 ||
> TARGET_SSE))")
> +TI
> +])
> +
>  (define_expand "atomic_load"
> -  [(set (match_operand:ATOMIC 0 "nonimmediate_operand")
> -   (unspec:ATOMIC [(match_operand:ATOMIC 1 "memory_operand")
> +  [(set (match_operand:ATOMIC2 0 "nonimmediate_operand")
> +   (unspec:ATOMIC2 [(match_operand:ATOMIC2 1 "memory_operand")
> (match_operand:SI 2 "const_int_operand")]
>UNSPEC_LDA))]
>""
> diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> index d0e82b04790..62b0e032c33 100644
> --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> @@ -1,4 +1,5 @@
>  /* { dg-do run } */
> +/* { dg-additional-options "-mcx16" } */
> 
>  #include 
> 
> ...
> 

And test-case passes on x86_64 with this patch (obviously, in
combination with trigger patch above).

Jakub, WDYT?

Tobias, can you try on powerpc?

Thanks,
- Tom

[nvptx, libgomp] Add 128-bit atomic support

---
 libgomp/config/nvptx/atomic.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/libgomp/config/nvptx/atomic.c b/libgomp/config/nvptx/atomic.c
new file mode 100644
index 00000000000..49a6d350827
--- /dev/null
+++ b/libgomp/config/nvptx/atomic.c
@@ -0,0 +1,34 @@
+#include 
+
+#include "../../atomic.c"
+
+unsigned __int128
+__sync_val_compare_and_swap_16 (volatile void *vptr, unsigned __int128 oldval,
+unsigned __int128 newval)
+{
+  volatile unsigned __int128 *ptr = vptr;
+  GOMP_atomic_start ();
+  unsigned __int128 val = *ptr;
+  if (val == oldval)
+*ptr = newval;
+  GOMP_atomic_end ();
+  return val;
+}
+
+bool
+__sync_bool_compare_and_swap_16 (volatile void *vptr, unsigned __int128 oldval,
+ unsigned __int128 newval)
+{
+  return __sync_val_compare_and_swap_16 (vptr, oldval, newval) == oldval;
+}
+
+unsigned __int128
+__atomic_load_16 (const volatile void *vptr,
+		  int memorder __attribute__((unused)))
+{
+  const volatile unsigned __int128 *ptr = vptr;
+  GOMP_atomic_start ();
+  unsigned __int128 val = *ptr;
+  GOMP_atomic_end ();
+  return val;
+}
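
For readers who want to try the idea outside libgomp, here is a minimal sketch of the same lock-based technique: a 128-bit compare-and-swap emulated by taking a single global lock around a plain load and store. Everything named here is an assumption for illustration: `lock_start`/`lock_end` are hypothetical stand-ins for `GOMP_atomic_start`/`GOMP_atomic_end` (implemented as a C11 `atomic_flag` spinlock, which is not how libgomp does it), and the `locked_*` function names are invented so they do not clash with the real `__sync_*_16` symbols. It assumes a compiler that provides `unsigned __int128`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for GOMP_atomic_start/GOMP_atomic_end:
   one global spinlock serializing all emulated 128-bit accesses.  */
static atomic_flag global_lock = ATOMIC_FLAG_INIT;

static void
lock_start (void)
{
  /* Spin until the flag was previously clear.  */
  while (atomic_flag_test_and_set_explicit (&global_lock,
                                            memory_order_acquire))
    ;
}

static void
lock_end (void)
{
  atomic_flag_clear_explicit (&global_lock, memory_order_release);
}

/* Same shape as the patch's __sync_val_compare_and_swap_16: return the
   old value, and store NEWVAL only if the old value equals OLDVAL.  */
unsigned __int128
locked_val_cas_16 (volatile void *vptr, unsigned __int128 oldval,
                   unsigned __int128 newval)
{
  volatile unsigned __int128 *ptr = vptr;
  lock_start ();
  unsigned __int128 val = *ptr;
  if (val == oldval)
    *ptr = newval;
  lock_end ();
  return val;
}

/* Boolean variant: true if the swap happened.  */
bool
locked_bool_cas_16 (volatile void *vptr, unsigned __int128 oldval,
                    unsigned __int128 newval)
{
  return locked_val_cas_16 (vptr, oldval, newval) == oldval;
}
```

This is only atomic with respect to other accesses that go through the same lock, which is the same caveat that applies to the approach in the patch.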


Re: [PATCH, rs6000] Fix Vector long long subtype (PR96139)

2020-09-02 Thread Segher Boessenkool
Hi Will,

On Tue, Sep 01, 2020 at 09:00:20PM -0500, will schmidt wrote:
>   This corrects an issue with the powerpc vector long long subtypes.
> As reported by SjMunroe in PR96139.  When building some code with -Wall
> and attempting to print an element of a "long long vector" with a long long
> printf format string, we will report an error because the vector sub-type
> was improperly defined as int.
> 
> When defining a V2DI_type_node we use a TARGET_POWERPC64 ternary to
> define the V2DI_type_node with "vector long" or "vector long long".
> We also need to specify the proper sub-type when we define the type.

> -  V2DI_type_node = rs6000_vector_type (TARGET_POWERPC64 ? "__vector long"
> -: "__vector long long",
> -intDI_type_node, 2);
> +  V2DI_type_node
> += rs6000_vector_type (TARGET_POWERPC64
> +   ? "__vector long" : "__vector long long",
> +   TARGET_POWERPC64
> +   ? long_long_integer_type_node : intDI_type_node,
> +   2);

Can't you just use long_long_integer_type_node in all cases?  Or, what
else is intDI_type_node for 32 bit?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-a.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -Wall -m32 " } */

(trailing space, here and elsewhere -- not that it matters of course)


Segher


Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-09-02 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 02, 2020 at 09:57:08AM +0800, Hongtao Liu via Gcc-patches wrote:
> +
> +  first = XVECEXP (constant, 0, 0);
> +  /* There could be some rtx like
> +  (mem/u/c:V16QI (symbol_ref/u:DI ("*.LC1")))
> +  but with "*.LC1" refer to V2DI constant vector.  */
> +  if (GET_MODE (constant) != mode)
> + {
> +   constant = simplify_subreg (mode, constant, GET_MODE (constant), 0);
> +   if (constant == NULL_RTX || GET_CODE (constant) != CONST_VECTOR)
> + return;
> + }

The
  first = XVECEXP (constant, 0, 0);
line needs to be after this if, not before it, otherwise it will miscompile
things or just ICE.

> @@ -2197,6 +2272,10 @@ remove_partial_avx_dependency (void)
> if (!NONDEBUG_INSN_P (insn))
>   continue;
>  
> +   /* Hanlde AVX512 embedded broadcast here to save compile time.  */

s/Hanlde/Handle/

> +  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
> +{
> +  if (!INSN_P (insn))
> + continue;
> +  replace_constant_pool_with_broadcast (insn);
> +}

Perhaps instead do:
  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
    if (INSN_P (insn))
      replace_constant_pool_with_broadcast (insn);
?

> +  /* opt_pass methods: */
> +  virtual bool gate (function *)
> +{
> +  /* Return false if rpad pass gate is true.
> +  replace_constant_pool_with_broadcast is called
> +  from both this pass and rpad pass.  */
> +  return (TARGET_AVX512F
> +   && !(TARGET_AVX
> +&& TARGET_SSE_PARTIAL_REG_DEPENDENCY
> +&& TARGET_SSE_MATH
> +&& optimize
> +&& optimize_function_for_speed_p (cfun)));

I think this could be a maintenance nightmare.
Perhaps instead add

static bool
remove_partial_avx_dependency_gate ()
{
  return (TARGET_AVX
  && TARGET_SSE_PARTIAL_REG_DEPENDENCY
  && TARGET_SSE_MATH
  && optimize
  && optimize_function_for_speed_p (cfun));
}
after the remove_partial_avx_dependency function definition,
change pass_remove_partial_avx_dependency gate body to
  return remove_partial_avx_dependency_gate ();
and in pass_constant_pool_broadcast::gate do
  return (TARGET_AVX512F && !remove_partial_avx_dependency_gate ());
(with the comment you have there)?
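
The suggestion above boils down to keeping the condition in one helper and deriving the second gate from it, so the two passes can never both run. A toy, self-contained sketch of that shape follows; `struct toy_target` and both function names are hypothetical stand-ins for the target macros and pass gates, not GCC's real interfaces.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the TARGET_* macros and optimization
   state; GCC does not represent them this way.  */
struct toy_target
{
  bool avx;
  bool avx512f;
  bool sse_partial_reg_dependency;
  bool sse_math;
  bool optimize;
  bool optimize_for_speed;
};

/* Single definition of the rpad condition, shared by both gates.  */
bool
toy_rpad_gate (const struct toy_target *t)
{
  return (t->avx
          && t->sse_partial_reg_dependency
          && t->sse_math
          && t->optimize
          && t->optimize_for_speed);
}

/* The broadcast pass runs only when the rpad pass does not, because
   the rpad pass already performs the replacement itself.  */
bool
toy_broadcast_gate (const struct toy_target *t)
{
  return t->avx512f && !toy_rpad_gate (t);
}
```

If the rpad condition ever changes, only `toy_rpad_gate` needs editing; the negated copy cannot drift out of sync.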

LGTM with those changes.

Jakub



Re: [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes

2020-09-02 Thread Segher Boessenkool
Hi!

On Tue, Sep 01, 2020 at 07:22:57PM -0300, Alexandre Oliva wrote:
> This WIP patchlet introduces means for machines that implicitly clobber
> cc flags in asm statements, but that have machinery to output flags
> (namely x86, non-thumb arm and aarch64), to state that the asm statement
> does NOT clobber cc flags.  That's accomplished by using "=@ccC" in the
> output constraints.  It disables the implicit clobber, but it doesn't
> set up an actual asm output to the flags, so they are left alone.
> 
> It's ugly, I know.

Yeah, it's bloody disgusting :-)  But it is very local, and it works
with the generic code without any changes there, that is good.  OTOH
this patch is for x86 only.  (And aarch, but not the other targets
with default clobbers).

> I've considered "!cc" or "nocc" in the clobber
> list as a machine-independent way to signal cc is not modified, or
> even special-casing empty asm patterns (at a slight risk of breaking
> code that expects implicit clobbers even for empty asm patterns, but
> empty asm patterns won't clobber flags, so how could it break
> anything?).

People write empty asm statements not because they would like no insns
emitted from it, but *because* they want the other effects an asm has
(for example, an empty asm usually has no outputs, so it is volatile,
and that makes sure it is executed in the real machine exactly as often
as in the abstract machine).  So your expectation might be wrong,
someone might want an empty asm to clobber cc on x86 (like any asm is
documented as doing).

But how about a "none" clobber?  That would be generic, and just remove
all preceding clobbers (incl. the implicit clobbers).  Maybe disallow
any explicit clobbers before it, not sure what is nicer.

> I take this might be useful for do-nothing asm
> statements, often used to stop certain optimizations, e.g.:
> 
>   __typeof (*p) c = __builtin_add_overflow (*p, 1, p);
>   asm ("" : "+m" (*p)); // Make sure we write to memory.
>   *p += c; // This should compile into an add with carry.

Wow, nasty.  That asm cannot be optimised away even if the rest is
(unless GCC can somehow figure out nothing ever uses *p).  Is there no
better way to do this?

> Is there interest in, and a preferred form for (portably?), conveying
> a no-cc-clobbering asm?

Well, that whole cc clobbering is an x86 thing, but some other targets
clobber other registers by default.  Yes, I think this might be useful;
and see my suggestion above ("none").

> Without the asm, we issue load;add;adc;store, which is not the ideal
> sequence with add and adc to the same memory address (or two different
> addresses, if the last statement uses say *q instead of *p).

Is doing two RMWs on memory faster?  Huh.
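
For reference, the two-address variant mentioned in the quoted text (carry added to `*q` rather than `*p`) is the usual multi-word increment. A minimal sketch, assuming 64-bit limbs and GCC's `__builtin_add_overflow`; the function name `inc_two_limbs` is made up for illustration, and the empty asm from the quoted snippet is deliberately left out, so this shows only the arithmetic, not the code generation being discussed.

```c
#include <stdint.h>

/* Add 1 to the two-limb number (*hi, *lo), propagating the carry
   out of the low limb into the high limb.  */
void
inc_two_limbs (uint64_t *lo, uint64_t *hi)
{
  /* __builtin_add_overflow stores the wrapped sum in *lo and returns
     true when the mathematical result did not fit.  */
  uint64_t carry = __builtin_add_overflow (*lo, 1, lo);
  *hi += carry;
}
```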

> Alas, getting the first add to go straight to memory is more
> complicated.  Even with the asm that forces the output to memory, the
> output flag makes it harder to get it optimized to an add-to-memory
> form.  When the output flag is unused, we optimize it enough in gimple
> that TER does its job and we issue a single add, but that's not possible
> when the two outputs of ADD_OVERFLOW are used: the flag setting gets
> optimized away, but only after stopping combine from turning the
> load/add/store into an add-to-memory.
> 
> If we retried the 3-insn substitution after substituting the flag store
> into the add for the adc,

combine should retry every combination if any of the input insns to it
have changed (put another way, if any insn is changed all combinations
with it are tried anew).  If this doesn't work, please file a bug.

But.  Dependencies through memory are never used for combine (it uses
dependencies through registers only), maybe that is what you are seeing?
This makes many "RMW" optimisations need 4-insn combinations, which are
not normally done.

> we might succeed, but only if we had a pattern
> that matched add3_cc_overflow_1's parallel with the flag-setter as
> the second element of the parallel, because that's where combine adds it
> to the new i3 pattern, after splitting it out of i2.

That sounds like the backend pattern has it wrong then?  There is a
canonical order for this?

> I suppose adding such patterns manually isn't the way to go.  I wonder
> if getting recog_for_combine to recognize and reorder PARALLELs
> appearing out of order would get too expensive, even if genrecog were to
> generate optimized code to try alternate orders in parallels.

Very big parallels are used, and trying all orderings would take just a
little too much time ;-)

We could do some limited permutations of course.  There are some cases
where you *unavoidably* have this problem (say, when adding three
things together), so this could be useful sometimes.  Maybe just try
permuting the first three arms of the parallel, for example?

> The issue doesn't seem that important in the grand scheme of things, but
> there is some embarrassment from the missed combines and from the AFAICT
> impossibility to get GCC to issue the most compact 

Re: [PATCH] Adjust testcase

2020-09-02 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 31, 2020 at 2:19 PM Hongtao Liu  wrote:
>
> Hi:
>   This patch is to adjust testcases which failed the regression test
> when gcc is built with -march=skylake-avx512.
>   Also add runtime check for AVX512 tests.
>
> gcc/testsuite/ChangeLog:
> PR target/96246
> PR target/96855
> PR target/96856
> PR target/96857
> * g++.target/i386/avx512bw-pr96246-2.C: Add runtime check for
> AVX512BW.
> * g++.target/i386/avx512vl-pr96246-2.C: Add runtime check for
> AVX512BW and AVX512VL
> * g++.target/i386/avx512f-helper.h: New header.
> * gcc.target/i386/pr92658-avx512f.c: Add
> -mprefer-vector-width=512 to avoid impact of different default
> mtune which gcc is built with.
> * gcc.target/i386/avx512bw-pr95488-1.c: Ditto.
> * gcc.target/i386/pr92645-4.c: Add -mno-avx512f to avoid
> impact of different default march which gcc is built with.
>
>
> --
> BR,
> Hongtao

I am going to check in this patch; it only touches testcases and
doesn't affect any GCC functionality.

-- 
BR,
Hongtao


[PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector

2020-09-02 Thread Hongtao Liu via Gcc-patches
Hi:
  Add define_peephole2 to eliminate potential redundant conversion
from mask to vector.
  Bootstrap is ok, regression test is ok for i386/x86-64 backend.
  Ok for trunk?

gcc/ChangeLog:
PR target/96891
* config/i386/sse.md (VI_128_256): New mode iterator.
(define_peephole2): Lower avx512 vector compare to avx version
when dest is vector.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bw-pr96891-1.c: New test.
* gcc.target/i386/avx512f-pr96891-1.c: New test.
* gcc.target/i386/avx512f-pr96891-2.c: New test.

-- 
BR,
Hongtao
From ba76432c08f47e4ecc1f355c0dfdea8908aaf9f4 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Wed, 2 Sep 2020 17:14:39 +0800
Subject: [PATCH] Lower AVX512 vector compare to AVX version when dest is
 vector.

gcc/ChangeLog:
	PR target/96891
	* config/i386/sse.md (VI_128_256): New mode iterator.
	(define_peephole2): Lower avx512 vector compare to avx version
	when dest is vector.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bw-pr96891-1.c: New test.
	* gcc.target/i386/avx512f-pr96891-1.c: New test.
	* gcc.target/i386/avx512f-pr96891-2.c: New test.
---
 gcc/config/i386/sse.md| 93 +++
 .../gcc.target/i386/avx512bw-pr96891-1.c  | 36 +++
 .../gcc.target/i386/avx512f-pr96891-1.c   | 40 
 .../gcc.target/i386/avx512f-pr96891-2.c   | 30 ++
 4 files changed, 199 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 8250325e1a3..31e0dc2a600 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -629,6 +629,9 @@ (define_mode_iterator VI_128 [V16QI V8HI V4SI V2DI])
 ;; All 256bit vector integer modes
 (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI])
 
+;; All 128 and 256bit vector integer modes
+(define_mode_iterator VI_128_256 [V16QI V8HI V4SI V2DI V32QI V16HI V8SI V4DI])
+
 ;; Various 128bit vector integer mode combinations
 (define_mode_iterator VI12_128 [V16QI V8HI])
 (define_mode_iterator VI14_128 [V16QI V4SI])
@@ -6703,6 +6706,96 @@ (define_insn "*_cvtmask2"
(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
+/* Lower avx512 parallel floating compare to avx compare when dst is vector.  */
+(define_peephole2
+  [(set (match_operand: 0 "register_operand")
+	(unspec:
+	  [(match_operand:VF_128_256 1 "register_operand")
+	   (match_operand:VF_128_256 2 "nonimmediate_operand")
+	   (match_operand:SI 3 "const_0_to_31_operand")]
+	  UNSPEC_PCMP))
+   (set (match_operand: 4 "register_operand")
+	(vec_merge:
+	  (match_operand: 5 "vector_all_ones_operand")
+	  (match_operand: 6 "const0_operand")
+	  (match_dup 0)))]
+  "!EXT_REX_SSE_REGNO_P (REGNO (operands[4]))
+  && !EXT_REX_SSE_REGNO_P (REGNO (operands[1]))
+  && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2])))
+  && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 7)
+	(unspec:VF_128_256
+	  [(match_dup 1)
+	   (match_dup 2)
+	   (match_dup 3)] UNSPEC_PCMP))]
+  "operands[7] = gen_rtx_REG (mode, REGNO (operands[4]));")
+
+/* Lower avx512 parallel integral compare to avx compare when dst is vector.  */
+(define_peephole2
+  [(set (match_operand: 0 "register_operand")
+	(unspec:
+	  [(match_operand:VI_128_256 1 "register_operand")
+	   (match_operand:VI_128_256 2 "nonimmediate_operand")]
+	  UNSPEC_MASKED_EQ))
+   (set (match_operand:VI_128_256 4 "register_operand")
+	(vec_merge:VI_128_256
+	  (match_operand:VI_128_256 5 "vector_all_ones_operand")
+	  (match_operand:VI_128_256 6 "const0_operand")
+	  (match_dup 0)))]
+  "!EXT_REX_SSE_REGNO_P (REGNO (operands[4]))
+  && !EXT_REX_SSE_REGNO_P (REGNO (operands[1]))
+  && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2])))
+  && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 4)
+  	(eq:VI_128_256
+	  (match_dup 1)
+	  (match_dup 2)))])
+
+(define_peephole2
+  [(set (match_operand: 0 "register_operand")
+	(unspec:
+	  [(match_operand:VI_128_256 1 "register_operand")
+	   (match_operand:VI_128_256 2 "nonimmediate_operand")]
+	  UNSPEC_MASKED_GT))
+   (set (match_operand:VI_128_256 4 "register_operand")
+	(vec_merge:VI_128_256
+	  (match_operand:VI_128_256 5 "vector_all_ones_operand")
+	  (match_operand:VI_128_256 6 "const0_operand")
+	  (match_dup 0)))]
+  "!EXT_REX_SSE_REGNO_P (REGNO (operands[4]))
+  && !EXT_REX_SSE_REGNO_P (REGNO (operands[1]))
+  && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2])))
+  && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 4)
+  	(gt:VI_128_256
+	  (match_dup 1)
+	  (match_dup 2)))])
+
+(define_peephole2
+  [(set (match_operand: 0 "register_operand")
+	(unspec:
+	  [(match_operand:VI_128_256 1 "register_operand")
+	   (match_operand:VI_128_256 2 "nonimmediate_operand")
+	   (match_operand:SI 3 

Re: [PATCH] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-02 Thread Richard Biener via Gcc-patches
On Wed, Sep 2, 2020 at 11:26 AM luoxhu  wrote:
>
> Hi,
>
> On 2020/9/1 21:07, Richard Biener wrote:
> > On Tue, Sep 1, 2020 at 10:11 AM luoxhu via Gcc-patches
> >  wrote:
> >>
> >> Hi,
> >>
> >> On 2020/9/1 01:04, Segher Boessenkool wrote:
> >>> Hi!
> >>>
> >>> On Mon, Aug 31, 2020 at 04:06:47AM -0500, Xiong Hu Luo wrote:
> >>>> vec_insert accepts 3 arguments, arg0 is input vector, arg1 is the value
> >>>> to be insert, arg2 is the place to insert arg1 to arg0.  This patch adds
> >>>> __builtin_vec_insert_v4si[v4sf,v2di,v2df,v8hi,v16qi] for vec_insert to
> >>>> not expand too early in gimple stage if arg2 is variable, to avoid generate
> >>>> store hit load instructions.
> >>>>
> >>>> For Power9 V4SI:
> >>>>   addi 9,1,-16
> >>>>   rldic 6,6,2,60
> >>>>   stxv 34,-16(1)
> >>>>   stwx 5,9,6
> >>>>   lxv 34,-16(1)
> >>>> =>
> >>>>   addis 9,2,.LC0@toc@ha
> >>>>   addi 9,9,.LC0@toc@l
> >>>>   mtvsrwz 33,5
> >>>>   lxv 32,0(9)
> >>>>   sradi 9,6,2
> >>>>   addze 9,9
> >>>>   sldi 9,9,2
> >>>>   subf 9,9,6
> >>>>   subfic 9,9,3
> >>>>   sldi 9,9,2
> >>>>   subfic 9,9,20
> >>>>   lvsl 13,0,9
> >>>>   xxperm 33,33,45
> >>>>   xxperm 32,32,45
> >>>>   xxsel 34,34,33,32
> >>>
> >>> For v a V4SI, x a SI, j some int,  what do we generate for
> >>> v[j&3] = x;
> >>> ?
> >>> This should be exactly the same as we generate for
> >>> vec_insert(x, v, j);
> >>> (the builtin does a modulo 4 automatically).
> >>
> >> No, even with my patch "stxv 34,-16(1);stwx 5,9,6;lxv 34,-16(1)" generated 
> >> currently.
> >> Is it feasible and acceptable to expand some kind of pattern in expander 
> >> directly without
> >> builtin transition?
> >>
> >> I borrowed some of implementation from vec_extract.  For vec_extract, the 
> >> issue also exists:
> >>
> >> source:                     gimple:                                      expand:                                          asm:
> >> 1) i = vec_extract (v, n);  =>  i = __builtin_vec_ext_v4si (v, n);       =>  {r120:SI=unspec[r118:V4SI,r119:DI] 134;...}  =>  slwi 9,6,2   vextuwrx 3,9,2
> >> 2) i = vec_extract (v, 3);  =>  i = __builtin_vec_ext_v4si (v, 3);       =>  {r120:SI=vec_select(r118:V4SI,parallel)...}  =>  li 9,12      vextuwrx 3,9,2
> >> 3) i = v[n%4];              =>  _1 = n & 3;  i = VIEW_CONVERT_EXPR(v)[_1];  =>  ...                                       =>  stxv 34,-16(1); addi 9,1,-16; rldic 5,5,2,60; lwax 3,9,5
> >> 4) i = v[3];                =>  i = BIT_FIELD_REF ;                      =>  {r120:SI=vec_select(r118:V4SI,parallel)...}  =>  li 9,12;   vextuwrx 3,9,2
> >
> > Why are 1) and 2) handled differently than 3)/4)?
>
> 1) and 2) are calling builtin function vec_extract, it is defined to
>  __builtin_vec_extract and will be resolved to ALTIVEC_BUILTIN_VEC_EXTRACT
>  by resolve_overloaded_builtin, to generate a call __builtin_vec_ext_v4si
>  to be expanded only in RTL.
> 3) is access variable v as array type with opcode VIEW_CONVERT_EXPR, I
>  guess we should also generate builtin call instead of calling
>  convert_vector_to_array_for_subscript to generate VIEW_CONVERT_EXPR
>  expression for such kind of usage.
> 4) is translated to BIT_FIELD_REF with constant bitstart and bitsize,
> variable v could also be accessed by register instead of stack, so optabs
> could match the rs6000_expand_vector_insert to generate expected instruction
> through extract_bit_field.
>
> >
> >> Case 3) also couldn't handle the similar usage, and case 4) doesn't 
> >> generate builtin as expected,
> >> it just expand to vec_select by coincidence.  So does this mean both 
> >> vec_insert and vec_extract
> >> and all other similar vector builtins should use IFN as suggested by 
> >> Richard Biener, to match the
> >> pattern in gimple and expand both constant and variable index in expander? 
> >>  Will this also be
> >> beneficial for other targets except power?  Or we should do that gradually 
> >> after this patch
> >> approved as it seems another independent issue?  Thanks:)
> >
> > If the code generated for 3)/4) isn't optimal you have to figure why
> > by tracing the RTL
> > expansion code and looking for missing optabs.
> >
> > Consider the amount of backend code you need to write if ever using
> > those in constexpr
> > context ...
>
> It seems too complicated to expand the "i = VIEW_CONVERT_EXPR(v)[_1];"
> or "VIEW_CONVERT_EXPR(v1)[_1] = i_6(D);" to
> rs6000_expand_vector_insert/rs6000_expand_vector_extract in RTL, as:
> 1) Vector v is stored to stack with array type; need extra load and store 
> operation.
> 2) Requires amount of code to decompose VIEW_CONVERT_EXPR to extract the 
> vector and
> index then call rs6000_expand_vector_insert/rs6000_expand_vector_extract.
>
> which means replace following instructions #9~#12 to new instruction 
> sequences:
> 1: NOTE_INSN_DELETED
> 6: NOTE_INSN_BASIC_BLOCK 2
> 2: r119:V4SI=%2:V4SI
> 3: r120:DI=%5:DI
> 4: 

Re: [PATCH] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-02 Thread luoxhu via Gcc-patches
Hi,

On 2020/9/1 21:07, Richard Biener wrote:
> On Tue, Sep 1, 2020 at 10:11 AM luoxhu via Gcc-patches
>  wrote:
>>
>> Hi,
>>
>> On 2020/9/1 01:04, Segher Boessenkool wrote:
>>> Hi!
>>>
>>> On Mon, Aug 31, 2020 at 04:06:47AM -0500, Xiong Hu Luo wrote:
>>>> vec_insert accepts 3 arguments, arg0 is input vector, arg1 is the value
>>>> to be insert, arg2 is the place to insert arg1 to arg0.  This patch adds
>>>> __builtin_vec_insert_v4si[v4sf,v2di,v2df,v8hi,v16qi] for vec_insert to
>>>> not expand too early in gimple stage if arg2 is variable, to avoid generate
>>>> store hit load instructions.
>>>>
>>>> For Power9 V4SI:
>>>>   addi 9,1,-16
>>>>   rldic 6,6,2,60
>>>>   stxv 34,-16(1)
>>>>   stwx 5,9,6
>>>>   lxv 34,-16(1)
>>>> =>
>>>>   addis 9,2,.LC0@toc@ha
>>>>   addi 9,9,.LC0@toc@l
>>>>   mtvsrwz 33,5
>>>>   lxv 32,0(9)
>>>>   sradi 9,6,2
>>>>   addze 9,9
>>>>   sldi 9,9,2
>>>>   subf 9,9,6
>>>>   subfic 9,9,3
>>>>   sldi 9,9,2
>>>>   subfic 9,9,20
>>>>   lvsl 13,0,9
>>>>   xxperm 33,33,45
>>>>   xxperm 32,32,45
>>>>   xxsel 34,34,33,32
>>>
>>> For v a V4SI, x a SI, j some int,  what do we generate for
>>> v[j&3] = x;
>>> ?
>>> This should be exactly the same as we generate for
>>> vec_insert(x, v, j);
>>> (the builtin does a modulo 4 automatically).
>>
>> No, even with my patch "stxv 34,-16(1);stwx 5,9,6;lxv 34,-16(1)" generated 
>> currently.
>> Is it feasible and acceptable to expand some kind of pattern in expander 
>> directly without
>> builtin transition?
>>
>> I borrowed some of implementation from vec_extract.  For vec_extract, the 
>> issue also exists:
>>
>> source:                     gimple:                                      expand:                                          asm:
>> 1) i = vec_extract (v, n);  =>  i = __builtin_vec_ext_v4si (v, n);       =>  {r120:SI=unspec[r118:V4SI,r119:DI] 134;...}  =>  slwi 9,6,2   vextuwrx 3,9,2
>> 2) i = vec_extract (v, 3);  =>  i = __builtin_vec_ext_v4si (v, 3);       =>  {r120:SI=vec_select(r118:V4SI,parallel)...}  =>  li 9,12      vextuwrx 3,9,2
>> 3) i = v[n%4];              =>  _1 = n & 3;  i = VIEW_CONVERT_EXPR(v)[_1];  =>  ...                                       =>  stxv 34,-16(1); addi 9,1,-16; rldic 5,5,2,60; lwax 3,9,5
>> 4) i = v[3];                =>  i = BIT_FIELD_REF ;                      =>  {r120:SI=vec_select(r118:V4SI,parallel)...}  =>  li 9,12;   vextuwrx 3,9,2
> 
> Why are 1) and 2) handled differently than 3)/4)?

1) and 2) call the builtin function vec_extract, which is defined as
 __builtin_vec_extract and is resolved to ALTIVEC_BUILTIN_VEC_EXTRACT
 by resolve_overloaded_builtin; this generates a call to
 __builtin_vec_ext_v4si that is expanded only in RTL.
3) accesses variable v as an array type with opcode VIEW_CONVERT_EXPR; I
 guess we should also generate a builtin call instead of calling
 convert_vector_to_array_for_subscript to generate a VIEW_CONVERT_EXPR
 expression for this kind of usage.
4) is translated to BIT_FIELD_REF with constant bitstart and bitsize, and
 variable v can be accessed in a register instead of on the stack, so optabs
 can match rs6000_expand_vector_insert to generate the expected instruction
 through extract_bit_field.
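
For anyone wanting to reproduce the subscript forms (case 3 above and the `v[j&3] = x` store Segher asked about) without the AltiVec headers, here is a minimal sketch using the GCC/Clang generic vector extension. The type and function names are made up for illustration; this is not the `vec_insert`/`vec_extract` builtin path, only the plain-C form whose code generation is being compared against it.

```c
/* Generic 4 x int vector type (GCC/Clang vector extension); this is
   an illustrative typedef, not the AltiVec "vector int" type.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Variable-index store: the subscript form of an element insert.  */
v4si
insert_var (v4si v, int x, int j)
{
  v[j & 3] = x;
  return v;
}

/* Variable-index load: the subscript form of an element extract.  */
int
extract_var (v4si v, int n)
{
  return v[n & 3];
}
```

Compiling these two functions with -O2 and reading the assembly is one way to check whether the store/load sequence through the stack is still generated on a given target.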

> 
>> Case 3) also couldn't handle the similar usage, and case 4) doesn't generate 
>> builtin as expected,
>> it just expand to vec_select by coincidence.  So does this mean both 
>> vec_insert and vec_extract
>> and all other similar vector builtins should use IFN as suggested by Richard 
>> Biener, to match the
>> pattern in gimple and expand both constant and variable index in expander?  
>> Will this also be
>> beneficial for other targets except power?  Or we should do that gradually 
>> after this patch
>> approved as it seems another independent issue?  Thanks:)
> 
> If the code generated for 3)/4) isn't optimal you have to figure why
> by tracing the RTL
> expansion code and looking for missing optabs.
> 
> Consider the amount of backend code you need to write if ever using
> those in constexpr
> context ...

It seems too complicated to expand "i = VIEW_CONVERT_EXPR(v)[_1];"
or "VIEW_CONVERT_EXPR(v1)[_1] = i_6(D);" to
rs6000_expand_vector_insert/rs6000_expand_vector_extract in RTL, because:
1) Vector v is stored to the stack as an array type, which needs an extra
load and store operation.
2) It requires a fair amount of code to decompose VIEW_CONVERT_EXPR, extract
the vector and index, and then call
rs6000_expand_vector_insert/rs6000_expand_vector_extract.

That means replacing the following instructions #9~#12 with new instruction
sequences:
1: NOTE_INSN_DELETED
6: NOTE_INSN_BASIC_BLOCK 2
2: r119:V4SI=%2:V4SI
3: r120:DI=%5:DI
4: r121:DI=%6:DI
5: NOTE_INSN_FUNCTION_BEG
8: [r112:DI]=r119:V4SI

9: r122:DI=r121:DI&0x3
   10: r123:DI=r122:DI<<0x2
   11: r124:DI=r112:DI+r123:DI
   12: [r124:DI]=r120:DI#0

   13: r118:V4SI=[r112:DI]
   17: %2:V4SI=r118:V4SI
   18: use %2:V4SI

=>

1: NOTE_INSN_DELETED
6: 

Re: [PATCH] fortran: Fix o'...' boz to integer/real conversions [PR96859]

2020-09-02 Thread Tobias Burnus

On 9/2/20 10:18 AM, Jakub Jelinek via Gcc-patches wrote:

> The standard says that excess digits from boz are truncated.
> For hexadecimal or binary, the routines copy just the number of digits
> that will be needed, but for octal we copy number of digits that
> contain one extra bit (for 8-bit, 32-bit or 128-bit, i.e. kind 1, 4 and 16)
> or two extra bits (for 16-bit or 64-bit, i.e. kind 2 and 8).
> The clearing of the first bit is done correctly by changing the first digit
> if it is 4-7 to one smaller by 4 (i.e. modulo 4).
> The clearing of the first two bits is done by changing 4 or 6 to 0
> and 5 or 7 to 1, which is incorrect, because we really want to change the
> first digit to 0 if it was even, or to 1 if it was odd, so digits
> 2 and 3 are mishandled by keeping them as is, rather than changing 2 to 0
> and 3 to 1.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk and release branches?


The code part of the patch looks okay – as does the testcase.
→ LGTM.

What confused me for a while when looking at the patch is the comment
"Clear first two bits".
This does not really apply to either o'2' = b'10' or
o'3' = b'11', as only the first ('1') bit needs to be removed (ignoring
'0' padding on the left). However, unless you have better wording,
we can leave it as is.
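
To make the digit handling concrete, here is a small stand-alone sketch of the "clear first two bits" step for a leading octal digit, before and after the fix. The function names are invented for illustration and this is not the code in check.c; the new variant is written as "keep only the lowest bit", which gives the same mapping as the patch's explicit comparisons for digits '0' through '7'.

```c
/* Behaviour before the fix: only '4'..'7' were rewritten, so '2' and
   '3' kept their second-highest bit.  */
char
clear_two_bits_old (char d)
{
  if (d == '4' || d == '6')
    return '0';
  if (d == '5' || d == '7')
    return '1';
  return d;
}

/* Behaviour after the fix: even digits become '0', odd digits '1'.  */
char
clear_two_bits_new (char d)
{
  return (char) ('0' + ((d - '0') & 1));
}

/* Value of a string of octal digits.  */
unsigned long
octal_value (const char *s)
{
  unsigned long v = 0;
  for (; *s; s++)
    v = v * 8 + (unsigned long) (*s - '0');
  return v;
}
```

For the kind=2 case in the testcase, the six digits kept from o'1234567' are "234567". With the old mapping the leading '2' survives and the value is 80247, which does not fit in 16 bits; with the new mapping the string becomes "034567", i.e. 14711, matching the expected result in pr96859.f90.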

Thanks for the patch!

Tobias



2020-09-02  Jakub Jelinek  

  PR fortran/96859
  * check.c (gfc_boz2real, gfc_boz2int): When clearing first two bits,
  change also '2' to '0' and '3' to '1' rather than just handling '4'
  through '7'.

  * gfortran.dg/pr96859.f90: New test.

--- gcc/fortran/check.c.jj	2020-08-24 10:00:01.424256990 +0200
+++ gcc/fortran/check.c   2020-09-01 17:15:19.882311053 +0200
@@ -340,9 +340,9 @@ gfc_boz2real (gfc_expr *x, int kind)
/* Clear first two bits.  */
else
  {
-   if (buf[0] == '4' || buf[0] == '6')
+   if (buf[0] == '2' || buf[0] == '4' || buf[0] == '6')
  buf[0] = '0';
-   else if (buf[0] == '5' || buf[0] == '7')
+   else if (buf[0] == '3' || buf[0] == '5' || buf[0] == '7')
  buf[0] = '1';
  }
  }
@@ -429,9 +429,9 @@ gfc_boz2int (gfc_expr *x, int kind)
/* Clear first two bits.  */
else
  {
-   if (buf[0] == '4' || buf[0] == '6')
+   if (buf[0] == '2' || buf[0] == '4' || buf[0] == '6')
  buf[0] = '0';
-   else if (buf[0] == '5' || buf[0] == '7')
+   else if (buf[0] == '3' || buf[0] == '5' || buf[0] == '7')
  buf[0] = '1';
  }
  }
--- gcc/testsuite/gfortran.dg/pr96859.f90.jj  2020-09-01 15:26:26.448799264 +0200
+++ gcc/testsuite/gfortran.dg/pr96859.f90 2020-09-01 15:26:18.492914701 +0200
@@ -0,0 +1,25 @@
+! PR fortran/96859
+! { dg-do run }
+
+program pr96859
+  if (merge_bits(32767_2, o'1234567', 32767_2).ne.32767_2) stop 1
+  if (merge_bits(o'1234567', 32767_2, o'1234567').ne.32767_2) stop 2
+  if (merge_bits(32767_2, o'1234567', b'010101').ne.14711_2) stop 3
+  if (merge_bits(32767_2, o'1234567', z'12345678').ne.32639_2) stop 4
+  if (int (o'1034567', 2).ne.14711_2) stop 5
+  if (int (o'1234567', 2).ne.14711_2) stop 6
+  if (int (o'1434567', 2).ne.14711_2) stop 7
+  if (int (o'1634567', 2).ne.14711_2) stop 8
+  if (int (o'1134567', 2).ne.-18057_2) stop 9
+  if (int (o'1334567', 2).ne.-18057_2) stop 10
+  if (int (o'1534567', 2).ne.-18057_2) stop 11
+  if (int (o'1734567', 2).ne.-18057_2) stop 12
+  if (int (o'70123456776543211234567', 8).ne.1505855851274254711_8) stop 13
+  if (int (o'72123456776543211234567', 8).ne.1505855851274254711_8) stop 14
+  if (int (o'74123456776543211234567', 8).ne.1505855851274254711_8) stop 15
+  if (int (o'76123456776543211234567', 8).ne.1505855851274254711_8) stop 16
+  if (int (o'71123456776543211234567', 8).ne.-7717516185580521097_8) stop 17
+  if (int (o'73123456776543211234567', 8).ne.-7717516185580521097_8) stop 18
+  if (int (o'75123456776543211234567', 8).ne.-7717516185580521097_8) stop 19
+  if (int (o'77123456776543211234567', 8).ne.-7717516185580521097_8) stop 20
+end

  Jakub




Re: [PATCH] middle-end/94301 - map V1x to x when the vector mode is not available

2020-09-02 Thread Richard Biener
On Tue, 1 Sep 2020, Jakub Jelinek wrote:

> On Tue, Sep 01, 2020 at 01:52:21PM +0200, Richard Biener wrote:
> > OK, I'll see to do that (or fixup the RTL expansion side somehow).
> > 
> > Note that clang and gcc disagree about the return value ABI for
> > 
> > typedef double v1df __attribute__((vector_size(8)));
> > 
> > v1df foo (v1df x)
> > {
> >   return x;
> > }
> > 
> > where clang returns in %xmm0 while we return by invisible reference.
> > The argument is passed the same (via stack).  IIRC we've long said
> > the backends should look at the types, not the modes when deciding
> > how to pass / return things ...
> 
> But many of the backends still do use modes.
> If there is an ABI issue, I think we need to find out what the psABI says
> and if it is unclear, discuss with psABI authors.

I've filed PR96895 for this.

Richard.


[PATCH] fortran: Fix o'...' boz to integer/real conversions [PR96859]

2020-09-02 Thread Jakub Jelinek via Gcc-patches
Hi!

The standard says that excess digits from boz are truncated.
For hexadecimal or binary, the routines copy just the number of digits
that will be needed, but for octal we copy the number of digits that
contains one extra bit (for 8-bit, 32-bit or 128-bit, i.e. kind 1, 4 and 16)
or two extra bits (for 16-bit or 64-bit, i.e. kind 2 and 8).
The clearing of the first bit is done correctly by reducing the first digit
by 4 if it is 4-7 (i.e. taking it modulo 4).
The clearing of the first two bits is done by changing 4 or 6 to 0
and 5 or 7 to 1, which is incorrect, because we really want to change the
first digit to 0 if it was even, or to 1 if it was odd, so digits
2 and 3 are mishandled by keeping them as is, rather than changing 2 to 0
and 3 to 1.
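
For illustration only (this is not the code in check.c), the intended
masking of the leading octal digit can be sketched in C as below.  The
helper names clear_extra_bits and masked_octal_value are hypothetical,
and the sketch assumes the buffer already holds exactly the digits that
were copied for the requested kind.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not the gfortran code): return the leading octal
   digit C with its EXTRA (1 or 2) most significant bits cleared.  */
static char
clear_extra_bits (char c, int extra)
{
  assert (c >= '0' && c <= '7');
  assert (extra == 1 || extra == 2);
  int keep = 3 - extra;		/* Low bits of the digit that survive.  */
  return (char) ('0' + ((c - '0') & ((1 << keep) - 1)));
}

/* Hypothetical helper: mask the leading digit of the octal digit string
   BUF in place, then return the value of the whole string.  */
static long
masked_octal_value (char *buf, int extra)
{
  buf[0] = clear_extra_bits (buf[0], extra);
  return strtol (buf, NULL, 8);
}
```

With the six digits "234567" that remain of o'1234567' for kind 2 and
extra set to 2, the leading '2' becomes '0' and the value is 14711,
which is what the new test expects.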

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk and release branches?

2020-09-02  Jakub Jelinek  

PR fortran/96859
* check.c (gfc_boz2real, gfc_boz2int): When clearing first two bits,
change also '2' to '0' and '3' to '1' rather than just handling '4'
through '7'.

* gfortran.dg/pr96859.f90: New test.

--- gcc/fortran/check.c.jj  2020-08-24 10:00:01.424256990 +0200
+++ gcc/fortran/check.c 2020-09-01 17:15:19.882311053 +0200
@@ -340,9 +340,9 @@ gfc_boz2real (gfc_expr *x, int kind)
   /* Clear first two bits.  */
   else
{
- if (buf[0] == '4' || buf[0] == '6')
+ if (buf[0] == '2' || buf[0] == '4' || buf[0] == '6')
buf[0] = '0';
- else if (buf[0] == '5' || buf[0] == '7')
+ else if (buf[0] == '3' || buf[0] == '5' || buf[0] == '7')
buf[0] = '1';
}
 }
@@ -429,9 +429,9 @@ gfc_boz2int (gfc_expr *x, int kind)
   /* Clear first two bits.  */
   else
{
- if (buf[0] == '4' || buf[0] == '6')
+ if (buf[0] == '2' || buf[0] == '4' || buf[0] == '6')
buf[0] = '0';
- else if (buf[0] == '5' || buf[0] == '7')
+ else if (buf[0] == '3' || buf[0] == '5' || buf[0] == '7')
buf[0] = '1';
}
 }
--- gcc/testsuite/gfortran.dg/pr96859.f90.jj	2020-09-01 15:26:26.448799264 +0200
+++ gcc/testsuite/gfortran.dg/pr96859.f90	2020-09-01 15:26:18.492914701 +0200
@@ -0,0 +1,25 @@
+! PR fortran/96859
+! { dg-do run }
+
+program pr96859
+  if (merge_bits(32767_2, o'1234567', 32767_2).ne.32767_2) stop 1
+  if (merge_bits(o'1234567', 32767_2, o'1234567').ne.32767_2) stop 2
+  if (merge_bits(32767_2, o'1234567', b'010101').ne.14711_2) stop 3
+  if (merge_bits(32767_2, o'1234567', z'12345678').ne.32639_2) stop 4
+  if (int (o'1034567', 2).ne.14711_2) stop 5
+  if (int (o'1234567', 2).ne.14711_2) stop 6
+  if (int (o'1434567', 2).ne.14711_2) stop 7
+  if (int (o'1634567', 2).ne.14711_2) stop 8
+  if (int (o'1134567', 2).ne.-18057_2) stop 9
+  if (int (o'1334567', 2).ne.-18057_2) stop 10
+  if (int (o'1534567', 2).ne.-18057_2) stop 11
+  if (int (o'1734567', 2).ne.-18057_2) stop 12
+  if (int (o'70123456776543211234567', 8).ne.1505855851274254711_8) stop 13
+  if (int (o'72123456776543211234567', 8).ne.1505855851274254711_8) stop 14
+  if (int (o'74123456776543211234567', 8).ne.1505855851274254711_8) stop 15
+  if (int (o'76123456776543211234567', 8).ne.1505855851274254711_8) stop 16
+  if (int (o'71123456776543211234567', 8).ne.-7717516185580521097_8) stop 17
+  if (int (o'73123456776543211234567', 8).ne.-7717516185580521097_8) stop 18
+  if (int (o'75123456776543211234567', 8).ne.-7717516185580521097_8) stop 19
+  if (int (o'77123456776543211234567', 8).ne.-7717516185580521097_8) stop 20
+end

Jakub



Re: [committed] improve handling of offset wraparound in -Wstringop-overread

2020-09-02 Thread Christophe Lyon via Gcc-patches
On Wed, 2 Sep 2020 at 00:12, Martin Sebor via Gcc-patches wrote:
>
> ILP32 failures in a test added for the new -Wstringop-overread
> option exposed an unnecessarily restrictive handling of offsets
> in ranges with an upper bound that's apparently less than
> the lower bound.  I have relaxed the handling of this case to
> avoid these failures and improve the efficacy of both
> the new warning as well as -Wstringop-overflow, and committed
> the attached in r11-2973.  (Besides an x86_64-linux bootstrap
> I verified the change by running a subset of tests under
> the arm-eabi cross where the failures were first observed).
>

thanks!

> Martin


Re: [patch][nvptx] libgomp: Split testcase in order to XFAIL __sync_val_compare_and_swap_16 (was: [PATCH] nvptx: Add support for subword compare-and-swap)

2020-09-02 Thread Tom de Vries
On 9/1/20 2:58 PM, Tom de Vries wrote:
> On 9/1/20 1:41 PM, Tobias Burnus wrote:
>> Hi Tom, hello all,
>>
>> it turned out that the testcase fails on PowerPC (but not x86_64)
>> as the nvptx lto complains: unresolved symbol
>> __sync_val_compare_and_swap_16
>>
>> The testcase uses int128 – and that's the culprit, but I have no idea
>> why it only fails with PowerPC and not with x86-64.
>>
> 

Reproduced on x86_64 using trigger patch:
...
$ git diff
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index ed17bb00205..eccedac192f 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -153,9 +153,15 @@
 (DI "TARGET_64BIT || (TARGET_CMPXCHG8B && (TARGET_80387 || TARGET_SSE))")
])

+ (define_mode_iterator ATOMIC2
+[QI HI SI
+ (DI "TARGET_64BIT || (TARGET_CMPXCHG8B && (TARGET_80387 || TARGET_SSE))")
+TI
+])
+
 (define_expand "atomic_load"
-  [(set (match_operand:ATOMIC 0 "nonimmediate_operand")
-   (unspec:ATOMIC [(match_operand:ATOMIC 1 "memory_operand")
+  [(set (match_operand:ATOMIC2 0 "nonimmediate_operand")
+   (unspec:ATOMIC2 [(match_operand:ATOMIC2 1 "memory_operand")
(match_operand:SI 2 "const_int_operand")]
   UNSPEC_LDA))]
   ""
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
index d0e82b04790..62b0e032c33 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-additional-options "-mcx16" } */

 #include 

...

Thanks,
- Tom


Re: [PATCH] bpf: use elfos.h

2020-09-02 Thread Jose E. Marchesi via Gcc-patches


> BPF is an ELF-based target, so it definitely benefits from using
> elfos.h.  This patch makes the target use it, and removes
> superfluous definitions from bpf.h which are better defined in
> elfos.h.
>
> Note that BPF, despite being an ELF target, doesn't use DWARF.  At
> some point it will generate DWARF when generating xBPF (-mxbpf) and
> BTF when generating plain eBPF, but for the time being it just
> generates stabs.
>
> 2020-09-01  Jose E. Marchesi  
>
>   gcc/
>   * config.gcc: Use elfos.h in bpf-*-* targets.
>   * config/bpf/bpf.h (MAX_OFILE_ALIGNMENT): Remove definition.
>   (COMMON_ASM_OP): Likewise.
>   (INIT_SECTION_ASM_OP): Likewise.
>   (FINI_SECTION_ASM_OP): Likewise.
>   (ASM_OUTPUT_SKIP): Likewise.
>   (ASM_OUTPUT_ALIGNED_COMMON): Likewise.
>   (ASM_OUTPUT_ALIGNED_LOCAL): Likewise.

I just installed this in both master and gcc-10.
Salud!


[COMMITTED] bpf: use the default asm_named_section target hook

2020-09-02 Thread Jose E. Marchesi via Gcc-patches
This patch stops the BPF backend from providing its own implementation
of the asm_named_section hook; the default handler works perfectly
well.

2020-09-02  Jose E. Marchesi  

gcc/
* config/bpf/bpf.c (bpf_asm_named_section): Delete.
(TARGET_ASM_NAMED_SECTION): Likewise.
---
 gcc/config/bpf/bpf.c | 17 -
 1 file changed, 17 deletions(-)

diff --git a/gcc/config/bpf/bpf.c b/gcc/config/bpf/bpf.c
index 84d17d4a27f..972a91adcd8 100644
--- a/gcc/config/bpf/bpf.c
+++ b/gcc/config/bpf/bpf.c
@@ -219,23 +219,6 @@ bpf_target_macros (cpp_reader *pfile)
   }
 }
 
-/* Output assembly directives to switch to section NAME.  The section
-   should have attributes as specified by FLAGS, which is a bit mask
-   of the 'SECTION_*' flags defined in 'output.h'.  If DECL is
-   non-NULL, it is the 'VAR_DECL' or 'FUNCTION_DECL' with which this
-   section is associated.  */
-
-static void
-bpf_asm_named_section (const char *name,
-  unsigned int flags ATTRIBUTE_UNUSED,
-  tree decl ATTRIBUTE_UNUSED)
-{
-  fprintf (asm_out_file, "\t.section\t%s\n", name);
-}
-
-#undef TARGET_ASM_NAMED_SECTION
-#define TARGET_ASM_NAMED_SECTION bpf_asm_named_section
-
 /* Return an RTX representing the place where a function returns or
receives a value of data type RET_TYPE, a tree node representing a
data type.  */
-- 
2.25.0.2.g232378479e