[Bug ipa/100100] missed optimization for dead code elimination at -O3 (vs. -O1, -Os, -O2)

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100100

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org
            Version|unknown                     |11.0
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-04-15

--- Comment #1 from Richard Biener  ---
Different inlining somehow causes different CCP:

-;; Function n (n, funcdef_no=3, decl_uid=1980, cgraph_uid=4, symbol_order=14)
+;; Function j (j, funcdef_no=2, decl_uid=1977, cgraph_uid=3, symbol_order=13)

 Adding destination of edge (0 -> 2) to worklist

 Simulating block 2

 Visiting statement:
-a.0_1 = a;
-which is likely CONSTANT
-Lattice value changed to CONSTANT 0.  Adding SSA edges to worklist.
-marking stmt to be not simulated again
-
-Visiting statement:
-_6 = a.0_1 & 65535;
+_8 = q_2(D) & 65535;
 which is likely CONSTANT
-Match-and-simplified a.0_1 & 65535 to 0
-Lattice value changed to CONSTANT 0.  Adding SSA edges to worklist.
+Lattice value changed to CONSTANT 0x0 (0x).  Adding SSA edges to worklist.

so we know 'a' is zero (after IPA) but at -O3 a is passed as parameter
(but IPA CP didn't figure out its constant value).

Still odd that the inlining decision is different.
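
The difference between the two dumps can be illustrated with a toy model. This is a minimal sketch, not GCC's CCP implementation: it assumes a lattice value that tracks known bits as a value plus a mask of unknown bits, and the names `KnownBits`, `constant`, `unknown`, `and_with_constant` and `is_fully_known` are hypothetical. When the operand is the known constant 0, the AND folds to a full constant; when the operand is an unknown parameter, only the bits cleared by the mask become known.

```cpp
#include <cstdint>

// Toy known-bits lattice value (hypothetical; not GCC's data structure).
struct KnownBits {
  std::uint64_t value;  // bit values where known (0 where unknown)
  std::uint64_t mask;   // 1 = this bit is unknown
};

// A fully known constant.
inline KnownBits constant(std::uint64_t v) { return {v, 0}; }

// A value about which nothing is known, e.g. an incoming parameter.
inline KnownBits unknown() { return {0, ~std::uint64_t{0}}; }

// Transfer function for `lhs = x & c` with a constant c:
// every bit that is 0 in c becomes known-zero in the result.
inline KnownBits and_with_constant(KnownBits x, std::uint64_t c) {
  return {x.value & c, x.mask & c};
}

// True when every bit is known, so the value can be folded.
inline bool is_fully_known(KnownBits b) { return b.mask == 0; }
```

With `constant(0)` as input the result is fully known and the later `if (m)` test can be folded away; with `unknown()` as input the low 16 bits stay unknown, so the call to `foo` has to be kept.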

[Bug c++/80456] [8/9/10/11 Regression] calling constexpr member function from volatile-qualified member function: error: ‘this’ is not a constant expression

2021-04-15 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80456

Patrick Palka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |4.8.1
            Summary|calling constexpr member    |[8/9/10/11 Regression]
                   |function from               |calling constexpr member
                   |volatile-qualified member   |function from
                   |function: error: ‘this’ is  |volatile-qualified member
                   |not a constant expression   |function: error: ‘this’ is
                   |                            |not a constant expression
      Known to fail|                            |10.3.0, 11.0, 8.4.0, 9.3.0
                 CC|                            |ppalka at gcc dot gnu.org
   Target Milestone|---                         |8.5

--- Comment #8 from Patrick Palka  ---
We apparently started rejecting the testcase starting with GCC 4.9 (r0-122547
or perhaps r0-122549); GCC 4.8 accepts.  So I suppose we should consider this
PR a regression.

[Bug target/99767] [9/10/11 Regression] ICE in expand_direct_optab_fn, at internal-fn.c:3360

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99767

--- Comment #4 from Richard Biener  ---
Note for quite some of the failed-DCE PRs we could "fix up" at RTL expansion
time by teaching rewrite_out_of_ssa to DCE all zero-use defs where it already
does

  /* Eliminate PHIs which are of no use, such as virtual or dead phis.  */
  eliminate_useless_phis ();

for example by walking all SSA names and seeding a bitmap for
simple_dce_from_worklist.
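
The idea of seeding a worklist with zero-use definitions can be sketched on a toy IR. This is only an illustration under stated assumptions, not the interface of simple_dce_from_worklist: the `Def` structure and the function `dce_zero_use_defs` are hypothetical, and definitions with side effects stand in for statements that must be kept.

```cpp
#include <cstddef>
#include <vector>

// Toy SSA definition (hypothetical): each def lists the defs it uses.
struct Def {
  std::vector<std::size_t> operands;  // indices of the defs used by this one
  bool has_side_effects = false;      // e.g. a store or a call: never removed
  bool removed = false;
};

// Removes every def without side effects whose result is unused, and
// cascades to operands that become unused. Returns the number removed.
inline std::size_t dce_zero_use_defs(std::vector<Def>& defs) {
  // Count the uses of every def.
  std::vector<std::size_t> uses(defs.size(), 0);
  for (const Def& d : defs)
    for (std::size_t op : d.operands) ++uses[op];

  // Seed the worklist with all zero-use defs.
  std::vector<std::size_t> worklist;
  for (std::size_t i = 0; i < defs.size(); ++i)
    if (uses[i] == 0 && !defs[i].has_side_effects) worklist.push_back(i);

  std::size_t removed = 0;
  while (!worklist.empty()) {
    std::size_t i = worklist.back();
    worklist.pop_back();
    if (defs[i].removed) continue;
    defs[i].removed = true;
    ++removed;
    // Removing a def releases its operands; newly dead ones are queued.
    for (std::size_t op : defs[i].operands)
      if (--uses[op] == 0 && !defs[op].has_side_effects && !defs[op].removed)
        worklist.push_back(op);
  }
  return removed;
}
```

A def that is still used by a statement with side effects survives, while a chain of defs that only feed each other is removed from the end backwards.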

[Bug preprocessor/100099] Compilation speed of #include is too slow. Just include the header takes 0.342 seconds

2021-04-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100099

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
And what do you expect?  That means including over 1.5MB of preprocessed
source, with over 5500 inline functions.

[Bug ipa/100100] New: missed optimization for dead code elimination at -O3 (vs. -O1, -Os, -O2)

2021-04-15 Thread zhendong.su at inf dot ethz.ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100100

            Bug ID: 100100
           Summary: missed optimization for dead code elimination at -O3
                    (vs. -O1, -Os, -O2)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zhendong.su at inf dot ethz.ch
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

[538] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/11.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap
--prefix=/local/suz-local/software/local/gcc-trunk --enable-languages=c,c++
--disable-werror --enable-multilib --with-system-zlib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.1 20210415 (experimental) [master revision
4dd9e1c541e:7315804b0a0:b5f644a98b3f3543d3a8d2dfea7785c22879013f] (GCC) 
[539] % 
[539] % gcctk -O1 -S -o O1.s small.c
[540] % gcctk -O3 -S -o O3.s small.c
[541] % 
[541] % wc O1.s O3.s
  79  175 1126 O1.s
  98  196 1466 O3.s
 177  371 2592 total
[542] % 
[542] % grep foo O1.s
[543] % grep foo O3.s
jmp foo
[544] % 
[544] % cat small.c
extern void foo(void);
static int a, d, e, h, o, p;
static short b, c, f, *i = 
static char g;
static void l() {
  d = 0;
  for (; d < 4; d++)
    for (; f; f++)
      c = 0;
  for (; c; c++)
    ;
}
static void k(unsigned short q) {
  l();
  for (; g; g++)
    h = e == p;
}
static void j(int q) {
  k(q);
  unsigned char m = q;
  o = m;
  if (m)
    foo();
}
void n() {
  j(a);
  *i = 0;
}
int main() {
  n();
  return 0;
}

[Bug preprocessor/100099] New: Compilation speed of #include is too slow. Just include the header takes 0.342 seconds

2021-04-15 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100099

            Bug ID: 100099
           Summary: Compilation speed of #include is too
                    slow. Just include the header takes 0.342 seconds
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: preprocessor
          Assignee: unassigned at gcc dot gnu.org
          Reporter: unlvsur at live dot com
  Target Milestone: ---

Compilation speed of #include is too slow.

[PATCH][pushed] docs: remove itemx for a param

2021-04-15 Thread Martin Liška
gcc/ChangeLog:

* doc/invoke.texi: Other params don't use it, remove it.
---
 gcc/doc/invoke.texi | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 17551246477..096cebc8562 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13049,7 +13049,6 @@ also use other heuristics to decide whether if-conversion is likely to be
 profitable.
 
 @item max-rtl-if-conversion-predictable-cost
-@itemx max-rtl-if-conversion-unpredictable-cost
 RTL if-conversion will try to remove conditional branches around a block
 and replace them with conditionally executed instructions.  These parameters
 give the maximum permissible cost for the sequence that would be generated
-- 
2.31.1



[Bug fortran/100098] New: Polymorphic pointers and allocatables have incorrect rank

2021-04-15 Thread jrfsousa at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100098

            Bug ID: 100098
           Summary: Polymorphic pointers and allocatables have incorrect
                    rank
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jrfsousa at gmail dot com
  Target Milestone: ---

Created attachment 50601
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50601&action=edit
Fortran code showing problem

Hi All!

Rank information is not correctly written into the pointer and allocatable
polymorphic object descriptors.

Seen on:

GNU Fortran (GCC) 11.0.1 20210415 (experimental)
GNU Fortran (GCC) 10.3.1 20210415

Thank you very much.

Best regards,
José Rui

Re: [PATCH] [libstdc++] Refactor/cleanup of atomic wait implementation

2021-04-15 Thread Jonathan Wakely via Gcc-patches

On 23/03/21 12:00 -0700, Thomas Rodgers wrote:

From: Thomas Rodgers 

* This patch addresses jwakely's previous feedback.
* This patch also subsumes thiago.macie...@intel.com 's 'Uncontroversial


If this part is intended as part of the commit msg let's put Thiago's
name rather than email address, but I'm assuming this preamble isn't
intended for the commit anyway.


 improvements to C++20 wait-related implementation'.
* This patch also changes the atomic semaphore implementation to avoid
 checking for any waiters before a FUTEX_WAKE op.

This is a substantial rewrite of the atomic wait/notify (and timed wait
counterparts) implementation.

The previous __platform_wait looped on EINTR however this behavior is
not required by the standard. A new _GLIBCXX_HAVE_PLATFORM_WAIT macro
now controls whether wait/notify are implemented using a platform
specific primitive or with a platform agnostic mutex/condvar. This
patch only supplies a definition for linux futexes. A future update
could add support __ulock_wait/wake on Darwin, for instance.

The members of __waiters were lifted to a new base class. The members
are now arranged such that overall sizeof(__waiters_base) fits in two
cache lines (on platforms with at least 64 byte cache lines). The
definition will also use destructive_interference_size for this if it
is available.


N.B. that makes the ABI potentially different with different
compilers, e.g. if you compile it today it will use 64, but then you
compile it with some future version of Clang that defines the
interference sizes it might use a different value. That's OK for now,
but is something to be aware of and remember.
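
The ABI caveat can be made concrete with a small sketch. It is not the layout from the patch: the names `kWaiterAlign` and `WaiterSlot` are hypothetical, and the fallback of 64 bytes is the assumption mentioned in the description above. The alignment of the type, and therefore its size, depends on whether the compiler defines the interference-size constant and on the value it picks.

```cpp
#include <cstddef>
#include <new>

// Use the library constant when the implementation provides it,
// otherwise assume 64-byte cache lines (an assumption, not a guarantee).
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kWaiterAlign = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kWaiterAlign = 64;
#endif

// Hypothetical per-address wait state, padded to avoid false sharing.
struct alignas(kWaiterAlign) WaiterSlot {
  unsigned waiter_count = 0;
  unsigned version = 0;
};

// Both properties change if kWaiterAlign changes between compilers.
static_assert(alignof(WaiterSlot) == kWaiterAlign, "alignment follows the constant");
static_assert(sizeof(WaiterSlot) % kWaiterAlign == 0, "size is padded to the alignment");
```

Two translation units built with compilers that disagree on the constant would disagree on `sizeof(WaiterSlot)`, which is why the value matters for any object that crosses a library boundary.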



The __waiters type is now specific to untimed waits. Timed waits have a
corresponding __timed_waiters type. Much of the code has been moved from
the previous __atomic_wait() free function to the __waiter_base template
and a __waiter derived type is provided to implement the un-timed wait
operations. A similar change has been made to the timed wait
implementation.


While reading this code I keep getting confused between __waiter
singular and __waiters plural. Would something like __waiter_pool or
__waiters_mgr work instead of __waiters?


The __atomic_spin code has been extended to take a spin policy which is
invoked after the initial busy wait loop. The default policy is to
return from the spin. The timed wait code adds a timed backoff spinning
policy. The code from  which implements this_thread::sleep_for,
sleep_until has been moved to a new  header
which allows the thread sleep code to be consumed without pulling in the
whole of .


The new header is misnamed. The existing  headers all
define std::foo, but this doesn't define std::thread::sleep* or
std::thread_sleep*. I think  would be fine, or
 if you prefer that.

The original reason I introduced  was that
 seemed too likely to clash with something in glibc or
another project using "bits" as a prefix, so I figured std_mutex.h for
std::mutex would be safer. I had the same concern for 
and so that's  too, but I think thread_sleep is
probably sufficiently un-clashy, and this_thread_sleep definitely so.




The entry points into the wait/notify code have been restructured to
support either -
  * Testing the current value of the atomic stored at the given address
and waiting on a notification.
  * Applying a predicate to determine if the wait was satisfied.
The entry points were renamed to make it clear that the wait and wake
operations operate on addresses. The first variant takes the expected
value and a function which returns the current value that should be used
in comparison operations, these operations are named with a _v suffix
(e.g. 'value'). All atomic<_Tp> wait/notify operations use the first
variant. Barriers, latches and semaphores use the predicate variant.

This change also centralizes what it means to compare values for the
purposes of atomic::wait rather than scattering through individual
predicates.


I like this a lot more, thanks.
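
The split between a value-based and a predicate-based entry point can be sketched with a plain mutex and condition variable. This is not the libstdc++ code: `WaitState`, `wait_until_pred`, `wait_while_equal` and `update_and_notify_all` are hypothetical names, and the sketch ignores futexes, spinning and timeouts. It only shows how the value variant reduces to the predicate variant, so the comparison lives in one place.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical shared wait state for one address.
struct WaitState {
  std::mutex m;
  std::condition_variable cv;
};

// Predicate variant: block until pred() returns true.
// pred is always evaluated with the mutex held.
template <typename Pred>
void wait_until_pred(WaitState& s, Pred pred) {
  std::unique_lock<std::mutex> lock(s.m);
  s.cv.wait(lock, pred);
}

// Value variant: block while load() still compares equal to old.
// The comparison is written once, here, instead of in every caller.
template <typename T, typename Load>
void wait_while_equal(WaitState& s, const T& old, Load load) {
  wait_until_pred(s, [&] { return !(load() == old); });
}

// Writer side: change the value under the lock, then wake all waiters.
template <typename Update>
void update_and_notify_all(WaitState& s, Update update) {
  {
    std::lock_guard<std::mutex> lock(s.m);
    update();
  }
  s.cv.notify_all();
}
```

A caller waiting on a value passes the expected old value and a function that reads the current one; a barrier or latch would call the predicate variant directly.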



diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index 2dc00676054..2e46691c59a 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -1017,8 +1015,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  wait(const _Tp* __ptr, _Val<_Tp> __old,
   memory_order __m = memory_order_seq_cst) noexcept
  {
-   std::__atomic_wait(__ptr, __old,
-   [=]() { return load(__ptr, __m) == __old; });
+   std::__atomic_wait_address_v(__ptr, __old,
+   [__ptr, __m]() { return load(__ptr, __m); });


Pre-existing, but __ptr is dependent here so this needs to call
__atomic_impl::load to prevent ADL.
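
The ADL concern can be reproduced in a few lines. This is a reduced sketch with hypothetical names (`lib_impl`, `user::Widget`), not the libstdc++ code: because the argument type is dependent, an unqualified call to `load` also looks in the namespaces associated with that type, and a non-template overload found there is preferred over the library's template.

```cpp
namespace lib_impl {

// The function the library means to call.
template <typename T>
int load(const T*) { return 1; }

// Unqualified call: argument-dependent lookup may find another load.
template <typename T>
int wait_unqualified(const T* p) { return load(p); }

// Qualified call: ADL is suppressed, only lib_impl::load is considered.
template <typename T>
int wait_qualified(const T* p) { return lib_impl::load(p); }

}  // namespace lib_impl

namespace user {

struct Widget {};

// An unrelated function that happens to share the name. As a non-template
// exact match it wins overload resolution when ADL finds it.
inline int load(const Widget*) { return 2; }

}  // namespace user
```

For `user::Widget` the unqualified version silently calls `user::load`, while the qualified version keeps calling the library function; for a type such as `int`, which has no associated namespace, both behave the same.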




diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h
index a0c5ef4374e..4b876236d2b 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -36,6 +36,7 @@

#if 

[Bug fortran/100097] New: Unlimited polymorphic pointers and allocatables have incorrect rank

2021-04-15 Thread jrfsousa at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100097

            Bug ID: 100097
           Summary: Unlimited polymorphic pointers and allocatables have
                    incorrect rank
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jrfsousa at gmail dot com
  Target Milestone: ---

Created attachment 50600
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50600&action=edit
Fortran code showing problem

Hi All!

Rank information is not correctly written into the pointer and allocatable
unlimited polymorphic descriptors.

Seen on:

GNU Fortran (GCC) 11.0.1 20210415 (experimental)
GNU Fortran (GCC) 10.3.1 20210415

Thank you very much.

Best regards,
José Rui

[Bug c++/100091] [11 Regression] decltype([]{}) rejected as a default template parameter

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100091

--- Comment #3 from Richard Biener  ---
Of course such use of a lambda is quite pointless which in turn makes the P1
classification a bit odd (but given it's a new regression it's technically
correct).  We'll be happy to demote this to P2 though.

Re: [Patch, fortran] PR fortran/100094 - Undefined pointers have incorrect rank when using optimization

2021-04-15 Thread Tobias Burnus

On 15.04.21 13:56, José Rui Faustino de Sousa via Gcc-patches wrote:


Proposed patch to:
PR100094 - Undefined pointers have incorrect rank when using optimization
Patch tested only on x86_64-pc-linux-gnu.


LGTM - thanks!

Tobias


Pointers and allocatables must carry TKR (type, kind, rank) information
even when undefined. The patch adds code to initialize the element size,
rank and type of both pointers and allocatables as early as possible.
Later initialization would work for allocatables, but not for pointers,
since one cannot meaningfully test the association status of
undefined pointers.

Thank you very much.

Best regards,
José Rui

Fortran: Add missing TKR initialization [PR100094]

gcc/fortran/ChangeLog:

PR fortran/100094
* trans-array.c (gfc_trans_deferred_array): Add code to initialize
pointers and allocatables with correct TKR parameters.

gcc/testsuite/ChangeLog:

PR fortran/100094
* gfortran.dg/PR100094.f90: New test.




[Bug jit/100096] libgccjit.so.0: Cannot write-enable text segment: Permission denied on NetBSD 9.1

2021-04-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100096

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
How did you build libgccjit.so.0?
Have you used --enable-host-shared during configure?

Re: [committed] gimple UIDs, LTO and -fanalyzer [PR98599]

2021-04-15 Thread David Malcolm via Gcc-patches
On Thu, 2021-04-15 at 11:45 +0200, Jan Hubicka wrote:
> Hi,
> this is a patch fixing the underlying issue of functions missing
> lto_prepare_function_for_streaming because gimple_has_body_p is not the
> same thing as node.has_gimple_body (which needs to be clarified next
> stage1 by finding better names for this, I suppose).
> 
> I committed it to gcc 11 even though we already have your workaround,
> since it is small and safe and it may save some pain when backporting
> changes to the branch in future - basically all passes at WPA
> renumbering statements would hit this issue, which is not that obvious
> to debug, as we found :)
> 
> 

I think it's just the analyzer that's affected in gcc 11 (and plugins,
I suppose), hence I went with the localized fix, but it's your call.


> We may backport it to gcc10 too if you prefer it over your fix - I
> think both are fine in general for release branches.
> 

The analyzer started changing stmt uids in gcc 11 (specifically in
b0702ac5588333e27d7ec43d21d704521f7a05c6, on 2020-10-27), so I think
the fix would only affect plugins in older releases.

Dave


> lto-bootstrapped/regtested x86_64-linux.
> 
> Honza
> 
> 2021-04-15  Jan Hubicka  
> 
> PR lto/98599
> * lto.c (lto_wpa_write_files): Fix handling of clones.
> 
> diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> index ceb61bb300b..5903f75ac23 100644
> --- a/gcc/lto/lto.c
> +++ b/gcc/lto/lto.c
> @@ -306,7 +306,7 @@ lto_wpa_write_files (void)
>    cgraph_node *node;
>    /* Do body modifications needed for streaming before we fork out
>   worker processes.  */
> -  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
> +  FOR_EACH_FUNCTION (node)
>  if (!node->clone_of && gimple_has_body_p (node->decl))
>    lto_prepare_function_for_streaming (node);
>  
> 




P1 patch ping

2021-04-15 Thread Jakub Jelinek via Gcc-patches
Hi!

I'd like to ping this patch, it is one of the last 4 P1s we have for GCC11.

Thanks.

On Thu, Apr 08, 2021 at 04:15:42PM -0600, Martin Sebor via Gcc-patches wrote:
> PR c/99420 - bogus -Warray-parameter on a function redeclaration in function 
> scope
> PR c/99972 - missing -Wunused-result on a call to a locally redeclared 
> warn_unused_result function
> 
> gcc/c/ChangeLog:
> 
>   PR c/99420
>   PR c/99972
>   * c-decl.c (pushdecl): Always propagate type attribute.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c/99420
>   PR c/99972
>   * gcc.dg/Warray-parameter-9.c: New test.
>   * gcc.dg/Wnonnull-6.c: New test.
>   * gcc.dg/Wreturn-type3.c: New test.
>   * gcc.dg/Wunused-result.c: New test.
>   * gcc.dg/attr-noreturn.c: New test.
>   * gcc.dg/attr-returns-nonnull.c: New test.

Jakub



[committed] testsuite: enable pr86058.c also on i?86-*-* [PR100073]

2021-04-15 Thread Jakub Jelinek via Gcc-patches
On Wed, Apr 14, 2021 at 07:50:37PM +0200, Jakub Jelinek wrote:
> On Wed, Apr 14, 2021 at 10:49:42AM -0600, Martin Sebor via Gcc-patches wrote:
> > Apparently the IL GCC emits on some targets (arm and aarch64 with
> > mabi=ilp32, and powerpc64 to name the three where the failures have
> > been pointed out) isn't handled by the uninit pass and so it doesn't
> > issue the expected warning.  That might be a new (as in previously
> > unknown) limitation in the warning or one I don't remember coming
> > across.
> > 
> > I don't see excess warnings with my arm-eabi cross-compiler.  What
> > are they in your environment?
> > 
> > I have limited the test to just x86_64 for now and repurposed pr100073
> > where the same failure was reported on powerpc64 to track the missing
> > warning on these targets.
> 
> +   The test fails on a number of non-x86_64 targets due to pr100073.
> +   { dg-do compile { target x86_64-*-* } }
> 
> change is incorrect.

I have tested it and the test works the same for -m64/-m32/-mx32, therefore
I chose:
> or you mean x86_64 -m64/-mx32/-m32, then it should be
> { i?86-*-* x86_64-*-* }

Tested on x86_64-linux and i686-linux, committed to trunk.

2021-04-15  Jakub Jelinek  

PR testsuite/100073
* gcc.dg/pr86058.c: Enable also on i?86-*-*.

--- gcc/testsuite/gcc.dg/pr86058.c.jj   2021-04-15 10:40:33.449919170 +0200
+++ gcc/testsuite/gcc.dg/pr86058.c  2021-04-15 14:04:02.247335188 +0200
@@ -1,7 +1,7 @@
 /* PR middle-end/86058 - TARGET_MEM_REF causing incorrect message for
-Wmaybe-uninitialized warning
-   The test fails on a number of non-x86_64 targets due to pr100073.
-   { dg-do compile { target x86_64-*-* } }
+   The test fails on a number of non-x86 targets due to pr100073.
+   { dg-do compile { target i?86-*-* x86_64-*-* } }
{ dg-options "-O2 -Wuninitialized -Wmaybe-uninitialized" } */
 
 extern void foo (int *);


Jakub



[Bug tree-optimization/100073] missing warning on an uninitialized array read in a loop

2021-04-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100073

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:4d1fa72894e3fbc5f331d2e8984e990307396124

commit r11-8194-g4d1fa72894e3fbc5f331d2e8984e990307396124
Author: Jakub Jelinek 
Date:   Thu Apr 15 14:08:03 2021 +0200

testsuite: enable pr86058.c also on i?86-*-* [PR100073]

The test also works with -m32 or -mx32 the same as it does for -m64,
therefore it should be enabled for i?86-*-* x86_64-*-* targets,
x86_64-*-* alone is never right.

2021-04-15  Jakub Jelinek  

PR testsuite/100073
* gcc.dg/pr86058.c: Enable also on i?86-*-*.

[Bug tree-optimization/100095] missed optimization for dead code elimination at -O3 (vs. -O2)

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100095

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |85316
            Version|unknown                     |11.0
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-04-15
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener  ---
VRP optimizes this at -O2 but fails at -O3 where we fail to do some
final value replacement as we unswitch the loop nest on the b != 0 condition
which makes us end up with multiple exits which final_value_replacement_loop
doesn't like (it's somewhat artificial what we end up with - leaving
code commoning on the plate).

In the end the trigger of the missing optimization is unswitching.
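
The two transformations named above can be written out by hand on a toy loop. This is a sketch only: it is not the testcase from the bug, it does not reproduce the multiple-exit shape that blocks the optimization, and the function names are hypothetical. It shows what unswitching on a loop-invariant condition does and what final value replacement computes for the induction variable.

```cpp
// Original loop: the test on b is loop-invariant but sits inside the loop.
inline int loop_original(int n, int b, long* sink) {
  int i;
  for (i = 0; i < n; ++i) {
    if (b != 0)
      *sink += i;
    else
      *sink -= i;
  }
  return i;  // value of i after the loop
}

// After unswitching on b != 0: the test is hoisted and the loop is
// duplicated, one copy per outcome.
inline int loop_unswitched(int n, int b, long* sink) {
  int i;
  if (b != 0) {
    for (i = 0; i < n; ++i) *sink += i;
  } else {
    for (i = 0; i < n; ++i) *sink -= i;
  }
  return i;
}

// Final value replacement: the exit value of i is known in closed form,
// so a use after the loop does not need the loop at all.
inline int final_value_of_i(int n) { return n > 0 ? n : 0; }
```

In the single-exit form the closed-form value is easy to substitute; once the loop nest has been duplicated and the exits multiply, the replacement has to reason about each exit, which is where the comment above locates the missed optimization.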


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85316
[Bug 85316] [meta-bug] VRP range propagation missed cases

[PATCH] testsuite: Fix unroll-and-jam.c on IBM Z

2021-04-15 Thread Stefan Schulze Frielinghaus via Gcc-patches
On z10 and newer, inner loops are completely unrolled, which leaves no
inner loops to jam and makes this testcase fail.  Reverting
max-completely-peel-times to the default value fixes the testcase.
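
For readers unfamiliar with the transformation, unroll-and-jam can be shown by hand on a small loop nest. This is an illustrative sketch, not the testcase: `kRows`, `kCols`, `Grid` and the two functions are hypothetical, and the row count is assumed to be even. The outer loop is unrolled by two and the two resulting copies of the inner loop are fused ("jammed") into one.

```cpp
#include <array>

constexpr int kRows = 4;  // assumed even so the unrolled loop needs no remainder
constexpr int kCols = 3;
using Grid = std::array<std::array<int, kCols>, kRows>;

// Plain loop nest.
inline void scale_plain(Grid& out, const Grid& in) {
  for (int i = 0; i < kRows; ++i)
    for (int j = 0; j < kCols; ++j)
      out[i][j] = in[i][j] * 2;
}

// Outer loop unrolled by 2, the two inner loops jammed into one body.
inline void scale_unroll_and_jam(Grid& out, const Grid& in) {
  for (int i = 0; i < kRows; i += 2)
    for (int j = 0; j < kCols; ++j) {
      out[i][j] = in[i][j] * 2;
      out[i + 1][j] = in[i + 1][j] * 2;
    }
}
```

The jam step needs an inner loop to fuse. If the inner loop has already been completely unrolled into straight-line code, nothing is left for the pass to work on, which matches the failure described above.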

gcc/testsuite/ChangeLog:

* gcc.dg/unroll-and-jam.c: Revert max-completely-peel-times to
the default value on IBM Z.

Ok for mainline?

---
 gcc/testsuite/gcc.dg/unroll-and-jam.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/unroll-and-jam.c b/gcc/testsuite/gcc.dg/unroll-and-jam.c
index 7eb64217a05..b8f4f16dc74 100644
--- a/gcc/testsuite/gcc.dg/unroll-and-jam.c
+++ b/gcc/testsuite/gcc.dg/unroll-and-jam.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O3 -floop-unroll-and-jam -fno-tree-loop-im --param unroll-jam-min-percent=0 -fdump-tree-unrolljam-details" } */
+/* { dg-additional-options "--param max-completely-peel-times=16" { target { s390*-*-* } } } */
 /* { dg-require-effective-target int32plus } */
 
 #include 
-- 
2.23.0



Re: [Patch, fortran] 99307 - FAIL: gfortran.dg/class_assign_4.f90 execution test

2021-04-15 Thread Paul Richard Thomas via Gcc-patches
Pushed to master in commit 9a0e09f3dd5339bb18cc47317f2298d9157ced29

Thanks

Paul


On Wed, 14 Apr 2021 at 14:51, Tobias Burnus  wrote:

> On 11.04.21 09:05, Paul Richard Thomas wrote:
> > Tobias noticed a major technical fault with the resubmission below: I
> > forgot to attach the patch :-(
>
> LGTM. Plus as remarked in the first review: 'trans-expr_c' typo needs to
> be fixed (ChangeLog).
>
> Tobias
>
> >
> > Please find it attached this time.
> >
> > Paul
> >
> > On Tue, 6 Apr 2021 at 18:08, Paul Richard Thomas
> > <paul.richard.tho...@gmail.com> wrote:
> >
> > Hi Tobias,
> >
> > I believe that the attached fixes the problems that you found with
> > gfc_find_and_cut_at_last_class_ref.
> >
> > I will test:
> >type1%type%array_class2 → NULL is returned  (why?)
> >class1%type%array_class2 → ts = class1 but array2_class is used
> > later on (ups!)
> >class1%...%scalar_class2 → ts = class1 but scalar_class2 is used
> >
> > The ChangeLogs remain the same, apart from the date.
> >
> > Regtests OK on FC33/x86_64.
> >
> > Paul
> >
> >
> > On Mon, 29 Mar 2021 at 14:58, Tobias Burnus
> > <tob...@codesourcery.com> wrote:
> >
> > Hi all,
> >
> > as a preliminary remark I want to note that the testcase
> > class_assign_4.f90 was added for PR83118/PR96012 (fixes problems in
> > handling class objects, Dec 18, 2020)
> > and got revised for PR99124 (class defined operators, Feb 23, 2021).
> > Both patches were then also applied to GCC 9 and 10.
> >
> > On 26.03.21 17:30, Paul Richard Thomas via Gcc-patches wrote:
> > > This patch comes in two versions: submit.diff with Change.Logs or
> > > submit2.diff with Change2.Logs.
> > > The first fixes the problem by changing array temporaries from class
> > > expressions into class temporaries. This permits the use of
> > > gfc_get_class_from_expr to obtain the vptr for these temporaries and all
> > > the good things that come with that when handling dynamic types. The second
> > > part of the fix is to use the array element length from the class
> > > descriptor, when reallocating on assignment. This is needed because the
> > > vptr is being set too early. I will set about trying to track down why this
> > > is happening and fix it after release.
> > >
> > > The second version does the same as the first but puts in place a load of
> > > tidying up that is permitted by the fix to class array temporaries.
> >
> > > I couldn't readily see how to prepare a testcase - ideas?
> > > Both regtest on FC33/x86_64. The first was tested by Dominique (see the
> > > PR). OK for master?
> >
> > Typo – underscore-'c' should be a dot-'c' – both changelog files
> >
> > >   * trans-expr_c (gfc_trans_scalar_assign): Make use of pre and
> >
> > I think the second longer version is nicer in general, but at least for
> > GCC 9/GCC10 the first version is simpler and, hence, less error prone.
> >
> > As you only ask about mainline, I would prefer the second one.
> >
> > However, I am not happy about gfc_find_and_cut_at_last_class_ref:
> >
> > > + of refs following. If ts is non-null the cut is at the class entity
> > > + or component that is followed by an array reference, which is not
> > > + an element. */
> > > ...
> > > +
> > > +  if (ts)
> > > +    {
> > > +      if (e->symtree
> > > +          && e->symtree->n.sym->ts.type == BT_CLASS)
> > > +        *ts = &e->symtree->n.sym->ts;
> > > +      else
> > > +        *ts = NULL;
> > > +    }
> > > +
> > >    for (ref = e->ref; ref; ref = ref->next)
> > >      {
> > > +      if (ts && ref->type == REF_COMPONENT
> > > +          && ref->u.c.component->ts.type == BT_CLASS
> > > +          && ref->next && ref->next->type == REF_COMPONENT
> > > +          && strcmp (ref->next->u.c.component->name, "_data") == 0
> > > +          && ref->next->next
> > > +          && ref->next->next->type == REF_ARRAY
> > > +          && ref->next->next->u.ar.type != AR_ELEMENT)
> > > +        {
> > > +          *ts = &ref->u.c.component->ts;
> > > +          class_ref = ref;
> > > +          break;
> > > +        }
> > > +
> > > +      if (ts && *ts == NULL)
> > > +        return NULL;
> > > +
> > Namely, if there is:
> >    type1%array_class2 → array_class2 is used for 'ts' and later (ok)
> >    type1%type%array_class2 → NULL is returned  (why?)
> >    class1%type%array_class2 → ts = class1 but array2_class is used later on (ups!)
> >    class1%...%scalar_class2 → ts = class1 but scalar_class2 is used
> > etc.
> >
> > Thus this either needs to be cleaned up (separate 'ref' loop for
> > ts != NULL) – including 

[Patch, fortran] PR fortran/100094 - Undefined pointers have incorrect rank when using optimization

2021-04-15 Thread José Rui Faustino de Sousa via Gcc-patches

Hi All!

Proposed patch to:

PR100094 - Undefined pointers have incorrect rank when using optimization

Patch tested only on x86_64-pc-linux-gnu.

Pointers and allocatables must carry TKR (type, kind, rank) information
even when undefined. The patch adds code to initialize the element size,
rank and type of both pointers and allocatables as early as possible.
Later initialization would work for allocatables, but not for pointers,
since one cannot meaningfully test the association status of undefined
pointers.


Thank you very much.

Best regards,
José Rui

Fortran: Add missing TKR initialization [PR100094]

gcc/fortran/ChangeLog:

PR fortran/100094
* trans-array.c (gfc_trans_deferred_array): Add code to initialize
pointers and allocatables with correct TKR parameters.

gcc/testsuite/ChangeLog:

PR fortran/100094
* gfortran.dg/PR100094.f90: New test.

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index be5eb89350f..2bd69724366 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -10920,6 +10920,20 @@ gfc_trans_deferred_array (gfc_symbol * sym, gfc_wrapped_block * block)
 	}
 }
 
+  /* Set initial TKR for pointers and allocatables */
+  if (GFC_DESCRIPTOR_TYPE_P (type)
+  && (sym->attr.pointer || sym->attr.allocatable))
+{
+  tree etype;
+
+  gcc_assert (sym->as && sym->as->rank>=0);
+  tmp = gfc_conv_descriptor_dtype (descriptor);
+  etype = gfc_get_element_type (type);
+  tmp = fold_build2_loc (input_location, MODIFY_EXPR,
+  			 TREE_TYPE (tmp), tmp,
+  			 gfc_get_dtype_rank_type (sym->as->rank, etype));
+  gfc_add_expr_to_block (, tmp);
+}
   gfc_restore_backend_locus ();
   gfc_init_block ();
 
diff --git a/gcc/testsuite/gfortran.dg/PR100094.f90 b/gcc/testsuite/gfortran.dg/PR100094.f90
new file mode 100644
index 000..f2f7f1631dc
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR100094.f90
@@ -0,0 +1,37 @@
+! { dg-do run }
+!
+! Test the fix for PR100094
+!
+
+program foo_p
+
+  implicit none
+
+  integer, parameter :: n = 11
+  
+  integer, pointer :: pout(:)
+  integer,  target :: a(n)
+  integer  :: i
+  
+  a = [(i, i=1,n)]
+  call foo(pout)
+  if(.not.associated(pout)) stop 1
+  if(.not.associated(pout, a)) stop 2
+  if(any(pout/=a)) stop 3
+  stop
+
+contains
+
+  subroutine foo(that)
+integer, pointer, intent(out) :: that(..)
+
+select rank(that)
+rank(1)
+  that => a
+rank default
+  stop 4
+end select
+return
+  end subroutine foo
+
+end program foo_p


[Bug c++/100091] [11 Regression] decltype([]{}) rejected as a default template parameter

2021-04-15 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100091

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jason at gcc dot gnu.org,
                   |                            |marxin at gcc dot gnu.org

--- Comment #2 from Martin Liška  ---
> It's a recent change in behaviour. Possibly caused by
> r11-8166-ge1666ebd9ad31dbd8b9b933c883bdd882cfd1522.

I can confirm that.

[Bug libstdc++/96657] [9/10/11 Regression] libsupc++.a missing required functions from src/c++98/atomicity.cc when atomic builtins are not supported

2021-04-15 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96657

--- Comment #4 from Jonathan Wakely  ---
For completeness, here's a testcase which fails on sparc64-unknown-linux-gnu
when compiled using gcc -m32 eh.C -lsupc++

#include <exception>

int main()
{
  std::make_exception_ptr(1);
}

/usr/bin/ld:
/home/jwakely/gcc/11/lib/gcc/sparc64-unknown-linux-gnu/11.0.1/../../../../lib32/libsupc++.a(eh_ptr.o):
in function `__gnu_cxx::__exchange_and_add_dispatch(int*, int)':
/home/jwakely/build/sparc64-unknown-linux-gnu/32/libstdc++-v3/include/ext/atomicity.h:101:
undefined reference to `__gnu_cxx::__exchange_and_add(int volatile*, int)'
/usr/bin/ld:
/home/jwakely/gcc/11/lib/gcc/sparc64-unknown-linux-gnu/11.0.1/../../../../lib32/libsupc++.a(eh_ptr.o):
in function `__gnu_cxx::__atomic_add_dispatch(int*, int)':
/home/jwakely/build/sparc64-unknown-linux-gnu/32/libstdc++-v3/include/ext/atomicity.h:111:
undefined reference to `__gnu_cxx::__atomic_add(int volatile*, int)'
/usr/bin/ld:
/home/jwakely/gcc/11/lib/gcc/sparc64-unknown-linux-gnu/11.0.1/../../../../lib32/libsupc++.a(eh_ptr.o):
in function `__gnu_cxx::__exchange_and_add_dispatch(int*, int)':
/home/jwakely/build/sparc64-unknown-linux-gnu/32/libstdc++-v3/include/ext/atomicity.h:101:
undefined reference to `__gnu_cxx::__exchange_and_add(int volatile*, int)'
/usr/bin/ld:
/home/jwakely/gcc/11/lib/gcc/sparc64-unknown-linux-gnu/11.0.1/../../../../lib32/libsupc++.a(eh_ptr.o):
in function `__gnu_cxx::__atomic_add_dispatch(int*, int)':
/home/jwakely/build/sparc64-unknown-linux-gnu/32/libstdc++-v3/include/ext/atomicity.h:111:
undefined reference to `__gnu_cxx::__atomic_add(int volatile*, int)'
/usr/bin/ld:
/home/jwakely/gcc/11/lib/gcc/sparc64-unknown-linux-gnu/11.0.1/../../../../lib32/libsupc++.a(eh_throw.o):
in function `__gnu_cxx::__exchange_and_add_dispatch(int*, int)':
/home/jwakely/build/sparc64-unknown-linux-gnu/32/libstdc++-v3/include/ext/atomicity.h:101:
undefined reference to `__gnu_cxx::__exchange_and_add(int volatile*, int)'
collect2: error: ld returned 1 exit status

Re: [RFC] Run pass_sink_code once more after ivopts/fre

2021-04-15 Thread Richard Biener
On Thu, 15 Apr 2021, Xionghu Luo wrote:

> Thanks,
> 
> On 2021/4/14 14:41, Richard Biener wrote:
> >> "#538,#235,#234,#233" will all be sunk from bb 35 to bb 37 by rtl-sink,
> >> but it moves #538 first, then #235, there is strong dependency here. It
> >> doesn't seem to be like the LCM framework, which could solve all and do the
> >> delete-insert in one iteration.
> > So my question was whether we want to do both within the LCM store
> > sinking framework.  The LCM dataflow is also used by RTL PRE which
> > handles both loads and non-loads so in principle it should be able
> > to handle stores and non-stores for the sinking case (PRE on the
> > reverse CFG).
> > 
> > A global dataflow is more powerful than any local ad-hoc method.
> 
> My biggest concern is whether the LCM DF framework could support sinking
> *multiple* reverse-dependent non-store instructions together by *one*
> calling of LCM DF.   If this is not supported, we need run multiple LCM
> until no new changes, it would be time consuming obviously (unless
> compiling time is not important here).

As said it is used for PRE and there it most definitely can do that.

> 
> > 
> > Richard.
> > 
> >> However, there are still some common methods could be shared, like the
> >> def-use check(though store-motion is per bb, rtl-sink is per loop),
> >> insert_store, commit_edge_insertions etc.
> >>
> >>
> >>508: L508:
> >>507: NOTE_INSN_BASIC_BLOCK 34
> >> 12: r139:DI=r140:DI
> >>REG_DEAD r140:DI
> >>240: L240:
> >>231: NOTE_INSN_BASIC_BLOCK 35
> >>232: r142:DI=zero_extend(r139:DI#0)
> >>233: r371:SI=r142:DI#0-0x1
> >>234: r243:DI=zero_extend(r371:SI)
> >>REG_DEAD r371:SI
> >>235: r452:DI=r262:DI+r139:DI
> >>538: r194:DI=r452:DI
> >>236: r372:CCUNS=cmp(r142:DI#0,r254:DI#0)
> 
> 
> Like here, Each instruction's dest reg is calculated in the input vector
> bitmap, after solving the equations by calling pre_edge_rev_lcm, 
> move #538 out of loop for the first call, then move #235 out of loop
> after a second call... 4 repeat calls needed in total here, is the LCM
> framework smart enough to move the all 4 instruction within one iteration?
> I am worried that the input vector bitmap couldn't solve the dependency
> problem for two back chained instructions.
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971

--- Comment #5 from Richard Biener  ---
(In reply to Richard Biener from comment #4)
> (In reply to andysem from comment #3)
> > I tried adding __restrict__ to the equivalents of x, y1 and y2 in the
> > original larger code base and it didn't help. The compiler (gcc 10.2) would
> > still generate the same half-vectorized code.
> 
> Hmm, that's odd.  I suppose the equivalent of test() was inlined in the
> larger code base?
> 
> I'd be interested in preprocessed source of a translation unit that exhibits
> this issue (and a pointer to the point in the source that is relevant).
> 
> Note for GCC 12 I have a patch to improve things w/o requiring the use
> of __restrict (and I'm curious on whether that helps for the larger code
> base).

https://gcc.gnu.org/pipermail/gcc-patches/2021-April/567805.html

is the patch which applies to current master.

[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971

--- Comment #4 from Richard Biener  ---
(In reply to andysem from comment #3)
> I tried adding __restrict__ to the equivalents of x, y1 and y2 in the
> original larger code base and it didn't help. The compiler (gcc 10.2) would
> still generate the same half-vectorized code.

Hmm, that's odd.  I suppose the equivalent of test() was inlined in the
larger code base?

I'd be interested in preprocessed source of a translation unit that exhibits
this issue (and a pointer to the point in the source that is relevant).

Note for GCC 12 I have a patch to improve things w/o requiring the use
of __restrict (and I'm curious on whether that helps for the larger code base).

[Bug jit/100096] New: libgccjit.so.0: Cannot write-enable text segment: Permission denied on NetBSD 9.1

2021-04-15 Thread swilde--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100096

Bug ID: 100096
   Summary: libgccjit.so.0: Cannot write-enable text segment:
Permission denied on NetBSD 9.1
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: jit
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: swi...@sha-bang.de
  Target Milestone: ---

On NetBSD 9.1 i386 the Hello World example from
https://gcc.gnu.org/onlinedocs/jit/intro/tutorial01.html
fails with:

% ./tut01-hello-world
/usr/local/lib/libgccjit.so.0: text relocations
/usr/local/lib/libgccjit.so.0: Cannot write-enable text segment: Permission
denied

when security.pax.mprotect.global is enabled, which is the default on the
system.
When disabling global memory protection (as root) with:

sysctl -w security.pax.mprotect.global=0

The example works (still emitting a warning):

% ./tut01-hello-world
/usr/local/lib/libgccjit.so.0: text relocations
hello world

Turning off security.pax.mprotect.global shouldn't be required for libgccjit
to work.

Also the warning "/usr/local/lib/libgccjit.so.0: text relocations" should
be prevented if possible.

[PATCH] Remove gimplify_buildN API use from complex lowering

2021-04-15 Thread Richard Biener
This removes the legacy gimplify_buildN API use from complex lowering.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress, queued for 
stage1

2021-04-15  Richard Biener  

* tree-complex.c: Include gimple-fold.h.
(expand_complex_addition): Use gimple_build.
(expand_complex_multiplication_components): Likewise.
(expand_complex_multiplication): Likewise.
(expand_complex_div_straight): Likewise.
(expand_complex_div_wide): Likewise.
(expand_complex_division): Likewise.
(expand_complex_conjugate): Likewise.
(expand_complex_comparison): Likewise.
---
 gcc/tree-complex.c | 232 ++---
 1 file changed, 132 insertions(+), 100 deletions(-)

diff --git a/gcc/tree-complex.c b/gcc/tree-complex.c
index b11da01a58b..d7d991714de 100644
--- a/gcc/tree-complex.c
+++ b/gcc/tree-complex.c
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-hasher.h"
 #include "cfgloop.h"
 #include "cfganal.h"
+#include "gimple-fold.h"
 
 
 /* For each complex ssa name, a lattice value.  We're interested in finding
@@ -916,25 +917,27 @@ expand_complex_addition (gimple_stmt_iterator *gsi, tree inner_type,
 complex_lattice_t al, complex_lattice_t bl)
 {
   tree rr, ri;
+  gimple_seq stmts = NULL;
+  location_t loc = gimple_location (gsi_stmt (*gsi));
 
   switch (PAIR (al, bl))
 {
 case PAIR (ONLY_REAL, ONLY_REAL):
-  rr = gimplify_build2 (gsi, code, inner_type, ar, br);
+  rr = gimple_build (&stmts, loc, code, inner_type, ar, br);
   ri = ai;
   break;
 
 case PAIR (ONLY_REAL, ONLY_IMAG):
   rr = ar;
   if (code == MINUS_EXPR)
-   ri = gimplify_build2 (gsi, MINUS_EXPR, inner_type, ai, bi);
+   ri = gimple_build (&stmts, loc, MINUS_EXPR, inner_type, ai, bi);
   else
ri = bi;
   break;
 
 case PAIR (ONLY_IMAG, ONLY_REAL):
   if (code == MINUS_EXPR)
-   rr = gimplify_build2 (gsi, MINUS_EXPR, inner_type, ar, br);
+   rr = gimple_build (&stmts, loc, MINUS_EXPR, inner_type, ar, br);
   else
rr = br;
   ri = ai;
@@ -942,23 +945,23 @@ expand_complex_addition (gimple_stmt_iterator *gsi, tree inner_type,
 
 case PAIR (ONLY_IMAG, ONLY_IMAG):
   rr = ar;
-  ri = gimplify_build2 (gsi, code, inner_type, ai, bi);
+  ri = gimple_build (&stmts, loc, code, inner_type, ai, bi);
   break;
 
 case PAIR (VARYING, ONLY_REAL):
-  rr = gimplify_build2 (gsi, code, inner_type, ar, br);
+  rr = gimple_build (&stmts, loc, code, inner_type, ar, br);
   ri = ai;
   break;
 
 case PAIR (VARYING, ONLY_IMAG):
   rr = ar;
-  ri = gimplify_build2 (gsi, code, inner_type, ai, bi);
+  ri = gimple_build (&stmts, loc, code, inner_type, ai, bi);
   break;
 
 case PAIR (ONLY_REAL, VARYING):
   if (code == MINUS_EXPR)
goto general;
-  rr = gimplify_build2 (gsi, code, inner_type, ar, br);
+  rr = gimple_build (&stmts, loc, code, inner_type, ar, br);
   ri = bi;
   break;
 
@@ -966,19 +969,20 @@ expand_complex_addition (gimple_stmt_iterator *gsi, tree inner_type,
   if (code == MINUS_EXPR)
goto general;
   rr = br;
-  ri = gimplify_build2 (gsi, code, inner_type, ai, bi);
+  ri = gimple_build (&stmts, loc, code, inner_type, ai, bi);
   break;
 
 case PAIR (VARYING, VARYING):
 general:
-  rr = gimplify_build2 (gsi, code, inner_type, ar, br);
-  ri = gimplify_build2 (gsi, code, inner_type, ai, bi);
+  rr = gimple_build (&stmts, loc, code, inner_type, ar, br);
+  ri = gimple_build (&stmts, loc, code, inner_type, ai, bi);
   break;
 
 default:
   gcc_unreachable ();
 }
 
+  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
   update_complex_assignment (gsi, rr, ri);
 }
 
@@ -1059,26 +1063,26 @@ expand_complex_libcall (gimple_stmt_iterator *gsi, tree type, tree ar, tree ai,
components of the result into RR and RI.  */
 
 static void
-expand_complex_multiplication_components (gimple_stmt_iterator *gsi,
-tree type, tree ar, tree ai,
-tree br, tree bi,
-tree *rr, tree *ri)
+expand_complex_multiplication_components (gimple_seq *stmts, location_t loc,
+ tree type, tree ar, tree ai,
+ tree br, tree bi,
+ tree *rr, tree *ri)
 {
   tree t1, t2, t3, t4;
 
-  t1 = gimplify_build2 (gsi, MULT_EXPR, type, ar, br);
-  t2 = gimplify_build2 (gsi, MULT_EXPR, type, ai, bi);
-  t3 = gimplify_build2 (gsi, MULT_EXPR, type, ar, bi);
+  t1 = gimple_build (stmts, loc, MULT_EXPR, type, ar, br);
+  t2 = gimple_build (stmts, loc, MULT_EXPR, type, ai, bi);
+  t3 = gimple_build (stmts, loc, MULT_EXPR, type, ar, bi);
 
   /* Avoid expanding redundant multiplication for the common
  case of squaring 

[PATCH] Remove gimplify_buildN API use from phiopt

2021-04-15 Thread Richard Biener
This removes use of the legacy gimplify_buildN API from phiopt.

Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1

2021-04-15  Richard Biener  

* tree-ssa-phiopt.c (two_value_replacement): Remove use
of legacy gimplify_buildN API.
---
 gcc/tree-ssa-phiopt.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 13e5c4971d2..35ce51e5977 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -752,16 +752,16 @@ two_value_replacement (basic_block cond_bb, basic_block middle_bb,
 }
 
   tree arg = wide_int_to_tree (type, a);
-  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
-  if (!useless_type_conversion_p (type, TREE_TYPE (lhs)))
-lhs = gimplify_build1 (&gsi, NOP_EXPR, type, lhs);
+  gimple_seq stmts = NULL;
+  lhs = gimple_convert (&stmts, type, lhs);
   tree new_rhs;
   if (code == PLUS_EXPR)
-new_rhs = gimplify_build2 (&gsi, PLUS_EXPR, type, lhs, arg);
+new_rhs = gimple_build (&stmts, PLUS_EXPR, type, lhs, arg);
   else
-new_rhs = gimplify_build2 (&gsi, MINUS_EXPR, type, arg, lhs);
-  if (!useless_type_conversion_p (TREE_TYPE (arg0), type))
-new_rhs = gimplify_build1 (&gsi, NOP_EXPR, TREE_TYPE (arg0), new_rhs);
+new_rhs = gimple_build (&stmts, MINUS_EXPR, type, arg, lhs);
+  new_rhs = gimple_convert (&stmts, TREE_TYPE (arg0), new_rhs);
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
 
   replace_phi_edge_with_variable (cond_bb, e1, phi, new_rhs);
 
-- 
2.26.2
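
Editorial note: the patch above replaces statement-by-statement insertion with
a different shape, where new statements are first collected in a local
sequence and then spliced in front of the current statement in one step.  The
following is only a toy model of that shape in plain C.  The names toy_seq,
toy_build and toy_insert_seq_before are hypothetical stand-ins for
illustration; they are not GCC's gimple_seq, gimple_build or
gsi_insert_seq_before, and no folding or simplification is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT GCC's data structures.  */
enum { MAX_STMTS = 16 };

struct toy_seq {
    int ops[MAX_STMTS];   /* one opcode per queued statement */
    size_t len;           /* number of statements in use */
};

/* Queue one statement on SEQ.  Returns its index in SEQ, which plays the
   role of the "result value", or -1 if SEQ is full.  */
static int toy_build(struct toy_seq *seq, int op)
{
    if (seq->len == MAX_STMTS)
        return -1;
    seq->ops[seq->len] = op;
    return (int)seq->len++;
}

/* Splice all of SRC into DST immediately before position POS, keeping the
   order of SRC.  Returns the number of statements inserted, or 0 if POS is
   out of range or the result would not fit.  */
static size_t toy_insert_seq_before(struct toy_seq *dst, size_t pos,
                                    const struct toy_seq *src)
{
    if (pos > dst->len || dst->len + src->len > MAX_STMTS)
        return 0;
    /* Shift the tail right, walking from the end so nothing is clobbered.  */
    for (size_t i = dst->len; i > pos; i--)
        dst->ops[i - 1 + src->len] = dst->ops[i - 1];
    for (size_t i = 0; i < src->len; i++)
        dst->ops[pos + i] = src->ops[i];
    dst->len += src->len;
    return src->len;
}
```

In this model the caller builds as many statements as it needs without
touching the destination, then commits them with a single splice, which
mirrors the "stmts = NULL; build...; insert once" flow visible in the diff.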


[Bug tree-optimization/100095] New: missed optimization for dead code elimination at -O3 (vs. -O2)

2021-04-15 Thread zhendong.su at inf dot ethz.ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100095

Bug ID: 100095
   Summary: missed optimization for dead code elimination at -O3
(vs. -O2)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zhendong.su at inf dot ethz.ch
  Target Milestone: ---

[654] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/11.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap
--prefix=/local/suz-local/software/local/gcc-trunk --enable-languages=c,c++
--disable-werror --enable-multilib --with-system-zlib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.1 20210415 (experimental) [master revision
4dd9e1c541e:7315804b0a0:b5f644a98b3f3543d3a8d2dfea7785c22879013f] (GCC) 
[655] % 
[655] % gcctk -O2 -S -c -o O2.s small.c
[656] % gcctk -O3 -S -c -o O3.s small.c
[657] % 
[657] % wc O2.s O3.s
  75  164  982 O2.s
 121  269 1617 O3.s
 196  433 2599 total
[658] % 
[658] % grep foo O2.s
[659] % grep foo O3.s
jmp foo
[660] % 
[660] % cat small.c
extern void foo(void);
int a, b, c, d, g, h;
static int *e = 
volatile int f;
void i() {
  for (d = 5; d >= 0; d--)
for (c = 0; c < 4; c++)
  while (1) {
h = b ? a % b : 0;
if (g)
  f;
if (*e)
  break;
  }
  foo();
}

[PATCH] Deprecate gimple-builder.h API

2021-04-15 Thread Richard Biener
This adds a deprecation note to the undocumented gimple-builder.h
API only used by asan and sancov.

Pushed.

2021-04-15  Richard Biener  

* gimple-builder.h: Add deprecation note.
---
 gcc/gimple-builder.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/gimple-builder.h b/gcc/gimple-builder.h
index 61cf08c8dcb..ae273ce9041 100644
--- a/gcc/gimple-builder.h
+++ b/gcc/gimple-builder.h
@@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_GIMPLE_BUILDER_H
 #define GCC_GIMPLE_BUILDER_H
 
+/* ???  This API is legacy and should not be used in new code.  */
+
 gassign *build_assign (enum tree_code, tree, int, tree lhs = NULL_TREE);
 gassign *build_assign (enum tree_code, gimple *, int, tree lhs = NULL_TREE);
 gassign *build_assign (enum tree_code, tree, tree, tree lhs = NULL_TREE);
-- 
2.26.2


[Bug target/99249] SVE: ICE in aarch64_expand_sve_const_vector (during RTL pass: early_remat)

2021-04-15 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99249

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rsandifo at gcc dot gnu.org

--- Comment #4 from rsandifo at gcc dot gnu.org ---
Fixed by
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=a065e0bb092a010664777394530ab1a52bb5293b,
but I typoed the PR number.

[Bug c++/99246] [modules] ICE in write_location, at cp/module.cc:15687

2021-04-15 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99246

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org

--- Comment #6 from rsandifo at gcc dot gnu.org ---
Oops, sorry for the PR typo, should have been PR99249.

[Bug target/96339] [SVE] Optimise svlast[ab]

2021-04-15 Thread belagod at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96339

Tejas Belagod  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Target Milestone|--- |12.0
   Last reconfirmed||2021-04-15
 Ever confirmed|0   |1

[Bug target/99929] SVE: Wrong code at -O2 -ftree-vectorize

2021-04-15 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99929

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

   Target Milestone|--- |8.5

--- Comment #4 from rsandifo at gcc dot gnu.org ---
Fixed on trunk so far.  Intending to backport to GCC 8.

[Bug c/98852] [10 Regression] Conditional expression wrongly rejected for arm_neon.h vectors

2021-04-15 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98852

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

Summary|[10/11 Regression]  |[10 Regression] Conditional
   |Conditional expression  |expression wrongly rejected
   |wrongly rejected for|for arm_neon.h vectors
   |arm_neon.h vectors  |

--- Comment #5 from rsandifo at gcc dot gnu.org ---
Fixed on trunk so far.  Backports will need to be slightly different.

[Bug c/98852] [10/11 Regression] Conditional expression wrongly rejected for arm_neon.h vectors

2021-04-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98852

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:1696fc1ea01d5c9dce96b5d3122921aab9308f59

commit r11-8192-g1696fc1ea01d5c9dce96b5d3122921aab9308f59
Author: Richard Sandiford 
Date:   Thu Apr 15 11:37:39 2021 +0100

c++: Tweak merging of vector attributes that affect type identity [PR98852]

<arm_neon.h> types are distinct from GNU vector types in at least
their mangling.  However, there used to be nothing explicit in the
VECTOR_TYPE itself to indicate the difference: we simply treated them
as distinct TYPE_MAIN_VARIANTs.  This caused problems like the ones
reported in PR95726.

The fix for that PR was to add type attributes to the <arm_neon.h>
types, in order to maintain the distinction between them and GNU
vectors.  However, this in turn caused PR98852, where cp_common_type
would merge the type attributes from the two source types and attach
the result to the common type.  For example:

   unsigned vector with no attribute + signed vector with attribute X

would get converted to:

   unsigned vector with attribute X

That isn't what we want in this case, since X describes the mangling
of the original type.  But even if we dropped the mangling from X and
worked it out from context, we would still have a situation in which
the common type was provably distinct from both of the source types:
it would take its <arm_neon.h>-ness from one side and its signedness
from the other.  I guess there are other cases where the common type
doesn't match either side, but I'm not sure it's the obvious behaviour
here.  It's also different from GCC 10.1 and earlier, where the unsigned
vector “won” in its original form.

This patch instead merges only the attributes that don't affect type
identity.  For now I've restricted it to vector types, since we're so
close to GCC 11, but it might make sense to use this elsewhere.

I've tried to audit the C and target-specific attributes to look for
other types that might be affected by this, but I couldn't see any.
The closest was s390_vector_bool, but the handler for that attribute
changes the type node and drops the attribute itself
(*no_add_attrs = true).

gcc/
PR c++/98852
* attribs.h (restrict_type_identity_attributes_to): Declare.
* attribs.c (restrict_type_identity_attributes_to): New function.

gcc/cp/
PR c++/98852
* typeck.c (merge_type_attributes_from): New function.
(cp_common_type): Use it for vector types.

[Bug c/98852] [10/11 Regression] Conditional expression wrongly rejected for arm_neon.h vectors

2021-04-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98852

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:a3317f7b3c02907a122f89879e5b6e90c386e64d

commit r11-8191-ga3317f7b3c02907a122f89879e5b6e90c386e64d
Author: Richard Sandiford 
Date:   Thu Apr 15 11:37:38 2021 +0100

c: Don't drop vector attributes that affect type identity [PR98852]

<arm_neon.h> types are distinct from GNU vector types in at least
their mangling.  However, there used to be nothing explicit in the
VECTOR_TYPE itself to indicate the difference: we simply treated them
as distinct TYPE_MAIN_VARIANTs.  This caused problems like the ones
reported in PR95726.

The fix for that PR was to add type attributes to the <arm_neon.h>
types, in order to maintain the distinction between them and GNU
vectors.  However, this in turn caused PR98852, where c_common_type
would unconditionally drop the attributes on the source types.
This meant that:

<arm_neon.h> vector + <arm_neon.h> vector

had a GNU vector type rather than an <arm_neon.h> vector type.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96377#c2 for
Jakub's analysis of the history of this c_common_type code.
TBH I'm not sure which case the build_type_attribute_variant
code is handling, but I think we should at least avoid dropping
attributes that affect type identity.

I've tried to audit the C and target-specific attributes to look
for other types that might be affected by this, but I couldn't
see any.  We are only dealing with:

  gcc_assert (code1 == VECTOR_TYPE || code1 == COMPLEX_TYPE
  || code1 == FIXED_POINT_TYPE || code1 == REAL_TYPE
  || code1 == INTEGER_TYPE);

which excludes most affects_type_identity attributes.  The closest
was s390_vector_bool, but the handler for that attribute changes
the type node and drops the attribute itself (*no_add_attrs = true).

I put the main list handling into a separate function
(remove_attributes_matching) because a later patch will need it
for something else.

gcc/
PR c/98852
* attribs.h (affects_type_identity_attributes): Declare.
* attribs.c (remove_attributes_matching): New function.
(affects_type_identity_attributes): Likewise.

gcc/c/
PR c/98852
* c-typeck.c (c_common_type): Do not drop attributes that
affect type identity.

gcc/testsuite/
PR c/98852
* gcc.target/aarch64/advsimd-intrinsics/pr98852.c: New test.
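
Editorial note: the commit message above describes moving "the main list
handling" into a helper that removes attributes matching a condition, so that
attributes affecting type identity can be kept while the others are dropped.
A minimal sketch of that kind of predicate-driven list filter, in plain C, is
given below.  Everything here is an assumption made for illustration: the
toy_attr structure, toy_remove_matching and the predicate are hypothetical
and do not reproduce GCC's tree-based attribute lists or the real
remove_attributes_matching signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute node; GCC really uses TREE_LIST chains.  */
struct toy_attr {
    const char *name;
    bool affects_type_identity;
    struct toy_attr *next;
};

typedef bool (*toy_attr_pred)(const struct toy_attr *);

/* Unlink every node for which PRED returns true and return the new head.
   Nodes are only unlinked, never freed, so stack-allocated lists work.  */
static struct toy_attr *toy_remove_matching(struct toy_attr *head,
                                            toy_attr_pred pred)
{
    struct toy_attr **link = &head;
    while (*link) {
        if (pred(*link))
            *link = (*link)->next;   /* skip over the matching node */
        else
            link = &(*link)->next;   /* keep it and advance */
    }
    return head;
}

/* Matches attributes that do NOT affect type identity, so removing the
   matches leaves only the identity-affecting ones.  */
static bool toy_is_not_identity_attr(const struct toy_attr *a)
{
    return !a->affects_type_identity;
}
```

Passing the opposite predicate would give the complementary filter, which is
roughly the direction the C++ change in the same PR takes (merging only the
attributes that do not affect type identity).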

Re: removing toxic emailers

2021-04-15 Thread Aaron Gyes via Gcc
> On Apr 14, 2021, at 5:10 PM, Christopher Dimech  wrote:

> What are we?  Adults or Children?  You know, as I know, that identities
> can be made up.  There are many computing specialists who can do that.
> They can even be made so it looks as though they were sent by you, or 
> from your work and home address.  They could even be made up to look as
> though your children sent them.

That’s far out man, like outer space far out. It’s fortunate, though, that
despite this confusing world of tricksters you find yourself in, you have
maintained the kind of confidence and composure required to put in this
insincere kind of low-effort trolling to defend your principals, in a serious
discussion that, were it to go the wrong way, could well also require you to
take responsibility for your behavior in public.

> So my point here — if it’s okay just to have a point when people should 
> already be drinking and dancing — my point is let’s not get confused. 


Do you imagine people may one day solemnly read through these archives here,
shaking their heads at how Mr. Stallman was treated, how mean and irrational
it all was, even as you tried your best to outwit the members into doing the
right thing… Just as people do when reading Socrates' Apology, or Tacitus
talking about the suffering under emperors?

That would be sad because the annals of the mailing list will be available
verbatim, probably literally forever, so obviously that can’t happen.

Aaron

[Bug fortran/100094] New: Undefined pointers have incorrect rank when using optimization

2021-04-15 Thread jrfsousa at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100094

Bug ID: 100094
   Summary: Undefined pointers have incorrect rank when using
optimization
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jrfsousa at gmail dot com
  Target Milestone: ---

Created attachment 50599
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50599&action=edit
Fortran code showing problem

Hi All!

Rank information is not correctly written into the pointer descriptor when
using optimization or -ffpe-trap.

Seen on:

GNU Fortran (GCC) 11.0.1 20210415 (experimental)
GNU Fortran (GCC) 10.3.1 20210415

Thank you very much.

Best regards,
José Rui

Re: [Patch, fortran] PR fortran/84006, PR fortran/100027 - ICE on storage_size with polymorphic argument

2021-04-15 Thread Tobias Burnus

Hi José,

first, I think you did not yet commit the approved patch for PR100018,
did you?

On 11.04.21 02:34, José Rui Faustino de Sousa via Fortran wrote:

Proposed patch to:
PR84006 - [8/9/10/11 Regression] ICE in storage_size() with CLASS entity
PR100027 - ICE on storage_size with polymorphic argument

Patch tested only on x86_64-pc-linux-gnu.


LGTM – however, I think it would be useful to also test polymorphic
components
– and to check whether the result comes out right, especially as you
already have a dg-do run test.

Hence, how about replacing that testcase by the extended attached testcase?

Tobias


Add branch to if clause to handle polymorphic objects, not sure if I
got all possible variations...

Thank you very much.

Best regards,
José Rui

Fortran: Fix ICE using storage_size intrinsic [PR84006, PR100027]

gcc/fortran/ChangeLog:

PR fortran/84006
PR fortran/100027
* trans-intrinsic.c (gfc_conv_intrinsic_storage_size): add if
clause branch to handle polymorphic objects.

gcc/testsuite/ChangeLog:

PR fortran/84006
* gfortran.dg/PR84006.f90: New test.

PR fortran/100027
* gfortran.dg/PR100027.f90: New test.


-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
! { dg-do run }
!

program foo_p

  implicit none

  integer, parameter :: n = 11
  integer, parameter :: foo_size = storage_size(n)*4
  integer, parameter :: bar_size = storage_size(n)*(4+8)
  
  type :: foo_t
integer :: arr1(4)
  end type foo_t

  type, extends(foo_t) :: bar_t
integer :: arr2(8)
  end type bar_t

  type box_t
class(foo_t), allocatable :: x, y(:)
  end type box_t

  class(*), pointer :: apu(:)
  class(foo_t), pointer :: apf(:)
  class(bar_t), pointer :: apb(:)
  type(foo_t),  target :: atf(n)
  type(bar_t),  target :: atb(n)
  type(box_t), target :: aa, bb

  integer :: m
  
  apu => atb
  m = storage_size(apu)
  if (m /= bar_size) stop
  apu => atf
  m = storage_size(apu)
  if (m /= foo_size) stop
  apf => atb
  m = storage_size(apf)
  if (m /= bar_size) stop
  apf => atf
  m = storage_size(apf)
  if (m /= foo_size) stop
  apb => atb
  m = storage_size(apb)
  if (m /= bar_size) stop

  allocate(foo_t :: aa%x, aa%y(1))
  allocate(bar_t :: bb%x, bb%y(1))
  if (storage_size(aa%x) /= foo_size) stop
  if (storage_size(aa%y) /= foo_size) stop
  if (storage_size(bb%x) /= bar_size) stop
  if (storage_size(bb%y) /= bar_size) stop

  apu => bb%y
  m = storage_size(apu)
  if (m /= bar_size) stop
  apu => aa%y
  m = storage_size(apu)
  if (m /= foo_size) stop
  apf => bb%y
  m = storage_size(apf)
  if (m /= bar_size) stop
  apf => aa%y
  m = storage_size(apf)
  if (m /= foo_size) stop

end program foo_p


[Bug ipa/80726] [8/9/10/11 Regression] Destructor not inlined anymore (regression)

2021-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80726

Jan Hubicka  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|ASSIGNED|RESOLVED

--- Comment #9 from Jan Hubicka  ---
This is a dup that is fixed on mainline.

*** This bug has been marked as a duplicate of bug 98265 ***

[Bug ipa/98265] [10 Regression] gcc-10 has significantly worse code generated with -O2 compared to -O1 (or gcc-9 -O2) when using the Eigen C++ library

2021-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98265

Jan Hubicka  changed:

   What|Removed |Added

 CC||cuzdav at gmail dot com

--- Comment #12 from Jan Hubicka  ---
*** Bug 80726 has been marked as a duplicate of this bug. ***

[Bug ipa/99309] [10/11 Regression] Segmentation fault with __builtin_constant_p usage at -O2

2021-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99309

Jan Hubicka  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |hubicka at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #9 from Jan Hubicka  ---
I have a WIP patch to attach predicates to builtin_constant_p and redirect to
true if the inliner works out that it is a constant (still relying on late
passes to optimize the if branch well).

From all the options I can think of this seems best, even though it may end up
in relatively rare cases that we do the (very simple) propagation at IPA time
and late optimizations won't.  Without explicitly disabling passes (where I
think this is fine to happen), all testcases we have seen so far were of the
form that the constant was eventually propagated, but only after we folded
builtin_constant_p to false.

Overall it is not possible to ensure that builtin_constant_p on memory will
fold to true only if all uses of the memory later in the if branch will fold to
constant, since AO has walking limits.
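
Editorial note: for readers unfamiliar with the builtin discussed in this
comment, __builtin_constant_p is a GCC extension (also accepted by Clang)
that evaluates to 1 when the compiler can prove its argument is a
compile-time constant.  The sketch below shows the usual guarded-fast-path
idiom.  The function names are hypothetical, and both paths deliberately
compute the same value, because with a non-literal argument which path is
taken depends on optimization level and inlining, which is exactly the
ordering problem the comment is about.

```c
#include <assert.h>

/* With a literal argument the builtin folds to 1.  */
static int literal_is_constant(void)
{
    return __builtin_constant_p(42);
}

/* Guarded fast path: whether the first branch is chosen for a given call
   depends on what the optimizers know about X at the time the builtin is
   folded, so both branches must yield the same result.  */
static inline int scale_by_two(int x)
{
    if (__builtin_constant_p(x))
        return x * 2;    /* path the compiler may pick for known constants */
    return x + x;        /* generic path, same value */
}
```

Code that is only correct on one of the two branches is what makes a
premature fold to false (or an inconsistent fold to true) observable.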

[Bug c/88566] -Wconversion not using value range information

2021-04-15 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88566

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|--- |10.0

[Bug c++/91179] Spurious -Wconversion warning after promotion

2021-04-15 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91179

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|--- |10.0

Re: [committed] gimple UIDs, LTO and -fanalyzer [PR98599]

2021-04-15 Thread Jan Hubicka
Hi,
this is patch fixing the underlying issue of function missing
lto_prepare_function_for_streaming because gimple_has_body_p is not the
same thing as node.has_gimple_body (which needs to be clarified next
stage1 by finding better names for this I suppose).

I committed it to gcc 11 even though we already have your workaround
since it is small and safe and it may save some pain when backporting
changes to the branch in future - basically all passes at WPA
renumbering statements would hit this issue which is not that obvious to
debug as we found :)

We may backport it to gcc10 too if you prefer it over your fix - I
think both are fine in general for release branches.

lto-bootstrapped/regtested x86_64-linux.

Honza

2021-04-15  Jan Hubicka  

PR lto/98599
* lto.c (lto_wpa_write_files): Fix handling of clones.

diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index ceb61bb300b..5903f75ac23 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -306,7 +306,7 @@ lto_wpa_write_files (void)
   cgraph_node *node;
   /* Do body modifications needed for streaming before we fork out
  worker processes.  */
-  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
+  FOR_EACH_FUNCTION (node)
 if (!node->clone_of && gimple_has_body_p (node->decl))
   lto_prepare_function_for_streaming (node);
 


[Bug analyzer/98599] [11 Regression] fatal error: Cgraph edge statement index out of range with -Os -flto -fanalyzer

2021-04-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98599

--- Comment #18 from CVS Commits  ---
The master branch has been updated by Jan Hubicka :

https://gcc.gnu.org/g:b5f644a98b3f3543d3a8d2dfea7785c22879013f

commit r11-8190-gb5f644a98b3f3543d3a8d2dfea7785c22879013f
Author: Jan Hubicka 
Date:   Thu Apr 15 11:40:40 2021 +0200

Fix handling of clones in lto_wpa_write_files [PR98599]

2021-04-15  Jan Hubicka  

PR lto/98599
* lto.c (lto_wpa_write_files): Fix handling of clones.

[Bug target/100009] [9 Regression] -march=native doesn't work on tigerlake

2021-04-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100009

--- Comment #5 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #3)
> > Response from Jim Wilson: 
> > Looks like a bug in gcc-9.  tigerlake was added to
> > gcc/config/i386/driver-i386.c but not to the arch_names_table in i386.c.  I
> > would suggest filing a bug report.  I don't think there is a way to
> > workaround this.  It needs to be fixed in the gcc source tree.
> 
> Oops,
> will backport r10-2664-ga9fcfec30f70c30883f53d4b1bd533fbea0e9fb2 (tigerlake
> part) to gcc9.

Fixed by r9-9351

[Bug ipa/92535] [10 regression] ICF is relatively expensive and became less effective

2021-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92535

Jan Hubicka  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
Summary|[10/11 regression] ICF is   |[10 regression] ICF is
   |relatively expensive and|relatively expensive and
   |became less effective   |became less effective

--- Comment #17 from Jan Hubicka  ---
For GCC 11 we now get faster build times with ICF than without on cc1plus,
Firefox and clang LTO builds.  So I think we can consider it no longer a
regression, while ICF can always be improved (and I have some changes queued
for next stage1).

I have no plan to backport this to gcc10, so unassigning.

[Bug target/100093] different behavior between -mtune=cpu_type and target_attribute (“arch=cputype”)

2021-04-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100093

--- Comment #1 from Hongtao.liu  ---
When ix86_tune_features[X86_TUNE_AVX256_UNALIGNED_LOAD/STORE_OPTIMAL] is false,
GCC sets the bit MASK_AVX256_SPLIT_UNALIGNED_LOAD/STORE, but when
ix86_tune_features[X86_TUNE_AVX256_UNALIGNED_LOAD/STORE_OPTIMAL] is true, it
doesn't clear the bit, which causes the issue.

  if (!ix86_tune_features[X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL]
      && !(opts_set->x_target_flags & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
    opts->x_target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;
  if (!ix86_tune_features[X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL]
      && !(opts_set->x_target_flags & MASK_AVX256_SPLIT_UNALIGNED_STORE))
    opts->x_target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
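
The set-but-never-clear behaviour described above can be sketched in
isolation.  This is only an illustration, not the actual fix: the macro
SPLIT_UNALIGNED_LOAD and the function update_split_flag are hypothetical
stand-ins, and the real code operates on opts->x_target_flags and
opts_set->x_target_flags.

```c
#include <stdbool.h>

/* Hypothetical stand-in for MASK_AVX256_SPLIT_UNALIGNED_LOAD.  */
#define SPLIT_UNALIGNED_LOAD 0x1u

/* Return the updated flag word.
   flags:         current flags, possibly inherited from the command line
   explicit_mask: bits the user set explicitly; those are left untouched
   load_optimal:  whether the active tuning prefers unsplit unaligned loads  */
unsigned
update_split_flag (unsigned flags, unsigned explicit_mask, bool load_optimal)
{
  if (explicit_mask & SPLIT_UNALIGNED_LOAD)
    return flags;                          /* Respect an explicit user choice.  */
  if (load_optimal)
    return flags & ~SPLIT_UNALIGNED_LOAD;  /* The clearing step the quoted code lacks.  */
  return flags | SPLIT_UNALIGNED_LOAD;     /* What the quoted code already does.  */
}
```

With a clearing branch like this, a bit inherited from a znver1 command
line would be dropped again when the attribute switches the tuning to one
where unaligned loads are optimal.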

[Bug tree-optimization/100076] eembc/automotive/basefp01 has 30.3% regression compare -O2 -ftree-vectorize with -O2 on CLX/Znver3

2021-04-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100076

--- Comment #6 from Hongtao.liu  ---
(In reply to Richard Biener from comment #5)
> Note even when avoiding the STLF hit the vectorized version is slower.
> You can use -mtune-ctl=^sse_unaligned_load_optimal to force loading
> the lower/upper half of vectors separately.
> 
This leads to extra instructions (2 extra loads), and if the vectorizer knew
that, it would find that the cost of vectorization is larger than scalar.

> The reason is that without -ffast-math we are using an in-order reduction
> which doesn't save us much but instead just combines dependence chains
> here.  We do have a related bug for this somewhere.
> 
> With -ffast-math the version with/without
> -mtune-ctl=^sse_unaligned_load_optimal
> is about the same speed, so STLF is a red herring here (on Zen2).
> 
> Still not vectorizing is a lot faster.
> 

Yes.  As far as vectorization is concerned, it does not improve
performance here (compare -O2 -funroll-loops vs. -O2 -ftree-vectorize
-funroll-loops), so I'm wondering if we can adjust the heuristic or cost model
so that the loop is not vectorized.

> Can you check if -mtune-ctl=^sse_unaligned_load_optimal helps on CLX?

doesn't help.

Re: [WIP] Re: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]

2021-04-15 Thread Thomas Schwinge
Hi!

On 2021-04-09T13:00:39+0200, I wrote:
> On 2021-03-25T12:02:15+0100, I wrote:
>> On 2021-03-11T17:52:55+0100, I wrote:
>>> On 2021-02-23T22:52:38+0100, Jakub Jelinek via Gcc-patches 
>>>  wrote:
>>>> On Tue, Feb 23, 2021 at 09:43:51PM +0000, Kwok Cheung Yeung wrote:
> On 19/02/2021 7:12 pm, Kwok Cheung Yeung wrote:
> > I have included the current state of my patch. All task-detach-* tests
> > pass when executed without offloading or with offloading to GCN, but
> > with offloading to Nvidia, task-detach-6.* hangs consistently but
> > everything else passes (probably because of the missing
> > gomp_team_barrier_done?).
>
> It looks like the hang has nothing to do with the detach patch - this 
> hangs
> consistently for me when offloaded to NVPTX:
>
> #include 
>
> int main (void)
> {
> #pragma omp target
>   #pragma omp parallel
> #pragma omp task
>   ;
> }
>
> This doesn't hang when offloaded to GCN or the host device, or if
> num_threads(1) is specified on the omp parallel.
>>>
>>> So, I reproduced this the hard way;
>>>  :-/
>>>
>>> Please always file issues when you run into such things.  I've now filed
>>> PR99555 "[OpenMP/nvptx] Execution-time hang for simple nested OpenMP
>>> 'target'/'parallel'/'task' constructs".
>>>
>>>> Then it can be solved separately, I'll try to have a look if I see
>>>> something
>>>> bad from the dumps, but I admit I don't have much experience with debugging
>>>> NVPTX offloaded code...
>>>
>>> Any luck?
>>>
>>>
>>> Until this gets resolved properly, OK to push something like the attached
>>> (currently testing) "Avoid OpenMP/nvptx execution-time hangs for simple
>>> nested OpenMP 'target'/'parallel'/'task' constructs [PR99555]"?
>>
>> As posted, I've now pushed "Avoid OpenMP/nvptx execution-time hangs for
>> simple nested OpenMP 'target'/'parallel'/'task' constructs [PR99555]" to
>> master branch in commit d99111fd8e12deffdd9a965ce17e8a760d531ec3, see
>> attached.  "... awaiting proper resolution, of course."
>
>> +  if (on_device_arch_nvptx ())
>> +__builtin_abort (); //TODO Until resolved, skip, with error status.
>
> Actually, we can do better: do try to execute this trivial OpenMP code
> (expected to complete in no time), but for nvptx offloading "make sure
> that we exit quickly, with error status", and XFAIL that.  So that we'll
> get XFAIL -> XPASS when this starts to work for nvptx offloading.

Pushed "XFAIL OpenMP/nvptx execution-time hangs for simple nested OpenMP
'target'/'parallel'/'task' constructs [PR99555]" to master branch in
commit 4dd9e1c541e0eb921d62c8652c854b1259e56aac, see attached.


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
From 4dd9e1c541e0eb921d62c8652c854b1259e56aac Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 7 Apr 2021 10:36:36 +0200
Subject: [PATCH] XFAIL OpenMP/nvptx execution-time hangs for simple nested
 OpenMP 'target'/'parallel'/'task' constructs [PR99555]

... still awaiting proper resolution, of course.

	libgomp/
	PR target/99555
	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_device_nvptx): New.
	* testsuite/libgomp.c/pr99555-1.c : Until
	resolved, make sure that we exit quickly, with error status,
	XFAILed.
	* testsuite/libgomp.c-c++-common/task-detach-6.c: Likewise.
	* testsuite/libgomp.fortran/task-detach-6.f90: Likewise.
---
 libgomp/testsuite/lib/libgomp.exp| 12 
 .../testsuite/libgomp.c-c++-common/task-detach-6.c   |  5 -
 libgomp/testsuite/libgomp.c/pr99555-1.c  |  5 -
 libgomp/testsuite/libgomp.fortran/task-detach-6.f90  |  3 ++-
 4 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 72d001186a5..14dcfdfd00a 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -401,6 +401,18 @@ proc check_effective_target_offload_device_shared_as { } {
 } ]
 }
 
+# Return 1 if using nvptx offload device.
+proc check_effective_target_offload_device_nvptx { } {
+return [check_runtime_nocache offload_device_nvptx {
+  #include 
+  #include "testsuite/libgomp.c-c++-common/on_device_arch.h"
+  int main ()
+	{
+	  return !on_device_arch_nvptx ();
+	}
+} ]
+}
+
 # Return 1 if at least one Nvidia GPU is accessible.
 
 proc check_effective_target_openacc_nvidia_accel_present { } {
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
index 119d7f52f8f..f18b57bf047 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
@@ -2,6 +2,8 @@
 
 #include 
 #include 
+#include  // For 

Re: removing toxic emailers

2021-04-15 Thread Jonathan Wakely via Gcc
On Thu, 15 Apr 2021 at 02:18, Christopher Dimech wrote:
> What are we?  Adults or Children?  You know, as I know, that identities
> can be made up.  There are many computing specialists who can do that.
> They can even be made so it looks as though they were sent by you, or
> from your work and home address.  They could even be made up to look as
> though your children sent them.
>
> I remember a closing comment by Eben Moglen during a full-day program at
> Columbia Law School in 2016.  And I agree with him.
>
> So my point here — if it’s okay just to have a point when people should 
> already be drinking and dancing — my point is let’s not get confused. This is 
> not war time. This is diplomacy time. Skill counts. Agility counts. 
> Discretion counts. Long credibility counts. Ammunition? Ammunition is 
> worthless because wherever we fire it, we work everywhere and it’s only going 
> to hit us. - Eben Moglen

Interesting choice of quote from the guy who made the very first reply
to the whole thing with "What is this?  The usual rant of freaked out
madness!!!"
https://gcc.gnu.org/pipermail/gcc/2021-March/235092.html

and followed soon after with "More rats for the wood pile. "
https://gcc.gnu.org/pipermail/gcc/2021-March/235109.html

But now you're lecturing us about diplomacy.

Fuck off, Christopher. Just fuck off. You've added nothing of value to
this entire discussion, just riled people up and stirred up trouble.
Fuck off.


[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once

2021-04-15 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971

--- Comment #3 from andysem at mail dot ru ---
I tried adding __restrict__ to the equivalents of x, y1 and y2 in the original
larger code base and it didn't help. The compiler (gcc 10.2) would still
generate the same half-vectorized code.

[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs

2021-04-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Thomas Schwinge :

https://gcc.gnu.org/g:4dd9e1c541e0eb921d62c8652c854b1259e56aac

commit r11-8189-g4dd9e1c541e0eb921d62c8652c854b1259e56aac
Author: Thomas Schwinge 
Date:   Wed Apr 7 10:36:36 2021 +0200

XFAIL OpenMP/nvptx execution-time hangs for simple nested OpenMP
'target'/'parallel'/'task' constructs [PR99555]

... still awaiting proper resolution, of course.

libgomp/
PR target/99555
* testsuite/lib/libgomp.exp
(check_effective_target_offload_device_nvptx): New.
* testsuite/libgomp.c/pr99555-1.c : Until
resolved, make sure that we exit quickly, with error status,
XFAILed.
* testsuite/libgomp.c-c++-common/task-detach-6.c: Likewise.
* testsuite/libgomp.fortran/task-detach-6.f90: Likewise.

[Bug target/100093] New: different behavior between -mtune=cpu_type and target_attribute (“arch=cputype”)

2021-04-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100093

Bug ID: 100093
   Summary: different behavior between -mtune=cpu_type and
target_attribute (“arch=cputype”)
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: crazylht at gmail dot com
CC: hjl.tools at gmail dot com
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: x86_64-*-* i?86-*-*

Refer to https://godbolt.org/z/31nv3T8Tf

__attribute__((target("tune=skylake-avx512")))
void fill_avx2(double *__restrict__ data, int n, double value)
{
    for (int i = 0; i < n * 16; i++) {
        data[i] = value;
    }
}

Shouldn't the command line gcc -O3 -march=znver1 generate the same code as
gcc -O3 -march=znver1 -mtune=skylake-avx512?

[Bug target/100056] [9/10 Regression] orr + lsl vs. [us]bfiz

2021-04-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100056

Jakub Jelinek  changed:

   What|Removed |Added

Summary|[9/10/11 Regression]  orr + |[9/10 Regression]  orr +
   |lsl vs. [us]bfiz|lsl vs. [us]bfiz

--- Comment #10 from Jakub Jelinek  ---
Fixed on the trunk so far.  Backports unlikely.

[Bug target/100056] [9/10/11 Regression] orr + lsl vs. [us]bfiz

2021-04-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100056

--- Comment #9 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:39d23b7960e4efb11bbe1eff056ae9da0884c539

commit r11-8188-g39d23b7960e4efb11bbe1eff056ae9da0884c539
Author: Jakub Jelinek 
Date:   Thu Apr 15 10:45:09 2021 +0200

aarch64: Fix several *_ashl3 related regressions
[PR100056]

Before combiner added 2 to 2 combinations, the following testcase functions
have been all compiled into 2 instructions, zero/sign extensions or and
followed by orr with lsl, e.g. for the first function
Trying 7 -> 8:
7: r96:SI=r94:SI<<0xb
8: r95:SI=r96:SI|r94:SI
  REG_DEAD r96:SI
  REG_DEAD r94:SI
Successfully matched this instruction:
(set (reg:SI 95)
(ior:SI (ashift:SI (reg/v:SI 94 [ i ])
(const_int 11 [0xb]))
(reg/v:SI 94 [ i ])))
is the important successful try_combine and so we end up with
and w0, w0, 255
orr w0, w0, w0, lsl 11
in the body.
With 2 to 2 combination, before that can trigger, another successful
combination:
Trying 2 -> 7:
2: r94:SI=zero_extend(x0:QI)
  REG_DEAD x0:QI
7: r96:SI=r94:SI<<0xb
is replaced with:
(set (reg/v:SI 94 [ i ])
(zero_extend:SI (reg:QI 0 x0 [ i ])))
and
(set (reg:SI 96)
(and:SI (ashift:SI (reg:SI 0 x0 [ i ])
(const_int 11 [0xb]))
(const_int 522240 [0x7f800])))
and in the end results in 3 instructions in the body:
and w1, w0, 255
ubfiz   w0, w0, 11, 8
orr w0, w0, w1
The following combine splitters help undo that when combiner tries to
combine 3 instructions - the zero/sign extend or and, the other insn
from the 2 to 2 combination ([us]bfiz) and the logical op, the CPUs
don't have an insn to do everything in one op, but we can split it
back into the zero/sign extend or and followed by logical with lsl.

2021-04-15  Jakub Jelinek  

PR target/100056
* config/aarch64/aarch64.md
(*_3):
Add combine splitters for *_ashl3 with
ZERO_EXTEND, SIGN_EXTEND or AND.

* gcc.target/aarch64/pr100056.c: New test.
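
For readers without the testcase at hand, here is a minimal sketch of the
kind of function the log above describes.  The function names are
hypothetical and the code is not copied from pr100056.c; it only mirrors the
RTL shown (zero extension of a QImode value, shift left by 11, inclusive or).

```c
/* Two-instruction form: zero-extend ("and w0, w0, 255"), then
   "orr w0, w0, w0, lsl 11".  */
unsigned int
or_shift_u8 (unsigned char i)
{
  return i | ((unsigned int) i << 11);
}

/* Three-instruction form produced after the 2 to 2 combination: the
   shifted operand is computed from the unextended register and masked
   with 0x7f800 (255 << 11), which is what "ubfiz w0, w0, 11, 8" does.  */
unsigned int
or_shift_split (unsigned int x)
{
  unsigned int lo = x & 255u;               /* and   w1, w0, 255     */
  unsigned int hi = (x << 11) & 0x7f800u;   /* ubfiz w0, w0, 11, 8   */
  return hi | lo;                           /* orr   w0, w0, w1      */
}
```

Both functions compute the same value for any input whose low 8 bits match,
which is the equivalence the new combine splitters rely on when they turn the
three-instruction sequence back into two.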

[Bug target/100092] [10 Regression] nvptx offloading on aarch64 fails to specify -foffload-abi

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100092

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #3 from Richard Biener  ---
OK, so the difference in the testresults is because the 10.2.1 based compiler
did not have the offload compiler installed at test time.  Oops.

[Bug target/100092] [10 Regression] nvptx offloading on aarch64 fails to specify -foffload-abi

2021-04-15 Thread doko at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100092

Matthias Klose  changed:

   What|Removed |Added

 CC||doko at debian dot org

--- Comment #2 from Matthias Klose  ---
there's more needed, see PR96265.

[Bug rtl-optimization/100090] ICE in regcprop.c (find_oldest_value_reg)

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100090

--- Comment #2 from Richard Biener  ---
Doesn't reproduce on x86_64-linux with -m32.

[Bug c++/100091] [11 Regression] decltype([]{}) rejected as a default template parameter

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100091

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Priority|P3  |P1
Summary|decltype([]{}) rejected as  |[11 Regression]
   |a default template  |decltype([]{}) rejected as
   |parameter   |a default template
   ||parameter
 Ever confirmed|0   |1
   Target Milestone|--- |11.0
  Known to work||10.3.0
   Last reconfirmed||2021-04-15

--- Comment #1 from Richard Biener  ---
Confirmed.

[Bug target/100092] [10 Regression] nvptx offloading on aarch64 fails to specify -foffload-abi

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100092

--- Comment #1 from Richard Biener  ---
On trunk g:29a14a1a907947fe9e43bce62d3468559f17da97 adds TARGET_OFFLOAD_OPTIONS
to aarch64.

[Bug target/100092] [10 Regression] nvptx offloading on aarch64 fails to specify -foffload-abi

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100092

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |10.4
 Target||nvptx
   Host||aarch64-linux

[Bug target/100092] New: [10 Regression] nvptx offloading on aarch64 fails to specify -foffload-abi

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100092

Bug ID: 100092
   Summary: [10 Regression] nvptx offloading on aarch64 fails to
specify -foffload-abi
   Product: gcc
   Version: 10.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Comparing GCC 10.2.1 and 10.3.0 testresults on aarch64 I see

=== libgomp Summary ===

-# of expected passes   7627
-# of unexpected successes  6
-# of expected failures 4
+# of expected passes   6837
+# of unexpected failures   820
+# of expected failures 10
+# of unresolved testcases  400
 # of untested testcases3
-# of unsupported tests 601
+# of unsupported tests 602

with all the FAILs being like

+FAIL: libgomp.c/examples-4/target_data-4.c (internal compiler error)
+FAIL: libgomp.c/examples-4/target_data-4.c (test for excess errors)
+UNRESOLVED: libgomp.c/examples-4/target_data-4.c compilation failed to produce
executable

and in the log

spawn -ignore SIGHUP
/home/abuild/rpmbuild/BUILD/gcc-10.3.0+git1587/obj-aarch64-suse-linux/./gcc/xgcc
-B/home/abuild/rpmbuild/BUILD/gcc-10.3.0+git1587/obj-aarch64-suse-linux/./gcc/
-B/usr/aarch64-suse-linux/bin/ -B/usr/aarch64-suse-linux/lib/ -isystem
/usr/aarch64-suse-linux/include -isystem /usr/aarch64-suse-linux/sys-include
-fchecking=1 ../../../../libgomp/testsuite/libgomp.c/examples-4/target_data-4.c
-B/home/abuild/rpmbuild/BUILD/gcc-10.3.0+git1587/obj-aarch64-suse-linux/aarch64-suse-linux/./libgomp/
-B/home/abuild/rpmbuild/BUILD/gcc-10.3.0+git1587/obj-aarch64-suse-linux/aarch64-suse-linux/./libgomp/.libs
-I/home/abuild/rpmbuild/BUILD/gcc-10.3.0+git1587/obj-aarch64-suse-linux/aarch64-suse-linux/./libgomp
-I../../../../libgomp/testsuite/../../include
-I../../../../libgomp/testsuite/.. -Lno -fmessage-length=0
-fno-diagnostics-show-caret -Wno-hsa -fdiagnostics-color=never
-B/usr/lib64/gcc/aarch64-suse-linux/10 -B/usr/bin -fopenmp -O2
-L/home/abuild/rpmbuild/BUILD/gcc-10.3.0+git1587/obj-aarch64-suse-linux/aarch64-suse-linux/./libgomp/.libs
-lm -o ./target_data-4.exe^M
mkoffload: internal compiler error: in main, at config/nvptx/mkoffload.c:511^M
Please submit a full bug report,^M
with preprocessed source if appropriate.^M
See  for instructions.^M
lto-wrapper: fatal error:
/usr/lib64/gcc/aarch64-suse-linux/10//accel/nvptx-none/mkoffload returned 4
exit status^M
compilation terminated.^M
/usr/aarch64-suse-linux/bin/ld: error: lto-wrapper failed^M

if you look at mkoffload you see

  switch (offload_abi)
    {
    case OFFLOAD_ABI_LP64:
      obstack_ptr_grow (_obstack, "-m64");
      break;
    case OFFLOAD_ABI_ILP32:
      obstack_ptr_grow (_obstack, "-m32");
      break;
    default:
      gcc_unreachable ();  // <--- ICE here

I fail to see a corresponding setting of -foffload-abi in the aarch64 backend
on the branch (there is one on trunk).
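
To make the failure mode concrete, here is a small stand-alone sketch of the
same dispatch.  The enum and function names are hypothetical, not the
identifiers used in mkoffload.c; the point is only that an ABI value that was
never set falls through to the default case.

```c
#include <stddef.h>

/* Hypothetical mirror of the offload ABI values.  */
enum offload_abi_kind { ABI_UNSET, ABI_LP64, ABI_ILP32 };

/* Return the option to forward to the offload compiler, or NULL when the
   host compiler never passed an ABI (the case that reaches
   gcc_unreachable in the excerpt above).  */
const char *
abi_to_option (enum offload_abi_kind abi)
{
  switch (abi)
    {
    case ABI_LP64:
      return "-m64";
    case ABI_ILP32:
      return "-m32";
    default:
      return NULL;
    }
}
```

If the host backend does not arrange for -foffload-abi to be passed, the
value stays unset and the default branch is the only one that can be taken.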

I'm not at all sure how this is a regression but the testcase passes earlier
(10.2.1+git583 vs. 10.3.0+git1587)

[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs

2021-04-15 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

--- Comment #4 from Tom de Vries  ---
Investigated using cuda-gdb.

After typing ^c, we investigate the state:
...
(cuda-gdb) info cuda kernels
  Kernel Parent Dev Grid Status SMs Mask GridDim BlockDim Invocation
*      0      -   0    1 Active   0x0010 (1,1,1) (32,8,1) main$_omp_fn()
...

So, we have 256 threads in the CTA, or 8 warps.
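
The arithmetic behind that statement, as a small sketch.  The warp size of
32 is an assumption here (it matches the x dimension of BlockDim above), and
the helper names are made up for illustration.

```c
/* Assumed warp size; not taken from the debugger output.  */
#define WARP_SIZE 32u

/* Threads in one CTA for a BlockDim of (x, y, z).  */
unsigned
threads_per_block (unsigned x, unsigned y, unsigned z)
{
  return x * y * z;
}

/* Number of warps needed for that many threads, rounding up.  */
unsigned
warps_per_block (unsigned threads)
{
  return (threads + WARP_SIZE - 1) / WARP_SIZE;
}
```

For BlockDim (32,8,1) this gives 32 * 8 * 1 = 256 threads and 256 / 32 = 8
warps.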

The threads have the following state:
...
(cuda-gdb) info cuda threads
  BlockIdx ThreadIdx To BlockIdx ThreadIdx Count Virtual PC Filename Line
Kernel 0
*  (0,0,0)   (0,0,0)     (0,0,0)   (0,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)   (0,1,0)     (0,0,0)   (0,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)   (1,0,0)     (0,0,0)   (1,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)   (1,1,0)     (0,0,0)   (1,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)   (2,0,0)     (0,0,0)   (2,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)   (2,1,0)     (0,0,0)   (2,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)   (3,0,0)     (0,0,0)   (3,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)   (3,1,0)     (0,0,0)   (3,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)   (4,0,0)     (0,0,0)   (4,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)   (4,1,0)     (0,0,0)   (4,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)   (5,0,0)     (0,0,0)   (5,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)   (5,1,0)     (0,0,0)   (5,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)   (6,0,0)     (0,0,0)   (6,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)   (6,1,0)     (0,0,0)   (6,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)   (7,0,0)     (0,0,0)   (7,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)   (7,1,0)     (0,0,0)   (7,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)   (8,0,0)     (0,0,0)   (8,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)   (8,1,0)     (0,0,0)   (8,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)   (9,0,0)     (0,0,0)   (9,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)   (9,1,0)     (0,0,0)   (9,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (10,0,0)     (0,0,0)  (10,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)  (10,1,0)     (0,0,0)  (10,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (11,0,0)     (0,0,0)  (11,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)  (11,1,0)     (0,0,0)  (11,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (12,0,0)     (0,0,0)  (12,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)  (12,1,0)     (0,0,0)  (12,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (13,0,0)     (0,0,0)  (13,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)  (13,1,0)     (0,0,0)  (13,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (14,0,0)     (0,0,0)  (14,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)  (14,1,0)     (0,0,0)  (14,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (15,0,0)     (0,0,0)  (15,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)  (15,1,0)     (0,0,0)  (15,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (16,0,0)     (0,0,0)  (16,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)  (16,1,0)     (0,0,0)  (16,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (17,0,0)     (0,0,0)  (17,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)  (17,1,0)     (0,0,0)  (17,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (18,0,0)     (0,0,0)  (18,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)  (18,1,0)     (0,0,0)  (18,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (19,0,0)     (0,0,0)  (19,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)  (19,1,0)     (0,0,0)  (19,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (20,0,0)     (0,0,0)  (20,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)  (20,1,0)     (0,0,0)  (20,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (21,0,0)     (0,0,0)  (21,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)  (21,1,0)     (0,0,0)  (21,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (22,0,0)     (0,0,0)  (22,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)  (22,1,0)     (0,0,0)  (22,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (23,0,0)     (0,0,0)  (23,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)  (23,1,0)     (0,0,0)  (23,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (24,0,0)     (0,0,0)  (24,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)  (24,1,0)     (0,0,0)  (24,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (25,0,0)     (0,0,0)  (25,0,0)     1 0x00b5f638      n/a    0
   (0,0,0)  (25,1,0)     (0,0,0)  (25,7,0)     7 0x00b2f350      n/a    0
   (0,0,0)  (26,0,0)     (0,0,0)  (26,0,0)     1 0x00b5f638

[Bug rtl-optimization/100066] [11 Regression] ICE in lra_assign, at lra-assigns.c:1649

2021-04-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100066

Jakub Jelinek  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #4 from Jakub Jelinek  ---
Fixed.

[Bug tree-optimization/100076] eembc/automotive/basefp01 has 30.3% regression compare -O2 -ftree-vectorize with -O2 on CLX/Znver3

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100076

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2021-04-15
 Status|UNCONFIRMED |NEW

--- Comment #5 from Richard Biener  ---
Note even when avoiding the STLF hit the vectorized version is slower.
You can use -mtune-ctl=^sse_unaligned_load_optimal to force loading
the lower/upper half of vectors separately.

The reason is that without -ffast-math we are using an in-order reduction
which doesn't save us much but instead just combines dependence chains
here.  We do have a related bug for this somewhere.

With -ffast-math the version with/without
-mtune-ctl=^sse_unaligned_load_optimal
is about the same speed, so STLF is a red herring here (on Zen2).

Still not vectorizing is a lot faster.

Can you check if -mtune-ctl=^sse_unaligned_load_optimal helps on CLX?
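
To illustrate the in-order reduction point, here is a sketch of the two
shapes of a floating-point sum.  The function names are made up and this is
not the benchmark code; it only shows why the in-order form keeps a single
dependence chain while the reassociated form (permitted under -ffast-math)
does not.

```c
#include <stddef.h>

/* In-order reduction: every add depends on the previous one, so the
   additions form one serial chain even if the loads are vectorized.  */
float
sum_in_order (const float *a, size_t n)
{
  float s = 0.0f;
  for (size_t i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Reassociated reduction with four independent accumulators.  The result
   can differ in the last bits from the in-order sum, which is why the
   compiler only does this when reassociation is allowed.  */
float
sum_reassociated (const float *a, size_t n)
{
  float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
  size_t i = 0;
  for (; i + 4 <= n; i += 4)
    {
      s0 += a[i];
      s1 += a[i + 1];
      s2 += a[i + 2];
      s3 += a[i + 3];
    }
  float s = (s0 + s1) + (s2 + s3);
  for (; i < n; i++)   /* Remaining elements when n is not a multiple of 4.  */
    s += a[i];
  return s;
}
```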

[Bug target/100088] ymm store split into two xmm stores

2021-04-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100088

--- Comment #4 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #3)
> (In reply to Hongtao.liu from comment #2)
> > > 
> > > This issue does not exist for sse or avx512f. Setting `-march=haswell` or
> > > `-mtune=haswell` on the command line also seems to fix this but neither of
> > > these works when added to the target attribute.
> > 
> > It seems a problem that target attribute arch=haswell didn't work
> typo tune=haswell
> > 
> > https://godbolt.org/z/5hhqTYv9G

Seems also fixed in GCC11.

[Bug target/100088] ymm store split into two xmm stores

2021-04-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100088

--- Comment #3 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #2)
> > 
> > This issue does not exist for sse or avx512f. Setting `-march=haswell` or
> > `-mtune=haswell` on the command line also seems to fix this but neither of
> > these works when added to the target attribute.
> 
> It seems a problem that target attribute arch=haswell didn't work
typo tune=haswell
> 
> https://godbolt.org/z/5hhqTYv9G

[Bug target/100088] ymm store split into two xmm stores

2021-04-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100088

Hongtao.liu  changed:

   What|Removed |Added

 CC||crazylht at gmail dot com

--- Comment #2 from Hongtao.liu  ---

> 
> This issue does not exist for sse or avx512f. Setting `-march=haswell` or
> `-mtune=haswell` on the command line also seems to fix this but neither of
> these works when added to the target attribute.

It seems a problem that target attribute arch=haswell didn't work

https://godbolt.org/z/5hhqTYv9G

[Bug tree-optimization/100089] [11 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2021-04-15
 Status|UNCONFIRMED |NEW
Summary|[11 Performance regression  |[11 Regression] 30%
   |] 30% for   |performance regression for
   |denbench/mp2decoddata2 with |denbench/mp2decoddata2 with
   |-O3 |-O3
   Target Milestone|--- |11.0

--- Comment #1 from Richard Biener  ---
Indeed loop vectorization throws if-converted bodies at the BB vectorizer as a
last resort (because BB vectorization doesn't do if-conversion itself).  But
the BB vectorizer then uses the if-converted scalar code as the thing to
cost against (costing against the not if-converted loop body isn't really
possible).  To quote

  /* If we applied if-conversion then try to vectorize the
     BB of innermost loops.
     ???  Ideally BB vectorization would learn to vectorize
     control flow by applying if-conversion on-the-fly, the
     following retains the if-converted loop body even when
     only non-if-converted parts took part in BB vectorization.  */
  if (flag_tree_slp_vectorize != 0
      && loop_vectorized_call
      && ! loop->inner)
    {

As a "hack" we could scalar-cost the always-executed part of
the not if-converted loop body and apply the full bias of this cost
vs. the scalar cost of the if-converted body to the scalar cost of the
BB vectorization.  But that's really apples-to-oranges in the end
(as it is now).

Maybe we can cost the whole partly vectorized loop body in this mode
and compare it against the scalar cost of the original loop.  But even
the loop vectorizer costs the if-converted scalar loop, so it is off as well.

Long-term if-conversion needs to be integrated with vectorization so we
can at least keep track of what stmts were originally executed conditional
and what not.

Short-term I'm not sure we can do much.  Doing SLP on the if-converted
body does help in quite some cases.
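
As a source-level illustration of what "if-converted" means for costing, here
is a sketch with made-up function names.  The compiler performs this
transformation on GIMPLE, not on C source, so this is only an analogy for the
two loop bodies being compared.

```c
#include <stddef.h>

/* Original form: the multiply and the store only happen when the
   condition holds.  */
void
scale_positive_branchy (const int *a, int *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (a[i] > 0)
      b[i] = a[i] * 2;
}

/* If-converted form: straight-line body.  The multiply, the select and
   the store now run on every iteration, so its scalar cost is higher
   than that of the branchy loop whenever the condition is rarely true.
   (Assumes a[i] * 2 does not overflow.)  */
void
scale_positive_ifcvt (const int *a, int *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int doubled = a[i] * 2;
      b[i] = a[i] > 0 ? doubled : b[i];
    }
}
```

Costing a vectorized version against the second form rather than the first
is the apples-to-oranges comparison the comment refers to.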

[Bug c++/100091] New: decltype([]{}) rejected as a default template parameter

2021-04-15 Thread pilarlatiesa at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100091

Bug ID: 100091
   Summary: decltype([]{}) rejected as a default template
parameter
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pilarlatiesa at gmail dot com
  Target Milestone: ---

This piece of code is accepted by 10.2, but rejected by yesterday's (20210414)
snapshot:

$ cat test.cpp

  template
  void f() {}

$ ../GCC-11/bin/g++ -std=c++20 -c test.cpp

test.cpp:2:30: error: lambda-expression in template parameter type
2 | template
  |

It's a recent change in behaviour. Possibly caused by
r11-8166-ge1666ebd9ad31dbd8b9b933c883bdd882cfd1522.

I'm labeling this as rejects-valid because I believe [basic.def.odr]/14 allows
such a use of a lambda, but I hardly understand that wording.

[Bug tree-optimization/100086] [11 Regression] spurious -Wnonnull with __builtin_expect

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100086

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

Richard Biener  changed:

   What|Removed |Added

 Target||powerpc
   Last reconfirmed||2021-04-15
 Ever confirmed|0   |1
   Keywords||missed-optimization
  Component|rtl-optimization|target
 Status|UNCONFIRMED |NEW

--- Comment #2 from Richard Biener  ---
RTL expansion for

vui128_t test_xfer_bin128_2_vui128t (__binary128 f128)
{
  vector(1) __int128 unsigned _3;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _3 = VIEW_CONVERT_EXPR(f128_2(D));
  return _3;

power9 (-) vs power8 (+) is

 (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
-(insn 6 3 7 2 (set (mem/c:KF (reg/f:DI 112 virtual-stack-vars) [1  S16 A128])
-(reg/v:KF 118 [ f128 ])) "vec_f128_ppc.h":143:19 -1
- (nil))
-(insn 7 6 8 2 (set (reg:V1TI 120)
-(mem/c:V1TI (reg/f:DI 112 virtual-stack-vars) [1  S16 A128]))
"t.c":13:10 -1
+(insn 6 3 7 2 (set (subreg:V1TI (reg:KF 120 [ f128 ]) 0)
+(rotate:V1TI (subreg:V1TI (reg/v:KF 118 [ f128 ]) 0)
+(const_int 64 [0x40]))) "vec_f128_ppc.h":143:19 -1
+ (nil))
+(insn 7 6 8 2 (set (mem/c:V1TI (reg/f:DI 112 virtual-stack-vars) [1  S16
A128])
+(rotate:V1TI (subreg:V1TI (reg:KF 120 [ f128 ]) 0)
+(const_int 64 [0x40]))) "vec_f128_ppc.h":143:19 -1
+ (nil))
+(insn 8 7 9 2 (set (reg:V2DI 122)
+(vec_select:V2DI (mem/c:V2DI (reg/f:DI 112 virtual-stack-vars) [1  S16
A128])
+(parallel [
+(const_int 1 [0x1])
+(const_int 0 [0])
+]))) "t.c":13:10 -1
+ (nil))
+(insn 9 8 10 2 (set (subreg:V2DI (reg:V1TI 121) 0)
+(vec_select:V2DI (reg:V2DI 122)
+(parallel [
+(const_int 1 [0x1])
+(const_int 0 [0])
+]))) "t.c":13:10 -1
  (nil))

so power8 avoids the stack but in turn ends up with sth that's not
optimized down the road.

[Bug rtl-optimization/100090] ICE in regcprop.c (find_oldest_value_reg)

2021-04-15 Thread andrewdkaster at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100090

--- Comment #1 from Andrew Kaster  ---
Created attachment 50598
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50598&action=edit
Reduced Test Case (cvise-ified)

[Bug rtl-optimization/100090] New: ICE in regcprop.c (find_oldest_value_reg)

2021-04-15 Thread andrewdkaster at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100090

Bug ID: 100090
   Summary: ICE in regcprop.c (find_oldest_value_reg)
   Product: gcc
   Version: 10.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrewdkaster at gmail dot com
  Target Milestone: ---
  Host: x86_64-linux-gnu
Target: i686-pc-serenity

Encountered an ICE when trying to port OpenJDK to the Serenity operating
system. G++ used has these patches applied to provide spec files for the OS:

https://github.com/SerenityOS/serenity/blob/b6093ae2e362aa1f9cf7e87bc829b37b24250d25/Toolchain/Patches/gcc.patch

Test case was reduced with cvise 1.6.0 (Ubuntu 20.04), which seems to be the
cause of nearly all the warnings.



~/bugs$ ../serenity/Toolchain/Local/i686/bin/i686-pc-serenity-g++ -v -std=c++11
-save-temps -g -O2 -fpermissive -S -c testcase.i
Using built-in specs.
COLLECT_GCC=../serenity/Toolchain/Local/i686/bin/i686-pc-serenity-g++
Target: i686-pc-serenity
Configured with: /home/andrew/serenity/Toolchain/Tarballs/gcc-10.3.0/configure
--prefix=/home/andrew/serenity/Toolchain/Local/i686 --target=i686-pc-serenity
--with-sysroot=/home/andrew/serenity/Toolchain/../Build/i686/Root --disable-nls
--with-newlib --enable-shared --enable-languages=c,c++ --enable-default-pie
--enable-lto
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.3.0 (GCC)
COLLECT_GCC_OPTIONS='-v' '-std=c++11' '-save-temps' '-g' '-O2' '-fpermissive'
'-S' '-c' '-mtune=generic' '-march=pentiumpro'

/home/andrew/serenity/Toolchain/Local/i686/libexec/gcc/i686-pc-serenity/10.3.0/cc1plus
-fpreprocessed testcase.i -fno-exceptions -ftls-model=initial-exec -quiet
-dumpbase testcase.i -mtune=generic -march=pentiumpro -auxbase testcase -g -O2
-std=c++11 -version -fpermissive -o testcase.s -fno-exceptions
-ftls-model=initial-exec
GNU C++11 (GCC) version 10.3.0 (i686-pc-serenity)
compiled by GNU C version 10.2.0, GMP version 6.2.0, MPFR version
4.0.2, MPC version 1.1.0, isl version none
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C++11 (GCC) version 10.3.0 (i686-pc-serenity)
compiled by GNU C version 10.2.0, GMP version 6.2.0, MPFR version
4.0.2, MPC version 1.1.0, isl version none
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: cf67ba5e5efa94a26d76866548722b70
testcase.i:12:3: warning: ISO C++ forbids declaration of 'ab' with no type
[-fpermissive]
   12 |   ab(a);
  |   ^~
testcase.i:18:3: warning: ISO C++ forbids declaration of 'm' with no type
[-fpermissive]
   18 |   m() { return ag; }
  |   ^
testcase.i:19:4: warning: ISO C++ forbids declaration of 'ag' with no type
[-fpermissive]
   19 |   *ag;
  |^~
testcase.i: In member function 'int k::m()':
testcase.i:18:16: warning: invalid conversion from 'int*' to 'int'
[-fpermissive]
   18 |   m() { return ag; }
  |^~
  ||
  |int*
testcase.i: At global scope:
testcase.i:30:3: warning: ISO C++ forbids declaration of 'aj' with no type
[-fpermissive]
   30 |   aj(ah ak) {
  |   ^~
testcase.i: In member function 'int n::aj(n::ah)':
testcase.i:34:3: warning: no return statement in function returning non-void
[-Wreturn-type]
   34 |   }
  |   ^
testcase.i: In constructor 'ao::ao(k*)':
testcase.i:48:23: warning: invalid conversion from 'int' to 'n*' [-fpermissive]
   48 |   ao(k *aq) : ao(aq->m(), aq) {}
  |  ~^~
  |   |
  |   int
testcase.i:45:9: note:   initializing argument 1 of 'ao::ao(n*, k*)'
   45 |   ao(n *ai, k *) : ap(ai) {}
  |  ~~~^~
testcase.i: At global scope:
testcase.i:67:3: warning: ISO C++ forbids declaration of 'bc' with no type
[-fpermissive]
   67 |   bc(az, bool);
  |   ^~
testcase.i:76:3: warning: ISO C++ forbids declaration of 'bg' with no type
[-fpermissive]
   76 |   bg(b, e, d &, g &);
  |   ^~
testcase.i:95:3: warning: ISO C++ forbids declaration of 'bq' with no type
[-fpermissive]
   95 |   bq(b);
  |   ^~
testcase.i:102:11: warning: ISO C++ forbids declaration of 'bu' with no type
[-fpermissive]
  102 |   virtual bu();
  |   ^~
testcase.i:120:11: warning: ISO C++ forbids declaration of 'bg' with no type
[-fpermissive]
  120 |   virtual bg(b ce) {
  |   ^~
testcase.i: In constructor 'ca::ca(ca::bf, ar*, bp*, bl*, at*)':
testcase.i:119:36: warning: anachronistic old-style base class initializer
[-fpermissive]
  119 |   ca(bf, ar *, bp *, bl *, at *) : ("") {}
  |^
testcase.i:119:37: warning: invalid conversion from 'const char*' to 'char'
[-fpermissive]
  119 |   ca(bf, ar *, bp *, bl *, at *) : ("") {}
  | ^~
  

[Bug tree-optimization/100082] missed optimization for dead code elimination at -O3 (vs. -O2)

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100082

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Keywords||missed-optimization
Version|unknown |11.0
   Last reconfirmed||2021-04-15

--- Comment #1 from Richard Biener  ---
Confirmed.  At -O2 PRE manages to optimize away the call to foo.  The
difference starts at cunrolli, where -O3 unrolls but -O2 does not; disabling
cunrolli restores the optimization.

Re: [PATCH] [GCC-9] backport -march=tigerlake to GCC9 [PR target/100009]

2021-04-15 Thread Uros Bizjak via Gcc-patches
On Wed, Apr 14, 2021 at 3:30 AM Hongtao Liu  wrote:
>
> On Tue, Apr 13, 2021 at 6:38 PM Uros Bizjak  wrote:
> >
> > On Tue, Apr 13, 2021 at 12:18 PM Hongtao Liu  wrote:
> > >
> > > Hi:
> > >   As described in the PR, we introduced the tigerlake string in
> > > driver-i386.c by r9-8652 w/o support for -march/tune=tigerlake, which
> > > causes an error when using -march/tune=native with GCC9 on a tigerlake
> > > machine.
> > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > >   Ok for GCC9?
> > >
> > > gcc/
> > > * common/config/i386/i386-common.c
> > > (processor_names): Add tigerlake.
> > > (processor_alias_table): Ditto.
> > > * config.gcc: Document -march=tigerlake.
> >
> > Nope. Better.
> >
> > (x86_64_archs): Ditto.
> >
> > > * config/i386/driver-i386.c
> > > (host_detect_local_cpu): Detect tigerlake, add "has_avx" to
> > > classify processor.
> > > * config/i386/i386-c.c (ix86_target_macros_internal): Handle
> > > tigerlake.
> >
> > Handle PROCESSOR_TIGERLAKE.
> >
> > > * config/i386/i386.c (m_TIGERLAKE)  : Define.
> > > (m_CORE_AVX512): Ditto.
> >
> > You don't define this macro, but you add m_TIGERLAKE to m_CORE_AVX512.
> > Please correct this confusion.
> >
> > > (processor_cost_table): Add tigerlake.
> >
> > Please correct the above. You added skylake_cost.
> >
> > > (ix86_option_override_internal): Handle PTA_MOVDIRI, 
> > > PTA_MOVDIR64B.
> >
> > Where?
> >
> > > (processor_model): Add M_INTEL_COREI7_TIGERLAKE.
> > > (arch_names_table): Add tigerlake.
> > > (get_builtin_code_for_version) : Handle PROCESSOR_TIGERLAKE.
> > > * config/i386/i386.h (TARGET_TIGERLAKE): Define.
> > > (processor_type) : Add PROCESSOR_TIGERLAKE.
> >
> > (enum processor_type)
> >
> > > (PTA_TIGERLAKE)  : Ditto.
> >
> > Ditto what? This is a new define.
> >
> > > * doc/extend.texi: Add tigerlake.
> > > * doc/invoke.texi: Add tigerlake.
> >
> > Added where? To which section?
> >
> > > gcc/testsuite/
> > > * gcc.target/i386/funcspec-56.inc: Handle new march.
> > > * g++.target/i386/mv16.C: Handle new march
> >
> > Dot.
> >
> > >
> > > libgcc/
> > > * config/i386/cpuinfo.h: Add INTEL_COREI7_TIGERLAKE.
> >
> > (enum processor_subtypes)
> > >
> > > From-SVN: r274693
> >
> > Please repost with improved/corrected ChangeLog.
> >
> > Uros.
> >
> > > --
> > > BR,
> > > Hongtao
>
> updated.
>
> gcc/
> * common/config/i386/i386-common.c
> (processor_names): Add tigerlake.
> (processor_alias_table): Ditto.
> * config.gcc (x86_64_archs): Ditto.
> * config/i386/driver-i386.c
> (host_detect_local_cpu): Detect tigerlake, add "has_avx" to
> classify processor.
> * config/i386/i386-c.c (ix86_target_macros_internal): Handle
> PROCESSOR_TIGERLAKE.
> * config/i386/i386.c (m_TIGERLAKE): Define.
> (m_CORE_AVX512): Add m_TIGERLAKE.
> (processor_cost_table): Add skylake_cost for tigerlake.
> (processor_model): Add M_INTEL_COREI7_TIGERLAKE.
> (arch_names_table): Add tigerlake.
> (get_builtin_code_for_version): Handle PROCESSOR_TIGERLAKE.
> * config/i386/i386.h (TARGET_TIGERLAKE): Define.
> (enum processor_type): Add PROCESSOR_TIGERLAKE.
> (PTA_TIGERLAKE): Define.
> * doc/extend.texi (__builtin_cpu_is): Add tigerlake.
> * doc/invoke.texi (-march=cpu-type): Ditto.
>
> gcc/testsuite/
> * gcc.target/i386/funcspec-56.inc: Handle new march.
> * g++.target/i386/mv16.C: Handle new march.
>
> libgcc/
> * config/i386/cpuinfo.h (enum processor_subtypes): Add
> INTEL_COREI7_TIGERLAKE.

OK.

Thanks,
Uros.

>
>
> --
> BR,
> Hongtao


[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081

--- Comment #2 from Richard Biener  ---
From the profile it looks like there's a lot of tree INTEGER_CST work being
done rather than sticking to wide_ints.  That's always more expensive (by a
constant factor).

[Bug rtl-optimization/100080] missed optimization for dead code elimination at -O3 (vs. -O2)

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100080

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Ever confirmed|0   |1
   Last reconfirmed||2021-04-15
 Status|UNCONFIRMED |NEW
  Component|tree-optimization   |rtl-optimization
Version|unknown |11.0

--- Comment #1 from Richard Biener  ---
Confirmed.  At -O2 combine manages to drop the call to foo () (indirectly), at
-O3 it does not.  There's not much difference at the GIMPLE level.

[Bug tree-optimization/100089] New: [11 Performance regression ] 30% for denbench/mp2decoddata2 with -O3

2021-04-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089

Bug ID: 100089
   Summary: [11 Performance regression ] 30% for
denbench/mp2decoddata2 with -O3
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: crazylht at gmail dot com
CC: hjl.tools at gmail dot com
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: x86_64-*-* i?86-*-*

Created attachment 50597
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50597&action=edit
denbench_mp2decoddata2.cpp

https://godbolt.org/z/EGoz1zx61

cat test.cpp

static inline void idctrow(e_s16 *blk)
{
   e_s32 x0, x1, x2, x3, x4, x5, x6, x7, x8;


   if (!((x1 = blk[4]<<11) | (x2 = blk[6]) | (x3 = blk[2]) |
  (x4 = blk[1]) | (x5 = blk[7]) | (x6 = blk[5]) | (x7 = blk[3])))
   {
  blk[0]=blk[1]=blk[2]=blk[3]=blk[4]=blk[5]=blk[6]=blk[7]=(e_s16)blk[0]<<3;
  return;
   }

   x0 = (blk[0]<<11) + 128;


   x8 = 565*(x4+x5);
   x4 = x8 + (2841 -565)*x4;
   x5 = x8 - (2841 +565)*x5;
   x8 = 2408*(x6+x7);
   x6 = x8 - (2408 -1609)*x6;
   x7 = x8 - (2408 +1609)*x7;


   x8 = x0 + x1;
   x0 -= x1;
   x1 = 1108*(x3+x2);
   x2 = x1 - (2676 +1108)*x2;
   x3 = x1 + (2676 -1108)*x3;
   x1 = x4 + x6;
   x4 -= x6;
   x6 = x5 + x7;
   x5 -= x7;
   x7 = x8 + x3;
   x8 -= x3;
   x3 = x0 + x2;
   x0 -= x2;
   x2 = (181*(x4+x5)+128)>>8;
   x4 = (181*(x4-x5)+128)>>8;

   blk[0] = (e_s16)((x7+x1)>>8);
   blk[1] = (e_s16)((x3+x2)>>8);
   blk[2] = (e_s16)((x0+x4)>>8);
   blk[3] = (e_s16)((x8+x6)>>8);
   blk[4] = (e_s16)((x8-x6)>>8);
   blk[5] = (e_s16)((x0-x4)>>8);
   blk[6] = (e_s16)((x3-x2)>>8);
   blk[7] = (e_s16)((x7-x1)>>8);

}

int
__attribute__ ((noipa))
Fast_IDCT(e_s16 *block)
{
   e_s32 i;

   for (i=0; i<8; i++)
  idctrow(block+8*i);

   return 1;
}

pass_ifcvt transforms the if branch in idctrow into conditional moves, and
pass_vect then finds that, although there is no loop vectorization
opportunity, there are opportunities for SLP.  However, the SLP cost model does
not consider the cost of these conditional moves, which eventually results in a
large number of redundant test and cmov instructions in codegen.
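
To make the transformation described above concrete, here is a minimal sketch
in C.  It is not the benchmark code and not what GCC emits: row_branchy and
row_ifconverted are hypothetical names, and the arithmetic is a shortened
stand-in for idctrow.  It only shows the shape of the change: after
if-conversion both results are computed unconditionally and every store
becomes a select on the same condition.

```c
#include <stdint.h>

/* Branchy form, modelled loosely on idctrow: a cheap path when all
   non-DC inputs are zero, otherwise the full computation.  */
void row_branchy(int16_t *blk)
{
    if (!(blk[1] | blk[2] | blk[3])) {
        blk[0] = blk[1] = blk[2] = blk[3] = (int16_t)(blk[0] * 8);
        return;
    }
    int32_t sum = blk[0] + blk[1] + blk[2] + blk[3];
    int32_t alt = blk[0] - blk[1] + blk[2] - blk[3];
    blk[0] = (int16_t)sum;
    blk[1] = (int16_t)alt;
    blk[2] = (int16_t)(sum - alt);
    blk[3] = (int16_t)(sum + alt);
}

/* Hand-written if-converted form: both paths are evaluated, then each
   element is chosen with a select (a cmov candidate on x86).  */
void row_ifconverted(int16_t *blk)
{
    int all_zero = !(blk[1] | blk[2] | blk[3]);
    int16_t flat = (int16_t)(blk[0] * 8);
    int32_t sum = blk[0] + blk[1] + blk[2] + blk[3];
    int32_t alt = blk[0] - blk[1] + blk[2] - blk[3];
    blk[0] = all_zero ? flat : (int16_t)sum;
    blk[1] = all_zero ? flat : (int16_t)alt;
    blk[2] = all_zero ? flat : (int16_t)(sum - alt);
    blk[3] = all_zero ? flat : (int16_t)(sum + alt);
}
```

Both functions store the same values; the second simply trades one branch for
one select per stored element, which is the per-element overhead the report
is concerned about.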

test.cpp:76:11: note:   stmt 1 MEM[(e_s16 *)_3 + 2B] = _ifc__264;
test.cpp:76:11: note:   stmt 2 MEM[(e_s16 *)_3 + 4B] = _ifc__267;
test.cpp:76:11: note:   stmt 3 MEM[(e_s16 *)_3 + 6B] = _ifc__270;
test.cpp:76:11: note:   stmt 4 MEM[(e_s16 *)_3 + 8B] = _ifc__273;
test.cpp:76:11: note:   stmt 5 MEM[(e_s16 *)_3 + 10B] = _ifc__276;
test.cpp:76:11: note:   stmt 6 MEM[(e_s16 *)_3 + 12B] = _ifc__279;
test.cpp:76:11: note:   stmt 7 MEM[(e_s16 *)_3 + 14B] = _ifc__282;
test.cpp:76:11: note:   children 0x3ec9580
test.cpp:76:11: note: node (external) 0x3ec9580 (max_nunits=1, refcnt=1)
test.cpp:76:11: note:   { _ifc__261, _ifc__264, _ifc__267, _ifc__270,
_ifc__273, _ifc__276, _ifc__279, _ifc__282 }
test.cpp:76:11: note: Cost model analysis: 
0x3c1aee0 _ifc__261 1 times scalar_store costs 12 in body
0x3c1aee0 _ifc__264 1 times scalar_store costs 12 in body
0x3c1aee0 _ifc__267 1 times scalar_store costs 12 in body
0x3c1aee0 _ifc__270 1 times scalar_store costs 12 in body
0x3c1aee0 _ifc__273 1 times scalar_store costs 12 in body
0x3c1aee0 _ifc__276 1 times scalar_store costs 12 in body
0x3c1aee0 _ifc__279 1 times scalar_store costs 12 in body
0x3c1aee0 _ifc__282 1 times scalar_store costs 12 in body
0x3c1aee0 _ifc__261 1 times unaligned_store (misalign -1) costs 12 in body
0x3c1aee0  1 times vec_construct costs 32 in prologue
test.cpp:76:11: note: Cost model analysis for part in loop 1:
  Vector cost: 44
  Scalar cost: 96
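
The two totals can be reproduced from the individual entries in the dump.  The
sketch below only redoes that arithmetic; the enum and function names are made
up for illustration and are not GCC identifiers.  Note that neither total
contains a term for the selects that produce the _ifc__ values, which matches
the observation that the SLP cost model does not account for them.

```c
/* Cost figures copied from the dump above.  */
enum {
    LANES = 8,                 /* eight e_s16 stores per group */
    SCALAR_STORE_COST = 12,    /* "scalar_store costs 12" */
    UNALIGNED_STORE_COST = 12, /* "unaligned_store ... costs 12" */
    VEC_CONSTRUCT_COST = 32    /* "vec_construct costs 32" */
};

/* Eight scalar stores.  */
int scalar_cost(void)
{
    return LANES * SCALAR_STORE_COST;
}

/* One vector store plus building the vector from the eight scalars.  */
int vector_cost(void)
{
    return UNALIGNED_STORE_COST + VEC_CONSTRUCT_COST;
}

/* The comparison that makes the SLP variant look profitable here.  */
int slp_looks_profitable(void)
{
    return vector_cost() < scalar_cost();
}
```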


int Fast_IDCT (e_s16 * block)
{
  vector(8) short int * vectp.78;
  vector(8) short int * vectp.77;
  e_s32 x0;
  e_s32 x1;
  e_s32 x2;
  e_s32 x3;
  e_s32 x4;
  e_s32 x5;
  e_s32 x6;
  e_s32 x7;
  e_s32 x8;
  e_s32 i;
  long unsigned int i.0_1;
  long unsigned int _2;
  e_s16 * _3;
  unsigned long ivtmp_4;
  unsigned long ivtmp_5;
  short int _10;
  int _11;
  int _12;
  short int _14;
  short int _16;
  short int _17;
  short int _19;
  short int _20;
  short int _22;
  short int _23;
  short int _25;
  short int _26;
  short int _28;
  short int _29;
  long int _31;
  int _34;
  short int _35;
  int _38;
  int _39;
  long int _41;
  long int _43;
  long int _45;
  long int _47;
  long int _49;
  long int _51;
  long int _55;
  long int _57;
  long int _59;
  long int _69;
  long int _70;
  long int _71;
  long int _73;
  long int _74;
  long int _75;
  long int _77;
  long int _78;
  short int _79;
  long int _80;
  long int _81;
  short int _82;
  long int _83;
  long int _84;
  short int _85;
  long int _86;
  long int _87;
  short int _88;
  long int _89;
  long int _90;
  short int _91;
  long int _92;
  long int _93;
  short int _94;
  long int _95;
  long int _96;
  short int _97;
  long int _98;
  

[Bug fortran/99307] FAIL: gfortran.dg/class_assign_4.f90 execution test

2021-04-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99307

--- Comment #12 from CVS Commits  ---
The master branch has been updated by Paul Thomas :

https://gcc.gnu.org/g:9a0e09f3dd5339bb18cc47317f2298d9157ced29

commit r11-8187-g9a0e09f3dd5339bb18cc47317f2298d9157ced29
Author: Paul Thomas 
Date:   Thu Apr 15 07:34:26 2021 +0100

Fortran: Fix class reallocate on assignment [PR99307].

2021-04-15  Paul Thomas  

gcc/fortran
PR fortran/99307
* symbol.c: Remove trailing white space.
* trans-array.c (gfc_trans_create_temp_array): Create a class
temporary for class expressions and assign the new descriptor
to the data field.
(build_class_array_ref): If the class expr can be extracted,
then use that for 'decl'. Class function results are reliably
handled this way. Call gfc_find_and_cut_at_last_class_ref to
eliminate largely redundant code. Remove dead code and recast
the rest of the code to extract 'decl' for remaining cases.
Call gfc_build_spanned_array_ref.
(gfc_alloc_allocatable_for_assignment): Use class descriptor
element length for 'elemsize1'. Eliminate repeat set of dtype
for class expressions.
* trans-expr.c (gfc_find_and_cut_at_last_class_ref): Include
additional code from build_class_array_ref, and use optional
gfc_typespec pointer argument.
(gfc_trans_scalar_assign): Make use of pre and post blocks for
all class expressions.
* trans.c (get_array_span): For unlimited polymorphic exprs
multiply the span by the value of the _len field.
(gfc_build_spanned_array_ref): New function.
(gfc_build_array_ref): Call gfc_build_spanned_array_ref and
eliminate repeated code.
* trans.h: Add arg to gfc_find_and_cut_at_last_class_ref and
add prototype for gfc_build_spanned_array_ref.

Re: [PATCH V6 2/7] dwarf: new dwarf_debuginfo_p predicate

2021-04-15 Thread Richard Biener via Gcc-patches
On Wed, Apr 14, 2021 at 4:07 PM Jose E. Marchesi via Gcc-patches
 wrote:
>
> This patch introduces a dwarf_debuginfo_p predicate that abstracts and
> replaces complex checks on write_symbols.

OK once stage1 opens (can be pushed independently of the rest).

Richard.

> 2021-04-14  Indu Bhagat  
>
> gcc/ChangeLog
>
> * flags.h (dwarf_debuginfo_p): New function declaration.
> * opts.c (dwarf_debuginfo_p): New function definition.
> * config/c6x/c6x.c (c6x_output_file_unwind): Likewise.
> * dwarf2cfi.c (cfi_label_required_p): Likewise.
> (dwarf2out_do_frame): Likewise.
> * final.c (dwarf2_debug_info_emitted_p): Likewise.
> (final_scan_insn_1): Likewise.
> * targhooks.c (default_debug_unwind_info): Likewise.
> * toplev.c (process_options): Likewise.
>
> gcc/c-family/ChangeLog
>
> * c-lex.c (init_c_lex): Use dwarf_debuginfo_p.
> ---
>  gcc/c-family/c-lex.c |  4 ++--
>  gcc/config/c6x/c6x.c |  3 +--
>  gcc/dwarf2cfi.c  |  9 -
>  gcc/final.c  | 15 ++-
>  gcc/flags.h  |  3 +++
>  gcc/opts.c   |  8 
>  gcc/targhooks.c  |  2 +-
>  gcc/toplev.c |  6 ++
>  8 files changed, 27 insertions(+), 23 deletions(-)
>
> diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
> index 6374b72ed2d..5174b22c303 100644
> --- a/gcc/c-family/c-lex.c
> +++ b/gcc/c-family/c-lex.c
> @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "stor-layout.h"
>  #include "c-pragma.h"
>  #include "debug.h"
> +#include "flags.h"
>  #include "file-prefix-map.h" /* remap_macro_filename()  */
>  #include "langhooks.h"
>  #include "attribs.h"
> @@ -87,8 +88,7 @@ init_c_lex (void)
>
>/* Set the debug callbacks if we can use them.  */
>if ((debug_info_level == DINFO_LEVEL_VERBOSE
> -   && (write_symbols == DWARF2_DEBUG
> -  || write_symbols == VMS_AND_DWARF2_DEBUG))
> +   && dwarf_debuginfo_p ())
>|| flag_dump_go_spec != NULL)
>  {
>cb->define = cb_define;
> diff --git a/gcc/config/c6x/c6x.c b/gcc/config/c6x/c6x.c
> index f9ad1e5f6c5..a10e2f8d662 100644
> --- a/gcc/config/c6x/c6x.c
> +++ b/gcc/config/c6x/c6x.c
> @@ -439,8 +439,7 @@ c6x_output_file_unwind (FILE * f)
>  {
>if (flag_unwind_tables || flag_exceptions)
> {
> - if (write_symbols == DWARF2_DEBUG
> - || write_symbols == VMS_AND_DWARF2_DEBUG)
> + if (dwarf_debuginfo_p ())
> asm_fprintf (f, "\t.cfi_sections .debug_frame, .c6xabi.exidx\n");
>   else
> asm_fprintf (f, "\t.cfi_sections .c6xabi.exidx\n");
> diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
> index 362ff3fdac2..c27ac1960b0 100644
> --- a/gcc/dwarf2cfi.c
> +++ b/gcc/dwarf2cfi.c
> @@ -39,7 +39,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "expr.h"  /* init_return_column_size */
>  #include "output.h"/* asm_out_file */
>  #include "debug.h" /* dwarf2out_do_frame, dwarf2out_do_cfi_asm */
> -
> +#include "flags.h" /* dwarf_debuginfo_p */
>
>  /* ??? Poison these here until it can be done generically.  They've been
> totally replaced in this file; make sure it stays that way.  */
> @@ -2289,8 +2289,7 @@ cfi_label_required_p (dw_cfi_ref cfi)
>
>if (dwarf_version == 2
>&& debug_info_level > DINFO_LEVEL_TERSE
> -  && (write_symbols == DWARF2_DEBUG
> - || write_symbols == VMS_AND_DWARF2_DEBUG))
> +  && dwarf_debuginfo_p ())
>  {
>switch (cfi->dw_cfi_opc)
> {
> @@ -3557,9 +3556,9 @@ bool
>  dwarf2out_do_frame (void)
>  {
>/* We want to emit correct CFA location expressions or lists, so we
> - have to return true if we're going to output debug info, even if
> + have to return true if we're going to generate debug info, even if
>   we're not going to output frame or unwind info.  */
> -  if (write_symbols == DWARF2_DEBUG || write_symbols == VMS_AND_DWARF2_DEBUG)
> +  if (dwarf_debuginfo_p ())
>  return true;
>
>if (saved_do_cfi_asm > 0)
> diff --git a/gcc/final.c b/gcc/final.c
> index daae115fef5..cae692062b4 100644
> --- a/gcc/final.c
> +++ b/gcc/final.c
> @@ -1442,7 +1442,8 @@ asm_str_count (const char *templ)
>  static bool
>  dwarf2_debug_info_emitted_p (tree decl)
>  {
> -  if (write_symbols != DWARF2_DEBUG && write_symbols != VMS_AND_DWARF2_DEBUG)
> +  /* When DWARF2 debug info is not generated internally.  */
> +  if (!dwarf_debuginfo_p ())
>  return false;
>
>if (DECL_IGNORED_P (decl))
> @@ -2330,10 +2331,8 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, int 
> optimize_p ATTRIBUTE_UNUSED,
>   break;
>
> case NOTE_INSN_BLOCK_BEG:
> - if (debug_info_level == DINFO_LEVEL_NORMAL
> - || debug_info_level == DINFO_LEVEL_VERBOSE
> - || write_symbols == DWARF2_DEBUG
> - || write_symbols == VMS_AND_DWARF2_DEBUG
> + if 
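
The opts.c hunk that defines the predicate is not part of the quoted text
above.  Judging only from the conditions the patch replaces, a minimal
stand-alone sketch could look like the following.  The enum here is a reduced
stand-in for GCC's debug info type, write_symbols is a plain global for the
sake of the example, and do_frame_sketch is a hypothetical caller, so treat
this as an assumption rather than the committed code.

```c
#include <stdbool.h>

/* Reduced stand-in for GCC's debug info kinds; only what is needed here.  */
enum debug_info_type {
    NO_DEBUG,
    DWARF2_DEBUG,
    VMS_DEBUG,
    VMS_AND_DWARF2_DEBUG
};

/* Stand-in for the global that the replaced checks tested directly.  */
enum debug_info_type write_symbols = NO_DEBUG;

/* True when DWARF debug info is to be generated, alone or with VMS.  */
bool dwarf_debuginfo_p(void)
{
    return write_symbols == DWARF2_DEBUG
           || write_symbols == VMS_AND_DWARF2_DEBUG;
}

/* Hypothetical caller: the two-way comparison collapses to one call.  */
bool do_frame_sketch(void)
{
    return dwarf_debuginfo_p();
}
```

Callers no longer need to know which write_symbols values imply DWARF, so a
new debug format only requires touching the predicate.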

Re: [RFC] Run pass_sink_code once more after ivopts/fre

2021-04-15 Thread Xionghu Luo via Gcc-patches
Thanks,

On 2021/4/14 14:41, Richard Biener wrote:
>> "#538,#235,#234,#233" will all be sunk from bb 35 to bb 37 by rtl-sink,
>> but it moves #538 first, then #235; there is a strong dependency here. It
>> doesn't seem to work like the LCM framework, which could solve everything
>> and do the delete-insert in one iteration.
> So my question was whether we want to do both within the LCM store
> sinking framework.  The LCM dataflow is also used by RTL PRE which
> handles both loads and non-loads so in principle it should be able
> to handle stores and non-stores for the sinking case (PRE on the
> reverse CFG).
> 
> A global dataflow is more powerful than any local ad-hoc method.

My biggest concern is whether the LCM DF framework could support sinking
*multiple* reverse-dependent non-store instructions together in *one*
call to LCM DF.   If this is not supported, we would need to run LCM
multiple times until there are no new changes, which would obviously be
time-consuming (unless compile time is not important here).

> 
> Richard.
> 
>> However, there are still some common methods that could be shared, like
>> the def-use check (though store-motion is per bb, rtl-sink is per loop),
>> insert_store, commit_edge_insertions etc.
>>
>>
>>508: L508:
>>507: NOTE_INSN_BASIC_BLOCK 34
>> 12: r139:DI=r140:DI
>>REG_DEAD r140:DI
>>240: L240:
>>231: NOTE_INSN_BASIC_BLOCK 35
>>232: r142:DI=zero_extend(r139:DI#0)
>>233: r371:SI=r142:DI#0-0x1
>>234: r243:DI=zero_extend(r371:SI)
>>REG_DEAD r371:SI
>>235: r452:DI=r262:DI+r139:DI
>>538: r194:DI=r452:DI
>>236: r372:CCUNS=cmp(r142:DI#0,r254:DI#0)


As here: each instruction's dest reg is computed in the input vector
bitmap.  After solving the equations by calling pre_edge_rev_lcm, the first
call moves #538 out of the loop, a second call then moves #235 out of the
loop, and so on; 4 repeated calls are needed in total here.  Is the LCM
framework smart enough to move all 4 instructions within one iteration?
I am worried that the input vector bitmap cannot solve the dependency
problem for two back-chained instructions.
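
To illustrate the concern, here is a toy model in C of the "repeat until
nothing changes" scheme.  It is not GCC code and does not model LCM; the
function name, the array encoding and the single-consumer assumption are all
made up for the example.  Each round may sink only those instructions whose
result is no longer consumed inside the loop, judged on the state at the
start of the round.

```c
#include <stdbool.h>

#define MAX_INSNS 16

/* user[i] is the index of the in-loop instruction that consumes the
   result of instruction i, or -1 if nothing in the loop consumes it.
   Returns the number of rounds needed until no more instructions can
   be sunk.  Assumes n <= MAX_INSNS and at most one consumer each.  */
int count_sink_rounds(const int *user, int n)
{
    bool in_loop[MAX_INSNS];
    bool can_sink[MAX_INSNS];
    int rounds = 0;

    for (int i = 0; i < n; i++)
        in_loop[i] = true;

    for (;;) {
        bool any = false;

        /* Decide on the state at the start of the round.  */
        for (int i = 0; i < n; i++) {
            can_sink[i] = in_loop[i]
                          && (user[i] < 0 || !in_loop[user[i]]);
            any = any || can_sink[i];
        }
        if (!any)
            break;

        /* Commit this round's moves.  */
        for (int i = 0; i < n; i++)
            if (can_sink[i])
                in_loop[i] = false;
        rounds++;
    }
    return rounds;
}
```

In this model a chain of four instructions, each feeding the next, needs four
rounds, while four independent instructions need only one.  Whether a single
LCM solve can avoid that per-link cost is exactly the open question.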


-- 
Thanks,
Xionghu


[Bug tree-optimization/93210] Sub-optimal code optimization on struct/combound constexpr (gcc vs. clang)

2021-04-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93210

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Stefan Schulze Frielinghaus
:

https://gcc.gnu.org/g:417c36cfd620bf2b047852c2aa9ac49004aed2bc

commit r11-8186-g417c36cfd620bf2b047852c2aa9ac49004aed2bc
Author: Stefan Schulze Frielinghaus 
Date:   Thu Apr 15 08:03:47 2021 +0200

re PR tree-optimization/93210 (Sub-optimal code optimization on
struct/combound constexpr (gcc vs. clang))

Regarding test gcc.dg/pr93210.c, on different targets GIMPLE code may
slightly differ which is why the scan-tree-dump-times directive may
fail.  For example, for a RETURN_EXPR on x86_64 we have

  return 0x11100f0e0d0c0a090807060504030201;

whereas on IBM Z the first operand is a RESULT_DECL like

  <retval> = 0x102030405060708090a0c0d0e0f1011;
  return <retval>;

gcc/testsuite/ChangeLog:

* gcc.dg/pr93210.c: Adapt regex in order to also support a
RESULT_DECL as an operand for a RETURN_EXPR.

Re: removing toxic emailers

2021-04-15 Thread Thomas Koenig via Gcc

My 0.02 Euro-Cent:

There is a minor problem with contributors being overly harsh/
borderline abusive on the mailing list.  In my > 15 years with
the project, I have only had that problem with one single
person, and I have resolved that by never again touching the
system that particular person is responsible for, also not
for testing.

The _real_ problem is in bugzilla, mostly with abusive users
complaining about the time it sometimes takes to fix bugs
("Why didn't you fix this?  Are you stupid or what? That bug
has been open for _weeks_!") or who will not understand that
their program has an error, and insist on the compiler sanctioning
their particular non-standard usage.

On bugzilla, there is also a rather minor problem with contributors
being overly harsh/borderline abusive, but that is also quite
limited.

If we talk about gcc becoming a more welcoming place, bugzilla
is the place to start.


