[Bug c++/52202] [C++11][DR 1376] Should not extend lifetime of temporary wrapped in static_cast to reference type

2021-04-06 Thread jens.maurer at gmx dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52202

--- Comment #5 from Jens Maurer  ---
Core issue 1299 resolved via http://wg21.link/p0727 does in fact
lifetime-extend the temporary in the example.

This bug report should therefore be closed without action.
(If a test case checking that lifetime extension does happen is missing, the
example code can have its "abort" condition reversed to serve that purpose.)
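A minimal sketch of the behaviour described above, assuming a compiler that implements the CWG 1299 / P0727 resolution: binding a reference to a temporary through a static_cast to a reference type lifetime-extends that temporary. The type `Probe` and the function `temporary_outlives_cast` are hypothetical names for illustration only; this is not the PR's original testcase.

```cpp
#include <cassert>

// Hypothetical type that records whether its destructor has run.
struct Probe {
    static bool destroyed;
    ~Probe() { destroyed = true; }
};
bool Probe::destroyed = false;

// Returns true if the temporary stays alive for as long as the reference r,
// i.e. if it was lifetime-extended through the static_cast.
bool temporary_outlives_cast() {
    Probe::destroyed = false;
    {
        const Probe& r = static_cast<const Probe&>(Probe());
        (void)r;
        // Without lifetime extension the temporary would already have been
        // destroyed at the end of the full-expression above.
        if (Probe::destroyed)
            return false;
    }
    // Leaving the scope ends r's lifetime, which destroys the temporary.
    return Probe::destroyed;
}
```

Under those rules temporary_outlives_cast() is expected to return true; a test asserting the opposite would correspond to the pre-P0727 (DR 1376) reading.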

[Bug target/99905] [8/9/10/11 Regression] wrong code with -mno-mmx -mno-sse since r7-4540-gb229ab2a712ccd44

2021-04-06 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99905

Martin Liška  changed:

   What|Removed |Added

Summary|[8/9/10/11 Regression] wrong code with -mno-mmx -mno-sse |[8/9/10/11 Regression] wrong code with -mno-mmx -mno-sse since r7-4540-gb229ab2a712ccd44
 CC||matz at gcc dot gnu.org

--- Comment #4 from Martin Liška  ---
> If -mgeneral-regs-only is not used, the flag can be replaced by -mno-mmx
> -mno-sse:

Good idea! Doing that, it started with r7-4540-gb229ab2a712ccd44.

[Bug target/99924] [11 Regression] ICE in vect_schedule_slp_node, at tree-vect-slp.c:6040 since r11-6734-gad2603433853129e847cade5e269c6a5f889a020

2021-04-06 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99924

Martin Liška  changed:

   What|Removed |Added

   Last reconfirmed||2021-04-06
  Known to fail||11.0
  Known to work||10.2.0
 Status|UNCONFIRMED |NEW
Summary|ICE in vect_schedule_slp_node, at tree-vect-slp.c:6040 |[11 Regression] ICE in vect_schedule_slp_node, at tree-vect-slp.c:6040 since r11-6734-gad2603433853129e847cade5e269c6a5f889a020
 CC||marxin at gcc dot gnu.org,
   ||tnfchris at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Martin Liška  ---
Started with r11-6734-gad2603433853129e847cade5e269c6a5f889a020.

[Bug target/99924] [11 Regression] ICE in vect_schedule_slp_node, at tree-vect-slp.c:6040 since r11-6734-gad2603433853129e847cade5e269c6a5f889a020

2021-04-06 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99924

Martin Liška  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

[Bug bootstrap/99920] [10 regression] ICE building gcc 10 on power 7 BE

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99920

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1
   Target Milestone|--- |10.3
 CC||jakub at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org

[Bug bootstrap/99920] [10 regression] ICE building gcc 10 on power 7 BE

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99920

--- Comment #1 from Jakub Jelinek  ---
Could you please attach preprocessed source so that I can try to look at it
quickly with a cross-compiler?  Thanks.

[Bug c/99872] [11 Regression] optimizations sometimes lead to missing asm prefixes

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99872

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |11.0
   Last reconfirmed||2021-04-06
 Status|UNCONFIRMED |WAITING
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Are you saying your provided __fpclassify should not have another underscore?

[Bug tree-optimization/99873] [11 Regression] GCC no longer makes as much use of ST3

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99873

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

--- Comment #2 from Richard Biener  ---
We can also undo the splitting if SLP doesn't work out (keep the original
DR analysis chaining somewhere).

[Bug c++/99845] gcc8: Overloaded operator new[](size_t, const std::nothrow_t&) is seg faulting when the allocation fails

2021-04-06 Thread keith.halligan at microfocus dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99845

Keith Halligan  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|INVALID |---

--- Comment #11 from Keith Halligan  ---

I am re-opening this issue as there seems to be a bug with the 32-bit code
generation that I'm after noticing.  

While adding noexcept to the "operator new[]()" overloaded functions does stop
the crash on 64-bit, it does nothing for the 32-bit code, where the compiler
still throws (a std::bad_array_new_length, per the output below).

Below is a version of Jonathan Wakely's modified testcase that has noexcept on
the "operator new[]()".

$ cat bug99845.cpp && g++ -m32 -O1 -o bug99845 bug99845.cpp

namespace std {
  using size_t = decltype(sizeof(0));

  struct nothrow_t { } const nothrow = { };
}

void* operator new(std::size_t);
void* operator new[](std::size_t);
void operator delete(void*) noexcept;
void operator delete[](void*) noexcept;
void operator delete(void*, std::size_t) noexcept;
void operator delete[](void*, std::size_t) noexcept;

void* operator new(std::size_t, const std::nothrow_t&) noexcept;
void* operator new[](std::size_t, const std::nothrow_t&) noexcept;
void operator delete(void*, const std::nothrow_t&) noexcept;
void operator delete[](void*, const std::nothrow_t&) noexcept;

extern "C" int printf(const char* ...);

using std::size_t;

struct X
{
  void* operator new[](size_t sz, const std::nothrow_t& nt) noexcept {
    return ::operator new(sz, nt);
  }

  unsigned data = 0;
};

struct Y
{
  static X* alloc(unsigned n) { return new(std::nothrow) X[n]; }
};

int main()
{
  Y::alloc(-1u);
}

==
$ ./bug99845
terminate called after throwing an instance of 'std::bad_array_new_length'
  what():  std::bad_array_new_length
Aborted (core dumped)
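Some back-of-the-envelope arithmetic for the failure above, assuming sizeof(X) is 4 (a single unsigned member): with -m32, std::size_t is 32 bits, so new X[-1u] asks for 0xFFFFFFFF * 4 bytes, which cannot be represented. The generated size check presumably fails before any allocation function is called, which is consistent with the std::bad_array_new_length in the output; on x86-64 the same product fits in a 64-bit size_t. The helper below is a hypothetical sketch of such a check, not GCC's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sketch: compute the byte size of an array new-expression for a
// target with a 32-bit size_t, reporting overflow instead of wrapping.
bool array_bytes_32(std::uint32_t count, std::uint32_t elem_size,
                    std::uint32_t& bytes) {
    const std::uint64_t wide = static_cast<std::uint64_t>(count) * elem_size;
    if (wide > std::numeric_limits<std::uint32_t>::max())
        return false;  // size not representable: the "invalid size" case
    bytes = static_cast<std::uint32_t>(wide);
    return true;
}
```

In current C++ wording, when the selected allocation function is non-throwing, an invalid array size is expected to make the new-expression yield a null pointer rather than throw, which is the behaviour the reporter is after.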

[Bug sanitizer/99877] [8/9/10/11 Regression] Crash in GIMPLE pass:sanopt in huge function using OpenMP since r8-7544-gd838c2d5a8b1844c

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99877

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |8.5
   Priority|P3  |P2

[Bug tree-optimization/99880] [10/11 Regression] ICE in maybe_set_vectorized_backedge_value, at tree-vect-loop.c:9161 since r10-3711-g69f8c1aef5cdcc54

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99880

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
   Priority|P3  |P2
   Target Milestone|--- |10.3
 Status|NEW |ASSIGNED

--- Comment #2 from Richard Biener  ---
Mine.

[Bug c++/99926] New: Parameter packs and variadic arguments: Clang, gcc, and msvc differ on this one

2021-04-06 Thread matthurd at acm dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99926

Bug ID: 99926
   Summary: Parameter packs and variadic arguments: Clang, gcc,
and msvc differ on this one
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matthurd at acm dot org
  Target Milestone: ---

I found three different results from three compilers:
https://godbolt.org/z/cEoYrn4T8 - so two of them are wrong.

g++ trunk and 10.2 are affected.

I thought gcc may be correct and clang may be incorrect in this compiler
difference so I filed a bug with llvm. Richard Smith surmised gcc was incorrect
and clang is correct, so I have closed the clang bug and I'm opening a gcc bug
here if you'll indulge me.

[Note: clang issue: https://bugs.llvm.org/show_bug.cgi?id=49852]




gcc, clang, and msvc all compile this fun one:

auto foo(auto..) { return 42; }
int add_three() { return foo(3,4,5); }



But they argue about this curious one:

[[nodiscard]] constexpr auto foo(auto...t...) noexcept {return (... + t);}
int add_three() { return foo(3,4,5);} //gcc(7),  clang(12), msvc(err)
int add_more()  { return foo(3,4,5,6);   } //gcc(18), clang(18), msvc(err)

https://godbolt.org/z/cEoYrn4T8


It looks like gcc may be failing to extend the deduction as clang does. Like
clang, EDG extends it as well, Richard reported. msvc will give the same answer
as clang if auto is not used and it is a normal template expansion. This leaves
gcc as the outlier I guess.


That gcc may be wrong makes sense, though it makes a C-style variadic after a
parameter pack unreachable, which is a wee semantic hole in the grammar I
guess.

--Matt.

[Bug target/99881] Regression compare -O2 -ftree-vectorize with -O2 on SKX/CLX

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99881

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-*

--- Comment #3 from Richard Biener  ---
But 2 element construction _should_ be cheap.  What is missing is the move
cost from GPR to XMM regs (but we do not have a good idea whether the sources
are memory, so it's not as clear-cut here either).

IMHO a better approach might be to up unaligned vector store/load costs?

For the testcase at hand why does a throughput of 1 pose a problem?  There's
only one punpckldq instruction around?

Note that for the case of non-loop vectorization of 'double' the two element
vector CTORs are common and important to handle cheaply.  See also all the
discussion in PR98856

[Bug tree-optimization/99887] Failure to optimize log2 pattern to clz

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99887

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2021-04-06
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from Richard Biener  ---
Confirmed.  We don't have CLZ pattern detection (niter analysis can only
detect popcount).
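For illustration, a hypothetical loop of the kind such a pattern matcher would need to recognize (the PR's exact testcase is not reproduced here), next to the closed form using __builtin_clz. The two agree for every nonzero x; x == 0 must be excluded because __builtin_clz(0) is undefined.

```cpp
#include <cassert>
#include <climits>

// Hypothetical loop-based floor(log2(x)) for x != 0: counts how many times
// x can be shifted right before it becomes zero.
int log2_loop(unsigned x) {
    int r = 0;
    while (x >>= 1)
        ++r;
    return r;
}

// Closed form a CLZ-based rewrite could produce; only valid for x != 0.
int log2_clz(unsigned x) {
    return static_cast<int>(sizeof(unsigned) * CHAR_BIT) - 1 - __builtin_clz(x);
}
```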

[Bug tree-optimization/99887] Failure to optimize log2 pattern to clz

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99887

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
Related to PR94793

[Bug target/99905] [8/9/10/11 Regression] wrong code with -mno-mmx -mno-sse since r7-4540-gb229ab2a712ccd44

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99905

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
   Target Milestone|--- |8.5

[Bug target/99908] SIMD: negating logical + if_else has a suboptimal codegen.

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99908

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-04-06
 Ever confirmed|0   |1
Version|unknown |11.0

--- Comment #1 from Richard Biener  ---
Confirmed.  On GIMPLE the intrinsics are opaque:

   [local count: 1073741824]:
  _10 = __builtin_ia32_andnotsi256 (mask_3(D), { -1, -1, -1, -1 });
  _7 = VIEW_CONVERT_EXPR(_10);
  _4 = VIEW_CONVERT_EXPR(b_6(D));
  _2 = VIEW_CONVERT_EXPR(a_5(D));
  _8 = __builtin_ia32_pblendvb256 (_2, _4, _7);
  _9 = VIEW_CONVERT_EXPR<__m256i>(_8);
  return _9;

and on RTL the blend is an UNSPEC:

(insn 14 13 15 2 (set (reg:V32QI 93)
(unspec:V32QI [
(reg:V32QI 94)
(reg:V32QI 95)
(reg:V32QI 96)
] UNSPEC_BLENDV)) "include/avx2intrin.h":209:20 -1
 (nil))

that makes it a target missed optimization.
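The missed fold rests on a simple identity: the andnot with all-ones just computes ~mask, and since a byte blend only looks at the top bit of each mask byte, blending with ~mask is the same as blending with mask and swapping the two data operands (the approach tested later in this PR with a define_split). Below is a scalar model of that identity; the names `V`, `blendv` and `bit_not` are hypothetical and the code is an illustration, not the intrinsics themselves.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V = std::array<std::uint8_t, 32>;  // one 256-bit vector as 32 bytes

// Scalar model of a byte blend (pblendvb semantics): for each byte, pick b
// when the mask byte's most significant bit is set, otherwise pick a.
V blendv(const V& a, const V& b, const V& mask) {
    V r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = (mask[i] & 0x80) ? b[i] : a[i];
    return r;
}

// Bitwise NOT of every byte, i.e. what andnot(mask, all-ones) computes.
V bit_not(const V& m) {
    V r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = static_cast<std::uint8_t>(~m[i]);
    return r;
}
```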

[Bug c++/99910] [11 Regression] g++.dg/modules/xtreme-header-2_b.C ICE

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99910

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

[Bug c++/99911] C++20 adl issue

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99911

Richard Biener  changed:

   What|Removed |Added

Version|unknown |11.0
   Keywords||rejects-valid

--- Comment #1 from Richard Biener  ---
Please provide the testcase not only through a godbolt link.

[Bug target/99748] MVE: Wrong code at -O0 with float to integer conversion

2021-04-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99748

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Alex Coplan :

https://gcc.gnu.org/g:16ea7f57891d3fe885ee55b2917208695e184714

commit r11-7999-g16ea7f57891d3fe885ee55b2917208695e184714
Author: Alex Coplan 
Date:   Tue Apr 6 09:06:27 2021 +0100

arm: Fix PCS for SFmode -> SImode libcalls [PR99748]

This patch fixes PR99748 which shows us trying to pass the argument to
__aeabi_f2iz in the VFP register s0 when the library function is
expecting to use the GPR r0. It also fixes the __aeabi_f2uiz case which
was broken in the same way.

For the testcase in the PR, here is the code we generate before the
patch (with -mfloat-abi=hard -march=armv8.1-m.main+mve -O0):

main:
push{r7, lr}
sub sp, sp, #8
add r7, sp, #0
mov r3, #1065353216
str r3, [r7, #4]@ float
vldr.32 s0, [r7, #4]
bl  __aeabi_f2iz
mov r3, r0
cmp r3, #1
[...]

This becomes:

main:
push{r7, lr}
sub sp, sp, #8
add r7, sp, #0
mov r3, #1065353216
str r3, [r7, #4]@ float
ldr r0, [r7, #4]@ float
bl  __aeabi_f2iz
mov r3, r0
cmp r3, #1
[...]

after the patch. We see a similar change for the same testcase with a
cast to unsigned instead of int.
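A small sanity check on the listings above: the immediate #1065353216 (0x3F800000) is the IEEE-754 single-precision encoding of 1.0f, so the testcase converts 1.0f to int and compares the result with 1. The function `convert` is a hypothetical reproducer of that shape, not the PR's actual testcase; on a host it simply exercises the conversion, while on the MVE target it is the kind of code that becomes a call to __aeabi_f2iz.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns the raw IEEE-754 bit pattern of a 32-bit float.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float), "expects 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Hypothetical reproducer shape: float-to-int conversion (truncates toward zero).
int convert(float f) { return static_cast<int>(f); }
```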

gcc/ChangeLog:

PR target/99748
* config/arm/arm.c (arm_libcall_uses_aapcs_base): Also use base
PCS for [su]fix_optab.

[Bug c++/99926] Parameter packs and variadic arguments: Clang, gcc, and msvc differ on this one

2021-04-06 Thread matthurd at acm dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99926

--- Comment #1 from Matt Hurd  ---
Just a correction to the commentary: the variadic after the pack is not
unreachable.  As Richard Smith points out, the following code makes the
variadic argument reachable (if you wrote such evil and it had a name):


> auto *p = foo;
> p(3, 4, 5); // passes the '5' via the C-style variable argument list.

--Matt.

[Bug target/99748] MVE: Wrong code at -O0 with float to integer conversion

2021-04-06 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99748

--- Comment #5 from Alex Coplan  ---
Fixed on trunk so far, needs a backport to GCC 10.

[Bug c++/99911] C++20 adl issue

2021-04-06 Thread denis.yaroshevskij at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99911

--- Comment #2 from Denis Yaroshevskiy  ---
Created attachment 50510
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50510&action=edit
Test Case (-std=c++20 -O3)

[Bug c++/99911] C++20 adl issue

2021-04-06 Thread denis.yaroshevskij at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99911

--- Comment #3 from Denis Yaroshevskiy  ---
I also removed the Catch dependency from the godbolt example, in case that was
the issue: https://gcc.godbolt.org/z/1YEoeeP93

[Bug tree-optimization/99927] New: [11 Regression] Maybe wrong code since r11-39-gf9e1ea10e657af9f

2021-04-06 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

Bug ID: 99927
   Summary: [11 Regression] Maybe wrong code since
r11-39-gf9e1ea10e657af9f
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: marxin at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org
  Target Milestone: ---

Reduced from yarpgen:

$ cat func.cpp
short var = 9;

int test_var_1 = 0, test_var_5 = 0, test_var_8 = 0, test_var_10 = 0;

void test(unsigned var_6, unsigned long long var_9) {

  for (; test_var_10;)
    if (test_var_5)
      for (;; test_var_1 += test_var_8)
        ;
  for (int i_10 = 0; i_10 < 23; i_10 += 4)
    for (unsigned int i_11 = 0; i_11 < var_6 + 471511258; i_11 ++)
      if ((var_9 == 0) % var_6)
        var = 0;
}

int main() {
  test(3823456048, 10675217251973);
  __builtin_printf("%u\n", var);
  if (var != 9)
    __builtin_abort ();

  return 0;
}

$ g++ func.cpp && ./a.out
9
$ g++ func.cpp -O3 && ./a.out
0
Aborted (core dumped)
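Working through the constants passed by main() suggests the -O0 result (9) is the correct one, assuming a 32-bit unsigned int: var_6 + 471511258 wraps modulo 2^32 to 10, and because var_9 is nonzero the store var = 0 can never execute. The constant names below simply mirror the testcase.

```cpp
#include <cassert>

// Values passed by main() in the reduced testcase.
constexpr unsigned var_6 = 3823456048u;
constexpr unsigned long long var_9 = 10675217251973ull;

// 3823456048 + 471511258 = 4294967306, which wraps modulo 2^32 to 10,
// so the inner loop runs only ten times per outer iteration.
constexpr unsigned inner_bound = var_6 + 471511258u;
static_assert(sizeof(unsigned) != 4 || inner_bound == 10u, "unexpected wrap");

// var_9 != 0, so (var_9 == 0) is 0 and 0 % var_6 is 0: the condition guarding
// `var = 0` is always false and var should keep its initial value 9.
constexpr bool store_executes = ((var_9 == 0) % var_6) != 0;
static_assert(!store_executes, "the store must be dead");
```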

[Bug target/99912] Unnecessary / inefficient spilling of AVX2 ymm registers

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99912

Richard Biener  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org
   Keywords||missed-optimization
 Target||x86_64-*-*

--- Comment #3 from Richard Biener  ---
Which function does the loop kernel reside in?  I see you have some lambdas
in Z4c_RHS, done fancy as out-of-line functions, that do look like they
could comprise the actual kernels.  In apply_upwind_diss I see cases without
stack usage.

I'm looking at -O2 -march=skylake compiles.

Note that with C++ it's easy to retain some abstraction and thus to misinterpret
stack accesses as spilling when they are really aggregates that were not
eliminated.  For example, in one of the lambdas I see

  _61489 = __builtin_ia32_maskloadpd256 (_104487, _61513);
  D.545024[1].elts.car = _61489;
...
  MEM[(struct vect *)&D.544982].elts._M_elems[1] = MEM[(const struct simd
&)&D.545024 + 32];
...
  MEM[(struct mat3 *)&vars + 992B] = MEM[(const struct mat3 &)&D.544982];

and D.544982 is later indexed with a variable in some MIN/MAX, FMA using code
(instead of using 'vars' there).  Looking at what -fdump-tree-optimized
produces sometimes points at such problems.

That said, the code is large so please point at some source lines within the
important kernel(s) (of the preprocessed source, that is) and the compile
options used.

[Bug c/99872] [11 Regression] optimizations sometimes lead to missing asm prefixes

2021-04-06 Thread jyong at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99872

--- Comment #2 from jyong at gcc dot gnu.org ---
No, it's the internal compiler symbols like LC5 and _LC6 generated by GCC that
ignore the underscore prefix setting for the target, causing GAS to emit them
as external undefined symbols.  LD then fails to find the symbols to satisfy
them at link time.

32-bit Windows PE symbols should come with an underscore prefix; this does not
apply to 64-bit Windows code.

[Bug tree-optimization/99918] [9/10/11 Regression] suboptimal code for bool bitfield tests

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99918

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-04-06
 Blocks||85316
 Ever confirmed|0   |1
   Target Milestone|--- |9.4

--- Comment #5 from Richard Biener  ---
The main issue is optimize_bit_field_compare in fold-const.c which produces
during GENERIC folding in .005t.original:

  if ((BIT_FIELD_REF  & 1) != 0)
{
  b.j = 0;
}
  else
{
  b.j = b.i;
}
  return b.j;

that's premature in this place.  For f() it also takes until DOM3 to do
the folding unless you disable SRA which then makes EVRP recognize the
second store as a.j = 0.  With SRA we fail to derive ranges for a_10 in

  a_10 = MEM  [(struct A *)&a];
  a$1_11 = MEM  [(struct A *)&a + 1B];
  _1 = VIEW_CONVERT_EXPR<_Bool>(a_10);
  if (_1 != 0)
goto ; [INV]
  else
goto ; [INV]

   :

   :
  # a$1_9 = PHI <0(2), a_10(3)>
  _7 = VIEW_CONVERT_EXPR<_Bool>(a$1_9);

thus we're missing looking through VIEW_CONVERT_EXPR in register_assert_for.
Amending that would eventually also allow optimizing the prematurely folded
variant.
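Why the whole thing should fold to zero: whichever branch of the folded condition is taken, b.j ends up false. The struct below is a hypothetical declaration modeled on the b.i / b.j accesses in the .original dump; the PR's real declaration may differ.

```cpp
#include <cassert>

// Hypothetical struct with two bool bit-fields sharing one byte.
struct B {
    bool i : 1;
    bool j : 1;
};

// Source-level equivalent of the folded GENERIC shown above.
bool g(B b) {
    if (b.i)
        b.j = 0;      // taken branch stores false
    else
        b.j = b.i;    // b.i is false here, so this also stores false
    return b.j;       // therefore always false
}
```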


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85316
[Bug 85316] [meta-bug] VRP range propagation missed cases

[Bug tree-optimization/99919] [9/10/11 Regression] bogus -Wmaybe-uninitialized with a _Bool bit-field

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99919

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |9.4

[Bug middle-end/99928] New: [OpenMP] reduction variable in combined target construct wrongly mapped as firstprivate

2021-04-06 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99928

Bug ID: 99928
   Summary: [OpenMP] reduction variable in combined target
construct wrongly mapped as firstprivate
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: openmp, wrong-code
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: burnus at gcc dot gnu.org
CC: anlauf at gcc dot gnu.org, jakub at gcc dot gnu.org
  Target Milestone: ---

Reported by Harald at
https://gcc.gnu.org/pipermail/fortran/2021-March/055896.html
with a Fortran program. But it also occurs for C as shown below.

OpenMP 5.1 has:
"If a list item appears in a reduction, lastprivate or linear clause
 on a combined target construct then it is treated as if it also appears
 in a map clause with a map-type of tofrom." (2.21.7)

Likewise OpenMP 5.0 in 2.19.7.

The FE splits the combined target teams construct into target + teams, and on
the target construct 's' is implicitly mapped as 'firstprivate' instead of as
'map(tofrom:s)'.

It seems as if this is best done in the C/C++/Fortran FE, which still sees that
it is a combined target construct.

int a[10];

int s2()
{
  int s = 0;
  #pragma omp target data map(a,s)
  {
    #pragma omp target teams reduction(+:s)
    {
      for (int i=0; i < 10; i++)
        s += a[i];
    }
  }
  return s;
}

Original dump:

 #pragma omp target data map(tofrom:s) map(tofrom:a)
  #pragma omp target
#pragma omp teams reduction(+:s)

omplower dump:

#pragma omp target data map(tofrom:s [len: 4]) map(tofrom:a [len: 40])
#pragma omp target num_teams(0) thread_limit(0) firstprivate(s) map(tofrom:a [len: 40])
#pragma omp teams reduction(+:s) shared(a)

[Bug target/99924] [11 Regression] ICE in vect_schedule_slp_node, at tree-vect-slp.c:6040 since r11-6734-gad2603433853129e847cade5e269c6a5f889a020

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99924

Richard Biener  changed:

   What|Removed |Added

   Keywords||ice-on-valid-code
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED
   Priority|P3  |P1

--- Comment #2 from Richard Biener  ---
Confirmed.  I will have a look.

[Bug tree-optimization/99873] [11 Regression] GCC no longer makes as much use of ST3

2021-04-06 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99873

--- Comment #3 from rsandifo at gcc dot gnu.org ---
(In reply to Richard Biener from comment #2)
> We can also undo the splitting if SLP doesn't work out (keep the original
> DR analysis chaining somewhere).
Yeah, that sounds like something we should do for the cases that
can't use store-lanes.  So far though, I've not seen any cases that
are better with the split group than with the store-lanes version,
so I think we want the skip even if SLP would succeed.

[Bug target/99924] [11 Regression] ICE in vect_schedule_slp_node, at tree-vect-slp.c:6040 since r11-6734-gad2603433853129e847cade5e269c6a5f889a020

2021-04-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99924

--- Comment #3 from Tamar Christina  ---
(In reply to Richard Biener from comment #2)
> Confirmed.  I will have a look.

It's interesting, since the cost model needs to be disabled to reproduce it.

It looks like one of the load nodes already has a VEC_STMT, so the assert in
vect_schedule_slp_node fires:

  vec_stmts = {
m_vec = 0x4e0e870
  },

>>> dbgrep (node)
# VUSE <.MEM_6(D)>
_14 = REALPART_EXPR <*tfn_7(D)>;

If that helps.

[Bug tree-optimization/99927] [11 Regression] Maybe wrong code since r11-39-gf9e1ea10e657af9f

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2021-04-06
 CC||amacleod at redhat dot com
   Priority|P3  |P1
   Target Milestone|--- |11.0

--- Comment #1 from Richard Biener  ---
-fdisable-tree-cunroll fixes it.  Disabling the lim pass that runs after it
does not, but disabling lim2 does, because then we no longer unroll the loop.

Disabling DOM3 _and_ VRP2 also fixes the issue, it looks like some bogus VRP
gets triggered.

[Bug middle-end/99928] [OpenMP] reduction variable in combined target construct wrongly mapped as firstprivate

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99928

--- Comment #1 from Jakub Jelinek  ---
Yes, that is new in OpenMP 5.0, 4.5 didn't have it.
Usually we do this in the gimplifier (gimplify_scan_omp_clauses), we also know
there whether it is a combined construct or not.
Look for the various spots where we omp_notice_variable in the outer context
there based on various conditions.

[Bug target/99929] New: SVE: Wrong code at -O2 -ftree-vectorize

2021-04-06 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99929

Bug ID: 99929
   Summary: SVE: Wrong code at -O2 -ftree-vectorize
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: acoplan at gcc dot gnu.org
  Target Milestone: ---

AArch64 GCC miscompiles the following testcase:

#include <arm_sve.h>
static void e(unsigned long long *g, int p2) { *g ^= p2; }
static unsigned long long b;
static int f[1][1][1][1];
static long l[23][2];
static short m[23];
int main() {
  for (unsigned i = 0; i < 23; ++i)
    for (unsigned j = 0; j < 2; ++j)
      l[i][j] = m[i] = 4;
  if (svaddv(svptrue_pat_b32(SV_VL1), svdup_u32(1)) != 1)
    __builtin_abort();
  for (unsigned i = 0; i < 3; ++i)
    e(&b, m[i]);
}

with -march=armv8.2-a+sve -O2 -ftree-vectorize. At -O2 (without
-ftree-vectorize), we do the reduction with:

  uaddv d0, p0, z0.s

where the predicate is generated by:

  ptrue p0.b, vl1

which gives the expected result. With -ftree-vectorize, we do the reduction
with:

  uaddv d0, p1, z0.s

where the predicate is generated by:

  ptrue p1.h, all

which does not give the expected result.
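To see why the second predicate is wrong, here is a simplified scalar model (an illustration, not GCC or ACLE code): SVE predicates carry one bit per vector byte, and an element wider than a byte is active when the bit of its lowest byte is set. "ptrue p0.b, vl1" sets only bit 0, so exactly one .s lane is active and the sum is 1; "ptrue p1.h, all" sets every even bit, which activates every .s lane, so the sum becomes the number of 32-bit lanes (4 for a 128-bit vector length) and the test aborts. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One predicate bit per byte of a vl_bytes-long vector.
using Pred = std::vector<bool>;

// Models "ptrue pN.<T>, <pattern>": activates the first n elements of size
// elem_bytes ("vl1" is n = 1, "all" is n = vl_bytes / elem_bytes).
Pred ptrue(std::size_t vl_bytes, std::size_t elem_bytes, std::size_t n) {
    Pred p(vl_bytes, false);
    for (std::size_t e = 0; e < n && e * elem_bytes < vl_bytes; ++e)
        p[e * elem_bytes] = true;
    return p;
}

// Models "uaddv d0, pN, z0.s" with every 32-bit lane of z0 equal to 1:
// a lane counts if the predicate bit of its lowest byte is set.
unsigned uaddv_s_ones(const Pred& p) {
    unsigned sum = 0;
    for (std::size_t byte = 0; byte < p.size(); byte += 4)
        if (p[byte])
            ++sum;
    return sum;
}
```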

[Bug bootstrap/99920] [10 regression] ICE building gcc 10 on power 7 BE

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99920

--- Comment #2 from Jakub Jelinek  ---
Can't reproduce btw, ../configure --enable-languages=c,fortran,c++
--with-cpu=power7 --enable-bootstrap --enable-multilib
on gcc110.fsffrance.org built just fine.

[Bug target/99890] The -mstrict-align doesn't support the ARM targets

2021-04-06 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99890

Richard Earnshaw  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |WORKSFORME

--- Comment #4 from Richard Earnshaw  ---
Try -mno-unaligned-access

[Bug target/99908] SIMD: negating logical + if_else has a suboptimal codegen.

2021-04-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99908

--- Comment #2 from Hongtao.liu  ---
I'm testing

@@ -17759,6 +17759,35 @@ (define_insn "_pblendvb"
(set_attr "btver2_decode" "vector,vector,vector")
(set_attr "mode" "")])

+(define_split
+  [(set (match_operand:VI1_AVX2 0 "register_operand")
+   (unspec:VI1_AVX2
+ [(match_operand:VI1_AVX2 1 "vector_operand")
+  (match_operand:VI1_AVX2 2 "register_operand")
+  (not:VI1_AVX2 (match_operand:VI1_AVX2 3 "register_operand"))]
+ UNSPEC_BLENDV))]
+  "TARGET_SSE4_1"
+  [(set (match_dup 0)
+   (unspec:VI1_AVX2
+ [(match_dup 2) (match_dup 1) (match_dup 3)]
+ UNSPEC_BLENDV))])
+
+(define_split
+  [(set (match_operand:VI1_AVX2 0 "register_operand")
+   (unspec:VI1_AVX2
+ [(match_operand:VI1_AVX2 1 "vector_operand")
+  (match_operand:VI1_AVX2 2 "register_operand")
+  (subreg:VI1_AVX2 (not (match_operand 3 "register_operand")) 0)]
+ UNSPEC_BLENDV))]
+  "TARGET_SSE4_1
+   && GET_MODE_CLASS (GET_MODE (operands[3])) == MODE_VECTOR_INT
+   && GET_MODE_SIZE (GET_MODE (operands[3])) == "
+  [(set (match_dup 0)
+   (unspec:VI1_AVX2
+ [(match_dup 2) (match_dup 1) (match_dup 4)]
+ UNSPEC_BLENDV))]
+  "operands[4] = gen_lowpart (mode, operands[3]);")
+

[Bug c++/97900] [9/10 Regression] g++ crashes when instantiating a templated function with a template-type vector parameter

2021-04-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97900

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:ffc2331d7994d7fabb1f6ebed931024a9bbe69f2

commit r11-8000-gffc2331d7994d7fabb1f6ebed931024a9bbe69f2
Author: Jakub Jelinek 
Date:   Tue Apr 6 11:46:32 2021 +0200

testsuite: Fix up g++.dg/ext/vector40.C test

The test FAILs on i686-linux due to -Wpsabi diagnostics.

2021-04-06  Jakub Jelinek  

PR c++/97900
* g++.dg/ext/vector40.C: Add -Wno-psabi -w to dg-options.

[Bug lto/99898] Possible LTO object incompatibility on gcc-10 branch

2021-04-06 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99898

--- Comment #5 from Jan Hubicka  ---
> The LTO minor saw a bump around Sep 10 last year already so the object files
> must be younger or LTO should complain.
> 
> I'm not aware of any specific change where we forgot the bumping but there were
> a lot of changes and since we did already bump bumping again shouldn't cause
> any harm.  Still I'd like to be sure we're not seeing a genuine streaming bug
> here.

I only recall backporting the streaming fixes early in the gcc10 timeframe
(August); that was the reason for the September bump.
Didn't we backport some new command line options/params breaking
streaming of optimization nodes as usual?

honza

[Bug target/99881] Regression compare -O2 -ftree-vectorize with -O2 on SKX/CLX

2021-04-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99881

--- Comment #4 from Hongtao.liu  ---
(In reply to Richard Biener from comment #3)
> But 2 element construction _should_ be cheap.  What is missing is the move
> cost from GPR to XMM regs (but we do not have a good idea whether the sources
> are memory, so it's not as clear-cut here either).
> 
> IMHO a better approach might be to up unaligned vector store/load costs?
> 
> For the testcase at hand why does a throughput of 1 pose a problem?  There's
> only one punpckldq instruction around?
> 

There are several lea/add instructions (which may also use port 5) around
punpckldq.  Since fast LEA and integer ALU ops are common in address
computation, a throughput of 1 for punpckldq becomes a bottleneck.

refer to https://godbolt.org/z/hK9r5vTzd for original case

> Note that for the case of non-loop vectorization of 'double' the two element
> vector CTORs are common and important to handle cheaply.  See also all the
> discussion in PR98856

[Bug lto/99898] Possible LTO object incompatibility on gcc-10 branch

2021-04-06 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99898

--- Comment #6 from Jan Hubicka  ---
> I only recall backporting the streaming fixes early in the gcc10 timeframe
> (August); that was the reason for the September bump.
> Didn't we backport some new command line options/params breaking
> streaming of optimization nodes as usual?

We did, just a few hours after the bump (in common.opt).  So there is a small
range of revisions where one can produce incompatible objects.  But I did
not check language-specific or target-specific options.

Honza

[Bug lto/99898] Possible LTO object incompatibility on gcc-10 branch

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99898

--- Comment #7 from Jakub Jelinek  ---
Any *.opt changes can break the streaming of optimization or target option
nodes.
And from experience with gcc plugins we have such changes ~ each month even on
release branches.

[Bug lto/99898] Possible LTO object incompatibility on gcc-10 branch

2021-04-06 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99898

--- Comment #8 from Jan Hubicka  ---
> Any *.opt changes can break the streaming of optimization or target option
> nodes.
> And from experience with gcc plugins we have such changes ~ each month even on
> release branches.
It may make sense to add a simple test to our regular testers that
either the new revision can consume old object files or the version was
updated :)

Honza

[Bug target/99924] [11 Regression] ICE in vect_schedule_slp_node, at tree-vect-slp.c:6040 since r11-6734-gad2603433853129e847cade5e269c6a5f889a020

2021-04-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99924

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:58cd9fc8a61de09ba181c5ed5ac7fb91ec506414

commit r11-8001-g58cd9fc8a61de09ba181c5ed5ac7fb91ec506414
Author: Richard Biener 
Date:   Tue Apr 6 11:21:47 2021 +0200

tree-optimization/99924 - visit permute nodes again when partitioning

Since SLP graph partitioning works on scalar stmts (because it's done
for costing) we have to make sure to visit permute nodes multiple
times since they will not pull partitions together.

2021-04-06  Richard Biener  

PR tree-optimization/99924
* tree-vect-slp.c (vect_bb_partition_graph_r): Do not mark
nodes w/o scalar stmts as visited.

* gfortran.dg/vect/pr99924.f90: New testcase.

[Bug lto/99898] Possible LTO object incompatibility on gcc-10 branch

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99898

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek  ---
Many of the *.opt changes are target specific, so you'd need to test it also
across all targets, and furthermore it depends on what exactly is being
saved/restored, many options might be at the same spot.
So perhaps we want to compute some hash of the options stuff (e.g. compute it
by the awk scripts that emit options*.[ch]) and use that to determine LTO
compatibility in addition to the version?
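To make the hash idea concrete, here is a minimal C++ sketch. It assumes that an ordered list of option names is a good enough stand-in for whatever the awk scripts would really fingerprint (in practice it would more likely be the layout of the saved option structures). `option_table_hash` and `check_lto_options_compat` are hypothetical names, and 64-bit FNV-1a is just one convenient hash choice.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical fingerprint of the option table: hash the option names in
// the order that determines their streamed indices (64-bit FNV-1a).
std::uint64_t option_table_hash(const std::vector<std::string>& option_names) {
  std::uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
  for (const std::string& name : option_names) {
    for (unsigned char c : name) {
      h ^= c;
      h *= 1099511628211ULL;  // FNV-1a prime
    }
    // Separator byte, so {"ab","c"} and {"a","bc"} feed different bytes.
    h ^= 0xffu;
    h *= 1099511628211ULL;
  }
  return h;
}

// Reader side (hypothetical): the writer streamed its hash before the
// optimization nodes; refuse to continue if it differs from ours.
void check_lto_options_compat(std::uint64_t streamed, std::uint64_t ours) {
  if (streamed != ours)
    throw std::runtime_error(
        "object file was written by a compiler with a different option table");
}
```

Any change to the option list, including a target-specific one, changes the fingerprint, so the check would catch it without relying on someone remembering to bump the LTO version.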

[Bug target/99924] [11 Regression] ICE in vect_schedule_slp_node, at tree-vect-slp.c:6040 since r11-6734-gad2603433853129e847cade5e269c6a5f889a020

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99924

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #5 from Richard Biener  ---
Fixed.

[Bug target/99930] New: Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-06 Thread core13 at gmx dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

Bug ID: 99930
   Summary: Failure to optimize floating point -abs(x) in
nontrivial code at -O2/3
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: core13 at gmx dot net
  Target Milestone: ---

Expected compiler output for -abs(x) is an orps setting the sign bit.


It works as expected with trivial code at -O1/2/3 optimization levels:

#include <cmath>

float q(float p)
{
    return -std::abs(p);
}

orps    xmm0, XMMWORD PTR .LC1[rip]
ret
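The single orps works because, for IEEE-754 single precision, -|x| is simply x with its sign bit forced to 1. Here is a small C++ sketch of that identity. The function names are made up for illustration, and the sketch assumes a 32-bit binary32 `float`.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// For IEEE-754 binary32, -|x| is x with the sign bit (bit 31) forced to 1,
// which is what a single "orps" against a sign-mask constant computes.
// std::memcpy is used for well-defined type punning.
float neg_abs_via_sign_bit(float x) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "binary32 float assumed");
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits |= UINT32_C(0x80000000);  // set the sign bit, keep exponent/mantissa
  std::memcpy(&x, &bits, sizeof bits);
  return x;
}

// The formulation from the report, for comparison.
float neg_abs_reference(float x) { return -std::abs(x); }
```

The andps + xorps sequence at -O2/3 computes the same value in two steps: first clearing the sign bit (abs), then flipping it (neg).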


With more complex code the compiler uses orps at -O1 but andps + xorps at
-O2/3:

bool t(float n[2], float m)
{
    for (int i = 0; i < 2; i++)
        if (m > -std::abs(n[i]))
            return true;
    return false;
}

-O1
movss   xmm1, DWORD PTR [rdi]
orps    xmm1, XMMWORD PTR .LC1[rip]
comiss  xmm0, xmm1
ja      .L3
movss   xmm1, DWORD PTR [rdi+4]
orps    xmm1, XMMWORD PTR .LC1[rip]
comiss  xmm0, xmm1
seta    al
ret

-O2/3
movss   xmm1, DWORD PTR [rdi]
movss   xmm3, DWORD PTR .LC0[rip]
movss   xmm2, DWORD PTR .LC1[rip]
andps   xmm1, xmm3
xorps   xmm1, xmm2
comiss  xmm0, xmm1
ja      .L3
movss   xmm1, DWORD PTR [rdi+4]
andps   xmm1, xmm3
xorps   xmm1, xmm2
comiss  xmm0, xmm1
seta    al
ret

https://godbolt.org/z/5ch5ceEj7

[Bug c++/99931] New: Unnamed `struct` defined with `using` having internal linkage instead of external, unlike `typedef`, yielding different semantics for two

2021-04-06 Thread egor_suvorov at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99931

Bug ID: 99931
   Summary: Unnamed `struct` defined with `using` having internal
linkage instead of external, unlike `typedef`,
yielding different semantics for two
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: egor_suvorov at mail dot ru
  Target Milestone: ---

Possible duplicate of this StackOverflow question:
https://stackoverflow.com/questions/48613758/using-vs-typedef-is-there-a-subtle-lesser-known-difference

Found through this StackOverflow question by Momme Sherif with help of M.M:
https://stackoverflow.com/questions/66966426/error-defining-an-unnamed-structure-in-c

Consider the following two files:

// a.cpp
using Foo = struct {};
void test(Foo);
int main() {
Foo f;
test(f);
}
// b.cpp
using Foo = struct {};
void test(Foo) {
}

`g++ a.cpp b.cpp` fails on my machine with:

a.cpp:2:6: error: 'void test(Foo)', declared using unnamed type, is used but
never defined [-fpermissive]
2 | void test(Foo);
  |  ^~~~
a.cpp:1:7: note: 'using Foo = struct' does not refer to the
unqualified type, so it is not used for linkage
1 | using Foo = struct {};
  |   ^~~
a.cpp:2:6: warning: 'void test(Foo)' used but never defined
2 | void test(Foo);
  |  ^~~~

However, if you change `using Foo = struct {};` to `typedef struct {} Foo;`,
the code will compile successfully.

This looks like a clear semantic difference between `typedef` and `using`, which
seems to contradict [dcl.typedef]/2 (emphasis mine):

> A typedef-name can also be introduced by an alias-declaration. The identifier 
> following the using keyword is not looked up; it becomes a typedef-name and 
> the optional attribute-specifier-seq following the identifier appertains to 
> that typedef-name. **Such a typedef-name has the same semantics as if it were 
> introduced by the typedef specifier.** In particular, it does not define a 
> new type.

My GCC is

g++ (Rev6, Built by MSYS2 project) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Bug target/99932] New: OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2021-04-06 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932

Bug ID: 99932
   Summary: OpenACC/nvptx offloading execution regressions
starting with CUDA 11.2-era Nvidia Driver 460.27.04
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: openacc
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tschwinge at gcc dot gnu.org
CC: vries at gcc dot gnu.org
  Target Milestone: ---
Target: nvptx

We're seeing OpenACC/nvptx offloading execution regressions (including a lot of
timeouts) starting with CUDA 11.2-era Nvidia Driver 460.27.04.  Confirmed with:
CUDA 11.2-era 460.27.04, 460.32.03, 460.39, 460.56, 460.67, and CUDA 11.3-era
465.19.01, across several variants of GPU hardware.

Explicitly (re-)confirmed good are older versions such as CUDA 9.1-era 390.12,
and CUDA 11.1-era 455.38, 455.45.01.

Most of these are in the 'vector_length > 32' testcases, but also a few others.

@@ -6147,7 +6147,7 @@ PASS:
libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c -DACC_DEVICE_T
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2   (test
for warnings, line 596)
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2   (test
for warnings, line 618)
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test
for excess errors)
[-PASS:-]{+FAIL:+}
libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2 
execution test

@@ -6581,7 +6581,8 @@ PASS:
libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-1.c -DACC_DE
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-10.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  (test
for excess errors)
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-10.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0 
execution test
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-10.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test
for excess errors)
[-PASS:-]{+WARNING: program timed out.+}
{+FAIL:+}
libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-10.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2 
execution test
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-2.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  (test
for excess errors)
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-2.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0 
execution test
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-2.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  output
pattern test
@@ -6599,32 +6600,32 @@ PASS:
libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-3.c -DACC_DE
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-3.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  output
pattern test
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-3.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  
scan-offload-tree-dump oaccdevlow "__attribute__\\(\\(oacc function \\(1, 1,
32\\)"
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-4.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  (test
for excess errors)
[-PASS:-]{+WARNING: program timed out.+}
{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-4.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0 
execution test[-PASS:
libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-4.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  output
pattern test-]
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-4.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  
scan-offload-tree-dump oaccdevlow "__attribute__\\(\\(oacc function \\(1, 2,
128\\)"
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-4.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test
for excess errors)
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-4.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2 
execution test
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-4.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nv

[Bug tree-optimization/96573] [10 Regression] Regression in optimization on x86-64 with -O3

2021-04-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96573

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:bfeb36bd03c2168af263daa13370a20a96c42b5d

commit r11-8002-gbfeb36bd03c2168af263daa13370a20a96c42b5d
Author: Jakub Jelinek 
Date:   Tue Apr 6 12:44:51 2021 +0200

testsuite: Fix up pr96573.c on aarch64 [PR96573]

On Thu, Apr 01, 2021 at 02:16:55PM +0100, Alex Coplan via Gcc-patches
wrote:
> FYI, I'm seeing the new test failing on aarch64:
>
> PASS: gcc.dg/pr96573.c (test for excess errors)
> FAIL: gcc.dg/pr96573.c scan-tree-dump optimized "__builtin_bswap"

The vectorizer in the aarch64 case manages to emit a VEC_PERM_EXPR instead
(which is just as efficient).

So, do we want to go for the following (and/or perhaps also restrict the
test to
a couple of targets where it works?  In my last distro build it failed only
on aarch64-linux, while armv7hl-linux-gnueabi and
{i686,x86_64,powerpc64le,s390x}-linux were fine)?

2021-04-06  Jakub Jelinek  

PR tree-optimization/96573
* gcc.dg/pr96573.c: Instead of __builtin_bswap accept also
VEC_PERM_EXPR with bswapping permutation.
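Here is a scalar illustration of why the two forms are interchangeable: byte-swapping a 64-bit value is the fixed permutation {7, 6, ..., 0} applied to its bytes. `bswap_via_permute` is a hypothetical helper, and `__builtin_bswap64` is the GCC/Clang builtin.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Reverse the bytes of a 64-bit value by applying the permutation
// {7, 6, 5, 4, 3, 2, 1, 0} to its byte lanes: the scalar analogue of a
// VEC_PERM_EXPR with a "bswapping" mask. Reversing memory order gives the
// same result as __builtin_bswap64 on both little- and big-endian hosts.
std::uint64_t bswap_via_permute(std::uint64_t v) {
  static constexpr std::array<int, 8> kMask = {{7, 6, 5, 4, 3, 2, 1, 0}};
  unsigned char in[8], out[8];
  std::memcpy(in, &v, sizeof in);
  for (int i = 0; i < 8; ++i)
    out[i] = in[kMask[i]];  // lane i takes byte kMask[i]
  std::uint64_t r;
  std::memcpy(&r, out, sizeof r);
  return r;
}
```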

[Bug target/99929] SVE: Wrong code at -O2 -ftree-vectorize

2021-04-06 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99929

--- Comment #1 from Alex Coplan  ---
Slightly cleaner testcase:

#include 
static void e(short *g, short p2) { *g ^= p2; }
static short m[23];
int main() {
  for (unsigned i = 0; i < 23; ++i)
    m[i] = 4;
  if (svaddv(svptrue_pat_b32(SV_VL1), svdup_u32(1)) != 1)
    __builtin_abort();
  for (unsigned i = 0; i < 3; ++i)
    e(m, m[i]);
}

[Bug c/99872] [11 Regression] optimizations sometimes lead to missing asm prefixes

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99872

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
So perhaps it was my r11-2944-g0106300f6c3f7bae5eb1c46dbd45aa07c94e1b15
that broke it?
Seems mingw -m32 uses empty LOCAL_LABEL_PREFIX:
#define LOCAL_LABEL_PREFIX (TARGET_64BIT ? "." : "")
and so emits:
flds    LC1
...
.section .rdata,"dr"
.align 4
LC1:
.long   -8388608
.align 4
LC2:
.long   2139095040
etc.
But on the testcase it emits:
.set    _LC5,_LC6
If you change manually that .set _LC5,_LC6 to .set LC5,LC6, does it work then?

[Bug tree-optimization/99880] [10/11 Regression] ICE in maybe_set_vectorized_backedge_value, at tree-vect-loop.c:9161 since r10-3711-g69f8c1aef5cdcc54

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99880

--- Comment #3 from Richard Biener  ---
So we're having a PHI latch value vectorized because it's used by the outer
loop (vect_used_in_outer_by_reduction), but the vect_nested_cycle def PHI
in the inner loop isn't relevant since it's vect_unused_in_scope:

   [local count: 11811160]:
  # g_36 = PHI 
  # c_lsm.16_20 = PHI <_12(11), c_lsm.16_29(15)>
  # ivtmp_18 = PHI 
  g_21 = g_36 + 5;
  vect_cst__28 = { 0, 0, 0, 0 };

   [local count: 118111600]:
  # h_14 = PHI <10(5), h_24(12)>
  # c_lsm.16_22 = PHI 
  # ivtmp_30 = PHI <10(5), ivtmp_31(12)>
  h_24 = h_14 + -1;
  _7 = d.6_6 <= c_lsm.16_22;
  _8 = (int) _7;
  vect__10.19_33 = vect_cst__28;
  _10 = 0;
  ivtmp_31 = ivtmp_30 - 1;
  if (ivtmp_31 != 0)
goto ; [90.00%]
  else
goto ; [10.00%]

   [local count: 106300440]:
  goto ; [100.00%]

   [local count: 11811160]:
  # _27 = PHI <_10(6)>
  _12 = _27 - e.10_11;

it's the c_lsm.16_22 PHI and the _10 backedge def (note how we failed to
constant propagate the 10 ...).  The fix is simple and hopefully it doesn't
break other stuff.

[Bug other/99933] New: gcc/brig/brigfrontend/brig-function.cc: 4 * possible performance problem ?

2021-04-06 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99933

Bug ID: 99933
   Summary: gcc/brig/brigfrontend/brig-function.cc: 4 * possible
performance problem ?
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dcb314 at hotmail dot com
  Target Milestone: ---

Static analyser cppcheck finds the following:

trunk.git/gcc/brig/brigfrontend/brig-function.cc:1063:11: performance: Passing
the result of c_str() to a function that takes std::string as argument no. 1 is
slow and redundant. [stlcstrParam]
trunk.git/gcc/brig/brigfrontend/brig-function.cc:1091:11: performance: Passing
the result of c_str() to a function that takes std::string as argument no. 1 is
slow and redundant. [stlcstrParam]
trunk.git/gcc/brig/brigfrontend/brig-function.cc:1108:11: performance: Passing
the result of c_str() to a function that takes std::string as argument no. 1 is
slow and redundant. [stlcstrParam]
trunk.git/gcc/brig/brigfrontend/brig-function.cc:1126:11: performance: Passing
the result of c_str() to a function that takes std::string as argument no. 1 is
slow and redundant. [stlcstrParam]
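The following is a hedged illustration of what stlcstrParam is about. `takes_string`, `flagged_call`, and `preferred_call` are hypothetical stand-ins, since the callees in brig-function.cc are not shown in the report.

```cpp
#include <cstddef>
#include <string>

// Hypothetical callee taking std::string by const reference; the real
// functions called in brig-function.cc are not reproduced here.
std::size_t takes_string(const std::string& s) { return s.size(); }

// Pattern flagged by cppcheck (stlcstrParam): s.c_str() makes the compiler
// build a brand-new temporary std::string (a strlen plus a copy, possibly a
// heap allocation) just to bind it to the parameter.
std::size_t flagged_call(const std::string& s) { return takes_string(s.c_str()); }

// Suggested form: pass the existing string; no temporary is created.
std::size_t preferred_call(const std::string& s) { return takes_string(s); }
```

Besides the extra copy, the c_str() round trip stops at the first embedded NUL, so the two calls are only equivalent for strings without '\0' bytes.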

In the component field of the web page, there doesn't seem to be an entry
for brig.

[Bug lto/99898] Possible LTO object incompatibility on gcc-10 branch

2021-04-06 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99898

--- Comment #10 from Jan Hubicka  ---
> Many of the *.opt changes are target specific, so you'd need to test it also
> across all targets, and furthermore it depends on what exactly is being
> saved/restored, many options might be at the same spot.
> So perhaps we want to compute some hash of the options stuff (e.g. compute it
> by the awk scripts that emit options*.[ch]) and use that to determine LTO
> compatibility in addition to the version?

That would work.  One does not really need to do that in the LTO header; simply
stream the hash before streaming out the optimization_node decl.
A bit sad is that without version info you have no indication whether you
mixed a new compiler with old objects or vice versa, but that is a minor
annoyance, I guess.  It would be good if the compiler would just give a
sorry saying that it cannot read object files created by a different
version.

I believe we already save a diff from the default values rather than
streaming out all values.  An option would be to stream the option names
rather than indexes, so adding/removing a completely unrelated option does
not disturb the file format.
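Here is a hedged sketch of the last suggestion: streaming options by name, and only when they differ from the default. It uses a std::map of int values as a hypothetical stand-in for GCC's real option structures. `stream_out`, `stream_in`, and the `name=value` text format are made up for illustration. Unknown names are silently ignored here, though a real reader might prefer to reject them.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical option set: option name -> value.
using OptionValues = std::map<std::string, int>;

// Write only options whose value differs from the default, keyed by name.
std::string stream_out(const OptionValues& values, const OptionValues& defaults) {
  std::ostringstream os;
  for (const auto& kv : values) {
    auto d = defaults.find(kv.first);
    if (d == defaults.end() || d->second != kv.second)
      os << kv.first << '=' << kv.second << '\n';
  }
  return os.str();
}

// Start from the reader's own defaults and apply the streamed diff by name,
// so options added or removed in between do not shift anything.
OptionValues stream_in(const std::string& data, OptionValues defaults) {
  std::istringstream is(data);
  std::string line;
  while (std::getline(is, line)) {
    auto eq = line.find('=');
    if (eq == std::string::npos) continue;  // skip malformed lines
    auto it = defaults.find(line.substr(0, eq));
    if (it != defaults.end())               // ignore names we do not know
      it->second = std::stoi(line.substr(eq + 1));
  }
  return defaults;
}
```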

Honza

[Bug c++/99934] New: bad_array_new_length thrown when non-throwing allocation function would have been used

2021-04-06 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99934

Bug ID: 99934
   Summary: bad_array_new_length thrown when non-throwing
allocation function would have been used
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redi at gcc dot gnu.org
  Target Milestone: ---

Reduced from Bug 99845 comment 11:

namespace std {
  using size_t = decltype(sizeof(0));
}

extern "C" void abort();
extern "C" int puts(const char*);

struct X
{
  void* operator new[](std::size_t) noexcept {
    puts("should not be here");
    abort();
    return nullptr;
  }

  int data;
};

int main()
{
  int n = -1;
  auto p = new X[n];
  if (p)
    abort();
}

This terminates with:

terminate called after throwing an instance of 'std::bad_array_new_length'
  what():  std::bad_array_new_length
Aborted (core dumped)

The new-expression is erroneous (the array bound n has non-class type and its
value before converting to std::size_t is less than zero), and the allocation
function that would be called is non-throwing, so the value of the
new-expression should be (X*)nullptr instead of throwing
bad_array_new_length. This was changed by https://wg21.link/cwg1992
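To spell out the behavior CWG 1992 asks for, here is a hedged C++ model of evaluating `new T[n]`. It is not how GCC lowers the expression. `model_array_new` is a hypothetical helper, the noexcept-ness of the allocation function is passed in as a template flag, and only two of the standard's erroneous cases are modeled: a negative bound and a size overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <new>

// Hypothetical model of "new T[n]" with allocation function `alloc`.
// In the erroneous case the allocation function is never called.
template <bool AllocIsNoexcept>
void* model_array_new(long long n, std::size_t elem_size,
                      void* (*alloc)(std::size_t)) {
  assert(elem_size != 0 && "array elements always have non-zero size");
  const bool erroneous =
      n < 0 || static_cast<unsigned long long>(n) >
                   std::numeric_limits<std::size_t>::max() / elem_size;
  if (erroneous) {
    if (AllocIsNoexcept)
      return nullptr;                   // non-throwing allocator: null result
    throw std::bad_array_new_length();  // otherwise: throw
  }
  return alloc(static_cast<std::size_t>(n) * elem_size);
}
```

With `AllocIsNoexcept` set, the testcase above would yield a null pointer instead of terminating with std::bad_array_new_length.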

[Bug target/99872] [11 Regression] optimizations sometimes lead to missing asm prefixes

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99872

--- Comment #4 from Jakub Jelinek  ---
Created attachment 50511
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50511&action=edit
gcc11-pr99872.patch

So perhaps this patch?
I went through all ASM_OUTPUT_DEF definitions and all of them use assemble_name
for each of the last 2 arguments separately, and assemble_name should handle the
'*' etc. at the start of the symbols.
Verified with the cross to mingw -m32 that it now emits .set LC5,LC6, and
verified that on x86_64-linux it didn't change anything e.g. on the testcase
mentioned in the commit message.

I have no way to test this on mingw32, neither 32-bit nor 64-bit, but will test
it on x86_64-linux and i686-linux tonight.

[Bug c++/99845] gcc8: Overloaded operator new[](size_t, const std::nothrow_t&) is seg faulting when the allocation fails

2021-04-06 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99845

Jonathan Wakely  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |INVALID

--- Comment #12 from Jonathan Wakely  ---
(In reply to Keith Halligan from comment #11)
> While adding noexcept to the "operator new[]()" overloaded functions does
> stop the crash on 64-bit, it does nothing for the 32-bit code, with the

Clearly not "does nothing" since the behaviour changes.

> compiler attempting to throw a std::bad_alloc.

No, not bad_alloc:

> terminate called after throwing an instance of 'std::bad_array_new_length'
>   what():  std::bad_array_new_length

You've asked for an array of -1u objects of size 4, which is four times larger
than the entire address space on 32-bit x86. That makes the new-expression
erroneous, and what GCC does is exactly what the C++14 standard requires.

C++17 was changed by https://wg21.link/cwg1992 but GCC doesn't implement that
yet, which I've reported as Bug 99934.
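The arithmetic behind "four times larger than the entire address space" can be checked at compile time. This sketch assumes sizeof(int) == 4 and a 32-bit size_t, as on 32-bit x86. `exceeds_size32` is a hypothetical helper showing the kind of overflow check done before calling operator new[].

```cpp
#include <cstdint>

// Everything is computed in 64-bit arithmetic so the numbers themselves
// do not wrap.
constexpr std::uint64_t kElems = 0xFFFFFFFFu;         // -1u == 2^32 - 1
constexpr std::uint64_t kElemSize = 4;                // sizeof(int), assumed
constexpr std::uint64_t kBytes = kElems * kElemSize;  // 17179869180
constexpr std::uint64_t kAddressSpace = 1ULL << 32;   // 4 GiB

// Would n * elem_size exceed a 32-bit size_t?
constexpr bool exceeds_size32(std::uint64_t n, std::uint64_t elem_size) {
  return n > UINT32_MAX / elem_size;
}

static_assert(kBytes == 4 * kAddressSpace - 4, "just under 4x the address space");
static_assert(exceeds_size32(kElems, kElemSize), "the request is erroneous");
```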

[Bug target/99881] Regression compare -O2 -ftree-vectorize with -O2 on SKX/CLX

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99881

--- Comment #5 from Richard Biener  ---
(In reply to Hongtao.liu from comment #4)
> (In reply to Richard Biener from comment #3)
> > But 2 element construction _should_ be cheap.  What is missing is the move
> > cost from GPR to XMM regs (but we do not have a good idea whether the sources
> > are memory, so it's not as clear-cut here either).
> > 
> > IMHO a better approach might be to up unaligned vector store/load costs?
> > 
> > For the testcase at hand why does a throughput of 1 pose a problem?  There's
> > only one punpckldq instruction around?
> > 
> 
> There're several lea/add(which also may use port 5) instructions around
> punckldq, considering that FAST LEA and Int ALU will be common in address
> computation, throughput of 1 for punckldq will be a bottleneck.
> 
> refer to https://godbolt.org/z/hK9r5vTzd for original case

Too bad.  But this is starting to model resource constraints which are not
at all handled by the generic part of the vectorizer cost model.  We kind-of
have the ability to do this in the target (see how rs6000 models some of this
in its finish_cost hook via rs6000_density_test).  But then the cost model
suffers from quite some GIGO already and I fear adding complexity will only
produce more 'G'.

As you have seen you need quite some offset to make up for the saved store,
I think trying to get integer_to_sse costed for the movd/pinsrq would be a
better way than parametrizing 'vec_construct' (because there's no vec_construct
instruction - there's multiple pieces to it).

> > Note that for the case of non-loop vectorization of 'double' the two element
> > vector CTORs are common and important to handle cheaply.  See also all the
> > discussion in PR98856

[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2021-04-06
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
 Target||x86_64-*-*
  Component|target  |rtl-optimization

--- Comment #1 from Richard Biener  ---
Confirmed.  At -O1

Trying 10 -> 12:
   10: {r91:SF=abs(r92:SF);use [`*.LC0'];clobber flags:CC;}
  REG_UNUSED flags:CC
  REG_DEAD r92:SF
   12: {r94:SF=-r91:SF;use r95:V4SF;clobber flags:CC;}
  REG_DEAD r95:V4SF
  REG_DEAD r91:SF
  REG_UNUSED flags:CC
Failed to match this instruction:
(parallel [
(set (reg:SF 94)
(neg:SF (abs:SF (reg:SF 92 [ *n_9(D) ]
(use (reg:V4SF 95))
(clobber (reg:CC 17 flags))
])
Successfully matched this instruction:
(parallel [
(set (reg:SF 94)
(neg:SF (abs:SF (reg:SF 92 [ *n_9(D) ]
(use (reg:V4SF 95))
])
allowing combination of insns 10 and 12
original costs 4 + 4 = 8
replacement cost 8

but with -O2:

Trying 10 -> 12:
   10: {r91:SF=abs(r92:SF);use r93:V4SF;clobber flags:CC;}
  REG_DEAD r92:SF
  REG_UNUSED flags:CC
   12: {r94:SF=-r91:SF;use r95:V4SF;clobber flags:CC;}
  REG_DEAD r91:SF
  REG_UNUSED flags:CC
Can't combine i2 into i3

we're later trying

Trying 10, 12 -> 13:
   10: {r91:SF=abs(r92:SF);use r93:V4SF;clobber flags:CC;}
  REG_DEAD r92:SF
  REG_UNUSED flags:CC
   12: {r94:SF=-r91:SF;use r95:V4SF;clobber flags:CC;}
  REG_DEAD r91:SF
  REG_UNUSED flags:CC
   13: flags:CCFP=cmp(r90:SF,r94:SF)
  REG_DEAD r94:SF
Failed to match this instruction:
(set (reg:CCFP 17 flags)
(compare:CCFP (neg:SF (abs:SF (reg:SF 92 [ *n_9(D) ])))
(reg/v:SF 90 [ m ])))
Failed to match this instruction:
(set (reg:SF 94)
(abs:SF (reg:SF 92 [ *n_9(D) ])))

[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

Richard Biener  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #2 from Richard Biener  ---
Seems because of r93 being live:

insn_cost 8 for 9: r93:V4SF=[`*.LC0']
  REG_EQUAL const_vector
insn_cost 4 for 10: {r91:SF=abs(r92:SF);use r93:V4SF;clobber flags:CC;}
  REG_DEAD r92:SF
  REG_UNUSED flags:CC
insn_cost 8 for 11: r95:V4SF=[`*.LC1']
  REG_EQUAL const_vector
insn_cost 4 for 12: {r94:SF=-r91:SF;use r95:V4SF;clobber flags:CC;}
  REG_DEAD r91:SF
  REG_UNUSED flags:CC
insn_cost 4 for 13: flags:CCFP=cmp(r90:SF,r94:SF)
  REG_DEAD r94:SF
insn_cost 12 for 14: pc={(flags:CCFP>0)?L35:pc}
  REG_DEAD flags:CCFP
  REG_BR_PROB 59055804
insn_cost 8 for 16: r97:SF=[r89:DI+0x4]
  REG_DEAD r89:DI
insn_cost 4 for 18: {r96:SF=abs(r97:SF);use r93:V4SF;clobber flags:CC;}
  REG_DEAD r97:SF
  REG_DEAD r93:V4SF
  REG_UNUSED flags:CC
insn_cost 4 for 20: {r99:SF=-r96:SF;use r95:V4SF;clobber flags:CC;}
  REG_DEAD r96:SF
  REG_DEAD r95:V4SF
  REG_UNUSED flags:CC

while at -O1 we have two loads of LC0 and r93 is dead after insn 10.

[Bug other/99933] gcc/brig/brigfrontend/brig-function.cc: 4 * possible performance problem ?

2021-04-06 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99933

David Binderman  changed:

   What|Removed |Added

 CC||pekka.jaaskelainen@parmance
   ||.com

--- Comment #1 from David Binderman  ---
Adding original author for their opinion.

[Bug middle-end/99857] [11 Regression] FAIL: libgomp.c/declare-variant-1.c (test for excess errors) by r11-7926

2021-04-06 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99857

Thomas Schwinge  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |hubicka at gcc dot gnu.org
   Last reconfirmed|2021-04-01 00:00:00 |2021-4-6
 CC||tschwinge at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #3 from Thomas Schwinge  ---
Honza stated that he's "looking into it",
.

---

With offloading enabled, there are more similar failures:

[-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/task-detach-6.c (test
for excess errors)

[-PASS:-]{+FAIL:+} libgomp.c/pr99555-1.c (test for excess errors)

[-PASS:-]{+FAIL:+} libgomp.c/target-42.c (test for excess errors)

[-PASS:-]{+FAIL:+} libgomp.c++/../libgomp.c-c++-common/task-detach-6.c
(test for excess errors)

[-PASS:-]{+FAIL:+} libgomp.fortran/task-detach-6.f90   -O0  (test for
excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/task-detach-6.f90   -O1  (test for
excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/task-detach-6.f90   -O2  (test for
excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/task-detach-6.f90   -O3
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
(test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/task-detach-6.f90   -O3 -g  (test for
excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/task-detach-6.f90   -Os  (test for
excess errors)

..., and:

during IPA pass: modref
[...]/libgomp/testsuite/libgomp.c/target-32.c:55:1: internal compiler
error: in omp_lto_output_declare_variant_alt, at omp-general.c:2368

... seen for:

[-PASS:-]{+FAIL: libgomp.c/target-32.c (internal compiler error)+}
{+FAIL:+} libgomp.c/target-32.c (test for excess errors)

[-PASS:-]{+FAIL: libgomp.c/thread-limit-2.c (internal compiler error)+}
{+FAIL:+} libgomp.c/thread-limit-2.c (test for excess errors)

[Bug tree-optimization/99880] [10/11 Regression] ICE in maybe_set_vectorized_backedge_value, at tree-vect-loop.c:9161 since r10-3711-g69f8c1aef5cdcc54

2021-04-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99880

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:e5c170e080399fb3d24a38bbfcd66bd4675abe53

commit r11-8005-ge5c170e080399fb3d24a38bbfcd66bd4675abe53
Author: Richard Biener 
Date:   Tue Apr 6 13:20:44 2021 +0200

tree-optimization/99880 - avoid vectorizing irrelevant PHI backedge defs

This adds a relevancy check before trying to set the vector def of
a backedge in an unvectorized PHI.

2021-04-06  Richard Biener  

PR tree-optimization/99880
* tree-vect-loop.c (maybe_set_vectorized_backedge_value): Only
set vectorized defs of relevant PHIs.

* gcc.dg/torture/pr99880.c: New testcase.

[Bug tree-optimization/99880] [10 Regression] ICE in maybe_set_vectorized_backedge_value, at tree-vect-loop.c:9161 since r10-3711-g69f8c1aef5cdcc54

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99880

Richard Biener  changed:

   What|Removed |Added

Summary|[10/11 Regression] ICE in   |[10 Regression] ICE in
   |maybe_set_vectorized_backed |maybe_set_vectorized_backed
   |ge_value, at|ge_value, at
   |tree-vect-loop.c:9161 since |tree-vect-loop.c:9161 since
   |r10-3711-g69f8c1aef5cdcc54  |r10-3711-g69f8c1aef5cdcc54
  Known to fail|11.0|
  Known to work||11.0

--- Comment #5 from Richard Biener  ---
Fixed on trunk so far.

[Bug c++/99934] bad_array_new_length thrown when non-throwing allocation function would have been used

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99934

--- Comment #1 from Jakub Jelinek  ---
For the non-global replaceable operator new call, we put the outer_nelts_check
already into the size argument before we actually look up the call:
  tree errval = TYPE_MAX_VALUE (sizetype);
  if (cxx_dialect >= cxx11 && flag_exceptions)
    errval = throw_bad_array_new_length ();
  if (outer_nelts_check != NULL_TREE)
    size = fold_build3 (COND_EXPR, sizetype, outer_nelts_check,
                        size, errval);
...
  alloc_call
    = build_new_method_call (dummy, fns, &align_args,
                             /*conversion_path=*/NULL_TREE,
                             LOOKUP_NORMAL, &alloc_fn, tf_none);
...
    alloc_call = build_new_method_call (dummy, fns, placement,
                                        /*conversion_path=*/NULL_TREE,
                                        LOOKUP_NORMAL,
                                        &alloc_fn, complain);
So I guess we should either, after alloc_call is built, check whether it is
noexcept/throw() and if so, wrap it into another COND_EXPR whose operands are
unshare_expr of outer_nelts_check, alloc_call and
build_zero_cst (TREE_TYPE (alloc_call)), or do that and, at that point, also try
to simplify the size argument of the call.
But I think any kind of CSE in GIMPLE or RTL optimizations should already
optimize that, so the FE doesn't need to duplicate such an optimization.

Plus verify what happens with the global replaceable operators.

And another thing is constant expression evaluation,
http://eel.is/c++draft/expr.new#9.5 says we should reject it during constant
expression evaluation, but if we represent it as
COND_EXPR something, operator new (...), nullptr
then I'm afraid constant expression evaluation would accept it and evaluate to
nullptr.
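The first option above can be modelled at the source level. This is a hypothetical sketch, not front-end code: `lowered_nothrow_array_new` is an invented name, and `outer_nelts_check` is approximated by a plain overflow test on `n * sizeof(int)`, which may differ from the real check. It shows the intended semantics: when the selected allocation function is non-throwing, a failed size check yields a null pointer instead of reaching `throw_bad_array_new_length ()`.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical source-level model of what could be emitted for
//   new (std::nothrow) int[n]
// when the selected allocation function is non-throwing.
int *lowered_nothrow_array_new(std::size_t n) {
  // Stand-in for outer_nelts_check: does n * sizeof(int) fit in size_t?
  const bool outer_nelts_check = n <= SIZE_MAX / sizeof(int);
  // Outer COND_EXPR from the proposal: a failed check yields a null
  // pointer instead of throwing std::bad_array_new_length.
  if (!outer_nelts_check)
    return nullptr;
  // The check is already known to hold here, so the size argument reduces
  // to the plain product (the simplification CSE is expected to perform).
  void *p = ::operator new[](n * sizeof(int), std::nothrow);
  return static_cast<int *>(p);
}
```

For a small `n` this returns the allocated block (or null if the heap is exhausted); for an `n` whose byte size overflows `size_t` it returns null without throwing.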

[Bug bootstrap/99920] [10 regression] ICE building gcc 10 on power 7 BE

2021-04-06 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99920

seurer at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #3 from seurer at gcc dot gnu.org ---
I changed to use gcc 8.4 as my build compiler and the ICEs went away.  I assume
it was an issue with the older build compiler I had been using.

[Bug demangler/99935] New: Stack exhaustion demangling rust mangled name

2021-04-06 Thread nickc at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99935

Bug ID: 99935
   Summary: Stack exhaustion demangling rust mangled name
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: demangler
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nickc at gcc dot gnu.org
  Target Milestone: ---

The rust demangler can be pushed into an infinite loop, triggering stack
exhaustion:

  %  cat pr27963
# Reproduced from binutils PR 27963.
# Note - the expected output is wrong.  It is just there as a placeholder.
--format=rust
_RIMBALO_suB_I__Z5printi
fred

  % valgrind ./testsuite/test-demangle < pr27963
[...]
==429737== Stack overflow in thread #1: can't grow stack to 0x1ffe001000
[...]
==429737== Stack overflow in thread #1: can't grow stack to 0x1ffe001000
==429737==at 0x410BA7: demangle_path (rust-demangle.c:742)
[...]
Segmentation fault (core dumped)
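One common way to turn unbounded recursion on hostile input into a clean failure is to bound the recursion depth. The sketch below is a toy model under that assumption, not libiberty's rust-demangle.c: `toy_demangle_path` and `kToyMaxRecursion` are hypothetical names, and the "back reference" syntax is invented for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical limit; the name and value are illustrative only.
constexpr int kToyMaxRecursion = 1024;

// Toy model: at `pos`, the two characters 'B' <digit d> form a back
// reference meaning "print the path found at offset d"; any other character
// is printed as-is. A back reference that points at itself would recurse
// forever, so the depth is bounded and the parse fails instead of
// exhausting the stack.
bool toy_demangle_path(const std::string &in, std::size_t pos, int depth,
                       std::string &out) {
  if (depth > kToyMaxRecursion || pos >= in.size())
    return false;  // too deep, or a back reference past the end
  if (in[pos] == 'B' && pos + 1 < in.size() &&
      std::isdigit(static_cast<unsigned char>(in[pos + 1])))
    return toy_demangle_path(in, static_cast<std::size_t>(in[pos + 1] - '0'),
                             depth + 1, out);
  out.push_back(in[pos]);
  return true;
}
```

With input "xB0", starting at offset 1, the back reference resolves to "x"; with "xB1" the reference points at itself and the call returns false after about 1024 levels.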

[Bug bootstrap/99920] [10 regression] ICE building gcc 10 on power 7 BE

2021-04-06 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99920

--- Comment #4 from rguenther at suse dot de  ---
On Tue, 6 Apr 2021, seurer at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99920
> 
> seurer at gcc dot gnu.org changed:
> 
>What|Removed |Added
> 
>  Status|UNCONFIRMED |RESOLVED
>  Resolution|--- |INVALID
> 
> --- Comment #3 from seurer at gcc dot gnu.org ---
> I changed to use gcc 8.4 as my build compiler and the ICEs went away.  I assume
> it was an issue with the older build compiler I had been using.

Possibly - what version was that?

[Bug c++/99936] New: FAIL: g++.dg/modules/xtreme-header* on Darwin

2021-04-06 Thread dominiq at lps dot ens.fr via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99936

Bug ID: 99936
   Summary: FAIL: g++.dg/modules/xtreme-header* on Darwin
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dominiq at lps dot ens.fr
CC: iains at gcc dot gnu.org
  Target Milestone: ---

The following failures appeared between r11-7844 and r11-7872

FAIL: g++.dg/modules/xtreme-header-5_a.H -std=c++17 (internal compiler error)
FAIL: g++.dg/modules/xtreme-header-5_a.H -std=c++17 (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-5_a.H -std=c++2a (internal compiler error)
FAIL: g++.dg/modules/xtreme-header-5_a.H -std=c++2a (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-5_a.H -std=c++2b (internal compiler error)
FAIL: g++.dg/modules/xtreme-header-5_a.H -std=c++2b (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-5_a.H module-cmi (gcm.cache/\$srcdir/g++.dg/modules/xtreme-header-5_a.H.gcm)
FAIL: g++.dg/modules/xtreme-header-5_a.H module-cmi (gcm.cache/\$srcdir/g++.dg/modules/xtreme-header-5_a.H.gcm)
FAIL: g++.dg/modules/xtreme-header-5_a.H module-cmi (gcm.cache/\$srcdir/g++.dg/modules/xtreme-header-5_a.H.gcm)
FAIL: g++.dg/modules/xtreme-header-5_b.C -std=c++17 (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-5_b.C -std=c++2a (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-5_b.C -std=c++2b (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-5_c.C -std=c++17 (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-5_c.C -std=c++2a (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-5_c.C -std=c++2b (test for excess errors)
FAIL: g++.dg/modules/xtreme-header_a.H -std=c++17 (internal compiler error)
FAIL: g++.dg/modules/xtreme-header_a.H -std=c++17 (test for excess errors)
FAIL: g++.dg/modules/xtreme-header_a.H -std=c++2a (internal compiler error)
FAIL: g++.dg/modules/xtreme-header_a.H -std=c++2a (test for excess errors)
FAIL: g++.dg/modules/xtreme-header_a.H -std=c++2b (internal compiler error)
FAIL: g++.dg/modules/xtreme-header_a.H -std=c++2b (test for excess errors)
FAIL: g++.dg/modules/xtreme-header_a.H module-cmi (gcm.cache/\$srcdir/g++.dg/modules/xtreme-header_a.H.gcm)
FAIL: g++.dg/modules/xtreme-header_a.H module-cmi (gcm.cache/\$srcdir/g++.dg/modules/xtreme-header_a.H.gcm)
FAIL: g++.dg/modules/xtreme-header_a.H module-cmi (gcm.cache/\$srcdir/g++.dg/modules/xtreme-header_a.H.gcm)
FAIL: g++.dg/modules/xtreme-header_b.C -std=c++17 (test for excess errors)
FAIL: g++.dg/modules/xtreme-header_b.C -std=c++2a (test for excess errors)
FAIL: g++.dg/modules/xtreme-header_b.C -std=c++2b (test for excess errors)

The ICEs are

expected none of template_decl, have template_decl in add_specializations, at
cp/module.cc:12952

[Bug target/99781] [11 Regression] ICE in partial_subreg_p, at rtl.h:3144

2021-04-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99781

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Vladimir Makarov :

https://gcc.gnu.org/g:4bbd51afaa4a3c116fb538d912b35e126be80b41

commit r11-8008-g4bbd51afaa4a3c116fb538d912b35e126be80b41
Author: Vladimir N. Makarov 
Date:   Wed Mar 31 13:26:30 2021 -0400

[PR99781] Update correctly reg notes in LRA for multi-registers and set up
biggest mode safely

The PR is about incorrect use of partial_subreg_p for unordered modes.
I found two places in LRA that compare unordered modes unsafely.  The
patch removes the dangerous uses of paradoxical_subreg_p and
partial_subreg_p in split_reg and process_bb_lives.  Both places
used them to solve PR77761 a long time ago.  But that problem was also
fixed by later patches (if the hard reg is not used explicitly, it
has VOIDmode and we use its natural mode to split the hard reg live range;
otherwise we use the biggest explicitly used mode for hard reg
splitting).  The PR also mentions inaccurate updating of reg notes in
LRA.  This happens for reg notes that refer to multi-registers.  The
patch fixes this issue too.

gcc/ChangeLog:

PR target/99781
* lra-constraints.c (split_reg): Don't check paradoxical_subreg_p.
* lra-lives.c (clear_sparseset_regnos, regnos_in_sparseset_p): New
functions.
(process_bb_lives): Don't update biggest mode of hard reg for
implicit in multi-register group.  Use the new functions for
updating dead_set and unused_set by register notes.

gcc/testsuite/ChangeLog:

PR target/99781
* g++.target/aarch64/sve/pr99781.C: New.

[Bug bootstrap/99920] [10 regression] ICE building gcc 10 on power 7 BE

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99920

--- Comment #5 from Jakub Jelinek  ---
GCC 10 doesn't require C++11, so in theory every GCC from 4.1 through 11
should be supported as the system compiler; perhaps we need a workaround for
some known bug somewhere (we already have a couple of them).

[Bug middle-end/99857] [11 Regression] FAIL: libgomp.c/declare-variant-1.c (test for excess errors) by r11-7926

2021-04-06 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99857

--- Comment #4 from Jan Hubicka  ---
> Honza stated that he's "looking into it",
> .
I am; I just got distracted by Easter.  The problem has to be release_body
happening in the middle of the offloading streaming process, which is a bit
odd by itself.  Having a smaller testcase is nice - the libgomp one had quite
a few release_body calls at compile time.

Honza

[Bug c++/96780] debuginfo for std::move and std::forward isn't useful

2021-04-06 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96780

--- Comment #3 from Jonathan Wakely  ---
I think that would be great.

[Bug other/99933] gcc/brig/brigfrontend/brig-function.cc: 4 * possible performance problem ?

2021-04-06 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99933

Martin Liška  changed:

   What|Removed |Added

   Last reconfirmed||2021-04-06
   Assignee|unassigned at gcc dot gnu.org  |marxin at gcc dot gnu.org
 Ever confirmed|0   |1
 CC||marxin at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED

--- Comment #2 from Martin Liška  ---
The BRIG FE will be removed next month, so this does not need to be fixed.

[Bug tree-optimization/99927] [11 Regression] Maybe wrong code since r11-39-gf9e1ea10e657af9f

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #2 from Jakub Jelinek  ---
Interestingly, it only reproduces on AMD CPUs and not Intel.
The bug is in:
xorl    %edx, %edx
divl    %edi
movl    $1, %eax
cmove   %edx, %eax
divl leaves ZF undefined as documented (and as seen in RTL), but we use that in
the cmove instruction.

[Bug tree-optimization/99927] [11 Regression] Maybe wrong code since r11-39-gf9e1ea10e657af9f

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

--- Comment #3 from Jakub Jelinek  ---
Before combine it looks fine:
(insn 23 22 105 6 (parallel [
(set (reg:SI 108)
(udiv:SI (reg:SI 104)
(reg/v:SI 102 [ var_6 ])))
(set (reg:SI 107)
(umod:SI (reg:SI 104)
(reg/v:SI 102 [ var_6 ])))
(clobber (reg:CC 17 flags))
]) "pr99927.c":13:24 449 {*udivmodsi4}
 (expr_list:REG_DEAD (reg:SI 104)
(expr_list:REG_UNUSED (reg:SI 108)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))))
(insn 105 23 106 6 (set (reg:QI 135)
(const_int 1 [0x1])) "pr99927.c":13:24 77 {*movqi_internal}
 (nil))
(insn 106 105 107 6 (parallel [
(set (reg:QI 134)
(and:QI (subreg:QI (reg:SI 107) 0)
(const_int 1 [0x1])))
(clobber (reg:CC 17 flags))
]) "pr99927.c":13:24 491 {*andqi_1}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(insn 107 106 108 6 (set (reg:CCZ 17 flags)
(compare:CCZ (reg:SI 107)
(const_int 0 [0]))) "pr99927.c":13:24 7 {*cmpsi_ccno_1}
 (expr_list:REG_DEAD (reg:SI 107)
(nil)))
(insn 108 107 111 6 (set (reg:QI 96 [ var_lsm_flag.12 ])
(if_then_else:QI (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(reg:QI 134)
(reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc}
 (expr_list:REG_DEAD (reg:QI 134)
(nil)))
but in combine dump there is:
(insn 23 22 105 6 (parallel [
(set (reg:SI 108)
(udiv:SI (reg:SI 104)
(reg/v:SI 102 [ var_6 ])))
(set (reg:SI 107)
(umod:SI (reg:SI 104)
(reg/v:SI 102 [ var_6 ])))
(clobber (reg:CC 17 flags))
]) "pr99927.c":13:24 449 {*udivmodsi4}
 (expr_list:REG_DEAD (reg:SI 104)
(expr_list:REG_UNUSED (reg:SI 108)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))))
(note 105 23 106 6 NOTE_INSN_DELETED)
(note 106 105 107 6 NOTE_INSN_DELETED)
(insn 107 106 108 6 (set (reg:QI 135)
(const_int 1 [0x1])) "pr99927.c":13:24 77 {*movqi_internal}
 (nil))
(note 108 107 111 6 NOTE_INSN_DELETED)
(insn 111 108 85 6 (set (reg:QI 96 [ var_lsm_flag.12 ])
(if_then_else:QI (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(subreg:QI (reg:SI 107) 0)
(reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc}
 (expr_list:REG_DEAD (reg:SI 107)
(expr_list:REG_DEAD (reg:QI 135)
(nil))))

[Bug c/99937] New: Optimization needed for ARM with single cycle multiplier

2021-04-06 Thread mike.robins at talktalk dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937

Bug ID: 99937
   Summary: Optimization needed for ARM with single cycle
multiplier
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mike.robins at talktalk dot net
  Target Milestone: ---

Created attachment 50512
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50512&action=edit
Source file(s)

I am cross-compiling using "arm-none-eabi-gcc -mcpu=cortex-m0plus -Wall -Wextra
-fno-strict-aliasing -fwrapv -O3 -S foobar.c" for a target architecture that
performs a multiply in a single cycle, using gcc version 10.2.0 on a PC running
Fedora Linux.

Is there an option to persuade the compiler to use the multiply instruction
automatically instead of shifts and adds when multiplying by a constant?

In the attached example code, gcc uses the trick of multiplying by a large
constant instead of dividing by a small one (12 in this case). For my target,
the code from "-O3" is both longer and slower than that from "-Os".

[Bug libstdc++/99871] #includes inside push visibility scope

2021-04-06 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99871

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|--- |12.0

--- Comment #2 from Jonathan Wakely  ---
We use the #pragma before using #include in most of the libsupc++ headers, and
also 

[Bug driver/99896] g++ drops -lc

2021-04-06 Thread matz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99896

Michael Matz  changed:

   What|Removed |Added

 CC||matz at gcc dot gnu.org

--- Comment #7 from Michael Matz  ---
(In reply to Jonathan Wakely from comment #1)
> (In reply to Tom de Vries from comment #0)
> > With g++, we have instead:
> > ...
> > collect2 ... main.o foo.o -lpcre2-posix ...
> > ...
> 
> It isn't dropped, it's moved to the end:
> 
> main.o foo.o -lpcre2-posix -lstdc++ -lm -lc -lgcc_s -lgcc -lc -lgcc_s -lgcc
> 
> If you need it before foo.o then -Wl,-lc seems like the right workaround for
> me.

Workaround is the correct term here.  The correct thing would be for g++ to not
reorder -l arguments.  The similarity to -I is superficial: duplicated -l
arguments have meaning (with static archives for instance) and their position
in relation to object and source files matters.  g++ can validly tack on
additional -l arguments to the end, and arguably also replace a lone -lc
argument that was originally at the end of the command line or implicit (e.g.
to inject its unwinder), but it shouldn't otherwise reorder such arguments.

I will of course agree that the issue that the added -lc "solves" is actually
a bug in the testcase (and gdb).  But that should be immaterial here.  At the
very least gcc and g++ should behave the same in this respect.

[Bug tree-optimization/99927] [11 Regression] Maybe wrong code since r11-39-gf9e1ea10e657af9f

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

--- Comment #4 from Richard Biener  ---
Trying 105, 107 -> 108:
  105: r135:QI=0x1
  107: flags:CCZ=cmp(r107:SI,0)
  108: r96:QI={(flags:CCZ==0)?r107:SI#0:r135:QI}
  REG_DEAD r107:SI
  REG_DEAD flags:CC
Failed to match this instruction:
(parallel [
(set (reg:QI 96 [ var_lsm_flag.12 ])
(subreg:QI (reg:SI 107) 0))
(set (reg:QI 135)
(const_int 1 [0x1]))
])
Failed to match this instruction:
(parallel [
(set (reg:QI 96 [ var_lsm_flag.12 ])
(subreg:QI (reg:SI 107) 0))
(set (reg:QI 135)
(const_int 1 [0x1]))
])
Successfully matched this instruction:
(set (reg:QI 135)
(const_int 1 [0x1]))
Successfully matched this instruction:
(set (reg:QI 96 [ var_lsm_flag.12 ])
(subreg:QI (reg:SI 107) 0))
allowing combination of insns 105, 107 and 108
original costs 4 + 4 + 8 = 16
replacement costs 4 + 4 = 8
deferring deletion of insn with uid = 105.
modifying insn i2   107: r135:QI=0x1
deferring rescan insn with uid = 107.
modifying insn i3   108: r96:QI=r107:SI#0
  REG_DEAD r107:SI
deferring rescan insn with uid = 108.

note that insn 107 was the CC setter for the if-then-else but we now have
a plain move there.

[Bug target/99937] Optimization needed for ARM with single cycle multiplier

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937

Richard Biener  changed:

   What|Removed |Added

  Component|c   |target
   Keywords||missed-optimization
 Target||arm

--- Comment #1 from Richard Biener  ---
You need to adjust RTX costing accordingly which likely means adding a new
subtarget tuning.

[Bug target/99937] Optimization needed for ARM with single cycle multiplier

2021-04-06 Thread mike.robins at talktalk dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937

--- Comment #2 from mike.robins at talktalk dot net ---
(In reply to Richard Biener from comment #1)
> You need to adjust RTX costing accordingly which likely means adding a new
> subtarget tuning.

Hi Richard
Are you saying that this would have to be added at the GCC source level
somehow, i.e. that there is no existing -mtune... or -f... option to achieve this?
Mike

[Bug target/99912] Unnecessary / inefficient spilling of AVX2 ymm registers

2021-04-06 Thread schnetter at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99912

--- Comment #4 from Erik Schnetter  ---
I build with the compiler options

/Users/eschnett/src/CarpetX/Cactus/view-compilers/bin/g++  -fopenmp -Wall -pipe
-g -march=skylake -std=gnu++17 -O3 -fcx-limited-range -fexcess-precision=fast
-fno-math-errno -fno-rounding-math -fno-signaling-nans
-funsafe-math-optimizations   -c -o configs/sim/build/Z4c/rhs.cxx.o
configs/sim/build/Z4c/rhs.cxx.ii

One of the kernels in question (the one I describe above) is the C++ lambda in
lines 281013 to 281119. The call to the "noinline" function ensures that the
kernel (and surrounding for loops) is compiled as a separate function, which
produces more efficient code. The function "grid.loop_int_device" contains
essentially three nested for loops, and the actual kernel is the C++ lambda in
lines 281015 to 281118.

I'll have a look at -fdump-tree-optimized.

[Bug tree-optimization/99927] [11 Regression] Maybe wrong code since r11-39-gf9e1ea10e657af9f

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

--- Comment #5 from Jakub Jelinek  ---
It indeed goes wrong in the 105, 107 -> 108 try_combine, but at the start of
that we have:
(insn 105 23 106 6 (set (reg:QI 135)
(const_int 1 [0x1])) "pr99927.c":13:24 77 {*movqi_internal}
 (nil))
(note 106 105 107 6 NOTE_INSN_DELETED)
(insn 107 106 108 6 (set (reg:CCZ 17 flags)
(compare:CCZ (reg:SI 107)
(const_int 0 [0]))) "pr99927.c":13:24 7 {*cmpsi_ccno_1}
 (nil))
(insn 108 107 111 6 (set (reg:QI 96 [ var_lsm_flag.12 ])
(if_then_else:QI (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(subreg:QI (reg:SI 107) 0)
(reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc}
 (expr_list:REG_DEAD (reg:SI 107)
(expr_list:REG_DEAD (reg:CC 17 flags)
(nil))))
(insn 111 108 85 6 (set (reg:QI 96 [ var_lsm_flag.12 ])
(if_then_else:QI (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(reg:QI 96 [ var_lsm_flag.12 ])
(reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc}
 (expr_list:REG_DEAD (reg:QI 135)
(nil)))
(jump_insn 85 111 35 6 (set (pc)
(if_then_else (ne (reg:CCZ 17 flags)
(const_int 0 [0]))
(label_ref 45)
(pc))) 806 {*jcc}
 (expr_list:REG_DEAD (reg:CCZ 17 flags)
(int_list:REG_BR_PROB 536870916 (nil)))
 -> 45)

The substitutions of 105 and 107 into 108 properly simplify 108 into
(set (reg:QI 96 [ var_lsm_flag.12 ])
(subreg:QI (reg:SI 107) 0))
because it is:
(set (reg:QI 96 [ var_lsm_flag.12 ])
(if_then_else:QI (ne (reg:SI 107)
(const_int 0 [0]))
(const_int 1 [0x1])
(const_int 0 [0])))
But what is wrong is that try_combine has been called at all, because
(reg:CCZ 17 flags) is used in 3 instructions rather than just one.
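The equivalence used in that simplification step can be checked at the source level. In this sketch the function names are hypothetical and the registers are modelled as unsigned integers. It only shows that the substituted expression and the (ne r107 0) form agree for every value, which is consistent with the substitution itself not being where the wrong code comes from.

```cpp
#include <cstdint>

// Insn 108 after substituting insn 105 (r135 = 1) and insn 107
// (flags = compare (r107, 0)):
//   r96 = (r107 == 0) ? (QImode lowpart of r107) : r135
std::uint8_t insn108_substituted(std::uint32_t r107) {
  const std::uint8_t r135 = 1;
  return r107 == 0 ? static_cast<std::uint8_t>(r107) : r135;
}

// The form quoted above:
//   (if_then_else (ne r107 0) (const_int 1) (const_int 0))
std::uint8_t insn108_simplified(std::uint32_t r107) {
  return r107 != 0 ? 1 : 0;
}
```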

[Bug tree-optimization/99927] [11 Regression] Maybe wrong code since r11-39-gf9e1ea10e657af9f

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

Jakub Jelinek  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
So, we have at the start of first try_combine called on bb 6:
...
(insn 105 23 106 6 (set (reg:QI 135)
(const_int 1 [0x1])) "pr99927.c":13:24 77 {*movqi_internal}
 (nil))
(insn 106 105 107 6 (parallel [
(set (reg:QI 134)
(and:QI (subreg:QI (reg:SI 107) 0)
(const_int 1 [0x1])))
(clobber (reg:CC 17 flags))
]) "pr99927.c":13:24 491 {*andqi_1}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(insn 107 106 108 6 (set (reg:CCZ 17 flags)
(compare:CCZ (reg:SI 107)
(const_int 0 [0]))) "pr99927.c":13:24 7 {*cmpsi_ccno_1}
 (expr_list:REG_DEAD (reg:SI 107)
(nil)))
(insn 108 107 111 6 (set (reg:QI 96 [ var_lsm_flag.12 ])
(if_then_else:QI (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(reg:QI 134)
(reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc}
 (expr_list:REG_DEAD (reg:QI 134)
(nil)))
(insn 111 108 85 6 (set (reg:QI 96 [ var_lsm_flag.12 ])
(if_then_else:QI (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(reg:QI 96 [ var_lsm_flag.12 ])
(reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc}
 (expr_list:REG_DEAD (reg:QI 135)
(nil)))
(jump_insn 85 111 35 6 (set (pc)
(if_then_else (ne (reg:CCZ 17 flags)
(const_int 0 [0]))
(label_ref 45)
(pc))) 806 {*jcc}
 (expr_list:REG_DEAD (reg:CCZ 17 flags)
(int_list:REG_BR_PROB 536870916 (nil)))
 -> 45)

where LOG_LINKS of 108 are i105/r135, i106/r134 and i107/r17,
of 111 are i108/r96 and 85 has NULL LOG_LINKS.
But, r17 is used in all of i108, i111 and i85, so isn't single use, so isn't it
incorrect that it has the i107/r17 link?

[Bug c++/96873] Internal compiler error in alias_ctad_tweaks

2021-04-06 Thread mateusz.pusz at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96873

--- Comment #3 from Mateusz Pusz  ---
Is there any chance of this being fixed for gcc-11 or gcc-10.3? This feature
is essential for the Physical Units library for C++Next.

[Bug c++/96873] Internal compiler error in alias_ctad_tweaks

2021-04-06 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96873

--- Comment #4 from Marek Polacek  ---
Yeah, hopefully for both.

[Bug c++/96873] Internal compiler error in alias_ctad_tweaks

2021-04-06 Thread mateusz.pusz at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96873

--- Comment #5 from Mateusz Pusz  ---
Thanks!

[Bug tree-optimization/99927] [11 Regression] Maybe wrong code since r11-39-gf9e1ea10e657af9f

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

Jakub Jelinek  changed:

   What|Removed |Added

   Assignee|jakub at gcc dot gnu.org   |unassigned at gcc dot gnu.org
 Status|ASSIGNED|NEW

--- Comment #7 from Jakub Jelinek  ---
Ah, create_log_links wants to work like that.
So, the bug seems to be that insn 108 has REG_DEAD (reg:CC 17 flags) note.
It doesn't initially, but it is added during 106 -> 108 combination
(gdb) p debug_rtx (i3)
(insn 108 107 111 6 (set (reg:QI 96 [ var_lsm_flag.12 ])
(if_then_else:QI (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(reg:QI 134)
(reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc}
 (expr_list:REG_DEAD (reg:QI 134)
(nil)))
$151 = void
(gdb) p debug_rtx (i2)
(insn 106 105 107 6 (parallel [
(set (reg:QI 134)
(and:QI (subreg:QI (reg:SI 107) 0)
(const_int 1 [0x1])))
(clobber (reg:CC 17 flags))
]) "pr99927.c":13:24 491 {*andqi_1}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))

The combination of those 2 insns is successful - into:
(insn 108 107 111 6 (set (reg:QI 96 [ var_lsm_flag.12 ])
(if_then_else:QI (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(subreg:QI (reg:SI 107) 0)
(reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc}
 (expr_list:REG_DEAD (reg:SI 107)
(expr_list:REG_DEAD (reg:CC 17 flags)
(nil))))
but the distribute_notes call that turned the REG_UNUSED (reg:CC 17 flags) note
from insn 106 into a REG_DEAD (reg:CC 17 flags) note on insn 108 is what looks
broken to me: the flags register is set by insn 107 in between those two and is
used by some insns after insn 108 (111 and 85), and the new combined pattern
certainly doesn't kill the register in any way.

Segher, could you please have a look?
Thanks.

[Bug c++/99938] New: Non-void function with no return statement: Either no or misleading warning is printed

2021-04-06 Thread rschoe at de dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99938

Bug ID: 99938
   Summary: Non-void function with no return statement: Either no
or misleading warning is printed
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rschoe at de dot ibm.com
  Target Milestone: ---

Created attachment 50513
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50513&action=edit
Code example; when compiled with g++ -O1 -c code.cpp it does not show any
warning. If you exchange NULL with nullptr, the warning shows the wrong line

Hi,
Tested this with g++ (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9) on kernel
5.10.19-200.fc33.x86_64 (uname -r)

The following code

  ```
  #include 

  struct C
  {
  C(int *);
  ~C();
  };

  int foo()
  {
  C c = NULL;
  if(false)
  {
  while(1){}
  }
  }
  ```

compiled with
  `g++ -O1 -c code.cpp`
(compiler arguments are relevant)

does not generate any warning about missing return statement in `foo()`

however when modified slightly (change `NULL` to `nullptr`):
  ```
  #include 

  struct C
  {
  C(int *);
  ~C();
  };

  int foo()
  {
  C c = nullptr;
  if(false)
  {
  while(1){}
  }
  }
  ```

g++ generates the following output (compiler arguments are relevant):
  ```
  g++ -O1 -c code.cpp
  main.cpp: In function ‘int foo()’:
  main.cpp:11:11: warning: control reaches end of non-void function
[-Wreturn-type]
 11 | C c = nullptr;
|   ^~~
  ```

which detects the missing `return` but points to the wrong line. I expected
line 16 (the closing bracket of foo() function scope) to be called out.


Other modifications which lead to the warning being printed with correct line
number (16) are (applying one at a time is sufficient):
  - Compile with `-O0`
  - Comment/remove the `while(1){}`
  - Comment/remove the destructor `~C` declaration

clang prints the warning with the correct line (16) in all cases. I would
expect g++ to behave the same.




Excuse me if I overlooked something or misunderstood c++ or the concept of g++.
If this is intended behavior, I would be happy to learn more about it :)

Also I had some trouble formatting this bug report. Somehow I could not figure
out how to add formatting (e.g. Markdown) or attach multiple files.

[Bug c++/99938] Non-void function with no return statement: Either no or misleading warning is printed

2021-04-06 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99938

Jonathan Wakely  changed:

   What|Removed |Added

   Keywords||diagnostic

--- Comment #1 from Jonathan Wakely  ---
(In reply to rschoe from comment #0)
> Also I had some trouble formatting this bug report. Somehow I could not
> figure out how to add formatting (e.g. Markdown) or attach multiple files.

No formatting is supported. To add multiple files just attach them one at a
time, i.e. attach one when you create the bug, then attach another as a
separate edit to the bug once it exists.

[Bug tree-optimization/99927] [11 Regression] Maybe wrong code since r11-39-gf9e1ea10e657af9f

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

--- Comment #8 from Jakub Jelinek  ---
distribute_notes has:
  /* If this register is set or clobbered in I3, put the note there
     unless there is one already.  */
  if (reg_set_p (XEXP (note, 0), PATTERN (i3)))
    {
      if (from_insn != i3)
        break;

      if (! (REG_P (XEXP (note, 0))
             ? find_regno_note (i3, REG_UNUSED, REGNO (XEXP (note, 0)))
             : find_reg_note (i3, REG_UNUSED, XEXP (note, 0))))
        place = i3;
    }
  /* Otherwise, if this register is used by I3, then this register
     now dies here, so we must put a REG_DEAD note here unless there
     is one already.  */
  else if (reg_referenced_p (XEXP (note, 0), PATTERN (i3))
           && ! (REG_P (XEXP (note, 0))
                 ? find_regno_note (i3, REG_DEAD,
                                    REGNO (XEXP (note, 0)))
                 : find_reg_note (i3, REG_DEAD, XEXP (note, 0))))
    {
      PUT_REG_NOTE_KIND (note, REG_DEAD);
      place = i3;
    }
The if (reg_set_p (...)) condition is false, as flags is not set by the insn,
only used.  But the premise of the else-if branch (that the register now dies
in I3) clearly does not hold, at least when XEXP (note, 0) is set by
instructions between i2 and i3 (or whichever insn the notes come from).
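The missing condition can be sketched with a toy model. Nothing here is GCC's data structure: `ToyInsn` and `may_place_reg_dead` are hypothetical, registers are plain integers (17 stands for the flags register), and a real fix would also have to account for uses after I3 (insns 111 and 85 here). The sketch only encodes the check discussed above: a note taken from an earlier insn must not become a REG_DEAD note on I3 if the register is set again in between.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Toy insn: the registers it sets or clobbers, and the registers it reads.
struct ToyInsn {
  std::set<int> sets;
  std::set<int> uses;
};

// Hypothetical REG_UNUSED -> REG_DEAD decision: allowed only if I3 reads
// `reg` and no insn strictly between `from` and `i3` sets it again.
bool may_place_reg_dead(const std::vector<ToyInsn> &insns, std::size_t from,
                        std::size_t i3, int reg) {
  if (insns[i3].uses.count(reg) == 0)
    return false;  // I3 does not read reg, so nothing dies there
  for (std::size_t i = from + 1; i < i3; ++i)
    if (insns[i].sets.count(reg) != 0)
      return false;  // reg was redefined in between; that value may live on
  return true;
}
```

Modelling insns 106, 107 and 108 from the dump, the REG_UNUSED flags note from insn 106 is rejected because insn 107 sets flags in between.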

[Bug c++/99938] Non-void function with no return statement: Either no or misleading warning is printed

2021-04-06 Thread rschoe at de dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99938

--- Comment #2 from rschoe at de dot ibm.com ---
Created attachment 50514
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50514&action=edit
modified example (nullptr), which now shows a warning but with the wrong line
number 11 when compiled with g++ -O1 -c code2.cpp

[Bug other/99903] 32-bit x86 frontends randomly crash while reporting timing on Windows

2021-04-06 Thread izbyshev at ispras dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99903

--- Comment #3 from Alexey Izbyshev  ---
Crashes eventually occurred with both one- and two-processor affinity masks, so
pinning GCC to a single core doesn't help. But I've tracked the reason down.

When `get_time()` from `gcc/timevar.c` gets inlined into its callers (which
happens with -O2), it "returns" the result in an x87 FPU register. Then
`timevar_accumulate()` computes the difference between this 80-bit number and a
64-bit double stored in the timer structure. So when `clock()` returns 15 at
both start and end measurements, this code basically subtracts 15 * (1.0 /
1000) stored with 64-bit precision from itself computed with 80-bit precision,
and the difference is 8.673617379884035472e-19. When `clock()` returns 15 for
all measurements during a single cc1 run, the total time and each phase time
are equal to this same constant, and the sum of phase times is twice the total
time:

Timing error: total of phase timers exceeds total time.
user    1.734723475976807094e-18 > 8.673617379884035472e-19

Maybe GCC should round such ridiculously small intervals to zero?
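The arithmetic described above can be reproduced without any timing code. In this sketch `spurious_interval` and `long_double_is_wider` are hypothetical names, and the model assumes clock() returned 15 at both measurements with a tick length of 1.0 / 1000 seconds. On a host whose long double is wider than double (the 80-bit x87 format, for instance), the wide product is exact while the stored copy is rounded, and the difference comes out as half an ulp of 0.015, i.e. 2^-60 = 8.673617379884035472e-19, the same constant as in the report.

```cpp
#include <cfloat>

// True when long double keeps more significand bits than double
// (e.g. the x87 format, where LDBL_MANT_DIG is 64).
bool long_double_is_wider() { return LDBL_MANT_DIG > DBL_MANT_DIG; }

// `start` plays the double stored in the timer structure; `end` plays the
// value left in an extended-precision register after inlining.
long double spurious_interval() {
  const double tick = 1.0 / 1000;        // rounded to double precision
  volatile double start = 15 * tick;     // stored: rounded to 53 bits
  const long double end = 15.0L * tick;  // exact product if long double is wider
  return end - start;  // nonzero although clock() never advanced
}
```

If both values were forced through a double-precision store, the difference would vanish.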

[Bug c/99939] New: CMSE: -march=armv8.1-m.main+mve does not support correctly.

2021-04-06 Thread sripar01 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99939

Bug ID: 99939
   Summary: CMSE: -march=armv8.1-m.main+mve does not support
correctly.
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sripar01 at gcc dot gnu.org
  Target Milestone: ---
  Host: x86_64
Target: arm

Created attachment 50515
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50515&action=edit
Pre-processor output file

The current arm-none-eabi-gcc does not handle CMSE correctly when given the
command-line options "-march=armv8.1-m.main+mve -mfloat-abi=hard -mfpu=auto
-mcmse".

When compiling the attached testcase test.c with the command line mentioned
above, I see the following error message:

$ arm-none-eabi-gcc  -march=armv8.1-m.main+mve -mcmse -mfpu=auto
-mfloat-abi=hard -Wl,-section-start=.gnu.sgstubs=0x19 --specs=rdimon.specs
test.i
/media/sripar01/2tb_work/embedded_toolchain/gcc-arm-none-eabi-10-2020-q4-major/bin/../lib/gcc/arm-none-eabi/10.2.1/../../../../arm-none-eabi/bin/ld:
/tmp/cc8Jg0A2.o: in function `secure_fun':
test.i:(.text+0x16): undefined reference to `cmse_check_address_range'
collect2: error: ld returned 1 exit status

The error reports an undefined reference to `cmse_check_address_range', but
this function is defined in gcc/libgcc/config/arm/cmse.c and is available in
multilib builds that use the `-mcmse` option.

$ arm-none-eabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/media/sripar01/2tb_work/embedded_toolchain/gcc-arm-none-eabi-10-2020-q4-major/bin/../lib/gcc/arm-none-eabi/10.2.1/lto-wrapper
Target: arm-none-eabi
Configured with:
/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/src/gcc/configure
--target=arm-none-eabi
--prefix=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/install-native
--libexecdir=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/install-native/lib
--infodir=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/install-native/share/doc/gcc-arm-none-eabi/info
--mandir=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/install-native/share/doc/gcc-arm-none-eabi/man
--htmldir=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/install-native/share/doc/gcc-arm-none-eabi/html
--pdfdir=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/install-native/share/doc/gcc-arm-none-eabi/pdf
--enable-languages=c,c++ --enable-plugins --disable-decimal-float
--disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath
--disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared
--disable-threads --disable-tls --with-gnu-as --with-gnu-ld --with-newlib
--with-headers=yes --with-python-dir=share/gcc-arm-none-eabi
--with-sysroot=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/install-native/arm-none-eabi
--build=x86_64-linux-gnu --host=x86_64-linux-gnu
--with-gmp=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/build-native/host-libs/usr
--with-mpfr=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/build-native/host-libs/usr
--with-mpc=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/build-native/host-libs/usr
--with-isl=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/build-native/host-libs/usr
--with-libelf=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-48_20201124_1606180641/build-native/host-libs/usr
--with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm'
--with-pkgversion='GNU Arm Embedded Toolchain 10-2020-q4-major'
--with-multilib-list=rmprofile,aprofile
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.2.1 20201103 (release) (GNU Arm Embedded Toolchain
10-2020-q4-major)

[Bug bootstrap/99920] [10 regression] ICE building gcc 10 on power 7 BE

2021-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99920

Jakub Jelinek  changed:

   What|Removed |Added

   Last reconfirmed||2021-04-06
 Resolution|INVALID |---
 Status|RESOLVED|REOPENED
 Ever confirmed|0   |1

--- Comment #6 from Jakub Jelinek  ---
If the problematic older system gcc was 4.1+, can you please bisect among *.o
files compiled by the system gcc?
If the ICE is with the stage1 gcc when building stage1 libgcc, then
the suspected TUs could be gimple-ssa-store-merging.o or fold-const.o; it would
be nice to start with those.
I.e., build again with your older system gcc, verify that it fails, make a copy
of those two *.o files, rebuild them with gcc 8.x, run make cc1 cc1plus and
verify that it doesn't ICE anymore.
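The bisection Jakub suggests can be sketched as a binary search over the object files. This is a hypothetical C illustration, not a GCC script: it assumes exactly one miscompiled object and a predicate, here called `fails`, that rebuilds cc1 using objects [lo, hi) from the system-gcc build and all others from the gcc 8.x build, then reports whether the ICE reappears. The `culprit_in_range` predicate is only a stand-in that simulates that check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero if cc1 built with objects [lo, hi) from
   the old system-gcc build (and all other objects from the gcc 8.x build)
   still ICEs.  In practice each call is a rebuild plus a test run.  */
typedef int (*ice_check_fn) (size_t lo, size_t hi, void *ctx);

/* Binary search for the miscompiled object among N objects, assuming
   exactly one culprit and that using all N system-gcc objects fails.  */
static size_t bisect_objects (size_t n, ice_check_fn fails, void *ctx)
{
  assert (n > 0);
  size_t lo = 0, hi = n;          /* invariant: culprit is in [lo, hi) */
  while (hi - lo > 1)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (fails (lo, mid, ctx))
        hi = mid;                 /* culprit is in the first half */
      else
        lo = mid;                 /* otherwise it must be in the second */
    }
  return lo;
}

/* Stand-in for a real rebuild: CTX points at the index of the bad object.  */
static int culprit_in_range (size_t lo, size_t hi, void *ctx)
{
  size_t culprit = *(const size_t *) ctx;
  return culprit >= lo && culprit < hi;
}
```

With N objects this needs about log2(N) rebuild cycles, which is why narrowing the candidates first (for example to the two TUs named above) saves so much time.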