[Bug tree-optimization/95401] [10 Regression] GCC produces incorrect instruction with -O3 for AVX2 since r10-2257-g868363d4f52df19d

2021-03-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95401

--- Comment #8 from Richard Biener  ---
(In reply to Alexandre Oliva from comment #7)
> How important is it that the test added for this PR be split into two
> separate source files?
> 
> I ask because, on targets that support vectors, but the vector unit is not
> enabled in the default configuration, vect.exp makes compile the default
> action, instead of run, and with additional sources, compile fails because
> one can't compile multiple sources into a single asm output.

Hmm, but that sounds like a mistake in the dg setup?  Anyway, if you can make
the testcase fail when combined (and some noipa attributes sprinkled around)
it's certainly fine to merge it into a single TU.

[Bug middle-end/99394] s254 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99394

--- Comment #2 from Richard Biener  ---
This is a loop-carried data dependence which we can't handle (we avoid creating
those from PRE but here it appears in the source itself).  I wonder how
LLVM handles this (pre/post vectorization IL).

Specifically 'carry around variable' is something we don't handle.

Can you somehow extract a compilable testcase (with just this kernel)?

Looking at the source peeling a single iteration (to get rid of the initial
value) and then undoing the PRE, vectorizing

for (int i = 1; i < LEN_1D; i++) {
  a[i] = (b[i] + b[i-1]) * (real_t).5;
}

would likely result in optimal code.  The assembly from clang doesn't look
optimal to me - llvm likely materializes 'x' as temporary array, vectorizing

x[0] = b[LEN_1D-1];
for (int i = 0; i < LEN_1D; i++) {
  a[i] = (b[i] + x[i]) * (real_t).5;
  x[i+1] = b[i];
}

and then somehow (like we handle OMP simd lane arrays?) uses two vectors
as a sliding window over x[].  At least the standard strategy for
these kinds of dependences is to get "rid" of them by making them data
dependences and then hope for the best.
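
To make the peeling suggestion above concrete, here is a minimal sketch. The original s254 kernel is not quoted in this thread, so the shape of s254_carried is an assumption based on the 'carry around variable' description and the x[] rewrite; real_t is assumed to be float, and both function names are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

using real_t = float;  // assumption: the thread only shows the cast (real_t).5

// Assumed shape of the original kernel: 'x' carries b[i-1] into the next
// iteration, which is the loop-carried scalar dependence discussed above.
// Both functions expect a.size() >= b.size().
void s254_carried(std::vector<real_t>& a, const std::vector<real_t>& b) {
  const std::size_t n = b.size();
  if (n == 0) return;
  real_t x = b[n - 1];
  for (std::size_t i = 0; i < n; i++) {
    a[i] = (b[i] + x) * real_t(.5);
    x = b[i];
  }
}

// Peeled form: the first iteration consumes the initial value of 'x',
// after which the carried scalar is just b[i-1] and can be read from memory.
void s254_peeled(std::vector<real_t>& a, const std::vector<real_t>& b) {
  const std::size_t n = b.size();
  if (n == 0) return;
  a[0] = (b[0] + b[n - 1]) * real_t(.5);
  for (std::size_t i = 1; i < n; i++)
    a[i] = (b[i] + b[i - 1]) * real_t(.5);
}
```

Both functions should produce identical results; the second one no longer has a scalar carried across iterations, only loads from b.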

[Bug tree-optimization/99394] s254 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99394

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Keywords||missed-optimization
 Ever confirmed|0   |1
   Last reconfirmed||2021-03-05
 CC||rguenth at gcc dot gnu.org
  Component|middle-end  |tree-optimization

[Bug tree-optimization/99396] std::rotl and std::rotr Does not convert into ROTATE on the gimple level

2021-03-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99396

Jonathan Wakely  changed:

   What|Removed |Added

 CC|jwakely.gcc at gmail dot com   |

--- Comment #5 from Jonathan Wakely  ---
Stop CCing me on every bug you file, or I will ban your account, permanently
this time.

I read the gcc-bugs mailing list, so I will see the bug anyway. Stop CCing me.
I won't ask again, I'll just ban you.

Also, don't just post a godbolt link with no other comments, that's not a valid
bug report.

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2021-03-05
 CC||rguenth at gcc dot gnu.org,
   ||rsandifo at gcc dot gnu.org
   Keywords||missed-optimization
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
  Component|middle-end  |tree-optimization

--- Comment #2 from Richard Biener  ---
please provide compilable testcases ...

Reduced testcase:

double a[1024];
void foo ()
{
  for (int i = 0; i < 1022; i += 2)
    {
      a[i] = a[i+1] * a[i];
      a[i+1] = a[i+2] * a[i+1];
    }
}

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #24 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #22)

> That works to avoid the vpinsrq.  I guess the case of a mem operand
> behaves similar to a gpr (plus the load uop), at least I don't have any
> contrary evidence (but I didn't do any microbenchmarks either).
> 
> I'm not sure IRA/LRA will optimally handle the situation with register
> pressure causing spilling in case it needs to reload both gpr operands.
> At least for
> 
> typedef long v2di __attribute__((vector_size(16)));
> 
> v2di foo (long a, long b)
> {
>   return (v2di){a, b};
> }
> 
> with -msse4.1 -O3 -ffixed-xmm1 -ffixed-xmm2 -ffixed-xmm3 -ffixed-xmm4
> -ffixed-xmm5 -ffixed-xmm6 -ffixed-xmm7 -ffixed-xmm8 -ffixed-xmm9
> -ffixed-xmm10 -ffixed-xmm11 -ffixed-xmm12 -ffixed-xmm13 -ffixed-xmm14
> -ffixed-xmm15 I get with the
> patch
> 
> foo:
> .LFB0:
>         .cfi_startproc
>         movq    %rsi, -16(%rsp)
>         movq    %rdi, %xmm0
>         pinsrq  $1, -16(%rsp), %xmm0
>         ret
> 
> while without it's
> 
>         movq    %rdi, %xmm0
>         pinsrq  $1, %rsi, %xmm0

This is expected: my patch is based on the assumption that punpcklqdq is cheap
compared to pinsrq, and that interunit moves are cheap. This way, IRA will
reload the GP register to an XMM register and use the cheaper instruction.

> as far as I understand LRA dumps the new attribute is a hard one, even
> applying when other alternatives are worse.  In this case we choose
> alt 7.  Covering also alts 7 and 8 with the optimize-for-speed attribute
> causes reload fails - which is expected if there's no way for LRA to
> choose alt 1.  The following seems to work for the small testcase above
> but not for the important case in the benchmark (meh).
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index db5be59f5b7..e393a0d823b 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -15992,7 +15992,7 @@
>   (match_operand:DI 1 "register_operand"
>   "  0, 0,x ,Yv,0,Yv,0,0,v")
>   (match_operand:DI 2 "nonimmediate_operand"
> - " rm,rm,rm,rm,x,Yv,x,m,m")))]
> + " !rm,!rm,!rm,!rm,x,Yv,x,!m,!m")))]
>"TARGET_SSE"
>"@
> pinsrq\t{$1, %2, %0|%0, %2, 1}

The above means that GP will still be used, since it fits without reloading.

> I guess the idea of this insn setup was exactly to get IRA/LRA choose
> the optimal instruction sequence - otherwise exposing the reload so
> late is probably suboptimal.

There is one more tool in the toolbox. A peephole2 pattern can be
conditionalized on an available XMM register. So, if an XMM reg is available,
the GPR->XMM move can be emitted in front of the insn. So, if there is XMM
register pressure, pinsrd will be used, but if an XMM register is available, it
will be reused to emit punpcklqdq.

The peephole2 pattern can also be conditionalized for targets where GPR->XMM
moves are fast.

[Bug tree-optimization/99397] s152 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99397

Richard Biener  changed:

   What|Removed |Added

  Component|middle-end  |tree-optimization
   Keywords||missed-optimization
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-03-05

--- Comment #1 from Richard Biener  ---
That's the long-standing issue of dependence analysis not handling mixed
array and pointer access forms which means we miss distance zero computation
and handling here.

There's a duplicate for this.

The mitigation is to "try again" with the array access demoted to a
pointer-based access (thus, analyze some alternative DR and see if dependence
analysis can handle that).
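
As an illustration of what "mixed array and pointer access forms" can look like, here is a small sketch. It is only modelled on the description above, not taken from the TSVC source: the array names, LEN, accumulate and kernel are all hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t LEN = 256;  // hypothetical array length
float a[LEN], b[LEN], c[LEN], d[LEN], e[LEN];

// The callee only sees pointers, so its accesses are pointer-based.
static inline void accumulate(float* pa, const float* pb, const float* pc,
                              std::size_t i) {
  pa[i] += pb[i] * pc[i];
}

// The caller stores to b[i] through the array and then reads the same
// element back through the pointer parameter in the same iteration,
// i.e. a dependence with distance zero.
void kernel() {
  for (std::size_t i = 0; i < LEN; i++) {
    b[i] = d[i] * e[i];
    accumulate(a, b, c, i);
  }
}
```

Once the callee is inlined, the loop body contains one array-form store to b[i] and one pointer-form load of the same location, which is the combination the comment says dependence analysis does not handle.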

[Bug c++/99401] New: GCC11 MinGW-w64 32-bit build fails with undefined reference to `LC0'

2021-03-05 Thread brechtsanders at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99401

Bug ID: 99401
   Summary: GCC11 MinGW-w64 32-bit build fails with undefined
reference to `LC0'
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: brechtsanders at users dot sourceforge.net
  Target Milestone: ---

Created attachment 50301
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50301&action=edit
make output

When building GCC11 (last tested with 11-20210228) for Windows 32-bit with
MinGW-w64 the build fails with undefined references to LC0/LC1/LC2/LC3.

My build was done using the following configure command:
../configure --prefix=/R/winlibs32_stage/inst_gcc-11-20210228/share/gcc
--build=i686-w64-mingw32 --host=i686-w64-mingw32 --with-pkgversion=MinGW-W64
i686-posix-dwarf, built by Brecht Sanders --with-tune=generic
--enable-checking=release --enable-threads=posix --with-dwarf2
--disable-sjlj-exceptions --disable-libunwind-exceptions
--disable-serial-configure --disable-bootstrap --enable-host-shared
--enable-plugin --disable-default-ssp --disable-rpath --disable-libstdcxx-pch
--enable-libstdcxx-time=yes --disable-libstdcxx-debug
--disable-version-specific-runtime-libs --with-stabs --disable-symvers
--enable-languages=c,c++,lto,objc,obj-c++,d --disable-gold --disable-nls
--disable-stage1-checking --disable-win32-registry --disable-multilib
--enable-ld --enable-libquadmath --enable-libada --enable-libssp
--enable-libstdcxx --enable-lto --enable-fully-dynamic-string --enable-libgomp
--enable-graphite --enable-mingw-wildcard
--with-mpc=/d/Prog/winlibs32_stage/custombuilt
--with-mpfr=/d/Prog/winlibs32_stage/custombuilt
--with-gmp=/d/Prog/winlibs32_stage/custombuilt
--with-isl=/d/Prog/winlibs32_stage/custombuilt --enable-install-libiberty
--enable-__cxa_atexit --without-included-gettext --with-diagnostics-color=auto
--with-libiconv --with-system-zlib
--with-build-sysroot=/R/winlibs32_stage/gcc-11-20210228/mingw-w64
--enable-large-address-aware
CFLAGS=-I/d/Prog/winlibs32_stage/custombuilt/include/libdl-win32

Note that I was able to build the 32-bit compiler once, but I had to disable
fortran to work around this error. This is the second iteration where I try to
build GCC 11-20210228 with the same version of GCC from the first iteration.

Windows 64-bit builds work fine, so this error is limited to Windows 32-bit.

Any idea what causes this?

[Bug fortran/99345] [11 Regression] ICE in doloop_contained_procedure_code, at fortran/frontend-passes.c:2464

2021-03-05 Thread doko at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99345

--- Comment #6 from Matthias Klose  ---
now at https://people.debian.org/~doko/tmp/espresso-test.tar.xz

[Bug c++/99402] New: std::copy creates _GLIBCXX_DEBUG false positive for attempt to subscript a dereferenceable (start-of-sequence) iterator

2021-03-05 Thread kip at thevertigo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99402

Bug ID: 99402
   Summary: std::copy creates _GLIBCXX_DEBUG false positive for
attempt to subscript a dereferenceable
(start-of-sequence) iterator
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kip at thevertigo dot com
  Target Milestone: ---

The following is a minimal example:

// Standard C++ / POSIX system headers...
#include <algorithm>
#include <set>
#include <vector>

using namespace std;

int main()
{
    // Container of eleven elements...
    const set<int> Source = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

    // Goal is to copy the first ten elements in to here, or 0 to 9 inclusive.
    //  It has space for ten elements...
    vector<int> Destination(10);

    // This should point to the end of the source range, or element with value
    //  10 which is the first value after the range to be copied...
    const auto EndIterator = next(cbegin(Source), 10);

    // This results in memory corruption, or an abort with STL debugging
    //  enabled. copy_n(..., 10, ...) works fine...
    copy(cbegin(Source), EndIterator, begin(Destination));

    return 0;
}

Compile and run with the following:

$ g++-10 -D_GLIBCXX_DEBUG test.cpp -o test -g3 -std=c++17 && ./test 

I see the following:

/usr/include/c++/10/bits/stl_algobase.h:566:
In function:
_OI std::copy(_II, _II, _OI) [with _II = 
__gnu_debug::_Safe_iterator, 
std::__debug::set, std::bidirectional_iterator_tag>; _OI = 
__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator > >, 
std::__debug::vector, std::random_access_iterator_tag>]

Error: attempt to subscript a dereferenceable (start-of-sequence) iterator 
11 step from its current position, which falls outside its dereferenceable 
range.

Objects involved in the operation:
iterator "__result" @ 0x0x7ffc3a448040 {
  type = __gnu_cxx::__normal_iterator > > (mutable iterator);
  state = dereferenceable (start-of-sequence);
  references sequence with type 'std::__debug::vector >' @ 0x0x7ffc3a4480d0
}
Aborted (core dumped)

I've been advised from another who ran the same test that this works fine with
GCC 8 and 9, so it may be a regression.

[Bug fortran/99345] [11 Regression] ICE in doloop_contained_procedure_code, at fortran/frontend-passes.c:2464

2021-03-05 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99345

--- Comment #7 from Martin Liška  ---
Thanks, but I can't see where the missing modules come from:

$ gcc postahc.f90 -c
postahc.f90:21:7:

   21 |   USE kinds,   ONLY : DP
  |   1
Fatal Error: Cannot open module file ‘kinds.mod’ for reading at (1): No such
file or directory
compilation terminated.

[Bug c++/98810] [9/10 Regression] [C++20] ICE in tsubst_copy, at cp/pt.c:16771

2021-03-05 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98810

Christophe Lyon  changed:

   What|Removed |Added

 CC||clyon at gcc dot gnu.org

--- Comment #7 from Christophe Lyon  ---
Hi, the backport in gcc-9 branch is causing errors:

ERROR: g++.dg/cpp2a/nontype-class-defarg1.C  -std=c++14: syntax error in target
selector "target c++20" for " dg-do 2 compile { target c++20 } "

[Bug c++/99401] GCC11 MinGW-w64 32-bit build fails with undefined reference to `LC0'

2021-03-05 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99401

Eric Botcazou  changed:

   What|Removed |Added

   Last reconfirmed||2021-03-05
 Status|UNCONFIRMED |WAITING
 Ever confirmed|0   |1
 CC||ebotcazou at gcc dot gnu.org

--- Comment #1 from Eric Botcazou  ---
This looks like an issue with the base compiler, could you post the output of
'gcc -v'?  Could you also avoid forcing -O2 during stage #1, i.e. use the
default -O0 or only -O1?

[Bug fortran/99345] [11 Regression] ICE in doloop_contained_procedure_code, at fortran/frontend-passes.c:2464

2021-03-05 Thread doko at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99345

--- Comment #8 from Matthias Klose  ---
updated the tarball to include the Modules dir

[Bug tree-optimization/99396] std::rotl and std::rotr Does not convert into ROTATE on the gimple level

2021-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99396

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
There is absolutely no reason why libstdc++ should use any intrinsics for the
rotates, gcc recognizes a lot of patterns into rotates.
Just not the extra verbose one used in libstdc++.
The comment in gcc says:
/* Recognize rotation patterns.  Return true if a transformation
   applied, otherwise return false.

   We are looking for X with unsigned type T with bitsize B, OP being
   +, | or ^, some type T2 wider than T.  For:
   (X << CNT1) OP (X >> CNT2)   iff CNT1 + CNT2 == B
   ((T) ((T2) X << CNT1)) OP ((T) ((T2) X >> CNT2)) iff CNT1 + CNT2 == B

   transform these into:
   X r<< CNT1

   Or for:
   (X << Y) OP (X >> (B - Y))
   (X << (int) Y) OP (X >> (int) (B - Y))
   ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
   ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
   (X << Y) | (X >> ((-Y) & (B - 1)))
   (X << (int) Y) | (X >> (int) ((-Y) & (B - 1)))
   ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
   ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))

   transform these into:
   X r<< Y

   Or for:
   (X << (Y & (B - 1))) | (X >> ((-Y) & (B - 1)))
   (X << (int) (Y & (B - 1))) | (X >> (int) ((-Y) & (B - 1)))
   ((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1))))
   ((T) ((T2) X << (int) (Y & (B - 1)))) \
     | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))

   transform these into:
   X r<< (Y & (B - 1))

   Note, in the patterns with T2 type, the type of OP operands
   might be even a signed type, but should have precision B.
   Expressions with & (B - 1) should be recognized only if B is
   a power of 2.  */

but libstdc++ does e.g.
  constexpr auto _Nd = __gnu_cxx::__int_traits<_Tp>::__digits;
  const int __r = __s % _Nd;
  if (__r == 0)
    return __x;
  else if (__r > 0)
    return (__x << __r) | (__x >> ((_Nd - __r) % _Nd));
  else
    return (__x >> -__r) | (__x << ((_Nd + __r) % _Nd)); // rotr(x, -r)
So, can't it e.g. use
  constexpr auto _Nd = __gnu_cxx::__int_traits<_Tp>::__digits;
  const auto __r = static_cast<unsigned>(__s);
  return (__x << (__r % _Nd)) | (__x >> ((-__r) % _Nd));
?
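
For reference, a standalone sketch of the branch-free form suggested above, written for a fixed 32-bit unsigned operand so that it can be tried outside libstdc++. The name rotl32 is made up for illustration; whether a given GCC version folds this into a rotate has not been checked here.

```cpp
#include <cstdint>

// Rotate x left by s bits; a negative s rotates right by -s bits.
// Converting the count to unsigned first makes both (r % Nd) and
// ((-r) % Nd) well defined and always smaller than the bit width.
constexpr std::uint32_t rotl32(std::uint32_t x, int s) {
  constexpr unsigned Nd = 32;  // bit width of the operand
  const unsigned r = static_cast<unsigned>(s);
  return (x << (r % Nd)) | (x >> ((-r) % Nd));
}
```

Since Nd is a power of two, the unsigned "% Nd" is the same operation as "& (Nd - 1)", so the expression has the shape of the last group of patterns in the comment quoted above.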

[Bug fortran/99355] -freal-X-real-Y -freal-Z-real-X promotes Z to Y

2021-03-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99355

--- Comment #17 from CVS Commits  ---
The master branch has been updated by Tobias Burnus :

https://gcc.gnu.org/g:80cf2facbbdafed159b326d83f7cf3999c3df8d0

commit r11-7519-g80cf2facbbdafed159b326d83f7cf3999c3df8d0
Author: Tobias Burnus 
Date:   Fri Mar 5 10:43:11 2021 +0100

Fortran: Follow fixes to -freal-{4,8}-real* handling [PR99355,PR57871]

gcc/fortran/ChangeLog:

PR fortran/99355
PR fortran/57871
* invoke.texi (-freal{4,8}-real-*): Extend description.
* primary.c (match_real_constant): Also promote real literals
with '_kind' number.

gcc/testsuite/ChangeLog:

* gfortran.dg/real4-10-real8-10.f90: Add check for real literals
with '_kind' number.
* gfortran.dg/real4-10-real8-16.f90: Likewise.
* gfortran.dg/real4-10-real8-4.f90: Likewise.
* gfortran.dg/real4-10.f90: Likewise.
* gfortran.dg/real4-16-real8-10.f90: Likewise.
* gfortran.dg/real4-16-real8-16.f90: Likewise.
* gfortran.dg/real4-16-real8-4.f90: Likewise.
* gfortran.dg/real4-16.f90: Likewise.
* gfortran.dg/real4-8-real8-10.f90: Likewise.
* gfortran.dg/real4-8-real8-16.f90: Likewise.
* gfortran.dg/real4-8-real8-4.f90: Likewise.
* gfortran.dg/real4-8.f90: Likewise.
* gfortran.dg/real8-10.f90: Likewise.
* gfortran.dg/real8-16.f90: Likewise.
* gfortran.dg/real8-4.f90: Likewise.

[Bug fortran/57871] gfortran -freal-4-real-16 gives wrong result for selected_real_kind(1)

2021-03-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57871

--- Comment #10 from CVS Commits  ---
The master branch has been updated by Tobias Burnus :

https://gcc.gnu.org/g:80cf2facbbdafed159b326d83f7cf3999c3df8d0

commit r11-7519-g80cf2facbbdafed159b326d83f7cf3999c3df8d0
Author: Tobias Burnus 
Date:   Fri Mar 5 10:43:11 2021 +0100

Fortran: Follow fixes to -freal-{4,8}-real* handling [PR99355,PR57871]

gcc/fortran/ChangeLog:

PR fortran/99355
PR fortran/57871
* invoke.texi (-freal{4,8}-real-*): Extend description.
* primary.c (match_real_constant): Also promote real literals
with '_kind' number.

gcc/testsuite/ChangeLog:

* gfortran.dg/real4-10-real8-10.f90: Add check for real literals
with '_kind' number.
* gfortran.dg/real4-10-real8-16.f90: Likewise.
* gfortran.dg/real4-10-real8-4.f90: Likewise.
* gfortran.dg/real4-10.f90: Likewise.
* gfortran.dg/real4-16-real8-10.f90: Likewise.
* gfortran.dg/real4-16-real8-16.f90: Likewise.
* gfortran.dg/real4-16-real8-4.f90: Likewise.
* gfortran.dg/real4-16.f90: Likewise.
* gfortran.dg/real4-8-real8-10.f90: Likewise.
* gfortran.dg/real4-8-real8-16.f90: Likewise.
* gfortran.dg/real4-8-real8-4.f90: Likewise.
* gfortran.dg/real4-8.f90: Likewise.
* gfortran.dg/real8-10.f90: Likewise.
* gfortran.dg/real8-16.f90: Likewise.
* gfortran.dg/real8-4.f90: Likewise.

[Bug tree-optimization/99396] std::rotl and std::rotr Does not convert into ROTATE on the gimple level

2021-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99396

--- Comment #7 from Jakub Jelinek  ---
Ah, but __digits is one smaller than we want for signed types.
Plus before C++20 the left shifts of negative values are UB?
Maybe all the rotates should be implemented using the corresponding unsigned
types...

[Bug fortran/99355] -freal-X-real-Y -freal-Z-real-X promotes Z to Y

2021-03-05 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99355

Tobias Burnus  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|REOPENED|RESOLVED

--- Comment #18 from Tobias Burnus  ---
The follow-up issue is now also FIXED :-)

Thanks again for the original report Nick!

And thanks Dominique for spotting the omission! At least the crucial bit of the
PR57871 test is now in the testsuite.

[Bug tree-optimization/99396] std::rotl and std::rotr Does not convert into ROTATE on the gimple level

2021-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99396

--- Comment #8 from Jakub Jelinek  ---
so
  using __unsigned_type = __make_unsigned<_Tp>::__type;
  constexpr auto _Nd = __gnu_cxx::__int_traits<__unsigned_type>::__digits;
  const auto __r = static_cast<unsigned>(__s);
  const auto __y = static_cast<__unsigned_type>(__x);
  const auto __z = (__y << (__r % _Nd)) | (__y >> ((-__r) % _Nd));
  return static_cast<_Tp>(__z);
?  Of course, if _Tp can be anything other than integral types, it would need
to be a specialization for integral types (though it will work for __int128
fine if that doesn't count as integral).
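
A self-contained sketch of the same idea using only standard traits, in case someone wants to experiment with it outside libstdc++. The name rotl_any is hypothetical, std::make_unsigned and std::numeric_limits stand in for the internal __make_unsigned and __int_traits helpers, and __int128 is not covered.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Rotate left on the corresponding unsigned type, then convert back.
// A negative s rotates right by -s bits.
template <typename T>
constexpr T rotl_any(T x, int s) {
  static_assert(std::is_integral<T>::value && !std::is_same<T, bool>::value,
                "integral, non-bool types only");
  using U = typename std::make_unsigned<T>::type;
  constexpr unsigned Nd = std::numeric_limits<U>::digits;
  const unsigned r = static_cast<unsigned>(s);
  const U y = static_cast<U>(x);
  // For types narrower than int the shifts happen after integer promotion;
  // the cast back to U drops the extra high bits.
  const U z = static_cast<U>((y << (r % Nd)) | (y >> ((-r) % Nd)));
  return static_cast<T>(z);
}
```

Doing the shifts on U avoids both the "__digits is one smaller for signed types" problem and left shifts of negative values.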

[Bug fortran/99345] [11 Regression] ICE in doloop_contained_procedure_code, at fortran/frontend-passes.c:2464 since r11-2578-g27eac9ee6137a6b5

2021-03-05 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99345

Martin Liška  changed:

   What|Removed |Added

 Status|WAITING |NEW
 CC||tkoenig at gcc dot gnu.org
Summary|[11 Regression] ICE in  |[11 Regression] ICE in
   |doloop_contained_procedure_ |doloop_contained_procedure_
   |code, at|code, at
   |fortran/frontend-passes.c:2 |fortran/frontend-passes.c:2
   |464 |464 since
   ||r11-2578-g27eac9ee6137a6b5

--- Comment #9 from Martin Liška  ---
I've reduced that to:

$ cat x.f90
MODULE kinds
  INTEGER, PARAMETER :: DP = selected_real_kind(14,200)
  CONTAINS
SUBROUTINE print_kind_info (stdout)
  INTEGER, INTENT(IN) :: stdout
  WRITE(stdout,'(/,T2,A,T78,A,2(/,T2,A,T75,I6),3(/,T2,A,T67,E14.8))') &
  kind('C')
END SUBROUTINE print_kind_info
END MODULE kinds
USE kinds
  COMPLEX(DP), ALLOCATABLE :: selfen_upfan(:)
  DO iq = 1, nq
CALL calc_upper_fan(iq, selfen_upfan)
  ENDDO  
  DO ik = 1, nk
ENDDO
  CONTAINS
SUBROUTINE calc_upper_fan(iq, selfen_upfan)
  COMPLEX(DP)  selfen_upfan(nk)
  INTEGER recl
  INQUIRE(IOLENGTH=recl) ENDDO
END  
END 

$ gfortran x.f90 -c
f951: internal compiler error: in doloop_contained_procedure_code, at
fortran/frontend-passes.c:2464
0x656727 doloop_contained_procedure_code
/home/marxin/Programming/gcc/gcc/fortran/frontend-passes.c:2464
0x9f7d87 gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int
(*)(gfc_expr**, int*, void*), void*)
/home/marxin/Programming/gcc/gcc/fortran/frontend-passes.c:5299
0x9f7eef gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int
(*)(gfc_expr**, int*, void*), void*)
/home/marxin/Programming/gcc/gcc/fortran/frontend-passes.c:5623
0x9f98ac doloop_code
/home/marxin/Programming/gcc/gcc/fortran/frontend-passes.c:2620
0x9f7d87 gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int
(*)(gfc_expr**, int*, void*), void*)
/home/marxin/Programming/gcc/gcc/fortran/frontend-passes.c:5299
0x9f7eef gfc_code_walker(gfc_code**, int (*)(gfc_code**, int*, void*), int
(*)(gfc_expr**, int*, void*), void*)
/home/marxin/Programming/gcc/gcc/fortran/frontend-passes.c:5623
0x9f8f3f doloop_warn
/home/marxin/Programming/gcc/gcc/fortran/frontend-passes.c:3052
0x9f94af gfc_run_passes(gfc_namespace*)
/home/marxin/Programming/gcc/gcc/fortran/frontend-passes.c:156
0x90917e gfc_resolve(gfc_namespace*)
/home/marxin/Programming/gcc/gcc/fortran/resolve.c:17428
0x90917e gfc_resolve(gfc_namespace*)
/home/marxin/Programming/gcc/gcc/fortran/resolve.c:17401
0x8fabab resolve_all_program_units
/home/marxin/Programming/gcc/gcc/fortran/parse.c:6290
0x8fabab gfc_parse_file()
/home/marxin/Programming/gcc/gcc/fortran/parse.c:6542
0x94ec0f gfc_be_parse_file
/home/marxin/Programming/gcc/gcc/fortran/f95-lang.c:212
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

Started with r11-2578-g27eac9ee6137a6b5.

[Bug fortran/57871] gfortran -freal-4-real-16 gives wrong result for selected_real_kind(1)

2021-03-05 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57871

Tobias Burnus  changed:

   What|Removed |Added

 CC||burnus at gcc dot gnu.org

--- Comment #11 from Tobias Burnus  ---
(In reply to Dominique d'Humieres from comment #9)
> Note that after r11-7501 the test in comment 0 gives at run time:
> % gfc pr57871.f90 -freal-4-real-16

That issue is now fixed – see PR fortran/99355 for some details.

I also updated the documentation to fill some gaps in the
-freal-4-real-16 etc. descriptions and added another caveat. See commit or
https://gcc.gnu.org/onlinedocs/gfortran/Fortran-Dialect-Options.html#index-freal-4-real-16
(may take a day to get updated).


As this issue is about documentation, if I read the later comments correctly:
Can you check whether the documentation is now sufficient or whether more is
needed? If so, what is needed? — If not, can this bug now be closed?

[Bug target/99312] __ARM_ARCH is not implemented correctly when compiled with -march=armv8.1-a

2021-03-05 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99312

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
We intend to deprecate that macro going forward as it's not a useful way for
testing architecture features in aarch64. It made sense in the pre-Armv7-a
days, but now the recommended way to test for features is the __ARM_FEATURE*
macros.

The scheme is also not very well-suited for things like the recent AArch64
Armv8-R.

Is there a particular use case that you have in mind?
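
As a small illustration of the recommended style, the sketch below keys off an ACLE feature macro instead of comparing __ARM_ARCH against a version number. __ARM_FEATURE_CRC32 is used purely as an example of such a macro; the helper name built_with_crc32 is made up.

```cpp
// Reports whether this translation unit was compiled with the CRC32
// extension enabled, using the feature macro rather than __ARM_ARCH.
// On targets or option sets where the macro is not defined it returns false.
constexpr bool built_with_crc32() {
#if defined(__ARM_FEATURE_CRC32)
  return true;
#else
  return false;
#endif
}
```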

[Bug c++/99403] New: Add header fix-it hints for std::this_thread::* and std::jthread

2021-03-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99403

Bug ID: 99403
   Summary: Add header fix-it hints for std::this_thread::* and
std::jthread
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: enhancement
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redi at gcc dot gnu.org
  Target Milestone: ---

#include <chrono>
#include <future>
int main()
{
  std::this_thread::sleep_for(std::chrono::seconds(1));
  std::this_thread::sleep_until(std::chrono::system_clock::now());
  std::this_thread::get_id();
  std::this_thread::yield();
}

This fails with:

ns.C: In function 'int main()':
ns.C:5:21: error: 'sleep_for' is not a member of 'std::this_thread'
5 |   std::this_thread::sleep_for(std::chrono::seconds(1));
  | ^
ns.C:6:21: error: 'sleep_until' is not a member of 'std::this_thread'
6 |   std::this_thread::sleep_until(std::chrono::system_clock::now());
  | ^~~

We have a fix-it hint for std::thread, but not the other entities in <thread>:

ns.C:2:1: note: 'std::thread' is defined in header '<thread>'; did you forget
to '#include <thread>'?
1 | #include 
  +++ |+#include <thread>
2 | int main()

That should be added for std::jthread (since C++20 only), and namespace
std::this_thread and the four functions in the example above (all for C++11 and
later). There is no error for get_id and yield because they happen to be
transitively included via <future>, but that's an implementation detail that
could change in future (no pun intended).

If <future> isn't included we get errors for std::this_thread itself, so that
nested namespace deserves a fix-it hint of its own:

#include <chrono>
int main()
{
  std::this_thread::sleep_for(std::chrono::seconds(1));
}

ns.C: In function 'int main()':
ns.C:4:8: error: 'std::this_thread' has not been declared
4 |   std::this_thread::sleep_for(std::chrono::seconds(1));
  |^~~
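
For comparison, a sketch of what the requested fix-it hint would steer the user towards: including <thread> directly, so that all four functions are declared regardless of which other headers happen to pull parts of it in. The wrapper function pause_briefly is only there to keep the snippet self-contained.

```cpp
#include <chrono>
#include <thread>  // declares std::this_thread and its functions directly

// Exercises the four std::this_thread functions from the report and
// returns the id of the calling thread.
std::thread::id pause_briefly() {
  std::this_thread::sleep_for(std::chrono::milliseconds(1));
  std::this_thread::sleep_until(std::chrono::system_clock::now());
  std::this_thread::yield();
  return std::this_thread::get_id();
}
```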

[Bug target/99216] ICE in aarch64_sve::function_expander::expand() with LTO

2021-03-05 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99216

--- Comment #4 from Alex Coplan  ---
Right, the problem appears to be to do with the way that overloaded functions
are implemented for the ACLE. Specifically the m_direct_overloads flag in
aarch64_sve::function_builder. If this flag is set, we register a separate
builtin (with a separate function code) for each overload as opposed to
registering the overloaded function once and resolving it later. The two
different schemes end up with each builtin having a different code.

We set m_direct_overloads to be true if the language is C++:

m_direct_overloads = lang_GNU_CXX ();

so in cc1plus, we use one numbering scheme, but in lto1, we use a different
numbering scheme, with predictably disastrous consequences (we try and expand
svaddv as an svbic).

So one option would be that for LTO we instantiate both sets of tree nodes.
Then, when expanding a tree node that came from LTO, we dispatch on a flag in
the tree node (essentially just whether it came from C++ or not) to determine
which set of functions to use. Seems a bit messy though.

@Richard: does that sound at all sane? Any ideas for a better approach?
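
To illustrate why the two schemes cannot share function codes, here is a toy model. It is not the aarch64_sve::function_builder code: Shape and register_builtins are hypothetical, and the registration order is invented. The only point is that registering one builtin per overload shifts every code that follows.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one overloaded ACLE function and its type suffixes.
struct Shape {
  std::string base;
  std::vector<std::string> suffixes;
};

// Registers builtins in order; the index in the returned vector plays the
// role of the function code. With direct_overloads set, the overloaded name
// is registered once per suffix instead of once in total.
std::vector<std::string> register_builtins(const std::vector<Shape>& shapes,
                                           bool direct_overloads) {
  std::vector<std::string> codes;
  for (const Shape& s : shapes) {
    // The fully suffixed names are registered in both schemes.
    for (const std::string& suf : s.suffixes)
      codes.push_back(s.base + "_" + suf);
    if (direct_overloads) {
      for (const std::string& suf : s.suffixes)
        codes.push_back(s.base + "(" + suf + ")");
    } else {
      codes.push_back(s.base);
    }
  }
  return codes;
}
```

With the two shapes svaddv {s32, u32} and svbic {s32, u32}, in that order, code 3 names "svaddv(u32)" in the direct table but "svbic_s32" in the other one, so a code written by one front end and decoded with the other table picks the wrong function.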

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #25 from rguenther at suse dot de  ---
On Fri, 5 Mar 2021, ubizjak at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
> 
> --- Comment #24 from Uroš Bizjak  ---
> (In reply to Richard Biener from comment #22)
> > I guess the idea of this insn setup was exactly to get IRA/LRA choose
> > the optimal instruction sequence - otherwise exposing the reload so
> > late is probably suboptimal.
> 
> There is one more tool in the toolbox. A peephole2 pattern can be
> conditionalized on an available XMM register. So, if an XMM reg is available,
> the GPR->XMM move can be emitted in front of the insn. So, if there is XMM
> register pressure, pinsrd will be used, but if an XMM register is available,
> it will be reused to emit punpcklqdq.
> 
> The peephole2 pattern can also be conditionalized for targets where GPR->XMM
> moves are fast.

Note the trick is esp. important when GPR->XMM moves are _slow_.  But only
in the case we originally combine two GPR operands.  Doing two
GPR->XMM moves and then one punpcklqdq hides half of the latency of the
slow moves since they have no data dependence on each other.  So for the
peephole we should try to match this - a reloaded operand and a GPR
operand.  When the %xmm operand results from a SSE computation there's
no point in splitting out a GPR->XMM move.

So in the end a peephole2 sounds like it could better match the condition
the transform is profitable on.

[Bug c++/99404] New: Diagnostics for undeclared members of a namespace don't say "namespace"

2021-03-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99404

Bug ID: 99404
   Summary: Diagnostics for undeclared members of a namespace
don't say "namespace"
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: enhancement
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redi at gcc dot gnu.org
  Target Milestone: ---

#include 
#include 
int main()
{
  std::this_thread::sleep_until(std::chrono::system_clock::now());
}

This invalid program gives:

ns.C: In function 'int main()':
ns.C:5:21: error: 'sleep_until' is not a member of 'std::this_thread'
5 |   std::this_thread::sleep_until(std::chrono::system_clock::now());
  | ^~~

It might be helpful to casual C++ programmers if it was clear that
'std::this_thread' is a namespace, and not a class type or global variable e.g.

ns.C:5:21: error: 'sleep_until' is not a member of namespace 'std::this_thread'


Minimal example:

namespace N { }
void f() {
  using N::a;
  N::b();
}

ns2.C: In function ‘void f()’:
ns2.C:3:12: error: ‘a’ has not been declared in ‘N’
3 |   using N::a;
  |^
ns2.C:4:6: error: ‘b’ is not a member of ‘N’
4 |   N::b();
  |  ^

Also, is there a reason these two diagnostics are worded differently?


Clang says:

ns2.C:3:12: error: no member named 'a' in namespace 'N'
  using N::a;
~~~^
ns2.C:4:6: error: no member named 'b' in namespace 'N'
  N::b();
  ~~~^
2 errors generated.


And EDG says:

"ns2.C", line 3: error: namespace "N" has no member "a"
using N::a;
 ^

"ns2.C", line 4: error: namespace "N" has no member "b"
N::b();
   ^

2 errors detected in the compilation of "ns2.C".

[Bug c++/48396] std::type_info is implicitly declared

2021-03-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48396

Jonathan Wakely  changed:

   What|Removed |Added

   Keywords||accepts-invalid
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=56468
Summary|Undeclared library types|std::type_info is
   |accepted in namespace std   |implicitly declared
   Last reconfirmed|2012-04-12 00:00:00 |2021-3-5

--- Comment #1 from Jonathan Wakely  ---
Adjusting the summary because this is specific to std::type_info, it doesn't
happen for other types.

I think there's some historical reason that G++ pre-defines std::type_info, but
maybe we could make that only valid in system headers. It still causes bugs
even then, because libstdc++ can accidentally rely on the G++ behaviour, which
doesn't work with other compilers (e.g. PR 56468).

[Bug target/99401] GCC11 MinGW-w64 32-bit build fails with undefined reference to `LC0'

2021-03-05 Thread brechtsanders at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99401

--- Comment #2 from Brecht Sanders  ---
Created attachment 50302
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50302&action=edit
gcc -v

[Bug tree-optimization/99396] std::rotl and std::rotr Does not convert into ROTATE on the gimple level

2021-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99396

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek  ---
Created attachment 50303
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50303&action=edit
gcc11-pr99396.patch

This seems to work for me (almost, there is a useless and for uchar/ushort) on:
#include <bit>

unsigned char f1 (unsigned char x, int y) { return std::rotl (x, y); }
unsigned char f2 (unsigned char x, int y) { return std::rotr (x, y); }
unsigned short f3 (unsigned short x, int y) { return std::rotl (x, y); }
unsigned short f4 (unsigned short x, int y) { return std::rotr (x, y); }
unsigned int f5 (unsigned int x, int y) { return std::rotl (x, y); }
unsigned int f6 (unsigned int x, int y) { return std::rotr (x, y); }
unsigned long int f7 (unsigned long int x, int y) { return std::rotl (x, y); }
unsigned long int f8 (unsigned long int x, int y) { return std::rotr (x, y); }
unsigned long long int f9 (unsigned long long int x, int y) { return std::rotl (x, y); }
unsigned long long int f10 (unsigned long long int x, int y) { return std::rotr (x, y); }
//unsigned __int128 f11 (unsigned __int128 x, int y) { return std::rotl (x, y); }
//unsigned __int128 f12 (unsigned __int128 x, int y) { return std::rotr (x, y); }

constexpr auto a = std::rotl (1234U, 0);
constexpr auto b = std::rotl (1234U, 5);
constexpr auto c = std::rotl (1234U, -5);
constexpr auto d = std::rotl (1234U, -__INT_MAX__ - 1);

[Bug c/99137] ICE in gimplify_scan_omp_clauses, at gimplify.c:9833

2021-03-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99137

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Tobias Burnus :

https://gcc.gnu.org/g:6ddedd3efa3fe482f76a4037521a06b3ac9f2a8b

commit r11-7520-g6ddedd3efa3fe482f76a4037521a06b3ac9f2a8b
Author: Tobias Burnus 
Date:   Fri Mar 5 11:41:44 2021 +0100

OpenACC: C/C++ - fix async parsing [PR99137]

gcc/c/ChangeLog:

PR c/99137
* c-parser.c (c_parser_oacc_clause_async): Reject comma
expressions.

gcc/cp/ChangeLog:

PR c/99137
* parser.c (cp_parser_oacc_clause_async): Reject comma expressions.

gcc/testsuite/ChangeLog:

PR c/99137
* c-c++-common/goacc/asyncwait-1.c: Update dg-error; add
additional test.

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #26 from Richard Biener  ---
(In reply to rguent...@suse.de from comment #25)
> On Fri, 5 Mar 2021, ubizjak at gmail dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
> > 
> > --- Comment #24 from Uroš Bizjak  ---
> > (In reply to Richard Biener from comment #22)
> > > I guess the idea of this insn setup was exactly to get IRA/LRA choose
> > > the optimal instruction sequence - otherwise exposing the reload so
> > > late is probably suboptimal.
> > 
> > There is one more tool in the toolbox. A peephole2 pattern can be
> > conditionalized on an available XMM register. So, if an XMM reg is
> > available, the GPR->XMM move can be emitted in front of the insn. So, if
> > there is XMM register pressure, pinsrd will be used, but if an XMM register
> > is available, it will be reused to emit punpcklqdq.
> > 
> > The peephole2 pattern can also be conditionalized for targets where GPR->XMM
> > moves are fast.
> 
> Note the trick is esp. important when GPR->XMM moves are _slow_.  But only
> in the case we originally combine two GPR operands.  Doing two
> GPR->XMM moves and then one punpcklqdq hides half of the latency of the
> slow moves since they have no data dependence on each other.  So for the
> peephole we should try to match this - a reloaded operand and a GPR
> operand.  When the %xmm operand results from a SSE computation there's
> no point in splitting out a GPR->XMM move.
> 
> So in the end a peephole2 sounds like it could better match the condition
> the transform is profitable on.

I tried

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index db5be59f5b7..8d0d3077cf8 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1419,6 +1419,23 @@
   DONE;
 })

+(define_peephole2
+  [(set (match_operand:DI 0 "sse_reg_operand")
+(match_operand:DI 1 "general_gr_operand"))
+   (match_scratch:DI 2 "sse_reg_operand")
+   (set (match_operand:V2DI 2 "sse_reg_operand")
+   (vec_concat:V2DI (match_dup:DI 0)
+(match_operand:DI 3 "general_gr_operand")))]
+  "reload_completed"
+  [(set (match_dup 0)
+(match_dup 1))
+   (set (match_dup 2)
+(match_dup 3))
+   (set (match_dup 2)
+   (vec_concat:V2DI (match_dup 0)
+(match_dup 2)))]
+  "")
+
 ;; Merge movsd/movhpd to movupd for TARGET_SSE_UNALIGNED_LOAD_OPTIMAL targets.
 (define_peephole2
   [(set (match_operand:V2DF 0 "sse_reg_operand")

but that doesn't seem to match for some unknown reason.

[Bug tree-optimization/99396] std::rotl and std::rotr Does not convert into ROTATE on the gimple level

2021-03-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99396

--- Comment #10 from Jonathan Wakely  ---
As you noted on IRC, these functions are undefined for anything except unsigned
integral types. Adding that here for observers wondering about comments 7 and
8.

[Bug target/99312] __ARM_ARCH is not implemented correctly when compiled with -march=armv8.1-a

2021-03-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99312

--- Comment #3 from Andrew Pinski  ---
(In reply to ktkachov from comment #2)
> We intend to deprecate that macro going forward as it's not a useful way for
> testing architecture features in aarch64. It made sense in the pre-Armv7-a
> days, but now the recommended way to test for features is the __ARM_FEATURE*
> macros.

Which is interesting because x86 is going in the opposite direction and having
ISA "levels".

> 
> The scheme is also not very well-suited for things like the recent AArch64
> Armv8-R.
> 
> Is there a particular use case that you have in mind?

A customer was asking for how to detect the ISA Level at compile time and
runtime.

[Bug target/99312] __ARM_ARCH is not implemented correctly when compiled with -march=armv8.1-a

2021-03-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99312

--- Comment #4 from Andrew Pinski  ---
ODP is one place where this is used:
https://opendataplane.github.io/odp/structodp__system__info__t.html

[Bug tree-optimization/99396] std::rotl and std::rotr Does not convert into ROTATE on the gimple level

2021-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99396

Jakub Jelinek  changed:

   What|Removed |Added

  Attachment #50303|0   |1
is obsolete||

--- Comment #11 from Jakub Jelinek  ---
Created attachment 50304
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50304&action=edit
gcc11-pr99396.patch

Updated patch.  I'm afraid the previous patch wouldn't work properly on weirdo
types that have _Nd which is not a power of two.
In such cases, (e.g. for __int20 type on some targets), ~0U + 1ULL is not
divisible by _Nd and so I think it wouldn't handle properly std::rot[rl] with
negative second arguments.
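
To make the divisibility remark concrete, here is a small sketch (not the code from the attached patch) comparing two ways of reducing a possibly negative rotate count into [0, _Nd). The helper names reduce_signed and reduce_via_unsigned are hypothetical, and the values in the comments assume a 32-bit unsigned int.

```cpp
#include <cassert>

// Reduce in signed arithmetic: the result is always in [0, nd).
constexpr unsigned reduce_signed(int s, int nd) {
  return static_cast<unsigned>(((s % nd) + nd) % nd);
}

// Convert to unsigned first, then reduce. For negative s this computes
// (2^32 + s) % nd, which agrees with reduce_signed only when 2^32 is
// divisible by nd, i.e. when nd is a power of two (assuming 32-bit unsigned).
constexpr unsigned reduce_via_unsigned(int s, unsigned nd) {
  return static_cast<unsigned>(s) % nd;
}

// nd = 32: both give 27 for s = -5.
// nd = 20: reduce_signed gives 15, reduce_via_unsigned gives 11.
```

With a 20-bit type, a rotate left by -5 should be equivalent to a rotate left by 15, so the unsigned shortcut would pick the wrong amount.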

[Bug tree-optimization/99396] std::rotl and std::rotr Does not convert into ROTATE on the gimple level

2021-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99396

Jakub Jelinek  changed:

   What|Removed |Added

  Attachment #50304|0   |1
is obsolete||

--- Comment #12 from Jakub Jelinek  ---
Created attachment 50305
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50305&action=edit
gcc11-pr99396.patch

Some further tweaks based on IRC discussions.

[Bug target/99401] GCC11 MinGW-w64 32-bit build fails with undefined reference to `LC0'

2021-03-05 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99401

--- Comment #3 from Eric Botcazou  ---
> gcc -v

No, what's needed is the output for the *base* compiler, i.e. the compiler you
start from, not the compiler you're building.

[Bug target/99312] __ARM_ARCH is not implemented correctly when compiled with -march=armv8.1-a

2021-03-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99312

Andrew Pinski  changed:

   What|Removed |Added

URL||https://gcc.gnu.org/piperma
   ||il/gcc-patches/2021-March/5
   ||66322.html
   Keywords||patch

--- Comment #5 from Andrew Pinski  ---
Also Naveen posted a patch earlier today:
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566322.html

[Bug rtl-optimization/99376] sanitizer detects undefined behaviour in rtlanal.c

2021-03-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99376

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Eric Botcazou :

https://gcc.gnu.org/g:28354bc22bd66648891a875ee08ca2b27debf2a2

commit r11-7521-g28354bc22bd66648891a875ee08ca2b27debf2a2
Author: Eric Botcazou 
Date:   Fri Mar 5 12:38:49 2021 +0100

Fix undefined behavior spotted by the sanitizer

gcc/
PR rtl-optimization/99376
* rtlanal.c (nonzero_bits1) : If the number
of low-order zero bits is too large, set the result to 0 directly.

[Bug ada/99264] latest glibc release breaks Ada build on Linux

2021-03-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99264

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Eric Botcazou :

https://gcc.gnu.org/g:331763de7d4850702a0f67298f36017c73cdb103

commit r11-7523-g331763de7d4850702a0f67298f36017c73cdb103
Author: Eric Botcazou 
Date:   Fri Mar 5 12:45:41 2021 +0100

Fix build breakage with latest glibc release

gcc/ada/
PR ada/99264
* init.c (__gnat_alternate_sta) [Linux]: Remove preprocessor test on
MINSIGSTKSZ and bump size to 32KB.
* libgnarl/s-osinte__linux.ads (Alternate_Stack_Size): Bump to
32KB.

[Bug target/99216] ICE in aarch64_sve::function_expander::expand() with LTO

2021-03-05 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99216

--- Comment #5 from rsandifo at gcc dot gnu.org  ---
(In reply to Alex Coplan from comment #4)
> Right, the problem appears to be to do with the way that overloaded
> functions are implemented for the ACLE. Specifically the m_direct_overloads
> flag in aarch64_sve::function_builder. If this flag is set, we register a
> separate builtin (with a separate function code) for each overload as
> opposed to registering the overloaded function once and resolving it later.
> The two different schemes end up with each builtin having a different code.
> 
> We set m_direct_overloads to be true if the language is C++:
> 
> m_direct_overloads = lang_GNU_CXX ();
> 
> so in cc1plus, we use one numbering scheme, but in lto1, we use a different
> numbering scheme, with predictably disastrous consequences (we try and
> expand svaddv as an svbic).
> 
> So one option would be that for LTO we instantiate both sets of tree nodes.
> Then, when expanding a tree node that came from LTO, we dispatch on a flag
> in the tree node (essentially just whether it came from C++ or not) to
> determine which set of functions to use. Seems a bit messy though.
> 
> @Richard: does that sound at all sane? Any ideas for a better approach?
I think we should just make the non-C++ codes line up with the C++ ones
by pushing a dummy entry onto registered_functions (but not creating or
registering a decl for it).
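
A toy model of that suggestion, under heavy simplification: function codes are just indices into the registration order, C++ registers an extra overload alias per function, and the non-C++ front ends push a placeholder in the same slot so the codes agree. The names Entry, register_all and code_of are hypothetical and are not the aarch64_sve::function_builder interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a registered builtin. A placeholder occupies a
// function code but has no decl behind it.
struct Entry {
  std::string name;
  bool placeholder;
};

// Register every unique name and, per function, either a real overload
// alias (C++ style) or a dummy entry (non-C++ style with padding).
std::vector<Entry> register_all(const std::vector<std::string>& unique_names,
                                bool direct_overloads,
                                bool pad_with_placeholders) {
  std::vector<Entry> registered;
  for (const std::string& name : unique_names) {
    registered.push_back({name, false});
    if (direct_overloads)
      registered.push_back({name + " (overload alias)", false});
    else if (pad_with_placeholders)
      registered.push_back({"", true});
  }
  return registered;
}

// The function code is the index in registration order, or -1 if absent.
int code_of(const std::vector<Entry>& registered, const std::string& name) {
  for (std::size_t i = 0; i < registered.size(); ++i)
    if (!registered[i].placeholder && registered[i].name == name)
      return static_cast<int>(i);
  return -1;
}
```

Without padding the second function gets code 1 in the non-C++ numbering and code 2 in the C++ numbering; with the dummy entries both numberings give 2.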

[Bug ada/99264] latest glibc release breaks Ada build on Linux

2021-03-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99264

--- Comment #8 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Eric Botcazou :

https://gcc.gnu.org/g:c85c24099b28f7af907466af2c1b73da9455368c

commit r10-9417-gc85c24099b28f7af907466af2c1b73da9455368c
Author: Eric Botcazou 
Date:   Fri Mar 5 12:45:41 2021 +0100

Fix build breakage with latest glibc release

gcc/ada/
PR ada/99264
* init.c (__gnat_alternate_sta) [Linux]: Remove preprocessor test on
MINSIGSTKSZ and bump size to 32KB.
* libgnarl/s-osinte__linux.ads (Alternate_Stack_Size): Bump to
32KB.

[Bug ada/99264] latest glibc release breaks Ada build on Linux

2021-03-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99264

--- Comment #9 from CVS Commits  ---
The releases/gcc-9 branch has been updated by Eric Botcazou :

https://gcc.gnu.org/g:a5a7cdcaa0c29ee547c41d24f495e9694a6fe7f1

commit r9-9267-ga5a7cdcaa0c29ee547c41d24f495e9694a6fe7f1
Author: Eric Botcazou 
Date:   Fri Mar 5 12:45:41 2021 +0100

Fix build breakage with latest glibc release

gcc/ada/
PR ada/99264
* init.c (__gnat_alternate_sta) [Linux]: Remove preprocessor test on
MINSIGSTKSZ and bump size to 32KB.
* libgnarl/s-osinte__linux.ads (Alternate_Stack_Size): Bump to
32KB.

[Bug rtl-optimization/99376] sanitizer detects undefined behaviour in rtlanal.c

2021-03-05 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99376

Eric Botcazou  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
   Target Milestone|--- |11.0

--- Comment #4 from Eric Botcazou  ---
.

[Bug other/63426] [meta-bug] Issues found with -fsanitize=undefined

2021-03-05 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63426
Bug 63426 depends on bug 99376, which changed state.

Bug 99376 Summary: sanitizer detects undefined behaviour in rtlanal.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99376

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #27 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #26)
> but that doesn't seem to match for some unknown reason.

Try this:

(define_peephole2
  [(match_scratch:DI 5 "Yv")
   (set (match_operand:DI 0 "sse_reg_operand")
(match_operand:DI 1 "general_reg_operand"))
   (set (match_operand:V2DI 2 "sse_reg_operand")
(vec_concat:V2DI (match_operand:DI 3 "sse_reg_operand")
 (match_operand:DI 4 "nonimmediate_gr_operand")))]
  ""
  [(set (match_dup 0)
(match_dup 1))
   (set (match_dup 5)
(match_dup 4))
   (set (match_dup 2)
   (vec_concat:V2DI (match_dup 3)
(match_dup 5)))])

[Bug ada/99264] latest glibc release breaks Ada build on Linux

2021-03-05 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99264

Eric Botcazou  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
   Target Milestone|11.0|9.4

--- Comment #10 from Eric Botcazou  ---
Fixed on mainline, 10 and 9 branches.

[Bug tree-optimization/99396] std::rotl and std::rotr Does not convert into ROTATE on the gimple level

2021-03-05 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99396

--- Comment #13 from cqwrteur  ---
(In reply to Jakub Jelinek from comment #12)
> Created attachment 50305 [details]
> gcc11-pr99396.patch
> 
> Some further tweaks based on IRC discussions.

shouldn't this be

if constexpr ((_Nd & (_Nd - 1)) == 0)
{
}
else
{
}

instead of if ((_Nd & (_Nd - 1)) == 0) ?

[Bug tree-optimization/99396] std::rotl and std::rotr Does not convert into ROTATE on the gimple level

2021-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99396

--- Comment #14 from Jakub Jelinek  ---
I believe std::__rot{l,r} can be used even in C++14 and if constexpr is only
supported in C++17 and later.
With optimizations enabled (_Nd & (_Nd - 1)) == 0 will optimize into constant
anyway.
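
For readers following along, a minimal C++14-compatible sketch of what such a function could look like with a plain if on the power-of-two test. This is not the libstdc++ implementation or the attached patch; rotl_sketch is a hypothetical name. Because the condition depends only on a compile-time constant, an optimizing compiler can fold it and drop the untaken branch, which is the point made above.

```cpp
#include <cassert>
#include <climits>
#include <limits>
#include <type_traits>

// Rotate left by s bits; a negative s rotates right. Valid in C++14, so it
// uses a plain `if` rather than `if constexpr`.
template <typename T>
constexpr T rotl_sketch(T x, int s) noexcept {
  static_assert(std::is_unsigned<T>::value, "unsigned integer types only");
  constexpr int nd = std::numeric_limits<T>::digits;
  if ((nd & (nd - 1)) == 0) {
    // Power-of-two width: masking the count handles negative values too.
    constexpr unsigned mask = nd - 1;
    const unsigned r = static_cast<unsigned>(s) & mask;
    return static_cast<T>((x << r) | (x >> ((-r) & mask)));
  }
  // Generic width: reduce with a signed remainder, then pick a direction.
  const int r = s % nd;
  if (r == 0)
    return x;
  if (r > 0)
    return static_cast<T>((x << r) | (x >> ((nd - r) % nd)));
  return static_cast<T>((x >> -r) | (x << ((nd + r) % nd)));
}
```

Both branches must compile for every T here, which is the cost of not using if constexpr; for the standard unsigned types only the first branch is ever taken.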

[Bug fortran/97927] gfortran: ICE in lookup_field_for_decl, at tree-nested.c:288

2021-03-05 Thread doko at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97927

--- Comment #10 from Matthias Klose  ---
seen again with 20210227

[Bug tree-optimization/99396] std::rotl and std::rotr Does not convert into ROTATE on the gimple level

2021-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99396

--- Comment #15 from Jakub Jelinek  ---
Could be if _GLIBCXX17_CONSTEXPR (...)
sure.

[Bug tree-optimization/99396] std::rotl and std::rotr Does not convert into ROTATE on the gimple level

2021-03-05 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99396

--- Comment #16 from cqwrteur  ---
(In reply to Jakub Jelinek from comment #15)
> Could be if _GLIBCXX17_CONSTEXPR (...)
> sure.

bit header is a new C++20 header. There is no reason the compiler does not
support if constexpr.

If you compile it with pre-C++20, the code should even get compiled

[Bug tree-optimization/99396] std::rotl and std::rotr Does not convert into ROTATE on the gimple level

2021-03-05 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99396

--- Comment #17 from cqwrteur  ---
(In reply to cqwrteur from comment #16)
> (In reply to Jakub Jelinek from comment #15)
> > Could be if _GLIBCXX17_CONSTEXPR (...)
> > sure.
> 
> bit header is a new C++20 header. There is no reason the compiler does not
> support if constexpr.
> 
> If you compile it with pre-C++20, the code should even get compiled

sorry. should not even get compiled

[Bug tree-optimization/99396] std::rotl and std::rotr Does not convert into ROTATE on the gimple level

2021-03-05 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99396

--- Comment #18 from cqwrteur  ---
(In reply to cqwrteur from comment #17)
> (In reply to cqwrteur from comment #16)
> > (In reply to Jakub Jelinek from comment #15)
> > > Could be if _GLIBCXX17_CONSTEXPR (...)
> > > sure.
> > 
> > bit header is a new C++20 header. There is no reason the compiler does not
> > support if constexpr.
> > 
> > If you compile it with pre-C++20, the code should even get compiled
> 
> sorry. should not even get compiled

okay. it looks like you folks want to support C++14 and C++17 as an extension.

[Bug target/99216] ICE in aarch64_sve::function_expander::expand() with LTO

2021-03-05 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99216

--- Comment #6 from Alex Coplan  ---
Ok, I'll have a go, thanks.

[Bug libstdc++/99402] [10/11 Regression] std::copy creates _GLIBCXX_DEBUG false positive for attempt to subscript a dereferenceable (start-of-sequence) iterator

2021-03-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99402

Jonathan Wakely  changed:

   What|Removed |Added

 Ever confirmed|0   |1
  Known to work||9.3.0
   Target Milestone|--- |10.3
Summary|std::copy creates   |[10/11 Regression]
   |_GLIBCXX_DEBUG false|std::copy creates
   |positive for attempt to |_GLIBCXX_DEBUG false
   |subscript a dereferenceable |positive for attempt to
   |(start-of-sequence) |subscript a dereferenceable
   |iterator|(start-of-sequence)
   ||iterator
 CC||fdumont at gcc dot gnu.org
  Known to fail||10.1.0, 10.2.0, 11.0
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-03-05

--- Comment #1 from Jonathan Wakely  ---
Probably caused by r276600

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #28 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #27)
> (In reply to Richard Biener from comment #26)
> > but that doesn't seem to match for some unknown reason. 
> Try this:

The latency problem with the original testcase is solved with:

(define_peephole2
  [(match_scratch:DI 3 "Yv")
   (set (match_operand:V2DI 0 "sse_reg_operand")
(vec_concat:V2DI (match_operand:DI 1 "sse_reg_operand")
 (match_operand:DI 2 "nonimmediate_gr_operand")))]
  ""
  [(set (match_dup 3) (match_dup 2))
   (set (match_dup 0)
(vec_concat:V2DI (match_dup 1) (match_dup 3)))])

but I don't know if this transformation applies universally to all x86 targets.

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #29 from Richard Biener  ---
(In reply to Uroš Bizjak from comment #27)
> (In reply to Richard Biener from comment #26)
> > but that doesn't seem to match for some unknown reason.
> 
> Try this:
> 
> (define_peephole2
>   [(match_scratch:DI 5 "Yv")
>(set (match_operand:DI 0 "sse_reg_operand")
> (match_operand:DI 1 "general_reg_operand"))
>(set (match_operand:V2DI 2 "sse_reg_operand")
> (vec_concat:V2DI (match_operand:DI 3 "sse_reg_operand")
>  (match_operand:DI 4 "nonimmediate_gr_operand")))]
>   ""
>   [(set (match_dup 0)
> (match_dup 1))
>(set (match_dup 5)
> (match_dup 4))
>(set (match_dup 2)
>(vec_concat:V2DI (match_dup 3)
> (match_dup 5)))])

Ah, I messed up operands.  The following works (the above position of
match_scratch happily chooses an operand matching operand 0):

;; Further split pinsrq variants of vec_concatv2di with two GPR sources,
;; one already reloaded, to hide the latency of one GPR->XMM transition.
(define_peephole2
  [(set (match_operand:DI 0 "sse_reg_operand")
(match_operand:DI 1 "general_reg_operand"))
   (match_scratch:DI 2 "Yv")
   (set (match_operand:V2DI 3 "sse_reg_operand")
(vec_concat:V2DI (match_dup 0)
 (match_operand:DI 4 "nonimmediate_gr_operand")))]
  "reload_completed && optimize_insn_for_speed_p ()"
  [(set (match_dup 0)
(match_dup 1))
   (set (match_dup 2)
(match_dup 4))
   (set (match_dup 3)
(vec_concat:V2DI (match_dup 0)
 (match_dup 2)))])

but for some reason it again doesn't work for the important loop.  There
we have

  389: xmm0:DI=cx:DI
  REG_DEAD cx:DI
  390: dx:DI=[sp:DI+0x10]
   56: {dx:DI=dx:DI 0>>0x3f;clobber flags:CC;}
  REG_UNUSED flags:CC
   57: xmm0:V2DI=vec_concat(xmm0:DI,dx:DI)

I suppose the reason is that there's two unrelated insns between the
xmm0 = cx:DI and the vec_concat.  Which would hint that we somehow
need to not match this GPR->XMM move in the peephole pattern but
instead somehow in the condition (can we use DF there?)

The simplified variant below works but IMHO matches cases we do not
want to transform.  I can't find any example on how to achieve that
though.

;; Further split pinsrq variants of vec_concatv2di with two GPR sources,
;; one already reloaded, to hide the latency of one GPR->XMM transition.
(define_peephole2
  [(match_scratch:DI 3 "Yv")
   (set (match_operand:V2DI 0 "sse_reg_operand")
(vec_concat:V2DI (match_operand:DI 1 "sse_reg_operand")
 (match_operand:DI 2 "nonimmediate_gr_operand")))]
  "reload_completed && optimize_insn_for_speed_p ()"
  [(set (match_dup 3)
(match_dup 2))
   (set (match_dup 0)
(vec_concat:V2DI (match_dup 1)
 (match_dup 3)))])

[Bug libstdc++/99402] [10/11 Regression] std::copy creates _GLIBCXX_DEBUG false positive for attempt to subscript a dereferenceable (start-of-sequence) iterator

2021-03-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99402

--- Comment #2 from Jonathan Wakely  ---
(In reply to Kip Warner from comment #0)
> // This results in memory corruption, or an abort with STL debugging

I don't see any memory corruption, I think it's just a bug in the Debug Mode
checks, which aborts when it shouldn't do. Without those bad checks it doesn't
abort, and works correctly.

[Bug libstdc++/99402] [10/11 Regression] std::copy creates _GLIBCXX_DEBUG false positive for attempt to subscript a dereferenceable (start-of-sequence) iterator

2021-03-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99402

--- Comment #3 from Jonathan Wakely  ---
François, this can't be right:

  return std::make_pair(-__seq_dist.first,
__seq_dist.second == __dp_exact
? __dp_sign_max_size : __seq_dist.second);

This uses __seq_dist.second, but __seq_dist comes from _SeqTraits::_S_size
which is:

  template
struct _Sequence_traits
{
  typedef _Distance_traits _DistTraits;

  static typename _DistTraits::__type
  _S_size(const _Sequence& __seq)
  { return std::make_pair(__seq.size(), __dp_exact); }
};

i.e. __seq_dist.second is always __dp_exact, so this function always returns
__dp_sign_max_size.

You should be using __base_dist.second, no?
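
A stripped-down model of the point, using hypothetical names (Precision, Dist, seq_size, result_as_quoted, result_using_base) rather than the real Debug Mode types. It only illustrates why the quoted expression cannot depend on how precisely the iterator distance is known; whether consulting the base distance is the right fix is the open question above.

```cpp
#include <cassert>
#include <utility>

// Hypothetical mirror of the precision tags named in the comment.
enum Precision { dp_none, dp_equality, dp_sign, dp_sign_max_size, dp_exact };
using Dist = std::pair<long, Precision>;

// Models _S_size: the size of a sequence is always reported as exact.
Dist seq_size(long n) { return Dist(n, dp_exact); }

// As quoted: the precision comes from seq_dist, which is always dp_exact,
// so the conditional always selects dp_sign_max_size.
Dist result_as_quoted(Dist seq_dist, Dist /*base_dist*/) {
  return Dist(-seq_dist.first,
              seq_dist.second == dp_exact ? dp_sign_max_size
                                          : seq_dist.second);
}

// Variant along the lines of the question: consult base_dist instead.
Dist result_using_base(Dist seq_dist, Dist base_dist) {
  return Dist(-seq_dist.first,
              base_dist.second == dp_exact ? dp_sign_max_size
                                           : base_dist.second);
}
```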

[Bug target/99405] New: Rotate with mask not optimized on x86 for QI/HImode rotates

2021-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99405

Bug ID: 99405
   Summary: Rotate with mask not optimized on x86 for QI/HImode
rotates
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jakub at gcc dot gnu.org
CC: jakub at gcc dot gnu.org, unlvsur at live dot com
Depends on: 99396
  Target Milestone: ---

+++ This bug was initially created as a clone of Bug #99396 +++

In
unsigned char f1 (unsigned char x, unsigned y) { return (x << (y & 7)) | (x >>
(-y & 7)); }
unsigned short f2 (unsigned short x, unsigned y) { return (x << (y & 15)) | (x
>> (-y & 15)); }
unsigned int f3 (unsigned int x, unsigned y) { return (x << (y & 31)) | (x >>
(-y & 31)); }
unsigned char f4 (unsigned char x, unsigned y) { return (x >> (y & 7)) | (x <<
(-y & 7)); }
unsigned short f5 (unsigned short x, unsigned y) { return (x >> (y & 15)) | (x
<< (-y & 15)); }
unsigned int f6 (unsigned int x, unsigned y) { return (x >> (y & 31)) | (x <<
(-y & 31)); }
unsigned char f7 (unsigned char x, unsigned char y) { unsigned char v = y & 7;
unsigned char w = -y & 7; return (x << v) | (x >> w); }
unsigned short f8 (unsigned short x, unsigned char y) { unsigned char v = y &
15; unsigned char w = -y & 15; return (x << v) | (x >> w); }
unsigned int f9 (unsigned int x, unsigned char y) { unsigned char v = y & 31;
unsigned char w = -y & 31; return (x << v) | (x >> w); }
unsigned char f10 (unsigned char x, unsigned char y) { unsigned char v = y & 7;
unsigned char w = -y & 7; return (x >> v) | (x << w); }
unsigned short f11 (unsigned short x, unsigned char y) { unsigned char v = y &
15; unsigned char w = -y & 15; return (x >> v) | (x << w); }
unsigned int f12 (unsigned int x, unsigned char y) { unsigned char v = y & 31;
unsigned char w = -y & 31; return (x >> v) | (x << w); }
#ifdef __x86_64__
unsigned long long f13 (unsigned long long x, unsigned y) { return (x << (y &
63)) | (x >> (-y & 63)); }
unsigned long long f14 (unsigned long long x, unsigned y) { return (x >> (y &
63)) | (x << (-y & 63)); }
unsigned long long f15 (unsigned long long x, unsigned char y) { unsigned char
v = y & 63; unsigned char w = -y & 63; return (x << v) | (x >> w); }
unsigned long long f16 (unsigned long long x, unsigned char y) { unsigned char
v = y & 63; unsigned char w = -y & 63; return (x >> v) | (x << w); }
#endif

we don't optimize away the and instructions in f{1,2,4,5,7,8,10,11}.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99396
[Bug 99396] std::rotl and std::rotr Does not convert into ROTATE on the gimple
level

[Bug target/99401] Rebuilding the compiler with itself fails at -O2

2021-03-05 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99401

Eric Botcazou  changed:

   What|Removed |Added

 Status|WAITING |NEW
Summary|GCC11 MinGW-w64 32-bit  |Rebuilding the compiler
   |build fails with undefined  |with itself fails at -O2
   |reference to `LC0'  |
   Keywords|build   |

--- Comment #4 from Eric Botcazou  ---
OK, I missed that you're explicitly rebuilding the compiler with itself after
having bootstrapped it first...  You're on your own here, no one does that.

[Bug libstdc++/99402] [10/11 Regression] std::copy creates _GLIBCXX_DEBUG false positive for attempt to subscript a dereferenceable (start-of-sequence) iterator

2021-03-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99402

--- Comment #4 from Jonathan Wakely  ---
This causes __valid_range to return {11, __dp_sign_max_size} and then we check
__result._M_can_advance(11) which fails.

We don't want to advance the result by the size of the other sequence, only by
distance(__first, __last). We don't care if there are elements in the sequence
past __last because we're not trying to copy them.

[Bug target/99405] Rotate with mask not optimized on x86 for QI/HImode rotates

2021-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99405

Jakub Jelinek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2021-03-05

--- Comment #1 from Jakub Jelinek  ---
Created attachment 50306
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50306&action=edit
gcc11-pr99405.patch

Untested fix.

[Bug target/99401] Rebuilding the compiler with itself fails at -O2

2021-03-05 Thread brechtsanders at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99401

--- Comment #5 from Brecht Sanders ---
*** Bug 97618 has been marked as a duplicate of this bug. ***

[Bug c/97618] undefined reference to LC11 building for target MinGW-w64 32-bit

2021-03-05 Thread brechtsanders at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97618

Brecht Sanders  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #5 from Brecht Sanders ---
Seems to be the same issue as 99401.

*** This bug has been marked as a duplicate of bug 99401 ***

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #30 from Jakub Jelinek  ---
(In reply to Richard Biener from comment #29)
> I suppose the reason is that there's two unrelated insns between the
> xmm0 = cx:DI and the vec_concat.  Which would hint that we somehow
> need to not match this GPR->XMM move in the peephole pattern but
> instead somehow in the condition (can we use DF there?)

peephole2 are run in a pass that does:
  df_set_flags (DF_LR_RUN_DCE);
  df_note_add_problem ();
  df_analyze ();
so, DF that uses the note or default problems is ok, but e.g.
DF_UD_CHAIN/DF_DU_CHAIN is not available.
But it can e.g. walk some number of previous instructions (with some reasonably
small upper bound) etc.

[Bug target/99401] Rebuilding the compiler with itself fails at -O2

2021-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99401

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
Anyway, in that case you want to attach here the preprocessed source on which
some undefined references to LC* appear and state exact gcc options that were
used to compile that, so that others can try to reproduce it e.g. with
cross-compilers.

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #31 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #29)
> The simplified variant below works but IMHO matches cases we do not
> want to transform.  I can't find any example on how to achieve that
> though.

I think that pinsrd should be transformed to punpcklqdq irrespective of its
first input operand. The insn scheduler should move insns around to mask their
latencies.

> ;; Further split pinsrq variants of vec_concatv2di with two GPR sources,
> ;; one already reloaded, to hide the latency of one GPR->XMM transitions.
> (define_peephole2
>   [(match_scratch:DI 3 "Yv")
>(set (match_operand:V2DI 0 "sse_reg_operand")
> (vec_concat:V2DI (match_operand:DI 1 "sse_reg_operand")
>  (match_operand:DI 2 "nonimmediate_gr_operand")))]
>   "reload_completed && optimize_insn_for_speed_p ()"

Please use

  "TARGET_64BIT && TARGET_SSE4_1
   && !optimize_insn_for_size_p ()"

here.

>   [(set (match_dup 3)
> (match_dup 2))
>(set (match_dup 0)
> (vec_concat:V2DI (match_dup 1)
>  (match_dup 3)))])

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #32 from rguenther at suse dot de  ---
On Fri, 5 Mar 2021, ubizjak at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
> 
> --- Comment #31 from Uroš Bizjak  ---
> (In reply to Richard Biener from comment #29)
> > The simplified variant below works but IMHO matches cases we do not
> > want to transform.  I can't find any example on how to achieve that
> > though.
> 
> I think that pinsrd should be transformed to punpcklqdq irrespective of its
> first input operand. The insn scheduler should move insns around to mask their
> latencies.
> 
> > ;; Further split pinsrq variants of vec_concatv2di with two GPR sources,
> > ;; one already reloaded, to hide the latency of one GPR->XMM transitions.
> > (define_peephole2
> >   [(match_scratch:DI 3 "Yv")
> >(set (match_operand:V2DI 0 "sse_reg_operand")
> > (vec_concat:V2DI (match_operand:DI 1 "sse_reg_operand")
> >  (match_operand:DI 2 "nonimmediate_gr_operand")))]
> >   "reload_completed && optimize_insn_for_speed_p ()"
> 
> Please use
> 
>   "TARGET_64BIT && TARGET_SSE4_1
>&& !optimize_insn_for_size_p ()"
> 
> here.

what about reload_completed?  We really only want to do this after RA.

Will test the patch then and add the reduced testcase.

[Bug libstdc++/99402] [10/11 Regression] std::copy creates _GLIBCXX_DEBUG false positive for attempt to subscript a dereferenceable (start-of-sequence) iterator

2021-03-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99402

--- Comment #5 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #3)
> You should be using __base_dist.second, no?

No, it's not that simple. I don't understand how this code is meant to work,
but this can't be right:

  if (this->_M_is_begin())
{
  if (__rhs._M_is_before_begin())
return std::make_pair(-1, __dp_exact);

  if (__rhs._M_is_end())
return __seq_dist;

  return std::make_pair(__seq_dist.first,
__seq_dist.second == __dp_exact
? __dp_sign_max_size : __seq_dist.second);
}

We can't use __seq_dist.first as the distance, because we have already
established that __rhs is not the end iterator, so it must be before the end.
And so the distance we're trying to find must be less than __seq_dist.first.

[Bug target/99405] Rotate with mask not optimized on x86 for QI/HImode rotates

2021-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99405

--- Comment #2 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #1)
> Created attachment 50306 [details]
> gcc11-pr99405.patch
> 
> Untested fix.

- (match_operand:SI 2 "register_operand" "c")
+ (match_operand:SI 2 "register_operand")

The constraint is here on purpose, you are risking reload failures with
compound instruction without it. IIRC from discussion with Segher, the combine
pass shouldn't propagate hard registers around anymore, but it can still
happen. So, if there is no compelling reason, I'd suggest to leave the
constraint.

[Bug libstdc++/99402] [10/11 Regression] std::copy creates _GLIBCXX_DEBUG false positive for attempt to subscript a dereferenceable (start-of-sequence) iterator

2021-03-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99402

--- Comment #6 from Jonathan Wakely  ---
Reduced:

#include <algorithm>
#include <set>
#include <vector>

using namespace std;

int main()
{
// any container with non-random access iterators:
const set<int> source = { 0, 1 };
vector<int> dest(1);
copy(source.begin(), ++source.begin(), dest.begin());
}

[Bug target/99405] Rotate with mask not optimized on x86 for QI/HImode rotates

2021-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99405

Jakub Jelinek  changed:

   What|Removed |Added

  Attachment #50306|0   |1
is obsolete||

--- Comment #3 from Jakub Jelinek  ---
Created attachment 50307
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50307&action=edit
gcc11-pr99405.patch

No particular reason, it just felt weird to have constraints on pre-reload
splitters.

Anyway, I've tried also to replace the pre-reload splitters with combine
splitters, but for some reason it didn't work.

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #33 from Richard Biener  ---
Created attachment 50308
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50308&action=edit
patch

I am testing the following.

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #34 from Uroš Bizjak  ---
(In reply to rguent...@suse.de from comment #32)
> what about reload_completed?  We really only want to do this after RA.

No need for it, this is peephole2 pass that *always* runs after reload.

[Bug c++/99389] [modules] bad serialization of data

2021-03-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99389

--- Comment #1 from CVS Commits  ---
The master branch has been updated by Nathan Sidwell :

https://gcc.gnu.org/g:4d66685e49d20e0c7a87c5fa0757c7eb63ffcdaa

commit r11-7524-g4d66685e49d20e0c7a87c5fa0757c7eb63ffcdaa
Author: Nathan Sidwell 
Date:   Fri Mar 5 05:25:54 2021 -0800

c++: instantiating imported specializations [PR 99389]

When an incomplete class specialization is imported, and is completed
by instantiation, we were failing to mark the instantiation, and thus
didn't stream it out.  Leading to errors in importing as we had
members of an incomplete type.

PR c++/99389
gcc/cp/
* pt.c (instantiate_class_template_1): Set instantiating module
here.
gcc/testsuite/
* g++.dg/modules/pr99389_a.H: New.
* g++.dg/modules/pr99389_b.C: New.
* g++.dg/modules/pr99389_c.C: New.

[Bug gcov-profile/99406] New: [11 regression] MAP_ANONYMOUS undeclared in libgcov.h

2021-03-05 Thread ro at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99406

Bug ID: 99406
   Summary: [11 regression] MAP_ANONYMOUS undeclared in libgcov.h
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ro at gcc dot gnu.org
CC: iains at gcc dot gnu.org, marxin at gcc dot gnu.org
  Target Milestone: ---
Target: *-apple-darwin11

The unconditional use of MAP_ANONYMOUS in libgcov.h broke the Mac OS X
10.7/Darwin 11 bootstrap:

In file included from /vol/gcc/src/hg/master/darwin/libgcc/libgcov-merge.c:26:
/vol/gcc/src/hg/master/darwin/libgcc/libgcov.h: In function 'malloc_mmap':
In file included from /vol/gcc/src/hg/master/darwin/libgcc/libgcov-merge.c:26:
/vol/gcc/src/hg/master/darwin/libgcc/libgcov.h: In function 'malloc_mmap':
/vol/gcc/src/hg/master/darwin/libgcc/libgcov.h:420:30: error: 'MAP_ANONYMOUS'
undeclared (first use in this function); did you mean 'MAP_ANON'?
  420 |MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  |  ^
  |  MAP_ANON
/vol/gcc/src/hg/master/darwin/libgcc/libgcov.h:420:30: note: each undeclared
identifier is reported only once for each function it appears in

I haven't checked which macOS version introduced MAP_ANONYMOUS as alias for
MAP_ANON, but macOS 11/Darwin 20 has it.

[Bug c++/99389] [modules] bad serialization of data

2021-03-05 Thread nathan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99389

Nathan Sidwell  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Nathan Sidwell  ---
4d66685e49d 2021-03-05 | c++: instantiating imported specializations [PR 99389]

[Bug gcov-profile/99406] [11 regression] MAP_ANONYMOUS undeclared in libgcov.h

2021-03-05 Thread ro at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99406

Rainer Orth  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

[Bug gcov-profile/99406] [11 regression] MAP_ANONYMOUS undeclared in libgcov.h

2021-03-05 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99406

--- Comment #1 from Iain Sandoe  ---
(In reply to Rainer Orth from comment #0)
> The unconditional use of MAP_ANONYMOUS in libgcov.h broke Mac OS X
> 10.7/Darwin 11
> bootstrap:
> 
> In file included from
> /vol/gcc/src/hg/master/darwin/libgcc/libgcov-merge.c:26:
> /vol/gcc/src/hg/master/darwin/libgcc/libgcov.h: In function 'malloc_mmap':
> In file included from
> /vol/gcc/src/hg/master/darwin/libgcc/libgcov-merge.c:26:
> /vol/gcc/src/hg/master/darwin/libgcc/libgcov.h: In function 'malloc_mmap':
> /vol/gcc/src/hg/master/darwin/libgcc/libgcov.h:420:30: error:
> 'MAP_ANONYMOUS' undeclared (first use in this function); did you mean
> 'MAP_ANON'?
>   420 |MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>   |  ^
>   |  MAP_ANON
> /vol/gcc/src/hg/master/darwin/libgcc/libgcov.h:420:30: note: each undeclared
> identifier is reported only once for each function it appears in
> 
> I haven't checked which macOS version introduced MAP_ANONYMOUS as alias for
> MAP_ANON, but macOS 11/Darwin 20 has it.

I think from darwin15 / macOS 10.11

/opt/iains/SDKs/darwin15/usr/include/sys/mman.h:#define MAP_ANONYMOUS MAP_ANON
/opt/iains/SDKs/darwin16/usr/include/sys/mman.h:#define MAP_ANONYMOUS MAP_ANON
/opt/iains/SDKs/darwin17/usr/include/sys/mman.h:#define MAP_ANONYMOUS MAP_ANON
/opt/iains/SDKs/darwin18/usr/include/sys/mman.h:#define MAP_ANONYMOUS MAP_ANON
/opt/iains/SDKs/darwin19/usr/include/sys/mman.h:#define MAP_ANONYMOUS MAP_ANON

[Bug gcov-profile/99406] [11 regression] MAP_ANONYMOUS undeclared in libgcov.h

2021-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99406

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
gcc/system.h has:
#ifndef MAP_FAILED
# define MAP_FAILED ((void *)-1)
#endif

#if !defined (MAP_ANONYMOUS) && defined (MAP_ANON)
# define MAP_ANONYMOUS MAP_ANON
#endif

so I think libgcov.h needs to do that too.

[Bug gcov-profile/99406] [11 regression] MAP_ANONYMOUS undeclared in libgcov.h

2021-03-05 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99406

Martin Liška  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |marxin at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2021-03-05
 Status|UNCONFIRMED |ASSIGNED

--- Comment #3 from Martin Liška  ---
(In reply to Jakub Jelinek from comment #2)
> gcc/system.h has:
> #ifndef MAP_FAILED
> # define MAP_FAILED ((void *)-1)
> #endif
> 
> #if !defined (MAP_ANONYMOUS) && defined (MAP_ANON)
> # define MAP_ANONYMOUS MAP_ANON
> #endif
> 
> so I think libgcov.h needs to do that too.

Yes, mine.

[Bug gcov-profile/99406] [11 regression] MAP_ANONYMOUS undeclared in libgcov.h

2021-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99406

--- Comment #4 from Jakub Jelinek  ---
Created attachment 50309
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50309&action=edit
gcc11-pr99406.patch

Like this.

[Bug fortran/97927] gfortran: ICE in lookup_field_for_decl, at tree-nested.c:288

2021-03-05 Thread doko at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97927

--- Comment #11 from Matthias Klose  ---
20210227 trunk

[Bug fortran/97927] gfortran: ICE in lookup_field_for_decl, at tree-nested.c:288

2021-03-05 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97927

--- Comment #12 from Tobias Burnus  ---
(In reply to Matthias Klose from comment #10)
> seen again with 20210227

I tried it with the attached file and the build.sh but calling gfortran
directly w/o mpif90 wrapper.

That's with --enable-checking=yes,extra,rtl and:
mainline: 4d66685e49d20e0c7a87c5fa0757c7eb63ffcdaa (Fri Mar 5 05:25:54 2021 -0800)
GCC 10: c85c24099b28f7af907466af2c1b73da9455368c (Fri Mar 5 12:45:41 2021 +0100)

That's on x86_64-gnu-linux (Ubuntu 14.04.4 LTS).

It did compile without any problems; I also tried running it with valgrind and
did not spot anything there, either.

Looking at the Elk webpage, no gfortran-related issues are reported there
either.

[Bug tree-optimization/99394] s254 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99394

--- Comment #3 from Jan Hubicka  ---
testcase is:

typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
// array definitions
real_t flat_2d_array[LEN_2D*LEN_2D];

real_t x[LEN_1D];

real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D],
bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];

int indx[LEN_1D];

real_t* __restrict__ xx;
real_t* yy;

// %2.5

real_t s254(void)
{

//scalar and array expansion
//carry around variable

real_t x;
for (int nl = 0; nl < 4*iterations; nl++) {
x = b[LEN_1D-1];
for (int i = 0; i < LEN_1D; i++) {
a[i] = (b[i] + x) * (real_t).5;
x = b[i];
}
}

}

[Bug middle-end/99407] New: s243 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407

Bug ID: 99407
   Summary: s243 benchmark of TSVC is vectorized by clang and not
by gcc
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

This testcase (from TSVC) is about 4 times faster on zen3 when built with
clang.

typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
// array definitions
real_t flat_2d_array[LEN_2D*LEN_2D];

real_t x[LEN_1D];

real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D],
bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];

int indx[LEN_1D];

real_t* __restrict__ xx;
real_t* yy;
real_t s243(void)
{

//node splitting
//false dependence cycle breaking

for (int nl = 0; nl < iterations; nl++) {
for (int i = 0; i < LEN_1D-1; i++) {
a[i] = b[i] + c[i  ] * d[i];
b[i] = a[i] + d[i  ] * e[i];
a[i] = b[i] + a[i+1] * d[i];
}
}
}

internal loop from clang is:
.LBB0_2:#   Parent Loop BB0_1 Depth=1
# =>  This Inner Loop Header: Depth=2
vmovups c(%rcx), %ymm12
vmovups c+32(%rcx), %ymm14
vmovups d(%rcx), %ymm0
vmovups d+32(%rcx), %ymm7
vfmadd213ps b(%rcx), %ymm0, %ymm12  # ymm12 = (ymm0 * ymm12) + mem
vfmadd213ps b+32(%rcx), %ymm7, %ymm14 # ymm14 = (ymm7 * ymm14) + mem
vfmadd231ps e(%rcx), %ymm0, %ymm12  # ymm12 = (ymm0 * mem) + ymm12
vfmadd231ps e+32(%rcx), %ymm7, %ymm14 # ymm14 = (ymm7 * mem) + ymm14
vmovups %ymm12, b(%rcx)
vmovups %ymm14, b+32(%rcx)
vfmadd231ps a+4(%rcx), %ymm0, %ymm12 # ymm12 = (ymm0 * mem) + ymm12
vfmadd231ps a+36(%rcx), %ymm7, %ymm14 # ymm14 = (ymm7 * mem) + ymm14
vmovups %ymm12, a(%rcx)
vmovups %ymm14, a+32(%rcx)
addq $64, %rcx
cmpq $127936, %rcx   # imm = 0x1F3C0
jne .LBB0_2

[Bug c++/98810] [9/10 Regression] [C++20] ICE in tsubst_copy, at cp/pt.c:16771

2021-03-05 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98810

--- Comment #8 from Marek Polacek  ---
It probably needs target c++2a instead.

[Bug target/98092] [11 Regression] ICE in extract_insn, at recog.c:2315 (error: unrecognizable insn)

2021-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98092

--- Comment #4 from Jakub Jelinek  ---
Any progress on this?

[Bug middle-end/99407] s243 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407

--- Comment #1 from Jan Hubicka  ---
Here we get:
s243.c:27:18: missed:   not vectorized, possible dependence between data-refs a[i_29] and a[_9]
s243.c:26:27: missed:  bad data dependence.
s243.c:26:27: note:  * Analysis failed with vector mode V8QI

   [local count: 1052266997]:

   [local count: 1063004410]:
  # i_29 = PHI <_9(6), 0(4)>
  # ivtmp_43 = PHI 
  _1 = b[i_29];
  _2 = c[i_29];
  _3 = d[i_29];
  _4 = _2 * _3;
  _5 = _1 + _4;
  a[i_29] = _5;
  _6 = e[i_29];
  _7 = _3 * _6;
  _8 = _5 + _7;
  b[i_29] = _8;
  _9 = i_29 + 1;
  _10 = a[_9];
  _11 = _3 * _10;
  _12 = _8 + _11;
  a[i_29] = _12;
  ivtmp_42 = ivtmp_43 - 1;
  if (ivtmp_42 != 0)
goto ; [98.99%]
  else
goto ; [1.01%]

[Bug middle-end/99408] New: s3251 benchmark of TSVC vectorized by clang runs about 7 times faster compared to gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99408

Bug ID: 99408
   Summary: s3251 benchmark of TSVC vectorized by clang runs about
7 times faster compared to gcc
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

typedef float real_t;
#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D];
void
main(void)
{
for (int nl = 0; nl < iterations; nl++) {
for (int i = 0; i < LEN_1D-1; i++){
a[i+1] = b[i]+c[i];
b[i]   = c[i]*e[i];
d[i]   = a[i]*e[i];
}
}
}

Built with -march=znver2 -Ofast I get:
main:
.LFB0:
.cfi_startproc
vmovaps c+127968(%rip), %xmm5
vmovaps e+127968(%rip), %xmm4
movl$10, %edx
vmovq   c+127984(%rip), %xmm9
vmovq   e+127984(%rip), %xmm10
vmovss  c+127992(%rip), %xmm7
vmovss  e+127992(%rip), %xmm3
vmovss  c+127984(%rip), %xmm13
vmulps  %xmm4, %xmm5, %xmm6
vmulps  %xmm9, %xmm10, %xmm12
vmulss  %xmm3, %xmm7, %xmm11
.p2align 4
.p2align 3
.L2:
xorl%eax, %eax
.p2align 4
.p2align 3
.L4:
vmovaps c(%rax), %ymm2
addq$32, %rax
vaddps  b-32(%rax), %ymm2, %ymm0
vmovups %ymm0, a-28(%rax)
vmulps  e-32(%rax), %ymm2, %ymm0
vmovaps e-32(%rax), %ymm2
vmovaps %ymm0, b-32(%rax)
vmulps  a-32(%rax), %ymm2, %ymm0
vmovaps %ymm0, d-32(%rax)
cmpq$127968, %rax
jne .L4
vaddps  b+127968(%rip), %xmm5, %xmm1
vaddss  b+127984(%rip), %xmm13, %xmm2
decl%edx
vmovaps %xmm6, b+127968(%rip)
vmovq   b+127984(%rip), %xmm0
vmovlps %xmm12, b+127984(%rip)
vaddps  %xmm0, %xmm9, %xmm0
vmovups %xmm1, a+127972(%rip)
vshufps $255, %xmm1, %xmm1, %xmm1
vmulps  a+127968(%rip), %xmm4, %xmm8
vunpcklps   %xmm2, %xmm1, %xmm1
vaddss  b+127992(%rip), %xmm7, %xmm2
vmovss  %xmm11, b+127992(%rip)
vmulps  %xmm10, %xmm1, %xmm1
vmovlps %xmm0, a+127988(%rip)
vmovshdup   %xmm0, %xmm0
vmulss  %xmm3, %xmm0, %xmm0
vmovss  %xmm2, a+127996(%rip)
jne .L2
vmovaps %xmm8, d+127968(%rip)
vmovlps %xmm1, d+127984(%rip)
vmovss  %xmm0, d+127992(%rip)
vzeroupper
ret


Clang does:

main:   # @main
.cfi_startproc
# %bb.0:
vbroadcastssa(%rip), %ymm0
vmovss  e+127968(%rip), %xmm1   # xmm1 = mem[0],zero,zero,zero
vmovss  e+127980(%rip), %xmm2   # xmm2 = mem[0],zero,zero,zero
vmovss  c+127984(%rip), %xmm4   # xmm4 = mem[0],zero,zero,zero
vmovss  e+127984(%rip), %xmm5   # xmm5 = mem[0],zero,zero,zero
vmovss  c+127988(%rip), %xmm8   # xmm8 = mem[0],zero,zero,zero
vmovss  e+127988(%rip), %xmm9   # xmm9 = mem[0],zero,zero,zero
vmovss  c+127992(%rip), %xmm11  # xmm11 = mem[0],zero,zero,zero
vmovss  e+127992(%rip), %xmm12  # xmm12 = mem[0],zero,zero,zero
xorl%eax, %eax
vmovups %ymm0, -56(%rsp)# 32-byte Spill
vmovss  c+127968(%rip), %xmm0   # xmm0 = mem[0],zero,zero,zero
vmovss  %xmm1, -64(%rsp)# 4-byte Spill
vmulss  %xmm4, %xmm5, %xmm3
vmulss  %xmm8, %xmm9, %xmm10
vmulss  %xmm11, %xmm12, %xmm13
vmovss  %xmm0, -60(%rsp)# 4-byte Spill
vmulss  %xmm0, %xmm1, %xmm0
vmovss  e+127972(%rip), %xmm1   # xmm1 = mem[0],zero,zero,zero
vmovss  %xmm0, -68(%rsp)# 4-byte Spill
vmovss  c+127972(%rip), %xmm0   # xmm0 = mem[0],zero,zero,zero
vmovss  %xmm1, -76(%rsp)# 4-byte Spill
vmovss  %xmm0, -72(%rsp)# 4-byte Spill
vmulss  %xmm0, %xmm1, %xmm0
vmovss  e+127976(%rip), %xmm1   # xmm1 = mem[0],zero,zero,zero
vmovss  %xmm0, -80(%rsp)# 4-byte Spill
vmovss  c+127976(%rip), %xmm0   # xmm0 = mem[0],zero,zero,zero
vmovss  %xmm1, -88(%rsp)# 4-byte Spill
vmovss  %xmm0, -84(%rsp)# 4-byte Spill
vmulss  %xmm0, %xmm1, %xmm0
vmovss  c+127980(%rip), %xmm1   # xmm1 = mem[0],zero,zero,zero
vmovss  %xmm0, -92(%rsp)# 4-byte Spill
vmulss  %xmm1, %xmm2, %xmm0
   vmovss  %xmm0, -96(%rsp)# 4-byte Spill
.p2align4, 0x90
.LBB0_1:# =>This Loop Header: 

[Bug middle-end/99409] New: s252 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99409

Bug ID: 99409
   Summary: s252 benchmark of TSVC is vectorized by clang and not
by gcc
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

typedef float real_t;
#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D];

void main()
{

//scalar and array expansion
//loop with ambiguous scalar temporary

real_t t, s;
for (int nl = 0; nl < iterations; nl++) {
t = (real_t) 0.;
for (int i = 0; i < LEN_1D; i++) {
s = b[i] * c[i];
a[i] = s + t;
t = s;
}
}

}

clang does:
main:   # @main
.cfi_startproc
# %bb.0:
xorl%eax, %eax
.p2align4, 0x90
.LBB0_1:# =>This Loop Header: Depth=1
# Child Loop BB0_2 Depth 2
vxorps  %xmm0, %xmm0, %xmm0
movq$-128000, %rcx  # imm = 0xFFFE0C00
.p2align4, 0x90
.LBB0_2:#   Parent Loop BB0_1 Depth=1
# =>  This Inner Loop Header: Depth=2
vmovups c+128000(%rcx), %ymm1
vmovups c+128032(%rcx), %ymm2
vmovups c+128064(%rcx), %ymm3
vmovups c+128096(%rcx), %ymm4
vmulps  b+128000(%rcx), %ymm1, %ymm1
vmulps  b+128032(%rcx), %ymm2, %ymm2
vmulps  b+128064(%rcx), %ymm3, %ymm3
vmulps  b+128096(%rcx), %ymm4, %ymm4
vperm2f128  $33, %ymm1, %ymm0, %ymm0 # ymm0 = ymm0[2,3],ymm1[0,1]
vperm2f128  $33, %ymm2, %ymm1, %ymm5 # ymm5 = ymm1[2,3],ymm2[0,1]
vperm2f128  $33, %ymm3, %ymm2, %ymm6 # ymm6 = ymm2[2,3],ymm3[0,1]
vperm2f128  $33, %ymm4, %ymm3, %ymm7 # ymm7 = ymm3[2,3],ymm4[0,1]
vshufps $3, %ymm1, %ymm0, %ymm0 # ymm0 = ymm0[3,0],ymm1[0,0],ymm0[7,4],ymm1[4,4]
vshufps $3, %ymm2, %ymm5, %ymm5 # ymm5 = ymm5[3,0],ymm2[0,0],ymm5[7,4],ymm2[4,4]
vshufps $3, %ymm3, %ymm6, %ymm6 # ymm6 = ymm6[3,0],ymm3[0,0],ymm6[7,4],ymm3[4,4]
vshufps $3, %ymm4, %ymm7, %ymm7 # ymm7 = ymm7[3,0],ymm4[0,0],ymm7[7,4],ymm4[4,4]
vshufps $152, %ymm1, %ymm0, %ymm0   # ymm0 = ymm0[0,2],ymm1[1,2],ymm0[4,6],ymm1[5,6]
vshufps $152, %ymm2, %ymm5, %ymm5   # ymm5 = ymm5[0,2],ymm2[1,2],ymm5[4,6],ymm2[5,6]
vshufps $152, %ymm3, %ymm6, %ymm6   # ymm6 = ymm6[0,2],ymm3[1,2],ymm6[4,6],ymm3[5,6]
vshufps $152, %ymm4, %ymm7, %ymm7   # ymm7 = ymm7[0,2],ymm4[1,2],ymm7[4,6],ymm4[5,6]
vaddps  %ymm0, %ymm1, %ymm0
vaddps  %ymm5, %ymm2, %ymm1
vaddps  %ymm6, %ymm3, %ymm2
vaddps  %ymm7, %ymm4, %ymm3
vmovups %ymm0, a+128000(%rcx)
vmovups %ymm1, a+128032(%rcx)
vmovups %ymm2, a+128064(%rcx)
vmovups %ymm3, a+128096(%rcx)
subq$-128, %rcx
vmovaps %ymm4, %ymm0
jne .LBB0_2
# %bb.3:#   in Loop: Header=BB0_1 Depth=1
incl%eax
cmpl$10, %eax   # imm = 0x186A0
jne .LBB0_1
# %bb.4:
vzeroupper
retq

s252.c:18:27: note:   worklist: examine stmt: _3 = s_11 + t_21;
s252.c:18:27: note:   vect_is_simple_use: operand _1 * _2, type of def: internal
s252.c:18:27: note:   mark relevant 5, live 0: s_11 = _1 * _2;
s252.c:18:27: note:   vect_is_simple_use: operand t_21 = PHI , type of def: unknown
s252.c:18:27: missed:   Unsupported pattern.
s252.c:20:22: missed:   not vectorized: unsupported use in stmt.
s252.c:18:27: missed:  unexpected pattern.

   [local count: 1052266996]:

   [local count: 1063004409]:
  # t_21 = PHI 
  # i_23 = PHI 
  # ivtmp_20 = PHI 
  _1 = b[i_23];
  _2 = c[i_23];
  s_11 = _1 * _2;
  _3 = s_11 + t_21;
  a[i_23] = _3;
  i_13 = i_23 + 1;
  ivtmp_19 = ivtmp_20 - 1;
  if (ivtmp_19 != 0)
goto ; [98.99%]
  else
goto ; [1.01%]

[Bug ipa/99122] [10/11 Regression] ICE in force_constant_size, at gimplify.c:733

2021-03-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99122

--- Comment #25 from Martin Jambor  ---
I have proposed a patch for the IPA-CP part on the mailing list:
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566333.html

[Bug middle-end/99411] New: s311 benchmark of TSVC is vectorized by clang better than by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99411

Bug ID: 99411
   Summary: s311 benchmark of TSVC is vectorized by clang better
than by gcc
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D];

int main()
{

//reductions
//sum reduction

real_t sum;
for (int nl = 0; nl < iterations*10; nl++) {
sum = (real_t)0.;
for (int i = 0; i < LEN_1D; i++) {
sum += a[i];
}
}
  return sum > 4;
}

We produce with -O2 -march=znver2

.L2:
movl $a, %eax
vxorps  %xmm0, %xmm0, %xmm0
.p2align 4
.p2align 3
.L3:
vaddps  (%rax), %ymm0, %ymm0
addq $32, %rax
cmpq $a+128000, %rax
jne .L3
vextractf128 $0x1, %ymm0, %xmm1
decl %edx
vaddps  %xmm0, %xmm1, %xmm1
vmovhlps %xmm1, %xmm1, %xmm0
vaddps  %xmm1, %xmm0, %xmm0
vshufps $85, %xmm0, %xmm0, %xmm1
vaddps  %xmm0, %xmm1, %xmm0
jne .L2
xorl %eax, %eax
vcomiss .LC0(%rip), %xmm0
seta %al
vzeroupper
ret
.cfi_endproc


clang does:
main:   # @main
.cfi_startproc
# %bb.0:
xorl%eax, %eax
.p2align4, 0x90
.LBB0_1:# =>This Loop Header: Depth=1
# Child Loop BB0_2 Depth 2
vxorps  %xmm0, %xmm0, %xmm0
movq$-128000, %rcx  # imm = 0xFFFE0C00
vxorps  %xmm1, %xmm1, %xmm1
vxorps  %xmm2, %xmm2, %xmm2
vxorps  %xmm3, %xmm3, %xmm3
.p2align4, 0x90
.LBB0_2:#   Parent Loop BB0_1 Depth=1
# =>  This Inner Loop Header: Depth=2
vaddps  a+128000(%rcx), %ymm0, %ymm0
vaddps  a+128032(%rcx), %ymm1, %ymm1
vaddps  a+128064(%rcx), %ymm2, %ymm2
vaddps  a+128096(%rcx), %ymm3, %ymm3
subq$-128, %rcx
jne .LBB0_2
# %bb.3:#   in Loop: Header=BB0_1 Depth=1
incl%eax
cmpl$100, %eax  # imm = 0xF4240
jne .LBB0_1
# %bb.4:
vaddps  %ymm0, %ymm1, %ymm0
xorl%eax, %eax
vaddps  %ymm0, %ymm2, %ymm0
vaddps  %ymm0, %ymm3, %ymm0
vextractf128$1, %ymm0, %xmm1
vaddps  %xmm1, %xmm0, %xmm0
vpermilpd   $1, %xmm0, %xmm1# xmm1 = xmm0[1,0]
vaddps  %xmm1, %xmm0, %xmm0
vmovshdup   %xmm0, %xmm1# xmm1 = xmm0[1,1,3,3]
vaddss  %xmm1, %xmm0, %xmm0
vucomiss.LCPI0_0(%rip), %xmm0
seta%al
vzeroupper
retq

On zen3 hardware the gcc version runs in 2.4s, while clang's runs in 0.8s.

[Bug c/99410] New: Nios II Error: branch offset out of range

2021-03-05 Thread giulio.benetti at benettiengineering dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99410

Bug ID: 99410
   Summary: Nios II Error: branch offset out of range
   Product: gcc
   Version: 7.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: giulio.benetti at benettiengineering dot com
  Target Milestone: ---

When building git package on Buildroot gcc throws:
'''
[ 98%] Building C object
src/CMakeFiles/bellesip.dir/antlr3c/src/antlr3treeparser.c.o
/tmp/ccDtjRfo.s: Assembler messages:
/tmp/ccDtjRfo.s:210798: Error: branch offset out of range

/tmp/ccDtjRfo.s: Fatal error: branch relaxation failed
'''

To reproduce it:

# git clone git://git.busybox.net/buildroot
# wget https://git.busybox.net/buildroot-test/tree/utils/br-reproduce-build

- modify BASE_GIT=... with your buildroot path in br-reproduce-build then:
# chmod a+x br-reproduce-build
# ./br-reproduce-build 71f26fd81db8e9b19b3f18f3f3cefd9c768f094f

The only way I've found to get a successful build is to turn off
optimization by overriding CFLAGS with -O0.

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #35 from Richard Biener  ---
(In reply to Richard Biener from comment #33)
> Created attachment 50308 [details]
> patch
> 
> I am testing the following.

It FAILs

FAIL: gcc.target/i386/avx512dq-concatv2di-1.c scan-assembler
vpinsrq[^\\n\\r]*\\
\\\$1[^\\n\\r]*%[re]si[^\\n\\r]*%xmm18[^\\n\\r]*%xmm19
FAIL: gcc.target/i386/avx512dq-concatv2di-1.c scan-assembler
vpinsrq[^\\n\\r]*\$1[^\\n\\r]*%rsi[^\\n\\r]*%xmm16[^\\n\\r]*%xmm17
FAIL: gcc.target/i386/avx512vl-concatv2di-1.c scan-assembler
vmovhps[^\\n\\r]*%[re]si[^\\n\\r]*%xmm18[^\\n\\r]*%xmm19

I'll see how to update those next week.

[Bug middle-end/99411] s311 and s31111 benchmark of TSVC is vectorized by clang better than by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99411

Jan Hubicka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|s311 benchmark of TSVC is   |s311 and s31111 benchmark
                   |vectorized by clang better  |of TSVC is vectorized by
                   |than by gcc                 |clang better than by gcc

--- Comment #1 from Jan Hubicka  ---
I think this is the same case:

typedef float real_t;

#define iterations 100
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D];
real_t test(real_t* A){
  real_t s = (real_t)0.0;
  for (int i = 0; i < 4; i++)
    s += A[i];
  return s;
}

int main()
{

  //reductions
  //sum reduction
  real_t sum;
  for (int nl = 0; nl < 2000*iterations; nl++) {
    sum = (real_t)0.;
    sum += test(a);
    sum += test(&a[4]);
    sum += test(&a[8]);
    sum += test(&a[12]);
    sum += test(&a[16]);
    sum += test(&a[20]);
    sum += test(&a[24]);
    sum += test(&a[28]);
  }
  return sum>4;
}

[Bug middle-end/99411] s311, s312 and s31111 benchmark of TSVC is vectorized by clang better than by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99411

Jan Hubicka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|s311 and s31111 benchmark   |s311, s312 and s31111
                   |of TSVC is vectorized by    |benchmark of TSVC is
                   |clang better than by gcc    |vectorized by clang better
                   |                            |than by gcc

--- Comment #2 from Jan Hubicka  ---
another one:
// %3.1
typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D];

int main ()
{

  //reductions
  //product reduction

  real_t prod;
  for (int nl = 0; nl < 10*iterations; nl++) {
    prod = (real_t)1.;
    for (int i = 0; i < LEN_1D; i++) {
      prod *= a[i];
    }
  }
  return prod > 0;
}
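
For reference, a vectorized form of this product reduction can be sketched in scalar C as eight per-lane partial products that are multiplied together after the loop. The name prod_lanes and the LANES macro are invented for this sketch; it does not claim to show what either compiler actually emits. As with the sum case, reordering the multiplications is only valid when floating-point reassociation is allowed (for example with -ffast-math).

```c
#include <stddef.h>

#define LANES 8  /* floats per 256-bit vector; an assumption for AVX2 */

/* Hypothetical helper: multiplies n floats together; n is assumed
   to be a multiple of LANES. */
float prod_lanes(const float *x, size_t n)
{
    float p[LANES];
    for (int l = 0; l < LANES; l++)
        p[l] = 1.0f;            /* identity element for multiplication */

    /* Each lane keeps its own running product. */
    for (size_t i = 0; i < n; i += LANES)
        for (int l = 0; l < LANES; l++)
            p[l] *= x[i + l];

    /* Horizontal step: combine the per-lane partial products. */
    float r = 1.0f;
    for (int l = 0; l < LANES; l++)
        r *= p[l];
    return r;
}
```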
