[Bug middle-end/110018] Missing vectorizable_conversion(unsigned char -> double) for BB vectorizer

2023-05-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110018

--- Comment #2 from Hongtao.liu  ---

> Currently, when modifier is NONE, vectorizable_conversion doesn't try any
> intermediate type; it can be extended similarly to WIDEN.
> 
After running the testcase under gdb: the modifier is not NONE. It is WIDEN, from
V8QI to V4DF, and that is what fails.

[Bug middle-end/110018] Missing vectorizable_conversion(unsigned char -> double) for BB vectorizer

2023-05-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110018

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2023-05-29

--- Comment #1 from Andrew Pinski  ---
Confirmed.

[Bug middle-end/110018] New: Missing vectorizable_conversion(unsigned char -> double) for BB vectorizer

2023-05-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110018

Bug ID: 110018
   Summary: Missing vectorizable_conversion(unsigned char ->
double) for BB vectorizer
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: crazylht at gmail dot com
  Target Milestone: ---

While looking at PR109812, I noticed that the BB vectorizer misses
vectorizable_conversion when the target has no direct optab for unsigned char to
double. It could still be vectorized via unsigned char -> short/int/long
long -> double when vectorizable_conversion succeeds for any of the intermediate
types.

Currently, when modifier is NONE, vectorizable_conversion doesn't try any
intermediate type; it can be extended similarly to WIDEN.

    case NONE:
      if (code != FIX_TRUNC_EXPR
          && code != FLOAT_EXPR
          && !CONVERT_EXPR_CODE_P (code))
        return false;
      if (supportable_convert_operation (code, vectype_out, vectype_in,
                                         &code1))
        break;
      /* FALLTHRU */

void
foo (double* __restrict a, unsigned char* b)
{
  a[0] = b[0];
  a[1] = b[1];
  a[2] = b[2];
  a[3] = b[3];
  a[4] = b[4];
  a[5] = b[5];
  a[6] = b[6];
  a[7] = b[7];
}

missed:   conversion not supported by target.
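
The idea of going through an intermediate type can be sketched at the source
level. This is only an illustration and not what the vectorizer emits:
foo_direct, foo_two_step, the tmp buffer and two_step_matches_direct are
hypothetical names, and int is just one of the candidate intermediate types
mentioned above.

```cpp
#include <cassert>

// Direct form, as in the testcase: one unsigned char -> double conversion
// per element.
void foo_direct(double* __restrict a, const unsigned char* b)
{
    for (int i = 0; i < 8; ++i)
        a[i] = b[i];
}

// Two-step form (assumption for illustration): widen unsigned char -> int
// first, then convert int -> double.
void foo_two_step(double* __restrict a, const unsigned char* b)
{
    int tmp[8];
    for (int i = 0; i < 8; ++i)
        tmp[i] = b[i];
    for (int i = 0; i < 8; ++i)
        a[i] = tmp[i];
}

// Checks that both forms produce the same doubles for one sample input.
bool two_step_matches_direct()
{
    const unsigned char b[8] = {0, 1, 2, 127, 128, 200, 254, 255};
    double direct[8];
    double two_step[8];
    foo_direct(direct, b);
    foo_two_step(two_step, b);
    for (int i = 0; i < 8; ++i)
        if (direct[i] != two_step[i])
            return false;
    return true;
}
```

Every unsigned char value is exactly representable in int and in double, so the
extra step does not change the result.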

[Bug libstdc++/110016] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock

2023-05-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

--- Comment #9 from Andrew Pinski  ---
So I think this is a bug in your code:

Inside substrate::threadPool_t::finish,
we have:

finished = true;
haveWork.notify_all();

If I change it to:
{
  std::lock_guard lock{workMutex};
  finished = true;
  haveWork.notify_all();
}

Then I don't get a deadlock at all.
As I mentioned, I did think there was a race condition.
Here is what I think happened:
Thread 26                               Thread 1
checks finished, still false            sets finished to be true
calls wait                              calls notify_all
...                                     notify_all happens
finally gets into futex_wait syscall

And then thread26 never got the notification.

With my change the check for finished has to wait till thread1 lets go of the
mutex (and the other way around).
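
A minimal, self-contained sketch of the suggested pattern follows. The names
workMutex, haveWork and finished come from the snippet above; ToyPool and
run_toy_pool are hypothetical and are not the substrate code. The flag is set
while holding the same mutex the waiter uses, so the predicate check and the
notification cannot interleave as in the table above.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical, stripped-down stand-in for the thread pool in the report.
struct ToyPool {
    std::mutex workMutex;
    std::condition_variable haveWork;
    std::atomic<bool> finished{false};

    // Waiter: the predicate is evaluated with workMutex held.
    void waitWork()
    {
        std::unique_lock<std::mutex> lock{workMutex};
        haveWork.wait(lock, [this] { return finished.load(); });
    }

    // Notifier: take workMutex before changing the flag, so the change
    // cannot slip in between the waiter's check and its actual wait.
    void finish()
    {
        std::lock_guard<std::mutex> lock{workMutex};
        finished = true;
        haveWork.notify_all();
    }
};

// Starts one waiter, finishes the pool, and reports whether the waiter
// returned with the flag set.
bool run_toy_pool()
{
    ToyPool pool;
    std::thread worker{[&pool] { pool.waitWork(); }};
    pool.finish();
    worker.join();
    assert(pool.finished.load());
    return pool.finished.load();
}
```

Making the flag atomic is not enough on its own here: the store still has to
be ordered against the waiter's check-then-wait sequence, which is what
holding the mutex provides.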

[Bug libstdc++/110016] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock

2023-05-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

--- Comment #8 from Andrew Pinski  ---
Here is the backtrace in that case:
(gdb) bt
#0  0xf6acd22c in futex_wait_cancelable (private=,
expected=0, futex_word=0xf3103c64) at
../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0xf3103c08,
cond=0xf3103c38) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0xf3103c38, mutex=0xf3103c08) at
pthread_cond_wait.c:655
#3  0xf6efc2e4 in __tsan::call_pthread_cancel_with_cleanup
(fn=fn@entry=0xf6eafd00 <_FUN(void*)>, cleanup=cleanup@entry=0xf6eb5364
<_FUN(void*)>, arg=arg@entry=0xe5f3dff0) at
/home/ubuntu/src/upstream-gcc-aarch64/gcc/libsanitizer/tsan/tsan_platform_linux.cpp:493
#4  0xf6ed4194 in cond_wait<__interceptor_pthread_cond_wait(void*,
void*):: > (m=0xf3103c08, c=0xf3103c38, fn=...,
si=0xe5f3dfd0, pc=281474824487080, thr=) at
/home/ubuntu/src/upstream-gcc-aarch64/gcc/libsanitizer/tsan/tsan_interceptors_posix.cpp:1259
#5  __interceptor_pthread_cond_wait (c=, m=0xf3103c08) at
/home/ubuntu/src/upstream-gcc-aarch64/gcc/libsanitizer/tsan/tsan_interceptors_posix.cpp:1270
#6  0x004045f4 in
std::condition_variable::wait::waitWork()::{lambda()#1}>(std::unique_lock&,
substrate::threadPool_t::waitWork()::{lambda()#1})
(this=this@entry=0xf3103c38, __lock=..., __p=__p@entry=...)
at /home/ubuntu/upstream-gcc/include/c++/14.0.0/condition_variable:102
#7  0x004064e0 in substrate::threadPool_t::waitWork()
(this=this@entry=0xf3103c00) at t.cc:282
#8  0x004081e4 in substrate::threadPool_t::workerThread(unsigned long) (this=this@entry=0xf3103c00,
processor=) at t.cc:310
#9  0x00408234 in substrate::threadPool_t::threadPool_t(bool
(*)())::{lambda(unsigned long)#1}::operator()(unsigned long) const
(currentProcessor=,
__closure=) at t.cc:337
#10 0x00408294 in std::__invoke_impl::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned
long>(std::__invoke_other, substrate::threadPool_t::threadPool_t(bool
(*)())::{lambda(unsigned long)#1}&&, unsigned long&&) (__f=...)
at /home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/invoke.h:60
#11 0x004082e0 in std::__invoke::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned
long>(std::__invoke_result&&, (substrate::threadPool_t::threadPool_t(bool (*)())::{lambda(unsigned long)#1}&&)...) (__fn=...)
at /home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/invoke.h:90
#12 0x004084d4 in
std::thread::_Invoker::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned long>
>::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>)
(this=this@entry=0xf56004a8) at
/home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/std_thread.h:291
#13 0x00408504 in
std::thread::_Invoker::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned long>
>::operator()() (this=this@entry=0xf56004a8) at
/home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/std_thread.h:295
#14 0x00408534 in
std::thread::_State_impl::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned long> >
>::_M_run() (this=0xf56004a0) at
/home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/std_thread.h:244
#15 0xf6ced74c in std::execute_native_thread_routine
(__p=0xf56004a0) at
/home/ubuntu/src/upstream-gcc-aarch64/gcc/libstdc++-v3/src/c++11/thread.cc:104
#16 0xf6eaf63c in __tsan_thread_start_func (arg=0xf6f0) at
/home/ubuntu/src/upstream-gcc-aarch64/gcc/libsanitizer/tsan/tsan_interceptors_posix.cpp:1038
#17 0xf6ac7088 in start_thread (arg=0xf62f) at
pthread_create.c:463
#18 0xf6a304ec in thread_start () at
../sysdeps/unix/sysv/linux/aarch64/clone.S:78

[Bug target/109987] ICE in in rs6000_emit_le_vsx_store on ppc64le with -Ofast -mno-power8-vector

2023-05-28 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109987

Kewen Lin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Keywords||ice-on-valid-code
 CC||bergner at gcc dot gnu.org,
   ||linkw at gcc dot gnu.org,
   ||segher at gcc dot gnu.org
   Last reconfirmed||2023-05-29

--- Comment #1 from Kewen Lin  ---
Confirmed. It's similar to the issue found in PR103627 #c4. Previously I made a
patch to make the MMA feature require power9-vector, see
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587310.html. But Segher
considered power9-vector a workaround option that we should make go away, so it
is just guarded under vsx instead; see his comment at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589303.html.

Unfortunately this issue is triggered by another workaround option,
-mno-power8-vector. I think we probably need to make the removal of
-mpower{8,9}-vector a high priority.

[Bug libstdc++/110016] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock

2023-05-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

--- Comment #7 from Andrew Pinski  ---
I can reproduce the failure on aarch64-linux-gnu on the trunk with `-std=c++17
-pthread -O2 -fsanitize=thread -fno-inline`, so your theory that inlining is
causing the issue is incorrect.

[Bug libstdc++/110016] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock

2023-05-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

--- Comment #6 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #5)
> Did you try on some other target than x86 for gcc?

To answer my own question: it fails on aarch64-linux-gnu also. So this
makes it more likely a library issue (maybe glibc ...)


Thread 26 (Thread 0xe5f3ea10 (LWP 1003246)):
#0  0xf6acd22c in futex_wait_cancelable (private=,
expected=0, futex_word=0xf3103c64) at
../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0xf3103c08,
cond=0xf3103c38) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0xf3103c38, mutex=0xf3103c08) at
pthread_cond_wait.c:655
#3  0xf6efc2e4 in __tsan::call_pthread_cancel_with_cleanup
(fn=fn@entry=0xf6eafd00 <_FUN(void*)>, cleanup=cleanup@entry=0xf6eb5364
<_FUN(void*)>, arg=arg@entry=0xe5f3e0d0) at
/home/ubuntu/src/upstream-gcc-aarch64/gcc/libsanitizer/tsan/tsan_platform_linux.cpp:493
#4  0xf6ed4194 in cond_wait<__interceptor_pthread_cond_wait(void*,
void*):: > (m=0xf3103c08, c=0xf3103c38, fn=...,
si=0xe5f3e0b0, pc=281474824487080, thr=) at
/home/ubuntu/src/upstream-gcc-aarch64/gcc/libsanitizer/tsan/tsan_interceptors_posix.cpp:1259
#5  __interceptor_pthread_cond_wait (c=, m=0xf3103c08) at
/home/ubuntu/src/upstream-gcc-aarch64/gcc/libsanitizer/tsan/tsan_interceptors_posix.cpp:1270
#6  0x00403850 in
std::condition_variable::wait::waitWork()::{lambda()#1}>(std::unique_lock&,
substrate::threadPool_t::waitWork()::{lambda()#1}) (__p=...,
__lock=..., this=0xf3103c38)
at /home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/atomic_base.h:503
#7  substrate::threadPool_t::waitWork() (this=0xf3103c00) at
t.cc:287
#8  substrate::threadPool_t::workerThread(unsigned long)
(processor=, this=0xf3103c00) at t.cc:312
#9  substrate::threadPool_t::threadPool_t(bool
(*)())::{lambda(unsigned long)#1}::operator()(unsigned long) const
(currentProcessor=, __closure=) at t.cc:338
#10 std::__invoke_impl::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned
long>(std::__invoke_other, substrate::threadPool_t::threadPool_t(bool
(*)())::{lambda(unsigned long)#1}&&, unsigned long&&) (__f=...)
at /home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/invoke.h:61
#11 std::__invoke::threadPool_t(bool
(*)())::{lambda(unsigned long)#1}, unsigned long>(std::__invoke_result&&,
(substrate::threadPool_t::threadPool_t(bool (*)())::{lambda(unsigned
long)#1}&&)...) (__fn=...) at
/home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/invoke.h:96
#12 std::thread::_Invoker::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned long>
>::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) (this=) at
/home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/std_thread.h:292
#13 std::thread::_Invoker::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned long>
>::operator()() (this=) at
/home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/std_thread.h:299
#14
std::thread::_State_impl::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned long> >
>::_M_run() (this=) at
/home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/std_thread.h:244
#15 0xf6ced74c in std::execute_native_thread_routine
(__p=0xf56004a0) at
/home/ubuntu/src/upstream-gcc-aarch64/gcc/libstdc++-v3/src/c++11/thread.cc:104
#16 0xf6eaf63c in __tsan_thread_start_func (arg=0xf730) at
/home/ubuntu/src/upstream-gcc-aarch64/gcc/libsanitizer/tsan/tsan_interceptors_posix.cpp:1038
#17 0xf6ac7088 in start_thread (arg=0xf66f) at
pthread_create.c:463
#18 0xf6a304ec in thread_start () at
../sysdeps/unix/sysv/linux/aarch64/clone.S:78

[Bug libstdc++/110016] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock

2023-05-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

--- Comment #4 from Andrew Pinski  ---
Did you try on some other target than x86 for gcc?

--- Comment #5 from Andrew Pinski  ---
Did you try on some other target than x86 for gcc?

[Bug libstdc++/110016] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock

2023-05-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

--- Comment #3 from Andrew Pinski  ---
When you say you compiled with clang, did you use libstdc++ or libc++?

Did you try adding gnu::always_inline attribute on the lambda to see if it
fails there too?

Again the inlining should not have an effect here except if there is some kind
of race condition happening.

[Bug libstdc++/110016] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock

2023-05-28 Thread amy at amyspark dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

--- Comment #2 from Amyspark  ---
We've seen failures both in Windows (all ABI flavors) and macOS only when
compiled with GCC -- AppleClang, Clang, and MSVC (in its three flavors) all
work without issue. So I'm doubtful it's a logical issue, especially given that
preventing the compiler from inlining void Predicate::operator() into
std::condition_variable::wait seems to be enough to work around it.

Re 98033: it looks somewhat similar, though as explained earlier, this may also
affect non-Linux platforms, i.e. any target where GCC relies on (win)pthreads.

[Bug tree-optimization/94892] (x >> 31) + 1 not getting narrowed to compare

2023-05-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94892

--- Comment #7 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #6)
> Two things we should simplify:
>   _4 = _3 >> 31;
>   _4 != -1
> 
> Into:
>   _3 >= 0 (if _3 is signed, otherwise false)
> 
> (this will solve f0)

See bug 85234 comment #5 for how to handle that one (g and g2).
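
The signed case of the quoted simplification can be checked with a small
sketch. The function names are hypothetical, and the code assumes a 32-bit int
with an arithmetic right shift of negative values (what GCC does, and what
C++20 guarantees).

```cpp
#include <cassert>
#include <cstdint>

// Form before the simplification: shift the sign bit down and compare.
bool shifted_form(std::int32_t x) { return (x >> 31) != -1; }

// Form after the simplification: a plain sign test.
bool compare_form(std::int32_t x) { return x >= 0; }

// Compares both forms on a few boundary values.
bool forms_agree()
{
    const std::int32_t samples[] = {INT32_MIN, -2, -1, 0, 1, 2, INT32_MAX};
    for (std::int32_t x : samples)
        if (shifted_form(x) != compare_form(x))
            return false;
    return true;
}
```

With an arithmetic shift, x >> 31 is -1 for negative x and 0 otherwise, so the
comparison against -1 is just a test of the sign.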

[Bug tree-optimization/85234] missed optimisation opportunity for (x >> CST)!=0 is not optimized to (((unsigned)x) >= (1<

2023-05-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85234

--- Comment #5 from Andrew Pinski  ---
Here is the testcase for the constants besides 0:
```
#define N 3
#define unsigned int
#define cmp ==
#define M 0xf000u
_Bool f(unsigned x, int t)
{
return (x << N) cmp (M << N);
}
_Bool f1(unsigned x, int t)
{
return ((x^M) & (-1u>>N)) cmp 0;
}

_Bool f2(unsigned x, int t)
{
return (x & (-1u>>N)) cmp (M & (-1u>>N));
}

_Bool g(unsigned x, int t)
{
return (x >> N) cmp M;
}
_Bool g2(unsigned x, int t)
{
_Bool  = 0;
if ()
return 0;
return (x & (-1u<= (1u<

[Bug libgcc/110017] Crossback Compilation for multilib fails on latest ubuntu due to -mx32 being disabled by the linux kernel

2023-05-28 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110017

--- Comment #4 from cqwrteur  ---
Created attachment 55182
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55182&action=edit
Here is the build script (need to install a x86_64-w64-mingw32 cross compiler
first)

[Bug libgcc/110017] Crossback Compilation for multilib fails on latest ubuntu due to -mx32 being disabled by the linux kernel

2023-05-28 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110017

--- Comment #3 from cqwrteur  ---
(In reply to Andrew Pinski from comment #2)
> How are you configuring GCC?

gcc/configure --disable-nls --disable-werror --enable-languages=c,c++
--enable-multilib --with-multilib-list=m64,m32,mx32
--with-gxx-libcxx-include-dir=$PREFIXTARGET/include/c++/v1 --prefix=$PREFIX
--build=x86_64-pc-linux-gnu --host=x86_64-w64-mingw32
--target=x86_64-pc-linux-gnu --disable-bootstrap --disable-libstdcxx-verbose
--with-libstdcxx-eh-pool-obj-count=0 --enable-libstdcxx-backtrace

[Bug libgcc/110017] Crossback Compilation for multilib fails on latest ubuntu due to -mx32 being disabled by the linux kernel

2023-05-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110017

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-05-28
Version|14.0|unknown
 Status|UNCONFIRMED |WAITING

--- Comment #2 from Andrew Pinski  ---
How are you configuring GCC?

[Bug libgcc/110017] Crossback Compilation for multilib fails on latest ubuntu due to -mx32 being disabled by the linux kernel

2023-05-28 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110017

--- Comment #1 from cqwrteur  ---
(In reply to cqwrteur from comment #0)
> I attempted crossback compilation for GCC, where the compiler is built on
> Linux, runs on Windows, and is targeted for Linux again. However, the build
> system of libgcc includes a sanity test to detect the functionality of the
> compiler, which prevents the build for the -mx32 option and disables m32.
> 
> Moreover, during crossback compilation, GCC specifically looks for the "cc"
> command instead of just "gcc," even in cases where it doesn't exist.
> 
> Is there a way to remove or bypass the sanity test restriction for crossback
> compilation in this scenario?

It does not test the compiler's functionality. It detects whether an -mx32
program can run, which of course it cannot, because the Linux kernel has
disabled that.

[Bug libgcc/110017] New: Crossback Compilation for multilib fails on latest ubuntu due to -mx32 being disabled by the linux kernel

2023-05-28 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110017

Bug ID: 110017
   Summary: Crossback Compilation for multilib fails on latest
ubuntu due to -mx32 being disabled by the linux kernel
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgcc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: unlvsur at live dot com
  Target Milestone: ---

I attempted crossback compilation for GCC, where the compiler is built on
Linux, runs on Windows, and is targeted for Linux again. However, the build
system of libgcc includes a sanity test to detect the functionality of the
compiler, which prevents the build for the -mx32 option and disables m32.

Moreover, during crossback compilation, GCC specifically looks for the "cc"
command instead of just "gcc," even in cases where it doesn't exist.

Is there a way to remove or bypass the sanity test restriction for crossback
compilation in this scenario?

[Bug ipa/109914] --suggest-attribute=pure misdiagnoses static functions

2023-05-28 Thread bruno at clisp dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109914

--- Comment #3 from Bruno Haible  ---
(In reply to Jan Hubicka from comment #2)
> The reason why gcc warns is that it is unable to prove that the function is
> always finite. This means that it can not auto-detect pure attribute since
> optimizing the call out may turn infinite program to finite one. 
> So adding the attribute would still help compiler to know that the loops are
> indeed finite.

Thanks for explaining. So, the warning asks the developer not only to add an
__attribute__((__pure__)) marker, but also to verify that the function
terminates. In this case, it does, but it took me a minute of reflection to
convince myself.

For what purpose shall the developer make this effort? The documentation
https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Common-Function-Attributes.html
says that it's to allow the compiler to do common subexpression elimination.
But in this case, the compiler could easily find out that it cannot do common
subexpression elimination anyway, because:
  - The only caller of this function (have_xattr) is file_has_acl.
  - In this function, there are three calls to have_xattr.
  - Each of them is executed only at most once. Control flow analysis shows
this.
  - Each of them has different argument lists: The first argument is a string
literal in each case, namely "system.nfs4_acl", "system.posix_acl_access",
"system.posix_acl_default" respectively.
So, there is no possibility for common subexpression elimination here, even if
the function was marked "pure".

Therefore it is pointless to suggest to the developer that it would be a gain
to mark the function as "pure" and that it is worth spending brain cycles on
that.
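
To make the argument concrete, here is a small sketch of where a pure marker
can and cannot pay off through common subexpression elimination. has_name is a
hypothetical stand-in for have_xattr (it is not the gnulib code), and
count_known and twice_same are likewise made up for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for have_xattr: reads memory, has no side effects,
// and terminates because the loop is bounded by n.
[[gnu::pure]] static bool has_name(const char* wanted,
                                   const char* const* names, int n)
{
    for (int i = 0; i < n; ++i)
        if (std::strcmp(names[i], wanted) == 0)
            return true;
    return false;
}

// Three calls with three different literals, as in file_has_acl: there is
// no repeated call for the compiler to merge.
int count_known(const char* const* names, int n)
{
    int hits = 0;
    if (has_name("system.nfs4_acl", names, n)) ++hits;
    if (has_name("system.posix_acl_access", names, n)) ++hits;
    if (has_name("system.posix_acl_default", names, n)) ++hits;
    return hits;
}

// Two calls with identical arguments: this is the shape where the pure
// attribute may let the compiler evaluate the call only once.
int twice_same(const char* const* names, int n)
{
    return has_name("system.nfs4_acl", names, n)
         + has_name("system.nfs4_acl", names, n);
}
```

Whether a given GCC version actually merges the two calls in twice_same
depends on optimization level and inlining decisions; the sketch only shows
the two call shapes being contrasted.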

[Bug tree-optimization/85234] missed optimisation opportunity for (x >> CST)!=0 is not optimized to (((unsigned)x) >= (1<

2023-05-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85234

--- Comment #4 from Andrew Pinski  ---
Here are some testcases dealing with this and showing what still needs to be
done:
```
#define N 3
#define unsigned int
#define cmp !=
_Bool rshift(unsigned x, int t)
{
return (x << N) cmp 0;
}
_Bool rshift1(unsigned x, int t)
{
return (x & (-1u>>N)) cmp 0;
}
_Bool lshift(unsigned x, int t)
{
return (x >> N) cmp 0;
}
_Bool lshift1(unsigned x, int t)
{
return (x & (-1u<= (1u<

[Bug libstdc++/110016] [12/13/14] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock

2023-05-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

Andrew Pinski  changed:

   What|Removed |Added

  Component|c++ |libstdc++

--- Comment #1 from Andrew Pinski  ---
I doubt this is a code generation issue but rather either a libstdc++ issue or
a problem in the code itself.

Inlining if anything might expose a race condition that was in the code more
often than not.

[Bug c/110007] Implement support for Clang’s __builtin_unpredictable()

2023-05-28 Thread richard.yao at alumni dot stonybrook.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110007

--- Comment #8 from Richard Yao  ---
(In reply to Alexander Monakov from comment #6)
> Are you sure the branch is unpredictable in your micro-benchmark? If you
> have repeated runs you'll train the predictors.

(In reply to Jan Hubicka from comment #7)
> Also note that branch predicted with 50% outcome is not necessarily
> unpredictable for example in this:
> 
> for (int i = 0; i < 1; i++)
>   if (i&1)
>  
> 
> I would expect branch predictor to work this out on modern systems.
> So having explicit flag in branch_probability that given probability is hard
> for CPU to predict would make sense and I was thinking we may try to get
> this info from auto-fdo eventually too.

Good point. I had reused an existing micro-benchmark, but it is using libc's
srand(), which is known for not having great quality RNG. It is quite possible
that the last branch really is predictable because of that. Having only 1
unpredictable branch is not that terrible, so I probably will defer looking
into this further to a future date.

(In reply to Alexander Monakov from comment #6)
> Implementing a __builtin_branchless_select would address such needs more
> directly. There were similar requests in the past, two of them related to
> qsort and bsearch, unsurprisingly: PR 93165, PR 97734, PR 106804.

As a developer that works on a project that supports GCC and Clang equally, but
must support older versions of GCC longer, I would like to see both GCC and
Clang adopt each others' builtins. That way, I need to implement fewer
compatibility shims to support both compilers and what compatibility shims I do
need can be dropped sooner (maybe after 10 years).

I am not against the new builtin, but I would like to also have
__builtin_unpredictable(). It would be consistent with the existing
likely/unlikely macros that my project uses, which should mean other developers
will not have to learn the specifics of how predication works to read code
using it, since they can just treat it as a compiler hint and read things as if
it were not there unless they have some reason to reason more deeply about
things.

I could just do a macro around __builtin_expect_with_probability(), but I
greatly prefer __builtin_unpredictable() since people reading the code do not
need to decipher the argument list to understand what it does since the name is
self documenting.
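
The kind of compatibility shim being described might look like the sketch
below. UNPREDICTABLE and count_odd are hypothetical names. Mapping the hint to
a probability of 0.5 is an assumption about what the fallback should express,
not an equivalent of Clang's builtin.

```cpp
#include <cassert>

// Prefer Clang's builtin where it exists, fall back to
// __builtin_expect_with_probability, and otherwise expand to the plain
// condition. The nested checks are skipped entirely on compilers that
// lack __has_builtin.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_unpredictable)
#    define UNPREDICTABLE(x) __builtin_unpredictable(!!(x))
#  elif __has_builtin(__builtin_expect_with_probability)
#    define UNPREDICTABLE(x) __builtin_expect_with_probability(!!(x), 1, 0.5)
#  endif
#endif
#ifndef UNPREDICTABLE
#  define UNPREDICTABLE(x) (!!(x))
#endif

// Counts odd values; the branch depends on the data, so it is a plausible
// place for such a hint.
int count_odd(const int* values, int n)
{
    int odd = 0;
    for (int i = 0; i < n; ++i)
        if (UNPREDICTABLE(values[i] & 1))
            ++odd;
    return odd;
}
```

In every configuration the macro evaluates to the truth value of its argument,
so the hint can be read as if it were not there.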

[Bug c++/110016] New: [12/13/14] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock

2023-05-28 Thread amy at amyspark dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

Bug ID: 110016
   Summary: [12/13/14] Possible miscodegen when inlining
std::condition_variable::wait predicate causes
deadlock
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amy at amyspark dot me
  Target Milestone: ---

Created attachment 55181
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55181&action=edit
Minimum test case to reproduce the deadlock

Hi all,

This is to report a possible codegen issue when inlining a lambda predicate for
std::condition_variable::wait. We've verified this to happen with the following
versions:

- g++-8 (Homebrew GCC 8.5.0) 8.5.0
- g++.exe (Rev6, Built by MSYS2 project) 13.1.0 (both UCRT64 and MINGW64)
- g++
(Compiler-Explorer-Build-gcc-4579954f25020f0b39361ab6ec0c8876fda27041-binutils-2.40)
14.0.0 20230522 (experimental)

The deadlock seems to happen with 100% certainty on GCC 12.2.1 if one enables
ThreadSanitizer; otherwise it happens sporadically in CI.

I packaged a reduced version of the test suite:
https://godbolt.org/z/fj8rnrbo7, a copy of which you'll find attached to this
report. Build with `-std=c++17 -pthread -O2 -fsanitize=thread`.

In all cases, once the deadlock is hit (wait for ~3 seconds under GDB) the
"finished" atomic boolean and the "workQueue" are correctly flagged as true and
empty, respectively; however, the thread will still wait for the condition
variable indefinitely. This can be easily worked around by blocking the
inlining, e.g. by turning the lambda into a std::bind instance.

The complete code of the library where we reproduced this is available here:
https://github.com/bad-alloc-heavy-industries/substrate/tree/375db811308ad7414771dbde9af4efa7aa393ca8.
You can build it with `meson setup build -Dcpp_std=c++17 -Db_sanitize=thread`
and run the test with `meson test -C build`.

[Bug tree-optimization/109985] __builtin_prefetch ignored by GCC 12/13

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109985

--- Comment #6 from Jan Hubicka  ---
Created attachment 55180
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55180&action=edit
untested patch

It turns out that modref was written for memory loads/stores only and
side-effect discovery was retrofitted later, and I forgot to revisit the code
handling CONST and NOVOPS together. There are quite a few places where we cannot
short-circuit on NOVOPS and must be sure we merge in the side effects and
determinism flags.

[Bug tree-optimization/109985] __builtin_prefetch ignored by GCC 12/13

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109985

Jan Hubicka  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |hubicka at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #5 from Jan Hubicka  ---
Hmm, this is slippery.  So novops tells gcc that the function has no memory
side effects and in turn we optimize out the call?

I think we need to handle novops as having side effects.

[Bug ipa/109914] --suggest-attribute=pure misdiagnoses static functions

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109914

--- Comment #2 from Jan Hubicka  ---
The reason why gcc warns is that it is unable to prove that the function is
always finite. This means that it can not auto-detect pure attribute since
optimizing the call out may turn infinite program to finite one. 
So adding the attribute would still help compiler to know that the loops are
indeed finite.

[Bug middle-end/79704] [meta-bug] Phoronix Test Suite compiler performance issues

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79704

Jan Hubicka  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #2 from Jan Hubicka  ---
Note that I tried to reproduce from
https://www.phoronix.com/review/gcc13-clang16-raptorlake/3
also tsvc (on zen3 machine) and there performance seems OK (GCC does 2168417
nodes/s and clang 2159913)

liquid-dsp fails for me:

pts/liquid-dsp-1.0.0:
Test Installation 1 of 1
1 File Needed [0.74 MB / 1 Minute]
File Found: liquid-dsp-20210131.tar.xz [0.74MB]
Approximate Install Size: 68 MB
Estimated Install Time: 6 Seconds
Installing Test @ 20:09:38
The installer exited with a non-zero exit status.
ERROR:
/usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld:
cannot find -lliquid: No such file or directory
Installing the package 'file' might fix this error.
LOG:
~/.phoronix-test-suite/installed-tests/pts/liquid-dsp-1.0.0/install-failed.log

It seems the build script is broken and actually links to the system-wide
liquid-dsp.

[Bug middle-end/110015] openjpeg is slower when built with gcc13 compared to clang16

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110015

--- Comment #1 from Jan Hubicka  ---
opj_t1_enc_refpass is not inlined due to large function growth and some others
due to max-inline-insns-auto.  With inlining forced I get profile:

  87.35%   opj_t1_cblk_encode_processor
   6.22%  opj_dwt_encode_and_deinterleave_v.lto_priv.0
   1.80%  opj_mqc_byteout
   1.50%  opj_dwt_encode_and_deinterleave_h_one_row.lto_priv.0

So pretty much same profile as for clang. However runtime is still 45573 with
-O3 -flto -march=native -fno-semantic-interposition --param
large-function-insns=100  --param max-inline-insns-auto=5

So it does not seem to be missing IPA optimizations.

There are a number of conditional moves in the clang code; -mbranch-cost helps
a bit, but not enough.

[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register

2023-05-28 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188

--- Comment #9 from Georg-Johann Lay  ---
The bug works as follows:

postreload.cc::reload_cse_move2add() loops over all insns, and at some point it
encounters

(insn 44 14 15 2 (set (reg/f:HI 14 r14 [58])
        (reg/v/f:HI 16 r16 [orig:51 self ] [51])) "fail1.c":28:5 101 {*movhi_split}
     (nil))

During the analysis for that insn, it executes

  rtx_insn *next = next_nonnote_nondebug_insn (insn);
  rtx set = NULL_RTX;
  if (next)
    set = single_set (next);

where next is

(insn 15 44 16 2 (parallel [
            (set (reg/f:HI 14 r14 [58])
                (plus:HI (reg/f:HI 14 r14 [58])
                    (const_int 68 [0x44])))
            (clobber (reg:QI 31 r31))
        ]) "fail1.c":28:5 175 {addhi3_clobber}
     (nil))

Further down, it continues with success = 0:

  if (success)
    delete_insn (insn);
  changed |= success;
  insn = next;
  [...]
  continue;

The scan then continues with NEXT_INSN (insn), which is the insn AFTER insn 15,
so the CLOBBER of QI:31 in insn 15 is bypassed, and note_stores or similar is
never executed on insn 15.  The "set = single_set (next)" also bypasses that
insn 15 is a PARALLEL with a CLOBBER of a general purpose register.

It appears the code has been in postreload since 2003, when postreload.c was
split out of reload1.c.
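
To make the control-flow problem above easier to see, here is a toy model in C.
It is only a sketch and not GCC code: struct toy_insn, toy_scan and the
advance_past_next flag are made-up names, and the clobber mask merely stands in
for what note_stores would record.

```c
#include <stddef.h>

/* Toy model, not GCC code: each "insn" sets one register and may
   additionally clobber others (like a PARALLEL with a CLOBBER).  */
struct toy_insn {
  unsigned set_reg;       /* register written by the single SET */
  unsigned clobber_mask;  /* bit N set: register N is clobbered */
  int peek_next;          /* nonzero: handler also inspects the next insn */
};

/* Walk the insns and return the mask of clobbers the scan noticed.
   With advance_past_next != 0 the handler does "insn = next; continue;",
   so the loop increment steps over next without ever visiting it.  */
unsigned toy_scan (const struct toy_insn *insns, size_t n,
                   int advance_past_next)
{
  unsigned seen = 0;
  for (size_t i = 0; i < n; i++)
    {
      seen |= insns[i].clobber_mask;   /* stands in for note_stores */
      if (insns[i].peek_next && i + 1 < n && advance_past_next)
        i = i + 1;                     /* "insn = next"; the i++ then skips it */
    }
  return seen;
}
```

With an array that mimics insn 44 followed by insn 15 (the second one
clobbering register 31), the skipping variant never records the clobber, while
the plain walk does.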

[Bug middle-end/110015] New: openjpeg is slower when built with gcc13 compared to clang16

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110015

Bug ID: 110015
   Summary: openjpeg is slower when built with gcc13 compared to
clang16
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

I tried to reproduce openjpeg benchmarks from Phoronix
https://www.phoronix.com/review/gcc13-clang16-raptorlake/5

On zen3 hardware I get 42607ms for the clang build and 45702ms for the gcc
build, which is a 7% difference (Phoronix reports 10% on RaptorLake).

perf of clang build:
  88.64%  opj_t1_cblk_encode_processor
   6.68%  opj_dwt_encode_and_deinterleave_v
   1.30%  opj_dwt_encode_and_deinterleave_h_one_row

opj_t1_cblk_encode_processor is huge with no obvious hot spots.

perf of gcc build:

  70.36% opj_t1_cblk_encode_processor   
  16.12% opj_t1_enc_refpass.lto_priv.0  
   3.88% opj_dwt_encode_and_deinterleave_v  
  2.46% opj_dwt_fetch_cols_vertical_pass
   2.35% opj_mqc_byteout

So we apparently inline less even at -O3

[Bug fortran/88486] ICE in gfc_conv_scalarized_array_ref, at fortran/trans-array.c:3401

2023-05-28 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88486

--- Comment #5 from kargl at gcc dot gnu.org ---
(In reply to G. Steinmetz from comment #0)
> Affects versions down to at least gfortran-5.
> Under the hood related to pr85686.
> 
> 
> $ cat z1.f90
> subroutine s(x)
>character(:), allocatable :: x(:)
>x = ['bcd']
>x = ['a'//x//'e']
>print *, x
> end
> 

This compiles with GNU Fortran (GCC) 13.0.1 20230408 (experimental).
This has an ICE with GNU Fortran (FreeBSD Ports Collection) 12.2.0.

Filling out the code to something that actually does something
reveals a wrong-code issue with an array constructor.  There are
a boat load of warnings of uninitialized variables, e.g., 
a.f90:53:34:

   53 |  x = [ ('a' // x // 'e') ]
  |  ^
Warning: '__var_1_realloc_string.dim[0].ubound' is used uninitialized
[-Wuninitialized]
a.f90:1:11:

1 | program foo
  |   ^
note: '__var_1_realloc_string' declared here
a.f90:3:36:

3 |character(:), allocatable :: a(:)
  |^
Warning: '.a' is used uninitialized [-Wuninitialized]
a.f90:34:22:

   34 |   end subroutine s
  |  ^
note: '.a' declared here


program foo

   character(:), allocatable :: a(:)

   call s(a)
   print '(A,1X,2(I0,1X),/)', 'a: >>' // a // '<<', size(a), len(a(1))
   if (allocated(a)) deallocate(a)

   call t(a)
   print '(A,1X,2(I0,1X),/)', 'a: >>' // a // '<<', size(a), len(a(1))
   if (allocated(a)) deallocate(a)

   call u(a)
   print '(A,1X,2(I0,1X),/)', 'a: >>' // a // '<<', size(a), len(a(1))
   if (allocated(a)) deallocate(a)

   call v(a)
   print '(A,1X,2(I0,1X),/)', 'a: >>' // a // '<<', size(a), len(a(1))
   if (allocated(a)) deallocate(a)

   call w(a)
   print '(A,1X,2(I0,1X),/)', 'a: >>' // a // '<<', size(a), len(a(1))
   if (allocated(a)) deallocate(a)

   contains

  subroutine s(x)
 character(:), allocatable :: x(:)
 x = ['bcd']
 x = ['a' // x // 'e']
 print '(A,1X,2(I0,1X))', 's: >>' // x // '<<', size(x), len(x(1))
  end subroutine s

  subroutine t(x)
 character(:), allocatable :: x(:)
 x = ['bcd']
 x = 'a' // x // 'e'
 print '(A,1X,2(I0,1X))', 't: >>' // x // '<<', size(x), len(x(1))
  end subroutine t

  subroutine u(x)
 character(:), allocatable :: x(:)
 x = ['bcd']
 x = [ ('a' // x // 'e') ]
 print '(A,1X,2(I0,1X))', 'u: >>' // x // '<<', size(x), len(x(1))
  end subroutine u

  subroutine v(x)
 character(:), allocatable, intent(out) :: x(:)
 x = ['bcd']
 x = [ ('a' // x // 'e') ]
 print '(A,1X,2(I0,1X))', 'v: >>' // x // '<<', size(x), len(x(1))
  end subroutine v

  subroutine w(x)
 character(:), allocatable, intent(out) :: x(:)
 x = [ 'a' // ['bcd'] // 'e' ]
 print '(A,1X,2(I0,1X))', 'w: >>' // x // '<<', size(x), len(x(1))
  end subroutine w

end program foo


s: >>abcde<< 1 5
a: >>abc<< 1 3<--- whoops

t: >>abcde<< 1 5
a: >>abcde<< 1 5

u: >>abcde<< 1 5
a: >>abc<< 1 3<--- whoops

v: >>abcde<< 1 5
a: >>abc<< 1 3<--- whoops

w: >>abcde<< 1 5
a: >>abcde<< 1 5

[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812

--- Comment #10 from Jan Hubicka  ---
This is a benchmarkable version of the simplified testcase:

jan@localhost:/tmp> cat t.c
#define N 1000
struct rgb {unsigned char r,g,b;} rgbs[N];
int *addr;
struct drgb {double r,g,b;
#ifdef OPACITY
 double o;
#endif
};

struct drgb sum(double w)
{
struct drgb r;
for (int i = 0; i < N; i++)
{
  r.r += rgbs[i].r * w;
  r.g += rgbs[i].g * w;
  r.b += rgbs[i].b * w;
}
return r;
}
jan@localhost:/tmp> cat q.c
struct drgb {double r,g,b;
#ifdef OPACITY
 double o;
#endif
};
struct drgb sum(double w);
int
main()
{
for (int i = 0; i < 1000; i++)
sum(i);
}


jan@localhost:/tmp> gcc t.c q.c -march=native -O3 -g ; objdump -d a.out | grep
vfmadd231pd  ; perf stat ./a.out
  40119d:   c4 e2 d9 b8 d1  vfmadd231pd %xmm1,%xmm4,%xmm2

 Performance counter stats for './a.out':

 12,148.04 msec task-clock:u #1.000 CPUs
utilized 
 0  context-switches:u   #0.000 /sec
 0  cpu-migrations:u #0.000 /sec
   736  page-faults:u#   60.586 /sec
50,018,421,148  cycles:u #4.117 GHz 
   220,502  stalled-cycles-frontend:u#0.00% frontend
cycles idle  
39,950,154,369  stalled-cycles-backend:u #   79.87% backend
cycles idle   
   120,000,191,713  instructions:u   #2.40  insn per
cycle
  #0.33  stalled cycles per
insn   
10,000,048,918  branches:u   #  823.182 M/sec   
 7,959  branch-misses:u  #0.00% of all
branches   

  12.149466078 seconds time elapsed

  12.149084000 seconds user
   0.0 seconds sys


jan@localhost:/tmp> gcc t.c q.c -march=native -O3 -g -DOPACITY ; objdump -d
a.out | grep vfmadd231pd  ; perf stat ./a.out

 Performance counter stats for './a.out':

 12,141.11 msec task-clock:u #1.000 CPUs
utilized 
 0  context-switches:u   #0.000 /sec
 0  cpu-migrations:u #0.000 /sec
   735  page-faults:u#   60.538 /sec
50,018,839,129  cycles:u #4.120 GHz 
   185,034  stalled-cycles-frontend:u#0.00% frontend
cycles idle  
29,963,999,798  stalled-cycles-backend:u #   59.91% backend
cycles idle   
   120,000,191,729  instructions:u   #2.40  insn per
cycle
  #0.25  stalled cycles per
insn   
10,000,048,913  branches:u   #  823.652 M/sec   
 7,311  branch-misses:u  #0.00% of all
branches   

  12.142252354 seconds time elapsed

  12.138237000 seconds user
   0.00400 seconds sys


So on zen2 hardware I get the same performance for both.  It may be
interesting to test it on Raptor Lake.

[Bug fortran/88486] ICE in gfc_conv_scalarized_array_ref, at fortran/trans-array.c:3401

2023-05-28 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88486

kargl at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kargl at gcc dot gnu.org

--- Comment #4 from kargl at gcc dot gnu.org ---
(In reply to anlauf from comment #3)
> Further reduced:
> 
> subroutine s(x)
>   character(:), allocatable :: x(:)
>   character(:), allocatable :: y(:)
>   y = [x//'a']
> end

This compiles with GNU Fortran (GCC) 13.0.1 20230408 (experimental).
This has an ICE with GNU Fortran (FreeBSD Ports Collection) 12.2.0.

[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812

--- Comment #9 from Jan Hubicka  ---
Oddly enough, a simplified version of the loop SLP vectorizes for me:
struct rgb {unsigned char r,g,b;} *rgbs;
int *addr;
double *weights;
struct drgb {double r,g,b;};

struct drgb sum()
{
struct drgb r;
for (int i = 0; i < 10; i++)
{
  int j = addr[i];
  double w = weights[i];
  r.r += rgbs[j].r * w;
  r.g += rgbs[j].g * w;
  r.b += rgbs[j].b * w;
}
return r;
}
I get:
L2:
movslq  (%r9,%rdx,4), %rax
vmovsd  (%r8,%rdx,8), %xmm1
incq%rdx
leaq(%rax,%rax,2), %rax
addq%rsi, %rax
movzbl  (%rax), %ecx
vmovddup%xmm1, %xmm4
vmovd   %ecx, %xmm0
movzbl  1(%rax), %ecx
movzbl  2(%rax), %eax
vpinsrd $1, %ecx, %xmm0, %xmm0
vcvtdq2pd   %xmm0, %xmm0
vfmadd231pd %xmm4, %xmm0, %xmm2
vcvtsi2sdl  %eax, %xmm5, %xmm0
vfmadd231sd %xmm1, %xmm0, %xmm3
cmpq$10, %rdx
jne .L2


I think the actual loop is:
  [local count: 44202554]:
  _106 = _262->pixel;
  _109 = *source_231(D).columns;

   [local count: 401841405]:
  # pixel$green_332 = PHI <_124(89), pixel$green_265(53)>
  # i_357 = PHI 
  # pixel$red_371 = PHI <_119(89), pixel$red_263(53)>
  # pixel$blue_377 = PHI <_129(89), pixel$blue_267(53)>
  i.51_102 = (long unsigned int) i_357;
  _103 = i.51_102 * 16;
  _104 = _262 + _103;
  _105 = _104->pixel;
  _107 = _105 - _106;
  _108 = (long unsigned int) _107;
  _110 = _108 * _109;
  _112 = _110 + _621;
  weight_297 = _104->weight;
  _113 = _112 * 4;
  _114 = _276 + _113;
  _115 = _114->red;
  _116 = (int) _115;
  _117 = (double) _116;
  _118 = _117 * weight_297;
  _119 = _118 + pixel$red_371;
  _120 = _114->green;
  _121 = (int) _120;
  _122 = (double) _121;
  _123 = _122 * weight_297;
  _124 = _123 + pixel$green_332;
  _125 = _114->blue;
  _126 = (int) _125;
  _127 = (double) _126;
  _128 = _127 * weight_297;
  _129 = _128 + pixel$blue_377;
  i_298 = i_357 + 1;
  if (n_195 > i_298)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 44202554]:
  # _607 = PHI <_124(54)>
  # _606 = PHI <_119(54)>
  # _605 = PHI <_129(54)>
  goto ; [100.00%]

   [local count: 357638851]:
  goto ; [100.00%]


and SLP vectorizer seems to claim:
../magick/resize.c:1284:52: note:   _125 = _114->blue;
../magick/resize.c:1284:52: note:   _120 = _114->green;
../magick/resize.c:1284:52: note:   _115 = _114->red;
../magick/resize.c:1284:52: missed:   not consecutive access weight_297 =
_104->weight;
../magick/resize.c:1284:52: missed:   not consecutive access _105 =
_104->pixel;
../magick/resize.c:1284:52: missed:   not consecutive access _134->red =
iftmp.57_207;
../magick/resize.c:1284:52: missed:   not consecutive access _134->green =
iftmp.60_208;
../magick/resize.c:1284:52: missed:   not consecutive access _134->blue =
iftmp.63_209;
../magick/resize.c:1284:52: missed:   not consecutive access _134->opacity = 0;
../magick/resize.c:1284:52: missed:   not consecutive access _63 =
*source_231(D).columns;
../magick/resize.c:1284:52: missed:   not consecutive access _60 = _262->pixel;

Not sure if that is related to the real testcase:


struct rgb {unsigned char r,g,b;} *rgbs;
int *addr;
double *weights;
struct drgb {double r,g,b,o;};

struct drgb sum()
{
struct drgb r;
for (int i = 0; i < 10; i++)
{
  int j = addr[i];
  double w = weights[i];
  r.r += rgbs[j].r * w;
  r.g += rgbs[j].g * w;
  r.b += rgbs[j].b * w;
}
return r;
}

makes us miss the vectorization even though nothing uses drgb->o:

sum:
.LFB0:
.cfi_startproc
movq%rdi, %r8
movqweights(%rip), %rsi
movqaddr(%rip), %rdi
vxorps  %xmm2, %xmm2, %xmm2
movqrgbs(%rip), %rcx
xorl%edx, %edx
.p2align 4
.p2align 3
.L2:
movslq  (%rdi,%rdx,4), %rax
vmovsd  (%rsi,%rdx,8), %xmm0
incq%rdx
leaq(%rax,%rax,2), %rax
addq%rcx, %rax
movzbl  (%rax), %r9d
vcvtsi2sdl  %r9d, %xmm2, %xmm1
movzbl  1(%rax), %r9d
movzbl  2(%rax), %eax
vfmadd231sd %xmm0, %xmm1, %xmm3
vcvtsi2sdl  %r9d, %xmm2, %xmm1
vfmadd231sd %xmm0, %xmm1, %xmm5
vcvtsi2sdl  %eax, %xmm2, %xmm1
vfmadd231sd %xmm0, %xmm1, %xmm4
cmpq$10, %rdx
jne .L2
vmovq   %xmm4, %xmm4
vunpcklpd   %xmm5, %xmm3, %xmm0
movq%r8, %rax
vinsertf128 $0x1, %xmm4, %ymm0, %ymm0
vmovupd %ymm0, (%r8)
vzeroupper
ret

[Bug analyzer/110014] New: -Wanalyzer-allocation-size mishandles realloc (..., .... * sizeof (object))

2023-05-28 Thread eggert at cs dot ucla.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110014

Bug ID: 110014
   Summary: -Wanalyzer-allocation-size mishandles realloc (...,
 * sizeof (object))
   Product: gcc
   Version: 13.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: eggert at cs dot ucla.edu
  Target Milestone: ---

Created attachment 55179
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55179&action=edit
compile with 'gcc -fanalyzer -S' to reproduce the bug

This is a followup to bug 109577, and reports a more serious problem with
-Wanalyzer-allocation-size: it mishandles realloc even when the last argument
is obviously a multiple of the object size.

I discovered this problem when compiling an experimental version of GNU
diffutils.

This is with gcc (GCC) 13.1.1 20230511 (Red Hat 13.1.1-2) x86-64.

Compile the attached program with:

gcc -fanalyzer -S w.i

The output is as follows. All the warnings are incorrect. The last warning is
for a call of the form realloc(p, N * sizeof (long)) even though the result is
used as a long * so the call is obviously well-sized.


w.i: In function ‘slurp’:
w.i:11:14: warning: allocated buffer size is not a multiple of the pointee's
size [CWE-131] [-Wanalyzer-allocation-size]
   11 | buffer = realloc (buffer, cc);
  |  ^~~~
  ‘slurp’: events 1-4
|
|9 |   if (!__builtin_add_overflow (file_size - file_size % sizeof
(long),
|  |  ^
|  |  |
|  |  (1) following ‘true’ branch...
|   10 |2 * sizeof (long), &cc))
|   11 | buffer = realloc (buffer, cc);
|  |  
|  |  |
|  |  (2) ...to here
|  |  (3) allocated ‘cc’ bytes here
|  |  (4) assigned to ‘long int *’ here; ‘sizeof (long
int)’ is ‘8’
|
w.i: In function ‘slurp1’:
w.i:18:10: warning: allocated buffer size is not a multiple of the pointee's
size [CWE-131] [-Wanalyzer-allocation-size]
   18 |   return realloc (buffer, file_size - file_size % sizeof (long));
  |  ^~~
  ‘slurp1’: events 1-2
|
|   18 |   return realloc (buffer, file_size - file_size % sizeof (long));
|  |  ^~~
|  |  |
|  |  (1) allocated ‘file_size & 18446744073709551608’ bytes
here
|  |  (2) assigned to ‘long int *’ here; ‘sizeof (long int)’ is
‘8’
|
w.i: In function ‘slurp2’:
w.i:24:10: warning: allocated buffer size is not a multiple of the pointee's
size [CWE-131] [-Wanalyzer-allocation-size]
   24 |   return realloc (buffer, (file_size / sizeof (long)) * sizeof (long));
  |  ^
  ‘slurp2’: events 1-2
|
|   24 |   return realloc (buffer, (file_size / sizeof (long)) * sizeof
(long));
|  | 
^
|  |  |
|  |  (1) allocated ‘file_size & 18446744073709551608’ bytes
here
|  |  (2) assigned to ‘long int *’ here; ‘sizeof (long int)’ is
‘8’
|
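
As a side note on why these sizes are well-formed: the three spellings of
"round down to a multiple of sizeof (long)" that appear above (subtracting the
remainder, divide-then-multiply, and the mask form the analyzer prints as
'file_size & 18446744073709551608') all compute the same value. The sketch
below is only an illustration, not the attached w.i; the function names are
made up, and the mask form assumes sizeof (long) is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Three ways to round a byte count down to a multiple of sizeof (long).  */
size_t round_sub (size_t n)  { return n - n % sizeof (long); }
size_t round_div (size_t n)  { return (n / sizeof (long)) * sizeof (long); }

/* Mask form; only valid when sizeof (long) is a power of two.  */
size_t round_mask (size_t n) { return n & ~(sizeof (long) - 1); }

/* Return 1 if all three agree and yield a multiple of sizeof (long)
   for every n below limit, 0 otherwise.  */
int all_agree (size_t limit)
{
  for (size_t n = 0; n < limit; n++)
    {
      size_t a = round_sub (n);
      if (a != round_div (n) || a != round_mask (n)
          || a % sizeof (long) != 0)
        return 0;
    }
  return 1;
}
```

The constant in the diagnostic is just ~7 in 64 bits, i.e. the mask that clears
the low three bits when sizeof (long) is 8.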

[Bug fortran/68241] [meta-bug] [F03] Deferred-length character

2023-05-28 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68241
Bug 68241 depends on bug 65381, which changed state.

Bug 65381 Summary: [10/11/12/13/14 Regression] ICE during array result, 
assignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65381

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

[Bug fortran/65381] [10/11/12/13/14 Regression] ICE during array result, assignment

2023-05-28 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65381

kargl at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kargl at gcc dot gnu.org
 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #13 from kargl at gcc dot gnu.org ---
All of the codes in this bug report compile with 

GNU Fortran (FreeBSD Ports Collection) 12.2.0
GNU Fortran (GCC) 13.0.1 20230408 (experimental)

[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812

--- Comment #8 from Jan Hubicka  ---
Created attachment 55178
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55178&action=edit
Preprocessed source of VerticalFilter and HorizontalFilter

[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812

Jan Hubicka  changed:

   What|Removed |Added

Summary|GraphicsMagick resize is a  |GraphicsMagick resize is a
   |lot slower in GCC 13.1 vs   |lot slower in GCC 13.1 vs
   |Clang 16|Clang 16 on Intel Raptor
   ||Lake

--- Comment #7 from Jan Hubicka  ---
On zen3 hardware I get GCC:

GraphicsMagick 1.3.38:
pts/graphics-magick-2.1.0 [Operation: Resizing]
Test 1 of 1
Estimated Trial Run Count:3 
Estimated Time To Completion: 4 Minutes [17:00 UTC] 
Started Run 1 @ 16:57:17
Started Run 2 @ 16:58:22
Started Run 3 @ 16:59:26

Operation: Resizing:
1390
1386
1383

Average: 1386 Iterations Per Minute
Deviation: 0.25%

clang16:

GraphicsMagick 1.3.38:
pts/graphics-magick-2.1.0 [Operation: Resizing]
Test 1 of 1
Estimated Trial Run Count:3
Estimated Time To Completion: 4 Minutes [16:54 UTC]
Started Run 1 @ 16:51:48
Started Run 2 @ 16:52:52
Started Run 3 @ 16:53:56

Operation: Resizing:
180
180
180

Average: 180 Iterations Per Minute
Deviation: 0.00%


GCC profile:
  52.07%  VerticalFilter._omp_fn.0  
  24.59%  HorizontalFilter._omp_fn.0
  11.78%  ReadCachePixels.isra.0

Clang does not seem to have openmp in it, so to get comparable runs I added 
OMP_THREAD_LIMIT=1

With this I get:
GraphicsMagick 1.3.38:
pts/graphics-magick-2.1.0 [Operation: Resizing]
Test 1 of 1
Estimated Trial Run Count:3
Estimated Time To Completion: 4 Minutes [17:17 UTC]
Started Run 1 @ 17:14:14
Started Run 2 @ 17:15:18
Started Run 3 @ 17:16:22

Operation: Resizing:
184
186
186

Average: 185 Iterations Per Minute
Deviation: 0.62%

So the GCC build is still a bit faster. The inner loop of VerticalFilter is:
  0.00 │4a0:┌─→mov  0x8(%rdx),%rax  ▒
  1.33 ││  vmovsd   (%rdx),%xmm1▒
  1.58 ││  add  $0x10,%rdx  ▒
  0.00 ││  sub  %r13,%rax   ▒
  4.77 ││  imul %r11,%rax   ▒
  1.01 ││  add  %rcx,%rax   ▒
  0.04 ││  movzbl   0x2(%r15,%rax,4),%r10d  ▒
  8.38 ││  vcvtsi2sd%r10d,%xmm2,%xmm0   ▒
  2.44 ││  movzbl   0x1(%r15,%rax,4),%r10d  ◆
  1.55 ││  movzbl   (%r15,%rax,4),%eax  ▒
  0.00 ││  vfmadd231sd  %xmm0,%xmm1,%xmm4   ▒
 13.91 ││  vcvtsi2sd%r10d,%xmm2,%xmm0   ▒
  1.86 ││  vfmadd231sd  %xmm0,%xmm1,%xmm5   ▒
 13.00 ││  vcvtsi2sd%eax,%xmm2,%xmm0▒
  2.02 ││  vfmadd231sd  %xmm0,%xmm1,%xmm3   ▒
 12.54 │├──cmp  %rdx,%rdi   ▒
  0.00 │└──jne  4a0 ▒

HorizontalFilter:
  0.01 │520:┌─→mov  0x8(%r8),%rdx ▒
  0.96 ││  vmovsd   (%r8),%xmm1   ▒
  1.93 ││  add  $0x10,%r8 ▒
  0.50 ││  sub  %r15,%rdx ▒
  4.02 ││  add  %r11,%rdx ▒
  2.26 ││  movzbl   0x2(%r14,%rdx,4),%ebx ▒
  0.09 ││  vcvtsi2sd%ebx,%xmm2,%xmm0  ▒
 10.10 ││  movzbl   0x1(%r14,%rdx,4),%ebx ◆
  0.92 ││  movzbl   (%r14,%rdx,4),%edx▒
  1.84 ││  vfmadd231sd  %xmm0,%xmm1,%xmm4 ▒
  6.82 ││  vcvtsi2sd%ebx,%xmm2,%xmm0  ▒
 11.15 ││  vfmadd231sd  %xmm0,%xmm1,%xmm3 ▒
 13.81 ││  vcvtsi2sd%edx,%xmm2,%xmm0  ▒
  6.16 ││  vfmadd231sd  %xmm0,%xmm1,%xmm5 ▒
  8.61 │├──cmp  %rsi,%r8  ▒
  1.56 │└──jne  520   ▒

ReadCachePixels:
   │2e0:┌─→mov(%rbx,%rax,4),%edx  ▒
 83.03 ││  mov%edx,(%r12,%rax,4)  ▒
 12.34 ││  inc%rax▒
  0.02 │├──cmp%rsi,%rax   ▒

With Clang I get:
  

[Bug target/64331] regcprop propagates registers noted as REG_DEAD

2023-05-28 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64331

Georg-Johann Lay  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW

[Bug target/64331] regcprop propagates registers noted as REG_DEAD

2023-05-28 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64331

Georg-Johann Lay  changed:

   What|Removed |Added

   Assignee|gjl at gcc dot gnu.org |unassigned at gcc dot 
gnu.org

--- Comment #13 from Georg-Johann Lay  ---
Resetting assignee to default.  The AVR backend solved the problem by a
target-specific mini-pass that (re)computes notes as late as possible.

[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812

Jan Hubicka  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #6 from Jan Hubicka  ---
I installed the Phoronix test suite and uploaded the sample data it uses to
http://www.ucw.cz/~hubicka/sample-photo-6000x4000-1.zip

I doubt they make much difference especially for resizing.

[Bug c/110007] Implement support for Clang’s __builtin_unpredictable()

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110007

Jan Hubicka  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #7 from Jan Hubicka  ---
Also note that a branch with a 50% taken probability is not necessarily
unpredictable, for example in this:

for (int i = 0; i < 1; i++)
  if (i&1)
 

I would expect branch predictor to work this out on modern systems.
So having an explicit flag in branch_probability saying that a given
probability is hard for the CPU to predict would make sense, and I was thinking
we may eventually try to get this info from auto-fdo too.
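
A small C sketch of the point about "50% but predictable": the branch
"if (i & 1)" is taken exactly half the time, yet a predictor that keys on the
previous outcome learns the alternation after a few iterations. The predictor
below is a deliberately tiny, hypothetical model (two 2-bit saturating counters
indexed by the last outcome); it is not meant to describe any real CPU, and the
function names are made up.

```c
/* Number of times "if (i & 1)" is taken for i in [0, n).  */
unsigned toy_taken (unsigned n)
{
  unsigned taken = 0;
  for (unsigned i = 0; i < n; i++)
    taken += i & 1;
  return taken;
}

/* Mispredictions of a toy two-level predictor on the same branch:
   one 2-bit saturating counter per value of the previous outcome.  */
unsigned toy_mispredicts (unsigned n)
{
  unsigned char counter[2] = { 1, 1 };  /* start weakly not-taken */
  unsigned last = 0;                    /* previous outcome, 1-bit history */
  unsigned miss = 0;
  for (unsigned i = 0; i < n; i++)
    {
      unsigned taken = i & 1;
      unsigned predict = counter[last] >= 2;
      if (predict != taken)
        miss++;
      if (taken && counter[last] < 3)
        counter[last]++;
      if (!taken && counter[last] > 0)
        counter[last]--;
      last = taken;
    }
  return miss;
}
```

For a long run the taken count is n/2 while the misprediction count stays at a
small constant, which is the distinction between "probability 50%" and "hard to
predict" made above.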

[Bug fortran/99139] ICE: gfc_get_default_type(): Bad symbol '__tmp_UNKNOWN_0_rank_1'

2023-05-28 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99139

--- Comment #5 from kargl at gcc dot gnu.org ---
(In reply to sandra from comment #4)
> The problem noted in comment 1 looks related to PR 102641 --
> automatically-inserted implicit initialization code can't cope with
> assumed-rank arrays.

I don't think it is related.  PR102601 involves default initialization
and/or deallocation of an actual argument associated with an intent(out)
assumed-rank dummy argument.

> I tested the patch in comment 2 and saw a whole lot of regressions (ICEs). 
> :-(

The patch in comment #2 needed to be moved down into the 'if (m == MATCH_YES)'
block where 'expr2 != NULL'.  The following has been regtested with no new
regressions.

diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc
index 5eb6d0e1c1d..0a030ae01df 100644
--- a/gcc/fortran/match.cc
+++ b/gcc/fortran/match.cc
@@ -6770,8 +6770,20 @@ gfc_match_select_rank (void)

   gfc_current_ns = gfc_build_block_ns (ns);
   m = gfc_match (" %n => %e", name, &expr2);
+
   if (m == MATCH_YES)
 {
+  /* If expr2 corresponds to an implicitly typed variable, then the
+actual type of the variable may not have been set.  Set it here.  */
+  if (!gfc_current_ns->seen_implicit_none 
+ && expr2->expr_type == EXPR_VARIABLE
+ && expr2->ts.type == BT_UNKNOWN
+ && expr2->symtree && expr2->symtree->n.sym)
+   {
+ gfc_set_default_type (expr2->symtree->n.sym, 0, gfc_current_ns);
+ expr2->ts.type = expr2->symtree->n.sym->ts.type;
+   }
+
   expr1 = gfc_get_expr ();
   expr1->expr_type = EXPR_VARIABLE;
   expr1->where = expr2->where;

[Bug tree-optimization/110009] Another missing ABS detection

2023-05-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110009

--- Comment #2 from Andrew Pinski  ---
(In reply to Georg-Johann Lay from comment #1)
> (In reply to Andrew Pinski from comment #0)
> > unsigned
> > f1 (int v)
> > {
> >   [...]
> >   int b_5;
> > 
> >   b_5 = v>>(sizeof(v)*8 - 1);
> 
> Does it depend on -fwrapv maybe.

No, in this case there is a missing pattern to match against.
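
For readers who do not have the testcase at hand: the body of f1 is elided
above, so the following is not the code from comment #0. It is only a sketch of
the well-known sign-mask idiom that starts with the same shift, written with
unsigned arithmetic so that the result does not depend on -fwrapv. The name
abs_via_mask is made up, and the code assumes that >> on a negative int is an
arithmetic shift (GCC documents this; ISO C leaves it implementation-defined).

```c
#include <limits.h>

/* Branchless absolute value using a sign mask.  */
unsigned abs_via_mask (int v)
{
  int mask = v >> (sizeof (v) * CHAR_BIT - 1);   /* 0 for v >= 0, else -1 */
  /* (v ^ mask) - mask, done in unsigned so INT_MIN does not overflow.  */
  return ((unsigned) v ^ (unsigned) mask) - (unsigned) mask;
}
```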

[Bug c++/110000] GCC should implement exclude_from_explicit_instantiation

2023-05-28 Thread nikolasklauser at berlin dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110000

--- Comment #8 from Nikolas Klauser  ---
(In reply to Florian Weimer from comment #7)
> (In reply to Nikolas Klauser from comment #6)
> > Does that make sense?
> 
> Not quite. I was trying to suggest that you also need to suppress all
> inter-procedural analysis. This will inhibit quite a few useful
> optimizations.

Why would you need to do that? As long as any functions that are part of the
ABI don't change in a non-benign way, everything is fine. If an
implementation-detail function doesn't get inlined, but the public function
does, it's fine because the detail function gets emitted by every TU that uses
it, which means that it'll always be there as long as some function relies on
the symbol. If the implementation-detail function gets inlined, the code will
obviously be there - no need to have a symbol anywhere.

[Bug target/99435] avr: incorrect I/O address ranges for some cores

2023-05-28 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99435

Georg-Johann Lay  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #3 from Georg-Johann Lay  ---
Closed as invalid.

The linked ATmega16U4 datasheet states on page 26:

> 5. AVR Memories
> 5.4 I/O Memory
> [...]
> I/O Registers within the address range 0x00 - 0x1F are directly bit-accessible
> using the SBI and CBI instructions. In these registers, the value of single
> bits can be checked by using the SBIS and SBIC instructions. Refer to the
> instruction set section for more details. When using the I/O specific commands
> IN and OUT, the I/O addresses 0x00 - 0x3F must be used. When addressing I/O
> Registers as data space using LD and ST instructions, 0x20 must be added to
> these addresses. The device is a complex microcontroller with more peripheral
> units than can be supported within the 64 location reserved in Opcode for the
> IN and OUT instructions. For the Extended I/O space from 0x60 - 0xFF in SRAM,
> only the ST/STS/STD and LD/LDS/LDD instructions can be used.

So the lower I/O has a range of 5 bits (CBI, SBI, SBIC, SBIS), and the I/O
addressable by IN and OUT has a range of 6 bits.
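
The ranges quoted from the datasheet can be written down as a few C helpers.
This is just a sketch of the arithmetic in the quoted paragraph; the function
names are made up for illustration and are not part of avr-gcc or avr-libc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Data-space (LD/ST) address of an I/O register, given its I/O address:
   per the quoted text, 0x20 must be added.  */
uint16_t io_to_data_addr (uint8_t io_addr)
{
  return (uint16_t) (io_addr + 0x20);
}

/* SBI/CBI/SBIS/SBIC encode a 5-bit I/O address: 0x00..0x1F.  */
bool bit_addressable (uint8_t io_addr)
{
  return io_addr <= 0x1F;
}

/* IN/OUT encode a 6-bit I/O address: 0x00..0x3F.  */
bool in_out_addressable (uint8_t io_addr)
{
  return io_addr <= 0x3F;
}
```

So the last register reachable by IN/OUT, I/O address 0x3F, sits at data
address 0x5F, right below the extended I/O space that starts at 0x60.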

[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction

2023-05-28 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #44 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #43)
> 
> Well, not really. Look what's happening during expand pass when 'ashrsi3' is
> expanding. Function 'expand_ashiftrt' is called and what it does at the end
> - it explicitly emits 3 insns:
> [...]

> 
> By the way, right shift for integers expands to only one 'lshiftrt' insn and
> that's why it can be caught and converted to 'tst'.
> 

Yeah, I might have dropped the ball on the right shift patterns back then and
only reworked the left shift patterns to do that. 


> 
> As far as I understand, these insns could be caught later by a peephole and
> converted to a 'tstsi_t' insn, as is done for other, much simpler insn
> sequences.

It's the combine RTL pass and the split1 RTL pass that do most of this work
here.  The peephole pass in GCC is too simplistic for this.


> 
> Thank you for your time and detailed explanations! I agree with you on all
> points. Software cannot be perfect and it's OK for GCC not to be super
> optimized, so this part is better left intact.

We can't have it perfect, but we can try ;)

> 
> By the way, I tried to link library to my project and I figured out that
> linker is smart enough to link only necessary library functions even without
> LTO. So increase in size is about 100 or 200 bytes, that is acceptable.
> Thank you very much for help!

You're welcome.

Yes, to strip out unused library functions it doesn't need LTO.  But using LTO
for embedded/MCU  firmware, I find it can reduce the code size by about 20%. 
For example, it can also inline small library functions in your code (if the
library was also compiled with LTO).

[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction

2023-05-28 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #43 from Alexander Klepikov  
---
> > Thank you! I have an idea. If it's impossible to defer initial optimization,
> > maybe it's possible to emit some intermediate insn and catch it and optimize
> > later?
> 
> This is basically what is supposed to be happening there already.

Well, not really. Look what's happening during expand pass when 'ashrsi3' is
expanding. Function 'expand_ashiftrt' is called and what it does at the end -
it explicitly emits 3 insns:

wrk = gen_reg_rtx (Pmode);

  //This one
  emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);

  sprintf (func, "__ashiftrt_r4_%d", value);
  rtx lab = function_symbol (wrk, func, SFUNC_STATIC).lab;

  //This one
  emit_insn (gen_ashrsi3_n (GEN_INT (value), wrk, lab));

  //And this one
  emit_move_insn (operands[0], gen_rtx_REG (SImode, 4));

As far as I understand, these insns could be caught later by a peephole and
converted to a 'tstsi_t' insn, as is done for other, much simpler insn
sequences.

What I'm thinking about is to emit only one, say 'compound', insn, which could
then be split later, somewhere in the split pass, into the function call, i.e.
into those 3 insns.

I wrote test code that emits only one bogus insn. This insn expands to pure asm
code. Of course, that asm code is invalid, because it is impossible to place a
libcall label at the end of a function with pure asm code injection. But then
everything that could be converted to 'tst' is converted to 'tst'. And all pure
right shifts are converted to invalid asm code, of course.

That's why I am thinking about the possibility of emitting some intermediate
insn at the expand pass that will defer its real expansion. But I still don't
know how to do it right, or even whether it is possible.

By the way, right shift for integers expands to only one 'lshiftrt' insn and
that's why it can be caught and converted to 'tst'.

> 
> However, it's a bit of a dilemma.
> 
> 1) If we don't have a dynamic shift insn and we smash the constant shift
> into individual 
> stitching shifts early, it might open some new optimization opportunities,
> e.g. by sharing intermediate shift results.  Not sure though if that
> actually happens in practice though.
> 
> 2) Whether to use the dynamic shift insn or emit a function call or use
> stitching shifts sequence, it all has an impact on register allocation and
> other instruction use.  This can be problematic during the course of RTL
> optimization passes.
> 
> 3) Even if we have a dynamic shift, sometimes it's more beneficial to emit a
> shorter stitching shift sequence.  Which one is better depends on the
> surrounding code.  I don't think there is anything good there to make the
> proper choice.
> 
> Some other shift related PRs: PR 54089, PR 65317, PR 67691, PR 67869, PR
> 52628, PR 58017

Thank you for your time and detailed explanations! I agree with you on all
points. Software cannot be perfect and it's OK for GCC not to be super
optimized, so this part is better left intact.

By the way, I tried to link library to my project and I figured out that linker
is smart enough to link only necessary library functions even without LTO. So
increase in size is about 100 or 200 bytes, that is acceptable. Thank you very
much for help!
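
For context, the source-level equivalence behind the 'tst' conversion discussed
in this thread can be sketched in C: testing bit n after a right shift gives
the same answer as ANDing with a constant mask, and the latter is the form a
"TST #imm, R0"-style instruction evaluates. The function names below are made
up, and this is only an illustration of the equivalence, not of what the SH
backend patterns actually match. It holds for both logical and arithmetic
shifts as long as n is below the width of the type.

```c
#include <stdint.h>

/* Bit test written as shift-then-mask.  */
int bit_by_shift (int32_t x, unsigned n)
{
  return (x >> n) & 1;
}

/* The same test as a single AND against a constant mask.  */
int bit_by_mask (int32_t x, unsigned n)
{
  return ((uint32_t) x & (UINT32_C (1) << n)) != 0;
}
```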

[Bug tree-optimization/110009] Another missing ABS detection

2023-05-28 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110009

--- Comment #1 from Georg-Johann Lay  ---
(In reply to Andrew Pinski from comment #0)
> unsigned
> f1 (int v)
> {
>   [...]
>   int b_5;
> 
>   b_5 = v>>(sizeof(v)*8 - 1);

Does it depend on -fwrapv maybe.

[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register

2023-05-28 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188

Georg-Johann Lay  changed:

   What|Removed |Added

Summary|[AVR] Miscompilation and|[postreload] Uses content
   |function pointers   |of a clobbered register
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=56833

--- Comment #8 from Georg-Johann Lay  ---
Changing the title to something that resembles what is going wrong.

Also there is PR56833 which was fixed around v4.9, so maybe that fix was
incomplete.  There is also PR56442 which is still open, and where it's unclear
whether that is a duplicate.

[Bug c++/110000] GCC should implement exclude_from_explicit_instantiation

2023-05-28 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110000

--- Comment #7 from Florian Weimer  ---
(In reply to Nikolas Klauser from comment #6)
> Does that make sense?

Not quite. I was trying to suggest that you also need to suppress all
inter-procedural analysis. This will inhibit quite a few useful optimizations.