[Bug middle-end/110018] Missing vectorizable_conversion(unsigned char -> double) for BB vectorizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110018

--- Comment #2 from Hongtao.liu ---
> Currently, when modifier is NONE, vectorizable_conversion doesn't try any
> intermediate type, it can be extended similar like WIDEN.

After stepping through the testcase in gdb: the modifier is not NONE. It is WIDEN, from V8QI to V4DF, and that conversion fails.
[Bug middle-end/110018] Missing vectorizable_conversion(unsigned char -> double) for BB vectorizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110018

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-05-29

--- Comment #1 from Andrew Pinski ---
Confirmed.
[Bug middle-end/110018] New: Missing vectorizable_conversion(unsigned char -> double) for BB vectorizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110018

            Bug ID: 110018
           Summary: Missing vectorizable_conversion(unsigned char -> double) for BB vectorizer
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---

While looking at PR109812, I noticed that vectorizable_conversion fails for the BB vectorizer when the target doesn't support a direct optab for unsigned char to double. But the conversion can actually be vectorized via unsigned char -> short/int/long long -> double whenever vectorizable_conversion is OK for any of the intermediate types.

Currently, when modifier is NONE, vectorizable_conversion doesn't try any intermediate type; it could be extended to do so, similar to WIDEN.

    case NONE:
      if (code != FIX_TRUNC_EXPR
          && code != FLOAT_EXPR
          && !CONVERT_EXPR_CODE_P (code))
        return false;
      if (supportable_convert_operation (code, vectype_out, vectype_in, &code1))
        break;
      /* FALLTHRU */

Testcase:

    void
    foo (double* __restrict a, unsigned char* b)
    {
      a[0] = b[0];
      a[1] = b[1];
      a[2] = b[2];
      a[3] = b[3];
      a[4] = b[4];
      a[5] = b[5];
      a[6] = b[6];
      a[7] = b[7];
    }

The vectorizer reports:

    missed: conversion not supported by target.
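The two-step route described in the report can be sketched at the scalar level. This is only an illustration of why the intermediate type is safe, not the vectorizer's implementation: every unsigned char value is exactly representable as an int and as a double, so converting through int gives the same result as converting directly. The function names convert_direct and convert_via_int are made up for this sketch.

```cpp
#include <cassert>

// Direct form, equivalent to the testcase in the report.
void convert_direct(double* __restrict a, const unsigned char* b) {
    for (int i = 0; i < 8; ++i)
        a[i] = b[i];
}

// Hypothetical two-step form: zero-extend unsigned char -> int first,
// then convert int -> double. A target that has an int -> double vector
// conversion but no unsigned char -> double one could use this shape.
void convert_via_int(double* __restrict a, const unsigned char* b) {
    int tmp[8];
    for (int i = 0; i < 8; ++i)
        tmp[i] = b[i];                       // widening step, always exact
    for (int i = 0; i < 8; ++i)
        a[i] = static_cast<double>(tmp[i]);  // int -> double step, exact for 0..255
}
```

Both functions produce identical output for every input, which is the property an intermediate-type fallback would rely on.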
[Bug libstdc++/110016] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

--- Comment #9 from Andrew Pinski ---
So I think this is a bug in your code. Inside substrate::threadPool_t::finish, we have:

    finished = true;
    haveWork.notify_all();

If I change it to:

    {
      std::lock_guard lock{workMutex};
      finished = true;
      haveWork.notify_all();
    }

then I don't get a deadlock at all. As I mentioned, I did think there was a race condition. Here is what I think happened:

    Thread 26                              Thread 1
    checks finished, still false
                                           sets finished to be true
    calls wait                             calls notify_all
    ...                                    notify_all happens
    finally gets into futex_wait syscall

And then thread 26 never got the notification. With my change, the check for finished has to wait until thread 1 lets go of the mutex (and the other way around).
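A minimal sketch of the locking pattern suggested in this comment, assuming a much-reduced pool. MiniPool and its members are hypothetical stand-ins and are not taken from the substrate sources; only the names workMutex, haveWork and finished mirror the comment. The point is that the flag is written while holding the same mutex the waiters hold when they evaluate the predicate, so the write cannot slip in between a waiter's check and its entry into the wait.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, heavily reduced stand-in for the thread pool in the report.
struct MiniPool {
    std::mutex workMutex;
    std::condition_variable haveWork;
    std::atomic<bool> finished{false};
    std::vector<std::thread> workers;

    explicit MiniPool(unsigned count) {
        for (unsigned i = 0; i < count; ++i)
            workers.emplace_back([this] { waitWork(); });
    }

    void waitWork() {
        std::unique_lock<std::mutex> lock{workMutex};
        // The predicate is evaluated with workMutex held.
        haveWork.wait(lock, [this] { return finished.load(); });
    }

    void finish() {
        {
            // Taking the mutex here means a waiter is either before its
            // predicate check (and will see finished == true) or already
            // blocked in wait (and will receive the notification).
            std::lock_guard<std::mutex> lock{workMutex};
            finished = true;
            haveWork.notify_all();
        }
        // The lock is released before joining so the workers can wake up.
        for (auto& worker : workers)
            worker.join();
        workers.clear();
    }
};
```

Calling finish() on a freshly constructed MiniPool should return once every worker has left waitWork(); with the unlocked variant from the report the same call can hang intermittently.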
[Bug libstdc++/110016] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

--- Comment #8 from Andrew Pinski ---
Here is the backtrace in that case:

(gdb) bt
#0  0xf6acd22c in futex_wait_cancelable (private=, expected=0, futex_word=0xf3103c64) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0xf3103c08, cond=0xf3103c38) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0xf3103c38, mutex=0xf3103c08) at pthread_cond_wait.c:655
#3  0xf6efc2e4 in __tsan::call_pthread_cancel_with_cleanup (fn=fn@entry=0xf6eafd00 <_FUN(void*)>, cleanup=cleanup@entry=0xf6eb5364 <_FUN(void*)>, arg=arg@entry=0xe5f3dff0) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/libsanitizer/tsan/tsan_platform_linux.cpp:493
#4  0xf6ed4194 in cond_wait<__interceptor_pthread_cond_wait(void*, void*):: > (m=0xf3103c08, c=0xf3103c38, fn=..., si=0xe5f3dfd0, pc=281474824487080, thr=) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/libsanitizer/tsan/tsan_interceptors_posix.cpp:1259
#5  __interceptor_pthread_cond_wait (c=, m=0xf3103c08) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/libsanitizer/tsan/tsan_interceptors_posix.cpp:1270
#6  0x004045f4 in std::condition_variable::wait::waitWork()::{lambda()#1}>(std::unique_lock&, substrate::threadPool_t::waitWork()::{lambda()#1}) (this=this@entry=0xf3103c38, __lock=..., __p=__p@entry=...) at /home/ubuntu/upstream-gcc/include/c++/14.0.0/condition_variable:102
#7  0x004064e0 in substrate::threadPool_t::waitWork() (this=this@entry=0xf3103c00) at t.cc:282
#8  0x004081e4 in substrate::threadPool_t::workerThread(unsigned long) (this=this@entry=0xf3103c00, processor=) at t.cc:310
#9  0x00408234 in substrate::threadPool_t::threadPool_t(bool (*)())::{lambda(unsigned long)#1}::operator()(unsigned long) const (currentProcessor=, __closure=) at t.cc:337
#10 0x00408294 in std::__invoke_impl::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned long>(std::__invoke_other, substrate::threadPool_t::threadPool_t(bool (*)())::{lambda(unsigned long)#1}&&, unsigned long&&) (__f=...) at /home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/invoke.h:60
#11 0x004082e0 in std::__invoke::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned long>(std::__invoke_result&&, (substrate::threadPool_t::threadPool_t(bool (*)())::{lambda(unsigned long)#1}&&)...) (__fn=...) at /home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/invoke.h:90
#12 0x004084d4 in std::thread::_Invoker::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned long> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) (this=this@entry=0xf56004a8) at /home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/std_thread.h:291
#13 0x00408504 in std::thread::_Invoker::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned long> >::operator()() (this=this@entry=0xf56004a8) at /home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/std_thread.h:295
#14 0x00408534 in std::thread::_State_impl::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned long> > >::_M_run() (this=0xf56004a0) at /home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/std_thread.h:244
#15 0xf6ced74c in std::execute_native_thread_routine (__p=0xf56004a0) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/libstdc++-v3/src/c++11/thread.cc:104
#16 0xf6eaf63c in __tsan_thread_start_func (arg=0xf6f0) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/libsanitizer/tsan/tsan_interceptors_posix.cpp:1038
#17 0xf6ac7088 in start_thread (arg=0xf62f) at pthread_create.c:463
#18 0xf6a304ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
[Bug target/109987] ICE in in rs6000_emit_le_vsx_store on ppc64le with -Ofast -mno-power8-vector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109987

Kewen Lin changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
           Keywords|                            |ice-on-valid-code
                 CC|                            |bergner at gcc dot gnu.org,
                   |                            |linkw at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org
   Last reconfirmed|                            |2023-05-29

--- Comment #1 from Kewen Lin ---
Confirmed. This is similar to the issue found in PR103627 #c4. Previously I made a patch to make the MMA feature require power9-vector, see https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587310.html. But Segher considered power9-vector a workaround option that we should make go away, so the feature is just guarded under vsx; see his comment at https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589303.html.

Unfortunately this issue is triggered by another workaround option, -mno-power8-vector. I think we probably need to give the removal of -mpower{8,9}-vector a high priority.
[Bug libstdc++/110016] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

--- Comment #7 from Andrew Pinski ---
I can reproduce the failure on aarch64-linux-gnu on the trunk with `-std=c++17 -pthread -O2 -fsanitize=thread -fno-inline`, so your theory that inlining is causing the issue is incorrect.
[Bug libstdc++/110016] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

--- Comment #6 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #5)
> Did you try on some other target than x86 for gcc?

To answer my own question: it fails on aarch64-linux-gnu also. So this makes it more likely a library issue (maybe glibc ...)

Thread 26 (Thread 0xe5f3ea10 (LWP 1003246)):
#0  0xf6acd22c in futex_wait_cancelable (private=, expected=0, futex_word=0xf3103c64) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0xf3103c08, cond=0xf3103c38) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0xf3103c38, mutex=0xf3103c08) at pthread_cond_wait.c:655
#3  0xf6efc2e4 in __tsan::call_pthread_cancel_with_cleanup (fn=fn@entry=0xf6eafd00 <_FUN(void*)>, cleanup=cleanup@entry=0xf6eb5364 <_FUN(void*)>, arg=arg@entry=0xe5f3e0d0) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/libsanitizer/tsan/tsan_platform_linux.cpp:493
#4  0xf6ed4194 in cond_wait<__interceptor_pthread_cond_wait(void*, void*):: > (m=0xf3103c08, c=0xf3103c38, fn=..., si=0xe5f3e0b0, pc=281474824487080, thr=) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/libsanitizer/tsan/tsan_interceptors_posix.cpp:1259
#5  __interceptor_pthread_cond_wait (c=, m=0xf3103c08) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/libsanitizer/tsan/tsan_interceptors_posix.cpp:1270
#6  0x00403850 in std::condition_variable::wait::waitWork()::{lambda()#1}>(std::unique_lock&, substrate::threadPool_t::waitWork()::{lambda()#1}) (__p=..., __lock=..., this=0xf3103c38) at /home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/atomic_base.h:503
#7  substrate::threadPool_t::waitWork() (this=0xf3103c00) at t.cc:287
#8  substrate::threadPool_t::workerThread(unsigned long) (processor=, this=0xf3103c00) at t.cc:312
#9  substrate::threadPool_t::threadPool_t(bool (*)())::{lambda(unsigned long)#1}::operator()(unsigned long) const (currentProcessor=, __closure=) at t.cc:338
#10 std::__invoke_impl::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned long>(std::__invoke_other, substrate::threadPool_t::threadPool_t(bool (*)())::{lambda(unsigned long)#1}&&, unsigned long&&) (__f=...) at /home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/invoke.h:61
#11 std::__invoke::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned long>(std::__invoke_result&&, (substrate::threadPool_t::threadPool_t(bool (*)())::{lambda(unsigned long)#1}&&)...) (__fn=...) at /home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/invoke.h:96
#12 std::thread::_Invoker::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned long> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) (this=) at /home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/std_thread.h:292
#13 std::thread::_Invoker::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned long> >::operator()() (this=) at /home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/std_thread.h:299
#14 std::thread::_State_impl::threadPool_t(bool (*)())::{lambda(unsigned long)#1}, unsigned long> > >::_M_run() (this=) at /home/ubuntu/upstream-gcc/include/c++/14.0.0/bits/std_thread.h:244
#15 0xf6ced74c in std::execute_native_thread_routine (__p=0xf56004a0) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/libstdc++-v3/src/c++11/thread.cc:104
#16 0xf6eaf63c in __tsan_thread_start_func (arg=0xf730) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/libsanitizer/tsan/tsan_interceptors_posix.cpp:1038
#17 0xf6ac7088 in start_thread (arg=0xf66f) at pthread_create.c:463
#18 0xf6a304ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
[Bug libstdc++/110016] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

--- Comment #4 from Andrew Pinski ---
Did you try on some other target than x86 for gcc?

--- Comment #5 from Andrew Pinski ---
Did you try on some other target than x86 for gcc?
[Bug libstdc++/110016] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016 --- Comment #3 from Andrew Pinski --- When you say you compiled with clang, did you use libstdc++ or libc++? Did you try adding gnu::always_inline attribute on the lambda to see if it fails there too? Again the inlining should not have an effect here except if there is some kind of race condition happening.
[Bug libstdc++/110016] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016 --- Comment #2 from Amyspark --- We've seen failures both in Windows (all ABI flavors) and macOS only when compiled with GCC -- AppleClang, Clang, and MSVC (in its three flavors) all work without issue. So I'm doubtful it's a logical issue, especially given that preventing the compiler from inlining void Predicate::operator() into std::condition_variable::wait seems to be enough to work around it. Re 98033-- looks somewhat like it, though as explained earlier, it may also affect non Linux platforms ie. any target where GCC relies on (win)pthreads.
[Bug tree-optimization/94892] (x >> 31) + 1 not getting narrowed to compare
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94892

--- Comment #7 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #6)
> Two things we should simplify:
> _4 = _3 >> 31;
> _4 != -1
>
> Into:
> _3 >= 0 (if _3 is signed, otherwise false)
>
> (this will solve f0)

See bug 85234 comment #5 on how to handle that one (g and g2).
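The signed case of the simplification quoted above can be checked with a small sketch. It assumes a 32-bit operand and an arithmetic right shift on negative values, which is what GCC implements and what C++20 guarantees; the function names shifted_form and compare_form are made up for illustration. The unsigned case is not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// Form before simplification: _4 = _3 >> 31; _4 != -1.
// With an arithmetic shift, x >> 31 is -1 for negative x and 0 otherwise.
bool shifted_form(std::int32_t x) {
    return (x >> 31) != -1;
}

// Form after simplification: _3 >= 0.
bool compare_form(std::int32_t x) {
    return x >= 0;
}
```

The two functions agree on every 32-bit signed input under the stated shift assumption, so the shift plus compare can be narrowed to a single sign test.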
[Bug tree-optimization/85234] missed optimisation opportunity for (x >> CST)!=0 is not optimized to (((unsigned)x) >= (1<
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85234 --- Comment #5 from Andrew Pinski --- Here is the testcase for the constants besides 0:L ``` #define N 3 #define unsigned int #define cmp == #define M 0xf000u _Bool f(unsigned x, int t) { return (x << N) cmp (M << N); } _Bool f1(unsigned x, int t) { return ((x^M) & (-1u>>N)) cmp 0; } _Bool f2(unsigned x, int t) { return (x & (-1u>>N)) cmp (M & (-1u>>N)); } _Bool g(unsigned x, int t) { return (x >> N) cmp M; } _Bool g2(unsigned x, int t) { _Bool = 0; if () return 0; return (x & (-1u<= (1u<
[Bug libgcc/110017] Crossback Compilation for multilib fails on latest ubuntu due to -mx32 being disabled by the linux kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110017

--- Comment #4 from cqwrteur ---
Created attachment 55182
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55182&action=edit
Here is the build script (need to install a x86_64-w64-mingw32 cross compiler first)
[Bug libgcc/110017] Crossback Compilation for multilib fails on latest ubuntu due to -mx32 being disabled by the linux kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110017

--- Comment #3 from cqwrteur ---
(In reply to Andrew Pinski from comment #2)
> How are you configuring GCC?

gcc/configure --disable-nls --disable-werror --enable-languages=c,c++ --enable-multilib --with-multilib-list=m64,m32,mx32 --with-gxx-libcxx-include-dir=$PREFIXTARGET/include/c++/v1 --prefix=$PREFIX --build=x86_64-pc-linux-gnu --host=x86_64-w64-mingw32 --target=x86_64-pc-linux-gnu --disable-bootstrap --disable-libstdcxx-verbose --with-libstdcxx-eh-pool-obj-count=0 --enable-libstdcxx-backtrace
[Bug libgcc/110017] Crossback Compilation for multilib fails on latest ubuntu due to -mx32 being disabled by the linux kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110017

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-05-28
            Version|14.0                        |unknown
             Status|UNCONFIRMED                 |WAITING

--- Comment #2 from Andrew Pinski ---
How are you configuring GCC?
[Bug libgcc/110017] Crossback Compilation for multilib fails on latest ubuntu due to -mx32 being disabled by the linux kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110017

--- Comment #1 from cqwrteur ---
(In reply to cqwrteur from comment #0)
> I attempted crossback compilation for GCC, where the compiler is built on
> Linux, runs on Windows, and is targeted for Linux again. However, the build
> system of libgcc includes a sanity test to detect the functionality of the
> compiler, which prevents the build for the -mx32 option and disables m32.
>
> Moreover, during crossback compilation, GCC specifically looks for the "cc"
> command instead of just "gcc," even in cases where it doesn't exist.
>
> Is there a way to remove or bypass the sanity test restriction for crossback
> compilation in this scenario?

Correction: the test does not check the compiler's functionality. It checks whether an -mx32 program can run, and of course it cannot, because the Linux kernel has x32 support disabled.
[Bug libgcc/110017] New: Crossback Compilation for multilib fails on latest ubuntu due to -mx32 being disabled by the linux kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110017

            Bug ID: 110017
           Summary: Crossback Compilation for multilib fails on latest ubuntu due to -mx32 being disabled by the linux kernel
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgcc
          Assignee: unassigned at gcc dot gnu.org
          Reporter: unlvsur at live dot com
  Target Milestone: ---

I attempted crossback compilation for GCC, where the compiler is built on Linux, runs on Windows, and is targeted for Linux again. However, the build system of libgcc includes a sanity test to detect the functionality of the compiler, which prevents the build for the -mx32 option and disables m32.

Moreover, during crossback compilation, GCC specifically looks for the "cc" command instead of just "gcc," even in cases where it doesn't exist.

Is there a way to remove or bypass the sanity test restriction for crossback compilation in this scenario?
[Bug ipa/109914] --suggest-attribute=pure misdiagnoses static functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109914

--- Comment #3 from Bruno Haible ---
(In reply to Jan Hubicka from comment #2)
> The reason why gcc warns is that it is unable to prove that the function is
> always finite. This means that it can not auto-detect pure attribute since
> optimizing the call out may turn infinite program to finite one.
> So adding the attribute would still help compiler to know that the loops are
> indeed finite.

Thanks for explaining. So, the warning asks the developer not only to add an __attribute__((__pure__)) marker, but also to verify that the function terminates. In this case, it does, but it took me a minute of reflection to convince myself.

For what purpose shall the developer make this effort? The documentation https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Common-Function-Attributes.html says that it's to allow the compiler to do common subexpression elimination. But in this case, the compiler could easily find out that it cannot do common subexpression elimination anyway, because:
  - The only caller of this function (have_xattr) is file_has_acl.
  - In this function, there are three calls to have_xattr.
  - Each of them is executed only at most once. Control flow analysis shows this.
  - Each of them has a different argument list: the first argument is a string literal in each case, namely "system.nfs4_acl", "system.posix_acl_access", "system.posix_acl_default" respectively.

So, there is no possibility for common subexpression elimination here, even if the function was marked "pure". Therefore it is pointless to suggest to the developer that it would be a gain to mark the function as "pure" and that it is worth spending brain cycles on that.
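The distinction drawn in this comment can be sketched as follows. has_prefix, count_same and count_distinct are hypothetical and are not the gnulib code under discussion; the sketch only contrasts a call pattern where the pure attribute gives the compiler something to merge with one where it does not.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: it only reads its arguments and always terminates,
// so marking it pure is valid.
__attribute__((pure)) bool has_prefix(const char* s, const char* prefix) {
    return std::strncmp(s, prefix, std::strlen(prefix)) == 0;
}

// Identical calls: with the pure attribute the compiler is allowed to
// evaluate has_prefix once and reuse the result.
int count_same(const char* s) {
    return has_prefix(s, "system.") + has_prefix(s, "system.");
}

// Calls with different literal arguments, as in the file_has_acl case
// described above: there is no common subexpression to eliminate.
int count_distinct(const char* s) {
    return has_prefix(s, "system.nfs4_acl")
         + has_prefix(s, "system.posix_acl_access");
}
```

Whether a given GCC version actually merges the calls in count_same depends on optimization level and inlining decisions; the attribute only makes the transformation permissible.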
[Bug tree-optimization/85234] missed optimisation opportunity for (x >> CST)!=0 is not optimized to (((unsigned)x) >= (1<
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85234 --- Comment #4 from Andrew Pinski --- Here are some testcases dealing with this and showing what still needs to be done: ``` #define N 3 #define unsigned int #define cmp != _Bool rshift(unsigned x, int t) { return (x << N) cmp 0; } _Bool rshift1(unsigned x, int t) { return (x & (-1u>>N)) cmp 0; } _Bool lshift(unsigned x, int t) { return (x >> N) cmp 0; } _Bool lshift1(unsigned x, int t) { return (x & (-1u<= (1u<
[Bug libstdc++/110016] [12/13/14] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c++                         |libstdc++

--- Comment #1 from Andrew Pinski ---
I doubt this is a code generation issue but rather either a libstdc++ issue or a problem in the code itself. Inlining if anything might expose a race condition that was in the code more often than not.
[Bug c/110007] Implement support for Clang’s __builtin_unpredictable()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110007

--- Comment #8 from Richard Yao ---
(In reply to Alexander Monakov from comment #6)
> Are you sure the branch is unpredictable in your micro-benchmark? If you
> have repeated runs you'll train the predictors.

(In reply to Jan Hubicka from comment #7)
> Also note that branch predicted with 50% outcome is not necessarily
> unpredictable for example in this:
>
> for (int i = 0; i < 1; i++)
>   if (i&1)
>
>
> I would expect branch predictor to work this out on modern systems.
> So having explicit flag in branch_probability that given probability is hard
> for CPU to predict would make sense and I was thinking we may try to get
> this info from auto-fdo eventually too.

Good point. I had reused an existing micro-benchmark, but it is using libc's srand(), which is known for not having great quality RNG. It is quite possible that the last branch really is predictable because of that. Having only 1 unpredictable branch is not that terrible, so I probably will defer looking into this further to a future date.

(In reply to Alexander Monakov from comment #6)
> Implementing a __builtin_branchless_select would address such needs more
> directly. There were similar requests in the past, two of them related to
> qsort and bsearch, unsurprisingly: PR 93165, PR 97734, PR 106804.

As a developer that works on a project that supports GCC and Clang equally, but must support older versions of GCC longer, I would like to see both GCC and Clang adopt each others' builtins. That way, I need to implement fewer compatibility shims to support both compilers and what compatibility shims I do need can be dropped sooner (maybe after 10 years).

I am not against the new builtin, but I would like to also have __builtin_unpredictable(). It would be consistent with the existing likely/unlikely macros that my project uses, which should mean other developers will not have to learn the specifics of how predication works to read code using it, since they can just treat it as a compiler hint and read things as if it were not there unless they have some reason to reason more deeply about things.

I could just do a macro around __builtin_expect_with_probability(), but I greatly prefer __builtin_unpredictable() since people reading the code do not need to decipher the argument list to understand what it does since the name is self documenting.
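The macro approach mentioned at the end of this comment might look like the sketch below. UNPREDICTABLE and select_max are hypothetical names, and the 0.5 probability is only an approximation: as comment #7 points out, a branch taken half the time is not necessarily hard to predict, so this is not equivalent to Clang's builtin. The sketch assumes a compiler that provides __has_builtin; where it is missing, the macro degrades to the plain condition.

```cpp
#include <cassert>

#ifndef __has_builtin
#define __has_builtin(x) 0  // compilers without __has_builtin get the plain fallback
#endif

#if __has_builtin(__builtin_unpredictable)
// Clang: use the dedicated builtin.
#define UNPREDICTABLE(cond) __builtin_unpredictable(!!(cond))
#elif __has_builtin(__builtin_expect_with_probability)
// GCC: approximate with a 50% probability hint (an assumption, see above).
#define UNPREDICTABLE(cond) __builtin_expect_with_probability(!!(cond), 1, 0.5)
#else
#define UNPREDICTABLE(cond) (!!(cond))
#endif

// Hypothetical user of the macro; the hint never changes the result.
int select_max(int a, int b) {
    if (UNPREDICTABLE(a > b))
        return a;
    return b;
}
```

The readability argument in the comment still applies: the shim hides the argument list, but a reader has to look up the macro to learn that 0.5 is standing in for "unpredictable".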
[Bug c++/110016] New: [12/13/14] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110016

            Bug ID: 110016
           Summary: [12/13/14] Possible miscodegen when inlining std::condition_variable::wait predicate causes deadlock
           Product: gcc
           Version: 12.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: amy at amyspark dot me
  Target Milestone: ---

Created attachment 55181
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55181&action=edit
Minimum test case to reproduce the deadlock

Hi all,

This is to report a possible codegen issue when inlining a lambda predicate for std::condition_variable::wait. We've verified this to happen with the following versions:

- g++-8 (Homebrew GCC 8.5.0) 8.5.0
- g++.exe (Rev6, Built by MSYS2 project) 13.1.0 (both UCRT64 and MINGW64)
- g++ (Compiler-Explorer-Build-gcc-4579954f25020f0b39361ab6ec0c8876fda27041-binutils-2.40) 14.0.0 20230522 (experimental)

The deadlock seems to happen with 100% certainty on GCC 12.2.1 if one enables ThreadSanitizer; otherwise it happens sporadically in CI.

I packaged a reduced version of the test suite: https://godbolt.org/z/fj8rnrbo7, a copy of which you'll find attached to this report. Build with `-std=c++17 -pthread -O2 -fsanitize=thread`.

In all cases, once the deadlock is hit (wait for ~3 seconds under GDB) the "finished" atomic boolean and the "workQueue" are correctly flagged as true and empty, respectively; however, the thread will still wait for the condition variable indefinitely. This can be easily worked around by blocking the inlining, e.g. by turning the lambda into a std::bind instance.

The complete code of the library where we reproduced this is available here: https://github.com/bad-alloc-heavy-industries/substrate/tree/375db811308ad7414771dbde9af4efa7aa393ca8. You can build it with `meson setup build -Dcpp_std=c++17 -Db_sanitize=thread` and run the test with `meson test -C build`.
[Bug tree-optimization/109985] __builtin_prefetch ignored by GCC 12/13
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109985

--- Comment #6 from Jan Hubicka ---
Created attachment 55180
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55180&action=edit
untested patch

It turns out that, as modref was written for memory loads/stores only and side-effect discovery was retrofitted later, I forgot to revisit the code handling CONST and NOVOPS together. There are quite a few places where we cannot short-circuit on NOVOPS and must be sure we merge in the side effects and determinism flags.
[Bug tree-optimization/109985] __builtin_prefetch ignored by GCC 12/13
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109985

Jan Hubicka changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #5 from Jan Hubicka ---
Hmm, this is slippery. So novops tells gcc that the function has no memory side effects, and in turn we optimize out the call? I think we need to handle novops as having side-effects.
[Bug ipa/109914] --suggest-attribute=pure misdiagnoses static functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109914 --- Comment #2 from Jan Hubicka --- The reason why gcc warns is that it is unable to prove that the function is always finite. This means that it can not auto-detect pure attribute since optimizing the call out may turn infinite program to finite one. So adding the attribute would still help compiler to know that the loops are indeed finite.
[Bug middle-end/79704] [meta-bug] Phoronix Test Suite compiler performance issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79704

Jan Hubicka changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #2 from Jan Hubicka ---
Note that I also tried to reproduce tsvc from https://www.phoronix.com/review/gcc13-clang16-raptorlake/3 (on a zen3 machine), and there performance seems OK (GCC does 2168417 nodes/s and clang 2159913).

liquid-dsp fails for me:

pts/liquid-dsp-1.0.0:
    Test Installation 1 of 1
    1 File Needed [0.74 MB / 1 Minute]
    File Found: liquid-dsp-20210131.tar.xz [0.74MB]
    Approximate Install Size: 68 MB
    Estimated Install Time: 6 Seconds
    Installing Test @ 20:09:38
        The installer exited with a non-zero exit status.
        ERROR: /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: cannot find -lliquid: No such file or directory
        Installing the package 'file' might fix this error.
        LOG: ~/.phoronix-test-suite/installed-tests/pts/liquid-dsp-1.0.0/install-failed.log

It seems the build script is broken and actually links to the system-wide liquid-dsp.
[Bug middle-end/110015] openjpeg is slower when built with gcc13 compared to clang16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110015

--- Comment #1 from Jan Hubicka ---
opj_t1_enc_refpass is not inlined due to large function growth, and some others due to max-inline-insns-auto. With inlining forced I get this profile:

  87.35%  opj_t1_cblk_encode_processor
   6.22%  opj_dwt_encode_and_deinterleave_v.lto_priv.0
   1.80%  opj_mqc_byteout
   1.50%  opj_dwt_encode_and_deinterleave_h_one_row.lto_priv.0

So pretty much the same profile as for clang. However, runtime is still 45573 with

  -O3 -flto -march=native -fno-semantic-interposition --param large-function-insns=100 --param max-inline-insns-auto=5

So it does not seem to be missing IPA optimizations. There are a number of conditional moves in the clang code; -mbranch-cost helps a bit, but not enough.
[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188

--- Comment #9 from Georg-Johann Lay ---
The bug works as follows:

postreload.cc::reload_cse_move2add() loops over all insns, and at some point it encounters

(insn 44 14 15 2 (set (reg/f:HI 14 r14 [58])
        (reg/v/f:HI 16 r16 [orig:51 self ] [51])) "fail1.c":28:5 101 {*movhi_split}
     (nil))

During the analysis for that insn, it executes

    rtx_insn *next = next_nonnote_nondebug_insn (insn);
    rtx set = NULL_RTX;
    if (next)
      set = single_set (next);

where next is

(insn 15 44 16 2 (parallel [
            (set (reg/f:HI 14 r14 [58])
                (plus:HI (reg/f:HI 14 r14 [58])
                    (const_int 68 [0x44])))
            (clobber (reg:QI 31 r31))
        ]) "fail1.c":28:5 175 {addhi3_clobber}
     (nil))

Further down, it continues with success = 0:

    if (success)
      delete_insn (insn);
    changed |= success;
    insn = next;
    [...]
    continue;

The scan then continues with NEXT_INSN (insn), which is the insn AFTER insn 15, so the CLOBBER of QI:31 in insn 15 is bypassed, and note_stores or similar is never executed on insn 15.

The "set = single_set (next)" also hides the fact that insn 15 is a PARALLEL with a CLOBBER of a general purpose register.

It appears the code has been in postreload since 2003, when postreload.c was split out of reload1.c.
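The control-flow problem described in this comment can be illustrated with a toy model. This is not GCC code: ToyInsn, its fields and noted_clobbers are invented for the sketch, and the real pass tracks far more state. It only shows how advancing the loop variable to the partner insn and then hitting the loop increment steps past that insn without ever examining its clobber.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Invented stand-in for an RTL insn.
struct ToyInsn {
    int uid;
    bool has_clobber;      // stands in for (clobber (reg:QI 31 r31)) in insn 15
    bool pairs_with_next;  // stands in for the move-then-add pattern at insn 44
};

// Returns the UIDs of insns whose clobbers the scan took note of.
std::vector<int> noted_clobbers(const std::vector<ToyInsn>& insns,
                                bool skip_partner) {
    std::vector<int> noted;
    for (std::size_t i = 0; i < insns.size(); ++i) {
        const ToyInsn& insn = insns[i];
        if (skip_partner && insn.pairs_with_next && i + 1 < insns.size()) {
            // Models "insn = next; continue;": the loop increment then moves
            // to the insn AFTER the partner, so the partner is never examined.
            ++i;
            continue;
        }
        if (insn.has_clobber)
            noted.push_back(insn.uid);
    }
    return noted;
}
```

With the sequence 14, 44, 15, 16 from the report, the skipping variant never notes the clobber in insn 15, while the variant that lets the scan reach insn 15 does.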
[Bug middle-end/110015] New: openjpeg is slower when built with gcc13 compared to clang16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110015

            Bug ID: 110015
           Summary: openjpeg is slower when built with gcc13 compared to clang16
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

I tried to reproduce the openjpeg benchmarks from Phoronix: https://www.phoronix.com/review/gcc13-clang16-raptorlake/5

On zen3 hardware I get 42607ms for the clang build and 45702ms for the gcc build, which is a 7% difference (Phoronix reports 10% on RaptorLake).

perf of clang build:

  88.64%  opj_t1_cblk_encode_processor
   6.68%  opj_dwt_encode_and_deinterleave_v
   1.30%  opj_dwt_encode_and_deinterleave_h_one_row

opj_t1_cblk_encode_processor is huge with no obvious hot spots.

perf of gcc build:

  70.36%  opj_t1_cblk_encode_processor
  16.12%  opj_t1_enc_refpass.lto_priv.0
   3.88%  opj_dwt_encode_and_deinterleave_v
   2.46%  opj_dwt_fetch_cols_vertical_pass
   2.35%  opj_mqc_byteout

So we apparently inline less even at -O3.
[Bug fortran/88486] ICE in gfc_conv_scalarized_array_ref, at fortran/trans-array.c:3401
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88486

--- Comment #5 from kargl at gcc dot gnu.org ---
(In reply to G. Steinmetz from comment #0)
> Affects versions down to at least gfortran-5.
> Under the hood related to pr85686.
>
>
> $ cat z1.f90
> subroutine s(x)
>    character(:), allocatable :: x(:)
>    x = ['bcd']
>    x = ['a'//x//'e']
>    print *, x
> end
>

This compiles with GNU Fortran (GCC) 13.0.1 20230408 (experimental). This has an ICE with GNU Fortran (FreeBSD Ports Collection) 12.2.0.

Filling out the code to something that actually does something reveals a wrong-code issue with an array constructor. There are a boatload of warnings about uninitialized variables, e.g.,

a.f90:53:34:

   53 | x = [ ('a' // x // 'e') ]
      | ^
Warning: '__var_1_realloc_string.dim[0].ubound' is used uninitialized [-Wuninitialized]
a.f90:1:11:

    1 | program foo
      | ^
note: '__var_1_realloc_string' declared here
a.f90:3:36:

    3 | character(:), allocatable :: a(:)
      | ^
Warning: '.a' is used uninitialized [-Wuninitialized]
a.f90:34:22:

   34 | end subroutine s
      | ^
note: '.a' declared here

program foo

   character(:), allocatable :: a(:)

   call s(a)
   print '(A,1X,2(I0,1X),/)', 'a: >>' // a // '<<', size(a), len(a(1))
   if (allocated(a)) deallocate(a)

   call t(a)
   print '(A,1X,2(I0,1X),/)', 'a: >>' // a // '<<', size(a), len(a(1))
   if (allocated(a)) deallocate(a)

   call u(a)
   print '(A,1X,2(I0,1X),/)', 'a: >>' // a // '<<', size(a), len(a(1))
   if (allocated(a)) deallocate(a)

   call v(a)
   print '(A,1X,2(I0,1X),/)', 'a: >>' // a // '<<', size(a), len(a(1))
   if (allocated(a)) deallocate(a)

   call w(a)
   print '(A,1X,2(I0,1X),/)', 'a: >>' // a // '<<', size(a), len(a(1))
   if (allocated(a)) deallocate(a)

   contains

      subroutine s(x)
         character(:), allocatable :: x(:)
         x = ['bcd']
         x = ['a' // x // 'e']
         print '(A,1X,2(I0,1X))', 's: >>' // x // '<<', size(x), len(x(1))
      end subroutine s

      subroutine t(x)
         character(:), allocatable :: x(:)
         x = ['bcd']
         x = 'a' // x // 'e'
         print '(A,1X,2(I0,1X))', 't: >>' // x // '<<', size(x), len(x(1))
      end subroutine t

      subroutine u(x)
         character(:), allocatable :: x(:)
         x = ['bcd']
         x = [ ('a' // x // 'e') ]
         print '(A,1X,2(I0,1X))', 'u: >>' // x // '<<', size(x), len(x(1))
      end subroutine u

      subroutine v(x)
         character(:), allocatable, intent(out) :: x(:)
         x = ['bcd']
         x = [ ('a' // x // 'e') ]
         print '(A,1X,2(I0,1X))', 'v: >>' // x // '<<', size(x), len(x(1))
      end subroutine v

      subroutine w(x)
         character(:), allocatable, intent(out) :: x(:)
         x = [ 'a' // ['bcd'] // 'e' ]
         print '(A,1X,2(I0,1X))', 'w: >>' // x // '<<', size(x), len(x(1))
      end subroutine w

end program foo

Output:

s: >>abcde<< 1 5
a: >>abc<< 1 3     <--- whoops

t: >>abcde<< 1 5
a: >>abcde<< 1 5

u: >>abcde<< 1 5
a: >>abc<< 1 3     <--- whoops

v: >>abcde<< 1 5
a: >>abc<< 1 3     <--- whoops

w: >>abcde<< 1 5
a: >>abcde<< 1 5
[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812

--- Comment #10 from Jan Hubicka ---
This is a benchmarkable version of the simplified testcase:

jan@localhost:/tmp> cat t.c
#define N 1000
struct rgb {unsigned char r,g,b;} rgbs[N];
int *addr;
struct drgb {double r,g,b;
#ifdef OPACITY
  double o;
#endif
};

struct drgb sum(double w)
{
  struct drgb r;
  for (int i = 0; i < N; i++)
    {
      r.r += rgbs[i].r * w;
      r.g += rgbs[i].g * w;
      r.b += rgbs[i].b * w;
    }
  return r;
}
jan@localhost:/tmp> cat q.c
struct drgb {double r,g,b;
#ifdef OPACITY
  double o;
#endif
};
struct drgb sum(double w);
int
main()
{
  for (int i = 0; i < 1000; i++)
    sum(i);
}

jan@localhost:/tmp> gcc t.c q.c -march=native -O3 -g ; objdump -d a.out | grep vfmadd231pd ; perf stat ./a.out
  40119d:       c4 e2 d9 b8 d1          vfmadd231pd %xmm1,%xmm4,%xmm2

 Performance counter stats for './a.out':

         12,148.04 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
               736      page-faults:u             #   60.586 /sec
    50,018,421,148      cycles:u                  #    4.117 GHz
           220,502      stalled-cycles-frontend:u #    0.00% frontend cycles idle
    39,950,154,369      stalled-cycles-backend:u  #   79.87% backend cycles idle
   120,000,191,713      instructions:u            #    2.40  insn per cycle
                                                  #    0.33  stalled cycles per insn
    10,000,048,918      branches:u                #  823.182 M/sec
             7,959      branch-misses:u           #    0.00% of all branches

      12.149466078 seconds time elapsed

      12.149084000 seconds user
       0.0 seconds sys

jan@localhost:/tmp> gcc t.c q.c -march=native -O3 -g -DOPACITY ; objdump -d a.out | grep vfmadd231pd ; perf stat ./a.out

 Performance counter stats for './a.out':

         12,141.11 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
               735      page-faults:u             #   60.538 /sec
    50,018,839,129      cycles:u                  #    4.120 GHz
           185,034      stalled-cycles-frontend:u #    0.00% frontend cycles idle
    29,963,999,798      stalled-cycles-backend:u  #   59.91% backend cycles idle
   120,000,191,729      instructions:u            #    2.40  insn per cycle
                                                  #    0.25  stalled cycles per insn
    10,000,048,913      branches:u                #  823.652 M/sec
             7,311      branch-misses:u           #    0.00% of all branches

      12.142252354 seconds time elapsed

      12.138237000 seconds user
       0.00400 seconds sys

So on zen2 hardware I get the same performance on both. It may be interesting to test it on Raptor Lake.
[Bug fortran/88486] ICE in gfc_conv_scalarized_array_ref, at fortran/trans-array.c:3401
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88486

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
                 CC|                            |kargl at gcc dot gnu.org

--- Comment #4 from kargl at gcc dot gnu.org ---
(In reply to anlauf from comment #3)
> Further reduced:
>
> subroutine s(x)
>    character(:), allocatable :: x(:)
>    character(:), allocatable :: y(:)
>    y = [x//'a']
> end

This compiles with GNU Fortran (GCC) 13.0.1 20230408 (experimental).
This has an ICE with GNU Fortran (FreeBSD Ports Collection) 12.2.0.
[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812 --- Comment #9 from Jan Hubicka --- Oddly enough simplified version of the loop SLP vectorizes for me: struct rgb {unsigned char r,g,b;} *rgbs; int *addr; double *weights; struct drgb {double r,g,b;}; struct drgb sum() { struct drgb r; for (int i = 0; i < 10; i++) { int j = addr[i]; double w = weights[i]; r.r += rgbs[j].r * w; r.g += rgbs[j].g * w; r.b += rgbs[j].b * w; } return r; } I get: L2: movslq (%r9,%rdx,4), %rax vmovsd (%r8,%rdx,8), %xmm1 incq%rdx leaq(%rax,%rax,2), %rax addq%rsi, %rax movzbl (%rax), %ecx vmovddup%xmm1, %xmm4 vmovd %ecx, %xmm0 movzbl 1(%rax), %ecx movzbl 2(%rax), %eax vpinsrd $1, %ecx, %xmm0, %xmm0 vcvtdq2pd %xmm0, %xmm0 vfmadd231pd %xmm4, %xmm0, %xmm2 vcvtsi2sdl %eax, %xmm5, %xmm0 vfmadd231sd %xmm1, %xmm0, %xmm3 cmpq$10, %rdx jne .L2 I think the actual loop is: [local count: 44202554]: _106 = _262->pixel; _109 = *source_231(D).columns; [local count: 401841405]: # pixel$green_332 = PHI <_124(89), pixel$green_265(53)> # i_357 = PHI # pixel$red_371 = PHI <_119(89), pixel$red_263(53)> # pixel$blue_377 = PHI <_129(89), pixel$blue_267(53)> i.51_102 = (long unsigned int) i_357; _103 = i.51_102 * 16; _104 = _262 + _103; _105 = _104->pixel; _107 = _105 - _106; _108 = (long unsigned int) _107; _110 = _108 * _109; _112 = _110 + _621; weight_297 = _104->weight; _113 = _112 * 4; _114 = _276 + _113; _115 = _114->red; _116 = (int) _115; _117 = (double) _116; _118 = _117 * weight_297; _119 = _118 + pixel$red_371; _120 = _114->green; _121 = (int) _120; _122 = (double) _121; _123 = _122 * weight_297; _124 = _123 + pixel$green_332; _125 = _114->blue; _126 = (int) _125; _127 = (double) _126; _128 = _127 * weight_297; _129 = _128 + pixel$blue_377; i_298 = i_357 + 1; if (n_195 > i_298) goto ; [89.00%] else goto ; [11.00%] [local count: 44202554]: # _607 = PHI <_124(54)> # _606 = PHI <_119(54)> # _605 = PHI <_129(54)> goto ; [100.00%] [local count: 357638851]: goto ; [100.00%] and SLP vectorizer seems to claim: 
../magick/resize.c:1284:52: note: _125 = _114->blue; ../magick/resize.c:1284:52: note: _120 = _114->green; ../magick/resize.c:1284:52: note: _115 = _114->red; ../magick/resize.c:1284:52: missed: not consecutive access weight_297 = _104->weight; ../magick/resize.c:1284:52: missed: not consecutive access _105 = _104->pixel; ../magick/resize.c:1284:52: missed: not consecutive access _134->red = iftmp.57_207; ../magick/resize.c:1284:52: missed: not consecutive access _134->green = iftmp.60_208; ../magick/resize.c:1284:52: missed: not consecutive access _134->blue = iftmp.63_209; ../magick/resize.c:1284:52: missed: not consecutive access _134->opacity = 0; ../magick/resize.c:1284:52: missed: not consecutive access _63 = *source_231(D).columns; ../magick/resize.c:1284:52: missed: not consecutive access _60 = _262->pixel; Not sure if that is related to the real testcase: struct rgb {unsigned char r,g,b;} *rgbs; int *addr; double *weights; struct drgb {double r,g,b,o;}; struct drgb sum() { struct drgb r; for (int i = 0; i < 10; i++) { int j = addr[i]; double w = weights[i]; r.r += rgbs[j].r * w; r.g += rgbs[j].g * w; r.b += rgbs[j].b * w; } return r; } make us to miss the vectorization even though there is nothing using drgb->o: sum: .LFB0: .cfi_startproc movq%rdi, %r8 movqweights(%rip), %rsi movqaddr(%rip), %rdi vxorps %xmm2, %xmm2, %xmm2 movqrgbs(%rip), %rcx xorl%edx, %edx .p2align 4 .p2align 3 .L2: movslq (%rdi,%rdx,4), %rax vmovsd (%rsi,%rdx,8), %xmm0 incq%rdx leaq(%rax,%rax,2), %rax addq%rcx, %rax movzbl (%rax), %r9d vcvtsi2sdl %r9d, %xmm2, %xmm1 movzbl 1(%rax), %r9d movzbl 2(%rax), %eax vfmadd231sd %xmm0, %xmm1, %xmm3 vcvtsi2sdl %r9d, %xmm2, %xmm1 vfmadd231sd %xmm0, %xmm1, %xmm5 vcvtsi2sdl %eax, %xmm2, %xmm1 vfmadd231sd %xmm0, %xmm1, %xmm4 cmpq$10, %rdx jne .L2 vmovq %xmm4, %xmm4 vunpcklpd %xmm5, %xmm3, %xmm0 movq%r8, %rax vinsertf128 $0x1, %xmm4, %ymm0, %ymm0 vmovupd %ymm0, (%r8) vzeroupper ret
[Bug analyzer/110014] New: -Wanalyzer-allocation-size mishandles realloc (..., .... * sizeof (object))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110014 Bug ID: 110014 Summary: -Wanalyzer-allocation-size mishandles realloc (..., * sizeof (object)) Product: gcc Version: 13.1.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: analyzer Assignee: dmalcolm at gcc dot gnu.org Reporter: eggert at cs dot ucla.edu Target Milestone: --- Created attachment 55179 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55179=edit compile with 'gcc -fanalyzer -S' to reproduce the bug This is a followup to bug 109577, and reports a more serious problem with -Wanalyzer-allocation-size: it mishandles realloc even when the last argument is obviously a multiple of the object size. I discovered this problem when compiling an experimental version of GNU diffutils. This is with gcc (GCC) 13.1.1 20230511 (Red Hat 13.1.1-2) x86-64. Compile the attached program with: gcc -fanalyzer -S w.i The output is as follows. All the warnings are incorrect. The last warning is for a call of the form realloc(p, N * sizeof (long)) even though the result is used as a long * so the call is obviously well-sized. w.i: In function ‘slurp’: w.i:11:14: warning: allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size] 11 | buffer = realloc (buffer, cc); | ^~~~ ‘slurp’: events 1-4 | |9 | if (!__builtin_add_overflow (file_size - file_size % sizeof (long), | | ^ | | | | | (1) following ‘true’ branch... 
| 10 |2 * sizeof (long), )) | 11 | buffer = realloc (buffer, cc); | | | | | | | (2) ...to here | | (3) allocated ‘cc’ bytes here | | (4) assigned to ‘long int *’ here; ‘sizeof (long int)’ is ‘8’ | w.i: In function ‘slurp1’: w.i:18:10: warning: allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size] 18 | return realloc (buffer, file_size - file_size % sizeof (long)); | ^~~ ‘slurp1’: events 1-2 | | 18 | return realloc (buffer, file_size - file_size % sizeof (long)); | | ^~~ | | | | | (1) allocated ‘file_size & 18446744073709551608’ bytes here | | (2) assigned to ‘long int *’ here; ‘sizeof (long int)’ is ‘8’ | w.i: In function ‘slurp2’: w.i:24:10: warning: allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size] 24 | return realloc (buffer, (file_size / sizeof (long)) * sizeof (long)); | ^ ‘slurp2’: events 1-2 | | 24 | return realloc (buffer, (file_size / sizeof (long)) * sizeof (long)); | | ^ | | | | | (1) allocated ‘file_size & 18446744073709551608’ bytes here | | (2) assigned to ‘long int *’ here; ‘sizeof (long int)’ is ‘8’ |
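The three size arguments in the warnings above can be checked independently of the analyzer. The sketch below is not the attached w.i; the helper names and sample values are made up for illustration, and the mask form assumes that sizeof (long) is a power of two. It checks that the subtract-the-remainder form (slurp1) and the divide-then-multiply form (slurp2) both yield a multiple of sizeof (long), and that both match the mask form the analyzer prints ('file_size & 18446744073709551608' when sizeof (long) is 8).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers mirroring the size arguments in slurp1 and slurp2.  */
static size_t
round_down_sub (size_t n)
{
  return n - n % sizeof (long);
}

static size_t
round_down_div (size_t n)
{
  return (n / sizeof (long)) * sizeof (long);
}

/* Mask form, as printed in the analyzer events.
   Assumes sizeof (long) is a power of two.  */
static size_t
round_down_mask (size_t n)
{
  return n & ~(sizeof (long) - 1);
}

/* Returns 1 if all three forms agree and give a multiple of sizeof (long).  */
static int
sizes_are_well_formed (size_t n)
{
  size_t s = round_down_sub (n);
  return s % sizeof (long) == 0
         && s == round_down_div (n)
         && s == round_down_mask (n);
}

/* Walks a few illustrative sizes; aborts if any check fails.  */
static void
check_samples (void)
{
  static const size_t samples[] = { 0, 1, 7, 8, 9, 4095, 4096, 123457 };
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (sizes_are_well_formed (samples[i]));
}
```

Calling check_samples () from a small main is enough to exercise it. Since every value passes, a size computed this way and assigned to a 'long *' is well-sized by construction, which is the point of the report.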
[Bug fortran/68241] [meta-bug] [F03] Deferred-length character
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68241
Bug 68241 depends on bug 65381, which changed state.

Bug 65381 Summary: [10/11/12/13/14 Regression] ICE during array result, assignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65381

           What    |Removed                     |Added
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED
[Bug fortran/65381] [10/11/12/13/14 Regression] ICE during array result, assignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65381

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
                 CC|                            |kargl at gcc dot gnu.org
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from kargl at gcc dot gnu.org ---
All of the codes in this bug report compile with

GNU Fortran (FreeBSD Ports Collection) 12.2.0
GNU Fortran (GCC) 13.0.1 20230408 (experimental)
[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812

--- Comment #8 from Jan Hubicka ---
Created attachment 55178
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55178&action=edit
Preprocessed source of VerticalFilter and HorizontalFilter
[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812 Jan Hubicka changed: What|Removed |Added Summary|GraphicsMagick resize is a |GraphicsMagick resize is a |lot slower in GCC 13.1 vs |lot slower in GCC 13.1 vs |Clang 16|Clang 16 on Intel Raptor ||Lake --- Comment #7 from Jan Hubicka --- On zen3 hardware I get GCC: GraphicsMagick 1.3.38: pts/graphics-magick-2.1.0 [Operation: Resizing] Test 1 of 1 Estimated Trial Run Count:3 Estimated Time To Completion: 4 Minutes [17:00 UTC] Started Run 1 @ 16:57:17 Started Run 2 @ 16:58:22 Started Run 3 @ 16:59:26 Operation: Resizing: 1390 1386 1383 Average: 1386 Iterations Per Minute Deviation: 0.25% clang16: GraphicsMagick 1.3.38: pts/graphics-magick-2.1.0 [Operation: Resizing] Test 1 of 1 Estimated Trial Run Count:3 Estimated Time To Completion: 4 Minutes [16:54 UTC] Started Run 1 @ 16:51:48 Started Run 2 @ 16:52:52 Started Run 3 @ 16:53:56 Operation: Resizing: 180 180 180 Average: 180 Iterations Per Minute Deviation: 0.00% GCC profile: 52.07% VerticalFilter._omp_fn.0 24.59% HorizontalFilter._omp_fn.0 11.78% ReadCachePixels.isra.0 Clang does not seem to have openmp in it, so to get comparable runs I added OMP_THREAD_LIMIT=1 With this I get: GraphicsMagick 1.3.38: pts/graphics-magick-2.1.0 [Operation: Resizing] Test 1 of 1 Estimated Trial Run Count:3 Estimated Time To Completion: 4 Minutes [17:17 UTC] Started Run 1 @ 17:14:14 Started Run 2 @ 17:15:18 Started Run 3 @ 17:16:22 Operation: Resizing: 184 186 186 Average: 185 Iterations Per Minute Deviation: 0.62% so GCC build is still bit faster. 
Internal loop of VerticalFilter is:

  0.00 │4a0:┌─→mov         0x8(%rdx),%rax
  1.33 │    │  vmovsd      (%rdx),%xmm1
  1.58 │    │  add         $0x10,%rdx
  0.00 │    │  sub         %r13,%rax
  4.77 │    │  imul        %r11,%rax
  1.01 │    │  add         %rcx,%rax
  0.04 │    │  movzbl      0x2(%r15,%rax,4),%r10d
  8.38 │    │  vcvtsi2sd   %r10d,%xmm2,%xmm0
  2.44 │    │  movzbl      0x1(%r15,%rax,4),%r10d
  1.55 │    │  movzbl      (%r15,%rax,4),%eax
  0.00 │    │  vfmadd231sd %xmm0,%xmm1,%xmm4
 13.91 │    │  vcvtsi2sd   %r10d,%xmm2,%xmm0
  1.86 │    │  vfmadd231sd %xmm0,%xmm1,%xmm5
 13.00 │    │  vcvtsi2sd   %eax,%xmm2,%xmm0
  2.02 │    │  vfmadd231sd %xmm0,%xmm1,%xmm3
 12.54 │    ├──cmp         %rdx,%rdi
  0.00 │    └──jne         4a0

HorizontalFilter:

  0.01 │520:┌─→mov         0x8(%r8),%rdx
  0.96 │    │  vmovsd      (%r8),%xmm1
  1.93 │    │  add         $0x10,%r8
  0.50 │    │  sub         %r15,%rdx
  4.02 │    │  add         %r11,%rdx
  2.26 │    │  movzbl      0x2(%r14,%rdx,4),%ebx
  0.09 │    │  vcvtsi2sd   %ebx,%xmm2,%xmm0
 10.10 │    │  movzbl      0x1(%r14,%rdx,4),%ebx
  0.92 │    │  movzbl      (%r14,%rdx,4),%edx
  1.84 │    │  vfmadd231sd %xmm0,%xmm1,%xmm4
  6.82 │    │  vcvtsi2sd   %ebx,%xmm2,%xmm0
 11.15 │    │  vfmadd231sd %xmm0,%xmm1,%xmm3
 13.81 │    │  vcvtsi2sd   %edx,%xmm2,%xmm0
  6.16 │    │  vfmadd231sd %xmm0,%xmm1,%xmm5
  8.61 │    ├──cmp         %rsi,%r8
  1.56 │    └──jne         520

ReadCachePixels:

       │2e0:┌─→mov         (%rbx,%rax,4),%edx
 83.03 │    │  mov         %edx,(%r12,%rax,4)
 12.34 │    │  inc         %rax
  0.02 │    ├──cmp         %rsi,%rax

With Clang I get:
[Bug target/64331] regcprop propagates registers noted as REG_DEAD
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64331

Georg-Johann Lay changed:

           What    |Removed                     |Added
             Status|ASSIGNED                    |NEW
[Bug target/64331] regcprop propagates registers noted as REG_DEAD
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64331

Georg-Johann Lay changed:

           What    |Removed                     |Added
           Assignee|gjl at gcc dot gnu.org      |unassigned at gcc dot gnu.org

--- Comment #13 from Georg-Johann Lay ---
Resetting assignee to default.

The AVR backend solved the problem by a target-specific mini-pass that (re)computes notes as late as possible.
[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812

Jan Hubicka changed:

           What    |Removed                     |Added
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #6 from Jan Hubicka ---
I installed the Phoronix test suite and uploaded the sample data it uses to
http://www.ucw.cz/~hubicka/sample-photo-6000x4000-1.zip

I doubt they make much difference, especially for resizing.
[Bug c/110007] Implement support for Clang’s __builtin_unpredictable()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110007

Jan Hubicka changed:

           What    |Removed                     |Added
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #7 from Jan Hubicka ---
Also note that a branch predicted with a 50% outcome is not necessarily unpredictable, for example in this:

  for (int i = 0; i < 1; i++)
    if (i&1)

I would expect the branch predictor to work this out on modern systems. So having an explicit flag in branch_probability saying that a given probability is hard for the CPU to predict would make sense, and I was thinking we may try to get this info from auto-fdo eventually too.
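To make the distinction between "50% probability" and "unpredictable" concrete, here is a minimal sketch. It records two outcome streams that are each taken about half the time: one follows the 'i & 1' test from the comment and strictly alternates, the other is driven by a xorshift32 generator. The generator, the seed, the trip count N and all function names are my own choices for illustration and are not from the PR. The sketch only counts how often a trivial "same as two iterations ago" guess would be wrong; it does not model a real branch predictor.

```c
#include <assert.h>
#include <stdint.h>

enum { N = 10000 };	/* Hypothetical trip count, for illustration only.  */

/* Record the outcomes of "if (i & 1)" for i in [0, n): strictly alternating.  */
static void
fill_alternating (unsigned char *out, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = (i & 1) != 0;
}

/* Record outcomes taken from one bit of a xorshift32 stream.
   The seed must be nonzero.  */
static void
fill_pseudo_random (unsigned char *out, int n, uint32_t seed)
{
  uint32_t x = seed;
  for (int i = 0; i < n; i++)
    {
      x ^= x << 13;
      x ^= x >> 17;
      x ^= x << 5;
      out[i] = (x >> 15) & 1;
    }
}

/* Number of taken outcomes.  */
static int
count_taken (const unsigned char *out, int n)
{
  int taken = 0;
  for (int i = 0; i < n; i++)
    taken += out[i];
  return taken;
}

/* Number of positions whose outcome differs from the one two iterations
   earlier, i.e. how often a "repeat with period 2" guess would be wrong.  */
static int
period2_misses (const unsigned char *out, int n)
{
  int misses = 0;
  for (int i = 2; i < n; i++)
    misses += out[i] != out[i - 2];
  return misses;
}

/* Both streams are near 50% taken, but only one has a learnable pattern.  */
static void
self_check (void)
{
  static unsigned char a[N], r[N];
  fill_alternating (a, N);
  fill_pseudo_random (r, N, 2463534242u);
  assert (count_taken (a, N) == N / 2);
  assert (period2_misses (a, N) == 0);
  assert (period2_misses (r, N) > N / 10);
}
```

The alternating stream never breaks the period-2 pattern, while the pseudo-random one breaks it often. A static probability of 50% cannot tell the two apart, which is why a separate "hard to predict" flag would carry extra information.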
[Bug fortran/99139] ICE: gfc_get_default_type(): Bad symbol '__tmp_UNKNOWN_0_rank_1'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99139

--- Comment #5 from kargl at gcc dot gnu.org ---
(In reply to sandra from comment #4)
> The problem noted in comment 1 looks related to PR 102641 --
> automatically-inserted implicit initialization code can't cope with
> assumed-rank arrays.

I don't think it is related. PR102601 involves default initialization and/or deallocation of an actual argument associated with an intent(out) assumed-rank dummy argument.

> I tested the patch in comment 2 and saw a whole lot of regressions (ICEs).
> :-(

The patch in comment #2 needed to be moved down into the 'if (m == MATCH_YES)' block where 'expr2 != NULL'. The following has been regtested with no new regressions.

diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc
index 5eb6d0e1c1d..0a030ae01df 100644
--- a/gcc/fortran/match.cc
+++ b/gcc/fortran/match.cc
@@ -6770,8 +6770,20 @@ gfc_match_select_rank (void)
   gfc_current_ns = gfc_build_block_ns (ns);
   m = gfc_match (" %n => %e", name, &expr2);
+
   if (m == MATCH_YES)
     {
+      /* If expr2 corresponds to an implicitly typed variable, then the
+        actual type of the variable may not have been set.  Set it here.  */
+      if (!gfc_current_ns->seen_implicit_none
+         && expr2->expr_type == EXPR_VARIABLE
+         && expr2->ts.type == BT_UNKNOWN
+         && expr2->symtree && expr2->symtree->n.sym)
+       {
+         gfc_set_default_type (expr2->symtree->n.sym, 0, gfc_current_ns);
+         expr2->ts.type = expr2->symtree->n.sym->ts.type;
+       }
+
       expr1 = gfc_get_expr ();
       expr1->expr_type = EXPR_VARIABLE;
       expr1->where = expr2->where;
[Bug tree-optimization/110009] Another missing ABS detection
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110009

--- Comment #2 from Andrew Pinski ---
(In reply to Georg-Johann Lay from comment #1)
> (In reply to Andrew Pinski from comment #0)
> > unsigned
> > f1 (int v)
> > {
> > [...]
> >   int b_5;
> >
> >   b_5 = v>>(sizeof(v)*8 - 1);
>
> Does it depend on -fwrapv maybe.

No, in this case there is a missing pattern to match against.
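For readers unfamiliar with the idiom: the quoted statement computes a sign mask, 0 for non-negative v and all ones for negative v, given GCC's arithmetic right shift of signed values. The rest of f1 is elided in this thread, so the sketch below uses the textbook mask-based absolute value as an assumed stand-in; it is not the testcase from the bug, and the function names are hypothetical. The arithmetic is done in unsigned so that the result is defined for INT_MIN as well, which keeps the -fwrapv question out of the picture.

```c
#include <assert.h>
#include <limits.h>

/* Textbook mask-based absolute value (assumed stand-in for the elided body).  */
static unsigned
abs_via_mask (int v)
{
  /* 0 when v >= 0, -1 when v < 0.  Relies on arithmetic right shift of
     negative values, which GCC provides.  */
  int mask = v >> (sizeof (v) * CHAR_BIT - 1);
  /* (v + mask) ^ mask, computed in unsigned so nothing overflows.  */
  return ((unsigned) v + (unsigned) mask) ^ (unsigned) mask;
}

/* Reference version with an explicit comparison.  */
static unsigned
abs_reference (int v)
{
  return v < 0 ? 0u - (unsigned) v : (unsigned) v;
}

/* Compares both versions on a few boundary values.  */
static void
check_abs (void)
{
  static const int samples[] = { 0, 1, -1, 7, -5, INT_MAX, INT_MIN };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (abs_via_mask (samples[i]) == abs_reference (samples[i]));
}
```

Both functions compute the same value, so a matching pattern in the compiler could canonicalize the first form to an ABS-style operation; whether the elided f1 has exactly this shape is not something this thread shows.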
[Bug c++/110000] GCC should implement exclude_from_explicit_instantiation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110000

--- Comment #8 from Nikolas Klauser ---
(In reply to Florian Weimer from comment #7)
> (In reply to Nikolas Klauser from comment #6)
> > Does that make sense?
>
> Not quite. I was trying to suggest that you also need to suppress all
> inter-procedural analysis. This will inhibit quite a few useful
> optimizations.

Why would you need to do that? As long as any functions that are part of the ABI don't change in a non-benign way, everything is fine. If an implementation-detail function doesn't get inlined, but the public function does, it's fine because the detail function gets emitted by every TU that uses it, which means that it'll always be there as long as some function relies on the symbol. If the implementation-detail function gets inlined, the code will obviously be there - no need to have a symbol anywhere.
[Bug target/99435] avr: incorrect I/O address ranges for some cores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99435

Georg-Johann Lay changed:

           What    |Removed                     |Added
             Status|WAITING                     |RESOLVED
         Resolution|---                         |INVALID

--- Comment #3 from Georg-Johann Lay ---
Closed as invalid.

The linked ATmega16U4 states on page 26:

> 5. AVR Memories
> 5.4 I/O Memory
> [...]
> I/O Registers within the address range 0x00 - 0x1F are directly bit-accessible
> using the SBI and CBI instructions. In these registers, the value of single
> bits can be checked by using the SBIS and SBIC instructions. Refer to the
> instruction set section for more details. When using the I/O specific commands
> IN and OUT, the I/O addresses 0x00 - 0x3F must be used. When addressing I/O
> Registers as data space using LD and ST instructions, 0x20 must be added to
> these addresses. The device is a complex microcontroller with more peripheral
> units than can be supported within the 64 location reserved in Opcode for the
> IN and OUT instructions. For the Extended I/O space from 0x60 - 0xFF in SRAM,
> only the ST/STS/STD and LD/LDS/LDD instructions can be used.

So the lower I/O has a range of 5 bits (CBI, SBI, SBIC, SBIS), and the I/O addressable by IN and OUT has a range of 6 bits.
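The address ranges quoted from the datasheet can be written down as a few predicates. This is only a sketch of the quoted rules; the function and constant names are hypothetical and do not come from avr-gcc or avr-libc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset between an I/O address and its data-space (LD/ST) address.  */
enum { AVR_IO_TO_DATA_OFFSET = 0x20 };

/* True if the I/O address fits the 5-bit field of SBI/CBI/SBIS/SBIC.  */
static bool
io_addr_bit_accessible (uint16_t io_addr)
{
  return io_addr <= 0x1F;
}

/* True if the I/O address fits the 6-bit field of IN/OUT.  */
static bool
io_addr_in_out (uint16_t io_addr)
{
  return io_addr <= 0x3F;
}

/* Data-space address to use with LD/ST for a given I/O address.  */
static uint16_t
io_to_data_addr (uint16_t io_addr)
{
  return (uint16_t) (io_addr + AVR_IO_TO_DATA_OFFSET);
}

/* True if a data-space address lies in the extended I/O space
   (0x60 - 0xFF), which only LD/ST-style instructions can reach.  */
static bool
data_addr_is_extended_io (uint16_t data_addr)
{
  return data_addr >= 0x60 && data_addr <= 0xFF;
}

/* Checks the boundaries named in the datasheet excerpt.  */
static void
check_ranges (void)
{
  assert (io_addr_bit_accessible (0x1F) && !io_addr_bit_accessible (0x20));
  assert (io_addr_in_out (0x3F) && !io_addr_in_out (0x40));
  assert (io_to_data_addr (0x3F) == 0x5F);
  assert (!data_addr_is_extended_io (0x5F) && data_addr_is_extended_io (0x60));
}
```

The last IN/OUT address, 0x3F, maps to data address 0x5F, so the extended I/O space starting at 0x60 begins right after it. That matches the 5-bit and 6-bit ranges stated in the comment.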
[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #44 from Oleg Endo ---
(In reply to Alexander Klepikov from comment #43)
>
> Well, not really. Look what's happening during expand pass when 'ashrsi3' is
> expanding. Function 'expand_ashiftrt' is called and what it does at the end
> - it explicitly emits 3 insns:
> [...]
>
> By the way, right shift for integers expands to only one 'lshiftrt' insn and
> that's why it can be caught and converted to 'tst'.
>

Yeah, I might have dropped the ball on the right shift patterns back then and only reworked the left shift patterns to do that.

>
> As far as I understand these insns could be caught later by a peephole and
> converted to 'tstsi_t' insn like it is done for other much simpler insn
> sequences.

It's the combine RTL pass and split1 RTL pass that do most of this work here. The peephole pass in GCC is too simplistic for this.

>
> Thank you for your time and detailed explanations! I agree with you on all
> points. Software cannot be perfect and it's OK for GCC not to be super
> optimized, so this part should better be left intact.

We can't have it perfect, but we can try ;)

>
> By the way, I tried to link library to my project and I figured out that
> linker is smart enough to link only necessary library functions even without
> LTO. So increase in size is about 100 or 200 bytes, that is acceptable.
> Thank you very much for help!

You're welcome. Yes, to strip out unused library functions it doesn't need LTO. But using LTO for embedded/MCU firmware, I find it can reduce the code size by about 20%. For example, it can also inline small library functions in your code (if the library was also compiled with LTO).
[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #43 from Alexander Klepikov ---
> > Thank you! I have an idea. If it's impossible to defer initial optimization,
> > maybe it's possible to emit some intermediate insn and catch it and optimize
> > later?
>
> This is basically what is supposed to be happening there already.

Well, not really. Look what's happening during expand pass when 'ashrsi3' is expanding. Function 'expand_ashiftrt' is called and what it does at the end - it explicitly emits 3 insns:

  wrk = gen_reg_rtx (Pmode);
  //This one
  emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);
  sprintf (func, "__ashiftrt_r4_%d", value);
  rtx lab = function_symbol (wrk, func, SFUNC_STATIC).lab;
  //This one
  emit_insn (gen_ashrsi3_n (GEN_INT (value), wrk, lab));
  //And this one
  emit_move_insn (operands[0], gen_rtx_REG (SImode, 4));

As far as I understand these insns could be caught later by a peephole and converted to 'tstsi_t' insn like it is done for other much simpler insn sequences. What I'm thinking about is to emit only one, say 'compound', insn, which could then be split later, somewhere in the split pass, into the function call, i.e. into those 3 insns.

I wrote test code that emits only one bogus insn. This insn expands to pure asm code. Of course, that asm code is invalid, because it is impossible to place a libcall label at the end of function with pure asm code injection. But then everything that could be converted to 'tst' is converted to 'tst'. And all pure right shifts are converted to invalid asm code, of course.

That's why I am thinking about the possibility of emitting some intermediate insn at the expand pass that will defer its real expansion. But I still don't know how to do it right, or even whether it is possible.

By the way, right shift for integers expands to only one 'lshiftrt' insn and that's why it can be caught and converted to 'tst'.

>
> However, it's a bit of a dilemma.
>
> 1) If we don't have a dynamic shift insn and we smash the constant shift
> into individual
> stitching shifts early, it might open some new optimization opportunities,
> e.g. by sharing intermediate shift results. Not sure though if that
> actually happens in practice though.
>
> 2) Whether to use the dynamic shift insn or emit a function call or use
> stitching shifts sequence, it all has an impact on register allocation and
> other instruction use. This can be problematic during the course of RTL
> optimization passes.
>
> 3) Even if we have a dynamic shift, sometimes it's more beneficial to emit a
> shorter stitching shift sequence. Which one is better depends on the
> surrounding code. I don't think there is anything good there to make the
> proper choice.
>
> Some other shift related PRs: PR 54089, PR 65317, PR 67691, PR 67869, PR
> 52628, PR 58017

Thank you for your time and detailed explanations! I agree with you on all points. Software cannot be perfect and it's OK for GCC not to be super optimized, so this part should better be left intact.

By the way, I tried to link library to my project and I figured out that linker is smart enough to link only necessary library functions even without LTO. So increase in size is about 100 or 200 bytes, that is acceptable. Thank you very much for help!
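The shift-to-'tst' conversion discussed here rests on a simple identity: testing bit k after a right shift gives the same answer as masking the original value with the constant 1 << k. The sketch below only illustrates that identity in C. The function names are hypothetical, it says nothing about how the SH patterns are actually written, and the signed variant relies on GCC's arithmetic right shift and wrapping unsigned-to-signed conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-then-test with a logical shift (unsigned operand).  */
static int
bit_set_shift_unsigned (uint32_t x, unsigned k)
{
  return (x >> k) & 1;
}

/* Shift-then-test with an arithmetic shift (signed operand).  */
static int
bit_set_shift_signed (int32_t x, unsigned k)
{
  return (x >> k) & 1;
}

/* Mask form: a single AND with a constant, then a compare against zero.  */
static int
bit_set_mask (uint32_t x, unsigned k)
{
  return (x & (UINT32_C (1) << k)) != 0;
}

/* Returns 1 if all three forms agree for every bit position of x.  */
static int
forms_agree (uint32_t x)
{
  for (unsigned k = 0; k < 32; k++)
    if (bit_set_mask (x, k) != bit_set_shift_unsigned (x, k)
        || bit_set_mask (x, k) != bit_set_shift_signed ((int32_t) x, k))
      return 0;
  return 1;
}

/* Exercises a few bit patterns, including ones with the sign bit set.  */
static void
check_forms (void)
{
  assert (forms_agree (0));
  assert (forms_agree (0x80000001u));
  assert (forms_agree (0xDEADBEEFu));
}
```

For k in 0..31 both the logical and the arithmetic shift leave bit k of the input in bit 0 of the result, so the shift kind does not matter for the test itself. The difficulty described in the comment is that the arithmetic shift has already been expanded into a library call sequence before any pass gets the chance to see the simpler mask form.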
[Bug tree-optimization/110009] Another missing ABS detection
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110009

--- Comment #1 from Georg-Johann Lay ---
(In reply to Andrew Pinski from comment #0)
> unsigned
> f1 (int v)
> {
> [...]
>   int b_5;
>
>   b_5 = v>>(sizeof(v)*8 - 1);

Does it depend on -fwrapv maybe?
[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188

Georg-Johann Lay changed:

           What    |Removed                     |Added
            Summary|[AVR] Miscompilation and    |[postreload] Uses content
                   |function pointers           |of a clobbered register
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56833

--- Comment #8 from Georg-Johann Lay ---
Changing the title to something that resembles what is going wrong.

Also there is PR56833 which was fixed around v4.9, so maybe that fix was incomplete. There is also PR56442 which is still open, and where it's unclear whether that is a duplicate.
[Bug c++/110000] GCC should implement exclude_from_explicit_instantiation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110000

--- Comment #7 from Florian Weimer ---
(In reply to Nikolas Klauser from comment #6)
> Does that make sense?

Not quite. I was trying to suggest that you also need to suppress all inter-procedural analysis. This will inhibit quite a few useful optimizations.