[Bug fortran/88713] _gfortran_internal_pack@PLT prevents vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713 --- Comment #3 from Chris Elrod --- Created attachment 45353 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45353&action=edit g++ assembly output
[Bug fortran/88713] _gfortran_internal_pack@PLT prevents vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713 --- Comment #2 from Chris Elrod --- Created attachment 45352 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45352&action=edit gfortran assembly output
[Bug fortran/88713] New: _gfortran_internal_pack@PLT prevents vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713 Bug ID: 88713 Summary: _gfortran_internal_pack@PLT prevents vectorization Product: gcc Version: 8.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: elrodc at gmail dot com Target Milestone: ---

Created attachment 45350 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45350&action=edit
Fortran version of vectorization test.

I am attaching Fortran and C++ translations of a simple working example. The C++ version is vectorized, while the Fortran version is not.

The code consists of two functions. One simply runs a for loop, calling the other function. The called function is vectorizable across loop iterations, and g++ does this successfully. However, gfortran does not, because it repacks data with a call to _gfortran_internal_pack@PLT so that it can no longer be vectorized across iterations.

I compiled with:

gfortran -Ofast -march=skylake-avx512 -mprefer-vector-width=512 -fno-semantic-interposition -shared -fPIC -S vectorization_test.cpp -o gfortvectorization_test.s

g++ -Ofast -march=skylake-avx512 -mprefer-vector-width=512 -shared -fPIC -S vectorization_test.cpp -o gppvectorization_test.s

LLVM (via flang and clang) successfully vectorizes both versions.
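For readers without the attachments, the shape of the test described in this report can be sketched roughly as follows. This is a hypothetical stand-in and not the attached code: the names `kernel` and `run_all`, the three-input structure-of-arrays layout and the arithmetic are all assumptions, chosen only to show a driver loop whose iterations are independent and touch consecutive addresses.

```cpp
#include <cstddef>

// Hypothetical per-iteration kernel (not the attached test case).
// Input k for index i is assumed to live at in[k * n + i].
inline float kernel(const float* in, std::size_t n, std::size_t i) {
    const float a = in[0 * n + i];
    const float b = in[1 * n + i];
    const float c = in[2 * n + i];
    return a * b + c;
}

// The driver only loops and calls the kernel. Consecutive values of i read
// and write consecutive addresses, so once kernel() is inlined a compiler
// is free to process several iterations per vector instruction.
void run_all(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = kernel(in, n, i);
    }
}
```

If the callee instead received a packed temporary copy of its inputs on every iteration, as the report says happens on the gfortran side, the cross-iteration access pattern would be hidden from the vectorizer.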
[Bug fortran/88713] _gfortran_internal_pack@PLT prevents vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713 --- Comment #1 from Chris Elrod --- Created attachment 45351 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45351&action=edit C++ version of the vectorization test case.
[Bug c/81980] Spurious -Wmissing-format-attribute warning in 32-bit mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81980 Eric Gallager changed: What|Removed |Added CC||dmalcolm at gcc dot gnu.org, ||dodji at gcc dot gnu.org --- Comment #3 from Eric Gallager --- cc-ing diagnostics maintainers
[Bug c++/80789] Better error for passing lambda with capture as function pointer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80789 --- Comment #2 from Eric Gallager --- not sure whether to cc the C++ FE maintainers or the diagnostics maintainers on this...
[Bug c++/78502] Analyze 'final'/'override' even for uninstantiated class templates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78502 Eric Gallager changed: What|Removed |Added CC||jason at redhat dot com, ||nathan at gcc dot gnu.org --- Comment #2 from Eric Gallager --- since this might be an accepts-invalid for gcc (or a rejects-valid for clang) I'm cc-ing the C++ FE maintainers for their interpretation of the standard.
[Bug rtl-optimization/63156] web can't handle AUTOINC correctly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63156 Eric Gallager changed: What|Removed |Added CC||steven at gcc dot gnu.org Assignee|steven at gcc dot gnu.org |unassigned at gcc dot gnu.org --- Comment #12 from Eric Gallager ---
(In reply to Eric Gallager from comment #11)
> (In reply to Steven Bosscher from comment #7)
> > (In reply to Carrot from comment #6)
> > > Since it is intentionally to remove flag DF_REF_READ_WRITE on use,
> >
> > Ah, but I don't think that was the correct fix. The DEF and USE refs should
> > both have the flag set.
>
> Are you still working on this?

Guess not; unassigning and moving to cc
[Bug libstdc++/88607] forward_list.h contains utf-8 charactor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88607 --- Comment #10 from Jonathan Wakely ---
Author: redi
Date: Sun Jan 6 00:49:11 2019
New Revision: 267607

URL: https://gcc.gnu.org/viewcvs?rev=267607&root=gcc&view=rev
Log:
PR libstdc++/88607 add tests using -finput-charset=ascii

This verifies that the header can be compiled with ASCII as the input character set.

PR libstdc++/88607
* testsuite/17_intro/headers/c++1998/charset.cc: New test.
* testsuite/17_intro/headers/c++2011/charset.cc: New test.
* testsuite/17_intro/headers/c++2014/charset.cc: New test.
* testsuite/17_intro/headers/c++2017/charset.cc: New test.
* testsuite/17_intro/headers/c++2020/charset.cc: New test.

Added:
trunk/libstdc++-v3/testsuite/17_intro/headers/c++1998/charset.cc
trunk/libstdc++-v3/testsuite/17_intro/headers/c++2011/charset.cc
trunk/libstdc++-v3/testsuite/17_intro/headers/c++2014/charset.cc
trunk/libstdc++-v3/testsuite/17_intro/headers/c++2017/charset.cc
trunk/libstdc++-v3/testsuite/17_intro/headers/c++2020/charset.cc
Modified:
trunk/libstdc++-v3/ChangeLog
[Bug target/85048] [missed optimization] vector conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85048 Devin Hussey changed: What|Removed |Added CC||husseydevin at gmail dot com --- Comment #5 from Devin Hussey ---
ARM/AArch64 NEON use these:

From            To            Intrinsic       ARMv7-a          AArch64
intXxY_t     -> int2XxY_t     vmovl_sX        vmovl.sX         sshll #0?
uintXxY_t    -> uint2XxY_t    vmovl_uX        vmovl.uX         ushll #0?
[u]int2XxY_t -> [u]intXxY_t   vmovn_[us]X     vmovn.iX         xtn
floatXxY_t   -> intXxY_t      vcvt[q]_sX_fX   vcvt.sX.fX       fcvtzs
floatXxY_t   -> uintXxY_t     vcvt[q]_uX_fX   vcvt.uX.fX       fcvtzu
intXxY_t     -> floatXxY_t    vcvt[q]_fX_sX   vcvt.fX.sX       scvtf
uintXxY_t    -> floatXxY_t    vcvt[q]_fX_uX   vcvt.fX.uX       ucvtf
float32x2_t  -> float64x2_t   vcvt_f32_f64    2x vcvt.f64.f32  fcvtl
float64x2_t  -> float32x2_t   vcvt_f64_f32    2x vcvt.f32.f64  fcvtn

Clang optimizes vmovl to vshll by zero for some reason.
float32x2_t <-> float64x2_t requires 2 VFP instructions on ARMv7-a.
[Bug c/81871] bogus attribute alloc_align accepted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81871 Martin Sebor changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED Target Milestone|--- |9.0 Known to fail||8.2.0 --- Comment #5 from Martin Sebor ---
Looks like r266195 fixed it.

$ cat t.c && gcc -S t.c
void __attribute__ ((alloc_align (1))) f (int);

void* __attribute__ ((alloc_align (1))) g (void*);
t.c:1:1: warning: ‘alloc_align’ attribute ignored on a function returning ‘void’ [-Wattributes]
    1 | void __attribute__ ((alloc_align (1))) f (int);
      | ^~~~
t.c:3:1: warning: ‘alloc_align’ attribute argument value ‘1’ refers to parameter type ‘void *’ [-Wattributes]
    3 | void* __attribute__ ((alloc_align (1))) g (void*);
      | ^~~~

It's being tested by gcc.dg/attr-alloc_align-4.c so the bug can be resolved. Thanks for the reminder!
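For contrast with the two rejected declarations, here is a minimal sketch of a use the attribute is meant for: a function that returns a pointer and whose named parameter is an integer alignment. The function names `demo_aligned_alloc` and `is_aligned` are hypothetical, and the use of `posix_memalign` is an assumption about the platform (POSIX), not something taken from the bug.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX, assumed available)

// Hypothetical allocator. alloc_align(1) tells the compiler that the returned
// pointer is aligned to at least the value passed as parameter 1, which is
// why the attribute needs a pointer return type and an integer parameter.
__attribute__((alloc_align(1)))
void* demo_aligned_alloc(std::size_t alignment, std::size_t size) {
    void* p = nullptr;
    // posix_memalign requires a power-of-two alignment that is a multiple
    // of sizeof(void*), so small alignments are rounded up.
    if (alignment < sizeof(void*)) {
        alignment = sizeof(void*);
    }
    return posix_memalign(&p, alignment, size) == 0 ? p : nullptr;
}

// Helper used only to check the promise made by the attribute.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A caller would release the memory with `free`, for example after `void* p = demo_aligned_alloc(64, 100);`.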
[Bug c/81871] bogus attribute alloc_align accepted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81871 --- Comment #4 from Eric Gallager --- (In reply to Martin Sebor from comment #3) > Let me fix this. Any progress?
[Bug libstdc++/77776] C++17 std::hypot implementation is poor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #4 from Marc Glisse ---
(In reply to Matthias Kretz from comment #3)
> Did you consider the error introduced by scaling with __amax? I made sure
> that the division is without error by zeroing the mantissa bits. Here's a
> motivating example that shows an error of 1 ulp otherwise:
> https://godbolt.org/z/_U2K7e

Your "reference" number seems strange. Why not do the computation with double (or long double or mpfr) or use __builtin_hypotf? Note that it changes the value.

How precise is hypot supposed to be? I know it is supposed to try and avoid spurious overflow/underflow, but I am not convinced that it should aim for correct rounding.

(I see that you are using clang in that godbolt link, with gcc I need to mark the global variables with "extern const" to get a similar asm)

> About std::fma, how bad is the performance hit if there's no instruction for
> it?

FMA doesn't seem particularly relevant here.
[Bug target/88712] Optimization: mov edx, 0 not replaced with xor edx, edx in this case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88712 Andrew Pinski changed: What|Removed |Added Target||x86_64 --- Comment #1 from Andrew Pinski --- This is normally controlled by TARGET_USE_MOV0 but that seems like it is only enabled for k6 and maybe size.
[Bug rtl-optimization/88712] New: Optimization: mov edx, 0 not replaced with xor edx, edx in this case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88712 Bug ID: 88712 Summary: Optimization: mov edx, 0 not replaced with xor edx, edx in this case Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: matt at godbolt dot org Target Milestone: ---

The code:

---snip
int func(int val, const int *ptr) {
  int res = val + 1234;
  if (res == *ptr) {
    res = 0;
  }
  return res;
}
---

generates the following ASM on all versions of GCC back to 4.9.x:

---
func(int, int const*):
  lea eax, [rdi+1234]
  mov edx, 0
  cmp DWORD PTR [rsi], eax
  cmove eax, edx
  ret
---

The `mov edx, 0` is surprising to me. All the other compilers I tested (see https://godbolt.org/z/Nt9pKp for more details) use the common `xor edx, edx` (or `xor eax, eax`) idiom for zeroing edx.

Is this a missed optimization in the case of a cmov being generated, or am I missing something subtle?
[Bug fortran/88710] [F08] Sourced allocation of array fails, yielding wrong bounds and result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88710 --- Comment #4 from c...@mnet-mail.de ---
Thanks, this caught the bounds violation with the following output:

lbound/ubound(a):-1-1 1 2 1 1
lbound/ubound(b):-1-1 1 2 1 1
lbound/ubound(c):-1-1 1 2 1 1
lbound/ubound(t): 0 0 0 3 2 0
a, b, c, t:
At line 28 of file test_alloc.F90
Fortran runtime error: Index '1' of dimension 3 of array 't' above upper bound of 0

Error termination. Backtrace:
#0 0x2b3018cf341a
#1 0x2b3018cf3f75
#2 0x2b3018cf4347
#3 0x403e97
#4 0x40400f
#5 0x2b301917182f
#6 0x4008d8
#7 0x
[Bug fortran/88710] [F08] Sourced allocation of array fails, yielding wrong bounds and result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88710 --- Comment #3 from Dominique d'Humieres --- > For what it's worth, I have compiled the code also with '-Wall' > and '-Warray-bounds' but both these options didn't give any warning. The relevant option is -fcheck=bounds.
[Bug c++/85052] Implement support for clang's __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052 --- Comment #9 from Matthias Kretz --- (In reply to Devin Hussey from comment #7) > Wait, silly me, this isn't about optimizations, this is about patterns. Regarding optimizations, PR85048 is a first step (it lists all x86 single-instruction SIMD conversions). I also linked my library implementation in #5, which provides optimizations for all cases on x86.
[Bug fortran/88710] [F08] Sourced allocation of array fails, yielding wrong bounds and result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88710 --- Comment #2 from c...@mnet-mail.de --- Yes, the said block accesses 't' outside its bounds (because the returned bounds are wrong). Thanks for mentioning this. For what it's worth, I have compiled the code also with '-Wall' and '-Warray-bounds' but both these options didn't give any warning.
[Bug libstdc++/77776] C++17 std::hypot implementation is poor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #3 from Matthias Kretz ---
Did you consider the error introduced by scaling with __amax? I made sure that the division is without error by zeroing the mantissa bits. Here's a motivating example that shows an error of 1 ulp otherwise: https://godbolt.org/z/_U2K7e

About std::fma, how bad is the performance hit if there's no instruction for it?
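The idea of scaling by a value with no mantissa bits set, so that the scaling itself introduces no rounding, can be sketched as below. This is only an illustration under stated assumptions, not the code from the comment or from libstdc++: the name `hypot3_pow2_scaled` is hypothetical, the power of two is obtained with `std::frexp` rather than by masking bits, and handling of NaN and mixed NaN/infinity inputs is deliberately simplified.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical three-argument hypot that scales by a power of two.
inline double hypot3_pow2_scaled(double x, double y, double z) {
    x = std::fabs(x);
    y = std::fabs(y);
    z = std::fabs(z);
    const double amax = std::max({x, y, z});
    if (amax == 0.0 || std::isinf(amax)) {
        return amax;  // all zero, or an infinite input: nothing to scale
    }
    int e = 0;
    std::frexp(amax, &e);  // amax == m * 2^e with 0.5 <= m < 1
    // Multiplying by 2^-e only adjusts the exponent, so the largest input is
    // scaled without rounding; much smaller inputs may underflow, which only
    // drops contributions that are negligible next to amax.
    x = std::ldexp(x, -e);
    y = std::ldexp(y, -e);
    z = std::ldexp(z, -e);
    // The scaled values are below 1, so the squares cannot overflow.
    return std::ldexp(std::sqrt(x * x + y * y + z * z), e);
}
```

Dividing by `amax` itself would instead round each quotient, which is the extra error the comment asks about.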
[Bug middle-end/87836] ICE in cc1 for gcc-6.5.0 with SPARC hardware
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87836 --- Comment #27 from ro at CeBiTec dot Uni-Bielefeld.DE ---
> --- Comment #26 from Gary Mills ---
> I have no concerns about removal of gcc support for Solaris 10: That is an

I've only mentioned it to make clear that the oldest version of Solaris that's going to be tested will change quite a bit once S10 support is gone, certainly to something much newer than the snv_121 as in Illumos.

> obsolete operating system, after all. illumos is equivalent to Solaris 11.

No, it's not: while it's certainly closer to S11 than S10, it has still been quite a way from snv_147 (the last OpenSolaris build) to snv_175 (aka Solaris 11.0). No need to tell me about OpenSolaris/Illumos, btw.: I've been in the OpenSolaris Pilot from day one.

> gas is used for illumos compilers on x86. It works on SPARC too, and avoids
> the ICE. Unfortunately, gcc with gas can't be used to compile the SPARC
> kernel. That's because some SPARC kernel files are written in assembler
> language. These won't compile with gas, only with the native assembler. It

It shouldn't be too hard to introduce make rules (or rather change cw) to build them with as directly, even if gcc on SPARC starts using gas. Hasn't this already been done for Illumos on x86? Alternatively, you can always rewrite them to use gas syntax, and I doubt that there are many as-specific constructs or directives in there: it's low-level kernel code, after all.

> would be difficult, but not impossible, to use gcc with gas on SPARC hardware.
>
> I've just attempted to build gcc-7.3.0 on SPARC with an even more restricted
> configuration:
>
> $
> /export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/configure
> --without-gnu-ld --with-ld=/usr/bin/ld --without-gnu-as --with-as=/usr/bin/as
>
> The compilers are not specified on the command line but they are in the
> environment. The compilers were identified correctly.
> > The build got considerably farther, but ended with this error: > > libtool: compile: > /export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/./gcc/xgcc > -shared-libgcc > -B/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/./gcc > -nostdinc++ > -L/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/sparc-sun-solaris2.11/libstdc++-v3/src > -L/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/sparc-sun-solaris2.11/libstdc++-v3/src/.libs > -L/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/sparc-sun-solaris2.11/libstdc++-v3/libsupc++/.libs > -B/usr/local/sparc-sun-solaris2.11/bin/ > -B/usr/local/sparc-sun-solaris2.11/lib/ > -isystem /usr/local/sparc-sun-solaris2.11/include -isystem > /usr/local/sparc-sun-solaris2.11/sys-include > -I/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/libstdc++-v3/../libgcc > -I/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/sparc-sun-solaris2.11/libstdc++-v3/include/sparc-sun-solaris2.11 > -I/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/sparc-sun-solaris2.11/libstdc++-v3/include > -I/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/libstdc++-v3/libsupc++ > -D_GLIBCXX_SHARED -fno-implicit-templates -Wall -Wextra -Wwrite-strings > -Wcast-qual -Wabi -fdiagnostics-show-location=once -ffunction-sections > -fdata-sections -frandom-seed=new_opa.lo -g -O2 -std=gnu++1z -c > /export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/libstdc++-v3/libsupc++/new_opa.cc > -fPIC -DPIC -D_GLIBCXX_SHARED -o new_opa.o > /export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/libstdc++-v3/libsupc++/new_opa.cc: > In function 'void* operator new(std::size_t, std::align_val_t)': > 
/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/libstdc++-v3/libsupc++/new_opa.cc:103:33: > error: 'aligned_alloc' was not declared in this scope >while (__builtin_expect ((p = aligned_alloc (align, sz)) == 0, false)) > ^ > /export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/libstdc++-v3/libsupc++/new_opa.cc:103:33: > note: suggested alternative: > In file included from /usr/include/stdlib.h:39:0, > from > /export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/sparc-sun-solaris2.11/libstdc++-v3/include/cstdlib:75, > from > /export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/sparc-sun-solaris2.11/libstdc++-v3/include/stdlib.h:36, > from > /export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/libstdc++-v3/libsupc++/new_opa.cc:27: > /usr/include/iso/stdlib_c11.h:
[Bug middle-end/87836] ICE in cc1 for gcc-6.5.0 with SPARC hardware
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87836 --- Comment #26 from Gary Mills --- I have no concerns about removal of gcc support for Solaris 10: That is an obsolete operating system, after all. illumos is equivalent to Solaris 11. gas is used for illumos compilers on x86. It works on SPARC too, and avoids the ICE. Unfortunately, gcc with gas can't be used to compile the SPARC kernel. That's because some SPARC kernel files are written in assembler language. These won't compile with gas, only with the native assembler. It would be difficult, but not impossible, to use gcc with gas on SPARC hardware. I've just attempted to build gcc-7.3.0 on SPARC with an even more restricted configuration: $ /export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/configure --without-gnu-ld --with-ld=/usr/bin/ld --without-gnu-as --with-as=/usr/bin/as The compilers are not specified on the command line but they are in the environment. The compilers were identified correctly. 
The build got considerably farther, but ended with this error: libtool: compile: /export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/./gcc/xgcc -shared-libgcc -B/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/./gcc -nostdinc++ -L/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/sparc-sun-solaris2.11/libstdc++-v3/src -L/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/sparc-sun-solaris2.11/libstdc++-v3/src/.libs -L/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/sparc-sun-solaris2.11/libstdc++-v3/libsupc++/.libs -B/usr/local/sparc-sun-solaris2.11/bin/ -B/usr/local/sparc-sun-solaris2.11/lib/ -isystem /usr/local/sparc-sun-solaris2.11/include -isystem /usr/local/sparc-sun-solaris2.11/sys-include -I/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/libstdc++-v3/../libgcc -I/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/sparc-sun-solaris2.11/libstdc++-v3/include/sparc-sun-solaris2.11 -I/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/sparc-sun-solaris2.11/libstdc++-v3/include -I/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/libstdc++-v3/libsupc++ -D_GLIBCXX_SHARED -fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi -fdiagnostics-show-location=once -ffunction-sections -fdata-sections -frandom-seed=new_opa.lo -g -O2 -std=gnu++1z -c /export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/libstdc++-v3/libsupc++/new_opa.cc -fPIC -DPIC -D_GLIBCXX_SHARED -o new_opa.o /export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/libstdc++-v3/libsupc++/new_opa.cc: In function 'void* operator new(std::size_t, std::align_val_t)': 
/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/libstdc++-v3/libsupc++/new_opa.cc:103:33: error: 'aligned_alloc' was not declared in this scope
   while (__builtin_expect ((p = aligned_alloc (align, sz)) == 0, false))
                                 ^
/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/libstdc++-v3/libsupc++/new_opa.cc:103:33: note: suggested alternative:
In file included from /usr/include/stdlib.h:39:0,
                 from /export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/sparc-sun-solaris2.11/libstdc++-v3/include/cstdlib:75,
                 from /export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/build/sparcv7/sparc-sun-solaris2.11/libstdc++-v3/include/stdlib.h:36,
                 from /export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/libstdc++-v3/libsupc++/new_opa.cc:27:
/usr/include/iso/stdlib_c11.h:60:14: note: 'std::aligned_alloc'
 extern void *aligned_alloc(size_t, size_t);
              ^
Makefile:936: recipe for target 'new_opa.lo' failed
make[6]: *** [new_opa.lo] Error 1
make[6]: Leaving directory '/dpool/export/home/mills/Downloads/code/oi-userland-apr/components/developer/gcc-7/build/sparcv7/sparc-sun-solaris2.11/libstdc++-v3/libsupc++'

There is a patch which seems to fix this error:

--- gcc-7.1.0.orig/libstdc++-v3/libsupc++/new_opa.cc 2017-01-26 15:30:45.0 +0100
+++ gcc-7.1.0/libstdc++-v3/libsupc++/new_opa.cc 2017-05-04 17:16:25.920300456 +0200
@@ -31,7 +31,6 @@
 using std::new_handler;
 using std::bad_alloc;
 
-#if !_GLIBCXX_HAVE_ALIGNED_ALLOC
 #if _GLIBCXX_HAVE__ALIGNED_MALLOC
 #define aligned_alloc(al,sz) _aligned_malloc(sz,al)
 #elif _GLIBCXX_HAVE_POSIX_MEMALIGN
@@ -82,7 +81,6 @@
   return aligned_ptr;
 }
 #endif
-#endif
 
 _GLIBCXX_WEAK_DEFINITION void *
 operator new (std::size_t sz, std::align_val_t al)

I can't be certain that this patch does not have unwanted side effec
[Bug ipa/88711] [regression 9.0] scan-ipa-dump inline "Inlined tp_sum/
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88711 Dominique d'Humieres changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-01-05 Ever confirmed|0 |1 --- Comment #2 from Dominique d'Humieres --- Confirmed on darwin.
[Bug fortran/88710] [F08] Sourced allocation of array fails, yielding wrong bounds and result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88710 Dominique d'Humieres changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-01-05 CC||burnus at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Dominique d'Humieres ---
The behavior has changed between revisions r265171 (2018-10-15)

lbound/ubound(a):-1-1 1 2 1 1
lbound/ubound(b):-1-1 1 2 1 1
lbound/ubound(c):-1-1 1 2 1 1
lbound/ubound(t): 0 0 0 3 2 0

and r265310 (2018-10-19)

lbound/ubound(a):-1-1 1 2 1 1
lbound/ubound(b):-1-1 1 2 1 1
lbound/ubound(c):-1-1 1 2 1 1
lbound/ubound(t): 1 1 1 4 3 1

likely r265212 (pr67125).

Note that the block

  do k = lbound(a,3), ubound(a,3)
    do j = lbound(a,2), ubound(a,2)
      do i = lbound(a,1), ubound(a,1)
        write(*,'(1p,4(e23.16,1x))') &
          & a(i,j,k), b(i,j,k), c(i,j,k), t(i,j,k)
      end do
    end do
  end do

accesses 't' outside its bounds in both cases.
[Bug ipa/88711] [regression 9.0] scan-ipa-dump inline "Inlined tp_sum/
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88711 --- Comment #1 from kargl at gcc dot gnu.org ---
> The likely cause of this regression is
>
>
> r267600 | hubicka | 2019-01-05 09:47:34 -0800 (Sat, 05 Jan 2019) | 2 lines
>
> * ipa-fnsummary.c (analyze_function_body): Fix accounting of time.

Definitely caused by r267600. Verified by 'svn merge -r267600:267599 .' to remove offending patch. Perhaps, the scan line in the testcase needs to be adjusted?
[Bug c++/85052] Implement support for clang's __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052 --- Comment #8 from Jakub Jelinek --- Note, I've posted in the meantime a newer version of the patch that should handle the 2x narrowing or 2x widening cases better, see https://gcc.gnu.org/ml/gcc-patches/2019-01/msg00129.html
[Bug c++/85052] Implement support for clang's __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052 --- Comment #7 from Devin Hussey ---
Wait, silly me, this isn't about optimizations, this is about patterns.

It does the same thing it was doing for this code:

typedef unsigned u32x2 __attribute__((vector_size(8)));
typedef unsigned long long u64x2 __attribute__((vector_size(16)));

u64x2 cvt(u32x2 in)
{
    return (u64x2) { (unsigned long long)in[0], (unsigned long long)in[1] };
}
[Bug c++/85052] Implement support for clang's __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052 --- Comment #6 from Devin Hussey ---
The patch seems to be working.

typedef unsigned u32x2 __attribute__((vector_size(8)));
typedef unsigned long long u64x2 __attribute__((vector_size(16)));

u64x2 cvt(u32x2 in)
{
    return __builtin_convertvector(in, u64x2);
}

It doesn't generate the best code, but it isn't bad.

x86_64, SSE4.1:

cvt:
    movq %xmm0, %rax
    movd %eax, %xmm0
    shrq $32, %rax
    pinsrq $1, %rax, %xmm0
    ret

x86_64, SSE2:

cvt:
    movq %xmm0, %rax
    movd %eax, %xmm0
    shrq $32, %rax
    movq %rax, %xmm1
    punpcklqdq %xmm1, %xmm0
    ret

ARMv7a NEON:

cvt:
    sub sp, sp, #16
    mov r3, #0
    str r3, [sp, #4]
    str r3, [sp, #12]
    add r3, sp, #8
    vst1.32 {d0[0]}, [sp]
    vst1.32 {d0[1]}, [r3]
    vld1.64 {d0-d1}, [sp:64]
    add sp, sp, #16
    bx lr

I haven't built the others yet.

The correct code would be this ([signed|unsigned]):

cvt:
    vmovl.[s|u]32 q0, d0
    bx lr

I am testing other targets now.

For the reference, this is what clang generates for other targets:

aarch64:

cvt:
    [s|u]shll v0.2d, v0.2s, #0
    ret

sse4.1/avx:

cvt:
    [v]pmov[s|z]xdq xmm0, xmm0
    ret

sse2:

signed_cvt:
    pxor xmm1, xmm1
    pcmpgtd xmm1, xmm0
    punpckldq xmm0, xmm1 # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
    ret

unsigned_cvt:
    xorps xmm1, xmm1
    unpcklps xmm0, xmm1 # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
    ret
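The two source patterns discussed in this thread (the builtin and the hand-written element-wise conversion) compute the same lanes, which a small check can confirm. The sketch below reuses the typedefs from the comments; the function names `cvt_elementwise`, `cvt_builtin` and `same_lanes` are hypothetical, and the preprocessor guard assumes the builtin is available in Clang and in GCC 9 or later, falling back to the element-wise form otherwise.

```cpp
typedef unsigned u32x2 __attribute__((vector_size(8)));
typedef unsigned long long u64x2 __attribute__((vector_size(16)));

// Element-wise widening, equivalent to the hand-written pattern.
inline u64x2 cvt_elementwise(u32x2 in) {
    u64x2 out = { in[0], in[1] };
    return out;
}

// Same conversion through the builtin where it is assumed to exist.
inline u64x2 cvt_builtin(u32x2 in) {
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 9)
    return __builtin_convertvector(in, u64x2);
#else
    return cvt_elementwise(in);  // assumption: older compilers lack the builtin
#endif
}

// Lane-by-lane comparison, avoiding vector comparison operators.
inline bool same_lanes(u64x2 a, u64x2 b) {
    return a[0] == b[0] && a[1] == b[1];
}
```

Since both forms are zero extensions of each 32-bit lane, the generated code could in principle be identical; the assembly listings above show where it currently is not.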
[Bug ipa/88711] New: [regression 9.0] scan-ipa-dump inline "Inlined tp_sum/
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88711 Bug ID: 88711 Summary: [regression 9.0] scan-ipa-dump inline "Inlined tp_sum/ Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: ipa Assignee: unassigned at gcc dot gnu.org Reporter: kargl at gcc dot gnu.org CC: marxin at gcc dot gnu.org Target Milestone: ---

A recent change (as in the last 12 hours) has introduced this regression on x86_64-*-freebsd.

FAIL: gfortran.dg/pr79966.f90 -O scan-ipa-dump inline "Inlined tp_sum/[0-9]+ into runtptests/[0-9]+"

The likely cause of this regression is

r267600 | hubicka | 2019-01-05 09:47:34 -0800 (Sat, 05 Jan 2019) | 2 lines

* ipa-fnsummary.c (analyze_function_body): Fix accounting of time.
[Bug fortran/88710] New: [F08] Sourced allocation of array fails, yielding wrong bounds and result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88710 Bug ID: 88710 Summary: [F08] Sourced allocation of array fails, yielding wrong bounds and result Product: gcc Version: 8.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: c...@mnet-mail.de Target Milestone: ---

The following code shows that sourced allocation of an allocatable array with gfortran 8.1.0 leads to wrong lower and upper bounds that do not correspond to those of the source expression. Moreover, the initialized array therefore does not yield the correct result expected from the value of the source expression.

$ cat test_alloc.F90
program test_alloc
  implicit none
  integer(4) :: i, j, k
  real(8), dimension(:,:,:), allocatable :: a, b, c, t

  allocate( a(-1:2,-1:1,1:1) )
  allocate( b(-1:2,-1:1,1:1) )
  allocate( c(-1:2,-1:1,1:1) )
  a = 1.d0
  b = 2.d0
  c = 0.d0

  allocate(t, source = (a + (c - b)) )

  write(*,'(a,6(i5,1x))') 'lbound/ubound(a): ', lbound(a), ubound(a)
  write(*,'(a,6(i5,1x))') 'lbound/ubound(b): ', lbound(b), ubound(b)
  write(*,'(a,6(i5,1x))') 'lbound/ubound(c): ', lbound(c), ubound(c)
  write(*,'(a,6(i5,1x))') 'lbound/ubound(t): ', lbound(t), ubound(t)

  write(*,*) 'a, b, c, t: '
  do k = lbound(a,3), ubound(a,3)
    do j = lbound(a,2), ubound(a,2)
      do i = lbound(a,1), ubound(a,1)
        write(*,'(1p,4(e23.16,1x))') &
          & a(i,j,k), b(i,j,k), c(i,j,k), t(i,j,k)
      end do
    end do
  end do
end program test_alloc

Running this code with gfortran 8.1.0 gives the following output.

$ gfortran-8 test_alloc.F90 -o test.gfort; ./test.gfort
lbound/ubound(a):-1-1 1 2 1 1
lbound/ubound(b):-1-1 1 2 1 1
lbound/ubound(c):-1-1 1 2 1 1
lbound/ubound(t): 0 0 0 3 2 0
a, b, c, t:
1.E+00 2.E+00 0.E+00 -1.E+00
1.E+00 2.E+00 0.E+00 -1.E+00
1.E+00 2.E+00 0.E+00 -1.E+00
1.E+00 2.E+00 0.E+00 -1.E+00
1.E+00 2.E+00 0.E+00 -1.E+00
1.E+00 2.E+00 0.E+00 0.E+00
1.E+00 2.E+00 0.E+00 1.6304166312761136-322
1.E+00 2.E+00 0.E+00 1.0023829485142537E-95
1.E+00 2.E+00 0.E+00 3.4119363283543871-315
1.E+00 2.E+00 0.E+00 0.E+00
1.E+00 2.E+00 0.E+00 2.0716172530123468-320
1.E+00 2.E+00 0.E+00 9.6317959318370178-317

Both flang 6.0 and pgfortran 18.4-0 yield the following (correct) output (notice the different bounds for t, and its values printed in the last column):

$ flang test_alloc.F90 -o test.flang; ./test.flang
lbound/ubound(a):-1-1 1 2 1 1
lbound/ubound(b):-1-1 1 2 1 1
lbound/ubound(c):-1-1 1 2 1 1
lbound/ubound(t):-1-1 1 2 1 1
a, b, c, t:
1.E+00 2.E+00 0.E+00 -1.E+00
1.E+00 2.E+00 0.E+00 -1.E+00
1.E+00 2.E+00 0.E+00 -1.E+00
1.E+00 2.E+00 0.E+00 -1.E+00
1.E+00 2.E+00 0.E+00 -1.E+00
1.E+00 2.E+00 0.E+00 -1.E+00
1.E+00 2.E+00 0.E+00 -1.E+00
1.E+00 2.E+00 0.E+00 -1.E+00
1.E+00 2.E+00 0.E+00 -1.E+00
1.E+00 2.E+00 0.E+00 -1.E+00
1.E+00 2.E+00 0.E+00 -1.E+00
1.E+00 2.E+00 0.E+00 -1.E+00

Gfortran version used is:

$ gfortran-8 -v
Using built-in specs.
COLLECT_GCC=gfortran-8
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
[Bug fortran/88653] Is this a compiler bug?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88653 --- Comment #14 from Murat Tekeev ---
I will reinstall Cygwin and try to repeat the compilation. When I used version 7.3, everything worked. After all, there are also other compilers besides gfortran.
[Bug driver/88708] help-dummy.o file left behind
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88708 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #1 from Jakub Jelinek --- No idea why the documentation suggests the -c there, without it it works just fine. With -S too, it actually calls cc1 with -o help-dummy.s but doesn't actually emit there anything into that file (nor, if it exists previously, removes it or modifies it). With -E it actually fails: ./xgcc -B ./ -E -Q -O --help=optimizers cc1: fatal error: help-dummy: No such file or directory compilation terminated. I wonder if we shouldn't treat -E as -S and -c as no -E/-S/-c with these help options, which is IMHO the best thing. Without -E/-S/-c, cc1 is executed with say -o /tmp/cc7Z9tXX.s but doesn't write that file, and as is executed with -o /tmp/cc4DJDCT.o /tmp/cc7Z9tXX.s and all the temporary files are removed afterwards.
[Bug target/88706] [og8, nvptx, openacc] Inconsistencies when vector length set using vector_length clause or fopenacc-dim
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88706 --- Comment #1 from Tom de Vries ---
(In reply to Tom de Vries from comment #0)
> I think the same problem exists for the other work around in
> nvptx_adjust_parallelism, this one:
> ...
> /* FIXME: This is overly conservative; worker and vector loop will
>
> eventually be combined. */
> if (wv)
> return inner_mask & ~GOMP_DIM_MASK (GOMP_DIM_WORKER);
> ...
> It's just harder to spot because the workaround doesn't affect vector length.

Confirmed.

With this additional patch:
...
@@ -5695,7 +5696,10 @@ nvptx_adjust_parallelism (unsigned inner_mask, unsigned outer_mask)
   /* FIXME: This is overly conservative; worker and vector loop will
      eventually be combined. */
   if (wv)
-    return inner_mask & ~GOMP_DIM_MASK (GOMP_DIM_WORKER);
+    {
+      fprintf (stderr, "worker-vector loop workaround applied in %s\n", current_function_name ());
+      return inner_mask & ~GOMP_DIM_MASK (GOMP_DIM_WORKER);
+    }
 
   /* It's difficult to guarantee that warps in large vector_lengths
      will remain convergent when a vector loop is nested inside a
...
we see for the first case (vector_length set on parallel directive, no -fopenacc-dim=):
...
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
worker-vector loop workaround applied in test2._omp_fn.1
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
...
and for the second case (no vector_length set on parallel directive, using -fopenacc-dim=): ... oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 ...
[Bug fortran/85855] [7/8/9 Regression] (Maybe) uninitialized descriptor fields of an allocatable array component of a function result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85855 Dominique d'Humieres changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #7 from Dominique d'Humieres --- > I'm seeing the same behavior on GCC 7.3; this looks to be a duplicate > of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77504 . I agree. *** This bug has been marked as a duplicate of bug 77504 ***
[Bug fortran/77504] "is used uninitialized" with allocatable string and array constructors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77504 Dominique d'Humieres changed: What|Removed |Added CC||vladimir.fuka at gmail dot com --- Comment #8 from Dominique d'Humieres --- *** Bug 85855 has been marked as a duplicate of this bug. ***
[Bug middle-end/24639] [meta-bug] bug to track all Wuninitialized issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24639 Bug 24639 depends on bug 85855, which changed state. Bug 85855 Summary: [7/8/9 Regression] (Maybe) uninitialized descriptor fields of an allocatable array component of a function result https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85855 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE
[Bug fortran/85855] [7/8/9 Regression] (Maybe) uninitialized descriptor fields of an allocatable array component of a function result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85855 Seth Johnson changed: What|Removed |Added CC||johnsonsr at ornl dot gov --- Comment #6 from Seth Johnson --- I'm seeing the same behavior on GCC 7.3; this looks to be a duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77504 .
[Bug tree-optimization/88709] Improve store-merging
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709 Jakub Jelinek changed: What|Removed |Added CC||redi at gcc dot gnu.org --- Comment #1 from Jakub Jelinek --- Compared to the first testcase, we do handle struct S { char buf[8]; }; void bar (struct S *); void foo (void) { struct S s; int a = 0; __builtin_memcpy (&s.buf[4], &a, sizeof (int)); s.buf[0] = 5; s.buf[1] = 2; s.buf[2] = 3; s.buf[3] = 2; s.buf[5] = 7; bar (&s); } though, because the store is in that case MEM[&s + 4B] = {} and thus valid for lhs.
[Bug fortran/88009] [9 Regression] ICE in find_intrinsic_vtab, at fortran/class.c:2761
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88009 janus at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #5 from janus at gcc dot gnu.org --- Fixed with r267598. Closing.
[Bug tree-optimization/88709] New: Improve store-merging
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709 Bug ID: 88709 Summary: Improve store-merging Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jakub at gcc dot gnu.org Target Milestone: --- As shown in: struct S { char buf[8]; }; void bar (struct S *); void foo (void) { struct S s = {}; s.buf[1] = 1; s.buf[3] = 2; bar (&s); } or struct val_t { char data[16]; }; void optimize_me (val_t); void optimize_me3 (val_t, val_t, val_t); void good () { optimize_me ({ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 }); } void bad () { optimize_me ({ 1, 2, 3, 4, 5 }); } void why () { optimize_me ({ 1, 2, 3, 4, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }); } void srsly () { optimize_me3 ({ 1, 2, 3, 4, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, { 11, 12, 13, 14, 15, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10 }, { 21, 22, 23, 24, 25, 20, 20, 20, 10, 20, 20, 20, 20, 20, 20 }); } void srsly_not_one_missing () { optimize_me3 ({ 1, 2, 3, 4, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, { 11, 12, 13, 14, 15, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10 }, { 21, 22, 23, 24, 25, 20, 20, 20, 10, 20, 20, 20, 20, 20, 20, 11 }); } there is room for improvement in store-merging. In the first testcase, we ignore the clearing because !lhs_valid_for_store_merging_p, the lhs is in that case the whole VAR_DECL rather than a component of it. And in the second testcase, we sometimes punt because of the same reason, sometimes because rhs_valid_for_store_merging_p is false. Handling these = {} storage clearings (or perhaps even __builtin_memset calls) is something we could handle, though with extra care, we don't want to take apart those clears if it doesn't reduce the amount of needed stores.
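To make the first testcase above concrete, here is a minimal sketch of what a merged store amounts to at the source level. The helper names `fill_bytewise`, `fill_merged` and `same_bytes` are hypothetical and do not appear in the report. The sketch only shows that the clear plus the two byte stores describe one fixed 8-byte image; it makes no claim about the code GCC actually emits.

```c
#include <string.h>

struct S { char buf[8]; };

/* The shape from the report: clear the whole object, then patch two bytes.
   The empty initializer is the GNU C form used in the testcase.  */
void fill_bytewise (struct S *out)
{
  struct S s = {};
  s.buf[1] = 1;
  s.buf[3] = 2;
  *out = s;
}

/* Hypothetical hand-merged form: the final 8-byte image is a compile-time
   constant, so it can be written with a single 8-byte copy.  */
void fill_merged (struct S *out)
{
  static const char image[8] = { 0, 1, 0, 2, 0, 0, 0, 0 };
  memcpy (out->buf, image, sizeof image);
}

/* Returns 1 when both variants produce identical bytes.  */
int same_bytes (void)
{
  struct S a, b;
  fill_bytewise (&a);
  fill_merged (&b);
  return memcmp (&a, &b, sizeof a) == 0;
}
```

Expressing the image as a byte array rather than as a 64-bit integer constant keeps the sketch independent of endianness.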
[Bug fortran/88009] [9 Regression] ICE in find_intrinsic_vtab, at fortran/class.c:2761
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88009 --- Comment #4 from janus at gcc dot gnu.org --- Author: janus Date: Sat Jan 5 14:32:12 2019 New Revision: 267598 URL: https://gcc.gnu.org/viewcvs?rev=267598&root=gcc&view=rev Log: 2019-01-05 Janus Weil PR fortran/88009 * class.c (gfc_find_derived_vtab): Mark the _final component as artificial. (find_intrinsic_vtab): Ditto. Also add an extra check to avoid dereferencing a null pointer and adjust indentation. * resolve.c (resolve_fl_variable): Add extra check to avoid dereferencing a null pointer. Move variable declarations to local scope. (resolve_fl_procedure): Add extra check to avoid dereferencing a null pointer. * symbol.c (check_conflict): Suppress errors for artificial symbols. 2019-01-05 Janus Weil PR fortran/88009 * gfortran.dg/blockdata_10.f90: New test case. Added: trunk/gcc/testsuite/gfortran.dg/blockdata_10.f90 Modified: trunk/gcc/fortran/ChangeLog trunk/gcc/fortran/class.c trunk/gcc/fortran/resolve.c trunk/gcc/fortran/symbol.c trunk/gcc/testsuite/ChangeLog
[Bug c/88698] Relax generic vector conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88698 --- Comment #10 from Devin Hussey --- Well what about a special type attribute or some kind of transparent_union like thing for Intel's types? It seems that Intel's intrinsics are the main (only) platform that uses generic types.
[Bug fortran/88653] Is this a compiler bug?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88653 Thomas Koenig changed: What|Removed |Added Status|NEW |WAITING --- Comment #13 from Thomas Koenig --- I checked this with the exact same version on Cygwin, no errors detected. So, this looks like an installation or hardware problem. Could you maybe re-install the compiler?
[Bug driver/88708] New: help-dummy.o file left behind
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88708 Bug ID: 88708 Summary: help-dummy.o file left behind Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: driver Assignee: unassigned at gcc dot gnu.org Reporter: drepper.fsp+rhbz at gmail dot com Target Milestone: --- When using gcc -c -Q -O --help=optimizers the driver leaves behind the help-dummy.o file. This happens with gcc trunk and all prior versions I was able to test.
[Bug fortran/88632] [F08] function contained in module invisible to submodule unless declared public
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88632 Paul Thomas changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |pault at gcc dot gnu.org --- Comment #2 from Paul Thomas --- Created attachment 45349 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45349&action=edit A provisional patch that fixes the problem The attached fixes this but causes regressions: FAIL: gfortran.dg/module_private_2.f90 -O scan-tree-dump-times optimized "priv" 0 FAIL: gfortran.dg/public_private_module_7.f90 -O scan-assembler-not __m_common_attrs_MOD_other FAIL: gfortran.dg/public_private_module_8.f90 -O scan-assembler-not __m_MOD_myotherlen FAIL: gfortran.dg/public_private_module_2.f90 -O scan-assembler-not two FAIL: gfortran.dg/public_private_module_2.f90 -O scan-assembler-not six FAIL: gfortran.dg/warn_unused_function_2.f90 -O (test for warnings, line 16) I think that this is best dealt with by extending the patch to flag the module as having a module function/subroutine, which implies that there is a submodule somewhere, and making all the module procedures TREE_PUBLIC. That will suppress the above regressions. Otherwise, I will have to find some way of persuading the linker to find the symbol from the submodule. First I must get the C-interop patch out of the way and then I will come back to this PR. Paul
[Bug libgomp/88707] Random failures of libgomp.c++/task-reduction-(8|10).C on x86_64-apple-darwin18
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88707 --- Comment #2 from Iain Sandoe --- (on Darwin17 I had a recent build) I find that a built exe fails quite often; here's a sample of the hung program (it appears deadlocked, not consuming any CPU). The correct libraries are being loaded. Sampling process 23844 for 3 seconds with 1 millisecond of run time between samples Sampling completed, processing symbols... Analysis of sampling task-reduction-10.exe (pid 23844) every 1 millisecond Process: task-reduction-10.exe [23844] Path: /Volumes/scratch/10-13-his/gcc-trunk-gcc/x86_64-apple-darwin17/libgomp/testsuite/task-reduction-10.exe Load Address:0x10835 Identifier: task-reduction-10.exe Version: 0 Code Type: X86-64 Parent Process: bash [34246] Date/Time: 2019-01-05 12:53:30.784 + Launch Time: 2019-01-05 12:52:40.943 + OS Version: Mac OS X 10.13.6 (17G4015) Report Version: 7 Analysis Tool: /usr/bin/sample Physical footprint: 568K Physical footprint (peak): 576K Call graph: 2799 Thread_58184392 DispatchQueue_1: com.apple.main-thread (serial) + 2799 ??? (in ) [0x7f9679c02718] + 2799 gomp_barrier_wait_end (in libgomp.1.dylib) + 86 [0x10862d606] bar.c:92 + 2799 gomp_sem_wait (in libgomp.1.dylib) + 40 [0x10862d488] sem.c:71 + 2799 _pthread_cond_wait (in libsystem_pthread.dylib) + 732 [0x7fff53677589] + 2799 __psynch_cvwait (in libsystem_kernel.dylib) + 10 [0x7fff534aea16] 2799 Thread_58184395 + 2799 ??? (in ) [0x2060] + 2799 ??? (in ) [0x7f9679c02cb8] + 2799 gomp_barrier_wait_end (in libgomp.1.dylib) + 86 [0x10862d606] bar.c:92 + 2799 gomp_sem_wait (in libgomp.1.dylib) + 40 [0x10862d488] sem.c:71 + 2799 _pthread_cond_wait (in libsystem_pthread.dylib) + 732 [0x7fff53677589] + 2799 __psynch_cvwait (in libsystem_kernel.dylib) + 10 [0x7fff534aea16] 2799 Thread_58184397 + 2799 ??? (in ) [0x2060] + 2799 ??? 
(in ) [0x7f9679c030b8] + 2799 gomp_barrier_wait_end (in libgomp.1.dylib) + 86 [0x10862d606] bar.c:92 + 2799 gomp_sem_wait (in libgomp.1.dylib) + 40 [0x10862d488] sem.c:71 + 2799 _pthread_cond_wait (in libsystem_pthread.dylib) + 732 [0x7fff53677589] + 2799 __psynch_cvwait (in libsystem_kernel.dylib) + 10 [0x7fff534aea16] 2799 Thread_58184398 + 2799 ??? (in ) [0x2060] + 2799 ??? (in ) [0x7f9679c032b8] + 2799 gomp_barrier_wait_end (in libgomp.1.dylib) + 86 [0x10862d606] bar.c:92 + 2799 gomp_sem_wait (in libgomp.1.dylib) + 40 [0x10862d488] sem.c:71 + 2799 _pthread_cond_wait (in libsystem_pthread.dylib) + 732 [0x7fff53677589] + 2799 __psynch_cvwait (in libsystem_kernel.dylib) + 10 [0x7fff534aea16] 2799 Thread_58184399 + 2799 ??? (in ) [0x2060] + 2799 ??? (in ) [0x7f9679c034b8] + 2799 gomp_barrier_wait_end (in libgomp.1.dylib) + 86 [0x10862d606] bar.c:92 + 2799 gomp_sem_wait (in libgomp.1.dylib) + 40 [0x10862d488] sem.c:71 + 2799 _pthread_cond_wait (in libsystem_pthread.dylib) + 732 [0x7fff53677589] + 2799 __psynch_cvwait (in libsystem_kernel.dylib) + 10 [0x7fff534aea16] 2799 Thread_58184400 + 2799 ??? (in ) [0x2060] + 2799 ??? (in ) [0x7f9679d00118] + 2799 gomp_barrier_wait_end (in libgomp.1.dylib) + 86 [0x10862d606] bar.c:92 + 2799 gomp_sem_wait (in libgomp.1.dylib) + 40 [0x10862d488] sem.c:71 + 2799 _pthread_cond_wait (in libsystem_pthread.dylib) + 732 [0x7fff53677589] + 2799 __psynch_cvwait (in libsystem_kernel.dylib) + 10 [0x7fff534aea16] 2799 Thread_58184401 2799 ??? (in ) [0x2060] 2799 ??? 
(in ) [0x7f9679d00318] 2799 gomp_barrier_wait_end (in libgomp.1.dylib) + 86 [0x10862d606] bar.c:92 2799 gomp_sem_wait (in libgomp.1.dylib) + 40 [0x10862d488] sem.c:71 2799 _pthread_cond_wait (in libsystem_pthread.dylib) + 732 [0x7fff53677589] 2799 __psynch_cvwait (in libsystem_kernel.dylib) + 10 [0x7fff534aea16] Total number in stack (recursive counted multiple, when >=5): 7 __psynch_cvwait (in libsystem_kernel.dylib) + 0 [0x7fff534aea0c] 7 _pthread_cond_wait (in libsystem_pthread.dylib) + 732 [0x7fff53677589] 7 gomp_barrier_wait_end (in libgomp.1.dylib) + 86 [0x10862d606] bar.c:92 7 gomp_sem_wait (in libgomp.1.dylib) + 40 [0x10862d488] sem.c:71 6 ??? (in ) [0x2060] Sort by top of stack, same collapsed (when >= 5): __psynch_cvwait (in libsystem_kernel.dylib)19593
[Bug target/60563] FAIL: g++.dg/ext/sync-4.C on *-apple-darwin*
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60563 Dominique d'Humieres changed: What|Removed |Added Priority|P3 |P4 Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #20 from Dominique d'Humieres --- Silenced on trunk and release branches, closing. The test will XPASS once the ld problem is fixed, and the darwin hack can then be removed.
[Bug libgomp/88707] Random failures of libgomp.c++/task-reduction-(8|10).C on x86_64-apple-darwin18
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88707 Iain Sandoe changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-01-05 Ever confirmed|0 |1 --- Comment #1 from Iain Sandoe --- Looking through my last set of results, the first occurrence I see is for Darwin16 (OSX 10.12), but since this is a random failure, that might be inconclusive.
[Bug target/60563] FAIL: g++.dg/ext/sync-4.C on *-apple-darwin*
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60563 --- Comment #19 from dominiq at gcc dot gnu.org --- Author: dominiq Date: Sat Jan 5 12:44:12 2019 New Revision: 267597 URL: https://gcc.gnu.org/viewcvs?rev=267597&root=gcc&view=rev Log: 2019-01-05 Dominique d'Humieres PR target/60563 * g++.dg/ext/sync-4.C: Add dg-xfail-run-if for darwin. Modified: branches/gcc-7-branch/gcc/testsuite/ChangeLog branches/gcc-7-branch/gcc/testsuite/g++.dg/ext/sync-4.C
[Bug libgomp/88707] New: Random failures of libgomp.c++/task-reduction-(8|10).C on x86_64-apple-darwin18
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88707 Bug ID: 88707 Summary: Random failures of libgomp.c++/task-reduction-(8|10).C on x86_64-apple-darwin18 Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: dominiq at lps dot ens.fr CC: iains at gcc dot gnu.org, jakub at gcc dot gnu.org Target Milestone: --- Host: x86_64-apple-darwin18 Target: x86_64-apple-darwin18 Build: x86_64-apple-darwin18 On x86_64-apple-darwin18 I see WARNING: program timed out. FAIL: libgomp.c++/task-reduction-10.C execution test WARNING: program timed out. FAIL: libgomp.c++/task-reduction-8.C execution test since they were introduced at revision r265930. Not only do the tests randomly time out for -m32 or -m64, but I also have to kill the executable manually. I don't see the problem on darwin 10.
[Bug middle-end/82564] ICE at -O1 and above: in assign_stack_temp_for_type, at function.c:783
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82564 Jakub Jelinek changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||jakub at gcc dot gnu.org Resolution|--- |FIXED --- Comment #4 from Jakub Jelinek --- Fixed on the trunk.
[Bug target/88620] [7/8 Regression] ICE in assign_stack_temp_for_type, at function.c:837
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88620 Jakub Jelinek changed: What|Removed |Added Summary|[7/8/9 Regression] ICE in |[7/8 Regression] ICE in |assign_stack_temp_for_type, |assign_stack_temp_for_type, |at function.c:837 |at function.c:837 --- Comment #5 from Jakub Jelinek --- Fixed on the trunk so far.
[Bug fortran/88653] Is this a compiler bug?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88653 --- Comment #12 from Dominique d'Humieres --- It seems that the problem comes from your installation. Did you build gfortran yourself or did you get it from some binary distribution? If the latter, from where? Did you report the problem to them? Is this the first time you use gfortran? If not, what was the last working version? > A list of files that failed to compile. Does this mean that the other files compile and run?
[Bug target/60563] FAIL: g++.dg/ext/sync-4.C on *-apple-darwin*
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60563 --- Comment #18 from dominiq at gcc dot gnu.org --- Author: dominiq Date: Sat Jan 5 11:17:40 2019 New Revision: 267596 URL: https://gcc.gnu.org/viewcvs?rev=267596&root=gcc&view=rev Log: 2019-01-05 Dominique d'Humieres PR target/60563 * g++.dg/ext/sync-4.C: Add dg-xfail-run-if for darwin. Modified: branches/gcc-8-branch/gcc/testsuite/ChangeLog branches/gcc-8-branch/gcc/testsuite/g++.dg/ext/sync-4.C
[Bug target/88706] New: [og8, nvptx, openacc] Inconsistencies when vector length set using vector_length clause or fopenacc-dim
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88706 Bug ID: 88706 Summary: [og8, nvptx, openacc] Inconsistencies when vector length set using vector_length clause or fopenacc-dim Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- Consider libgomp testcase vred2d-128.c (posted partially here): ... gentest (test1, "acc parallel loop gang vector_length (128)", "acc loop vector reduction(+:t1) reduction(-:t2)") gentest (test2, "acc parallel loop gang vector_length (128)", "acc loop worker vector reduction(+:t1) reduction(-:t2)") gentest (test3, "acc parallel loop gang worker vector_length (128)", "acc loop vector reduction(+:t1) reduction(-:t2)") gentest (test4, "acc parallel loop", "acc loop reduction(+:t1) reduction(-:t2)") ... The resulting front-end attributes are: ... $ grep -A1 __attribute__ vred2d-128.c.088t.fixup_cfg4 __attribute__((oacc function (, , 128), omp target entrypoint)) test1._omp_fn.0 (long int * t2, long int * t1, int[1] * a2, int[1] * a1) -- __attribute__((oacc function (, , 128), omp target entrypoint)) test2._omp_fn.1 (long int * t2, long int * t1, int[1] * a2, int[1] * a1) -- __attribute__((oacc function (, , 128), omp target entrypoint)) test3._omp_fn.2 (long int * t2, long int * t1, int[1] * a2, int[1] * a1) -- __attribute__((oacc function (, , ), omp target entrypoint)) test4._omp_fn.3 (long int * t2, long int * t1, int[1] * a2, int[1] * a1) ... When we compile at -O2 and grep for the resulting dimensions, we have: ... $ grep FUNC_MAP vred2d-128.s //:FUNC_MAP "test1$_omp_fn$0", 0, 0x1, 0x80 //:FUNC_MAP "test2$_omp_fn$1", 0, 0x1, 0x80 //:FUNC_MAP "test3$_omp_fn$2", 0, 0, 0x20 //:FUNC_MAP "test4$_omp_fn$3", 0, 0, 0x20 ... Note that the vector length for test3 has been downgraded by the -mno-long-vector-in-workers workaround. 
Now if we remove the hardcoded vector-length (128) from test1, test2 and test3, and we add -fopenacc-dim=::128 we have instead: ... //:FUNC_MAP "test1$_omp_fn$0", 0, 0x1, 0x80 //:FUNC_MAP "test2$_omp_fn$1", 0, 0, 0x80 //:FUNC_MAP "test3$_omp_fn$2", 0, 0, 0x80 //:FUNC_MAP "test4$_omp_fn$3", 0, 0, 0x80 ... The change on test4 is expected. But the change on test3 is unexpected. It should not matter whether we set the vector length on the parallel directive, or using -fopenacc-dim, the effect of -mno-long-vector-in-workers should be the same. The cause for this can be seen by adding this print statement: ... diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index 110dbffe0d0..5aab6db169f 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -5688,6 +5688,7 @@ nvptx_adjust_parallelism (unsigned inner_mask, unsigned outer_mask) offload_attrs oa; populate_offload_attrs (&oa); + fprintf (stderr, "oa.vector_length in nvptx_adjust_parallelism: %d\n", oa.vector_length); if (oa.vector_length == PTX_WARP_SIZE) return inner_mask; ... If we have the first case (vector_length set on parallel directive, no -fopenacc-dim=), we have: ... oa.vector_length in nvptx_adjust_parallelism: 128 oa.vector_length in nvptx_adjust_parallelism: 128 oa.vector_length in nvptx_adjust_parallelism: 128 oa.vector_length in nvptx_adjust_parallelism: 128 oa.vector_length in nvptx_adjust_parallelism: 128 oa.vector_length in nvptx_adjust_parallelism: 128 oa.vector_length in nvptx_adjust_parallelism: 128 oa.vector_length in nvptx_adjust_parallelism: 128 oa.vector_length in nvptx_adjust_parallelism: 128 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 ... 
But in the second case (no vector_length set on parallel directive, using -fopenacc-dim=), we have: ... oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 oa.vector_length in nvptx_adjust_parallelism: 32 ... I think the same problem exists for the other work around in nvptx_adjust_parallelism, thi
[Bug target/88620] [7/8/9 Regression] ICE in assign_stack_temp_for_type, at function.c:837
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88620 --- Comment #4 from Jakub Jelinek --- Author: jakub Date: Sat Jan 5 11:14:12 2019 New Revision: 267595 URL: https://gcc.gnu.org/viewcvs?rev=267595&root=gcc&view=rev Log: PR middle-end/82564 PR target/88620 * expr.c (expand_assignment): For calls returning VLA structures if to_rtx is not a MEM, force it into a stack temporary. * gcc.dg/nested-func-12.c: New test. * gcc.c-torture/compile/pr82564.c: New test. Added: trunk/gcc/testsuite/gcc.c-torture/compile/pr82564.c trunk/gcc/testsuite/gcc.dg/nested-func-12.c Modified: trunk/gcc/ChangeLog trunk/gcc/expr.c trunk/gcc/testsuite/ChangeLog
[Bug middle-end/82564] ICE at -O1 and above: in assign_stack_temp_for_type, at function.c:783
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82564 --- Comment #3 from Jakub Jelinek --- Author: jakub Date: Sat Jan 5 11:14:12 2019 New Revision: 267595 URL: https://gcc.gnu.org/viewcvs?rev=267595&root=gcc&view=rev Log: PR middle-end/82564 PR target/88620 * expr.c (expand_assignment): For calls returning VLA structures if to_rtx is not a MEM, force it into a stack temporary. * gcc.dg/nested-func-12.c: New test. * gcc.c-torture/compile/pr82564.c: New test. Added: trunk/gcc/testsuite/gcc.c-torture/compile/pr82564.c trunk/gcc/testsuite/gcc.dg/nested-func-12.c Modified: trunk/gcc/ChangeLog trunk/gcc/expr.c trunk/gcc/testsuite/ChangeLog
[Bug c/88698] Relax generic vector conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88698 --- Comment #9 from Marc Glisse --- (In reply to Devin Hussey from comment #2) > What I am saying is that I think -flax-vector-conversions should be default, > or we should only have minimal warnings instead of errors. > > That will make generic vectors much easier to use. And more confusing / error-prone, there is a compromise. > typedef uint32_t u32x4 __attribute__((vector_size(16))); > > u32x4 shift(u32x4 val) > { > return _mm_srli_epi32(val, 15); > } Indeed, when calling an intrinsic, it could make sense to allow other vector types of the same size. Or would you expect the same behavior if you were calling your own function instead of _mm_srli_epi32? > 3. Cast. Good lord, if you thought intrinsics were ugly, this will change > your mind: > > return (u32x4)_mm_srli_epi32((__m128i)val, 15); It isn't that bad. First, if you only use intrinsics, you shouldn't define u32x4, then you only have __m128i, __m128 and __m128d, fewer conversions are needed. Then, if you do define u32x4, you can rewrite that as return val >> 15; > This is the second issue: unsigned long and unsigned int are the same size > and should have no issues converting between each other. We could special case this. But note that in C/C++, we don't consider int and long as the same type just because they have the same size, and reinterpreting int* as long* violates strict aliasing. > typedef unsigned u32x4 __attribute__((vector_size(16))); > typedef unsigned long long u64x2 __attribute__((vector_size(16))); > > u64x2 cast(u32x4 val) > { > return val; > } > > > This should emit a warning without a cast. I would recommend an error, but > Clang without -Wvector-conversion accepts this without any complaining. At some point it isn't easy to have a different behavior for an implicit conversion in different contexts. Should the intrinsics be marked with some magic flag that asks to be lax about their arguments? 
(In reply to Devin Hussey from comment #5) > Clang even allows this: > > #include > > uint32x4_t mult(uint16x8_t top, uint32x4_t bot) > { > return top * bot; > } We clearly don't want that...
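The `return val >> 15;` rewrite suggested in the comment above can be sketched as a complete function. This is a minimal illustration that assumes a GCC-compatible compiler with the generic vector extension; it reuses the `u32x4` typedef and the `shift` name from the quoted example and needs no intrinsics header.

```c
#include <stdint.h>

typedef uint32_t u32x4 __attribute__((vector_size(16)));

/* Logical right shift of every 32-bit lane by 15, written with the
   vector-extension operator instead of _mm_srli_epi32.  The lanes are
   unsigned, so the shift fills with zero bits.  */
u32x4 shift (u32x4 val)
{
  return val >> 15;
}
```

Because no intrinsic is involved, no conversion to or from `__m128i` is needed, which is the point being made in the reply.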
[Bug debug/88635] [8 Regression] Assembler error when building with "-g -O2 -m32"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88635 --- Comment #5 from Jakub Jelinek --- Author: jakub Date: Sat Jan 5 11:12:35 2019 New Revision: 267594 URL: https://gcc.gnu.org/viewcvs?rev=267594&root=gcc&view=rev Log: PR debug/88635 * dwarf2out.c (const_ok_for_output_1): Reject MINUS that contains SYMBOL_REF, CODE_LABEL or UNSPEC in subexpressions of second argument. Reject PLUS that contains SYMBOL_REF, CODE_LABEL or UNSPEC in subexpressions of both operands. (mem_loc_descriptor): Handle UNSPEC if target hook acks it and all the subrtxes are CONSTANT_P. * config/i386/i386.c (ix86_const_not_ok_for_debug_p): Revert 2018-11-09 changes. * gcc.dg/debug/dwarf2/pr88635.c: New test. Added: trunk/gcc/testsuite/gcc.dg/debug/dwarf2/pr88635.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/dwarf2out.c trunk/gcc/testsuite/ChangeLog
[Bug target/60563] FAIL: g++.dg/ext/sync-4.C on *-apple-darwin*
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60563 --- Comment #17 from dominiq at gcc dot gnu.org --- Author: dominiq Date: Sat Jan 5 11:09:11 2019 New Revision: 267593 URL: https://gcc.gnu.org/viewcvs?rev=267593&root=gcc&view=rev Log: 2019-01-05 Dominique d'Humieres PR target/60563 Missing PR entry in the previous commit. Modified: trunk/gcc/testsuite/ChangeLog
[Bug target/88638] [9 Regression] FAIL: *string-format-1.* on darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88638 --- Comment #3 from Dominique d'Humieres --- > I submitted the patch below for review. Dominique, if you have > an opportunity to test it on Darwin and let me know if there are > any outstanding problems that would be great. > https://gcc.gnu.org/ml/gcc-patches/2019-01/msg00181.html The patch for c-family/c-attribs.c no longer applies due to revision r267591: I used --- ../_clean/gcc/c-family/c-attribs.c 2019-01-05 05:45:01.0 +0100 +++ gcc/c-family/c-attribs.c 2019-01-05 06:04:49.0 +0100 @@ -632,16 +632,12 @@ positional_argument (const_tree fntype, } bool type_match; - if (code == STRING_CST && POINTER_TYPE_P (argtype)) - { - /* Where the expected code is STRING_CST accept any pointer -to a narrow character type, qualified or otherwise. */ - tree type = TREE_TYPE (argtype); - type = TYPE_MAIN_VARIANT (type); - type_match = (type == char_type_node - || type == signed_char_type_node - || type == unsigned_char_type_node); - } + if (code == STRING_CST) + /* Where the expected code is STRING_CST accept any pointer + expected by attribute format (this includes possibly qualified + char pointers and, for targets like Darwin, also pointers to + struct CFString). */ + type_match = valid_format_string_type_p (argtype); else if (code == INTEGER_TYPE) /* For integers, accept enums, wide characters and other types that match INTEGRAL_TYPE_P except for bool. */ @@ -652,6 +648,21 @@ positional_argument (const_tree fntype, if (!type_match) { + if (code == STRING_CST) + { + /* Reject invalid format strings with an error. */ + if (argno < 1) + error ("%qE attribute argument value %qE refers to " + "parameter type %qT", + atname, pos, argtype); + else + error ("%qE attribute argument %i value %qE refers to " + "parameter type %qT", + atname, argno, pos, argtype); + + return NULL_TREE; + } + if (argno < 1) warning (OPT_Wattributes, "%qE attribute argument value %qE refers to " A quick test showed that it fixed the reported failures.
[Bug c/88698] Relax generic vector conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88698 --- Comment #8 from Andrew Pinski --- (In reply to Devin Hussey from comment #7) > I mean, sure, but how about this? > > What about meeting in the middle? The problem is how do you implement the rules that are required by both the Altivec and Neon programming manuals? Do you treat those types differently? And then what about the generic vector types to/from the Altivec/Neon types? How do you want to have those handled? Basically GCC was trying to follow what the Altivec (VMX) PEM says with respect to the types and their casting. For reference of the Altivec PEM: https://www.nxp.com/docs/en/reference-manual/ALTIVECPEM.pdf Have you read the Altivec PEM? GCC vector extension is/was modeled mostly after the Altivec PEM with a few additions afterwards (like operators and conditionals). Here is the patch which added vector_size: https://gcc.gnu.org/ml/gcc-patches/2001-12/msg00379.html Here is the patch that made it in which added the operators: https://gcc.gnu.org/ml/gcc/2002-05/msg02234.html Notice that this patch has the following test: + v4si a, b; .. + uv4si f; ... + f = a; /* { dg-error "incompatible types in assignment" } */ As mentioned in the thread which added -flax-vector-conversions, that was an accident that some versions of GCC accepted the assignment without the cast. Somehow the testcase got lost.
[Bug c/88698] Relax generic vector conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88698 --- Comment #7 from Devin Hussey --- I mean, sure, but how about this? What about meeting in the middle? -fno-lax-vector-conversions generates errors like it does now. -flax-vector-conversions shuts GCC up. No flag causes warnings on -Wpedantic or -Wvector-conversion. If we really want to enforce the standard, we should also add a pedantic warning for when we use overloads on intrinsic types without -std=gnu*. -Wgnu-vector-extensions or something: warning: { arithmetic operators | logical operators | array subscripts | initializer lists } on vector types are a GNU extension I feel that the weird promotion rules Clang uses should be an error, and assignment to different types should warn without a cast.
[Bug ipa/88702] [6/7/8 regression] We do terrible job optimizing IsHTMLWhitespace from Firefox
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88702 --- Comment #4 from Jan Hubicka --- > The only pass that can do about this (at least right now) is reassoc (both 1 > and 2), which is too late for inlining. So, either teach fnsplit not to > separate multiple if comparisons of the same variable against constants, or > schedule reassoc or just the maybe_optimize_range_tests part thereof in some > early pass. Yep, I also found out about reassoc. Teaching fnsplit to pattern match this is just a partial solution - we would still miscalculate size of function body for functions like this (which indeed look quite common). I will experiment with early reassoc. I kind of debugged what happens later. Because code is compiled with -O2 and growth gets positive for both inlines and functions are not inline, we won't inline.
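As a rough illustration of the "multiple comparisons of the same variable against constants" pattern discussed above, the following sketch places a comparison chain next to a hand-written range-plus-bit test that accepts the same values. The function names and the exact set of characters (space, tab, LF, FF, CR) are assumptions made for the example and are not taken from this thread. The second function only shows the kind of equivalent form a range-test optimization could aim for, not what any GCC pass actually produces.

```c
#include <stdint.h>

/* Chain of comparisons of one variable against constants.  The set of
   characters is an assumption made for illustration.  */
int is_ws_chain (uint32_t c)
{
  return c == 0x20 || c == 0x0A || c == 0x0D || c == 0x09 || c == 0x0C;
}

/* Hand-written equivalent: one range check plus one bit test.  */
int is_ws_bittest (uint32_t c)
{
  const uint64_t mask = (1ULL << 0x20) | (1ULL << 0x0A) | (1ULL << 0x0D)
                        | (1ULL << 0x09) | (1ULL << 0x0C);
  /* The range check runs first, so the shift count never exceeds 32.  */
  return c <= 0x20 && ((mask >> c) & 1);
}

/* Returns 1 when both forms agree on every value in [0, 0xFFFF].  */
int forms_agree (void)
{
  for (uint32_t c = 0; c <= 0xFFFF; c++)
    if (is_ws_chain (c) != is_ws_bittest (c))
      return 0;
  return 1;
}
```

The second form has a much smaller body, which is why the point at which such a rewrite happens relative to inlining affects the size estimate mentioned in the comment.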
[Bug c/88698] Relax generic vector conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88698 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #6 from Alexander Monakov --- My recommendation is to use a union like below; this allows writing code using both generic vectors and intrinsics without casts, and having each operation show exactly what lane types it operates on: typedef unsigned char u8v __attribute__((vector_size(16))); typedef unsigned short u16v __attribute__((vector_size(16))); typedef unsigned int u32v __attribute__((vector_size(16))); typedef union { u8v u8; u16v u16; u32v u32; __m128i m; } uv; Example use: uv x, t, lo_nib, hi_nib; memcpy(&x, ptr, sizeof x); t.u32 = x.u32 >> 4; lo_nib.u8 = x.u8 & 15; hi_nib.u8 = t.u8 & 15; lo_nib.m = _mm_shuffle_epi8(lut.m, lo_nib.m); hi_nib.m = _mm_shuffle_epi8(lut.m, hi_nib.m); This also allows writing 256-bit and 128-bit versions together when appropriate (with help of extra macros for using the right intrinsic function). Would you like to see the documentation mention this pattern?
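The union pattern recommended above can be tried without any x86 headers. The sketch below is a reduced variant, not the code from the comment: it drops the `__m128i` and `u16v` members and the `_mm_shuffle_epi8` lookups so that it compiles on any target with the GNU vector extension, and the function name `split_nibbles` is hypothetical.

```c
#include <string.h>

typedef unsigned char u8v  __attribute__((vector_size(16)));
typedef unsigned int  u32v __attribute__((vector_size(16)));

/* Same idea as the union in the comment, minus the __m128i member so that
   the sketch does not depend on x86 intrinsics headers.  */
typedef union { u8v u8; u32v u32; } uv;

/* Split each of the 16 input bytes into its low and high nibble.  */
void split_nibbles (const unsigned char *ptr,
                    unsigned char *lo, unsigned char *hi)
{
  uv x, t, lo_nib, hi_nib;
  memcpy (&x, ptr, sizeof x);
  t.u32 = x.u32 >> 4;      /* shift as 32-bit lanes */
  lo_nib.u8 = x.u8 & 15;   /* mask as 8-bit lanes: low nibbles */
  hi_nib.u8 = t.u8 & 15;   /* mask as 8-bit lanes: high nibbles */
  memcpy (lo, &lo_nib, sizeof lo_nib);
  memcpy (hi, &hi_nib, sizeof hi_nib);
}
```

Each statement names the lane type it operates on through the union member it uses, which is the readability benefit described in the comment. The 32-bit shift leaks bits from the neighbouring byte into the upper half of each byte, and the following per-byte mask discards them.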