[Bug target/87560] ICE in curr_insn_transform, at lra-constraints.c:3892
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87560 --- Comment #9 from Bill Schmidt --- I plan to backport the fix to releases/gcc-9 after 9.3 releases.
[Bug target/91638] powerpc -mlong-double-NN (documentation) issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91638 Bill Schmidt changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |FIXED CC||wschmidt at gcc dot gnu.org --- Comment #8 from Bill Schmidt --- Work is complete.
[Bug target/93709] [10 regression] fortran.dg/minlocval_4.f90 fails on power 9 after r10-4161
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93709 Bill Schmidt changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2020-03-11 Ever confirmed|0 |1 --- Comment #7 from Bill Schmidt --- Confirmed, since fixed on trunk. Do we want any backports?
[Bug testsuite/94019] [9 regression] gcc.dg/vect/vect-over-widen-17.c fails starting with g:370c2ebe8fa20e0812cd2d533d4ed38ee2d37c85, r9-1590
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94019 --- Comment #3 from Bill Schmidt --- Looks like this could be closed, Kewen?
[Bug testsuite/94019] [9 regression] gcc.dg/vect/vect-over-widen-17.c fails starting with g:370c2ebe8fa20e0812cd2d533d4ed38ee2d37c85, r9-1590
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94019 --- Comment #4 from Bill Schmidt --- Oh sorry, we are awaiting a backport. Never mind.
[Bug target/87560] ICE in curr_insn_transform, at lra-constraints.c:3892
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87560 --- Comment #10 from Bill Schmidt --- rs6000: Fix -mpower9-vector -mno-altivec ICE (PR87560) PR87560 reports an ICE when a test case is compiled with -mpower9-vector and -mno-altivec. This patch terminates compilation with an error when this combination (and other unreasonable ones) are requested. Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Reported error is now: f951: Error: '-mno-altivec' turns off '-mpower9-vector' 2020-03-12 Bill Schmidt Backport from master 2020-03-02 Bill Schmidt PR target/87560 * rs6000-cpus.def (OTHER_ALTIVEC_MASKS): New #define. * rs6000.c (rs6000_disable_incompatible_switches): Add table entry for OPTION_MASK_ALTIVEC.
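The check described in comment #10 can be pictured as a small table lookup. What follows is a minimal sketch, not the rs6000 code: the mask values, the `Dependent` and `NamedOption` tables, and the function `first_conflict` are hypothetical names invented for illustration. Only the option names and the wording of the diagnostic are taken from the comment.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical option bits; the real OPTION_MASK_* values are not shown here.
constexpr std::uint32_t kAltivec = 1u << 0;
constexpr std::uint32_t kPower9Vector = 1u << 1;

// One table entry: an option and the options that cannot work without it.
struct Dependent {
  std::uint32_t base;
  std::uint32_t dependents;
  const char* base_name;
};

struct NamedOption {
  std::uint32_t mask;
  const char* name;
};

// Returns a diagnostic for the first explicitly disabled option that conflicts
// with an explicitly enabled one, or an empty string if there is no conflict.
std::string first_conflict(std::uint32_t explicitly_off,
                           std::uint32_t explicitly_on) {
  static const std::vector<Dependent> table = {
      {kAltivec, kPower9Vector, "altivec"}};
  static const std::vector<NamedOption> names = {
      {kPower9Vector, "power9-vector"}};
  for (const Dependent& d : table) {
    if ((explicitly_off & d.base) == 0) continue;
    for (const NamedOption& n : names) {
      if ((d.dependents & n.mask) && (explicitly_on & n.mask))
        return std::string("'-mno-") + d.base_name + "' turns off '-m" +
               n.name + "'";
    }
  }
  return "";
}
```

Calling `first_conflict(kAltivec, kPower9Vector)` yields the message quoted in the comment; any combination without an explicit conflict yields an empty string.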
[Bug target/87560] ICE in curr_insn_transform, at lra-constraints.c:3892
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87560 Bill Schmidt changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #11 from Bill Schmidt --- Backport complete, closing.
[Bug target/90000] Compile-time hog w/ impossible asm constraints on powerpc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9 Bill Schmidt changed: What|Removed |Added Last reconfirmed||2020-04-02 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW --- Comment #3 from Bill Schmidt --- So, confirmed...
[Bug target/91804] [10 regression] r265398 breaks gcc.target/powerpc/vec-rlmi-rlnm.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91804 Bill Schmidt changed: What|Removed |Added Priority|P2 |P4 --- Comment #3 from Bill Schmidt --- Moving to P4. This is not important, and we will fix this in GCC 11.
[Bug libstdc++/91153] New test case 29_atomics/atomic_float/1.cc execution test fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91153 Bill Schmidt changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #1 from Bill Schmidt --- Looks like this bug hasn't been classified (still P3). Curious whether it's on the list for P10 or deferred.
[Bug libstdc++/91153] New test case 29_atomics/atomic_float/1.cc execution test fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91153 --- Comment #4 from Bill Schmidt --- Perfect, thanks! I'll take it off my concern list...
[Bug target/94707] [8/9/10 Regression] class with empty base passed incorrectly with -std=c++17 on powerpc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94707 --- Comment #6 from Bill Schmidt --- The ELFv2 ABI has a prominent note specifying: "Floating-point and vector aggregates that contain padding words and integer fields with a width of 0 should not be treated as homogeneous aggregates."
[Bug target/94707] [8/9/10 Regression] class with empty base passed incorrectly with -std=c++17 on powerpc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94707 --- Comment #7 from Bill Schmidt --- ELF V1 does not have a concept of homogeneous aggregates.
[Bug target/94707] [8/9/10 Regression] class with empty base passed incorrectly with -std=c++17 on powerpc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94707 --- Comment #8 from Bill Schmidt --- Thus the compiler is acting as expected in both cases, so far as I can see. If C++17 has added new hidden fields, that seems to have introduced an incompatibility between C++17 and C++14 targeted code for the ELFv2 ABI.
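For readers unfamiliar with the kind of type under discussion, here is a minimal sketch. The type names are hypothetical and this is not the test case from the report. It shows only that a class with an empty base can have the same size as the plain two-float struct; whether each is passed as a homogeneous aggregate is decided by the ABI and the compiler and cannot be observed from this source alone.

```cpp
#include <cassert>
#include <type_traits>

struct Empty {};                          // hypothetical empty base class
struct WithBase : Empty { float x, y; };  // two floats plus an empty base
struct Plain { float x, y; };             // the same two floats, no base

static_assert(std::is_empty<Empty>::value, "Empty has no data members");
// With the empty base optimization the base adds no storage.
static_assert(sizeof(WithBase) == sizeof(Plain), "same size as Plain");

// Passing by value is where the calling convention matters.
float sum(WithBase v) { return v.x + v.y; }
```

Comparing the code generated for `sum` under -std=c++14 and -std=c++17 on powerpc64le would be one way to look for the incompatibility described above.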
[Bug target/94954] New: Wrong code generation for vec_pack_to_short_fp32 builtin for Power
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94954 Bug ID: 94954 Summary: Wrong code generation for vec_pack_to_short_fp32 builtin for Power Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wschmidt at gcc dot gnu.org Target Milestone: --- This builtin was mis-implemented. It is supposed to pack 32-bit floating-point values into 16-bit floating-point form. Instead, the values are converted to unsigned 16-bit integer form. This should be fixed in all supported releases.
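The difference between the two behaviours is easy to see on a single scalar. The sketch below is an illustration only, not the builtin's implementation: `float_to_half_bits` and `float_to_uint16` are hypothetical helpers, and the half-precision conversion handles only values in the normal binary16 range and truncates the mantissa instead of rounding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Encode a float as IEEE binary16 bits. Sketch only: normal range, truncation.
std::uint16_t float_to_half_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  std::uint32_t sign = (u >> 16) & 0x8000u;
  // Re-bias the exponent from binary32 (127) to binary16 (15).
  std::int32_t exp = static_cast<std::int32_t>((u >> 23) & 0xFFu) - 127 + 15;
  std::uint32_t mant = u & 0x7FFFFFu;
  assert(exp > 0 && exp < 31);  // subnormals, infinities and NaNs not handled
  return static_cast<std::uint16_t>(
      sign | (static_cast<std::uint32_t>(exp) << 10) | (mant >> 13));
}

// What the report says happens instead: conversion to an unsigned integer.
std::uint16_t float_to_uint16(float f) {
  return static_cast<std::uint16_t>(f);
}
```

For the input 2.5f the first helper produces the bit pattern 0x4100 (2.5 in half precision), while the second produces the integer 2.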
[Bug target/94954] Wrong code generation for vec_pack_to_short_fp32 builtin for Power
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94954 Bill Schmidt changed: What|Removed |Added Target Milestone|--- |11.0 Keywords||wrong-code CC||segher at gcc dot gnu.org Ever confirmed|0 |1 Target||powerpc*-*-* Last reconfirmed||2020-05-05 Status|UNCONFIRMED |NEW --- Comment #1 from Bill Schmidt --- Confirmed.
[Bug target/95082] New: LE implementations of vec_cnttz_lsbb and vec_cntlz_lsbb are wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95082 Bug ID: 95082 Summary: LE implementations of vec_cnttz_lsbb and vec_cntlz_lsbb are wrong Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wschmidt at gcc dot gnu.org Target Milestone: --- For little endian, we need to swap vctzlsbb and vclzlsbb, but today we generate the BE instruction in all cases.
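Why the two instructions must be swapped can be modelled in scalar code. This is a sketch under the assumption that each operation counts consecutive byte elements whose least-significant bit is zero, from one end of the vector; the `Bytes` alias and the helper names are hypothetical. Reversing the element order, which is how little-endian element numbering relates to register order, exchanges the two counts.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

using Bytes = std::array<std::uint8_t, 16>;

// Consecutive bytes, starting at index 0, whose least-significant bit is 0.
int count_leading_zero_lsb(const Bytes& v) {
  int n = 0;
  for (std::uint8_t b : v) {
    if (b & 1) break;
    ++n;
  }
  return n;
}

// Consecutive bytes, starting at index 15, whose least-significant bit is 0.
int count_trailing_zero_lsb(const Bytes& v) {
  int n = 0;
  for (auto it = v.rbegin(); it != v.rend(); ++it) {
    if (*it & 1) break;
    ++n;
  }
  return n;
}

// Model of viewing the same register with the opposite element order.
Bytes reversed(Bytes v) {
  std::reverse(v.begin(), v.end());
  return v;
}
```

For any input `v`, `count_leading_zero_lsb(reversed(v))` equals `count_trailing_zero_lsb(v)`, which is the scalar analogue of needing the other instruction on little endian.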
[Bug target/95082] LE implementations of vec_cnttz_lsbb and vec_cntlz_lsbb are wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95082 Bill Schmidt changed: What|Removed |Added Keywords||wrong-code Status|UNCONFIRMED |NEW Last reconfirmed||2020-05-12 Ever confirmed|0 |1 Target Milestone|--- |11.0 CC||segher at gcc dot gnu.org --- Comment #1 from Bill Schmidt --- Confirmed.
[Bug fortran/95053] [11 regression] ICE in f951: gfc_divide()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95053 Bill Schmidt changed: What|Removed |Added CC||wschmidt at gcc dot gnu.org --- Comment #22 from Bill Schmidt --- Breaking legitimate code, even if "borderline," does not seem right to me. Zero division is generally a runtime exception because of such cases. You write code for a general case, then later you discover "oh, well, we could make this variable zero for our specific usage," and now the compiler throws a fit? Seems like this is warning-level stuff.
[Bug fortran/95053] [11 regression] ICE in f951: gfc_divide()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95053 --- Comment #25 from Bill Schmidt --- But I'm not going to worry about it further.
[Bug target/70053] Returning a struct of _Decimal128 values generates extraneous stores and loads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70053 Bill Schmidt changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2020-05-22 --- Comment #5 from Bill Schmidt --- I'd like to understand why the difference between -O2 and -O3 exists. We shouldn't generate this kind of nasty store-load at -O2. Confirmed, BTW. :)
[Bug target/95737] PPC: Unnecessary extsw after negative less than
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737 --- Comment #1 from Bill Schmidt --- Please test this out of context of a return statement. The problem with unnecessary extends of return values is widely known and not specific to this particular case.
[Bug target/95737] PPC: Unnecessary extsw after negative less than
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737 Bill Schmidt changed: What|Removed |Added Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED --- Comment #2 from Bill Schmidt --- If you can show this is different from 65010 (not a return value issue), please reopen. *** This bug has been marked as a duplicate of bug 65010 ***
[Bug target/65010] ppc backend generates unnecessary signed extension
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65010 Bill Schmidt changed: What|Removed |Added CC||jens.seifert at de dot ibm.com --- Comment #10 from Bill Schmidt --- *** Bug 95737 has been marked as a duplicate of this bug. ***
[Bug target/95952] [8 Regression] gcc-8 bootstrap failure on powerpc64-linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95952 Bill Schmidt changed: What|Removed |Added CC||willschm at gcc dot gnu.org --- Comment #2 from Bill Schmidt --- Ah, this is Will, not me. Will, can you please look into this ASAP?
[Bug target/96017] Powerpc suboptimal register spill in likely path
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96017 Bill Schmidt changed: What|Removed |Added CC||segher at gcc dot gnu.org, wschmidt at gcc dot gnu.org Target Milestone|--- |9.4 Build|gcc version 9.2.1 20190909 (Debian 9.2.1-8) | Keywords||missed-optimization --- Comment #1 from Bill Schmidt --- Built with gcc version 9.2.1 20190909 (Debian 9.2.1-8) (moved from Build field).
[Bug target/96017] Powerpc suboptimal register spill in likely path
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96017 Bill Schmidt changed: What|Removed |Added Target Milestone|9.4 |11.0
[Bug target/96017] Powerpc suboptimal register spill in likely path
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96017 --- Comment #2 from Bill Schmidt --- Nick reports same behavior at -O3.
[Bug target/96139] Vector element extract mistypes long long int down to long int
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96139 --- Comment #2 from Bill Schmidt --- Have you tried it for -m32, out of curiosity?
[Bug target/96787] rs6000 mcpu=power10 miscompiles libiberty htab_delete() causing bootstrap failure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96787 --- Comment #5 from Bill Schmidt --- The divergence occurs after .L75 in the two versions. In the P10 version, we see that the second bctrl has been converted into a bctr. It looks like a tail call optimization happening, but we aren't at the end of the function. This happens again later for the second bctrl after .L78. Why would we think a tail call optimization can happen in the middle of a block...
[Bug target/96787] rs6000 mcpu=power10 miscompiles libiberty htab_delete() causing bootstrap failure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96787 Bill Schmidt changed: What|Removed |Added CC||amodra at gcc dot gnu.org --- Comment #6 from Bill Schmidt --- Ah, I can't read. These really are at the end(s) of the function, so it does appear to be a tail call optimization that we've decided is legitimate. It looks like the problem is that we don't set up r12 prior to the tail call. Unlike the usual case, here we have "mtctr 9" to set up the CTR and r12 still points to the previously called function. That can't be good. This is exposed by my recent patch to allow more tail calls in rs6000_decl_ok_for_sibcall, but it's not clear to me where we need to fix things up so that r12 gets set.
[Bug target/96787] rs6000 mcpu=power10 miscompiles libiberty htab_delete() causing bootstrap failure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96787 --- Comment #7 from Bill Schmidt --- I believe the problem may be that rs6000_sibcall_aix doesn't contain any handling for indirect calls, whereas similar code for other ABIs, like rs6000_sibcall_sysv, does. Alan, does this make sense?
[Bug target/96787] rs6000 mcpu=power10 miscompiles libiberty htab_delete() causing bootstrap failure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96787 --- Comment #8 from Bill Schmidt --- I'm working on a patch.
[Bug target/96787] rs6000 mcpu=power10 miscompiles libiberty htab_delete() causing bootstrap failure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96787 Bill Schmidt changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #10 from Bill Schmidt --- Work is complete.
[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791 Bill Schmidt changed: What|Removed |Added CC||luoxhu at gcc dot gnu.org --- Comment #2 from Bill Schmidt --- CCing Xiong Hu, as I suspect this is related to the recent work for partially dead stores.
[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791 --- Comment #4 from Bill Schmidt --- Not the partially dead store code after all -- just a coincidence!
[Bug target/92132] new test case gcc.dg/vect/vect-cond-reduc-4.c fails with its introduction in r277067
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92132 --- Comment #2 from Bill Schmidt --- Yes, odd that the comparison is flagged as not vectorizable.
[Bug testsuite/92093] New test case gcc.target/powerpc/pr91275.c from r276410 fails on BE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92093 --- Comment #4 from Bill Schmidt --- Author: wschmidt Date: Thu Oct 17 15:32:40 2019 New Revision: 277117 URL: https://gcc.gnu.org/viewcvs?rev=277117&root=gcc&view=rev Log: 2019-10-17 Bill Schmidt Backport from mainline 2019-10-15 Bill Schmidt PR target/92093 * gcc.target/powerpc/pr91275.c: Fix type and endian issues. Modified: branches/gcc-9-branch/gcc/testsuite/ChangeLog branches/gcc-9-branch/gcc/testsuite/gcc.target/powerpc/pr91275.c
[Bug testsuite/92093] New test case gcc.target/powerpc/pr91275.c from r276410 fails on BE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92093 --- Comment #5 from Bill Schmidt --- Author: wschmidt Date: Thu Oct 17 15:33:58 2019 New Revision: 277118 URL: https://gcc.gnu.org/viewcvs?rev=277118&root=gcc&view=rev Log: 2019-10-17 Bill Schmidt Backport from mainline 2019-10-15 Bill Schmidt PR target/92093 * gcc.target/powerpc/pr91275.c: Fix type and endian issues. Modified: branches/gcc-8-branch/gcc/testsuite/ChangeLog branches/gcc-8-branch/gcc/testsuite/gcc.target/powerpc/pr91275.c
[Bug testsuite/92093] New test case gcc.target/powerpc/pr91275.c from r276410 fails on BE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92093 --- Comment #6 from Bill Schmidt --- Author: wschmidt Date: Thu Oct 17 15:35:28 2019 New Revision: 277119 URL: https://gcc.gnu.org/viewcvs?rev=277119&root=gcc&view=rev Log: 2019-10-17 Bill Schmidt Backport from mainline 2019-10-15 Bill Schmidt PR target/92093 * gcc.target/powerpc/pr91275.c: Fix type and endian issues. Modified: branches/gcc-7-branch/gcc/testsuite/ChangeLog branches/gcc-7-branch/gcc/testsuite/gcc.target/powerpc/pr91275.c
[Bug testsuite/92093] New test case gcc.target/powerpc/pr91275.c from r276410 fails on BE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92093 Bill Schmidt changed: What|Removed |Added Status|NEW |RESOLVED Known to work||7.4.0, 8.3.0, 9.1.0 Resolution|--- |FIXED Known to fail|7.4.0, 8.3.0, 9.1.0 | --- Comment #7 from Bill Schmidt --- Fixed everywhere.
[Bug testsuite/92093] New test case gcc.target/powerpc/pr91275.c from r276410 fails on BE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92093 Bill Schmidt changed: What|Removed |Added Status|RESOLVED|CLOSED --- Comment #8 from Bill Schmidt --- Closing.
[Bug testsuite/92126] gcc.dg/vect/pr62171.c fails on power7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92126 --- Comment #4 from Bill Schmidt --- Should we close this? I found it on an internal list of old failures on P7 that need looking at. Not sure having this issue open provides value.
[Bug target/92287] Mismatches in the calling convention for zero sized types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92287 --- Comment #5 from Bill Schmidt --- For 32-bit big-endian PowerPC (using the 32-bit ELF ABI), the same code generation is provided by GCC and Clang. I.e., here's the code generation for Clang with -O2 -m32 -mbig-endian, using 6.0.0-1ubuntu2:

id_foo:                                 # @id_foo
.Lfunc_begin0:
# %bb.0:
        mr 3, 4
        blr

The ABI document used to be posted at power.org, which is defunct. However, the sources are available at github: https://github.com/ryanarn/powerabi

For the 32-bit ELF ABI, all structs (regardless of size) are passed using a pointer allowing for call-by-value semantics. This is the source of ZSTs requiring a register. So it's clear there is an ABI that requires this behavior. (Look for the Parameter Passing Register Selection Algorithm in https://github.com/ryanarn/powerabi/blob/master/chap3-elf32abi.sgml.)

The 64-bit ABIs (both ELF V1 and ELF V2) pass structures in registers, and the parameter passing algorithms won't assign registers for size-0 aggregates. This is intentional.

I hope this is helpful!

Bill
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #32 from Bill Schmidt --- BTW, we are in close contact with the Clang folks for Power as well, so we're going to get together with them about constraints consistency and a way forward to ensure these problems don't recur. I don't want anyone to get the idea that we don't care about Clang; we care very much indeed about compatibility between the compilers. I've personally worked on both in the past to ensure the ABI compatibility that you enjoy today. We had an unexpected gotcha here, and we'll get it resolved. I do agree with Segher that this was a long-overdue cleanup that was causing us a lot of misery, and the use of "ws" in the field was rather surprising to us. Long term we really want to get to "wa" and remove "ws", but aliases in the meantime will be needed until supported versions of Clang and GCC have the compatibility issue resolved.
[Bug tree-optimization/92098] [9 Regression] After r262333, the following code cannot be vectorized on powerpc64le.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92098 Bill Schmidt changed: What|Removed |Added Status|RESOLVED |REOPENED Last reconfirmed||2019-11-15 Resolution|DUPLICATE |--- Target Milestone|10.0 |9.3 Summary|[10 Regression] After r262333, the following code cannot be vectorized on powerpc64le. |[9 Regression] After r262333, the following code cannot be vectorized on powerpc64le. Ever confirmed|0 |1 --- Comment #2 from Bill Schmidt --- This was actually reported to us as a [9 regression], so until 92132 is backported (and we need to check that the backport fixes the issue), this should stay open, I guess.
[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 92098, which changed state. Bug 92098 Summary: [9 Regression] After r262333, the following code cannot be vectorized on powerpc64le. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92098 What|Removed |Added Status|RESOLVED|REOPENED Resolution|DUPLICATE |---
[Bug testsuite/92398] [10 regression] error in update of gcc.target/powerpc/pr72804.c in r277872
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92398 Bill Schmidt changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-12-09 Ever confirmed|0 |1 --- Comment #11 from Bill Schmidt --- Confirmed. ;-) Is this ready to close?
[Bug target/92923] __builtin_vec_xor() causes subregs to be used when not using V4SImode vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92923 Bill Schmidt changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-12-12 CC||wschmidt at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Bill Schmidt --- Confirmed. The problem is bad overloading code for vec_xor, which accepts all vector types but translates them all into V4SI mode instead of having individual patterns for the different modes:

In rs6000-builtin.def:

  BU_ALTIVEC_2 (VXOR, "vxor", CONST, xorv4si3)

There should be multiples of these for different vector modes, not just one for all of them.

In rs6000-c.c:

  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_bool_V4SI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V4SF, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_bool_V2DI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V2DF, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DF, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_unsigned_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_bool_V4SI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_bool_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_bool_V8HI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_unsigned_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_unsigned_V8HI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, RS6000_BTI_bool_V8HI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_
[Bug target/91534] some defined builtins are not usable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91534 Bill Schmidt changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||wschmidt at gcc dot gnu.org Resolution|--- |INVALID --- Comment #2 from Bill Schmidt --- For clarity, many of these interfaces are only used internally as part of mappings from overloaded builtins to builtins for a specific set of vector type arguments. Ultimately the interface that the user sees will be something like vec_madd. These internal tables are not intended to be a source of all possible interfaces that users can access. Accepted vector interfaces are defined in Appendix A of the Power ELF v2 ABI. Better documentation of them is in progress and should become available in 1H2020. Overhauling the whole Power-specific builtin system is on my list for GCC 11 if I can make the time.
[Bug target/93011] PowerPC GCC has warning that aggregate alignment changed in GCC 5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93011 --- Comment #1 from Bill Schmidt --- This is worth considering; but offhand I don't believe we should remove this until common distros that use GCC 4.8 or 4.9 as default are retired (RHEL 7 and SLES 12, for example, both use 4.8 as default and are still supported). This would take us out until 2027 at least, for long-term support contracts...
[Bug target/93011] PowerPC GCC has warning that aggregate alignment changed in GCC 5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93011 Bill Schmidt changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-12-19 Ever confirmed|0 |1 --- Comment #2 from Bill Schmidt --- But sure, confirmed. ;-)
[Bug tree-optimization/93013] PPC: optimization around modulo leads to incorrect result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93013 Bill Schmidt changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-12-19 CC||meissner at gcc dot gnu.org, segher at gcc dot gnu.org Component|c++ |tree-optimization Ever confirmed|0 |1 --- Comment #1 from Bill Schmidt --- The branch is removed by the middle end, as the optimized dump shows:

;; Function mod (_Z3modiiRi, funcdef_no=0, decl_uid=3256, cgraph_uid=1, symbol_order=0)

mod (int x, int y, int & z)
{
  int _1;
  bool _3;
  int _9;

  [local count: 1073741824]:
  _1 = x_4(D) % y_5(D);
  *z_7(D) = _1;
  _3 = y_5(D) == 0;
  _9 = (int) _3;
  return _9;
}

On POWER9 we will get the expected answer given the use of the modsw instruction. For POWER8 we get the codegen with divw which is indeed undefined for these inputs. I guess the issue would be in the expander for mod3. Confirmed.
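In C++ the result of `x % y` is undefined when `y` is zero, so code that wants a well-defined answer on every processor has to test the divisor before computing the remainder. The sketch below is a defensive variant written for illustration, not the reporter's test case; `checked_mod` is a hypothetical name.

```cpp
#include <cassert>

// Returns true (and leaves z untouched) when the remainder is not defined.
// Otherwise stores x % y in z and returns false.
bool checked_mod(int x, int y, int& z) {
  if (y == 0) return true;  // test the divisor before using it
  if (y == -1) {            // avoids the INT_MIN % -1 overflow case
    z = 0;
    return false;
  }
  z = x % y;
  return false;
}
```

With the test placed first, the result no longer depends on what the target's divide or modulo instruction does for a zero divisor.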
[Bug target/93013] PPC: optimization around modulo leads to incorrect result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93013 Bill Schmidt changed: What|Removed |Added Target|powerpc-ibm-aix7.1.0.0 |powerpc-*-*-* Component|tree-optimization |target Target Milestone|--- |8.4
[Bug target/70928] Load simple float constants via VSX operations on PowerPC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70928 Bill Schmidt changed: What|Removed |Added CC||jens.seifert at de dot ibm.com --- Comment #3 from Bill Schmidt --- *** Bug 93128 has been marked as a duplicate of this bug. ***
[Bug target/93128] PPC small floating point constants can be constructed using vector operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93128 Bill Schmidt changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||wschmidt at gcc dot gnu.org Resolution|--- |DUPLICATE --- Comment #2 from Bill Schmidt --- This is a duplicate of PR70928. *** This bug has been marked as a duplicate of bug 70928 ***
[Bug target/93206] non-delegitimized UNSPEC generated for C program on PowerPc with current mainline GCC tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93206 Bill Schmidt changed: What|Removed |Added CC||wschmidt at gcc dot gnu.org --- Comment #2 from Bill Schmidt --- (In reply to Jakub Jelinek from comment #1)
> note: non-delegitimized UNSPEC
> is just a debugging note solely in non-release checking builds. Not all
> UNSPECs need to be delegitimized, it is just a hint that it is something
> that could be inspected, whether it can be easily delegitimized or not (see
> rs6000_delegitimize_address).
> As UNSPEC_FOO is not in upstream GCC, I fail to see the need for upstream PR.

What may not be crystal clear here is that var-tracking is making a mistake. The original problem doesn't look quite like the obfuscated one here -- this came up while working on some future work where an UNSPEC does seem necessary. We get something like

  result = complex-thing-needing-UNSPEC
  result = expression-involving-result

where both definitions of result get assigned to the same register, say r31. The first statement has a var_location note saying result is in r31. The second statement has a var_location note saying result is in the UNSPEC RTL generated from complex-thing-needing-UNSPEC. This is absolutely wrong, because result is once again in r31 at this point; it's a new lifetime of result, but that's where it is. That's the bug we're attempting to report here.

Unfortunately, as the test was changed to hide stuff we can't disclose yet, the error message moved to a different place and another UNSPEC, which as you note is less obviously necessary as an UNSPEC anyway. But that's an unimportant detail. This var-tracking bug is making it hard to develop the code for this new feature. The original test was testing four versions of this instruction, each with a different mode. Only the V4DI version fails due to the var-tracking error. So the upstream PR is needed so development can proceed.

> Anyway, I have to wonder why vsx.md uses so many UNSPECs, can't e.g.
> UNSPEC_VSX_SET be just using vec_merge of vec_duplicate of the scalar
> operand (what is inserted) and the vector operand, with the position as last
> operand?
> Is the reason the endian correction?

There are too many unnecessary UNSPECs in the Power back end, yeah. We're rooting those out as we have time. But this bug shows up with a necessary UNSPEC originally, so that's not relevant for this report.
[Bug debug/93206] non-delegitimized UNSPEC generated for C program on PowerPc with current mainline GCC tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93206 --- Comment #6 from Bill Schmidt --- (In reply to Jakub Jelinek from comment #4)
> There is no error, it is a note and if some variable at some point, even
> short one, can't be described using just registers or memory, but needs the
> value of the UNSPEC to describe it, there is no var-tracking bug, it just
> tries to build debug info from the UNSPEC and finds out it can't.

But there *is* a register that describes the variable. It's wrongly using the UNSPEC instead. I contend that this is indeed a bug.
[Bug target/93230] PowerPC GCC vec_extract of a vector in memory does not fold sign/zero extension into load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93230 --- Comment #5 from Bill Schmidt --- Yeah, vec_extract should get folded in rs6000_fold_builtin eventually. I think that Will had a patch in progress on this at one time, but ran into some difficulties and it got abandoned in favor of more urgent work. Exposing extract to gimple optimization is highly desirable.
[Bug target/93230] PowerPC GCC vec_extract of a vector in memory does not fold sign/zero extension into load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93230 --- Comment #6 from Bill Schmidt --- That should read "rs6000_gimple_fold_builtin".
[Bug target/91274] vec_splat_[us]64 missing for ppc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91274 Bill Schmidt changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||wschmidt at gcc dot gnu.org Resolution|--- |INVALID --- Comment #2 from Bill Schmidt --- Such interfaces were never supported or promised for ppc64le. The fact that s390 supports them is irrelevant. The supported interfaces you're looking for are: vector signed long long vec_splats (signed long long); vector unsigned long long vec_splats (unsigned long long); See Appendix A of the ELF V2 ABI Specification for a list of all vector functions that must be supported by compilers for ppc64le.
[Bug target/91274] vec_splat_[us]64 missing for ppc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91274 --- Comment #4 from Bill Schmidt --- The short answer is history. Those others were inherited from the old Altivec PIM. Having splat-immediates with different names for different sizes and signedness isn't consistent with the rest of the vector architecture, which uses overloading to accomplish the same thing with fewer names to remember. So it wasn't deemed necessary to require all compilers to add yet another redundant interface when VSX came along. We could add the interfaces you request to GCC, but using them would be non-portable across compilers, and therefore they wouldn't be recommended. Adding to the list of interfaces to be supported by all compilers can be done, but will take time to propagate everywhere, and until then they would still not be recommended. So I'm not sure you really want to go that way. But if you do, feel free to re-open this as a feature request.
[Bug target/93448] PPC: missing builtin for DFP quantize(dqua,dquai,dquaq,dquaiq)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93448 Bill Schmidt changed: What|Removed |Added CC||meissner at gcc dot gnu.org --- Comment #1 from Bill Schmidt --- We haven't previously spent much effort on DFP intrinsics due to lack of interested users on Linux. Sounds like we have some now. :-) CCing Mike and Segher for the inline asm constraint question.
[Bug target/93449] PPC: Missing conversion builtin from vector to _Decimal128 and vice versa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93449 --- Comment #7 from Bill Schmidt --- The ELFv2 ABI Appendix B calls for a bcd data type defined as: typedef bcd vector unsigned char; and then defines a bunch of potential functions that can be built around it. The BCD functions (such as __builtin_bcdadd), earlier in the appendix, are defined in terms of vector unsigned char. GCC has just never gotten around to implementing these, due to a combination of user disinterest and resource constraints. We'll have to step up to these, hopefully in GCC 11 (though our plate is really full there already).
[Bug target/91903] vec_ctf altivec intrinsic can cause ICE on powerpc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91903 --- Comment #4 from Bill Schmidt --- Well, we should give you a better error message instead of an ICE. But the ABI definition of the second argument as "const int" indicates it needs to be an actual constant in the range 0..31. So You're Doing It Wrong (TM). If you have a need for b to be a variable, you'll probably have to use a constant 0 and do the scaling yourself.
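The "constant 0 plus manual scaling" workaround can be modelled in plain scalar C. This is only a sketch, not the AltiVec code itself: vec_ctf (a, b) converts each integer element to float and divides by 2**b, so with b fixed at 0 the division has to be applied separately. The helper name ctf_scaled and the array-based interface are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of "vec_ctf (a, 0), then scale by hand".
   b may be a runtime value here, unlike the builtin's second argument. */
static void ctf_scaled(const int32_t *in, float *out, int n, int b)
{
    /* Same range the ABI requires of the constant argument. */
    assert(b >= 0 && b <= 31);

    /* 2**-b, computed once; exact in float for every b in 0..31. */
    const float scale = 1.0f / (float) (1u << b);

    for (int i = 0; i < n; i++)
        out[i] = (float) in[i] * scale;  /* plain conversion, then scaling */
}
```

In real vector code the conversion would be vec_ctf (a, 0) and the multiply would use a splatted copy of the scale factor; the scalar loop above only shows the arithmetic.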
[Bug target/93230] PowerPC GCC vec_extract of a vector in memory does not fold sign/zero extension into load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93230 --- Comment #8 from Bill Schmidt --- Yes, the variable element numbers were the difficulties in question that slowed things down last time, as I recall. We may want to try to fold the simple cases in gimple and let the rest run through to expand.
[Bug target/91903] vec_ctf altivec intrinsic can cause ICE on powerpc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91903 Bill Schmidt changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |wschmidt at gcc dot gnu.org Target Milestone|--- |10.0 --- Comment #5 from Bill Schmidt --- I'll have a look.
[Bug target/93570] PPC: __builtin_mtfsf does not return a value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93570 Bill Schmidt changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2020-02-05 Ever confirmed|0 |1 --- Comment #1 from Bill Schmidt --- Yes, looks like the documentation is wrong. Looking at GCC trunk: ftype = build_function_type_list (void_type_node, intSI_type_node, double_type_node, NULL_TREE); def_builtin ("__builtin_mtfsf", ftype, RS6000_BUILTIN_MTFSF); This indicates the correct prototype to be: void __builtin_mtfsf (const int, double); as you suggest. The documentation needs correcting, but you should be able to use the correct prototype in 8.3.0. This builtin hasn't changed in ages. Confirmed.
[Bug target/91903] vec_ctf altivec intrinsic can cause ICE on powerpc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91903 Bill Schmidt changed: What|Removed |Added Keywords||ice-on-invalid-code Target Milestone|10.0|11.0 --- Comment #6 from Bill Schmidt --- I could put a bandaid on this for GCC 10, but I don't think it's worthwhile. The rewrite of the builtin initialization code will take care of this in GCC 11. There are probably worse things than ice-on-invalid. I'll keep this open to track for the next release.
[Bug target/93570] PPC: __builtin_mtfsf does not return a value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93570 Bill Schmidt changed: What|Removed |Added Target Milestone|--- |10.0
[Bug target/93570] PPC: __builtin_mtfsf does not return a value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93570 Bill Schmidt changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #3 from Bill Schmidt --- Fixed.
[Bug target/90763] PowerPC vec_xl_len should take const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90763 --- Comment #2 from Bill Schmidt --- Whoops, that was not supposed to go to bz. Sorry about that.
[Bug target/93709] [10 regression] fortran.dg/minlocval_4.f90 fails on power 9 after r10-4160
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93709 --- Comment #1 from Bill Schmidt --- r10-4160 is the "daily bump" commit. How confident are you in your bisection? :-)
[Bug target/93819] PPC64 builtin vec_rlnm() argument order is wrong.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93819 Bill Schmidt changed: What|Removed |Added Component|c |target Target Milestone|--- |10.0
[Bug target/87560] ICE in curr_insn_transform, at lra-constraints.c:3892
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87560 --- Comment #2 from Bill Schmidt --- Hm, I can't reproduce this with current trunk. Does it still occur for you, Martin?
[Bug target/87560] ICE in curr_insn_transform, at lra-constraints.c:3892
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87560 --- Comment #3 from Bill Schmidt --- I expect the problem is still there somewhere, but it's gone latent. There haven't been any changes to *xxspltib__split since 2016. Will need to look at gcc-9 branch to debug.
[Bug target/87560] ICE in curr_insn_transform, at lra-constraints.c:3892
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87560 --- Comment #4 from Bill Schmidt --- Although perhaps we've done a better job of sorting out these flags since then. Segher, anything ring a bell?
[Bug target/87560] ICE in curr_insn_transform, at lra-constraints.c:3892
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87560 --- Comment #6 from Bill Schmidt --- OK, looks like the gimple has changed so we don't see the opportunity anymore in GCC 10.
[Bug tree-optimization/56321] [4.8 Regression] ICE:segfault in midend for -funsafe-math-optimizations -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56321 William J. Schmidt changed: What|Removed |Added CC||wschmidt at gcc dot gnu.org --- Comment #4 from William J. Schmidt 2013-02-14 14:42:10 UTC --- I'll have a look shortly. I see it marked as a 4.8 regression even though this work was done in 4.7. Does the test pass on 4.7?
[Bug tree-optimization/56321] [4.8 Regression] ICE:segfault in midend for -funsafe-math-optimizations -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56321 --- Comment #5 from William J. Schmidt 2013-02-14 14:43:29 UTC --- Actually I might be wrong about that, now that I think about it -- probably this was done in 4.8. It seems longer ago than that. ;)
[Bug tree-optimization/56321] [4.8 Regression] ICE:segfault in midend for -funsafe-math-optimizations -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56321 --- Comment #6 from William J. Schmidt 2013-02-14 20:11:32 UTC --- Odd. Reassociation makes a correct and profitable transformation into foo (int n) { double _2; double _5; double _6; double _7; double _8; float _9; : _2 = (double) n_1(D); _6 = _2 * 6.6662965923251249478198587894439697265625e-1; _7 = _6 + 2.0e+0; _5 = _7 * _2; _8 = _5; _9 = (float) _8; return _9; } but somehow verify_ssa() thinks the last statement (return _9;) contains a use of the undefined SSA name _4. Will continue to investigate later.
[Bug tree-optimization/56321] [4.8 Regression] ICE:segfault in midend for -funsafe-math-optimizations -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56321 William J. Schmidt changed: What|Removed |Added Status|NEW |ASSIGNED AssignedTo|unassigned at gcc dot |wschmidt at gcc dot gnu.org |gnu.org | --- Comment #7 from William J. Schmidt 2013-02-14 22:27:21 UTC --- I see. The problem is a memory VUSE on the return statement that no longer has a def. The VDEF was associated with the __builtin_pow statement that was expanded. Looks like I need to release the SSA name. Working on a fix.
[Bug tree-optimization/56321] [4.8 Regression] ICE:segfault in midend for -funsafe-math-optimizations -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56321 --- Comment #10 from William J. Schmidt 2013-02-15 15:13:55 UTC --- (In reply to comment #8) > (In reply to comment #7) > > I see. The problem is a memory VUSE on the return statement that no longer > > has > > a def. The VDEF was associated with the __builtin_pow statement that was > > expanded. Looks like I need to release the SSA name. Working on a fix. > > Use unlink_stmt_vdef when removing a stmt that possibly has a VDEF. Yes -- I've been trying that, but something more subtle seems to be going on. I think perhaps the statement isn't being removed but modified in place. I've been trying to unlink the VDEF when the call is known to go away later, and that's not doing it either. Going to have to get dirty with the debugger this morning.
[Bug tree-optimization/56321] [4.8 Regression] ICE:segfault in midend for -funsafe-math-optimizations -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56321 --- Comment #11 from William J. Schmidt 2013-02-15 15:49:03 UTC --- OK, got it. I was on the right track, there were just several locations where it could happen and I missed one.
[Bug fortran/48636] Enable more inlining with -O2 and higher
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48636 William J. Schmidt changed: What|Removed |Added CC||wschmidt at gcc dot gnu.org --- Comment #43 from William J. Schmidt 2013-03-01 17:48:51 UTC --- (In reply to comment #38) > Looks like for x86 r193331 led to significant regression on 172.mgrid for -m32 > -O3 -funroll-loops The same degradation was seen on powerpc64-unknown-linux-gnu with r193331. The fix by Martin Jambor for PR55334 did not help for -m32. It did give a slight bump to -m64, but did not return the performance to pre-r193331 levels. So there still seems to be a problem with 172.mgrid related to this change.
[Bug fortran/48636] Enable more inlining with -O2 and higher
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48636 William J. Schmidt changed: What|Removed |Added CC||bergner at gcc dot gnu.org --- Comment #44 from William J. Schmidt 2013-03-04 17:53:17 UTC --- Compiling mgrid.f on powerpc64-unknown-linux-gnu as follows: $ gfortran -S -m32 -O3 -mcpu=power7 -fpeel-loops -funroll-loops -ffast-math -fvect-cost-model mgrid.f I examined the assembly generated for revisions 193330, 193331 (this issue), and 196171 (PR55334). What I'm seeing is that for both 193331 and 196171, the inliner is much more aggressive, and in particular is inlining several copies of some pretty large functions. For -m32, I am not seeing any specialization of resid_, so although the change in 196171 helped a little, it appears that this was by reducing overall code size. There weren't any changes in inlining decisions. Of course there is a lot of distance between 193331 and 196171, so it is not a perfect comparison, though it appears 196171 is where -m32 received a slight boost. Anyway, the non-inlined call tree for 193330 is: main MAIN__ resid_ (x4) comm3_ psinv_ (x3) comm3_ norm2u3_ (x2) interp_ (x2) setup_ rprj3_ (x4) zran3_ The non-inlined call tree for 193331 is: main MAIN__ comm3_ (x5) resid_ comm3_ norm2u3_ (x2) setup_ zran3_ So with 193331 we have the following additional inlines: 3 inlines of resid_, size = 1068, total size = 3204 3 inlines of psinv_, size = 1046, total size = 3138 2 inlines of interp_, size = 1544, total size = 3088 4 inlines of rprj3_, size = 220, total size = 880 Here "size" is the number of lines of assembly code of the called procedure, including labels, so it's just a rough measure. The number of static call sites of comm3_ was also reduced by one, but I don't know whether it was inlined or specialized away. These are pretty large procedures to be duplicating, particularly to be duplicating more than once. 
Looking at resid_, it already generates spill code on its own, so putting 3 copies of this in its caller isn't likely to be very helpful. Of these, I think only rprj3_ looks like a reasonable inline candidate. Total lines of the assembly files are: 8660 r193330/mgrid.s 16398 r193331/mgrid.s 14592 r196171/mgrid.s Inlining creates unreachable code, so removing the unreachable procedures gives: 7765 r193330/mgrid.s 12591 r193331/mgrid.s 10795 r196171/mgrid.s With r196171 the reachable code is still about 40% larger than r193330 (where some reasonable inlining was already being done). This is better than the 60% bloat with r193331 but still seems too high. Again, these are rough measures but I think they are indicative. Without knowing anything about the inliner, I think the inlining heuristics probably need to take more account of code size than they seem to do at the moment, particularly when making more than one copy of a procedure and thus reducing spatial locality.
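The size figures in the comment above can be rechecked with a few lines of C. This is just a sketch of the arithmetic; the helper names inlined_total and growth_percent are hypothetical, and the inputs are the line counts quoted in the comment.

```c
#include <assert.h>

/* Lines duplicated when a procedure of `size` lines is inlined `copies` times. */
static long inlined_total(long copies, long size)
{
    return copies * size;
}

/* Growth of `size` over `base` in percent, rounded to the nearest integer.
   Assumes size >= base, which holds for the figures quoted above. */
static int growth_percent(long base, long size)
{
    assert(base > 0 && size >= base);
    return (int) (((size - base) * 100 + base / 2) / base);
}
```

With the reachable-code counts from the comment, growth_percent (7765, 12591) gives 62 and growth_percent (7765, 10795) gives 39, matching the "60%" and "about 40%" estimates.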
[Bug rtl-optimization/56605] New: Redundant branch introduced during loop2 phases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56605 Bug #: 56605 Summary: Redundant branch introduced during loop2 phases Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: wschm...@gcc.gnu.org CC: berg...@vnet.ibm.com Host: powerpc64-unknown-linux-gnu Target: powerpc64-unknown-linux-gnu Build: powerpc64-unknown-linux-gnu I've recently noticed that GCC commonly generates redundant branches prior to vectorized loops, such as: cmpwi 7,12,0 ... beq- 7,.L22 beq- 7,.L22 .p2align 4,,15 .L8: ... loop body ... The branches first appear in the 186r.loop2_doloop debug dump: (note 260 254 258 23 [bb 23] NOTE_INSN_BASIC_BLOCK) (insn 258 260 259 23 (set (reg:CC 330) (compare:CC (subreg:SI (reg:DI 153 [ bnd.10+-4 ]) 4) (const_int 0 [0]))) -1 (nil)) (jump_insn 259 258 263 23 (set (pc) (if_then_else (eq (reg:CC 330) (const_int 0 [0])) (label_ref 257) (pc))) -1 (expr_list:REG_BR_PROB (const_int 0 [0]) (nil)) -> 257) (note 263 259 261 24 [bb 24] NOTE_INSN_BASIC_BLOCK) (insn 261 263 262 24 (set (reg:CC 331) (compare:CC (subreg:SI (reg:DI 153 [ bnd.10+-4 ]) 4) (const_int 0 [0]))) -1 (nil)) (jump_insn 262 261 257 24 (set (pc) (if_then_else (eq (reg:CC 331) (const_int 0 [0])) (label_ref 257) (pc))) -1 (expr_list:REG_BR_PROB (const_int 0 [0]) (nil)) -> 257) Subsequently GCC removes the redundant compare, but does not remove the redundant branch. Simple test case to reproduce: #define N 4000 void foo(short* __restrict sb, int* __restrict ia) { int i; for (i = 0; i < N; i++) ia[i] = (int) sb[i]; } $GCC_INSTALL/bin/gcc -S -O3 -mvsx example.c (-mvsx is necessary to vectorize the loop. It may also be necessary to add -mcpu=power7.)
[Bug middle-end/35308] Straight line strength reduction
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35308 William J. Schmidt changed: What|Removed |Added Target Milestone|4.8.1 |4.9.0 --- Comment #5 from William J. Schmidt 2013-03-25 13:17:39 UTC --- The unknown stride features made it into 4.8.0, but conditional candidates are still pending, and too complex for 4.8.1, I think. Changing target to 4.9 for the remaining work. I do plan to get that wrapped up fairly soon.
[Bug target/56843] New: PowerPC Newton-Raphson reciprocal estimates can be improved
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56843 Bug #: 56843 Summary: PowerPC Newton-Raphson reciprocal estimates can be improved Classification: Unclassified Product: gcc Version: 4.9.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: target AssignedTo: wschm...@gcc.gnu.org ReportedBy: wschm...@gcc.gnu.org Host: powerpc64-unknown-linux-gnu Target: powerpc64-unknown-linux-gnu Build: powerpc64-unknown-linux-gnu It was recently brought to my attention that the number of Newton-Raphson iterations for floating reciprocal-estimate and floating reciprocal-sqrt-estimate can be tightened. In particular, for 32-bit floating-point values targeting processors having higher precision estimates, a single iteration should suffice to produce maximum representable precision. We currently perform two. We should verify that one is actually sufficient in practice. We should also investigate whether 3 iterations are sufficient for 64-bit floating-point values when targeting processors having lower precision estimates. The theoretical math suggests 4 may be necessary, but this could be too conservative in practice as this is derived from a general bound on the method.
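The iteration-count argument rests on the fact that each Newton-Raphson step for 1/a, x' = x * (2 - a*x), squares the relative error: if x = (1/a)(1 - e), then x' = (1/a)(1 - e**2). The C sketch below illustrates that behaviour in double arithmetic. It is not the rs6000 expansion, the function names are hypothetical, and the starting precisions used with it (2**-14 and 2**-8) are assumed values for illustration, not the documented accuracy of any particular processor's estimate instruction.

```c
#include <assert.h>

/* One Newton-Raphson refinement step for the reciprocal of a. */
static double nr_recip_step(double a, double x)
{
    return x * (2.0 - a * x);
}

/* Refine the starting estimate x0 of 1/a with `steps` iterations. */
static double nr_recip(double a, double x0, int steps)
{
    assert(a != 0.0 && steps >= 0);
    double x = x0;
    for (int i = 0; i < steps; i++)
        x = nr_recip_step(a, x);
    return x;
}

/* Relative error of x as an estimate of 1/a, computed as |a*x - 1|. */
static double rel_err(double a, double x)
{
    double e = a * x - 1.0;
    return e < 0.0 ? -e : e;
}
```

Under these assumptions, an estimate good to 2**-14 reaches roughly 2**-28 after one step, which is past the 24-bit significand of a 32-bit float. An estimate good to only 2**-8 needs three steps (2**-16, 2**-32, 2**-64) to get past the 53-bit significand of a double. A coarser starting estimate, or rounding error in the steps themselves, would push that count up.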
[Bug target/56843] PowerPC Newton-Raphson reciprocal estimates can be improved
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56843 --- Comment #2 from Bill Schmidt 2013-04-04 16:12:31 UTC --- Regarding the last point, I found this in the user manual: "The double-precision square root estimate instructions are not generated by default on low-precision machines, since they do not provide an estimate that converges after three steps." That seems to indicate someone decided the libcall is better than a four-step iteration. That doesn't necessarily seem obvious to me.
[Bug target/56843] PowerPC Newton-Raphson reciprocal estimates can be improved
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56843 --- Comment #3 from Bill Schmidt 2013-04-05 15:03:26 UTC --- Looks like we can improve performance for three cases on P6 and later machines: - 32-bit reciprocal square root: remove two instructions - 32-bit reciprocal: remove three instructions - 64-bit reciprocal: remove one instruction The last is due to a subtle bug in the existing implementation.
[Bug target/56843] PowerPC Newton-Raphson reciprocal estimates can be improved
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56843 Bill Schmidt changed: What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED --- Comment #4 from Bill Schmidt 2013-04-05 19:29:44 UTC --- Fixed in r197534.
[Bug tree-optimization/56933] New: [4.9 Regression] Vectorizer missing read-write dependency for interleaved accesses
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56933 Bug #: 56933 Summary: [4.9 Regression] Vectorizer missing read-write dependency for interleaved accesses Classification: Unclassified Product: gcc Version: 4.9.0 Status: UNCONFIRMED Keywords: wrong-code Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: wschm...@gcc.gnu.org CC: berg...@gcc.gnu.org, d...@gcc.gnu.org, rgue...@gcc.gnu.org Host: powerpc*-*-* Target: powerpc*-*-* Build: powerpc*-*-* Created attachment 29861 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29861 Vectorization details dump for the test case vect_analyze_group_access() in tree-vect-data-refs.c contains a test for load-store dependencies: if (GROUP_READ_WRITE_DEPENDENCE (vinfo_for_stmt (next)) || GROUP_READ_WRITE_DEPENDENCE (vinfo_for_stmt (prev))) Currently this always returns false because this field has not yet been set in the vinfo. This began with r196872, where the code to analyze accesses was moved ahead of the code to analyze dependences. I put together a test demonstrating that it's possible for us to generate incorrect code as a result: subroutine test(a,b,c,d,e,f) integer k real*4, intent(out) :: a(1000) real*4, intent(out) :: b(1000) real*4, intent(in) :: c(1000) real*4, intent(inout) :: d(2000) real*4, intent(out) :: e(1000) real*4, intent(out) :: f(1000) do k = 1,1000 a(k) = 3.0 * d(2*k) e(k) = 3.3 * d(2*k+1) d(2*k) = 2.0 * c(k) d(2*k+1) = 2.3 * c(k) b(k) = d(2*k) - 5.5; f(k) = d(2*k+1) + 5.5; enddo return end I'm attaching a detailed dump of the vectorization pass that shows that the values of d(2*k) and d(2*k+1) used to compute b(k) and f(k) are the ones loaded prior to the stores to those locations. To reproduce on powerpc64-unknown-linux-gnu: $ gfortran -O3 -ffast-math -mcpu=power7 -fno-vect-cost-model interl-lsl-2.f
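For reference, the semantics the vectorized code must preserve can be written as a scalar C loop. This is an adaptation of the Fortran test, not a translation: it uses 0-based indices so that d[2*k+1] stays inside an array of 2*N elements, and the names test_ref, check_ref and near_eq are hypothetical. The point is that b and f must be computed from the values just stored to d, not from the values loaded at the top of the iteration.

```c
#include <assert.h>

#define N 1000

/* Scalar reference: the order of the loads and stores on d matters. */
static void test_ref(float *a, float *b, const float *c, float *d,
                     float *e, float *f)
{
    for (int k = 0; k < N; k++) {
        a[k] = 3.0f * d[2 * k];        /* reads the old d values */
        e[k] = 3.3f * d[2 * k + 1];
        d[2 * k] = 2.0f * c[k];        /* stores to d ... */
        d[2 * k + 1] = 2.3f * c[k];
        b[k] = d[2 * k] - 5.5f;        /* ... must be seen by these loads */
        f[k] = d[2 * k + 1] + 5.5f;
    }
}

/* Tolerant comparison, so fused multiply-add contraction does not matter. */
static int near_eq(float x, float y)
{
    float diff = x - y;
    if (diff < 0.0f)
        diff = -diff;
    return diff <= 0.01f;
}

/* Returns 1 if b and f were computed from the freshly stored d values. */
static int check_ref(const float *b, const float *c, const float *f)
{
    for (int k = 0; k < N; k++) {
        if (!near_eq(b[k], 2.0f * c[k] - 5.5f))
            return 0;
        if (!near_eq(f[k], 2.3f * c[k] + 5.5f))
            return 0;
    }
    return 1;
}
```

A miscompiled loop of the kind described in the report would compute b and f from the initial contents of d, so check_ref would return 0 whenever the initial d values differ noticeably from the stored ones.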
[Bug tree-optimization/56962] [4.8/4.9 Regression] SLSR caused miscompilation of fftw
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56962 --- Comment #2 from Bill Schmidt 2013-04-15 13:19:53 UTC --- The fix looks correct to me. Thanks!
[Bug rtl-optimization/56605] Redundant branch introduced during loop2 phases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56605 Bill Schmidt changed: What|Removed |Added AssignedTo|unassigned at gcc dot |wschmidt at gcc dot gnu.org |gnu.org | Target Milestone|--- |4.8.1 --- Comment #4 from Bill Schmidt 2013-04-15 18:32:53 UTC --- This was fixed in trunk on 2013-03-20. Now that it's burned in for a few weeks, I'll plan to fix it in 4.8.1 shortly, provided there are no objections.
[Bug rtl-optimization/56605] Redundant branch introduced during loop2 phases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56605 Bill Schmidt changed: What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED --- Comment #5 from Bill Schmidt 2013-04-23 19:49:08 UTC --- Fixed.
[Bug target/56864] [4.9 regression] FAIL: gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c scan-tree-dump-times vect "vectorized 1 loops" 0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56864 Bill Schmidt changed: What|Removed |Added CC||wschmidt at gcc dot gnu.org --- Comment #6 from Bill Schmidt 2013-05-01 17:49:00 UTC --- I can't confirm this today, either. The test passes with r198500. Andreas, do you still see a problem with the current trunk?
[Bug target/56864] [4.9 regression] FAIL: gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c scan-tree-dump-times vect "vectorized 1 loops" 0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56864 --- Comment #8 from Bill Schmidt 2013-05-01 20:13:35 UTC --- If possible, please check whether this began failing with r196872. That commit looks suspicious for at least one other test. I'm stabbing in the dark since I can't reproduce this one.
[Bug target/56865] [4.9 regression] FAIL: gcc.dg/vect/vect-42.c scan-tree-dump-times vect "Vectorizing an unaligned access" 4
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56865 Bill Schmidt changed: What|Removed |Added CC||rguenth at gcc dot gnu.org, ||wschmidt at gcc dot gnu.org --- Comment #2 from Bill Schmidt 2013-05-01 21:58:09 UTC --- I've reproduced this as well. Additionally, gcc.dg/vect/vect-96.c fails similarly. Both tests began failing at r196872: 2013-03-21 Richard Biener * tree-vect-data-refs.c (vect_update_interleaving_chain): Remove. (vect_insert_into_interleaving_chain): Likewise. (vect_drs_dependent_in_basic_block): Inline ... (vect_slp_analyze_data_ref_dependence): ... here. New function, split out from ... (vect_analyze_data_ref_dependence): ... here. Simplify. (vect_check_interleaving): Simplify. (vect_analyze_data_ref_dependences): Likewise. Split out ... (vect_slp_analyze_data_ref_dependences): ... this new function. (dr_group_sort_cmp): New function. (vect_analyze_data_ref_accesses): Compute data-reference groups here instead of in vect_analyze_data_ref_dependence. Use a more efficient algorithm. * tree-vect-slp.c (vect_slp_analyze_bb_1): Use vect_slp_analyze_data_ref_dependences. Call vect_analyze_data_ref_accesses earlier. * tree-vect-loop.c (vect_analyze_loop_2): Likewise. * tree-vectorizer.h (vect_analyze_data_ref_dependences): Adjust. (vect_slp_analyze_data_ref_dependences): New prototype. Richi, I think this commit was not intended to have any functional effect -- is that correct?
[Bug target/56865] [4.9 regression] FAIL: gcc.dg/vect/vect-42.c scan-tree-dump-times vect "Vectorizing an unaligned access" 4
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56865 --- Comment #4 from Bill Schmidt 2013-05-02 15:27:08 UTC --- (In reply to comment #3) > > Correct. Dumping order is affected by the patch though, thus if > we previously disabled vectorization at some point the dumping > before that can be affected due to the re-ordering. It appears that we are vectorizing the same loops, but we are now vectorizing one loop differently. In r196871, the loop is peeled for alignment. In r196872, the loop is versioned for alignment. I will attach the vectorization detail dumps for the two revisions.