[Bug target/87560] ICE in curr_insn_transform, at lra-constraints.c:3892

2020-03-04 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87560

--- Comment #9 from Bill Schmidt  ---
I plan to backport the fix to releases/gcc-9 after 9.3 releases.

[Bug target/91638] powerpc -mlong-double-NN (documentation) issues

2020-03-09 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91638

Bill Schmidt  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED
 CC||wschmidt at gcc dot gnu.org

--- Comment #8 from Bill Schmidt  ---
Work is complete.

[Bug target/93709] [10 regression] fortran.dg/minlocval_4.f90 fails on power 9 after r10-4161

2020-03-11 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93709

Bill Schmidt  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-03-11
 Ever confirmed|0   |1

--- Comment #7 from Bill Schmidt  ---
Confirmed, since fixed on trunk.  Do we want any backports?

[Bug testsuite/94019] [9 regression] gcc.dg/vect/vect-over-widen-17.c fails starting with g:370c2ebe8fa20e0812cd2d533d4ed38ee2d37c85, r9-1590

2020-03-11 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94019

--- Comment #3 from Bill Schmidt  ---
Looks like this could be closed, Kewen?

[Bug testsuite/94019] [9 regression] gcc.dg/vect/vect-over-widen-17.c fails starting with g:370c2ebe8fa20e0812cd2d533d4ed38ee2d37c85, r9-1590

2020-03-11 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94019

--- Comment #4 from Bill Schmidt  ---
Oh sorry, we are awaiting a backport.  Never mind.

[Bug target/87560] ICE in curr_insn_transform, at lra-constraints.c:3892

2020-03-12 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87560

--- Comment #10 from Bill Schmidt  ---
rs6000: Fix -mpower9-vector -mno-altivec ICE (PR87560)

PR87560 reports an ICE when a test case is compiled with -mpower9-vector
and -mno-altivec.  This patch terminates compilation with an error when
this combination (and other unreasonable ones) are requested.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Reported error is now:

f951: Error: '-mno-altivec' turns off '-mpower9-vector'

2020-03-12  Bill Schmidt  

Backport from master
2020-03-02  Bill Schmidt  

PR target/87560
* rs6000-cpus.def (OTHER_ALTIVEC_MASKS): New #define.
* rs6000.c (rs6000_disable_incompatible_switches): Add table entry
for OPTION_MASK_ALTIVEC.
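
The table-driven check described in the ChangeLog can be sketched roughly as follows. This is only an illustration of the idea, not the rs6000 code: the mask values, the DependencyEntry type, and the incompatible_switches function are hypothetical names invented for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical option bits; the real OPTION_MASK_* values live in GCC.
constexpr std::uint64_t MASK_ALTIVEC = std::uint64_t{1} << 0;
constexpr std::uint64_t MASK_VSX = std::uint64_t{1} << 1;
constexpr std::uint64_t MASK_POWER9_VECTOR = std::uint64_t{1} << 2;

// One row per base option: if the base is explicitly disabled, none of
// the dependent options may be explicitly enabled.
struct DependencyEntry {
  std::uint64_t base;        // option the user turned off
  std::uint64_t dependents;  // options that need the base option
  const char *name;          // spelling used in the diagnostic
};

static const DependencyEntry kDependencies[] = {
  { MASK_ALTIVEC, MASK_VSX | MASK_POWER9_VECTOR, "altivec" },
  { MASK_VSX, MASK_POWER9_VECTOR, "vsx" },
};

// Return the "-mno-X" switches that conflict with explicitly enabled options.
std::vector<std::string> incompatible_switches(std::uint64_t enabled,
                                               std::uint64_t disabled) {
  // An option cannot be both explicitly enabled and explicitly disabled.
  assert((enabled & disabled) == 0);
  std::vector<std::string> conflicts;
  for (const DependencyEntry &entry : kDependencies) {
    if ((disabled & entry.base) != 0 && (enabled & entry.dependents) != 0)
      conflicts.push_back(std::string("-mno-") + entry.name);
  }
  return conflicts;
}
```

With this sketch, requesting MASK_POWER9_VECTOR while disabling MASK_ALTIVEC reports "-mno-altivec", which mirrors the shape of the error message quoted above.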

[Bug target/87560] ICE in curr_insn_transform, at lra-constraints.c:3892

2020-03-12 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87560

Bill Schmidt  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #11 from Bill Schmidt  ---
Backport complete, closing.

[Bug target/90000] Compile-time hog w/ impossible asm constraints on powerpc

2020-04-02 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9

Bill Schmidt  changed:

   What|Removed |Added

   Last reconfirmed||2020-04-02
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #3 from Bill Schmidt  ---
So, confirmed...

[Bug target/91804] [10 regression] r265398 breaks gcc.target/powerpc/vec-rlmi-rlnm.c

2020-04-08 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91804

Bill Schmidt  changed:

   What|Removed |Added

   Priority|P2  |P4

--- Comment #3 from Bill Schmidt  ---
Moving to P4.  This is not important, and we will fix this in GCC 11.

[Bug libstdc++/91153] New test case 29_atomics/atomic_float/1.cc execution test fails

2020-04-15 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91153

Bill Schmidt  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #1 from Bill Schmidt  ---
Looks like this bug hasn't been classified (still P3).  Curious whether it's on
the list for P10 or deferred.

[Bug libstdc++/91153] New test case 29_atomics/atomic_float/1.cc execution test fails

2020-04-15 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91153

--- Comment #4 from Bill Schmidt  ---
Perfect, thanks!  I'll take it off my concern list...

[Bug target/94707] [8/9/10 Regression] class with empty base passed incorrectly with -std=c++17 on powerpc64le

2020-04-22 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94707

--- Comment #6 from Bill Schmidt  ---
The ELFv2 ABI has a prominent note specifying:

"Floating-point and vector aggregates that contain padding words and integer
fields with a width of 0 should not be treated as homogeneous aggregates."

[Bug target/94707] [8/9/10 Regression] class with empty base passed incorrectly with -std=c++17 on powerpc64le

2020-04-22 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94707

--- Comment #7 from Bill Schmidt  ---
ELF V1 does not have a concept of homogeneous aggregates.

[Bug target/94707] [8/9/10 Regression] class with empty base passed incorrectly with -std=c++17 on powerpc64le

2020-04-22 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94707

--- Comment #8 from Bill Schmidt  ---
Thus the compiler is acting as expected in both cases, so far as I can see.  If
C++17 has added new hidden fields, that seems to have introduced an
incompatibility between C++17 and C++14 targeted code for the ELFv2 ABI.
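
For readers following along, the kind of type under discussion probably looks something like the sketch below. The type names are hypothetical and this is not the reproducer from the report; it only shows two classes with identical floating-point members, one of which has an empty base.

```cpp
#include <cassert>
#include <type_traits>

struct Empty {};

// Two float members and nothing else: a candidate homogeneous aggregate.
struct Plain {
  float x;
  float y;
};

// Same members, but with an empty base class.  Whether the base shows up
// as an extra field in the compiler's view of the type is the question
// raised in this report.
struct Derived : Empty {
  float x;
  float y;
};

float sum(Plain p) { return p.x + p.y; }
float sum(Derived d) { return d.x + d.y; }

// Sanity check: the empty base is expected to add no storage on common ABIs.
void check_layout() {
  assert(std::is_empty<Empty>::value);
  assert(sizeof(Derived) == sizeof(Plain));
}
```

Both types occupy the same storage, so any difference in how they are passed comes from how the parameter-passing rules classify the empty base, not from the data itself.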

[Bug target/94954] New: Wrong code generation for vec_pack_to_short_fp32 builtin for Power

2020-05-05 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94954

Bug ID: 94954
   Summary: Wrong code generation for vec_pack_to_short_fp32 builtin for Power
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wschmidt at gcc dot gnu.org
  Target Milestone: ---

This builtin was mis-implemented.  It is supposed to pack 32-bit floating-point
values into 16-bit floating-point form.  Instead, the values are converted to
unsigned 16-bit integer form.  This should be fixed in all supported releases.
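
The difference between the two conversions can be seen in a small portable sketch. It does not use the builtin at all; to_half_bits and to_uint16 are hypothetical helpers, float is assumed to be IEEE 754 binary32, and the half conversion simply truncates the mantissa and only handles values whose 16-bit result is a normal number.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Repack a binary32 value as IEEE binary16 bits (truncating the mantissa).
// Only valid when the result is a normal binary16 number.
std::uint16_t to_half_bits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  const std::uint32_t sign = (bits >> 16) & 0x8000u;
  // Rebias the exponent from binary32 (bias 127) to binary16 (bias 15).
  const int exponent = static_cast<int>((bits >> 23) & 0xFFu) - 127 + 15;
  const std::uint32_t mantissa = (bits >> 13) & 0x3FFu;
  assert(exponent >= 1 && exponent <= 30);
  return static_cast<std::uint16_t>(
      sign | (static_cast<std::uint32_t>(exponent) << 10) | mantissa);
}

// Convert the numeric value to an unsigned 16-bit integer instead.
std::uint16_t to_uint16(float f) {
  assert(f >= 0.0f && f < 65536.0f);
  return static_cast<std::uint16_t>(f);
}
```

For 1.0f the first helper yields the bit pattern 0x3C00 while the second yields the integer 1, so the two results are not interchangeable.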

[Bug target/94954] Wrong code generation for vec_pack_to_short_fp32 builtin for Power

2020-05-05 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94954

Bill Schmidt  changed:

   What|Removed |Added

   Target Milestone|--- |11.0
   Keywords||wrong-code
 CC||segher at gcc dot gnu.org
 Ever confirmed|0   |1
 Target||powerpc*-*-*
   Last reconfirmed||2020-05-05
 Status|UNCONFIRMED |NEW

--- Comment #1 from Bill Schmidt  ---
Confirmed.

[Bug target/95082] New: LE implementations of vec_cnttz_lsbb and vec_cntlz_lsbb are wrong

2020-05-12 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95082

Bug ID: 95082
   Summary: LE implementations of vec_cnttz_lsbb and vec_cntlz_lsbb are wrong
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wschmidt at gcc dot gnu.org
  Target Milestone: ---

For little endian, we need to swap vctzlsbb and vclzlsbb, but today we generate
the BE instruction in all cases.
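
Why the two instructions trade places can be modeled without any vector hardware. The sketch below treats a vector as 16 bytes and assumes, as a simplification, that little-endian element order is the reverse of the order the instructions count in; Bytes and the helper functions are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes = std::array<std::uint8_t, 16>;

// Count bytes with the least-significant bit set, starting at index 0.
int count_leading_lsbb(const Bytes &v) {
  std::size_t n = 0;
  while (n < v.size() && (v[n] & 1u) != 0) ++n;
  return static_cast<int>(n);
}

// Count bytes with the least-significant bit set, starting at index 15.
int count_trailing_lsbb(const Bytes &v) {
  std::size_t n = 0;
  while (n < v.size() && (v[v.size() - 1 - n] & 1u) != 0) ++n;
  return static_cast<int>(n);
}

// Model the opposite element order by reversing the bytes.
Bytes reversed(const Bytes &v) {
  Bytes r{};
  for (std::size_t i = 0; i < v.size(); ++i) r[i] = v[v.size() - 1 - i];
  return r;
}

// Counting from the front of the reversed vector equals counting from the
// back of the original, and vice versa.
void check_swap(const Bytes &v) {
  assert(count_leading_lsbb(reversed(v)) == count_trailing_lsbb(v));
  assert(count_trailing_lsbb(reversed(v)) == count_leading_lsbb(v));
}
```

In this model, a "leading" count in one element order is a "trailing" count in the other, which is why the same instruction cannot serve both endiannesses.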

[Bug target/95082] LE implementations of vec_cnttz_lsbb and vec_cntlz_lsbb are wrong

2020-05-12 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95082

Bill Schmidt  changed:

   What|Removed |Added

   Keywords||wrong-code
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-05-12
 Ever confirmed|0   |1
   Target Milestone|--- |11.0
 CC||segher at gcc dot gnu.org

--- Comment #1 from Bill Schmidt  ---
Confirmed.

[Bug fortran/95053] [11 regression] ICE in f951: gfc_divide()

2020-05-14 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95053

Bill Schmidt  changed:

   What|Removed |Added

 CC||wschmidt at gcc dot gnu.org

--- Comment #22 from Bill Schmidt  ---
Breaking legitimate code, even if "borderline," does not seem right to me.  
Zero division is generally a runtime exception because of such cases.

You write code for a general case, then later you discover "oh, well, we could
make this variable zero for our specific usage," and now the compiler throws a
fit?  Seems like this is warning-level stuff.

[Bug fortran/95053] [11 regression] ICE in f951: gfc_divide()

2020-05-14 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95053

--- Comment #25 from Bill Schmidt  ---
But I'm not going to worry about it further.

[Bug target/70053] Returning a struct of _Decimal128 values generates extraneous stores and loads

2020-05-22 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70053

Bill Schmidt  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-05-22

--- Comment #5 from Bill Schmidt  ---
I'd like to understand why the difference between -O2 and -O3 exists.  We
shouldn't generate this kind of nasty store-load at -O2.

Confirmed, BTW. :)

[Bug target/95737] PPC: Unnecessary extsw after negative less than

2020-06-19 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737

--- Comment #1 from Bill Schmidt  ---
Please test this out of context of a return statement.  The problem with
unnecessary extends of return values is widely known and not specific to this
particular case.
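
One way to do that is to compare a version that returns the value with one that stores it. The functions below are a guess at the shape of the test, not the reporter's code: "negative less than" is assumed to mean negating the result of a comparison.

```cpp
#include <cassert>

// Value is returned: the ABI rules for extending return values apply.
int neg_less_returned(long a, long b) {
  return -(a < b);
}

// Value is stored to memory: no return-value extension is involved, so any
// extend seen in the generated code would come from elsewhere.
void neg_less_stored(long a, long b, int *out) {
  assert(out != nullptr);
  *out = -(a < b);
}
```

If the extra extend appears only in the first function, the behavior is most likely the known return-value issue rather than something specific to the comparison.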

[Bug target/95737] PPC: Unnecessary extsw after negative less than

2020-06-19 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737

Bill Schmidt  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Bill Schmidt  ---
If you can show this is different from 65010 (not a return value issue), please
reopen.

*** This bug has been marked as a duplicate of bug 65010 ***

[Bug target/65010] ppc backend generates unnecessary signed extension

2020-06-19 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65010

Bill Schmidt  changed:

   What|Removed |Added

 CC||jens.seifert at de dot ibm.com

--- Comment #10 from Bill Schmidt  ---
*** Bug 95737 has been marked as a duplicate of this bug. ***

[Bug target/95952] [8 Regression] gcc-8 bootstrap failure on powerpc64-linux

2020-06-29 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95952

Bill Schmidt  changed:

   What|Removed |Added

 CC||willschm at gcc dot gnu.org

--- Comment #2 from Bill Schmidt  ---
Ah, this is Will, not me.  Will, can you please look into this ASAP?

[Bug target/96017] Powerpc suboptimal register spill in likely path

2020-07-01 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96017

Bill Schmidt  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org, wschmidt at gcc dot gnu.org
   Target Milestone|--- |9.4
  Build|gcc version 9.2.1 20190909 (Debian 9.2.1-8)|
   Keywords||missed-optimization

--- Comment #1 from Bill Schmidt  ---
Built with gcc version 9.2.1 20190909 (Debian 9.2.1-8) (moved from Build
field).

[Bug target/96017] Powerpc suboptimal register spill in likely path

2020-07-01 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96017

Bill Schmidt  changed:

   What|Removed |Added

   Target Milestone|9.4 |11.0

[Bug target/96017] Powerpc suboptimal register spill in likely path

2020-07-01 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96017

--- Comment #2 from Bill Schmidt  ---
Nick reports same behavior at -O3.

[Bug target/96139] Vector element extract mistypes long long int down to long int

2020-07-09 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96139

--- Comment #2 from Bill Schmidt  ---
Have you tried it for -m32, out of curiosity?

[Bug target/96787] rs6000 mcpu=power10 miscompiles libiberty htab_delete() causing bootstrap failure

2020-08-25 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96787

--- Comment #5 from Bill Schmidt  ---
The divergence occurs after .L75 in the two versions.  In the P10 version, we
see that the second bctrl has been converted into a bctr.  It looks like a tail
call optimization happening, but we aren't at the end of the function.  This
happens again later for the second bctrl after .L78.

Why would we think a tail call optimization can happen in the middle of a
block...

[Bug target/96787] rs6000 mcpu=power10 miscompiles libiberty htab_delete() causing bootstrap failure

2020-08-25 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96787

Bill Schmidt  changed:

   What|Removed |Added

 CC||amodra at gcc dot gnu.org

--- Comment #6 from Bill Schmidt  ---
Ah, I can't read.  These really are at the end(s) of the function, so it does
appear to be a tail call optimization that we've decided is legitimate.

It looks like the problem is that we don't set up r12 prior to the tail call. 
Unlike the usual case, here we have "mtctr 9" to set up the CTR and r12 still
points to the previously called function.  That can't be good.

This is exposed by my recent patch to allow more tail calls in
rs6000_decl_ok_for_sibcall, but it's not clear to me where we need to fix
things up so that r12 gets set.
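
The source pattern involved is roughly the one sketched below: a function whose last action is a call through a function pointer, preceded by a call through a different pointer. This is a simplified stand-in, not the libiberty code; Table, table_delete, and the logging helpers are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a table that carries its own deallocators.
struct Table {
  void *entries;
  void (*release_entries)(void *);
  void (*release_table)(void *);
};

// Demo-only log so the order of the indirect calls can be observed.
static std::vector<int> g_call_log;
void log_entries_release(void *) { g_call_log.push_back(1); }
void log_table_release(void *) { g_call_log.push_back(2); }

void table_delete(Table *table) {
  assert(table != nullptr);
  // Copy everything first; the table may be gone after the final call.
  void *entries = table->entries;
  void (*release_entries)(void *) = table->release_entries;
  void (*release_table)(void *) = table->release_table;
  release_entries(entries);  // ordinary indirect call; control returns here
  release_table(table);      // nothing follows: a sibling-call candidate
}
```

The second call is the one a compiler may turn into a branch without link. Because it targets a different function than the first call, any register still holding the first callee's address is stale at that point.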

[Bug target/96787] rs6000 mcpu=power10 miscompiles libiberty htab_delete() causing bootstrap failure

2020-08-25 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96787

--- Comment #7 from Bill Schmidt  ---
I believe the problem may be that rs6000_sibcall_aix doesn't contain any
handling for indirect calls, whereas similar code for other ABIs, like
rs6000_sibcall_sysv, does.  Alan, does this make sense?

[Bug target/96787] rs6000 mcpu=power10 miscompiles libiberty htab_delete() causing bootstrap failure

2020-08-26 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96787

--- Comment #8 from Bill Schmidt  ---
I'm working on a patch.

[Bug target/96787] rs6000 mcpu=power10 miscompiles libiberty htab_delete() causing bootstrap failure

2020-08-27 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96787

Bill Schmidt  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Bill Schmidt  ---
Work is complete.

[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412

2020-08-27 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791

Bill Schmidt  changed:

   What|Removed |Added

 CC||luoxhu at gcc dot gnu.org

--- Comment #2 from Bill Schmidt  ---
CCing Xiong Hu, as I suspect this is related to the recent work for partially
dead stores.

[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412

2020-08-27 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791

--- Comment #4 from Bill Schmidt  ---
Not the partially dead store code after all -- just a coincidence!

[Bug target/92132] new test case gcc.dg/vect/vect-cond-reduc-4.c fails with its introduction in r277067

2019-10-17 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92132

--- Comment #2 from Bill Schmidt  ---
Yes, odd that the comparison is flagged as not vectorizable.

[Bug testsuite/92093] New test case gcc.target/powerpc/pr91275.c from r276410 fails on BE

2019-10-17 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92093

--- Comment #4 from Bill Schmidt  ---
Author: wschmidt
Date: Thu Oct 17 15:32:40 2019
New Revision: 277117

URL: https://gcc.gnu.org/viewcvs?rev=277117&root=gcc&view=rev
Log:
2019-10-17  Bill Schmidt  

Backport from mainline
2019-10-15  Bill Schmidt  

PR target/92093
* gcc.target/powerpc/pr91275.c: Fix type and endian issues.


Modified:
branches/gcc-9-branch/gcc/testsuite/ChangeLog
branches/gcc-9-branch/gcc/testsuite/gcc.target/powerpc/pr91275.c

[Bug testsuite/92093] New test case gcc.target/powerpc/pr91275.c from r276410 fails on BE

2019-10-17 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92093

--- Comment #5 from Bill Schmidt  ---
Author: wschmidt
Date: Thu Oct 17 15:33:58 2019
New Revision: 277118

URL: https://gcc.gnu.org/viewcvs?rev=277118&root=gcc&view=rev
Log:
2019-10-17  Bill Schmidt  

Backport from mainline
2019-10-15  Bill Schmidt  

PR target/92093
* gcc.target/powerpc/pr91275.c: Fix type and endian issues.


Modified:
branches/gcc-8-branch/gcc/testsuite/ChangeLog
branches/gcc-8-branch/gcc/testsuite/gcc.target/powerpc/pr91275.c

[Bug testsuite/92093] New test case gcc.target/powerpc/pr91275.c from r276410 fails on BE

2019-10-17 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92093

--- Comment #6 from Bill Schmidt  ---
Author: wschmidt
Date: Thu Oct 17 15:35:28 2019
New Revision: 277119

URL: https://gcc.gnu.org/viewcvs?rev=277119&root=gcc&view=rev
Log:
2019-10-17  Bill Schmidt  

Backport from mainline
2019-10-15  Bill Schmidt  

PR target/92093
* gcc.target/powerpc/pr91275.c: Fix type and endian issues.


Modified:
branches/gcc-7-branch/gcc/testsuite/ChangeLog
branches/gcc-7-branch/gcc/testsuite/gcc.target/powerpc/pr91275.c

[Bug testsuite/92093] New test case gcc.target/powerpc/pr91275.c from r276410 fails on BE

2019-10-17 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92093

Bill Schmidt  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
  Known to work||7.4.0, 8.3.0, 9.1.0
 Resolution|--- |FIXED
  Known to fail|7.4.0, 8.3.0, 9.1.0 |

--- Comment #7 from Bill Schmidt  ---
Fixed everywhere.

[Bug testsuite/92093] New test case gcc.target/powerpc/pr91275.c from r276410 fails on BE

2019-10-17 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92093

Bill Schmidt  changed:

   What|Removed |Added

 Status|RESOLVED|CLOSED

--- Comment #8 from Bill Schmidt  ---
Closing.

[Bug testsuite/92126] gcc.dg/vect/pr62171.c fails on power7

2019-10-17 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92126

--- Comment #4 from Bill Schmidt  ---
Should we close this?  I found it on an internal list of old failures on P7
that need looking at.  Not sure having this issue open provides value.

[Bug target/92287] Mismatches in the calling convention for zero sized types

2019-10-30 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92287

--- Comment #5 from Bill Schmidt  ---
For 32-bit big-endian PowerPC (using the 32-bit ELF ABI), the same code
generation is provided by GCC and Clang.  I.e., here's the code generation for
Clang with -O2 -m32 -mbig-endian, using 6.0.0-1ubuntu2:

id_foo: # @id_foo
.Lfunc_begin0:
# %bb.0:
mr 3, 4
blr

The ABI document used to be posted at power.org, which is defunct.  However,
the sources are available on GitHub:

https://github.com/ryanarn/powerabi

For the 32-bit ELF ABI, all structs (regardless of size) are passed using a
pointer allowing for call-by-value semantics.  This is the source of ZSTs
requiring a register.  So it's clear there is an ABI that requires this
behavior.  (Look for the Parameter Passing Register Selection Algorithm in
https://github.com/ryanarn/powerabi/blob/master/chap3-elf32abi.sgml.)

The 64-bit ABIs (both ELF V1 and ELF V2) pass structures in registers, and the
parameter passing algorithms won't assign registers for size-0 aggregates. 
This is intentional.
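
For reference, a function of the shape that would produce the "mr 3, 4" sequence above might look like the sketch below. It is an assumption about the test, not the code from the report: Zst is a hypothetical zero-sized type built with the GNU zero-length array extension, since standard C++ has no zero-sized class types.

```cpp
#include <cassert>

// GNU extension: a struct whose only member is a zero-length array.
// GCC and Clang accept this outside of strict pedantic modes.
struct Zst {
  int nothing[0];
};

// Returns its second argument.  If the first argument consumes a parameter
// register, x arrives in the second register and must be moved to the
// return register.
int id_foo(Zst, int x) {
  return x;
}

void demo() {
  Zst z;
  assert(id_foo(z, 7) == 7);
}
```

Under an ABI that assigns no register to the zero-sized argument, x would already be in the first parameter register and no move would be needed.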

I hope this is helpful!

Bill

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-08 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #32 from Bill Schmidt  ---
BTW, we are in close contact with the Clang folks for Power as well, so we're
going to get together with them about constraints consistency and a way forward
to ensure these problems don't recur.  I don't want anyone to get the idea that
we don't care about Clang; we care very much indeed about compatibility between
the compilers.  I've personally worked on both in the past to ensure the ABI
compatibility that you enjoy today.  We had an unexpected gotcha here, and
we'll get it resolved.

I do agree with Segher that this was a long-overdue cleanup that was causing us
a lot of misery, and the use of "ws" in the field was rather surprising to us. 
Long term we really want to get to "wa" and remove "ws", but aliases in the
meantime will be needed until supported versions of Clang and GCC have the
compatibility issue resolved.

[Bug tree-optimization/92098] [9 Regression] After r262333, the following code cannot be vectorized on powerpc64le.

2019-11-15 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92098

Bill Schmidt  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
   Last reconfirmed||2019-11-15
 Resolution|DUPLICATE   |---
   Target Milestone|10.0|9.3
Summary|[10 Regression] After r262333, the following code cannot be vectorized on powerpc64le.|[9 Regression] After r262333, the following code cannot be vectorized on powerpc64le.
 Ever confirmed|0   |1

--- Comment #2 from Bill Schmidt  ---
This was actually reported to us as a [9 regression], so until 92132 is
backported (and we need to check that the backport fixes the issue), this
should stay open, I guess.

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2019-11-15 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 92098, which changed state.

Bug 92098 Summary: [9 Regression] After r262333, the following code cannot be vectorized on powerpc64le.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92098

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|DUPLICATE   |---

[Bug testsuite/92398] [10 regression] error in update of gcc.target/powerpc/pr72804.c in r277872

2019-12-09 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92398

Bill Schmidt  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-12-09
 Ever confirmed|0   |1

--- Comment #11 from Bill Schmidt  ---
Confirmed. ;-)  Is this ready to close?

[Bug target/92923] __builtin_vec_xor() causes subregs to be used when not using V4SImode vectors

2019-12-12 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92923

Bill Schmidt  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-12-12
 CC||wschmidt at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Bill Schmidt  ---
Confirmed.  The problem is bad overloading code for vec_xor, which accepts all
vector types but translates them all into V4SI mode instead of having
individual patterns for the different modes:

In rs6000-builtin.def:

BU_ALTIVEC_2 (VXOR,   "vxor",   CONST,  xorv4si3)

There should be multiples of these for different vector modes, not just one for
all of them.  In rs6000-c.c:

  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_bool_V4SI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V4SF, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_bool_V2DI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V2DF, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DF, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
RS6000_BTI_unsigned_V2DI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
RS6000_BTI_bool_V2DI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI,
RS6000_BTI_unsigned_V2DI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_unsigned_V4SI, RS6000_BTI_bool_V4SI,
RS6000_BTI_unsigned_V4SI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
RS6000_BTI_bool_V4SI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
RS6000_BTI_unsigned_V4SI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_bool_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_bool_V8HI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_unsigned_V8HI, RS6000_BTI_bool_V8HI,
RS6000_BTI_unsigned_V8HI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
RS6000_BTI_bool_V8HI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
RS6000_BTI_unsigned_V8HI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI,
RS6000_BTI_unsigned_V16QI, 0 },
  { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
RS6000_BTI_bool_

[Bug target/91534] some defined builtins are not usable

2019-12-15 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91534

Bill Schmidt  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||wschmidt at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #2 from Bill Schmidt  ---
For clarity, many of these interfaces are only used internally as part of
mappings from overloaded builtins to builtins for a specific set of vector type
arguments.  Ultimately the interface that the user sees will be something like
vec_madd.  These internal tables are not intended to be a source of all
possible interfaces that users can access.

Accepted vector interfaces are defined in Appendix A of the Power ELF v2 ABI. 
Better documentation of them is in progress and should become available in
1H2020.  Overhauling the whole Power-specific builtin system is on my list for
GCC 11 if I can make the time.

[Bug target/93011] PowerPC GCC has warning that aggregate alignment changed in GCC 5

2019-12-19 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93011

--- Comment #1 from Bill Schmidt  ---
This is worth considering; but offhand I don't believe we should remove this
until common distros that use GCC 4.8 or 4.9 as default are retired (RHEL 7 and
SLES 12, for example, both use 4.8 as default and are still supported).  This
would take us out until 2027 at least, for long-term support contracts...

[Bug target/93011] PowerPC GCC has warning that aggregate alignment changed in GCC 5

2019-12-19 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93011

Bill Schmidt  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-12-19
 Ever confirmed|0   |1

--- Comment #2 from Bill Schmidt  ---
But sure, confirmed. ;-)

[Bug tree-optimization/93013] PPC: optimization around modulo leads to incorrect result

2019-12-19 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93013

Bill Schmidt  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-12-19
 CC||meissner at gcc dot gnu.org, segher at gcc dot gnu.org
  Component|c++ |tree-optimization
 Ever confirmed|0   |1

--- Comment #1 from Bill Schmidt  ---
The branch is removed by the middle end, as the optimized dump shows:

;; Function mod (_Z3modiiRi, funcdef_no=0, decl_uid=3256, cgraph_uid=1, symbol_order=0)

mod (int x, int y, int & z)
{
  int _1;
  bool _3;
  int _9;

  <bb 2> [local count: 1073741824]:
  _1 = x_4(D) % y_5(D);
  *z_7(D) = _1;
  _3 = y_5(D) == 0;
  _9 = (int) _3;
  return _9;

}

On POWER9 we will get the expected answer given the use of the modsw
instruction.  For POWER8 we get the codegen with divw which is indeed undefined
for these inputs.  I guess the issue would be in the expander for mod<mode>3.

Confirmed.
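
When no modulo instruction is available, the remainder is typically computed from a divide, a multiply, and a subtract. The sketch below shows that arithmetic in portable form; mod_via_divide is a hypothetical name, and the exact instruction sequence GCC emits for POWER8 is not reproduced here.

```cpp
#include <cassert>
#include <climits>

// Remainder computed as x - (x / y) * y, the usual expansion when only a
// divide instruction is available.
int mod_via_divide(int x, int y) {
  // The expansion relies on the division being well defined, which it is
  // not for y == 0 or for INT_MIN / -1.
  assert(y != 0);
  assert(!(x == INT_MIN && y == -1));
  return x - (x / y) * y;
}
```

For valid inputs this matches the % operator. For a zero divisor the quotient is undefined, so the value of the whole expression is undefined as well.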

[Bug target/93013] PPC: optimization around modulo leads to incorrect result

2019-12-19 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93013

Bill Schmidt  changed:

   What|Removed |Added

 Target|powerpc-ibm-aix7.1.0.0  |powerpc-*-*-*
  Component|tree-optimization   |target
   Target Milestone|--- |8.4

[Bug target/70928] Load simple float constants via VSX operations on PowerPC

2020-01-06 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70928

Bill Schmidt  changed:

   What|Removed |Added

 CC||jens.seifert at de dot ibm.com

--- Comment #3 from Bill Schmidt  ---
*** Bug 93128 has been marked as a duplicate of this bug. ***

[Bug target/93128] PPC small floating point constants can be constructed using vector operations

2020-01-06 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93128

Bill Schmidt  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||wschmidt at gcc dot gnu.org
 Resolution|--- |DUPLICATE

--- Comment #2 from Bill Schmidt  ---
This is a duplicate of PR70928.

*** This bug has been marked as a duplicate of bug 70928 ***

[Bug target/93206] non-delegitimized UNSPEC generated for C program on PowerPc with current mainline GCC tree

2020-01-09 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93206

Bill Schmidt  changed:

   What|Removed |Added

 CC||wschmidt at gcc dot gnu.org

--- Comment #2 from Bill Schmidt  ---
(In reply to Jakub Jelinek from comment #1)
> note: non-delegitimized UNSPEC
> is just a debugging note solely in non-release checking builds.  Not all
> UNSPECs need to be delegitimized, it is just a hint that it is something
> that could be inspected, whether it can be easily delegitimized or not (see
> rs6000_delegitimize_address).
> As UNSPEC_FOO is not in upstream GCC, I fail to see the need for upstream PR.

What may not be crystal clear here is that var-tracking is making a mistake. 
The original problem doesn't look quite like the obfuscated one here -- this
came up while working on some future work where an UNSPEC does seem necessary. 
We get something like

  result = complex-thing-needing-UNSPEC
  result = expression-involving-result

where both definitions of result get assigned to the same register, say r31. 
The first statement has a var_location note saying result is in r31.  The
second statement has a var_location note saying result is in the UNSPEC RTL
generated from complex-thing-needing-UNSPEC.

This is absolutely wrong, because result is once again in r31 at this point;
it's a new lifetime of result, but that's where it is.  That's the bug we're
attempting to report here.  Unfortunately, as the test was changed to hide
stuff we can't disclose yet, the error message moved to a different place and
another UNSPEC, which as you note is less obviously necessary as an UNSPEC
anyway.  But that's an unimportant detail.  This var-tracking bug is making it
hard to develop the code for this new feature.

The original test was testing four versions of this instruction, each with a
different mode.  Only the V4DI version fails due to the var-tracking error.

So the upstream PR is needed so development can proceed.

> Anyway, I have to wonder why vsx.md uses so many UNSPECs, can't e.g.
> UNSPEC_VSX_SET be just using vec_merge of vec_duplicate of the scalar
> operand (what is inserted) and the vector operand, with the position as last
> operand?
> Is the reason the endian correction?

There are too many unnecessary UNSPECs in the Power back end, yeah.  We're
rooting those out as we have time.  But this bug shows up with a necessary
UNSPEC originally, so that's not relevant for this report.

[Bug debug/93206] non-delegitimized UNSPEC generated for C program on PowerPc with current mainline GCC tree

2020-01-09 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93206

--- Comment #6 from Bill Schmidt  ---
(In reply to Jakub Jelinek from comment #4)
> There is no error, it is a note and if some variable at some point, even
> short one, can't be described using just registers or memory, but needs the
> value of the UNSPEC to describe it, there is no var-tracking bug, it just
> tries to build debug info from the UNSPEC and finds out it can't.

But there *is* a register that describes the variable.  It's wrongly using the
UNSPEC instead.  I contend that this is indeed a bug.

[Bug target/93230] PowerPC GCC vec_extract of a vector in memory does not fold sign/zero extension into load

2020-01-13 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93230

--- Comment #5 from Bill Schmidt  ---
Yeah, vec_extract should get folded in rs6000_fold_builtin eventually.  I think
that Will had a patch in progress on this at one time, but ran into some
difficulties and it got abandoned in favor of more urgent work.  Exposing
extract to gimple optimization is highly desirable.

[Bug target/93230] PowerPC GCC vec_extract of a vector in memory does not fold sign/zero extension into load

2020-01-13 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93230

--- Comment #6 from Bill Schmidt  ---
That should read "rs6000_gimple_fold_builtin".

[Bug target/91274] vec_splat_[us]64 missing for ppc

2020-01-21 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91274

Bill Schmidt  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||wschmidt at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #2 from Bill Schmidt  ---
Such interfaces were never supported or promised for ppc64le.  The fact that
s390 supports them is irrelevant.

The supported interfaces you're looking for are:

vector signed long long vec_splats (signed long long);
vector unsigned long long vec_splats (unsigned long long);

See Appendix A of the ELF V2 ABI Specification for a list of all vector
functions that must be supported by compilers for ppc64le.
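
For illustration only, the overloaded interface can be wrapped as below.  This
is a sketch, not something from the ABI document: the helper name splat_s64 is
hypothetical, and the non-VSX branch (GCC's generic vector extension) is an
assumption added so the snippet also builds on other targets.

```c
/* Sketch only: splat_s64 is a hypothetical helper name, not a GCC or ABI
   interface.  On a VSX-enabled PowerPC target it uses the overloaded
   vec_splats; elsewhere it falls back to GCC's generic vector extension.  */
#if defined(__VSX__)
#include <altivec.h>
typedef vector signed long long v2di;

static v2di splat_s64(signed long long x)
{
    return vec_splats(x);   /* overload is selected by the argument type */
}
#else
typedef long long v2di __attribute__((vector_size(16)));

static v2di splat_s64(long long x)
{
    v2di v = { x, x };      /* portable stand-in for the splat */
    return v;
}
#endif
```

The same vec_splats name covers the unsigned case; only the argument type
changes.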

[Bug target/91274] vec_splat_[us]64 missing for ppc

2020-01-21 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91274

--- Comment #4 from Bill Schmidt  ---
The short answer is history.  Those others were inherited from the old Altivec
PIM.  Having splat-immediates with different names for different sizes and
signedness isn't consistent with the rest of the vector architecture, which
uses overloading to accomplish the same thing with fewer names to remember.  So
it wasn't deemed necessary to require all compilers to add yet another
redundant interface when VSX came along.

We could add the interfaces you request to GCC, but using them would be
non-portable across compilers, and therefore they wouldn't be recommended. 
Adding to the list of interfaces to be supported by all compilers can be done,
but will take time to propagate everywhere, and until then they would still not
be recommended.  So I'm not sure you really want to go that way.  But if you
do, feel free to re-open this as a feature request.

[Bug target/93448] PPC: missing builtin for DFP quantize(dqua,dquai,dquaq,dquaiq)

2020-01-28 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93448

Bill Schmidt  changed:

   What|Removed |Added

 CC||meissner at gcc dot gnu.org

--- Comment #1 from Bill Schmidt  ---
We haven't previously spent much effort on DFP intrinsics due to lack of
interested users on Linux.  Sounds like we have some now. :-)

CCing Mike and Segher for the inline asm constraint question.

[Bug target/93449] PPC: Missing conversion builtin from vector to _Decimal128 and vice versa

2020-01-28 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93449

--- Comment #7 from Bill Schmidt  ---
The ELFv2 ABI Appendix B calls for a bcd data type defined as:

typedef vector unsigned char bcd;

and then defines a bunch of potential functions that can be built around it. 
The BCD functions (such as __builtin_bcdadd), earlier in the appendix, are
defined in terms of vector unsigned char.

GCC has just never gotten around to implementing these, due to a combination of
user disinterest and resource constraints.  We'll have to step up to these,
hopefully in GCC 11 (though our plate is really full there already).

[Bug target/91903] vec_ctf altivec intrinsic can cause ICE on powerpc

2020-01-29 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91903

--- Comment #4 from Bill Schmidt  ---
Well, we should give you a better error message instead of an ICE.  But the ABI
definition of the second argument as "const int" indicates it needs to be an
actual constant in the range 0..31.

So You're Doing It Wrong (TM).

If you have a need for b to be a variable, you'll probably have to use a
constant 0 and do the scaling yourself.
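
A scalar model of that workaround, as a rough sketch: vec_ctf (a, b) converts
each element to float and divides by 2**b, so with a constant 0 the division
can be applied afterwards with a run-time b.  The function names below are
hypothetical, and the code models one lane in plain C rather than using the
intrinsic.

```c
/* Scalar model of one lane of vec_ctf (a, 0): plain int-to-float conversion.
   ctf_scale0 and ctf_variable_scale are hypothetical names for illustration. */
static float ctf_scale0(int x)
{
    return (float)x;
}

/* Apply a run-time scale afterwards: divide by 2**b, with 0 <= b <= 31.
   1u << b is exactly representable as a float, so the reciprocal is an
   exact power of two. */
static float ctf_variable_scale(int x, unsigned b)
{
    float inv_scale = 1.0f / (float)(1u << b);
    return ctf_scale0(x) * inv_scale;
}
```

In vector code the same idea would be a vec_ctf with scale 0 followed by a
multiply by a splatted 2**-b.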

[Bug target/93230] PowerPC GCC vec_extract of a vector in memory does not fold sign/zero extension into load

2020-02-04 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93230

--- Comment #8 from Bill Schmidt  ---
Yes, the variable element numbers were the difficulties in question that slowed
things down last time, as I recall.  We may want to try to fold the simple
cases in gimple and let the rest run through to expand.

[Bug target/91903] vec_ctf altivec intrinsic can cause ICE on powerpc

2020-02-05 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91903

Bill Schmidt  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |wschmidt at gcc dot gnu.org
   Target Milestone|--- |10.0

--- Comment #5 from Bill Schmidt  ---
I'll have a look.

[Bug target/93570] PPC: __builtin_mtfsf does not return a value

2020-02-05 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93570

Bill Schmidt  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-02-05
 Ever confirmed|0   |1

--- Comment #1 from Bill Schmidt  ---
Yes, looks like the documentation is wrong.  Looking at GCC trunk:

  ftype = build_function_type_list (void_type_node,
                                    intSI_type_node, double_type_node,
                                    NULL_TREE);
  def_builtin ("__builtin_mtfsf", ftype, RS6000_BUILTIN_MTFSF);

This indicates the correct prototype to be:

  void __builtin_mtfsf (const int, double);

as you suggest.  The documentation needs correcting, but you should be able to
use the correct prototype in 8.3.0.  This builtin hasn't changed in ages.

Confirmed.

[Bug target/91903] vec_ctf altivec intrinsic can cause ICE on powerpc

2020-02-05 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91903

Bill Schmidt  changed:

   What|Removed |Added

   Keywords||ice-on-invalid-code
   Target Milestone|10.0|11.0

--- Comment #6 from Bill Schmidt  ---
I could put a bandaid on this for GCC 10, but I don't think it's worthwhile. 
The rewrite of the builtin initialization code will take care of this in GCC
11.  There are probably worse things than ice-on-invalid.  I'll keep this open
to track for the next release.

[Bug target/93570] PPC: __builtin_mtfsf does not return a value

2020-02-06 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93570

Bill Schmidt  changed:

   What|Removed |Added

   Target Milestone|--- |10.0

[Bug target/93570] PPC: __builtin_mtfsf does not return a value

2020-02-06 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93570

Bill Schmidt  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Bill Schmidt  ---
Fixed.

[Bug target/90763] PowerPC vec_xl_len should take const

2020-02-06 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90763

--- Comment #2 from Bill Schmidt  ---
Whoops, that was not supposed to go to bz.  Sorry about that.

[Bug target/93709] [10 regression] fortran.dg/minlocval_4.f90 fails on power 9 after r10-4160

2020-02-16 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93709

--- Comment #1 from Bill Schmidt  ---
r10-4160 is the "daily bump" commit.  How confident are you in your bisection?
:-)

[Bug target/93819] PPC64 builtin vec_rlnm() argument order is wrong.

2020-02-18 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93819

Bill Schmidt  changed:

   What|Removed |Added

  Component|c   |target
   Target Milestone|--- |10.0

[Bug target/87560] ICE in curr_insn_transform, at lra-constraints.c:3892

2020-02-24 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87560

--- Comment #2 from Bill Schmidt  ---
Hm, I can't reproduce this with current trunk.  Does it still occur for you,
Martin?

[Bug target/87560] ICE in curr_insn_transform, at lra-constraints.c:3892

2020-02-28 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87560

--- Comment #3 from Bill Schmidt  ---
I expect the problem is still there somewhere, but it's gone latent.  There
haven't been any changes to *xxspltib_<mode>_split since 2016.  Will need to
look at gcc-9 branch to debug.

[Bug target/87560] ICE in curr_insn_transform, at lra-constraints.c:3892

2020-02-28 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87560

--- Comment #4 from Bill Schmidt  ---
Although perhaps we've done a better job of sorting out these flags since then.
 Segher, anything ring a bell?

[Bug target/87560] ICE in curr_insn_transform, at lra-constraints.c:3892

2020-02-28 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87560

--- Comment #6 from Bill Schmidt  ---
OK, looks like the gimple has changed so we don't see the opportunity anymore
in GCC 10.

[Bug tree-optimization/56321] [4.8 Regression] ICE:segfault in midend for -funsafe-math-optimizations -O3

2013-02-14 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56321



William J. Schmidt  changed:



   What|Removed |Added



 CC||wschmidt at gcc dot gnu.org



--- Comment #4 from William J. Schmidt  2013-02-14 14:42:10 UTC ---

I'll have a look shortly.  I see it marked as a 4.8 regression even though this
work was done in 4.7.  Does the test pass on 4.7?


[Bug tree-optimization/56321] [4.8 Regression] ICE:segfault in midend for -funsafe-math-optimizations -O3

2013-02-14 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56321



--- Comment #5 from William J. Schmidt  2013-02-14 14:43:29 UTC ---

Actually I might be wrong about that, now that I think about it -- probably
this was done in 4.8.  It seems longer ago than that. ;)


[Bug tree-optimization/56321] [4.8 Regression] ICE:segfault in midend for -funsafe-math-optimizations -O3

2013-02-14 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56321



--- Comment #6 from William J. Schmidt  2013-02-14 20:11:32 UTC ---

Odd.  Reassociation makes a correct and profitable transformation into

foo (int n)
{
  double _2;
  double _5;
  double _6;
  double _7;
  double _8;
  float _9;

  <bb 2>:
  _2 = (double) n_1(D);
  _6 = _2 * 6.6662965923251249478198587894439697265625e-1;
  _7 = _6 + 2.0e+0;
  _5 = _7 * _2;
  _8 = _5;
  _9 = (float) _8;
  return _9;

}

but somehow verify_ssa() thinks the last statement (return _9;) contains a use
of the undefined SSA name _4.

Will continue to investigate later.


[Bug tree-optimization/56321] [4.8 Regression] ICE:segfault in midend for -funsafe-math-optimizations -O3

2013-02-14 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56321



William J. Schmidt  changed:



   What|Removed |Added



 Status|NEW |ASSIGNED
 AssignedTo|unassigned at gcc dot gnu.org  |wschmidt at gcc dot gnu.org



--- Comment #7 from William J. Schmidt  2013-02-14 22:27:21 UTC ---

I see.  The problem is a memory VUSE on the return statement that no longer has
a def.  The VDEF was associated with the __builtin_pow statement that was
expanded.  Looks like I need to release the SSA name.  Working on a fix.


[Bug tree-optimization/56321] [4.8 Regression] ICE:segfault in midend for -funsafe-math-optimizations -O3

2013-02-15 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56321



--- Comment #10 from William J. Schmidt  2013-02-15 15:13:55 UTC ---

(In reply to comment #8)
> (In reply to comment #7)
> > I see.  The problem is a memory VUSE on the return statement that no longer has
> > a def.  The VDEF was associated with the __builtin_pow statement that was
> > expanded.  Looks like I need to release the SSA name.  Working on a fix.
> 
> Use unlink_stmt_vdef when removing a stmt that possibly has a VDEF.

Yes -- I've been trying that, but something more subtle seems to be going on.
I think perhaps the statement isn't being removed but modified in place.  I've
been trying to unlink the VDEF when the call is known to go away later, and
that's not doing it either.  Going to have to get dirty with the debugger this
morning.


[Bug tree-optimization/56321] [4.8 Regression] ICE:segfault in midend for -funsafe-math-optimizations -O3

2013-02-15 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56321



--- Comment #11 from William J. Schmidt  2013-02-15 15:49:03 UTC ---

OK, got it.  I was on the right track, there were just several locations where
it could happen and I missed one.


[Bug fortran/48636] Enable more inlining with -O2 and higher

2013-03-01 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48636



William J. Schmidt  changed:



   What|Removed |Added



 CC||wschmidt at gcc dot gnu.org



--- Comment #43 from William J. Schmidt  2013-03-01 17:48:51 UTC ---

(In reply to comment #38)
> Looks like for x86 r193331 led to significant regression on 172.mgrid for -m32
> -O3 -funroll-loops

The same degradation was seen on powerpc64-unknown-linux-gnu with r193331.  The
fix by Martin Jambor for PR55334 did not help for -m32.  It did give a slight
bump to -m64, but did not return the performance to pre-r193331 levels.  So
there still seems to be a problem with 172.mgrid related to this change.


[Bug fortran/48636] Enable more inlining with -O2 and higher

2013-03-04 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48636



William J. Schmidt  changed:



   What|Removed |Added



 CC||bergner at gcc dot gnu.org



--- Comment #44 from William J. Schmidt  2013-03-04 17:53:17 UTC ---

Compiling mgrid.f on powerpc64-unknown-linux-gnu as follows:

$ gfortran -S -m32 -O3 -mcpu=power7 -fpeel-loops -funroll-loops -ffast-math -fvect-cost-model mgrid.f

I examined the assembly generated for revisions 193330, 193331 (this issue),
and 196171 (PR55334).  What I'm seeing is that for both 193331 and 196171, the
inliner is much more aggressive, and in particular is inlining several copies
of some pretty large functions.

For -m32, I am not seeing any specialization of resid_, so although the change
in 196171 helped a little, it appears that this was by reducing overall code
size.  There weren't any changes in inlining decisions.  Of course there is a
lot of distance between 193331 and 196171, so it is not a perfect comparison,
though it appears 196171 is where -m32 received a slight boost.

Anyway, the non-inlined call tree for 193330 is:

 main
  MAIN__
   resid_ (x4)
    comm3_
   psinv_ (x3)
    comm3_
   norm2u3_ (x2)
   interp_ (x2)
   setup_
   rprj3_ (x4)
   zran3_

The non-inlined call tree for 193331 is:

 main
  MAIN__
   comm3_ (x5)
   resid_
    comm3_
   norm2u3_ (x2)
   setup_
   zran3_

So with 193331 we have the following additional inlines:

  3 inlines of resid_,  size = 1068, total size = 3204
  3 inlines of psinv_,  size = 1046, total size = 3138
  2 inlines of interp_, size = 1544, total size = 3088
  4 inlines of rprj3_,  size = 220,  total size = 880

Here "size" is the number of lines of assembly code of the called procedure,
including labels, so it's just a rough measure.  The number of static call
sites of comm3_ was also reduced by one, but I don't know whether it was
inlined or specialized away.

These are pretty large procedures to be duplicating, particularly to be
duplicating more than once.  Looking at resid_, it already generates spill code
on its own, so putting 3 copies of this in its caller isn't likely to be very
helpful.  Of these, I think only rprj3_ looks like a reasonable inline
candidate.

Total lines of the assembly files are:

  8660 r193330/mgrid.s
 16398 r193331/mgrid.s
 14592 r196171/mgrid.s

Inlining creates unreachable code, so removing the unreachable procedures
gives:

  7765 r193330/mgrid.s
 12591 r193331/mgrid.s
 10795 r196171/mgrid.s

With r196171 the reachable code is still about 40% larger than r193330 (where
some reasonable inlining was already being done).  This is better than the 60%
bloat with r193331 but still seems too high.  Again, these are rough measures
but I think they are indicative.

Without knowing anything about the inliner, I think the inlining heuristics
probably need to take more account of code size than they seem to do at the
moment, particularly when making more than one copy of a procedure and thus
reducing spatial locality.
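
For anyone wanting to redo the arithmetic behind the "about 40%" and "60%"
figures, a small sketch follows.  The function name growth_percent is
hypothetical; the inputs are the reachable line counts quoted above.

```c
/* Percentage growth of `size` relative to `base`, both measured in lines of
   assembly.  growth_percent is a hypothetical helper for illustration. */
static double growth_percent(long base, long size)
{
    return 100.0 * (double)(size - base) / (double)base;
}
```

With the reachable counts above, growth_percent (7765, 12591) comes out a
little over 62 and growth_percent (7765, 10795) a little over 39, consistent
with the rounded figures in the comment.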


[Bug rtl-optimization/56605] New: Redundant branch introduced during loop2 phases

2013-03-12 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56605



             Bug #: 56605
           Summary: Redundant branch introduced during loop2 phases
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: wschm...@gcc.gnu.org
                CC: berg...@vnet.ibm.com
              Host: powerpc64-unknown-linux-gnu
            Target: powerpc64-unknown-linux-gnu
             Build: powerpc64-unknown-linux-gnu


I've recently noticed that GCC commonly generates redundant branches prior to
vectorized loops, such as:

   cmpwi 7,12,0
...
   beq- 7,.L22
   beq- 7,.L22
   .p2align 4,,15
.L8:
... loop body ...

The branches first appear in the 186r.loop2_doloop debug dump:

(note 260 254 258 23 [bb 23] NOTE_INSN_BASIC_BLOCK)
(insn 258 260 259 23 (set (reg:CC 330)
        (compare:CC (subreg:SI (reg:DI 153 [ bnd.10+-4 ]) 4)
            (const_int 0 [0]))) -1
     (nil))
(jump_insn 259 258 263 23 (set (pc)
        (if_then_else (eq (reg:CC 330)
                (const_int 0 [0]))
            (label_ref 257)
            (pc))) -1
     (expr_list:REG_BR_PROB (const_int 0 [0])
        (nil))
 -> 257)
(note 263 259 261 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
(insn 261 263 262 24 (set (reg:CC 331)
        (compare:CC (subreg:SI (reg:DI 153 [ bnd.10+-4 ]) 4)
            (const_int 0 [0]))) -1
     (nil))
(jump_insn 262 261 257 24 (set (pc)
        (if_then_else (eq (reg:CC 331)
                (const_int 0 [0]))
            (label_ref 257)
            (pc))) -1
     (expr_list:REG_BR_PROB (const_int 0 [0])
        (nil))
 -> 257)

Subsequently GCC removes the redundant compare, but does not remove the
redundant branch.

Simple test case to reproduce:

#define N 4000
void foo(short* __restrict sb, int* __restrict ia) {
  int i;
  for (i = 0; i < N; i++)
    ia[i] = (int) sb[i];
}

$GCC_INSTALL/bin/gcc -S -O3 -mvsx example.c

(-mvsx is necessary to vectorize the loop.  It may also be necessary to add
-mcpu=power7.)


[Bug middle-end/35308] Straight line strength reduction

2013-03-25 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35308



William J. Schmidt  changed:



   What|Removed |Added



   Target Milestone|4.8.1   |4.9.0



--- Comment #5 from William J. Schmidt  2013-03-25 13:17:39 UTC ---

The unknown stride features made it into 4.8.0, but conditional candidates are
still pending, and too complex for 4.8.1, I think.  Changing target to 4.9 for
the remaining work.  I do plan to get that wrapped up fairly soon.


[Bug target/56843] New: PowerPC Newton-Raphson reciprocal estimates can be improved

2013-04-04 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56843



             Bug #: 56843
           Summary: PowerPC Newton-Raphson reciprocal estimates can be
                    improved
    Classification: Unclassified
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: wschm...@gcc.gnu.org
        ReportedBy: wschm...@gcc.gnu.org
              Host: powerpc64-unknown-linux-gnu
            Target: powerpc64-unknown-linux-gnu
             Build: powerpc64-unknown-linux-gnu


It was recently brought to my attention that the number of Newton-Raphson
iterations for floating reciprocal-estimate and floating
reciprocal-sqrt-estimate can be tightened.  In particular, for 32-bit
floating-point values targeting processors having higher precision estimates, a
single iteration should suffice to produce maximum representable precision.  We
currently perform two.  We should verify that one is actually sufficient in
practice.

We should also investigate whether 3 iterations is sufficient for 64-bit
floating-point values when targeting processors having lower precision
estimates.  The theoretical math suggests 4 may be necessary, but this could be
too conservative in practice as this is derived from a general bound on the
method.
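
To make the iteration-count argument concrete, here is a minimal sketch of the
reciprocal refinement step x' = x * (2 - a * x), which roughly squares the
relative error on each step.  The function names are hypothetical, and the
initial errors a caller passes in (for example 2**-14 for a higher precision
estimate, 2**-8 for a lower precision one) are illustrative assumptions, not
figures from this report.  The model runs in double and ignores rounding in
the target format, so it does not replace checking on real hardware.

```c
/* One Newton-Raphson refinement step for 1/a: x' = x * (2 - a * x). */
static double nr_recip_step(double a, double x)
{
    return x * (2.0 - a * x);
}

/* Relative error of x as an approximation to 1/a, i.e. |a*x - 1|. */
static double recip_rel_error(double a, double x)
{
    double err = a * x - 1.0;
    return err < 0.0 ? -err : err;
}

/* Count the steps needed to bring the relative error down to the target,
   starting from a simulated estimate with the given relative error.
   nr_steps_needed is a hypothetical helper; the cap of 8 steps only
   guards against looping forever on an unreachable target. */
static int nr_steps_needed(double a, double initial_rel_error,
                           double target_rel_error)
{
    double x = (1.0 / a) * (1.0 + initial_rel_error);
    int steps = 0;

    while (recip_rel_error(a, x) > target_rel_error && steps < 8) {
        x = nr_recip_step(a, x);
        steps++;
    }
    return steps;
}
```

Under those assumed starting errors and a 2**-24 target, the model needs one
step from 2**-14 and two steps from 2**-8, which matches the intuition that a
higher precision estimate saves an iteration for 32-bit values.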


[Bug target/56843] PowerPC Newton-Raphson reciprocal estimates can be improved

2013-04-04 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56843



--- Comment #2 from Bill Schmidt  2013-04-04 16:12:31 UTC ---

Regarding the last point, I found this in the user manual:

"The double-precision square root estimate instructions are not generated by
default on low-precision machines, since they do not provide an estimate that
converges after three steps."

That seems to indicate someone decided the libcall is better than a four-step
iteration.  That doesn't necessarily seem obvious to me.


[Bug target/56843] PowerPC Newton-Raphson reciprocal estimates can be improved

2013-04-05 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56843



--- Comment #3 from Bill Schmidt  2013-04-05 15:03:26 UTC ---

Looks like we can improve performance for three cases on P6 and later machines:
 - 32-bit reciprocal square root: remove two instructions
 - 32-bit reciprocal: remove three instructions
 - 64-bit reciprocal: remove one instruction

The last is due to a subtle bug in the existing implementation.


[Bug target/56843] PowerPC Newton-Raphson reciprocal estimates can be improved

2013-04-05 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56843



Bill Schmidt  changed:



   What|Removed |Added



 Status|NEW |RESOLVED

 Resolution||FIXED



--- Comment #4 from Bill Schmidt  2013-04-05 19:29:44 UTC ---

Fixed in r197534.


[Bug tree-optimization/56933] New: [4.9 Regression] Vectorizer missing read-write dependency for interleaved accesses

2013-04-12 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56933



             Bug #: 56933
           Summary: [4.9 Regression] Vectorizer missing read-write
                    dependency for interleaved accesses
    Classification: Unclassified
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: wschm...@gcc.gnu.org
                CC: berg...@gcc.gnu.org, d...@gcc.gnu.org,
                    rgue...@gcc.gnu.org
              Host: powerpc*-*-*
            Target: powerpc*-*-*
             Build: powerpc*-*-*


Created attachment 29861
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29861
Vectorization details dump for the test case

vect_analyze_group_access() in tree-vect-data-refs.c contains a test for
load-store dependencies:

  if (GROUP_READ_WRITE_DEPENDENCE (vinfo_for_stmt (next))
      || GROUP_READ_WRITE_DEPENDENCE (vinfo_for_stmt (prev)))

Currently this always returns false because this field has not yet been set in
the vinfo.  This began with r196872, where the code to analyze accesses was
moved ahead of the code to analyze dependences.

I put together a test demonstrating that it's possible for us to generate
incorrect code as a result:

      subroutine test(a,b,c,d,e,f)
      integer k
      real*4, intent(out) :: a(1000)
      real*4, intent(out) :: b(1000)
      real*4, intent(in) :: c(1000)
      real*4, intent(inout) :: d(2000)
      real*4, intent(out) :: e(1000)
      real*4, intent(out) :: f(1000)

      do k = 1,1000
         a(k) = 3.0 * d(2*k)
         e(k) = 3.3 * d(2*k+1)
         d(2*k) = 2.0 * c(k)
         d(2*k+1) = 2.3 * c(k)
         b(k) = d(2*k) - 5.5;
         f(k) = d(2*k+1) + 5.5;
      enddo

      return
      end

I'm attaching a detailed dump of the vectorization pass that shows that the
values of d(2*k) and d(2*k+1) used to compute b(k) and f(k) are the ones loaded
prior to the stores to those locations.

To reproduce on powerpc64-unknown-linux-gnu:

$ gfortran -O3 -ffast-math -mcpu=power7 -fno-vect-cost-model interl-lsl-2.f
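
The effect can be modelled in scalar C, as a rough sketch.  This is a
transliteration of the loop body with 0-based indexing, not the vectorizer's
output; both function names are hypothetical.  The first function has the
required semantics, the second reuses the values loaded before the stores,
which is the behaviour described in the dump.

```c
#include <stddef.h>

/* Required semantics: b[k] and f[k] read the values just stored to d. */
static void interleaved_reference(size_t n, float *a, float *b,
                                  const float *c, float *d,
                                  float *e, float *f)
{
    for (size_t k = 0; k < n; k++) {
        a[k] = 3.0f * d[2 * k];
        e[k] = 3.3f * d[2 * k + 1];
        d[2 * k] = 2.0f * c[k];
        d[2 * k + 1] = 2.3f * c[k];
        b[k] = d[2 * k] - 5.5f;
        f[k] = d[2 * k + 1] + 5.5f;
    }
}

/* Model of the miscompile: the loads of d are done once, before the stores,
   and the stale values are reused for b[k] and f[k]. */
static void interleaved_stale_loads(size_t n, float *a, float *b,
                                    const float *c, float *d,
                                    float *e, float *f)
{
    for (size_t k = 0; k < n; k++) {
        float d_even = d[2 * k];
        float d_odd = d[2 * k + 1];

        a[k] = 3.0f * d_even;
        e[k] = 3.3f * d_odd;
        d[2 * k] = 2.0f * c[k];
        d[2 * k + 1] = 2.3f * c[k];
        b[k] = d_even - 5.5f;   /* wrong: should see the new d[2*k] */
        f[k] = d_odd + 5.5f;    /* wrong: should see the new d[2*k+1] */
    }
}
```

Running both on the same inputs and comparing b and f is enough to tell the
two behaviours apart.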


[Bug tree-optimization/56962] [4.8/4.9 Regression] SLSR caused miscompilation of fftw

2013-04-15 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56962



--- Comment #2 from Bill Schmidt  2013-04-15 13:19:53 UTC ---

The fix looks correct to me.  Thanks!


[Bug rtl-optimization/56605] Redundant branch introduced during loop2 phases

2013-04-15 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56605



Bill Schmidt  changed:



   What|Removed |Added



 AssignedTo|unassigned at gcc dot gnu.org  |wschmidt at gcc dot gnu.org
   Target Milestone|--- |4.8.1



--- Comment #4 from Bill Schmidt  2013-04-15 18:32:53 UTC ---

This was fixed in trunk on 2013-03-20.  Now that it's burned in for a few
weeks, I'll plan to fix it in 4.8.1 shortly, provided there are no objections.


[Bug rtl-optimization/56605] Redundant branch introduced during loop2 phases

2013-04-23 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56605



Bill Schmidt  changed:



   What|Removed |Added



 Status|NEW |RESOLVED

 Resolution||FIXED



--- Comment #5 from Bill Schmidt  2013-04-23 19:49:08 UTC ---

Fixed.


[Bug target/56864] [4.9 regression] FAIL: gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c scan-tree-dump-times vect "vectorized 1 loops" 0

2013-05-01 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56864



Bill Schmidt  changed:



   What|Removed |Added



 CC||wschmidt at gcc dot gnu.org



--- Comment #6 from Bill Schmidt  2013-05-01 17:49:00 UTC ---

I can't confirm this today, either.  The test passes with r198500.  Andreas, do
you still see a problem with the current trunk?


[Bug target/56864] [4.9 regression] FAIL: gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c scan-tree-dump-times vect "vectorized 1 loops" 0

2013-05-01 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56864



--- Comment #8 from Bill Schmidt  2013-05-01 20:13:35 UTC ---

If possible, please check whether this began failing with r196872.  That commit
looks suspicious for at least one other test.  I'm stabbing in the dark since I
can't reproduce this one.


[Bug target/56865] [4.9 regression] FAIL: gcc.dg/vect/vect-42.c scan-tree-dump-times vect "Vectorizing an unaligned access" 4

2013-05-01 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56865



Bill Schmidt  changed:



   What|Removed |Added



 CC||rguenth at gcc dot gnu.org,

   ||wschmidt at gcc dot gnu.org



--- Comment #2 from Bill Schmidt  2013-05-01 21:58:09 UTC ---

I've reproduced this as well.  Additionally, gcc.dg/vect/vect-96.c fails
similarly.  Both tests began failing at r196872:

2013-03-21  Richard Biener  

	* tree-vect-data-refs.c (vect_update_interleaving_chain): Remove.
	(vect_insert_into_interleaving_chain): Likewise.
	(vect_drs_dependent_in_basic_block): Inline ...
	(vect_slp_analyze_data_ref_dependence): ... here.  New function,
	split out from ...
	(vect_analyze_data_ref_dependence): ... here.  Simplify.
	(vect_check_interleaving): Simplify.
	(vect_analyze_data_ref_dependences): Likewise.  Split out ...
	(vect_slp_analyze_data_ref_dependences): ... this new function.
	(dr_group_sort_cmp): New function.
	(vect_analyze_data_ref_accesses): Compute data-reference groups
	here instead of in vect_analyze_data_ref_dependence.  Use
	a more efficient algorithm.
	* tree-vect-slp.c (vect_slp_analyze_bb_1): Use
	vect_slp_analyze_data_ref_dependences.  Call
	vect_analyze_data_ref_accesses earlier.
	* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
	* tree-vectorizer.h (vect_analyze_data_ref_dependences): Adjust.
	(vect_slp_analyze_data_ref_dependences): New prototype.

Richi, I think this commit was not intended to have any functional effect -- is
that correct?


[Bug target/56865] [4.9 regression] FAIL: gcc.dg/vect/vect-42.c scan-tree-dump-times vect "Vectorizing an unaligned access" 4

2013-05-02 Thread wschmidt at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56865



--- Comment #4 from Bill Schmidt  2013-05-02 15:27:08 UTC ---

(In reply to comment #3)
> 
> Correct.  Dumping order is affected by the patch though, thus if
> we previously disabled vectorization at some point the dumping
> before that can be affected due to the re-ordering.

It appears that we are vectorizing the same loops, but we are now vectorizing
one loop differently.  In r196871, the loop is peeled for alignment.  In
r196872, the loop is versioned for alignment.

I will attach the vectorization detail dumps for the two revisions.

