
[Bug target/115800] PowerPC GCC cannot build a little endian compile if --with-cpu=power5 is used

2024-07-05 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115800

--- Comment #6 from Michael Meissner  ---
Of course, the same situation also arises if you build a BE compiler that has
little endian multilibs.

[Bug target/115800] PowerPC GCC cannot build a little endian compile if --with-cpu=power5 is used

2024-07-05 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115800

--- Comment #5 from Michael Meissner  ---
And libstdc++-v3 errors are similar:

mkdir -p ./powerpc64le-unknown-linux-gnu/bits/stdc++.h.gch
/home/meissner/fsf-build-ppc64le/work171-p5/./gcc/xgcc -shared-libgcc
-B/home/meissner/fsf-build-ppc64le/work171-p5/./gcc -nostdinc++
-L/home/meissner/fsf-build-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/libstdc++-v3/src
-L/home/meissner/fsf-build-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/libstdc++-v3/src/.libs
-L/home/meissner/fsf-build-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs
-B/home/meissner/fsf-install-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/bin/
-B/home/meissner/fsf-install-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/lib/
-isystem
/home/meissner/fsf-install-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/include
-isystem
/home/meissner/fsf-install-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/sys-include
   -x c++-header -nostdinc++ -g -O2 -D_GNU_SOURCE 
-I/home/meissner/fsf-build-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/powerpc64le-unknown-linux-gnu
-I/home/meissner/fsf-build-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/libstdc++-v3/include
-I/home/meissner/fsf-src/work171/libstdc++-v3/libsupc++  -O2 -g
/home/meissner/fsf-src/work171/libstdc++-v3/include/precompiled/stdc++.h -o
powerpc64le-unknown-linux-gnu/bits/stdc++.h.gch/O2g.gch
In file included from
/home/meissner/fsf-build-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:63,
 from
/home/meissner/fsf-build-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/algorithm:60,
 from
/home/meissner/fsf-src/work171/libstdc++-v3/include/precompiled/stdc++.h:51:
/home/meissner/fsf-build-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/ext/numeric_traits.h:224:38:
error: ‘__ieee128’ was not declared in this scope; did you mean ‘__int128’?
  224 | struct __numeric_traits_floating<__ieee128>
  |  ^
  |  __int128
/home/meissner/fsf-build-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/ext/numeric_traits.h:224:47:
error: template argument 1 is invalid
  224 | struct __numeric_traits_floating<__ieee128>
  |   ^
/home/meissner/fsf-build-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/ext/numeric_traits.h:232:29:
error: ‘__ieee128’ was not declared in this scope; did you mean ‘__int128’?
  232 | struct __numeric_traits<__ieee128>
  | ^
  | __int128
/home/meissner/fsf-build-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/ext/numeric_traits.h:232:38:
error: template argument 1 is invalid
  232 | struct __numeric_traits<__ieee128>
  |  ^
/home/meissner/fsf-build-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/ext/numeric_traits.h:233:40:
error: ‘__ieee128’ was not declared in this scope; did you mean ‘__int128’?
  233 | : public __numeric_traits_floating<__ieee128>
  |^
  |__int128
/home/meissner/fsf-build-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/ext/numeric_traits.h:233:49:
error: template argument 1 is invalid
  233 | : public __numeric_traits_floating<__ieee128>
  | ^

[Bug target/115800] PowerPC GCC cannot build a little endian compile if --with-cpu=power5 is used

2024-07-05 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115800

--- Comment #4 from Michael Meissner  ---
Libgfortran fails with various errors saying that _Float128 is not supported on
this target.

libtool: compile:  /home/meissner/fsf-build-ppc64le/work171-p5/./gcc/xgcc
-B/home/meissner/fsf-build-ppc64le/work171-p5/./gcc/
-B/home/meissner/fsf-install-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/bin/
-B/home/meissner/fsf-install-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/lib/
-isystem /home/meissner/fsf-install-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/include -isystem
/home/meissner/fsf-install-ppc64le/work171-p5/powerpc64le-unknown-linux-gnu/sys-include
-DHAVE_CONFIG_H -I. -I/home/meissner/fsf-src/work171/libgfortran
-iquote/home/meissner/fsf-src/work171/libgfortran/io
-I/home/meissner/fsf-src/work171/libgfortran/../gcc -I/home/meissner/fsf-src/work171/libgfortran/../gcc/config -I../.././gcc
-I/home/meissner/fsf-src/work171/libgfortran/../libgcc -I../libgcc
-I/home/meissner/fsf-src/work171/libgfortran/../libbacktrace -I../libbacktrace
-I../libbacktrace -std=gnu11 -Wall -Wstrict-prototypes -Wmissing-prototypes
-Wold-style-definition -Wextra -Wwrite-strings -Werror=implicit-function-declaration -Werror=vla
-mabi=ibmlongdouble -mno-gnu-attribute -fcx-fortran-rules -ffunction-sections
-fdata-sections -g -O2 -MT caf/single.lo -MD -MP -MF caf/.deps/single.Tpo -c
/home/meissner/fsf-src/work171/libgfortran/caf/single.c  -fPIC -DPIC -o
caf/.libs/single.o
In file included from ./kinds.h:69,
 from
/home/meissner/fsf-src/work171/libgfortran/libgfortran.h:263,
 from
/home/meissner/fsf-src/work171/libgfortran/caf/libcaf.h:32,
 from
/home/meissner/fsf-src/work171/libgfortran/caf/single.c:26:
/home/meissner/fsf-src/work171/libgfortran/kinds-override.h:34:9: error:
‘_Float128’ is not supported on this target
   34 | typedef _Float128 GFC_REAL_17;
  | ^
/home/meissner/fsf-src/work171/libgfortran/kinds-override.h:35:18: error:
‘_Float128’ is not supported on this target
   35 | typedef _Complex _Float128 GFC_COMPLEX_17;
  |  ^
/home/meissner/fsf-src/work171/libgfortran/libgfortran.h:1963:8: error:
‘_Float128’ is not supported on this target
 1963 | extern _Float128 __acoshieee128 (_Float128)
  |^
/home/meissner/fsf-src/work171/libgfortran/libgfortran.h:1963:34: error:
‘_Float128’ is not supported on this target
 1963 | extern _Float128 __acoshieee128 (_Float128)
  |  ^
/home/meissner/fsf-src/work171/libgfortran/libgfortran.h:1965:8: error:
‘_Float128’ is not supported on this target
 1965 | extern _Float128 __acosieee128 (_Float128)
  |^
/home/meissner/fsf-src/work171/libgfortran/libgfortran.h:1965:33: error:
‘_Float128’ is not supported on this target
 1965 | extern _Float128 __acosieee128 (_Float128)
  | ^
/home/meissner/fsf-src/work171/libgfortran/libgfortran.h:1967:8: error:
‘_Float128’ is not supported on this target
 1967 | extern _Float128 __asinhieee128 (_Float128)
  |^
/home/meissner/fsf-src/work171/libgfortran/libgfortran.h:1967:34: error:
‘_Float128’ is not supported on this target
 1967 | extern _Float128 __asinhieee128 (_Float128)
  |  ^
/home/meissner/fsf-src/work171/libgfortran/libgfortran.h:1969:8: error:
‘_Float128’ is not supported on this target
 1969 | extern _Float128 __asinieee128 (_Float128)
  |^
/home/meissner/fsf-src/work171/libgfortran/libgfortran.h:1969:33: error:
‘_Float128’ is not supported on this target
 1969 | extern _Float128 __asinieee128 (_Float128)
  | ^
/home/meissner/fsf-src/work171/libgfortran/libgfortran.h:1971:8: error:
‘_Float128’ is not supported on this target
 1971 | extern _Float128 __atan2ieee128 (_Float128)
  |^
/home/meissner/fsf-src/work171/libgfortran/libgfortran.h:1971:34: error:
‘_Float128’ is not supported on this target
 1971 | extern _Float128 __atan2ieee128 (_Float128)
  |  ^
/home/meissner/fsf-src/work171/libgfortran/libgfortran.h:1973:8: error:
‘_Float128’ is not supported on this target
 1973 | extern _Float128 __atanhieee128 (_Float128)
  |^
/home/meissner/fsf-src/work171/libgfortran/libgfortran.h:1973:34: error:
‘_Float128’ is not supported on this target
 1973 | extern _Float128 __atanhieee128 (_Float128)
  |  ^
/home/meissner/fsf-src/work171/libgfortran/libgfortran.h:1975:8: error:
‘_Float128’ is not supported on this target
 1975 | extern _Float128 __atanieee128 (_Float128)
  |^
/home/meissner/fsf-src/work171/libgfortran/libgfortran.h:1975:33: error:
‘_Float128’ is not supported on this target
 1975 | extern _Float128 __atanieee128 (_Float128)
  |

[Bug target/115800] New: PowerPC GCC cannot build a little endian compile if --with-cpu=power5 is used

2024-07-05 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115800

Bug ID: 115800
   Summary: PowerPC GCC cannot build a little endian compile if
--with-cpu=power5 is used
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

The libgfortran and libstdc++-v3 libraries cannot be built if you build a
little endian compiler and set the default cpu to power5.  The reason appears
to be that both libraries assume that if a little endian target is being built,
IEEE 128-bit floating point is supported.  But power5 does not support the VSX
instruction set and registers, so IEEE 128-bit floating point is not
available.
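To illustrate the difference, here is a minimal C sketch that contrasts the assumption described above (little endian implies IEEE 128-bit) with a check of the compiler's own feature macro. It assumes that `__SIZEOF_FLOAT128__` is predefined exactly when the `__float128` type is usable, which is what the bug 99708 change provides on powerpc64le. The function names are hypothetical and are not taken from either library.

```c
#include <stddef.h>

/* Fragile check (the pattern described above): infer IEEE 128-bit support
   from the byte order alone.  */
static int ieee128_assumed_from_endianness (void)
{
#if defined (__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return 1;
#else
  return 0;
#endif
}

/* Feature-based check: ask the compiler whether the type exists.
   Assumption: __SIZEOF_FLOAT128__ is defined only when __float128 works.  */
static int ieee128_available (void)
{
#ifdef __SIZEOF_FLOAT128__
  return 1;
#else
  return 0;
#endif
}

/* Size of the IEEE 128-bit type when it is available, otherwise 0.  */
static size_t ieee128_size (void)
{
#ifdef __SIZEOF_FLOAT128__
  return sizeof (__float128);
#else
  return 0;
#endif
}
```

Under that assumption, a powerpc64le compiler configured with --with-cpu=power5 would still return 1 from the first function but 0 from the second. That mismatch is the source of the errors shown earlier.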

[Bug target/113652] [14 regression] Failed bootstrap on ppc unrecognized opcode: `lfiwzx' with -mcpu=7450

2024-04-12 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113652

--- Comment #23 from Michael Meissner  ---
This is one of those things where there is no right answer in part because we
need other things to flesh out the support.

The reason -mvsx was used is we need the VSX registers to build the IEEE
128-bit support in libgcc (KFmode values are passed in vector registers 32..63,
i.e. the traditional Altivec registers).

In theory, we only need Altivec, but at the time I implemented this, I felt
that it opened up the door for lots of other things breaking (due to the goofy
nature of Altivec addresses omitting the bottom bits and the lack of direct
move support), and that VSX was the minimum ISA needed.

Now, as a practical matter, it should have been power8 as a minimum (due to
direct move), but at the time I did the initial work, we were still actively
supporting power7.

Then we have the issue that while the compiler can generate code on BE systems
for IEEE 128-bit (either with software emulation or with the power9 hardware
support) glibc only supports IEEE 128-bit on 64-bit LE.

So for a user it is useless to have IEEE 128-bit on BE systems.  Somebody could
go through and do the work to enable the GLIBC support and the other parts of
the compiler/libraries that provide IEEE 128-bit, but it is not a windmill I
want to tilt at.  If somebody wants to do the work to fix all of the support
for this, go ahead.  I am not that person.

Note this is the classic catch-22 that we faced in the early days.  GCC has to
support the stuff to a minimal amount even though users can't use it, because
you need the ability to generate the code to get glibc to do the support.

In terms of the immediate problem, you have several choices:

1) Ignore it and say to the users don't do that.

2) Prevent the IEEE 128-bit libgcc bits from being built on a BE or 32-bit LE
system unless some configure switch is used.  Or just kick the can down the
road, and don't provide a configure option in GCC 14, and if people are
interested do it in GCC 15.

3) Only build the IEEE 128-bit libgcc bits if the user configured the compiler
with --with-cpu=power7, --with-cpu=power8, --with-cpu=power9,
--with-cpu=power10 (and in the future --with-cpu=power11 or --with-cpu=future).
 This could be code that if __VSX__ is not defined, the libgcc support
functions won't get built.  We would then remove the -mvsx option from the
library support functions.

Though note, there is an issue in that if you don't use a --with-cpu= configure
option, it won't build the bits.  Thus for the brave person trying to enable
IEEE 128-bit for BE, they would have to configure with one of the IBM server
platforms, while the majority of users would be using the old Apple boards,
embedded platforms, or even AIX, etc.

Note, I will be on vacation from April 16th through the 23rd, and I probably
won't bring a work laptop, which will mean I won't be able to answer email in
that period.

[Bug target/94630] General bug for changes needed to switch the powerpc64le-linux long double default

2024-04-11 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94630

Michael Meissner  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Michael Meissner  ---
All of the changes needed for making long double use the IEEE 128-bit encoding
have been fixed in GCC.  The Fedora distribution now uses long doubles with
IEEE 128-bit encodings as the default.

[Bug target/101019] GCC should consider using PLI/SLDI/PADDI to load up 64-bit constants on power10

2024-04-11 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101019

Michael Meissner  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #2 from Michael Meissner  ---
The changes for this were checked into the master branch on December 12th, 2023
by Jiufu Guo.

[Bug target/99708] __SIZEOF_FLOAT128__ not defined on powerpc64le-linux

2024-04-10 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99708

Michael Meissner  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #39 from Michael Meissner  ---
Changes were checked in on March 10th, 2022 by Jakub Jelinek to add these
defines.

[Bug libstdc++/104772] std::numeric_limits<__float128> should be specialized

2024-04-10 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104772
Bug 104772 depends on bug 99708, which changed state.

Bug 99708 Summary: __SIZEOF_FLOAT128__ not defined on powerpc64le-linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99708

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug target/110960] TestSatWidenMulPairwiseAdd in the Google Highway test suite fails when compiled with GCC 12 or later with the -mcpu=power9 option

2024-03-29 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110960

Michael Meissner  changed:

   What|Removed |Added

 CC||meissner at gcc dot gnu.org

--- Comment #12 from Michael Meissner  ---
The test case actually shows that on power8 GCC was generating incorrect code,
and power9 is actually doing the right thing.  But the test case was written
assuming the previous behavior was correct.

TL;DR: power8 generated STVX instead of STXVD2X.  Power9 generates
STXV.

To explain what the issue is, we need to go back in history.

PowerPC processors (and Power before it) were originally designed for big
endian environments.  The Altivec instruction set had limited vector save and
load instructions (STVX and LVX) which ignored the bottom 4 bits of the
address.  STVX and LVX did the correct byte swapping if the PowerPC was running
in little endian mode.

When power7 came out with the VSX instruction set, the vector save and load
instructions (STXVD2X, STXVW4X, LXVD2X, and LXVW4X) were added.  These
instructions allowed saving and loading all 64 VSX registers (32 registers that
overlapped with floating point registers, and 32 registers that overlapped with
traditional Altivec registers).  However, these instructions only store and
load values using big endian ordering.

After the power8 came out, the PowerPC Linux systems were moved from being big
endian to little endian.  This meant that after doing a vector load
instruction, we had to do explicit byte swapping, and before a vector save we
had to do the byte swapping of the value before doing the save.
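A rough model of that fix-up follows. This is my own simplification, not something stated in the report. On little endian, LXVD2X and STXVD2X move the two 64-bit doublewords in big endian element order. As far as I know, the power8 fix-up is therefore a swap of the two doublewords (the `xxpermdi` idiom, also spelled `xxswapd`). The C below models a VSX register as two `uint64_t` elements; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a VSX register holding two doublewords, indexed in
   natural little endian element order (elem[0] is at the lower address).  */
typedef struct { uint64_t elem[2]; } vsx_reg;

/* Swap the two doublewords, like the xxswapd fix-up.  */
static vsx_reg swap_doublewords (vsx_reg v)
{
  vsx_reg r = { { v.elem[1], v.elem[0] } };
  return r;
}

/* Model of LXVD2X on little endian: the doublewords arrive in big endian
   element order, i.e. swapped relative to the natural LE layout.  */
static vsx_reg model_lxvd2x_le (const uint64_t mem[2])
{
  vsx_reg r = { { mem[1], mem[0] } };
  return r;
}

/* The power8 LE sequence: load, then swap the doublewords back.  */
static vsx_reg load_then_swap (const uint64_t mem[2])
{
  return swap_doublewords (model_lxvd2x_le (mem));
}
```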

We added an optimization to GCC so that in the special case of storing/loading
temporaries on the stack, we would use the Altivec instructions STVX and LVX
and eliminate the byte swapping instructions, since we could ensure that all
temporaries were correctly aligned.  But we couldn't use STVX and LVX in
general, because these instructions ignore the bottom 4 bits of the address and
they restrict the vector registers to just the VSX registers that overlap
with the Altivec registers.
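The address behaviour of LVX and STVX can be modelled in a few lines of C, along with the reason aligned stack temporaries make them safe to use. This is a simplified sketch: it ignores the RA=0 special case in the effective address calculation, and the function names are hypothetical.

```c
#include <stdint.h>

/* Model of the LVX/STVX effective address: the bottom 4 bits are ignored,
   so the access always covers the aligned 16-byte block containing EA.  */
static uint64_t altivec_effective_address (uint64_t ra, uint64_t rb)
{
  return (ra + rb) & ~UINT64_C (15);
}

/* LVX/STVX act like an ordinary vector load/store only when the address is
   already 16-byte aligned, which GCC can guarantee for stack temporaries.  */
static int altivec_access_is_exact (uint64_t ea)
{
  return altivec_effective_address (ea, 0) == ea;
}
```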

When power9 came out, we added new vector store and load instructions (STXV,
STXVX, LXV, and LXVX) that did the correct byte swapping on little endian
systems.  GCC now generates these instructions and eliminates the special code
to use the Altivec STVX and LVX instructions.

In the test case, VerifyVecEqToActual takes 2 vector arguments, creates two
16-byte arrays, and stores each vector into its array.  It uses
reinterpret_cast to convert this into a store instruction.

However, since the temporary is on the stack, on power8 this uses the Altivec
STVX instruction and it gets byte swapped.

[Bug target/113652] [14 regression] Failed bootstrap on ppc unrecognized opcode: `lfiwzx' with -mcpu=7450

2024-03-28 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113652

--- Comment #19 from Michael Meissner  ---
When I wrote the VSX support many years ago, I intended that -mvsx enable all
of ISA 2.06, which includes ISA 2.05, etc.

My intention was that there would be 2 options for power7: one for the base
ISA 2.06 support for everything except the VSX registers (i.e. -mpopcntd), and
the other to enable the floating point support.  The reason is that the kernel
needs to be built without floating point support.

If you say -mvsx, it should include the standard power7 integer instructions
(-mpopcntd), power6 server instructions (i.e. -mhard-dfp, -mcmpb, -mrecip,
-mpowerpc-gfxopt, and -mpowerpc-gpopt), etc.

VSX support assumes it can use lfiwax and lfiwzx.

[Bug target/70928] Load simple float constants via VSX operations on PowerPC

2024-03-27 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70928

Michael Meissner  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2024-03-27
 Status|UNCONFIRMED |NEW

--- Comment #5 from Michael Meissner  ---
Power10 can now load all SFmode constants and many DFmode constants via the
XXSPLTIDP instruction.  As time goes by, I would imagine doing this
optimization for power8 and power9 machines becomes less important.
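XXSPLTIDP takes a 32-bit single-precision immediate, converts it to double precision, and splats it into both doublewords. A DFmode constant is therefore a candidate when it is exactly representable in single precision. Below is a C sketch of that test. Rejecting NaNs and single-precision denormals is an assumption made to stay conservative; it is not a statement of the exact rules GCC uses. The function name is hypothetical.

```c
#include <float.h>
#include <math.h>

/* Returns nonzero if D could plausibly be loaded with XXSPLTIDP, i.e. it
   survives a round trip through single precision unchanged.  */
static int dfmode_fits_xxspltidp (double d)
{
  if (isnan (d))
    return 0;                    /* conservative: reject NaNs */
  if (!isinf (d) && (d > FLT_MAX || d < -FLT_MAX))
    return 0;                    /* avoid an out-of-range conversion */
  float f = (float) d;
  if (fpclassify (f) == FP_SUBNORMAL)
    return 0;                    /* assumed: SP denormals are not allowed */
  return (double) f == d;
}
```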

[Bug bootstrap/31418] Bootstrap failure with -O2 -funroll-loops -funsafe-math-optimizations options on PPC

2024-03-27 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31418

Michael Meissner  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
 CC||meissner at gcc dot gnu.org

--- Comment #2 from Michael Meissner  ---
I built the current GCC 14 development compiler using -O2 -funroll-loops
-funsafe-math-optimizations, and it built fine.  I suspect it had been fixed
ages ago.

[Bug target/112886] New: We need a new print_operand output modifier for vector double

2023-12-06 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112886

Bug ID: 112886
   Summary: We need a new print_operand output modifier for vector
double
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

I've been working with vector double support to provide faster memory latency
for specialized applications.  While the work I've been doing might not make it
in GCC 14, I've been looking at what is needed to provide usable asm support
for using vector pairs.

The problem is we have %x for VSX registers that maps the traditional FPR
registers into 0..31 and the traditional Altivec registers into 32..63.  We
have %L that returns the 2nd register in a multiple register object. 
However, we don't have a combination of %x and %L, where for VSX
registers it would return the 2nd register in the vector pair as a VSX register
number.

For example, if you wanted to write a loop where you use vector pairs to load
the values and then manually process each vector, you might want to write using
%S to access the 2nd vector register:

__vector_pair *p, sum;
size_t i, n;
// ...
__asm__ ("xxspltib %x0,0\nxxspltib %S0,0" : "=wa" (sum));
for (i = 0; i < n; i++)
  __asm__ ("xvadddp %x0,%x1,%x2\n\txvadddp %S0,%S1,%S2"
           : "=wa" (sum)
           : "wa" (sum), "wa" (p[i]));

However without this new print_operand output modifier, you would have to use
either "d" or "f" to limit the registers to the traditional FPR registers. 
I.e.:

__vector_pair *p, sum;
size_t i, n;
// ...
__asm__ ("xxspltib %0,0\nxxspltib %L0,0" : "=f" (sum));
for (i = 0; i < n; i++)
  __asm__ ("xvadddp %0,%1,%2\n\txvadddp %L0,%L1,%L2"
           : "=f" (sum)
           : "f" (sum), "f" (p[i]));

If you do this, you limit the number of vector pairs that can be used to 16
instead of 32.  Generally you would want to use this in performance critical
code, and often there you are using all of the registers.

You can't just modify %L to deal with VSX registers, because the user might
be using an instruction that only accesses Altivec registers, i.e.:

__vector_pair *p, sum;
size_t i, n;
// ...
__asm__ ("vspltisw %0,0\nvspltisw %L0,0" : "=v" (sum));
for (i = 0; i < n; i++)
  __asm__ ("vadduqm %0,%1,%2\n\tvadduqm %L0,%L1,%L2"
           : "=v" (sum)
           : "v" (sum), "v" (p[i]));
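A small C model of the register numbering may make the problem clearer. It assumes, as described above, that VSX registers 0..31 overlap the FPRs and 32..63 overlap the Altivec registers. `print_x`, `print_L` and `print_S` are hypothetical helpers that only mimic what the `%x`, `%L` and proposed `%S` modifiers would print for a register operand.

```c
/* Which traditional register file an operand was allocated in.  */
enum reg_file { REG_FPR, REG_ALTIVEC };

/* %x: the VSX register number of an operand.  */
static int print_x (enum reg_file file, int regno)
{
  return file == REG_FPR ? regno : 32 + regno;
}

/* %L: the second register of a multi-register object, in the operand's own
   numbering (what an FPR-only or Altivec-only instruction expects).  */
static int print_L (int regno)
{
  return regno + 1;
}

/* Proposed %S: the second register of a vector pair as a VSX number.  */
static int print_S (enum reg_file file, int regno)
{
  return print_x (file, regno + 1);
}
```

For a pair starting at Altivec register 4, `%L` must keep printing 5 for instructions such as vadduqm. A VSX instruction needs 37 instead, so a separate modifier is required.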

[Bug target/104698] Inefficient code for DI to TI sign extend on power10

2023-10-13 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104698

Michael Meissner  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Michael Meissner  ---
The patch was committed to the GCC 13 branch on March 5th, 2022 and later
backported to GCC 12.

[Bug target/111778] PowerPC constant code change uses an undefined shift

2023-10-11 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111778

Michael Meissner  changed:

   What|Removed |Added

   Severity|normal  |major
   Priority|P3  |P2
 CC||bergner at gcc dot gnu.org,
   ||dje at gcc dot gnu.org,
   ||guojiufu at gcc dot gnu.org,
   ||meissner at gcc dot gnu.org,
   ||segher at gcc dot gnu.org
  Build||powerpc64le-unknown-linux-gnu
 Target||powerpc64le-unknown-linux-gnu
   Host||x86_64-unknown-linux-gnu

[Bug target/111778] New: PowerPC constant code change uses an undefined shift

2023-10-11 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111778

Bug ID: 111778
   Summary: PowerPC constant code change uses an undefined shift
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

I was building a cross compiler to PowerPC on my x86_64 workstation with the
latest version of GCC on October 11th.  I could not build the compiler on the
x86_64 system, as it died while building libgcc.  I looked into it and
discovered the compiler was recursing until it ran out of stack space.  If I
build a native compiler with the same sources on a PowerPC system, it builds
fine.

I traced this down to a change made around October 10th:

commit 8f1a70a4fbcc6441c70da60d4ef6db1e5635e18a (HEAD)
Author: Jiufu Guo 
Date:   Tue Jan 10 20:52:33 2023 +0800

rs6000: build constant via li/lis;rldicl/rldicr

If a constant is possible left/right cleaned on a rotated value from
a negative value of "li/lis".  Then, using "li/lis ; rldicl/rldicr"
to build the constant.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): New
function.
(can_be_built_by_li_lis_and_rldicr): New function.
(rs6000_emit_set_long_const): Call
can_be_built_by_li_lis_and_rldicr and
can_be_built_by_li_lis_and_rldicl.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.


In particular, the code is:

static bool
can_be_built_by_li_lis_and_rldicl (HOST_WIDE_INT c, int *shift,
                                   HOST_WIDE_INT *mask)
{
  /* Leading zeros may be cleaned by rldicl with a mask.  Change leading zeros
     to ones and then recheck it.  */
  int lz = clz_hwi (c);
  HOST_WIDE_INT unmask_c
    = c | (HOST_WIDE_INT_M1U << (HOST_BITS_PER_WIDE_INT - lz));
  int n;

  if (can_be_rotated_to_lowbits (~unmask_c, 15, &n)
      || can_be_rotated_to_negative_lis (unmask_c, &n))
    {
      *mask = HOST_WIDE_INT_M1U >> lz;
      *shift = n == 0 ? 0 : HOST_BITS_PER_WIDE_INT - n;
      return true;
    }

  return false;
}

In particular, if lz is 0 because the constant has its highest bit set, shifting
-1 to build the mask in unmask_c becomes a left shift by 64.

Different machines interpret num << shift differently if shift is at least the
number of bits in num's representation, and it is explicitly undefined behavior
in the C and C++ languages.

In particular (-1 << 64) on an x86_64 produces -1 and (-1 << 64) on a 64-bit
PowerPC produces 0.
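The difference comes from how each ISA treats the shift count. As far as I know, x86_64 masks a 64-bit shift count to its low 6 bits, so a count of 64 becomes 0. The PowerPC `sld` instruction instead uses 7 bits of the count and produces 0 for counts from 64 to 127. The C below models both behaviours using only well-defined shifts; the function names are hypothetical.

```c
#include <stdint.h>

/* Model of x86_64 SHL on a 64-bit operand: the count is masked to 6 bits.  */
static uint64_t x86_64_shl (uint64_t value, unsigned count)
{
  return value << (count & 63);
}

/* Model of PowerPC SLD: the count uses 7 bits; 64..127 shift everything out.  */
static uint64_t ppc64_sld (uint64_t value, unsigned count)
{
  count &= 127;
  return count < 64 ? value << count : 0;
}
```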

If I add a test for lz being 0 and returning false, the compiler builds fine.
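One way to keep the "change leading zeros to ones" step well defined is to treat lz == 0 as "nothing to set" instead of shifting by 64. This is a standalone sketch with hypothetical names that uses `uint64_t` in place of HOST_WIDE_INT. It is not the committed fix; the workaround described in the report simply returns false when lz is 0.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the top LZ bits set, for 0 <= LZ <= 64, never shifting by 64.  */
static uint64_t leading_ones_mask (int lz)
{
  assert (lz >= 0 && lz <= 64);
  if (lz == 0)
    return 0;                        /* top bit already set: nothing to add */
  return UINT64_MAX << (64 - lz);    /* shift count is 0..63 here */
}

/* Change the leading zeros of C to ones (what unmask_c is meant to hold).  */
static uint64_t set_leading_zeros_to_ones (uint64_t c)
{
  /* __builtin_clzll (0) is undefined, so handle zero separately.  */
  int lz = c == 0 ? 64 : __builtin_clzll (c);
  return c | leading_ones_mask (lz);
}
```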

One other note is the ChangeLog date is not correct for Jiufu Guo's changes
that include this change.  The several changes that were submitted list dates
of:

Tue Jan 10 21:40:48 2023 +0800
Tue Jan 10 20:52:33 2023 +0800
Thu Jun 15 21:11:53 2023 +0800
Thu Aug 24 09:08:34 2023 +0800

[Bug target/105325] power10: Error: operand out of range

2023-07-05 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

Michael Meissner  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #21 from Michael Meissner  ---
Fixed in trunk.  Backported to GCC 13, GCC 12, and GCC 11.  The bug does not
show up in GCC 10.

Closing bug.

[Bug target/103498] Spec 2017 imagick_r is 2.62% slower on Power10 with pc-relative addressing compared to not using pc-relative addressing

2023-06-01 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103498

Michael Meissner  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from Michael Meissner  ---
I just ran SPEC 2017 on a Power10 machine running RHEL 8, using GCC 13.1 and
the Advance Toolchain 15.0 library.  In that run, I see no significant (more
than 1%) regressions compared with using -mno-pcrel.  In fact, imagick_r was
nearly 2% faster using PC-relative addressing.

[Bug target/109067] Powerpc GCC does not support __ibm128 complex multiply/divide if long double is IEEE 128-bit.

2023-04-11 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109067

Michael Meissner  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Michael Meissner  ---
Trunk patched on March 20th, 2023.
GCC 12 patched on April 10th, 2023.
GCC 11 patched on April 11th, 2023.

[Bug target/70243] PowerPC V4SFmode should not use Altivec instructions on VSX systems

2023-04-05 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70243

--- Comment #5 from Michael Meissner  ---
Created attachment 54814
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54814&action=edit
Test case

This is a test case that shows the generation of vmaddfp and vnmsubfp.

[Bug target/105325] power10: Error: operand out of range

2023-03-20 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

Michael Meissner  changed:

   What|Removed |Added

   Assignee|acsawdey at gcc dot gnu.org|meissner at gcc dot gnu.org
 Status|NEW |ASSIGNED
 CC||meissner at gcc dot gnu.org

--- Comment #13 from Michael Meissner  ---
Aaron is not working on GCC any longer, so I'm taking over this bug.

[Bug target/109067] New: Powerpc GCC does not support __ibm128 complex multiply/divide if long double is IEEE 128-bit.

2023-03-08 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109067

Bug ID: 109067
   Summary: Powerpc GCC does not support __ibm128 complex
multiply/divide if long double is IEEE 128-bit.
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

[Bug target/108958] New: Powerpcle could generate mtvsrdd for zero extend DI to TI mode, when the TImode is in a vector register

2023-02-27 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108958

Bug ID: 108958
   Summary: Powerpcle could generate mtvsrdd for zero extend DI to
TI mode, when the TImode is in a vector register
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

If you have a DImode variable (i.e. long) in a GPR, and you want to zero extend
it to TImode (i.e. `__int128'), and the result is needed in a vector register,
you could just do a single `mtvsrdd' instruction, instead of separately zeroing
a GPR register and then using `mtvsrd' and `mtvsrdd' instructions.
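To make the suggestion concrete, here is my reading of the ISA 3.0 definition of `mtvsrdd XT,RA,RB`. Doubleword 0 (the high-order half) comes from RA, and an RA field of 0 means a literal zero rather than r0. Doubleword 1 comes from RB. If that reading is right, a single `mtvsrdd` with RA=0 already is the zero extension. The model below is a hypothetical sketch using GCC's `unsigned __int128`.

```c
#include <stdint.h>

/* Hypothetical model of mtvsrdd: the high doubleword comes from RA, or is
   zero when the RA field is 0; the low doubleword comes from RB.  */
static unsigned __int128 model_mtvsrdd (int ra_field, uint64_t gpr_ra,
                                        uint64_t gpr_rb)
{
  uint64_t hi = ra_field == 0 ? 0 : gpr_ra;
  return ((unsigned __int128) hi << 64) | gpr_rb;
}

/* The operation being compiled: zero extend DImode to TImode.  */
static unsigned __int128 zext_di_to_ti (uint64_t x)
{
  return x;
}
```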

[Bug middle-end/108623] We need to grow the precision field in tree_type_common for PowerPC

2023-02-01 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108623

--- Comment #7 from Michael Meissner  ---
Created attachment 54387
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54387&action=edit
Proposed patch combining Richard's patch and an assertion.

[Bug middle-end/108623] We need to grow the precision field in tree_type_common for PowerPC

2023-02-01 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108623

Michael Meissner  changed:

   What|Removed |Added

   Last reconfirmed||2023-02-01
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #6 from Michael Meissner  ---
Yes, I agree we want an assertion in sext_hwi as well.

Richard, are you going to submit the patch, or did you want me to do it (along
with the assertion)?

[Bug middle-end/108623] We need to grow the precision field in tree_type_common for PowerPC

2023-02-01 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108623

--- Comment #4 from Michael Meissner  ---
I must have missed the spare bits.  I think it is better to use the full 16
bits for precision.  I also think your other changes to realign bit fields
greater than 1 bit.

[Bug other/108623] New: We need to grow the precision field in tree_type_common for PowerPC

2023-02-01 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108623

Bug ID: 108623
   Summary: We need to grow the precision field in
tree_type_common for PowerPC
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

The patches that have been submitted to the PowerPC back end require growing
the precision field in the tree_type_common structure (in tree-core.h).
The current precision field is 10 bits, but the type size for the dense math
register is 1,024 bits.

With the current 10-bit field, the precision of TDOmode becomes 0 instead of
1,024.  I noticed the tree-ssa-ccp.cc pass was dying with test programs that
use the dense math registers when I ran the tests on a PowerPC, but they ran
fine on the x86_64.

Ultimately I tracked this down to the sext_hwi (sign extend) function in
hwint.h.

The code in hwint.h is:

static inline HOST_WIDE_INT
sext_hwi (HOST_WIDE_INT src, unsigned int prec)
{
  if (prec == HOST_BITS_PER_WIDE_INT)
    return src;
  else
#if defined (__GNUC__)
    {
      /* Take the faster path if the implementation-defined bits it's relying
         on are implemented the way we expect them to be.  Namely, conversion
         from unsigned to signed preserves bit pattern, and right shift of
         a signed value propagates the sign bit.
         We have to convert from signed to unsigned and back, because when left
         shifting signed values, any overflow is undefined behavior.  */
      gcc_checking_assert (prec < HOST_BITS_PER_WIDE_INT);
      int shift = HOST_BITS_PER_WIDE_INT - prec;
      return ((HOST_WIDE_INT) ((unsigned HOST_WIDE_INT) src << shift)) >> shift;
    }
#else
    {
      /* Fall back to the slower, well defined path otherwise.  */
      gcc_checking_assert (prec < HOST_BITS_PER_WIDE_INT);
      HOST_WIDE_INT sign_mask = HOST_WIDE_INT_1 << (prec - 1);
      HOST_WIDE_INT value_mask = (HOST_WIDE_INT_1U << prec) - HOST_WIDE_INT_1U;
      return (((src & value_mask) ^ sign_mask) - sign_mask);
    }
#endif
}

If the 'prec' argument is 0, the 'shift' variable will become 64.

In C/C++, shifting a 64-bit value left or right by 64 bits is undefined
behavior.  It turns out that x86_64 happens to return the original value if
you shift it left by 64 bits and then shift it right with sign extension.
From within the CCP pass, the original value is -1.  On the other hand,
PowerPC always returns 0.  Since it isn't returning -1, the CCP pass creates a
0 constant for TDOmode.  But since TDOmode is opaque, converting it to 0
generates an error when checking is enabled.
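To make the host difference concrete, here is a small C model of the "shift left, then arithmetic shift right" sequence. It is a sketch under stated assumptions, not GCC code, and all helper names are hypothetical. It models x86-64 SHL/SAR on 64-bit operands, which use only the low 6 bits of the shift count, and PowerPC sld/srad, which use a 7-bit count and produce 0 (sld) or a copy of the sign bit (srad) when the count is 64 or more. The model avoids the undefined C shift itself by emulating each instruction explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 64-bit pattern as signed without relying on
   implementation-defined conversions.  */
static int64_t
as_signed (uint64_t u)
{
  return u <= INT64_MAX ? (int64_t) u : -(int64_t) (UINT64_MAX - u) - 1;
}

/* Arithmetic right shift by 0..63 bits, written portably.  */
static int64_t
asr (int64_t v, unsigned int s)
{
  return v < 0 ? ~(~v >> s) : v >> s;
}

/* x86-64 model: the hardware masks the count to 6 bits, so a count of 64
   behaves like a count of 0 and SRC comes back unchanged.  */
static int64_t
sext_model_x86 (int64_t src, unsigned int prec)
{
  unsigned int shift = (64 - prec) & 63;
  return asr (as_signed ((uint64_t) src << shift), shift);
}

/* PowerPC model (sld, then srad): counts of 64..127 shift every bit out,
   giving 0 for sld and sign fill for srad.  */
static int64_t
sext_model_ppc (int64_t src, unsigned int prec)
{
  unsigned int shift = 64 - prec;
  int64_t left = shift >= 64 ? 0 : as_signed ((uint64_t) src << shift);
  if (shift >= 64)
    return left < 0 ? -1 : 0;
  return asr (left, shift);
}
```

With prec = 0 and src = -1, the x86-64 model returns -1 while the PowerPC model returns 0. For a valid precision such as 8, both models agree.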

One solution is to grow the precision field to 11 bits and reduce the
contains_placeholder_bits field to 1 bit.

Alternatively, we could grow the precision to 16 bits, which will cause all
trees to grow slightly, and keep contains_placeholder_bits at 2 bits.  I see
references to contains_placeholder_bits in tree.cc and cp/module.cc, but I
don't see any place that actually sets the field.
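The truncation and the first proposed layout can be illustrated with two stand-in structures. These are hypothetical and contain only the two fields discussed in this report. The real tree_type_common in tree-core.h has many more fields, so the sketch says nothing about the actual structure size.

```c
#include <assert.h>

/* Current layout: 10-bit precision next to a 2-bit field.  */
struct layout_current
{
  unsigned int precision : 10;
  unsigned int contains_placeholder_bits : 2;
};

/* First proposal: 11-bit precision with the placeholder field shrunk to
   1 bit, so the two fields still occupy 12 bits in total.  */
struct layout_11bit
{
  unsigned int precision : 11;
  unsigned int contains_placeholder_bits : 1;
};

/* Store PREC in each layout and read it back.  Assigning to an unsigned
   bit-field reduces the value modulo 2^width, so 1,024 becomes 0 in a
   10-bit field.  */
static unsigned int
stored_precision_current (unsigned int prec)
{
  struct layout_current t = { 0, 0 };
  t.precision = prec;
  return t.precision;
}

static unsigned int
stored_precision_11bit (unsigned int prec)
{
  struct layout_11bit t = { 0, 0 };
  t.precision = prec;
  return t.precision;
}
```

The 16-bit alternative would also hold 1,024, at the cost of 6 more bits for these two fields.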

[Bug target/93738] [10/11/12/13 regression] test case gcc.target/powerpc/20050603-3.c fails

2022-11-30 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93738

Michael Meissner  changed:

   What|Removed |Added

 CC||meissner at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2022-12-01

--- Comment #6 from Michael Meissner  ---
It is still occurring in GCC 13.  It only fails in big endian 64-bit PowerPC on
Linux.  It does not generate the extra instruction for big endian 32-bit
PowerPC on Linux or little endian 64-bit PowerPC on Linux.

From a brief glance, the extra instruction is generated in the combine phase.

[Bug testsuite/106345] Some ppc64le tests fail with -mcpu=power9 -mtune=power9

2022-08-18 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106345

Michael Meissner  changed:

   What|Removed |Added

 CC||meissner at gcc dot gnu.org

--- Comment #8 from Michael Meissner  ---
Note, the gcc.target/powerpc/pr92398.p9-.c test fails when the compiler is
configured for either --with-cpu=power9 or --with-cpu=power10.  No --with-tune=
was used in configuring either compiler.

[Bug target/106682] New: Powerpc test gcc.target/powerpc/pr86731-fwrapv-longlong.c fails on power8, passes on power9/power10

2022-08-18 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106682

Bug ID: 106682
   Summary: Powerpc test
gcc.target/powerpc/pr86731-fwrapv-longlong.c fails on
power8, passes on power9/power10
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

I was doing builds on a power10 for patch submission, and I noticed the test
pr86731-fwrapv-longlong.c fails when the target is power8, but it passes when
the target is power9 or power10.

Here is the log file from power8:
Executing on host: /home/meissner/fsf-build-ppc64le/work098-power8/gcc/xgcc
-B/home/meissner/fsf-build-ppc64le/work098-power8/gcc/ 
/home/meissner/fsf-src/work098/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
   -fdiagnostics-plain-output  -maltivec -O3 -fwrapv -mpower8-vector
-ffat-lto-objects -fno-ident -S -o pr86731-fwrapv-longlong.s(timeout = 300)
spawn -ignore SIGHUP /home/meissner/fsf-build-ppc64le/work098-power8/gcc/xgcc
-B/home/meissner/fsf-build-ppc64le/work098-power8/gcc/
/home/meissner/fsf-src/work098/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
-fdiagnostics-plain-output -maltivec -O3 -fwrapv -mpower8-vector
-ffat-lto-objects -fno-ident -S -o pr86731-fwrapv-longlong.s
PASS: gcc.target/powerpc/pr86731-fwrapv-longlong.c (test for excess errors)
PASS: gcc.target/powerpc/pr86731-fwrapv-longlong.c scan-assembler-times
\\mvspltis[bhw]\\M 0
PASS: gcc.target/powerpc/pr86731-fwrapv-longlong.c scan-assembler-times
\\mvsl[bhwd]\\M 0
gcc.target/powerpc/pr86731-fwrapv-longlong.c:
\\mp?lxv\\M|\\mlxv\\M|\\mlxvd2x\\M|\\mxxspltidp\\M found 0 times

[Bug testsuite/106681] New: Powerpc test gcc.dg/pr104992.c fails on power10

2022-08-18 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106681

Bug ID: 106681
   Summary: Powerpc test gcc.dg/pr104992.c fails on power10
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

I was doing builds on a power10 system for patch submission, and I noticed
that the test gcc.dg/pr104992.c fails when it is compiled for power10, but it
does not fail when it is compiled for power8 or power9:

Executing on host: /home/meissner/fsf-build-ppc64le/work098-if/gcc/xgcc
-B/home/meissner/fsf-build-ppc64le/work098-if/gcc/ 
/home/meissner/fsf-src/work098/gcc/testsuite/gcc.dg/pr104992.c   
-fdiagnostics-plain-output   -O2 -Wno-psabi -fdump-tree-optimized -S -o
pr104992.s(timeout = 300)
spawn -ignore SIGHUP /home/meissner/fsf-build-ppc64le/work098-if/gcc/xgcc
-B/home/meissner/fsf-build-ppc64le/work098-if/gcc/
/home/meissner/fsf-src/work098/gcc/testsuite/gcc.dg/pr104992.c
-fdiagnostics-plain-output -O2 -Wno-psabi -fdump-tree-optimized -S -o
pr104992.s
PASS: gcc.dg/pr104992.c (test for excess errors)
gcc.dg/pr104992.c: pattern found 6 times
FAIL: gcc.dg/pr104992.c scan-tree-dump-times optimized " % " 9

[Bug testsuite/106680] New: Test gcc.target/powerpc/bswap64-4.c fails on 32-bit BE

2022-08-18 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106680

Bug ID: 106680
   Summary: Test gcc.target/powerpc/bswap64-4.c fails on 32-bit BE
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

I was doing some builds for submitting patches, and I did runs on BE systems as
well as LE systems.

I noticed the test gcc.target/powerpc/bswap64-4.c fails in 32-bit mode,
because it does not generate ldbrx or stdbrx instructions.  These instructions
are not supported in 32-bit mode.  So the test has to be adjusted either to run
only on 64-bit targets, or to expect different insns when it is run on a
32-bit target.
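For reference, a 64-bit byte reversal can always be decomposed into two 32-bit byte reversals of the halves, with the halves exchanged. The C sketch below only illustrates that identity, which is why a 32-bit target can do the job without ldbrx/stdbrx. It does not claim which instructions GCC actually emits for 32-bit, which is what an adjusted test would need to match. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bytes of a 32-bit value.  */
static uint32_t
bswap32_portable (uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Reverse a 64-bit value using only 32-bit byte reversals: byte-reverse
   each half and exchange the halves.  */
static uint64_t
bswap64_via_halves (uint64_t x)
{
  uint32_t hi = (uint32_t) (x >> 32);
  uint32_t lo = (uint32_t) x;
  return ((uint64_t) bswap32_portable (lo) << 32) | bswap32_portable (hi);
}
```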

[Bug testsuite/101169] [10 regression] test case gcc.target/powerpc/fold-vec-extract-char.p7.c fails after r10-9880

2022-08-18 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101169

Michael Meissner  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 CC||meissner at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2022-08-18

--- Comment #3 from Michael Meissner  ---
The fold-vec-extract tests work fine on the development version of GCC 13 for
64-bit, but they are still failing if I run them on a BE system that supports
32-bit code generation.  It looks like the insn count may need to be adjusted
for 32-bit:

FAIL: gcc.target/powerpc/fold-vec-extract-int.p8.c scan-assembler-times
\\maddi\\M 9
FAIL: gcc.target/powerpc/fold-vec-extract-short.p7.c scan-assembler-times
\\maddi\\M|\\madd\\M 12
FAIL: gcc.target/powerpc/fold-vec-extract-short.p8.c scan-assembler-times
\\maddi\\M 9
FAIL: gcc.target/powerpc/fold-vec-extract-char.p7.c scan-assembler-times
\\maddi\\M 9
FAIL: gcc.target/powerpc/fold-vec-extract-double.p7.c scan-assembler-times
\\maddi\\M|\\madd\\M 3
FAIL: gcc.target/powerpc/fold-vec-extract-float.p7.c scan-assembler-times
\\maddi\\M|\\madd\\M 3
FAIL: gcc.target/powerpc/fold-vec-extract-float.p8.c scan-assembler-times
\\maddi\\M 2
FAIL: gcc.target/powerpc/fold-vec-extract-int.p7.c scan-assembler-times
\\maddi\\M|\\madd\\M 12

[Bug fortran/96983] [11/12 regression] ICE compiling gfortran.dg/pr96711.f90 starting with r11-3042

2022-03-17 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96983

Michael Meissner  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #41 from Michael Meissner  ---
I tried applying the fix to the GCC 10 branch, and it appears that the patch
needs more infrastructure than the branch has.  So, I'm closing the PR.  If we
need a GCC 10 backport, we can either reopen the issue or create a new bug
report.  It is fixed on the GCC 11 branch and on the master branch that will
become GCC 12.

[Bug target/104868] [12 Regression] powerpc: Compiling libgfortran with -flto failing with GCC 12

2022-03-11 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104868

--- Comment #8 from Michael Meissner  ---
Matheus, try the patch I just attached to the PR that I posted to the
gcc-patches mailing list.

[Bug target/104868] [12 Regression] powerpc: Compiling libgfortran with -flto failing with GCC 12

2022-03-11 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104868

Michael Meissner  changed:

   What|Removed |Added

 CC||meissner at gcc dot gnu.org

--- Comment #7 from Michael Meissner  ---
Created attachment 52610
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52610&action=edit
Patch to fix extendditi2 constraint on power10

[Bug lto/104868] powerpc: Compiling libgfortran with -flto failing with GCC 12

2022-03-10 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104868

--- Comment #4 from Michael Meissner  ---
Looking at it, the problem is that the conversion from DImode to TImode has
several constraint alternatives.  The alternative that matters in this case
has the output in an Altivec register while the input is in a GPR.  The
vsx_splat_v2di pattern that is called from the extendditi2
define_insn_and_split has a constraint of 'b' (i.e. disallow GPR 0), while the
extendditi2 insn has a constraint of 'r' (i.e. allow any GPR).

A simple fix is to change the constraint in extendditi2 to be 'b' instead of
'r'.

The reason GPR 0 is excluded is that the mtvsrdd instruction treats a 0 in the
RA field as a request to put zero into the upper 64 bits.
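A tiny C model of the mtvsrdd rule may help show why 'r' is too permissive. The register-file array, struct, and function names are hypothetical. The model encodes only the rule stated above: the RA field selects GPR RA for the upper doubleword, except that RA = 0 means zero. It also assumes that a splat is done by passing the same register as both RA and RB. Under that assumption, a splat from GPR 0 leaves the upper doubleword as zero.

```c
#include <assert.h>
#include <stdint.h>

/* A vector register modeled as two doublewords; dw[0] is the upper
   (most significant) doubleword in ISA numbering.  */
struct vsr
{
  uint64_t dw[2];
};

/* Hypothetical model of mtvsrdd XT,RA,RB.  */
static struct vsr
mtvsrdd_model (const uint64_t gpr[32], unsigned int ra, unsigned int rb)
{
  struct vsr xt;
  xt.dw[0] = (ra == 0) ? 0 : gpr[ra];  /* RA = 0 means "use zero".  */
  xt.dw[1] = gpr[rb];
  return xt;
}

/* Assumed splat of GPR REG into both doublewords via mtvsrdd REG,REG.  */
static struct vsr
splat_model (const uint64_t gpr[32], unsigned int reg)
{
  return mtvsrdd_model (gpr, reg, reg);
}
```

Splatting from any GPR other than 0 fills both doublewords. Splatting from GPR 0 leaves the upper doubleword zero, which is the wrong result an 'r' constraint would allow.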

[Bug target/104253] libgcc missing __floatdiif

2022-03-05 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104253

Michael Meissner  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #16 from Michael Meissner  ---
Backports done to gcc 10 and gcc 11 branches.  PR closed.

[Bug target/104698] Inefficient code for DI to TI sign extend on power10

2022-02-28 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104698

--- Comment #3 from Michael Meissner  ---
It goes beyond 'just use RTL'.

The problem is that the code only generates an Altivec instruction.  So if the
__int128_t value is in a GPR, the compiler needs to move it to the vector
registers (1 insn), do the instruction, and then move it back to the GPRs
(2 insns).

What it needs is one code path for when the __int128_t is in a GPR and another
for when it is in an Altivec register.

I have patches that I'm testing that do this (i.e. handle both GPR and
Altivec registers) to avoid having to do direct moves.

[Bug target/104698] New: Inefficient code for DI to TI sign extend on power10

2022-02-25 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104698

Bug ID: 104698
   Summary: Inefficient code for DI to TI sign extend on power10
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

On power10, signed conversion from DImode to TImode is inefficient in GCC 11
and the current GCC 12.  GCC 10 does not do this optimization.

On power10, GCC tries to generate the 'vextsd2q' instruction.  However, to
generate this instruction, it typically generates an 'mtvsrdd' instruction to
get the value into the bottom 64 bits of an Altivec register, then it does the
'vextsd2q' instruction, and finally it generates 'mfvsrd' and 'mfvsrld'
instructions to get the value back into the GPRs.

For power9, it generates a move instruction and then an arithmetic shift right
by 63 bits to fill the upper word with copies of the sign bit.

GCC should generate the following code sequences:

1) For GPR register to GPR register: Move register, and 'sradi' to create the
sign bits in the upper word.

2) For GPR register to VSX register to Altivec register: Splat the value to
fill the bottom 64 bits, and then do 'vextsd2q'.

3) For memory to GPR register, load the value into the low register, and fill
the high register with the sign bit.

4) For memory to Altivec register, load the value with load VSX vector
rightmost doubleword, and then do 'vextsd2q'.
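Sequence 1 (GPR to GPR) amounts to keeping the doubleword in the low register and filling the high register with copies of the sign bit, which is what an arithmetic shift right by 63 ('sradi hi,lo,63') computes. A plain-C sketch of that computation follows. The register-pair type and function name are hypothetical, and which physical register holds each half is left out.

```c
#include <assert.h>
#include <stdint.h>

/* A TImode value held in a pair of 64-bit GPRs.  */
struct gpr_pair
{
  uint64_t hi;
  uint64_t lo;
};

/* Sign-extend a 64-bit value to 128 bits: copy it into the low register,
   then set every bit of the high register to bit 63 of the input (the
   effect of an arithmetic shift right by 63).  */
static struct gpr_pair
extend_di_to_ti (int64_t x)
{
  struct gpr_pair r;
  r.lo = (uint64_t) x;
  r.hi = (x < 0) ? UINT64_MAX : 0;
  return r;
}
```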

[Bug target/104335] [12 regression] build failure if go is included in languages after r12-6747

2022-02-23 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104335

Michael Meissner  changed:

   What|Removed |Added

 CC||asolokha at gmx dot com

--- Comment #10 from Michael Meissner  ---
*** Bug 104256 has been marked as a duplicate of this bug. ***

[Bug target/104256] ICE in validate_condition_mode, at config/rs6000/rs6000.cc:11354

2022-02-23 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104256

Michael Meissner  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|ASSIGNED|RESOLVED

--- Comment #2 from Michael Meissner  ---
The fix for 104335 also fixes PR target/104256.

*** This bug has been marked as a duplicate of bug 104335 ***

[Bug target/104256] ICE in validate_condition_mode, at config/rs6000/rs6000.cc:11354

2022-02-17 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104256

--- Comment #1 from Michael Meissner  ---
Created attachment 52463
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52463&action=edit
Proposed patch

[Bug target/104256] ICE in validate_condition_mode, at config/rs6000/rs6000.cc:11354

2022-02-17 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104256

Michael Meissner  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |meissner at gcc dot gnu.org
   Last reconfirmed||2022-02-17
 Ever confirmed|0   |1
 CC||meissner at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED

[Bug target/99197] Built-ins for packing/unpacking __ibm128 not documented

2022-02-14 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99197

Michael Meissner  changed:

   What|Removed |Added

 CC||meissner at gcc dot gnu.org
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Michael Meissner  ---
These functions have been documented since June 2018.  The file extend.texi
contains:

The @code{__builtin_unpack_ibm128} function takes a @code{__ibm128}
argument and a compile time constant of 0 or 1.  If the constant is 0,
the first @code{double} within the @code{__ibm128} is returned,
otherwise the second @code{double} is returned.

The @code{__builtin_pack_ibm128} function takes two @code{double}
arguments and returns a @code{__ibm128} value that combines the two
arguments.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-02-08 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #31 from Michael Meissner  ---
Created attachment 52383
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52383&action=edit
Simpler patch to fix the problem with power8-fusion.

This patch just ignores the -mpower8-fusion option in the callee if the caller
does not have it set, and the option wasn't set explicitly.
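The idea can be sketched as a flag-subset check. The flag names, bit values, and function below are hypothetical stand-ins and are not the rs6000 back end's actual code. A callee may be inlined if its ISA flags are a subset of the caller's. The one exception is that the power8-fusion bit is dropped from the callee's flags when the caller lacks it and the user did not set it explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ISA flag bits.  */
#define FLAG_POWER10    (UINT32_C (1) << 0)
#define FLAG_P8_FUSION  (UINT32_C (1) << 1)

/* Return true if a callee compiled with CALLEE_FLAGS may be inlined into a
   caller compiled with CALLER_FLAGS.  EXPLICIT_FLAGS records which options
   were set explicitly for the callee.  */
static bool
can_inline_sketch (uint32_t caller_flags, uint32_t callee_flags,
                   uint32_t explicit_flags)
{
  /* Ignore an implicitly enabled power8-fusion bit that the caller lacks.  */
  if ((callee_flags & FLAG_P8_FUSION)
      && !(caller_flags & FLAG_P8_FUSION)
      && !(explicit_flags & FLAG_P8_FUSION))
    callee_flags &= ~FLAG_P8_FUSION;

  /* The callee must not need anything the caller does not provide.  */
  return (callee_flags & ~caller_flags) == 0;
}
```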

[Bug target/104253] libgcc missing __floatdiif

2022-01-31 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104253

--- Comment #11 from Michael Meissner  ---
The patch has been posted; I'm awaiting approval.
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589469.html

BTW, the copy_to_mode_reg bug I mentioned earlier goes away with the patch.

[Bug target/104253] libgcc missing __floatdiif

2022-01-28 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104253

Michael Meissner  changed:

   What|Removed |Added

  Attachment #52306|0   |1
is obsolete||

--- Comment #9 from Michael Meissner  ---
Created attachment 52312
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52312&action=edit
Replacement patch for fixing the integer <-> __ibm128 conversions if long
double is IEEE 128-bit.

Replacement for patch #2 that only modifies the names that are used.

[Bug target/104253] libgcc missing __floatdiif

2022-01-28 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104253

--- Comment #8 from Michael Meissner  ---
Yes, you are right.  I didn't remember which functions were generated by the
compiler, but I just did all of the conversion functions.

[Bug target/104124] Poor optimization for vector splat DW with small consts

2022-01-27 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124

--- Comment #3 from Michael Meissner  ---
There are two things going on.

1) There is no vspltisd instruction, so we can't generate a single instruction
to load constants other than 0 or -1.  Unfortunately, this instruction was not
added in either power9 or power10.

2) On power9 and power10 we have the xxspltib and vextsb2d instructions, and
we generate those if -mcpu=power9 is used.
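As a rough sketch of the two cases just described, the function below classifies a V2DI splat constant by how many instructions the text implies it needs. It is hypothetical and ignores everything else easy_altivec_constant considers. Under its rules, 0 and -1 take one instruction. Other values that fit in a signed byte take two on power9 and later: xxspltib of the byte, then vextsb2d to sign-extend it into each doubleword. Anything else is simply not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return the number of instructions needed for the V2DI constant {E0, E1}
   under the rules sketched above, or 0 if these rules do not cover it.
   IS_POWER9 says whether xxspltib/vextsb2d are available.  */
static int
v2di_splat_cost (int64_t e0, int64_t e1, bool is_power9)
{
  if (e0 != e1)
    return 0;                 /* Not a splat.  */
  if (e0 == 0 || e0 == -1)
    return 1;                 /* All zeros or all ones.  */
  if (is_power9 && e0 >= -128 && e0 <= 127)
    return 2;                 /* xxspltib + vextsb2d.  */
  return 0;
}
```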

To add support for new types of constants, the procedure is:

1) You need to modify easy_altivec_constant and gen_easy_altivec_constant in
rs6000.c (or rs6000.cc in GCC 12).  Then add new predicates in predicates.md
for these new patterns.

2) Look for the predicates "easy_vector_constant_add_self" and so forth in
predicates.md and add a new predicate there.

3) Then in altivec.md, look for the define_splits that use the various
easy_vector_const_* functions and add a new pattern.

[Bug libgcc/104253] libgcc missing __floatdiif

2022-01-27 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104253

Michael Meissner  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |meissner at gcc dot gnu.org

--- Comment #5 from Michael Meissner  ---
The other issue that I mentioned in note #2 is likely a different issue when
-mabi=ibmlongdouble is used.  I didn't have the patch to automatically use IEEE
128-bit if the compiler used to build stage1 also used IEEE 128-bit.

[Bug libgcc/104253] libgcc missing __floatdiif

2022-01-27 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104253

--- Comment #4 from Michael Meissner  ---
Created attachment 52306
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52306&action=edit
Patch to use the correct names for __ibm128 converts if long double is IEEE
128-bit

The problem was that internally there are 3 modes for 128-bit floating point:
TFmode -- mode for the type long double
IFmode -- mode for __ibm128 if long double is IEEE 128-bit
KFmode -- mode for __float128

No conversion function was specified to convert between IFmode and other
modes, so the machine-independent portion of the compiler created a name with
'if' in it (such as __floatdiif).

This patch specifies the names for the conversion functions to use the
traditional TF modes.
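The naming problem can be sketched in C. The helpers below are hypothetical and simplified, and they are not GCC's actual libfunc-naming code. They build a conversion routine name from an operation and two lowercased mode names. An optional flag maps IFmode to the "tf" spelling, which is the idea behind this patch. Without that mapping, a DImode-to-IFmode conversion produces the name in this bug's title.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Lowercase a mode name such as "DI" into OUT (at most 7 characters).
   When MAP_IF_TO_TF is set, "IF" is spelled as "TF" first.  */
static void
mode_suffix (char out[8], const char *mode, bool map_if_to_tf)
{
  if (map_if_to_tf && strcmp (mode, "IF") == 0)
    mode = "TF";
  size_t i = 0;
  for (; i < 7 && mode[i] != '\0'; i++)
    out[i] = (char) tolower ((unsigned char) mode[i]);
  out[i] = '\0';
}

/* Build "__<op><from><to>" into BUF, e.g. "__floatdiif".  */
static void
conversion_libfunc_name (char *buf, size_t size, const char *op,
                         const char *from_mode, const char *to_mode,
                         bool map_if_to_tf)
{
  char from[8], to[8];
  mode_suffix (from, from_mode, map_if_to_tf);
  mode_suffix (to, to_mode, map_if_to_tf);
  snprintf (buf, size, "__%s%s%s", op, from, to);
}
```

With the mapping disabled, ("float", "DI", "IF") yields __floatdiif. With the mapping enabled, the same call yields __floatditf.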

[Bug libgcc/104253] libgcc missing __floatdiif

2022-01-26 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104253

Michael Meissner  changed:

   What|Removed |Added

   Last reconfirmed||2022-01-26
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Michael Meissner  ---
In addition to __floatdiif not being in libgcc, if you build a recent trunk, it
exposes a second issue:

-ltcden3-lp5-> /home/meissner/fsf-install-ppc64le/trunk/bin/gcc -O2 pr-104253.c
during RTL pass: expand
pr-104253.c: In function ‘main’:
pr-104253.c:8:9: internal compiler error: in copy_to_mode_reg, at explow.cc:652
8 | printf("%a %a\n",__builtin_unpack_ibm128(i,0),__builtin_unpack_ibm128(i,1));
  |
^~~
0x10408cab copy_to_mode_reg(machine_mode, rtx_def*)
/home/meissner/fsf-src/trunk/gcc/explow.cc:652
0x10f28837 rs6000_expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
/home/meissner/fsf-src/trunk/gcc/config/rs6000/rs6000-call.cc:5834
0x1043811f expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
/home/meissner/fsf-src/trunk/gcc/expr.cc:11536
0x10446e7b store_expr(tree_node*, rtx_def*, int, bool, bool)
/home/meissner/fsf-src/trunk/gcc/expr.cc:6087
0x1044af77 expand_assignment(tree_node*, tree_node*, bool)
/home/meissner/fsf-src/trunk/gcc/expr.cc:5819
0x10285cbb expand_call_stmt
/home/meissner/fsf-src/trunk/gcc/cfgexpand.cc:2829
0x10285cbb expand_gimple_stmt_1
/home/meissner/fsf-src/trunk/gcc/cfgexpand.cc:3864
0x10285cbb expand_gimple_stmt
/home/meissner/fsf-src/trunk/gcc/cfgexpand.cc:4028
0x1028dd93 expand_gimple_basic_block
/home/meissner/fsf-src/trunk/gcc/cfgexpand.cc:6069
0x1028ff27 execute
/home/meissner/fsf-src/trunk/gcc/cfgexpand.cc:6795
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug testsuite/103763] [12 regression] gcc.target/powerpc/fold-vec-splat-floatdouble.c fails after r12-5988

2022-01-21 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103763

Michael Meissner  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #3 from Michael Meissner  ---
Fixed in commit abe3a4f0e9c461789b689e78d6116b1efffc1b5b on January 21, 2022.

[Bug target/104136] Gcc cannot compile wrf_r for power10 using -Ofast

2022-01-21 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104136

Michael Meissner  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #6 from Michael Meissner  ---
Fixed as per previous message.

[Bug target/104136] Gcc cannot compile wrf_r for power10 using -Ofast

2022-01-21 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104136

--- Comment #5 from Michael Meissner  ---
Fixed in commit f9063d12633c62a089115df032a19295854d8b06 on January 21, 2022.

[Bug target/104136] Gcc cannot compile wrf_r for power10 using -Ofast

2022-01-21 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104136

Michael Meissner  changed:

   What|Removed |Added

  Attachment #52246|0   |1
is obsolete||

--- Comment #3 from Michael Meissner  ---
Created attachment 52262
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52262&action=edit
Updated patch to make xxspltiw/xxspltidp set the prefixed attribute

Replacement patch.  Submitted to gcc-patches:
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589052.html

[Bug target/104136] Gcc cannot compile wrf_r for power10 using -Ofast

2022-01-20 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104136

Michael Meissner  changed:

   What|Removed |Added

  Attachment #52244|0   |1
is obsolete||

--- Comment #2 from Michael Meissner  ---
Created attachment 52246
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52246&action=edit
Replacement patch to set prefixed attribute.

This patch explicitly sets the prefixed attribute for the xxspltiw and
xxspltidp instructions instead of modifying maybe_prefixed.

[Bug target/104136] Gcc cannot compile wrf_r for power10 using -Ofast

2022-01-20 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104136

--- Comment #1 from Michael Meissner  ---
Created attachment 52244
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52244&action=edit
Patch to mark XXSPLTIW and XXSPLTIDP as possibly being prefixed

If you compile module_advect_em.F90 with -Ofast -mcpu=power10, one function
is large enough that a single conditional branch cannot reach its target.
Instead, we have to reverse the condition and do a conditional branch around
an unconditional branch.  It turns out that when xxspltiw and xxspltidp
instructions were generated, they were not marked as being prefixed (i.e.
length of 12 bytes instead of 4 bytes).  This meant the branch length
calculations were off, which in turn meant the assembler raised an error
because the conditional branch target was out of range.
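The following is a back-of-the-envelope model of the failure, using made-up instruction counts rather than measurements from wrf_r. A PowerPC conditional branch encodes a signed 14-bit word displacement, so it reaches only -32,768 to +32,764 bytes. Suppose each splat is costed at 4 bytes instead of GCC's 12-byte length for prefixed instructions. The estimated distance can then land inside that range while the real code, where each splat occupies 8 bytes, does not. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Byte range reachable by a conditional branch (bc): a signed 14-bit
   displacement counted in 4-byte words.  */
static bool
bc_in_range (long distance)
{
  return distance >= -32768 && distance <= 32764 && distance % 4 == 0;
}

/* Estimated branch distance across NORMAL 4-byte instructions and SPLATS
   xxspltiw/xxspltidp instructions, each costed at SPLAT_LEN bytes.  */
static long
estimated_distance (long normal, long splats, long splat_len)
{
  return normal * 4 + splats * splat_len;
}
```

With a hypothetical 8,000 ordinary instructions and 100 splats between the branch and its label, 4-byte costing gives 32,400 bytes, so the branch looks reachable. Costing each splat at 12 bytes gives 33,200, which tells GCC to reverse the condition around an unconditional branch. The actual distance is at least 32,800 bytes, which is out of range.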

The fix is to set the maybe_prefixed attribute so that insns with the type
'vecperm' might be prefixed.  Then, in the code that optionally puts a 'p' in
front of the insn, skip doing so for these permutes (i.e. the load-constant
splat instructions).

[Bug target/104136] Gcc cannot compile wrf_r for power10 using -Ofast

2022-01-19 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104136

Michael Meissner  changed:

   What|Removed |Added

   Priority|P3  |P1
   Severity|normal  |critical
   Host||powerpc64le-unknown-linux-gnu
   Assignee|unassigned at gcc dot gnu.org  |meissner at gcc dot gnu.org
 Target||powerpc64le-unknown-linux-gnu
   Last reconfirmed||2022-01-20
  Build||powerpc64le-unknown-linux-gnu
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
 CC||bergner at gcc dot gnu.org,
   ||meissner at gcc dot gnu.org,
   ||segher at gcc dot gnu.org,
   ||wschmidt at gcc dot gnu.org

[Bug target/104136] New: Gcc cannot compile wrf_r for power10 using -Ofast

2022-01-19 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104136

Bug ID: 104136
   Summary: Gcc cannot compile wrf_r for power10 using -Ofast
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

Using the current trunk compiler (from January 18th, 2022), I cannot compile
the module_advect_em Fortran module with either -Ofast or -O3 using my normal
SPEC build options.  The reason is that GCC generates a conditional jump
instruction whose label is too far away.  This means the length insn attribute
is incorrect for one or more instructions, and GCC believes it does not have
to reverse the conditional jump.

If I disable the generation of vector constants using the XXSPLTIW instruction
via the -mno-splat-word-constant option, the module compiles fine.  Enabling
or disabling the XXSPLTIDP instruction with -mno-splat-float-constant does not
affect whether the file can be compiled; only disabling XXSPLTIW does.

I used the following options to build the module:
-g -Ofast -mcpu=power10 -finline-arg-packing \
-static-libgfortran -fstack-arrays -std=legacy \
-frandom-seed=spec2017 -fconvert=big-endian \
-fno-range-check -fcray-pointer

With those options, there are 646 XXSPLTIW instructions generated and 558
XXSPLTIDP instructions generated.  The size of the
__module_advect_em_MOD_advect_scalar function is 335,440 bytes.

[Bug testsuite/102935] [12 regression] new test case gcc.target/powerpc/pr101384-1.c fails

2022-01-12 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102935

Michael Meissner  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #5 from Michael Meissner  ---
Patch committed on January 12th, 2022.

[Bug testsuite/102935] [12 regression] new test case gcc.target/powerpc/pr101384-1.c fails

2022-01-07 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102935

Michael Meissner  changed:

   What|Removed |Added

  Attachment #52143|0   |1
is obsolete||

--- Comment #3 from Michael Meissner  ---
Created attachment 52144
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52144&action=edit
Patch to fix the code generation test for power9 and power10.

The previous patch posted was the wrong patch.

This patch fixes the regexp in pr101384-1.c to also accept XXSPLTIB, which
power9 and power10 use to set a vector register to all 1's, instead of the
VSPLTIS{B,H,W} instructions that power8 generates.

[Bug testsuite/102935] [12 regression] new test case gcc.target/powerpc/pr101384-1.c fails

2022-01-07 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102935

--- Comment #2 from Michael Meissner  ---
Created attachment 52143
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52143&action=edit
Patch to update code generation test

The test wants to load all 1's into a vector register.  On power8 it uses
VSPLTIS{B,H,W} to load up the value.  On power9 and power10, it uses XXSPLTIB
to load up the value.  This patch checks for XXSPLTIB being generated.

[Bug testsuite/102935] [12 regression] new test case gcc.target/powerpc/pr101384-1.c fails

2022-01-07 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102935

Michael Meissner  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2022-01-07
 CC||dje at gcc dot gnu.org,
   ||meissner at gcc dot gnu.org,
   ||segher at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |meissner at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Michael Meissner  ---
The reason it fails on power9 and power10 is that the test expects GCC to use
vspltiw to load all 1's into the vector register, as it does on power8.  On
power9 and power10, GCC uses xxspltib to load it instead.

[Bug testsuite/103763] [12 regression] gcc.target/powerpc/fold-vec-splat-floatdouble.c fails after r12-5988

2022-01-07 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103763

--- Comment #1 from Michael Meissner  ---
Created attachment 52141
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52141&action=edit
Patch to fix the insn count

Update the insn regex for power10.

[Bug testsuite/103763] [12 regression] gcc.target/powerpc/fold-vec-splat-floatdouble.c fails after r12-5988

2022-01-07 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103763

Michael Meissner  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2022-01-07
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |meissner at gcc dot gnu.org

[Bug target/103498] New: Spec 2017 imagick_r is 2.62% slower on Power10 with pc-relative addressing compared to not using pc-relative addressing

2021-11-30 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103498

Bug ID: 103498
   Summary: Spec 2017 imagick_r is 2.62% slower on Power10 with
pc-relative addressing compared to not using
pc-relative addressing
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

I was doing some Spec 2017 rate runs on a single power10 little endian 64-bit
CPU.  One of the runs disabled PC-relative addressing.  One benchmark
(imagick_r) was 2.62% slower with PC-relative addressing than without it.

For this run I used the options:
-DSPEC \
-DNDEBUG \
-I. \
-DSPEC_AUTO_SUPPRESS_OPENMP \
-g \
-save-temps=obj \
-Ofast \
-mcpu=power10 \
-mrecip \
-funroll-loops \
-msave-toc-indirect \
-mno-pcrel \
-fgnu89-inline \
-Wno-multichar \
-DSPEC_LP64 \
-frandom-seed=spec2017

[Bug target/99921] PowerPC xxeval has the wrong predicates

2021-11-30 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99921

Michael Meissner  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Michael Meissner  ---
Fixed on August 13th on the trunk.

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2021-11-30 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 103320, which changed state.

Bug 103320 Summary: 12 Regression] Spec 2017 benchmark roms_r fails on PowerPC 
for -Ofast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103320

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |WONTFIX

[Bug target/103320] 12 Regression] Spec 2017 benchmark roms_r fails on PowerPC for -Ofast

2021-11-30 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103320

Michael Meissner  changed:

   What|Removed |Added

 Resolution|--- |WONTFIX
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Michael Meissner  ---
Note that roms_r is not compatible with -Ofast or -ffast-math unless you also
use the -fno-unsafe-math-optimizations option.  I'm going to close the bug
since I've adjusted my scripts to add that option for roms_r (and for
perlbench_r, which also needs it).
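The C sketch below shows one general way -Ofast can change results. It illustrates the hazard in general and does not analyze what actually goes wrong in roms_r.

* -ffast-math (implied by -Ofast) enables -funsafe-math-optimizations.
* That option in turn enables -fassociative-math, which lets GCC reassociate floating point expressions.

The two functions compute the same sum in different orders.

```c
/* Sum in source order: (a + b) + c.  */
double sum_in_order (double a, double b, double c)
{
  return (a + b) + c;
}

/* The same values summed as a + (b + c).  With -fassociative-math
   (enabled by -funsafe-math-optimizations, which -ffast-math and -Ofast
   turn on) GCC may rewrite one form into the other.  */
double sum_reassociated (double a, double b, double c)
{
  return a + (b + c);
}
```

Take a = 1e20, b = -1e20 and c = 1.0. Under strict IEEE double arithmetic, sum_in_order returns 1.0 and sum_reassociated returns 0.0, because adding 1.0 to -1e20 is lost to rounding.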

[Bug libstdc++/103387] powerpc64le: segmentation fault on std::cout with ieee128 long double variable

2021-11-23 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103387

Michael Meissner  changed:

   What|Removed |Added

   Severity|normal  |major
   Last reconfirmed||2021-11-23
   Priority|P3  |P1
 CC||meissner at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from Michael Meissner  ---
I tried it on a current trunk compiler (from November 23, 2021) using glibc
2.34 (IBM AT 14.0), and it does fail.  It works fine if I build a toolchain
where the default long double is IEEE 128-bit.

[Bug tree-optimization/103317] Spec 2017 benchmark blender_r fails with -Ofast on PowerPc (power9, power10)

2021-11-18 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103317

Michael Meissner  changed:

   What|Removed |Added

   Priority|P2  |P1

[Bug regression/103318] Spec 2017 benchmark perlbench_r fails on PowerPC for -Ofast and -O3, passes with -O2

2021-11-18 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103318

Michael Meissner  changed:

   What|Removed |Added

   Priority|P2  |P1

[Bug regression/103320] Spec 2017 benchmark roms_r fails on PowerPC for -Ofast

2021-11-18 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103320

Michael Meissner  changed:

   What|Removed |Added

   Priority|P2  |P1

[Bug regression/103320] Spec 2017 benchmark roms_r fails on PowerPC for -Ofast

2021-11-18 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103320

Michael Meissner  changed:

   What|Removed |Added

 CC||bergner at gcc dot gnu.org,
   ||dje at gcc dot gnu.org,
   ||meissner at gcc dot gnu.org,
   ||segher at gcc dot gnu.org,
   ||wschmidt at gcc dot gnu.org
   Severity|normal  |major
   Priority|P3  |P2

[Bug regression/103320] New: Spec 2017 benchmark roms_r fails on PowerPC for -Ofast

2021-11-18 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103320

Bug ID: 103320
   Summary: Spec 2017 benchmark roms_r fails on PowerPC for -Ofast
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: regression
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

The Spec 2017 benchmark roms_r compiles fine but produces the wrong output when
compiled with -Ofast on both power9 and power10.  Going back through my
previous runs on power10, it worked with sources checked out on October 17th,
but failed with sources checked out on November 4th and November 17th.

I used the options:
-g -mlittle -save-temps=obj -ffast-math -Ofast -mcpu=power10 -mrecip \
-funroll-loops -m64 -finline-arg-packing -static-libgfortran \
-fstack-arrays -std=legacy -frandom-seed=spec2017

[Bug regression/103318] Spec 2017 benchmark perlbench_r fails on PowerPC for -Ofast and -O3, passes with -O2

2021-11-18 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103318

Michael Meissner  changed:

   What|Removed |Added

   Priority|P3  |P2
   Severity|normal  |major
 CC||bergner at gcc dot gnu.org,
   ||dje at gcc dot gnu.org,
   ||meissner at gcc dot gnu.org,
   ||segher at gcc dot gnu.org,
   ||wschmidt at gcc dot gnu.org

[Bug regression/103318] New: Spec 2017 benchmark perlbench_r fails on PowerPC for -Ofast and -O3, passes with -O2

2021-11-18 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103318

Bug ID: 103318
   Summary: Spec 2017 benchmark perlbench_r fails on PowerPC for
-Ofast and -O3, passes with -O2
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: regression
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

The Spec 2017 benchmark perlbench_r compiles fine but produces the wrong output
when compiled with -O3 or -Ofast on both power9 and power10.  Using -O2
produces the right results on power10.  Going back through my previous runs on
power10, it worked with sources checked out on October 17th, but failed with
sources checked out on November 4th and November 17th.

For power10, the options that I use are:
-g -mlittle -save-temps=obj -ffast-math -Ofast -mcpu=power10 -mrecip \
-funroll-loops -m64  -fgnu89-inline -Wno-multichar -DSPEC_LP64 \
-frandom-seed=spec2017 -D_XOPEN_SOURCE=500 -DSPEC_LINUX_PPC_LE \
-fno-strict-aliasing -std=gnu11 -Wno-builtin-macro-redefined -w

[Bug tree-optimization/103317] Spec 2017 benchmark blender_r fails with -Ofast on PowerPc (power9, power10)

2021-11-18 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103317

Michael Meissner  changed:

   What|Removed |Added

   Severity|normal  |major
   Priority|P3  |P2

[Bug tree-optimization/103317] New: Spec 2017 benchmark blender_r fails with -Ofast on PowerPc (power9, power10)

2021-11-18 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103317

Bug ID: 103317
   Summary: Spec 2017 benchmark blender_r fails with -Ofast on
PowerPc (power9, power10)
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

Created attachment 51832
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51832&action=edit
Preprocessed .i file that shows the bug (gzipped).

As of November 17th, the current trunk compiler for GCC 12 fails when compiling
the Spec 2017 benchmark blender_r for PowerPC using -Ofast.  I see failures for
both power9 and power10 systems.  Here is the traceback:

libpng/png.c: In function 'png_build_gamma_table':
libpng/png.c:2872:1: error: definition in block 19 does not dominate use in block 18
 2872 | }
      | ^
for SSA_NAME: _57 in statement:
shift_48 = PHI <_57(18), _58(19)>
PHI argument
_57
for PHI node
shift_48 = PHI <_57(18), _58(19)>
during GIMPLE pass: phiopt
libpng/png.c:2872:1: internal compiler error: verify_ssa failed
0x10c5f64b verify_ssa(bool, bool)
	/home/meissner/fsf-src/trunk-2021-11-17/gcc/tree-ssa.c:1211
0x1080e97f execute_function_todo
	/home/meissner/fsf-src/trunk-2021-11-17/gcc/passes.c:2049
0x1080f613 execute_todo
	/home/meissner/fsf-src/trunk-2021-11-17/gcc/passes.c:2096
Please submit a full bug report,
with preprocessed source if appropriate.

I see it with -O2, -O3, and -Ofast, using -mcpu=power8, -mcpu=power9, and
-mcpu=power10 on PowerPC little endian systems. It did compile with the
compiler I checked out on November 4th.

I will attach the .i file compressed with gzip.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-26 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

Michael Meissner  changed:

   What|Removed |Added

 CC||meissner at gcc dot gnu.org

--- Comment #19 from Michael Meissner  ---
The main power8 fusion that GCC does is combining:

addis       rtmp,r2,symbol@ha
ld/lbz/lwz  rx,symbol@l(rtmp)

into:

addis       rx,r2,symbol@ha
ld/lbz/lwz  rx,symbol@l(rx)

This fusion is listed as one of the fusion types in the power10 documents.  The
fusion type is wideimmediate.  Note, when you are compiling for -mcpu=power10,
this fusion case doesn't often get used because we use PC-relative loads.  But
the machine does support it.
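The following C sketch shows how the offset is split for an addis + D-form load pair. The helper names are hypothetical. The load sign-extends its 16-bit displacement, so the addis immediate is normally the "high adjusted" half (@ha) rather than the raw high half. The @ha value is rounded up by one when bit 15 of the offset is set, so that (ha << 16) + lo reproduces the original offset.

```c
#include <stdint.h>

/* Hypothetical helper: split a 32-bit offset into the @ha and @l parts
   used by an addis + D-form load pair.  */
struct ha_lo
{
  int32_t ha;   /* addis immediate (the insn shifts it left by 16) */
  int32_t lo;   /* 16-bit signed displacement of ld/lbz/lwz */
};

struct ha_lo split_ha_lo (int32_t offset)
{
  struct ha_lo r;
  /* Low half, sign-extended the way the D-form load will use it.  */
  int32_t low = (int32_t) ((uint32_t) offset & 0xffffu);
  r.lo = low >= 0x8000 ? low - 0x10000 : low;
  /* High half adjusted for the sign extension of the low half; the
     subtraction leaves an exact multiple of 65536.  */
  r.ha = (int32_t) (((int64_t) offset - r.lo) / 65536);
  return r;
}

/* What the fused pair computes: (ha << 16) + lo.  */
int64_t recombine_ha_lo (struct ha_lo r)
{
  return (int64_t) r.ha * 65536 + r.lo;
}
```

For example, an offset of 0x12348000 splits into ha = 0x1235 and lo = -32768, since 0x12350000 - 0x8000 = 0x12348000.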

In addition, it combines a load into a traditional floating point register
followed by a move to a traditional Altivec register.  Similarly, it combines a
move from a traditional Altivec register to a traditional floating point
register followed by a store:

lfd   fy,32(rx)          xxlor  fy,vsrx,vsrx
xxlor vsrz,fy,fy         stfd   fy,32(rz)

into:

li    rtmp,32            li     rtmp,32
lxsdx vsrz,rx,rtmp       stxsdx vsrx,rz,rtmp

Now on power9 and power10, this sequence is not generated because we have the
lxsd and stxsd instructions (and plxsd/pstxsd in power10).

So I suspect we may want to move the p8 load fusion support to fusion.md and
enable it for power10 as well.  Aaron Sawdey may have other thoughts, since he
has been working on the power10 fusion support and knows more about what is
actually implemented in current hardware.

Then for inlining, we may want to exclude p8_fusion and p10_fusion in the
comparison in rs6000_can_inline_p, since these are optimizations that don't
affect the instructions generated.
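Here is a minimal C sketch of the suggested inlining check. It uses hypothetical flag names and a simplified "callee flags must be a subset of caller flags" test, not the actual rs6000_can_inline_p code. The fusion flags are masked off before the comparison. As a result, a callee compiled with p10 fusion enabled can still be inlined into a caller that does not enable it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ISA flag bits; GCC's real OPTION_MASK_* values differ.  */
#define ISA_VSX         (UINT32_C (1) << 0)
#define ISA_P9_VECTOR   (UINT32_C (1) << 1)
#define ISA_POWER10     (UINT32_C (1) << 2)
#define ISA_P8_FUSION   (UINT32_C (1) << 3)  /* fusion optimization */
#define ISA_P10_FUSION  (UINT32_C (1) << 4)  /* fusion optimization */

/* Flags treated as optimizations that do not change which
   instructions may be used, so they are ignored for inlining.  */
#define FUSION_FLAGS (ISA_P8_FUSION | ISA_P10_FUSION)

/* The callee can be inlined if every remaining ISA flag it uses is also
   enabled in the caller.  */
bool can_inline_p (uint32_t caller_flags, uint32_t callee_flags)
{
  uint32_t caller = caller_flags & ~FUSION_FLAGS;
  uint32_t callee = callee_flags & ~FUSION_FLAGS;
  return (caller & callee) == callee;
}
```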

Note that there was so-called power9 fusion code that was originally in the
power9 spec but was not implemented in the hardware.  I removed support for it
in November 2018.

[Bug testsuite/100166] Some vold-vec-{load,store} tests fail when built with compiler configured with --with-cpu=power10

2021-08-04 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100166

Michael Meissner  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Michael Meissner  ---
Fixed with July 13th commit.

[Bug testsuite/100167] GCC configured for power10 fails the gcc.target/powerpc/fold-vec-div-longlong.c test

2021-08-04 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100167

Michael Meissner  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #3 from Michael Meissner  ---
Fixed with July 20th commit.

[Bug testsuite/100170] Gcc tests gcc.target/powerpc/ppc-{eq, ne}0-1.c fail on Power10

2021-08-04 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100170

Michael Meissner  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Michael Meissner  ---
Fixed with July 26th checkin.

[Bug testsuite/100168] Test gcc.dg/pr56727-2.c fails on power10 code generation

2021-08-04 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100168

Michael Meissner  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Michael Meissner  ---
Fixed with July 28th commit.

[Bug target/101153] [12 regression] gcc.target/powerpc/float128-minmax.c fails after r12-1605

2021-07-14 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101153

Michael Meissner  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Michael Meissner  ---
Fixed in:

commit 730d021e3e4acc7c0031113ec720c82e31d405e5
Author: Michael Meissner 
Date:   Wed Jun 30 14:54:48 2021 -0400

[Bug target/100809] PPC: __int128 divide/modulo does not use P10 instructions vdivsq/vdivuq

2021-07-14 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100809

Michael Meissner  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Michael Meissner  ---
Patch applied to mainline and GCC 11 branches.  PR closed.

[Bug middle-end/33699] [9/10/11/12 regression] missing optimization on const addr area store

2021-07-08 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33699

Michael Meissner  changed:

   What|Removed |Added

 CC||meissner at gcc dot gnu.org

--- Comment #32 from Michael Meissner  ---
I looked at adding the following powerpc patch that was proposed in March
2021:
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566744.html

The patch has two parts that are largely unrelated.

The first part adds minimum and maximum section anchor offset values and uses
-fsection-anchors.  I ran Spec 2017 on a pre-production power10 system,
comparing my normal run times to run times with -fsection-anchors and the
minimum/maximum section anchor offsets set.

Two benchmarks improved and two benchmarks regressed:

xalancbmk_r: 1.75% regression
cactuBSSN_r: 4.24% improvement
blender_r: 1.92% regression
roms_r: 1.05% improvement

I then built Spec 2017 with only the part that sets const_anchor, without the
section anchor minimum/maximum offsets.  Eight benchmarks did not build due to
assertion failures in cse.c:

gcc_r
exchange2_r
cactuBSSN_r
wrf_r
blender_r
cam4_r
fotonik3d_r
roms_r

If I set the section anchor minimum/maximum offsets, add -fsection-anchors, and
set const_anchor, all 23 INT+FP benchmarks build, but wrf_r does not run
correctly.  So without more debugging, I don't recommend setting const_anchor.
It is probably still useful to set the minimum/maximum section anchor offsets
in case people use -fsection-anchors.

As an aside, if we wanted to accept constant addresses on PowerPC, we would
need to recognize a constant address as a legitimate address.  This may be
useful in some embedded environments where devices sit at fixed memory
locations, but somebody would need to add the support.

[Bug target/101019] GCC should consider using PLI/SLDI/PADDI to load up 64-bit constants on power10

2021-06-10 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101019

Michael Meissner  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |meissner at gcc dot gnu.org

--- Comment #1 from Michael Meissner  ---
I'm working on patches.

[Bug target/101019] GCC should consider using PLI/SLDI/PADDI to load up 64-bit constants on power10

2021-06-10 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101019

Michael Meissner  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2021-06-10
   Severity|normal  |enhancement
 CC||bergner at gcc dot gnu.org,
   ||dje at gcc dot gnu.org,
   ||meissner at gcc dot gnu.org,
   ||segher at gcc dot gnu.org,
   ||willschm at gcc dot gnu.org,
   ||wschmidt at gcc dot gnu.org

[Bug target/101019] New: GCC should consider using PLI/SLDI/PADDI to load up 64-bit constants on power10

2021-06-10 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101019

Bug ID: 101019
   Summary: GCC should consider using PLI/SLDI/PADDI to load up
64-bit constants on power10
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

GCC should consider using a sequence of PLI, SLDI, and PADDI to load up 64-bit
constants on power10.

For example:
long foo_2 (void) { return (1L << 53) | (1L << 35) | (1L << 30) | (1L << 2); }

Generates:
lis 3,0x20
ori 3,3,0x8
sldi 3,3,32
oris 3,3,0x4000
ori 3,3,0x4

when it could generate:
pli 3,2097160
sldi 3,3,32
paddi 3,3,1073741828
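The following C sketch checks the arithmetic behind the proposed three-instruction sequence. The struct and function names are hypothetical. The sketch assumes that pli and paddi each take a 34-bit signed immediate, and builds the constant in three steps:

1. pli loads the upper 32 bits.
2. sldi shifts them into place.
3. paddi adds the lower 32 bits.

After the shift the low 32 bits are zero, so the add cannot carry into the upper half. This sequence can therefore build any 64-bit constant.

```c
#include <stdint.h>

/* Hypothetical helper: immediates for the sequence
     pli   r,hi       (34-bit signed immediate, sign-extended to 64 bits)
     sldi  r,r,32
     paddi r,r,lo     (34-bit signed immediate added to r)
   Both 32-bit halves fit in a 34-bit signed immediate.  */
struct pli_seq
{
  int64_t hi;   /* upper 32 bits, interpreted as signed */
  int64_t lo;   /* lower 32 bits, interpreted as unsigned */
};

struct pli_seq split_for_pli (uint64_t value)
{
  struct pli_seq s;
  uint64_t upper = value >> 32;
  s.hi = upper >= 0x80000000u ? (int64_t) upper - 0x100000000LL
                              : (int64_t) upper;
  s.lo = (int64_t) (value & 0xffffffffu);
  return s;
}

/* Model the three instructions with 64-bit wrap-around arithmetic.  */
uint64_t run_pli_seq (struct pli_seq s)
{
  uint64_t r = (uint64_t) s.hi;   /* pli */
  r <<= 32;                       /* sldi r,r,32: low 32 bits now zero */
  r += (uint64_t) s.lo;           /* paddi: cannot carry into the upper half */
  return r;
}
```

For the foo_2 constant, split_for_pli gives hi = 2097160 and lo = 1073741828, matching the pli and paddi immediates quoted above.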

[Bug testsuite/101002] New: Some powerpc tests fail with -mlong-double-64

2021-06-09 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101002

Bug ID: 101002
   Summary: Some powerpc tests fail with -mlong-double-64
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

I now build GCC with all 3 long double variants (IEEE 128-bit, IBM 128-bit, and
64-bit).  The following C tests fail when you configure the compiler to use
64-bit long doubles:

gcc.dg/float128-align.c
gcc.dg/float64x-align.c
gcc.dg/tree-ssa/builtin-sprintf.c
gcc.target/powerpc/convert-fp-64.c
gcc.target/powerpc/float128-hw.c
gcc.target/powerpc/float128-hw4.c
gcc.target/powerpc/gnuattr2.c
gcc.target/powerpc/gnuattr3.c
gcc.target/powerpc/pr60203.c
gcc.target/powerpc/pr79004.c
gcc.target/powerpc/pr82748-1.c
gcc.target/powerpc/pr85657-3.c
gcc.target/powerpc/signbit-1.c

[Bug target/99293] Built-in vec_splat generates sub-optimal code for -mcpu=power10

2021-06-04 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99293

Michael Meissner  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |meissner at gcc dot gnu.org
 Status|NEW |ASSIGNED

[Bug target/99293] Built-in vec_splat generates sub-optimal code for -mcpu=power10

2021-06-04 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99293

--- Comment #3 from Michael Meissner  ---
Created attachment 50947
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50947&action=edit
Proposed patch

[Bug c++/100809] PPC: __int128 divide/modulo does not use P10 instructions vdivsq/vdivuq

2021-06-04 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100809

Michael Meissner  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |meissner at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #5 from Michael Meissner  ---
Patch submitted:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571942.html

[Bug testsuite/100168] Test gcc.dg/pr56727-2.c fails on power10 code generation

2021-06-04 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100168

Michael Meissner  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |meissner at gcc dot gnu.org
   Last reconfirmed||2021-06-04

--- Comment #1 from Michael Meissner  ---
This has been submitted with the patch:
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570688.html

[Bug c++/100809] PPC: __int128 divide/modulo does not use P10 instructions vdivsq/vdivuq

2021-06-01 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100809

--- Comment #4 from Michael Meissner  ---
Looking at Carl's patch, it only adds the built-ins.  I don't believe it adds
direct support for {,u}divti3 and {,u}modti3 to implement these operations for
normal __int128 variables.
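For reference, here is a minimal C sketch of the plain __int128 operations in question, for GCC on a 64-bit target. The function names are hypothetical. On targets without native 128-bit division, GCC typically implements these operations by calling libgcc's __divti3, __udivti3, __modti3 and __umodti3. Direct divti3/udivti3/modti3/umodti3 patterns would let power10 use vdivsq, vdivuq, vmodsq and vmoduq instead.

```c
/* Plain signed __int128 division and modulo (GCC, 64-bit targets).  */
__int128 sdiv128 (__int128 a, __int128 b) { return a / b; }
__int128 smod128 (__int128 a, __int128 b) { return a % b; }

/* Unsigned __int128 division and modulo.  */
unsigned __int128 udiv128 (unsigned __int128 a, unsigned __int128 b)
{
  return a / b;
}

unsigned __int128 umod128 (unsigned __int128 a, unsigned __int128 b)
{
  return a % b;
}
```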

[Bug c++/100809] PPC: __int128 divide/modulo does not use P10 instructions vdivsq/vdivuq

2021-06-01 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100809

Michael Meissner  changed:

   What|Removed |Added

   Last reconfirmed||2021-06-01
 Status|UNCONFIRMED |NEW
 CC||carll at gcc dot gnu.org,
   ||meissner at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #3 from Michael Meissner  ---
Carl Love submitted a patch for this on April 26th.
