[Bug target/100085] Bad code for union transfer from __float128 to vector types

2022-02-26 Thread munroesj at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #23 from Steven Munroe  ---
Ok, but I strongly recommend a compiler test that verifies that the compiler is
generating the expected code (for this and other cases).

We have a history of common code changes (accidental or deliberate) causing
regressions for POWER targets.

Best to find these early, before they impact customer performance.
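
A minimal sketch of what such a test could look like, written in the style of
the GCC testsuite (DejaGnu directives in comments). The type and function names
below are hypothetical stand-ins for the ones in the attached test case, and the
choice of instructions to scan for is an assumption based on the bad code shown
in this PR, not the test that was actually committed.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -mdejagnu-cpu=power8" } */

/* Hypothetical stand-ins for the 128-bit types used in this PR.  */
typedef unsigned long long v2du_t __attribute__ ((vector_size (16)));

typedef union
{
  __float128 f128;        /* scalar quad-precision view */
  v2du_t v2du;            /* 128-bit vector view */
  unsigned __int128 u128; /* scalar 128-bit integer view */
} xfer128_t;

/* Union transfer from __float128 to a vector type.  */
v2du_t
test_xfer_f128_to_v2du (__float128 f)
{
  xfer128_t u;
  u.f128 = f;
  return u.v2du;
}

/* Union transfer from __float128 to __int128.  */
unsigned __int128
test_xfer_f128_to_u128 (__float128 f)
{
  xfer128_t u;
  u.f128 = f;
  return u.u128;
}

/* Assumed checks: fail if the transfer still bounces through the stack.  */
/* { dg-final { scan-assembler-not {\mstxvd2x\M} } } */
/* { dg-final { scan-assembler-not {\mlxvd2x\M} } } */
/* { dg-final { scan-assembler-not {\mlvx\M} } } */
```

The directives are only comments to a plain compiler, so the same file can also
be built and run by hand to check that the bit pattern survives the transfer.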

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2022-02-25 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #22 from Segher Boessenkool  ---
Well, we do not do anything AT here; but the patch is not on the GCC 11
branch either.

Xiong Hu, does it backport there cleanly?

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2022-02-25 Thread munroesj at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #21 from Steven Munroe  ---
Yes, I was told by Peter Bergner that the fix from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085#c15 had been backported
to AT15.0-1.

But when I ran this test with AT15.0-1 I saw:

 :
   0:   20 00 20 39 li  r9,32
   4:   d0 ff 41 39 addi    r10,r1,-48
   8:   57 12 42 f0 xxswapd vs34,vs34
   c:   99 4f 4a 7c stxvd2x vs34,r10,r9
  10:   ce 48 4a 7c lvx v2,r10,r9
  14:   20 00 80 4e blr

0030 :
  30:   20 00 20 39 li  r9,32
  34:   d0 ff 41 39 addi    r10,r1,-48
  38:   57 12 42 f0 xxswapd vs34,vs34
  3c:   99 4f 4a 7c stxvd2x vs34,r10,r9
  40:   ce 48 4a 7c lvx v2,r10,r9
  44:   20 00 80 4e blr

0060 :
  60:   20 00 20 39 li  r9,32
  64:   d0 ff 41 39 addi    r10,r1,-48
  68:   57 12 42 f0 xxswapd vs34,vs34
  6c:   99 4f 4a 7c stxvd2x vs34,r10,r9
  70:   99 4e 4a 7c lxvd2x  vs34,r10,r9
  74:   57 12 42 f0 xxswapd vs34,vs34
  78:   20 00 80 4e blr

0090 :
  90:   57 12 42 f0 xxswapd vs34,vs34
  94:   20 00 40 39 li  r10,32
  98:   d0 ff 01 39 addi    r8,r1,-48
  9c:   f0 ff 21 39 addi    r9,r1,-16
  a0:   99 57 48 7c stxvd2x vs34,r8,r10
  a4:   00 00 69 e8 ld  r3,0(r9)
  a8:   08 00 89 e8 ld  r4,8(r9)
  ac:   20 00 80 4e blr

So either the patch for AT15.0-1 was not applied correctly, or it is
non-functional because of some difference between GCC11 and GCC12, or it has
regressed because of some other change/patch.

In my experience this part of GCC is fragile (based on the long/sad history of
IBM long double). So this needs to be monitored with each new update.

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2022-02-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

Segher Boessenkool  changed:

   What|Removed |Added

 Status|WAITING |REOPENED

--- Comment #20 from Segher Boessenkool  ---
Ah, there are problems with -mcpu=power8 -mlittle -mabi=elfv2, on GCC 11
and before.

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2022-02-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #19 from Segher Boessenkool  ---
And the same with all of GCC 8, GCC 9, GCC 10, GCC 11, and current trunk.

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2022-02-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

Segher Boessenkool  changed:

   What|Removed |Added

 Status|REOPENED|WAITING

--- Comment #18 from Segher Boessenkool  ---
What do you see, what do you want to see?

For me (powerpc64-linux -mcpu=power10) I see three empty functions, and
for the last

stxv 34,-16(1)
ld 3,-16(1)
ld 4,-8(1)
blr

(and the same for power7 and later, just less efficient until p9; and the
same with -mlittle -mabi=elfv2).

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2022-02-24 Thread munroesj at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

Steven Munroe  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #17 from Steven Munroe  ---
I don't think this is fixed.

The fix was supposed to be backported to GCC11 for Advance Toolchain 15.

The updated test case shows that this is clearly not working as advertised.

Either the GCC12 fix has regressed due to subsequent updates or the AT15 GCC11
backport fails due to some missing/different code between GCC11 and GCC12.

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2022-02-24 Thread munroesj at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #16 from Steven Munroe  ---
Created attachment 52510
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52510&action=edit
Reduced tests for xfers from _float128 to vector or __int128

Cover more types including __int128 and vector __int128

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2022-01-14 Thread wschmidt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

Bill Schmidt  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #15 from Bill Schmidt  ---
This was fixed a while back in r12-1316 by Xiong Hu Luo.

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2021-06-11 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #14 from Segher Boessenkool  ---
We *have* TImode already, but most 128-bit scalars currently use V1TImode.
This often leads to reduced performance because that is not a scalar mode,
does not get all optimisations we have generically for all other integer
scalars.  We have to do a lot of it manually, which is a lot of (combine)
patterns, and we still miss almost all cases.

I am not saying we should remove V1TImode.  I am saying we want to use
plain TImode for scalars, on newer cpus.  On p8 we had V1TImode so that
we could reduce the traffic between the vector register files and the
GPR register file, because that was very costly on p8 (mtvsr* and mfvsr*
were 5 cycles, and mtvsrdd and mfvsrld didn't even exist yet).

Using V1TImode for scalars on p8 was a pretty big win.  It should be a win
again to use TImode on later cpus though.

> And I have grave reservations about the vague plans of small/fringe minority
> to subset the PowerISA for their convenience.

I don't have reservations about that.  Instead, I battle that with all I can.

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2021-06-10 Thread munroesj at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #13 from Steven Munroe  ---
"We want to use plain TImode instead of V1TImode on newer cpus."

Actually I disagree. We have vector __int128 in the ABI and with POWER10 a
complete set of arithmetic operations for 128-bit in VRs.

Also this issue is not restricted to TImode. It also affects _Float128
(KFmode), __ibm128 (TFmode) and Libmvec for vector float/double. The proper and
optimum handling of these "union transfers" has been broken in GCC for years.

And I have grave reservations about the vague plans of small/fringe minority to
subset the PowerISA for their convenience.

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2021-06-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #12 from Segher Boessenkool  ---
We want to use plain TImode instead of V1TImode on newer cpus.  It probably is
a good idea (for performance) on p9 already, but this will need testing. That's
only sideways related to this issue though (but so is -mvsx-timode :-) )

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2021-06-09 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #11 from Peter Bergner  ---
(In reply to luoxhu from comment #9)
> But for __float128 to __int128 mentioned in #c4, need hack
> rs6000_modes_tieable_p
> to remove the stack operation in dse1. But I am not sure this is *LEGAL*
> since TImode is allocated to GPR, It seems not true to access TImode from
> ALTIVEC or VSX without copying?

We used to have a -mvsx-timode option which allowed TImode pseudos into the VSX
registers.  We deprecated the option a while back and basically always allow
TImode in the VSX registers now.  I would say we even prefer them in VSX
registers over GPR registers.  The only "issue" is that our ABIs define that
parameter passing and return values for TImode values go through the GPRs. :-(

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2021-06-08 Thread luoxhu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #10 from luoxhu at gcc dot gnu.org ---
float128 to vector __int128 is fixed by:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f700e4b0ee3ef53b48975cf89be26b9177e3a3f3

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2021-06-02 Thread luoxhu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #9 from luoxhu at gcc dot gnu.org ---
Patch sent; it could fix the __float128 to vector __int128 issue:

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571689.html


But the __float128 to __int128 case mentioned in #c4 needs a hack in
rs6000_modes_tieable_p to remove the stack operation in dse1. I am not sure
this is *LEGAL*, though: since TImode is allocated to GPRs, it seems wrong to
access TImode from ALTIVEC or VSX registers without copying?

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ad11b67b125..ee69463ac46 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1974,6 +1974,9 @@ rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
       || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode)
     return mode1 == mode2;

+  if (mode1 == TImode && ALTIVEC_OR_VSX_VECTOR_MODE (mode2))
+    return true;
+


xxpermdi %vs0,%vs34,%vs34,3
mfvsrd %r4,%vs34
mfvsrd %r3,%vs0

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2021-05-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #8 from Segher Boessenkool  ---
(In reply to luoxhu from comment #7)
> (In reply to Segher Boessenkool from comment #3)
> > The rotates in 6 and 7 are not merged, and neither are the vec_selects in
> > 8 and 9.  Both should be pretty easy to do, there is no unspec in sight,
> > etc.
> 
> Should this be done in pass bswaps or combine or by peephole2? :)

It should be done by simplify-rtx.c at least (which will make it work in
combine and other places): two rotates that together do nothing should be
optimised to that, or generally, two rotates should be optimised to just one
(which then can be optimised to nothing).  Similar for vec_select.  Maybe
something in bswaps can help as well, I don't know, I haven't looked closely
yet.
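
To illustrate the identity being relied on here, the following is a small C
model (not GCC code) of the two 64-bit rotates on a 128-bit value seen in
insns 6 and 7. The helper names are hypothetical. It shows that two rotates
compose into a single rotate by the sum of the counts modulo 128, so rotating
by 64 twice is a no-op.

```c
typedef unsigned __int128 u128;

/* Rotate a 128-bit value left by n bits; n is reduced modulo 128.  */
u128
rotl128 (u128 x, unsigned n)
{
  n &= 127;
  return n ? (x << n) | (x >> (128 - n)) : x;
}

/* Two successive rotates, as emitted in the unoptimised sequence.  */
u128
rotate_twice (u128 x, unsigned a, unsigned b)
{
  return rotl128 (rotl128 (x, a), b);
}

/* The merged form: one rotate by the combined count.
   With a == b == 64 the combined count is 0, i.e. no rotate at all.  */
u128
rotate_merged (u128 x, unsigned a, unsigned b)
{
  return rotl128 (x, (a + b) & 127);
}
```

The same reasoning applies to the pair of vec_selects with parallel [1 0]:
swapping the two doublewords twice gives back the original vector.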

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2021-05-24 Thread luoxhu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

luoxhu at gcc dot gnu.org changed:

   What|Removed |Added

 CC||luoxhu at gcc dot gnu.org

--- Comment #7 from luoxhu at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #3)
> The rotates in 6 and 7 are not merged, and neither are the vec_selects in
> 8 and 9.  Both should be pretty easy to do, there is no unspec in sight,
> etc.

Should this be done in pass bswaps or combine or by peephole2? :)

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2021-04-30 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #6 from Peter Bergner  ---
(In reply to Steven Munroe from comment #5)
> Any progress on this?

Sorry, not yet.  We've been busy with P10 items and the gcc11 release.  It is
on our list for looking into.

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2021-04-29 Thread munroesj at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #5 from Steven Munroe  ---
Any progress on this?

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2021-04-16 Thread munroesj at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #4 from Steven Munroe  ---
I am seeing a similar problem with union transfers from __float128 to
__int128.


 static inline unsigned __int128
 vec_xfer_bin128_2_int128t (__binary128 f128)
 {
   __VF_128 vunion;

   vunion.vf1 = f128;

   return (vunion.ui1);
 }

and 

unsigned __int128
test_xfer_bin128_2_int128 (__binary128 f128)
{
  return vec_xfer_bin128_2_int128t (f128);
}

generates:

0030 :
  30:   57 12 42 f0 xxswapd vs34,vs34
  34:   20 00 20 39 li  r9,32
  38:   d0 ff 41 39 addi    r10,r1,-48
  3c:   99 4f 4a 7c stxvd2x vs34,r10,r9
  40:   f0 ff 61 e8 ld  r3,-16(r1)
  44:   f8 ff 81 e8 ld  r4,-8(r1)
  48:   20 00 80 4e blr

For POWER8 this should use mfvsrd/xxpermdi/mfvsrd.

This looks like the root cause of poor performance for __float128 soft-float on
POWER8. I ran a simple benchmark using __float128 in C code, calling libgcc for
-mcpu=power8 and then using hardware instructions for -mcpu=power9:

P8 target P8AT14, Uses libgcc __addkf3_sw and __mulkf3_sw:
test_time_f128 f128 CC  tb delta = 52589, sec = 0.000102713

P9 Target P8AT14, Uses libgcc __addkf3_hw and __mulkf3_hw:
test_time_f128 f128 CC  tb delta = 18762, sec = 3.66445e-05

P9 Target P9AT14, inline hardware binary128 float:
test_time_f128 f128 CC  tb delta = 3809, sec = 7.43945e-06

I used Valgrind Itrace and Sim-ppc and perfstat analysis. Every call to libgcc
__add/sub/mul/divkf3 takes a load-hit-store flush. This explains why
__float128 is 13.8 X slower on P8 than on P9.
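
The ratios behind that figure can be reproduced from the timebase deltas
quoted above. This is only an arithmetic sketch: the macro and function names
are made up for illustration, and the 512 MHz timebase frequency is an
inference from the reported delta/seconds pairs, not something stated in the
benchmark output.

```c
#include <stdio.h>

/* Timebase deltas reported by the benchmark above.  */
#define TB_P8_SW      52589.0 /* P8 target, libgcc __addkf3_sw/__mulkf3_sw */
#define TB_P9_HW      18762.0 /* P9 target, libgcc __addkf3_hw/__mulkf3_hw */
#define TB_P9_INLINE   3809.0 /* P9 target, inline binary128 instructions */

/* Assumption: timebase frequency implied by delta / seconds above.  */
#define TB_HZ_ASSUMED 512000000.0

/* How many times slower the 'slow' run is than the 'fast' run.  */
double
slowdown (double slow, double fast)
{
  return slow / fast;
}

/* Convert a timebase delta to seconds at the given frequency.  */
double
tb_seconds (double delta, double hz)
{
  return delta / hz;
}

/* Print the three pairwise ratios and the implied elapsed times.  */
void
print_ratios (void)
{
  printf ("P8 soft-float vs P9 inline: %.1fx\n",
          slowdown (TB_P8_SW, TB_P9_INLINE));
  printf ("P8 soft-float vs P9 libgcc: %.1fx\n",
          slowdown (TB_P8_SW, TB_P9_HW));
  printf ("P9 libgcc vs P9 inline:     %.1fx\n",
          slowdown (TB_P9_HW, TB_P9_INLINE));
  printf ("P8 soft-float elapsed:      %g sec\n",
          tb_seconds (TB_P8_SW, TB_HZ_ASSUMED));
}
```

Calling print_ratios () from a main function prints the ratios; the first one
is the 13.8 X figure quoted above.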

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2021-04-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #3 from Segher Boessenkool  ---
The rotates in 6 and 7 are not merged, and neither are the vec_selects in
8 and 9.  Both should be pretty easy to do, there is no unspec in sight,
etc.

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2021-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

Richard Biener  changed:

   What|Removed |Added

 Target||powerpc
   Last reconfirmed||2021-04-15
 Ever confirmed|0   |1
   Keywords||missed-optimization
  Component|rtl-optimization|target
 Status|UNCONFIRMED |NEW

--- Comment #2 from Richard Biener  ---
RTL expansion for

vui128_t test_xfer_bin128_2_vui128t (__binary128 f128)
{
  vector(1) __int128 unsigned _3;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _3 = VIEW_CONVERT_EXPR(f128_2(D));
  return _3;

power9 (-) vs power8 (+) is

 (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
-(insn 6 3 7 2 (set (mem/c:KF (reg/f:DI 112 virtual-stack-vars) [1  S16 A128])
-(reg/v:KF 118 [ f128 ])) "vec_f128_ppc.h":143:19 -1
- (nil))
-(insn 7 6 8 2 (set (reg:V1TI 120)
-(mem/c:V1TI (reg/f:DI 112 virtual-stack-vars) [1  S16 A128])) "t.c":13:10 -1
+(insn 6 3 7 2 (set (subreg:V1TI (reg:KF 120 [ f128 ]) 0)
+(rotate:V1TI (subreg:V1TI (reg/v:KF 118 [ f128 ]) 0)
+(const_int 64 [0x40]))) "vec_f128_ppc.h":143:19 -1
+ (nil))
+(insn 7 6 8 2 (set (mem/c:V1TI (reg/f:DI 112 virtual-stack-vars) [1  S16 A128])
+(rotate:V1TI (subreg:V1TI (reg:KF 120 [ f128 ]) 0)
+(const_int 64 [0x40]))) "vec_f128_ppc.h":143:19 -1
+ (nil))
+(insn 8 7 9 2 (set (reg:V2DI 122)
+(vec_select:V2DI (mem/c:V2DI (reg/f:DI 112 virtual-stack-vars) [1  S16 A128])
+(parallel [
+(const_int 1 [0x1])
+(const_int 0 [0])
+]))) "t.c":13:10 -1
+ (nil))
+(insn 9 8 10 2 (set (subreg:V2DI (reg:V1TI 121) 0)
+(vec_select:V2DI (reg:V2DI 122)
+(parallel [
+(const_int 1 [0x1])
+(const_int 0 [0])
+]))) "t.c":13:10 -1
  (nil))

so power8 avoids the stack but in turn ends up with something that is not
optimized down the road.