[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-10-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #22 from CVS Commits  ---
The releases/gcc-12 branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:1e36498710f9ca84fefa578863cf505f484601b1

commit r12-9944-g1e36498710f9ca84fefa578863cf505f484601b1
Author: liuhongt 
Date:   Wed Jul 5 13:45:11 2023 +0800

Disparage slightly for the alternative which move DFmode between SSE_REGS
and GENERAL_REGS.

For testcase

void __cond_swap(double* __x, double* __y) {
  bool __r = (*__x < *__y);
  auto __tmp = __r ? *__x : *__y;
  *__y = __r ? *__y : *__x;
  *__x = __tmp;
}

GCC-14 with -O2 and -march=x86-64 options generates the following code:

__cond_swap(double*, double*):
movsd   xmm1, QWORD PTR [rdi]
movsd   xmm0, QWORD PTR [rsi]
comisd  xmm0, xmm1
jbe .L2
movq    rax, xmm1
movapd  xmm1, xmm0
movq    xmm0, rax
.L2:
movsd   QWORD PTR [rsi], xmm1
movsd   QWORD PTR [rdi], xmm0
ret

rax is used to save and restore the DFmode value. In RA, both GENERAL_REGS
and SSE_REGS cost zero since we didn't disparage the
alternative in the movdf_internal pattern, so according to register
allocation order, GENERAL_REGS is allocated. The patch adds ? for
alternatives (r,v) and (v,r), just like we did for the movsf/hf/bf_internal
patterns; after that we get optimal RA.

__cond_swap:
.LFB0:
.cfi_startproc
movsd   (%rdi), %xmm1
movsd   (%rsi), %xmm0
comisd  %xmm1, %xmm0
jbe .L2
movapd  %xmm1, %xmm2
movapd  %xmm0, %xmm1
movapd  %xmm2, %xmm0
.L2:
movsd   %xmm1, (%rsi)
movsd   %xmm0, (%rdi)
ret
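
As an illustration of the tie-break described above, here is a toy model in
C++. It is only a sketch and not IRA's actual algorithm: Candidate and
pick_class are made-up names, and the only GCC fact it relies on is that each
'?' makes an alternative one unit more costly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: one candidate register class for holding a DFmode temporary.
// question_marks counts the '?' modifiers on the move alternatives the class
// would need; each '?' adds one unit of cost.
struct Candidate {
    std::string reg_class;
    int base_cost;
    int question_marks;
};

// Pick the cheapest class. On a tie the earlier entry wins, which stands in
// for "according to register allocation order".
std::string pick_class(const std::vector<Candidate>& in_alloc_order) {
    const Candidate* best = nullptr;
    int best_cost = 0;
    for (const Candidate& c : in_alloc_order) {
        int cost = c.base_cost + c.question_marks;
        if (best == nullptr || cost < best_cost) {
            best = &c;
            best_cost = cost;
        }
    }
    return best ? best->reg_class : std::string();
}
```

With both classes at cost zero, pick_class returns the first entry
(GENERAL_REGS in this model); adding one '?' to that entry makes SSE_REGS win.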

gcc/ChangeLog:

PR target/110170
* config/i386/i386.md (movdf_internal): Disparage slightly for
2 alternatives (r,v) and (v,r) by adding constraint modifier
'?'.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110170-3.c: New test.

(cherry picked from commit 37a231cc7594d12ba0822077018aad751a6fb94e)

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-10-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #21 from CVS Commits  ---
The releases/gcc-11 branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:0d005deb6c8a956b4f7ccb6e70e8e7830a40fed9

commit r11-11065-g0d005deb6c8a956b4f7ccb6e70e8e7830a40fed9
Author: liuhongt 
Date:   Wed Jul 5 13:45:11 2023 +0800

Disparage slightly for the alternative which move DFmode between SSE_REGS
and GENERAL_REGS.

For testcase

void __cond_swap(double* __x, double* __y) {
  bool __r = (*__x < *__y);
  auto __tmp = __r ? *__x : *__y;
  *__y = __r ? *__y : *__x;
  *__x = __tmp;
}

GCC-14 with -O2 and -march=x86-64 options generates the following code:

__cond_swap(double*, double*):
movsd   xmm1, QWORD PTR [rdi]
movsd   xmm0, QWORD PTR [rsi]
comisd  xmm0, xmm1
jbe .L2
movq    rax, xmm1
movapd  xmm1, xmm0
movq    xmm0, rax
.L2:
movsd   QWORD PTR [rsi], xmm1
movsd   QWORD PTR [rdi], xmm0
ret

rax is used to save and restore the DFmode value. In RA, both GENERAL_REGS
and SSE_REGS cost zero since we didn't disparage the
alternative in the movdf_internal pattern, so according to register
allocation order, GENERAL_REGS is allocated. The patch adds ? for
alternatives (r,v) and (v,r), just like we did for the movsf/hf/bf_internal
patterns; after that we get optimal RA.

__cond_swap:
.LFB0:
.cfi_startproc
movsd   (%rdi), %xmm1
movsd   (%rsi), %xmm0
comisd  %xmm1, %xmm0
jbe .L2
movapd  %xmm1, %xmm2
movapd  %xmm0, %xmm1
movapd  %xmm2, %xmm0
.L2:
movsd   %xmm1, (%rsi)
movsd   %xmm0, (%rdi)
ret

gcc/ChangeLog:

PR target/110170
* config/i386/i386.md (movdf_internal): Disparage slightly for
2 alternatives (r,v) and (v,r) by adding constraint modifier
'?'.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110170-3.c: New test.

(cherry picked from commit 37a231cc7594d12ba0822077018aad751a6fb94e)

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-10-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #20 from CVS Commits  ---
The releases/gcc-13 branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:27165633859bdf92589428213edfeccdb49b7d83

commit r13-7956-g27165633859bdf92589428213edfeccdb49b7d83
Author: liuhongt 
Date:   Wed Jul 5 13:45:11 2023 +0800

Disparage slightly for the alternative which move DFmode between SSE_REGS
and GENERAL_REGS.

For testcase

void __cond_swap(double* __x, double* __y) {
  bool __r = (*__x < *__y);
  auto __tmp = __r ? *__x : *__y;
  *__y = __r ? *__y : *__x;
  *__x = __tmp;
}

GCC-14 with -O2 and -march=x86-64 options generates the following code:

__cond_swap(double*, double*):
movsd   xmm1, QWORD PTR [rdi]
movsd   xmm0, QWORD PTR [rsi]
comisd  xmm0, xmm1
jbe .L2
movq    rax, xmm1
movapd  xmm1, xmm0
movq    xmm0, rax
.L2:
movsd   QWORD PTR [rsi], xmm1
movsd   QWORD PTR [rdi], xmm0
ret

rax is used to save and restore the DFmode value. In RA, both GENERAL_REGS
and SSE_REGS cost zero since we didn't disparage the
alternative in the movdf_internal pattern, so according to register
allocation order, GENERAL_REGS is allocated. The patch adds ? for
alternatives (r,v) and (v,r), just like we did for the movsf/hf/bf_internal
patterns; after that we get optimal RA.

__cond_swap:
.LFB0:
.cfi_startproc
movsd   (%rdi), %xmm1
movsd   (%rsi), %xmm0
comisd  %xmm1, %xmm0
jbe .L2
movapd  %xmm1, %xmm2
movapd  %xmm0, %xmm1
movapd  %xmm2, %xmm0
.L2:
movsd   %xmm1, (%rsi)
movsd   %xmm0, (%rdi)
ret

gcc/ChangeLog:

PR target/110170
* config/i386/i386.md (movdf_internal): Disparage slightly for
2 alternatives (r,v) and (v,r) by adding constraint modifier
'?'.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110170-3.c: New test.

(cherry picked from commit 37a231cc7594d12ba0822077018aad751a6fb94e)

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-07-21 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|REOPENED                    |RESOLVED

--- Comment #19 from Richard Biener  ---
Fixed now.

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-07-18 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|14.0                        |---
         Resolution|FIXED                       |---
             Status|RESOLVED                    |REOPENED

--- Comment #18 from Richard Biener  ---
Huh, right.  Somehow I thought minss/maxss is SSE 4.1.  I do have a patch
series that fixes this, the PR88540 is missing for this but it has some fallout
still.

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-07-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #17 from Hongtao.liu  ---
(In reply to Richard Biener from comment #16)
> This is fixed now.

The original issue is for sse2; my patch only fixed the misoptimization for sse4.1.

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-07-18 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
   Target Milestone|---                         |14.0
         Resolution|---                         |FIXED

--- Comment #16 from Richard Biener  ---
This is fixed now.

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-07-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #15 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:e5c64efb1367459dbc2d2e29856f23908cb503c1

commit r14-2432-ge5c64efb1367459dbc2d2e29856f23908cb503c1
Author: liuhongt 
Date:   Tue Jul 11 21:21:03 2023 +0800

Fix typo in the testcase.

Antony Polukhin 2023-07-11 09:51:58 UTC
There's a typo at
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/testsuite/g%2B%2B.target/i386/pr110170.C;h=e638b12a5ee2264ecef77acca86432a9f24b103b;hb=d41a57c46df6f8f7dae0c0a8b349e734806a837b#l87

It should be `|| !test3() || !test3r()` rather than `|| !test3() ||
!test4r()`

gcc/testsuite/ChangeLog:

PR target/110170
* g++.target/i386/pr110170.C: Fix typo.

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-07-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #14 from Hongtao.liu  ---
(In reply to Antony Polukhin from comment #13)
> There's a typo at
> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/testsuite/g%2B%2B.target/
> i386/pr110170.C;h=e638b12a5ee2264ecef77acca86432a9f24b103b;
> hb=d41a57c46df6f8f7dae0c0a8b349e734806a837b#l87
> 
> It should be `|| !test3() || !test3r()` rather than `|| !test3() ||
> !test4r()`

Yes, thanks for the reminder.

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-07-11 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #13 from Antony Polukhin  ---
There's a typo at
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/testsuite/g%2B%2B.target/i386/pr110170.C;h=e638b12a5ee2264ecef77acca86432a9f24b103b;hb=d41a57c46df6f8f7dae0c0a8b349e734806a837b#l87

It should be `|| !test3() || !test3r()` rather than `|| !test3() || !test4r()`

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-07-09 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #12 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:d41a57c46df6f8f7dae0c0a8b349e734806a837b

commit r14-2403-gd41a57c46df6f8f7dae0c0a8b349e734806a837b
Author: liuhongt 
Date:   Mon Jul 3 18:19:19 2023 +0800

Add pre_reload splitter to detect fp min/max pattern.

We have ix86_expand_sse_fp_minmax to detect min/max semantics, but
it requires rtx_equal_p for cmp_op0/cmp_op1 and if_true/if_false. For
the testcase in the PR, there's an extra move from cmp_op0 to if_true,
so ix86_expand_sse_fp_minmax fails.

This patch adds a pre_reload splitter to detect the min/max pattern.

Operand order in MINSS matters for signed zeros and NaNs, since the
instruction always returns the second operand when either operand is a NaN or
both operands are zero.
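
The operand-order rule above can be modelled in scalar C++. This is a
sketch of the documented MINSD/MAXSD behaviour, not code from the patch;
minsd_model, maxsd_model, order_matters and kQuietNaN are names invented
for the illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

const double kQuietNaN = std::numeric_limits<double>::quiet_NaN();

// Scalar model of MINSD: the comparison is false for a NaN operand and for
// +0.0 vs -0.0, so the second operand is returned in those cases.
double minsd_model(double first, double second) {
    return first < second ? first : second;
}

// Same idea for MAXSD.
double maxsd_model(double first, double second) {
    return first > second ? first : second;
}

// True when swapping the operands changes the bit pattern of the minimum.
bool order_matters(double a, double b) {
    double ab = minsd_model(a, b);
    double ba = minsd_model(b, a);
    std::uint64_t ab_bits, ba_bits;
    std::memcpy(&ab_bits, &ab, sizeof ab_bits);
    std::memcpy(&ba_bits, &ba, sizeof ba_bits);
    return ab_bits != ba_bits;
}
```

For ordinary values such as (1.0, 2.0) the order is irrelevant, while
(1.0, NaN) and (0.0, -0.0) give different results depending on which value
is passed second.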

gcc/ChangeLog:

PR target/110170
* config/i386/i386.md (*ieee_max<mode>3_1): New pre_reload
splitter to detect fp max pattern.
(*ieee_min<mode>3_1): Ditto, but for fp min pattern.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr110170.C: New test.
* gcc.target/i386/pr110170.c: New test.

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-07-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #11 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:37a231cc7594d12ba0822077018aad751a6fb94e

commit r14-2337-g37a231cc7594d12ba0822077018aad751a6fb94e
Author: liuhongt 
Date:   Wed Jul 5 13:45:11 2023 +0800

Disparage slightly for the alternative which move DFmode between SSE_REGS
and GENERAL_REGS.

For testcase

void __cond_swap(double* __x, double* __y) {
  bool __r = (*__x < *__y);
  auto __tmp = __r ? *__x : *__y;
  *__y = __r ? *__y : *__x;
  *__x = __tmp;
}

GCC-14 with -O2 and -march=x86-64 options generates the following code:

__cond_swap(double*, double*):
movsd   xmm1, QWORD PTR [rdi]
movsd   xmm0, QWORD PTR [rsi]
comisd  xmm0, xmm1
jbe .L2
movq    rax, xmm1
movapd  xmm1, xmm0
movq    xmm0, rax
.L2:
movsd   QWORD PTR [rsi], xmm1
movsd   QWORD PTR [rdi], xmm0
ret

rax is used to save and restore the DFmode value. In RA, both GENERAL_REGS
and SSE_REGS cost zero since we didn't disparage the
alternative in the movdf_internal pattern, so according to register
allocation order, GENERAL_REGS is allocated. The patch adds ? for
alternatives (r,v) and (v,r), just like we did for the movsf/hf/bf_internal
patterns; after that we get optimal RA.

__cond_swap:
.LFB0:
.cfi_startproc
movsd   (%rdi), %xmm1
movsd   (%rsi), %xmm0
comisd  %xmm1, %xmm0
jbe .L2
movapd  %xmm1, %xmm2
movapd  %xmm0, %xmm1
movapd  %xmm2, %xmm0
.L2:
movsd   %xmm1, (%rsi)
movsd   %xmm0, (%rdi)
ret

gcc/ChangeLog:

PR target/110170
* config/i386/i386.md (movdf_internal): Disparage slightly for
2 alternatives (r,v) and (v,r) by adding constraint modifier
'?'.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110170-3.c: New test.

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-07-03 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #10 from Hongtao.liu  ---
There are a couple of other issues.
1. rtx_cost for and/ior/xor:SF/DF is not right; these actually generate vector
instructions.
2. branch_cost is COSTS_N_INSNS (1) instead of BRANCH_COST (),
which makes noce more conservative about eliminating the condition.
With sse2, the backend tries

(insn 34 0 36 (set (reg:DF 86 [ _1 ])
(reg:DF 82 [ _1 ])) 151 {*movdf_internal}
 (nil))

(insn 36 34 37 (set (reg:DF 92)
(unspec:DF [
(reg:DF 83 [ _2 ])
(reg:DF 82 [ _1 ])
] UNSPEC_IEEE_MAX)) -1
 (nil))

(insn 37 36 38 (set (reg:DF 93)
(lt:DF (reg:DF 82 [ _1 ])
(reg:DF 83 [ _2 ]))) -1
 (nil))

(insn 38 37 39 (set (reg:DF 94)
(and:DF (reg:DF 86 [ _1 ])
(reg:DF 93))) -1
 (nil))

(insn 39 38 40 (set (reg:DF 95)
(and:DF (not:DF (reg:DF 93))
(reg:DF 83 [ _2 ]))) -1
 (nil))

(insn 40 39 41 (set (reg:DF 83 [ _2 ])
(ior:DF (reg:DF 95)
(reg:DF 94))) -1
 (nil))

(insn 41 40 0 (set (reg:DF 82 [ _1 ])
(reg:DF 92)) 151 {*movdf_internal}
 (nil))

whose cost is 28, while the original cost is 12 (3 moves + 1 branch). (Should the
comparison also be considered, since it's counted in the cmov seq? If we use
ix86_branch_cost and count the comparison cost in the original seq, then the
cost should be 28 vs 28.)


(insn 5 17 6 3 (set (reg:DF 86 [ _1 ])
(reg:DF 82 [ _1 ]))
"/export/users/liuhongt/tools-build/build_intel-innersource_pr110170_debug/test.c":5:23
151 {*movdf_internal}
 (expr_list:REG_DEAD (reg:DF 82 [ _1 ])
(nil)))
(insn 6 5 7 3 (set (reg:DF 82 [ _1 ])
(reg:DF 83 [ _2 ]))
"/export/users/liuhongt/tools-build/build_intel-innersource_pr110170_debug/test.c":6:15
discrim 1 151 {*movdf_internal}
 (expr_list:REG_DEAD (reg:DF 83 [ _2 ])
(nil)))
(insn 7 6 18 3 (set (reg:DF 83 [ _2 ])
(reg:DF 86 [ _1 ]))
"/export/users/liuhongt/tools-build/build_intel-innersource_pr110170_debug/test.c":5:23
discrim 1 151 {*movdf_internal}
 (expr_list:REG_DEAD (reg:DF 86 [ _1 ])
(nil)))

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-06-12 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #9 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #8)
> ix86_expand_sse_fp_minmax failed since rtx_equal_p (cmp_op0, if_true) is
> false, 
> 
> (reg:DF 86 [ _1 ])  (if_true)
> (reg:DF 83 [ _2 ])  (if_false)
> (reg:DF 82 [ _1 ])  (cmp_op0)
> (reg:DF 83 [ _2 ])  (cmp_op1)
> 
> but here if_true is just a copy from cmp_op0 but with different REGNO,
> rtx_equal_p seems too conservative here.
> 

But if_convert didn't maintain DF_CHAIN info, and the backend can't get
DF_REG_DEF_* info to figure out that if_true is just a single_set of cmp_op0.


With -march=x86-64-v2, gcc generates 

movsd   (%rdi), %xmm2
movsd   (%rsi), %xmm1
movapd  %xmm2, %xmm0
movapd  %xmm1, %xmm3
cmpltsd %xmm1, %xmm0
maxsd   %xmm2, %xmm3
blendvpd %xmm0, %xmm2, %xmm1
movsd   %xmm3, (%rsi)
movsd   %xmm1, (%rdi)
ret

Which can be further optimized: cmpltsd + blendvpd -> minsd

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-06-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #8 from Hongtao.liu  ---
ix86_expand_sse_fp_minmax failed since rtx_equal_p (cmp_op0, if_true) is false, 

(reg:DF 86 [ _1 ])  (if_true)
(reg:DF 83 [ _2 ])  (if_false)
(reg:DF 82 [ _1 ])  (cmp_op0)
(reg:DF 83 [ _2 ])  (cmp_op1)

but here if_true is just a copy of cmp_op0 with a different REGNO, so
rtx_equal_p seems too conservative here.

(code_label 26 13 17 3 4 (nil) [1 uses])
(note 17 26 5 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 5 17 6 3 (set (reg:DF 86 [ _1 ])
        (reg:DF 82 [ _1 ])) "test.C":3:20 153 {*movdf_internal}
     (expr_list:REG_DEAD (reg:DF 82 [ _1 ])
        (nil)))
(insn 6 5 7 3 (set (reg:DF 82 [ _1 ])
        (reg:DF 83 [ _2 ])) "test.C":4:14 discrim 1 153 {*movdf_internal}
     (expr_list:REG_DEAD (reg:DF 83 [ _2 ])
        (nil)))
(insn 7 6 18 3 (set (reg:DF 83 [ _2 ])
        (reg:DF 86 [ _1 ])) "test.C":3:20 discrim 1 153 {*movdf_internal}
     (expr_list:REG_DEAD (reg:DF 86 [ _1 ])
        (nil)))


  if (rtx_equal_p (cmp_op0, if_true) && rtx_equal_p (cmp_op1, if_false))
    is_min = true;
  else if (rtx_equal_p (cmp_op1, if_true) && rtx_equal_p (cmp_op0, if_false))
    is_min = false;
  else
    return false;  /* <= taken for this testcase */
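
To make the failure concrete, here is a toy C++ model of the check quoted
above. It is a hypothetical sketch: pseudo registers are reduced to their
numbers, integer equality stands in for rtx_equal_p, and classify and
resolve_copy are invented names. It only shows why the match fails and what a
copy-aware comparison would see; it is not GCC code.

```cpp
#include <cassert>
#include <map>

enum class MinMax { None, Min, Max };

// Stand-in for the quoted check: comparing register numbers plays the role
// of rtx_equal_p on the operands.
MinMax classify(int cmp_op0, int cmp_op1, int if_true, int if_false) {
    if (cmp_op0 == if_true && cmp_op1 == if_false)
        return MinMax::Min;
    if (cmp_op1 == if_true && cmp_op0 == if_false)
        return MinMax::Max;
    return MinMax::None;
}

// Hypothetical helper: copy_of[dst] = src records a single register copy.
// Returns the source of the copy, or regno itself when it is not a copy.
int resolve_copy(const std::map<int, int>& copy_of, int regno) {
    auto it = copy_of.find(regno);
    return it == copy_of.end() ? regno : it->second;
}
```

With the operands from the dump, classify(82, 83, 86, 83) yields None, while
resolving reg 86 to its source reg 82 first yields Min.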

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-06-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-06-09

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-06-09 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #7 from Hongtao.liu  ---
void __cond_swap(double* __x, double* __y) {
  bool __r = (*__x < *__y);
  *__x = __r ? *__y : *__x ;
}

void __cond_swap1(double* __x, double* __y) {
  bool __r = (*__x < *__y);
  *__y = __r ? *__x : *__y;
}

Separately, GCC can generate both max/min.

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-06-09 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #6 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #5)
> (In reply to Antony Polukhin from comment #2)
> > -fno-trapping-math had no effect
> > 
> > Some tests with nans seem to produce the same results for both code
> > snippets: https://godbolt.org/z/GaKM3EhMq
> 
> What about infinity? I notice that with
> -ffinite-math-only -funsafe-math-optimizations, gcc can now generate
> 
> __cond_swap(double*, double*):
> movsd   (%rdi), %xmm0
> movsd   (%rsi), %xmm1
> movapd  %xmm0, %xmm2
> minsd   %xmm1, %xmm0
> maxsd   %xmm1, %xmm2
> movsd   %xmm2, (%rsi)
> movsd   %xmm0, (%rdi)
> ret

Assume -funsafe-math-optimizations is not needed?

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-06-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #5 from Hongtao.liu  ---
(In reply to Antony Polukhin from comment #2)
> -fno-trapping-math had no effect
> 
> Some tests with nans seem to produce the same results for both code
> snippets: https://godbolt.org/z/GaKM3EhMq

What about infinity? I notice that with
-ffinite-math-only -funsafe-math-optimizations, gcc can now generate

__cond_swap(double*, double*):
movsd   (%rdi), %xmm0
movsd   (%rsi), %xmm1
movapd  %xmm0, %xmm2
minsd   %xmm1, %xmm0
maxsd   %xmm1, %xmm2
movsd   %xmm2, (%rsi)
movsd   %xmm0, (%rdi)
ret

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-06-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-linux-gnu

--- Comment #4 from Andrew Pinski  ---
Note for aarch64, we do produce conditional moves but only when there is a
loop.

That is:
```
__attribute__((noinline))
void __cond_swap(double* __x, double* __y) {
  for(int i = 0; i < 100; i++, __x++, __y++) {
  double __r = (*__x < *__y);
  double __tmp = __r ? *__x : *__y;
  *__y = __r ? *__y : *__x;
  *__x = __tmp;
  }
}
```
Produces:
```
.L3:
ldr d31, [x0, x2]
ldr d30, [x1, x2]
fcmpe   d31, d30
fcsel   d29, d30, d31, mi
fcsel   d31, d31, d30, mi
str d29, [x1, x2]
str d31, [x0, x2]
add x2, x2, 8
cmp x2, 800
bne .L3
```

Otherwise it will duplicate the return basic block (which is expected).

So this is a x86_64 specific issue.

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-06-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #3 from Andrew Pinski  ---
So for arm, GCC does produce the code you want:
```
vcmpe.f64   d17, d16
vmrs    APSR_nzcv, FPSCR
ite pl
vmovpl.f64  d18, d17
vmovmi.f64  d18, d16
it  mi
vmovmi.f64  d16, d17
```

RTL CE1 (ifcvt) detects it:
if-conversion succeeded through noce_convert_multiple_sets


So maybe there is some cost issue. Because arm64 does not do it either.

[Bug target/110170] Sub-optimal conditional jumps in conditional-swap with floating point

2023-06-08 Thread antoshkka at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110170

--- Comment #2 from Antony Polukhin  ---
-fno-trapping-math had no effect

Some tests with nans seem to produce the same results for both code snippets:
https://godbolt.org/z/GaKM3EhMq
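
A comparison like this can be reproduced locally with a small harness. The
sketch below is not the code behind the godbolt link: cond_swap_minmax is an
assumed min/max formulation of the second snippet, and bits and same_result
are helper names made up here. Results are compared bit for bit so that NaNs
and signed zeros are distinguished.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// The conditional swap from the report (identifiers without the leading
// underscores).
void cond_swap(double* x, double* y) {
    bool r = (*x < *y);
    double tmp = r ? *x : *y;
    *y = r ? *y : *x;
    *x = tmp;
}

// Assumed min/max formulation: each select keeps its second operand when
// the comparison is false, as MINSD/MAXSD do.
void cond_swap_minmax(double* x, double* y) {
    double a = *x, b = *y;
    *x = a < b ? a : b;
    *y = b > a ? b : a;
}

// Raw bit pattern of a double.
std::uint64_t bits(double v) {
    std::uint64_t u;
    std::memcpy(&u, &v, sizeof u);
    return u;
}

// True when both versions leave identical bit patterns in x and y.
bool same_result(double a, double b) {
    double x1 = a, y1 = b, x2 = a, y2 = b;
    cond_swap(&x1, &y1);
    cond_swap_minmax(&x2, &y2);
    return bits(x1) == bits(x2) && bits(y1) == bits(y2);
}
```

Calling same_result with a quiet NaN in either position, with zeros of
opposite sign, and with infinities exercises the cases discussed in this
thread.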