[Bug rtl-optimization/94864] Failure to combine vunpckhpd+movsd into single vunpckhpd

2023-08-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
   Target Milestone|--- |14.0

--- Comment #6 from Richard Biener  ---
Fixed.

[Bug rtl-optimization/94864] Failure to combine vunpckhpd+movsd into single vunpckhpd

2023-08-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:27de9aa152141e7f3ee66372647d0f2cd94c4b90

commit r14-3381-g27de9aa152141e7f3ee66372647d0f2cd94c4b90
Author: Richard Biener 
Date:   Wed Jul 12 15:01:47 2023 +0200

tree-optimization/94864 - vector insert of vector extract simplification

The PRs ask for optimization of

  _1 = BIT_FIELD_REF ;
  result_4 = BIT_INSERT_EXPR ;

to a vector permutation.  The following implements this as a
match.pd pattern, improving code generation on x86_64.
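
For reference, a minimal C reproducer along these lines (a sketch reconstructed
from the RTL in comment #2 below; the function name and exact source are
assumptions, not necessarily the committed pr94864.c testcase):

  typedef double v2df __attribute__((vector_size (16)));

  /* result = { b[1], a[1] }; the extract of b[1] plus the insert into
     element 0 of a should become a single permute (vunpckhpd).  */
  v2df
  move_sd (v2df a, v2df b)
  {
    v2df result = a;
    result[0] = b[1];
    return result;
  }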

On the RTL level we face the issue that backend patterns inconsistently
use vec_merge and vec_select of vec_concat to represent permutes.

I think using a (supported) permute is almost always better
than an extract plus insert, maybe excluding the case where we extract
element zero and that element is aliased to a register that can be used
directly for the insertion (not sure how to query that).
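
At the source level the preferred permute form can be written directly with
__builtin_shuffle; a sketch (the function name and the particular index choice
are illustrative assumptions):

  typedef double v2df __attribute__((vector_size (16)));
  typedef long long v2di __attribute__((vector_size (16)));

  /* { b[1], a[1] } expressed as a permute of b and a; on x86_64 this
     can be emitted as a single vunpckhpd instead of extract + insert.  */
  v2df
  high_merge (v2df a, v2df b)
  {
    return __builtin_shuffle (b, a, (v2di) { 1, 3 });
  }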

The patch FAILs one case in gcc.target/i386/avx512fp16-vmovsh-1a.c
where we now expand from

 __A_28 = VEC_PERM_EXPR ;

instead of

 _28 = BIT_FIELD_REF ;
 __A_29 = BIT_INSERT_EXPR ;

producing a vpblendw instruction instead of the expected vmovsh.  That's
either a missed vec_perm_const expansion optimization or, even better,
an improvement: Zen4, for example, has 4 ports that can execute vpblendw
but only 3 that can execute vmovsh, and both instructions have the same size.

The patch XFAILs the sub-testcase.
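
The shape of that sub-testcase is an insert of the low _Float16 element; a
sketch of that kind of code using generic vector extensions (names and exact
form are assumptions, not the actual test):

  typedef _Float16 v8hf __attribute__((vector_size (16)));

  /* x1[0] = x2[0]; the test expects a vmovsh here, but the permute
     form now expands to a vpblendw.  */
  v8hf
  set_low_hf (v8hf x1, v8hf x2)
  {
    x1[0] = x2[0];
    return x1;
  }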

PR tree-optimization/94864
PR tree-optimization/94865
PR tree-optimization/93080
* match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
for vector insertion from vector extraction.

* gcc.target/i386/pr94864.c: New testcase.
* gcc.target/i386/pr94865.c: Likewise.
* gcc.target/i386/avx512fp16-vmovsh-1a.c: XFAIL.
* gcc.dg/tree-ssa/forwprop-40.c: Likewise.
* gcc.dg/tree-ssa/forwprop-41.c: Likewise.

[Bug rtl-optimization/94864] Failure to combine vunpckhpd+movsd into single vunpckhpd

2023-07-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864
Bug 94864 depends on bug 88540, which changed state.

Bug 88540 Summary: Issues with vectorization of min/max operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88540

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug rtl-optimization/94864] Failure to combine vunpckhpd+movsd into single vunpckhpd

2020-05-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #4 from Richard Biener  ---
Addressed by the patch for PR94865.

[Bug rtl-optimization/94864] Failure to combine vunpckhpd+movsd into single vunpckhpd

2020-05-04 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864

--- Comment #3 from Segher Boessenkool  ---
vec_duplicate of vec_select is just a vec_select.  Any vec_merge is a
vec_select as well, as you say.

Canonicalisation should always produce a vec_select.

We probably should have canonicalisation rules for this, so that we do
not get all those rtxes in the instruction stream at all.

[Bug rtl-optimization/94864] Failure to combine vunpckhpd+movsd into single vunpckhpd

2020-04-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2020-04-30
  Component|target  |rtl-optimization
 CC||rguenth at gcc dot gnu.org,
   ||segher at gcc dot gnu.org,
   ||uros at gcc dot gnu.org
 Status|UNCONFIRMED |NEW

--- Comment #2 from Richard Biener  ---
We're feeding combine with

(insn 7 4 9 2 (set (reg:DF 87)
(vec_select:DF (reg:V2DF 90)
(parallel [
(const_int 1 [0x1])
]))) "y.c":6:26 3195 {sse2_storehpd}
 (expr_list:REG_DEAD (reg:V2DF 90)
(nil)))
(insn 9 7 14 2 (set (reg:V2DF 88 [ result ])
(vec_merge:V2DF (vec_duplicate:V2DF (reg:DF 87))
(reg:V2DF 89)
(const_int 1 [0x1]))) "y.c":6:21 2918 {vec_setv2df_0}
 (expr_list:REG_DEAD (reg:V2DF 89)
(expr_list:REG_DEAD (reg:DF 87)
(nil))))

which makes

(set (reg:V2DF 88 [ result ])
(vec_merge:V2DF (vec_duplicate:V2DF (vec_select:DF (reg:V2DF 90)
(parallel [
(const_int 1 [0x1])
])))
(reg:V2DF 89)
(const_int 1 [0x1])))

out of this, which does not match anything because the x86 backend uses
vec_merge in some patterns and vec_select/vec_concat in others:

(define_insn "*vec_interleave_highv2df"
  [(set (match_operand:V2DF 0 "nonimmediate_operand" "=x,v,v,x,v,m")
(vec_select:V2DF
  (vec_concat:V4DF
(match_operand:V2DF 1 "nonimmediate_operand" " 0,v,o,o,o,v")
(match_operand:V2DF 2 "nonimmediate_operand" " x,v,1,0,v,0"))
  (parallel [(const_int 1)
 (const_int 3)])))]
  "TARGET_SSE2 && ix86_vec_interleave_v2df_operator_ok (operands, 1)"
  "@
   unpckhpd\t{%2, %0|%0, %2}

not sure if combine should try to exchange vec_merge for vec_select/vec_concat,
or if this is simply the backend's fault (or GCC's, for even having two ways
to express the same thing...)
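
For illustration only (not taken from the thread), the same equivalence at the
SSE2 intrinsics level; the function names are made up, but both return
{ b[1], a[1] }, the second one with a single unpckhpd:

  #include <emmintrin.h>

  /* Extract b[1] into the low lane, then merge it into a:
     unpckhpd + movsd.  */
  __m128d
  via_extract_insert (__m128d a, __m128d b)
  {
    return _mm_move_sd (a, _mm_unpackhi_pd (b, b));
  }

  /* The combined form the bug asks for: one unpckhpd.  */
  __m128d
  via_single_unpckhpd (__m128d a, __m128d b)
  {
    return _mm_unpackhi_pd (b, a);
  }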