[Bug rtl-optimization/94864] Failure to combine vunpckhpd+movsd into single vunpckhpd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED
   Target Milestone|---                         |14.0

--- Comment #6 from Richard Biener ---
Fixed.
[Bug rtl-optimization/94864] Failure to combine vunpckhpd+movsd into single vunpckhpd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864

--- Comment #5 from CVS Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:27de9aa152141e7f3ee66372647d0f2cd94c4b90

commit r14-3381-g27de9aa152141e7f3ee66372647d0f2cd94c4b90
Author: Richard Biener
Date:   Wed Jul 12 15:01:47 2023 +0200

    tree-optimization/94864 - vector insert of vector extract simplification

    The PRs ask for optimizing of

      _1 = BIT_FIELD_REF ;
      result_4 = BIT_INSERT_EXPR ;

    to a vector permutation.  The following implements this as
    match.pd pattern, improving code generation on x86_64.

    On the RTL level we face the issue that backend patterns inconsistently
    use vec_merge and vec_select of vec_concat to represent permutes.

    I think using a (supported) permute is almost always better
    than an extract plus insert, maybe excluding the case we extract
    element zero and that's aliased to a register that can be used
    directly for insertion (not sure how to query that).

    The patch FAILs one case in gcc.target/i386/avx512fp16-vmovsh-1a.c
    where we now expand from

      __A_28 = VEC_PERM_EXPR ;

    instead of

      _28 = BIT_FIELD_REF ;
      __A_29 = BIT_INSERT_EXPR ;

    producing a vpblendw instruction instead of the expected vmovsh.
    That's either a missed vec_perm_const expansion optimization or even
    better, an improvement - Zen4 for example has 4 ports to execute
    vpblendw but only 3 for executing vmovsh and both instructions have
    the same size.

    The patch XFAILs the sub-testcase.

            PR tree-optimization/94864
            PR tree-optimization/94865
            PR tree-optimization/93080
            * match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
            for vector insertion from vector extraction.

            * gcc.target/i386/pr94864.c: New testcase.
            * gcc.target/i386/pr94865.c: Likewise.
            * gcc.target/i386/avx512fp16-vmovsh-1a.c: XFAIL.
            * gcc.dg/tree-ssa/forwprop-40.c: Likewise.
            * gcc.dg/tree-ssa/forwprop-41.c: Likewise.
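
As an illustration of the transformation described in the commit message, here is a minimal C sketch using the GCC generic vector extensions. The type and function names (v2df, v2di, insert_of_extract, as_permute, check_forms_agree) are assumptions made for this example and are not taken from the testcases in the patch. The first function writes the extract-plus-insert form, the second writes the same result as a single two-input permutation.

```c
#include <assert.h>

/* Generic vector types: two doubles, and a same-shaped integer mask type. */
typedef double v2df __attribute__((vector_size(16)));
typedef long long v2di __attribute__((vector_size(16)));

/* Extract lane 1 of b and insert it into lane 0 of a
   (the extract-plus-insert form). */
v2df insert_of_extract(v2df a, v2df b)
{
    v2df result = a;
    result[0] = b[1];
    return result;
}

/* The same operation as one two-input permutation.
   Indices 0..1 select from a, indices 2..3 select from b. */
v2df as_permute(v2df a, v2df b)
{
#ifdef __clang__
    return __builtin_shufflevector(a, b, 3, 1);
#else
    return __builtin_shuffle(a, b, (v2di){3, 1});
#endif
}

/* Checks on one sample input that both forms produce {b[1], a[1]}. */
void check_forms_agree(void)
{
    v2df a = {1.0, 2.0};
    v2df b = {3.0, 4.0};
    v2df x = insert_of_extract(a, b);
    v2df y = as_permute(a, b);
    assert(x[0] == 4.0 && x[1] == 2.0);
    assert(y[0] == x[0] && y[1] == x[1]);
}
```

Whether a given compiler version emits a single vunpckhpd for insert_of_extract depends on the target flags and on whether it includes this commit, so the sketch only demonstrates that the two forms compute the same lanes.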
[Bug rtl-optimization/94864] Failure to combine vunpckhpd+movsd into single vunpckhpd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864
Bug 94864 depends on bug 88540, which changed state.

Bug 88540 Summary: Issues with vectorization of min/max operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88540

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
[Bug rtl-optimization/94864] Failure to combine vunpckhpd+movsd into single vunpckhpd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #4 from Richard Biener ---
Addressed by the patch for PR94865.
[Bug rtl-optimization/94864] Failure to combine vunpckhpd+movsd into single vunpckhpd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864

--- Comment #3 from Segher Boessenkool ---
vec_duplicate of vec_select is just a vec_select.  Any vec_merge is
a vec_select as well, as you say.  Canonicalisation should make vec_select
always.

We probably should have canonicalisation rules for this, so that we do
not get all those rtxes in the instruction stream at all.
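
The claim that any vec_merge can be written as a vec_select (of a vec_concat) can be checked with a small C model of the two RTL operations. This is only a sketch of the lane semantics on plain arrays, not GCC code; the helper merge_mask_to_indices is a hypothetical name for the mask-to-selector rewrite and does not exist in GCC.

```c
#include <assert.h>

enum { N = 2 };  /* lanes per vector, as in V2DF */

/* vec_merge: lane i comes from x when bit i of mask is set, else from y. */
void vec_merge(const double *x, const double *y, unsigned mask, double *out)
{
    for (int i = 0; i < N; i++)
        out[i] = ((mask >> i) & 1) ? x[i] : y[i];
}

/* vec_select of vec_concat: idx[i] indexes the 2*N-lane concatenation x:y. */
void vec_select_concat(const double *x, const double *y, const int *idx,
                       double *out)
{
    for (int i = 0; i < N; i++) {
        assert(idx[i] >= 0 && idx[i] < 2 * N);
        out[i] = idx[i] < N ? x[idx[i]] : y[idx[i] - N];
    }
}

/* Hypothetical helper: rewrite a vec_merge mask as selector indices. */
void merge_mask_to_indices(unsigned mask, int *idx)
{
    for (int i = 0; i < N; i++)
        idx[i] = ((mask >> i) & 1) ? i : N + i;
}

/* Returns 1 if both representations give the same lanes for every mask. */
int merge_equals_select(void)
{
    const double x[N] = {1.0, 2.0};
    const double y[N] = {3.0, 4.0};
    for (unsigned mask = 0; mask < (1u << N); mask++) {
        double merged[N], selected[N];
        int idx[N];
        vec_merge(x, y, mask, merged);
        merge_mask_to_indices(mask, idx);
        vec_select_concat(x, y, idx, selected);
        for (int i = 0; i < N; i++)
            if (merged[i] != selected[i])
                return 0;
    }
    return 1;
}
```

The reverse direction does not hold in general: a vec_select of a vec_concat can move a lane to a different position, which a vec_merge alone cannot express. That asymmetry is one reason to prefer vec_select as the canonical form.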
[Bug rtl-optimization/94864] Failure to combine vunpckhpd+movsd into single vunpckhpd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-04-30
          Component|target                      |rtl-optimization
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Richard Biener ---
We're feeding combine with

(insn 7 4 9 2 (set (reg:DF 87)
        (vec_select:DF (reg:V2DF 90)
            (parallel [
                    (const_int 1 [0x1])
                ]))) "y.c":6:26 3195 {sse2_storehpd}
     (expr_list:REG_DEAD (reg:V2DF 90)
        (nil)))
(insn 9 7 14 2 (set (reg:V2DF 88 [ result ])
        (vec_merge:V2DF (vec_duplicate:V2DF (reg:DF 87))
            (reg:V2DF 89)
            (const_int 1 [0x1]))) "y.c":6:21 2918 {vec_setv2df_0}
     (expr_list:REG_DEAD (reg:V2DF 89)
        (expr_list:REG_DEAD (reg:DF 87)
            (nil))))

which makes

(set (reg:V2DF 88 [ result ])
    (vec_merge:V2DF (vec_duplicate:V2DF (vec_select:DF (reg:V2DF 90)
                (parallel [
                        (const_int 1 [0x1])
                    ])))
        (reg:V2DF 89)
        (const_int 1 [0x1])))

out of this, which does not match anything because x86 chooses to use
vec_merge in some and vec_select/vec_concat in other patterns:

(define_insn "*vec_interleave_highv2df"
  [(set (match_operand:V2DF 0 "nonimmediate_operand"     "=x,v,v,x,v,m")
        (vec_select:V2DF
          (vec_concat:V4DF
            (match_operand:V2DF 1 "nonimmediate_operand" " 0,v,o,o,o,v")
            (match_operand:V2DF 2 "nonimmediate_operand" " x,v,1,0,v,0"))
          (parallel [(const_int 1)
                     (const_int 3)])))]
  "TARGET_SSE2 && ix86_vec_interleave_v2df_operator_ok (operands, 1)"
  "@
   unpckhpd\t{%2, %0|%0, %2}

not sure if combine should try to exchange vec_merge for
vec_select/vec_concat or if this is simply the backend's fault (or GCC's
for even having two ways to express the same thing...)
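
To make the mismatch concrete, the following C sketch models the lane semantics of the expression combine produces and of the *vec_interleave_highv2df pattern. The struct v2df_model and the function names are assumptions for illustration only; the code mirrors the RTL shown in the comment and is not part of GCC. Under this model both forms compute {r90[1], r89[1]}, so they differ only in representation.

```c
#include <assert.h>

/* Stand-in for a V2DF register: two double lanes. */
typedef struct { double lane[2]; } v2df_model;

/* (vec_merge (vec_duplicate (vec_select r90 [1])) r89 (const_int 1)) */
v2df_model combined_form(v2df_model r90, v2df_model r89)
{
    const double scalar = r90.lane[1];        /* vec_select:DF ... [1] */
    const v2df_model dup = {{scalar, scalar}}; /* vec_duplicate:V2DF */
    const unsigned mask = 1;                  /* bit i set: lane i from dup */
    v2df_model out;
    for (int i = 0; i < 2; i++)
        out.lane[i] = ((mask >> i) & 1) ? dup.lane[i] : r89.lane[i];
    return out;
}

/* (vec_select (vec_concat op1 op2) (parallel [1 3])) */
v2df_model interleave_high_form(v2df_model op1, v2df_model op2)
{
    const double concat[4] = {op1.lane[0], op1.lane[1],
                              op2.lane[0], op2.lane[1]};
    v2df_model out = {{concat[1], concat[3]}};
    return out;
}

/* Checks on one sample input that both forms yield the same lanes. */
void check_same_result(void)
{
    const v2df_model r90 = {{1.0, 2.0}};
    const v2df_model r89 = {{3.0, 4.0}};
    const v2df_model a = combined_form(r90, r89);
    const v2df_model b = interleave_high_form(r90, r89);
    assert(a.lane[0] == 2.0 && a.lane[1] == 4.0);
    assert(b.lane[0] == a.lane[0] && b.lane[1] == a.lane[1]);
}
```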