[Bug tree-optimization/93080] insert of an extraction on the same location is not optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080 --- Comment #9 from Richard Biener --- The testcase in the description is fixed, comment#2 and comment#4 are not.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

--- Comment #8 from CVS Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:27de9aa152141e7f3ee66372647d0f2cd94c4b90

commit r14-3381-g27de9aa152141e7f3ee66372647d0f2cd94c4b90
Author: Richard Biener
Date:   Wed Jul 12 15:01:47 2023 +0200

    tree-optimization/94864 - vector insert of vector extract simplification

    The PRs ask for optimizing of

      _1 = BIT_FIELD_REF ;
      result_4 = BIT_INSERT_EXPR ;

    to a vector permutation.  The following implements this as
    match.pd pattern, improving code generation on x86_64.

    On the RTL level we face the issue that backend patterns inconsistently
    use vec_merge and vec_select of vec_concat to represent permutes.

    I think using a (supported) permute is almost always better
    than an extract plus insert, maybe excluding the case we extract
    element zero and that's aliased to a register that can be used
    directly for insertion (not sure how to query that).

    The patch FAILs one case in gcc.target/i386/avx512fp16-vmovsh-1a.c
    where we now expand from

      __A_28 = VEC_PERM_EXPR ;

    instead of

      _28 = BIT_FIELD_REF ;
      __A_29 = BIT_INSERT_EXPR ;

    producing a vpblendw instruction instead of the expected vmovsh.
    That's either a missed vec_perm_const expansion optimization or
    even better, an improvement - Zen4 for example has 4 ports to
    execute vpblendw but only 3 for executing vmovsh and both
    instructions have the same size.

    The patch XFAILs the sub-testcase.

            PR tree-optimization/94864
            PR tree-optimization/94865
            PR tree-optimization/93080
            * match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
            for vector insertion from vector extraction.

            * gcc.target/i386/pr94864.c: New testcase.
            * gcc.target/i386/pr94865.c: Likewise.
            * gcc.target/i386/avx512fp16-vmovsh-1a.c: XFAIL.
            * gcc.dg/tree-ssa/forwprop-40.c: Likewise.
            * gcc.dg/tree-ssa/forwprop-41.c: Likewise.
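For readers following the thread, here is a minimal sketch of the source-level shape the commit message talks about. The operands of the GIMPLE statements are elided in the message, so the lane numbers below are an assumption: the sketch extracts lane 1 of `b` and inserts it into lane 1 of `a`. The function names and the `v4si` typedef are hypothetical. `__builtin_shuffle` is the GCC spelling of a two-input permute and `__builtin_shufflevector` the Clang one; in both, indices 0..3 select from the first vector and 4..7 from the second.

```c
/* GCC/Clang vector extension: four ints in one vector. */
typedef int v4si __attribute__((vector_size(4 * sizeof(int))));

/* Extract plus insert: read lane 1 of b, write it into lane 1 of a.
   (Lane choice is an assumption for illustration.) */
v4si insert_of_extract(v4si a, v4si b)
{
  int t = b[1];
  a[1] = t;
  return a;
}

/* The same result written as a single two-input permute.
   Index 5 means "lane 1 of the second vector". */
v4si as_permute(v4si a, v4si b)
{
#if defined(__clang__)
  return __builtin_shufflevector(a, b, 0, 5, 2, 3);
#else
  return __builtin_shuffle(a, b, (v4si){0, 5, 2, 3});
#endif
}
```

Both functions should return the same vector for any input. Whether the compiler actually emits a permute for the first one depends on the GCC version and on the target supporting that permute.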
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

--- Comment #7 from Richard Biener ---
comment#4 could be implemented by an associating pattern in match.pd.
Currently we get

  a_3 = BIT_INSERT_EXPR ;
  a_4 = VEC_PERM_EXPR ;

The pattern would associate a VEC_PERM_EXPR when a or b are defined as
an insertion into b or a respectively, so that we get a permute of either
a or b with itself (and in this case it's a noop permute).

Of course, with an arbitrary sequence of inserts / extracts / permutes, more
"generic" association would be necessary, and a pure implementation in
match.pd looks difficult.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Assignee|pinskia at gcc dot gnu.org  |unassigned at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.2                        |---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.0                        |11.2

--- Comment #6 from Jakub Jelinek ---
GCC 11.1 has been released, retargeting bugs to GCC 11.2.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

--- Comment #5 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #3)
> I have not looked into how to fix these regressions though.

So the problem here is that PRE decides that the BIT_INSERT_EXPR is partially
redundant and moves it inside the branch. Since I have decided match.pd is
not the way to go for this one, I no longer run into this regression.
Also, I now have a patch which fixes comment #4.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

--- Comment #4 from Andrew Pinski ---
Note the patch also does not handle the following (or something a little more
complex):

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */

#define vector __attribute__((__vector_size__(4*sizeof(int)) ))

vector int g(vector int a, int c)
{
  int b = a[2];
  a[1] = c;
  a[2] = b;
  return a;
}

/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 0 "optimized" } } */
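To spell out what the two scan-tree-dump directives are asking for, here is a small sketch that puts the comment #4 function next to the form it would ideally reduce to. `g_expected` and the `v4si` typedef are hypothetical names introduced here; they are not part of the testcase. Writing lane 2 back with the value just read from it is a no-op, even with the unrelated store to lane 1 in between, so only one insert should remain.

```c
/* GCC/Clang vector extension: four ints in one vector. */
typedef int v4si __attribute__((vector_size(4 * sizeof(int))));

/* Shape of the comment #4 testcase: lane 2 is read, lane 1 is
   overwritten, then lane 2 is written back unchanged. */
v4si g(v4si a, int c)
{
  int b = a[2];
  a[1] = c;
  a[2] = b;
  return a;
}

/* Hypothetical reduced form: a single insert and no extract,
   which is what the dump scans look for. */
v4si g_expected(v4si a, int c)
{
  a[1] = c;
  return a;
}
```

The two functions compute the same vector. The open question in the thread is only whether GCC recognizes that at the GIMPLE level.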
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

--- Comment #3 from Andrew Pinski ---
(In reply to Richard Biener from comment #2)
> It's a bugfix so applicable for stage3... pre-approved with a testcase.

Except it causes a regression on x86_64:

FAIL: gcc.target/i386/pr54855-8.c scan-assembler-not movsd
FAIL: gcc.target/i386/pr54855-8.c scan-assembler-times maxsd 1
FAIL: gcc.target/i386/pr54855-9.c scan-assembler-not movss
FAIL: gcc.target/i386/pr54855-9.c scan-assembler-times minss 1

I have not looked into how to fix these regressions though.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

--- Comment #2 from Richard Biener ---
It's a bugfix so applicable for stage3... pre-approved with a testcase.

wonder how complicated it would be to handle

vector int g(vector int a)
{
  int b = a[0];
  (*(vector unsigned int *)&a)[0] = b;
  return a;
}

or punned int/float.  Probably doesn't happen in practice though.
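As an illustration of why the punned variant is still a no-op on the bits, here is a sketch of the same idea. Note the swap: it uses a union instead of the pointer cast from the comment, purely to keep the example free of aliasing questions. That substitution, along with the names `g_punned`, `v4si` and `v4su`, is an assumption of this sketch and not what the comment wrote.

```c
/* GCC/Clang vector extensions: signed and unsigned 4 x int vectors. */
typedef int v4si __attribute__((vector_size(4 * sizeof(int))));
typedef unsigned int v4su __attribute__((vector_size(4 * sizeof(int))));

/* Read lane 0 as a signed int, then store it back through an
   unsigned view of the same vector. The bit pattern of lane 0 is
   unchanged, so the result equals the input. */
v4si g_punned(v4si a)
{
  union { v4si s; v4su u; } pun;  /* union stands in for the pointer cast */
  int b = a[0];
  pun.s = a;
  pun.u[0] = (unsigned int)b;
  return pun.s;
}
```

The extract and the insert use different element types here, so a pattern that requires matching types on both sides would presumably not fire without extra handling.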
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2019-12-27
   Target Milestone|---                         |11.0
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski ---
Mine for GCC 11.
Patch:
diff --git a/gcc/match.pd b/gcc/match.pd
index 84a62ef..c81c5ea 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5682,6 +5682,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@1)))
    (convert @1)))

+/* Inserting in the same value as extracted is just the original value.  */
+(simplify
+ (bit_insert @0 (BIT_FIELD_REF @0 @1 @2) @2)
+ @0)
+
 /* bit_insert<@0 convert:@1<@2> @3> -> bit_insert<@0 @2 @3>
    iff @1 was in the correct precision already and is an insert for integral type.  */
 (simplify
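For context, a sketch of the kind of C input this pattern is aimed at: an element is extracted from a vector and written straight back to the same position of the same vector. The testcase from the bug description is not quoted in this thread, so the function below is an assumed stand-in. `reinsert_same_lane`, `v4si_equal` and `v4si` are hypothetical names.

```c
/* GCC/Clang vector extension: four ints in one vector. */
typedef int v4si __attribute__((vector_size(4 * sizeof(int))));

/* Extract lane 1 and insert it back at the same position.
   With the proposed pattern this should simplify to "return a". */
v4si reinsert_same_lane(v4si a)
{
  int b = a[1];
  a[1] = b;
  return a;
}

/* Hypothetical helper: lane-by-lane comparison, 1 when all equal. */
int v4si_equal(v4si x, v4si y)
{
  for (int i = 0; i < 4; i++)
    if (x[i] != y[i])
      return 0;
  return 1;
}
```

In the pattern, `@0` appears both as the vector being inserted into and as the vector being extracted from, and `@2` is the shared bit position. That is why it only covers the same-vector, same-location case and not the variants raised in the later comments.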