[Bug tree-optimization/93080] insert of an extraction on the same location is not optimized

2023-08-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

--- Comment #9 from Richard Biener  ---
The testcase in the description is fixed; the cases in comment #2 and comment #4 are not.

2023-08-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:27de9aa152141e7f3ee66372647d0f2cd94c4b90

commit r14-3381-g27de9aa152141e7f3ee66372647d0f2cd94c4b90
Author: Richard Biener 
Date:   Wed Jul 12 15:01:47 2023 +0200

tree-optimization/94864 - vector insert of vector extract simplification

The PRs ask for optimizing of

  _1 = BIT_FIELD_REF ;
  result_4 = BIT_INSERT_EXPR ;

to a vector permutation.  The following implements this as a
match.pd pattern, improving code generation on x86_64.

On the RTL level we face the issue that backend patterns inconsistently
use vec_merge and vec_select of vec_concat to represent permutes.

I think using a (supported) permute is almost always better
than an extract plus insert, maybe excluding the case where we extract
element zero and that's aliased to a register that can be used
directly for insertion (not sure how to query that).

The patch FAILs one case in gcc.target/i386/avx512fp16-vmovsh-1a.c
where we now expand from

 __A_28 = VEC_PERM_EXPR ;

instead of

 _28 = BIT_FIELD_REF ;
 __A_29 = BIT_INSERT_EXPR ;

producing a vpblendw instruction instead of the expected vmovsh.  That's
either a missed vec_perm_const expansion optimization or even better,
an improvement - Zen4 for example has 4 ports to execute vpblendw
but only 3 for executing vmovsh and both instructions have the same size.

The patch XFAILs the sub-testcase.

PR tree-optimization/94864
PR tree-optimization/94865
PR tree-optimization/93080
* match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
for vector insertion from vector extraction.

* gcc.target/i386/pr94864.c: New testcase.
* gcc.target/i386/pr94865.c: Likewise.
* gcc.target/i386/avx512fp16-vmovsh-1a.c: XFAIL.
* gcc.dg/tree-ssa/forwprop-40.c: Likewise.
* gcc.dg/tree-ssa/forwprop-41.c: Likewise.
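
As an illustration of the transform described in the commit message, the
sketch below writes a lane extract followed by a lane insert in C using
GCC's vector extensions, next to the equivalent two-input permute.  The
type and function names (v2df, insert_of_extract, as_permute,
check_insert_of_extract) and the chosen lanes are assumptions made for
illustration; they are not taken from the commit or its testcases.

```c
#include <assert.h>

typedef double v2df __attribute__((vector_size(16)));
typedef long long v2di __attribute__((vector_size(16)));

/* Extract lane 1 of b and insert it into lane 0 of a
   (a BIT_FIELD_REF feeding a BIT_INSERT_EXPR in GIMPLE terms). */
v2df insert_of_extract(v2df a, v2df b)
{
  v2df result = a;
  result[0] = b[1];
  return result;
}

/* The same result as one permute of the two inputs: selector
   indices 0..1 name lanes of a, 2..3 name lanes of b. */
v2df as_permute(v2df a, v2df b)
{
#ifdef __clang__
  return __builtin_shufflevector(a, b, 3, 1);
#else
  v2di sel = { 3, 1 };
  return __builtin_shuffle(a, b, sel);
#endif
}

/* Hypothetical self-check: both forms agree lane by lane. */
void check_insert_of_extract(void)
{
  v2df a = { 1.0, 2.0 }, b = { 3.0, 4.0 };
  v2df x = insert_of_extract(a, b);
  v2df y = as_permute(a, b);
  assert(x[0] == 4.0 && x[1] == 2.0);
  assert(y[0] == x[0] && y[1] == x[1]);
}
```

Whether the permute form is cheaper than the extract plus insert depends
on the target, which is the trade-off the commit message discusses.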

2023-08-21 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

--- Comment #7 from Richard Biener  ---
Comment #4 could be implemented by an associating pattern in match.pd.  Currently
we get

  a_3 = BIT_INSERT_EXPR ;
  a_4 = VEC_PERM_EXPR ;

associating a VEC_PERM_EXPR  when a or b are defined as
insertion into b or a respectively so we get a permute of either a
or b with itself (and in this case it's a noop permute).

Of course with an arbitrary sequence of inserts / extracts / permutes
more "generic" association would be necessary and a pure implementation
in match.pd looks difficult.
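
A minimal sketch of why the permute is a no-op in this case, written in C
with GCC's vector extensions.  The operands of the two GIMPLE statements
are not visible above, so the selector { 0, 1, 6, 3 } (lane 2 taken from
the original vector, everything else from the vector after insertion) is
an assumption, as are the names v4si, insert_then_permute, insert_only
and check_noop_permute.

```c
#include <assert.h>

typedef int v4si __attribute__((vector_size(16)));

/* Insert c into lane 1, then permute the result with the original
   vector so that lane 2 comes from the original a. */
v4si insert_then_permute(v4si a, int c)
{
  v4si a3 = a;
  a3[1] = c;
  /* Selector indices 0..3 name lanes of a3, 4..7 name lanes of a,
     so index 6 is a[2]. */
#ifdef __clang__
  return __builtin_shufflevector(a3, a, 0, 1, 6, 3);
#else
  v4si sel = { 0, 1, 6, 3 };
  return __builtin_shuffle(a3, a, sel);
#endif
}

/* The insertion alone.  a3 differs from a only in lane 1, so taking
   lane 2 from a instead of a3 changes nothing. */
v4si insert_only(v4si a, int c)
{
  a[1] = c;
  return a;
}

/* Hypothetical self-check: the permute adds nothing to the insert. */
void check_noop_permute(void)
{
  v4si a = { 10, 20, 30, 40 };
  v4si x = insert_then_permute(a, 99);
  v4si y = insert_only(a, 99);
  for (int i = 0; i < 4; i++)
    assert(x[i] == y[i]);
}
```

Once the second permute operand is rewritten in terms of the first, the
selector becomes the identity { 0, 1, 2, 3 } on a single vector, which is
the association the comment describes.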

2023-07-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Assignee|pinskia at gcc dot gnu.org  |unassigned at gcc dot gnu.org

2021-07-28 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.2                        |---

2021-04-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.0                        |11.2

--- Comment #6 from Jakub Jelinek  ---
GCC 11.1 has been released, retargeting bugs to GCC 11.2.

2020-01-13 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

--- Comment #5 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #3)
> I have not looked into how to fix these regressions though.

So the problem here is that PRE decides that the BIT_INSERT_EXPR is partially
redundant and moves it inside the branch.

Since I have decided match.pd is not the way to go for this one, I no longer
run into this regression.  Also, I now have a patch which fixes comment #4.

2020-01-11 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

--- Comment #4 from Andrew Pinski  ---
Note the patch also does not handle the following (or something a little more
complex):

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */
#define vector __attribute__((__vector_size__(4*sizeof(int)) ))

vector int g(vector int a, int c)
{
  int b = a[2];
  a[1] = c;
  a[2] = b;
  return a;
}

/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 0 "optimized" } } */

2020-01-11 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

--- Comment #3 from Andrew Pinski  ---
(In reply to Richard Biener from comment #2)
> It's a bugfix so applicable for stage3...  pre-approved with a testcase.

Except it causes a regression on x86_64:
FAIL: gcc.target/i386/pr54855-8.c scan-assembler-not movsd
FAIL: gcc.target/i386/pr54855-8.c scan-assembler-times maxsd 1
FAIL: gcc.target/i386/pr54855-9.c scan-assembler-not movss
FAIL: gcc.target/i386/pr54855-9.c scan-assembler-times minss 1

I have not looked into how to fix these regressions though.

2020-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

--- Comment #2 from Richard Biener  ---
It's a bugfix so applicable for stage3...  pre-approved with a testcase.

I wonder how complicated it would be to handle

vector int g(vector int a)
{
  int b = a[0];
  (*(vector unsigned int *)&a)[0] = b;
  return a;
}

or punned int/float.  Probably doesn't happen in practice though.
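
For reference, a small sketch that puts the plain same-lane round trip
next to the punned variant above and checks that both return the input
unchanged.  The function names (same_lane_roundtrip, punned_roundtrip,
check_roundtrips) are hypothetical, and the plain variant is an assumed
form of the simple case rather than a copy of the testcase from the bug
description.

```c
#include <assert.h>
#include <string.h>

#define vector __attribute__((__vector_size__(4*sizeof(int))))

/* Extract lane 0 and insert it back at the same position. */
vector int same_lane_roundtrip(vector int a)
{
  int b = a[0];
  a[0] = b;
  return a;
}

/* The variant from the comment: the insert goes through a pointer
   to a vector of unsigned int, so the extracted and inserted element
   types differ in signedness. */
vector int punned_roundtrip(vector int a)
{
  int b = a[0];
  (*(vector unsigned int *)&a)[0] = b;
  return a;
}

/* Hypothetical self-check: both functions are the identity. */
void check_roundtrips(void)
{
  vector int a = { -1, 2, -3, 4 };
  vector int x = same_lane_roundtrip(a);
  vector int y = punned_roundtrip(a);
  assert(memcmp(&x, &a, sizeof a) == 0);
  assert(memcmp(&y, &a, sizeof a) == 0);
}
```

Both functions compute the identity at run time; the open question in the
comment is only whether the optimizer recognizes the punned form as such.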

2019-12-27 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93080

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2019-12-27
   Target Milestone|---                         |11.0
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski  ---
Mine for GCC 11.

Patch:
diff --git a/gcc/match.pd b/gcc/match.pd
index 84a62ef..c81c5ea 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5682,6 +5682,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@1)))
   (convert @1)))

+/* Inserting in the same value as extracted is just the original value. */
+(simplify
+ (bit_insert @0 (BIT_FIELD_REF @0 @1 @2) @2)
+ @0)
+
 /* bit_insert<@0 convert:@1<@2> @3> -> bit_insert<@0 @2 @3> iff @1 was in
the correct precision already and is an insert for integral type. */
 (simplify