[Bug tree-optimization/58497] SLP vectorizes identical operations

2021-08-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

Andrew Pinski  changed:

           What    |Removed                        |Added

           Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org
             Status|NEW                            |ASSIGNED

--- Comment #13 from Andrew Pinski  ---
Mine for GCC 13, I have patches which turn:

  W_6 = BIT_INSERT_EXPR ;
  W_7 = BIT_INSERT_EXPR ;
  W_8 = BIT_INSERT_EXPR ;
  W_9 = BIT_INSERT_EXPR ;
Into:
W_9 = {_2,_2,_2,_2};

This improvement deals with bitfields, but vectors have a similar issue with
BIT_INSERT_EXPRs, so I deal with it there.
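
For readers following along at the source level, here is a minimal sketch of
the two forms being discussed. It assumes the float4 typedef from comment #1
and GCC's vector extensions; the function names are made up for illustration
and are not taken from any patch. The first function writes the same scalar
into every lane one element at a time, which is the kind of code that shows up
as the chain of BIT_INSERT_EXPRs in the dump from comment #12. The second
builds the splat directly.

```c
typedef float float4 __attribute__((vector_size(16)));

/* Element-wise form: four stores of the same scalar, one per lane.
   W is zero-initialized here only to keep the example warning-free.  */
float4 splat_by_inserts(int x)
{
  float4 W = {0.0f, 0.0f, 0.0f, 0.0f};
  float v = (float)(x + 1);
  W[0] = v;
  W[1] = v;
  W[2] = v;
  W[3] = v;
  return W;
}

/* Splat form: a single constructor with the scalar repeated in each lane.  */
float4 splat_by_constructor(int x)
{
  float v = (float)(x + 1);
  return (float4){v, v, v, v};
}
```

Both functions return the same value for any x whose increment does not
overflow; the bug is about getting the compiler to treat the first like the
second.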

[Bug tree-optimization/58497] SLP vectorizes identical operations

2018-10-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

Richard Biener  changed:

           What    |Removed                     |Added

             Status|ASSIGNED                    |NEW
           Assignee|rguenth at gcc dot gnu.org  |unassigned at gcc dot gnu.org

--- Comment #12 from Richard Biener  ---
We now generate

g:
.LFB0:
.cfi_startproc
pxor    %xmm1, %xmm1
addl    $1, %edi
movaps  %xmm1, %xmm0
cvtsi2ss %edi, %xmm0
shufps  $36, %xmm0, %xmm1
movaps  %xmm1, %xmm0
cvtsi2ss %edi, %xmm0
shufps  $196, %xmm0, %xmm1
movaps  %xmm1, %xmm0
unpcklps %xmm1, %xmm0
cvtsi2ss %edi, %xmm0
shufps  $225, %xmm1, %xmm0
cvtsi2ss %edi, %xmm0
ret

or with SSE4

g:
.LFB0:
.cfi_startproc
addl    $1, %edi
pxor    %xmm1, %xmm1
pxor    %xmm0, %xmm0
cvtsi2ss %edi, %xmm1
insertps $48, %xmm1, %xmm0
insertps $32, %xmm1, %xmm0
insertps $16, %xmm1, %xmm0
movss   %xmm1, %xmm0
ret

on GIMPLE we end up with

g (int x)
{
  float4 W;
  int _1;
  float _2;

   [local count: 1073741824]:
  _1 = x_3(D) + 1;
  _2 = (float) _1;
  W_6 = BIT_INSERT_EXPR ;
  W_7 = BIT_INSERT_EXPR ;
  W_8 = BIT_INSERT_EXPR ;
  W_9 = BIT_INSERT_EXPR ;
  return W_9;

so we fail to recognize the splat.  The GIMPLE looks like this very early
already (update-address-taken + forwprop).  SLP vectorization
doesn't treat a BIT_INSERT_EXPR "reduction" as a sink, but we could probably
pattern-match a VEC_DUPLICATE_EXPR for the above.
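
To make the pattern-matching idea concrete, the following is a toy sketch in
plain C, not GCC code. The struct lane_insert and the function chain_is_splat
are hypothetical stand-ins: each record models one insert in the chain as
"write value number value_id into lane lane". The check succeeds only when
every lane of the vector is written and every insert uses the same value,
which is the condition under which the chain could be replaced by a splat.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one insert in the chain (not a GCC data type).  */
struct lane_insert {
  unsigned lane;   /* lane index being written */
  int value_id;    /* identifies the scalar being inserted, e.g. 2 for _2 */
};

/* Return true if the chain writes all NLANES lanes with one and the same
   value; on success store that value's id in *SPLAT_ID.  */
bool chain_is_splat(const struct lane_insert *chain, size_t n,
                    unsigned nlanes, int *splat_id)
{
  if (n == 0 || nlanes == 0 || nlanes > 64)
    return false;

  unsigned long long seen = 0;
  int id = chain[0].value_id;

  for (size_t i = 0; i < n; i++) {
    if (chain[i].lane >= nlanes || chain[i].value_id != id)
      return false;
    seen |= 1ULL << chain[i].lane;
  }

  /* Every lane must have been written at least once.  */
  unsigned long long all = (nlanes == 64) ? ~0ULL : (1ULL << nlanes) - 1;
  if (seen != all)
    return false;

  *splat_id = id;
  return true;
}
```

A real implementation would walk SSA definitions rather than an array, and
would have to decide what to do with partially covered vectors; this sketch
simply rejects them.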

[Bug tree-optimization/58497] SLP vectorizes identical operations

2015-11-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

--- Comment #11 from Richard Biener  ---
Author: rguenth
Date: Thu Nov 12 09:00:37 2015
New Revision: 230216

URL: https://gcc.gnu.org/viewcvs?rev=230216=gcc=rev
Log:
2015-11-12  Richard Biener  

PR tree-optimization/58497
* tree-vect-generic.c: Include gimplify.h.
(tree_vec_extract): Lookup constant/constructor DEFs.
(do_cond): Unshare cond.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-vect-generic.c

[Bug tree-optimization/58497] SLP vectorizes identical operations

2015-11-11 Thread ro at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

Rainer Orth  changed:

   What|Removed |Added

 CC||ro at gcc dot gnu.org

--- Comment #5 from Rainer Orth  ---
Created attachment 36685
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36685=edit
-fdump-tree-optimized dump

[Bug tree-optimization/58497] SLP vectorizes identical operations

2015-11-11 Thread ro at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

--- Comment #6 from Rainer Orth  ---
The new gcc.dg/tree-ssa/vector-5.c testcase FAILs on 64-bit Solaris/SPARC:

FAIL: gcc.dg/tree-ssa/vector-5.c scan-tree-dump-times optimized " * 3;" 1

  Rainer

[Bug tree-optimization/58497] SLP vectorizes identical operations

2015-11-11 Thread ro at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

--- Comment #8 from Rainer Orth  ---
Created attachment 36687
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36687=edit
-fdump-tree-dom2-details dump

[Bug tree-optimization/58497] SLP vectorizes identical operations

2015-11-11 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

--- Comment #9 from rguenther at suse dot de  ---
On Wed, 11 Nov 2015, ro at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497
> 
> --- Comment #8 from Rainer Orth  ---
> Created attachment 36687
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36687=edit
> -fdump-tree-dom2-details dump

Ok, it's not supposed to look like this after lowering.  Does SPARC
not have an integer multiply instruction (SImode)?  Then the
FAIL is expected (though folding halfway does the transform anyway...).

[Bug tree-optimization/58497] SLP vectorizes identical operations

2015-11-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

--- Comment #10 from Richard Biener  ---
Index: gcc/tree-vect-generic.c
===
--- gcc/tree-vect-generic.c (revision 230146)
+++ gcc/tree-vect-generic.c (working copy)
@@ -105,6 +106,15 @@ static inline tree
 tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
                   tree t, tree bitsize, tree bitpos)
 {
+  if (TREE_CODE (t) == SSA_NAME)
+    {
+      gimple *def_stmt = SSA_NAME_DEF_STMT (t);
+      if (is_gimple_assign (def_stmt)
+          && (gimple_assign_rhs_code (def_stmt) == VECTOR_CST
+              || (bitpos
+                  && gimple_assign_rhs_code (def_stmt) == CONSTRUCTOR)))
+        t = gimple_assign_rhs1 (def_stmt);
+    }
   if (bitpos)
     {
       if (TREE_CODE (type) == BOOLEAN_TYPE)

should fix it (in testing).

[Bug tree-optimization/58497] SLP vectorizes identical operations

2015-11-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

--- Comment #7 from Richard Biener  ---
(In reply to Rainer Orth from comment #6)
> The new gcc.dg/tree-ssa/vector-5.c testcase FAILs on 64-bit Solaris/SPARC:
> 
> FAIL: gcc.dg/tree-ssa/vector-5.c scan-tree-dump-times optimized " * 3;" 1
> 
>   Rainer

  :
  v1_2 = {i_1(D), i_1(D), i_1(D), i_1(D)};
  _6 = i_1(D);
  _7 = i_1(D) * 3;
  _8 = i_1(D);
  _9 = i_1(D) * 3;
  _10 = i_1(D);
  _11 = i_1(D) * 3;
  _12 = i_1(D);
  _13 = i_1(D) * 3;
  _3 = {_7, _9, _11, _13};

err, why would DOM, which runs after lower_vector_ssa, _not_ CSE those
multiplications?  Please attach dom2-details dumps.

[Bug tree-optimization/58497] SLP vectorizes identical operations

2015-10-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

--- Comment #3 from Richard Biener  ---
Author: rguenth
Date: Thu Oct 22 13:36:46 2015
New Revision: 229173

URL: https://gcc.gnu.org/viewcvs?rev=229173=gcc=rev
Log:
2015-10-22  Richard Biener  

PR tree-optimization/58497
* tree-vect-generic.c (ssa_uniform_vector_p): New helper.
(expand_vector_operations_1): Use it.  Lower operations on
all uniform vectors to scalar operations if the HW supports it.

* gcc.dg/tree-ssa/vector-5.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/vector-5.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-generic.c

--- Comment #4 from Richard Biener  ---
Now we fix this up in veclower; still, the bug should be addressed in SLP
directly (also because it affects cost decisions).
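
As an illustration of what "lower operations on all uniform vectors to scalar
operations" means at the source level, here is a small sketch using GCC's
vector extensions. It is written for this note and is not the contents of
gcc.dg/tree-ssa/vector-5.c; the type and function names are assumptions, and
it assumes a 4-byte int. When both operands of a vector multiply are uniform,
the result equals one scalar multiply broadcast to all lanes.

```c
typedef int v4si __attribute__((vector_size(16)));

/* Vector form: multiply two uniform vectors lane by lane.  */
v4si mul_uniform_vector(int i)
{
  v4si v1 = {i, i, i, i};
  v4si three = {3, 3, 3, 3};
  return v1 * three;
}

/* Lowered form: one scalar multiply, then a splat of the result.  */
v4si mul_scalar_then_splat(int i)
{
  int s = i * 3;
  return (v4si){s, s, s, s};
}
```

The second form needs a single multiplication instead of four identical ones,
which is also why leaving the vector form in place can distort cost decisions.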


[Bug tree-optimization/58497] SLP vectorizes identical operations

2013-09-23 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed                        |Added

           Keywords|                               |missed-optimization
             Target|                               |x86_64-*-*
             Status|UNCONFIRMED                    |ASSIGNED
   Last reconfirmed|                               |2013-09-23
         Depends on|                               |53947
           Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
     Ever confirmed|0                              |1

--- Comment #1 from Richard Biener rguenth at gcc dot gnu.org ---
Heh ;)  I suppose this started with BIT_FIELD_REF support in SLP, 4.8 didn't
vectorize this at all.

Note that with for example

typedef float float4 __attribute__((vector_size(16)));

float4 g(int x)
{
  float4 W;
  W[0]=W[1]=x+1;
  W[2]=x+2;
  W[3]=x+3;
  return W;
}

vectorizing two identical operations may be profitable.  But yes, if all
scalars are the same there is no point in doing it.  And the cost model
should have disabled it as well (though likely the four stores
made it profitable in the end).

I will have a look at some point.

OTOH generated code is

g:
.LFB0:
.cfi_startproc
movl    %edi, -12(%rsp)
movd    -12(%rsp), %xmm1
pshufd  $0, %xmm1, %xmm0
paddd   .LC0(%rip), %xmm0
cvtdq2ps %xmm0, %xmm0
ret

vs. -fno-tree-vectorize:

g:
.LFB0:
.cfi_startproc
xorps   %xmm1, %xmm1
addl    $1, %edi
xorps   %xmm0, %xmm0
cvtsi2ss %edi, %xmm1
movaps  %xmm0, %xmm2
movss   %xmm1, %xmm2
shufps  $36, %xmm2, %xmm0
movaps  %xmm0, %xmm2
movss   %xmm1, %xmm2
shufps  $196, %xmm2, %xmm0
movaps  %xmm0, %xmm2
unpcklps %xmm0, %xmm0
movss   %xmm1, %xmm0
shufps  $225, %xmm2, %xmm0
movss   %xmm1, %xmm0
ret

so clearly a win, but improvable to something like

addl    $1, %edi
cvtsi2ss %edi, %xmm1
pshufd  $0, %xmm1, %xmm0

the above also shows that vector init by BIT_FIELD_REF is not expanded
very well (something for a generalized vector shuffle recognition in the
bswap pass).


[Bug tree-optimization/58497] SLP vectorizes identical operations

2013-09-23 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

--- Comment #2 from Richard Biener rguenth at gcc dot gnu.org ---
Created attachment 30884
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30884&action=edit
prototype patch

A quick check shows generated code will be

g:
.LFB0:
.cfi_startproc
xorps   %xmm0, %xmm0
addl    $1, %edi
cvtsi2ss %edi, %xmm0
shufps  $0, %xmm0, %xmm0
ret

and the patch shows possible issues with finding an insert location for
the init stmt (otherwise external is just outside of the current
basic-block).