[Bug tree-optimization/87008] [8/9 Regression] gimple mem-to-mem assignment badly optimized

2019-02-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87008

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|8.3 |8.4

--- Comment #6 from Jakub Jelinek  ---
GCC 8.3 has been released.

[Bug tree-optimization/87008] [8/9 Regression] gimple mem-to-mem assignment badly optimized

2019-02-22 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87008

--- Comment #7 from Martin Jambor  ---
(In reply to Marc Glisse from comment #1)
> struct A { double a, b; };
> struct B : A {};
> template<class T> void cp(T&a,T const&b){a=b;}
> double f(B x){
>   B y; cp(y,x);
>   B z; cp(z,x);
>   return y.a - z.a;
> }
> 
> This is not quite equivalent because RTL manages to optimize this case, but
> gimple, with -Ofast, still gets the ugly:
> 
>   ISRA.1 = MEM[(const struct A &)&x];
>   SR.9_9 = MEM[(struct A *)&ISRA.1];
>   ISRA.1 = MEM[(const struct A &)&x];
>   SR.8_10 = MEM[(struct A *)&ISRA.1];
>   _3 = SR.9_9 - SR.8_10;
>   return _3;
> 
> Writing cp<A> instead of cp<B> makes it work, and the main difference starts
> in SRA. I expect (didn't check) this is another victim of r255510.

Indeed, with the proof-of-concept patch from PR 85762, this gets optimized
into:

   [local count: 1073741824]:
  SR.7_2 = MEM[(const struct A &)&x];
  _3 = SR.7_2 - SR.7_2;
  return _3;

with -O2, and to return 0 with -Ofast.

[Bug tree-optimization/87008] [8/9 Regression] gimple mem-to-mem assignment badly optimized

2019-03-10 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87008

--- Comment #8 from Martin Jambor  ---
Author: jamborm
Date: Sun Mar 10 16:20:06 2019
New Revision: 269556

URL: https://gcc.gnu.org/viewcvs?rev=269556&root=gcc&view=rev
Log:
Make SRA less strict with memcpy performing MEM_REFs

2019-03-10  Martin Jambor  

PR tree-optimization/85762
PR tree-optimization/87008
PR tree-optimization/85459
* tree-sra.c (contains_vce_or_bfcref_p): New parameter, set the bool
it points to if there is a type changing MEM_REF.  Adjust all callers.
(build_accesses_from_assign): Disable total scalarization if
contains_vce_or_bfcref_p returns true through the new parameter, for
both rhs and lhs.

testsuite/
* g++.dg/tree-ssa/pr87008.C: New test.
* gcc.dg/guality/pr54970.c: Xfail tests querying a[0] everywhere.


Added:
trunk/gcc/testsuite/g++.dg/tree-ssa/pr87008.C
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/guality/pr54970.c
trunk/gcc/tree-sra.c
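
For reference, here is a sketch of what a test in the spirit of the new
g++.dg/tree-ssa/pr87008.C could look like.  This is hypothetical, assembled
from the reproducer in comment #1 and the -O2 dump shown in comment #7 for the
proof-of-concept patch; the committed test, its options, and its dump scan may
well differ.

// Hypothetical sketch, not the committed g++.dg/tree-ssa/pr87008.C.
// { dg-do compile }
// { dg-options "-O2 -fdump-tree-optimized" }

struct A { double a, b; };
struct B : A {};
template<class T> void cp(T &a, T const &b) { a = b; }

double f(B x)
{
  B y; cp(y, x);
  B z; cp(z, x);
  return y.a - z.a;
}

// If the committed fix removes the ISRA.* aggregate temporaries the way the
// proof-of-concept dump in comment #7 does, a scan along these lines could
// verify it:
// { dg-final { scan-tree-dump-not "ISRA" "optimized" } }
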

[Bug tree-optimization/87008] [8/9 Regression] gimple mem-to-mem assignment badly optimized

2019-04-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87008

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek  ---
Is this now fixed on the trunk with r269556?

[Bug tree-optimization/87008] [8/9 Regression] gimple mem-to-mem assignment badly optimized

2018-12-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87008

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

--- Comment #5 from Richard Biener  ---
So on trunk ESRA does some pointless stuff at least:

:
   ISRA.2 = MEM[(const struct A &)&x];
+  SR.7_4 = MEM[(struct A *)&ISRA.2];
+  SR.8_5 = MEM[(struct A *)&ISRA.2 + 8B];
+  MEM[(struct A *)&ISRA.2] = SR.7_4;
+  MEM[(struct A *)&ISRA.2 + 8B] = SR.8_5;
   MEM[(struct A *)&y] = ISRA.2;
-  ISRA.2 ={v} {CLOBBER};
+  y$D2304$a_17 = MEM[(struct A *)&y];
   ISRA.2 = MEM[(const struct A &)&x];
+  SR.5_20 = MEM[(struct A *)&ISRA.2];
+  SR.6_21 = MEM[(struct A *)&ISRA.2 + 8B];
+  MEM[(struct A *)&ISRA.2] = SR.5_20;
+  MEM[(struct A *)&ISRA.2 + 8B] = SR.6_21;
   MEM[(struct A *)&z] = ISRA.2;
-  ISRA.2 ={v} {CLOBBER};
-  _1 = y.D.2304.a;
-  _2 = z.D.2304.a;
+  z$D2304$a_24 = MEM[(struct A *)&z];
+  _1 = y$D2304$a_17;
+  _2 = z$D2304$a_24;
   _6 = _1 - _2;

where the main grief is caused by IPA SRA which replaces

  cp (&y.D.2304, &x.D.2304);
  cp (&z.D.2304, &x.D.2304);
  _1 = y.D.2304.a;
  _2 = z.D.2304.a;
  _6 = _1 - _2;

with

  _Z2cpI1AEvRT_RKS1_.isra.0 (&y, MEM[(const struct A &)&x]);
  _Z2cpI1AEvRT_RKS1_.isra.0 (&z, MEM[(const struct A &)&x]);
  _1 = y.D.2304.a;
  _2 = z.D.2304.a;
  _6 = _1 - _2;

replacing the by-reference second argument by an aggregate by-value
argument.

I think that's unwarranted - Martin, can you see if there's a simple logic
error here that can be rectified?  The same behavior happens when
the second parameter is not declared const.
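
(A hedged source-level picture of that transformation: the .isra.0 clone above
corresponds roughly to a helper like the one below, where the name cp_isra is
made up purely for illustration.)

struct A { double a, b; };

// Rough source-level equivalent of _Z2cpI1AEvRT_RKS1_.isra.0 after IPA SRA:
// the second parameter is no longer a reference but a by-value struct A,
// so the caller has to load MEM[(const struct A &)&x] to pass it.
static void cp_isra (A &a, A b) { a = b; }
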

With IPA SRA disabled the IL gets nicer, but then ESRA doesn't do anything
interesting anymore (besides renaming stuff and uglifying/moving loads):

   MEM[(struct A *)&y] = MEM[(const struct A &)&x];
+  y$D2304$a_4 = MEM[(struct A *)&y];
   MEM[(struct A *)&z] = MEM[(const struct A &)&x];
-  _1 = y.D.2304.a;
-  _2 = z.D.2304.a;
+  z$D2304$a_5 = MEM[(struct A *)&z];
+  _1 = y$D2304$a_4;
+  _2 = z$D2304$a_5;
   _6 = _1 - _2;

also something to avoid IMHO.  I guess it thinks it might fully scalarize
the copies but later decides not to, leaving the rest of the transform
in place.

So yes, the copy is viewed as contains_vce_or_bfcref_p because
we access a variable of type B via type A.  But that only matters
for variables we can scalarize away - we call this for MEM[&x]
only, but we want to scalarize MEM[&y] = MEM[&x] and thus scalarize y
away.  That we mark x in cannot_scalarize_away_bitmap shouldn't
affect total scalarization for the aggregate copy, should it?

I wonder why build_accesses_from_assign only looks at the RHS for
total scalarization and not the LHS.

Well, to sum it up: we see all uses of y and z, and thus we _can_
totally scalarize y and z, simply eliding the aggregate copy.

[Bug tree-optimization/87008] [8/9 Regression] gimple mem-to-mem assignment badly optimized

2018-08-21 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87008

--- Comment #1 from Marc Glisse  ---
struct A { double a, b; };
struct B : A {};
template<class T> void cp(T&a,T const&b){a=b;}
double f(B x){
  B y; cp(y,x);
  B z; cp(z,x);
  return y.a - z.a;
}

This is not quite equivalent because RTL manages to optimize this case, but
gimple, with -Ofast, still gets the ugly:

  ISRA.1 = MEM[(const struct A &)&x];
  SR.9_9 = MEM[(struct A *)&ISRA.1];
  ISRA.1 = MEM[(const struct A &)&x];
  SR.8_10 = MEM[(struct A *)&ISRA.1];
  _3 = SR.9_9 - SR.8_10;
  return _3;

Writing cp<A> instead of cp<B> makes it work, and the main difference starts in
SRA. I expect (didn't check) this is another victim of r255510.
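
For reference, assuming the reconstruction of the lost angle brackets above is
right, the working variant is the same reproducer with the template argument
spelled explicitly:

struct A { double a, b; };
struct B : A {};
template<class T> void cp(T &a, T const &b) { a = b; }

double f(B x)
{
  B y; cp<A>(y, x);   // explicit cp<A>: the copy is done through type A
  B z; cp<A>(z, x);   // rather than through the deduced derived type B
  return y.a - z.a;
}
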

[Bug tree-optimization/87008] [8/9 Regression] gimple mem-to-mem assignment badly optimized

2018-08-21 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87008

--- Comment #2 from Marc Glisse  ---
Or just:

struct A { double a, b; };
struct B : A {};
double f(B x){
  B y;
  A*px=&x;
  A*py=&y;
  *py=*px;
  return y.a;
}

  MEM[(struct A *)&y] = MEM[(const struct A &)&x];
  y_6 = MEM[(struct A *)&y];
  y ={v} {CLOBBER};
  return y_6;

where y_6 should be read directly from x. SRA doesn't dare touch it. SCCVN does
see that reading from y is equivalent to reading from x, but unless something
else is already reading from x, it keeps the read from y.
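
An untested sketch of the "something else is already reading from x" case
mentioned above - the extra x.a read is my assumption about what gives SCCVN
an existing load from x to reuse:

struct A { double a, b; };
struct B : A {};

double g(B x)
{
  B y;
  A *px = &x;
  A *py = &y;
  *py = *px;
  return y.a + x.a;   // the explicit x.a read is the "something else"
}
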

[Bug tree-optimization/87008] [8/9 Regression] gimple mem-to-mem assignment badly optimized

2018-08-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87008

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-08-22
 CC||jamborm at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org
   Target Milestone|--- |8.3
 Ever confirmed|0   |1

--- Comment #3 from Richard Biener  ---
(In reply to Marc Glisse from comment #2)
> Or just:
> 
> struct A { double a, b; };
> struct B : A {};
> double f(B x){
>   B y;
>   A*px=&x;
>   A*py=&y;
>   *py=*px;
>   return y.a;
> }
> 
>   MEM[(struct A *)&y] = MEM[(const struct A &)&x];
>   y_6 = MEM[(struct A *)&y];
>   y ={v} {CLOBBER};
>   return y_6;
> 
> where y_6 should be read directly from x. SRA doesn't dare touch it. SCCVN
> does see that reading from y is equivalent to reading from x, but unless
> something else is already reading from x, it keeps the read from y.

Yeah, SCCVN doesn't change the read to one from x because generally it
cannot know y will go away, and reading from x possibly enlarges x's
lifetime (no stack sharing).

To really handle this we need to expose the way we'd expand such aggregate
copies to RTL already at GIMPLE stage.  SRA could be the pass that should
eventually do that (but of course avoid exposing copy loops or calls to
memcpy we might expand to ...).  So it boils down to heuristics again...

[Bug tree-optimization/87008] [8/9 Regression] gimple mem-to-mem assignment badly optimized

2018-08-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87008

--- Comment #4 from Richard Biener  ---
(In reply to Marc Glisse from comment #1)
> struct A { double a, b; };
> struct B : A {};
> template<class T> void cp(T&a,T const&b){a=b;}
> double f(B x){
>   B y; cp(y,x);
>   B z; cp(z,x);
>   return y.a - z.a;
> }
> 
> This is not quite equivalent because RTL manages to optimize this case, but
> gimple, with -Ofast, still gets the ugly:
> 
>   ISRA.1 = MEM[(const struct A &)&x];
>   SR.9_9 = MEM[(struct A *)&ISRA.1];
>   ISRA.1 = MEM[(const struct A &)&x];
>   SR.8_10 = MEM[(struct A *)&ISRA.1];
>   _3 = SR.9_9 - SR.8_10;
>   return _3;
> 
> Writing cp<A> instead of cp<B> makes it work, and the main difference starts
> in SRA. I expect (didn't check) this is another victim of r255510.

The initial IL is too convoluted for early FRE to figure out the equivalences.
For the above, which is visible to the late FRE, the issue is that the
redundant aggregate copy gets in the way: it isn't detected in early FRE,
and FRE also doesn't try to detect or remove redundant aggregate copies,
because we don't really "value-number" aggregate stores.

To sum it up - aggregate copies are bad ;)  But they also sometimes
help - all the vn_reference_op_lookup_3 tricks wouldn't work without
them unless you end up with store pieces that always fully cover all
downstream loads.
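
A hedged example of the kind of case where those lookups through an aggregate
copy do pay off - assuming SCCVN translates the y.a load through the copy back
to x's constant initializer, which is the behaviour the tricks mentioned above
are meant to provide:

struct A { double a, b; };

double h(void)
{
  A x = { 1.0, 2.0 };
  A y = x;          // aggregate copy
  return y.a;       // the load of y.a can be looked up through the copy
                    // back to x's initializer, yielding 1.0
}
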