[Bug middle-end/52890] Revision 185336 causes 10% degradation on cpu2000 benchmark 252.eon

2012-12-03 Thread jamborm at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52890



Martin Jambor  changed:



   What|Removed |Added



 Status|NEW |RESOLVED

 Resolution||FIXED



--- Comment #14 from Martin Jambor  2012-12-03 
13:25:26 UTC ---

Finally fixed.  Thanks for patience.


[Bug middle-end/52890] Revision 185336 causes 10% degradation on cpu2000 benchmark 252.eon

2012-11-30 Thread jamborm at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52890



--- Comment #13 from Martin Jambor  2012-11-30 
16:11:41 UTC ---

Author: jamborm

Date: Fri Nov 30 16:11:33 2012

New Revision: 193998



URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193998

Log:

2012-11-30  Martin Jambor  



PR middle-end/52890

PR tree-optimization/55415

PR tree-optimization/54386

PR target/55448

* ipa-prop.c (ipa_modify_call_arguments): Be optimistic when

get_pointer_alignment_1 returns false and the base was not a

dereference.

* tree-sra.c (access_precludes_ipa_sra_p): New parameter req_align,

added check for required alignment.  Update the user.



* testsuite/gcc.dg/ipa/ipa-sra-7.c: New test.

* testsuite/gcc.dg/ipa/ipa-sra-8.c: Likewise.

* testsuite/gcc.dg/ipa/ipa-sra-9.c: Likewise.

* testsuite/gcc.target/i386/pr55448.c: Likewise.





Added:

trunk/gcc/testsuite/gcc.dg/ipa/ipa-sra-7.c

trunk/gcc/testsuite/gcc.dg/ipa/ipa-sra-8.c

trunk/gcc/testsuite/gcc.dg/ipa/ipa-sra-9.c

trunk/gcc/testsuite/gcc.target/i386/pr55448.c

Modified:

trunk/gcc/ChangeLog

trunk/gcc/ipa-prop.c

trunk/gcc/testsuite/ChangeLog

trunk/gcc/tree-sra.c


[Bug middle-end/52890] Revision 185336 causes 10% degradation on cpu2000 benchmark 252.eon

2012-11-27 Thread jamborm at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52890



--- Comment #12 from Martin Jambor  2012-11-27 
20:47:21 UTC ---

I have proposed a patch on the mailing list:



http://gcc.gnu.org/ml/gcc-patches/2012-11/msg02265.html


[Bug middle-end/52890] Revision 185336 causes 10% degradation on cpu2000 benchmark 252.eon

2012-09-14 Thread jamborm at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52890

--- Comment #11 from Martin Jambor  2012-09-14 
13:55:04 UTC ---
(In reply to comment #10)
> 
> Martin,
>   Have you done any more digging on this? I just discovered that cpu2006
> benchmark 471.omnetpp suffers the same problem (8 byte loads turned in to
> sequence of byte loads/shifts/ors). It causes a 12% degradation on PowerPC, 
> and
> also goes away when -fno-ipa-sra is added. I'm attaching a reduced testcase
> that can be compiled with g++ -S -O3 -m64 -mcpu=power7.

Thanks for the reduced testcase.  This PR is still on my radar but so
far other issues have had higher priority.  On the other hand, I agree
I have not looked at this for too long, I'll do my best to look at it
soon.  I can't promise much in the next 2-3 weeks, though.


[Bug middle-end/52890] Revision 185336 causes 10% degradation on cpu2000 benchmark 252.eon

2012-09-12 Thread pthaugen at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52890

--- Comment #10 from Pat Haugen  2012-09-12 
23:04:55 UTC ---
Created attachment 28181
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28181
Reduced testcase

Martin,
  Have you done any more digging on this? I just discovered that cpu2006
benchmark 471.omnetpp suffers the same problem (8 byte loads turned in to
sequence of byte loads/shifts/ors). It causes a 12% degradation on PowerPC, and
also goes away when -fno-ipa-sra is added. I'm attaching a reduced testcase
that can be compiled with g++ -S -O3 -m64 -mcpu=power7.


[Bug middle-end/52890] Revision 185336 causes 10% degradation on cpu2000 benchmark 252.eon

2012-09-12 Thread dje at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52890

David Edelsohn  changed:

   What|Removed |Added

 CC||dje at gcc dot gnu.org

--- Comment #9 from David Edelsohn  2012-09-12 15:23:27 
UTC ---
Please add new testcases -- hopefully generic tests -- in the common part of
the testsuite to cover this type of regression.


[Bug middle-end/52890] Revision 185336 causes 10% degradation on cpu2000 benchmark 252.eon

2012-04-17 Thread jamborm at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52890

--- Comment #8 from Martin Jambor  2012-04-17 
15:45:47 UTC ---
The last case is caused by IPA-SRA, -fno-ipa-sra helps, I belive.  All
pre-requesite patches are committed now so I'll concentrate on this
now and hopefully come up with a patch soon.


[Bug middle-end/52890] Revision 185336 causes 10% degradation on cpu2000 benchmark 252.eon

2012-04-11 Thread pthaugen at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52890

Pat Haugen  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW

--- Comment #7 from Pat Haugen  2012-04-11 
18:34:55 UTC ---
I observed similar behavior on loads (8-byte loads transformed to sequence of
byte loads/shifts/ors) with r185470, unaligned read patch. This caused some
sizable degradation on a few benchmarks also.-fno-tree-sra had no effect on the
unaligned load case.

Reduced testcase:

// g++ -S -m64 -O2 -mcpu=power7 load.cc
typedef int ggBoolean;
class mrSurface
{
};
template < class T > class ggTrain
{
public:ggBoolean Append (T item);
  T operator[] (int i) const
  {
return data[i];
  }
private:  T * data;
  int nData;
};

class mrSurfaceList:public mrSurface
{
  virtual void viewingHit () const;
protected:  ggTrain < mrSurface * >surfaces;
};
extern mrSurface *sPtr;
void
mrSurfaceList::viewingHit () const 
{
  sPtr = surfaces[1];
}



150r.expand:

  D.2562_3 = MEM[(const struct ggTrain *)this_1(D) + 8B];
  D.2560_11 = MEM[(struct mrSurface * *)D.2562_3 + 8B];
  sPtr = D.2560_11;


The first MEM[...] gets expanded to the sequence of lbzs/shifts/ors.


[Bug middle-end/52890] Revision 185336 causes 10% degradation on cpu2000 benchmark 252.eon

2012-04-11 Thread jamborm at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52890

Martin Jambor  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 AssignedTo|unassigned at gcc dot   |jamborm at gcc dot gnu.org
   |gnu.org |

--- Comment #6 from Martin Jambor  2012-04-11 
16:06:31 UTC ---
I'm assigning this to myself, after I commit the aforementioned patch
(hopefully very soon), I'll work on a followup similar to the above
that will also handle some situations in which
get_addr_base_and_unit_offset in build_ref_for_offset returns NULL.


[Bug middle-end/52890] Revision 185336 causes 10% degradation on cpu2000 benchmark 252.eon

2012-04-11 Thread jamborm at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52890

--- Comment #5 from Martin Jambor  2012-04-11 
15:53:44 UTC ---
But when I add the following line to the patch linked above, the
problem goes away:

Index: src/gcc/tree-sra.c
===
--- src.orig/gcc/tree-sra.c
+++ src/gcc/tree-sra.c
@@ -1455,6 +1455,7 @@ build_ref_for_offset (location_t loc, tr
 }
   else if (TREE_CODE (base) == MEM_REF)
 {
+  prev_base = base;
   off = build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)),
   base_offset + offset / BITS_PER_UNIT);
   off = int_const_binop (PLUS_EXPR, TREE_OPERAND (base, 1), off);


[Bug middle-end/52890] Revision 185336 causes 10% degradation on cpu2000 benchmark 252.eon

2012-04-11 Thread jamborm at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52890

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org

--- Comment #4 from Martin Jambor  2012-04-11 
14:30:56 UTC ---
(In reply to comment #2)
> only looks at conservative alignment in this case (prev_base is not a base,
> but a COMPONENT_REF).  I think martin has a patch to fix this.

Unfortunately, http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00206.html
does not help.  I'm going to look at this in more detail then...


[Bug middle-end/52890] Revision 185336 causes 10% degradation on cpu2000 benchmark 252.eon

2012-04-10 Thread pthaugen at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52890

Pat Haugen  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever Confirmed|0   |1


[Bug middle-end/52890] Revision 185336 causes 10% degradation on cpu2000 benchmark 252.eon

2012-04-10 Thread pthaugen at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52890

Pat Haugen  changed:

   What|Removed |Added

 Status|NEW |UNCONFIRMED
 Ever Confirmed|1   |0

--- Comment #3 from Pat Haugen  2012-04-10 
16:13:49 UTC ---
(In reply to comment #2)
> (gdb) prt to
>   type  size 
> unit size 
> align 8 symtab 0 alias set 2 canonical type 0xfffaff00f18 precision 
> 64>
> 
> I suppose SRA does this.  Can you check with -fno-tree-sra?

With -fno-tree-sra they both generate the same code, a sequence of 8-byte
stores to build up the rhs, and then a call to memcpy for the entire structure
copy.

At expand phase we now have:

  struct ggVector3 D.2563;
  struct ggVector3 D.2562;
  struct ggVector3 D.2561;
  struct ggONB3 D.2573;

  # BLOCK 2 freq:1
  # PRED: ENTRY [100.0%]  (fallthru,exec)
  D.2561.e[0] = 1.0e+0;
  D.2561.e[1] = 0.0;
  D.2561.e[2] = 0.0;
  D.2562.e[0] = 0.0;
  D.2562.e[1] = 0.0;
  D.2562.e[2] = -1.0e+0;
  D.2563.e[0] = 0.0;
  D.2563.e[1] = 1.0e+0;
  D.2563.e[2] = 0.0;
  D.2573.U = D.2561;
  D.2573.V = D.2562;
  D.2573.W = D.2563;
  VHR_1(D)->UVW = D.2573;


Whereas with SRA we had:

  MEM[(struct mrViewingHitRecord &)VHR_1(D)] = 1.0e+0;
  MEM[(struct mrViewingHitRecord &)VHR_1(D) + 8] = 0.0;
  MEM[(struct mrViewingHitRecord &)VHR_1(D) + 16] = 0.0;
  MEM[(struct mrViewingHitRecord &)VHR_1(D) + 24] = 0.0;
  MEM[(struct mrViewingHitRecord &)VHR_1(D) + 32] = 0.0;
  MEM[(struct mrViewingHitRecord &)VHR_1(D) + 40] = -1.0e+0;
  MEM[(struct mrViewingHitRecord &)VHR_1(D) + 48] = 0.0;
  MEM[(struct mrViewingHitRecord &)VHR_1(D) + 56] = 1.0e+0;
  MEM[(struct mrViewingHitRecord &)VHR_1(D) + 64] = 0.0;


[Bug middle-end/52890] Revision 185336 causes 10% degradation on cpu2000 benchmark 252.eon

2012-04-10 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52890

Richard Guenther  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2012-04-10
 Ever Confirmed|0   |1

--- Comment #2 from Richard Guenther  2012-04-10 
12:06:26 UTC ---
(gdb) prt to
 
unit size 
align 8 symtab 0 alias set 2 canonical type 0xfffaff00f18 precision 64>

I suppose SRA does this.  Can you check with -fno-tree-sra?

The code in SRA, which is

  /* If prev_base were always an originally performed access
 we can extract more optimistic alignment information
 by looking at the access mode.  That would constrain the
 alignment of base + base_offset which we would need to
 adjust according to offset.  */
  align = get_pointer_alignment_1 (base, &misalign);
  if (misalign == 0
  && (TREE_CODE (prev_base) == MEM_REF
  || TREE_CODE (prev_base) == TARGET_MEM_REF))
align = MAX (align, TYPE_ALIGN (TREE_TYPE (prev_base)));
  misalign += (double_int_sext (tree_to_double_int (off),
TYPE_PRECISION (TREE_TYPE (off))).low
   * BITS_PER_UNIT);
  misalign = misalign & (align - 1);
  if (misalign != 0)
align = (misalign & -misalign);
  if (align < TYPE_ALIGN (exp_type))
exp_type = build_aligned_type (exp_type, align);

only looks at conservative alignment in this case (prev_base is not a base,
but a COMPONENT_REF).  I think martin has a patch to fix this.


[Bug middle-end/52890] Revision 185336 causes 10% degradation on cpu2000 benchmark 252.eon

2012-04-06 Thread pthaugen at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52890

--- Comment #1 from Pat Haugen  2012-04-06 
21:13:04 UTC ---
My main question on this is why the MEM_REF has an alignment of 8 (i.e. byte)?

Breakpoint 1, expand_assignment (to=0xfffafed2790, from=, 
nontemporal=) at
/home/pthaugen/src/gcc/temp3/gcc/gcc/expr.c:4653
4653store_bit_field (mem, GET_MODE_BITSIZE (mode),
(gdb) prt to
 
unit size 
align 8 symtab 0 alias set 2 canonical type 0xfffaff00f18 precision 64>

arg 0 
public unsigned DI size  unit size

align 64 symtab 0 alias set 6 canonical type 0xfffb0055f28>
visited var def_stmt GIMPLE_NOP

version 1
ptr-info 0xfffafe9af80>
arg 1 
constant 0>
test.cc:29:44>
(gdb) pr mem
(mem:DF (reg/v/f:DI 120 [ VHR ]) [5 MEM[(struct mrViewingHitRecord
&)VHR_1(D)]+0 S8 A8])