[Bug rtl-optimization/63191] [4.8/4.9/5 Regression] 32-bit gcc uses excessive memory during dead store elimination with -fPIC

2015-04-07 Thread steven at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63191

--- Comment #7 from Steven Bosscher  ---
(In reply to Steven Bosscher from comment #6)
> Now let's see if I can come up with a more reasonable test case...

Like so:

- 8< -
typedef int X;

struct Z {
    Z(const X* x1, X x2, X x3) :
      x1_(x1), x2_(x2), x3_(x3) {}
    const X* x1_;
    X x2_;
    X x3_;
};

#undef X____1
#undef X___10
#undef X__100
#undef X_1000
#undef X10000
#define X____1(N) \
  static const X Xs##N[] = {};
#define X___10(N) \
  X____1(N##0) X____1(N##1) X____1(N##2) X____1(N##3) X____1(N##4) \
  X____1(N##5) X____1(N##6) X____1(N##7) X____1(N##8) X____1(N##9)
#define X__100(N) \
  X___10(N##0) X___10(N##1) X___10(N##2) X___10(N##3) X___10(N##4) \
  X___10(N##5) X___10(N##6) X___10(N##7) X___10(N##8) X___10(N##9)
#define X_1000(N) \
  X__100(N##0) X__100(N##1) X__100(N##2) X__100(N##3) X__100(N##4) \
  X__100(N##5) X__100(N##6) X__100(N##7) X__100(N##8) X__100(N##9)
#define X10000(N) \
  X_1000(N##0) X_1000(N##1) X_1000(N##2) X_1000(N##3) X_1000(N##4) \
  X_1000(N##5) X_1000(N##6) X_1000(N##7) X_1000(N##8) X_1000(N##9)

X10000(0)

#undef Z____1
#undef Z___10
#undef Z__100
#undef Z_1000
#undef Z10000
#define Z____1(N,I,J) \
  Z(Xs##N,1,1),
#define Z___10(N) \
  Z____1(N##0,1,1) Z____1(N##0,1,1) \
  Z____1(N##0,1,1) Z____1(N##1,2,1) \
  Z____1(N##0,1,1) Z____1(N##2,1,2) \
  Z____1(N##0,1,1) Z____1(N##3,6,3) \
  Z____1(N##0,1,1) Z____1(N##4,7,2) \
  Z____1(N##0,1,1) Z____1(N##5,1,3) \
  Z____1(N##0,1,1) Z____1(N##6,5,9) \
  Z____1(N##0,1,1) Z____1(N##7,7,1) \
  Z____1(N##0,1,1) Z____1(N##8,3,3) \
  Z____1(N##0,1,1) Z____1(N##9,2,2)
#define Z__100(N) \
  Z___10(N##0) Z___10(N##1) Z___10(N##2) Z___10(N##3) Z___10(N##4) \
  Z___10(N##5) Z___10(N##6) Z___10(N##7) Z___10(N##8) Z___10(N##9)
#define Z_1000(N) \
  Z__100(N##0) Z__100(N##1) Z__100(N##2) Z__100(N##3) Z__100(N##4) \
  Z__100(N##5) Z__100(N##6) Z__100(N##7) Z__100(N##8) Z__100(N##9)
#define Z10000(N) \
  Z_1000(N##0) // Z_1000(N##1) Z_1000(N##2) Z_1000(N##3) Z_1000(N##4) \
  // Z_1000(N##5) Z_1000(N##6) Z_1000(N##7) Z_1000(N##8) Z_1000(N##9)

static const X XsLast[] = {};
static const Z Zs[] = { Z10000(0) Z(XsLast,1,1) };

const Z* getzs() {
  return &Zs[0];
}

- 8< -

exploding in DSE:
 dead store elim1:  45.34 (15%) usr   0.19 (28%) sys  45.53 (15%) wall 1016985 kB (45%) ggc


2015-04-07 Thread steven at gcc dot gnu.org

--- Comment #6 from Steven Bosscher  ---
(In reply to woodfin from comment #5)
> You could try adding a non-static function that returns an address inside Zs.
> 
> const Z* getzs() {
>   return &Zs[0];
> }

Yes, that does the trick:
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
25244 stevenb   20   0 5964m 5.8g  30m R   100  9.3  25:03.60 cc1plus
(and counting)

Now let's see if I can come up with a more reasonable test case...


2015-04-07 Thread woodfin at intersystems dot com

--- Comment #5 from woodfin at intersystems dot com ---
You could try adding a non-static function that returns an address inside Zs.

const Z* getzs() {
  return &Zs[0];
}

I'd think that would force it to actually perform the initialization if the
contents can be externally accessed.
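
A reduced sketch of that idea, with only two entries instead of thousands. The
names XsA and XsB are made up for illustration and are not part of the original
test case. Because Z has a user-provided, non-constexpr constructor, Zs
nominally needs dynamic initialization, and the exported accessor makes its
contents reachable from outside the translation unit. Whether a given GCC
version still folds the initializer away is not something this sketch can
promise.

```cpp
#include <cassert>

typedef int X;

struct Z {
    Z(const X* x1, X x2, X x3) : x1_(x1), x2_(x2), x3_(x3) {}
    const X* x1_;
    X x2_;
    X x3_;
};

// Hypothetical stand-ins for the generated Xs... arrays.
static const X XsA[] = {1};
static const X XsB[] = {2};

// Internal-linkage table built through the non-constexpr constructor.
static const Z Zs[] = { Z(XsA, 1, 1), Z(XsB, 2, 1) };

// Non-static accessor: other translation units can now observe Zs,
// so the table and the code that fills it cannot simply be dropped.
const Z* getzs() {
    return &Zs[0];
}
```

Calling getzs() from another file and reading a field is enough to observe the
initialized values.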

Sorry, I don't have a gcc 5.0 environment yet. I'll set one up if you still
can't reproduce this there.


2015-04-07 Thread steven at gcc dot gnu.org

--- Comment #4 from Steven Bosscher  ---
How is one to reproduce this bug with GCC5? I've tried:

$ ./xg++ --version
xg++ (GCC) 5.0.0 20150407 (experimental) [trunk revision 221906]
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ./xg++ -B. -S -O2 -m32 -fPIC PR63191.cc -fdump-tree-optimized
$ cat PR63191.cc.190t.optimized

;; Function (static initializers for PR63191.cc) (_GLOBAL__sub_I_PR63191.cc,
funcdef_no=4, decl_uid=14028, cgraph_uid=4, symbol_order=1500) (executed once)

(static initializers for PR63191.cc) ()
{
  <bb 2>:
  return;

}


$ 

So AFAICT GCC5 optimizes the test case of comment #0 to an empty file.
I'm sure there's a way to avoid optimizing this to empty, but I'm not
quite a C++ guru ;-)


2014-12-19 Thread jakub at gcc dot gnu.org

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.4                       |4.8.5

--- Comment #3 from Jakub Jelinek  ---
GCC 4.8.4 has been released.


2014-11-24 Thread rguenth at gcc dot gnu.org

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2


2014-09-08 Thread rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener  ---
With

int a, b, c, d;
struct X { int a; int b; void *p; } z[4];
void foo (void)
{
  z[0].a = 1;
  z[0].b = 2;
  z[0].p = &a;
  z[1].a = 1;
  z[1].b = 2;
  z[1].p = &b;
  z[2].a = 1;
  z[2].b = 2;
  z[2].p = &c;
  z[3].a = 1;
  z[3].b = 2;
  z[3].p = &d;
}

CSEing of the GOT load of z works.


2014-09-08 Thread rguenth at gcc dot gnu.org

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |i?86-*-*
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |memory-hog
   Last reconfirmed|                            |2014-09-08
                 CC|                            |rth at gcc dot gnu.org
             Blocks|                            |47344
     Ever confirmed|0                           |1
            Summary|32-bit gcc uses excessive   |[4.8/4.9/5 Regression]
                   |memory during dead store    |32-bit gcc uses excessive
                   |elimination with -fPIC      |memory during dead store
                   |                            |elimination with -fPIC
   Target Milestone|---                         |4.8.4

--- Comment #1 from Richard Biener  ---
Confirmed.  Possibly excessive value_rtx expansion from dse.c:canon_address.

The testcase is a function with a single basic-block and 3 stores
(the static initializer function) with the pattern

  D.94947 = (struct Z *) &Zs;
  D.94947->x1_ = &Xs1[0];
  D.94947->x2_ = 1;
  D.94947->x3_ = 1;
  temp.20397 = D.94947 + 12;
  temp.20397->x1_ = &Xs90[0];
  temp.20397->x2_ = 2;
  temp.20397->x3_ = 1;
...
  temp.30587 = temp.30586 + 12;
  temp.30587->x1_ = &Xs611[0];
  temp.30587->x2_ = 2;
  temp.30587->x3_ = 1;

thus groups of three stores followed by an address adjustment.  The above
is from a GCC 4.3 IL dump.

The GCC 4.9 IL dump shows

  MEM[(struct Z *)&Zs].x1_ = &Xs1;
  MEM[(struct Z *)&Zs].x2_ = 1;
  MEM[(struct Z *)&Zs].x3_ = 1;
  MEM[(struct Z *)&Zs + 12B].x1_ = &Xs90;
  MEM[(struct Z *)&Zs + 12B].x2_ = 2;
  MEM[(struct Z *)&Zs + 12B].x3_ = 1;
  MEM[(struct Z *)&Zs + 24B].x1_ = &Xs91;
  MEM[(struct Z *)&Zs + 24B].x2_ = 2;
  MEM[(struct Z *)&Zs + 24B].x3_ = 1;
...
  MEM[(struct Z *)&Zs + 122292B].x1_ = &Xs611;
  MEM[(struct Z *)&Zs + 122292B].x2_ = 2;
  MEM[(struct Z *)&Zs + 122292B].x3_ = 1;

which causes each store to be expanded to something like

(insn 71298 71297 71299 2 (set (reg:SI 40822)
        (const:SI (unspec:SI [
                    (symbol_ref:SI ("_ZL2Zs") [flags 0x2]  )
                ] UNSPEC_GOTOFF))) t.C:5 -1
     (nil))
(insn 71299 71298 71300 2 (set (mem/c:SI (plus:SI (plus:SI (reg:SI 3 bx)
                    (reg:SI 40822))
                (const_int 122216 [0x1dd68])) [4 MEM[(struct Z *)&Zs + 122208B].x3_+0 S4 A64])
        (const_int 1 [0x1])) t.C:5 -1
     (nil))
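
For reference, the constants in the dumps above can be checked with a small
sketch. Z32 is a made-up stand-in that models struct Z under a 32-bit layout by
storing the pointer member as a 32-bit integer (an assumption made so that the
arithmetic also holds on a 64-bit host); field_address_offset is likewise a
hypothetical helper, not anything from GCC.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for struct Z with a 4-byte pointer, as on i?86 with -m32.
struct Z32 {
    std::uint32_t x1_;  // models "const X*" as a 32-bit value
    std::int32_t  x2_;
    std::int32_t  x3_;
};

// Byte offset from &Zs of a field in element number `index`.
constexpr std::size_t field_address_offset(std::size_t index,
                                           std::size_t field_offset) {
    return index * sizeof(Z32) + field_offset;
}

// Each element is 12 bytes, matching the "+ 12B" steps in the IL dump.
static_assert(sizeof(Z32) == 12, "element stride");
static_assert(offsetof(Z32, x3_) == 8, "x3_ is the third 4-byte field");

// MEM[(struct Z *)&Zs + 122208B].x3_ is element 10184, field offset 8,
// which gives the (const_int 122216 [0x1dd68]) seen in insn 71299.
static_assert(field_address_offset(10184, offsetof(Z32, x3_)) == 122216,
              "offset in the RTL");
static_assert(122216 == 0x1dd68, "hex form in the RTL");
```

The point is only that every store uses a different constant offset from the
same symbol, so the GOTOFF base is the common part of all these addresses.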

I suppose "lowering" PIC addresses somewhere before RTL expansion (and
CSEing the addresses) would help here.  Lowering as in not treating
them as is_gimple_min_invariant.

With 4.3 we have a single address load for &Zs (but of course we still keep
the loads of the individual stored addresses, so there are still very many
PIC addresses in this function).

Why is CSE not able to CSE the UNSPEC_GOTOFF addresses?  Does it not do
it because of the (const:SI ...) wrapping (as in, not profitable)?  Or is
it confused about the other intermediate UNSPEC_GOTOFF uses?

That said, cse1 should be able to turn the RTL into something equivalent to
what 4.3 produced.
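
As a source-level analogy only (this is not GCC code, and all names here are
invented for illustration), the difference between the two shapes looks
roughly like this: one version names the table symbol afresh for every store,
the other materializes the base address once and steps it, which is the shape
the 4.3 dump has. A compiler is free to turn either into the other.

```cpp
#include <cassert>

struct Rec { const int* p; int a; int b; };

static int targets[4];
static Rec table_a[4];
static Rec table_b[4];

// Shape of the 4.9 IL: every store addresses the symbol directly,
// so each one needs the symbol's (PIC) address again unless it is CSEd.
void fill_recomputed() {
    table_a[0].p = &targets[0]; table_a[0].a = 1; table_a[0].b = 2;
    table_a[1].p = &targets[1]; table_a[1].a = 1; table_a[1].b = 2;
    table_a[2].p = &targets[2]; table_a[2].a = 1; table_a[2].b = 2;
    table_a[3].p = &targets[3]; table_a[3].a = 1; table_a[3].b = 2;
}

// Shape of the 4.3 IL: one address load, then a pointer that is
// advanced by sizeof(Rec) after each group of three stores.
void fill_hoisted() {
    Rec* cur = &table_b[0];
    for (int i = 0; i < 4; ++i, ++cur) {
        cur->p = &targets[i];
        cur->a = 1;
        cur->b = 2;
    }
}
```

Both functions store the same values; only the way the destination address is
formed differs.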