[Bug rtl-optimization/63191] [4.8/4.9/5 Regression] 32-bit gcc uses excessive memory during dead store elimination with -fPIC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63191

--- Comment #7 from Steven Bosscher ---
(In reply to Steven Bosscher from comment #6)
> Now let's see if I can come up with a more reasonable test case...

Like so:

- 8< -
typedef int X;

struct Z {
  Z(const X* x1, X x2, X x3) : x1_(x1), x2_(x2), x3_(x3) {}
  const X* x1_;
  X x2_;
  X x3_;
};

#undef X1
#undef X___10
#undef X__100
#undef X_1000
#undef X1
#define X1(N) \
 static const X Xs##N[] = {};
#define X___10(N) \
 X1(N##0) X1(N##1) X1(N##2) X1(N##3) X1(N##4) \
 X1(N##5) X1(N##6) X1(N##7) X1(N##8) X1(N##9)
#define X__100(N) \
 X___10(N##0) X___10(N##1) X___10(N##2) X___10(N##3) X___10(N##4) \
 X___10(N##5) X___10(N##6) X___10(N##7) X___10(N##8) X___10(N##9)
#define X_1000(N) \
 X__100(N##0) X__100(N##1) X__100(N##2) X__100(N##3) X__100(N##4) \
 X__100(N##5) X__100(N##6) X__100(N##7) X__100(N##8) X__100(N##9)
#define X1(N) \
 X_1000(N##0) X_1000(N##1) X_1000(N##2) X_1000(N##3) X_1000(N##4) \
 X_1000(N##5) X_1000(N##6) X_1000(N##7) X_1000(N##8) X_1000(N##9)

X1(0)

#undef Z1
#undef Z___10
#undef Z__100
#undef Z_1000
#undef Z1
#define Z1(N,I,J) \
 Z(Xs##N,1,1),
#define Z___10(N) \
 Z1(N##0,1,1) Z1(N##0,1,1) \
 Z1(N##0,1,1) Z1(N##1,2,1) \
 Z1(N##0,1,1) Z1(N##2,1,2) \
 Z1(N##0,1,1) Z1(N##3,6,3) \
 Z1(N##0,1,1) Z1(N##4,7,2) \
 Z1(N##0,1,1) Z1(N##5,1,3) \
 Z1(N##0,1,1) Z1(N##6,5,9) \
 Z1(N##0,1,1) Z1(N##7,7,1) \
 Z1(N##0,1,1) Z1(N##8,3,3) \
 Z1(N##0,1,1) Z1(N##9,2,2)
#define Z__100(N) \
 Z___10(N##0) Z___10(N##1) Z___10(N##2) Z___10(N##3) Z___10(N##4) \
 Z___10(N##5) Z___10(N##6) Z___10(N##7) Z___10(N##8) Z___10(N##9)
#define Z_1000(N) \
 Z__100(N##0) Z__100(N##1) Z__100(N##2) Z__100(N##3) Z__100(N##4) \
 Z__100(N##5) Z__100(N##6) Z__100(N##7) Z__100(N##8) Z__100(N##9)
#define Z1(N) \
 Z_1000(N##0) // Z_1000(N##1) Z_1000(N##2) Z_1000(N##3) Z_1000(N##4) \
 // Z_1000(N##5) Z_1000(N##6) Z_1000(N##7) Z_1000(N##8) Z_1000(N##9)

static const X XsLast[] = {};

static const Z Zs[] = {
  Z1(0)
  Z(XsLast,1,1)
};

const Z* getzs() {
  return &Zs[0];
}
- 8< -

exploding in DSE:

 dead store elim1: 45.34 (15%) usr 0.19 (28%) sys 45.53 (15%) wall 1016985 kB (45%) ggc

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63191

--- Comment #6 from Steven Bosscher ---
(In reply to woodfin from comment #5)
> You could try adding a non-static function that returns an address inside Zs.
>
> const Z* getzs() {
>   return &Zs[0];
> }

Yes, that does the trick:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25244 stevenb   20   0 5964m 5.8g  30m R  100  9.3  25:03.60 cc1plus

(and counting)

Now let's see if I can come up with a more reasonable test case...

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63191

--- Comment #5 from woodfin at intersystems dot com ---
You could try adding a non-static function that returns an address inside Zs.

const Z* getzs() {
  return &Zs[0];
}

I'd think that would force it to actually perform the initialization if the
contents can be externally accessed.

Sorry, I don't have a gcc 5.0 environment yet. I'll set one up if you still
can't reproduce this there.

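A minimal sketch of the suggestion above, cut down to two array elements. The names `Xs0` and `Xs1` are illustrative stand-ins for the generated arrays, and each is given one dummy element because an empty `{}` initializer for an array of unknown bound relies on a GNU extension. Whether a given compiler then emits a dynamic initializer or folds the array into constant data is up to that compiler; the sketch only shows how the array is made reachable from outside the translation unit.

```cpp
typedef int X;

struct Z {
  Z(const X* x1, X x2, X x3) : x1_(x1), x2_(x2), x3_(x3) {}
  const X* x1_;
  X x2_;
  X x3_;
};

// Illustrative stand-ins for the generated Xs<N> arrays (assumption:
// one dummy element each, to stay within ISO C++).
static const X Xs0[] = {0};
static const X Xs1[] = {0};

// Internal-linkage array built through a user-provided constructor.
static const Z Zs[] = { Z(Xs0, 1, 1), Z(Xs1, 2, 1) };

// Non-static accessor: other translation units can now observe the
// contents of Zs, so the array can no longer simply be discarded.
const Z* getzs() {
  return &Zs[0];
}
```

Without `getzs()`, nothing refers to `Zs`, which matches the GCC 5 observation in comment #4 that the whole initializer is optimized away.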
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63191

--- Comment #4 from Steven Bosscher ---
How is one to reproduce this bug with GCC5? I've tried:

$ ./xg++ --version
xg++ (GCC) 5.0.0 20150407 (experimental) [trunk revision 221906]
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ./xg++ -B. -S -O2 -m32 -fPIC PR63191.cc -fdump-tree-optimized
$ cat PR63191.cc.190t.optimized

;; Function (static initializers for PR63191.cc) (_GLOBAL__sub_I_PR63191.cc, funcdef_no=4, decl_uid=14028, cgraph_uid=4, symbol_order=1500) (executed once)

(static initializers for PR63191.cc) ()
{
  :
  return;

}

$

So AFAICT GCC5 optimizes the test case of comment #0 to an empty file.

I'm sure there's a way to avoid optimizing this to empty, but I'm not quite a
C++ guru ;-)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63191

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.4                       |4.8.5

--- Comment #3 from Jakub Jelinek ---
GCC 4.8.4 has been released.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63191

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63191

--- Comment #2 from Richard Biener ---
With

int a, b, c, d;
struct X { int a; int b; void *p; } z[4];
void foo (void)
{
  z[0].a = 1;
  z[0].b = 2;
  z[0].p = &a;
  z[1].a = 1;
  z[1].b = 2;
  z[1].p = &b;
  z[2].a = 1;
  z[2].b = 2;
  z[2].p = &c;
  z[3].a = 1;
  z[3].b = 2;
  z[3].p = &d;
}

CSEing of the GOT load of z works.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63191

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |i?86-*-*
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |memory-hog
   Last reconfirmed|                            |2014-09-08
                 CC|                            |rth at gcc dot gnu.org
             Blocks|                            |47344
     Ever confirmed|0                           |1
            Summary|32-bit gcc uses excessive   |[4.8/4.9/5 Regression]
                   |memory during dead store    |32-bit gcc uses excessive
                   |elimination with -fPIC      |memory during dead store
                   |                            |elimination with -fPIC
   Target Milestone|---                         |4.8.4

--- Comment #1 from Richard Biener ---
Confirmed. Possibly excessive value_rtx expansion from dse.c:canon_address.

The testcase is a function with a single basic-block and 3 stores (the static
initializer function) with the pattern

  D.94947 = (struct Z *) &Zs;
  D.94947->x1_ = &Xs1[0];
  D.94947->x2_ = 1;
  D.94947->x3_ = 1;
  temp.20397 = D.94947 + 12;
  temp.20397->x1_ = &Xs90[0];
  temp.20397->x2_ = 2;
  temp.20397->x3_ = 1;
  ...
  temp.30587 = temp.30586 + 12;
  temp.30587->x1_ = &Xs611[0];
  temp.30587->x2_ = 2;
  temp.30587->x3_ = 1;

thus groups of three stores followed by an address adjustment. The above
is from a GCC 4.3 IL dump. The GCC 4.9 IL dump shows

  MEM[(struct Z *)&Zs].x1_ = &Xs1;
  MEM[(struct Z *)&Zs].x2_ = 1;
  MEM[(struct Z *)&Zs].x3_ = 1;
  MEM[(struct Z *)&Zs + 12B].x1_ = &Xs90;
  MEM[(struct Z *)&Zs + 12B].x2_ = 2;
  MEM[(struct Z *)&Zs + 12B].x3_ = 1;
  MEM[(struct Z *)&Zs + 24B].x1_ = &Xs91;
  MEM[(struct Z *)&Zs + 24B].x2_ = 2;
  MEM[(struct Z *)&Zs + 24B].x3_ = 1;
  ...
  MEM[(struct Z *)&Zs + 122292B].x1_ = &Xs611;
  MEM[(struct Z *)&Zs + 122292B].x2_ = 2;
  MEM[(struct Z *)&Zs + 122292B].x3_ = 1;

which causes each store to be expanded via sth like

(insn 71298 71297 71299 2 (set (reg:SI 40822)
        (const:SI (unspec:SI [
                    (symbol_ref:SI ("_ZL2Zs") [flags 0x2] )
                ] UNSPEC_GOTOFF))) t.C:5 -1
     (nil))
(insn 71299 71298 71300 2 (set (mem/c:SI (plus:SI (plus:SI (reg:SI 3 bx)
                    (reg:SI 40822))
                (const_int 122216 [0x1dd68])) [4 MEM[(struct Z *)&Zs + 122208B].x3_+0 S4 A64])
        (const_int 1 [0x1])) t.C:5 -1
     (nil))

I suppose "lowering" PIC addresses somewhere before RTL expansion (and
CSEing the addresses) would help here. Lowering as in not treating them
as is_gimple_min_invariant.

With 4.3 we have a single address load for &Zs (but of course we retain
the individual stored addresses loads - thus still very many PIC addresses
in this function).

Why is CSE not able to CSE the UNSPEC_GOTOFF addresses? Does it not do it
because of the (const:SI ...) wrapping (as in, not profitable)? Or is it
confused about the other intermediate UNSPEC_GOTOFF uses?

That said, cse1 should be able to turn the RTL into sth equivalent to
what 4.3 produced.
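
The byte offsets in the dumps above follow from the 32-bit layout of `struct Z`: a 4-byte pointer plus two 4-byte ints gives a 12-byte stride, so element i starts at 12*i and its `x3_` field sits 8 bytes further in. A small sketch that checks this arithmetic at compile time; `Z32` and `field_offset` are hypothetical names introduced here, and `Z32` models the pointer member as a `uint32_t` so the numbers also hold when built on a 64-bit host.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of struct Z as laid out with -m32: the pointer
// member is represented by a 32-bit integer (assumption for portability).
struct Z32 {
  std::uint32_t x1_;  // stands in for const X*
  std::int32_t x2_;
  std::int32_t x3_;
};

static_assert(sizeof(Z32) == 12, "stride between array elements");
static_assert(offsetof(Z32, x3_) == 8, "x3_ is the third 4-byte field");

// Byte offset, relative to &Zs, of a field inside element `index`.
constexpr std::size_t field_offset(std::size_t index, std::size_t field) {
  return index * sizeof(Z32) + field;
}

// The RTL store shown above: MEM[&Zs + 122208B].x3_ becomes the
// constant 122216 (0x1dd68) added to the GOTOFF base.
static_assert(field_offset(10184, offsetof(Z32, x3_)) == 0x1dd68,
              "122208 + 8 == 122216");

// The last element in the 4.9 dump starts at 122292B.
static_assert(field_offset(10191, 0) == 122292, "12 * 10191");
```

Every one of these stores differs only in that constant, which is why a single shared load of the `&Zs` GOTOFF address, as in the 4.3 IL, would be enough for all of them.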