[Bug tree-optimization/38785] [4.5/4.6/4.7/4.8 Regression] huge performance regression on EEMBC bitmnp01
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38785

Jan Hubicka hubicka at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at redhat dot com

--- Comment #37 from Jan Hubicka hubicka at gcc dot gnu.org 2012-11-22 09:50:52 UTC ---
Yes, agreed. It is a general problem of SSA form that we assume reg-reg copies in PHIs will be optimized away by a smart regalloc. Moreover, we assume the same for constants. This case is hard to fix later since the values are path sensitive...

Vladimir, I guess there is not much to do on the regalloc side, right?

Why does the problem not reproduce on a simplified testcase:

void f (int i, long *a, long *b)
{
  int sum = 0;
  b[i] = 0;
#define PART(I) if (t()) sum++;
  PART (1);
  PART (2);
  PART (3);
  PART (4);
  PART (5);
  PART (6);
  tt (sum);
}

Here we somehow do not consider the partial redundancies on sum...
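To make the PHI-copy assumption in comment #37 concrete, here is a minimal hand-written C sketch (not GCC output). The function names part_source and part_lowered and the variables sum_1, sum_2 and sum_3 are made up for illustration. The second function mimics roughly what a PHI node turns into once the compiler leaves SSA form: one copy on each incoming edge, which the register allocator is then expected to coalesce away.

```c
/* Illustration only: hand-written, not a GCC dump.  All names here are
   hypothetical.  */

/* Source-level view of one conditional increment, as in PART(I).  */
int part_source(int sum, int taken)
{
    if (taken)
        sum++;
    return sum;
}

/* The same code written the way an out-of-SSA lowering might see it.
   In SSA form the merge point would read
       sum_3 = PHI <sum_2(then), sum_1(else)>
   and lowering the PHI places one copy on each incoming edge.  */
int part_lowered(int sum_1, int taken)
{
    int sum_3;
    if (taken) {
        int sum_2 = sum_1 + 1;
        sum_3 = sum_2;          /* copy on the 'then' edge */
    } else {
        sum_3 = sum_1;          /* copy on the 'else' edge */
    }
    return sum_3;
}
```

Both functions compute the same result. The point is only that every PHI argument, whether a register or a constant, becomes a potential move or load on some edge if it cannot be coalesced.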
[Bug tree-optimization/38785] [4.5/4.6/4.7/4.8 Regression] huge performance regression on EEMBC bitmnp01
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38785

--- Comment #38 from Jan Hubicka hubicka at gcc dot gnu.org 2012-11-22 13:05:38 UTC ---
Yet another variant...

void f (int i, long *a, long *b)
{
  int sum = 0;
  for (; --i >= 0; a++, b++)
    {
      b[i] = 0;
#define PART(I) if (t()) sum+=100+I;
      PART (1);
      PART (2);
      PART (3);
      PART (4);
      PART (5);
      PART (6);
      tt (sum);
    }
}

leads to...

Starting insert iteration 1
Could not find SSA_NAME representative for expression:{plus_expr,sum_8,101}
Created SSA_NAME representative pretmp_98 for expression:{plus_expr,sum_8,101}
Could not find SSA_NAME representative for expression:{plus_expr,pretmp_98,102}
Created SSA_NAME representative pretmp_99 for expression:{plus_expr,pretmp_98,102}
Could not find SSA_NAME representative for expression:{plus_expr,pretmp_99,103}
Created SSA_NAME representative pretmp_100 for expression:{plus_expr,pretmp_99,103}
Could not find SSA_NAME representative for expression:{plus_expr,pretmp_100,104}
Created SSA_NAME representative pretmp_101 for expression:{plus_expr,pretmp_100,104}
Could not find SSA_NAME representative for expression:{plus_expr,pretmp_101,105}
Created SSA_NAME representative pretmp_102 for expression:{plus_expr,pretmp_101,105}
Could not find SSA_NAME representative for expression:{plus_expr,pretmp_100,105}
Created SSA_NAME representative pretmp_103 for expression:{plus_expr,pretmp_100,105}
Could not find SSA_NAME representative for expression:{plus_expr,pretmp_99,104}
Created SSA_NAME representative pretmp_104 for expression:{plus_expr,pretmp_99,104}
Could not find SSA_NAME representative for expression:{plus_expr,pretmp_104,105}
Created SSA_NAME representative pretmp_105 for expression:{plus_expr,pretmp_104,105}
Could not find SSA_NAME representative for expression:{plus_expr,pretmp_99,105}
Created SSA_NAME representative pretmp_106 for expression:{plus_expr,pretmp_99,105}
Could not find SSA_NAME representative for expression:{plus_expr,pretmp_98,103}
Created SSA_NAME representative pretmp_107 for expression:{plus_expr,pretmp_98,103}
Could not find SSA_NAME representative for expression:{plus_expr,pretmp_107,104}
Created SSA_NAME representative pretmp_108 for expression:{plus_expr,pretmp_107,104}
Could not find SSA_NAME representative for expression:{plus_expr,pretmp_108,105}
Created SSA_NAME representative pretmp_109 for expression:{plus_expr,pretmp_108,105}
Could not find SSA_NAME representative for expression:{plus_expr,pretmp_107,105}
Created SSA_NAME representative pretmp_110 for expression:{plus_expr,pretmp_107,105}
Could not find SSA_NAME representative for expression:{plus_expr,pretmp_98,104}
Created SSA_NAME representative pretmp_111 for expression:{plus_expr,pretmp_98,104}
Could not find SSA_NAME representative for expression:{plus_expr,pretmp_111,105}

That eventually leads to a lot of unused pretmp vars.
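The chains in the dump above (sum_8+101, then +102 or +103 or +104, and so on) appear to follow the different combinations of taken PART branches. As a rough sketch of why the number of such path-dependent values grows so quickly, the following C helper enumerates the value of sum for one loop iteration given a bitmask of taken branches. The names NPARTS, path_sum and count_paths are hypothetical and exist only for this illustration; this says nothing about how tree-ssa-pre.c actually enumerates expressions.

```c
#include <assert.h>

/* Number of PART(I) statements in the testcase from comment #38.  */
enum { NPARTS = 6 };

/* Value added to sum in one loop iteration when the PART(I) branches
   selected by the bits of 'mask' are taken (bit 0 stands for PART(1),
   bit 5 for PART(6)).  Each taken branch adds 100 + I.  */
int path_sum(unsigned mask)
{
    assert(mask < (1u << NPARTS));
    int sum = 0;
    for (int i = 1; i <= NPARTS; i++)
        if (mask & (1u << (i - 1)))
            sum += 100 + i;
    return sum;
}

/* Number of distinct taken/not-taken combinations through one loop
   body: every PART can be taken or skipped independently.  */
unsigned count_paths(void)
{
    return 1u << NPARTS;
}
```

With six independent branches there are 64 combinations, so a pass that materialises one temporary per partial chain can create many names of which only a few are used on any given path.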
[Bug tree-optimization/38785] [4.5/4.6/4.7/4.8 Regression] huge performance regression on EEMBC bitmnp01
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38785

Jan Hubicka hubicka at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |REOPENED

--- Comment #35 from Jan Hubicka hubicka at gcc dot gnu.org 2012-11-21 17:56:04 UTC ---
Too bad, we really need some model of how many PHI copies we introduce... I agree with Richard's comment that Joern's patch is rather bad with respect to optimization opportunities. This is more or less a register pressure problem. I will try to think about it a bit more ;)
[Bug tree-optimization/38785] [4.5/4.6/4.7/4.8 Regression] huge performance regression on EEMBC bitmnp01
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38785

--- Comment #36 from Jorn Wolfgang Rennecke amylaar at gcc dot gnu.org 2012-11-21 17:59:10 UTC ---
(In reply to comment #35)
> Too bad, we really need some model of how many PHI copies we introduce...
> I agree with Richard's comment that Joern's patch is rather bad with respect
> to optimization opportunities. This is more or less a register pressure
> problem. I will try to think about it a bit more ;)

This is not just register pressure; these constant loads and register-register copies do not come free, either.
[Bug tree-optimization/38785] [4.5/4.6/4.7/4.8 Regression] huge performance regression on EEMBC bitmnp01
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38785

--- Comment #34 from Igor Zamyatin izamyatin at gmail dot com 2012-11-20 21:00:39 UTC ---
(In reply to comment #32)
> Would it be possible to double check if this problem is still fixed after the
> fix to the tree-ssa-pre patch?

Unfortunately the regression happened after the fix...
[Bug tree-optimization/38785] [4.5/4.6/4.7/4.8 Regression] huge performance regression on EEMBC bitmnp01
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38785

Jan Hubicka hubicka at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |WAITING
                 CC|                            |hubicka at gcc dot gnu.org
         Resolution|FIXED                       |

--- Comment #32 from Jan Hubicka hubicka at gcc dot gnu.org 2012-11-16 17:50:28 UTC ---
Would it be possible to double check if this problem is still fixed after the fix to the tree-ssa-pre patch? I do not see any cold edges involved here, so perhaps we will need a better heuristic. We now again find some partial redundancies.

Found partial redundancy for expression {mem_ref<0B>,_20}@.MEM_5 (0024)
Inserted pretmp_57 = *_20; in predecessor 6
Created phi prephitmp_56 = PHI <_24(20), pretmp_57(6)> in block 7
Found partial redundancy for expression {mem_ref<0B>,_20}@.MEM_6 (0029)
Inserted pretmp_55 = *_20; in predecessor 8
Created phi prephitmp_63 = PHI <_29(21), pretmp_55(8)> in block 9
Found partial redundancy for expression {mem_ref<0B>,_20}@.MEM_7 (0034)
Inserted pretmp_64 = *_20; in predecessor 10
Created phi prephitmp_65 = PHI <_34(22), pretmp_64(10)> in block 11
Found partial redundancy for expression {mem_ref<0B>,_20}@.MEM_8 (0039)
Inserted pretmp_66 = *_20; in predecessor 12
Created phi prephitmp_67 = PHI <_39(23), pretmp_66(12)> in block 13
Starting insert iteration 2
Replaced *_20 with prephitmp_56 in _29 = *_20;
Replaced *_20 with prephitmp_63 in _34 = *_20;
Replaced *_20 with prephitmp_65 in _39 = *_20;
Replaced *_20 with prephitmp_67 in _44 = *_20;
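For readers less familiar with the dump above, this is roughly what "Inserted pretmp = *p in predecessor" followed by "Created phi" amounts to at the source level. The sketch is hand-written C under the assumption that nothing stores to *p between the two loads. The function names before_pre and after_pre are hypothetical, and the local names pretmp and prephitmp are borrowed from the dump only as a mnemonic.

```c
/* Illustration only: hand-written, not produced by GCC.  */

/* Before: the second load of *p is redundant only on the path where
   'cond' is true, i.e. it is partially redundant.  */
long before_pre(const long *p, int cond)
{
    long acc = 0;
    if (cond)
        acc = *p;           /* load available on this path only */
    return acc + *p;        /* partially redundant load */
}

/* After: a load is inserted on the path that lacked one, and the two
   values are merged (the PHI), so the final load disappears.  */
long after_pre(const long *p, int cond)
{
    long acc = 0;
    long prephitmp;
    if (cond) {
        long t = *p;
        acc = t;
        prephitmp = t;      /* value already available on this edge */
    } else {
        long pretmp = *p;   /* load inserted in the other predecessor */
        prephitmp = pretmp;
    }
    return acc + prephitmp; /* uses the merged value, no reload */
}
```

The transformation removes a load on one path at the cost of an extra merged value that stays live across the join, which is where the copies and register pressure discussed in this PR come from.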
[Bug tree-optimization/38785] [4.5/4.6/4.7/4.8 Regression] huge performance regression on EEMBC bitmnp01
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38785

--- Comment #33 from Jan Hubicka hubicka at gcc dot gnu.org 2012-11-16 18:00:54 UTC ---
And at -O3 the testcase indeed does not look good:

  <bb 7>:
  # cstore_51 = PHI <0(5), 2147483647(6)>
  # prephitmp_82 = PHI <1073741823(5), 3221225470(6)>
  # prephitmp_83 = PHI <1789569705(5), 3937053352(6)>
  # prephitmp_84 = PHI <2326440616(5), 4473924263(6)>
  # prephitmp_85 = PHI <2755937345(5), 4903420992(6)>
  # prephitmp_86 = PHI <3113851286(5), 5261334933(6)>
  # prephitmp_87 = PHI <2684354557(5), 4831838204(6)>
  # prephitmp_88 = PHI <2219066434(5), 4366550081(6)>
  # prephitmp_89 = PHI <2576980375(5), 4724464022(6)>
  # prephitmp_90 = PHI <2147483646(5), 4294967293(6)>
  # prephitmp_91 = PHI <1610612734(5), 3758096381(6)>
  # prephitmp_92 = PHI <2040109463(5), 4187593110(6)>
  # prephitmp_93 = PHI <2398023404(5), 4545507051(6)>
  # prephitmp_94 = PHI <1968526675(5), 4116010322(6)>
  # prephitmp_95 = PHI <1503238552(5), 3650722199(6)>
  # prephitmp_96 = PHI <1861152493(5), 4008636140(6)>
  # prephitmp_97 = PHI <1431655764(5), 3579139411(6)>
  # prephitmp_98 = PHI <715827882(5), 2863311529(6)>
  # prephitmp_99 = PHI <1252698793(5), 3400182440(6)>
  # prephitmp_100 = PHI <1682195522(5), 3829679169(6)>
  # prephitmp_103 = PHI <1145324611(5), 3292808258(6)>
  # prephitmp_106 = PHI <536870911(5), 2684354558(6)>
  # prephitmp_107 = PHI <966367640(5), 3113851287(6)>
  # prephitmp_108 = PHI <1324281581(5), 3471765228(6)>
  # prephitmp_109 = PHI <894784852(5), 3042268499(6)>
  # prephitmp_110 = PHI <429496729(5), 2576980376(6)>
  # prephitmp_111 = PHI <787410670(5), 2934894317(6)>
  # prephitmp_112 = PHI <357913941(5), 2505397588(6)>
  *_18 = cstore_51;
  _24 = *_20;
  _25 = _24 2;
  if (_25 = -14)
    goto <bb 8>;
  else
    goto <bb 9>;

The catch is that the patch disabled partial PRE by accident. No cold edges are involved here since we predict all the branches as roughly even :(
[Bug tree-optimization/38785] [4.5/4.6/4.7/4.8 Regression] huge performance regression on EEMBC bitmnp01
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38785

Richard Guenther rguenth at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.5.4                       |4.8.0
[Bug tree-optimization/38785] [4.5/4.6/4.7/4.8 Regression] huge performance regression on EEMBC bitmnp01
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38785

--- Comment #30 from Maxim Kuvyrkov mkuvyrkov at gcc dot gnu.org 2012-04-28 01:56:59 UTC ---
Author: mkuvyrkov
Date: Sat Apr 28 01:56:54 2012
New Revision: 186928

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=186928
Log:
    PR tree-optimization/38785
    * common.opt (ftree-partial-pre): New option.
    * doc/invoke.texi: Document it.
    * opts.c (default_options_table): Initialize flag_tree_partial_pre.
    * tree-ssa-pre.c (do_partial_partial_insertion): Insert only if it will
    benefit speed path.
    (execute_pre): Use flag_tree_partial_pre.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/common.opt
    trunk/gcc/doc/invoke.texi
    trunk/gcc/opts.c
    trunk/gcc/tree-ssa-pre.c
[Bug tree-optimization/38785] [4.5/4.6/4.7/4.8 Regression] huge performance regression on EEMBC bitmnp01
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38785

Maxim Kuvyrkov mkuvyrkov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #31 from Maxim Kuvyrkov mkuvyrkov at gcc dot gnu.org 2012-04-28 02:09:38 UTC ---
Fixed by the above reworked version of Joern's and Steven's patches.
[Bug tree-optimization/38785] [4.5/4.6/4.7/4.8 Regression] huge performance regression on EEMBC bitmnp01
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38785

Jakub Jelinek jakub at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.7                       |4.5.4

--- Comment #29 from Jakub Jelinek jakub at gcc dot gnu.org 2012-03-13 12:46:14 UTC ---
4.4 branch is being closed, moving to 4.5.4 target.