[Bug tree-optimization/69042] [6 regression] Missed optimization in ivopts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #14 from Richard Biener ---
Regression is fixed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042

--- Comment #13 from amker at gcc dot gnu.org ---
A short summary: the two patches resolve the test case provided in this PR. However, the problem still exists if the first function in the compilation unit triggers the issue. This is because of x86's inconsistent behavior regarding PIC_OFF_REG, discussed at:
https://gcc.gnu.org/ml/gcc/2016-02/msg00040.html
Maybe this will be addressed in the next stage 1.

If we go deeper into IVOPT, we may be able to model the fact that a symbol's PIC offset requires an additional register with -m32. But this is target-dependent (x86_64, for example, may not have this issue), so we would then need a backend hook to get this information. Then again, the register pressure computation is far from a fine-grained model anyway...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042

--- Comment #12 from amker at gcc dot gnu.org ---
The above two patches don't actually fix the problem. I think they cover it up by bringing back the old behavior. Ilya, could you please check the status of the regression? If it disappears, maybe we can downgrade it to P3?

Thanks,
bin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042

--- Comment #11 from amker at gcc dot gnu.org ---
Author: amker
Date: Wed Mar 23 15:26:43 2016
New Revision: 234430

URL: https://gcc.gnu.org/viewcvs?rev=234430=gcc=rev
Log:
        PR tree-optimization/69042
        * params.def (PARAM_IV_CONSIDER_ALL_CANDIDATES_BOUND): Increase the
        parameter from 30 to 40.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/params.def
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042

--- Comment #10 from amker at gcc dot gnu.org ---
Author: amker
Date: Wed Mar 23 15:24:20 2016
New Revision: 234429

URL: https://gcc.gnu.org/viewcvs?rev=234429=gcc=rev
Log:
        PR tree-optimization/69042
        * tree-ssa-loop-ivopts.c (add_iv_candidate_for_use): Add IV cand for
        use with constant offset stripped in base.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-loop-ivopts.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042

--- Comment #9 from amker at gcc dot gnu.org ---
(In reply to amker from comment #8)
> Though adding candidate with offset stripped from base helps this case, it
> causes other regressions which I need to understand.

I can confirm that one major regression on AArch64 for spec2k is in 173.applu. The root cause is that this change increases the number of candidates beyond the default "--param iv-consider-all-candidates-bound=30". It can be resolved by increasing this param. The other regressions seem to be false alarms; I couldn't reproduce them. There are some small improvements in other cases, and overall spec2k performance is unchanged on AArch64. I will check spec2k6 and send the patch if there is no regression there either. And can we increase the param bound a little now (or in GCC 7)?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042

--- Comment #8 from amker at gcc dot gnu.org ---
Although adding a candidate with the offset stripped from the base helps this case, it causes other regressions that I need to understand.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042

--- Comment #7 from amker at gcc dot gnu.org ---
If I add back the candidate, ivopt can fix the attached case, but it still can't handle a slightly tuned case such as the one below:

extern const int indexes[];
int bar (int code);

int foo (short *data)
{
  register int i, j;
  j = 0;
  for (i = 1; i < 64; i++)
    {
      if (data[indexes[i]])
        {
          j++;
        }
      else
        {
          if (bar (j))
            return 0;
          j = 0;
          __asm__("":::"eax","ebx","ecx","edx","esi","edi","ebp");
        }
    }
  return 1;
}

The only difference is that bar changes from a function definition to a declaration. The cost ivopt computes for the PIC-related symbol_ref differs because of the issue described at:
https://gcc.gnu.org/ml/gcc/2016-02/msg00040.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |i?86-*-*
           Priority|P3                          |P1

--- Comment #6 from Richard Biener ---
Let's keep looking a bit but eventually downgrade again.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042

--- Comment #3 from amker at gcc dot gnu.org ---
(In reply to amker from comment #2)
> For iv use:
> use 0
>   address
>   in statement _9 = indexes[i_23];
>
>   at position indexes[i_23]
>   type const int *
>   base (const int *) ( + 4)
>   step 4
>   base object (void *)
>   related candidates
>
> Before the change, two candidates are added:
> candidate 8
>   var_before ivtmp.12
>   var_after ivtmp.12
>   incremented before exit test
>   type unsigned int
>   base (unsigned int) ( + 4)
>   step 4
>   base object (void *)
> candidate 10
>   var_before ivtmp.14
>   var_after ivtmp.14
>   incremented before exit test
>   type unsigned int
>   base (unsigned int)
>   step 4
>   base object (void *)
> After the change only candidate 8 is added. I did this to minimize the
> number of candidates. Maybe that's too aggressive. Probably candidate like
> this (with offset stripped) should be added, I will check if it causes new
> regressions.
>
> Another problem is candidate 8 could be chosen to decrease register
> pressure, but isn't. I will check why the register pressure cost doesn't
> work in this case. Though candidate 8 is still a little bit worse than
> candidate 10, because of one more setup instruction in loop pre-header block.

Regarding the register pressure question: GCC assumes that the symbol "indexes" in the memory reference "MEM[symbol: indexes, index: _21, step: 4, offset: 0B]" doesn't introduce any register pressure, because we can use an addressing mode like "indexes(,%eax,4)". This is not true for PIC/PIE code. In that case, we need to move "indexes@GOT(%eax)" into a register before using it in the memory reference. IVOPT's register pressure computation does not count that additional register use.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042

--- Comment #4 from amker at gcc dot gnu.org ---
I still need to check whether AArch64 is affected by this register pressure issue. It shouldn't be, because AArch64 doesn't support a symbol in the addressing mode, so the address has to be computed outside the memory reference anyway.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042

--- Comment #5 from amker at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #1)
> Confirmed, even on aarch64 too. Replacing the asm with:
>
> asm("":::"x0","x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","x12",
> "x13","x14","x15","x16","x17","x18","x19","x20","x21","x22","x23","x24",
> "x25","x26","x27","x28","x30");
>
>
> Shows the problem there.

The AArch64 case is complicated:
1) TREE/IVOPT doesn't understand the asm instruction and its register pressure.
2) Cost computation involving symbol_ref is a mess, because of both IVOPT and the AArch64 backend.
3) As on x86_64, IVOPT doesn't count register pressure for the symbol value in a memory reference.
...
And most important:
4) If IVOPT picks candidate {*index, 4}, it does decrease register pressure by one at the array reference. However, this gain is neutralized at the loop's exit condition, where we need one more register to hold the terminating value. Currently that value is the constant 64, which can be rematerialized in the cmp instruction.

I think there is no benefit on AArch64 with respect to this example.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Target Milestone|---                         |6.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-01-06
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski ---
Confirmed, on aarch64 too. Replacing the asm with:

asm("":::"x0","x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","x12",
    "x13","x14","x15","x16","x17","x18","x19","x20","x21","x22","x23","x24",
    "x25","x26","x27","x28","x30");

shows the problem there.