[Bug tree-optimization/69042] [6 regression] Missed optimization in ivopts

2016-03-24 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #14 from Richard Biener  ---
The regression is fixed.

2016-03-23 Thread amker at gcc dot gnu.org

--- Comment #13 from amker at gcc dot gnu.org ---
A short summary:
The test case provided in this PR is resolved by the two patches, but the
problem still exists if the issue is triggered by the first function in a
compilation unit.  This is because of x86's inconsistent handling of
PIC_OFF_REG, discussed at https://gcc.gnu.org/ml/gcc/2016-02/msg00040.html
Maybe this will be addressed in the next stage 1.
If we dig deeper into IVOPT, we may be able to model the fact that a symbol's
PIC offset requires an additional register for -m32.  But this is a
target-dependent issue; for example, x86_64 may not have it.  We would then
need a backend hook to get this information.  But register pressure
computation is far from a fine-grained model anyway...
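
The backend hook idea can be sketched in C.  This is a minimal, hypothetical
model: struct target_desc, target_symbol_ref_extra_regs and estimate_loop_regs
are invented names, not GCC internals, and the rule they encode (one extra
register for a symbol used in a -m32 PIC address, none otherwise) is only the
assumption stated in this comment.

```c
#include <stdbool.h>

/* Hypothetical target description; not a GCC data structure.  */
struct target_desc
{
  bool pic;      /* Compiling PIC/PIE code.  */
  bool is_64bit; /* x86_64 (-m64) rather than i386 (-m32).  */
};

/* Hypothetical backend hook: extra registers needed before a global symbol
   can be used as the base of a memory reference.  Assumption taken from
   the comment: -m32 PIC code needs one, every other case needs none.  */
int
target_symbol_ref_extra_regs (const struct target_desc *t)
{
  return (t->pic && !t->is_64bit) ? 1 : 0;
}

/* Hypothetical IVOPT-side estimate: registers for the chosen IVs and loop
   invariants, plus whatever the hook reports for each distinct symbol used
   as an address base inside the loop.  */
int
estimate_loop_regs (int n_ivs, int n_invariants, int n_symbol_bases,
                    const struct target_desc *t)
{
  return n_ivs + n_invariants
         + n_symbol_bases * target_symbol_ref_extra_regs (t);
}
```

With 2 IVs, 1 invariant and one symbol base (indexes), this model gives 4 for
-m32 PIC and 3 for non-PIC or x86_64 code.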

2016-03-23 Thread amker at gcc dot gnu.org

--- Comment #12 from amker at gcc dot gnu.org ---
The two patches above don't actually fix the problem; I think they cover it up
by bringing back the old behavior.
So Ilya, could you please check the status of the regression?  Thanks.  If it
disappears, maybe we can downgrade it to P3?

Thanks,
bin

2016-03-23 Thread amker at gcc dot gnu.org

--- Comment #11 from amker at gcc dot gnu.org ---
Author: amker
Date: Wed Mar 23 15:26:43 2016
New Revision: 234430

URL: https://gcc.gnu.org/viewcvs?rev=234430=gcc=rev
Log:

PR tree-optimization/69042
* params.def (PARAM_IV_CONSIDER_ALL_CANDIDATES_BOUND): Increase the
parameter from 30 to 40.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/params.def

2016-03-23 Thread amker at gcc dot gnu.org

--- Comment #10 from amker at gcc dot gnu.org ---
Author: amker
Date: Wed Mar 23 15:24:20 2016
New Revision: 234429

URL: https://gcc.gnu.org/viewcvs?rev=234429=gcc=rev
Log:

PR tree-optimization/69042
* tree-ssa-loop-ivopts.c (add_iv_candidate_for_use): Add IV cand
for use with constant offset stripped in base.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-ssa-loop-ivopts.c
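
The idea behind r234429 can be illustrated with a simplified model.  This is a
sketch under stated assumptions: struct iv_base, struct iv_cand and
add_cands_for_use are hypothetical, and real IVOPT works on GIMPLE trees
rather than a symbol/offset pair.  For an address use whose base is a symbol
plus a constant offset (such as indexes + 4 with step 4, roughly candidates 8
and 10 in the dump quoted in comment #3), it adds the candidate for the base
as-is and, when the offset is nonzero, a second candidate with the offset
stripped, so the offset can be folded into the addressing mode instead.

```c
#include <stddef.h>

/* Hypothetical, simplified IV base: a symbol plus a constant byte offset,
   e.g. indexes + 4.  */
struct iv_base
{
  const char *symbol;
  long offset;
};

/* Hypothetical IV candidate: base and step.  */
struct iv_cand
{
  struct iv_base base;
  long step;
};

/* Write the candidates for an address use with base BASE and step STEP
   into OUT and return how many were written (1 or 2).  The first keeps
   the constant offset in the base; the second, added only when the offset
   is nonzero, strips it so it can be folded into the addressing mode.  */
size_t
add_cands_for_use (struct iv_base base, long step, struct iv_cand out[2])
{
  size_t n = 0;

  out[n].base = base;
  out[n].step = step;
  n++;

  if (base.offset != 0)
    {
      out[n].base.symbol = base.symbol;
      out[n].base.offset = 0;
      out[n].step = step;
      n++;
    }
  return n;
}
```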

2016-03-09 Thread amker at gcc dot gnu.org

--- Comment #9 from amker at gcc dot gnu.org ---
(In reply to amker from comment #8)
> Though adding a candidate with the offset stripped from the base helps this
> case, it causes other regressions that I need to understand.

I can confirm that one major regression on AArch64 for spec2k is in 173.applu.
The root cause is that this change increases the number of candidates beyond
the default "--param iv-consider-all-candidates-bound=30".  It can be resolved
by increasing this param.
The other regressions seem to be false alarms; I couldn't reproduce them.

There are some small improvements in other cases; overall, spec2k performance
on AArch64 is unchanged.  I will check spec2k6 and send the patch if there is
no regression there either.
Also, can we increase the param bound a little now (or in GCC 7)?
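
The effect of the bound can be pictured with a simplified model.  This is an
assumption about the mechanism, not GCC's code, and consider_all_candidates,
cost_pairs and IV_CONSIDER_ALL_CANDIDATES_BOUND are hypothetical names.
Roughly, when the total number of IV candidates is within
iv-consider-all-candidates-bound, every use is costed against every candidate;
above it, each use is only costed against a restricted set of related
candidates, so extra candidates can push a loop past the bound and change
which candidates get considered at all.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the default of --param iv-consider-all-candidates-bound before
   r234430, which raised it to 40.  */
#define IV_CONSIDER_ALL_CANDIDATES_BOUND 30

/* Within the bound, every (use, candidate) pair is costed.  */
bool
consider_all_candidates (size_t n_cands, size_t bound)
{
  return n_cands <= bound;
}

/* Number of (use, candidate) cost entries under this simplified model.
   RELATED_PER_USE is an assumed average size of each use's restricted
   candidate set when the bound is exceeded.  */
size_t
cost_pairs (size_t n_uses, size_t n_cands, size_t related_per_use,
            size_t bound)
{
  if (consider_all_candidates (n_cands, bound))
    return n_uses * n_cands;
  return n_uses * related_per_use;
}
```

For example, a loop with 10 uses and 31 candidates is costed on only
10 * 3 = 30 pairs with the old bound of 30 (assuming 3 related candidates per
use), but on all 310 pairs with the bound raised to 40.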

2016-02-19 Thread amker at gcc dot gnu.org

--- Comment #8 from amker at gcc dot gnu.org ---
Though adding a candidate with the offset stripped from the base helps this
case, it causes other regressions that I need to understand.

2016-02-04 Thread amker at gcc dot gnu.org

--- Comment #7 from amker at gcc dot gnu.org ---
If I add the candidate back, IVOPT can fix the attached case, but it still
can't handle the slightly modified case below:

extern const int indexes[];

int bar (int code);

int
foo (short *data)
{
  register int i, j;

  j = 0;
  for (i = 1; i < 64; i++) {
    if (data[indexes[i]]) {
      j++;
    } else {
      if (bar (j))
        return 0;
      j = 0;
      __asm__("":::"eax","ebx","ecx","edx","esi","edi","ebp");
    }
  }

  return 1;
}

The only difference is that bar is changed from a function definition to a
declaration.

The cost IVOPT computes for the PIC-related SYMBOL_REF differs between the two
versions because of the issue described at
https://gcc.gnu.org/ml/gcc/2016-02/msg00040.html

2016-01-14 Thread rguenth at gcc dot gnu.org

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |i?86-*-*
           Priority|P3                          |P1

--- Comment #6 from Richard Biener  ---
Let's keep looking at it a bit, but eventually downgrade it again.

2016-01-13 Thread amker at gcc dot gnu.org

--- Comment #3 from amker at gcc dot gnu.org ---
(In reply to amker from comment #2)
> For iv use:
> use 0
>   address
>   in statement _9 = indexes[i_23];
> 
>   at position indexes[i_23]
>   type const int *
>   base (const int *) ( + 4)
>   step 4
>   base object (void *) 
>   related candidates 
> 
> Before the change, two candidates are added:
> candidate 8
>   var_before ivtmp.12
>   var_after ivtmp.12
>   incremented before exit test
>   type unsigned int
>   base (unsigned int) ( + 4)
>   step 4
>   base object (void *) 
> candidate 10
>   var_before ivtmp.14
>   var_after ivtmp.14
>   incremented before exit test
>   type unsigned int
>   base (unsigned int) 
>   step 4
>   base object (void *) 
> After the change, only candidate 8 is added.  I did this to minimize the
> number of candidates.  Maybe that's too aggressive.  Probably a candidate
> like this one (with the offset stripped) should be added; I will check
> whether it causes new regressions.
> 
> Another problem is that candidate 8 could be chosen to decrease register
> pressure, but isn't.  I will check why the register pressure cost doesn't
> work in this case.  That said, candidate 8 is still slightly worse than
> candidate 10, because it needs one more setup instruction in the loop
> pre-header block.

Regarding the register pressure question:
GCC assumes that the symbol "indexes" in the memory reference "MEM[symbol:
indexes, index: _21, step: 4, offset: 0B]" doesn't introduce any register
pressure, because we can use an addressing mode like "indexes(,%eax,4)".  This
is not true for PIC/PIE code.  In that case, we need to load
"indexes@GOT(%eax)" into a register before using it in the memory reference.
That additional register use is not counted in IVOPT's register pressure
computation.
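
The register counting described above can be sketched as a toy model.  The
function names are hypothetical, and the counts only cover the address of
indexes[i] at the memory reference, assuming either an index IV i (scaled
addressing) or a pointer IV that already holds &indexes[i].  The
symbol_in_addr flag is true when the symbol can be encoded directly in the
addressing mode (i386 non-PIC, as in indexes(,%eax,4)) and false when its
address must first be loaded into a register (i386 PIC via indexes@GOT, and,
per comment #4, AArch64).

```c
#include <stdbool.h>

/* Registers occupied by the address of indexes[i] at the memory
   reference, in this toy model.  */
int
address_regs (bool symbol_in_addr, bool pointer_iv)
{
  if (pointer_iv)
    return 1;  /* (%reg): the IV itself holds the address.  */
  /* Index register, plus a register for the symbol's address when it
     cannot be encoded in the addressing mode.  */
  return symbol_in_addr ? 1 : 2;
}

/* What IVOPT effectively assumes according to the comment: the symbol
   never needs a register, whatever the addressing mode.  */
int
ivopt_assumed_address_regs (bool pointer_iv)
{
  (void) pointer_iv;
  return 1;
}

/* Registers missing from IVOPT's estimate.  */
int
uncounted_regs (bool symbol_in_addr, bool pointer_iv)
{
  return address_regs (symbol_in_addr, pointer_iv)
         - ivopt_assumed_address_regs (pointer_iv);
}
```

Under this model only the index form in PIC code is undercounted (by one
register), which is the case where a pointer candidate would look better if
that register were counted.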

2016-01-13 Thread amker at gcc dot gnu.org

--- Comment #4 from amker at gcc dot gnu.org ---
I still need to check whether AArch64 is affected by this register pressure
issue.  It shouldn't be, because AArch64 doesn't support symbols in addressing
modes, so the symbol's address has to be computed outside the memory reference
anyway.

2016-01-13 Thread amker at gcc dot gnu.org

--- Comment #5 from amker at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #1)
> Confirmed, even on aarch64 too.  Replacing the asm with:
>  
> asm("":::"x0","x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","x12",
> "x13","x14","x15","x16","x17","x18","x19","x20","x21","x22","x23","x24",
> "x25","x26","x27","x28","x30");
> 
> 
> Shows the problem there.

The AArch64 case is complicated.
1) Tree-level passes, including IVOPT, don't understand the asm statement and
the register pressure it creates.
2) Cost computation involving SYMBOL_REF is a mess, in both IVOPT and the
AArch64 backend.
3) As on x86_64, IVOPT doesn't count register pressure for the symbol's value
in a memory reference.
...
And most importantly:
4) If IVOPT picks candidate {*index, 4}, it does decrease register pressure by
one at the array reference, but this is cancelled out at the loop's exit
condition, where we need one more register to hold the terminating value.
With the current IV choice, that value is the constant 64, which can be
rematerialized in the cmp instruction.
I think there is no benefit on AArch64 for this example.
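
Point 4 can be seen at the source level.  The following C sketch (function
names are hypothetical; it models what the candidates mean, not the code GCC
generates) writes the loop from the test case once with an index IV and once
with a pointer IV such as {&indexes[1], 4}.  The pointer form needs no index
register at the load, but its exit test compares against the end address
indexes + 64, a loop invariant that needs its own register unless it is
rematerialized, whereas the index form compares against the constant 64.

```c
/* Index IV: the exit test compares I with the constant 64, which can be
   an immediate operand of the compare.  */
int
count_nonzero_index (const short *data, const int *indexes)
{
  int n = 0;
  for (int i = 1; i < 64; i++)
    if (data[indexes[i]])
      n++;
  return n;
}

/* Pointer IV: the load uses P directly, but END (indexes + 64) is a loop
   invariant that has to live in a register for the exit test.
   INDEXES must have at least 64 elements.  */
int
count_nonzero_pointer (const short *data, const int *indexes)
{
  int n = 0;
  for (const int *p = indexes + 1, *end = indexes + 64; p < end; p++)
    if (data[*p])
      n++;
  return n;
}
```

Both functions visit indexes[1] through indexes[63] and return the same count;
only the IV and the operand of the exit test differ.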

2016-01-05 Thread pinskia at gcc dot gnu.org

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Target Milestone|---                         |6.0

2016-01-05 Thread pinskia at gcc dot gnu.org

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-01-06
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski  ---
Confirmed, even on aarch64 too.  Replacing the asm with:

asm("":::"x0","x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","x12",
"x13","x14","x15","x16","x17","x18","x19","x20","x21","x22","x23","x24",
"x25","x26","x27","x28","x30");

shows the problem there.