[Bug rtl-optimization/36712] Inefficient loop unrolling

steven at gcc dot gnu dot org Fri, 05 Feb 2010 05:33:33 -0800


------- Comment #16 from steven at gcc dot gnu dot org  2010-02-05 13:33 -------
I'm trying to coerce IVOPTS into producing the following, optimal code in the
GIMPLE optimizers (without much luck, so far):


<bb 2>:
  pretmp.11_26 = (int) s_11(D);
  ivtmp.20_28 = (long unsigned int) b_inout_5(D);
  D.1848_31 = (long unsigned int) b_inout_5(D);
  D.1849_32 = D.1848_31 + 256;

<bb 3>:
  # ivtmp.20_25 = PHI <ivtmp.20_24(4), ivtmp.20_28(2)>
  D.1846_29 = (void *) ivtmp.20_25;
  D.1814_10 = MEM[base: D.1846_29]{*D.1813};
  D.1816_13 = D.1814_10 * pretmp.11_26;
  D.1847_30 = (void *) ivtmp.20_25;
  MEM[base: D.1847_30]{*D.1813} = D.1816_13;
  ivtmp.20_24 = ivtmp.20_25 + 4;
  if (ivtmp.20_24 != D.1849_32)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

<bb 5>:
  return;


If we can get the compiler to generate the above code in IVOPTS, then we should
get the same code as the hand-unrolled example (although it also looks like a
bit more scheduler look-ahead freedom is necessary).
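
For readers who do not read GIMPLE dumps daily, here is a rough C rendering of
the loop shape wanted above. It is only a sketch: the function names, the type
of s (assumed to be short, since the dump only shows a conversion to int) and
the constant N are assumptions, not taken from the test case in this PR. The
trip count of 64 follows from the 256-byte bound and the 4-byte stride.

```c
#include <stddef.h>

enum { N = 64 };  /* 256 bytes / 4 bytes per element, as in the dump */

/* Counted form: roughly what the source loop might look like (assumed). */
void scale_indexed(int *b_inout, short s)
{
    for (int i = 0; i < N; i++)
        b_inout[i] = b_inout[i] * s;
}

/* Shape of the desired GIMPLE: a single pointer IV, and an exit test
   against a precomputed end address instead of a separate counter. */
void scale_pointer_iv(int *b_inout, short s)
{
    int scale = s;            /* pretmp.11_26: conversion hoisted out of the loop */
    int *p = b_inout;         /* ivtmp.20: the only induction variable */
    int *end = b_inout + N;   /* D.1849_32: base address + 256 bytes */

    do {
        *p = *p * scale;      /* load, multiply, store through the same address */
        p++;                  /* ivtmp.20 + 4 */
    } while (p != end);
}
```

Both functions compute the same result; the second one just makes the single
induction variable explicit.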

I asked Zdenek for help. For ARM9 he found the following problem:

/quote/
Address costs:
 index costs 6
 cst + index costs 2

arm_arm_address_cost pretends that having reg + cst as an address is cheaper
than having reg by itself.  Ivopts are happy to make this happen :-)

There used to be the same problem on x86, which was eventually fixed by making
address_cost reflect the real cost of addresses, rather than "which addressing
modes should CSE prefer" metrics.
/quote/
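
To make the effect of those numbers concrete, here is a toy model in C. It is
not GCC code: the enum, the function names and the "hypothetical" cost table
are made up for illustration, and the real IVOPTS cost computation weighs far
more than a single address cost. It only shows that a selector which minimizes
address cost will pick reg + cst whenever the backend reports it as cheaper
than a plain register.

```c
/* Two addressing modes, named here for illustration only. */
enum addr_mode { ADDR_REG, ADDR_REG_PLUS_CST };

/* Costs as quoted above for ARM9: a plain register address costs 6,
   register + constant costs 2. */
int quoted_cost(enum addr_mode m)
{
    return m == ADDR_REG ? 6 : 2;
}

/* Hypothetical table (an assumption, not a measured value) in which a
   plain register address is no more expensive than register + constant. */
int hypothetical_cost(enum addr_mode m)
{
    return m == ADDR_REG ? 1 : 2;
}

/* Pick whichever mode the given cost function reports as cheaper;
   ties go to the plain register address. */
enum addr_mode cheapest_mode(int (*cost)(enum addr_mode))
{
    return cost(ADDR_REG) <= cost(ADDR_REG_PLUS_CST)
               ? ADDR_REG
               : ADDR_REG_PLUS_CST;
}
```

With the quoted table the selector returns ADDR_REG_PLUS_CST; with the
hypothetical table it returns ADDR_REG.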

Of course, every case where the cost of "index" exceeds the cost of
"index + cst" results in code like that of an unpatched compiler. But if I
adjust the costs so that "index" costs only 1 or 2, I get this:

<bb 2>:
  pretmp.11_26 = (int) s_11(D);
  ivtmp.25_28 = (long unsigned int) b_inout_5(D);

<bb 3>:
  # i_20 = PHI <i_14(4), 0(2)>
  # ivtmp.25_25 = PHI <ivtmp.25_24(4), ivtmp.25_28(2)>
  D.1846_29 = (void *) ivtmp.25_25;
  D.1814_10 = MEM[base: D.1846_29]{*D.1813};
  D.1816_13 = D.1814_10 * pretmp.11_26;
  D.1847_30 = (void *) ivtmp.25_25;
  MEM[base: D.1847_30]{*D.1813} = D.1816_13;
  ivtmp.25_24 = ivtmp.25_25 + 4;
  i_14 = i_20 + 1;
  if (i_14 != 64)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

<bb 5>:
  return;


That is still not optimal: we get an extra IV (the counter i_20, which is used
only for the exit test) for some reason.
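
In C terms, the second dump corresponds roughly to the sketch below. As
before, the function name, the type of s (assumed to be short) and the
constant name are assumptions made for illustration; only the loop structure
is taken from the dump.

```c
enum { TRIP_COUNT = 64 };  /* the constant in "if (i_14 != 64)" */

/* Shape of the second dump: the address IV is there, but the counter
   survives as a second IV whose only use is the exit test. */
void scale_two_ivs(int *b_inout, short s)
{
    int scale = s;        /* pretmp.11_26 */
    int *p = b_inout;     /* ivtmp.25: address induction variable */
    int i = 0;            /* i_20: the extra induction variable */

    do {
        *p = *p * scale;
        p++;              /* ivtmp.25 + 4 */
        i++;              /* i_20 + 1 */
    } while (i != TRIP_COUNT);
}
```

Replacing the test on i with a comparison of p against b_inout + TRIP_COUNT
would make the counter dead, which is the single-IV form from the first dump.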


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rakdver at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36712
