I apologize if this is a well-disguised feature, but I am forced to consider it a performance regression/bug.
In the following trivial example:

    void VecADD( long long *In1, long long *In2, long long *Out, unsigned int samples )
    {
      int i;
      for (i = 0; i < samples; i++)
      {
        Out[i] = In1[i] + In2[i];
      }
    }

there is an implicit imprecision in the way C is used - the type of 'samples' is unsigned, while the type of 'i' is signed. The problem at the high level: induction variable analysis fails for this loop, which prevents further tree-level loop optimizations from functioning properly (including autoincrement). In my port, performance is off by 50% for this loop. GCC 3.4.6 was able to handle this situation fine.

What I believe to be the problem at the lowest level is a non-minimal (or overly restrictive) SSA representation right before iv detection:

    VecADD (In1, In2, Out, samples)
    {
      int i;
      long long int D.1857;
      long long int D.1856;
      long long int * D.1855;
      long long int D.1854;
      long long int * D.1853;
      long long int * D.1852;
      unsigned int D.1851;
      unsigned int i.0;

    <bb 2>:

    <bb 6>:
      # i_10 = PHI <0(2)>
      i.0_5 = (unsigned int) i_10;
      if (i.0_5 < samples_4(D))
        goto <bb 3>;
      else
        goto <bb 5>;

    <bb 3>:
      # i.0_9 = PHI <i.0_3(4), i.0_5(6)>
      # i_14 = PHI <i_1(4), i_10(6)>
      D.1851_6 = i.0_9 * 8;
      D.1852_8 = Out_7(D) + D.1851_6;
      D.1853_12 = In1_11(D) + D.1851_6;
      D.1854_13 = *D.1853_12;
      D.1855_17 = In2_16(D) + D.1851_6;
      D.1856_18 = *D.1855_17;
      D.1857_19 = D.1854_13 + D.1856_18;
      *D.1852_8 = D.1857_19;
      i_20 = i_14 + 1;

    <bb 4>:
      # i_1 = PHI <i_20(3)>
      i.0_3 = (unsigned int) i_1;
      if (i.0_3 < samples_4(D))
        goto <bb 3>;
      else
        goto <bb 5>;

    <bb 5>:
      return;

    }

The two PHI nodes at the beginning of bb 3 break the iv detection.
The same example, when the types of 'i' and 'samples' match, is analyzed perfectly fine, with the SSA at the same point looking like this:

    VecADD (In1, In2, Out, samples)
    {
      int i;
      long long int D.1857;
      long long int D.1856;
      long long int * D.1855;
      long long int D.1854;
      long long int * D.1853;
      long long int * D.1852;
      unsigned int D.1851;
      unsigned int i.0;

    <bb 2>:

    <bb 6>:
      # i_9 = PHI <0(2)>
      if (i_9 < samples_3(D))
        goto <bb 3>;
      else
        goto <bb 5>;

    <bb 3>:
      # i_13 = PHI <i_1(4), i_9(6)>
      i.0_4 = (unsigned int) i_13;
      D.1851_5 = i.0_4 * 8;
      D.1852_7 = Out_6(D) + D.1851_5;
      D.1853_11 = In1_10(D) + D.1851_5;
      D.1854_12 = *D.1853_11;
      D.1855_16 = In2_15(D) + D.1851_5;
      D.1856_17 = *D.1855_16;
      D.1857_18 = D.1854_12 + D.1856_17;
      *D.1852_7 = D.1857_18;
      i_19 = i_13 + 1;

    <bb 4>:
      # i_1 = PHI <i_19(3)>
      if (i_1 < samples_3(D))
        goto <bb 3>;
      else
        goto <bb 5>;

    <bb 5>:
      return;

    }

On one hand, I seem to understand that the danger of signed/unsigned overflow at the increment can force this kind of conservatism; but at the high level, this situation was handled fine by GCC 3.4.6 and is handled with no issues by another SSA-based compiler. If there is a way to relax this strict interpretation of the C rules in GCC 4.3.2, I would gladly learn about it, but my brief flag-mining exercise yielded no results. Thank you.

--
           Summary: loop iv detection failure, SSA autoincrement
           Product: gcc
           Version: 4.3.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sergei_lus at yahoo dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38856
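For completeness, here is a minimal compilable sketch of the matching-types variant described above (declaring the induction variable with the same unsigned type as 'samples', which avoids the extra PHI node). The main driver and the sample values are only an illustrative harness, not part of the original test case:

    #include <assert.h>

    /* Matching-types variant: 'i' is unsigned, like 'samples'. */
    static void VecADD(long long *In1, long long *In2, long long *Out,
                       unsigned int samples)
    {
      unsigned int i;               /* was: int i; */
      for (i = 0; i < samples; i++)
      {
        Out[i] = In1[i] + In2[i];
      }
    }

    int main(void)
    {
      long long a[4] = {1, 2, 3, 4};
      long long b[4] = {10, 20, 30, 40};
      long long c[4] = {0, 0, 0, 0};

      VecADD(a, b, c, 4u);

      /* Elementwise sums: 11, 22, 33, 44. */
      assert(c[0] == 11 && c[1] == 22 && c[2] == 33 && c[3] == 44);
      return 0;
    }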