https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96031

--- Comment #2 from bin cheng <amker at gcc dot gnu.org> ---
Interesting case. I see two issues in the generated asm: one is the unnecessary
bitwise AND, the other is that different registers are allocated for the
induction variable and the base address.  However, it looks like neither issue
is caused by ivopts.  Check the dump:

  <bb 4> [local count: 105119324]:
  _12 = (short unsigned int) step_8(D);
  ivtmp.10_11 = (unsigned long) &array;
  _18 = len_7(D) + 4294967294;
  _19 = (unsigned long) _18;
  _20 = _19 * 2;
  _21 = (unsigned long) &array;
  _22 = _21 + 2;
  _23 = _20 + _22;

  <bb 5> [local count: 955630224]:
  # ivtmp.8_15 = PHI <_12(4), ivtmp.8_5(6)>
  # ivtmp.10_16 = PHI <ivtmp.10_11(4), ivtmp.10_4(6)>
  _3 = ivtmp.8_15;
  _2 = (void *) ivtmp.10_16;
  MEM[base: _2, offset: 2B] = _3;
  ivtmp.8_5 = ivtmp.8_15 + _12;
  ivtmp.10_4 = ivtmp.10_16 + 2;
  if (ivtmp.10_4 != _23)
    goto <bb 6>; [89.00%]
  else
    goto <bb 8>; [11.00%]

  <bb 8> [local count: 105119324]:
  goto <bb 3>; [100.00%]

  <bb 6> [local count: 850510900]:
  goto <bb 5>; [100.00%]

As far as I can tell, it's optimal.

The register allocation issue is introduced by RTL PRE.  Apparently we should
not save the add-2 instruction in the last iteration at the cost of introducing
a false dependence, which is more harmful.

As for ivopts, I can see a minor improvement from replacing the != exit
condition with <=, which saves the add-2 instruction computing _22 and happens
to "disable" the wrong PRE transformation.
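
To illustrate the exit-condition change, here is a rough C sketch written by
hand from the dump above.  It is not the test case from the bug report and not
what ivopts emits: the function names, the array size N, the separate output
arrays, and the type of step are all assumptions made for illustration.  Both
loops assume len >= 2, since the dump only shows the loop body after its guard.

```c
#include <stdint.h>

#define N 64  /* hypothetical size; the real bound is not visible in the dump */

static unsigned short array_ne[N];
static unsigned short array_le[N];

/* Mirrors the dump: the exit test is "!=" against base + 2 + (len - 2) * 2. */
static void fill_ne(unsigned short *base, unsigned int len, unsigned int step)
{
    unsigned short s = (unsigned short) step;              /* _12 */
    unsigned short v = s;                                  /* ivtmp.8 */
    uintptr_t p = (uintptr_t) base;                        /* ivtmp.10 */
    uintptr_t end = (uintptr_t) (len - 2u) * 2
                    + ((uintptr_t) base + 2);              /* _23 = _20 + _22 */

    do {
        *(unsigned short *) (p + 2) = v;   /* MEM[base: _2, offset: 2B] = _3 */
        v = (unsigned short) (v + s);
        p += 2;
    } while (p != end);
}

/* Same loop with a "<=" exit test; the bound no longer needs the "+ 2". */
static void fill_le(unsigned short *base, unsigned int len, unsigned int step)
{
    unsigned short s = (unsigned short) step;
    unsigned short v = s;
    uintptr_t p = (uintptr_t) base;
    uintptr_t bound = (uintptr_t) (len - 2u) * 2
                      + (uintptr_t) base;                  /* _20 + _21 */

    do {
        *(unsigned short *) (p + 2) = v;
        v = (unsigned short) (v + s);
        p += 2;
    } while (p <= bound);
}
```

Both functions store to elements 1 through len - 1, so the two exit tests
should be interchangeable here; the second simply avoids computing base + 2
before the loop.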

Ah, I see it's already classified as rtl-optimization.

Thanks
