https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62631
--- Comment #26 from Eric Botcazou <ebotcazou at gcc dot gnu.org> ---
> The generated code on PA looks optimal to me:
>
>         zdep %r25,29,30,%r28
>         b .L2
>         ldi 99,%r19
> .L6:
>         zdep %r25,29,30,%r28
> .L2:
>         addl %r26,%r28,%r28
>         ldo 1(%r25),%r25
>         comb,>>= %r19,%r25,.L6
>         stw %r0,0(%r28)
>         bv,n %r0(%r2)

For most other architectures the BIV (%r25) is eliminated in favor of the GIV
(%r28), so you only have one additive operation in the loop.  This happens for
64-bit PA:

.L5:
        ldo 4(%r26),%r26
        cmpb,*>>,n %r28,%r26,.L5
        stw %r0,0(%r26)
        bve,n (%r2)

Why couldn't such code be generated for 32-bit PA too?
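
To make the BIV/GIV distinction concrete, here is a minimal C sketch of the two loop shapes. It is an illustration only: the loop body is inferred from the assembly quoted above (store zero, bump the index, loop while it is at most 99), and the function names, the constants N and LIMIT, and the helper variants_agree are hypothetical, not taken from the PR's test case.

```c
#include <assert.h>   /* only needed for the usage line shown below */
#include <string.h>

enum { N = 128, LIMIT = 100 };  /* hypothetical buffer size and loop bound */

/* Shape of the 32-bit PA loop: the counter i (the BIV) is kept alive, and
   the store address p + 4*i (the GIV) is recomputed from it each time. */
static void clear_with_biv(int *p, unsigned int i)
{
    do {
        p[i] = 0;   /* address derived from i (zdep + addl in the listing) */
        i++;        /* BIV update (ldo 1) */
    } while (i < LIMIT);
}

/* Shape of the 64-bit PA loop: the BIV is gone; only the pointer advances
   and the exit test compares it against a precomputed end address.
   Assumption: i does not wrap around before reaching LIMIT, which is
   something a compiler has to establish before rewriting the loop. */
static void clear_with_giv_only(int *p, unsigned int i)
{
    int *q = p + i;
    int *end = p + LIMIT;
    do {
        *q = 0;     /* store through the GIV */
        q++;        /* the single additive operation (ldo 4) */
    } while (q < end);
}

/* Runs both variants on identical buffers; returns 1 if the results match.
   start must be at most LIMIT so that every access stays inside N ints. */
static int variants_agree(unsigned int start)
{
    int a[N], b[N];
    for (int k = 0; k < N; k++)
        a[k] = b[k] = 7;
    clear_with_biv(a, start);
    clear_with_giv_only(b, start);
    return memcmp(a, b, sizeof a) == 0;
}
```

A call such as assert(variants_agree(5)); checks that the two forms store to the same elements for an in-range start index. The second form carries one addition per iteration instead of a shift and two additions, which is the difference between the two listings.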