https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62631

--- Comment #26 from Eric Botcazou <ebotcazou at gcc dot gnu.org> ---
> The generated code on PA looks optimal to me:
> 
>         zdep %r25,29,30,%r28
>         b .L2
>         ldi 99,%r19
> .L6:
>         zdep %r25,29,30,%r28
> .L2:
>         addl %r26,%r28,%r28
>         ldo 1(%r25),%r25
>         comb,>>= %r19,%r25,.L6
>         stw %r0,0(%r28)
>         bv,n %r0(%r2)

For most other architectures the BIV (basic induction variable, %r25) is
eliminated in favor of the GIV (general induction variable, %r28), so you only
have one additive operation in the loop.  This happens for 64-bit PA:

.L5:
        ldo 4(%r26),%r26
        cmpb,*>>,n %r28,%r26,.L5
        stw %r0,0(%r26)
        bve,n (%r2)
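
In C terms, the difference between the two loops can be sketched roughly as
follows.  This is only an illustration, not the testcase from the PR: the
function names, the bound of 100 and the assumption that i is below 100 on
entry (so the unsigned counter cannot wrap) are all mine.

```c
/* Shape of the 32-bit PA loop: the counter i (BIV) is kept alongside the
   address p + i (GIV), so each iteration increments i and recomputes
   the address from it.  */
void clear_with_index(int *p, unsigned int i)
{
    do {
        p[i] = 0;   /* address derived from i on every iteration */
        i++;        /* the BIV is still live for the exit test */
    } while (i < 100);
}

/* Shape of the 64-bit PA loop: the counter is gone and the exit test is
   rewritten against an end pointer, leaving one addition per iteration.
   Assumes i < 100 on entry (hypothetical precondition for this sketch).  */
void clear_with_pointer(int *p, unsigned int i)
{
    int *q = p + i;
    int *end = p + 100;
    do {
        *q = 0;     /* store through the stepped pointer */
        q++;        /* the only additive operation in the loop */
    } while (q < end);
}
```

Under that precondition both functions clear the same elements; the second
form is what BIV elimination would be expected to produce.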

Why couldn't such code be generated for 32-bit PA too?
