Re: Out of Order and Superscalar - small experiment

Tony Harminc Mon, 02 Jun 2014 18:55:18 -0700

On 2 June 2014 20:14, Robin Vowels <[email protected]> wrote:
> From: "Rob van der Heij" <[email protected]>
> Sent: Tuesday, June 03, 2014 1:00 AM


>> More recently I've been working on porting Linux gcc object code to CMS,
>> and now that I needed a nice checksum routine, I figured I might take a
>> popular open source checksum routine http://en.wikipedia.org/wiki/Adler-32
>> and let gcc compile and optimize it. Since the generated assembler source
>> wasn't that obvious to me, I was getting interested to know why.
>>
>> My simplistic implementation was like this (for each byte, so wrapped in a
>> loop)
>>
>>
>>     IC    R4,0(R6)
>>     AR    R2,R4
>>     AR    R3,R2
>
> Must have missed something here.

I think what you missed was the reference to the Adler-32 algorithm,
with its need to keep two 16-bit sums.
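
For anyone without the Wikipedia article to hand, here is a minimal
sketch of the two running sums in Python. The function name, the
constant name and the per-byte modulo are my own illustrative choices,
not Rob's code or gcc's output. Rob's three instructions correspond to
the two additions only, so the reduction modulo 65521 is presumably
deferred to somewhere outside that inner step.

```python
MOD_ADLER = 65521  # largest prime below 2**16


def adler32(data: bytes) -> int:
    """Illustrative Adler-32: two 16-bit sums packed into 32 bits."""
    a = 1  # running sum of the bytes (R2 in the quoted snippet)
    b = 0  # running sum of the successive values of a (R3)
    for byte in data:               # IC  R4,0(R6)
        a = (a + byte) % MOD_ADLER  # AR  R2,R4
        b = (b + a) % MOD_ADLER     # AR  R3,R2
    return (b << 16) | a
```

Because b accumulates every intermediate value of a, the result depends
on byte order, which a plain byte sum does not.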

> A 3-instruction loop to sum bytes.
>
>         LA 6,X+offset (last byte of area to be summed)
>         SR 2,2
>         SR 4,4
> Loop    IC 4,0(0,6)
>         AR 2,4
>         BCT 6,Loop
>
> And you can use BCTR to save a few µs.

Why do you think BCTR would save such a large amount of time? Perhaps
you're again talking about old machines. On a current machine, surely
BRCT/JCT would be the time saver, if there is one at all in this case.

Tony H.
