----- Original Message -----
From: "Rob van der Heij" <rvdh...@gmail.com>
To: <ASSEMBLER-LIST@LISTSERV.UGA.EDU>
Sent: Tuesday, June 03, 2014 6:58 PM
Subject: Re: Out of Order and Superscalar - small experiment


On 3 June 2014 09:19, Robin Vowels <robi...@dodo.com.au> wrote:


On 2 June 2014 19:56, Robin Vowels <robi...@dodo.com.au> wrote:

From: "Tony Harminc" <t...@harminc.com>
Sent: Tuesday, June 03, 2014 3:30 AM


 Is LHI Rn,0 faster than SR Rn,Rn? I'd expect them to be the same, but
SR is half the size, and so lessens the amount of i-cache used.


XR Rn,Rn is faster than SR.


I doubt it. Perhaps on much older machines, but I have little doubt
that modern machines have special cases for most or all common ways of
zeroing a register, e.g. SR Rn, Rn and XR Rn, Rn and LA Rn,0 and LHI
Rn,0 .
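
For reference, the four idioms named above differ in encoding length and in whether they set the condition code. A minimal sketch, with register 5 chosen arbitrarily for illustration; it shows the encodings only and says nothing about relative timing on any particular machine:

```
* Four common ways to clear general register 5 (bits 32-63).
* Register 5 is an arbitrary choice for this illustration.
         SR    5,5            RR format, 2 bytes, sets condition code
         XR    5,5            RR format, 2 bytes, sets condition code
         LA    5,0            RX format, 4 bytes, condition code unchanged
         LHI   5,0            RI format, 4 bytes, condition code unchanged
```

The two RR forms occupy half the space of the others, which is the i-cache point raised earlier; LA and LHI are the ones to use when the condition code must be preserved.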


Longer instructions tend to run slower.


We were asking about your claim that "XR is faster than SR".

"We"?  No, only one person commented "I doubt it" and "perhaps" without
any supporting evidence.

Just dragged out an old manual, to find that XR and SR took the same time.

but both are RR-type.

That's right.

 The best I know is that very long ago the SR was faster than XR,

The evidence contradicts that.

but with later machines that was no longer the case. That message may have been
extrapolated into the claim that XR is now faster than SR, but I have not seen
evidence of that. Someone suggested that LA would be done in address adders

LA took longer than SR, which is understandable: it is a longer instruction,
and in general it must form the sum of three values (base, index, and
displacement) before it can deposit the result in the specified register.

and have an advantage over LHI, even to the point of outweighing the extra
two bytes over a SR.

A "suggestion" is not fact. It's unlikely that it's faster; more bytes
have to be fetched than for SR.

You're most welcome to borrow this thread to discuss another algorithm, but
maybe it's confusing enough already. If the algorithm were not relevant, I
could have used the CKSM as pointed out.

Did you try it to compare times?

I just wanted to share my
experience where the CPU did a better job in reordering instructions than
the compiler. That surprised me. Considering the audience, I provided some
background for those who care.

My simple version did not require wiping the register before the IC, and I
did not code that either.

Even if it did, it would be executed once, and you wouldn't notice the time.
