> > > Another surprise (?) was that two STs were faster than an STM
> > > for two registers.
> > Once again, no big surprise in terms of cache and memory
> > design. It might even turn out that the advantage holds true
> > for a larger number of registers. Try it.
> > 
> Might this also depend on whether the storage operand is
> doubleword aligned, and whether the registers are an
> even/odd pair?

It might. It might also depend on whether the destination
addresses cross a cache-line boundary, among other considerations.
It is getting harder all the time to guess accurately what the
processor is going to do.
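
For anyone who wants to control for those two variables when timing
ST versus STM, here is a rough C sketch of the address checks. The
function names are mine, and the 256-byte line size is an assumption;
check the actual line size of the model you are measuring on.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: 256-byte cache lines. Adjust for the model under test. */
#define CACHE_LINE_BYTES 256u

/* Returns 1 if addr is on a doubleword (8-byte) boundary, else 0. */
int is_doubleword_aligned(uintptr_t addr)
{
    return (addr & 7u) == 0;
}

/*
 * Returns 1 if a store of len bytes starting at addr touches more
 * than one cache line, else 0. len must be greater than zero.
 */
int crosses_cache_line(uintptr_t addr, size_t len)
{
    uintptr_t first_line = addr / CACHE_LINE_BYTES;
    uintptr_t last_line  = (addr + len - 1) / CACHE_LINE_BYTES;
    return first_line != last_line;
}
```

A two-register store is 8 bytes and "STM R14,R12" stores 15 registers,
or 60 bytes, so the second check matters more as the register count
goes up. Neither check says anything about what the hardware actually
does with the store; they only let you sort your test cases.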

> I once heard a rumor that there was special millicode to optimize
> "STM R14,R12".

I've heard that too, and for me the jury is out. I have known
microcoders on both IBM and Amdahl iron. Some swear that those
optimizations are done; others deny it. If it was ever true, it's
possible it was on some models and not others.

That said, if you were ever going to special-case -any- instruction
sequence on an IBM processor, that would probably be one of the more
productive idioms to go after. The putative reason for XPLINK in LE
is to avoid the overhead of saving and restoring registers for the
(typical) small C and C++ functions.

CC
