On Tue, Jan 17, 2012 at 6:49 AM, robin <robi...@dodo.com.au> wrote:
> From: "Rob van der Heij" <rvdh...@gmail.com>
> Sent: Tuesday, 17 January 2012 2:37 AM
>
>> Having the CLC near the EX helps for cache. I also like to assemble it
>> in-line because the right USINGs apply. We noticed that it is
>> attractive to run over the CLC (with the length byte 0 as assembled)
>> and then EX behind your back to do the real thing. More attractive
>> than branch over the target if the instruction lets you.
>
>
> A convenient place for the subject instruction is immediately after
> a B instruction, thus avoiding the need to execute CLC or MVC twice.

My experience was that executing the MVC or CLC twice (first with
length 0) is better than branching over it. So:

X        CLC   ONE(0),TWO
         EX    Rx,X

But it may very well be that current CPUs look far enough past the
branch that one could just as well use:

         B     Y
X        CLC   ONE(0),TWO
Y        EX    Rx,X

Obviously I do not wish to make this kind of decision at each
instance. But once you find this in the deep bowels of a heavy loop, it
is worth thinking about it and putting the optimal one in my INLINEX
macro that does the work:
 INLINEX Rx,CLC,ONE(0),TWO
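
The macro definition is not part of this message. A minimal sketch of a
macro with this calling convention, assuming the "run over it" variant,
might look like the following. The parameter names and the X&SYSNDX
label are illustrative only, and the register is assumed to already
hold the length minus 1, since EX ORs its low-order byte into the
length byte of the target:

         MACRO
&LABEL   INLINEX &REG,&OP,&OP1,&OP2
.* Sketch only: assemble the target in-line, then EXecute it.
&LABEL   DS    0H
X&SYSNDX &OP   &OP1,&OP2       runs once as assembled (length code 0)
         EX    &REG,X&SYSNDX   runs again with the length from the register
         MEND

Switching to the branch-over variant would then mean adding a B to the
EX ahead of the target instruction in this one place only.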

Rob
