On Tue, Jan 17, 2012 at 6:49 AM, robin <robi...@dodo.com.au> wrote:
> From: "Rob van der Heij" <rvdh...@gmail.com>
> Sent: Tuesday, 17 January 2012 2:37 AM
>
>> Having the CLC near the EX helps for cache. I also like to assemble it
>> in-line because the right USINGs apply. We noticed that it is
>> attractive to run over the CLC (with the length byte 0 as assembled)
>> and then EX behind your back to do the real thing. More attractive
>> than branch over the target if the instruction lets you.
>
> A convenient place for the subject instruction is immediately after
> a B instruction, thus avoiding the need to execute CLC or MVC twice.

My experience was that executing the MVC or CLC twice (first with
length 0) is better than branching over it. So:

X        CLC   ONE(0),TWO
         EX    Rx,X

But it may very well be that current CPUs look far enough ahead past
the branch that one could instead write:

         B     Y
X        CLC   ONE(0),TWO
Y        EX    Rx,X

Obviously I do not wish to make this kind of decision at each
instance. But once you find this in the deep bowels of a heavy loop,
it is worth thinking about it and putting the optimal one in my
INLINEX macro that does the work:

         INLINEX Rx,CLC,ONE(0),TWO

Rob
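
For illustration only, here is a minimal sketch of how a macro along these lines might be written. It is not the actual INLINEX, whose source is not shown in this thread. The keyword SKIP= and the generated label prefixes IXT and IXE are hypothetical names chosen for the example; only the positional operands match the invocation above.

```
         MACRO
&LABEL   INLINEX &REG,&OP,&OP1,&OP2,&SKIP=NO
.* Hypothetical sketch, not the real INLINEX.
.* SKIP=NO  falls through the subject instruction, then executes it.
.* SKIP=YES branches over the subject instruction first.
&LABEL   DS    0H
         AIF   ('&SKIP' NE 'YES').NOSKIP
         B     IXE&SYSNDX          Branch over the subject instruction
.NOSKIP  ANOP
IXT&SYSNDX &OP &OP1,&OP2           Subject instruction, assembled in-line
IXE&SYSNDX EX  &REG,IXT&SYSNDX     Execute it with the length from the register
         MEND
```

With this sketch, INLINEX Rx,CLC,ONE(0),TWO would generate the first form shown above, and adding SKIP=YES would generate the second. Because &SYSNDX gives every expansion its own labels, the choice between the two forms lives in one place and can be changed after measuring. As with any EX of an SS instruction, the register is assumed to hold the length minus one.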