When looking for a specific byte, a CLI loop is my weapon of choice. Unless I'm dealing with frequently executed code, I'm happy to simply embrace the TRT instruction along with the other unabashedly CISC-to-the-max members of the z/Architecture instruction set. Ultimately you just have to experiment, because you're optimizing for a black box and the only instrumentation available is CPU time used. Instruction order, branch points and branch target locations can make big differences, so it's worth trying various combinations to see what works best (putting the first instruction of a tight loop on a doubleword boundary, for example).
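
For what it's worth, here is a minimal sketch of the kind of CLI loop I mean, scanning for the first blank. The names STRING, SCAN, FOUND and NOBLANK are made up for illustration, and it assumes the usual R0-R15 register equates, addressability to STRING, and a field short enough for its length to fit in an LHI immediate. The CNOP is only there to try the doubleword-alignment idea; whether it helps on a given machine is something to measure, not assume.

```hlasm
* Sketch only: STRING, SCAN, FOUND and NOBLANK are hypothetical names.
* Assumes R0-R15 equates and a base register covering STRING.
         LA    R1,STRING           R1 -> first byte to examine
         LHI   R0,L'STRING         R0 = number of bytes to scan
         CNOP  0,8                 Pad so SCAN is doubleword aligned
SCAN     CLI   0(R1),C' '          Is this byte a blank?
         JE    FOUND               Yes: R1 -> first blank
         LA    R1,1(,R1)           No: step to the next byte
         JCT   R0,SCAN             Count down, loop while nonzero
NOBLANK  DS    0H                  Fell through: no blank in STRING
*        ... handle the not-found case, then branch around FOUND ...
FOUND    DS    0H                  R1 -> first blank in STRING
```

Try it with and without the CNOP, and with the branch sense flipped, and compare CPU time over a large number of iterations.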

Keven

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Rob van der Heij
Sent: Thursday, January 12, 2012 7:16 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: How bad is the EX instruction?

On Fri, Jan 13, 2012 at 1:05 AM, Hall, Keven <keh...@informatica.com> wrote:
> If you're looking to reduce CPU usage you might want to optimize the
> TRT the heck out of the equation. Talk about expensive! [augment
> with imagined or actual sound of cash register "cah-ching" sound for
> added emphasis/effect]

Ok... but how? Would a loop stepping over the max 8 bytes be wiser to find the first blank? Another idea I had was to step a 2-byte CLC with '* ' over the string, but the complexity and the end spoils the fun.

Guess I never really measured TRT. A variation of this code is used to search items in a linked list. I obviously moved the TRT out of the loop and that might have helped make it faster.

Rob