On Mon, Jan 16, 2012 at 4:13 PM, Tom Marchant <m42tom-ibmm...@yahoo.com> wrote:
> On Mon, 16 Jan 2012 06:49:54 -0600, Dan Skomsky, PSTI wrote:
>
>>One Assembler trick I have seen in speeding up scanning loops was to use a
>>CLI instruction to check the first byte of a string and then only doing the
>>CLC/CLCL if the CLI matches.  This trick even works if doing a binary
>>search.
>
> I don't know if the cost of EX is high enough that you would benefit
> from doing a one byte CLC before an EX of a CLC.  I don't see how a CLI
> will help you though.

Having the CLC near the EX helps with caching. I also like to assemble
it in-line because the right USINGs apply there. We noticed that it is
attractive to fall through the CLC (which compares a single byte, with
the length byte 0 as assembled) and then EX it behind your back to do
the real thing. That is more attractive than branching over the
target, if the instruction lets you.
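
A minimal sketch of that fall-through arrangement, not taken from any
real program. The register numbers (4 = key address, 5 = table entry
address, 6 = key length minus 1) and the labels CMPK1 and NOMATCH are
made up for illustration, and it assumes CMPK1 is addressable through
the current base register:

```hlasm
*        Hypothetical setup: 4 -> key, 5 -> table entry,
*        6 = key length minus 1 (low-order byte is used by EX)
CMPK1    CLC   0(0,4),0(5)        Length 0 as assembled: one byte
         EX    6,CMPK1            Same CLC, length code ORed in from 6
         BNE   NOMATCH            CC comes from the executed CLC
```

The condition code set by the in-line one-byte CLC is simply
overwritten by the executed full-length CLC, so falling through it is
harmless.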

I doubt whether a branch between the CLC and the EX would be an
advantage. Depending on how often the comparison already fails on the
first byte, you trade an untaken branch against an EX of a CLC that
fails on the first byte. Guess I should try that some Friday afternoon...
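
For reference, the variant with the branch in between might look
roughly like this. Again only a sketch, untimed, with made-up register
numbers (4 = key address, 5 = table entry address, 6 = key length
minus 1) and made-up labels CMPK2 and NOMATCH:

```hlasm
*        Hypothetical setup: 4 -> key, 5 -> table entry,
*        6 = key length minus 1
CMPK2    CLC   0(0,4),0(5)        One byte as assembled
         BNE   NOMATCH            Taken only on a first-byte mismatch
         EX    6,CMPK2            Full-length compare
         BNE   NOMATCH            CC comes from the executed CLC
```

When the first byte matches, the extra BNE is the untaken branch you
pay for; when it does not, the branch saves the EX.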

Rob
