Re: Unreadable code (Was: Concurrent Server Task Dispatch issue multitasking issue)

David Crayford Sun, 13 Jan 2019 20:48:22 -0800

On 14/01/2019 6:06 am, Ed Jaffe wrote:

On 1/13/2019 4:08 AM, David Crayford wrote:
On 13/01/2019 7:06 pm, Tony Thigpen wrote:
I have seen some reports that current C compilers, which understand the z-hardware pipeline, can actually produce object code that is faster running than assembler. Mainly because no sane assembler programmer would produce great pipeline code, because it would be unmaintainable.
It's well established that that's been true for well over a decade now. Not just C but all compilers, including COBOL, which got a new optimizer a few releases back.
Far, far less true now than it used to be.

Good to hear. The best optimization is done in hardware, where you don't have to recompile. Followed by a JIT.

Back in the old days, things ran a lot faster if you interleaved unrelated things in an "unfriendly" way. For example, this code fragment:


|    LGF  R0,Field1   Increment Field1
|    AGHI R0,1        (same)
|    ST   R0,Field1   (same)
|    LGF  R0,Field2   Increment Field2
|    AGHI R0,1        (same)
|    ST   R0,Field2   (same)
|    LGF  R0,Field3   Increment Field3
|    AGHI R0,1        (same)
|    ST   R0,Field3   (same)
|    LGF  R0,Field4   Increment Field4
|    AGHI R0,1        (same)
|    ST   R0,Field4   (same)

ran much faster when coded this way (which is not how a programmer would usually write things):


|    LGF  R0,Field1   Increment Field1
|    LGF  R1,Field2   Increment Field2
|    LGF  R2,Field3   Increment Field3
|    LGF  R3,Field4   Increment Field4
|    AGHI R0,1        (same)
|    AGHI R1,1        (same)
|    AGHI R2,1        (same)
|    AGHI R3,1        (same)
|    ST   R0,Field1   (same)
|    ST   R1,Field2   (same)
|    ST   R2,Field3   (same)
|    ST   R3,Field4   (same)

But once out-of-order (OOO) execution came on the scene with z196, you could get the same enhanced performance from this easy-to-code and easy-to-read version:


|    LGF  R0,Field1   Increment Field1
|    AGHI R0,1        (same)
|    ST   R0,Field1   (same)
|    LGF  R1,Field2   Increment Field2
|    AGHI R1,1        (same)
|    ST   R1,Field2   (same)
|    LGF  R2,Field3   Increment Field3
|    AGHI R2,1        (same)
|    ST   R2,Field3   (same)
|    LGF  R3,Field4   Increment Field4
|    AGHI R3,1        (same)
|    ST   R3,Field4   (same)

These days, many performance improvements are realized by the compiler using newer instructions that replace older ones. For example, on z10 and higher, this very same code can be replaced with:


|    ASI  Field1,1    Increment Field1
|    ASI  Field2,1    Increment Field2
|    ASI  Field3,1    Increment Field3
|    ASI  Field4,1    Increment Field4
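
For comparison, here is a minimal C sketch of the same four increments, written in the plain, readable order. The struct and function names are hypothetical, not from any real program. Whether a particular compiler turns each statement into a single ASI depends on the compiler, the target architecture level, and the optimization options, so check the generated listing rather than assume it.

```c
#include <stdint.h>

/* Hypothetical record; the field names mirror the assembler fragments. */
struct counters {
    int32_t field1;
    int32_t field2;
    int32_t field3;
    int32_t field4;
};

/* Increment each fullword field in the obvious order. Instruction
   selection and scheduling are left entirely to the compiler. */
void bump_counters(struct counters *c)
{
    c->field1 += 1;
    c->field2 += 1;
    c->field3 += 1;
    c->field4 += 1;
}
```

The source stays the same whichever machine is targeted; only the compile options change.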

IIRC, the interleaved instruction scheduling was to mitigate the AGI (address generation interlock) problem?

In my experience, the two optimizations that make the most difference are function inlining and loop unrolling. I've taken to defining functions in header files to take advantage of both (we don't use IPA).
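
A rough sketch of the header-file approach, assuming a hypothetical header and made-up function names (mix_byte, hash8). Because the bodies are visible in every translation unit that includes the header, the compiler can inline the call without IPA, and the fixed trip count makes the loop a candidate for unrolling. Whether either actually happens is up to the optimizer and the options in effect.

```c
/* hash8.h (hypothetical header, for illustration only) */
#ifndef HASH8_H
#define HASH8_H

#include <stddef.h>
#include <stdint.h>

/* Defined in the header so each including source file sees the body
   and can inline the call. One FNV-1a step: XOR the byte, multiply
   by the 32-bit FNV prime. */
static inline uint32_t mix_byte(uint32_t h, unsigned char b)
{
    return (h ^ b) * 16777619u;
}

/* Hash an 8-byte key. The trip count is a compile-time constant, so
   once mix_byte is inlined the loop may be fully unrolled. */
static inline uint32_t hash8(const unsigned char key[8])
{
    uint32_t h = 2166136261u;   /* 32-bit FNV offset basis */
    for (size_t i = 0; i < 8; i++)
        h = mix_byte(h, key[i]);
    return h;
}

#endif /* HASH8_H */
```

The `static inline` linkage avoids duplicate-symbol problems when the header is included from several source files.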

Of course, an HLASM programmer can do exactly the same thing. But changing old code to use new instructions requires relatively expensive programmer resources, whereas simply recompiling programs targeting a new machine is a relatively inexpensive proposition.

I've noticed (depending on compiler options) the C optimizer is starting to use vector instructions. They can be a bit hairy even for experienced assembler programmers. Best to leave that to a compiler IMO.


Instead of using an SRST instruction, strlen() generates the following:

      VLBB     v0,str(r6,r9,0),2
      LCBB     r7,str(r6,r9,0),2
      LR       r0,r6
      ALR      r6,r7
      VFENEB   v0,v0,v0,b'0010'
      VLGVB    r2,v0,7
      CLRJH    r7,r2,@2L33
      VLBB     v0,str(r6,r9,0),2
      LCBB     r7,str(r6,r9,0),2
      LR       r0,r6
      ALR      r6,r7
      VFENEB   v0,v0,v0,b'0010'
      VLGVB    r2,v0,7
      CLRJH    r7,r2,@2L33
      VLBB     v0,str(r6,r9,0),2
      LCBB     r7,str(r6,r9,0),2
      LR       r0,r6
      VFENEB   v0,v0,v0,b'0010'
      ALR      r6,r7
      VLGVB    r2,v0,7
      CLRJH    r7,r2,@2L33
      VLBB     v0,str(r6,r9,0),2
      LCBB     r7,str(r6,r9,0),2
      LR       r0,r6
      ALR      r6,r7
      VFENEB   v0,v0,v0,b'0010'
      VLGVB    r2,v0,7
      CLRJNH   r7,r2,@2L25
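
For anyone who finds that listing hard to follow, here is a scalar C sketch of what each unrolled step appears to be doing. It is my reading of the listing, not the compiler's source: load up to 16 bytes without crossing a block boundary (VLBB), get the count of bytes actually loaded (LCBB), search them for a zero byte (VFENEB with zero search), then compare the index found against the count. The function names are hypothetical, and the 256-byte block size is my assumption for boundary code 2.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES   16u    /* one vector register holds 16 bytes */
#define BLOCK_BYTES 256u   /* assumed block size for boundary code 2 */

/* Number of bytes that can be examined at p without crossing the next
   block boundary, capped at one vector. Models the count from LCBB. */
static size_t bytes_to_boundary(const char *p)
{
    size_t room = BLOCK_BYTES - (size_t)((uintptr_t)p % BLOCK_BYTES);
    return room < VEC_BYTES ? room : VEC_BYTES;
}

/* Chunk-at-a-time strlen. The inner loop is a scalar stand-in for the
   vector zero search; it stops at the terminator, so it never reads
   past the end of the string. */
size_t blockwise_strlen(const char *s)
{
    const char *p = s;
    for (;;) {
        size_t n = bytes_to_boundary(p);
        for (size_t i = 0; i < n; i++) {
            if (p[i] == '\0')
                return (size_t)(p - s) + i;
        }
        p += n;   /* no terminator in this chunk; advance (the ALR) */
    }
}
```

The real code gets its speed from testing up to 16 bytes per VFENEB; the sketch only shows the control flow and the boundary handling.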

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
