Someone from IBM sent me a couple of presentations, some of which I had contributed to. They cover things like the pipeline stages (instruction decode, address generate, execute, put away) and parallelism. For example, given

    L   R4,VALUE
    L   R5,0(R4)

the second load has to wait until the previous load has finished (it needs the address in R4), but other, independent instructions can be processed in parallel.
Data from the L1 cache is faster than data from a different book. Don't have two threads running concurrently sharing the same cache block for their private data.

All good stuff

Colin

On Wed, 1 Mar 2023 at 17:59, Colin Paice <colinpai...@gmail.com> wrote:
> I've been asked to give a talk on performance to a University Computing
> department.
>
> I know the z hardware has built-in instrumentation which allows you to
> see where the delays were for a particular instruction. For example, this
> load instruction got data from the L3 cache and it took x nanoseconds.
>
> Is there a presentation on this?
>
> I remember seeing a presentation (it may have been IBM confidential)
> showing that a Load could be slow if the data was in the cache in a book
> 3 ft away, compared to it being in the cache on the chip.
> Also, the second time round a loop is faster than the first time because
> the instructions are in the instruction cache.
>
> This was all mind-blowing stuff!
>
> Colin

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN