I'm in the process of hand-tuning a small, performance-critical algorithm on a Z13, and I'm hampered by the lack of detailed information on the instruction-level performance of the machine. Back in the day, IBM used to publish a "Functional Characteristics" manual for each CPU model that provided this information; those manuals seem to have been discontinued. I'm looking for things like:

* Is register renaming used for GPRs and/or FPRs? (affects the need for loop unrolling)
* Are there any ways to bypass the L1 cache on moves of less than a page, when simply moving data without looking at it?
* Does LMG/STMG outperform a linear sequence of LG/STG? Under what circumstances?

At the very least, compiler maintainers need this information to select the right instruction sequences for each model.

Does anyone know of a source for this information, other than writing test kernels and trying things out? I already have the SHARE presentations by David Bond and Bob Rogers, which contain some of this information, but it's not enough.

I was really excited by the addition of vector registers on the Z13 (yippee, an additional 512 bytes of high speed scratch storage!), but the load/store performance hasn't turned out to be what I had hoped for. I may well be using the facility "the wrong way"; having detailed implementation information would sure help.

-- 
Jerry Callen
Rocket Software

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN