This is really interesting. For those put off by the "C++": note that the issue has nothing whatsoever to do with C++. It is a pure branch-prediction issue. Picture a program that fills a table with pseudo-random 8-bit integers from 0 to 255 and then solves the problem "what is the sum of all of the numbers in the table greater than or equal to 128?" It does that with the obvious loop, whose body contains
    IC   R0,next_table_byte    pick up the next byte of the table
    CHI  R0,128                compare it with 128
    JL   *+6                   if less than 128, skip the add
    AR   R2,R0                 otherwise accumulate the sum in R2

The assertion of the thread -- and I don't doubt it -- is that the above code uses 1/6 the CPU time if you first sort the table. (Obviously a stupid way of doing things: you could sort the table high to low and then exit once you got a value below 128; but that's not the point.) It illustrates the value of, or the problem with, branch prediction, depending on your point of view. If the table is random, then the outcome of the CHI/JL is unpredictable, and branch prediction is at best useless and at worst counter-productive. But if the table is sorted, then branch prediction has a chance to be right most of the time.

There's a lot more than that in the thread, and it fundamentally has to do with modern CPU technology, not C++.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of David Crayford
Sent: Tuesday, August 13, 2019 11:13 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Instruction speeds

Some interesting information on branch prediction in that paper. If you've got a z/OS C++ compiler, try this snippet out; it's fascinating:
https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array
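
For readers who can't follow the link, here is a minimal C++ sketch in the spirit of that Stack Overflow example; the table size, pass count, and use of std::rand are illustrative choices here, not the exact code from the question. On typical hardware the timed loop runs several times faster when the std::sort line is uncommented, because the "greater than or equal to 128" branch becomes almost perfectly predictable.

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <vector>

int main() {
    // Build a table of pseudo-random bytes 0..255.
    std::vector<std::uint8_t> table(32768);
    std::srand(1);
    for (auto &b : table)
        b = static_cast<std::uint8_t>(std::rand() % 256);

    // Uncomment the next line and the timed loop below speeds up
    // dramatically: once the table is sorted, the branch is "not
    // taken" for the whole first half and "taken" for the rest.
    // std::sort(table.begin(), table.end());

    long long sum = 0;
    auto start = std::chrono::steady_clock::now();
    for (int pass = 0; pass < 10000; ++pass)
        for (std::size_t i = 0; i < table.size(); ++i)
            if (table[i] >= 128)          // the unpredictable branch
                sum += table[i];
    auto stop = std::chrono::steady_clock::now();

    std::cout << "sum=" << sum << " elapsed="
              << std::chrono::duration<double>(stop - start).count()
              << "s\n";
    return 0;
}

One caveat: an optimizing compiler may turn the if into a branchless conditional instruction, in which case sorting makes little difference; the effect is easiest to see at lower optimization levels.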