On Wed, 14 Aug 2019 at 18:56, Charles Mills <charl...@mcn.org> wrote:
> This is really interesting. For those put off by the "C++" note that the > issue has nothing whatsoever to do with C++. It is a pure branch prediction > issue. Picture a program that computes an array of pseudo-random 8-bit > integers from 0 to 255. Then it solves the problem "what is the sum of all > of the numbers in the table greater than or equal to 128?" It does that > with the obvious loop, which then contains > > IC R0,next_table_byte > CHI R0,128 > JL *+6 > AR R2,R0 > > The assertion of the thread -- and I don't doubt it -- is that the above > code uses 1/6 the CPU time if you first sort the table. (Obviously, a > stupid way of doing things: you could sort the table high to low and then > exit once you got a value below 128; but that's not the point.) > > It illustrates the value of, or problem with, depending on your point of > view, branch prediction. If the table is random then the outcome of the > CHI/JL is unpredictable, and the branch prediction is at best useless and > at worst counter-productive. But if the table is sorted, then the branch > prediction has a chance to be right most of the time. > Of course the sort doesn't generally come free. As well as the basic CPU cost (typically n log n for any modern sort), it will have some amount of startup CPU and additional memory references. Yes, of course I understand that this is an example to demonstrate branch prediction. But in the Real World, branch prediction is just one of many aspects of modern CPUs that are hard to predict. For that matter branch prediction itself is more than just remembering which way previous branches went. As a rule, if you know which path is more likely (and you have reason to think that the predictor doesn't know better than you do), you should make the likely path the non-branch one. Similarly you should make the test as unlikely to branch as possible, e.g. arrange the algorithm to use JL rather than JNH, even if JNH might give the same result. The predictor, if it knows nothing else, is going to guess branching based on the number of bits on in the mask. > There's a lot more than that in the thread, and it fundamentally has to do > with modern CPU technology, not C++. > Yup. It is a good thread, with a nice explanation/analogy. Tony H. ---------------------------------------------------------------------- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN