On Wed, 14 Aug 2019 at 18:56, Charles Mills <charl...@mcn.org> wrote:

> This is really interesting. For those put off by the "C++" note that the
> issue has nothing whatsoever to do with C++. It is a pure branch prediction
> issue. Picture a program that computes an array of pseudo-random 8-bit
> integers from 0 to 255. Then it solves the problem "what is the sum of all
> of the numbers in the table greater than or equal to 128?" It does that
> with the obvious loop, which then contains
>
> IC   R0,next_table_byte
> CHI  R0,128
> JL   *+6
> AR   R2,R0
>
> The assertion of the thread -- and I don't doubt it -- is that the above
> code uses 1/6 the CPU time if you first sort the table. (Obviously, a
> stupid way of doing things: you could sort the table high to low and then
> exit once you got a value below 128; but that's not the point.)
>
> It illustrates the value of, or problem with, depending on your point of
> view, branch prediction. If the table is random then the outcome of the
> CHI/JL is unpredictable, and the branch prediction is at best useless and
> at worst counter-productive. But if the table is sorted, then the branch
> prediction has a chance to be right most of the time.
>

Of course the sort doesn't generally come free. As well as the basic CPU
cost (typically n log n for any modern sort), it will have some amount of
startup CPU and additional memory references. Yes, of course I understand
that this is an example to demonstrate branch prediction. But in the Real
World, branch prediction is just one of many aspects of modern CPUs that
are hard to predict. For that matter branch prediction itself is more than
just remembering which way previous branches went. As a rule, if you know
which path is more likely (and you have reason to think that the predictor
doesn't know better than you do), you should make the likely path the
non-branch one. Similarly you should make the test as unlikely to branch as
possible, e.g. arrange the algorithm to use JL rather than JNH, even if JNH
might give the same result. The predictor, if it knows nothing else, is
going to guess branching based on the number of bits on in the mask.


> There's a lot more than that in the thread, and it fundamentally has to do
> with modern CPU technology, not C++.
>

Yup. It is a good thread, with a nice explanation/analogy.

Tony H.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN

Reply via email to