On Wednesday, 23 May 2018 at 02:24:08 UTC, IntegratedDimensions wrote:

Many times in expensive loops one must make decisions. Decisions must be determined and the determination costs.

for(int i = 0; i < N; i++)
{
    if(decision(i)) A; else B;
}

the if statement costs N times the cycle cost.

In some cases the decision holds for continuous ranges. For some 0 <= n <= N the decision is constant, but n is arbitrary(determined by unknown factors at compile time).

One can speed up the routine by using something akin to a simplified strategy pattern where one uses functions/delegates/lambdas to code a faster version without the test:


for(int i = 0; i < N; i++)
{
    d = (){ if(decision(i)) A; else d = () { B; };
    d();
}


assuming you meant
auto d = (int i){ if(decision(i)) A; else d = (int i) { B; };};
for(int i = 0; i < N; i++)
{
    d(i);
}


this code basically reduces to

for(int i = 0; i < N; i++)
{
    B;
}

Once the decision fails, which we assume once it fails it always fails in this particular case.

Therefor, once the decision fails it kicks in the "faster" code. Suppose decision is very slow.

The cycle cost is then n times rather than N times.

In general, such techniques can be used to speed up code, sometimes significantly, but are difficult to implement using a standard pattern and for more complex decision functions.


Does anyone see any way to use some some of D's power to help simplify such things but not using something like a strategy pattern or complexifying the overall design(using advanced techniques such as templates, mixins is not making the code more complex).

If decision is pure, consider using static foreach (iota(N).map!decision) { ... }to unroll it at compile time. even if it isn't the compiler may still be able to factor out parts of repeated calls to decision, look at the assembly/IR to confirm this.

Otherwise PROFILE!!!!! (and use profile guided optimisation! this implies using a compiler other than DMD which if you care about performance you should be doing anyway)

Blindly trying to optimise is just as likely to hurt performance.
in particular don't underestimate the branch predictor. Even if decision is complex, if there is a pattern in evaluating iota(n).map!decision the branch predictor will pick up on it.

In terms of exploiting knowledge of decision a priori that the compiler somehow lacks you really don't have much option but to do it yourself.


Reply via email to