Background: This came out of an attempt to eliminate delegate indirection in parallel foreach loops on LDC. LDC can inline a delegate call when the delegate always points to the same code. This means it can inline opApply delegates once opApply itself has been inlined and the delegate has effectively been constant folded.

Simplified case without unnecessarily complex context:

// Assume this function does NOT get inlined.
// In my real use case it's doing something
// much more complicated and in fact does not
// get inlined.
void runDelegate(scope void delegate() dg) {
    dg();
}

// Assume this function gets inlined into main().
uint divSum(uint divisor) {
    uint result = 0;

    // If divisor gets const folded and is a power of 2 then
    // the compiler can optimize the division to a shift.
    void doComputation() {
        foreach(i; 0U..1_000_000U) {
            result += i / divisor;
        }
    }

    runDelegate(&doComputation);
    return result;
}

void main() {
    // divSum gets inlined here, but doComputation()
    // can't because it's called through a delegate.
    // Therefore, the 2 is never const folded into
    // doComputation().
    auto ans = divSum(2);
}

The issue I'm dealing with in std.parallelism is conceptually the same as this, but with much more context that's irrelevant to this discussion. Would the following be a feasible compiler optimization, either in the near future or at least in principle?

When an outer function is inlined, all non-static inner functions should be recompiled with the information gained by inlining the outer function. In this case doComputation() would be recompiled with divisor const-folded to 2 and the division optimized to a shift. This post-inlining compilation would then be passed to runDelegate().
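
To make the idea concrete, here is a hand-written sketch of roughly what I'd hope the compiler could produce for the divSum(2) call after such a recompilation. The name divSumSpecialized is hypothetical and only for illustration; no compiler emits this today as far as I know, and the shift is written out by hand where the optimizer would normally do the strength reduction itself.

```d
// Still assumed NOT to be inlined, exactly as in the original example.
void runDelegate(scope void delegate() dg) {
    dg();
}

// Hypothetical hand-specialized equivalent of divSum(2).
// The divisor parameter is gone because it is now a known constant.
uint divSumSpecialized() {
    uint result = 0;

    // Copy of doComputation() with divisor const-folded to 2,
    // so the division becomes a right shift by 1.
    void doComputation() {
        foreach(i; 0U..1_000_000U) {
            result += i >> 1;
        }
    }

    // The delegate still goes through runDelegate(), but it now
    // points at the specialized copy.
    runDelegate(&doComputation);
    return result;
}
```

The call through runDelegate() is still indirect here; the only gain is that the body behind the delegate has seen the constant.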

Also, is there any trick I'm not aware of to work around the standard compilation model and force this behavior now?
