== Quote from Andrei Alexandrescu (seewebsiteforem...@erdani.org)'s article
> On 8/23/10 19:26 PDT, dsimcha wrote:
> > == Quote from bearophile (bearophileh...@lycos.com)'s article
> >> The gentle and a-lot-working David Simcha has converted one of my bug 
> >> reports
> > into an enhancement request:
> >> http://d.puremagic.com/issues/show_bug.cgi?id=4264
> >> Currently (SVN) only array() is able to digest iterables that define 
> >> opApply
> > only, but I think a few more spread support will be very good.
> >> In the thread of that 4264 there are plenty of comments and views of the
> > situation. Opinions welcome, both here and there.
> >> Bye,
> >> bearophile
> >
> > This one didn't get much attention, possibly because the bug report is too 
> > long,
> > too complicated and too indirect.  Let me reiterate the basic points inline.
> >
> > I've been working heavily on debugging and polishing std.range and 
> > std.algorithm.
> >   (I've fixed 14 Bugzilla bugs plus several unlisted bugs and enhancements 
> > in
these
> > modules since last DMD release.)  One issue that's been nagging me for a 
> > while,
> > and has apparently been nagging Bearophile, too is that opApply-based 
> > objects are
> > not supported at all, even where supporting them would be completely 
> > feasible.
> > Examples are std.algorithm.map, std.algorithm.filter and 
> > std.algorithm.chain.  I'm
> > thinking of generalizing std.range and std.algorithm to work with any 
> > foreach-able
> > object where possible, including opApply-based iteration.  My concerns are:
> >
> > 1.  Bug 2443 makes it impossible to propagate ref to higher order objects
correctly.
> >
> > 2.  Building higher order iterables with opApply might get so slow that it's
> > virtually unusable, due to multiple layers of delegate calls.  Then again, 
> > this
> > might be premature optimization for a dumb compiler, as LDC can inline 
> > through at
> > least one level of opApply delegate calls.  I haven't tested to see if it 
> > can do
> > more than this.
> >
> > 3.  Ranges are the flagship way of iterating through an object in D, as they
> > should be.  opApply has its uses, but they're relatively few and far 
> > between.  I'm
> > wondering if anyone besides bearophile and I think supporting opApply is 
> > worth the
> > effort and potential bloat, if Walter cares enough to fix Bug 2443, and if 
> > Andrei
> > considers it to be a reasonable part of std.range/std.algorithm's design.
> Thanks for this note.
> First, I think those interested in the relative merits and demerits of
> ranges vs. opApply should read this:
> http://okmij.org/ftp/papers/LL3-collections-enumerators.txt. The author,
> Oleg Kiselyov - an excellent PL theorist - makes a lot of excellent
> points (but also misses a few; I plan to write a rebuttal to that
> article a some point).
> Clearly opApply (and internal iteration in general) has many merits that
> are difficult to rival with ranges. The converse is also true. So
> perhaps it's compelling to have both.
> One concern regarding opApply is speed of manipulation through an
> indirect (delegate) call. It would be great to have some benchmarks that
> give us a good baseline for discussion. David, would you want to put a
> few together?
> Andrei

Here's one benchmark and the results, using SHOO's awesome new StopWatch module
that just landed in SVN a few days ago:

import std.stdio, std.functional, std.stopwatch, std.traits, std.range,
    std.algorithm;

struct OpApplyFilter(alias pred, Range) {
    alias unaryFun!pred fun;
    Range range;

    int opApply(int delegate(ref ForeachType!Range) dg) {
        int res;

        foreach(elem; range) if(fun(elem)) {
            res = dg(elem);
            if(res) break;
        }

        return res;
    }
}

auto opApplyFilter(alias pred, Range)(Range range) {
    return OpApplyFilter!(pred, Range)(range);
}

struct OpApplyMap(alias pred, Range) {
    alias unaryFun!pred fun;
    Range range;

    int opApply(int delegate(ref ForeachType!Range) dg) {
        int res;

        foreach(elem; range) {
            auto mapped = fun(elem);
            res = dg(mapped);
            if(res) break;
        }

        return res;
    }
}

auto opApplyMap(alias pred, Range)(Range range) {
    return OpApplyMap!(pred, Range)(range);
}


void main() {
    auto myRange = iota(100_000_000);

    // Keep the compiler from excessively optimizing.
    int sum;

    auto sw = new StopWatch(StopWatch.AutoStart.yes);

    foreach(elem; map!"a * a"(filter!"a & 1"(myRange))) {
        sum += elem;
    }

    sw.stop;
    writeln("Range-based:  ", sw.peek.milliseconds);

    sw.reset();
    sw.start();

    foreach(elem; opApplyMap!"a * a"(opApplyFilter!"a & 1"(myRange))) {
        sum += elem;
    }

    sw.stop;
    writeln("opApply-based:  ", sw.peek.milliseconds);

    sw.reset();
    sw.start();

    foreach(elem; opApplyMap!"a * a"(myRange)) {
        sum += elem;
    }

    sw.stop;
    writeln("opApply-based, Map only:  ", sw.peek.milliseconds);

    sw.reset();
    sw.start();

    foreach(elem; opApplyFilter!"a & 1"(
    opApplyMap!"a + 1"(opApplyMap!"a * a"(myRange)))) {
        sum += elem;
    }

    sw.stop;
    writeln("opApply-based, two maps:  ", sw.peek.milliseconds);


    // This is to force the value of sum to be used to prevent
    // overly aggressive optimizations by DMD.
    writeln("Ignore this:  ", sum);
}

Results:

Range-based:  404.445
opApply-based:  718.914
opApply-based, Map only:  524.051
opApply-based, two maps:  1501.92
Ignore this:  1158518272

So it seems like adding more and more layers of stacking causes each layer of
stacking to have progressively more overhead.  This makes sense because each 
layer
of stacking is likely to have a progressively worse effect on the CPU pipeline.
The CPU can probably speculatively execute around one layer of indirect calls, 
but
not a whole bunch.

BTW, from looking at disassemblies it seems like for some strange reason the
predicate functions are never getting inlined.

Reply via email to