So we've had a good run with making popFront smaller. In ASCII
microbenchmarks with ldc, the speed is indistinguishable from s = s[1 ..
$]. Smaller functions also keep the impact on the instruction cache low
in larger applications.
Now it's time to look at the end-to-end cost of autodecoding. I wrote
this simple microbenchmark:
=====
import std.range;
alias myPopFront = std.range.popFront;
alias myFront = std.range.front;

void main(string[] args) {
    import std.algorithm, std.array, std.stdio;
    // 500 million ASCII characters
    char[] line = "0123456789".dup.repeat(50_000_000).join;
    ulong checksum;
    if (args.length == 1)
    {
        while (line.length) {
            version(autodecode)
            {
                // Decode one code point, then skip past it
                checksum += line.myFront;
                line.myPopFront;
            }
            else
            {
                // Plain code-unit iteration, no decoding
                checksum += line[0];
                line = line[1 .. $];
            }
        }
        version(autodecode)
            writeln("autodecode ", checksum);
        else
            writeln("bytes ", checksum);
    }
    else
        // Any extra argument: only build the string, to time the setup
        writeln("overhead");
}
=====
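On my machine, with "ldc2 -release -O3 -enable-inlining" I get something
like 0.54s overhead (the run with an extra argument, which only builds
the string), 0.81s with no autodecoding, and 1.12s with autodecoding.
Net of the overhead, that is roughly 0.27s for the byte loop against
0.58s for the autodecoding loop.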
Your mission, should you choose to accept it, is to define a combination
front/popFront that reduces the gap.
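To make the shape of the problem concrete, here is a minimal sketch of what a combined primitive could look like. It is a starting point, not a measured answer: the name frontAndPop is hypothetical (nothing in Phobos is called that), and it assumes that deferring to std.utf.decode for non-ASCII input matches what front does today. The idea is simply to test the first code unit once and share that test between reading and advancing.

```d
import std.utf : decode;

/// Hypothetical helper, not part of Phobos: returns the first code point
/// of s and advances s past it in one call.
dchar frontAndPop(ref char[] s)
{
    assert(s.length, "frontAndPop called on an empty array");
    immutable c = s[0];
    if (c < 0x80)
    {
        // ASCII fast path: one code unit is one code point.
        s = s[1 .. $];
        return c;
    }
    // Slow path: let std.utf.decode validate and decode the sequence;
    // it advances i past the code units it consumed.
    size_t i = 0;
    immutable dchar d = decode(s, i);
    s = s[i .. $];
    return d;
}

void main()
{
    import std.stdio : writeln;

    char[] line = "0123456789".dup;
    ulong checksum;
    while (line.length)
        checksum += frontAndPop(line);
    writeln("combined ", checksum);
}
```

Dropping frontAndPop(line) into the benchmark loop in place of the myFront/myPopFront pair would show whether sharing the branch buys anything; I have not timed it.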
Andrei