On Wednesday, 30 March 2016 at 05:16:04 UTC, H. S. Teoh wrote:
If we didn't have autodecoding, it would be a simple matter of searching for sentinel substrings. This also indicates that most of the work done by autodecoding is unnecessary -- it's wasted work since most of the string data is treated opaquely anyway.

Just to drive this point home, I made a very simple benchmark. Iterating over code points when you don't need to is 100x slower than iterating over code units.

import std.datetime;
import std.stdio;
import std.array;
import std.utf;
import std.uni;

enum testCount = 1_000_000;
enum var = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent justo ante, vehicula in felis vitae, finibus tincidunt dolor. Fusce sagittis.";

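// Auto-decoding: the string is iterated as a range of dchar (code points).
void test()
{
    auto a = var.array;
}

// No decoding: the string is iterated as a range of char (code units).
void test2()
{
    auto a = var.byCodeUnit.array;
}

// Grapheme segmentation: decodes and groups code points into graphemes.
void test3()
{
    auto a = var.byGrapheme.array;
}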

void main()
{
    import std.conv : to;
    auto r = benchmark!(test, test2, test3)(testCount);
    auto result = to!Duration(r[0] / testCount);
    auto result2 = to!Duration(r[1] / testCount);
    auto result3 = to!Duration(r[2] / testCount);

    writeln("auto-decoding", "\t\t", result);
    writeln("byCodeUnit", "\t\t", result2);
    writeln("byGrapheme", "\t\t", result3);
}


$ ldc2 -O3 -release -boundscheck=off test.d
$ ./test
auto-decoding           1 μs
byCodeUnit              0 hnsecs
byGrapheme              11 μs
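
To illustrate the quoted point about sentinel substrings, here is a minimal sketch of searching on code units only. It assumes the text is valid UTF-8, in which case a match on the raw bytes of a valid sentinel always falls on a code point boundary, so no decoding is needed. The helper name afterSentinel and the sample string are made up for this example.

```d
import std.algorithm.searching : find;
import std.string : representation;

// Hypothetical helper: returns the part of `text` starting at the first
// occurrence of `sentinel`, or an empty string if it is not found.
// representation() exposes the string as immutable(ubyte)[], so find()
// compares code units and never auto-decodes.
string afterSentinel(string text, string sentinel)
{
    auto rest = find(text.representation, sentinel.representation);
    return cast(string) rest;
}

void main()
{
    import std.stdio : writeln;

    // Sample input containing a non-ASCII character before the sentinel.
    enum text = "key=värde;next=42";

    assert(afterSentinel(text, ";") == ";next=42");
    assert(afterSentinel(text, "#").length == 0);
    writeln(afterSentinel(text, ";"));
}
```

The non-ASCII "ä" is skipped over as two opaque bytes, which is the kind of data the quoted post says is decoded for no reason.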
