On Wed, 08 Feb 2012 22:44:25 +0100, Jesse Phillips <[email protected]> wrote:

I've finally moved to the new regex for some real code. I'm seeing a major change in performance when checking whether a large number of words contain a digit.

The english.dic file contains 134,950 entries.

With
2.056: 0.22sec
2.058: 7.65sec

I don't expect a fix for this to make it into 2.058, as the issue is likely already present in 2.057.

--------
import std.file;
import std.string;
import std.datetime;
import std.regex;

private int[string] model;

void main() {
   auto name = "english.dic";
   foreach(w; std.file.readText(name).toLower.splitLines)
      model[w] += 1;

   foreach(w; std.string.split(readText(name)))
      if(!match(w, regex(r"\d")).empty)
      {}
}


There are some more performance issues.
D has a nice built-in profiler (compile with dmd -profile) for finding such issues.

----------
import std.algorithm, std.stdio, std.string, std.path, std.regex;

private int[string] model;

int main(string[] args)
{
    if (args.length != 2)
    {
        std.stdio.stderr.writefln("usage: %s <file>", std.path.baseName(args[0]));
        return 1;
    }

    auto re = std.regex.regex(r"\d");
    foreach(line; std.stdio.File(args[1], "r").byLine())
    {
        // Bug 6791: splitter is UTF-8 unsafe
        foreach(w; std.algorithm.splitter(line))
        {
            if(!std.regex.match(w, re).empty)
            {
            }
        }

        std.string.toLowerInPlace(line);
        model[line.idup] += 1;
    }

    return 0;
}
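
One difference in the program above is that the regex is constructed once, before the loop, instead of once per word. Whether that accounts for the whole slowdown is not established in this thread. A minimal sketch of the pattern in isolation follows; the helper name countWordsWithDigit is hypothetical and used only for illustration.

```d
import std.regex : regex, match;

/// Hypothetical helper (not part of the original program):
/// counts the words that contain at least one digit.
size_t countWordsWithDigit(string[] words)
{
    // Build the regex once, outside the loop, so the pattern
    // is not recompiled for every word.
    auto re = regex(r"\d");

    size_t hits = 0;
    foreach (w; words)
    {
        // match() yields an empty result when the word has no digit.
        if (!match(w, re).empty)
            ++hits;
    }
    return hits;
}
```

The helper can be called on any array of words, for example the result of std.string.split(readText(name)) from the first program.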
