On Wed, 08 Feb 2012 22:44:25 +0100, Jesse Phillips
<[email protected]> wrote:
I've finally moved to the new regex for some real code. I'm seeing a
major change in performance when checking if a large number of words
contain a digit.
The english.dic file contains 134,950 entries.
Timings by compiler version:
2.056: 0.22 sec
2.058: 7.65 sec
I don't expect a fix for this to make it into 2.058, as the issue is
likely already present in 2.057.
--------
import std.file;
import std.string;
import std.datetime;
import std.regex;

private int[string] model;

void main() {
    auto name = "english.dic";
    foreach(w; std.file.readText(name).toLower.splitLines)
        model[w] += 1;

    foreach(w; std.string.split(readText(name)))
        if(!match(w, regex(r"\d")).empty)
        {}
}
There are some more performance issues.
D has a nice built-in profiler (dmd -profile) for finding them.
----------
import std.algorithm, std.stdio, std.string, std.path, std.regex;

private int[string] model;

int main(string[] args)
{
    if (args.length != 2)
    {
        std.stdio.stderr.writefln("usage: %s <file>",
            std.path.baseName(args[0]));
        return 1;
    }

    // Compile the regex once instead of once per word.
    auto re = std.regex.regex(r"\d");

    // Read line by line rather than loading the whole file twice.
    foreach(line; std.stdio.File(args[1], "r").byLine())
    {
        // Bug 6791: splitter is UTF-8 unsafe
        foreach(w; std.algorithm.splitter(line))
        {
            if(!std.regex.match(w, re).empty)
            {
            }
        }
        // byLine reuses its buffer, so lowercase in place and copy for the key.
        std.string.toLowerInPlace(line);
        model[line.idup] += 1;
    }
    return 0;
}
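
To isolate the main change in the program above, here is a minimal sketch of hoisting the regex construction out of the word loop. The helper name countWordsWithDigit and the sample words are made up for illustration; only regex() and match() come from std.regex as already used in this thread.

```d
import std.regex : regex, match;
import std.stdio : writeln;

// Hypothetical helper: counts how many words contain at least one digit.
size_t countWordsWithDigit(string[] words)
{
    // Build the regex once, before the loop, so each word only pays
    // for matching and not for pattern compilation.
    auto re = regex(r"\d");

    size_t count;
    foreach (w; words)
    {
        if (!match(w, re).empty)
            ++count;
    }
    return count;
}

void main()
{
    // Illustrative sample data, not taken from english.dic.
    auto words = ["apple", "b2b", "route66", "zebra"];
    writeln(countWordsWithDigit(words));
}
```

Whether this accounts for the whole 2.056 to 2.058 difference is not established here; running both variants under the profiler is the way to check.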