On 02.12.2011 15:32, Marco Leise wrote:
The import problem in std.file has been fixed on GitHub, but I couldn't
get FReD to compile this regex:
enum regex = ctRegex!r"relay=([\w\-\.]+[\w]+)[\.\,]*\s";
Instead I'm using this one:
enum regex = ctRegex!r"relay=([A-Za-z0-9_\-.]+[A-Za-z0-9_]+)[.,]*\s";
Both \. and \w inside a character class seem to cause problems. \- was
also troublesome, but it was easy to add a case for it in the parser by
looking at how \r is handled.
First of all, sorry for the messy problems with escapes in character
classes. If we all agree to just treat anything non-special after \ as
a literal, then I'll add it. Second, I might take a shot at optimizing
the engine once the OSX problem is figured out.
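For reference, a minimal sketch of the workaround pattern in use. It assumes the std.regex API in current Phobos (matchFirst), not the FReD snapshot from this thread; relayHost and the sample log line are made up for illustration.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: returns the captured relay host, or null when the
// line contains no match.
string relayHost(string line)
{
    // Workaround pattern from above: \w and \. inside [...] are spelled out.
    auto re = ctRegex!(r"relay=([A-Za-z0-9_\-.]+[A-Za-z0-9_]+)[.,]*\s");
    auto m = matchFirst(line, re);
    return m.empty ? null : m[1];
}

void main()
{
    // Made-up log line, for illustration only.
    writeln(relayHost("to=<user@example.org>, relay=mail.example.org, delay=1"));
}
```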
Then I started optimizing with these steps:
1. Run a 64-bit build instead of a 32-bit build :D
30.2 s => 14.4 s
2. Use "auto regex = ctRegex!..." instead of "enum regex = ctRegex!..."
14.4 s => 6.4 s
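Step 2 can be sketched as follows. The likely reason it helps (an assumption, not something measured here) is that an enum is a manifest constant whose value is substituted at every use site, while a variable is initialized once. countRelays is a hypothetical name, and matchFirst is the current Phobos API.

```d
import std.regex;

// Hypothetical helper: counts the lines that contain a relay= field.
size_t countRelays(string[] lines)
{
    // Bound once to a variable. With `enum re = ctRegex!(...)` the value
    // would instead be substituted wherever `re` is used in the loop.
    auto re = ctRegex!(r"relay=([A-Za-z0-9_\-.]+[A-Za-z0-9_]+)[.,]*\s");

    size_t n;
    foreach (line; lines)
    {
        if (!matchFirst(line, re).empty)
            ++n;
    }
    return n;
}

void main()
{
    // Made-up input, for illustration only.
    auto lines = ["relay=mx1.example.org, delay=0", "status=sent"];
    assert(countRelays(lines) == 1);
}
```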
Well, another thing to try is gdc/ldc. Last time I succeeded in this
endeavor, -O3 yielded a small boost of ~5%.
For comparison: the Java version takes 5.3 s here.
Don't kill me ;)
Seriously... they must be doing no UTF decoding. Another option is
Boyer-Moore on "relay=". It would be interesting to search for something
a little bit more fuzzy, e.g. "r[eE]lay=" or something like that, just to
see if it has any effect.
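A rough sketch of the Boyer-Moore idea done by hand: skip to the literal "relay=" first and only then run the regex on the remainder of the line. It uses boyerMooreFinder from std.algorithm.searching and matchFirst from current Phobos; relayHosts is a hypothetical name, and this is not a claim about what the engine (or Java) does internally.

```d
import std.algorithm.searching : boyerMooreFinder, find;
import std.regex;

// Hypothetical helper: collects the relay host from every line that has one.
string[] relayHosts(string[] lines)
{
    // Both the finder and the regex are built once, outside the loop.
    auto finder = boyerMooreFinder("relay=");
    auto re = ctRegex!(r"relay=([A-Za-z0-9_\-.]+[A-Za-z0-9_]+)[.,]*\s");

    string[] hosts;
    foreach (line; lines)
    {
        // Cheap literal search; tail starts at "relay=" or is empty.
        auto tail = find(line, finder);
        if (tail.length == 0)
            continue;

        // Only lines that contain the literal reach the regex engine.
        auto m = matchFirst(tail, re);
        if (!m.empty)
            hosts ~= m[1];
    }
    return hosts;
}

void main()
{
    // Made-up input, for illustration only.
    auto hosts = relayHosts(["x relay=mx1.example.org, d=0", "status=sent"]);
    assert(hosts == ["mx1.example.org"]);
}
```

A variant pattern such as "r[eE]lay=" has no single literal prefix of that length, so this kind of prefilter would only be able to key on a shorter fixed part.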
That left me with the following profile chart of function calls taking
> 1% of the time. The percentages don't include time spent in subroutine
calls, so main() is fairly low in the list:
From this short list I'd say that opIndex could be sped up a bit. But
nothing else catches my eye, except for that 4% in enforceEx for the UTF
exception.
samples % source function
6934 16.7800 uni.d:601 const(@trusted bool function(dchar))
std.internal.uni.CodepointTrie!(8).CodepointTrie.opIndex
4235 10.2485 (no location information) pure @safe dchar
std.utf.decode(const(char[]), ref ulong)
3807 9.2128 regex.d:6395 @trusted bool
std.regex.ctRegexImpl!("relay=([A-Za-z0-9_\-.]+[A-Za-z0-9_]+)[.,]*\s",
[]).func(ref
std.regex.BacktrackingMatcher!(true).BacktrackingMatcher!(char).BacktrackingMatcher)
2240 5.4207 regex.d:3232 @property @trusted bool
std.regex.BacktrackingMatcher!(true).BacktrackingMatcher!(char).BacktrackingMatcher.atEnd()
2151 5.2053 regex.d:2932 @safe bool
std.regex.Input!(char).Input.nextChar(ref dchar, ref ulong)
1812 4.3850 exception.d:486 pure @safe bool
std.exception.enforceEx!(std.utf.UTFException, bool).enforceEx(bool,
lazy immutable(char)[], immutable(char)[], ulong)
1686 4.0801 regex.d:6490 @trusted bool
std.regex.ctRegexImpl!("relay=([A-Za-z0-9_\-.]+[A-Za-z0-9_]+)[.,]*\s",
[]).func(ref
std.regex.BacktrackingMatcher!(true).BacktrackingMatcher!(char).BacktrackingMatcher).int
test_11()
1409 3.4097 regex.d:6450 @safe
std.regex.__T10RegexMatchTAaS613std5regex28__T19BacktrackingMatcherVb1Z19BacktrackingMatcherZ.RegexMatch
std.regex.match!(char[],
std.regex.StaticRegex!(char).StaticRegex).match(char[],
std.regex.StaticRegex!(char).StaticRegex)
1335 3.2306 regex.d:6272 @trusted
std.regex.__T10RegexMatchTAaS613std5regex28__T19BacktrackingMatcherVb1Z19BacktrackingMatcherZ.RegexMatch
std.regex.__T10RegexMatchTAaS613std5regex28__T19BacktrackingMatcherVb1Z19BacktrackingMatcherZ.RegexMatch.__ctor!(std.regex.StaticRegex!(char).StaticRegex).__ctor(std.regex.StaticRegex!(char).StaticRegex,
char[])
1224 2.9620 regex.d:3234 @trusted void
std.regex.BacktrackingMatcher!(true).BacktrackingMatcher!(char).BacktrackingMatcher.next()
1212 2.9330 regex.d:2951 @property @safe ulong
std.regex.Input!(char).Input.lastIndex()
1202 2.9088 regex.d:2744 @trusted ulong
std.regex.ShiftOr!(char).ShiftOr.search(const(char)[], ulong)
1051 2.5434 regex.d:3717 @trusted void
std.regex.BacktrackingMatcher!(true).BacktrackingMatcher!(char).BacktrackingMatcher.stackPush!(int).stackPush(int)
973 2.3546 regex.d:3717 @trusted void
std.regex.BacktrackingMatcher!(true).BacktrackingMatcher!(char).BacktrackingMatcher.stackPush!(ulong).stackPush(ulong)
884 2.1392 main.d:22 _Dmain
618 1.4955 regex.d:3726 @trusted void
std.regex.BacktrackingMatcher!(true).BacktrackingMatcher!(char).BacktrackingMatcher.stackPush!(std.regex.Group!(ulong).Group).stackPush(std.regex.Group!(ulong).Group[])
466 1.1277 (no location information) _d_arraysetlengthiT
These functions sum up to ~80%. And if the profile is correct, the garbage
collector functions each take a low place in the table. At this point
I'd probably recommend an ASCII regex, but I'd like to know how Java can
still be substantially faster with library routines. :)