One more thing before I forget - you may want to use byLine() for input. In case the issue turns out to be related to I/O, it's much better that we improve byLine() instead of the streams library.

This is a good benchmark for I/O and a practical regex. David, could you please send (privately if you want) the file or some statistics about it (bytes, lines, a representative sample)? Thanks!
I implemented the various suggestions (File.byLine, writeln instead of writefln, std.algorithm.sort), except using FReD. FReD wouldn't compile on the Linux box I am using. The error was:
/phobos/std/file.d(537): Error: undefined identifier package c.stdio

Previous timing:

    real 4m21.255s
    user 4m14.216s
    sys  0m5.940s

New timing after the changes:

    real 2m15.840s
    user 2m12.700s
    sys  0m2.760s

So, it's nearly twice as fast but still the slowest of the four.

I was able to compile with FReD on a 32-bit Windows system and it performed 15% faster than std.regex processing these same test files. I would love to try the precompiled regex code for FReD but the compile throws an out of memory error when I try it.
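For anyone following along, here is a minimal sketch of the shape those suggestions lead to. It is not the attached relayhosts.d. It assumes the goal is to tally the relay= field of each log line (a guess based on the attachment name), the helper relayHost and the pattern are hypothetical, and it uses matchFirst from current Phobos rather than the std.regex API as it stood when this was written.

```d
import std.algorithm : sort;
import std.regex : Regex, matchFirst, regex;
import std.stdio : File, writeln;

// Hypothetical helper: returns the text after "relay=" up to the next
// space, comma, or '[', or null when the line has no relay field.
string relayHost(const(char)[] line, Regex!char re)
{
    auto m = matchFirst(line, re);
    // byLine() reuses its buffer, so copy the capture before keeping it.
    return m.empty ? null : m[1].idup;
}

void main(string[] args)
{
    // Illustrative pattern only; the real program's regex is not shown.
    auto re = regex(`relay=([^ ,\[]+)`);
    size_t[string] counts;

    foreach (name; args[1 .. $])
    {
        // File.byLine reads one line at a time without the streams library.
        foreach (line; File(name).byLine())
        {
            auto host = relayHost(line, re);
            if (host.length)
                counts[host]++;
        }
    }

    // std.algorithm.sort on the keys, then plain writeln for output.
    auto hosts = counts.keys;
    sort(hosts);
    foreach (h; hosts)
        writeln(h, " ", counts[h]);
}
```

The regex is built once outside the loop and passed in, so the per-line cost is the match itself plus one small copy for each hit.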
The source files are /var/log/syslog files from sendmail on a Solaris 10 box. I can't make them available because they are mail logs from our company, but here are the sizes and line counts along with example entries.
$ wc -l syslog syslog.0 syslog.1 syslog.2
  280618 syslog
  331609 syslog.0
  535035 syslog.1
  543241 syslog.2
 1690503 total

-rw-r--r-- 1 david david  86244537 2011-11-30 21:26 syslog.0
-rw-r--r-- 1 david david 146156778 2011-11-30 21:26 syslog.1
-rw-r--r-- 1 david david 143481904 2011-11-30 21:26 syslog.2
-rw-r--r-- 1 david david  73030898 2011-11-30 21:26 syslog

The entries look like this:

Oct 27 03:10:01 thehost sendmail[3248]: [ID 801593 mail.info] p9R8A0MJ003245: to=u...@somewhere.com, delay=00:00:01, xdelay=00:00:01, mailer=esmtp, pri=120773, relay=some.host.com. [5.6.7.8], dsn=2.0.0, stat=Sent (ok 1319703001 qp 25319 the.mail.host.com!1319703000!80184558!1)
Oct 27 03:10:04 thehost sendmail[3289]: [ID 801593 mail.info] p9R8A3Nr003289: from=sender@senderbox, size=765, class=0, nrcpts=1, msgid=<201110270810.p9R8A3QA021419@senderbox>, proto=ESMTP, daemon=MTA, relay=senderbox.foo.com [1.2.3.4]
-Dave
relayhosts.d
Description: Binary data