"Andrei Alexandrescu" <seewebsiteforem...@erdani.org> wrote in message news:jb8hvh$2sdl$1...@digitalmars.com...
This is a good benchmark for I/O and a practical regex. David, could you
please send (privately if you want) the file or some statistics about it
(bytes, lines, a representative sample)? Thanks!

One more thing before I forget - you may want to use byLine() for input. In case the issue turns out to be related to I/O, it's much better we improve byLine() instead of the streams library.


I implemented the various suggestions (File.byLine, writeln instead of writefln, std.algorithm.sort), except for using FReD. FReD wouldn't compile on the Linux box I am using. The error was:

/phobos/std/file.d(537): Error: undefined identifier package c.stdio

Previous timing:
real    4m21.255s
user    4m14.216s
sys     0m5.940s

New timing after the changes:
real    2m15.840s
user    2m12.700s
sys     0m2.760s


So, it's nearly twice as fast but still the slowest of the four.

I was able to compile with FReD on a 32-bit Windows system, and it ran 15% faster than std.regex on these same test files. I would love to try the precompiled regex code for FReD, but the compiler runs out of memory when I try it.

The source files are /var/log/syslog files from sendmail on a Solaris 10 box. I can't make them available because they are mail logs from our company, but here are the sizes and line counts along with example entries.

$ wc -l syslog syslog.0 syslog.1 syslog.2
  280618 syslog
  331609 syslog.0
  535035 syslog.1
  543241 syslog.2
 1690503 total

-rw-r--r-- 1 david david  86244537 2011-11-30 21:26 syslog.0
-rw-r--r-- 1 david david 146156778 2011-11-30 21:26 syslog.1
-rw-r--r-- 1 david david 143481904 2011-11-30 21:26 syslog.2
-rw-r--r-- 1 david david  73030898 2011-11-30 21:26 syslog

The entries look like this:

Oct 27 03:10:01 thehost sendmail[3248]: [ID 801593 mail.info] p9R8A0MJ003245: to=u...@somewhere.com, delay=00:00:01, xdelay=00:00:01, mailer=esmtp, pri=120773, relay=some.host.com. [5.6.7.8], dsn=2.0.0, stat=Sent (ok 1319703001 qp 25319 the.mail.host.com!1319703000!80184558!1)
Oct 27 03:10:04 thehost sendmail[3289]: [ID 801593 mail.info] p9R8A3Nr003289: from=sender@senderbox, size=765, class=0, nrcpts=1, msgid=<201110270810.p9R8A3QA021419@senderbox>, proto=ESMTP, daemon=MTA, relay=senderbox.foo.com [1.2.3.4]
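
For anyone reading without the attachment, here is a minimal sketch of the kind of loop the suggestions lead to. It is not the attached relayhosts.d. The task (counting lines per relay host), the helper name countRelays, and the regex pattern are all assumptions based on the sample entries above. It uses matchFirst, so it needs a reasonably current std.regex.

```d
import std.algorithm : sort;
import std.regex : matchFirst, regex;
import std.stdio : File, writeln;

/// Hypothetical helper: adds one to counts[host] for every line that
/// carries a "relay=host" field. Works on any range of lines, so it can
/// be fed File.byLine() or a plain string array.
void countRelays(R)(R lines, ref size_t[string] counts)
{
    // Assumed pattern: capture everything after "relay=" up to the next
    // space or comma, without a trailing dot.
    auto relayRe = regex(`relay=([^ ,]*[^ ,.])`);

    foreach (line; lines)
    {
        auto m = matchFirst(line, relayRe);
        if (!m.empty)
        {
            // byLine() reuses its buffer, so copy the capture before
            // using it as an associative array key.
            counts[m[1].idup]++;
        }
    }
}

void main(string[] args)
{
    size_t[string] counts;

    // Each command-line argument is treated as a log file to scan.
    foreach (name; args[1 .. $])
        countRelays(File(name).byLine(), counts);

    // Order the hosts by descending count using std.algorithm.sort.
    auto hosts = counts.keys;
    sort!((a, b) => counts[a] > counts[b])(hosts);

    // writeln needs no format string, unlike writefln.
    foreach (host; hosts)
        writeln(counts[host], "\t", host);
}
```

It would be run as something like "./relaycount syslog syslog.0 syslog.1 syslog.2", where relaycount is just a placeholder name for the compiled sketch. Building the regex once per call, rather than once per line, keeps pattern construction out of the inner loop.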


-Dave

Attachment: relayhosts.d