On Fri, Jun 9, 2017 at 9:34 AM, uncorroded via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:
> Hi guys,
>
> I am a beginner in D. As a project, I converted a log-parsing script in
> Python which we use at work, to D. This link was helpful - (
> https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/ ) I
> compiled it with dmd and ldc. The log file is 52 MB. With dmd (not release
> build), it takes 1.1 sec and with ldc, it takes 0.3 sec.
>
> The Python script (run with system python, not Pypy) takes 0.75 sec. The D
> and Python functions are here and on pastebin ( D -
> https://pastebin.com/SeUR3wFP , Python - https://pastebin.com/F5JbfBmE ).
>
> Basically, I am reading a line, checking for 2 constants. If either one is
> found, some processing is done on line and stored to an array for later
> analysis. I tried reading the file entirely in one go using std.file :
> readText and using std.algorithm : splitter for lazily splitting newline
> but there is no difference in speed, so I used the byLine approach
> mentioned in the linked blog. Is there a better way of doing this in D?
>

There is no difference in speed between the two approaches because in both
cases you do not process your data lazily. The code makes many allocations,
and that is the main reason why it is so slow. I could improve it, but I
would need to see some example data of the kind you are trying to parse.

Some general rules, though:

1.) Instead of ~= you should use std.array.appender.
2.) Instead of std.string.split you could use std.algorithm.splitter or
    std.algorithm.findSplit.
3.) Instead of indexOf I would use std.algorithm.startsWith (in case the
    constant is at the beginning of the line).
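To illustrate the three rules together, here is a minimal sketch. I have not seen your data, so the two markers ("ERROR" and "WARN"), the Entry struct, the collect function and the sample lines are all made-up placeholders, not taken from your pastebin. Adapt them to your real log format.

```d
import std.algorithm : findSplit, splitter, startsWith;
import std.array : appender;
import std.stdio : writeln;

// Hypothetical record type; the real fields depend on your log format.
struct Entry
{
    string tag;
    string rest;
}

// Works on any range of lines, e.g. File("app.log").byLine or a lazy splitter.
Entry[] collect(Lines)(Lines lines)
{
    // Assumed markers; replace them with your two constants.
    enum markerA = "ERROR";
    enum markerB = "WARN";

    // Rule 1: appender instead of ~= on a plain array.
    auto entries = appender!(Entry[])();

    foreach (line; lines)
    {
        // Rule 3: startsWith returns 0 when neither marker is a prefix.
        if (!line.startsWith(markerA, markerB))
            continue;

        // Rule 2: findSplit returns slices of the line (before, separator,
        // after) and does not allocate an array of parts.
        auto parts = line.findSplit(" ");

        // byLine reuses its buffer, so copy only the pieces that are kept.
        entries.put(Entry(parts[0].idup, parts[2].idup));
    }
    return entries.data;
}

void main()
{
    // Made-up sample input, split lazily on newlines.
    string sample = "ERROR disk full\nINFO started\nWARN low memory";
    foreach (e; collect(sample.splitter('\n')))
        writeln(e.tag, " | ", e.rest);
}
```

With a real file you would call collect(File("app.log").byLine) after importing std.stdio : File (the file name is a placeholder). The only allocations left are the .idup copies of the lines you actually keep and the occasional growth of the appender.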