On Tuesday, 18 June 2013 at 18:53:34 UTC, Gary Willoughby wrote:
Below is an example snippet of code to test the performance of
regex matches. I need to parse a large log and extract data
from it, and I've noticed a huge increase in the loop's running
time when reading and using regex.
...
auto alert = regex(r"^Alert ([0-9]+)");
while ((line = file.readln()) !is null)
{
    auto m = match(line, alert);
    if (m)
    {
        alerts++;
    }
    counter++;
}
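For comparison, the loop above can be pulled into a self-contained helper. This is a sketch, not the poster's actual program: the `countAlerts` name is made up, and `matchFirst` is used in place of the older `match` since both return an empty capture on failure.

```d
import std.regex : matchFirst, regex;

// Count lines matching the alert pattern; mirrors the loop in the post.
// Works over any range of lines (e.g. File.byLine or a string[]).
size_t countAlerts(R)(R lines)
{
    auto alert = regex(r"^Alert ([0-9]+)");
    size_t alerts;
    foreach (line; lines)
    {
        if (!matchFirst(line, alert).empty)
            alerts++;
    }
    return alerts;
}
```

Feeding it `File("log").byLine` reproduces the original loop without the manual `readln` bookkeeping.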
...
Using the above example, I parse about 700K lines per second
(I'm reading from an SSD). If I comment out the regex match,
I read at 4.5M lines per second. Considering I need to use
about 8 regex matches and extract data, this figure drops
further, to about 100K lines per second.
Is there anything i can do to speed up regex matching in such a
scenario? Are there any tips you can share to speed things up?
Thanks.
I'm working with some string-heavy applications, so I was
curious about this myself. I'm new to D, but I did some heavy
data analysis on chat files a while back.
Not knowing anything about your data or what other queries you
might want to run on it: matching the start of the string with
std.algorithm.startsWith() and splitting the line on a
delimiter outperforms regex matching in my admittedly arbitrary
test code. I tested three scenarios at 10,000,000 rounds each:
Match everything: ~39 seconds for match, ~8 seconds for startsWith/split.
Match fails at start of string: ~10 seconds for match, ~1 second for startsWith/split.
Match fails at end of string: ~15 seconds for match, ~1 second for startsWith/split.
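Roughly what I mean by the startsWith/split approach, as a sketch. The `parseAlertId` helper and the -1 sentinel are my inventions, and it assumes space-delimited lines like "Alert 123 ..."; adjust the delimiter to your actual log format.

```d
import std.algorithm.searching : startsWith;
import std.array : split;
import std.conv : to;

// Non-regex parse of a line like "Alert 123 something happened":
// reject cheaply with startsWith, then split and convert field 1.
// Returns -1 if the line is not an alert line. Note: a malformed
// numeric field would make to!int throw a ConvException.
int parseAlertId(const(char)[] line)
{
    if (!line.startsWith("Alert "))
        return -1;
    auto fields = line.split(" ");
    if (fields.length < 2)
        return -1;
    return fields[1].to!int;
}
```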
Even if you need a regex to match the middle, it might be
worthwhile to filter the list with startsWith if you're matching
a fixed string at the start of the line. Again, it depends on the
frequency of hits and how the data is structured.
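The filter-then-regex idea might look like this. The `isAlert` name is hypothetical; the prefix and pattern are from the original post. The point is that startsWith rejects most non-matching lines before the regex engine ever runs.

```d
import std.algorithm.searching : startsWith;
import std.regex : Regex, matchFirst, regex;

// Cheap literal prefilter in front of the (more expensive) regex,
// which still does the actual extraction for lines that pass.
bool isAlert(Line)(Line line, Regex!char alertRe)
{
    return line.startsWith("Alert ")
        && !matchFirst(line, alertRe).empty;
}
```

Build the regex once outside the loop (e.g. `auto re = regex(r"^Alert ([0-9]+)");`) and pass it in, so the prefilter is the only per-line cost on misses.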
Split is probably not the best way to slice the match, but I
don't have time tonight to try other slicing methods.