On Monday, 19 March 2012 at 04:12:33 UTC, Jay Norwood wrote:

[....]

Ok, this was the good surprise. Reading by chunks was faster than reading the whole file, by several ms.

// read files by chunk ...!better than full input
//finished! time: 23 ms
void wcp_files_by_chunk(string fn)
{
        auto f = File(fn);
        foreach(chunk; f.byChunk(1_000_000)){
        }
}

Try copying each received chunk into an array of the size of
the file, so that the comparison is fair (allocation time
of one big array != allocation time of single chunk). Plus,
std.file.read() might reallocate).
  Plus, I think that the GC runs at the end of a program,
and it's definitely not the same to have it scan through
a bigger heap (20Mb) than just scan through 1Mb of ram,
and with 20Mb of a file, it might have gotten 'distracted'
with pointer-looking data.
  Are you benchmarking the time of the whole program,
or of just that snippet? Is the big array out of scope
after the std.file.read() ? If so, try putting the benchmark
start and end inside the function, at the same scope
maybe inside that wcp_whole_file() function.


[....]

So here is some surprise ... why is regex 136ms vs 34 ms hand code?

It's not surprising to me. I don't think that there
is a single regex engine in the world (I don't think even
the legendary Ken Thompson machine code engine) that can
surpass a hand coded:
     foreach(c; buffer) { lineCount += (c == '\n'); }
for line counting.

The same goes for indexOf. If you are searching for
the position of a single char (unless maybe that the
regex is optimized for that specific case, I'd like
to know of one if it exists), I think nothing beats indexOf.


Regular expressions are for what you rather never hand code,
or for convenience.
I would rather write a regex to do a complex subtitution in Vim,
than make a Vim function just for that. They have an amazing
descriptive power. You are basically describing a program
in a single regex line.


When people talk about *fast* regexen, they talk comparatively
to other engines, not as an absolute measure.
This confuses a lot of people.

--jm



Reply via email to