On 26.03.2012 20:00, Jay Norwood wrote:
On Sunday, 25 March 2012 at 16:31:40 UTC, James Blewitt wrote:
I'm currently trying to figure out what I'm doing differently in my
original program. At this point I am assuming that I have an error in
my code which causes the D program to do much more work than its Ruby
counterpart (although I am currently unable to find it).

When I know more I will let you know.

James Blewitt

That was the same type of thing I was seeing with very simple regex
expressions. The regex was on the order of 30 times slower than hand
code for finding words in strings.

This is a sad fact of life: a general tool can't beat a highly specialized one, though ideally it can come close to par. Even in the best case, ctRegex has to do a lot of things that a simple == '\n' comparison doesn't, such as storing the boundaries of each match. That's something to keep in mind.

By the way, regex does a fine job on (semi-)fixed strings of length >= 3-4, often easily beating plain find/indexOf. I haven't tested the Boyer-Moore version of find; that should be faster than regex for sure.
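
To make the difference concrete, here is a minimal sketch in D of the two approaches. The names countWordsByHand and countWordsByRegex are hypothetical and are not taken from the wc_test repository, and the regex version assumes a recent Phobos in which std.regex provides matchAll. The hand-written loop only flips a flag and bumps a counter, while the regex version has to produce a capture object (with start and end positions) for every word.

```d
import std.regex : ctRegex, matchAll;
import std.stdio : writeln;

// Hand-coded counter: a few comparisons per character, nothing stored.
size_t countWordsByHand(string text)
{
    size_t count = 0;
    bool inWord = false;
    foreach (char c; text)
    {
        immutable isSpace = c == ' ' || c == '\n' || c == '\t' || c == '\r';
        if (isSpace)
            inWord = false;
        else if (!inWord)
        {
            // First non-space character after whitespace starts a new word.
            inWord = true;
            ++count;
        }
    }
    return count;
}

// Regex counter: each iteration yields a full match with its boundaries,
// even though only the number of matches is used here.
size_t countWordsByRegex(string text)
{
    size_t count = 0;
    foreach (m; matchAll(text, ctRegex!`\S+`))
        ++count;
    return count;
}

void main()
{
    auto text = "the quick brown\nfox jumps";
    assert(countWordsByHand(text) == countWordsByRegex(text));
    writeln(countWordsByHand(text), " words");
}
```

Both functions agree on the count; the cost difference comes from the bookkeeping the regex engine does per match, which is the overhead described above.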

The ctRegex is on the order of 13x
slower than hand code. The times below are from parallel processing on
100MB of text files, just finding the word boundaries. I uploaded the
tests to https://github.com/jnorwood/wc_test
I believe in all these cases the files are being cached by the os, since
I was able to see the same measurements from a ramdisk done with imdisk.
So in these cases the file reads are about 30ms of the result. The rest
is cpu time, finding the words.

This is with default 7 threads

finished wcp_wcPointer! time: 98 ms
finished wcp_wcCtRegex! time: 1300 ms
finished wcp_wcRegex! time: 2946 ms
finished wcp_wcRegex2! time: 2687 ms
finished wcp_wcSlices! time: 157 ms
finished wcp_wcStdAscii! time: 225 ms


This is processing the same data with 1 thread

finished wcp_wcPointer! time: 188 ms
finished wcp_wcCtRegex! time: 2219 ms
finished wcp_wcRegex! time: 5951 ms
finished wcp_wcRegex2! time: 5502 ms
finished wcp_wcSlices! time: 318 ms
finished wcp_wcStdAscii! time: 446 ms

And this is processing the same data with 13 threads

finished wcp_wcPointer! time: 93 ms
finished wcp_wcCtRegex! time: 1110 ms
finished wcp_wcRegex! time: 2531 ms
finished wcp_wcRegex2! time: 2321 ms
finished wcp_wcSlices! time: 136 ms
finished wcp_wcStdAscii! time: 200 ms

The only change in the program that is uploaded is to add the suggested
defaultPoolThreads(13);
at the start of main to change the ThreadPool default thread count.
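
For readers who want to try the same setting, a minimal sketch follows. It is not the code from the wc_test repository: chunkLengths is a hypothetical stand-in for the per-file work. As I understand std.parallelism, defaultPoolThreads only affects the default taskPool if it is set before taskPool is first used, which is why it goes at the very start of main.

```d
import std.parallelism : defaultPoolThreads, parallel, taskPool;
import std.stdio : writeln;

// Hypothetical helper: computes the length of each chunk in parallel
// using the default task pool.
size_t[] chunkLengths(string[] chunks)
{
    auto lengths = new size_t[](chunks.length);
    foreach (i, chunk; parallel(chunks))
        lengths[i] = chunk.length; // each iteration writes its own slot
    return lengths;
}

void main()
{
    // Set the worker count before the default pool is created.
    defaultPoolThreads(13);

    // Number of worker threads in the default pool.
    writeln(taskPool.size);
    writeln(chunkLengths(["one two", "three", "four five six"]));
}
```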



--
Dmitry Olshansky
